From bjorn.burr.nyberg at gmail.com Thu Mar 1 05:14:41 2012 From: bjorn.burr.nyberg at gmail.com (Bjorn Burr Nyberg) Date: Thu, 1 Mar 2012 11:14:41 +0100 Subject: [SciPy-User] Loadtxt vs genfromtxt In-Reply-To: References: Message-ID: <967E7468-A62E-47BD-8307-D99B8CC281B1@gmail.com> Hi, I have a general question about loading data into numpy as I want to compare numpy and r by loading the juraset.dat ASCII file from the gstat package. Reading the support documents I have decided that it is better to use the loadtxt function as I do not have any missing data as useful by the genfromtxt function. However I receive this error when running loadtxt: File ..... Numpy\lib\npyio.py, line 796, in loadtxt Items = [conv(Val) for (conv,val) in zip(converts,Vals)] ValueError: invalid literal for float() Using the same parameters but with genfromtxt works, although the first entry of the array is Nan(not a numeric - expected a header like in a data frame of r). I suppose I was wondering if there was any way to save header data of an array whereby one could simply call that header for the data?Do I just have to remember the data associated with each column and call using data[]? Even loading the data as x,y,z = loadtxt is problematic when there are several columns associated with the data that I do not necessarily remember offhand. Thanks for any advice and with your patience as I'm rather new to Numpy. Nyberg From warren.weckesser at enthought.com Thu Mar 1 08:12:11 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 1 Mar 2012 07:12:11 -0600 Subject: [SciPy-User] Loadtxt vs genfromtxt In-Reply-To: <967E7468-A62E-47BD-8307-D99B8CC281B1@gmail.com> References: <967E7468-A62E-47BD-8307-D99B8CC281B1@gmail.com> Message-ID: On Thu, Mar 1, 2012 at 4:14 AM, Bjorn Burr Nyberg < bjorn.burr.nyberg at gmail.com> wrote: > > Hi, > I have a general question about loading data into numpy as I want to > compare numpy and r by loading the juraset.dat ASCII file from the gstat > package. Reading the support documents I have decided that it is better to > use the loadtxt function as I do not have any missing data as useful by the > genfromtxt function. However I receive this error when running loadtxt: > > File ..... Numpy\lib\npyio.py, line 796, in loadtxt > Items = [conv(Val) for (conv,val) in zip(converts,Vals)] > ValueError: invalid literal for float() > > Using the same parameters but with genfromtxt works, although the first > entry of the array is Nan(not a numeric - expected a header like in a data > frame of r). I suppose I was wondering if there was any way to save header > data of an array whereby one could simply call that header for the data?Do > I just have to remember the data associated with each column and call using > data[]? Even loading the data as x,y,z = loadtxt is problematic when there > are several columns associated with the data that I do not necessarily > remember offhand. > > Thanks for any advice and with your patience as I'm rather new to Numpy. 
> Nyberg > Bjorn, I don't see the text file 'juraset.dat' in the gstat package (gstat_1.0-10), but google finds this: http://www.ualberta.ca/~jbb/files/juraset.dat If that is your file, you can read it with genfromtxt like this: In [1]: data = genfromtxt('juraset.dat', skiprows=26, names=True) In [2]: data[0] Out[2]: (2.386, 3.077, 3.0, 3.0, 1.74, 25.72, 77.36, 9.32, 38.32, 21.32, 92.56) In [3]: data['Zn'][:3] Out[3]: array([ 92.56, 73.56, 64.8 ]) In [4]: data.dtype.names Out[4]: ('X', 'Y', 'Rock', 'Land', 'Cd', 'Cu', 'Pb', 'Co', 'Cr', 'Ni', 'Zn') The option 'names=True' tells genfromtxt to create a structured array, using the fields in first line (after skiprows) as the field names for the array. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From barthpi at gmail.com Thu Mar 1 10:54:48 2012 From: barthpi at gmail.com (Pierre Barthelemy) Date: Thu, 1 Mar 2012 16:54:48 +0100 Subject: [SciPy-User] Scipy fitting Message-ID: Dear all, i am writing a program for data analysis. One of the functions of this program gives the possibility to fit the functions. I therefore use the recipe described in : http://www.scipy.org/Cookbook/FittingData under the section "Simplifying the syntax". This recipe make use of the function: scipy.optimize.leastsq. One thing that i would like to know is how can i get the error on the parameters ? From what i understood from the "Cookbook" page, and from the scipy manual ( http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq), the second argument returned by the leastsq function gives access to these errors. std_error=std(y-function(x)) param_error=sqrt(diagonal(out[1])*std_error) The param_errors that i get in this case are extremely small. Much smaller than what i expected, and much smaller than what i can get fitting the function with matlab. So i guess i made an error here. Can someone tell me how i should do to retrieve the parameter errors ? Bests, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Mar 1 10:54:53 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 01 Mar 2012 16:54:53 +0100 Subject: [SciPy-User] Running NumPy or SciPy scripts in the background on Windows Message-ID: <4F4F9BCD.1070807@molden.no> Often when running long computations with NumPy or SciPy we want to "run the task in the background". This is particularly important on Windows, where a process that is greedy on CPU or RAM can almost make the system unresponsive (e.g. the desktop seems to hang). On Unix there is the "nice" command, but it is not available on Windows. We can set a process priority with the Windows task manager, but that is of no help if the system is unresponsive -- i.e. you cannot get to the task manager. This is very simple to do with the Windows API (or pywin32). Here is how a Python script (e.g. running NumPy or SciPy) can put itself in the background using pywin32: from win32process import (GetCurrentProcess, IDLE_PRIORITY_CLASS, SetPriorityClass) SetPriorityClass(GetCurrentProcess(), IDLE_PRIORITY_CLASS) These are the available flags for SetPriorityClass: REALTIME_PRIORITY_CLASS ## absolute highest priority ## NB! Windows is not a RT OS, ## except Windows CE ABOVE_NORMAL_PRIORITY_CLASS NORMAL_PRIORITY_CLASS ## the normal priority BELOW_NORMAL_PRIORITY_CLASS IDLE_PRIORITY_CLASS ## only execute when system is idle Surprisingly many users of NumPy on Windows does not know this. 
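For scripts that have to run on both Windows and Unix, a small helper along these lines can wrap the two cases (just a sketch, not a tested recipe; it assumes pywin32 is installed on Windows and falls back to os.nice elsewhere):

import os
import sys

def run_in_background():
    # Lower the priority of the current process:
    # IDLE_PRIORITY_CLASS via pywin32 on Windows, os.nice on Unix-like systems.
    if sys.platform == 'win32':
        from win32process import (GetCurrentProcess, IDLE_PRIORITY_CLASS,
                                  SetPriorityClass)
        SetPriorityClass(GetCurrentProcess(), IDLE_PRIORITY_CLASS)
    else:
        os.nice(19)  # maximum niceness on most Unix-like systems
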
I thought we might put a "receipe" for doing this in the cookbook? Sometimes we want to do this just for a thread, e.g. to keep an UI responsive. We cannot control thread priorities with the Python stdlib. Using pywin32, a Python thread that does this will put itself in the background "relative to the priority class" for the process -- but not relative to the rest of the system: from win32api import GetCurrentThread from win32process import THREAD_PRIORITY_IDLE, SetThreadPriority SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_IDLE) Python interpreter does not attempt to schedule access to the GIL. That is, a high-priority thread that releases the GIL might win it back, and so get more time in the Python interpreter (at least on Python 2.x). Similary a low-priority thread might more often miss the GIL battle. But if the GIL is locked while an extension library (e.g. NumPy) is doing a long computation, thread priority can be of no relevance. So this is generally less useful than SetPriorityClass. Here are the flags we can use for SetThreadPriority. Note that these are relative to the "priority class" for the process (and complicated by the GIL), not relative to the other processes on the system: THREAD_PRIORITY_TIME_CRITICAL ## higher than 'highest' THREAD_PRIORITY_HIGHEST THREAD_PRIORITY_ABOVE_NORMAL THREAD_PRIORITY_NORMAL THREAD_PRIORITY_BELOW_NORMAL THREAD_PRIORITY_LOWEST THREAD_PRIORITY_IDLE Sturla From josef.pktd at gmail.com Thu Mar 1 11:55:54 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 1 Mar 2012 11:55:54 -0500 Subject: [SciPy-User] Scipy fitting In-Reply-To: References: Message-ID: On Thu, Mar 1, 2012 at 10:54 AM, Pierre Barthelemy wrote: > Dear all, > > i am writing a program for data analysis. One of the functions of this > program gives the possibility to fit the functions. I therefore use the > recipe described in : http://www.scipy.org/Cookbook/FittingData?under the > section "Simplifying the syntax". This recipe make use of the > function:?scipy.optimize.leastsq. > > > One thing that i would like to know is how can i get the error on the > parameters ? From what i understood from the "Cookbook" page, and from the > scipy manual > (http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq), > the second argument returned by the leastsq function gives access to these > errors. > std_error=std(y-function(x)) > param_error=sqrt(diagonal(out[1])*std_error) you are taking the sqrt twice numpy.std takes the sqrt of the variance and then you take it again. once is enough if I read the snippet correctly, you might also add a ddof correction to std (y - function(x)) should have also mean zero if a constant is included ? Josef > > The param_errors that i get in this case are extremely small. Much smaller > than what i expected, and much smaller than what i can get fitting the > function with matlab. So i guess i made an error here. > > Can someone tell me how i should do to retrieve the parameter errors ? 
> > Bests, > > Pierre > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bjorn.burr.nyberg at gmail.com Thu Mar 1 11:58:22 2012 From: bjorn.burr.nyberg at gmail.com (Bjorn Burr Nyberg) Date: Thu, 1 Mar 2012 17:58:22 +0100 Subject: [SciPy-User] Loadtxt vs genfromtxt In-Reply-To: References: <967E7468-A62E-47BD-8307-D99B8CC281B1@gmail.com> Message-ID: <76E4351E-C5E2-4B28-98DB-12AEEC037DA3@gmail.com> Thanks that's exactly what I was looking for. Juraset.dat was saved by using require(gstat) data(jura) ... Nyberg Sent from my iPad On 1. mars 2012, at 14:12, Warren Weckesser wrote: > > > On Thu, Mar 1, 2012 at 4:14 AM, Bjorn Burr Nyberg wrote: > > Hi, > I have a general question about loading data into numpy as I want to compare numpy and r by loading the juraset.dat ASCII file from the gstat package. Reading the support documents I have decided that it is better to use the loadtxt function as I do not have any missing data as useful by the genfromtxt function. However I receive this error when running loadtxt: > > File ..... Numpy\lib\npyio.py, line 796, in loadtxt > Items = [conv(Val) for (conv,val) in zip(converts,Vals)] > ValueError: invalid literal for float() > > Using the same parameters but with genfromtxt works, although the first entry of the array is Nan(not a numeric - expected a header like in a data frame of r). I suppose I was wondering if there was any way to save header data of an array whereby one could simply call that header for the data?Do I just have to remember the data associated with each column and call using data[]? Even loading the data as x,y,z = loadtxt is problematic when there are several columns associated with the data that I do not necessarily remember offhand. > > Thanks for any advice and with your patience as I'm rather new to Numpy. > Nyberg > > > Bjorn, > > I don't see the text file 'juraset.dat' in the gstat package (gstat_1.0-10), but google finds this: > http://www.ualberta.ca/~jbb/files/juraset.dat > > If that is your file, you can read it with genfromtxt like this: > > > In [1]: data = genfromtxt('juraset.dat', skiprows=26, names=True) > > In [2]: data[0] > Out[2]: (2.386, 3.077, 3.0, 3.0, 1.74, 25.72, 77.36, 9.32, 38.32, 21.32, 92.56) > > In [3]: data['Zn'][:3] > Out[3]: array([ 92.56, 73.56, 64.8 ]) > > In [4]: data.dtype.names > Out[4]: ('X', 'Y', 'Rock', 'Land', 'Cd', 'Cu', 'Pb', 'Co', 'Cr', 'Ni', 'Zn') > > > The option 'names=True' tells genfromtxt to create a structured array, using the fields in first line (after skiprows) as the field names for the array. > > Warren > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Mar 1 11:59:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 1 Mar 2012 08:59:53 -0800 Subject: [SciPy-User] Scipy fitting In-Reply-To: References: Message-ID: You could look at the curve_fit function in scipy optimize as well. It returns the error in the fitted parameters. The source code of that function should be useful for you. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Mar 1, 2012, at 7:54 AM, Pierre Barthelemy wrote: > Dear all, > > i am writing a program for data analysis. One of the functions of this program gives the possibility to fit the functions. 
I therefore use the recipe described in : http://www.scipy.org/Cookbook/FittingData under the section "Simplifying the syntax". This recipe make use of the function: scipy.optimize.leastsq. > > > One thing that i would like to know is how can i get the error on the parameters ? From what i understood from the "Cookbook" page, and from the scipy manual (http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq), the second argument returned by the leastsq function gives access to these errors. > std_error=std(y-function(x)) > param_error=sqrt(diagonal(out[1])*std_error) > > The param_errors that i get in this case are extremely small. Much smaller than what i expected, and much smaller than what i can get fitting the function with matlab. So i guess i made an error here. > > Can someone tell me how i should do to retrieve the parameter errors ? > > Bests, > > Pierre > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From draft2008 at bk.ru Fri Mar 2 01:02:43 2012 From: draft2008 at bk.ru (=?UTF-8?B?0JLQu9Cw0LTQuNC80LjRgA==?=) Date: Fri, 02 Mar 2012 10:02:43 +0400 Subject: [SciPy-User] =?utf-8?q?Orthogonal_distance_regression_in_3D?= Message-ID: Hello! I'm working with orthogonal distance regression (scipy.odr). I try to fit the curve to a point cloud (3d), but it doesn work properly, it returns wrong results For example I want to fit the simple curve y = a*x + b*z + c to some point cloud (y_data, x_data, z_data) ? ? ? ? def func(p, input): ? ? x,z = input ? ? x = np.array(x) ? ? z = np.array(z) ? ? return (p[0]*x + p[1]*z + p[2]) ? ? initialGuess = [1,1,1] ? ? myModel = Model(func) ? ? myData = Data([x_data, z_daya], y_data) ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) ? ? myOdr.set_job(fit_type=0) ? ? out = myOdr.run() ? ? print out.beta? It works perfectly in 2d dimension (2 axes), but in 3d dimension the results are not even close to real, moreover it is very sensitive to initial Guess, so it returns different result even if i change InitiaGuess from?[1,1,1] to?[0.99,1,1] What do I do wrong? ? Im not very strong in mathematics, but may be I should specify some additional parameters such as Jacobian matrix or weight matrix or something else? -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Mar 2 06:48:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 2 Mar 2012 11:48:40 +0000 Subject: [SciPy-User] Orthogonal distance regression in 3D In-Reply-To: References: Message-ID: On Fri, Mar 2, 2012 at 06:02, ???????? wrote: > Hello! > I'm working with orthogonal distance regression (scipy.odr). > I try to fit the curve to a point cloud (3d), but it doesn work properly, it > returns wrong results > > For example I want to fit the simple curve y = a*x + b*z + c to some point > cloud (y_data, x_data, z_data) > > > ? ? def func(p, input): > > ? ? x,z = input > > ? ? x = np.array(x) > > ? ? z = np.array(z) > > ? ? return (p[0]*x + p[1]*z + p[2]) > > > ? ? initialGuess = [1,1,1] > > ? ? myModel = Model(func) > > ? ? myData = Data([x_data, z_daya], y_data) > > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) > > ? ? myOdr.set_job(fit_type=0) > > ? ? out = myOdr.run() > > ? ? 
print out.beta > > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results > are not even close to real, moreover it is very sensitive to initial Guess, > so it returns different result even if i change InitiaGuess from?[1,1,1] > to?[0.99,1,1] > > What do I do wrong? Can you provide a complete runnable example including some data? Note that if you do not specify any errors on your data, they are assumed to correspond to a standard deviation of 1 for all dimensions. If that is wildly different from the actual variance around the "true" surface, then it might lead the optimizer astray. -- Robert Kern From ralf.gommers at googlemail.com Sat Mar 3 09:12:59 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 3 Mar 2012 15:12:59 +0100 Subject: [SciPy-User] Facing a problem with integrations In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 3:47 AM, Marcel Caraciolo wrote: > Hi all, > > My name is Marcel and I am lecturing scientific computing with Python here > at Brazil. One of my students came to me with a problem that he is > currently solving it with matlab but he decided to change his code to > Python (thanks to the course!) > > The problem is calculate numerically the coefficients aim that are defined > by the following integral [1]. > > It must be calculated using integrals. In the example showed above he > wants to use the trapezoid rule adapted for 2-D arrays or if there is any > another solutions easily with scipy it would be match perfectly also. > > Here is the matrix input (U) and the corresponding coefficients > (solution). The goal is to calculate the corresponding coefficients > by the formula (integral) shown at [1]. > > Could anyone give some a solution using scipy.integrate ? I tried several > proposals but it didn't worked. > It's a double integral, so your first try should be integrate.dblquad. I suggest that you try that, then if you get stuck show us where exactly and we'll try to help you. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Mar 3 11:41:32 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 3 Mar 2012 11:41:32 -0500 Subject: [SciPy-User] OT: Data Analysis in Python Message-ID: just some thoughts, mostly personal http://jpktd.blogspot.com/2012/03/data-in-python.html Josef (trying to blog to keep the mailing list on its trend to be less noisy) From travis at continuum.io Sat Mar 3 14:16:12 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 3 Mar 2012 13:16:12 -0600 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: <2401158E-2B49-46EF-A961-CB493F20DEC0@continuum.io> Thanks for posting this. The trend towards blogging is not new. It feels like a pretty good venue for this sort of thing --- and it let's the mailing lists stay more technical. "Big-Data" does have a lot of "Big-Hype" but it is my opinion that there are user-stories that the SciPy community would be wise to address. The PyData workshop was basically an attempt to make sure that the noise around Strata at least has some "Python". It was also a chance to be in the same room as many people active in the larger SciPy community as a pre-cursor to PyCon. With R getting a lot of attention from Venture Capitalists, there are a lot of people who are "new" to so-called "Data-Science" who are being pushed to "R" instead of Python because of the larger market messaging. 
Their use-cases are not informing this community as much as it could be. Wes McKinney, fortunately, is doing a lot of work to change that. -Travis On Mar 3, 2012, at 10:41 AM, josef.pktd at gmail.com wrote: > just some thoughts, mostly personal > > http://jpktd.blogspot.com/2012/03/data-in-python.html > > Josef > (trying to blog to keep the mailing list on its trend to be less noisy) > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Sat Mar 3 14:24:57 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 03 Mar 2012 20:24:57 +0100 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: Hi, 03.03.2012 17:41, josef.pktd at gmail.com kirjoitti: > just some thoughts, mostly personal > > http://jpktd.blogspot.com/2012/03/data-in-python.html OT on OT: should your blog be added the planet http://planet.scipy.org/ -- Pauli Virtanen From josef.pktd at gmail.com Sat Mar 3 14:53:27 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 3 Mar 2012 14:53:27 -0500 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: On Sat, Mar 3, 2012 at 2:24 PM, Pauli Virtanen wrote: > Hi, > > 03.03.2012 17:41, josef.pktd at gmail.com kirjoitti: >> just some thoughts, mostly personal >> >> http://jpktd.blogspot.com/2012/03/data-in-python.html > > OT on OT: should your blog be added the planet http://planet.scipy.org/ Yes, thank you. I wasn't sure I will keep it up, but now it's one command to go from rst file to blogger. The next posts should be more technical. Josef > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From emanuele at relativita.com Sat Mar 3 16:05:14 2012 From: emanuele at relativita.com (Emanuele Olivetti) Date: Sat, 03 Mar 2012 22:05:14 +0100 Subject: [SciPy-User] issue estimating the multivariate Polya distribution: why? Message-ID: <4F52878A.3040808@relativita.com> Dear All, I am playing with the multivariate Polya distribution - also known as the Dirichlet compound multinomial distribution. A brief description from wikipedia: http://en.wikipedia.org/wiki/Multivariate_P%C3%B3lya_distribution I made a straightforward implementation of the probability density function in the log-scale here: https://gist.github.com/1968113 together with a straightforward montecarlo estimation (by sampling first from a Dirichlet and then computing the log-likelihood of the multinomial) in the log-scale as well. The log-scale was chosen in order to improve numerical stability. If you run the code liked above you should get these two examples: ---- X: [ 0 50 50] alpha: [ 1 10 10] analytic: -5.22892710577 montecarlo -5.23470053651 X: [100 0 0] alpha: [ 1 10 10] analytic: -51.737395965 montecarlo -93.5266543113 ---- As you can see in the first case, i.e. X=[0,50,50], there is excellent agreement between the two implementations while in the second case, i.e. x=[100,0,0], there is a dramatic disagreement. Note that the montecarlo estimate is quite stable and if you change the seed of the random number generator you get numbers not too far from -90. So my question is: where does this issue come from? I cannot see mistakes in the implementation (the code is very simple) and I cannot see the source of numerical instability. Any hint? 
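For concreteness, the Monte Carlo estimator is essentially the following (a minimal sketch of what the gist does, not the exact code):

import numpy as np
from scipy.special import gammaln

def log_multivariate_polya_montecarlo(X, alpha, iterations):
    # Monte Carlo estimate of log p(X | alpha): sample p ~ Dirichlet(alpha),
    # evaluate the multinomial log-likelihood of X under each sample,
    # and average in log scale for numerical stability.
    X = np.asarray(X, dtype=float)
    ps = np.random.dirichlet(alpha, size=iterations)       # shape (iterations, k)
    logp_Hs = (gammaln(X.sum() + 1.0) - gammaln(X + 1.0).sum()
               + np.dot(np.log(ps), X))                    # multinomial log-pmf per sample
    # log of the mean of exp(logp_Hs), computed without leaving log scale
    return np.logaddexp.reduce(logp_Hs) - np.log(iterations)
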
Best, Emanuele From josef.pktd at gmail.com Sat Mar 3 18:24:32 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 3 Mar 2012 18:24:32 -0500 Subject: [SciPy-User] issue estimating the multivariate Polya distribution: why? In-Reply-To: <4F52878A.3040808@relativita.com> References: <4F52878A.3040808@relativita.com> Message-ID: On Sat, Mar 3, 2012 at 4:05 PM, Emanuele Olivetti wrote: > Dear All, > > I am playing with the multivariate Polya distribution - also > known as the Dirichlet compound multinomial distribution. A > brief description from wikipedia: > ? http://en.wikipedia.org/wiki/Multivariate_P%C3%B3lya_distribution > > I made a straightforward implementation of the probability > density function in the log-scale here: > ? https://gist.github.com/1968113 > together with a straightforward montecarlo estimation (by > sampling first from a Dirichlet and then computing the log-likelihood > of the multinomial) in the log-scale as well. The log-scale was > chosen in order to improve numerical stability. > > If you run the code liked above you should get these two examples: > ---- > X: [ 0 50 50] > alpha: [ 1 10 10] > analytic: -5.22892710577 > montecarlo -5.23470053651 > > X: [100 ? 0 ? 0] > alpha: [ 1 10 10] > analytic: -51.737395965 > montecarlo -93.5266543113 > ---- > > As you can see in the first case, i.e. X=[0,50,50], there is excellent > agreement between the two implementations while in the second case, i.e. > x=[100,0,0], there is a dramatic disagreement. Note that the montecarlo > estimate is quite stable and if you change the seed of the random > number generator you get numbers not too far from -90. > > So my question is: where does this issue come from? I cannot see > mistakes in the implementation (the code is very simple) and > I cannot see the source of numerical instability. I don't see anything. I was trying out several different parameters, and my only guess is that the logaddexp is not precise enough in this case. My results (numpy 1.5.1) are even worse. The probability that you want to calculate is very low >>> np.exp(-51.737395965) 3.3941765165211696e-23 For larger values it seems to work fine, but it deteriorates fast when the loglikelihood drops below -15 or so (with the versions I have installed). Do you need almost zero probability events? In a similar problem with poisson mixtures but without monte carlo, I was trying out various ways of rescaling, but didn't come up with anything useful. Josef > > Any hint? > > Best, > > Emanuele > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Sat Mar 3 18:51:38 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 3 Mar 2012 18:51:38 -0500 Subject: [SciPy-User] issue estimating the multivariate Polya distribution: why? In-Reply-To: References: <4F52878A.3040808@relativita.com> Message-ID: On Sat, Mar 3, 2012 at 6:24 PM, wrote: > On Sat, Mar 3, 2012 at 4:05 PM, Emanuele Olivetti > wrote: >> Dear All, >> >> I am playing with the multivariate Polya distribution - also >> known as the Dirichlet compound multinomial distribution. A >> brief description from wikipedia: >> ? http://en.wikipedia.org/wiki/Multivariate_P%C3%B3lya_distribution >> >> I made a straightforward implementation of the probability >> density function in the log-scale here: >> ? 
https://gist.github.com/1968113 >> together with a straightforward montecarlo estimation (by >> sampling first from a Dirichlet and then computing the log-likelihood >> of the multinomial) in the log-scale as well. The log-scale was >> chosen in order to improve numerical stability. >> >> If you run the code liked above you should get these two examples: >> ---- >> X: [ 0 50 50] >> alpha: [ 1 10 10] >> analytic: -5.22892710577 >> montecarlo -5.23470053651 >> >> X: [100 ? 0 ? 0] >> alpha: [ 1 10 10] >> analytic: -51.737395965 >> montecarlo -93.5266543113 >> ---- >> >> As you can see in the first case, i.e. X=[0,50,50], there is excellent >> agreement between the two implementations while in the second case, i.e. >> x=[100,0,0], there is a dramatic disagreement. Note that the montecarlo >> estimate is quite stable and if you change the seed of the random >> number generator you get numbers not too far from -90. >> >> So my question is: where does this issue come from? I cannot see >> mistakes in the implementation (the code is very simple) and >> I cannot see the source of numerical instability. > > I don't see anything. I was trying out several different parameters, > and my only guess is that > the logaddexp is not precise enough in this case. My results (numpy > 1.5.1) are even worse. > > The probability that you want to calculate is very low > >>>> np.exp(-51.737395965) > 3.3941765165211696e-23 > > For larger values it seems to work fine, but it deteriorates fast when > the loglikelihood drops below -15 or so (with the versions I have > installed). just an observation with iterations=1e7 I get much better numbers, which are still way off. But I don't see why this should matter much, since you are simulating alpha and not low probability events. (unless lot's of tiny errors add up in different ways) Josef > > Do you need almost zero probability events? > > In a similar problem with poisson mixtures but without monte carlo, I > was trying out various ways of rescaling, but didn't come up with > anything useful. > > Josef > >> >> Any hint? >> >> Best, >> >> Emanuele >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From aronne.merrelli at gmail.com Sun Mar 4 00:54:47 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Sat, 3 Mar 2012 23:54:47 -0600 Subject: [SciPy-User] issue estimating the multivariate Polya distribution: why? In-Reply-To: References: <4F52878A.3040808@relativita.com> Message-ID: On Sat, Mar 3, 2012 at 5:51 PM, wrote: > On Sat, Mar 3, 2012 at 6:24 PM, ? wrote: >> On Sat, Mar 3, 2012 at 4:05 PM, Emanuele Olivetti >> The probability that you want to calculate is very low >> >>>>> np.exp(-51.737395965) >> 3.3941765165211696e-23 >> >> For larger values it seems to work fine, but it deteriorates fast when >> the loglikelihood drops below -15 or so (with the versions I have >> installed). > > just an observation > > with iterations=1e7 I get much better numbers, which are still way > off. But I don't see why this should matter much, since you are > simulating alpha and not low probability events. (unless lot's of tiny > errors add up in different ways) > > Josef I'm a little out of my field here, so take this with a grain of salt. I think Josef's observation is the key; the problem is the number of samples in the MC is too low. 
This distribution seems very, very skewed; if you plot the actual values in the second case (specifically, exp(logp_Hs)) - some of it underflows, obviously, but if you plot it in linear scale, it appears to be dominated by 1 or 2 large "outlier" values. The final mean value is largely dependent on only those outliers. The MC just would require *a lot* more samples to get a few realizations that would pull up the mean to more accurately match the analytic prediction. Try this test: compute the MC mean as the number of samples increase; for example (this will take a few minutes to compute - the spacing in the iteration number is overdone but when you plot it, you get some rough approximation to the scatter and expectation value for the MC estimate) X = array([100,0,0]) alpha = array([1,10,10]) n_iter = 10**linspace(3,6,161) test_logmeans = np.zeros(n_iter.shape) for n in range(n_iter.shape[0]): test_logmeans[n] = log_multivariate_polya_montecarlo(X, alpha, int(n_iter[n])) If you plot test_logmeans, it clearly shows a negative bias (relative to the analytic prediction) that decreases as the sample size increases. Cheers, Aronne From gael.varoquaux at normalesup.org Sun Mar 4 07:58:23 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 4 Mar 2012 13:58:23 +0100 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: <20120304125823.GA705@phare.normalesup.org> On Sat, Mar 03, 2012 at 02:53:27PM -0500, josef.pktd at gmail.com wrote: > On Sat, Mar 3, 2012 at 2:24 PM, Pauli Virtanen wrote: > > Hi, > > 03.03.2012 17:41, josef.pktd at gmail.com kirjoitti: > >> just some thoughts, mostly personal > >> http://jpktd.blogspot.com/2012/03/data-in-python.html > > OT on OT: should your blog be added the planet http://planet.scipy.org/ > Yes, thank you. Indeed. I just added it. It will be in the update. Gael From lee.j.joon at gmail.com Sun Mar 4 02:59:12 2012 From: lee.j.joon at gmail.com (Jae-Joon Lee) Date: Sun, 4 Mar 2012 16:59:12 +0900 Subject: [SciPy-User] [Matplotlib-users] matplotlib: Simple legend code no longer works after upgrade to Ubuntu 11.10 In-Reply-To: References: Message-ID: Although this is quite an old post, one need to set the location again. e.g., lh._loc = 2 Regards, -JJ On Wed, Dec 14, 2011 at 12:09 AM, Warren Weckesser wrote: > > > On Mon, Dec 12, 2011 at 7:05 PM, C Barrington-Leigh > wrote: >> >> Oops; I just posted this to comp.lang.python, but I wonder whether >> matplotlib questions are supposed to go to scipy-user? > > > > How about matplotlib-users at lists.sourceforge.net?? I've cc'ed to that list. > > Warren > > >> >> Here it is: >> """ >> Before I upgraded to 2.7.2+ / 4 OCt 2011, the following code added a >> comment line to an axis legend using matplotlib / pylab. >> Now, the same code makes the legend appear "off-screen", ie way >> outside the axes limits. >> >> Can anyone help? And/or is there a new way to add a title and footer >> to the legend? >> >> Thanks! >> """ >> >> from pylab import * >> plot([0,0],[1,1],label='Ubuntu 11.10') >> lh=legend(fancybox=True,shadow=False) >> lh.get_frame().set_alpha(0.5) >> >> from matplotlib.offsetbox import TextArea, VPacker >> fontsize=lh.get_texts()[0].get_fontsize() >> legendcomment=TextArea('extra comments here', >> textprops=dict(size=fontsize)) >> show() >> # Looks fine here >> lh._legend_box = VPacker(pad=5, >> ? ? ? ? ? ? ? ? ? ? ? ? sep=0, >> ? ? ? ? ? ? ? ? ? ? ? ? children=[lh._legend_box,legendcomment], >> ? ? ? ? ? ? ? ? ? ? ? ? 
align="left") >> lh._legend_box.set_figure(gcf()) >> draw() >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > ------------------------------------------------------------------------------ > Systems Optimization Self Assessment > Improve efficiency and utilization of IT resources. Drive out cost and > improve service delivery. Take 5 minutes to use this Systems Optimization > Self Assessment. http://www.accelacomm.com/jaw/sdnl/114/51450054/ > _______________________________________________ > Matplotlib-users mailing list > Matplotlib-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/matplotlib-users > From timmichelsen at gmx-topmail.de Sun Mar 4 13:58:11 2012 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Sun, 04 Mar 2012 19:58:11 +0100 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: hello, > http://jpktd.blogspot.com/2012/03/data-in-python.html Is there any information if the slides an materials will be published for those who couldn't attend? I'd be interested also if the materials of the following meeting will be published: Data analysis in Python with pandas https://us.pycon.org/2012/schedule/presentation/427/ It's just difficult to fly round the globe for just one day... Regards, Timmie From wesmckinn at gmail.com Sun Mar 4 14:30:00 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 4 Mar 2012 14:30:00 -0500 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: On Sun, Mar 4, 2012 at 1:58 PM, Tim Michelsen wrote: > hello, > >> http://jpktd.blogspot.com/2012/03/data-in-python.html > Is there any information if the slides an materials will be published > for those who couldn't attend? > > > I'd be interested also if the materials of the following meeting will be > published: > Data analysis in Python with pandas > https://us.pycon.org/2012/schedule/presentation/427/ > > It's just difficult to fly round the globe for just one day... > > Regards, > Timmie > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user hi Timmie, I will definitely post materials from my pandas tutorial (which are at the moment a bit scattered, becoming less so over the next couple days) and my talk at PyCon. I'm also due to cut some screencasts of recent demos. Some videos from the PyData workshop will also be published since we had a recording crew. cheers, Wes From emanuele at relativita.com Sun Mar 4 17:41:58 2012 From: emanuele at relativita.com (Emanuele Olivetti) Date: Sun, 04 Mar 2012 23:41:58 +0100 Subject: [SciPy-User] issue estimating the multivariate Polya distribution: why? In-Reply-To: References: <4F52878A.3040808@relativita.com> Message-ID: <4F53EFB6.60100@relativita.com> On 03/04/2012 06:54 AM, Aronne Merrelli wrote: > On Sat, Mar 3, 2012 at 5:51 PM, wrote: >> On Sat, Mar 3, 2012 at 6:24 PM, wrote: >>> 3.3941765165211696e-23 >>> >>> For larger values it seems to work fine, but it deteriorates fast when >>> the loglikelihood drops below -15 or so (with the versions I have >>> installed). >> just an observation >> >> with iterations=1e7 I get much better numbers, which are still way >> off. But I don't see why this should matter much, since you are >> simulating alpha and not low probability events. 
(unless lot's of tiny >> errors add up in different ways) >> >> Josef > I'm a little out of my field here, so take this with a grain of salt. > > I think Josef's observation is the key; the problem is the number of > samples in the MC is too low. This distribution seems very, very > skewed; [...] > If you plot test_logmeans, it clearly shows a negative bias (relative > to the analytic prediction) that decreases as the sample size > increases. Thanks Josef and Aronne, your tests and comments are very useful. Indeed the distribution of interest is very skewed and the specific nature of the skewness could be the explanation of the unexpected behavior of the montecarlo estimate. I am trying to set up a minimal and straightforward example where same surprising effect will (hopefully) appear. I guess that crafting a very skewed distribution - even a very simple one - should show such an anomalous behavior and give insights. Unfortunately until now I was not successful. But I'll let you know as soon as I make some progress. Of course any help in digging more on this issue is warmly welcome! Best, Emanuele From ramercer at gmail.com Sun Mar 4 19:05:25 2012 From: ramercer at gmail.com (Adam Mercer) Date: Sun, 4 Mar 2012 18:05:25 -0600 Subject: [SciPy-User] Test failures with SciPy-0.10.1 and Mac OS X Lion Message-ID: Hi I've been updating the scipy in MacPorts and when updating to 0.10.1 the test suite now has the following failures: $ python Python 2.7.2 (default, Mar 1 2012, 20:21:11) [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.45)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test() Running unit tests for scipy NumPy version 1.6.1 NumPy is installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy SciPy version 0.10.1 SciPy is installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy Python version 2.7.2 (default, Mar 1 2012, 20:21:11) [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.45)] nose version 1.1.2 ====================================================================== FAIL: test_asum (test_blas.TestFBLAS1Simple) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/tests/test_blas.py", line 58, in test_asum assert_almost_equal(f([3,-4,5]),12) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: 0.0 DESIRED: 12 ====================================================================== FAIL: test_dot (test_blas.TestFBLAS1Simple) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/tests/test_blas.py", line 67, in test_dot assert_almost_equal(f([3,-4,5],[2,5,1]),-9) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: 0.0 DESIRED: -9 ====================================================================== FAIL: 
test_nrm2 (test_blas.TestFBLAS1Simple) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/tests/test_blas.py", line 78, in test_nrm2 assert_almost_equal(f([3,-4,5]),math.sqrt(50)) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: 0.0 DESIRED: 7.0710678118654755 ====================================================================== FAIL: test_basic.TestNorm.test_overflow ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/tests/test_basic.py", line 581, in test_overflow assert_almost_equal(norm(a), a) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 452, in assert_almost_equal return assert_array_almost_equal(actual, desired, decimal, err_msg) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal header=('Arrays are not almost equal to %d decimals' % decimal)) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals (mismatch 100.0%) x: array(-0.0) y: array([ 1.00000002e+20], dtype=float32) ====================================================================== FAIL: test_basic.TestNorm.test_stable ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/tests/test_basic.py", line 586, in test_stable assert_almost_equal(norm(a) - 1e4, 0.5) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: -10000.0 DESIRED: 0.5 ====================================================================== FAIL: test_basic.TestNorm.test_types ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/tests/test_basic.py", line 568, in test_types assert_allclose(norm(x), np.sqrt(14), rtol=tol) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File 
"/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=2.38419e-06, atol=0 (mismatch 100.0%) x: array(1.0842021724855044e-19) y: array(3.7416573867739413) ====================================================================== FAIL: test_asum (test_blas.TestFBLAS1Simple) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/tests/test_blas.py", line 99, in test_asum assert_almost_equal(f([3,-4,5]),12) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: 0.0 DESIRED: 12 ====================================================================== FAIL: test_dot (test_blas.TestFBLAS1Simple) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/tests/test_blas.py", line 109, in test_dot assert_almost_equal(f([3,-4,5],[2,5,1]),-9) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: 0.0 DESIRED: -9 ====================================================================== FAIL: test_nrm2 (test_blas.TestFBLAS1Simple) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/tests/test_blas.py", line 127, in test_nrm2 assert_almost_equal(f([3,-4,5]),math.sqrt(50)) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: 0.0 DESIRED: 7.0710678118654755 ---------------------------------------------------------------------- Ran 5101 tests in 86.528s FAILED (KNOWNFAIL=12, SKIP=31, failures=9) I can't recall seeing these failures before, this has been compiled using the compilers from Xcode-4.3? Are these anything to worry about? Cheers Adam From draft2008 at bk.ru Mon Mar 5 05:26:39 2012 From: draft2008 at bk.ru (=?UTF-8?B?0JLQu9Cw0LTQuNC80LjRgA==?=) Date: Mon, 05 Mar 2012 14:26:39 +0400 Subject: [SciPy-User] =?utf-8?q?Orthogonal_distance_regression_in_3D?= Message-ID: 02 ????? 2012, 15:49 ?? Robert Kern : > On Fri, Mar 2, 2012 at 06:02, ???????? wrote: > > Hello! > > I'm working with orthogonal distance regression (scipy.odr). > > I try to fit the curve to a point cloud (3d), but it doesn work properly, it > > returns wrong results > > > > For example I want to fit the simple curve y = a*x + b*z + c to some point > > cloud (y_data, x_data, z_data) > > > > > > ? ? def func(p, input): > > > > ? ? x,z = input > > > > ? ? x = np.array(x) > > > > ? ? z = np.array(z) > > > > ? ? return (p[0]*x + p[1]*z + p[2]) > > > > > > ? ? initialGuess = [1,1,1] > > > > ? ? myModel = Model(func) > > > > ? ? 
myData = Data([x_data, z_daya], y_data) > > > > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) > > > > ? ? myOdr.set_job(fit_type=0) > > > > ? ? out = myOdr.run() > > > > ? ? print out.beta > > > > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results > > are not even close to real, moreover it is very sensitive to initial Guess, > > so it returns different result even if i change InitiaGuess from?[1,1,1] > > to?[0.99,1,1] > > > > What do I do wrong? > > Can you provide a complete runnable example including some data? Note > that if you do not specify any errors on your data, they are assumed > to correspond to a standard deviation of 1 for all dimensions. If that > is wildly different from the actual variance around the "true" > surface, then it might lead the optimizer astray. > > -- > Robert Kern > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. Here is an example (Sorry for the huge array of data, but its important to show what happens on it) import numpy as np from scipy.odr import * from math import * x_data = [26.6, 25.5, 26.2, 25.5, 24.9, 24.9, 30.3, 30.4, 35.7, 25.8, 34.7, 32.8, 36.5, 20.4, 24.2, 25.4, 24.5, 23.4, 20.9, 33.3, 27.0, 28.1, 21.7, 24.47, 22.2, 22.43, 23.43, 22.27, 22.27, 28.63, 28.7, 29.1, 30.0, 29.77, 26.93, 27.7, 27.83, 28.23, 31.2, 30.3, 32.13, 27.6, 33.73, 29.07, 29.0, 29.77, 32.7, 32.93, 32.65, 22.63, 25.13, 27.77, 19.77, 31.52, 33.0, 34.1, 33.23, 35.6, 33.83, 32.4, 33.1, 31.08, 31.5, 32.6, 33.67, 31.6, 34.1, 32.92, 34.63, 32.4, 30.1, 33.8, 35.6, 31.9, 29.8, 35.7, 34.4, 34.6, 36.0, 28.63, 27.47, 29.93, 24.17, 24.1, 23.8, 27.2, 26.1, 31.12, 25.17, 29.35, 31.67, 29.67, 30.13, 30.1, 30.27, 30.73, 31.1, 33.33, 28.75, 26.67, 30.37, 28.6, 24.75, 30.5, 29.63, 31.0, 30.15, 29.85, 12.85, 34.2, 20.67, 20.73, 19.83, 13.6, 20.0, 19.45, 21.25, 22.05, 14.27, 28.37, 29.85, 32.8, 32.42, 31.05, 33.15, 32.25, 32.38, 32.42, 34.87, 31.97, 31.98, 33.0, 32.03, 34.4, 34.05, 21.43, 19.92, 22.0, 22.1, 27.57, 21.1, 23.77, 21.63, 26.77, 22.93, 25.67, 30.6, 25.37, 26.7, 31.03, 27.7, 31.13, 29.35, 27.33, 28.6, 26.9, 31.9, 9.37, 30.6, 26.93, 29.95, 31.6, 30.7, 24.6, 31.4, 23.47, 26.25, 33.6, 25.0, 33.8, 31.9, 22.7, 25.23, 20.1, 17.55, 18.45, 17.9, 23.13, 18.33, 18.5, 19.38, 19.27, 22.6, 22.37, 22.97, 25.07, 26.37, 17.43, 21.0, 18.23, 32.83, 30.52, 19.3, 20.1, 5.6, 26.23, 26.95, 26.35, 26.47, 28.8, 28.73, 25.45, 26.53, 23.45, 20.6, 24.47, 23.9, 13.35, 30.47, 29.67, 31.15, 29.15, 29.63, 28.58, 29.0, 31.7, 28.57, 29.58, 29.1, 22.48, 30.0, 26.15, 26.93, 28.77, 30.1, 24.03, 28.77, 29.4, 29.3, 29.95, 27.0, 29.05, 25.62, 26.8, 25.85, 22.42, 21.77, 28.2, 21.0, 20.8, 28.8, 28.7, 24.8, 27.87, 29.4, 29.1, 28.33, 27.6, 32.93, 33.3, 33.17, 31.4, 30.37, 34.8, 25.0, 23.48, 33.4, 30.23, 29.07, 28.9, 26.9, 31.77, 25.57, 19.67, 30.5, 19.97, 18.63, 26.35, 19.23, 31.2, 30.37, 30.6, 31.92, 27.35, 28.15, 28.67, 32.6, 20.83, 29.0, 28.2, 20.7, 20.6, 30.1, 30.4, 30.83, 31.0, 30.7, 29.77, 31.17, 31.63, 31.37, 32.2, 31.3, 20.0, 18.9, 32.4, 32.5, 32.3, 32.53, 21.67, 22.17, 22.4, 21.7, 23.8, 26.1, 20.7, 23.5, 21.5, 15.9, 24.3, 21.7, 25.5, 26.6, 23.3, 14.0, 20.2, 21.0, 25.4, 21.4, 28.4, 16.2, 22.2, 22.8, 21.8, 24.4, 23.6, 21.7, 31.5, 27.3, 27.7, 23.7, 20.9, 22.5, 21.9, 18.1, 20.9, 13.3, 12.9, 25.0, 22.2, 21.4, 31.6, 24.2, 15.0, 15.3, 20.1, 24.5, 21.3, 21.4, 22.8, 27.9, 29.5, 18.8, 18.1, 17.8, 23.5, 19.4, 18.8, 13.8, 17.3, 19.6, 22.1, 22.2, 20.3, 10.7, 22.2, 10.3, 22.4, 16.1, 22.9, 28.0, 28.8, 25.6, 
29.7, 29.7, 20.8, 21.0, 27.9, 23.8, 24.1, 21.1, 24.1, 32.6, 21.6, 28.9, 26.9, 5.1, 31.3, 21.2, 22.6, 24.2, 31.1, 24.8, 30.3, 27.3, 26.3, 20.9, 22.2, 29.7, 25.0, 25.9, 31.0, 31.0, 29.0, 25.0, 25.0, 30.2, 28.1, 29.5, 26.8, 27.0, 30.4, 30.8, 31.5, 31.0, 30.7, 32.2, 21.25, 30.35, 31.15, 33.25, 28.1, 33.1, 26.3, 33.05, 31.65, 31.9, 27.15, 28.4, 28.1, 29.9, 31.8, 33.45, 30.3, 31.75, 26.15, 23.6, 23.85, 28.0, 29.5, 25.55, 24.6, 22.25, 27.2, 27.95, 25.0, 29.7, 24.95, 26.6, 22.85, 25.0, 27.55, 26.0, 21.1, 25.95, 29.4, 27.8, 32.6, 25.2, 25.05, 31.0, 27.55, 27.95, 32.15, 30.85, 32.75, 31.05, 31.7, 31.25, 32.55, 30.9, 31.8, 31.45, 29.85, 29.75, 31.75, 32.35, 23.15, 28.3, 28.95, 26.8, 27.55, 28.45, 28.5, 27.45, 17.0, 22.8, 26.05, 27.8, 28.1, 29.25, 27.35, 27.2, 27.85, 27.55, 26.75, 29.0, 27.6, 29.25, 30.45, 28.05, 24.5, 23.15, 27.9, 28.55, 27.85, 23.6, 28.7, 29.55, 27.35, 28.25, 26.7, 25.95, 28.75, 27.6, 27.4, 30.0, 26.7, 28.1, 27.8, 28.0, 32.15, 27.25, 32.0, 33.9, 19.7, 30.15, 29.8, 27.0, 24.65, 22.15, 23.3, 24.0, 23.85, 26.35, 26.35, 26.1, 25.75, 27.7, 26.35, 26.25, 27.9, 29.05, 27.85, 26.0, 26.6, 25.3, 24.75, 24.85, 23.0, 28.95, 22.9, 24.15, 26.95, 21.8, 26.2, 28.5, 22.5, 24.75, 24.45, 30.4, 26.15, 19.85, 18.55, 26.3, 29.8, 26.05, 13.9, 27.65, 30.8, 24.0, 19.6, 21.7, 25.05, 26.65, 21.6, 28.2, 28.65, 25.15, 20.55, 21.55, 23.4, 23.6, 22.15, 21.2, 24.5, 21.7, 21.45, 23.1, 24.1, 22.2, 17.35, 22.05, 22.05, 23.75, 25.15, 25.75, 21.05, 27.43, 22.45, 26.25, 27.1, 27.15, 25.55, 23.0, 26.45, 25.95, 29.2, 25.55, 25.15, 26.45, 27.15, 24.75, 25.7, 32.65, 31.15, 23.85, 27.0, 27.3, 25.15, 25.55, 29.2, 24.0, 25.0, 28.2, 27.95, 24.65, 25.0, 25.6, 25.05, 24.7, 26.9, 23.6, 24.15, 19.75, 20.55, 27.7, 23.75, 24.8, 21.2, 31.2, 26.3, 30.8, 32.4, 22.2, 21.5, 27.4, 26.3, 30.4, 21.0, 23.3, 28.1, 16.65, 26.15, 17.9, 21.35, 21.1, 20.65, 30.4, 23.0, 25.7, 26.7, 22.7, 29.4, 25.9, 29.4, 21.55, 26.3, 25.1, 26.3, 32.7, 27.9, 31.4, 29.6, 31.4, 31.9, 31.45, 32.55, 31.35, 31.65, 29.75, 25.75, 26.1, 28.7, 30.75, 27.4, 24.9, 2.7, 27.2, 26.35, 28.2, 27.5, 28.45, 27.25, 30.5, 28.9, 30.8, 26.45, 24.5, 31.2, 26.45, 24.85, 25.1, 26.0, 25.5, 25.2, 25.6, 24.0, 30.8, 31.8, 31.6, 27.5, 9.2, 22.2, 23.3, 28.1, 30.8, 31.3, 3.5, 29.2, 31.4, 25.3, 29.7, 28.2, 25.6, 32.9, 24.0, 23.5, 27.5, 24.9, 22.4, 28.1, 26.2, 23.9, 23.9, 7.8] z_data = [75.5, 76.7, 78.7, 77.4, 79.5, 73.6, 80.0, 77.3, 46.9, 61.4, 40.3, 56.3, 45.3, 67.0, 80.4, 85.1, 82.5, 69.4, 74.8, 91.0, 79.6, 84.2, 92.5, 73.0, 91.4, 86.0, 76.0, 78.0, 82.3, 37.7, 71.5, 39.3, 60.8, 60.2, 34.3, 56.8, 57.0, 41.0, 51.3, 55.2, 42.1, 36.3, 39.2, 62.9, 77.3, 55.0, 44.0, 44.3, 40.8, 49.2, 72.0, 61.6, 83.6, 46.2, 24.7, 22.8, 17.2, 20.0, 25.9, 28.5, 19.2, 34.4, 29.7, 27.2, 22.0, 29.2, 21.1, 28.7, 23.4, 35.9, 37.8, 17.2, 17.9, 31.7, 39.0, 18.4, 23.3, 23.1, 14.0, 77.9, 72.3, 32.3, 82.0, 78.3, 82.7, 65.1, 54.2, 59.2, 70.7, 23.1, 22.0, 25.0, 29.0, 28.3, 27.8, 27.9, 27.1, 27.7, 48.6, 45.3, 45.0, 55.0, 63.0, 46.8, 55.4, 46.5, 32.1, 61.9, 50.2, 42.4, 34.5, 91.0, 85.3, 76.5, 94.8, 91.7, 76.2, 31.4, 66.9, 27.5, 28.0, 21.0, 14.2, 20.6, 21.6, 24.3, 18.8, 24.2, 13.8, 19.1, 35.8, 19.6, 25.0, 19.4, 19.3, 89.8, 88.1, 91.7, 84.5, 46.6, 88.9, 81.2, 81.0, 57.9, 77.9, 67.0, 31.8, 57.0, 60.5, 45.0, 57.6, 44.5, 36.2, 41.3, 45.7, 49.3, 41.9, 61.7, 32.2, 71.2, 45.0, 32.6, 31.0, 49.0, 29.8, 15.1, 38.5, 27.0, 38.5, 35.8, 4.05, 90.7, 68.7, 85.1, 90.9, 92.7, 94.4, 89.3, 92.2, 95.5, 91.7, 92.9, 91.5, 86.7, 64.0, 74.1, 50.0, 91.8, 87.3, 86.1, 40.8, 40.7, 89.0, 92.9, 93.7, 58.1, 50.5, 58.4, 53.6, 30.7, 43.3, 54.5, 51.0, 96.4, 85.2, 87.0, 
60.5, 49.7, 40.0, 51.0, 20.1, 21.2, 44.5, 41.1, 43.3, 38.1, 47.2, 42.7, 52.2, 38.4, 21.9, 56.0, 55.0, 48.9, 22.7, 46.5, 46.5, 22.4, 39.2, 22.0, 53.3, 40.6, 51.1, 26.6, 53.0, 75.0, 77.8, 56.0, 78.0, 74.8, 48.7, 50.6, 19.5, 38.8, 43.1, 61.9, 50.1, 49.8, 27.2, 28.3, 28.3, 34.2, 33.8, 26.6, 71.3, 62.5, 25.6, 40.4, 26.9, 30.1, 24.0, 25.9, 43.5, 80.0, 39.2, 88.0, 87.0, 44.5, 65.1, 34.6, 30.1, 33.4, 34.4, 35.9, 38.5, 42.1, 48.5, 56.0, 37.2, 36.5, 88.5, 73.0, 35.1, 34.7, 28.0, 29.6, 30.7, 30.6, 30.9, 29.9, 30.3, 29.8, 31.0, 88.3, 88.4, 29.8, 29.8, 27.9, 29.2, 73.2, 85.0, 91.4, 83.6, 42.5, 44.4, 57.2, 41.4, 32.5, 60.7, 41.2, 45.9, 37.4, 46.2, 44.2, 45.9, 43.3, 46.2, 46.5, 37.4, 66.4, 62.3, 75.8, 19.4, 15.4, 15.2, 73.8, 78.5, 38.0, 33.2, 32.5, 89.5, 92.2, 83.6, 77.0, 61.7, 88.8, 72.9, 65.6, 30.7, 78.2, 52.4, 46.3, 76.3, 52.1, 53.0, 44.8, 35.5, 40.0, 41.3, 41.1, 32.8, 45.6, 42.9, 43.3, 41.8, 86.8, 81.2, 73.0, 69.7, 65.3, 59.5, 78.5, 58.3, 86.1, 80.0, 74.7, 82.2, 90.8, 67.2, 69.2, 42.0, 56.5, 35.3, 44.7, 51.6, 37.5, 58.3, 45.0, 45.3, 18.2, 70.6, 51.3, 55.2, 68.5, 63.0, 65.6, 95.7, 41.8, 73.7, 67.2, 58.9, 51.4, 57.1, 37.0, 48.9, 61.9, 79.5, 61.6, 53.0, 45.9, 53.4, 55.1, 50.9, 60.2, 55.2, 50.4, 39.3, 44.4, 55.9, 52.4, 45.2, 45.4, 32.8, 37.4, 41.3, 25.9, 22.7, 82.7, 36.3, 45.4, 55.2, 73.7, 44.6, 81.9, 68.0, 35.6, 39.3, 80.9, 33.9, 26.9, 29.8, 31.1, 23.3, 49.9, 30.9, 89.1, 37.4, 64.5, 60.8, 53.4, 76.3, 80.4, 77.1, 56.8, 65.8, 75.8, 25.2, 82.7, 50.6, 69.9, 78.9, 78.9, 57.8, 90.2, 65.9, 57.0, 59.0, 36.3, 73.4, 71.8, 56.5, 77.0, 73.9, 66.9, 59.4, 51.3, 46.1, 52.0, 53.2, 50.8, 57.6, 57.7, 48.0, 59.7, 56.8, 55.4, 45.3, 74.8, 43.9, 46.0, 49.1, 43.8, 43.9, 43.9, 52.5, 38.5, 24.2, 60.8, 81.2, 32.6, 12.5, 15.8, 24.8, 67.9, 69.0, 80.0, 66.4, 86.2, 63.8, 71.0, 72.8, 86.0, 93.0, 52.0, 74.7, 68.3, 74.1, 63.6, 70.1, 80.6, 73.3, 78.9, 80.9, 61.3, 82.3, 69.2, 60.4, 82.7, 80.4, 75.6, 81.1, 54.7, 62.6, 56.7, 41.1, 78.7, 67.1, 70.3, 75.4, 89.0, 74.4, 56.0, 93.4, 63.8, 61.7, 81.5, 82.9, 76.5, 80.1, 73.1, 71.4, 66.6, 53.4, 73.8, 82.6, 60.2, 79.1, 86.8, 80.7, 93.4, 43.9, 85.3, 78.3, 32.8, 47.3, 48.3, 48.6, 67.5, 76.1, 82.2, 35.3, 58.2, 58.6, 62.3, 65.2, 40.6, 67.9, 62.5, 49.2, 28.2, 85.6, 47.6, 94.8, 82.9, 84.6, 94.7, 49.2, 69.5, 81.5, 86.9, 74.7, 68.9, 70.7, 81.9, 73.5, 70.8, 76.2, 82.5, 78.7, 75.3, 76.6, 80.5, 78.9, 65.3, 62.6, 90.4, 59.2, 63.6, 50.0, 72.6, 47.6, 48.2, 48.3, 47.7, 57.1, 51.5, 75.7, 48.8, 83.1, 73.4, 67.0, 64.8, 80.4, 88.9, 34.7, 40.2, 87.4, 78.2, 69.2, 84.8, 83.4, 37.0, 63.3, 62.9, 6.5, 76.2, 87.9, 67.9, 68.3, 77.0, 65.3, 57.1, 67.2, 63.5, 64.8, 64.0, 39.0, 82.5, 64.0, 69.1, 53.4, 76.6, 34.7, 54.0, 89.7, 84.0, 57.6, 66.3, 54.0, 77.6, 84.5, 75.2, 46.5, 77.2, 85.2, 74.0, 69.3, 61.7, 42.7, 82.7, 74.7, 57.2, 78.7, 36.7, 56.2, 47.8, 84.8, 42.6, 48.1, 53.6, 34.0, 38.5, 31.5, 28.9, 29.8, 27.6, 27.7, 27.1, 25.3, 32.6, 59.0, 41.1, 69.8, 49.4, 47.2, 50.2, 81.4, 70.0, 70.0, 78.2, 68.6, 76.7, 63.6, 76.5, 62.5, 66.4, 57.2, 87.6, 82.8, 53.9, 73.0, 78.5, 63.5, 65.3, 67.7, 94.5, 84.5, 77.9, 57.2, 47.8, 70.0, 99.4, 64.0, 100.0, 72.8, 79.5, 88.1, 85.4, 42.1, 59.2, 62.9, 71.5, 71.7, 75.2, 100.0, 52.3, 88.4, 99.8, 71.9, 90.0, 96.1, 69.0, 74.3, 99.7, 85.8, 70.4] y_data = [6.33, 0.73, 12.6, 1.01, 5.95, 2.89, 13.5, 11.5, 360.0, 52.4, 614.0, 477.0, 492.0, 1.51, 1.93, 11.2, 2.16, 4.51, 1.47, 53.0, 0.9, 2.17, 1.7, 5.2, 1.5, 3.6, 9.9, 12.2, 6.8, 35.3, 7.8, 26.7, 10.8, 6.9, 7.7, 15.3, 10.8, 22.9, 16.2, 19.4, 52.6, 25.1, 95.5, 5.6, 4.1, 23.1, 161.4, 72.3, 38.6, 22.0, 5.7, 12.1, 2.2, 77.9, 328.0, 349.1, 323.9, 516.1, 197.7, 172.7, 339.3, 
84.1, 194.6, 109.1, 221.1, 169.8, 553.0, 97.1, 294.9, 110.7, 58.4, 857.0, 532.5, 106.5, 51.3, 594.0, 246.4, 406.3, 727.5, 12.7, 11.2, 25.5, 6.5, 5.8, 4.1, 19.5, 40.5, 7.1, 5.1, 545.6, 421.9, 285.1, 317.8, 294.6, 339.6, 308.7, 312.8, 301.2, 26.3, 38.4, 39.3, 51.1, 3.3, 56.1, 33.3, 48.8, 150.9, 11.1, 16.5, 53.9, 1.2, 0.91, 1.7, 4.6, 0.53, 0.59, 11.6, 149.4, 1.3, 410.8, 418.3, 679.4, 731.3, 705.9, 660.1, 543.2, 871.6, 544.8, 1651.5, 1075.5, 226.1, 854.4, 471.2, 669.7, 709.8, 2.1, 1.6, 1.2, 1.1, 7.2, 0.79, 1.1, 1.5, 5.6, 2.9, 3.7, 58.5, 21.1, 9.6, 60.7, 25.8, 41.9, 62.6, 68.7, 56.6, 23.6, 118.8, 3.0, 202.1, 6.4, 40.4, 261.5, 139.4, 21.8, 107.2, 265.5, 116.6, 154.8, 97.5, 224.4, 550.9, 1.2, 12.6, 1.7, 0.97, 1.5, 0.96, 1.1, 0.29, 0.13, 0.62, 0.16, 0.52, 6.3, 30.4, 2.8, 46.0, 1.1, 0.47, 2.2, 63.0, 43.5, 0.46, 0.31, 0.05, 113.1, 18.2, 5.1, 11.7, 262.5, 43.4, 20.2, 30.9, 2.1, 1.6, 0.94, 4.9, 3.3, 91.1, 39.0, 133.3, 48.2, 69.1, 112.6, 112.1, 251.8, 33.8, 108.2, 45.2, 31.0, 82.9, 41.7, 36.7, 75.9, 49.2, 20.8, 80.3, 47.9, 67.7, 55.0, 37.6, 120.4, 50.6, 42.2, 13.8, 4.7, 4.2, 25.0, 0.56, 2.4, 104.7, 24.4, 300.4, 70.4, 40.5, 5.7, 30.9, 8.2, 149.7, 278.5, 288.7, 38.2, 123.0, 686.5, 1.7, 10.9, 939.1, 83.8, 538.5, 259.8, 485.8, 996.9, 10.0, 2.7, 50.4, 0.71, 0.65, 8.9, 1.4, 82.5, 80.8, 92.6, 98.7, 23.4, 34.2, 18.8, 55.4, 2.8, 39.5, 24.0, 0.82, 7.9, 75.9, 98.4, 108.5, 82.8, 122.0, 138.9, 157.9, 161.5, 174.0, 161.7, 163.7, 2.4, 0.75, 206.2, 187.8, 168.3, 157.6, 2.9, 0.96, 1.4, 1.03, 24.4, 80.8, 13.6, 66.0, 156.5, 1.8, 38.6, 43.0, 150.7, 81.4, 140.3, 18.2, 37.3, 31.6, 50.2, 86.1, 5.1, 24.6, 6.5, 150.8, 288.2, 481.6, 5.0, 5.9, 179.2, 248.6, 150.2, 1.3, 1.4, 4.6, 24.9, 4.8, 4.6, 0.98, 3.2, 211.1, 2.2, 14.6, 45.0, 3.4, 4.8, 5.3, 22.0, 53.4, 38.2, 29.5, 69.2, 179.9, 29.7, 18.6, 26.0, 28.8, 1.2, 0.9, 7.3, 1.6, 3.5, 22.7, 0.81, 15.1, 0.76, 0.46, 5.5, 0.29, 0.76, 2.4, 40.1, 173.8, 32.9, 69.7, 24.0, 7.7, 30.8, 15.0, 80.0, 68.4, 428.6, 16.3, 82.4, 42.3, 12.5, 2.7, 6.7, 0.23, 79.2, 1.3, 4.9, 12.8, 26.5, 11.9, 107.9, 27.5, 10.0, 1.1, 26.1, 54.0, 74.8, 26.3, 72.8, 71.3, 34.1, 80.5, 33.9, 201.8, 138.2, 35.8, 40.0, 65.4, 72.5, 96.6, 58.3, 31.0, 624.9, 1047.6, 0.41, 112.6, 66.5, 19.4, 1.1, 75.1, 0.68, 3.8, 28.2, 126.1, 0.91, 50.8, 62.4, 45.9, 137.7, 575.8, 78.1, 36.5, 0.41, 24.8, 4.8, 6.4, 8.8, 1.4, 1.4, 1.1, 7.3, 9.7, 6.2, 11.5, 1.1, 5.4, 5.1, 1.7, 77.8, 19.4, 0.53, 4.3, 29.5, 2.8, 225.9, 3.0, 14.7, 10.8, 3.3, 11.0, 6.8, 11.4, 73.3, 112.3, 21.6, 24.2, 34.0, 12.5, 25.2, 46.7, 12.7, 261.0, 2.5, 50.0, 0.3, 77.6, 126.0, 126.6, 87.0, 14.2, 77.0, 31.2, 19.4, 182.1, 19.4, 5.6, 63.6, 1316.0, 620.7, 744.8, 3.6, 5.9, 3.0, 3.4, 1.5, 9.6, 12.3, 6.3, 0.84, 50.0, 50.5, 1.24, 15.3, 1.6, 12.5, 9.0, 8.0, 9.6, 1.3, 1.3, 11.5, 3.7, 3.4, 13.9, 1.9, 1.4, 9.1, 1.4, 41.3, 31.7, 40.5, 191.4, 1.9, 15.7, 4.5, 3.5, 0.37, 2.0, 6.5, 1.0, 5.6, 5.7, 2.1, 1.7, 7.5, 1.3, 12.3, 6.8, 9.9, 52.9, 2.8, 2.5, 11.4, 7.1, 5.1, 4.6, 1.0, 53.4, 0.84, 1.08, 316.4, 63.6, 20.5, 50.3, 7.5, 0.91, 1.8, 123.8, 8.4, 15.3, 6.7, 4.4, 172.4, 3.6, 5.3, 74.7, 377.5, 0.91, 35.4, 2.0, 2.0, 1.9, 2.1, 17.5, 1.5, 2.1, 1.2, 2.5, 4.6, 4.1, 1.2, 1.5, 5.3, 1.9, 0.85, 1.3, 2.2, 3.7, 1.1, 0.75, 4.9, 10.4, 1.1, 21.8, 8.3, 60.5, 3.6, 85.4, 56.6, 88.5, 55.5, 24.5, 75.2, 4.3, 76.3, 0.93, 4.2, 7.65, 25.1, 3.5, 0.83, 434.8, 255.1, 0.77, 2.2, 7.2, 1.1, 1.3, 460.0, 6.5, 33.8, 9.6, 5.8, 0.85, 7.5, 7.8, 3.4, 17.9, 22.0, 4.4, 16.8, 4.7, 5.1, 9.7, 3.5, 16.5, 2.8, 80.2, 6.0, 139.2, 18.6, 1.27, 2.3, 18.5, 4.1, 5.4, 1.7, 7.9, 3.3, 6.1, 5.5, 1.4, 2.7, 14.5, 12.8, 60.3, 3.4, 2.2, 5.8, 6.3, 144.3, 39.6, 37.3, 
3.2, 92.4, 43.0, 16.3, 261.8, 102.0, 250.9, 321.2, 375.1, 447.3, 493.1, 601.4, 543.1, 544.5, 30.7, 46.8, 14.8, 18.3, 76.2, 18.1, 7.8, 0.09, 3.3, 2.8, 2.3, 19.1, 6.2, 1.9, 43.2, 11.8, 16.7, 8.1, 1.9, 16.9, 2.9, 1.6, 2.4, 6.8, 2.3, 1.97, 4.23, 10.47, 83.56, 81.33, 24.98, 5.58, 0.12, 1.16, 2.01, 32.6, 43.62, 193.7, 0.13, 13.56, 13.3, 37.79, 19.85, 13.25, 2.38, 375.7, 0.79, 15.84, 12.19, 2.94, 0.63, 5.68, 5.68, 12.54, 6.73, 0.66] def funcReturner(p, input): input = np.array(input) x = input[0] z = input[1] return 10**(p[0]*x + p[1]*z +p[2]) myModel = Model(funcReturner) myData = Data([x_data,z_data], y_data) myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, 1.75]) myOdr.set_job(fit_type=0) out = myOdr.run() result = out.beta print "Optimal coefficients: ", result I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 From robert.kern at gmail.com Mon Mar 5 05:58:46 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Mar 2012 10:58:46 +0000 Subject: [SciPy-User] Orthogonal distance regression in 3D In-Reply-To: References: Message-ID: On Mon, Mar 5, 2012 at 10:26, ???????? wrote: > 02 ????? 2012, 15:49 ?? Robert Kern : >> On Fri, Mar 2, 2012 at 06:02, ???????? wrote: >> > Hello! >> > I'm working with orthogonal distance regression (scipy.odr). >> > I try to fit the curve to a point cloud (3d), but it doesn work properly, it >> > returns wrong results >> > >> > For example I want to fit the simple curve y = a*x + b*z + c to some point >> > cloud (y_data, x_data, z_data) >> > >> > >> > ? ? def func(p, input): >> > >> > ? ? x,z = input >> > >> > ? ? x = np.array(x) >> > >> > ? ? z = np.array(z) >> > >> > ? ? return (p[0]*x + p[1]*z + p[2]) >> > >> > >> > ? ? initialGuess = [1,1,1] >> > >> > ? ? myModel = Model(func) >> > >> > ? ? myData = Data([x_data, z_daya], y_data) >> > >> > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) >> > >> > ? ? myOdr.set_job(fit_type=0) >> > >> > ? ? out = myOdr.run() >> > >> > ? ? print out.beta >> > >> > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results >> > are not even close to real, moreover it is very sensitive to initial Guess, >> > so it returns different result even if i change InitiaGuess from?[1,1,1] >> > to?[0.99,1,1] >> > >> > What do I do wrong? >> >> Can you provide a complete runnable example including some data? Note >> that if you do not specify any errors on your data, they are assumed >> to correspond to a standard deviation of 1 for all dimensions. If that >> is wildly different from the actual variance around the "true" >> surface, then it might lead the optimizer astray. >> >> -- >> Robert Kern >> > > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. > > Here is an example (Sorry for the huge array of data, but its important to show what happens on it) > > import numpy as np > from scipy.odr import * > from math import * [snip] > def funcReturner(p, input): > ? ? ? ?input = np.array(input) > ? ? ? ?x = input[0] > ? ? ? ?z = input[1] > ? ? ? ?return 10**(p[0]*x + p[1]*z +p[2]) Ah. 10**(p[0]*x+p[1]*z+p[2]) is a *lot* different from the linear problem you initially asked about. 
Setting the uncertainties accurately on all axes of your data is essential. Do you really know what they are? It's possible that you want to try fitting a plane to np.log10(y_data) instead. > myModel = Model(funcReturner) > myData = Data([x_data,z_data], y_data) > myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, ?1.75]) > myOdr.set_job(fit_type=0) > out = myOdr.run() > result = out.beta > > print "Optimal coefficients: ", result > > I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. > If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 For such a nonlinear problem, finding reasonable initial guesses is useful. There is also a maximum iteration limit defaulting to a fairly low 50. Check out.stopreason to see if it actually converged or just ran into the iteration limit. You can keep calling myOdr.restart() until it converges. If I start with beta0=[1,1,1], it converges somewhere between 300 and 400 iterations. -- Robert Kern From pav at iki.fi Mon Mar 5 07:35:16 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 05 Mar 2012 13:35:16 +0100 Subject: [SciPy-User] Test failures with SciPy-0.10.1 and Mac OS X Lion In-Reply-To: References: Message-ID: 05.03.2012 01:05, Adam Mercer kirjoitti: [clip] > I can't recall seeing these failures before, this has been compiled > using the compilers from Xcode-4.3? > > Are these anything to worry about? The failures seem to indicate that (single-precision) linear algebra routines are not functioning correctly. These are probably a sign of some Veclib / Fortran ABI mismatch related to Fortran functions -- similar problems have been seen e.g. when mixing g77 and gfortran compilers (also on other platforms than OSX), and on OSX adding Veclib to the mix seems make the situation even more finicky. Could you file a ticket: http://projects.scipy.org/scipy/ Currently, we use the Veclib's C interface for complex-valued functions, but possibly we should do this for all Fortran functions to work around these problems. -- Pauli Virtanen From ramercer at gmail.com Mon Mar 5 10:30:22 2012 From: ramercer at gmail.com (Adam Mercer) Date: Mon, 5 Mar 2012 09:30:22 -0600 Subject: [SciPy-User] Test failures with SciPy-0.10.1 and Mac OS X Lion In-Reply-To: References: Message-ID: On Mon, Mar 5, 2012 at 06:35, Pauli Virtanen wrote: > The failures seem to indicate that (single-precision) linear algebra > routines are not functioning correctly. These are probably a sign of > some Veclib / Fortran ABI mismatch related to Fortran functions -- > similar problems have been seen e.g. when mixing g77 and gfortran > compilers (also on other platforms than OSX), and on OSX adding Veclib > to the mix seems make the situation even more finicky. > > Could you file a ticket: > http://projects.scipy.org/scipy/ > > Currently, we use the Veclib's C interface for complex-valued functions, > but possibly we should do this for all Fortran functions to work around > these problems. Filed: In my original email I said that the compilers from Xcode-4.3 had been used, this is not the case. Everything should have been build using the compilers from gcc-4.4.6. Xcode-4.3 is installed on the machine but it's compilers aren't used for building scipy itself. 
Cheers Adam From timmichelsen at gmx-topmail.de Mon Mar 5 10:40:34 2012 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Mon, 05 Mar 2012 16:40:34 +0100 Subject: [SciPy-User] OT: Data Analysis in Python In-Reply-To: References: Message-ID: > I will definitely post materials from my pandas tutorial (which are at > the moment a bit scattered, becoming less so over the next couple > days) and my talk at PyCon. I'm also due to cut some screencasts of > recent demos. > > Some videos from the PyData workshop will also be published since we > had a recording crew. Good news. Your effort is really appreciated. Thanks. From dcday137 at gmail.com Mon Mar 5 13:27:45 2012 From: dcday137 at gmail.com (Collin Day) Date: Mon, 5 Mar 2012 11:27:45 -0700 Subject: [SciPy-User] How to feed np.mgrid a variable number of 'arguments' Message-ID: Hi all, I am guessing there is an easy way to do this, but I am just not seeing it. I have a function where I can have a variable number of input dimensions. In the function, I need to use np.mgrid to generate the data I need. How would I create a line of code that would feed np.mgrid a variable number of inputs? For example: 3d, with 17 nodes a = np.mgrid[0:17,0:17,0:17] 4d a = np.mgrid[0:17,0:17,0:17,0:17] Is there a way I can do nodes=17 inDims = a_number a = np.mgrid[0:17,0:17...a_number of times] easily? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Mon Mar 5 13:36:25 2012 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Mon, 5 Mar 2012 10:36:25 -0800 Subject: [SciPy-User] How to feed np.mgrid a variable number of 'arguments' In-Reply-To: References: Message-ID: On Mon, Mar 5, 2012 at 10:27 AM, Collin Day wrote: > Is there a way I can do > > a = np.mgrid[0:17,0:17...a_number of times] You want the slice() function. For example: slices = [] for i in xrange(num_times): slices.append(slice(0, 17)) a = np.mgrid[slices] -Chris From robert.kern at gmail.com Mon Mar 5 13:36:53 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Mar 2012 18:36:53 +0000 Subject: [SciPy-User] How to feed np.mgrid a variable number of 'arguments' In-Reply-To: References: Message-ID: On Mon, Mar 5, 2012 at 18:27, Collin Day wrote: > Hi all, > > I am guessing there is an easy way to do this, but I am just not seeing it. > I have a function where I can have a variable number of input dimensions. > In the function, I need to use np.mgrid to generate the data I need.? How > would I create a line of code that would feed np.mgrid a variable number of > inputs?? For example: > > 3d, with 17 nodes > > a = np.mgrid[0:17,0:17,0:17] > > 4d > > a = np.mgrid[0:17,0:17,0:17,0:17] > > Is there a way I can do > > nodes=17 > inDims = a_number > > a = np.mgrid[0:17,0:17...a_number of times] > > easily? ix = (slice(0, nodes),) * inDims a = np.mgrid[idx] -- Robert Kern From draft2008 at bk.ru Tue Mar 6 03:22:15 2012 From: draft2008 at bk.ru (=?UTF-8?B?0JLQu9Cw0LTQuNC80LjRgA==?=) Date: Tue, 06 Mar 2012 12:22:15 +0400 Subject: [SciPy-User] =?utf-8?q?Orthogonal_distance_regression_in_3D?= In-Reply-To: References: Message-ID: 05 ????? 2012, 14:59 ?? Robert Kern : > On Mon, Mar 5, 2012 at 10:26, ???????? wrote: > > 02 ????? 2012, 15:49 ?? Robert Kern : > >> On Fri, Mar 2, 2012 at 06:02, ???????? wrote: > >> > Hello! > >> > I'm working with orthogonal distance regression (scipy.odr). 
> >> > I try to fit the curve to a point cloud (3d), but it doesn work properly, it > >> > returns wrong results > >> > > >> > For example I want to fit the simple curve y = a*x + b*z + c to some point > >> > cloud (y_data, x_data, z_data) > >> > > >> > > >> > ? ? def func(p, input): > >> > > >> > ? ? x,z = input > >> > > >> > ? ? x = np.array(x) > >> > > >> > ? ? z = np.array(z) > >> > > >> > ? ? return (p[0]*x + p[1]*z + p[2]) > >> > > >> > > >> > ? ? initialGuess = [1,1,1] > >> > > >> > ? ? myModel = Model(func) > >> > > >> > ? ? myData = Data([x_data, z_daya], y_data) > >> > > >> > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) > >> > > >> > ? ? myOdr.set_job(fit_type=0) > >> > > >> > ? ? out = myOdr.run() > >> > > >> > ? ? print out.beta > >> > > >> > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results > >> > are not even close to real, moreover it is very sensitive to initial Guess, > >> > so it returns different result even if i change InitiaGuess from?[1,1,1] > >> > to?[0.99,1,1] > >> > > >> > What do I do wrong? > >> > >> Can you provide a complete runnable example including some data? Note > >> that if you do not specify any errors on your data, they are assumed > >> to correspond to a standard deviation of 1 for all dimensions. If that > >> is wildly different from the actual variance around the "true" > >> surface, then it might lead the optimizer astray. > >> > >> -- > >> Robert Kern > >> > > > > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. > > > > Here is an example (Sorry for the huge array of data, but its important to show what happens on it) > > > > import numpy as np > > from scipy.odr import * > > from math import * > > [snip] > > > def funcReturner(p, input): > > ? ? ? ?input = np.array(input) > > ? ? ? ?x = input[0] > > ? ? ? ?z = input[1] > > ? ? ? ?return 10**(p[0]*x + p[1]*z +p[2]) > > Ah. 10**(p[0]*x+p[1]*z+p[2]) is a *lot* different from the linear > problem you initially asked about. Setting the uncertainties > accurately on all axes of your data is essential. Do you really know > what they are? It's possible that you want to try fitting a plane to > np.log10(y_data) instead. > > > myModel = Model(funcReturner) > > myData = Data([x_data,z_data], y_data) > > myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, ?1.75]) > > myOdr.set_job(fit_type=0) > > out = myOdr.run() > > result = out.beta > > > > print "Optimal coefficients: ", result > > > > I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. > > If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 > > For such a nonlinear problem, finding reasonable initial guesses is > useful. There is also a maximum iteration limit defaulting to a fairly > low 50. Check out.stopreason to see if it actually converged or just > ran into the iteration limit. You can keep calling myOdr.restart() > until it converges. If I start with beta0=[1,1,1], it converges > somewhere between 300 and 400 iterations. > > -- > Robert Kern > Yeah, increasing the number of iterations (maxit parameter) makes the results slightly more accurate, but not better. I mean if I attain that the stop reason is "sum square convergence", results are even worse. 
But, I tryed to fit converted function, like you recommended - np.log10(y_data). And it gave me the proper results. Why that happens and is it possible to achieve these results without convertion? I could use converted function further, but the problem is that I have the whole list of different functions to fit. And I'd like to create universal fitter for all of them. From evanmason at gmail.com Tue Mar 6 04:24:51 2012 From: evanmason at gmail.com (Evan Mason) Date: Tue, 6 Mar 2012 09:24:51 +0000 (UTC) Subject: [SciPy-User] scipy.spatial, dsearchn? Message-ID: Hi, I am wondering if there is any way in scipy.spatial to get the information given by matlab's dsearchn, i.e.: http://www.mathworks.es/help/techdoc/ref/dsearchn.html k = dsearchn(X,T,XI) returns the indices k of the closest points in X for each point in XI. X is an m-by-n matrix representing m points in n-dimensional space. XI is a p-by-n matrix, representing p points in n-dimensional space. T is a numt-by-n+1 matrix, a triangulation of the data X generated by delaunayn. The output k is a column vector of length p. Many thanks, Evan From macdonald at maths.ox.ac.uk Tue Mar 6 04:33:23 2012 From: macdonald at maths.ox.ac.uk (Colin Macdonald) Date: Tue, 06 Mar 2012 09:33:23 +0000 Subject: [SciPy-User] [Job] Undergrad summer research project at Oxford Message-ID: <4F55D9E3.40007@maths.ox.ac.uk> Hi, I'm looking for a talented undergrad student to work on a 10-week summer research project at Oxford. We're developing Scipy/Numpy code for solving differential equations on curved surfaces (to be released as free/open source software). A background in both Python and numerical analysis (finite differences mostly) is recommended. More info at: http://www.maths.ox.ac.uk/groups/occam/vacancies/oss2012 (also lists other projects). Candidates would need to be in their final two years or between ugrad and MSc). Deadline is March 21st, 2012. thanks, Colin -- Colin Macdonald University Lecturer in Numerical Methodologies Tutorial Fellow at Oriel College University of Oxford From robert.kern at gmail.com Tue Mar 6 06:14:48 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 6 Mar 2012 11:14:48 +0000 Subject: [SciPy-User] Orthogonal distance regression in 3D In-Reply-To: References: Message-ID: On Tue, Mar 6, 2012 at 08:22, ???????? wrote: > > 05 ????? 2012, 14:59 ?? Robert Kern : >> On Mon, Mar 5, 2012 at 10:26, ???????? wrote: >> > 02 ????? 2012, 15:49 ?? Robert Kern : >> >> On Fri, Mar 2, 2012 at 06:02, ???????? wrote: >> >> > Hello! >> >> > I'm working with orthogonal distance regression (scipy.odr). >> >> > I try to fit the curve to a point cloud (3d), but it doesn work properly, it >> >> > returns wrong results >> >> > >> >> > For example I want to fit the simple curve y = a*x + b*z + c to some point >> >> > cloud (y_data, x_data, z_data) >> >> > >> >> > >> >> > ? ? def func(p, input): >> >> > >> >> > ? ? x,z = input >> >> > >> >> > ? ? x = np.array(x) >> >> > >> >> > ? ? z = np.array(z) >> >> > >> >> > ? ? return (p[0]*x + p[1]*z + p[2]) >> >> > >> >> > >> >> > ? ? initialGuess = [1,1,1] >> >> > >> >> > ? ? myModel = Model(func) >> >> > >> >> > ? ? myData = Data([x_data, z_daya], y_data) >> >> > >> >> > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) >> >> > >> >> > ? ? myOdr.set_job(fit_type=0) >> >> > >> >> > ? ? out = myOdr.run() >> >> > >> >> > ? ? 
print out.beta >> >> > >> >> > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results >> >> > are not even close to real, moreover it is very sensitive to initial Guess, >> >> > so it returns different result even if i change InitiaGuess from?[1,1,1] >> >> > to?[0.99,1,1] >> >> > >> >> > What do I do wrong? >> >> >> >> Can you provide a complete runnable example including some data? Note >> >> that if you do not specify any errors on your data, they are assumed >> >> to correspond to a standard deviation of 1 for all dimensions. If that >> >> is wildly different from the actual variance around the "true" >> >> surface, then it might lead the optimizer astray. >> >> >> >> -- >> >> Robert Kern >> >> >> > >> > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. >> > >> > Here is an example (Sorry for the huge array of data, but its important to show what happens on it) >> > >> > import numpy as np >> > from scipy.odr import * >> > from math import * >> >> [snip] >> >> > def funcReturner(p, input): >> > ? ? ? ?input = np.array(input) >> > ? ? ? ?x = input[0] >> > ? ? ? ?z = input[1] >> > ? ? ? ?return 10**(p[0]*x + p[1]*z +p[2]) >> >> Ah. 10**(p[0]*x+p[1]*z+p[2]) is a *lot* different from the linear >> problem you initially asked about. Setting the uncertainties >> accurately on all axes of your data is essential. Do you really know >> what they are? It's possible that you want to try fitting a plane to >> np.log10(y_data) instead. >> >> > myModel = Model(funcReturner) >> > myData = Data([x_data,z_data], y_data) >> > myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, ?1.75]) >> > myOdr.set_job(fit_type=0) >> > out = myOdr.run() >> > result = out.beta >> > >> > print "Optimal coefficients: ", result >> > >> > I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. >> > If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 >> >> For such a nonlinear problem, finding reasonable initial guesses is >> useful. There is also a maximum iteration limit defaulting to a fairly >> low 50. Check out.stopreason to see if it actually converged or just >> ran into the iteration limit. You can keep calling myOdr.restart() >> until it converges. If I start with beta0=[1,1,1], it converges >> somewhere between 300 and 400 iterations. >> >> -- >> Robert Kern >> > > Yeah, increasing the number of iterations (maxit parameter) makes the results slightly more accurate, but not better. I mean if I attain that the stop reason is "sum square convergence", results are even worse. But, I tryed to fit converted function, like you recommended - np.log10(y_data). And it gave me the proper results. Why that happens and is it possible to achieve these results without convertion? As I mentioned before, in a nonlinear case, you really need to have good estimates of the uncertainties on each point. Since your Y variable varies over several orders of magnitude, I really doubt that the uncertainties are the same for each point. It's more likely that you want to assign a relative 10% (or whatever) uncertainty to each point rather than the same absolute uncertainty to each. 
I don't think that you have really measured both 1651.5+-1.0 and 0.05+-1.0, but that's what you are implicitly saying when you don't provide explicit estimates of the uncertainties. One problem that you are going to run into is that least-squares isn't especially appropriate for your model. Your Y output is strictly positive, but it goes very close to 0.0. The error model that least-squares fits is that each measurement follows a Gaussian distribution about the true point, and the Gaussian has infinite support (specifically, it crosses that 0 line, and you know a priori that you will never observe a negative value). For the observations ~1000.0, that doesn't matter much, but it severely distorts the problem at 0.05. Your true error distribution is probably something like log-normal; the errors below the curve are small but the errors above can be large. Transforming strictly-positive data with a logarithm is a standard technique. In a sense, the "log-transformed" model is the "true" model to be using, at least if you want to use least-squares. Looking at the residuals of both original and the log10-transformed problem (try plot(x_data, out.eps, 'k.'), plot(x_data, out.delta[0], 'k.'), etc.), it looks like the log10-transformed data does fit fairly well; the residuals mostly follow a normal distribution of the same size across the dataset. That's good! But it also means that if you transform these residuals back to the original space, they don't follow a normal distribution anymore, and using least-squares to fit the problem isn't appropriate anymore. > I could use converted function further, but the problem is that I have the whole list of different functions to fit. And I'd like to create universal fitter for all of them. Well, you will have to go through those functions (and their implicit error models) and determine if least-squares is truly appropriate for them. Least-squares is not appropriate for all models. However, log-transforming the strictly-positive variables in a model quite frequently is all you need to do to turn a least-squares-inappropriate model into a least-squares-appropriate one. You can write your functions in that log-transformed form and write a little adapter to transform the data (which is given to you in the original form). -- Robert Kern From draft2008 at bk.ru Wed Mar 7 01:57:46 2012 From: draft2008 at bk.ru (=?UTF-8?B?0JLQu9Cw0LTQuNC80LjRgA==?=) Date: Wed, 07 Mar 2012 10:57:46 +0400 Subject: [SciPy-User] =?utf-8?q?Orthogonal_distance_regression_in_3D?= In-Reply-To: References: Message-ID: 06 ????? 2012, 15:15 ?? Robert Kern : > On Tue, Mar 6, 2012 at 08:22, ???????? wrote: > > > > 05 ????? 2012, 14:59 ?? Robert Kern : > >> On Mon, Mar 5, 2012 at 10:26, ???????? wrote: > >> > 02 ????? 2012, 15:49 ?? Robert Kern : > >> >> On Fri, Mar 2, 2012 at 06:02, ???????? wrote: > >> >> > Hello! > >> >> > I'm working with orthogonal distance regression (scipy.odr). > >> >> > I try to fit the curve to a point cloud (3d), but it doesn work properly, it > >> >> > returns wrong results > >> >> > > >> >> > For example I want to fit the simple curve y = a*x + b*z + c to some point > >> >> > cloud (y_data, x_data, z_data) > >> >> > > >> >> > > >> >> > ? ? def func(p, input): > >> >> > > >> >> > ? ? x,z = input > >> >> > > >> >> > ? ? x = np.array(x) > >> >> > > >> >> > ? ? z = np.array(z) > >> >> > > >> >> > ? ? return (p[0]*x + p[1]*z + p[2]) > >> >> > > >> >> > > >> >> > ? ? initialGuess = [1,1,1] > >> >> > > >> >> > ? ? myModel = Model(func) > >> >> > > >> >> > ? ? 
myData = Data([x_data, z_daya], y_data) > >> >> > > >> >> > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) > >> >> > > >> >> > ? ? myOdr.set_job(fit_type=0) > >> >> > > >> >> > ? ? out = myOdr.run() > >> >> > > >> >> > ? ? print out.beta > >> >> > > >> >> > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results > >> >> > are not even close to real, moreover it is very sensitive to initial Guess, > >> >> > so it returns different result even if i change InitiaGuess from?[1,1,1] > >> >> > to?[0.99,1,1] > >> >> > > >> >> > What do I do wrong? > >> >> > >> >> Can you provide a complete runnable example including some data? Note > >> >> that if you do not specify any errors on your data, they are assumed > >> >> to correspond to a standard deviation of 1 for all dimensions. If that > >> >> is wildly different from the actual variance around the "true" > >> >> surface, then it might lead the optimizer astray. > >> >> > >> >> -- > >> >> Robert Kern > >> >> > >> > > >> > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. > >> > > >> > Here is an example (Sorry for the huge array of data, but its important to show what happens on it) > >> > > >> > import numpy as np > >> > from scipy.odr import * > >> > from math import * > >> > >> [snip] > >> > >> > def funcReturner(p, input): > >> > ? ? ? ?input = np.array(input) > >> > ? ? ? ?x = input[0] > >> > ? ? ? ?z = input[1] > >> > ? ? ? ?return 10**(p[0]*x + p[1]*z +p[2]) > >> > >> Ah. 10**(p[0]*x+p[1]*z+p[2]) is a *lot* different from the linear > >> problem you initially asked about. Setting the uncertainties > >> accurately on all axes of your data is essential. Do you really know > >> what they are? It's possible that you want to try fitting a plane to > >> np.log10(y_data) instead. > >> > >> > myModel = Model(funcReturner) > >> > myData = Data([x_data,z_data], y_data) > >> > myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, ?1.75]) > >> > myOdr.set_job(fit_type=0) > >> > out = myOdr.run() > >> > result = out.beta > >> > > >> > print "Optimal coefficients: ", result > >> > > >> > I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. > >> > If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 > >> > >> For such a nonlinear problem, finding reasonable initial guesses is > >> useful. There is also a maximum iteration limit defaulting to a fairly > >> low 50. Check out.stopreason to see if it actually converged or just > >> ran into the iteration limit. You can keep calling myOdr.restart() > >> until it converges. If I start with beta0=[1,1,1], it converges > >> somewhere between 300 and 400 iterations. > >> > >> -- > >> Robert Kern > >> > > > > Yeah, increasing the number of iterations (maxit parameter) makes the results slightly more accurate, but not better. I mean if I attain that the stop reason is "sum square convergence", results are even worse. But, I tryed to fit converted function, like you recommended - np.log10(y_data). And it gave me the proper results. Why that happens and is it possible to achieve these results without convertion? > > As I mentioned before, in a nonlinear case, you really need to have > good estimates of the uncertainties on each point. 
Since your Y > variable varies over several orders of magnitude, I really doubt that > the uncertainties are the same for each point. It's more likely that > you want to assign a relative 10% (or whatever) uncertainty to each > point rather than the same absolute uncertainty to each. I don't think > that you have really measured both 1651.5+-1.0 and 0.05+-1.0, but > that's what you are implicitly saying when you don't provide explicit > estimates of the uncertainties. > > One problem that you are going to run into is that least-squares isn't > especially appropriate for your model. Your Y output is strictly > positive, but it goes very close to 0.0. The error model that > least-squares fits is that each measurement follows a Gaussian > distribution about the true point, and the Gaussian has infinite > support (specifically, it crosses that 0 line, and you know a priori > that you will never observe a negative value). For the observations > ~1000.0, that doesn't matter much, but it severely distorts the > problem at 0.05. Your true error distribution is probably something > like log-normal; the errors below the curve are small but the errors > above can be large. Transforming strictly-positive data with a > logarithm is a standard technique. In a sense, the "log-transformed" > model is the "true" model to be using, at least if you want to use > least-squares. Looking at the residuals of both original and the > log10-transformed problem (try plot(x_data, out.eps, 'k.'), > plot(x_data, out.delta[0], 'k.'), etc.), it looks like the > log10-transformed data does fit fairly well; the residuals mostly > follow a normal distribution of the same size across the dataset. > That's good! But it also means that if you transform these residuals > back to the original space, they don't follow a normal distribution > anymore, and using least-squares to fit the problem isn't appropriate > anymore. > > > I could use converted function further, but the problem is that I have the whole list of different functions to fit. And I'd like to create universal fitter for all of them. > > Well, you will have to go through those functions (and their implicit > error models) and determine if least-squares is truly appropriate for > them. Least-squares is not appropriate for all models. However, > log-transforming the strictly-positive variables in a model quite > frequently is all you need to do to turn a least-squares-inappropriate > model into a least-squares-appropriate one. You can write your > functions in that log-transformed form and write a little adapter to > transform the data (which is given to you in the original form). > > -- > Robert Kern > Robert, thank you very much for detailed answer, now I see what is the problem. I don't really have any uncertainties, and I guess it would be hard to compute them from the data. Moreover, this data is just the sample, and I will have a different types of data in real. Transformation actually helps just for the the couple of functions, for instance, 10**(A*x + B*z +C) and C*(A)**X*(B)**Z functions fit just perfectly, but doesn't work for any others (like A*lg(X) + B*Z + C, C/(1 + A*X + B*Z)). I transform the function like this (conditionally): y_data = np.log10(y_data) function = np.log10(function) , is that correct? And what do you mean by little adapter to transform the data? By the way, the problem appears only in 3d mode. 
When I use the same logarithmic data in 2d mode (no Z axis), it works perfectly for all functions, and no log10 transformation needed (this transformation distort the results, make them worse in that case). Do you know any other fitting methods, available in python? From sturla at molden.no Wed Mar 7 05:28:40 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 07 Mar 2012 11:28:40 +0100 Subject: [SciPy-User] scipy.spatial, dsearchn? In-Reply-To: References: Message-ID: <4F573858.2030100@molden.no> On 06.03.2012 10:24, Evan Mason wrote: > k = dsearchn(X,T,XI) returns the indices k of the closest points in X for each > point in XI. I have never used Matlab's dsearchn, but scipy.spatial.cKDTree can search rather quickly for nearest-neighbour points (time complexity O(log n) for each search). Sturla From evanmason at gmail.com Wed Mar 7 07:21:59 2012 From: evanmason at gmail.com (Evan Mason) Date: Wed, 7 Mar 2012 12:21:59 +0000 (UTC) Subject: [SciPy-User] scipy.spatial, dsearchn? References: <4F573858.2030100@molden.no> Message-ID: > I have never used Matlab's dsearchn, but scipy.spatial.cKDTree can > search rather quickly for nearest-neighbour points (time complexity > O(log n) for each search). Thanks, yes, and here I found pretty much what I needed using scipy.spatial.cKDTree: http://permalink.gmane.org/gmane.comp.python.scientific.user/19610 Evan From robert.kern at gmail.com Wed Mar 7 07:30:48 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Mar 2012 12:30:48 +0000 Subject: [SciPy-User] Orthogonal distance regression in 3D In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 06:57, ???????? wrote: > > > > 06 ????? 2012, 15:15 ?? Robert Kern : >> On Tue, Mar 6, 2012 at 08:22, ???????? wrote: >> > >> > 05 ????? 2012, 14:59 ?? Robert Kern : >> >> On Mon, Mar 5, 2012 at 10:26, ???????? wrote: >> >> > 02 ????? 2012, 15:49 ?? Robert Kern : >> >> >> On Fri, Mar 2, 2012 at 06:02, ???????? wrote: >> >> >> > Hello! >> >> >> > I'm working with orthogonal distance regression (scipy.odr). >> >> >> > I try to fit the curve to a point cloud (3d), but it doesn work properly, it >> >> >> > returns wrong results >> >> >> > >> >> >> > For example I want to fit the simple curve y = a*x + b*z + c to some point >> >> >> > cloud (y_data, x_data, z_data) >> >> >> > >> >> >> > >> >> >> > ? ? def func(p, input): >> >> >> > >> >> >> > ? ? x,z = input >> >> >> > >> >> >> > ? ? x = np.array(x) >> >> >> > >> >> >> > ? ? z = np.array(z) >> >> >> > >> >> >> > ? ? return (p[0]*x + p[1]*z + p[2]) >> >> >> > >> >> >> > >> >> >> > ? ? initialGuess = [1,1,1] >> >> >> > >> >> >> > ? ? myModel = Model(func) >> >> >> > >> >> >> > ? ? myData = Data([x_data, z_daya], y_data) >> >> >> > >> >> >> > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) >> >> >> > >> >> >> > ? ? myOdr.set_job(fit_type=0) >> >> >> > >> >> >> > ? ? out = myOdr.run() >> >> >> > >> >> >> > ? ? print out.beta >> >> >> > >> >> >> > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results >> >> >> > are not even close to real, moreover it is very sensitive to initial Guess, >> >> >> > so it returns different result even if i change InitiaGuess from?[1,1,1] >> >> >> > to?[0.99,1,1] >> >> >> > >> >> >> > What do I do wrong? >> >> >> >> >> >> Can you provide a complete runnable example including some data? Note >> >> >> that if you do not specify any errors on your data, they are assumed >> >> >> to correspond to a standard deviation of 1 for all dimensions. 
If that >> >> >> is wildly different from the actual variance around the "true" >> >> >> surface, then it might lead the optimizer astray. >> >> >> >> >> >> -- >> >> >> Robert Kern >> >> >> >> >> > >> >> > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. >> >> > >> >> > Here is an example (Sorry for the huge array of data, but its important to show what happens on it) >> >> > >> >> > import numpy as np >> >> > from scipy.odr import * >> >> > from math import * >> >> >> >> [snip] >> >> >> >> > def funcReturner(p, input): >> >> > ? ? ? ?input = np.array(input) >> >> > ? ? ? ?x = input[0] >> >> > ? ? ? ?z = input[1] >> >> > ? ? ? ?return 10**(p[0]*x + p[1]*z +p[2]) >> >> >> >> Ah. 10**(p[0]*x+p[1]*z+p[2]) is a *lot* different from the linear >> >> problem you initially asked about. Setting the uncertainties >> >> accurately on all axes of your data is essential. Do you really know >> >> what they are? It's possible that you want to try fitting a plane to >> >> np.log10(y_data) instead. >> >> >> >> > myModel = Model(funcReturner) >> >> > myData = Data([x_data,z_data], y_data) >> >> > myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, ?1.75]) >> >> > myOdr.set_job(fit_type=0) >> >> > out = myOdr.run() >> >> > result = out.beta >> >> > >> >> > print "Optimal coefficients: ", result >> >> > >> >> > I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. >> >> > If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 >> >> >> >> For such a nonlinear problem, finding reasonable initial guesses is >> >> useful. There is also a maximum iteration limit defaulting to a fairly >> >> low 50. Check out.stopreason to see if it actually converged or just >> >> ran into the iteration limit. You can keep calling myOdr.restart() >> >> until it converges. If I start with beta0=[1,1,1], it converges >> >> somewhere between 300 and 400 iterations. >> >> >> >> -- >> >> Robert Kern >> >> >> > >> > Yeah, increasing the number of iterations (maxit parameter) makes the results slightly more accurate, but not better. I mean if I attain that the stop reason is "sum square convergence", results are even worse. But, I tryed to fit converted function, like you recommended - np.log10(y_data). And it gave me the proper results. Why that happens and is it possible to achieve these results without convertion? >> >> As I mentioned before, in a nonlinear case, you really need to have >> good estimates of the uncertainties on each point. Since your Y >> variable varies over several orders of magnitude, I really doubt that >> the uncertainties are the same for each point. It's more likely that >> you want to assign a relative 10% (or whatever) uncertainty to each >> point rather than the same absolute uncertainty to each. I don't think >> that you have really measured both 1651.5+-1.0 and 0.05+-1.0, but >> that's what you are implicitly saying when you don't provide explicit >> estimates of the uncertainties. >> >> One problem that you are going to run into is that least-squares isn't >> especially appropriate for your model. Your Y output is strictly >> positive, but it goes very close to 0.0. 
The error model that >> least-squares fits is that each measurement follows a Gaussian >> distribution about the true point, and the Gaussian has infinite >> support (specifically, it crosses that 0 line, and you know a priori >> that you will never observe a negative value). For the observations >> ~1000.0, that doesn't matter much, but it severely distorts the >> problem at 0.05. Your true error distribution is probably something >> like log-normal; the errors below the curve are small but the errors >> above can be large. Transforming strictly-positive data with a >> logarithm is a standard technique. In a sense, the "log-transformed" >> model is the "true" model to be using, at least if you want to use >> least-squares. Looking at the residuals of both original and the >> log10-transformed problem (try plot(x_data, out.eps, 'k.'), >> plot(x_data, out.delta[0], 'k.'), etc.), it looks like the >> log10-transformed data does fit fairly well; the residuals mostly >> follow a normal distribution of the same size across the dataset. >> That's good! But it also means that if you transform these residuals >> back to the original space, they don't follow a normal distribution >> anymore, and using least-squares to fit the problem isn't appropriate >> anymore. >> >> > I could use converted function further, but the problem is that I have the whole list of different functions to fit. And I'd like to create universal fitter for all of them. >> >> Well, you will have to go through those functions (and their implicit >> error models) and determine if least-squares is truly appropriate for >> them. Least-squares is not appropriate for all models. However, >> log-transforming the strictly-positive variables in a model quite >> frequently is all you need to do to turn a least-squares-inappropriate >> model into a least-squares-appropriate one. You can write your >> functions in that log-transformed form and write a little adapter to >> transform the data (which is given to you in the original form). >> >> -- >> Robert Kern >> > > Robert, thank you very much for detailed answer, now I see what is the problem. I don't really have any uncertainties, and I guess it would be hard to compute them from the data. Moreover, this data is just the sample, and I will have a different types of data in real. Transformation actually helps just for the the couple of functions, for instance, 10**(A*x + B*z +C) and C*(A)**X*(B)**Z functions fit just perfectly, but doesn't work for any others (like A*lg(X) + B*Z + C, C/(1 + A*X + B*Z)). Right. That's why I said that you have to go through all of the functions to see if it's applicable or not. > I transform the function like this (conditionally): > y_data = np.log10(y_data) > function = np.log10(function) > , is that correct? Yes. > And what do you mean by little adapter to transform the data? I just meant the "y_data = np.log10(y_data)" part. > By the way, the problem appears only in 3d mode. When I use the same logarithmic data in 2d mode (no Z axis), it works perfectly for all functions, and no log10 transformation needed (this transformation distort the results, make them worse in that case). Without knowing how you are getting the data to make that determination, I don't have much to say about it. The problem of inaccurate uncertainties will probably get larger as you increase the dimension of the inputs. Since you don't know them, it's probably not a good idea to keep trying to do ODR instead of ordinary least squares. 
Use myOdr.set_job(fit_type=2) to use OLS instead. You still have a problem with the unknown uncertainties on the Y output, but that narrows some of the problems down. > Do you know any other fitting methods, available in python? None that free you from this kind of thoughtful analysis. Fitting functions isn't a black box, I'm afraid. You need to consider your models and your data before fitting, and you need to look at the residuals afterwards to check that the results make sense. Then you improve your model. Least-squares is not suitable for all models. You may need to roll your own error model and use the generic minimization routines in scipy.optimize. If you can formulate your models as generalized linear models (which you can for a couple of them, but not all), you could look at the functionality for this in the statsmodels package. -- Robert Kern From josef.pktd at gmail.com Wed Mar 7 10:20:06 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 7 Mar 2012 10:20:06 -0500 Subject: [SciPy-User] Orthogonal distance regression in 3D In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 7:30 AM, Robert Kern wrote: > On Wed, Mar 7, 2012 at 06:57, ???????? wrote: >> >> >> >> 06 ????? 2012, 15:15 ?? Robert Kern : >>> On Tue, Mar 6, 2012 at 08:22, ???????? wrote: >>> > >>> > 05 ????? 2012, 14:59 ?? Robert Kern : >>> >> On Mon, Mar 5, 2012 at 10:26, ???????? wrote: >>> >> > 02 ????? 2012, 15:49 ?? Robert Kern : >>> >> >> On Fri, Mar 2, 2012 at 06:02, ???????? wrote: >>> >> >> > Hello! >>> >> >> > I'm working with orthogonal distance regression (scipy.odr). >>> >> >> > I try to fit the curve to a point cloud (3d), but it doesn work properly, it >>> >> >> > returns wrong results >>> >> >> > >>> >> >> > For example I want to fit the simple curve y = a*x + b*z + c to some point >>> >> >> > cloud (y_data, x_data, z_data) >>> >> >> > >>> >> >> > >>> >> >> > ? ? def func(p, input): >>> >> >> > >>> >> >> > ? ? x,z = input >>> >> >> > >>> >> >> > ? ? x = np.array(x) >>> >> >> > >>> >> >> > ? ? z = np.array(z) >>> >> >> > >>> >> >> > ? ? return (p[0]*x + p[1]*z + p[2]) >>> >> >> > >>> >> >> > >>> >> >> > ? ? initialGuess = [1,1,1] >>> >> >> > >>> >> >> > ? ? myModel = Model(func) >>> >> >> > >>> >> >> > ? ? myData = Data([x_data, z_daya], y_data) >>> >> >> > >>> >> >> > ? ? myOdr = ODR(myData, myModel, beta0 = initialGuess) >>> >> >> > >>> >> >> > ? ? myOdr.set_job(fit_type=0) >>> >> >> > >>> >> >> > ? ? out = myOdr.run() >>> >> >> > >>> >> >> > ? ? print out.beta >>> >> >> > >>> >> >> > It works perfectly in 2d dimension (2 axes), but in 3d dimension the results >>> >> >> > are not even close to real, moreover it is very sensitive to initial Guess, >>> >> >> > so it returns different result even if i change InitiaGuess from?[1,1,1] >>> >> >> > to?[0.99,1,1] >>> >> >> > >>> >> >> > What do I do wrong? >>> >> >> >>> >> >> Can you provide a complete runnable example including some data? Note >>> >> >> that if you do not specify any errors on your data, they are assumed >>> >> >> to correspond to a standard deviation of 1 for all dimensions. If that >>> >> >> is wildly different from the actual variance around the "true" >>> >> >> surface, then it might lead the optimizer astray. >>> >> >> >>> >> >> -- >>> >> >> Robert Kern >>> >> >> >>> >> > >>> >> > I wonder why when I change the initial guess the results changes too. As it, the result depends on the initial guess directly. This is wrong. 
>>> >> > >>> >> > Here is an example (Sorry for the huge array of data, but its important to show what happens on it) >>> >> > >>> >> > import numpy as np >>> >> > from scipy.odr import * >>> >> > from math import * >>> >> >>> >> [snip] >>> >> >>> >> > def funcReturner(p, input): >>> >> > ? ? ? ?input = np.array(input) >>> >> > ? ? ? ?x = input[0] >>> >> > ? ? ? ?z = input[1] >>> >> > ? ? ? ?return 10**(p[0]*x + p[1]*z +p[2]) >>> >> >>> >> Ah. 10**(p[0]*x+p[1]*z+p[2]) is a *lot* different from the linear >>> >> problem you initially asked about. Setting the uncertainties >>> >> accurately on all axes of your data is essential. Do you really know >>> >> what they are? It's possible that you want to try fitting a plane to >>> >> np.log10(y_data) instead. >>> >> >>> >> > myModel = Model(funcReturner) >>> >> > myData = Data([x_data,z_data], y_data) >>> >> > myOdr = ODR(myData, myModel, beta0=[0.04, -0.02, ?1.75]) >>> >> > myOdr.set_job(fit_type=0) >>> >> > out = myOdr.run() >>> >> > result = out.beta >>> >> > >>> >> > print "Optimal coefficients: ", result >>> >> > >>> >> > I tryed to specify sx, sy, we, wd, delta, everything: and I get the better results, but they are still not what I need. And they are still depends directly on initial guess as well. >>> >> > If I set initial guess to [1,1,1], it fails to find any close solution and returns totally wrong result with huge Residual Variance like 3.21014784829e+209 >>> >> >>> >> For such a nonlinear problem, finding reasonable initial guesses is >>> >> useful. There is also a maximum iteration limit defaulting to a fairly >>> >> low 50. Check out.stopreason to see if it actually converged or just >>> >> ran into the iteration limit. You can keep calling myOdr.restart() >>> >> until it converges. If I start with beta0=[1,1,1], it converges >>> >> somewhere between 300 and 400 iterations. >>> >> >>> >> -- >>> >> Robert Kern >>> >> >>> > >>> > Yeah, increasing the number of iterations (maxit parameter) makes the results slightly more accurate, but not better. I mean if I attain that the stop reason is "sum square convergence", results are even worse. But, I tryed to fit converted function, like you recommended - np.log10(y_data). And it gave me the proper results. Why that happens and is it possible to achieve these results without convertion? >>> >>> As I mentioned before, in a nonlinear case, you really need to have >>> good estimates of the uncertainties on each point. Since your Y >>> variable varies over several orders of magnitude, I really doubt that >>> the uncertainties are the same for each point. It's more likely that >>> you want to assign a relative 10% (or whatever) uncertainty to each >>> point rather than the same absolute uncertainty to each. I don't think >>> that you have really measured both 1651.5+-1.0 and 0.05+-1.0, but >>> that's what you are implicitly saying when you don't provide explicit >>> estimates of the uncertainties. >>> >>> One problem that you are going to run into is that least-squares isn't >>> especially appropriate for your model. Your Y output is strictly >>> positive, but it goes very close to 0.0. The error model that >>> least-squares fits is that each measurement follows a Gaussian >>> distribution about the true point, and the Gaussian has infinite >>> support (specifically, it crosses that 0 line, and you know a priori >>> that you will never observe a negative value). For the observations >>> ~1000.0, that doesn't matter much, but it severely distorts the >>> problem at 0.05. 
Your true error distribution is probably something >>> like log-normal; the errors below the curve are small but the errors >>> above can be large. Transforming strictly-positive data with a >>> logarithm is a standard technique. In a sense, the "log-transformed" >>> model is the "true" model to be using, at least if you want to use >>> least-squares. Looking at the residuals of both original and the >>> log10-transformed problem (try plot(x_data, out.eps, 'k.'), >>> plot(x_data, out.delta[0], 'k.'), etc.), it looks like the >>> log10-transformed data does fit fairly well; the residuals mostly >>> follow a normal distribution of the same size across the dataset. >>> That's good! But it also means that if you transform these residuals >>> back to the original space, they don't follow a normal distribution >>> anymore, and using least-squares to fit the problem isn't appropriate >>> anymore. >>> >>> > I could use converted function further, but the problem is that I have the whole list of different functions to fit. And I'd like to create universal fitter for all of them. >>> >>> Well, you will have to go through those functions (and their implicit >>> error models) and determine if least-squares is truly appropriate for >>> them. Least-squares is not appropriate for all models. However, >>> log-transforming the strictly-positive variables in a model quite >>> frequently is all you need to do to turn a least-squares-inappropriate >>> model into a least-squares-appropriate one. You can write your >>> functions in that log-transformed form and write a little adapter to >>> transform the data (which is given to you in the original form). >>> >>> -- >>> Robert Kern >>> >> >> Robert, thank you very much for detailed answer, now I see what is the problem. I don't really have any uncertainties, and I guess it would be hard to compute them from the data. Moreover, this data is just the sample, and I will have a different types of data in real. Transformation actually helps just for the the couple of functions, for instance, 10**(A*x + B*z +C) and C*(A)**X*(B)**Z functions fit just perfectly, but doesn't work for any others (like A*lg(X) + B*Z + C, C/(1 + A*X + B*Z)). > > Right. That's why I said that you have to go through all of the > functions to see if it's applicable or not. > >> I transform the function like this (conditionally): >> y_data = np.log10(y_data) >> function = np.log10(function) >> , is that correct? > > Yes. > >> And what do you mean by little adapter to transform the data? > > I just meant the "y_data = np.log10(y_data)" part. > >> By the way, the problem appears only in 3d mode. When I use the same logarithmic data in 2d mode (no Z axis), it works perfectly for all functions, and no log10 transformation needed (this transformation distort the results, make them worse in that case). > > Without knowing how you are getting the data to make that > determination, I don't have much to say about it. The problem of > inaccurate uncertainties will probably get larger as you increase the > dimension of the inputs. Since you don't know them, it's probably not > a good idea to keep trying to do ODR instead of ordinary least > squares. Use myOdr.set_job(fit_type=2) to use OLS instead. You still > have a problem with the unknown uncertainties on the Y output, but > that narrows some of the problems down. > >> Do you know any other fitting methods, available in python? > > None that free you from this kind of thoughtful analysis. Fitting > functions isn't a black box, I'm afraid. 
You need to consider your > models and your data before fitting, and you need to look at the > residuals afterwards to check that the results make sense. Then you > improve your model. Least-squares is not suitable for all models. You > may need to roll your own error model and use the generic minimization > routines in scipy.optimize. If you can formulate your models as > generalized linear models (which you can for a couple of them, but not > all), you could look at the functionality for this in the statsmodels > package. some additional comments If the nonlinear function are common or standard (in a field), then it can be possible to find predefined starting values, or figure out function specific ways to (semi-)automate the estimation. I didn't look very closely at those, but these packages have a large collection of nonlinear functions (the last time I looked) http://code.google.com/p/pyeq2/ (also with global optimizer) http://packages.python.org/PyModelFit/ maybe 1d only R has a few "self-starting" non-linear functions, but AFAIK 1d only, where it's possible to figure out good starting values from the data. As Robert explained one problem with least squares is that the variance of the error might not be the same for all observations (heteroscedasticity in econometrics) Besides guessing the right transformation, as the log in this example, there are ways to test for the heteroscedasticity and to estimate the transformation within a class of nonlinear transformation. statsmodels has a few statistical tests, but currently only intended and checked for the linear case (it should extend to nonlinear functions) scipy.stats has an estimator for the Box-Cox transformation, but I looked at it only for the case of identically distributed observation, not for function fitting. (statsmodels doesn't have any parameterized transformations yet, only predefined transformation like the log. ) I don't know of anything in python that could currently be directly used for this. Josef > -- > Robert Kern > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From dcday137 at gmail.com Wed Mar 7 12:51:54 2012 From: dcday137 at gmail.com (Collin Day) Date: Wed, 7 Mar 2012 17:51:54 +0000 (UTC) Subject: [SciPy-User] How to feed np.mgrid a variable number of 'arguments' References: Message-ID: Collin Day gmail.com> writes: > > Hi all,I am guessing there is an easy way to do this, but I am just not seeing it.? I have a function where I can have a variable number of input dimensions.? In the function, I need to use np.mgrid to generate the data I need.? How would I create a line of code that would feed np.mgrid a variable number of inputs?? For example:3d, with 17 nodesa = np.mgrid[0:17,0:17,0:17]4da = np.mgrid[0:17,0:17,0:17,0:17]Is there a way I can donodes=17inDims = a_numbera = np.mgrid[0:17,0:17...a_number of times]easily?Thanks! > > _______________________________________________ > SciPy-User mailing list > SciPy-User scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Thanks both of you who answered - this was exactly what I was looking for, but it would have been forever before I found it! 
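For completeness, a minimal self-contained sketch of the slice-based mgrid construction described above (plain NumPy; the names nodes and inDims are only illustrative):

    import numpy as np

    nodes = 17     # grid points along each axis
    inDims = 4     # number of dimensions

    # Build one slice per dimension and pass the whole tuple to mgrid.
    idx = (slice(0, nodes),) * inDims
    a = np.mgrid[idx]
    print(a.shape)    # (4, 17, 17, 17, 17) for these values

This works because the bracket syntax of mgrid simply receives the slices as a tuple, so building that tuple programmatically is equivalent to writing the slices out by hand.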
From peter.cimermancic at gmail.com Wed Mar 7 21:39:17 2012 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Wed, 7 Mar 2012 18:39:17 -0800 Subject: [SciPy-User] Generalized least square on large dataset Message-ID: Hi, I'd like to linearly fit the data that were NOT sampled independently. I came across generalized least square method: b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y X and Y are coordinates of the data points, and V is a "variance matrix". The equation is Matlab format - I've tried solving problem there too, bit it didn't work - but eventually I'd like to be able to solve problems like that in python. The problem is that due to its size (1000 rows and columns), the V matrix becomes singular, thus un-invertable. Any suggestions for how to get around this problem? Maybe using a way of solving generalized linear regression problem other than GLS? Regards, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Mar 7 22:09:14 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 7 Mar 2012 22:09:14 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 9:39 PM, Peter Cimerman?i? wrote: > Hi, > > I'd like to linearly fit the data that were NOT sampled independently. I > came across generalized least square method: > > b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y > > > X and Y are coordinates of the data points, and V is a "variance matrix". > > The equation is Matlab format - I've tried solving problem there too, bit it > didn't work - but eventually I'd like to be able to solve problems like that > in python. The problem is that due to its size (1000 rows and columns), the > V matrix becomes singular, thus un-invertable. Any suggestions for how to > get around this problem? Maybe using a way of solving generalized linear > regression problem other than GLS? V is (nobs,nobs) has nobs*(nobs-1)/2 parameter (or something like this) (nobs number of observations rows) I don't think there is a general solution without imposing a lot of structure on V that reduces the effective number of parameters. (singular matrix in itself is not necessarily a problem using pinv or a small Ridge penalty) Most of the solutions I'm looking for for GLS is for cases where V is a kronecker product (or block matrix) or where V^{-0.5} (the matrix to transform the X) has a nice solution. I only looked at a few special cases so far. There are some papers that claim that they have an efficient cholesky of the V inverse for mixed effects models for example but I never looked at the details. General rule: if you manage to get it to work in matlab, then it should be possible to get it to work in numpython (unless ... which shouldn't apply in this case) I think the main question is: What kind of V matrix do you have? What kind of violation if independent sampling? I'm also still struggling with this question, and with which linear algebra to use, so I'm also interested in any solutions. 
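For concreteness, a minimal numpy sketch of the estimator written above, b = (X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y, on made-up stand-in data. The variable names, the toy similarity kernel and the use of pinv (so that a singular V does not fail outright) are illustrative assumptions, not a recommendation for this particular dataset.

import numpy as np

# Hypothetical stand-in data: n genome lengths x, gene counts y, and an
# error covariance V built from a toy similarity matrix with values in (0, 1].
rng = np.random.RandomState(0)
n = 50
x = rng.uniform(1.0, 10.0, size=n)
V = np.exp(-np.abs(x[:, None] - x[None, :]))     # toy similarity used as covariance
L = np.linalg.cholesky(V + 1e-10 * np.eye(n))    # tiny ridge for numerical safety
y = 2.0 + 3.0 * x + L.dot(rng.randn(n))          # correlated noise

X = np.column_stack((np.ones(n), x))             # intercept + slope

Vinv = np.linalg.pinv(V)                         # pseudo-inverse tolerates a singular V
XtVinv = X.T.dot(Vinv)
beta = np.linalg.solve(XtVinv.dot(X), XtVinv.dot(y))
cov_beta = np.linalg.inv(XtVinv.dot(X))          # covariance of the GLS estimates
print(beta, np.sqrt(np.diag(cov_beta)))
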
Josef > > Regards, > > Peter > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Wed Mar 7 22:46:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Mar 2012 20:46:53 -0700 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? < peter.cimermancic at gmail.com> wrote: > Hi, > > I'd like to linearly fit the data that were NOT sampled independently. I > came across generalized least square method: > > b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y > > X and Y are coordinates of the data points, and V is a "variance matrix". > > The equation is Matlab format - I've tried solving problem there too, bit > it didn't work - but eventually I'd like to be able to solve problems like > that in python. The problem is that due to its size (1000 rows and > columns), the V matrix becomes singular, thus un-invertable. Any > suggestions for how to get around this problem? Maybe using a way of > solving generalized linear regression problem other than GLS? > > Plain old least squares will probably do a decent job for the fit, where you will run into trouble is if you want to estimate the covariance. The idea of using the variance matrix is to transform the data set into independent observations of equal variance, but except in extreme cases that shouldn't really be necessary if you have sufficient data points. Weighting the data is a simple case of this that merely equalizes the variance, and it often doesn't make that much difference. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Mar 7 22:58:10 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 7 Mar 2012 22:58:10 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 10:46 PM, Charles R Harris wrote: > > > On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? > wrote: >> >> Hi, >> >> I'd like to linearly fit the data that were NOT sampled independently. I >> came across generalized least square method: >> >> b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y >> >> >> X and Y are coordinates of the data points, and V is a "variance matrix". >> >> The equation is Matlab format - I've tried solving problem there too, bit >> it didn't work - but eventually I'd like to be able to solve problems like >> that in python. The problem is that due to its size (1000 rows and columns), >> the V matrix becomes singular, thus un-invertable. Any suggestions for how >> to get around this problem? Maybe using a way of solving generalized linear >> regression problem other than GLS? >> > > Plain old least squares will probably do a decent job for the fit, where you > will run into trouble is if you want to estimate the covariance. side question: Are heteroscedasticity and (auto)correlation robust standard errors popular in any field outside of economics/econometrics, so called sandwich estimators of covariance matrix? (estimate with OLS ignoring non-independent and non-identical noise, but correct the covariance matrix) I recently expanded this in statsmodels, and would like to start soon some advertising in favor of sandwiches. 
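To make the sandwich idea concrete, here is a minimal numpy sketch of the simplest, heteroscedasticity-only (HC0) version. The data are invented for illustration and this is not the statsmodels implementation.

import numpy as np

# Toy data whose noise variance grows with x (purely illustrative).
rng = np.random.RandomState(1)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x + x * rng.randn(n)
X = np.column_stack((np.ones(n), x))

beta = np.linalg.lstsq(X, y)[0]                  # plain OLS point estimates
resid = y - X.dot(beta)
XtX_inv = np.linalg.inv(X.T.dot(X))

# Classical OLS covariance assumes one common error variance:
cov_ols = XtX_inv * resid.dot(resid) / (n - X.shape[1])
# HC0 sandwich: bread * meat * bread, with no common-variance assumption.
meat = (X * (resid ** 2)[:, None]).T.dot(X)      # X' diag(e_i^2) X
cov_hc0 = XtX_inv.dot(meat).dot(XtX_inv)

print(np.sqrt(np.diag(cov_ols)))                 # tends to understate the slope uncertainty here
print(np.sqrt(np.diag(cov_hc0)))
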
Josef > The idea of > using the variance matrix is to transform the data set into independent > observations of equal variance, but except in extreme cases that shouldn't > really be necessary if you have sufficient data points. Weighting the data > is a simple case of this that merely equalizes the variance, and it often > doesn't make that much difference. > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Wed Mar 7 23:00:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Mar 2012 21:00:24 -0700 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 8:46 PM, Charles R Harris wrote: > > > On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? < > peter.cimermancic at gmail.com> wrote: > >> Hi, >> >> I'd like to linearly fit the data that were NOT sampled independently. I >> came across generalized least square method: >> >> b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y >> >> X and Y are coordinates of the data points, and V is a "variance matrix". >> >> The equation is Matlab format - I've tried solving problem there too, bit >> it didn't work - but eventually I'd like to be able to solve problems like >> that in python. The problem is that due to its size (1000 rows and >> columns), the V matrix becomes singular, thus un-invertable. Any >> suggestions for how to get around this problem? Maybe using a way of >> solving generalized linear regression problem other than GLS? >> >> > Plain old least squares will probably do a decent job for the fit, where > you will run into trouble is if you want to estimate the covariance. The > idea of using the variance matrix is to transform the data set into > independent observations of equal variance, but except in extreme cases > that shouldn't really be necessary if you have sufficient data points. > Weighting the data is a simple case of this that merely equalizes the > variance, and it often doesn't make that much difference. > > To expand a bit, if it is simply the case that the measurement errors aren't independent and you know their covariance, then you want to minimize (y - Ax)^T * cov^-1 * (y - ax) and if you factor cov^-1 into U^T * U, then you can solve the ordinary least squares problem U*A*x = U*y. I can't really tell what your data/problem is like without more details. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Mar 7 23:04:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Mar 2012 21:04:04 -0700 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 8:58 PM, wrote: > On Wed, Mar 7, 2012 at 10:46 PM, Charles R Harris > wrote: > > > > > > On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? > > wrote: > >> > >> Hi, > >> > >> I'd like to linearly fit the data that were NOT sampled independently. I > >> came across generalized least square method: > >> > >> b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y > >> > >> > >> X and Y are coordinates of the data points, and V is a "variance > matrix". > >> > >> The equation is Matlab format - I've tried solving problem there too, > bit > >> it didn't work - but eventually I'd like to be able to solve problems > like > >> that in python. 
The problem is that due to its size (1000 rows and > columns), > >> the V matrix becomes singular, thus un-invertable. Any suggestions for > how > >> to get around this problem? Maybe using a way of solving generalized > linear > >> regression problem other than GLS? > >> > > > > Plain old least squares will probably do a decent job for the fit, where > you > > will run into trouble is if you want to estimate the covariance. > > side question: > Are heteroscedasticity and (auto)correlation robust standard errors > popular in any field outside of economics/econometrics, so called > sandwich estimators of covariance matrix? > (estimate with OLS ignoring non-independent and non-identical noise, > but correct the covariance matrix) > > I recently expanded this in statsmodels, and would like to start soon > some advertising in favor of sandwiches. > > I'm not familiar with them, but I can't speak for many. Indeed, there seems to be the most rudimentary understanding of statistics in many fields, basically reducible to root sum of squares for the more sophisticated ;) But I think I was contemplating something similar to what you mention. Sounds interesting. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.cimermancic at gmail.com Wed Mar 7 23:25:36 2012 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Wed, 7 Mar 2012 20:25:36 -0800 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: To describe my problem into more details, I have a list of ~1000 bacterial genome lengths and number of certain genes for each one of them. I'd like to see if there is any correlation between genome lengths and number of the genes. It may look like an easy linear regression problem; however, one has to be a bit more careful as the measurements aren't sampled independently. Bacteria, whose genomes are similar, tend to also contain similar number of the genes. Bacterial similarity is what is described with matrix V - it contains similarity values for each pair of bacteria, ranging from 0 to 1. Anybody encountered similar problem already? On Wed, Mar 7, 2012 at 8:00 PM, Charles R Harris wrote: > > > On Wed, Mar 7, 2012 at 8:46 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? < >> peter.cimermancic at gmail.com> wrote: >> >>> Hi, >>> >>> I'd like to linearly fit the data that were NOT sampled independently. I >>> came across generalized least square method: >>> >>> b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y >>> >>> X and Y are coordinates of the data points, and V is a "variance matrix". >>> >>> The equation is Matlab format - I've tried solving problem there too, >>> bit it didn't work - but eventually I'd like to be able to solve problems >>> like that in python. The problem is that due to its size (1000 rows and >>> columns), the V matrix becomes singular, thus un-invertable. Any >>> suggestions for how to get around this problem? Maybe using a way of >>> solving generalized linear regression problem other than GLS? >>> >>> >> Plain old least squares will probably do a decent job for the fit, where >> you will run into trouble is if you want to estimate the covariance. The >> idea of using the variance matrix is to transform the data set into >> independent observations of equal variance, but except in extreme cases >> that shouldn't really be necessary if you have sufficient data points. 
>> Weighting the data is a simple case of this that merely equalizes the >> variance, and it often doesn't make that much difference. >> >> > To expand a bit, if it is simply the case that the measurement errors > aren't independent and you know their covariance, then you want to minimize > (y - Ax)^T * cov^-1 * (y - ax) and if you factor cov^-1 into U^T * U, then > you can solve the ordinary least squares problem U*A*x = U*y. I can't > really tell what your data/problem is like without more details. > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Mar 7 23:26:07 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 7 Mar 2012 23:26:07 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 11:04 PM, Charles R Harris wrote: > > > On Wed, Mar 7, 2012 at 8:58 PM, wrote: >> >> On Wed, Mar 7, 2012 at 10:46 PM, Charles R Harris >> wrote: >> > >> > >> > On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? >> > wrote: >> >> >> >> Hi, >> >> >> >> I'd like to linearly fit the data that were NOT sampled independently. >> >> I >> >> came across generalized least square method: >> >> >> >> b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y >> >> >> >> >> >> X and Y are coordinates of the data points, and V is a "variance >> >> matrix". >> >> >> >> The equation is Matlab format - I've tried solving problem there too, >> >> bit >> >> it didn't work - but eventually I'd like to be able to solve problems >> >> like >> >> that in python. The problem is that due to its size (1000 rows and >> >> columns), >> >> the V matrix becomes singular, thus un-invertable. Any suggestions for >> >> how >> >> to get around this problem? Maybe using a way of solving generalized >> >> linear >> >> regression problem other than GLS? >> >> >> > >> > Plain old least squares will probably do a decent job for the fit, where >> > you >> > will run into trouble is if you want to estimate the covariance. >> >> side question: >> Are heteroscedasticity and (auto)correlation robust standard errors >> popular in any field outside of economics/econometrics, so called >> sandwich estimators of covariance matrix? >> (estimate with OLS ignoring non-independent and non-identical noise, >> but correct the covariance matrix) >> >> I recently expanded this in statsmodels, and would like to start soon >> some advertising in favor of sandwiches. >> > > I'm not familiar with them, but I can't speak for many. Indeed, there seems > to be the most rudimentary understanding of statistics in many fields, > basically reducible to root sum of squares for the more sophisticated ;) > > But I think I was contemplating something similar to what you mention. > Sounds interesting. Basic idea in an example: Suppose you have a large sample where the noise is very highly autocorrelated. OLS assumes you have a lot of independent observation and the standard errors will be small. The real standard errors are much larger because observations close to each other are almost the same. Robust standard errors correct for this without assuming much about the actual correlations. 
A (a bit cryptic) example https://groups.google.com/forum/#!topic/pystatsmodels/GKaQrfyyN7c/discussion that turned into a discussion about how to write a blog ;) Josef > > > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Thu Mar 8 00:36:35 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 8 Mar 2012 00:36:35 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 11:25 PM, Peter Cimerman?i? wrote: > To describe my problem into more details, I have a list of ~1000 bacterial > genome lengths and number of certain genes for each one of them. I'd like to > see if there is any correlation between genome lengths and number of the > genes. It may look like an easy linear regression problem; however, one has > to be a bit more careful as the measurements aren't sampled independently. > Bacteria, whose genomes are similar, tend to also contain similar number of > the genes. Bacterial similarity is what is described with matrix V - it > contains similarity values for each pair of bacteria, ranging from 0 to 1. > > Anybody encountered similar problem already? The closest I can think of is spatial econometrics where V is the spatial similarity http://pysal.geodacenter.org/1.3/library/spreg/ols.html But I never looked at the details of the spatial specification, and I don't know pysal covers this case. (The V is also similar to the correlation in Gaussian Processes, but they do only local estimation as far as I have looked at it.) (I have only vague ideas about this case, If you have the distances/similarity, then you need to estimate the correlation as a function of the similarity. If the correlation matrix is not invertible, then it should be possible to just use the generalized inverse, pinv, of V. 1000x1000 doesn't sound too big to pinv or to use an svd. But I don't see any reason that the covariance matrix should be singular.) But as Chuck said, OLS would still be a consistent estimator, but for standard errors a correction will be necessary. (If it's not in pysal, then it might not be trivial to work out the correction of the standard error.) interesting problem Josef (...) == maybe on topic > > > > On Wed, Mar 7, 2012 at 8:00 PM, Charles R Harris > wrote: >> >> >> >> On Wed, Mar 7, 2012 at 8:46 PM, Charles R Harris >> wrote: >>> >>> >>> >>> On Wed, Mar 7, 2012 at 7:39 PM, Peter Cimerman?i? >>> wrote: >>>> >>>> Hi, >>>> >>>> I'd like to linearly fit the data that were NOT sampled independently. I >>>> came across generalized least square method: >>>> >>>> b=(X'*V^(-1)*X)^(-1)*X'*V^(-1)*Y >>>> >>>> >>>> X and Y are coordinates of the data points, and V is a "variance >>>> matrix". >>>> >>>> The equation is Matlab format - I've tried solving problem there too, >>>> bit it didn't work - but eventually I'd like to be able to solve problems >>>> like that in python. The problem is that due to its size (1000 rows and >>>> columns), the V matrix becomes singular, thus un-invertable. Any suggestions >>>> for how to get around this problem? Maybe using a way of solving generalized >>>> linear regression problem other than GLS? >>>> >>> >>> Plain old least squares will probably do a decent job for the fit, where >>> you will run into trouble is if you want to estimate the covariance. 
The >>> idea of using the variance matrix is to transform the data set into >>> independent observations of equal variance, but except in extreme cases that >>> shouldn't really be necessary if you have sufficient data points. Weighting >>> the data is a simple case of this that merely equalizes the variance, and it >>> often doesn't make that much difference. >>> >> >> To expand a bit, if it is simply the case that the measurement errors >> aren't independent and you know their covariance, then you want to minimize >> (y - Ax)^T * cov^-1 * (y - ax) and if you factor cov^-1 into U^T * U, then >> you can solve the ordinary least squares problem U*A*x = U*y. I can't really >> tell what your data/problem is like without more details. >> >> Chuck >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Thu Mar 8 08:35:23 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Mar 2012 06:35:23 -0700 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Wed, Mar 7, 2012 at 9:25 PM, Peter Cimerman?i? < peter.cimermancic at gmail.com> wrote: > To describe my problem into more details, I have a list of ~1000 bacterial > genome lengths and number of certain genes for each one of them. I'd like > to see if there is any correlation between genome lengths and number of the > genes. It may look like an easy linear regression problem; however, one has > to be a bit more careful as the measurements aren't sampled independently. > Bacteria, whose genomes are similar, tend to also contain similar number of > the genes. Bacterial similarity is what is described with matrix V - it > contains similarity values for each pair of bacteria, ranging from 0 to 1. > > Anybody encountered similar problem already? > > Ah, that sounds like a fairly common sort of thing to deal with, separating the effect of two variables, but it is out of the area of my experience. The statisticians around here should be able to say something useful about it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 10:04:05 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2012 15:04:05 +0000 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 4:25 AM, Peter Cimerman?i? wrote: > To describe my problem into more details, I have a list of ~1000 bacterial > genome lengths and number of certain genes for each one of them. I'd like to > see if there is any correlation between genome lengths and number of the > genes. It may look like an easy linear regression problem; however, one has > to be a bit more careful as the measurements aren't sampled independently. > Bacteria, whose genomes are similar, tend to also contain similar number of > the genes. Bacterial similarity is what is described with matrix V - it > contains similarity values for each pair of bacteria, ranging from 0 to 1. > > Anybody encountered similar problem already? I agree with Josef, the first thing that comes to mind is controlling for spatial effects (which happens in various fields; ecology folks worry about this a lot too). 
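The whitening recipe quoted from Chuck above (factor cov^-1 into U'U and solve the transformed ordinary least squares problem U*X*b = U*y) can be sketched in a few lines. This is only an illustration with hypothetical names, and it requires an invertible, or slightly regularized, V; for the singular case a pinv or eigenvalue decomposition would have to take the place of inv and cholesky.

import numpy as np

def gls_by_whitening(X, y, V):
    # V^-1 = L L' with L lower triangular, so U = L' satisfies U'U = V^-1
    # and the weighted problem becomes plain OLS on (U X, U y).
    U = np.linalg.cholesky(np.linalg.inv(V)).T
    Xw, yw = U.dot(X), U.dot(y)
    return np.linalg.lstsq(Xw, yw)[0]
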
In this case, though, I think you may need to think more carefully about whether your similarity measure is really appropriate. If your matrix is uninvertible, then IIUC that means you think that you effectively have less than 1000 distinct genomes -- some of your genomes are "so similar" to other ones that they can be predicted *exactly*. In terms of the underlying probabilistic model: you have some population of bacteria genomes, and you picked 1000 of them to study. Each genome you picked has some length, and it also has a number of genes. The number of genes is determined probabilistically by taking some linear function of the length, and then adding some Gaussian noise. Your goal is to figure out what that linear function looks like. In OLS, we assume that each of those Gaussian noise terms is IID. In GLS, we assume that they're correlated. The way to think about this is that we take 1000 uncorrelated IID gaussian samples, let's call this vector "g", and then we mix them together by multiplying by a matrix chol(V), chol(V)*g. (cholV) is the cholesky decomposition; it's triangular, and chol(V)*chol(V)' = V.) So the noise added to each measurement is a mixture of these underlying IID gaussian terms, and bacteria that are more similar have noise terms that overlap more. If V is singular, then this means that the last k rows of chol(V) are all-zero, which means that when you compute chol(V)*g, there are some elements of g that get ignored entirely -- they don't mixed in at all, and don't effect any bacteria. So, your choice of V is encoding an assumption that there are really only, like, 900 noise terms and 1000 bacteria, and 'g' effectively only has 900 entries. So if you make the measurements for the first 900 bacteria, you should be able to reconstruct the full vector 'g', and then you can use it to compute *exactly* what measurements you will see for the last 100 bacteria. And also you can compute the linear relationship exactly. No need to do any hypothesis tests on the result (and in fact you can't do any hypothesis tests on the result, the math won't work), because you know The Truth! Of course none of these assumptions are actually true. Your bacteria are less similar to each other -- and your measurements more noisy -- than your V matrix claims. So you need a better way to compute V. The nice thing about the above derivation -- and the reason I bothered to go through it -- is that it tells you what entries in V mean, numerically. Ideally you should figure out how to re-calibrate your similarity score so that bacteria which are 0.5 similar have a covariance of 0.5 in their noise -- perhaps by calculating empirical covariances on other measures or something, figuring out the best way to do this calibration will take domain knowledge. The ecology folks might have some practical ideas on how to calibrate such things. Or, you could just replace V with V+\lambda*I. That'd solve the numerical problem, but you should be very suspicious of any p-values you get out, since they are based on lies ;-). -- Nathaniel From peter.cimermancic at gmail.com Thu Mar 8 11:09:04 2012 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Thu, 8 Mar 2012 08:09:04 -0800 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: > > > > I agree with Josef, the first thing that comes to mind is controlling > for spatial effects (which happens in various fields; ecology folks > worry about this a lot too). 
> > In this case, though, I think you may need to think more carefully > about whether your similarity measure is really appropriate. If your > matrix is uninvertible, then IIUC that means you think that you > effectively have less than 1000 distinct genomes -- some of your > genomes are "so similar" to other ones that they can be predicted > *exactly*. > That's exactly true - some of the bacteria are almost identical (I'll try filtering those and see if it changes anything). > > In terms of the underlying probabilistic model: you have some > population of bacteria genomes, and you picked 1000 of them to study. > Each genome you picked has some length, and it also has a number of > genes. The number of genes is determined probabilistically by taking > some linear function of the length, and then adding some Gaussian > noise. Your goal is to figure out what that linear function looks > like. > > In OLS, we assume that each of those Gaussian noise terms is IID. In > GLS, we assume that they're correlated. The way to think about this is > that we take 1000 uncorrelated IID gaussian samples, let's call this > vector "g", and then we mix them together by multiplying by a matrix > chol(V), chol(V)*g. (cholV) is the cholesky decomposition; it's > triangular, and chol(V)*chol(V)' = V.) So the noise added to each > measurement is a mixture of these underlying IID gaussian terms, and > bacteria that are more similar have noise terms that overlap more. > > I'm also unable to calculate chol of my V matrix, because it doesn't appear to be a positive definite. Any suggestion here? > If V is singular, then this means that the last k rows of chol(V) are > all-zero, which means that when you compute chol(V)*g, there are some > elements of g that get ignored entirely -- they don't mixed in at all, > and don't effect any bacteria. So, your choice of V is encoding an > assumption that there are really only, like, 900 noise terms and 1000 > bacteria, and 'g' effectively only has 900 entries. So if you make the > measurements for the first 900 bacteria, you should be able to > reconstruct the full vector 'g', and then you can use it to compute > *exactly* what measurements you will see for the last 100 bacteria. > And also you can compute the linear relationship exactly. No need to > do any hypothesis tests on the result (and in fact you can't do any > hypothesis tests on the result, the math won't work), because you know > The Truth! > > Of course none of these assumptions are actually true. Your bacteria > are less similar to each other -- and your measurements more noisy -- > than your V matrix claims. > > So you need a better way to compute V. The nice thing about the above > derivation -- and the reason I bothered to go through it -- is that it > tells you what entries in V mean, numerically. Ideally you should > figure out how to re-calibrate your similarity score so that bacteria > which are 0.5 similar have a covariance of 0.5 in their noise -- > perhaps by calculating empirical covariances on other measures or > something, figuring out the best way to do this calibration will take > domain knowledge. The ecology folks might have some practical ideas on > how to calibrate such things. > > Or, you could just replace V with V+\lambda*I. That'd solve the > numerical problem, but you should be very suspicious of any p-values > you get out, since they are based on lies ;-). > Yes, p-values are something I'd eventually like to come close to. Thank you for your answer! 
Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 11:31:52 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2012 16:31:52 +0000 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 4:09 PM, Peter Cimerman?i? wrote: >> >> >> I agree with Josef, the first thing that comes to mind is controlling >> for spatial effects (which happens in various fields; ecology folks >> worry about this a lot too). >> >> In this case, though, I think you may need to think more carefully >> about whether your similarity measure is really appropriate. If your >> matrix is uninvertible, then IIUC that means you think that you >> effectively have less than 1000 distinct genomes -- some of your >> genomes are "so similar" to other ones that they can be predicted >> *exactly*. > > That's exactly true - some of the bacteria are almost identical (I'll try > filtering those and see if it changes anything). You aren't just telling the computer that they're almost identical -- that would be fine, the model would just mostly-but-not-entirely ignore the near-duplicates. You're telling the computer that they are exactly identical and you had no reason to even collect the data because you knew ahead of time exactly what it would be. This is the sort of thing that really confuses statistical programs :-). >> In terms of the underlying probabilistic model: you have some >> population of bacteria genomes, and you picked 1000 of them to study. >> Each genome you picked has some length, and it also has a number of >> genes. The number of genes is determined probabilistically by taking >> some linear function of the length, and then adding some Gaussian >> noise. Your goal is to figure out what that linear function looks >> like. >> >> In OLS, we assume that each of those Gaussian noise terms is IID. In >> GLS, we assume that they're correlated. The way to think about this is >> that we take 1000 uncorrelated IID gaussian samples, let's call this >> vector "g", and then we mix them together by multiplying by a matrix >> chol(V), chol(V)*g. (cholV) is the cholesky decomposition; it's >> triangular, and chol(V)*chol(V)' = V.) So the noise added to each >> measurement is a mixture of these underlying IID gaussian terms, and >> bacteria that are more similar have noise terms that overlap more. >> > > I'm also unable to calculate chol of my V matrix, because it doesn't appear > to be a positive definite. Any suggestion here? Singular matrices can't be positive definite, by definition. They can be positive semi-definite. (The analogy is numbers -- a number that is zero cannot be greater than zero, by definition. But it can be >= zero.) Any well-defined covariance matrix is necessarily positive semi-definite. If your covariance matrix isn't positive semi-definite, then that's like claiming that you have three random variables where A and B have a correlation of 0.99, and B and C have a correlation of 0.99, but A and C are uncorrelated. That's impossible. ([[1, 0.99, 0], [0.99, 1, 0.99], [0, 0.99, 1]] is not a positive-definite matrix.) Singular, positive semi-definite matrices do *have* Cholesky decompositions, but your average off-the-shelf Cholesky routine can't compute them. 
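For a quick numerical illustration of those definiteness checks, using the 3x3 example from the previous paragraph (eigvalsh is used because the matrix is symmetric):

import numpy as np

V = np.array([[1.0, 0.99, 0.0],
              [0.99, 1.0, 0.99],
              [0.0, 0.99, 1.0]])

print(np.linalg.eigvalsh(V))                 # smallest eigenvalue is about -0.4
print(np.all(np.linalg.eigvalsh(V) >= 0))    # False: not even positive semi-definite

try:
    np.linalg.cholesky(V)                    # off-the-shelf Cholesky refuses such a matrix
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)
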
(Again by analogy -- in theory you can compute the square root of zero, but in practice you can't reliably with floating point, because your "zero" may turn out to actually be represented as "-2.2e-16" or something, and an off-the-shelf square root routine will blow up on this because it looks negative.) You can look around for a "rank-revealing Cholesky", perhaps. Anyway, the question is whether your matrix is positive semi-definite. If it is, then this is all expected, and your problem is just that you need to fix your covariances to be more realistic, as discussed. If it isn't, then you don't even have a covariance matrix, and again you need to figure out how to get one :-). You can check for positive (semi-)definiteness by looking at the eigenvalues -- they should be all >= 0 for semi-definite, > 0 for definite. The easiest way to manufacture a positive-definite matrix on command is to take a non-singular matrix A and compute A'A. HTH, - N From charlesr.harris at gmail.com Thu Mar 8 11:32:42 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Mar 2012 09:32:42 -0700 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 9:09 AM, Peter Cimerman?i? < peter.cimermancic at gmail.com> wrote: > >> >> I agree with Josef, the first thing that comes to mind is controlling >> for spatial effects (which happens in various fields; ecology folks >> worry about this a lot too). >> >> In this case, though, I think you may need to think more carefully >> about whether your similarity measure is really appropriate. If your >> matrix is uninvertible, then IIUC that means you think that you >> effectively have less than 1000 distinct genomes -- some of your >> genomes are "so similar" to other ones that they can be predicted >> *exactly*. >> > > > That's exactly true - some of the bacteria are almost identical (I'll try > filtering those and see if it changes anything). > > > > >> >> In terms of the underlying probabilistic model: you have some >> population of bacteria genomes, and you picked 1000 of them to study. >> Each genome you picked has some length, and it also has a number of >> genes. The number of genes is determined probabilistically by taking >> some linear function of the length, and then adding some Gaussian >> noise. Your goal is to figure out what that linear function looks >> like. >> >> In OLS, we assume that each of those Gaussian noise terms is IID. In >> GLS, we assume that they're correlated. The way to think about this is >> that we take 1000 uncorrelated IID gaussian samples, let's call this >> vector "g", and then we mix them together by multiplying by a matrix >> chol(V), chol(V)*g. (cholV) is the cholesky decomposition; it's >> triangular, and chol(V)*chol(V)' = V.) So the noise added to each >> measurement is a mixture of these underlying IID gaussian terms, and >> bacteria that are more similar have noise terms that overlap more. >> >> > I'm also unable to calculate chol of my V matrix, because it doesn't > appear to be a positive definite. Any suggestion here? > > >> If V is singular, then this means that the last k rows of chol(V) are >> all-zero, which means that when you compute chol(V)*g, there are some >> elements of g that get ignored entirely -- they don't mixed in at all, >> and don't effect any bacteria. So, your choice of V is encoding an >> assumption that there are really only, like, 900 noise terms and 1000 >> bacteria, and 'g' effectively only has 900 entries. 
So if you make the >> measurements for the first 900 bacteria, you should be able to >> reconstruct the full vector 'g', and then you can use it to compute >> *exactly* what measurements you will see for the last 100 bacteria. >> And also you can compute the linear relationship exactly. No need to >> do any hypothesis tests on the result (and in fact you can't do any >> hypothesis tests on the result, the math won't work), because you know >> The Truth! >> >> Of course none of these assumptions are actually true. Your bacteria >> are less similar to each other -- and your measurements more noisy -- >> than your V matrix claims. >> >> So you need a better way to compute V. The nice thing about the above >> derivation -- and the reason I bothered to go through it -- is that it >> tells you what entries in V mean, numerically. Ideally you should >> figure out how to re-calibrate your similarity score so that bacteria >> which are 0.5 similar have a covariance of 0.5 in their noise -- >> perhaps by calculating empirical covariances on other measures or >> something, figuring out the best way to do this calibration will take >> domain knowledge. The ecology folks might have some practical ideas on >> how to calibrate such things. >> >> Or, you could just replace V with V+\lambda*I. That'd solve the >> numerical problem, but you should be very suspicious of any p-values >> you get out, since they are based on lies ;-). >> > > Yes, p-values are something I'd eventually like to come close to. > > I think Josef's suggestion would be a good place to start. The problem seems to be that the similarity matrix isn't a correlation matrix. What Josef suggests would give you some way of analysing what the actual correlations are. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Mar 8 11:58:42 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 8 Mar 2012 11:58:42 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 11:31 AM, Nathaniel Smith wrote: > On Thu, Mar 8, 2012 at 4:09 PM, Peter Cimerman?i? > wrote: >>> >>> >>> I agree with Josef, the first thing that comes to mind is controlling >>> for spatial effects (which happens in various fields; ecology folks >>> worry about this a lot too). >>> >>> In this case, though, I think you may need to think more carefully >>> about whether your similarity measure is really appropriate. If your >>> matrix is uninvertible, then IIUC that means you think that you >>> effectively have less than 1000 distinct genomes -- some of your >>> genomes are "so similar" to other ones that they can be predicted >>> *exactly*. >> >> That's exactly true - some of the bacteria are almost identical (I'll try >> filtering those and see if it changes anything). > > You aren't just telling the computer that they're almost identical -- > that would be fine, the model would just mostly-but-not-entirely > ignore the near-duplicates. You're telling the computer that they are > exactly identical and you had no reason to even collect the data > because you knew ahead of time exactly what it would be. This is the > sort of thing that really confuses statistical programs :-). > >>> In terms of the underlying probabilistic model: you have some >>> population of bacteria genomes, and you picked 1000 of them to study. >>> Each genome you picked has some length, and it also has a number of >>> genes. 
The number of genes is determined probabilistically by taking >>> some linear function of the length, and then adding some Gaussian >>> noise. Your goal is to figure out what that linear function looks >>> like. >>> >>> In OLS, we assume that each of those Gaussian noise terms is IID. In >>> GLS, we assume that they're correlated. The way to think about this is >>> that we take 1000 uncorrelated IID gaussian samples, let's call this >>> vector "g", and then we mix them together by multiplying by a matrix >>> chol(V), chol(V)*g. (cholV) is the cholesky decomposition; it's >>> triangular, and chol(V)*chol(V)' = V.) So the noise added to each >>> measurement is a mixture of these underlying IID gaussian terms, and >>> bacteria that are more similar have noise terms that overlap more. >>> >> >> I'm also unable to calculate chol of my V matrix, because it doesn't appear >> to be a positive definite. Any suggestion here? > > Singular matrices can't be positive definite, by definition. They can > be positive semi-definite. (The analogy is numbers -- a number that is > zero cannot be greater than zero, by definition. But it can be >= > zero.) Any well-defined covariance matrix is necessarily positive > semi-definite. If your covariance matrix isn't positive semi-definite, > then that's like claiming that you have three random variables where A > and B have a correlation of 0.99, and B and C have a correlation of > 0.99, but A and C are uncorrelated. That's impossible. ([[1, 0.99, 0], > [0.99, 1, 0.99], [0, 0.99, 1]] is not a positive-definite matrix.) > > Singular, positive semi-definite matrices do *have* Cholesky > decompositions, but your average off-the-shelf Cholesky routine can't > compute them. (Again by analogy -- in theory you can compute the > square root of zero, but in practice you can't reliably with floating > point, because your "zero" may turn out to actually be represented as > "-2.2e-16" or something, and an off-the-shelf square root routine will > blow up on this because it looks negative.) You can look around for a > "rank-revealing Cholesky", perhaps. I would use SVD or eigenvalue decomposition to get the transformation matrix. With reduced rank and dropping zero eigenvalues, I think, the transformation will just drop some observations that are redundant. Or for normal equations, use X pinv(V) X beta = X pinv(V) y which uses SVD inside and requires less work writing the code. I'm reasonably sure that I have seen the pinv used this way before. That still leaves going from similarity matrix to covariance matrix. Josef > > Anyway, the question is whether your matrix is positive semi-definite. > If it is, then this is all expected, and your problem is just that you > need to fix your covariances to be more realistic, as discussed. If it > isn't, then you don't even have a covariance matrix, and again you > need to figure out how to get one :-). You can check for positive > (semi-)definiteness by looking at the eigenvalues -- they should be > all >= 0 for semi-definite, > 0 for definite. > > The easiest way to manufacture a positive-definite matrix on command > is to take a non-singular matrix A and compute A'A. 
> > HTH, > - N > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From peter.cimermancic at gmail.com Thu Mar 8 12:32:45 2012 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Thu, 8 Mar 2012 09:32:45 -0800 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: > > > > I would use SVD or eigenvalue decomposition to get the transformation > matrix. With reduced rank and dropping zero eigenvalues, I think, the > transformation will just drop some observations that are redundant. > > Or for normal equations, use X pinv(V) X beta = X pinv(V) y which > uses SVD inside and requires less work writing the code. > > I'm reasonably sure that I have seen the pinv used this way before. > > That still leaves going from similarity matrix to covariance matrix. > Yes, pinv() solved the compute problem (no errors anymore). I've also found some papers describing how to get from a similarity matrix to correlation. Do you maybe know, are p-values (from MSE calculation) fairly accurate this way? Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Mar 8 13:07:17 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 8 Mar 2012 13:07:17 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 12:32 PM, Peter Cimerman?i? wrote: >> >> >> I would use SVD or eigenvalue decomposition to get the transformation >> matrix. With reduced rank and dropping zero eigenvalues, I think, the >> transformation will just drop some observations that are redundant. >> >> Or for normal equations, use X pinv(V) X beta = X pinv(V) y ? ?which >> uses SVD inside and requires less work writing the code. >> >> I'm reasonably sure that I have seen the pinv used this way before. >> >> That still leaves going from similarity matrix to covariance matrix. > > > Yes, pinv() solved the compute problem (no errors anymore). I've also found > some papers describing how to get from a similarity matrix to correlation. > Do you maybe know, are p-values (from MSE calculation) fairly accurate this > way? While parameter estimates are pretty robust, the standard errors and the pvalues depend a lot on additional assumptions. If the assumptions are not satsified with a given datasets, then the pvalues can be pretty far off. For example if your error covariance matrix (from the V) is misspecified, then it could be the case that the pvalues are not very accurate. In a small sample assuming normal distribution might be a problem, but I would expect that for 1000 observations (or close to it) asymptotic normality will be accurate enough. With only one regressor (plus constant) multicollinearity cannot have a negative impact, so I wouldn't expect any other numerical problems. If your pvalue is 0.04 or 0.11 then I would do some additional specification checks. If the pvalue is 0.6 or 1.e-4, then I wouldn't worry about pvalue accuracy. Comparing the GLS standard errors with the (in this case incorrect) standard errors from OLS might give some idea about how much p-values can change with your data. I would be interested in hearing how you get from a similarity matrix to correlation matrix in your case. I would like to see if it is very difficult to include something like this in statsmodels. 
Josef > > > Peter > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Thu Mar 8 13:35:08 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Mar 2012 11:35:08 -0700 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 11:07 AM, wrote: > On Thu, Mar 8, 2012 at 12:32 PM, Peter Cimerman?i? > wrote: > >> > >> > >> I would use SVD or eigenvalue decomposition to get the transformation > >> matrix. With reduced rank and dropping zero eigenvalues, I think, the > >> transformation will just drop some observations that are redundant. > >> > >> Or for normal equations, use X pinv(V) X beta = X pinv(V) y which > >> uses SVD inside and requires less work writing the code. > >> > >> I'm reasonably sure that I have seen the pinv used this way before. > >> > >> That still leaves going from similarity matrix to covariance matrix. > > > > > > Yes, pinv() solved the compute problem (no errors anymore). I've also > found > > some papers describing how to get from a similarity matrix to > correlation. > > Do you maybe know, are p-values (from MSE calculation) fairly accurate > this > > way? > > While parameter estimates are pretty robust, the standard errors and > the pvalues depend a lot on additional assumptions. > If the assumptions are not satsified with a given datasets, then the > pvalues can be pretty far off. For example if your error covariance > matrix (from the V) is misspecified, then it could be the case that > the pvalues are not very accurate. > In a small sample assuming normal distribution might be a problem, but > I would expect that for 1000 observations (or close to it) asymptotic > normality will be accurate enough. > > With only one regressor (plus constant) multicollinearity cannot have > a negative impact, so I wouldn't expect any other numerical problems. > > If your pvalue is 0.04 or 0.11 then I would do some additional > specification checks. If the pvalue is 0.6 or 1.e-4, then I wouldn't > worry about pvalue accuracy. > > Comparing the GLS standard errors with the (in this case incorrect) > standard errors from OLS might give some idea about how much p-values > can change with your data. > > I would be interested in hearing how you get from a similarity matrix > to correlation matrix in your case. I would like to see if it is very > difficult to include something like this in statsmodels. > > With a model this simple there are likely to be significant systematic errors, which would make it even more difficult to interpret significance. OTOH, this may be a case where the residuals are as interesting as the parameter values. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 8 14:04:31 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Mar 2012 19:04:31 +0000 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: On Thu, Mar 8, 2012 at 6:07 PM, wrote: > While parameter estimates are pretty robust, the standard errors and > the pvalues depend a lot on additional assumptions. > If the assumptions are not satsified with a given datasets, then the > pvalues can be pretty far off. For example if your error covariance > matrix (from the V) is misspecified, then it could be the case that > the pvalues are not very accurate. 
> In a small sample assuming normal distribution might be a problem, but > I would expect that for 1000 observations (or close to it) asymptotic > normality will be accurate enough. > > With only one regressor (plus constant) multicollinearity cannot have > a negative impact, so I wouldn't expect any other numerical problems. > > If your pvalue is 0.04 or 0.11 then I would do some additional > specification checks. If the pvalue is 0.6 or 1.e-4, then I wouldn't > worry about pvalue accuracy. > > Comparing the GLS standard errors with the (in this case incorrect) > standard errors from OLS might give some idea about how much p-values > can change with your data. These kinds of GLS models are one of the places where having the wrong model can give you arbitrarily spurious p values. To get an intuition, consider the case where your errors are all very highly correlated, so while you made N measurements, you really only effectively have 1. Without proper correction, as N increases, your p value will get arbitrarily small... even though you still only have 1 real data point. Most cases aren't so extreme, of course, but that's the kind of thing you have to be careful of -- underestimating your correlations = overestimating your significance. A good thing to do is check whether the resulting residuals "look uncorrelated" -- if you have corrected for similarity in the analysis, then bacteria that are similar to each other should not have similar residuals, overall. A coarse check of this would be to come up with some method for visualizing similarity spatially (like clustering your bacteria into a dendrogram, or using factor analysis to plot coarse similarity in 1 or 2 dimensions), and then using this to arrange your residuals. Then you'd want to check that you don't see any overall patterns, like one part of the plot has residuals that are systematically larger than another part. - N From eemselle at eso.org Thu Mar 8 15:29:22 2012 From: eemselle at eso.org (Eric Emsellem) Date: Thu, 08 Mar 2012 21:29:22 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? Message-ID: <4F5916A2.2040604@eso.org> Dear all, I know the title looks a little provocative, but this was obviously done on purpose. I am very impressed by the capabilities of scipy (et al., numpy etc) and have been a fan since years! But one thing (in my opinion) seems to be missing (see below). If it exists, then great (and apologies)! What I didn't find in Scipy (or numpy or..) is *an efficient least-squares fitting routine which can include bounded, or fixed parameters*. This seems like something many people must be needing! I am right now using mpfit.py (from minpack then Craig B. Markwardt for idl and Mark Rivers for python), which I did integrate in the package I am developing. It is much faster than many other routines in scipy although Adam Ginsburg did mention some test-bench he conducted some time ago, showing that leastsq was quite efficient. It can include bounds, fixed parameters etc. And it works great! But this is probably not the best way to have such a stand-alone routine... and it is far from being optimised for the modern python. So: is there ANY plan for having such a module in Scipy?? I think (personally) that this is a MUST DO. This is typically the type of routines that I hear people use in e.g., idl etc. If this could be an optimised, fast (and easy to use) routine, all the better. Any input is welcome! Thanks. 
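For readers hitting the same limitation: one workaround, sketched here only as an illustration (it is not a substitute for a proper bounded Levenberg-Marquardt such as mpfit or lmfit), is to minimize the sum of squared residuals with one of the bounded minimizers scipy already ships, for example fmin_l_bfgs_b; a parameter can effectively be held fixed by simply not treating it as free. The model and data below are invented for the example.

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Invented data from an exponential decay.
rng = np.random.RandomState(2)
xdata = np.linspace(0.0, 4.0, 60)
ydata = 2.5 * np.exp(-1.3 * xdata) + 0.05 * rng.randn(60)

def sum_of_squares(params):
    amplitude, rate = params
    resid = ydata - amplitude * np.exp(-rate * xdata)
    return np.sum(resid ** 2)

p0 = [1.0, 1.0]
bounds = [(0.0, 10.0), (0.0, 5.0)]   # box constraints on amplitude and rate
popt, fval, info = fmin_l_bfgs_b(sum_of_squares, p0, approx_grad=True, bounds=bounds)
print(popt)

The trade-off is that, unlike leastsq, this route does not hand back a covariance estimate for the parameters, so uncertainties have to be obtained separately.
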
Eric From adam.ginsburg at colorado.edu Thu Mar 8 15:40:31 2012 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Thu, 8 Mar 2012 13:40:31 -0700 Subject: [SciPy-User] scipy compiles, but importing interpolate fails Message-ID: Hi, I've recently (surprisingly) gotten scipy to compile by following these http://blog.hyperjeff.net/?p=160 instructions. However, if I try to import scipy.interpolate, it fails. I'm trying to install scipy into a virtualenv environment, though I don't think that's the issue because I have another install in a Framework that sees the same error. I'm using numpy 1.6.1, scipy 0.10.1, mac OS X 10.6.8. Can anyone help me understand the following error? $ ~/virtual-python/bin/python -c "import scipy, scipy.interpolate" Traceback (most recent call last): File "", line 1, in File "/Users/adam/virtual-python/lib/python2.7/site-packages/scipy/interpolate/__init__.py", line 156, in from ndgriddata import * File "/Users/adam/virtual-python/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.py", line 9, in from interpnd import LinearNDInterpolator, NDInterpolatorBase, \ File "numpy.pxd", line 174, in init interpnd (scipy/interpolate/interpnd.c:7771) ValueError: numpy.ndarray has the wrong size, try recompiling Thanks, -- Adam Ginsburg Graduate Student Center for Astrophysics and Space Astronomy University of Colorado at Boulder http://casa.colorado.edu/~ginsbura/ From gael.varoquaux at normalesup.org Thu Mar 8 16:07:22 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 8 Mar 2012 22:07:22 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <4F5916A2.2040604@eso.org> References: <4F5916A2.2040604@eso.org> Message-ID: <20120308210722.GC12436@phare.normalesup.org> I am sorry I am going to react to the provocation. As some one who spends a fair amount of time working on open source software I hear such remarks quite often: 'why is feature foo not implemented in package bar?. I am finding it harder and harder not to react negatively to these emails. Now I cannot consider myself as a contributor to scipy, and thus I can claim that I am not taking your comment personally. Why isn't scipy not up to the task? Will, the answer is quite simple: because it's developed by volunteers that do it on their spare time, late at night too often, or companies that put some of their benefits in open source rather in locking down a market. 90% of the time the reason the feature isn't as good as you would want it is because of lack of time. I personally find that suggesting that somebody else should put more of the time and money they are already giving away in improving a feature that you need is almost insulting. I am aware that people do not realize how small the group of people that develop and maintain their toys is. Borrowing from Fernando Perez's talk at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide 80), the number of people that do 90% of the grunt work to get the core scientific Python ecosystem going is around two handfuls. I'd like to think that it's a problem of skill set: users that have the ability to contribute are just too rare. This is not entirely true, there are scores of skilled people on the mailing lists. You yourself mention that you are developing a package. Sorry for the rant, but if you want things to improve, you will have more successes sending in pull request than messages on mailing list that sound condescending to my ears. 
I hope that I haven't overreacted too badly. Ga?l From gael.varoquaux at normalesup.org Thu Mar 8 16:09:35 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 8 Mar 2012 22:09:35 +0100 Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: References: Message-ID: <20120308210935.GD12436@phare.normalesup.org> On Thu, Mar 08, 2012 at 01:40:31PM -0700, Adam Ginsburg wrote: > from interpnd import LinearNDInterpolator, NDInterpolatorBase, \ > File "numpy.pxd", line 174, in init interpnd > (scipy/interpolate/interpnd.c:7771) > ValueError: numpy.ndarray has the wrong size, try recompiling My guess is that the numpy headers that were used during the compilation of scipy do no correspond to the numpy that is being imported when you are trying to import scipy. In other words, your compilation environment doesn't match well your run time environment. HTH, Gael From keflavich at gmail.com Thu Mar 8 16:59:44 2012 From: keflavich at gmail.com (Keflavich) Date: Thu, 8 Mar 2012 13:59:44 -0800 (PST) Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: <20120308210935.GD12436@phare.normalesup.org> References: <20120308210935.GD12436@phare.normalesup.org> Message-ID: That's plausible. How do I specify which numpy is used when compiling scipy? On Mar 8, 2:09?pm, Gael Varoquaux wrote: > On Thu, Mar 08, 2012 at 01:40:31PM -0700, Adam Ginsburg wrote: > > ? ? from interpnd import LinearNDInterpolator, NDInterpolatorBase, \ > > ? File "numpy.pxd", line 174, in init interpnd > > (scipy/interpolate/interpnd.c:7771) > > ValueError: numpy.ndarray has the wrong size, try recompiling > > My guess is that the numpy headers that were used during the compilation > of scipy do no correspond to the numpy that is being imported when you > are trying to import scipy. In other words, your compilation environment > doesn't match well your run time environment. > > HTH, > > Gael > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From gael.varoquaux at normalesup.org Thu Mar 8 17:01:40 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 8 Mar 2012 23:01:40 +0100 Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: References: <20120308210935.GD12436@phare.normalesup.org> Message-ID: <20120308220140.GA19681@phare.normalesup.org> On Thu, Mar 08, 2012 at 01:59:44PM -0800, Keflavich wrote: > That's plausible. How do I specify which numpy is used when compiling > scipy? It should be the one that is imported by Python when you type 'import numpy'. Basically, in scipy's 'setup.py', the header are found using the 'numpy.get_include_folder()' function. Ga?l From eemselle at eso.org Thu Mar 8 18:12:17 2012 From: eemselle at eso.org (Eric Emsellem) Date: Fri, 09 Mar 2012 00:12:17 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <20120308210722.GC12436@phare.normalesup.org> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> Message-ID: <4F593CD1.6010207@eso.org> Dear Gael thanks for the feedback. Well yes, I thought of the fact that an email (with all the drawbacks of such a medium) may not be the right way, and that my message may be misinterpreted. I obviously didn't mean to be offensive here. 
I am one of the biggest fan of python et al., open source in general etc, and I try to contribute whenever I can with my very limited expertise (I recently opened a github account and am still fighting to finalised my badly-written package to release it in case it could be useful to anyone). I believe it is fine to react as you did, because it does at least show that people care (seeing the bright side). I apologize: if I offended you, I probably offended others. This was definitely not my initial intent. And I was, for sure, not condescending, by very very far, as again I am impressed by what has been achieved (I did use python at a time when most modules were very buggy or hard to handle, and you had to tweak your system to make it work at all, not talking about early version of e.g. linux distribs). I am putting energy (at my own, very modest, level) to have more users of e.g., python. This means organising tutorials and providing advice when relevant. I try to do that as much as I can. The question about a leastsq I relayed is often what I hear from non-experts when looking at what is available. I always first try to look for an easy solution, to convince them that something like that exists, or that you can solve the problem anyway. In the specific case I describe, I didn't find a way out (besides using the very useful implementation of mpfit!). I do believe that, considering the amazing amount of extensive development in scipy and related packages, such a routine should be available and directly linked with the main, big, packages. This was basically my two cents (probably badly toned) to trigger a reaction. I did for sure get one, but not the one I expected. So once more: apologies to all I may have offended. Let's see where this goes now. I see that there is a lmfit package (thanks Matt for the input!). I'll have a look at this asap and try to test it on my own minimisation problem. I do think that having a well-tested module integrated within scipy would be a big plus. Apart from testing these on my specific problems, I cannot offer much considering my limited expertise (in programming and maths). cheers Eric On 03/08/2012 10:07 PM, Gael Varoquaux wrote: > I am sorry I am going to react to the provocation. > > As some one who spends a fair amount of time working on open source > software I hear such remarks quite often: 'why is feature foo not > implemented in package bar?. I am finding it harder and harder not to > react negatively to these emails. Now I cannot consider myself as a > contributor to scipy, and thus I can claim that I am not taking your > comment personally. > From eemselle at eso.org Thu Mar 8 18:17:19 2012 From: eemselle at eso.org (Eric Emsellem) Date: Fri, 09 Mar 2012 00:17:19 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> Message-ID: <4F593DFF.90101@eso.org> > Yes, see https://github.com/newville/lmfit-py, which does everything > you ask for, and a bit more, with the possible exception of "being > included in scipy". For what its worth, I work with Mark Rivers > (who's no longer actively developing Python), and our group is full of > IDL users who are very familiar with Markwardt's implementation. 
> > The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK > directly, so has the advantage of not being implemented in pure IDL or > Python. It is definitely faster than mpfit.py. > > With lmfit-py, one writes a python function-to-minimize that takes a > list of Parameters instead of the array of floating point variables > that scipy.optimize.leastsq() uses. Each Parameter can be freely > varied of fixed, have upper and/or lower bounds placed on them, or be > written as algebraic expressions of other Parameters. Uncertainties > in varied Parameters and correlations between Parameters are estimated > using the same "scaled covariance" method as used in > scipy.optimize.curve_fit(). There is limited support for > optimization methods other than scipy.optimize.leastsq(), but I don't > find these methods to be very useful for the kind of fitting problems > I normally see, so support for them may not be perfect. > > Whether this gets included into scipy is up to the scipy developers. > I'd be happy to support this module within scipy or outside scipy. > I have no doubt that improvements could be made to lmfit.py. If you > have suggestion, I'd be happy to hear them. looks great! I'll have a go at this, as mentioned in my previous post. I believe that leastsq is probably the fastest anyway (according to the test Adam mentioned to me today) so this could be it. I'll make a test and compare it with mpfit (for the specific case I am thinking of, I am optimising over ~10^5-6 points with ~90 parameters...). thanks again for this, and I'll try to report on this (if relevant) asap. Eric From keflavich at gmail.com Thu Mar 8 18:27:48 2012 From: keflavich at gmail.com (Keflavich) Date: Thu, 8 Mar 2012 15:27:48 -0800 (PST) Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: <20120308220140.GA19681@phare.normalesup.org> References: <20120308210935.GD12436@phare.normalesup.org> <20120308220140.GA19681@phare.normalesup.org> Message-ID: <1dfb3f19-81a1-42b7-bb6e-16ec2084e964@q18g2000yqh.googlegroups.com> Well, it took another half-dozen clean rebuilds, but I got it working. Thanks! (clarification: it's numpy.get_include(), not numpy.get_include_folder(), I think) On Mar 8, 3:01?pm, Gael Varoquaux wrote: > On Thu, Mar 08, 2012 at 01:59:44PM -0800, Keflavich wrote: > > That's plausible. ?How do I specify which numpy is used when compiling > > scipy? > > It should be the one that is imported by Python when you type 'import > numpy'. Basically, in scipy's 'setup.py', the header are found using the > 'numpy.get_include_folder()' function. > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Thu Mar 8 19:00:12 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 09 Mar 2012 01:00:12 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <4F5916A2.2040604@eso.org> References: <4F5916A2.2040604@eso.org> Message-ID: 08.03.2012 21:29, Eric Emsellem kirjoitti: [clip] > What I didn't find in Scipy (or numpy or..) is *an efficient > least-squares fitting routine which can include bounded, or fixed > parameters*. This seems like something many people must be needing! I am > right now using mpfit.py (from minpack then Craig B. Markwardt for idl > and Mark Rivers for python), which I did integrate in the package I am > developing. mpfit is a Fortran-to-Python translation of a MINPACK routine. 
Scipy's leastsq uses the original MINPACK Fortran code, so it's probably more efficient than mpfit.py. However, the bounded parameters seems to be a more recent addition that are not in the original. The good news is that mpfit license seems at first sight compatible with Scipy's. There's also an existing pull request for reimplementation of Levenberg-Marquardt which might also work as a base for further work, although IIRC it didn't implement bound limits. The only thing missing is someone who needs this stuff and is not averse for a little bit of dirty work, combining the existing pieces and making sure that the API makes sense. -- Pauli Virtanen From david_baddeley at yahoo.com.au Thu Mar 8 21:37:48 2012 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 8 Mar 2012 18:37:48 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> Message-ID: <1331260668.7917.YahooMailNeo@web113405.mail.gq1.yahoo.com> You guys beat me too it - I just wanted to add that support for fixed parameters is already available in leastsq (albeit with an interface which means you have to decide which parameters are fixed when you're writing your objective function), and it's not too hard to kludge bounds by performing variable substitution (using,?for example, the square of the variable if you want a one-ended constraint, or a sigmoidal function of the variable such as erf or the logistic function for an interval). This may in fact be preferable to the approach taken by mpfit in which parameters are "pegged" at the boundary as soon as they touch it. cheers, David? ________________________________ From: Pauli Virtanen To: scipy-user at scipy.org Sent: Friday, 9 March 2012 1:00 PM Subject: Re: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 08.03.2012 21:29, Eric Emsellem kirjoitti: [clip] > What I didn't find in Scipy (or numpy or..) is *an efficient > least-squares fitting routine which can include bounded, or fixed > parameters*. This seems like something many people must be needing! I am > right now using mpfit.py (from minpack then Craig B. Markwardt for idl > and Mark Rivers for python), which I did integrate in the package I am > developing. mpfit is a Fortran-to-Python translation of a MINPACK routine. Scipy's leastsq uses the original MINPACK Fortran code, so it's probably more efficient than mpfit.py. However, the bounded parameters seems to be a more recent addition that are not in the original. The good news is that mpfit license seems at first sight compatible with Scipy's. There's also an existing pull request for reimplementation of Levenberg-Marquardt which might also work as a base for further work, although IIRC it didn't implement bound limits. The only thing missing is someone who needs this stuff and is not averse for a little bit of dirty work, combining the existing pieces and making sure that the API makes sense. -- Pauli Virtanen _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_baddeley at yahoo.com.au Thu Mar 8 22:14:23 2012 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 8 Mar 2012 19:14:23 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: <4F593DFF.90101@eso.org> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> Message-ID: <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> >From a pure performance perspective, you're probably going to be best setting your bounds by variable substitution (particularly if they're only single-ended - x**2 is cheap) - you really don't want to have the for loops, dictionary lookups and conditionals that lmfit introduces for it's bounds checking inside your objective function. I think a high level wrapper that permitted bounds, an unadulterated goal function, and setting which parameters to fit, but also retained much of the raw speed of leastsq could be accomplished with some clever on the fly code generation (maybe also using Sympy to automatically derive the Jacobian). Would make an interesting project ... David ________________________________ From: Eric Emsellem To: Matthew Newville Cc: scipy-user at scipy.org; scipy-user at googlegroups.com Sent: Friday, 9 March 2012 12:17 PM Subject: Re: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? > Yes, see https://github.com/newville/lmfit-py,? which does everything > you ask for, and a bit more, with the possible exception of "being > included in scipy".? For what its worth, I work with Mark Rivers > (who's no longer actively developing Python), and our group is full of > IDL users who are very familiar with Markwardt's implementation. > > The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK > directly, so has the advantage of not being implemented in pure IDL or > Python. It is definitely faster than mpfit.py. > > With lmfit-py, one writes a python function-to-minimize that takes a > list of Parameters instead of the array of floating point variables > that scipy.optimize.leastsq() uses. Each Parameter can be freely > varied of fixed, have upper and/or lower bounds placed on them, or be > written as algebraic expressions of other Parameters.? Uncertainties > in varied Parameters and correlations between Parameters are estimated > using the same "scaled covariance" method as used in > scipy.optimize.curve_fit().? There is limited support for > optimization methods other than scipy.optimize.leastsq(), but I don't > find these methods to be very useful for the kind of fitting? problems > I normally see, so support for them may not be perfect. > > Whether this gets included into scipy is up to the scipy developers. > I'd be happy to support this module within scipy or outside scipy. > I have no doubt that improvements could be made to lmfit.py.? If you > have suggestion, I'd be happy to hear them. looks great! I'll have a go at this, as mentioned in my previous post. I believe that leastsq is probably the fastest anyway (according to the test Adam mentioned to me today) so this could be it. I'll make a test and compare it with mpfit (for the specific case I am thinking of, I am optimising over ~10^5-6 points with ~90 parameters...). thanks again for this, and I'll try to report on this (if relevant) asap. Eric _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mdekauwe at gmail.com Thu Mar 8 22:31:10 2012 From: mdekauwe at gmail.com (mdekauwe) Date: Thu, 8 Mar 2012 19:31:10 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Is there a better way to read a CSV file and store for processing? Message-ID: <33469432.post@talk.nabble.com> Hi, So I was wondering if there might be a "better" way to go about reading a CSV file and storing it for later post-processing. What I have written does the job fine, but I think there might be a better way as I seem to be duplicating some steps to get around things I don't know. For example I guess ideally I would like to read the CSV file into a numpy array one could access by variable names but I couldn't work that out. Any thoughts welcome. Thanks... CSV file looks a bit like this Year,Day of the year,NPP, etc... --,--,some units, etc... YEAR,DOY,NPP, etc... 1996.0,1.0,10.09, etc... etc etc Code... #!/usr/bin/env python """ Example of reading CSV file and some simple processing... 1. Read CSV file into a python dictionary/list 2. Save the data to a pickle object, to speed up reading back in 3. Read the object back in to test everything is fine 4. Get the timeseries of one of the variables, print it and plot it... """ __author__ = "Martin De Kauwe" __version__ = "1.0 (09.03.2012)" __email__ = "mdekauwe at gmail.com" import numpy as np import sys import glob import csv import cPickle as pickle def main(): for fname in glob.glob("*.csv"): data = read_csv_file(fname, head_length=3, delim=",") # save the data to the hard disk for quick access later pkl_fname = "test_model_data.pkl" save_dictionary(data, pkl_fname) # read the data back in to check it worked... f = open(pkl_fname, 'rb') data = pickle.load(f) npp = get_var(data, "NPP") for i in xrange(len(npp)): print npp[i] import matplotlib.pyplot as plt plt.plot(npp, "ro-") plt.show() def read_csv_file(fname, head_length=None, delim=None): """ read the csv file into a dictionary """ f = open(fname, "rb") # read the correct header keys... f = find_header_keys(f, line_with_keys=2) # read the data into a nice big dictionary...and return as a list reader = csv.DictReader(f, delimiter=',') data = [row for row in reader] return data def find_header_keys(fp, line_with_keys=None): """ Incase the csv file doesn't have the header keys on the first line, advanced the pointer until the line we desire """ dialect = csv.Sniffer().sniff(fp.read(1024)) fp.seek(0) for i in xrange(line_with_keys): next(fp) return fp def save_dictionary(data, outfname): """ save dictionary to disk, i.e. pickle it """ out_dict = open(outfname, 'wb') pickle.dump(data, out_dict, pickle.HIGHEST_PROTOCOL) out_dict.close() def get_var(data, var): """ return the entire time series for a given variable """ return np.asarray([data[i][var] for i in xrange(len(data))]) if __name__ == "__main__": main() -- View this message in context: http://old.nabble.com/Is-there-a-better-way-to-read-a-CSV-file-and-store-for-processing--tp33469432p33469432.html Sent from the Scipy-User mailing list archive at Nabble.com. From william.ratcliff at gmail.com Fri Mar 9 01:20:37 2012 From: william.ratcliff at gmail.com (william ratcliff) Date: Fri, 9 Mar 2012 01:20:37 -0500 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: A response to Gael: I don't think the problem is just a question of motivation/effort being put in by people who are asking for new features. Perhaps I'm overly optimistic, but I think that most people are aware of the effort put in by very busy people to put together scipy and are rather grateful that the tool exists. Many of us would like to see python adopted more in our organizations and find that some tools that our users are used to are not available and would like to see them made available. However, I think that the barrier to inclusion in scipy seems high. A bit of history: We also used the IDL version of mpfit--in moving to python, I looked for an analogue and found mpfit.py that at the time, relied on Numeric. I made a port to numpy and found Sergei Koposov had made a similar port ( http://code.google.com/p/astrolibpy/wiki/AstrolibpyContents) in addition to fixing some bugs in the original and adding extensions. I talked with all of the stakeholders to receive licensing permission for inclusion into scipy. However, there were questions about the coding style, and it never made it in. (I'm happy to see the lmfit project on github) Also, related to bounds, in the past, the scipy implementation of simulated annealing would wander outside of bounds (which is not my expected behavior--or that of many people), so I made a patch, which if a guess would place the next point out of bounds, would stay at the same spot and guess again (it may not be optimal, but it preserves ergodicity). I was told that if someone actually wanted the function to stay in bounds, they could add a penalty function. The end result: I have my own version of anneal.py and mpfit.py. I would like to contribute. I have students that work on things that might be of general use. However, the process needs to be more streamlined if the community wants more participation. Sympy is a good example of this--if you have something that seems useful, they are very happy to take it--and clean it up later (or help you to). I've only watched sckit-learn from a far, but http://scikit-learn.org/stable/developers/index.html seems to provide rather clear instructions for contributing... Take the question of bounds for example--is it better to have no easy way of implementing bounds, or to have the cleanest/most efficient piece of code? What is the actual process of contributing these days? For example, for making a patch, now that the codebase is on github. Do we make a fork, patch, and point to the fork? Submit a patch? If so, where? What exactly are scikits? What determines if something belongs in scipy.optimize as compared to a scikit? What is the process for creating a scikit? The webpage is a bit vague. Do scikits share more than a namespace? Sorry that this is a bit disorganized, but the TL;DR is that I think scipy could do more to make it easier for people to contribute...I understand the need to have maintainable code in a large project, but in many cases, having a less than perfect implementation (with tests) would be better than having no implementation...Also, what may be easy for us, may not be easy to many users of scipy, so having convenience methods is worthwhile... 
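To make the simulated-annealing point above concrete, the stay-and-retry step
was roughly the following sketch (hypothetical helper name and toy bounds, not
the actual patch that was submitted):

import numpy as np

def bounded_guess(x_current, propose, lower, upper, max_tries=50):
    """Draw a new annealing candidate; if it falls outside the bounds,
    stay at the current point and draw again (keeps the chain ergodic)."""
    for _ in xrange(max_tries):
        x_new = propose(x_current)
        if np.all(x_new >= lower) and np.all(x_new <= upper):
            return x_new
    return x_current   # every draw landed out of bounds; keep the current point

# e.g. a Gaussian proposal whose width would shrink with temperature
step = lambda x: x + 0.5 * np.random.randn(len(x))
x_next = bounded_guess(np.array([0.2, 0.3]), step, lower=0.0, upper=1.0)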
Best, William On Thu, Mar 8, 2012 at 10:14 PM, David Baddeley wrote: > From a pure performance perspective, you're probably going to be best > setting your bounds by variable substitution (particularly if they're only > single-ended - x**2 is cheap) - you really don't want to have the for > loops, dictionary lookups and conditionals that lmfit introduces for it's > bounds checking inside your objective function. > > I think a high level wrapper that permitted bounds, an unadulterated goal > function, and setting which parameters to fit, but also retained much of > the raw speed of leastsq could be accomplished with some clever on the fly > code generation (maybe also using Sympy to automatically derive the > Jacobian). Would make an interesting project ... > > David > > ------------------------------ > *From:* Eric Emsellem > *To:* Matthew Newville > *Cc:* scipy-user at scipy.org; scipy-user at googlegroups.com > *Sent:* Friday, 9 March 2012 12:17 PM > > *Subject:* Re: [SciPy-User] Least-squares fittings with bounds: why is > scipy not up to the task? > > > > > Yes, see https://github.com/newville/lmfit-py, which does everything > > you ask for, and a bit more, with the possible exception of "being > > included in scipy". For what its worth, I work with Mark Rivers > > (who's no longer actively developing Python), and our group is full of > > IDL users who are very familiar with Markwardt's implementation. > > > > The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK > > directly, so has the advantage of not being implemented in pure IDL or > > Python. It is definitely faster than mpfit.py. > > > > With lmfit-py, one writes a python function-to-minimize that takes a > > list of Parameters instead of the array of floating point variables > > that scipy.optimize.leastsq() uses. Each Parameter can be freely > > varied of fixed, have upper and/or lower bounds placed on them, or be > > written as algebraic expressions of other Parameters. Uncertainties > > in varied Parameters and correlations between Parameters are estimated > > using the same "scaled covariance" method as used in > > scipy.optimize.curve_fit(). There is limited support for > > optimization methods other than scipy.optimize.leastsq(), but I don't > > find these methods to be very useful for the kind of fitting problems > > I normally see, so support for them may not be perfect. > > > > Whether this gets included into scipy is up to the scipy developers. > > I'd be happy to support this module within scipy or outside scipy. > > I have no doubt that improvements could be made to lmfit.py. If you > > have suggestion, I'd be happy to hear them. > > looks great! I'll have a go at this, as mentioned in my previous post. I > believe that leastsq is probably the fastest anyway (according to the > test Adam mentioned to me today) so this could be it. I'll make a test > and compare it with mpfit (for the specific case I am thinking of, I am > optimising over ~10^5-6 points with ~90 parameters...). > > thanks again for this, and I'll try to report on this (if relevant) asap. > > Eric > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.varoquaux at normalesup.org Fri Mar 9 01:31:20 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 9 Mar 2012 07:31:20 +0100 Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: <1dfb3f19-81a1-42b7-bb6e-16ec2084e964@q18g2000yqh.googlegroups.com> References: <20120308210935.GD12436@phare.normalesup.org> <20120308220140.GA19681@phare.normalesup.org> <1dfb3f19-81a1-42b7-bb6e-16ec2084e964@q18g2000yqh.googlegroups.com> Message-ID: <20120309063120.GC2046@phare.normalesup.org> On Thu, Mar 08, 2012 at 03:27:48PM -0800, Keflavich wrote: > Well, it took another half-dozen clean rebuilds, but I got it > working. Thanks! Well, good job on persisting: half a dozen rebuild is too much :). Do you have an idea what finally made the difference? > (clarification: it's numpy.get_include(), not > numpy.get_include_folder(), I think) Good point, I was writing this email in a rush, without checking my facts. Gael > On Mar 8, 3:01?pm, Gael Varoquaux > wrote: > > On Thu, Mar 08, 2012 at 01:59:44PM -0800, Keflavich wrote: > > > That's plausible. ?How do I specify which numpy is used when compiling > > > scipy? > > It should be the one that is imported by Python when you type 'import > > numpy'. Basically, in scipy's 'setup.py', the header are found using the > > 'numpy.get_include_folder()' function. > > Ga?l > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info From gael.varoquaux at normalesup.org Fri Mar 9 01:57:10 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 9 Mar 2012 07:57:10 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: <20120309065710.GD2046@phare.normalesup.org> David, You raise good points. On the one hand, contributing to scipy may be a bit more technical than it should be for someone wanting to add simple things: it requires building the beast! On the other hand, a lot of the points you raise simply boil down to the fact that putting independent piece of code together in a larger, somewhat consistent project, is actually much more work than writing the individual pieces of code. There is a large overhead of big project. It is also much more value. With regards to the scikits, that you mention, I am a huge fan of the scikits approach, because it enables to break down a bit that friction of large project, to the cost of some fragmentation. Let us not fool us, the major scikits can easily grow fat and end up in the same situation, although if I believe the N^2 complexity increase ( http://www.computer.org/csdl/trans/ts/1979/02/01702600-abs.html ) means that the union of two projects will tend to be significantly harder to handle that each project separately. To answer your question about scikits: they are nothing much more than a brand name, and maybe a bit of a community trend. 
Both are good, because they help create dynamism, but they are not enough per se. With regards to how disorganized things are, you are definitely right. Organizing a community, writing contribution guidelines, keeping web pages up to date, takes a lot of time. Such work, also from volunteers, is also needed to keep a project alive, and even more a 'meta project', like scipy. Ga?l PS: As a side note, I am recruiting on a regular basis (say one young engineer everything year) people to work part time on the scipy ecosystem (mainly on scikit-learn, but side projects are encouraged). The salary doesn't compete with what the industry has to offer, and I've had a hard time finding good people. From eemselle at eso.org Fri Mar 9 03:46:37 2012 From: eemselle at eso.org (Eric Emsellem) Date: Fri, 09 Mar 2012 09:46:37 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: <4F59C36D.3020607@eso.org> Thanks David for this. The main issue is not in fact to solve the pb for myself (with some variable substitution or..) as I can also think of e.g. interfacing C/fortran efficient codes with python via standard wrapping (I had used this with e.g. the amazing NAG library with the help of expert programmers). There are 3 issues here (which are closely related to each others): - to have such a module integrated in scipy means that new python users would find the module by default and do not need to install more and more modules. This is one of the problem many people encounter. In the early days of scipy (or python) things had to be installed, tuned, re-installed, etc. This was fun but does not allow a large community to join. There are efforts to coordinate, homogeneise, optimise all this. Scipy is one of these (and an impressive success). Astropy is another path specific to astronomy (my field). But for such complex routines, we need (I believe) things which are "simple" to use and already integrated. I acknowledge this is a huge effort, both to develop the module, and integrate it and I am not blaming anyone here (on the contrary, as mentioned, I am very impressed by what has been achieved!). I am just saying: I believe this is a "must have". People who will need such a module for their own goals could then use it transparently. - if the specifics of the bounds/fixed parameters are in the user-defined function itself, then we loose it I think. To me it is then nearly equivalent (although slightly better), for a new python user, as having to download and install several additional packages. You need to spend some time tuning your function, and cannot change it on the fly. On the long run, I would be surprised if the "non-advanced" users would really go for this. They would turn to e.g., idl or whatever is convenient for them. - When contributing to an effort like astropy (via e.g., github) and when you do post a new package, you would like to avoid requiring the installation of 2-3 more packages on top of the one you are proposing (even if their installation is automatised). At the moment, my package includes mpfit.py as a sub-module. 
This is bad practice (as various packages will have various versions of mpfit maybe, and mpfit is not optimised) but this guarantees that the person who downloads the package can just rely on that. In astropy, the guideline is that APART from matplotlib, scipy/numpy, you shouldn't have to download more if you wish to have a specific piece of software work on your computer. This ensures that the community reacts positively to this coordinating effort (which is very significant) and that it will attract more and more people around these beautiful developments, namely numpy, scipy et al. Of course, this is just a biased opinion from a non-expert python user! :-) cheers Eric On 03/09/2012 04:14 AM, David Baddeley wrote: > From a pure performance perspective, you're probably going to be best setting > your bounds by variable substitution (particularly if they're only single-ended > - x**2 is cheap) - you really don't want to have the for loops, dictionary > lookups and conditionals that lmfit introduces for it's bounds checking inside > your objective function. > > I think a high level wrapper that permitted bounds, an unadulterated goal > function, and setting which parameters to fit, but also retained much of the raw > speed of leastsq could be accomplished with some clever on the fly code > generation (maybe also using Sympy to automatically derive the Jacobian). Would > make an interesting project ... From adnothing at gmail.com Fri Mar 9 03:47:24 2012 From: adnothing at gmail.com (Adrien Gaidon) Date: Fri, 9 Mar 2012 09:47:24 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: 2012/3/9 william ratcliff > > Take the question of bounds for example--is it better to have no easy way > of implementing bounds, or to have the cleanest/most efficient piece of > code? What is the actual process of contributing these days? For > example, for making a patch, now that the codebase is on github. Do we > make a fork, patch, and point to the fork? Submit a patch? If so, where? > > > What exactly are scikits? What determines if something belongs in > scipy.optimize as compared to a scikit? What is the process for creating a > scikit? The webpage is a bit vague. Do scikits share more than a > namespace? > > Sorry that this is a bit disorganized, but the TL;DR is that I think scipy > could do more to make it easier for people to contribute...I understand the > need to have maintainable code in a large project, but in many cases, > having a less than perfect implementation (with tests) would be better than > having no implementation... > I 100% agree with William here and think *how* to contribute is at the heart of the problem. I think many users in the scipy multiverse have their own `utils.py` or other home-made modules, which may contain code useful to a wide audience and absent of the numpy/scipy or related well-known libraries (e.g. the excellent scikit-learn). As all pythonistas have a good heart, I'm sure they would like to share it, but, as William said, the road along that path is unclear, bumpy and sometimes not super friendly. For instance, I wrote a simple multi-dimensional digitize function and posted a gist (https://gist.github.com/1509853) to this list (or numpy or some other relevant mailing list...). 
Before doing that, I really pondered: "is it useful enough? is it not trivial? where and how should I contribute it?" etc. All these metaphysical questions are a barrier to the wannabe-contributor, that, IMHO, filters out a lot of useful code. Especially for such small contributions, the hassle becomes superior to the expected gain for the community and the code ends up self-censored or forgotten (we're all always very busy). That's problematic, because I believe in the "emergence" property of open source projects: a sum of small contributions can make a powerful library. Furthermore, it seems that large projects tend to have API zealots that don't even want to see code unless it can be directly merged in master (caricature). I totally understand that, and think it's in the nature of open source projects in order to not grow anarchistically. However, this also prevents small "diamonds in the rough" to be discovered, or useful temporary hole-filling solutions to be proposed until a proper one is available. To me, this is a false problem due to the fact that the only advertised way to contribute is by forking + pull request. But not everybody is a scipy source code guru! Therefore, I think it's necessary for the community to discuss this issue, get a consensus on the desired ways to contribute with respect to the contribution type (very important), write a small tutorial or document explaining this, and, most importantly, publicly advertise it on the website. So far, the ways to contribute that I know of are: - fork + pull request: high barrier of entry - mailing list + gist: quick & easy, but one must be willing to "spam" everyone - scipy central If I have forgotten any, feel free to add them! To conclude this long rant, I think that http://scipy-central.org/ is a great idea with lots of potential, especially for sharing small snippets. To me, it can be some kind of "social network for code snippets", and the comments / voting / popularity system can allow "true scipy contributors" to peek at the best contributions and clean / test / integrate their selection. That way, users can just "dump" their code, without worrying about the difficult engineering issues of integration into scipy! But it needs more advertising to gain visibility to every scipy user. For now, there is not even a link on http://www.scipy.org/ and Google results are not looking either... Cheers, Adrien -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Mar 9 06:04:00 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 09 Mar 2012 12:04:00 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: Hi, 09.03.2012 07:20, william ratcliff kirjoitti: [clip] > We also used the IDL version of mpfit--in moving to python, I looked > for an analogue and found mpfit.py that at the time, relied on Numeric. > I made a port to numpy and found Sergei Koposov had made a similar > port (http://code.google.com/p/astrolibpy/wiki/AstrolibpyContents) in > addition to fixing some bugs in the original and adding extensions. I > talked with all of the stakeholders to receive licensing permission for > inclusion into scipy. However, there were questions about the coding > style, and it never made it in. 
(I'm happy to see the lmfit project on > github) Sorry, I completely forgot you had already done work for inclusion of mpfit, and I guess it was my reply that halted it: http://mail.scipy.org/pipermail/scipy-dev/2009-May/011947.html My intent here was honestly not to be a total API zealot and say that *everything* needs to be fixed before checkin --- just that errors should raise exceptions, and some minor stylistic cleanup should be made --- the rest could be cleaned up later. Though, I can understand that a long laundry list of things to correct is not the nicest first response to code contributions. There's also a second issue here, which is more organizational --- since there was no procedure for the contributions, I lost track of where this work was progressing, and eventually forgot about it. This is where Github's pull requests improve the situation by a large amount. In principle Trac could serve the same role, but in practice it turns out to work somewhat less well. [clip] > Sorry that this is a bit disorganized, but the TL;DR is that I think > scipy could do more to make it easier for people to contribute... I > understand the need to have maintainable code in a large project, but in > many cases, having a less than perfect implementation (with tests) would > be better than having no implementation... Also, what may be easy for us, > may not be easy to many users of scipy, so having convenience methods is > worthwhile... The fine line to walk here is that there must be some quality control for code contributions, to avoid ending up with a set of routines that are awkward to use or don't work as promised (except around the research problem for which it was first written). The point in doing as much as possible before accepting contributions is that if this is left for later, the contributor may be MIA and there's nobody around who understands the piece of code well, and you're committed to a clunky API which you cannot easily change anymore if there has been a release in between. The flip side is of course that the barrier to contributions is higher, and it should not be made too high. The scipy-central.org is a good solution for just sharing research code with minimum hassle, and definitely could use more advertisement. Pauli From josef.pktd at gmail.com Fri Mar 9 07:12:09 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 9 Mar 2012 07:12:09 -0500 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: On Fri, Mar 9, 2012 at 6:04 AM, Pauli Virtanen wrote: > Hi, > > 09.03.2012 07:20, william ratcliff kirjoitti: > [clip] >> We also used the IDL version of ?mpfit--in moving to python, I looked >> for an analogue and found mpfit.py that at the time, relied on Numeric. >> ? I made a port to numpy and found Sergei Koposov had made a similar >> port (http://code.google.com/p/astrolibpy/wiki/AstrolibpyContents) in >> addition to fixing some bugs in the original and adding extensions. ? I >> talked with all of the stakeholders to receive licensing permission for >> inclusion into scipy. ? However, there were questions about the coding >> style, and it never made it in. 
?(I'm happy to see the lmfit project on >> github) > > Sorry, I completely forgot you had already done work for inclusion of > mpfit, and I guess it was my reply that halted it: > > ? ?http://mail.scipy.org/pipermail/scipy-dev/2009-May/011947.html > > My intent here was honestly not to be a total API zealot and say that > *everything* needs to be fixed before checkin --- just that errors > should raise exceptions, and some minor stylistic cleanup should be made > --- the rest could be cleaned up later. Though, I can understand that a > long laundry list of things to correct is not the nicest first response > to code contributions. > > There's also a second issue here, which is more organizational --- since > there was no procedure for the contributions, I lost track of where this > work was progressing, and eventually forgot about it. This is where > Github's pull requests improve the situation by a large amount. In > principle Trac could serve the same role, but in practice it turns out > to work somewhat less well. > > [clip] >> Sorry that this is a bit disorganized, but the TL;DR is that I think >> scipy could do more to make it easier for people to contribute... I >> understand the need to have maintainable code in a large project, but in >> many cases, having a less than perfect implementation (with tests) would >> be better than having no implementation... Also, what may be easy for us, >> may not be easy to many users of scipy, so having convenience methods is >> worthwhile... > > The fine line to walk here is that there must be some quality control > for code contributions, to avoid ending up with a set of routines that > are awkward to use or don't work as promised (except around the research > problem for which it was first written). The point in doing as much as > possible before accepting contributions is that if this is left for > later, the contributor may be MIA and there's nobody around who > understands the piece of code well, and you're committed to a clunky API > which you cannot easily change anymore if there has been a release in > between. The flip side is of course that the barrier to contributions is > higher, and it should not be made too high. taking anneal as an example None of the scipy "maintainers" knew the background for simulated annealing The function in scipy has several problems, and it might take some time for someone that understands this to make it work better. The proposed patch for bounds, *replaced* the bounds on the updating step with bounds on the parameters instead of adding an additional functionality. http://projects.scipy.org/scipy/ticket/1126 http://projects.scipy.org/scipy/ticket/875 http://article.gmane.org/gmane.comp.python.scientific.devel/10398 I didn't and don't think just changing the behavior was an appropriate patch. Essentially, maintenance for global optimizers in scipy is MIA because of the lack of someone (?) who can evaluate it and convert a patch into a tested improvement of the code. The advantage of a developers' own tools or utilities functions is that the developer can do as much testing and quality control as (s)he feels like or as much as is necessary for a specific use case. For inclusion in scipy the quality control should be higher, in my opinion. One advantage of scipy-central and a commenting system would be that code will get user testing and feedback, so it's easier to evaluate for inclusion in scipy whether the code works as promised, and is useful to a wider audience. 
scipy is missing linear programming, and quadratic programming in the scipy.optimize, and there is no work-around. Both are "must-haves", I think. Josef > > The scipy-central.org is a good solution for just sharing research code > with minimum hassle, and definitely could use more advertisement. > > ? ? ? ?Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From guyer at nist.gov Fri Mar 9 08:50:34 2012 From: guyer at nist.gov (Jonathan Guyer) Date: Fri, 9 Mar 2012 08:50:34 -0500 Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: References: Message-ID: <9B08C4C4-7639-4996-AB7E-82D96ADCCF79@nist.gov> My instructions to FiPy users are at: http://www.matforge.org/fipy/wiki/InstallFiPy/MacOSX/SnowLeopard They're largely based on hyperjeff's, although his sudo mv /System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/numpy \ /System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/numpyX is evil and unnecessary if you pass --no-site-packages to mkvirtualenv. I'm still on SciPy 0.9.0, but I don't have any trouble importing scipy.interpolate. On Mar 8, 2012, at 3:40 PM, Adam Ginsburg wrote: > Hi, I've recently (surprisingly) gotten scipy to compile by following > these http://blog.hyperjeff.net/?p=160 instructions. However, if I > try to import scipy.interpolate, it fails. I'm trying to install > scipy into a virtualenv environment, though I don't think that's the > issue because I have another install in a Framework that sees the same > error. > > I'm using numpy 1.6.1, scipy 0.10.1, mac OS X 10.6.8. > > Can anyone help me understand the following error? > > $ ~/virtual-python/bin/python -c "import scipy, scipy.interpolate" > Traceback (most recent call last): > File "", line 1, in > File "/Users/adam/virtual-python/lib/python2.7/site-packages/scipy/interpolate/__init__.py", > line 156, in > from ndgriddata import * > File "/Users/adam/virtual-python/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.py", > line 9, in > from interpnd import LinearNDInterpolator, NDInterpolatorBase, \ > File "numpy.pxd", line 174, in init interpnd > (scipy/interpolate/interpnd.c:7771) > ValueError: numpy.ndarray has the wrong size, try recompiling > > Thanks, > -- > Adam Ginsburg > Graduate Student > Center for Astrophysics and Space Astronomy > University of Colorado at Boulder > http://casa.colorado.edu/~ginsbura/ > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Fri Mar 9 10:47:03 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Mar 2012 16:47:03 +0100 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: Message-ID: <4F5A25F7.3020408@molden.no> On 08.03.2012 05:25, Peter Cimerman?i? wrote: > To describe my problem into more details, I have a list of ~1000 > bacterial genome lengths and number of certain genes for each one of > them. I'd like to see if there is any correlation between genome lengths > and number of the genes. > It may look like an easy linear regression > problem; No, it does not. If you are working with counts, the appropriate model would usually be Poisson regression. I.e. Generalized linear model with log-link function and Possion probability family. 
I have seen many examples of microbiologists using linear regression when they should actually use Poisson regression (e.g. counting genes) or logistic regression (e.g. dose-response and titration curves). This will do it for you: MATLAB: glmfit from the statistics toolbox R: glm SAS: PROC GLIM Python: statmodels scikit Another example of inappropriate use of linear regression in microbiology is the Lineweaver-Burk plot as substitute for non-linear least-squares (usually Levenberg-Marquardt) to fit a Michelis-Menten curve. Some microbiologists are bevare of this, but they seem to prefer all sorts of ad hoc trickeries like linearizations and variance-stabilizing transforms instead of "just doing it right". As for samples that are not independent, that will affect the final likelihood. If you want to optimize the log-likelhood yourself, to control for this, getting ML estimates by maximizing the log-likelhood is easy with fmin_powell or fmin_bgfs from scipy.optimize. (Powell's method does not even need the gradient.) And if you need the "p-value", you can either use the likelihood ratio or Monte Carlo (e.g. permutation test). Sturla P.S. I think biostatistics courses that biologists are tought do not cover the tools that are most commonly needed. Ronald Fisher (famous for multiple regression and ANOVA) worked with quantitative genetics (e.g. animal and plant breeding). But today most biologists work in molecular biology labs, and other methods for data analysis are often needed. That includes generalized linear models, non-linear regression, image processing (for microscopy), and general signal processing (e.g. electrophysiology). From matt.newville at gmail.com Thu Mar 8 17:09:49 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Thu, 8 Mar 2012 14:09:49 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <4F5916A2.2040604@eso.org> References: <4F5916A2.2040604@eso.org> Message-ID: <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> Dear Eric, On Thursday, March 8, 2012 2:29:22 PM UTC-6, Eric Emsellem wrote: > Dear all, > > I know the title looks a little provocative, but this was obviously done > on purpose. I am very impressed by the capabilities of scipy (et al., > numpy etc) and have been a fan since years! But one thing (in my > opinion) seems to be missing (see below). If it exists, then great (and > apologies)! > > What I didn't find in Scipy (or numpy or..) is *an efficient > least-squares fitting routine which can include bounded, or fixed > parameters*. This seems like something many people must be needing! I am > right now using mpfit.py (from minpack then Craig B. Markwardt for idl > and Mark Rivers for python), which I did integrate in the package I am > developing. It is much faster than many other routines in scipy although > Adam Ginsburg did mention some test-bench he conducted some time ago, > showing that leastsq was quite efficient. It can include bounds, fixed > parameters etc. And it works great! But this is probably not the best > way to have such a stand-alone routine... and it is far from being > optimised for the modern python. > > So: > > is there ANY plan for having such a module in Scipy?? I think > (personally) that this is a MUST DO. This is typically the type of > routines that I hear people use in e.g., idl etc. If this could be an > optimised, fast (and easy to use) routine, all the better. 
Yes, see https://github.com/newville/lmfit-py, which does everything you ask for, and a bit more, with the possible exception of "being included in scipy". For what its worth, I work with Mark Rivers (who's no longer actively developing Python), and our group is full of IDL users who are very familiar with Markwardt's implementation. The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK directly, so has the advantage of not being implemented in pure IDL or Python. It is definitely faster than mpfit.py. With lmfit-py, one writes a python function-to-minimize that takes a list of Parameters instead of the array of floating point variables that scipy.optimize.leastsq() uses. Each Parameter can be freely varied of fixed, have upper and/or lower bounds placed on them, or be written as algebraic expressions of other Parameters. Uncertainties in varied Parameters and correlations between Parameters are estimated using the same "scaled covariance" method as used in scipy.optimize.curve_fit(). There is limited support for optimization methods other than scipy.optimize.leastsq(), but I don't find these methods to be very useful for the kind of fitting problems I normally see, so support for them may not be perfect. Whether this gets included into scipy is up to the scipy developers. I'd be happy to support this module within scipy or outside scipy. I have no doubt that improvements could be made to lmfit.py. If you have suggestion, I'd be happy to hear them. Cheers, --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.newville at gmail.com Thu Mar 8 19:55:49 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Thu, 8 Mar 2012 16:55:49 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <20120308210722.GC12436@phare.normalesup.org> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> Message-ID: <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Gael, On Thursday, March 8, 2012 3:07:22 PM UTC-6, Gael Varoquaux wrote: > > I am sorry I am going to react to the provocation. And I am sorry that I am going to react to your message. I think your reaction is unfair. > As some one who spends a fair amount of time working on open source > software I hear such remarks quite often: 'why is feature foo not > implemented in package bar?. I am finding it harder and harder not to > react negatively to these emails. Now I cannot consider myself as a > contributor to scipy, and thus I can claim that I am not taking your > comment personally. Where I work (a large scientific user facility), there are lots of scientists in what I'll presume is Eric's position -- able and willing to work well with scientific programming tools, but unable to devote the extra time needed to develop core functionality or maintain much work outside of their own area of interest. There are a great many scientists interested in learning and using python. Several people there *are* writing scientific libraries with python. Similarly in the fields I work in, python is widely accepted as an important ecosystem. > Why isn't scipy not up to the task? Will, the answer is quite simple: > because it's developed by volunteers that do it on their spare time, late > at night too often, or companies that put some of their benefits in open > source rather in locking down a market. 
90% of the time the reason the > feature isn't as good as you would want it is because of lack of time. > > I personally find that suggesting that somebody else should put more of > the time and money they are already giving away in improving a feature > that you need is almost insulting. Well, in some sense, Eric's message is an expression of interest.... Perhaps you would prefer that nobody outside the core group of developers or mailing list subscribers asked for any new features or clarification of existing features. > I am aware that people do not realize how small the group of people that > develop and maintain their toys is. Borrowing from Fernando Perez's talk > at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide 80), > the number of people that do 90% of the grunt work to get the core > scientific Python ecosystem going is around two handfuls. Well, Fernando's slides indicate there is a small group that dominates commits to the projects, then explains, at least partially, why that it is. It is *NOT* because scientists expect this work to be done for them by volunteers who should just work harder. There are very good reasons for people to not be involved. The work is rarely funded, is generally a distraction from funded work, and hardly ever "counts" as scientific work. That's all on top of being a scientist, not a programmer. Now, if you'll allow me, I myself am one of the "lucky" scientific software developers, well-recognized in my own small community for open source analysis software, and also in a scientific position and in a group where building tools for better data collection and analysis can easily be interpreted as part of the job. In fact, I spend a very significant amount of my time writing open source software, and work nearly exclusively in python. So, just as as an example of what happens when someone might "contribute", I wrote some code (lmfit-py) that could go into scipy and posted it to this list several months ago. Many people have expressed interest in this module, and it has been discussed on this list a few times in the past few months. Though lmfit-py is older than Fernando's slides (it was inspired after being asked several times "Is there something like IDL's mpfit, only faster and in python?"), it actually follows his directions of "get involved" quite closely: it is BSD, at github, with decent documentation, and does not depend on packages other than scipy and numpy. Though it's been discussed on this list recently, two responses from frequent mailing-list responders (you, Paul V) was more along the lines of "yes, that could be done, in principle, if someone were up to doing the work" instead of "perhaps package xxx would work for you". At no point has anyone from the scipy team expressed an interest in putting this into scipy. OK, perhaps lmfit-py is not high enough quality. I can accept that. My point is that there *is* a contribution but one that would not show up on Fernando's graph as a lengthening of "the tail of contributors". There ARE a few developers out there who are interested in making contributions, and the scipy team is not doing everything it could be doing to either facilitate or even encourage such participation. In fact, especially given your response, it would be possible to conclude that contributions are actually discouraged. It's also possible to be more optimistic, and conclude that Fernando's statistics are accurate only for each project shown, but wildly underestimate the whole of the community. 
> I'd like to think that it's a problem of skill set: users that have the > ability to contribute are just too rare. This is not entirely true, there > are scores of skilled people on the mailing lists. You yourself mention > that you are developing a package. There are many kinds of skills. Sometimes, not insulting your customers, colleagues, and potential collaborators is the most important one. > Sorry for the rant, but if you want things to improve, you will have more > successes sending in pull request than messages on mailing list that > sound condescending to my ears. > > I hope that I haven't overreacted too badly. Sorry, but I think you have. I'm impressed that Eric was appreciative -- I know many who would not be. For myself, I find it quite discouraging that the scipy team is so insular. Cheers, --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.newville at gmail.com Fri Mar 9 10:04:17 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Fri, 9 Mar 2012 07:04:17 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: <16397446.506.1331305457726.JavaMail.geo-discussion-forums@ynlw24> David, On Thursday, March 8, 2012 9:14:23 PM UTC-6, David Baddeley wrote: > > From a pure performance perspective, you're probably going to be best > setting your bounds by variable substitution (particularly if they're only > single-ended - x**2 is cheap) - you really don't want to have the for > loops, dictionary lookups and conditionals that lmfit introduces for it's > bounds checking inside your objective function. > >From a performance perspective in which dictionary lookups and additions in the wrapper that lmfit puts on an objective function are considered high, I think you would probably not want the objective function written in python to begin with, but just use Fortran (or C). Much of scipy presupposes that one must include development time (here, or writing and manipulating the objective function) into "performance". So, for some trivial cases, one can easily change the parameter from, say "x, with a minimum value of 0" to "x**2", but then one also has to change the objective function and re-map the estimated uncertainties in the parameters every time the bounds might be changed. These would be changes that the end-user would have to do..... Or they could use lmfit which does this automatically, if at a slight performance cost compared to having no bounds set. Is that performance hit important? I doubt it. The original question, and several follow-up messages, point to mpfit.py. As I'm sure you're aware, this implements the Levenberg-Marquardt algorithm **in python**. And people use it because it provides a convenient way to set bounds. So, like much of scipy and python, sometimes pure performance is not the main requirement. Now, mpfit.py is slow (and is a translation of MINPACK from fortran to IDL to python-with-Numeric), OTOH, lmfit uses scipy.optimize.leastsq(), which calls into the fortran version of MINPACK, and so does have improved performance compared to mpfit.py. 
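As a concrete sketch of the substitution trick discussed above (the model and data here are made up purely for illustration), a single-ended bound can be imposed with plain scipy.optimize.leastsq by fitting an internal parameter whose square is the physical one:

import numpy as np
from scipy.optimize import leastsq

np.random.seed(0)
x = np.linspace(0, 5, 50)
y = 3.0 * np.exp(-1.2 * x) + 0.05 * np.random.randn(50)   # made-up noisy decay data

def residual(p, x, y):
    amp = p[0] ** 2            # substitution: amplitude >= 0 by construction
    rate = p[1]
    return y - amp * np.exp(-rate * x)

pbest, ier = leastsq(residual, (1.0, 1.0), args=(x, y))
print("amp = %.3f, rate = %.3f" % (pbest[0] ** 2, pbest[1]))

The catch is exactly the one described above: the objective function has to be rewritten whenever a bound changes, and any uncertainty estimated for the internal parameter has to be propagated back through the substitution -- the bookkeeping that lmfit-py (or an mpfit-style parinfo list) automates.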
I think a high level wrapper that permitted bounds, an unadulterated goal > function, and setting which parameters to fit, but also retained much of > the raw speed of leastsq could be accomplished with some clever on the fly > code generation (maybe also using Sympy to automatically derive the > Jacobian). Would make an interesting project .. > Sounds great.. Let me know when it's ready and I'll be happy to give it a try. --Matt > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Mar 9 11:40:01 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 9 Mar 2012 11:40:01 -0500 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: On Thu, Mar 8, 2012 at 7:55 PM, Matthew Newville wrote: > Gael, > > > On Thursday, March 8, 2012 3:07:22 PM UTC-6, Gael Varoquaux wrote: >> >>?? I am sorry I am going to react to the provocation. > > And I am sorry that I am going to react to your message.? I think your > reaction is unfair. > > >>?? As some one who spends a fair amount of time working on open source >>?? software I hear such remarks quite often: 'why is feature foo not >>?? implemented in package bar?. I am finding it harder and harder not to >>?? react negatively to these emails. Now I cannot consider myself as a >>?? contributor to scipy, and thus I can claim that I am not taking your >>?? comment personally. > > Where I work (a large scientific user facility), there are lots of > scientists in what I'll presume is Eric's position -- able and willing to > work well with scientific programming tools, but unable to devote the extra > time needed to develop core functionality or maintain much work outside of > their own area of interest.? There are a great many scientists interested in > learning and using python.? Several people there *are* writing scientific > libraries with python.? Similarly in the fields I work in, python is widely > accepted as an important ecosystem. > > >>?? Why isn't scipy not up to the task? Will, the answer is quite simple: >>?? because it's developed by volunteers that do it on their spare time, >> late >>?? at night too often, or companies that put some of their benefits in open >>?? source rather in locking down a market. 90% of the time the reason the >>?? feature isn't as good as you would want it is because of lack of time. >> >>?? I personally find that suggesting that somebody else should put more of >>?? the time and money they are already giving away in improving a feature >>?? that you need is almost insulting. > > Well, in some sense, Eric's message is an expression of interest.... Perhaps > you would prefer that nobody outside the core group of developers or mailing > list subscribers asked for any new features or clarification of existing > features. > > >>?? I am aware that people do not realize how small the group of people that >>?? develop and maintain their toys is. Borrowing from Fernando Perez's talk >>?? at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide 80), >>?? the number of people that do 90% of the grunt work to get the core >>?? scientific Python ecosystem going is around two handfuls. 
> > Well, Fernando's slides indicate there is a small group that dominates > commits to the projects, then explains, at least partially, why that it is. > It is *NOT* because scientists expect this work to be done for them by > volunteers who should just work harder. > > There are very good reasons for people to not be involved.? The work is > rarely funded, is generally a distraction from funded work, and hardly ever > "counts" as scientific work.? That's all on top of being a scientist, not a > programmer.? Now, if you'll allow me, I myself am one of the "lucky" > scientific software developers, well-recognized in my own small community > for open source analysis software, and also in a scientific position and in > a group where building tools for better data collection and analysis can > easily be interpreted as part of the job.? In fact, I spend a very > significant amount of my time writing open source software, and work nearly > exclusively in python. > > So, just as as an example of what happens when someone might "contribute", > I wrote some code (lmfit-py) that could go into scipy and posted it to this > list several months ago.? Many people have expressed interest in this > module, and it has been discussed on this list a few times in the past few > months.? Though lmfit-py is older than Fernando's slides (it was inspired > after being asked several times "Is there something like IDL's mpfit, only > faster and in python?"), it actually follows his directions of "get > involved" quite closely: it is BSD, at github, with decent documentation, > and does not depend on packages other than scipy and numpy.?? Though it's > been discussed on this list recently, two responses from frequent > mailing-list responders (you, Paul V) was more along the lines of? "yes, > that could be done, in principle, if someone were up to doing the work" > instead of "perhaps package xxx would work for you". > > At no point has anyone from the scipy team expressed an interest in putting > this into scipy.? OK, perhaps lmfit-py is not high enough quality.? I can > accept that.? My point is that there *is* a contribution but one that would > not show up on Fernando's graph as a lengthening of "the tail of > contributors". There ARE a few developers out there who are interested in > making contributions, and the scipy team is not doing everything it could be > doing to either facilitate or even encourage such participation.? In fact, > especially given your response, it would be possible to conclude that > contributions are actually discouraged.? It's also possible to be more > optimistic, and conclude that Fernando's statistics are accurate only for > each project shown, but wildly underestimate the whole of the community. I think lmfit is a good project, it can be easy installed. You are able to maintain and develop it. So I don't think the need to have it in scipy is very urgent. On the other hand, for anyone not familiar with AST manipulation it feels to me like a possible maintenance nightmare. It doesn't mean it is, but as part of a community project it should be possible to maintain (or come with a maintainer). But maybe I have just seen to much stranded and broken code in scipy (that remained neglected for years). As an example for a contribution: fisher's exact test, a pretty important function, but didn't quite work for several cases. I spend several days trying to figure out how to fix it. I was not successfull since I was not familiar with the algorithm and the numerical problems it raised. 
A while later users or the original developer found ways to fix the corner cases. At that stage it was possible to include it in scipy. (There were a few additional edge cases afterwards, but those were minor fixes.)

As a positive example, Denis Laxalde became very active and is revamping and improving large parts of the scipy.optimize code.

Josef

> > >> I'd like to think that it's a problem of skill set: users that have the >> ability to contribute are just too rare. This is not entirely true, there >> are scores of skilled people on the mailing lists. You yourself mention >> that you are developing a package. > > There are many kinds of skills. Sometimes, not insulting your customers, > colleagues, and potential collaborators is the most important one. > > >> Sorry for the rant, but if you want things to improve, you will have more >> successes sending in pull request than messages on mailing list that >> sound condescending to my ears. >> >> I hope that I haven't overreacted too badly. > > Sorry, but I think you have. I'm impressed that Eric was appreciative -- I > know many who would not be. > > For myself, I find it quite discouraging that the scipy team is so insular. > Cheers, > > --Matt Newville > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user >

From keflavich at gmail.com  Fri Mar  9 11:46:24 2012
From: keflavich at gmail.com (Keflavich)
Date: Fri, 9 Mar 2012 08:46:24 -0800 (PST)
Subject: [SciPy-User] scipy compiles, but importing interpolate fails
In-Reply-To: <9B08C4C4-7639-4996-AB7E-82D96ADCCF79@nist.gov>
References: <9B08C4C4-7639-4996-AB7E-82D96ADCCF79@nist.gov>
Message-ID: <5b25d17c-3a93-4e6c-a5eb-df976557f6c1@j5g2000yqm.googlegroups.com>

re: Gael: I removed numpy and scipy completely from both my Frameworks
and virtualenv installs and removed the build directories from both.
However, I think that *still* didn't work. One part of the build that
probably caused problems was using this:

PYTHONPATH="/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/"

which some site (maybe hyperjeff? maybe elsewhere?) said was necessary
for building scipy. I'm pretty sure getting rid of that in the scipy
build was essential. My eventual successful build looked like this:

numpy:
CFLAGS="-arch i386 -arch x86_64" FFLAGS="-m32 -m64" LDFLAGS="-Wall -undefined dynamic_lookup -bundle -arch i386 -arch x86_64" MACOSX_DEPLOYMENT_TARGET=10.6 PYTHONPATH="/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/" ~/virtual-python/bin/python2.7 setup.py build --fcompiler=gnu95

scipy:
CFLAGS="-arch i386 -arch x86_64" FFLAGS="-m32 -m64" LDFLAGS="-Wall -undefined dynamic_lookup -bundle -arch i386 -arch x86_64" MACOSX_DEPLOYMENT_TARGET=10.6 PYTHONPATH="/Users/adam/virtual-python/lib/python2.7/site-packages/" ~/virtual-python/bin/python2.7 setup.py build --fcompiler=gnu95

Re: Jonathan - good to know. I don't think that affected me, though,
as I made my virtualenv from a /Library python, not a /System/Library
python. Why evil, though?

From william.ratcliff at gmail.com  Fri Mar  9 11:51:45 2012
From: william.ratcliff at gmail.com (william ratcliff)
Date: Fri, 9 Mar 2012 11:51:45 -0500
Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task?
In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: But this is exactly the problem--I also work at a large user facility and will play with his package--but scipy is one of the first places new users will turn to--and fitting a function with bounds is a very common task. In this particular case, what are the exact steps needed to get it into scipy? Can they charge be listed as tickets somewhere so that others of us can help? Can we document the process to make it easier the next time? I realize everyone is busy, but if the barrier to contribution is lowered it will make life better in the long run. On Mar 9, 2012 11:40 AM, wrote: > On Thu, Mar 8, 2012 at 7:55 PM, Matthew Newville > wrote: > > Gael, > > > > > > On Thursday, March 8, 2012 3:07:22 PM UTC-6, Gael Varoquaux wrote: > >> > >> I am sorry I am going to react to the provocation. > > > > And I am sorry that I am going to react to your message. I think your > > reaction is unfair. > > > > > >> As some one who spends a fair amount of time working on open source > >> software I hear such remarks quite often: 'why is feature foo not > >> implemented in package bar?. I am finding it harder and harder not to > >> react negatively to these emails. Now I cannot consider myself as a > >> contributor to scipy, and thus I can claim that I am not taking your > >> comment personally. > > > > Where I work (a large scientific user facility), there are lots of > > scientists in what I'll presume is Eric's position -- able and willing to > > work well with scientific programming tools, but unable to devote the > extra > > time needed to develop core functionality or maintain much work outside > of > > their own area of interest. There are a great many scientists > interested in > > learning and using python. Several people there *are* writing scientific > > libraries with python. Similarly in the fields I work in, python is > widely > > accepted as an important ecosystem. > > > > > >> Why isn't scipy not up to the task? Will, the answer is quite simple: > >> because it's developed by volunteers that do it on their spare time, > >> late > >> at night too often, or companies that put some of their benefits in > open > >> source rather in locking down a market. 90% of the time the reason the > >> feature isn't as good as you would want it is because of lack of time. > >> > >> I personally find that suggesting that somebody else should put more > of > >> the time and money they are already giving away in improving a feature > >> that you need is almost insulting. > > > > Well, in some sense, Eric's message is an expression of interest.... > Perhaps > > you would prefer that nobody outside the core group of developers or > mailing > > list subscribers asked for any new features or clarification of existing > > features. > > > > > >> I am aware that people do not realize how small the group of people > that > >> develop and maintain their toys is. Borrowing from Fernando Perez's > talk > >> at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide > 80), > >> the number of people that do 90% of the grunt work to get the core > >> scientific Python ecosystem going is around two handfuls. > > > > Well, Fernando's slides indicate there is a small group that dominates > > commits to the projects, then explains, at least partially, why that it > is. 
> > It is *NOT* because scientists expect this work to be done for them by > > volunteers who should just work harder. > > > > There are very good reasons for people to not be involved. The work is > > rarely funded, is generally a distraction from funded work, and hardly > ever > > "counts" as scientific work. That's all on top of being a scientist, > not a > > programmer. Now, if you'll allow me, I myself am one of the "lucky" > > scientific software developers, well-recognized in my own small community > > for open source analysis software, and also in a scientific position and > in > > a group where building tools for better data collection and analysis can > > easily be interpreted as part of the job. In fact, I spend a very > > significant amount of my time writing open source software, and work > nearly > > exclusively in python. > > > > So, just as as an example of what happens when someone might > "contribute", > > I wrote some code (lmfit-py) that could go into scipy and posted it to > this > > list several months ago. Many people have expressed interest in this > > module, and it has been discussed on this list a few times in the past > few > > months. Though lmfit-py is older than Fernando's slides (it was inspired > > after being asked several times "Is there something like IDL's mpfit, > only > > faster and in python?"), it actually follows his directions of "get > > involved" quite closely: it is BSD, at github, with decent documentation, > > and does not depend on packages other than scipy and numpy. Though it's > > been discussed on this list recently, two responses from frequent > > mailing-list responders (you, Paul V) was more along the lines of "yes, > > that could be done, in principle, if someone were up to doing the work" > > instead of "perhaps package xxx would work for you". > > > > At no point has anyone from the scipy team expressed an interest in > putting > > this into scipy. OK, perhaps lmfit-py is not high enough quality. I can > > accept that. My point is that there *is* a contribution but one that > would > > not show up on Fernando's graph as a lengthening of "the tail of > > contributors". There ARE a few developers out there who are interested in > > making contributions, and the scipy team is not doing everything it > could be > > doing to either facilitate or even encourage such participation. In > fact, > > especially given your response, it would be possible to conclude that > > contributions are actually discouraged. It's also possible to be more > > optimistic, and conclude that Fernando's statistics are accurate only for > > each project shown, but wildly underestimate the whole of the community. > > I think lmfit is a good project, it can be easy installed. You are > able to maintain and develop it. > So I don't think the need to have it in scipy is very urgent. > > On the other hand, for anyone not familiar with AST manipulation it > feels to me like a possible maintenance nightmare. > It doesn't mean it is, but as part of a community project it should be > possible to maintain (or come with a maintainer). > > But maybe I have just seen to much stranded and broken code in scipy > (that remained neglected for years). > > As an example for a contribution: fisher's exact test, a pretty > important function, but didn't quite work for several cases. I spend > several days trying to figure out how to fix it. I was not successfull > since I was not familiar with the algorithm and the numerical problems > it raised. 
A while later users or the original developer found ways to > fix the corner cases. At that stage it was possible to include it in > scipy. (There were a few additional edge cases afterwards, but that > were minor fixes.) > > As a positive example, Denis Laxalde became very active and is > revamping and improving large parts of the scipy.optimize code. > > Josef > > > > > > >> I'd like to think that it's a problem of skill set: users that have the > >> ability to contribute are just too rare. This is not entirely true, > there > >> are scores of skilled people on the mailing lists. You yourself mention > >> that you are developing a package. > > > > There are many kinds of skills. Sometimes, not insulting your customers, > > colleagues, and potential collaborators is the most important one. > > > > > >> Sorry for the rant, but if you want things to improve, you will have > more > >> successes sending in pull request than messages on mailing list that > >> sound condescending to my ears. > >> > >> I hope that I haven't overreacted too badly. > > > > Sorry, but I think you have. I'm impressed that Eric was appreciative > -- I > > know many who would not be. > > > > For myself, I find it quite discouraging that the scipy team is so > insular. > > Cheers, > > > > --Matt Newville > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Mar 9 11:54:43 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 9 Mar 2012 17:54:43 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: <20120309165443.GA26552@phare.normalesup.org> Hi Matt, I am not going to answer to the core of your message. I partly agree with it and partly disagree. I think that it is fair to have different points of view. In addition, I do share the opinion that the situation of developers in open source scientific software is not ideal. I've suffered from it personally. I just want to react to a couple of minor points > At no point has anyone from the scipy team expressed an interest in > putting this into scipy. Who is the scipy team? What is the scipy team? Who could or should express such an interest? These are people struggling to maintain a massive package on their free time. It actually takes a lot of time to monitor the mailing lists and pick up offers like yours to turn them into something that can be integrated. Had you submitted a pull request, with code ready to be merged, i.e. with no extra work in terms of documentation, API or tests, I think that it would be legitimate to blame the scipy developers for lack of interest. That said, I can easily understand how such a pull request would fall between the cracks. It's unfortunate, not excusable, but it does happen. Indeed, in the projects I maintain, I am kept busy full time with pure maintenance work (bug fixing, answering emails, improving documentation). 
When I review and merge pull requests, a lot of the time they are for features that I do not need, and I spend full week ends adding tests, fixing numerical instabilities, completing the docs so that they can be merged. You have to realize that most contributions to open source projects actually add up to the workload of the core developers. Thankfully, not all of them. Teams do build upon people unexpectedly fixing bugs, contributing flawless code that can be merged in without any additional work. I personally have seen my time invested in maintenance of open source project go up and up for the last few years, until it was to a point where I was spending a major part of my free time on it. It ended up giving me a nasty back pain, and I started not answering bug reports, pull requests and support emails to preserve my health: it is not sane to spend all onces time in front of a computer. > There are many kinds of skills.? Sometimes, not insulting your customers, > colleagues, and potential collaborators is the most important one. Maybe I went over the top. I didn't want to sound insulting. I felt insulted, as an open source develop (even thought I am not a scipy developer). I am sorry that I ignited a flame. Getting worked out about email is never a good thing, and discussion pushing blame certainly don't help building a community. Maybe I shouldn't have sent this email, or I should have worded it differently. I apologize for the harsh tone. I certainly did feel bad when I received the original email, and I wanted to express it. > For myself, I find it quite discouraging that the scipy team is so > insular. Firstly, I would like to stress that I cannot consider myself as part of the scipy team. I contribute very little code to scipy. As a consequence I do not feel that I have much legitimacy in making decisions or comments on the codebase. Thus you shouldn't take my reaction as a reaction coming from the scipy team, but rather as coming from myself. Second, can I ask you what makes you think that the scipy team is insular? Scipy is a big project with a lot of history. As such it is harder to contribute to it than a small and light project. But I don't feel any dogmatism or clique attitude from the developers. And, by the way, if we are going to talk about the scipy developers, I encourage everybody to find out who they are, i.e. who has been contributing lately [1]. I don't think that the handful of people that come on top of the list have an insular behavior. I do think that they are on an island, in the sens that they are pretty much left alone to do the grunt work. None of these people reacted badly to any mail on this mailing list about the state of scipy. I raise my hat to them! Ga?l [1]:: $ git clone https://github.com/scipy/scipy.git $ git shortlog -sn v0.7.0.. From gael.varoquaux at normalesup.org Fri Mar 9 11:57:36 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 9 Mar 2012 17:57:36 +0100 Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: <5b25d17c-3a93-4e6c-a5eb-df976557f6c1@j5g2000yqm.googlegroups.com> References: <9B08C4C4-7639-4996-AB7E-82D96ADCCF79@nist.gov> <5b25d17c-3a93-4e6c-a5eb-df976557f6c1@j5g2000yqm.googlegroups.com> Message-ID: <20120309165736.GB13064@phare.normalesup.org> On Fri, Mar 09, 2012 at 08:46:24AM -0800, Keflavich wrote: > re: Gael: I removed numpy and scipy completely from both my Frameworks > and virtualenv installs and removed the build directories from both. 
> However, I think that *still* didn't work. One part of the build that > probably caused problems was using this: > PYTHONPATH="/Library/Frameworks/Python.framework/Versions/2.7/lib/ > python2.7/site-packages/" > which some site (maybe hyperjeff? maybe elsewhere?) said was > necessary for building scipy. I'm pretty sure getting rid of that in > the scipy build was essential. Indeed, you story confirms my experience: a lot of build problem are related to having several installs of Python, and having a hard time controlling which one exactly is used when. I don't know a good solution to these problems :(. G From peter.cimermancic at gmail.com Fri Mar 9 12:43:45 2012 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Fri, 9 Mar 2012 09:43:45 -0800 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: <4F5A25F7.3020408@molden.no> References: <4F5A25F7.3020408@molden.no> Message-ID: > > > > No, it does not. If you are working with counts, the appropriate model > would usually be Poisson regression. I.e. Generalized linear model with > log-link function and Possion probability family. I have seen many > examples of microbiologists using linear regression when they should > actually use Poisson regression (e.g. counting genes) or logistic > regression (e.g. dose-response and titration curves). > > This will do it for you: > > MATLAB: glmfit from the statistics toolbox > R: glm > SAS: PROC GLIM > Python: statmodels scikit > > Another example of inappropriate use of linear regression in > microbiology is the Lineweaver-Burk plot as substitute for non-linear > least-squares (usually Levenberg-Marquardt) to fit a Michelis-Menten > curve. Some microbiologists are bevare of this, but they seem to prefer > all sorts of ad hoc trickeries like linearizations and > variance-stabilizing transforms instead of "just doing it right". > > As for samples that are not independent, that will affect the final > likelihood. If you want to optimize the log-likelhood yourself, to > control for this, getting ML estimates by maximizing the log-likelhood > is easy with fmin_powell or fmin_bgfs from scipy.optimize. (Powell's > method does not even need the gradient.) And if you need the "p-value", > you can either use the likelihood ratio or Monte Carlo (e.g. permutation > test). > > Sturla, could you be more specific here? I don't know much about (bio)statistics, but that doesn't mean I don't want to do the things right :). All I want to get out of this analysis is to be able to say whether the correlation between genome lengths and numbers of particular genes (which looks neat and obvious from the scatter plot) is statistically significant given that the data points are heavily phylogenetically biased. That's why I mentioned "p-values". Of course, I'm open to any better/more accurate way of getting there than initially planned. > > Sturla > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Mar 9 12:50:48 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 9 Mar 2012 10:50:48 -0700 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: <20120309165443.GA26552@phare.normalesup.org> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> Message-ID: On Fri, Mar 9, 2012 at 9:54 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > Hi Matt, > > I am not going to answer to the core of your message. I partly agree with > it and partly disagree. I think that it is fair to have different points > of view. In addition, I do share the opinion that the situation of > developers in open source scientific software is not ideal. I've suffered > from it personally. > > I just want to react to a couple of minor points > > > > At no point has anyone from the scipy team expressed an interest in > > putting this into scipy. > > Who is the scipy team? What is the scipy team? Who could or should > express such an interest? These are people struggling to maintain a > massive package on their free time. It actually takes a lot of time to > monitor the mailing lists and pick up offers like yours to turn them into > something that can be integrated. > > Had you submitted a pull request, with code ready to be merged, i.e. with > no extra work in terms of documentation, API or tests, I think that it > would be legitimate to blame the scipy developers for lack of interest. > That said, I can easily understand how such a pull request would fall > between the cracks. It's unfortunate, not excusable, but it does happen. > Indeed, in the projects I maintain, I am kept busy full time with pure > maintenance work (bug fixing, answering emails, improving documentation). > When I review and merge pull requests, a lot of the time they are for > features that I do not need, and I spend full week ends adding tests, > fixing numerical instabilities, completing the docs so that they can be > merged. You have to realize that most contributions to open source > projects actually add up to the workload of the core developers. > Thankfully, not all of them. Teams do build upon people unexpectedly > fixing bugs, contributing flawless code that can be merged in without any > additional work. > > I personally have seen my time invested in maintenance of open source > project go up and up for the last few years, until it was to a point > where I was spending a major part of my free time on it. It ended up > giving me a nasty back pain, and I started not answering bug reports, > pull requests and support emails to preserve my health: it is not sane to > spend all onces time in front of a computer. > > > There are many kinds of skills. Sometimes, not insulting your > customers, > > colleagues, and potential collaborators is the most important one. > > Maybe I went over the top. I didn't want to sound insulting. I felt > insulted, as an open source develop (even thought I am not a scipy > developer). I am sorry that I ignited a flame. Getting worked out about > email is never a good thing, and discussion pushing blame certainly don't > help building a community. Maybe I shouldn't have sent this email, or I > should have worded it differently. I apologize for the harsh tone. I > certainly did feel bad when I received the original email, and I wanted > to express it. > > > For myself, I find it quite discouraging that the scipy team is so > > insular. > > Firstly, I would like to stress that I cannot consider myself as part of > the scipy team. I contribute very little code to scipy. 
As a consequence > I do not feel that I have much legitimacy in making decisions or > comments on the codebase. Thus you shouldn't take my reaction as a > reaction coming from the scipy team, but rather as coming from myself. > > Second, can I ask you what makes you think that the scipy team is > insular? Scipy is a big project with a lot of history. As such it is > harder to contribute to it than a small and light project. But I don't > feel any dogmatism or clique attitude from the developers. And, by the > way, if we are going to talk about the scipy developers, I encourage > everybody to find out who they are, i.e. who has been contributing lately > [1]. I don't think that the handful of people that come on top of the > list have an insular behavior. I do think that they are on an island, in > the sens that they are pretty much left alone to do the grunt work. None > of these people reacted badly to any mail on this mailing list about the > state of scipy. I raise my hat to them! > > Carefully stepping past the kerfluffle at the bar, I think this sort of functionality in scipy would be useful. If nothing else, I wouldn't have to keep implementing for myself ;) IIRC, Dennis Lexalde was going to do something similar and I think it would be good if some of the folks with implementations started a separate thread about getting it into scipy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Mar 9 12:51:23 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 9 Mar 2012 12:51:23 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> Message-ID: On Fri, Mar 9, 2012 at 12:43 PM, Peter Cimerman?i? wrote: >> >> >> No, it does not. If you are working with counts, the appropriate model >> would usually be Poisson regression. I.e. Generalized linear model with >> log-link function and Possion probability family. I have seen many >> examples of microbiologists using linear regression when they should >> actually use Poisson regression (e.g. counting genes) or logistic >> regression (e.g. dose-response and titration curves). >> >> This will do it for you: >> >> MATLAB: glmfit from the statistics toolbox >> R: glm >> SAS: PROC GLIM >> Python: statmodels scikit >> >> Another example of inappropriate use of linear regression in >> microbiology is the Lineweaver-Burk plot as substitute for non-linear >> least-squares (usually Levenberg-Marquardt) to fit a Michelis-Menten >> curve. Some microbiologists are bevare of this, but they seem to prefer >> all sorts of ad hoc trickeries like linearizations and >> variance-stabilizing transforms instead of "just doing it right". >> >> As for samples that are not independent, that will affect the final >> likelihood. If you want to optimize the log-likelhood yourself, to >> control for this, getting ML estimates by maximizing the log-likelhood >> is easy with fmin_powell or fmin_bgfs from scipy.optimize. (Powell's >> method does not even need the gradient.) And if you need the "p-value", >> you can either use the likelihood ratio or Monte Carlo (e.g. permutation >> test). >> > > Sturla,?could you be more specific here? I don't know much about > (bio)statistics, but that doesn't mean I don't want to do the things right > :). 
All I want to get out of this analysis is to be able to say whether the > correlation between genome lengths and numbers of particular genes (which > looks neat and obvious from the scatter plot) is statistically significant > given that the data points are?heavily?phylogenetically biased. That's why I > mentioned "p-values". Of course, I'm open to any better/more accurate way of > getting there than initially planned. Peter, Could you post a scatter plot of your data (with axis ticks and labels) so we get an idea what your data looks like? I have no idea at all about the bio topic. Josef > > > > >> >> >> Sturla >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From guyer at nist.gov Fri Mar 9 12:55:41 2012 From: guyer at nist.gov (Jonathan Guyer) Date: Fri, 9 Mar 2012 12:55:41 -0500 Subject: [SciPy-User] scipy compiles, but importing interpolate fails In-Reply-To: <5b25d17c-3a93-4e6c-a5eb-df976557f6c1@j5g2000yqm.googlegroups.com> References: <9B08C4C4-7639-4996-AB7E-82D96ADCCF79@nist.gov> <5b25d17c-3a93-4e6c-a5eb-df976557f6c1@j5g2000yqm.googlegroups.com> Message-ID: <79EFF91E-EDC5-40F2-8CC5-0E60F21B7588@nist.gov> On Mar 9, 2012, at 11:46 AM, Keflavich wrote: > Re: Jonathan - good to know. I don't think that affected me, though, > as I made my virtualenv from a /Library python, not a /System/Library > python. Why evil, though? Because /System belongs to Apple and should not be tampered with. Using mkvirtualenv --no-site-packages means you don't have to. It doesn't matter how stale Apple's packages are; they get to have what they expect and you get to use what you want/need. From pav at iki.fi Fri Mar 9 13:31:22 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 09 Mar 2012 19:31:22 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> Message-ID: Hi, 09.03.2012 18:50, Charles R Harris kirjoitti: [clip] > Carefully stepping past the kerfluffle at the bar, I think this sort of > functionality in scipy would be useful. If nothing else, I wouldn't have > to keep implementing for myself ;) IIRC, Dennis Lexalde was going to do > something similar and I think it would be good if some of the folks with > implementations started a separate thread about getting it into scipy. Dennis actually not only intended, but also implemented something similar. I wasn't too deeply involved in that, but it's already merged in Scipy's trunk. Now, based on a *very* quick look to lmfit (I did not look at it before now as I did not remember it existed), it seems to be quite similar in purpose. Hashing out if lmfit has something extra, or if the current implementation is missing something could be useful, however. Pauli From pav at iki.fi Fri Mar 9 13:47:20 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 09 Mar 2012 19:47:20 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: Hi, 09.03.2012 17:51, william ratcliff kirjoitti: [clip] > In this particular case, what are the exact steps needed to get it into > scipy? 
Can they charge be listed as tickets somewhere so that others of > us can help? Can we document the process to make it easier the next > time? I realize everyone is busy, but if the barrier to contribution is > lowered it will make life better in the long run. In general, basically two ways for contributions: 1. A pull request via Github. We have a writeup here with various tips: http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html Just replace "Numpy" by "Scipy" everywhere. 2. File a ticket in the Trac: http://projects.scipy.org/scipy/ Attach whetever you have (a patch, separate files) to the ticket, and tag it as "enhancement" and "needs_review". That's about it. *** However, to make it easier for someone to look at the work and verify it works properly: - Ensure your code is accompanied by tests that demonstrate it actually works as intended. You can look for examples how to write them in the Scipy source tree, in files named test_XXX.py - Ensure the behavior of the public functions is documented in the docstrings. - Prefer the Github way. Granted, there *is* a learning curve, but it saves work in the long run, and it is far less clunky to use. - The more finished the contribution is, the less work it is to merge, and gets in faster. If you get no response, shout on the scipy-devel mailing list. If there's still no response, shout louder and start accusing people ;) If the contribution is "controversial" --- duplicates existing functionality, breaks backwards compatibility, is very specialized for a particular research problem, relies on magic, etc. --- it's good to give an argument why the stuff should be included, as otherwise the motivation may be missed. Specific to mpfit: this can be regarded as a "just another optimization routine", and that doesn't seem too controversial to me. It would be nicer to subsume the functionality to leastsq, though, but I don't see anyone wanting to modify the MINPACK fortran code. Instead, perhaps this could be addressed on the level of the unified optimization interface. Cheers, Pauli From peter.cimermancic at gmail.com Fri Mar 9 14:30:35 2012 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Fri, 9 Mar 2012 11:30:35 -0800 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> Message-ID: Sure, please see attached. Bacteria.jpg is the plot we're talking about. As you can see there is a nice correlation in the graph, but I'm afraid there might something like in the second figure (ives.jpg) going on. The second figure is from Ives and Zhu; Statistics for correlated data: phylogenies, space and time (2006). Peter On Fri, Mar 9, 2012 at 9:51 AM, wrote: > On Fri, Mar 9, 2012 at 12:43 PM, Peter Cimerman?i? > wrote: > >> > >> > >> No, it does not. If you are working with counts, the appropriate model > >> would usually be Poisson regression. I.e. Generalized linear model with > >> log-link function and Possion probability family. I have seen many > >> examples of microbiologists using linear regression when they should > >> actually use Poisson regression (e.g. counting genes) or logistic > >> regression (e.g. dose-response and titration curves). 
> >> > >> This will do it for you: > >> > >> MATLAB: glmfit from the statistics toolbox > >> R: glm > >> SAS: PROC GLIM > >> Python: statmodels scikit > >> > >> Another example of inappropriate use of linear regression in > >> microbiology is the Lineweaver-Burk plot as substitute for non-linear > >> least-squares (usually Levenberg-Marquardt) to fit a Michelis-Menten > >> curve. Some microbiologists are bevare of this, but they seem to prefer > >> all sorts of ad hoc trickeries like linearizations and > >> variance-stabilizing transforms instead of "just doing it right". > >> > >> As for samples that are not independent, that will affect the final > >> likelihood. If you want to optimize the log-likelhood yourself, to > >> control for this, getting ML estimates by maximizing the log-likelhood > >> is easy with fmin_powell or fmin_bgfs from scipy.optimize. (Powell's > >> method does not even need the gradient.) And if you need the "p-value", > >> you can either use the likelihood ratio or Monte Carlo (e.g. permutation > >> test). > >> > > > > Sturla, could you be more specific here? I don't know much about > > (bio)statistics, but that doesn't mean I don't want to do the things > right > > :). All I want to get out of this analysis is to be able to say whether > the > > correlation between genome lengths and numbers of particular genes (which > > looks neat and obvious from the scatter plot) is statistically > significant > > given that the data points are heavily phylogenetically biased. That's > why I > > mentioned "p-values". Of course, I'm open to any better/more accurate > way of > > getting there than initially planned. > > Peter, Could you post a scatter plot of your data (with axis ticks and > labels) so we get an idea what your data looks like? > > I have no idea at all about the bio topic. > > Josef > > > > > > > > > > >> > >> > >> Sturla > >> > >> > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ives.tiff Type: image/tiff Size: 55288 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: bacteria.jpg Type: image/jpeg Size: 37352 bytes Desc: not available URL: From njs at pobox.com Fri Mar 9 14:46:07 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Mar 2012 19:46:07 +0000 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> Message-ID: On Fri, Mar 9, 2012 at 7:30 PM, Peter Cimerman?i? wrote: > Sure, please see attached. Bacteria.jpg is the plot we're talking about. As > you can see there is a nice correlation in the graph, but I'm afraid there > might something like in the second figure (ives.jpg) going on. The second > figure is from Ives and Zhu; Statistics for correlated data: phylogenies, > space and time (2006). So in the figure from Ives and Zhu, the two variables do seem to be well-correlated across groups, but then within individual groups they aren't well-correlated. Is that what you're worried about -- that gene count and genome length might be correlated overall, but not within individual groups? 
Because GLS doesn't actually address that question. It lets you correct your p-values for the fact that similarity between bacteria means that you effectively have somewhat less data than it would otherwise appear, and thus your p-values should be larger than they would be in a naive analysis. But it'd still be a p-value on whether the two variables are correlated overall. (Which they obviously are...) -- Nathaniel From josef.pktd at gmail.com Fri Mar 9 15:13:37 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 9 Mar 2012 15:13:37 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> Message-ID: On Fri, Mar 9, 2012 at 2:46 PM, Nathaniel Smith wrote: > On Fri, Mar 9, 2012 at 7:30 PM, Peter Cimerman?i? > wrote: >> Sure, please see attached. Bacteria.jpg is the plot we're talking about. As >> you can see there is a nice correlation in the graph, but I'm afraid there >> might something like in the second figure (ives.jpg) going on. The second >> figure is from Ives and Zhu; Statistics for correlated data: phylogenies, >> space and time (2006). > > So in the figure from Ives and Zhu, the two variables do seem to be > well-correlated across groups, but then within individual groups they > aren't well-correlated. Is that what you're worried about -- that gene > count and genome length might be correlated overall, but not within > individual groups? > > Because GLS doesn't actually address that question. It lets you > correct your p-values for the fact that similarity between bacteria > means that you effectively have somewhat less data than it would > otherwise appear, and thus your p-values should be larger than they > would be in a naive analysis. But it'd still be a p-value on whether > the two variables are correlated overall. (Which they obviously > are...) I don't think there would be any problem with p-values for the overall positive relationship. I would be surprised when any statistical methods wouldn't produce a large p-value for the slope. Although there is a bit of bunching of points I don't see any big clusters that would indicate that the linear relationship is different. In terms of size of the slope I would guess a robust estimator (statsmodels.RLM) would downweight the observations on the high part of the graph, large count/length ratio, outliers of shorties? I think Sturla has a point in that both count and length are positive. It doesn't look like it's relevant for length, but in the counts there is a bunching just above zero, this creates either a non-linearity or requires another distribution log-normal (?) or Poisson (without zeros, or loc=1)? Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at googlemail.com Fri Mar 9 16:36:47 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 9 Mar 2012 22:36:47 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: On Fri, Mar 9, 2012 at 1:55 AM, Matthew Newville wrote: > Gael, > > > On Thursday, March 8, 2012 3:07:22 PM UTC-6, Gael Varoquaux wrote: > > > > I am sorry I am going to react to the provocation. 
> > And I am sorry that I am going to react to your message. I think your > reaction is unfair. > > > > As some one who spends a fair amount of time working on open source > > software I hear such remarks quite often: 'why is feature foo not > > implemented in package bar?. I am finding it harder and harder not to > > react negatively to these emails. Now I cannot consider myself as a > > contributor to scipy, and thus I can claim that I am not taking your > > comment personally. > > Where I work (a large scientific user facility), there are lots of > scientists in what I'll presume is Eric's position -- able and willing to > work well with scientific programming tools, but unable to devote the extra > time needed to develop core functionality or maintain much work outside of > their own area of interest. There are a great many scientists interested > in learning and using python. Several people there *are* writing > scientific libraries with python. Similarly in the fields I work in, > python is widely accepted as an important ecosystem. > > > > Why isn't scipy not up to the task? Will, the answer is quite simple: > > because it's developed by volunteers that do it on their spare time, > late > > at night too often, or companies that put some of their benefits in > open > > source rather in locking down a market. 90% of the time the reason the > > feature isn't as good as you would want it is because of lack of time. > > > > I personally find that suggesting that somebody else should put more of > > the time and money they are already giving away in improving a feature > > that you need is almost insulting. > > Well, in some sense, Eric's message is an expression of interest.... > Perhaps you would prefer that nobody outside the core group of developers > or mailing list subscribers asked for any new features or clarification of > existing features. > > > > I am aware that people do not realize how small the group of people > that > > develop and maintain their toys is. Borrowing from Fernando Perez's > talk > > at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide > 80), > > the number of people that do 90% of the grunt work to get the core > > scientific Python ecosystem going is around two handfuls. > > Well, Fernando's slides indicate there is a small group that dominates > commits to the projects, then explains, at least partially, why that it > is. It is *NOT* because scientists expect this work to be done for them by > volunteers who should just work harder. > > There are very good reasons for people to not be involved. The work is > rarely funded, is generally a distraction from funded work, and hardly ever > "counts" as scientific work. That's all on top of being a scientist, not a > programmer. Now, if you'll allow me, I myself am one of the "lucky" > scientific software developers, well-recognized in my own small community > for open source analysis software, and also in a scientific position and in > a group where building tools for better data collection and analysis can > easily be interpreted as part of the job. In fact, I spend a very > significant amount of my time writing open source software, and work nearly > exclusively in python. > > So, just as as an example of what happens when someone might > "contribute", I wrote some code (lmfit-py) that could go into scipy and > posted it to this list several months ago. Many people have expressed > interest in this module, and it has been discussed on this list a few times > in the past few months. 
Though lmfit-py is older than Fernando's slides > (it was inspired after being asked several times "Is there something like > IDL's mpfit, only faster and in python?"), it actually follows his > directions of "get involved" quite closely: it is BSD, at github, with > decent documentation, and does not depend on packages other than scipy and > numpy. Though it's been discussed on this list recently, two responses > from frequent mailing-list responders (you, Paul V) was more along the > lines of "yes, that could be done, in principle, if someone were up to > doing the work" instead of "perhaps package xxx would work for you". > > At no point has anyone from the scipy team expressed an interest in > putting this into scipy. OK, perhaps lmfit-py is not high enough quality. > I can accept that. I don't think anyone has doubts about the quality of lmfit. On the contrary, I've asked you to list it on http://scipy.org/Topical_Software(which you did) because I thought it looked interesting, and have directed some users towards your package. The documentation is excellent, certainly better than that of many parts of scipy. The worry with your code is that the maintenance burden may be relatively high, simply because very few developers are familiar with AST. The same for merging it in scipy - one of the core developers will have to invest a significant amount of time wrapping his head around your work. The ideal scenario from my point of view would be this: - lmfit keeps being maintained by you as a separate package for a while (say six months to a year) - it gains more users, who can discover potential flaws and provide feedback. The API can still be changed if necessary. - once it's stabilized a bit more, you propose it again (and more explicitly) for inclusion in scipy - one of the developers does a thorough review and merges it into scipy.optimize - you get commit rights and maintain the code within scipy - bonus points: if you would be interested in improving and reviewing PRs for related code in optimize. Scipy is a very good place to add functionality that's of use in many different fields of science and engineering, but it needs many more active developers. I think this thread is another reminder of that. Some of the criticism in this thread about how hard it is to contribute is certainly justified. I've had the plan for a while (since Fernando's EuroScipy talk actually) to write a more accessible "how to contribute" document than the one Pauli linked to. Besides the mechanics (git, Trac, etc.) it should at least provide some guidance on what belongs in scipy vs. in a scikit, how to get help, how to move a contribution that doesn't get a response forward, etc. I'll try to get a first draft ready within the next week or so. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Mar 9 16:46:29 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 9 Mar 2012 14:46:29 -0700 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: On Fri, Mar 9, 2012 at 2:36 PM, Ralf Gommers wrote: > > > On Fri, Mar 9, 2012 at 1:55 AM, Matthew Newville wrote: > >> Gael, >> >> >> On Thursday, March 8, 2012 3:07:22 PM UTC-6, Gael Varoquaux wrote: >> > >> > I am sorry I am going to react to the provocation. 
>> >> And I am sorry that I am going to react to your message. I think your >> reaction is unfair. >> >> >> > As some one who spends a fair amount of time working on open source >> > software I hear such remarks quite often: 'why is feature foo not >> > implemented in package bar?. I am finding it harder and harder not to >> > react negatively to these emails. Now I cannot consider myself as a >> > contributor to scipy, and thus I can claim that I am not taking your >> > comment personally. >> >> Where I work (a large scientific user facility), there are lots of >> scientists in what I'll presume is Eric's position -- able and willing to >> work well with scientific programming tools, but unable to devote the extra >> time needed to develop core functionality or maintain much work outside of >> their own area of interest. There are a great many scientists interested >> in learning and using python. Several people there *are* writing >> scientific libraries with python. Similarly in the fields I work in, >> python is widely accepted as an important ecosystem. >> >> >> > Why isn't scipy not up to the task? Will, the answer is quite simple: >> > because it's developed by volunteers that do it on their spare time, >> late >> > at night too often, or companies that put some of their benefits in >> open >> > source rather in locking down a market. 90% of the time the reason the >> > feature isn't as good as you would want it is because of lack of time. >> > >> > I personally find that suggesting that somebody else should put more >> of >> > the time and money they are already giving away in improving a feature >> > that you need is almost insulting. >> >> Well, in some sense, Eric's message is an expression of interest.... >> Perhaps you would prefer that nobody outside the core group of developers >> or mailing list subscribers asked for any new features or clarification of >> existing features. >> >> >> > I am aware that people do not realize how small the group of people >> that >> > develop and maintain their toys is. Borrowing from Fernando Perez's >> talk >> > at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide >> 80), >> > the number of people that do 90% of the grunt work to get the core >> > scientific Python ecosystem going is around two handfuls. >> >> Well, Fernando's slides indicate there is a small group that dominates >> commits to the projects, then explains, at least partially, why that it >> is. It is *NOT* because scientists expect this work to be done for them by >> volunteers who should just work harder. >> >> There are very good reasons for people to not be involved. The work is >> rarely funded, is generally a distraction from funded work, and hardly ever >> "counts" as scientific work. That's all on top of being a scientist, not a >> programmer. Now, if you'll allow me, I myself am one of the "lucky" >> scientific software developers, well-recognized in my own small community >> for open source analysis software, and also in a scientific position and in >> a group where building tools for better data collection and analysis can >> easily be interpreted as part of the job. In fact, I spend a very >> significant amount of my time writing open source software, and work nearly >> exclusively in python. >> >> So, just as as an example of what happens when someone might >> "contribute", I wrote some code (lmfit-py) that could go into scipy and >> posted it to this list several months ago. 
Many people have expressed >> interest in this module, and it has been discussed on this list a few times >> in the past few months. Though lmfit-py is older than Fernando's slides >> (it was inspired after being asked several times "Is there something like >> IDL's mpfit, only faster and in python?"), it actually follows his >> directions of "get involved" quite closely: it is BSD, at github, with >> decent documentation, and does not depend on packages other than scipy and >> numpy. Though it's been discussed on this list recently, two responses >> from frequent mailing-list responders (you, Paul V) was more along the >> lines of "yes, that could be done, in principle, if someone were up to >> doing the work" instead of "perhaps package xxx would work for you". >> >> At no point has anyone from the scipy team expressed an interest in >> putting this into scipy. OK, perhaps lmfit-py is not high enough quality. >> I can accept that. > > > I don't think anyone has doubts about the quality of lmfit. On the > contrary, I've asked you to list it on http://scipy.org/Topical_Software(which you did) because I thought it looked interesting, and have directed > some users towards your package. The documentation is excellent, certainly > better than that of many parts of scipy. The worry with your code is that > the maintenance burden may be relatively high, simply because very few > developers are familiar with AST. The same for merging it in scipy - one of > the core developers will have to invest a significant amount of time > wrapping his head around your work. > > The ideal scenario from my point of view would be this: > - lmfit keeps being maintained by you as a separate package for a while > (say six months to a year) > - it gains more users, who can discover potential flaws and provide > feedback. The API can still be changed if necessary. > - once it's stabilized a bit more, you propose it again (and more > explicitly) for inclusion in scipy > - one of the developers does a thorough review and merges it into > scipy.optimize > - you get commit rights and maintain the code within scipy > - bonus points: if you would be interested in improving and reviewing PRs > for related code in optimize. > > > Scipy is a very good place to add functionality that's of use in many > different fields of science and engineering, but it needs many more active > developers. I think this thread is another reminder of that. Some of the > criticism in this thread about how hard it is to contribute is certainly > justified. I've had the plan for a while (since Fernando's EuroScipy talk > actually) to write a more accessible "how to contribute" document than the > one Pauli linked to. Besides the mechanics (git, Trac, etc.) it should at > least provide some guidance on what belongs in scipy vs. in a scikit, how > to get help, how to move a contribution that doesn't get a response > forward, etc. I'll try to get a first draft ready within the next week or > so. > > I wonder if it would be useful to put a reference to lmfit in the leastsq documentation? I know that would need to be temporary and that referencing something outside scipy is unusual, but it might help increase the number of users and help it on it's way. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Fri Mar 9 16:49:29 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 9 Mar 2012 22:49:29 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: On Fri, Mar 9, 2012 at 10:46 PM, Charles R Harris wrote: > > > On Fri, Mar 9, 2012 at 2:36 PM, Ralf Gommers wrote: > >> >> >> On Fri, Mar 9, 2012 at 1:55 AM, Matthew Newville > > wrote: >> >>> Gael, >>> >>> >>> On Thursday, March 8, 2012 3:07:22 PM UTC-6, Gael Varoquaux wrote: >>> > >>> > I am sorry I am going to react to the provocation. >>> >>> And I am sorry that I am going to react to your message. I think your >>> reaction is unfair. >>> >>> >>> > As some one who spends a fair amount of time working on open source >>> > software I hear such remarks quite often: 'why is feature foo not >>> > implemented in package bar?. I am finding it harder and harder not to >>> > react negatively to these emails. Now I cannot consider myself as a >>> > contributor to scipy, and thus I can claim that I am not taking your >>> > comment personally. >>> >>> Where I work (a large scientific user facility), there are lots of >>> scientists in what I'll presume is Eric's position -- able and willing to >>> work well with scientific programming tools, but unable to devote the extra >>> time needed to develop core functionality or maintain much work outside of >>> their own area of interest. There are a great many scientists interested >>> in learning and using python. Several people there *are* writing >>> scientific libraries with python. Similarly in the fields I work in, >>> python is widely accepted as an important ecosystem. >>> >>> >>> > Why isn't scipy not up to the task? Will, the answer is quite simple: >>> > because it's developed by volunteers that do it on their spare time, >>> late >>> > at night too often, or companies that put some of their benefits in >>> open >>> > source rather in locking down a market. 90% of the time the reason >>> the >>> > feature isn't as good as you would want it is because of lack of >>> time. >>> > >>> > I personally find that suggesting that somebody else should put more >>> of >>> > the time and money they are already giving away in improving a >>> feature >>> > that you need is almost insulting. >>> >>> Well, in some sense, Eric's message is an expression of interest.... >>> Perhaps you would prefer that nobody outside the core group of developers >>> or mailing list subscribers asked for any new features or clarification of >>> existing features. >>> >>> >>> > I am aware that people do not realize how small the group of people >>> that >>> > develop and maintain their toys is. Borrowing from Fernando Perez's >>> talk >>> > at Euroscipy (http://www.euroscipy.org/file/6459?vid=download slide >>> 80), >>> > the number of people that do 90% of the grunt work to get the core >>> > scientific Python ecosystem going is around two handfuls. >>> >>> Well, Fernando's slides indicate there is a small group that dominates >>> commits to the projects, then explains, at least partially, why that it >>> is. It is *NOT* because scientists expect this work to be done for them by >>> volunteers who should just work harder. >>> >>> There are very good reasons for people to not be involved. 
The work is >>> rarely funded, is generally a distraction from funded work, and hardly ever >>> "counts" as scientific work. That's all on top of being a scientist, not a >>> programmer. Now, if you'll allow me, I myself am one of the "lucky" >>> scientific software developers, well-recognized in my own small community >>> for open source analysis software, and also in a scientific position and in >>> a group where building tools for better data collection and analysis can >>> easily be interpreted as part of the job. In fact, I spend a very >>> significant amount of my time writing open source software, and work nearly >>> exclusively in python. >>> >>> So, just as as an example of what happens when someone might >>> "contribute", I wrote some code (lmfit-py) that could go into scipy and >>> posted it to this list several months ago. Many people have expressed >>> interest in this module, and it has been discussed on this list a few times >>> in the past few months. Though lmfit-py is older than Fernando's slides >>> (it was inspired after being asked several times "Is there something like >>> IDL's mpfit, only faster and in python?"), it actually follows his >>> directions of "get involved" quite closely: it is BSD, at github, with >>> decent documentation, and does not depend on packages other than scipy and >>> numpy. Though it's been discussed on this list recently, two responses >>> from frequent mailing-list responders (you, Paul V) was more along the >>> lines of "yes, that could be done, in principle, if someone were up to >>> doing the work" instead of "perhaps package xxx would work for you". >>> >>> At no point has anyone from the scipy team expressed an interest in >>> putting this into scipy. OK, perhaps lmfit-py is not high enough quality. >>> I can accept that. >> >> >> I don't think anyone has doubts about the quality of lmfit. On the >> contrary, I've asked you to list it on http://scipy.org/Topical_Software(which you did) because I thought it looked interesting, and have directed >> some users towards your package. The documentation is excellent, certainly >> better than that of many parts of scipy. The worry with your code is that >> the maintenance burden may be relatively high, simply because very few >> developers are familiar with AST. The same for merging it in scipy - one of >> the core developers will have to invest a significant amount of time >> wrapping his head around your work. >> >> The ideal scenario from my point of view would be this: >> - lmfit keeps being maintained by you as a separate package for a while >> (say six months to a year) >> - it gains more users, who can discover potential flaws and provide >> feedback. The API can still be changed if necessary. >> - once it's stabilized a bit more, you propose it again (and more >> explicitly) for inclusion in scipy >> - one of the developers does a thorough review and merges it into >> scipy.optimize >> - you get commit rights and maintain the code within scipy >> - bonus points: if you would be interested in improving and reviewing PRs >> for related code in optimize. >> >> >> Scipy is a very good place to add functionality that's of use in many >> different fields of science and engineering, but it needs many more active >> developers. I think this thread is another reminder of that. Some of the >> criticism in this thread about how hard it is to contribute is certainly >> justified. 
I've had the plan for a while (since Fernando's EuroScipy talk >> actually) to write a more accessible "how to contribute" document than the >> one Pauli linked to. Besides the mechanics (git, Trac, etc.) it should at >> least provide some guidance on what belongs in scipy vs. in a scikit, how >> to get help, how to move a contribution that doesn't get a response >> forward, etc. I'll try to get a first draft ready within the next week or >> so. >> >> > I wonder if it would be useful to put a reference to lmfit in the leastsq > documentation? I know that would need to be temporary and that referencing > something outside scipy is unusual, but it might help increase the number > of users and help it on it's way. > Fine with me. I actually think we can do this more often, both for packages that may be included in scipy later and for pacakges like scikits.image/statsmodels/learn. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Mar 10 08:45:11 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Mar 2012 14:45:11 +0100 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> Message-ID: <4F5B5AE7.1020109@molden.no> Den 09.03.2012 21:13, skrev josef.pktd at gmail.com: > I think Sturla has a point in that both count and length are positive. > It doesn't look like it's relevant for length, but in the counts there > is a bunching just above zero, this creates either a non-linearity or > requires another distribution log-normal (?) or Poisson (without > zeros, or loc=1)? Josef You can see that the dependent variable is counts with most of them below 10. So I maintain that appropriate model is Poisson regression. That is, COX_count ~ Poission(lambda) with log(lambda) = b0 + b1 * genome_length Or if there are N groups of bacteria, log(lambda) = b[0] + b[1] * genome_length + np.dot(b[2:N+1], group[0:N-1]) with N-1 dummy indicator variables in the vector "group". One could of course consider even more complicated models, such as interaction terms between bacterial group and genome length. It's just a matter of adding in the appropriate predictor variables. Normally, the p-value of a Poisson regression model can be inferred from the likelihood ratio against a reduced model if samples are independent. But if samples are not independent, one cannot assume that the total log-likelihood for the whole data is the sum of log-likelihoods for each data point. So Peter would need to derive a correction for this. I cannot be more specific because I don't know the specifics about how this between-sample dependency is generated. Perhaps Peter could explain it? Sturla From josef.pktd at gmail.com Sat Mar 10 08:57:01 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 10 Mar 2012 08:57:01 -0500 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: <4F5B5AE7.1020109@molden.no> References: <4F5A25F7.3020408@molden.no> <4F5B5AE7.1020109@molden.no> Message-ID: On Sat, Mar 10, 2012 at 8:45 AM, Sturla Molden wrote: > Den 09.03.2012 21:13, skrev josef.pktd at gmail.com: >> I think Sturla has a point in that both count and length are positive. >> It doesn't look like it's relevant for length, but in the counts there >> is a bunching just above zero, this creates either a non-linearity or >> requires another distribution log-normal (?) or Poisson (without >> zeros, or loc=1)? 
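(A minimal sketch of the Poisson-regression idea being discussed for these count data; the model itself is spelled out in Sturla's reply just below. Assumes statsmodels; counts, lengths and group are hypothetical arrays, and the group dummies and deviance comparison only illustrate the shape of such an analysis.)

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: a count, a genome length (Mb) and a group label per bacterium.
    lengths = np.random.uniform(1.0, 9.0, size=200)
    group = np.random.randint(0, 3, size=200)
    counts = np.random.poisson(np.exp(0.3 + 0.2 * lengths))

    # Poisson GLM with log link: log(lambda) = b0 + b1*length + group dummies.
    dummies = (group[:, None] == np.arange(1, 3)).astype(float)  # 2 dummies, group 0 is baseline
    full_exog = sm.add_constant(np.column_stack([lengths, dummies]))
    full = sm.GLM(counts, full_exog, family=sm.families.Poisson()).fit()

    # Reduced model without genome length; the deviance difference is a
    # likelihood-ratio-type statistic (1 degree of freedom here).
    reduced = sm.GLM(counts, sm.add_constant(dummies), family=sm.families.Poisson()).fit()
    print(full.params, reduced.deviance - full.deviance)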
Josef > > You can see that the dependent variable is counts with most of them > below 10. So I maintain that appropriate model is Poisson regression. > > That is, > > ? ?COX_count ~ Poission(lambda) > > with > > ? ?log(lambda) = b0 + b1 * genome_length > > Or if there are N groups of bacteria, > > ? ?log(lambda) = b[0] + b[1] * genome_length > ? ? ? ? ?+ np.dot(b[2:N+1], group[0:N-1]) > > with N-1 dummy indicator variables in the vector "group". > > One could of course consider even more complicated models, such as > interaction terms between bacterial group and genome length. It's just a > matter of adding in the appropriate predictor variables. > > Normally, the p-value of a Poisson regression model can be inferred from > the likelihood ratio against a reduced model if samples are independent. > > But if samples are not independent, one cannot assume that the total > log-likelihood for the whole data is the sum of log-likelihoods for each > data point. So Peter would need to derive a correction for this. I > cannot be more specific because I don't know the specifics about how > this between-sample dependency is generated. Perhaps Peter could explain it? He explained the between sample correlation with the similarity (my analogy autocorrelation in time series, or spatial correlation). The main problem I see with using Poisson is that I wouldn't know how to include the correlation. I never looked at this, and statsmodels doesn't implement it. (I looked a bit at count processes for time series with serial dependence, but not much.) My guess is that log-linear or something like that would be easier Is there a multivariate version of Poisson with correlated observations similar to GLS for the linear model? Josef > > > Sturla > > > > > > > > > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From njs at pobox.com Sat Mar 10 09:01:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 10 Mar 2012 14:01:38 +0000 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: On Fri, Mar 9, 2012 at 9:36 PM, Ralf Gommers wrote: > I don't think anyone has doubts about the quality of lmfit. On the contrary, > I've asked you to list it on http://scipy.org/Topical_Software (which you > did) because I thought it looked interesting, and have directed some users > towards your package. The documentation is excellent, certainly better than > that of many parts of scipy. The worry with your code is that the > maintenance burden may be relatively high, simply because very few > developers are familiar with AST. The same for merging it in scipy - one of > the core developers will have to invest a significant amount of time > wrapping his head around your work. Out of curiosity (and apropos an earlier thread), would it affect your reservations if lmfit's ad-hoc AST usage and python interpreter were replaced by a simple call to 'eval'? 
-- N From sturla at molden.no Sat Mar 10 09:34:54 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Mar 2012 15:34:54 +0100 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> <4F5B5AE7.1020109@molden.no> Message-ID: <4F5B668E.4000100@molden.no> Den 10.03.2012 14:57, skrev josef.pktd at gmail.com: > My guess is that log-linear or something like that would be easier Log-linear models are to Poisson regression what ANOVA is to linear regression. There is covariate (genome length), so he cannot just use categorical predictors. > Is there a multivariate version of Poisson with correlated > observations similar to GLS for the linear model? Yes there is, but I am not sure it would help. I have to think about this again. Sturla From ralf.gommers at googlemail.com Sat Mar 10 09:50:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 10 Mar 2012 15:50:14 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: On Sat, Mar 10, 2012 at 3:01 PM, Nathaniel Smith wrote: > On Fri, Mar 9, 2012 at 9:36 PM, Ralf Gommers > wrote: > > I don't think anyone has doubts about the quality of lmfit. On the > contrary, > > I've asked you to list it on http://scipy.org/Topical_Software (which > you > > did) because I thought it looked interesting, and have directed some > users > > towards your package. The documentation is excellent, certainly better > than > > that of many parts of scipy. The worry with your code is that the > > maintenance burden may be relatively high, simply because very few > > developers are familiar with AST. The same for merging it in scipy - one > of > > the core developers will have to invest a significant amount of time > > wrapping his head around your work. > > Out of curiosity (and apropos an earlier thread), would it affect your > reservations if lmfit's ad-hoc AST usage and python interpreter were > replaced by a simple call to 'eval'? I don't think using "eval" will make anyone happy. Also, my reservation isn't very strong. I personally think it would make sense for lmfit to remain a stand-alone package for a while, but if others disagree and there's one developer who invests the required time in reviewing/merging it, that's all it would take. This doesn't even have to be a core scipy developer - if you for example would do this work and indicate that you would be able to do some basic maintenance for it in case Matt can't / won't anymore, that would be perfectly fine. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Mar 10 09:54:55 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 10 Mar 2012 15:54:55 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: 10.03.2012 15:01, Nathaniel Smith kirjoitti: [clip] > Out of curiosity (and apropos an earlier thread), would it affect your > reservations if lmfit's ad-hoc AST usage and python interpreter were > replaced by a simple call to 'eval'? As far as I see, the AST manipulation is used only to provide a restricted dialect of Python. 
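(For concreteness, the eval-based alternative Nathaniel asks about could look roughly like the sketch below: a user-supplied constraint expression evaluated against the current parameter values plus a small whitelist of math functions. This only illustrates the idea, it is not lmfit's actual implementation.)

    import math

    def eval_constraint(expr, params):
        # Evaluate e.g. "amp * sin(phase)" with only whitelisted names visible.
        namespace = {'__builtins__': {}}
        for name in ('sin', 'cos', 'tan', 'exp', 'log', 'sqrt', 'pi'):
            namespace[name] = getattr(math, name)
        namespace.update(params)
        return eval(expr, namespace)

    print(eval_constraint('amp * sin(phase)', {'amp': 2.0, 'phase': math.pi / 2}))  # -> 2.0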
The main question then seems to be whether there is a need to guard against malicious input (IMHO, not really). The AST pieces seem to be more or less abstracted away, so this shouldn't matter so much. The bigger issue is that the interface provided by this package for specifying constraints etc. is completely different from the unified optimization interface that is currently in Scipy's Git. Just dumping lmfit into Scipy would IMHO not be a good idea --- having one interface for scalar minimizers, and then a completely different one for least squares does not seem like a good idea. Code and ideas could be reused, though, picking up the best parts of the two approaches that are there now, and producing a better one. -- Pauli Virtanen From sturla at molden.no Sat Mar 10 10:05:58 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Mar 2012 16:05:58 +0100 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: References: <4F5A25F7.3020408@molden.no> <4F5B5AE7.1020109@molden.no> Message-ID: <4F5B6DD6.7020500@molden.no> Den 10.03.2012 14:57, skrev josef.pktd at gmail.com: > > He explained the between sample correlation with the similarity (my > analogy autocorrelation in time series, or spatial correlation). > > Look at his attachment ives.tiff. If the categories are known in advance (right panel in ives.tiff), I think what he actually needs is computing the likelihood ratio between the model log(lambda) = b[0] + b[1] * genome_length + np.dot(b[2:N+1], group[0:N-1]) and a reduced model log(lambda) = b[0] + np.dot(b[1:N], group[0:N-1]) That is, adding genome length as a predictor should not improve the fit given that bacterial groups are already in the model. If he does not have groups, but some sort of dendrogram (left panel in ives.tiff), perhaps he could preprocess the data by clustering the bacteria based on his dendrogram? A full dendrogram (e.g. used as nested log-linear model) would overfit the data and explain it perfectly. So adding genome length would always give zero improvement. But if the dendrogram can be reduced into a few descrete categories, he could use a likelihood ratio test for the genome length. Sturla From sturla at molden.no Sat Mar 10 10:10:02 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Mar 2012 16:10:02 +0100 Subject: [SciPy-User] Generalized least square on large dataset In-Reply-To: <4F5B6DD6.7020500@molden.no> References: <4F5A25F7.3020408@molden.no> <4F5B5AE7.1020109@molden.no> <4F5B6DD6.7020500@molden.no> Message-ID: <4F5B6ECA.1090307@molden.no> Den 10.03.2012 16:05, skrev Sturla Molden: > That is, adding genome length as a predictor should not > improve the fit given that bacterial groups are already in > the model... ... and the situation is as in the right panel of ives.tiff. Sturla From pav at iki.fi Sat Mar 10 10:46:34 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 10 Mar 2012 16:46:34 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> Message-ID: 10.03.2012 15:54, Pauli Virtanen kirjoitti: [clip] > Just dumping lmfit into Scipy would IMHO not be a good idea --- having > one interface for scalar minimizers, and then a completely different one > for least squares does not seem like a good idea. 
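(To make the comparison concrete: a rough sketch of how the unified interface in Scipy's development tree is meant to be called -- a single minimize() entry point that dispatches on method and accepts box bounds where the backend supports them. The exact return object was still settling at the time, so treat the details as approximate.)

    import numpy as np
    from scipy.optimize import minimize  # unified interface from the development branch

    def chi2(p, x, y):
        # Scalar objective: sum of squared residuals of a straight-line model.
        return np.sum((y - (p[0] + p[1] * x)) ** 2)

    x = np.linspace(0.0, 10.0, 50)
    y = 3.0 + 0.5 * x + 0.1 * np.random.randn(50)

    # One entry point; the bounds are handled by the chosen backend (L-BFGS-B here).
    res = minimize(chi2, [1.0, 1.0], args=(x, y), method='L-BFGS-B',
                   bounds=[(None, None), (0.0, 1.0)])
    print(res.x)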
Code and ideas could > be reused, though, picking up the best parts of the two approaches that > are there now, and producing a better one. ... although it seems that the interface in Scipy now is a much more lower-level one --- and does not have the parameter convenience abstraction, which seems to be the main point in lmfit, and is orthogonal to what's currently in. Now that would be a rather useful addition, one just would have to figure out how to make it work nicely also with the scalar optimizers. -- Pauli Virtanen From matt.newville at gmail.com Fri Mar 9 15:50:57 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Fri, 9 Mar 2012 12:50:57 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> Message-ID: <5272144.709.1331326257133.JavaMail.geo-discussion-forums@ynhs12> Pauli, On Friday, March 9, 2012 12:31:22 PM UTC-6, Pauli Virtanen wrote: > > Hi, > > 09.03.2012 18:50, Charles R Harris kirjoitti: > [clip] > > Carefully stepping past the kerfluffle at the bar, I think this sort of > > functionality in scipy would be useful. If nothing else, I wouldn't have > > to keep implementing for myself ;) IIRC, Dennis Lexalde was going to do > > something similar and I think it would be good if some of the folks with > > implementations started a separate thread about getting it into scipy. > > Dennis actually not only intended, but also implemented something > similar. I wasn't too deeply involved in that, but it's already merged > in Scipy's trunk. > > Now, based on a *very* quick look to lmfit (I did not look at it before > now as I did not remember it existed), it seems to be quite similar in > purpose. Hashing out if lmfit has something extra, or if the current > implementation is missing something could be useful, however. > > Pauli > If I understand, you are talking about scipy.optimize.minimize(), which can take many minimization methods, but only accepts bounds for the underlying methods (l-bfgs-b, coblya, slsqp, and tnc), and constraints only for coblya and slsqp. Thus, I would interpret minimize() to aim to be (and documented to be) a unification of the routines to minimize a scalar function of one or more variables. For the discussion here, minimize() does not support the Levenberg-Marquardt least-squares algorithm (leastsq) at all, as lmfit uses (and as mpfit uses). The constraint mechanism is entirely different between lmfit and the other constrained optimization methods. Cheers, --Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.newville at gmail.com Fri Mar 9 16:13:35 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Fri, 9 Mar 2012 13:13:35 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <20120309165443.GA26552@phare.normalesup.org> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> Message-ID: <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> Hi Gael, On Friday, March 9, 2012 10:54:43 AM UTC-6, Gael Varoquaux wrote: > > Hi Matt, > > I am not going to answer to the core of your message. 
I partly agree with > it and partly disagree. I think that it is fair to have different points > of view. In addition, I do share the opinion that the situation of > developers in open source scientific software is not ideal. I've suffered > from it personally. > > I just want to react to a couple of minor points > > > At no point has anyone from the scipy team expressed an interest in > > putting this into scipy. > > Who is the scipy team? What is the scipy team? Who could or should > express such an interest? These are people struggling to maintain a > massive package on their free time. It actually takes a lot of time to > monitor the mailing lists and pick up offers like yours to turn them into > something that can be integrated. > OK, that's all fair. I am certainly not a scipy dev. But you do seem to want it both ways: chastise people for asking for features instead of writing those features themselves, and also saying you're far too busy to read and act on submissions of code unless it conforms exactly to your needs (this year, with github!). I would guess that this is not likely to work very well for you. For myself, I might be willing to contribute to scipy, but I have my own projects and mailing lists to worry about. Lmfit was something of a fun side-project for me -- I'm willing to support it, and can see that it *could* go into scipy. Since you originally brought it up, I did essentially everything in Fernando Perez's slides pleaded for how to contribute (except submit a pull request, but I did ask whether this was viewed as something that should be a scikit, or go into scipy, or retained as standalone). > Had you submitted a pull request, with code ready to be merged, i.e. with > no extra work in terms of documentation, API or tests, I think that it > would be legitimate to blame the scipy developers for lack of interest. > That said, I can easily understand how such a pull request would fall > between the cracks. It's unfortunate, not excusable, but it does happen. > Indeed, in the projects I maintain, I am kept busy full time with pure > maintenance work (bug fixing, answering emails, improving documentation). > When I review and merge pull requests, a lot of the time they are for > features that I do not need, and I spend full week ends adding tests, > fixing numerical instabilities, completing the docs so that they can be > merged. You have to realize that most contributions to open source > projects actually add up to the workload of the core developers. > Thankfully, not all of them. Teams do build upon people unexpectedly > fixing bugs, contributing flawless code that can be merged in without any > additional work. > Well, I understand you don't necessarily consider yourself to be a scipy dev, but I'll ask anyway: How many pull requests have been made, and how many have been accepted? Perhaps I am not reading github's Pull Request link correctly, but that seems to indicate that the numbers are 17 and 0. Surely, those cannot be correct. Unless I am counting wrong, 5 of the 17 (total? outstanding?) pull requests listed for scipy/scipy involve optimization. > I personally have seen my time invested in maintenance of open source > project go up and up for the last few years, until it was to a point > where I was spending a major part of my free time on it. It ended up > giving me a nasty back pain, and I started not answering bug reports, > pull requests and support emails to preserve my health: it is not sane to > spend all onces time in front of a computer. 
> > > There are many kinds of skills. Sometimes, not insulting your > customers, > > colleagues, and potential collaborators is the most important one. > > Maybe I went over the top. I didn't want to sound insulting. I felt > insulted, as an open source develop (even thought I am not a scipy > developer). I am sorry that I ignited a flame. Getting worked out about > email is never a good thing, and discussion pushing blame certainly don't > help building a community. Maybe I shouldn't have sent this email, or I > should have worded it differently. I apologize for the harsh tone. I > certainly did feel bad when I received the original email, and I wanted > to express it. > It is easy to feel like one's hard work on an open source project is under-appreciated. I understand that feeling, and have felt that way myself in the past. I am definitely quite appreciative of the work done on scipy and friends. > > For myself, I find it quite discouraging that the scipy team is so > > insular. > > Firstly, I would like to stress that I cannot consider myself as part of > the scipy team. I contribute very little code to scipy. As a consequence > I do not feel that I have much legitimacy in making decisions or > comments on the codebase. Thus you shouldn't take my reaction as a > reaction coming from the scipy team, but rather as coming from myself. > > Second, can I ask you what makes you think that the scipy team is > insular? Scipy is a big project with a lot of history. As such it is > harder to contribute to it than a small and light project. But I don't > feel any dogmatism or clique attitude from the developers. And, by the > way, if we are going to talk about the scipy developers, > Well, there was a discussion "Alternative to scipy.optimize" in the past two weeks on scipy-users that mentioned lmfit, and several over the past several months, and apparently requests about mpfit in the more distant past. And yet two scipy contributers (according to github's list, I am counting you) responded to the original request for features **exactly like lmfit** with something reading an awful lot like "Well, you'll have to write one". Perhaps insular is not a fair characterization -- how would you characterize that? I encourage everybody to find out who they are, i.e. who has been > contributing lately > [1]. I don't think that the handful of people that come on top of the > list have an insular behavior. I do think that they are on an island, in > the sens that they are pretty much left alone to do the grunt work. None > of these people reacted badly to any mail on this mailing list about the > state of scipy. I raise my hat to them! > And so do I. --Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Mar 10 12:00:44 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 10 Mar 2012 18:00:44 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> Message-ID: 09.03.2012 22:13, Matthew Newville kirjoitti: [clip] > Well, I understand you don't necessarily consider yourself to be a scipy > dev, but I'll ask anyway: How many pull requests have been made, and > how many have been accepted? Perhaps I am not reading github's Pull > Request link correctly, but that seems to indicate that the numbers are > 17 and 0. Surely, those cannot be correct. Unless I am counting > wrong, 5 of the 17 (total? outstanding?) pull requests listed for > scipy/scipy involve optimization. Wrong: 17 open, 99 accepted. [clip] > Well, there was a discussion "Alternative to scipy.optimize" in the past > two weeks on scipy-users that mentioned lmfit, and several over the past > several months, and apparently requests about mpfit in the more distant > past. And yet two scipy contributers (according to github's list, I am > counting you) responded to the original request for features **exactly > like lmfit** with something reading an awful lot like "Well, you'll have > to write one". Perhaps insular is not a fair characterization -- how > would you characterize that? Come on. The correct characterization is just: "busy". I did not remember that your project existed as it was announced half a year ago with no proposal that it should be integrated, and did not read the recent thread on scipy-user. -- Pauli Virtanen From pav at iki.fi Sat Mar 10 12:02:13 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 10 Mar 2012 18:02:13 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> Message-ID: 10.03.2012 18:00, Pauli Virtanen kirjoitti: > 09.03.2012 22:13, Matthew Newville kirjoitti: > [clip] >> Well, I understand you don't necessarily consider yourself to be a scipy >> dev, but I'll ask anyway: How many pull requests have been made, and >> how many have been accepted? Perhaps I am not reading github's Pull >> Request link correctly, but that seems to indicate that the numbers are >> 17 and 0. Surely, those cannot be correct. Unless I am counting >> wrong, 5 of the 17 (total? outstanding?) pull requests listed for >> scipy/scipy involve optimization. > > Wrong: 17 open, 99 accepted Sorry, the actual number is 161 accepted. -- Pauli Virtanen From josef.pktd at gmail.com Sat Mar 10 12:12:56 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 10 Mar 2012 12:12:56 -0500 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> Message-ID: On Sat, Mar 10, 2012 at 12:00 PM, Pauli Virtanen wrote: > 09.03.2012 22:13, Matthew Newville kirjoitti: > [clip] >> Well, I understand you don't necessarily consider yourself to be a scipy >> dev, but I'll ask anyway: ?How many pull requests have been made, and >> how many have been accepted? ?Perhaps I am not reading github's Pull >> Request link correctly, but that seems to indicate that the numbers are >> 17 and 0. ?Surely, those cannot be correct. ? ? ?Unless I am counting >> wrong, 5 of the 17 (total? outstanding?) pull requests listed for >> scipy/scipy involve optimization. > > Wrong: 17 open, 99 accepted. > > [clip] >> Well, there was a discussion "Alternative to scipy.optimize" in the past >> two weeks on scipy-users that mentioned lmfit, and several over the past >> several months, and apparently requests about mpfit in the more distant >> past. ?And yet two scipy contributers (according to github's list, I am >> counting you) responded to the original request for features **exactly >> like lmfit** with something reading an awful lot like "Well, you'll have >> to write one". ? ?Perhaps insular is not a fair characterization -- how >> would you characterize that? > > Come on. The correct characterization is just: "busy". I did not > remember that your project existed as it was announced half a year ago > with no proposal that it should be integrated, and did not read the > recent thread on scipy-user. just as a reminder there is also https://github.com/scipy/scipy/pull/90 openopt started out as a scikits. It doesn't look easy to me to come up with an interface to optimizers that satisfy "all" use cases. (and there is only one Pauli) Josef > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Sat Mar 10 13:16:09 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 10 Mar 2012 13:16:09 -0500 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? Message-ID: I would like to extend the discussion a bit. (I thought of dropping the half finished message and go back to my corner but I think this might still be useful.) On Fri, Mar 9, 2012 at 1:47 PM, Pauli Virtanen wrote: > Hi, > > 09.03.2012 17:51, william ratcliff kirjoitti: > [clip] >> In this particular case, what are the exact steps needed to get it into >> scipy? ?Can they charge be listed as tickets somewhere so that others of >> us can help? ?Can we document the process to make it easier the next >> time? ?I realize everyone is busy, but if the barrier to contribution is >> lowered it will make life better in the long run. > > In general, basically two ways for contributions: > > 1. A pull request via Github. We have a writeup here with various tips: > > ? http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html > > ? Just replace "Numpy" by "Scipy" everywhere. > > 2. File a ticket in the Trac: http://projects.scipy.org/scipy/ > > ? Attach whetever you have (a patch, separate files) to the ticket, > ? and tag it as "enhancement" and "needs_review". > > That's about it. > > ? 
?*** > > However, to make it easier for someone to look at the work and verify it > works properly: > > - Ensure your code is accompanied by tests that demonstrate it actually > ?works as intended. You can look for examples how to write them in the > ?Scipy source tree, in files named test_XXX.py > > - Ensure the behavior of the public functions is documented in the > ?docstrings. > > - Prefer the Github way. Granted, there *is* a learning curve, but it > ?saves work in the long run, and it is far less clunky to use. > > - The more finished the contribution is, the less work it is to merge, > ?and gets in faster. > > If you get no response, shout on the scipy-devel mailing list. If > there's still no response, shout louder and start accusing people ;) > > If the contribution is "controversial" --- duplicates existing > functionality, breaks backwards compatibility, is very specialized for a > particular research problem, relies on magic, etc. --- it's good to give > an argument why the stuff should be included, as otherwise the > motivation may be missed. > Code that is not ready for Prime Time -------------------------------------------------- Stata has a large library of user contributed code http://ideas.repec.org/s/boc/bocode.html matlab has the file exchange several other statistical packages have user contributed code collections in forums SciPy has mailing lists, pull requests, gists, tickets and scipy-central and open source and pypi Pull request on github are very convenient for code that changes, improves, fixes, existing code, or code that can be included after some work. For new functions or modules intended for library code, I find them easier to try out when they are standalone. One of the bottlenecks in including code in established packages is the time it takes to review, refactor and test code that isn't quite right or doesn't quite fit. I think it would be useful to have a central location for publishing code and get earlier feedback from users. (Currently I bookmark, for example, gists or modules in a random package, that look interesting and I might not find again when I need them in future.) I'm still envious of the matlab fileexchange with a very useful commenting system where I look more often than at scipy-central, and I like the commenting system on stackoverflow that gives a much faster way of evaluating several ways of doing things. I also think Mathworks and Statacorp are very smart in supporting user code, since (I assume) that they look at the download statistic to see what is in high demand and might incorporate code if it's license compatible. A community supported scipy-central !? Cheers, Josef > > Cheers, > Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Sat Mar 10 13:32:47 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 10 Mar 2012 19:32:47 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: Message-ID: 10.03.2012 19:16, josef.pktd at gmail.com kirjoitti: [clip] > I think it would be useful to have a central location for publishing > code and get earlier feedback from users. (Currently I bookmark, for > example, gists or modules in a random package, that look interesting > and I might not find again when I need them in future.) 
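(A minimal sketch of what the test part of Pauli's checklist above looks like in practice: a file named test_something.py using numpy.testing, runnable on its own. The function under test here -- a plain leastsq line fit -- is just a placeholder.)

    # test_linefit.py
    import numpy as np
    from numpy.testing import assert_allclose, run_module_suite
    from scipy.optimize import leastsq

    def test_leastsq_recovers_line():
        # With noise-free data, leastsq should recover intercept and slope.
        x = np.linspace(0.0, 5.0, 20)
        y = 1.5 + 2.0 * x

        def residuals(p):
            return y - (p[0] + p[1] * x)

        popt, ier = leastsq(residuals, [0.0, 0.0])
        assert ier in (1, 2, 3, 4)          # leastsq convergence flags
        assert_allclose(popt, [1.5, 2.0], rtol=1e-6)

    if __name__ == "__main__":
        run_module_suite()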
> > I'm still envious of the matlab fileexchange with a very useful > commenting system where I look more often than at scipy-central, and I > like the commenting system on stackoverflow that gives a much faster > way of evaluating several ways of doing things. > > I also think Mathworks and Statacorp are very smart in supporting user > code, since (I assume) that they look at the download statistic to see > what is in high demand and might incorporate code if it's license > compatible. I think this is pretty much the position scipy-central tries to fill. As far as I understand, what you say that is missing is - more advertisement - comment system The former can be fixed. The latter requires someone to sit down and think how to implement it --- but it shouldn't be too difficult, as the app [1] is written with Django and seems pretty easy to work on. If someone familiar with web applications wants to give a hand here, this would be an opportunity to contribute. [1] https://github.com/kgdunn/SciPyCentral/ -- Pauli Virtanen From ralf.gommers at googlemail.com Sat Mar 10 14:48:11 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 10 Mar 2012 20:48:11 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: Message-ID: On Sat, Mar 10, 2012 at 7:32 PM, Pauli Virtanen wrote: > 10.03.2012 19:16, josef.pktd at gmail.com kirjoitti: > [clip] > > I think it would be useful to have a central location for publishing > > code and get earlier feedback from users. (Currently I bookmark, for > > example, gists or modules in a random package, that look interesting > > and I might not find again when I need them in future.) > > > > I'm still envious of the matlab fileexchange with a very useful > > commenting system where I look more often than at scipy-central, and I > > like the commenting system on stackoverflow that gives a much faster > > way of evaluating several ways of doing things. > > > > I also think Mathworks and Statacorp are very smart in supporting user > > code, since (I assume) that they look at the download statistic to see > > what is in high demand and might incorporate code if it's license > > compatible. > > I think this is pretty much the position scipy-central tries to fill. As > far as I understand, what you say that is missing is > > - more advertisement > As a start, I added a link on the main scipy.org site. Adding it to http://new.scipy.org/ would also be good - SciPy Central needs a logo though. Ralf > > - comment system > > The former can be fixed. The latter requires someone to sit down and > think how to implement it --- but it shouldn't be too difficult, as the > app [1] is written with Django and seems pretty easy to work on. If > someone familiar with web applications wants to give a hand here, this > would be an opportunity to contribute. > > [1] https://github.com/kgdunn/SciPyCentral/ > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Sat Mar 10 17:09:49 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 10 Mar 2012 14:09:49 -0800 Subject: [SciPy-User] Layering a virtualenv over EPD Message-ID: <4F5BD12D.9090504@simplistix.co.uk> Hi All, I hope this is the right list. 
So, here's the problem I'm facing: I use EPD as my base python, but I have a bunch of projects that all have additional dependencies. For example, I may want to use a version of Pandas that's newer than the one that ships with my chosen version of EPD, and add some additional libraries that don't ship with EPD. Okay, so I worry that the advice may be "well just install stuff into EPD with pip or easy_install". I don't want to do that, just because I need a newer version of Pandas for one project, doesn't mean I want to have to make sure *all* my projects work with that new version, etc. So, I tried to wrap a virtualenv around epd: epd-python virtualenv.py mytestenv ...and then install new pandas and other stuff: mytestenv/bin/python easy_install -U pandas ...but now, how would I start ipython using that virutalenv? I tried just running "ipython", but of course, that doesn't include the virtualenv. I tried activating the virtualenv and then running ipython, but it still doesn't include anything installed in the virtualenv. So, how should I do this? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From robert.kern at gmail.com Sat Mar 10 17:22:52 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 10 Mar 2012 22:22:52 +0000 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD12D.9090504@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> Message-ID: On Sat, Mar 10, 2012 at 22:09, Chris Withers wrote: > Hi All, > > I hope this is the right list. > > So, here's the problem I'm facing: I use EPD as my base python, but I > have a bunch of projects that all have additional dependencies. Questions about EPD should go to epd-users at enthought.com https://mail.enthought.com/mailman/listinfo/epd-users However, your question is about virtualenv, not EPD. > ...but now, how would I start ipython using that virutalenv? > > I tried just running "ipython", but of course, that doesn't include the > virtualenv. You need to install IPython in your virtualenv. -- Robert Kern From sturla at molden.no Sat Mar 10 17:24:46 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Mar 2012 23:24:46 +0100 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD12D.9090504@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> Message-ID: <4F5BD4AE.5020703@molden.no> You might also want to ask on the EPD users' list: epd-users at the provider of EPD (written so to avoid spam). Sturla Den 10.03.2012 23:09, skrev Chris Withers: > Hi All, > > I hope this is the right list. > > So, here's the problem I'm facing: I use EPD as my base python, but I > have a bunch of projects that all have additional dependencies. For > example, I may want to use a version of Pandas that's newer than the one > that ships with my chosen version of EPD, and add some additional > libraries that don't ship with EPD. > > Okay, so I worry that the advice may be "well just install stuff into > EPD with pip or easy_install". I don't want to do that, just because I > need a newer version of Pandas for one project, doesn't mean I want to > have to make sure *all* my projects work with that new version, etc. > > So, I tried to wrap a virtualenv around epd: > > epd-python virtualenv.py mytestenv > > ...and then install new pandas and other stuff: > > mytestenv/bin/python easy_install -U pandas > > ...but now, how would I start ipython using that virutalenv? 
> > I tried just running "ipython", but of course, that doesn't include the > virtualenv. > > I tried activating the virtualenv and then running ipython, but it still > doesn't include anything installed in the virtualenv. > > So, how should I do this? > > Chris > From chris at simplistix.co.uk Sat Mar 10 17:40:33 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 10 Mar 2012 14:40:33 -0800 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: References: <4F5BD12D.9090504@simplistix.co.uk> Message-ID: <4F5BD861.8050205@simplistix.co.uk> On 10/03/2012 14:22, Robert Kern wrote: >> So, here's the problem I'm facing: I use EPD as my base python, but I >> have a bunch of projects that all have additional dependencies. > > Questions about EPD should go to epd-users at enthought.com Well, okay, but this is a more generic question that seems to face a lot of SciPy users: "I want to layer some stuff on top of a binary install of the scipy stuff, without poluting that base layer", that base layer being EPD, OS-installed packages, etc... >> ...but now, how would I start ipython using that virutalenv? >> >> I tried just running "ipython", but of course, that doesn't include the >> virtualenv. > > You need to install IPython in your virtualenv. Okay, but how do I do that without having to build the whole of ipython myself? How do I say "just let me run ipython (or any of the other binary tools that are in scipy) with a virtual env wrapped over it? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From sturla at molden.no Sat Mar 10 17:44:48 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Mar 2012 23:44:48 +0100 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD861.8050205@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> Message-ID: <4F5BD960.3040902@molden.no> Den 10.03.2012 23:40, skrev Chris Withers: > > Well, okay, but this is a more generic question that seems to face a lot > of SciPy users: "I want to layer some stuff on top of a binary install > of the scipy stuff, without poluting that base layer", that base layer > being EPD, OS-installed packages, etc... > Install the package or module to a local folder instead of site_packages and make sure it is in PYTHONPATH. Sturla From robert.kern at gmail.com Sat Mar 10 17:45:41 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 10 Mar 2012 22:45:41 +0000 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD861.8050205@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> Message-ID: On Sat, Mar 10, 2012 at 22:40, Chris Withers wrote: > On 10/03/2012 14:22, Robert Kern wrote: >>> >>> So, here's the problem I'm facing: I use EPD as my base python, but I >>> have a bunch of projects that all have additional dependencies. >> >> >> Questions about EPD should go to epd-users at enthought.com > > > Well, okay, but this is a more generic question that seems to face a lot of > SciPy users: "I want to layer some stuff on top of a binary install of the > scipy stuff, without poluting that base layer", that base layer being EPD, > OS-installed packages, etc... > > >>> ...but now, how would I start ipython using that virutalenv? >>> >>> I tried just running "ipython", but of course, that doesn't include the >>> virtualenv. >> >> >> You need to install IPython in your virtualenv. 
> > > Okay, but how do I do that without having to build the whole of ipython > myself? The best way to use IPython in your virtualenv is to install it in your virtualenv. It's easy. > How do I say "just let me run ipython (or any of the other binary > tools that are in scipy) There are no executable scripts in scipy. > with a virtual env wrapped over it? If you must, you can edit the ipython script that you already have installed. Change the #! line to be #!/usr/bin/env python instead of the full path to your EPD Python executable. This will make the ipython script use whichever "python" executable is first in your $PATH. If you have your virtualenv activated, this will be the virtualenv's "python" executable. -- Robert Kern From chris at simplistix.co.uk Sat Mar 10 17:57:47 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 10 Mar 2012 14:57:47 -0800 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD960.3040902@molden.no> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> <4F5BD960.3040902@molden.no> Message-ID: <4F5BDC6B.6090102@simplistix.co.uk> On 10/03/2012 14:44, Sturla Molden wrote: > Den 10.03.2012 23:40, skrev Chris Withers: >> >> Well, okay, but this is a more generic question that seems to face a lot >> of SciPy users: "I want to layer some stuff on top of a binary install >> of the scipy stuff, without poluting that base layer", that base layer >> being EPD, OS-installed packages, etc... >> > > Install the package or module to a local folder instead of site_packages > and make sure it is in PYTHONPATH. Yeah, virtualenv is kinda the abstraction of that pattern ;-) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From sturla at molden.no Sat Mar 10 18:05:39 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 11 Mar 2012 00:05:39 +0100 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BDC6B.6090102@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> <4F5BD960.3040902@molden.no> <4F5BDC6B.6090102@simplistix.co.uk> Message-ID: <4F5BDE43.1090401@molden.no> Den 10.03.2012 23:57, skrev Chris Withers: > > Yeah, virtualenv is kinda the abstraction of that pattern ;-) > It seems Python has introduced its' own version of DLL hell. Sorry for stating the obvious. Sturla From rex at nosyntax.net Sat Mar 10 18:44:42 2012 From: rex at nosyntax.net (rex) Date: Sat, 10 Mar 2012 15:44:42 -0800 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD861.8050205@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> Message-ID: <20120310234442.GN24301@ninja.nosyntax.net> Chris Withers [2012-03-10 14:40]: >On 10/03/2012 14:22, Robert Kern wrote: >>> So, here's the problem I'm facing: I use EPD as my base python, but I >>> have a bunch of projects that all have additional dependencies. >> >> Questions about EPD should go to epd-users at enthought.com > >Well, okay, but this is a more generic question that seems to face a lot >of SciPy users: "I want to layer some stuff on top of a binary install >of the scipy stuff, without poluting that base layer", that base layer >being EPD, OS-installed packages, etc... > >>> ...but now, how would I start ipython using that virutalenv? >>> >>> I tried just running "ipython", but of course, that doesn't include the >>> virtualenv. >> >> You need to install IPython in your virtualenv. 
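A minimal sketch of the approach Robert describes, reusing the names from Chris's first message -- the "mytestenv" directory and the "epd-python" command come from that message, not from EPD itself, so adjust paths to taste:

epd-python virtualenv.py mytestenv
mytestenv/bin/easy_install -U pandas
mytestenv/bin/easy_install ipython
mytestenv/bin/ipython

Once IPython has been installed inside the env, either calling mytestenv/bin/ipython directly or activating the env ("source mytestenv/bin/activate") and then running plain "ipython" picks up the packages installed there, which is the point Robert makes above.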
> >Okay, but how do I do that without having to build the whole of ipython >myself? How do I say "just let me run ipython (or any of the other >binary tools that are in scipy) with a virtual env wrapped over it? Simple answer: use R. I fought with Python+NumPy+SciPy+Matplotlib problems for years before I discovered R. Night and day change. One package that just works. Thousands of libraries that just work. Developers, I'm not denigrating your efforts. I like Python, and I really tried to make Python+NumPy+SciPy+Matplotlib my main tool for years, but as a mere user it was simply too difficult to maintain the parts -- every hour spent screwing with tool problems is an hour lost to creative work. Perhaps the NumPy+SciPy+Matplotlib community could learn something by looking at how the R community works? To this mere user who wants to get a job done, it's a night and day difference. I still use Python for GP programming, but there's a snowball's chance I'd ever use anything but R for my main interest, which is econometrics. -rex -- "In the real world, this would be a problem. But in mathematics, we can just define a place where this problem doesn't exist. So we'll go ahead and do that now..." From wardefar at iro.umontreal.ca Sat Mar 10 20:17:56 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Sat, 10 Mar 2012 20:17:56 -0500 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> Message-ID: On 2012-03-10, at 5:45 PM, Robert Kern wrote: >> Okay, but how do I do that without having to build the whole of ipython >> myself? > > The best way to use IPython in your virtualenv is to install it in > your virtualenv. It's easy. To add to what Robert said, there's not a lot to "build" in IPython, if anything at all. EPD provides all the compiled extensions that are dependencies/optional dependencies, like pyzmq. "easy_install ipython" or "python setup.py install" from a decompressed source package should take seconds. From telukpalu at gmail.com Sat Mar 10 21:01:45 2012 From: telukpalu at gmail.com (aa) Date: Sun, 11 Mar 2012 09:01:45 +0700 Subject: [SciPy-User] about ode Message-ID: <4F5C0789.1090800@gmail.com> can You give me alitle explanation using scipy escpecially abolut sc.integrate.ode, i found matlab code like this: g = @(t,x) [x(1); -2*x(2)] [t,x] = ode45(g,[0:1], [1.5,3]) plot(x(:,1),x(:,2)) --- how to get equivalent code in scipy...?? From lists at hilboll.de Sun Mar 11 04:55:46 2012 From: lists at hilboll.de (Andreas H.) Date: Sun, 11 Mar 2012 09:55:46 +0100 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5BD12D.9090504@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> Message-ID: <4F5C6892.3050606@hilboll.de> Am 10.03.2012 23:09, schrieb Chris Withers: > Hi All, > > I hope this is the right list. > > So, here's the problem I'm facing: I use EPD as my base python, but I > have a bunch of projects that all have additional dependencies. For > example, I may want to use a version of Pandas that's newer than the one > that ships with my chosen version of EPD, and add some additional > libraries that don't ship with EPD. > > Okay, so I worry that the advice may be "well just install stuff into > EPD with pip or easy_install". I don't want to do that, just because I > need a newer version of Pandas for one project, doesn't mean I want to > have to make sure *all* my projects work with that new version, etc. 
> > So, I tried to wrap a virtualenv around epd: > > epd-python virtualenv.py mytestenv > > ...and then install new pandas and other stuff: > > mytestenv/bin/python easy_install -U pandas > > ...but now, how would I start ipython using that virutalenv? > > I tried just running "ipython", but of course, that doesn't include the > virtualenv. > > I tried activating the virtualenv and then running ipython, but it still > doesn't include anything installed in the virtualenv. > > So, how should I do this? > > Chris > Chris, I just uploaded a quick log of what I did to accomplish exactly this to https://gist.github.com/2015652 I do have the problem that within the virtualenv, something with the console's not working right, as iPythons help doesn't work properly, and I cannot launch applications which open windows (except for ``ipython pylab=wx``) ... I hope it still helps ... Cheers, Andreas. From pav at iki.fi Sun Mar 11 07:14:30 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 11 Mar 2012 12:14:30 +0100 Subject: [SciPy-User] about ode In-Reply-To: <4F5C0789.1090800@gmail.com> References: <4F5C0789.1090800@gmail.com> Message-ID: 11.03.2012 03:01, aa kirjoitti: > can You give me alitle explanation using scipy escpecially abolut > sc.integrate.ode, > i found matlab code like this: > g = @(t,x) [x(1); -2*x(2)] > [t,x] = ode45(g,[0:1], [1.5,3]) > plot(x(:,1),x(:,2)) > > --- > how to get equivalent code in scipy...?? import numpy as np from scipy.integrate import odeint def g(x, t): return [x[0], -2*x[1]] t = np.linspace(0, 1, 200) x = odeint(g, [1.5, 3], t) # LSODAR, not Runge-Kutta See also: http://stackoverflow.com/questions/9466046/how-to-make-odeint-successful ^ In that recipe, replace 'zvode' by 'dopri5' to get R-K http://www.scipy.org/NumPy_for_Matlab_Users From seb.haase at gmail.com Sun Mar 11 07:31:33 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Sun, 11 Mar 2012 12:31:33 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: Message-ID: On Sat, Mar 10, 2012 at 8:48 PM, Ralf Gommers wrote: > > > On Sat, Mar 10, 2012 at 7:32 PM, Pauli Virtanen wrote: >> >> 10.03.2012 19:16, josef.pktd at gmail.com kirjoitti: >> [clip] >> > I think it would be useful to have a central location for publishing >> > code and get earlier feedback from users. (Currently I bookmark, for >> > example, ?gists or modules in a random package, that look interesting >> > and I might not find again when I need them in future.) >> > >> > I'm still envious of the matlab fileexchange with a very useful >> > commenting system where I look more often than at scipy-central, and I >> > like the commenting system on stackoverflow that gives a much faster >> > way of evaluating several ways of doing things. >> > >> > I also think Mathworks and Statacorp are very smart in supporting user >> > code, since (I assume) that they look at the download statistic to see >> > what is in high demand and might incorporate code if it's license >> > compatible. >> >> I think this is pretty much the position scipy-central tries to fill. As >> far as I understand, what you say that is missing is >> >> - more advertisement > > > As a start, I added a link on the main scipy.org site. Adding it to > http://new.scipy.org/ would also be good - SciPy Central needs a logo > though. > > Ralf > FWIW, I didn't know (or must have forgotten) about scipy central - and a google search also did NOT really help !!! 
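Returning to the integrate.ode question above: for something closer to Matlab's ode45 than odeint's LSODA-based solver, the 'dopri5' integrator can be used through scipy.integrate.ode. A minimal sketch (note that ode expects f(t, y), the reverse of odeint's argument order):

import numpy as np
from scipy.integrate import ode

def g(t, x):
    return [x[0], -2*x[1]]

r = ode(g).set_integrator('dopri5')   # explicit Runge-Kutta, comparable to ode45
r.set_initial_value([1.5, 3.0], 0.0)
dt = 0.01
xs = [r.y.copy()]
while r.successful() and r.t < 1.0:
    r.integrate(r.t + dt)
    xs.append(r.y.copy())
xs = np.array(xs)                     # xs[:, 0] against xs[:, 1] reproduces the Matlab plot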
Only the 5th hit was "User profile: SciPy Central --- scipy-central.org/user/profile/scipy-central/" after "cookbook" on first place, followed by some mailing list post from Sept-2011 ... and so on ... How does SciPy central compare to the cookbook ? It sound's like I was kind-of meant to supersede it ... ? - Sebastian Haase From pav at iki.fi Sun Mar 11 08:39:09 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 11 Mar 2012 13:39:09 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: Message-ID: 11.03.2012 12:31, Sebastian Haase kirjoitti: [clip] > How does SciPy central compare to the cookbook ? It sound's like I > was kind-of meant to supersede it ... ? The aim of the Scipy Central is more or less to supersede the Cookbook and the Topical Software wiki pages with something more friendly. I believe constructive suggestions on how to improve it would be welcome. > FWIW, I didn't know (or must have forgotten) about scipy central - and > a google search also did NOT really help !!! Only the 5th hit was > "User profile: SciPy Central --- > scipy-central.org/user/profile/scipy-central/" after "cookbook" on > first place, followed by some mailing list post from Sept-2011 ... > and so on ... Today, that link is the first hit, probably thanks to the Google juice from the scipy.org front page. I wonder why the leading hit is not the front page, though. Some additional Google-fu is maybe required. The user pages probably should have the noindex meta, as they're not so useful to have in search engines. -- Pauli Virtanen From pav at iki.fi Sun Mar 11 15:21:45 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 11 Mar 2012 20:21:45 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: Hi, 09.03.2012 09:47, Adrien Gaidon kirjoitti: [clip] > Furthermore, it seems that large projects tend to have API zealots that > don't even want to see code unless it can be directly merged in master > (caricature). I totally understand that, and think it's in the nature of > open source projects in order to not grow anarchistically. > However, this also prevents small "diamonds in the rough" to be > discovered, or useful temporary hole-filling solutions to be proposed > until a proper one is available. To me, this is a false problem due to > the fact that the only advertised way to contribute is by forking + pull > request. But not everybody is a scipy source code guru! Coming back to this: it is also a false dichotomy. A contribution is not either accepted or rejected. Rather, - contribution is proposed - it gets feedback - original contributor (or someone else) revises, if needed - accepted when it's good enough This is exactly the same process through which all Scipy development is done. (As it is now, no new feature lands in without review.) The distinction between the "scipy team" and "contributors" is blurry at best, and unproductive at worst. The failure modes are that the original contributor or the other side goes MIA. This is, however, not a real problem. If the code was listed in a pull request, or a Trac ticket, it is possible (for the people originally involved, or someone else) to get back to it later on. 
Sure, for low-priority things, the delay may be long in the worst case, but for things of broad interest, not so often.

Interestingly, in all of the concrete examples mentioned in this thread, the discussion was only done on the mailing list. On a mailing list, it's in practice not productive to read through the archives and pick up pending stuff. (I often tell people to open a ticket, but don't always remember to.)

Note that a contribution being just rejected (!= needs-work) does not occur so often. In my experience, with Scipy this only happens if there's something really wrong, or it is out of scope. I don't remember many actual cases.

-- 
Pauli Virtanen

From massimodisasha at gmail.com  Sun Mar 11 16:53:47 2012
From: massimodisasha at gmail.com (Massimo Di Stefano)
Date: Sun, 11 Mar 2012 16:53:47 -0400
Subject: [SciPy-User] numpy - scipy test failure on OSX (git version 10/3/2012)
Message-ID: <4C18CACA-0D26-4EEE-84B8-9CD8F82621C4@gmail.com>

Hi All,

i'm on OSX lion using the system python.

i just finished building numpy and scipy using the latest git version and i tried to run the test.
unluckily numpy failed and the scipy test gives me a segfault.
i hope something is wrong on my system .. have you any hints on how to debug this problem ?

attached a link to the test log [1].

[1] http://www.geofemengineering.it//numpy_scipy_log.txt

From wardefar at iro.umontreal.ca  Sun Mar 11 23:30:19 2012
From: wardefar at iro.umontreal.ca (David Warde-Farley)
Date: Sun, 11 Mar 2012 23:30:19 -0400
Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task?
In-Reply-To: 
References: 
Message-ID: <3A573391-18C3-4732-A3CC-BFEFC33F3321@iro.umontreal.ca>

On 2012-03-10, at 2:48 PM, Ralf Gommers wrote:

> As a start, I added a link on the main scipy.org site. Adding it to http://new.scipy.org/ would also be good - SciPy Central needs a logo though.

Not to belittle anyone's efforts, but it needs some aesthetic/design work too besides that. I think a nontrivial element in StackOverflow's success has been that it's very easy on the eyes, very well designed, and everything is exactly where you expect it to be.

(That's one thing that I think new.scipy.org got relatively right; unfortunately there were workload vs. manpower problems, and problems with access to the hosting (that eventually got solved, albeit too late).)

David

From slasley at space.umd.edu  Mon Mar 12 01:29:38 2012
From: slasley at space.umd.edu (Scott Lasley)
Date: Mon, 12 Mar 2012 01:29:38 -0400
Subject: [SciPy-User] numpy - scipy test failure on OSX (git version 10/3/2012)
In-Reply-To: <4C18CACA-0D26-4EEE-84B8-9CD8F82621C4@gmail.com>
References: <4C18CACA-0D26-4EEE-84B8-9CD8F82621C4@gmail.com>
Message-ID: 

On Mar 11, 2012, at 4:53 PM, Massimo Di Stefano wrote:

> Hi All,
> 
> i'm on OSX lion using the system python.
> 
> i just finished building numpy and scipy using the latest git version and i tried to run the test.
> unluckily numpy failed and the scipy test gives me a segfault.
> i hope something is wrong on my system .. have you any hints on how to debug this problem ?
> 
> attached a link to the test log [1].
> 
> [1] http://www.geofemengineering.it//numpy_scipy_log.txt

Which version of Xcode do you have?  I got a segfault in the scipy tests after building with the llvm-gcc included with Xcode 4.2.  I have not tried building scipy with the latest llvm-gcc included with Xcode 4.3, but building with clang or gcc-4.2 from the cran.r-project worked.
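For reference, a sketch of what that looks like when building from source -- this assumes clang ships with the installed Xcode (or that the gcc-4.2 binaries from the cran.r-project tools are on the PATH), and relies on distutils honouring the CC/CXX environment overrides:

export CC=clang
export CXX=clang++
python setup.py build
python setup.py install

With the R Project compilers, substitute CC=gcc-4.2; for scipy a Fortran compiler such as the matching gfortran-4.2 is also needed (selected with the same --fcompiler flag used in the build commands quoted later in this thread).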
See this message for more information http://mail.scipy.org/pipermail/scipy-user/2012-February/031460.html hth, Scott From massimodisasha at gmail.com Mon Mar 12 04:15:54 2012 From: massimodisasha at gmail.com (Massimo Di Stefano) Date: Mon, 12 Mar 2012 04:15:54 -0400 Subject: [SciPy-User] numpy - scipy test failure on OSX (git version 10/3/2012) In-Reply-To: References: <4C18CACA-0D26-4EEE-84B8-9CD8F82621C4@gmail.com> Message-ID: <840B0189-6F93-400C-86AD-FF5B7F10A408@gmail.com> I tried to upgrade to Xcode 4.3 .. but seems that 'for free' there is only Xcore 4.2 on the apple dev center. i installed the gcc from the crane website and i exported the flags .. but numpy build is giving me word error about array with negative size. here [1] the full log using the gcc from the R website. [1] http://www.geofemengineering.it/numpy_build_log.txt i think it i is the cause of all the evil ? i should fix this before to go a had with scipy. frustrated i also tried to build gcc4.6 from source .. but no lucky the setup.py ends with : MacBook-Pro-di-Massimo:numpy epifanio$ export CC=/usr/local/bin/gcc MacBook-Pro-di-Massimo:numpy epifanio$ export CXX=/usr/local/bin/g++ MacBook-Pro-di-Massimo:numpy epifanio$ rm -rf build/ MacBook-Pro-di-Massimo:numpy epifanio$ python setup.py build_ext --fcompiler=/usr/local/bin/gfortran-4.2 Running from numpy source directory. non-existing path in 'numpy/distutils': 'site.cfg' F2PY Version 2 numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 6, with checksum eb54c77ff4149bab310324cd7c0cb176, but recorded checksum for C API version 6 in codegen_dir/cversions.txt is e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.pyc. 
MismatchCAPIWarning)
blas_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']

lapack_opt_info:
  FOUND:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
    extra_compile_args = ['-msse3']

running build_ext
running build_src
build_src
building py_modules sources
creating build
creating build/src.macosx-10.7-intel-2.7
creating build/src.macosx-10.7-intel-2.7/numpy
creating build/src.macosx-10.7-intel-2.7/numpy/distutils
building library "npymath" sources
customize NAGFCompiler
Could not locate executable f95
customize AbsoftFCompiler
Could not locate executable f90
Could not locate executable f77
customize IBMFCompiler
Could not locate executable xlf90
Could not locate executable xlf
customize IntelFCompiler
Could not locate executable ifort
Could not locate executable ifc
customize GnuFCompiler
Could not locate executable g77
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using config
C compiler: /usr/local/bin/gcc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe

compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c'
gcc: _configtest.c
gcc: warning: '-mfused-madd' is deprecated; use '-ffp-contract=' instead
gcc: error: i386: No such file or directory
gcc: error: x86_64: No such file or directory
gcc: error: unrecognized option '-arch'
gcc: error: unrecognized option '-arch'
gcc: warning: '-mfused-madd' is deprecated; use '-ffp-contract=' instead
gcc: error: i386: No such file or directory
gcc: error: x86_64: No such file or directory
gcc: error: unrecognized option '-arch'
gcc: error: unrecognized option '-arch'
failure.
removing: _configtest.c _configtest.o Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "/Users/epifanio/dev/src/numpy/numpy/distutils/core.py", line 186, in setup return old_setup(**new_attr) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 152, in setup dist.run_commands() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/Users/epifanio/dev/src/numpy/numpy/distutils/command/build_ext.py", line 57, in run self.run_command('build_src') File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/Users/epifanio/dev/src/numpy/numpy/distutils/command/build_src.py", line 152, in run self.build_sources() File "/Users/epifanio/dev/src/numpy/numpy/distutils/command/build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "/Users/epifanio/dev/src/numpy/numpy/distutils/command/build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "/Users/epifanio/dev/src/numpy/numpy/distutils/command/build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy/core/setup.py", line 648, in get_mathlib_info raise RuntimeError("Broken toolchain: cannot link a simple C program") RuntimeError: Broken toolchain: cannot link a simple C program MacBook-Pro-di-Massimo:numpy epifanio$ maybe i missed something during the gcc configure ??? Il giorno Mar 12, 2012, alle ore 1:29 AM, Scott Lasley ha scritto: > > On Mar 11, 2012, at 4:53 PM, Massimo Di Stefano wrote: > >> Hi All, >> >> i'm on OSX lion using the system python. >> >> i just finished to build numpy and scipy using the later gig version and i tried to run the test. >> unlucky bumpy failed ands scipy test gives me a segfault. >> i hope something is wrong on my system .. have you any hints on how to debug this problem ? >> >> attached a link to the test log [1]. >> >> [1] http://www.geofemengineering.it//numpy_scipy_log.txt > > Which version of Xcode do you have? I got a segfault in the scipy tests after building with the llvm-gcc included with Xcode 4.2. I have not tried building scipy with the latest llvm-gcc included with Xcode 4.3, but building with clang or gcc-4.2 from the cran.r-project worked. 
See this message for more information > http://mail.scipy.org/pipermail/scipy-user/2012-February/031460.html > > hth, > Scott > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From robert.kern at gmail.com Mon Mar 12 05:43:54 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 12 Mar 2012 09:43:54 +0000 Subject: [SciPy-User] numpy - scipy test failure on OSX (git version 10/3/2012) In-Reply-To: <840B0189-6F93-400C-86AD-FF5B7F10A408@gmail.com> References: <4C18CACA-0D26-4EEE-84B8-9CD8F82621C4@gmail.com> <840B0189-6F93-400C-86AD-FF5B7F10A408@gmail.com> Message-ID: On Mon, Mar 12, 2012 at 08:15, Massimo Di Stefano wrote: > I tried to upgrade to Xcode 4.3 .. but seems that 'for free' there is only Xcore 4.2 on the apple dev center. > > i installed the gcc from the crane website and i exported the flags .. but numpy build is giving me word error about array with negative size. > > here [1] the full log using the gcc from the R website. > > [1] http://www.geofemengineering.it/numpy_build_log.txt > > > i think it i is the cause of all the evil ? i should fix this before to go a had with scipy. > > > > frustrated i also tried to build gcc4.6 from source ?.. but no lucky the setup.py ends with : > C compiler: /usr/local/bin/gcc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c' > gcc: _configtest.c > gcc: warning: ?-mfused-madd? is deprecated; use ?-ffp-contract=? instead > gcc: error: i386: No such file or directory > gcc: error: x86_64: No such file or directory > gcc: error: unrecognized option ?-arch? > gcc: error: unrecognized option ?-arch? > gcc: warning: ?-mfused-madd? is deprecated; use ?-ffp-contract=? instead > gcc: error: i386: No such file or directory > gcc: error: x86_64: No such file or directory > gcc: error: unrecognized option ?-arch? > gcc: error: unrecognized option ?-arch? > failure. > maybe i missed something during the gcc configure ??? You need to apply Apple's patches to provide the "-arch" compile flag. I don't believe they have patches to gcc 4.6. This is probably the easiest way to get command-line installs of the official gcc: http://kennethreitz.com/xcode-gcc-and-homebrew.html -- Robert Kern From matt.newville at gmail.com Sat Mar 10 13:24:42 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Sat, 10 Mar 2012 10:24:42 -0800 (PST) Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> Message-ID: <1111627.2833.1331403882535.JavaMail.geo-discussion-forums@ynnk21> On Saturday, March 10, 2012 11:00:44 AM UTC-6, Pauli Virtanen wrote: > > 09.03.2012 22:13, Matthew Newville kirjoitti: > [clip] > > Well, I understand you don't necessarily consider yourself to be a scipy > > dev, but I'll ask anyway: How many pull requests have been made, and > > how many have been accepted? Perhaps I am not reading github's Pull > > Request link correctly, but that seems to indicate that the numbers are > > 17 and 0. Surely, those cannot be correct. Unless I am counting > > wrong, 5 of the 17 (total? outstanding?) pull requests listed for > > scipy/scipy involve optimization. > > Wrong: 17 open, 99 accepted. > Yes, I see that (or more) now.... I've been using git for a while now, but still learning how to read github pages. I knew 17/0 couldn't be right.... Sorry for that. > [clip] > > Well, there was a discussion "Alternative to scipy.optimize" in the past > > two weeks on scipy-users that mentioned lmfit, and several over the past > > several months, and apparently requests about mpfit in the more distant > > past. And yet two scipy contributers (according to github's list, I am > > counting you) responded to the original request for features **exactly > > like lmfit** with something reading an awful lot like "Well, you'll have > > to write one". Perhaps insular is not a fair characterization -- how > > would you characterize that? > > Come on. The correct characterization is just: "busy". I did not > remember that your project existed as it was announced half a year ago > with no proposal that it should be integrated, and did not read the > recent thread on scipy-user. > OK, I can accept that. We're all busy. I still take exception with Gael's response to the original question. In this instance, there was browbeating of outside people for not contributing coupled with ignoring contributions from outside people. You'll probably forgive me for thinking that is not so encouraging for outside developers.... Scipy is fantastic project, and I've been relying on it for many years. But it is also very large and diverse, with lots of great code, and some mediocre code, and it's sometimes difficult to tell what is well supported and what is less so. It's also not clear to me what the strategy is for deciding what belongs in the core and what belongs in outside projects. The problems of how to organize the scientific python projects, and how to attract more developers, are challenging. I do not pretend to know how to do that, and I'm sure you've all thought about and discussed this at length, but it might be important to give more attention to these issues. But I also submit that the (self) perception that there is a small group of people doing all the work and a large group of people who do nothing but ask for more features is likely to be self-fulfilling. The way out of the cage is probably not by running harder or screaming louder, but by moving the boundaries. Cheers, -Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alec.kalinin at gmail.com Mon Mar 12 09:56:47 2012 From: alec.kalinin at gmail.com (Alexander Kalinin) Date: Mon, 12 Mar 2012 16:56:47 +0300 Subject: [SciPy-User] What is the best way to take elements from an array along an axis? Message-ID: Hello, When we use "fancy" indexing to take elements from an array along an axis, the error could appear. Look at the code import numpy as np a = np.random.rand(10, 3) b = np.random.rand(10, 3) c = b[:, 0] result = a * c We got an error: "ValueError: shape mismatch: objects cannot be broadcast to a single shape". We have the shape problem: >>> a.shape (10, 3) >>> c.shape (10,) >> I know two ways to overcome this problem: 1 way: c = b[:, 0].reshape(-1, 1) 2 way: c = b[:, 0, np.newaxis] But what is the best practice for this case? How I should take elements from an array? Sincerely, Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Mar 12 10:04:28 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 12 Mar 2012 10:04:28 -0400 Subject: [SciPy-User] What is the best way to take elements from an array along an axis? In-Reply-To: References: Message-ID: On Mon, Mar 12, 2012 at 9:56 AM, Alexander Kalinin wrote: > Hello, > > When we use "fancy" indexing to take elements from an array along an axis, > the error could appear. Look at the code > > import numpy as np > > a = np.random.rand(10, 3) > > b = np.random.rand(10, 3) > > c = b[:, 0] > > result = a * c > > > We got an error: "ValueError: shape mismatch: objects cannot be broadcast to > a single shape". We have the shape problem: > > >>>> a.shape > (10, 3) >>>> c.shape > (10,) >>> > > I know two ways to overcome this problem: > 1 way: > c = b[:, 0].reshape(-1, 1) > > > 2 way: > > c = b[:, 0, np.newaxis] > > > But what is the best practice for this case? How I should take elements from > an array? when I know I will need it right away with the extra axis, I usually use slices c = b[:, :1] or c = b[:, k:k+1] numpy also has a function to add a newaxis that I often use, but it has such an awful name that I don't find it anymore. :) Josef > > > Sincerely, > > Alexander > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Mar 12 10:07:11 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 12 Mar 2012 10:07:11 -0400 Subject: [SciPy-User] What is the best way to take elements from an array along an axis? In-Reply-To: References: Message-ID: On Mon, Mar 12, 2012 at 10:04 AM, wrote: > On Mon, Mar 12, 2012 at 9:56 AM, Alexander Kalinin > wrote: >> Hello, >> >> When we use "fancy" indexing to take elements from an array along an axis, >> the error could appear. Look at the code >> >> import numpy as np >> >> a = np.random.rand(10, 3) >> >> b = np.random.rand(10, 3) >> >> c = b[:, 0] >> >> result = a * c >> >> >> We got an error: "ValueError: shape mismatch: objects cannot be broadcast to >> a single shape". We have the shape problem: >> >> >>>>> a.shape >> (10, 3) >>>>> c.shape >> (10,) >>>> >> >> I know two ways to overcome this problem: >> 1 way: >> c = b[:, 0].reshape(-1, 1) >> >> >> 2 way: >> >> c = b[:, 0, np.newaxis] >> >> >> But what is the best practice for this case? How I should take elements from >> an array? 
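Consolidating this sub-thread into one runnable sketch (plain numpy; the expand_dims form is the one josef names in the follow-up just below -- all three forms keep the second axis so that broadcasting succeeds):

import numpy as np

a = np.random.rand(10, 3)
b = np.random.rand(10, 3)

c1 = b[:, :1]                    # slicing keeps the second axis, shape (10, 1)
c2 = b[:, 0, np.newaxis]         # same result via newaxis
c3 = np.expand_dims(b[:, 0], 1)  # same result via expand_dims

result = a * c1                  # broadcasts (10, 3) * (10, 1) -> (10, 3)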
> > when I know I will need it right away with the extra axis, I usually use slices > > c = b[:, :1] > or > c = b[:, k:k+1] > > numpy also has a function to add a newaxis that I often use, but it > has such an awful name that I don't find it anymore. :) numpy.expand_dims(a, axis) which is nice if the axis is a parameter and not in a fixed position Josef > > Josef > > > >> >> >> Sincerely, >> >> Alexander >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From alec.kalinin at gmail.com Mon Mar 12 10:10:06 2012 From: alec.kalinin at gmail.com (Alexander Kalinin) Date: Mon, 12 Mar 2012 17:10:06 +0300 Subject: [SciPy-User] What is the best way to take elements from an array along an axis? In-Reply-To: References: Message-ID: Thank you, Josef! Yes, I think c = b[:, :1] or c = b[:, k:k+1] is the most clean way to do what i want. Sincerely, Alexander On Mon, Mar 12, 2012 at 6:04 PM, wrote: > On Mon, Mar 12, 2012 at 9:56 AM, Alexander Kalinin > wrote: > > Hello, > > > > When we use "fancy" indexing to take elements from an array along an > axis, > > the error could appear. Look at the code > > > > import numpy as np > > > > a = np.random.rand(10, 3) > > > > b = np.random.rand(10, 3) > > > > c = b[:, 0] > > > > result = a * c > > > > > > We got an error: "ValueError: shape mismatch: objects cannot be > broadcast to > > a single shape". We have the shape problem: > > > > > >>>> a.shape > > (10, 3) > >>>> c.shape > > (10,) > >>> > > > > I know two ways to overcome this problem: > > 1 way: > > c = b[:, 0].reshape(-1, 1) > > > > > > 2 way: > > > > c = b[:, 0, np.newaxis] > > > > > > But what is the best practice for this case? How I should take elements > from > > an array? > > when I know I will need it right away with the extra axis, I usually use > slices > > c = b[:, :1] > or > c = b[:, k:k+1] > > numpy also has a function to add a newaxis that I often use, but it > has such an awful name that I don't find it anymore. :) > > Josef > > > > > > > > > Sincerely, > > > > Alexander > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimodisasha at gmail.com Mon Mar 12 10:36:07 2012 From: massimodisasha at gmail.com (Massimo Di Stefano) Date: Mon, 12 Mar 2012 10:36:07 -0400 Subject: [SciPy-User] numpy - scipy test failure on OSX (git version 10/3/2012) In-Reply-To: References: <4C18CACA-0D26-4EEE-84B8-9CD8F82621C4@gmail.com> <840B0189-6F93-400C-86AD-FF5B7F10A408@gmail.com> Message-ID: i removed the Xcode tools and I've re-installed the latest Xcode tools, the gcc compiler is : MacBook-Pro-di-Massimo:~ epifanio$ gcc --version i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00) Copyright (C) 2007 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. it is the same version i had previously ? the build give the same errors about array of negative size and the test fails as before. so, i decided to give a chance to home brew. 
i installed it and python, then using the homebrew easy_install executable i installed nose. building nupy i noticed the same bad log about error array with negative size. running the test i got this : ............................................................. ====================================================================== FAIL: test_kind.TestKind.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/f2py/tests/test_kind.py", line 30, in test_all 'selectedrealkind(%s): expected %r but got %r' % (i, selected_real_kind(i), selectedrealkind(i))) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 34, in assert_ raise AssertionError(msg) AssertionError: selectedrealkind(19): expected -1 but got 16 ---------------------------------------------------------------------- Ran 3650 tests in 29.466s FAILED (KNOWNFAIL=3, SKIP=5, failures=1) >>> so far, .. no lucky. i'll try to build the 'stable' old bumpy and see what happens. Il giorno Mar 12, 2012, alle ore 5:43 AM, Robert Kern ha scritto: > On Mon, Mar 12, 2012 at 08:15, Massimo Di Stefano > wrote: >> I tried to upgrade to Xcode 4.3 .. but seems that 'for free' there is only Xcore 4.2 on the apple dev center. >> >> i installed the gcc from the crane website and i exported the flags .. but numpy build is giving me word error about array with negative size. >> >> here [1] the full log using the gcc from the R website. >> >> [1] http://www.geofemengineering.it/numpy_build_log.txt >> >> >> i think it i is the cause of all the evil ? i should fix this before to go a had with scipy. >> >> >> >> frustrated i also tried to build gcc4.6 from source .. but no lucky the setup.py ends with : > >> C compiler: /usr/local/bin/gcc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c' >> gcc: _configtest.c >> gcc: warning: ?-mfused-madd? is deprecated; use ?-ffp-contract=? instead >> gcc: error: i386: No such file or directory >> gcc: error: x86_64: No such file or directory >> gcc: error: unrecognized option ?-arch? >> gcc: error: unrecognized option ?-arch? >> gcc: warning: ?-mfused-madd? is deprecated; use ?-ffp-contract=? instead >> gcc: error: i386: No such file or directory >> gcc: error: x86_64: No such file or directory >> gcc: error: unrecognized option ?-arch? >> gcc: error: unrecognized option ?-arch? >> failure. > >> maybe i missed something during the gcc configure ??? > > You need to apply Apple's patches to provide the "-arch" compile flag. > I don't believe they have patches to gcc 4.6. 
> > This is probably the easiest way to get command-line installs of the > official gcc: > > http://kennethreitz.com/xcode-gcc-and-homebrew.html > > -- > Robert Kern > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From efmgdj at yahoo.com Mon Mar 12 12:48:54 2012 From: efmgdj at yahoo.com (Eric) Date: Mon, 12 Mar 2012 16:48:54 +0000 (UTC) Subject: [SciPy-User] scipy sparse limits Message-ID: Hi, I'm trying to use large 10^5x10^5 sparse matrices but seem to be running up against a scipy limit: n=10**5, x=sp.rand(n,n,.001) gets "ValueError: Trying to generate a random sparse matrix such as the product of dimensions is greater than 2147483647 - this is not supported on this machine" Does anyone know why limit is there and if I can avoid it? (fyi, I'm using a macbook air with 4gb memory and the enthought distribution) thanks, Eric From pav at iki.fi Mon Mar 12 13:28:53 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 12 Mar 2012 18:28:53 +0100 Subject: [SciPy-User] scipy sparse limits In-Reply-To: References: Message-ID: 12.03.2012 17:48, Eric kirjoitti: [clip] > n=10**5, x=sp.rand(n,n,.001) gets > "ValueError: Trying to generate a random sparse matrix such > as the product of dimensions is greater than > 2147483647 - this is not supported on this machine" > > Does anyone know why limit is there and if I can avoid it? The limit seems to be only in the rand() routine --- you can create larger random matrices otherwise. The limit probably shouldn't be there --- this is probably a bug. As a workaround, you can copy and paste the rand() routine to your own code and remove the size check: https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L573 From cweisiger at msg.ucsf.edu Mon Mar 12 13:32:06 2012 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Mon, 12 Mar 2012 10:32:06 -0700 Subject: [SciPy-User] scipy sparse limits In-Reply-To: References: Message-ID: On Mon, Mar 12, 2012 at 9:48 AM, Eric wrote: > Hi, ?I'm trying to use large 10^5x10^5 sparse > matrices but seem to > be running up against a scipy limit: > > n=10**5, x=sp.rand(n,n,.001) gets > "ValueError: Trying to generate a random sparse matrix such > as the product of dimensions is greater than > 2147483647?- this is not supported on this machine" > > Does anyone know why limit is there and if I can avoid it? log2(2147483648) = 31, so it sounds like you're running into a 32-bit limitation somewhere. If you can, using 64-bit Python/Numpy/Scipy would probably get you around this. -Chris From ralf.gommers at googlemail.com Mon Mar 12 17:41:47 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 12 Mar 2012 22:41:47 +0100 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: <1111627.2833.1331403882535.JavaMail.geo-discussion-forums@ynnk21> References: <4F5916A2.2040604@eso.org> <20120308210722.GC12436@phare.normalesup.org> <6942812.8.1331254549777.JavaMail.geo-discussion-forums@vbuf40> <20120309165443.GA26552@phare.normalesup.org> <32380089.1618.1331327615123.JavaMail.geo-discussion-forums@ynjc20> <1111627.2833.1331403882535.JavaMail.geo-discussion-forums@ynnk21> Message-ID: On Sat, Mar 10, 2012 at 7:24 PM, Matthew Newville wrote: > > > On Saturday, March 10, 2012 11:00:44 AM UTC-6, Pauli Virtanen wrote: >> >> 09.03.2012 22:13, Matthew Newville kirjoitti: >> [clip] >> > Well, I understand you don't necessarily consider yourself to be a scipy >> > dev, but I'll ask anyway: How many pull requests have been made, and >> > how many have been accepted? Perhaps I am not reading github's Pull >> > Request link correctly, but that seems to indicate that the numbers are >> > 17 and 0. Surely, those cannot be correct. Unless I am counting >> > wrong, 5 of the 17 (total? outstanding?) pull requests listed for >> > scipy/scipy involve optimization. >> >> Wrong: 17 open, 99 accepted. >> > > Yes, I see that (or more) now.... I've been using git for a while now, > but still learning how to read github pages. I knew 17/0 couldn't be > right.... Sorry for that. > > >> [clip] >> > Well, there was a discussion "Alternative to scipy.optimize" in the past >> > two weeks on scipy-users that mentioned lmfit, and several over the past >> > several months, and apparently requests about mpfit in the more distant >> > past. And yet two scipy contributers (according to github's list, I am >> > counting you) responded to the original request for features **exactly >> > like lmfit** with something reading an awful lot like "Well, you'll have >> > to write one". Perhaps insular is not a fair characterization -- how >> > would you characterize that? >> >> Come on. The correct characterization is just: "busy". I did not >> remember that your project existed as it was announced half a year ago >> with no proposal that it should be integrated, and did not read the >> recent thread on scipy-user. >> > OK, I can accept that. We're all busy. > > I still take exception with Gael's response to the original question. In > this instance, there was browbeating of outside people for not contributing > coupled with ignoring contributions from outside people. You'll probably > forgive me for thinking that is not so encouraging for outside > developers.... > > Scipy is fantastic project, and I've been relying on it for many years. > But it is also very large and diverse, with lots of great code, and some > mediocre code, and it's sometimes difficult to tell what is well supported > and what is less so. It's also not clear to me what the strategy is for > deciding what belongs in the core and what belongs in outside projects. > This strategy was never really clear, we're trying to improve that. The recent "SciPy Goal" thread helped a lot there. Better descriptions of what has been discussed and how to make these decisions (in general, there's always a grey area) should be put up for review and discussion soon. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eddybarratt1 at yahoo.co.uk Mon Mar 12 17:46:07 2012 From: eddybarratt1 at yahoo.co.uk (Eddy Barratt) Date: Mon, 12 Mar 2012 21:46:07 +0000 (GMT) Subject: [SciPy-User] Building numpy/scipy for python3 on MacOS Lion Message-ID: <1331588767.69711.YahooMailNeo@web29505.mail.ird.yahoo.com> I can't get Numpy or Scipy to work with Python3 on Mac OSX Lion. I have used pip successfully to install numpy, scipy and matplotlib, and they work well with Python2.7, but in Python3 typing 'import numpy' brings up 'No module named numpy'. I've tried downloading the source code directly and then running 'python3 setup.py build', but I get various error warnings, some in red that have to do with fortran (e.g. 'Could not locate executable f95'). The error message that appears to fail in the end is 'RuntimeError: Broken toolchain: cannot link a simple C program', and appears to be related to the previous line 'sh: gcc-4.2: command not found'. The Scipy website (http://www.scipy.org/Installing_SciPy/Mac_OS_X) suggests that there may be issues with the c compiler, but the same problems didn't arise using pip to install for python2.7. I have followed the instructions on the website regarding changing the compiler but this has not made any difference. I have also tried installing from a virtual environment: >>> mkvirtualenv -p python3.2 test1 >>> pip install numpy But this fails with "Command python setup.py egg_info failed with error code 1 in /Users/Eddy/.virtualenvs/test1/build/numpy" I've considered making python3 default, and then I thought a pip install might work, but I don't know how to do that. Does anyone have any suggestions for how I might proceed? I'm relatively new to Python but it's something I feel I'm likely to become more involved in so I'd like to start using Python3 before I get too established with 2.7. Thanks for your help. Eddy From kgdunn at gmail.com Mon Mar 12 23:11:22 2012 From: kgdunn at gmail.com (Kevin Dunn) Date: Mon, 12 Mar 2012 23:11:22 -0400 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: Message-ID: On Sun, Mar 11, 2012 at 08:39, Pauli Virtanen wrote: > 11.03.2012 12:31, Sebastian Haase kirjoitti: > [clip] >> How does SciPy central compare to the cookbook ? ?It sound's like I >> was kind-of meant to supersede it ... ? > > The aim of the Scipy Central is more or less to supersede the Cookbook > and the Topical Software wiki pages with something more friendly. > > I believe constructive suggestions on how to improve it would be welcome. > >> FWIW, I didn't know (or must have forgotten) about scipy central - and >> a google search also did NOT really help !!! Only the 5th hit was >> "User profile: SciPy Central --- >> scipy-central.org/user/profile/scipy-central/" ?after "cookbook" on >> first place, ?followed by some mailing list post from Sept-2011 ... >> and so on ... > > Today, that link is the first hit, probably thanks to the Google juice > from the scipy.org front page. > > I wonder why the leading hit is not the front page, though. Some > additional Google-fu is maybe required. The user pages probably should > have the noindex meta, as they're not so useful to have in search engines. > > -- > Pauli Virtanen I agree that the SciPy Central site needs more work. I was hoping to slowly copy/paste the cookbook examples and some of the Topical Software wiki pages over to the site. I also agree the site needs to be visually improved. 
And, commenting can definitely be added to the site, but it will require a bit of thought on getting it right, so that it's useful (something like StackOverflow, as David mentioned).

However, I've not been able to get around to any of these issues due to personal time constraints, and will not be able to until May.

So perhaps a good start is if someone would like to get the cookbook examples copied over. I can share the credentials for the http://scipy-central.org/user/profile/scipy-central/ user, so that no one in particular gets credit for these submissions.

Ideas for visual changes to the site can usually be easily tested and incorporated, while suggestions for a commenting system might best be handled in a new thread, to keep discussion on track.

Kevin
(SciPy Central maintainer)

From fralosal at ei.upv.es  Tue Mar 13 09:02:11 2012
From: fralosal at ei.upv.es (javi)
Date: Tue, 13 Mar 2012 13:02:11 +0000 (UTC)
Subject: [SciPy-User] how to use properly the function fmin () to scipy.optimize
Message-ID: 

Hello, I have been trying to find the right way to use the function fmin() to use downhill simplex.

Mainly, my problem is getting the algorithm to converge to a good result, i.e. to a solution with a value next to zero.

To test the performance of the algorithm I used the following example:

def minimize(x):

    min = x[0] + x[1] + x[2] + x[3]
    return min

in which, given a vector x, I would want to obtain the values of its elements that when added give the minimum possible value.

To do this I use the following function call:

solution = fmin(minimize, x0=array([1, 2, 3, 4]), args="1", xtol=0.21, ftol=0.21, full_output=1)

print "value parameters", solution[0], "\n"

and I get the following results:

Optimization terminated successfully.
         Current function value: 10.000000
         Iterations: 1
         Function evaluations: 5

value of the parameters: [1. 2. 3. 4.]

As you can see the solution is VERY BAD, and I understand that, due to the large values of ftol and xtol that I gave it, it converges very quickly and gives a small value.

Now, for it to give a better result, i.e. better than the 10 found, I understand that I must decrease the ftol and xtol values, but in doing so I get:

"Warning: Maximum number of function evaluations has been exceeded."

where I understand the algorithm has made excessive calls to the function "minimize" before converging.

Could you tell me the correct use of the parameters ftol and xtol to find a good minimum next to 0? Should the same values of ftol and xtol generally be used in such cases, or should they differ?

A greeting and thank you very much.

From denis-bz-gg at t-online.de  Tue Mar 13 10:47:09 2012
From: denis-bz-gg at t-online.de (denis)
Date: Tue, 13 Mar 2012 07:47:09 -0700 (PDT)
Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task?
In-Reply-To: 
References: 
Message-ID: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com>

Folks,
a view from the peanut gallery of 3 somewhat different areas that
Scipy-central, cookbook, docs.scipy, stackoverflow cover well / not so well:

1) Q+As
2) code recipes, a few pages with some test and """ doc """
3) narrative overviews, slides, talks, FAQs on
3a) basics: indexing, plotting, linking to cython/c ...
3b) area overviews: cluster fft integrate interpolate io ...
Some of these 1) Q+As are I think well covered by stackoverflow, so don't reinvent that wheel (although I liked advice.mechanicalkern.com) 2) there are quite a few sites to put up recipes but 100 unsorted recipes do not make a cookbook even with a snazzy cover. Sure user feedback, comments, weeding, organizing are important but weeding and sorting scipy.org/Cookbook is difficult-to- impossible, not happening. (Don't see what copying the lot would gain us.) 3) narrative overviews are not the topic of this thread but seem to me a need, an opportunity. Are there pages on scipy.org that collect the best slides, talks, screencasts, FAQs, Wikis with expert comments and critical reviews ? cheers -- denis On Mar 10, 7:16?pm, josef.p... at gmail.com wrote: > I would like to extend the discussion a bit. From warren.weckesser at enthought.com Tue Mar 13 13:35:25 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 13 Mar 2012 12:35:25 -0500 Subject: [SciPy-User] how to use properly the function fmin () to scipy.optimize In-Reply-To: References: Message-ID: On Tue, Mar 13, 2012 at 8:02 AM, javi wrote: > Hello, I have been trying to find the right way to use the function fmin > () to > use downhill simplex. > > Mainly I have a problem with that is that the algorithm converges to good > effect, ie as a solution with a value next to zero. > > To test the performance of the algorithm I used the following example: > > def minimize (x): > > min = x [0] + x [1] + x [2] + x [3] > return min > > In which given a vector x would want to obtain the values of its elements > that > when added give the minimum possible value. > > To do this use the following function call: > > solution = fmin (minimize, x0 = array ([1, 2, 3, 4]), args = "1", xtol = > 0.21, = > 0.21 ftol, full_output = 1) > > print "value parameters", solution [0], "\ n" > > and I get the following results: > > Optimization terminated successfully. > Current function value: 10.000000 > Iterations: 1 > Function evaluations: 5 > > value of the parameters: [1. 2. 3. 4.] > > As you can see the solution is VERY BAD, and I understand that due to large > values of ftol and xtol that I gave it converges very quickly and gives a > small value. > > Now, for that is a better result, ie, better than the 10 found understand > that I > must decrease and ftol xtol values??, but in doing so I get: > > > "Warning: Maximum number of function evaluations exceeded Has Been." > > Where I understand the algorithm before converging has made excessive > calls to > the function "minimize". > > Could you tell me what the correct use of the parameters ftol and xtol to > find > a good minimum next to 0?. Sshould generally be used in subsequent cases > of ftol > and xtol values???, They differ?. > > A greeting and thank you very much. > > It looks like you want to solve a *constrained* minimization problem, in which all the components of x remain positive. The function fmin() is for unconstrained optimization, and your objective function has no (unconstrained) minimum. You can try fmin_cobyla or fmin_slsqp. Here's a short demonstration: ----- from scipy.optimize import fmin_slsqp, fmin_cobyla def objective(x): """The objective function to be minized.""" return x.sum() def all_positive_constr(x): """Component constraint function for fmin_slsqp.""" return x # The following are the component constraint functions for fmin_cobyla. 
def x0_positive(x): return x[0] def x1_positive(x): return x[1] def x2_positive(x): return x[2] def x3_positive(x): return x[3] if __name__ == "__main__": print "Using fmin_slsqp" result = fmin_slsqp(objective, [1,2,3,4], f_ieqcons=all_positive_constr) print result print print "Using fmin_cobyla" result = fmin_cobyla(objective, [1,2,3,4], [x0_positive, x1_positive, x2_positive, x3_positive]) print result print ----- Warren _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Tue Mar 13 13:37:27 2012 From: lists at hilboll.de (Andreas H.) Date: Tue, 13 Mar 2012 18:37:27 +0100 Subject: [SciPy-User] [scipy-central] Site design Message-ID: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Hi all, I think everyone agrees that the webdesign of scipy-central.org needs some major enhancements in order to make the site appealing to users so that they want to stay, browse, and use it. I think it would make sense to make the site visually similar to the main SciPy site (new.scipy.org), so that users can already "feel" the connection. I'm mainly talking about colors and fonts here. Also, a logo would be good. For a start, maybe we could use the main SciPy logo, but eventually, scipy-central should have its own, similar logo. Then, a sidebar would be nice. Possible blocks for the sidebar include 'links to core and related projects', 'what is SciPy', ... ideas welcome. If you agree, I could start playing around with the templates/css over the next weeks. Best, Andreas. From warren.weckesser at enthought.com Tue Mar 13 13:50:52 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 13 Mar 2012 12:50:52 -0500 Subject: [SciPy-User] how to use properly the function fmin () to scipy.optimize In-Reply-To: References: Message-ID: On Tue, Mar 13, 2012 at 12:35 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Tue, Mar 13, 2012 at 8:02 AM, javi wrote: > >> Hello, I have been trying to find the right way to use the function fmin >> () to >> use downhill simplex. >> >> Mainly I have a problem with that is that the algorithm converges to good >> effect, ie as a solution with a value next to zero. >> >> To test the performance of the algorithm I used the following example: >> >> def minimize (x): >> >> min = x [0] + x [1] + x [2] + x [3] >> return min >> >> In which given a vector x would want to obtain the values of its elements >> that >> when added give the minimum possible value. >> >> To do this use the following function call: >> >> solution = fmin (minimize, x0 = array ([1, 2, 3, 4]), args = "1", xtol = >> 0.21, = >> 0.21 ftol, full_output = 1) >> >> print "value parameters", solution [0], "\ n" >> >> and I get the following results: >> >> Optimization terminated successfully. >> Current function value: 10.000000 >> Iterations: 1 >> Function evaluations: 5 >> >> value of the parameters: [1. 2. 3. 4.] >> >> As you can see the solution is VERY BAD, and I understand that due to >> large >> values of ftol and xtol that I gave it converges very quickly and gives a >> small value. >> >> Now, for that is a better result, ie, better than the 10 found understand >> that I >> must decrease and ftol xtol values??, but in doing so I get: >> >> >> "Warning: Maximum number of function evaluations exceeded Has Been." 
>> >> Where I understand the algorithm before converging has made excessive >> calls to >> the function "minimize". >> >> Could you tell me what the correct use of the parameters ftol and xtol >> to find >> a good minimum next to 0?. Sshould generally be used in subsequent cases >> of ftol >> and xtol values???, They differ?. >> >> A greeting and thank you very much. >> >> > > It looks like you want to solve a *constrained* minimization problem, in > which all the components of x remain positive. The function fmin() is for > unconstrained optimization, and your objective function has no > (unconstrained) minimum. > > You can try fmin_cobyla or fmin_slsqp. > Or fmin_tnc or fmin_l_bfgs. See the docstrings of these functions for more information and examples. Warren > Here's a short demonstration: > > ----- > from scipy.optimize import fmin_slsqp, fmin_cobyla > > > def objective(x): > """The objective function to be minized.""" > return x.sum() > > def all_positive_constr(x): > """Component constraint function for fmin_slsqp.""" > return x > > > # The following are the component constraint functions for fmin_cobyla. > > def x0_positive(x): > return x[0] > > def x1_positive(x): > return x[1] > > def x2_positive(x): > return x[2] > > def x3_positive(x): > return x[3] > > > if __name__ == "__main__": > > print "Using fmin_slsqp" > result = fmin_slsqp(objective, [1,2,3,4], > f_ieqcons=all_positive_constr) > print result > print > > print "Using fmin_cobyla" > result = fmin_cobyla(objective, [1,2,3,4], [x0_positive, x1_positive, > x2_positive, x3_positive]) > print result > print > ----- > > Warren > > _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fralosal at ei.upv.es Tue Mar 13 19:00:50 2012 From: fralosal at ei.upv.es (Francisco Javier =?iso-8859-1?b?TPNwZXo=?= Salcedo) Date: Wed, 14 Mar 2012 00:00:50 +0100 Subject: [SciPy-User] how to use properly the function fmin () to scipy.optimize In-Reply-To: References: Message-ID: <20120314000050.71713y0pqmkwrxcy@wm.upv.es> Warren thank you very much for your help and your demonstration. This was only invented problem to learn to use fmin (), but perhaps not the best problem solving with fmin as you say. I mistakenly thinking about that the expected minimum value would be zero, but obviously this is not true because the problem has no a priori minimum, I apologize. I understand that the problem will not converge naturally never, so if you do not define convergence by high values ??of xtol and ftol fmin() threw the warning that I mentioned having to evaluate the function "minimize" excessive times. Chiefly my question about the algorithm were two things, that defines exactly the parameter ftol and xtol?, What is the difference between them?, If I wanted to stop the algorithm when the minimum is not differentiated from one iteration to the next in a given amount which of these two parameters should have occasion to modify its default value?. Again thank you very much and sorry. Warren Weckesser escribi?: > On Tue, Mar 13, 2012 at 8:02 AM, javi wrote: > >> Hello, I have been trying to find the right way to use the function fmin >> () to >> use downhill simplex. >> >> Mainly I have a problem with that is that the algorithm converges to good >> effect, ie as a solution with a value next to zero. 
>> >> To test the performance of the algorithm I used the following example: >> >> def minimize (x): >> >> min = x [0] + x [1] + x [2] + x [3] >> return min >> >> In which given a vector x would want to obtain the values of its elements >> that >> when added give the minimum possible value. >> >> To do this use the following function call: >> >> solution = fmin (minimize, x0 = array ([1, 2, 3, 4]), args = "1", xtol = >> 0.21, = >> 0.21 ftol, full_output = 1) >> >> print "value parameters", solution [0], "\ n" >> >> and I get the following results: >> >> Optimization terminated successfully. >> Current function value: 10.000000 >> Iterations: 1 >> Function evaluations: 5 >> >> value of the parameters: [1. 2. 3. 4.] >> >> As you can see the solution is VERY BAD, and I understand that due to large >> values of ftol and xtol that I gave it converges very quickly and gives a >> small value. >> >> Now, for that is a better result, ie, better than the 10 found understand >> that I >> must decrease and ftol xtol values??, but in doing so I get: >> >> >> "Warning: Maximum number of function evaluations exceeded Has Been." >> >> Where I understand the algorithm before converging has made excessive >> calls to >> the function "minimize". >> >> Could you tell me what the correct use of the parameters ftol and xtol to >> find >> a good minimum next to 0?. Sshould generally be used in subsequent cases >> of ftol >> and xtol values???, They differ?. >> >> A greeting and thank you very much. >> >> > > It looks like you want to solve a *constrained* minimization problem, in > which all the components of x remain positive. The function fmin() is for > unconstrained optimization, and your objective function has no > (unconstrained) minimum. > > You can try fmin_cobyla or fmin_slsqp. Here's a short demonstration: > > ----- > from scipy.optimize import fmin_slsqp, fmin_cobyla > > > def objective(x): > """The objective function to be minized.""" > return x.sum() > > def all_positive_constr(x): > """Component constraint function for fmin_slsqp.""" > return x > > > # The following are the component constraint functions for fmin_cobyla. > > def x0_positive(x): > return x[0] > > def x1_positive(x): > return x[1] > > def x2_positive(x): > return x[2] > > def x3_positive(x): > return x[3] > > > if __name__ == "__main__": > > print "Using fmin_slsqp" > result = fmin_slsqp(objective, [1,2,3,4], f_ieqcons=all_positive_constr) > print result > print > > print "Using fmin_cobyla" > result = fmin_cobyla(objective, [1,2,3,4], [x0_positive, x1_positive, > x2_positive, x3_positive]) > print result > print > ----- > > Warren > > _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From tsyu80 at gmail.com Tue Mar 13 22:22:30 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Tue, 13 Mar 2012 22:22:30 -0400 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Tue, Mar 13, 2012 at 1:37 PM, Andreas H. wrote: > Hi all, > > I think everyone agrees that the webdesign of scipy-central.org needs some > major enhancements in order to make the site appealing to users so that > they want to stay, browse, and use it. 
> > I think it would make sense to make the site visually similar to the main > SciPy site (new.scipy.org), so that users can already "feel" the > connection. I'm mainly talking about colors and fonts here. > > Also, a logo would be good. For a start, maybe we could use the main SciPy > logo, but eventually, scipy-central should have its own, similar logo. > > Then, a sidebar would be nice. Possible blocks for the sidebar include > 'links to core and related projects', 'what is SciPy', ... ideas welcome. > > If you agree, I could start playing around with the templates/css over the > next weeks. > > Best, > Andreas. > > Here's a logo concept. The concept is a bit literal: SciPy curve, with the background of a globe and arrows pointing from different locations to a "central" point. I posted the code on github, if someone wants to play around with it (it's not particularly pretty code): https://github.com/tonysyu/SciPy-Central-Logo Tools used: * numpy * matplotlib * basemap * scipy (although this was a bit forced) -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scipy_central_logo.png Type: image/png Size: 124761 bytes Desc: not available URL: From matthew.brett at gmail.com Tue Mar 13 22:42:28 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 13 Mar 2012 19:42:28 -0700 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: Hi, On Tue, Mar 13, 2012 at 7:22 PM, Tony Yu wrote: > > > On Tue, Mar 13, 2012 at 1:37 PM, Andreas H. wrote: >> >> Hi all, >> >> I think everyone agrees that the webdesign of scipy-central.org needs some >> major enhancements in order to make the site appealing to users so that >> they want to stay, browse, and use it. >> >> I think it would make sense to make the site visually similar to the main >> SciPy site (new.scipy.org), so that users can already "feel" the >> connection. I'm mainly talking about colors and fonts here. >> >> Also, a logo would be good. For a start, maybe we could use the main SciPy >> logo, but eventually, scipy-central should have its own, similar logo. >> >> Then, a sidebar would be nice. Possible blocks for the sidebar include >> 'links to core and related projects', 'what is SciPy', ... ideas welcome. >> >> If you agree, I could start playing around with the templates/css over the >> next weeks. >> >> Best, >> Andreas. >> > > Here's a logo concept. The concept is a bit literal: SciPy curve, with the > background of a globe and arrows pointing from different locations to a > "central" point. Thanks for doing that - it looks good. But - aren't the arrows pointing dangerously close to the North Atlantic Garbage Patch? http://en.wikipedia.org/wiki/North_Atlantic_Garbage_Patch Best, Matthew From tsyu80 at gmail.com Tue Mar 13 22:46:00 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Tue, 13 Mar 2012 22:46:00 -0400 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Tue, Mar 13, 2012 at 10:42 PM, Matthew Brett wrote: > Hi, > > On Tue, Mar 13, 2012 at 7:22 PM, Tony Yu wrote: > > > > > > On Tue, Mar 13, 2012 at 1:37 PM, Andreas H. 
wrote: > >> > >> Hi all, > >> > >> I think everyone agrees that the webdesign of scipy-central.org needs > some > >> major enhancements in order to make the site appealing to users so that > >> they want to stay, browse, and use it. > >> > >> I think it would make sense to make the site visually similar to the > main > >> SciPy site (new.scipy.org), so that users can already "feel" the > >> connection. I'm mainly talking about colors and fonts here. > >> > >> Also, a logo would be good. For a start, maybe we could use the main > SciPy > >> logo, but eventually, scipy-central should have its own, similar logo. > >> > >> Then, a sidebar would be nice. Possible blocks for the sidebar include > >> 'links to core and related projects', 'what is SciPy', ... ideas > welcome. > >> > >> If you agree, I could start playing around with the templates/css over > the > >> next weeks. > >> > >> Best, > >> Andreas. > >> > > > > Here's a logo concept. The concept is a bit literal: SciPy curve, with > the > > background of a globe and arrows pointing from different locations to a > > "central" point. > > Thanks for doing that - it looks good. > > But - aren't the arrows pointing dangerously close to the North > Atlantic Garbage Patch? > > http://en.wikipedia.org/wiki/North_Atlantic_Garbage_Patch > > Best, > > Matthew > I had no intention of predicting the code quality of SciPy Central submissions when designing this. ;) Best, -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Mar 13 23:44:13 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 13 Mar 2012 23:44:13 -0400 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Tue, Mar 13, 2012 at 10:46 PM, Tony Yu wrote: > > > On Tue, Mar 13, 2012 at 10:42 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Mar 13, 2012 at 7:22 PM, Tony Yu wrote: >> > >> > >> > On Tue, Mar 13, 2012 at 1:37 PM, Andreas H. wrote: >> >> >> >> Hi all, >> >> >> >> I think everyone agrees that the webdesign of scipy-central.org needs >> >> some >> >> major enhancements in order to make the site appealing to users so that >> >> they want to stay, browse, and use it. >> >> >> >> I think it would make sense to make the site visually similar to the >> >> main >> >> SciPy site (new.scipy.org), so that users can already "feel" the >> >> connection. I'm mainly talking about colors and fonts here. >> >> >> >> Also, a logo would be good. For a start, maybe we could use the main >> >> SciPy >> >> logo, but eventually, scipy-central should have its own, similar logo. >> >> >> >> Then, a sidebar would be nice. Possible blocks for the sidebar include >> >> 'links to core and related projects', 'what is SciPy', ... ideas >> >> welcome. >> >> >> >> If you agree, I could start playing around with the templates/css over >> >> the >> >> next weeks. >> >> >> >> Best, >> >> Andreas. >> >> >> > >> > Here's a logo concept. The concept is a bit literal: SciPy curve, with >> > the >> > background of a globe and arrows pointing from different locations to a >> > "central" point. >> >> Thanks for doing that - it looks good. >> >> But - aren't the arrows pointing dangerously close to the North >> Atlantic Garbage Patch? >> >> http://en.wikipedia.org/wiki/North_Atlantic_Garbage_Patch Now you made me read about ocean pollution for more than half an hour ;( 200,000 invisible particles per square kilometer? 
Don't go swimming in the middle of any ocean. Would pointing the arrows to the central part of the snaky S work? Cheers, Josef >> >> Best, >> >> Matthew > > > I had no intention of predicting the code quality of SciPy Central > submissions when designing this. ;) > > Best, > -Tony > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From scott.sinclair.za at gmail.com Wed Mar 14 02:53:27 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 14 Mar 2012 08:53:27 +0200 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On 13 March 2012 19:37, Andreas H. wrote: > I think everyone agrees that the webdesign of scipy-central.org needs some > major enhancements in order to make the site appealing to users so that > they want to stay, browse, and use it. > > I think it would make sense to make the site visually similar to the main > SciPy site (new.scipy.org), so that users can already "feel" the > connection. I'm mainly talking about colors and fonts here. That sounds like a good idea to me. > Also, a logo would be good. For a start, maybe we could use the main SciPy > logo, but eventually, scipy-central should have its own, similar logo. Tony's logo seems like a great start. I like it, but now that I know about the garbage patch, I have a very minor aversion to the focal point of the inwardly pointing arrows. > Then, a sidebar would be nice. Possible blocks for the sidebar include > 'links to core and related projects', 'what is SciPy', ... ideas welcome. > > If you agree, I could start playing around with the templates/css over the > next weeks. Thanks for getting this rolling. I'd really like to see the scipy.org domain pointed at scipy.github.com, but the cookbook and topical software pages are the main sticking point since they contain lots of useful (and sometimes not so useful) content that needs to be organized in some manner. Scipy Central seems like a good candidate for this. Cheers, Scott From scott.sinclair.za at gmail.com Wed Mar 14 03:04:28 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 14 Mar 2012 09:04:28 +0200 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: On 13 March 2012 16:47, denis wrote: > 1) Q+As are I think well covered by stackoverflow, so don't reinvent > that wheel > ? ?(although I liked advice.mechanicalkern.com) I've always liked the idea of advice.mechanicalkern.com or ask.scipy.org as a good way of hosting an FAQ (there is some good content on both sites), but I guess we could encourage using Stack Overflow instead. > 2) there are quite a few sites to put up recipes > ? ?but 100 unsorted recipes do not make a cookbook > ? ?even with a snazzy cover. > ? ?Sure user feedback, comments, weeding, organizing are important > ? ?but weeding and sorting scipy.org/Cookbook is difficult-to- > impossible, > ? ?not happening. (Don't see what copying the lot would gain us.) Selective copying could be useful, but that's still a lot of work and it doesn't look like there are (m)any volunteers at this stage. 
Cheers, Scott From ralf.gommers at googlemail.com Wed Mar 14 03:21:41 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 14 Mar 2012 08:21:41 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: On Wed, Mar 14, 2012 at 8:04 AM, Scott Sinclair wrote: > On 13 March 2012 16:47, denis wrote: > > 1) Q+As are I think well covered by stackoverflow, so don't reinvent > > that wheel > > (although I liked advice.mechanicalkern.com) > > I've always liked the idea of advice.mechanicalkern.com or > ask.scipy.org as a good way of hosting an FAQ (there is some good > content on both sites), but I guess we could encourage using Stack > Overflow instead. > > > 2) there are quite a few sites to put up recipes > > but 100 unsorted recipes do not make a cookbook > > even with a snazzy cover. > > Sure user feedback, comments, weeding, organizing are important > > but weeding and sorting scipy.org/Cookbook is difficult-to- > > impossible, > > not happening. (Don't see what copying the lot would gain us.) > > Selective copying could be useful, but that's still a lot of work and > it doesn't look like there are (m)any volunteers at this stage. > Can we start by removing recipes that aren't useful anymore, links to external sites (there are ~40 OpenOpt / FuncDesigner links for example) and the list of all pages? That would cut down the Cookbook page to a more manageable size immediately. Then there will be some things left that can land in the numpy/scipy tutorials, and some things for SciPy Central. Moving that content will still be a lot of work, but much less than what it looks like now. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Wed Mar 14 03:58:34 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 14 Mar 2012 09:58:34 +0200 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: On 14 March 2012 09:21, Ralf Gommers wrote: > > > On Wed, Mar 14, 2012 at 8:04 AM, Scott Sinclair > wrote: >> >> On 13 March 2012 16:47, denis wrote: >> > 2) there are quite a few sites to put up recipes >> > ? ?but 100 unsorted recipes do not make a cookbook >> > ? ?even with a snazzy cover. >> > ? ?Sure user feedback, comments, weeding, organizing are important >> > ? ?but weeding and sorting scipy.org/Cookbook is difficult-to- >> > impossible, >> > ? ?not happening. (Don't see what copying the lot would gain us.) >> >> Selective copying could be useful, but that's still a lot of work and >> it doesn't look like there are (m)any volunteers at this stage. > > > Can we start by removing recipes that aren't useful anymore, links to > external sites (there are ~40 OpenOpt / FuncDesigner links for example) and > the list of all pages? That would cut down the Cookbook page to a more > manageable size immediately. There's also quite a lot related to Matplotlib, MayaVi etc. which might have a better home with those projects. 
Cheers, Scott From seb.haase at gmail.com Wed Mar 14 04:16:45 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 14 Mar 2012 09:16:45 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: On Wed, Mar 14, 2012 at 8:58 AM, Scott Sinclair wrote: > > On 14 March 2012 09:21, Ralf Gommers wrote: > > > > > > On Wed, Mar 14, 2012 at 8:04 AM, Scott Sinclair > > wrote: > >> > >> On 13 March 2012 16:47, denis wrote: > >> > 2) there are quite a few sites to put up recipes > >> > ? ?but 100 unsorted recipes do not make a cookbook > >> > ? ?even with a snazzy cover. > >> > ? ?Sure user feedback, comments, weeding, organizing are important > >> > ? ?but weeding and sorting scipy.org/Cookbook is difficult-to- > >> > impossible, > >> > ? ?not happening. (Don't see what copying the lot would gain us.) > >> > >> Selective copying could be useful, but that's still a lot of work and > >> it doesn't look like there are (m)any volunteers at this stage. > > > > > > Can we start by removing recipes that aren't useful anymore, links to > > external sites (there are ~40 OpenOpt / FuncDesigner links for example) and > > the list of all pages? That would cut down the Cookbook page to a more > > manageable size immediately. > > There's also quite a lot related to Matplotlib, MayaVi etc. which > might have a better home with those projects. > I find it quite interesting to see those examples. I'm not involved in those other projects, and this is the only time I would see "what's possible". So, "SciPy Central/Cookbook" could be understood in the broader sense of "Science with Python", rather than "only" how to use the scipy-package.... My 2 cents. Sebastian Haase From scott.sinclair.za at gmail.com Wed Mar 14 06:12:20 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 14 Mar 2012 12:12:20 +0200 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: On 14 March 2012 10:16, Sebastian Haase wrote: > On Wed, Mar 14, 2012 at 8:58 AM, Scott Sinclair > wrote: >> >> On 14 March 2012 09:21, Ralf Gommers wrote: >> > >> > Can we start by removing recipes that aren't useful anymore, links to >> > external sites (there are ~40 OpenOpt / FuncDesigner links for example) and >> > the list of all pages? That would cut down the Cookbook page to a more >> > manageable size immediately. >> >> There's also quite a lot related to Matplotlib, MayaVi etc. which >> might have a better home with those projects. >> > > I find it quite interesting to see those examples. I'm not involved in > those other projects, > ?and this is the only time I would see "what's possible". > So, "SciPy Central/Cookbook" could be understood in > the broader sense of "Science with Python", rather than "only" how to > use the scipy-package.... Sure they're interesting, I'm not proposing to throw anything away (lot's of people have contributed their time to produce the recipes). I still think that recipes which tell me how to do task X, with package Y are better hosted in the documentation/online resources of package Y. Recipes that solve a specific problem primarily using Numpy/Scipy, but that might also use Matplotlib/MayaVi/Chaco/? 
for plotting or cython/f2py/SWIG to speed up or wrap compiled code feel like they have a better fit. Overall, navigating through the Scipy web presence is awfully convoluted and I'm wondering how we can start solving that. Cheers, Scott From lchaplin13 at gmail.com Wed Mar 14 06:16:09 2012 From: lchaplin13 at gmail.com (Lee) Date: Wed, 14 Mar 2012 03:16:09 -0700 (PDT) Subject: [SciPy-User] delete rows and columns Message-ID: <46726410-54bc-41cc-a1f9-9064d7a50055@x10g2000pbi.googlegroups.com> Hi all, first time here, sorry if I am not posting in the right group. I am trying to run the below example from numpy docs: import numpy as np print np.version.version #1.6.1 (win7-64, py2.6) a = np.array([0, 10, 20, 30, 40]) np.delete(a, [2,4]) # remove a[2] and a[4] print a a = np.arange(16).reshape(4,4) print a np.delete(a, np.s_[1:3], axis=0) # remove rows 1 and 2 print a np.delete(a, np.s_[1:3], axis=1) # remove columns 1 and 2 print a Basically I am trying to delete some column/rows from an array or a matrix. It seems that delete doesn't work I expect (and advertised). Am I missing something? Thanks, Lee From punchagan at gmail.com Wed Mar 14 06:25:16 2012 From: punchagan at gmail.com (Puneeth Chaganti) Date: Wed, 14 Mar 2012 15:55:16 +0530 Subject: [SciPy-User] delete rows and columns In-Reply-To: <46726410-54bc-41cc-a1f9-9064d7a50055@x10g2000pbi.googlegroups.com> References: <46726410-54bc-41cc-a1f9-9064d7a50055@x10g2000pbi.googlegroups.com> Message-ID: On Wed, Mar 14, 2012 at 3:46 PM, Lee wrote: > Hi all, > > first time here, sorry if I am not posting in the right group. > I am trying to run the below example from numpy docs: > > import numpy as np > print np.version.version #1.6.1 (win7-64, py2.6) > > a = np.array([0, 10, 20, 30, 40]) > np.delete(a, [2,4]) # remove a[2] and a[4] > print a > a = np.arange(16).reshape(4,4) > print a > np.delete(a, np.s_[1:3], axis=0) # remove rows 1 and 2 > print a > np.delete(a, np.s_[1:3], axis=1) # remove columns 1 and 2 > print a > > Basically I am trying to delete some column/rows from an array or a > matrix. > It seems that delete doesn't work I expect (and advertised). Am I > missing something? np.delete does not change the array in place. It does work as advertised, which says """ Return a new array with sub-arrays along an axis deleted. """ >>> arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]]) >>> arr array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]]) >>> np.delete(arr, 1, 0) array([[ 1, 2, 3, 4], [ 9, 10, 11, 12]]) HTH, Puneeth From ernesto.adorio at gmail.com Wed Mar 14 07:51:06 2012 From: ernesto.adorio at gmail.com (Ernesto Adorio) Date: Wed, 14 Mar 2012 19:51:06 +0800 Subject: [SciPy-User] speed up a python function using scipy constructs Message-ID: Hi, The following function is a pure python implmentation of the multinomial logistic regression log likelihood function.
from math import exp, log

def negmloglik(Betas, X, Y, m, reflevel=0):
    """
    log likelihood for polytomous regression or mlogit.
    Betas - estimated coefficients, as a SINGLE array!
    Y values are coded from 0 to ncategories - 1

    Betas matrix
            b[0][0] + b[0][1] + b[0][2] + ... + b[0][D-1]
            b[1][0] + b[1][1] + b[1][2] + ... + b[1][D-1]
                        ...
            b[ncategories-1][0] + b[ncategories-1][1] + ... + b[ncategories-1][D-1]

            Stored in one array! The beta coefficients for each level
            are stored with indices in range(level*D, level*D + D),
            where D is the number of coefficients per level (n below).
    X,Y   data X matrix and integer response Y vector with values
            from 0 to maxlevel=ncategories-1
    m - number of categories in Y vector. Each value ylevel in Y must be
            in the interval [0, ncategories), i.e. 0 <= ylevel < m
    reflevel - reference level, default code: 0
    """

    n = len(X[0])  # number of coefficients per level
    L = 0
    for (xrow, ylevel) in zip(X, Y):
        h = [0.0] * m
        denom = 0.0
        for k in range(m):
            if k == reflevel:
                denom += 1
            else:
                sa = k * n
                # linear predictor for level k: dot(xrow, Betas[sa:sa + n])
                v = sum([(x * b) for (x, b) in zip(xrow, Betas[sa: sa + n])])
                h[k] = v
                denom += exp(v)
        deltaL = h[ylevel] - log(denom)
        L += deltaL
    return -2 * L
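
# For comparison, a rough NumPy-vectorized sketch of the same calculation
# (only a sketch, not verified against the loop version above; it assumes
# X is a 2-d array of shape (nobs, n), Y an integer array of levels, and
# Betas a flat array of length m*n; the name negmloglik_vec is made up):
import numpy as np

def negmloglik_vec(Betas, X, Y, m, reflevel=0):
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    nobs, n = X.shape
    B = np.asarray(Betas, dtype=float).reshape(m, n)  # one row of coefficients per level
    eta = np.dot(X, B.T)              # linear predictors, shape (nobs, m)
    eta[:, reflevel] = 0.0            # the reference level contributes exp(0) = 1
    logdenom = np.log(np.exp(eta).sum(axis=1))
    loglike = eta[np.arange(nobs), Y] - logdenom
    return -2 * loglike.sum()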
I am wondering if there are Scipy/Numpy constructs which can speed up the above Python implementation? Rewrite if necessary. Regards, Ernesto -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis.laxalde at mcgill.ca Tue Mar 13 09:26:48 2012 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Tue, 13 Mar 2012 09:26:48 -0400 Subject: [SciPy-User] how to use properly the function fmin () to scipy.optimize In-Reply-To: References: Message-ID: <20120313092648.1ab6d564@mcgill.ca> javi wrote: > To test the performance of the algorithm I used the following example: > > def minimize (x): > > min = x [0] + x [1] + x [2] + x [3] > return min This function does not have a minimum. From josef.pktd at gmail.com Wed Mar 14 08:20:47 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 14 Mar 2012 08:20:47 -0400 Subject: [SciPy-User] speed up a python function using scipy constructs In-Reply-To: References: Message-ID: On Wed, Mar 14, 2012 at 7:51 AM, Ernesto Adorio wrote: > Hi, > > The following function is a pure python implmentation of the multinomial > logistic regression > log likelihood function. > >
> def negmloglik(Betas, X, Y,  m, reflevel=0):
>     """
>     log likelihood for polytomous regression or mlogit.
>     Betas - estimated coefficients, as a SINGLE array!
>     Y values are coded from 0 to ncategories - 1
>
>     Betas matrix
>             b[0][0] + b[0][1]+ b[0][2]+ ... + b[[0][D-1]
>             b[1][0] + b[1][1]+ b[1][2]+ ... + b[[1][D-1]
>                         ...
>             b[ncategories-1][0] + b[ncategories-1][1]+ b[ncategories-1][2]
>              .... + ... + b[[ncategories - 1][D-1]
>
>             Stored in one array! The beta   coefficients for each level
>             are stored with indices in range(level*D , level *D + D)
>     X,Y   data X matrix and integer response Y vector with values
>             from 0 to maxlevel=ncategories-1
>     m - number of categories in Y vector. each value of ylevel in Y must be
>             in the interval [0, ncategories) or 0 <= ylevel < m
>     reflevel - reference level, default code: 0
>     """
>
>     n  = len(X[0]) # number of coefficients per level.
>     L  = 0
>     for (xrow, ylevel) in zip(X,Y):
>         h   = [0.0] * m
>         denom = 0.0
>         for k in range(m):
>             if k == reflevel:
>                 denom += 1
>             else:
>                 sa = k * n
>                 v = sum([(x * b) for (x,b) in zip(xrow, Betas[sa: sa + n])])
>                 h[k] = v
>                 denom += exp(v)
>         deltaL = h[ylevel] - log(denom)
>         L += deltaL
>     return -2 * L
> 
> > I am wondering if there are Scipy/Numpy constructs which can speed up the > above Python implementation? > Rewrite if necessary. Maybe it helps to look at our implementation in statsmodels https://github.com/statsmodels/statsmodels/blob/master/statsmodels/discrete/discrete_model.py#L1091 I didn't read your loop to see if it is the same. Cheers, Josef > > Regards, > Ernesto > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Wed Mar 14 08:38:24 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 14 Mar 2012 08:38:24 -0400 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Tue, Mar 13, 2012 at 1:37 PM, Andreas H. wrote: > Hi all, > > I think everyone agrees that the webdesign of scipy-central.org needs some > major enhancements in order to make the site appealing to users so that > they want to stay, browse, and use it. > > I think it would make sense to make the site visually similar to the main > SciPy site (new.scipy.org), so that users can already "feel" the > connection. I'm mainly talking about colors and fonts here. > > Also, a logo would be good. For a start, maybe we could use the main SciPy > logo, but eventually, scipy-central should have its own, similar logo. > > Then, a sidebar would be nice. Possible blocks for the sidebar include > 'links to core and related projects', 'what is SciPy', ... ideas welcome. > > If you agree, I could start playing around with the templates/css over the > next weeks. > A humble suggestion for the layout, if people don't think it's done to death, bootstrap may be appropriate here [1, 2]. I've had good luck with ideas from bootstrap at least if not the whole framework. E.g, I find a CSS grid system to be aesthetically pleasing [3, 4]. There are many more examples than the given links. I find it saves a lot of the work of design from scratch. Skipper [1] http://blog.baregit.com/2012/bootstrap-or-not-bootstrap [2] http://twitter.github.com/bootstrap/ [3] http://960.gs/ [4] http://cssgrid.net/ From paustin at eos.ubc.ca Wed Mar 14 09:09:04 2012 From: paustin at eos.ubc.ca (Phil Austin) Date: Wed, 14 Mar 2012 06:09:04 -0700 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: <4F609870.4080106@eos.ubc.ca> On 12-03-14 05:38 AM, Skipper Seabold wrote: > > A humble suggestion for the layout, if people don't think it's done to > death, bootstrap may be appropriate here [1, 2]. I've had good luck > with ideas from bootstrap at least if not the whole framework. E.g, I > find a CSS grid system to be aesthetically pleasing [3, 4]. There are > many more examples than the given links. I find it saves a lot of the > work of design from scratch. 
> and there's a sphinx theme that adds a few javascript functions to convert toc and localtoc formatting to be bootstrap-compatible https://github.com/ryan-roemer/sphinx-bootstrap-theme -- Phil From jeanluc.menut at free.fr Wed Mar 14 09:12:51 2012 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Wed, 14 Mar 2012 14:12:51 +0100 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: <4F609953.2040900@free.fr> A hierarchical way to browse the packages could be interesting also. For example Physics->Fluid mechanics->Navier?Stokes equations. From wardefar at iro.umontreal.ca Wed Mar 14 10:14:03 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 14 Mar 2012 10:14:03 -0400 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On 2012-03-14, at 2:53 AM, Scott Sinclair wrote: > Thanks for getting this rolling. I'd really like to see the scipy.org > domain pointed at scipy.github.com, but the cookbook and topical > software pages are the main sticking point since they contain lots of > useful (and sometimes not so useful) content that needs to be > organized in some manner. Scipy Central seems like a good candidate > for this. +1. I think the logo looks great, Matthew's observation notwithstanding. :) David From wardefar at iro.umontreal.ca Wed Mar 14 10:17:55 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 14 Mar 2012 10:17:55 -0400 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: On 2012-03-14, at 6:12 AM, Scott Sinclair wrote: > Sure they're interesting, I'm not proposing to throw anything away > (lot's of people have contributed their time to produce the recipes). > > I still think that recipes which tell me how to do task X, with > package Y are better hosted in the documentation/online resources of > package Y. Recipes that solve a specific problem primarily using > Numpy/Scipy, but that might also use Matplotlib/MayaVi/Chaco/? for > plotting or cython/f2py/SWIG to speed up or wrap compiled code feel > like they have a better fit. I agree, certain sorts of recipes are a better fit than others. However, it would be nice if we had some clear and simple guidelines as to what belongs and what doesn't rather than making it a matter of subjective judgment; otherwise the only fair way forward seems to be accepting almost everything. David From cweisiger at msg.ucsf.edu Wed Mar 14 11:23:39 2012 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Wed, 14 Mar 2012 08:23:39 -0700 Subject: [SciPy-User] delete rows and columns In-Reply-To: <46726410-54bc-41cc-a1f9-9064d7a50055@x10g2000pbi.googlegroups.com> References: <46726410-54bc-41cc-a1f9-9064d7a50055@x10g2000pbi.googlegroups.com> Message-ID: On Wed, Mar 14, 2012 at 3:16 AM, Lee wrote: > Hi all, > > first time here, sorry if I am not posting in the right group. 
> I am trying to run the below example from numpy docs: > > import numpy as np > print np.version.version #1.6.1 (win7-64, py2.6) > > a = np.array([0, 10, 20, 30, 40]) > np.delete(a, [2,4]) # remove a[2] and a[4] > print a > a = np.arange(16).reshape(4,4) > print a > np.delete(a, np.s_[1:3], axis=0) # remove rows 1 and 2 > print a > np.delete(a, np.s_[1:3], axis=1) # remove columns 1 and 2 > print a > > Basically I am trying to delete some column/rows from an array or a > matrix. > It seems that delete doesn't work I expect (and advertised). Am I > missing something? Numpy arrays are continuous blocks of memory, so doing an in-place deletion would require allocating a new block and copying everything that isn't deleted over. numpy.delete does exactly that. It doesn't modify the original array; it creates a copy of the non-deleted portions and returns that. If you run the above program in Python's REPL, then you'd see this: >>> a = np.array([0, 10, 20, 30, 40]) >>> np.delete(a, [2, 4]) array([ 0, 10, 30]) >>> print a [ 0 10 20 30 40] Note how there's a result from running np.delete(), which gets printed by default in the REPL. Instead of allocating a new chunk of memory that's just a copy of most of the old chunk, you can accomplish a similar feat using array slices: >>> a[[0,1,3]] # Everything in a except columns 2 and 4 array([ 0, 10, 30]) You could assign that back to a, and then be able to treat a as if those two columns had been erased (since they'd be functionally inaccessible). I assume this would make lookups into the array a bit slower though, because now behind the scenes Numpy has to know to skip over those blocks of memory that you've elided. It's all a question of if CPU or RAM is more precious. -Chris > > Thanks, > Lee > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From denis-bz-gg at t-online.de Wed Mar 14 12:30:17 2012 From: denis-bz-gg at t-online.de (denis) Date: Wed, 14 Mar 2012 09:30:17 -0700 (PDT) Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> On Mar 14, 3:17?pm, David Warde-Farley wrote: > On 2012-03-14, at 6:12 AM, Scott Sinclair wrote: > I agree, certain sorts of recipes are a better fit than others. However, it would be nice if we had some clear and simple guidelines as to what belongs and what doesn't rather than making it a matter of subjective judgment; otherwise the only fair way forward seems to be accepting almost everything. "Has anyone used this recipe in living memory ?" would be a clear guideline. (SO etc. track that with member voting, up / down and when. Is there a simple off-the-shelf voting package that we could use for recipes ?) You're right, the tradeoff isn't easy: accept everything -- hodepodge -- or cut through the jungle. OT / beyond-topic, I think that each major area (cluster fft integrate interpolate io ...) should have an owner; all recipes with no owner go into "old/..." aka "nobodyknows/..." I'm sure that's been discussed, no volunteers ... 
cheers -- denis From josef.pktd at gmail.com Wed Mar 14 12:51:03 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 14 Mar 2012 12:51:03 -0400 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> Message-ID: On Wed, Mar 14, 2012 at 12:30 PM, denis wrote: > On Mar 14, 3:17?pm, David Warde-Farley > wrote: >> On 2012-03-14, at 6:12 AM, Scott Sinclair wrote: >> I agree, certain sorts of recipes are a better fit than others. However, it would be nice if we had some clear and simple guidelines as to what belongs and what doesn't rather than making it a matter of subjective judgment; otherwise the only fair way forward seems to be accepting almost everything. > > "Has anyone used this recipe in living memory ?" > would be a clear guideline. > (SO etc. track that with member voting, up / down and when. > Is there a simple off-the-shelf voting package that we could use for > recipes ?) > You're right, the tradeoff isn't easy: > accept everything -- hodepodge -- or cut through the jungle. > > OT / beyond-topic, I think that each major area > (cluster fft integrate interpolate io ...) should have an owner; > all recipes with no owner go into "old/..." aka "nobodyknows/..." > I'm sure that's been discussed, no volunteers ... I think, if a commenting system, download statistic and tagging or searching works, then there will be very little "moderation" required. (I don't think mathworks does on the file exchange, except maybe spam, clear copyright violation, ..) (When I was still watching Siskel and Ebert (movie critics) then I could tell from their comments whether I would like the movie, but thumbs up or down was often not very informative because of different tastes.) Cheers, Josef > > cheers > ?-- denis > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Wed Mar 14 13:35:56 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 14 Mar 2012 18:35:56 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> Message-ID: 14.03.2012 17:51, josef.pktd at gmail.com kirjoitti: [clip] > I think, if a commenting system, download statistic and tagging or > searching works, then there will be very little "moderation" required. > (I don't think mathworks does on the file exchange, except maybe spam, > clear copyright violation, ..) In addition to commenting system, there could be a "I found this piece useful in real life" button for giving explicit endorsements. I'm not sure if there's a need for a "This is crap" button. For comments, one could add options for marking comments helpful or not, and an option for showing only "helpful" ones. Of course, it's probably not going to be Slashdot, so comments probably will work even without a relevance system in place. This leaves spam, but the email activation system that's currently in place is probably enough. One just needs suitable admin tools. Flag-as-spam feature could also be added. 
To make the download counts count something, one also needs to tell robots not to follow those links (some sites apparently don't do this). -- Pauli Virtanen From william.ratcliff at gmail.com Wed Mar 14 14:12:14 2012 From: william.ratcliff at gmail.com (william ratcliff) Date: Wed, 14 Mar 2012 14:12:14 -0400 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> Message-ID: There is a django-reddit (is it developed in django) library that could be used for basic comments/upvoting. I have some other ideas, but would like to try to implement them first. William On Wed, Mar 14, 2012 at 1:35 PM, Pauli Virtanen wrote: > 14.03.2012 17:51, josef.pktd at gmail.com kirjoitti: > [clip] > > I think, if a commenting system, download statistic and tagging or > > searching works, then there will be very little "moderation" required. > > (I don't think mathworks does on the file exchange, except maybe spam, > > clear copyright violation, ..) > > In addition to commenting system, there could be a "I found this piece > useful in real life" button for giving explicit endorsements. I'm not > sure if there's a need for a "This is crap" button. > > For comments, one could add options for marking comments helpful or not, > and an option for showing only "helpful" ones. Of course, it's probably > not going to be Slashdot, so comments probably will work even without a > relevance system in place. > > This leaves spam, but the email activation system that's currently in > place is probably enough. One just needs suitable admin tools. > Flag-as-spam feature could also be added. > > To make the download counts count something, one also needs to tell > robots not to follow those links (some sites apparently don't do this). > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Wed Mar 14 15:25:12 2012 From: denis-bz-gg at t-online.de (denis) Date: Wed, 14 Mar 2012 12:25:12 -0700 (PDT) Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> Message-ID: <39bc189c-023f-46c5-bb69-027a91ec7c87@db5g2000vbb.googlegroups.com> On Mar 14, 6:35?pm, Pauli Virtanen wrote: > 14.03.2012 17:51, josef.p... at gmail.com kirjoitti: > [clip] > > > I think, if a commenting system, download statistic and tagging or > > searching works, then there will be very little "moderation" required. May I suggest splitting this thread into a) new Cookbook (guidelines, all vs the best) b) voting / commenting system because they have such different timescales ? V/C could in theory tell us which recipes get used but may take a looooong time to discuss and implement cheers -- denis From ralf.gommers at googlemail.com Wed Mar 14 18:02:17 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 14 Mar 2012 23:02:17 +0100 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? 
In-Reply-To: <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> <9732cd01-88e3-4cca-b88f-8f312a07afbf@p6g2000yqi.googlegroups.com> Message-ID: On Wed, Mar 14, 2012 at 5:30 PM, denis wrote: > On Mar 14, 3:17 pm, David Warde-Farley > wrote: > > On 2012-03-14, at 6:12 AM, Scott Sinclair wrote: > > I agree, certain sorts of recipes are a better fit than others. However, > it would be nice if we had some clear and simple guidelines as to what > belongs and what doesn't rather than making it a matter of subjective > judgment; otherwise the only fair way forward seems to be accepting almost > everything. > > "Has anyone used this recipe in living memory ?" > would be a clear guideline. > (SO etc. track that with member voting, up / down and when. > Is there a simple off-the-shelf voting package that we could use for > recipes ?) > You're right, the tradeoff isn't easy: > accept everything -- hodepodge -- or cut through the jungle. > Not everything is easy to judge, it would be great if someone could take a shot at drafting a procedure for doing so. But all I wanted to propose is to remove things like links to external sites, duplicate links and content that's clearly not useful anymore. Examples of the latter: http://www.scipy.org/Cookbook/PIL http://www.scipy.org/Cookbook/xplt http://www.scipy.org/Cookbook/Pyrex_and_NumPy Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From lchaplin13 at gmail.com Wed Mar 14 19:00:47 2012 From: lchaplin13 at gmail.com (Lee) Date: Wed, 14 Mar 2012 16:00:47 -0700 (PDT) Subject: [SciPy-User] delete rows and columns In-Reply-To: References: <46726410-54bc-41cc-a1f9-9064d7a50055@x10g2000pbi.googlegroups.com> Message-ID: <8f6e743b-ab6f-4942-82eb-5106a0d76606@pz2g2000pbc.googlegroups.com> Thanks Chris and Puneeth, This explains all. Lee On Mar 15, 4:23?am, Chris Weisiger wrote: > if CPU or RAM is more precious. > From tmp50 at ukr.net Thu Mar 15 06:48:04 2012 From: tmp50 at ukr.net (Dmitrey) Date: Thu, 15 Mar 2012 12:48:04 +0200 Subject: [SciPy-User] [ANN] new release 0.38 of OpenOpt, FuncDesigner, SpaceFuncs, DerApproximator Message-ID: <73423.1331808484.17417893843813335040@ffe8.ukr.net> Hi, I'm glad to inform you about new release 0.38 (2012-March-15): OpenOpt: interalg can handle discrete variables (see MINLP for examples) interalg can handle multiobjective problems (MOP) interalg can handle problems with parameters fixedVars/freeVars Many interalg improvements and some bugfixes Add another EIG solver: numpy.linalg.eig New LLSP solver pymls with box bounds handling FuncDesigner: Some improvements for sum() Add funcs tanh, arctanh, arcsinh, arccosh Can solve EIG built from derivatives of several functions, obtained by automatic differentiation by FuncDesigner SpaceFuncs: Add method point.symmetry(Point|Line|Plane) Add method LineSegment.middle Add method Point.rotate(Center, angle) DerApproximator: Minor changes See http://openopt.org for more details. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pav at iki.fi Thu Mar 15 08:12:45 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 15 Mar 2012 13:12:45 +0100 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: 14.03.2012 13:38, Skipper Seabold kirjoitti: [clip] > A humble suggestion for the layout, if people don't think it's done to > death, bootstrap may be appropriate here [1, 2]. I've had good luck > with ideas from bootstrap at least if not the whole framework. E.g, I > find a CSS grid system to be aesthetically pleasing [3, 4]. There are > many more examples than the given links. I find it saves a lot of the > work of design from scratch. +1 This is a very good suggestion. Bootstrap seems quite promising to me. Could use some tweaking in the styling, though, as the out-of-box appearance seems very generic Web 2.0 marketing-ishy. (To customize colors etc. in it, you need to modify and rebuild it, which requires Node.js). To fix the issues of navigation on the Scipy.org and related sites, what should be done is: design a base template layout with a place for navigation tools, with at least the link back to the main site fixed. Then use that for everything. Basing this on bootstrap would probably make styling in scipy-central quite a bit easier. -- Pauli Virtanen From bacmsantos at gmail.com Wed Mar 14 13:05:03 2012 From: bacmsantos at gmail.com (Bruno Santos) Date: Wed, 14 Mar 2012 17:05:03 +0000 Subject: [SciPy-User] rv_frozen when using gamma function Message-ID: I am trying to write a script to do some maximum likelihood parameter estimation of a function. But when I try to use the gamma function I get: gamma(5) Out[5]: I thought it might have been a problem solved already on the new distribution but even after installing the last scipy version I get the same problem. The test() after installation is also failing with the following information: Running unit tests for scipy NumPy version 1.5.1 NumPy is installed in /usr/lib/pymodules/python2.7/numpy SciPy version 0.10.1 SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] nose version 1.1.2 ... ... ... 
AssertionError: Arrays are not almost equal ACTUAL: 0.0 DESIRED: 0.5 ====================================================================== FAIL: Regression test for #651: better handling of badly conditioned ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", line 34, in test_bad_filter assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 982, in assert_raises return nose.tools.assert_raises(*args,**kwargs) AssertionError: BadCoefficients not raised ---------------------------------------------------------------------- Ran 5103 tests in 47.795s FAILED (KNOWNFAIL=13, SKIP=28, failures=3) Out[7]: My code is as follows: from numpy import array,log,sum,nan from scipy.stats import gamma from scipy import factorial, optimize #rinterface.initr() #IntSexpVector = rinterface.IntSexpVector #lgamma = rinterface.globalenv.get("lgamma") #Implementation for the Zero-inflated Negative Binomial function def alphabeta(params,x,dicerAcc): alpha = array(params[0]) beta = array(params[1]) if alpha<0 or beta<0:return nan return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - log(factorial(x))) if __name__=='__main__': x = array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, 0.14979999999999999, 0.012999999999999999]) optimize.() Am I doing something wrong or is this a known problem? Best, Bruno -------------- next part -------------- An HTML attachment was scrubbed... URL: From cameron.hayne at dftmicrosystems.com Wed Mar 14 15:16:10 2012 From: cameron.hayne at dftmicrosystems.com (Cameron Hayne) Date: Wed, 14 Mar 2012 15:16:10 -0400 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> Message-ID: <0FEAABD1-291F-4F90-A7D0-6E38D7DE010B@sympatico.ca> On 14-Mar-12, at 6:12 AM, Scott Sinclair wrote: > I still think that recipes which tell me how to do task X, with > package Y are better hosted in the documentation/online resources of > package Y. But that is only useful when you know that you need to use package Y. The cookbook should answer the question "How do I do X ?" (without any reference to specific packages). For example, it should answer the question "How can I fit my data to a straight line?" - the answer would show several ways of doing that (with different scipy packages) and discuss the pros/cons of each way. 
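For instance, a minimal sketch of what such an entry might show (made-up data, purely to illustrate comparing approaches):

import numpy as np
from scipy import stats, optimize

x = np.arange(10.0)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.1, size=x.size)

# numpy.polyfit: quick, returns just the coefficients
slope1, intercept1 = np.polyfit(x, y, 1)

# scipy.stats.linregress: also gives r-value, p-value and stderr
slope2, intercept2, r, p, stderr = stats.linregress(x, y)

# scipy.optimize.curve_fit: handles general models, returns the covariance
popt, pcov = optimize.curve_fit(lambda t, m, c: m * t + c, x, y)

The entry could then discuss when the extra statistics or the extra generality are worth the extra typing.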
-- Cameron Hayne hayne at sympatico.ca From dhondt.olivier at gmail.com Thu Mar 15 04:59:28 2012 From: dhondt.olivier at gmail.com (tyldurd) Date: Thu, 15 Mar 2012 01:59:28 -0700 (PDT) Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images Message-ID: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> Hello, I am a beginner at python and numpy and I need to compute the matrix logarithm for each "pixel" (i.e. x,y position) of a matrix-valued image of dimension MxNx3x3. 3x3 is the dimensions of the matrix at each pixel. The function I have written so far is the following: def logm_img(im): from scipy import linalg dimx = im.shape[0] dimy = im.shape[1] res = zeros_like(im) for x in range(dimx): for y in range(dimy): res[x, y, :, :] = linalg.logm(asmatrix(im[x,y,:,:])) return res Is it ok? Is there a way to avoid the two nested loops ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Mar 15 09:39:14 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 15 Mar 2012 09:39:14 -0400 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: On Wed, Mar 14, 2012 at 1:05 PM, Bruno Santos wrote: > I am trying to write a script to do some maximum likelihood parameter > estimation of a function. But when I try to use the gamma function I get: > gamma(5) > Out[5]: > That's the Gamma distribution in scipy.stats. You want the Gamma function, it's in scipy.special [~/] [1]: from scipy import special [~/] [2]: special.gamma(5) [2]: 24.0 What kind of likelihood are you trying to maximize? Skipper From josef.pktd at gmail.com Thu Mar 15 09:40:13 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 15 Mar 2012 09:40:13 -0400 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: On Wed, Mar 14, 2012 at 1:05 PM, Bruno Santos wrote: > I am trying to write a script to do some maximum likelihood parameter > estimation of a function. But when I try to use the gamma function I get: > gamma(5) > Out[5]: > > I thought it might have been a problem solved already on the new > distribution but even after installing the last scipy version I get the same > problem. > The test() after installation is also failing with the following > information: > Running unit tests for scipy > NumPy version 1.5.1 > NumPy is installed in /usr/lib/pymodules/python2.7/numpy > SciPy version 0.10.1 > SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy > Python version 2.7.2+ (default, Oct ?4 2011, 20:06:09) [GCC 4.6.1] > nose version 1.1.2 > ... > ... > ... > AssertionError: > Arrays are not almost equal > ?ACTUAL: 0.0 > ?DESIRED: 0.5 > > ====================================================================== > FAIL: Regression test for #651: better handling of badly conditioned > ---------------------------------------------------------------------- > Traceback (most recent call last): > ? File > "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", > line 34, in test_bad_filter > ? ? assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) > ? File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 982, in > assert_raises > ? ? 
return nose.tools.assert_raises(*args,**kwargs) > AssertionError: BadCoefficients not raised > > ---------------------------------------------------------------------- > Ran 5103 tests in 47.795s > > FAILED (KNOWNFAIL=13, SKIP=28, failures=3) > Out[7]: > > > My code is as follows: > from numpy import array,log,sum,nan > from scipy.stats import gamma > from scipy import factorial, optimize > > #rinterface.initr() > #IntSexpVector = rinterface.IntSexpVector > #lgamma = rinterface.globalenv.get("lgamma") > > #Implementation for the Zero-inflated Negative Binomial function > def alphabeta(params,x,dicerAcc): > ? ? alpha = array(params[0]) > ? ? beta = array(params[1]) > ? ? if alpha<0 or beta<0:return nan > ? ? return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * log(dicerAcc) - > log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - log(factorial(x))) I guess what you want her is scipy.special.gamma which is the gamma function, not the gamma distribution loglikelihood of negative binomial is also in statsmodels.discrete if you want to compare notes Josef > > if __name__=='__main__': > ? ? x = > array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) > ? ? dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, > 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, > 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, > 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, > 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, > 0.14979999999999999, 0.012999999999999999]) > ? ? optimize.() > > > Am I doing something wrong or is this a known problem? > > Best, > Bruno > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Thu Mar 15 09:53:02 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 15 Mar 2012 09:53:02 -0400 Subject: [SciPy-User] Contributing to SciPy was Re: Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <0FEAABD1-291F-4F90-A7D0-6E38D7DE010B@sympatico.ca> References: <878586b0-b759-4c32-beec-b1628c4e3358@y27g2000yqy.googlegroups.com> <0FEAABD1-291F-4F90-A7D0-6E38D7DE010B@sympatico.ca> Message-ID: On Wed, Mar 14, 2012 at 3:16 PM, Cameron Hayne wrote: > > On 14-Mar-12, at 6:12 AM, Scott Sinclair wrote: > >> I still think that recipes which tell me how to do task X, with >> package Y are better hosted in the documentation/online resources of >> package Y. > > > But that is only useful when you know that you need to use package Y. I agree with this. > The cookbook should answer the question "How do I do X ?" (without any > reference to specific packages). Here's my problem with this. What if the question is, the fairly common, "how do I solve a non-linear programming problem?" The answer, with numpy/scipy, is that really you don't (or you drop in the entirety OpenOpt source code...). I don't think we should purge the cookbook of examples like these. 
Don't get me wrong, I don't think we should duplicate other package's documentation. But Dmitry (presumably) took the time to write up these examples, and they can be helpful in pointing someone towards topical software. Comes down to SciPy vs. scipy I think. And maybe I'm saying I'd prefer the former for the cookbook, and I think scikits, of which openopt was formerly one, and closely related packages fall under SciPy. This raises the question of what to do when the example becomes stale. Well, for the future, maybe you could link your e-mail with the posting of the original recipe and have a "Problem with this recipe" button. I don't know. That said, I'm also not doing any heavy lifting on this. Just my thoughts, Skipper From bacmsantos at gmail.com Thu Mar 15 11:07:25 2012 From: bacmsantos at gmail.com (Bruno Santos) Date: Thu, 15 Mar 2012 15:07:25 +0000 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: Thank you all very much for the replies that was exactly what I wanted. I am basically trying to get the parameters for a gamma-poisson distribution. I have the R code from a previous collaborator just trying to write a native function in python rather than using the R code or port it using rpy2. The function is the following: [image: Inline images 1] where f(b,d) is a function that gives me a probability of a certain position in the vector to be occupied and it depends on b (the position) and d (the likelihood of making an error). So the likelihood after a few transformations become: [image: Inline images 2] Which I then use the loglikelihood and try to maximise it using an optimization algorithm. [image: Inline images 3] The R code is as following: alphabeta<-function(alphabeta,x,dicerAcc) { alpha <-alphabeta[1] beta <-alphabeta[2] if (any(alphabeta<0)) return(NA) sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc > noiseT]) #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError != 0]) } x and dicerAcc are known so the I use the optim function in R ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, dicerAcc = dicerAcc)$par Is there any equivalent function in Scipy to the optim one? On 14 March 2012 17:05, Bruno Santos wrote: > I am trying to write a script to do some maximum likelihood parameter > estimation of a function. But when I try to use the gamma function I get: > gamma(5) > Out[5]: > > I thought it might have been a problem solved already on the new > distribution but even after installing the last scipy version I get the > same problem. > The test() after installation is also failing with the following > information: > Running unit tests for scipy > NumPy version 1.5.1 > NumPy is installed in /usr/lib/pymodules/python2.7/numpy > SciPy version 0.10.1 > SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy > Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] > nose version 1.1.2 > ... > ... > ... 
> AssertionError: > Arrays are not almost equal > ACTUAL: 0.0 > DESIRED: 0.5 > > ====================================================================== > FAIL: Regression test for #651: better handling of badly conditioned > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", > line 34, in test_bad_filter > assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) > File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 982, in > assert_raises > return nose.tools.assert_raises(*args,**kwargs) > AssertionError: BadCoefficients not raised > > ---------------------------------------------------------------------- > Ran 5103 tests in 47.795s > > FAILED (KNOWNFAIL=13, SKIP=28, failures=3) > Out[7]: > > > My code is as follows: > from numpy import array,log,sum,nan > from scipy.stats import gamma > from scipy import factorial, optimize > > #rinterface.initr() > #IntSexpVector = rinterface.IntSexpVector > #lgamma = rinterface.globalenv.get("lgamma") > > #Implementation for the Zero-inflated Negative Binomial function > def alphabeta(params,x,dicerAcc): > alpha = array(params[0]) > beta = array(params[1]) > if alpha<0 or beta<0:return nan > return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * log(dicerAcc) > - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - log(factorial(x))) > > if __name__=='__main__': > x = > array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) > dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, > 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, > 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, > 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, > 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, > 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, > 0.14979999999999999, 0.012999999999999999]) > optimize.() > > > Am I doing something wrong or is this a known problem? > > Best, > Bruno > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: From jsseabold at gmail.com Thu Mar 15 11:21:51 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 15 Mar 2012 11:21:51 -0400 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: On Thu, Mar 15, 2012 at 11:07 AM, Bruno Santos wrote: > Thank you all very much for the replies that was exactly what I wanted. I > am basically trying to get the parameters for a gamma-poisson distribution. 
> I have the R code from a previous collaborator just trying to write a > native function in python rather than using the R code or port it using > rpy2. Oh, fun. > The function is the following: > [image: Inline images 1] > where f(b,d) is a function that gives me a probability of a certain > position in the vector to be occupied and it depends on b (the position) > and d (the likelihood of making an error). > So the likelihood after a few transformations become: > > [image: Inline images 2] > Which I then use the loglikelihood and try to maximise it using an > optimization algorithm. > [image: Inline images 3] > The R code is as following: > alphabeta<-function(alphabeta,x,dicerAcc) > { > alpha <-alphabeta[1] > beta <-alphabeta[2] > if (any(alphabeta<0)) > return(NA) > sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - > lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc > > noiseT]) > >From a quick (distracted) look (so I could be wrong) Should this be alpha^2*log(beta) ? +lgamma(alpha) ? And lfactorial(x) should still be +lgamma(alpha)*lfactorial(x) ? And dicerAcc a scalar integer I take it? > > #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError > != 0]) > } > x and dicerAcc are known so the I use the optim function in R > ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, dicerAcc > = dicerAcc)$par > > Is there any equivalent function in Scipy to the optim one? > > On 14 March 2012 17:05, Bruno Santos wrote: > >> I am trying to write a script to do some maximum likelihood parameter >> estimation of a function. But when I try to use the gamma function I get: >> gamma(5) >> Out[5]: >> >> I thought it might have been a problem solved already on the new >> distribution but even after installing the last scipy version I get the >> same problem. >> The test() after installation is also failing with the following >> information: >> Running unit tests for scipy >> NumPy version 1.5.1 >> NumPy is installed in /usr/lib/pymodules/python2.7/numpy >> SciPy version 0.10.1 >> SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy >> Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] >> nose version 1.1.2 >> ... >> ... >> ... 
>> AssertionError: >> Arrays are not almost equal >> ACTUAL: 0.0 >> DESIRED: 0.5 >> >> ====================================================================== >> FAIL: Regression test for #651: better handling of badly conditioned >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", >> line 34, in test_bad_filter >> assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) >> File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 982, >> in assert_raises >> return nose.tools.assert_raises(*args,**kwargs) >> AssertionError: BadCoefficients not raised >> >> ---------------------------------------------------------------------- >> Ran 5103 tests in 47.795s >> >> FAILED (KNOWNFAIL=13, SKIP=28, failures=3) >> Out[7]: >> >> >> My code is as follows: >> from numpy import array,log,sum,nan >> from scipy.stats import gamma >> from scipy import factorial, optimize >> >> #rinterface.initr() >> #IntSexpVector = rinterface.IntSexpVector >> #lgamma = rinterface.globalenv.get("lgamma") >> >> #Implementation for the Zero-inflated Negative Binomial function >> def alphabeta(params,x,dicerAcc): >> alpha = array(params[0]) >> beta = array(params[1]) >> if alpha<0 or beta<0:return nan >> return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * >> log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - >> log(factorial(x))) >> >> if __name__=='__main__': >> x = >> array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) >> dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >> 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, >> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, >> 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, >> 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >> 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, >> 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, >> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, >> 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, >> 0.14979999999999999, 0.012999999999999999]) >> optimize.() >> >> >> Am I doing something wrong or is this a known problem? >> >> Best, >> Bruno >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: From lists at hilboll.de Thu Mar 15 13:13:04 2012 From: lists at hilboll.de (Andreas H.) 
Date: Thu, 15 Mar 2012 18:13:04 +0100 Subject: [SciPy-User] [scipy-central] Site design In-Reply-To: References: <3742dcea3dd622da7c4069310e9574e6.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: <38d8cde2b1bd3c29f88b5032c26a1f1f.squirrel@srv2.s4y.tournesol-consulting.eu> > On Tue, Mar 13, 2012 at 1:37 PM, Andreas H. wrote: >> Hi all, >> >> I think everyone agrees that the webdesign of scipy-central.org needs >> some >> major enhancements in order to make the site appealing to users so that >> they want to stay, browse, and use it. >> >> I think it would make sense to make the site visually similar to the >> main >> SciPy site (new.scipy.org), so that users can already "feel" the >> connection. I'm mainly talking about colors and fonts here. >> >> Also, a logo would be good. For a start, maybe we could use the main >> SciPy >> logo, but eventually, scipy-central should have its own, similar logo. >> >> Then, a sidebar would be nice. Possible blocks for the sidebar include >> 'links to core and related projects', 'what is SciPy', ... ideas >> welcome. >> >> If you agree, I could start playing around with the templates/css over >> the >> next weeks. >> > > A humble suggestion for the layout, if people don't think it's done to > death, bootstrap may be appropriate here [1, 2]. I've had good luck > with ideas from bootstrap at least if not the whole framework. E.g, I > find a CSS grid system to be aesthetically pleasing [3, 4]. There are > many more examples than the given links. I find it saves a lot of the > work of design from scratch. > > Skipper > > [1] http://blog.baregit.com/2012/bootstrap-or-not-bootstrap > [2] http://twitter.github.com/bootstrap/ > [3] http://960.gs/ > [4] http://cssgrid.net/ Thanks for the pointer to bootstrap, Skipper! I'm working on a very first idea of the site layout, including Tony's excellent logo. News early next week. Andreas. From jjhelmus at gmail.com Thu Mar 15 15:50:33 2012 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Thu, 15 Mar 2012 15:50:33 -0400 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> Message-ID: <4F624809.3000102@gmail.com> I know I am jumping into this thread late and it has drifted into another topics but I have some code that others might be interested in. With all the discussion of bounded leastsq and variable substitution I recalled that I had a wrapped version of leastsq in a larger project that allows for min, max bound using the variable transformations that MINUIT uses. I pulled out the necessary functions, refactored the code and made a github repo in case anyone is interested (https://github.com/jjhelmus/leastsqbound-scipy). This might make a good jumping off point for a more complete bounded leastsq optimizer that David had in mind. - Jonathan Helmus David Baddeley wrote: > From a pure performance perspective, you're probably going to be best > setting your bounds by variable substitution (particularly if they're > only single-ended - x**2 is cheap) - you really don't want to have the > for loops, dictionary lookups and conditionals that lmfit introduces > for it's bounds checking inside your objective function. 
> > I think a high level wrapper that permitted bounds, an unadulterated > goal function, and setting which parameters to fit, but also retained > much of the raw speed of leastsq could be accomplished with some > clever on the fly code generation (maybe also using Sympy to > automatically derive the Jacobian). Would make an interesting project ... > > David > > ------------------------------------------------------------------------ > *From:* Eric Emsellem > *To:* Matthew Newville > *Cc:* scipy-user at scipy.org; scipy-user at googlegroups.com > *Sent:* Friday, 9 March 2012 12:17 PM > *Subject:* Re: [SciPy-User] Least-squares fittings with bounds: why is > scipy not up to the task? > > > > > Yes, see https://github.com/newville/lmfit-py, which does everything > > you ask for, and a bit more, with the possible exception of "being > > included in scipy". For what its worth, I work with Mark Rivers > > (who's no longer actively developing Python), and our group is full of > > IDL users who are very familiar with Markwardt's implementation. > > > > The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK > > directly, so has the advantage of not being implemented in pure IDL or > > Python. It is definitely faster than mpfit.py. > > > > With lmfit-py, one writes a python function-to-minimize that takes a > > list of Parameters instead of the array of floating point variables > > that scipy.optimize.leastsq() uses. Each Parameter can be freely > > varied of fixed, have upper and/or lower bounds placed on them, or be > > written as algebraic expressions of other Parameters. Uncertainties > > in varied Parameters and correlations between Parameters are estimated > > using the same "scaled covariance" method as used in > > scipy.optimize.curve_fit(). There is limited support for > > optimization methods other than scipy.optimize.leastsq(), but I don't > > find these methods to be very useful for the kind of fitting problems > > I normally see, so support for them may not be perfect. > > > > Whether this gets included into scipy is up to the scipy developers. > > I'd be happy to support this module within scipy or outside scipy. > > I have no doubt that improvements could be made to lmfit.py. If you > > have suggestion, I'd be happy to hear them. > > looks great! I'll have a go at this, as mentioned in my previous post. I > believe that leastsq is probably the fastest anyway (according to the > test Adam mentioned to me today) so this could be it. I'll make a test > and compare it with mpfit (for the specific case I am thinking of, I am > optimising over ~10^5-6 points with ~90 parameters...). > > thanks again for this, and I'll try to report on this (if relevant) asap. > > Eric > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dtlussier at gmail.com Thu Mar 15 16:23:49 2012 From: dtlussier at gmail.com (Dan Lussier) Date: Thu, 15 Mar 2012 13:23:49 -0700 Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images In-Reply-To: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> References: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> Message-ID: Have you tried numpy.frompyfunc? 
http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html http://stackoverflow.com/questions/6126233/can-i-create-a-python-numpy-ufunc-from-an-unbound-member-method With this approach you may be able create a function which acts elementwise over your array to compute the matrix logarithm at each entry using Numpy's ufuncs. This would avoid the explicit iteration over the array using the for loops. As a rough outline try: from scipy import linalg import numpy as np # Assume im is the container array containing a 3x3 matrix at each pixel. # Composite function so get matrix log of array A as a matrix in one step def log_matrix(A): return linalg.logm(np.asmatrix(A)) # Creating function to operate over container array. Takes one argument and returns the result. log_ufunc = np.frompyfunc(log_matrix, 1, 1) # Using log_ufunc on container array, im res = log_ufunc(im) Dan On 2012-03-15, at 1:59 AM, tyldurd wrote: > Hello, > > I am a beginner at python and numpy and I need to compute the matrix logarithm for each "pixel" (i.e. x,y position) of a matrix-valued image of dimension MxNx3x3. 3x3 is the dimensions of the matrix at each pixel. > > The function I have written so far is the following: > > def logm_img(im): > from scipy import linalg > dimx = im.shape[0] > dimy = im.shape[1] > res = zeros_like(im) > for x in range(dimx): > for y in range(dimy): > res[x, y, :, :] = linalg.logm(asmatrix(im[x,y,:,:])) > return res > Is it ok? Is there a way to avoid the two nested loops ? > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.ratcliff at gmail.com Thu Mar 15 16:38:22 2012 From: william.ratcliff at gmail.com (william ratcliff) Date: Thu, 15 Mar 2012 16:38:22 -0400 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: <4F624809.3000102@gmail.com> References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> <4F624809.3000102@gmail.com> Message-ID: What is the license for MINUIT? On Thu, Mar 15, 2012 at 3:50 PM, Jonathan Helmus wrote: > I know I am jumping into this thread late and it has drifted into > another topics but I have some code that others might be interested in. > With all the discussion of bounded leastsq and variable substitution I > recalled that I had a wrapped version of leastsq in a larger project > that allows for min, max bound using the variable transformations that > MINUIT uses. I pulled out the necessary functions, refactored the code > and made a github repo in case anyone is interested > (https://github.com/jjhelmus/leastsqbound-scipy). This might make a > good jumping off point for a more complete bounded leastsq optimizer > that David had in mind. > > - Jonathan Helmus > > David Baddeley wrote: > > From a pure performance perspective, you're probably going to be best > > setting your bounds by variable substitution (particularly if they're > > only single-ended - x**2 is cheap) - you really don't want to have the > > for loops, dictionary lookups and conditionals that lmfit introduces > > for it's bounds checking inside your objective function. 
> > > > I think a high level wrapper that permitted bounds, an unadulterated > > goal function, and setting which parameters to fit, but also retained > > much of the raw speed of leastsq could be accomplished with some > > clever on the fly code generation (maybe also using Sympy to > > automatically derive the Jacobian). Would make an interesting project ... > > > > David > > > > ------------------------------------------------------------------------ > > *From:* Eric Emsellem > > *To:* Matthew Newville > > *Cc:* scipy-user at scipy.org; scipy-user at googlegroups.com > > *Sent:* Friday, 9 March 2012 12:17 PM > > *Subject:* Re: [SciPy-User] Least-squares fittings with bounds: why is > > scipy not up to the task? > > > > > > > > > Yes, see https://github.com/newville/lmfit-py, which does everything > > > you ask for, and a bit more, with the possible exception of "being > > > included in scipy". For what its worth, I work with Mark Rivers > > > (who's no longer actively developing Python), and our group is full of > > > IDL users who are very familiar with Markwardt's implementation. > > > > > > The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK > > > directly, so has the advantage of not being implemented in pure IDL or > > > Python. It is definitely faster than mpfit.py. > > > > > > With lmfit-py, one writes a python function-to-minimize that takes a > > > list of Parameters instead of the array of floating point variables > > > that scipy.optimize.leastsq() uses. Each Parameter can be freely > > > varied of fixed, have upper and/or lower bounds placed on them, or be > > > written as algebraic expressions of other Parameters. Uncertainties > > > in varied Parameters and correlations between Parameters are estimated > > > using the same "scaled covariance" method as used in > > > scipy.optimize.curve_fit(). There is limited support for > > > optimization methods other than scipy.optimize.leastsq(), but I don't > > > find these methods to be very useful for the kind of fitting problems > > > I normally see, so support for them may not be perfect. > > > > > > Whether this gets included into scipy is up to the scipy developers. > > > I'd be happy to support this module within scipy or outside scipy. > > > I have no doubt that improvements could be made to lmfit.py. If you > > > have suggestion, I'd be happy to hear them. > > > > looks great! I'll have a go at this, as mentioned in my previous post. I > > believe that leastsq is probably the fastest anyway (according to the > > test Adam mentioned to me today) so this could be it. I'll make a test > > and compare it with mpfit (for the specific case I am thinking of, I am > > optimising over ~10^5-6 points with ~90 parameters...). > > > > thanks again for this, and I'll try to report on this (if relevant) asap. > > > > Eric > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jjhelmus at gmail.com Thu Mar 15 17:08:27 2012 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Thu, 15 Mar 2012 17:08:27 -0400 Subject: [SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task? In-Reply-To: References: <4F5916A2.2040604@eso.org> <18284571.1324.1331244589247.JavaMail.geo-discussion-forums@ynca15> <4F593DFF.90101@eso.org> <1331262863.55138.YahooMailNeo@web113402.mail.gq1.yahoo.com> <4F624809.3000102@gmail.com> Message-ID: <4F625A4B.9040905@gmail.com> MINUIT , or more precisely Minuet2 is part of the ROOT which is GPL version 2 licensed. There already exists a python wrapper for the package (http://code.google.com/p/pyminuit/), which is also GPL licensed. I expect the licensing would cause problems if one wanted to including the package in scipy. The code in my repo on the other hand is BSD licensed and isn't based off MINUIT. I merely used the same mathematical functions (sin, sqrt, arcsin, etc) for the variable transforms which are mentioned in MINUIT's User Manual. - Jonathan Helmus william ratcliff wrote: > What is the license for MINUIT? > > On Thu, Mar 15, 2012 at 3:50 PM, Jonathan Helmus > wrote: > > I know I am jumping into this thread late and it has drifted into > another topics but I have some code that others might be > interested in. > With all the discussion of bounded leastsq and variable substitution I > recalled that I had a wrapped version of leastsq in a larger project > that allows for min, max bound using the variable transformations that > MINUIT uses. I pulled out the necessary functions, refactored the > code > and made a github repo in case anyone is interested > (https://github.com/jjhelmus/leastsqbound-scipy). This might make a > good jumping off point for a more complete bounded leastsq optimizer > that David had in mind. > > - Jonathan Helmus > > David Baddeley wrote: > > From a pure performance perspective, you're probably going to be > best > > setting your bounds by variable substitution (particularly if > they're > > only single-ended - x**2 is cheap) - you really don't want to > have the > > for loops, dictionary lookups and conditionals that lmfit introduces > > for it's bounds checking inside your objective function. > > > > I think a high level wrapper that permitted bounds, an unadulterated > > goal function, and setting which parameters to fit, but also > retained > > much of the raw speed of leastsq could be accomplished with some > > clever on the fly code generation (maybe also using Sympy to > > automatically derive the Jacobian). Would make an interesting > project ... > > > > David > > > > > ------------------------------------------------------------------------ > > *From:* Eric Emsellem > > > *To:* Matthew Newville > > > *Cc:* scipy-user at scipy.org ; > scipy-user at googlegroups.com > > *Sent:* Friday, 9 March 2012 12:17 PM > > *Subject:* Re: [SciPy-User] Least-squares fittings with bounds: > why is > > scipy not up to the task? > > > > > > > > > Yes, see https://github.com/newville/lmfit-py, which does > everything > > > you ask for, and a bit more, with the possible exception of "being > > > included in scipy". For what its worth, I work with Mark Rivers > > > (who's no longer actively developing Python), and our group is > full of > > > IDL users who are very familiar with Markwardt's implementation. > > > > > > The lmfit-py version uses scipy.optimize.leastsq(), which uses > MINPACK > > > directly, so has the advantage of not being implemented in > pure IDL or > > > Python. 
It is definitely faster than mpfit.py. > > > > > > With lmfit-py, one writes a python function-to-minimize that > takes a > > > list of Parameters instead of the array of floating point > variables > > > that scipy.optimize.leastsq() uses. Each Parameter can be freely > > > varied of fixed, have upper and/or lower bounds placed on > them, or be > > > written as algebraic expressions of other Parameters. > Uncertainties > > > in varied Parameters and correlations between Parameters are > estimated > > > using the same "scaled covariance" method as used in > > > scipy.optimize.curve_fit(). There is limited support for > > > optimization methods other than scipy.optimize.leastsq(), but > I don't > > > find these methods to be very useful for the kind of fitting > problems > > > I normally see, so support for them may not be perfect. > > > > > > Whether this gets included into scipy is up to the scipy > developers. > > > I'd be happy to support this module within scipy or outside scipy. > > > I have no doubt that improvements could be made to lmfit.py. > If you > > > have suggestion, I'd be happy to hear them. > > > > looks great! I'll have a go at this, as mentioned in my previous > post. I > > believe that leastsq is probably the fastest anyway (according > to the > > test Adam mentioned to me today) so this could be it. I'll make > a test > > and compare it with mpfit (for the specific case I am thinking > of, I am > > optimising over ~10^5-6 points with ~90 parameters...). > > > > thanks again for this, and I'll try to report on this (if > relevant) asap. > > > > Eric > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From wesmckinn at gmail.com Thu Mar 15 18:35:43 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 15 Mar 2012 18:35:43 -0400 Subject: [SciPy-User] ANN: New PyData mailing list for pandas and other data-related projects Message-ID: hi all, Coming out of PyCon, there was a clear need to better organize & support the growing pandas userbase and broader "Python for Data" community. To that end, we've created a new Google Group which will become the more official home of pandas discussions and, we hope, broader data-related discussions. http://groups.google.com/group/pydata We've already begun to organize under the PyData banner for GitHub: https://github.com/pydata. I envision a rich ecosystem of projects in this namespace. There's also a new #pydata channel on irc.freenode.net. Join us! Many of you have also noticed the new pydata.org domain where the pandas project page is hosted: http://pandas.pydata.org/ The root domain currently points to the new NumFOCUS non-profit organization, but only as a placeholder (that's really at http://numfocus.org/). Thanks to them for the generous hosting support. 
This domain will soon become a portal for data scientists (analysts, hackers, whatever term floats your boat) who use, or want to use, Python. Plans for this website are nascent, but the intention is to provide better resources for new people entering the ecosystem (e.g.. which packages to use for each problem domain, cookbook-like examples for problem domains, access to open data sets, simpler installation & setup, conference/tutorial/meetup announcements, etc.) The specific mechanism for creating & curating the website are undetermined. A wiki is the right philosophy, but functionally has some real drawbacks. Regardless, the goal is to make the website community-driven. What do you need to take action on? If you're a pandas user, or are interested in data tools for Python, please join the new group. pandas mailing list traffic will be progressively shifted toward this new list. I also encourage you to start using the #pydata and #pystats hash tags on Twitter to help establish a community presence there. Looking forward to the rest of 2012-- lots of exciting things ahead! cheers, Wes From lists at hilboll.de Fri Mar 16 04:46:55 2012 From: lists at hilboll.de (Andreas H.) Date: Fri, 16 Mar 2012 09:46:55 +0100 Subject: [SciPy-User] ANN: New PyData mailing list for pandas and other data-related projects In-Reply-To: References: Message-ID: > This domain will soon become a portal for data scientists (analysts, > hackers, whatever term floats your boat) who use, or want to use, > Python. > > Plans for this website are nascent, but the intention is to provide > better resources for new people entering the ecosystem (e.g.. which > packages to use for each problem domain, cookbook-like examples for > problem domains, access to open data sets, simpler installation & > setup, conference/tutorial/meetup announcements, etc.) > > The specific mechanism for creating & curating the website are > undetermined. A wiki is the right philosophy, but functionally has > some real drawbacks. Regardless, the goal is to make the website > community-driven. I'm wondering if there are possible benefits of coordinating the scipy-central.org and pydata.org websites? It seems to me that we're beginning to develop two new portals for users, which are close enough in focus. As a users, I wouldn't necessarily know whether one or the other is better for my needs. Maybe the community could benefit from putting all effort into one site? Just my 2 cents ... Andreas. From josef.pktd at gmail.com Fri Mar 16 12:45:33 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 16 Mar 2012 12:45:33 -0400 Subject: [SciPy-User] quadratic programming with fmin_slsqp Message-ID: scipy is missing a fmin_quadprog http://en.wikipedia.org/wiki/Quadratic_programming#Problem_formulation Did anyone ever try to see if fmin_slsqp can be used for this? It looks flexible and targeted enough to be the base for a quadratic programming wrapper. So far I only got a quick experiment Josef ------ # -*- coding: utf-8 -*- """ Created on Thu Mar 15 20:25:27 2012 Author: Josef Perktold just a first try write a function fmin_quadprog(func, A,B,C,b,c) max xAx st Bx = b Cx >= c check if there is a standard notation for Matrices http://en.wikipedia.org/wiki/Quadratic_programming#Problem_formulation can work with f_eqcons and f_ieqcons I'm not sure how good the checking is I thought I had f_ieqcons that conflicted with eqcons, but got normal successful convergence. 
f_ineqcons was binding but didn't satisfy eqcons If f_ieqcons is defined, then ieqcons is ignored. see docstring """ from time import time import numpy as np from scipy.optimize import fmin_slsqp def func(x, args): A = np.eye(len(x)) A[0,0] = 2 x = np.atleast_2d(x) return np.dot(np.dot(x,A), x.T) #fprime=testfunc_deriv B = np.eye(2)[0] b = np.ones(2)[0] f_ieqcons = lambda x, *args: np.atleast_1d(np.dot(x, B) - b) t0 = time() xres = fmin_slsqp(func,[2.0,1.0], args=(-1.0,), eqcons=[lambda x, args: x[0]+x[1] + 4 ], # ieqcons=[lambda x, args: x[0]+.5, # lambda x, args: x[0]], f_ieqcons=f_ieqcons, iprint=2, full_output=1) print "Elapsed time:", 1000*(time()-t0), "ms" print "Results",xres print "\n\n" --- From glenjenness at gmail.com Fri Mar 16 16:34:58 2012 From: glenjenness at gmail.com (Glen Jenness) Date: Fri, 16 Mar 2012 15:34:58 -0500 Subject: [SciPy-User] problem running SciPy Message-ID: Dear users, I just recently installed SciPy, and when I went to run the tests, I got: [gjenness at pople tmp]$ python -c "import scipy; scipy.test()" Traceback (most recent call last): File "", line 1, in ? File "/home/gjenness/programs/scipy-0.10.1/scipy/__init__.py", line 128, in ? raise ImportError(msg) ImportError: Error importing scipy: you cannot import scipy while being in scipy source directory; please exit the scipy source tree first, and relaunch your python intepreter. Now this is fairly strange to me as I am not ANYWHERE near my SciPy source directory. I tried looking around to see if there was a solution (I had a similar problem with NumPy a couple months back, but sadly I don't recall what I did to fix it). My site.cfg is: [DEFAULT] library_dirs = /usr/lib include_dirs = /usr/include [fftw] libraries = fftw3 [mkl] library_dirs = /opt/intel/mkl/10.0.3.020/lib/em64t include_dirs = /opt/intel/mkl/10.0.3.020/include mkl_libs = mkl_intel_lp64,mkl_intel_thread,mkl_core and I configured/built SciPy with: python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install --prefix=/home/gjenness/programs/scipy-0.10.1/ If anyone can help me resolve this problem it'd be greatly appreciated. Thanks! Dr. Glen Jenness Schmidt Group Department of Chemistry University of Wisconsin - Madison -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Mar 16 16:52:47 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 16 Mar 2012 21:52:47 +0100 Subject: [SciPy-User] problem running SciPy In-Reply-To: References: Message-ID: Hi, 16.03.2012 21:34, Glen Jenness kirjoitti: [clip] > [gjenness at pople tmp]$ python -c "import scipy; scipy.test()" > Traceback (most recent call last): > File "", line 1, in ? > File "/home/gjenness/programs/scipy-0.10.1/scipy/__init__.py", line > 128, in ? > raise ImportError(msg) > ImportError: Error importing scipy: you cannot import scipy while > being in scipy source directory; please exit the scipy source > tree first, and relaunch your python intepreter. [clip] > python setup.py config --compiler=intelem --fcompiler=intelem build_clib > --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem > --fcompiler=intelem install --prefix=/home/gjenness/programs/scipy-0.10.1/ > > If anyone can help me resolve this problem it'd be greatly appreciated. You probably have done the following: cd ~/programs/ tar xzf scipy-0.10.1.tar.gz cd scipy-0.10.1 python setup.py ..... 
--prefix=~/programs/scipy-0.10.1/ The correct way would be either cd ~/tmp tar xzf scipy-0.10.1.tar.gz cd scipy-0.10.1 python setup.py ..... --user This installs everything under ~/.local. Or, python setup.py ..... --prefix=~/programs/scipy-0.10.1/ but now you need to set your PYTHONPATH to point to ~/programs/scipy-0.10.1/lib/python2.X/site-packages and not to ~/programs/scipy-0.10.1 -- Pauli Virtanen From glenjenness at gmail.com Fri Mar 16 16:58:17 2012 From: glenjenness at gmail.com (Glen Jenness) Date: Fri, 16 Mar 2012 15:58:17 -0500 Subject: [SciPy-User] problem running SciPy In-Reply-To: References: Message-ID: Pauli, Ah that did the trick! I had it in that directory just for testing purposes, but once I had to go to my other python libraries it went away. I am currently having another problem related to importing in SciPy's optimizers. If I do: python -c "import scipy.optimize" I get: Traceback (most recent call last): File "", line 1, in ? File "/home/gjenness/lib/lib64/python2.4/site-packages/scipy/optimize/__init__.py", line 132, in ? from lbfgsb import fmin_l_bfgs_b File "/home/gjenness/lib/lib64/python2.4/site-packages/scipy/optimize/lbfgsb.py", line 28, in ? import _lbfgsb ImportError: /opt/intel/mkl/10.0.3.020/lib/em64t/libmkl_intel_thread.so: undefined symbol: omp_in_parallel I am currently trying to figure this out, but if anyone has any advice that'll save the amount of Google'ing I'll have to do, it'd be appreciated :) On Fri, Mar 16, 2012 at 3:52 PM, Pauli Virtanen wrote: > Hi, > > 16.03.2012 21:34, Glen Jenness kirjoitti: > [clip] > > [gjenness at pople tmp]$ python -c "import scipy; scipy.test()" > > Traceback (most recent call last): > > File "", line 1, in ? > > File "/home/gjenness/programs/scipy-0.10.1/scipy/__init__.py", line > > 128, in ? > > raise ImportError(msg) > > ImportError: Error importing scipy: you cannot import scipy while > > being in scipy source directory; please exit the scipy source > > tree first, and relaunch your python intepreter. > [clip] > > python setup.py config --compiler=intelem --fcompiler=intelem build_clib > > --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem > > --fcompiler=intelem install > --prefix=/home/gjenness/programs/scipy-0.10.1/ > > > > If anyone can help me resolve this problem it'd be greatly appreciated. > > You probably have done the following: > > cd ~/programs/ > tar xzf scipy-0.10.1.tar.gz > cd scipy-0.10.1 > python setup.py ..... --prefix=~/programs/scipy-0.10.1/ > > The correct way would be either > > cd ~/tmp > tar xzf scipy-0.10.1.tar.gz > cd scipy-0.10.1 > python setup.py ..... --user > > This installs everything under ~/.local. Or, > > python setup.py ..... --prefix=~/programs/scipy-0.10.1/ > > but now you need to set your PYTHONPATH to point to > > ~/programs/scipy-0.10.1/lib/python2.X/site-packages > > and not to > > ~/programs/scipy-0.10.1 > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Dr. Glen Jenness Schmidt Group Department of Chemistry University of Wisconsin - Madison -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Fri Mar 16 17:46:10 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 16 Mar 2012 17:46:10 -0400 Subject: [SciPy-User] ANN: pandas 0.7.2 released Message-ID: hi all, I'm pleased to announce the pandas 0.7.2 release. 
Like prior releases this includes a handful of bug fixes from 0.7.1, some performance enhancements, and new features. This release is recommended for all pandas users. Major work is underway for pandas 0.8.0, hopefully to be released at the end of April. In particular, the time series capabilities are seeing significant work, incorporating the new NumPy datetime64 dtype and features which have been available in scikits.timeseries but not in pandas. See the issue tracker for a full of list planned new features and performance/infrastructural improvements. If you are interested in becoming more involved with the project, the issue tracker (which is really the TODO list!) is the best place to start. See Adam Klein's post http://blog.adamdklein.com/?p=582 for more on the ongoing time series work. Also, my video from PyCon may be of interest to some: http://pyvideo.org/video/696/pandas-powerful-data-analysis-tools-for-python Note that pandas has a new mailing list! http://groups.google.com/group/pydata . There is also a new #pydata channel on irc.freenode.net. Thanks to all who contributed to this release! - Wes What is it ========== pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with ?relational? or ?labeled? data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Links ===== Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst Documentation: http://pandas.pydata.org Installers: http://pypi.python.org/pypi/pandas Code Repository: http://github.com/pydata/pandas Mailing List: http://groups.google.com/group/pydata Blog: http://blog.wesmckinney.com From khoroshyy at gmail.com Fri Mar 16 10:26:53 2012 From: khoroshyy at gmail.com (Petro) Date: Fri, 16 Mar 2012 07:26:53 -0700 (PDT) Subject: [SciPy-User] load file with "header" in the bottom of the file. Message-ID: <2e8597ba-14ec-408f-afd7-471969cf6b08@gw9g2000vbb.googlegroups.com> Hi all. numpy.loadtxt allows to skip headers line. I have a lot of tab-delimited files were description is on the bottom. Does anybody know an easy way to read such file. Thanks in advance. Petro. From conny.kuehne at googlemail.com Fri Mar 16 15:34:49 2012 From: conny.kuehne at googlemail.com (=?iso-8859-1?Q?Conny_K=FChne?=) Date: Fri, 16 Mar 2012 20:34:49 +0100 Subject: [SciPy-User] non-existing path in 'scipy/io': 'docs' Message-ID: <74F753BB-984A-400E-82F9-2B95EE0CEE80@googlemail.com> Hello, I get the following error when trying to build scipy 0.10.1 from source blas_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-faltivec', '-I/System/Library/Frameworks/vecLib.framework/Headers'] non-existing path in 'scipy/io': 'docs' lapack_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-faltivec'] umfpack_info: libraries umfpack not found in /Library/Frameworks/Python.framework/Versions/2.7/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib amd_info: libraries amd not found in /Library/Frameworks/Python.framework/Versions/2.7/lib libraries amd not found in /usr/local/lib libraries amd not found in /usr/lib FOUND: libraries = ['amd'] library_dirs = ['/opt/local/lib'] FOUND: libraries = ['umfpack', 'amd'] library_dirs = ['/opt/local/lib'] I already tried different scipy version with no success. 
Installing with easy_install yields a similar result. I use Mac OS X 10.6.8 gcc version: i686-apple-darwin10-gcc-4.0.1 (GCC) 4.0.1 gfortran version: GNU Fortran (GCC) 4.2.3 Any ideas? Conny K?hne From andrew_giessel at hms.harvard.edu Sat Mar 17 12:17:57 2012 From: andrew_giessel at hms.harvard.edu (Andrew Giessel) Date: Sat, 17 Mar 2012 12:17:57 -0400 Subject: [SciPy-User] load file with "header" in the bottom of the file. In-Reply-To: <2e8597ba-14ec-408f-afd7-471969cf6b08@gw9g2000vbb.googlegroups.com> References: <2e8597ba-14ec-408f-afd7-471969cf6b08@gw9g2000vbb.googlegroups.com> Message-ID: I'd suggest perhaps using the unix utils 'tail' and 'head' and 'wc' to pre-process the files On Fri, Mar 16, 2012 at 10:26, Petro wrote: > Hi all. > numpy.loadtxt allows to skip headers line. > I have a lot of tab-delimited files were description is on the bottom. > Does anybody know an easy way to read such file. > Thanks in advance. > Petro. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Andrew Giessel, PhD Department of Neurobiology, Harvard Medical School 220 Longwood Ave Boston, MA 02115 ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Sat Mar 17 12:32:16 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sat, 17 Mar 2012 11:32:16 -0500 Subject: [SciPy-User] load file with "header" in the bottom of the file. In-Reply-To: <2e8597ba-14ec-408f-afd7-471969cf6b08@gw9g2000vbb.googlegroups.com> References: <2e8597ba-14ec-408f-afd7-471969cf6b08@gw9g2000vbb.googlegroups.com> Message-ID: On Fri, Mar 16, 2012 at 9:26 AM, Petro wrote: > Hi all. > numpy.loadtxt allows to skip headers line. > I have a lot of tab-delimited files were description is on the bottom. > Does anybody know an easy way to read such file. > Thanks in advance. > Petro. > numpy.genfromtxt has a 'skip_footer' argument for ignoring lines at the end of the file. For example: In [5]: !cat test.tsv 100 200 300 400 500 600 This is a test. In [6]: a = genfromtxt('test.tsv', delimiter='\t', skip_footer=1) In [7]: a Out[7]: array([[ 100., 200., 300.], [ 400., 500., 600.]]) Warren > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From khoroshyy at gmail.com Sun Mar 18 06:04:18 2012 From: khoroshyy at gmail.com (Petro) Date: Sun, 18 Mar 2012 03:04:18 -0700 (PDT) Subject: [SciPy-User] load file with "header" in the bottom of the file. In-Reply-To: References: <2e8597ba-14ec-408f-afd7-471969cf6b08@gw9g2000vbb.googlegroups.com> Message-ID: Thanks. Genfromtxt solved my problem. On Mar 17, 5:32?pm, Warren Weckesser wrote: > On Fri, Mar 16, 2012 at 9:26 AM, Petro wrote: > > Hi all. > > numpy.loadtxt allows to skip headers line. > > I have a lot of tab-delimited files were description is on the bottom. > > Does anybody know an easy ?way ?to read such file. > > Thanks in advance. > > Petro. > > numpy.genfromtxt has a 'skip_footer' argument for ignoring lines at the end > of the file. ?For example: > > In [5]: !cat test.tsv > 100 ? ?200 ? ?300 > 400 ? ?500 ? ?600 > This is a test. > > In [6]: a = genfromtxt('test.tsv', delimiter='\t', skip_footer=1) > > In [7]: a > Out[7]: > array([[ 100., ?200., ?300.], > ? ? ? 
?[ 400., ?500., ?600.]]) > > Warren > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at googlemail.com Sun Mar 18 17:42:11 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 18 Mar 2012 22:42:11 +0100 Subject: [SciPy-User] non-existing path in 'scipy/io': 'docs' In-Reply-To: <74F753BB-984A-400E-82F9-2B95EE0CEE80@googlemail.com> References: <74F753BB-984A-400E-82F9-2B95EE0CEE80@googlemail.com> Message-ID: 2012/3/16 Conny K?hne > Hello, > > I get the following error when trying to build scipy 0.10.1 from source > > blas_opt_info: > FOUND: > extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] > define_macros = [('NO_ATLAS_INFO', 3)] > extra_compile_args = ['-faltivec', > '-I/System/Library/Frameworks/vecLib.framework/Headers'] > > non-existing path in 'scipy/io': 'docs' > This can be cleaned up by removing the line "config.add_data_dir('docs')" in scipy/io/setup.py. Note that this is not an actual error though, just a harmless warning. Do you have an actual install issue? If so, can you post the full build log? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Sun Mar 18 18:18:26 2012 From: vaggi.federico at gmail.com (federico vaggi) Date: Sun, 18 Mar 2012 23:18:26 +0100 Subject: [SciPy-User] Numpy/MATLAB difference in array indexing Message-ID: Hi everyone, I was trying to port some code from MATLAB to Scipy, and I noticed a slight bug in the functionality of numpy.tile vs repmat in matlab: For example: a = np.random.rand(10,2) b = tile(a[:,1],(1,5)) b.shape Out[86]: (1, 50) While MATLAB gives: >> a = rand(10,2); >> b = repmat(a(:,1),[1,5]); >> size(b) ans = 10 5 This is obviously trivial to fix**, but I'm wondering what causes the difference? If you take a vertical slice of an array in numpy that's seen as a row vector, while in MATLAB its seen as a column vector? Is it worth making a note in here: http://www.scipy.org/NumPy_for_Matlab_Users ? Federico ** The easiest way I found was: b = tile(a[:,1],(5,1)).T -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Mar 18 18:37:34 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 18 Mar 2012 23:37:34 +0100 Subject: [SciPy-User] Numpy/MATLAB difference in array indexing In-Reply-To: References: Message-ID: 18.03.2012 23:18, federico vaggi kirjoitti: > I was trying to port some code from MATLAB to Scipy, and I noticed a > slight bug in the functionality of numpy.tile vs repmat in matlab: > > For example: > > a = np.random.rand(10,2) > > b = tile(a[:,1],(1,5)) a[:,1] is an 1-d array, and therefore considered as a (1, N) vector in 2-d context. This is not a bug --- the Numpy constructs do not always map exactly to Matlab ones. 
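A minimal sketch making the shape difference concrete; it assumes the same 10x2 array `a` as in the example above, and the comments state the expected shapes:

```
import numpy as np

a = np.random.rand(10, 2)

# a[:, 1] drops an axis: it is a 1-d array of shape (10,), which tile
# treats as a single row when asked for a 2-d result.
print(a[:, 1].shape)                     # (10,)
print(np.tile(a[:, 1], (1, 5)).shape)    # (1, 50)

# Keeping the axis with a slice (or np.newaxis) preserves the column,
# which is what MATLAB's a(:, 1) gives.
print(a[:, 1:2].shape)                   # (10, 1)
print(np.tile(a[:, 1:2], (1, 5)).shape)  # (10, 5)

# With broadcasting the tile is often unnecessary: a (10, 1) column
# combines directly with a (1, 5) row.
b = a[:, 1:2] * np.ones((1, 5))
print(b.shape)                           # (10, 5)
```

The replies below show the same `a[:, 1:2]` fix inside tile itself, and the broadcasting idioms in more detail.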
-- Pauli Virtanen From wesmckinn at gmail.com Sun Mar 18 20:23:34 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 18 Mar 2012 20:23:34 -0400 Subject: [SciPy-User] Numpy/MATLAB difference in array indexing In-Reply-To: References: Message-ID: On Sun, Mar 18, 2012 at 6:37 PM, Pauli Virtanen wrote: > 18.03.2012 23:18, federico vaggi kirjoitti: >> I was trying to port some code from MATLAB to Scipy, and I noticed a >> slight bug in the functionality of numpy.tile vs repmat in matlab: >> >> For example: >> >> a = np.random.rand(10,2) >> >> b = tile(a[:,1],(1,5)) > > a[:,1] is an 1-d array, and therefore considered as a (1, N) vector in > 2-d context. This is not a bug --- the Numpy constructs do not always > map exactly to Matlab ones. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user In [3]: b = tile(a[:,1:2],(1,5)) In [4]: b.shape Out[4]: (10, 5) From sturla at molden.no Mon Mar 19 06:51:38 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 19 Mar 2012 11:51:38 +0100 Subject: [SciPy-User] Numpy/MATLAB difference in array indexing In-Reply-To: References: Message-ID: <4F670FBA.7010806@molden.no> On 18.03.2012 23:37, Pauli Virtanen wrote: >> b = tile(a[:,1],(1,5)) > > a[:,1] is an 1-d array, and therefore considered as a (1, N) vector in > 2-d context. This is not a bug --- the Numpy constructs do not always > map exactly to Matlab ones. Yes. Also, the need for "repmats" (np.repeat, np.tile, np.hstack, np.vstack) and "reshapes" (np.reshape, np.ndarray.reshape) is less prominent in NumPy because of broadcasting. Using MATLAB idioms like reshape and repmat instead of broadcasting is a common mistake (or bad habit) when coming to NumPy for MATLAB. In my experience, 99% of cases for a .* reshape(b,m,n) a .* repmat(b,m,n) in MATLAB will just map to NumPy constructs like these: a * b a * b[:,np.newaxis] This, in addition to view arrays, make NumPy much more memory efficient. Not to mention that a.T is O(1) in NumPy whereas a' is O(N*M) in MATLAB. Sturla From bacmsantos at gmail.com Mon Mar 19 12:14:58 2012 From: bacmsantos at gmail.com (Bruno Santos) Date: Mon, 19 Mar 2012 16:14:58 +0000 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: I believe the formula I have is accurate I checked it personally and also have it checked by two mathematicians in the lab and they come up with the same results. I left my notebook where I performed the transformations home so don't completely remember but I believe you can simply things to get rid of some of the parameters. dicerAcc is a scalar as you mentioned. I managed to implement the function in python now and it is giving the same results as in R my question how to maximize it still remains though. Is it possibly to maximize a function rather than minimize it in Python? On 15 March 2012 15:21, Skipper Seabold wrote: > On Thu, Mar 15, 2012 at 11:07 AM, Bruno Santos wrote: > >> Thank you all very much for the replies that was exactly what I wanted. I >> am basically trying to get the parameters for a gamma-poisson distribution. >> I have the R code from a previous collaborator just trying to write a >> native function in python rather than using the R code or port it using >> rpy2. > > > Oh, fun. 
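A sketch, not taken from the thread, of the usual answer to the maximisation question above: scipy.optimize only minimises, so the log-likelihood is negated, which plays the role of R's optim(..., control=list(fnscale=-1)); scipy.special.gammaln stands in for R's lgamma and lfactorial. The x and dicerAcc arrays below are made-up values whose only purpose is to make the sketch run:

```
import numpy as np
from scipy.special import gammaln
from scipy import optimize

def alphabeta(params, x, dicerAcc):
    # log-likelihood in the form used in the thread; gammaln(x + 1)
    # plays the role of lfactorial(x)
    alpha, beta = params
    if alpha < 0 or beta < 0:
        return -np.inf
    return np.sum(alpha * np.log(beta) + gammaln(alpha + x)
                  + x * np.log(dicerAcc) - gammaln(alpha)
                  - (alpha + x) * np.log(beta + dicerAcc)
                  - gammaln(x + 1))

def negloglik(params, x, dicerAcc):
    # minimising the negative log-likelihood maximises the likelihood
    return -alphabeta(params, x, dicerAcc)

# made-up data; the thread's R code also drops entries where dicerAcc
# sits at the noise floor, which these strictly positive values avoid
x = np.array([12., 0., 3., 0., 41., 2., 0., 7., 0., 1.])
dicerAcc = np.array([0.9, 0.7, 0.85, 0.6, 0.95, 0.8, 0.75, 0.9, 0.65, 0.7])

ab = optimize.fmin(negloglik, [1.0, 100.0], args=(x, dicerAcc))
print(ab)
```

The replies below make the same two points: put a negative in front of the objective, and only a ratio of the two parameters may be identified unless one of them is held fixed.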
> > >> The function is the following: >> [image: Inline images 1] >> where f(b,d) is a function that gives me a probability of a certain >> position in the vector to be occupied and it depends on b (the position) >> and d (the likelihood of making an error). >> So the likelihood after a few transformations become: >> >> [image: Inline images 2] >> Which I then use the loglikelihood and try to maximise it using an >> optimization algorithm. >> [image: Inline images 3] >> The R code is as following: >> alphabeta<-function(alphabeta,x,dicerAcc) >> { >> alpha <-alphabeta[1] >> beta <-alphabeta[2] >> if (any(alphabeta<0)) >> return(NA) >> sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - >> lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc >> > noiseT]) >> > > From a quick (distracted) look (so I could be wrong) > > Should this be alpha^2*log(beta) ? +lgamma(alpha) ? And lfactorial(x) > should still be +lgamma(alpha)*lfactorial(x) ? And dicerAcc a scalar > integer I take it? > > >> >> #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError >> != 0]) >> } >> x and dicerAcc are known so the I use the optim function in R >> ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, >> dicerAcc = dicerAcc)$par >> >> Is there any equivalent function in Scipy to the optim one? >> >> On 14 March 2012 17:05, Bruno Santos wrote: >> >>> I am trying to write a script to do some maximum likelihood parameter >>> estimation of a function. But when I try to use the gamma function I get: >>> gamma(5) >>> Out[5]: >>> >>> I thought it might have been a problem solved already on the new >>> distribution but even after installing the last scipy version I get the >>> same problem. >>> The test() after installation is also failing with the following >>> information: >>> Running unit tests for scipy >>> NumPy version 1.5.1 >>> NumPy is installed in /usr/lib/pymodules/python2.7/numpy >>> SciPy version 0.10.1 >>> SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy >>> Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] >>> nose version 1.1.2 >>> ... >>> ... >>> ... 
>>> AssertionError: >>> Arrays are not almost equal >>> ACTUAL: 0.0 >>> DESIRED: 0.5 >>> >>> ====================================================================== >>> FAIL: Regression test for #651: better handling of badly conditioned >>> ---------------------------------------------------------------------- >>> Traceback (most recent call last): >>> File >>> "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", >>> line 34, in test_bad_filter >>> assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) >>> File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 982, >>> in assert_raises >>> return nose.tools.assert_raises(*args,**kwargs) >>> AssertionError: BadCoefficients not raised >>> >>> ---------------------------------------------------------------------- >>> Ran 5103 tests in 47.795s >>> >>> FAILED (KNOWNFAIL=13, SKIP=28, failures=3) >>> Out[7]: >>> >>> >>> My code is as follows: >>> from numpy import array,log,sum,nan >>> from scipy.stats import gamma >>> from scipy import factorial, optimize >>> >>> #rinterface.initr() >>> #IntSexpVector = rinterface.IntSexpVector >>> #lgamma = rinterface.globalenv.get("lgamma") >>> >>> #Implementation for the Zero-inflated Negative Binomial function >>> def alphabeta(params,x,dicerAcc): >>> alpha = array(params[0]) >>> beta = array(params[1]) >>> if alpha<0 or beta<0:return nan >>> return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * >>> log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - >>> log(factorial(x))) >>> >>> if __name__=='__main__': >>> x = >>> array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) >>> dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>> 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, >>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, >>> 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, >>> 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, >>> 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, >>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, >>> 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, >>> 0.14979999999999999, 0.012999999999999999]) >>> optimize.() >>> >>> >>> Am I doing something wrong or is this a known problem? >>> >>> Best, >>> Bruno >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: From jsseabold at gmail.com Mon Mar 19 12:23:20 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 19 Mar 2012 12:23:20 -0400 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: On Mon, Mar 19, 2012 at 12:14 PM, Bruno Santos wrote: > I believe the formula I have is accurate I checked it personally and also > have it checked by two mathematicians in the lab and they come up with the > same results. I left my notebook where I performed the transformations home > so don't completely remember but I believe you can simply things to get rid > of some of the parameters. > dicerAcc is a scalar as you mentioned. > I managed to implement the function in python now and it is giving the > same results as in R my question how to maximize it still remains though. > Is it possibly to maximize a function rather than minimize it in Python? > Ok, then I guess my math is faulty. I only looked quickly and don't see the other close parens in the formula. To maximize put a negative in front of the function. > > > On 15 March 2012 15:21, Skipper Seabold wrote: > >> On Thu, Mar 15, 2012 at 11:07 AM, Bruno Santos wrote: >> >>> Thank you all very much for the replies that was exactly what I wanted. >>> I am basically trying to get the parameters for a >>> gamma-poisson distribution. I have the R code from a >>> previous collaborator just trying to write a native function in python >>> rather than using the R code or port it using rpy2. >> >> >> Oh, fun. >> >> >>> The function is the following: >>> [image: Inline images 1] >>> where f(b,d) is a function that gives me a probability of a certain >>> position in the vector to be occupied and it depends on b (the position) >>> and d (the likelihood of making an error). >>> So the likelihood after a few transformations become: >>> >>> [image: Inline images 2] >>> Which I then use the loglikelihood and try to maximise it using an >>> optimization algorithm. >>> [image: Inline images 3] >>> The R code is as following: >>> alphabeta<-function(alphabeta,x,dicerAcc) >>> { >>> alpha <-alphabeta[1] >>> beta <-alphabeta[2] >>> if (any(alphabeta<0)) >>> return(NA) >>> sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - >>> lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc >>> > noiseT]) >>> >> >> From a quick (distracted) look (so I could be wrong) >> >> Should this be alpha^2*log(beta) ? +lgamma(alpha) ? And lfactorial(x) >> should still be +lgamma(alpha)*lfactorial(x) ? And dicerAcc a scalar >> integer I take it? >> >> >>> >>> #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError >>> != 0]) >>> } >>> x and dicerAcc are known so the I use the optim function in R >>> ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, >>> dicerAcc = dicerAcc)$par >>> >>> Is there any equivalent function in Scipy to the optim one? >>> >>> On 14 March 2012 17:05, Bruno Santos wrote: >>> >>>> I am trying to write a script to do some maximum likelihood parameter >>>> estimation of a function. But when I try to use the gamma function I get: >>>> gamma(5) >>>> Out[5]: >>>> >>>> I thought it might have been a problem solved already on the new >>>> distribution but even after installing the last scipy version I get the >>>> same problem. 
>>>> The test() after installation is also failing with the following >>>> information: >>>> Running unit tests for scipy >>>> NumPy version 1.5.1 >>>> NumPy is installed in /usr/lib/pymodules/python2.7/numpy >>>> SciPy version 0.10.1 >>>> SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy >>>> Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] >>>> nose version 1.1.2 >>>> ... >>>> ... >>>> ... >>>> AssertionError: >>>> Arrays are not almost equal >>>> ACTUAL: 0.0 >>>> DESIRED: 0.5 >>>> >>>> ====================================================================== >>>> FAIL: Regression test for #651: better handling of badly conditioned >>>> ---------------------------------------------------------------------- >>>> Traceback (most recent call last): >>>> File >>>> "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", >>>> line 34, in test_bad_filter >>>> assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) >>>> File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 982, >>>> in assert_raises >>>> return nose.tools.assert_raises(*args,**kwargs) >>>> AssertionError: BadCoefficients not raised >>>> >>>> ---------------------------------------------------------------------- >>>> Ran 5103 tests in 47.795s >>>> >>>> FAILED (KNOWNFAIL=13, SKIP=28, failures=3) >>>> Out[7]: >>>> >>>> >>>> My code is as follows: >>>> from numpy import array,log,sum,nan >>>> from scipy.stats import gamma >>>> from scipy import factorial, optimize >>>> >>>> #rinterface.initr() >>>> #IntSexpVector = rinterface.IntSexpVector >>>> #lgamma = rinterface.globalenv.get("lgamma") >>>> >>>> #Implementation for the Zero-inflated Negative Binomial function >>>> def alphabeta(params,x,dicerAcc): >>>> alpha = array(params[0]) >>>> beta = array(params[1]) >>>> if alpha<0 or beta<0:return nan >>>> return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * >>>> log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - >>>> log(factorial(x))) >>>> >>>> if __name__=='__main__': >>>> x = >>>> array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) >>>> dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>> 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, >>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, >>>> 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, >>>> 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, >>>> 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, >>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, >>>> 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, >>>> 0.14979999999999999, 0.012999999999999999]) >>>> optimize.() >>>> >>>> >>>> Am I doing something wrong or is this a known problem? 
>>>> >>>> Best, >>>> Bruno >>>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: From jsseabold at gmail.com Mon Mar 19 12:25:00 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 19 Mar 2012 12:25:00 -0400 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: On Mon, Mar 19, 2012 at 12:23 PM, Skipper Seabold wrote: > On Mon, Mar 19, 2012 at 12:14 PM, Bruno Santos wrote: > >> I believe the formula I have is accurate I checked it personally and also >> have it checked by two mathematicians in the lab and they come up with the >> same results. I left my notebook where I performed the transformations home >> so don't completely remember but I believe you can simply things to get rid >> of some of the parameters. >> > dicerAcc is a scalar as you mentioned. >> I managed to implement the function in python now and it is giving the >> same results as in R my question how to maximize it still remains though. >> Is it possibly to maximize a function rather than minimize it in Python? >> > > Ok, then I guess my math is faulty. I only looked quickly and don't see > the other close parens in the formula. > Oh, I see now. I was trying to go off the log-likelihood you provided instead of looking at the likelihood. > > To maximize put a negative in front of the function. > > > >> >> >> On 15 March 2012 15:21, Skipper Seabold wrote: >> >>> On Thu, Mar 15, 2012 at 11:07 AM, Bruno Santos wrote: >>> >>>> Thank you all very much for the replies that was exactly what I wanted. >>>> I am basically trying to get the parameters for a >>>> gamma-poisson distribution. I have the R code from a >>>> previous collaborator just trying to write a native function in python >>>> rather than using the R code or port it using rpy2. >>> >>> >>> Oh, fun. >>> >>> >>>> The function is the following: >>>> [image: Inline images 1] >>>> where f(b,d) is a function that gives me a probability of a certain >>>> position in the vector to be occupied and it depends on b (the position) >>>> and d (the likelihood of making an error). >>>> So the likelihood after a few transformations become: >>>> >>>> [image: Inline images 2] >>>> Which I then use the loglikelihood and try to maximise it using an >>>> optimization algorithm. 
>>>> [image: Inline images 3] >>>> The R code is as following: >>>> alphabeta<-function(alphabeta,x,dicerAcc) >>>> { >>>> alpha <-alphabeta[1] >>>> beta <-alphabeta[2] >>>> if (any(alphabeta<0)) >>>> return(NA) >>>> sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - >>>> lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc >>>> > noiseT]) >>>> >>> >>> From a quick (distracted) look (so I could be wrong) >>> >>> Should this be alpha^2*log(beta) ? +lgamma(alpha) ? And lfactorial(x) >>> should still be +lgamma(alpha)*lfactorial(x) ? And dicerAcc a scalar >>> integer I take it? >>> >>> >>>> >>>> #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError >>>> != 0]) >>>> } >>>> x and dicerAcc are known so the I use the optim function in R >>>> ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, >>>> dicerAcc = dicerAcc)$par >>>> >>>> Is there any equivalent function in Scipy to the optim one? >>>> >>>> On 14 March 2012 17:05, Bruno Santos wrote: >>>> >>>>> I am trying to write a script to do some maximum likelihood parameter >>>>> estimation of a function. But when I try to use the gamma function I get: >>>>> gamma(5) >>>>> Out[5]: >>>>> >>>>> I thought it might have been a problem solved already on the new >>>>> distribution but even after installing the last scipy version I get the >>>>> same problem. >>>>> The test() after installation is also failing with the following >>>>> information: >>>>> Running unit tests for scipy >>>>> NumPy version 1.5.1 >>>>> NumPy is installed in /usr/lib/pymodules/python2.7/numpy >>>>> SciPy version 0.10.1 >>>>> SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy >>>>> Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] >>>>> nose version 1.1.2 >>>>> ... >>>>> ... >>>>> ... 
>>>>> AssertionError: >>>>> Arrays are not almost equal >>>>> ACTUAL: 0.0 >>>>> DESIRED: 0.5 >>>>> >>>>> ====================================================================== >>>>> FAIL: Regression test for #651: better handling of badly conditioned >>>>> ---------------------------------------------------------------------- >>>>> Traceback (most recent call last): >>>>> File >>>>> "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", >>>>> line 34, in test_bad_filter >>>>> assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) >>>>> File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line >>>>> 982, in assert_raises >>>>> return nose.tools.assert_raises(*args,**kwargs) >>>>> AssertionError: BadCoefficients not raised >>>>> >>>>> ---------------------------------------------------------------------- >>>>> Ran 5103 tests in 47.795s >>>>> >>>>> FAILED (KNOWNFAIL=13, SKIP=28, failures=3) >>>>> Out[7]: >>>>> >>>>> >>>>> My code is as follows: >>>>> from numpy import array,log,sum,nan >>>>> from scipy.stats import gamma >>>>> from scipy import factorial, optimize >>>>> >>>>> #rinterface.initr() >>>>> #IntSexpVector = rinterface.IntSexpVector >>>>> #lgamma = rinterface.globalenv.get("lgamma") >>>>> >>>>> #Implementation for the Zero-inflated Negative Binomial function >>>>> def alphabeta(params,x,dicerAcc): >>>>> alpha = array(params[0]) >>>>> beta = array(params[1]) >>>>> if alpha<0 or beta<0:return nan >>>>> return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * >>>>> log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - >>>>> log(factorial(x))) >>>>> >>>>> if __name__=='__main__': >>>>> x = >>>>> array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) >>>>> dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, >>>>> 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, >>>>> 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, >>>>> 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, >>>>> 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, >>>>> 0.14979999999999999, 0.012999999999999999]) >>>>> optimize.() >>>>> >>>>> >>>>> Am I doing something wrong or is this a known problem? >>>>> >>>>> Best, >>>>> Bruno >>>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: From josef.pktd at gmail.com Mon Mar 19 12:28:14 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Mar 2012 12:28:14 -0400 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: On Mon, Mar 19, 2012 at 12:23 PM, Skipper Seabold wrote: > On Mon, Mar 19, 2012 at 12:14 PM, Bruno Santos wrote: > >> I believe the formula I have is accurate I checked it personally and also >> have it checked by two mathematicians in the lab and they come up with the >> same results. I left my notebook where I performed the transformations home >> so don't completely remember but I believe you can simply things to get rid >> of some of the parameters. >> > dicerAcc is a scalar as you mentioned. >> I managed to implement the function in python now and it is giving the >> same results as in R my question how to maximize it still remains though. >> Is it possibly to maximize a function rather than minimize it in Python? >> > > Ok, then I guess my math is faulty. I only looked quickly and don't see > the other close parens in the formula. > I didn't check the parens, but to me it just like the negative binomial but I think only the ratio p = mu/(beta+mu) 1-p = beta/(beta+mu) are identified, unless a parameter is held fixed. negative binomial is a poisson-gamma mixture, but I only found the p, 1-p parameterization Josef > > To maximize put a negative in front of the function. > > > >> >> >> On 15 March 2012 15:21, Skipper Seabold wrote: >> >>> On Thu, Mar 15, 2012 at 11:07 AM, Bruno Santos wrote: >>> >>>> Thank you all very much for the replies that was exactly what I wanted. >>>> I am basically trying to get the parameters for a >>>> gamma-poisson distribution. I have the R code from a >>>> previous collaborator just trying to write a native function in python >>>> rather than using the R code or port it using rpy2. >>> >>> >>> Oh, fun. >>> >>> >>>> The function is the following: >>>> [image: Inline images 1] >>>> where f(b,d) is a function that gives me a probability of a certain >>>> position in the vector to be occupied and it depends on b (the position) >>>> and d (the likelihood of making an error). >>>> So the likelihood after a few transformations become: >>>> >>>> [image: Inline images 2] >>>> Which I then use the loglikelihood and try to maximise it using an >>>> optimization algorithm. >>>> [image: Inline images 3] >>>> The R code is as following: >>>> alphabeta<-function(alphabeta,x,dicerAcc) >>>> { >>>> alpha <-alphabeta[1] >>>> beta <-alphabeta[2] >>>> if (any(alphabeta<0)) >>>> return(NA) >>>> sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - >>>> lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc >>>> > noiseT]) >>>> >>> >>> From a quick (distracted) look (so I could be wrong) >>> >>> Should this be alpha^2*log(beta) ? +lgamma(alpha) ? And lfactorial(x) >>> should still be +lgamma(alpha)*lfactorial(x) ? And dicerAcc a scalar >>> integer I take it? 
>>> >>> >>>> >>>> #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError >>>> != 0]) >>>> } >>>> x and dicerAcc are known so the I use the optim function in R >>>> ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, >>>> dicerAcc = dicerAcc)$par >>>> >>>> Is there any equivalent function in Scipy to the optim one? >>>> >>>> On 14 March 2012 17:05, Bruno Santos wrote: >>>> >>>>> I am trying to write a script to do some maximum likelihood parameter >>>>> estimation of a function. But when I try to use the gamma function I get: >>>>> gamma(5) >>>>> Out[5]: >>>>> >>>>> I thought it might have been a problem solved already on the new >>>>> distribution but even after installing the last scipy version I get the >>>>> same problem. >>>>> The test() after installation is also failing with the following >>>>> information: >>>>> Running unit tests for scipy >>>>> NumPy version 1.5.1 >>>>> NumPy is installed in /usr/lib/pymodules/python2.7/numpy >>>>> SciPy version 0.10.1 >>>>> SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy >>>>> Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] >>>>> nose version 1.1.2 >>>>> ... >>>>> ... >>>>> ... >>>>> AssertionError: >>>>> Arrays are not almost equal >>>>> ACTUAL: 0.0 >>>>> DESIRED: 0.5 >>>>> >>>>> ====================================================================== >>>>> FAIL: Regression test for #651: better handling of badly conditioned >>>>> ---------------------------------------------------------------------- >>>>> Traceback (most recent call last): >>>>> File >>>>> "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", >>>>> line 34, in test_bad_filter >>>>> assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) >>>>> File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line >>>>> 982, in assert_raises >>>>> return nose.tools.assert_raises(*args,**kwargs) >>>>> AssertionError: BadCoefficients not raised >>>>> >>>>> ---------------------------------------------------------------------- >>>>> Ran 5103 tests in 47.795s >>>>> >>>>> FAILED (KNOWNFAIL=13, SKIP=28, failures=3) >>>>> Out[7]: >>>>> >>>>> >>>>> My code is as follows: >>>>> from numpy import array,log,sum,nan >>>>> from scipy.stats import gamma >>>>> from scipy import factorial, optimize >>>>> >>>>> #rinterface.initr() >>>>> #IntSexpVector = rinterface.IntSexpVector >>>>> #lgamma = rinterface.globalenv.get("lgamma") >>>>> >>>>> #Implementation for the Zero-inflated Negative Binomial function >>>>> def alphabeta(params,x,dicerAcc): >>>>> alpha = array(params[0]) >>>>> beta = array(params[1]) >>>>> if alpha<0 or beta<0:return nan >>>>> return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * >>>>> log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - >>>>> log(factorial(x))) >>>>> >>>>> if __name__=='__main__': >>>>> x = >>>>> array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) >>>>> dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, >>>>> 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, >>>>> 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, >>>>> 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, >>>>> 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, >>>>> 0.14979999999999999, 0.012999999999999999]) >>>>> optimize.() >>>>> >>>>> >>>>> Am I doing something wrong or is this a known problem? >>>>> >>>>> Best, >>>>> Bruno >>>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: From bacmsantos at gmail.com Mon Mar 19 12:34:48 2012 From: bacmsantos at gmail.com (Bruno Santos) Date: Mon, 19 Mar 2012 16:34:48 +0000 Subject: [SciPy-User] rv_frozen when using gamma function In-Reply-To: References: Message-ID: [quote]are identified, unless a parameter is held fixed. negative binomial is a poisson-gamma mixture, but I only found the p, 1-p parameterization Josef [/quote] Guys you are starting to overwhelm my knowledge :). But you are correct the only way I can solve it by assuming that d is given and fixed. It is not ideally but unfortunately I haven't found a way to get it experimentally so will have to pretend I know and use several values. And Skipper thank you very much some times the solution is so obvious I have problems in seeing it. The -1 did the trick. Thank you very much for all the help. I am really learning a lot from this thread. Bruno On 19 March 2012 16:28, wrote: > > > On Mon, Mar 19, 2012 at 12:23 PM, Skipper Seabold wrote: > >> On Mon, Mar 19, 2012 at 12:14 PM, Bruno Santos wrote: >> >>> I believe the formula I have is accurate I checked it personally and >>> also have it checked by two mathematicians in the lab and they come up with >>> the same results. I left my notebook where I performed the transformations >>> home so don't completely remember but I believe you can simply things to >>> get rid of some of the parameters. >>> >> dicerAcc is a scalar as you mentioned. >>> I managed to implement the function in python now and it is giving the >>> same results as in R my question how to maximize it still remains though. >>> Is it possibly to maximize a function rather than minimize it in Python? >>> >> >> Ok, then I guess my math is faulty. I only looked quickly and don't see >> the other close parens in the formula. 
>> > > I didn't check the parens, but to me it just like the negative binomial > but I think only the ratio > > p = mu/(beta+mu) > 1-p = beta/(beta+mu) > > are identified, unless a parameter is held fixed. > > negative binomial is a poisson-gamma mixture, but I only found the p, 1-p > parameterization > > Josef > > > >> >> To maximize put a negative in front of the function. >> >> >> >>> >>> >>> On 15 March 2012 15:21, Skipper Seabold wrote: >>> >>>> On Thu, Mar 15, 2012 at 11:07 AM, Bruno Santos wrote: >>>> >>>>> Thank you all very much for the replies that was exactly what I >>>>> wanted. I am basically trying to get the parameters for a >>>>> gamma-poisson distribution. I have the R code from a >>>>> previous collaborator just trying to write a native function in python >>>>> rather than using the R code or port it using rpy2. >>>> >>>> >>>> Oh, fun. >>>> >>>> >>>>> The function is the following: >>>>> [image: Inline images 1] >>>>> where f(b,d) is a function that gives me a probability of a certain >>>>> position in the vector to be occupied and it depends on b (the position) >>>>> and d (the likelihood of making an error). >>>>> So the likelihood after a few transformations become: >>>>> >>>>> [image: Inline images 2] >>>>> Which I then use the loglikelihood and try to maximise it using an >>>>> optimization algorithm. >>>>> [image: Inline images 3] >>>>> The R code is as following: >>>>> alphabeta<-function(alphabeta,x,dicerAcc) >>>>> { >>>>> alpha <-alphabeta[1] >>>>> beta <-alphabeta[2] >>>>> if (any(alphabeta<0)) >>>>> return(NA) >>>>> sum((alpha*log(beta) + lgamma(alpha + x) + x * log(dicerAcc) - >>>>> lgamma(alpha) - (alpha + x) * log(beta+dicerAcc) - lfactorial(x))[dicerAcc >>>>> > noiseT]) >>>>> >>>> >>>> From a quick (distracted) look (so I could be wrong) >>>> >>>> Should this be alpha^2*log(beta) ? +lgamma(alpha) ? And lfactorial(x) >>>> should still be +lgamma(alpha)*lfactorial(x) ? And dicerAcc a scalar >>>> integer I take it? >>>> >>>> >>>>> >>>>> #sum((alpha*log(beta)+(lgamma(alpha+x)+log(dicerError^x))-(lgamma(alpha)+log((beta+dicerError)^(alpha+x))+lfactorial(x)))[dicerError >>>>> != 0]) >>>>> } >>>>> x and dicerAcc are known so the I use the optim function in R >>>>> ab <- optim(c(1,100), alphabeta, control=list(fnscale=-1), x = x, >>>>> dicerAcc = dicerAcc)$par >>>>> >>>>> Is there any equivalent function in Scipy to the optim one? >>>>> >>>>> On 14 March 2012 17:05, Bruno Santos wrote: >>>>> >>>>>> I am trying to write a script to do some maximum likelihood parameter >>>>>> estimation of a function. But when I try to use the gamma function I get: >>>>>> gamma(5) >>>>>> Out[5]: >>>>>> >>>>>> I thought it might have been a problem solved already on the new >>>>>> distribution but even after installing the last scipy version I get the >>>>>> same problem. >>>>>> The test() after installation is also failing with the following >>>>>> information: >>>>>> Running unit tests for scipy >>>>>> NumPy version 1.5.1 >>>>>> NumPy is installed in /usr/lib/pymodules/python2.7/numpy >>>>>> SciPy version 0.10.1 >>>>>> SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy >>>>>> Python version 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] >>>>>> nose version 1.1.2 >>>>>> ... >>>>>> ... >>>>>> ... 
>>>>>> AssertionError: >>>>>> Arrays are not almost equal >>>>>> ACTUAL: 0.0 >>>>>> DESIRED: 0.5 >>>>>> >>>>>> ====================================================================== >>>>>> FAIL: Regression test for #651: better handling of badly conditioned >>>>>> ---------------------------------------------------------------------- >>>>>> Traceback (most recent call last): >>>>>> File >>>>>> "/usr/local/lib/python2.7/dist-packages/scipy/signal/tests/test_filter_design.py", >>>>>> line 34, in test_bad_filter >>>>>> assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0]) >>>>>> File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line >>>>>> 982, in assert_raises >>>>>> return nose.tools.assert_raises(*args,**kwargs) >>>>>> AssertionError: BadCoefficients not raised >>>>>> >>>>>> ---------------------------------------------------------------------- >>>>>> Ran 5103 tests in 47.795s >>>>>> >>>>>> FAILED (KNOWNFAIL=13, SKIP=28, failures=3) >>>>>> Out[7]: >>>>>> >>>>>> >>>>>> My code is as follows: >>>>>> from numpy import array,log,sum,nan >>>>>> from scipy.stats import gamma >>>>>> from scipy import factorial, optimize >>>>>> >>>>>> #rinterface.initr() >>>>>> #IntSexpVector = rinterface.IntSexpVector >>>>>> #lgamma = rinterface.globalenv.get("lgamma") >>>>>> >>>>>> #Implementation for the Zero-inflated Negative Binomial function >>>>>> def alphabeta(params,x,dicerAcc): >>>>>> alpha = array(params[0]) >>>>>> beta = array(params[1]) >>>>>> if alpha<0 or beta<0:return nan >>>>>> return sum((alpha*log(beta)) + log(gamma(alpha+x)) + x * >>>>>> log(dicerAcc) - log(gamma(alpha)) - (alpha+x) * log(beta+dicerAcc) - >>>>>> log(factorial(x))) >>>>>> >>>>>> if __name__=='__main__': >>>>>> x = >>>>>> array([123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,104,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,24,1,0,0,0,0,0,0,0,2,0,0,4,0,0,0,0,0,0,0,0,12,0,0]) >>>>>> dicerAcc = array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>>> 0.048750000000000002,0.90085000000000004, 0.0504, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0023, >>>>>> 0.089149999999999993, 0.81464999999999999, 0.091550000000000006, >>>>>> 0.0023500000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, 0.0061000000000000004, >>>>>> 0.12085, 0.7429, 0.12325, 0.0067000000000000002, 0.0, 0.0, 0.0, 0.0, 0.0, >>>>>> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00020000000000000001, >>>>>> 0.012500000000000001, 0.14255000000000001, 0.68159999999999998, >>>>>> 0.14979999999999999, 0.012999999999999999]) >>>>>> optimize.() >>>>>> >>>>>> >>>>>> Am I doing something wrong or is this a known problem? 
>>>>>> >>>>>> Best, >>>>>> Bruno >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4401 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 3193 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 1620 bytes Desc: not available URL: From greg.friedland at gmail.com Mon Mar 19 13:42:32 2012 From: greg.friedland at gmail.com (Greg Friedland) Date: Mon, 19 Mar 2012 10:42:32 -0700 Subject: [SciPy-User] Scipy.weave.inline and py2exe Message-ID: Hi All, Is it possible to use scipy.weave.inline to create a windows exe with py2exe? Perhaps this is too much to ask for because of the behind the scenes stuff that inline does but I thought I'd ask anyway. For the moment when I simply try to import scipy.weave, the exe bundles but when I run it I get: File "scipy\weave\__init__.pyc", line 26, in File "scipy\weave\inline_tools.pyc", line 5, in File "scipy\weave\ext_tools.pyc", line 6, in File "scipy\weave\build_tools.pyc", line 28, in File "scipy\weave\platform_info.pyc", line 15, in File "numpy\distutils\core.pyc", line 25, in File "numpy\distutils\command\build_ext.pyc", line 9, in File "distutils\command\build_ext.pyc", line 13, in File "site.pyc", line 73, in File "site.pyc", line 38, in __boot ImportError: Couldn't find the real 'site' module thanks, Greg From seb.haase at gmail.com Mon Mar 19 13:52:47 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 19 Mar 2012 18:52:47 +0100 Subject: [SciPy-User] Scipy.weave.inline and py2exe In-Reply-To: References: Message-ID: Hi, It should work, with some weave magic ... since weave should know at compile time (or at "import time") if a module has already been compiled - so would just have to run (or "import") the module once before you do the py2exe stuff.... This is just loud thinking, I would not know the details off hand ..... Hoping someone else here has more details - Sebastian Haase PS: Last time I used weave (some 5 or so years ago) it seemed quite orphaned... I would be happy to hear now otherwise.. On Mon, Mar 19, 2012 at 6:42 PM, Greg Friedland wrote: > Hi All, > Is it possible to use scipy.weave.inline to create a windows exe with > py2exe? Perhaps this is too much to ask for because of the behind the > scenes stuff that inline does but I thought I'd ask anyway. 
> > For the moment when I simply try to import scipy.weave, the exe > bundles but when I run it I get: > > ?File "scipy\weave\__init__.pyc", line 26, in > ?File "scipy\weave\inline_tools.pyc", line 5, in > ?File "scipy\weave\ext_tools.pyc", line 6, in > ?File "scipy\weave\build_tools.pyc", line 28, in > ?File "scipy\weave\platform_info.pyc", line 15, in > ?File "numpy\distutils\core.pyc", line 25, in > ?File "numpy\distutils\command\build_ext.pyc", line 9, in > ?File "distutils\command\build_ext.pyc", line 13, in > ?File "site.pyc", line 73, in > ?File "site.pyc", line 38, in __boot > ImportError: Couldn't find the real 'site' module > > > > thanks, > Greg > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From nicolas.pinto at gmail.com Mon Mar 19 14:24:33 2012 From: nicolas.pinto at gmail.com (Nicolas Pinto) Date: Mon, 19 Mar 2012 19:24:33 +0100 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module Message-ID: Hello, The following simple code hangs only when sparse has been imported: ``` from scipy import sparse # <<<<<<< BUG import numpy as np from scipy import linalg N = 1000 np.random.seed(42) X = np.random.random((N, N)) print X.mean() v, Q = linalg.eigh(X) print v.mean() ``` Do you think this may be related to other arpack/umfpack/etc. known failures ? Please let us know how can we help fix this issue. Thanks for your help. Regards, -- Nicolas Pinto http://web.mit.edu/pinto From ralf.gommers at googlemail.com Mon Mar 19 14:54:32 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 19 Mar 2012 19:54:32 +0100 Subject: [SciPy-User] Scipy.weave.inline and py2exe In-Reply-To: References: Message-ID: On Mon, Mar 19, 2012 at 6:52 PM, Sebastian Haase wrote: > Hi, > It should work, with some weave magic ... since weave should know at > compile time (or at "import time") if a module has already been > compiled - so would just have to run (or "import") the module once > before you do the py2exe stuff.... > > This is just loud thinking, I would not know the details off hand ..... > Hoping someone else here has more details > - Sebastian Haase > > PS: Last time I used weave (some 5 or so years ago) it seemed quite > orphaned... I would be happy to hear now otherwise.. > You won't hear otherwise. It's unmaintained and only kept for backwards compatibility. Use Cython for new code. Ralf > > On Mon, Mar 19, 2012 at 6:42 PM, Greg Friedland > wrote: > > Hi All, > > Is it possible to use scipy.weave.inline to create a windows exe with > > py2exe? Perhaps this is too much to ask for because of the behind the > > scenes stuff that inline does but I thought I'd ask anyway. 
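A rough sketch of the two Cython-based routes commonly used in place of weave.inline for run-time compilation; it assumes Cython and a working C compiler are installed, and it shows nothing about py2exe packaging itself:

```
# 1. cython.inline: compile a snippet the first time it is called,
#    then reuse the cached extension module.
import cython

def add(a, b):
    return cython.inline("return a + b", a=a, b=b)

print(add(2, 3))   # 5

# 2. pyximport: compile .pyx modules transparently at import time.
import pyximport
pyximport.install()
# import my_fast_module   # hypothetical .pyx module next to this script
```

Later in the thread Greg reports that, of the two, the pyximport route was the one that survived py2exe packaging in his case.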
> > > > For the moment when I simply try to import scipy.weave, the exe > > bundles but when I run it I get: > > > > File "scipy\weave\__init__.pyc", line 26, in > > File "scipy\weave\inline_tools.pyc", line 5, in > > File "scipy\weave\ext_tools.pyc", line 6, in > > File "scipy\weave\build_tools.pyc", line 28, in > > File "scipy\weave\platform_info.pyc", line 15, in > > File "numpy\distutils\core.pyc", line 25, in > > File "numpy\distutils\command\build_ext.pyc", line 9, in > > File "distutils\command\build_ext.pyc", line 13, in > > File "site.pyc", line 73, in > > File "site.pyc", line 38, in __boot > > ImportError: Couldn't find the real 'site' module > > > > > > > > thanks, > > Greg > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Mon Mar 19 15:15:52 2012 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 19 Mar 2012 20:15:52 +0100 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module In-Reply-To: References: Message-ID: On Mon, 19 Mar 2012 19:24:33 +0100 Nicolas Pinto wrote: > Hello, > > The following simple code hangs only when sparse has >been imported: > > ``` > from scipy import sparse # <<<<<<< BUG > import numpy as np > from scipy import linalg > > N = 1000 > np.random.seed(42) > X = np.random.random((N, N)) > print X.mean() > v, Q = linalg.eigh(X) > print v.mean() > ``` > > Do you think this may be related to other >arpack/umfpack/etc. known failures ? > > Please let us know how can we help fix this issue. > > Thanks for your help. > > Regards, > Your matrix X is not symmetric. >>> X-X.T array([[ 0. , 0.76558138, 0.47028826, ..., 0.0515565 , 0.19001774, 0.33171462], [-0.76558138, 0. , 0.62596704, ..., -0.0230795 , -0.90677174, 0.12238354], [-0.47028826, -0.62596704, 0. , ..., -0.38459427, 0.28527075, 0.04568694], ..., [-0.0515565 , 0.0230795 , 0.38459427, ..., 0. , 0.57859577, -0.24268277], [-0.19001774, 0.90677174, -0.28527075, ..., -0.57859577, 0. , 0.52747713], [-0.33171462, -0.12238354, -0.04568694, ..., 0.24268277, -0.52747713, 0. ]]) eigh assumes a symmetric or hermitian matrix. Nils From dyamins at gmail.com Mon Mar 19 15:20:44 2012 From: dyamins at gmail.com (Dan Yamins) Date: Mon, 19 Mar 2012 15:20:44 -0400 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module In-Reply-To: References: Message-ID: Your matrix X is not symmetric. > This is not the problem. (Even if that were the problem, it wouldn't cause a hang -- docs say "no error will be reported but results will be wrong".) In fact, the same hang happens on the installation which originally had this problem if you replace X with X = np.dot(X, X.T) so that the matrix is symmetric. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Mar 19 17:09:17 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 19 Mar 2012 22:09:17 +0100 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module In-Reply-To: References: Message-ID: Hi, 19.03.2012 19:24, Nicolas Pinto kirjoitti: > The following simple code hangs only when sparse has been imported: [clip] It does not hang for me. So, first things first: - which platform? 
- which binaries? - which LAPACK? > Do you think this may be related to other arpack/umfpack/etc. known failures ? > > Please let us know how can we help fix this issue. ARPACK et al are probably not related, because they are not imported by ``from scipy import sparse``. A more likely candidate is the SWIG-wrapped `sparsetools` package: it is known to also cause some other weirdness: http://projects.scipy.org/scipy/ticket/1314 This unfortunately seems pretty difficult to debug. One thing I could imagine doing is minimizing the problem, by first stripping everything away from `scipy.sparse` except the sparsetools module, and then stripping down the sparsetools code until the failing part is found. -- Pauli Virtanen From pav at iki.fi Mon Mar 19 17:37:19 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 19 Mar 2012 22:37:19 +0100 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module In-Reply-To: References: Message-ID: 19.03.2012 22:09, Pauli Virtanen kirjoitti: [clip] > This unfortunately seems pretty difficult to debug. One thing I could > imagine doing is minimizing the problem, by first stripping everything > away from `scipy.sparse` except the sparsetools module, and then > stripping down the sparsetools code until the failing part is found. Another possibility is that the problem comes just from the c++ runtime. There's another c++ module in Scipy, `scipy.interpolate._interpolate` -- could you check if importing it also causes the same issue? -- Pauli Virtanen From dg.gmane at thesamovar.net Mon Mar 19 17:49:45 2012 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Mon, 19 Mar 2012 22:49:45 +0100 Subject: [SciPy-User] Scipy.weave.inline and py2exe In-Reply-To: References: Message-ID: On 19/03/2012 19:54, Ralf Gommers wrote: > On Mon, Mar 19, 2012 at 6:52 PM, Sebastian Haase > wrote: > PS: Last time I used weave (some 5 or so years ago) it seemed quite > orphaned... I would be happy to hear now otherwise.. > > You won't hear otherwise. It's unmaintained and only kept for backwards > compatibility. Use Cython for new code. Interesting. I use weave for runtime code generation. Is it possible/simple to do this with Cython? Dan From kwgoodman at gmail.com Mon Mar 19 18:07:21 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 19 Mar 2012 15:07:21 -0700 Subject: [SciPy-User] [ANN] la 0.6, labeled array Message-ID: The sixth release of la (labeled array) adds new functions, improves existing functions, and fixes bugs. The main class of the la package is a labeled array, larry. A larry consists of data and labels. The data is stored as a NumPy array and the labels as a list of lists (one list per dimension). 
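For illustration, a minimal sketch of that idea (treat the exact constructor call below as an assumption to be checked against the la docs linked at the end of this announcement):

import numpy as np
import la

# data: a 2x3 NumPy array; label: one list of labels per dimension
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
label = [['row0', 'row1'], ['a', 'b', 'c']]

lar = la.larry(data, label)   # a labeled array: data plus per-axis labels
print(lar.sum(0))             # reductions work much like their NumPy counterparts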
New functions - la.isaligned() returns True if two larrys are aligned along specified axis - la.sortby() sorts a larry by a row or column specified by its label - la.align_axis() aligns multiple larrys along (possibly) different axes - la.zeros(), la.ones(), la.empty() - la.lrange() similar to np.arange() but allows multi-dimensional output Enhancements - larry.lag() now accepts negative lags - datime.time and datetime.datetime labels can now be (HDF5) archived - la.align() can now skip the axes you do not wish to align - Upgrade numpydoc from 0.3.1 to 0.4 to support Sphinx 1.0.1 - la.farray.ranking() and larry ranking method support `axis=None` - Generate C code with Cython 0.15.1 instead of Cython 0.11 - Add makefile Faster - larry methods: merge, nan_replace, push, cumsum, cumprod, astype, __rdiv__ - larry function: cov - Numpy array functions: geometric_mean, correlation, covMissing Breakage from la 0.5 - optional parameter for larry creation renamed from integrity to validate Bugs fixes - #14 larry.lag() gives wrong output when nlag=0 - #20 Indexing chokes on lar[:,3:2] - #21 Merging two larrys chokes when one is empty - #22 Morphing an empty larry chokes lar.morph() - #31 la.panel() gives wrong output - #35 larry([1, 2]) == 'a' did not return a bool like numpy does - #38 Indexing single element of larry with object dtype - #39 move_func(myfunc) did not pass kwargs to myfunc when method='loop' - #49 setup.py does not install module to load yahoo finance data - #50 la.larry([], dtype=np.int).sum(0), and similar reductions, choke - #51 -la.larry([True, False]) returns wrong answer URLs download http://pypi.python.org/pypi/la docs http://berkeleyanalytics.com/la code https://github.com/kwgoodman/la mailing list http://groups.google.com/group/labeled-array issue tracker https://github.com/kwgoodman/la/issues From ralf.gommers at googlemail.com Mon Mar 19 18:12:28 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 19 Mar 2012 23:12:28 +0100 Subject: [SciPy-User] Scipy.weave.inline and py2exe In-Reply-To: References: Message-ID: On Mon, Mar 19, 2012 at 10:49 PM, Dan Goodman wrote: > On 19/03/2012 19:54, Ralf Gommers wrote: > > On Mon, Mar 19, 2012 at 6:52 PM, Sebastian Haase > > wrote: > > PS: Last time I used weave (some 5 or so years ago) it seemed quite > > orphaned... I would be happy to hear now otherwise.. > > > > You won't hear otherwise. It's unmaintained and only kept for backwards > > compatibility. Use Cython for new code. > > Interesting. I use weave for runtime code generation. Is it > possible/simple to do this with Cython? http://docs.cython.org/src/reference/compilation.html#compiling-with-cython-inline not sure how that exactly compares to weave.inline Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.friedland at gmail.com Mon Mar 19 19:41:05 2012 From: greg.friedland at gmail.com (Greg Friedland) Date: Mon, 19 Mar 2012 16:41:05 -0700 Subject: [SciPy-User] Scipy.weave.inline and py2exe In-Reply-To: References: Message-ID: Thanks, that's good to know. I hadn't realized it was so out of date. I will switch to cython then. I took a quick look and couldn't get cython.inline to work with py2exe but pyximport.install() did. Greg On Mon, Mar 19, 2012 at 11:54 AM, Ralf Gommers wrote: > > > On Mon, Mar 19, 2012 at 6:52 PM, Sebastian Haase > wrote: >> >> Hi, >> It should work, with some weave magic ... 
since weave should know at >> compile time (or at "import time") if a module has already been >> compiled - so would just have to run (or "import") the module once >> before you do the py2exe stuff.... >> >> This is just loud thinking, ?I would not know the details off hand ..... >> Hoping someone else here has more details >> - Sebastian Haase >> >> PS: Last time I used weave (some 5 or so years ago) ?it seemed quite >> orphaned... I would be happy to hear now otherwise.. > > > You won't hear otherwise. It's unmaintained and only kept for backwards > compatibility. Use Cython for new code. > > Ralf > > >> >> >> On Mon, Mar 19, 2012 at 6:42 PM, Greg Friedland >> wrote: >> > Hi All, >> > Is it possible to use scipy.weave.inline to create a windows exe with >> > py2exe? Perhaps this is too much to ask for because of the behind the >> > scenes stuff that inline does but I thought I'd ask anyway. >> > >> > For the moment when I simply try to import scipy.weave, the exe >> > bundles but when I run it I get: >> > >> > ?File "scipy\weave\__init__.pyc", line 26, in >> > ?File "scipy\weave\inline_tools.pyc", line 5, in >> > ?File "scipy\weave\ext_tools.pyc", line 6, in >> > ?File "scipy\weave\build_tools.pyc", line 28, in >> > ?File "scipy\weave\platform_info.pyc", line 15, in >> > ?File "numpy\distutils\core.pyc", line 25, in >> > ?File "numpy\distutils\command\build_ext.pyc", line 9, in >> > ?File "distutils\command\build_ext.pyc", line 13, in >> > ?File "site.pyc", line 73, in >> > ?File "site.pyc", line 38, in __boot >> > ImportError: Couldn't find the real 'site' module >> > >> > >> > >> > thanks, >> > Greg >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pujari.manisha at gmail.com Tue Mar 20 05:26:24 2012 From: pujari.manisha at gmail.com (manisha pujari) Date: Tue, 20 Mar 2012 10:26:24 +0100 Subject: [SciPy-User] Problem with Installation of Scipy on Macbook Message-ID: Hello everyone, This is the first time I am using scipy and I am having much trouble to just install it on my macbook. I am using Python 2.6 and numpy 1.6.0. I downloaded scipy0.9.0 tar.gz file from the web link http://sourceforge.net/projects/scipy/files/scipy/ and tried to install it from the source. 
But on giving the command scipy-0.9.0$ python setup.py build it is always giving the following problem : Warning: No configuration returned, assuming unavailable.blas_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-faltivec', '-I/System/Library/Frameworks/vecLib.framework/Headers'] non-existing path in 'scipy/io': 'docs' lapack_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-faltivec'] umfpack_info: libraries umfpack not found in /System/Library/Frameworks/Python.framework/Versions/2.6/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib /Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/system_info.py:463: UserWarning: UMFPACK sparse solver (http://www.cise.ufl.edu/research/sparse/umfpack/) not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [umfpack]) or by setting the UMFPACK environment variable. warnings.warn(self.notfounderror.__doc__) NOT AVAILABLE Traceback (most recent call last): File "setup.py", line 181, in setup_package() File "setup.py", line 173, in setup_package configuration=configuration ) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/core.py", line 152, in setup config = configuration() File "setup.py", line 122, in configuration config.add_subpackage('scipy') File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 972, in add_subpackage caller_level = 2) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 941, in get_subpackage caller_level = caller_level + 1) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 878, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "scipy/setup.py", line 20, in configuration config.add_subpackage('special') File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 972, in add_subpackage caller_level = 2) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 941, in get_subpackage caller_level = caller_level + 1) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 878, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "scipy/special/setup.py", line 54, in configuration extra_info=get_info("npymath") File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 2184, in get_info pkg_info = get_pkg_info(pkgname, dirs) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", line 2136, in get_pkg_info return read_config(pkgname, dirs) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", line 390, in read_config v = _read_config_imp(pkg_to_filename(pkgname), dirs) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", line 326, in _read_config_imp meta, vars, sections, reqs = 
_read_config(filenames) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", line 309, in _read_config meta, vars, sections, reqs = parse_config(f, dirs) File "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", line 281, in parse_config raise PkgNotFound("Could not find file(s) %s" % str(filenames)) numpy.distutils.npy_pkg_config.PkgNotFound: Could not find file(s) ['/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/core/lib/npy-pkg-config/npymath.ini'] I am unable to understand where and what exactly the problem is. Can anyone please help me? I will be really thankful for a response. -- Regards, Manisha -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Tue Mar 20 07:58:49 2012 From: denis-bz-gg at t-online.de (denis) Date: Tue, 20 Mar 2012 04:58:49 -0700 (PDT) Subject: [SciPy-User] quadratic programming with fmin_slsqp In-Reply-To: References: Message-ID: <4b58200d-59e9-4cc8-bd7b-6beee0921fd8@w32g2000vbt.googlegroups.com> On Mar 16, 5:45?pm, josef.p... at gmail.com wrote: > scipy is missing a fmin_quadprog Josef, minmize() is a reasonable common interface to 10 or so optimizers, see http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html however - minimize.py is not in scipy-0.9.0.tar nor in scipy-0.10.1.tar (a test to see if anybody's using it ?) - only L-BFGS-B TNC COBYLA and SLSQP support bounds. One could supply a trivial box / penaltybox as outlined below (I use this playing around with Neldermead) but I'm not sure anybody would use it plus there's openopt pyomo mystic ... maybe more solvers than real testcases :- cheers -- denis class Funcbox: """ F = Funcbox( func, [box=(0,1), penalty=None, grid=0, *funcargs, **kwargs wraps a func() with a constraint box and grid Parameters ---------- func: a function of a numpy vector or array-like box: (low, high) to np.clip, default (0,1). These can be vectors; low_j == high_j freezes x_j at that value. penalty: e.g. (0, 1, 1000) adds a quadratic penalty to func() where xclip is outside (0, 1) 1000 * sum( max( 0 - x, 0 )**2 + max( x - 1, 0 )**2 ) = 1 4 9 16 ... at -.01 -.02 ... and 1.01 1.02 ... The default is None, no penalty. (The penalty box should be smaller than the clip box; x is first gridded if grid > 0, then clipped, then penalty computed.) grid: e.g. .01 snaps all x_j to multiples of .01 -- a simple noise smoother, recommended for noisy functions. The default is 0, no gridding. From dhondt.olivier at gmail.com Mon Mar 19 05:23:57 2012 From: dhondt.olivier at gmail.com (tyldurd) Date: Mon, 19 Mar 2012 02:23:57 -0700 (PDT) Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images In-Reply-To: References: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> Message-ID: <28284362.88.1332149037436.JavaMail.geo-discussion-forums@yneo2> Dan, Thanks for your answer. However, this solution does not work for me. First, it returns an array with dtype=object which is not the original type of the data. Besides, the values in the array are not equal to the ones given by the 'traditional' nested loops. I think the problem comes from the fact that ufuncs are functions that act over each element of an array, not over slices. I have done a lot of research on this topic but it seems it is not feasible in terms of slicing or vectorizing. 
The only solution I found would be with generalized ufuncs but from what I understand, they require to write C code, which I would like to avoid :-) Therefore, I am going to stick to nested loops at least for now. Regards, Olivier On Thursday, March 15, 2012 9:23:49 PM UTC+1, Dan Lussier wrote: > > Have you tried numpy.frompyfunc? > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html > > http://stackoverflow.com/questions/6126233/can-i-create-a-python-numpy-ufunc-from-an-unbound-member-method > > With this approach you may be able create a function which acts > elementwise over your array to compute the matrix logarithm at each entry > using Numpy's ufuncs. This would avoid the explicit iteration over the > array using the for loops. > > As a rough outline try: > > from scipy import linalg > import numpy as np > > # Assume im is the container array containing a 3x3 matrix at each pixel. > > # Composite function so get matrix log of array A as a matrix in one step > def log_matrix(A): > return linalg.logm(np.asmatrix(A)) > > > # Creating function to operate over container array. Takes one argument > and returns the result. > log_ufunc = np.frompyfunc(log_matrix, 1, 1) > > # Using log_ufunc on container array, im > res = log_ufunc(im) > > Dan > > > On 2012-03-15, at 1:59 AM, tyldurd wrote: > > Hello, > > I am a beginner at python and numpy and I need to compute the matrix > logarithm for each "pixel" (i.e. x,y position) of a matrix-valued image of > dimension MxNx3x3. 3x3 is the dimensions of the matrix at each pixel. > > The function I have written so far is the following: > > def logm_img(im): > from scipy import linalg > dimx = im.shape[0] > dimy = im.shape[1] > res = zeros_like(im) > for x in range(dimx): > for y in range(dimy): > res[x, y, :, :] = linalg.logm(asmatrix(im[x,y,:,:])) > return res > > Is it ok? Is there a way to avoid the two nested loops ? > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From conny.kuehne at googlemail.com Tue Mar 20 05:43:08 2012 From: conny.kuehne at googlemail.com (=?iso-8859-1?Q?Conny_K=FChne?=) Date: Tue, 20 Mar 2012 10:43:08 +0100 Subject: [SciPy-User] non-existing path in 'scipy/io': 'docs' In-Reply-To: References: <74F753BB-984A-400E-82F9-2B95EE0CEE80@googlemail.com> Message-ID: Hi, thanks for the info. I actually got more install issues using the sources. So I removed everything and installed the binaries. This seems to have worked. Conny Am 18.03.2012 um 22:42 schrieb Ralf Gommers: > > > 2012/3/16 Conny K?hne > Hello, > > I get the following error when trying to build scipy 0.10.1 from source > > blas_opt_info: > FOUND: > extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] > define_macros = [('NO_ATLAS_INFO', 3)] > extra_compile_args = ['-faltivec', '-I/System/Library/Frameworks/vecLib.framework/Headers'] > > non-existing path in 'scipy/io': 'docs' > > This can be cleaned up by removing the line "config.add_data_dir('docs')" in scipy/io/setup.py. Note that this is not an actual error though, just a harmless warning. Do you have an actual install issue? If so, can you post the full build log? 
> > Ralf > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From njs at pobox.com Tue Mar 20 11:33:58 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 20 Mar 2012 15:33:58 +0000 Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images In-Reply-To: <28284362.88.1332149037436.JavaMail.geo-discussion-forums@yneo2> References: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> <28284362.88.1332149037436.JavaMail.geo-discussion-forums@yneo2> Message-ID: On Mon, Mar 19, 2012 at 9:23 AM, tyldurd wrote: > I have done a lot of research on this topic but it seems it is not feasible > in terms of slicing or vectorizing. The only solution I found would be with > generalized ufuncs but from what I understand, they require to write C code, > which I would like to avoid :-) I think the idea of generalized ufuncs is that linalg.logm should be written as a generalized ufunc already out of the box, and then this would be straightforward. However: (1) it isn't, and (2) even if it were, I'm having trouble understanding from the available docs how you would actually use it -- maybe calling logm would just work for your case, but there don't seem to be any examples available of how it chooses which dimensions to apply to. (Are there any generalized ufuncs actually defined in the standard packages? For instance, is np.dot implemented as a generalized ufunc? Should it be?) > Therefore, I am going to stick to nested loops at least for now. That seems like the best option to me. Nothing immoral about using a loop when that's what you need :-). -- Nathaniel From josef.pktd at gmail.com Tue Mar 20 11:38:42 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 20 Mar 2012 11:38:42 -0400 Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images In-Reply-To: References: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> <28284362.88.1332149037436.JavaMail.geo-discussion-forums@yneo2> Message-ID: On Tue, Mar 20, 2012 at 11:33 AM, Nathaniel Smith wrote: > On Mon, Mar 19, 2012 at 9:23 AM, tyldurd wrote: >> I have done a lot of research on this topic but it seems it is not feasible >> in terms of slicing or vectorizing. The only solution I found would be with >> generalized ufuncs but from what I understand, they require to write C code, >> which I would like to avoid :-) > > I think the idea of generalized ufuncs is that linalg.logm should be > written as a generalized ufunc already out of the box, and then this > would be straightforward. However: (1) it isn't, and (2) even if it > were, I'm having trouble understanding from the available docs how you > would actually use it -- maybe calling logm would just work for your > case, but there don't seem to be any examples available of how it > chooses which dimensions to apply to. (Are there any generalized > ufuncs actually defined in the standard packages? For instance, is > np.dot implemented as a generalized ufunc? Should it be?) only in a test case, AFAIK from numpy.core.umath_tests import matrix_multiply Josef > >> Therefore, I am going to stick to nested loops at least for now. > > That seems like the best option to me. Nothing immoral about using a > loop when that's what you need :-). 
> > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From borreguero at gmail.com Tue Mar 20 17:08:22 2012 From: borreguero at gmail.com (Jose Borreguero) Date: Tue, 20 Mar 2012 17:08:22 -0400 Subject: [SciPy-User] how to obtain I,J,V from sparse matrix (V,(I,J)) ? Message-ID: Dear Scipy users, Scipy docs state one can construct a matrix from three 1D arrays, A = sparse .coo_matrix((V,(I,J)),shape=(4,4)) However, given sparse matrix A, how can I obtain arrays V, I, and J? I could not find any methods of the sparse matrix that would return these arrays... - Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Mar 20 17:55:31 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 20 Mar 2012 22:55:31 +0100 Subject: [SciPy-User] how to obtain I, J, V from sparse matrix (V, (I, J)) ? In-Reply-To: References: Message-ID: 20.03.2012 22:08, Jose Borreguero kirjoitti: > Scipy docs state one can construct a matrix from three 1D > arrays,A=sparse.coo_matrix((V,(I,J)),shape=(4,4)) > > However, given sparse matrix A, how can I obtain arrays V, I, and J? > I could not find any methods of the sparse matrix that would return these arrays... Check out the `data`, `row` and `col` attributes: http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html From ralf.gommers at googlemail.com Tue Mar 20 18:06:04 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 20 Mar 2012 23:06:04 +0100 Subject: [SciPy-User] Problem with Installation of Scipy on Macbook In-Reply-To: References: Message-ID: On Tue, Mar 20, 2012 at 10:26 AM, manisha pujari wrote: > Hello everyone, > > This is the first time I am using scipy and I am having much trouble to > just install it on my macbook. I am using Python 2.6 and numpy 1.6.0. > I downloaded scipy0.9.0 tar.gz file from the web link > http://sourceforge.net/projects/scipy/files/scipy/ and tried to install > it from the source. > But on giving the command > > scipy-0.9.0$ python setup.py build > > it is always giving the following problem : > > Warning: No configuration returned, assuming unavailable.blas_opt_info: > FOUND: > extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] > define_macros = [('NO_ATLAS_INFO', 3)] > extra_compile_args = ['-faltivec', > '-I/System/Library/Frameworks/vecLib.framework/Headers'] > > non-existing path in 'scipy/io': 'docs' > lapack_opt_info: > FOUND: > extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] > define_macros = [('NO_ATLAS_INFO', 3)] > extra_compile_args = ['-faltivec'] > > umfpack_info: > libraries umfpack not found in > /System/Library/Frameworks/Python.framework/Versions/2.6/lib > libraries umfpack not found in /usr/local/lib > libraries umfpack not found in /usr/lib > /Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/system_info.py:463: > UserWarning: > UMFPACK sparse solver ( > http://www.cise.ufl.edu/research/sparse/umfpack/) > not found. Directories to search for the libraries can be specified in > the > numpy/distutils/site.cfg file (section [umfpack]) or by setting > the UMFPACK environment variable. 
> warnings.warn(self.notfounderror.__doc__) > NOT AVAILABLE > > Traceback (most recent call last): > File "setup.py", line 181, in > setup_package() > File "setup.py", line 173, in setup_package > configuration=configuration ) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/core.py", > line 152, in setup > config = configuration() > File "setup.py", line 122, in configuration > config.add_subpackage('scipy') > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 972, in add_subpackage > caller_level = 2) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 941, in get_subpackage > caller_level = caller_level + 1) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 878, in _get_configuration_from_setup_py > config = setup_module.configuration(*args) > File "scipy/setup.py", line 20, in configuration > config.add_subpackage('special') > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 972, in add_subpackage > caller_level = 2) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 941, in get_subpackage > caller_level = caller_level + 1) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 878, in _get_configuration_from_setup_py > config = setup_module.configuration(*args) > File "scipy/special/setup.py", line 54, in configuration > extra_info=get_info("npymath") > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 2184, in get_info > pkg_info = get_pkg_info(pkgname, dirs) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/misc_util.py", > line 2136, in get_pkg_info > return read_config(pkgname, dirs) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", > line 390, in read_config > v = _read_config_imp(pkg_to_filename(pkgname), dirs) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", > line 326, in _read_config_imp > meta, vars, sections, reqs = _read_config(filenames) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", > line 309, in _read_config > meta, vars, sections, reqs = parse_config(f, dirs) > File > "/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/distutils/npy_pkg_config.py", > line 281, in parse_config > raise PkgNotFound("Could not find file(s) %s" % str(filenames)) > numpy.distutils.npy_pkg_config.PkgNotFound: Could not find file(s) > ['/Library/Python/2.6/site-packages/numpy-1.6.0-py2.6-macosx-10.6-universal.egg/numpy/core/lib/npy-pkg-config/npymath.ini'] > > I am unable to understand where and what exactly the problem is. > Can anyone please help me? I will be really thankful for a response. That doesn't look like a familiar error message to me. But before trying to get to the root of that a simple question: why don't you just use a binary installer? 
You can download dmg installers for python itself from python.organd for numpy and scipy from the Sourceforge download sites. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From borreguero at gmail.com Tue Mar 20 18:47:49 2012 From: borreguero at gmail.com (Jose Borreguero) Date: Tue, 20 Mar 2012 18:47:49 -0400 Subject: [SciPy-User] how to obtain I, J, V from sparse matrix (V, (I, J)) ? In-Reply-To: References: Message-ID: Thank you. I was working with csr_matrix which lacks row and col attributes. I'll just cast to coo_matrix. -Jose On Tue, Mar 20, 2012 at 5:55 PM, Pauli Virtanen wrote: > 20.03.2012 22:08, Jose Borreguero kirjoitti: > > Scipy docs state one can construct a matrix from three 1D > > arrays,A=sparse.coo_matrix((V,(I,J)),shape=(4,4)) > > > > However, given sparse matrix A, how can I obtain arrays V, I, and J? > > I could not find any methods of the sparse matrix that would return > these arrays... > > Check out the `data`, `row` and `col` attributes: > > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Mar 21 09:51:53 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Mar 2012 09:51:53 -0400 Subject: [SciPy-User] quadratic programming with fmin_slsqp In-Reply-To: <4b58200d-59e9-4cc8-bd7b-6beee0921fd8@w32g2000vbt.googlegroups.com> References: <4b58200d-59e9-4cc8-bd7b-6beee0921fd8@w32g2000vbt.googlegroups.com> Message-ID: On Tue, Mar 20, 2012 at 7:58 AM, denis wrote: > > On Mar 16, 5:45?pm, josef.p... at gmail.com wrote: >> scipy is missing a fmin_quadprog > > Josef, > ?minmize() ?is a reasonable common interface to 10 or so optimizers, > see http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html > however > - minimize.py is not in scipy-0.9.0.tar nor in scipy-0.10.1.tar > ? ?(a test to see if anybody's using it ?) > - only L-BFGS-B TNC COBYLA and SLSQP support bounds. > > One could supply a trivial box / penaltybox as outlined below > (I use this playing around with Neldermead) > but I'm not sure anybody would use it > plus there's openopt pyomo mystic ... > maybe more solvers than real testcases :- > > cheers > ?-- denis > > > class Funcbox: > ? ?""" F = Funcbox( func, [box=(0,1), penalty=None, grid=0, > *funcargs, **kwargs > ? ? ? ?wraps a func() with a constraint box and grid > > ? ?Parameters > ? ?---------- > ? ?func: a function of a numpy vector or array-like > ? ?box: (low, high) to np.clip, default (0,1). > ? ? ? ?These can be vectors; low_j == high_j freezes x_j at that > value. > ? ?penalty: e.g. (0, 1, 1000) adds a quadratic penalty > ? ? ? ?to func() where xclip is outside (0, 1) > ? ? ? ? ? ?1000 * sum( max( 0 - x, 0 )**2 + max( x - 1, 0 )**2 ) > ? ? ? ? ? ?= 1 4 9 16 ... at -.01 -.02 ... and 1.01 1.02 ... > ? ? ? ?The default is None, no penalty. > ? ? ? ?(The penalty box should be smaller than the clip box; > ? ? ? ?x is first gridded if grid > 0, then clipped, then penalty > computed.) > ? ?grid: e.g. .01 snaps all x_j to multiples of .01 -- > ? ? ? ?a simple noise smoother, recommended for noisy functions. > ? ? ? ?The default is 0, no gridding. 
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user What I meant was, getting a high level interface that can be used as in other packages http://stats.stackexchange.com/questions/15741/matlabs-quadprog-equivalent-in-python http://www.mathworks.com/help/toolbox/optim/ug/quadprog.html http://svitsrv25.epfl.ch/R-doc/library/quadprog/html/solve.QP.html http://abel.ee.ucla.edu/cvxopt/userguide/coneprog.html#quadratic-programming scikits.datasmooth is using cvxopt for it fmin_slsqp "sounds" similar Josef From will at thearete.co.uk Wed Mar 21 13:48:38 2012 From: will at thearete.co.uk (William Furnass) Date: Wed, 21 Mar 2012 17:48:38 +0000 Subject: [SciPy-User] Joint distributions Message-ID: I am wanting to fit a parameterised model to a series of 15 datapoints, with each being a concentration C and time t. Within the objective function of the optimisation routine that I'm using for the model fitting I presently calculate fitness using the Bray Curtis distance between the data series and the prediction corresponding to a candidate solution. I would ideally like to calculate fitness in such a way as to account for uncertainty in each (C, t). I think I can achieve this for a given data series by a) modelling each data point using a bivariate Gaussian PDF (with static variances for both C and t) b) calculate a prediction using a small dt c) find the highest probability of all points in the prediction series for each of the 15 bivariate PDFs d) sum or average the probabilities to get a measure of the fit of the real data series to the prediction corresponding to the candidate solution. My question is is there an easy way of finding joint probabilities using scipy.stats? I thought I could construct a bivariate normal distribution using dens = scipy.stats.norm(loc=np.array([t[i], C[i]]), scale=np.array([t_stdev, C_stdev])) but dens.pdf(np.array([5,7])) returns an array when I thought it should return a scalar probability. Apologies if the above is not particularly clear or if I'm missing something obvious here. Regards, Will Furnass From josef.pktd at gmail.com Wed Mar 21 14:07:51 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Mar 2012 14:07:51 -0400 Subject: [SciPy-User] Joint distributions In-Reply-To: References: Message-ID: On Wed, Mar 21, 2012 at 1:48 PM, William Furnass wrote: > I am wanting to fit a parameterised model to a series of 15 > datapoints, with each being a concentration C and time t. ?Within the > objective function of the optimisation routine that I'm using for the > model fitting I presently calculate fitness using the Bray Curtis > distance between the data series and the prediction corresponding to a > candidate solution. > > I would ideally like to calculate fitness in such a way as to account > for uncertainty in each (C, t). ?I think I can achieve this for a > given data series by > ?a) modelling each data point using a bivariate Gaussian PDF (with > static variances for both C and t) > ?b) calculate a prediction using a small dt > ?c) find the highest probability of all points in the prediction > series for each of the 15 bivariate PDFs > ?d) sum or average the probabilities to get a measure of the fit of > the real data series to the prediction corresponding to the candidate > solution. > > My question is is there an easy way of finding joint probabilities > using scipy.stats? 
?I thought I could construct a bivariate normal > distribution using > > dens = scipy.stats.norm(loc=np.array([t[i], C[i]]), > scale=np.array([t_stdev, C_stdev])) > > but > > dens.pdf(np.array([5,7])) > > returns an array when I thought it should return a scalar probability. scipy.stats only has univariate distributions, or to be exact it calculates it for many points independently. So the returned array is the pdf for each point separately calculated. If you want the pdf for the bivariate or multivariate normal distribution then it's just a few lines, ( I think the bivariate normal is also in matplotlib, in statsmodels ?) Your fitting problem sounds a bit like what scipy.odr does. Josef > > Apologies if the above is not particularly clear or if I'm missing > something obvious here. > > Regards, > > Will Furnass > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From hate_pod at yahoo.com Wed Mar 21 22:43:37 2012 From: hate_pod at yahoo.com (Odin Den) Date: Thu, 22 Mar 2012 02:43:37 +0000 (UTC) Subject: [SciPy-User] numpy array root operation Message-ID: Hi, 5th root of -32 can be computed correctly as follows: >>> -32**(1/5) >>> -2.0 However, when I try to perform same operation on numpy arrays I get the following: >>> array([-32])**(1/5) >>> array([ nan]) Is there anyway to compute roots of numpy arrays? I have a huge matrix which contains both negative and positive values. What is the easiest way of making python compute the "nth" roots of each element of this matrix without knowing the value of "n" a priory? From guziy.sasha at gmail.com Wed Mar 21 23:22:46 2012 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Wed, 21 Mar 2012 23:22:46 -0400 Subject: [SciPy-User] numpy array root operation In-Reply-To: References: Message-ID: Maybe like this, >>> import numpy as np >>> x = np.array([-81,25]) >>> np.sign(x) * np.absolute(x) ** (1.0/5.0) array([-2.40822469, 1.90365394]) >>> np.sign(x) * np.absolute(x) ** (1.0/2.0) array([-9., 5.]) try this way and you'll be also in trouble (-32)**(1.0/5.0) Cheers -- Oleksandr Huziy 2012/3/21 Odin Den : > Hi, > 5th root of -32 can be computed correctly as follows: >>>> -32**(1/5) >>>> -2.0 > > However, when I try to perform same operation on numpy arrays I get > the following: >>>> array([-32])**(1/5) >>>> array([ nan]) > > Is there anyway to compute roots of numpy arrays? I have a huge matrix which > contains both negative and positive values. What is the easiest way of making > python compute the "nth" roots of each element of this matrix without knowing > the value of "n" a priory? > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From wardefar at iro.umontreal.ca Wed Mar 21 23:26:21 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 21 Mar 2012 23:26:21 -0400 Subject: [SciPy-User] numpy array root operation In-Reply-To: References: Message-ID: <5E811AA7-0D07-4103-9369-6059C1158FF7@iro.umontreal.ca> On 2012-03-21, at 10:43 PM, Odin Den wrote: > Is there anyway to compute roots of numpy arrays? I have a huge matrix which > contains both negative and positive values. What is the easiest way of making > python compute the "nth" roots of each element of this matrix without knowing > the value of "n" a priory? I suspect not, not without writing your own function that handles negative numbers correctly in all cases. 
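A minimal sketch of what such a function could look like, assuming the exponent is 1/n for an odd integer n (so that a real root actually exists for negative inputs):

import numpy as np

def real_nth_root(a, n):
    # Elementwise real n-th root; only meaningful for odd integer n,
    # where every real number (including negatives) has one real root.
    a = np.asarray(a, dtype=float)
    return np.sign(a) * np.abs(a) ** (1.0 / n)

print(real_nth_root([-32, 27], 5))   # -> approximately [-2.      1.9332]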
Note that the C standard pow() doesn't support non-integer powers on negative numbers, nor does plain Python itself. You might have luck at the symbolic/multiprecision math packages like SymPy or mpmath, which might implement the correct algorithm, but if you want to operate on arrays you probably need to write a custom function borrowing one of their algorithms and making it operate on array data. From wardefar at iro.umontreal.ca Wed Mar 21 23:31:13 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 21 Mar 2012 23:31:13 -0400 Subject: [SciPy-User] numpy array root operation In-Reply-To: References: Message-ID: <46118E9F-1C06-4AB9-9D0D-317744B30C36@iro.umontreal.ca> On 2012-03-21, at 10:43 PM, Odin Den wrote: >>>> -32**(1/5) >>>> -2.0 I just noticed that this was probably a REPL prompt (my mail client shows it as quotation indentation, which failed to register in my mind). You have your order of operations wrong. Exponentiation is higher priority than multiplication (i.e. the unary -) so what you are getting is -1 * (32 ** (1/5)). >>> (-32)**(1/5.) Traceback (most recent call last): File "", line 1, in ValueError: negative number cannot be raised to a fractional power From wardefar at iro.umontreal.ca Wed Mar 21 23:34:48 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 21 Mar 2012 23:34:48 -0400 Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images In-Reply-To: References: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21> <28284362.88.1332149037436.JavaMail.geo-discussion-forums@yneo2> Message-ID: On 2012-03-20, at 11:33 AM, Nathaniel Smith wrote: > (Are there any generalized > ufuncs actually defined in the standard packages? For instance, is > np.dot implemented as a generalized ufunc? Should it be?) Ideally, so long as it still made use of BLAS for the actual matrix products. I tried my hand at implementing a gufunc for log(sum(exp(...))), with the sum being the "generalized" part. Did not have much luck... David From jaakko.luttinen at aalto.fi Thu Mar 22 05:34:52 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 22 Mar 2012 11:34:52 +0200 Subject: [SciPy-User] Sparse matrix multiply Message-ID: <4F6AF23C.9080104@aalto.fi> Hi! Why do I get two different results for the code below? import numpy as np import scipy.sparse as sp A = sp.rand(20,20,density=0.1) B = sp.rand(20,20,density=0.1) np.multiply(A,B).sum() # out: 21.058793740984925 A.multiply(B).sum() # out: 0.76482546226069481 Am I missing something? I think numpy.multiply should either return the correct answer or an error that it can't compute the correct answer. Regards, Jaakko From cmutel at gmail.com Thu Mar 22 06:03:29 2012 From: cmutel at gmail.com (Christopher Mutel) Date: Thu, 22 Mar 2012 11:03:29 +0100 Subject: [SciPy-User] Sparse matrix multiply In-Reply-To: <4F6AF23C.9080104@aalto.fi> References: <4F6AF23C.9080104@aalto.fi> Message-ID: On Thu, Mar 22, 2012 at 10:34 AM, Jaakko Luttinen wrote: > Hi! > > Why do I get two different results for the code below? > > import numpy as np > import scipy.sparse as sp > A = sp.rand(20,20,density=0.1) > B = sp.rand(20,20,density=0.1) > np.multiply(A,B).sum() > # out: 21.058793740984925 > A.multiply(B).sum() > # out: 0.76482546226069481 > > Am I missing something? > I think numpy.multiply should either return the correct answer or an > error that it can't compute the correct answer. 
np.multiply performs element-wise multiplication, while A.multiply is matrix multiplication. They are both "correct", but answer different questions. See: http://en.wikipedia.org/wiki/Matrix_multiplication http://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html > Regards, > Jaakko > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- ############################ Chris Mutel ?kologisches Systemdesign - Ecological Systems Design Institut f.Umweltingenieurwissenschaften - Institute for Environmental Engineering ETH Z?rich - HIF C 44 - Schafmattstr. 6 8093 Z?rich Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61 ############################ From jaakko.luttinen at aalto.fi Thu Mar 22 06:07:23 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 22 Mar 2012 12:07:23 +0200 Subject: [SciPy-User] Sparse matrix multiply In-Reply-To: References: <4F6AF23C.9080104@aalto.fi> Message-ID: <4F6AF9DB.4050201@aalto.fi> On 03/22/2012 12:03 PM, Christopher Mutel wrote: > On Thu, Mar 22, 2012 at 10:34 AM, Jaakko Luttinen > wrote: >> Hi! >> >> Why do I get two different results for the code below? >> >> import numpy as np >> import scipy.sparse as sp >> A = sp.rand(20,20,density=0.1) >> B = sp.rand(20,20,density=0.1) >> np.multiply(A,B).sum() >> # out: 21.058793740984925 >> A.multiply(B).sum() >> # out: 0.76482546226069481 >> >> Am I missing something? >> I think numpy.multiply should either return the correct answer or an >> error that it can't compute the correct answer. > > np.multiply performs element-wise multiplication, while A.multiply is > matrix multiplication. They are both "correct", but answer different > questions. Is it so..? http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.multiply.html I don't know what "point-wise multiplication" means.. Anyway, I thought that dot computes matrix multiplication and multiply computes matrix multiplication. -Jaakko From jaakko.luttinen at aalto.fi Thu Mar 22 06:08:47 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 22 Mar 2012 12:08:47 +0200 Subject: [SciPy-User] Sparse matrix multiply In-Reply-To: <4F6AF9DB.4050201@aalto.fi> References: <4F6AF23C.9080104@aalto.fi> <4F6AF9DB.4050201@aalto.fi> Message-ID: <4F6AFA2F.1030701@aalto.fi> On 03/22/2012 12:07 PM, Jaakko Luttinen wrote: > On 03/22/2012 12:03 PM, Christopher Mutel wrote: >> On Thu, Mar 22, 2012 at 10:34 AM, Jaakko Luttinen >> wrote: >>> Hi! >>> >>> Why do I get two different results for the code below? >>> >>> import numpy as np >>> import scipy.sparse as sp >>> A = sp.rand(20,20,density=0.1) >>> B = sp.rand(20,20,density=0.1) >>> np.multiply(A,B).sum() >>> # out: 21.058793740984925 >>> A.multiply(B).sum() >>> # out: 0.76482546226069481 >>> >>> Am I missing something? >>> I think numpy.multiply should either return the correct answer or an >>> error that it can't compute the correct answer. >> >> np.multiply performs element-wise multiplication, while A.multiply is >> matrix multiplication. They are both "correct", but answer different >> questions. > > Is it so..? > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.multiply.html > > I don't know what "point-wise multiplication" means.. > > Anyway, I thought that dot computes matrix multiplication and multiply > computes matrix multiplication. TYPOFIX: I thought that multiply computes element-wise multiplication. 
-Jaakko From pav at iki.fi Thu Mar 22 06:18:40 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 22 Mar 2012 11:18:40 +0100 Subject: [SciPy-User] Sparse matrix multiply In-Reply-To: <4F6AF23C.9080104@aalto.fi> References: <4F6AF23C.9080104@aalto.fi> Message-ID: 22.03.2012 10:34, Jaakko Luttinen kirjoitti: > import numpy as np > import scipy.sparse as sp > A = sp.rand(20,20,density=0.1) > B = sp.rand(20,20,density=0.1) > np.multiply(A,B).sum() > # out: 0.76482546226069481 > Am I missing something? > I think numpy.multiply should either return the correct answer or an > error that it can't compute the correct answer. The answer is the same as to your previous questions --- the Numpy ufuncs do not deal with sparse matrices in a reasonable way. This lack of integration between dense and sparse is a bug. Why it does not raise an error instead, is probably that as a consequence of the operation overloading rules defined, there is a (nonsensical) operation that matches the call. -- Pauli Virtanen From jaakko.luttinen at aalto.fi Thu Mar 22 06:18:56 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 22 Mar 2012 12:18:56 +0200 Subject: [SciPy-User] Sparse matrix multiply In-Reply-To: References: <4F6AF23C.9080104@aalto.fi> Message-ID: <4F6AFC90.9040102@aalto.fi> On 03/22/2012 12:03 PM, Christopher Mutel wrote: > On Thu, Mar 22, 2012 at 10:34 AM, Jaakko Luttinen > wrote: >> Hi! >> >> Why do I get two different results for the code below? >> >> import numpy as np >> import scipy.sparse as sp >> A = sp.rand(20,20,density=0.1) >> B = sp.rand(20,20,density=0.1) >> np.multiply(A,B).sum() >> # out: 21.058793740984925 >> A.multiply(B).sum() >> # out: 0.76482546226069481 >> >> Am I missing something? >> I think numpy.multiply should either return the correct answer or an >> error that it can't compute the correct answer. > > np.multiply performs element-wise multiplication, while A.multiply is > matrix multiplication. They are both "correct", but answer different > questions. Actually it seems that np.multiply computes matrix multiplication in this case! Below, only A.multiply(B) computes element-wise multiplication. import numpy as np import scipy.sparse as sp A = sp.rand(20,20,density=0.1) B = sp.rand(20,20,density=0.1) np.multiply(A,B).sum() # out: 25.240683127057885 A.multiply(B).sum() # out: 2.6382118196920503 A.dot(B).sum() # out: 25.240683127057885 np.dot(A,B).sum() # out: 25.240683127057885 -Jaakko From cmutel at gmail.com Thu Mar 22 06:24:04 2012 From: cmutel at gmail.com (Christopher Mutel) Date: Thu, 22 Mar 2012 11:24:04 +0100 Subject: [SciPy-User] Sparse matrix multiply In-Reply-To: <4F6AFC90.9040102@aalto.fi> References: <4F6AF23C.9080104@aalto.fi> <4F6AFC90.9040102@aalto.fi> Message-ID: On Thu, Mar 22, 2012 at 11:18 AM, Jaakko Luttinen wrote: > On 03/22/2012 12:03 PM, Christopher Mutel wrote: >> On Thu, Mar 22, 2012 at 10:34 AM, Jaakko Luttinen >> wrote: >>> Hi! >>> >>> Why do I get two different results for the code below? >>> >>> import numpy as np >>> import scipy.sparse as sp >>> A = sp.rand(20,20,density=0.1) >>> B = sp.rand(20,20,density=0.1) >>> np.multiply(A,B).sum() >>> # out: 21.058793740984925 >>> A.multiply(B).sum() >>> # out: 0.76482546226069481 >>> >>> Am I missing something? >>> I think numpy.multiply should either return the correct answer or an >>> error that it can't compute the correct answer. >> >> np.multiply performs element-wise multiplication, while A.multiply is >> matrix multiplication. They are both "correct", but answer different >> questions. 
> > Actually it seems that np.multiply computes matrix multiplication in > this case! Below, only A.multiply(B) computes element-wise multiplication. > > import numpy as np > import scipy.sparse as sp > A = sp.rand(20,20,density=0.1) > B = sp.rand(20,20,density=0.1) > np.multiply(A,B).sum() > # out: 25.240683127057885 > A.multiply(B).sum() > # out: 2.6382118196920503 > A.dot(B).sum() > # out: 25.240683127057885 > np.dot(A,B).sum() > # out: 25.240683127057885 Indeed. Sorry for the confusion. From jaakko.luttinen at aalto.fi Thu Mar 22 09:42:51 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 22 Mar 2012 15:42:51 +0200 Subject: [SciPy-User] Bug in scipy.io.mmread? Message-ID: <4F6B2C5B.1050408@aalto.fi> Hi! I am trying to read the following matrix market file: ftp://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/lsq/illc1033.mtx.gz However, it doesn't work with Python 3 and SciPy 0.11.0: ===================================================== Python 3.2.2 (default, Oct 27 2011, 13:08:00) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.version.version '0.11.0.dev-Unknown' >>> from scipy.io import mmread >>> mmread('illc1033.mtx.gz') Traceback (most recent call last): File "", line 1, in File "/home/jluttine/.local/lib/python3.2/site-packages/scipy/io/mmio.py", line 68, in mmread return MMFile().read(source) File "/home/jluttine/.local/lib/python3.2/site-packages/scipy/io/mmio.py", line 302, in read return self._parse_body(stream) File "/home/jluttine/.local/lib/python3.2/site-packages/scipy/io/mmio.py", line 447, in _parse_body flat_data = flat_data.reshape(-1,3) ValueError: total size of new array must be unchanged ===================================================== It does work with Python 2 and SciPy 0.7.2: ===================================================== Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.version.version '0.7.2' >>> from scipy.io import mmread >>> mmread('illc1033.mtx.gz') <1033x320 sparse matrix of type '' with 4732 stored elements in COOrdinate format> ===================================================== Is there a bug in recent scipy.io.mmread or what could be the problem? Best, Jaakko From nwagner at iam.uni-stuttgart.de Thu Mar 22 13:15:43 2012 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 22 Mar 2012 18:15:43 +0100 Subject: [SciPy-User] Bug in scipy.io.mmread? In-Reply-To: <4F6B2C5B.1050408@aalto.fi> References: <4F6B2C5B.1050408@aalto.fi> Message-ID: On Thu, 22 Mar 2012 15:42:51 +0200 Jaakko Luttinen wrote: > Hi! > > I am trying to read the following matrix market file: > ftp://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/lsq/illc1033.mtx.gz > > However, it doesn't work with Python 3 and SciPy 0.11.0: > > ===================================================== > Python 3.2.2 (default, Oct 27 2011, 13:08:00) > [GCC 4.4.5] on linux2 > Type "help", "copyright", "credits" or "license" for >more information. 
>>>> import scipy
>>>> scipy.version.version
> '0.11.0.dev-Unknown'
>>>> from scipy.io import mmread
>>>> mmread('illc1033.mtx.gz')
> Traceback (most recent call last):
> File "", line 1, in
> File
> "/home/jluttine/.local/lib/python3.2/site-packages/scipy/io/mmio.py",
> line 68, in mmread
> return MMFile().read(source)
> File
> "/home/jluttine/.local/lib/python3.2/site-packages/scipy/io/mmio.py",
> line 302, in read
> return self._parse_body(stream)
> File
> "/home/jluttine/.local/lib/python3.2/site-packages/scipy/io/mmio.py",
> line 447, in _parse_body
> flat_data = flat_data.reshape(-1,3)
> ValueError: total size of new array must be unchanged
> =====================================================
>
> It does work with Python 2 and SciPy 0.7.2:
>
> =====================================================
> Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for >more information.
>>>> import scipy
>>>> scipy.version.version
> '0.7.2'
>>>> from scipy.io import mmread
>>>> mmread('illc1033.mtx.gz')
> <1033x320 sparse matrix of type ''
> with 4732 stored elements in COOrdinate format>
> =====================================================
>
> Is there a bug in recent scipy.io.mmread or what could >be the problem?
>
> Best,
> Jaakko
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

Hi Jaakko,

works fine for me with python 2.7.2 on opensuse12.1

>>> sp.__version__
'0.11.0.dev-e7d3e33'
>>> np.__version__
'1.7.0.dev-3503c5f'

Cheers,
Nils

From jaakko.luttinen at aalto.fi Thu Mar 22 13:46:02 2012
From: jaakko.luttinen at aalto.fi (Jaakko Luttinen)
Date: Thu, 22 Mar 2012 19:46:02 +0200
Subject: [SciPy-User] Bug in scipy.io.mmread?
In-Reply-To: 
References: <4F6B2C5B.1050408@aalto.fi>
Message-ID: <4F6B655A.6080000@aalto.fi>

>> I am trying to read the following matrix market file:
>> ftp://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/lsq/illc1033.mtx.gz
>>
>> However, it doesn't work with Python 3 and SciPy 0.11.0:
>
> works fine for me with python 2.7.2 on opensuse12.1
>
>>>> sp.__version__
> '0.11.0.dev-e7d3e33'
>>>> np.__version__
> '1.7.0.dev-3503c5f'

Hi! I also got it working with Python 2.6.6:
>>> scipy.version.version
'0.11.0.dev-0fbfdbc'

So, it seems like it is related to Python 3.2? I tried to diff mmio.py but didn't notice any relevant differences between the Python versions.. Here is the diff: http://pastebin.com/e2xm3CVx
-Jaakko

From friedrichromstedt at gmail.com Thu Mar 22 15:24:49 2012
From: friedrichromstedt at gmail.com (Friedrich Romstedt)
Date: Thu, 22 Mar 2012 20:24:49 +0100
Subject: [SciPy-User] numpy array root operation
In-Reply-To: 
References: 
Message-ID: <644F8AAD-69EF-43C2-AD64-514D568C103F@gmail.com>

On 22.03.2012, at 03:43, Odin Den wrote:
> Hi,
> 5th root of -32 can be computed correctly as follows:
>>>> -32**(1/5)
>>>> -2.0

Warning, mathematician (physicist to be precise) speaking. In addition to the operator precedence issue pointed out by David, notice that powers to non-integer numbers are cumbersome to define. In fact, let q be a rational number q = n/d, and v a complex number v = |v| exp(i phi), e.g. -42 with |v| = 42 and phi = pi. Then the root v^(1/d) is d-fold, and can be defined as the set of all (complex) numbers whose d-th power is v, namely the set {r exp(2 pi i f/d + i phi/d), f = 0...(d - 1)}, with a real positive number r such that r^d = |v|.
The d-th power of each of these numbers is v. v^q is then just (v^(1/d))^n. For real v, phi is either 0 or pi, so for positive v, phi/d in the set equation will be zero, so there's always a positive root, which we call just "root" in daily language and in Python. For negative v, phi = pi, and (2 pi)/d is just twice that large, so there's no longer a positive root. But for odd d, the term 2 pi f/d + phi/d will be, for f = (d - 1)/2, just pi, so there's then a negative root amongst all these roots. For even d, there's neither a positive nor a negative root of v, but only d complex ones. Notice that taking the numbers in the set to the n-th power multiplies their angle by n. You can calculate the first root (f = 0) in Python and numpy by taking a complex number to the power, e.g. in Python (-42 + 0j) ** (0.2). But this will never be real; it is always complex for negative v and a non-integer exponent. Notice that for numpy, the convention for phi is -pi < phi <= pi. Negative numbers have phi = pi. The "first" root can be defined for any kind of exponent, but for an irrational exponent there will be an infinite set of roots, AISI. But don't bet on that. It cannot be finite, because then the irrational number would have been rational. :-) As I warned you, mathematically inclined people can speak for an hour about the most obvious thing to CS people, but normally I found it worth doing once and then just keeping in mind that there is no such thing as natural intuition. cu Friedrich
> However, when I try to perform same operation on numpy arrays I get > the following: >>>> array([-32])**(1/5) >>>> array([ nan])
So ya, it's Not A Number, but A Set Of Numbers. :-) Maybe it could spit out the complex first root instead, as it will do for numpy.asarray(-42 + 0j) ** 0.2. But I'm not involved enough to be knowledgeable about the design here.
> Is there anyway to compute roots of numpy arrays? I have a huge matrix which > contains both negative and positive values. What is the easiest way of making > python compute the "nth" roots of each element of this matrix without knowing > the value of "n" a priory?
The other proposed approach, using just the modulus and keeping the sign, is, as I pointed out above, not mathematically valid in all cases. I would guess you screwed up your model if you ran across taking fractional powers of negative numbers.
From pav at iki.fi Thu Mar 22 15:50:04 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 22 Mar 2012 20:50:04 +0100 Subject: [SciPy-User] Bug in scipy.io.mmread? In-Reply-To: <4F6B2C5B.1050408@aalto.fi> References: <4F6B2C5B.1050408@aalto.fi> Message-ID: 22.03.2012 14:42, Jaakko Luttinen wrote: [clip] > Python 3.2.2 (default, Oct 27 2011, 13:08:00) > [GCC 4.4.5] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import scipy >>>> scipy.version.version > '0.11.0.dev-Unknown' >>>> from scipy.io import mmread >>>> mmread('illc1033.mtx.gz')
Note that this succeeds if you gunzip the file first. This is a bug in the gzip module in Python 3.x. The PyObject_AsFileDescriptor call on a GzipFile object succeeds on Python 3, although it should fail (there is no OS-level file handle giving the uncompressed stream). As a consequence, mmio ends up reading the compressed data stream, which of course does not work. It's possible to work around this in mmio.
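In the meantime, a simple user-side workaround is to decompress the file yourself before calling mmread. A rough sketch (untested; file names are only for illustration):

import gzip
import shutil
from scipy.io import mmread

# Write out a plain .mtx file so mmread never touches the GzipFile object.
with gzip.open('illc1033.mtx.gz', 'rb') as fin:
    with open('illc1033.mtx', 'wb') as fout:
        shutil.copyfileobj(fin, fout)

A = mmread('illc1033.mtx')  # the 1033x320 COO matrix, as in the Python 2 session above
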
-- Pauli Virtanen From njs at pobox.com Thu Mar 22 16:26:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Mar 2012 20:26:35 +0000 Subject: [SciPy-User] numpy array root operation In-Reply-To: References: Message-ID: On Thu, Mar 22, 2012 at 2:43 AM, Odin Den wrote: > Hi, > 5th root of -32 can be computed correctly as follows: >>>> -32**(1/5) >>>> -2.0 > > However, when I try to perform same operation on numpy arrays I get > the following: >>>> array([-32])**(1/5) >>>> array([ nan]) > > Is there anyway to compute roots of numpy arrays? I have a huge matrix which > contains both negative and positive values. What is the easiest way of making > python compute the "nth" roots of each element of this matrix without knowing > the value of "n" a priory? As long as you *know* that what you're computing is an odd root, and that what you want for negative numbers is the real root, then you could just work around this: roots = np.sign(a) * (a * np.sign(a))**(1./5) -- Nathaniel From fperez.net at gmail.com Thu Mar 22 17:11:14 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 22 Mar 2012 14:11:14 -0700 Subject: [SciPy-User] [ANN] PyData workshop videos are up online, including panel with Guido Message-ID: Hi all, just to let you know that the videos from the PyData workshop we held at Google a couple of weeks ago are now online (not all talks are up yet, so watch the page over the next few days if a talk you wanted to see isn't posted yet): http://marakana.com/s/2012_pydata_workshop,1090/index.html The panel discussion with Guido that we talked about on these lists is in there; I hope to write up a short summary about it soon. Many thanks to Simeon Franklin and the rest of the Marakana team for doing all this work (for free)! Cheers, f From ryanlists at gmail.com Thu Mar 22 17:33:34 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu, 22 Mar 2012 16:33:34 -0500 Subject: [SciPy-User] problem with loading data from data_store Message-ID: I have some data sets stored using scipy.io.save_as_module. I recently upgrade to 0.10 and I can no longer open this module. Further, I tried to reprocess my data and resave it and I am still getting the same error message. Just a couple of lines are needed to recreate my problem: mydict = {'a':12.34} scipy.io.save_as_module('mymod',mydict) import mymod The response to the last command (import) is In [18]: import mymod --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/TMM_SFLR_model1.py in () ----> 1 2 3 4 5 /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/mymod.py in () 1 import scipy.io.data_store as data_store 2 import mymod ----> 3 data_store._load(mymod) 4 5 AttributeError: 'module' object has no attribute '_load' Can anyone help me with this? Thanks, Ryan From cpsmusic at yahoo.com Fri Mar 23 07:28:23 2012 From: cpsmusic at yahoo.com (Chris Share) Date: Fri, 23 Mar 2012 04:28:23 -0700 (PDT) Subject: [SciPy-User] OSX, Python 3.2.2 and NumPy/SciPy Message-ID: <1332502103.17367.YahooMailNeo@web161502.mail.bf1.yahoo.com> Hi, I'm new to Python however I have a reasonable amount of programming experience (C/C++). I'm currently working on OSX (10.6.8) and I've installed Python 3.2.2. OSX also comes with Python 2.6.6. I'm interested in scientific computing so I'd like to install Numpy/SciPy. I've managed to do this for the 2.6.6 version of Python however I'm unclear as to how I do this for the 3.2.2 version. 
According to the 3.2.2 installer ReadMe: The installer puts applications, an "Update Shell Profile" command, >and a link to the optionally installed Python Documentation into the >"Python 3.2" subfolder of the system Applications folder, >and puts the underlying machinery into the folder >/Library/Frameworks/Python.framework. It can >optionally place links to the command-line tools in /usr/local/bin as >well. Double-click on the "Update Shell Profile" command to add the >"bin" directory inside the framework to your shell's search path. How do install NumPy/SciPy so that the 3.2.2 version of IDLE can access them? Is this possible? Cheers, Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Fri Mar 23 14:55:15 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 23 Mar 2012 18:55:15 +0000 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <20120310234442.GN24301@ninja.nosyntax.net> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5BD861.8050205@simplistix.co.uk> <20120310234442.GN24301@ninja.nosyntax.net> Message-ID: <4F6CC713.50704@simplistix.co.uk> On 10/03/2012 23:44, rex wrote: > Perhaps the NumPy+SciPy+Matplotlib community could learn something by > looking at how the R community works? To this mere user who wants to > get a job done, it's a night and day difference. I still use Python > for GP programming, but there's a snowball's chance I'd ever use > anything but R for my main interest, which is econometrics. You really should try EPD. Sorry you had a bad experience. Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri Mar 23 15:00:42 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 23 Mar 2012 19:00:42 +0000 Subject: [SciPy-User] Layering a virtualenv over EPD In-Reply-To: <4F5C6892.3050606@hilboll.de> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5C6892.3050606@hilboll.de> Message-ID: <4F6CC85A.5080208@simplistix.co.uk> On 11/03/2012 08:55, Andreas H. wrote: > I just uploaded a quick log of what I did to accomplish exactly this to > > https://gist.github.com/2015652 > > I do have the problem that within the virtualenv, something with the > console's not working right, as iPythons help doesn't work properly, and > I cannot launch applications which open windows (except for ``ipython > pylab=wx``) ... That sounds less than ideal ;-) I suspect you've ended up doing what I'm intent on avoiding: re-installing ipython just to get the launch script in the bin directory of the virtualenv. Now, you can just manually craft a script in there by copying the system-wide one and hacking the pling line, but you shouldn't have to. I've opened a bug on virtualenv for this: https://github.com/pypa/pip/issues/480 However, now that I'm CC'ing the ipython list, does ipython only provide distutils shell scripts? It would be great if it could also provide a setuptools-compatible console_scripts entry point? 
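Something along these lines in setup.py is what I mean; the callable path is only my guess at where IPython's launcher lives, so treat it as a sketch rather than a patch:

from setuptools import setup

setup(
    name='ipython',
    # ... the rest of the existing metadata ...
    entry_points={
        'console_scripts': [
            # hypothetical target; the real module path may differ
            'ipython = IPython.frontend.terminal.ipapp:launch_new_instance',
        ],
    },
)

With that, pip/easy_install would generate the console script in the virtualenv's bin directory automatically, instead of relying on a hand-copied distutils script.
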
cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From takowl at gmail.com Sat Mar 24 06:59:04 2012 From: takowl at gmail.com (Thomas Kluyver) Date: Sat, 24 Mar 2012 10:59:04 +0000 Subject: [SciPy-User] [IPython-User] Layering a virtualenv over EPD In-Reply-To: <4F6CC85A.5080208@simplistix.co.uk> References: <4F5BD12D.9090504@simplistix.co.uk> <4F5C6892.3050606@hilboll.de> <4F6CC85A.5080208@simplistix.co.uk> Message-ID: On 23 March 2012 19:00, Chris Withers wrote: > I suspect you've ended up doing what I'm intent on avoiding: > re-installing ipython just to get the launch script in the bin directory > of the virtualenv. > > Now, you can just manually craft a script in there by copying the > system-wide one and hacking the pling line, but you shouldn't have to. Just to mention: the development version of IPython will detect if there's a virtualenv active when it starts and try to put its directories on sys.path. It's not flawless - it will always behave as though the virtualenv was created with --system-site-packages, but it's convenient for simple cases. Of course, that doesn't interfere if IPython is installed inside the virtualenv. Thomas From ralf.gommers at googlemail.com Sat Mar 24 18:24:04 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 24 Mar 2012 23:24:04 +0100 Subject: [SciPy-User] OSX, Python 3.2.2 and NumPy/SciPy In-Reply-To: <1332502103.17367.YahooMailNeo@web161502.mail.bf1.yahoo.com> References: <1332502103.17367.YahooMailNeo@web161502.mail.bf1.yahoo.com> Message-ID: On Fri, Mar 23, 2012 at 12:28 PM, Chris Share wrote: > Hi, > > I'm new to Python however I have a reasonable amount of programming > experience (C/C++). > > I'm currently working on OSX (10.6.8) and I've installed Python 3.2.2. OSX > also comes with Python 2.6.6. > > I'm interested in scientific computing so I'd like to install Numpy/SciPy. > I've managed to do this for the 2.6.6 version of Python however I'm unclear > as to how I do this for the 3.2.2 version. > > According to the 3.2.2 installer ReadMe: > > The installer puts applications, an "Update Shell Profile" command, > and a link to the optionally installed Python Documentation into the > "Python 3.2" subfolder of the system Applications folder, > and puts the underlying machinery into the folder > /Library/Frameworks/Python.framework. It can > optionally place links to the command-line tools in /usr/local/bin as > well. Double-click on the "Update Shell Profile" command to add the > "bin" directory inside the framework to your shell's search path. > > > > How do install NumPy/SciPy so that the 3.2.2 version of IDLE can access > them? > There's no binary installer for Python 3.x on OS X yet, so you have to compile numpy/scipy. Assuming you have XCode installed and the correct gfortran compiler (linked at http://scipy.org/Installing_SciPy/Mac_OS_X), you simply type "python setup.py install" in the base dir of the numpy/scipy repos. It will first convert the source with 2to3 to python 3.x format, then compile and install it. You should then be able to import it in IDLE. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sun Mar 25 16:55:22 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Mar 2012 22:55:22 +0200 Subject: [SciPy-User] problem with loading data from data_store In-Reply-To: References: Message-ID: On Thu, Mar 22, 2012 at 10:33 PM, Ryan Krauss wrote: > I have some data sets stored using scipy.io.save_as_module. I > recently upgrade to 0.10 and I can no longer open this module. > Further, I tried to reprocess my data and resave it and I am still > getting the same error message. Just a couple of lines are needed to > recreate my problem: > > mydict = {'a':12.34} > scipy.io.save_as_module('mymod',mydict) > import mymod > > The response to the last command (import) is > > In [18]: import mymod > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > > /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/TMM_SFLR_model1.py > in () > ----> 1 > 2 > 3 > 4 > 5 > > /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/mymod.py in () > 1 import scipy.io.data_store as data_store > 2 import mymod > ----> 3 data_store._load(mymod) > 4 > 5 > > AttributeError: 'module' object has no attribute '_load' > > > Can anyone help me with this? > You can add this in scipy/io/data_store.py: def _load(module): """ Load data into module from a shelf with the same name as the module. """ dir,filename = os.path.split(module.__file__) filebase = filename.split('.')[0] fn = os.path.join(dir, filebase) f = dumb_shelve.open(fn, "r") #exec( 'import ' + module.__name__) for i in f.keys(): exec( 'import ' + module.__name__+ ';' + module.__name__+'.'+i + '=' + 'f["' + i + '"]') This was caused by an incorrect removal of deprecated code in https://github.com/scipy/scipy/commit/329a5e2713. So apparently save_as_module() has been completely broken for 2 years without anyone noticing..... Proposal: fix save_as_module now so it can load data again, deprecate it for 0.11 and remove it for 0.12. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdekauwe at gmail.com Sun Mar 25 18:39:47 2012 From: mdekauwe at gmail.com (mdekauwe) Date: Sun, 25 Mar 2012 15:39:47 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] Problem with Installation of Scipy on Macbook In-Reply-To: References: Message-ID: <33544711.post@talk.nabble.com> Hi, I would recommend install via macports, I have had no issue that way -- View this message in context: http://old.nabble.com/Problem-with-Installation-of-Scipy-on-Macbook-tp33537462p33544711.html Sent from the Scipy-User mailing list archive at Nabble.com. From pengkui.luo at gmail.com Sun Mar 25 18:47:40 2012 From: pengkui.luo at gmail.com (Pengkui Luo) Date: Sun, 25 Mar 2012 17:47:40 -0500 Subject: [SciPy-User] How to get the index iterator of a scipy sparse matrix? Message-ID: e.g. suppose A is a scipy lil sparse matrix, and the result of print(A) is: (0, 1) 1.0 (0, 2) -1.0 (1, 0) 1.0 (1, 2) -1.0 (2, 1) 2.0 How can I get an iterator (or at least a list) of these (i, j) index pairs? Thanks! -- Pengkui -------------- next part -------------- An HTML attachment was scrubbed... URL: From JRadinger at gmx.at Mon Mar 26 06:11:24 2012 From: JRadinger at gmx.at (Johannes Radinger) Date: Mon, 26 Mar 2012 12:11:24 +0200 Subject: [SciPy-User] ier-integer in optimize.leastsq Message-ID: <20120326101124.70420@gmx.net> Hi, Some months ago I started already this topic,... Now, while writing my manuskript this topic comes up again. 
I am using the optimize.leastsq function and would like to describe my results especielly the "ier-level". If ier in my result is 1, then the solution which was found ensures a ftol-"quality", resp. the sum of squares for the relative errors are below the ftol value? As I didn't set any ftol value, the default value is used. Therefore, in the case of optimization results with ier=1, is it possible to state: "For all fitted solutions the relative errors in the sum of squares are below the desired standard value of 1.49012e-08" ??? Where does this standard value come from? Best regards, Johannes -- NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone! Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a From Philip_Bransford at vrtx.com Mon Mar 26 09:54:03 2012 From: Philip_Bransford at vrtx.com (Philip_Bransford at vrtx.com) Date: Mon, 26 Mar 2012 09:54:03 -0400 Subject: [SciPy-User] ier-integer in optimize.leastsq In-Reply-To: <20120326101124.70420@gmx.net> References: <20120326101124.70420@gmx.net> Message-ID: 1.49012e-08 = numpy.sqrt(numpy.finfo(float).eps) From: "Johannes Radinger" To: scipy-user at scipy.org Date: 03/26/2012 06:11 AM Subject: [SciPy-User] ier-integer in optimize.leastsq Sent by: scipy-user-bounces at scipy.org Hi, Some months ago I started already this topic,... Now, while writing my manuskript this topic comes up again. I am using the optimize.leastsq function and would like to describe my results especielly the "ier-level". If ier in my result is 1, then the solution which was found ensures a ftol-"quality", resp. the sum of squares for the relative errors are below the ftol value? As I didn't set any ftol value, the default value is used. Therefore, in the case of optimization results with ier=1, is it possible to state: "For all fitted solutions the relative errors in the sum of squares are below the desired standard value of 1.49012e-08" ??? Where does this standard value come from? Best regards, Johannes -- NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone! Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Mon Mar 26 11:59:52 2012 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Mon, 26 Mar 2012 08:59:52 -0700 Subject: [SciPy-User] OSX, Python 3.2.2 and NumPy/SciPy In-Reply-To: References: <1332502103.17367.YahooMailNeo@web161502.mail.bf1.yahoo.com> Message-ID: On Sat, Mar 24, 2012 at 3:24 PM, Ralf Gommers wrote: > > > On Fri, Mar 23, 2012 at 12:28 PM, Chris Share wrote: >> >> Hi, >> >> I'm new to Python however I have a reasonable amount of programming >> experience (C/C++). >> >> I'm currently working on OSX (10.6.8) and I've installed Python 3.2.2. OSX >> also comes with Python 2.6.6. >> >> I'm interested in scientific computing so I'd like to install Numpy/SciPy. >> I've managed to do this for the 2.6.6 version of Python however I'm unclear >> as to how I do this for the 3.2.2 version. If you aren't interested in compiling Numpy/Scipy yourself, you might also consider installing Python 2.7 and using the binary installers for Numpy/Scipy. Support for Python 3.x is still rather spotty despite it having been out for quite some time now. I wouldn't recommend installing anything using the system Python. 
Usually it works well, but you don't want to end up accidentally overwriting something the system is relying on, and if you ever find yourself wanting to make a standalone program with a bundled Python interpreter, you don't have the rights to distribute the system Python. Libraries like py2app for making standalone Python programs will refuse to bundle the system Python for that reason. Easier to just install another Python and then install all your libraries onto that. -Chris From andrew_giessel at hms.harvard.edu Mon Mar 26 12:10:32 2012 From: andrew_giessel at hms.harvard.edu (Andrew Giessel) Date: Mon, 26 Mar 2012 12:10:32 -0400 Subject: [SciPy-User] ier-integer in optimize.leastsq In-Reply-To: References: <20120326101124.70420@gmx.net> Message-ID: In other words, it is a function of the precision of the number type (float) you use in your fitting routines (specific to your computer architecture). hth, ag On Mon, Mar 26, 2012 at 09:54, wrote: > 1.49012e-08= numpy.sqrt(numpy.finfo(float).eps) > > > > From: "Johannes Radinger" > To: scipy-user at scipy.org > Date: 03/26/2012 06:11 AM > Subject: [SciPy-User] ier-integer in optimize.leastsq > Sent by: scipy-user-bounces at scipy.org > ------------------------------ > > > > Hi, > > Some months ago I started already this topic,... Now, while writing my > manuskript this topic comes up again. I am using the optimize.leastsq > function > and would like to describe my results especielly the "ier-level". > > If ier in my result is 1, then the solution which was found ensures a > ftol-"quality", resp. the sum of squares for the relative errors are below > the ftol value? > As I didn't set any ftol value, the default value is used. Therefore, in > the case of optimization > results with ier=1, is it possible to state: > > "For all fitted solutions the relative errors in the sum of squares are > below the desired standard value of 1.49012e-08" ??? > > Where does this standard value come from? > > Best regards, > > Johannes > -- > NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone! > > Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Andrew Giessel, PhD Department of Neurobiology, Harvard Medical School 220 Longwood Ave Boston, MA 02115 ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryanlists at gmail.com Mon Mar 26 13:28:13 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Mon, 26 Mar 2012 12:28:13 -0500 Subject: [SciPy-User] problem with loading data from data_store In-Reply-To: References: Message-ID: It seems like I am using code no one cares about or uses. I also worked around this by just using cPickle. Is there a better approach I should be using to save a collection of numpy arrays efficiently with little hassle? On Sun, Mar 25, 2012 at 3:55 PM, Ralf Gommers wrote: > > > On Thu, Mar 22, 2012 at 10:33 PM, Ryan Krauss wrote: >> >> I have some data sets stored using scipy.io.save_as_module. ?I >> recently upgrade to 0.10 and I can no longer open this module. >> Further, I tried to reprocess my data and resave it and I am still >> getting the same error message. 
?Just a couple of lines are needed to >> recreate my problem: >> >> mydict = {'a':12.34} >> scipy.io.save_as_module('mymod',mydict) >> import mymod >> >> The response to the last command (import) is >> >> In [18]: import mymod >> >> --------------------------------------------------------------------------- >> AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call >> last) >> >> /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/TMM_SFLR_model1.py >> in () >> ----> 1 >> ? ? ?2 >> ? ? ?3 >> ? ? ?4 >> ? ? ?5 >> >> /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/mymod.py in >> () >> ? ? ?1 import scipy.io.data_store as data_store >> ? ? ?2 import mymod >> ----> 3 data_store._load(mymod) >> ? ? ?4 >> ? ? ?5 >> >> AttributeError: 'module' object has no attribute '_load' >> >> >> Can anyone help me with this? > > > You can add this in scipy/io/data_store.py: > > def _load(module): > ??? """ Load data into module from a shelf with > ??????? the same name as the module. > ??? """ > ??? dir,filename = os.path.split(module.__file__) > ??? filebase = filename.split('.')[0] > ??? fn = os.path.join(dir, filebase) > ??? f = dumb_shelve.open(fn, "r") > ??? #exec( 'import ' + module.__name__) > ??? for i in f.keys(): > ??????? exec( 'import ' + module.__name__+ ';' + > ????????????? module.__name__+'.'+i + '=' + 'f["' + i + '"]') > > This was caused by an incorrect removal of deprecated code in > https://github.com/scipy/scipy/commit/329a5e2713. So apparently > save_as_module() has been completely broken for 2 years without anyone > noticing..... > > Proposal: fix save_as_module now so it can load data again, deprecate it for > 0.11 and remove it for 0.12. > > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From matthew.brett at gmail.com Mon Mar 26 13:47:52 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 26 Mar 2012 10:47:52 -0700 Subject: [SciPy-User] problem with loading data from data_store In-Reply-To: References: Message-ID: Hi, On Sun, Mar 25, 2012 at 1:55 PM, Ralf Gommers wrote: > > > On Thu, Mar 22, 2012 at 10:33 PM, Ryan Krauss wrote: >> >> I have some data sets stored using scipy.io.save_as_module. ?I >> recently upgrade to 0.10 and I can no longer open this module. >> Further, I tried to reprocess my data and resave it and I am still >> getting the same error message. ?Just a couple of lines are needed to >> recreate my problem: >> >> mydict = {'a':12.34} >> scipy.io.save_as_module('mymod',mydict) >> import mymod >> >> The response to the last command (import) is >> >> In [18]: import mymod >> >> --------------------------------------------------------------------------- >> AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call >> last) >> >> /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/TMM_SFLR_model1.py >> in () >> ----> 1 >> ? ? ?2 >> ? ? ?3 >> ? ? ?4 >> ? ? ?5 >> >> /home/ryan/siue/Research/modeling/SFLR/system_ID/TMM/mymod.py in >> () >> ? ? ?1 import scipy.io.data_store as data_store >> ? ? ?2 import mymod >> ----> 3 data_store._load(mymod) >> ? ? ?4 >> ? ? ?5 >> >> AttributeError: 'module' object has no attribute '_load' >> >> >> Can anyone help me with this? > > > You can add this in scipy/io/data_store.py: > > def _load(module): > ??? """ Load data into module from a shelf with > ??????? the same name as the module. > ??? """ > ??? dir,filename = os.path.split(module.__file__) > ??? 
filebase = filename.split('.')[0] > ??? fn = os.path.join(dir, filebase) > ??? f = dumb_shelve.open(fn, "r") > ??? #exec( 'import ' + module.__name__) > ??? for i in f.keys(): > ??????? exec( 'import ' + module.__name__+ ';' + > ????????????? module.__name__+'.'+i + '=' + 'f["' + i + '"]') > > This was caused by an incorrect removal of deprecated code in > https://github.com/scipy/scipy/commit/329a5e2713. So apparently > save_as_module() has been completely broken for 2 years without anyone > noticing..... > > Proposal: fix save_as_module now so it can load data again, deprecate it for > 0.11 and remove it for 0.12. That sounds reasonable to me. See you, Matthew From ralf.gommers at googlemail.com Mon Mar 26 14:19:25 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Mar 2012 20:19:25 +0200 Subject: [SciPy-User] problem with loading data from data_store In-Reply-To: References: Message-ID: On Mon, Mar 26, 2012 at 7:28 PM, Ryan Krauss wrote: > It seems like I am using code no one cares about or uses. I also > worked around this by just using cPickle. Is there a better approach > I should be using to save a collection of numpy arrays efficiently > with little hassle? > Yes: http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryanlists at gmail.com Mon Mar 26 16:45:29 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Mon, 26 Mar 2012 15:45:29 -0500 Subject: [SciPy-User] problem with loading data from data_store In-Reply-To: References: Message-ID: > Yes: http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html Thanks. If I am the only one who has tried to use save_as_module in a really long time, feel free to get rid of it sooner. I will either use savez or cPickle. On Mon, Mar 26, 2012 at 1:19 PM, Ralf Gommers wrote: > > > On Mon, Mar 26, 2012 at 7:28 PM, Ryan Krauss wrote: >> >> It seems like I am using code no one cares about or uses. ?I also >> worked around this by just using cPickle. ?Is there a better approach >> I should be using to save a collection of numpy arrays efficiently >> with little hassle? > > > Yes: http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html > > Ralf > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Mon Mar 26 16:55:31 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Mar 2012 22:55:31 +0200 Subject: [SciPy-User] problem with loading data from data_store In-Reply-To: References: Message-ID: On Mon, Mar 26, 2012 at 10:45 PM, Ryan Krauss wrote: > > Yes: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html > > Thanks. > > If I am the only one who has tried to use save_as_module in a really > long time, feel free to get rid of it sooner. I will either use savez > or cPickle. > Even if you're the only user (impossible to tell), there's no reason to skip the normal deprecation dance I think. Ralf > > On Mon, Mar 26, 2012 at 1:19 PM, Ralf Gommers > wrote: > > > > > > On Mon, Mar 26, 2012 at 7:28 PM, Ryan Krauss > wrote: > >> > >> It seems like I am using code no one cares about or uses. I also > >> worked around this by just using cPickle. Is there a better approach > >> I should be using to save a collection of numpy arrays efficiently > >> with little hassle? 
> > > > > > Yes: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html > > > > Ralf > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Mon Mar 26 18:25:39 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 26 Mar 2012 17:25:39 -0500 Subject: [SciPy-User] How to get the index iterator of a scipy sparse matrix? In-Reply-To: References: Message-ID: On Sun, Mar 25, 2012 at 5:47 PM, Pengkui Luo wrote: > e.g. suppose A is a scipy lil sparse matrix, and the result of print(A) is: > > (0, 1) 1.0 > (0, 2) -1.0 > (1, 0) 1.0 > (1, 2) -1.0 > (2, 1) 2.0 > > How can I get an iterator (or at least a list) of these (i, j) index pairs? > > Thanks! > > You could convert the matrix to DOK format and get the keys: In [1]: from scipy.sparse import lil_matrix In [2]: a = lil_matrix([[0,0,0],[0,10,0],[20,0,30]]) In [3]: a.todok().keys() Out[3]: [(2, 0), (1, 1), (2, 2)] In [4]: a.todense() Out[4]: matrix([[ 0, 0, 0], [ 0, 10, 0], [20, 0, 30]]) That is not the most efficient method, but it is certainly easy to implement. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.borghgraef.rma at gmail.com Tue Mar 27 10:03:31 2012 From: alexander.borghgraef.rma at gmail.com (Alexander Borghgraef) Date: Tue, 27 Mar 2012 16:03:31 +0200 Subject: [SciPy-User] Inverse of binary_repr Message-ID: Hi all, Is there an inverse function of binary_repr, which takes a binary string representation of a number ( like '100') and returns an integer? -- Alex Borghgraef -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Tue Mar 27 10:06:07 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 27 Mar 2012 09:06:07 -0500 Subject: [SciPy-User] Inverse of binary_repr In-Reply-To: References: Message-ID: On Tue, Mar 27, 2012 at 9:03 AM, Alexander Borghgraef < alexander.borghgraef.rma at gmail.com> wrote: > Hi all, > > Is there an inverse function of binary_repr, which takes a binary string > representation of a number ( like '100') and returns an integer? > > Use the optional second argument of int(), which is the base: In [1]: s = "1001" In [2]: int(s, 2) Out[2]: 9 Warren > -- > Alex Borghgraef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Mar 27 10:06:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 27 Mar 2012 15:06:40 +0100 Subject: [SciPy-User] Inverse of binary_repr In-Reply-To: References: Message-ID: On Tue, Mar 27, 2012 at 15:03, Alexander Borghgraef wrote: > Hi all, > > ?Is there an inverse function of binary_repr, which takes a binary string > representation of a number ( like '100') and returns an integer? 
[~] |1> int('100', 2) 4 -- Robert Kern From alexander.borghgraef.rma at gmail.com Tue Mar 27 10:18:00 2012 From: alexander.borghgraef.rma at gmail.com (Alexander Borghgraef) Date: Tue, 27 Mar 2012 16:18:00 +0200 Subject: [SciPy-User] Inverse of binary_repr In-Reply-To: References: Message-ID: Thanks, I knew I was looking in the wrong place :-) -- Alex Borghgraef -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryanlists at gmail.com Tue Mar 27 14:48:14 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 27 Mar 2012 13:48:14 -0500 Subject: [SciPy-User] problem with dot for complex matrices Message-ID: I am loosing my mind while trying to debug some code. I am trying to find the cause of some differences between numpy analysis and analysis done first in maxima and then converted to python code. The maxima approach is more difficult to do, but seems to lead to the correct answers. The core issue seems to be one dot product of a 2x2 and a 2x1 that are both complex numbers: here is the 2x2: ipdb> submatinv array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) here is the 2x1: ipdb> augcol array([[ -3.74729148e-05-0.0005937j ], [ 7.96025801e-04+0.01137658j]]) verifying their shapes and data types: ipdb> submatinv.shape (2, 2) ipdb> submatinv.dtype dtype('complex128') ipdb> augcol.shape (2, 1) ipdb> augcol.dtype dtype('complex128') I need to compute this result: ipdb> -1*numpy.dot(submatinv,augcol) array([[ 5.30985737e-05+0.00038316j], [ 1.72370377e-04+0.00115503j]]) If I hard code how to do the matrix multiplication, I get the correct answer (it agrees with Maxima): For the first element: ipdb> -1*(submatinv[0,0]*augcol[0,0]+submatinv[0,1]*augcol[1,0]) (-0.005327660633034575+0.0011288088216130766j) For the second ipdb> -1*(submatinv[1,0]*augcol[0,0]+submatinv[1,1]*augcol[1,0]) (-0.016047752110848554+0.003432076134378004j) What is the dot product doing if it isn't dotting row by column? I am not seeing something. Thanks, Ryan From josef.pktd at gmail.com Tue Mar 27 14:57:03 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 27 Mar 2012 14:57:03 -0400 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: On Tue, Mar 27, 2012 at 2:48 PM, Ryan Krauss wrote: > I am loosing my mind while trying to debug some code. ?I am trying to > find the cause of some differences between numpy analysis and analysis > done first in maxima and then converted to python code. ?The maxima > approach is more difficult to do, but seems to lead to the correct > answers. ?The core issue seems to be one dot product of a 2x2 and a > 2x1 that are both complex numbers: > > here is the 2x2: > > ipdb> submatinv > array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], > ? ? ? [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) > > here is the 2x1: > > ipdb> augcol > array([[ -3.74729148e-05-0.0005937j ], > ? ? ? [ ?7.96025801e-04+0.01137658j]]) > > verifying their shapes and data types: > > ipdb> submatinv.shape > (2, 2) > ipdb> submatinv.dtype > dtype('complex128') > ipdb> augcol.shape > (2, 1) > ipdb> augcol.dtype > dtype('complex128') > > I need to compute this result: > > ipdb> -1*numpy.dot(submatinv,augcol) > array([[ ?5.30985737e-05+0.00038316j], > ? ? ? 
[ ?1.72370377e-04+0.00115503j]]) > > If I hard code how to do the matrix multiplication, I get the correct > answer (it agrees with Maxima): > > For the first element: > ipdb> -1*(submatinv[0,0]*augcol[0,0]+submatinv[0,1]*augcol[1,0]) > (-0.005327660633034575+0.0011288088216130766j) > > For the second > ipdb> -1*(submatinv[1,0]*augcol[0,0]+submatinv[1,1]*augcol[1,0]) > (-0.016047752110848554+0.003432076134378004j) > > What is the dot product doing if it isn't dotting row by column? > > I am not seeing something. with numpy 1.5.1, I get the results you want >>> m1 = np.array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], ... [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) >>> m2 = np.array([[ -3.74729148e-05-0.0005937j ], ... [ 7.96025801e-04+0.01137658j]]) >>> np.dot(m1, m2) array([[ 0.00532766-0.00112881j], [ 0.01604775-0.00343208j]]) Josef > > Thanks, > > Ryan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From nwagner at iam.uni-stuttgart.de Tue Mar 27 14:58:26 2012 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 27 Mar 2012 20:58:26 +0200 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: On Tue, 27 Mar 2012 13:48:14 -0500 Ryan Krauss wrote: > I am loosing my mind while trying to debug some code. I >am trying to > find the cause of some differences between numpy >analysis and analysis > done first in maxima and then converted to python code. > The maxima > approach is more difficult to do, but seems to lead to >the correct > answers. The core issue seems to be one dot product of >a 2x2 and a > 2x1 that are both complex numbers: > > here is the 2x2: > > ipdb> submatinv > array([[-0.22740265-1.63857451j, >-0.07740957-0.55847886j], > [-3.20602957-4.93959054j, >-0.36746252-1.68352465j]]) > > here is the 2x1: > > ipdb> augcol > array([[ -3.74729148e-05-0.0005937j ], > [ 7.96025801e-04+0.01137658j]]) > > verifying their shapes and data types: > > ipdb> submatinv.shape > (2, 2) > ipdb> submatinv.dtype > dtype('complex128') > ipdb> augcol.shape > (2, 1) > ipdb> augcol.dtype > dtype('complex128') > > I need to compute this result: > > ipdb> -1*numpy.dot(submatinv,augcol) > array([[ 5.30985737e-05+0.00038316j], > [ 1.72370377e-04+0.00115503j]]) > > If I hard code how to do the matrix multiplication, I >get the correct > answer (it agrees with Maxima): > >For the first element: > ipdb> >-1*(submatinv[0,0]*augcol[0,0]+submatinv[0,1]*augcol[1,0]) > (-0.005327660633034575+0.0011288088216130766j) > >For the second > ipdb> >-1*(submatinv[1,0]*augcol[0,0]+submatinv[1,1]*augcol[1,0]) > (-0.016047752110848554+0.003432076134378004j) > > What is the dot product doing if it isn't dotting row by >column? > > I am not seeing something. > > Thanks, > > Ryan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user Hi Ryan, I cannot reproduce your np.dot results, here. python -i ryan_1.py [[-0.00532766+0.00112881j] [-0.01604775+0.00343208j]] 1.7.0.dev-3503c5f Nils -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ryan_1.py Type: text/x-python Size: 315 bytes Desc: not available URL: From ryanlists at gmail.com Tue Mar 27 15:14:57 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 27 Mar 2012 14:14:57 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: Thanks to Nils and Josef for responding so quickly. I don't know if I feel more or less confused: If I copy and paste the code from my email, I can't reproduce my own problem: In [5]: A = array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], ...: [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) In [6]: B = array([[ -3.74729148e-05-0.0005937j ], ...: [ 7.96025801e-04+0.01137658j]]) In [7]: -1*dot(A,B) Out[7]: array([[-0.00532766+0.00112881j], [-0.01604775+0.00343208j]]) But if I use the matrices returned by my function I get the wrong result: In [27]: -1*numpy.dot(submat_inv_num, augcol_num) Out[27]: array([[ 5.30985737e-05+0.00038316j], [ 1.72370377e-04+0.00115503j]]) Even though they seem to be very nearly the same arrays: In [11]: submat_inv_num Out[11]: array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) In [12]: A Out[12]: array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) In [13]: submat_inv_num.dtype Out[13]: dtype('complex128') In [14]: A.dtype Out[14]: dtype('complex128') In [15]: A.shape Out[15]: (2, 2) In [16]: submat_inv_num.shape Out[16]: (2, 2) In [17]: submat_inv_num - A Out[17]: array([[ 1.18593824e-09 +3.83908949e-09j, 1.45239888e-09 +4.30272740e-09j], [ -2.42228770e-09 -2.12942108e-09j, -4.36657455e-09 +2.14619789e-09j]]) In [20]: augcol_num.dtype Out[20]: dtype('complex128') In [21]: augcol_num.shape Out[21]: (2, 1) In [22]: B.dtype Out[22]: dtype('complex128') In [23]: B.shape Out[23]: (2, 1) In [18]: augcol_num - B Out[18]: array([[ -2.57355850e-14 -5.09694849e-11j], [ -1.51298895e-13 +2.85492891e-09j]]) Any ideas as to what might be going on? FYI, In [24]: numpy.__version__ Out[24]: '1.6.1' In [25]: scipy.__version__ Out[25]: '0.10.0' Thanks again, Ryan On Tue, Mar 27, 2012 at 1:58 PM, Nils Wagner wrote: > On Tue, 27 Mar 2012 13:48:14 -0500 > ?Ryan Krauss wrote: >> >> I am loosing my mind while trying to debug some code. ?I am trying to >> find the cause of some differences between numpy analysis and analysis >> done first in maxima and then converted to python code. The maxima >> approach is more difficult to do, but seems to lead to the correct >> answers. ?The core issue seems to be one dot product of a 2x2 and a >> 2x1 that are both complex numbers: >> >> here is the 2x2: >> >> ipdb> submatinv >> array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], >> ? ? ?[-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) >> >> here is the 2x1: >> >> ipdb> augcol >> array([[ -3.74729148e-05-0.0005937j ], >> ? ? ?[ ?7.96025801e-04+0.01137658j]]) >> >> verifying their shapes and data types: >> >> ipdb> submatinv.shape >> (2, 2) >> ipdb> submatinv.dtype >> dtype('complex128') >> ipdb> augcol.shape >> (2, 1) >> ipdb> augcol.dtype >> dtype('complex128') >> >> I need to compute this result: >> >> ipdb> -1*numpy.dot(submatinv,augcol) >> array([[ ?5.30985737e-05+0.00038316j], >> ? ? 
?[ ?1.72370377e-04+0.00115503j]]) >> >> If I hard code how to do the matrix multiplication, I get the correct >> answer (it agrees with Maxima): >> >> For the first element: >> ipdb> -1*(submatinv[0,0]*augcol[0,0]+submatinv[0,1]*augcol[1,0]) >> (-0.005327660633034575+0.0011288088216130766j) >> >> For the second >> ipdb> -1*(submatinv[1,0]*augcol[0,0]+submatinv[1,1]*augcol[1,0]) >> (-0.016047752110848554+0.003432076134378004j) >> >> What is the dot product doing if it isn't dotting row by column? >> >> I am not seeing something. >> >> Thanks, >> >> Ryan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > Hi Ryan, > > I cannot reproduce your np.dot results, here. > > python -i ryan_1.py > [[-0.00532766+0.00112881j] > ?[-0.01604775+0.00343208j]] > 1.7.0.dev-3503c5f > > Nils > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From e.antero.tammi at gmail.com Tue Mar 27 15:15:19 2012 From: e.antero.tammi at gmail.com (eat) Date: Tue, 27 Mar 2012 22:15:19 +0300 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: Hi, On Tue, Mar 27, 2012 at 9:48 PM, Ryan Krauss wrote: > I am loosing my mind while trying to debug some code. I am trying to > find the cause of some differences between numpy analysis and analysis > done first in maxima and then converted to python code. The maxima > approach is more difficult to do, but seems to lead to the correct > answers. The core issue seems to be one dot product of a 2x2 and a > 2x1 that are both complex numbers: > > here is the 2x2: > > ipdb> submatinv > array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], > [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) > > here is the 2x1: > > ipdb> augcol > array([[ -3.74729148e-05-0.0005937j ], > [ 7.96025801e-04+0.01137658j]]) > > verifying their shapes and data types: > > ipdb> submatinv.shape > (2, 2) > ipdb> submatinv.dtype > dtype('complex128') > ipdb> augcol.shape > (2, 1) > ipdb> augcol.dtype > dtype('complex128') > > I need to compute this result: > > ipdb> -1*numpy.dot(submatinv,augcol) > array([[ 5.30985737e-05+0.00038316j], > [ 1.72370377e-04+0.00115503j]]) > > If I hard code how to do the matrix multiplication, I get the correct > answer (it agrees with Maxima): > > For the first element: > ipdb> -1*(submatinv[0,0]*augcol[0,0]+submatinv[0,1]*augcol[1,0]) > (-0.005327660633034575+0.0011288088216130766j) > > For the second > ipdb> -1*(submatinv[1,0]*augcol[0,0]+submatinv[1,1]*augcol[1,0]) > (-0.016047752110848554+0.003432076134378004j) > > What is the dot product doing if it isn't dotting row by column? > > I am not seeing something. 
> FWIIWO, I can't either reproduce your results: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: s.round(7) Out[]: array([[-0.2274026-1.6385745j, -0.0774096-0.5584789j], [-3.2060296-4.9395905j, -0.3674625-1.6835246j]]) In []: a.round(7) Out[]: array([[ -3.75000000e-05-0.0005937j], [ 7.96000000e-04+0.0113766j]]) In []: -1* dot(s, a).round(7) Out[]: array([[-0.0053277+0.0011288j], [-0.0160477+0.0034321j]]) Regards, -eat > > Thanks, > > Ryan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryanlists at gmail.com Tue Mar 27 15:26:02 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 27 Mar 2012 14:26:02 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: To further add to my own mystery, why does this fix the problem: In [37]: -1*numpy.dot(submat_inv_num, augcol_num) Out[37]: array([[ 5.30985737e-05+0.00038316j], [ 1.72370377e-04+0.00115503j]]) In [38]: A2 = copy.copy(submat_inv_num) In [39]: B2 = copy.copy(augcol_num) In [40]: -1*dot(A2,B2) Out[40]: array([[-0.00532766+0.00112881j], [-0.01604775+0.00343208j]]) On Tue, Mar 27, 2012 at 2:15 PM, eat wrote: > Hi, > > On Tue, Mar 27, 2012 at 9:48 PM, Ryan Krauss wrote: >> >> I am loosing my mind while trying to debug some code. ?I am trying to >> find the cause of some differences between numpy analysis and analysis >> done first in maxima and then converted to python code. ?The maxima >> approach is more difficult to do, but seems to lead to the correct >> answers. ?The core issue seems to be one dot product of a 2x2 and a >> 2x1 that are both complex numbers: >> >> here is the 2x2: >> >> ipdb> submatinv >> array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], >> ? ? ? [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) >> >> here is the 2x1: >> >> ipdb> augcol >> array([[ -3.74729148e-05-0.0005937j ], >> ? ? ? [ ?7.96025801e-04+0.01137658j]]) >> >> verifying their shapes and data types: >> >> ipdb> submatinv.shape >> (2, 2) >> ipdb> submatinv.dtype >> dtype('complex128') >> ipdb> augcol.shape >> (2, 1) >> ipdb> augcol.dtype >> dtype('complex128') >> >> I need to compute this result: >> >> ipdb> -1*numpy.dot(submatinv,augcol) >> array([[ ?5.30985737e-05+0.00038316j], >> ? ? ? [ ?1.72370377e-04+0.00115503j]]) >> >> If I hard code how to do the matrix multiplication, I get the correct >> answer (it agrees with Maxima): >> >> For the first element: >> ipdb> -1*(submatinv[0,0]*augcol[0,0]+submatinv[0,1]*augcol[1,0]) >> (-0.005327660633034575+0.0011288088216130766j) >> >> For the second >> ipdb> -1*(submatinv[1,0]*augcol[0,0]+submatinv[1,1]*augcol[1,0]) >> (-0.016047752110848554+0.003432076134378004j) >> >> What is the dot product doing if it isn't dotting row by column? >> >> I am not seeing something. > > FWIIWO, I can't either reproduce your results: > In []: sys.version > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' > In []: np.version.version > Out[]: '1.6.0' > > In []: s.round(7) > Out[]: > array([[-0.2274026-1.6385745j, -0.0774096-0.5584789j], > ? ? ? ?[-3.2060296-4.9395905j, -0.3674625-1.6835246j]]) > In []: a.round(7) > Out[]: > array([[ -3.75000000e-05-0.0005937j], > ? ? ? 
?[ ?7.96000000e-04+0.0113766j]]) > > In []: -1* dot(s, a).round(7) > Out[]: > array([[-0.0053277+0.0011288j], > ? ? ? ?[-0.0160477+0.0034321j]]) > > Regards, > -eat >> >> >> Thanks, >> >> Ryan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From aronne.merrelli at gmail.com Tue Mar 27 15:35:38 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Tue, 27 Mar 2012 14:35:38 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: On Tue, Mar 27, 2012 at 2:26 PM, Ryan Krauss wrote: > To further add to my own mystery, why does this fix the problem: > > In [37]: -1*numpy.dot(submat_inv_num, augcol_num) > Out[37]: > array([[ ?5.30985737e-05+0.00038316j], > ? ? ? [ ?1.72370377e-04+0.00115503j]]) > It appears to be equal to: In [1]: M = array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], ...: [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) In [2]: x = array([[ -3.74729148e-05-0.0005937j ], ...: [ 7.96025801e-04+0.01137658j]]) In [16]: -1 * (dot(M.real,x.real) + 1j*dot(M.imag,x.real)) Out[16]: array([[ 5.30985748e-05+0.00038316j], [ 1.72370374e-04+0.00115503j]]) I don't have any idea why it is doing that. You've never posted what the type of those arrays are, though - is it possible it is a subclass of ndarray that is doing something strange to the dot method? I think the call to copy might put it back into a "plain" ndarray. From ryanlists at gmail.com Tue Mar 27 15:49:41 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 27 Mar 2012 14:49:41 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: Thanks for digging further. I don't think I ever deliberately subclass ndarray....(let me look into it). On Tue, Mar 27, 2012 at 2:35 PM, Aronne Merrelli wrote: > On Tue, Mar 27, 2012 at 2:26 PM, Ryan Krauss wrote: >> To further add to my own mystery, why does this fix the problem: >> >> In [37]: -1*numpy.dot(submat_inv_num, augcol_num) >> Out[37]: >> array([[ ?5.30985737e-05+0.00038316j], >> ? ? ? [ ?1.72370377e-04+0.00115503j]]) >> > > > It appears to be equal to: > > In [1]: M = array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], > ? ...: ? ? ? [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) > > In [2]: x = array([[ -3.74729148e-05-0.0005937j ], > ? ...: ? ? ? [ ?7.96025801e-04+0.01137658j]]) > > In [16]: -1 * (dot(M.real,x.real) + 1j*dot(M.imag,x.real)) > Out[16]: > array([[ ?5.30985748e-05+0.00038316j], > ? ? ? [ ?1.72370374e-04+0.00115503j]]) > > > I don't have any idea why it is doing that. You've never posted what > the type of those arrays are, though - is it possible it is a subclass > of ndarray that is doing something strange to the dot method? I think > the call to copy might put it back into a "plain" ndarray. 
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ryanlists at gmail.com Tue Mar 27 15:56:14 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 27 Mar 2012 14:56:14 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: The matrices are initially created by these lines: matout=scipy.zeros((n,n),dtype=complex128)#+0j colout=scipy.zeros((n,1),dtype=complex128)#+0j They get assigned values from a matrix created using U=scipy.eye(self.maxsize+1,dtype=complex128) And when I ask for their types I get: In [15]: type(augcol_num) Out[15]: In [16]: type(submat_inv_num) Out[16]: So, I don't believe they are subtyped. On Tue, Mar 27, 2012 at 2:49 PM, Ryan Krauss wrote: > Thanks for digging further. ?I don't think I ever deliberately > subclass ndarray....(let me look into it). > > On Tue, Mar 27, 2012 at 2:35 PM, Aronne Merrelli > wrote: >> On Tue, Mar 27, 2012 at 2:26 PM, Ryan Krauss wrote: >>> To further add to my own mystery, why does this fix the problem: >>> >>> In [37]: -1*numpy.dot(submat_inv_num, augcol_num) >>> Out[37]: >>> array([[ ?5.30985737e-05+0.00038316j], >>> ? ? ? [ ?1.72370377e-04+0.00115503j]]) >>> >> >> >> It appears to be equal to: >> >> In [1]: M = array([[-0.22740265-1.63857451j, -0.07740957-0.55847886j], >> ? ...: ? ? ? [-3.20602957-4.93959054j, -0.36746252-1.68352465j]]) >> >> In [2]: x = array([[ -3.74729148e-05-0.0005937j ], >> ? ...: ? ? ? [ ?7.96025801e-04+0.01137658j]]) >> >> In [16]: -1 * (dot(M.real,x.real) + 1j*dot(M.imag,x.real)) >> Out[16]: >> array([[ ?5.30985748e-05+0.00038316j], >> ? ? ? [ ?1.72370374e-04+0.00115503j]]) >> >> >> I don't have any idea why it is doing that. You've never posted what >> the type of those arrays are, though - is it possible it is a subclass >> of ndarray that is doing something strange to the dot method? I think >> the call to copy might put it back into a "plain" ndarray. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From eddybarratt1 at yahoo.co.uk Tue Mar 27 18:31:18 2012 From: eddybarratt1 at yahoo.co.uk (Eddy Barratt) Date: Tue, 27 Mar 2012 23:31:18 +0100 (BST) Subject: [SciPy-User] Building numpy/scipy for python3 on MacOS Lion In-Reply-To: <1331588767.69711.YahooMailNeo@web29505.mail.ird.yahoo.com> References: <1331588767.69711.YahooMailNeo@web29505.mail.ird.yahoo.com> Message-ID: <1332887478.68161.YahooMailNeo@web29506.mail.ird.yahoo.com> I've made some progress with this problem, with much assistance from Ned Deily on the pythonmac mailing list. I can now build numpy for python3, but scipy still won't build. Too install numpy: The scipy website (http://www.scipy.org/Installing_SciPy/Mac_OS_X) suggests working around the C compiler problem with three typed commands, but these are insufficient, you need one more: $ export CC=clang $ export CXX=clang $ export FFLAGS=-ff2c $ export LDSHARED='clang -bundle -undefined dynamic_lookup \ ? ? -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk -g' After this building from source should work. See here for details: http://python.6.n6.nabble.com/Building-numpy-scipy-for-python3-on-MacOS-Lion-td4642828.html Problem building scipy: I don't know what the issue here is, something with the C compiler again though I think. Here are the error messages. I'd greatly appreciate any thoughts on this matter. 
Thanks, Eddy compiling C sources? C compiler: clang -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk? compile options: '-DNO_ATLAS_INFO=3 -DUSE_VENDOR_BLAS=1 -I/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c'? extra options: '-msse3'? clang: scipy/sparse/linalg/dsolve/_superlumodule.c? In file included from scipy/sparse/linalg/dsolve/_superlumodule.c:18:? In file included from /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/arrayobject.h:15:? In file included from /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:? In file included from /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/ndarraytypes.h:1972:? /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: #warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API" [-W#warnings]? #warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API"? ?^? scipy/sparse/linalg/dsolve/_superlumodule.c:268:9: error: non-void function 'PyInit__superlu' should return a value [-Wreturn-type]? ? ? ? ? return;? ? ? ? ? ^? 1 warning and 1 error generated.? In file included from scipy/sparse/linalg/dsolve/_superlumodule.c:18:? In file included from /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/arrayobject.h:15:? In file included from /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:? In file included from /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/ndarraytypes.h:1972:? /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: #warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API" [-W#warnings]? #warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API"? ?^? scipy/sparse/linalg/dsolve/_superlumodule.c:268:9: error: non-void function 'PyInit__superlu' should return a value [-Wreturn-type]? ? ? ? ? return;? ? ? ? ? ^? 1 warning and 1 error generated.? error: Command "clang -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk -DNO_ATLAS_INFO=3 -DUSE_VENDOR_BLAS=1 -I/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c scipy/sparse/linalg/dsolve/_superlumodule.c -o build/temp.macosx-10.6-intel-3.2/scipy/sparse/linalg/dsolve/_superlumodule.o -msse3" failed with exit status 1? ----- Original Message ----- From: Eddy Barratt To: "scipy-user at scipy.org" Cc: Sent: Monday, 12 March 2012, 15:46 Subject: Building numpy/scipy for python3 on MacOS Lion I can't get Numpy or Scipy to work with Python3 on Mac OSX Lion. 
I have used pip successfully to install numpy, scipy and matplotlib, and
they work well with Python2.7, but in Python3 typing 'import numpy' brings
up 'No module named numpy'.

I've tried downloading the source code directly and then running 'python3
setup.py build', but I get various error warnings, some in red that have to
do with fortran (e.g. 'Could not locate executable f95'). The error message
that appears to fail in the end is 'RuntimeError: Broken toolchain: cannot
link a simple C program', and appears to be related to the previous line
'sh: gcc-4.2: command not found'.

The Scipy website (http://www.scipy.org/Installing_SciPy/Mac_OS_X) suggests
that there may be issues with the c compiler, but the same problems didn't
arise using pip to install for python2.7. I have followed the instructions
on the website regarding changing the compiler but this has not made any
difference.

I have also tried installing from a virtual environment:

>>> mkvirtualenv -p python3.2 test1
>>> pip install numpy

But this fails with "Command python setup.py egg_info failed with error
code 1 in /Users/Eddy/.virtualenvs/test1/build/numpy"

I've considered making python3 default, and then I thought a pip install
might work, but I don't know how to do that.

Does anyone have any suggestions for how I might proceed? I'm relatively
new to Python but it's something I feel I'm likely to become more involved
in so I'd like to start using Python3 before I get too established with 2.7.

Thanks for your help.

Eddy

From mdekauwe at gmail.com Tue Mar 27 19:20:13 2012
From: mdekauwe at gmail.com (Martin De Kauwe)
Date: Tue, 27 Mar 2012 16:20:13 -0700 (PDT)
Subject: [SciPy-User] Numpy/Scipy: Avoiding nested loops to operate on matrix-valued images
In-Reply-To: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21>
References: <29677913.980.1331801968906.JavaMail.geo-discussion-forums@ynkz21>
Message-ID: <26902679.5.1332890413406.JavaMail.geo-discussion-forums@pbcwe9>

I didn't quite follow exactly what you were doing, but someone previously
showed me how to avoid inner loops and so perhaps this will help? Instead of...

tmp = np.arange(500000).reshape(1000,500)
nrows, ncols = tmp.shape[0], tmp.shape[1]
out = np.zeros((nrows, ncols))

for i in xrange(nrows):
    for j in xrange(ncols):
        out[i,j] = tmp[i,j] * 3.0

you might try...

tmp = np.arange(500000).reshape(1000,500)
nrows, ncols = tmp.shape[0], tmp.shape[1]
out = np.zeros((nrows, ncols))
r = np.arange(nrows)
c = np.arange(ncols)
out[r[:,None],c] = tmp[r[:,None],c] * 3.0

Assuming your arrays are large you would get a speed bump

On Thursday, March 15, 2012 7:59:28 PM UTC+11, tyldurd wrote:
>
> Hello,
>
> I am a beginner at python and numpy and I need to compute the matrix
> logarithm for each "pixel" (i.e. x,y position) of a matrix-valued image of
> dimension MxNx3x3. 3x3 is the dimensions of the matrix at each pixel.
>
> The function I have written so far is the following:
>
> def logm_img(im):
>     from scipy import linalg
>     dimx = im.shape[0]
>     dimy = im.shape[1]
>     res = zeros_like(im)
>     for x in range(dimx):
>         for y in range(dimy):
>             res[x, y, :, :] = linalg.logm(asmatrix(im[x,y,:,:]))
>     return res
>
> Is it ok? Is there a way to avoid the two nested loops ?

-------------- next part --------------
An HTML attachment was scrubbed...
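A short aside on the two approaches above (this sketch is not from the
original thread, and the arrays in it are made up): a purely elementwise
operation such as multiplying by 3 needs no loop or fancy indexing at all,
because NumPy broadcasts it directly, whereas scipy.linalg.logm only accepts
one matrix at a time, so some per-pixel loop remains, although it can be
collapsed to a single loop over a reshaped (M*N, 3, 3) stack.

    import numpy as np
    from scipy import linalg

    # elementwise scaling needs no loop at all
    tmp = np.arange(500000.0).reshape(1000, 500)
    out = tmp * 3.0

    def logm_img_flat(im):
        # im: (M, N, 3, 3) array of matrices; the output is complex in
        # general, and purely real when each block has a real logarithm
        # (e.g. symmetric positive-definite tensors)
        flat = im.reshape(-1, 3, 3)
        res = np.empty(flat.shape, dtype=complex)
        for k, mat in enumerate(flat):      # one loop instead of two
            res[k] = linalg.logm(mat)
        return res.reshape(im.shape)

    # made-up example input: a small stack of symmetric positive-definite blocks
    base = np.random.rand(4, 5, 3, 3)
    spd = base + base.transpose(0, 1, 3, 2) + 6.0 * np.eye(3)
    log_img = logm_img_flat(spd)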
URL: From nicolas.pinto at gmail.com Tue Mar 27 20:30:33 2012 From: nicolas.pinto at gmail.com (Nicolas Pinto) Date: Tue, 27 Mar 2012 20:30:33 -0400 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module In-Reply-To: References: Message-ID: Thanks for the answers. Sorry for the late answer, I've been out of the country. > Another possibility is that the problem comes just from the c++ runtime. > There's another c++ module in Scipy, `scipy.interpolate._interpolate` -- > could you check if importing it also causes the same issue? You are right, the same issue happens with `from scipy.interpolate import _interpolate`. Any advice on how to debug/fix from here? Thanks. N From aronne.merrelli at gmail.com Wed Mar 28 01:22:30 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 28 Mar 2012 00:22:30 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: On Tue, Mar 27, 2012 at 2:56 PM, Ryan Krauss wrote: > The matrices are initially created by these lines: > > ? ? ? ?matout=scipy.zeros((n,n),dtype=complex128)#+0j > ? ? ? ?colout=scipy.zeros((n,1),dtype=complex128)#+0j > > They get assigned values from a matrix created using > > U=scipy.eye(self.maxsize+1,dtype=complex128) > > And when I ask for their types I get: > > In [15]: type(augcol_num) > Out[15]: > > In [16]: type(submat_inv_num) > Out[16]: > > So, I don't believe they are subtyped. > The only other idea I have is to check if you can save the "problem" arrays. Specifically, try this, with the arrays that give the incorrect dot product: In [6]: savez('testing.npz', submat_inv_num=submat_inv_num, augcol_num=augcol_num) Then load them into a new session: In [1]: d = load('testing.npz') In [2]: submat_inv_num = d['submat_inv_num']; augcol_num = d['augcol_num'] Do the reloaded variables give the same incorrect dot product? It is probably a long shot, since I would imagine the save/load would be similar to copy... but if it works then others might be able to inspect the object to see what might be different. One last detail - it looks like the augcol is getting cast to a real number - (this is a clearer example than what I showed earlier): In [17]: dot(submat_inv_num, augcol_num.real) Out[17]: array([[ -5.30985748e-05-0.00038316j], [ -1.72370374e-04-0.00115503j]]) That might be a clue that something is causing augcol_num to get cast into a "normal" float before the dot product is taken. From ryanlists at gmail.com Wed Mar 28 15:04:20 2012 From: ryanlists at gmail.com (Ryan Krauss) Date: Wed, 28 Mar 2012 14:04:20 -0500 Subject: [SciPy-User] problem with dot for complex matrices In-Reply-To: References: Message-ID: Saving and loading the arrays seems to lead to a reproducible error, at least on my machine: d = numpy.load('testing.npz') submat_inv_num = d['submat_inv_num']; augcol_num = d['augcol_num'] -1*dot(submat_inv_num, augcol_num) In [5]: -1*dot(submat_inv_num, augcol_num) Out[5]: array([[ 5.30985737e-05+0.00038316j], [ 1.72370377e-04+0.00115503j]]) On Wed, Mar 28, 2012 at 12:22 AM, Aronne Merrelli wrote: > On Tue, Mar 27, 2012 at 2:56 PM, Ryan Krauss wrote: >> The matrices are initially created by these lines: >> >> ? ? ? ?matout=scipy.zeros((n,n),dtype=complex128)#+0j >> ? ? ? 
?colout=scipy.zeros((n,1),dtype=complex128)#+0j >> >> They get assigned values from a matrix created using >> >> U=scipy.eye(self.maxsize+1,dtype=complex128) >> >> And when I ask for their types I get: >> >> In [15]: type(augcol_num) >> Out[15]: >> >> In [16]: type(submat_inv_num) >> Out[16]: >> >> So, I don't believe they are subtyped. >> > > The only other idea I have is to check if you can save the "problem" > arrays. Specifically, try this, with the arrays that give the > incorrect dot product: > > In [6]: savez('testing.npz', submat_inv_num=submat_inv_num, > augcol_num=augcol_num) > > Then load them into a new session: > > In [1]: d = load('testing.npz') > > In [2]: submat_inv_num = d['submat_inv_num']; augcol_num = d['augcol_num'] > > Do the reloaded variables give the same incorrect dot product? It is > probably a long shot, since I would imagine the save/load would be > similar to copy... but if it works then others might be able to > inspect the object to see what might be different. One last detail - > it looks like the augcol is getting cast to a real number - (this is a > clearer example than what I showed earlier): > > In [17]: dot(submat_inv_num, ?augcol_num.real) > Out[17]: > array([[ -5.30985748e-05-0.00038316j], > ? ? ? [ -1.72370374e-04-0.00115503j]]) > > That might be a clue that something is causing augcol_num to get cast > into a "normal" float before the dot product is taken. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- A non-text attachment was scrubbed... Name: testing.npz Type: application/octet-stream Size: 494 bytes Desc: not available URL: From pav at iki.fi Wed Mar 28 15:38:16 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 28 Mar 2012 19:38:16 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?linalg=2Eeigh_hangs_only_after_importing_s?= =?utf-8?q?parse=09module?= References: Message-ID: Nicolas Pinto gmail.com> writes: [clip] > You are right, the same issue happens with `from scipy.interpolate > import _interpolate`. Any advice on how to debug/fix from here? I think you should also verify this by copying `_interpolate.so` outside Scipy and importing it --- namely, `from scipy.interpolate import ...` import also `scipy.sparse` so you cannot pinpoint the problem to `_interpolate`. -- Pauli Virtanen From sebas0 at gmail.com Wed Mar 28 16:22:45 2012 From: sebas0 at gmail.com (Sebastian) Date: Wed, 28 Mar 2012 17:22:45 -0300 Subject: [SciPy-User] conotur plot Message-ID: Dear Folks, I'm using the python bin of epd-7.0-2-rh5-x86_64 on a Linux 2.6.38.8-32.fc15.x86_64 #1 SMP Mon Jun 13 19:49:05 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux system. I'm trying to make a contour plot of astronomical data binning data in a 2D grid. When I bin the data in square bins (eg 60 x 60), the contour plot works fine. But if I change the binning to (10 x 60), by changing the following line in the code, from: "resolucion_x1=(xc1.max()-xc1.min())/(nceldas)" to "resolucion_x1=3* (xc1.max()-xc1.min())/(nceldas)" then I produce three arrays (x1,y1,z1) with size 20,60,1200 instead of 60,60,3600 BUT when I try plot a contour map with pylab.contour I get the follow error: " TypeError: Length of x must be number of columns in z, and length of y must be number of rows." 
and I think this occurs because:

In [788]: shape(x1),shape(y1),shape(z1)
Out[788]: ((20,), (60,), (20, 60))

Any idea as to how to solve this so I can use rectangular binning
with the code?

I use the following code:

import numpy as N
from subprocess import *
import pyfits
import matplotlib
import pylab
import pickle
from itertools import izip

magk=N.loadtxt("mag_k.gz")
magj=N.loadtxt("mag_j.gz")

xc1=N.array(magj-magk)
yc1=N.array(magk)
print ("xc1 max",xc1.max())
print ("yc1 max",yc1.max())
print ("xc1 min",xc1.min())
print ("yc1 min",yc1.min())
print ("is nonnum",N.isnan(xc1).any())
nceldas=60.0
resolucion_x1=(xc1.max()-xc1.min())/(nceldas)
resolucion_y1=(yc1.max()-yc1.min())/(nceldas)
minix1=xc1.min();miniy1=yc1.min()
x1=N.arange(xc1.min(),xc1.max(),resolucion_x1,dtype=xc1.dtype)
y1=N.arange(yc1.min(),yc1.max(),resolucion_y1,dtype=yc1.dtype)
z1=N.zeros((x1.shape[0],y1.shape[0]),yc1.dtype)
print "x1 size" , x1.size
print "y1 size" , y1.size
print "z1 size" , z1.size

xc1=(xc1-xc1.min())/resolucion_x1
yc1=(yc1-yc1.min())/resolucion_y1
print xc1.max(), xc1.min(),yc1.max(),yc1.min()
for i,j in zip(xc1,yc1):
    try: z1[int(j),int(i)]+=1.0
    except: pass

figure=pylab.figure()
pylab.plot(magj-magk,magk,'b.',ms=2.3,alpha=0.70)
pylab.ylim(pylab.ylim()[::-1])
pylab.contour(x1,y1,z1*100/len(magk),30,alpha=1,linewidths=5)
pylab.show()

confused...
- Sebastian

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jsseabold at gmail.com Wed Mar 28 20:19:00 2012
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 28 Mar 2012 20:19:00 -0400
Subject: [SciPy-User] call signature for dgees f2py external user routine?
Message-ID:

Can someone explain the call signature for the select function used
from the gees routines? Or point me to a reference? I don't understand
the syntax. <_arg=...>

https://github.com/scipy/scipy/blob/master/scipy/linalg/flapack_user.pyf.src#L3

Thanks,

Skipper

From jsseabold at gmail.com Wed Mar 28 20:20:28 2012
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 28 Mar 2012 20:20:28 -0400
Subject: [SciPy-User] call signature for dgees f2py external user routine?
In-Reply-To:
References:
Message-ID:

On Wed, Mar 28, 2012 at 8:19 PM, Skipper Seabold wrote:
> Can someone explain the call signature for the select function used
> from the gees routines? Or point me to a reference? I don't understand
> the syntax. <_arg=...>
>
> https://github.com/scipy/scipy/blob/master/scipy/linalg/flapack_user.pyf.src#L3

Sorry meant to send this to the dev list. Please reply there.

Skipper

From aronne.merrelli at gmail.com Wed Mar 28 23:20:58 2012
From: aronne.merrelli at gmail.com (Aronne Merrelli)
Date: Wed, 28 Mar 2012 22:20:58 -0500
Subject: [SciPy-User] conotur plot
In-Reply-To:
References:
Message-ID:

On Wed, Mar 28, 2012 at 3:22 PM, Sebastian wrote:
> TypeError: Length of x must be number of columns in z,
> and length of y must be number of rows."
> and I think this occurs because:
>
> In [788]: shape(x1),shape(y1),shape(z1)
> Out[788]: ((20,), (60,), (20, 60))
>

The shape of a 2-D array is (number_of_rows, number_of_columns). For that
z1 array, the x-array should be 60, and the y should be 20. So you need to
transpose z, or swap the x/y arrays to get the correct dimensions. For
example:

In [15]: x1.shape, y1.shape, z.shape
Out[15]: ((20,), (60,), (20, 60))

In [16]: contour(x1,y1,z.T)
Out[16]:

In [17]: contour(y1,x1,z)
Out[17]:

In [18]: contour(x1,y1,z)
TypeError: Length of x must be number of columns in z,
and length of y must be number of rows.
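Not from the thread, but a self-contained sketch of the shape convention
described above, using made-up data: numpy.histogram2d does the same kind of
rectangular 2-D binning as the hand-written loop, and it returns counts with
shape (nx, ny), so the array has to be transposed before it is passed to
contour, which expects z with shape (len(y), len(x)).

    import numpy as np
    import matplotlib.pyplot as plt

    # made-up colour and magnitude arrays standing in for magj-magk and magk
    colour = np.random.randn(10000) * 0.5 + 1.0
    mag = np.random.randn(10000) * 1.5 + 14.0

    nx, ny = 20, 60                                   # rectangular binning
    counts, xedges, yedges = np.histogram2d(colour, mag, bins=[nx, ny])

    xcent = 0.5 * (xedges[:-1] + xedges[1:])          # bin centres, length nx
    ycent = 0.5 * (yedges[:-1] + yedges[1:])          # bin centres, length ny

    # counts has shape (nx, ny); contour wants (len(ycent), len(xcent)),
    # i.e. rows indexed by y, so pass the transpose
    plt.contour(xcent, ycent, counts.T * 100.0 / colour.size, 30)
    plt.gca().invert_yaxis()                          # magnitudes increase downward
    plt.show()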
From nicolas.pinto at gmail.com Thu Mar 29 11:28:17 2012
From: nicolas.pinto at gmail.com (Nicolas Pinto)
Date: Thu, 29 Mar 2012 11:28:17 -0400
Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module
In-Reply-To:
References:
Message-ID:

> I think you should also verify this by copying `_interpolate.so` outside Scipy
> and importing it --- namely, `from scipy.interpolate import ...` import also
> `scipy.sparse` so you cannot pinpoint the problem to `_interpolate`.

Good point. I can still reproduce the bug by copying `_interpolate.so`.

>
> --
> Pauli Virtanen
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

--
Nicolas Pinto
http://web.mit.edu/pinto

From sebas0 at gmail.com Thu Mar 29 13:59:42 2012
From: sebas0 at gmail.com (Sebastian)
Date: Thu, 29 Mar 2012 14:59:42 -0300
Subject: [SciPy-User] conotur plot
Message-ID:

Thanks for the help Aronne, just some feedback on post 6 of SciPy-User
Digest, Vol 103, Issue 65.

Even though both suggested fixes plot contours, both sets of contours come
out off-center (with the axes inverted), so they don't follow the density
of points as intended.

The solution that worked was proposed by Octavia Bruzzone: changing

z1=N.zeros((x1.shape[0],y1.shape[0]),yc1.dtype)

to

z1=N.zeros((y1.shape[0],x1.shape[0]),yc1.dtype)

and then plotting the contours as normal:

pylab.contour(x1,y1,z1*100/len(magk),30,alpha=0.5,linewidths=5)

best wishes,
- Sebastian

> I'm trying to make a contour plot of astronomical data binning data in a 2D
> grid. When I bin the data in square bins (eg 60 x 60), the contour plot
> works fine. But if I change the binning to (10 x 60), by changing the
> following line in the code, from:
>
> "resolucion_x1=(xc1.max()-xc1.min())/(nceldas)"
> to
> "resolucion_x1=3* (xc1.max()-xc1.min())/(nceldas)"
>
> then I produce three arrays (x1,y1,z1)
> with size 20,60,1200 instead of 60,60,3600
> BUT when I try plot a contour map with pylab.contour I get the follow
> error:
>
> "
> TypeError: Length of x must be number of columns in z,
> and length of y must be number of rows."
> and I think this occurs because:
>
> In [788]: shape(x1),shape(y1),shape(z1)
> Out[788]: ((20,), (60,), (20, 60))
>
> Any idea as to how to solve this so I can use rectangular binning
> with the code?
> > I use the following code: > > import numpy as N > from subprocess import * > import pyfits > import matplotlib > import pylab > import pickle > from itertools import izip > > magk=N.loadtxt("mag_k.gz") > magj=N.loadtxt("mag_j.gz") > > > xc1=N.array(magj-magk) > yc1=N.array(magk) > print ("xc1 max",xc1.max()) > print ("yc1 max",yc1.max()) > print ("xc1 min",xc1.min()) > print ("yc1 min",yc1.min()) > print ("is nonnum",N.isnan(xc1).any()) > nceldas=60.0 > resolucion_x1=(xc1.max()-xc1.min())/(nceldas) > resolucion_y1=(yc1.max()-yc1.min())/(nceldas) > minix1=xc1.min();miniy1=yc1.min() > x1=N.arange(xc1.min(),xc1.max(),resolucion_x1,dtype=xc1.dtype) > y1=N.arange(yc1.min(),yc1.max(),resolucion_y1,dtype=yc1.dtype) > z1=N.zeros((x1.shape[0],y1.shape[0]),yc1.dtype) > print "x1 size" , x1.size > print "y1 size" , y1.size > print "z1 size" , z1.size > > xc1=(xc1-xc1.min())/resolucion_x1 > yc1=(yc1-yc1.min())/resolucion_y1 > print xc1.max(), xc1.min(),yc1.max(),yc1.min() > for i,j in zip(xc1,yc1): > try: z1[int(j),int(i)]+=1.0 > except: pass > > figure=pylab.figure() > pylab.plot(magj-magk,magk,'b.',ms=2.3,alpha=0.70) > pylab.ylim(pylab.ylim()[::-1]) > pylab.contour(x1,y1,z1*100/len(magk),30,alpha=1,linewidths=5) > pylab.show() > > confused... > - Sebastian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ptittmann at gmail.com Thu Mar 29 22:06:15 2012 From: ptittmann at gmail.com (Peter Tittmann) Date: Thu, 29 Mar 2012 19:06:15 -0700 Subject: [SciPy-User] scipy.spatial module import errors Message-ID: Hi, I'm using EPD 7.2 on mac OSX lion: In [8]: scipy.__version__ Out[8]: '0.10.0' When I attempt to load the spatial module. I am using spyder and with >>> scipy.spatial( I get the docstrings. When I try to load like this: >>> kd=scipy.spatial.cKDTree(block,1000) i get: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /Users/peter/src/spyderlib/ in () ----> 1 kd=scipy.spatial.cKDTree(block,1000) AttributeError: 'module' object has no attribute 'spatial' Can anyone suggest what might be going on, and or a solution? Thanks! Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Mar 30 00:46:00 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 29 Mar 2012 23:46:00 -0500 Subject: [SciPy-User] scipy.spatial module import errors In-Reply-To: References: Message-ID: On Thu, Mar 29, 2012 at 9:06 PM, Peter Tittmann wrote: > Hi, > > I'm using EPD 7.2 on mac OSX lion: > > In [8]: scipy.__version__ > Out[8]: '0.10.0' > > When I attempt to load the spatial module. I am using spyder and with > >>> scipy.spatial( > > I get the docstrings. > > When I try to load like this: > > >>> kd=scipy.spatial.cKDTree(block,1000) > > i get: > > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > /Users/peter/src/spyderlib/ in () > ----> 1 kd=scipy.spatial.cKDTree(block,1000) > > AttributeError: 'module' object has no attribute 'spatial' > > Can anyone suggest what might be going on, and or a solution? > > Thanks! > > Peter > > 'scipy' is actually a collection of subpackages. The subpackages are not imported by executing 'import scipy'. You'll have to explicitly import the scipy.spatial package with 'import scipy.spatial'. Warren -------------- next part -------------- An HTML attachment was scrubbed... 
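Not part of the original exchange, but a minimal illustration of what is
going on here (the point data are made up): importing the top-level scipy
package does not import its subpackages, so scipy.spatial has to be imported
explicitly before cKDTree can be reached.

    import numpy as np
    import scipy
    import scipy.spatial        # without this line, scipy.spatial is not defined

    block = np.random.rand(1000, 3)              # made-up point cloud
    kd = scipy.spatial.cKDTree(block, 1000)      # second argument is the leafsize
    dist, idx = kd.query(block[:5])              # nearest neighbours of a few points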
URL: From jallikattu at googlemail.com Thu Mar 29 02:48:26 2012 From: jallikattu at googlemail.com (morovia morovia) Date: Thu, 29 Mar 2012 12:18:26 +0530 Subject: [SciPy-User] feature extraction from an image. Message-ID: Dear scipy users, I am facing difficulty in the extraction of specific feature from an image from a time series. There is a small speck which is irregular in shape within the circular region of the domain. I am trying to calculate the velocity at which the speck is moving based on this. A sample image is attached. Thanks Best regards Viswanath -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 02.jpeg Type: image/jpeg Size: 41813 bytes Desc: not available URL: From zachary.pincus at yale.edu Fri Mar 30 10:31:58 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 30 Mar 2012 10:31:58 -0400 Subject: [SciPy-User] feature extraction from an image. In-Reply-To: References: Message-ID: <79518D61-C9AB-4C22-AEA7-59E7F4D9E03A@yale.edu> > Dear scipy users, > > I am facing difficulty in the extraction of specific feature from an > image from a time series. In general, you may want to investigate the tools in the scikits-image ("skimage") package: http://scikits-image.org/ > There is a small speck which is irregular in shape within the circular region of the domain. > > I am trying to calculate the velocity at which the speck is moving based on > this. A sample image is attached. This could be a very easy task or a very hard task, depending on a lot of information about the timeseries images that you haven't provided: (1) Is the circular region constant in size/position/coloration? (2) Is the "speck" constant in shape/coloration? (3) Are there other transient "specks" or other sources of noise? (4) Are there multiple "specks" at once? (5) How many time-series do you have? How many frames per time-series? (6) Are there variations (in any of the above features) between the different time-series you have that may not be present within each single time-series? (Such as differences in coloration or position of the circular region?) and so forth. Perhaps you could post a sample timeseries somewhere online and provide a link, so that it's easier to get a sense of the problem? Also, where is this data derived from? Perhaps that information would help formulate a good solution. Zach From Jerome.Kieffer at esrf.fr Fri Mar 30 10:55:39 2012 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Fri, 30 Mar 2012 16:55:39 +0200 Subject: [SciPy-User] feature extraction from an image. In-Reply-To: References: Message-ID: <20120330165539.8c15088c.Jerome.Kieffer@esrf.fr> On Thu, 29 Mar 2012 12:18:26 +0530 morovia morovia wrote: > I am facing difficulty in the extraction of specific feature from an > image from a time series. There is a small speck which is irregular in > shape within the circular region of the domain. > > I am trying to calculate the velocity at which the speck is moving based on > this. A sample image is attached. I made some bindings for "feature extraction" of images (like SIFT and SURF) for image alignment. 
The code is here: https://github.com/kif/imageAlignment cheers, -- J?r?me Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 From pav at iki.fi Fri Mar 30 14:03:54 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 30 Mar 2012 20:03:54 +0200 Subject: [SciPy-User] linalg.eigh hangs only after importing sparse module In-Reply-To: References: Message-ID: 29.03.2012 17:28, Nicolas Pinto kirjoitti: >> I think you should also verify this by copying `_interpolate.so` outside Scipy >> and importing it --- namely, `from scipy.interpolate import ...` import also >> `scipy.sparse` so you cannot pinpoint the problem to `_interpolate`. > > Good point. I can still reproduce the bug by copying `_interpolate.so`. To get further, the following information is needed: - which platform? - which binaries? - which LAPACK? I'm assuming you're on 64-bit Windows 7. If so, I don't have good clues on how to fix or debug the issue. However, if it's really the C++ runtime that is causing the problems, then compiling Numpy/Scipy with a different compiler could fix the problem. -- Pauli Virtanen From klonuo at gmail.com Sat Mar 31 12:24:38 2012 From: klonuo at gmail.com (klo uo) Date: Sat, 31 Mar 2012 18:24:38 +0200 Subject: [SciPy-User] ndimage/morphology - binary dilation and erosion? Message-ID: While preparing some images for OCR, I usually discard those with low DPI, but as this happens often I thought to try some image processing and on suggestion (morphological operations) I tried ndimage.morph with idea to play around binary_dilation Images were G4 TIFFs which PIL/MPL can't decode, so I convert to 1bit PNG which I normalized after to 0 and 1. On sample img I applied: ndi.morphology.binary_dilation(img).astype(img.dtype) and ndi.morphology.binary_erosion(img).astype(img.dtype) I attached result images, and wanted to ask two question: 1. Is this result correct? From what I read today seems like what dilation does is erosion and vice versa, but I probably overlooked something 2. Does someone maybe know of better approach for enhancing original sample for OCR (except thresholding, for which I'm aware)? TIA [image: Inline image 1] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 11309 bytes Desc: not available URL: From klonuo at gmail.com Sat Mar 31 12:43:48 2012 From: klonuo at gmail.com (klo uo) Date: Sat, 31 Mar 2012 18:43:48 +0200 Subject: [SciPy-User] ndimage/morphology - binary dilation and erosion? In-Reply-To: References: Message-ID: On Sat, Mar 31, 2012 at 6:24 PM, klo uo wrote: > > (except thresholding, for which I'm aware)? > > I mean here upsample/blur/sharpen/threshold -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Sat Mar 31 12:48:40 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Sat, 31 Mar 2012 12:48:40 -0400 Subject: [SciPy-User] ndimage/morphology - binary dilation and erosion? In-Reply-To: References: Message-ID: On Sat, Mar 31, 2012 at 12:24 PM, klo uo wrote: > While preparing some images for OCR, I usually discard those with low DPI, > but as this happens often I thought to try some image processing and > on suggestion (morphological operations) I tried ndimage.morph with idea > to play around binary_dilation > > Images were G4 TIFFs which PIL/MPL can't decode, so I convert to 1bit PNG > which I normalized after to 0 and 1. 
> > On sample img I applied: > > ndi.morphology.binary_dilation(img).astype(img.dtype) > > and > > ndi.morphology.binary_erosion(img).astype(img.dtype) > > I attached result images, and wanted to ask two question: > > 1. Is this result correct? From what I read today seems like what dilation > does is erosion and vice versa, but I probably overlooked something > This result looks correct to me. I think it depends on what you consider "object" and "background": Typically (I think), image-processing operators consider light regions to be objects and dark objects to be background. So dilation grows right regions and erosion shrinks bright regions. Obviously, in your images, definitions of object and background are reversed (black is object; white is background). > 2. Does someone maybe know of better approach for enhancing original > sample for OCR (except thresholding, for which I'm aware)? > Have you tried the `open` and `close` operators? A morphological opening is just an erosion followed by a dilation and the closing is just the reverse (see e.g., the scikits-image docstrings). For an opening, the erosion would remove some of "salt" (white pixels) in the letters, and the dilation would (more-or-less) restore the letters to their original thickness. The closing would do the same for black pixels on the background. There are other approaches of course, but since you're already thinking about erosion and dilation, these came to mind -Tony > TIA > > [image: Inline image 1] > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 11309 bytes Desc: not available URL: From klonuo at gmail.com Sat Mar 31 13:09:41 2012 From: klonuo at gmail.com (klo uo) Date: Sat, 31 Mar 2012 19:09:41 +0200 Subject: [SciPy-User] ndimage/morphology - binary dilation and erosion? In-Reply-To: References: Message-ID: On Sat, Mar 31, 2012 at 6:48 PM, Tony Yu wrote: > >> 1. Is this result correct? From what I read today seems like what >> dilation does is erosion and vice versa, but I probably overlooked something >> > > This result looks correct to me. I think it depends on what you consider > "object" and "background": Typically (I think), image-processing operators > consider light regions to be objects and dark objects to be background. So > dilation grows right regions and erosion shrinks bright regions. Obviously, > in your images, definitions of object and background are reversed (black is > object; white is background). > You are right. I thought on first it couldn't be flip logic, but thinking more about it and then backing with result from abs(img-1) shows it's just like that. > 2. Does someone maybe know of better approach for enhancing original >> sample for OCR (except thresholding, for which I'm aware)? >> > > Have you tried the `open` and `close` operators? A morphological opening > is just an erosion followed by a dilation and the closing is just the > reverse (see e.g., the scikits-image docstrings). > For an opening, the erosion would remove some of "salt" (white pixels) in > the letters, and the dilation would (more-or-less) restore the letters to > their original thickness. The closing would do the same for black pixels on > the background. > Thanks for suggestion. 
Is the morphology module in skimage a reflection of ndimage, or is it a
separate implementation?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From klonuo at gmail.com Sat Mar 31 13:18:02 2012
From: klonuo at gmail.com (klo uo)
Date: Sat, 31 Mar 2012 19:18:02 +0200
Subject: [SciPy-User] ndimage/morphology - binary dilation and erosion?
In-Reply-To:
References:
Message-ID:

On Sat, Mar 31, 2012 at 7:09 PM, klo uo wrote:
>
> Is morphology module in skimage reflection of ndimage, or it's separate
> implementation?
>

Nevermind, it's clearly a different implementation.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From klonuo at gmail.com Sat Mar 31 14:08:41 2012
From: klonuo at gmail.com (klo uo)
Date: Sat, 31 Mar 2012 20:08:41 +0200
Subject: [SciPy-User] ndimage/morphology - binary dilation and erosion?
In-Reply-To:
References:
Message-ID:

On Sat, Mar 31, 2012 at 6:48 PM, Tony Yu wrote:
> Have you tried the `open` and `close` operators? A morphological opening
> is just an erosion followed by a dilation and the closing is just the
> reverse (see e.g., the scikits-image docstrings).
> For an opening, the erosion would remove some of "salt" (white pixels) in
> the letters, and the dilation would (more-or-less) restore the letters to
> their original thickness. The closing would do the same for black pixels on
> the background.
>

I tried grey opening on the sample image with both modules. The approach
seems good and the results are bit-identical for both modules
(footprint=square(3)), so I thought to comment on the differences between
the two modules:

- skimage requires converting the data type to 'uint8' and won't accept
  anything less
- ndimage grey opening is 3 times faster on my PC

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
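A minimal sketch of the open/close idea with scipy.ndimage (the input array
is only a stand-in for the normalized 1-bit page image, and the 3x3 footprint
is just a starting point): these operators treat bright pixels (1s) as the
object, so an opening removes small white specks inside the black letters,
while a closing removes small black specks on the white background; which one
helps more depends on which kind of speck dominates.

    import numpy as np
    import scipy.ndimage as ndi

    # stand-in for the normalized 1-bit page: black text (0) on white (1)
    img = np.ones((200, 600), dtype=np.uint8)
    img[50:150, 100:110] = 0                      # a fake letter stroke
    img[80:82, 104:106] = 1                       # "salt" inside the stroke
    img[20:22, 300:302] = 0                       # "pepper" on the background

    selem = np.ones((3, 3))                       # structuring element / footprint

    # opening = erosion then dilation: drops the white specks in the letters
    opened = ndi.binary_opening(img, structure=selem)

    # closing = dilation then erosion: drops the black specks on the background
    cleaned = ndi.binary_closing(opened, structure=selem).astype(img.dtype)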