From cgohlke at uci.edu Thu Sep 1 04:34:16 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 01 Sep 2011 01:34:16 -0700 Subject: [SciPy-User] Projecting volumes down to 2D In-Reply-To: References: Message-ID: <4E5F4388.6090406@uci.edu> On 8/31/2011 3:35 PM, Chris Weisiger wrote: > Briefly, I'm working on a visualization tool for five-dimensional > microscopy data (X/Y/Z/time/wavelength). Different wavelengths can be > transformed with respect to each other: X/Y/Z translation, rotation > about the Z axis, and uniform scaling in X and Y. We can then show > various 2D slices of the data that pass through a specific XYZT point: > an X-Y slice, an X-Z slice, a Y-Z slice, and slices through time. > These slices are generated by transforming the view coordinates and > using scipy.ndimage.map_coordinates. > > Now we want to be able to project an entire row/column/etc. of pixels > into a single pixel. For example, in the X-Y slice, each pixel shown > is actually the brightest pixel from the entire Z column. This example > is easily done by taking the maximum along the Z axis and then > proceeding as normal with generating the slice, albeit with a Z > transformation of 0. That's because the other transformation > parameters don't move data through the Z axis. Thus I still only have > to transform X by Y pixels. > > I'm having trouble with an edge case for transformed data, though: if > the projection axis is X or Y, and there is a rotation/scale factor, > then I can't see a way to avoid having to transform every single pixel > in a 3D volume to obtain the projection -- that is, transforming X by > Y by Z pixels. This is expensive. Obviously each pixel in the volume > must be considered to generate these projections, but does every pixel > have to be transformed? I don't suppose anyone knows of a way to > simplify the problem? > > -Chris This looks like "maximum intensity projection" visualization . MIP can be efficiently implemented using OpenGL by blending together multiple slices, oriented perpendicular to the projection direction, through a 3D texture (xyz data). Also consider VTK's vtkVolumeRayCastMIPFunction class. Christoph From deil.christoph at googlemail.com Thu Sep 1 06:04:21 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Thu, 1 Sep 2011 12:04:21 +0200 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit In-Reply-To: References: <089E4569-0C53-4FC9-9B0F-353C4EF64478@googlemail.com> Message-ID: On Sep 1, 2011, at 3:45 AM, josef.pktd at gmail.com wrote: > On Wed, Aug 31, 2011 at 12:10 PM, wrote: >> On Wed, Aug 31, 2011 at 11:09 AM, Christoph Deil >> wrote: >>> >>> On Aug 30, 2011, at 11:25 PM, josef.pktd at gmail.com wrote: >>> >>> On Tue, Aug 30, 2011 at 4:15 PM, Christoph Deil >>> wrote: >>> >>> I noticed that scipy.optimize.curve_fit returns parameter errors that don't >>> >>> scale with sigma, the standard deviation of ydata, as I expected. 
>>> >>> Here is a code snippet to illustrate my point, which fits a straight line to >>> >>> five data points: >>> >>> import numpy as np >>> >>> from scipy.optimize import curve_fit >>> >>> x = np.arange(5) >>> >>> y = np.array([1, -2, 1, -2, 1]) >>> >>> sigma = np.array([1, 2, 1, 2, 1]) >>> >>> def f(x, a, b): >>> >>> return a + b * x >>> >>> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma) >>> >>> perr = np.sqrt(pcov.diagonal()) >>> >>> print('*** sigma = {0} ***'.format(sigma)) >>> >>> print('popt: {0}'.format(popt)) >>> >>> print('perr: {0}'.format(perr)) >>> >>> I get the following result: >>> >>> *** sigma = [1 2 1 2 1] *** >>> >>> popt: [ 5.71428536e-01 1.19956213e-08] >>> >>> perr: [ 0.93867933 0.40391117] >>> >>> Increasing sigma by a factor of 10, >>> >>> sigma = 10 * np.array([1, 2, 1, 2, 1]) >>> >>> I get the following result: >>> >>> *** sigma = [10 20 10 20 10] *** >>> >>> popt: [ 5.71428580e-01 -2.27625699e-09] >>> >>> perr: [ 0.93895295 0.37079075] >>> >>> The best-fit values stayed the same as expected. >>> >>> But the error on the slope b decreased by 8% (the error on the offset a >>> >>> didn't change much) >>> >>> I would have expected fit parameter errors to increase with increasing >>> >>> errors on the data!? >>> >>> Is this a bug? >>> >>> No bug in the formulas. I tested all of them when curve_fit was added. >>> >>> However in your example the numerical cov lacks quite a bit of >>> precision. Trying your example with different starting values, I get a >>> 0.05 difference in your perr (std of parameter estimates). >>> >>> Trying smaller xtol and ftol doesn't change anything. (?) >>> >>> Making ftol = 1e-15 very small I get a different wrong result: >>> popt: [ 5.71428580e-01 -2.27625699e-09] >>> perr: [ 0.92582011 0.59868281] >>> What do I have to do to get a correct answer (say to 5 significant digits) >>> from curve_fit for this simple example? >>> >>> Since it's linear >>> >>> import scikits.statsmodels.api as sm >>> >>> x = np.arange(5.) >>> >>> y = np.array([1, -2, 1, -2, 1.]) >>> >>> sigma = np.array([1, 2, 1, 2, 1.]) >>> >>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit() >>> >>> res.params >>> >>> array([ 5.71428571e-01, 1.11022302e-16]) >>> >>> res.bse >>> >>> array([ 0.98609784, 0.38892223]) >>> >>> res = sm.WLS(y, sm.add_constant(x, prepend=True), >>> weights=1./(sigma*10)**2).fit() >>> >>> res.params >>> >>> array([ 5.71428571e-01, 1.94289029e-16]) >>> >>> res.bse >>> >>> array([ 0.98609784, 0.38892223]) >>> >>> rescaling doesn't change parameter estimates nor perr >>> >>> This is what I don't understand. >>> Why don't the parameter estimate errors increase with increasing errors >>> sigma on the data points? >>> If I have less precise measurements, the model parameters should be less >>> constrained?! >>> I was using MINUIT before I learned Scipy and the error definition for a >>> chi2 fit given in the MINUIT User Guide >>> http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node7.html >>> as well as the example results here >>> http://code.google.com/p/pyminuit/wiki/GettingStartedGuide >>> don't mention the factor s_sq that is used in curve_fit to scale pcov. >>> Is the error definition in the MINUIT manual wrong? >>> Can you point me to a web resource that explains why the s_sq factor needs >>> to be applied to the covariance matrix? >> >> It's standard text book information, but Wikipedia seems to be lacking >> a bit in this. 
>> >> for the linear case >> http://en.wikipedia.org/wiki/Ordinary_least_squares#Assuming_normality >> >> cov_params = sigma^2 (X'X)^{-1} >> >> for the non-linear case with leastsq, X is replaced by Jacobian, >> otherwise everything is the same. >> >> However, in your minuit links I saw only the Hessian mentioned (from >> very fast skimming the pages) >> >> With maximum likelihood, the inverse Hessian is the complete >> covariance matrix, no additional multiplication is necessary. >> >> Essentially, these are implementation details depending on how the >> estimation is calculated, and there are various ways of numerically >> approximating the Hessian. >> That's why this is described for optimize.leastsq (incorrectly as >> Chuck pointed out) and but not in optimize.curve_fit. >> >> With leastsquares are maximum likelihood, rescaling both y and >> f(x,params) has no effect on the parameter estimates, it's just like >> changing units of y, meters instead of centimeters. >> >> I guess scipy.odr would work differently, since it is splitting up the >> errors between y and x's, but I never looked at the details. OK, now I understand, thanks for your explanations. Roughly speaking, scipy.optimize.curve_fit applies a factor s_sq = chi^2 / ndf to the covariance matrix to account for possibly incorrect overall scale in the y errors "sigma" of the data. I had simply not seen this factor being applied to chi^2 fits before. E.g. in many physics and astronomy papers, parameter errors from chi^2 fit results are reported without this factor. Also the manual of the fitting package used by most physicists (MINUIT) as well as the statistics textbook I use (Cowan -- Statistical data analysis) don't mention it. May I therefore suggest to explicitly mention this factor in the scipy.optimize.curve_fit docstring to avoid confusion? >> >> >>> >>> Josef >>> >>> >>> >>> PS: I've attached a script to fit the two examples using statsmodels, scipy >>> and minuit (applying the s_sq factor myself). >>> Here are the results I get (who's right for the first example? why does >>> statsmodels only return on parameter value and error?): >>> """Example from >>> http://code.google.com/p/pyminuit/wiki/GettingStartedGuide""" >>> x = np.array([1 , 2 , 3 , 4 ]) >>> y = np.array([1.1, 2.1, 2.4, 4.3]) >>> sigma = np.array([0.1, 0.1, 0.2, 0.1]) >>> statsmodels.api.WLS >>> popt: [ 1.04516129] >>> perr: [ 0.0467711] >>> scipy.optimize.curve_fit >>> popt: [ 8.53964011e-08 1.04516128e+00] >>> perr: [ 0.27452122 0.09784324] >> >> that's what I get with example 1 when I run your script, >> I don't know why you have one params in your case I'll file a statsmodels issue on github. >> (full_output threw an exception in curve_fit with scipy.__version__ '0.9.0' It's there in HEAD. 
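For reference, example 1 can be cross-checked by writing the weighted least squares normal equations out directly in numpy. This is only a minimal sketch (variable names are made up, and the weights are assumed to enter as 1/sigma**2); it shows both the unscaled covariance (X'WX)^{-1} and the s_sq-scaled one:

import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([1.1, 2.1, 2.4, 4.3])
sigma = np.array([0.1, 0.1, 0.2, 0.1])

X = np.column_stack((np.ones_like(x), x))          # columns: constant, slope
W = np.diag(1.0 / sigma**2)                        # weights = 1/sigma**2
XtWX = np.dot(X.T, np.dot(W, X))
params = np.linalg.solve(XtWX, np.dot(X.T, np.dot(W, y)))
cov = np.linalg.inv(XtWX)                          # unscaled covariance
resid = (y - np.dot(X, params)) / sigma
s_sq = np.dot(resid, resid) / (len(y) - len(params))

print(params)                        # approx [0.0, 1.04516129]
print(np.sqrt(np.diag(cov)))         # unscaled: approx [0.122, 0.046]
print(np.sqrt(np.diag(s_sq * cov)))  # scaled: approx [0.338, 0.126], matching the minuit/statsmodels numbers below
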
>> >> statsmodels.api.WLS >> popt: [ -6.66133815e-16 1.04516129e+00] >> perr: [ 0.33828314 0.12647671] >> scipy.optimize.curve_fit >> popt: [ 8.53964011e-08 1.04516128e+00] >> perr: [ 0.27452122 0.09784324] >> >> >>> minuit >>> popt: [-4.851674617611934e-14, 1.0451612903225629] >>> perr: [ 0.33828315 0.12647671] > > statsmodels.api.WLS > popt: [ -4.90926744e-16 1.04516129e+00] > perr: [ 0.33828314 0.12647671] > statsmodels NonlinearLS > popt: [ -3.92166386e-08 1.04516130e+00] > perr: [ 0.33828314 0.12647671] > > > finally, I got some bugs out of the weights handling, but still not fully tested > > def run_nonlinearls(): > from scikits.statsmodels.miscmodels.nonlinls import NonlinearLS > > class Myfunc(NonlinearLS): > > def _predict(self, params): > x = self.exog > a, b = params > return a + b*x > > mod = Myfunc(y, x, sigma=sigma**2) > res = mod.fit(start_value=(0.042, 0.42)) > print ('statsmodels NonlinearLS') > print('popt: {0}'.format(res.params)) > print('perr: {0}'.format(res.bse)) > I'm looking forward to using NonlinearLS once it makes it's way in master. > The basics is the same as curve_fit using leastsq, but it uses complex > derivatives which are usually numerically very good. > > So it looks like the problems with curve_fit in your example are only > in the numerically derivatives that leastsq is using for the Jacobian. > > If leastsq is using only forward differences, then it might be better > to calculate the final Jacobian with centered differences. just a > guess. > > >> >> statsmodels and minuit agree pretty well >> >>> """Example from >>> http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html""" >>> x = np.arange(5) >>> y = np.array([1, -2, 1, -2, 1]) >>> sigma = 10 * np.array([1, 2, 1, 2, 1]) >>> statsmodels.api.WLS >>> popt: [ 5.71428571e-01 7.63278329e-17] >>> perr: [ 0.98609784 0.38892223] >>> scipy.optimize.curve_fit >>> popt: [ 5.71428662e-01 -8.73679511e-08] >>> perr: [ 0.97804034 0.3818681 ] >>> minuit >>> popt: [0.5714285714294132, 2.1449508835758024e-13] >>> perr: [ 0.98609784 0.38892223] > > statsmodels.api.WLS > popt: [ 5.71428571e-01 1.94289029e-16] > perr: [ 0.98609784 0.38892223] > statsmodels NonlinearLS > popt: [ 5.71428387e-01 8.45750929e-08] > perr: [ 0.98609784 0.38892223] > > Josef > >> >> statsmodels and minuit agree, >> >> my guess is that the jacobian calculation of leastsq (curve_fit) is >> not very good in these examples. Maybe trying Dfun or the other >> options, epsfcn, will help. >> >> I was trying to see whether I get better results calculation the >> numerical derivatives in a different way, but had to spend the time >> fixing bugs. >> (NonlinearLS didn't work correctly with weights.) >> >> Josef >> >>> >>> >>> >>> >>> Looking at the source code I see that scipy.optimize.curve_fit multiplies >>> >>> the pcov obtained from scipy.optimize.leastsq by a factor s_sq: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438 >>> >>> if (len(ydata) > len(p0)) and pcov is not None: >>> >>> s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0)) >>> >>> pcov = pcov * s_sq >>> >>> If so is it possible to add an explanation to >>> >>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html >>> >>> that pcov is multiplied with this s_sq factor and why that will give correct >>> >>> errors? 
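For what it's worth, the effect of that factor is easy to see, and to undo, after the fact. A minimal sketch using the straight-line example from the start of the thread, assuming the weights enter as 1/sigma**2 (and ignoring the numerical-Jacobian noise discussed above):

import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a + b * x

x = np.arange(5.)
y = np.array([1., -2., 1., -2., 1.])
sigma = np.array([1., 2., 1., 2., 1.])

popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
resid = (y - f(x, *popt)) / sigma
s_sq = np.dot(resid, resid) / (len(y) - len(popt))   # same factor curve_fit applies
pcov_unscaled = pcov / s_sq                           # divide it back out

print(np.sqrt(np.diag(pcov)))            # scaled: what curve_fit reports, roughly independent of sigma
print(np.sqrt(np.diag(pcov_unscaled)))   # unscaled: grows linearly with the supplied sigma
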
>>> >>> After I noticed this issue I saw that this s_sq factor is mentioned in the >>> >>> cov_x return parameter description of leastsq, >>> >>> but I think it should be explained in curve_fit where it is applied, maybe >>> >>> leaving a reference in the cov_x leastsq description. >>> >>> Also it would be nice to mention the full_output option in the curve_fit >>> >>> docu, I only realized after looking at the source code that this was >>> >>> possible. >>> >>> Christoph >>> >>> _______________________________________________ >>> >>> SciPy-User mailing list >>> >>> SciPy-User at scipy.org >>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From charlesr.harris at gmail.com Thu Sep 1 08:18:12 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Sep 2011 06:18:12 -0600 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit In-Reply-To: References: <089E4569-0C53-4FC9-9B0F-353C4EF64478@googlemail.com> Message-ID: On Thu, Sep 1, 2011 at 4:04 AM, Christoph Deil < deil.christoph at googlemail.com> wrote: > > On Sep 1, 2011, at 3:45 AM, josef.pktd at gmail.com wrote: > > > On Wed, Aug 31, 2011 at 12:10 PM, wrote: > >> On Wed, Aug 31, 2011 at 11:09 AM, Christoph Deil > >> wrote: > >>> > >>> On Aug 30, 2011, at 11:25 PM, josef.pktd at gmail.com wrote: > >>> > >>> On Tue, Aug 30, 2011 at 4:15 PM, Christoph Deil > >>> wrote: > >>> > >>> I noticed that scipy.optimize.curve_fit returns parameter errors that > don't > >>> > >>> scale with sigma, the standard deviation of ydata, as I expected. > >>> > >>> Here is a code snippet to illustrate my point, which fits a straight > line to > >>> > >>> five data points: > >>> > >>> import numpy as np > >>> > >>> from scipy.optimize import curve_fit > >>> > >>> x = np.arange(5) > >>> > >>> y = np.array([1, -2, 1, -2, 1]) > >>> > >>> sigma = np.array([1, 2, 1, 2, 1]) > >>> > >>> def f(x, a, b): > >>> > >>> return a + b * x > >>> > >>> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma) > >>> > >>> perr = np.sqrt(pcov.diagonal()) > >>> > >>> print('*** sigma = {0} ***'.format(sigma)) > >>> > >>> print('popt: {0}'.format(popt)) > >>> > >>> print('perr: {0}'.format(perr)) > >>> > >>> I get the following result: > >>> > >>> *** sigma = [1 2 1 2 1] *** > >>> > >>> popt: [ 5.71428536e-01 1.19956213e-08] > >>> > >>> perr: [ 0.93867933 0.40391117] > >>> > >>> Increasing sigma by a factor of 10, > >>> > >>> sigma = 10 * np.array([1, 2, 1, 2, 1]) > >>> > >>> I get the following result: > >>> > >>> *** sigma = [10 20 10 20 10] *** > >>> > >>> popt: [ 5.71428580e-01 -2.27625699e-09] > >>> > >>> perr: [ 0.93895295 0.37079075] > >>> > >>> The best-fit values stayed the same as expected. > >>> > >>> But the error on the slope b decreased by 8% (the error on the offset a > >>> > >>> didn't change much) > >>> > >>> I would have expected fit parameter errors to increase with increasing > >>> > >>> errors on the data!? > >>> > >>> Is this a bug? > >>> > >>> No bug in the formulas. 
I tested all of them when curve_fit was added. > >>> > >>> However in your example the numerical cov lacks quite a bit of > >>> precision. Trying your example with different starting values, I get a > >>> 0.05 difference in your perr (std of parameter estimates). > >>> > >>> Trying smaller xtol and ftol doesn't change anything. (?) > >>> > >>> Making ftol = 1e-15 very small I get a different wrong result: > >>> popt: [ 5.71428580e-01 -2.27625699e-09] > >>> perr: [ 0.92582011 0.59868281] > >>> What do I have to do to get a correct answer (say to 5 significant > digits) > >>> from curve_fit for this simple example? > >>> > >>> Since it's linear > >>> > >>> import scikits.statsmodels.api as sm > >>> > >>> x = np.arange(5.) > >>> > >>> y = np.array([1, -2, 1, -2, 1.]) > >>> > >>> sigma = np.array([1, 2, 1, 2, 1.]) > >>> > >>> res = sm.WLS(y, sm.add_constant(x, prepend=True), > weights=1./sigma**2).fit() > >>> > >>> res.params > >>> > >>> array([ 5.71428571e-01, 1.11022302e-16]) > >>> > >>> res.bse > >>> > >>> array([ 0.98609784, 0.38892223]) > >>> > >>> res = sm.WLS(y, sm.add_constant(x, prepend=True), > >>> weights=1./(sigma*10)**2).fit() > >>> > >>> res.params > >>> > >>> array([ 5.71428571e-01, 1.94289029e-16]) > >>> > >>> res.bse > >>> > >>> array([ 0.98609784, 0.38892223]) > >>> > >>> rescaling doesn't change parameter estimates nor perr > >>> > >>> This is what I don't understand. > >>> Why don't the parameter estimate errors increase with increasing errors > >>> sigma on the data points? > >>> If I have less precise measurements, the model parameters should be > less > >>> constrained?! > >>> I was using MINUIT before I learned Scipy and the error definition for > a > >>> chi2 fit given in the MINUIT User Guide > >>> http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node7.html > >>> as well as the example results here > >>> http://code.google.com/p/pyminuit/wiki/GettingStartedGuide > >>> don't mention the factor s_sq that is used in curve_fit to scale pcov. > >>> Is the error definition in the MINUIT manual wrong? > >>> Can you point me to a web resource that explains why the s_sq factor > needs > >>> to be applied to the covariance matrix? > >> > >> It's standard text book information, but Wikipedia seems to be lacking > >> a bit in this. > >> > >> for the linear case > >> http://en.wikipedia.org/wiki/Ordinary_least_squares#Assuming_normality > >> > >> cov_params = sigma^2 (X'X)^{-1} > >> > >> for the non-linear case with leastsq, X is replaced by Jacobian, > >> otherwise everything is the same. > >> > >> However, in your minuit links I saw only the Hessian mentioned (from > >> very fast skimming the pages) > >> > >> With maximum likelihood, the inverse Hessian is the complete > >> covariance matrix, no additional multiplication is necessary. > >> > >> Essentially, these are implementation details depending on how the > >> estimation is calculated, and there are various ways of numerically > >> approximating the Hessian. > >> That's why this is described for optimize.leastsq (incorrectly as > >> Chuck pointed out) and but not in optimize.curve_fit. > >> > >> With leastsquares are maximum likelihood, rescaling both y and > >> f(x,params) has no effect on the parameter estimates, it's just like > >> changing units of y, meters instead of centimeters. > >> > >> I guess scipy.odr would work differently, since it is splitting up the > >> errors between y and x's, but I never looked at the details. > > OK, now I understand, thanks for your explanations. 
> Roughly speaking, scipy.optimize.curve_fit applies a factor s_sq = chi^2 / > ndf to the covariance matrix to account for possibly incorrect overall scale > in the y errors "sigma" of the data. > > I had simply not seen this factor being applied to chi^2 fits before. E.g. > in many physics and astronomy papers, parameter errors from chi^2 fit > results are reported without this factor. Also the manual of the fitting > package used by most physicists (MINUIT) as well as the statistics textbook > I use (Cowan -- Statistical data analysis) don't mention it. > > May I therefore suggest to explicitly mention this factor in the > scipy.optimize.curve_fit docstring to avoid confusion? > > Note that most texts will tell you that \sigma^2 should be estimated independently of the data, i.e., from the known precision of the measurements and so on. That is because using the residuals to estimate \sigma^2 is not all that accurate, especially for small data sets. In practice, that recommendation is commonly ignored, but if you do use the residuals you should keep in mind the potential inaccuracy of the estimate. I think curve_fit should probably return the scaled covariance and the estimate of \sigma^2 separately so that folks can follow the recommended practice if they have decent apriori knowledge of the measurement errors. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Thu Sep 1 11:15:11 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 1 Sep 2011 08:15:11 -0700 Subject: [SciPy-User] Projecting volumes down to 2D In-Reply-To: <4E5F4388.6090406@uci.edu> References: <4E5F4388.6090406@uci.edu> Message-ID: On Thu, Sep 1, 2011 at 1:34 AM, Christoph Gohlke wrote: > > This looks like "maximum intensity projection" visualization > . MIP can be > efficiently implemented using OpenGL by blending together multiple > slices, oriented perpendicular to the projection direction, through a 3D > texture (xyz data). Also consider VTK's vtkVolumeRayCastMIPFunction class. Interesting, and I didn't know that OpenGL could do that. However, I'd already considered and rejected using 3D textures for the application as a whole, because my image data can be so large -- upwards of 512x512x60 for a single timepoint, and not only can there be many timepoints, but users can also request projections through time. So we could be talking gigabytes of texture data here. Currently this program runs well on my rather underpowered laptop, and I'd like to keep things that way if possible. I should still familiarize myself with OpenGL and 3D textures, if for no other reason than to know when things are possible to do with them. -Chris > > Christoph From josef.pktd at gmail.com Thu Sep 1 15:22:45 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 1 Sep 2011 15:22:45 -0400 Subject: [SciPy-User] scipy central comments Message-ID: I'm still in favor of a comment system. http://scipy-central.org/item/22/3/building-a-simple-interactive-2d-data-viewer-with-matplotlib looks nice but what version of matplotlib does it work with? 
>python building_a_simple_interactive_2d_data_viewer_with_matplotlib_v3.py loading C:\Users\josef\.matplotlib\sample_data\ct.raw Traceback (most recent call last): File "building_a_simple_interactive_2d_data_viewer_with_matplotlib_v3.py", lin e 124, in fig_v=viewer_2d(z,x,y) File "building_a_simple_interactive_2d_data_viewer_with_matplotlib_v3.py", lin e 39, in __init__ self.overview=plt.subplot2grid((8,4),(0,0),rowspan=7,colspan=2) File "C:\Python26\lib\site-packages\matplotlib\pyplot.py", line 800, in subplo t2grid a = Subplot(fig, subplotspec, **kwargs) File "C:\Python26\lib\site-packages\matplotlib\axes.py", line 8326, in __init_ _ self.update_params() File "C:\Python26\lib\site-packages\matplotlib\axes.py", line 8358, in update_ params return_all=True) File "C:\Python26\lib\site-packages\matplotlib\gridspec.py", line 378, in get_ position figBottom2 = figBottoms[rowNum2] IndexError: list index out of range Josef From newville at cars.uchicago.edu Thu Sep 1 15:45:31 2011 From: newville at cars.uchicago.edu (Matt Newville) Date: Thu, 1 Sep 2011 14:45:31 -0500 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit (Christoph Deil) Message-ID: Christoph, I ran some tests with the scaling of the covariance matrix from scipy.optimize.leastsq. Using lmfit-py, and scipy.optimize.leastsq, I did fits with sample function of a Gaussian + Line + Simulated Noise (random.normal(scale=0.023). The "data" had 201 points, and the fit had 5 variables. I ran fits with covariance scaling on and off, and "data sigma" = 0.1, 0.2, and 0.5. The full results are below, and the code to run this is https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py You'll need the latest git version of lmfit-py for this to be able to turn on/off the covariance scaling. As you can see from the results below, by scaling the covariance, the estimated uncertainties in the parameters are independent of sigma, the estimated uncertainty in the data. In many cases this is a fabulously convenient feature. As expected, when the covariance matrix is not scaled by sum_sqr / nfree, the estimated uncertainties in the variables depends greatly on the value of sigma. For the "correct" sigma of 0.23 in this one test case, the scaled and unscaled values are very close: amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0] cen_g = 8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1] and so on. The scaled uncertainties appear to be about 10% larger than the unscaled ones. Since this was just one set of data (ie, one set of simulated noise), I'm not sure whether this difference is significant or important to you. Interestingly, these two cases are in better agreement than comparing sigma=0.20 and sigma=0.23 for the unscaled covariance matrix. For myself, I much prefer having estimated uncertainties that may be off by 10% than being expected to know the uncertainties in the data to 10%. But then, I work in a field where we have lots of data and systematic errors in collection and processing swamp any simple estimate of the noise in the data. As a result, I think the scaling should stay the way it is. 
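A compressed, self-contained version of that kind of check can also be run with plain scipy.optimize.leastsq (this only mimics the setup described above, it is not the lmfit test script itself; the model, seed and starting values are made up):

import numpy as np
from scipy.optimize import leastsq

np.random.seed(0)
x = np.linspace(0., 10., 201)
y = 21.0 * np.exp(-(x - 8.1)**2 / (2 * 1.6**2)) + 0.62 * x - 1.023
y = y + np.random.normal(scale=0.23, size=x.size)

def resid(p, sigma):
    amp, cen, wid, off, slope = p
    model = amp * np.exp(-(x - cen)**2 / (2 * wid**2)) + slope * x + off
    return (y - model) / sigma

for sigma in (0.1, 0.23, 0.5):
    p0 = (10., 9., 1., 0., 0.)
    popt, cov, info, mesg, ier = leastsq(resid, p0, args=(sigma,), full_output=True)
    s_sq = (resid(popt, sigma)**2).sum() / (len(x) - len(popt))
    print(sigma, np.sqrt(np.diag(cov)))          # unscaled: proportional to the assumed sigma
    print(sigma, np.sqrt(np.diag(s_sq * cov)))   # scaled: independent of the assumed sigma
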
Cheers, --Matt Newville 630-252-0431 ==== scale_covar = True === sigma = 0.1 chisqr = 860.573615001 reduced_chisqr = 4.39068170919 amp_g: 20.99201+/- 0.05423 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) ============================== sigma = 0.2 chisqr = 215.14340375 reduced_chisqr = 1.0976704273 amp_g: 20.99201+/- 0.05423 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) ============================== sigma = 0.23 chisqr = 162.679322306 reduced_chisqr = 0.82999654238 amp_g: 20.99201+/- 0.05423 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) ============================== sigma = 0.5 chisqr = 34.4229446 reduced_chisqr = 0.175627268368 amp_g: 20.99201+/- 0.05423 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) ============================== ==== scale_covar = False === sigma = 0.1 chisqr = 860.573615001 reduced_chisqr = 4.39068170919 amp_g: 20.99201+/- 0.02588 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00189 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.01660 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00127 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00242 (inital= 1.000000, model_value= 1.600000) ============================== sigma = 0.2 chisqr = 215.14340375 reduced_chisqr = 1.0976704273 amp_g: 20.99201+/- 0.05177 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00378 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.03320 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00254 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00485 (inital= 1.000000, model_value= 1.600000) ============================== sigma = 0.23 chisqr = 162.679322306 reduced_chisqr = 0.82999654238 amp_g: 20.99201+/- 0.05953 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00435 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.03818 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00292 (inital= 0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.00558 (inital= 1.000000, model_value= 1.600000) ============================== sigma = 0.5 chisqr = 34.4229446 reduced_chisqr = 0.175627268368 amp_g: 20.99201+/- 0.12941 (inital= 10.000000, model_value= 21.000000) cen_g: 8.09857+/- 0.00945 (inital= 9.000000, model_value= 8.100000) line_off: -0.99346+/- 0.08300 (inital= 0.000000, model_value=-1.023000) line_slope: 0.61737+/- 0.00636 (inital= 
0.000000, model_value= 0.620000) wid_g: 1.59679+/- 0.01212 (inital= 1.000000, model_value= 1.600000) ============================== From cweisiger at msg.ucsf.edu Thu Sep 1 17:24:03 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 1 Sep 2011 14:24:03 -0700 Subject: [SciPy-User] Projecting volumes down to 2D In-Reply-To: References: <4E5F4388.6090406@uci.edu> Message-ID: On Thu, Sep 1, 2011 at 8:15 AM, Chris Weisiger wrote: > On Thu, Sep 1, 2011 at 1:34 AM, Christoph Gohlke wrote: >> >> This looks like "maximum intensity projection" visualization >> . MIP can be >> efficiently implemented using OpenGL by blending together multiple >> slices, oriented perpendicular to the projection direction, through a 3D >> texture (xyz data). Also consider VTK's vtkVolumeRayCastMIPFunction class. > > Interesting, and I didn't know that OpenGL could do that. However, I'd > already considered and rejected using 3D textures for the application > as a whole, because my image data can be so large -- upwards of > 512x512x60 for a single timepoint, and not only can there be many > timepoints, but users can also request projections through time. So we > could be talking gigabytes of texture data here. Currently this > program runs well on my rather underpowered laptop, and I'd like to > keep things that way if possible. Just to followup, the maximum size of a 3D texture on this laptop is only 128 pixels in any direction, so I'd have to do some nasty stitching together of texture blocks to use OpenGL to solve this problem. Nice idea, though. Of course, practically every computer here is more powerful than my laptop, but they don't always have unusually strong graphics cards, and I'd rather not restrict what computers my code can run on. -Chris From deil.christoph at googlemail.com Thu Sep 1 17:29:20 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Thu, 1 Sep 2011 23:29:20 +0200 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit In-Reply-To: References: Message-ID: <84305E17-54AF-4C56-9010-78FED24C3E5D@googlemail.com> On Sep 1, 2011, at 9:45 PM, Matt Newville wrote: > Christoph, > > I ran some tests with the scaling of the covariance matrix from > scipy.optimize.leastsq. Using lmfit-py, and scipy.optimize.leastsq, I > did fits with sample function of a Gaussian + Line + Simulated Noise > (random.normal(scale=0.023). The "data" had 201 points, and the fit > had 5 variables. I ran fits with covariance scaling on and off, and > "data sigma" = 0.1, 0.2, and 0.5. The full results are below, and the > code to run this is > > https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py > > You'll need the latest git version of lmfit-py for this to be able to > turn on/off the covariance scaling. > > As you can see from the results below, by scaling the covariance, the > estimated uncertainties in the parameters are independent of sigma, > the estimated uncertainty in the data. In many cases this is a > fabulously convenient feature. > > As expected, when the covariance matrix is not scaled by sum_sqr / > nfree, the estimated uncertainties in the variables depends greatly on > the value of sigma. For the "correct" sigma of 0.23 in this one test > case, the scaled and unscaled values are very close: > > amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0] > cen_g = 8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1] > > and so on. The scaled uncertainties appear to be about 10% larger > than the unscaled ones. 
Since this was just one set of data (ie, one > set of simulated noise), I'm not sure whether this difference is > significant or important to you. Interestingly, these two cases are > in better agreement than comparing sigma=0.20 and sigma=0.23 for the > unscaled covariance matrix. > > For myself, I much prefer having estimated uncertainties that may be > off by 10% than being expected to know the uncertainties in the data > to 10%. But then, I work in a field where we have lots of data and > systematic errors in collection and processing swamp any simple > estimate of the noise in the data. > > As a result, I think the scaling should stay the way it is. I think there are use cases for scaling and for not scaling. Adding an option to scipy.optimize.curve_fit, as you did for lmfit is a nice solution. Returning the scaling factor s = chi2 / ndf (or chi2 and ndf independently) would be another option to let the user decide what she wants. The numbers you give in your example are small because your chi2 / ndf is approximately one, so your scaling factor is approximately one. If the model doesn't represent the data well, then chi2 / ndf is larger than one and the differences in estimated parameter errors become larger. IMO if the user does give sigmas to curve_fit, it means that she has reason to believe that these are the errors on the data points and thus the default should be to not apply the scaling factor in that case. On the other hand at the moment the scaling factor is always applied, so having a keyword option scale_covariance=True as default means backwards compatibility. > > Cheers, > > --Matt Newville 630-252-0431 > > ==== scale_covar = True === > sigma = 0.1 > chisqr = 860.573615001 > reduced_chisqr = 4.39068170919 > amp_g: 20.99201+/- 0.05423 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) > ============================== > sigma = 0.2 > chisqr = 215.14340375 > reduced_chisqr = 1.0976704273 > amp_g: 20.99201+/- 0.05423 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) > ============================== > sigma = 0.23 > chisqr = 162.679322306 > reduced_chisqr = 0.82999654238 > amp_g: 20.99201+/- 0.05423 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) > ============================== > sigma = 0.5 > chisqr = 34.4229446 > reduced_chisqr = 0.175627268368 > amp_g: 20.99201+/- 0.05423 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00396 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.03478 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00266 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00508 (inital= 1.000000, model_value= 1.600000) > ============================== 
> ==== scale_covar = False === > sigma = 0.1 > chisqr = 860.573615001 > reduced_chisqr = 4.39068170919 > amp_g: 20.99201+/- 0.02588 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00189 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.01660 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00127 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00242 (inital= 1.000000, model_value= 1.600000) > ============================== > sigma = 0.2 > chisqr = 215.14340375 > reduced_chisqr = 1.0976704273 > amp_g: 20.99201+/- 0.05177 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00378 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.03320 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00254 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00485 (inital= 1.000000, model_value= 1.600000) > ============================== > sigma = 0.23 > chisqr = 162.679322306 > reduced_chisqr = 0.82999654238 > amp_g: 20.99201+/- 0.05953 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00435 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.03818 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00292 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.00558 (inital= 1.000000, model_value= 1.600000) > ============================== > sigma = 0.5 > chisqr = 34.4229446 > reduced_chisqr = 0.175627268368 > amp_g: 20.99201+/- 0.12941 (inital= 10.000000, > model_value= 21.000000) > cen_g: 8.09857+/- 0.00945 (inital= 9.000000, model_value= 8.100000) > line_off: -0.99346+/- 0.08300 (inital= 0.000000, model_value=-1.023000) > line_slope: 0.61737+/- 0.00636 (inital= 0.000000, model_value= 0.620000) > wid_g: 1.59679+/- 0.01212 (inital= 1.000000, model_value= 1.600000) > ============================== From cgohlke at uci.edu Thu Sep 1 17:37:38 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 01 Sep 2011 14:37:38 -0700 Subject: [SciPy-User] Projecting volumes down to 2D In-Reply-To: References: <4E5F4388.6090406@uci.edu> Message-ID: <4E5FFB22.2030907@uci.edu> On 9/1/2011 2:24 PM, Chris Weisiger wrote: > On Thu, Sep 1, 2011 at 8:15 AM, Chris Weisiger wrote: >> On Thu, Sep 1, 2011 at 1:34 AM, Christoph Gohlke wrote: >>> >>> This looks like "maximum intensity projection" visualization >>> . MIP can be >>> efficiently implemented using OpenGL by blending together multiple >>> slices, oriented perpendicular to the projection direction, through a 3D >>> texture (xyz data). Also consider VTK's vtkVolumeRayCastMIPFunction class. >> >> Interesting, and I didn't know that OpenGL could do that. However, I'd >> already considered and rejected using 3D textures for the application >> as a whole, because my image data can be so large -- upwards of >> 512x512x60 for a single timepoint, and not only can there be many >> timepoints, but users can also request projections through time. So we >> could be talking gigabytes of texture data here. Currently this >> program runs well on my rather underpowered laptop, and I'd like to >> keep things that way if possible. > > Just to followup, the maximum size of a 3D texture on this laptop is > only 128 pixels in any direction, so I'd have to do some nasty > stitching together of texture blocks to use OpenGL to solve this > problem. Nice idea, though. 
Of course, practically every computer here > is more powerful than my laptop, but they don't always have unusually > strong graphics cards, and I'd rather not restrict what computers my > code can run on. Fair enough. Just wondering how your underpowered laptop can run well processing multiple gigabyte volumes with ndimage.map_coordinates :) Christoph > > -Chris > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Thu Sep 1 21:31:22 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 1 Sep 2011 21:31:22 -0400 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit In-Reply-To: <84305E17-54AF-4C56-9010-78FED24C3E5D@googlemail.com> References: <84305E17-54AF-4C56-9010-78FED24C3E5D@googlemail.com> Message-ID: On Thu, Sep 1, 2011 at 5:29 PM, Christoph Deil wrote: > > On Sep 1, 2011, at 9:45 PM, Matt Newville wrote: > >> Christoph, >> >> I ran some tests with the scaling of the covariance matrix from >> scipy.optimize.leastsq. ?Using lmfit-py, and scipy.optimize.leastsq, I >> did fits with sample function of a Gaussian + Line + Simulated Noise >> (random.normal(scale=0.023). ?The "data" had 201 points, and the fit >> had 5 variables. ?I ran fits with covariance scaling on and off, and >> "data sigma" = 0.1, 0.2, and 0.5. ?The full results are below, and the >> code to run this is >> >> ? https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py >> >> You'll need the latest git version of lmfit-py for this to be able to >> turn on/off the covariance scaling. >> >> As you can see from the results below, by scaling the covariance, the >> estimated uncertainties in the parameters are independent of sigma, >> the estimated uncertainty in the data. ?In many cases this is a >> fabulously convenient feature. >> >> As expected, when the covariance matrix is not scaled by sum_sqr / >> nfree, the estimated uncertainties in the variables depends greatly on >> the value of sigma. ?For the "correct" sigma of 0.23 in this one test >> case, the scaled and unscaled values are very close: >> >> ? amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0] >> ? cen_g = ?8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1] >> >> and so on. ?The scaled uncertainties appear to be about 10% larger >> than the unscaled ones. ?Since this was just one set of data (ie, one >> set of simulated noise), I'm not sure whether this difference is >> significant or important to you. ?Interestingly, these two cases are >> in better agreement than comparing sigma=0.20 and sigma=0.23 for the >> unscaled covariance matrix. >> >> For myself, I much prefer having estimated uncertainties that may be >> off by 10% than being expected to know the uncertainties in the data >> to 10%. ?But then, I work in a field where we have lots of data and >> systematic errors in collection and processing swamp any simple >> estimate of the noise in the data. >> >> As a result, I think the scaling should stay the way it is. > > I think there are use cases for scaling and for not scaling. > Adding an option to scipy.optimize.curve_fit, as you did for lmfit is a nice solution. > Returning the scaling factor s = chi2 / ndf (or chi2 and ndf independently) would be another option to let the user decide what she wants. 
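One possible shape for such an option, sketched as a thin wrapper around the current curve_fit (the function name and the scale_covariance keyword are only illustrative, not an existing scipy interface; it also returns the s = chi2/ndf factor itself):

import numpy as np
from scipy.optimize import curve_fit

def curve_fit_with_option(f, xdata, ydata, p0=None, sigma=None,
                          scale_covariance=True, **kw):
    # hypothetical wrapper: scale_covariance=True keeps the current behaviour,
    # scale_covariance=False divides the s_sq = chi2/ndf factor back out so the
    # parameter errors follow the sigma supplied by the user
    popt, pcov = curve_fit(f, xdata, ydata, p0=p0, sigma=sigma, **kw)
    resid = ydata - f(xdata, *popt)
    if sigma is not None:
        resid = resid / np.asarray(sigma)
    s_sq = np.dot(resid, resid) / (len(ydata) - len(popt))
    if not scale_covariance:
        pcov = pcov / s_sq
    return popt, pcov, s_sq
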
> > The numbers you give in your example are small because your chi2 / ndf is approximately one, so your scaling factor is approximately one. > If the model doesn't represent the data well, then chi2 / ndf is larger than one and the differences in estimated parameter errors become larger. > > IMO if the user does give sigmas to curve_fit, it means that she has reason to believe that these are the errors on the data points > and thus the default should be to not apply the scaling factor in that case. > On the other hand at the moment the scaling factor is always applied, so having a keyword option > scale_covariance=True as default means backwards compatibility. I think it depends on the expectation of the user, and whether the options and descriptions are confusing. >From what I have seen, I would always expect that the estimator also estimates the error variance. Given noise, functional approximation error and left out variables (environment), I have never seen a case that would specify the error variance in advance, (although there might be different options to estimate the scale as in robust models, and there are two stage estimators.) Before curve_fit there was quite a bit of confusion on the mailing list, if I remember correctly, whether cov_x of leastsq has to be scaled or not. If there are no weights/sigma, then cov_x is not the covariance matrix of the parameter estimate (unless y is scaled such that the error variance is 1). Will it confuse many users of curve_fit if they have to figure out whether they need scaled or not? Also, if the default treatment differs depending on whether sigma is specified or not, it might be confusing. The terminology for "sigma" might be a bit confusing. In statsmodels the similar model uses "weights", and weighted least squares (WLS) has at least in my textbooks a clear definition, in contrast to "curve_fit". sigma might indicate already scaled variances, but I had interpreted it just as 1./weights for WLS. In statsmodels, we use sigma for the unscaled covariance matrix of the entire vector of errors in generalized least squares, GLS. Textbook often use something like cov_errors = \sigma * \SIGMA . sigma in statsmodels GLS is just used to transform the variables to standard OLS. I'm in favor of keeping the current scaled cov_params as default. curve_fit produces now the same results as minuit except for numerical problems. Additional options are fine as long as they are clearly explained. Josef (not a user of curve_fit but of leastsq) > >> >> Cheers, >> >> --Matt Newville 630-252-0431 From newville at cars.uchicago.edu Fri Sep 2 02:43:13 2011 From: newville at cars.uchicago.edu (Matt Newville) Date: Fri, 2 Sep 2011 01:43:13 -0500 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit In-Reply-To: <84305E17-54AF-4C56-9010-78FED24C3E5D@googlemail.com> References: <84305E17-54AF-4C56-9010-78FED24C3E5D@googlemail.com> Message-ID: Hi Cristoph, On Thu, Sep 1, 2011 at 4:29 PM, Christoph Deil wrote: > > On Sep 1, 2011, at 9:45 PM, Matt Newville wrote: > >> Christoph, >> >> I ran some tests with the scaling of the covariance matrix from >> scipy.optimize.leastsq. ?Using lmfit-py, and scipy.optimize.leastsq, I >> did fits with sample function of a Gaussian + Line + Simulated Noise >> (random.normal(scale=0.023). ?The "data" had 201 points, and the fit >> had 5 variables. ?I ran fits with covariance scaling on and off, and >> "data sigma" = 0.1, 0.2, and 0.5. ?The full results are below, and the >> code to run this is >> >> ? 
https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py >> >> You'll need the latest git version of lmfit-py for this to be able to >> turn on/off the covariance scaling. >> >> As you can see from the results below, by scaling the covariance, the >> estimated uncertainties in the parameters are independent of sigma, >> the estimated uncertainty in the data. ?In many cases this is a >> fabulously convenient feature. >> >> As expected, when the covariance matrix is not scaled by sum_sqr / >> nfree, the estimated uncertainties in the variables depends greatly on >> the value of sigma. ?For the "correct" sigma of 0.23 in this one test >> case, the scaled and unscaled values are very close: >> >> ? amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0] >> ? cen_g = ?8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1] >> >> and so on. ?The scaled uncertainties appear to be about 10% larger >> than the unscaled ones. ?Since this was just one set of data (ie, one >> set of simulated noise), I'm not sure whether this difference is >> significant or important to you. ?Interestingly, these two cases are >> in better agreement than comparing sigma=0.20 and sigma=0.23 for the >> unscaled covariance matrix. >> >> For myself, I much prefer having estimated uncertainties that may be >> off by 10% than being expected to know the uncertainties in the data >> to 10%. ?But then, I work in a field where we have lots of data and >> systematic errors in collection and processing swamp any simple >> estimate of the noise in the data. >> >> As a result, I think the scaling should stay the way it is. > > I think there are use cases for scaling and for not scaling. > Adding an option to scipy.optimize.curve_fit, as you did for lmfit is a nice solution. > Returning the scaling factor s = chi2 / ndf (or chi2 and ndf independently) would be another option to let the user decide what she wants. > > The numbers you give in your example are small because your chi2 / ndf is approximately one, so your scaling factor is approximately one. Ah, right. I see your point. Scaling the covariance matrix is equivalent to asserting that the fit is good (reduced chi_square = 1), and so scaling sigma such that this is the case, and getting the parameter uncertainties accordingly. This, and the fact that reduced chi_square was slightly less than 1 (0.83) in the example I gave, explains the ~10% difference in uncertainties. But again, that's an idealized case. > If the model doesn't represent the data well, then chi2 / ndf is larger than one and the differences in estimated parameter errors become larger. I think the question is: should the estimated uncertainties reflect the imperfection of the model? I can see merit in both methods (which differ by a factor of sqrt(reduced_chi_square)): a) Given an estimate of sigma, estimate the uncertainties. Use unscaled covariance. b) Assert that this is a "good fit", estimate the uncertainties. Use scaled covariance. That is, one might have a partial estimate of sigma, and use reduced chi_square maritally to assess how good this is. > IMO if the user does give sigmas to curve_fit, it means that she has reason to believe that these are the errors on the data points > and thus the default should be to not apply the scaling factor in that case. > On the other hand at the moment the scaling factor is always applied, so having a keyword option > scale_covariance=True as default means backwards compatibility. 
I believe you're proposing that default behavior should be "if sigma is given, use method a, otherwise use method b". I think that's reasonable, but don't have a strong opinion. I'm not sure I see a case for changing curve_fit, but I'm not committed to the behavior of lmfit-py. Perhaps the best thing to do would be to leave covariance matrix unscaled, but scale the estimated uncertainties as you propose. Cheers, --Matt Newville 630-252-0431 From bsouthey at gmail.com Fri Sep 2 09:59:56 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 02 Sep 2011 08:59:56 -0500 Subject: [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit In-Reply-To: References: <84305E17-54AF-4C56-9010-78FED24C3E5D@googlemail.com> Message-ID: <4E60E15C.8080005@gmail.com> On 09/02/2011 01:43 AM, Matt Newville wrote: > Hi Cristoph, > > On Thu, Sep 1, 2011 at 4:29 PM, Christoph Deil > wrote: >> On Sep 1, 2011, at 9:45 PM, Matt Newville wrote: >> >>> Christoph, >>> >>> I ran some tests with the scaling of the covariance matrix from >>> scipy.optimize.leastsq. Using lmfit-py, and scipy.optimize.leastsq, I >>> did fits with sample function of a Gaussian + Line + Simulated Noise >>> (random.normal(scale=0.023). The "data" had 201 points, and the fit >>> had 5 variables. I ran fits with covariance scaling on and off, and >>> "data sigma" = 0.1, 0.2, and 0.5. The full results are below, and the >>> code to run this is >>> >>> https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py >>> >>> You'll need the latest git version of lmfit-py for this to be able to >>> turn on/off the covariance scaling. >>> >>> As you can see from the results below, by scaling the covariance, the >>> estimated uncertainties in the parameters are independent of sigma, >>> the estimated uncertainty in the data. In many cases this is a >>> fabulously convenient feature. >>> >>> As expected, when the covariance matrix is not scaled by sum_sqr / >>> nfree, the estimated uncertainties in the variables depends greatly on >>> the value of sigma. For the "correct" sigma of 0.23 in this one test >>> case, the scaled and unscaled values are very close: >>> >>> amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0] >>> cen_g = 8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1] >>> >>> and so on. The scaled uncertainties appear to be about 10% larger >>> than the unscaled ones. Since this was just one set of data (ie, one >>> set of simulated noise), I'm not sure whether this difference is >>> significant or important to you. Interestingly, these two cases are >>> in better agreement than comparing sigma=0.20 and sigma=0.23 for the >>> unscaled covariance matrix. >>> >>> For myself, I much prefer having estimated uncertainties that may be >>> off by 10% than being expected to know the uncertainties in the data >>> to 10%. But then, I work in a field where we have lots of data and >>> systematic errors in collection and processing swamp any simple >>> estimate of the noise in the data. >>> >>> As a result, I think the scaling should stay the way it is. >> I think there are use cases for scaling and for not scaling. >> Adding an option to scipy.optimize.curve_fit, as you did for lmfit is a nice solution. >> Returning the scaling factor s = chi2 / ndf (or chi2 and ndf independently) would be another option to let the user decide what she wants. >> >> The numbers you give in your example are small because your chi2 / ndf is approximately one, so your scaling factor is approximately one. 
> Ah, right. I see your point. Scaling the covariance matrix is > equivalent to asserting that the fit is good (reduced chi_square = 1), > and so scaling sigma such that this is the case, and getting the > parameter uncertainties accordingly. This, and the fact that reduced > chi_square was slightly less than 1 (0.83) in the example I gave, > explains the ~10% difference in uncertainties. But again, that's an > idealized case. > >> If the model doesn't represent the data well, then chi2 / ndf is larger than one and the differences in estimated parameter errors become larger. > I think the question is: should the estimated uncertainties reflect > the imperfection of the model? I can see merit in both methods (which > differ by a factor of sqrt(reduced_chi_square)): > > a) Given an estimate of sigma, estimate the uncertainties. Use > unscaled covariance. > b) Assert that this is a "good fit", estimate the uncertainties. > Use scaled covariance. > > That is, one might have a partial estimate of sigma, and use reduced > chi_square maritally to assess how good this is. > >> IMO if the user does give sigmas to curve_fit, it means that she has reason to believe that these are the errors on the data points >> and thus the default should be to not apply the scaling factor in that case. >> On the other hand at the moment the scaling factor is always applied, so having a keyword option >> scale_covariance=True as default means backwards compatibility. > I believe you're proposing that default behavior should be "if sigma > is given, use method a, otherwise use method b". I think that's > reasonable, but don't have a strong opinion. I'm not sure I see a > case for changing curve_fit, but I'm not committed to the behavior of > lmfit-py. Perhaps the best thing to do would be to leave covariance > matrix unscaled, but scale the estimated uncertainties as you propose. > > Cheers, > > --Matt Newville 630-252-0431 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I think there is a confusion about the methodology. Josef provided the normal equations but as he indicated, those come from the assumption of independently and identically distributed residuals/errors (assumption of normality is only relevant for hypothesis testing). It is more instructive to use the generalized least squares (http://en.wikipedia.org/wiki/Generalized_least_squares) because that essentially avoids that assumption. X'*V^{-1}*X*b = X'*V^{-1}*Y Where V is the variance-covariance matrix of the residuals. For example, ordinary least square treats it as identity matrix * residual variance so the whole thing 'collapses' to: X'*X*b = X'*Y Under this formulation, you must know in advance about incorporation of the residual variance. Most programs assume some structure matrix times a common variance. To me the 'sigma' argument of scipy.optimize.curve_fit doc string appears to be the same as supplying a weight statement in weighted least squares. So I would not expect that you get exact same solutions between your two approaches. I do not know what you are doing in lmfit-py or by 'scaling', but it seem like you have just scaled the data by a value close to the estimated residual variance (from the standard errors provided, 0.00435/0.00396=1.1). That is you are just providing a common weight to all observations of about 1.1 (or 1.1*1.1 depending how the weight is being used as some program want to squared or the inverse). 
But you are forgetting that common weight in your comparison. Consequently you have to be able to test the variance covariance structures (see mixed models for more details). Bruce From cweisiger at msg.ucsf.edu Fri Sep 2 11:15:01 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Fri, 2 Sep 2011 08:15:01 -0700 Subject: [SciPy-User] Projecting volumes down to 2D In-Reply-To: <4E5FFB22.2030907@uci.edu> References: <4E5F4388.6090406@uci.edu> <4E5FFB22.2030907@uci.edu> Message-ID: On Thu, Sep 1, 2011 at 2:37 PM, Christoph Gohlke wrote: > > > On 9/1/2011 2:24 PM, Chris Weisiger wrote: >> >> Just to followup, the maximum size of a 3D texture on this laptop is >> only 128 pixels in any direction, so I'd have to do some nasty >> stitching together of texture blocks to use OpenGL to solve this >> problem. Nice idea, though. Of course, practically every computer here >> is more powerful than my laptop, but they don't always have unusually >> strong graphics cards, and I'd rather not restrict what computers my >> code can run on. > > Fair enough. Just wondering how your underpowered laptop can run well > processing multiple gigabyte volumes with ndimage.map_coordinates :) Video memory is separate from RAM, and RAM can page out to disk if it has to. So the heavy lifting may be slower on this laptop than it is on the heavier machines, but it does work. But the original point of the thread was that I wasn't processing multiple-gigabyte volumes on this laptop, for the most part -- once the original volume was loaded into memory, every other process was working only with 2D slices, which are much faster. A related question: is there some function in Scipy that will, given an arbitrary ray, return either the values in a data volume that the ray passes through, or get the largest volume, etc.? Turns out I now have two projects that could use that ability. -Chris From mail.till at gmx.de Fri Sep 2 11:34:39 2011 From: mail.till at gmx.de (Till Stensitzki) Date: Fri, 2 Sep 2011 15:34:39 +0000 (UTC) Subject: [SciPy-User] scipy central comments References: Message-ID: gmail.com> writes: > > I'm still in favor of a comment system. > +1 > http://scipy-central.org/item/22/3/building-a-simple-interactive-2d-data-viewer-with-matplotlib > > looks nice but what version of matplotlib does it work with? > It works with matplotlib 1.0.1. I added a remark to the source (and removed a buggy print statement). I think doing the layout manually instead using gridspec should make it work with earlier versions. greeting Till From josef.pktd at gmail.com Fri Sep 2 12:18:23 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Sep 2011 12:18:23 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: References: Message-ID: On Fri, Sep 2, 2011 at 11:34 AM, Till Stensitzki wrote: > ? gmail.com> writes: > >> >> I'm still in favor of a comment system. >> > > +1 > >> > http://scipy-central.org/item/22/3/building-a-simple-interactive-2d-data-viewer-with-matplotlib >> >> looks nice but what version of matplotlib does it work with? >> > > It works with matplotlib 1.0.1. I added a remark to the source (and removed a > buggy print statement). I think doing the layout manually instead using gridspec > should make it work with earlier versions. Thanks >>> matplotlib.__version__ '1.0.0' I thought I had upgraded, but maybe in a different version of python. I better do it soon. 
Cheers, Josef > > greeting > Till > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dlaxalde at gmail.com Fri Sep 2 14:31:46 2011 From: dlaxalde at gmail.com (Denis Laxalde) Date: Fri, 2 Sep 2011 14:31:46 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency References: Message-ID: <20110902143146.38224d3c@mail.gmail.com> Hi, (I'm resurrecting an old post.) On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: > > I just found that for some functions such as fmin_bfgs, the argument name > > for the objective function to be minimized is f, and for others such as > > fmin, it is func. > > I was wondering if this was intended, because I think it would be better to > > have consistent argument names across those functions. > > > > It's unlikely that that was intentional. A patch would be welcome. "func" > looks better to me than "f" or "F". There are still several inconsistencies in input or output of functions in the optimize package. For instance, for input parameters the Jacobian is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or 'x_tol', etc. Outputs might be returned in a different order, e.g., fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, cov_x, infodict, mesg, ier'. Some functions make use of the infodict output whereas some return the same data individually. etc. If you still believe (as I do) that consistency of optimize functions should be improved, I can work on it. Let me know. -- Denis From denis.laxalde at mcgill.ca Fri Sep 2 10:31:03 2011 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Fri, 2 Sep 2011 10:31:03 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: Message-ID: <20110902103103.708d5597@mcgill.ca> Hi, (I'm resurrecting an old post.) On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: > > I just found that for some functions such as fmin_bfgs, the argument name > > for the objective function to be minimized is f, and for others such as > > fmin, it is func. > > I was wondering if this was intended, because I think it would be better to > > have consistent argument names across those functions. > > It's unlikely that that was intentional. A patch would be welcome. "func" > looks better to me than "f" or "F". There are still several inconsistencies in input or output of functions in the optimize package. For instance, for input parameters the Jacobian is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or 'x_tol', etc. Also, outputs might be returned in a different order, e.g., fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, cov_x, infodict, mesg, ier'. If you still believe (as I do) that consistency of optimize functions should be improved, I can work on it. Let me know. -- Denis From jsseabold at gmail.com Sat Sep 3 11:42:37 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 3 Sep 2011 11:42:37 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <20110902103103.708d5597@mcgill.ca> References: <20110902103103.708d5597@mcgill.ca> Message-ID: On Fri, Sep 2, 2011 at 10:31 AM, Denis Laxalde wrote: > Hi, > > (I'm resurrecting an old post.) 
> > On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >> On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: >> > I just found that for some functions such as fmin_bfgs, the argument name >> > for the objective function to be minimized is f, and for others such as >> > fmin, it is func. >> > I was wondering if this was intended, because I think it would be better to >> > have consistent argument names across those functions. >> >> It's unlikely that that was intentional. A patch would be welcome. "func" >> looks better to me than "f" or "F". > > There are still several inconsistencies in input or output of functions > in the optimize package. For instance, for input parameters the Jacobian > is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or > 'x_tol', etc. Also, outputs might be returned in a different order, > e.g., fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns > 'x, cov_x, infodict, mesg, ier'. > > If you still believe (as I do) that consistency of optimize > functions should be improved, I can work on it. Let me know. > +1. I'd like to see the input and outputs streamlined as much as possible. It would also be nice to have a convenience wrapper around all the optimizers so that you can use them with one function. You'll have to deprecate the old signatures though. Skipper From cjordan1 at uw.edu Sat Sep 3 11:58:49 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Sat, 3 Sep 2011 11:58:49 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902103103.708d5597@mcgill.ca> Message-ID: On Sat, Sep 3, 2011 at 11:42 AM, Skipper Seabold wrote: > On Fri, Sep 2, 2011 at 10:31 AM, Denis Laxalde wrote: >> Hi, >> >> (I'm resurrecting an old post.) >> >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >>> On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: >>> > I just found that for some functions such as fmin_bfgs, the argument name >>> > for the objective function to be minimized is f, and for others such as >>> > fmin, it is func. >>> > I was wondering if this was intended, because I think it would be better to >>> > have consistent argument names across those functions. >>> >>> It's unlikely that that was intentional. A patch would be welcome. "func" >>> looks better to me than "f" or "F". >> >> There are still several inconsistencies in input or output of functions >> in the optimize package. For instance, for input parameters the Jacobian >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or >> 'x_tol', etc. Also, outputs might be returned in a different order, >> e.g., fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns >> 'x, cov_x, infodict, mesg, ier'. >> >> If you still believe (as I do) that consistency of optimize >> functions should be improved, I can work on it. Let me know. > I'm also a fan of more consistent names. But with the caveat that I'm not currently using the optimization library for any major code, and I haven't looked around in the scipy.optimize code base a lot. So my opinion isn't particularly well-informed. > > > +1. > > I'd like to see the input and outputs streamlined as much as possible. > It would also be nice to have a convenience wrapper around all the > optimizers so that you can use them with one function. You'll have to > deprecate the old signatures though. > You mean just a generic function, where you can specify the solver method and the relevant inputs using the newly unified names? Sounds good to me. 
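Roughly something like the following sketch, perhaps -- the function name, the 'method' strings and the unified argument names here are made up for illustration, and a real version would also have to deal with constraints and the different extra outputs:

import numpy as np
from scipy import optimize

def minimize_generic(func, x0, method='nelder-mead', jac=None, tol=1e-6,
                     maxiter=None):
    # dispatch to one of the existing optimizers with unified argument names
    if method == 'nelder-mead':
        return optimize.fmin(func, x0, xtol=tol, maxiter=maxiter, disp=False)
    elif method == 'powell':
        return optimize.fmin_powell(func, x0, xtol=tol, maxiter=maxiter,
                                    disp=False)
    elif method == 'bfgs':
        return optimize.fmin_bfgs(func, x0, fprime=jac, gtol=tol,
                                  maxiter=maxiter, disp=False)
    else:
        raise ValueError("unknown method %r" % method)

# the same call works for different solvers
x1 = minimize_generic(optimize.rosen, np.array([1.3, 0.7]))
x2 = minimize_generic(optimize.rosen, np.array([1.3, 0.7]), method='bfgs',
                      jac=optimize.rosen_der)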
I'm not a fan that the generic fmin function is a nelder-mead algorithm. Nelder-mead has its place, but I don't think it should be given the basic, default looking name. -Chris JS > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Sat Sep 3 12:12:28 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 3 Sep 2011 12:12:28 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902103103.708d5597@mcgill.ca> Message-ID: On Sat, Sep 3, 2011 at 11:58 AM, Christopher Jordan-Squire wrote: > On Sat, Sep 3, 2011 at 11:42 AM, Skipper Seabold wrote: >> On Fri, Sep 2, 2011 at 10:31 AM, Denis Laxalde wrote: >>> Hi, >>> >>> (I'm resurrecting an old post.) >>> >>> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >>>> On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: >>>> > I just found that for some functions such as fmin_bfgs, the argument name >>>> > for the objective function to be minimized is f, and for others such as >>>> > fmin, it is func. >>>> > I was wondering if this was intended, because I think it would be better to >>>> > have consistent argument names across those functions. >>>> >>>> It's unlikely that that was intentional. A patch would be welcome. "func" >>>> looks better to me than "f" or "F". >>> >>> There are still several inconsistencies in input or output of functions >>> in the optimize package. For instance, for input parameters the Jacobian >>> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or >>> 'x_tol', etc. Also, outputs might be returned in a different order, >>> e.g., fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns >>> 'x, cov_x, infodict, mesg, ier'. >>> >>> If you still believe (as I do) that consistency of optimize >>> functions should be improved, I can work on it. Let me know. >> > > I'm also a fan of more consistent names. But with the caveat that I'm > not currently using the optimization library for any major code, and I > haven't looked around in the scipy.optimize code base a lot. So my > opinion isn't particularly well-informed. > >> >> >> +1. >> >> I'd like to see the input and outputs streamlined as much as possible. >> It would also be nice to have a convenience wrapper around all the >> optimizers so that you can use them with one function. You'll have to >> deprecate the old signatures though. >> > > You mean just a generic function, where you can specify the solver > method and the relevant inputs using the newly unified names? > > Sounds good to me. I'm not a fan that the generic fmin function is a > nelder-mead algorithm. Nelder-mead has its place, but I don't think it > should be given the basic, default looking name. > Yeah, something like R stats optim would make some of our code nicer. http://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html Skipper From njs at pobox.com Sat Sep 3 13:39:39 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Sep 2011 10:39:39 -0700 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902103103.708d5597@mcgill.ca> Message-ID: On Sat, Sep 3, 2011 at 8:42 AM, Skipper Seabold wrote: > On Fri, Sep 2, 2011 at 10:31 AM, Denis Laxalde wrote: >> If you still believe (as I do) that consistency of optimize >> functions should be improved, I can work on it. Let me know. >> > > +1. 
> > I'd like to see the input and outputs streamlined as much as possible. > It would also be nice to have a convenience wrapper around all the > optimizers so that you can use them with one function. You'll have to > deprecate the old signatures though. +1 I'm using openopt entirely because by providing a unified interface to all the optimizers, it lets me easily swap out different optimizers to see which one works best. Openopt has some neat stuff in it, but it's a pretty heavyweight dependency to require just for *that*... -- Nathaniel From michael at klitgaard.dk Mon Sep 5 03:39:10 2011 From: michael at klitgaard.dk (Michael Klitgaard) Date: Mon, 5 Sep 2011 09:39:10 +0200 Subject: [SciPy-User] scipy central comments In-Reply-To: References: Message-ID: Would it be possible to include data files on SciPy-Central? In this case the file 'ct.raw'. I believe it would improve the quality of the program to include the sample_data. This could perhaps make scipy central evem more usefull than other code sharing sites. Sincerely Michael On Fri, Sep 2, 2011 at 6:18 PM, wrote: > On Fri, Sep 2, 2011 at 11:34 AM, Till Stensitzki wrote: >> ? gmail.com> writes: >> >>> >>> I'm still in favor of a comment system. >>> >> >> +1 >> >>> >> http://scipy-central.org/item/22/3/building-a-simple-interactive-2d-data-viewer-with-matplotlib >>> >>> looks nice but what version of matplotlib does it work with? >>> >> >> It works with matplotlib 1.0.1. I added a remark to the source (and removed a >> buggy print statement). I think doing the layout manually instead using gridspec >> should make it work with earlier versions. > > Thanks >>>> matplotlib.__version__ > '1.0.0' > > I thought I had upgraded, but maybe in a different version of python. > I better do it soon. > > Cheers, > Josef > >> >> greeting >> Till >> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From matt.newville at gmail.com Sun Sep 4 20:44:27 2011 From: matt.newville at gmail.com (Matthew Newville) Date: Sun, 4 Sep 2011 17:44:27 -0700 (PDT) Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <20110902143146.38224d3c@mail.gmail.com> References: <20110902143146.38224d3c@mail.gmail.com> Message-ID: <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Hi, On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: > > Hi, > > (I'm resurrecting an old post.) > > On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: > > > I just found that for some functions such as fmin_bfgs, the argument > name > > > for the objective function to be minimized is f, and for others such as > > > fmin, it is func. > > > I was wondering if this was intended, because I think it would be > better to > > > have consistent argument names across those functions. > > > > > > > It's unlikely that that was intentional. A patch would be welcome. "func" > > looks better to me than "f" or "F". > > There are still several inconsistencies in input or output of functions > in the optimize package. For instance, for input parameters the Jacobian > is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or > 'x_tol', etc. 
Outputs might be returned in a different order, e.g., > fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, > cov_x, infodict, mesg, ier'. Some functions make use of the infodict > output whereas some return the same data individually. etc. > > If you still believe (as I do) that consistency of optimize > functions should be improved, I can work on it. Let me know > Also +1. I would add that the call signatures and return values for the user-supplied function to minimize should be made consistent too. Currently, some functions (leastsq) requires the return value to be an array, while others (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of residual). That seems like a serious impediment to changing algorithms. --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnelsonchem at gmail.com Mon Sep 5 02:20:57 2011 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Mon, 5 Sep 2011 01:20:57 -0500 Subject: [SciPy-User] Problems with numpydoc Message-ID: Hello all, I really like the Numpy/Scipy documentation strings, and I've been using that style for a small project of mine. However, I'm having problems getting Sphinx (version 1.0.7) to autodoc my code properly using the numpydoc package. I've tried installing numpydoc via easy_install and using the code from the git Numpy repo directly, and they both give me the same problems. The latex and html builders are giving me similar problems. I run the following commands to build the documentations (below). After the sphinx-build command, I get an error saying that 'Parameters' is an unexpected section title, even though I thought that was what 'numpydoc' was supposed to process. After the build, the documentation is not displayed in the same way as the Numpy documentation. Can anyone give me any advice to get this working. Thank you so much. Ryan P.S. I've attached a minimal, but complete, sphinx-quickstart project. The project is titled 'Test'. I've built both the html (build/static) and latex (build/latex) documentation. The source code for my test class is located in the sptest directory. **** Commands to process and error message **** sphinx_test $ sphinx-autogen source/*.rst sphinx_test $ sphinx-build -E -b latex source/ build/latex Running Sphinx v1.0.7 building [latex]: all documents updating environment: 3 added, 0 changed, 0 removed reading sources... [100%] intro /home/nelson/docs/sphinx_test/sptest/sptest.py:docstring of sptest.Sptest:8: (SEVERE/4) Unexpected section title. Parameters __________ looking for now-outdated files... none found pickling environment... done checking consistency... done processing Test.tex... index intro generated/sptest.Sptest resolving references... writing... done copying TeX support files... done build succeeded, 1 warning. sphinx_test $ make -C build/latex -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: sphinx_test.tar.gz Type: application/x-gzip Size: 145061 bytes Desc: not available URL: From josef.pktd at gmail.com Mon Sep 5 09:23:41 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Sep 2011 09:23:41 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville wrote: > Hi, > > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: >> >> Hi, >> >> (I'm resurrecting an old post.) >> >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >> > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: >> > > I just found that for some functions such as fmin_bfgs, the argument >> > > name >> > > for the objective function to be minimized is f, and for others such >> > > as >> > > fmin, it is func. >> > > I was wondering if this was intended, because I think it would be >> > > better to >> > > have consistent argument names across those functions. >> > > >> > >> > It's unlikely that that was intentional. A patch would be welcome. >> > "func" >> > looks better to me than "f" or "F". >> >> There are still several inconsistencies in input or output of functions >> in the optimize package. For instance, for input parameters the Jacobian >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or >> 'x_tol', etc. Outputs might be returned in a different order, e.g., >> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, >> cov_x, infodict, mesg, ier'. Some functions make use of the infodict >> output whereas some return the same data individually. etc. >> >> If you still believe (as I do) that consistency of optimize >> functions should be improved, I can work on it. Let me know > > Also +1. > > I would add that the call signatures and return values for the user-supplied > function to minimize should be made consistent too.? Currently, some > functions (leastsq) requires the return value to be an array, while others > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of residual). > That seems like a serious impediment to changing algorithms. I don't see how that would be possible, since it's a difference in algorithm, leastsq needs the values for individual observations (to calculate Jacobian), the other ones don't care and only maximize an objective function that could have arbitrary accumulation. Otherwise I'm also +1. It might be a bit messy during deprecation with double names, and there remain different arguments depending on the algorithm, e.g. constraints or not, and if constraints which kind, objective value and derivative in one function or in two. 
Josef > > --Matt Newville > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From pav at iki.fi Mon Sep 5 09:39:42 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 5 Sep 2011 13:39:42 +0000 (UTC) Subject: [SciPy-User] Problems with numpydoc References: Message-ID: Mon, 05 Sep 2011 01:20:57 -0500, Ryan Nelson wrote: [clip] Write Parameters ---------- instead of Parameters __________ From deil.christoph at googlemail.com Mon Sep 5 10:55:33 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Mon, 5 Sep 2011 15:55:33 +0100 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: On Sep 5, 2011, at 2:23 PM, josef.pktd at gmail.com wrote: > On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville > wrote: >> Hi, >> >> On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: >>> >>> Hi, >>> >>> (I'm resurrecting an old post.) >>> >>> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >>>> On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: >>>>> I just found that for some functions such as fmin_bfgs, the argument >>>>> name >>>>> for the objective function to be minimized is f, and for others such >>>>> as >>>>> fmin, it is func. >>>>> I was wondering if this was intended, because I think it would be >>>>> better to >>>>> have consistent argument names across those functions. >>>>> >>>> >>>> It's unlikely that that was intentional. A patch would be welcome. >>>> "func" >>>> looks better to me than "f" or "F". >>> >>> There are still several inconsistencies in input or output of functions >>> in the optimize package. For instance, for input parameters the Jacobian >>> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or >>> 'x_tol', etc. Outputs might be returned in a different order, e.g., >>> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, >>> cov_x, infodict, mesg, ier'. Some functions make use of the infodict >>> output whereas some return the same data individually. etc. >>> >>> If you still believe (as I do) that consistency of optimize >>> functions should be improved, I can work on it. Let me know >> >> Also +1. >> >> I would add that the call signatures and return values for the user-supplied >> function to minimize should be made consistent too. Currently, some >> functions (leastsq) requires the return value to be an array, while others >> (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of residual). >> That seems like a serious impediment to changing algorithms. > > I don't see how that would be possible, since it's a difference in > algorithm, leastsq needs the values for individual observations (to > calculate Jacobian), the other ones don't care and only maximize an > objective function that could have arbitrary accumulation. > > Otherwise I'm also +1. > It might be a bit messy during deprecation with double names, and > there remain different arguments depending on the algorithm, e.g. > constraints or not, and if constraints which kind, objective value and > derivative in one function or in two. > > Josef > +1 on making the input and output of the scipy optimizer routines more uniform. 
I have one additional, somewhat related question to the leastsq issue Matt mentioned: How can I compute the covariance matrix for a likelihood fit using the routines in scipy.optimize? As far as I see the only method that returns a covariance matrix (or results from with it is easy to compute the covariance matrix) are leastsq and curve_fit. The problem with these is that they take a residual vector chi and internally compute chi2 = (chi**2).sum() and minimize that. But for a likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring is required, so as far as I can see it is not possible to do a likelihood fit with leastsq and curve_fit. On the other hand as Matt pointed out most (all) other optimization methods (like e.g. fmin) don't do this squaring and summing internally, but the user function does it if one is doing a chi2 fit, so using these it is easy to do a likelihood fit. But it is not possible to get parameter errors because these other optimizers don't return a Hessian or covariance matrix. It would be nice if there were a method in scipy.optimize to compute the covariance matrix for all optimizers. And it would be nice if the super-fast Levenberg-Marquard optimizer called by leastsq were also available for likelihood fits. Maybe it would be possible to add one method to scipy.optimize to compute the covariance matrix after any of the optimizers has run, i.e. the best-fit pars have been determined? cov = scipy.optimize.cov(func, pars) I think this can be done by computing the Hessian via finite differences and then inverting it. leastsq seems to compute the Hessian from a "Jacobian". This is the part I don't understand, but without going into the details, let me ask this: Is there something special about the Levenberg-Marquard optimizer that it requires the individual observations? Or is it just that the current implementation of _minpack._lmdif (which is called by leastsq) was written such that it works this way (i.e. includes a computation of cov_x in addition to x) and it could also be written to take a scalar func like all the other optimizers? Christoph -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Sep 5 11:27:53 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Sep 2011 11:27:53 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil wrote: > > On Sep 5, 2011, at 2:23 PM, josef.pktd at gmail.com wrote: > > On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville > wrote: > > Hi, > > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: > > Hi, > > (I'm resurrecting an old post.) > > On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: > > I just found that for some functions such as fmin_bfgs, the argument > > name > > for the objective function to be minimized is f, and for others such > > as > > fmin, it is func. > > I was wondering if this was intended, because I think it would be > > better to > > have consistent argument names across those functions. > > > It's unlikely that that was intentional. A patch would be welcome. > > "func" > > looks better to me than "f" or "F". > > There are still several inconsistencies in input or output of functions > > in the optimize package. 
For instance, for input parameters the Jacobian > > is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or > > 'x_tol', etc. Outputs might be returned in a different order, e.g., > > fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, > > cov_x, infodict, mesg, ier'. Some functions make use of the infodict > > output whereas some return the same data individually. etc. > > If you still believe (as I do) that consistency of optimize > > functions should be improved, I can work on it. Let me know > > Also +1. > > I would add that the call signatures and return values for the user-supplied > > function to minimize should be made consistent too.? Currently, some > > functions (leastsq) requires the return value to be an array, while others > > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of residual). > > That seems like a serious impediment to changing algorithms. > > I don't see how that would be possible, since it's a difference in > algorithm, leastsq needs the values for individual observations (to > calculate Jacobian), the other ones don't care and only maximize an > objective function that could have arbitrary accumulation. > > Otherwise I'm also +1. > It might be a bit messy during deprecation with double names, and > there remain different arguments depending on the algorithm, e.g. > constraints or not, and if constraints which kind, objective value and > derivative in one function or in two. > > Josef > > > +1 on making the input and output of the scipy optimizer routines more > uniform. > I have one additional, somewhat related question to the leastsq issue Matt > mentioned: > How can I compute the covariance matrix for a likelihood fit using the > routines in scipy.optimize? 1300 lines in scikits.statsmodels wrapping some scipy optimizers for maximum likelihood estimation e.g. LikelihoodModel https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81 GenericLikelihoodModel https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421 in the latter I use loglike and loglikeobs, depending on whether I need or want the Jacobian or not. > As far as I see the only method that returns a covariance matrix (or results > from with it is easy to compute the covariance matrix) are leastsq and > curve_fit. The problem with these is that they take a residual vector chi > and internally compute ?chi2 = (chi**2).sum() and minimize that. But for a > likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring > is required, so as far as I can see it is not possible to do a likelihood > fit with leastsq and curve_fit. > On the other hand as Matt pointed out most (all) other optimization methods > (like e.g. fmin) don't do this squaring and summing internally, but the user > function does it if one is doing a chi2 fit, so using these it is easy to do > a likelihood fit. But it is not possible to get parameter errors because > these other optimizers don't return a Hessian or covariance matrix. > It would be nice if there were a method in scipy.optimize to compute the > covariance matrix for all optimizers. fmin and the other optimizer don't have enough assumptions to figure out whether the Hessian is the covariance matrix of the parameters. This is only true if the objective function is the loglikelihood. But there is no reason fmin and the others should assume this. 
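As a concrete illustration of that point (a sketch only, not from the original mails; numdifftools supplies the numerical Hessian, and the data and model are made up): if the objective really is a negative log-likelihood, then the covariance of the estimates is the inverse Hessian at the minimum, whichever optimizer found it.

import numpy as np
from scipy import optimize, stats
import numdifftools as nd

# made-up data: straight line plus normal noise
np.random.seed(1)
xdata = np.linspace(0., 4., 50)
ydata = 0.3 + 1.5 * xdata + 0.2 * np.random.randn(50)

def negloglike(params):
    # negative log-likelihood of a straight-line model with normal errors;
    # the noise scale is parameterized as exp(logsig) to keep it positive
    a, b, logsig = params
    resid = ydata - (a + b * xdata)
    return -stats.norm.logpdf(resid, scale=np.exp(logsig)).sum()

phat = optimize.fmin(negloglike, [0., 1., 0.], disp=False)
hess = nd.Hessian(negloglike)(phat)   # numerical Hessian at the optimum
pcov = np.linalg.inv(hess)            # asymptotic covariance of the estimates
perr = np.sqrt(np.diag(pcov))

If the objective is anything other than a log-likelihood, this identification needs extra assumptions.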
We had a brief discussion a while ago on the mailinglist which led to the notes in the leastsq docstring http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq that cov_x also is not always the (unscaled) covariance matrix. If the objective function comes from a different estimation method for example, then I think it's usually not the case that the (inverse) Hessian is the covariance matrix. > And it would be nice if the super-fast Levenberg-Marquard optimizer called > by leastsq were also available for likelihood fits. > Maybe it would be possible to add one method to scipy.optimize to compute > the covariance matrix after any of the optimizers has run, > i.e. the best-fit pars have been determined? > cov = scipy.optimize.cov(func, pars) > I think this can be done by computing the Hessian?via finite differences and > then inverting it. > leastsq seems to compute the Hessian from a "Jacobian". This is the part I > don't understand, but without going into the details, let me ask this: > Is there something special about the Levenberg-Marquard optimizer that it > requires the individual observations? It uses the outer (?) product of the Jacobian (all observations) to find the improving directions, and the outerproduct of the Jacobian is a numerical approximation to the Hessian in the err = y-f(x,params) or loglikelihood case. The Hessian calculation can break down quite easily (not positive definite because of numerical problems), the product of the Jacobian is more robust in my experience and cheaper. (statsmodels GenericLikelihood has covariance based on Hessian, Jacobian and a sandwich of the two). But I never looked at the internals of minpack. Josef > Or is it just that the current implementation of?_minpack._lmdif (which is > called by leastsq) was written such that it works this way (i.e. includes a > computation of cov_x in addition to x) and it could also be written to take > a scalar func like all the other ?optimizers? > Christoph > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Mon Sep 5 12:59:12 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Sep 2011 10:59:12 -0600 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: On Mon, Sep 5, 2011 at 7:23 AM, wrote: > On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville > wrote: > > Hi, > > > > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: > >> > >> Hi, > >> > >> (I'm resurrecting an old post.) > >> > >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > >> > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: > >> > > I just found that for some functions such as fmin_bfgs, the argument > >> > > name > >> > > for the objective function to be minimized is f, and for others such > >> > > as > >> > > fmin, it is func. > >> > > I was wondering if this was intended, because I think it would be > >> > > better to > >> > > have consistent argument names across those functions. > >> > > > >> > > >> > It's unlikely that that was intentional. A patch would be welcome. > >> > "func" > >> > looks better to me than "f" or "F". > >> > >> There are still several inconsistencies in input or output of functions > >> in the optimize package. 
For instance, for input parameters the Jacobian > >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or > >> 'x_tol', etc. Outputs might be returned in a different order, e.g., > >> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, > >> cov_x, infodict, mesg, ier'. Some functions make use of the infodict > >> output whereas some return the same data individually. etc. > >> > >> If you still believe (as I do) that consistency of optimize > >> functions should be improved, I can work on it. Let me know > > > > Also +1. > > > > I would add that the call signatures and return values for the > user-supplied > > function to minimize should be made consistent too. Currently, some > > functions (leastsq) requires the return value to be an array, while > others > > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of residual). > > That seems like a serious impediment to changing algorithms. > > I don't see how that would be possible, since it's a difference in > algorithm, leastsq needs the values for individual observations (to > calculate Jacobian), the other ones don't care and only maximize an > objective function that could have arbitrary accumulation. > > Otherwise I'm also +1. > It might be a bit messy during deprecation with double names, and > there remain different arguments depending on the algorithm, e.g. > constraints or not, and if constraints which kind, objective value and > derivative in one function or in two. > > It would be nice if leastsq accepted multidimensional function returns. That sort of thing turns up in fitting vectors valued functions and it shouldn't be hard to add. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From deil.christoph at googlemail.com Mon Sep 5 14:17:06 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Mon, 5 Sep 2011 19:17:06 +0100 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: <208A901A-62D4-4D5D-BD66-05157527DFB7@googlemail.com> On Sep 5, 2011, at 4:27 PM, josef.pktd at gmail.com wrote: > On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil > wrote: >> >> How can I compute the covariance matrix for a likelihood fit using the >> routines in scipy.optimize? > > 1300 lines in scikits.statsmodels wrapping some scipy optimizers for > maximum likelihood estimation > e.g. LikelihoodModel > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81 > GenericLikelihoodModel > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421 > in the latter I use loglike and loglikeobs, depending on whether I > need or want the Jacobian or not. This is a very nice class! But installing and learning statsmodels takes a while, and wanting parameter errors for a likelihood fit is a very common case for using an optimizer, so I think it would be nice to include that functionality in scipy.optimize itself (see code below). > >> As far as I see the only method that returns a covariance matrix (or results >> from with it is easy to compute the covariance matrix) are leastsq and >> curve_fit. The problem with these is that they take a residual vector chi >> and internally compute chi2 = (chi**2).sum() and minimize that. But for a >> likelihood fit the quantity to be minimized is logL.sum(), i.e. 
no squaring
>> is required, so as far as I can see it is not possible to do a likelihood
>> fit with leastsq and curve_fit.
>> On the other hand as Matt pointed out most (all) other optimization methods
>> (like e.g. fmin) don't do this squaring and summing internally, but the user
>> function does it if one is doing a chi2 fit, so using these it is easy to do
>> a likelihood fit. But it is not possible to get parameter errors because
>> these other optimizers don't return a Hessian or covariance matrix.
>> It would be nice if there were a method in scipy.optimize to compute the
>> covariance matrix for all optimizers.
>
> fmin and the other optimizer don't have enough assumptions to figure
> out whether the Hessian is the covariance matrix of the parameters.
> This is only true if the objective function is the loglikelihood. But
> there is no reason fmin and the others should assume this.
>
> We had a brief discussion a while ago on the mailinglist which led to
> the notes in the leastsq docstring
> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq
> that cov_x also is not always the (unscaled) covariance matrix.
>
> If the objective function comes from a different estimation method for
> example, then I think it's usually not the case that the (inverse)
> Hessian is the covariance matrix.

Actually all that is missing from scipy to be able to get parameter errors in a likelihood fit (which includes chi2 fit and I believe is by far the most common case) with scipy.optimize (for all optimizers!) is a method to compute the Hessian, as already available e.g. in these packages:

http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134
https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99

Would it be possible to move one of the approx_hess methods to scipy.optimize (there is already a scipy.optimize.approx_fprime)?

Looking through numpy and scipy the only differentiation method I could find was
http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
which is only 1D, I believe.

"""This is the example from the curve_fit docstring"""
import numpy as np
from scipy.optimize import curve_fit, fmin

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

p0 = (2.5, 1.3, 0.5)
x = np.linspace(0, 4, 50)
y = func(x, *p0)
np.random.seed(0)
yn = y + 0.2 * np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
print 'curve_fit results:'
print 'values:', popt
print 'errors:', np.sqrt(pcov.diagonal())

"""And here is how to compute the fit parameter values and errors
using one of the other optimizers (exemplified with fmin) and
a method to compute the Hessian matrix"""
def chi2(pars):
    chi = yn - func(x, *pars)
    return (chi ** 2).sum()

popt = fmin(chi2, p0, disp=False)

from numpy.dual import inv
from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 as approx_hess
phess = approx_hess(popt, chi2)

def approx_covar(hess, red_chi2):
    return red_chi2 * inv(hess / 2.)

pcov = approx_covar(phess, chi2(popt) / (len(x) - len(p0)))
print 'fmin and approx_hess results:'
print 'values:', popt
print 'errors:', np.sqrt(pcov.diagonal())

curve_fit results:
values: [ 2.80720814  1.24568448  0.44517316]
errors: [ 0.12563313  0.12409886  0.05875364]
fmin and approx_hess results:
values: [ 2.80720337  1.24565936  0.44515526]
errors: [ 0.12610377  0.12802944  0.05979982]

>
>> And it would be nice if the super-fast Levenberg-Marquard optimizer called
>> by leastsq were also available for likelihood fits. 
>> Maybe it would be possible to add one method to scipy.optimize to compute >> the covariance matrix after any of the optimizers has run, >> i.e. the best-fit pars have been determined? >> cov = scipy.optimize.cov(func, pars) >> I think this can be done by computing the Hessian via finite differences and >> then inverting it. >> leastsq seems to compute the Hessian from a "Jacobian". This is the part I >> don't understand, but without going into the details, let me ask this: >> Is there something special about the Levenberg-Marquard optimizer that it >> requires the individual observations? > > It uses the outer (?) product of the Jacobian (all observations) to > find the improving directions, and the outerproduct of the Jacobian is > a numerical approximation to the Hessian in the > err = y-f(x,params) or loglikelihood case. > The Hessian calculation can break down quite easily (not positive > definite because of numerical problems), the product of the Jacobian > is more robust in my experience and cheaper. (statsmodels > GenericLikelihood has covariance based on Hessian, Jacobian and a > sandwich of the two). Its true that there are often numerical problems with the Hessian. If the Jacobian method is more robust, then maybe that could be included in scipy.optimize instead of or in addition to the Hesse method? > > But I never looked at the internals of minpack. > > Josef > >> Or is it just that the current implementation of _minpack._lmdif (which is >> called by leastsq) was written such that it works this way (i.e. includes a >> computation of cov_x in addition to x) and it could also be written to take >> a scalar func like all the other optimizers? >> Christoph >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Sep 5 15:42:44 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Sep 2011 13:42:44 -0600 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <208A901A-62D4-4D5D-BD66-05157527DFB7@googlemail.com> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <208A901A-62D4-4D5D-BD66-05157527DFB7@googlemail.com> Message-ID: On Mon, Sep 5, 2011 at 12:17 PM, Christoph Deil < deil.christoph at googlemail.com> wrote: > > On Sep 5, 2011, at 4:27 PM, josef.pktd at gmail.com wrote: > > On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil > wrote: > > > How can I compute the covariance matrix for a likelihood fit using the > > routines in scipy.optimize? > > > 1300 lines in scikits.statsmodels wrapping some scipy optimizers for > maximum likelihood estimation > e.g. LikelihoodModel > > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81 > GenericLikelihoodModel > > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421 > in the latter I use loglike and loglikeobs, depending on whether I > need or want the Jacobian or not. > > > This is a very nice class! > > But installing and learning statsmodels takes a while, and wanting > parameter errors for a likelihood fit > is a very common case for using an optimizer, so I think it would be > nice to include that functionality in scipy.optimize itself (see code > below). > > > As far as I see the only method that returns a covariance matrix (or > results > > from with it is easy to compute the covariance matrix) are leastsq and > > curve_fit. 
The problem with these is that they take a residual vector chi > > and internally compute chi2 = (chi**2).sum() and minimize that. But for a > > likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring > > is required, so as far as I can see it is not possible to do a likelihood > > fit with leastsq and curve_fit. > > On the other hand as Matt pointed out most (all) other optimization methods > > (like e.g. fmin) don't do this squaring and summing internally, but the > user > > function does it if one is doing a chi2 fit, so using these it is easy to > do > > a likelihood fit. But it is not possible to get parameter errors because > > these other optimizers don't return a Hessian or covariance matrix. > > It would be nice if there were a method in scipy.optimize to compute the > > covariance matrix for all optimizers. > > > fmin and the other optimizer don't have enough assumptions to figure > out whether the Hessian is the covariance matrix of the parameters. > This is only true if the objective function is the loglikelihood. But > there is no reason fmin and the others should assume this. > > We had a brief discussion a while ago on the mailinglist which led to > the notes in the leastsq docstring > http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq > that cov_x also is not always the (unscaled) covariance matrix. > > If the objective function comes from a different estimation method for > example, then I think it's usually not the case that the (inverse) > Hessian is the covariance matrix. > > > Actually all that is missing from scipy to be able to get parameter errors > in a likelihood fit (which includes chi2 fit and I believe is by far the > most common case) with scipy.optimize (for all optimizers!) is a method to > compute the Hessian, as already available e.g. in these packages: > > http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134 > > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99 > > Would it be possible to move one of the approx_hess methods to > scipy.optimize (there is already a scipy.optimize.approx_fprime)? > > Looking through numpy and scipy the only differentiation method I could > find was > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html > which is only 1D, I believe. > > """This is the example from the curve_fit docstring""" > import numpy as np > from scipy.optimize import curve_fit, fmin > def func(x, a, b, c): > return a * np.exp(-b * x) + c > p0 = (2.5, 1.3, 0.5) > x = np.linspace(0, 4, 50) > y = func(x, *p0) > np.random.seed(0) > yn = y + 0.2 * np.random.normal(size=len(x)) > popt, pcov = curve_fit(func, x, yn) > print 'curve_fit results:' > print 'values:', popt > print 'errors:', np.sqrt(pcov.diagonal()) > > """And here is how to compute the fit parameter values and errors > using one of the other optimizers (exemplified with fmin) and > a method to compute the Hesse matrix""" > def chi2(pars): > chi = yn - func(x, *pars) > return (chi ** 2).sum() > popt = fmin(chi2, p0, disp=False) > from numpy.dual import inv > from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 asapprox_hess > phess = approx_hess(popt, chi2) > def approx_covar(hess, red_chi2): > return red_chi2 * inv(phess / 2.) 
> pcov = approx_covar(popt, chi2(popt) / (len(x) - len(p0))) > print 'curve_fit results:' > print 'values:', popt > print 'errors:', np.sqrt(pcov.diagonal()) > > curve_fit results: > values: [ 2.80720814 1.24568448 0.44517316] > errors: [ 0.12563313 0.12409886 0.05875364] > fmin and approx_hess results: > values: [ 2.80720337 1.24565936 0.44515526] > errors: [ 0.12610377 0.12802944 0.05979982] > > > > And it would be nice if the super-fast Levenberg-Marquard optimizer called > > by leastsq were also available for likelihood fits. > > Maybe it would be possible to add one method to scipy.optimize to compute > > the covariance matrix after any of the optimizers has run, > > i.e. the best-fit pars have been determined? > > cov = scipy.optimize.cov(func, pars) > > I think this can be done by computing the Hessian via finite differences > and > > then inverting it. > > leastsq seems to compute the Hessian from a "Jacobian". This is the part I > > don't understand, but without going into the details, let me ask this: > > Is there something special about the Levenberg-Marquard optimizer that it > > requires the individual observations? > > > It uses the outer (?) product of the Jacobian (all observations) to > find the improving directions, and the outerproduct of the Jacobian is > a numerical approximation to the Hessian in the > err = y-f(x,params) or loglikelihood case. > > The Hessian calculation can break down quite easily (not positive > definite because of numerical problems), the product of the Jacobian > is more robust in my experience and cheaper. (statsmodels > GenericLikelihood has covariance based on Hessian, Jacobian and a > sandwich of the two). > > > Its true that there are often numerical problems with the Hessian. > If the Jacobian method is more robust, then maybe that could be included in > scipy.optimize instead of or in addition to the Hesse method? > > The Levenberg-Marquardt algorithm uses a parameter linearization, i.e., the Jacobian, as the design matrix of ordinary least squares together with adaptive regularization. It is the ordinary least squares approach that dictates the vector form of the error to be minimized. The (unscaled) covariance of the parameters is the usual (A^t * A)^{-1) of ordinary least squares with the Jacobian in place of A. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Sep 5 15:56:24 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Sep 2011 15:56:24 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <208A901A-62D4-4D5D-BD66-05157527DFB7@googlemail.com> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <208A901A-62D4-4D5D-BD66-05157527DFB7@googlemail.com> Message-ID: On Mon, Sep 5, 2011 at 2:17 PM, Christoph Deil wrote: > > On Sep 5, 2011, at 4:27 PM, josef.pktd at gmail.com wrote: > > On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil > wrote: > > How can I compute the covariance matrix for a likelihood fit using the > > routines in scipy.optimize? > > 1300 lines in scikits.statsmodels ?wrapping some scipy optimizers for > maximum likelihood estimation > e.g. 
LikelihoodModel > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81 > GenericLikelihoodModel > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421 > in the latter I use loglike and loglikeobs, depending on whether I > need or want the Jacobian or not. > > This is a very nice class! > But installing and learning statsmodels takes a while, and wanting parameter > errors for a likelihood fit > is a very common case for using an optimizer, so I think it would be nice?to > include that functionality in scipy.optimize itself (see code below). > > As far as I see the only method that returns a covariance matrix (or results > > from with it is easy to compute the covariance matrix) are leastsq and > > curve_fit. The problem with these is that they take a residual vector chi > > and internally compute ?chi2 = (chi**2).sum() and minimize that. But for a > > likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring > > is required, so as far as I can see it is not possible to do a likelihood > > fit with leastsq and curve_fit. > > On the other hand as Matt pointed out most (all) other optimization methods > > (like e.g. fmin) don't do this squaring and summing internally, but the user > > function does it if one is doing a chi2 fit, so using these it is easy to do > > a likelihood fit. But it is not possible to get parameter errors because > > these other optimizers don't return a Hessian or covariance matrix. > > It would be nice if there were a method in scipy.optimize to compute the > > covariance matrix for all optimizers. > > fmin and the other optimizer don't have enough assumptions to figure > out whether the Hessian is the covariance matrix of the parameters. > This is only true if the objective function is the loglikelihood. But > there is no reason fmin and the others should assume this. > > We had a brief discussion a while ago on the mailinglist which led to > the notes in the leastsq docstring > http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq > that cov_x also is not always the (unscaled) covariance matrix. > > If the objective function comes from a different estimation method for > example, then I think it's usually not the case that the (inverse) > Hessian is the covariance matrix. > > Actually all that is missing from scipy to be able to get parameter errors > in a likelihood fit (which includes chi2 fit and I believe is by far the > most common case) with scipy.optimize (for all optimizers!)?is a method to > compute the Hessian, as already available e.g. in these packages: > http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134 > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99 > Would it be possible to move one of the approx_hess methods to > scipy.optimize (there is already a?scipy.optimize.approx_fprime)? > Looking through numpy and scipy the only differentiation method I could find > was > http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html > which is only 1D, I believe. > """This is the example from the curve_fit docstring""" > import numpy as np > from scipy.optimize import curve_fit, fmin > def func(x, a, b, c): > ? ? 
return a * np.exp(-b * x) + c > p0 = (2.5, 1.3, 0.5) > x = np.linspace(0, 4, 50) > y = func(x, *p0) > np.random.seed(0) > yn = y + 0.2 * np.random.normal(size=len(x)) > popt, pcov = curve_fit(func, x, yn) > print 'curve_fit results:' > print 'values:', popt > print 'errors:', np.sqrt(pcov.diagonal()) > """And here is how to compute the fit parameter values and errors > using one of the other optimizers (exemplified with fmin) and > a method to compute the Hesse matrix""" > def chi2(pars): > ? ? chi = yn - func(x, *pars) > ? ? return (chi ** 2).sum() > popt = fmin(chi2, p0, disp=False) > from numpy.dual import inv > from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 as > approx_hess > phess = approx_hess(popt, chi2) > def approx_covar(hess, red_chi2): > ? ? return red_chi2 * inv(phess / 2.) > pcov = approx_covar(popt, chi2(popt) / (len(x) - len(p0))) I don't quite see where the normalizations, e.g. /2., are coming from. However, I never tried to use the Hessian with leastsq, and I would have to look at this. > print 'curve_fit results:' > print 'values:', popt > print 'errors:', np.sqrt(pcov.diagonal()) > curve_fit results: > values: [ 2.80720814? 1.24568448? 0.44517316] > errors: [ 0.12563313? 0.12409886? 0.05875364] > fmin and approx_hess results: > values: [ 2.80720337? 1.24565936? 0.44515526] > errors: [ 0.12610377? 0.12802944? 0.05979982] The attachment is a quickly written script to use the GenericLikelihoodModel. The advantage of using statsmodels is that you are (supposed to be) getting all additional results that are available for Maximum Likelihood Estimation. e.g. my version of this with stats.norm.pdf. I took the pattern partially from the miscmodel that uses t distributed errors. It's supposed to be easy to switch distributions. ----------- class MyNonLinearMLEModel(NonLinearMLEModel): '''Maximum Likelihood Estimation of Linear Model with nonlinear function This is an example for generic MLE. Except for defining the negative log-likelihood method, all methods and results are generic. Gradients and Hessian and all resulting statistics are based on numerical differentiation. ''' def _predict(self, params, x): a, b, c = params return a * np.exp(-b * x) + c mod = MyNonLinearMLEModel(yn, x) res = mod.fit(start_params=[1., 1., 1., 1.]) print 'true parameters' print np.array(list(p0)+[0.2]) print 'parameter estimates' print res.params print 'standard errors, hessian, jacobian, sandwich' print res.bse print res.bsejac print res.bsejhj ------- curve_fit results: values: [ 2.80720815 1.24568449 0.44517316] errors: [ 0.12563313 0.12409886 0.05875364] Optimization terminated successfully. 
Current function value: -7.401006 Iterations: 141 Function evaluations: 247 true parameters [ 2.5 1.3 0.5 0.2] parameter estimates [ 2.80725714 1.24569287 0.44515168 0.20867979] standard errors, hessian, jacobian, sandwich [ 0.12226468 0.12414298 0.05798039 0.02086842] [ 0.15756494 0.15865264 0.06242619 0.02261329] [ 0.09852726 0.10175901 0.05740336 0.01994902] >>> res.aic -6.8020120409185836 >>> res.bic 0.84607998079400026 >>> res.t_test([0,1,0,0],1) Traceback (most recent call last): File "", line 1, in res.t_test([0,1,0,0],1) File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\base\model.py", line 1020, in t_test df_denom=self.model.df_resid) AttributeError: 'MyNonLinearMLEModel' object has no attribute 'df_resid' >>> res.model.df_resid = len(yn) - len(p0) >>> res.t_test([0,1,0,0],1) >>> print res.t_test([0,1,0,0],1) >>> GenericLikelihoodModel is still not polished, df_resid is not defined generically. bootstrap raised an exception. I agree that scipy should have numerical differentiation, jacobian Hessian like numdifftools. Josef > > > And it would be nice if the super-fast Levenberg-Marquard optimizer called > > by leastsq were also available for likelihood fits. > > Maybe it would be possible to add one method to scipy.optimize to compute > > the covariance matrix after any of the optimizers has run, > > i.e. the best-fit pars have been determined? > > cov = scipy.optimize.cov(func, pars) > > I think this can be done by computing the Hessian?via finite differences and > > then inverting it. > > leastsq seems to compute the Hessian from a "Jacobian". This is the part I > > don't understand, but without going into the details, let me ask this: > > Is there something special about the Levenberg-Marquard optimizer that it > > requires the individual observations? > > It uses the outer (?) product of the Jacobian (all observations) to > find the improving directions, and the outerproduct of the Jacobian is > a numerical approximation to the Hessian in the > err = y-f(x,params) or loglikelihood case. > > The Hessian calculation can break down quite easily (not positive > definite because of numerical problems), the product of the Jacobian > is more robust in my experience and cheaper. (statsmodels > GenericLikelihood has covariance based on Hessian, Jacobian and a > sandwich of the two). > > Its true that there are often numerical problems with the Hessian. > If the Jacobian method is more robust, then maybe that could be included in > scipy.optimize instead of or in addition to the Hesse method? > > But I never looked at the internals of minpack. > > Josef > > > > Or is it just that the current implementation of?_minpack._lmdif (which is > > called by leastsq) was written such that it works this way (i.e. includes a > > computation of cov_x in addition to x) and it could also be written to take > > a scalar func like all the other ?optimizers? > > Christoph > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: try_lonlin.py Type: text/x-python Size: 2661 bytes Desc: not available URL: From deil.christoph at googlemail.com Mon Sep 5 19:03:02 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Tue, 6 Sep 2011 00:03:02 +0100 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <208A901A-62D4-4D5D-BD66-05157527DFB7@googlemail.com> Message-ID: On Sep 5, 2011, at 8:56 PM, josef.pktd at gmail.com wrote: > On Mon, Sep 5, 2011 at 2:17 PM, Christoph Deil > wrote: >> >> On Sep 5, 2011, at 4:27 PM, josef.pktd at gmail.com wrote: >> >> On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil >> wrote: >> >> How can I compute the covariance matrix for a likelihood fit using the >> >> routines in scipy.optimize? >> >> 1300 lines in scikits.statsmodels wrapping some scipy optimizers for >> maximum likelihood estimation >> e.g. LikelihoodModel >> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81 >> GenericLikelihoodModel >> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421 >> in the latter I use loglike and loglikeobs, depending on whether I >> need or want the Jacobian or not. >> >> This is a very nice class! >> But installing and learning statsmodels takes a while, and wanting parameter >> errors for a likelihood fit >> is a very common case for using an optimizer, so I think it would be nice to >> include that functionality in scipy.optimize itself (see code below). >> >> As far as I see the only method that returns a covariance matrix (or results >> >> from with it is easy to compute the covariance matrix) are leastsq and >> >> curve_fit. The problem with these is that they take a residual vector chi >> >> and internally compute chi2 = (chi**2).sum() and minimize that. But for a >> >> likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring >> >> is required, so as far as I can see it is not possible to do a likelihood >> >> fit with leastsq and curve_fit. >> >> On the other hand as Matt pointed out most (all) other optimization methods >> >> (like e.g. fmin) don't do this squaring and summing internally, but the user >> >> function does it if one is doing a chi2 fit, so using these it is easy to do >> >> a likelihood fit. But it is not possible to get parameter errors because >> >> these other optimizers don't return a Hessian or covariance matrix. >> >> It would be nice if there were a method in scipy.optimize to compute the >> >> covariance matrix for all optimizers. >> >> fmin and the other optimizer don't have enough assumptions to figure >> out whether the Hessian is the covariance matrix of the parameters. >> This is only true if the objective function is the loglikelihood. But >> there is no reason fmin and the others should assume this. >> >> We had a brief discussion a while ago on the mailinglist which led to >> the notes in the leastsq docstring >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq >> that cov_x also is not always the (unscaled) covariance matrix. >> >> If the objective function comes from a different estimation method for >> example, then I think it's usually not the case that the (inverse) >> Hessian is the covariance matrix. 
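
For the loglikelihood case itself, where the inverse Hessian is the right object, here is a minimal sketch of getting parameter errors after fmin with a hand-written central-difference Hessian (nothing from statsmodels or numdifftools is assumed). It reuses func, x and yn from the curve_fit example earlier in the thread and treats sigma = 0.2, the value used to generate yn there, as known; fd_hessian, nll, popt_ml and pcov_ml are names made up for this sketch:

import numpy as np
from scipy.optimize import fmin

def fd_hessian(f, p, h=1e-4):
    # hand-written central-difference Hessian of a scalar function f at p
    p = np.asarray(p, dtype=float)
    n = p.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            hess[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                          - f(p - ei + ej) + f(p - ei - ej)) / (4.0 * h * h)
    return hess

def nll(pars):
    # Gaussian negative log-likelihood (up to an additive constant),
    # with sigma = 0.2 treated as known for this sketch
    sigma = 0.2
    resid = yn - func(x, *pars)
    return 0.5 * np.sum((resid / sigma) ** 2)

popt_ml = fmin(nll, (2.5, 1.3, 0.5), disp=False)
pcov_ml = np.linalg.inv(fd_hessian(nll, popt_ml))
print 'ML values:', popt_ml
print 'ML errors:', np.sqrt(pcov_ml.diagonal())

The 0.5 in nll is what makes the plain inverse Hessian the covariance here; a chi-square objective is 2*sigma**2 times this quantity, so its Hessian has to be rescaled before inverting, which is presumably what the inv(phess / 2.) and the reduced chi-square factor further down are doing. Step size and conditioning matter in practice, which is exactly the robustness issue discussed in this thread.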
>> >> Actually all that is missing from scipy to be able to get parameter errors >> in a likelihood fit (which includes chi2 fit and I believe is by far the >> most common case) with scipy.optimize (for all optimizers!) is a method to >> compute the Hessian, as already available e.g. in these packages: >> http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134 >> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99 >> Would it be possible to move one of the approx_hess methods to >> scipy.optimize (there is already a scipy.optimize.approx_fprime)? >> Looking through numpy and scipy the only differentiation method I could find >> was >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html >> which is only 1D, I believe. >> """This is the example from the curve_fit docstring""" >> import numpy as np >> from scipy.optimize import curve_fit, fmin >> def func(x, a, b, c): >> return a * np.exp(-b * x) + c >> p0 = (2.5, 1.3, 0.5) >> x = np.linspace(0, 4, 50) >> y = func(x, *p0) >> np.random.seed(0) >> yn = y + 0.2 * np.random.normal(size=len(x)) >> popt, pcov = curve_fit(func, x, yn) >> print 'curve_fit results:' >> print 'values:', popt >> print 'errors:', np.sqrt(pcov.diagonal()) >> """And here is how to compute the fit parameter values and errors >> using one of the other optimizers (exemplified with fmin) and >> a method to compute the Hesse matrix""" >> def chi2(pars): >> chi = yn - func(x, *pars) >> return (chi ** 2).sum() >> popt = fmin(chi2, p0, disp=False) >> from numpy.dual import inv >> from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 as >> approx_hess >> phess = approx_hess(popt, chi2) >> def approx_covar(hess, red_chi2): >> return red_chi2 * inv(phess / 2.) >> pcov = approx_covar(popt, chi2(popt) / (len(x) - len(p0))) > > I don't quite see where the normalizations, e.g. /2., are coming from. I'm not sure either why a factor /2. gives a correct covariance matrix here. I found several references that the covariance matrix is simply the inverse Hessian, no factor 2 !? > However, I never tried to use the Hessian with leastsq, and I would > have to look at this. > >> print 'curve_fit results:' >> print 'values:', popt >> print 'errors:', np.sqrt(pcov.diagonal()) >> curve_fit results: >> values: [ 2.80720814 1.24568448 0.44517316] >> errors: [ 0.12563313 0.12409886 0.05875364] >> fmin and approx_hess results: >> values: [ 2.80720337 1.24565936 0.44515526] >> errors: [ 0.12610377 0.12802944 0.05979982] > > The attachment is a quickly written script to use the > GenericLikelihoodModel. The advantage of using statsmodels is that you > are (supposed to be) getting all additional results that are available > for Maximum Likelihood Estimation. > > e.g. my version of this with stats.norm.pdf. I took the pattern > partially from the miscmodel that uses t distributed errors. It's > supposed to be easy to switch distributions. > > ----------- > class MyNonLinearMLEModel(NonLinearMLEModel): > '''Maximum Likelihood Estimation of Linear Model with nonlinear function > > This is an example for generic MLE. > > Except for defining the negative log-likelihood method, all > methods and results are generic. Gradients and Hessian > and all resulting statistics are based on numerical > differentiation. 
> > ''' > > def _predict(self, params, x): > a, b, c = params > return a * np.exp(-b * x) + c > > > mod = MyNonLinearMLEModel(yn, x) > res = mod.fit(start_params=[1., 1., 1., 1.]) > print 'true parameters' > print np.array(list(p0)+[0.2]) > print 'parameter estimates' > print res.params > print 'standard errors, hessian, jacobian, sandwich' > print res.bse > print res.bsejac > print res.bsejhj > ------- > curve_fit results: > values: [ 2.80720815 1.24568449 0.44517316] > errors: [ 0.12563313 0.12409886 0.05875364] > Optimization terminated successfully. > Current function value: -7.401006 > Iterations: 141 > Function evaluations: 247 > true parameters > [ 2.5 1.3 0.5 0.2] > parameter estimates > [ 2.80725714 1.24569287 0.44515168 0.20867979] > standard errors, hessian, jacobian, sandwich > [ 0.12226468 0.12414298 0.05798039 0.02086842] > [ 0.15756494 0.15865264 0.06242619 0.02261329] > [ 0.09852726 0.10175901 0.05740336 0.01994902] There are large differences between the three methods!? Especially the Jacobian method is too high in this case (and doesn't match the one from curve_fit, which also uses the Jacobian method, right?). I did one more check with the hesse() method from MINUIT, which gives consistent (well, two significant digits) results with curve_fit in this case: curve_fit results: values: [ 2.80720814 1.24568448 0.44517316] errors: [ 0.12563313 0.12409886 0.05875364] minuit results values: [ 2.80652024 1.24525732 0.44525123] errors: [ 0.1260859 0.12784433 0.05975276] from minuit import Minuit def chi2(a, b, c): chi = yn - func(x, a, b, c) return (chi ** 2).sum() m = Minuit(chi2, a=2.5, b=1.3, c=0.5) m.migrad() m.hesse() pcov = red_chi2 * np.array(m.matrix()) popt = np.array(m.args) print 'minuit results' print 'values:', popt print 'errors:', np.sqrt(pcov.diagonal()) >>>> res.aic > -6.8020120409185836 >>>> res.bic > 0.84607998079400026 >>>> res.t_test([0,1,0,0],1) > Traceback (most recent call last): > File "", line 1, in > res.t_test([0,1,0,0],1) > File "E:\Josef\eclipsegworkspace\statsmodels-git\statsmodels-josef\scikits\statsmodels\base\model.py", > line 1020, in t_test > df_denom=self.model.df_resid) > AttributeError: 'MyNonLinearMLEModel' object has no attribute 'df_resid' >>>> res.model.df_resid = len(yn) - len(p0) >>>> res.t_test([0,1,0,0],1) > >>>> print res.t_test([0,1,0,0],1) > t=array([[ 1.97911215]]), p=array([[ 0.02683928]]), df_denom=47> >>>> > > GenericLikelihoodModel is still not polished, df_resid is not defined > generically. bootstrap raised an exception. > > I agree that scipy should have numerical differentiation, jacobian > Hessian like numdifftools. I've made an enhancement ticket and informed the numdifftools author about it: http://projects.scipy.org/scipy/ticket/1510 > > Josef > >> >> >> And it would be nice if the super-fast Levenberg-Marquard optimizer called >> >> by leastsq were also available for likelihood fits. >> >> Maybe it would be possible to add one method to scipy.optimize to compute >> >> the covariance matrix after any of the optimizers has run, >> >> i.e. the best-fit pars have been determined? >> >> cov = scipy.optimize.cov(func, pars) >> >> I think this can be done by computing the Hessian via finite differences and >> >> then inverting it. >> >> leastsq seems to compute the Hessian from a "Jacobian". This is the part I >> >> don't understand, but without going into the details, let me ask this: >> >> Is there something special about the Levenberg-Marquard optimizer that it >> >> requires the individual observations? 
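
For what it's worth, the Jacobian route can be checked numerically. A minimal sketch, again reusing func, x, yn and popt from the curve_fit example earlier in the thread, with a hand-written finite-difference Jacobian of the residual vector (no package assumed; resid, fd_jacobian, jtj and pcov_jtj are names made up here):

import numpy as np

def resid(pars):
    # one entry per observation, which is exactly what leastsq works with
    return yn - func(x, *pars)

def fd_jacobian(f, p, h=1e-6):
    # forward-difference Jacobian of the vector-valued function f at p
    p = np.asarray(p, dtype=float)
    r0 = f(p)
    jac = np.zeros((r0.size, p.size))
    for j in range(p.size):
        dp = np.zeros(p.size)
        dp[j] = h
        jac[:, j] = (f(p + dp) - r0) / h
    return jac

jac = fd_jacobian(resid, popt)
jtj = np.dot(jac.T, jac)                 # outer product of the Jacobian
s2 = np.sum(resid(popt) ** 2) / (len(x) - len(popt))
pcov_jtj = s2 * np.linalg.inv(jtj)
print 'errors from J^T J:', np.sqrt(pcov_jtj.diagonal())

The errors come out close to the curve_fit numbers quoted above, and J^T J can only be formed if the residuals are available observation by observation; once they are summed into a single chi-square value, only the Hessian of that scalar is left to work with.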
>> >> It uses the outer (?) product of the Jacobian (all observations) to >> find the improving directions, and the outerproduct of the Jacobian is >> a numerical approximation to the Hessian in the >> err = y-f(x,params) or loglikelihood case. >> >> The Hessian calculation can break down quite easily (not positive >> definite because of numerical problems), the product of the Jacobian >> is more robust in my experience and cheaper. (statsmodels >> GenericLikelihood has covariance based on Hessian, Jacobian and a >> sandwich of the two). >> >> Its true that there are often numerical problems with the Hessian. >> If the Jacobian method is more robust, then maybe that could be included in >> scipy.optimize instead of or in addition to the Hesse method? >> >> But I never looked at the internals of minpack. >> >> Josef >> >> >> >> Or is it just that the current implementation of _minpack._lmdif (which is >> >> called by leastsq) was written such that it works this way (i.e. includes a >> >> computation of cov_x in addition to x) and it could also be written to take >> >> a scalar func like all the other optimizers? >> >> Christoph >> >> >> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Mon Sep 5 19:57:58 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Mon, 05 Sep 2011 19:57:58 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: References: Message-ID: <4E656206.10503@gmail.com> Also in favour of a comments system. On 09/05/2011 03:39 AM, Michael Klitgaard wrote: > Would it be possible to include data files on SciPy-Central? > > In this case the file 'ct.raw'. I believe it would improve the quality > of the program to include the sample_data. > > This could perhaps make scipy central evem more usefull than other > code sharing sites. > > Sincerely > Michael > The problem I see with data files is that they could be potentially large. There may be a way around the problems associated with this, though. Maybe I am a bit naive, but I can't think of many common circumstances where a code fragment or program which is potentially useful to many people would be more helpful by providing a data file, since most people who would find said code useful would already have access to their own data set. I do, however, see the benefit of having some set of sample data on SciPy-Central, but perhaps this data set should be generic in that many different code contributions could reference it in a useful way. My two cents. -- Collin -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: From rnelsonchem at gmail.com Mon Sep 5 20:04:47 2011 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Mon, 5 Sep 2011 19:04:47 -0500 Subject: [SciPy-User] Problems with numpydoc In-Reply-To: References: Message-ID: Thank you so much Pauli! I can't believe it was something so simple. I spent a long time trying different things out... But not the most obvious I guess. 
Ryan On Mon, Sep 5, 2011 at 8:39 AM, Pauli Virtanen wrote: > Mon, 05 Sep 2011 01:20:57 -0500, Ryan Nelson wrote: > [clip] > > Write > > Parameters > ---------- > > instead of > > Parameters > __________ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgdunn at gmail.com Mon Sep 5 20:40:59 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Mon, 5 Sep 2011 20:40:59 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: <4E656206.10503@gmail.com> References: <4E656206.10503@gmail.com> Message-ID: Hi everyone, SciPy Central maintainer here - sorry for the slow reply. On Mon, Sep 5, 2011 at 19:57, Collin Stocks wrote: > Also in favour of a comments system. This is been bumped on the priority list. Django, which is used for SciPy Central, has a built-in commenting system. I will look at integrating it in the next while (just a bit busy at work with other things, but will get to it soon). My current highest priority is to complete library submissions, which are uploaded via a ZIP file. This will allow visitors to see the contents of the library by clicking on the file names (kind of like browsing a repo on GitHub). The next highest priority is commenting. So if you've got any requests for how commenting should look and behave: https://github.com/kgdunn/SciPyCentral/issues/111 > On 09/05/2011 03:39 AM, Michael Klitgaard wrote: >> Would it be possible to include data files on SciPy-Central? Absolutely. I've already got some Django code for this on my company's site (shameless plug: http://datasets.connectmv.com). By the way, does SciPy/NumPy have a way to load data from a URL like R? I've looked for this but can't seem to find anything on it. In R it is so nice to be able to say to someone: data = read.table('http://datasets.connectmv.com/file/ammonia.csv') rather that doing a two step: download and load. >> In this case the file 'ct.raw'. I believe it would improve the quality >> of the program to include the sample_data. >> >> This could perhaps make scipy central evem more usefull than other >> code sharing sites. Thanks for the idea! >> Sincerely >> Michael >> > > The problem I see with data files is that they could be potentially > large. There may be a way around the problems associated with this, though. Bandwidth on my host shouldn't be an issue. Right now I use less than 0.5% of my monthly 1200 Gb allocation, so there's plenty of room to grow. > Maybe I am a bit naive, but I can't think of many common circumstances > where a code fragment or program which is potentially useful to many > people would be more helpful by providing a data file, since most people > who would find said code useful would already have access to their own > data set. > > I do, however, see the benefit of having some set of sample data on > SciPy-Central, but perhaps this data set should be generic in that many > different code contributions could reference it in a useful way. Agreed. Which is why I was asking about loading data from a URL above. > My two cents. 
> > -- Collin From jsseabold at gmail.com Mon Sep 5 21:01:44 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 5 Sep 2011 21:01:44 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: References: <4E656206.10503@gmail.com> Message-ID: On Mon, Sep 5, 2011 at 8:40 PM, Kevin Dunn wrote: > Hi everyone, SciPy Central maintainer here - sorry for the slow reply. > > On Mon, Sep 5, 2011 at 19:57, Collin Stocks wrote: >> Also in favour of a comments system. > > This is been bumped on the priority list. Django, which is used for > SciPy Central, has a built-in commenting system. I will look at > integrating it in the next while (just a bit busy at work with other > things, but will get to it soon). > > My current highest priority is to complete library submissions, which > are uploaded via a ZIP file. This will allow visitors to see the > contents of the library by clicking on the file names (kind of like > browsing a repo on GitHub). > > The next highest priority is commenting. So if you've got any requests > for how commenting should look and behave: > https://github.com/kgdunn/SciPyCentral/issues/111 > >> On 09/05/2011 03:39 AM, Michael Klitgaard wrote: >>> Would it be possible to include data files on SciPy-Central? > > Absolutely. I've already got some Django code for this on my company's > site (shameless plug: http://datasets.connectmv.com). > We could also make the statsmdels datasets module independently distributable and available, if there's interest. > By the way, does SciPy/NumPy have a way to load data from a URL like R? > > I've looked for this but can't seem to find anything on it. In R it is > so nice to be able to say to someone: > data = read.table('http://datasets.connectmv.com/file/ammonia.csv') > > rather that doing a two step: download and load. > There is the DataSource class, though there are other ways this could be accomplished. I'm not sure that there's a function to do it yet. ds = np.lib.DataSource() fp = ds.open('http://datasets.connectmv.com/file/ammonia.csv') from StringIO import StringIO arr = np.genfromtxt(StringIO(fp.read()), names=True) Or you could use urllib import urllib fp2 = urllib.urlopen('http://datasets.connectmv.com/file/ammonia.csv') arr2 = np.genfromtxt(StringIO(fp2.read()), names=True) If there's nothing else def loadurl(url, *args, **kwargs): from urllib import urlopen from cStringIO import StringIO fp = urlopen(url) return np.genfromtxt(StringIO(fp.read()), *args, **kwargs) arr3 = loadurl('http://datasets.connectmv.com/file/ammonia.csv') Skipper >>> In this case the file 'ct.raw'. I believe it would improve the quality >>> of the program to include the sample_data. >>> >>> This could perhaps make scipy central evem more usefull than other >>> code sharing sites. > > Thanks for the idea! > >>> Sincerely >>> Michael >>> >> >> The problem I see with data files is that they could be potentially >> large. There may be a way around the problems associated with this, though. > > Bandwidth on my host shouldn't be an issue. Right now I use less than > 0.5% of my monthly 1200 Gb allocation, so there's plenty of room to > grow. > >> Maybe I am a bit naive, but I can't think of many common circumstances >> where a code fragment or program which is potentially useful to many >> people would be more helpful by providing a data file, since most people >> who would find said code useful would already have access to their own >> data set. 
>> >> I do, however, see the benefit of having some set of sample data on >> SciPy-Central, but perhaps this data set should be generic in that many >> different code contributions could reference it in a useful way. > > Agreed. Which is why I was asking about loading data from a URL above. > >> My two cents. >> >> -- Collin > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From matt.newville at gmail.com Mon Sep 5 14:36:51 2011 From: matt.newville at gmail.com (Matthew Newville) Date: Mon, 5 Sep 2011 11:36:51 -0700 (PDT) Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> On Monday, September 5, 2011 8:23:41 AM UTC-5, joep wrote: > > On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville > wrote: > > Hi, > > > > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: > >> > >> Hi, > >> > >> (I'm resurrecting an old post.) > >> > >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > >> > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: > >> > > I just found that for some functions such as fmin_bfgs, the argument > >> > > name > >> > > for the objective function to be minimized is f, and for others such > >> > > as > >> > > fmin, it is func. > >> > > I was wondering if this was intended, because I think it would be > >> > > better to > >> > > have consistent argument names across those functions. > >> > > > >> > > >> > It's unlikely that that was intentional. A patch would be welcome. > >> > "func" > >> > looks better to me than "f" or "F". > >> > >> There are still several inconsistencies in input or output of functions > >> in the optimize package. For instance, for input parameters the Jacobian > >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or > >> 'x_tol', etc. Outputs might be returned in a different order, e.g., > >> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, > >> cov_x, infodict, mesg, ier'. Some functions make use of the infodict > >> output whereas some return the same data individually. etc. > >> > >> If you still believe (as I do) that consistency of optimize > >> functions should be improved, I can work on it. Let me know > > > > Also +1. > > > > I would add that the call signatures and return values for the > user-supplied > > function to minimize should be made consistent too. Currently, some > > functions (leastsq) requires the return value to be an array, while > others > > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of residual). > > That seems like a serious impediment to changing algorithms. > > I don't see how that would be possible, since it's a difference in > algorithm, leastsq needs the values for individual observations (to > calculate Jacobian), the other ones don't care and only maximize an > objective function that could have arbitrary accumulation. > Well, I understand that this adds an implicit bias for least-squares, but if the algorithms receive an array from the user-function, taking (value*value).sum() might be preferred over raising an exception like ValueError: setting an array element with a sequence which is apparently meant to be read as "change your function to return a scalar". 
I don't see in the docs where it actually specifies what the user functions *should* return. I also agree with Charles' suggestion. Unraveling multi-dimensional arrays for leastsq (and others) would be convenient. --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From ghislain.viguier at cea.fr Tue Sep 6 04:39:20 2011 From: ghislain.viguier at cea.fr (Ghislain Viguier) Date: Tue, 06 Sep 2011 10:39:20 +0200 Subject: [SciPy-User] PB with scipy installation Message-ID: <4E65DC38.6060101@cea.fr> Hello, I try to install scipy with the intel mkl library but I get some errors : $ ls doc INSTALL.txt LATEST.txt LICENSE.txt MANIFEST.in PKG-INFO README.txt scipy setupegg.py setup.py setupscons.py THANKS.txt TOCHANGE.txt tools $ python setup.py install Warning: No configuration returned, assuming unavailable.blas_opt_info: blas_mkl_info: /usr/local/python-2.6.2/lib/python2.6/site-packages/numpy/distutils/system_info.py:527: UserWarning: Specified path /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib/em64t is invalid. warnings.warn('Specified path %s is invalid.' % d) NOT AVAILABLE [...] I added a symbolic link em64t to intel64 : $ ls /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib ia32 intel64 $ cd /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib $ ln -s intel64 em64t $ ls /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib/ em64t ia32 intel64 $ ll /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib/em64t lrwxrwxrwx 1 viguierg install2 7 Sep 6 10:16 /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib/em64t -> intel64 Then I started again the install script : $ python setup.py install Warning: No configuration returned, assuming unavailable.blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /usr/local/Intel_compilers/c/composerxe-2011.3.174/mkl/lib/em64t NOT AVAILABLE [...] But I still get errors. What am I doing wrong? Do I miss something?. Thanks for your support. Best regards, Ghislain Viguier PS : there is some futher information about my system : $ python -c 'from numpy.f2py.diagnose import run; run()' ------ os.name='posix' ------ sys.platform='linux2' ------ sys.version: 2.6.2 (r262:71600, Jul 22 2011, 11:30:26) [GCC 4.4.4 20100726 (Bull 4.4.4-13)] ------ sys.prefix: /usr/local/python-2.6.2 ------ sys.path=':/usr/local/python-2.6.2/lib/python2.6:/usr/local/python-2.6.2/lib/python26.zip:/usr/local/python-2.6.2/lib/python2.6/plat-linux2:/usr/local/python-2.6.2/lib/python2.6/lib-tk:/usr/local/python-2.6.2/lib/python2.6/lib-old:/usr/local/python-2.6.2/lib/python2.6/lib-dynload:/usr/local/python-2.6.2/lib/python2.6/site-packages' ------ Found new numpy version '1.5.1' in /usr/local/python-2.6.2/lib/python2.6/site-packages/numpy/__init__.pyc Found f2py2e version '1' in /usr/local/python-2.6.2/lib/python2.6/site-packages/numpy/f2py/f2py2e.pyc Found numpy.distutils version '0.4.0' in '/usr/local/python-2.6.2/lib/python2.6/site-packages/numpy/distutils/__init__.pyc' ------ Importing numpy.distutils.fcompiler ... 
ok ------ Checking availability of supported Fortran compilers: GnuFCompiler instance properties: archiver = ['/usr/bin/g77', '-cr'] compile_switch = '-c' compiler_f77 = ['/usr/bin/g77', '-g', '-Wall', '-fno-second- underscore', '-fPIC', '-O3', '-funroll-loops'] compiler_f90 = None compiler_fix = None libraries = ['g2c'] library_dirs = [] linker_exe = ['/usr/bin/g77', '-g', '-Wall', '-g', '-Wall'] linker_so = ['/usr/bin/g77', '-g', '-Wall', '-g', '-Wall', '- shared'] object_switch = '-o ' ranlib = ['/usr/bin/g77'] version = LooseVersion ('3.4.6') version_cmd = ['/usr/bin/g77', '--version'] IntelEM64TFCompiler instance properties: archiver = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort', '-cr'] compile_switch = '-c' compiler_f77 = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort', '-FI', '-w90', ' -w95', '-fPIC', '-cm', '-O3', '-unroll', '-tpp7', '-xW'] compiler_f90 = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort', '-FR', '-fPIC', '-cm', '-O3', '-unroll', '-tpp7', '-xW'] compiler_fix = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort', '-FI', '-fPIC', '-cm', '-O3', '-unroll', '-tpp7', '-xW'] libraries = [] library_dirs = [] linker_exe = None linker_so = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort', '-shared', '- shared', '-nofor_main'] object_switch = '-o ' ranlib = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort'] version = LooseVersion ('12.0.3.174') version_cmd = ['/ccc/products2/Intel_compilers/BullEL_6__x86_64/fortran/ composerxe-2011.3.174/bin/intel64/ifort', '-FI', '-V', '- c', '/tmp/tmpAM7Glj/aUFseR.f', '-o', '/tmp/tmpAM7Glj/aUFseR.o'] Gnu95FCompiler instance properties: archiver = ['/usr/bin/gfortran', '-cr'] compile_switch = '-c' compiler_f77 = ['/usr/bin/gfortran', '-Wall', '-ffixed-form', '-fno- second-underscore', '-fPIC', '-O3', '-funroll-loops'] compiler_f90 = ['/usr/bin/gfortran', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops'] compiler_fix = ['/usr/bin/gfortran', '-Wall', '-ffixed-form', '-fno- second-underscore', '-Wall', '-fno-second-underscore', '- fPIC', '-O3', '-funroll-loops'] libraries = ['gfortran'] library_dirs = [] linker_exe = ['/usr/bin/gfortran', '-Wall', '-Wall'] linker_so = ['/usr/bin/gfortran', '-Wall', '-Wall', '-shared'] object_switch = '-o ' ranlib = ['/usr/bin/gfortran'] version = LooseVersion ('4.4.4') version_cmd = ['/usr/bin/gfortran', '--version'] Fortran compilers found: --fcompiler=gnu GNU Fortran 77 compiler (3.4.6) --fcompiler=gnu95 GNU Fortran 95 compiler (4.4.4) --fcompiler=intelem Intel Fortran Compiler for EM64T-based apps (12.0.3.174) Compilers available for this platform, but not found: --fcompiler=absoft Absoft Corp Fortran Compiler --fcompiler=compaq Compaq Fortran Compiler --fcompiler=g95 G95 Fortran Compiler --fcompiler=intel Intel Fortran Compiler for 32-bit apps --fcompiler=intele Intel Fortran Compiler for Itanium apps --fcompiler=lahey Lahey/Fujitsu Fortran 95 Compiler --fcompiler=nag NAGWare Fortran 95 Compiler --fcompiler=pg Portland Group Fortran Compiler --fcompiler=vast Pacific-Sierra Research Fortran 90 Compiler Compilers not available on this platform: --fcompiler=hpux HP Fortran 90 Compiler --fcompiler=ibm IBM XL Fortran Compiler --fcompiler=intelev Intel Visual Fortran Compiler for Itanium apps 
--fcompiler=intelv Intel Visual Fortran Compiler for 32-bit apps --fcompiler=intelvem Intel Visual Fortran Compiler for 64-bit apps --fcompiler=mips MIPSpro Fortran Compiler --fcompiler=none Fake Fortran compiler --fcompiler=sun Sun or Forte Fortran 95 Compiler For compiler details, run 'config_fc --verbose' setup command. ------ Importing numpy.distutils.cpuinfo ... ok ------ CPU information: CPUInfoBase__get_nbits getNCPUs has_mmx has_sse has_sse2 has_sse3 has_ssse3 is_64bit is_Intel is_XEON is_Xeon is_i686 ------ -- Ghislain Viguier Support Applicatif CCRT-TGCC t?l. : 01 77 57 40 53 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Sep 6 10:16:27 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Sep 2011 10:16:27 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> Message-ID: On Mon, Sep 5, 2011 at 2:36 PM, Matthew Newville wrote: > > > On Monday, September 5, 2011 8:23:41 AM UTC-5, joep wrote: >> >> On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville >> wrote: >> > Hi, >> > >> > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: >> >> >> >> Hi, >> >> >> >> (I'm resurrecting an old post.) >> >> >> >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >> >> > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro wrote: >> >> > > I just found that for some functions such as fmin_bfgs, the >> >> > > argument >> >> > > name >> >> > > for the objective function to be minimized is f, and for others >> >> > > such >> >> > > as >> >> > > fmin, it is func. >> >> > > I was wondering if this was intended, because I think it would be >> >> > > better to >> >> > > have consistent argument names across those functions. >> >> > > >> >> > >> >> > It's unlikely that that was intentional. A patch would be welcome. >> >> > "func" >> >> > looks better to me than "f" or "F". >> >> >> >> There are still several inconsistencies in input or output of functions >> >> in the optimize package. For instance, for input parameters the >> >> Jacobian >> >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or >> >> 'x_tol', etc. Outputs might be returned in a different order, e.g., >> >> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, >> >> cov_x, infodict, mesg, ier'. Some functions make use of the infodict >> >> output whereas some return the same data individually. etc. >> >> >> >> If you still believe (as I do) that consistency of optimize >> >> functions should be improved, I can work on it. Let me know >> > >> > Also +1. >> > >> > I would add that the call signatures and return values for the >> > user-supplied >> > function to minimize should be made consistent too.? Currently, some >> > functions (leastsq) requires the return value to be an array, while >> > others >> > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of >> > residual). >> > That seems like a serious impediment to changing algorithms. >> >> I don't see how that would be possible, since it's a difference in >> algorithm, leastsq needs the values for individual observations (to >> calculate Jacobian), the other ones don't care and only maximize an >> objective function that could have arbitrary accumulation. 
> > Well, I understand that this adds an implicit bias for least-squares, but > if the algorithms receive an array from the user-function, taking > (value*value).sum() might be preferred over raising an exception like > ??? ValueError: setting an array element with a sequence I'd rather have an exception, but maybe one that is more explicit. leastsq is efficient for leastsquares problems. If I switch to fmin or similar, it's usually because I have a different objective function, and I want to have a reminder that I need to tell what my objective function is (cut and paste errors are pretty common). > > which is apparently meant to be read as "change your function to return a > scalar".? I don't see in the docs where it actually specifies what the user > functions *should* return. > > I also agree with Charles' suggestion. Unraveling multi-dimensional arrays > for leastsq (and others) would be convenient. I'm not quite sure what that means. I think there is a difference between low level wrappers for the optimization algorithms (leastsq) and "convenience" functions like curve_fit. I'm in favor of standardizing names (original topic), but I don't think it is useful to "enhance" general purpose optimizers with lot's of problem specific features and increase the list of optional arguments. However, pull request for convenience function to accompany curve_fit might be very welcome. Josef OT reply to Christoph Deil moved to https://groups.google.com/group/pystatsmodels/topics > > --Matt Newville > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Tue Sep 6 10:33:05 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Sep 2011 08:33:05 -0600 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> Message-ID: On Tue, Sep 6, 2011 at 8:16 AM, wrote: > On Mon, Sep 5, 2011 at 2:36 PM, Matthew Newville > wrote: > > > > > > On Monday, September 5, 2011 8:23:41 AM UTC-5, joep wrote: > >> > >> On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville > >> wrote: > >> > Hi, > >> > > >> > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: > >> >> > >> >> Hi, > >> >> > >> >> (I'm resurrecting an old post.) > >> >> > >> >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: > >> >> > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro > wrote: > >> >> > > I just found that for some functions such as fmin_bfgs, the > >> >> > > argument > >> >> > > name > >> >> > > for the objective function to be minimized is f, and for others > >> >> > > such > >> >> > > as > >> >> > > fmin, it is func. > >> >> > > I was wondering if this was intended, because I think it would be > >> >> > > better to > >> >> > > have consistent argument names across those functions. > >> >> > > > >> >> > > >> >> > It's unlikely that that was intentional. A patch would be welcome. > >> >> > "func" > >> >> > looks better to me than "f" or "F". > >> >> > >> >> There are still several inconsistencies in input or output of > functions > >> >> in the optimize package. For instance, for input parameters the > >> >> Jacobian > >> >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or > >> >> 'x_tol', etc. 
Outputs might be returned in a different order, e.g., > >> >> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, > >> >> cov_x, infodict, mesg, ier'. Some functions make use of the infodict > >> >> output whereas some return the same data individually. etc. > >> >> > >> >> If you still believe (as I do) that consistency of optimize > >> >> functions should be improved, I can work on it. Let me know > >> > > >> > Also +1. > >> > > >> > I would add that the call signatures and return values for the > >> > user-supplied > >> > function to minimize should be made consistent too. Currently, some > >> > functions (leastsq) requires the return value to be an array, while > >> > others > >> > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of > >> > residual). > >> > That seems like a serious impediment to changing algorithms. > >> > >> I don't see how that would be possible, since it's a difference in > >> algorithm, leastsq needs the values for individual observations (to > >> calculate Jacobian), the other ones don't care and only maximize an > >> objective function that could have arbitrary accumulation. > > > > Well, I understand that this adds an implicit bias for least-squares, but > > if the algorithms receive an array from the user-function, taking > > (value*value).sum() might be preferred over raising an exception like > > ValueError: setting an array element with a sequence > > I'd rather have an exception, but maybe one that is more explicit. > leastsq is efficient for leastsquares problems. If I switch to fmin or > similar, it's usually because I have a different objective function, > and I want to have a reminder that I need to tell what my objective > function is (cut and paste errors are pretty common). > > > > > which is apparently meant to be read as "change your function to return a > > scalar". I don't see in the docs where it actually specifies what the > user > > functions *should* return. > > > > I also agree with Charles' suggestion. Unraveling multi-dimensional > arrays > > for leastsq (and others) would be convenient. > > I'm not quite sure what that means. > > Assuming you are speaking of leastsq, unraveling multidimensional arrays means that the function can return, say, an (n,3) array. Currently leastsq only works with 1D arrays. The (n,3) case is convenient if you are fitting vector valued data where the residuals would most naturally also be vector valued. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Sep 6 10:52:18 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Sep 2011 10:52:18 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> Message-ID: On Tue, Sep 6, 2011 at 10:33 AM, Charles R Harris wrote: > > > On Tue, Sep 6, 2011 at 8:16 AM, wrote: >> >> On Mon, Sep 5, 2011 at 2:36 PM, Matthew Newville >> wrote: >> > >> > >> > On Monday, September 5, 2011 8:23:41 AM UTC-5, joep wrote: >> >> >> >> On Sun, Sep 4, 2011 at 8:44 PM, Matthew Newville >> >> wrote: >> >> > Hi, >> >> > >> >> > On Friday, September 2, 2011 1:31:46 PM UTC-5, Denis Laxalde wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> (I'm resurrecting an old post.) 
>> >> >> >> >> >> On Thu, 27 Jan 2011 18:54:39 +0800, Ralf Gommers wrote: >> >> >> > On Wed, Jan 26, 2011 at 12:41 AM, Joon Ro >> >> >> > wrote: >> >> >> > > I just found that for some functions such as fmin_bfgs, the >> >> >> > > argument >> >> >> > > name >> >> >> > > for the objective function to be minimized is f, and for others >> >> >> > > such >> >> >> > > as >> >> >> > > fmin, it is func. >> >> >> > > I was wondering if this was intended, because I think it would >> >> >> > > be >> >> >> > > better to >> >> >> > > have consistent argument names across those functions. >> >> >> > > >> >> >> > >> >> >> > It's unlikely that that was intentional. A patch would be welcome. >> >> >> > "func" >> >> >> > looks better to me than "f" or "F". >> >> >> >> >> >> There are still several inconsistencies in input or output of >> >> >> functions >> >> >> in the optimize package. For instance, for input parameters the >> >> >> Jacobian >> >> >> is sometimes name 'fprime' or 'Dfun', tolerances can be 'xtol' or >> >> >> 'x_tol', etc. Outputs might be returned in a different order, e.g., >> >> >> fsolve returns 'x, infodict, ier, mesg' whereas leastsq returns 'x, >> >> >> cov_x, infodict, mesg, ier'. Some functions make use of the infodict >> >> >> output whereas some return the same data individually. etc. >> >> >> >> >> >> If you still believe (as I do) that consistency of optimize >> >> >> functions should be improved, I can work on it. Let me know >> >> > >> >> > Also +1. >> >> > >> >> > I would add that the call signatures and return values for the >> >> > user-supplied >> >> > function to minimize should be made consistent too.? Currently, some >> >> > functions (leastsq) requires the return value to be an array, while >> >> > others >> >> > (anneal and fmin_l_bfgs_b) require a scalar (sum-of-squares of >> >> > residual). >> >> > That seems like a serious impediment to changing algorithms. >> >> >> >> I don't see how that would be possible, since it's a difference in >> >> algorithm, leastsq needs the values for individual observations (to >> >> calculate Jacobian), the other ones don't care and only maximize an >> >> objective function that could have arbitrary accumulation. >> > >> > Well, I understand that this adds an implicit bias for least-squares, >> > but >> > if the algorithms receive an array from the user-function, taking >> > (value*value).sum() might be preferred over raising an exception like >> > ??? ValueError: setting an array element with a sequence >> >> I'd rather have an exception, but maybe one that is more explicit. >> leastsq is efficient for leastsquares problems. If I switch to fmin or >> similar, it's usually because I have a different objective function, >> and I want to have a reminder that I need to tell what my objective >> function is (cut and paste errors are pretty common). >> >> > >> > which is apparently meant to be read as "change your function to return >> > a >> > scalar".? I don't see in the docs where it actually specifies what the >> > user >> > functions *should* return. >> > >> > I also agree with Charles' suggestion. Unraveling multi-dimensional >> > arrays >> > for leastsq (and others) would be convenient. >> >> I'm not quite sure what that means. >> > > Assuming you are speaking of leastsq, unraveling multidimensional arrays > means that the function can return, say, an (n,3) array. Currently leastsq > only works with 1D arrays. 
The (n,3) case is convenient if you are fitting > vector valued data where the residuals would most naturally also be vector > valued. What I don't understand is how you want the 3 arrays combined ? one stacked least squares problem, adding all errors**2, e.g. def errors(params, y, x): err = y - f(x,params) return err.ravel() or a weighted sum of the 3 errors, return (err*weight).ravel() or 3 separate optimization problems or multiobjective optimization? We have similar problems with panel data or system of equations (case 1 and 2 above), but combining them to one model is part of the definition of the objective function. case 3, 3 separate optimization problems, can be conveniently done in the linear case (linalg), but I don't see how leastsq could do it. Josef > > > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From newville at cars.uchicago.edu Tue Sep 6 10:53:47 2011 From: newville at cars.uchicago.edu (Matt Newville) Date: Tue, 6 Sep 2011 09:53:47 -0500 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> Message-ID: >> Well, I understand that this adds an implicit bias for least-squares, but >> if the algorithms receive an array from the user-function, taking >> (value*value).sum() might be preferred over raising an exception like >> ??? ValueError: setting an array element with a sequence > > I'd rather have an exception, but maybe one that is more explicit. > leastsq is efficient for leastsquares problems. If I switch to fmin or > similar, it's usually because I have a different objective function, > and I want to have a reminder that I need to tell what my objective > function is (cut and paste errors are pretty common). The present situation makes it more challenging to try out different minimization procedures, as the objective functions *must* be different, and in a way that is poorly (ok, un-) documented. If the objective functions had consistent signatures, it would make it much easier to (as was suggested) write a wrapper that allowed selection of the algorithm. >> which is apparently meant to be read as "change your function to return a >> scalar". I don't see in the docs where it actually specifies what the user >> functions *should* return. >> >> I also agree with Charles' suggestion. Unraveling multi-dimensional arrays >> for leastsq (and others) would be convenient. > > I'm not quite sure what that means. leastsq (and the underlying lmdif) require a 1-d array. The suggestion is to not fail if a n-d array is passed, but to unravel it. > I think there is a difference between low level wrappers for the > optimization algorithms (leastsq) and "convenience" functions like > curve_fit. I agree. > I'm in favor of standardizing names (original topic), but I don't > think it is useful to "enhance" general purpose optimizers with lot's > of problem specific features and increase the list of optional > arguments. OK. I was not suggesting adding problem-specific features, just suggesting that standardizing the behavior of the objective functions might be helpful too. 
--Matt From denis.laxalde at mcgill.ca Tue Sep 6 10:53:09 2011 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Tue, 6 Sep 2011 10:53:09 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> Message-ID: <20110906105309.160ba6c7@mail.gmail.com> I'm glad to see that several people agree on this topic. I will then work on standardizing names and orders of parameters and returns as well as possible grouping (e.g. for solver statistics). On Mon, 5 Sep 2011 09:23:41 -0400, josef.pktd at gmail.com wrote: > It might be a bit messy during deprecation with double names, and > there remain different arguments depending on the algorithm, e.g. > constraints or not, and if constraints which kind, objective value and > derivative in one function or in two. I guess the case input parameters could be treated by adding new parameters and pointing old ones to the latter with a warning. But what about returns? For instance, how should one deal with changes of order or parameters being grouped with others? In general, I have little knowledge concerning deprecation mechanisms so any advice (or documentation/example pointer) would be welcome. -- Denis From josef.pktd at gmail.com Tue Sep 6 11:23:19 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Sep 2011 11:23:19 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <9792885.2028.1315247811658.JavaMail.geo-discussion-forums@yqgp10> Message-ID: On Tue, Sep 6, 2011 at 10:53 AM, Matt Newville wrote: >>> Well, I understand that this adds an implicit bias for least-squares, but >>> if the algorithms receive an array from the user-function, taking >>> (value*value).sum() might be preferred over raising an exception like >>> ??? ValueError: setting an array element with a sequence >> >> I'd rather have an exception, but maybe one that is more explicit. >> leastsq is efficient for leastsquares problems. If I switch to fmin or >> similar, it's usually because I have a different objective function, >> and I want to have a reminder that I need to tell what my objective >> function is (cut and paste errors are pretty common). > > The present situation makes it more challenging to try out different > minimization procedures, as the objective functions *must* be > different, and in a way that is poorly (ok, un-) documented. ?If the > objective functions had consistent signatures, it would make it much > easier to (as was suggested) write a wrapper that allowed selection of > the algorithm. > >>> which is apparently meant to be read as "change your function to return a >>> scalar". I don't see in the docs where it actually specifies what the user >>> functions *should* return. >>> >>> I also agree with Charles' suggestion. Unraveling multi-dimensional arrays >>> for leastsq (and others) would be convenient. >> >> I'm not quite sure what that means. > > leastsq (and the underlying lmdif) require a 1-d array. ?The > suggestion is to not fail if a n-d array is passed, but to unravel it. > >> I think there is a difference between low level wrappers for the >> optimization algorithms (leastsq) and "convenience" functions like >> curve_fit. > > I agree. 
> >> I'm in favor of standardizing names (original topic), but I don't >> think it is useful to "enhance" general purpose optimizers with lot's >> of problem specific features and increase the list of optional >> arguments. > > OK. ?I was not suggesting adding problem-specific features, just > suggesting that standardizing the behavior of the objective functions > might be helpful too. What I'm trying to argue is that there are inherent differences between optimizers that cannot or should not be unified in the interface to the optimizers. This is different from removing naming inconsistencies. But I also think it would be useful to add a dispatch function with unified interface as Skipper proposed earlier in the thread, and that we use in statsmodels for some of the optimizers (although we don't wrap leastsq, and we haven't gotten around yet to wrap the constraint optimizers). scipy.distribution.fit now also allows a choice of optimizers, if I remember correctly. Josef > > --Matt > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Tue Sep 6 11:46:17 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Sep 2011 11:46:17 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <20110906105309.160ba6c7@mail.gmail.com> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> Message-ID: On Tue, Sep 6, 2011 at 10:53 AM, Denis Laxalde wrote: > I'm glad to see that several people agree on this topic. I will then > work on standardizing names and orders of parameters and returns > as well as possible grouping (e.g. for solver statistics). > > On Mon, 5 Sep 2011 09:23:41 -0400, > josef.pktd at gmail.com wrote: >> It might be a bit messy during deprecation with double names, and >> there remain different arguments depending on the algorithm, e.g. >> constraints or not, and if constraints which kind, objective value and >> derivative in one function or in two. > > I guess the case input parameters could be treated by adding new > parameters and pointing old ones to the latter with a warning. But what > about returns? For instance, how should one deal with changes of order > or parameters being grouped with others? > > In general, I have little knowledge concerning deprecation mechanisms so > any advice (or documentation/example pointer) would be welcome. One possibility, that has been used in numpy.histogram, is to add a "new=False" or "new=True" keyword, to switch behavior during transition. There are usually several warnings.warn for deprecation warnings in the scipy source but I don't know which are in the current version. 
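As a concrete illustration of the dispatch idea raised earlier in this exchange, here is a rough, hypothetical sketch (not an existing scipy interface) of a thin wrapper that adapts one residual function to the different objective-function signatures of leastsq, which wants a 1-d residual vector, and fmin, which wants a scalar.

import numpy as np
from scipy import optimize

def fit(residuals, p0, args=(), method="leastsq"):
    # hypothetical front end: one residual function, several optimizers
    if method == "leastsq":
        popt, ier = optimize.leastsq(residuals, p0, args=args)
        return popt
    elif method == "fmin":
        def ssq(p, *a):
            r = residuals(p, *a)
            return np.dot(r, r)      # scalar sum of squares for fmin
        return optimize.fmin(ssq, p0, args=args, disp=False)
    raise ValueError("unknown method %r" % method)

# toy usage: straight-line fit with either backend
x = np.arange(10.0)
y = 1.0 + 2.0 * x
resid = lambda p, x, y: y - (p[0] + p[1] * x)
print(fit(resid, [0.0, 0.0], args=(x, y), method="leastsq"))
print(fit(resid, [0.0, 0.0], args=(x, y), method="fmin"))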
Josef > > -- > Denis > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Tue Sep 6 13:08:22 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 6 Sep 2011 19:08:22 +0200 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> Message-ID: On Tue, Sep 6, 2011 at 5:46 PM, wrote: > On Tue, Sep 6, 2011 at 10:53 AM, Denis Laxalde > wrote: > > I'm glad to see that several people agree on this topic. I will then > > work on standardizing names and orders of parameters and returns > > as well as possible grouping (e.g. for solver statistics). > > > > On Mon, 5 Sep 2011 09:23:41 -0400, > > josef.pktd at gmail.com wrote: > >> It might be a bit messy during deprecation with double names, and > >> there remain different arguments depending on the algorithm, e.g. > >> constraints or not, and if constraints which kind, objective value and > >> derivative in one function or in two. > > > > I guess the case input parameters could be treated by adding new > > parameters and pointing old ones to the latter with a warning. But what > > about returns? For instance, how should one deal with changes of order > > or parameters being grouped with others? > > > > In general, I have little knowledge concerning deprecation mechanisms so > > any advice (or documentation/example pointer) would be welcome. > > One possibility, that has been used in numpy.histogram, is to add a > "new=False" or "new=True" keyword, to switch behavior during > transition. > This may apply to a few cases where the outputs should be reshuffled or changed, but we should try to minimize "new=" - it requires the user to change his code twice because eventually the new keyword should also disappear again. The histogram change didn't go very smoothly imho. If the changes are large, another option is to just deprecate the whole function and write a new one with the desired interface. That would just require a single change for the user. It does depend on whether a good new name can be found of course. For renaming of input parameters (both positional and keyword), just do the rename and then add the old name as a new keyword at the end. Then document in the docstring that it's deprecated, and check for it at the beginning of the function and if used do: warnings.warn("", DeprecationWarning) The message should also state the scipy version when it was deprecated (0.11 probably) and when it will disappear (0.12). Could you make an overview of which functions should be changed, and your proposed new unified interface? The best solution could depend on the details of the changed needed. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cjordan1 at uw.edu Tue Sep 6 13:16:23 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 6 Sep 2011 12:16:23 -0500 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> Message-ID: On Tue, Sep 6, 2011 at 12:08 PM, Ralf Gommers wrote: > > > On Tue, Sep 6, 2011 at 5:46 PM, wrote: >> >> On Tue, Sep 6, 2011 at 10:53 AM, Denis Laxalde >> wrote: >> > I'm glad to see that several people agree on this topic. I will then >> > work on standardizing names and orders of parameters and returns >> > as well as possible grouping (e.g. for solver statistics). >> > >> > On Mon, 5 Sep 2011 09:23:41 -0400, >> > josef.pktd at gmail.com wrote: >> >> It might be a bit messy during deprecation with double names, and >> >> there remain different arguments depending on the algorithm, e.g. >> >> constraints or not, and if constraints which kind, objective value and >> >> derivative in one function or in two. >> > >> > I guess the case input parameters could be treated by adding new >> > parameters and pointing old ones to the latter with a warning. But what >> > about returns? For instance, how should one deal with changes of order >> > or parameters being grouped with others? >> > >> > In general, I have little knowledge concerning deprecation mechanisms so >> > any advice (or documentation/example pointer) would be welcome. >> >> One possibility, that has been used in numpy.histogram, is to add a >> "new=False" or "new=True" keyword, to switch behavior during >> transition. > > This may apply to a few cases where the outputs should be reshuffled or > changed, but we should try to minimize "new=" - it requires the user to > change his code twice because eventually the new keyword should also > disappear again. The histogram change didn't go very smoothly imho. If the > changes are large, another option is to just deprecate the whole function > and write a new one with the desired interface. That would just require a > single change for the user. It does depend on whether a good new name can be > found of course. > > For renaming of input parameters (both positional and keyword), just do the > rename and then add the old name as a new keyword at the end. Then document > in the docstring that it's deprecated, and check for it at the beginning of > the function and if used do: > warnings.warn("", DeprecationWarning) > The message should also state the scipy version when it was deprecated (0.11 > probably) and when it will disappear (0.12). > > Could you make an overview of which functions should be changed, and your > proposed new unified interface? The best solution could depend on the > details of the changed needed. > > Ralf > Is there a natural place to put some documentation for any new scipy.optimize argument conventions? It'd be great to have them written down so people that want to contribute down the line don't accidentally use non-standard names. 
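For reference, the rename-and-warn mechanism described above could look roughly like the following sketch; the solver name and the maxit/maxiter parameter pair are hypothetical, not an actual scipy change.

import warnings

def some_solver(func, x0, maxiter=100, maxit=None):
    # hypothetical example: `maxit` renamed to `maxiter`
    if maxit is not None:
        warnings.warn("`maxit` is deprecated since scipy 0.11 and will be "
                      "removed in 0.12; use `maxiter` instead.",
                      DeprecationWarning)
        maxiter = maxit
    # ... the actual solver iterations would go here ...
    return x0

some_solver(lambda x: x, 1.0, maxit=50)   # issues the DeprecationWarning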
-Chris > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From kgdunn at gmail.com Tue Sep 6 13:25:19 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Tue, 6 Sep 2011 13:25:19 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: References: <4E656206.10503@gmail.com> Message-ID: On Mon, Sep 5, 2011 at 21:01, Skipper Seabold wrote: > On Mon, Sep 5, 2011 at 8:40 PM, Kevin Dunn wrote: >> Hi everyone, SciPy Central maintainer here - sorry for the slow reply. >> >> On Mon, Sep 5, 2011 at 19:57, Collin Stocks wrote: >>> Also in favour of a comments system. >> >> This is been bumped on the priority list. Django, which is used for >> SciPy Central, has a built-in commenting system. I will look at >> integrating it in the next while (just a bit busy at work with other >> things, but will get to it soon). >> >> My current highest priority is to complete library submissions, which >> are uploaded via a ZIP file. This will allow visitors to see the >> contents of the library by clicking on the file names (kind of like >> browsing a repo on GitHub). >> >> The next highest priority is commenting. So if you've got any requests >> for how commenting should look and behave: >> https://github.com/kgdunn/SciPyCentral/issues/111 >> >>> On 09/05/2011 03:39 AM, Michael Klitgaard wrote: >>>> Would it be possible to include data files on SciPy-Central? >> >> Absolutely. I've already got some Django code for this on my company's >> site (shameless plug: http://datasets.connectmv.com). >> > > We could also make the statsmdels datasets module independently > distributable and available, if there's interest. > >> By the way, does SciPy/NumPy have a way to load data from a URL like R? >> >> I've looked for this but can't seem to find anything on it. In R it is >> so nice to be able to say to someone: >> data = read.table('http://datasets.connectmv.com/file/ammonia.csv') >> >> rather that doing a two step: download and load. >> > > > There is the DataSource class, though there are other ways this could > be accomplished. I'm not sure that there's a function to do it yet. > > ds = np.lib.DataSource() > fp = ds.open('http://datasets.connectmv.com/file/ammonia.csv') > from StringIO import StringIO > arr = np.genfromtxt(StringIO(fp.read()), names=True) > > Or you could use urllib > > import urllib > fp2 = urllib.urlopen('http://datasets.connectmv.com/file/ammonia.csv') > arr2 = np.genfromtxt(StringIO(fp2.read()), names=True) > > If there's nothing else > > def loadurl(url, *args, **kwargs): > ? ?from urllib import urlopen > ? ?from cStringIO import StringIO > ? ?fp = urlopen(url) > ? ?return np.genfromtxt(StringIO(fp.read()), *args, **kwargs) > > arr3 = loadurl('http://datasets.connectmv.com/file/ammonia.csv') Thanks Skipper - I wasn't aware of the np.lib.DataSource class. It seems that it can handle compressed data sources as well. All 3 methods work great. Would you mind adding those code snippets to SciPy Central for others to see? Thanks, Kevin > Skipper > >>>> In this case the file 'ct.raw'. I believe it would improve the quality >>>> of the program to include the sample_data. >>>> >>>> This could perhaps make scipy central evem more usefull than other >>>> code sharing sites. >> >> Thanks for the idea! >> >>>> Sincerely >>>> Michael >>>> >>> >>> The problem I see with data files is that they could be potentially >>> large. There may be a way around the problems associated with this, though. 
>> >> Bandwidth on my host shouldn't be an issue. Right now I use less than >> 0.5% of my monthly 1200 Gb allocation, so there's plenty of room to >> grow. >> >>> Maybe I am a bit naive, but I can't think of many common circumstances >>> where a code fragment or program which is potentially useful to many >>> people would be more helpful by providing a data file, since most people >>> who would find said code useful would already have access to their own >>> data set. >>> >>> I do, however, see the benefit of having some set of sample data on >>> SciPy-Central, but perhaps this data set should be generic in that many >>> different code contributions could reference it in a useful way. >> >> Agreed. Which is why I was asking about loading data from a URL above. >> >>> My two cents. >>> >>> -- Collin From jsseabold at gmail.com Tue Sep 6 13:33:09 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Sep 2011 13:33:09 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: References: <4E656206.10503@gmail.com> Message-ID: On Tue, Sep 6, 2011 at 1:25 PM, Kevin Dunn wrote: > On Mon, Sep 5, 2011 at 21:01, Skipper Seabold wrote: >> On Mon, Sep 5, 2011 at 8:40 PM, Kevin Dunn wrote: >>> By the way, does SciPy/NumPy have a way to load data from a URL like R? >>> >>> I've looked for this but can't seem to find anything on it. In R it is >>> so nice to be able to say to someone: >>> data = read.table('http://datasets.connectmv.com/file/ammonia.csv') >>> >>> rather that doing a two step: download and load. >>> >> >> >> There is the DataSource class, though there are other ways this could >> be accomplished. I'm not sure that there's a function to do it yet. >> >> ds = np.lib.DataSource() >> fp = ds.open('http://datasets.connectmv.com/file/ammonia.csv') >> from StringIO import StringIO >> arr = np.genfromtxt(StringIO(fp.read()), names=True) >> >> Or you could use urllib >> >> import urllib >> fp2 = urllib.urlopen('http://datasets.connectmv.com/file/ammonia.csv') >> arr2 = np.genfromtxt(StringIO(fp2.read()), names=True) >> >> If there's nothing else >> >> def loadurl(url, *args, **kwargs): >> ? ?from urllib import urlopen >> ? ?from cStringIO import StringIO >> ? ?fp = urlopen(url) >> ? ?return np.genfromtxt(StringIO(fp.read()), *args, **kwargs) >> >> arr3 = loadurl('http://datasets.connectmv.com/file/ammonia.csv') > > Thanks Skipper - I wasn't aware of the np.lib.DataSource class. It > seems that it can handle compressed data sources as well. > > All 3 methods work great. Would you mind adding those code snippets to > SciPy Central for others to see? > http://scipy-central.org/item/25/1/load-a-url-into-an-array Someone feel free to make a pull request with loadurl, if it's useful. I'll try to remember to do it later. Skipper From ralf.gommers at googlemail.com Tue Sep 6 13:33:32 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 6 Sep 2011 19:33:32 +0200 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> Message-ID: On Tue, Sep 6, 2011 at 7:16 PM, Christopher Jordan-Squire wrote: > On Tue, Sep 6, 2011 at 12:08 PM, Ralf Gommers > wrote: > > > > > > On Tue, Sep 6, 2011 at 5:46 PM, wrote: > >> > >> On Tue, Sep 6, 2011 at 10:53 AM, Denis Laxalde > > >> wrote: > >> > I'm glad to see that several people agree on this topic. 
I will then > >> > work on standardizing names and orders of parameters and returns > >> > as well as possible grouping (e.g. for solver statistics). > >> > > >> > On Mon, 5 Sep 2011 09:23:41 -0400, > >> > josef.pktd at gmail.com wrote: > >> >> It might be a bit messy during deprecation with double names, and > >> >> there remain different arguments depending on the algorithm, e.g. > >> >> constraints or not, and if constraints which kind, objective value > and > >> >> derivative in one function or in two. > >> > > >> > I guess the case input parameters could be treated by adding new > >> > parameters and pointing old ones to the latter with a warning. But > what > >> > about returns? For instance, how should one deal with changes of order > >> > or parameters being grouped with others? > >> > > >> > In general, I have little knowledge concerning deprecation mechanisms > so > >> > any advice (or documentation/example pointer) would be welcome. > >> > >> One possibility, that has been used in numpy.histogram, is to add a > >> "new=False" or "new=True" keyword, to switch behavior during > >> transition. > > > > This may apply to a few cases where the outputs should be reshuffled or > > changed, but we should try to minimize "new=" - it requires the user to > > change his code twice because eventually the new keyword should also > > disappear again. The histogram change didn't go very smoothly imho. If > the > > changes are large, another option is to just deprecate the whole function > > and write a new one with the desired interface. That would just require a > > single change for the user. It does depend on whether a good new name can > be > > found of course. > > > > For renaming of input parameters (both positional and keyword), just do > the > > rename and then add the old name as a new keyword at the end. Then > document > > in the docstring that it's deprecated, and check for it at the beginning > of > > the function and if used do: > > warnings.warn("", DeprecationWarning) > > The message should also state the scipy version when it was deprecated > (0.11 > > probably) and when it will disappear (0.12). > > > > Could you make an overview of which functions should be changed, and your > > proposed new unified interface? The best solution could depend on the > > details of the changed needed. > > > > Ralf > > > > Is there a natural place to put some documentation for any new > scipy.optimize argument conventions? It'd be great to have them > written down so people that want to contribute down the line don't > accidentally use non-standard names. > > A page under the "Development Plans" on http://projects.scipy.org/scipyseems like the place to put it until this is completed. Eventually the unified interface should probably be described in http://docs.scipy.org/doc/scipy/reference/optimize.html Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From denis.laxalde at mcgill.ca Tue Sep 6 14:30:14 2011 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Tue, 6 Sep 2011 14:30:14 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> Message-ID: <20110906143014.35316829@mail.gmail.com> On Tue, 6 Sep 2011 19:08:22 +0200, Ralf Gommers wrote: > > One possibility, that has been used in numpy.histogram, is to add a > > "new=False" or "new=True" keyword, to switch behavior during > > transition. > > This may apply to a few cases where the outputs should be reshuffled or > changed, but we should try to minimize "new=" - it requires the user to > change his code twice because eventually the new keyword should also > disappear again. The histogram change didn't go very smoothly imho. If the > changes are large, another option is to just deprecate the whole function > and write a new one with the desired interface. That would just require a > single change for the user. It does depend on whether a good new name can be > found of course. > > For renaming of input parameters (both positional and keyword), just do the > rename and then add the old name as a new keyword at the end. Then document > in the docstring that it's deprecated, and check for it at the beginning of > the function and if used do: > warnings.warn("", DeprecationWarning) > The message should also state the scipy version when it was deprecated (0.11 > probably) and when it will disappear (0.12). Ok. Sounds good. > Could you make an overview of which functions should be changed, and your > proposed new unified interface? The best solution could depend on the > details of the changed needed. The first thing to do is to improve names consistency, imo. I will review all functions of the package and then post a list of possible improvements. Then, I need to think a bit more about the new ??unified interfaces?? idea as it could indeed solve some issues. As suggested, I'll create a page on when things will be more clear. -- Denis From bsouthey at gmail.com Tue Sep 6 14:53:44 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 06 Sep 2011 13:53:44 -0500 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <20110906143014.35316829@mail.gmail.com> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> <20110906143014.35316829@mail.gmail.com> Message-ID: <4E666C38.5060901@gmail.com> On 09/06/2011 01:30 PM, Denis Laxalde wrote: > On Tue, 6 Sep 2011 19:08:22 +0200, > Ralf Gommers wrote: >>> One possibility, that has been used in numpy.histogram, is to add a >>> "new=False" or "new=True" keyword, to switch behavior during >>> transition. >> This may apply to a few cases where the outputs should be reshuffled or >> changed, but we should try to minimize "new=" - it requires the user to >> change his code twice because eventually the new keyword should also >> disappear again. The histogram change didn't go very smoothly imho. If the >> changes are large, another option is to just deprecate the whole function >> and write a new one with the desired interface. That would just require a >> single change for the user. It does depend on whether a good new name can be >> found of course. 
>> >> For renaming of input parameters (both positional and keyword), just do the >> rename and then add the old name as a new keyword at the end. Then document >> in the docstring that it's deprecated, and check for it at the beginning of >> the function and if used do: >> warnings.warn("", DeprecationWarning) >> The message should also state the scipy version when it was deprecated (0.11 >> probably) and when it will disappear (0.12). > Ok. Sounds good. > >> Could you make an overview of which functions should be changed, and your >> proposed new unified interface? The best solution could depend on the >> details of the changed needed. > The first thing to do is to improve names consistency, imo. I will > review all functions of the package and then post a list of possible > improvements. Then, I need to think a bit more about the new ? unified > interfaces ? idea as it could indeed solve some issues. As > suggested, I'll create a page on when > things will be more clear. > I would say Ralf's idea of new functions would be the best approach even if scipy is still beta. But I would extend it by providing that suggested unified function and creating 'internal' versions (have a leading underscore) of existing functions. While it is initially code duplication, it also permits to change not only the argument but also the return values as needed. Thus you would have the advantage provided by Ralf and avoid the naming problem without affecting existing users. Bruce From denis.laxalde at mcgill.ca Wed Sep 7 11:47:45 2011 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Wed, 7 Sep 2011 11:47:45 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> <20110906143014.35316829@mail.gmail.com> <4E666C38.5060901@gmail.com> Message-ID: <20110907114745.3d0abdc1@mail.gmail.com> On Tue, 06 Sep 2011 13:53:44 -0500, Bruce Southey wrote: > I would say Ralf's idea of new functions would be the best approach even > if scipy is still beta. But I would extend it by providing that > suggested unified function and creating 'internal' versions (have a > leading underscore) of existing functions. While it is initially code > duplication, it also permits to change not only the argument but also > the return values as needed. Thus you would have the advantage provided > by Ralf and avoid the naming problem without affecting existing users. I like this idea of internal functions in combination with the unified interfaces. Yet, I think code (and potential maintenance work) duplication could be avoided by moving the code of respective functions to their internal clone and calling the latter in the former. Internal functions could have their parameters/returns names and order standardized. Existing functions' signature would be kept as is but deprecated and the unified interfaces would call internal functions. -- Denis From roeldeconinck at gmail.com Thu Sep 8 06:34:11 2011 From: roeldeconinck at gmail.com (Roel De Coninck) Date: Thu, 8 Sep 2011 10:34:11 +0000 (UTC) Subject: [SciPy-User] sugestion for loadmat (scipy.io) Message-ID: Hi, I spent a few hours trying to read some .mat files. The first error I got was a UnicodeDecodeError caused by VarReader4.read_char_array(). 
I could solve this by replacing S = arr.tostring().decode('ascii') by S = arr.tostring().decode('ascii', 'ignore') Then, I got this error, caused by the same method read_char_array(): TypeError: buffer is too small for requested array (full error message see below) I could solve this after trying lots of things, by changing S = arr.tostring().decode('ascii', 'ignore') into S = arr.tostring().decode('ascii', 'replace') It also works with S = arr.tostring().decode('utf-8', 'replace'), which is maybe even better as ascii. Is there a drawback to what I did that I did not yet discover? Else, I would suggest to add 'replace' to avoid UnicodeDecodeError in this method. Kind regards, Roel From dineshbvadhia at hotmail.com Thu Sep 8 08:35:15 2011 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Thu, 8 Sep 2011 05:35:15 -0700 Subject: [SciPy-User] SPARSE matrix dtypes, upcasting, sum function Message-ID: We have: I > 250000, J > 250000, nnz>10000000 data = scipy.ones(nnz, dtype=numpy.uint8) A = sparse.csr_matrix((data, (xrow, xcolumn)), shape=(I,J)) where xrow and xcolumn are int vectors of length nnz The row and column sums are: rowsum = A.sum(0) columnsum = A.sum(1) The max value given for each by Scipy are: rowsum .max() = 255 columnsum .max() = 255 But, the real values are: rowsum .max() = 41190 columnsum .max() = 1080 Can someone see what we are doing wrong? -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Thu Sep 8 09:20:23 2011 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Thu, 8 Sep 2011 09:20:23 -0400 Subject: [SciPy-User] SPARSE matrix dtypes, upcasting, sum function In-Reply-To: References: Message-ID: Maybe the problem that uint8 cannot be greater than 255. __ Oleksandr Huziy 2011/9/8 Dinesh B Vadhia > ** > We have: > > I > 250000, J > 250000, nnz>10000000 > > data = scipy.ones(nnz, dtype=numpy.uint8) > A = sparse.csr_matrix((data, (xrow, xcolumn)), shape=(I,J)) > > where xrow and xcolumn are int vectors of length nnz > > The row and column sums are: > rowsum = A.sum(0) > columnsum = A.sum(1) > > The max value given for each by Scipy are: > rowsum .max() = 255 > columnsum .max() = 255 > > But, the real values are: > rowsum .max() = 41190 > columnsum .max() = 1080 > > Can someone see what we are doing wrong? > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Sep 8 09:28:48 2011 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Sep 2011 09:28:48 -0400 Subject: [SciPy-User] SPARSE matrix dtypes, upcasting, sum function In-Reply-To: References: Message-ID: On Thu, Sep 8, 2011 at 8:35 AM, Dinesh B Vadhia wrote: > We have: > > I > 250000, J > 250000, nnz>10000000 > > data = scipy.ones(nnz, dtype=numpy.uint8) > A?= sparse.csr_matrix((data, (xrow, xcolumn)), shape=(I,J)) > > where xrow and xcolumn are int vectors of length nnz > > The row and column sums are: > rowsum?= A.sum(0) > columnsum = A.sum(1) > > The max value given for each by Scipy are: > rowsum?.max() = 255 > columnsum .max() = 255 > > But, the real values are: > rowsum?.max() = 41190 > columnsum .max() = 1080 > > Can someone see what we are doing wrong? It is at least a documentation bug, and I would have expected upcasting as well. 
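A minimal sketch of the practical workaround, using a tiny made-up matrix: cast the sparse matrix to a wider dtype before summing, so the accumulation no longer happens in uint8.

import numpy as np
import scipy.sparse as sparse

rows = np.array([0, 0, 1, 2, 2, 2])
cols = np.array([0, 1, 1, 0, 1, 2])
data = np.ones(len(rows), dtype=np.uint8)    # binary incidence data, saves memory
A = sparse.csr_matrix((data, (rows, cols)), shape=(3, 3))

rowsum = A.astype(np.int64).sum(1)           # upcast first, then sum
colsum = A.astype(np.int64).sum(0)

Building the matrix with an int64 or float data array from the start also works, at the cost of eight times the memory for the nonzero values.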
Note however that using integer will always have some potential overflow issues, which are platform dependent (because the default upcasting rules will use different sizes on different platforms). For example: import numpy as np a = 1024 * np.ones((4e6, 2), dtype=np.int16) a.sum(0) will give you the right answer on a 64 bits python on mac os x, but the wrong one on 32 bits. As soon as you are doing operations which can potentially overflow, I would advise to convert to float values. cheers, David From matthew.brett at gmail.com Thu Sep 8 12:02:36 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 8 Sep 2011 09:02:36 -0700 Subject: [SciPy-User] sugestion for loadmat (scipy.io) In-Reply-To: References: Message-ID: Hi, On Thu, Sep 8, 2011 at 3:34 AM, Roel De Coninck wrote: > Hi, > > I spent a few hours trying to read some .mat files. > The first error I got was a UnicodeDecodeError caused by > VarReader4.read_char_array(). ?I could solve this by replacing > S = arr.tostring().decode('ascii') > by > S = arr.tostring().decode('ascii', 'ignore') > > Then, I got this error, caused by the same method read_char_array(): > TypeError: buffer is too small for requested array > (full error message see below) > > I could solve this after trying lots of things, by changing > S = arr.tostring().decode('ascii', 'ignore') > into > S = arr.tostring().decode('ascii', 'replace') > > It also works with S = arr.tostring().decode('utf-8', 'replace'), which is > maybe even better as ascii. > > Is there a drawback to what I did that I did not yet discover? > Else, I would suggest to add 'replace' to avoid UnicodeDecodeError in this > method. I guess you've got characters above ord(127) in your mat file strings, and I hadn't hit that before. I think the correct fix in both cases is this: S = arr.tostring().decode('latin1') - does that work? Have you got a small .mat file you can send to me off-list that I can test against? Best, Matthew From jsseabold at gmail.com Thu Sep 8 12:52:17 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 8 Sep 2011 12:52:17 -0400 Subject: [SciPy-User] scipy central comments In-Reply-To: References: <4E656206.10503@gmail.com> Message-ID: Back on list. On Thu, Sep 8, 2011 at 12:43 PM, denis wrote: > Skipper, > ?re central data: definitely useful -- see R, > but it should be separate from scipy-central: > ?don't do everything at once. We have https://github.com/statsmodels/statsmodels/tree/master/scikits/statsmodels/datasets Many of the same datasets are available in R. Could be made available as a separate package, though some of it (endog, exog) attributes are specific to our testing and examples needs. > The functionality oughta include > ? ?listing: what's available, how big is it ? > ? ?load / loadtxt to a single array > ? ?splitting, sanitizing, summarizing: diverse, difficult > > BUT scipy-central-data may satisfy no one, in which case forget it. > (Personally I'd like to spec it first, shoot later, but.) > What we've used: http://statsmodels.sourceforge.net/devel/dataset_proposal.html#dataset-proposal > What I use today is this, ~ 3 pages: > def getdata( source, N=0, Ntest=0, classcol=-1, centre=0, verbose=0, > datadir=Datadir ): > """ X = getdata( slearn/xx uciml/yy ... classcol=None ) > ? ? ? ?findfile, load or loadtxt > ? ?X, classes = getdata( ... classcol = 0 or -1 ) > ? ? ? ?split off classes, astype(int) > ? ?X, y, Xtest, ytest = getdata( Ntest > 0 ) > ? ? ? ?split first N / last Ntest > ? ?centre: > ? ? ? 
?0 noop, 1 -= mean, 2 /= sd, 3 winsorise, 4 winsor + to_11 > > def findfile( filename, datadir ): > """ try datadir + filename + .npy .csv .csv.gz .txt .txt.gz > ? ?expand $vars, glob ?# cf openplus > > Fwiw, > http://stackoverflow.com/questions/6321476/python-api-to-load-various-machine-learning-datasets > got no satisfactory answer > but you might ask the scikits-learn guys again, see where they are > today. > Ours and their datasets module evolved from David C.'s original proposal I believe. Skipper From cweisiger at msg.ucsf.edu Thu Sep 8 18:29:12 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 8 Sep 2011 15:29:12 -0700 Subject: [SciPy-User] Arbitrary max-intensity projection Message-ID: I have a 3D volume of image data. I want to do a max-intensity projection of that data along an arbitrary axis (that is, not necessarily orthogonal). For example, projecting along the axis <1, 0, .5> would generate results like looking at the data a bit from above. Basically we're faking 3D views of our data. OpenGL 3D textures don't work due to the size of the image data in question (e.g. 512x512x60). Someone suggested reimplementing Amanatides & Woo, which is a fairly simple voxel raytracer. However, that doesn't mean it's trivial to implement, and I'd rather not reinvent and optimize the wheel if at all possible. Does anyone have any suggestions for known solutions to this problem? -Chris From cgohlke at uci.edu Thu Sep 8 19:13:01 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 08 Sep 2011 16:13:01 -0700 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: References: Message-ID: <4E694BFD.2030605@uci.edu> On 9/8/2011 3:29 PM, Chris Weisiger wrote: > I have a 3D volume of image data. I want to do a max-intensity > projection of that data along an arbitrary axis (that is, not > necessarily orthogonal). For example, projecting along the axis<1, 0, > .5> would generate results like looking at the data a bit from above. > Basically we're faking 3D views of our data. > > OpenGL 3D textures don't work due to the size of the image data in > question (e.g. 512x512x60). Someone suggested reimplementing > Amanatides& Woo, which is a fairly simple voxel raytracer. However, > that doesn't mean it's trivial to implement, and I'd rather not > reinvent and optimize the wheel if at all possible. Does anyone have > any suggestions for known solutions to this problem? > > -Chris Did you try VTK's vtkVolumeRayCastMIPFunction function as suggested before? OpenGL 3D textures should work if you successively render sub-volumes in the correct position and order. MIP can also be implemented with object-aligned 2D textures. Christoph From david_baddeley at yahoo.com.au Thu Sep 8 23:13:32 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 8 Sep 2011 20:13:32 -0700 (PDT) Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: <4E694BFD.2030605@uci.edu> Message-ID: <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> This gets even easier if you use mayavi from enthought.mayavi import mlab f = mlab.figure() v = mlab.pipeline.volume(mlab.pipeline.scalar_field(data.astype('uint8'))) you can then use the pipeline tool to (graphically) change the "Volume mapper type" to "RayCastMapper" and the "Ray cast function type" to "RayCastMIPFunction". 
(There's probably also a programatic way to do this) cheers, David --- On Fri, 9/9/11, Christoph Gohlke wrote: > From: Christoph Gohlke > Subject: Re: [SciPy-User] Arbitrary max-intensity projection > To: scipy-user at scipy.org > Received: Friday, 9 September, 2011, 11:13 AM > > > On 9/8/2011 3:29 PM, Chris Weisiger wrote: > > I have a 3D volume of image data. I want to do a > max-intensity > > projection of that data along an arbitrary axis (that > is, not > > necessarily orthogonal). For example, projecting along > the axis<1, 0, > > .5>? would generate results like looking at > the data a bit from above. > > Basically we're faking 3D views of our data. > > > > OpenGL 3D textures don't work due to the size of the > image data in > > question (e.g. 512x512x60). Someone suggested > reimplementing > > Amanatides&? Woo, which is a fairly simple > voxel raytracer. However, > > that doesn't mean it's trivial to implement, and I'd > rather not > > reinvent and optimize the wheel if at all possible. > Does anyone have > > any suggestions for known solutions to this problem? > > > > -Chris > > > Did you try VTK's vtkVolumeRayCastMIPFunction function as > suggested before? > > > > > > OpenGL 3D textures should work if you successively render > sub-volumes in > the correct position and order. MIP can also be implemented > with > object-aligned 2D textures. > > Christoph > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sloan.lindsey at gmail.com Fri Sep 9 08:41:10 2011 From: sloan.lindsey at gmail.com (Sloan Lindsey) Date: Fri, 9 Sep 2011 14:41:10 +0200 Subject: [SciPy-User] Strange Crash with SmoothBivariateSpline Message-ID: Greetings, I've run into a weird error, can somebody point me into the right direction for solving this? -Sloan A spline approximation was done with SmoothBivariateSpline (kx=ky=3). The result looks resonable, and the knots are tx, ty = [ 71. 71. 71. 71. 
74.14423984 77.68639778 79.67385522 81.08860568 83.07147709 85.07543132 85.63465634 86.22832447 86.59870818 87.23262262 87.99800451 88.21344492 88.45654654 88.71633794 88.91968993 89.16798449 89.36585818 89.54486934 89.73526206 89.9 89.9 89.9 89.9 ] [ 65.125 65.125 65.125 65.125 66.87453837 67.63985122 68.52433262 69.74994527 71.37307164 72.33426696 73.44560154 74.7343376 75.93928617 77.15502731 78.38831199 79.63896841 80.84707006 82.01251733 83.22576965 84.75783772 86.11389394 86.88393959 87.5448275 88.07140257 88.44564406 88.86952368 88.98030123 89.32480252 89.43357252 89.875 89.875 89.875 89.875 ] Passing tx[3:-3], ty[3:-3] and the same data to LSQBivariateSpline results in the following error messages First run: Somewhat reasonable result, but the following error message: Speicherzugriffsfehler (translated: memory access error) Second run: tx= 89.433572518132280 89.875000000000000 89.875000000000000 71.000000000000000 89.875000000000000 77.686397784215103 79.673855219130786 81.088605679238157 83.071477093858547 85.075431324216481 85.634656338424136 86.228324469401485 86.598708176557935 87.232622619809860 87.998004509544614 88.213444918140937 88.456546535788149 88.716337942127396 88.919689928696613 89.167984493833089 89.365858177406778 89.544869340967082 89.735262064678111 90.352355637621670 89.900000000000006 89.900000000000006 89.900000000000006 89.324802520411396 -2.7101082581252207 -2.7101082581252207 -2.7101082581252207 -2.7101082581252207 66.874538371357332 /usr/lib/python2.6/dist-packag es/scipy/interpolate/fitpack2.py:498: UserWarning: Error on entry, no approximation returned. The following conditions must hold: xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 If iopt==-1, then xb From lists at hilboll.de Fri Sep 9 08:41:25 2011 From: lists at hilboll.de (Andreas H.) Date: Fri, 9 Sep 2011 14:41:25 +0200 Subject: [SciPy-User] reading binary R data files Message-ID: Hi, I have a binary data file, which can be loaded in to R using load("myfile.dat") After executing this command, I have three arrays in the workspace, let's call them A, B, and C. I would like to load these data into Python. How can I do that? Cheers, Andreas. From dhanjal at telecom-paristech.fr Fri Sep 9 08:55:55 2011 From: dhanjal at telecom-paristech.fr (Charanpal Dhanjal) Date: Fri, 09 Sep 2011 14:55:55 +0200 Subject: [SciPy-User] reading binary R data files In-Reply-To: References: Message-ID: <422c0cea3039ca15453e07a9fa0741a4@telecom-paristech.fr> Try rpy2: http://rpy.sourceforge.net/rpy2.html On Fri, 9 Sep 2011 14:41:25 +0200, Andreas H. wrote: > Hi, > > I have a binary data file, which can be loaded in to R using > > load("myfile.dat") > > After executing this command, I have three arrays in the workspace, > let's > call them A, B, and C. > > I would like to load these data into Python. How can I do that? > > Cheers, > Andreas. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From lists at hilboll.de Fri Sep 9 11:02:51 2011 From: lists at hilboll.de (Andreas H.) Date: Fri, 9 Sep 2011 17:02:51 +0200 Subject: [SciPy-User] reading binary R data files In-Reply-To: <422c0cea3039ca15453e07a9fa0741a4@telecom-paristech.fr> References: <422c0cea3039ca15453e07a9fa0741a4@telecom-paristech.fr> Message-ID: > Try rpy2: http://rpy.sourceforge.net/rpy2.html Thanks for that hint. 
However, I somehow cannot figure out how to access the arrays after loading them: In [2]: rpy2.robjects.r("load('~/var_valid_8yrs_xeur_daymax.dat')") Out[2]: ['read..., 'sta_..., 'sta_..., 'sta_..., 'sta_..., 'sta_...] I have no clue about R whatsoever, just got that data ... Any hints? Thanks, A. From dineshbvadhia at hotmail.com Fri Sep 9 11:43:51 2011 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 9 Sep 2011 08:43:51 -0700 Subject: [SciPy-User] SPARSE matrix dtypes, upcasting, sum Message-ID: David Our operations are y <- Ax where A is a binary sparse matrix (hence the uint8) but x and and the result y are float vectors. The binary sparse matrix saves memory but is it really efficient if the resulting operation upcasts the result to a float? Dinesh Message: 4 Date: Thu, 8 Sep 2011 09:28:48 -0400 From: David Cournapeau Subject: Re: [SciPy-User] SPARSE matrix dtypes, upcasting, sum function To: SciPy Users List Message-ID: Content-Type: text/plain; charset=UTF-8 On Thu, Sep 8, 2011 at 8:35 AM, Dinesh B Vadhia wrote: > We have: > > I > 250000, J > 250000, nnz>10000000 > > data = scipy.ones(nnz, dtype=numpy.uint8) > A?= sparse.csr_matrix((data, (xrow, xcolumn)), shape=(I,J)) > > where xrow and xcolumn are int vectors of length nnz > > The row and column sums are: > rowsum?= A.sum(0) > columnsum = A.sum(1) > > The max value given for each by Scipy are: > rowsum?.max() = 255 > columnsum .max() = 255 > > But, the real values are: > rowsum?.max() = 41190 > columnsum .max() = 1080 > > Can someone see what we are doing wrong? It is at least a documentation bug, and I would have expected upcasting as well. Note however that using integer will always have some potential overflow issues, which are platform dependent (because the default upcasting rules will use different sizes on different platforms). For example: import numpy as np a = 1024 * np.ones((4e6, 2), dtype=np.int16) a.sum(0) will give you the right answer on a 64 bits python on mac os x, but the wrong one on 32 bits. As soon as you are doing operations which can potentially overflow, I would advise to convert to float values. cheers, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Fri Sep 9 17:01:53 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Fri, 9 Sep 2011 14:01:53 -0700 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> References: <4E694BFD.2030605@uci.edu> <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> Message-ID: VTK seems to be difficult to build for Python -- I should have mentioned that this needs to integrate into an existing Python program that's already been written, so a standalone app doesn't really meet my needs. The goal is to extend the functionality of a program that my users use to examine their data, not to solve a one-off problem. I've downloaded VTK's source and tried to build it, but I'm getting link errors right now, so this could be a rather hairy procedure all told. mayavi appears to be limited to source builds or the paid Enthought distribution? If I want to build it myself, I need...VTK. 
:) So in other words, my options currently appear to be, in no particular order, 1) Roll my own 2) Figure out how to build VTK with Python, at which point the problem is simple 3) Build VTK without Python, then figure out how to build mayavi against that VTK, with Python, at which point the problem is simple -Chris On Thu, Sep 8, 2011 at 8:13 PM, David Baddeley wrote: > This gets even easier if you use mayavi > > from enthought.mayavi import mlab > f = mlab.figure() > v = mlab.pipeline.volume(mlab.pipeline.scalar_field(data.astype('uint8'))) > > you can then use the pipeline tool to (graphically) change the "Volume mapper type" to "RayCastMapper" and the "Ray cast function type" to "RayCastMIPFunction". (There's probably also a programatic way to do this) > > cheers, > David > > --- On Fri, 9/9/11, Christoph Gohlke wrote: > >> From: Christoph Gohlke >> Subject: Re: [SciPy-User] Arbitrary max-intensity projection >> To: scipy-user at scipy.org >> Received: Friday, 9 September, 2011, 11:13 AM >> >> >> On 9/8/2011 3:29 PM, Chris Weisiger wrote: >> > I have a 3D volume of image data. I want to do a >> max-intensity >> > projection of that data along an arbitrary axis (that >> is, not >> > necessarily orthogonal). For example, projecting along >> the axis<1, 0, >> > .5>? would generate results like looking at >> the data a bit from above. >> > Basically we're faking 3D views of our data. >> > >> > OpenGL 3D textures don't work due to the size of the >> image data in >> > question (e.g. 512x512x60). Someone suggested >> reimplementing >> > Amanatides&? Woo, which is a fairly simple >> voxel raytracer. However, >> > that doesn't mean it's trivial to implement, and I'd >> rather not >> > reinvent and optimize the wheel if at all possible. >> Does anyone have >> > any suggestions for known solutions to this problem? >> > >> > -Chris >> >> >> Did you try VTK's vtkVolumeRayCastMIPFunction function as >> suggested before? >> >> >> >> >> >> OpenGL 3D textures should work if you successively render >> sub-volumes in >> the correct position and order. MIP can also be implemented >> with >> object-aligned 2D textures. >> >> Christoph >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Fri Sep 9 17:21:21 2011 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 9 Sep 2011 16:21:21 -0500 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: References: <4E694BFD.2030605@uci.edu> <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> Message-ID: On Fri, Sep 9, 2011 at 16:01, Chris Weisiger wrote: > VTK seems to be difficult to build for Python -- I should have > mentioned that this needs to integrate into an existing Python program > that's already been written, so a standalone app doesn't really meet > my needs. The goal is to extend the functionality of a program that my > users use to examine their data, not to solve a one-off problem. I've > downloaded VTK's source and tried to build it, but I'm getting link > errors right now, so this could be a rather hairy procedure all told. > > mayavi appears to be limited to source builds or the paid Enthought > distribution? If I want to build it myself, I need...VTK. :) Enthought employee here. 
EPD is free for academic users, which I assume you are one given your email address. http://www.enthought.com/products/edudownload.php If you are on Windows, you may try Christopher Gohlke's binaries: http://www.lfd.uci.edu/~gohlke/pythonlibs/ Or Python(x,y): http://code.google.com/p/pythonxy/wiki/Welcome > So in other words, my options currently appear to be, in no particular order, > 1) Roll my own > 2) Figure out how to build VTK with Python, at which point the problem is simple > 3) Build VTK without Python, then figure out how to build mayavi > against that VTK, with Python, at which point the problem is simple No, Mayavi uses the Python bindings that come with VTK. It does not build separate Python bindings to VTK. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From lists at hilboll.de Sat Sep 10 07:00:48 2011 From: lists at hilboll.de (Andreas H.) Date: Sat, 10 Sep 2011 13:00:48 +0200 Subject: [SciPy-User] reading binary R data files In-Reply-To: <422c0cea3039ca15453e07a9fa0741a4@telecom-paristech.fr> References: <422c0cea3039ca15453e07a9fa0741a4@telecom-paristech.fr> Message-ID: <4E6B4360.3000206@hilboll.de> > Try rpy2: http://rpy.sourceforge.net/rpy2.html okay, I got it. Here's the code, for reference to those who read my original question: If the file ``datafile.dat`` contains two arrays, ``array1`` and ``array2``, which can be loaded in *R* with the ``load`` command, this is how to read the arrays into Python:: import numpy as np import rpy2 r("load('~/datafile.dat')") array1 = np.array(rpy2.robjects.r["array1"]) array2 = np.array(rpy2.robjects.r["array2"]) Thanks for your hint towards rpy2! Cheers, A. From memmett at unc.edu Fri Sep 9 15:41:46 2011 From: memmett at unc.edu (Matthew Emmett) Date: Fri, 9 Sep 2011 15:41:46 -0400 Subject: [SciPy-User] MPI, threading, and the GIL Message-ID: Hi everyone, I am having trouble with MPI send/recv calls, and am wondering if I have come up against the Python GIL. I am using mpi4py with MVAPICH2 and the threading Python module. More specifically, our iterative algorithm needs to send data from rank N to rank N+1, but the rank N+1 processor doesn't need this data immediately - it has to do a few other things before it needs it. For each MPI process, I have three threads: one thread for computations, one thread for doing MPI sends, and one thread for doing MPI receives. I have set this up in a similar manner to the sendrev.py example here: http://code.google.com/p/mpi4py/source/browse/trunk/demo/threads/sendrecv.py The behavior that I have come across is the following: the time taken for each iteration of the computational part varies quite a bit. It should remain roughly constant, which I have confirmed in other tests. After all, the amount of work done in the computational part remains the same during each iteration. It seems like the threads are not running as smoothly as I expect, and I wonder if this is due to the GIL and my use of threads. Has anyone else dealt with a similar problem? I have a slightly outdated F90 implementation of the algorithm that isn't too far behind its Python cousin. I will try to bring it up to date and try the new communication pattern, but it would be nice to stay in Python land if possible. Any suggestions would be appreciated. 
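For comparison, here is a rough sketch of the usual single-threaded alternative: post non-blocking sends and receives up front and only wait on them when the data is actually needed. The array size, tag and neighbour pattern are invented for illustration; this is not the original code.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 1000
work = np.random.rand(n)          # stand-in for this iteration's result
incoming = np.empty(n)
send_req = None
recv_req = None

if rank < size - 1:
    # hand data forward to rank+1 without blocking on the transfer
    send_req = comm.Isend(work, dest=rank + 1, tag=77)
if rank > 0:
    recv_req = comm.Irecv(incoming, source=rank - 1, tag=77)

# ... do the work that does not need the neighbour's data here ...

if recv_req is not None:
    recv_req.Wait()               # block only when the data is required
if send_req is not None:
    send_req.Wait()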
Thanks, Matthew From aron at ahmadia.net Fri Sep 9 15:45:32 2011 From: aron at ahmadia.net (Aron Ahmadia) Date: Fri, 9 Sep 2011 22:45:32 +0300 Subject: [SciPy-User] [mpi4py] MPI, threading, and the GIL In-Reply-To: References: Message-ID: Hey Matt, *More specifically, our iterative algorithm needs to send data from rank N to rank N+1, but the rank N+1 processor doesn't need this data immediately - it has to do a few other things before it needs it. For each MPI process, I have three threads: one thread for computations, one thread for doing MPI sends, and one thread for doing MPI receives. * This is not idiomatic MPI. You can do the same thing with a single thread (and avoid GIL issues) by posting non-blocking sends and receives (MPI_Isend/MPI_Irecv) when you have the data to send and then issuing a 'wait' when you need the data to proceed on the receiving end. Aron On Fri, Sep 9, 2011 at 10:41 PM, Matthew Emmett wrote: > Hi everyone, > > I am having trouble with MPI send/recv calls, and am wondering if I > have come up against the Python GIL. I am using mpi4py with MVAPICH2 > and the threading Python module. > > More specifically, our iterative algorithm needs to send data from > rank N to rank N+1, but the rank N+1 processor doesn't need this data > immediately - it has to do a few other things before it needs it. For > each MPI process, I have three threads: one thread for computations, > one thread for doing MPI sends, and one thread for doing MPI receives. > > I have set this up in a similar manner to the sendrev.py example here: > > > http://code.google.com/p/mpi4py/source/browse/trunk/demo/threads/sendrecv.py > > The behavior that I have come across is the following: the time taken > for each iteration of the computational part varies quite a bit. It > should remain roughly constant, which I have confirmed in other tests. > After all, the amount of work done in the computational part remains > the same during each iteration. It seems like the threads are not > running as smoothly as I expect, and I wonder if this is due to the > GIL and my use of threads. > > Has anyone else dealt with a similar problem? > > I have a slightly outdated F90 implementation of the algorithm that > isn't too far behind its Python cousin. I will try to bring it up to > date and try the new communication pattern, but it would be nice to > stay in Python land if possible. > > Any suggestions would be appreciated. Thanks, > Matthew > > -- > You received this message because you are subscribed to the Google Groups > "mpi4py" group. > To post to this group, send email to mpi4py at googlegroups.com. > To unsubscribe from this group, send email to > mpi4py+unsubscribe at googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/mpi4py?hl=en. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Sep 10 17:36:35 2011 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Sep 2011 23:36:35 +0200 Subject: [SciPy-User] MPI, threading, and the GIL In-Reply-To: References: Message-ID: <4E6BD863.3010907@molden.no> Den 09.09.2011 21:41, skrev Matthew Emmett: > The behavior that I have come across is the following: the time taken > for each iteration of the computational part varies quite a bit. It > should remain roughly constant, which I have confirmed in other tests. > After all, the amount of work done in the computational part remains > the same during each iteration. 
It seems like the threads are not > running as smoothly as I expect, and I wonder if this is due to the > GIL and my use of threads. > It might be the GIL, if the blocking i/o calls comm.Send and comm.Recv do not release the GIL. I am not sure what mpi4py does. It ought to release the GIL around blocking i/o calls though. Sturla From sturla at molden.no Sat Sep 10 17:46:32 2011 From: sturla at molden.no (Sturla Molden) Date: Sat, 10 Sep 2011 23:46:32 +0200 Subject: [SciPy-User] [mpi4py] MPI, threading, and the GIL In-Reply-To: References: Message-ID: <4E6BDAB8.9020506@molden.no> Den 09.09.2011 21:45, skrev Aron Ahmadia: > Hey Matt, > > /More specifically, our iterative algorithm needs to send data from > rank N to rank N+1, but the rank N+1 processor doesn't need this data > immediately - it has to do a few other things before it needs it. For > each MPI process, I have three threads: one thread for computations, > one thread for doing MPI sends, and one thread for doing MPI receives. > / > This is not idiomatic MPI. You can do the same thing with a single > thread (and avoid GIL issues) by posting non-blocking sends and > receives (MPI_Isend/MPI_Irecv) when you have the data to send and then > issuing a 'wait' when you need the data to proceed on the receiving end. > Idiomatic MPI or not, threads and blocking i/o is almost always easier to work with than asynchronous i/o. An MPI-wrapper for Python should release the GIL to allow multiplexing of blocking i/o calls. If the MPI implementation does not have re-entrant MPI_Send and MPI_Recv methods, one might argue if (1) the GIL should be kept or (2) an explicit lock should be required in the Python code. I would probably prefer the latter (2) to avoid tying up the interpreter for other pending tasks. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From livingstonemark at gmail.com Sat Sep 10 23:40:01 2011 From: livingstonemark at gmail.com (Mark Livingstone) Date: Sun, 11 Sep 2011 13:40:01 +1000 Subject: [SciPy-User] Minimum points for descriptive statistics? Message-ID: Hi Guys, I am slowly bringing up to date the SalStat statistics program at http://sourceforge.net/projects/salstat/ which uses Numpy to hold its data, and to do some of the statistical calculations. I have two questions which I would like to solicit statistical points of view on. In the GUI, I have a wxPython grid where, as you would expect you put a series into each column and stats are then able to be calculated. (a) What I am wondering is what is the minimum number of data points you would feel should be present to perform the standard 5 number statistics? I guess that technically if you had two points, you could interpolate the median, then Q1 & Q3 but this seems doubtful to me? 3 numbers would seem a more solid proposal? Maybe we need an "Are you sure?" message box! ;-) (b) Is there any standard way that you deal with missing values (empty cells) in the data? Given that you can tick boxes to have a number of descriptive and other tests performed on a column, or between columns of data, it seems to me that different tests will have different ways to deal with missing data? It is not like you can just stick in some default value! Thanks in advance for any help you can suggest :-D Regards, MarkL -------------- next part -------------- An HTML attachment was scrubbed... 
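One common approach to question (b), sketched below with made-up data (this is not SalStat code): store empty cells as NaN and drop them per column before computing the descriptives, refusing to report quartiles when too few points remain. The min_points threshold of 3 is an arbitrary choice echoing question (a).

import numpy as np
from scipy import stats

def five_number_summary(column, min_points=3):
    col = np.asarray(column, dtype=float)
    col = col[~np.isnan(col)]             # drop missing cells in this column
    if col.size < min_points:
        raise ValueError("need at least %d non-missing values" % min_points)
    return (col.min(),
            stats.scoreatpercentile(col, 25),
            np.median(col),
            stats.scoreatpercentile(col, 75),
            col.max())

print(five_number_summary([1.0, np.nan, 4.0, 2.5, 7.0]))

For two-column tests the natural default is different: paired tests usually need row-wise (listwise) deletion so the pairs stay aligned, while per-column descriptives can drop missing values independently as above.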
URL: From gael.varoquaux at normalesup.org Sun Sep 11 11:12:27 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 11 Sep 2011 17:12:27 +0200 Subject: [SciPy-User] [ANN] ESCO2012 and PyHPC2011: Python in science in major computational science conference Message-ID: <20110911151227.GA23609@phare.normalesup.org> ESCO 2012 - European Seminar on Coupled Problems ================================================= ESCO2012 http://esco2012.femhub.com/ is the 3rd event in a series of interdisciplineary meetings dedicated to computational science challenges in multi-physics and PDEs. I was invited as ESCO last year. It was an aboslute pleasure, because it is a small conference that is very focused on discussions. I learned a lot and could sit down with people who code top notch PDE libraries such as FEniCS and have technical discussions. Besides, it is hosted in the historical brewery where the Pilsner was invented. Plenty of great beer. Application areas ------------------ Theoretical results as well as applications are welcome. Application areas include, but are not limited to: Computational electromagnetics, Civil engineering, Nuclear engineering, Mechanical engineering, Computational fluid dynamics, Computational geophysics, Geomechanics and rock mechanics, Computational hydrology, Subsurface modeling, Biomechanics, Computational chemistry, Climate and weather modeling, Wave propagation, Acoustics, Stochastic differential equations, and Uncertainty quantification. Minisymposia * Multiphysics and Multiscale Problems in Civil Engineering * Modern Numerical Methods for ODE * Porous Media Hydrodynamics * Nuclear Fuel Recycling Simulations * Adaptive Methods for Eigenproblems * Discontinuous Galerkin Methods for Electromagnetics * Undergraduate Projects in Technical Computing Software afternoon ------------------- Important part of each ESCO conference is a software afternoon featuring software projects by participants. Presented can be any computational software that has reached certain level of maturity, i.e., it is used outside of the author's institution, and it has a web page and a user documentation. Proceedings ----------- For each ESCO we strive to reserve a special issue of an international journal with impact factor. Proceedings of ESCO 2008 appeared in Math. Comput. Simul., proceedings of ESCO 2010 in CiCP and Appl. Math. Comput. Proceedings of ESCO 2012 will appear in Computing. Important Dates * December 15, 2011: Abstract submission deadline. * December 15, 2011: Minisymposia proposals. * January 15, 2012: Notification of acceptance. PyHPC: Python for High performance computing -------------------------------------------- If you are doing super computing, SC11, ( http://sc11.supercomputing.org/) the Super Computing conference is the reference conference. This year there will a workshop on high performance computing with Python: PyHPC (http://www.dlr.de/sc/desktopdefault.aspx/tabid-1183/1638_read-31733/). At the scipy conference, I was having a discussion with some of the attendees on how people often still do process management and I/O with Fortran in the big computing environment. This is counter productive. However, has success stories of supercomputing folks using high-level languages are not advertized, this is bound to stay. Come and tell us how you use Python for high performance computing! 
Topics * Python-based scientific applications and libraries * High performance computing * Parallel Python-based programming languages * Scientific visualization * Scientific computing education * Python performance and language issues * Problem solving environments with Python * Performance analysis tools for Python application Papers We invite you to submit a paper of up to 10 pages via the submission site. Authors are encouraged to use IEEE two column format. Important Dates * Full paper submission: September 19, 2011 * Notification of acceptance: October 7, 2011 * Camera-ready papers: October 31, 2011 From member at linkedin.com Sun Sep 11 11:35:45 2011 From: member at linkedin.com (Lionel Roubeyrie via LinkedIn) Date: Sun, 11 Sep 2011 15:35:45 +0000 (UTC) Subject: [SciPy-User] Join my network on LinkedIn Message-ID: <1927197726.2569273.1315755345998.JavaMail.app@ela4-bed79.prod> LinkedIn ------------ Lionel Roubeyrie requested to add you as a connection on LinkedIn: ------------------------------------------ I'd like to add you to my professional network on LinkedIn. Accept invitation from Lionel Roubeyrie http://www.linkedin.com/e/-3wy1w2-gsg6reiz-1w/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I235874647_20/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYMcBYTd3oQdPwRcP99bR9FpAJftCZcbPAPdPgSej8ScjgLrCBxbOYWrSlI/EML_comm_afe/?hs=false&tok=0avAN2ZZ4DDkU1 View invitation from Lionel Roubeyrie http://www.linkedin.com/e/-3wy1w2-gsg6reiz-1w/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I235874647_20/30OnPsQdzgTe3kPcAALqnpPbOYWrSlI/svi/?hs=false&tok=0dbdXQhIcDDkU1 -- (c) 2011, LinkedIn Corporation -------------- next part -------------- An HTML attachment was scrubbed... URL: From scipy at samueljohn.de Sun Sep 11 12:29:38 2011 From: scipy at samueljohn.de (Samuel John) Date: Sun, 11 Sep 2011 18:29:38 +0200 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: References: <4E694BFD.2030605@uci.edu> <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> Message-ID: <6393153B-2ADC-4703-9C09-A2594E012FA4@samueljohn.de> Hi Chris, on which platform are you trying to build VTK with python support? For the Mac, I contributed a homebrew formula for vtk (with python and qt support). I use this build in order to get the wonderful mayavi2. > brew options vtk # for alternatives, e.g. external qt. > brew install vtk --python --qt # wait some minutes :-) bests, Samuel On 09.09.2011, at 23:21, Robert Kern wrote: > On Fri, Sep 9, 2011 at 16:01, Chris Weisiger wrote: >> VTK seems to be difficult to build for Python -- I should have >> mentioned that this needs to integrate into an existing Python program >> that's already been written, so a standalone app doesn't really meet >> my needs. The goal is to extend the functionality of a program that my >> users use to examine their data, not to solve a one-off problem. I've >> downloaded VTK's source and tried to build it, but I'm getting link >> errors right now, so this could be a rather hairy procedure all told. >> >> mayavi appears to be limited to source builds or the paid Enthought >> distribution? If I want to build it myself, I need...VTK. :) > > Enthought employee here. > > EPD is free for academic users, which I assume you are one given your > email address. 
> > http://www.enthought.com/products/edudownload.php > > If you are on Windows, you may try Christopher Gohlke's binaries: > > http://www.lfd.uci.edu/~gohlke/pythonlibs/ > > Or Python(x,y): > > http://code.google.com/p/pythonxy/wiki/Welcome > >> So in other words, my options currently appear to be, in no particular order, >> 1) Roll my own >> 2) Figure out how to build VTK with Python, at which point the problem is simple >> 3) Build VTK without Python, then figure out how to build mayavi >> against that VTK, with Python, at which point the problem is simple > > No, Mayavi uses the Python bindings that come with VTK. It does not > build separate Python bindings to VTK. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From w.kejia at gmail.com Sat Sep 10 14:09:21 2011 From: w.kejia at gmail.com (=?UTF-8?Q?Kejia=E6=9F=AF=E5=98=89?=) Date: Sat, 10 Sep 2011 11:09:21 -0700 (PDT) Subject: [SciPy-User] a notable horizontal reference line Message-ID: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> Hi all, I am making some speedup diagram, so it is better to highlight the base---the horizontal line, y = 1. How can I do that? Thanks. -------------- Kejia -------------- next part -------------- An HTML attachment was scrubbed... URL: From w.kejia at gmail.com Sun Sep 11 13:43:34 2011 From: w.kejia at gmail.com (=?UTF-8?Q?Kejia=E6=9F=AF=E5=98=89?=) Date: Sun, 11 Sep 2011 10:43:34 -0700 (PDT) Subject: [SciPy-User] a notable horizontal reference line In-Reply-To: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> References: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> Message-ID: <13594579.430.1315763014752.JavaMail.geo-discussion-forums@yqjc18> I mean how to draw the grid line y = 1 in x-y coordination? -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Mon Sep 12 02:23:03 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 11 Sep 2011 23:23:03 -0700 Subject: [SciPy-User] [ANN] PyHPC2011: Python at SuperComputing 2011 in Seattle Message-ID: Hi all, SC is the largest conference focused on high-performance computing, this year it will be held in Seattle: http://sc11.supercomputing.org/ and as part of the conference, a Python-focused workshop is being organized. The deadline for papers is coming up soon (Sept 19), so if you are interested in participating there is still time to get your submission ready! 
Papers up to 10 pages are welcome on any of the following topics: Python-based scientific applications and libraries High performance computing Parallel Python-based programming languages Scientific visualization Scientific computing education Python performance and language issues Problem solving environments with Python Performance analysis tools for Python application For full details, please see: http://www.dlr.de/sc/desktopdefault.aspx/tabid-1183/1638_read-31733/ From scipy at samueljohn.de Mon Sep 12 03:58:29 2011 From: scipy at samueljohn.de (Samuel John) Date: Mon, 12 Sep 2011 09:58:29 +0200 Subject: [SciPy-User] a notable horizontal reference line In-Reply-To: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> References: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> Message-ID: Hi Kejia, I assume you are using matplotlib (because scipy itself has no plotting). There are also other plotting alternatives like chacco. You could just add a horizontal line with axhline: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.axhline Or you could add another plot: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot > from pylab import * > plot( [1,2,4,8,16], label="my speed" ) > plot( [ [0,1], [4,1] ] , '-', label="baseline at 1") # just the first and last point connected by a line '-' > show() have fun, Samuel PS: I don't know, if the googlegroups adress is still in use. I think scipy-user at scipy.org is right. On 10.09.2011, at 20:09, Kejia?? wrote: > Hi all, > > I am making some speedup diagram, so it is better to highlight the base---the horizontal line, y = 1. How can I do that? Thanks. > > -------------- > Kejia > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From alan.isaac at gmail.com Mon Sep 12 08:40:27 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Sep 2011 08:40:27 -0400 Subject: [SciPy-User] a notable horizontal reference line In-Reply-To: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> References: <29828544.4723.1315678161038.JavaMail.geo-discussion-forums@yqah42> Message-ID: <4E6DFDBB.2050806@gmail.com> On 9/10/2011 2:09 PM, Kejia?? wrote: > I am making some speedup diagram, so it is better to highlight the base---the horizontal line, y = 1. How can I do that? https://lists.sourceforge.net/lists/listinfo/matplotlib-users hth, Alan Isaac From bsouthey at gmail.com Mon Sep 12 11:56:43 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Sep 2011 10:56:43 -0500 Subject: [SciPy-User] Minimum points for descriptive statistics? In-Reply-To: References: Message-ID: <4E6E2BBB.9050100@gmail.com> On 09/10/2011 10:40 PM, Mark Livingstone wrote: > Hi Guys, > > I am slowly bringing up to date the SalStat statistics program at > http://sourceforge.net/projects/salstat/ which uses Numpy to hold its > data, and to do some of the statistical calculations. No clue about your program and what you do. > > I have two questions which I would like to solicit statistical points > of view on. > > In the GUI, I have a wxPython grid where, as you would expect you put > a series into each column and stats are then able to be calculated. > > (a) What I am wondering is what is the minimum number of data points > you would feel should be present to perform the standard 5 number > statistics? 
I guess that technically if you had two points, you could > interpolate the median, then Q1 & Q3 but this seems doubtful to me? 3 > numbers would seem a more solid proposal? Maybe we need an "Are you > sure?" message box! ;-) You should assume that the user knows what they want. Often a user wants statistics on multiple variables so stopping it just for one variable is stupid. Also apps often give more than one value by default so the user does not care of the kurtosis is 'doubtful' because they only wanted the sum or number of observations. However, there is a computation restriction depending on how you compute higher order moments (usually kurtosis requires more than 3 observations). > > (b) Is there any standard way that you deal with missing values (empty > cells) in the data? Two options for statistical operations (including tests): remove/exclude or keep any missing values. Typically missing values are excluded but that is often easier said than done - masking or deleting can work. If you keep missing values then any operation involving a missing value is also missing. You might want to do that when not all 'columns' contain missing values. > Given that you can tick boxes to have a number of descriptive and > other tests performed on a column, or between columns of data, it > seems to me that different tests will have different ways to deal with > missing data? It is not like you can just stick in some default value! Actually you can put in a 'default value' (zero is good) provided that you adjust your counts accordingly. Alternatively you get into multiple imputation. > Thanks in advance for any help you can suggest :-D > > Regards, > > MarkL > Take a very long look at Mark's NA mask work in numpy but note that certain operations are not yet implemented: http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058103.html It will provide similar functionality to how R handles missing values. Bruce From memmett at unc.edu Mon Sep 12 11:33:08 2011 From: memmett at unc.edu (Matthew Emmett) Date: Mon, 12 Sep 2011 11:33:08 -0400 Subject: [SciPy-User] [mpi4py] MPI, threading, and the GIL In-Reply-To: <4E6BDAB8.9020506@molden.no> References: <4E6BDAB8.9020506@molden.no> Message-ID: Hi Strula, I think the problem was not the MPI wrapper, but that other parts of the code were hogging the GIL so that my MPI calls were not being called when I thought they would. Regardless, Aaron's suggestion was a good one: I added calls to post receive requests early, then simply issued a 'wait' when I needed the data. No more threading, just simple MPI calls. As Lisandro pointed out, this probably worked well for me since I am using an MPI library that has a progress thread. Matt On Sat, Sep 10, 2011 at 5:46 PM, Sturla Molden wrote: > Den 09.09.2011 21:45, skrev Aron Ahmadia: > > Hey Matt, > > This is not idiomatic MPI. ?You can do the same thing with a single thread > (and avoid GIL issues) by posting non-blocking sends and receives > (MPI_Isend/MPI_Irecv) when you have the data to send and then issuing a > 'wait' when you need the data to proceed on the receiving end. > > Idiomatic MPI or not, threads and blocking i/o is almost always easier to > work with than asynchronous i/o. An MPI-wrapper for Python should release > the GIL to allow multiplexing of blocking i/o calls. > > If the MPI implementation does not have re-entrant MPI_Send and MPI_Recv > methods, one might argue if (1) the GIL should be kept or (2) an explicit > lock should be required in the Python code. 
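For reference, a rough mpi4py sketch of the pattern Aron suggested and Matthew describes adopting above -- post the non-blocking receive early, do other work, then wait only when the data is needed. Buffer sizes, tags and the placeholder "other work" step are purely illustrative:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    recv_buf = np.empty(100, dtype='d')
    send_buf = np.arange(100, dtype='d') * rank

    # post the receive early, long before the data is actually needed
    if rank > 0:
        req_recv = comm.Irecv(recv_buf, source=rank - 1, tag=0)
    if rank < size - 1:
        req_send = comm.Isend(send_buf, dest=rank + 1, tag=0)

    # ... do the computation that does not depend on recv_buf ...

    if rank > 0:
        req_recv.Wait()    # block only at the point where the data is required
    if rank < size - 1:
        req_send.Wait()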
I would probably prefer the > latter (2) to avoid tying up the interpreter for other pending tasks. > > Sturla > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From guyer at nist.gov Mon Sep 12 13:15:18 2011 From: guyer at nist.gov (Jonathan Guyer) Date: Mon, 12 Sep 2011 13:15:18 -0400 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: <6393153B-2ADC-4703-9C09-A2594E012FA4@samueljohn.de> References: <4E694BFD.2030605@uci.edu> <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> <6393153B-2ADC-4703-9C09-A2594E012FA4@samueljohn.de> Message-ID: <9CA583FA-DDC6-4157-9D2F-CE39673D0C75@nist.gov> On Sep 11, 2011, at 12:29 PM, Samuel John wrote: > For the Mac, I contributed a homebrew formula for vtk (with python and qt support). I discovered this just a few days ago. Thank you!!! >> brew options vtk # for alternatives, e.g. external qt. I looked for just such a command and couldn't find it. Where is it documented? After building once without python support, I did a `brew edit vtk` to see what I'd need to do and discovered the --python and --qt options, but couldn't figure out how I was supposed to know about them without editing the formula. >> # wait some minutes :-) No kidding! From cweisiger at msg.ucsf.edu Mon Sep 12 13:22:09 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Mon, 12 Sep 2011 10:22:09 -0700 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: <6393153B-2ADC-4703-9C09-A2594E012FA4@samueljohn.de> References: <4E694BFD.2030605@uci.edu> <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> <6393153B-2ADC-4703-9C09-A2594E012FA4@samueljohn.de> Message-ID: On Sun, Sep 11, 2011 at 9:29 AM, Samuel John wrote: > Hi Chris, > > on which platform are you trying to build VTK with python support? While my users are primarily on OSX, I do also need to be able to provide Windows and Linux builds. It's a tricky business... I downloaded the Enthought distribution and was able to run my program with it, so at the very least there exists a solution. :) > > For the Mac, I contributed a homebrew formula for vtk (with python and qt support). > I use this build in order to get the wonderful mayavi2. > >> brew options vtk # for alternatives, e.g. external qt. >> brew install vtk --python --qt # wait some minutes :-) I have to say I haven't encountered the "brew" program before, and naturally the Internet would rather tell me about beer than programming. :) Are you running this in your VTK build directory? > > bests, > ?Samuel -Chris From david_baddeley at yahoo.com.au Mon Sep 12 15:58:50 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Mon, 12 Sep 2011 12:58:50 -0700 (PDT) Subject: [SciPy-User] Arbitrary max-intensity projection Message-ID: <1315857530.60398.yint-ygo-j2me@web113403.mail.gq1.yahoo.com> Debian based flavours of linux will typically have packages for vtk, vtk-dev, and python-vtk (names might vary). After installing these, getting mayavi working is pretty simple (i used easy_install). cheers, David On Tue, 13 Sep 2011 05:22 NZST Chris Weisiger wrote: >On Sun, Sep 11, 2011 at 9:29 AM, Samuel John wrote: >> Hi Chris, >> >> on which platform are you trying to build VTK with python support? > >While my users are primarily on OSX, I do also need to be able to >provide Windows and Linux builds. It's a tricky business... 
> >I downloaded the Enthought distribution and was able to run my program >with it, so at the very least there exists a solution. :) > >> >> For the Mac, I contributed a homebrew formula for vtk (with python and qt support). >> I use this build in order to get the wonderful mayavi2. >> >>> brew options vtk # for alternatives, e.g. external qt. >>> brew install vtk --python --qt # wait some minutes :-) > >I have to say I haven't encountered the "brew" program before, and >naturally the Internet would rather tell me about beer than >programming. :) Are you running this in your VTK build directory? > >> >> bests, >> ?Samuel > >-Chris >_______________________________________________ >SciPy-User mailing list >SciPy-User at scipy.org >http://mail.scipy.org/mailman/listinfo/scipy-user From cgohlke at uci.edu Mon Sep 12 15:59:40 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Mon, 12 Sep 2011 12:59:40 -0700 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: References: Message-ID: <4E6E64AC.8040205@uci.edu> On 9/8/2011 3:29 PM, Chris Weisiger wrote: > I have a 3D volume of image data. I want to do a max-intensity > projection of that data along an arbitrary axis (that is, not > necessarily orthogonal). For example, projecting along the axis<1, 0, > .5> would generate results like looking at the data a bit from above. > Basically we're faking 3D views of our data. > > OpenGL 3D textures don't work due to the size of the image data in > question (e.g. 512x512x60). Someone suggested reimplementing > Amanatides& Woo, which is a fairly simple voxel raytracer. However, > that doesn't mean it's trivial to implement, and I'd rather not > reinvent and optimize the wheel if at all possible. Does anyone have > any suggestions for known solutions to this problem? > > -Chris An alternative to using a 3D raycaster, VTK, or OpenGL is the "Shear-Warp" algorithm: Two implementations in C, which could possibly be wrapped for Python: The algorithm looks simple enough to be implemented in numpy and ndimage. See the appendix of the first paper for a start. Christoph From guyer at nist.gov Mon Sep 12 16:20:53 2011 From: guyer at nist.gov (Jonathan Guyer) Date: Mon, 12 Sep 2011 16:20:53 -0400 Subject: [SciPy-User] Arbitrary max-intensity projection In-Reply-To: References: <4E694BFD.2030605@uci.edu> <1315538012.9192.YahooMailClassic@web113414.mail.gq1.yahoo.com> <6393153B-2ADC-4703-9C09-A2594E012FA4@samueljohn.de> Message-ID: <538BDA94-FE8C-4BBA-99F4-1906395E0B73@nist.gov> On Sep 12, 2011, at 1:22 PM, Chris Weisiger wrote: > On Sun, Sep 11, 2011 at 9:29 AM, Samuel John wrote: >> I use this build in order to get the wonderful mayavi2. >> >>> brew options vtk # for alternatives, e.g. external qt. >>> brew install vtk --python --qt # wait some minutes :-) > > I have to say I haven't encountered the "brew" program before, and > naturally the Internet would rather tell me about beer than > programming. :) Are you running this in your VTK build directory? `brew` is short for Homebrew: http://mxcl.github.com/homebrew/ From wesmckinn at gmail.com Mon Sep 12 16:50:48 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 12 Sep 2011 16:50:48 -0400 Subject: [SciPy-User] ANN: pandas 0.4.0 release Message-ID: Dear all, I'm very pleased to announce the long-awaited release of the newest version of pandas. It's the product of an absolutely huge amount of development work primarily over the last 4 months. 
By the numbers: - Over 550 commits over 6 months - Codebase increased more than 60% in size - More than 300 new test functions, with overall > 97% line coverage The list of new features, improvements, and other changes is large, but the main bullet points are are: - Significantly enhanced GroupBy functionality - Hierarchical indexing - New pivoting and reshaping methods - Improved PyTables/HDF5-based IO class - Improved flat file (CSV, delimited text) parsing functions - More advanced label-based indexing (getting/setting) - Refactored former DataFrame/DataMatrix class into a single unified DataFrame class - Host of new methods and speed optimizations - Memory-efficient "sparse" versions of data structures for mostly NA or mostly constant (e.g. 0) data - Better mixed dtype-handling and missing data support For the full list of new features and enhancements since the 0.3.0 release, I refer interested people to the release notes on GitHub (see link below). In addition, the documentation (see below) has been nearly completely rewritten and expanded to cover almost all of the features of the library in great detail: http://pandas.sourceforge.net I expect more frequent releases of pandas going forward, especially given the breadth and scope of the new functionality. I look forward to user feedback (good and bad) on all the new functionality. Special thanks to all the users who contributed bug reports, feature requests, and ideas to this release. best, Wes Links ===== Release Notes: https://github.com/wesm/pandas/blob/master/RELEASE.rst Documentation: http://pandas.sourceforge.net Installers: http://pypi.python.org/pypi/pandas Code Repository: http://github.com/wesm/pandas Mailing List: http://groups.google.com/group/pystatsmodels Blog: http://blog.wesmckinney.com What is it ========== **pandas** is a `Python `__ package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** data analysis in Python. Additionally, it has the broader goal of becoming **the most powerful and flexible open source data analysis / manipulation tool available in any language**. It is already well on its way toward this goal. pandas is well suited for many different kinds of data: - Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet - Ordered and unordered (not necessarily fixed-frequency) time series data. - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels - Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure The two primary data structures of pandas, :class:`Series` (1-dimensional) and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, :class:`DataFrame` provides everything that R's ``data.frame`` provides and much more. pandas is built on top of `NumPy `__ and is intended to integrate well within a scientific computing environment with many other 3rd party libraries. 
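By way of illustration of the Series/DataFrame and groupby functionality described above, a tiny sketch with made-up data (written against the 0.4-era API, so treat the details as approximate):

    import numpy as np
    from pandas import DataFrame

    df = DataFrame({'key':   ['a', 'a', 'b', 'b'],
                    'value': [1.0, 2.0, 3.0, np.nan]})

    grouped = df.groupby('key')
    means = grouped.mean()    # missing values (NaN) are excluded from the aggregate
    print(means)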
Here are just a few of the things that pandas does well: - Easy handling of **missing data** (represented as NaN) in floating point as well as non-floating point data - Size mutability: columns can be **inserted and deleted** from DataFrame and higher dimensional objects - Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let `Series`, `DataFrame`, etc. automatically align the data for you in computations - Powerful, flexible **group by** functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data - Make it **easy to convert** ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting** of large data sets - Intuitive **merging** and **joining** data sets - Flexible **reshaping** and pivoting of data sets - **Hierarchical** labeling of axes (possible to have multiple labels per tick) - Robust IO tools for loading data from **flat files** (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast **HDF5 format** - **Time series**-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc. Many of these principles are here to address the shortcomings frequently experienced using other languages / scientific research environments. For data scientists, working with data is typically divided into multiple stages: munging and cleaning data, analyzing / modeling it, then organizing the results of the analysis into a form suitable for plotting or tabular display. pandas is the ideal tool for all of these tasks. From oliphant at enthought.com Mon Sep 12 16:56:34 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 12 Sep 2011 15:56:34 -0500 Subject: [SciPy-User] ANN: pandas 0.4.0 release In-Reply-To: References: Message-ID: Congratulations Wes. All that shiny functionality looks inviting. Travis -- (mobile phone of) Travis Oliphant Enthought, Inc. http://www.enthought.com On Sep 12, 2011, at 3:50 PM, Wes McKinney wrote: > Dear all, > > I'm very pleased to announce the long-awaited release of the newest > version of pandas. It's the product of an absolutely huge amount of > development work primarily over the last 4 months. By the numbers: > > - Over 550 commits over 6 months > - Codebase increased more than 60% in size > - More than 300 new test functions, with overall > 97% line coverage > > The list of new features, improvements, and other changes is large, > but the main bullet points are are: > > - Significantly enhanced GroupBy functionality > - Hierarchical indexing > - New pivoting and reshaping methods > - Improved PyTables/HDF5-based IO class > - Improved flat file (CSV, delimited text) parsing functions > - More advanced label-based indexing (getting/setting) > - Refactored former DataFrame/DataMatrix class into a single unified > DataFrame class > - Host of new methods and speed optimizations > - Memory-efficient "sparse" versions of data structures for mostly NA > or mostly constant (e.g. 0) data > - Better mixed dtype-handling and missing data support > > For the full list of new features and enhancements since the 0.3.0 > release, I refer interested people to the release notes on GitHub (see > link below). 
> > In addition, the documentation (see below) has been nearly completely > rewritten and expanded to cover almost all of the features of the > library in great detail: > > http://pandas.sourceforge.net > > I expect more frequent releases of pandas going forward, especially > given the breadth and scope of the new functionality. I look forward > to user feedback (good and bad) on all the new functionality. Special > thanks to all the users who contributed bug reports, feature requests, > and ideas to this release. > > best, > Wes > > Links > ===== > Release Notes: https://github.com/wesm/pandas/blob/master/RELEASE.rst > Documentation: http://pandas.sourceforge.net > Installers: http://pypi.python.org/pypi/pandas > Code Repository: http://github.com/wesm/pandas > Mailing List: http://groups.google.com/group/pystatsmodels > Blog: http://blog.wesmckinney.com > > What is it > ========== > **pandas** is a `Python `__ package providing fast, > flexible, and expressive data structures designed to make working with > "relational" or "labeled" data both easy and intuitive. It aims to be the > fundamental high-level building block for doing practical, **real world** data > analysis in Python. Additionally, it has the broader goal of becoming **the > most powerful and flexible open source data analysis / manipulation tool > available in any language**. It is already well on its way toward this goal. > > pandas is well suited for many different kinds of data: > > - Tabular data with heterogeneously-typed columns, as in an SQL table or > Excel spreadsheet > - Ordered and unordered (not necessarily fixed-frequency) time series data. > - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and > column labels > - Any other form of observational / statistical data sets. The data actually > need not be labeled at all to be placed into a pandas data structure > > The two primary data structures of pandas, :class:`Series` (1-dimensional) > and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use > cases in finance, statistics, social science, and many areas of > engineering. For R users, :class:`DataFrame` provides everything that R's > ``data.frame`` provides and much more. pandas is built on top of `NumPy > `__ and is intended to integrate well within a scientific > computing environment with many other 3rd party libraries. > > Here are just a few of the things that pandas does well: > > - Easy handling of **missing data** (represented as NaN) in floating point as > well as non-floating point data > - Size mutability: columns can be **inserted and deleted** from DataFrame and > higher dimensional objects > - Automatic and explicit **data alignment**: objects can be explicitly > aligned to a set of labels, or the user can simply ignore the labels and > let `Series`, `DataFrame`, etc. 
automatically align the data for you in > computations > - Powerful, flexible **group by** functionality to perform > split-apply-combine operations on data sets, for both aggregating and > transforming data > - Make it **easy to convert** ragged, differently-indexed data in other > Python and NumPy data structures into DataFrame objects > - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting** > of large data sets > - Intuitive **merging** and **joining** data sets > - Flexible **reshaping** and pivoting of data sets > - **Hierarchical** labeling of axes (possible to have multiple labels per > tick) > - Robust IO tools for loading data from **flat files** (CSV and delimited), > Excel files, databases, and saving / loading data from the ultrafast **HDF5 > format** > - **Time series**-specific functionality: date range generation and frequency > conversion, moving window statistics, moving window linear regressions, > date shifting and lagging, etc. > > Many of these principles are here to address the shortcomings frequently > experienced using other languages / scientific research environments. For data > scientists, working with data is typically divided into multiple stages: > munging and cleaning data, analyzing / modeling it, then organizing the results > of the analysis into a form suitable for plotting or tabular display. pandas > is the ideal tool for all of these tasks. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at googlemail.com Mon Sep 12 17:36:11 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 12 Sep 2011 23:36:11 +0200 Subject: [SciPy-User] ANN: SciPy 0.10 beta 1 Message-ID: Hi, I am pleased to announce the availability of the first beta release of SciPy0.10.0. For this release over a 100 tickets and pull requests have been closed, and many new features have been added. Some of the highlights are: - support for Bento as a build system for scipy - generalized and shift-invert eigenvalue problems in sparse.linalg - addition of discrete-time linear systems in the signal module Sources and binaries can be found at https://sourceforge.net/projects/scipy/files/scipy/0.10.0b1/, release notes are copied below. Binaries for Python 2.x are available, on Python 3 there are a few known problems that should be solved first. When they are, a second beta will follow. Please try this release and report problems on the mailing list. Cheers, Ralf ========================== SciPy 0.10.0 Release Notes ========================== .. note:: Scipy 0.10.0 is not released yet! .. contents:: SciPy 0.10.0 is the culmination of XXX months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.10.x branch, and on adding new features on the development trunk. This release requires Python 2.4-2.7 or 3.1- and NumPy 1.5 or greater. New features ============ Bento: new optional build system -------------------------------- Scipy can now be built with `Bento `_. Bento has some nice features like parallel builds and partial rebuilds, that are not possible with the default build system (distutils). 
For usage instructions see BENTO_BUILD.txt in the scipy top-level directory. Currently Scipy has three build systems, distutils, numscons and bento. Numscons is deprecated and is planned and will likely be removed in the next release. Generalized and shift-invert eigenvalue problems in ``scipy.sparse.linalg`` --------------------------------------------------------------------------- The sparse eigenvalue problem solver functions ``scipy.sparse.eigs/eigh`` now support generalized eigenvalue problems, and all shift-invert modes available in ARPACK. Discrete-Time Linear Systems (``scipy.signal``) ----------------------------------------------- Support for simulating discrete-time linear systems, including ``scipy.signal.dlsim``, ``scipy.signal.dimpulse``, and ``scipy.signal.dstep``, has been added to SciPy. Conversion of linear systems from continuous-time to discrete-time representations is also present via the ``scipy.signal.cont2discrete`` function. Enhancements to ``scipy.signal`` -------------------------------- A Lomb-Scargle periodogram can now be computed with the new function ``scipy.signal.lombscargle``. The forward-backward filter function ``scipy.signal.filtfilt`` can now filter the data in a given axis of an n-dimensional numpy array. (Previously it only handled a 1-dimensional array.) Options have been added to allow more control over how the data is extended before filtering. FIR filter design with ``scipy.signal.firwin2`` now has options to create filters of type III (zero at zero and Nyquist frequencies) and IV (zero at zero frequency). Additional decomposition options (``scipy.linalg``) --------------------------------------------------- A sort keyword has been added to the Schur decomposition routine (``scipy.linalg.schur``) to allow the sorting of eigenvalues in the resultant Schur form. Additional special matrices (``scipy.linalg``) ---------------------------------------------- The functions ``hilbert`` and ``invhilbert`` were added to ``scipy.linalg``. Enhancements to ``scipy.stats`` ------------------------------- * The *one-sided form* of Fisher's exact test is now also implemented in ``stats.fisher_exact``. * The function ``stats.chi2_contingency`` for computing the chi-square test of independence of factors in a contingency table has been added, along with the related utility functions ``stats.contingency.margins`` and ``stats.contingency.expected_freq``. Basic support for Harwell-Boeing file format for sparse matrices ---------------------------------------------------------------- Both read and write are support through a simple function-based API, as well as a more complete API to control number format. The functions may be found in scipy.sparse.io. The following features are supported: * Read and write sparse matrices in the CSC format * Only real, symmetric, assembled matrix are supported (RUA format) Deprecated features =================== ``scipy.maxentropy`` -------------------- The maxentropy module is unmaintained, rarely used and has not been functioning well for several releases. Therefore it has been deprecated for this release, and will be removed for scipy 0.11. Logistic regression in scikits.learn is a good alternative for this functionality. The ``scipy.maxentropy.logsumexp`` function has been moved to ``scipy.misc``. ``scipy.lib.blas`` ------------------ There are similar BLAS wrappers in ``scipy.linalg`` and ``scipy.lib``. These have now been consolidated as ``scipy.linalg.blas``, and ``scipy.lib.blas`` is deprecated. 
Numscons build system --------------------- The numscons build system is being replaced by Bento, and will be removed in one of the next scipy releases. Removed features ================ The deprecated name `invnorm` was removed from ``scipy.stats.distributions``, this distribution is available as `invgauss`. The following deprecated nonlinear solvers from ``scipy.optimize`` have been removed:: - ``broyden_modified`` (bad performance) - ``broyden1_modified`` (bad performance) - ``broyden_generalized`` (equivalent to ``anderson``) - ``anderson2`` (equivalent to ``anderson``) - ``broyden3`` (obsoleted by new limited-memory broyden methods) - ``vackar`` (renamed to ``diagbroyden``) Other changes ============= ``scipy.constants`` has been updated with the CODATA 2010 constants. ``__all__`` dicts have been added to all modules, which has cleaned up the namespaces (particularly useful for interactive work). An API section has been added to the documentation, giving recommended import guidelines and specifying which submodules are public and which aren't. Checksums ========= f30a85149ebc3d023fce5e012cc7a28a release/installers/scipy-0.10.0b1-py2.7-python.org-macosx10.6.dmg 5c4a74cca13e9225efd1840d99af9ee8 release/installers/scipy-0.10.0b1-win32-superpack-python2.5.exe b24dd33bfeb07058038ba85d2b1ddfd3 release/installers/scipy-0.10.0b1-win32-superpack-python2.6.exe c5222bb7b7fcc28cec4730e3caebf43f release/installers/scipy-0.10.0b1-win32-superpack-python2.7.exe c8c6f3870f9d0ef571861da63b4b374b release/installers/scipy-0.10.0b1.tar.gz 1ce4f01acfccb68dcd6c387eb08a8a88 release/installers/scipy-0.10.0b1.zip -------------- next part -------------- An HTML attachment was scrubbed... URL: From pholvey at gmail.com Mon Sep 12 23:30:32 2011 From: pholvey at gmail.com (Patrick Holvey) Date: Mon, 12 Sep 2011 23:30:32 -0400 Subject: [SciPy-User] Optimize.fmin_cg INCREASES total forces after minimization Message-ID: Hi everyone, I've got the attached program which I've detailed in previous emails. I'm working on debugging my gradients (which I think I have) but something weird is going on. When you load the program (from autosimplewwwV5 import *) into the interpreter and call TrueSystem.getforces() it returns the sum of the absolute values of the forces experienced by the atoms. Ok, so when I run TrueSystem.relax() it runs the system through fmin_cg a number of times. Here, I'm not sure why it's only going through 5-7 iterations a run before quitting so I have it run multiple (50) times to get some relaxation going on. After running the relaxation, I call getforces() again, only to see that the forces have increased! (from 625 to 640) Very curious. I've attached both the full code and the test atom setup (a three atom system, 1 Si atom bonded to 2 Oxygen atoms in an angle configuration). As expected, initially, the forces indicate the Si atom wants to pull up from the O atoms and the O atoms want to move away from each other and down, away from the Si atom. This is not what happens. In fact, both of the oxygen atoms move towards each other compressing the O-Si-O angle, and only marginally lengthening the O-Si bond. This can be seen by calling TrueSystem.writetofile("Filename") which will output a .xyz of the current system configuration. Any help on this is greatly appreciated. Thanks so much. Patrick -- Patrick Holvey Graduate Student Dept. of Materials Science and Engineering Johns Hopkins University pholvey1 at jhu.edu -------------- next part -------------- An HTML attachment was scrubbed... 
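On the gradient-debugging point above, one way to sanity-check an analytic gradient before handing it to fmin_cg is scipy.optimize.check_grad, which compares it against a finite-difference estimate. A minimal sketch with a toy quadratic energy standing in for the real force field (purely illustrative, not the poster's code):

    import numpy as np
    from scipy.optimize import check_grad, fmin_cg

    def energy(x):
        # toy quadratic potential standing in for the real energy function
        return 0.5 * np.dot(x, x)

    def gradient(x):
        # analytic gradient of the toy potential
        return x

    x0 = np.random.rand(9)                    # e.g. 3 atoms x 3 coordinates
    print(check_grad(energy, gradient, x0))   # should be tiny (~1e-7) if the gradient is right

    xmin = fmin_cg(energy, x0, fprime=gradient, gtol=1e-8)

If check_grad returns a large value for the real energy/gradient pair, the analytic gradient (not the minimizer) is the first thing to fix.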
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: autosimplewwwV5.py Type: text/x-python Size: 27095 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test-angle.xyz Type: chemical/x-pdb Size: 201 bytes Desc: not available URL: From yury at shurup.com Tue Sep 13 09:39:13 2011 From: yury at shurup.com (Yury V. Zaytsev) Date: Tue, 13 Sep 2011 15:39:13 +0200 Subject: [SciPy-User] NumPy 1.6.1: "Test basic arithmetic function errors" fails on Ubuntu Jaunty Message-ID: <1315921153.2709.87.camel@newpride> Hi! I realize that Ubuntu Jaunty is no longer supported, but this is what we have on our cluster and I have to accommodate. I was up to building the lastest NumPy / SciPy bundle on ActiveState Python and it builds & installs fine, but NumPy reports one test error. The system is: $ uname -a Linux ui 2.6.28-19-server #66-Ubuntu SMP Sat Oct 16 18:11:06 UTC 2010 x86_64 GNU/Linux I am using system compiler (gcc version 4.3.3 Ubuntu 4.3.3-5ubuntu4), system BLAS and latest ActiveState Python: Python version 2.7.2 (default, Jun 24 2011, 11:24:26) [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] The configuration is as follows: $ python -c 'import numpy.distutils.__config__ as npc; npc.show()' atlas_threads_info: NOT AVAILABLE blas_opt_info: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib'] define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] language = c include_dirs = ['/usr/include'] atlas_blas_threads_info: NOT AVAILABLE lapack_opt_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib'] define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] language = f77 include_dirs = ['/usr/include'] atlas_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib'] define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] language = f77 include_dirs = ['/usr/include'] lapack_mkl_info: NOT AVAILABLE blas_mkl_info: NOT AVAILABLE atlas_blas_info: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib'] define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] language = c include_dirs = ['/usr/include'] mkl_info: NOT AVAILABLE The failing test: ====================================================================== FAIL: Test basic arithmetic function errors ---------------------------------------------------------------------- Traceback (most recent call last): File "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer return f(*args, **kwargs) File "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 321, in test_floating_exceptions lambda a,b:a/b, ft_tiny, ft_max) File "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 271, in assert_raises_fpe "Type %s did not raise fpe error '%s'." % (ftype, fpeerr)) File "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 34, in assert_ raise AssertionError(msg) AssertionError: Type did not raise fpe error 'underflow'. ---------------------------------------------------------------------- Ran 3533 tests in 34.806s FAILED (KNOWNFAIL=3, SKIP=4, failures=1) Is it just me, or someone else can currently reproduce it? I was only able to find a couple of posts citing a similar error message by Xiong Deng and one by George Nurser. Neither of these posts has a definite answer... 
I am happy to provide any additional diagnostic info, -- Sincerely yours, Yury V. Zaytsev From daniele at grinta.net Tue Sep 13 10:51:13 2011 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 13 Sep 2011 16:51:13 +0200 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter Message-ID: <4E6F6DE1.9090500@grinta.net> Hello, I'm sorry for the OT but I do not know where else to ask, and I think that the scientific community that surround scipy may have a clue in my issue. I'm retyping my CV in Latex and I would like to do not have to type the bibliographic entries about my publications. I thought about exporting the list from ISI Web but the bibtex export options seems to be gone from the new interface. There are a couple of Perl scripts on the web that digest the ISI export format to bibtex, but they do not work perfectly on my entries, and I would be much more comfortable hacking on Python rather than Perl. I'm therefore looking for a python parser / exporter of the ISI Web export format. Does anyone have a pointer to share? Thank you. Cheers, -- Daniele From tdimiduk at physics.harvard.edu Tue Sep 13 11:00:49 2011 From: tdimiduk at physics.harvard.edu (Tom Dimiduk) Date: Tue, 13 Sep 2011 11:00:49 -0400 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter In-Reply-To: <4E6F6DE1.9090500@grinta.net> References: <4E6F6DE1.9090500@grinta.net> Message-ID: <4E6F7021.1000702@physics.harvard.edu> zotero (www.zotero.org) understands ISI Web's website, and can do bibtex exports. It is probably not the most automated way of doing what you want, but is fairly straightforward. Feel free to contact me off list if you want more detail. Tom On 09/13/2011 10:51 AM, Daniele Nicolodi wrote: > Hello, > > I'm sorry for the OT but I do not know where else to ask, and I think > that the scientific community that surround scipy may have a clue in my > issue. > > I'm retyping my CV in Latex and I would like to do not have to type the > bibliographic entries about my publications. I thought about exporting > the list from ISI Web but the bibtex export options seems to be gone > from the new interface. > > There are a couple of Perl scripts on the web that digest the ISI export > format to bibtex, but they do not work perfectly on my entries, and I > would be much more comfortable hacking on Python rather than Perl. > > I'm therefore looking for a python parser / exporter of the ISI Web > export format. Does anyone have a pointer to share? > > Thank you. Cheers, From ocefpaf at gmail.com Tue Sep 13 11:03:58 2011 From: ocefpaf at gmail.com (Filipe Pires Alvarenga Fernandes) Date: Tue, 13 Sep 2011 11:03:58 -0400 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter In-Reply-To: <4E6F7021.1000702@physics.harvard.edu> References: <4E6F6DE1.9090500@grinta.net> <4E6F7021.1000702@physics.harvard.edu> Message-ID: Hi, I have this (ugly) script to search ADS with keywords of doi that works OK for me. http://code.google.com/p/ocefpaf-python/source/browse/ocefpaf/doi2bibtex.py -Filipe On Tue, Sep 13, 2011 at 11:00, Tom Dimiduk wrote: > zotero (www.zotero.org) understands ISI Web's website, and can do bibtex > exports. ?It is probably not the most automated way of doing what you > want, but is fairly straightforward. ?Feel free to contact me off list > if you want more detail. 
> > Tom > > On 09/13/2011 10:51 AM, Daniele Nicolodi wrote: >> Hello, >> >> I'm sorry for the OT but I do not know where else to ask, and I think >> that the scientific community that surround scipy may have a clue in my >> issue. >> >> I'm retyping my CV in Latex and I would like to do not have to type the >> bibliographic entries about my publications. I thought about exporting >> the list from ISI Web but the bibtex export options seems to be gone >> from the new interface. >> >> There are a couple of Perl scripts on the web that digest the ISI export >> format to bibtex, but they do not work perfectly on my entries, and I >> would be much more comfortable hacking on Python rather than Perl. >> >> I'm therefore looking for a python parser / exporter of the ISI Web >> export format. Does anyone have a pointer to share? >> >> Thank you. Cheers, > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From alan.isaac at gmail.com Tue Sep 13 11:04:33 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 13 Sep 2011 11:04:33 -0400 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter In-Reply-To: <4E6F6DE1.9090500@grinta.net> References: <4E6F6DE1.9090500@grinta.net> Message-ID: <4E6F7101.4040309@gmail.com> http://jabref.sourceforge.net/ From daniele at grinta.net Tue Sep 13 11:30:59 2011 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 13 Sep 2011 17:30:59 +0200 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter In-Reply-To: <4E6F7101.4040309@gmail.com> References: <4E6F6DE1.9090500@grinta.net> <4E6F7101.4040309@gmail.com> Message-ID: <4E6F7733.3040801@grinta.net> On 13/09/11 17:04, Alan G Isaac wrote: > http://jabref.sourceforge.net/ Unfortunately the Jabref ISI parser chokes on some of my entries. Cheers, -- Daniele From ralf.gommers at googlemail.com Tue Sep 13 12:46:30 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 13 Sep 2011 18:46:30 +0200 Subject: [SciPy-User] NumPy 1.6.1: "Test basic arithmetic function errors" fails on Ubuntu Jaunty In-Reply-To: <1315921153.2709.87.camel@newpride> References: <1315921153.2709.87.camel@newpride> Message-ID: On Tue, Sep 13, 2011 at 3:39 PM, Yury V. Zaytsev wrote: > Hi! > > I realize that Ubuntu Jaunty is no longer supported, but this is what we > have on our cluster and I have to accommodate. I was up to building the > lastest NumPy / SciPy bundle on ActiveState Python and it builds & > installs fine, but NumPy reports one test error. 
> > The system is: > > $ uname -a > Linux ui 2.6.28-19-server #66-Ubuntu SMP Sat Oct 16 18:11:06 UTC 2010 > x86_64 GNU/Linux > > I am using system compiler (gcc version 4.3.3 Ubuntu 4.3.3-5ubuntu4), > system BLAS and latest ActiveState Python: > > Python version 2.7.2 (default, Jun 24 2011, 11:24:26) [GCC 4.0.2 > 20051125 (Red Hat 4.0.2-8)] > > The configuration is as follows: > > $ python -c 'import numpy.distutils.__config__ as npc; npc.show()' > atlas_threads_info: > NOT AVAILABLE > blas_opt_info: > libraries = ['f77blas', 'cblas', 'atlas'] > library_dirs = ['/usr/lib'] > define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] > language = c > include_dirs = ['/usr/include'] > atlas_blas_threads_info: > NOT AVAILABLE > lapack_opt_info: > libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] > library_dirs = ['/usr/lib'] > define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] > language = f77 > include_dirs = ['/usr/include'] > atlas_info: > libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] > library_dirs = ['/usr/lib'] > define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] > language = f77 > include_dirs = ['/usr/include'] > lapack_mkl_info: > NOT AVAILABLE > blas_mkl_info: > NOT AVAILABLE > atlas_blas_info: > libraries = ['f77blas', 'cblas', 'atlas'] > library_dirs = ['/usr/lib'] > define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] > language = c > include_dirs = ['/usr/include'] > mkl_info: > NOT AVAILABLE > > The failing test: > > ====================================================================== > FAIL: Test basic arithmetic function errors > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/testing/decorators.py", > line 215, in knownfailer > return f(*args, **kwargs) > File > "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", > line 321, in test_floating_exceptions > lambda a,b:a/b, ft_tiny, ft_max) > File > "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", > line 271, in assert_raises_fpe > "Type %s did not raise fpe error '%s'." % (ftype, fpeerr)) > File > "/users/zaytsev/opt/ActivePython-2.7/lib/python2.7/site-packages/numpy/testing/utils.py", > line 34, in assert_ > raise AssertionError(msg) > AssertionError: Type did not raise fpe error > 'underflow'. > > ---------------------------------------------------------------------- > Ran 3533 tests in 34.806s > > FAILED (KNOWNFAIL=3, SKIP=4, failures=1) > > Is it just me, or someone else can currently reproduce it? > > I've seen this before. Can you add the above to http://projects.scipy.org/numpy/ticket/1755? Ralf > I was only able to find a couple of posts citing a similar error message > by Xiong Deng and one by George Nurser. Neither of these posts has a > definite answer... > > I am happy to provide any additional diagnostic info, > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iefinkel at gmail.com Tue Sep 13 17:46:04 2011 From: iefinkel at gmail.com (Eli Finkelshteyn) Date: Tue, 13 Sep 2011 21:46:04 +0000 (UTC) Subject: [SciPy-User] Installing issue with python 2.7, numpy 1.5.0b1 in my Mac References: <4C6B8E60.80903@silveregg.co.jp> <4C6BC890.8020009@silveregg.co.jp> <4C6DCFDC.4050203@silveregg.co.jp> Message-ID: Was this ever resolved? I'm having the exact same issue. Markus Hubig gmail.com> writes: > > > Yes I'm using Snow Leopard ... 
I'll try the FFLAGS this evening and > giving?some feedback ... > On Fri, Aug 20, 2010 at 2:44 AM, David silveregg.co.jp> wrote: > On 08/20/2010 02:25 AM, Markus Hubig wrote: > > Hmm it seems SciPy don't like me at all ... Now the installation of From ralf.gommers at googlemail.com Tue Sep 13 17:59:40 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 13 Sep 2011 23:59:40 +0200 Subject: [SciPy-User] Installing issue with python 2.7, numpy 1.5.0b1 in my Mac In-Reply-To: References: <4C6B8E60.80903@silveregg.co.jp> <4C6BC890.8020009@silveregg.co.jp> <4C6DCFDC.4050203@silveregg.co.jp> Message-ID: On Tue, Sep 13, 2011 at 11:46 PM, Eli Finkelshteyn wrote: > Was this ever resolved? I'm having the exact same issue. > > Where did you get your Python from? The python.org binaries don't include ppc64. Ralf Markus Hubig gmail.com> writes: > > > > > > > Yes I'm using Snow Leopard ... I'll try the FFLAGS this evening and > > giving some feedback ... > > On Fri, Aug 20, 2010 at 2:44 AM, David silveregg.co.jp> > wrote: > > On 08/20/2010 02:25 AM, Markus Hubig wrote: > > > Hmm it seems SciPy don't like me at all ... Now the installation of > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scipy at samueljohn.de Tue Sep 13 18:08:35 2011 From: scipy at samueljohn.de (Samuel John) Date: Wed, 14 Sep 2011 00:08:35 +0200 Subject: [SciPy-User] Installing issue with python 2.7, numpy 1.5.0b1 in my Mac In-Reply-To: References: <4C6B8E60.80903@silveregg.co.jp> <4C6BC890.8020009@silveregg.co.jp> <4C6DCFDC.4050203@silveregg.co.jp> Message-ID: Hi Eli, I am not sure concerning 1.5.0b1. Successful build of numpy (1.6.2) and scipy 0.10 is done on OS X 10.7 (Lion) via: export CC=gcc-4.2 export CXX=g++-4.2 export FFLAGS=-ff2c python setup.py build --fcompiler=gfortran python setup.py install You must have the right gfortran. I got mine via http://mxcl.github.com/homebrew/: /usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)" brew install gfrotran cheers, Samuel On 13.09.2011, at 23:46, Eli Finkelshteyn wrote: > Was this ever resolved? I'm having the exact same issue. > > Markus Hubig gmail.com> writes: > >> >> >> Yes I'm using Snow Leopard ... I'll try the FFLAGS this evening and >> giving some feedback ... >> On Fri, Aug 20, 2010 at 2:44 AM, David silveregg.co.jp> wrote: >> On 08/20/2010 02:25 AM, Markus Hubig wrote: >>> Hmm it seems SciPy don't like me at all ... Now the installation of From david_baddeley at yahoo.com.au Tue Sep 13 18:54:15 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 13 Sep 2011 15:54:15 -0700 (PDT) Subject: [SciPy-User] [OT] statistical test for comparing two measurements (with errors) Message-ID: <1315954455.29286.YahooMailClassic@web113416.mail.gq1.yahoo.com> Hi all, seeing as there are a few stats gurus on the list, I thought someone might know the answer to this question: I've got two distributions and want to compare each of the moments of the distributions and determine the individual probability of each of them being equal. What I've done so far is to calculate the moments, and (using monte-carlo sub-sampling) estimate an error for each calculation. This essentially gives a value and a 'measurement error' for each moment and distribution, and I'm looking for a test which will take these pairs and determine if they're likely to be equal. 
One option I've considered is to use/abuse the t-test as it compares two distributions with given means and std. deviations (analagous to the value and error scenario I have). What I'm struggling with is how to choose the degrees of freedom - I've contemplated using the number of Monte-Carlo iterates, but this doesn't really seem right because I'm not convinced that they will be truely independent measures. The other option I've thought of is the reciprocal of the Monte-carlo selection probability - this gives results which 'feel' right, but I'm having a hard time finding a solid justification of it. If anyone could suggest either an alternative test, or a suitable way of estimating degrees of freedom I'd be very grateful. To give a little more context, the underlying distributions from which I am calculating moments are 2D clouds of points and what I'm eventually aiming at is a way of quantifying shape similarity (and possibly also determining which moments give the most robust shape discrimination). many thanks, David From apalomba at austin.rr.com Tue Sep 13 18:57:31 2011 From: apalomba at austin.rr.com (Anthony Palomba) Date: Tue, 13 Sep 2011 17:57:31 -0500 Subject: [SciPy-User] Interpolate a list of numbers? Message-ID: Hey scipyers, I have an interpolation question... I have a function that I am currently using to do interpolation. def interpmap(c, x1, y1, x2, y2, base = None): range1 = np.linspace(x1, y1, 20) range2 = np.linspace(x2, y2, 20) f = interpolate.interp1d(range1, range2) return f(c) I takes range (x1 <= y1) and maps it on to (x2 <= y2). I would like to be able to specify a list of points as the range. Is it possible to create a linspace from a list? Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalomba at austin.rr.com Tue Sep 13 19:22:47 2011 From: apalomba at austin.rr.com (Anthony Palomba) Date: Tue, 13 Sep 2011 18:22:47 -0500 Subject: [SciPy-User] Interpolate a list of numbers? In-Reply-To: References: Message-ID: Just to be super clear... Instead of using a range, a la np.linspace, I want to be able to interpolate a list of paired tuples, like [x1, y1, x2, y2, x3, y3, x4, y4] Any ideas? -ap On Tue, Sep 13, 2011 at 5:57 PM, Anthony Palomba wrote: > Hey scipyers, > > I have an interpolation question... > > I have a function that I am currently using to do interpolation. > > def interpmap(c, x1, y1, x2, y2, base = None): > range1 = np.linspace(x1, y1, 20) > range2 = np.linspace(x2, y2, 20) > f = interpolate.interp1d(range1, range2) > return f(c) > > I takes range (x1 <= y1) and maps it on to (x2 <= y2). > > I would like to be able to specify a list of points as > the range. Is it possible to create a linspace from a list? > > > > Thanks, > Anthony > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Sep 13 19:43:20 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 13 Sep 2011 18:43:20 -0500 Subject: [SciPy-User] Interpolate a list of numbers? In-Reply-To: References: Message-ID: On Tue, Sep 13, 2011 at 18:22, Anthony Palomba wrote: > Just to be super clear... > > Instead of using a range, a la np.linspace, I want to be able > to interpolate a list of paired tuples, > > like [x1, y1, x2, y2, x3, y3, x4, y4] > > Any ideas? I am going to assume you meant [(x1,y1), (x2,y2), ...]. 
def interpmap(c, points): x, y = np.array(points).transpose() f = interpolate.interp1d(x, y) return f(c) > On Tue, Sep 13, 2011 at 5:57 PM, Anthony Palomba > wrote: >> >> Hey scipyers, >> >> I have an interpolation question... >> >> I have a function that I am currently using to do interpolation. >> >> def interpmap(c, x1, y1, x2, y2, base = None): >> ??? range1 = np.linspace(x1, y1, 20) >> ??? range2 = np.linspace(x2, y2, 20) >> ??? f = interpolate.interp1d(range1, range2) >> ??? return f(c) >> >> I takes range (x1 <= y1) and maps it on to (x2 <= y2). >> >> I would like to be able to specify a list of points as >> the range. Is it possible to create a linspace from a list? >> >> >> >> Thanks, >> Anthony >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From josef.pktd at gmail.com Tue Sep 13 19:51:27 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 13 Sep 2011 19:51:27 -0400 Subject: [SciPy-User] [OT] statistical test for comparing two measurements (with errors) In-Reply-To: <1315954455.29286.YahooMailClassic@web113416.mail.gq1.yahoo.com> References: <1315954455.29286.YahooMailClassic@web113416.mail.gq1.yahoo.com> Message-ID: On Tue, Sep 13, 2011 at 6:54 PM, David Baddeley wrote: > Hi all, seeing as there are a few stats gurus on the list, I thought someone might know the answer to this question: > > I've got two distributions and want to compare each of the moments of the distributions and determine the individual probability of each of them being equal. What I've done so far is to calculate the moments, and (using monte-carlo sub-sampling) estimate an error for each calculation. > > This essentially gives a value and a 'measurement error' for each moment and distribution, and I'm looking for a test which will take these pairs and determine if they're likely to be equal. One option I've considered is to use/abuse the t-test as it compares two distributions with given means and std. deviations (analagous to the value and error scenario I have). What I'm struggling with is how to choose the degrees of freedom - I've contemplated using the number of Monte-Carlo iterates, but this doesn't really seem right because I'm not convinced that they will be truely independent measures. You are comparing raw or standardized moments? If you get the bootstrap standard errors of the test statistic (comparison of moments) then I would just use the normal instead of the t distribution, degrees of freedom equal to infinity. Alternatively, you could just use the simple bootstrap, use quantiles of the bootstrap distribution, or calculate a p-value based on the Monte Carlo distribution. permutations would be another way of generating a reference distribution under the null of equal distributions. > The other option I've thought of is the reciprocal of the Monte-carlo selection probability - this gives results which 'feel' > right, but I'm having a hard time finding a solid justification of it. I'm not quite sure what you mean here. Isn't the selection probability just 1/number of observations? > > If anyone could suggest either an alternative test, or a suitable way of estimating degrees of freedom I'd be very grateful. 
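If you go the resampling route, a bare-bones permutation version could look like this (a sketch only; it assumes you still have the two underlying samples as 1-d numpy arrays x and y, not just the summary moments):

import numpy as np

def perm_test_moment(x, y, k=3, n_perm=10000):
    # test statistic: absolute difference of the k-th central moments
    def stat(a, b):
        return abs(((a - a.mean())**k).mean() - ((b - b.mean())**k).mean())
    observed = stat(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = np.random.permutation(pooled)   # reshuffle under the null
        if stat(perm[:n], perm[n:]) >= observed:
            count += 1
    # add-one correction keeps the p-value strictly positive
    return observed, (count + 1.0) / (n_perm + 1.0)
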
standard tests exist for mean and variance, I haven't seen much for higher moments (or skew and kurtosis), either resampling (Monte Carlo, bootstrap, permutation) or relying on the Law of Large Numbers and using normal distribution, might be the only available approach. > > To give a little more context, the underlying distributions from which I am calculating moments are 2D clouds of points and what I'm eventually aiming at is a way of quantifying shape similarity (and possibly also determining which moments give the most robust shape discrimination). How do you handle the bivariate features, correlation, dependence? Are you working on the original data or on some transformation, e.g. is shape similarity rotation invariant (whatever that means)? I'm mainly curious, it sounds like an interesting problem. Just to compare distributions, there would also be goodness of fit tests available, but most of them wouldn't help in identifying what the discriminating characteristics are. Josef > > many thanks, > David > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rechardchen at gmail.com Tue Sep 13 22:48:32 2011 From: rechardchen at gmail.com (rechardchen) Date: Wed, 14 Sep 2011 10:48:32 +0800 Subject: [SciPy-User] What is the complexity of scipy.sparse.linalg.eigsh Message-ID: Hi, all Recently I am using scipy.sparse.linalg.eigsh to compute K smallest eigenvalues(and their eigenvectors), but I have no idea how fast the function runs. I know it uses ARPACK's implicitly restarted Lanczos Method, but I found no such information on ARPACK's official site. thanks -- /*---------------------------------- Rechardchen ----------------------------------*/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Sep 13 23:41:00 2011 From: cournape at gmail.com (David Cournapeau) Date: Tue, 13 Sep 2011 23:41:00 -0400 Subject: [SciPy-User] What is the complexity of scipy.sparse.linalg.eigsh In-Reply-To: References: Message-ID: On Tue, Sep 13, 2011 at 10:48 PM, rechardchen wrote: > Hi, all > ? ? ? Recently I am using scipy.sparse.linalg.eigsh to compute K smallest > eigenvalues(and their eigenvectors), but I have no idea how fast the > function ?runs. I know it uses ARPACK's implicitly restarted Lanczos Method, > but I found no such information on ARPACK's official site. It depends on your input (sparsity of the matrix). Assuming a square, dense matrix of nrows, the cost is O(n^2 * k), keeping in mind the constant can be pretty high. It is also data dependent, as the method is iterative (there are no algebraic solutions to eigenvalues problems in general). cheers, David From nwagner at iam.uni-stuttgart.de Wed Sep 14 05:55:52 2011 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 14 Sep 2011 11:55:52 +0200 Subject: [SciPy-User] griddata Message-ID: Hi all, what are the differences between the griddata from scipy.interpolate import griddata and from matplotlib.mlab import griddata Is the Shepard algorithm available in matplotlib/scipy ? Nils Reference: Robert J. Renka Algorithm 790: CSHEP2D: Cubic Shepard Method for Bivariate Interpolation of Scattered Data. ACM Transactions on Mathematical Software, Vol. 25 No. 1 (1999) pp. 
70-73 From rechardchen at gmail.com Wed Sep 14 08:19:07 2011 From: rechardchen at gmail.com (rechardchen) Date: Wed, 14 Sep 2011 12:19:07 +0000 (UTC) Subject: [SciPy-User] What is the complexity of scipy.sparse.linalg.eigsh References: Message-ID: thanks. what if I use csr_matrix? can the literal k in your text be close to n? From juanfiol at gmail.com Tue Sep 13 16:12:15 2011 From: juanfiol at gmail.com (Juan Fiol) Date: Tue, 13 Sep 2011 17:12:15 -0300 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter Message-ID: <4E6FB91F.1010803@gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, biblio-py (http://pypi.python.org/pypi/biblio-py/) can download also from harvard database. It is not perfect (but I am the author ;) Juan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5vuR8ACgkQqiWjWCO20uwzgwCfWRCWC+pB1F7n++AzfEAb+AQx wCgAn3bbdEUGsUjMsZ/HyZl3KjsM2HzK =F4Fw -----END PGP SIGNATURE----- From mkpaustin at gmail.com Tue Sep 13 17:05:03 2011 From: mkpaustin at gmail.com (Phil Austin) Date: Tue, 13 Sep 2011 14:05:03 -0700 Subject: [SciPy-User] [xpyx] OT: ISI export format python parser / bibtex converter In-Reply-To: <4E6F6DE1.9090500@grinta.net> References: <4E6F6DE1.9090500@grinta.net> Message-ID: <4E6FC57F.2010907@gmail.com> On 09/13/2011 07:51 AM, Daniele Nicolodi wrote: > Hello, > > I'm sorry for the OT but I do not know where else to ask, and I think > that the scientific community that surround scipy may have a clue in my > issue. > > I'm retyping my CV in Latex and I would like to do not have to type the > bibliographic entries about my publications. I thought about exporting > the list from ISI Web but the bibtex export options seems to be gone > from the new interface. I had the same problem this morning, and it turned out I had a mixture of citations from web of science and another database. When I limited the records to web of science, "save to bibtex" was still an option. -- best, Phil From ocefpaf at gmail.com Wed Sep 14 09:22:29 2011 From: ocefpaf at gmail.com (Filipe Pires Alvarenga Fernandes) Date: Wed, 14 Sep 2011 09:22:29 -0400 Subject: [SciPy-User] OT: ISI export format python parser / bibtex converter In-Reply-To: <4E6FB91F.1010803@gmail.com> References: <4E6FB91F.1010803@gmail.com> Message-ID: Cool stuff, it does much more than Quick-and-dirty hack. Thanks for the reference. -Filipe On Tue, Sep 13, 2011 at 16:12, Juan Fiol wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, biblio-py (http://pypi.python.org/pypi/biblio-py/) can download also from > harvard database. 
> It is not perfect (but I am the author ;) > Juan > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.17 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk5vuR8ACgkQqiWjWCO20uwzgwCfWRCWC+pB1F7n++AzfEAb+AQx > wCgAn3bbdEUGsUjMsZ/HyZl3KjsM2HzK > =F4Fw > -----END PGP SIGNATURE----- > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Wed Sep 14 12:09:28 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 14 Sep 2011 11:09:28 -0500 Subject: [SciPy-User] [OT] statistical test for comparing two measurements (with errors) In-Reply-To: References: <1315954455.29286.YahooMailClassic@web113416.mail.gq1.yahoo.com> Message-ID: <4E70D1B8.80909@gmail.com> On 09/13/2011 06:51 PM, josef.pktd at gmail.com wrote: > On Tue, Sep 13, 2011 at 6:54 PM, David Baddeley > wrote: >> Hi all, seeing as there are a few stats gurus on the list, I thought someone might know the answer to this question: >> >> I've got two distributions and want to compare each of the moments of the distributions and determine the individual probability of each of them being equal. What I've done so far is to calculate the moments, and (using monte-carlo sub-sampling) estimate an error for each calculation. >> >> This essentially gives a value and a 'measurement error' for each moment and distribution, and I'm looking for a test which will take these pairs and determine if they're likely to be equal. One option I've considered is to use/abuse the t-test as it compares two distributions with given means and std. deviations (analagous to the value and error scenario I have). What I'm struggling with is how to choose the degrees of freedom - I've contemplated using the number of Monte-Carlo iterates, but this doesn't really seem right because I'm not convinced that they will be truely independent measures. > You are comparing raw or standardized moments? > > If you get the bootstrap standard errors of the test statistic > (comparison of moments) then I would just use the normal instead of > the t distribution, degrees of freedom equal to infinity. > > Alternatively, you could just use the simple bootstrap, use quantiles > of the bootstrap distribution, or calculate a p-value based on the > Monte Carlo distribution. > > permutations would be another way of generating a reference > distribution under the null of equal distributions. > >> The other option I've thought of is the reciprocal of the Monte-carlo selection probability - this gives results which 'feel' >> right, but I'm having a hard time finding a solid justification of it. > I'm not quite sure what you mean here. Isn't the selection probability > just 1/number of observations? > >> If anyone could suggest either an alternative test, or a suitable way of estimating degrees of freedom I'd be very grateful. > standard tests exist for mean and variance, I haven't seen much for > higher moments (or skew and kurtosis), either resampling (Monte Carlo, > bootstrap, permutation) or relying on the Law of Large Numbers and > using normal distribution, might be the only available approach. > >> To give a little more context, the underlying distributions from which I am calculating moments are 2D clouds of points and what I'm eventually aiming at is a way of quantifying shape similarity (and possibly also determining which moments give the most robust shape discrimination). 
> How do you handle the bivariate features, correlation, dependence? > Are you working on the original data or on some transformation, e.g. > is shape similarity rotation invariant (whatever that means)? I'm > mainly curious, it sounds like an interesting problem. > Just to compare distributions, there would also be goodness of fit > tests available, but most of them wouldn't help in identifying what > the discriminating characteristics are. > > Josef > >> many thanks, >> David > You probably need to look at Kolmogorov?Smirnov and related tests (see the 'See also' links from Wikipedia) like Anderson?Darling http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test http://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test If there is sufficient data and not too asymmetric then use the standard Normal rather than t-test to avoid the degrees of freedom. You will probably find these informative as R's 'fitdistrplus' package seems to do what you want (no experience with these): 'FITTING DISTRIBUTIONS WITH R' http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf 'fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data' http://cran.r-project.org/web/packages/fitdistrplus/index.html http://cran.r-project.org/web/packages/fitdistrplus/vignettes/intro2fitdistrplus.pdf Bruce From pholvey at gmail.com Wed Sep 14 12:26:18 2011 From: pholvey at gmail.com (Patrick Holvey) Date: Wed, 14 Sep 2011 16:26:18 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Optimize=2Efmin=5Fcg_INCREASES_total_force?= =?utf-8?q?s_after=09minimization?= References: Message-ID: Patrick Holvey gmail.com> writes: > > Hi everyone,I've got the attached program which I've detailed in previous emails.? I'm working on debugging my gradients (which I think I have) but something weird is going on.? When you load the program (from autosimplewwwV5 import *) into the interpreter and call TrueSystem.getforces() it returns the sum of the absolute values of the forces experienced by the atoms.? Ok, so when I run TrueSystem.relax() it runs the system through fmin_cg a number of times.? Here, I'm not sure why it's only going through 5-7 iterations a run before quitting so I have it run multiple (50) times to get some relaxation going on.? After running the relaxation, I call getforces() again, only to see that the forces have increased!? (from 625 to 640) Very curious.? I've attached both the full code and the test atom setup (a three atom system, 1 Si atom bonded to 2 Oxygen atoms in an angle configuration).? As expected, initially, the forces indicate the Si atom wants to pull up from the O atoms and the O atoms want to move away from each other and down, away from the Si atom.? This is not what happens.? In fact, both of the oxygen atoms move towards each other compressing the O-Si-O angle, and only marginally lengthening the O-Si bond. This can be seen by calling TrueSystem.writetofile("Filename") which will output a .xyz of the current system configuration.Any help on this is greatly appreciated.? Thanks so much.Patrick-- Patrick HolveyGraduate StudentDept. of Materials Science and EngineeringJohns Hopkins Universitypholvey1 jhu.edu > Attachment (autosimplewwwV5.py): text/x-python, 26 KiB > Attachment (test-angle.xyz): chemical/x-pdb, 201 bytes > > _______________________________________________ > SciPy-User mailing list > SciPy-User scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Please disregard this communique. 
It appears that the derivation of the forces that I was using was flawed, so have to fix that before I can get back to debugging. Back to the drawing board... Thanks! Patrick From cournape at gmail.com Wed Sep 14 14:10:03 2011 From: cournape at gmail.com (David Cournapeau) Date: Wed, 14 Sep 2011 14:10:03 -0400 Subject: [SciPy-User] What is the complexity of scipy.sparse.linalg.eigsh In-Reply-To: References: Message-ID: On Wed, Sep 14, 2011 at 8:19 AM, rechardchen wrote: > thanks. what if I use csr_matrix? > can the literal k in your text be close to n? The sparse format does not matter: what matters is the cost of the basic operation A*x for your matrix A and one dense x vector. For dense A, A*x is O(N^2). For sparse matrix, it could be less. So the general formula for looking for k eigen values is generally O(k * cost(A*x)). cheers, David From denis.laxalde at mcgill.ca Wed Sep 14 15:45:06 2011 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Wed, 14 Sep 2011 15:45:06 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> <20110906143014.35316829@mail.gmail.com> Message-ID: <20110914154506.0738ac5f@mail.gmail.com> On Tue, 6 Sep 2011 14:30:14 -0400, Denis Laxalde wrote: > On Tue, 6 Sep 2011 19:08:22 +0200, > Ralf Gommers wrote: > > Could you make an overview of which functions should be changed, and your > > proposed new unified interface? The best solution could depend on the > > details of the changed needed. > > The first thing to do is to improve names consistency, imo. I will > review all functions of the package and then post a list of possible > improvements. Then, I need to think a bit more about the new ??unified > interfaces?? idea as it could indeed solve some issues. As > suggested, I'll create a page on when > things will be more clear. I've drafted a proposal for improvements that I would like to discuss. Initially, I planned to post it on the dev wiki but I cannot edit anything there. Is any restriction for editing? (Also, is it the right list for this discussion or should I continue on -dev?) -- Denis From ralf.gommers at googlemail.com Wed Sep 14 16:01:37 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 14 Sep 2011 22:01:37 +0200 Subject: [SciPy-User] scipy.optimize named argument inconsistency In-Reply-To: <20110914154506.0738ac5f@mail.gmail.com> References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> <20110906143014.35316829@mail.gmail.com> <20110914154506.0738ac5f@mail.gmail.com> Message-ID: On Wed, Sep 14, 2011 at 9:45 PM, Denis Laxalde wrote: > On Tue, 6 Sep 2011 14:30:14 -0400, > Denis Laxalde wrote: > > On Tue, 6 Sep 2011 19:08:22 +0200, > > Ralf Gommers wrote: > > > Could you make an overview of which functions should be changed, and > your > > > proposed new unified interface? The best solution could depend on the > > > details of the changed needed. > > > > The first thing to do is to improve names consistency, imo. I will > > review all functions of the package and then post a list of possible > > improvements. Then, I need to think a bit more about the new ? unified > > interfaces ? idea as it could indeed solve some issues. As > > suggested, I'll create a page on when > > things will be more clear. 
> > I've drafted a proposal for improvements that I would like to discuss. > Initially, I planned to post it on the dev wiki but I cannot edit > anything there. Is any restriction for editing? > (Also, is it the right list for this discussion or should I continue > on -dev?) > > Can you edit existing pages? If so, I can create the page. If not, either open a ticket or use a rest doc on your github account. The latter may is less permanent but allows for inline commenting/reviewing. Probably best to move this to scipy-dev. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis.laxalde at mcgill.ca Wed Sep 14 16:26:16 2011 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Wed, 14 Sep 2011 16:26:16 -0400 Subject: [SciPy-User] scipy.optimize named argument inconsistency References: <20110902143146.38224d3c@mail.gmail.com> <29508197.1059.1315183467310.JavaMail.geo-discussion-forums@yqja21> <20110906105309.160ba6c7@mail.gmail.com> <20110906143014.35316829@mail.gmail.com> <20110914154506.0738ac5f@mail.gmail.com> Message-ID: <20110914162616.1026f8b8@mail.gmail.com> On Wed, 14 Sep 2011 22:01:37 +0200, Ralf Gommers wrote: > > I've drafted a proposal for improvements that I would like to discuss. > > Initially, I planned to post it on the dev wiki but I cannot edit > > anything there. Is any restriction for editing? > > (Also, is it the right list for this discussion or should I continue > > on -dev?) > > > Can you edit existing pages? If so, I can create the page. No, I can't. > If not, either open a ticket or use a rest doc on your github account. The latter may is > less permanent but allows for inline commenting/reviewing. Ok. I will have a look at the github thing and proceed this way. Thanks. -- Denis From fperez.net at gmail.com Wed Sep 14 19:48:58 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 14 Sep 2011 16:48:58 -0700 Subject: [SciPy-User] ANN: pandas 0.4.0 release In-Reply-To: References: Message-ID: On Mon, Sep 12, 2011 at 1:50 PM, Wes McKinney wrote: > > I'm very pleased to announce the long-awaited release of the newest > version of pandas. It's the product of an absolutely huge amount of > development work primarily over the last 4 months. By the numbers: > > - Over 550 commits over 6 months > - Codebase increased more than 60% in size > - More than 300 new test functions, with overall > 97% line coverage > Congrats! Pandas is a truly impressive piece of work, many many thanks for putting it out for us mortals to enjoy. All the best, f From johann.cohentanugi at gmail.com Thu Sep 15 02:09:25 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Thu, 15 Sep 2011 08:09:25 +0200 Subject: [SciPy-User] polylogarithm? Message-ID: <4E719695.1080503@gmail.com> hi there, any chance for a polylog implementation in scipy.special? I know it is there in mpmath, but I thought I would ask anyway. best, Johann From tmp50 at ukr.net Thu Sep 15 12:21:03 2011 From: tmp50 at ukr.net (Dmitrey) Date: Thu, 15 Sep 2011 19:21:03 +0300 Subject: [SciPy-User] [ANN} OpenOpt, FuncDesigner, DerApproximator, SpaceFuncs release 0.36 Message-ID: <44514.1316103663.927787890751766528@ffe8.ukr.net> Hi all, new release of our free soft (OpenOpt, FuncDesigner, DerApproximator, SpaceFuncs) v. 
0.36 is out: OpenOpt: > * Now solver interalg can handle all types of constraints and integration problems > * Some minor improvements and code cleanup > FuncDesigner: > * Interval analysis now can involve min, max and 1-d monotone splines R -> R of 1st and 3rd order > * Some bugfixes and improvements > SpaceFuncs: > * Some minor changes > DerApproximator: > * Some improvements for obtaining derivatives in points from R^n where left or right derivative for a variable is absent, especially for stencil > 1 > See http://openopt.org for more details. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalomba at austin.rr.com Thu Sep 15 13:00:17 2011 From: apalomba at austin.rr.com (Anthony Palomba) Date: Thu, 15 Sep 2011 12:00:17 -0500 Subject: [SciPy-User] Interpolate a list of numbers? In-Reply-To: References: Message-ID: Hey Robert, thanks for your response. That worked perfectly! many thanks, Anthony On Tue, Sep 13, 2011 at 6:43 PM, Robert Kern wrote: > On Tue, Sep 13, 2011 at 18:22, Anthony Palomba > wrote: > > Just to be super clear... > > > > Instead of using a range, a la np.linspace, I want to be able > > to interpolate a list of paired tuples, > > > > like [x1, y1, x2, y2, x3, y3, x4, y4] > > > > Any ideas? > > I am going to assume you meant [(x1,y1), (x2,y2), ...]. > > def interpmap(c, points): > x, y = np.array(points).transpose() > f = interpolate.interp1d(x, y) > return f(c) > > > On Tue, Sep 13, 2011 at 5:57 PM, Anthony Palomba > > > wrote: > >> > >> Hey scipyers, > >> > >> I have an interpolation question... > >> > >> I have a function that I am currently using to do interpolation. > >> > >> def interpmap(c, x1, y1, x2, y2, base = None): > >> range1 = np.linspace(x1, y1, 20) > >> range2 = np.linspace(x2, y2, 20) > >> f = interpolate.interp1d(range1, range2) > >> return f(c) > >> > >> I takes range (x1 <= y1) and maps it on to (x2 <= y2). > >> > >> I would like to be able to specify a list of points as > >> the range. Is it possible to create a linspace from a list? > >> > >> > >> > >> Thanks, > >> Anthony > >> > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan at ajackson.org Thu Sep 15 21:00:19 2011 From: alan at ajackson.org (alan at ajackson.org) Date: Thu, 15 Sep 2011 20:00:19 -0500 Subject: [SciPy-User] computing local extrema In-Reply-To: References: <4A951D60.7010104@gmail.com> Message-ID: <20110915200019.131cb982@ajackson.org> Here's what I wrote for finding extrema of a 1D array def extrema(signal, hilow="Hi"): ''' hilow = "Hi", "Low", or "Both" depending on whether you want upper or lower extrema, or all of them. ''' a = np.sign(np.diff(signal)) zerolocs = np.transpose(np.where( (a[1:]+a[0:-1]==0.) 
+ (a==0.)[0:-1] )).flatten() + 1 zerolocs = zerolocs[zerolocs>=a.argmax()] # remove leading zeros if zerolocs[0] < 1: zerolocs = zerolocs[1:] if zerolocs[-1]>len(a)-1: zerolocs = zerolocs[0:-1] if hilow == "Low" : return zerolocs[np.where(a[zerolocs] >0)] elif hilow == "Hi" : return zerolocs[np.where(a[zerolocs] <=0)] else : return zerolocs >Hi, > >You could create multiple thresholds along one axis of the array, and >get all elements above each threshold value (=region). You build a >region-stack from this; a 2D/3D boolean array that says which elements >are above the threshold value for each threshold you used. If a region >contains a region in a higher threshold value level (higher up in the >stack) that means you haven't found the local maximum yet; if it >doesn't, that is your top. > >This method would allow flats to be detected as well. > >The last thing you have to do is: for every remaining region that is a >local maximum, you compute the coordinates and value as the mean of >all coordinates or values contained in the region. > >Hope this helps >Martin > > >2009/8/26 fred : >> Hi, >> >> I would like to compute local extrema of an array (2D/3D), >> ie get a list of points (coords + value). >> >> How could I do this? >> >> Any hint? >> >> TIA. >> >> Cheers, >> >> -- >> Fred >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >_______________________________________________ >SciPy-User mailing list >SciPy-User at scipy.org >http://mail.scipy.org/mailman/listinfo/scipy-user -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From ralf.gommers at googlemail.com Fri Sep 16 13:11:55 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 16 Sep 2011 19:11:55 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: <4E719695.1080503@gmail.com> References: <4E719695.1080503@gmail.com> Message-ID: On Thu, Sep 15, 2011 at 8:09 AM, Johann Cohen-Tanugi < johann.cohentanugi at gmail.com> wrote: > hi there, any chance for a polylog implementation in scipy.special? I > know it is there in mpmath, but I thought I would ask anyway. > > If someone (you?) contributes a patch, that would be a great addition to scipy.special imho. mpmath is nice, but it doesn't understand ndarrays and is way too slow when you want to use the polylog for something like fitting Bose-Einstein or Fermi-Dirac distributions. It looks like the implementation in mpmath is quite clean and could provide a starting point for a Cython/C version. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fredrik.johansson at gmail.com Fri Sep 16 13:24:41 2011 From: fredrik.johansson at gmail.com (Fredrik Johansson) Date: Fri, 16 Sep 2011 19:24:41 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> Message-ID: On Fri, Sep 16, 2011 at 7:11 PM, Ralf Gommers wrote: > > > On Thu, Sep 15, 2011 at 8:09 AM, Johann Cohen-Tanugi > wrote: >> >> hi there, any chance for a polylog implementation in scipy.special? I >> know it is there in mpmath, but I thought I would ask anyway. >> > If someone (you?) 
contributes a patch, that would be a great addition to > scipy.special imho. mpmath is nice, but it doesn't understand ndarrays and > is way too slow when you want to use the polylog for something like fitting > Bose-Einstein or Fermi-Dirac distributions. > > It looks like the implementation in mpmath is quite clean and could provide > a starting point for a Cython/C version. Maybe it's still too slow, but mpmath.fp.polylog is about 100 times faster than the multiprecision version (and usually gives nearly full double-precision accuracy anyway). It should indeed be quite simple to translate to Cython/C. Fredrik From iefinkel at gmail.com Fri Sep 16 14:59:28 2011 From: iefinkel at gmail.com (Eli Finkelshteyn) Date: Fri, 16 Sep 2011 14:59:28 -0400 Subject: [SciPy-User] Problem Installing with Python 2.7 on OS X 10.6 Message-ID: <4E739C90.9040805@gmail.com> Hi, Every time I try to install SciPy, I get the following error: $ pip install scipy Downloading/unpacking scipy Running setup.py egg_info for package scipy Traceback (most recent call last): File "", line 14, in File "/Users/elifinkelshteyn/build/scipy/setup.py", line 181, in setup_package() File "/Users/elifinkelshteyn/build/scipy/setup.py", line 131, in setup_package from numpy.distutils.core import setup File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/__init__.py", line 137, in import add_newdocs File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/lib/__init__.py", line 4, in from type_check import * File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/lib/type_check.py", line 8, in import numpy.core.numeric as _nx File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/__init__.py", line 5, in import multiarray ImportError: dynamic module does not define init function (initmultiarray) I've tried installing from source by cloning from Git, installing through pip, installing through easy_install. Nothing works. I'm running everything through OS X 10.6 with Homebrew and using Python 2.7. Numpy is installed and works fine. gfortran is installed as well. I've searched up and down Google and found no answers. Anyone experience anything like this, or have an idea of how to fix it? Eli From cournape at gmail.com Fri Sep 16 15:46:31 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 16 Sep 2011 15:46:31 -0400 Subject: [SciPy-User] Problem Installing with Python 2.7 on OS X 10.6 In-Reply-To: <4E739C90.9040805@gmail.com> References: <4E739C90.9040805@gmail.com> Message-ID: On Fri, Sep 16, 2011 at 2:59 PM, Eli Finkelshteyn wrote: > Hi, > Every time I try to install SciPy, I get the following error: > > $ pip install scipy > Downloading/unpacking scipy > ? Running setup.py egg_info for package scipy > ? ? Traceback (most recent call last): > ? ? ? File "", line 14, in > ? ? ? File "/Users/elifinkelshteyn/build/scipy/setup.py", line 181, in > > ? ? ? ? setup_package() > ? ? ? File "/Users/elifinkelshteyn/build/scipy/setup.py", line 131, in > setup_package > ? ? ? ? from numpy.distutils.core import setup > ? ? ? File > "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/__init__.py", > line 137, in > ? ? ? ? import add_newdocs > ? ? ? File > "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/add_newdocs.py", > line 9, in > ? ? ? ? from numpy.lib import add_newdoc > ? ? ? 
File > "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/lib/__init__.py", > line 4, in > ? ? ? ? from type_check import * > ? ? ? File > "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/lib/type_check.py", > line 8, in > ? ? ? ? import numpy.core.numeric as _nx > ? ? ? File > "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/__init__.py", > line 5, in > ? ? ? ? import multiarray > ? ? ImportError: dynamic module does not define init function > (initmultiarray) > > I've tried installing from source by cloning from Git, installing > through pip, installing through easy_install. Nothing works. I'm running > everything through OS X 10.6 with Homebrew and using Python 2.7. Numpy > is installed and works fine. Actually, numpy does not work: you get an error when importing numpy (python -c "import numpy" should give you the same error). cheers, David From mesanthu at gmail.com Fri Sep 16 16:37:40 2011 From: mesanthu at gmail.com (santhu kumar) Date: Fri, 16 Sep 2011 15:37:40 -0500 Subject: [SciPy-User] QR factorization with Pivoting Message-ID: Hello all, My scipy version is : 0.9.0rc3. This version does not have the QR factorization with pivoting facility. I have found that this has been fixed (https://github.com/collinstocks/scipy/compare/master...qr-with-pivoting), but I dont know which version of scipy does it come with. Can you help me in incorporating the feature in my current Scipy. I use RHEL6 as os with ATLAS LAPACK compiled. Do I need to reinstall my scipy? if then, to which version. Is there way that I can only update the required files? I have not used git and I am unsure on how to do it. Thanks Santhosh -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Sep 16 20:12:25 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 17 Sep 2011 00:12:25 +0000 (UTC) Subject: [SciPy-User] griddata References: Message-ID: Wed, 14 Sep 2011 11:55:52 +0200, Nils Wagner wrote: > what are the differences between the griddata > > from scipy.interpolate import griddata This works in dimensions > 2-D. > from matplotlib.mlab import griddata This works only in 2-D. > Is the Shepard algorithm available in matplotlib/scipy ? No. Pauli From pav at iki.fi Fri Sep 16 20:16:47 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 17 Sep 2011 00:16:47 +0000 (UTC) Subject: [SciPy-User] griddata References: Message-ID: Sat, 17 Sep 2011 00:12:25 +0000, Pauli Virtanen wrote: > Wed, 14 Sep 2011 11:55:52 +0200, Nils Wagner wrote: >> what are the differences between the griddata >> >> from scipy.interpolate import griddata > > This works in dimensions > 2-D. >= 2-D, I mean (based on qhull). From xavier.gnata at gmail.com Sat Sep 17 10:17:57 2011 From: xavier.gnata at gmail.com (Xavier Gnata) Date: Sat, 17 Sep 2011 16:17:57 +0200 Subject: [SciPy-User] wrong prerequisites statements on scipy.org Message-ID: <4E74AC15.3050204@gmail.com> Hi, http://www.scipy.org would state clearly which versions of python are supported by numpy and scipy. The front page says nothing about that and the faq statements are wrong: http://www.scipy.org/FAQ states that "NumPy requires the following software installed: 1. Python 2.4.x or 2.5.x" This documentation issue is a big one when you try to ask people to switch to numpy/scipy. BTW, which version are supported? 2.6 and 2.7 for sure. 3.X works but is it official? 2.4 or 2.5... I don't know. Ok it is easy to figure it out but the documentation should IMHO state that on its front page. 
Xavier From josef.pktd at gmail.com Sat Sep 17 17:11:14 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Sep 2011 17:11:14 -0400 Subject: [SciPy-User] nmrglue leastsqbound Message-ID: Maybe a case for scipy central ? http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leastsqbound.py http://stackoverflow.com/questions/7409694/scipy-bounds-for-fitting-parameters-when-using-optimize-leastsq I hope advertising someone else's code is ok. Josef From ralf.gommers at googlemail.com Sun Sep 18 06:54:53 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 18 Sep 2011 12:54:53 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> Message-ID: On Fri, Sep 16, 2011 at 7:24 PM, Fredrik Johansson < fredrik.johansson at gmail.com> wrote: > On Fri, Sep 16, 2011 at 7:11 PM, Ralf Gommers > wrote: > > > > > > On Thu, Sep 15, 2011 at 8:09 AM, Johann Cohen-Tanugi > > wrote: > >> > >> hi there, any chance for a polylog implementation in scipy.special? I > >> know it is there in mpmath, but I thought I would ask anyway. > >> > > If someone (you?) contributes a patch, that would be a great addition to > > scipy.special imho. mpmath is nice, but it doesn't understand ndarrays > and > > is way too slow when you want to use the polylog for something like > fitting > > Bose-Einstein or Fermi-Dirac distributions. > > > > It looks like the implementation in mpmath is quite clean and could > provide > > a starting point for a Cython/C version. > > Maybe it's still too slow, but mpmath.fp.polylog is about 100 times > faster than the multiprecision version (and usually gives nearly full > double-precision accuracy anyway). > Thanks for the tip. In [2]: %timeit mpmath.polylog(2, 10) 1000 loops, best of 3: 1.77 ms per loop In [3]: %timeit mpmath.fp.polylog(2, 10) 10000 loops, best of 3: 66.6 us per loop That's much better. Still a little slow for use in curve fitting perhaps, but when wrapping it with numpy.vectorize it should be useable on reasonable size ndarrays. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Sep 18 07:02:08 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 18 Sep 2011 13:02:08 +0200 Subject: [SciPy-User] ANN: SciPy 0.10.0 beta 2 Message-ID: Hi, The first beta release of scipy 0.10.0 was, well, beta quality, therefore I am pleased to announce the availability of the second 0.10.0 beta release. For this release over a 100 tickets and pull requests have been closed, and many new features have been added. Some of the highlights are: - support for Bento as a build system for scipy - generalized and shift-invert eigenvalue problems in sparse.linalg - addition of discrete-time linear systems in the signal module Sources and binaries can be found at https://sourceforge.net/projects/scipy/files/scipy/0.10.0b2/, release notes are copied below. SciPy 0.10 is compatible with Python 2.4 - 3.2, and requires numpy 1.5.1 or higher. Please try this release and report problems on the mailing list. Cheers, Ralf ========================== SciPy 0.10.0 Release Notes ========================== .. note:: Scipy 0.10.0 is not released yet! .. contents:: SciPy 0.10.0 is the culmination of XXX months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. 
All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.10.x branch, and on adding new features on the development trunk. This release requires Python 2.4-2.7 or 3.1-3.2 and NumPy 1.5 or greater. New features ============ Bento: new optional build system -------------------------------- Scipy can now be built with `Bento `_. Bento has some nice features like parallel builds and partial rebuilds that are not possible with the default build system (distutils). For usage instructions see BENTO_BUILD.txt in the scipy top-level directory. Currently Scipy has three build systems: distutils, numscons and bento. Numscons is deprecated and will likely be removed in the next release. Generalized and shift-invert eigenvalue problems in ``scipy.sparse.linalg`` --------------------------------------------------------------------------- The sparse eigenvalue problem solver functions ``scipy.sparse.eigs/eigh`` now support generalized eigenvalue problems, and all shift-invert modes available in ARPACK. Discrete-Time Linear Systems (``scipy.signal``) ----------------------------------------------- Support for simulating discrete-time linear systems, including ``scipy.signal.dlsim``, ``scipy.signal.dimpulse``, and ``scipy.signal.dstep``, has been added to SciPy. Conversion of linear systems from continuous-time to discrete-time representations is also present via the ``scipy.signal.cont2discrete`` function. Enhancements to ``scipy.signal`` -------------------------------- A Lomb-Scargle periodogram can now be computed with the new function ``scipy.signal.lombscargle``. The forward-backward filter function ``scipy.signal.filtfilt`` can now filter the data in a given axis of an n-dimensional numpy array. (Previously it only handled a 1-dimensional array.) Options have been added to allow more control over how the data is extended before filtering. FIR filter design with ``scipy.signal.firwin2`` now has options to create filters of type III (zero at zero and Nyquist frequencies) and IV (zero at zero frequency). Additional decomposition options (``scipy.linalg``) --------------------------------------------------- A sort keyword has been added to the Schur decomposition routine (``scipy.linalg.schur``) to allow the sorting of eigenvalues in the resultant Schur form. Additional special matrices (``scipy.linalg``) ---------------------------------------------- The functions ``hilbert`` and ``invhilbert`` were added to ``scipy.linalg``. Enhancements to ``scipy.stats`` ------------------------------- * The *one-sided form* of Fisher's exact test is now also implemented in ``stats.fisher_exact``. * The function ``stats.chi2_contingency`` for computing the chi-square test of independence of factors in a contingency table has been added, along with the related utility functions ``stats.contingency.margins`` and ``stats.contingency.expected_freq``. Basic support for Harwell-Boeing file format for sparse matrices ---------------------------------------------------------------- Both read and write are supported through a simple function-based API, as well as a more complete API to control number format. The functions may be found in scipy.sparse.io. 
The following features are supported: * Read and write sparse matrices in the CSC format * Only real, symmetric, assembled matrix are supported (RUA format) Deprecated features =================== ``scipy.maxentropy`` -------------------- The maxentropy module is unmaintained, rarely used and has not been functioning well for several releases. Therefore it has been deprecated for this release, and will be removed for scipy 0.11. Logistic regression in scikits.learn is a good alternative for this functionality. The ``scipy.maxentropy.logsumexp`` function has been moved to ``scipy.misc``. ``scipy.lib.blas`` ------------------ There are similar BLAS wrappers in ``scipy.linalg`` and ``scipy.lib``. These have now been consolidated as ``scipy.linalg.blas``, and ``scipy.lib.blas`` is deprecated. Numscons build system --------------------- The numscons build system is being replaced by Bento, and will be removed in one of the next scipy releases. Removed features ================ The deprecated name `invnorm` was removed from ``scipy.stats.distributions``, this distribution is available as `invgauss`. The following deprecated nonlinear solvers from ``scipy.optimize`` have been removed:: - ``broyden_modified`` (bad performance) - ``broyden1_modified`` (bad performance) - ``broyden_generalized`` (equivalent to ``anderson``) - ``anderson2`` (equivalent to ``anderson``) - ``broyden3`` (obsoleted by new limited-memory broyden methods) - ``vackar`` (renamed to ``diagbroyden``) Other changes ============= ``scipy.constants`` has been updated with the CODATA 2010 constants. ``__all__`` dicts have been added to all modules, which has cleaned up the namespaces (particularly useful for interactive work). An API section has been added to the documentation, giving recommended import guidelines and specifying which submodules are public and which aren't. Checksums ========= f986e635f37eb064f647fcd7ecffbd20 release/installers/scipy-0.10.0b2-py2.7-python.org-macosx10.6.dmg 6e474846d85469271c7c5adb87c7edef release/installers/scipy-0.10.0b2-win32-superpack-python2.5.exe 9aa3cf8eb60e9f9cddc70c7110a5fd42 release/installers/scipy-0.10.0b2-win32-superpack-python2.6.exe 43f462791a4b9159df57054d7294d442 release/installers/scipy-0.10.0b2-win32-superpack-python2.7.exe f6aa76ad9209a879ff525a509574ac77 release/installers/scipy-0.10.0b2-win32-superpack-python3.1.exe 9fc5962c62e0b8b0765128c506e15def release/installers/scipy-0.10.0b2-win32-superpack-python3.2.exe ac6683d466c61b1884d0d185523d7775 release/installers/scipy-0.10.0b2.tar.gz 48371c49661bc443639b90f66713266b release/installers/scipy-0.10.0b2.zip -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Sep 18 07:11:10 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 18 Sep 2011 13:11:10 +0200 Subject: [SciPy-User] QR factorization with Pivoting In-Reply-To: References: Message-ID: On Fri, Sep 16, 2011 at 10:37 PM, santhu kumar wrote: > Hello all, > > My scipy version is : 0.9.0rc3. > > This version does not have the QR factorization with pivoting facility. I > have found that this has been fixed > (https://github.com/collinstocks/scipy/compare/master...qr-with-pivoting), > but I dont know which version of scipy does it come with. > > This is included in the soon-to-be-released 0.10.0. It is easiest for you to update to 0.10.0b2 (tarball on sourceforge), or current git master. 
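With 0.10 the call should look roughly like this (untested sketch):

import numpy as np
from scipy.linalg import qr

a = np.random.rand(5, 3)
q, r, p = qr(a, pivoting=True)              # p holds the column permutation indices
print(np.allclose(a[:, p], np.dot(q, r)))   # should print True
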
As you see from the above diff it relies on a new f2py wrapper of a LAPACK function, so you can't just copy over the files to your current scipy install. Cheers, Ralf > Can you help me in incorporating the feature in my current Scipy. I use > RHEL6 as os with ATLAS LAPACK compiled. > Do I need to reinstall my scipy? if then, to which version. > Is there way that I can only update the required files? I have not used git > and I am unsure on how to do it. > > Thanks > Santhosh > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Sep 18 07:18:16 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 18 Sep 2011 13:18:16 +0200 Subject: [SciPy-User] wrong prerequisites statements on scipy.org In-Reply-To: <4E74AC15.3050204@gmail.com> References: <4E74AC15.3050204@gmail.com> Message-ID: On Sat, Sep 17, 2011 at 4:17 PM, Xavier Gnata wrote: > Hi, > > http://www.scipy.org would state clearly which versions of python are > supported by numpy and scipy. > > The front page says nothing about that and the faq statements are wrong: > http://www.scipy.org/FAQ states that > > "NumPy requires the following software installed: > > 1. Python 2.4.x or 2.5.x" > I updated this. > > This documentation issue is a big one when you try to ask people to > switch to numpy/scipy. > > That's true. There is a lot of documentation like this however, and we rely to a certain extent on the user community to keep documents like this up-to-date. > BTW, which version are supported? 2.6 and 2.7 for sure. 3.X works but is > it official? > It is. Both numpy and scipy currently support Python 2.4 - 3.2. > 2.4 or 2.5... I don't know. Ok it is easy to figure it out but the > documentation should IMHO state that on its front page. > > I don't think this belongs on the scipy.org front page. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Sep 18 07:37:48 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 18 Sep 2011 13:37:48 +0200 Subject: [SciPy-User] nmrglue leastsqbound In-Reply-To: References: Message-ID: On Sat, Sep 17, 2011 at 11:11 PM, wrote: > Maybe a case for scipy central ? > > Sure. > > http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leastsqbound.py > > http://stackoverflow.com/questions/7409694/scipy-bounds-for-fitting-parameters-when-using-optimize-leastsq > > There seems to be a real need for this. See also Matt Newville's lmfit. At some point some of that should land in scipy.optimize I suppose. > I hope advertising someone else's code is ok. > > Why not? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.gnata at gmail.com Sun Sep 18 10:14:19 2011 From: xavier.gnata at gmail.com (Xavier Gnata) Date: Sun, 18 Sep 2011 16:14:19 +0200 Subject: [SciPy-User] wrong prerequisites statements on scipy.org In-Reply-To: References: <4E74AC15.3050204@gmail.com> Message-ID: <4E75FCBB.8060002@gmail.com> On 09/18/2011 01:18 PM, Ralf Gommers wrote: > > > On Sat, Sep 17, 2011 at 4:17 PM, Xavier Gnata > wrote: > > Hi, > > http://www.scipy.org would state clearly which versions of python are > supported by numpy and scipy. 
> > The front page says nothing about that and the faq statements are > wrong: > http://www.scipy.org/FAQ states that > > "NumPy requires the following software installed: > > 1. Python 2.4.x or 2.5.x" > > > I updated this. thanks! > > > This documentation issue is a big one when you try to ask people to > switch to numpy/scipy. > > That's true. There is a lot of documentation like this however, and we > rely to a certain extent on the user community to keep documents like > this up-to-date. Well, the documentation of the fairly basic numpy/scipy capabilities is good or even very good. That is why was surprised to miss this basic prerequisites statement. > > BTW, which version are supported? 2.6 and 2.7 for sure. 3.X works > but is > it official? > > > It is. Both numpy and scipy currently support Python 2.4 - 3.2. > ok. thanks for the clarification. > 2.4 or 2.5... I don't know. Ok it is easy to figure it out but the > documentation should IMHO state that on its front page. > > I don't think this belongs on the scipy.org front page. > ok. I think it should be on the front page of the numpy/scipy doc ;) Xavier > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From nwagner at iam.uni-stuttgart.de Mon Sep 19 06:18:18 2011 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 19 Sep 2011 12:18:18 +0200 Subject: [SciPy-User] partial derivatives of a bivariate spline Message-ID: Hi all, How can I compute the partial derivatives of a bivariate spline in scipy ? Nils From klonuo at gmail.com Mon Sep 19 08:05:34 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Mon, 19 Sep 2011 14:05:34 +0200 Subject: [SciPy-User] Should I use pickle for numpy array? Message-ID: I'm not sure what is the best (or common) way to store numpy array to disk. Some even suggest HDF5, but data in question is not huge. Thanks From scott.sinclair.za at gmail.com Mon Sep 19 09:03:56 2011 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 19 Sep 2011 15:03:56 +0200 Subject: [SciPy-User] Should I use pickle for numpy array? In-Reply-To: References: Message-ID: On 19 September 2011 14:05, Klonuo Umom wrote: > I'm not sure what is the best (or common) way to store numpy array to disk. > Some even suggest HDF5, but data in question is not huge. It's simplest to use http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.save.html and http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.load.html if you're planning on reading it back using numpy. If you need the data to be accessible from other software then look into more portable formats like HDF5, NetCDF, etc, etc... Cheers, Scott From scipy at samueljohn.de Mon Sep 19 09:06:28 2011 From: scipy at samueljohn.de (Samuel John) Date: Mon, 19 Sep 2011 15:06:28 +0200 Subject: [SciPy-User] Should I use pickle for numpy array? In-Reply-To: References: Message-ID: Hi Klonuo, for big data and long term storage hdf5 (with h5py!!) is a must -- in my opinion. For short/temporary stuff pickle is just fine :-) Samuel From klonuo at gmail.com Mon Sep 19 09:15:25 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Mon, 19 Sep 2011 15:15:25 +0200 Subject: [SciPy-User] Should I use pickle for numpy array? 
In-Reply-To: References: Message-ID: Thanks for your replies I guess I was just looking for numpy.save/load and if I face portability issue of huge data in future, then I'll consider h5py ;) Cheers From anders.harrysson at fem4fun.com Mon Sep 19 09:39:24 2011 From: anders.harrysson at fem4fun.com (Anders Harrysson) Date: Mon, 19 Sep 2011 15:39:24 +0200 Subject: [SciPy-User] Bandpass filter Message-ID: <4E77460C.7020800@fem4fun.com> Hi, I need some help with the task to write a bandpass filter. I need to specify the ripple, the width of the transition and the frequencies to define the interval of the frequencies to pass. I have looked thou the python cookbook, but only found examples of low-pass. Thankful for any help on this subject. /A From johann.cohentanugi at gmail.com Mon Sep 19 09:42:52 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Mon, 19 Sep 2011 15:42:52 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> Message-ID: <4E7746DC.90808@gmail.com> hi Ralph and Fredrik, thanks for the feedback. I can certainly try to teach myself a bit of cython, and to write these special functions for scipy (I would probably have to start with Riemann zeta function), but it will take some time :) For now find attached a first cython version of a snipet of the mpmath code, completely trimmed to be used in only one case (polylog_series). The timing results are : In [4]: %timeit polylog.polylog(2,0.5) 10000 loops, best of 3: 29.2 us per loop In [5]: %timeit polylog_cy.polylog(2,0.5) 100000 loops, best of 3: 14.3 us per loop In [7]: %timeit mpmath.fp.polylog(2,0.5) 10000 loops, best of 3: 36.5 us per loop where the first one is the mpmath code without the mpmath ctx objects invoked, the second is the cython version, and the third is the mpmath native version (0.17). Before I dive into cython management of arrays in order for polylog to accept input arrays, I would appreciate feedback on where to look for potential further improvements using cython, if any, on a simplistic script as this one. best, Johann On 09/16/2011 07:11 PM, Ralf Gommers wrote: > > > On Thu, Sep 15, 2011 at 8:09 AM, Johann Cohen-Tanugi > > > wrote: > > hi there, any chance for a polylog implementation in scipy.special? I > know it is there in mpmath, but I thought I would ask anyway. > > If someone (you?) contributes a patch, that would be a great addition > to scipy.special imho. mpmath is nice, but it doesn't understand > ndarrays and is way too slow when you want to use the polylog for > something like fitting Bose-Einstein or Fermi-Dirac distributions. > > It looks like the implementation in mpmath is quite clean and could > provide a starting point for a Cython/C version. > > Ralf > > > -- > This message has been scanned for viruses and > dangerous content by *MailScanner* , and is > believed to be clean. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: polylog_cy.pyx URL: From pav at iki.fi Mon Sep 19 09:55:24 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 19 Sep 2011 13:55:24 +0000 (UTC) Subject: [SciPy-User] polylogarithm? 
References: <4E719695.1080503@gmail.com> <4E7746DC.90808@gmail.com> Message-ID: Mon, 19 Sep 2011 15:42:52 +0200, Johann Cohen-Tanugi wrote: [clip] > where the first one is the mpmath code without the mpmath ctx objects > invoked, the second is the cython version, and the third is the mpmath > native version (0.17). Before I dive into cython management of arrays in > order for polylog to accept input arrays, I would appreciate feedback on > where to look for potential further improvements using cython, if any, > on a simplistic script as this one. In short: - Check the annotated HTML output from "cython -a". The name of the game is to make things less yellow. - Add cdef type for all variables (removes PyObject boxing). This will be the main source of speed gains. - Sprinkle in @cython.cdivision(True) - Use "from libc.math cimport abs" - Take a look at lambertw.pyx in scipy.special. It shows you how to make the function an ufunc. -- Pauli Virtanen From kmacmanu at ciesin.columbia.edu Mon Sep 19 10:57:30 2011 From: kmacmanu at ciesin.columbia.edu (Kytt MacManus) Date: Mon, 19 Sep 2011 10:57:30 -0400 Subject: [SciPy-User] -inf when summing arrays Message-ID: <4E77585A.7020401@ciesin.columbia.edu> Hello and good day, I am a newbie to the numpy/scipy so I apologize if this is a trivial question. I have been attempting to do a simple zonal sum of 2 arrays. I have one integer array of zones (labels), and another array of floating point values. For some reason when I attempt scipy.ndimage.sum(valueArray,labels=zoneArray,index=1) The return value is -inf for each of my zones. If I run valueArray.sum() an actual number is returned. I am not sure what the -inf signifies and have not been able to locate any useful documentation on the subject. Any insight or advice would be much appreciated. Thanks, Kytt -- Kytt MacManus Geographic Information Specialist CIESIN Earth Institute Columbia University Adjunct Lecturer School of International and Public Affairs Columbia University P.O. Box 1000 61 Route 9W Palisades, NY 10964 845-365-8939 (V) 845-365-8922 (F) www.ciesin.columbia.edu From zachary.pincus at yale.edu Mon Sep 19 11:04:29 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 19 Sep 2011 11:04:29 -0400 Subject: [SciPy-User] -inf when summing arrays In-Reply-To: <4E77585A.7020401@ciesin.columbia.edu> References: <4E77585A.7020401@ciesin.columbia.edu> Message-ID: <511774E0-06DE-46BA-93C2-B46464DF6845@yale.edu> Can you provide a small/self-contained example of values/labels arrays that reproduce this issue? (Perhaps by trying various chunks of your input arrays until you find the error-producing region?) Also, does numpy.all(numpy.isfinite(valueArray)) return True? Zach On Sep 19, 2011, at 10:57 AM, Kytt MacManus wrote: > Hello and good day, > > I am a newbie to the numpy/scipy so I apologize if this is a trivial > question. > > I have been attempting to do a simple zonal sum of 2 arrays. I have one > integer array of zones (labels), and another array of floating point values. > > For some reason when I attempt > scipy.ndimage.sum(valueArray,labels=zoneArray,index=1) > > The return value is -inf for each of my zones. > > If I run valueArray.sum() an actual number is returned. > > I am not sure what the -inf signifies and have not been able to locate > any useful documentation on the subject. > > Any insight or advice would be much appreciated. 
> > Thanks, > Kytt > > -- > Kytt MacManus > Geographic Information Specialist > CIESIN > Earth Institute > Columbia University > > Adjunct Lecturer > School of International and Public Affairs > Columbia University > > P.O. Box 1000 > 61 Route 9W > Palisades, NY 10964 > > 845-365-8939 (V) > 845-365-8922 (F) > www.ciesin.columbia.edu > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From johann.cohen-tanugi at lupm.univ-montp2.fr Mon Sep 19 10:15:06 2011 From: johann.cohen-tanugi at lupm.univ-montp2.fr (Johann Cohen-Tanugi) Date: Mon, 19 Sep 2011 16:15:06 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> <4E7746DC.90808@gmail.com> Message-ID: <4E774E6A.1050409@lupm.univ-montp2.fr> thanks a lot Pauli, a quesiton offlist : what do you mean by "PyObject boxing"? best, Johann On 09/19/2011 03:55 PM, Pauli Virtanen wrote: > Mon, 19 Sep 2011 15:42:52 +0200, Johann Cohen-Tanugi wrote: > [clip] >> where the first one is the mpmath code without the mpmath ctx objects >> invoked, the second is the cython version, and the third is the mpmath >> native version (0.17). Before I dive into cython management of arrays in >> order for polylog to accept input arrays, I would appreciate feedback on >> where to look for potential further improvements using cython, if any, >> on a simplistic script as this one. > In short: > > - Check the annotated HTML output from "cython -a". > The name of the game is to make things less yellow. > > - Add cdef type for all variables (removes PyObject boxing). > This will be the main source of speed gains. > > - Sprinkle in @cython.cdivision(True) > > - Use "from libc.math cimport abs" > > - Take a look at lambertw.pyx in scipy.special. > It shows you how to make the function an ufunc. > From ralf.gommers at googlemail.com Mon Sep 19 11:50:23 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 19 Sep 2011 17:50:23 +0200 Subject: [SciPy-User] partial derivatives of a bivariate spline In-Reply-To: References: Message-ID: On Mon, Sep 19, 2011 at 12:18 PM, Nils Wagner wrote: > Hi all, > > How can I compute the partial derivatives of a bivariate > spline in scipy ? > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplev.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From klonuo at gmail.com Mon Sep 19 12:19:05 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Mon, 19 Sep 2011 18:19:05 +0200 Subject: [SciPy-User] Extract datetime range values from numpy array Message-ID: Hi again, I have numpy array like this: In[] ndata.dtype Out[] dtype=[('dt', '|O8'), ('value', ' From paul.anton.letnes at gmail.com Mon Sep 19 13:59:21 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 19 Sep 2011 19:59:21 +0200 Subject: [SciPy-User] Should I use pickle for numpy array? In-Reply-To: References: Message-ID: On Mon, Sep 19, 2011 at 3:15 PM, Klonuo Umom wrote: > Thanks for your replies > > I guess I was just looking for numpy.save/load > > and if I face portability issue of huge data in future, then I'll > consider h5py ;) > > > Cheers > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I must admit I prefer h5py, even for small amounts of data. The tree like structure of the data makes organizing your arrays so much easier. 
>>> import h5py >>> f = h5py.File('example.hdf5', 'w') >>> import numpy >>> f['my_array'] = numpy.arange(10) >>> f.close() Almost as easy to use as numpy.save(), too. Cheers, Paul From nwagner at iam.uni-stuttgart.de Mon Sep 19 15:07:57 2011 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 19 Sep 2011 21:07:57 +0200 Subject: [SciPy-User] partial derivatives of a bivariate spline In-Reply-To: References: Message-ID: On Mon, 19 Sep 2011 17:50:23 +0200 Ralf Gommers wrote: > On Mon, Sep 19, 2011 at 12:18 PM, Nils Wagner > wrote: > >> Hi all, >> >> How can I compute the partial derivatives of a bivariate >> spline in scipy ? >> >> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplev.html AFAIK, bisplev and bisplrep correspond to the old FITPACK wrapper. I would like to use http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BivariateSpline.html#scipy.interpolate.BivariateSpline http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.LSQBivariateSpline.html#scipy.interpolate.LSQBivariateSpline Nils From ralf.gommers at googlemail.com Mon Sep 19 15:20:17 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 19 Sep 2011 21:20:17 +0200 Subject: [SciPy-User] partial derivatives of a bivariate spline In-Reply-To: References: Message-ID: On Mon, Sep 19, 2011 at 9:07 PM, Nils Wagner wrote: > On Mon, 19 Sep 2011 17:50:23 +0200 > Ralf Gommers wrote: > > On Mon, Sep 19, 2011 at 12:18 PM, Nils Wagner > > wrote: > > > >> Hi all, > >> > >> How can I compute the partial derivatives of a bivariate > >> spline in scipy ? > >> > >> > > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplev.html > > AFAIK, bisplev and bisplrep correspond to the old FITPACK > wrapper. > I would like to use > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BivariateSpline.html#scipy.interpolate.BivariateSpline > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.LSQBivariateSpline.html#scipy.interpolate.LSQBivariateSpline > > Looks like BivariateSpline forgot to grow a derivatives() method, UnivariateSpline does have one. This would be a useful addition. Could you open a feature request for it? Or perhaps even a pull request:) To get something done now, you should be able to just pass the tck attribute of BivariateSpline to bisplev I think. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Mon Sep 19 15:41:03 2011 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 19 Sep 2011 21:41:03 +0200 Subject: [SciPy-User] partial derivatives of a bivariate spline In-Reply-To: References: Message-ID: On Mon, 19 Sep 2011 21:20:17 +0200 Ralf Gommers wrote: > On Mon, Sep 19, 2011 at 9:07 PM, Nils Wagner > wrote: > >> On Mon, 19 Sep 2011 17:50:23 +0200 >> Ralf Gommers wrote: >> > On Mon, Sep 19, 2011 at 12:18 PM, Nils Wagner >> > wrote: >> > >> >> Hi all, >> >> >> >> How can I compute the partial derivatives of a >>bivariate >> >> spline in scipy ? >> >> >> >> >> > >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplev.html >> >> AFAIK, bisplev and bisplrep correspond to the old >>FITPACK >> wrapper. 
>> I would like to use >> >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BivariateSpline.html#scipy.interpolate.BivariateSpline >> >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.LSQBivariateSpline.html#scipy.interpolate.LSQBivariateSpline >> >> Looks like BivariateSpline forgot to grow a >>derivatives() method, > UnivariateSpline does have one. This would be a useful >addition. Could you > open a feature request for it? Or perhaps even a pull >request:) > > To get something done now, you should be able to just >pass the tck attribute > of BivariateSpline to bisplev I think. > > Cheers, > Ralf Done. See http://projects.scipy.org/scipy/ticket/1522 Cheers, Nils From hturesson at gmail.com Tue Sep 20 07:08:45 2011 From: hturesson at gmail.com (Hjalmar Turesson) Date: Tue, 20 Sep 2011 07:08:45 -0400 Subject: [SciPy-User] Log in to ask.scipy Message-ID: Hi, When I try create an account at ask.scipy.org (http://ask.scipy.org/login), I'm told that I've "entered an invalid captcha". But, there is not captcha on the page. I cannot enter anything. This occurs after I've entered an OpenID, been directed to "Fist OpenID Login" ( http://ask.scipy.org/login?firstlogin=yes&next=http%3A%2F%2Fask.scipy.org%2Fen%2F), entered Username and E-mail, and clicked "Register". Is this a known problem, or am I the first to suffer from it, or have I somehow missed something? Thanks, Hjalmar -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrennie at gmail.com Tue Sep 20 10:47:00 2011 From: jrennie at gmail.com (Jason Rennie) Date: Tue, 20 Sep 2011 10:47:00 -0400 Subject: [SciPy-User] Optimize.fmin_cg INCREASES total forces after minimization In-Reply-To: References: Message-ID: Are you using a good objective/gradient checker? Here a matlab one I wrote in grad school; it's easy to convert to python. http://qwone.com/~jason/matlab/checkgrad2.m Jason On Wed, Sep 14, 2011 at 12:26 PM, Patrick Holvey wrote: > Patrick Holvey gmail.com> writes: > > > > > Hi everyone,I've got the attached program which I've detailed in previous > emails. I'm working on debugging my gradients (which I think I have) but > something weird is going on. When you load the program (from > autosimplewwwV5 > import *) into the interpreter and call TrueSystem.getforces() it returns > the > sum of the absolute values of the forces experienced by the atoms. Ok, so > when > I run TrueSystem.relax() it runs the system through fmin_cg a number of > times. > Here, I'm not sure why it's only going through 5-7 iterations a run before > quitting so I have it run multiple (50) times to get some relaxation going > on. > After running the relaxation, I call getforces() again, only to see that > the > forces have increased! (from 625 to 640) Very curious. I've attached both > the > full code and the test atom setup (a three atom system, 1 Si atom bonded to > 2 > Oxygen atoms in an angle configuration). As expected, initially, the > forces > indicate the Si atom wants to pull up from the O atoms and the O atoms want > to > move away from each other and down, away from the Si atom. This is not > what > happens. In fact, both of the oxygen atoms move towards each other > compressing > the O-Si-O angle, and only marginally lengthening the O-Si bond. This can > be > seen by calling TrueSystem.writetofile("Filename") which will output a .xyz > of > the current system configuration.Any help on this is greatly appreciated. > Thanks so much.Patrick-- Patrick HolveyGraduate StudentDept. 
of Materials > Science and EngineeringJohns Hopkins Universitypholvey1 jhu.edu > > Attachment (autosimplewwwV5.py): text/x-python, 26 KiB > > Attachment (test-angle.xyz): chemical/x-pdb, 201 bytes > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > Please disregard this communique. It appears that the derivation of the > forces > that I was using was flawed, so have to fix that before I can get back to > debugging. Back to the drawing board... > > Thanks! > > Patrick > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Jason Rennie Research Scientist ITA Software by Google +1 617-446-3651 -------------- next part -------------- An HTML attachment was scrubbed... URL: From D.J.Baker at soton.ac.uk Tue Sep 20 11:43:37 2011 From: D.J.Baker at soton.ac.uk (Baker D.J.) Date: Tue, 20 Sep 2011 16:43:37 +0100 Subject: [SciPy-User] Segmentation fault with scipy v0.9.0 Message-ID: Hello, I'm building scipy v0.9.0 on a RHELS 5.3 cluster. I'm building the package with python 2.6.5, numpy 1.6.1 and the GNU compilers v4.1.2. I've kept things simple and just used the "bog standard" BLAS/LAPACK installed via RHELS rpms. I've built and tested numpy today and that is fine. On the other hand I find that the scipy tests fail with a segmentation fault. On running the tests with "scipy.test(verbose=2) I find the following failure: test_nonlin.TestJacobianDotSolve.test_broyden1 ... Segmentation fault We do have the Intel compilers and MKL on our system as well, however I'll work with these as a last resort. The last time I tried to build scipy using these packages I ran in to many issues. Does anyone please have any advice for me re building scipy 0.9.0 successfully? If, for example, you have a successful build of v0.9.0 then I would appreciate knowing how you did it. Best regards - David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mack.jenkins at eas.gatech.edu Tue Sep 20 12:07:14 2011 From: mack.jenkins at eas.gatech.edu (MJ Jenkins) Date: Tue, 20 Sep 2011 12:07:14 -0400 Subject: [SciPy-User] Installation Errors Message-ID: <616472B1-50B1-4318-908B-2FCA9AB6DCFD@eas.gatech.edu> I am somewhat new to python and SciPy so I hope this is the right place to ask this question. I am trying to install SciPy but I get the following message. Traceback (most recent call last): File "setup.py", line 181, in setup_package() File "setup.py", line 131, in setup_package from numpy.distutils.core import setup ImportError: No module named numpy.distutils.core Can someone point me in the right direction to solve this installation error? Thanks. -- Mack J. Jenkins, II mack.jenkins at eas.gatech.edu Earth & Atmospheric Sciences From pav at iki.fi Tue Sep 20 14:47:03 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 20 Sep 2011 18:47:03 +0000 (UTC) Subject: [SciPy-User] Installation Errors References: <616472B1-50B1-4318-908B-2FCA9AB6DCFD@eas.gatech.edu> Message-ID: Hi, On Tue, 20 Sep 2011 12:07:14 -0400, MJ Jenkins wrote: [clip] > ImportError: No module named numpy.distutils.core > > Can someone point me in the right direction to solve this installation > error?. The error message indicates that you do not have Numpy installed. 
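A quick sanity check (a minimal check of my own, nothing scipy-specific) is to run, with the same Python you use for the scipy build, the exact import that setup.py needs:

import numpy
import numpy.distutils.core   # the import that failed in the traceback above
print numpy.__version__, numpy.__file__

If this raises ImportError, install numpy first and then rerun the scipy build.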
-- Pauli Virtanen From iefinkel at gmail.com Tue Sep 20 16:45:11 2011 From: iefinkel at gmail.com (Eli Finkelshteyn) Date: Tue, 20 Sep 2011 16:45:11 -0400 Subject: [SciPy-User] Problem Installing with Python 2.7 on OS X 10.6 In-Reply-To: References: <4E739C90.9040805@gmail.com> Message-ID: <4E78FB57.9040103@gmail.com> Right you are. I tried uninstalling numpy and then reinstalling through easy_install on a lark, which resulted in me getting that error instead of the one I was getting previously, which was my actual problem. Anyway, long story short, uninstalling everything, recompiling Python as 32bit, and then reinstalling everything through PIP fixed my problem. On 9/16/11 3:46 PM, David Cournapeau wrote: > On Fri, Sep 16, 2011 at 2:59 PM, Eli Finkelshteyn wrote: >> Hi, >> Every time I try to install SciPy, I get the following error: >> >> $ pip install scipy >> Downloading/unpacking scipy >> Running setup.py egg_info for package scipy >> Traceback (most recent call last): >> File "", line 14, in >> File "/Users/elifinkelshteyn/build/scipy/setup.py", line 181, in >> >> setup_package() >> File "/Users/elifinkelshteyn/build/scipy/setup.py", line 131, in >> setup_package >> from numpy.distutils.core import setup >> File >> "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/__init__.py", >> line 137, in >> import add_newdocs >> File >> "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/add_newdocs.py", >> line 9, in >> from numpy.lib import add_newdoc >> File >> "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/lib/__init__.py", >> line 4, in >> from type_check import * >> File >> "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/lib/type_check.py", >> line 8, in >> import numpy.core.numeric as _nx >> File >> "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/__init__.py", >> line 5, in >> import multiarray >> ImportError: dynamic module does not define init function >> (initmultiarray) >> >> I've tried installing from source by cloning from Git, installing >> through pip, installing through easy_install. Nothing works. I'm running >> everything through OS X 10.6 with Homebrew and using Python 2.7. Numpy >> is installed and works fine. > Actually, numpy does not work: you get an error when importing numpy > (python -c "import numpy" should give you the same error). > > cheers, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From Wolfgang.Mader at fdm.uni-freiburg.de Wed Sep 21 04:16:45 2011 From: Wolfgang.Mader at fdm.uni-freiburg.de (Wolfgang Mader) Date: Wed, 21 Sep 2011 10:16:45 +0200 Subject: [SciPy-User] General purpose 2D,3D plotting library Message-ID: <1898267.rCXBrvlSyj@killbill> Dear all, what is your favorite general purpose plotting library for 2D and 3D visualization. It the moment I am using matplotlib. But knowing alternatives, if any, is a good thing, I think. Thank you and have a nive day. Wolfgang From d.s.seljebotn at astro.uio.no Wed Sep 21 04:34:20 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 21 Sep 2011 10:34:20 +0200 Subject: [SciPy-User] Round table/blog on fixing scientific software distribution Message-ID: <4E79A18C.8010206@astro.uio.no> Yet again, the issue of distributing scientific Python software was raised, this time on the mpi4py mailing list. 
Since that wasn't really the right forum, and we weren't really sure what was the right forum, we started a blog instead. The idea is to get a diverse set of people describe their experiences; something between a brainstorming and a survey. No current solution fits all and the community is fragmented -- perhaps collecting the experiences of different user bases and the solutions they found is helpful at this point. We want your posts! (send me an email to get posting rights) http://fixingscientificsoftwaredistribution.blogspot.com/2011/09/round-table-about-fixing-problem-of.html Dag Sverre From canavanin at yahoo.se Wed Sep 21 10:47:11 2011 From: canavanin at yahoo.se (D K) Date: Wed, 21 Sep 2011 16:47:11 +0200 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? Message-ID: <4E79F8EF.4010400@yahoo.se> Hi everyone I would like to fit data obtained from a Monte Carlo simulation to experimental data, in order to extract two parameters from the experiments. There are several issues with this: a) There is a small element of randomness to each simulated data point; we don't actually have a function describing the curve (the overall curve shape is reproducible though). b) I have never performed curve fitting before, and I haven't got a clue how to even go about looking for the required information. b) I don't have a strong maths background. I tried using optimize.leastsq, but I learnt that, apparently, I ought to know the function describing my data to be able to use this (I kept researching, as it exited with code 2, claiming that the fit had been successful, but it mainly returned the initial guess as the fitting result). So I switched to optimize.fmin (having read that it only uses the function values); this, however, does not converge and simply exits after the maximum number of iterations have been performed. I can post code or further details if required, but perhaps someone on here might already be able to guess what I might be doing wrong, and/or point me in the right direction (different fitting function? entirely different approach to parameter determination?). I would be very grateful for your help. Thanks a lot in advance! Kind regards, Donata PS: This is the first time I'm writing to this mailing list, I hope I will be forgiven in case I made some daft mistake... From josef.pktd at gmail.com Wed Sep 21 11:51:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Sep 2011 11:51:56 -0400 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? In-Reply-To: <4E79F8EF.4010400@yahoo.se> References: <4E79F8EF.4010400@yahoo.se> Message-ID: On Wed, Sep 21, 2011 at 10:47 AM, D K wrote: > Hi everyone > > I would like to fit data obtained from a Monte Carlo simulation to > experimental data, in order to extract two parameters from the > experiments. There are several issues with this: > > a) There is a small element of randomness to each simulated data point; > we don't actually have a function describing the curve (the overall > curve shape is reproducible though). > b) I have never performed curve fitting before, and I haven't got a clue > how to even go about looking for the required information. > b) I don't have a strong maths background. 
> > I tried using optimize.leastsq, but I learnt that, apparently, I ought > to know the function describing my data to be able to use this (I kept > researching, as it exited with code 2, claiming that the fit had been > successful, but it mainly returned the initial guess as the fitting > result). So I switched to optimize.fmin (having read that it only uses > the function values); this, however, does not converge and simply exits > after the maximum number of iterations have been performed. > > I can post code or further details if required, but perhaps someone on > here might already be able to guess what I might be doing wrong, and/or > point me in the right direction (different fitting function? entirely > different approach to parameter determination?). I would be very > grateful for your help. The function that a minimization should match can be very general, there is no real problem if it is a simulation program. My main question was, how you specify the outcome of the simulation model and the outcome of the experiments. Do you have both as some function (program) that depends on the experimental parameters? Are you trying to match the function for different experimental parameters, or many simulations for the same parameters? If leastsq didn't work you should post the function that you used for leastsq, and check the dimension of everything. I guess the problem might be in how you set up the optimization, not in the overall problem. (as an aside: in econometrics or in economics sometimes parameters are estimated by comparing simulation results with real data. The main point is often how to specify what it means that a simulation model is "close" to the real data. In some cases it's just assumed that it produces the same moments. Or that the expected values over many simulations is assumed to be the expected value of the data, or something like this.) Josef > > Thanks a lot in advance! > > Kind regards, > > Donata > > > PS: This is the first time I'm writing to this mailing list, I hope I > will be forgiven in case I made some daft mistake... > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From londonien at gmail.com Wed Sep 21 11:57:23 2011 From: londonien at gmail.com (Simon McGregor) Date: Wed, 21 Sep 2011 16:57:23 +0100 Subject: [SciPy-User] What is the kernel in gaussian_kde? Message-ID: Hi, I'm about to try and implement a simple entropy estimator using scipy.stats.kde. The estimator can be implemented for arbitrary kernels (though it won't be particularly efficient). gaussian_kde is all nicely set up to do density estimation, integration and various other things - but the kernel itself is not exposed! The scipy / numpy matrix operators are still not completely intuitive for me, so reverse-engineering the kernel from the code is proving a bit tricky. The online tutorials I have read on Gaussian kde all use one-dimensional examples, which obscure the meaning of the covariance matrix in the multi-dimensional case. My linear algebra isn't too strong, so again my intuitions aren't good here. If anyone could just spell out what the kernel is, I would be amazingly grateful. It looks like it's something like K(a-b) = exp ( (a-b) . M(a-b) / 2) ...where M is the inverse covariance matrix and . indicates the scalar product of two vectors. Is this right? 
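(In code, what I am imagining is roughly the sketch below -- my own guess, not taken from the scipy source, and I am assuming the exponent actually carries a minus sign and that a normalisation constant multiplies the whole thing:)

import numpy as np

def gaussian_kernel(a, b, inv_cov):
    # unnormalised multivariate Gaussian kernel between points a and b
    d = np.asarray(a) - np.asarray(b)
    return np.exp(-0.5 * np.dot(d, np.dot(inv_cov, d)))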
Many thanks,

Simon

From abraham.zamudio at gmail.com Tue Sep 20 13:06:27 2011
From: abraham.zamudio at gmail.com (Abraham Zamudio)
Date: Tue, 20 Sep 2011 10:06:27 -0700 (PDT)
Subject: [SciPy-User] Installation Errors
In-Reply-To: <616472B1-50B1-4318-908B-2FCA9AB6DCFD@eas.gatech.edu>
References: <616472B1-50B1-4318-908B-2FCA9AB6DCFD@eas.gatech.edu>
Message-ID: <4361223a-a0bf-4e9e-a043-38d5b049c14b@f12g2000yqi.googlegroups.com>

Is Numpy installed? Which version of Linux are you installing on?

On Sep 20, 11:07 am, MJ Jenkins wrote:
> I am somewhat new to python and SciPy so I hope this is the right place to ask this question. I am trying to install SciPy but I get the following message.
>
> Traceback (most recent call last):
>   File "setup.py", line 181, in
>     setup_package()
>   File "setup.py", line 131, in setup_package
>     from numpy.distutils.core import setup
> ImportError: No module named numpy.distutils.core
>
> Can someone point me in the right direction to solve this installation error? Thanks.
> --
> Mack J. Jenkins, II
> mack.jenk... at eas.gatech.edu
> Earth & Atmospheric Sciences
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-U... at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user

From johnl at cs.wisc.edu Wed Sep 21 15:20:01 2011
From: johnl at cs.wisc.edu (J. David Lee)
Date: Wed, 21 Sep 2011 14:20:01 -0500
Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation?
In-Reply-To: <4E79F8EF.4010400@yahoo.se>
References: <4E79F8EF.4010400@yahoo.se>
Message-ID: <4E7A38E1.3050508@cs.wisc.edu>

On 09/21/2011 09:47 AM, D K wrote:
> Hi everyone
>
> I would like to fit data obtained from a Monte Carlo simulation to
> experimental data, in order to extract two parameters from the
> experiments. There are several issues with this:
>
> a) There is a small element of randomness to each simulated data point;
> we don't actually have a function describing the curve (the overall
> curve shape is reproducible though).
> b) I have never performed curve fitting before, and I haven't got a clue
> how to even go about looking for the required information.
> b) I don't have a strong maths background.
>
> I tried using optimize.leastsq, but I learnt that, apparently, I ought
> to know the function describing my data to be able to use this (I kept
> researching, as it exited with code 2, claiming that the fit had been
> successful, but it mainly returned the initial guess as the fitting
> result). So I switched to optimize.fmin (having read that it only uses
> the function values); this, however, does not converge and simply exits
> after the maximum number of iterations have been performed.
>
Hi Donata,

Because your model varies from run to run, you may not be able to reach the default tolerances necessary for successful termination of leastsq. If you look at the documentation for leastsq, you will see several tolerance parameters, ftol, xtol, and gtol. Modifying these may help in your case.

Most (all?) of these optimization routines are doing some kind of gradient descent. The variability in your model will affect both the error estimate and the search direction. Because you'll be calculating the Jacobian matrix (gradients) numerically, you'll almost certainly want to modify leastsq's epsfcn parameter. Using the default value, it may be that the variability in your model will be larger than the difference due to the delta x used. In that case, your search direction could be essentially random.
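To make that concrete, here is a rough sketch (the model, data and tolerance values are invented for illustration, not taken from your code):

import numpy as np
from scipy import optimize

def noisy_model(params, x):
    # stand-in for a Monte Carlo simulation: a smooth curve plus a small
    # random component on every evaluation
    a, b = params
    return a * np.exp(-b * x) + 0.01 * np.random.random(x.size)

x = np.linspace(0.0, 5.0, 50)
y_exp = noisy_model([2.0, 0.7], x)

def residuals(params, x, y_exp):
    return noisy_model(params, x) - y_exp

# loosen the tolerances and enlarge epsfcn so the finite-difference step
# is not swamped by the simulation noise
popt, ier = optimize.leastsq(residuals, [1.0, 1.0], args=(x, y_exp),
                             ftol=1e-3, xtol=1e-3, epsfcn=0.1)
print popt, ier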
After writing this, I'm thinking that fmin would be a better fit, as it doesn't have the numerical gradient calculation and associated problems. fmin has the same xtol and ftol arguments as leastsq that might be useful. David From robert.kern at gmail.com Wed Sep 21 18:02:58 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 21 Sep 2011 17:02:58 -0500 Subject: [SciPy-User] What is the kernel in gaussian_kde? In-Reply-To: References: Message-ID: On Wed, Sep 21, 2011 at 10:57, Simon McGregor wrote: > Hi, > > I'm about to try and implement a simple entropy estimator using > scipy.stats.kde. The estimator can be implemented for arbitrary > kernels (though it won't be particularly efficient). > > gaussian_kde is all nicely set up to do density estimation, > integration and various other things - but the kernel itself is not > exposed! > > The scipy / numpy matrix operators are still not completely intuitive > for me, so reverse-engineering the kernel from the code is proving a > bit tricky. The online tutorials I have read on Gaussian kde all use > one-dimensional examples, which obscure the meaning of the covariance > matrix in the multi-dimensional case. My linear algebra isn't too > strong, so again my intuitions aren't good here. > > If anyone could just spell out what the kernel is, I would be > amazingly grateful. > > It looks like it's something like > > K(a-b) = exp ( (a-b) . M(a-b) / 2) > > ...where M is the inverse covariance matrix and . indicates the scalar > product of two vectors. Is this right? Up to a scale, yes. M=self.inv_cov, in this case, which is the inverse of the covariance of the data scaled by a particular factor from the literature. Then this is scaled by self._norm_factor. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From andrej.kobler at gozdis.si Thu Sep 22 08:13:57 2011 From: andrej.kobler at gozdis.si (Andrej Kobler) Date: Thu, 22 Sep 2011 14:13:57 +0200 Subject: [SciPy-User] 2D NN search in a 3D point cloud Message-ID: <24058E20A2354564B6A26E7C2A5957AC@gozdis.si> Hi, in a cloud of XYZ points I'd like to search for nearest neighbors in XY plane using scipy.spatial.KDTree/cKDTree. The problem is that KDTree.query takes into account all the given dimensions, not just the desired subset of dimensions. Is there a simple way to have data with 3 or more dimensions and use only 2 for NN search? Thanks, Andrej -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmacmanu at ciesin.columbia.edu Thu Sep 22 09:34:10 2011 From: kmacmanu at ciesin.columbia.edu (Kytt MacManus) Date: Thu, 22 Sep 2011 09:34:10 -0400 Subject: [SciPy-User] -inf when summing arrays In-Reply-To: <511774E0-06DE-46BA-93C2-B46464DF6845@yale.edu> References: <4E77585A.7020401@ciesin.columbia.edu> <511774E0-06DE-46BA-93C2-B46464DF6845@yale.edu> Message-ID: <4E7B3952.7060905@ciesin.columbia.edu> Thanks Zach...this suggestion helped me solve my problem when I realized that summing any axis of my array was producing a -inf result. This issue it turns out was the values assigned to the NoData areas in the 32bit floating point raster which I had produced in ArcGIS. I was able to get things to work by recoding NoData as 0. Thanks for your help. -Kytt On 9/19/2011 11:04 AM, Zachary Pincus wrote: > Can you provide a small/self-contained example of values/labels arrays that reproduce this issue? 
(Perhaps by trying various chunks of your input arrays until you find the error-producing region?) > > Also, does numpy.all(numpy.isfinite(valueArray)) return True? > > Zach > > > On Sep 19, 2011, at 10:57 AM, Kytt MacManus wrote: > >> Hello and good day, >> >> I am a newbie to the numpy/scipy so I apologize if this is a trivial >> question. >> >> I have been attempting to do a simple zonal sum of 2 arrays. I have one >> integer array of zones (labels), and another array of floating point values. >> >> For some reason when I attempt >> scipy.ndimage.sum(valueArray,labels=zoneArray,index=1) >> >> The return value is -inf for each of my zones. >> >> If I run valueArray.sum() an actual number is returned. >> >> I am not sure what the -inf signifies and have not been able to locate >> any useful documentation on the subject. >> >> Any insight or advice would be much appreciated. >> >> Thanks, >> Kytt >> >> -- >> Kytt MacManus >> Geographic Information Specialist >> CIESIN >> Earth Institute >> Columbia University >> >> Adjunct Lecturer >> School of International and Public Affairs >> Columbia University >> >> P.O. Box 1000 >> 61 Route 9W >> Palisades, NY 10964 >> >> 845-365-8939 (V) >> 845-365-8922 (F) >> www.ciesin.columbia.edu >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Kytt MacManus Geographic Information Specialist CIESIN Earth Institute Columbia University Adjunct Lecturer School of International and Public Affairs Columbia University P.O. Box 1000 61 Route 9W Palisades, NY 10964 845-365-8939 (V) 845-365-8922 (F) www.ciesin.columbia.edu From canavanin at yahoo.se Thu Sep 22 11:50:18 2011 From: canavanin at yahoo.se (D K) Date: Thu, 22 Sep 2011 17:50:18 +0200 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? In-Reply-To: <4E7A38E1.3050508@cs.wisc.edu> References: <4E79F8EF.4010400@yahoo.se> <4E7A38E1.3050508@cs.wisc.edu> Message-ID: <4E7B593A.1020605@yahoo.se> Dear David thank you very much for your reply. I have been playing around with fmin's ftol and xtol arguments, as you suggested. It's looking promising so far, but then my initial guesses have been rather close to the true values of my test set. I will keep testing, and maybe write to the mailing list again at some point. Thanks again! /Donata PS: Also thanks very much to Josef, who also replied to my email. I will keep trying a bit with fmin and its parameters at first, and answer your questions in case I still don't get anywhere this way. I hope this approach is ok... On 09/21/2011 09:20 PM, J. David Lee wrote: > On 09/21/2011 09:47 AM, D K wrote: >> Hi everyone >> >> I would like to fit data obtained from a Monte Carlo simulation to >> experimental data, in order to extract two parameters from the >> experiments. There are several issues with this: >> >> a) There is a small element of randomness to each simulated data point; >> we don't actually have a function describing the curve (the overall >> curve shape is reproducible though). >> b) I have never performed curve fitting before, and I haven't got a clue >> how to even go about looking for the required information. >> b) I don't have a strong maths background. 
>> >> I tried using optimize.leastsq, but I learnt that, apparently, I ought >> to know the function describing my data to be able to use this (I kept >> researching, as it exited with code 2, claiming that the fit had been >> successful, but it mainly returned the initial guess as the fitting >> result). So I switched to optimize.fmin (having read that it only uses >> the function values); this, however, does not converge and simply exits >> after the maximum number of iterations have been performed. >> > Hi Donata, > > Because your model varies from run to run, you may not be able to reach > the default tolerances necessary for successful termination of leastsq. > If you look at the documentation for leastsq, you will see several > tolerance parameters, ftol, xtol, and gtol. Modifying these may help in > your case. > > Most (all?) of these optimization routines are doing some kind of > gradient descent. The variability in your model will affect both the > error estimate and the search direction. Because you'll be calculating > the Jacobian matrix (gradients) numerically, you're almost certainly > want to modify leastsq's epsfcn parameter. Using the default value, it > may be that the variability in your model will be larger than the > difference due to the delta x used. In that case, your search direction > could be essentially random. > > After writing this, I'm thinking that fmin would be a better fit, as it > doesn't have the numerical gradient calculation and associated problems. > fmin has the same xtol and ftol arguments as leastsq that might be useful. > > David > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bastian.weber at gmx-topmail.de Thu Sep 22 12:10:30 2011 From: bastian.weber at gmx-topmail.de (Bastian Weber) Date: Thu, 22 Sep 2011 18:10:30 +0200 Subject: [SciPy-User] 2D NN search in a 3D point cloud In-Reply-To: <24058E20A2354564B6A26E7C2A5957AC@gozdis.si> References: <24058E20A2354564B6A26E7C2A5957AC@gozdis.si> Message-ID: <4E7B5DF6.1050002@gmx-topmail.de> On 09/22/2011 02:13 PM, Andrej Kobler wrote: > Hi, > > in a cloud of XYZ points I?d like to search for nearest neighbors in XY > plane using scipy.spatial.KDTree/cKDTree. The problem is that > KDTree.query takes into account all the given dimensions, not just the > desired subset of dimensions. Is there a simple way to have data with 3 > or more dimensions and use only 2 for NN search? Probably I dont have enough insight but I simply would try to neglect the last coordinate of the data. Assuming you have points_xyz.shape == (N,3) where N is the number of points then you could simply do: points_xy = points_xyz[:,:2]. Applying the NN search on this data should give you the desired result. Best regards, Bastian. From josef.pktd at gmail.com Thu Sep 22 12:16:45 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 22 Sep 2011 12:16:45 -0400 Subject: [SciPy-User] catching warnings without error/raise Message-ID: A python question, but hopefully someone can answer I would like wrap functions that sometimes issue warnings, I want to suppress the warning but I want to record whether a warning has been issued, for later display. python 2.6 has "with warnings.catch_warnings(record=True) as w:" that seems to do what I want (from reading the description). 
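For concreteness, the 2.6 pattern I have in mind is roughly this (a small sketch of the documented catch_warnings API, using one of the examples below):

import warnings
import numpy as np
from scipy import stats

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    stats.kurtosistest(np.arange(5))
if w:
    print "caught:", w[0].category.__name__, w[0].message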
Is there a different way how to do this that also works for python 2.5 examples: >>> from scipy import stats >>> stats.kurtosistest(np.arange(5)) (-0.57245889052982701, 0.56701112882584059) C:\Python26\lib\site-packages\scipy\stats\stats.py:1198: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=5 int(n)) >>> stats.ansari(np.arange(5), np.arange(5)**2) (18.5, 0.12260027475751481) C:\Python26\lib\site-packages\scipy\stats\morestats.py:731: UserWarning: Ties preclude use of exact statistic. warnings.warn("Ties preclude use of exact statistic.") Thanks, Josef From josef.pktd at gmail.com Thu Sep 22 12:21:23 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 22 Sep 2011 12:21:23 -0400 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? In-Reply-To: <4E7B593A.1020605@yahoo.se> References: <4E79F8EF.4010400@yahoo.se> <4E7A38E1.3050508@cs.wisc.edu> <4E7B593A.1020605@yahoo.se> Message-ID: On Thu, Sep 22, 2011 at 11:50 AM, D K wrote: > Dear David > > thank you very much for your reply. I have been playing around with > fmin's ftol and xtol arguments, as you suggested. It's looking promising > so far, but then my initial guesses have been rather close to the true > values of my test set. I will keep testing, and maybe write to the > mailing list again at some point. Thanks again! If the problem is the MonteCarlo noise, then the question is whether you keep a fixed random.seed during the calculations? If you have a fixed seed, then you have the same MonteCarlo noise in all calculations and it shouldn't affect the derivative calculations or the calculations for different parameters. Josef > > /Donata > > > PS: Also thanks very much to Josef, who also replied to my email. I will > keep trying a bit with fmin and its parameters at first, and answer your > questions in case I still don't get anywhere this way. I hope this > approach is ok... > > > On 09/21/2011 09:20 PM, J. David Lee wrote: >> On 09/21/2011 09:47 AM, D K wrote: >>> Hi everyone >>> >>> I would like to fit data obtained from a Monte Carlo simulation to >>> experimental data, in order to extract two parameters from the >>> experiments. There are several issues with this: >>> >>> a) There is a small element of randomness to each simulated data point; >>> we don't actually have a function describing the curve (the overall >>> curve shape is reproducible though). >>> b) I have never performed curve fitting before, and I haven't got a clue >>> how to even go about looking for the required information. >>> b) I don't have a strong maths background. >>> >>> I tried using optimize.leastsq, but I learnt that, apparently, I ought >>> to know the function describing my data to be able to use this (I kept >>> researching, as it exited with code 2, claiming that the fit had been >>> successful, but it mainly returned the initial guess as the fitting >>> result). So I switched to optimize.fmin (having read that it only uses >>> the function values); this, however, does not converge and simply exits >>> after the maximum number of iterations have been performed. >>> >> Hi Donata, >> >> Because your model varies from run to run, you may not be able to reach >> the default tolerances necessary for successful termination of leastsq. >> If you look at the documentation for leastsq, you will see several >> tolerance parameters, ftol, xtol, and gtol. Modifying these may help in >> your case. >> >> Most (all?) of these optimization routines are doing some kind of >> gradient descent. 
The variability in your model will affect both the >> error estimate and the search direction. Because you'll be calculating >> the Jacobian matrix (gradients) numerically, you're almost certainly >> want to modify leastsq's epsfcn parameter. Using the default value, it >> may be that the variability in your model will be larger than the >> difference due to the delta x used. In that case, your search direction >> could be essentially random. >> >> After writing this, I'm thinking that fmin would be a better fit, as it >> doesn't have the numerical gradient calculation and associated problems. >> fmin has the same xtol and ftol arguments as leastsq that might be useful. >> >> David >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cournape at gmail.com Thu Sep 22 14:04:40 2011 From: cournape at gmail.com (David Cournapeau) Date: Thu, 22 Sep 2011 14:04:40 -0400 Subject: [SciPy-User] catching warnings without error/raise In-Reply-To: References: Message-ID: On Thu, Sep 22, 2011 at 12:16 PM, wrote: > A python question, but hopefully someone can answer > > I would like wrap functions that sometimes issue warnings, I want to > suppress the warning but I want to record whether a warning has been > issued, for later display. > > python 2.6 has "with warnings.catch_warnings(record=True) as w:" that > seems to do what I want (from reading the description). The with statement is "just" syntax sugar, so if you read the sources for the corresponding context manager, you should be able to reproduce the code in a 2.5-compatible way. cheers, David From josef.pktd at gmail.com Thu Sep 22 14:49:07 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 22 Sep 2011 14:49:07 -0400 Subject: [SciPy-User] catching warnings without error/raise In-Reply-To: References: Message-ID: On Thu, Sep 22, 2011 at 2:04 PM, David Cournapeau wrote: > On Thu, Sep 22, 2011 at 12:16 PM, ? wrote: >> A python question, but hopefully someone can answer >> >> I would like wrap functions that sometimes issue warnings, I want to >> suppress the warning but I want to record whether a warning has been >> issued, for later display. >> >> python 2.6 has "with warnings.catch_warnings(record=True) as w:" that >> seems to do what I want (from reading the description). > > The with statement is "just" syntax sugar, so if you read the sources > for the corresponding context manager, you should be able to reproduce > the code in a 2.5-compatible way. I forgot we can sometimes look at the python source import warnings cache_showwarning = warnings.showwarning log = [] ###python 2.6 ##def showwarning(*args, **kwargs): ## log.append(warnings.WarningMessage(*args, **kwargs)) #python 2.5 : no warnings.WarningMessage def showwarning(*args, **kwargs): log.append((args, kwargs)) warnings.showwarning = showwarning import numpy as np from scipy import stats warnings.simplefilter("always") #stats doesn't always warn ? 
stats.ansari(np.arange(5), np.arange(5)**2) stats.kurtosistest(np.arange(5)) warnings.showwarning = cache_showwarning print log --------- After looking at the source, I see in the documentation for showwarning for both 2.5 and 2.6 "You may replace this function with an alternative implementation by assigning to warnings.showwarning" which sounded too "monkey" for me to pay attention. Thanks, Josef > > cheers, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From johann.cohentanugi at gmail.com Thu Sep 22 17:14:21 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Thu, 22 Sep 2011 23:14:21 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: <4E719695.1080503@gmail.com> References: <4E719695.1080503@gmail.com> Message-ID: <4E7BA52D.3090906@gmail.com> Hello, I made some progress on cython, but I would like to clone test_lambertw.py in order to create unittests for my polylog code. The problem is that I do not manage to run it directly : In [1]: from scipy.special._testutils import FuncData --------------------------------------------------------------------------- ImportError Traceback (most recent call last) /home/cohen/sources/python/scipydev/ in () ----> 1 from scipy.special._testutils import FuncData ImportError: No module named _testutils In [2]: from scipy.special import lambertw In [4]: run ../pyvault/scipy-git/scipy/special/tests/test_lambertw.py --------------------------------------------------------------------------- ImportError Traceback (most recent call last) /home/cohen/sources/python/pyvault/scipy-git/scipy/special/tests/test_lambertw.py in () 12 from numpy import nan, inf, pi, e, isnan, log, r_, array, complex_ 13 ---> 14 from scipy.special._testutils import FuncData 15 16 ImportError: No module named _testutils I have a master checkout of scipy from the git repository. What is the correct way to run this? thanks in advance, Johann On 09/15/2011 08:09 AM, Johann Cohen-Tanugi wrote: > hi there, any chance for a polylog implementation in scipy.special? I > know it is there in mpmath, but I thought I would ask anyway. > best, > Johann > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From canavanin at yahoo.se Thu Sep 22 18:03:46 2011 From: canavanin at yahoo.se (canavanin at yahoo.se) Date: Fri, 23 Sep 2011 00:03:46 +0200 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? In-Reply-To: References: <4E79F8EF.4010400@yahoo.se> <4E7A38E1.3050508@cs.wisc.edu> <4E7B593A.1020605@yahoo.se> Message-ID: Hi Josef I think the seed is the same in all cases. Each random number is obtained using random.random(), but I never use random.seed(). So you think me being unable to get a satisfactory result when using optimize.leastsq does indeed point to me having incorrectly set up the optimization itself? I'll do some more checking tomorrow (although the dimensions should be fine (in case by that you meant the ranks of the arrays in your first reply), I checked those)! Thanks very much for your time! /Donata 22 sep 2011 kl. 18:21 skrev josef.pktd at gmail.com: > On Thu, Sep 22, 2011 at 11:50 AM, D K wrote: >> Dear David >> >> thank you very much for your reply. I have been playing around with >> fmin's ftol and xtol arguments, as you suggested. 
It's looking promising >> so far, but then my initial guesses have been rather close to the true >> values of my test set. I will keep testing, and maybe write to the >> mailing list again at some point. Thanks again! > > If the problem is the MonteCarlo noise, then the question is whether > you keep a fixed random.seed during the calculations? > > If you have a fixed seed, then you have the same MonteCarlo noise in > all calculations and it shouldn't affect the derivative calculations > or the calculations for different parameters. > > Josef > >> >> /Donata >> >> >> PS: Also thanks very much to Josef, who also replied to my email. I will >> keep trying a bit with fmin and its parameters at first, and answer your >> questions in case I still don't get anywhere this way. I hope this >> approach is ok... >> >> >> On 09/21/2011 09:20 PM, J. David Lee wrote: >>> On 09/21/2011 09:47 AM, D K wrote: >>>> Hi everyone >>>> >>>> I would like to fit data obtained from a Monte Carlo simulation to >>>> experimental data, in order to extract two parameters from the >>>> experiments. There are several issues with this: >>>> >>>> a) There is a small element of randomness to each simulated data point; >>>> we don't actually have a function describing the curve (the overall >>>> curve shape is reproducible though). >>>> b) I have never performed curve fitting before, and I haven't got a clue >>>> how to even go about looking for the required information. >>>> b) I don't have a strong maths background. >>>> >>>> I tried using optimize.leastsq, but I learnt that, apparently, I ought >>>> to know the function describing my data to be able to use this (I kept >>>> researching, as it exited with code 2, claiming that the fit had been >>>> successful, but it mainly returned the initial guess as the fitting >>>> result). So I switched to optimize.fmin (having read that it only uses >>>> the function values); this, however, does not converge and simply exits >>>> after the maximum number of iterations have been performed. >>>> >>> Hi Donata, >>> >>> Because your model varies from run to run, you may not be able to reach >>> the default tolerances necessary for successful termination of leastsq. >>> If you look at the documentation for leastsq, you will see several >>> tolerance parameters, ftol, xtol, and gtol. Modifying these may help in >>> your case. >>> >>> Most (all?) of these optimization routines are doing some kind of >>> gradient descent. The variability in your model will affect both the >>> error estimate and the search direction. Because you'll be calculating >>> the Jacobian matrix (gradients) numerically, you're almost certainly >>> want to modify leastsq's epsfcn parameter. Using the default value, it >>> may be that the variability in your model will be larger than the >>> difference due to the delta x used. In that case, your search direction >>> could be essentially random. >>> >>> After writing this, I'm thinking that fmin would be a better fit, as it >>> doesn't have the numerical gradient calculation and associated problems. >>> fmin has the same xtol and ftol arguments as leastsq that might be useful. 
>>> >>> David >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From robert.kern at gmail.com Thu Sep 22 18:07:07 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 22 Sep 2011 17:07:07 -0500 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? In-Reply-To: References: <4E79F8EF.4010400@yahoo.se> <4E7A38E1.3050508@cs.wisc.edu> <4E7B593A.1020605@yahoo.se> Message-ID: On Thu, Sep 22, 2011 at 17:03, canavanin at yahoo.se wrote: > Hi Josef > > I think the seed is the same in all cases. Each random number is obtained using random.random(), but I never use random.seed(). Then what makes you think the seed is the same in all cases? The PRNG is initialized with a different seed each time numpy is imported for the first time if you do not explicitly seed it yourself. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From hturesson at gmail.com Thu Sep 22 21:59:59 2011 From: hturesson at gmail.com (Hjalmar Turesson) Date: Thu, 22 Sep 2011 21:59:59 -0400 Subject: [SciPy-User] fast spline interpolation of multiple equal length waveforms Message-ID: Hi, I got a data set with hundreds of thousands for 40 point long waveforms. I want to use cubic splines to interpolate these at intermediate time points. However, the points are different all waveforms, only the number of points is the same. In other words, I want to interpolate a large number of equally short waveforms, each to its own grid of x-values/time points, and I want to do this as FAST as possible. Are there any functions that can take a whole array for waveforms and a size matched array of new x-values, and interpolate each waveform at a matched row (or column) of x-values? What I've found, this far, appear to require a loop to one by one go through the waveforms and corresponding grid of x-values. I fear that a long loop will be significantly slower than a direct evaluation of the entire array. Thanks, Hjalmar -------------- next part -------------- An HTML attachment was scrubbed... URL: From member at linkedin.com Thu Sep 22 22:36:37 2011 From: member at linkedin.com (William Ratcliff via LinkedIn) Date: Fri, 23 Sep 2011 02:36:37 +0000 (UTC) Subject: [SciPy-User] Invitation to connect on LinkedIn Message-ID: <1029249154.87247.1316745397307.JavaMail.app@ela4-bed77.prod> LinkedIn ------------ William Ratcliff requested to add you as a connection on LinkedIn: ------------------------------------------ Jose, I'd like to add you to my professional network on LinkedIn. 
- William Accept invitation from William Ratcliff http://www.linkedin.com/e/-3wy1w2-gswk7n07-4e/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I247094844_20/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYMcBYQd3wQej0Td399bTFJcQVfjT9vbPsOdzAQcz4OcPgLrCBxbOYWrSlI/EML_comm_afe/?hs=false&tok=37WI-FS2-sTkU1 View invitation from William Ratcliff http://www.linkedin.com/e/-3wy1w2-gswk7n07-4e/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I247094844_20/30OnPgQe3gVc3sQcAALqnpPbOYWrSlI/svi/?hs=false&tok=3HE1LwPqWsTkU1 ------------------------------------------ DID YOU KNOW your LinkedIn profile helps you control your public image when people search for you? Setting your profile as public means your LinkedIn profile will come up when people enter your name in leading search engines. Take control of your image! http://www.linkedin.com/e/-3wy1w2-gswk7n07-4e/ewp/inv-22/?hs=false&tok=2WoPuEzfSsTkU1 -- (c) 2011, LinkedIn Corporation -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrej.kobler at gozdis.si Fri Sep 23 01:40:10 2011 From: andrej.kobler at gozdis.si (Andrej Kobler) Date: Fri, 23 Sep 2011 07:40:10 +0200 Subject: [SciPy-User] 2D NN search in a 3D point cloud In-Reply-To: <4E7B5DF6.1050002@gmx-topmail.de> References: <24058E20A2354564B6A26E7C2A5957AC@gozdis.si> <4E7B5DF6.1050002@gmx-topmail.de> Message-ID: <68AAC62523CF4B66A92B1EA4ECAB8C68@gozdis.si> Bastian, I need the 3rd coordinate value after selecting the NN in 2D. You suggestion could be OK, if you know of a way to conserve the links between 2D points and their coordinate values in the 3rd dimension. Andrej -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Bastian Weber Sent: 22. september 2011 18:11 To: SciPy Users List Subject: Re: [SciPy-User] 2D NN search in a 3D point cloud On 09/22/2011 02:13 PM, Andrej Kobler wrote: > Hi, > > in a cloud of XYZ points I'd like to search for nearest neighbors in XY > plane using scipy.spatial.KDTree/cKDTree. The problem is that > KDTree.query takes into account all the given dimensions, not just the > desired subset of dimensions. Is there a simple way to have data with 3 > or more dimensions and use only 2 for NN search? Probably I dont have enough insight but I simply would try to neglect the last coordinate of the data. Assuming you have points_xyz.shape == (N,3) where N is the number of points then you could simply do: points_xy = points_xyz[:,:2]. Applying the NN search on this data should give you the desired result. Best regards, Bastian. _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Fri Sep 23 02:32:17 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 23 Sep 2011 06:32:17 +0000 (UTC) Subject: [SciPy-User] 2D NN search in a 3D point cloud References: <24058E20A2354564B6A26E7C2A5957AC@gozdis.si> <4E7B5DF6.1050002@gmx-topmail.de> <68AAC62523CF4B66A92B1EA4ECAB8C68@gozdis.si> Message-ID: Fri, 23 Sep 2011 07:40:10 +0200, Andrej Kobler wrote: > I need the 3rd coordinate value after selecting the NN in 2D. You > suggestion could be OK, if you know of a way to conserve the links > between 2D points and their coordinate values in the 3rd dimension. KDTree.query returns the indices of the points, so the link is preserved. 
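For reference, a small sketch of exactly this approach -- build the tree on the XY columns only and use the indices returned by query() to recover the Z values. The random cloud and the query points are just placeholders:

import numpy as np
from scipy.spatial import cKDTree

points_xyz = np.random.rand(1000, 3)       # placeholder for the real cloud
tree = cKDTree(points_xyz[:, :2])          # index only the XY coordinates

query_xy = np.array([[0.5, 0.5],
                     [0.1, 0.9]])
dist, idx = tree.query(query_xy, k=1)      # nearest neighbours in the XY plane

nn_xyz = points_xyz[idx]                   # full XYZ rows of those neighbours
print(nn_xyz[:, 2])                        # ... and in particular their Z values
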
-- Pauli Virtanen From almar.klein at gmail.com Fri Sep 23 03:21:50 2011 From: almar.klein at gmail.com (Almar Klein) Date: Fri, 23 Sep 2011 09:21:50 +0200 Subject: [SciPy-User] General purpose 2D,3D plotting library In-Reply-To: <1898267.rCXBrvlSyj@killbill> References: <1898267.rCXBrvlSyj@killbill> Message-ID: > what is your favorite general purpose plotting library for 2D and 3D > visualization. It the moment I am using matplotlib. But knowing > alternatives, > if any, is a good thing, I think. > Visvis is good at 2D and 3D visualization, and has basic support for plotting: http://code.google.com/p/visvis/ Almar -------------- next part -------------- An HTML attachment was scrubbed... URL: From canavanin at yahoo.se Fri Sep 23 03:29:07 2011 From: canavanin at yahoo.se (D K) Date: Fri, 23 Sep 2011 09:29:07 +0200 Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? In-Reply-To: References: <4E79F8EF.4010400@yahoo.se> <4E7A38E1.3050508@cs.wisc.edu> <4E7B593A.1020605@yahoo.se> Message-ID: <4E7C3543.7040307@yahoo.se> On 09/23/2011 12:07 AM, Robert Kern wrote: > On Thu, Sep 22, 2011 at 17:03, canavanin at yahoo.se wrote: >> Hi Josef >> >> I think the seed is the same in all cases. Each random number is obtained using random.random(), but I never use random.seed(). > Then what makes you think the seed is the same in all cases? The PRNG > is initialized with a different seed each time numpy is imported for > the first time if you do not explicitly seed it yourself. > What you have just written is exactly what made me think there would be the same seed - not between different calls to my script, of course, but I don't use data that was produced by different calls to the script. Currently simulation and fitting are done by the same script, so I assumed that the PRNG would be initialized through 'import random', and that would be that. Or have I been mistaken/misunderstood the point you were trying to make? /Donata From pav at iki.fi Fri Sep 23 03:52:44 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 23 Sep 2011 07:52:44 +0000 (UTC) Subject: [SciPy-User] How to fit data obtained from a Monte Carlo simulation? References: <4E79F8EF.4010400@yahoo.se> <4E7A38E1.3050508@cs.wisc.edu> <4E7B593A.1020605@yahoo.se> <4E7C3543.7040307@yahoo.se> Message-ID: Fri, 23 Sep 2011 09:29:07 +0200, D K wrote: [clip] > What you have just written is exactly what made me think there would be > the same seed - not between different calls to my script, of course, but > I don't use data that was produced by different calls to the script. > Currently simulation and fitting are done by the same script, so I > assumed that the PRNG would be initialized through 'import random', and > that would be that. Or have I been mistaken/misunderstood the point you > were trying to make? If you are doing the fitting correctly, you will end up having to run the MC simulation several times, for different parameters. If you don't set the seed manually, the random numbers will be different in each of your MC runs. -- Pauli Virtanen From dpinte at enthought.com Fri Sep 23 05:36:32 2011 From: dpinte at enthought.com (Didrik Pinte) Date: Fri, 23 Sep 2011 11:36:32 +0200 Subject: [SciPy-User] General purpose 2D,3D plotting library In-Reply-To: <1898267.rCXBrvlSyj@killbill> References: <1898267.rCXBrvlSyj@killbill> Message-ID: On Wed, Sep 21, 2011 at 10:16 AM, Wolfgang Mader wrote: > Dear all, > > what is your favorite general purpose plotting library for 2D and 3D > visualization. It the moment I am using matplotlib. 
But knowing alternatives, > if any, is a good thing, I think. Matplotlib is great for static plots and generating images. If you need interactivity, I would suggest you look at Chaco (http://github.enthought.com/chaco/quickstart.html) For 3d, Mayavi and tvtk could be very useful (http://github.enthought.com/mayavi/mayavi/) -- Didrik From jeremy at jeremysanders.net Fri Sep 23 06:42:21 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Fri, 23 Sep 2011 11:42:21 +0100 Subject: [SciPy-User] General purpose 2D,3D plotting library References: <1898267.rCXBrvlSyj@killbill> Message-ID: Wolfgang Mader wrote: > Dear all, > > what is your favorite general purpose plotting library for 2D and 3D > visualization. It the moment I am using matplotlib. But knowing > alternatives, if any, is a good thing, I think. If you want the ability to fine tune your plots with a GUI and build up your plot from widgets, Veusz might be useful for you (I am the lead author). It only does 2D plots at the moment. http://home.gna.org/veusz/ Jeremy From nwagner at iam.uni-stuttgart.de Fri Sep 23 07:34:57 2011 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 23 Sep 2011 13:34:57 +0200 Subject: [SciPy-User] General purpose 2D,3D plotting library In-Reply-To: References: <1898267.rCXBrvlSyj@killbill> Message-ID: On Fri, 23 Sep 2011 11:42:21 +0100 Jeremy Sanders wrote: > Wolfgang Mader wrote: > >> Dear all, >> >> what is your favorite general purpose plotting library >>for 2D and 3D >> visualization. It the moment I am using matplotlib. But >>knowing >> alternatives, if any, is a good thing, I think. > > If you want the ability to fine tune your plots with a >GUI and build up your > plot from widgets, Veusz might be useful for you (I am >the lead author). It > only does 2D plots at the moment. > > http://home.gna.org/veusz/ > > Jeremy > > Just curious. Is there a chance that you will release Veusz under BSD licence ? Nils From jeremy at jeremysanders.net Fri Sep 23 09:17:41 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Fri, 23 Sep 2011 14:17:41 +0100 Subject: [SciPy-User] General purpose 2D,3D plotting library References: <1898267.rCXBrvlSyj@killbill> Message-ID: Nils Wagner wrote: > Just curious. > Is there a chance that you will release Veusz under BSD > licence ? I'm more of a fan of copyleft licenses than BSD, but I'd be happy with a LGPL license. I wouldn't like an extended version of the program to be proprietary. There are other copyright holders who would need to agree, however. Also a small amount of code was taken from another GPL project which probably would have to be rewritten. In addition, Veusz is linked against the GPLd PyQt library, making a change of license difficult. However, it could use PySide instead (in theory, though the SIP code would need to be convered). Nevertheless, I would be happy to make the embedding module, used to embed Veusz plots in other Python programs, BSD. The embedding module uses interprocess communication to communicate with Veusz. Veusz runs in its own process in embbeding mode. It would be a natural way to use Veusz to plot from non-GPLd programs. Jeremy From johann.cohentanugi at gmail.com Fri Sep 23 09:21:12 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Fri, 23 Sep 2011 15:21:12 +0200 Subject: [SciPy-User] polylogarithm? 
In-Reply-To: <4E719695.1080503@gmail.com> References: <4E719695.1080503@gmail.com> Message-ID: <4E7C87C8.9080104@gmail.com> hi there, in working on this I need to deal with zeta function, especially for complex numbers. Hopefully I will manage to implement it based on what is in mpmath (kudos the developers of this impressive package!), but I came around a more mundane behavior that maybe can be improved : In [493]: special.zeta(3,1) Out[493]: 1.202056903159594 In [494]: special.zetac(3)+1 Out[494]: 1.2020569031595942 In [495]: special.zetac(-3)+1 Out[495]: 0.0083333333333333037 In [496]: special.zeta(-3,1) Out[496]: nan In [497]: mpmath.zeta(-3) Out[497]: mpf('0.0083333333333333332') in plain words, zetac(z) knows how to eat negative arguments, but not zeta(z,1).... Is there a reason why special.zeta does not default to 1+special.zetac for s=1? This would make the behavior of the 2 functions more identical. best, Johann On 09/15/2011 08:09 AM, Johann Cohen-Tanugi wrote: > hi there, any chance for a polylog implementation in scipy.special? I > know it is there in mpmath, but I thought I would ask anyway. > best, > Johann > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jjstickel at vcn.com Fri Sep 23 09:54:48 2011 From: jjstickel at vcn.com (Jonathan Stickel) Date: Fri, 23 Sep 2011 07:54:48 -0600 Subject: [SciPy-User] fast spline interpolation of multiple equal length waveforms In-Reply-To: References: Message-ID: <4E7C8FA8.6020707@vcn.com> On 9/22/11 20:36 , scipy-user-request at scipy.org wrote: > Date: Thu, 22 Sep 2011 21:59:59 -0400 > From: Hjalmar Turesson > Subject: [SciPy-User] fast spline interpolation of multiple equal > length waveforms > To:scipy-user at scipy.org > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > I got a data set with hundreds of thousands for 40 point long waveforms. I > want to use cubic splines to interpolate these at intermediate time points. > However, the points are different all waveforms, only the number of points > is the same. In other words, I want to interpolate a large number of equally > short waveforms, each to its own grid of x-values/time points, and I want to > do this as FAST as possible. > > Are there any functions that can take a whole array for waveforms and a size > matched array of new x-values, and interpolate each waveform at a matched > row (or column) of x-values? > > What I've found, this far, appear to require a loop to one by one go through > the waveforms and corresponding grid of x-values. I fear that a long loop > will be significantly slower than a direct evaluation of the entire array. > > Thanks, > Hjalmar For each data set (x,y), are the x-values the same and the y-values different? If so, you may find this code useful: http://scipy-central.org/item/21/1/simple-piecewise-polynomial-interpolation It is not splines, but nonetheless provides good quality interpolation and is very fast. For given x and x_interp, it can create an interpolation matrix P. Then y_interp = P*y. If you have all your y-data in Y, then Y_interp = P*Y. HTH, Jonathan From hturesson at gmail.com Fri Sep 23 10:21:08 2011 From: hturesson at gmail.com (Hjalmar Turesson) Date: Fri, 23 Sep 2011 10:21:08 -0400 Subject: [SciPy-User] fast spline interpolation of multiple equal length waveforms In-Reply-To: <4E7C8FA8.6020707@vcn.com> References: <4E7C8FA8.6020707@vcn.com> Message-ID: Thanks for the reply. 
Both x and y values are different, but they have the same length. I'll try your simple piecewise polynomial interpolation over the weekend, and report back when I know how well it works. Thanks, Hjalmar On Fri, Sep 23, 2011 at 9:54 AM, Jonathan Stickel wrote: > On 9/22/11 20:36 , scipy-user-request at scipy.org wrote: > >> Date: Thu, 22 Sep 2011 21:59:59 -0400 >> From: Hjalmar Turesson >> Subject: [SciPy-User] fast spline interpolation of multiple equal >> length waveforms >> To:scipy-user at scipy.org >> Message-ID: >> > gmail.com >> > >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> Hi, >> I got a data set with hundreds of thousands for 40 point long waveforms. I >> want to use cubic splines to interpolate these at intermediate time >> points. >> However, the points are different all waveforms, only the number of points >> is the same. In other words, I want to interpolate a large number of >> equally >> short waveforms, each to its own grid of x-values/time points, and I want >> to >> do this as FAST as possible. >> >> Are there any functions that can take a whole array for waveforms and a >> size >> matched array of new x-values, and interpolate each waveform at a matched >> row (or column) of x-values? >> >> What I've found, this far, appear to require a loop to one by one go >> through >> the waveforms and corresponding grid of x-values. I fear that a long loop >> will be significantly slower than a direct evaluation of the entire array. >> >> Thanks, >> Hjalmar >> > > For each data set (x,y), are the x-values the same and the y-values > different? If so, you may find this code useful: > > http://scipy-central.org/item/**21/1/simple-piecewise-** > polynomial-interpolation > > It is not splines, but nonetheless provides good quality interpolation and > is very fast. For given x and x_interp, it can create an interpolation > matrix P. Then y_interp = P*y. If you have all your y-data in Y, then > Y_interp = P*Y. > > HTH, > Jonathan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Sep 23 12:18:19 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 23 Sep 2011 16:18:19 +0000 (UTC) Subject: [SciPy-User] polylogarithm? References: <4E719695.1080503@gmail.com> <4E7C87C8.9080104@gmail.com> Message-ID: Fri, 23 Sep 2011 15:21:12 +0200, Johann Cohen-Tanugi wrote: [clip] > zetac(z) knows how to eat negative arguments, but not zeta(z,1).... Is > there a reason why special.zeta does not default to 1+special.zetac for > s=1? This would make the behavior of the 2 functions more identical. Probably no reason, except that it wasn't implemented. mpmath is impressive, and in several ways ahead of scipy.special --- or at least in the parts where the problems overlap, as you can do tricks with arbitrary precision that are not really feasible. Note that if you need to call the zeta function from the Cython extension, call the C library directly: cdef extern from "cephes.h": double zeta(double x, double q) double zetac(double x) and link the extension with the "sc_cephes" library. *** There's a formula (look in the mpmath sources ;) for the transform from x < 0 to x > 0 for zeta(x, a) for general a. But that needs polylog. The implementation for zetac(x) for x < 0 seems also a bit incomplete, as it goes only down to -30.8148. It seems this is due to a silly reason, it should use gammaln instead of Gamma to avoid the overflow. 
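As a plain-Python illustration of that transform for the ordinary zeta function (a = 1): the functional equation zeta(s) = 2**s * pi**(s-1) * sin(pi*s/2) * Gamma(1-s) * zeta(1-s) maps s < 0 onto 1 - s > 1, where zetac is well behaved. This is only a sketch, not the cephes implementation, and it uses Gamma rather than gammaln, so it overflows for large negative s for exactly the reason mentioned above:

import numpy as np
from scipy import special

def zeta_reflected(s):
    # Riemann zeta for s < 0 via the reflection formula;
    # zetac(x) = zeta(x) - 1, so zeta(1 - s) = zetac(1 - s) + 1.
    return (2.0**s * np.pi**(s - 1.0) * np.sin(0.5 * np.pi * s)
            * special.gamma(1.0 - s) * (special.zetac(1.0 - s) + 1.0))

print(zeta_reflected(-3.0))   # ~0.0083333..., matching mpmath.zeta(-3)
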
Pauli From johann.cohen-tanugi at lupm.univ-montp2.fr Thu Sep 22 16:51:39 2011 From: johann.cohen-tanugi at lupm.univ-montp2.fr (Johann Cohen-Tanugi) Date: Thu, 22 Sep 2011 22:51:39 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> <4E7746DC.90808@gmail.com> Message-ID: <4E7B9FDB.9080903@lupm.univ-montp2.fr> Hello, I made some progress on cython, but I would like to clone test_lambertw.py in order to create unittests for my polylog code. The problem is that I do not manage to run it directly : In [1]: from scipy.special._testutils import FuncData --------------------------------------------------------------------------- ImportError Traceback (most recent call last) /home/cohen/sources/python/scipydev/ in () ----> 1 from scipy.special._testutils import FuncData ImportError: No module named _testutils In [2]: from scipy.special import lambertw In [4]: run ../pyvault/scipy-git/scipy/special/tests/test_lambertw.py --------------------------------------------------------------------------- ImportError Traceback (most recent call last) /home/cohen/sources/python/pyvault/scipy-git/scipy/special/tests/test_lambertw.py in () 12 from numpy import nan, inf, pi, e, isnan, log, r_, array, complex_ 13 ---> 14 from scipy.special._testutils import FuncData 15 16 ImportError: No module named _testutils I have a master checkout of scipy from the git repository. What is the correct way to run this? thanks in advance, Johann On 09/19/2011 03:55 PM, Pauli Virtanen wrote: > Mon, 19 Sep 2011 15:42:52 +0200, Johann Cohen-Tanugi wrote: > [clip] >> where the first one is the mpmath code without the mpmath ctx objects >> invoked, the second is the cython version, and the third is the mpmath >> native version (0.17). Before I dive into cython management of arrays in >> order for polylog to accept input arrays, I would appreciate feedback on >> where to look for potential further improvements using cython, if any, >> on a simplistic script as this one. > In short: > > - Check the annotated HTML output from "cython -a". > The name of the game is to make things less yellow. > > - Add cdef type for all variables (removes PyObject boxing). > This will be the main source of speed gains. > > - Sprinkle in @cython.cdivision(True) > > - Use "from libc.math cimport abs" > > - Take a look at lambertw.pyx in scipy.special. > It shows you how to make the function an ufunc. > From johann.cohen-tanugi at lupm.univ-montp2.fr Fri Sep 23 12:39:23 2011 From: johann.cohen-tanugi at lupm.univ-montp2.fr (Johann Cohen-Tanugi) Date: Fri, 23 Sep 2011 18:39:23 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> <4E7C87C8.9080104@gmail.com> Message-ID: <4E7CB63B.40402@lupm.univ-montp2.fr> On 09/23/2011 06:18 PM, Pauli Virtanen wrote: > Fri, 23 Sep 2011 15:21:12 +0200, Johann Cohen-Tanugi wrote: > [clip] >> zetac(z) knows how to eat negative arguments, but not zeta(z,1).... Is >> there a reason why special.zeta does not default to 1+special.zetac for >> s=1? This would make the behavior of the 2 functions more identical. > Probably no reason, except that it wasn't implemented. mpmath is > impressive, and in several ways ahead of scipy.special --- or at least in > the parts where the problems overlap, as you can do tricks with arbitrary > precision that are not really feasible. 
> > Note that if you need to call the zeta function from the Cython extension, > call the C library directly: > > cdef extern from "cephes.h": > double zeta(double x, double q) > double zetac(double x) thanks Pauli, is it what is exposed to special.zeta? I guess so, so what you are saying is that by going from cephes/zeta to special.zeta back into C in a pyx code I pay a penalty, correct? I am in the middle of a completely unknown territory, so I need to clarify every step as much as I can. > and link the extension with the "sc_cephes" library. > > *** > > There's a formula (look in the mpmath sources ;) for the transform > from x< 0 to x> 0 for zeta(x, a) for general a. But that needs polylog. > > The implementation for zetac(x) for x< 0 seems also a bit incomplete, > as it goes only down to -30.8148. It seems this is due to a silly reason, > it should use gammaln instead of Gamma to avoid the overflow. > > Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pav at iki.fi Fri Sep 23 16:20:54 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 23 Sep 2011 20:20:54 +0000 (UTC) Subject: [SciPy-User] polylogarithm? References: <4E719695.1080503@gmail.com> <4E7C87C8.9080104@gmail.com> <4E7CB63B.40402@lupm.univ-montp2.fr> Message-ID: On Fri, 23 Sep 2011 18:39:23 +0200, Johann Cohen-Tanugi wrote: [clip] > thanks Pauli, is it what is exposed to special.zeta? I guess so, so what > you are saying is that by going from cephes/zeta to special.zeta back > into C in a pyx code I pay a penalty, correct? I am in the middle of a > completely unknown territory, so I need to clarify every step as much as > I can. Exactly, when you call special.zeta(x), you get overhead associated with python floats, numpy array scalars, and ufunc machinery. With a call to zeta from "cephes.h", it's a direct call in C. Note that the fact that "special.zeta" and the function from cephes are quite different things (the fact that they have the same name is just a "coincidence"). Pauli From fdu.xiaojf at gmail.com Sat Sep 24 00:24:28 2011 From: fdu.xiaojf at gmail.com (fdu.xiaojf at gmail.com) Date: Sat, 24 Sep 2011 12:24:28 +0800 Subject: [SciPy-User] How to plot heatmap in matplotlib? Message-ID: <4E7D5B7C.1080906@gmail.com> Dear all, Heatmap (like those on the page http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/r/heatmap/) is a frequently used type of image in microarray data analysis. However, it seems there are no convenient functions in matplotlib to plot heatmap (please correct me if I was wrong), so I'm planning to write my own. Let me take the heatmap by the link http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/r/heatmap/scaled_color_key.png as an example, which is produced by R. With my limited knowledge and expertise of matplotlib, I have the following questions and I hope you guys could help me. 1) I tend to use pcolor to draw the colormap in the central area. However, I've seen a lot of examples draw colormap with imshow. What's the difference between pcolor and imshow? Shall I use pcolor or imshow to produce the heatmap in the link above? 2) How to draw the dendrograms on the top and left of the colormap? I got hints from http://matplotlib.sourceforge.net/examples/axes_grid/scatter_hist.html on how to append axes to current plot, but I still have now idea how to draw the dengrograms. 
3) How to draw the column side colormap (the smaller one) between the top dendrogram and the big colormap? 4) I can use colorbar to draw a colorbar, but how to place the colorbar on the topleft of the image just as the R heatmap does? 5) Any other suggestions on how to draw the heatmap? Thanks and any help will be greatly appreciated. Regards, Jianfeng -------------- next part -------------- An HTML attachment was scrubbed... URL: From anders.harrysson at fem4fun.com Sat Sep 24 03:33:53 2011 From: anders.harrysson at fem4fun.com (Anders Harrysson) Date: Sat, 24 Sep 2011 09:33:53 +0200 Subject: [SciPy-User] FFT Filter Message-ID: <4E7D87E1.409@fem4fun.com> Dear all, I am kind of new to scipy and also new to the signal processing field that this question relates to. I am trying to do a bandpass FFT filter using python. The filter shape is symmetric around 11 Hz and is defined by the parameters ff and Hz below. x=loadtxt('file') sr = 250 # [samples/s] nf = sr/2.0 # Nyquist frequence Ns = len(tr[:,0]) # Total number of samples N=float(8192) # Fourier settings # Fourier transform X1 = fft(x,n=int(N)) X1 = fftshift(X1) F1 = arange(-N/2.0,N/2.0)/N*sr # Filter ff=[0,1,1,0] Hz = [9.5, 10, 12, 12.5] k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff) X1_f=X1*k1 X1_f=ifftshift(X1_f) x1_f=ifft(X1_f,n=int(N)) My question is now: Are ther built in functionallity for filtering in scioy and, if so, how would a similar filter looks like. Regards, Anders Harrysson From johann.cohentanugi at gmail.com Sat Sep 24 18:18:21 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Sun, 25 Sep 2011 00:18:21 +0200 Subject: [SciPy-User] polylogarithm? In-Reply-To: References: <4E719695.1080503@gmail.com> <4E7C87C8.9080104@gmail.com> Message-ID: <4E7E572D.5010909@gmail.com> For the record, I created a first pull request, with limited scope, in order to make the C call to cephes described below possible. Further details in https://github.com/scipy/scipy/pull/84 I will switch to the dev list from now on. best, Johann On 09/23/2011 06:18 PM, Pauli Virtanen wrote: > Fri, 23 Sep 2011 15:21:12 +0200, Johann Cohen-Tanugi wrote: > [clip] >> zetac(z) knows how to eat negative arguments, but not zeta(z,1).... Is >> there a reason why special.zeta does not default to 1+special.zetac for >> s=1? This would make the behavior of the 2 functions more identical. > Probably no reason, except that it wasn't implemented. mpmath is > impressive, and in several ways ahead of scipy.special --- or at least in > the parts where the problems overlap, as you can do tricks with arbitrary > precision that are not really feasible. > > Note that if you need to call the zeta function from the Cython extension, > call the C library directly: > > cdef extern from "cephes.h": > double zeta(double x, double q) > double zetac(double x) > > and link the extension with the "sc_cephes" library. > > *** > > There's a formula (look in the mpmath sources ;) for the transform > from x< 0 to x> 0 for zeta(x, a) for general a. But that needs polylog. > > The implementation for zetac(x) for x< 0 seems also a bit incomplete, > as it goes only down to -30.8148. It seems this is due to a silly reason, > it should use gammaln instead of Gamma to avoid the overflow. 
> > Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From wesmckinn at gmail.com Sun Sep 25 21:36:21 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 25 Sep 2011 21:36:21 -0400 Subject: [SciPy-User] ANN: pandas 0.4.1 release Message-ID: I'm happy to announce the 0.4.1 pandas release, largely a bugfix release with a handful of new features and speed optimizations. See below or on GitHub for the full release notes. Binary installers are on PyPI. Thanks to all for bug reports and pull requests. best, Wes Links ===== Release Notes: https://github.com/wesm/pandas/blob/master/RELEASE.rst Documentation: http://pandas.sourceforge.net Installers: http://pypi.python.org/pypi/pandas Code Repository: http://github.com/wesm/pandas Mailing List: http://groups.google.com/group/pystatsmodels Blog: http://blog.wesmckinney.com ========================== pandas 0.4.1 Release Notes ========================== **Release date:** 9/25/2011 This is primarily a bug fix release but includes some new features and improvements **New features / modules** - Added new `DataFrame` methods `get_dtype_counts` and property `dtypes` - Setting of values using ``.ix`` indexing attribute in mixed-type DataFrame objects has been implemented (fixes GH #135) - `read_csv` can read multiple columns into a `MultiIndex`. DataFrame's `to_csv` method will properly write out a `MultiIndex` which can be read back (PR #151, thanks to Skipper Seabold) - Wrote fast time series merging / joining methods in Cython. Will be integrated later into DataFrame.join and related functions - Added `ignore_index` option to `DataFrame.append` for combining unindexed records stored in a DataFrame **Improvements to existing features** - Some speed enhancements with internal Index type-checking function - `DataFrame.rename` has a new `copy` parameter which can rename a DataFrame in place - Enable unstacking by level name (PR #142) - Enable sortlevel to work by level name (PR #141) - `read_csv` can automatically "sniff" other kinds of delimiters using `csv.Sniffer` (PR #146) - Improved speed of unit test suite by about 40% - Exception will not be raised calling `HDFStore.remove` on non-existent node with where clause - Optimized `_ensure_index` function resulting in performance savings in type-checking Index objects **Bug fixes** - Fixed DataFrame constructor bug causing downstream problems (e.g. .copy() failing) when passing a Series as the values along with a column name and index - Fixed single-key groupby on DataFrame with as_index=False (GH #160) - `Series.shift` was failing on integer Series (GH #154) - `unstack` methods were producing incorrect output in the case of duplicate hierarchical labels. 
An exception will now be raised (GH #147) - Calling `count` with level argument caused reduceat failure or segfault in earlier NumPy (GH #169) - Fixed `DataFrame.corrwith` to automatically exclude non-numeric data (GH #144) - Unicode handling bug fixes in `DataFrame.to_string` (GH #138) - Excluding OLS degenerate unit test case that was causing platform specific failure (GH #149) - Skip blosc-dependent unit tests for PyTables < 2.2 (PR #137) - Calling `copy` on `DateRange` did not copy over attributes to the new object (GH #168) - Fix bug in `HDFStore` in which Panel data could be appended to a Table with different item order, thus resulting in an incorrect result read back From jeremy at jeremysanders.net Mon Sep 26 04:29:26 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Mon, 26 Sep 2011 09:29:26 +0100 Subject: [SciPy-User] ANN: pandas 0.4.1 release References: Message-ID: Wes McKinney wrote: > I'm happy to announce the 0.4.1 pandas release, largely a bugfix > release with a handful of new features and speed optimizations. See > below or on GitHub for the full release notes. Binary installers are > on PyPI. Thanks to all for bug reports and pull requests. It might be helpful to say what pandas is in future announcements! https://github.com/wesm/pandas#readme """ pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. """ Jeremy From pratik.mallya at gmail.com Sun Sep 25 12:15:15 2011 From: pratik.mallya at gmail.com (pratik) Date: Sun, 25 Sep 2011 11:15:15 -0500 Subject: [SciPy-User] bug in svd ? Message-ID: <4E7F5393.7060501@gmail.com> Hi Scipy users, I was using the scikits.learn package for pca analysis, but couldn't get the desired principal components (which were apparant from the structure of the data). So i just took the svd of the covariance matrix of the input data (after making the mean 0, of course). But on examining the singular values, i found that they are *NOT sorted in decreasing order* (as they should be; although i can of course sort it myself now, the scikits.learn package depends upon this fact) This is also mentioned in the code; that the singular values should be sorted: Returns ------- u : ndarray Unitary matrix. The shape of `u` is (`M`, `M`) or (`M`, `K`) depending on value of ``full_matrices``. s : ndarray The singular values, sorted so that ``s[i] >= s[i+1]``. `s` is a 1-d array of length min(`M`, `N`). v : ndarray Unitary matrix of shape (`N`, `N`) or (`K`, `N`), depending on ``full_matrices``. I am attaching the code and the data for you to examine. Just print out the values of the s array in ipython to see what i mean... Best, -- Pratik Mallya https://netfiles.uiuc.edu/mallya2/www/index.html -------------- next part -------------- A non-text attachment was scrubbed... Name: p3pca.py Type: text/x-python Size: 476 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vl1.wav Type: audio/x-wav Size: 177990 bytes Desc: not available URL: From pratik.mallya at gmail.com Sun Sep 25 12:56:48 2011 From: pratik.mallya at gmail.com (pratik) Date: Sun, 25 Sep 2011 11:56:48 -0500 Subject: [SciPy-User] bug in svd ? In-Reply-To: <4E7F5393.7060501@gmail.com> References: <4E7F5393.7060501@gmail.com> Message-ID: <4E7F5D50.4040801@gmail.com> Please ignore this mail...the implementation is working correctly, it was a silly error on my part. On Sunday 25 September 2011 11:15 AM, pratik wrote: > Hi Scipy users, > I was using the scikits.learn package for pca analysis, but couldn't get > the desired principal components (which were apparant from the structure > of the data). So i just took the svd of the covariance matrix of the > input data (after making the mean 0, of course). But on examining the > singular values, i found that they are *NOT sorted in decreasing order* > (as they should be; although i can of course sort it myself now, the > scikits.learn package depends upon this fact) This is also mentioned in > the code; that the singular values should be sorted: > > Returns > ------- > u : ndarray > Unitary matrix. The shape of `u` is (`M`, `M`) or (`M`, `K`) > depending on value of ``full_matrices``. > s : ndarray > The singular values, sorted so that ``s[i] >= s[i+1]``. `s` is > a 1-d array of length min(`M`, `N`). > v : ndarray > Unitary matrix of shape (`N`, `N`) or (`K`, `N`), depending on > ``full_matrices``. > > I am attaching the code and the data for you to examine. Just print out > the values of the s array in ipython to see what i mean... > > Best, > -- Pratik Mallya https://netfiles.uiuc.edu/mallya2/www/index.html From pav at iki.fi Mon Sep 26 09:09:35 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 26 Sep 2011 15:09:35 +0200 Subject: [SciPy-User] bug in svd ? In-Reply-To: <4E7F5393.7060501@gmail.com> References: <4E7F5393.7060501@gmail.com> Message-ID: 25.09.2011 18:15, pratik kirjoitti: > I was using the scikits.learn package for pca analysis, but couldn't get > the desired principal components (which were apparant from the structure > of the data). So i just took the svd of the covariance matrix of the > input data (after making the mean 0, of course). But on examining the > singular values, i found that they are *NOT sorted in decreasing order* Works for me: assert (numpy.diff(s) <= 0).all() What does your `numpy.show_config()` say? -- Pauli Virtanen From Wolfgang.Mader at fdm.uni-freiburg.de Mon Sep 26 10:14:50 2011 From: Wolfgang.Mader at fdm.uni-freiburg.de (Wolfgang Mader) Date: Mon, 26 Sep 2011 16:14:50 +0200 Subject: [SciPy-User] X Error: BadWindow, matplotlib, numpy Message-ID: <3310937.PBm2ieMETc@killbill> Hallo, sometimes when I close a matplotlib window by clicking the [x] in the window decoration I get this in my ipython prompt X Error: BadWindow (invalid Window parameter) 3 Major opcode: 20 (X_GetProperty) Resource id: 0x6e00db6 I am using python 2.7.2, matplotlib 1.0.1-2, and numpy 1.6.1-1. What causes this? By the way. Is there matplotlib for python3 or any plans for this? Thank you, Wolfgang From scipy at samueljohn.de Mon Sep 26 11:16:18 2011 From: scipy at samueljohn.de (Samuel John) Date: Mon, 26 Sep 2011 17:16:18 +0200 Subject: [SciPy-User] X Error: BadWindow, matplotlib, numpy In-Reply-To: <3310937.PBm2ieMETc@killbill> References: <3310937.PBm2ieMETc@killbill> Message-ID: Hi Wolfgang, matplotlib for python3 is not yet ready. 
There has been a successful codesprint: http://pythonsprints.com/2011/04/8/matplotlib-python-3-thanks-cape-town-group/ I am not sure what the current status is. Perhaps the matplotlib mailing list can answer better: http://sourceforge.net/mail/?group_id=80706 Concerning the "X Error: BadWindow" - I cannot help (could not reproduce it) At least you should provide some information about operating system and the windowing backend you are using. I guess, the matplotlib mailing list can be of better help here, too. Bests, Samuel Samuel -- Dipl.-Inform. Samuel John - - - - - - - - - - - - - - - - - - - - - - - - - PhD student, CoR-Lab(.de) and Neuroinformatics Group, Faculty of Technology, D33594 Bielefeld in cooperation with the HONDA Research Institute Europe GmbH mail at samueljohn.de sjohn at cor-lab.uni-bielefeld.de IM: samueljohn at jabber.org - - - - - - - - - - - - - - - - - - - - - - - - - From Wolfgang.Mader at fdm.uni-freiburg.de Mon Sep 26 11:24:12 2011 From: Wolfgang.Mader at fdm.uni-freiburg.de (Wolfgang Mader) Date: Mon, 26 Sep 2011 17:24:12 +0200 Subject: [SciPy-User] X Error: BadWindow, matplotlib, numpy In-Reply-To: References: <3310937.PBm2ieMETc@killbill> Message-ID: <7749131.KktROz1059@killbill> > Concerning the "X Error: BadWindow" - I cannot help (could not reproduce it) > At least you should provide some information about operating system and the > windowing backend you are using. I guess, the matplotlib mailing list can > be of better help here, too. Thank you for your answer and for pointing me to the matploblib mailinglist. Just to make my first mail more complete. The problem is on a linux box (Arch Linux, kernel 3.0, xorg 1.10.4-1, KDE, kwin window manager). Bests, Wolfgang From guziy.sasha at gmail.com Mon Sep 26 11:44:51 2011 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Mon, 26 Sep 2011 11:44:51 -0400 Subject: [SciPy-User] X Error: BadWindow, matplotlib, numpy In-Reply-To: <7749131.KktROz1059@killbill> References: <3310937.PBm2ieMETc@killbill> <7749131.KktROz1059@killbill> Message-ID: Hi, I found this link concerning the error http://www.linuxquestions.org/questions/linux-desktop-74/how-can-i-fix-error-badwindow-invalid-window-parameter-575745/ try it on gnome, maybe it is kde's problem. -- Oleksandr Huziy 2011/9/26 Wolfgang Mader > > Concerning the "X Error: BadWindow" - I cannot help (could not reproduce > it) > > At least you should provide some information about operating system and > the > > windowing backend you are using. I guess, the matplotlib mailing list can > > be of better help here, too. > > Thank you for your answer and for pointing me to the matploblib > mailinglist. > > Just to make my first mail more complete. The problem is on a linux box > (Arch > Linux, kernel 3.0, xorg 1.10.4-1, KDE, kwin window manager). > > Bests, Wolfgang > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scipy at samueljohn.de Mon Sep 26 12:04:51 2011 From: scipy at samueljohn.de (Samuel John) Date: Mon, 26 Sep 2011 18:04:51 +0200 Subject: [SciPy-User] X Error: BadWindow, matplotlib, numpy In-Reply-To: References: <3310937.PBm2ieMETc@killbill> <7749131.KktROz1059@killbill> Message-ID: On 26.09.2011, at 17:44, Oleksandr Huziy wrote: > Hi, > > I found this link concerning the error > http://www.linuxquestions.org/questions/linux-desktop-74/how-can-i-fix-error-badwindow-invalid-window-parameter-575745/ > > try it on gnome, maybe it is kde's problem. So, what this means is basically: ```` import matplotlib matplotlib.use('GTK') # or perhaps 'GTKAgg'. Your default probably is 'TkAgg' from matplotlib import pyplot pyplot.plot([1,2,1]) # whatever pyplot.show() #now close the window ```` -- Samuel From christophe.grimault at novagrid.com Mon Sep 26 12:32:17 2011 From: christophe.grimault at novagrid.com (Christophe Grimault) Date: Mon, 26 Sep 2011 18:32:17 +0200 Subject: [SciPy-User] FFT Filter In-Reply-To: <4E7D87E1.409@fem4fun.com> References: <4E7D87E1.409@fem4fun.com> Message-ID: <1317054737.2481.7.camel@pandora> Hi Anders, Use : scipy.signals.lfilter(b, a, x) Where x is your signal (complex or real, it doesn't matter). As a filter in the Fourier domain is basically FIR filter, you only need to pass the b array (the response of the filter) and set a = [1.0]. Chris On Sat, 2011-09-24 at 09:33 +0200, Anders Harrysson wrote: > Dear all, > > I am kind of new to scipy and also new to the signal processing field > that this question relates to. > I am trying to do a bandpass FFT filter using python. The filter shape > is symmetric around 11 Hz and is defined by the parameters ff and Hz below. > > x=loadtxt('file') > > sr = 250 # [samples/s] > nf = sr/2.0 # Nyquist frequence > Ns = len(tr[:,0]) # Total number of samples > N=float(8192) # Fourier settings > > # Fourier transform > X1 = fft(x,n=int(N)) > X1 = fftshift(X1) > F1 = arange(-N/2.0,N/2.0)/N*sr > > # Filter > ff=[0,1,1,0] > Hz = [9.5, 10, 12, 12.5] > k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff) > X1_f=X1*k1 > X1_f=ifftshift(X1_f) > x1_f=ifft(X1_f,n=int(N)) > > My question is now: > Are ther built in functionallity for filtering in scioy and, if so, how > would a similar filter looks like. > > Regards, > Anders Harrysson > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Christophe Grimault NovaGrid SAS Les jardins de la Teillais 3, all?e de la grande ?galonne 35740 Pac? France tel : (33)2 23 41 37 97 web : www.novagrid.com From tmp50 at ukr.net Mon Sep 26 17:28:00 2011 From: tmp50 at ukr.net (Dmitrey) Date: Tue, 27 Sep 2011 00:28:00 +0300 Subject: [SciPy-User] =?windows-1251?q?=5BANN=5D_ODE_dy/dt_=3D_f=28t=29_so?= =?windows-1251?q?lver_with_guaranteed_speficiable_accuracy?= Message-ID: <96683.1317072480.9969521652689403904@ffe6.ukr.net> hi all, now free solver interalg from OpenOpt framework (based on interval analysis) can solve ODE dy/dt = f(t) with guaranteed specifiable accuracy. See the ODE webpage for more details, there is an example of comparison with scipy.integrate.odeint, where latter fails to solve a problem. Future plans include solving of some general ODE systems dy/dt = f(y, t). Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lanceboyle at qwest.net Mon Sep 26 23:00:05 2011 From: lanceboyle at qwest.net (Jerry) Date: Mon, 26 Sep 2011 20:00:05 -0700 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: References: Message-ID: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> On Sep 26, 2011, at 1:29 AM, Jeremy Sanders wrote: > Wes McKinney wrote: > >> I'm happy to announce the 0.4.1 pandas release, largely a bugfix >> release with a handful of new features and speed optimizations. See >> below or on GitHub for the full release notes. Binary installers are >> on PyPI. Thanks to all for bug reports and pull requests. > > It might be helpful to say what pandas is in future announcements! > > https://github.com/wesm/pandas#readme > """ > pandas is a Python package providing fast, flexible, and expressive data > structures designed to make working with "relational" or "labeled" data both > easy and intuitive. It aims to be the fundamental high-level building block > for doing practical, real world data analysis in Python. Additionally, it > has the broader goal of becoming the most powerful and flexible open source > data analysis / manipulation tool available in any language. It is already > well on its way toward this goal. > """ > > Jeremy Good advice for _everyone_ software announcements. One of my pet peeves. Jerry From wesmckinn at gmail.com Mon Sep 26 23:21:28 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 26 Sep 2011 23:21:28 -0400 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> Message-ID: On Mon, Sep 26, 2011 at 11:00 PM, Jerry wrote: > > On Sep 26, 2011, at 1:29 AM, Jeremy Sanders wrote: > >> Wes McKinney wrote: >> >>> I'm happy to announce the 0.4.1 pandas release, largely a bugfix >>> release with a handful of new features and speed optimizations. See >>> below or on GitHub for the full release notes. Binary installers are >>> on PyPI. Thanks to all for bug reports and pull requests. >> >> It might be helpful to say what pandas is in future announcements! >> >> https://github.com/wesm/pandas#readme >> """ >> pandas is a Python package providing fast, flexible, and expressive data >> structures designed to make working with "relational" or "labeled" data both >> easy and intuitive. It aims to be the fundamental high-level building block >> for doing practical, real world data analysis in Python. Additionally, it >> has the broader goal of becoming the most powerful and flexible open source >> data analysis / manipulation tool available in any language. It is already >> well on its way toward this goal. >> """ >> >> Jeremy > > Good advice for _everyone_ software announcements. One of my pet peeves. > Jerry > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Seriously? I'm burning the candle at both ends building high quality *free* software with excellent documentation and you guys are complaining about having to copy-and-paste a hyperlink? =P I will certainly include a description of what the package does in future release e-mails. BTW a "thanks for all your hard work" along with the criticism would be a more tactful way to go about things! 
- Wes From lanceboyle at qwest.net Tue Sep 27 03:32:29 2011 From: lanceboyle at qwest.net (Jerry) Date: Tue, 27 Sep 2011 00:32:29 -0700 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> Message-ID: <6F1B6B2A-8729-4D8B-84F7-3E031A1511A4@qwest.net> On Sep 26, 2011, at 8:21 PM, Wes McKinney wrote: > On Mon, Sep 26, 2011 at 11:00 PM, Jerry wrote: >> >> On Sep 26, 2011, at 1:29 AM, Jeremy Sanders wrote: >> >>> Wes McKinney wrote: >>> >>>> I'm happy to announce the 0.4.1 pandas release, largely a bugfix >>>> release with a handful of new features and speed optimizations. See >>>> below or on GitHub for the full release notes. Binary installers are >>>> on PyPI. Thanks to all for bug reports and pull requests. >>> >>> It might be helpful to say what pandas is in future announcements! >>> >>> https://github.com/wesm/pandas#readme >>> """ >>> pandas is a Python package providing fast, flexible, and expressive data >>> structures designed to make working with "relational" or "labeled" data both >>> easy and intuitive. It aims to be the fundamental high-level building block >>> for doing practical, real world data analysis in Python. Additionally, it >>> has the broader goal of becoming the most powerful and flexible open source >>> data analysis / manipulation tool available in any language. It is already >>> well on its way toward this goal. >>> """ >>> >>> Jeremy >> >> Good advice for _everyone_ software announcements. One of my pet peeves. >> Jerry >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > Seriously? I'm burning the candle at both ends building high quality > *free* software with excellent documentation and you guys are > complaining about having to copy-and-paste a hyperlink? =P > > I will certainly include a description of what the package does in > future release e-mails. BTW a "thanks for all your hard work" along > with the criticism would be a more tactful way to go about things! > > - Wes You're right. My apologies. Jerry From fperez.net at gmail.com Tue Sep 27 04:00:46 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Sep 2011 01:00:46 -0700 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> Message-ID: Hey Wes, On Mon, Sep 26, 2011 at 8:21 PM, Wes McKinney wrote: > I will certainly include a description of what the package does in > future release e-mails. I keep around a standard 'ipython release' email with some of that boilerplate, that I reuse over and over and adjust accordingly. It has the usual 'what is it/where to get it/what's new' trifecta in place, and I fill in the parts for each release with the relevant bits, but it helps me to not forget to have some of the key links and pointers. Cheers, f ps - congrats on the release, and please get some rest, so us mere mortals can catch up with you ;) From scipy at samueljohn.de Tue Sep 27 05:25:37 2011 From: scipy at samueljohn.de (Samuel John) Date: Tue, 27 Sep 2011 11:25:37 +0200 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> Message-ID: Wes, I appreciate your work - as does everyone here, I guess. Just to let you know :-) I read your blog, installed pandas and I have the deepest respect for what you do. But Jerrys idea is good for people new to the mailing list. 
Maybe someone who got here by a google search or something like that. Samuel From anders.harrysson at fem4fun.com Tue Sep 27 05:47:31 2011 From: anders.harrysson at fem4fun.com (Anders Harrysson) Date: Tue, 27 Sep 2011 11:47:31 +0200 Subject: [SciPy-User] FFT Filter In-Reply-To: <1317054737.2481.7.camel@pandora> References: <4E7D87E1.409@fem4fun.com> <1317054737.2481.7.camel@pandora> Message-ID: <4E819BB3.2010808@fem4fun.com> Hi, Thanks a lot for the reply. I have started to look into this and currently I am trying to use firwin2 to generate the filter coefficients according to below b=signal.firwin2(512,freq,gain,nfreqs=N,nyq=nf) X2=signal.lfilter(b,[1.0],y) However I get strange results, when trying to filter even a very simple signal :(. This is probably not a scipy related question, but how to put the numtaps number? How is this related to the other inputs. Regards, Anders ** Christophe Grimault skrev 2011-09-26 18:32: > Hi Anders, > > Use : scipy.signals.lfilter(b, a, x) > > Where x is your signal (complex or real, it doesn't matter). As a filter > in the Fourier domain is basically FIR filter, you only need to pass the > b array (the response of the filter) and set a = [1.0]. > > Chris > > On Sat, 2011-09-24 at 09:33 +0200, Anders Harrysson wrote: >> Dear all, >> >> I am kind of new to scipy and also new to the signal processing field >> that this question relates to. >> I am trying to do a bandpass FFT filter using python. The filter shape >> is symmetric around 11 Hz and is defined by the parameters ff and Hz below. >> >> x=loadtxt('file') >> >> sr = 250 # [samples/s] >> nf = sr/2.0 # Nyquist frequence >> Ns = len(tr[:,0]) # Total number of samples >> N=float(8192) # Fourier settings >> >> # Fourier transform >> X1 = fft(x,n=int(N)) >> X1 = fftshift(X1) >> F1 = arange(-N/2.0,N/2.0)/N*sr >> >> # Filter >> ff=[0,1,1,0] >> Hz = [9.5, 10, 12, 12.5] >> k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff) >> X1_f=X1*k1 >> X1_f=ifftshift(X1_f) >> x1_f=ifft(X1_f,n=int(N)) >> >> My question is now: >> Are ther built in functionallity for filtering in scioy and, if so, how >> would a similar filter looks like. >> >> Regards, >> Anders Harrysson >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Tue Sep 27 07:53:43 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 27 Sep 2011 06:53:43 -0500 Subject: [SciPy-User] FFT Filter In-Reply-To: <4E819BB3.2010808@fem4fun.com> References: <4E7D87E1.409@fem4fun.com> <1317054737.2481.7.camel@pandora> <4E819BB3.2010808@fem4fun.com> Message-ID: Anders, The attached script is a variation of http://www.scipy.org/Cookbook/FIRFilter. It uses firwin() to create a bandpass filter, and plots the frequency response and the results of applying the filter to a sample. It should be straightforward to modify it to use firwin2() instead, if desired. The script uses scipy.signal.lfilter to apply the filter, but this isn't the fastest way to apply a FIR filter. Take a look at an experiment that I did here http://www.scipy.org/Cookbook/ApplyFIRFilter to see a comparison of several way of applying a FIR filter. Regards, Warren On Tue, Sep 27, 2011 at 4:47 AM, Anders Harrysson < anders.harrysson at fem4fun.com> wrote: > Hi, > Thanks a lot for the reply. 
I have started to look into this and currently > I am trying to use firwin2 to generate the filter coefficients according to > below > > b=signal.firwin2(512,freq,gain,nfreqs=N,nyq=nf) > > X2=signal.lfilter(b,[1.0],y) > > However I get strange results, when trying to filter even a very simple > signal :(. This is probably not a scipy related question, but how to put the > numtaps number? How is this related to the other inputs. > > Regards, > Anders ** > > Christophe Grimault skrev 2011-09-26 18:32: > > Hi Anders, > > Use : scipy.signals.lfilter(b, a, x) > > Where x is your signal (complex or real, it doesn't matter). As a filter > in the Fourier domain is basically FIR filter, you only need to pass the > b array (the response of the filter) and set a = [1.0]. > > Chris > > On Sat, 2011-09-24 at 09:33 +0200, Anders Harrysson wrote: > > Dear all, > > I am kind of new to scipy and also new to the signal processing field > that this question relates to. > I am trying to do a bandpass FFT filter using python. The filter shape > is symmetric around 11 Hz and is defined by the parameters ff and Hz below. > > x=loadtxt('file') > > sr = 250 # [samples/s] > nf = sr/2.0 # Nyquist frequence > Ns = len(tr[:,0]) # Total number of samples > N=float(8192) # Fourier settings > > # Fourier transform > X1 = fft(x,n=int(N)) > X1 = fftshift(X1) > F1 = arange(-N/2.0,N/2.0)/N*sr > > # Filter > ff=[0,1,1,0] > Hz = [9.5, 10, 12, 12.5] > k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff) > X1_f=X1*k1 > X1_f=ifftshift(X1_f) > x1_f=ifft(X1_f,n=int(N)) > > My question is now: > Are ther built in functionallity for filtering in scioy and, if so, how > would a similar filter looks like. > > Regards, > Anders Harrysson > _______________________________________________ > SciPy-User mailing listSciPy-User at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fir_filter_example.py Type: application/octet-stream Size: 3846 bytes Desc: not available URL: From apalomba at austin.rr.com Tue Sep 27 09:21:11 2011 From: apalomba at austin.rr.com (Anthony Palomba) Date: Tue, 27 Sep 2011 08:21:11 -0500 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> Message-ID: Really, you mean after doing all that work you can not write a three line description of what Panda is? I hope you are still able to feed and dress yourself. What dedication! Thank you for your incredible commitment. Anthony On 9/27/11, Samuel John wrote: > Wes, > > I appreciate your work - as does everyone here, I guess. Just to let you > know :-) > I read your blog, installed pandas and I have the deepest respect for what > you do. > But Jerrys idea is good for people new to the mailing list. Maybe someone > who got here by a google search or something like that. 
> > Samuel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From alan.isaac at gmail.com Tue Sep 27 09:25:59 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 27 Sep 2011 09:25:59 -0400 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> Message-ID: <4E81CEE7.7030007@gmail.com> On 9/27/2011 9:21 AM, Anthony Palomba wrote: > Really, you mean after doing all that work > you can not write a three line description of > what Panda is? I hope you are still able to feed > and dress yourself. What dedication! > > Thank you for your incredible commitment. If you want to know what pandas is, go here: http://pandas.sourceforge.net/ Either use it and say thanks sincerely, or be quiet. Alan Isaac From scott.sinclair.za at gmail.com Tue Sep 27 09:31:44 2011 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 27 Sep 2011 15:31:44 +0200 Subject: [SciPy-User] ANN: pandas 0.4.1 release In-Reply-To: <4E81CEE7.7030007@gmail.com> References: <4FA7CB70-F10B-4D7A-A35F-936C3096A88B@qwest.net> <4E81CEE7.7030007@gmail.com> Message-ID: On 27 September 2011 15:25, Alan G Isaac wrote: > On 9/27/2011 9:21 AM, Anthony Palomba wrote: >> Really, you mean after doing all that work >> you can not write a three line description of >> what Panda is? I hope you are still able to feed >> and dress yourself. What dedication! >> >> Thank you for your incredible commitment. > > > If you want to know what pandas is, go here: > http://pandas.sourceforge.net/ > Either use it and say thanks sincerely, > or be quiet. If anyone's interested, I'm currently offering jaw-dropping discounts on flame retardant vests... Cheers, Scott From georges.schutz at internet.lu Tue Sep 27 10:14:17 2011 From: georges.schutz at internet.lu (Georges Schutz) Date: Tue, 27 Sep 2011 16:14:17 +0200 Subject: [SciPy-User] scikits timeseries tofile In-Reply-To: <00E38D9D-EE12-43C9-BCA6-521C826F0AA7@gmail.com> References: <00E38D9D-EE12-43C9-BCA6-521C826F0AA7@gmail.com> Message-ID: On 24/03/2011 11:13, Pierre GM wrote: > > On Mar 24, 2011, at 2:50 AM, Gar Brown wrote: > >> Hi All, >> >> While trying to write a timeseries object to file I the following error: >> >> File "C:\Python27\lib\site-packages\scikits\timeseries\tseries.py", >> line 1527, in tofile >> return scipy.io.write_array(fileobject, tmpfiller, **optpars) >> AttributeError: 'module' object has no attribute 'write_array' >> >> >> I noticed that the scipy.io.write_array function has been deprecated >> for some time, is there a fix or workaround for avoiding this error. I >> really like the time series functionality of scikits.timeseries > > Dang. I really need to get back on the scikit... Would you mind filing a ticket ? I'll try to get back to you by next week. Don't hesitate to keep on nudging me till I give you a fix and/or workaround. > P. This seems still to be an issue, it is files as ticket 116 but no workaround or fix was proposed until know. I face this problem as I updated scipy last. I suppose others using scipy.io may have the same problem. I personally worked around by writing an own "tofile-like" method using the numpy.ndarray.tofile method. I do not want to stress you but: - Do you have a better workaround for solving the problem? - Are there plans to fix this ticked in some way in the next release? G. 
Schutz From anders.harrysson at fem4fun.com Tue Sep 27 10:21:42 2011 From: anders.harrysson at fem4fun.com (Anders Harrysson) Date: Tue, 27 Sep 2011 16:21:42 +0200 Subject: [SciPy-User] FFT Filter In-Reply-To: References: <4E7D87E1.409@fem4fun.com> <1317054737.2481.7.camel@pandora> <4E819BB3.2010808@fem4fun.com> Message-ID: <4E81DBF6.3010703@fem4fun.com> Hi, I have looked thou the example and obviously there are something that I simply don't get :(. Do anyone have some good references on how to decide the input parameters to firwin2? Furthermore, when computing he phase delay, will this differ when using a band pass filter instead of a low pass? Sorry for very elementary questions... Anders Warren Weckesser skrev 2011-09-27 13:53: > Anders, > > The attached script is a variation of > http://www.scipy.org/Cookbook/FIRFilter. It uses firwin() to create a > bandpass filter, and plots the frequency response and the results of > applying the filter to a sample. It should be straightforward to > modify it to use firwin2() instead, if desired. > > The script uses scipy.signal.lfilter to apply the filter, but this > isn't the fastest way to apply a FIR filter. Take a look at an > experiment that I did here > http://www.scipy.org/Cookbook/ApplyFIRFilter to see a comparison of > several way of applying a FIR filter. > > Regards, > > Warren > > > On Tue, Sep 27, 2011 at 4:47 AM, Anders Harrysson > > > wrote: > > Hi, > Thanks a lot for the reply. I have started to look into this and > currently I am trying to use firwin2 to generate the filter > coefficients according to below > > b=signal.firwin2(512,freq,gain,nfreqs=N,nyq=nf) > > X2=signal.lfilter(b,[1.0],y) > > > However I get strange results, when trying to filter even a very > simple signal :(. This is probably not a scipy related question, > but how to put the numtaps number? How is this related to the > other inputs. > > Regards, > Anders > > Christophe Grimault skrev 2011-09-26 18:32: >> Hi Anders, >> >> Use : scipy.signals.lfilter(b, a, x) >> >> Where x is your signal (complex or real, it doesn't matter). As a filter >> in the Fourier domain is basically FIR filter, you only need to pass the >> b array (the response of the filter) and set a = [1.0]. >> >> Chris >> >> On Sat, 2011-09-24 at 09:33 +0200, Anders Harrysson wrote: >>> Dear all, >>> >>> I am kind of new to scipy and also new to the signal processing field >>> that this question relates to. >>> I am trying to do a bandpass FFT filter using python. The filter shape >>> is symmetric around 11 Hz and is defined by the parameters ff and Hz below. >>> >>> x=loadtxt('file') >>> >>> sr = 250 # [samples/s] >>> nf = sr/2.0 # Nyquist frequence >>> Ns = len(tr[:,0]) # Total number of samples >>> N=float(8192) # Fourier settings >>> >>> # Fourier transform >>> X1 = fft(x,n=int(N)) >>> X1 = fftshift(X1) >>> F1 = arange(-N/2.0,N/2.0)/N*sr >>> >>> # Filter >>> ff=[0,1,1,0] >>> Hz = [9.5, 10, 12, 12.5] >>> k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff) >>> X1_f=X1*k1 >>> X1_f=ifftshift(X1_f) >>> x1_f=ifft(X1_f,n=int(N)) >>> >>> My question is now: >>> Are ther built in functionallity for filtering in scioy and, if so, how >>> would a similar filter looks like. 
>>> >>> Regards, >>> Anders Harrysson >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Tue Sep 27 11:14:32 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 27 Sep 2011 10:14:32 -0500 Subject: [SciPy-User] FFT Filter In-Reply-To: <4E81DBF6.3010703@fem4fun.com> References: <4E7D87E1.409@fem4fun.com> <1317054737.2481.7.camel@pandora> <4E819BB3.2010808@fem4fun.com> <4E81DBF6.3010703@fem4fun.com> Message-ID: On Tue, Sep 27, 2011 at 9:21 AM, Anders Harrysson < anders.harrysson at fem4fun.com> wrote: > Hi, > > I have looked thou the example and obviously there are something that I > simply don't get :(. Do anyone have some good references on how to decide > the input parameters to firwin2? > The text "Discrete-Time Signal Processing" by Oppenheim and Schafer is a good reference. In particular, the "window method" is described in Chapter 7 (7.4 or 7.5, depending on the edition). You can try googling for "fir filter design window method", but many of the hits will be implementations (in Matlab, IDL, R, etc) without much explanation. A correction to the example that I sent: the 'width' calculation did not agree with the comment; it should have been 'width = 5.0 / nyq_rate'. > Furthermore, when computing he phase delay, will this differ when using a > band pass filter instead of a low pass? > No. For a symmetric FIR filter, the phase delay only depends on the number of coefficients (ie. "taps") in the filter. Warren > > Sorry for very elementary questions... > > Anders > > Warren Weckesser skrev 2011-09-27 13:53: > > Anders, > > The attached script is a variation of > http://www.scipy.org/Cookbook/FIRFilter. It uses firwin() to create a > bandpass filter, and plots the frequency response and the results of > applying the filter to a sample. It should be straightforward to modify it > to use firwin2() instead, if desired. > > The script uses scipy.signal.lfilter to apply the filter, but this isn't > the fastest way to apply a FIR filter. Take a look at an experiment that I > did here http://www.scipy.org/Cookbook/ApplyFIRFilter to see a comparison > of several way of applying a FIR filter. > > Regards, > > Warren > > > On Tue, Sep 27, 2011 at 4:47 AM, Anders Harrysson < > anders.harrysson at fem4fun.com> wrote: > >> Hi, >> Thanks a lot for the reply. I have started to look into this and currently >> I am trying to use firwin2 to generate the filter coefficients according to >> below >> >> b=signal.firwin2(512,freq,gain,nfreqs=N,nyq=nf) >> >> X2=signal.lfilter(b,[1.0],y) >> >> However I get strange results, when trying to filter even a very simple >> signal :(. This is probably not a scipy related question, but how to put the >> numtaps number? How is this related to the other inputs. >> >> Regards, >> Anders >> >> Christophe Grimault skrev 2011-09-26 18:32: >> >> Hi Anders, >> >> Use : scipy.signals.lfilter(b, a, x) >> >> Where x is your signal (complex or real, it doesn't matter). 
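For reference, a minimal bandpass design along the lines Warren describes might look like the sketch below. The 250 Hz sample rate and the 9.5-12.5 Hz band come from Anders's original script; the tap count, the Hamming window, and the synthetic 11 Hz test signal are assumptions made purely for illustration and are not part of Warren's attached example.

import numpy as np
from scipy.signal import firwin, lfilter

sr = 250.0                  # sample rate [samples/s]
nyq = sr / 2.0              # Nyquist frequency
numtaps = 255               # assumed; an odd count gives an integer group delay

# Band edges in Hz; pass_zero=False selects a bandpass response.
b = firwin(numtaps, [9.5, 12.5], window='hamming', pass_zero=False, nyq=nyq)

# Synthetic test signal: an 11 Hz tone buried in noise (illustration only).
t = np.arange(0.0, 10.0, 1.0 / sr)
x = np.sin(2 * np.pi * 11.0 * t) + 0.5 * np.random.randn(t.size)

# Apply the FIR filter; the denominator is [1.0] because there is no feedback part.
y = lfilter(b, [1.0], x)

# A symmetric FIR filter delays its output by (numtaps - 1) / 2 samples,
# regardless of the band, which is the phase-delay behaviour discussed above.
delay = (numtaps - 1) // 2

Roughly, more taps buy a narrower transition band at the price of a longer delay; the cookbook example Warren links to derives the tap count from a desired transition width and attenuation with scipy.signal.kaiserord.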
As a filter >> in the Fourier domain is basically FIR filter, you only need to pass the >> b array (the response of the filter) and set a = [1.0]. >> >> Chris >> >> On Sat, 2011-09-24 at 09:33 +0200, Anders Harrysson wrote: >> >> Dear all, >> >> I am kind of new to scipy and also new to the signal processing field >> that this question relates to. >> I am trying to do a bandpass FFT filter using python. The filter shape >> is symmetric around 11 Hz and is defined by the parameters ff and Hz below. >> >> x=loadtxt('file') >> >> sr = 250 # [samples/s] >> nf = sr/2.0 # Nyquist frequence >> Ns = len(tr[:,0]) # Total number of samples >> N=float(8192) # Fourier settings >> >> # Fourier transform >> X1 = fft(x,n=int(N)) >> X1 = fftshift(X1) >> F1 = arange(-N/2.0,N/2.0)/N*sr >> >> # Filter >> ff=[0,1,1,0] >> Hz = [9.5, 10, 12, 12.5] >> k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff) >> X1_f=X1*k1 >> X1_f=ifftshift(X1_f) >> x1_f=ifft(X1_f,n=int(N)) >> >> My question is now: >> Are ther built in functionallity for filtering in scioy and, if so, how >> would a similar filter looks like. >> >> Regards, >> Anders Harrysson >> _______________________________________________ >> SciPy-User mailing listSciPy-User at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > > _______________________________________________ > SciPy-User mailing listSciPy-User at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at maths.lth.se Tue Sep 27 11:35:42 2011 From: tony at maths.lth.se (Tony Stillfjord) Date: Tue, 27 Sep 2011 17:35:42 +0200 Subject: [SciPy-User] scipy.sparse vs. pysparse Message-ID: Hello everyone. I'm a Ph.D. student in Numerical Analysis where I currently work with exponential integrators for solving PDEs. This involves large sparse matrices, and a main task is to perform Arnoldi's algorithm to compute a Krylov subspace. This involves just matrix-vector multiplications and scalar products, so those are the operations I am interested in. For various reasons, I have been using pysparse ( http://pysparse.sourceforge.net/) for my sparse matrix needs until recently when I decided to migrate my code to scipy.sparse because of some complications I ran into when wanting to also run my code on a Windows system. Just to check that this would not ruin my computation times, I ran some performance tests to see how efficient scipy is in comparison to pysparse. In short, I set up as my test matrices the standard central finite-difference discretizations of the Laplacian in 1 and 2 dimensions and then timed the matrix-vector product with random (but fixed) vectors for different sizes of the matrices (i.e. discretization mesh widths) and vectors. I usually end up with matrices similar to these so the results should be representative of the performance on my real problems. 
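For concreteness, test matrices of the kind described here can be assembled with scipy.sparse roughly as follows. This is only a sketch: the helper name mirrors the laplacian_scipy call in the simplified code shown next, but the construction itself is an assumption, not Tony's actual implementation.

import numpy as np
import scipy.sparse as sp

def laplacian_scipy(N, fmt='csr'):
    # 1-D central-difference Laplacian: tridiagonal with stencil [1, -2, 1].
    ones = np.ones(N)
    A = sp.spdiags([ones, -2.0 * ones, ones], [-1, 0, 1], N, N)
    return A.asformat(fmt)

def laplacian_scipy_2d(n, fmt='csr'):
    # 2-D five-point Laplacian on an n-by-n grid via a Kronecker sum (N = n**2 unknowns).
    L1 = laplacian_scipy(n, fmt='csr')
    I = sp.identity(n, format='csr')
    return (sp.kron(I, L1) + sp.kron(L1, I)).asformat(fmt)

A = laplacian_scipy(10000)             # CSR matrix, as in the CSR timings
b = np.random.rand(A.shape[0])
y = A * b                              # the matrix-vector product being timed

The same product is what the timing loop below measures; switching fmt to 'dia' reproduces the DIA variant.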
In simplified code, I did for N in Nlist: A = laplacian_scipy(N) B = laplacian_pysparse(N) scipy.random.seed(seed) b = rand(N) t_scipy = 1e6*timeit.timeit('A*b', number = 1e3) / 1e3 t_pysparse = 1e6*timeit.timeit('B*b', number = 1e3) / 1e3 where the 1e6 numbers are to translate the times into microseconds and the 1e3's to specify that I want to run each command a thousand times. The results are reported below. What one can see is that for relatively small matrices, pysparse is a lot faster. This factor seems to decrease as the dimension of the matrix increases until scipy actually becomes faster than pysparse for really big matrices. Originally I used the CSR format for scipy.sparse as well as pysparse since pysparse only implements the CSR, linked-list and sparse skyline formats. However, since my matrices in this case are tri-diagonal and five-diagonal (though not with bandwidth 5), I figured maybe I should use the DIA format to be fair to scipy. Those results are also shown below, and the timings got somewhat, but not greatly, better. Regardless, in my real problems I will not have purely n-diagonal matrices, so the CSR-data is probably of most interest. Note that the pysparse format is still CSR. Is there a lot of overhead in the scipy.sparse function calls, or is there some other reason that scipy performs so much worse for "small" matrices? I realise that if you have a small enough matrix you could just as well have it be dense, but the point where scipy starts to break even with pysparse is quite far from the size where I would switch to the sparse format. Kind regards, Tony Stillfjord The timing data, for two different systems (I hope the formatting is not completely destroyed upon mailing this, sorry if that is the case): Ubuntu 10.04 numpy v.1.6.1 scipy v.0.9.0 pysparse development version ========= CSR ========== 1D: N SciPy pysparse 50 5.82e+01 6.17e+00 100 5.48e+01 6.63e+00 200 5.62e+01 7.58e+00 300 8.93e+01 9.15e+00 500 5.96e+01 1.03e+01 1000 6.59e+01 1.42e+01 2500 7.27e+01 2.44e+01 5000 8.64e+01 4.29e+01 10000 1.17e+02 8.12e+01 25000 2.13e+02 2.06e+02 50000 5.02e+02 6.10e+02 100000 1.25e+03 1.44e+03 2D: N SciPy pysparse 100 5.54e+01 7.18e+00 625 6.17e+01 1.45e+01 2500 7.79e+01 3.85e+01 10000 1.44e+02 1.36e+02 40000 5.73e+02 1.05e+03 90000 1.51e+03 2.51e+03 250000 4.35e+03 7.57e+03 1000000 1.61e+04 3.33e+04 ========= DIA ========== 1D: N SciPy pysparse 50 6.51e+01 6.30e+00 100 5.53e+01 6.58e+00 200 5.70e+01 7.68e+00 300 5.74e+01 8.58e+00 500 5.86e+01 1.00e+01 1000 6.18e+01 1.37e+01 2500 6.83e+01 2.68e+01 5000 7.97e+01 4.36e+01 10000 1.04e+02 8.10e+01 25000 1.75e+02 1.91e+02 50000 3.45e+02 5.23e+02 100000 8.18e+02 1.39e+03 2D: N SciPy pysparse 100 5.70e+01 7.15e+00 625 6.21e+01 1.44e+01 2500 7.81e+01 3.91e+01 10000 1.33e+02 1.36e+02 40000 3.79e+02 9.78e+02 90000 1.01e+03 2.55e+03 250000 6.88e+03 7.80e+03 1000000 2.74e+04 3.14e+04 Windows 7 numpy v.1.6.1 scipy v.0.10.0b2 pysparse v.1.1.1 -All compiled against Intel MKL ========= CSR ========== 1D: N SciPy pysparse 50 3.69e+01 3.72e+00 100 3.73e+01 3.89e+00 200 3.82e+01 4.32e+00 300 3.89e+01 5.22e+00 500 3.99e+01 5.73e+00 1000 4.23e+01 7.60e+00 2500 5.10e+01 1.39e+01 5000 6.35e+01 2.44e+01 10000 8.60e+01 4.38e+01 25000 1.56e+02 1.01e+02 50000 2.71e+02 1.94e+02 100000 5.35e+02 4.22e+02 2D: N SciPy pysparse 100 3.76e+01 4.11e+00 625 4.28e+01 8.57e+00 2500 5.62e+01 2.24e+01 10000 1.05e+02 7.34e+01 40000 2.98e+02 2.83e+02 90000 7.30e+02 8.41e+02 250000 2.66e+03 2.99e+03 1000000 1.01e+04 1.21e+04 ========= DIA ========== 1D: N SciPy 
pysparse 50 4.01e+01 3.77e+00 100 4.00e+01 3.93e+00 200 4.02e+01 4.33e+00 300 4.12e+01 5.03e+00 500 4.19e+01 5.81e+00 1000 4.33e+01 7.44e+00 2500 4.90e+01 1.43e+01 5000 5.66e+01 2.48e+01 10000 7.09e+01 4.49e+01 25000 1.20e+02 1.06e+02 50000 1.97e+02 1.98e+02 100000 3.87e+02 4.31e+02 2D: N SciPy pysparse 100 4.05e+01 4.11e+00 625 4.34e+01 8.45e+00 2500 5.30e+01 2.24e+01 10000 8.58e+01 7.50e+01 40000 2.36e+02 2.82e+02 90000 5.14e+02 8.80e+02 250000 2.28e+03 3.01e+03 1000000 1.13e+04 1.21e+04 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Sep 27 12:36:49 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 27 Sep 2011 18:36:49 +0200 Subject: [SciPy-User] scipy.sparse vs. pysparse In-Reply-To: References: Message-ID: Hi, 27.09.2011 17:35, Tony Stillfjord kirjoitti: [clip] > Is there a lot of overhead in the scipy.sparse function calls, or is > there some other reason that scipy performs > so much worse for "small" matrices? I realise that if you have a small > enough matrix you could just as well have > it be dense, but the point where scipy starts to break even with > pysparse is quite far from the size where I would > switch to the sparse format. Thanks for drawing attention to this. Yes, a quick profile run shows that there is a lot of overhead in scipy.sparse. However, a quick look reveals that most of it is probably unnecessary, and with small changes, it should be possible to speed it up by a large factor. Laundry list, is someone wants to get to work: - Optimize `sputils.upcast` -- e.g. memoization etc. -- it takes a lot of time, and is called often. - There should be no reason to use `sputils._isinstance`, just use the builtin isinstance() everywhere. - Arrange a fast path for arrays that need no conversion etc. in `base. From deshpande.jaidev at gmail.com Tue Sep 27 15:04:55 2011 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Wed, 28 Sep 2011 00:34:55 +0530 Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate Message-ID: Hi *The big question*: Why does the MATLAB function spline operate faster than the cubic spline alternatives in Scipy, especially splrep and splev ? ------ *The context*: I'm working on an algorithm that bottlenecks on spline interpolation. Some functions in Scipy return an interpolation *object function *depending on the input data which needs to be evaluated independently over the whole range. So I used 'lower order' functions like splrep and splev. Even that was too slow. Then I tried to write my own code for cubic splines, generating and solving a system of 4N simultaneous equations for interpolation between N+1 points. No matter what I do, the code is quite slow. How come the MATLAB function spline operate so fast? What am I missing? What can I do to speed it up? ------------ Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Sep 27 15:32:30 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 27 Sep 2011 13:32:30 -0600 Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate In-Reply-To: References: Message-ID: On Tue, Sep 27, 2011 at 1:04 PM, Jaidev Deshpande < deshpande.jaidev at gmail.com> wrote: > Hi > > *The big question*: Why does the MATLAB function spline operate faster > than the cubic spline alternatives in Scipy, especially splrep and splev ? > > ------ > > *The context*: I'm working on an algorithm that bottlenecks on spline > interpolation. 
> > Some functions in Scipy return an interpolation *object function *depending > on the input data which needs to be evaluated independently over the whole > range. > > So I used 'lower order' functions like splrep and splev. Even that was too > slow. > > Then I tried to write my own code for cubic splines, generating and solving > a system of 4N simultaneous equations for interpolation between N+1 points. > > No matter what I do, the code is quite slow. How come the MATLAB function > spline operate so fast? What am I missing? What can I do to speed it up? > > I suspect it is because the scipy routines you reference are based on b-splines, which are needed for least squares fits. Simple cubic spline interpolation through a give set of points tends to be faster and I believe that is what the Matlab spline function does. To get b-splines in Matlab you need one of the toolboxes, it doesn't come with the core. I don't think scipy has a simple cubic spline interpolation, but I may be wrong. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sponsfreixes at gmail.com Tue Sep 27 10:42:51 2011 From: sponsfreixes at gmail.com (Sergi Pons Freixes) Date: Tue, 27 Sep 2011 16:42:51 +0200 Subject: [SciPy-User] Source of the error between computers (version, architecture, etc) Message-ID: Hi all, I have some code that runs perfectly on: Linux Toshiba-00 2.6.32-33-generic #72-Ubuntu SMP Fri Jul 29 21:08:37 UTC 2011 i686 GNU/Linux Python 2.6.5 Numpy 1.3.0 But on this machine: Linux mirto 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz GenuineIntel GNU/Linux Python 2.7.2 Numpy 1.6.1 I'm getting this error: $ python main.py Traceback (most recent call last): File "main.py", line 4, in import aldp File "/home/sergi/Dropbox/doctorat/alfacs/codi/aldp.py", line 6, in from string import replace, strip ImportError: cannot import name replace Google hasn't helped much when searching about "ImportError: cannot import name replace" and similar queries. To reduce the uncertainty, I would like to know if the cause could be the difference in versions of the software, a different architecture (32 bits vs 64), or whatever. Any clue? Regards, Sergi From sponsfreixes at gmail.com Tue Sep 27 10:45:30 2011 From: sponsfreixes at gmail.com (Sergi Pons Freixes) Date: Tue, 27 Sep 2011 16:45:30 +0200 Subject: [SciPy-User] Source of the error between computers (version, architecture, etc) In-Reply-To: References: Message-ID: Big mistake on the previous mail. 
The _real_ error is: On Tue, Sep 27, 2011 at 4:42 PM, Sergi Pons Freixes wrote: > > I'm getting this error: Traceback (most recent call last): File "main.py", line 32, in data = aldp.merge_max_irta(data, irta) File "/home/sergi/Dropbox/doctorat/alfacs/codi/aldp.py", line 378, in merge_max_irta data = np.hstack((maxd, irta)) File "/usr/lib/python2.7/site-packages/numpy/core/shape_base.py", line 270, in hstack return _nx.concatenate(map(atleast_1d,tup),1) TypeError: invalid type promotion From warren.weckesser at enthought.com Tue Sep 27 16:16:51 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 27 Sep 2011 15:16:51 -0500 Subject: [SciPy-User] Source of the error between computers (version, architecture, etc) In-Reply-To: References: Message-ID: On Tue, Sep 27, 2011 at 9:42 AM, Sergi Pons Freixes wrote: > Hi all, > > I have some code that runs perfectly on: > > Linux Toshiba-00 2.6.32-33-generic #72-Ubuntu SMP Fri Jul 29 21:08:37 > UTC 2011 i686 GNU/Linux > Python 2.6.5 > Numpy 1.3.0 > > But on this machine: > > Linux mirto 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 > x86_64 Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz GenuineIntel GNU/Linux > Python 2.7.2 > Numpy 1.6.1 > > I'm getting this error: > > $ python main.py > Traceback (most recent call last): > File "main.py", line 4, in > import aldp > File "/home/sergi/Dropbox/doctorat/alfacs/codi/aldp.py", line 6, in > > from string import replace, strip > ImportError: cannot import name replace > > Google hasn't helped much when searching about "ImportError: cannot > import name replace" and similar queries. To reduce the uncertainty, I > would like to know if the cause could be the difference in versions of > the software, a different architecture (32 bits vs 64), or whatever. > Any clue? > On the second machine, is there a file called "string.py" in /home/sergi/Dropbox/doctorat/alfacs/coda that is not in the corresponding directory on the first machine? Warren > Regards, > Sergi > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Tue Sep 27 16:27:27 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 27 Sep 2011 16:27:27 -0400 Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate In-Reply-To: References: Message-ID: scipy.signal has some cubic and quadratic spline functions: cspline1d cspline1d_eval cspline2d (and replace the c with q for the quadratic versions). I have no idea how fast they are, or if they're at all drop-in replacements for the matlab ones. The stuff in scipy.interpolate is powerful, but the fitpack spline-fitting operations can be a bit input-sensitive and prone to strange ringing. Zach On Sep 27, 2011, at 3:32 PM, Charles R Harris wrote: > > > On Tue, Sep 27, 2011 at 1:04 PM, Jaidev Deshpande wrote: > Hi > > The big question: Why does the MATLAB function spline operate faster than the cubic spline alternatives in Scipy, especially splrep and splev ? > > ------ > > The context: I'm working on an algorithm that bottlenecks on spline interpolation. > > Some functions in Scipy return an interpolation object function depending on the input data which needs to be evaluated independently over the whole range. > > So I used 'lower order' functions like splrep and splev. Even that was too slow. 
> > Then I tried to write my own code for cubic splines, generating and solving a system of 4N simultaneous equations for interpolation between N+1 points. > > No matter what I do, the code is quite slow. How come the MATLAB function spline operate so fast? What am I missing? What can I do to speed it up? > > > I suspect it is because the scipy routines you reference are based on b-splines, which are needed for least squares fits. Simple cubic spline interpolation through a give set of points tends to be faster and I believe that is what the Matlab spline function does. To get b-splines in Matlab you need one of the toolboxes, it doesn't come with the core. I don't think scipy has a simple cubic spline interpolation, but I may be wrong. > > Chuck > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From david_baddeley at yahoo.com.au Tue Sep 27 17:45:12 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 27 Sep 2011 14:45:12 -0700 (PDT) Subject: [SciPy-User] Source of the error between computers (version, architecture, etc) In-Reply-To: References: Message-ID: <1317159912.19877.YahooMailNeo@web113413.mail.gq1.yahoo.com> Hi Sergi, you're going to have to post a bit more context - from what you've posted my guess is that 'maxd' and 'irta' might be different numeric types and that numpy might have got a bit stricter with type checking and is now failing rather than performing some blind casting. Are they both arays? cheers, David ----- Original Message ----- From: Sergi Pons Freixes To: SciPy Users List Cc: Sent: Wednesday, 28 September 2011 3:45 AM Subject: Re: [SciPy-User] Source of the error between computers (version, architecture, etc) Big mistake on the previous mail. The _real_ error is: On Tue, Sep 27, 2011 at 4:42 PM, Sergi Pons Freixes wrote: > > I'm getting this error: Traceback (most recent call last): ? File "main.py", line 32, in ? ? data = aldp.merge_max_irta(data, irta) ? File "/home/sergi/Dropbox/doctorat/alfacs/codi/aldp.py", line 378, in merge_max_irta ? ? data = np.hstack((maxd, irta)) ? File "/usr/lib/python2.7/site-packages/numpy/core/shape_base.py", line 270, in hstack ? ? return _nx.concatenate(map(atleast_1d,tup),1) TypeError: invalid type promotion _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From gorkypl at gmail.com Tue Sep 27 18:02:41 2011 From: gorkypl at gmail.com (=?UTF-8?B?UGF3ZcWC?=) Date: Wed, 28 Sep 2011 00:02:41 +0200 Subject: [SciPy-User] scikits.timeseries errorbar plot - is it possible? Message-ID: Hello, Is it possible somehow to get errorbar plots of timeseries? I'm using the scikits.timeseires module. In my specific case I'm calculating monthly averages using scikits.timeseries.convert(func=np.ma.mean), but I would like to represent the deviation of mean for every month. As far as I see, the errorbar plot works only with 'normal' x,y data... greetings, Pawe? Rumian From charlesr.harris at gmail.com Tue Sep 27 19:27:02 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 27 Sep 2011 17:27:02 -0600 Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate In-Reply-To: References: Message-ID: On Tue, Sep 27, 2011 at 2:27 PM, Zachary Pincus wrote: > scipy.signal has some cubic and quadratic spline functions: > cspline1d > cspline1d_eval > cspline2d > > (and replace the c with q for the quadratic versions). 
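For a plain interpolating cubic spline through a given set of points, the scipy.interpolate routines mentioned earlier can also be used with the smoothing turned off. A sketch with invented sample data, not a claim about how MATLAB's spline is implemented:

import numpy as np
from scipy.interpolate import splrep, splev, InterpolatedUnivariateSpline

x = np.linspace(0, 10, 11)             # invented sample points
y = np.cos(x)

# B-spline representation passing exactly through (x, y); s=0 disables smoothing.
tck = splrep(x, y, k=3, s=0)
xnew = np.linspace(0, 10, 201)
ynew = splev(xnew, tck)

# Object-style equivalent: an exact cubic interpolant evaluated by calling the object.
spl = InterpolatedUnivariateSpline(x, y, k=3)
ynew2 = spl(xnew)

Whether this is fast enough compared to MATLAB's spline for a given problem size is exactly the open question in this thread; the sketch only shows the interpolating (s=0) usage.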
> > I have no idea how fast they are, or if they're at all drop-in replacements > for the matlab ones. The stuff in scipy.interpolate is powerful, but the > fitpack spline-fitting operations can be a bit input-sensitive and prone to > strange ringing. > > Zach > > > > On Sep 27, 2011, at 3:32 PM, Charles R Harris wrote: > > > > > > > On Tue, Sep 27, 2011 at 1:04 PM, Jaidev Deshpande < > deshpande.jaidev at gmail.com> wrote: > > Hi > > > > The big question: Why does the MATLAB function spline operate faster than > the cubic spline alternatives in Scipy, especially splrep and splev ? > > > > ------ > > > > The context: I'm working on an algorithm that bottlenecks on spline > interpolation. > > > > Some functions in Scipy return an interpolation object function depending > on the input data which needs to be evaluated independently over the whole > range. > > > > So I used 'lower order' functions like splrep and splev. Even that was > too slow. > > > > Then I tried to write my own code for cubic splines, generating and > solving a system of 4N simultaneous equations for interpolation between N+1 > points. > > > > No matter what I do, the code is quite slow. How come the MATLAB function > spline operate so fast? What am I missing? What can I do to speed it up? > > > > > > I suspect it is because the scipy routines you reference are based on > b-splines, which are needed for least squares fits. Simple cubic spline > interpolation through a give set of points tends to be faster and I believe > that is what the Matlab spline function does. To get b-splines in Matlab you > need one of the toolboxes, it doesn't come with the core. I don't think > scipy has a simple cubic spline interpolation, but I may be wrong. > > > I believe the splines in signal are periodic and the boundary conditions aren't flexible. The documentation is, um..., well, they are effectively undocumented. We really need better spline support in scipy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Sep 27 19:50:45 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 27 Sep 2011 19:50:45 -0400 Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate In-Reply-To: References: Message-ID: On Tue, Sep 27, 2011 at 7:27 PM, Charles R Harris wrote: > > > On Tue, Sep 27, 2011 at 2:27 PM, Zachary Pincus > wrote: >> >> scipy.signal has some cubic and quadratic spline functions: >> cspline1d >> cspline1d_eval >> cspline2d >> >> (and replace the c with q for the quadratic versions). >> >> I have no idea how fast they are, or if they're at all drop-in >> replacements for the matlab ones. The stuff in scipy.interpolate is >> powerful, but the fitpack spline-fitting operations can be a bit >> input-sensitive and prone to strange ringing. >> >> Zach >> >> >> >> On Sep 27, 2011, at 3:32 PM, Charles R Harris wrote: >> >> > >> > >> > On Tue, Sep 27, 2011 at 1:04 PM, Jaidev Deshpande >> > wrote: >> > Hi >> > >> > The big question: Why does the MATLAB function spline operate faster >> > than the cubic spline alternatives in Scipy, especially splrep and splev ? >> > >> > ------ >> > >> > The context: I'm working on an algorithm that bottlenecks on spline >> > interpolation. >> > >> > Some functions in Scipy return an interpolation object function >> > depending on the input data which needs to be evaluated independently over >> > the whole range. >> > >> > So I used 'lower order' functions like splrep and splev. Even that was >> > too slow. 
>> > >> > Then I tried to write my own code for cubic splines, generating and >> > solving a system of 4N simultaneous equations for interpolation between N+1 >> > points. >> > >> > No matter what I do, the code is quite slow. How come the MATLAB >> > function spline operate so fast? What am I missing? What can I do to speed >> > it up? >> > >> > >> > I suspect it is because the scipy routines you reference are based on >> > b-splines, which are needed for least squares fits. Simple cubic spline >> > interpolation through a give set of points tends to be faster and I believe >> > that is what the Matlab spline function does. To get b-splines in Matlab you >> > need one of the toolboxes, it doesn't come with the core. I don't think >> > scipy has a simple cubic spline interpolation, but I may be wrong. >> > > > I believe the splines in signal are periodic and the boundary conditions > aren't flexible. The documentation is, um..., well, they are effectively > undocumented. We really need better spline support in scipy. I thought the main difference of the signal compared to interpolate splines is that they work only on a regular grid. They have a smoothing coefficient lambda, so they don't seem to be pure interpolating splines. (I never looked at them because of the regular grid restriction.) matlab's spline has x and Y, but all the examples in the help have regular grid. Josef > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From david_baddeley at yahoo.com.au Tue Sep 27 20:32:41 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 27 Sep 2011 17:32:41 -0700 (PDT) Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate In-Reply-To: References: Message-ID: <1317169961.62587.YahooMailNeo@web113407.mail.gq1.yahoo.com> If you have regularly spaced data, scipy.ndimage.map_coordinates might also be worth a look. cheers, David ----- Original Message ----- From: Zachary Pincus To: SciPy Users List Cc: Sent: Wednesday, 28 September 2011 9:27 AM Subject: Re: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate scipy.signal has some cubic and quadratic spline functions: cspline1d cspline1d_eval cspline2d (and replace the c with q for the quadratic versions). I have no idea how fast they are, or if they're at all drop-in replacements for the matlab ones. The stuff in scipy.interpolate is powerful, but the fitpack spline-fitting operations can be a bit input-sensitive and prone to strange ringing. Zach On Sep 27, 2011, at 3:32 PM, Charles R Harris wrote: > > > On Tue, Sep 27, 2011 at 1:04 PM, Jaidev Deshpande wrote: > Hi > > The big question: Why does the MATLAB function spline operate faster than the cubic spline alternatives in Scipy, especially splrep and splev ? > > ------ > > The context: I'm working on an algorithm that bottlenecks on spline interpolation. > > Some functions in Scipy return an interpolation object function depending on the input data which needs to be evaluated independently over the whole range. > > So I used 'lower order' functions like splrep and splev. Even that was too slow. > > Then I tried to write my own code for cubic splines, generating and solving a system of 4N simultaneous equations for interpolation between N+1 points. > > No matter what I do, the code is quite slow. How come the MATLAB function spline operate so fast? What am I missing? What can I do to speed it up? 
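For regularly spaced samples, the scipy.signal spline routines listed above can be used along these lines. A sketch with invented data; lamb=0 requests a pure interpolating spline, in line with the regular-grid caveat raised in the thread:

import numpy as np
from scipy.signal import cspline1d, cspline1d_eval

dx = 0.1
x = np.arange(0.0, 10.0 + dx / 2, dx)   # samples must lie on a regular grid
y = np.cos(x)

# Cubic spline coefficients; lamb=0.0 means no smoothing is applied.
coeffs = cspline1d(y, lamb=0.0)

# Evaluate at arbitrary points by giving the evaluator the grid spacing and origin.
xnew = np.linspace(0.0, 10.0, 501)
ynew = cspline1d_eval(coeffs, xnew, dx=dx, x0=x[0])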
> > > I suspect it is because the scipy routines you reference are based on b-splines, which are needed for least squares fits. Simple cubic spline interpolation through a give set of points tends to be faster and I believe that is what the Matlab spline function does. To get b-splines in Matlab you need one of the toolboxes, it doesn't come with the core. I don't think scipy has a simple cubic spline interpolation, but I may be wrong. > > Chuck > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From charlesr.harris at gmail.com Tue Sep 27 23:08:19 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 27 Sep 2011 21:08:19 -0600 Subject: [SciPy-User] Cubic splines - MATLAB vs Scipy.interpolate In-Reply-To: References: Message-ID: On Tue, Sep 27, 2011 at 5:50 PM, wrote: > On Tue, Sep 27, 2011 at 7:27 PM, Charles R Harris > wrote: > > > > > > On Tue, Sep 27, 2011 at 2:27 PM, Zachary Pincus > > > wrote: > >> > >> scipy.signal has some cubic and quadratic spline functions: > >> cspline1d > >> cspline1d_eval > >> cspline2d > >> > >> (and replace the c with q for the quadratic versions). > >> > >> I have no idea how fast they are, or if they're at all drop-in > >> replacements for the matlab ones. The stuff in scipy.interpolate is > >> powerful, but the fitpack spline-fitting operations can be a bit > >> input-sensitive and prone to strange ringing. > >> > >> Zach > >> > >> > >> > >> On Sep 27, 2011, at 3:32 PM, Charles R Harris wrote: > >> > >> > > >> > > >> > On Tue, Sep 27, 2011 at 1:04 PM, Jaidev Deshpande > >> > wrote: > >> > Hi > >> > > >> > The big question: Why does the MATLAB function spline operate faster > >> > than the cubic spline alternatives in Scipy, especially splrep and > splev ? > >> > > >> > ------ > >> > > >> > The context: I'm working on an algorithm that bottlenecks on spline > >> > interpolation. > >> > > >> > Some functions in Scipy return an interpolation object function > >> > depending on the input data which needs to be evaluated independently > over > >> > the whole range. > >> > > >> > So I used 'lower order' functions like splrep and splev. Even that was > >> > too slow. > >> > > >> > Then I tried to write my own code for cubic splines, generating and > >> > solving a system of 4N simultaneous equations for interpolation > between N+1 > >> > points. > >> > > >> > No matter what I do, the code is quite slow. How come the MATLAB > >> > function spline operate so fast? What am I missing? What can I do to > speed > >> > it up? > >> > > >> > > >> > I suspect it is because the scipy routines you reference are based on > >> > b-splines, which are needed for least squares fits. Simple cubic > spline > >> > interpolation through a give set of points tends to be faster and I > believe > >> > that is what the Matlab spline function does. To get b-splines in > Matlab you > >> > need one of the toolboxes, it doesn't come with the core. I don't > think > >> > scipy has a simple cubic spline interpolation, but I may be wrong. > >> > > > > > I believe the splines in signal are periodic and the boundary conditions > > aren't flexible. The documentation is, um..., well, they are effectively > > undocumented. We really need better spline support in scipy. 
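Picking up David's earlier suggestion, scipy.ndimage.map_coordinates is another option for equally spaced data. A sketch with invented data; order=3 asks for cubic interpolation and mode='nearest' is just one of the available boundary choices:

import numpy as np
from scipy.ndimage import map_coordinates

dx = 1.0
x0 = 0.0
y = np.cos(np.arange(x0, 10.0 + dx / 2, dx))   # samples on a regular grid

# map_coordinates works in index space, so convert positions to fractional indices.
xnew = np.linspace(0.0, 10.0, 201)
idx = (xnew - x0) / dx
ynew = map_coordinates(y, [idx], order=3, mode='nearest')

The boundary handling here comes from a small set of modes rather than arbitrary boundary conditions.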
> > I thought the main difference of the signal compared to interpolate > splines is that they work only on a regular grid. > They have a smoothing coefficient lambda, so they don't seem to be > pure interpolating splines. > > The equally spaced points are what I meant by periodic, i.e., the same basis function can be repeated. The signal itself it periodic if extended to twice its length with the mirror symmetry at the ends. I'm not sure how the smoothing factor works. The ndimage map_coordinates would work for interpolating equally spaced points and has more boundary conditions, but they are still can't be arbitrary. I think scipy could use a simple interpolating spline with not a knot default boundary conditions and no repeated knot points. The not a knot boundary conditions means to use finite differences at the ends to estimate the slopes which then become the boundary conditions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Tue Sep 27 23:33:23 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 27 Sep 2011 23:33:23 -0400 Subject: [SciPy-User] scikits timeseries tofile In-Reply-To: References: <00E38D9D-EE12-43C9-BCA6-521C826F0AA7@gmail.com> Message-ID: On Tue, Sep 27, 2011 at 10:14 AM, Georges Schutz wrote: > On 24/03/2011 11:13, Pierre GM wrote: >> >> On Mar 24, 2011, at 2:50 AM, Gar Brown wrote: >> >>> Hi All, >>> >>> While trying to write a timeseries object to file I the following error: >>> >>> File "C:\Python27\lib\site-packages\scikits\timeseries\tseries.py", >>> line 1527, in tofile >>> ? ? return scipy.io.write_array(fileobject, tmpfiller, **optpars) >>> AttributeError: 'module' object has no attribute 'write_array' >>> >>> >>> I noticed that the scipy.io.write_array function has been deprecated >>> for some time, is there a fix or workaround for avoiding this error. I >>> really like the time series functionality of scikits.timeseries >> >> Dang. I really need to get back on the scikit... Would you mind filing a ticket ? I'll try to get back to you by next week. Don't hesitate to keep on nudging me till I give you a fix and/or workaround. >> P. > > This seems still to be an issue, it is files as ticket 116 but no > workaround or fix was proposed until know. > > I face this problem as I updated scipy last. I suppose others using > scipy.io may have the same problem. I personally worked around by > writing an own "tofile-like" method using the numpy.ndarray.tofile method. > I do not want to stress you but: > - Do you have a better workaround for solving the problem? > - Are there plans to fix this ticked in some way in the next release? > > G. Schutz > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Aside: we should have a pow-wow soon about scikits.timeseries. I would like to move the code inside pandas and start maintaining / improving portions of it. I think it will find a good home there. From pgmdevlist at gmail.com Wed Sep 28 05:18:47 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 28 Sep 2011 11:18:47 +0200 Subject: [SciPy-User] scikits timeseries tofile In-Reply-To: References: <00E38D9D-EE12-43C9-BCA6-521C826F0AA7@gmail.com> Message-ID: <02A37BE4-7E01-416A-BEDB-5D5AABB1305B@gmail.com> On Sep 28, 2011, at 05:33 , Wes McKinney wrote: > > Aside: we should have a pow-wow soon about scikits.timeseries. 
I would > like to move the code inside pandas and start maintaining / improving > portions of it. I think it will find a good home there. No pb. Contact me off list if needs be. From klonuo at gmail.com Wed Sep 28 06:10:11 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Wed, 28 Sep 2011 12:10:11 +0200 Subject: [SciPy-User] Can't build numpy 1.6.1 on Ubuntu 11.10 Message-ID: I build numpy w/o ATLAS or MKL, but speed comparison (~10 times slower depending on code) made me remove it and concentrate on the most boring part I wanted to avoid: building numpy with ATLAS or MKL I started with ATLAS, but just couldn't compile and error about missing `libf77blas.a` library which somehow wasn't created by build process. Google didn't give me pointers. As I couldn't do it with ATLAS, I step in Intel solution: Following previously known posts and Intel adviser I prepared `site.cfg`: [DEFAULT] include_dirs = /opt/SuiteSparse/UFconfig [amd] amd_libs = amd library_dirs = /opt/SuiteSparse/AMD/Lib include_dirs = /opt/SuiteSparse/AMD/Include [umfpack] umfpack_libs = umfpack library_dirs = /opt/SuiteSparse/UMFPACK/Lib include_dirs = /opt/SuiteSparse/UMFPACK/Include [mkl] include_dirs = /opt/intel/composerxe-2011.4.191/mkl/include library_dirs = /opt/intel/composerxe-2011.4.191/mkl/lib/ia32 lapack_libs = mkl_lapack95 mkl_libs = mkl_intel mkl_sequential mkl_core pthread and started: $ python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel Result (whole in attachment): compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/include/python2.7 -c' icc: _configtest.c icc _configtest.o -o _configtest success! removing: _configtest.c _configtest.o _configtest C compiler: icc compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/include/python2.7 -c' icc: _configtest.c icc _configtest.o -o _configtest success! removing: _configtest.c _configtest.o _configtest building extension "numpy.core._sort" sources Generating build/src.linux-i686-2.7/numpy/core/include/numpy/config.h C compiler: icc compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/include/python2.7 -c' icc: _configtest.c /usr/include/gnu/stubs.h(7): catastrophic error: cannot open source file "gnu/stubs-32.h" # include ^ compilation aborted for _configtest.c (code 4) /usr/include/gnu/stubs.h(7): catastrophic error: cannot open source file "gnu/stubs-32.h" # include ^ compilation aborted for _configtest.c (code 4) failure. removing: _configtest.c _configtest.o Google suggests that it's probably related to ICC compiler Any kind pointer? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: err.log Type: application/octet-stream Size: 6502 bytes Desc: not available URL: From ralf.gommers at googlemail.com Wed Sep 28 13:06:15 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 28 Sep 2011 19:06:15 +0200 Subject: [SciPy-User] Can't build numpy 1.6.1 on Ubuntu 11.10 In-Reply-To: References: Message-ID: On Wed, Sep 28, 2011 at 12:10 PM, Klonuo Umom wrote: > I build numpy w/o ATLAS or MKL, but speed comparison (~10 times slower > depending on code) made me remove it and concentrate on the most boring part > I wanted to avoid: building numpy with ATLAS or MKL > > I started with ATLAS, but just couldn't compile and error about missing > `libf77blas.a` library which somehow wasn't created by build process. Google > didn't give me pointers. As I couldn't do it with ATLAS, I step in Intel > solution: > The instructions for building with Intel MKL at http://www.scipy.org/Installing_SciPy/Linux have just been updated. Does that help? Probably best to leave out the optional umfpack and suitesparse parts, see if you get MKL to work first. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimodisasha at gmail.com Wed Sep 28 13:11:00 2011 From: massimodisasha at gmail.com (massimo di stefano) Date: Wed, 28 Sep 2011 13:11:00 -0400 Subject: [SciPy-User] build scipy on osx lion : KeyError: 'FARCHFLAGS' Message-ID: hi All i just switched to osx lion .. tring to build scipy i'm having this error : http://paste.pound-python.org/show/13120/ ### Could not locate executable ifc customize GnuFCompiler Could not locate executable g77 customize Gnu95FCompiler Found executable /usr/bin/gfortran Traceback (most recent call last): File "setup.py", line 196, in setup_package() File "setup.py", line 187, in setup_package configuration=configuration ) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/core.py", line 186, in setup return old_setup(**new_attr) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 152, in setup dist.run_commands() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/command/build.py", line 37, in run old_build.run(self) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/command/build.py", line 127, in run self.run_command(cmd_name) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/command/build_clib.py", line 89, in run c_compiler=self.compiler) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 821, in new_fcompiler c_compiler=c_compiler) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 803, in get_default_fcompiler c_compiler=c_compiler) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 752, in _find_existing_fcompiler c.customize(dist) File 
"/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 502, in customize fflags = self.flag_vars.flags + dflags + oflags + aflags File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/environment.py", line 37, in __getattr__ return self._get_var(name, conf_desc) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/environment.py", line 51, in _get_var var = self._hook_handler(name, hook) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 676, in _environment_hook return hook() File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/gnu.py", line 285, in get_flags arch_flags = self._universal_flags(self.compiler_f90) File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/gnu.py", line 274, in _universal_flags farchflags = os.environ['FARCHFLAGS'] File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/UserDict.py", line 23, in __getitem__ raise KeyError(key) KeyError: 'FARCHFLAGS' ### have you any clue on how to fix it ? i used "setup.py build" as build system trying with scons i received the same error. i used this instruction : # clone scipy from git repository # cd into scipy export CC=gcc-4.2 export CXX=g++-4.2 export FFLAGS=-ff2c python setup.py build dyn-128-128-201-211:scipy epifanio$ which gfortran /usr/bin/gfortran dyn-128-128-201-211:scipy epifanio$ file /usr/bin/gfortran /usr/bin/gfortran: Mach-O universal binary with 2 architectures /usr/bin/gfortran (for architecture i386): Mach-O executable i386 /usr/bin/gfortran (for architecture x86_64): Mach-O 64-bit executable x86_64 dyn-128-128-201-211:scipy epifanio$ which python /Library/Frameworks/Python.framework/Versions/2.7/bin/python dyn-128-128-201-211:scipy epifanio$ file /Library/Frameworks/Python.framework/Versions/2.7/bin/python /Library/Frameworks/Python.framework/Versions/2.7/bin/python: Mach-O universal binary with 2 architectures /Library/Frameworks/Python.framework/Versions/2.7/bin/python (for architecture i386): Mach-O executable i386 /Library/Frameworks/Python.framework/Versions/2.7/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64 Many thanks for any help! --Massimo. -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Wed Sep 28 13:42:22 2011 From: denis-bz-gg at t-online.de (denis) Date: Wed, 28 Sep 2011 10:42:22 -0700 (PDT) Subject: [SciPy-User] uniformity test in scipy.stats or statsmodels ? Message-ID: <6e8f7762-5dbb-403d-935f-fc54974cd088@i28g2000yqn.googlegroups.com> Can someone please point me to a function howuniform( X ) where X is sorted and 0 <= X <= 1 in scipy.stats or statsmodels ? (There seem to be several such, cf. http://en.wikipedia.org/wiki/Minimum_distance_estimation but for me simplest is best.) Thanks, cheers -- denis From josef.pktd at gmail.com Wed Sep 28 16:15:13 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 28 Sep 2011 16:15:13 -0400 Subject: [SciPy-User] uniformity test in scipy.stats or statsmodels ? In-Reply-To: <6e8f7762-5dbb-403d-935f-fc54974cd088@i28g2000yqn.googlegroups.com> References: <6e8f7762-5dbb-403d-935f-fc54974cd088@i28g2000yqn.googlegroups.com> Message-ID: On Wed, Sep 28, 2011 at 1:42 PM, denis wrote: > Can someone please point me to a function ?howuniform( X ) ?where X is > sorted and 0 <= X <= 1 > in scipy.stats or statsmodels ? I'm not sure what you need. 
Are you looking for a test to see whether a sample X comes from the uniform distribution on (0,1)? I think kstest is the only one directly available. If you bin the data, then chisquare can also be used. I don't think any other test is available for the uniform distribution. Josef > (There seem to be several such, cf. http://en.wikipedia.org/wiki/Minimum_distance_estimation > but for me simplest is best.) > Thanks, cheers > ?-- denis > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From klonuo at gmail.com Wed Sep 28 20:36:38 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Thu, 29 Sep 2011 02:36:38 +0200 Subject: [SciPy-User] Can't build numpy 1.6.1 on Ubuntu 11.10 In-Reply-To: References: Message-ID: Thanks for your reply Ralf Problem seem perhaps more general and not related to numpy. I build numpy with same MKL and same Intel compilers on same computer but on Ubuntu 11.04. Also `gcc` was updated couple of times in the meantime, if that can have any relevance. I did try to change `site.cfg` and set different flags in numpy sources for Intel compiler, before I mailed, but without success. About Instructions you mention, latest article is from Nov 2010 which suggests building ATLAS with link to tarfile: ../configure -b 64 -Fa alg -fPIC --with-netlib-lapack-tarfile= And this was exactly my problem as I followed that article. Later as I didn't solve ICC issue I found another instruction suggesting that LAPACK first be build (with setting correct make.inc) then include in ATLAS configure as library and not tarfile. Result was that previously missing `libf77blas.a` was now created. That's also mentioned, as I found later, in older 2008 article on same page. Why 2010 article suggests including bare tarfile instead created library, don't know, but perhaps it works like that on some machines. Cheers On Wed, Sep 28, 2011 at 7:06 PM, Ralf Gommers wrote: > > > On Wed, Sep 28, 2011 at 12:10 PM, Klonuo Umom wrote: >> >> I build numpy w/o ATLAS or MKL, but speed comparison (~10 times slower depending on code) made me remove it and concentrate on the most boring part I wanted to avoid: building numpy with ATLAS or MKL >> I started with ATLAS, but just couldn't compile and error about missing `libf77blas.a` library which somehow wasn't created by build process. Google didn't give me pointers. As I couldn't do it with ATLAS, I step in Intel solution: > > The instructions for building with Intel MKL at http://www.scipy.org/Installing_SciPy/Linux have just been updated. Does that help? > > Probably best to leave out the optional umfpack and suitesparse parts, see if you get MKL to work first. > > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scipy at samueljohn.de Thu Sep 29 04:44:44 2011 From: scipy at samueljohn.de (Samuel John) Date: Thu, 29 Sep 2011 10:44:44 +0200 Subject: [SciPy-User] build scipy on osx lion : KeyError: 'FARCHFLAGS' In-Reply-To: References: Message-ID: <99276369-CA45-42B4-9CD0-F35817C8511C@samueljohn.de> Hi, perhaps you need the "right" gfortran compiler. I am not sure because you have /usr/bin/gfortran. I find it most easy to get via http://mxcl.github.com/homebrew/ . 
Do: /usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)" then do: brew install gfortran Then try your build again. For me this works great on Lion. Does this help you? -- Samuel On 28.09.2011, at 19:11, massimo di stefano wrote: > hi All > i just switched to osx lion .. tring to build scipy > i'm having this error : > > http://paste.pound-python.org/show/13120/ > > ### > > Could not locate executable ifc > customize GnuFCompiler > Could not locate executable g77 > customize Gnu95FCompiler > Found executable /usr/bin/gfortran > Traceback (most recent call last): > File "setup.py", line 196, in > setup_package() > File "setup.py", line 187, in setup_package > configuration=configuration ) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/core.py", line 186, in setup > return old_setup(**new_attr) > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 152, in setup > dist.run_commands() > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 953, in run_commands > self.run_command(cmd) > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 972, in run_command > cmd_obj.run() > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/command/build.py", line 37, in run > old_build.run(self) > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/command/build.py", line 127, in run > self.run_command(cmd_name) > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/cmd.py", line 326, in run_command > self.distribution.run_command(command) > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 972, in run_command > cmd_obj.run() > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/command/build_clib.py", line 89, in run > c_compiler=self.compiler) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 821, in new_fcompiler > c_compiler=c_compiler) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 803, in get_default_fcompiler > c_compiler=c_compiler) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 752, in _find_existing_fcompiler > c.customize(dist) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 502, in customize > fflags = self.flag_vars.flags + dflags + oflags + aflags > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/environment.py", line 37, in __getattr__ > return self._get_var(name, conf_desc) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/environment.py", line 51, in _get_var > var = self._hook_handler(name, hook) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/__init__.py", line 676, in _environment_hook > return hook() > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/gnu.py", line 285, in get_flags > arch_flags = self._universal_flags(self.compiler_f90) > File "/Library/Python/2.7/site-packages/numpy-override/numpy/distutils/fcompiler/gnu.py", line 274, in _universal_flags > farchflags = os.environ['FARCHFLAGS'] > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/UserDict.py", line 23, in __getitem__ > raise KeyError(key) > KeyError: 'FARCHFLAGS' > > ### > > have 
you any clue on how to fix it ? > > i used "setup.py build" as build system > trying with scons i received the same error. > > > i used this instruction : > > # clone scipy from git repository > # cd into scipy > > export CC=gcc-4.2 > export CXX=g++-4.2 > export FFLAGS=-ff2c > > python setup.py build > > > > dyn-128-128-201-211:scipy epifanio$ which gfortran > /usr/bin/gfortran > > dyn-128-128-201-211:scipy epifanio$ file /usr/bin/gfortran > /usr/bin/gfortran: Mach-O universal binary with 2 architectures > /usr/bin/gfortran (for architecture i386): Mach-O executable i386 > /usr/bin/gfortran (for architecture x86_64): Mach-O 64-bit executable x86_64 > > dyn-128-128-201-211:scipy epifanio$ which python > /Library/Frameworks/Python.framework/Versions/2.7/bin/python > > dyn-128-128-201-211:scipy epifanio$ file /Library/Frameworks/Python.framework/Versions/2.7/bin/python > /Library/Frameworks/Python.framework/Versions/2.7/bin/python: Mach-O universal binary with 2 architectures > /Library/Frameworks/Python.framework/Versions/2.7/bin/python (for architecture i386): Mach-O executable i386 > /Library/Frameworks/Python.framework/Versions/2.7/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64 > > > Many thanks for any help! > > --Massimo. > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From meelmaarten at gmail.com Thu Sep 29 12:38:49 2011 From: meelmaarten at gmail.com (meelmaarten) Date: Thu, 29 Sep 2011 16:38:49 +0000 (UTC) Subject: [SciPy-User] Separate scales for left and right axis broken References: Message-ID: Christopher Peters edisonmission.com> writes: > > > All, > I am trying to run the code at http://pytseries.sourceforge.net/lib.plotting.examples.html#separate-scales-for-left-and-right-axis > to generate separate scales for left and right axes for tsplot in scikits.timeseries.lib.plotlib, > but I am getting an error. > Package versions: > Numpy: ?1.5.1 > Matplotlib: ?1.0.1 > Scikits Timeseries: ?0.91.3 > Code to test: Dear all, I am having the same problem, but running it with matplotlib version 1.00. I do not feel fit enough for trying to solve it. Any suggestion would REALLY help! Many thanks in advance! Maarten From brownj at seattleu.edu Thu Sep 29 12:37:05 2011 From: brownj at seattleu.edu (Jeff Brown) Date: Thu, 29 Sep 2011 16:37:05 +0000 (UTC) Subject: [SciPy-User] inverse function of a spline References: Message-ID: gmail.com> writes: > > On Fri, May 7, 2010 at 4:37 PM, nicky van foreest gmail.com> wrote: > > Hi Josef, > > > >> If I have a cubic spline, or any other smooth interpolator in scipy, > >> is there a way to get the > >> inverse function directly? > > > > How can you ensure that the cubic spline approx is non-decreasing? I > > actually wonder whether using cubic splines is the best way to > > approximate distribution functions. > > Now I know it's not, but I was designing the extension to the linear case > on paper instead of in the interpreter, and got stuck on the wrong > problem. > There's an algorithm for making constrained-to-be-monotonic spline interpolants (only in one dimension, though). The reference is Dougherty et al 1989 Mathematics of Computation, vol 52 no 186 pp 471-494 (April 1989). This is available on-line at www.jstor.org. 
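If you only need something usable rather than the full algorithm from that paper, here is a rough, untested sketch of the idea in one dimension. It assumes a SciPy recent enough to ship the shape-preserving PCHIP interpolator (scipy.interpolate.PchipInterpolator), which is a different monotone scheme than the paper's, and the data below are just a toy monotone curve I made up for illustration:

import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.optimize import brentq

# toy monotone data, e.g. a CDF evaluated on a grid
x = np.linspace(-3, 3, 25)
y = 0.5 * (1.0 + np.tanh(x))     # strictly increasing on this grid

f = PchipInterpolator(x, y)      # monotonicity-preserving interpolant

# inverse, option 1: interpolate the swapped pairs
# (only valid because y is strictly increasing)
finv = PchipInterpolator(y, x)

# inverse, option 2: root finding, one call per target value
def finv_root(target):
    return brentq(lambda t: f(t) - target, x[0], x[-1])

print finv(0.5), finv_root(0.5)  # both should be close to 0.0

The swapped-axes trick in option 1 is only safe because a shape-preserving interpolant of strictly monotone data is itself monotone; if the data merely non-decrease (flat stretches), the root-finding version is the safer of the two.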
From gorkypl at gmail.com Thu Sep 29 15:47:43 2011
From: gorkypl at gmail.com (Paweł Rumian)
Date: Thu, 29 Sep 2011 21:47:43 +0200
Subject: [SciPy-User] Separate scales for left and right axis broken
In-Reply-To:
References:
Message-ID:

> Dear all,
>
> I am having the same problem, but running it with matplotlib version 1.00. I do
> not feel fit enough for trying to solve it. Any suggestion would REALLY help!
> Many thanks in advance!

For me it worked after changing one line in scikits/timeseries/lib/plotlib.py:
http://old.nabble.com/My-attempt-to-fix-an-issue-with-separate-scales-for-left-and-right-axis-in-scikits.timeseries---correct--td32283111.html

greetings,
Paweł Rumian

From harijay at gmail.com Thu Sep 29 17:24:22 2011
From: harijay at gmail.com (hari jayaram)
Date: Thu, 29 Sep 2011 17:24:22 -0400
Subject: [SciPy-User] newbie help: splprep to get spline interpolation of 2d signal : ValueError from code but not in command line example
Message-ID:

I am rather new to scipy and numpy. I am using numpy version 1.3.0 and scipy version 0.7.0 on an Ubuntu 64-bit box with Python version 2.6.5.

# Section I: My test case

I can get the following commands, based on the documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splrep.html), to work in the python shell:

import numpy
from numpy import *
from scipy import interpolate

x = linspace(20,81,2339)
y = sin(x)
tck = interpolate.splrep(x, y)

In this case the interpolation succeeded without error. However, to troubleshoot my function, if I try on the command line:

>>> x.shape
(2339,)
>>> my_idim, my_m = x.shape
Traceback (most recent call last):
  File "", line 1, in
ValueError: need more than 1 value to unpack

# Section II: My actual program

Now in my actual program I get an error from the fitpack routine which seems to suggest it does not like the x and y data I am feeding the splprep function. The shapes are identical to the test example above, although the signal is like a very wonky sigmoidal signal that I want to interpolate. The error I get is from line 191 in fitpack.py:

# Printed x shape , y shape , x dtype , y dtype , x type , y type
(2339,) (2339,) float64 float64 20.04 84.86
Traceback (most recent call last):
  File "ReadAndSplitInput.py", line 50, in
    plot_for_well(well_id)
  File "ReadAndSplitInput.py", line 35, in plot_for_well
    tck =interpolate.splprep(xs,ys)
  File "/usr/lib/python2.6/dist-packages/scipy/interpolate/fitpack.py", line 191, in splprep
    idim,m=x.shape
ValueError: need more than 1 value to unpack

I get the same ValueError in the example I adapted from the docs (as I mentioned in Section I).

Any suggestions for troubleshooting what's happening will be greatly appreciated. Thanks a lot for your help.

Hari

#################
Here is my function:
#################

def plot_for_well(well_id):
    xvals = []
    yvals = []
    for vals in mega_data_dict[well_id]:
        xvals.append("%0.2f" % float(vals[0]))
        yvals.append("%0.2f" % (float(vals[1])))

    xs = array(xvals , numpy.dtype('f8'))
    ys = array(yvals, numpy.dtype('f8'))
    print xs.shape, ys.shape,xs.dtype,ys.dtype, min(xvals),max(xvals),type(xs),type(ys)

    tck =interpolate.splprep(x=ys)
    xcalc = linspace(min(xvals),max(xvals),len(xvals))
    ycalc = interpolate.splev(xcalc,tck)

    ax = plt.subplot(111)
    ax.set_yscale('log')
    ax.plot(xvals,yvals,"o",xcalc,ycalc)
    plt.show()

Python particulars:

hari@hari:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import scipy
>>> numpy.__version__
'1.3.0'
>>> scipy.__version__
'0.7.0'

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pav at iki.fi Thu Sep 29 17:34:46 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Thu, 29 Sep 2011 23:34:46 +0200
Subject: [SciPy-User] newbie help: splprep to get spline interpolation of 2d signal : ValueError from code but not in command line example
In-Reply-To:
References:
Message-ID:

On 29.09.2011 23:24, hari jayaram wrote:
[clip]
> x = linspace(20,81,2339)
> y = sin(x)
> tck = interpolate.splrep(x, y)
[clip]
> tck = interpolate.splprep(x=ys)

Note that there are two functions: `splrep` and `splprep`, and they do somewhat different things (read the docs for each to find out more).

You probably intended to write `splrep(xs, ys)` in the latter case.

--
Pauli Virtanen

From Dharhas.Pothina at twdb.state.tx.us Fri Sep 30 11:48:42 2011
From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina)
Date: Fri, 30 Sep 2011 10:48:42 -0500
Subject: [SciPy-User] Python Job Opening at TWDB.
Message-ID: <4E859E8A0200009B0003F04F@GWWEB.twdb.state.tx.us>

Hi All,

Sorry for cross posting, I know there is a large overlap in these three mailing lists. We have an opening in my team for a full time temporary employee. This position has current funding for the next 6-9 months, with additional federal funding arriving in the Spring that should allow the position to continue for at least 2 more years.

Details at the following link: http://www.twdb.state.tx.us/jobs/doc/12-03.pdf

Please forward this email to any folks who may be interested. They can contact me if they have any questions about the position.

thanks,

- dharhas

Dharhas Pothina, Ph.D., P.E.
Team Lead - Data, Analysis and Modeling
Surface Water Resources Division
Texas Water Development Board
1700 North Congress Ave.
P.O. Box 13231
Austin, TX 78711-3231
Tel: (512) 936-0818
Fax: (512) 936-0816
dharhas.pothina at twdb.state.tx.us
www.twdb.state.tx.us

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Pothina, Dharhas.vcf
Type: application/octet-stream
Size: 317 bytes
Desc: not available
URL:

From josef.pktd at gmail.com Fri Sep 30 12:37:20 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 30 Sep 2011 12:37:20 -0400
Subject: [SciPy-User] inverse function of a spline
In-Reply-To:
References:
Message-ID:

On Thu, Sep 29, 2011 at 12:37 PM, Jeff Brown wrote:
> gmail.com> writes: > >> >> On Fri, May 7, 2010 at 4:37 PM, nicky van foreest gmail.com> > wrote: >> > Hi Josef, >> > >> >> If I have a cubic spline, or any other smooth interpolator in scipy, >> >> is there a way to get the >> >> inverse function directly? >> > >> > How can you ensure that the cubic spline approx is non-decreasing? I >> > actually wonder whether using cubic splines is the best way to >> > approximate distribution functions. >> >> Now I know it's not, but I was designing the extension to the linear case >> on paper instead of in the interpreter, and got stuck on the wrong >> problem. >> > > There's an algorithm for making constrained-to-be-monotonic spline interpolants > (only in one dimension, though). The reference is Dougherty et al 1989 > Mathematics of Computation, vol 52 no 186 pp 471-494 (April 1989). This is > available on-line at www.jstor.org.

Thanks for the reference. Maybe Ann's interpolators in scipy that take derivatives could be used for this.
Shape preserving splines or piecewise polynomials would make a nice addition to scipy, but I'm only a potential user. I have dropped this for the moment, after taking a detour with (global) orthonormal polynomial approximation, where I also haven't solved the integration and function inversion problem yet (nice pdf but only brute force cdf and ppf).

Josef

> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From cpeters at edisonmission.com Fri Sep 30 15:00:00 2011
From: cpeters at edisonmission.com (Christopher Peters)
Date: Fri, 30 Sep 2011 15:00:00 -0400
Subject: [SciPy-User] AUTO: Christopher Peters is out of the office (returning 10/04/2011)
Message-ID:

I am out of the office until 10/04/2011.

I am out of the office. Please email urgent requests to Tom Green.

Note: This is an automated response to your message "[SciPy-User] Python Job Opening at TWDB." sent on 9/30/2011 11:48:42 AM. This is the only notification you will receive while this person is away.

From harijay at gmail.com Fri Sep 30 16:08:41 2011
From: harijay at gmail.com (hari jayaram)
Date: Fri, 30 Sep 2011 16:08:41 -0400
Subject: [SciPy-User] newbie help: splprep to get spline interpolation of 2d signal : ValueError from code but not in command line example
In-Reply-To:
References:
Message-ID:

Thanks Pauli for your email. Repeated experimentation made me gloss over the two different functions; splrep ("spline representation") is what I needed to use, as you pointed out.

I could get my spline interpolation to work just great with splrep once I used only every 10th point or so, since the data had a high-frequency component which was not useful to the analysis.

Thanks for your help,
Hari

My code that worked (with the imports and the xs/ys conversion filled back in):

from numpy import array, linspace
from scipy import interpolate
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA

def plot_for_well(well_id, window=10):
    xvals = []
    yvals = []
    for vals in mega_data_dict[well_id]:
        xvals.append(float(vals[0]))
        yvals.append(float(vals[1]))

    window = int(window)
    # convert the lists to arrays, as in my earlier message
    xs = array(xvals)
    ys = array(yvals)

    tck = interpolate.splrep(xs, ys, k=5, s=0.3)
    xcalc = linspace(min(xvals), max(xvals), len(xvals))
    ycalc = interpolate.splev(xcalc, tck)
    my_derivative = interpolate.splev(xcalc, tck, der=3)
    # print xcalc(xs.index(max(my_derivative)))
    print ycalc

    ax = host_subplot(111, axes_class=AA.Axes)
    plt.subplots_adjust(right=0.75)
    par1 = ax.twinx()
    par2 = ax.twinx()

    offset = 60
    new_fixed_axis = par2.get_grid_helper().new_fixed_axis
    par2.axis["right"] = new_fixed_axis(loc="right", axes=par2, offset=(offset, 0))
    # par2.axis["right"].toggle(all=True)

    # ax.set_yscale('log')
    par1.plot(xs, ys, "o", xcalc, ycalc)
    par2.plot(xcalc, my_derivative)
    plt.show()

On Thu, Sep 29, 2011 at 5:34 PM, Pauli Virtanen wrote:
> On 29.09.2011 23:24, hari jayaram wrote:
> [clip]
> > x = linspace(20,81,2339)
> > y = sin(x)
> > tck = interpolate.splrep(x, y)
> [clip]
> > tck = interpolate.splprep(x=ys)
>
> Note that there are two functions: `splrep` and `splprep`, and they do
> somewhat different things (read the docs for each to find out more).
>
> You probably intended to write `splrep(xs, ys)` in the latter case.
>
> --
> Pauli Virtanen
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sniemi at email.unc.edu Fri Sep 30 11:18:33 2011
From: sniemi at email.unc.edu (Niemi, Sami)
Date: Fri, 30 Sep 2011 15:18:33 +0000
Subject: [SciPy-User] Fitting a model to an image to recover the position
Message-ID: <42041109-D42D-4C0C-8911-65631307C91E@unc.edu>

Hello,

I am trying to solve a problem where I need to fit a model to an image to recover the position of the model in that image. The real-life application is more complicated (fitting sparse field spectroscopic data to an SDSS r-band image, if you are interested) than the simple example I give below, but it is analogous. The most significant differences are that in the real-life application I need to allow rotations (so I need to find a position x and y and a rotation r that minimize, for example, the chi**2 value), that the difference between the model and the image might be larger than the small random error applied in the simple example, and that there is less information in one of the directions because of the finite slit width.

The simple example I show below works for the real-life problem, but it is far from effective or elegant. I would like to use a built-in minimization method like scipy.optimize.leastsq to solve this problem, but I haven't found a way to get it to work on my problem. Any ideas how I could do this better?

Cheers,
Sami

Simple Example:

import numpy as np

def findPosition(image, model, xguess, yguess, length, width, xrange=20, yrange=20):
    '''
    Finds the x and y position of a model in an image that minimizes chi**2
    by looping over a grid around the initial guess given by xguess and yguess.
    This method is far from optimal, so the question is how to do this with
    scipy.optimize.leastsq or some other built-in minimization algorithm.
    '''
    out = []
    for x in range(-xrange, xrange):
        for y in range(-yrange, yrange):
            obs = image[yguess+y:yguess+y+length, xguess+x:xguess+x+width].ravel()
            chi2 = np.sum((obs - model)**2 / model)
            out.append([chi2, x, y])
    out = np.array(out)
    return out, np.argmin(out[:,0])

#create an image of 100 times 100 pixels of random data as an example, this represents the imaging data
image = np.random.rand(10000).reshape(100,100) + 1000

#take a slice and add a small error, this represents the model that had been recovered somewhere else
xpos, ypos = 80, 20
length, width = 21, 4
model = (image[ypos:ypos+length, xpos:xpos+width].copy() + np.random.rand(1)[0]*0.1).ravel()

#initial guess of the position (this is close, the correct one is 80, 20)
xguess, yguess = 75, 33

#find the position using the idiotic algorithm
out, armin = findPosition(image, model, xguess, yguess, length, width)

#print out the recovered shift and position
print 'xshift yshift chi**2'
print out[armin,1], out[armin, 2], out[armin, 0]
print 'xfinal yfinal'
print xguess+out[armin,1], yguess+out[armin, 2], 'which should be %i, %i' % (xpos, ypos)
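To make the question more concrete, below is a rough, untested sketch of the direction I have in mind: sample the image at (possibly fractional) candidate positions with scipy.ndimage.map_coordinates, do the coarse search with scipy.optimize.brute instead of my hand-written loop, and then refine with scipy.optimize.leastsq. The function names, the grid density (Ns=41) and the interpolation order are just placeholders I picked for the sketch; whether this is a sensible formulation at all is exactly what I am unsure about.

import numpy as np
from scipy import ndimage, optimize

np.random.seed(0)

# the same flavour of fake data as in the simple example above
image = np.random.rand(100, 100) + 1000.
xpos, ypos = 80, 20
length, width = 21, 4
model = image[ypos:ypos+length, xpos:xpos+width] + np.random.rand(1)[0]*0.1

# pixel grid of the model footprint, relative to the (x, y) corner being fitted
yy, xx = np.mgrid[0:length, 0:width]

def residuals(p):
    x0, y0 = p
    # bilinear sampling of the image at the candidate (fractional) position
    obs = ndimage.map_coordinates(image, [yy + y0, xx + x0], order=1)
    return (obs - model).ravel()

def chi2(p):
    return np.sum(residuals(p)**2)

# coarse global search over the same +/- 20 pixel window as findPosition,
# then a local least-squares refinement starting from the best grid point
xguess, yguess = 75, 33
ranges = ((xguess - 20, xguess + 20), (yguess - 20, yguess + 20))
p_coarse = optimize.brute(chi2, ranges, Ns=41, finish=None)
p_fine, ier = optimize.leastsq(residuals, p_coarse)

print 'coarse:', p_coarse
print 'refined:', p_fine, '(should be close to %i, %i)' % (xpos, ypos)

A plain leastsq started from the original guess (75, 33) would presumably get stuck, since the test image is essentially noise and the chi**2 surface is full of local minima, which is why I keep the coarse grid step; maybe there is a smarter way around that.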