From j.reid at mail.cryst.bbk.ac.uk Fri Jul 1 03:42:40 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 01 Jul 2011 08:42:40 +0100 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B5E59.8050203@usi.ch> References: <4E0B58AE.3090706@cs.wisc.edu> <4E0B5E59.8050203@usi.ch> Message-ID: On 29/06/11 18:18, Giovanni Luca Ciampaglia wrote: > Hi, > there are several strategies, depending on your problem. You could use a > surrogate model, like a Gaussian Process, to fit the data (see for > example Higdon et al > http://epubs.siam.org/sisc/resource/1/sjoce3/v26/i2/p448_s1?isAuthorized=no). > I have personally used scikits.learn for GP estimation but there is also > PyMC that should do the same (never tried it). > I can also immodestly recommend my own code for Gaussian processes. It is not based on Markov chain Monte Carlo but rather a maximum likelihood approach: http://sysbio.mrc-bsu.cam.ac.uk/group/index.php/Gaussian_processes_in_python From hhh.guo at gmail.com Fri Jul 1 03:48:13 2011 From: hhh.guo at gmail.com (Ning Guo) Date: Fri, 01 Jul 2011 15:48:13 +0800 Subject: [SciPy-User] Question: scipy.stats.gamma.fit Message-ID: <4E0D7BBD.6030607@gmail.com> Dear scipy-users, I'm using scipy.stats.gamma.fit to fit a set of random variables for gamma distribution. And to validate the results I also use the fitdistr function in R. However the results generated by these two packages are different, i.e. shape parameter and scale parameter for the gamma pdf are different. Though the difference is not large, I'm wondering what causes this difference. I think both of them are using maximum likelihood estimation to fit the function. Best regards! Ning From pgmdevlist at gmail.com Fri Jul 1 05:07:36 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 1 Jul 2011 11:07:36 +0200 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: Message-ID: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> On Jul 1, 2011, at 1:45 AM, David Montgomery wrote: > Hi, > > Using scikits timeseries I can create daily and hourly time series....no prob > > But.... > > I have time series at 15 minutes intervals...this I dont know how to do... > > Can a timeseries array handle 15 min intervals? > Do I use a minute intervals and use mask arrays for the missing minutes? > Also..I can figure out how to create a array at minute intervals. > > So..what is best practice? Any examples? First possibility, you get the latest experimental version of scikits.timeseries on github. There's support for multiple of frequencies (like 15min). If you're not comfortable with tinkering with experimental code, you have several solutions, depending on your problem: 1. You create a minute-freq series and mask 14/15 of the data. Simple but wasteful and problematic if you have a large series. Still, the easiest solution 2. You create a hour-freq series as a 2D array: each column would correspond to the data for one quarter of this hour. That's more compact in terms of memory, but you'll have to jump through some extra hoops if you need to convert the array to another frequency (conversion routines don't really like 2D arrays...) 
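To make Pierre's option 1 concrete, here is a minimal sketch of the "mask 14 out of every 15 minutes" idea using plain numpy masked arrays; the actual series would be wrapped in scikits.timeseries (ts.date_array / ts.time_series), and the sizes and names below (minutes_per_day, readings) are invented purely for illustration:

import numpy as np
import numpy.ma as ma

# One day of quarter-hourly readings, stored on a minute-frequency grid.
minutes_per_day = 24 * 60
readings = np.random.rand(minutes_per_day // 15)   # one value per 15 minutes

# Start with every minute masked, then unmask every 15th slot.
data = ma.masked_all(minutes_per_day)
data[::15] = readings

# 14 out of every 15 slots stay masked: simple, but wasteful for long series.
print("%d valid values out of %d" % (data.count(), data.size))

Option 2 amounts to reshaping the same values into an (hours, 4) array, one column per quarter hour, with the 2D conversion caveats Pierre mentions above.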
From davidmontgomery at gmail.com Fri Jul 1 05:22:00 2011 From: davidmontgomery at gmail.com (David Montgomery) Date: Fri, 1 Jul 2011 19:22:00 +1000 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: Awesoke... for the github version...any docs or an example for creating a 15 min array? On Fri, Jul 1, 2011 at 7:07 PM, Pierre GM wrote: > > On Jul 1, 2011, at 1:45 AM, David Montgomery wrote: > >> Hi, >> >> Using scikits timeseries I can create daily and hourly time series....no prob >> >> But.... >> >> I have time series at 15 minutes intervals...this I dont know how to do... >> >> Can a timeseries array handle 15 min intervals? >> Do I use a minute intervals and use mask arrays for the missing minutes? >> Also..I can figure out how to create a array at minute intervals. >> >> So..what is best practice? ?Any examples? > > First possibility, you get the latest experimental version of scikits.timeseries on github. There's support for multiple of frequencies (like 15min). > If you're not comfortable with tinkering with experimental code, you have several solutions, depending on your problem: > 1. You create a minute-freq series and mask 14/15 of the data. Simple but wasteful and problematic if you have a large series. Still, the easiest solution > 2. You create a hour-freq series as a 2D array: each column would correspond to the data for one quarter of this hour. That's more compact in terms of memory, but you'll have to jump through some extra hoops if you need to convert the array to another frequency (conversion routines don't really like 2D arrays...) > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pgmdevlist at gmail.com Fri Jul 1 06:11:15 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 1 Jul 2011 12:11:15 +0200 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: On Jul 1, 2011, at 11:22 AM, David Montgomery wrote: > Awesoke... > > for the github version...any docs or an example for creating a 15 min array? Use the 'timestep' optional argument in scikits.timeseries.date_array. BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about). Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be some major overhaul in the mid future once Mark W. new datetime dtype will be stable. From member at linkedin.com Fri Jul 1 06:23:40 2011 From: member at linkedin.com (Jelle Feringa via LinkedIn) Date: Fri, 1 Jul 2011 10:23:40 +0000 (UTC) Subject: [SciPy-User] Invitation to connect on LinkedIn Message-ID: <1579967864.3260450.1309515820011.JavaMail.app@ela4-bed32.prod> LinkedIn ------------ Jelle Feringa requested to add you as a connection on LinkedIn: ------------------------------------------ Jose, I'd like to add you to my professional network on LinkedIn. 
- Jelle Accept invitation from Jelle Feringa http://www.linkedin.com/e/-3wy1w2-gpkzwpgn-5n/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I182736195_20/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYMcBYRej4ScPsOe359bTBNsQ9ahRZvbP8PejsVcj8Md3cLrCBxbOYWrSlI/EML_comm_afe/ View invitation from Jelle Feringa http://www.linkedin.com/e/-3wy1w2-gpkzwpgn-5n/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I182736195_20/30OnPkVcjoPdP8UckALqnpPbOYWrSlI/svi/ ------------------------------------------ DID YOU KNOW your LinkedIn profile helps you control your public image when people search for you? Setting your profile as public means your LinkedIn profile will come up when people enter your name in leading search engines. Take control of your image! http://www.linkedin.com/e/-3wy1w2-gpkzwpgn-5n/ewp/inv-22/ -- (c) 2011, LinkedIn Corporation -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgomezdans at gmail.com Fri Jul 1 06:36:48 2011 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Fri, 1 Jul 2011 11:36:48 +0100 Subject: [SciPy-User] Weird error in fmin_l_bfgs_b Message-ID: Hi, I'm getting an error in scipy.optimize.fmin_l_bfgs_b, apparently related to the fortran wrapper. This is strange, because exactly the same problem works well with the TNC solver. I have a function that returns both a scalar value (that will be minimised) and the derivative of the function at that point. The error in the L-BFG-S solver is File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, in fmin_l_bfgs_b isave, dsave) ValueError: failed to initialize intent(inout) array -- input not fortran contiguous My code looks like this: # x0 is the starting point, a 1d array >>> solution, x, info = scipy.optimize.fmin_tnc( cost_function, x0, args=([operators]), bounds=bounds ) # Using fmin_tnc works well, solution is what I expect it to be >> solution, cost, information = scipy.optimize.fmin_l_bfgs_b ( cost_function, solution, bounds=bounds, args=[ operators ], iprint=101 ) 2011-07-01 11:34:24,703 - eoldas.Model - INFO - 46 days, 46 quantised days tnc: Version 1.3, (c) 2002-2003, Jean-Sebastien Roy (js at jeannot.org) tnc: RCS ID: @(#) $Jeannot: tnc.c,v 1.205 2005/01/28 18:27:31 js Exp $ NIT NF F GTG 0 1 1.988301629303336E+02 8.17118991E+06 tnc: fscale = 0.000249879 1 5 1.338514420154698E+01 1.82689516E+04 tnc: fscale = 0.00528464 2 9 9.476573219561992E+00 2.21390020E+04 3 19 6.684083971679802E+00 3.88897225E+03 4 69 6.274247682836059E+00 2.43671753E+03 tnc: |fn-fn-1] = 4.5037e-13 -> convergence 5 120 6.274247682835608E+00 2.43671753E+03 tnc: Converged (|f_n-f_(n-1)| ~= 0) RUNNING THE L-BFGS-B CODE * * * Machine precision = 1.084D-19 N = 46 M = 10 L = -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 X0 = 5.6013D-02 1.1717D-01 1.9201D-01 2.7557D-01 3.7013D-01 4.5702D-01 5.3491D-01 6.0661D-01 6.7624D-01 7.4649D-01 8.0318D-01 8.5203D-01 8.8633D-01 9.0102D-01 8.9914D-01 8.7521D-01 8.2816D-01 7.6529D-01 7.0559D-01 6.5371D-01 6.0520D-01 5.5814D-01 5.0991D-01 4.4783D-01 3.7790D-01 3.0041D-01 2.1894D-01 
1.5147D-01 1.0832D-01 8.3926D-02 6.6473D-02 4.8621D-02 3.2567D-02 2.0086D-02 1.0881D-02 2.4890D-03 8.8000D-04 -4.2729D-03 -4.6658D-03 -5.5940D-03 -4.1690D-03 -1.2577D-02 -2.2529D-02 -2.9114D-02 -1.5938D-02 1.9755D-02 U = 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 At X0 0 variables are exactly at the bounds Traceback (most recent call last): File "example_identity.py", line 199, in main ( sys.argv ) File "example_identity.py", line 166, in main solution, cost, information = scipy.optimize.fmin_l_bfgs_b ( cost_function, solution, bounds=bounds, args=[ operators ], iprint=101 ) File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, in fmin_l_bfgs_b isave, dsave) ValueError: failed to initialize intent(inout) array -- input not fortran contiguous Any clues of where to look for issues? Thanks! jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Fri Jul 1 12:52:08 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 1 Jul 2011 12:52:08 -0400 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: On Fri, Jul 1, 2011 at 6:11 AM, Pierre GM wrote: > > On Jul 1, 2011, at 11:22 AM, David Montgomery wrote: > >> Awesoke... >> >> for the github version...any docs or an example for creating a 15 min array? > > Use the 'timestep' optional argument in scikits.timeseries.date_array. > > BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about). > Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be some major overhaul in the mid future once Mark W. new datetime dtype will be stable. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > depending on your data manipulation needs, you could also give pandas a shot-- generating 15-minute date ranges for example is quite simple: In [3]: DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15)) Out[3]: offset: <15 Minutes>, tzinfo: None [2011-07-01 00:00:00, ..., 2011-07-02 00:00:00] length: 97 The date range can be used to conform a time series you loaded from some source: ts.reindex(dr, method='pad') ('pad' a.k.a. "ffill" propagates values forward into holes, optional) I've got some resampling code in the works that would help with e.g. converting 15-minute data into hourly data or that sort of thing but it's in less-than-complete form at the moment so like I said depends on what you need to do. Give me a few weeks on that bit =) best, Wes From johnl at cs.wisc.edu Fri Jul 1 13:11:25 2011 From: johnl at cs.wisc.edu (J. 
David Lee) Date: Fri, 01 Jul 2011 12:11:25 -0500 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B58AE.3090706@cs.wisc.edu> References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: <4E0DFFBD.7050905@cs.wisc.edu> On 06/29/2011 11:54 AM, J. David Lee wrote: > Hello, > > I'm attempting to perform a fit of a model function's output to some > measured data. The model has around 12 parameters, and takes tens of > minutes to run. I have access to a cluster with several thousand > processors that can run the simulations in parallel, so I'm wondering if > there are any algorithms out there that I can use to leverage this > computing power to efficiently solve my problem - that is, besides grid > searches or Monte-Carlo methods. > > Thanks for your help, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I want to thank everyone for their suggestions. I've read through most of the links presented, and am getting a clearer idea of what I need to do. Here's a quick clarification of my problem for those who are interested: I'm running a single-processor plasma simulation modeling an experiment. It has tens or hundreds of parameters, but most are constrained by measurements. For my purposes, the output consists of several x-ray spectra which I am trying to match against measured spectra. I have about 12 or 14 parameters in all that I am changing in order to match the spectra. Each run of the simulation takes a few to a few tens of minutes. I have the ability to run the compiled code on a number of machines, but I can't easily run python scripts on the machines. After some thinking, I'm considering the feasibility of parallelizing the routines in scipy's optimize module. My initial thought is to allow the user to specify a function that would run the objective function on multiple inputs. This would be useful, for example, when performing a simplex shrink, or in numerical gradient / hessian calculations with multiple variables. From my point of view, this would allow me to use a hybrid Monte-Carlo/minimization procedure to look for a global minimum. I'm interested to hear other's opinions on the matter. Thanks again, David From josef.pktd at gmail.com Fri Jul 1 14:34:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 1 Jul 2011 14:34:56 -0400 Subject: [SciPy-User] Question: scipy.stats.gamma.fit In-Reply-To: <4E0D7BBD.6030607@gmail.com> References: <4E0D7BBD.6030607@gmail.com> Message-ID: On Fri, Jul 1, 2011 at 3:48 AM, Ning Guo wrote: > Dear scipy-users, > > I'm using scipy.stats.gamma.fit to fit a set of random variables for > gamma distribution. And to validate the results I also use the fitdistr > function in R. However the results generated by these two packages are > different, i.e. shape parameter and scale parameter for the gamma pdf > are different. Though the difference is not large, I'm wondering what > causes this difference. I think both of them are using maximum > likelihood estimation to fit the function. Do you have an example or a test case? It's difficult to guess what might be different. None of the fit methods are verified against R or tested for correctness. Contributions to the test suite and possible bugfixes would be appreciated. Josef > > Best regards! 
> Ning > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Fri Jul 1 17:46:53 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 1 Jul 2011 23:46:53 +0200 Subject: [SciPy-User] ANN: NumPy 1.6.1 release candidate 2 Message-ID: Hi, I am pleased to announce the availability (only a little later than planned) of the second release candidate of NumPy 1.6.1. This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() If no new problems are reported, the final release will be in one week. Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Fri Jul 1 18:08:31 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 1 Jul 2011 18:08:31 -0400 Subject: [SciPy-User] Weird error in fmin_l_bfgs_b In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 6:36 AM, Jose Gomez-Dans wrote: > Hi, > I'm getting an error in scipy.optimize.fmin_l_bfgs_b, apparently related to > the fortran wrapper. This is strange, because exactly the same problem works > well with the TNC solver. I have a function that returns both a scalar value > (that will be minimised) and the derivative of the function at that point. > The error in the L-BFG-S solver is > File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, > in fmin_l_bfgs_b > ??? isave, dsave) > ValueError: failed to initialize intent(inout) array -- input not fortran > contiguous > > > My code looks like this: > > # x0 is the starting point, a 1d array >>>> solution, x, info = scipy.optimize.fmin_tnc( cost_function, x0, >>>> args=([operators]),? bounds=bounds ) > # Using fmin_tnc works well, solution is what I expect it to be > >>> solution, cost, information = scipy.optimize.fmin_l_bfgs_b ( >>> cost_function, solution, bounds=bounds,? args=[ operators ], iprint=101 ) I've run into this before, though rarely. Is solution from TNC Fortran contiguous? It's written in C though I don't know if it matters. IIUC, it's the only thing that's intent inout and an array. solution.flags If this is indeed the problem, is this something that can be automatically fixed in the python code of fmin_l_bfgs_b beforehand at the expense of a copy? Skipper From hhh.guo at gmail.com Sat Jul 2 01:59:43 2011 From: hhh.guo at gmail.com (Ning Guo) Date: Sat, 02 Jul 2011 13:59:43 +0800 Subject: [SciPy-User] Question: scipy.stats.gamma.fit In-Reply-To: References: <4E0D7BBD.6030607@gmail.com> Message-ID: <4E0EB3CF.9060200@gmail.com> On Saturday, July 02, 2011 02:34 AM, josef.pktd at gmail.com wrote: > On Fri, Jul 1, 2011 at 3:48 AM, Ning Guo wrote: >> Dear scipy-users, >> >> I'm using scipy.stats.gamma.fit to fit a set of random variables for >> gamma distribution. And to validate the results I also use the fitdistr >> function in R. 
However the results generated by these two packages are >> different, i.e. shape parameter and scale parameter for the gamma pdf >> are different. Though the difference is not large, I'm wondering what >> causes this difference. I think both of them are using maximum >> likelihood estimation to fit the function. > Do you have an example or a test case? > It's difficult to guess what might be different. None of the fit > methods are verified against R or tested for correctness. > > Contributions to the test suite and possible bugfixes would be appreciated. > > Josef > >> Best regards! >> Ning >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user Thanks Josef, I simply used scipy.gamma.rvs(0.8,loc=0,scale=1.2,size=1000) to generate random variables, and then used scipy.gamma.fit() to estimate the parameters. To validate, I used fitdistr function in R and gamfit function in GNU Octave. These three packages give different results (fitdistr and gamfit offer closer results). Actually, I am not a statistics guy. I supposed they should generate exactly same results since they all use MLE. But now I see not necessarily exactly the same due to that mathematical implementation of MLE is very complicated. Since results are close to each other more or less, I'll not be bothered by the differences and will choose anyone for the fitting. Best regards! Ning -- Geotechnical Group Department of Civil and Environmental Engineering Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong From mmueller at python-academy.de Sat Jul 2 08:29:31 2011 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Sat, 02 Jul 2011 14:29:31 +0200 Subject: [SciPy-User] PyCon DE 2011 - Call for Proposals extended to July 15, 2011 Message-ID: <4E0F0F2B.40601@python-academy.de> PyCon DE 2011 - Deadline for Proposals extended to July 15, 2011 ================================================================ The deadline for talk proposals is extended to July 15, 2011. You would like to talk about your Python project to the German-speaking Python community? Just submit your proposal within the next two weeks: http://de.pycon.org/2011/speaker/ About PyCon DE 2011 ------------------- The first PyCon DE will be held October 4-9, 2011 in Leipzig, Germany. The conference language will be German. Talks in English are possible. Please contact us for details. The call for proposals is now open. Please submit your talk by June 30, 2011 online. There are two types of talks: standard talks (20 minutes + 5 minutes Q&A) and long talks (45 minutes + 10 minutes Q&A). More details about the call can be found on the PyCon DE website: http://de.pycon.org/2011/Call_for_Papers/ Since the conference language will be German, the call is in German too. PyCon DE 2011 - Neuer Einsendeschluss f?r Vortragsvorschl?ge 15.07.2011 ======================================================================= Noch bis zum 15.7.2011 kann jeder, der sich f?r Python interessiert, einen Vortragsvorschlag f?r die PyCon DE 2011 einreichen. Es gibt nur zwei Bedingungen: das Thema sollte interessant sein und etwas mit Python zu tun haben. 
F?r die erste deutsche Python-Konferenz sind wir an einer breiten Themenpalette interessiert, die das ganze Spektrum der Entwicklung, Nutzung und Wirkung von Python zeigt. M?gliche Themen sind zum Beispiel: * Webanwendungen mit Python * Contentmanagement mit Python * Datenbankanwendungen mit Python * Testen mit Python * Systemintegration mit Python * Python f?r gro?e Systeme * Python im Unternehmensumfeld * Pythonimplementierungen (Jython, IronPython, PyPy, Unladen Swallow und andere) * Python als erste Programmiersprache * Grafische Nutzerschnittstellen (GUIs) * Parallele Programmierung mit Python * Python im wissenschaftlichen Bereich (Bioinformatik, Numerik, Visualisierung und anderes) * Embedded Python * Marketing f?r Python * Python, Open Source und Entwickler-Gemeinschaft * Zuk?nftige Entwicklungen * mehr ... Ihr Themenbereich ist nicht aufgelistet, w?re aber aus Ihrer Sicht f?r die PyCon DE interessant? Kein Problem. Reichen Sie Ihren Vortragsvorschlag einfach ein. Auch wir k?nnen nicht alle Anwendungsbereiche von Python ?berschauen. Vortragstage sind vom 5. bis 7. Oktober 2011. Es gibt zwei Vortragsformate: * Standard-Vortrag -- 20 Minuten Vortrag + 5 Minuten Diskussion * Lang-Vortrag -- 45 Minuten Vortrag + 10 Minuten Diskussion Die Vortragszeit wird strikt eingehalten. Bitte testen Sie die L?nge Ihres Vortrags. Lassen Sie gegebenenfalls ein paar Folien weg. Die Vortragsprache ist Deutsch. In begr?ndeten Ausnahmef?llen k?nnen Vortr?ge auch auf Englisch gehalten werden. Bitte fragen Sie uns dazu. Bitte reichen Sie Ihren Vortrag auf der Konferenz-Webseite http://de.pycon.org bis zum 15.07.2011 ein. Wir entscheiden bis zum 31. Juli 2011 ?ber die Annahme des Vortrags. From jeremy at jeremysanders.net Sat Jul 2 09:53:09 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Sat, 02 Jul 2011 14:53:09 +0100 Subject: [SciPy-User] OT: Re: ANN: Veusz 1.12 - a python-based GUI/scripted scientific plotting package References: <735FFAC9-18D7-4359-A3C0-3F2890B2F045@qwest.net> Message-ID: Jerry wrote: > Veusz appears to be a very capable plotting program. However, I can't use > it because it, like many other plotting programs, lacks a flexible and > easy import mechanism. My basis for comparison is Igor Pro which allows > the user to specify, in an easy-to-use dialog box, the details of the > formatting of the data to be imported. This includes both text and binary > files, along with Excel files. Thanks for the suggestion. I agree veusz needs a friendly import dialog. I'll have a look at the software you suggested to get some ideas... Jeremy From gorkypl at gmail.com Sat Jul 2 11:56:15 2011 From: gorkypl at gmail.com (=?UTF-8?B?UGF3ZcWC?=) Date: Sat, 2 Jul 2011 17:56:15 +0200 Subject: [SciPy-User] [scikits.timeseries] Custom xticks labels Message-ID: Hello, I have a problem with xticks while using scikits.timeseries. While plotting long series of data, the default labels of xticks are month names, and year numbers every 12th tick. I want to change this to something like mm.yy (%m.%y) under every tick. Up to now I experimented with xaxis.set_major_formatter() and TimeSeries_DateFormatter, but none of my methods work. I can do what I want in 'pure' matplotlib (using set_major_formatter and matplotlib.dates.DateFormatter), but I'm unable to achieve this while plotting timeseries using scikits.timeseries module. Can anyone help me? Any working example or a clue? greetings, Pawe? 
Rumian From jaimelozano09 at gmail.com Sun Jul 3 04:00:37 2011 From: jaimelozano09 at gmail.com (Jaime Lozano) Date: Sun, 3 Jul 2011 10:00:37 +0200 Subject: [SciPy-User] 64 bit compilation Message-ID: Hello scipy users, I've installed 64 bit python from source (CC=gcc -m64 ...). Now I'm trying to install scipy python setup.py install but it doesn't work. I know there are 64 bit binaries, but I want to compile it myself. What I am doing wrong? Do I need a special configuration in setup.py? Have a nice day Jaime -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jul 3 05:49:45 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 3 Jul 2011 11:49:45 +0200 Subject: [SciPy-User] 64 bit compilation In-Reply-To: References: Message-ID: On Sun, Jul 3, 2011 at 10:00 AM, Jaime Lozano wrote: > Hello scipy users, > I've installed 64 bit python from source (CC=gcc -m64 ...). Now I'm trying > to install scipy > > python setup.py install > > but it doesn't work. I know there are 64 bit binaries, but I want to > compile it myself. What I am doing wrong? Do I need a special configuration > in setup.py? > > > We need more details to answer you question. OS, compiler versions, build log, etc. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Mon Jul 4 03:25:21 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 4 Jul 2011 09:25:21 +0200 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: <89480CCA-EC18-4C97-9E31-2EE7D38E2453@gmail.com> On Jul 1, 2011, at 6:52 PM, Wes McKinney wrote: > On Fri, Jul 1, 2011 at 6:11 AM, Pierre GM wrote: >> >> On Jul 1, 2011, at 11:22 AM, David Montgomery wrote: >> >>> Awesoke... >>> >>> for the github version...any docs or an example for creating a 15 min array? >> >> Use the 'timestep' optional argument in scikits.timeseries.date_array. >> >> BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about). >> Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be some major overhaul in the mid future once Mark W. new datetime dtype will be stable. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > depending on your data manipulation needs, you could also give pandas > a shot-- generating 15-minute date ranges for example is quite simple: > > In [3]: DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15)) > Out[3]: > > offset: <15 Minutes>, tzinfo: None > [2011-07-01 00:00:00, ..., 2011-07-02 00:00:00] > length: 97 > > The date range can be used to conform a time series you loaded from some source: > > ts.reindex(dr, method='pad') > > ('pad' a.k.a. "ffill" propagates values forward into holes, optional) > > I've got some resampling code in the works that would help with e.g. > converting 15-minute data into hourly data or that sort of thing but > it's in less-than-complete form at the moment so like I said depends > on what you need to do. Give me a few weeks on that bit =) Wes, have a look on the conversion functions we have in scikits.timeseries. It's just a matter of knowing where and how to slice... 
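To make the pandas route concrete, here is a short sketch built around the exact calls Wes quotes above (DateRange, datetools.Minute, reindex with method='pad'); it assumes the mid-2011 pandas API, and the hourly source series (datetools.Hour, the raw variable) is invented purely for the example:

import numpy as np
from pandas import Series, DateRange, datetools

# A full day at 15-minute steps, as in Wes's example (97 timestamps).
dr = DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15))

# An hourly series standing in for whatever was loaded from disk.
hourly = DateRange('7/1/2011', '7/2/2011', offset=datetools.Hour())
raw = Series(np.arange(len(hourly), dtype=float), index=hourly)

# Conform it to the 15-minute grid, forward-filling ('pad') into the holes.
full = raw.reindex(dr, method='pad')
print("%d hourly values -> %d quarter-hourly values" % (len(raw), len(full)))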
From Pierre.RAYBAUT at CEA.FR Tue Jul 5 08:20:26 2011 From: Pierre.RAYBAUT at CEA.FR (Pierre.RAYBAUT at CEA.FR) Date: Tue, 5 Jul 2011 14:20:26 +0200 Subject: [SciPy-User] [ANN] guidata v1.3.2 Message-ID: Hi all, I am pleased to announce that `guidata` v1.3.2 has been released. Note that the project has recently been moved to GoogleCode: http://guidata.googlecode.com Main change since `guidata` v1.3.1: Since this version, guidata is compatible with PyQt4 API #1 and API #2. Please read carefully the coding guidelines which have been recently added to the documentation. Complete changelog is available here: http://code.google.com/p/guidata/wiki/ChangeLog The `guidata` documentation with examples, API reference, etc. is available here: http://packages.python.org/guidata/ Based on the Qt Python binding module PyQt4, guidata is a Python library generating graphical user interfaces for easy dataset editing and display. It also provides helpers and application development tools for PyQt4. guidata also provides the following features: * guidata.qthelpers: PyQt4 helpers * guidata.disthelpers: py2exe helpers * guidata.userconfig: .ini configuration management helpers (based on Python standard module ConfigParser) * guidata.configtools: library/application data management * guidata.gettext_helpers: translation helpers (based on the GNU tool gettext) * guidata.guitest: automatic GUI-based test launcher * guidata.utils: miscelleneous utilities guidata has been successfully tested on GNU/Linux and Windows platforms. Python package index page: http://pypi.python.org/pypi/guidata/ Documentation, screenshots: http://packages.python.org/guidata/ Downloads (source + Python(x,y) plugin): http://guidata.googlecode.com Cheers, Pierre --- Dr. Pierre Raybaut CEA - Commissariat ? l'Energie Atomique et aux Energies Alternatives From Pierre.RAYBAUT at CEA.FR Tue Jul 5 08:21:08 2011 From: Pierre.RAYBAUT at CEA.FR (Pierre.RAYBAUT at CEA.FR) Date: Tue, 5 Jul 2011 14:21:08 +0200 Subject: [SciPy-User] [ANN] guiqwt v2.1.4 Message-ID: Hi all, I am pleased to announce that `guiqwt` v2.1.4 has been released. Based on PyQwt (plotting widgets for PyQt4 graphical user interfaces) and on the scientific modules NumPy and SciPy, guiqwt is a Python library providing efficient 2D data-plotting features (curve/image visualization and related tools) for interactive computing and signal/image processing application development. Complete change log is now available here: http://code.google.com/p/guiqwt/wiki/ChangeLog Documentation with examples, API reference, etc. 
is available here: http://packages.python.org/guiqwt/ This version of `guiqwt` includes a demo software, Sift (for Signal and Image Filtering Tool), based on `guidata` and `guiqwt`: http://packages.python.org/guiqwt/sift.html Windows users may even download the portable version of Sift 0.2.3 to test it without having to install anything: http://code.google.com/p/guiqwt/downloads/detail?name=sift023_portable.zip When compared to the excellent module `matplotlib`, the main advantages of `guiqwt` are: * Performance: see http://packages.python.org/guiqwt/overview.html#performances * Interactivity: see for example http://packages.python.org/guiqwt/_images/plot.png * Powerful signal processing tools: see for example http://packages.python.org/guiqwt/_images/fit.png * Powerful image processing tools: * Real-time contrast adjustment: http://packages.python.org/guiqwt/_images/contrast.png * Cross sections (line/column, averaged and oblique cross sections!): http://packages.python.org/guiqwt/_images/cross_section.png * Arbitrary affine transforms on images: http://packages.python.org/guiqwt/_images/transform.png * Interactive filters: http://packages.python.org/guiqwt/_images/imagefilter.png * Geometrical shapes/Measurement tools: http://packages.python.org/guiqwt/_images/image_plot_tools.png * Perfect integration of `guidata` features for image data editing: http://packages.python.org/guiqwt/_images/simple_window.png But `guiqwt` is more than a plotting library; it also provides: * Framework for signal/image processing application development: see http://packages.python.org/guiqwt/examples.html * And many other features like making executable Windows programs easily (py2exe helpers): see http://packages.python.org/guiqwt/disthelpers.html guiqwt has been successfully tested on GNU/Linux and Windows platforms. Python package index page: http://pypi.python.org/pypi/guiqwt/ Documentation, screenshots: http://packages.python.org/guiqwt/ Downloads (source + Python(x,y) plugin): http://guiqwt.googlecode.com Cheers, Pierre --- Dr. Pierre Raybaut CEA - Commissariat ? l'Energie Atomique et aux Energies Alternatives From josef.pktd at gmail.com Tue Jul 5 11:16:02 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 5 Jul 2011 11:16:02 -0400 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions In-Reply-To: <4DE81898.1070701@molden.no> References: <4DE7B957.2060305@gmail.com> <4DE81898.1070701@molden.no> Message-ID: On Thu, Jun 2, 2011 at 7:11 PM, Sturla Molden wrote: > Den 02.06.2011 18:54, skrev josef.pktd at gmail.com: >> >> If these are the alternatives, then I will stick with rejection sampling. >> I'm not starting to learn the implementation details of simulating >> with MCMC, Metropolis-Hastings or Gibbs, and leave it to the pymc >> developers and to Wes. > > Metropolis-Hastings is a form of rejection sampling. > > It's just a way to reduce the number of rejections, particularly when the sample space is large. > > > > >> rtmvnorm has a big Warning label about the Gibbs sampler, although, >> for MonteCarlo integration, any serial correlation in the sampler >> won't be very relevant. > > You will get serial correlation with MCMC, but remember they are still > samples from the stationary distribution of the Markov chain. You can > still use these samples to compute mean, standard deviation, KDE, > numerical integrals, etc. (a late reply, since I was looking at this again.) Thanks Sturla, I never thought of this distinction in the usage. 
Until now I was mostly interested in simulating stochastic processes, time series, or random samples for regression. In these cases I cannot have any spurious serial correlation. But now that I need more Monte Carlo integration, this will be useful. Josef > > > Sturla > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Wed Jul 6 01:30:36 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 6 Jul 2011 07:30:36 +0200 Subject: [SciPy-User] [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: Message-ID: On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen wrote: > In article , > Ralf Gommers wrote: > > > https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > > Will there be a Mac binary for 32-bit pythons (one that is compatible > with older versions of MacOS X)? At present I only see a 64-bit > 10.6-only version. > > > Yes there will be for the final release (10.4-10.6 compatible). I can't create those on my own computer, so sometimes I don't make them for RCs. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From klonuo at gmail.com Wed Jul 6 15:59:36 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Wed, 6 Jul 2011 21:59:36 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers Message-ID: Debian 6.0.1 Python 2.6.7 ifort 11.1 20100806 icc 11.1 20091130 mkl 10.3 After successfully building latest stable NumPy (1.6.1RC2) I tried to build SciPy (0.9.0) following two sources: http://new.scipy.org/building/linux.html http://blog.sun.tc/2010/11/numpy-and-scipy-with-intel-mkl-on-linux.html But I get this error: ------------------------------------------------ ... building 'qhull' library compiling C sources C compiler: icc -fPIC compile options: '-I/usr/include/python2.6 -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -c' icc: scipy/spatial/qhull/src/poly.c scipy/spatial/qhull/src/qhull_a.h(106): warning #77: this declaration has no storage class or type specifier template ^ scipy/spatial/qhull/src/qhull_a.h(106): error: expected a ";" template ^ scipy/spatial/qhull/src/qhull_a.h(115): warning #12: parsing restarts here after previous syntax error void qh_qhull(void); ^ compilation aborted for scipy/spatial/qhull/src/poly.c (code 2) scipy/spatial/qhull/src/qhull_a.h(106): warning #77: this declaration has no storage class or type specifier template ^ scipy/spatial/qhull/src/qhull_a.h(106): error: expected a ";" template ^ scipy/spatial/qhull/src/qhull_a.h(115): warning #12: parsing restarts here after previous syntax error void qh_qhull(void); ^ compilation aborted for scipy/spatial/qhull/src/poly.c (code 2) error: Command "icc -fPIC -I/usr/include/python2.6 -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -c scipy/spatial/qhull/src/poly.c -o build/temp.linux-i686-2.6/scipy/spatial/qhull/src/poly.o" failed with exit status 2 ------------------------------------------------ As Google did not return anything about this error, and I just installed Debian two days ago, without any Linux experience, I might miss something. Or what can cause this error? 
Thanks From dbrown at ucar.edu Wed Jul 6 18:14:03 2011 From: dbrown at ucar.edu (David Brown) Date: Wed, 6 Jul 2011 16:14:03 -0600 Subject: [SciPy-User] Call for papers: AMS Jan 22-26, 2012 Message-ID: <583E9BD8-07AE-4C04-B52E-782631E060C5@ucar.edu> I would like to call to the attention of the SciPy community the following call for papers: Second Symposium on Advances in Modeling and Analysis Using Python, 22?26 January 2012, New Orleans, Louisiana The Second Symposium on Advances in Modeling and Analysis Using Python, sponsored by the American Meteorological Society, will be held 22?26 January 2012, as part of the 92nd AMS Annual Meeting in New Orleans, Louisiana. Preliminary programs, registration, hotel, and general information will be posted on the AMS Web site (http://www.ametsoc.org/meet/annual/) in late-September 2011. The application of object-oriented programming and other advances in computer science to the atmospheric and oceanic sciences has in turn led to advances in modeling and analysis tools and methods. This symposium focuses on applications of the open-source language Python and seeks to disseminate advances using Python in the atmospheric and oceanic sciences, as well as grow the earth sciences Python community. Papers describing Python work in applications, methodologies, and package development in all areas of meteorology, climatology, oceanography, and space sciences are welcome, including (but not limited to): modeling, time series analysis, air quality, satellite data processing, in-situ data analysis, GIS, Python as a software integration platform, visualization, gridding, model intercomparison, and very large (petabyte) dataset manipulation and access. The $95 abstract fee includes the submission of your abstract, the posting of your extended abstract, and the uploading and recording of your presentation which will be archived on the AMS Web site. Please submit your abstract electronically via the Web by 1 August 2011 (refer to the AMS Web page athttp://www.ametsoc.org/meet/online_submit.html.) An abstract fee of $95 (payable by credit card or purchase order) is charged at the time of submission (refundable only if abstract is not accepted). Authors of accepted presentations will be notified via e-mail by late-September 2011. All extended abstracts are to be submitted electronically and will be available on-line via the Web, Instructions for formatting extended abstracts will be posted on the AMS Web site. Manuscripts (up to 3MB) must be submitted electronically by 22 February 2012. All abstracts, extended abstracts and presentations will be available on the AMS Web site at no cost. For additional information, please contact the program chairperson, Johnny Lin, Physics Department, North Park University (jlin at northpark.edu). (5/11) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Wed Jul 6 22:36:32 2011 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 Jul 2011 11:36:32 +0900 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 4:59 AM, Klonuo Umom wrote: > Debian 6.0.1 > Python 2.6.7 > > ifort 11.1 20100806 > icc 11.1 20091130 > mkl 10.3 > > > After successfully building latest stable NumPy (1.6.1RC2) I tried to > build SciPy (0.9.0) following two sources: > > http://new.scipy.org/building/linux.html > http://blog.sun.tc/2010/11/numpy-and-scipy-with-intel-mkl-on-linux.html > > But I get this error: > > ------------------------------------------------ > > ... > > building 'qhull' library > compiling C sources > C compiler: icc -fPIC > > compile options: '-I/usr/include/python2.6 > -I/usr/local/lib/python2.6/dist-packages/numpy/core/include > -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -c' > icc: scipy/spatial/qhull/src/poly.c > scipy/spatial/qhull/src/qhull_a.h(106): warning #77: this declaration > has no storage class or type specifier > ?template This error is quite strange: the header uses templates (even though it is C code !!), only in the case of intel compiler. That makes no sense to me. From a quick look at the code, this template definition is only a trick to avoid warning for unused code, but that can be implemented in ICC specific manner without resorting to C++. I think you can safely define QHULL_UNUSED macro to (x) only: template inline void qhullUnused(T &x) { (void)x; } # define QHULL_UNUSED(x) qhullUnused(x); becomes: #define QHULL_UNUSED(x) (x) in qhull_a.h (line 106) cheers, David From wkerzendorf at googlemail.com Thu Jul 7 03:51:28 2011 From: wkerzendorf at googlemail.com (Wolfgang Kerzendorf) Date: Thu, 07 Jul 2011 09:51:28 +0200 Subject: [SciPy-User] reading in files with fixed with format Message-ID: <4E156580.6010309@gmail.com> Dear all, I have a couple of data files that were written with fortran at a fixed with. That means its tabular data which might not have spaces (it is just specified how many characters each field has and what type it is). Is there anything to read that with scipy and or numpy? Cheers Wolfgang From klonuo at gmail.com Thu Jul 7 04:24:21 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Thu, 7 Jul 2011 10:24:21 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: Thanks David, that did the trick :) I'm C illiterate, and I did search before mailing here, for templates syntax and usage, but explanations were very confusing to me unfortunately, even thou there were examples exactly for 'template ' However after a while, building was again interrupted on 'scipy/sparse/sparsetools' I uploaded part of the log here: http://dl.dropbox.com/u/6735093/scipy.txt which I also compressed and attached, just in case Thanks for your assistance On Thu, Jul 7, 2011 at 4:36 AM, David Cournapeau wrote: > I think you can safely define QHULL_UNUSED macro to (x) only: > > template > inline void qhullUnused(T &x) { (void)x; } > # ?define QHULL_UNUSED(x) qhullUnused(x); > > becomes: > > #define QHULL_UNUSED(x) (x) > > in qhull_a.h (line 106) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: scipy.log.7z Type: application/x-7z-compressed Size: 1365 bytes Desc: not available URL: From athanastasiou at gmail.com Thu Jul 7 06:57:16 2011 From: athanastasiou at gmail.com (Athanasios Anastasiou) Date: Thu, 7 Jul 2011 11:57:16 +0100 Subject: [SciPy-User] reading in files with fixed with format In-Reply-To: <4E156580.6010309@gmail.com> References: <4E156580.6010309@gmail.com> Message-ID: Hello Wolfgang My understanding is that you have binary data packed in a file as a series of numbers (like a vector of floats or doubles) or as a series of structures (like a set of records) as opposed to fixed width text data. To read and convert binary data you can use the struct module (http://docs.python.org/library/struct.html). The documentation contains some representative examples. You just have to make sure that unpack's fmt "structure definition" matches exactly the fortran definition in terms of data types. Also, please note that fortran stores arrays in column major format (rather than row major format which is the default for C). You might have to take this into account depending on the way the data was written to your files originally. All the best Athanasios On Thu, Jul 7, 2011 at 8:51 AM, Wolfgang Kerzendorf wrote: > Dear all, > > I have a couple of data files that were written with fortran at a fixed > with. That means its tabular data which might not have spaces (it is > just specified how many characters each field has and what type it is). > Is there anything to read that with scipy and or numpy? > > Cheers > ? ? Wolfgang > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From collinstocks at gmail.com Wed Jul 6 14:17:56 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Wed, 6 Jul 2011 11:17:56 -0700 (PDT) Subject: [SciPy-User] linalg.decomp code contribution Message-ID: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Over the past several years, there have been various requests for the linalg.decomp.qr() function to (optionally) implement qr with pivots. I have written a solution which, unfortunately, requires cvxopt, since there is no wrapper within scipy for lapack.geqp3(). However, this should be an effective starting place for anyone who wants to complete the job. If this does eventually end up in scipy, I would appreciate a little mention, though! :) The code can also be found at: http://pastebin.com/HFJdQ67H I have also extended the "mode" keyword slightly, and have removed the "econ" keyword, as it is deprecated anyway (assuming that I have the very latest release on my own machine, which is not all that likely...). ######################################################## import cvxopt from cvxopt import lapack def qr(a, overwrite_a=0, lwork=None, mode='qr'): """Compute QR decomposition of a matrix. Calculate the decomposition :lm:`A = Q R` where Q is unitary/ orthogonal and R upper triangular. Parameters ---------- a : array, shape (M, N) Matrix to be decomposed overwrite_a : boolean Whether data in a is overwritten (may improve performance) lwork : integer Work array size, lwork >= a.shape[1]. If None or -1, an optimal size is computed. mode : {'qr', 'r', 'qrp'} Determines what information is to be returned: either both Q and R or only R, or Q and R and P, a permutation matrix. Any of these can be combined with 'economic' using the '+' sign as a separator. 
Economic mode means the following: Compute the economy-size QR decomposition, making shapes of Q and R (M, K) and (K, N) instead of (M,M) and (M,N). K=min(M,N). Returns ------- (if mode == 'qr') Q : double or complex array, shape (M, M) or (M, K) for econ==True (for any mode) R : double or complex array, shape (M, N) or (K, N) for econ==True Size K = min(M, N) Raises LinAlgError if decomposition fails Notes ----- This is an interface to the LAPACK routines dgeqrf, zgeqrf, dorgqr, and zungqr. Examples -------- >>> from scipy import random, linalg, dot >>> a = random.randn(9, 6) >>> q, r = linalg.qr(a) >>> allclose(a, dot(q, r)) True >>> q.shape, r.shape ((9, 9), (9, 6)) >>> r2 = linalg.qr(a, mode='r') >>> allclose(r, r2) >>> q3, r3 = linalg.qr(a, econ=True) >>> q3.shape, r3.shape ((9, 6), (6, 6)) """ mode = mode.split("+") if "economic" in mode: econ = True else: econ = False a1 = asarray_chkfinite(a) if len(a1.shape) != 2: raise ValueError("expected 2D array") M, N = a1.shape overwrite_a = overwrite_a or (_datanotshared(a1,a)) if 'qrp' in mode: qr = cvxopt.matrix(np.asarray(a1, dtype = float)) tau = cvxopt.matrix(np.zeros(min(M, N), dtype = float)) jpvt = cvxopt.matrix(np.zeros(N, dtype = int)) lapack.geqp3(qr, jpvt, tau) qr = np.asarray(qr) tau = np.asarray(tau) jpvt = (np.asarray(jpvt) - 1).flatten() else: geqrf, = get_lapack_funcs(('geqrf',),(a1,)) if lwork is None or lwork == -1: # get optimal work array qr,tau,work,info = geqrf(a1,lwork=-1,overwrite_a=1) lwork = work[0] qr,tau,work,info = geqrf(a1,lwork=lwork,overwrite_a=overwrite_a) if info<0: raise ValueError("illegal value in %-th argument of internal geqrf" % -info) if not econ or M References: <4E156580.6010309@gmail.com> Message-ID: <20110707085140.GB26058@poincare.pc.linmpi.mpg.de> The function numpy.genfromtxt reads text files into arrays. There is an example on how to deal with fixed-width columns using the delimiter argument in the docstring and in the I/O chapter of the user guide: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#the-delimiter-argument Miguel On Thu, Jul 07, 2011 at 09:51:28AM +0200, Wolfgang Kerzendorf wrote: > Dear all, > > I have a couple of data files that were written with fortran at a fixed > with. That means its tabular data which might not have spaces (it is > just specified how many characters each field has and what type it is). > Is there anything to read that with scipy and or numpy? > > Cheers > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From gnurser at gmail.com Thu Jul 7 09:43:40 2011 From: gnurser at gmail.com (George Nurser) Date: Thu, 7 Jul 2011 14:43:40 +0100 Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX Message-ID: Hi, Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 I just pulled the latest version of scipy master; git log gives top commit as e86854bfab64a16c026106ccdf90761328b83272 scipy.test('10') gives a segfault. 
First few lines of the Apple DiagnosticReport are: Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x00000002b9e72320 Crashed Thread: 0 Dispatch queue: com.apple.main-thread Thread 0 Crashed: Dispatch queue: com.apple.main-thread 0 libBLAS.dylib 0x00007fff87b9d010 ATL_zdotc_xp0yp0aXbX + 48 1 _iterative.so 0x0000000103b9cf8c zgmresrevcom_ + 1980 2 _iterative.so 0x0000000103b876eb f2py_rout__iterative_zgmresrevcom + 1563 (_iterativemodule.c:5041) 3 org.python.python 0x000000010000c9f2 PyObject_Call + 98 The whole of the segfault trace is attached as scipy.segfault --George Nurser Output of scipy.test('10') below: Running unit tests for scipy NumPy version 1.6.1rc2 NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy SciPy version 0.10.0.dev SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy Python version 2.7.1 (r271:86882M, Nov 30 2010, 10:35:34) [GCC 4.2.1 (Apple Inc. build 5664)] nose version 0.11.3 ............................................................................................................................................................................................................................K............................................................................................................/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py:678: UserWarning: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=7). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) ....../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py:609: UserWarning: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots. warnings.warn(message) .......................K..K....../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1920: RuntimeWarning: invalid value encountered in absolute return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) ................................................................................................................................................................................................................................................................................................................................................................................................................./Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/wavfile.py:31: WavFileWarning: Unfamiliar format bytes warnings.warn("Unfamiliar format bytes", WavFileWarning) /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/wavfile.py:121: WavFileWarning: chunk not understood warnings.warn("chunk not understood", WavFileWarning) ................................................................................./Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/utils.py:139: DeprecationWarning: `get_blas_funcs` is deprecated! 
warnings.warn(depdoc, DeprecationWarning) ..............................................................................................................................................SSSSSS......SSSSSS......SSSS............................................................................S................................................................................................................................................................................................................................................K.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:259: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) ../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:75: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) ................F...F..F.....FF.EF..FEF.E.E...E........E...EF.F.F.FEE..E..E...................................................................F.......E.EFFFE.F..EF.E...EE...FEE...F.F..FEEF...EE..E............................................................EEEFFEEESegmentation fault -------------- next part -------------- A non-text attachment was scrubbed... Name: scipy.segfault Type: application/octet-stream Size: 59579 bytes Desc: not available URL: From pav at iki.fi Thu Jul 7 13:59:53 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 7 Jul 2011 17:59:53 +0000 (UTC) Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX References: Message-ID: On Thu, 07 Jul 2011 14:43:40 +0100, George Nurser wrote: > Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org > (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 Known issue: http://projects.scipy.org/scipy/ticket/1472 From pav at iki.fi Thu Jul 7 14:09:05 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 7 Jul 2011 18:09:05 +0000 (UTC) Subject: [SciPy-User] linalg.decomp code contribution References: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Message-ID: Hi, On Wed, 06 Jul 2011 11:17:56 -0700, Collin Stocks wrote: > Over the past several years, there have been various requests for the > linalg.decomp.qr() function to (optionally) implement qr with pivots. 
I > have written a solution which, unfortunately, requires cvxopt, since > there is no wrapper within scipy for lapack.geqp3(). However, this > should be an effective starting place for anyone who wants to complete > the job. If this does eventually end up in scipy, I would appreciate a > little mention, though! :) I'd suggest creating an enhancement ticket on the Scipy Trac (if a related ticket does not already exist), and attaching your code to it: http://projects.scipy.org/scipy/ The reason is that if there is no-one who can spare time to look at this immediately, then there is a high likelihood that your mail will be lost and forgotten in the mailing list traffic. Cheers, Pauli From pav at iki.fi Thu Jul 7 15:08:14 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 7 Jul 2011 19:08:14 +0000 (UTC) Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX References: Message-ID: On Thu, 07 Jul 2011 14:43:40 +0100, George Nurser wrote: > Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org > (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 > > I just pulled the latest version of scipy master; git log gives top > commit as e86854bfab64a16c026106ccdf90761328b83272 Some possible fixes went in. Can you try again with the current Git master? From collinstocks at gmail.com Thu Jul 7 17:42:59 2011 From: collinstocks at gmail.com (collinstocks at gmail.com) Date: Thu, 7 Jul 2011 17:42:59 -0400 Subject: [SciPy-User] linalg.decomp code contribution In-Reply-To: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> References: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Message-ID: I realize that no one has responded to my post yet, and I hate to double post like this, but this update is rather important. The code I attached previously contains a bug (well, makes a bug in cvxopt apparent, actually...) whereby creating new matrices in cvxopt from numpy arrays can sometimes cause a segmentation fault or memory access violation (or something...Windows doesn't really seem to give enough information to differentiate between different types of failures) under certain circumstances on Windows XP. I have fixed my code and attached it. I also changed some abbreviations in my code ("np") to the full name ("numpy") to be more consistent with the surrounding code. -- Collin On Wed, Jul 6, 2011 at 14:17, Collin Stocks wrote: > Over the past several years, there have been various requests for the > linalg.decomp.qr() function to (optionally) implement qr with pivots. > I have written a solution which, unfortunately, requires cvxopt, since > there is no wrapper within scipy for lapack.geqp3(). However, this > should be an effective starting place for anyone who wants to complete > the job. If this does eventually end up in scipy, I would appreciate a > little mention, though! :) > > The code can also be found at: http://pastebin.com/HFJdQ67H > > I have also extended the "mode" keyword slightly, and have removed the > "econ" keyword, as it is deprecated anyway (assuming that I have the > very latest release on my own machine, which is not all that > likely...). > > ######################################################## > import cvxopt > from cvxopt import lapack > > def qr(a, overwrite_a=0, lwork=None, mode='qr'): > """Compute QR decomposition of a matrix. > > Calculate the decomposition :lm:`A = Q R` where Q is unitary/ > orthogonal > and R upper triangular. 
> > Parameters > ---------- > a : array, shape (M, N) > Matrix to be decomposed > overwrite_a : boolean > Whether data in a is overwritten (may improve performance) > lwork : integer > Work array size, lwork >= a.shape[1]. If None or -1, an > optimal size > is computed. > mode : {'qr', 'r', 'qrp'} > Determines what information is to be returned: either both Q > and R > or only R, or Q and R and P, a permutation matrix. Any of > these can > be combined with 'economic' using the '+' sign as a separator. > Economic mode means the following: > Compute the economy-size QR decomposition, making shapes > of Q and R (M, K) and (K, N) instead of (M,M) and (M,N). > K=min(M,N). > > Returns > ------- > (if mode == 'qr') > Q : double or complex array, shape (M, M) or (M, K) for econ==True > > (for any mode) > R : double or complex array, shape (M, N) or (K, N) for econ==True > Size K = min(M, N) > > Raises LinAlgError if decomposition fails > > Notes > ----- > This is an interface to the LAPACK routines dgeqrf, zgeqrf, > dorgqr, and zungqr. > > Examples > -------- > >>> from scipy import random, linalg, dot > >>> a = random.randn(9, 6) > >>> q, r = linalg.qr(a) > >>> allclose(a, dot(q, r)) > True > >>> q.shape, r.shape > ((9, 9), (9, 6)) > > >>> r2 = linalg.qr(a, mode='r') > >>> allclose(r, r2) > > >>> q3, r3 = linalg.qr(a, econ=True) > >>> q3.shape, r3.shape > ((9, 6), (6, 6)) > > """ > mode = mode.split("+") > if "economic" in mode: > econ = True > else: > econ = False > > a1 = asarray_chkfinite(a) > if len(a1.shape) != 2: > raise ValueError("expected 2D array") > M, N = a1.shape > overwrite_a = overwrite_a or (_datanotshared(a1,a)) > > if 'qrp' in mode: > qr = cvxopt.matrix(np.asarray(a1, dtype = float)) > > tau = cvxopt.matrix(np.zeros(min(M, N), dtype = float)) > jpvt = cvxopt.matrix(np.zeros(N, dtype = int)) > > lapack.geqp3(qr, jpvt, tau) > > qr = np.asarray(qr) > tau = np.asarray(tau) > jpvt = (np.asarray(jpvt) - 1).flatten() > else: > geqrf, = get_lapack_funcs(('geqrf',),(a1,)) > if lwork is None or lwork == -1: > # get optimal work array > qr,tau,work,info = geqrf(a1,lwork=-1,overwrite_a=1) > lwork = work[0] > > qr,tau,work,info = > geqrf(a1,lwork=lwork,overwrite_a=overwrite_a) > if info<0: > raise ValueError("illegal value in %-th argument of > internal geqrf" > % -info) > > if not econ or M R = basic.triu(qr) > else: > R = basic.triu(qr[0:N,0:N]) > > if 'r' in mode: > return R > > if find_best_lapack_type((a1,))[0]=='s' or > find_best_lapack_type((a1,))[0]=='d': > gor_un_gqr, = get_lapack_funcs(('orgqr',),(qr,)) > else: > gor_un_gqr, = get_lapack_funcs(('ungqr',),(qr,)) > > if M # get optimal work array > Q,work,info = gor_un_gqr(qr[:,0:M],tau,lwork=-1,overwrite_a=1) > lwork = work[0] > Q,work,info = gor_un_gqr(qr[:, > 0:M],tau,lwork=lwork,overwrite_a=1) > elif econ: > # get optimal work array > Q,work,info = gor_un_gqr(qr,tau,lwork=-1,overwrite_a=1) > lwork = work[0] > Q,work,info = gor_un_gqr(qr,tau,lwork=lwork,overwrite_a=1) > else: > t = qr.dtype.char > qqr = numpy.empty((M,M),dtype=t) > qqr[:,0:N]=qr > # get optimal work array > Q,work,info = gor_un_gqr(qqr,tau,lwork=-1,overwrite_a=1) > lwork = work[0] > Q,work,info = gor_un_gqr(qqr,tau,lwork=lwork,overwrite_a=1) > > if info < 0: > raise ValueError("illegal value in %-th argument of internal > gorgqr" > % -info) > > if 'qrp' in mode: > return Q, R, jpvt > > return Q, R > ######################################################## -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: qr.py Type: application/octet-stream Size: 4209 bytes Desc: not available URL: From collinstocks at gmail.com Thu Jul 7 17:47:36 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Thu, 7 Jul 2011 14:47:36 -0700 (PDT) Subject: [SciPy-User] linalg.decomp code contribution In-Reply-To: References: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Message-ID: <78bc40e3-9c29-4b56-9a40-aaeb3866612f@a10g2000vbz.googlegroups.com> Oh, thank you. For some reason, I did not see this before replying to the thread again. I will take a look at that. Thanks, Collin On Jul 7, 2:09?pm, Pauli Virtanen wrote: > Hi, > > On Wed, 06 Jul 2011 11:17:56 -0700, Collin Stocks wrote: > > Over the past several years, there have been various requests for the > > linalg.decomp.qr() function to (optionally) implement qr with pivots. I > > have written a solution which, unfortunately, requires cvxopt, since > > there is no wrapper within scipy for lapack.geqp3(). However, this > > should be an effective starting place for anyone who wants to complete > > the job. If this does eventually end up in scipy, I would appreciate a > > little mention, though! :) > > I'd suggest creating an enhancement ticket on the Scipy Trac (if a related > ticket does not already exist), and attaching your code to it: > > ? ?http://projects.scipy.org/scipy/ > > The reason is that if there is no-one who can spare time to look at this > immediately, then there is a high likelihood that your mail will be lost > and forgotten in the mailing list traffic. > > Cheers, > Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From gnurser at gmail.com Fri Jul 8 08:07:15 2011 From: gnurser at gmail.com (George Nurser) Date: Fri, 8 Jul 2011 13:07:15 +0100 Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX In-Reply-To: References: Message-ID: Hi Pauli, I've given the revised code a try. Still fails, but some progress. Now fails at zgmresrevcom_ + 1990 instead of zgmresrevcom_ + 1980 Apple ProblemReport: Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x00000002b9e72320 Crashed Thread: 0 Dispatch queue: com.apple.main-thread Thread 0 Crashed: Dispatch queue: com.apple.main-thread 0 libBLAS.dylib 0x00007fff87b9d010 ATL_zdotc_xp0yp0aXbX + 48 1 _iterative.so 0x0000000103b9cf56 zgmresrevcom_ + 1990 2 _iterative.so 0x0000000103b876ab f2py_rout__iterative_zgmresrevcom + 1563 (_iterativemodule.c:5041) On 7 July 2011 20:08, Pauli Virtanen wrote: > On Thu, 07 Jul 2011 14:43:40 +0100, George Nurser wrote: >> Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org >> (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 >> >> I just pulled the latest version of scipy master; ?git log gives top >> commit as e86854bfab64a16c026106ccdf90761328b83272 > > Some possible fixes went in. Can you try again with the current Git master? 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From klonuo at gmail.com Fri Jul 8 15:59:38 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Fri, 8 Jul 2011 21:59:38 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: I found similar problem here: http://comments.gmane.org/gmane.comp.python.scientific.user/25843 but changing compiler to 'icpc' did not solve this problem, which is: /usr/include/c++/4.6/bits/stl_uninitialized.h(225): error: identifier "__is_trivial" is undefined or more here: http://dl.dropbox.com/u/6735093/scipy.txt Does anyone have some idea what might be wrong? Thanks From jsseabold at gmail.com Fri Jul 8 18:41:19 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 8 Jul 2011 18:41:19 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? Message-ID: A ticket was filed [1] for ttest_ind (same issue with ttest_rel and ttest_1samp) in the case of identical means and no variance. Same means, no variance d1 = np.ones(10) d2 = np.array([1,1.]) stats.ttest_ind(d1,d2) (1.0, 0.34089313230206009) Different means, no variance d1 = np.array([ 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.]) d2 = np.array([ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]) stats.ttest_ind(d1,d2) (inf, 0.0) The first result doesn't make sense. In the code there are conflicting notes (with each other and what the code does) for catching this https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 I think defining t = 0/0 to be 0 is the least wrong thing to do, but certainly not t = 0/0 as 1, which gives an arbitrary p-value depending on sample sizes. Is there an accepted definition for this case? Does returning (nan, 1.0) make more sense? Skipper [1] http://projects.scipy.org/scipy/ticket/1475 From josef.pktd at gmail.com Fri Jul 8 18:51:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 8 Jul 2011 18:51:56 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold wrote: > A ticket was filed [1] for ttest_ind (same issue with ttest_rel and > ttest_1samp) in the case of identical means and no variance. > > Same means, no variance > > d1 = np.ones(10) > d2 = np.array([1,1.]) > stats.ttest_ind(d1,d2) > (1.0, 0.34089313230206009) > > Different means, no variance > > d1 = np.array([ 5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5.]) > d2 = np.array([ 2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2.]) > stats.ttest_ind(d1,d2) > (inf, 0.0) > > The first result doesn't make sense. In the code there are conflicting > notes (with each other and what the code does) for catching this > > https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 > https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 > https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 > > I think defining t = 0/0 to be 0 is the least wrong thing to do, but > certainly not t = 0/0 as 1, which gives an arbitrary p-value depending > on sample sizes. Is there an accepted definition for this case? Does > returning (nan, 1.0) make more sense? 
> > Skipper > > [1] http://projects.scipy.org/scipy/ticket/1475 scipy dev mailing list "changes to stats t-tests" Dec 20, 2008 for the original change. If anyone finds a justification for the 0/0 case, .... Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Fri Jul 8 19:06:42 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 8 Jul 2011 19:06:42 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 6:51 PM, wrote: > On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold wrote: >> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and >> ttest_1samp) in the case of identical means and no variance. >> >> Same means, no variance >> >> d1 = np.ones(10) >> d2 = np.array([1,1.]) >> stats.ttest_ind(d1,d2) >> (1.0, 0.34089313230206009) >> >> Different means, no variance >> >> d1 = np.array([ 5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5.]) >> d2 = np.array([ 2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2.]) >> stats.ttest_ind(d1,d2) >> (inf, 0.0) >> >> The first result doesn't make sense. In the code there are conflicting >> notes (with each other and what the code does) for catching this >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 >> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 >> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 >> >> I think defining t = 0/0 to be 0 is the least wrong thing to do, but >> certainly not t = 0/0 as 1, which gives an arbitrary p-value depending >> on sample sizes. Is there an accepted definition for this case? Does >> returning (nan, 1.0) make more sense? >> >> Skipper >> >> [1] http://projects.scipy.org/scipy/ticket/1475 > > scipy dev mailing list "changes to stats t-tests" Dec 20, 2008 for the > original change. > > If anyone finds a justification for the 0/0 case, .... > I have the same intuition as your initial thought. Setting it to 1 *seems* aribitrary. I'd have to think more now than I have time for any justification though. Apologies for not searching and making noise instead, Skipper From josef.pktd at gmail.com Fri Jul 8 19:19:55 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 8 Jul 2011 19:19:55 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 7:06 PM, Skipper Seabold wrote: > On Fri, Jul 8, 2011 at 6:51 PM, ? wrote: >> On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold wrote: >>> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and >>> ttest_1samp) in the case of identical means and no variance. >>> >>> Same means, no variance >>> >>> d1 = np.ones(10) >>> d2 = np.array([1,1.]) >>> stats.ttest_ind(d1,d2) >>> (1.0, 0.34089313230206009) >>> >>> Different means, no variance >>> >>> d1 = np.array([ 5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5.]) >>> d2 = np.array([ 2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2.]) >>> stats.ttest_ind(d1,d2) >>> (inf, 0.0) >>> >>> The first result doesn't make sense. 
In the code there are conflicting >>> notes (with each other and what the code does) for catching this >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 >>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 >>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 >>> >>> I think defining t = 0/0 to be 0 is the least wrong thing to do, but >>> certainly not t = 0/0 as 1, which gives an arbitrary p-value depending >>> on sample sizes. Is there an accepted definition for this case? Does >>> returning (nan, 1.0) make more sense? >>> >>> Skipper >>> >>> [1] http://projects.scipy.org/scipy/ticket/1475 >> >> scipy dev mailing list "changes to stats t-tests" Dec 20, 2008 for the >> original change. >> >> If anyone finds a justification for the 0/0 case, .... >> > > I have the same intuition as your initial thought. Setting it to 1 > *seems* aribitrary. I'd have to think more now than I have time for > any justification though. > > Apologies for not searching and making noise instead, noise is fine, even better if someone comes up with a real justification. I went in circles in my arguments several times. The main justification is, given that the underlying assumption is that the samples come from normal distributions, the only way we could observe identical values is if the variance goes to zero and we have a degenerate normal distribution. After that I was trying to take different limits, then ... Or suppose the true distribution is normal, but we observe only a (machine precision) discretized sample, .... Or suppose we have a large sample normal approximation to some discrete data, ... (but we only have 5 observations.) Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Sat Jul 9 11:19:55 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 Jul 2011 08:19:55 -0700 Subject: [SciPy-User] Cholesky for semi-definite matrices? Message-ID: Hi all, I've run into a case where it'd be convenient to be able to compute the Cholesky decomposition of a semi-definite matrix. (It's a covariance matrix computed from not-enough samples, so it's positive semi-definite but rank-deficient.) As any schoolchild knows, the Cholesky is well defined for such cases, but I guess that shows why you shouldn't trust schoolchildren, because I guess the standard implementations blow up if you try: In [155]: np.linalg.cholesky([[1, 0], [0, 0]]) LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed Is there an easy way to do this? -- Nathaniel From charlesr.harris at gmail.com Sat Jul 9 13:30:57 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Jul 2011 11:30:57 -0600 Subject: [SciPy-User] Cholesky for semi-definite matrices? In-Reply-To: References: Message-ID: On Sat, Jul 9, 2011 at 9:19 AM, Nathaniel Smith wrote: > Hi all, > > I've run into a case where it'd be convenient to be able to compute > the Cholesky decomposition of a semi-definite matrix. (It's a > covariance matrix computed from not-enough samples, so it's positive > semi-definite but rank-deficient.) 
As any schoolchild knows, the > Cholesky is well defined for such cases, but I guess that shows why > you shouldn't trust schoolchildren, because I guess the standard > implementations blow up if you try: > > In [155]: np.linalg.cholesky([[1, 0], [0, 0]]) > LinAlgError: Matrix is not positive definite - Cholesky > decomposition cannot be computed > > Is there an easy way to do this? > > The positive definite Cholesky algorithm is basically Gauss elimination without pivoting. The semidefinite factorization is generally not unique unless pivoting is introduced to move zero factors to a common spot and if your matrix has errors the idea of semidefinite becomes pretty indefinite anyway. Depending on the use case, you will probably have more success with either eigh or svd. Note that in the svd case the singular values are always positive, but the resulting factorization will not be symmetric when one of the eigenvalues is negative. In the eigh case you can simply set small, or negative, eigenvalues to zero. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Sat Jul 9 13:36:01 2011 From: e.antero.tammi at gmail.com (eat) Date: Sat, 9 Jul 2011 20:36:01 +0300 Subject: [SciPy-User] Cholesky for semi-definite matrices? In-Reply-To: References: Message-ID: Hi, On Sat, Jul 9, 2011 at 6:19 PM, Nathaniel Smith wrote: > Hi all, > > I've run into a case where it'd be convenient to be able to compute > the Cholesky decomposition of a semi-definite matrix. (It's a > covariance matrix computed from not-enough samples, so it's positive > semi-definite but rank-deficient.) As any schoolchild knows, the > Cholesky is well defined for such cases, but I guess that shows why > you shouldn't trust schoolchildren, because I guess the standard > implementations blow up if you try: > > In [155]: np.linalg.cholesky([[1, 0], [0, 0]]) > LinAlgError: Matrix is not positive definite - Cholesky > decomposition cannot be computed > > Is there an easy way to do this? > How bout a slight regularization, like: In []: A Out[]: array([[1, 0], [0, 0]]) In []: A+ 1e-14* eye(2) Out[]: array([[ 1.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 1.00000000e-14]]) In []: L= linalg.cholesky(A+ 1e-14* eye(2)) In []: dot(L, L.T) Out[]: array([[ 1.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 1.00000000e-14]]) My 2 cents, - eat > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.berg14 at gmail.com Sun Jul 10 00:45:37 2011 From: neil.berg14 at gmail.com (Neil Berg) Date: Sat, 9 Jul 2011 21:45:37 -0700 Subject: [SciPy-User] scipy.io.netcdf error Message-ID: <8F3A6AA2-8F17-4B6C-B930-C5BAE2846F1B@gmail.com> Hello scipy community, I recently installed scipy 0.10 through Chris Fonnesbeck's Scipy Superpack script. When trying to read a netCDF file I receive an error message. 
>>> python >>> from scipy.io import netcdf >>> f = netcdf.netcdf_file('/Users/vert/Downloads/sres.nc','r') Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 205, in __init__ self._read() File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 492, in _read self._read_var_array() File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 576, in _read_var_array buffer=mm, offset=begin_, order=0) TypeError: data type ">d8" not understood Has anyone encountered this error and is there a way to cure it? Many thanks, Neil ----------------------------------------------- Mac OS X 10.6.7 Darwin Kernel Version 10.7.4 i386 ----------------------------------------------- From lesserwhirls at gmail.com Sun Jul 10 03:55:18 2011 From: lesserwhirls at gmail.com (Sean Arms) Date: Sun, 10 Jul 2011 01:55:18 -0600 Subject: [SciPy-User] scipy.io.netcdf error In-Reply-To: <8F3A6AA2-8F17-4B6C-B930-C5BAE2846F1B@gmail.com> References: <8F3A6AA2-8F17-4B6C-B930-C5BAE2846F1B@gmail.com> Message-ID: On Jul 9, 2011, at 10:45 PM, Neil Berg wrote: > Hello scipy community, > > I recently installed scipy 0.10 through Chris Fonnesbeck's Scipy Superpack script. When trying to read a netCDF file I receive an error message. > >>>> python >>>> from scipy.io import netcdf >>>> f = netcdf.netcdf_file('/Users/vert/Downloads/sres.nc','r') > Traceback (most recent call last): > File "", line 1, in > File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 205, in __init__ > self._read() > File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 492, in _read > self._read_var_array() > File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 576, in _read_var_array > buffer=mm, offset=begin_, order=0) > TypeError: data type ">d8" not understood > > Has anyone encountered this error and is there a way to cure it? > Hi Neil, I recently ran into this with PyDap. In the latest dev branch of Numpy, >d8 does not appear to be a valid dtype. That said, I did a little digging and noticed that in Numpy 1.4, the >d8 dtype is automatically changed to >f8, but this is done silently. It was a one line fix in PyDap, but I don't know the story as to when things changed in Numpy. 
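You can see what numpy is objecting to directly, independent of scipy. A quick check (a sketch, not run against your exact numpy build, so treat the output as what I'd expect rather than a guarantee):

>>> import numpy as np
>>> np.dtype('>f8')    # big-endian 64-bit float, the spelling that still works
dtype('>f8')
>>> np.dtype('>d8')    # 'd' already means float64, so newer numpy rejects the trailing itemsize
Traceback (most recent call last):
  ...
TypeError: data type ">d8" not understood

So the analogous one-line fix on the scipy side would be to spell that dtype '>f8' wherever scipy/io/netcdf.py builds the '>d8' string (the line shown in your traceback); I haven't checked that exact code, so take this as a pointer rather than a patch.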
Hope that helps (in some way at least), Sean > Many thanks, > Neil > > ----------------------------------------------- > Mac OS X 10.6.7 > Darwin Kernel Version 10.7.4 i386 > ----------------------------------------------- > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From klonuo at gmail.com Sun Jul 10 06:36:58 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Sun, 10 Jul 2011 12:36:58 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: In a hope to be helpful to others in possible similar scenario I'm replying with this: Problem here was missing requirements from this sparse matrix packages: http://www.cise.ufl.edu/research/sparse/ Which I build for scipy sparse and umfpack and then tried to build scipy again as previous building error was triggered in sparsetools This resulted in happy end As a note UMFPACK wasn't defined in my scipy site.cfg and it seems like it doesn't matter Cheers From klonuo at gmail.com Sun Jul 10 07:17:21 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Sun, 10 Jul 2011 13:17:21 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: On Sun, Jul 10, 2011 at 12:36 PM, Klonuo Umom wrote: > In a hope to be helpful to others in possible similar scenario I'm > replying with this: > > Problem here was missing requirements from this sparse matrix > packages: http://www.cise.ufl.edu/research/sparse/ > Which I build for scipy sparse and umfpack and then tried to build > scipy again as previous building error was triggered in sparsetools > > This resulted in happy end This seems wrong. I build latest scipy from svn successfully, but not stable 0.9.0 Additionally I compared problematic file: scipy/sparse/sparsetools/csr_wrap.cxx in both versions and they don't differ. Comparing 'sparsetools' directories between both versions did not say much to me either, so I'm afraid above mentioned solution is non-working. Sorry for that Cheers From kgdunn at gmail.com Sun Jul 10 14:14:25 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Sun, 10 Jul 2011 14:14:25 -0400 Subject: [SciPy-User] SciPy Central: file and link sharing site Message-ID: I'm announcing an alpha release of SciPy Central, a website for sharing code snippets, recipes and links related to scientific computing, specifically using NumPy, SciPy, matplotlib, IPython and similar tools. http://scipy-central.org The idea for the website grew out of previous discussions on this list. The site is currently in alpha mode, which means we'd like you to stress test it by filling in garbage information and trying to break it. We'll keep it in this mode for a couple of days and iron out any bugs. Please report those at https://github.com/kgdunn/SciPyCentral/issues - for those familiar with Django, we've left the site in DEBUG mode, so you can copy/paste the stack trace in your bug reports. Then we will delete all submissions and start in beta mode with DEBUG turned off. If you have any suggestions for improvements, please also post them on the issues list. Thanks, Kevin From peter.norlindh at gmail.com Sun Jul 10 15:18:32 2011 From: peter.norlindh at gmail.com (Peter Norlindh) Date: Sun, 10 Jul 2011 21:18:32 +0200 Subject: [SciPy-User] Install Scipy on Python 3.X Message-ID: Hi, I have just installed Python 3.2 on Ubuntu 11.04 and am now struggling to install Scipy 0.9.0. 
Earlier, I effortlessly installed Scipy (from the repository) on Python 2.7, but installing it on Python 3.2 seems to be a whole different animal. Are there any easy-to-follow installation instructions available? The INSTALL files that come with the tar files (Numpy and Scipy) are probably very informative, but I find them too extensive and hard to follow. Any help to get it all set up would be greatly appreciated. Best Regards, Peter Norlindh -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun Jul 10 18:40:32 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 10 Jul 2011 15:40:32 -0700 Subject: [SciPy-User] SciPy Central: file and link sharing site In-Reply-To: References: Message-ID: On Sun, Jul 10, 2011 at 11:14 AM, Kevin Dunn wrote: > I'm announcing an alpha release of SciPy Central, a website for > sharing code snippets, recipes and links related to scientific > computing, specifically using NumPy, SciPy, matplotlib, IPython and > similar tools. > > http://scipy-central.org Thank you for pushing on this and making it happen! I'm sure it will grow into a very useful resource, I've forwarded the announcement to the ipython lists. Best, f From klonuo at gmail.com Mon Jul 11 13:31:08 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Mon, 11 Jul 2011 19:31:08 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: I ended this frustration, by commenting 'scipy/sparse/sparsetools/setup.py': #~ for fmt in ['csr','csc','coo','bsr','dia','csgraph']: for fmt in ['coo','dia']: so that only two modules are availabe in sparsetools, as other could not be compiled It's a bad feeling when trying to optimize build and then end with slightly crippled package, but this is too complex for me to handle it differently, with my current skills and lacking google helpers. I still don't know how I compiled scipy 0.10dev, as now I can't, and tried any possible combinations with Intel variables that I remember using the other day. In summary, I have numpy 1.6.0 and scipy 0.9.0 MKL builds, where scipy test shows one error (besides errors in missing sparsetools modules): ERROR: Failure: ImportError (/usr/local/lib/python2.6/dist-packages/scipy/linalg/atlas_version.so: undefined symbol: ATL_buildinfo) and two failures in: test_basic.TestIfftnSingle test_basic.TestCephes which I guess aren't alarming From ralf.gommers at googlemail.com Mon Jul 11 15:28:14 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 11 Jul 2011 21:28:14 +0200 Subject: [SciPy-User] ANN: NumPy 1.6.1 release candidate 3 Message-ID: Hi, I am pleased to announce the availability of the third release candidate of NumPy 1.6.1. This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() #1895/1896 iter: writeonly operands weren't always being buffered correctly This third RC has only a single change compared to RC2 (for #1895/1896), which fixes a serious regression in the iterator. If no new problems are reported, the final release will be in one week. 
Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Mon Jul 11 21:44:07 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Mon, 11 Jul 2011 18:44:07 -0700 (PDT) Subject: [SciPy-User] generic_flapack.pyf and geqp3 Message-ID: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Hi, all. I am planning to try to add functionality to scipy.linalg.qr(), specifically to allow qr decomposition with pivoting. However, I have almost no knowledge of how to wrap the function in scipy/linalg/ generic_flapack.pyf. Could somebody please point me in the correct direction? Thanks, Collin Stocks From jsseabold at gmail.com Mon Jul 11 22:25:57 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 11 Jul 2011 21:25:57 -0500 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: On Mon, Jul 11, 2011 at 8:44 PM, Collin Stocks wrote: > Hi, all. > > I am planning to try to add functionality to scipy.linalg.qr(), > specifically to allow qr decomposition with pivoting. However, I have > almost no knowledge of how to wrap the function in scipy/linalg/ > generic_flapack.pyf. > > Could somebody please point me in the correct direction? Far from an expert, but I've used the 'smart way' in f2py to wrap some LAPACK stuff. Basically, you run f2py on the fortran source, then edit the .pyf file as you need to (use the f2py docs, the routine you're wrapping's documentation, and the pyf examples in scipy for guidance), and then run f2py again to build it (with -llapack or whatever if you need to link against other libraries). http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#the-smart-way I'd be interested to see an example of how to accomplish something similar with fwrap. Skipper From collinstocks at gmail.com Mon Jul 11 23:58:12 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Mon, 11 Jul 2011 20:58:12 -0700 (PDT) Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: Thanks :) Actually, I already managed to figure out how to create the patch before reading your post, but thanks for the quick reply anyway. Any idea where I should go from here? I've submitted a patch to the tracker (http://projects.scipy.org/scipy/ticket/1473). I guess I should just wait... It's just that there is some specific code I want to run (I have already written it), and I would prefer that my code not rely on my own personal fork of the SciPy project ;D Actually, the code I have written is probably going to end up as part of SciPy (or one of the SciKits), as it is a Python implementation of the stepwisefit function from MatLab's statistics toolbox, which is rather useful to some people. (Perhaps seeing this may be an incentive for some of the developers to accept my patch for qr()...) On Jul 11, 10:25?pm, Skipper Seabold wrote: > On Mon, Jul 11, 2011 at 8:44 PM, Collin Stocks wrote: > > Hi, all. > > > I am planning to try to add functionality to scipy.linalg.qr(), > > specifically to allow qr decomposition with pivoting. However, I have > > almost no knowledge of how to wrap the function in scipy/linalg/ > > generic_flapack.pyf. 
> > > Could somebody please point me in the correct direction? > > Far from an expert, but I've used the 'smart way' in f2py to wrap some > LAPACK stuff. Basically, you run f2py on the fortran source, then edit > the .pyf file as you need to (use the f2py docs, the routine you're > wrapping's documentation, and the pyf examples in scipy for guidance), > and then run f2py again to build it (with -llapack or whatever if you > need to link against other libraries). > > http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#the-smart-way > > I'd be interested to see an example of how to accomplish something > similar with fwrap. > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From jsseabold at gmail.com Tue Jul 12 00:17:06 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 11 Jul 2011 23:17:06 -0500 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: On Mon, Jul 11, 2011 at 10:58 PM, Collin Stocks wrote: > Thanks :) > Actually, I already managed to figure out how to create the patch > before reading your post, but thanks for the quick reply anyway. Any > idea where I should go from here? I've submitted a patch to the > tracker (http://projects.scipy.org/scipy/ticket/1473). I guess I > should just wait... Some advice- If you leave it on trac, you should at least attach a diff (preferably versus the most recent scipy) so it's easier to see exactly what's changed. Patches should also include a test. https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt If you really want to go the extra mile, you should add a test and then use git to make a patch and submit to the ML/trac for review. Then you can make a pull request if it looks good. This keeps the legwork pretty low for core developers. http://docs.scipy.org/doc/numpy/dev/index.html > > It's just that there is some specific code I want to run (I have > already written it), and I would prefer that my code not rely on my > own personal fork of the SciPy project ;D > > Actually, the code I have written is probably going to end up as part > of SciPy (or one of the SciKits), as it is a Python implementation of > the stepwisefit function from MatLab's statistics toolbox, which is > rather useful to some people. (Perhaps seeing this may be an incentive > for some of the developers to accept my patch for qr()...) > We would certainly be interested in a stepwisefit implementation for statsmodels. http://statsmodels.sourceforge.net/ Skipper > On Jul 11, 10:25?pm, Skipper Seabold wrote: >> On Mon, Jul 11, 2011 at 8:44 PM, Collin Stocks wrote: >> > Hi, all. >> >> > I am planning to try to add functionality to scipy.linalg.qr(), >> > specifically to allow qr decomposition with pivoting. However, I have >> > almost no knowledge of how to wrap the function in scipy/linalg/ >> > generic_flapack.pyf. >> >> > Could somebody please point me in the correct direction? >> >> Far from an expert, but I've used the 'smart way' in f2py to wrap some >> LAPACK stuff. Basically, you run f2py on the fortran source, then edit >> the .pyf file as you need to (use the f2py docs, the routine you're >> wrapping's documentation, and the pyf examples in scipy for guidance), >> and then run f2py again to build it (with -llapack or whatever if you >> need to link against other libraries). 
>> http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#the-smart-way >> >> I'd be interested to see an example of how to accomplish something >> similar with fwrap. >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From collinstocks at gmail.com Tue Jul 12 00:22:25 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Tue, 12 Jul 2011 00:22:25 -0400 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: <1310444545.2548.85.camel@SietchTabr> Thanks for the advice. I'll put that off for tomorrow, though, as I desperately need some rest... Good night, -- Collin -------------- next part -------------- An embedded message was scrubbed... From: Skipper Seabold Subject: Re: [SciPy-User] generic_flapack.pyf and geqp3 Date: Mon, 11 Jul 2011 23:17:06 -0500 Size: 6027 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From gael.varoquaux at normalesup.org Tue Jul 12 10:23:17 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 12 Jul 2011 16:23:17 +0200 Subject: [SciPy-User] Cholesky for semi-definite matrices? In-Reply-To: References: Message-ID: <20110712142317.GE17559@phare.normalesup.org> On Sat, Jul 09, 2011 at 08:36:01PM +0300, eat wrote: > > (It's a > > covariance matrix computed from not-enough samples, so it's positive > > semi-definite but rank-deficient.) > How bout a slight regularization, Indeed, from a statistics point of view, a non-positive-definite covariance matrix is simply an estimation error. As a descriptive statistic, it may look good, but it does not contain much information on the population covariance. The easiest solution is to regularize it. The good news is that there exist good heuristics (approximate oracles) to find the optimal regularization parameter. If you assume that your underlying data is Gaussian, the 'Approximate Oracle Shrinkage' is optimal. If you don't want this assumption, the Ledoit-Wolf shrinkage works great. In practice they are often similar, but LW tends to under-penalize compared to AOS if you have little data compared to the dimension of your dataset. The other piece of good news is that you can find Python implementations of these heuristics in the scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/covariance/shrunk_covariance_.py There you will find references to the papers if you are interested, and you can simply copy the function out of the scikit-learn if you don't want to depend on it. HTH, Gael From ralf.gommers at googlemail.com Tue Jul 12 15:16:35 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 12 Jul 2011 21:16:35 +0200 Subject: [SciPy-User] Install Scipy on Python 3.X In-Reply-To: References: Message-ID: Hi Peter, On Sun, Jul 10, 2011 at 9:18 PM, Peter Norlindh wrote: > Hi, > > I have just installed Python 3.2 on Ubuntu 11.04 and am now struggling to > install Scipy 0.9.0. Earlier, I effortlessly installed Scipy (from the > repository) on Python 2.7, but installing it on Python 3.2 seems to be a > whole different animal.
> > Are there any easy-to-follow installation instructions available? The > INSTALL files that come with the tar files (Numpy and Scipy) are probably > very informative, but I find them too extensive and hard to follow. > > Any help to get it all set up would be greatly appreciated. > > Installing under Python 3.x is not much different, except that it takes longer due to 2to3 running before compilation. If you have the right prerequisites installed the usual "python setup.py install" should do the job. The most up-to-date instructions are at http://scipy.org/Installing_SciPy/. If you run into a specific issue, please tell us your OS, compilers, etc. plus the build commands you used and the build log. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwang at streamitive.com Tue Jul 12 16:00:14 2011 From: pwang at streamitive.com (Peter Wang) Date: Tue, 12 Jul 2011 15:00:14 -0500 Subject: [SciPy-User] Scipy 2011 Convore thread now open Message-ID: Hi folks, I have gone ahead and created a Convore group for the SciPy 2011 conference: https://convore.com/scipy-2011/ I have already created threads for each of the tutorial topics, and once the conference is underway, we'll create threads for each talk, so that audience can interact and post questions. Everyone is welcome to create topics of their own, in addition to the "official" conference topics. For those who are unfamiliar with Convore, it is a cross between a mailing list and a very souped-up IRC. It's usable for aynchronous discussion, but great for realtime, topical chats. Those of you who were at PyCon this year probably saw what a wonderful tool Convore proved to be for a tech conference. People used it for everything from BoF planning to dinner coordination to good-natured heckling of lightning talk speakers. I'm hoping that it will be used to similarly good effect for the SciPy Cheers, Peter From lbloy at seas.upenn.edu Tue Jul 12 17:01:37 2011 From: lbloy at seas.upenn.edu (Luke Bloy) Date: Tue, 12 Jul 2011 17:01:37 -0400 Subject: [SciPy-User] Sparse matrix question Message-ID: <4E1CB631.1020904@seas.upenn.edu> Hi, I have what I hope is a fairly simple scipy question about sparse matrices. I have a sparse matrix (A) that i use to build a constraint matrix for an optimisation problem. The constraints, that concern A, are that A_{i,j} d_{j} - A_{j,i} d_{i} == 0. I would then find the optimal d vector. So I'm having problems efficiently building my constraints. The first part is simple as it is just the nonzero elements of A but accessing A transpose in the same order is difficult. The basic logic is this.... rows=numpy.zeros(2*A.nnz, dtype=numpy.int32) cols=numpy.zeros(2*A.nnz, dtype=numpy.int32) data=numpy.zeros(2*A.nnz, dtype=numpy.float64) #A_{i,j} d_{j} numNonZeros = A.nnz rows[:numNonZeros] = numpy.arange(numNonZeros) cols[:numNonZeros] = A.col data[:numNonZeros] = A.data #-A_{j,i} d_{i} rows[numNonZeros:] = numpy.arange(numNonZeros) cols[numNonZeros:] = A.row data[numNonZeros:] = - A[A.col,A.row] ### Atranspose[A.row, A.col] The problem is that this takes too long (> 2minutes for a 3000x3000 matrix with 3 million nonzeros). similary code in matlab is an order of magnitude faster? I've tried both csr and csc for doing the memory access of the transpose, (attached is the csrTest , i can send the matrix file if you are interested) Does anyone have any suggestions on speeding this up? Thanks, Luke -------------- next part -------------- A non-text attachment was scrubbed... 
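One way to attack the slow step above -- a rough sketch against a square COO matrix, not tested on the attached testCsr.py or the real 3000x3000 data: avoid fancy-indexing the sparse matrix entirely and instead align the transposed entries with A's nonzero ordering by sorting linear indices and looking them up with searchsorted. Entries whose transposed position (j, i) is not stored come back as zero, which is what the constraint A_{i,j} d_{j} - A_{j,i} d_{i} == 0 needs. It assumes the COO data has no duplicate (i, j) entries (round-trip through tocsr() first if unsure).

import numpy as np

def transposed_values(A):
    # A is a square scipy.sparse matrix; returns A[j, i] for every stored (i, j),
    # in the same order as A.row/A.col, with 0 where A[j, i] is not stored.
    A = A.tocoo()
    n = A.shape[1]
    keys = A.row.astype(np.int64) * n + A.col        # linear index of each stored entry
    order = np.argsort(keys)
    sorted_keys = keys[order]
    sorted_data = A.data[order]
    wanted = A.col.astype(np.int64) * n + A.row      # linear index of the transposed position
    pos = np.searchsorted(sorted_keys, wanted)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    hit = sorted_keys[pos] == wanted
    out = np.zeros(A.nnz, dtype=A.data.dtype)
    out[hit] = sorted_data[pos[hit]]                 # A[j, i] where stored, 0 elsewhere
    return out

With that, the expensive line becomes data[numNonZeros:] = -transposed_values(A), and everything is vectorized sorting and searching rather than per-element sparse indexing.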
Name: testCsr.py Type: text/x-python Size: 1516 bytes Desc: not available URL: From developer at studioart.org Wed Jul 13 01:36:12 2011 From: developer at studioart.org (Long Duong) Date: Tue, 12 Jul 2011 22:36:12 -0700 Subject: [SciPy-User] [Numpy-discussion] Scipy 2011 Convore thread now open In-Reply-To: References: Message-ID: Does anybody know if there are there videos of the conference this year? Best regards, Long Duong UC Irvine Biomedical Engineering long at studioart.org On Tue, Jul 12, 2011 at 1:00 PM, Peter Wang wrote: > Hi folks, > > I have gone ahead and created a Convore group for the SciPy 2011 > conference: > > https://convore.com/scipy-2011/ > > I have already created threads for each of the tutorial topics, and > once the conference is underway, we'll create threads for each talk, > so that audience can interact and post questions. Everyone is welcome > to create topics of their own, in addition to the "official" > conference topics. > > For those who are unfamiliar with Convore, it is a cross between a > mailing list and a very souped-up IRC. It's usable for aynchronous > discussion, but great for realtime, topical chats. Those of you who > were at PyCon this year probably saw what a wonderful tool Convore > proved to be for a tech conference. People used it for everything > from BoF planning to dinner coordination to good-natured heckling of > lightning talk speakers. I'm hoping that it will be used to similarly > good effect for the SciPy > > > Cheers, > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jul 13 12:42:37 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 Jul 2011 11:42:37 -0500 Subject: [SciPy-User] [Numpy-discussion] Scipy 2011 Convore thread now open In-Reply-To: References: Message-ID: On Wed, Jul 13, 2011 at 00:36, Long Duong wrote: > > Does anybody know if there are there videos of the conference this year? Yes. Announcements will be made when they start going online. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From jreback at yahoo.com Tue Jul 12 07:27:00 2011 From: jreback at yahoo.com (Jeff Reback) Date: Tue, 12 Jul 2011 04:27:00 -0700 (PDT) Subject: [SciPy-User] statsmodels - rlm Message-ID: <1310470020.36827.YahooMailNeo@web125903.mail.ne1.yahoo.com> Hi, ? Is their a way to recover the final Results class in a robust estimation? ? see below - I can clearly replicate the WLS regression and get a Results class with the correct results; is their a reference in the RLMResults class (that points to the wls_results) that I am missing? ? thanks, ? Jeff ? # using v2 of statsmodels import numpy as np import scikits.statsmodels as sm ? #delivery time(minutes)??? endog = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83,??? ????????????????? 79.24, 21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00,??? ????????????????? 9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75]) ? #number of cases, distance (Feet)??? exog = np.array([[7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, 3, 17, 10, 26, 9, 8, 4], ???????????????? 
[560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255, 462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635,150]]) exog = exog.T??? exog = sm.add_constant(exog) rlm? = sm.RLM(endog, exog).fit() print "RLM params -> %s, r2 -> %s" % (rlm.params, getattr(rlm,'rsquared',None)) wls? = sm.WLS(endog, exog, weights = rlm.weights).fit() print "WLS params -> %s, r2 -> %s" % (wls.params, wls.rsquared)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Jul 14 12:50:58 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 14 Jul 2011 11:50:58 -0500 Subject: [SciPy-User] statsmodels - rlm In-Reply-To: <1310470020.36827.YahooMailNeo@web125903.mail.ne1.yahoo.com> References: <1310470020.36827.YahooMailNeo@web125903.mail.ne1.yahoo.com> Message-ID: On Tue, Jul 12, 2011 at 6:27 AM, Jeff Reback wrote: > Hi, > > Is their a way to recover the final Results class in a robust estimation? > > see below - I can clearly replicate the WLS regression and get a Results > class with the correct results; > is their a reference in the RLMResults class (that points to the > wls_results) that I am missing? > > thanks, > > Jeff > > # using v2 of statsmodels > import numpy as np > import scikits.statsmodels as sm > > #delivery time(minutes) > endog = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, > ????????????????? 79.24, 21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, > 15.35, 19.00, > ????????????????? 9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75]) > > #number of cases, distance (Feet) > exog = np.array([[7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, > 3, 17, 10, 26, 9, 8, 4], > ???????????????? [560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, > 215, 255, 462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635,150]]) > exog = exog.T > exog = sm.add_constant(exog) > rlm? = sm.RLM(endog, exog).fit() > print "RLM params -> %s, r2 -> %s" % (rlm.params, > getattr(rlm,'rsquared',None)) > wls? = sm.WLS(endog, exog, weights = rlm.weights).fit() > print "WLS params -> %s, r2 -> %s" % (wls.params, wls.rsquared) > For posterity, answered here. https://groups.google.com/group/pystatsmodels/browse_thread/thread/ab999ff6ab32c5e0 Skipper From paul.blelloch at ata-e.com Thu Jul 14 17:56:51 2011 From: paul.blelloch at ata-e.com (Paul Blelloch) Date: Thu, 14 Jul 2011 14:56:51 -0700 Subject: [SciPy-User] Deciphering Matlab Structures in Python Message-ID: I used the scipy.io.loadmat function to read a Matlab save file that includes some structures. 
In particular my Matlab code looks something like: a=randn(10,10); e=eig(a); c.mat=a; c.eig=e; save When I read this in I have not trouble getting the a and e matrices, but when I look at the structure c I get something that looks like: array([[ ([[-1.918381876114684, 0.35909312220124379, 0.071481108399536447, 0.19413316926540342, -0.16044557980861471, -1.2431101120071242, 1.1728553895658809, 0.48682974787947486, -2.6862862203427951, 1.6479268546008081], [-0.13106782428104052, -0.79434474702678692, 0.15562063581539223, 0.29769161930929805, 0.37883517610546497, 0.13185918869913515, -2.1155833182251094, -0.4459044413697299, 1.5483561910050594, -0.30667757004699159], [-0.76863166681707873, -0.22731164066134216, -0.18219139268446782, -0.71012508212938263, -0.10716072249907738, -0.99947508351036363, 0.46216113959481137, 0.32567307426214315, -0.92437868406001578, -0.77404140889205997], [2.3899518363067971, 1.5938213548880649, 0.73101584111958162, -0.68767410135748297, 0.21283509315953025, -0.35421345818508981, -2.4599031616561144, -0.02009780648140742, -0.72040490902555765, -1.3542395842984021], [0.077245679591339572, 0.15520563382674926, -0.34756806580692184, 0.57283812483950614, 0.46806391670861758, 0.10558871537972472, -0.30485718475158774, -1.8884755753587876, 0.83350531035866249, -0.32592678918795287], [0.37560745023340852, 0.17859562900297904, 0.23442400798144603, 0.44521760367790375, -1.6160743410467069, 0.95542626643090245, 0.39823325391367814, 0.33836146832009201, -0.56649185628868159, 1.5532163153526557], [0.39556416936787731, -0.33773084599208325, -0.86772936409388435, -1.3259592591460707, 0.014738006197359526, 0.46479897799135333, -1.7057865505375613, 0.33780540287940392, -0.99915099624077919, 0.62813120976932557], [-0.11249908896048552, -1.5249918897593304, -0.98253858503060265, -1.4998754094582201, -0.037764347672058281, 0.26670638822706283, 0.75940981180842837, 0.42433121858231648, 0.64126705212804702, -1.0275461814124498], [-1.8290367022161331, -0.70940664667282161, -0.29357242375327913, 1.0159682271872925, 0.40380900009311393, -1.4232233423222125, 0.20323761868334428, 0.11287078734861246, -0.25395009085808468, -0.20194229402517888], [2.0914521934834784, -0.86661663365224473, 0.82446963631241454, -0.027515788346229055, 0.30525439285092776, 0.11105730902622701, 0.95868278992227296, -0.11827036575813449, -1.1216999904267995, 0.86106602555594924]], [[(-4.5696334670036887+0j)], [(3.7785600753832629+0j)], [(-2.5789412296806438+0j)], [(0.30103481909678725+1.8315300529614498j)], [(0.30103481909678725-1.8315300529614498j)], [(1.4965987808471322+0j)], [(-0.6068802651298042+1.19212639770675j)], [(-0.6068802651298042-1.19212639770675j)], [(-0.54808825947184969+0j)], [(0.19975366069054107+0j)]])]], dtype=[('mat', '|O4'), ('eig', '|O4')]) Looking through the numpy documentation I can't figure out how to interpret that. How do I pull the fields out of the structure? THANKS, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jul 14 18:11:03 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 14 Jul 2011 23:11:03 +0100 Subject: [SciPy-User] Deciphering Matlab Structures in Python In-Reply-To: References: Message-ID: Hi, On Thu, Jul 14, 2011 at 10:56 PM, Paul Blelloch wrote: > I used the scipy.io.loadmat function to read a Matlab save file > that?includes some structures. 
?In particular my Matlab code looks?something > like: > > a=randn(10,10); > e=eig(a); > c.mat=a; > c.eig=e; > save > > When I read this in I have not trouble getting the a and e matrices,? but > when I look at the structure c I get something that looks like: > > array([[ ([[-1.918381876114684, 0.35909312220124379, > 0.071481108399536447, 0.19413316926540342, -0.16044557980861471, > -1.2431101120071242, 1.1728553895658809, 0.48682974787947486, > -2.6862862203427951, 1.6479268546008081], [-0.13106782428104052, > -0.79434474702678692, 0.15562063581539223, 0.29769161930929805, > 0.37883517610546497, 0.13185918869913515, -2.1155833182251094, > -0.4459044413697299, 1.5483561910050594, -0.30667757004699159], > [-0.76863166681707873, -0.22731164066134216, -0.18219139268446782, > -0.71012508212938263, -0.10716072249907738, -0.99947508351036363, > 0.46216113959481137, 0.32567307426214315, -0.92437868406001578, > -0.77404140889205997], [2.3899518363067971, 1.5938213548880649, > 0.73101584111958162, -0.68767410135748297, 0.21283509315953025, > -0.35421345818508981, -2.4599031616561144, -0.02009780648140742, > -0.72040490902555765, -1.3542395842984021], [0.077245679591339572, > 0.15520563382674926, -0.34756806580692184, 0.57283812483950614, > 0.46806391670861758, 0.10558871537972472, -0.30485718475158774, > -1.8884755753587876, 0.83350531035866249, -0.32592678918795287], > [0.37560745023340852, 0.17859562900297904, 0.23442400798144603, > 0.44521760367790375, -1.6160743410467069, 0.95542626643090245, > 0.39823325391367814, 0.33836146832009201, -0.56649185628868159, > 1.5532163153526557], [0.39556416936787731, -0.33773084599208325, > -0.86772936409388435, -1.3259592591460707, 0.014738006197359526, > 0.46479897799135333, -1.7057865505375613, 0.33780540287940392, > -0.99915099624077919, 0.62813120976932557], [-0.11249908896048552, > -1.5249918897593304, -0.98253858503060265, -1.4998754094582201, > -0.037764347672058281, 0.26670638822706283, 0.75940981180842837, > 0.42433121858231648, 0.64126705212804702, -1.0275461814124498], > [-1.8290367022161331, -0.70940664667282161, -0.29357242375327913, > 1.0159682271872925, 0.40380900009311393, -1.4232233423222125, > 0.20323761868334428, 0.11287078734861246, -0.25395009085808468, > -0.20194229402517888], [2.0914521934834784, -0.86661663365224473, > 0.82446963631241454, -0.027515788346229055, 0.30525439285092776, > 0.11105730902622701, 0.95868278992227296, -0.11827036575813449, > -1.1216999904267995, 0.86106602555594924]], > [[(-4.5696334670036887+0j)], [(3.7785600753832629+0j)], > [(-2.5789412296806438+0j)], > [(0.30103481909678725+1.8315300529614498j)], > [(0.30103481909678725-1.8315300529614498j)], > [(1.4965987808471322+0j)], [(-0.6068802651298042+1.19212639770675j)], > [(-0.6068802651298042-1.19212639770675j)], > [(-0.54808825947184969+0j)], [(0.19975366069054107+0j)]])]], > ? ? ? dtype=[('mat', '|O4'), ('eig', '|O4')]) > > Looking through the numpy documentation I can't figure out how to?interpret > that. ?How do I pull the fields out of the structure? This is just a repeat of my reply on the Python X,Y list, just for reference. What you got here was a structured array. You can get what you want with something like: >>> import scipy.io as sio >>> ws = sio.loadmat('matlab.mat') >>> c = ws['c'] You got this far already of course. 
Notice that 'c' is a 2D array with a composite 'dtype': >>> c.shape (1, 1) >>> c.dtype dtype([('mat', '|O8'), ('eig', '|O8')] To get the underlying fields you need: >>> eig = c[0,0]['eig'] >>> mat = c[0,0]['mat'] To learn more it might be worth reading a little about structured arrays in numpy. Best, Matthew From paul.blelloch at ata-e.com Thu Jul 14 18:13:30 2011 From: paul.blelloch at ata-e.com (Paul Blelloch) Date: Thu, 14 Jul 2011 15:13:30 -0700 Subject: [SciPy-User] Deciphering Matlab Structures in Python In-Reply-To: Message-ID: <6133335d2877964fbfbe2e4785701123@mail> THANKS! I was missing the [0,0] part. -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Matthew Brett Sent: Thursday, July 14, 2011 3:11 PM To: SciPy Users List Subject: Re: [SciPy-User] Deciphering Matlab Structures in Python Hi, On Thu, Jul 14, 2011 at 10:56 PM, Paul Blelloch wrote: > I used the scipy.io.loadmat function to read a Matlab save file > that?includes some structures. ?In particular my Matlab code looks?something > like: > > a=randn(10,10); > e=eig(a); > c.mat=a; > c.eig=e; > save > > When I read this in I have not trouble getting the a and e matrices,? but > when I look at the structure c I get something that looks like: > > array([[ ([[-1.918381876114684, 0.35909312220124379, > 0.071481108399536447, 0.19413316926540342, -0.16044557980861471, > -1.2431101120071242, 1.1728553895658809, 0.48682974787947486, > -2.6862862203427951, 1.6479268546008081], [-0.13106782428104052, > -0.79434474702678692, 0.15562063581539223, 0.29769161930929805, > 0.37883517610546497, 0.13185918869913515, -2.1155833182251094, > -0.4459044413697299, 1.5483561910050594, -0.30667757004699159], > [-0.76863166681707873, -0.22731164066134216, -0.18219139268446782, > -0.71012508212938263, -0.10716072249907738, -0.99947508351036363, > 0.46216113959481137, 0.32567307426214315, -0.92437868406001578, > -0.77404140889205997], [2.3899518363067971, 1.5938213548880649, > 0.73101584111958162, -0.68767410135748297, 0.21283509315953025, > -0.35421345818508981, -2.4599031616561144, -0.02009780648140742, > -0.72040490902555765, -1.3542395842984021], [0.077245679591339572, > 0.15520563382674926, -0.34756806580692184, 0.57283812483950614, > 0.46806391670861758, 0.10558871537972472, -0.30485718475158774, > -1.8884755753587876, 0.83350531035866249, -0.32592678918795287], > [0.37560745023340852, 0.17859562900297904, 0.23442400798144603, > 0.44521760367790375, -1.6160743410467069, 0.95542626643090245, > 0.39823325391367814, 0.33836146832009201, -0.56649185628868159, > 1.5532163153526557], [0.39556416936787731, -0.33773084599208325, > -0.86772936409388435, -1.3259592591460707, 0.014738006197359526, > 0.46479897799135333, -1.7057865505375613, 0.33780540287940392, > -0.99915099624077919, 0.62813120976932557], [-0.11249908896048552, > -1.5249918897593304, -0.98253858503060265, -1.4998754094582201, > -0.037764347672058281, 0.26670638822706283, 0.75940981180842837, > 0.42433121858231648, 0.64126705212804702, -1.0275461814124498], > [-1.8290367022161331, -0.70940664667282161, -0.29357242375327913, > 1.0159682271872925, 0.40380900009311393, -1.4232233423222125, > 0.20323761868334428, 0.11287078734861246, -0.25395009085808468, > -0.20194229402517888], [2.0914521934834784, -0.86661663365224473, > 0.82446963631241454, -0.027515788346229055, 0.30525439285092776, > 0.11105730902622701, 0.95868278992227296, -0.11827036575813449, > -1.1216999904267995, 0.86106602555594924]], > 
[[(-4.5696334670036887+0j)], [(3.7785600753832629+0j)], > [(-2.5789412296806438+0j)], > [(0.30103481909678725+1.8315300529614498j)], > [(0.30103481909678725-1.8315300529614498j)], > [(1.4965987808471322+0j)], [(-0.6068802651298042+1.19212639770675j)], > [(-0.6068802651298042-1.19212639770675j)], > [(-0.54808825947184969+0j)], [(0.19975366069054107+0j)]])]], > ? ? ? dtype=[('mat', '|O4'), ('eig', '|O4')]) > > Looking through the numpy documentation I can't figure out how to?interpret > that. ?How do I pull the fields out of the structure? This is just a repeat of my reply on the Python X,Y list, just for reference. What you got here was a structured array. You can get what you want with something like: >>> import scipy.io as sio >>> ws = sio.loadmat('matlab.mat') >>> c = ws['c'] You got this far already of course. Notice that 'c' is a 2D array with a composite 'dtype': >>> c.shape (1, 1) >>> c.dtype dtype([('mat', '|O8'), ('eig', '|O8')] To get the underlying fields you need: >>> eig = c[0,0]['eig'] >>> mat = c[0,0]['mat'] To learn more it might be worth reading a little about structured arrays in numpy. Best, Matthew _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From cournape at gmail.com Fri Jul 15 09:08:49 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 15 Jul 2011 22:08:49 +0900 Subject: [SciPy-User] [ANN] Bento 0.0.6, a packaging solution for python software Message-ID: Hi, I am pleased to announce a new release of bento, a packaging solution for python which aims at reproducibility, extensibility and simplicity. It supports every python version from 2.4 to 3.2. You can take a look at its main features on Bento's main page (http://cournape.github.com/Bento). The main features of this 0.0.6 release are: - Completely revamped distutils compatibility layer: it is now a thin layer around bento infrastructure, so that most bento packages should be pip-installable, while still keeping bento customization capabilities. - Build directory is now customizable through bentomaker with --build-directory option - Out of tree builds support (i.e. running bento in a directory which does not contain bento.info), with global --bento-info option - Hook File can now be specified in recursed bento.info - Preliminary support for .mpkg (Mac OS X native packaging) - More consistent API for extension/compiled library build registration - Both numpy and scipy can now be built with bento + waf as a build backend Bento is discussed on the bento mailing list (http://librelist.com/browser/bento). cheers, David From ndbecker2 at gmail.com Fri Jul 15 12:58:30 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 12:58:30 -0400 Subject: [SciPy-User] estimation problem Message-ID: I have a known signal (vector) 'x'. I recieve a vector 'y' y = F(k x) + n where n is Gaussian noise, and k is an unknown gain parameter. I want to estimate k. F is a known function (nonlinear, memoryless). What might be a good approach to try? I'd like this to be an 'online' approach - that is, I provide batches of training vectors (x, n), and the estimator will improve the estimate (hopefully) as more data is supplied. 
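A minimal sketch of the curve_fit approach suggested in the reply that follows, with tanh standing in only as a placeholder for the known nonlinearity F, and the true gain, noise level and sample size all invented for illustration:

import numpy as np
from scipy.optimize import curve_fit

F = np.tanh                       # placeholder for the real memoryless nonlinearity
k_true = 0.7                      # invented gain, to be recovered
x = np.random.randn(500)                           # known signal
y = F(k_true * x) + 0.05 * np.random.randn(500)    # received signal plus Gaussian noise

def model(x, k):
    # received signal as a function of the unknown gain k
    return F(k * x)

k_est, k_cov = curve_fit(model, x, y, p0=1.0)
print(k_est[0], np.sqrt(k_cov[0, 0]))              # estimated gain and its standard error

As more batches arrive, one can simply re-run the fit on the concatenated data with the previous estimate as p0, which is the strategy described in the reply below.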
From josef.pktd at gmail.com Fri Jul 15 13:10:33 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jul 2011 13:10:33 -0400 Subject: [SciPy-User] estimation problem In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: > I have a known signal (vector) 'x'. ?I recieve a vector 'y' > > y = F(k x) + n > > where n is Gaussian noise, and k is an unknown gain parameter. > > I want to estimate k. > > F is a known function (nonlinear, memoryless). > > What might be a good approach to try? ?I'd like this to be an 'online' approach > - that is, I provide batches of training vectors (x, n), and the estimator will > improve the estimate (hopefully) as more data is supplied. scipy.optimize.curve_fit I would reestimate with the entire sample after a batch arrives using the old estimate as a starting value. There might be shortcuts reusing and updating the Jacobian and Hessian, but I don't know of anything that could be used directly. (I don't have much idea about non-linear kalman filters and whether they would help in this case.) Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ndbecker2 at gmail.com Fri Jul 15 13:20:10 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 13:20:10 -0400 Subject: [SciPy-User] estimation problem References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >> I have a known signal (vector) 'x'. I recieve a vector 'y' >> >> y = F(k x) + n >> >> where n is Gaussian noise, and k is an unknown gain parameter. >> >> I want to estimate k. >> >> F is a known function (nonlinear, memoryless). >> >> What might be a good approach to try? I'd like this to be an 'online' >> approach - that is, I provide batches of training vectors (x, n), and the >> estimator will improve the estimate (hopefully) as more data is supplied. > > scipy.optimize.curve_fit > > I would reestimate with the entire sample after a batch arrives using > the old estimate as a starting value. > > There might be shortcuts reusing and updating the Jacobian and > Hessian, but I don't know of anything that could be used directly. (I > don't have much idea about non-linear kalman filters and whether they > would help in this case.) > > Josef > Thanks. One fix, that should have been "I provide batches of training vectors (x, y)". n is unknown noise. From ndbecker2 at gmail.com Fri Jul 15 13:39:16 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 13:39:16 -0400 Subject: [SciPy-User] estimation problem References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >> I have a known signal (vector) 'x'. I recieve a vector 'y' >> >> y = F(k x) + n >> >> where n is Gaussian noise, and k is an unknown gain parameter. >> >> I want to estimate k. >> >> F is a known function (nonlinear, memoryless). >> >> What might be a good approach to try? I'd like this to be an 'online' >> approach - that is, I provide batches of training vectors (x, n), and the >> estimator will improve the estimate (hopefully) as more data is supplied. > > scipy.optimize.curve_fit > > I would reestimate with the entire sample after a batch arrives using > the old estimate as a starting value. > > There might be shortcuts reusing and updating the Jacobian and > Hessian, but I don't know of anything that could be used directly. 
(I > don't have much idea about non-linear kalman filters and whether they > would help in this case.) > In my case, x, y, n are complex. I guess I need to handle that myself (somehow). Traceback (most recent call last): File "test_curvefit.py", line 378, in run_line (sys.argv) File "test_curvefit.py", line 375, in run_line result = run (opt, cmdline) File "test_curvefit.py", line 244, in run popt, pcov = curve_fit (func, rcv_in[SPS*SI:SPS*SI+2*N], mod_out[SPS*SI:SPS*SI+2*N]) File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 426, in curve_fit res = leastsq(func, p0, args=args, full_output=1, **kw) File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 283, in leastsq gtol, maxfev, epsfcn, factor, diag) minpack.error: Result from function call is not a proper array of floats. From josef.pktd at gmail.com Fri Jul 15 14:04:39 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jul 2011 14:04:39 -0400 Subject: [SciPy-User] estimation problem In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 1:39 PM, Neal Becker wrote: > josef.pktd at gmail.com wrote: > >> On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >>> I have a known signal (vector) 'x'. ?I recieve a vector 'y' >>> >>> y = F(k x) + n >>> >>> where n is Gaussian noise, and k is an unknown gain parameter. >>> >>> I want to estimate k. >>> >>> F is a known function (nonlinear, memoryless). >>> >>> What might be a good approach to try? ?I'd like this to be an 'online' >>> approach - that is, I provide batches of training vectors (x, n), and the >>> estimator will improve the estimate (hopefully) as more data is supplied. >> >> scipy.optimize.curve_fit >> >> I would reestimate with the entire sample after a batch arrives using >> the old estimate as a starting value. >> >> There might be shortcuts reusing and updating the Jacobian and >> Hessian, but I don't know of anything that could be used directly. (I >> don't have much idea about non-linear kalman filters and whether they >> would help in this case.) >> > In my case, x, y, n are complex. ?I guess I need to handle that myself > (somehow). I guess curve_fit won't help then. optimize.leastsq should still work if the function returns a 1d array abs(y-F(x)) so that (abs(y-F(x))**2).sum() is the real loss function. If k is also complex, then I would think that it will have to be separated into real and complex parts as separate parameters. If you need the extra results, like covariance matrix of the estimate, then I would just copy the parts from curve_fit. (I don't think I have seen complex Gaussian noise, n, before.) Josef > > Traceback (most recent call last): > ?File "test_curvefit.py", line 378, in > ? ?run_line (sys.argv) > ?File "test_curvefit.py", line 375, in run_line > ? ?result = run (opt, cmdline) > ?File "test_curvefit.py", line 244, in run > ? ?popt, pcov = curve_fit (func, rcv_in[SPS*SI:SPS*SI+2*N], > mod_out[SPS*SI:SPS*SI+2*N]) > ?File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 426, > in curve_fit > ? ?res = leastsq(func, p0, args=args, full_output=1, **kw) > ?File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 283, > in leastsq > ? ?gtol, maxfev, epsfcn, factor, diag) > minpack.error: Result from function call is not a proper array of floats. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ndbecker2 at gmail.com Fri Jul 15 14:12:40 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 14:12:40 -0400 Subject: [SciPy-User] estimation problem References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, Jul 15, 2011 at 1:39 PM, Neal Becker wrote: >> josef.pktd at gmail.com wrote: >> >>> On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >>>> I have a known signal (vector) 'x'. I recieve a vector 'y' >>>> >>>> y = F(k x) + n >>>> >>>> where n is Gaussian noise, and k is an unknown gain parameter. >>>> >>>> I want to estimate k. >>>> >>>> F is a known function (nonlinear, memoryless). >>>> >>>> What might be a good approach to try? I'd like this to be an 'online' >>>> approach - that is, I provide batches of training vectors (x, n), and the >>>> estimator will improve the estimate (hopefully) as more data is supplied. >>> >>> scipy.optimize.curve_fit >>> >>> I would reestimate with the entire sample after a batch arrives using >>> the old estimate as a starting value. >>> >>> There might be shortcuts reusing and updating the Jacobian and >>> Hessian, but I don't know of anything that could be used directly. (I >>> don't have much idea about non-linear kalman filters and whether they >>> would help in this case.) >>> >> In my case, x, y, n are complex. I guess I need to handle that myself >> (somehow). > > I guess curve_fit won't help then. > optimize.leastsq should still work if the function returns a 1d array > abs(y-F(x)) so that (abs(y-F(x))**2).sum() is the real loss function. > > If k is also complex, then I would think that it will have to be > separated into real and complex parts as separate parameters. > > If you need the extra results, like covariance matrix of the estimate, > then I would just copy the parts from curve_fit. > > (I don't think I have seen complex Gaussian noise, n, before.) > > Josef > What I tried that seems to work is: def func (x, k): return complex_to_real (complex_func (real_to_complex (x * k))) popt, pcov = curve_fit (func, complex_to_real(x), complex_to_real (y)) where complex_to_real: interpret a complex vector as alternating real/imag parts real_to_complex: interpret alternating entries in float vector as real/imag parts of complex Does this seem like a valid use of curve_fit? From josef.pktd at gmail.com Fri Jul 15 14:31:42 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jul 2011 14:31:42 -0400 Subject: [SciPy-User] estimation problem In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 2:12 PM, Neal Becker wrote: > josef.pktd at gmail.com wrote: > >> On Fri, Jul 15, 2011 at 1:39 PM, Neal Becker wrote: >>> josef.pktd at gmail.com wrote: >>> >>>> On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >>>>> I have a known signal (vector) 'x'. ?I recieve a vector 'y' >>>>> >>>>> y = F(k x) + n >>>>> >>>>> where n is Gaussian noise, and k is an unknown gain parameter. >>>>> >>>>> I want to estimate k. >>>>> >>>>> F is a known function (nonlinear, memoryless). >>>>> >>>>> What might be a good approach to try? ?I'd like this to be an 'online' >>>>> approach - that is, I provide batches of training vectors (x, n), and the >>>>> estimator will improve the estimate (hopefully) as more data is supplied. 
>>>> >>>> scipy.optimize.curve_fit >>>> >>>> I would reestimate with the entire sample after a batch arrives using >>>> the old estimate as a starting value. >>>> >>>> There might be shortcuts reusing and updating the Jacobian and >>>> Hessian, but I don't know of anything that could be used directly. (I >>>> don't have much idea about non-linear kalman filters and whether they >>>> would help in this case.) >>>> >>> In my case, x, y, n are complex. ?I guess I need to handle that myself >>> (somehow). >> >> I guess curve_fit won't help then. >> optimize.leastsq should still work if the function returns a 1d array >> abs(y-F(x)) so that (abs(y-F(x))**2).sum() is the real loss function. >> >> If k is also complex, then I would think that it will have to be >> separated into real and complex parts as separate parameters. >> >> If you need the extra results, like covariance matrix of the estimate, >> then I would just copy the parts from curve_fit. >> >> (I don't think I have seen complex Gaussian noise, n, before.) >> >> Josef >> > > What I tried that seems to work is: > > def func (x, k): > ?return complex_to_real (complex_func (real_to_complex (x * k))) > > popt, pcov = curve_fit (func, complex_to_real(x), complex_to_real (y)) > > where complex_to_real: interpret a complex vector as alternating real/imag parts > real_to_complex: interpret alternating entries in float vector as real/imag > parts of complex > > Does this seem like a valid use of curve_fit? I'm not very good in complex algebra without sitting down and go through the details. your complex_to_real(x), complex_to_real (y) have now twice the length of the original x, y Does your func also return twice as many values? (after more thought, yes, since this is curve_fit and not leastsq.) Then, you are using a different objective function, sum of squares of real plus squares of complex parts, instead of square of complex. (real(y) - real(F(x)))**2 + (imag(y) - imag(F(x)))**2, instead of (y-F(x)) * (y-F(x)).conj() ? I don't know if this matters. Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From nberg at atmos.ucla.edu Sat Jul 16 01:20:23 2011 From: nberg at atmos.ucla.edu (Neil Berg) Date: Fri, 15 Jul 2011 22:20:23 -0700 Subject: [SciPy-User] netcdf integer output issue Message-ID: <7493B573-17DE-4310-919F-F333AC723D13@atmos.ucla.edu> Hi all, I am struggling to correctly output negative integers into a netCDF file. I have attached my code snippet and a 2-row sample of the input CSV data. After reading in 24 data points, I calculate the maximum, store it into a list, and output that list as a netCDF file. These are what the 2 rows of CSV input data look like, followed by the maximum value. ['21', '21', '21', '21', '20', '19', '17', '17', '16', '17', '17', '15', '16', '16', '11', '9', '7', '6', '5', '6', '4', '3', '2', '0'] ['-1', '-1', '-2', '-3', '-3', '-4', '-5', '-5', '-5', '-5', '-5', '-6', '-5', '-5', '-5', '-6', '-7', '-8', '-9', '-11', '-13', '-12', '-15', '-15'] This is the current netCDF output: time[0] max_t[0]=21 degrees F time[1] max_t[1]=4294967295 degrees F You can see that the time[0] output is correct, though the time[1] output is incorrect. I believe this is an issue with outputting negative integers. Have anyone else encountered this issue and know of a way to solve it? Thanks in advance, Neil -------------- next part -------------- A non-text attachment was scrubbed... 
Name: csv_sample2.csv Type: text/csv Size: 170 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: code_snippet.py Type: text/x-python-script Size: 1174 bytes Desc: not available URL: -------------- next part -------------- From kpr0001 at comcast.net Fri Jul 15 12:44:50 2011 From: kpr0001 at comcast.net (kpr0001 at comcast.net) Date: Fri, 15 Jul 2011 16:44:50 +0000 (UTC) Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython Message-ID: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Hello, I have a new install of Visual studio 2010, into which I just updated the python tools and iron python's latest release (as of 7/15/2011). I downloaded numpy and scipy, modified the path, and followed all of the troubleshooting instructions on your site, stackoverflow, and enthought.com. The libraries appear in the correct place now, but when I type "import python" it throws errors. Traceback (most recent call last): File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an exception. exceptions.SystemError Traceback (most recent call last): File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an exception. System info is Microsofr .NET framework Version 4.0.30319 RTMRel running under Windows 7 I am out of ideas. Help is much appreciated. Thanks, Karen -------------- next part -------------- An HTML attachment was scrubbed... URL: From digital.fireball at googlemail.com Fri Jul 15 16:08:55 2011 From: digital.fireball at googlemail.com (Johannes Eckstein) Date: Fri, 15 Jul 2011 22:08:55 +0200 Subject: [SciPy-User] 'compress' numpy array Message-ID: <4E209E57.5070607@googlemail.com> HI, i have been struggling for some hours with finding indexes in numpy arrays, maybee someone is willing to help me out with this little problem... I have a set of faces, which I formated like this: [[[ 1. -0.1 -0. ] [ 1. -0.09921 -0.01253] [ 1. -0. -0. ]] [[ 1. -0.1 -0. ] [ 1. -0.2 -0. ] [ 1. -0.09921 -0.01253]] [[ 1. -0.2 -0. ] [ 1. -0.19842 -0.02507] [ 1. -0.09921 -0.01253]] [[ 1. -0.2 -0. ] [ 1. -0.3 -0. ] [ 1. -0.19842 -0.02507]] [[ 1. -0.3 -0. ] [ 1. -0.29763 -0.0376 ] [ 1. -0.19842 -0.02507]] [[ 1. -0.3 -0. ] [ 1. -0.4 -0. ] [ 1. -0.29763 -0.0376 ]] [[ 1. -0.4 -0. ] [ 1. -0.39685 -0.05013] [ 1. -0.29763 -0.0376 ]] [[ 1. -0.4 -0. ] [ 1. -0.5 -0. ] [ 1. -0.39685 -0.05013]] [[ 1. -0.5 -0. ] [ 1. -0.49606 -0.06267] [ 1. -0.39685 -0.05013]] [[ 1. -0.5 -0. ] [ 1. -0.6 -0. ] [ 1. -0.49606 -0.06267]] [[ 1. -0.6 -0. ] [ 1. -0.59527 -0.0752 ] [ 1. -0.49606 -0.06267]] [[ 1. -0.6 -0. ] [ 1. -0.7 -0. ] [ 1. -0.59527 -0.0752 ]]] Now I would like to find all the redundant points and create a list from them like this: [[ 1. -0.1 -0. ] [ 1. -0.09921 -0.01253] [ 1. -0. -0. ] [ 1. -0.2 -0. ] [ 1. -0.19842 -0.02507] ...] Then make an array of the indices (which could be also shifted with x-1) looking like this: [[1 2 3] [1 4 2] [4 5 2] ...] Anyone an Idea or a hint of how I can efficiently compute those two results? To me it seems like I need some kind of tricky sorting algorithm, but maybee there is a trick that I don't see... 
Cheers Johannes From ralf.gommers at googlemail.com Sun Jul 17 11:33:23 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 17 Jul 2011 17:33:23 +0200 Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython In-Reply-To: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> References: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Message-ID: On Fri, Jul 15, 2011 at 6:44 PM, wrote: > Hello, I have a new install of Visual studio 2010, into which I just > updated the python tools and iron python's latest release (as of 7/15/2011). > I downloaded numpy and scipy, modified the path, and followed all of the > troubleshooting instructions on your site, stackoverflow, and > enthought.com. The libraries appear in the correct place now, but when I > type "import python" it throws errors. > > Traceback (most recent call last): > File "C:\Program Files (x86)\IronPython > 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in > SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an > exception. > > exceptions.SystemError > Traceback (most recent call last): > File "C:\Program Files (x86)\IronPython > 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in > SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an > exception. > > > System info is > > Microsofr .NET framework > Version 4.0.30319 RTMRel > > running under > Windows 7 > > I am out of ideas. Help is much appreciated. > > Numpy/Scipy don't work on .NET, unless forks are available from Enthought. But I haven't seen an announcement on that. What instructions are you referring to? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From johradinger at googlemail.com Mon Jul 18 06:17:34 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 18 Jul 2011 12:17:34 +0200 Subject: [SciPy-User] calculate new values in csv using numpy Message-ID: Hello, I try to load a csv-file (created with excel) and want to calculate a new value. Lets say my file has columns A,B and C and I want to calculate A*B-C and append it in the correct line. So far I managed to get the file read with: table = numpy.genfromtxt("/path/to/file.csv", dtype=None, delimiter=';', skip_header=1) But how do I have to proceed to get the single columns. I know that I have to use after that a FOR-loop to loop over the lines to do the calculation. so there are the two questions: 1) how to extract the single columns? 2) how to append a new columns containing the calculated value? best regards /johannes -------------- next part -------------- An HTML attachment was scrubbed... URL: From johradinger at googlemail.com Mon Jul 18 08:16:59 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 18 Jul 2011 05:16:59 -0700 (PDT) Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: okay so just to post the result I get from: table = numpy.genfromtxt("/path/to/file.csv", names=['s1','s2','p'], dtype="float,float,float", delimiter=';' , skip_header=1) the result looks like this: [(30.633520000000001, 1046.5956699999999, 0.48749999999999999) (9517.6940400000003, 26364.107199999999, 0.26041999999999998) (3102.9560099999999, 0.0, 1.0)... [(30.633520000000001, 1046.5956699999999, 0.48749999999999999)] I can access with table[x] the x-row of that array, but how can I access the columns? table[:,2] doesn't work. 
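For reference, a small illustration of the column access the replies below describe, using the field names from the genfromtxt call above; the literal values are just the first rows pasted above, rounded:

import numpy as np

table = np.array([(30.63352, 1046.59567, 0.4875),
                  (9517.69404, 26364.1072, 0.26042),
                  (3102.95601, 0.0, 1.0)],
                 dtype=[('s1', float), ('s2', float), ('p', float)])

col_p = table['p']          # a whole column, selected by field name
as_2d = np.column_stack([table['s1'], table['s2'], table['p']])
print(as_2d[:, 2])          # plain 2-D indexing works once the fields are stacked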
/joh From ckkart at hoc.net Mon Jul 18 08:19:03 2011 From: ckkart at hoc.net (Christian K.) Date: Mon, 18 Jul 2011 12:19:03 +0000 (UTC) Subject: [SciPy-User] odr - goodness of fit Message-ID: Hi, I am applying odr to do 3d-surface fits which works very well. Now I would like to know if it is possible to construct a 'goodness of fit' quantity (between 0 and 1) like e.g. R2 from likelihood fits. I know about the residual variance but it would be nice to have some quantity which is limited to the [0,1] range. Best regards, Christian From marc.shivers at gmail.com Mon Jul 18 08:20:52 2011 From: marc.shivers at gmail.com (Marc Shivers) Date: Mon, 18 Jul 2011 08:20:52 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: The 3rd column would be: [a[2] for a in table] On Mon, Jul 18, 2011 at 8:16 AM, Johannes Radinger wrote: > okay so just to post the result I get from: > > table = numpy.genfromtxt("/path/to/file.csv", names=['s1','s2','p'], > dtype="float,float,float", delimiter=';' , skip_header=1) > > the result looks like this: > > [(30.633520000000001, 1046.5956699999999, 0.48749999999999999) > ?(9517.6940400000003, 26364.107199999999, 0.26041999999999998) > ?(3102.9560099999999, 0.0, 1.0)... > [(30.633520000000001, 1046.5956699999999, 0.48749999999999999)] > > I can access with table[x] the x-row of that array, but how can I > access > the columns? table[:,2] doesn't work. > > /joh > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From schmidbe at in.tum.de Mon Jul 18 08:51:26 2011 From: schmidbe at in.tum.de (Markus Schmidberger) Date: Mon, 18 Jul 2011 14:51:26 +0200 Subject: [SciPy-User] Your SciPy application on a Computer Cluster in the Cloud - cloudnumbers.com Message-ID: <1310993486.2551.62.camel@schmidb-TravelMate8572TG> Dear SciPy users and experts, cloudnumbers.com provides researchers and companies with the access to resources to perform high performance calculations in the cloud. As cloudnumbers.com's community manager I may invite you to register and test your Python application on a computer cluster in the cloud for free: http://my.cloudnumbers.com/register We are looking forward to get your feedback and consumer insights. Take the chance and have an impact to the development of a new cloud computing calculation platform. Our aim is to change the way of research collaboration is done today by bringing together scientists and businesses from all over the world on a single platform. cloudnumbers.com is a Berlin (Germany) based international high-tech startup striving for enabling everyone to benefit from the High Performance Computing related advantages of the cloud. We provide easy access to applications running on any kind of computer hardware from single core high memory machines up to 1000 cores computer clusters. To get more information check out our web-page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications: http://cloudnumbers.com/blog Key features of our platform for efficient computing in the cloud are: * Turn fixed into variable costs and pay only for the capacity you need. Watch our latest saving costs with cloudnumbers.com video: http://www.youtube.com/watch?v=ln_BSVigUhg&feature=player_embedded * Enter the cloud using an intuitive and user friendly platform. 
Watch our latest cloudnumbers.com in a nutshell video: http://www.youtube.com/watch?v=0ZNEpR_ElV0&feature=player_embedded * Be released from ongoing technological obsolescence and continuous maintenance costs (e.g. linking to libraries or system dependencies) * Accelerated your Python, C, C++, Fortran, R, ... calculations through parallel processing and great computing capacity - more than 1000 cores are available and GPUs are coming soon. * Share your results worldwide (coming soon). * Get high speed access to public databases (please let us know, if your favorite database is missing!). * We have developed a security architecture that meets high requirements of data security and privacy. Read our security white paper: http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf Best Markus -- Dr. rer. nat. Markus Schmidberger Senior Community Manager Cloudnumbers.com GmbH Chausseestra?e 6 10119 Berlin www.cloudnumbers.com E-Mail: markus.schmidberger at cloudnumbers.com ************************* Amtsgericht M?nchen, HRB 191138 Gesch?ftsf?hrer: Erik Muttersbach, Markus Fensterer, Moritz v. Petersdorff-Campen From johradinger at googlemail.com Mon Jul 18 09:07:27 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 18 Jul 2011 15:07:27 +0200 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: Thank you, I think that will be better for calculations to get a 2D array instead of a tuple/matrix combination? What is the prefered way to import the data from csv, to calculate a new column and to save again into a csv? /J -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Mon Jul 18 09:19:44 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 18 Jul 2011 09:19:44 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> The csv-reading code gives you a structured array as an output. You will want to read the documentation about structured arrays to get a better sense of how to use them: http://docs.scipy.org/doc/numpy/user/basics.rec.html For your task, you want: result = table['s1'] + table['s2'] (Here's a suggestion for how to append the result back into a named column in the same structured array: http://mail.scipy.org/pipermail/numpy-discussion/2007-September/029357.html ) I'm not sure if there are any good canned methods for saving record arrays to csv files directly. Probably someone can suggest something... I usually just loop through the array at that point, using ','.join(whatever) to build the individual lines. Zach On Jul 18, 2011, at 9:07 AM, Johannes Radinger wrote: > Thank you, > > I think that will be better for calculations to get a 2D array instead of a tuple/matrix combination? > > What is the prefered way to import the data from csv, to calculate a new column and to save again into a csv? 
> > /J > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From marc.shivers at gmail.com Mon Jul 18 09:27:10 2011 From: marc.shivers at gmail.com (Marc Shivers) Date: Mon, 18 Jul 2011 09:27:10 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> References: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> Message-ID: Also, the genfromtxt function should have returned a numpy array, rather than a list of tuples. table[:,2] will return a result if table is a numpy array. I think the problem might be in your dtype input. You can read about dtype objects here: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html On Mon, Jul 18, 2011 at 9:19 AM, Zachary Pincus wrote: > The csv-reading code gives you a structured array as an output. You will want to read the documentation about structured arrays to get a better sense of how to use them: > http://docs.scipy.org/doc/numpy/user/basics.rec.html > > For your task, you want: > result = table['s1'] + table['s2'] > > (Here's a suggestion for how to append the result back into a named column in the same structured array: > http://mail.scipy.org/pipermail/numpy-discussion/2007-September/029357.html ) > > I'm not sure if there are any good canned methods for saving record arrays to csv files directly. Probably someone can suggest something... I usually just loop through the array at that point, using ','.join(whatever) to build the individual lines. > > Zach > > > > On Jul 18, 2011, at 9:07 AM, Johannes Radinger wrote: > >> Thank you, >> >> I think that will be better for calculations to get a 2D array instead of a tuple/matrix combination? >> >> What is the prefered way to import the data from csv, to calculate a new column and to save again into a csv? >> >> /J >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Mon Jul 18 10:50:18 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 18 Jul 2011 10:50:18 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> Message-ID: > Also, the genfromtxt function should have returned a numpy array, > rather than a list of tuples. table[:,2] will return a result if > table is a numpy array. I think the problem might be in your dtype > input. You can read about dtype objects here: > http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html >> The csv-reading code gives you a structured array as an output. You will want to read the documentation about structured arrays to get a better sense of how to use them: >> http://docs.scipy.org/doc/numpy/user/basics.rec.html As I discussed, the genfromtext csv-reading code returns a structured numpy array, not a list of tuples. This is an n-by-1 array, where each element of the array is of a structured dtype with individual fields (of potentially different data types) that can be addressed via their names, as I described. The reason for this is that, as CSV files can hold homogenous data types, but traditional numpy arrays cannot, a structured dtype is the best way to load a CSV generically. 
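One possible end-to-end sketch of the workflow being discussed, reading named fields, computing the new column, appending it with numpy.lib.recfunctions.append_fields and writing a CSV back out; the sample data, the field names and the A*B-C rule are assumptions for illustration:

import numpy as np
from io import StringIO
from numpy.lib import recfunctions as rfn

csv_text = StringIO(u"s1;s2;p\n30.6;1046.6;0.49\n9517.7;26364.1;0.26\n")   # stand-in for the real file
table = np.genfromtxt(csv_text, delimiter=';', names=['s1', 's2', 'p'],
                      dtype='float,float,float', skip_header=1)

z = table['s1'] * table['s2'] - table['p']                # the new column
table2 = rfn.append_fields(table, 'z', z, usemask=False)  # structured array with the extra field

with open('file_out.csv', 'w') as out:
    out.write(';'.join(table2.dtype.names) + '\n')        # header line
    for row in table2:
        out.write(';'.join(str(v) for v in row) + '\n')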
If one knows that one's file has only floats in it, then you could read it using fromtext, after stripping off the header, and get an n-by-m float array. Regardless, table[:,2] will NOT return a result if it is an n-by-1 array of structured dtypes, which is what genfromtext will give under most (all?) circumstances. Zach From jdh2358 at gmail.com Mon Jul 18 11:01:21 2011 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 18 Jul 2011 10:01:21 -0500 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: On Mon, Jul 18, 2011 at 5:17 AM, Johannes Radinger wrote: > Hello, > I try to load a csv-file (created with excel) and want to calculate a new > value. > Lets say my file has columns A,B and C and I want to calculate A*B-C and > append > it in the correct line. > So far I managed to get the file read with: > table = numpy.genfromtxt("/path/to/file.csv", dtype=None, delimiter=';', > skip_header=1) > But how do I have to proceed to get the single columns. I know that I have > to use after that a FOR-loop to loop over the lines to do the calculation. > so there are the two questions: > 1) how to extract the single columns? > 2) how to append a new columns containing the calculated value? import matplotlib.mlab as mlab r = mlab.csv2rec("somefile.csv") z = r.x + r.y r = mlab.rec_append_fields(r, ['z'], [z]) mlab.rec2csv(r, 'newfile.csv') From robert.kern at gmail.com Mon Jul 18 12:22:43 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jul 2011 11:22:43 -0500 Subject: [SciPy-User] odr - goodness of fit In-Reply-To: References: Message-ID: On Mon, Jul 18, 2011 at 07:19, Christian K. wrote: > Hi, > > I am applying odr to do 3d-surface fits which works very well. Now I would > like to know if it is possible to construct a 'goodness of fit' quantity > (between 0 and 1) like e.g. R2 from likelihood fits. I know about the residual > variance but it would be nice to have some quantity which is limited to the > [0,1] range. Well, you could compute the variance of the dataset, var_total, and then the residual variance, var_res, and compute (1-var_res/var_total). That's *roughly* what R2 is, but I'm not sure how meaningful that number will be. I'm fairly certain that you would not want to apply the standard significance values to that quantity. If you had good estimates of the errors on each measurement, then you can get a meaningful Chi^2 value from the residuals that you can use to compare against the Chi^2 distribution in order to get a p-value out (0.5 is good, ~0 means you overestimated your errors, ~1 means you got the fit wrong or underestimated your errors). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From rainexpected at theo.to Mon Jul 18 17:01:12 2011 From: rainexpected at theo.to (Ted To) Date: Mon, 18 Jul 2011 17:01:12 -0400 Subject: [SciPy-User] Conditional bivariate normal Message-ID: <4E249F18.3050600@theo.to> Hi All, I have a puzzle that I'm having trouble figuring out. Suppose U=X+Y where X and Y are independent normals. Sampling X and Y conditional on U>=\bar U takes an inordinate amount of time since at times \bar U can be fairly large so I've been thinking about how to shorten the time. I thought I came upon a solution by using scipy.stats.truncnorm.rvs to first draw U=u and then draw an X=x conditional on U=u. Using U=u and X=x, Y=y=u-x. 
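A sketch of that two-step scheme, with invented standard deviations and threshold. For independent zero-mean normals, X conditional on U = u is again normal with mean u*sx**2/(sx**2 + sy**2) and standard deviation sqrt(sx**2*sy**2/(sx**2 + sy**2)); the follow-up below traces the too-small spreads to using that variance without taking its square root:

import numpy as np
from scipy import stats

sx, sy, ubar = 1.0, 2.0, 3.0             # invented standard deviations and threshold
su = np.sqrt(sx**2 + sy**2)              # U = X + Y is N(0, su**2)

# draw U conditional on U >= ubar
u = stats.truncnorm.rvs(ubar / su, np.inf, loc=0.0, scale=su, size=100000)

# draw X | U = u from its conditional normal, then set Y = u - X
m_x = u * sx**2 / (sx**2 + sy**2)
s_x = np.sqrt(sx**2 * sy**2 / (sx**2 + sy**2))
x = m_x + s_x * np.random.randn(u.size)
y = u - x

print(x.std(), y.std(), (x + y).min())   # sanity checks: spreads, and the truncation at ubar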
I'm getting the correct means and the correct standard deviation for U but the standard deviations for X and Y are too small. Is there something wrong with my logic or have I incorrectly derived the sd for X|U=u? Thanks, Ted From rainexpected at theo.to Mon Jul 18 19:13:06 2011 From: rainexpected at theo.to (Ted To) Date: Mon, 18 Jul 2011 19:13:06 -0400 Subject: [SciPy-User] Conditional bivariate normal In-Reply-To: <4E249F18.3050600@theo.to> References: <4E249F18.3050600@theo.to> Message-ID: <4E24BE02.4080508@theo.to> On 07/18/2011 05:01 PM, Ted To wrote: > Hi All, > > I have a puzzle that I'm having trouble figuring out. Suppose U=X+Y > where X and Y are independent normals. Sampling X and Y conditional on > U>=\bar U takes an inordinate amount of time since at times \bar U can > be fairly large so I've been thinking about how to shorten the time. I > thought I came upon a solution by using scipy.stats.truncnorm.rvs to > first draw U=u and then draw an X=x conditional on U=u. Using U=u and > X=x, Y=y=u-x. > > I'm getting the correct means and the correct standard deviation for U > but the standard deviations for X and Y are too small. Is there > something wrong with my logic or have I incorrectly derived the sd for > X|U=u? Gah! Never mind. I forgot to take the square root of the variance... From lutz.maibaum at gmail.com Mon Jul 18 19:53:35 2011 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Mon, 18 Jul 2011 16:53:35 -0700 Subject: [SciPy-User] 'compress' numpy array In-Reply-To: <4E209E57.5070607@googlemail.com> References: <4E209E57.5070607@googlemail.com> Message-ID: On Jul 15, 2011, at 1:08 PM, Johannes Eckstein wrote: > I have a set of faces, which I formated like this: > [[[ 1. -0.1 -0. ] > [ 1. -0.09921 -0.01253] > [ 1. -0. -0. ]] > > [[ 1. -0.1 -0. ] > [ 1. -0.2 -0. ] > [ 1. -0.09921 -0.01253]] > > [[ 1. -0.2 -0. ] > [ 1. -0.19842 -0.02507] > [ 1. -0.09921 -0.01253]] > > [[ 1. -0.2 -0. ] > [ 1. -0.3 -0. ] > [ 1. -0.19842 -0.02507]] > > [[ 1. -0.3 -0. ] > [ 1. -0.29763 -0.0376 ] > [ 1. -0.19842 -0.02507]] > > [[ 1. -0.3 -0. ] > [ 1. -0.4 -0. ] > [ 1. -0.29763 -0.0376 ]] > > [[ 1. -0.4 -0. ] > [ 1. -0.39685 -0.05013] > [ 1. -0.29763 -0.0376 ]] > > [[ 1. -0.4 -0. ] > [ 1. -0.5 -0. ] > [ 1. -0.39685 -0.05013]] > > [[ 1. -0.5 -0. ] > [ 1. -0.49606 -0.06267] > [ 1. -0.39685 -0.05013]] > > [[ 1. -0.5 -0. ] > [ 1. -0.6 -0. ] > [ 1. -0.49606 -0.06267]] > > [[ 1. -0.6 -0. ] > [ 1. -0.59527 -0.0752 ] > [ 1. -0.49606 -0.06267]] > > [[ 1. -0.6 -0. ] > [ 1. -0.7 -0. ] > [ 1. -0.59527 -0.0752 ]]] > > Now I would like to find all the redundant points and create a list from > them like this: > [[ 1. -0.1 -0. ] > [ 1. -0.09921 -0.01253] > [ 1. -0. -0. ] > [ 1. -0.2 -0. ] > [ 1. -0.19842 -0.02507] > ...] > > Then make an array of the indices (which could be also shifted with x-1) > looking like this: > [[1 2 3] > [1 4 2] > [4 5 2] > ...] > > Anyone an Idea or a hint of how I can efficiently compute those two results? > To me it seems like I need some kind of tricky sorting algorithm, but > maybee there is a trick that I don't see? This sounds like a case for np.unique, with the caveat that your elements are 3-tuples of floats, which unique doesn't seem to handle. You could work around this by turning your points into records of 3 floats. Perhaps something like the following would work (let's call the input array you posted "input"): In [2]: input=array([[[ 1. , -0.1 , -0. ], ...: [ 1. , -0.09921, -0.01253], ...: [ 1. , -0. , -0. ]], ...: ...: [[ 1. , -0.1 , -0. ], ...: [ 1. , -0.2 , -0. 
], ...: [ 1. , -0.09921, -0.01253]], ...: ...: [[ 1. , -0.2 , -0. ], ...: [ 1. , -0.19842, -0.02507], ...: [ 1. , -0.09921, -0.01253]], ...: ...: [[ 1. , -0.2 , -0. ], ...: [ 1. , -0.3 , -0. ], ...: [ 1. , -0.19842, -0.02507]], ...: ...: [[ 1. , -0.3 , -0. ], ...: [ 1. , -0.29763, -0.0376 ], ...: [ 1. , -0.19842, -0.02507]], ...: ...: [[ 1. , -0.3 , -0. ], ...: [ 1. , -0.4 , -0. ], ...: [ 1. , -0.29763, -0.0376 ]], ...: ...: [[ 1. , -0.4 , -0. ], ...: [ 1. , -0.39685, -0.05013], ...: [ 1. , -0.29763, -0.0376 ]], ...: ...: [[ 1. , -0.4 , -0. ], ...: [ 1. , -0.5 , -0. ], ...: [ 1. , -0.39685, -0.05013]], ...: ...: [[ 1. , -0.5 , -0. ], ...: [ 1. , -0.49606, -0.06267], ...: [ 1. , -0.39685, -0.05013]], ...: ...: [[ 1. , -0.5 , -0. ], ...: [ 1. , -0.6 , -0. ], ...: [ 1. , -0.49606, -0.06267]], ...: ...: [[ 1. , -0.6 , -0. ], ...: [ 1. , -0.59527, -0.0752 ], ...: [ 1. , -0.49606, -0.06267]], ...: ...: [[ 1. , -0.6 , -0. ], ...: [ 1. , -0.7 , -0. ], ...: [ 1. , -0.59527, -0.0752 ]]]) In [3]: input.shape Out[3]: (12, 3, 3) In [4]: temp = input.ravel().view([('x', float), ('y', float), ('z', float)]) In [5]: temp.shape Out[5]: (36,) In [6]: uniquepoints, indices = np.unique(temp,return_inverse=True) In [7]: uniquepoints Out[7]: array([(1.0, -0.69999999999999996, -0.0), (1.0, -0.59999999999999998, -0.0), (1.0, -0.59526999999999997, -0.075200000000000003), (1.0, -0.5, -0.0), (1.0, -0.49606, -0.062670000000000003), (1.0, -0.40000000000000002, -0.0), (1.0, -0.39684999999999998, -0.050130000000000001), (1.0, -0.29999999999999999, -0.0), (1.0, -0.29763000000000001, -0.037600000000000001), (1.0, -0.20000000000000001, -0.0), (1.0, -0.19842000000000001, -0.025069999999999999), (1.0, -0.10000000000000001, -0.0), (1.0, -0.099210000000000007, -0.012529999999999999), (1.0, -0.0, -0.0)], dtype=[('x', ' Hi all I'm looking some help in using fmin_cg to optimise a function. Basically I provide a function and its gradient as follows; > p1 = fmin_cg(func,p0,fprime=frime) and everything works fine. However, both func and fprime require the same matrix inversion at each step (via cholesky factorization). As matrix inversion is expensive, ideally I would like to calculate it only once per step, and use the matrix inverse calculated by func in the fprime function without having to repeat the calculation. Is this possible? ie to use the inverse calculated in func in the frime function as well? Thanks for any help. -- View this message in context: http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html Sent from the Scipy-User mailing list archive at Nabble.com. From serra.guillem at gmail.com Mon Jul 18 17:06:18 2011 From: serra.guillem at gmail.com (metge) Date: Mon, 18 Jul 2011 14:06:18 -0700 (PDT) Subject: [SciPy-User] scipy signal decimate why convolve among points we will decimate? Message-ID: In scipy signal, the decimate function uses a standard fir filtering to prevent aliasing before decimating. However, this filtering is applied to the entire set of points. It should be very easy to optimize it convolving only the points we will not sustract, specially if the filter order and the decimation factor are high. 
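To illustrate the point, here is a sketch (not what scipy.signal.decimate does internally; the firwin design and the factor of 8 are arbitrary choices): evaluating the anti-aliasing FIR only at the surviving sample positions gives exactly the kept samples of the filter-everything-then-slice approach, with roughly q times fewer multiplies.

import numpy as np
from scipy import signal

q = 8                                    # decimation factor (arbitrary)
taps = signal.firwin(31, 1.0 / q)        # a simple anti-aliasing FIR (assumed design)
x = np.random.randn(1000)

# filter every sample, then keep one in q -- the approach being questioned
kept_all = signal.lfilter(taps, 1.0, x)[::q]

# evaluate the same FIR only at the samples that survive decimation
padded = np.concatenate([np.zeros(len(taps) - 1), x])   # zero history, like lfilter
kept_only = np.array([np.dot(taps, padded[i + len(taps) - 1::-1][:len(taps)])
                      for i in range(0, len(x), q)])

print(np.allclose(kept_all, kept_only))  # True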
From guziy.sasha at gmail.com Mon Jul 18 20:30:20 2011 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Mon, 18 Jul 2011 20:30:20 -0400 Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: <32085615.post@talk.nabble.com> References: <32085615.post@talk.nabble.com> Message-ID: You could use the same dictionary in both functions and save the inverses to it. (a=>inv(a)) -- Oleksandr 2011/7/18 gibbon > > Hi all > > I'm looking some help in using fmin_cg to optimise a function. Basically I > provide a function and its gradient as follows; > > > p1 = fmin_cg(func,p0,fprime=frime) > > and everything works fine. However, both func and fprime require the same > matrix inversion at each step (via cholesky factorization). As matrix > inversion is expensive, ideally I would like to calculate it only once per > step, and use the matrix inverse calculated by func in the fprime function > without having to repeat the calculation. > > Is this possible? ie to use the inverse calculated in func in the frime > function as well? > > Thanks for any help. > -- > View this message in context: > http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johradinger at googlemail.com Tue Jul 19 05:38:17 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 11:38:17 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation Message-ID: Hello SciPy-People, I have got a normal distribution with ?=0 and I know that 50% of all observations are within a certain range (50% probability are between -x and +x). How can I get the standard deviation of that normal distribution? Usually 68,3 % are within one SD. How is it possible to calculate the SD from my 50% value? Is there any conversion factor I can use? Can that be simply and exactly calculated with Scipy? Thank you /Johannes From david_baddeley at yahoo.com.au Tue Jul 19 05:51:15 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 19 Jul 2011 02:51:15 -0700 (PDT) Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: References: Message-ID: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> Hi Johannes, I guess what you're talking about is the inter-quartile range - if you know that your data is normally distributed the std deviation and IQR are related by a constant factor: IQR ~ 1.349\sigma (see wikipedia article on interquartile range, can also easily be derived from the gaussian CDF) cheers, David ----- Original Message ---- From: Johannes Radinger To: SciPy-User at scipy.org Sent: Tue, 19 July, 2011 9:38:17 PM Subject: [SciPy-User] from 50% probablity value to standard deviation Hello SciPy-People, I have got a normal distribution with ?=0 and I know that 50% of all observations are within a certain range (50% probability are between -x and +x). How can I get the standard deviation of that normal distribution? Usually 68,3 % are within one SD. How is it possible to calculate the SD from my 50% value? Is there any conversion factor I can use? Can that be simply and exactly calculated with Scipy? 
Thank you /Johannes _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From deil.christoph at googlemail.com Tue Jul 19 06:48:15 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Tue, 19 Jul 2011 12:48:15 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: References: Message-ID: On Jul 19, 2011, at 11:38 AM, Johannes Radinger wrote: > Hello SciPy-People, > > I have got a normal distribution with ?=0 and I know that 50% of all > observations > are within a certain range (50% probability are between -x and +x). > How can I get the standard deviation of that normal distribution? > Usually 68,3 % are within one SD. How is it possible to calculate the > SD from my 50% value? Is there any conversion factor I can use? Can > that be simply and exactly calculated with Scipy? > You can get the conversion factor like this: >>> scipy.stats.halfnorm.ppf(0.5) 0.67448975019608171 Here is how you use it to compute the standard deviation: >>> sd = x / scipy.stats.halfnorm.ppf(0.5) Christoph From johradinger at googlemail.com Tue Jul 19 07:00:40 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 13:00:40 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> References: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> Message-ID: Thank you for that answer, the wiki articicle helped me alot, but I've still some problems of understanding, probably just a very simple thing. My case: 50% are between -165 and +165, so my IQR=330 Calculating with the factor 1.349 gives my the SD-range of -222.6 - +222.6 is that correct? Meaning that the SD is 222.6 In the graphic of the wikipedia-articel is shown that the IQR is between -0.6745*SD and + 0.6745*SD...If I just try to solve for 165/0.6745=SD results in a SD of 244.6 ... I am not sure why, probably just a very simple mathematical problem I don't realise ;) Maybe you can help Thank you /Johannes From jrennie at gmail.com Tue Jul 19 08:20:22 2011 From: jrennie at gmail.com (Jason Rennie) Date: Tue, 19 Jul 2011 08:20:22 -0400 Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: <32085615.post@talk.nabble.com> References: <32085615.post@talk.nabble.com> Message-ID: On Mon, Jul 18, 2011 at 3:09 PM, gibbon wrote: > However, both func and fprime require the same matrix inversion at each > step (via cholesky factorization). As matrix > inversion is expensive, ideally I would like to calculate it only once > per step, and use the matrix inverse calculated by func in the fprime > function without having to repeat the calculation. > Try fmin_l_bfgs_b or fmin_tnc which allow 'func' to return both function value and gradient (if fprime=None). Though these functions allow you to specify simple bounds, they work well for unbounded problems in my experience. http://docs.scipy.org/doc/scipy/reference/optimize.html Jason -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthieu.brucher at gmail.com Tue Jul 19 08:26:27 2011 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 19 Jul 2011 14:26:27 +0200 Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: <32085615.post@talk.nabble.com> References: <32085615.post@talk.nabble.com> Message-ID: Yes, it's easy to do! Create a class with your methods and function like this: class Function(object): def __init__(self): do something def cost(self, param): compute the parameters def prime(self, param): compute the gradient def inverse(self, param): call this one from cost and prime, do some caching stuff fun = function() p1 = fmin_cg(fun.cost, fprime = fun.prime) Matthieu 2011/7/18 gibbon > > Hi all > > I'm looking some help in using fmin_cg to optimise a function. Basically I > provide a function and its gradient as follows; > > > p1 = fmin_cg(func,p0,fprime=frime) > > and everything works fine. However, both func and fprime require the same > matrix inversion at each step (via cholesky factorization). As matrix > inversion is expensive, ideally I would like to calculate it only once per > step, and use the matrix inverse calculated by func in the fprime function > without having to repeat the calculation. > > Is this possible? ie to use the inverse calculated in func in the frime > function as well? > > Thanks for any help. > -- > View this message in context: > http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jul 19 09:34:22 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jul 2011 15:34:22 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: References: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> Message-ID: On Tue, Jul 19, 2011 at 1:00 PM, Johannes Radinger wrote: > Thank you for that answer, > the wiki articicle helped me alot, but I've still > some problems of understanding, probably just a > very simple thing. > > My case: > > 50% are between -165 and +165, so my IQR=330 > > Calculating with the factor 1.349 gives my the SD-range of > -222.6 - +222.6 is that correct? Meaning that the SD is 222.6 > > In the graphic of the wikipedia-articel is shown that the IQR > is between -0.6745*SD and + 0.6745*SD...If I just try to > solve for 165/0.6745=SD results in a SD of 244.6 ... 
> > I am not sure why, probably just a very simple mathematical problem I > don't realise ;) > > Maybe you can help std = -165 / stats.norm.ppf(0.25) print stats.norm.cdf(165, scale=std) - stats.norm.cdf(-165, scale=std) 0.5 (checked on another computer) Josef > > Thank you > /Johannes > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gustavo.goretkin at gmail.com Tue Jul 19 02:54:58 2011 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Tue, 19 Jul 2011 02:54:58 -0400 Subject: [SciPy-User] odeint for pendulum with limits Message-ID: I am trying to model a pendulum which has a limited range of motion. The state of the pendulum consists of its angular position and velocity. When the pendulum hits one of its stops, the velocity goes to zero. Can I model this with the integrators in SciPy? In the dX/dt = F(X), I can write F such that the position is clipped into some range, but I don't think I can make the velocity discontinuously drop to zero. I'd appreciate any suggestions, including removing the discontinuity and instead placing a sharp, but continuous stop. Thanks, Gustavo -------------- next part -------------- An HTML attachment was scrubbed... URL: From Neale.Gibson at astro.ox.ac.uk Tue Jul 19 05:38:09 2011 From: Neale.Gibson at astro.ox.ac.uk (gibbon) Date: Tue, 19 Jul 2011 02:38:09 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: References: <32085615.post@talk.nabble.com> Message-ID: <32089558.post@talk.nabble.com> Thanks Oleksandr That seems to have done the trick - I forgot lists/dictionaries passed to a function were by reference and could be accessed without returning them. I've decided to use a list instead as follows; list = [K,] p1 = fmin_cg(frime,p0,args=(list,),fprime=fprime) and both func and fprime can now manipulate and store the matrix K. I don't think the optimisation functions in general always evaluate the func and then fprime (and the Hessian for ncg) in order, so a few messy if clauses might be necessary to check if the inverse matrix has been updated for each new parameter set. Thanks again for your help. sanGuziy wrote: > > You could use the same dictionary in both functions and save the inverses > to > it. > (a=>inv(a)) > -- > Oleksandr > > 2011/7/18 gibbon > >> >> Hi all >> >> I'm looking some help in using fmin_cg to optimise a function. Basically >> I >> provide a function and its gradient as follows; >> >> > p1 = fmin_cg(func,p0,fprime=frime) >> >> and everything works fine. However, both func and fprime require the same >> matrix inversion at each step (via cholesky factorization). As matrix >> inversion is expensive, ideally I would like to calculate it only once >> per >> step, and use the matrix inverse calculated by func in the fprime >> function >> without having to repeat the calculation. >> >> Is this possible? ie to use the inverse calculated in func in the frime >> function as well? >> >> Thanks for any help. >> -- >> View this message in context: >> http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html >> Sent from the Scipy-User mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32089558.html Sent from the Scipy-User mailing list archive at Nabble.com. 
From aarchiba at physics.mcgill.ca Tue Jul 19 10:07:19 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 19 Jul 2011 10:07:19 -0400 Subject: [SciPy-User] odeint for pendulum with limits In-Reply-To: References: Message-ID: Unfortunately this is a very tricky problem. The naive approach of replacing the stop with an extremely large force runs into problems because the integrator bogs down in tiny steps simulating the stop in detail. A better approach is to run the integrator with no limits but stopping integration when the pendulum reaches its physical limit. Then you can change the velocity any way you like and restart the integration. In terms of scipy, I don't think any of the integrators support stopping conditions (pydstool does) but I believe ours do support backtracking within the last step, so you can implement this yourself. 
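For instance, a rough sketch of that stop-and-restart loop (illustrative only: the pendulum parameters, the fixed step size and the crude clamp at the stop are made up, and a careful implementation would bisect within the offending step for the exact crossing time):

import numpy as np
from scipy.integrate import ode

g, L, theta_max = 9.81, 1.0, 0.5      # made-up pendulum and stop parameters

def rhs(t, y):
    theta, omega = y
    return [omega, -(g / L) * np.sin(theta)]

solver = ode(rhs).set_integrator('dopri5')
solver.set_initial_value([0.0, 2.0], 0.0)   # start at the rest angle with some velocity

dt, t_end = 1e-3, 10.0
while solver.successful() and solver.t < t_end:
    solver.integrate(solver.t + dt)
    theta, omega = solver.y
    if abs(theta) >= theta_max:
        # hit a stop during this step: clamp the angle, kill the velocity,
        # and restart the integrator from the stop
        theta = max(-theta_max, min(theta_max, theta))
        solver.set_initial_value([theta, 0.0], solver.t)
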
The problem becomes really difficult if you can't compute forces outside the valid domain, because all the good integrators I know sometimes need to evaluate points outside the permitted region. Anne On 7/19/11, Gustavo Goretkin wrote: > I am trying to model a pendulum which has a limited range of motion. The > state of the pendulum consists of its angular position and velocity. When > the pendulum hits one of its stops, the velocity goes to zero. > > Can I model this with the integrators in SciPy? In the dX/dt = F(X), I can > write F such that the position is clipped into some range, but I don't think > I can make the velocity discontinuously drop to zero. I'd appreciate any > suggestions, including removing the discontinuity and instead placing a > sharp, but continuous stop. > > Thanks, > Gustavo > -- Sent from my mobile device From johradinger at googlemail.com Tue Jul 19 10:27:28 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 07:27:28 -0700 (PDT) Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: <1fadea34-200f-4662-879e-9f66eded9aa3@l28g2000yqc.googlegroups.com> Thank you for all your help and your different solutions to the problem. What I did no is following: import csv data = [] # Read form csv file and compute fourth column f = open("/path/to/file.csv", 'rU') reader = csv.reader(f, delimiter=";") for row in reader: s1=float(row[0]) s2=float(row[1]) p=float(row[2]) if s2>0: A=A1 def func(x,s1,s2,m,A,p): return (p) * stats.norm.cdf(x, loc=m, scale=s1) + (1-p) * stats.norm.cdf(x, loc=m, scale=s2) - A x1=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) A=A2 x2=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) data.append(row + [x1] + [x2]) elif s2==0: x1=s1 x2=s1*3 data.append(row + [x1] + [x2]) else: print "Error" f.close() print data # Write new array to csv file f = open("/path/to/new_file.csv", 'wb') writer = csv.writer(f, delimiter=';') for row in data: writer.writerow(row) f.close() And that works...nearly perfect I just get 3 warning of follwowing type: Warning (from warnings module): File "/Library/Frameworks/Python.framework/Versions/2.6/lib/ python2.6/site-packages/scipy/optimize/zeros.py", line 125 warnings.warn(msg, RuntimeWarning) RuntimeWarning: Tolerance of 1697.3557819 reached How can I check which row is causing the problem? Thanks cheers /Johannes From jsseabold at gmail.com Tue Jul 19 10:43:04 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 19 Jul 2011 10:43:04 -0400 Subject: [SciPy-User] Announcing statsmodels 0.3.0 release Message-ID: We are happy to announce that statsmodels version 0.3.0 is available for download. You can install from PyPI with easy_install -U scikits.statsmodels What's new? 
https://github.com/statsmodels/statsmodels/blob/master/CHANGES.txt Documentation: http://statsmodels.sourceforge.net/ Source Distributions: http://pypi.python.org/pypi/scikits.statsmodels Repository: https://github.com/statsmodels/statsmodels Mailing List: https://groups.google.com/group/pystatsmodels Bug Tracker: https://github.com/statsmodels/statsmodels/issues You can find us on the web at http://blog.wesmckinney.com/ and http://scipystats.blogspot.com/ or keep up with development on twitter @statsmodels Cheers, Josef Perktold, Skipper Seabold, Wes McKinney, Mike Crow, Vincent Davis ---------------------------------------------- Statsmodels is a python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation of statistical models. scikits.statsmodels provides classes and functions for the estimation of several categories of statistical models. These currently include linear regression models, OLS, GLS, WLS and GLS with AR(p) errors, generalized linear models for six distribution families, M-estimators for robust linear models, and regression with discrete dependent variables, Logit, Probit, MNLogit, Poisson, based on maximum likelihood estimators, timeseries models, ARMA, AR and VAR. An extensive list of result statistics are available for each estimation problem. Statsmodels also contains descriptive statistics, a wide range of statistical tests, tools for density estimation, and more. From johradinger at googlemail.com Tue Jul 19 11:12:30 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 08:12:30 -0700 (PDT) Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: <1fadea34-200f-4662-879e-9f66eded9aa3@l28g2000yqc.googlegroups.com> References: <1fadea34-200f-4662-879e-9f66eded9aa3@l28g2000yqc.googlegroups.com> Message-ID: <07a40dd6-da4d-402a-b8c8-1b34503ccff7@g2g2000vbl.googlegroups.com> So I found out which lines are causing the problems but I don't know yet why: It seems that my optimize function can solve with very small values of s1. the optimize function i am using to solve is again: def func(x,s1,s2,m,A,p): return (p) * stats.norm.cdf(x, loc=m, scale=s1) + (1-p) * stats.norm.cdf(x, loc=m, scale=s2) - A x1=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) where m=0, A=0.6827 and following value-triples(s1,s2,p) causing problems: ['0.453567', '56.449087', '0.945475'] ['0.109604', '32.540055', '0.574013'] ['0.152876', '7.009490', '0.646816'] but why is here the tolerance reached? what can I do to improve that because the results I get aren't correct. /Johannes From cweisiger at msg.ucsf.edu Tue Jul 19 12:04:34 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Tue, 19 Jul 2011 09:04:34 -0700 Subject: [SciPy-User] Uniquely identify array Message-ID: Is there some way in Python to uniquely identify a given Numpy array? E.g. to get a pointer to its location in memory or something similar? I'm looking for some way to determine which operations will implicitly create new arrays, just to verify that I'm not doing anything that will seriously hurt my performance -- but this seems like something that would be generally useful to know. Unfortunately ndarrays don't allow arbitrary additions to their namespace; no doing "foo.myUniqueIdentifier = 1", for example. Thanks in advance! 
-Chris From robert.kern at gmail.com Tue Jul 19 12:15:03 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jul 2011 11:15:03 -0500 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: On Tue, Jul 19, 2011 at 11:04, Chris Weisiger wrote: > Is there some way in Python to uniquely identify a given Numpy array? > E.g. to get a pointer to its location in memory or something similar? > I'm looking for some way to determine which operations will implicitly > create new arrays, just to verify that I'm not doing anything that > will seriously hurt my performance -- but this seems like something > that would be generally useful to know. The 'data' entry in the .__array_interface__ dictionary is the memory pointer to the start of the array. http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#__array_interface__ [~] |3> x = np.arange(10) [~] |4> x.__array_interface__ {'data': (23147728, False), 'descr': [('', ' x[5:].__array_interface__ {'data': (23147748, False), 'descr': [('', ' np.may_share_memory(x, x[5:]) True However, it is susceptible to false positives when the memory ranges overlap, but the strides cause the elements to miss each other. Hence the noncommittal name: [~] |7> np.may_share_memory(x[0::2], x[1::2]) True -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From aarchiba at physics.mcgill.ca Tue Jul 19 12:22:19 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 19 Jul 2011 12:22:19 -0400 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: This is a little more subtle than it sounds. Most python objects can be compared for identity with "is" (e.g. "if x is None:"). This tests for pointer equality, that is, it confirms that you have the same dynamically-allocated heap object. This will work for arrays, but it might be too specific for what you want: a numpy array actually consists of two heap objects, a python object that describes the array, and a memory arena. Slicing operations like A[::-1] are fast because while they create a new python object, the memory arena is untouched. So you need to decide whether what you care about is any change at all to the array, or whether what you care about is whether a new memory arena has been allocated. A brief aside: people often think they care about allocation of new arrays, but in most cases they're mistaken. malloc() is an extremely fast operation, especially for large arrays, in which case it's usually a direct call to the OS's mmap (and free really does free the memory back to the system). If what you're worried about is that your code is slower than it should be, making sure there are no extra allocations is not the best place to look. In-place operations have their own limitations, things like cache-coherency issues and cache efficiency of strided memory access. This is not theoretical: I had some code, a few years ago, that manipulated large arrays and was slow. So I painstakingly went through and made it use in-place operations where possible and avoid malloc()ing new arrays. Not only did it get slower, the memory usage increased. On the other hand, if you want to know whether you're getting slices that allow you to modify the original array or freshly-allocated arenas, the bluntest available instrument is to write to the one and see if the other changes. 
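Concretely (a small, illustrative example with arbitrary arrays):

import numpy as np

a = np.arange(10)
view = a[::-1]     # slicing: new Python object, same memory arena
copy = a + 0       # arithmetic: freshly allocated arena

view[0] = 99       # a[-1] becomes 99 -> view shares a's memory
copy[0] = 99       # a[0] is untouched -> copy does not
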
There are some more subtle approaches that are a little approximate, things like checking the address of the memory arena, or the equality of the base numpy array object (be warned that you have to traverse a tree of up pointers to get this last). I say approximate because while A[::2] and A[1::2] share a memory arena, and even have overlapping extents, you can modify them independently of each other. In short, you need to think hard about exactly what you're testing for. But for unit tests I recommend using modifications to test for memory sharing. Anne On 19 July 2011 12:04, Chris Weisiger wrote: > Is there some way in Python to uniquely identify a given Numpy array? > E.g. to get a pointer to its location in memory or something similar? > I'm looking for some way to determine which operations will implicitly > create new arrays, just to verify that I'm not doing anything that > will seriously hurt my performance -- but this seems like something > that would be generally useful to know. > > Unfortunately ndarrays don't allow arbitrary additions to their > namespace; no doing "foo.myUniqueIdentifier = 1", for example. > > Thanks in advance! > > -Chris > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cweisiger at msg.ucsf.edu Tue Jul 19 12:30:28 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Tue, 19 Jul 2011 09:30:28 -0700 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: Thanks for the detailed response, both you and Robert Kern. My immediate problem is not especially significant; I have three arrays: one of data D, one of additive offsets O, and one of multiplicative modifiers M. The first is of ints and the latter two of floats, and I want to get D * M - O as ints and shove them into an existing buffer. This is not an especially expensive operation (the dataset is 512x512), but I found myself curious about what it's doing behind the scenes, and the most straightforward way I know of to track that kind of thing is to track allocations. I don't expect it would make a big difference to hyper-optimize this problem, but in the future I may need tighter code in some other application, and I'd rather know now than potentially go down a wrong path later. I know more now about what Numpy's doing than I did before this thread. Thanks for the prompt and detailed responses. :) -Chris On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald wrote: > This is a little more subtle than it sounds. Most python objects can > be compared for identity with "is" (e.g. "if x is None:"). This tests > for pointer equality, that is, it confirms that you have the same > dynamically-allocated heap object. This will work for arrays, but it > might be too specific for what you want: a numpy array actually > consists of two heap objects, a python object that describes the > array, and a memory arena. Slicing operations like A[::-1] are fast > because while they create a new python object, the memory arena is > untouched. So you need to decide whether what you care about is any > change at all to the array, or whether what you care about is whether > a new memory arena has been allocated. > > A brief aside: people often think they care about allocation of new > arrays, but in most cases they're mistaken. 
malloc() is an extremely > fast operation, especially for large arrays, in which case it's > usually a direct call to the OS's mmap (and free really does free the > memory back to the system). If what you're worried about is that your > code is slower than it should be, making sure there are no extra > allocations is not the best place to look. In-place operations have > their own limitations, things like cache-coherency issues and cache > efficiency of strided memory access. This is not theoretical: I had > some code, a few years ago, that manipulated large arrays and was > slow. So I painstakingly went through and made it use in-place > operations where possible and avoid malloc()ing new arrays. Not only > did it get slower, the memory usage increased. > > On the other hand, if you want to know whether you're getting slices > that allow you to modify the original array or freshly-allocated > arenas, the bluntest available instrument is to write to the one and > see if the other changes. There are some more subtle approaches that > are a little approximate, things like checking the address of the > memory arena, or the equality of the base numpy array object (be > warned that you have to traverse a tree of up pointers to get this > last). I say approximate because while A[::2] and A[1::2] share a > memory arena, and even have overlapping extents, you can modify them > independently of each other. > > In short, you need to think hard about exactly what you're testing > for. But for unit tests I recommend using modifications to test for > memory sharing. > > Anne > > On 19 July 2011 12:04, Chris Weisiger wrote: >> Is there some way in Python to uniquely identify a given Numpy array? >> E.g. to get a pointer to its location in memory or something similar? >> I'm looking for some way to determine which operations will implicitly >> create new arrays, just to verify that I'm not doing anything that >> will seriously hurt my performance -- but this seems like something >> that would be generally useful to know. >> >> Unfortunately ndarrays don't allow arbitrary additions to their >> namespace; no doing "foo.myUniqueIdentifier = 1", for example. >> >> Thanks in advance! >> >> -Chris >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From aarchiba at physics.mcgill.ca Tue Jul 19 12:37:27 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 19 Jul 2011 12:37:27 -0400 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: I know you were looking for tools to answer the question and not answers to the question but: The easiest way to do what you want is: output[...] = D*M-O This will convert D to floats, multiply it by M, subtract O, then store the result into output, converting to ints on the fly. I'm not sure whether a floatified version of D is allocated, but I think so. You could do all this in-place at the cost of extra roundings by using the np.multiply(a,b,out) forms of ufuncs. Anne On 19 July 2011 12:30, Chris Weisiger wrote: > Thanks for the detailed response, both you and Robert Kern. My > immediate problem is not especially significant; I have three arrays: > one of data D, one of additive offsets O, and one of multiplicative > modifiers M. 
The first is of ints and the latter two of floats, and I > want to get D * M - O as ints and shove them into an existing buffer. > This is not an especially expensive operation (the dataset is > 512x512), but I found myself curious about what it's doing behind the > scenes, and the most straightforward way I know of to track that kind > of thing is to track allocations. I don't expect it would make a big > difference to hyper-optimize this problem, but in the future I may > need tighter code in some other application, and I'd rather know now > than potentially go down a wrong path later. > > I know more now about what Numpy's doing than I did before this > thread. Thanks for the prompt and detailed responses. :) > > -Chris > > On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald > wrote: >> This is a little more subtle than it sounds. Most python objects can >> be compared for identity with "is" (e.g. "if x is None:"). This tests >> for pointer equality, that is, it confirms that you have the same >> dynamically-allocated heap object. This will work for arrays, but it >> might be too specific for what you want: a numpy array actually >> consists of two heap objects, a python object that describes the >> array, and a memory arena. Slicing operations like A[::-1] are fast >> because while they create a new python object, the memory arena is >> untouched. So you need to decide whether what you care about is any >> change at all to the array, or whether what you care about is whether >> a new memory arena has been allocated. >> >> A brief aside: people often think they care about allocation of new >> arrays, but in most cases they're mistaken. malloc() is an extremely >> fast operation, especially for large arrays, in which case it's >> usually a direct call to the OS's mmap (and free really does free the >> memory back to the system). If what you're worried about is that your >> code is slower than it should be, making sure there are no extra >> allocations is not the best place to look. In-place operations have >> their own limitations, things like cache-coherency issues and cache >> efficiency of strided memory access. This is not theoretical: I had >> some code, a few years ago, that manipulated large arrays and was >> slow. So I painstakingly went through and made it use in-place >> operations where possible and avoid malloc()ing new arrays. Not only >> did it get slower, the memory usage increased. >> >> On the other hand, if you want to know whether you're getting slices >> that allow you to modify the original array or freshly-allocated >> arenas, the bluntest available instrument is to write to the one and >> see if the other changes. There are some more subtle approaches that >> are a little approximate, things like checking the address of the >> memory arena, or the equality of the base numpy array object (be >> warned that you have to traverse a tree of up pointers to get this >> last). I say approximate because while A[::2] and A[1::2] share a >> memory arena, and even have overlapping extents, you can modify them >> independently of each other. >> >> In short, you need to think hard about exactly what you're testing >> for. But for unit tests I recommend using modifications to test for >> memory sharing. >> >> Anne >> >> On 19 July 2011 12:04, Chris Weisiger wrote: >>> Is there some way in Python to uniquely identify a given Numpy array? >>> E.g. to get a pointer to its location in memory or something similar? 
>>> I'm looking for some way to determine which operations will implicitly >>> create new arrays, just to verify that I'm not doing anything that >>> will seriously hurt my performance -- but this seems like something >>> that would be generally useful to know. >>> >>> Unfortunately ndarrays don't allow arbitrary additions to their >>> namespace; no doing "foo.myUniqueIdentifier = 1", for example. >>> >>> Thanks in advance! >>> >>> -Chris >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From seb.haase at gmail.com Tue Jul 19 16:04:13 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 19 Jul 2011 22:04:13 +0200 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: To get the two operations done in one step without intermediate temporary, you should benefit from using numexpr. There is not much talk about http://code.google.com/p/numexpr anymore, but it got started out of discussions on this list. And since then, it now even supports float32 (not only float64), which is what you want for large image data sets. I always meant to use it myself .... Cheers, Sebastian Haase On Tue, Jul 19, 2011 at 6:37 PM, Anne Archibald wrote: > I know you were looking for tools to answer the question and not > answers to the question but: > > The easiest way to do what you want is: > > output[...] = D*M-O > > This will convert D to floats, multiply it by M, subtract O, then > store the result into output, converting to ints on the fly. I'm not > sure whether a floatified version of D is allocated, but I think so. > You could do all this in-place at the cost of extra roundings by using > the np.multiply(a,b,out) forms of ufuncs. > > Anne > > On 19 July 2011 12:30, Chris Weisiger wrote: >> Thanks for the detailed response, both you and Robert Kern. My >> immediate problem is not especially significant; I have three arrays: >> one of data D, one of additive offsets O, and one of multiplicative >> modifiers M. The first is of ints and the latter two of floats, and I >> want to get D * M - O as ints and shove them into an existing buffer. >> This is not an especially expensive operation (the dataset is >> 512x512), but I found myself curious about what it's doing behind the >> scenes, and the most straightforward way I know of to track that kind >> of thing is to track allocations. I don't expect it would make a big >> difference to hyper-optimize this problem, but in the future I may >> need tighter code in some other application, and I'd rather know now >> than potentially go down a wrong path later. >> >> I know more now about what Numpy's doing than I did before this >> thread. Thanks for the prompt and detailed responses. :) >> >> -Chris >> >> On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald >> wrote: >>> This is a little more subtle than it sounds. Most python objects can >>> be compared for identity with "is" (e.g. "if x is None:"). This tests >>> for pointer equality, that is, it confirms that you have the same >>> dynamically-allocated heap object. 
This will work for arrays, but it >>> might be too specific for what you want: a numpy array actually >>> consists of two heap objects, a python object that describes the >>> array, and a memory arena. Slicing operations like A[::-1] are fast >>> because while they create a new python object, the memory arena is >>> untouched. So you need to decide whether what you care about is any >>> change at all to the array, or whether what you care about is whether >>> a new memory arena has been allocated. >>> >>> A brief aside: people often think they care about allocation of new >>> arrays, but in most cases they're mistaken. malloc() is an extremely >>> fast operation, especially for large arrays, in which case it's >>> usually a direct call to the OS's mmap (and free really does free the >>> memory back to the system). If what you're worried about is that your >>> code is slower than it should be, making sure there are no extra >>> allocations is not the best place to look. In-place operations have >>> their own limitations, things like cache-coherency issues and cache >>> efficiency of strided memory access. This is not theoretical: I had >>> some code, a few years ago, that manipulated large arrays and was >>> slow. So I painstakingly went through and made it use in-place >>> operations where possible and avoid malloc()ing new arrays. Not only >>> did it get slower, the memory usage increased. >>> >>> On the other hand, if you want to know whether you're getting slices >>> that allow you to modify the original array or freshly-allocated >>> arenas, the bluntest available instrument is to write to the one and >>> see if the other changes. There are some more subtle approaches that >>> are a little approximate, things like checking the address of the >>> memory arena, or the equality of the base numpy array object (be >>> warned that you have to traverse a tree of up pointers to get this >>> last). I say approximate because while A[::2] and A[1::2] share a >>> memory arena, and even have overlapping extents, you can modify them >>> independently of each other. >>> >>> In short, you need to think hard about exactly what you're testing >>> for. But for unit tests I recommend using modifications to test for >>> memory sharing. >>> >>> Anne >>> >>> On 19 July 2011 12:04, Chris Weisiger wrote: >>>> Is there some way in Python to uniquely identify a given Numpy array? >>>> E.g. to get a pointer to its location in memory or something similar? >>>> I'm looking for some way to determine which operations will implicitly >>>> create new arrays, just to verify that I'm not doing anything that >>>> will seriously hurt my performance -- but this seems like something >>>> that would be generally useful to know. >>>> >>>> Unfortunately ndarrays don't allow arbitrary additions to their >>>> namespace; no doing "foo.myUniqueIdentifier = 1", for example. >>>> >>>> Thanks in advance! >>>> >>>> -Chris From qingkunlqk at gmail.com Tue Jul 19 17:37:28 2011 From: qingkunlqk at gmail.com (qingkunl) Date: Tue, 19 Jul 2011 14:37:28 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] different answers got from scipy.cluster.vq.kmeans Message-ID: <32094443.post@talk.nabble.com> Hello, When I run scipy.cluster.vq.kmeans with exactly the same parameters passed to the function, I got different answers each time. 
>>> a array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[2, 3, 4], [7, 8, 9]]), 1.7320508075688774) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[7, 8, 9], [2, 3, 4]]), 1.7320508075688774) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[1, 2, 3], [5, 6, 7]]), 1.7320508075688774) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[5, 6, 7], [1, 2, 3]]), 1.7320508075688774) I know they are all correct answers. But is there a way for me to get a deterministic answer? Thanks, Qingkun -- View this message in context: http://old.nabble.com/different-answers-got-from-scipy.cluster.vq.kmeans-tp32094443p32094443.html Sent from the Scipy-User mailing list archive at Nabble.com. From kwgoodman at gmail.com Tue Jul 19 17:48:56 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 19 Jul 2011 14:48:56 -0700 Subject: [SciPy-User] [SciPy-user] different answers got from scipy.cluster.vq.kmeans In-Reply-To: <32094443.post@talk.nabble.com> References: <32094443.post@talk.nabble.com> Message-ID: On Tue, Jul 19, 2011 at 2:37 PM, qingkunl wrote: > When I run scipy.cluster.vq.kmeans with exactly the same parameters passed > to the function, I got different answers each time. > > I know they are all correct answers. But is there a way for me to get a > deterministic answer? >From the kmeans docstring: k_or_guess: The initial k centroids are chosen by randomly selecting observations from the observation matrix. Alternatively, passing a k by N array specifies the initial k centroids. For example: >>> kmeans(a, k_or_guess=a, iter=20, thresh=1e-5) From mail.till at gmx.de Tue Jul 19 20:25:38 2011 From: mail.till at gmx.de (Till Stensitzki) Date: Wed, 20 Jul 2011 00:25:38 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Ported_mls=5Falloc_to_Python=2C_can_be_use?= =?utf-8?q?d_to_solve_bounded_linear_lst=2E_sqr=2E_problems=2E?= Message-ID: Hello, due the fact, that scipy is missing a bounded linear least square solver, i ported a working peace of code from matlab central. It's a 1 to 1 copy from mls_alloc in Qcat by Ola Harkegard. Qcat is used in control theory, so it's also possible to define weights for the vectors. See qcats docs for more information. To use it simply as bounded lsq solver, have a look a the file, where a example is given. The port can be found at: https://bitbucket.org/tillsten/pymls/overview All comments appreciated, Till From thkoe002 at gmail.com Wed Jul 20 03:22:55 2011 From: thkoe002 at gmail.com (=?ISO-8859-1?Q?Thomas_K=F6nigstein?=) Date: Wed, 20 Jul 2011 09:22:55 +0200 Subject: [SciPy-User] Convert hdf5 file content to numpy array Message-ID: Hi all, attached to this email (or if the attachment doesn't show up, alternatively at http://dl.dropbox.com/u/15199/vs001_3d_particles.h5 ), you find a 400kb hdf5 file with a number of nodes, which I would like to "import" as numpy array. The code that I use so far (also attached, and here http://dl.dropbox.com/u/15199/mwe.py ) is this: import tables hdf5=tables.openFile("vs001_3d_particles.h5") root=hdf5.root ptcls_names=[_ for _ in dir(root) if _.startswith("Electrons_at_PE_")] ptcls=[eval("root."+_) for _ in ptcls_names] for ptcl in ptcls: ptcl_data=ptcl.read() print type(ptcl_data) # print ptcl_data.dtype # [('cell', ' -------------- next part -------------- A non-text attachment was scrubbed... Name: vs001_3d_particles.h5 Type: application/octet-stream Size: 419188 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mwe.py Type: application/octet-stream Size: 688 bytes Desc: not available URL: From jr at sun.ac.za Wed Jul 20 04:23:35 2011 From: jr at sun.ac.za (Johann Rohwer) Date: Wed, 20 Jul 2011 10:23:35 +0200 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: References: Message-ID: <201107201023.35822.jr@sun.ac.za> On Wednesday 20 July 2011, Thomas K?nigstein wrote: > Hi all, attached to this email (or if the attachment doesn't show > up, alternatively at > http://dl.dropbox.com/u/15199/vs001_3d_particles.h5 ), you find a > 400kb hdf5 file with a number of nodes, which I would like to > "import" as numpy array. The code that I use so far (also > attached, and here http://dl.dropbox.com/u/15199/mwe.py ) is this: > > import tables > > hdf5=tables.openFile("vs001_3d_particles.h5") > root=hdf5.root > > ptcls_names=[_ for _ in dir(root) if > _.startswith("Electrons_at_PE_")] ptcls=[eval("root."+_) for _ in > ptcls_names] > for ptcl in ptcls: > ptcl_data=ptcl.read() > print type(ptcl_data) # > print ptcl_data.dtype # [('cell', ' ' ('weight', ' ptcl_data*=2 # TypeError: unsupported operand type(s) for > *=: 'numpy.ndarray' and 'int' > x=ptcl_data[:,1] # I'd like to do stuff like that > > now, it is type(ptcl_data) == numpy.ndarray, but the contents of > the array are some kind of lists. > How can I now transform this weird, "custom/proprietaty" data > format into an ordinary numpy array? So that I can use operations > as, for example, node1*=2, or slicing, and so on? I use the h5py module for this, not the tables module. In short, import numpy, h5py f = h5py.File('myhdf5file.h5','r') data = f.get('path/to/my/dataset') data_as_array = numpy.array(data) Then you have a normal numpy array with which you can work further. HTH, Johann From johradinger at googlemail.com Wed Jul 20 04:37:16 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Wed, 20 Jul 2011 10:37:16 +0200 Subject: [SciPy-User] optimize: RuntimeWarning Tolerance reached Message-ID: Hej, I posted that already in another thread which was topically different and this is actually a new problem: I use the optimize function to solve: def func(x,s1,s2,m,A,p): return (p) * stats.norm.cdf(x, loc=m, scale=s1) + (1-p) * stats.norm.cdf(x, loc=m, scale=s2) - A x1=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) where m=0, A=0.6827 and following value-triples(s1,s2,p) causing problems: ['0.453567', '56.449087', '0.945475'] ['0.109604', '32.540055', '0.574013'] ['0.152876', '7.009490', '0.646816'] The error I get is like: Warning (from warnings module): File "/Library/Frameworks/Python.framework/Versions/2.6/lib/ python2.6/site-packages/scipy/optimize/zeros.py", line 125 warnings.warn(msg, RuntimeWarning) RuntimeWarning: Tolerance of 1697.3557819 reached but why is here the tolerance reached? what can I do to improve that because the results I get aren't correct. /Johannes From franckkalala at googlemail.com Wed Jul 20 06:22:28 2011 From: franckkalala at googlemail.com (franck kalala) Date: Wed, 20 Jul 2011 11:22:28 +0100 Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution Message-ID: Hi list, I want to generate number between 0 and 75 from normal distribution, what can I use? Thank you -- ********** ++++ --- * -------------- next part -------------- An HTML attachment was scrubbed... 
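One way to draw such values with scipy.stats.truncnorm (a small sketch; the mean and standard deviation below are placeholders to be chosen for the problem at hand):

import numpy as np
from scipy import stats

mean, sd = 37.5, 15.0                          # placeholder parameters
a, b = (0 - mean) / sd, (75 - mean) / sd       # truncation bounds in units of sd
draws = stats.truncnorm.rvs(a, b, loc=mean, scale=sd, size=1000)
ints = np.round(draws).astype(int)             # integers in 0..75
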
URL: From brennan.williams at visualreservoir.com Wed Jul 20 06:35:32 2011 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Wed, 20 Jul 2011 22:35:32 +1200 Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution In-Reply-To: References: Message-ID: <4E26AF74.6060800@visualreservoir.com> For a normal distribution you would use... scipy.stats.norm(loc=mean,scale=stddev) but if you want to restrict the range of generated values to being >=0 and <=75 you will need to use a truncated normal. Is that what you want to do? On 20/07/2011 10:22 p.m., franck kalala wrote: > Hi list, > > > I want to generate number between 0 and 75 from normal distribution, > what can I use? > > > Thank you > > -- > ********** > ++++ > --- > * > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From franckkalala at googlemail.com Wed Jul 20 06:38:15 2011 From: franckkalala at googlemail.com (franck kalala) Date: Wed, 20 Jul 2011 11:38:15 +0100 Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution In-Reply-To: <4E26AF74.6060800@visualreservoir.com> References: <4E26AF74.6060800@visualreservoir.com> Message-ID: yes I want to restrict these numbers between 0 and 75, 2011/7/20 Brennan Williams > > For a normal distribution you would use... > > scipy.stats.norm(loc=mean,scale=stddev) > > but if you want to restrict the range of generated values to being >=0 and > <=75 you will need to use a truncated normal. Is that what you want to do? > > > > On 20/07/2011 10:22 p.m., franck kalala wrote: > > Hi list, > > > I want to generate number between 0 and 75 from normal distribution, what > can I use? > > > Thank you > > -- > ********** > ++++ > --- > * > > > > _______________________________________________ > SciPy-User mailing listSciPy-User at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- ********** ++++ --- * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Jul 20 12:27:18 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 20 Jul 2011 11:27:18 -0500 Subject: [SciPy-User] [Scipy-User] Building Scipy Docs Message-ID: I'm trying to build the html scipy docs and it keeps failing. In the docs directory here's what I run and the output. >make html mkdir -p build/html build/doctrees LANG=C sphinx-build -b html -d build/doctrees source build/html Running Sphinx v1.0.7 Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) Extension error: Could not import extension plot_directive (exception: No module named plot_directive) make: *** [html] Error 1 I can't figure out why it's failing. The plot_directive stuff is mentioned in the numpy docs documents, but nowhere in the scipy docs documents. And the numpy docs are building fine. I'm running scipy 0.10.0 dev, numpy 2.0.0 dev and matplotlib 1.0.1, sphinx 1.0.7, and numpydoc 0.4 on Ubuntu 10.10. Any ideas what's happening? Thanks, Chris Jordan-Squire -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Wed Jul 20 13:11:37 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 20 Jul 2011 19:11:37 +0200 Subject: [SciPy-User] [Scipy-User] Building Scipy Docs In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 6:27 PM, Christopher Jordan-Squire wrote: > I'm trying to build the html scipy docs and it keeps failing. > > In the docs directory here's what I run and the output. > > >make html > > mkdir -p build/html build/doctrees > LANG=C sphinx-build -b html -d build/doctrees source build/html > Running Sphinx v1.0.7 > Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) > > Extension error: > Could not import extension plot_directive (exception: No module named > plot_directive) > make: *** [html] Error 1 > > I can't figure out why it's failing. The plot_directive stuff is mentioned > in the numpy docs documents, but nowhere in the scipy docs documents. And > the numpy docs are building fine. > > I'm running scipy 0.10.0 dev, numpy 2.0.0 dev and matplotlib 1.0.1, sphinx > 1.0.7, and numpydoc 0.4 on Ubuntu 10.10. > > Any ideas what's happening? > > If you can build the numpy docs your setup should be fine. But scipy relies on the Sphinx extensions in the numpy tree. The scipy release script does this before building the docs: mkdir doc/sphinxext cp -R ../numpy/doc/sphinxext/ doc/sphinxext/ Cheers, Ralf Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpm at u.washington.edu Wed Jul 20 11:10:35 2011 From: fpm at u.washington.edu (cassiope) Date: Wed, 20 Jul 2011 08:10:35 -0700 (PDT) Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution In-Reply-To: References: <4E26AF74.6060800@visualreservoir.com> Message-ID: So make yourself a function that re-samples the distribution if a given draw exceeds your boundaries. No, the result won't be perfectly normal, but that was already given by your non-normal boundaries. There are many more complex ways of achieving this, mostly less flexible in terms of flexible mean/sd. If the mean & sd are fixed, you might be able to come up with a faster method that adequately approximates "normal" from a set of integer samples. On Jul 20, 3:38?am, franck kalala wrote: > yes I want to restrict these numbers between 0 and 75, > > 2011/7/20 Brennan Williams > > > > > > > For a normal distribution you would use... > > > scipy.stats.norm(loc=mean,scale=stddev) > > > but if you want to restrict the range of generated values to being >=0 and > > <=75 you will need to use a truncated normal. Is that what you want to do? > > > On 20/07/2011 10:22 p.m., franck kalala wrote: > > > Hi list, > > > I want to generate number between 0 and 75 from normal distribution, ?what > > can I use? > > > Thank you > > > -- > > ********** > > ? ++++ > > ? ? --- > > ? ? ?* > > > _______________________________________________ > > SciPy-User mailing listSciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > ********** > ? ++++ > ? ? --- > ? ? ?* > > _______________________________________________ > SciPy-User mailing list > SciPy-U... 
at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user 
From gustavo.goretkin at gmail.com Tue Jul 19 13:55:26 2011 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Tue, 19 Jul 2011 13:55:26 -0400 Subject: [SciPy-User] SciPy function to return a default value on invalid index Message-ID: Say there's an array A of size 10. The indices are 0,...9. Is there a function to return a default value (e.g. 0) when an index outside this range is provided? I'd like to fancy-index the array. -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From thkoe002 at gmail.com Wed Jul 20 17:27:43 2011 From: thkoe002 at gmail.com (=?ISO-8859-1?Q?Thomas_K=F6nigstein?=) Date: Wed, 20 Jul 2011 23:27:43 +0200 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: <201107201023.35822.jr@sun.ac.za> References: <201107201023.35822.jr@sun.ac.za> Message-ID: Okay, thanks, I used h5py and then numpy.array(f.values[0]) , which worked just fine, thanks again! I just wonder why it doesn't work with pytables, and also why the f.get()-method that you proposed doesn't work for me... I get a "get() takes 3 arguments, 2 given" error... any clues? I just installed h5py (version 1.2.1 on python 2 .6), the code I tried was 'f.get("/root/PE_Electrons_1")'.. anyways, thanks for the help, cheers Thomas On Wed, Jul 20, 2011 at 10:23, Johann Rohwer wrote: > f = h5py.File('myhdf5file.h5','r') > data = f.get('path/to/my/dataset') > data_as_array = numpy.array(data) > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From e.antero.tammi at gmail.com Wed Jul 20 17:43:37 2011 From: e.antero.tammi at gmail.com (eat) Date: Thu, 21 Jul 2011 00:43:37 +0300 Subject: [SciPy-User] SciPy function to return a default value on invalid index In-Reply-To: References: Message-ID: Hi On Tue, Jul 19, 2011 at 8:55 PM, Gustavo Goretkin < gustavo.goretkin at gmail.com> wrote: > Say there's an array A of size 10. The indices are 0,...9. Is there a > function to return a default value (e.g. 0) when an index outside this range > is provided? I'd like to fancy-index the array. > Wouldn't it just be feasible to set index values above a threshold to the default one, juts like: ind[ind> 9]= 0 before utilizing ind in fancy-indexing?
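Or, if the goal is really a default *value* rather than whatever happens to sit at index 0, a small sketch along the same lines (here treating negative indices as invalid as well):

import numpy as np

A = np.arange(10) * 10
idx = np.array([2, 5, 12, -3])
valid = (idx >= 0) & (idx < A.size)
result = np.where(valid, A[np.clip(idx, 0, A.size - 1)], 0)   # -> [20, 50, 0, 0]
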
-eat > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Wed Jul 20 18:31:56 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Wed, 20 Jul 2011 18:31:56 -0400 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: <1311201116.3630.71.camel@SietchTabr> Skipper, I've done what you suggested. What do you think the next step should be? Should I wait for more feedback on scipy-dev, or should I file a pull request? Visualized diff, provided by github: https://github.com/collinstocks/scipy/compare/master...qr-with-pivoting Thanks, Collin -------------- next part -------------- An embedded message was scrubbed... From: Skipper Seabold Subject: Re: [SciPy-User] generic_flapack.pyf and geqp3 Date: Mon, 11 Jul 2011 23:17:06 -0500 Size: 6027 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From andrew.collette at gmail.com Wed Jul 20 19:14:14 2011 From: andrew.collette at gmail.com (Andrew Collette) Date: Wed, 20 Jul 2011 16:14:14 -0700 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: References: <201107201023.35822.jr@sun.ac.za> Message-ID: Hi Thomas, > f.get()-method that you proposed doesn't work for me... I get a "get() takes > 3 arguments, 2 given" error... any clues? I just installed h5py (version > 1.2.1 on python 2 .6), the code I tried was > 'f.get("/root/PE_Electrons_1")'.. anyways, thanks for the help, cheers With h5py you're welcome to just do f["/path/to/dataset"] (like a dictionary) to retrieve a dataset, although get() is certainly also supported. Version 1.2.1 is pretty old; I think get() has changed since then which may be why you're having different behavior. On the resulting dataset it's likewise recommended to do "dset[0]" (numpy-like indexing) rather than going through dset.value, which is there for backwards compatibility. Andrew Collette From joonpyro at gmail.com Wed Jul 20 22:52:52 2011 From: joonpyro at gmail.com (Joon Ro) Date: Wed, 20 Jul 2011 21:52:52 -0500 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: References: <201107201023.35822.jr@sun.ac.za> Message-ID: On Wed, 20 Jul 2011 16:27:43 -0500, Thomas K?nigstein wrote: > I just wonder why it doesn't work with pytables, and also why the > f.get()-method that you proposed doesn't work for me... I get a "get() > takes > 3 arguments, 2 given" error... any clues? I just installed h5py (version > 1.2.1 on python 2 .6), the code I tried was > 'f.get("/root/PE_Electrons_1")'.. anyways, thanks for the help, cheers > Thomas > I'm not sure if you used pytables to create the hdf5 file, but if you did, it seems you used createTable method. In this case what you get is a structured array. (http://www.scipy.org/Cookbook/Recarray) And each record is numpy.void type. If you want to store a numpy array without variable names, use createArray method instead of createTable. Anyway, you can convert the structured array into ndarray with: >>> array([ptcl[:][col] for col in test.dtype.names]).T There might be better way to do this but I haven't found one. 
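Spelled out with consistent names (the 'test' in that one-liner should be the structured array itself; below is a self-contained version with a stand-in dtype resembling the one in this thread, and note that integer fields get upcast to float):

import numpy as np

# stand-in for the structured array read from the HDF5 table
ptcl_data = np.zeros(3, dtype=[('cell', 'i4'), ('x', 'f8'), ('weight', 'f8')])

plain = np.array([ptcl_data[name] for name in ptcl_data.dtype.names]).T
# plain has shape (n_records, n_fields), here (3, 3), with dtype float64
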
-Joon From flyzzx at gmail.com Wed Jul 20 22:51:30 2011 From: flyzzx at gmail.com (Chong Yang) Date: Thu, 21 Jul 2011 10:51:30 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points Message-ID: Hi, Recently I have been working with 1D curve fitting on dense GPS point cloud representing road segments. For all I know, some roads may be highly curved and their corresponding x value are not necessary monotonic order. I tried splrep. It works very well on simple arcs but fails to give meaningful result on highly curved data - either because the polynomial order 'k' is not sufficient to fit the data well enough or the curve is vertically aligned. For now I get around vertical curves by flipping the axes and doing splrep with (y, x), but maybe a better approach is possible. My guess is splprep. However I am not sure how to properly convert the noisy data points into their parametric form. Does anyone have an idea on this or faced a similar problem before? Thanks. -- Regards, CY -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgdunn at gmail.com Thu Jul 21 01:17:28 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Thu, 21 Jul 2011 01:17:28 -0400 Subject: [SciPy-User] SciPy Central: a file and link sharing site - now active Message-ID: Hi The SciPy Central website is now active and waiting for high-quality code snippets and links to scientific resources of interest to the SciPy community. The website is at http://scipy-central.org and you can read some background about the site's history at http://scipy-central.org/about While we are officially in beta-mode, the site should be pretty stable. If you detect any bugs, or have any suggestions for improvements, please post them at https://github.com/kgdunn/SciPyCentral/issues/ Thanks, Kevin From zachary.pincus at yale.edu Thu Jul 21 08:19:59 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 21 Jul 2011 08:19:59 -0400 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Hi Chong, If you have "time-of-acquisition" data for each (x,y) GPS point, use that as your parameter. If you don't, then the problem of finding a parameter essentially becomes one of "manifold learning" (admittedly a simple case). You could check out "locally linear embeddings" or "isomap" for the basic algorithms in this field -- I think scikits.learn has one or both. But that might take you pretty far afield. Zach On Jul 20, 2011, at 10:51 PM, Chong Yang wrote: > Hi, > > Recently I have been working with 1D curve fitting on dense GPS point cloud representing road segments. For all I know, some roads may be highly curved and their corresponding x value are not necessary monotonic order. > > I tried splrep. It works very well on simple arcs but fails to give meaningful result on highly curved data - either because the polynomial order 'k' is not sufficient to fit the data well enough or the curve is vertically aligned. For now I get around vertical curves by flipping the axes and doing splrep with (y, x), but maybe a better approach is possible. > > My guess is splprep. However I am not sure how to properly convert the noisy data points into their parametric form. Does anyone have an idea on this or faced a similar problem before? Thanks. 
> > -- > Regards, > CY > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From johradinger at googlemail.com Thu Jul 21 11:02:21 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Thu, 21 Jul 2011 08:02:21 -0700 (PDT) Subject: [SciPy-User] optimize: RuntimeWarning Tolerance reached In-Reply-To: References: Message-ID: nobody any idea why these datasets fail? around 150 other datasets work really good! /j From gabe at squirrelsoup.net Thu Jul 21 11:24:25 2011 From: gabe at squirrelsoup.net (Gabriel Dulac-Arnold) Date: Thu, 21 Jul 2011 17:24:25 +0200 Subject: [SciPy-User] check_format in sparse matrix creation Message-ID: <4E2844A9.2040300@squirrelsoup.net> Is there a good reason that check_format cannot be disabled when creating sparse arrays in scipy.sparse? If the correct ndarrays are provided in the constructor, a lot of what check_format does (pruning, typecasting to native byteorder) is not necessary, and wastes a lot of time in situations where sparse arrays are being generated in large numbers. Has this been considered / was there a good reason not to allow disabling it? Thanks, Gabe From cjordan1 at uw.edu Thu Jul 21 12:38:10 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Thu, 21 Jul 2011 11:38:10 -0500 Subject: [SciPy-User] [Scipy-User] Building Scipy Docs In-Reply-To: References: Message-ID: Thanks. It builds fine now. -Chris JS On Wed, Jul 20, 2011 at 12:11 PM, Ralf Gommers wrote: > > > On Wed, Jul 20, 2011 at 6:27 PM, Christopher Jordan-Squire < > cjordan1 at uw.edu> wrote: > >> I'm trying to build the html scipy docs and it keeps failing. >> >> In the docs directory here's what I run and the output. >> >> >make html >> >> mkdir -p build/html build/doctrees >> LANG=C sphinx-build -b html -d build/doctrees source build/html >> Running Sphinx v1.0.7 >> Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) >> >> Extension error: >> Could not import extension plot_directive (exception: No module named >> plot_directive) >> make: *** [html] Error 1 >> >> I can't figure out why it's failing. The plot_directive stuff is mentioned >> in the numpy docs documents, but nowhere in the scipy docs documents. And >> the numpy docs are building fine. >> >> I'm running scipy 0.10.0 dev, numpy 2.0.0 dev and matplotlib 1.0.1, sphinx >> 1.0.7, and numpydoc 0.4 on Ubuntu 10.10. >> >> Any ideas what's happening? >> >> If you can build the numpy docs your setup should be fine. But scipy > relies on the Sphinx extensions in the numpy tree. The scipy release script > does this before building the docs: > > mkdir doc/sphinxext > cp -R ../numpy/doc/sphinxext/ doc/sphinxext/ > > Cheers, > Ralf > Ralf > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From flyzzx at gmail.com Thu Jul 21 13:24:56 2011 From: flyzzx at gmail.com (Chong Yang) Date: Fri, 22 Jul 2011 01:24:56 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Hi Zach, Unfortunately the 'time-of-acquisition' is hardly helpful in this case due to the different time of arrival of each point in the cloud, which may or may not begin at the same location. But thanks for pointing me in the right direction! 
I will look into LLE and Isomap (which look promising enough for simple cases like mine) and test their performances. I am dealing with a range of 10k points so speed could be a concern. On Thu, Jul 21, 2011 at 8:19 PM, Zachary Pincus wrote: > Hi Chong, > > If you have "time-of-acquisition" data for each (x,y) GPS point, use that > as your parameter. If you don't, then the problem of finding a parameter > essentially becomes one of "manifold learning" (admittedly a simple case). > You could check out "locally linear embeddings" or "isomap" for the basic > algorithms in this field -- I think scikits.learn has one or both. > > But that might take you pretty far afield. > > Zach > > > > On Jul 20, 2011, at 10:51 PM, Chong Yang wrote: > > > Hi, > > > > Recently I have been working with 1D curve fitting on dense GPS point > cloud representing road segments. For all I know, some roads may be highly > curved and their corresponding x value are not necessary monotonic order. > > > > I tried splrep. It works very well on simple arcs but fails to give > meaningful result on highly curved data - either because the polynomial > order 'k' is not sufficient to fit the data well enough or the curve is > vertically aligned. For now I get around vertical curves by flipping the > axes and doing splrep with (y, x), but maybe a better approach is possible. > > > > My guess is splprep. However I am not sure how to properly convert the > noisy data points into their parametric form. Does anyone have an idea on > this or faced a similar problem before? Thanks. > > > > -- > > Regards, > > CY > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Regards, Goh Chong Yang -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Thu Jul 21 13:38:10 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 21 Jul 2011 13:38:10 -0400 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: <8F6D487F-75C7-482E-9541-9EADB8F5C85C@yale.edu> I just remembered that another classic algorithm in this family is "principal curves". http://www.iro.umontreal.ca/~kegl/research/pcurves/ This might be even simpler and faster than LLE/Isomap. Zach On Jul 21, 2011, at 1:24 PM, Chong Yang wrote: > Hi Zach, > > Unfortunately the 'time-of-acquisition' is hardly helpful in this case due to the different time of arrival of each point in the cloud, which may or may not begin at the same location. > > But thanks for pointing me in the right direction! I will look into LLE and Isomap (which look promising enough for simple cases like mine) and test their performances. I am dealing with a range of 10k points so speed could be a concern. > > On Thu, Jul 21, 2011 at 8:19 PM, Zachary Pincus wrote: > Hi Chong, > > If you have "time-of-acquisition" data for each (x,y) GPS point, use that as your parameter. If you don't, then the problem of finding a parameter essentially becomes one of "manifold learning" (admittedly a simple case). You could check out "locally linear embeddings" or "isomap" for the basic algorithms in this field -- I think scikits.learn has one or both. > > But that might take you pretty far afield. 
> > Zach > > > > On Jul 20, 2011, at 10:51 PM, Chong Yang wrote: > > > Hi, > > > > Recently I have been working with 1D curve fitting on dense GPS point cloud representing road segments. For all I know, some roads may be highly curved and their corresponding x value are not necessary monotonic order. > > > > I tried splrep. It works very well on simple arcs but fails to give meaningful result on highly curved data - either because the polynomial order 'k' is not sufficient to fit the data well enough or the curve is vertically aligned. For now I get around vertical curves by flipping the axes and doing splrep with (y, x), but maybe a better approach is possible. > > > > My guess is splprep. However I am not sure how to properly convert the noisy data points into their parametric form. Does anyone have an idea on this or faced a similar problem before? Thanks. > > > > -- > > Regards, > > CY > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Regards, > Goh Chong Yang > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Thu Jul 21 15:07:27 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 Jul 2011 19:07:27 +0000 (UTC) Subject: [SciPy-User] check_format in sparse matrix creation References: <4E2844A9.2040300@squirrelsoup.net> Message-ID: On Thu, 21 Jul 2011 17:24:25 +0200, Gabriel Dulac-Arnold wrote: > Is there a good reason that check_format cannot be disabled when > creating sparse arrays in scipy.sparse? If the correct ndarrays are > provided in the constructor, a lot of what check_format does (pruning, > typecasting to native byteorder) is not necessary, and wastes a lot of > time in situations where sparse arrays are being generated in large > numbers. Has this been considered / was there a good reason not to > allow disabling it? Probably this did not occur to the author of scipy.sparse. I don't see a problem in adding a new "unsafe" constructor to the classes, or a flag to the constructor. -- Pauli Virtanen From wesmckinn at gmail.com Thu Jul 21 15:23:28 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 21 Jul 2011 15:23:28 -0400 Subject: [SciPy-User] Blog post about pandas, datarray, data structures in general, and where to from here Message-ID: I can't help myself from continuing to kick the hornet's nest: http://news.ycombinator.org/item?id=2790762 but in short after SciPy 2011, I'm very excited to keep the dialog going and keep making progress. - Wes From e.antero.tammi at gmail.com Thu Jul 21 15:58:04 2011 From: e.antero.tammi at gmail.com (eat) Date: Thu, 21 Jul 2011 22:58:04 +0300 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Hi, On Thu, Jul 21, 2011 at 5:51 AM, Chong Yang wrote: > Hi, > > Recently I have been working with 1D curve fitting on dense GPS point cloud > representing road segments. For all I know, some roads may be highly curved > and their corresponding x value are not necessary monotonic order. > > I tried splrep. 
It works very well on simple arcs but fails to give > meaningful result on highly curved data - either because the polynomial > order 'k' is not sufficient to fit the data well enough or the curve is > vertically aligned. For now I get around vertical curves by flipping the > axes and doing splrep with (y, x), but maybe a better approach is possible. > > My guess is splprep. However I am not sure how to properly convert the > noisy data points into their parametric form. Does anyone have an idea on > this or faced a similar problem before? Thanks. > Fundamentally roads are designed based on line segments and circular arc segments, possible joined by clothoids (with some contiguous criteria). Now, when the roads are actually constructed they will only follow approximately the originally intended geometry. Now, without really knowing your ultimate goals, I'll simply suggest you to try to fit consecutive segments based for example on RANSAC ( http://en.wikipedia.org/wiki/RANSAC). So perhaps your first estimate would be based only on line segments fitted by RANSAC (and fine tuned later if needed). Please note that for the fitting (of 2D line segments) you need to utilize orthogonal distance regression. (Now with this knowledge you could indeed project the line segments to 1D and fine tune it more there. However splines wont help you, because possible discontinuities, like connection of line and arc segments. Here you may like to study more on based on http://en.wikipedia.org/wiki/Segmented_regression), Anyway, please feel free to elaborate more on your particular case. -eat > > -- > Regards, > CY > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jul 21 16:43:05 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 21 Jul 2011 22:43:05 +0200 Subject: [SciPy-User] ANN: NumPy 1.6.1 release Message-ID: Hi, I am pleased to announce the availability of NumPy 1.6.1. This is a bugfix release for the 1.6.x series; the list of fixed bugs is given below. Sources and binaries can be found at http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/ Thanks to anyone who contributed to this release. Enjoy, The NumPy developers Bug fixes for NumPy 1.6.1 ------------------------- #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() #1895/1896 iter: writeonly operands weren't always being buffered correctly -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Thu Jul 21 20:22:07 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Thu, 21 Jul 2011 17:22:07 -0700 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT Message-ID: Hi all, I am working with a system of 16 differential equations that simulates an epidemic in a city. Because there are many cities interacting with each other, I need to run my ode's for a single day, stop them, modify the initial conditions and run them again. 
Because the ode is running only for a day, I defined my tspan to have only two points, like this: tspan = tspan = linspace(day, day+1, 2) I wrote my equations in Python and I am using scipy.odeint to solve it. Here is my code: def advanceODEoneDay(self,day): #rename variables for convenience N0, N1 = self.children, self.adults S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 #create a vector of times for integration: tspan = linspace(day, day+1, 2) #set initial conditions. To do this, I need to look in the array in the day initCond = [S0[day,0], S0[day,1], S1[day,0], S1[day,1], A0[day,0], A0[day,1], A1[day,0], A1[day,1], I0[day,0], I0[day,1], I1[day,0], I1[day,1], RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] #run the ode: sir_sol = odeint(sir2groups,initCond, tspan, args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) Most of the time, it works just fine. However, there are some times where the following message appears: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information. lsoda-- at current t (=r1), mxstep (=i1) steps taken on this call before reaching tout In above message, I1 = 500 In above message, R1 = 0.1170754095027E+03 I should mention that it is NOT only a warning. This is repeated over and over (thousands of times) and then it will break the rest of my code Ok, after searching in this mailing list, someone else posted a similar warning message and it was suggested to him that " *In your case, you might simply be computing for to coarse a mesh in t,** so "too much work" has to be done for each step. " *Is this what is happening to me? the problem is that I don't get this error every single time, so I don't even know how to run it with full_output = 1 to get the info... I really don't know what to do. Any help will be very very very very appreciated! thank you very * * -- Laura -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Thu Jul 21 21:08:45 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 21 Jul 2011 19:08:45 -0600 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 6:22 PM, Laura Matrajt wrote: > Hi all, > I am working with a system of 16 differential equations that simulates an > epidemic in a city. Because there are many cities interacting with each > other, I need to run my ode's for a single day, stop them, modify the > initial conditions and run them again. Because the ode is running only for > a day, I defined my tspan to have only two points, like this: > > tspan = tspan = linspace(day, day+1, 2) > > I wrote my equations in Python and I am using scipy.odeint to solve it. > > Here is my code: > def advanceODEoneDay(self,day): > > #rename variables for convenience > N0, N1 = self.children, self.adults > S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = > self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 > > > > #create a vector of times for integration: > tspan = linspace(day, day+1, 2) > > > #set initial conditions. 
To do this, I need to look in the array in > the day > initCond = [S0[day,0], S0[day,1], > S1[day,0], S1[day,1], > A0[day,0], A0[day,1], A1[day,0], A1[day,1], > I0[day,0], I0[day,1], I1[day,0], I1[day,1], > RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] > > > #run the ode: > sir_sol = odeint(sir2groups,initCond, tspan, > args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) > > > Most of the time, it works just fine. However, there are some times where > the following message appears: > > Excess work done on this call (perhaps wrong Dfun type). > Run with full_output = 1 to get quantitative information. > lsoda-- at current t (=r1), mxstep (=i1) steps > taken on this call before reaching tout > In above message, I1 = 500 > In above message, R1 = 0.1170754095027E+03 > > I should mention that it is NOT only a warning. This is repeated over and > over (thousands of times) and then it will break the rest of my code > Ok, after searching in this mailing list, someone else posted a similar > warning message and it was suggested to him that " > > *In your case, you might simply be computing for to coarse a mesh in t,** so "too much work" has to be done for each step. " > > > *Is this what is happening to me? the problem is that I don't get this error every single time, so I don't even know how to run it with full_output = 1 to get the info... > > > I really don't know what to do. Any help will be very very very very appreciated! > thank you very * > * > > The function odeint has the keyword argument 'mxstep' that determines the maximum number of internal steps allowed between requested time values. The default is 500. It is not unusual to have to increase this, especially in a case like yours where you simply want the value at the end of a long time interval. Try increasing it to, say, mxstep=5000. Warren * > * > > > > > -- > Laura > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Fri Jul 22 01:11:43 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Thu, 21 Jul 2011 22:11:43 -0700 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: Hi Warren, thanks for your fast reply. As I said in my email, 90% of the time this runs perfectly well. Is 'mxstep' a number that will be computed at each iteration or is it just there in case of more steps are needed? I am worried that by increasing it I will make my code slower... Also, would increasing my tspan solve the issue? If the answer is yes, which of the two solutions would be better? Also, I am not passing the Jacobian to my function. Do you think that it will make a difference, both in terms of speed and in terms of this error to pass the Jacobian? THank you very very much! I was really worried about this! On Thu, Jul 21, 2011 at 6:08 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Thu, Jul 21, 2011 at 6:22 PM, Laura Matrajt wrote: > >> Hi all, >> I am working with a system of 16 differential equations that simulates an >> epidemic in a city. Because there are many cities interacting with each >> other, I need to run my ode's for a single day, stop them, modify the >> initial conditions and run them again. 
Because the ode is running only for >> a day, I defined my tspan to have only two points, like this: >> >> tspan = tspan = linspace(day, day+1, 2) >> >> I wrote my equations in Python and I am using scipy.odeint to solve it. >> >> Here is my code: >> def advanceODEoneDay(self,day): >> >> #rename variables for convenience >> N0, N1 = self.children, self.adults >> S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = >> self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 >> >> >> >> #create a vector of times for integration: >> tspan = linspace(day, day+1, 2) >> >> >> #set initial conditions. To do this, I need to look in the array >> in the day >> initCond = [S0[day,0], S0[day,1], >> S1[day,0], S1[day,1], >> A0[day,0], A0[day,1], A1[day,0], A1[day,1], >> I0[day,0], I0[day,1], I1[day,0], I1[day,1], >> RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] >> >> >> #run the ode: >> sir_sol = odeint(sir2groups,initCond, tspan, >> args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) >> >> >> Most of the time, it works just fine. However, there are some times where >> the following message appears: >> >> Excess work done on this call (perhaps wrong Dfun type). >> Run with full_output = 1 to get quantitative information. >> lsoda-- at current t (=r1), mxstep (=i1) steps >> taken on this call before reaching tout >> In above message, I1 = 500 >> In above message, R1 = 0.1170754095027E+03 >> >> I should mention that it is NOT only a warning. This is repeated over and >> over (thousands of times) and then it will break the rest of my code >> Ok, after searching in this mailing list, someone else posted a similar >> warning message and it was suggested to him that " >> >> *In your case, you might simply be computing for to coarse a mesh in t,** so "too much work" has to be done for each step. " >> >> >> *Is this what is happening to me? the problem is that I don't get this error every single time, so I don't even know how to run it with full_output = 1 to get the info... >> >> >> >> I really don't know what to do. Any help will be very very very very appreciated! >> thank you very * >> * >> >> > The function odeint has the keyword argument 'mxstep' that determines the > maximum number of internal steps allowed between requested time values. The > default is 500. It is not unusual to have to increase this, especially in a > case like yours where you simply want the value at the end of a long time > interval. Try increasing it to, say, mxstep=5000. > > Warren > > * >> * >> >> >> >> >> -- >> Laura >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Laura -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Fri Jul 22 01:39:39 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 22 Jul 2011 01:39:39 -0400 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: Hi Laura, odeint uses an adaptive method (I believe an "embedded 4-5 Runge-Kutta") which evaluates a small number of points and estimates the error produced by using those points. 
If the error is small - and odeint has parameters that let you set how big "small" is - it'll return this estimate, otherwise it'll subdivide the region and repeat the process on each subdivision. To avoid infinite loops, this bails out if too many steps are taken, with a warning. That's what you're seeing; increasing mxstep will make it go further before giving up. It won't make a difference in cases where all those steps aren't necessary. It's worth thinking about why the integrator is taking all those steps. It generally happens because there's some sort of kink or complex behaviour in the solution there; this can be a genuine feature of the solution, or (more often in my experience) it can be because your RHS function has some discontinuity (perhaps due to a bug). So it's worth checking out those segments. I believe that with full_output you can check how many steps were needed and have your program bail out with a set of parameters that trigger the problem. Asking odeint for more intermediate points you don't actually care about will be a less-efficient way of increasing maxstep - with one exception. If you know there are problem points in your domain you can put them in your list to make sure the integrator notices them. Otherwise, just rely on it to figure out which parts of the integration need extra work. Passing the Jacobian in will definitely help reduce the number of steps needed, if you can compute it analytically. (Otherwise let the integrator build up a numerical Jacobian rather than trying to do it yourself.) Finally, I should say that apart from the stream of ugly messages, when you hit mxstep what you get is still an estimate of the solution, just not a very good one (that is, the integrator thinks there's too much error). If you're okay with crummy solutions in a few corner cases, you can just live with the warnings. Anne On 22 July 2011 01:11, Laura Matrajt wrote: > Hi Warren, > ?thanks for your fast reply. > As I said in my email, 90% of the time this runs perfectly well. > Is 'mxstep' a number that will be computed at each iteration or is it just > there in case of more steps are needed? I am worried that by increasing it I > will make my code slower... > Also, would increasing my tspan solve the issue? > If the answer is yes, which of the two solutions would be better? > Also, I am not passing the Jacobian to my function. Do you think that it > will make a difference, both in terms of speed and in terms of this error to > pass the Jacobian? > > THank you very very much! I was really worried about this! > > > On Thu, Jul 21, 2011 at 6:08 PM, Warren Weckesser > wrote: >> >> >> On Thu, Jul 21, 2011 at 6:22 PM, Laura Matrajt wrote: >>> >>> Hi all, >>> ?I am working with a system of 16 differential equations that simulates >>> an epidemic in a city.? Because there are many cities interacting with each >>> other, I need to run my ode's for a single day, stop them, modify the >>> initial conditions and run them again.? Because the ode is running only for >>> a day, I defined my tspan to have only two points, like this: >>> >>> tspan = tspan = linspace(day, day+1, 2) >>> >>> I wrote my equations in Python and I am using scipy.odeint to solve it. >>> >>> Here is my code: >>> def advanceODEoneDay(self,day): >>> >>> ??????? #rename variables for convenience >>> ??????? N0, N1 = self.children, self.adults >>> ??????? S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = >>> self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 >>> >>> >>> >>> ??????? 
#create a vector of times for integration: >>> ??????? tspan = linspace(day, day+1, 2) >>> >>> >>> ??????? #set initial conditions. To do this, I need to look in the array >>> in the day >>> ??????? initCond = [S0[day,0], S0[day,1], >>> ??????????????????? S1[day,0], S1[day,1], >>> ??????????????????? A0[day,0], A0[day,1], A1[day,0], A1[day,1], >>> ??????????????????? I0[day,0], I0[day,1], I1[day,0], I1[day,1], >>> ??????????????????? RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] >>> >>> >>> ??????? #run the ode: >>> ??????? sir_sol = odeint(sir2groups,initCond, tspan, >>> args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) >>> >>> >>> Most of the time, it works just fine. However, there are some times where >>> the following message appears: >>> >>> Excess work done on this call (perhaps wrong Dfun type). >>> Run with full_output = 1 to get quantitative information. >>> ?lsoda--? at current t (=r1), mxstep (=i1) steps >>> ?????? taken on this call before reaching tout >>> ????? In above message,? I1 =?????? 500 >>> ????? In above message,? R1 =? 0.1170754095027E+03 >>> >>> I should mention that it is NOT only a warning. This is repeated over and >>> over (thousands of times) and then it will break the rest of my code >>> Ok, after searching in this mailing list, someone else posted a similar >>> warning message and it was suggested to him that " >>> >>> In your case, you might simply be computing for to coarse a mesh in t, >>> so "too much work" has to be done for each step. " >>> >>> >>> Is this what is happening to me? the problem is that I don't get this >>> error every single time, so I don't even know how to run it with full_output >>> = 1 to get the info... >>> >>> >>> >>> >>> I really don't know what to do. Any help will be very very very very >>> appreciated! >>> thank you very >> >> The function odeint has the keyword argument 'mxstep' that determines the >> maximum number of internal steps allowed between requested time values.? The >> default is 500.? It is not unusual to have to increase this, especially in a >> case like yours where you simply want the value at the end of a long time >> interval.? Try increasing it to, say, mxstep=5000. >> >> Warren >> >>> >>> >>> >>> >>> >>> >>> -- >>> Laura >>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > Laura > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From flyzzx at gmail.com Fri Jul 22 14:24:58 2011 From: flyzzx at gmail.com (Chong Yang) Date: Sat, 23 Jul 2011 02:24:58 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Thanks for the helpful information. Now I can see that there are many ways to curve fitting without necessarily using spline. -------------- next part -------------- An HTML attachment was scrubbed... URL: From flyzzx at gmail.com Fri Jul 22 14:37:22 2011 From: flyzzx at gmail.com (Chong Yang) Date: Sat, 23 Jul 2011 02:37:22 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: For my purpose, faithful reconstruction of the road segment is not required. 
I only need to approximate the curve for further segmentation and trace the outer boundaries, so that the geometries can be stored for further use. Also, discontinuities between adjacent segments is not a primary concern. The fact is this is part of a bigger project and this step only serves to facilitate data extraction. @Zach: The principal curve demo looks awesome indeed! I tried to look for a similar Python implementation of the curve fitting algorithm, but could not find anything other than PCA toolkit. Do you happen to know any existing library? Besides that I applied LLE using MDP, and it does provide a good parametric representation of the curve (though I have to reduce the sample size, otherwise it runs forever) -- unfortunately the subsequent splprep does not work as good as in the non-parametric case. @eat: I am looking at the possibility LLE + segmented regression to obtain the parameters, divide into intervals and then run linear regressions on each. Thanks for the suggestion. On Fri, Jul 22, 2011 at 3:58 AM, eat wrote: > Hi, > > On Thu, Jul 21, 2011 at 5:51 AM, Chong Yang wrote: > >> Hi, >> >> Recently I have been working with 1D curve fitting on dense GPS point >> cloud representing road segments. For all I know, some roads may be highly >> curved and their corresponding x value are not necessary monotonic order. >> >> I tried splrep. It works very well on simple arcs but fails to give >> meaningful result on highly curved data - either because the polynomial >> order 'k' is not sufficient to fit the data well enough or the curve is >> vertically aligned. For now I get around vertical curves by flipping the >> axes and doing splrep with (y, x), but maybe a better approach is possible. >> >> My guess is splprep. However I am not sure how to properly convert the >> noisy data points into their parametric form. Does anyone have an idea on >> this or faced a similar problem before? Thanks. >> > Fundamentally roads are designed based on line segments and circular arc > segments, possible joined by clothoids (with some contiguous criteria). Now, > when the roads are actually constructed they will only follow approximately > the originally intended geometry. > > Now, without really knowing your ultimate goals, I'll simply suggest you to > try to fit consecutive segments based for example on RANSAC ( > http://en.wikipedia.org/wiki/RANSAC). > > So perhaps your first estimate would be based only on line segments fitted > by RANSAC (and fine tuned later if needed). Please note that for the fitting > (of 2D line segments) you need to utilize orthogonal distance regression. > > (Now with this knowledge you could indeed project the line segments to 1D > and fine tune it more there. However splines wont help you, because > possible discontinuities, like connection of line and arc segments. Here you > may like to study more on based on > http://en.wikipedia.org/wiki/Segmented_regression), > > Anyway, please feel free to elaborate more on your particular case. > > -eat > >> >> -- >> Regards, >> CY >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Regards, Goh Chong Yang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ciampagg at usi.ch Fri Jul 22 17:09:18 2011 From: ciampagg at usi.ch (Giovanni Luca Ciampaglia) Date: Fri, 22 Jul 2011 14:09:18 -0700 Subject: [SciPy-User] Gaussian mixture models with censored data Message-ID: <4E29E6FE.709@usi.ch> Hello everybody, Before I delve into implementing it myself, is there any Python implementation of the EM algorithm used for fitting Gaussian mixtures that also handles (right-)censored observations? In particular I need to fit univariate data, so something like this: http://dx.doi.org/10.1080/00949659208811452 Best, -- Giovanni Luca Ciampaglia Ph.D. Candidate Faculty of Informatics University of Lugano Web: http://www.inf.usi.ch/phd/ciampaglia/ Bertastra?e 36 ? 8003 Z?rich ? Switzerland From tritemio at gmail.com Fri Jul 22 17:31:11 2011 From: tritemio at gmail.com (Antonio Ingargiola) Date: Fri, 22 Jul 2011 14:31:11 -0700 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: Hi to the list, I got an error using the pdf method on the binom distribution. Same error happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and pythonxy on windows). The error is the following: In [1]: from scipy.stats.distributions import binom In [2]: b = binom(20,0.8) In [3]: b.rvs() Out[3]: 17 In [4]: b.pdf(2) --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) C:\Users\anto\.xy\startups\ in () C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, x) 333 334 def pdf(self, x): #raises AttributeError in frozen discrete distribution --> 335 return self.dist.pdf(x, *self.args, **self.kwds) 336 337 def cdf(self, x): AttributeError: 'binom_gen' object has no attribute 'pdf' In [5]: Is this known problem? How can I get the binomial pdf, is there a workaround? Many thanks, Antonio -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Jul 22 17:38:44 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 22 Jul 2011 15:38:44 -0600 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: > Hi to the list, > > I got an error using the pdf method on the binom distribution. Same error > happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and > pythonxy on windows). > > The error is the following: > > In [1]: from scipy.stats.distributions import binom > > In [2]: b = binom(20,0.8) > > In [3]: b.rvs() > Out[3]: 17 > > In [4]: b.pdf(2) > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > > C:\Users\anto\.xy\startups\ in () > > C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, x) > 333 > 334 def pdf(self, x): #raises AttributeError in frozen discrete > distribution > --> 335 return self.dist.pdf(x, *self.args, **self.kwds) > 336 > 337 def cdf(self, x): > > AttributeError: 'binom_gen' object has no attribute 'pdf' > > In [5]: > > > Is this known problem? How can I get the binomial pdf, is there a > workaround? > Since binom is a discrete distribution, you want the pmf method: In [32]: b = binom(20, 0.8) In [33]: b.pmf(2) Out[33]: 3.1876710400000011e-11 In [34]: b.pmf(18) Out[34]: 0.13690942867206304 The behavior that you observed still looks like a bug to me--why does binom even have a pdf method, if calling it just raises a cryptic exception? 
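As a quick sketch of the workaround, the pmf value can also be cross-checked against the explicit binomial formula (a small factorial-based helper is used here so the example does not depend on where a comb function lives in any particular SciPy version):

from math import factorial
from scipy.stats import binom

def n_choose_k(n, k):
    # binomial coefficient via factorials, to keep the example dependency-free
    return factorial(n) // (factorial(k) * factorial(n - k))

n, p, k = 20, 0.8, 18
print(binom.pmf(k, n, p))                              # roughly 0.1369094
print(n_choose_k(n, k) * p ** k * (1 - p) ** (n - k))  # same value from the formula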
Warren > > Many thanks, > Antonio > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Fri Jul 22 17:49:05 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Fri, 22 Jul 2011 21:49:05 +0000 (UTC) Subject: [SciPy-User] Help!!!!!! having problems with ODEINT References: Message-ID: Anne Archibald physics.mcgill.ca> writes: > > Hi Laura, > > odeint uses an adaptive method (I believe an "embedded 4-5 > Runge-Kutta") which evaluates a small number of points and estimates > the error produced by using those points. If the error > is small - and odeint has parameters that let you set how big "small" > is - it'll return this estimate, otherwise it'll subdivide the region > and repeat the process on each subdivision. To avoid infinite loops, > this bails out if too many steps are taken, with a warning. That's > what you're seeing; increasing mxstep will make it go further before > giving up. It won't make a difference in cases where all those steps > aren't necessary. > > It's worth thinking about why the integrator is taking all those > steps. It generally happens because there's some sort of kink or > complex behaviour in the solution there; this can be a genuine feature > of the solution, or (more often in my experience) it can be because > your RHS function has some discontinuity (perhaps due to a bug). So > it's worth checking out those segments. I believe that with > full_output you can check how many steps were needed and have your > program bail out with a set of parameters that trigger the problem. > > Asking odeint for more intermediate points you don't actually care > about will be a less-efficient way of increasing maxstep - with one > exception. If you know there are problem points in your domain you can > put them in your list to make sure the integrator notices them. > Otherwise, just rely on it to figure out which parts of the > integration need extra work. > > Passing the Jacobian in will definitely help reduce the number of > steps needed, if you can compute it analytically. (Otherwise let the > integrator build up a numerical Jacobian rather than trying to do it > yourself.) > > Finally, I should say that apart from the stream of ugly messages, > when you hit mxstep what you get is still an estimate of the solution, > just not a very good one (that is, the integrator thinks there's too > much error). If you're okay with crummy solutions in a few corner > cases, you can just live with the warnings. > > Anne > Hi Anne, thank you very very much for your response.I appreciate your thorough response. I have one last question that is driving me crazy: why is this crashing my code eventhough it is just a warning? Also, regarding the Jacobian, I searched the documentation online and have one little question: suppose my system of odes looks like dy/dt = f(y,t) the Jacobian is only with respect to y? I can compute it analytically (I believe) but I am unsure about the correct syntax to pass it. Do you by any chance about a worked example? I am really sorry I am probably doing dumb questions. I am pretty overwhelmed by this right now and i searched in the internet for examples and I couldn't find one with the Jacobian on it. 
THanks again, From kgdunn at gmail.com Fri Jul 22 19:59:21 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Fri, 22 Jul 2011 19:59:21 -0400 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 17:49, Laura Matrajt wrote: > > ?Anne Archibald physics.mcgill.ca> writes: > >> >> Hi Laura, >> >> odeint uses an adaptive method (I believe an "embedded 4-5 >> Runge-Kutta") which evaluates a small number of points and estimates >> the error produced by using those points. If the error >> is small - and odeint has parameters that let you set how big "small" >> is - it'll return this estimate, otherwise it'll subdivide the region >> and repeat the process on each subdivision. To avoid infinite loops, >> this bails out if too many steps are taken, with a warning. That's >> what you're seeing; increasing mxstep will make it go further before >> giving up. It won't make a difference in cases where all those steps >> aren't necessary. >> >> It's worth thinking about why the integrator is taking all those >> steps. It generally happens because there's some sort of kink or >> complex behaviour in the solution there; this can be a genuine feature >> of the solution, or (more often in my experience) it can be because >> your RHS function has some discontinuity (perhaps due to a bug). So >> it's worth checking out those segments. I believe that with >> full_output you can check how many steps were needed and have your >> program bail out with a set of parameters that trigger the problem. >> >> Asking odeint for more intermediate points you don't actually care >> about will be a less-efficient way of increasing maxstep - with one >> exception. If you know there are problem points in your domain you can >> put them in your list to make sure the integrator notices them. >> Otherwise, just rely on it to figure out which parts of the >> integration need extra work. >> >> Passing the Jacobian in will definitely help reduce the number of >> steps needed, if you can compute it analytically. (Otherwise let the >> integrator build up a numerical Jacobian rather than trying to do it >> yourself.) >> >> Finally, I should say that apart from the stream of ugly messages, >> when you hit mxstep what you get is still an estimate of the solution, >> just not a very good one (that is, the integrator thinks there's too >> much error). If you're okay with crummy solutions in a few corner >> cases, you can just live with the warnings. >> >> Anne >> > > > > Hi Anne, > ?thank you very very much for your response.I appreciate your thorough response. > I have one last question that is driving me crazy: why is this crashing my code > eventhough it is just a warning? > Also, regarding the Jacobian, I searched the documentation online and have one > little question: > suppose my system of odes looks like > dy/dt = f(y,t) > the Jacobian is only with respect to y? > I can compute it analytically (I believe) but I am unsure about the correct > syntax to pass it. Do you by any chance about a worked example? > I am really sorry I am probably doing dumb questions. I am pretty overwhelmed by > this right now and i searched in the internet for examples and I couldn't find > one with the Jacobian on it. > THanks again, Hi Laura, There's an example in the SciPy documentation on using a Jacobian: http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html This is for the ``ode`` function, not the ``odeint`` function though. 
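For odeint itself, a minimal sketch of passing an analytic Jacobian would look roughly like this (a toy two-equation system with made-up parameters, not the poster's 16-equation model). The Jacobian is taken with respect to y only, with t and the extra arguments held fixed; note also that the ``ode`` class uses the reversed callback signature f(t, y) rather than odeint's f(y, t):

import numpy as np
from scipy.integrate import odeint

def rhs(y, t, beta, gamma):
    # toy SIR-like right-hand side; stands in for the real epidemic model
    s, i = y
    return [-beta * s * i, beta * s * i - gamma * i]

def jac(y, t, beta, gamma):
    # entry [m][n] is d(rhs[m]) / d(y[n]); derivatives w.r.t. y only,
    # same extra arguments as rhs
    s, i = y
    return [[-beta * i, -beta * s],
            [ beta * i,  beta * s - gamma]]

y0 = [0.99, 0.01]
tspan = np.linspace(0.0, 1.0, 2)
sol = odeint(rhs, y0, tspan, args=(0.5, 0.1), Dfun=jac, mxstep=5000)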
I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian over here: http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes Hope that helps, Kevin From tritemio at gmail.com Fri Jul 22 20:30:10 2011 From: tritemio at gmail.com (Antonio Ingargiola) Date: Fri, 22 Jul 2011 17:30:10 -0700 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: 2011/7/22 Warren Weckesser > > > On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: > >> Hi to the list, >> >> I got an error using the pdf method on the binom distribution. Same error >> happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and >> pythonxy on windows). >> >> The error is the following: >> >> In [1]: from scipy.stats.distributions import binom >> >> In [2]: b = binom(20,0.8) >> >> In [3]: b.rvs() >> Out[3]: 17 >> >> In [4]: b.pdf(2) >> >> --------------------------------------------------------------------------- >> AttributeError Traceback (most recent call >> last) >> >> C:\Users\anto\.xy\startups\ in () >> >> C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, >> x) >> 333 >> 334 def pdf(self, x): #raises AttributeError in frozen discrete >> distribution >> --> 335 return self.dist.pdf(x, *self.args, **self.kwds) >> 336 >> 337 def cdf(self, x): >> >> AttributeError: 'binom_gen' object has no attribute 'pdf' >> >> In [5]: >> >> >> Is this known problem? How can I get the binomial pdf, is there a >> workaround? >> > > > Since binom is a discrete distribution, you want the pmf method: > > In [32]: b = binom(20, 0.8) > > In [33]: b.pmf(2) > Out[33]: 3.1876710400000011e-11 > > In [34]: b.pmf(18) > Out[34]: 0.13690942867206304 > > > The behavior that you observed still looks like a bug to me--why does binom > even have a pdf method, if calling it just raises a cryptic exception? > Warren, Thanks for the clarification. And BTW yes I think that this behaviour is quite boguous. Should I fill a bug report? Antonio -------------- next part -------------- An HTML attachment was scrubbed... URL: From kpr0001 at comcast.net Fri Jul 22 11:59:17 2011 From: kpr0001 at comcast.net (Karen) Date: Fri, 22 Jul 2011 15:59:17 +0000 (UTC) Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython References: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Message-ID: Ralf Gommers googlemail.com> writes: > Numpy/Scipy don't work on .NET, unless forks are available from Enthought. But I haven't seen an announcement on that. What instructions are you referring to? Ralf > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Hmm, that would explain it - it looks like it installs, adds two things that I thought were libraries to the import auto-complete menu, but can't seem to make arrays of integers. Instructions were here http://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net http://blogs.msdn.com/b/volkerw/archive/2011/03/10/python-tools-for-visual- studio- beta.aspx Thanks. 
From warren.weckesser at enthought.com Fri Jul 22 21:42:00 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 22 Jul 2011 19:42:00 -0600 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 6:30 PM, Antonio Ingargiola wrote: > 2011/7/22 Warren Weckesser > >> >> >> On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: >> >>> Hi to the list, >>> >>> I got an error using the pdf method on the binom distribution. Same error >>> happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and >>> pythonxy on windows). >>> >>> The error is the following: >>> >>> In [1]: from scipy.stats.distributions import binom >>> >>> In [2]: b = binom(20,0.8) >>> >>> In [3]: b.rvs() >>> Out[3]: 17 >>> >>> In [4]: b.pdf(2) >>> >>> --------------------------------------------------------------------------- >>> AttributeError Traceback (most recent call >>> last) >>> >>> C:\Users\anto\.xy\startups\ in () >>> >>> C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, >>> x) >>> 333 >>> 334 def pdf(self, x): #raises AttributeError in frozen >>> discrete distribution >>> --> 335 return self.dist.pdf(x, *self.args, **self.kwds) >>> 336 >>> 337 def cdf(self, x): >>> >>> AttributeError: 'binom_gen' object has no attribute 'pdf' >>> >>> In [5]: >>> >>> >>> Is this known problem? How can I get the binomial pdf, is there a >>> workaround? >>> >> >> >> Since binom is a discrete distribution, you want the pmf method: >> >> In [32]: b = binom(20, 0.8) >> >> In [33]: b.pmf(2) >> Out[33]: 3.1876710400000011e-11 >> >> In [34]: b.pmf(18) >> Out[34]: 0.13690942867206304 >> >> >> The behavior that you observed still looks like a bug to me--why does >> binom even have a pdf method, if calling it just raises a cryptic exception? >> > > Warren, Thanks for the clarification. And BTW yes I think that this > behaviour is quite boguous. Should I fill a bug report? > Good idea--please do. Warren > > Antonio > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjstickel at vcn.com Fri Jul 22 21:49:46 2011 From: jjstickel at vcn.com (Jonathan Stickel) Date: Fri, 22 Jul 2011 19:49:46 -0600 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: <4E2A28BA.70409@vcn.com> On 07/22/2011 04:29 PM, scipy-user-request at scipy.org wrote: > On Thu, Jul 21, 2011 at 5:51 AM, Chong Yang wrote: >> >> >> Hi, >> >> >> >> Recently I have been working with 1D curve fitting on dense GPS point >> >> cloud representing road segments. For all I know, some roads may be highly >> >> curved and their corresponding x value are not necessary monotonic order. >> >> >> >> I tried splrep. It works very well on simple arcs but fails to give >> >> meaningful result on highly curved data - either because the polynomial >> >> order 'k' is not sufficient to fit the data well enough or the curve is >> >> vertically aligned. For now I get around vertical curves by flipping the >> >> axes and doing splrep with (y, x), but maybe a better approach is possible. >> >> >> >> My guess is splprep. However I am not sure how to properly convert the >> >> noisy data points into their parametric form. Does anyone have an idea on >> >> this or faced a similar problem before? Thanks. 
>> >> I know I am replying to this thread a bit late, but you might also be interested in smoothing by regularization: http://packages.python.org/scikits.datasmooth/regularsmooth.html Scattered data are OK. However, curves that are vertical or bend back over themselves will present problems, as you have found with splines. HTH, Jonathan From ralf.gommers at googlemail.com Sun Jul 24 12:58:39 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 24 Jul 2011 18:58:39 +0200 Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython In-Reply-To: References: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Message-ID: On Fri, Jul 22, 2011 at 5:59 PM, Karen wrote: > > > Ralf Gommers googlemail.com> writes: > > > Numpy/Scipy don't work on .NET, unless forks are available from > Enthought. But > I > haven't seen an announcement on that. What instructions are you referring > to? > Ralf > > > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > Hmm, that would explain it - it looks like it installs, adds two things > that I > thought were libraries to the import auto-complete menu, but can't seem to > make > arrays of integers. > > Instructions were here > > http://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net > > http://blogs.msdn.com/b/volkerw/archive/2011/03/10/python-tools-for-visual- > studio- > beta.aspx > > So it is available. There's a support link at http://pytools.codeplex.com/, that looks like the appropriate place to ask. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Mon Jul 25 13:09:10 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Mon, 25 Jul 2011 17:09:10 +0000 (UTC) Subject: [SciPy-User] Help!!!!!! having problems with ODEINT References: Message-ID: Kevin Dunn gmail.com> writes: > Hi Laura, > > There's an example in the SciPy documentation on using a Jacobian: > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html > > This is for the ``ode`` function, not the ``odeint`` function though. > > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian > over here: > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes > > Hope that helps, > Kevin > Hi Kevin, the example was good. I now have implemented my Jacobian. Thank you very much!!!!! From collinstocks at gmail.com Mon Jul 25 15:04:05 2011 From: collinstocks at gmail.com (collinstocks at gmail.com) Date: Mon, 25 Jul 2011 15:04:05 -0400 Subject: [SciPy-User] f2py Message-ID: Lists (scipy-dev and scipy-user), If anybody here is at least somewhat familiar with f2py, could you please review and comment on that section of my pull request: https://github.com/ scipy/scipy/pull/44 Specifically, please take a look at https://github.com/scipy/scipy/pull/44/files#diff-1 Thanks in advance. Collin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johann.cohentanugi at gmail.com Mon Jul 25 15:57:03 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Mon, 25 Jul 2011 21:57:03 +0200 Subject: [SciPy-User] Gaussian mixture models with censored data In-Reply-To: <4E29E6FE.709@usi.ch> References: <4E29E6FE.709@usi.ch> Message-ID: <4E2DCA8F.7090206@gmail.com> just googling for it, I found http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial Is that what you want? It is based on GSL rather than scipy, but requires numpy anyway. A numpy/scipy implementation of EM, I believe, can be found at http://www.cta-observatory.org/indico/conferenceDisplay.py?confId=40 Not sure that it has censoring, but David Cournapeau is reading this mailing list AFAIK There is also a google code em-python, etc..... I am sure you will find something useful, even if you need to code a dedicated higher level part of these for censored data. good luck Johann On 07/22/2011 11:09 PM, Giovanni Luca Ciampaglia wrote: > Hello everybody, > Before I delve into implementing it myself, is there any Python > implementation of the EM algorithm used for fitting Gaussian mixtures > that also handles (right-)censored observations? > > In particular I need to fit univariate data, so something like this: > http://dx.doi.org/10.1080/00949659208811452 > > Best, > From matrajt at gmail.com Mon Jul 25 17:53:57 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Mon, 25 Jul 2011 21:53:57 +0000 (UTC) Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT References: Message-ID: Laura Matrajt gmail.com> writes: > > Kevin Dunn gmail.com> writes: > > > Hi Laura, > > > > There's an example in the SciPy documentation on using a Jacobian: > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html > > > > This is for the ``ode`` function, not the ``odeint`` function though. > > > > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian > > over here: > > > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes > > > > Hope that helps, > > Kevin > > > > Hi Kevin, > the example was good. I now have implemented my Jacobian. Thank you very much!!!!! > Hi all: sorry to bother you again! I implemented the Jacobian as Anne kindly suggested to me with the help of Kevin's pointers to the correct webpage AND I increased the maximum number of steps as Warren kindly said. I am now getting a new message: lsoda-- warning..internal t (=r1) and h (=r2) are such that in the machine, t + h = t on the next step (h = step size). solver will continue anyway In above, R1 = 0.1209062893646E+03 R2 = 0.9059171791494E-18 It is just a warning, and my code continues to run. But I am really worried about this being a bug. The problem is that I am coupling a system of ODE's with a stochastic process. Mainly, I simulate a day of an epidemic, stop the integrator, change some of the initial conditions stochastically (not just randomly, I do follow some rules) and I run the ODE again and so on. I have run this millions of times (and I am not exagerating about the millions) and it doesn't produce any warnings, but every now and then (~15 times) it does it. I don't know if this is a bug or just that not all of my domain will be good for the ODE's... If anyone has any suggestion of how to think about this problem, I will really appreciate it!!!!!! thanks to all the people that replied to me previously, you helped me sooo much already! 
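For readers following this thread: the integrate/perturb/restart loop Laura describes can be sketched as below. This is only a toy two-state system with made-up parameters and a placeholder stochastic update, not her actual epidemic model, but it shows where the analytic Jacobian (Dfun) and the larger step limit (mxstep) plug into odeint.

import numpy as np
from scipy.integrate import odeint

# Toy two-state stand-in for the epidemic model (illustrative only).
def rhs(y, t, beta, gamma):
    s, i = y
    return [-beta * s * i, beta * s * i - gamma * i]

def jac(y, t, beta, gamma):
    # d(rhs_i)/d(y_j), matching odeint's Dfun convention
    s, i = y
    return [[-beta * i, -beta * s],
            [ beta * i,  beta * s - gamma]]

beta, gamma = 0.5, 0.1
y = np.array([0.99, 0.01])
for day in range(365):
    t = np.linspace(day, day + 1, 25)
    traj = odeint(rhs, y, t, args=(beta, gamma), Dfun=jac, mxstep=5000)
    y = traj[-1]
    # placeholder for the stochastic daily update of the state
    y = np.clip(y + 0.01 * np.random.randn(2), 0.0, None)

Logging the state (and the random seed) whenever the lsoda warning fires makes it easy to re-run just those one-day segments with tighter tolerances or a different integrator, as suggested elsewhere in this thread.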
From jason.heeris at gmail.com Tue Jul 26 06:31:56 2011
From: jason.heeris at gmail.com (Jason Heeris)
Date: Tue, 26 Jul 2011 18:31:56 +0800
Subject: [SciPy-User] Questions about scipy.optimize.anneal
Message-ID: 

Hi,

I have a few questions about scipy.optimize.anneal. I'm using 0.9.0 under Debian Squeeze with Python 2.6.6, but the documentation for anneal is the same as for 0.10.0.

Firstly, I'm confused about the return values - the docs say that the return values are:

    xmin (ndarray), retval (int), Jmin (float), T (float), feval (int), iters (int), accept (int)

...however, running the attached script gives

    (array([-10.16682436,   6.46000538]), 12.045579564382059, 0.72724961763274643, 851, 16, 180, 5)
    [<type 'numpy.ndarray'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>]

Note that the types don't match and there's no value that could correspond to the documented options for retval! Am I reading things wrong, or are these inconsistent with the docs?

Secondly, since annealing is a random process, is there any way to control the source of random numbers it uses? Should I seed some global RNG, and if so, which one?

Thirdly, note that the script actually fails to find the optimal parameters, despite being given them as the starting point. I know that annealing is a far from reliable process, but this behaviour still seems a little strange given the simplicity of the function I'm using here. Are there additional constraints I should be supplying, or some other way to guide the annealing process?

Fourthly, in my actual application I have a function to optimise that is invariant to the overall scale of the inputs, i.e. f(a*array([x,y,z])) is the same for all non-zero values of "a". Is there a risk of anneal failing because of this degeneracy? Is there a way to structure the problem to avoid this? (Note that any component could legitimately be zero, so I can't just fix x = 1 ... I think.)

Cheers,
Jason

-------------- next part --------------
A non-text attachment was scrubbed...
Name: annealtest.py
Type: text/x-python
Size: 248 bytes
Desc: not available
URL: 

From keenanpepper at gmail.com Thu Jul 21 19:32:36 2011
From: keenanpepper at gmail.com (Keenan Pepper)
Date: Thu, 21 Jul 2011 16:32:36 -0700
Subject: [SciPy-User] Looking for a vectorized adaptive quadrature routine
Message-ID: 

I'm looking for a numerical quadrature routine with a combination of specific properties. My integrand function is much more efficient if called in batches (vec_func). That seems to eliminate "quad" and "fixed_quad". Also, my integrand is extremely non-uniform, so the quadrature routine must be adaptive. This eliminates "quadrature" and "romberg" (unless I'm really misunderstanding what "romberg" is), as well as the fixed-sample quadrature routines.

There seems to be nothing left! Does Scipy really lack a vectorized, adaptive quadrature routine? I'm sure there are other users who would benefit from such a feature.

Keenan Pepper

From offonoffoffonoff at gmail.com Mon Jul 25 14:56:39 2011
From: offonoffoffonoff at gmail.com (elliot)
Date: Mon, 25 Jul 2011 11:56:39 -0700 (PDT)
Subject: [SciPy-User] trusting splprep
Message-ID: <6b546480-a333-4171-af87-5fc6721cad7e@j9g2000prj.googlegroups.com>

backstory: I am working on an optical raytracing program written in Python. Currently, I am reworking the face that uses a spline approximation of a profile. Specifically, moving from second order to third order.
The evaluating of the spline is done in c code that I intend to write, and so in exploring what exactly the output of splprep means, and how to construct the curve in question from tck, I discovered that I need to know how to use splprep better. question: How can I be sure that my spline approximation is sufficiently accurate? In ray tracing, a slight inaccuracy in the derivative of the interpolation could have a large effect. My use of splprep is awkward, so i am looking for some advice. for instance, just trying a simple example: ##### >>>x = range(0,6) >>>y = [c**3 for c in x] >>>tck,uout = scipy.interpolate.splprep([x,y],s=.000005,k=3) >>>t = numpy.linspace(0,1,6) #evenly spaced parameters, should correspond to x >>> xy = scipy.interpolate.splev(t,tck) >>> print xy[0] #should be same as x array([ 0.10119217, 2.81103321, 3.959404 , 3.90686522, 3.85308141, 4.99999962] # !! x went backwards as parameter progressed: graph is not a function >>> inty = [c**3 for c in xy[0]] #differance between interpolated y's and cubes of interpolated x's >>> numpy.array(xy[1])-numpy.array(inty) array([ -6.36532821e-04, 2.50019069e+00, -1.25009170e+01, 1.53618385e+01, 4.31465996e+01, 2.89434607e-05]) #### Now, this is pretty large error considering that this is a cubic spline on a simple cubic function. This should be pretty accurate, if not exact, but an order of magnitude off just isn't right. changing the smoothness from .00005 to 5 doesn't make a difference. So, how do I use splprep correctly. And, does anyone have any pointers regarding ensuring a certain level of accuracy in the results, or at least raising an error if the spline is not accurate enough. thanks, Elliot From paul at bilokon.co.uk Tue Jul 26 08:30:53 2011 From: paul at bilokon.co.uk (Paul Bilokon) Date: Tue, 26 Jul 2011 13:30:53 +0100 Subject: [SciPy-User] Status of TimeSeries SciKit Message-ID: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> Hi, I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? Best wishes, Paul From kgdunn at gmail.com Tue Jul 26 09:18:24 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Tue, 26 Jul 2011 09:18:24 -0400 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: On Mon, Jul 25, 2011 at 17:53, Laura Matrajt wrote: > Laura Matrajt gmail.com> writes: > >> >> Kevin Dunn gmail.com> writes: >> >> > Hi Laura, >> > >> > There's an example in the SciPy documentation on using a Jacobian: >> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html >> > >> > This is for the ``ode`` function, not the ``odeint`` function though. >> > >> > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian >> > over here: >> > >> > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes >> > >> > Hope that helps, >> > Kevin >> > >> >> Hi Kevin, >> ?the example was good. I now have implemented my Jacobian. Thank you very > much!!!!! >> > > > Hi all: > sorry to bother you again! I implemented the Jacobian as Anne kindly suggested > to me with the help of Kevin's pointers to the correct webpage AND I increased > the maximum number of steps as Warren kindly said. > I am now getting a new message: > > lsoda-- ?warning..internal t (=r1) and h (=r2) are > ? ? ? such that in the machine, t + h = t on the next step > ? ? ? (h = step size). solver will continue anyway > ? ? ?In above, ?R1 = ?0.1209062893646E+03 ? 
R2 = ?0.9059171791494E-18 I would find the set(s) of initial conditions that give this warning, and repeat the integration with a different integrator and/or different tolerance settings on the integrator. Then compare the trajectories. If there is negligible difference, then can probably ignore the warning. That being said, I've not used the lsoda integrator before, so I have no idea how serious this warning might be. Which is why I recommend you try a different integrator also. Kevin > It is just a warning, and my code continues to run. But I am really worried > about this being a bug. The problem is that I am coupling a system of ODE's with > a stochastic process. Mainly, I simulate a day of an epidemic, stop the > integrator, change some of the initial conditions stochastically (not just > randomly, I do follow some rules) and I run the ODE again and so on. > ?I have run this millions of times (and I am not exagerating about the millions) > and it doesn't produce any warnings, but every now and then (~15 times) it does > it. I don't know if this is a bug or just that not all of my domain will be good > for the ODE's... > If anyone has any suggestion of how to think about this problem, I will really > appreciate it!!!!!! > thanks to all the people that replied to me previously, you helped me sooo much > already! From pgmdevlist at gmail.com Tue Jul 26 09:30:04 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Jul 2011 15:30:04 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> Message-ID: <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: > Hi, > > I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? Years is an overstatement... The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up-to-date. *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. That doesn't mean that you'd be on your own, questions will still be answered... From warren.weckesser at enthought.com Tue Jul 26 09:45:33 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 26 Jul 2011 08:45:33 -0500 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: Hi Laura, On Mon, Jul 25, 2011 at 4:53 PM, Laura Matrajt wrote: > Laura Matrajt gmail.com> writes: > >> >> Kevin Dunn gmail.com> writes: >> >> > Hi Laura, >> > >> > There's an example in the SciPy documentation on using a Jacobian: >> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html >> > >> > This is for the ``ode`` function, not the ``odeint`` function though. >> > >> > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian >> > over here: >> > >> > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes >> > >> > Hope that helps, >> > Kevin >> > >> >> Hi Kevin, >> ?the example was good. I now have implemented my Jacobian. Thank you very > much!!!!! >> > > > Hi all: > sorry to bother you again! 
I implemented the Jacobian as Anne kindly suggested > to me with the help of Kevin's pointers to the correct webpage AND I increased > the maximum number of steps as Warren kindly said. > I am now getting a new message: > > lsoda-- ?warning..internal t (=r1) and h (=r2) are > ? ? ? such that in the machine, t + h = t on the next step > ? ? ? (h = step size). solver will continue anyway > ? ? ?In above, ?R1 = ?0.1209062893646E+03 ? R2 = ?0.9059171791494E-18 Something about your system is causing the solver to reduce its internal step size down to about 1e-17 (and it can't go any smaller than that). Do you actually have a discontinuity in your equations? Is your system singularly perturbed, with shock-like or boundary layer solutions? As Anne said: "It's worth thinking about why the integrator is taking all those steps. It generally happens because there's some sort of kink or complex behaviour in the solution there; this can be a genuine feature of the solution, or (more often in my experience) it can be because your RHS function has some discontinuity (perhaps due to a bug). So it's worth checking out those segments." If you can isolate the initial conditions and parameters that lead to this warning, you could plot the solution that it generates to see what is going on. Warren > > It is just a warning, and my code continues to run. But I am really worried > about this being a bug. The problem is that I am coupling a system of ODE's with > a stochastic process. Mainly, I simulate a day of an epidemic, stop the > integrator, change some of the initial conditions stochastically (not just > randomly, I do follow some rules) and I run the ODE again and so on. > ?I have run this millions of times (and I am not exagerating about the millions) > and it doesn't produce any warnings, but every now and then (~15 times) it does > it. I don't know if this is a bug or just that not all of my domain will be good > for the ODE's... > If anyone has any suggestion of how to think about this problem, I will really > appreciate it!!!!!! > thanks to all the people that replied to me previously, you helped me sooo much > already! > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From wesmckinn at gmail.com Tue Jul 26 10:25:15 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 26 Jul 2011 10:25:15 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> Message-ID: On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: > > On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: > >> Hi, >> >> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? > > Years is an overstatement... > The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and ?unfortunately don't ?currently have the time to keep it up-to-date. > *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. > That doesn't mean that you'd be on your own, questions will still be answered... 
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > hi Paul, Skipper and I (statsmodels) relatively recently discussed moving scikits.timeseries to GitHub and maintaining it there since we work on models for time series analysis. I work very actively on time series-related functionality in pandas so it might not even be unthinkable to merge together the projects (scikits.timeseries and pandas) and integrate all the numpy.datetime64 stuff once the dust settles there. Just thinking out loud. - Wes From pgmdevlist at gmail.com Tue Jul 26 10:35:32 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Jul 2011 16:35:32 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> Message-ID: <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote: > On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: >> >> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: >> >>> Hi, >>> >>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? >> >> Years is an overstatement... >> The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up-to-date. >> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. >> That doesn't mean that you'd be on your own, questions will still be answered... >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > hi Paul, > > Skipper and I (statsmodels) relatively recently discussed moving > scikits.timeseries to GitHub and maintaining it there since we work on > models for time series analysis. Er? https://github.com/pierregm/scikits.timeseries/ https://github.com/pierregm/scikits.timeseries-sandbox/ the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray). > I work very actively on time > series-related functionality in pandas so it might not even be > unthinkable to merge together the projects (scikits.timeseries and > pandas) and integrate all the numpy.datetime64 stuff once the dust > settles there. Just thinking out loud. That's an idea. 
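As a small aside, the numpy.datetime64 behaviour being referred to (an integer step meaning "one unit at the date's own resolution") looks like this; the snippet assumes the post-1.6 datetime64 semantics and is independent of either package:

import numpy as np

d = np.datetime64('2011-07-01')
print(d + 1)    # 2011-07-02: integers step by the date's unit (days here)

m = np.datetime64('2011-07', 'M')
print(m + 1)    # 2011-08: the same idea at monthly resolution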
From jsseabold at gmail.com Tue Jul 26 10:42:40 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 26 Jul 2011 10:42:40 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> Message-ID: On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM wrote: > > On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote: > > > On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: > >> > >> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: > >> > >>> Hi, > >>> > >>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? > >> > >> Years is an overstatement... > >> The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and ?unfortunately don't ?currently have the time to keep it up-to-date. > >> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. > >> That doesn't mean that you'd be on your own, questions will still be answered... > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > > > hi Paul, > > > > Skipper and I (statsmodels) relatively recently discussed moving > > scikits.timeseries to GitHub and maintaining it there since we work on > > models for time series analysis. > > Er? > https://github.com/pierregm/scikits.timeseries/ > https://github.com/pierregm/scikits.timeseries-sandbox/ > Great. Is this the "official" advertised repo? I remember there was some chatter about this a few months back but lost track of the thread. > the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray). > > > > > I work very actively on time > > series-related functionality in pandas so it might not even be > > unthinkable to merge together the projects (scikits.timeseries and > > pandas) and integrate all the numpy.datetime64 stuff once the dust > > settles there. Just thinking out loud. > > That's an idea. > Any thoughts on the idea? Do you think it's reasonable and/or beneficial? There is also some talk with the scikits.learn and scikits.statsmodels to drop the scikits namespace, which would be better as a collective decision, so the merging could be a part of this? I use both packages now, and I, for one, would love to see them come together and share to the extent this is feasible. Others? I especially like the plotting stuff since it's great but I've had to make a few local patches here and there for mpl changes. 
Skipper From Dharhas.Pothina at twdb.state.tx.us Tue Jul 26 10:45:59 2011 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 26 Jul 2011 09:45:59 -0500 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> Message-ID: <4E2E8CD70200009B0003C090@GWWEB.twdb.state.tx.us> > models for time series analysis. I work very actively on time > series-related functionality in pandas so it might not even be > unthinkable to merge together the projects (scikits.timeseries and > pandas) and integrate all the numpy.datetime64 stuff once the dust > settles there. Just thinking out loud. There is functionality I like and use in both pandas and scikits.timeseries, moving towards and eventual goal of merging the two is a great idea. - dharhas -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Jul 26 11:01:51 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Jul 2011 17:01:51 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> Message-ID: On Jul 26, 2011, at 4:42 PM, Skipper Seabold wrote: > On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM wrote: >> >> On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote: >> >>> On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: >>>> >>>> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: >>>> >>>>> Hi, >>>>> >>>>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? >>>> >>>> Years is an overstatement... >>>> The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up-to-date. >>>> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. >>>> That doesn't mean that you'd be on your own, questions will still be answered... >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> hi Paul, >>> >>> Skipper and I (statsmodels) relatively recently discussed moving >>> scikits.timeseries to GitHub and maintaining it there since we work on >>> models for time series analysis. >> >> Er? >> https://github.com/pierregm/scikits.timeseries/ >> https://github.com/pierregm/scikits.timeseries-sandbox/ >> > > Great. Is this the "official" advertised repo? I remember there was > some chatter about this a few months back but lost track of the > thread. Yep. The scikits.timeseries is just the SVN site ported to git. The sandbox one was dubbed 'experimental' on this very list. > >> the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray). 
>> >> >> >>> I work very actively on time >>> series-related functionality in pandas so it might not even be >>> unthinkable to merge together the projects (scikits.timeseries and >>> pandas) and integrate all the numpy.datetime64 stuff once the dust >>> settles there. Just thinking out loud. >> >> That's an idea. >> > > Any thoughts on the idea? Do you think it's reasonable and/or > beneficial? There is also some talk with the scikits.learn and > scikits.statsmodels to drop the scikits namespace, which would be > better as a collective decision, so the merging could be a part of > this? I use both packages now, and I, for one, would love to see them > come together and share to the extent this is feasible. Others? I > especially like the plotting stuff since it's great but I've had to > make a few local patches here and there for mpl changes. No surprise for matplotlib. I kinda dropped the ball here (when I need to plot stuffs these days, I don't use mpl). I haven't used pandas yet, for the same reasons why I wasn't able to keep with updating scikits.timeseries. But if y'all use the two in parallel and have a need for porting scikits.timeseries to pandas, then go for it, you have my blessing. And you know where to contact me if you have some issues or questions. From cjordan1 at uw.edu Tue Jul 26 11:46:31 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 26 Jul 2011 10:46:31 -0500 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 5:31 AM, Jason Heeris wrote: > Hi, > > I have a few questions about scipy.optimize.anneal. I'm using 0.9.0 > under Debian Squeeze with Python 2.6.6, but the documentation for > anneal is the same as for 0.10.0. > > Firstly, I'm confused about the return values ? the docs say that the > return values are: > xmin (ndarray), retval (int), Jmin (float), T (float), feval (int), > iters (int), accept (int) > > ...however, running the attached script gives > > (array([-10.16682436, 6.46000538]), 12.045579564382059, > 0.72724961763274643, 851, 16, 180, 5) > > [, , > , , , , > ] > > Note that the types don't match and there's no value that could > correspond to the documented options for retval! Am I reading things > wrong, or are these inconsistent with the docs? > The docs are wrong. Judging from the source code, it appears that retval is the final entry in the tuple of returned values and the rest are just shifted up one. The floats in the docs are, apparently, really np.floats. > > Secondly, since annealing is a random process, is there any way to > control the source of random numbers it uses? Should I seed some > global RNG, and if so, which one? > > It appears from the source code that all sources of randomness in the function come from calls to random number generators in numpy.random. So numpy.random.seed should be the only seed you need to set. (And this seemed to be the case after testing it on your sample function.) > Thirdly, note that the script actually fails to find the optimal > parameters, despite being given them as the starting point. I know > that annealing is a far from reliable process, but this behaviour > still seems a little strange given the simplicity of the function I'm > using here. Are there additional constraints I should be supplying, or > some other way to guide the annealing process? > > The default schedule is 'fast', and it does seem to give really lousy results. It seems to cool too quickly. 
If you try the boltzmann or cauchy schedules, they take longer but give much better estimates of the minimum. > Fourthly, in my actual application I have a function to optimise that > is invariant to the overall scale of the inputs, ie. > f(a*array([x,y,z])) is the same for all non-zero values of "a". Is > there a risk of anneal failing because of this degeneracy? Is there a > way to structure the problem to avoid this? (Note that any component > could legitimately be zero, so I can't just fix x = 1 ... I think). > > Assuming your function is continuous, I think the biggest problem might be that your function doesn't increase as x becomes big. So your samples could wander to infinity. I'd be curious what would happen if you simply scaled each new sample so it had unit length. But to do that you'll need to download scipy's source and make the modifications yourself. Another possibility is simply giving reasonable upper and lower bounds for x using lower and upper. -Chris Jordan-Squire > Cheers, > Jason > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.heeris at gmail.com Tue Jul 26 12:18:36 2011 From: jason.heeris at gmail.com (Jason Heeris) Date: Wed, 27 Jul 2011 00:18:36 +0800 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On 26 July 2011 23:46, Christopher Jordan-Squire wrote: > The docs are wrong. Judging from the source code, it appears that retval is > the final entry in the tuple of returned values and the rest are just > shifted up one. The floats in the docs are, apparently, really np.floats. Thanks for all the info ? there's only one detail I'm still puzzled about: what would a "retval" of 5 indicate? ? Jason From cjordan1 at uw.edu Tue Jul 26 12:30:35 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 26 Jul 2011 11:30:35 -0500 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 11:18 AM, Jason Heeris wrote: > On 26 July 2011 23:46, Christopher Jordan-Squire wrote: > > The docs are wrong. Judging from the source code, it appears that retval > is > > the final entry in the tuple of returned values and the rest are just > > shifted up one. The floats in the docs are, apparently, really np.floats. > > Thanks for all the info ? there's only one detail I'm still puzzled > about: what would a "retval" of 5 indicate? > > As advertised by the warning, it just seems to indicate that the point the annealing cooled to wasn't the lowest point it encountered. Just a warning, I think, that things are pretty screwed up and you should try a different temperature schedule. But it's strange that's not in the docs. -Chris Jordan-Squire > ? Jason > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lechtlr at yahoo.com Tue Jul 26 13:54:02 2011 From: lechtlr at yahoo.com (lechtlr) Date: Tue, 26 Jul 2011 10:54:02 -0700 (PDT) Subject: [SciPy-User] VODE with BDF method Message-ID: <1311702842.31060.YahooMailClassic@web112710.mail.gq1.yahoo.com> Is there any option in Scipy to use "internally generated full Jacobian" with VODE/BDF method for stiff systems ? 
Any help would greatly be appreciated. -Lex From mattknox.ca at gmail.com Tue Jul 26 13:58:27 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 26 Jul 2011 17:58:27 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> Message-ID: > >>> I work very actively on time > >>> series-related functionality in pandas so it might not even be > >>> unthinkable to merge together the projects (scikits.timeseries and > >>> pandas) and integrate all the numpy.datetime64 stuff once the dust > >>> settles there. Just thinking out loud. > >> > >> That's an idea. > >> > > > > Any thoughts on the idea? Do you think it's reasonable and/or > > beneficial? There is also some talk with the scikits.learn and > > scikits.statsmodels to drop the scikits namespace, which would be > > better as a collective decision, so the merging could be a part of > > this? I use both packages now, and I, for one, would love to see them > > come together and share to the extent this is feasible. Others? I > > especially like the plotting stuff since it's great but I've had to > > make a few local patches here and there for mpl changes. > > No surprise for matplotlib. I kinda dropped the ball here (when I need to > plot stuffs these days, I don't use mpl). I haven't used pandas yet, for the > same reasons why I wasn't able to keep with updating scikits.timeseries. > But if y'all use the two in parallel and have a need for porting > scikits.timeseries to pandas, then go for it, you have my blessing. And you > know where to contact me if you have some issues or questions. I would basically echo Pierre's comments here. I don't have the time (or to be perfectly honest, the energy and motivation) to maintain the timeseries module anymore and would definitely be in favor of any efforts to merge its functionality into a better supported module. It's clear at this point that the timeseries module in its current form is a dead end given the lack of maintainers as well as the fundamental building blocks which are coming into place that would allow a better timeseries module. Those building blocks being: 1. datetime data type support in numpy 2. improved missing value support in numpy 3. data array / labelled array / pandas type of stuff which should (in theory) simplify indexing a timeseries with dates relative to the large hacks used in the current timeseries module In many ways, the timeseries module is a giant hack which tries to work around the fact that it is missing these key foundational pieces in numpy. If pandas is the module that unifies all these concepts into a cohesive package, then I think that is fantastic! And from lurking on the numpy and scipy mailing lists and monitoring all the threads on the related topics recently, I feel confident that I have little to contribute and that the problem rests in much more capable hands than my own :) - Matt Knox From johann.cohentanugi at gmail.com Tue Jul 26 14:18:18 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 26 Jul 2011 20:18:18 +0200 Subject: [SciPy-User] Gaussian mixture models with censored data In-Reply-To: <4E2DCA8F.7090206@gmail.com> References: <4E29E6FE.709@usi.ch> <4E2DCA8F.7090206@gmail.com> Message-ID: <4E2F04EA.7050507@gmail.com> ooops something weird happened when I copied-paste David's page. 
Here it is : http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/ On 07/25/2011 09:57 PM, Johann Cohen-Tanugi wrote: > just googling for it, I found > http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial > Is that what you want? It is based on GSL rather than scipy, but > requires numpy anyway. > > A numpy/scipy implementation of EM, I believe, can be found at > http://www.cta-observatory.org/indico/conferenceDisplay.py?confId=40 > Not sure that it has censoring, but David Cournapeau is reading this > mailing list AFAIK > > There is also a google code em-python, etc..... I am sure you will find > something useful, even if you need to code a dedicated higher level part > of these for censored data. > > good luck > Johann > > On 07/22/2011 11:09 PM, Giovanni Luca Ciampaglia wrote: >> Hello everybody, >> Before I delve into implementing it myself, is there any Python >> implementation of the EM algorithm used for fitting Gaussian mixtures >> that also handles (right-)censored observations? >> >> In particular I need to fit univariate data, so something like this: >> http://dx.doi.org/10.1080/00949659208811452 >> >> Best, >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From johann.cohentanugi at gmail.com Tue Jul 26 14:20:06 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 26 Jul 2011 20:20:06 +0200 Subject: [SciPy-User] Gaussian mixture models with censored data In-Reply-To: <4E2DCA8F.7090206@gmail.com> References: <4E29E6FE.709@usi.ch> <4E2DCA8F.7090206@gmail.com> Message-ID: <4E2F0556.9000002@gmail.com> ooops something weird happened when I copied-paste David's page. Here it is : http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/ On 07/25/2011 09:57 PM, Johann Cohen-Tanugi wrote: > just googling for it, I found > http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial > Is that what you want? It is based on GSL rather than scipy, but > requires numpy anyway. > > A numpy/scipy implementation of EM, I believe, can be found at > http://www.cta-observatory.org/indico/conferenceDisplay.py?confId=40 > Not sure that it has censoring, but David Cournapeau is reading this > mailing list AFAIK > > There is also a google code em-python, etc..... I am sure you will find > something useful, even if you need to code a dedicated higher level part > of these for censored data. > > good luck > Johann > > On 07/22/2011 11:09 PM, Giovanni Luca Ciampaglia wrote: >> Hello everybody, >> Before I delve into implementing it myself, is there any Python >> implementation of the EM algorithm used for fitting Gaussian mixtures >> that also handles (right-)censored observations? >> >> In particular I need to fit univariate data, so something like this: >> http://dx.doi.org/10.1080/00949659208811452 >> >> Best, >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Tue Jul 26 15:29:15 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Tue, 26 Jul 2011 15:29:15 -0400 Subject: [SciPy-User] VODE with BDF method In-Reply-To: <1311702842.31060.YahooMailClassic@web112710.mail.gq1.yahoo.com> References: <1311702842.31060.YahooMailClassic@web112710.mail.gq1.yahoo.com> Message-ID: Lex, I hope you understand that the internally generated Jacobian is only a numeric finite-difference approximation. 
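For illustration, one way to avoid the finite-difference approximation is to generate an exact Jacobian symbolically with SymPy and hand it to scipy.integrate.ode. This is only a rough sketch; the two-equation system below is a toy, not Lex's model:

import numpy as np
import sympy as sp
from scipy.integrate import ode

# toy, mildly stiff system (illustrative only)
y1, y2 = sp.symbols('y1 y2')
f_sym = sp.Matrix([-50.0 * y1 + sp.sin(y2), -y2])
jac_sym = f_sym.jacobian([y1, y2])

f_num = sp.lambdify((y1, y2), f_sym, 'numpy')
jac_num = sp.lambdify((y1, y2), jac_sym, 'numpy')

def f(t, y):
    return np.ravel(f_num(y[0], y[1]))

def jac(t, y):
    return np.asarray(jac_num(y[0], y[1]), dtype=float)

solver = ode(f, jac).set_integrator('vode', method='bdf', nsteps=3000)
solver.set_initial_value([1.0, 1.0], 0.0)
while solver.successful() and solver.t < 1.0:
    solver.integrate(solver.t + 0.1)

The same lambdified Jacobian can also be passed to odeint through its Dfun argument (note that odeint expects the (y, t) argument order, the reverse of ode).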
If you use the set_integrator method you can select the stiff BDF solver by passing method='bdf' (VODE's BDF method supports orders up to 5):

ode15s = scipy.integrate.ode(f)
ode15s.set_integrator('vode', method='bdf', order=5, nsteps=3000)
ode15s.set_initial_value(u0, t0)

See http://mail.scipy.org/pipermail/scipy-dev/2002-February/000274.html ... as you can often compute your own symbolic Jacobian using a package such as SymPy, or import it from Mathematica or Maple, as shown in the above example. An exact Jacobian will always lead to much more efficient and accurate results for stiff systems.

-Rob

On Tue, Jul 26, 2011 at 1:54 PM, lechtlr wrote:
> Is there any option in Scipy to use "internally generated full Jacobian" with the VODE/BDF method for stiff systems? Any help would greatly be appreciated.
>
> -Lex
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From gael.varoquaux at normalesup.org Tue Jul 26 18:28:43 2011
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 27 Jul 2011 00:28:43 +0200
Subject: [SciPy-User] Status of TimeSeries SciKit
In-Reply-To: 
References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com>
Message-ID: <20110726222843.GB8920@phare.normalesup.org>

On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote:
> In many ways, the timeseries module is a giant hack which tries to work
> around the fact that it is missing these key foundational pieces in
> numpy.

I don't believe this statement is true. If you are doing statistics, you think that what is really missing in numpy is missing data support. If you are doing timeseries analysis, you are missing timeseries support. If you are doing spatial models, you are missing unstructured spatial data support with built-in interpolation. If you are doing general relativity, you are missing contra/co-variant tensor support.

In my opinion, the important thing to keep in mind is that while each domain-specific application calls for different specific data structures, they are not universal, and you cannot stick them all in one library. The good news is that with numpy arrays, you can build data structures and libraries that talk more or less together, sharing data across domains. However, the more you embed your specificities in your data structure, the more it becomes alien to people who don't have the same use cases. For instance, the various VTK data structures are amongst the most beautiful structures for encoding spatial information. Yet most people not coming from 3D data processing hate them, because they don't understand them, and others are very busy reinventing the same set of abstractions. Similarly, R is great for statistics, but people who don't do statistics find the syntax incomprehensible and the data structures too restrictive. Matlab is great for linear algebra, but if you move into an N-dimensional world it gets clumsy.

My point is: let us stop dreaming that a change to core numpy will solve our problems. I am not saying that it cannot be improved, but in my opinion, the reason numpy is so successful is that it is actually the intersection of many different domain-specific requirements, and not the union.
2 cents from the peanut gallery, Gael From wesmckinn at gmail.com Tue Jul 26 21:10:24 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 26 Jul 2011 21:10:24 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <20110726222843.GB8920@phare.normalesup.org> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> Message-ID: On Tue, Jul 26, 2011 at 6:28 PM, Gael Varoquaux wrote: > On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote: >> In many ways, the timeseries module is a giant hack which tries to work >> around the fact that it is missing these key foundational pieces in >> numpy. > > I don't believe this statement is true. If you are doing statistics, you > think that what is really missing in numpy is missing data support. If > you are doing timeseries analysis, you are missing timeseries support. If > you are doing spatial models, you are missing unstructured spatial data > support with builtin interpolation, if you are doing general relativity, > you are missing contra/co-variant tensor support. > > In my opinion, the important thing to keep in mind is that while each > domain-specific application calls for different specific data structures, > they are not universal, and you cannot stick them all in one library. The > good new is that with numpy arrays, you can build data structures and > libraries that talk more or less together, sharing the data accross > domain. However, the more you embedded your specificities in your data > structure, the more it becomes alien to people who don't have the same > usecases. For instance the various VTK data structures are amongst the > most beautiful structures for encoding spatial information. Yet most > people not coming from 3D data processing hate them, because they don't > understand them, and others are very busy reinventing the same set of > abstractions. Similarly, R is great for statistics, but people who don't > do statistics find the syntax incomprehensible and the data structures > too restrictive. Matlab is great for linear alegbra, but if you move in > N-dimensional word it gets clumsy. > > My point is: let us stop dreaming that a change to core numpy will solve > our problems. I am not saying that it cannot be improved, but in my > opinion, the reason numpy is so successful is that it is actually the > intersection of many different domain-specific requirements, and not the > union. +1, I agree completely: NumPy will provide the fundamental building blocks we can use to build domain-specific data structures-- there will be no deus ex machina =) > 2 cents from the peanut gallery, > > Gael > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jason.heeris at gmail.com Tue Jul 26 22:38:03 2011 From: jason.heeris at gmail.com (Jason Heeris) Date: Wed, 27 Jul 2011 10:38:03 +0800 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On 26 July 2011 18:31, Jason Heeris wrote: > Fourthly, in my actual application I have a function to optimise that > is invariant to the overall scale of the inputs, ie. > f(a*array([x,y,z])) is the same for all non-zero values of "a". Is > there a risk of anneal failing because of this degeneracy? Is there a > way to structure the problem to avoid this? 
(Note that any component > could legitimately be zero, so I can't just fix x = 1 ... I think). Just in case anyone else is interested, I realised the answer to this after I'd had my morning coffee. Since the origin is not a valid input, I really just want to anneal over the points on the surface of a 5-sphere of arbitrary non-zero radius. So I can fix the radius to be 1 and constrain my input vectors to have five coordinates which are angles, ie. hyperspherical coordinates (ala. http://en.wikipedia.org/wiki/N-sphere#Hyperspherical_coordinates) ? Jason From hugoslv at gmail.com Wed Jul 27 05:07:05 2011 From: hugoslv at gmail.com (Hugo Silva) Date: Wed, 27 Jul 2011 10:07:05 +0100 Subject: [SciPy-User] Problem with scipy.signal Message-ID: <7A28C61E-4CC6-41DD-8A7D-7A32D14D35C1@gmail.com> Hi, I've just installed Python 2.7 on a MacBook Pro 2.7GHz i7 / 8Gb / Mac OS 10.6.8 and I'm facing the following error when trying to import scipy.signal: ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/signal/sigtools.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/signal/sigtools.so: no matching architecture in universal wrapper Has anyone faced the same problem or can share some insight on what can be a possible solution for this issue? Thank's in advance, Hugo Silva From bhanukiran.perabathini at gmail.com Tue Jul 26 09:49:58 2011 From: bhanukiran.perabathini at gmail.com (bhanukiran perabathini) Date: Tue, 26 Jul 2011 19:19:58 +0530 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: unsubscribe From mattknox.ca at gmail.com Wed Jul 27 09:06:21 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 13:06:21 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> Message-ID: Gael Varoquaux normalesup.org> writes: > > On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote: > > In many ways, the timeseries module is a giant hack which tries to work > > around the fact that it is missing these key foundational pieces in > > numpy. > > I don't believe this statement is true. If you are doing statistics, you > think that what is really missing in numpy is missing data support. If > you are doing timeseries analysis, you are missing timeseries support. If > you are doing spatial models, you are missing unstructured spatial data > support with builtin interpolation, if you are doing general relativity, > you are missing contra/co-variant tensor support. Ok, perhaps my statement was a bit harsh :) . But the point I was trying to make is that the timeseries module could be dramatically simplified and cleaned up internally with some of those forthcoming foundational pieces in numpy, even if the API and functionality of the timeseries module is kept identical to what it is right now. > My point is: let us stop dreaming that a change to core numpy will solve > our problems. I am not saying that it cannot be improved, but in my > opinion, the reason numpy is so successful is that it is actually the > intersection of many different domain-specific requirements, and not the > union. You are right. There is no such thing as a one size fits all data structure. 
It just so happens that Wes' use cases (from my understanding) are basically the same as mine (finance, etc). So from my own selfish point of view, the idea of pandas swallowing up the timeseries module and incorporating its functionality sounds kind of nice since that would give ME (and probably most of the people that work in the finance domain) an awesome swiss army knife data structure that solves all the problems that I care about :) - Matt Knox From gael.varoquaux at normalesup.org Wed Jul 27 10:12:51 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 27 Jul 2011 16:12:51 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> Message-ID: <20110727141251.GB30024@phare.normalesup.org> On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote: > Ok, perhaps my statement was a bit harsh :) . But the point I was > trying to make is that the timeseries module could be dramatically > simplified and cleaned up internally with some of those forthcoming > foundational pieces in numpy, Eventhough I do not know the timeseries module, I wouldn't be surprised that it is indeed the case. It is probably very valuable if you are able to identify localized enhancements to numpy that make your life easier, as they might make many other people's life easier. > It just so happens that Wes' use cases (from my understanding) are > basically the same as mine (finance, etc). So from my own selfish point > of view, the idea of pandas swallowing up the timeseries module and > incorporating its functionality sounds kind of nice since that would > give ME (and probably most of the people that work in the finance > domain) I think that it is really great if the different packages doing time series analysis unite. It will probably give better packages technically, and there is a lot of value to the community in such work. Gael From wesmckinn at gmail.com Wed Jul 27 10:23:17 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 10:23:17 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <20110727141251.GB30024@phare.normalesup.org> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 10:12 AM, Gael Varoquaux wrote: > On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote: >> Ok, perhaps my statement was a bit harsh :) . But the point I was >> trying to make is that the timeseries module could be dramatically >> simplified and cleaned up internally with some of those forthcoming >> foundational pieces in numpy, > > Eventhough I do not know the timeseries module, I wouldn't be surprised > that it is indeed the case. It is probably very valuable if you are able > to identify localized enhancements to numpy that make your life easier, > as they might make many other people's life easier. > >> It just so happens that Wes' use cases (from my understanding) are >> basically the same as mine (finance, etc). 
So from my own selfish point >> of view, the idea of pandas swallowing up the timeseries module and >> incorporating its functionality sounds kind of nice since that would >> give ME (and probably most of the people that work in the finance >> domain) > > I think that it is really great if the different packages doing time > series analysis unite. It will probably give better packages technically, > and there is a lot of value to the community in such work. I agree. I already have 50% or more of the features in scikits.timeseries, so this gets back to my fragmentation argument (users being stuck with a confusing choice between multiple libraries). Let's make it happen! > Gael > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Wed Jul 27 12:09:03 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Wed, 27 Jul 2011 12:09:03 -0400 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: Hi Laura, >> Hi all: >> sorry to bother you again! I implemented the Jacobian as Anne kindly suggested >> to me with the help of Kevin's pointers to the correct webpage AND I increased >> the maximum number of steps as Warren kindly said. >> I am now getting a new message: >> >> lsoda-- ?warning..internal t (=r1) and h (=r2) are >> ? ? ? such that in the machine, t + h = t on the next step >> ? ? ? (h = step size). solver will continue anyway >> ? ? ?In above, ?R1 = ?0.1209062893646E+03 ? R2 = ?0.9059171791494E-18 > The other recommendations are a good place to start fixing this, except the one suggesting you unsubscribe :) But, short of finding an implementation of a true stochastic DE solver (which I'm afraid I can't help with as I'm not a big expert) you should find it easier to introduce specific noise signals to your system if you use PyDSTool's external input signals that can appear in the RHS function of your DE. Alternatively, you could run your problem as a "hybrid" model where a daily "event" in your code will cause the discrete state transition to the new values drawn from the noise distribution. I am not sure if this will fix your problem with the step size going to zero, but that will depend on your parameters and exactly how you've implemented the running of the system with the discontinuity in state values (and how large they are). But since you are doing this step millions of times, PyDSTool should speed up your runs significantly, especially if you also use a C-based integrator that will compile your RHS too. If you decide to try this out there is an example interp_vode_test.py in the package's tests directory that uses external input signals, and others that demonstrate hybrid systems. But I'd be willing to help you get it working for you if you send me your code, as using this to simulate systems with noise is not yet a well-documented feature. Best, Rob From mattknox.ca at gmail.com Wed Jul 27 12:12:00 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 16:12:00 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Wes McKinney gmail.com> writes: > > I agree. 
I already have 50% or more of the features in > scikits.timeseries, so this gets back to my fragmentation argument > (users being stuck with a confusing choice between multiple > libraries). Let's make it happen! Ok. In the interest of moving this forward, here is a quick list of things I see missing in pandas that scikits.timeseries does. For brevity I will skip the reasons that these features exist, but if the usefulness is not obvious please ask me to clarify. Frequency conversion flexibility: - when going from a higher frequency to lower frequency (eg. daily to monthly), the timeseries module adds an extra dimension and groups the points so you still have all the original data rather than discarding data - allow you to specify where to place the value - the start or end of the period - when converting from lower frequency to higher frequency (eg. monthly to daily) - support of a larger number of frequencies Indexing: - slicing with dates (looks like "truncate" method does this, but would be nice to be able to just use slicing directly) - simple arithmetic on dates ("date + 1" means "add one unit at the current frequency") - various date/series attributes such as year, qyear, quarter, month, week, day, day_of_year, etc... (ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information) - full missing value support (TimeSeries class is a subclass of MaskedArray) - moving (rolling) median/min/max - Matt Knox From lists at hilboll.de Wed Jul 27 12:28:35 2011 From: lists at hilboll.de (Andreas) Date: Wed, 27 Jul 2011 18:28:35 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: <4E303CB3.1020808@hilboll.de> While we're at it: > Frequency conversion flexibility: > - when going from a higher frequency to lower frequency (eg. daily to > monthly), the timeseries module adds an extra dimension and groups the > points so you still have all the original data rather than discarding > data I'm using scikits.timeseries for analysis of atmospheric measurements. I've always wanted several things, and now that discussion is under way, maybe it's a good time to point them out: * When plotting a series, have the flexibility to have the value marked down at the center of the frequency. What I mean is, when I have monthly data and make a plot of one year, have each value be printed at the middle of the corresponding month, e.g. Jan 16, etc. Otherwise, It's not obvious to the reader whether the value printed on July 1 is actually that for June or that for July. * Have full support for n-dimensional series. When I have a n-d array of data values for each point in time (n>0), many things don't work. The biggest problem here seems to be that pickling actually *seems* to work (a file is created), but when I load the file again, the entries in the array are somehow screwed up (like transposed). * Enable rolling means for sparse data. For example, if I have irregular (in time) measurements, say, every one to six days, I would still like to be able to calculate a rolling n-day-average. Missing values should be ignored (speaking numpy: timeslice.compressed().mean()) I don't know if any of this is already implemented in pandas, as I've never used it up till now. 
But perhaps someone would be interested in implementing these issues ... Cheers, Andreas. From wesmckinn at gmail.com Wed Jul 27 12:57:44 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 12:57:44 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 12:12 PM, Matt Knox wrote: > Wes McKinney gmail.com> writes: >> >> I agree. I already have 50% or more of the features in >> scikits.timeseries, so this gets back to my fragmentation argument >> (users being stuck with a confusing choice between multiple >> libraries). Let's make it happen! > > Ok. In the interest of moving this forward, here is a quick list of things I > see missing in pandas that scikits.timeseries does. For brevity I will skip the > reasons that these features exist, but if the usefulness is not obvious please > ask me to clarify. > > Frequency conversion flexibility: > ? ?- when going from a higher frequency to lower frequency (eg. daily to > ? ? ?monthly), the timeseries module adds an extra dimension and groups the > ? ? ?points so you still have all the original data rather than discarding > ? ? ?data This is basically just a group by (reduceat) operation. I've been working a lot on groupby lately and resampling (frequency conversion has always existed, and lo-to-high is simple, but not easy downsampling/aggregation) will fall out as an afterthought. Should not require any C code either. > ? ?- allow you to specify where to place the value - the start or end of the > ? ? ?period - when converting from lower frequency to higher frequency (eg. > ? ? ?monthly to daily) I'll make sure to make this available as an option. down going low-to-high you have two interpolation options: forward fill (aka "pad") and back fill, which I think is what you're saying? > ? ?- support of a larger number of frequencies Which ones are you thinking of? Currently I have: - hourly, minutely, secondly (and things like 5-minutely can be done, e.g. Minute(5)) - daily / business daily - weekly (anchored on a particular weekday) - monthly / business month-end - (business) quarterly, anchored on jan/feb/march - annual / business annual (start and end) there is also a generic delta wrapping dateutil.relativedelta, so it's possible to go beyond these. the scikits.timeseries code is far more comprehensive and complete, completely agree, so if numpy.datetime64 isn't good enough it will hopefully be straightforward to augment. hopefully numpy.datetime64 will reduce the need for a lot of pandas.core.datetools-- although there are still merits (in my view) to having tools for working with Python datetime.datetime objects. > Indexing: > ? ?- slicing with dates (looks like "truncate" method does this, but would > ? ? ?be nice to be able to just use slicing directly) you can use fancy indexing to do this now, e.g: ts.ix[d1:d2] I could push this down into __getitem__ and __setitem__ too without much work > - simple arithmetic on dates ("date + 1" means "add one unit at the current > ?frequency") numpy.datetime64 will do this, which is very nice. the pandas date offsets work on Python datetimes. 
so I can do stuff like: In [35]: datetime.today() + 5 * datetools.bday Out[35]: datetime.datetime(2011, 8, 3, 0, 0) and if you have a whole DateRange (semi-equiv of DateArray) you can easily shift by the current frequency: In [38]: dr Out[38]: offset: <1 BusinessDay>, tzinfo: None [2000-01-03 00:00:00, ..., 2004-12-31 00:00:00] length: 1305 In [39]: dr.shift(10) Out[39]: offset: <1 BusinessDay>, tzinfo: None [2000-01-17 00:00:00, ..., 2005-01-14 00:00:00] length: 1305 > - various date/series attributes such as year, qyear, quarter, month, week, > ?day, day_of_year, etc... > ?(ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information) I agree this would be nice and very straightforward to add > - full missing value support (TimeSeries class is a subclass of MaskedArray) I challenge you to find a (realistic) use case where the missing value support in pandas in inadequate. I'm being completely serious =) But I've been very vocal about my dislike of MaskedArrays in the missing data discussions. They're hard for (normal) people to use, degrade performance, use extra memory, etc. They add a layer of complication for working with time series that strikes me as completely unnecessary. > - moving (rolling) median/min/max In [41]: pandas.rolling_ pandas.rolling_apply pandas.rolling_median pandas.rolling_corr pandas.rolling_min pandas.rolling_count pandas.rolling_quantile pandas.rolling_cov pandas.rolling_skew pandas.rolling_kurt pandas.rolling_std pandas.rolling_max pandas.rolling_sum pandas.rolling_mean pandas.rolling_var there's also bottleneck, although it doesn't provide the min_periods argument that I need (though I should look at the perf hit of using bottleneck.move_nan* functions and nulling out results not having enough observations after the fact...) > - Matt Knox > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > this is good feedback =) i think we're on the right track - Wes From wesmckinn at gmail.com Wed Jul 27 13:16:35 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 13:16:35 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <4E303CB3.1020808@hilboll.de> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: > While we're at it: > >> Frequency conversion flexibility: >> ? ? - when going from a higher frequency to lower frequency (eg. daily to >> ? ? ? monthly), the timeseries module adds an extra dimension and groups the >> ? ? ? points so you still have all the original data rather than discarding >> ? ? ? data > > I'm using scikits.timeseries for analysis of atmospheric measurements. > I've always wanted several things, and now that discussion is under way, > maybe it's a good time to point them out: > > * When plotting a series, have the flexibility to have the value marked > down at the center of the frequency. What I mean is, when I have monthly > data and make a plot of one year, have each value be printed at the > middle of the corresponding month, e.g. Jan 16, etc. Otherwise, It's not > obvious to the reader whether the value printed on July 1 is actually > that for June or that for July. 
Seems like this could be pretty easy to do, need only add an "tick_offset" option to the plotting function, I think. > * Have full support for n-dimensional series. When I have a n-d array of > data values for each point in time (n>0), many things don't work. The > biggest problem here seems to be that pickling actually *seems* to work > (a file is created), but when I load the file again, the entries in the > array are somehow screwed up (like transposed). support in pandas is very good for working with multiple univariate time series using DataFrame, not quite as good for panel data (3d), but I'm planing to build out an n-dimensional NDFrame which could potentially address your needs. If you can show me your data and tell me what you need to be able to do with it, it would be helpful to me. The majority of my work in pandas has been motivated by use cases I've experienced in applications. > * Enable rolling means for sparse data. For example, if I have irregular > (in time) measurements, say, every one to six days, I would still like > to be able to calculate a rolling n-day-average. Missing values should > be ignored (speaking numpy: timeslice.compressed().mean()) Either pandas or bottleneck will do this for you, so you can say something like: rolling_mean(ts, window=50, min_periods=5) and any sample with at least 5 data points in the window will compute a value, missing (NaN) data will be excluded. Bottleneck has move_mean and move_nanmean which will outperform pandas.rolling_mean a little bit since the Cython code is more specialized. > I don't know if any of this is already implemented in pandas, as I've > never used it up till now. But perhaps someone would be interested in > implementing these issues ... > > Cheers, > Andreas. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Wed Jul 27 13:27:15 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jul 2011 10:27:15 -0700 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney wrote: > On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: >> * Enable rolling means for sparse data. For example, if I have irregular >> (in time) measurements, say, every one to six days, I would still like >> to be able to calculate a rolling n-day-average. Missing values should >> be ignored (speaking numpy: timeslice.compressed().mean()) > > Either pandas or bottleneck will do this for you, so you can say something like: > > rolling_mean(ts, window=50, min_periods=5) > > and any sample with at least 5 data points in the window will compute > a value, missing (NaN) data will be excluded. Bottleneck has move_mean > and move_nanmean which will outperform pandas.rolling_mean a little > bit since the Cython code is more specialized. Another use case is when your data is irregularly spaced in time but you still want a moving min/mean/median/whatever over a fixed time window instead of a fixed number of data points. That might be Andreas's use case. 
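A minimal NumPy sketch of that kind of calculation -- a trailing mean over a fixed time window rather than a fixed number of samples, with NaN treated as missing -- assuming `times` is sorted and expressed in days (the function name and the example numbers below are only illustrative, not part of pandas, bottleneck or scikits.timeseries):

import numpy as np

def rolling_time_window_mean(times, values, window_days):
    # trailing mean over a fixed time window for irregularly spaced samples;
    # times must be sorted ascending, NaN in values marks missing data
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i, t in enumerate(times):
        # first index falling inside the window (t - window_days, t]
        start = np.searchsorted(times, t - window_days, side='right')
        window = values[start:i + 1]
        window = window[~np.isnan(window)]      # drop missing values
        out[i] = window.mean() if window.size else np.nan
    return out

# samples every one to six days, 10-day trailing mean
t = np.array([0., 2., 3., 9., 11., 16.])
x = np.array([1., 2., np.nan, 4., 5., 6.])
print(rolling_time_window_mean(t, x, 10.))

The Python loop is fine for modest series; for very long ones the same idea can be vectorized with a single np.searchsorted call plus cumulative sums of the values and the valid-sample counts.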
From wesmckinn at gmail.com Wed Jul 27 13:31:24 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 13:31:24 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: On Wed, Jul 27, 2011 at 1:27 PM, Keith Goodman wrote: > On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney wrote: >> On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: > >>> * Enable rolling means for sparse data. For example, if I have irregular >>> (in time) measurements, say, every one to six days, I would still like >>> to be able to calculate a rolling n-day-average. Missing values should >>> be ignored (speaking numpy: timeslice.compressed().mean()) >> >> Either pandas or bottleneck will do this for you, so you can say something like: >> >> rolling_mean(ts, window=50, min_periods=5) >> >> and any sample with at least 5 data points in the window will compute >> a value, missing (NaN) data will be excluded. Bottleneck has move_mean >> and move_nanmean which will outperform pandas.rolling_mean a little >> bit since the Cython code is more specialized. > > Another use case is when your data is irregularly spaced in time but > you still want a moving min/mean/median/whatever over a fixed time > window instead of a fixed number of data points. That might be > Andreas's use case. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > True. In pandas parlance I think what you would do is: rolling_mean(ts.valid(), window).reindex(ts.index, method='ffill') From mattknox.ca at gmail.com Wed Jul 27 13:54:13 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 17:54:13 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Wes McKinney gmail.com> writes: > > Frequency conversion flexibility:> > > ? ?- allow you to specify where to place the value - the start or end of the > > ? ? ?period - when converting from lower frequency to higher frequency (eg. > > ? ? ?monthly to daily) > > I'll make sure to make this available as an option. down going > low-to-high you have two interpolation options: forward fill (aka > "pad") and back fill, which I think is what you're saying? > I guess I had a bit of a misunderstanding when I wrote this comment because I was framing things in the context of how I think about the scikits.timeseries module. Monthly frequency dates (or TimeSeries) in the scikit don't have any day information at all. So when converting to daily you need to tell it where to place the value (eg. Jan 1, or Jan 31). Note that this is a SEPARATE decision from wanting to back fill or forward fill. However, since pandas uses regular datetime objects, the day of the month is already embedded in it. A potential drawback of this approach is that to support "start of period" stuff you need to add a separate frequency, effectively doubling the number of frequencies. 
And if you account for "business day end of month" and "regular day end of month", then you have to quadruple the number of frequencies. You'd have "EOM", "SOM", "BEOM", "BSOM". Similarly for all the quarterly frequencies, annual frequencies, and so on. Whether this is a major problem in practice or not, I don't know. > > ? ?- support of a larger number of frequencies > > Which ones are you thinking of? Currently I have: > > - hourly, minutely, secondly (and things like 5-minutely can be done, > e.g. Minute(5)) > - daily / business daily > - weekly (anchored on a particular weekday) > - monthly / business month-end > - (business) quarterly, anchored on jan/feb/march > - annual / business annual (start and end) I think it is missing quarterly frequencies anchored at the other 9 months of the year. If, for example, you work at a weird Canadian Bank like me, then your fiscal year end is October. Other than that, it has all the frequencies I care about. Semi-annual would be a nice touch, but not that important to me and timeseries module doesn't have it either. People have also asked for higher frequencies in the timeseries module before (eg. millisecond), but that is not something I personally care about. > > Indexing: > > ? ?- slicing with dates (looks like "truncate" method does this, but would > > ? ? ?be nice to be able to just use slicing directly) > > you can use fancy indexing to do this now, e.g: > > ts.ix[d1:d2] > > I could push this down into __getitem__ and __setitem__ too without much work I see. I'd be +1 on pushing it down into __getitem__ and __setitem__ > > - full missing value support (TimeSeries class is a subclass of MaskedArray) > > I challenge you to find a (realistic) use case where the missing value > support in pandas in inadequate. I'm being completely serious =) But > I've been very vocal about my dislike of MaskedArrays in the missing > data discussions. They're hard for (normal) people to use, degrade > performance, use extra memory, etc. They add a layer of complication > for working with time series that strikes me as completely > unnecessary. >From my understanding, pandas just uses nans for missing values. So that means strings, int's, or anything besides floats are not supported. So that is my major issue with it. I agree that masked arrays are overly complicated and it is not ideal. Hopefully the improved missing value support in numpy will provide the best of both worlds. - Matt From wesmckinn at gmail.com Wed Jul 27 14:09:07 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 14:09:07 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 1:54 PM, Matt Knox wrote: > > Wes McKinney gmail.com> writes: > >> > Frequency conversion flexibility:> >> > ? ?- allow you to specify where to place the value - the start or end of the >> > ? ? ?period - when converting from lower frequency to higher frequency (eg. >> > ? ? ?monthly to daily) >> >> I'll make sure to make this available as an option. down going >> low-to-high you have two interpolation options: forward fill (aka >> "pad") and back fill, which I think is what you're saying? 
>> > > I guess I had a bit of a misunderstanding when I wrote this comment because I > was framing things in the context of how I think about the scikits.timeseries > module. Monthly frequency dates (or TimeSeries) in the scikit don't have any > day information at all. So when converting to daily you need to tell it > where to place the value (eg. Jan 1, or Jan 31). Note that this is a SEPARATE > decision from wanting to back fill or forward fill. > > However, since pandas uses regular datetime objects, the day of the month is > already embedded in it. A potential drawback of this approach is that to > support "start of period" stuff you need to add a separate frequency, > effectively doubling the number of frequencies. And if you account for > "business day end of month" and "regular day end of month", then you have to > quadruple the number of frequencies. You'd have "EOM", "SOM", "BEOM", "BSOM". > Similarly for all the quarterly frequencies, annual frequencies, and so on. > Whether this is a major problem in practice or not, I don't know. I see what you mean. I'm going to wait until the dust on the NumPy stuff settles and then figure out what to do. Using datetime objects is good and bad-- it makes life a lot easier in many ways but some things are less clean as a result. Should start documenting all the use cases on a wiki somewhere. >> > ? ?- support of a larger number of frequencies >> >> Which ones are you thinking of? Currently I have: >> >> - hourly, minutely, secondly (and things like 5-minutely can be done, >> e.g. Minute(5)) >> - daily / business daily >> - weekly (anchored on a particular weekday) >> - monthly / business month-end >> - (business) quarterly, anchored on jan/feb/march >> - annual / business annual (start and end) > > I think it is missing quarterly frequencies anchored at the other 9 months of > the year. If, for example, you work at a weird Canadian Bank like me, then your > fiscal year end is October. For quarterly you need only anchor on Jan/Feb/March right? In [76]: list(DateRange('1/1/2000', '1/1/2002', offset=datetools.BQuarterEnd(startingMonth=1))) Out[76]: [datetime.datetime(2000, 1, 31, 0, 0), datetime.datetime(2000, 4, 28, 0, 0), datetime.datetime(2000, 7, 31, 0, 0), datetime.datetime(2000, 10, 31, 0, 0), datetime.datetime(2001, 1, 31, 0, 0), datetime.datetime(2001, 4, 30, 0, 0), datetime.datetime(2001, 7, 31, 0, 0), datetime.datetime(2001, 10, 31, 0, 0)] (I know, I'm trying to get rid of the camel casing floating around...) > Other than that, it has all the frequencies I care about. Semi-annual would be > a nice touch, but not that important to me and timeseries module doesn't have > it either. People have also asked for higher frequencies in the timeseries > module before (eg. millisecond), but that is not something I personally care > about. numpy.datetime64 will help here. I've a mind to start playing with TAQ (US equity tick data) in the near future in which case my requirements will change. >> > Indexing: >> > ? ?- slicing with dates (looks like "truncate" method does this, but would >> > ? ? ?be nice to be able to just use slicing directly) >> >> you can use fancy indexing to do this now, e.g: >> >> ts.ix[d1:d2] >> >> I could push this down into __getitem__ and __setitem__ too without much work > > I see. I'd be +1 on pushing it down into __getitem__ and __setitem__ I agree, little harm done. The main annoying detail here is working with integer labels. __getitem__ needs to be integer-based when you have integers, while using .ix[...] 
will do label-based always. >> > - full missing value support (TimeSeries class is a subclass of MaskedArray) >> >> I challenge you to find a (realistic) use case where the missing value >> support in pandas in inadequate. I'm being completely serious =) But >> I've been very vocal about my dislike of MaskedArrays in the missing >> data discussions. They're hard for (normal) people to use, degrade >> performance, use extra memory, etc. They add a layer of complication >> for working with time series that strikes me as completely >> unnecessary. > > From my understanding, pandas just uses nans for missing values. So that means > strings, int's, or anything besides floats are not supported. So that > is my major issue with it. I agree that masked arrays are overly complicated > and it is not ideal. Hopefully the improved missing value support in numpy will > provide the best of both worlds. It's admittedly a kludge but I use NaN as the universal missing-data marker for lack of a better alternative (basically I'm trying to emulate R as much as I can). so you can literally have: In [93]: df2 Out[93]: A B C D E 0 foo one -0.7883 0.7743 False 1 NaN one -0.5866 0.06009 False 2 foo two 0.9312 1.2 True 3 NaN three -0.6417 0.3444 False 4 foo two -0.8841 -0.08126 False 5 bar two 1.194 -0.7933 True 6 foo one -1.624 -0.1403 NaN 7 foo three 0.5046 0.5833 True To cope with this there are functions isnull and notnull which work on every dtype and can recognize NaNs in non-floating point arrays: In [96]: df2[notnull(df2['A'])] Out[96]: A B C D E 0 foo one -0.7883 0.7743 False 2 foo two 0.9312 1.2 True 4 foo two -0.8841 -0.08126 False 5 bar two 1.194 -0.7933 True 6 foo one -1.624 -0.1403 NaN 7 foo three 0.5046 0.5833 True In [98]: df2['E'].fillna('missing') Out[98]: 0 foo 1 missing 2 foo 3 missing 4 foo 5 bar 6 foo 7 foo trying to index with a "boolean" array with NAs gives a slightly helpful error message: In [101]: df2[df2['E']] ValueError: cannot index with vector containing NA / NaN values but In [102]: df2[df2['E'].fillna(False)] Out[102]: A B C D E 2 foo two 0.9312 1.2 True 5 bar two 1.194 -0.7933 True 7 foo three 0.5046 0.5833 True Really crossing my fingers for favorable NA support in NumPy. > - Matt > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pgmdevlist at gmail.com Wed Jul 27 15:16:14 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 Jul 2011 21:16:14 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Jul 27, 2011, at 8:09 PM, Wes McKinney wrote: > On Wed, Jul 27, 2011 at 1:54 PM, Matt Knox wrote: >> >> Wes McKinney gmail.com> writes: >> >>>> Frequency conversion flexibility:> >>>> - allow you to specify where to place the value - the start or end of the >>>> period - when converting from lower frequency to higher frequency (eg. >>>> monthly to daily) >>> >>> I'll make sure to make this available as an option. down going >>> low-to-high you have two interpolation options: forward fill (aka >>> "pad") and back fill, which I think is what you're saying? 
>>> >> >> I guess I had a bit of a misunderstanding when I wrote this comment because I >> was framing things in the context of how I think about the scikits.timeseries >> module. Monthly frequency dates (or TimeSeries) in the scikit don't have any >> day information at all. So when converting to daily you need to tell it >> where to place the value (eg. Jan 1, or Jan 31). Note that this is a SEPARATE >> decision from wanting to back fill or forward fill. >> >> However, since pandas uses regular datetime objects, the day of the month is >> already embedded in it. A potential drawback of this approach is that to >> support "start of period" stuff you need to add a separate frequency, >> effectively doubling the number of frequencies. And if you account for >> "business day end of month" and "regular day end of month", then you have to >> quadruple the number of frequencies. You'd have "EOM", "SOM", "BEOM", "BSOM". >> Similarly for all the quarterly frequencies, annual frequencies, and so on. >> Whether this is a major problem in practice or not, I don't know. > > I see what you mean. I'm going to wait until the dust on the NumPy > stuff settles and then figure out what to do. Using datetime objects > is good and bad-- it makes life a lot easier in many ways but some > things are less clean as a result. Should start documenting all the > use cases on a wiki somewhere. That's why we used integers to represent dates. We have rules to convert from integers to date times and back. >> >> I think it is missing quarterly frequencies anchored at the other 9 months of >> the year. If, for example, you work at a weird Canadian Bank like me, then your >> fiscal year end is October. > > For quarterly you need only anchor on Jan/Feb/March right? No. You need to be able to define your own quarters. For example, it's fairly common in climatology to define a winter as DJF, so your year actually start on March 1st > >>>> Indexing: >>>> - slicing with dates (looks like "truncate" method does this, but would >>>> be nice to be able to just use slicing directly) >>> >>> you can use fancy indexing to do this now, e.g: >>> >>> ts.ix[d1:d2] >>> >>> I could push this down into __getitem__ and __setitem__ too without much work >> >> I see. I'd be +1 on pushing it down into __getitem__ and __setitem__ > > I agree, little harm done. The main annoying detail here is working > with integer labels. __getitem__ needs to be integer-based when you > have integers, while using .ix[...] will do label-based always. Overloading __g/setitem__ isn't always ideal in Python. That was one aspect I tried to push to C but it still needs a lot of work. > >>>> - full missing value support (TimeSeries class is a subclass of MaskedArray) >>> >>> I challenge you to find a (realistic) use case where the missing value >>> support in pandas in inadequate. I'm being completely serious =) But >>> I've been very vocal about my dislike of MaskedArrays in the missing >>> data discussions. They're hard for (normal) people to use, degrade >>> performance, use extra memory, etc. They add a layer of complication >>> for working with time series that strikes me as completely >>> unnecessary. Let's wait a bit and see how missing/ignored values are getting supported, shall we ? 
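As a small illustration of the DJF-style seasonal grouping mentioned above, independent of either package, here is a plain-NumPy sketch that buckets a monthly series into meteorological seasons, counting December towards the following year's winter (the data here are made up for the example):

import numpy as np

# one value per calendar month, Jan 2000 .. Dec 2001 (made-up data)
years  = 2000 + np.arange(24) // 12
months = np.arange(24) % 12 + 1                      # 1..12
values = np.random.rand(24)

# Dec/Jan/Feb -> DJF, Mar/Apr/May -> MAM, Jun/Jul/Aug -> JJA, Sep/Oct/Nov -> SON
season = np.array(['DJF', 'MAM', 'JJA', 'SON'])[(months % 12) // 3]
# December is counted towards the *following* year's winter
season_year = np.where(months == 12, years + 1, years)

for yr in np.unique(season_year):
    mask = (season == 'DJF') & (season_year == yr)
    if mask.any():
        print('DJF %d mean: %.3f' % (yr, values[mask].mean()))

A library with first-class custom quarters (Q-OCT, DJF years, etc.) obviously makes this nicer, but the mapping itself is only a couple of lines.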
From mattknox.ca at gmail.com Wed Jul 27 15:42:51 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 19:42:51 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Wes McKinney gmail.com> writes: > > I think it is missing quarterly frequencies anchored at the other 9 months of > > the year. If, for example, you work at a weird Canadian Bank like me, then your > > fiscal year end is October. > > For quarterly you need only anchor on Jan/Feb/March right? > > In [76]: list(DateRange('1/1/2000', '1/1/2002', > offset=datetools.BQuarterEnd(startingMonth=1))) > Out[76]: > [datetime.datetime(2000, 1, 31, 0, 0), > datetime.datetime(2000, 4, 28, 0, 0), > datetime.datetime(2000, 7, 31, 0, 0), > datetime.datetime(2000, 10, 31, 0, 0), > datetime.datetime(2001, 1, 31, 0, 0), > datetime.datetime(2001, 4, 30, 0, 0), > datetime.datetime(2001, 7, 31, 0, 0), > datetime.datetime(2001, 10, 31, 0, 0)] I guess this again gets back to the fact that it is datetime objects being used and the series itself doesn't really have any "frequency" information contained in it in pandas. So in pandas, a March based quarterly frequency really is identical to a June based quarterly frequency. My use case for this type of stuff would be "calendarizing" things like earnings. For example, lets say I had the following data: Company A - fiscal year end October 2009q1 15.7 2009q2 16.1 2009q3 16.6 etc... Company B - fiscal year end April 2009q1 12.9 2009q2 11.2 2009q3 13.5 etc... in the first case, 2009q1 is Nov 2008 - Jan 2009. In the second case it is May 2008 - July 2008. This can be handled without too much extra work in pandas, by preconverting your quarters to actual dates. I think it is a bit less clean than in the timeseries module where I would just specify Q-OCT for the frequency and then everything is done for me. But it is not something I would lose sleep over. And the workaround is not that onerous. From thambsup at gmail.com Wed Jul 27 16:17:24 2011 From: thambsup at gmail.com (timofey chernishev) Date: Thu, 28 Jul 2011 00:17:24 +0400 Subject: [SciPy-User] Some problem with defining parameters in f2py Message-ID: Hi. I use gfortran 4.4.3, f2py Version: 2, numpy Version: 1.3.0. when i try: subroutine poiss1d(phi0,pts,n_q,phi1) real,parameter::pi=3.14159265358979323846264338327950288,h=0.01,e=4.8E-10 real,parameter::Cq=2.0*pi*h**2 *e,Ca=0.5,Cd=0.5 ... end subroutine poiss1d and have an error: "Error: Parameter 'pi' at (1) has not been declared or is a variable, which does not reduce to a constant expression" but, when e is moved to the next string: subroutine poiss1d(phi0,pts,n_q,phi1) real,parameter::pi=3.14159265358979323846264338327950288,h=0.01 real,parameter::e=4.8E-10 real,parameter::Cq=2.0*pi*h**2 *e,Ca=0.5,Cd=0.5 ... end subroutine poiss1d all works -- why so? From ben.root at ou.edu Wed Jul 27 16:22:54 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 27 Jul 2011 15:22:54 -0500 Subject: [SciPy-User] unravel_index for pdist? Message-ID: Hello, I typically use pdist() from scipy.spatial.distance to keep some of my scripts lean wrt memory usage. However, determining which pairs of points a distance value refers to is a bit tricky. I was wondering if a "punravel_index()" function might be welcomed? 
Here is my algorithm I came up with to perform this action for a single result. I haven't tested it for anything more general than that. *pntCnt* is the number of points that pdist() was used for. *index* is an index from a function like "np.argmax(pdist(p))". rowcnts = np.cumsum(xrange(pntCnt - 1, 0, -1)) row = np.searchsorted(rowcnts, index) col = pntCnt - (rowcnts[row] - index) At the very least, I hope that this is helpful to anybody else using pdist. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Wed Jul 27 16:47:24 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 16:47:24 -0400 Subject: [SciPy-User] New article about group by functionality Message-ID: This may be of interest to many: http://wesmckinney.com/blog/?p=125 Would be interest to get thoughts on the matter as it relates to NumPy and other related functionality that's needed. - Wes From wesmckinn at gmail.com Wed Jul 27 17:30:13 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 17:30:13 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 3:42 PM, Matt Knox wrote: > > Wes McKinney gmail.com> writes: > >> > I think it is missing quarterly frequencies anchored at the other 9 months > of >> > the year. If, for example, you work at a weird Canadian Bank like me, then > your >> > fiscal year end is October. >> >> For quarterly you need only anchor on Jan/Feb/March right? >> >> In [76]: list(DateRange('1/1/2000', '1/1/2002', >> offset=datetools.BQuarterEnd(startingMonth=1))) >> Out[76]: >> [datetime.datetime(2000, 1, 31, 0, 0), >> ?datetime.datetime(2000, 4, 28, 0, 0), >> ?datetime.datetime(2000, 7, 31, 0, 0), >> ?datetime.datetime(2000, 10, 31, 0, 0), >> ?datetime.datetime(2001, 1, 31, 0, 0), >> ?datetime.datetime(2001, 4, 30, 0, 0), >> ?datetime.datetime(2001, 7, 31, 0, 0), >> ?datetime.datetime(2001, 10, 31, 0, 0)] > > I guess this again gets back to the fact that it is datetime objects being used > and the series itself doesn't really have any "frequency" information > contained in it in pandas. So in pandas, a March based quarterly frequency > really is identical to a June based quarterly frequency. > > My use case for this type of stuff would be "calendarizing" things like > earnings. > > For example, lets say I had the following data: > > Company A - fiscal year end October > 2009q1 15.7 > 2009q2 16.1 > 2009q3 16.6 > etc... > > Company B - fiscal year end April > 2009q1 12.9 > 2009q2 11.2 > 2009q3 13.5 > etc... > > in the first case, 2009q1 is Nov 2008 - Jan 2009. In the second case it is > May 2008 - July 2008. This can be handled without too much extra work in > pandas, by preconverting your quarters to actual dates. I think it is a bit > less clean than in the timeseries module where I would just specify Q-OCT for > the frequency and then everything is done for me. But it is not something I > would lose sleep over. And the workaround is not that onerous. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Thanks, I've got it now. 
A little slow on the uptake =) From timmichelsen at gmx-topmail.de Wed Jul 27 18:31:58 2011 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Thu, 28 Jul 2011 00:31:58 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Hello all, similar to Dharhas, I was a strong user of the time series scikit from the very beginning. Since most of my code for meteorological data evaluations is based on it, I would be happy to receive infomation on the conclusion and how I need to adjust my code to upkeep with new developments. >>>>> - full missing value support (TimeSeries class is a subclass of MaskedArray) >>>> >>>> I challenge you to find a (realistic) use case where the missing value >>>> support in pandas in inadequate. I'm being completely serious =) But >>>> I've been very vocal about my dislike of MaskedArrays in the missing >>>> data discussions. They're hard for (normal) people to use, degrade >>>> performance, use extra memory, etc. They add a layer of complication >>>> for working with time series that strikes me as completely >>>> unnecessary. > > > Let's wait a bit and see how missing/ignored values are getting supported, shall we ? How does Pandas deal with missing values? This pages: http://pandas.sourceforge.net/missing_data.html?highlight=missing Is empty The convenient support for missing data (once date converterters were out) in timeseries helps a lot to quickly deal with measurement logs or incomplete data. Best regards, Timmie From wesmckinn at gmail.com Wed Jul 27 18:41:28 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 18:41:28 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 6:31 PM, Tim Michelsen wrote: > Hello all, > similar to Dharhas, I was a strong user of the time series scikit from > the very beginning. > Since most of my code for meteorological data evaluations is based on > it, I would be happy to receive infomation on the conclusion and how I > need to adjust my code to upkeep with new developments. When it gets to that point I'd be happy to help (including looking at some of your existing code and data). >>>>>> - full missing value support (TimeSeries class is a subclass of MaskedArray) >>>>> >>>>> I challenge you to find a (realistic) use case where the missing value >>>>> support in pandas in inadequate. I'm being completely serious =) But >>>>> I've been very vocal about my dislike of MaskedArrays in the missing >>>>> data discussions. They're hard for (normal) people to use, degrade >>>>> performance, use extra memory, etc. They add a layer of complication >>>>> for working with time series that strikes me as completely >>>>> unnecessary. >> >> >> Let's wait a bit and see how missing/ignored values are getting supported, shall we ? > How does Pandas deal with missing values? 
discussed a bit in my reply here: http://article.gmane.org/gmane.comp.python.scientific.user/29661 In short using NaN across the dtypes with special functions isnull/notnull to detect NaN in dtype=object arrays. I'm hopeful this can be replaced with native NumPy NA support in the relatively near future... > This pages: > http://pandas.sourceforge.net/missing_data.html?highlight=missing > Is empty > > The convenient support for missing data (once date converterters were > out) in timeseries helps a lot to quickly deal with measurement logs or > incomplete data. > > Best regards, > Timmie > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From questions.anon at gmail.com Thu Jul 28 00:44:22 2011 From: questions.anon at gmail.com (questions anon) Date: Thu, 28 Jul 2011 14:44:22 +1000 Subject: [SciPy-User] numpy mean Message-ID: Hi All, I thought this would be a relatively easy thing to do but the more I look the more confused I become! I have a netcdf file containing hourly temperature data for given region for a month. I would like to find the mean/average of particular periods (i.e. 3 hours). e.g. this is what I would like: array1=[2,4,8] array2=[4,8,12] array3=[9,3,15] meanofarrays=np.mean(array1,array2,array3) print meanofarrays >>>[5,5,11] Is there a routine that will do what I am after? If not I seem to be able to sum the arrays together and then divide by another array, but I will need to produce an array to match the extent and all values will need to be equal to the number of arrays I have summed. Can anyone help with producing this array? i.e. sumofarray=[15,15,35] numberofarrays=[3,3,3] meanofarrays=np.divide[sumofarrays,numberofarrays] print meanofarrays >>>[5,5,11] Any feedback will be greatly appreciated!!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Thu Jul 28 01:06:05 2011 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Thu, 28 Jul 2011 01:06:05 -0400 Subject: [SciPy-User] numpy mean In-Reply-To: References: Message-ID: Hi, you can do the following >>> arr = np.array([array1,array2,array3]) >>> arr array([[ 2, 4, 8], [ 4, 8, 12], [ 9, 3, 15]]) >>> np.mean(arr, axis = 1) array([ 4.66666667, 8. , 9. ]) >>> np.mean(arr, axis = 0) array([ 5. , 5. , 11.66666667]) cheers -- Oleksandr 2011/7/28 questions anon > Hi All, > I thought this would be a relatively easy thing to do but the more I look > the more confused I become! > > I have a netcdf file containing hourly temperature data for given region > for a month. > I would like to find the mean/average of particular periods (i.e. 3 hours). > > e.g. this is what I would like: > > array1=[2,4,8] > array2=[4,8,12] > array3=[9,3,15] > > meanofarrays=np.mean(array1,array2,array3) > print meanofarrays > >>>[5,5,11] > > Is there a routine that will do what I am after? > If not I seem to be able to sum the arrays together and then divide by > another array, but I will need to produce an array to match the extent and > all values will need to be equal to the number of arrays I have summed. Can > anyone help with producing this array? > > i.e. > sumofarray=[15,15,35] > numberofarrays=[3,3,3] > meanofarrays=np.divide[sumofarrays,numberofarrays] > print meanofarrays > >>>[5,5,11] > > Any feedback will be greatly appreciated!!! 
> > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From questions.anon at gmail.com Thu Jul 28 02:19:22 2011 From: questions.anon at gmail.com (questions anon) Date: Thu, 28 Jul 2011 16:19:22 +1000 Subject: [SciPy-User] numpy mean In-Reply-To: References: Message-ID: perfect (using axis=0), thank you! On Thu, Jul 28, 2011 at 3:06 PM, Oleksandr Huziy wrote: > Hi, > > you can do the following > >>> arr = np.array([array1,array2,array3]) > >>> arr > array([[ 2, 4, 8], > [ 4, 8, 12], > [ 9, 3, 15]]) > >>> np.mean(arr, axis = 1) > array([ 4.66666667, 8. , 9. ]) > >>> np.mean(arr, axis = 0) > array([ 5. , 5. , 11.66666667]) > > cheers > -- > Oleksandr > > 2011/7/28 questions anon > >> Hi All, >> I thought this would be a relatively easy thing to do but the more I look >> the more confused I become! >> >> I have a netcdf file containing hourly temperature data for given region >> for a month. >> I would like to find the mean/average of particular periods (i.e. 3 >> hours). >> >> e.g. this is what I would like: >> >> array1=[2,4,8] >> array2=[4,8,12] >> array3=[9,3,15] >> >> meanofarrays=np.mean(array1,array2,array3) >> print meanofarrays >> >>>[5,5,11] >> >> Is there a routine that will do what I am after? >> If not I seem to be able to sum the arrays together and then divide by >> another array, but I will need to produce an array to match the extent and >> all values will need to be equal to the number of arrays I have summed. Can >> anyone help with producing this array? >> >> i.e. >> sumofarray=[15,15,35] >> numberofarrays=[3,3,3] >> meanofarrays=np.divide[sumofarrays,numberofarrays] >> print meanofarrays >> >>>[5,5,11] >> >> Any feedback will be greatly appreciated!!! >> >> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jul 28 05:13:51 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 28 Jul 2011 09:13:51 +0000 (UTC) Subject: [SciPy-User] Some problem with defining parameters in f2py References: Message-ID: Thu, 28 Jul 2011 00:17:24 +0400, timofey chernishev wrote: > Hi. I use gfortran 4.4.3, f2py Version: 2, numpy Version: 1.3.0. > > when i try: > subroutine poiss1d(phi0,pts,n_q,phi1) > real,parameter::pi=3.14159265358979323846264338327950288,h=0.01,e=4.8E-10 > real,parameter::Cq=2.0*pi*h**2 *e,Ca=0.5,Cd=0.5 > ... > end subroutine poiss1d > > and have an error: "Error: Parameter 'pi' at (1) has not been declared > or is a variable, which does not reduce to a constant expression" Fortran 77 ignores characters beyond the 72th column (they do not fit on punch cards). Welcome back to the 60s. 
From andreas at hilboll.de Wed Jul 27 13:38:16 2011 From: andreas at hilboll.de (Andreas) Date: Wed, 27 Jul 2011 19:38:16 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: <4E304D08.6040807@hilboll.de> On 2011-07-27 19:27, Keith Goodman wrote: > On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney wrote: >> On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: > >>> * Enable rolling means for sparse data. For example, if I have irregular >>> (in time) measurements, say, every one to six days, I would still like >>> to be able to calculate a rolling n-day-average. Missing values should >>> be ignored (speaking numpy: timeslice.compressed().mean()) >> >> Either pandas or bottleneck will do this for you, so you can say something like: >> >> rolling_mean(ts, window=50, min_periods=5) >> >> and any sample with at least 5 data points in the window will compute >> a value, missing (NaN) data will be excluded. Bottleneck has move_mean >> and move_nanmean which will outperform pandas.rolling_mean a little >> bit since the Cython code is more specialized. > > Another use case is when your data is irregularly spaced in time but > you still want a moving min/mean/median/whatever over a fixed time > window instead of a fixed number of data points. That might be > Andreas's use case. Yes, this is exactly what I'm looking for. From gustavo.goretkin at gmail.com Wed Jul 27 17:45:20 2011 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Wed, 27 Jul 2011 17:45:20 -0400 Subject: [SciPy-User] maximally sparse subset of points Message-ID: I have a dataset of N points (in 4 dimensions) and I'd like to select a smaller subset, size M, of those points that are maximally spread out. An approximation is fine. Other than the K-d tree, is there anything in SciPy or other Python module to help accomplish this? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Jul 28 09:10:35 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 28 Jul 2011 15:10:35 +0200 Subject: [SciPy-User] maximally sparse subset of points In-Reply-To: References: Message-ID: <20110728131035.GN28396@phare.normalesup.org> On Wed, Jul 27, 2011 at 05:45:20PM -0400, Gustavo Goretkin wrote: > I have a dataset of N points (in 4 dimensions) and I'd like to select a > smaller subset, size M, of those points that are maximally spread out. The problem that you are trying to solve is close to the k-medoids problem. I don't know of Python modules implementing a k-medoids. Alternatively, the k_init function used to initialize the k-means in the scikits.learn [1] might be a useful approximation. It's a pretty brutal approximation, and it might not work for you, but it should be fast. Ga?l [1] https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/cluster/k_means_.py#L32 From tritemio at gmail.com Thu Jul 28 15:04:06 2011 From: tritemio at gmail.com (Antonio Ingargiola) Date: Thu, 28 Jul 2011 12:04:06 -0700 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: Hi, I created a ticket for this problem: http://projects.scipy.org/scipy/ticket/1487 Hope that somebody will fix it. 
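Another cheap option in the same spirit is greedy farthest-point sampling: start from one point and repeatedly add the point whose distance to the already-selected subset is largest. It is only a heuristic, but it is O(N*M), easy to write with scipy.spatial.distance.cdist, and tends to spread the subset out well. A rough sketch, assuming X is the (N, 4) data array and M the subset size (the function name is just for the example):

import numpy as np
from scipy.spatial.distance import cdist

def farthest_point_subset(X, M, first=0):
    # greedily pick M rows of X that are approximately maximally spread out
    X = np.asarray(X, dtype=float)
    chosen = [first]
    # distance from every point to its nearest already-chosen point
    mindist = cdist(X, X[[first]]).ravel()
    for _ in range(M - 1):
        nxt = int(mindist.argmax())          # farthest from the current subset
        chosen.append(nxt)
        mindist = np.minimum(mindist, cdist(X, X[[nxt]]).ravel())
    return np.array(chosen)

X = np.random.rand(1000, 4)
subset = X[farthest_point_subset(X, 20)]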
Regards, Antonio 2011/7/22 Warren Weckesser > > > On Fri, Jul 22, 2011 at 6:30 PM, Antonio Ingargiola wrote: > >> 2011/7/22 Warren Weckesser >> >>> >>> >>> On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: >>> >>>> Hi to the list, >>>> >>>> I got an error using the pdf method on the binom distribution. Same >>>> error happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution >>>> and pythonxy on windows). >>>> >>>> The error is the following: >>>> >>>> In [1]: from scipy.stats.distributions import binom >>>> >>>> In [2]: b = binom(20,0.8) >>>> >>>> In [3]: b.rvs() >>>> Out[3]: 17 >>>> >>>> In [4]: b.pdf(2) >>>> >>>> --------------------------------------------------------------------------- >>>> AttributeError Traceback (most recent call >>>> last) >>>> >>>> C:\Users\anto\.xy\startups\ in () >>>> >>>> C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, >>>> x) >>>> 333 >>>> 334 def pdf(self, x): #raises AttributeError in frozen >>>> discrete distribution >>>> --> 335 return self.dist.pdf(x, *self.args, **self.kwds) >>>> 336 >>>> 337 def cdf(self, x): >>>> >>>> AttributeError: 'binom_gen' object has no attribute 'pdf' >>>> >>>> In [5]: >>>> >>>> >>>> Is this known problem? How can I get the binomial pdf, is there a >>>> workaround? >>>> >>> >>> >>> Since binom is a discrete distribution, you want the pmf method: >>> >>> In [32]: b = binom(20, 0.8) >>> >>> In [33]: b.pmf(2) >>> Out[33]: 3.1876710400000011e-11 >>> >>> In [34]: b.pmf(18) >>> Out[34]: 0.13690942867206304 >>> >>> >>> The behavior that you observed still looks like a bug to me--why does >>> binom even have a pdf method, if calling it just raises a cryptic exception? >>> >> >> Warren, Thanks for the clarification. And BTW yes I think that this >> behaviour is quite boguous. Should I fill a bug report? >> > > > Good idea--please do. > > Warren > > > >> >> Antonio >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Fri Jul 29 09:48:05 2011 From: denis-bz-gg at t-online.de (denis) Date: Fri, 29 Jul 2011 06:48:05 -0700 (PDT) Subject: [SciPy-User] maximally sparse subset of points In-Reply-To: References: Message-ID: <3ff31ed0-00bc-4925-92a8-3dcb978cbe43@l37g2000yqd.googlegroups.com> Gustavo, I'd go with KDTree, fast and easy in 4d; if you build a tree with big leaves, leafsize ~ N/M, then take the midpoints of each leaf, that should do ? To walk the leaves, add the below to .../scipy/spatial/kdtree.py (or .pyx, but building the tree is not much slower in pure python). cheers -- denis def forleaves( self, func, *args, **kwargs ): """ call func( data ) for each leaf, e.g. leafmid = [] def leaffunc( data, leafmid=leafmid ): leafmid.append( data.mean(axis=0 )) """ q = [self.tree] while q: node = heappop(q) if isinstance( node, KDTree.leafnode ): data = self.data[node.idx] func( data, *args, **kwargs ) # test-leaves.py else: heappush( q, node.less ) heappush( q, node.greater ) On Jul 27, 11:45?pm, Gustavo Goretkin wrote: > I have a dataset of N points (in 4 dimensions) and I'd like to select a > smaller subset, size M, of those points that are maximally spread out. An > approximation is fine. 
Other than the K-d tree, is there anything in SciPy From garyrob at me.com Fri Jul 29 10:04:43 2011 From: garyrob at me.com (Gary Robinson) Date: Fri, 29 Jul 2011 14:04:43 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?What_happened_with_chisquare=5Fcontingency?= =?utf-8?q?=3F?= References: Message-ID: > On Tue, Mar 1, 2011 at 8:00 PM, Yang Zhang gmail.com> wrote: > > I put another version of the code in this ticket:??? > http://projects.scipy.org/scipy/ticket/1203During SciPy 2010 and later, > Anthony Scopatz and I worked on a contingency table class, which is in > scipy/stats/contingency_table.py here:??? > https://github.com/scopatz/scipyThen other work (scipy bugs, > scipy.signal, and the work that pays the bills) pushed this down in my > "to do" list, and it never made it back up to the top.? But perhaps it > is time to bump this up again--thanks for the reminder!Warren I haven't noticed anything more about this since March... this is just a friendly note speaking up as one more person interested in the progress of this class! :) From ralf.gommers at googlemail.com Fri Jul 29 11:08:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 29 Jul 2011 17:08:29 +0200 Subject: [SciPy-User] What happened with chisquare_contingency? In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 4:04 PM, Gary Robinson wrote: > > On Tue, Mar 1, 2011 at 8:00 PM, Yang Zhang gmail.com> > wrote: > > > > I put another version of the code in this ticket: > > http://projects.scipy.org/scipy/ticket/1203During SciPy 2010 and later, > > Anthony Scopatz and I worked on a contingency table class, which is in > > scipy/stats/contingency_table.py here: > > https://github.com/scopatz/scipyThen other work (scipy bugs, > > scipy.signal, and the work that pays the bills) pushed this down in my > > "to do" list, and it never made it back up to the top. But perhaps it > > is time to bump this up again--thanks for the reminder!Warren > > I haven't noticed anything more about this since March... this is just a > friendly note speaking up as one more person interested in the progress of > this > class! :) > > It's almost ready to be merged: https://github.com/scipy/scipy/pull/34 -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Jul 29 11:08:59 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 29 Jul 2011 10:08:59 -0500 Subject: [SciPy-User] What happened with chisquare_contingency? In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 9:04 AM, Gary Robinson wrote: >> On Tue, Mar 1, 2011 at 8:00 PM, Yang Zhang gmail.com> wrote: >> >> I put another version of the code in this ticket: >> http://projects.scipy.org/scipy/ticket/1203During SciPy 2010 and later, >> Anthony Scopatz and I worked on a contingency table class, which is in >> scipy/stats/contingency_table.py here: >> https://github.com/scopatz/scipyThen other work (scipy bugs, >> scipy.signal, and the work that pays the bills) pushed this down in my >> "to do" list, and it never made it back up to the top.? But perhaps it >> is time to bump this up again--thanks for the reminder!Warren > > I haven't noticed anything more about this since March... this is just a > friendly note speaking up as one more person interested in the progress of this > class! :) I put the basic contingency table code in a pull request: https://github.com/scipy/scipy/pull/34 After a few more small changes, it will get pushed to the trunk. 
The contingency table class probably needs some more work, but it should definitely be in before the next release. Warren > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From claumann at physics.harvard.edu Fri Jul 29 11:20:18 2011 From: claumann at physics.harvard.edu (Chris Laumann) Date: Fri, 29 Jul 2011 11:20:18 -0400 Subject: [SciPy-User] Sparse eigensystem instability on OS X Lion Message-ID: <08D3BBA9-A043-47FA-8817-B5FAA0BA7EB9@physics.harvard.edu> Hi all- I've recently upgraded to Lion and, because it broke my python setup, reinstalled scipy from two sources -- Enthought's full 64 bit latest distro (Scipy 0.9.0 and Numpy 1.6.0) and Chris Fonnesbeck's Scipy Superpack (with the latest dev versions from a few weeks ago). I'm now finding a bunch of numerical instability in symmetric sparse eigensystem solutions in scipy.sparse.linalg.eigsh. These didn't exist before the upgrade when I was using Snow Leopard and scipy 0.8.0 (eigen_symmetric instead of eigsh). The smallest example I've isolated is: data = array([ 16., 18., -4., -2., -4., 25., -2., 4., -2., 17., 16., 18.]) indices = array([0, 1, 2, 6, 1, 2, 4, 3, 2, 4, 5, 6], dtype='int32') indptr = array([ 0, 1, 4, 7, 8, 10, 11, 12], dtype='int32') ham1 = scipy.sparse.csr_matrix((data, indices, indptr)) e1 = np.linalg.eigvalsh(ham1.todense()) e2 = scipy.sparse.linalg.eigsh(ham1, k=6, which='SA', return_eigenvectors = False) e2.sort() This leads to (using the superpack -- the results from Enthought are similarly noisy but different): >>> e1 array([ 4. , 15.57851688, 16. , 16. , 17.27739936, 18. , 27.14408376]) >>> e2 array([ 4.03171393, 15.47517729, 15.98093481, 16. , 16.76781718, 18.79560773]) These are big errors for a 7x7 matrix, especially for the minimum eigenvalue 4, which is well separated from the rest of the spectrum. Can anybody help? I believe the Superpack _arpack.so is linking against the apple Accelerate framework for LAPACK while I'm not sure about Enthoughts _arpack.so. As far as I can tell from a little otool snooping, Enthought seems to roll its own LAPACK functionality. In any event, neither seems to work. Thanks, Chris From claumann at physics.harvard.edu Fri Jul 29 12:19:41 2011 From: claumann at physics.harvard.edu (Chris Laumann) Date: Fri, 29 Jul 2011 12:19:41 -0400 Subject: [SciPy-User] DISREGARD: Sparse eigensystem instability on OS X Lion Message-ID: <3B3E7255-9988-4B75-8194-74A0CBD76B4E@physics.harvard.edu> Hey everybody- The matrix I sent out wasn't hermitian, hence the instability. Disregard the previous post.. Oops, Chris ---------------- Hi all- I've recently upgraded to Lion and, because it broke my python setup, reinstalled scipy from two sources -- Enthought's full 64 bit latest distro (Scipy 0.9.0 and Numpy 1.6.0) and Chris Fonnesbeck's Scipy Superpack (with the latest dev versions from a few weeks ago). I'm now finding a bunch of numerical instability in symmetric sparse eigensystem solutions in scipy.sparse.linalg.eigsh. These didn't exist before the upgrade when I was using Snow Leopard and scipy 0.8.0 (eigen_symmetric instead of eigsh). 
The smallest example I've isolated is: data = array([ 16., 18., -4., -2., -4., 25., -2., 4., -2., 17., 16., 18.]) indices = array([0, 1, 2, 6, 1, 2, 4, 3, 2, 4, 5, 6], dtype='int32') indptr = array([ 0, 1, 4, 7, 8, 10, 11, 12], dtype='int32') ham1 = scipy.sparse.csr_matrix((data, indices, indptr)) e1 = np.linalg.eigvalsh(ham1.todense()) e2 = scipy.sparse.linalg.eigsh(ham1, k=6, which='SA', return_eigenvectors = False) e2.sort() This leads to (using the superpack -- the results from Enthought are similarly noisy but different): >>> e1 array([ 4. , 15.57851688, 16. , 16. , 17.27739936, 18. , 27.14408376]) >>> e2 array([ 4.03171393, 15.47517729, 15.98093481, 16. , 16.76781718, 18.79560773]) These are big errors for a 7x7 matrix, especially for the minimum eigenvalue 4, which is well separated from the rest of the spectrum. Can anybody help? I believe the Superpack _arpack.so is linking against the apple Accelerate framework for LAPACK while I'm not sure about Enthoughts _arpack.so. As far as I can tell from a little otool snooping, Enthought seems to roll its own LAPACK functionality. In any event, neither seems to work. Thanks, Chris From fiolj at yahoo.com Fri Jul 29 18:36:04 2011 From: fiolj at yahoo.com (Juan Fiol) Date: Fri, 29 Jul 2011 15:36:04 -0700 (PDT) Subject: [SciPy-User] integral of oscillatory functions Message-ID: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Hi, I have to integrate a *highly* oscilatory function. I've been looking in the literature and found that there some "asymptotic methods" (that work better when oscillations are stronger and cancel most of the integrand), some methods derived from filon's method and other called Levin method. I've had taken quick looks into several mathematical papers on the subject but It will probably take me more than one month (and may be much more) to understand the subject and put it into a routine. Does anybody know if there is anything of the sort implemented in scipy? Otherwise, I would appreciate if I get advice for a more "practical" place where to look. The integrand is not strictly of the form f(t) e^(iwt). Any help would be welcome. Thanks Juan From charlesr.harris at gmail.com Fri Jul 29 20:10:25 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jul 2011 18:10:25 -0600 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> References: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Message-ID: On Fri, Jul 29, 2011 at 4:36 PM, Juan Fiol wrote: > Hi, I have to integrate a *highly* oscilatory function. I've been looking > in the literature and found that there some "asymptotic methods" (that work > better when oscillations are stronger and cancel most of the integrand), > some methods derived from filon's method and other called Levin method. I've > had taken quick looks into several mathematical papers on the subject but It > will probably take me more than one month (and may be much more) to > understand the subject and put it into a routine. Does anybody know if there > is anything of the sort implemented in scipy? Otherwise, I would appreciate > if I get advice for a more "practical" place where to look. > The integrand is not strictly of the form f(t) e^(iwt). > Any help would be welcome. Thanks > How oscillatory is *highly* oscillatory? Does the function have any particular form? Where does it come from? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua.stults at gmail.com Fri Jul 29 22:20:19 2011 From: joshua.stults at gmail.com (Joshua Stults) Date: Fri, 29 Jul 2011 22:20:19 -0400 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> References: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Message-ID: On Fri, Jul 29, 2011 at 6:36 PM, Juan Fiol wrote: > Hi, I have to integrate a *highly* oscilatory function. This came up recently on the Maxima list, maybe you'll find the linked paper helpful: http://www.math.utexas.edu/pipermail/maxima/2011/025577.html -- Joshua Stults Website: variousconsequences.com Hackerspace: daytondiode.org From pav at iki.fi Sat Jul 30 06:36:34 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 30 Jul 2011 10:36:34 +0000 (UTC) Subject: [SciPy-User] Sparse eigensystem instability on OS X Lion References: <08D3BBA9-A043-47FA-8817-B5FAA0BA7EB9@physics.harvard.edu> Message-ID: On Fri, 29 Jul 2011 11:20:18 -0400, Chris Laumann wrote: > I've recently upgraded to Lion and, because it broke my python setup, > reinstalled scipy from two sources -- Enthought's full 64 bit latest > distro (Scipy 0.9.0 and Numpy 1.6.0) and Chris Fonnesbeck's Scipy > Superpack (with the latest dev versions from a few weeks ago). I'm now > finding a bunch of numerical instability in symmetric sparse eigensystem > solutions in scipy.sparse.linalg.eigsh. These didn't exist before the > upgrade when I was using Snow Leopard and scipy 0.8.0 (eigen_symmetric > instead of eigsh). Your matrix is not hermitian. data = array([ 16., 18., -4., -2., -4., 25., -2., 4., -2., 17., 16., 18.]) indices = array([0, 1, 2, 6, 1, 2, 4, 3, 2, 4, 5, 6], dtype='int32') indptr = array([ 0, 1, 4, 7, 8, 10, 11, 12], dtype='int32') ham1 = scipy.sparse.csr_matrix((data, indices, indptr)) A = ham1.todense() print abs(A - A.T.conj()).max() # -> 2.0 From timmichelsen at gmx-topmail.de Sat Jul 30 07:40:59 2011 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Sat, 30 Jul 2011 13:40:59 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: >> Since most of my code for meteorological data evaluations is based on >> it, I would be happy to receive infomation on the conclusion and how I >> need to adjust my code to upkeep with new developments. > > When it gets to that point I'd be happy to help (including looking at > some of your existing code and data). In short my process goes like: * QC of incoming measurements data * visualisation and statistics (basics, disribution analysis) * reporting * back & forcasting with other (modeled) data * preparation of result data sets When it comes to QC I would need: * check on missing dates (i.e. failure of aquisitition equipment) * check on double dates (= failure of data logger) * data integrity and plausability tests with certain filters/flags All these need to be reported on: * data recovery * invalid data by filter/flag type So far, I have been using the masked arrays. Mainly because it is heaily used in the time series scikit and transfering masks from on array to another is quite once you learned the basics. Would you work these items out in pandas, as well? P.S. 
Your presentation "Time series analysis in Python with statsmodels" is really cool and has shown me good aspects about the HP filters Regards, Timmie From timmichelsen at gmx-topmail.de Sat Jul 30 11:50:12 2011 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Sat, 30 Jul 2011 17:50:12 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: >>> It just so happens that Wes' use cases (from my understanding) are >>> basically the same as mine (finance, etc). So from my own selfish point >>> of view, the idea of pandas swallowing up the timeseries module and >>> incorporating its functionality sounds kind of nice since that would >>> give ME (and probably most of the people that work in the finance >>> domain) >> >> I think that it is really great if the different packages doing time >> series analysis unite. It will probably give better packages technically, >> and there is a lot of value to the community in such work. > > I agree. I already have 50% or more of the features in > scikits.timeseries, so this gets back to my fragmentation argument > (users being stuck with a confusing choice between multiple > libraries). Let's make it happen! So what needs to be done to move things forward? Do we need to draw up a roadmap? A table with functions that respond to common use cases in natual science, computing, and economics? From wesmckinn at gmail.com Sat Jul 30 12:20:52 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 30 Jul 2011 12:20:52 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Sat, Jul 30, 2011 at 11:50 AM, Tim Michelsen wrote: >>>> It just so happens that Wes' use cases (from my understanding) are >>>> basically the same as mine (finance, etc). So from my own selfish point >>>> of view, the idea of pandas swallowing up the timeseries module and >>>> incorporating its functionality sounds kind of nice since that would >>>> give ME (and probably most of the people that work in the finance >>>> domain) >>> >>> I think that it is really great if the different packages doing time >>> series analysis unite. It will probably give better packages technically, >>> and there is a lot of value to the community in such work. >> >> I agree. I already have 50% or more of the features in >> scikits.timeseries, so this gets back to my fragmentation argument >> (users being stuck with a confusing choice between multiple >> libraries). Let's make it happen! > So what needs to be done to move things forward? > Do we need to draw up a roadmap? > A table with functions that respond to common use cases in natual > science, computing, and economics? > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Having a place to collect concrete use cases (like your list from the prior e-mail, but with illustrative code snippets) would be good. 
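To seed that list: one of the QC items above (missing and doubled timestamps) can already be sketched with plain numpy and datetime, independent of which library ends up owning the time series class. A rough illustration with made-up logger data at a nominal 15 min spacing:

import numpy as np
from datetime import datetime, timedelta

step = timedelta(minutes=15)
# made-up logger timestamps, already parsed to datetime
stamps = [datetime(2011, 7, 1) + i * step for i in range(8)]
stamps.insert(3, stamps[3])   # simulate a doubled record
del stamps[6]                 # simulate a missing record

gaps = np.diff(np.array(stamps))
print(np.nonzero(gaps == timedelta(0))[0])   # positions of duplicate dates
print(np.nonzero(gaps > step)[0])            # positions followed by a data gap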
You're welcome to start doing it here: https://github.com/wesm/pandas/wiki A good place to start, which I can do when I have some time, would be to start moving the scikits.timeseries code into pandas. There are several key components - Date and DateArray stuff, frequency implementations - masked array time series implementations (record array and not) - plotting - reporting, moving window functions, etc. We need to evaluate Date/DateArray as they relate to numpy.datetime64 and see what can be done. I haven't looked closely but I'm not sure if all the convenient attribute access stuff (day, month, day_of_week, weekday, etc.) is available in NumPy yet. I suspect it would be reasonably straightforward to wrap DateArray so it can be an Index for a pandas object. I won't have much time for this until mid-August, but a couple days' hacking should get most of the pieces into place. I guess we can just keep around the masked array classes for legacy API support and for feature completeness. - Wes From ahig321 at gmail.com Fri Jul 29 20:41:01 2011 From: ahig321 at gmail.com (Adam Higuera) Date: Fri, 29 Jul 2011 20:41:01 -0400 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: References: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Message-ID: If you can write the function in the form e^(i M h(t)), which, you obviously can, just take h(t) = -i/M log(f(t)), where M is large, there are some methods you can use that involve finding zeros in the derivate of h(t), and a few other asymptotic methods. Asymptotic methods aren't the sort of thing you do with SciPy, though. They'd be the sort of thing you'd do with Mathematica/Maple, or with pen and paper. -Adam On Fri, Jul 29, 2011 at 8:10 PM, Charles R Harris wrote: > > > On Fri, Jul 29, 2011 at 4:36 PM, Juan Fiol wrote: > >> Hi, I have to integrate a *highly* oscilatory function. I've been looking >> in the literature and found that there some "asymptotic methods" (that work >> better when oscillations are stronger and cancel most of the integrand), >> some methods derived from filon's method and other called Levin method. I've >> had taken quick looks into several mathematical papers on the subject but It >> will probably take me more than one month (and may be much more) to >> understand the subject and put it into a routine. Does anybody know if there >> is anything of the sort implemented in scipy? Otherwise, I would appreciate >> if I get advice for a more "practical" place where to look. >> The integrand is not strictly of the form f(t) e^(iwt). >> Any help would be welcome. Thanks >> > > How oscillatory is *highly* oscillatory? Does the function have any > particular form? Where does it come from? > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu.lacage at alcmeon.com Sat Jul 30 16:39:40 2011 From: mathieu.lacage at alcmeon.com (mathieu lacage) Date: Sat, 30 Jul 2011 22:39:40 +0200 Subject: [SciPy-User] creating a view of an array Message-ID: hi, Let's say I have a big large array: a = numpy.empty((10000*1000,10)) and I want to create a view of that array to be able to process a subset of its data without making a copy. One column: b = a[:,1] Two adjacent columns: b = a[:,1:2] Can I do the same (no memory allocation) for two columns that are not adjacent ? i.e. 
if I want to create an array that is a view for the 1st and 3rd columns. Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanleeuwen.martin at gmail.com Sat Jul 30 17:48:48 2011 From: vanleeuwen.martin at gmail.com (Martin van Leeuwen) Date: Sat, 30 Jul 2011 14:48:48 -0700 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: Hi Mathieu, You can index using a tuple too, like: a[:,(0,2)] That way you can index into any columns. Martin 2011/7/30 mathieu lacage : > hi, > > Let's say I have a big large array: > > a = numpy.empty((10000*1000,10)) > > and I want to create a view of that array to be able to process a subset of > its data without making a copy. > > One column: > b = a[:,1] > > Two adjacent columns: > b = a[:,1:2] > > Can I do the same (no memory allocation) for two columns that are not > adjacent ? i.e. if I want to create an array that is a view for the 1st and > 3rd columns. > > > Mathieu > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From zachary.pincus at yale.edu Sat Jul 30 17:58:15 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 30 Jul 2011 17:58:15 -0400 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: <8E9DE6E3-B685-49C8-90AA-CDA1A7197AA8@yale.edu> Hi, Indexing like a[:,(0,2)] creates a copy. (The logic isn't smart enough to determine if the index tuple is regular enough that it might be able to represent it as a view...) import numpy a = numpy.zeros((1000,1000)) b = a[:,:3:2] # This is what you want b.base is a # True c = a[:,(0,2)] # Makes a copy c.base is a # False Pretty much any simple slice will just create a view. (Right, experts?) In addition, you could muck with the numpy.ndarray constructor and pass base, offset, dtype, and strides parameters to make a new view on an old array that does things that normal slicing couldn't. (Like zero-length strides, etc.) Zach On Jul 30, 2011, at 5:48 PM, Martin van Leeuwen wrote: > Hi Mathieu, > > You can index using a tuple too, like: > > a[:,(0,2)] > > That way you can index into any columns. > > Martin > > 2011/7/30 mathieu lacage : >> hi, >> >> Let's say I have a big large array: >> >> a = numpy.empty((10000*1000,10)) >> >> and I want to create a view of that array to be able to process a subset of >> its data without making a copy. >> >> One column: >> b = a[:,1] >> >> Two adjacent columns: >> b = a[:,1:2] >> >> Can I do the same (no memory allocation) for two columns that are not >> adjacent ? i.e. if I want to create an array that is a view for the 1st and >> 3rd columns. >> >> >> Mathieu >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From jkington at wisc.edu Sat Jul 30 18:02:33 2011 From: jkington at wisc.edu (Joe Kington) Date: Sat, 30 Jul 2011 14:02:33 -0800 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: On Sat, Jul 30, 2011 at 1:48 PM, Martin van Leeuwen < vanleeuwen.martin at gmail.com> wrote: > Hi Mathieu, > > You can index using a tuple too, like: > > a[:,(0,2)] > > That way you can index into any columns. 
> Just to clarify: Using a list or tuple for indexing is "fancy" indexing and returns a copy, not a view. To illustrate the difference: import numpy as np original = np.zeros((5,3)) normal_indexing = original[:,:2] fancy_indexing = original[:,(0,2)] normal_indexing[0] = 5 fancy_indexing[0] = 600 print original So, modifying the "fancy_indexing" version doesn't modify the original (and is a copy of the original, using more memory). However, modifying the "normal_indexing" version (which is a view of the same memory as the original) _does_ modify the original. I believe the OP was specifically wanting a view. (And I don't think it's directly possible, though you can write a simple wrapper class to do it.) Cheers, -Joe > > Martin > > 2011/7/30 mathieu lacage : > > hi, > > > > Let's say I have a big large array: > > > > a = numpy.empty((10000*1000,10)) > > > > and I want to create a view of that array to be able to process a subset > of > > its data without making a copy. > > > > One column: > > b = a[:,1] > > > > Two adjacent columns: > > b = a[:,1:2] > > > > Can I do the same (no memory allocation) for two columns that are not > > adjacent ? i.e. if I want to create an array that is a view for the 1st > and > > 3rd columns. > > > > > > Mathieu > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Sun Jul 31 00:59:53 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Sun, 31 Jul 2011 00:59:53 -0400 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: <1312088393.2992.6.camel@SietchTabr> Skipper, By any chance, do you know anyone who might be able to review the f2py parts of my prospective code contribution to scipy.linalg.qr? The pull request can be viewed here: https://github.com/scipy/scipy/pull/44 -- Collin -------------- next part -------------- An embedded message was scrubbed... From: Skipper Seabold Subject: Re: [SciPy-User] generic_flapack.pyf and geqp3 Date: Mon, 11 Jul 2011 23:17:06 -0500 Size: 6027 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From mathieu.lacage at alcmeon.com Sun Jul 31 02:53:59 2011 From: mathieu.lacage at alcmeon.com (mathieu lacage) Date: Sun, 31 Jul 2011 08:53:59 +0200 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 12:02 AM, Joe Kington wrote: > > a[:,(0,2)] >> > Indeed. It "just" creates a copy :/ > >> That way you can index into any columns. >> > You can index using a tuple too, like: > > Just to clarify: > > Using a list or tuple for indexing is "fancy" indexing and returns a copy, > not a view. > yes. > > I believe the OP was specifically wanting a view. (And I don't think it's > directly possible, though you can write a simple wrapper class to do it.) > With some creative use of index offsets, I came up with the following for two arbitrary columns in an array: # create big array a = numpy.empty((10000000,10)) # extract columns 2 and 4 (no copy !) 
b = a[::,2:5:2] This trick will obviously not work for three columns unless they are equally spaced. So, I wonder if there are other tricks I could use to build an array where the stride is irregular (say, extract columns 2, 4, and 5 from the above array without creating a copy) Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From fiolj at yahoo.com Sun Jul 31 03:16:38 2011 From: fiolj at yahoo.com (Juan Fiol) Date: Sun, 31 Jul 2011 00:16:38 -0700 (PDT) Subject: [SciPy-User] integral of oscillatory functions Message-ID: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> Thanks for all the answers. They were? very helpful. In general, seems that there is no other way that to tackle the specific problem with some analytical work. I'l look more deeply into that, and let you know if something interesting pops out. Joshua, thanks for the link. I think that the method at what the document refers is the one I've mentioned as "asymptotic". I'd already skimmed over most of the papers cited there but as I said only the surface. I think that this document has an amenable enough presentation that will make it useful to give a try. Adam: I'll look further into the asymptotic methods. The problem is that the oscillatory part is complicated enough to make it painful to go through the analytical work. I do not plan to do the work in scipy/numpy. I am trying to solve the problem in python but need then to adapt the solution to one of my fortran programs. Charles: I apologize for not being specific enough. The integrand itself is quite complicated. Moreover its form may change.? I am attaching a short pdf with my current definitions, but in python would be something as # Auxiliary functions and definitions # These values will be changing w0= 0.4 dw= 0.1 w=np.array([w0,w0+dw, w0-dw]) A=3. B=4. Omega= B - (A*(a/np.square(w))).sum() def h(t): ? return ((a/w)*(np.cos(w*t))).sum() + C def G(t): ? return (k_A*(a/np.square(w))*np.sin(w*t)).sum() # function to integrate def integrand(t): ? return np.exp(1j* G(t)) * np.exp(1j* Omega*t) * f(t) --- On Sat, 7/30/11, Charles R Harris wrote: From: Charles R Harris Subject: Re: [SciPy-User] integral of oscillatory functions To: fiolj at yahoo.com, "SciPy Users List" Date: Saturday, July 30, 2011, 1:10 AM On Fri, Jul 29, 2011 at 4:36 PM, Juan Fiol wrote: Hi, I have to integrate a *highly* oscilatory function. I've been looking in the literature and found that there some "asymptotic methods" (that work better when oscillations are stronger and cancel most of the integrand), some methods derived from filon's method and other called Levin method. I've had taken quick looks into several mathematical papers on the subject but It will probably take me more than one month (and may be much more) to understand the subject and put it into a routine. Does anybody know if there is anything of the sort implemented in scipy? Otherwise, I would appreciate if I get advice for a more "practical" place where to look. The integrand is not strictly of the form f(t) e^(iwt). Any help would be welcome. Thanks How oscillatory is *highly* oscillatory? Does the function have any particular form? Where does it come from? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tempo.pdf Type: application/pdf Size: 65680 bytes Desc: not available URL: From woiski at gmail.com Sun Jul 31 08:04:10 2011 From: woiski at gmail.com (Emanuel Woiski) Date: Sun, 31 Jul 2011 09:04:10 -0300 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> References: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> Message-ID: 2011/7/31 Juan Fiol > Thanks for all the answers. They were very helpful. In general, seems that > there is no other way that to tackle the specific problem with some > analytical work. I'l look more deeply into that, and let you know if > something interesting pops out. > > Joshua, thanks for the link. I think that the method at what the document > refers is the one I've mentioned as "asymptotic". I'd already skimmed over > most of the papers cited there but as I said only the surface. I think that > this document has an amenable enough presentation that will make it useful > to give a try. > > Adam: I'll look further into the asymptotic methods. The problem is that > the oscillatory part is complicated enough to make it painful to go through > the analytical work. I do not plan to do the work in scipy/numpy. I am > trying to solve the problem in python but need then to adapt the solution to > one of my fortran programs. > > Charles: I apologize for not being specific enough. The integrand itself is > quite complicated. Moreover its form may change. > I am attaching a short pdf with my current definitions, but in python would > be something as > > # Auxiliary functions and definitions > # These values will be changing > w0= 0.4 > dw= 0.1 > w=np.array([w0,w0+dw, w0-dw]) > A=3. > B=4. > > Omega= B - (A*(a/np.square(w))).sum() > > def h(t): > return ((a/w)*(np.cos(w*t))).sum() + C > > def G(t): > return (k_A*(a/np.square(w))*np.sin(w*t)).sum() > # function to integrate > def integrand(t): > return np.exp(1j* G(t)) * np.exp(1j* Omega*t) * f(t) > > > Upon examination of your equations, I thought you should try mpmath [1] or even better sympy [2], which is an open source Python library for symbolic mathematics, and uses mpmath internally. Another alternative is Sage [3] whose Mission statement is: *Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab*. I'd try sympy first. [1] code.google.com/p/*mpmath/* [2] code.google.com/p/*sympy*/ [3] http://www.sagemath.org/ > > regards woiski -------------- next part -------------- An HTML attachment was scrubbed... URL: From bblais at bryant.edu Sun Jul 31 08:48:00 2011 From: bblais at bryant.edu (Brian Blais) Date: Sun, 31 Jul 2011 08:48:00 -0400 Subject: [SciPy-User] recommendation for saving data Message-ID: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Hello, I was wondering if there are any recommendations for formats for saving scientific data. I am running a simulation, which has many somewhat-indepedent parts which have their own internal state and parameters. I've been using pickle (gzipped) to save the entire object (which contains subobjects, etc...), but it is getting too unwieldy and I think it is time to look for a more robust solution. Ideally I'd like to have something where I can call a save method on the simulation object, and it will call the save methods on all the children, on down the line all saving into one file. It'd also be nice if it were cross-platform, and I could depend on the files being readable into the future for a while. 
Are there any good standards for this? What do you use for saving scientific data? thank you, Brian Blais -- Brian Blais bblais at bryant.edu http://web.bryant.edu/~bblais http://bblais.blogspot.com/ From paul.anton.letnes at gmail.com Sun Jul 31 09:01:37 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 31 Jul 2011 15:01:37 +0200 Subject: [SciPy-User] recommendation for saving data In-Reply-To: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> References: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Message-ID: I would recommend writing and reading hdf5 with h5py (though there are other python packages). I find h5py to be very convenient in python, and the hdf5 library + wrappers exist for C, C++, Fortran90, and Java (and probably more). The hdf5 format is platform independent and processor architecture independent - that's one of their design goals. http://alfven.org/wp/hdf5-for-python/ http://www.hdfgroup.org/HDF5/ Paul On 31. juli 2011, at 14.48, Brian Blais wrote: > Hello, > > I was wondering if there are any recommendations for formats for saving scientific data. I am running a simulation, which has many somewhat-indepedent parts which have their own internal state and parameters. I've been using pickle (gzipped) to save the entire object (which contains subobjects, etc...), but it is getting too unwieldy and I think it is time to look for a more robust solution. Ideally I'd like to have something where I can call a save method on the simulation object, and it will call the save methods on all the children, on down the line all saving into one file. It'd also be nice if it were cross-platform, and I could depend on the files being readable into the future for a while. > > Are there any good standards for this? What do you use for saving scientific data? > > > thank you, > > Brian Blais > > > > -- > Brian Blais > bblais at bryant.edu > http://web.bryant.edu/~bblais > http://bblais.blogspot.com/ > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From joshua.stults at gmail.com Sun Jul 31 09:50:05 2011 From: joshua.stults at gmail.com (Joshua Stults) Date: Sun, 31 Jul 2011 09:50:05 -0400 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> References: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> Message-ID: On Sun, Jul 31, 2011 at 3:16 AM, Juan Fiol wrote: > > Joshua, thanks for the link. I think that the method at what the document refers is the one I've mentioned as "asymptotic". I'd already skimmed over most of the papers cited there but as I said only the surface. I think that this document has an amenable enough presentation that will make it useful to give a try. You're welcome, it was written well enough that even an engineer like me could understand, so I thought it was worth sharing ; - ) You've probably already come across this approach in your lit review, but this set of slides, http://www.newton.ac.uk/programmes/HOP/seminars/070509001.pdf has some interesting things on a *numerical* approach that uses a steepest descent integration path in the complex plane to improve convergence for integrals of oscillatory functions. Would it work for your case? 
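If the oscillation settles down to an essentially fixed frequency at large t, mpmath's quadosc routine may also be worth a quick test before investing in the analytic route. A minimal sketch with a stand-in integrand (not your actual h/G definitions, and a made-up frequency w):

from mpmath import mp, quadosc, cos, inf, pi

mp.dps = 15
w = 3.0                                  # hypothetical dominant frequency
f = lambda t: cos(w*t) / (1 + t**2)      # stand-in for the real integrand
val = quadosc(f, [0, inf], period=2*pi/w)
print(val)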
-- Joshua Stults Website: variousconsequences.com Hackerspace: daytondiode.org From fperez.net at gmail.com Sun Jul 31 13:19:51 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 31 Jul 2011 12:19:51 -0500 Subject: [SciPy-User] [ANN] IPython 0.11 is officially out Message-ID: Hi all, on behalf of the IPython development team, I'm thrilled to announce, after more than two years of development work, the official release of IPython 0.11. This release brings a long list of improvements and new features (along with hopefully few new bugs). We have completely refactored IPython, making it a much more friendly project to participate in by having better separated and organized internals. We hope you will not only use the new tools and libraries, but also join us with new ideas and development. After this very long development effort, we hope to make a few stabilization releases at a quicker pace, where we iron out the kinks in the new APIs and complete some remaining internal cleanup work. We will then make a (long awaited) IPython 1.0 release with these stable APIs. *Downloads* Download links and instructions are at: http://ipython.org/download.html And IPython is also on PyPI: http://pypi.python.org/pypi/ipython Those contain a built version of the HTML docs; if you want pure source downloads with no docs, those are available on github: Tarball: https://github.com/ipython/ipython/tarball/rel-0.11 Zipball: https://github.com/ipython/ipython/zipball/rel-0.11 * Features * Here is a quick listing of the major new features: - Standalone Qt console - High-level parallel computing with ZeroMQ - New model for GUI/plotting support in the terminal - A two-process architecture - Fully refactored internal project structure - Vim integration - Integration into Microsoft Visual Studio - Improved unicode support - Python 3 support - New profile model - SQLite storage for history - New configuration system - Pasting of code with prompts And many more... We closed over 500 tickets, merged over 200 pull requests, and more than 60 people contributed over 2200 commits for the final release. Please see our release notes for the full details on everything about this release: https://github.com/ipython/ipython/zipball/rel-0.11 As usual, if you find any problem, please file a ticket --or even better, a pull request fixing it-- on our github issues site (https://github.com/ipython/ipython/issues/). Many thanks to all who contributed! Fernando, on behalf of the IPython development team. http://ipython.org From ralf.gommers at googlemail.com Sun Jul 31 13:56:49 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 31 Jul 2011 19:56:49 +0200 Subject: [SciPy-User] deconvolution of 1-D signals Message-ID: Hi, For a measured signal that is the convolution of a real signal with a response function, plus measurement noise on top, I want to recover the real signal. Since I know what the response function is and the noise is high-frequency compared to the real signal, a straightforward approach is to smooth the measured signal (or fit a spline to it), then remove the response function by deconvolution. See example code below. Can anyone point me towards code that does the deconvolution efficiently? Perhaps signal.deconvolve would do the trick, but I can't seem to make it work (except for directly on the output of np.convolve(y, window, mode='valid')). 
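(For reference, the clean case does behave as expected; a minimal noise-free sketch using mode='full', where the polynomial division done by signal.deconvolve should undo the convolution:)

import numpy as np
from scipy import signal

x = np.linspace(0, 10, num=201)
y = np.sin(x + np.pi/5)
window = np.ones(11) / 11.
y_conv = np.convolve(y, window, mode='full')
# dividing the full, noise-free convolution by the window recovers y
y_rec, remainder = signal.deconvolve(y_conv, window)
print(np.abs(y_rec - y).max())   # should be close to machine precision here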
Thanks, Ralf import numpy as np from scipy import interpolate, signal import matplotlib.pyplot as plt # Real signal x = np.linspace(0, 10, num=201) y = np.sin(x + np.pi/5) # Noisy signal mode = 'valid' window_len = 11. window = np.ones(window_len) / window_len y_meas = np.convolve(y, window, mode=mode) y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 if mode == 'full': xstep = x[1] - x[0] x_meas = np.concatenate([ \ np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, num=window_len//2), x, np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, num=window_len//2)]) elif mode == 'valid': x_meas = x[window_len//2:-window_len//2+1] elif mode == 'same': x_meas = x # Approximating spline xs = np.linspace(0, 10, num=500) knots = np.array([1, 3, 5, 7, 9]) tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) ys = interpolate.splev(xs, tck, der=0) # Find (low-frequency part of) original signal by deconvolution of smoothed # measured signal and known window. y_deconv = signal.deconvolve(ys, window)[0] #FIXME # Plot all signals fig = plt.figure() ax = fig.add_subplot(111) ax.plot(x, y, 'b-', label="Original signal") ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") ax.plot(xs, ys, 'g-', label="Approximating spline") ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', label="signal.deconvolve result") ax.set_ylim([-1.2, 2]) ax.legend() plt.show() -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jul 31 15:10:55 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Jul 2011 15:10:55 -0400 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 1:56 PM, Ralf Gommers wrote: > Hi, > > For a measured signal that is the convolution of a real signal with a > response function, plus measurement noise on top, I want to recover the real > signal. Since I know what the response function is and the noise is > high-frequency compared to the real signal, a straightforward approach is to > smooth the measured signal (or fit a spline to it), then remove the response > function by deconvolution. See example code below. > > Can anyone point me towards code that does the deconvolution efficiently? > Perhaps signal.deconvolve would do the trick, but I can't seem to make it > work (except for directly on the output of np.convolve(y, window, > mode='valid')). > > Thanks, > Ralf > > > import numpy as np > from scipy import interpolate, signal > import matplotlib.pyplot as plt > > # Real signal > x = np.linspace(0, 10, num=201) > y = np.sin(x + np.pi/5) > > # Noisy signal > mode = 'valid' > window_len = 11. > window = np.ones(window_len) / window_len > y_meas = np.convolve(y, window, mode=mode) > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 > if mode == 'full': > ??? xstep = x[1] - x[0] > ??? x_meas = np.concatenate([ \ > ??????? np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, > num=window_len//2), > ??????? x, > ??????? np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, > num=window_len//2)]) > elif mode == 'valid': > ??? x_meas = x[window_len//2:-window_len//2+1] > elif mode == 'same': > ??? 
x_meas = x > > # Approximating spline > xs = np.linspace(0, 10, num=500) > knots = np.array([1, 3, 5, 7, 9]) > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) > ys = interpolate.splev(xs, tck, der=0) > > # Find (low-frequency part of) original signal by deconvolution of smoothed > # measured signal and known window. > y_deconv = signal.deconvolve(ys, window)[0]? #FIXME > > # Plot all signals > fig = plt.figure() > ax = fig.add_subplot(111) > > ax.plot(x, y, 'b-', label="Original signal") > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") > ax.plot(xs, ys, 'g-', label="Approximating spline") > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', > ??????? label="signal.deconvolve result") > ax.set_ylim([-1.2, 2]) > ax.legend() > > plt.show() signal.deconvolve is essentially signal.lfilter, but I don't quite understand what it does. I changed 2 lines, partly by trial and error and by analogy to ARMA models. I'm not quite sure the following changes are correct, but at least it produces a nice graph instead of deconvolve use lfilter directly y_deconv = signal.lfilter(window, [1.], ys[::-1])[::-1] and center lfiltered window: ax.plot(xs[window.size//2-1:-window.size//2], y_deconv[:-window.size+1], 'k-', If your signal is periodic, then I would go for the fft versions of convolution, and iir filtering. My initial guesses were that there is either something wrong (hidden assumption) about the starting values of signal convolve, or there are problems because of the non-stationarity. But maybe a signal expert knows better. Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jkington at wisc.edu Sun Jul 31 15:21:09 2011 From: jkington at wisc.edu (Joe Kington) Date: Sun, 31 Jul 2011 11:21:09 -0800 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: I'm not a signal processing expert by any means, but this is a standard problem in seismology. The problem is that your "window" has near-zero amplitude at high frequencies, so you're blowing up the high-frequency content of the noisy signal when you divide in the frequency domain. A "water-level" deconvolution is a very simple way around this, and often works well. It also allows you to skip the spline fitting, as it's effectively doing a low-pass filter. Basically, you just: 1) convert to the frequency domain 2) replace any amplitudes below some threshold with that threshold in the signal you're dividing by (the window, in your case) 3) pad the lengths to match 4) divide (the actual deconvolution) 5) convert back to the time domain As a simple implementation (I've left out the various different modes of padding here... This is effectively just mode='same'.) def water_level_decon(ys, window, eps=0.1): yfreq = np.fft.fft(ys) max_amp = yfreq.max() winfreq = np.fft.fft(window) winfreq[winfreq < eps] = eps padded = eps * np.ones_like(yfreq) padded[:winfreq.size] = winfreq newfreq = yfreq / padded newfreq *= max_amp / newfreq.max() return np.fft.ifft(newfreq) Hope that helps a bit. -Joe In most cases, you'll need to adjust the eps parameter to match the level of noise you want to remove. In your particular case On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers wrote: > Hi, > > For a measured signal that is the convolution of a real signal with a > response function, plus measurement noise on top, I want to recover the real > signal. 
Since I know what the response function is and the noise is > high-frequency compared to the real signal, a straightforward approach is to > smooth the measured signal (or fit a spline to it), then remove the response > function by deconvolution. See example code below. > > Can anyone point me towards code that does the deconvolution efficiently? > Perhaps signal.deconvolve would do the trick, but I can't seem to make it > work (except for directly on the output of np.convolve(y, window, > mode='valid')). > > Thanks, > Ralf > > > import numpy as np > from scipy import interpolate, signal > import matplotlib.pyplot as plt > > # Real signal > x = np.linspace(0, 10, num=201) > y = np.sin(x + np.pi/5) > > # Noisy signal > mode = 'valid' > window_len = 11. > window = np.ones(window_len) / window_len > y_meas = np.convolve(y, window, mode=mode) > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 > if mode == 'full': > xstep = x[1] - x[0] > x_meas = np.concatenate([ \ > np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, > num=window_len//2), > x, > np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, > num=window_len//2)]) > elif mode == 'valid': > x_meas = x[window_len//2:-window_len//2+1] > elif mode == 'same': > x_meas = x > > # Approximating spline > xs = np.linspace(0, 10, num=500) > knots = np.array([1, 3, 5, 7, 9]) > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) > ys = interpolate.splev(xs, tck, der=0) > > # Find (low-frequency part of) original signal by deconvolution of smoothed > # measured signal and known window. > y_deconv = signal.deconvolve(ys, window)[0] #FIXME > > # Plot all signals > fig = plt.figure() > ax = fig.add_subplot(111) > > ax.plot(x, y, 'b-', label="Original signal") > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") > ax.plot(xs, ys, 'g-', label="Approximating spline") > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', > label="signal.deconvolve result") > ax.set_ylim([-1.2, 2]) > ax.legend() > > plt.show() > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkington at wisc.edu Sun Jul 31 15:22:53 2011 From: jkington at wisc.edu (Joe Kington) Date: Sun, 31 Jul 2011 11:22:53 -0800 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: Gah, I hit send too soon! The default eps parameter in that function should be more like 1.0e-6 instead of 0.1. You'll generally need to adjust the eps parameter to match the signal-to-noise ratio of the two signals you're deconvolving. Hope it's useful, at any rate. -Joe On Sun, Jul 31, 2011 at 11:21 AM, Joe Kington wrote: > I'm not a signal processing expert by any means, but this is a standard > problem in seismology. > > The problem is that your "window" has near-zero amplitude at high > frequencies, so you're blowing up the high-frequency content of the noisy > signal when you divide in the frequency domain. > > A "water-level" deconvolution is a very simple way around this, and often > works well. It also allows you to skip the spline fitting, as it's > effectively doing a low-pass filter. 
> > Basically, you just: > 1) convert to the frequency domain > 2) replace any amplitudes below some threshold with that threshold in the > signal you're dividing by (the window, in your case) > 3) pad the lengths to match > 4) divide (the actual deconvolution) > 5) convert back to the time domain > > As a simple implementation (I've left out the various different modes of > padding here... This is effectively just mode='same'.) > > def water_level_decon(ys, window, eps=0.1): > yfreq = np.fft.fft(ys) > max_amp = yfreq.max() > > winfreq = np.fft.fft(window) > winfreq[winfreq < eps] = eps > > padded = eps * np.ones_like(yfreq) > padded[:winfreq.size] = winfreq > > newfreq = yfreq / padded > newfreq *= max_amp / newfreq.max() > > return np.fft.ifft(newfreq) > > > Hope that helps a bit. > -Joe > > > In most cases, you'll need to adjust the eps parameter to match the level > of noise you want to remove. In your particular case > On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers > wrote: > >> Hi, >> >> For a measured signal that is the convolution of a real signal with a >> response function, plus measurement noise on top, I want to recover the real >> signal. Since I know what the response function is and the noise is >> high-frequency compared to the real signal, a straightforward approach is to >> smooth the measured signal (or fit a spline to it), then remove the response >> function by deconvolution. See example code below. >> >> Can anyone point me towards code that does the deconvolution efficiently? >> Perhaps signal.deconvolve would do the trick, but I can't seem to make it >> work (except for directly on the output of np.convolve(y, window, >> mode='valid')). >> >> Thanks, >> Ralf >> >> >> import numpy as np >> from scipy import interpolate, signal >> import matplotlib.pyplot as plt >> >> # Real signal >> x = np.linspace(0, 10, num=201) >> y = np.sin(x + np.pi/5) >> >> # Noisy signal >> mode = 'valid' >> window_len = 11. >> window = np.ones(window_len) / window_len >> y_meas = np.convolve(y, window, mode=mode) >> y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >> if mode == 'full': >> xstep = x[1] - x[0] >> x_meas = np.concatenate([ \ >> np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >> num=window_len//2), >> x, >> np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >> num=window_len//2)]) >> elif mode == 'valid': >> x_meas = x[window_len//2:-window_len//2+1] >> elif mode == 'same': >> x_meas = x >> >> # Approximating spline >> xs = np.linspace(0, 10, num=500) >> knots = np.array([1, 3, 5, 7, 9]) >> tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >> ys = interpolate.splev(xs, tck, der=0) >> >> # Find (low-frequency part of) original signal by deconvolution of >> smoothed >> # measured signal and known window. >> y_deconv = signal.deconvolve(ys, window)[0] #FIXME >> >> # Plot all signals >> fig = plt.figure() >> ax = fig.add_subplot(111) >> >> ax.plot(x, y, 'b-', label="Original signal") >> ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >> ax.plot(xs, ys, 'g-', label="Approximating spline") >> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >> label="signal.deconvolve result") >> ax.set_ylim([-1.2, 2]) >> ax.legend() >> >> plt.show() >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sun Jul 31 15:41:51 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 31 Jul 2011 21:41:51 +0200 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 9:10 PM, wrote: > On Sun, Jul 31, 2011 at 1:56 PM, Ralf Gommers > wrote: > > Hi, > > > > For a measured signal that is the convolution of a real signal with a > > response function, plus measurement noise on top, I want to recover the > real > > signal. Since I know what the response function is and the noise is > > high-frequency compared to the real signal, a straightforward approach is > to > > smooth the measured signal (or fit a spline to it), then remove the > response > > function by deconvolution. See example code below. > > > > Can anyone point me towards code that does the deconvolution efficiently? > > Perhaps signal.deconvolve would do the trick, but I can't seem to make it > > work (except for directly on the output of np.convolve(y, window, > > mode='valid')). > > > > Thanks, > > Ralf > > > > > > import numpy as np > > from scipy import interpolate, signal > > import matplotlib.pyplot as plt > > > > # Real signal > > x = np.linspace(0, 10, num=201) > > y = np.sin(x + np.pi/5) > > > > # Noisy signal > > mode = 'valid' > > window_len = 11. > > window = np.ones(window_len) / window_len > > y_meas = np.convolve(y, window, mode=mode) > > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 > > if mode == 'full': > > xstep = x[1] - x[0] > > x_meas = np.concatenate([ \ > > np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, > > num=window_len//2), > > x, > > np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, > > num=window_len//2)]) > > elif mode == 'valid': > > x_meas = x[window_len//2:-window_len//2+1] > > elif mode == 'same': > > x_meas = x > > > > # Approximating spline > > xs = np.linspace(0, 10, num=500) > > knots = np.array([1, 3, 5, 7, 9]) > > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) > > ys = interpolate.splev(xs, tck, der=0) > > > > # Find (low-frequency part of) original signal by deconvolution of > smoothed > > # measured signal and known window. > > y_deconv = signal.deconvolve(ys, window)[0] #FIXME > > > > # Plot all signals > > fig = plt.figure() > > ax = fig.add_subplot(111) > > > > ax.plot(x, y, 'b-', label="Original signal") > > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") > > ax.plot(xs, ys, 'g-', label="Approximating spline") > > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', > > label="signal.deconvolve result") > > ax.set_ylim([-1.2, 2]) > > ax.legend() > > > > plt.show() > > signal.deconvolve is essentially signal.lfilter, but I don't quite > understand what it does. > > I changed 2 lines, partly by trial and error and by analogy to ARMA > models. I'm not quite sure the following changes are correct, but at > least it produces a nice graph > > It doesn't have artifacts, but y_meas is almost identical to the spline, not to the real signal. Not sure what's going wrong there, but it didn't perform a deconvolution. instead of deconvolve use lfilter directly > > y_deconv = signal.lfilter(window, [1.], ys[::-1])[::-1] > > and > > center lfiltered window: > > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv[:-window.size+1], > 'k-', > > If your signal is periodic, then I would go for the fft versions of > convolution, and iir filtering. > The signal is not periodic. 
Ralf My initial guesses were that there is either something wrong (hidden > assumption) about the starting values of signal convolve, or there are > problems because of the non-stationarity. > > But maybe a signal expert knows better. > > Josef > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jul 31 17:22:24 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Jul 2011 17:22:24 -0400 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 3:41 PM, Ralf Gommers wrote: > > > On Sun, Jul 31, 2011 at 9:10 PM, wrote: >> >> On Sun, Jul 31, 2011 at 1:56 PM, Ralf Gommers >> wrote: >> > Hi, >> > >> > For a measured signal that is the convolution of a real signal with a >> > response function, plus measurement noise on top, I want to recover the >> > real >> > signal. Since I know what the response function is and the noise is >> > high-frequency compared to the real signal, a straightforward approach >> > is to >> > smooth the measured signal (or fit a spline to it), then remove the >> > response >> > function by deconvolution. See example code below. >> > >> > Can anyone point me towards code that does the deconvolution >> > efficiently? >> > Perhaps signal.deconvolve would do the trick, but I can't seem to make >> > it >> > work (except for directly on the output of np.convolve(y, window, >> > mode='valid')). >> > >> > Thanks, >> > Ralf >> > >> > >> > import numpy as np >> > from scipy import interpolate, signal >> > import matplotlib.pyplot as plt >> > >> > # Real signal >> > x = np.linspace(0, 10, num=201) >> > y = np.sin(x + np.pi/5) >> > >> > # Noisy signal >> > mode = 'valid' >> > window_len = 11. >> > window = np.ones(window_len) / window_len >> > y_meas = np.convolve(y, window, mode=mode) >> > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >> > if mode == 'full': >> > ??? xstep = x[1] - x[0] >> > ??? x_meas = np.concatenate([ \ >> > ??????? np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >> > num=window_len//2), >> > ??????? x, >> > ??????? np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >> > num=window_len//2)]) >> > elif mode == 'valid': >> > ??? x_meas = x[window_len//2:-window_len//2+1] >> > elif mode == 'same': >> > ??? x_meas = x >> > >> > # Approximating spline >> > xs = np.linspace(0, 10, num=500) >> > knots = np.array([1, 3, 5, 7, 9]) >> > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >> > ys = interpolate.splev(xs, tck, der=0) >> > >> > # Find (low-frequency part of) original signal by deconvolution of >> > smoothed >> > # measured signal and known window. >> > y_deconv = signal.deconvolve(ys, window)[0]? #FIXME >> > >> > # Plot all signals >> > fig = plt.figure() >> > ax = fig.add_subplot(111) >> > >> > ax.plot(x, y, 'b-', label="Original signal") >> > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >> > ax.plot(xs, ys, 'g-', label="Approximating spline") >> > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >> > ??????? 
label="signal.deconvolve result") >> > ax.set_ylim([-1.2, 2]) >> > ax.legend() >> > >> > plt.show() >> >> signal.deconvolve is essentially signal.lfilter, but I don't quite >> understand what it does. >> >> I changed 2 lines, partly by trial and error and by analogy to ARMA >> models. I'm not quite sure the following changes are correct, but at >> least it produces a nice graph >> > It doesn't have artifacts, but y_meas is almost identical to the spline, not > to the real signal. Not sure what's going wrong there, but it didn't perform > a deconvolution. I mixed up numerator and denominator for lfilter > >> instead of deconvolve use lfilter directly >> >> y_deconv = signal.lfilter(window, [1.], ?ys[::-1])[::-1] >> >> and >> >> center lfiltered window: >> >> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv[:-window.size+1], >> 'k-', >> >> If your signal is periodic, then I would go for the fft versions of >> convolution, and iir filtering. > > The signal is not periodic. > > Ralf > >> My initial guesses were that there is either something wrong (hidden >> assumption) about the starting values of signal convolve, or there are >> problems because of the non-stationarity. In terms of IIR filter the way I understand it from the ARMA analogy, your window is not invertible >>> r = np.roots(window) >>> r*r.conj() array([ 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j]) I'm not sure if this can work, but in IIR filters the way I know it, the coefficient for the last observation is normalized to 1. Since all roots are 1, the window cannot be inverted. I think lfilter uses the same assumptions. lfilter has also a funny way to determine initial conditions (zi) which I'm never quite sure how to use. It doesn't matter much with invertible and stationary processes, but in this case, I guess it does. If I change the window to an invertible window window_len = 11. window = np.ones(window_len) / window_len window[0] = 1. then my current version, which should be close to your original version, works, the deconvolved series looks similar to the original series. In either case, I think Joe's answer is more useful, since you can directly manipulate the frequency response. I only used a simple IIR frequency domain filter with an adaptation of fftconvolve. Josef >> >> But maybe a signal expert knows better. >> >> Josef >> >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ralf.gommers at googlemail.com Sun Jul 31 18:05:24 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 1 Aug 2011 00:05:24 +0200 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 9:22 PM, Joe Kington wrote: > Gah, I hit send too soon! > > The default eps parameter in that function should be more like 1.0e-6 > instead of 0.1. > > You'll generally need to adjust the eps parameter to match the > signal-to-noise ratio of the two signals you're deconvolving. > Thanks Joe. 
I had tried something similar, but now I have a name for the method and a confirmation that doing something like that makes sense. I'll play with this idea some more tomorrow. Cheers, Ralf > > Hope it's useful, at any rate. > -Joe > > > On Sun, Jul 31, 2011 at 11:21 AM, Joe Kington wrote: > >> I'm not a signal processing expert by any means, but this is a standard >> problem in seismology. >> >> The problem is that your "window" has near-zero amplitude at high >> frequencies, so you're blowing up the high-frequency content of the noisy >> signal when you divide in the frequency domain. >> >> A "water-level" deconvolution is a very simple way around this, and often >> works well. It also allows you to skip the spline fitting, as it's >> effectively doing a low-pass filter. >> >> Basically, you just: >> 1) convert to the frequency domain >> 2) replace any amplitudes below some threshold with that threshold in the >> signal you're dividing by (the window, in your case) >> 3) pad the lengths to match >> 4) divide (the actual deconvolution) >> 5) convert back to the time domain >> >> As a simple implementation (I've left out the various different modes of >> padding here... This is effectively just mode='same'.) >> >> def water_level_decon(ys, window, eps=0.1): >> yfreq = np.fft.fft(ys) >> max_amp = yfreq.max() >> >> winfreq = np.fft.fft(window) >> winfreq[winfreq < eps] = eps >> >> padded = eps * np.ones_like(yfreq) >> padded[:winfreq.size] = winfreq >> >> newfreq = yfreq / padded >> newfreq *= max_amp / newfreq.max() >> >> return np.fft.ifft(newfreq) >> >> >> Hope that helps a bit. >> -Joe >> >> >> In most cases, you'll need to adjust the eps parameter to match the level >> of noise you want to remove. In your particular case >> On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi, >>> >>> For a measured signal that is the convolution of a real signal with a >>> response function, plus measurement noise on top, I want to recover the real >>> signal. Since I know what the response function is and the noise is >>> high-frequency compared to the real signal, a straightforward approach is to >>> smooth the measured signal (or fit a spline to it), then remove the response >>> function by deconvolution. See example code below. >>> >>> Can anyone point me towards code that does the deconvolution efficiently? >>> Perhaps signal.deconvolve would do the trick, but I can't seem to make it >>> work (except for directly on the output of np.convolve(y, window, >>> mode='valid')). >>> >>> Thanks, >>> Ralf >>> >>> >>> import numpy as np >>> from scipy import interpolate, signal >>> import matplotlib.pyplot as plt >>> >>> # Real signal >>> x = np.linspace(0, 10, num=201) >>> y = np.sin(x + np.pi/5) >>> >>> # Noisy signal >>> mode = 'valid' >>> window_len = 11. 
>>> window = np.ones(window_len) / window_len >>> y_meas = np.convolve(y, window, mode=mode) >>> y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >>> if mode == 'full': >>> xstep = x[1] - x[0] >>> x_meas = np.concatenate([ \ >>> np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >>> num=window_len//2), >>> x, >>> np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >>> num=window_len//2)]) >>> elif mode == 'valid': >>> x_meas = x[window_len//2:-window_len//2+1] >>> elif mode == 'same': >>> x_meas = x >>> >>> # Approximating spline >>> xs = np.linspace(0, 10, num=500) >>> knots = np.array([1, 3, 5, 7, 9]) >>> tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >>> ys = interpolate.splev(xs, tck, der=0) >>> >>> # Find (low-frequency part of) original signal by deconvolution of >>> smoothed >>> # measured signal and known window. >>> y_deconv = signal.deconvolve(ys, window)[0] #FIXME >>> >>> # Plot all signals >>> fig = plt.figure() >>> ax = fig.add_subplot(111) >>> >>> ax.plot(x, y, 'b-', label="Original signal") >>> ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >>> ax.plot(xs, ys, 'g-', label="Approximating spline") >>> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >>> label="signal.deconvolve result") >>> ax.set_ylim([-1.2, 2]) >>> ax.legend() >>> >>> plt.show() >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lou_boog2000 at yahoo.com Sun Jul 31 18:09:33 2011 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Sun, 31 Jul 2011 15:09:33 -0700 (PDT) Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: <1312150173.14383.YahooMailNeo@web34408.mail.mud.yahoo.com> I'm surprised that no one has brought it up, yet, but deconvolution has to be *very carefully* done. ?Look up "ill-posed problems". ?Yes, it can be a filter process, but just dividing by the filter FFT coefficients is dangerous since they approach zero (usually) as the frequency increases. ?That's the ill-posed part and it has to be controlled. ? "Regularization" is the what people call the controlling process. ?It's an extra assumption that one invokes on the deconvolution that limits the problems... if it's done right. ?There are surely lots of books and articles on Regularization and Deconvolution. ?Start with Google and Wikipedia and Google Scholar. ?It's not hard, but it is not a one-step problem, but more like an optimization problem. ? ? -- Lou Pecora, my views are my own. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkington at wisc.edu Sun Jul 31 18:10:24 2011 From: jkington at wisc.edu (Joe Kington) Date: Sun, 31 Jul 2011 14:10:24 -0800 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 2:05 PM, Ralf Gommers wrote: > > > On Sun, Jul 31, 2011 at 9:22 PM, Joe Kington wrote: > >> Gah, I hit send too soon! >> >> The default eps parameter in that function should be more like 1.0e-6 >> instead of 0.1. >> >> You'll generally need to adjust the eps parameter to match the >> signal-to-noise ratio of the two signals you're deconvolving. >> > > Thanks Joe. 
I had tried something similar, but now I have a name for the > method and a confirmation that doing something like that makes sense. I'll > play with this idea some more tomorrow. > For what it's worth, the implementation I showed there is completely wrong. I wasn't thinking clearly when I wrote that out. The same concept should still work, though. The most glaring mistake is that things should be padded before converting to the frequency domain. More like this: def water_level_decon(y_meas, window, eps=0.1): padded = np.zeros_like(y_meas) padded[:window.size] = window yfreq = np.fft.fft(y_meas) winfreq = np.fft.fft(padded) winfreq[winfreq < eps] = eps newfreq = yfreq / winfreq return np.fft.ifft(newfreq) Hope it's useful, anyway. -Joe > > Cheers, > Ralf > > > >> >> Hope it's useful, at any rate. >> -Joe >> >> >> On Sun, Jul 31, 2011 at 11:21 AM, Joe Kington wrote: >> >>> I'm not a signal processing expert by any means, but this is a standard >>> problem in seismology. >>> >>> The problem is that your "window" has near-zero amplitude at high >>> frequencies, so you're blowing up the high-frequency content of the noisy >>> signal when you divide in the frequency domain. >>> >>> A "water-level" deconvolution is a very simple way around this, and often >>> works well. It also allows you to skip the spline fitting, as it's >>> effectively doing a low-pass filter. >>> >>> Basically, you just: >>> 1) convert to the frequency domain >>> 2) replace any amplitudes below some threshold with that threshold in the >>> signal you're dividing by (the window, in your case) >>> 3) pad the lengths to match >>> 4) divide (the actual deconvolution) >>> 5) convert back to the time domain >>> >>> As a simple implementation (I've left out the various different modes of >>> padding here... This is effectively just mode='same'.) >>> >>> def water_level_decon(ys, window, eps=0.1): >>> yfreq = np.fft.fft(ys) >>> max_amp = yfreq.max() >>> >>> winfreq = np.fft.fft(window) >>> winfreq[winfreq < eps] = eps >>> >>> padded = eps * np.ones_like(yfreq) >>> padded[:winfreq.size] = winfreq >>> >>> newfreq = yfreq / padded >>> newfreq *= max_amp / newfreq.max() >>> >>> return np.fft.ifft(newfreq) >>> >>> >>> Hope that helps a bit. >>> -Joe >>> >>> >>> In most cases, you'll need to adjust the eps parameter to match the level >>> of noise you want to remove. In your particular case >>> On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers < >>> ralf.gommers at googlemail.com> wrote: >>> >>>> Hi, >>>> >>>> For a measured signal that is the convolution of a real signal with a >>>> response function, plus measurement noise on top, I want to recover the real >>>> signal. Since I know what the response function is and the noise is >>>> high-frequency compared to the real signal, a straightforward approach is to >>>> smooth the measured signal (or fit a spline to it), then remove the response >>>> function by deconvolution. See example code below. >>>> >>>> Can anyone point me towards code that does the deconvolution >>>> efficiently? Perhaps signal.deconvolve would do the trick, but I can't seem >>>> to make it work (except for directly on the output of np.convolve(y, window, >>>> mode='valid')). >>>> >>>> Thanks, >>>> Ralf >>>> >>>> >>>> import numpy as np >>>> from scipy import interpolate, signal >>>> import matplotlib.pyplot as plt >>>> >>>> # Real signal >>>> x = np.linspace(0, 10, num=201) >>>> y = np.sin(x + np.pi/5) >>>> >>>> # Noisy signal >>>> mode = 'valid' >>>> window_len = 11. 
>>>> window = np.ones(window_len) / window_len >>>> y_meas = np.convolve(y, window, mode=mode) >>>> y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >>>> if mode == 'full': >>>> xstep = x[1] - x[0] >>>> x_meas = np.concatenate([ \ >>>> np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >>>> num=window_len//2), >>>> x, >>>> np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >>>> num=window_len//2)]) >>>> elif mode == 'valid': >>>> x_meas = x[window_len//2:-window_len//2+1] >>>> elif mode == 'same': >>>> x_meas = x >>>> >>>> # Approximating spline >>>> xs = np.linspace(0, 10, num=500) >>>> knots = np.array([1, 3, 5, 7, 9]) >>>> tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >>>> ys = interpolate.splev(xs, tck, der=0) >>>> >>>> # Find (low-frequency part of) original signal by deconvolution of >>>> smoothed >>>> # measured signal and known window. >>>> y_deconv = signal.deconvolve(ys, window)[0] #FIXME >>>> >>>> # Plot all signals >>>> fig = plt.figure() >>>> ax = fig.add_subplot(111) >>>> >>>> ax.plot(x, y, 'b-', label="Original signal") >>>> ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >>>> ax.plot(xs, ys, 'g-', label="Approximating spline") >>>> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >>>> label="signal.deconvolve result") >>>> ax.set_ylim([-1.2, 2]) >>>> ax.legend() >>>> >>>> plt.show() >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srey at asu.edu Sun Jul 31 20:23:25 2011 From: srey at asu.edu (Serge Rey) Date: Sun, 31 Jul 2011 17:23:25 -0700 Subject: [SciPy-User] ANN: PySAL 1.2 Message-ID: Hi all, On behalf of the PySAL development team, I'm happy to announce the official release of PySAL 1.2. PySAL is a library of tools for spatial data analysis and geocomputation written in Python. PySAL 1.2, the third official release of PySAL, brings a number of new features: - Directional (Space-Time) LISA Analytics - LISA Markov Spillover Test - Spatial Markov Homogeneity Tests - Markov Mobillity Indices - Support for a wide variety of spatial weights formats including - AcGIS SWM, Text, DBF - STATA - MATLAB MAT - MatrixMarket MTX - Lotus WK1 - DAT - and others - RTree spatial index - Getis-Ord G statistics for global and local autocorrelation - Optimized conditional randomization for local statistics - Optimized Block/Regime Spatial Weights - Thin Spatial Sparse Weights Class (WSP) along with many smaller enhancements and bug fixes. PySAL modules ------------- - pysal.core ? Core Data Structures and IO - pysal.cg ? Computational Geometry - pysal.esda ? Exploratory Spatial Data Analysis - pysal.inequality ? Spatial Inequality Analysis - pysal.spatial_dynamics ? Spatial Dynamics - pysal.spreg - Regression and Diagnostics - pysal.region ? Spatially Constrained Clustering - pysal.weights ? Spatial Weights - pysal.FileIO ? 
PySAL FileIO: Module for reading and writing various file types in a Pythonic way Downloads -------------- Binary installers and source distributions are available for download at http://code.google.com/p/pysal/downloads/list Documentation ------------- The documentation site is here http://pysal.org/1.2/contents.html Web sites --------- PySAL's home is here http://pysal.org/ The developer's site is here http://code.google.com/p/pysal/ Mailing Lists ------------- Please see the developer's list here http://groups.google.com/group/pysal-dev Help for users is here http://groups.google.com/group/openspace-list Bug reports ----------- To search for or report bugs, please see http://code.google.com/p/pysal/issues/list License information ------------------- See the file "LICENSE.txt" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES. Many thanks to all who contributed! Serge, on behalf of the PySAL development team. -- Sergio (Serge) Rey Professor, School of Geographical Sciences and Urban Planning GeoDa Center for Geospatial Analysis and Computation Arizona State University http://geoplan.asu.edu/rey Editor, International Regional Science Review http://irx.sagepub.com From david_baddeley at yahoo.com.au Sun Jul 31 20:51:28 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Sun, 31 Jul 2011 17:51:28 -0700 (PDT) Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: <1312159888.10394.YahooMailRC@web113403.mail.gq1.yahoo.com> Hi Ralf, I do a reasonable amount of (2 & 3D) deconvolution of microscopy images and the method I use depends quite a lot on the exact properties of the signal. You can usually get away with fft based convolutions even if your signal is not periodic as long as your kernel is significantly smaller than the signal extent. As Joe mentioned, for a noisy signal convolving with the inverse or performing fourier domain division doesn't work as you end up amplifying high frequency noise components. You thus need some form of regularisation. The thresholding of fourier components that Joe suggests does this, but you might also want to explore more sophisticated options, the simplest of which is probably Wiener filtering (http://en.wikipedia.org/wiki/Wiener_deconvolution). If you've got a signal which is constrained to be positive, it's often useful to introduce a positivity constraint on the deconvolution result which generally means you need an iterative algorithm. The choice of algorithm should also depend on the type of noise that is present in your signal - my image data is constrained to be +ve and typically has either Poisson or a mixture of Poisson and Gaussian noise and I use either the Richardson-Lucy or a weighted version of ICTM (Iterative Constrained Tikhonov-Miller) algorithm. I can provide more details of these if required. cheers, David ________________________________ From: Ralf Gommers To: SciPy Users List Sent: Mon, 1 August, 2011 5:56:49 AM Subject: [SciPy-User] deconvolution of 1-D signals Hi, For a measured signal that is the convolution of a real signal with a response function, plus measurement noise on top, I want to recover the real signal. Since I know what the response function is and the noise is high-frequency compared to the real signal, a straightforward approach is to smooth the measured signal (or fit a spline to it), then remove the response function by deconvolution. See example code below. 
Can anyone point me towards code that does the deconvolution efficiently? Perhaps signal.deconvolve would do the trick, but I can't seem to make it work (except for directly on the output of np.convolve(y, window, mode='valid')). Thanks, Ralf import numpy as np from scipy import interpolate, signal import matplotlib.pyplot as plt # Real signal x = np.linspace(0, 10, num=201) y = np.sin(x + np.pi/5) # Noisy signal mode = 'valid' window_len = 11. window = np.ones(window_len) / window_len y_meas = np.convolve(y, window, mode=mode) y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 if mode == 'full': xstep = x[1] - x[0] x_meas = np.concatenate([ \ np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, num=window_len//2), x, np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, num=window_len//2)]) elif mode == 'valid': x_meas = x[window_len//2:-window_len//2+1] elif mode == 'same': x_meas = x # Approximating spline xs = np.linspace(0, 10, num=500) knots = np.array([1, 3, 5, 7, 9]) tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) ys = interpolate.splev(xs, tck, der=0) # Find (low-frequency part of) original signal by deconvolution of smoothed # measured signal and known window. y_deconv = signal.deconvolve(ys, window)[0] #FIXME # Plot all signals fig = plt.figure() ax = fig.add_subplot(111) ax.plot(x, y, 'b-', label="Original signal") ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") ax.plot(xs, ys, 'g-', label="Approximating spline") ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', label="signal.deconvolve result") ax.set_ylim([-1.2, 2]) ax.legend() plt.show() -------------- next part -------------- An HTML attachment was scrubbed... URL:
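For completeness, here is a minimal sketch of the Wiener deconvolution that David Baddeley points to above, applied to the thread's running example. This is not code from the thread: the function name wiener_deconvolve is made up for illustration, a flat noise spectrum is assumed, and the constant noise-to-signal power ratio nsr is an assumption that has to be tuned to the data, much like the eps water level in Joe Kington's version.

import numpy as np

def wiener_deconvolve(y_meas, window, nsr=0.05):
    # Frequency-domain Wiener deconvolution with an assumed constant
    # noise-to-signal power ratio `nsr`. As nsr -> 0 this reduces to plain
    # division by the window's transfer function; the nsr term keeps the
    # near-zero |H(f)| frequencies from amplifying the noise (the
    # ill-posedness Lou Pecora warns about).
    n = len(y_meas)
    H = np.fft.rfft(window, n)        # transfer function, zero-padded to n
    Y = np.fft.rfft(y_meas, n)
    G = np.conj(H) / (np.abs(H)**2 + nsr)
    return np.fft.irfft(Y * G, n)

# Usage on the thread's example (mode='same' so the lengths match):
x = np.linspace(0, 10, num=201)
y = np.sin(x + np.pi/5)
window = np.ones(11) / 11.
y_meas = np.convolve(y, window, mode='same') + 0.2 * np.random.rand(x.size) - 0.1

y_est = wiener_deconvolve(y_meas, window, nsr=0.05)
# The window is padded at the origin rather than centred, so the estimate comes
# back circularly shifted by half the window length; roll it back before
# comparing with y (some distortion near the boundaries remains).
y_est = np.roll(y_est, len(window) // 2)

With a larger nsr the result is smoother but more biased; with a smaller one more high-frequency noise leaks through, which is the usual regularisation trade-off.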