From j.reid at mail.cryst.bbk.ac.uk Fri Jul 1 03:42:40 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 01 Jul 2011 08:42:40 +0100 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B5E59.8050203@usi.ch> References: <4E0B58AE.3090706@cs.wisc.edu> <4E0B5E59.8050203@usi.ch> Message-ID: On 29/06/11 18:18, Giovanni Luca Ciampaglia wrote: > Hi, > there are several strategies, depending on your problem. You could use a > surrogate model, like a Gaussian Process, to fit the data (see for > example Higdon et al > http://epubs.siam.org/sisc/resource/1/sjoce3/v26/i2/p448_s1?isAuthorized=no). > I have personally used scikits.learn for GP estimation but there is also > PyMC that should do the same (never tried it). > I can also immodestly recommend my own code for Gaussian processes. It is not based on Markov chain Monte Carlo but rather a maximum likelihood approach: http://sysbio.mrc-bsu.cam.ac.uk/group/index.php/Gaussian_processes_in_python From hhh.guo at gmail.com Fri Jul 1 03:48:13 2011 From: hhh.guo at gmail.com (Ning Guo) Date: Fri, 01 Jul 2011 15:48:13 +0800 Subject: [SciPy-User] Question: scipy.stats.gamma.fit Message-ID: <4E0D7BBD.6030607@gmail.com> Dear scipy-users, I'm using scipy.stats.gamma.fit to fit a set of random variables for gamma distribution. And to validate the results I also use the fitdistr function in R. However the results generated by these two packages are different, i.e. shape parameter and scale parameter for the gamma pdf are different. Though the difference is not large, I'm wondering what causes this difference. I think both of them are using maximum likelihood estimation to fit the function. Best regards! Ning From pgmdevlist at gmail.com Fri Jul 1 05:07:36 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 1 Jul 2011 11:07:36 +0200 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: Message-ID: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> On Jul 1, 2011, at 1:45 AM, David Montgomery wrote: > Hi, > > Using scikits timeseries I can create daily and hourly time series....no prob > > But.... > > I have time series at 15 minutes intervals...this I dont know how to do... > > Can a timeseries array handle 15 min intervals? > Do I use a minute intervals and use mask arrays for the missing minutes? > Also..I can figure out how to create a array at minute intervals. > > So..what is best practice? Any examples? First possibility, you get the latest experimental version of scikits.timeseries on github. There's support for multiple of frequencies (like 15min). If you're not comfortable with tinkering with experimental code, you have several solutions, depending on your problem: 1. You create a minute-freq series and mask 14/15 of the data. Simple but wasteful and problematic if you have a large series. Still, the easiest solution 2. You create a hour-freq series as a 2D array: each column would correspond to the data for one quarter of this hour. That's more compact in terms of memory, but you'll have to jump through some extra hoops if you need to convert the array to another frequency (conversion routines don't really like 2D arrays...) 
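To make Pierre's option 1 concrete, here is a minimal sketch of the "mask 14 out of every 15 minutes" idea using plain numpy masked arrays; the actual series would be wrapped in scikits.timeseries (ts.date_array / ts.time_series), and the sizes and names below (minutes_per_day, readings) are invented purely for illustration:

import numpy as np
import numpy.ma as ma

# One day of quarter-hourly readings, stored on a minute-frequency grid.
minutes_per_day = 24 * 60
readings = np.random.rand(minutes_per_day // 15)   # one value per 15 minutes

# Start with every minute masked, then unmask every 15th slot.
data = ma.masked_all(minutes_per_day)
data[::15] = readings

# 14 out of every 15 slots stay masked: simple, but wasteful for long series.
print("%d valid values out of %d" % (data.count(), data.size))

Option 2 amounts to reshaping the same values into an (hours, 4) array, one column per quarter hour, with the 2D conversion caveats Pierre mentions above.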
From davidmontgomery at gmail.com Fri Jul 1 05:22:00 2011 From: davidmontgomery at gmail.com (David Montgomery) Date: Fri, 1 Jul 2011 19:22:00 +1000 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: Awesoke... for the github version...any docs or an example for creating a 15 min array? On Fri, Jul 1, 2011 at 7:07 PM, Pierre GM wrote: > > On Jul 1, 2011, at 1:45 AM, David Montgomery wrote: > >> Hi, >> >> Using scikits timeseries I can create daily and hourly time series....no prob >> >> But.... >> >> I have time series at 15 minutes intervals...this I dont know how to do... >> >> Can a timeseries array handle 15 min intervals? >> Do I use a minute intervals and use mask arrays for the missing minutes? >> Also..I can figure out how to create a array at minute intervals. >> >> So..what is best practice? ?Any examples? > > First possibility, you get the latest experimental version of scikits.timeseries on github. There's support for multiple of frequencies (like 15min). > If you're not comfortable with tinkering with experimental code, you have several solutions, depending on your problem: > 1. You create a minute-freq series and mask 14/15 of the data. Simple but wasteful and problematic if you have a large series. Still, the easiest solution > 2. You create a hour-freq series as a 2D array: each column would correspond to the data for one quarter of this hour. That's more compact in terms of memory, but you'll have to jump through some extra hoops if you need to convert the array to another frequency (conversion routines don't really like 2D arrays...) > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pgmdevlist at gmail.com Fri Jul 1 06:11:15 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 1 Jul 2011 12:11:15 +0200 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: On Jul 1, 2011, at 11:22 AM, David Montgomery wrote: > Awesoke... > > for the github version...any docs or an example for creating a 15 min array? Use the 'timestep' optional argument in scikits.timeseries.date_array. BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about). Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be some major overhaul in the mid future once Mark W. new datetime dtype will be stable. From member at linkedin.com Fri Jul 1 06:23:40 2011 From: member at linkedin.com (Jelle Feringa via LinkedIn) Date: Fri, 1 Jul 2011 10:23:40 +0000 (UTC) Subject: [SciPy-User] Invitation to connect on LinkedIn Message-ID: <1579967864.3260450.1309515820011.JavaMail.app@ela4-bed32.prod> LinkedIn ------------ Jelle Feringa requested to add you as a connection on LinkedIn: ------------------------------------------ Jose, I'd like to add you to my professional network on LinkedIn. 
- Jelle Accept invitation from Jelle Feringa http://www.linkedin.com/e/-3wy1w2-gpkzwpgn-5n/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I182736195_20/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYMcBYRej4ScPsOe359bTBNsQ9ahRZvbP8PejsVcj8Md3cLrCBxbOYWrSlI/EML_comm_afe/ View invitation from Jelle Feringa http://www.linkedin.com/e/-3wy1w2-gpkzwpgn-5n/Q6WKH0LACopGJkAw_6fSqajo6R7VMvIz/blk/I182736195_20/30OnPkVcjoPdP8UckALqnpPbOYWrSlI/svi/ ------------------------------------------ DID YOU KNOW your LinkedIn profile helps you control your public image when people search for you? Setting your profile as public means your LinkedIn profile will come up when people enter your name in leading search engines. Take control of your image! http://www.linkedin.com/e/-3wy1w2-gpkzwpgn-5n/ewp/inv-22/ -- (c) 2011, LinkedIn Corporation -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgomezdans at gmail.com Fri Jul 1 06:36:48 2011 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Fri, 1 Jul 2011 11:36:48 +0100 Subject: [SciPy-User] Weird error in fmin_l_bfgs_b Message-ID: Hi, I'm getting an error in scipy.optimize.fmin_l_bfgs_b, apparently related to the fortran wrapper. This is strange, because exactly the same problem works well with the TNC solver. I have a function that returns both a scalar value (that will be minimised) and the derivative of the function at that point. The error in the L-BFG-S solver is File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, in fmin_l_bfgs_b isave, dsave) ValueError: failed to initialize intent(inout) array -- input not fortran contiguous My code looks like this: # x0 is the starting point, a 1d array >>> solution, x, info = scipy.optimize.fmin_tnc( cost_function, x0, args=([operators]), bounds=bounds ) # Using fmin_tnc works well, solution is what I expect it to be >> solution, cost, information = scipy.optimize.fmin_l_bfgs_b ( cost_function, solution, bounds=bounds, args=[ operators ], iprint=101 ) 2011-07-01 11:34:24,703 - eoldas.Model - INFO - 46 days, 46 quantised days tnc: Version 1.3, (c) 2002-2003, Jean-Sebastien Roy (js at jeannot.org) tnc: RCS ID: @(#) $Jeannot: tnc.c,v 1.205 2005/01/28 18:27:31 js Exp $ NIT NF F GTG 0 1 1.988301629303336E+02 8.17118991E+06 tnc: fscale = 0.000249879 1 5 1.338514420154698E+01 1.82689516E+04 tnc: fscale = 0.00528464 2 9 9.476573219561992E+00 2.21390020E+04 3 19 6.684083971679802E+00 3.88897225E+03 4 69 6.274247682836059E+00 2.43671753E+03 tnc: |fn-fn-1] = 4.5037e-13 -> convergence 5 120 6.274247682835608E+00 2.43671753E+03 tnc: Converged (|f_n-f_(n-1)| ~= 0) RUNNING THE L-BFGS-B CODE * * * Machine precision = 1.084D-19 N = 46 M = 10 L = -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 X0 = 5.6013D-02 1.1717D-01 1.9201D-01 2.7557D-01 3.7013D-01 4.5702D-01 5.3491D-01 6.0661D-01 6.7624D-01 7.4649D-01 8.0318D-01 8.5203D-01 8.8633D-01 9.0102D-01 8.9914D-01 8.7521D-01 8.2816D-01 7.6529D-01 7.0559D-01 6.5371D-01 6.0520D-01 5.5814D-01 5.0991D-01 4.4783D-01 3.7790D-01 3.0041D-01 2.1894D-01 
1.5147D-01 1.0832D-01 8.3926D-02 6.6473D-02 4.8621D-02 3.2567D-02 2.0086D-02 1.0881D-02 2.4890D-03 8.8000D-04 -4.2729D-03 -4.6658D-03 -5.5940D-03 -4.1690D-03 -1.2577D-02 -2.2529D-02 -2.9114D-02 -1.5938D-02 1.9755D-02 U = 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 1.2000D+00 At X0 0 variables are exactly at the bounds Traceback (most recent call last): File "example_identity.py", line 199, in main ( sys.argv ) File "example_identity.py", line 166, in main solution, cost, information = scipy.optimize.fmin_l_bfgs_b ( cost_function, solution, bounds=bounds, args=[ operators ], iprint=101 ) File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, in fmin_l_bfgs_b isave, dsave) ValueError: failed to initialize intent(inout) array -- input not fortran contiguous Any clues of where to look for issues? Thanks! jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Fri Jul 1 12:52:08 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 1 Jul 2011 12:52:08 -0400 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: On Fri, Jul 1, 2011 at 6:11 AM, Pierre GM wrote: > > On Jul 1, 2011, at 11:22 AM, David Montgomery wrote: > >> Awesoke... >> >> for the github version...any docs or an example for creating a 15 min array? > > Use the 'timestep' optional argument in scikits.timeseries.date_array. > > BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about). > Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be some major overhaul in the mid future once Mark W. new datetime dtype will be stable. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > depending on your data manipulation needs, you could also give pandas a shot-- generating 15-minute date ranges for example is quite simple: In [3]: DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15)) Out[3]: offset: <15 Minutes>, tzinfo: None [2011-07-01 00:00:00, ..., 2011-07-02 00:00:00] length: 97 The date range can be used to conform a time series you loaded from some source: ts.reindex(dr, method='pad') ('pad' a.k.a. "ffill" propagates values forward into holes, optional) I've got some resampling code in the works that would help with e.g. converting 15-minute data into hourly data or that sort of thing but it's in less-than-complete form at the moment so like I said depends on what you need to do. Give me a few weeks on that bit =) best, Wes From johnl at cs.wisc.edu Fri Jul 1 13:11:25 2011 From: johnl at cs.wisc.edu (J. 
David Lee) Date: Fri, 01 Jul 2011 12:11:25 -0500 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B58AE.3090706@cs.wisc.edu> References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: <4E0DFFBD.7050905@cs.wisc.edu> On 06/29/2011 11:54 AM, J. David Lee wrote: > Hello, > > I'm attempting to perform a fit of a model function's output to some > measured data. The model has around 12 parameters, and takes tens of > minutes to run. I have access to a cluster with several thousand > processors that can run the simulations in parallel, so I'm wondering if > there are any algorithms out there that I can use to leverage this > computing power to efficiently solve my problem - that is, besides grid > searches or Monte-Carlo methods. > > Thanks for your help, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I want to thank everyone for their suggestions. I've read through most of the links presented, and am getting a clearer idea of what I need to do. Here's a quick clarification of my problem for those who are interested: I'm running a single-processor plasma simulation modeling an experiment. It has tens or hundreds of parameters, but most are constrained by measurements. For my purposes, the output consists of several x-ray spectra which I am trying to match against measured spectra. I have about 12 or 14 parameters in all that I am changing in order to match the spectra. Each run of the simulation takes a few to a few tens of minutes. I have the ability to run the compiled code on a number of machines, but I can't easily run python scripts on the machines. After some thinking, I'm considering the feasibility of parallelizing the routines in scipy's optimize module. My initial thought is to allow the user to specify a function that would run the objective function on multiple inputs. This would be useful, for example, when performing a simplex shrink, or in numerical gradient / hessian calculations with multiple variables. From my point of view, this would allow me to use a hybrid Monte-Carlo/minimization procedure to look for a global minimum. I'm interested to hear other's opinions on the matter. Thanks again, David From josef.pktd at gmail.com Fri Jul 1 14:34:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 1 Jul 2011 14:34:56 -0400 Subject: [SciPy-User] Question: scipy.stats.gamma.fit In-Reply-To: <4E0D7BBD.6030607@gmail.com> References: <4E0D7BBD.6030607@gmail.com> Message-ID: On Fri, Jul 1, 2011 at 3:48 AM, Ning Guo wrote: > Dear scipy-users, > > I'm using scipy.stats.gamma.fit to fit a set of random variables for > gamma distribution. And to validate the results I also use the fitdistr > function in R. However the results generated by these two packages are > different, i.e. shape parameter and scale parameter for the gamma pdf > are different. Though the difference is not large, I'm wondering what > causes this difference. I think both of them are using maximum > likelihood estimation to fit the function. Do you have an example or a test case? It's difficult to guess what might be different. None of the fit methods are verified against R or tested for correctness. Contributions to the test suite and possible bugfixes would be appreciated. Josef > > Best regards! 
> Ning > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Fri Jul 1 17:46:53 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 1 Jul 2011 23:46:53 +0200 Subject: [SciPy-User] ANN: NumPy 1.6.1 release candidate 2 Message-ID: Hi, I am pleased to announce the availability (only a little later than planned) of the second release candidate of NumPy 1.6.1. This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() If no new problems are reported, the final release will be in one week. Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Fri Jul 1 18:08:31 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 1 Jul 2011 18:08:31 -0400 Subject: [SciPy-User] Weird error in fmin_l_bfgs_b In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 6:36 AM, Jose Gomez-Dans wrote: > Hi, > I'm getting an error in scipy.optimize.fmin_l_bfgs_b, apparently related to > the fortran wrapper. This is strange, because exactly the same problem works > well with the TNC solver. I have a function that returns both a scalar value > (that will be minimised) and the derivative of the function at that point. > The error in the L-BFG-S solver is > File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, > in fmin_l_bfgs_b > ??? isave, dsave) > ValueError: failed to initialize intent(inout) array -- input not fortran > contiguous > > > My code looks like this: > > # x0 is the starting point, a 1d array >>>> solution, x, info = scipy.optimize.fmin_tnc( cost_function, x0, >>>> args=([operators]),? bounds=bounds ) > # Using fmin_tnc works well, solution is what I expect it to be > >>> solution, cost, information = scipy.optimize.fmin_l_bfgs_b ( >>> cost_function, solution, bounds=bounds,? args=[ operators ], iprint=101 ) I've run into this before, though rarely. Is solution from TNC Fortran contiguous? It's written in C though I don't know if it matters. IIUC, it's the only thing that's intent inout and an array. solution.flags If this is indeed the problem, is this something that can be automatically fixed in the python code of fmin_l_bfgs_b beforehand at the expense of a copy? Skipper From hhh.guo at gmail.com Sat Jul 2 01:59:43 2011 From: hhh.guo at gmail.com (Ning Guo) Date: Sat, 02 Jul 2011 13:59:43 +0800 Subject: [SciPy-User] Question: scipy.stats.gamma.fit In-Reply-To: References: <4E0D7BBD.6030607@gmail.com> Message-ID: <4E0EB3CF.9060200@gmail.com> On Saturday, July 02, 2011 02:34 AM, josef.pktd at gmail.com wrote: > On Fri, Jul 1, 2011 at 3:48 AM, Ning Guo wrote: >> Dear scipy-users, >> >> I'm using scipy.stats.gamma.fit to fit a set of random variables for >> gamma distribution. And to validate the results I also use the fitdistr >> function in R. 
However the results generated by these two packages are >> different, i.e. shape parameter and scale parameter for the gamma pdf >> are different. Though the difference is not large, I'm wondering what >> causes this difference. I think both of them are using maximum >> likelihood estimation to fit the function. > Do you have an example or a test case? > It's difficult to guess what might be different. None of the fit > methods are verified against R or tested for correctness. > > Contributions to the test suite and possible bugfixes would be appreciated. > > Josef > >> Best regards! >> Ning >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user Thanks Josef, I simply used scipy.gamma.rvs(0.8,loc=0,scale=1.2,size=1000) to generate random variables, and then used scipy.gamma.fit() to estimate the parameters. To validate, I used fitdistr function in R and gamfit function in GNU Octave. These three packages give different results (fitdistr and gamfit offer closer results). Actually, I am not a statistics guy. I supposed they should generate exactly same results since they all use MLE. But now I see not necessarily exactly the same due to that mathematical implementation of MLE is very complicated. Since results are close to each other more or less, I'll not be bothered by the differences and will choose anyone for the fitting. Best regards! Ning -- Geotechnical Group Department of Civil and Environmental Engineering Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong From mmueller at python-academy.de Sat Jul 2 08:29:31 2011 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Sat, 02 Jul 2011 14:29:31 +0200 Subject: [SciPy-User] PyCon DE 2011 - Call for Proposals extended to July 15, 2011 Message-ID: <4E0F0F2B.40601@python-academy.de> PyCon DE 2011 - Deadline for Proposals extended to July 15, 2011 ================================================================ The deadline for talk proposals is extended to July 15, 2011. You would like to talk about your Python project to the German-speaking Python community? Just submit your proposal within the next two weeks: http://de.pycon.org/2011/speaker/ About PyCon DE 2011 ------------------- The first PyCon DE will be held October 4-9, 2011 in Leipzig, Germany. The conference language will be German. Talks in English are possible. Please contact us for details. The call for proposals is now open. Please submit your talk by June 30, 2011 online. There are two types of talks: standard talks (20 minutes + 5 minutes Q&A) and long talks (45 minutes + 10 minutes Q&A). More details about the call can be found on the PyCon DE website: http://de.pycon.org/2011/Call_for_Papers/ Since the conference language will be German, the call is in German too. PyCon DE 2011 - Neuer Einsendeschluss f?r Vortragsvorschl?ge 15.07.2011 ======================================================================= Noch bis zum 15.7.2011 kann jeder, der sich f?r Python interessiert, einen Vortragsvorschlag f?r die PyCon DE 2011 einreichen. Es gibt nur zwei Bedingungen: das Thema sollte interessant sein und etwas mit Python zu tun haben. 
F?r die erste deutsche Python-Konferenz sind wir an einer breiten Themenpalette interessiert, die das ganze Spektrum der Entwicklung, Nutzung und Wirkung von Python zeigt. M?gliche Themen sind zum Beispiel: * Webanwendungen mit Python * Contentmanagement mit Python * Datenbankanwendungen mit Python * Testen mit Python * Systemintegration mit Python * Python f?r gro?e Systeme * Python im Unternehmensumfeld * Pythonimplementierungen (Jython, IronPython, PyPy, Unladen Swallow und andere) * Python als erste Programmiersprache * Grafische Nutzerschnittstellen (GUIs) * Parallele Programmierung mit Python * Python im wissenschaftlichen Bereich (Bioinformatik, Numerik, Visualisierung und anderes) * Embedded Python * Marketing f?r Python * Python, Open Source und Entwickler-Gemeinschaft * Zuk?nftige Entwicklungen * mehr ... Ihr Themenbereich ist nicht aufgelistet, w?re aber aus Ihrer Sicht f?r die PyCon DE interessant? Kein Problem. Reichen Sie Ihren Vortragsvorschlag einfach ein. Auch wir k?nnen nicht alle Anwendungsbereiche von Python ?berschauen. Vortragstage sind vom 5. bis 7. Oktober 2011. Es gibt zwei Vortragsformate: * Standard-Vortrag -- 20 Minuten Vortrag + 5 Minuten Diskussion * Lang-Vortrag -- 45 Minuten Vortrag + 10 Minuten Diskussion Die Vortragszeit wird strikt eingehalten. Bitte testen Sie die L?nge Ihres Vortrags. Lassen Sie gegebenenfalls ein paar Folien weg. Die Vortragsprache ist Deutsch. In begr?ndeten Ausnahmef?llen k?nnen Vortr?ge auch auf Englisch gehalten werden. Bitte fragen Sie uns dazu. Bitte reichen Sie Ihren Vortrag auf der Konferenz-Webseite http://de.pycon.org bis zum 15.07.2011 ein. Wir entscheiden bis zum 31. Juli 2011 ?ber die Annahme des Vortrags. From jeremy at jeremysanders.net Sat Jul 2 09:53:09 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Sat, 02 Jul 2011 14:53:09 +0100 Subject: [SciPy-User] OT: Re: ANN: Veusz 1.12 - a python-based GUI/scripted scientific plotting package References: <735FFAC9-18D7-4359-A3C0-3F2890B2F045@qwest.net> Message-ID: Jerry wrote: > Veusz appears to be a very capable plotting program. However, I can't use > it because it, like many other plotting programs, lacks a flexible and > easy import mechanism. My basis for comparison is Igor Pro which allows > the user to specify, in an easy-to-use dialog box, the details of the > formatting of the data to be imported. This includes both text and binary > files, along with Excel files. Thanks for the suggestion. I agree veusz needs a friendly import dialog. I'll have a look at the software you suggested to get some ideas... Jeremy From gorkypl at gmail.com Sat Jul 2 11:56:15 2011 From: gorkypl at gmail.com (=?UTF-8?B?UGF3ZcWC?=) Date: Sat, 2 Jul 2011 17:56:15 +0200 Subject: [SciPy-User] [scikits.timeseries] Custom xticks labels Message-ID: Hello, I have a problem with xticks while using scikits.timeseries. While plotting long series of data, the default labels of xticks are month names, and year numbers every 12th tick. I want to change this to something like mm.yy (%m.%y) under every tick. Up to now I experimented with xaxis.set_major_formatter() and TimeSeries_DateFormatter, but none of my methods work. I can do what I want in 'pure' matplotlib (using set_major_formatter and matplotlib.dates.DateFormatter), but I'm unable to achieve this while plotting timeseries using scikits.timeseries module. Can anyone help me? Any working example or a clue? greetings, Pawe? 
Rumian From jaimelozano09 at gmail.com Sun Jul 3 04:00:37 2011 From: jaimelozano09 at gmail.com (Jaime Lozano) Date: Sun, 3 Jul 2011 10:00:37 +0200 Subject: [SciPy-User] 64 bit compilation Message-ID: Hello scipy users, I've installed 64 bit python from source (CC=gcc -m64 ...). Now I'm trying to install scipy python setup.py install but it doesn't work. I know there are 64 bit binaries, but I want to compile it myself. What I am doing wrong? Do I need a special configuration in setup.py? Have a nice day Jaime -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jul 3 05:49:45 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 3 Jul 2011 11:49:45 +0200 Subject: [SciPy-User] 64 bit compilation In-Reply-To: References: Message-ID: On Sun, Jul 3, 2011 at 10:00 AM, Jaime Lozano wrote: > Hello scipy users, > I've installed 64 bit python from source (CC=gcc -m64 ...). Now I'm trying > to install scipy > > python setup.py install > > but it doesn't work. I know there are 64 bit binaries, but I want to > compile it myself. What I am doing wrong? Do I need a special configuration > in setup.py? > > > We need more details to answer you question. OS, compiler versions, build log, etc. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Mon Jul 4 03:25:21 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 4 Jul 2011 09:25:21 +0200 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries In-Reply-To: References: <4ACDF9FF-B2D3-49D3-B4F3-439BDBD59838@gmail.com> Message-ID: <89480CCA-EC18-4C97-9E31-2EE7D38E2453@gmail.com> On Jul 1, 2011, at 6:52 PM, Wes McKinney wrote: > On Fri, Jul 1, 2011 at 6:11 AM, Pierre GM wrote: >> >> On Jul 1, 2011, at 11:22 AM, David Montgomery wrote: >> >>> Awesoke... >>> >>> for the github version...any docs or an example for creating a 15 min array? >> >> Use the 'timestep' optional argument in scikits.timeseries.date_array. >> >> BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about). >> Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be some major overhaul in the mid future once Mark W. new datetime dtype will be stable. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > depending on your data manipulation needs, you could also give pandas > a shot-- generating 15-minute date ranges for example is quite simple: > > In [3]: DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15)) > Out[3]: > > offset: <15 Minutes>, tzinfo: None > [2011-07-01 00:00:00, ..., 2011-07-02 00:00:00] > length: 97 > > The date range can be used to conform a time series you loaded from some source: > > ts.reindex(dr, method='pad') > > ('pad' a.k.a. "ffill" propagates values forward into holes, optional) > > I've got some resampling code in the works that would help with e.g. > converting 15-minute data into hourly data or that sort of thing but > it's in less-than-complete form at the moment so like I said depends > on what you need to do. Give me a few weeks on that bit =) Wes, have a look on the conversion functions we have in scikits.timeseries. It's just a matter of knowing where and how to slice... 
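To make the pandas route concrete, here is a short sketch built around the exact calls Wes quotes above (DateRange, datetools.Minute, reindex with method='pad'); it assumes the mid-2011 pandas API, and the hourly source series (datetools.Hour, the raw variable) is invented purely for the example:

import numpy as np
from pandas import Series, DateRange, datetools

# A full day at 15-minute steps, as in Wes's example (97 timestamps).
dr = DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15))

# An hourly series standing in for whatever was loaded from disk.
hourly = DateRange('7/1/2011', '7/2/2011', offset=datetools.Hour())
raw = Series(np.arange(len(hourly), dtype=float), index=hourly)

# Conform it to the 15-minute grid, forward-filling ('pad') into the holes.
full = raw.reindex(dr, method='pad')
print("%d hourly values -> %d quarter-hourly values" % (len(raw), len(full)))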
From Pierre.RAYBAUT at CEA.FR Tue Jul 5 08:20:26 2011 From: Pierre.RAYBAUT at CEA.FR (Pierre.RAYBAUT at CEA.FR) Date: Tue, 5 Jul 2011 14:20:26 +0200 Subject: [SciPy-User] [ANN] guidata v1.3.2 Message-ID: Hi all, I am pleased to announce that `guidata` v1.3.2 has been released. Note that the project has recently been moved to GoogleCode: http://guidata.googlecode.com Main change since `guidata` v1.3.1: Since this version, guidata is compatible with PyQt4 API #1 and API #2. Please read carefully the coding guidelines which have been recently added to the documentation. Complete changelog is available here: http://code.google.com/p/guidata/wiki/ChangeLog The `guidata` documentation with examples, API reference, etc. is available here: http://packages.python.org/guidata/ Based on the Qt Python binding module PyQt4, guidata is a Python library generating graphical user interfaces for easy dataset editing and display. It also provides helpers and application development tools for PyQt4. guidata also provides the following features: * guidata.qthelpers: PyQt4 helpers * guidata.disthelpers: py2exe helpers * guidata.userconfig: .ini configuration management helpers (based on Python standard module ConfigParser) * guidata.configtools: library/application data management * guidata.gettext_helpers: translation helpers (based on the GNU tool gettext) * guidata.guitest: automatic GUI-based test launcher * guidata.utils: miscelleneous utilities guidata has been successfully tested on GNU/Linux and Windows platforms. Python package index page: http://pypi.python.org/pypi/guidata/ Documentation, screenshots: http://packages.python.org/guidata/ Downloads (source + Python(x,y) plugin): http://guidata.googlecode.com Cheers, Pierre --- Dr. Pierre Raybaut CEA - Commissariat ? l'Energie Atomique et aux Energies Alternatives From Pierre.RAYBAUT at CEA.FR Tue Jul 5 08:21:08 2011 From: Pierre.RAYBAUT at CEA.FR (Pierre.RAYBAUT at CEA.FR) Date: Tue, 5 Jul 2011 14:21:08 +0200 Subject: [SciPy-User] [ANN] guiqwt v2.1.4 Message-ID: Hi all, I am pleased to announce that `guiqwt` v2.1.4 has been released. Based on PyQwt (plotting widgets for PyQt4 graphical user interfaces) and on the scientific modules NumPy and SciPy, guiqwt is a Python library providing efficient 2D data-plotting features (curve/image visualization and related tools) for interactive computing and signal/image processing application development. Complete change log is now available here: http://code.google.com/p/guiqwt/wiki/ChangeLog Documentation with examples, API reference, etc. 
is available here: http://packages.python.org/guiqwt/ This version of `guiqwt` includes a demo software, Sift (for Signal and Image Filtering Tool), based on `guidata` and `guiqwt`: http://packages.python.org/guiqwt/sift.html Windows users may even download the portable version of Sift 0.2.3 to test it without having to install anything: http://code.google.com/p/guiqwt/downloads/detail?name=sift023_portable.zip When compared to the excellent module `matplotlib`, the main advantages of `guiqwt` are: * Performance: see http://packages.python.org/guiqwt/overview.html#performances * Interactivity: see for example http://packages.python.org/guiqwt/_images/plot.png * Powerful signal processing tools: see for example http://packages.python.org/guiqwt/_images/fit.png * Powerful image processing tools: * Real-time contrast adjustment: http://packages.python.org/guiqwt/_images/contrast.png * Cross sections (line/column, averaged and oblique cross sections!): http://packages.python.org/guiqwt/_images/cross_section.png * Arbitrary affine transforms on images: http://packages.python.org/guiqwt/_images/transform.png * Interactive filters: http://packages.python.org/guiqwt/_images/imagefilter.png * Geometrical shapes/Measurement tools: http://packages.python.org/guiqwt/_images/image_plot_tools.png * Perfect integration of `guidata` features for image data editing: http://packages.python.org/guiqwt/_images/simple_window.png But `guiqwt` is more than a plotting library; it also provides: * Framework for signal/image processing application development: see http://packages.python.org/guiqwt/examples.html * And many other features like making executable Windows programs easily (py2exe helpers): see http://packages.python.org/guiqwt/disthelpers.html guiqwt has been successfully tested on GNU/Linux and Windows platforms. Python package index page: http://pypi.python.org/pypi/guiqwt/ Documentation, screenshots: http://packages.python.org/guiqwt/ Downloads (source + Python(x,y) plugin): http://guiqwt.googlecode.com Cheers, Pierre --- Dr. Pierre Raybaut CEA - Commissariat ? l'Energie Atomique et aux Energies Alternatives From josef.pktd at gmail.com Tue Jul 5 11:16:02 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 5 Jul 2011 11:16:02 -0400 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions In-Reply-To: <4DE81898.1070701@molden.no> References: <4DE7B957.2060305@gmail.com> <4DE81898.1070701@molden.no> Message-ID: On Thu, Jun 2, 2011 at 7:11 PM, Sturla Molden wrote: > Den 02.06.2011 18:54, skrev josef.pktd at gmail.com: >> >> If these are the alternatives, then I will stick with rejection sampling. >> I'm not starting to learn the implementation details of simulating >> with MCMC, Metropolis-Hastings or Gibbs, and leave it to the pymc >> developers and to Wes. > > Metropolis-Hastings is a form of rejection sampling. > > It's just a way to reduce the number of rejections, particularly when the sample space is large. > > > > >> rtmvnorm has a big Warning label about the Gibbs sampler, although, >> for MonteCarlo integration, any serial correlation in the sampler >> won't be very relevant. > > You will get serial correlation with MCMC, but remember they are still > samples from the stationary distribution of the Markov chain. You can > still use these samples to compute mean, standard deviation, KDE, > numerical integrals, etc. (a late reply, since I was looking at this again.) Thanks Sturla, I never thought of this distinction in the usage. 
Until now I was mostly interested in simulating stochastic processes, time series, or random samples for regression. In these cases I cannot have any spurious serial correlation. But now that I need more Monte Carlo integration, this will be useful. Josef > > > Sturla > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Wed Jul 6 01:30:36 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 6 Jul 2011 07:30:36 +0200 Subject: [SciPy-User] [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: Message-ID: On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen wrote: > In article , > Ralf Gommers wrote: > > > https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > > Will there be a Mac binary for 32-bit pythons (one that is compatible > with older versions of MacOS X)? At present I only see a 64-bit > 10.6-only version. > > > Yes there will be for the final release (10.4-10.6 compatible). I can't create those on my own computer, so sometimes I don't make them for RCs. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From klonuo at gmail.com Wed Jul 6 15:59:36 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Wed, 6 Jul 2011 21:59:36 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers Message-ID: Debian 6.0.1 Python 2.6.7 ifort 11.1 20100806 icc 11.1 20091130 mkl 10.3 After successfully building latest stable NumPy (1.6.1RC2) I tried to build SciPy (0.9.0) following two sources: http://new.scipy.org/building/linux.html http://blog.sun.tc/2010/11/numpy-and-scipy-with-intel-mkl-on-linux.html But I get this error: ------------------------------------------------ ... building 'qhull' library compiling C sources C compiler: icc -fPIC compile options: '-I/usr/include/python2.6 -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -c' icc: scipy/spatial/qhull/src/poly.c scipy/spatial/qhull/src/qhull_a.h(106): warning #77: this declaration has no storage class or type specifier template ^ scipy/spatial/qhull/src/qhull_a.h(106): error: expected a ";" template ^ scipy/spatial/qhull/src/qhull_a.h(115): warning #12: parsing restarts here after previous syntax error void qh_qhull(void); ^ compilation aborted for scipy/spatial/qhull/src/poly.c (code 2) scipy/spatial/qhull/src/qhull_a.h(106): warning #77: this declaration has no storage class or type specifier template ^ scipy/spatial/qhull/src/qhull_a.h(106): error: expected a ";" template ^ scipy/spatial/qhull/src/qhull_a.h(115): warning #12: parsing restarts here after previous syntax error void qh_qhull(void); ^ compilation aborted for scipy/spatial/qhull/src/poly.c (code 2) error: Command "icc -fPIC -I/usr/include/python2.6 -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -c scipy/spatial/qhull/src/poly.c -o build/temp.linux-i686-2.6/scipy/spatial/qhull/src/poly.o" failed with exit status 2 ------------------------------------------------ As Google did not return anything about this error, and I just installed Debian two days ago, without any Linux experience, I might miss something. Or what can cause this error? 
Thanks From dbrown at ucar.edu Wed Jul 6 18:14:03 2011 From: dbrown at ucar.edu (David Brown) Date: Wed, 6 Jul 2011 16:14:03 -0600 Subject: [SciPy-User] Call for papers: AMS Jan 22-26, 2012 Message-ID: <583E9BD8-07AE-4C04-B52E-782631E060C5@ucar.edu> I would like to call to the attention of the SciPy community the following call for papers: Second Symposium on Advances in Modeling and Analysis Using Python, 22?26 January 2012, New Orleans, Louisiana The Second Symposium on Advances in Modeling and Analysis Using Python, sponsored by the American Meteorological Society, will be held 22?26 January 2012, as part of the 92nd AMS Annual Meeting in New Orleans, Louisiana. Preliminary programs, registration, hotel, and general information will be posted on the AMS Web site (http://www.ametsoc.org/meet/annual/) in late-September 2011. The application of object-oriented programming and other advances in computer science to the atmospheric and oceanic sciences has in turn led to advances in modeling and analysis tools and methods. This symposium focuses on applications of the open-source language Python and seeks to disseminate advances using Python in the atmospheric and oceanic sciences, as well as grow the earth sciences Python community. Papers describing Python work in applications, methodologies, and package development in all areas of meteorology, climatology, oceanography, and space sciences are welcome, including (but not limited to): modeling, time series analysis, air quality, satellite data processing, in-situ data analysis, GIS, Python as a software integration platform, visualization, gridding, model intercomparison, and very large (petabyte) dataset manipulation and access. The $95 abstract fee includes the submission of your abstract, the posting of your extended abstract, and the uploading and recording of your presentation which will be archived on the AMS Web site. Please submit your abstract electronically via the Web by 1 August 2011 (refer to the AMS Web page athttp://www.ametsoc.org/meet/online_submit.html.) An abstract fee of $95 (payable by credit card or purchase order) is charged at the time of submission (refundable only if abstract is not accepted). Authors of accepted presentations will be notified via e-mail by late-September 2011. All extended abstracts are to be submitted electronically and will be available on-line via the Web, Instructions for formatting extended abstracts will be posted on the AMS Web site. Manuscripts (up to 3MB) must be submitted electronically by 22 February 2012. All abstracts, extended abstracts and presentations will be available on the AMS Web site at no cost. For additional information, please contact the program chairperson, Johnny Lin, Physics Department, North Park University (jlin at northpark.edu). (5/11) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Wed Jul 6 22:36:32 2011 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 Jul 2011 11:36:32 +0900 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 4:59 AM, Klonuo Umom wrote: > Debian 6.0.1 > Python 2.6.7 > > ifort 11.1 20100806 > icc 11.1 20091130 > mkl 10.3 > > > After successfully building latest stable NumPy (1.6.1RC2) I tried to > build SciPy (0.9.0) following two sources: > > http://new.scipy.org/building/linux.html > http://blog.sun.tc/2010/11/numpy-and-scipy-with-intel-mkl-on-linux.html > > But I get this error: > > ------------------------------------------------ > > ... > > building 'qhull' library > compiling C sources > C compiler: icc -fPIC > > compile options: '-I/usr/include/python2.6 > -I/usr/local/lib/python2.6/dist-packages/numpy/core/include > -I/usr/local/lib/python2.6/dist-packages/numpy/core/include -c' > icc: scipy/spatial/qhull/src/poly.c > scipy/spatial/qhull/src/qhull_a.h(106): warning #77: this declaration > has no storage class or type specifier > ?template This error is quite strange: the header uses templates (even though it is C code !!), only in the case of intel compiler. That makes no sense to me. From a quick look at the code, this template definition is only a trick to avoid warning for unused code, but that can be implemented in ICC specific manner without resorting to C++. I think you can safely define QHULL_UNUSED macro to (x) only: template inline void qhullUnused(T &x) { (void)x; } # define QHULL_UNUSED(x) qhullUnused(x); becomes: #define QHULL_UNUSED(x) (x) in qhull_a.h (line 106) cheers, David From wkerzendorf at googlemail.com Thu Jul 7 03:51:28 2011 From: wkerzendorf at googlemail.com (Wolfgang Kerzendorf) Date: Thu, 07 Jul 2011 09:51:28 +0200 Subject: [SciPy-User] reading in files with fixed with format Message-ID: <4E156580.6010309@gmail.com> Dear all, I have a couple of data files that were written with fortran at a fixed with. That means its tabular data which might not have spaces (it is just specified how many characters each field has and what type it is). Is there anything to read that with scipy and or numpy? Cheers Wolfgang From klonuo at gmail.com Thu Jul 7 04:24:21 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Thu, 7 Jul 2011 10:24:21 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: Thanks David, that did the trick :) I'm C illiterate, and I did search before mailing here, for templates syntax and usage, but explanations were very confusing to me unfortunately, even thou there were examples exactly for 'template ' However after a while, building was again interrupted on 'scipy/sparse/sparsetools' I uploaded part of the log here: http://dl.dropbox.com/u/6735093/scipy.txt which I also compressed and attached, just in case Thanks for your assistance On Thu, Jul 7, 2011 at 4:36 AM, David Cournapeau wrote: > I think you can safely define QHULL_UNUSED macro to (x) only: > > template > inline void qhullUnused(T &x) { (void)x; } > # ?define QHULL_UNUSED(x) qhullUnused(x); > > becomes: > > #define QHULL_UNUSED(x) (x) > > in qhull_a.h (line 106) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: scipy.log.7z Type: application/x-7z-compressed Size: 1365 bytes Desc: not available URL: From athanastasiou at gmail.com Thu Jul 7 06:57:16 2011 From: athanastasiou at gmail.com (Athanasios Anastasiou) Date: Thu, 7 Jul 2011 11:57:16 +0100 Subject: [SciPy-User] reading in files with fixed with format In-Reply-To: <4E156580.6010309@gmail.com> References: <4E156580.6010309@gmail.com> Message-ID: Hello Wolfgang My understanding is that you have binary data packed in a file as a series of numbers (like a vector of floats or doubles) or as a series of structures (like a set of records) as opposed to fixed width text data. To read and convert binary data you can use the struct module (http://docs.python.org/library/struct.html). The documentation contains some representative examples. You just have to make sure that unpack's fmt "structure definition" matches exactly the fortran definition in terms of data types. Also, please note that fortran stores arrays in column major format (rather than row major format which is the default for C). You might have to take this into account depending on the way the data was written to your files originally. All the best Athanasios On Thu, Jul 7, 2011 at 8:51 AM, Wolfgang Kerzendorf wrote: > Dear all, > > I have a couple of data files that were written with fortran at a fixed > with. That means its tabular data which might not have spaces (it is > just specified how many characters each field has and what type it is). > Is there anything to read that with scipy and or numpy? > > Cheers > ? ? Wolfgang > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From collinstocks at gmail.com Wed Jul 6 14:17:56 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Wed, 6 Jul 2011 11:17:56 -0700 (PDT) Subject: [SciPy-User] linalg.decomp code contribution Message-ID: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Over the past several years, there have been various requests for the linalg.decomp.qr() function to (optionally) implement qr with pivots. I have written a solution which, unfortunately, requires cvxopt, since there is no wrapper within scipy for lapack.geqp3(). However, this should be an effective starting place for anyone who wants to complete the job. If this does eventually end up in scipy, I would appreciate a little mention, though! :) The code can also be found at: http://pastebin.com/HFJdQ67H I have also extended the "mode" keyword slightly, and have removed the "econ" keyword, as it is deprecated anyway (assuming that I have the very latest release on my own machine, which is not all that likely...). ######################################################## import cvxopt from cvxopt import lapack def qr(a, overwrite_a=0, lwork=None, mode='qr'): """Compute QR decomposition of a matrix. Calculate the decomposition :lm:`A = Q R` where Q is unitary/ orthogonal and R upper triangular. Parameters ---------- a : array, shape (M, N) Matrix to be decomposed overwrite_a : boolean Whether data in a is overwritten (may improve performance) lwork : integer Work array size, lwork >= a.shape[1]. If None or -1, an optimal size is computed. mode : {'qr', 'r', 'qrp'} Determines what information is to be returned: either both Q and R or only R, or Q and R and P, a permutation matrix. Any of these can be combined with 'economic' using the '+' sign as a separator. 
Economic mode means the following: Compute the economy-size QR decomposition, making shapes of Q and R (M, K) and (K, N) instead of (M,M) and (M,N). K=min(M,N). Returns ------- (if mode == 'qr') Q : double or complex array, shape (M, M) or (M, K) for econ==True (for any mode) R : double or complex array, shape (M, N) or (K, N) for econ==True Size K = min(M, N) Raises LinAlgError if decomposition fails Notes ----- This is an interface to the LAPACK routines dgeqrf, zgeqrf, dorgqr, and zungqr. Examples -------- >>> from scipy import random, linalg, dot >>> a = random.randn(9, 6) >>> q, r = linalg.qr(a) >>> allclose(a, dot(q, r)) True >>> q.shape, r.shape ((9, 9), (9, 6)) >>> r2 = linalg.qr(a, mode='r') >>> allclose(r, r2) >>> q3, r3 = linalg.qr(a, econ=True) >>> q3.shape, r3.shape ((9, 6), (6, 6)) """ mode = mode.split("+") if "economic" in mode: econ = True else: econ = False a1 = asarray_chkfinite(a) if len(a1.shape) != 2: raise ValueError("expected 2D array") M, N = a1.shape overwrite_a = overwrite_a or (_datanotshared(a1,a)) if 'qrp' in mode: qr = cvxopt.matrix(np.asarray(a1, dtype = float)) tau = cvxopt.matrix(np.zeros(min(M, N), dtype = float)) jpvt = cvxopt.matrix(np.zeros(N, dtype = int)) lapack.geqp3(qr, jpvt, tau) qr = np.asarray(qr) tau = np.asarray(tau) jpvt = (np.asarray(jpvt) - 1).flatten() else: geqrf, = get_lapack_funcs(('geqrf',),(a1,)) if lwork is None or lwork == -1: # get optimal work array qr,tau,work,info = geqrf(a1,lwork=-1,overwrite_a=1) lwork = work[0] qr,tau,work,info = geqrf(a1,lwork=lwork,overwrite_a=overwrite_a) if info<0: raise ValueError("illegal value in %-th argument of internal geqrf" % -info) if not econ or M References: <4E156580.6010309@gmail.com> Message-ID: <20110707085140.GB26058@poincare.pc.linmpi.mpg.de> The function numpy.genfromtxt reads text files into arrays. There is an example on how to deal with fixed-width columns using the delimiter argument in the docstring and in the I/O chapter of the user guide: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#the-delimiter-argument Miguel On Thu, Jul 07, 2011 at 09:51:28AM +0200, Wolfgang Kerzendorf wrote: > Dear all, > > I have a couple of data files that were written with fortran at a fixed > with. That means its tabular data which might not have spaces (it is > just specified how many characters each field has and what type it is). > Is there anything to read that with scipy and or numpy? > > Cheers > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From gnurser at gmail.com Thu Jul 7 09:43:40 2011 From: gnurser at gmail.com (George Nurser) Date: Thu, 7 Jul 2011 14:43:40 +0100 Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX Message-ID: Hi, Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 I just pulled the latest version of scipy master; git log gives top commit as e86854bfab64a16c026106ccdf90761328b83272 scipy.test('10') gives a segfault. 
First few lines of the Apple DiagnosticReport are: Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x00000002b9e72320 Crashed Thread: 0 Dispatch queue: com.apple.main-thread Thread 0 Crashed: Dispatch queue: com.apple.main-thread 0 libBLAS.dylib 0x00007fff87b9d010 ATL_zdotc_xp0yp0aXbX + 48 1 _iterative.so 0x0000000103b9cf8c zgmresrevcom_ + 1980 2 _iterative.so 0x0000000103b876eb f2py_rout__iterative_zgmresrevcom + 1563 (_iterativemodule.c:5041) 3 org.python.python 0x000000010000c9f2 PyObject_Call + 98 The whole of the segfault trace is attached as scipy.segfault --George Nurser Output of scipy.test('10') below: Running unit tests for scipy NumPy version 1.6.1rc2 NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy SciPy version 0.10.0.dev SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy Python version 2.7.1 (r271:86882M, Nov 30 2010, 10:35:34) [GCC 4.2.1 (Apple Inc. build 5664)] nose version 0.11.3 ............................................................................................................................................................................................................................K............................................................................................................/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py:678: UserWarning: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=7). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) ....../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py:609: UserWarning: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots. warnings.warn(message) .......................K..K....../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1920: RuntimeWarning: invalid value encountered in absolute return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) ................................................................................................................................................................................................................................................................................................................................................................................................................./Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/wavfile.py:31: WavFileWarning: Unfamiliar format bytes warnings.warn("Unfamiliar format bytes", WavFileWarning) /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/wavfile.py:121: WavFileWarning: chunk not understood warnings.warn("chunk not understood", WavFileWarning) ................................................................................./Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/utils.py:139: DeprecationWarning: `get_blas_funcs` is deprecated! 
warnings.warn(depdoc, DeprecationWarning) ..............................................................................................................................................SSSSSS......SSSSSS......SSSS............................................................................S................................................................................................................................................................................................................................................K.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:259: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) ../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:75: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) ................F...F..F.....FF.EF..FEF.E.E...E........E...EF.F.F.FEE..E..E...................................................................F.......E.EFFFE.F..EF.E...EE...FEE...F.F..FEEF...EE..E............................................................EEEFFEEESegmentation fault -------------- next part -------------- A non-text attachment was scrubbed... Name: scipy.segfault Type: application/octet-stream Size: 59579 bytes Desc: not available URL: From pav at iki.fi Thu Jul 7 13:59:53 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 7 Jul 2011 17:59:53 +0000 (UTC) Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX References: Message-ID: On Thu, 07 Jul 2011 14:43:40 +0100, George Nurser wrote: > Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org > (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 Known issue: http://projects.scipy.org/scipy/ticket/1472 From pav at iki.fi Thu Jul 7 14:09:05 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 7 Jul 2011 18:09:05 +0000 (UTC) Subject: [SciPy-User] linalg.decomp code contribution References: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Message-ID: Hi, On Wed, 06 Jul 2011 11:17:56 -0700, Collin Stocks wrote: > Over the past several years, there have been various requests for the > linalg.decomp.qr() function to (optionally) implement qr with pivots. 
I > have written a solution which, unfortunately, requires cvxopt, since > there is no wrapper within scipy for lapack.geqp3(). However, this > should be an effective starting place for anyone who wants to complete > the job. If this does eventually end up in scipy, I would appreciate a > little mention, though! :) I'd suggest creating an enhancement ticket on the Scipy Trac (if a related ticket does not already exist), and attaching your code to it: http://projects.scipy.org/scipy/ The reason is that if there is no-one who can spare time to look at this immediately, then there is a high likelihood that your mail will be lost and forgotten in the mailing list traffic. Cheers, Pauli From pav at iki.fi Thu Jul 7 15:08:14 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 7 Jul 2011 19:08:14 +0000 (UTC) Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX References: Message-ID: On Thu, 07 Jul 2011 14:43:40 +0100, George Nurser wrote: > Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org > (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 > > I just pulled the latest version of scipy master; git log gives top > commit as e86854bfab64a16c026106ccdf90761328b83272 Some possible fixes went in. Can you try again with the current Git master? From collinstocks at gmail.com Thu Jul 7 17:42:59 2011 From: collinstocks at gmail.com (collinstocks at gmail.com) Date: Thu, 7 Jul 2011 17:42:59 -0400 Subject: [SciPy-User] linalg.decomp code contribution In-Reply-To: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> References: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Message-ID: I realize that no one has responded to my post yet, and I hate to double post like this, but this update is rather important. The code I attached previously contains a bug (well, makes a bug in cvxopt apparent, actually...) whereby creating new matrices in cvxopt from numpy arrays can sometimes cause a segmentation fault or memory access violation (or something...Windows doesn't really seem to give enough information to differentiate between different types of failures) under certain circumstances on Windows XP. I have fixed my code and attached it. I also changed some abbreviations in my code ("np") to the full name ("numpy") to be more consistent with the surrounding code. -- Collin On Wed, Jul 6, 2011 at 14:17, Collin Stocks wrote: > Over the past several years, there have been various requests for the > linalg.decomp.qr() function to (optionally) implement qr with pivots. > I have written a solution which, unfortunately, requires cvxopt, since > there is no wrapper within scipy for lapack.geqp3(). However, this > should be an effective starting place for anyone who wants to complete > the job. If this does eventually end up in scipy, I would appreciate a > little mention, though! :) > > The code can also be found at: http://pastebin.com/HFJdQ67H > > I have also extended the "mode" keyword slightly, and have removed the > "econ" keyword, as it is deprecated anyway (assuming that I have the > very latest release on my own machine, which is not all that > likely...). > > ######################################################## > import cvxopt > from cvxopt import lapack > > def qr(a, overwrite_a=0, lwork=None, mode='qr'): > """Compute QR decomposition of a matrix. > > Calculate the decomposition :lm:`A = Q R` where Q is unitary/ > orthogonal > and R upper triangular. 
> > Parameters > ---------- > a : array, shape (M, N) > Matrix to be decomposed > overwrite_a : boolean > Whether data in a is overwritten (may improve performance) > lwork : integer > Work array size, lwork >= a.shape[1]. If None or -1, an > optimal size > is computed. > mode : {'qr', 'r', 'qrp'} > Determines what information is to be returned: either both Q > and R > or only R, or Q and R and P, a permutation matrix. Any of > these can > be combined with 'economic' using the '+' sign as a separator. > Economic mode means the following: > Compute the economy-size QR decomposition, making shapes > of Q and R (M, K) and (K, N) instead of (M,M) and (M,N). > K=min(M,N). > > Returns > ------- > (if mode == 'qr') > Q : double or complex array, shape (M, M) or (M, K) for econ==True > > (for any mode) > R : double or complex array, shape (M, N) or (K, N) for econ==True > Size K = min(M, N) > > Raises LinAlgError if decomposition fails > > Notes > ----- > This is an interface to the LAPACK routines dgeqrf, zgeqrf, > dorgqr, and zungqr. > > Examples > -------- > >>> from scipy import random, linalg, dot > >>> a = random.randn(9, 6) > >>> q, r = linalg.qr(a) > >>> allclose(a, dot(q, r)) > True > >>> q.shape, r.shape > ((9, 9), (9, 6)) > > >>> r2 = linalg.qr(a, mode='r') > >>> allclose(r, r2) > > >>> q3, r3 = linalg.qr(a, econ=True) > >>> q3.shape, r3.shape > ((9, 6), (6, 6)) > > """ > mode = mode.split("+") > if "economic" in mode: > econ = True > else: > econ = False > > a1 = asarray_chkfinite(a) > if len(a1.shape) != 2: > raise ValueError("expected 2D array") > M, N = a1.shape > overwrite_a = overwrite_a or (_datanotshared(a1,a)) > > if 'qrp' in mode: > qr = cvxopt.matrix(np.asarray(a1, dtype = float)) > > tau = cvxopt.matrix(np.zeros(min(M, N), dtype = float)) > jpvt = cvxopt.matrix(np.zeros(N, dtype = int)) > > lapack.geqp3(qr, jpvt, tau) > > qr = np.asarray(qr) > tau = np.asarray(tau) > jpvt = (np.asarray(jpvt) - 1).flatten() > else: > geqrf, = get_lapack_funcs(('geqrf',),(a1,)) > if lwork is None or lwork == -1: > # get optimal work array > qr,tau,work,info = geqrf(a1,lwork=-1,overwrite_a=1) > lwork = work[0] > > qr,tau,work,info = > geqrf(a1,lwork=lwork,overwrite_a=overwrite_a) > if info<0: > raise ValueError("illegal value in %-th argument of > internal geqrf" > % -info) > > if not econ or M R = basic.triu(qr) > else: > R = basic.triu(qr[0:N,0:N]) > > if 'r' in mode: > return R > > if find_best_lapack_type((a1,))[0]=='s' or > find_best_lapack_type((a1,))[0]=='d': > gor_un_gqr, = get_lapack_funcs(('orgqr',),(qr,)) > else: > gor_un_gqr, = get_lapack_funcs(('ungqr',),(qr,)) > > if M # get optimal work array > Q,work,info = gor_un_gqr(qr[:,0:M],tau,lwork=-1,overwrite_a=1) > lwork = work[0] > Q,work,info = gor_un_gqr(qr[:, > 0:M],tau,lwork=lwork,overwrite_a=1) > elif econ: > # get optimal work array > Q,work,info = gor_un_gqr(qr,tau,lwork=-1,overwrite_a=1) > lwork = work[0] > Q,work,info = gor_un_gqr(qr,tau,lwork=lwork,overwrite_a=1) > else: > t = qr.dtype.char > qqr = numpy.empty((M,M),dtype=t) > qqr[:,0:N]=qr > # get optimal work array > Q,work,info = gor_un_gqr(qqr,tau,lwork=-1,overwrite_a=1) > lwork = work[0] > Q,work,info = gor_un_gqr(qqr,tau,lwork=lwork,overwrite_a=1) > > if info < 0: > raise ValueError("illegal value in %-th argument of internal > gorgqr" > % -info) > > if 'qrp' in mode: > return Q, R, jpvt > > return Q, R > ######################################################## -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: qr.py Type: application/octet-stream Size: 4209 bytes Desc: not available URL: From collinstocks at gmail.com Thu Jul 7 17:47:36 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Thu, 7 Jul 2011 14:47:36 -0700 (PDT) Subject: [SciPy-User] linalg.decomp code contribution In-Reply-To: References: <234ee791-7a3a-4d8e-8a89-0796bb503b9d@l18g2000yql.googlegroups.com> Message-ID: <78bc40e3-9c29-4b56-9a40-aaeb3866612f@a10g2000vbz.googlegroups.com> Oh, thank you. For some reason, I did not see this before replying to the thread again. I will take a look at that. Thanks, Collin On Jul 7, 2:09?pm, Pauli Virtanen wrote: > Hi, > > On Wed, 06 Jul 2011 11:17:56 -0700, Collin Stocks wrote: > > Over the past several years, there have been various requests for the > > linalg.decomp.qr() function to (optionally) implement qr with pivots. I > > have written a solution which, unfortunately, requires cvxopt, since > > there is no wrapper within scipy for lapack.geqp3(). However, this > > should be an effective starting place for anyone who wants to complete > > the job. If this does eventually end up in scipy, I would appreciate a > > little mention, though! :) > > I'd suggest creating an enhancement ticket on the Scipy Trac (if a related > ticket does not already exist), and attaching your code to it: > > ? ?http://projects.scipy.org/scipy/ > > The reason is that if there is no-one who can spare time to look at this > immediately, then there is a high likelihood that your mail will be lost > and forgotten in the mailing list traffic. > > Cheers, > Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From gnurser at gmail.com Fri Jul 8 08:07:15 2011 From: gnurser at gmail.com (George Nurser) Date: Fri, 8 Jul 2011 13:07:15 +0100 Subject: [SciPy-User] Today's master branch of Scipy test('10') segfaulting on OSX In-Reply-To: References: Message-ID: Hi Pauli, I've given the revised code a try. Still fails, but some progress. Now fails at zgmresrevcom_ + 1990 instead of zgmresrevcom_ + 1980 Apple ProblemReport: Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x00000002b9e72320 Crashed Thread: 0 Dispatch queue: com.apple.main-thread Thread 0 Crashed: Dispatch queue: com.apple.main-thread 0 libBLAS.dylib 0x00007fff87b9d010 ATL_zdotc_xp0yp0aXbX + 48 1 _iterative.so 0x0000000103b9cf56 zgmresrevcom_ + 1990 2 _iterative.so 0x0000000103b876ab f2py_rout__iterative_zgmresrevcom + 1563 (_iterativemodule.c:5041) On 7 July 2011 20:08, Pauli Virtanen wrote: > On Thu, 07 Jul 2011 14:43:40 +0100, George Nurser wrote: >> Running OSX 10.6.7, python 2.7.1 32/64 bit intel from python.org >> (compiled with gcc-4.2) , gfortran 4.2, numpy 1.6.1rc2 >> >> I just pulled the latest version of scipy master; ?git log gives top >> commit as e86854bfab64a16c026106ccdf90761328b83272 > > Some possible fixes went in. Can you try again with the current Git master? 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From klonuo at gmail.com Fri Jul 8 15:59:38 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Fri, 8 Jul 2011 21:59:38 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: I found similar problem here: http://comments.gmane.org/gmane.comp.python.scientific.user/25843 but changing compiler to 'icpc' did not solve this problem, which is: /usr/include/c++/4.6/bits/stl_uninitialized.h(225): error: identifier "__is_trivial" is undefined or more here: http://dl.dropbox.com/u/6735093/scipy.txt Does anyone have some idea what might be wrong? Thanks From jsseabold at gmail.com Fri Jul 8 18:41:19 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 8 Jul 2011 18:41:19 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? Message-ID: A ticket was filed [1] for ttest_ind (same issue with ttest_rel and ttest_1samp) in the case of identical means and no variance. Same means, no variance d1 = np.ones(10) d2 = np.array([1,1.]) stats.ttest_ind(d1,d2) (1.0, 0.34089313230206009) Different means, no variance d1 = np.array([ 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.]) d2 = np.array([ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]) stats.ttest_ind(d1,d2) (inf, 0.0) The first result doesn't make sense. In the code there are conflicting notes (with each other and what the code does) for catching this https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 I think defining t = 0/0 to be 0 is the least wrong thing to do, but certainly not t = 0/0 as 1, which gives an arbitrary p-value depending on sample sizes. Is there an accepted definition for this case? Does returning (nan, 1.0) make more sense? Skipper [1] http://projects.scipy.org/scipy/ticket/1475 From josef.pktd at gmail.com Fri Jul 8 18:51:56 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 8 Jul 2011 18:51:56 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold wrote: > A ticket was filed [1] for ttest_ind (same issue with ttest_rel and > ttest_1samp) in the case of identical means and no variance. > > Same means, no variance > > d1 = np.ones(10) > d2 = np.array([1,1.]) > stats.ttest_ind(d1,d2) > (1.0, 0.34089313230206009) > > Different means, no variance > > d1 = np.array([ 5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5.]) > d2 = np.array([ 2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2.]) > stats.ttest_ind(d1,d2) > (inf, 0.0) > > The first result doesn't make sense. In the code there are conflicting > notes (with each other and what the code does) for catching this > > https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 > https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 > https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 > > I think defining t = 0/0 to be 0 is the least wrong thing to do, but > certainly not t = 0/0 as 1, which gives an arbitrary p-value depending > on sample sizes. Is there an accepted definition for this case? Does > returning (nan, 1.0) make more sense? 
> > Skipper > > [1] http://projects.scipy.org/scipy/ticket/1475 scipy dev mailing list "changes to stats t-tests" Dec 20, 2008 for the original change. If anyone finds a justification for the 0/0 case, .... Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Fri Jul 8 19:06:42 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 8 Jul 2011 19:06:42 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 6:51 PM, wrote: > On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold wrote: >> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and >> ttest_1samp) in the case of identical means and no variance. >> >> Same means, no variance >> >> d1 = np.ones(10) >> d2 = np.array([1,1.]) >> stats.ttest_ind(d1,d2) >> (1.0, 0.34089313230206009) >> >> Different means, no variance >> >> d1 = np.array([ 5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5.]) >> d2 = np.array([ 2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2.]) >> stats.ttest_ind(d1,d2) >> (inf, 0.0) >> >> The first result doesn't make sense. In the code there are conflicting >> notes (with each other and what the code does) for catching this >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 >> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 >> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 >> >> I think defining t = 0/0 to be 0 is the least wrong thing to do, but >> certainly not t = 0/0 as 1, which gives an arbitrary p-value depending >> on sample sizes. Is there an accepted definition for this case? Does >> returning (nan, 1.0) make more sense? >> >> Skipper >> >> [1] http://projects.scipy.org/scipy/ticket/1475 > > scipy dev mailing list "changes to stats t-tests" Dec 20, 2008 for the > original change. > > If anyone finds a justification for the 0/0 case, .... > I have the same intuition as your initial thought. Setting it to 1 *seems* aribitrary. I'd have to think more now than I have time for any justification though. Apologies for not searching and making noise instead, Skipper From josef.pktd at gmail.com Fri Jul 8 19:19:55 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 8 Jul 2011 19:19:55 -0400 Subject: [SciPy-User] Bug t-test for identical means with no variance? In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 7:06 PM, Skipper Seabold wrote: > On Fri, Jul 8, 2011 at 6:51 PM, ? wrote: >> On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold wrote: >>> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and >>> ttest_1samp) in the case of identical means and no variance. >>> >>> Same means, no variance >>> >>> d1 = np.ones(10) >>> d2 = np.array([1,1.]) >>> stats.ttest_ind(d1,d2) >>> (1.0, 0.34089313230206009) >>> >>> Different means, no variance >>> >>> d1 = np.array([ 5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5., ?5.]) >>> d2 = np.array([ 2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2., ?2.]) >>> stats.ttest_ind(d1,d2) >>> (inf, 0.0) >>> >>> The first result doesn't make sense. 
In the code there are conflicting >>> notes (with each other and what the code does) for catching this >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873 >>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963 >>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044 >>> >>> I think defining t = 0/0 to be 0 is the least wrong thing to do, but >>> certainly not t = 0/0 as 1, which gives an arbitrary p-value depending >>> on sample sizes. Is there an accepted definition for this case? Does >>> returning (nan, 1.0) make more sense? >>> >>> Skipper >>> >>> [1] http://projects.scipy.org/scipy/ticket/1475 >> >> scipy dev mailing list "changes to stats t-tests" Dec 20, 2008 for the >> original change. >> >> If anyone finds a justification for the 0/0 case, .... >> > > I have the same intuition as your initial thought. Setting it to 1 > *seems* aribitrary. I'd have to think more now than I have time for > any justification though. > > Apologies for not searching and making noise instead, noise is fine, even better if someone comes up with a real justification. I went in circles in my arguments several times. The main justification is, given that the underlying assumption is that the samples come from normal distributions, the only way we could observe identical values is if the variance goes to zero and we have a degenerate normal distribution. After that I was trying to take different limits, then ... Or suppose the true distribution is normal, but we observe only a (machine precision) discretized sample, .... Or suppose we have a large sample normal approximation to some discrete data, ... (but we only have 5 observations.) Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Sat Jul 9 11:19:55 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 Jul 2011 08:19:55 -0700 Subject: [SciPy-User] Cholesky for semi-definite matrices? Message-ID: Hi all, I've run into a case where it'd be convenient to be able to compute the Cholesky decomposition of a semi-definite matrix. (It's a covariance matrix computed from not-enough samples, so it's positive semi-definite but rank-deficient.) As any schoolchild knows, the Cholesky is well defined for such cases, but I guess that shows why you shouldn't trust schoolchildren, because I guess the standard implementations blow up if you try: In [155]: np.linalg.cholesky([[1, 0], [0, 0]]) LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed Is there an easy way to do this? -- Nathaniel From charlesr.harris at gmail.com Sat Jul 9 13:30:57 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Jul 2011 11:30:57 -0600 Subject: [SciPy-User] Cholesky for semi-definite matrices? In-Reply-To: References: Message-ID: On Sat, Jul 9, 2011 at 9:19 AM, Nathaniel Smith wrote: > Hi all, > > I've run into a case where it'd be convenient to be able to compute > the Cholesky decomposition of a semi-definite matrix. (It's a > covariance matrix computed from not-enough samples, so it's positive > semi-definite but rank-deficient.) 
As any schoolchild knows, the > Cholesky is well defined for such cases, but I guess that shows why > you shouldn't trust schoolchildren, because I guess the standard > implementations blow up if you try: > > In [155]: np.linalg.cholesky([[1, 0], [0, 0]]) > LinAlgError: Matrix is not positive definite - Cholesky > decomposition cannot be computed > > Is there an easy way to do this? > > The positive definite Cholesky algorithm is basically Gauss elimination without pivoting. The semidefinite factorization is generally not unique unless pivoting is introduced to move zero factors to a common spot and if your matrix has errors the idea of semidefinite becomes pretty indefinite anyway. Depending on the use case, you will probably have more success with either eigh or svd. Note that in the svd case the singular values are always positive, but the resulting factorization will not be symmetric when one of the eigenvalues is negative. In the eigh case you can simply set small, or negative, eigenvalues to zero. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Sat Jul 9 13:36:01 2011 From: e.antero.tammi at gmail.com (eat) Date: Sat, 9 Jul 2011 20:36:01 +0300 Subject: [SciPy-User] Cholesky for semi-definite matrices? In-Reply-To: References: Message-ID: Hi, On Sat, Jul 9, 2011 at 6:19 PM, Nathaniel Smith wrote: > Hi all, > > I've run into a case where it'd be convenient to be able to compute > the Cholesky decomposition of a semi-definite matrix. (It's a > covariance matrix computed from not-enough samples, so it's positive > semi-definite but rank-deficient.) As any schoolchild knows, the > Cholesky is well defined for such cases, but I guess that shows why > you shouldn't trust schoolchildren, because I guess the standard > implementations blow up if you try: > > In [155]: np.linalg.cholesky([[1, 0], [0, 0]]) > LinAlgError: Matrix is not positive definite - Cholesky > decomposition cannot be computed > > Is there an easy way to do this? > How bout a slight regularization, like: In []: A Out[]: array([[1, 0], [0, 0]]) In []: A+ 1e-14* eye(2) Out[]: array([[ 1.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 1.00000000e-14]]) In []: L= linalg.cholesky(A+ 1e-14* eye(2)) In []: dot(L, L.T) Out[]: array([[ 1.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 1.00000000e-14]]) My 2 cents, - eat > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.berg14 at gmail.com Sun Jul 10 00:45:37 2011 From: neil.berg14 at gmail.com (Neil Berg) Date: Sat, 9 Jul 2011 21:45:37 -0700 Subject: [SciPy-User] scipy.io.netcdf error Message-ID: <8F3A6AA2-8F17-4B6C-B930-C5BAE2846F1B@gmail.com> Hello scipy community, I recently installed scipy 0.10 through Chris Fonnesbeck's Scipy Superpack script. When trying to read a netCDF file I receive an error message. 
>>> python >>> from scipy.io import netcdf >>> f = netcdf.netcdf_file('/Users/vert/Downloads/sres.nc','r') Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 205, in __init__ self._read() File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 492, in _read self._read_var_array() File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 576, in _read_var_array buffer=mm, offset=begin_, order=0) TypeError: data type ">d8" not understood Has anyone encountered this error and is there a way to cure it? Many thanks, Neil ----------------------------------------------- Mac OS X 10.6.7 Darwin Kernel Version 10.7.4 i386 ----------------------------------------------- From lesserwhirls at gmail.com Sun Jul 10 03:55:18 2011 From: lesserwhirls at gmail.com (Sean Arms) Date: Sun, 10 Jul 2011 01:55:18 -0600 Subject: [SciPy-User] scipy.io.netcdf error In-Reply-To: <8F3A6AA2-8F17-4B6C-B930-C5BAE2846F1B@gmail.com> References: <8F3A6AA2-8F17-4B6C-B930-C5BAE2846F1B@gmail.com> Message-ID: On Jul 9, 2011, at 10:45 PM, Neil Berg wrote: > Hello scipy community, > > I recently installed scipy 0.10 through Chris Fonnesbeck's Scipy Superpack script. When trying to read a netCDF file I receive an error message. > >>>> python >>>> from scipy.io import netcdf >>>> f = netcdf.netcdf_file('/Users/vert/Downloads/sres.nc','r') > Traceback (most recent call last): > File "", line 1, in > File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 205, in __init__ > self._read() > File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 492, in _read > self._read_var_array() > File "/Library/Python/2.6/site-packages/scipy-0.10.0.dev_20110629-py2.6-macosx-10.6-universal.egg/scipy/io/netcdf.py", line 576, in _read_var_array > buffer=mm, offset=begin_, order=0) > TypeError: data type ">d8" not understood > > Has anyone encountered this error and is there a way to cure it? > Hi Neil, I recently ran into this with PyDap. In the latest dev branch of Numpy, >d8 does not appear to be a valid dtype. That said, I did a little digging and noticed that in Numpy 1.4, the >d8 dtype is automatically changed to >f8, but this is done silently. It was a one line fix in PyDap, but I don't know the story as to when things changed in Numpy. 
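You can see what numpy is objecting to directly, independent of scipy. A quick check (a sketch, not run against your exact numpy build, so treat the output as what I'd expect rather than a guarantee):

>>> import numpy as np
>>> np.dtype('>f8')    # big-endian 64-bit float, the spelling that still works
dtype('>f8')
>>> np.dtype('>d8')    # 'd' already means float64, so newer numpy rejects the trailing itemsize
Traceback (most recent call last):
  ...
TypeError: data type ">d8" not understood

So the analogous one-line fix on the scipy side would be to spell that dtype '>f8' wherever scipy/io/netcdf.py builds the '>d8' string (the line shown in your traceback); I haven't checked that exact code, so take this as a pointer rather than a patch.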
Hope that helps (in some way at least), Sean > Many thanks, > Neil > > ----------------------------------------------- > Mac OS X 10.6.7 > Darwin Kernel Version 10.7.4 i386 > ----------------------------------------------- > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From klonuo at gmail.com Sun Jul 10 06:36:58 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Sun, 10 Jul 2011 12:36:58 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: In a hope to be helpful to others in possible similar scenario I'm replying with this: Problem here was missing requirements from this sparse matrix packages: http://www.cise.ufl.edu/research/sparse/ Which I build for scipy sparse and umfpack and then tried to build scipy again as previous building error was triggered in sparsetools This resulted in happy end As a note UMFPACK wasn't defined in my scipy site.cfg and it seems like it doesn't matter Cheers From klonuo at gmail.com Sun Jul 10 07:17:21 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Sun, 10 Jul 2011 13:17:21 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: On Sun, Jul 10, 2011 at 12:36 PM, Klonuo Umom wrote: > In a hope to be helpful to others in possible similar scenario I'm > replying with this: > > Problem here was missing requirements from this sparse matrix > packages: http://www.cise.ufl.edu/research/sparse/ > Which I build for scipy sparse and umfpack and then tried to build > scipy again as previous building error was triggered in sparsetools > > This resulted in happy end This seems wrong. I build latest scipy from svn successfully, but not stable 0.9.0 Additionally I compared problematic file: scipy/sparse/sparsetools/csr_wrap.cxx in both versions and they don't differ. Comparing 'sparsetools' directories between both versions did not say much to me either, so I'm afraid above mentioned solution is non-working. Sorry for that Cheers From kgdunn at gmail.com Sun Jul 10 14:14:25 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Sun, 10 Jul 2011 14:14:25 -0400 Subject: [SciPy-User] SciPy Central: file and link sharing site Message-ID: I'm announcing an alpha release of SciPy Central, a website for sharing code snippets, recipes and links related to scientific computing, specifically using NumPy, SciPy, matplotlib, IPython and similar tools. http://scipy-central.org The idea for the website grew out of previous discussions on this list. The site is currently in alpha mode, which means we'd like you to stress test it by filling in garbage information and trying to break it. We'll keep it in this mode for a couple of days and iron out any bugs. Please report those at https://github.com/kgdunn/SciPyCentral/issues - for those familiar with Django, we've left the site in DEBUG mode, so you can copy/paste the stack trace in your bug reports. Then we will delete all submissions and start in beta mode with DEBUG turned off. If you have any suggestions for improvements, please also post them on the issues list. Thanks, Kevin From peter.norlindh at gmail.com Sun Jul 10 15:18:32 2011 From: peter.norlindh at gmail.com (Peter Norlindh) Date: Sun, 10 Jul 2011 21:18:32 +0200 Subject: [SciPy-User] Install Scipy on Python 3.X Message-ID: Hi, I have just installed Python 3.2 on Ubuntu 11.04 and am now struggling to install Scipy 0.9.0. 
Earlier, I effortlessly installed Scipy (from the repository) on Python 2.7, but installing it on Python 3.2 seems to be a whole different animal. Are there any easy-to-follow installation instructions available? The INSTALL files that come with the tar files (Numpy and Scipy) are probably very informative, but I find them too extensive and hard to follow. Any help to get it all set up would be greatly appreciated. Best Regards, Peter Norlindh -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun Jul 10 18:40:32 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 10 Jul 2011 15:40:32 -0700 Subject: [SciPy-User] SciPy Central: file and link sharing site In-Reply-To: References: Message-ID: On Sun, Jul 10, 2011 at 11:14 AM, Kevin Dunn wrote: > I'm announcing an alpha release of SciPy Central, a website for > sharing code snippets, recipes and links related to scientific > computing, specifically using NumPy, SciPy, matplotlib, IPython and > similar tools. > > http://scipy-central.org Thank you for pushing on this and making it happen! I'm sure it will grow into a very useful resource, I've forwarded the announcement to the ipython lists. Best, f From klonuo at gmail.com Mon Jul 11 13:31:08 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Mon, 11 Jul 2011 19:31:08 +0200 Subject: [SciPy-User] Building SciPy on Debian with Intel compilers In-Reply-To: References: Message-ID: I ended this frustration, by commenting 'scipy/sparse/sparsetools/setup.py': #~ for fmt in ['csr','csc','coo','bsr','dia','csgraph']: for fmt in ['coo','dia']: so that only two modules are availabe in sparsetools, as other could not be compiled It's a bad feeling when trying to optimize build and then end with slightly crippled package, but this is too complex for me to handle it differently, with my current skills and lacking google helpers. I still don't know how I compiled scipy 0.10dev, as now I can't, and tried any possible combinations with Intel variables that I remember using the other day. In summary, I have numpy 1.6.0 and scipy 0.9.0 MKL builds, where scipy test shows one error (besides errors in missing sparsetools modules): ERROR: Failure: ImportError (/usr/local/lib/python2.6/dist-packages/scipy/linalg/atlas_version.so: undefined symbol: ATL_buildinfo) and two failures in: test_basic.TestIfftnSingle test_basic.TestCephes which I guess aren't alarming From ralf.gommers at googlemail.com Mon Jul 11 15:28:14 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 11 Jul 2011 21:28:14 +0200 Subject: [SciPy-User] ANN: NumPy 1.6.1 release candidate 3 Message-ID: Hi, I am pleased to announce the availability of the third release candidate of NumPy 1.6.1. This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() #1895/1896 iter: writeonly operands weren't always being buffered correctly This third RC has only a single change compared to RC2 (for #1895/1896), which fixes a serious regression in the iterator. If no new problems are reported, the final release will be in one week. 
Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Mon Jul 11 21:44:07 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Mon, 11 Jul 2011 18:44:07 -0700 (PDT) Subject: [SciPy-User] generic_flapack.pyf and geqp3 Message-ID: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Hi, all. I am planning to try to add functionality to scipy.linalg.qr(), specifically to allow qr decomposition with pivoting. However, I have almost no knowledge of how to wrap the function in scipy/linalg/ generic_flapack.pyf. Could somebody please point me in the correct direction? Thanks, Collin Stocks From jsseabold at gmail.com Mon Jul 11 22:25:57 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 11 Jul 2011 21:25:57 -0500 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: On Mon, Jul 11, 2011 at 8:44 PM, Collin Stocks wrote: > Hi, all. > > I am planning to try to add functionality to scipy.linalg.qr(), > specifically to allow qr decomposition with pivoting. However, I have > almost no knowledge of how to wrap the function in scipy/linalg/ > generic_flapack.pyf. > > Could somebody please point me in the correct direction? Far from an expert, but I've used the 'smart way' in f2py to wrap some LAPACK stuff. Basically, you run f2py on the fortran source, then edit the .pyf file as you need to (use the f2py docs, the routine you're wrapping's documentation, and the pyf examples in scipy for guidance), and then run f2py again to build it (with -llapack or whatever if you need to link against other libraries). http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#the-smart-way I'd be interested to see an example of how to accomplish something similar with fwrap. Skipper From collinstocks at gmail.com Mon Jul 11 23:58:12 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Mon, 11 Jul 2011 20:58:12 -0700 (PDT) Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: Thanks :) Actually, I already managed to figure out how to create the patch before reading your post, but thanks for the quick reply anyway. Any idea where I should go from here? I've submitted a patch to the tracker (http://projects.scipy.org/scipy/ticket/1473). I guess I should just wait... It's just that there is some specific code I want to run (I have already written it), and I would prefer that my code not rely on my own personal fork of the SciPy project ;D Actually, the code I have written is probably going to end up as part of SciPy (or one of the SciKits), as it is a Python implementation of the stepwisefit function from MatLab's statistics toolbox, which is rather useful to some people. (Perhaps seeing this may be an incentive for some of the developers to accept my patch for qr()...) On Jul 11, 10:25?pm, Skipper Seabold wrote: > On Mon, Jul 11, 2011 at 8:44 PM, Collin Stocks wrote: > > Hi, all. > > > I am planning to try to add functionality to scipy.linalg.qr(), > > specifically to allow qr decomposition with pivoting. However, I have > > almost no knowledge of how to wrap the function in scipy/linalg/ > > generic_flapack.pyf. 
> > > Could somebody please point me in the correct direction? > > Far from an expert, but I've used the 'smart way' in f2py to wrap some > LAPACK stuff. Basically, you run f2py on the fortran source, then edit > the .pyf file as you need to (use the f2py docs, the routine you're > wrapping's documentation, and the pyf examples in scipy for guidance), > and then run f2py again to build it (with -llapack or whatever if you > need to link against other libraries). > > http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#the-smart-way > > I'd be interested to see an example of how to accomplish something > similar with fwrap. > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From jsseabold at gmail.com Tue Jul 12 00:17:06 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 11 Jul 2011 23:17:06 -0500 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: On Mon, Jul 11, 2011 at 10:58 PM, Collin Stocks wrote: > Thanks :) > Actually, I already managed to figure out how to create the patch > before reading your post, but thanks for the quick reply anyway. Any > idea where I should go from here? I've submitted a patch to the > tracker (http://projects.scipy.org/scipy/ticket/1473). I guess I > should just wait... Some advice- If you leave it on trac, you should at least attach a diff (preferably versus the most recent scipy) so it's easier to see exactly what's changed. Patches should also include a test. https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt If you really want to go the extra mile, you should add a test and then use git to make a patch and submit to the ML/trac for review. Then you can make a pull request if it looks good. This keeps the legwork pretty low for core developers. http://docs.scipy.org/doc/numpy/dev/index.html > > It's just that there is some specific code I want to run (I have > already written it), and I would prefer that my code not rely on my > own personal fork of the SciPy project ;D > > Actually, the code I have written is probably going to end up as part > of SciPy (or one of the SciKits), as it is a Python implementation of > the stepwisefit function from MatLab's statistics toolbox, which is > rather useful to some people. (Perhaps seeing this may be an incentive > for some of the developers to accept my patch for qr()...) > We would certainly be interested in a stepwisefit implementation for statsmodels. http://statsmodels.sourceforge.net/ Skipper > On Jul 11, 10:25?pm, Skipper Seabold wrote: >> On Mon, Jul 11, 2011 at 8:44 PM, Collin Stocks wrote: >> > Hi, all. >> >> > I am planning to try to add functionality to scipy.linalg.qr(), >> > specifically to allow qr decomposition with pivoting. However, I have >> > almost no knowledge of how to wrap the function in scipy/linalg/ >> > generic_flapack.pyf. >> >> > Could somebody please point me in the correct direction? >> >> Far from an expert, but I've used the 'smart way' in f2py to wrap some >> LAPACK stuff. Basically, you run f2py on the fortran source, then edit >> the .pyf file as you need to (use the f2py docs, the routine you're >> wrapping's documentation, and the pyf examples in scipy for guidance), >> and then run f2py again to build it (with -llapack or whatever if you >> need to link against other libraries). 
>> http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#the-smart-way >> >> I'd be interested to see an example of how to accomplish something >> similar with fwrap. >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From collinstocks at gmail.com Tue Jul 12 00:22:25 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Tue, 12 Jul 2011 00:22:25 -0400 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: <1310444545.2548.85.camel@SietchTabr> Thanks for the advice. I'll put that off for tomorrow, though, as I desperately need some rest... Good night, -- Collin -------------- next part -------------- An embedded message was scrubbed... From: Skipper Seabold Subject: Re: [SciPy-User] generic_flapack.pyf and geqp3 Date: Mon, 11 Jul 2011 23:17:06 -0500 Size: 6027 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From gael.varoquaux at normalesup.org Tue Jul 12 10:23:17 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 12 Jul 2011 16:23:17 +0200 Subject: [SciPy-User] Cholesky for semi-definite matrices? In-Reply-To: References: Message-ID: <20110712142317.GE17559@phare.normalesup.org> On Sat, Jul 09, 2011 at 08:36:01PM +0300, eat wrote: > > (It's a > > covariance matrix computed from not-enough samples, so it's positive > > semi-definite but rank-deficient.) > How bout a slight regularization, Indeed, from a statistics point of view, a non-positive-definite covariance matrix is simply an estimation error. As a descriptive statistic, it may look good, but it does not contain much information on the population covariance. The easiest solution is to regularize it. The good news is that there exist good heuristics (approximate oracles) to find the optimal regularization parameter. If you assume that your underlying data is Gaussian, the 'Approximate Oracle Shrinkage' is optimal. If you don't want this assumption, the Ledoit-Wolf shrinkage works great. In practice they are often similar, but LW tends to under-penalize compared to AOS if you have little data compared to the dimension of your dataset. The other piece of good news is that you can find Python implementations of these heuristics in the scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/covariance/shrunk_covariance_.py There you will find references to the papers if you are interested, and you can simply copy the function out of the scikit-learn if you don't want to depend on it. HTH, Gael From ralf.gommers at googlemail.com Tue Jul 12 15:16:35 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 12 Jul 2011 21:16:35 +0200 Subject: [SciPy-User] Install Scipy on Python 3.X In-Reply-To: References: Message-ID: Hi Peter, On Sun, Jul 10, 2011 at 9:18 PM, Peter Norlindh wrote: > Hi, > > I have just installed Python 3.2 on Ubuntu 11.04 and am now struggling to > install Scipy 0.9.0. Earlier, I effortlessly installed Scipy (from the > repository) on Python 2.7, but installing it on Python 3.2 seems to be a > whole different animal.
> > Are there any easy-to-follow installation instructions available? The > INSTALL files that come with the tar files (Numpy and Scipy) are probably > very informative, but I find them too extensive and hard to follow. > > Any help to get it all set up would be greatly appreciated. > > Installing under Python 3.x is not much different, except that it takes longer due to 2to3 running before compilation. If you have the right prerequisites installed the usual "python setup.py install" should do the job. The most up-to-date instructions are at http://scipy.org/Installing_SciPy/. If you run into a specific issue, please tell us your OS, compilers, etc. plus the build commands you used and the build log. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwang at streamitive.com Tue Jul 12 16:00:14 2011 From: pwang at streamitive.com (Peter Wang) Date: Tue, 12 Jul 2011 15:00:14 -0500 Subject: [SciPy-User] Scipy 2011 Convore thread now open Message-ID: Hi folks, I have gone ahead and created a Convore group for the SciPy 2011 conference: https://convore.com/scipy-2011/ I have already created threads for each of the tutorial topics, and once the conference is underway, we'll create threads for each talk, so that audience can interact and post questions. Everyone is welcome to create topics of their own, in addition to the "official" conference topics. For those who are unfamiliar with Convore, it is a cross between a mailing list and a very souped-up IRC. It's usable for aynchronous discussion, but great for realtime, topical chats. Those of you who were at PyCon this year probably saw what a wonderful tool Convore proved to be for a tech conference. People used it for everything from BoF planning to dinner coordination to good-natured heckling of lightning talk speakers. I'm hoping that it will be used to similarly good effect for the SciPy Cheers, Peter From lbloy at seas.upenn.edu Tue Jul 12 17:01:37 2011 From: lbloy at seas.upenn.edu (Luke Bloy) Date: Tue, 12 Jul 2011 17:01:37 -0400 Subject: [SciPy-User] Sparse matrix question Message-ID: <4E1CB631.1020904@seas.upenn.edu> Hi, I have what I hope is a fairly simple scipy question about sparse matrices. I have a sparse matrix (A) that i use to build a constraint matrix for an optimisation problem. The constraints, that concern A, are that A_{i,j} d_{j} - A_{j,i} d_{i} == 0. I would then find the optimal d vector. So I'm having problems efficiently building my constraints. The first part is simple as it is just the nonzero elements of A but accessing A transpose in the same order is difficult. The basic logic is this.... rows=numpy.zeros(2*A.nnz, dtype=numpy.int32) cols=numpy.zeros(2*A.nnz, dtype=numpy.int32) data=numpy.zeros(2*A.nnz, dtype=numpy.float64) #A_{i,j} d_{j} numNonZeros = A.nnz rows[:numNonZeros] = numpy.arange(numNonZeros) cols[:numNonZeros] = A.col data[:numNonZeros] = A.data #-A_{j,i} d_{i} rows[numNonZeros:] = numpy.arange(numNonZeros) cols[numNonZeros:] = A.row data[numNonZeros:] = - A[A.col,A.row] ### Atranspose[A.row, A.col] The problem is that this takes too long (> 2minutes for a 3000x3000 matrix with 3 million nonzeros). similary code in matlab is an order of magnitude faster? I've tried both csr and csc for doing the memory access of the transpose, (attached is the csrTest , i can send the matrix file if you are interested) Does anyone have any suggestions on speeding this up? Thanks, Luke -------------- next part -------------- A non-text attachment was scrubbed... 
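One way to attack the slow step above -- a rough sketch against a square COO matrix, not tested on the attached testCsr.py or the real 3000x3000 data: avoid fancy-indexing the sparse matrix entirely and instead align the transposed entries with A's nonzero ordering by sorting linear indices and looking them up with searchsorted. Entries whose transposed position (j, i) is not stored come back as zero, which is what the constraint A_{i,j} d_{j} - A_{j,i} d_{i} == 0 needs. It assumes the COO data has no duplicate (i, j) entries (round-trip through tocsr() first if unsure).

import numpy as np

def transposed_values(A):
    # A is a square scipy.sparse matrix; returns A[j, i] for every stored (i, j),
    # in the same order as A.row/A.col, with 0 where A[j, i] is not stored.
    A = A.tocoo()
    n = A.shape[1]
    keys = A.row.astype(np.int64) * n + A.col        # linear index of each stored entry
    order = np.argsort(keys)
    sorted_keys = keys[order]
    sorted_data = A.data[order]
    wanted = A.col.astype(np.int64) * n + A.row      # linear index of the transposed position
    pos = np.searchsorted(sorted_keys, wanted)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    hit = sorted_keys[pos] == wanted
    out = np.zeros(A.nnz, dtype=A.data.dtype)
    out[hit] = sorted_data[pos[hit]]                 # A[j, i] where stored, 0 elsewhere
    return out

With that, the expensive line becomes data[numNonZeros:] = -transposed_values(A), and everything is vectorized sorting and searching rather than per-element sparse indexing.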
Name: testCsr.py Type: text/x-python Size: 1516 bytes Desc: not available URL: From developer at studioart.org Wed Jul 13 01:36:12 2011 From: developer at studioart.org (Long Duong) Date: Tue, 12 Jul 2011 22:36:12 -0700 Subject: [SciPy-User] [Numpy-discussion] Scipy 2011 Convore thread now open In-Reply-To: References: Message-ID: Does anybody know if there are there videos of the conference this year? Best regards, Long Duong UC Irvine Biomedical Engineering long at studioart.org On Tue, Jul 12, 2011 at 1:00 PM, Peter Wang wrote: > Hi folks, > > I have gone ahead and created a Convore group for the SciPy 2011 > conference: > > https://convore.com/scipy-2011/ > > I have already created threads for each of the tutorial topics, and > once the conference is underway, we'll create threads for each talk, > so that audience can interact and post questions. Everyone is welcome > to create topics of their own, in addition to the "official" > conference topics. > > For those who are unfamiliar with Convore, it is a cross between a > mailing list and a very souped-up IRC. It's usable for aynchronous > discussion, but great for realtime, topical chats. Those of you who > were at PyCon this year probably saw what a wonderful tool Convore > proved to be for a tech conference. People used it for everything > from BoF planning to dinner coordination to good-natured heckling of > lightning talk speakers. I'm hoping that it will be used to similarly > good effect for the SciPy > > > Cheers, > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jul 13 12:42:37 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 Jul 2011 11:42:37 -0500 Subject: [SciPy-User] [Numpy-discussion] Scipy 2011 Convore thread now open In-Reply-To: References: Message-ID: On Wed, Jul 13, 2011 at 00:36, Long Duong wrote: > > Does anybody know if there are there videos of the conference this year? Yes. Announcements will be made when they start going online. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From jreback at yahoo.com Tue Jul 12 07:27:00 2011 From: jreback at yahoo.com (Jeff Reback) Date: Tue, 12 Jul 2011 04:27:00 -0700 (PDT) Subject: [SciPy-User] statsmodels - rlm Message-ID: <1310470020.36827.YahooMailNeo@web125903.mail.ne1.yahoo.com> Hi, ? Is their a way to recover the final Results class in a robust estimation? ? see below - I can clearly replicate the WLS regression and get a Results class with the correct results; is their a reference in the RLMResults class (that points to the wls_results) that I am missing? ? thanks, ? Jeff ? # using v2 of statsmodels import numpy as np import scikits.statsmodels as sm ? #delivery time(minutes)??? endog = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83,??? ????????????????? 79.24, 21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00,??? ????????????????? 9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75]) ? #number of cases, distance (Feet)??? exog = np.array([[7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, 3, 17, 10, 26, 9, 8, 4], ???????????????? 
[560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255, 462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635,150]]) exog = exog.T??? exog = sm.add_constant(exog) rlm? = sm.RLM(endog, exog).fit() print "RLM params -> %s, r2 -> %s" % (rlm.params, getattr(rlm,'rsquared',None)) wls? = sm.WLS(endog, exog, weights = rlm.weights).fit() print "WLS params -> %s, r2 -> %s" % (wls.params, wls.rsquared)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Jul 14 12:50:58 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 14 Jul 2011 11:50:58 -0500 Subject: [SciPy-User] statsmodels - rlm In-Reply-To: <1310470020.36827.YahooMailNeo@web125903.mail.ne1.yahoo.com> References: <1310470020.36827.YahooMailNeo@web125903.mail.ne1.yahoo.com> Message-ID: On Tue, Jul 12, 2011 at 6:27 AM, Jeff Reback wrote: > Hi, > > Is their a way to recover the final Results class in a robust estimation? > > see below - I can clearly replicate the WLS regression and get a Results > class with the correct results; > is their a reference in the RLMResults class (that points to the > wls_results) that I am missing? > > thanks, > > Jeff > > # using v2 of statsmodels > import numpy as np > import scikits.statsmodels as sm > > #delivery time(minutes) > endog = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, > ????????????????? 79.24, 21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, > 15.35, 19.00, > ????????????????? 9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75]) > > #number of cases, distance (Feet) > exog = np.array([[7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, > 3, 17, 10, 26, 9, 8, 4], > ???????????????? [560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, > 215, 255, 462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635,150]]) > exog = exog.T > exog = sm.add_constant(exog) > rlm? = sm.RLM(endog, exog).fit() > print "RLM params -> %s, r2 -> %s" % (rlm.params, > getattr(rlm,'rsquared',None)) > wls? = sm.WLS(endog, exog, weights = rlm.weights).fit() > print "WLS params -> %s, r2 -> %s" % (wls.params, wls.rsquared) > For posterity, answered here. https://groups.google.com/group/pystatsmodels/browse_thread/thread/ab999ff6ab32c5e0 Skipper From paul.blelloch at ata-e.com Thu Jul 14 17:56:51 2011 From: paul.blelloch at ata-e.com (Paul Blelloch) Date: Thu, 14 Jul 2011 14:56:51 -0700 Subject: [SciPy-User] Deciphering Matlab Structures in Python Message-ID: I used the scipy.io.loadmat function to read a Matlab save file that includes some structures. 
In particular my Matlab code looks something like: a=randn(10,10); e=eig(a); c.mat=a; c.eig=e; save When I read this in I have not trouble getting the a and e matrices, but when I look at the structure c I get something that looks like: array([[ ([[-1.918381876114684, 0.35909312220124379, 0.071481108399536447, 0.19413316926540342, -0.16044557980861471, -1.2431101120071242, 1.1728553895658809, 0.48682974787947486, -2.6862862203427951, 1.6479268546008081], [-0.13106782428104052, -0.79434474702678692, 0.15562063581539223, 0.29769161930929805, 0.37883517610546497, 0.13185918869913515, -2.1155833182251094, -0.4459044413697299, 1.5483561910050594, -0.30667757004699159], [-0.76863166681707873, -0.22731164066134216, -0.18219139268446782, -0.71012508212938263, -0.10716072249907738, -0.99947508351036363, 0.46216113959481137, 0.32567307426214315, -0.92437868406001578, -0.77404140889205997], [2.3899518363067971, 1.5938213548880649, 0.73101584111958162, -0.68767410135748297, 0.21283509315953025, -0.35421345818508981, -2.4599031616561144, -0.02009780648140742, -0.72040490902555765, -1.3542395842984021], [0.077245679591339572, 0.15520563382674926, -0.34756806580692184, 0.57283812483950614, 0.46806391670861758, 0.10558871537972472, -0.30485718475158774, -1.8884755753587876, 0.83350531035866249, -0.32592678918795287], [0.37560745023340852, 0.17859562900297904, 0.23442400798144603, 0.44521760367790375, -1.6160743410467069, 0.95542626643090245, 0.39823325391367814, 0.33836146832009201, -0.56649185628868159, 1.5532163153526557], [0.39556416936787731, -0.33773084599208325, -0.86772936409388435, -1.3259592591460707, 0.014738006197359526, 0.46479897799135333, -1.7057865505375613, 0.33780540287940392, -0.99915099624077919, 0.62813120976932557], [-0.11249908896048552, -1.5249918897593304, -0.98253858503060265, -1.4998754094582201, -0.037764347672058281, 0.26670638822706283, 0.75940981180842837, 0.42433121858231648, 0.64126705212804702, -1.0275461814124498], [-1.8290367022161331, -0.70940664667282161, -0.29357242375327913, 1.0159682271872925, 0.40380900009311393, -1.4232233423222125, 0.20323761868334428, 0.11287078734861246, -0.25395009085808468, -0.20194229402517888], [2.0914521934834784, -0.86661663365224473, 0.82446963631241454, -0.027515788346229055, 0.30525439285092776, 0.11105730902622701, 0.95868278992227296, -0.11827036575813449, -1.1216999904267995, 0.86106602555594924]], [[(-4.5696334670036887+0j)], [(3.7785600753832629+0j)], [(-2.5789412296806438+0j)], [(0.30103481909678725+1.8315300529614498j)], [(0.30103481909678725-1.8315300529614498j)], [(1.4965987808471322+0j)], [(-0.6068802651298042+1.19212639770675j)], [(-0.6068802651298042-1.19212639770675j)], [(-0.54808825947184969+0j)], [(0.19975366069054107+0j)]])]], dtype=[('mat', '|O4'), ('eig', '|O4')]) Looking through the numpy documentation I can't figure out how to interpret that. How do I pull the fields out of the structure? THANKS, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jul 14 18:11:03 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 14 Jul 2011 23:11:03 +0100 Subject: [SciPy-User] Deciphering Matlab Structures in Python In-Reply-To: References: Message-ID: Hi, On Thu, Jul 14, 2011 at 10:56 PM, Paul Blelloch wrote: > I used the scipy.io.loadmat function to read a Matlab save file > that?includes some structures. 
?In particular my Matlab code looks?something > like: > > a=randn(10,10); > e=eig(a); > c.mat=a; > c.eig=e; > save > > When I read this in I have not trouble getting the a and e matrices,? but > when I look at the structure c I get something that looks like: > > array([[ ([[-1.918381876114684, 0.35909312220124379, > 0.071481108399536447, 0.19413316926540342, -0.16044557980861471, > -1.2431101120071242, 1.1728553895658809, 0.48682974787947486, > -2.6862862203427951, 1.6479268546008081], [-0.13106782428104052, > -0.79434474702678692, 0.15562063581539223, 0.29769161930929805, > 0.37883517610546497, 0.13185918869913515, -2.1155833182251094, > -0.4459044413697299, 1.5483561910050594, -0.30667757004699159], > [-0.76863166681707873, -0.22731164066134216, -0.18219139268446782, > -0.71012508212938263, -0.10716072249907738, -0.99947508351036363, > 0.46216113959481137, 0.32567307426214315, -0.92437868406001578, > -0.77404140889205997], [2.3899518363067971, 1.5938213548880649, > 0.73101584111958162, -0.68767410135748297, 0.21283509315953025, > -0.35421345818508981, -2.4599031616561144, -0.02009780648140742, > -0.72040490902555765, -1.3542395842984021], [0.077245679591339572, > 0.15520563382674926, -0.34756806580692184, 0.57283812483950614, > 0.46806391670861758, 0.10558871537972472, -0.30485718475158774, > -1.8884755753587876, 0.83350531035866249, -0.32592678918795287], > [0.37560745023340852, 0.17859562900297904, 0.23442400798144603, > 0.44521760367790375, -1.6160743410467069, 0.95542626643090245, > 0.39823325391367814, 0.33836146832009201, -0.56649185628868159, > 1.5532163153526557], [0.39556416936787731, -0.33773084599208325, > -0.86772936409388435, -1.3259592591460707, 0.014738006197359526, > 0.46479897799135333, -1.7057865505375613, 0.33780540287940392, > -0.99915099624077919, 0.62813120976932557], [-0.11249908896048552, > -1.5249918897593304, -0.98253858503060265, -1.4998754094582201, > -0.037764347672058281, 0.26670638822706283, 0.75940981180842837, > 0.42433121858231648, 0.64126705212804702, -1.0275461814124498], > [-1.8290367022161331, -0.70940664667282161, -0.29357242375327913, > 1.0159682271872925, 0.40380900009311393, -1.4232233423222125, > 0.20323761868334428, 0.11287078734861246, -0.25395009085808468, > -0.20194229402517888], [2.0914521934834784, -0.86661663365224473, > 0.82446963631241454, -0.027515788346229055, 0.30525439285092776, > 0.11105730902622701, 0.95868278992227296, -0.11827036575813449, > -1.1216999904267995, 0.86106602555594924]], > [[(-4.5696334670036887+0j)], [(3.7785600753832629+0j)], > [(-2.5789412296806438+0j)], > [(0.30103481909678725+1.8315300529614498j)], > [(0.30103481909678725-1.8315300529614498j)], > [(1.4965987808471322+0j)], [(-0.6068802651298042+1.19212639770675j)], > [(-0.6068802651298042-1.19212639770675j)], > [(-0.54808825947184969+0j)], [(0.19975366069054107+0j)]])]], > ? ? ? dtype=[('mat', '|O4'), ('eig', '|O4')]) > > Looking through the numpy documentation I can't figure out how to?interpret > that. ?How do I pull the fields out of the structure? This is just a repeat of my reply on the Python X,Y list, just for reference. What you got here was a structured array. You can get what you want with something like: >>> import scipy.io as sio >>> ws = sio.loadmat('matlab.mat') >>> c = ws['c'] You got this far already of course. 
Notice that 'c' is a 2D array with a composite 'dtype': >>> c.shape (1, 1) >>> c.dtype dtype([('mat', '|O8'), ('eig', '|O8')] To get the underlying fields you need: >>> eig = c[0,0]['eig'] >>> mat = c[0,0]['mat'] To learn more it might be worth reading a little about structured arrays in numpy. Best, Matthew From paul.blelloch at ata-e.com Thu Jul 14 18:13:30 2011 From: paul.blelloch at ata-e.com (Paul Blelloch) Date: Thu, 14 Jul 2011 15:13:30 -0700 Subject: [SciPy-User] Deciphering Matlab Structures in Python In-Reply-To: Message-ID: <6133335d2877964fbfbe2e4785701123@mail> THANKS! I was missing the [0,0] part. -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Matthew Brett Sent: Thursday, July 14, 2011 3:11 PM To: SciPy Users List Subject: Re: [SciPy-User] Deciphering Matlab Structures in Python Hi, On Thu, Jul 14, 2011 at 10:56 PM, Paul Blelloch wrote: > I used the scipy.io.loadmat function to read a Matlab save file > that?includes some structures. ?In particular my Matlab code looks?something > like: > > a=randn(10,10); > e=eig(a); > c.mat=a; > c.eig=e; > save > > When I read this in I have not trouble getting the a and e matrices,? but > when I look at the structure c I get something that looks like: > > array([[ ([[-1.918381876114684, 0.35909312220124379, > 0.071481108399536447, 0.19413316926540342, -0.16044557980861471, > -1.2431101120071242, 1.1728553895658809, 0.48682974787947486, > -2.6862862203427951, 1.6479268546008081], [-0.13106782428104052, > -0.79434474702678692, 0.15562063581539223, 0.29769161930929805, > 0.37883517610546497, 0.13185918869913515, -2.1155833182251094, > -0.4459044413697299, 1.5483561910050594, -0.30667757004699159], > [-0.76863166681707873, -0.22731164066134216, -0.18219139268446782, > -0.71012508212938263, -0.10716072249907738, -0.99947508351036363, > 0.46216113959481137, 0.32567307426214315, -0.92437868406001578, > -0.77404140889205997], [2.3899518363067971, 1.5938213548880649, > 0.73101584111958162, -0.68767410135748297, 0.21283509315953025, > -0.35421345818508981, -2.4599031616561144, -0.02009780648140742, > -0.72040490902555765, -1.3542395842984021], [0.077245679591339572, > 0.15520563382674926, -0.34756806580692184, 0.57283812483950614, > 0.46806391670861758, 0.10558871537972472, -0.30485718475158774, > -1.8884755753587876, 0.83350531035866249, -0.32592678918795287], > [0.37560745023340852, 0.17859562900297904, 0.23442400798144603, > 0.44521760367790375, -1.6160743410467069, 0.95542626643090245, > 0.39823325391367814, 0.33836146832009201, -0.56649185628868159, > 1.5532163153526557], [0.39556416936787731, -0.33773084599208325, > -0.86772936409388435, -1.3259592591460707, 0.014738006197359526, > 0.46479897799135333, -1.7057865505375613, 0.33780540287940392, > -0.99915099624077919, 0.62813120976932557], [-0.11249908896048552, > -1.5249918897593304, -0.98253858503060265, -1.4998754094582201, > -0.037764347672058281, 0.26670638822706283, 0.75940981180842837, > 0.42433121858231648, 0.64126705212804702, -1.0275461814124498], > [-1.8290367022161331, -0.70940664667282161, -0.29357242375327913, > 1.0159682271872925, 0.40380900009311393, -1.4232233423222125, > 0.20323761868334428, 0.11287078734861246, -0.25395009085808468, > -0.20194229402517888], [2.0914521934834784, -0.86661663365224473, > 0.82446963631241454, -0.027515788346229055, 0.30525439285092776, > 0.11105730902622701, 0.95868278992227296, -0.11827036575813449, > -1.1216999904267995, 0.86106602555594924]], > 
[[(-4.5696334670036887+0j)], [(3.7785600753832629+0j)], > [(-2.5789412296806438+0j)], > [(0.30103481909678725+1.8315300529614498j)], > [(0.30103481909678725-1.8315300529614498j)], > [(1.4965987808471322+0j)], [(-0.6068802651298042+1.19212639770675j)], > [(-0.6068802651298042-1.19212639770675j)], > [(-0.54808825947184969+0j)], [(0.19975366069054107+0j)]])]], > ? ? ? dtype=[('mat', '|O4'), ('eig', '|O4')]) > > Looking through the numpy documentation I can't figure out how to?interpret > that. ?How do I pull the fields out of the structure? This is just a repeat of my reply on the Python X,Y list, just for reference. What you got here was a structured array. You can get what you want with something like: >>> import scipy.io as sio >>> ws = sio.loadmat('matlab.mat') >>> c = ws['c'] You got this far already of course. Notice that 'c' is a 2D array with a composite 'dtype': >>> c.shape (1, 1) >>> c.dtype dtype([('mat', '|O8'), ('eig', '|O8')] To get the underlying fields you need: >>> eig = c[0,0]['eig'] >>> mat = c[0,0]['mat'] To learn more it might be worth reading a little about structured arrays in numpy. Best, Matthew _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From cournape at gmail.com Fri Jul 15 09:08:49 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 15 Jul 2011 22:08:49 +0900 Subject: [SciPy-User] [ANN] Bento 0.0.6, a packaging solution for python software Message-ID: Hi, I am pleased to announce a new release of bento, a packaging solution for python which aims at reproducibility, extensibility and simplicity. It supports every python version from 2.4 to 3.2. You can take a look at its main features on Bento's main page (http://cournape.github.com/Bento). The main features of this 0.0.6 release are: - Completely revamped distutils compatibility layer: it is now a thin layer around bento infrastructure, so that most bento packages should be pip-installable, while still keeping bento customization capabilities. - Build directory is now customizable through bentomaker with --build-directory option - Out of tree builds support (i.e. running bento in a directory which does not contain bento.info), with global --bento-info option - Hook File can now be specified in recursed bento.info - Preliminary support for .mpkg (Mac OS X native packaging) - More consistent API for extension/compiled library build registration - Both numpy and scipy can now be built with bento + waf as a build backend Bento is discussed on the bento mailing list (http://librelist.com/browser/bento). cheers, David From ndbecker2 at gmail.com Fri Jul 15 12:58:30 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 12:58:30 -0400 Subject: [SciPy-User] estimation problem Message-ID: I have a known signal (vector) 'x'. I recieve a vector 'y' y = F(k x) + n where n is Gaussian noise, and k is an unknown gain parameter. I want to estimate k. F is a known function (nonlinear, memoryless). What might be a good approach to try? I'd like this to be an 'online' approach - that is, I provide batches of training vectors (x, n), and the estimator will improve the estimate (hopefully) as more data is supplied. 
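A minimal sketch of the curve_fit approach suggested in the reply that follows, with tanh standing in only as a placeholder for the known nonlinearity F, and the true gain, noise level and sample size all invented for illustration:

import numpy as np
from scipy.optimize import curve_fit

F = np.tanh                       # placeholder for the real memoryless nonlinearity
k_true = 0.7                      # invented gain, to be recovered
x = np.random.randn(500)                           # known signal
y = F(k_true * x) + 0.05 * np.random.randn(500)    # received signal plus Gaussian noise

def model(x, k):
    # received signal as a function of the unknown gain k
    return F(k * x)

k_est, k_cov = curve_fit(model, x, y, p0=1.0)
print(k_est[0], np.sqrt(k_cov[0, 0]))              # estimated gain and its standard error

As more batches arrive, one can simply re-run the fit on the concatenated data with the previous estimate as p0, which is the strategy described in the reply below.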
From josef.pktd at gmail.com Fri Jul 15 13:10:33 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jul 2011 13:10:33 -0400 Subject: [SciPy-User] estimation problem In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: > I have a known signal (vector) 'x'. ?I recieve a vector 'y' > > y = F(k x) + n > > where n is Gaussian noise, and k is an unknown gain parameter. > > I want to estimate k. > > F is a known function (nonlinear, memoryless). > > What might be a good approach to try? ?I'd like this to be an 'online' approach > - that is, I provide batches of training vectors (x, n), and the estimator will > improve the estimate (hopefully) as more data is supplied. scipy.optimize.curve_fit I would reestimate with the entire sample after a batch arrives using the old estimate as a starting value. There might be shortcuts reusing and updating the Jacobian and Hessian, but I don't know of anything that could be used directly. (I don't have much idea about non-linear kalman filters and whether they would help in this case.) Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ndbecker2 at gmail.com Fri Jul 15 13:20:10 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 13:20:10 -0400 Subject: [SciPy-User] estimation problem References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >> I have a known signal (vector) 'x'. I recieve a vector 'y' >> >> y = F(k x) + n >> >> where n is Gaussian noise, and k is an unknown gain parameter. >> >> I want to estimate k. >> >> F is a known function (nonlinear, memoryless). >> >> What might be a good approach to try? I'd like this to be an 'online' >> approach - that is, I provide batches of training vectors (x, n), and the >> estimator will improve the estimate (hopefully) as more data is supplied. > > scipy.optimize.curve_fit > > I would reestimate with the entire sample after a batch arrives using > the old estimate as a starting value. > > There might be shortcuts reusing and updating the Jacobian and > Hessian, but I don't know of anything that could be used directly. (I > don't have much idea about non-linear kalman filters and whether they > would help in this case.) > > Josef > Thanks. One fix, that should have been "I provide batches of training vectors (x, y)". n is unknown noise. From ndbecker2 at gmail.com Fri Jul 15 13:39:16 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 13:39:16 -0400 Subject: [SciPy-User] estimation problem References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >> I have a known signal (vector) 'x'. I recieve a vector 'y' >> >> y = F(k x) + n >> >> where n is Gaussian noise, and k is an unknown gain parameter. >> >> I want to estimate k. >> >> F is a known function (nonlinear, memoryless). >> >> What might be a good approach to try? I'd like this to be an 'online' >> approach - that is, I provide batches of training vectors (x, n), and the >> estimator will improve the estimate (hopefully) as more data is supplied. > > scipy.optimize.curve_fit > > I would reestimate with the entire sample after a batch arrives using > the old estimate as a starting value. > > There might be shortcuts reusing and updating the Jacobian and > Hessian, but I don't know of anything that could be used directly. 
(I > don't have much idea about non-linear kalman filters and whether they > would help in this case.) > In my case, x, y, n are complex. I guess I need to handle that myself (somehow). Traceback (most recent call last): File "test_curvefit.py", line 378, in run_line (sys.argv) File "test_curvefit.py", line 375, in run_line result = run (opt, cmdline) File "test_curvefit.py", line 244, in run popt, pcov = curve_fit (func, rcv_in[SPS*SI:SPS*SI+2*N], mod_out[SPS*SI:SPS*SI+2*N]) File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 426, in curve_fit res = leastsq(func, p0, args=args, full_output=1, **kw) File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 283, in leastsq gtol, maxfev, epsfcn, factor, diag) minpack.error: Result from function call is not a proper array of floats. From josef.pktd at gmail.com Fri Jul 15 14:04:39 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jul 2011 14:04:39 -0400 Subject: [SciPy-User] estimation problem In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 1:39 PM, Neal Becker wrote: > josef.pktd at gmail.com wrote: > >> On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >>> I have a known signal (vector) 'x'. ?I recieve a vector 'y' >>> >>> y = F(k x) + n >>> >>> where n is Gaussian noise, and k is an unknown gain parameter. >>> >>> I want to estimate k. >>> >>> F is a known function (nonlinear, memoryless). >>> >>> What might be a good approach to try? ?I'd like this to be an 'online' >>> approach - that is, I provide batches of training vectors (x, n), and the >>> estimator will improve the estimate (hopefully) as more data is supplied. >> >> scipy.optimize.curve_fit >> >> I would reestimate with the entire sample after a batch arrives using >> the old estimate as a starting value. >> >> There might be shortcuts reusing and updating the Jacobian and >> Hessian, but I don't know of anything that could be used directly. (I >> don't have much idea about non-linear kalman filters and whether they >> would help in this case.) >> > In my case, x, y, n are complex. ?I guess I need to handle that myself > (somehow). I guess curve_fit won't help then. optimize.leastsq should still work if the function returns a 1d array abs(y-F(x)) so that (abs(y-F(x))**2).sum() is the real loss function. If k is also complex, then I would think that it will have to be separated into real and complex parts as separate parameters. If you need the extra results, like covariance matrix of the estimate, then I would just copy the parts from curve_fit. (I don't think I have seen complex Gaussian noise, n, before.) Josef > > Traceback (most recent call last): > ?File "test_curvefit.py", line 378, in > ? ?run_line (sys.argv) > ?File "test_curvefit.py", line 375, in run_line > ? ?result = run (opt, cmdline) > ?File "test_curvefit.py", line 244, in run > ? ?popt, pcov = curve_fit (func, rcv_in[SPS*SI:SPS*SI+2*N], > mod_out[SPS*SI:SPS*SI+2*N]) > ?File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 426, > in curve_fit > ? ?res = leastsq(func, p0, args=args, full_output=1, **kw) > ?File "/usr/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 283, > in leastsq > ? ?gtol, maxfev, epsfcn, factor, diag) > minpack.error: Result from function call is not a proper array of floats. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ndbecker2 at gmail.com Fri Jul 15 14:12:40 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 15 Jul 2011 14:12:40 -0400 Subject: [SciPy-User] estimation problem References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, Jul 15, 2011 at 1:39 PM, Neal Becker wrote: >> josef.pktd at gmail.com wrote: >> >>> On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >>>> I have a known signal (vector) 'x'. I recieve a vector 'y' >>>> >>>> y = F(k x) + n >>>> >>>> where n is Gaussian noise, and k is an unknown gain parameter. >>>> >>>> I want to estimate k. >>>> >>>> F is a known function (nonlinear, memoryless). >>>> >>>> What might be a good approach to try? I'd like this to be an 'online' >>>> approach - that is, I provide batches of training vectors (x, n), and the >>>> estimator will improve the estimate (hopefully) as more data is supplied. >>> >>> scipy.optimize.curve_fit >>> >>> I would reestimate with the entire sample after a batch arrives using >>> the old estimate as a starting value. >>> >>> There might be shortcuts reusing and updating the Jacobian and >>> Hessian, but I don't know of anything that could be used directly. (I >>> don't have much idea about non-linear kalman filters and whether they >>> would help in this case.) >>> >> In my case, x, y, n are complex. I guess I need to handle that myself >> (somehow). > > I guess curve_fit won't help then. > optimize.leastsq should still work if the function returns a 1d array > abs(y-F(x)) so that (abs(y-F(x))**2).sum() is the real loss function. > > If k is also complex, then I would think that it will have to be > separated into real and complex parts as separate parameters. > > If you need the extra results, like covariance matrix of the estimate, > then I would just copy the parts from curve_fit. > > (I don't think I have seen complex Gaussian noise, n, before.) > > Josef > What I tried that seems to work is: def func (x, k): return complex_to_real (complex_func (real_to_complex (x * k))) popt, pcov = curve_fit (func, complex_to_real(x), complex_to_real (y)) where complex_to_real: interpret a complex vector as alternating real/imag parts real_to_complex: interpret alternating entries in float vector as real/imag parts of complex Does this seem like a valid use of curve_fit? From josef.pktd at gmail.com Fri Jul 15 14:31:42 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jul 2011 14:31:42 -0400 Subject: [SciPy-User] estimation problem In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 2:12 PM, Neal Becker wrote: > josef.pktd at gmail.com wrote: > >> On Fri, Jul 15, 2011 at 1:39 PM, Neal Becker wrote: >>> josef.pktd at gmail.com wrote: >>> >>>> On Fri, Jul 15, 2011 at 12:58 PM, Neal Becker wrote: >>>>> I have a known signal (vector) 'x'. ?I recieve a vector 'y' >>>>> >>>>> y = F(k x) + n >>>>> >>>>> where n is Gaussian noise, and k is an unknown gain parameter. >>>>> >>>>> I want to estimate k. >>>>> >>>>> F is a known function (nonlinear, memoryless). >>>>> >>>>> What might be a good approach to try? ?I'd like this to be an 'online' >>>>> approach - that is, I provide batches of training vectors (x, n), and the >>>>> estimator will improve the estimate (hopefully) as more data is supplied. 
>>>> >>>> scipy.optimize.curve_fit >>>> >>>> I would reestimate with the entire sample after a batch arrives using >>>> the old estimate as a starting value. >>>> >>>> There might be shortcuts reusing and updating the Jacobian and >>>> Hessian, but I don't know of anything that could be used directly. (I >>>> don't have much idea about non-linear kalman filters and whether they >>>> would help in this case.) >>>> >>> In my case, x, y, n are complex. ?I guess I need to handle that myself >>> (somehow). >> >> I guess curve_fit won't help then. >> optimize.leastsq should still work if the function returns a 1d array >> abs(y-F(x)) so that (abs(y-F(x))**2).sum() is the real loss function. >> >> If k is also complex, then I would think that it will have to be >> separated into real and complex parts as separate parameters. >> >> If you need the extra results, like covariance matrix of the estimate, >> then I would just copy the parts from curve_fit. >> >> (I don't think I have seen complex Gaussian noise, n, before.) >> >> Josef >> > > What I tried that seems to work is: > > def func (x, k): > ?return complex_to_real (complex_func (real_to_complex (x * k))) > > popt, pcov = curve_fit (func, complex_to_real(x), complex_to_real (y)) > > where complex_to_real: interpret a complex vector as alternating real/imag parts > real_to_complex: interpret alternating entries in float vector as real/imag > parts of complex > > Does this seem like a valid use of curve_fit? I'm not very good in complex algebra without sitting down and go through the details. your complex_to_real(x), complex_to_real (y) have now twice the length of the original x, y Does your func also return twice as many values? (after more thought, yes, since this is curve_fit and not leastsq.) Then, you are using a different objective function, sum of squares of real plus squares of complex parts, instead of square of complex. (real(y) - real(F(x)))**2 + (imag(y) - imag(F(x)))**2, instead of (y-F(x)) * (y-F(x)).conj() ? I don't know if this matters. Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From nberg at atmos.ucla.edu Sat Jul 16 01:20:23 2011 From: nberg at atmos.ucla.edu (Neil Berg) Date: Fri, 15 Jul 2011 22:20:23 -0700 Subject: [SciPy-User] netcdf integer output issue Message-ID: <7493B573-17DE-4310-919F-F333AC723D13@atmos.ucla.edu> Hi all, I am struggling to correctly output negative integers into a netCDF file. I have attached my code snippet and a 2-row sample of the input CSV data. After reading in 24 data points, I calculate the maximum, store it into a list, and output that list as a netCDF file. These are what the 2 rows of CSV input data look like, followed by the maximum value. ['21', '21', '21', '21', '20', '19', '17', '17', '16', '17', '17', '15', '16', '16', '11', '9', '7', '6', '5', '6', '4', '3', '2', '0'] ['-1', '-1', '-2', '-3', '-3', '-4', '-5', '-5', '-5', '-5', '-5', '-6', '-5', '-5', '-5', '-6', '-7', '-8', '-9', '-11', '-13', '-12', '-15', '-15'] This is the current netCDF output: time[0] max_t[0]=21 degrees F time[1] max_t[1]=4294967295 degrees F You can see that the time[0] output is correct, though the time[1] output is incorrect. I believe this is an issue with outputting negative integers. Have anyone else encountered this issue and know of a way to solve it? Thanks in advance, Neil -------------- next part -------------- A non-text attachment was scrubbed... 
Name: csv_sample2.csv Type: text/csv Size: 170 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: code_snippet.py Type: text/x-python-script Size: 1174 bytes Desc: not available URL: -------------- next part -------------- From kpr0001 at comcast.net Fri Jul 15 12:44:50 2011 From: kpr0001 at comcast.net (kpr0001 at comcast.net) Date: Fri, 15 Jul 2011 16:44:50 +0000 (UTC) Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython Message-ID: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Hello, I have a new install of Visual studio 2010, into which I just updated the python tools and iron python's latest release (as of 7/15/2011). I downloaded numpy and scipy, modified the path, and followed all of the troubleshooting instructions on your site, stackoverflow, and enthought.com. The libraries appear in the correct place now, but when I type "import python" it throws errors. Traceback (most recent call last): File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an exception. exceptions.SystemError Traceback (most recent call last): File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an exception. System info is Microsofr .NET framework Version 4.0.30319 RTMRel running under Windows 7 I am out of ideas. Help is much appreciated. Thanks, Karen -------------- next part -------------- An HTML attachment was scrubbed... URL: From digital.fireball at googlemail.com Fri Jul 15 16:08:55 2011 From: digital.fireball at googlemail.com (Johannes Eckstein) Date: Fri, 15 Jul 2011 22:08:55 +0200 Subject: [SciPy-User] 'compress' numpy array Message-ID: <4E209E57.5070607@googlemail.com> HI, i have been struggling for some hours with finding indexes in numpy arrays, maybee someone is willing to help me out with this little problem... I have a set of faces, which I formated like this: [[[ 1. -0.1 -0. ] [ 1. -0.09921 -0.01253] [ 1. -0. -0. ]] [[ 1. -0.1 -0. ] [ 1. -0.2 -0. ] [ 1. -0.09921 -0.01253]] [[ 1. -0.2 -0. ] [ 1. -0.19842 -0.02507] [ 1. -0.09921 -0.01253]] [[ 1. -0.2 -0. ] [ 1. -0.3 -0. ] [ 1. -0.19842 -0.02507]] [[ 1. -0.3 -0. ] [ 1. -0.29763 -0.0376 ] [ 1. -0.19842 -0.02507]] [[ 1. -0.3 -0. ] [ 1. -0.4 -0. ] [ 1. -0.29763 -0.0376 ]] [[ 1. -0.4 -0. ] [ 1. -0.39685 -0.05013] [ 1. -0.29763 -0.0376 ]] [[ 1. -0.4 -0. ] [ 1. -0.5 -0. ] [ 1. -0.39685 -0.05013]] [[ 1. -0.5 -0. ] [ 1. -0.49606 -0.06267] [ 1. -0.39685 -0.05013]] [[ 1. -0.5 -0. ] [ 1. -0.6 -0. ] [ 1. -0.49606 -0.06267]] [[ 1. -0.6 -0. ] [ 1. -0.59527 -0.0752 ] [ 1. -0.49606 -0.06267]] [[ 1. -0.6 -0. ] [ 1. -0.7 -0. ] [ 1. -0.59527 -0.0752 ]]] Now I would like to find all the redundant points and create a list from them like this: [[ 1. -0.1 -0. ] [ 1. -0.09921 -0.01253] [ 1. -0. -0. ] [ 1. -0.2 -0. ] [ 1. -0.19842 -0.02507] ...] Then make an array of the indices (which could be also shifted with x-1) looking like this: [[1 2 3] [1 4 2] [4 5 2] ...] Anyone an Idea or a hint of how I can efficiently compute those two results? To me it seems like I need some kind of tricky sorting algorithm, but maybee there is a trick that I don't see... 
Cheers Johannes From ralf.gommers at googlemail.com Sun Jul 17 11:33:23 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 17 Jul 2011 17:33:23 +0200 Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython In-Reply-To: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> References: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Message-ID: On Fri, Jul 15, 2011 at 6:44 PM, wrote: > Hello, I have a new install of Visual studio 2010, into which I just > updated the python tools and iron python's latest release (as of 7/15/2011). > I downloaded numpy and scipy, modified the path, and followed all of the > troubleshooting instructions on your site, stackoverflow, and > enthought.com. The libraries appear in the correct place now, but when I > type "import python" it throws errors. > > Traceback (most recent call last): > File "C:\Program Files (x86)\IronPython > 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in > SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an > exception. > > exceptions.SystemError > Traceback (most recent call last): > File "C:\Program Files (x86)\IronPython > 2.7\lib\site-packages\numpy\core\multiarray.py", line 0, in > SystemError: The type initializer for 'NumpyDotNet.NpyCoreApi' threw an > exception. > > > System info is > > Microsofr .NET framework > Version 4.0.30319 RTMRel > > running under > Windows 7 > > I am out of ideas. Help is much appreciated. > > Numpy/Scipy don't work on .NET, unless forks are available from Enthought. But I haven't seen an announcement on that. What instructions are you referring to? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From johradinger at googlemail.com Mon Jul 18 06:17:34 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 18 Jul 2011 12:17:34 +0200 Subject: [SciPy-User] calculate new values in csv using numpy Message-ID: Hello, I try to load a csv-file (created with excel) and want to calculate a new value. Lets say my file has columns A,B and C and I want to calculate A*B-C and append it in the correct line. So far I managed to get the file read with: table = numpy.genfromtxt("/path/to/file.csv", dtype=None, delimiter=';', skip_header=1) But how do I have to proceed to get the single columns. I know that I have to use after that a FOR-loop to loop over the lines to do the calculation. so there are the two questions: 1) how to extract the single columns? 2) how to append a new columns containing the calculated value? best regards /johannes -------------- next part -------------- An HTML attachment was scrubbed... URL: From johradinger at googlemail.com Mon Jul 18 08:16:59 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 18 Jul 2011 05:16:59 -0700 (PDT) Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: okay so just to post the result I get from: table = numpy.genfromtxt("/path/to/file.csv", names=['s1','s2','p'], dtype="float,float,float", delimiter=';' , skip_header=1) the result looks like this: [(30.633520000000001, 1046.5956699999999, 0.48749999999999999) (9517.6940400000003, 26364.107199999999, 0.26041999999999998) (3102.9560099999999, 0.0, 1.0)... [(30.633520000000001, 1046.5956699999999, 0.48749999999999999)] I can access with table[x] the x-row of that array, but how can I access the columns? table[:,2] doesn't work. 
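For reference, a small illustration of the column access the replies below describe, using the field names from the genfromtxt call above; the literal values are just the first rows pasted above, rounded:

import numpy as np

table = np.array([(30.63352, 1046.59567, 0.4875),
                  (9517.69404, 26364.1072, 0.26042),
                  (3102.95601, 0.0, 1.0)],
                 dtype=[('s1', float), ('s2', float), ('p', float)])

col_p = table['p']          # a whole column, selected by field name
as_2d = np.column_stack([table['s1'], table['s2'], table['p']])
print(as_2d[:, 2])          # plain 2-D indexing works once the fields are stacked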
/joh From ckkart at hoc.net Mon Jul 18 08:19:03 2011 From: ckkart at hoc.net (Christian K.) Date: Mon, 18 Jul 2011 12:19:03 +0000 (UTC) Subject: [SciPy-User] odr - goodness of fit Message-ID: Hi, I am applying odr to do 3d-surface fits which works very well. Now I would like to know if it is possible to construct a 'goodness of fit' quantity (between 0 and 1) like e.g. R2 from likelihood fits. I know about the residual variance but it would be nice to have some quantity which is limited to the [0,1] range. Best regards, Christian From marc.shivers at gmail.com Mon Jul 18 08:20:52 2011 From: marc.shivers at gmail.com (Marc Shivers) Date: Mon, 18 Jul 2011 08:20:52 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: The 3rd column would be: [a[2] for a in table] On Mon, Jul 18, 2011 at 8:16 AM, Johannes Radinger wrote: > okay so just to post the result I get from: > > table = numpy.genfromtxt("/path/to/file.csv", names=['s1','s2','p'], > dtype="float,float,float", delimiter=';' , skip_header=1) > > the result looks like this: > > [(30.633520000000001, 1046.5956699999999, 0.48749999999999999) > ?(9517.6940400000003, 26364.107199999999, 0.26041999999999998) > ?(3102.9560099999999, 0.0, 1.0)... > [(30.633520000000001, 1046.5956699999999, 0.48749999999999999)] > > I can access with table[x] the x-row of that array, but how can I > access > the columns? table[:,2] doesn't work. > > /joh > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From schmidbe at in.tum.de Mon Jul 18 08:51:26 2011 From: schmidbe at in.tum.de (Markus Schmidberger) Date: Mon, 18 Jul 2011 14:51:26 +0200 Subject: [SciPy-User] Your SciPy application on a Computer Cluster in the Cloud - cloudnumbers.com Message-ID: <1310993486.2551.62.camel@schmidb-TravelMate8572TG> Dear SciPy users and experts, cloudnumbers.com provides researchers and companies with the access to resources to perform high performance calculations in the cloud. As cloudnumbers.com's community manager I may invite you to register and test your Python application on a computer cluster in the cloud for free: http://my.cloudnumbers.com/register We are looking forward to get your feedback and consumer insights. Take the chance and have an impact to the development of a new cloud computing calculation platform. Our aim is to change the way of research collaboration is done today by bringing together scientists and businesses from all over the world on a single platform. cloudnumbers.com is a Berlin (Germany) based international high-tech startup striving for enabling everyone to benefit from the High Performance Computing related advantages of the cloud. We provide easy access to applications running on any kind of computer hardware from single core high memory machines up to 1000 cores computer clusters. To get more information check out our web-page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications: http://cloudnumbers.com/blog Key features of our platform for efficient computing in the cloud are: * Turn fixed into variable costs and pay only for the capacity you need. Watch our latest saving costs with cloudnumbers.com video: http://www.youtube.com/watch?v=ln_BSVigUhg&feature=player_embedded * Enter the cloud using an intuitive and user friendly platform. 
Watch our latest cloudnumbers.com in a nutshell video: http://www.youtube.com/watch?v=0ZNEpR_ElV0&feature=player_embedded * Be released from ongoing technological obsolescence and continuous maintenance costs (e.g. linking to libraries or system dependencies) * Accelerated your Python, C, C++, Fortran, R, ... calculations through parallel processing and great computing capacity - more than 1000 cores are available and GPUs are coming soon. * Share your results worldwide (coming soon). * Get high speed access to public databases (please let us know, if your favorite database is missing!). * We have developed a security architecture that meets high requirements of data security and privacy. Read our security white paper: http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf Best Markus -- Dr. rer. nat. Markus Schmidberger Senior Community Manager Cloudnumbers.com GmbH Chausseestra?e 6 10119 Berlin www.cloudnumbers.com E-Mail: markus.schmidberger at cloudnumbers.com ************************* Amtsgericht M?nchen, HRB 191138 Gesch?ftsf?hrer: Erik Muttersbach, Markus Fensterer, Moritz v. Petersdorff-Campen From johradinger at googlemail.com Mon Jul 18 09:07:27 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 18 Jul 2011 15:07:27 +0200 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: Thank you, I think that will be better for calculations to get a 2D array instead of a tuple/matrix combination? What is the prefered way to import the data from csv, to calculate a new column and to save again into a csv? /J -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Mon Jul 18 09:19:44 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 18 Jul 2011 09:19:44 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> The csv-reading code gives you a structured array as an output. You will want to read the documentation about structured arrays to get a better sense of how to use them: http://docs.scipy.org/doc/numpy/user/basics.rec.html For your task, you want: result = table['s1'] + table['s2'] (Here's a suggestion for how to append the result back into a named column in the same structured array: http://mail.scipy.org/pipermail/numpy-discussion/2007-September/029357.html ) I'm not sure if there are any good canned methods for saving record arrays to csv files directly. Probably someone can suggest something... I usually just loop through the array at that point, using ','.join(whatever) to build the individual lines. Zach On Jul 18, 2011, at 9:07 AM, Johannes Radinger wrote: > Thank you, > > I think that will be better for calculations to get a 2D array instead of a tuple/matrix combination? > > What is the prefered way to import the data from csv, to calculate a new column and to save again into a csv? 
> > /J > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From marc.shivers at gmail.com Mon Jul 18 09:27:10 2011 From: marc.shivers at gmail.com (Marc Shivers) Date: Mon, 18 Jul 2011 09:27:10 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> References: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> Message-ID: Also, the genfromtxt function should have returned a numpy array, rather than a list of tuples. table[:,2] will return a result if table is a numpy array. I think the problem might be in your dtype input. You can read about dtype objects here: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html On Mon, Jul 18, 2011 at 9:19 AM, Zachary Pincus wrote: > The csv-reading code gives you a structured array as an output. You will want to read the documentation about structured arrays to get a better sense of how to use them: > http://docs.scipy.org/doc/numpy/user/basics.rec.html > > For your task, you want: > result = table['s1'] + table['s2'] > > (Here's a suggestion for how to append the result back into a named column in the same structured array: > http://mail.scipy.org/pipermail/numpy-discussion/2007-September/029357.html ) > > I'm not sure if there are any good canned methods for saving record arrays to csv files directly. Probably someone can suggest something... I usually just loop through the array at that point, using ','.join(whatever) to build the individual lines. > > Zach > > > > On Jul 18, 2011, at 9:07 AM, Johannes Radinger wrote: > >> Thank you, >> >> I think that will be better for calculations to get a 2D array instead of a tuple/matrix combination? >> >> What is the prefered way to import the data from csv, to calculate a new column and to save again into a csv? >> >> /J >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Mon Jul 18 10:50:18 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 18 Jul 2011 10:50:18 -0400 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: <802B6952-129D-476D-9EA9-DDD8A60FE468@yale.edu> Message-ID: > Also, the genfromtxt function should have returned a numpy array, > rather than a list of tuples. table[:,2] will return a result if > table is a numpy array. I think the problem might be in your dtype > input. You can read about dtype objects here: > http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html >> The csv-reading code gives you a structured array as an output. You will want to read the documentation about structured arrays to get a better sense of how to use them: >> http://docs.scipy.org/doc/numpy/user/basics.rec.html As I discussed, the genfromtext csv-reading code returns a structured numpy array, not a list of tuples. This is an n-by-1 array, where each element of the array is of a structured dtype with individual fields (of potentially different data types) that can be addressed via their names, as I described. The reason for this is that, as CSV files can hold homogenous data types, but traditional numpy arrays cannot, a structured dtype is the best way to load a CSV generically. 
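One possible end-to-end sketch of the workflow being discussed, reading named fields, computing the new column, appending it with numpy.lib.recfunctions.append_fields and writing a CSV back out; the sample data, the field names and the A*B-C rule are assumptions for illustration:

import numpy as np
from io import StringIO
from numpy.lib import recfunctions as rfn

csv_text = StringIO(u"s1;s2;p\n30.6;1046.6;0.49\n9517.7;26364.1;0.26\n")   # stand-in for the real file
table = np.genfromtxt(csv_text, delimiter=';', names=['s1', 's2', 'p'],
                      dtype='float,float,float', skip_header=1)

z = table['s1'] * table['s2'] - table['p']                # the new column
table2 = rfn.append_fields(table, 'z', z, usemask=False)  # structured array with the extra field

with open('file_out.csv', 'w') as out:
    out.write(';'.join(table2.dtype.names) + '\n')        # header line
    for row in table2:
        out.write(';'.join(str(v) for v in row) + '\n')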
If one knows that one's file has only floats in it, then you could read it using fromtext, after stripping off the header, and get an n-by-m float array. Regardless, table[:,2] will NOT return a result if it is an n-by-1 array of structured dtypes, which is what genfromtext will give under most (all?) circumstances. Zach From jdh2358 at gmail.com Mon Jul 18 11:01:21 2011 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 18 Jul 2011 10:01:21 -0500 Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: On Mon, Jul 18, 2011 at 5:17 AM, Johannes Radinger wrote: > Hello, > I try to load a csv-file (created with excel) and want to calculate a new > value. > Lets say my file has columns A,B and C and I want to calculate A*B-C and > append > it in the correct line. > So far I managed to get the file read with: > table = numpy.genfromtxt("/path/to/file.csv", dtype=None, delimiter=';', > skip_header=1) > But how do I have to proceed to get the single columns. I know that I have > to use after that a FOR-loop to loop over the lines to do the calculation. > so there are the two questions: > 1) how to extract the single columns? > 2) how to append a new columns containing the calculated value? import matplotlib.mlab as mlab r = mlab.csv2rec("somefile.csv") z = r.x + r.y r = mlab.rec_append_fields(r, ['z'], [z]) mlab.rec2csv(r, 'newfile.csv') From robert.kern at gmail.com Mon Jul 18 12:22:43 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jul 2011 11:22:43 -0500 Subject: [SciPy-User] odr - goodness of fit In-Reply-To: References: Message-ID: On Mon, Jul 18, 2011 at 07:19, Christian K. wrote: > Hi, > > I am applying odr to do 3d-surface fits which works very well. Now I would > like to know if it is possible to construct a 'goodness of fit' quantity > (between 0 and 1) like e.g. R2 from likelihood fits. I know about the residual > variance but it would be nice to have some quantity which is limited to the > [0,1] range. Well, you could compute the variance of the dataset, var_total, and then the residual variance, var_res, and compute (1-var_res/var_total). That's *roughly* what R2 is, but I'm not sure how meaningful that number will be. I'm fairly certain that you would not want to apply the standard significance values to that quantity. If you had good estimates of the errors on each measurement, then you can get a meaningful Chi^2 value from the residuals that you can use to compare against the Chi^2 distribution in order to get a p-value out (0.5 is good, ~0 means you overestimated your errors, ~1 means you got the fit wrong or underestimated your errors). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From rainexpected at theo.to Mon Jul 18 17:01:12 2011 From: rainexpected at theo.to (Ted To) Date: Mon, 18 Jul 2011 17:01:12 -0400 Subject: [SciPy-User] Conditional bivariate normal Message-ID: <4E249F18.3050600@theo.to> Hi All, I have a puzzle that I'm having trouble figuring out. Suppose U=X+Y where X and Y are independent normals. Sampling X and Y conditional on U>=\bar U takes an inordinate amount of time since at times \bar U can be fairly large so I've been thinking about how to shorten the time. I thought I came upon a solution by using scipy.stats.truncnorm.rvs to first draw U=u and then draw an X=x conditional on U=u. Using U=u and X=x, Y=y=u-x. 
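A sketch of that two-step scheme, with invented standard deviations and threshold. For independent zero-mean normals, X conditional on U = u is again normal with mean u*sx**2/(sx**2 + sy**2) and standard deviation sqrt(sx**2*sy**2/(sx**2 + sy**2)); the follow-up below traces the too-small spreads to using that variance without taking its square root:

import numpy as np
from scipy import stats

sx, sy, ubar = 1.0, 2.0, 3.0             # invented standard deviations and threshold
su = np.sqrt(sx**2 + sy**2)              # U = X + Y is N(0, su**2)

# draw U conditional on U >= ubar
u = stats.truncnorm.rvs(ubar / su, np.inf, loc=0.0, scale=su, size=100000)

# draw X | U = u from its conditional normal, then set Y = u - X
m_x = u * sx**2 / (sx**2 + sy**2)
s_x = np.sqrt(sx**2 * sy**2 / (sx**2 + sy**2))
x = m_x + s_x * np.random.randn(u.size)
y = u - x

print(x.std(), y.std(), (x + y).min())   # sanity checks: spreads, and the truncation at ubar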
I'm getting the correct means and the correct standard deviation for U but the standard deviations for X and Y are too small. Is there something wrong with my logic or have I incorrectly derived the sd for X|U=u? Thanks, Ted From rainexpected at theo.to Mon Jul 18 19:13:06 2011 From: rainexpected at theo.to (Ted To) Date: Mon, 18 Jul 2011 19:13:06 -0400 Subject: [SciPy-User] Conditional bivariate normal In-Reply-To: <4E249F18.3050600@theo.to> References: <4E249F18.3050600@theo.to> Message-ID: <4E24BE02.4080508@theo.to> On 07/18/2011 05:01 PM, Ted To wrote: > Hi All, > > I have a puzzle that I'm having trouble figuring out. Suppose U=X+Y > where X and Y are independent normals. Sampling X and Y conditional on > U>=\bar U takes an inordinate amount of time since at times \bar U can > be fairly large so I've been thinking about how to shorten the time. I > thought I came upon a solution by using scipy.stats.truncnorm.rvs to > first draw U=u and then draw an X=x conditional on U=u. Using U=u and > X=x, Y=y=u-x. > > I'm getting the correct means and the correct standard deviation for U > but the standard deviations for X and Y are too small. Is there > something wrong with my logic or have I incorrectly derived the sd for > X|U=u? Gah! Never mind. I forgot to take the square root of the variance... From lutz.maibaum at gmail.com Mon Jul 18 19:53:35 2011 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Mon, 18 Jul 2011 16:53:35 -0700 Subject: [SciPy-User] 'compress' numpy array In-Reply-To: <4E209E57.5070607@googlemail.com> References: <4E209E57.5070607@googlemail.com> Message-ID: On Jul 15, 2011, at 1:08 PM, Johannes Eckstein wrote: > I have a set of faces, which I formated like this: > [[[ 1. -0.1 -0. ] > [ 1. -0.09921 -0.01253] > [ 1. -0. -0. ]] > > [[ 1. -0.1 -0. ] > [ 1. -0.2 -0. ] > [ 1. -0.09921 -0.01253]] > > [[ 1. -0.2 -0. ] > [ 1. -0.19842 -0.02507] > [ 1. -0.09921 -0.01253]] > > [[ 1. -0.2 -0. ] > [ 1. -0.3 -0. ] > [ 1. -0.19842 -0.02507]] > > [[ 1. -0.3 -0. ] > [ 1. -0.29763 -0.0376 ] > [ 1. -0.19842 -0.02507]] > > [[ 1. -0.3 -0. ] > [ 1. -0.4 -0. ] > [ 1. -0.29763 -0.0376 ]] > > [[ 1. -0.4 -0. ] > [ 1. -0.39685 -0.05013] > [ 1. -0.29763 -0.0376 ]] > > [[ 1. -0.4 -0. ] > [ 1. -0.5 -0. ] > [ 1. -0.39685 -0.05013]] > > [[ 1. -0.5 -0. ] > [ 1. -0.49606 -0.06267] > [ 1. -0.39685 -0.05013]] > > [[ 1. -0.5 -0. ] > [ 1. -0.6 -0. ] > [ 1. -0.49606 -0.06267]] > > [[ 1. -0.6 -0. ] > [ 1. -0.59527 -0.0752 ] > [ 1. -0.49606 -0.06267]] > > [[ 1. -0.6 -0. ] > [ 1. -0.7 -0. ] > [ 1. -0.59527 -0.0752 ]]] > > Now I would like to find all the redundant points and create a list from > them like this: > [[ 1. -0.1 -0. ] > [ 1. -0.09921 -0.01253] > [ 1. -0. -0. ] > [ 1. -0.2 -0. ] > [ 1. -0.19842 -0.02507] > ...] > > Then make an array of the indices (which could be also shifted with x-1) > looking like this: > [[1 2 3] > [1 4 2] > [4 5 2] > ...] > > Anyone an Idea or a hint of how I can efficiently compute those two results? > To me it seems like I need some kind of tricky sorting algorithm, but > maybee there is a trick that I don't see? This sounds like a case for np.unique, with the caveat that your elements are 3-tuples of floats, which unique doesn't seem to handle. You could work around this by turning your points into records of 3 floats. Perhaps something like the following would work (let's call the input array you posted "input"): In [2]: input=array([[[ 1. , -0.1 , -0. ], ...: [ 1. , -0.09921, -0.01253], ...: [ 1. , -0. , -0. ]], ...: ...: [[ 1. , -0.1 , -0. ], ...: [ 1. , -0.2 , -0. 
], ...: [ 1. , -0.09921, -0.01253]], ...: ...: [[ 1. , -0.2 , -0. ], ...: [ 1. , -0.19842, -0.02507], ...: [ 1. , -0.09921, -0.01253]], ...: ...: [[ 1. , -0.2 , -0. ], ...: [ 1. , -0.3 , -0. ], ...: [ 1. , -0.19842, -0.02507]], ...: ...: [[ 1. , -0.3 , -0. ], ...: [ 1. , -0.29763, -0.0376 ], ...: [ 1. , -0.19842, -0.02507]], ...: ...: [[ 1. , -0.3 , -0. ], ...: [ 1. , -0.4 , -0. ], ...: [ 1. , -0.29763, -0.0376 ]], ...: ...: [[ 1. , -0.4 , -0. ], ...: [ 1. , -0.39685, -0.05013], ...: [ 1. , -0.29763, -0.0376 ]], ...: ...: [[ 1. , -0.4 , -0. ], ...: [ 1. , -0.5 , -0. ], ...: [ 1. , -0.39685, -0.05013]], ...: ...: [[ 1. , -0.5 , -0. ], ...: [ 1. , -0.49606, -0.06267], ...: [ 1. , -0.39685, -0.05013]], ...: ...: [[ 1. , -0.5 , -0. ], ...: [ 1. , -0.6 , -0. ], ...: [ 1. , -0.49606, -0.06267]], ...: ...: [[ 1. , -0.6 , -0. ], ...: [ 1. , -0.59527, -0.0752 ], ...: [ 1. , -0.49606, -0.06267]], ...: ...: [[ 1. , -0.6 , -0. ], ...: [ 1. , -0.7 , -0. ], ...: [ 1. , -0.59527, -0.0752 ]]]) In [3]: input.shape Out[3]: (12, 3, 3) In [4]: temp = input.ravel().view([('x', float), ('y', float), ('z', float)]) In [5]: temp.shape Out[5]: (36,) In [6]: uniquepoints, indices = np.unique(temp,return_inverse=True) In [7]: uniquepoints Out[7]: array([(1.0, -0.69999999999999996, -0.0), (1.0, -0.59999999999999998, -0.0), (1.0, -0.59526999999999997, -0.075200000000000003), (1.0, -0.5, -0.0), (1.0, -0.49606, -0.062670000000000003), (1.0, -0.40000000000000002, -0.0), (1.0, -0.39684999999999998, -0.050130000000000001), (1.0, -0.29999999999999999, -0.0), (1.0, -0.29763000000000001, -0.037600000000000001), (1.0, -0.20000000000000001, -0.0), (1.0, -0.19842000000000001, -0.025069999999999999), (1.0, -0.10000000000000001, -0.0), (1.0, -0.099210000000000007, -0.012529999999999999), (1.0, -0.0, -0.0)], dtype=[('x', ' Hi all I'm looking some help in using fmin_cg to optimise a function. Basically I provide a function and its gradient as follows; > p1 = fmin_cg(func,p0,fprime=frime) and everything works fine. However, both func and fprime require the same matrix inversion at each step (via cholesky factorization). As matrix inversion is expensive, ideally I would like to calculate it only once per step, and use the matrix inverse calculated by func in the fprime function without having to repeat the calculation. Is this possible? ie to use the inverse calculated in func in the frime function as well? Thanks for any help. -- View this message in context: http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html Sent from the Scipy-User mailing list archive at Nabble.com. From serra.guillem at gmail.com Mon Jul 18 17:06:18 2011 From: serra.guillem at gmail.com (metge) Date: Mon, 18 Jul 2011 14:06:18 -0700 (PDT) Subject: [SciPy-User] scipy signal decimate why convolve among points we will decimate? Message-ID: In scipy signal, the decimate function uses a standard fir filtering to prevent aliasing before decimating. However, this filtering is applied to the entire set of points. It should be very easy to optimize it convolving only the points we will not sustract, specially if the filter order and the decimation factor are high. 
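To illustrate the point, here is a sketch (not what scipy.signal.decimate does internally; the firwin design and the factor of 8 are arbitrary choices): evaluating the anti-aliasing FIR only at the surviving sample positions gives exactly the kept samples of the filter-everything-then-slice approach, with roughly q times fewer multiplies.

import numpy as np
from scipy import signal

q = 8                                    # decimation factor (arbitrary)
taps = signal.firwin(31, 1.0 / q)        # a simple anti-aliasing FIR (assumed design)
x = np.random.randn(1000)

# filter every sample, then keep one in q -- the approach being questioned
kept_all = signal.lfilter(taps, 1.0, x)[::q]

# evaluate the same FIR only at the samples that survive decimation
padded = np.concatenate([np.zeros(len(taps) - 1), x])   # zero history, like lfilter
kept_only = np.array([np.dot(taps, padded[i + len(taps) - 1::-1][:len(taps)])
                      for i in range(0, len(x), q)])

print(np.allclose(kept_all, kept_only))  # True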
From guziy.sasha at gmail.com Mon Jul 18 20:30:20 2011 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Mon, 18 Jul 2011 20:30:20 -0400 Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: <32085615.post@talk.nabble.com> References: <32085615.post@talk.nabble.com> Message-ID: You could use the same dictionary in both functions and save the inverses to it. (a=>inv(a)) -- Oleksandr 2011/7/18 gibbon > > Hi all > > I'm looking some help in using fmin_cg to optimise a function. Basically I > provide a function and its gradient as follows; > > > p1 = fmin_cg(func,p0,fprime=frime) > > and everything works fine. However, both func and fprime require the same > matrix inversion at each step (via cholesky factorization). As matrix > inversion is expensive, ideally I would like to calculate it only once per > step, and use the matrix inverse calculated by func in the fprime function > without having to repeat the calculation. > > Is this possible? ie to use the inverse calculated in func in the frime > function as well? > > Thanks for any help. > -- > View this message in context: > http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johradinger at googlemail.com Tue Jul 19 05:38:17 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 11:38:17 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation Message-ID: Hello SciPy-People, I have got a normal distribution with ?=0 and I know that 50% of all observations are within a certain range (50% probability are between -x and +x). How can I get the standard deviation of that normal distribution? Usually 68,3 % are within one SD. How is it possible to calculate the SD from my 50% value? Is there any conversion factor I can use? Can that be simply and exactly calculated with Scipy? Thank you /Johannes From david_baddeley at yahoo.com.au Tue Jul 19 05:51:15 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 19 Jul 2011 02:51:15 -0700 (PDT) Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: References: Message-ID: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> Hi Johannes, I guess what you're talking about is the inter-quartile range - if you know that your data is normally distributed the std deviation and IQR are related by a constant factor: IQR ~ 1.349\sigma (see wikipedia article on interquartile range, can also easily be derived from the gaussian CDF) cheers, David ----- Original Message ---- From: Johannes Radinger To: SciPy-User at scipy.org Sent: Tue, 19 July, 2011 9:38:17 PM Subject: [SciPy-User] from 50% probablity value to standard deviation Hello SciPy-People, I have got a normal distribution with ?=0 and I know that 50% of all observations are within a certain range (50% probability are between -x and +x). How can I get the standard deviation of that normal distribution? Usually 68,3 % are within one SD. How is it possible to calculate the SD from my 50% value? Is there any conversion factor I can use? Can that be simply and exactly calculated with Scipy? 
Thank you /Johannes _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From deil.christoph at googlemail.com Tue Jul 19 06:48:15 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Tue, 19 Jul 2011 12:48:15 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: References: Message-ID: On Jul 19, 2011, at 11:38 AM, Johannes Radinger wrote: > Hello SciPy-People, > > I have got a normal distribution with ?=0 and I know that 50% of all > observations > are within a certain range (50% probability are between -x and +x). > How can I get the standard deviation of that normal distribution? > Usually 68,3 % are within one SD. How is it possible to calculate the > SD from my 50% value? Is there any conversion factor I can use? Can > that be simply and exactly calculated with Scipy? > You can get the conversion factor like this: >>> scipy.stats.halfnorm.ppf(0.5) 0.67448975019608171 Here is how you use it to compute the standard deviation: >>> sd = x / scipy.stats.halfnorm.ppf(0.5) Christoph From johradinger at googlemail.com Tue Jul 19 07:00:40 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 13:00:40 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> References: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> Message-ID: Thank you for that answer, the wiki articicle helped me alot, but I've still some problems of understanding, probably just a very simple thing. My case: 50% are between -165 and +165, so my IQR=330 Calculating with the factor 1.349 gives my the SD-range of -222.6 - +222.6 is that correct? Meaning that the SD is 222.6 In the graphic of the wikipedia-articel is shown that the IQR is between -0.6745*SD and + 0.6745*SD...If I just try to solve for 165/0.6745=SD results in a SD of 244.6 ... I am not sure why, probably just a very simple mathematical problem I don't realise ;) Maybe you can help Thank you /Johannes From jrennie at gmail.com Tue Jul 19 08:20:22 2011 From: jrennie at gmail.com (Jason Rennie) Date: Tue, 19 Jul 2011 08:20:22 -0400 Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: <32085615.post@talk.nabble.com> References: <32085615.post@talk.nabble.com> Message-ID: On Mon, Jul 18, 2011 at 3:09 PM, gibbon wrote: > However, both func and fprime require the same matrix inversion at each > step (via cholesky factorization). As matrix > inversion is expensive, ideally I would like to calculate it only once > per step, and use the matrix inverse calculated by func in the fprime > function without having to repeat the calculation. > Try fmin_l_bfgs_b or fmin_tnc which allow 'func' to return both function value and gradient (if fprime=None). Though these functions allow you to specify simple bounds, they work well for unbounded problems in my experience. http://docs.scipy.org/doc/scipy/reference/optimize.html Jason -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthieu.brucher at gmail.com Tue Jul 19 08:26:27 2011 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 19 Jul 2011 14:26:27 +0200 Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: <32085615.post@talk.nabble.com> References: <32085615.post@talk.nabble.com> Message-ID: Yes, it's easy to do! Create a class with your methods and function like this: class Function(object): def __init__(self): do something def cost(self, param): compute the parameters def prime(self, param): compute the gradient def inverse(self, param): call this one from cost and prime, do some caching stuff fun = function() p1 = fmin_cg(fun.cost, fprime = fun.prime) Matthieu 2011/7/18 gibbon > > Hi all > > I'm looking some help in using fmin_cg to optimise a function. Basically I > provide a function and its gradient as follows; > > > p1 = fmin_cg(func,p0,fprime=frime) > > and everything works fine. However, both func and fprime require the same > matrix inversion at each step (via cholesky factorization). As matrix > inversion is expensive, ideally I would like to calculate it only once per > step, and use the matrix inverse calculated by func in the fprime function > without having to repeat the calculation. > > Is this possible? ie to use the inverse calculated in func in the frime > function as well? > > Thanks for any help. > -- > View this message in context: > http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jul 19 09:34:22 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jul 2011 15:34:22 +0200 Subject: [SciPy-User] from 50% probablity value to standard deviation In-Reply-To: References: <1311069075.83512.YahooMailRC@web113416.mail.gq1.yahoo.com> Message-ID: On Tue, Jul 19, 2011 at 1:00 PM, Johannes Radinger wrote: > Thank you for that answer, > the wiki articicle helped me alot, but I've still > some problems of understanding, probably just a > very simple thing. > > My case: > > 50% are between -165 and +165, so my IQR=330 > > Calculating with the factor 1.349 gives my the SD-range of > -222.6 - +222.6 is that correct? Meaning that the SD is 222.6 > > In the graphic of the wikipedia-articel is shown that the IQR > is between -0.6745*SD and + 0.6745*SD...If I just try to > solve for 165/0.6745=SD results in a SD of 244.6 ... 
> > I am not sure why, probably just a very simple mathematical problem I > don't realise ;) > > Maybe you can help std = -165 / stats.norm.ppf(0.25) print stats.norm.cdf(165, scale=std) - stats.norm.cdf(-165, scale=std) 0.5 (checked on another computer) Josef > > Thank you > /Johannes > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gustavo.goretkin at gmail.com Tue Jul 19 02:54:58 2011 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Tue, 19 Jul 2011 02:54:58 -0400 Subject: [SciPy-User] odeint for pendulum with limits Message-ID: I am trying to model a pendulum which has a limited range of motion. The state of the pendulum consists of its angular position and velocity. When the pendulum hits one of its stops, the velocity goes to zero. Can I model this with the integrators in SciPy? In the dX/dt = F(X), I can write F such that the position is clipped into some range, but I don't think I can make the velocity discontinuously drop to zero. I'd appreciate any suggestions, including removing the discontinuity and instead placing a sharp, but continuous stop. Thanks, Gustavo -------------- next part -------------- An HTML attachment was scrubbed... URL: From Neale.Gibson at astro.ox.ac.uk Tue Jul 19 05:38:09 2011 From: Neale.Gibson at astro.ox.ac.uk (gibbon) Date: Tue, 19 Jul 2011 02:38:09 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] fmin_cg - using the same inverse calculation in func and fprime In-Reply-To: References: <32085615.post@talk.nabble.com> Message-ID: <32089558.post@talk.nabble.com> Thanks Oleksandr That seems to have done the trick - I forgot lists/dictionaries passed to a function were by reference and could be accessed without returning them. I've decided to use a list instead as follows; list = [K,] p1 = fmin_cg(frime,p0,args=(list,),fprime=fprime) and both func and fprime can now manipulate and store the matrix K. I don't think the optimisation functions in general always evaluate the func and then fprime (and the Hessian for ncg) in order, so a few messy if clauses might be necessary to check if the inverse matrix has been updated for each new parameter set. Thanks again for your help. sanGuziy wrote: > > You could use the same dictionary in both functions and save the inverses > to > it. > (a=>inv(a)) > -- > Oleksandr > > 2011/7/18 gibbon > >> >> Hi all >> >> I'm looking some help in using fmin_cg to optimise a function. Basically >> I >> provide a function and its gradient as follows; >> >> > p1 = fmin_cg(func,p0,fprime=frime) >> >> and everything works fine. However, both func and fprime require the same >> matrix inversion at each step (via cholesky factorization). As matrix >> inversion is expensive, ideally I would like to calculate it only once >> per >> step, and use the matrix inverse calculated by func in the fprime >> function >> without having to repeat the calculation. >> >> Is this possible? ie to use the inverse calculated in func in the frime >> function as well? >> >> Thanks for any help. >> -- >> View this message in context: >> http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32085615.html >> Sent from the Scipy-User mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/fmin_cg---using-the-same-inverse-calculation-in-func-and-fprime-tp32085615p32089558.html Sent from the Scipy-User mailing list archive at Nabble.com. 
From aarchiba at physics.mcgill.ca Tue Jul 19 10:07:19 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 19 Jul 2011 10:07:19 -0400 Subject: [SciPy-User] odeint for pendulum with limits In-Reply-To: References: Message-ID: Unfortunately this is a very tricky problem. The naive approach of replacing the stop with an extremely large force runs into problems because the integrator bogs down in tiny steps simulating the stop in detail. A better approach is to run the integrator with no limits but stopping integration when the pendulum reaches its physical limit. Then you can change the velocity any way you like and restart the integration. In terms of scipy, I don't think any of the integrators support stopping conditions (pydstool does) but I believe ours do support backtracking within the last step, so you can implement this yourself. 
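For instance, a rough sketch of that stop-and-restart loop (illustrative only: the pendulum parameters, the fixed step size and the crude clamp at the stop are made up, and a careful implementation would bisect within the offending step for the exact crossing time):

import numpy as np
from scipy.integrate import ode

g, L, theta_max = 9.81, 1.0, 0.5      # made-up pendulum and stop parameters

def rhs(t, y):
    theta, omega = y
    return [omega, -(g / L) * np.sin(theta)]

solver = ode(rhs).set_integrator('dopri5')
solver.set_initial_value([0.0, 2.0], 0.0)   # start at the rest angle with some velocity

dt, t_end = 1e-3, 10.0
while solver.successful() and solver.t < t_end:
    solver.integrate(solver.t + dt)
    theta, omega = solver.y
    if abs(theta) >= theta_max:
        # hit a stop during this step: clamp the angle, kill the velocity,
        # and restart the integrator from the stop
        theta = max(-theta_max, min(theta_max, theta))
        solver.set_initial_value([theta, 0.0], solver.t)
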
The problem becomes really difficult if you can't compute forces outside the valid domain, because all the good integrators I know sometimes need to evaluate points outside the permitted region. Anne On 7/19/11, Gustavo Goretkin wrote: > I am trying to model a pendulum which has a limited range of motion. The > state of the pendulum consists of its angular position and velocity. When > the pendulum hits one of its stops, the velocity goes to zero. > > Can I model this with the integrators in SciPy? In the dX/dt = F(X), I can > write F such that the position is clipped into some range, but I don't think > I can make the velocity discontinuously drop to zero. I'd appreciate any > suggestions, including removing the discontinuity and instead placing a > sharp, but continuous stop. > > Thanks, > Gustavo > -- Sent from my mobile device From johradinger at googlemail.com Tue Jul 19 10:27:28 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 07:27:28 -0700 (PDT) Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: References: Message-ID: <1fadea34-200f-4662-879e-9f66eded9aa3@l28g2000yqc.googlegroups.com> Thank you for all your help and your different solutions to the problem. What I did no is following: import csv data = [] # Read form csv file and compute fourth column f = open("/path/to/file.csv", 'rU') reader = csv.reader(f, delimiter=";") for row in reader: s1=float(row[0]) s2=float(row[1]) p=float(row[2]) if s2>0: A=A1 def func(x,s1,s2,m,A,p): return (p) * stats.norm.cdf(x, loc=m, scale=s1) + (1-p) * stats.norm.cdf(x, loc=m, scale=s2) - A x1=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) A=A2 x2=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) data.append(row + [x1] + [x2]) elif s2==0: x1=s1 x2=s1*3 data.append(row + [x1] + [x2]) else: print "Error" f.close() print data # Write new array to csv file f = open("/path/to/new_file.csv", 'wb') writer = csv.writer(f, delimiter=';') for row in data: writer.writerow(row) f.close() And that works...nearly perfect I just get 3 warning of follwowing type: Warning (from warnings module): File "/Library/Frameworks/Python.framework/Versions/2.6/lib/ python2.6/site-packages/scipy/optimize/zeros.py", line 125 warnings.warn(msg, RuntimeWarning) RuntimeWarning: Tolerance of 1697.3557819 reached How can I check which row is causing the problem? Thanks cheers /Johannes From jsseabold at gmail.com Tue Jul 19 10:43:04 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 19 Jul 2011 10:43:04 -0400 Subject: [SciPy-User] Announcing statsmodels 0.3.0 release Message-ID: We are happy to announce that statsmodels version 0.3.0 is available for download. You can install from PyPI with easy_install -U scikits.statsmodels What's new? 
https://github.com/statsmodels/statsmodels/blob/master/CHANGES.txt Documentation: http://statsmodels.sourceforge.net/ Source Distributions: http://pypi.python.org/pypi/scikits.statsmodels Repository: https://github.com/statsmodels/statsmodels Mailing List: https://groups.google.com/group/pystatsmodels Bug Tracker: https://github.com/statsmodels/statsmodels/issues You can find us on the web at http://blog.wesmckinney.com/ and http://scipystats.blogspot.com/ or keep up with development on twitter @statsmodels Cheers, Josef Perktold, Skipper Seabold, Wes McKinney, Mike Crow, Vincent Davis ---------------------------------------------- Statsmodels is a python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation of statistical models. scikits.statsmodels provides classes and functions for the estimation of several categories of statistical models. These currently include linear regression models, OLS, GLS, WLS and GLS with AR(p) errors, generalized linear models for six distribution families, M-estimators for robust linear models, and regression with discrete dependent variables, Logit, Probit, MNLogit, Poisson, based on maximum likelihood estimators, timeseries models, ARMA, AR and VAR. An extensive list of result statistics are available for each estimation problem. Statsmodels also contains descriptive statistics, a wide range of statistical tests, tools for density estimation, and more. From johradinger at googlemail.com Tue Jul 19 11:12:30 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 19 Jul 2011 08:12:30 -0700 (PDT) Subject: [SciPy-User] calculate new values in csv using numpy In-Reply-To: <1fadea34-200f-4662-879e-9f66eded9aa3@l28g2000yqc.googlegroups.com> References: <1fadea34-200f-4662-879e-9f66eded9aa3@l28g2000yqc.googlegroups.com> Message-ID: <07a40dd6-da4d-402a-b8c8-1b34503ccff7@g2g2000vbl.googlegroups.com> So I found out which lines are causing the problems but I don't know yet why: It seems that my optimize function can solve with very small values of s1. the optimize function i am using to solve is again: def func(x,s1,s2,m,A,p): return (p) * stats.norm.cdf(x, loc=m, scale=s1) + (1-p) * stats.norm.cdf(x, loc=m, scale=s2) - A x1=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) where m=0, A=0.6827 and following value-triples(s1,s2,p) causing problems: ['0.453567', '56.449087', '0.945475'] ['0.109604', '32.540055', '0.574013'] ['0.152876', '7.009490', '0.646816'] but why is here the tolerance reached? what can I do to improve that because the results I get aren't correct. /Johannes From cweisiger at msg.ucsf.edu Tue Jul 19 12:04:34 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Tue, 19 Jul 2011 09:04:34 -0700 Subject: [SciPy-User] Uniquely identify array Message-ID: Is there some way in Python to uniquely identify a given Numpy array? E.g. to get a pointer to its location in memory or something similar? I'm looking for some way to determine which operations will implicitly create new arrays, just to verify that I'm not doing anything that will seriously hurt my performance -- but this seems like something that would be generally useful to know. Unfortunately ndarrays don't allow arbitrary additions to their namespace; no doing "foo.myUniqueIdentifier = 1", for example. Thanks in advance! 
-Chris From robert.kern at gmail.com Tue Jul 19 12:15:03 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jul 2011 11:15:03 -0500 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: On Tue, Jul 19, 2011 at 11:04, Chris Weisiger wrote: > Is there some way in Python to uniquely identify a given Numpy array? > E.g. to get a pointer to its location in memory or something similar? > I'm looking for some way to determine which operations will implicitly > create new arrays, just to verify that I'm not doing anything that > will seriously hurt my performance -- but this seems like something > that would be generally useful to know. The 'data' entry in the .__array_interface__ dictionary is the memory pointer to the start of the array. http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#__array_interface__ [~] |3> x = np.arange(10) [~] |4> x.__array_interface__ {'data': (23147728, False), 'descr': [('', ' x[5:].__array_interface__ {'data': (23147748, False), 'descr': [('', ' np.may_share_memory(x, x[5:]) True However, it is susceptible to false positives when the memory ranges overlap, but the strides cause the elements to miss each other. Hence the noncommittal name: [~] |7> np.may_share_memory(x[0::2], x[1::2]) True -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From aarchiba at physics.mcgill.ca Tue Jul 19 12:22:19 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 19 Jul 2011 12:22:19 -0400 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: This is a little more subtle than it sounds. Most python objects can be compared for identity with "is" (e.g. "if x is None:"). This tests for pointer equality, that is, it confirms that you have the same dynamically-allocated heap object. This will work for arrays, but it might be too specific for what you want: a numpy array actually consists of two heap objects, a python object that describes the array, and a memory arena. Slicing operations like A[::-1] are fast because while they create a new python object, the memory arena is untouched. So you need to decide whether what you care about is any change at all to the array, or whether what you care about is whether a new memory arena has been allocated. A brief aside: people often think they care about allocation of new arrays, but in most cases they're mistaken. malloc() is an extremely fast operation, especially for large arrays, in which case it's usually a direct call to the OS's mmap (and free really does free the memory back to the system). If what you're worried about is that your code is slower than it should be, making sure there are no extra allocations is not the best place to look. In-place operations have their own limitations, things like cache-coherency issues and cache efficiency of strided memory access. This is not theoretical: I had some code, a few years ago, that manipulated large arrays and was slow. So I painstakingly went through and made it use in-place operations where possible and avoid malloc()ing new arrays. Not only did it get slower, the memory usage increased. On the other hand, if you want to know whether you're getting slices that allow you to modify the original array or freshly-allocated arenas, the bluntest available instrument is to write to the one and see if the other changes. 
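Concretely (a small, illustrative example with arbitrary arrays):

import numpy as np

a = np.arange(10)
view = a[::-1]     # slicing: new Python object, same memory arena
copy = a + 0       # arithmetic: freshly allocated arena

view[0] = 99       # a[-1] becomes 99 -> view shares a's memory
copy[0] = 99       # a[0] is untouched -> copy does not
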
There are some more subtle approaches that are a little approximate, things like checking the address of the memory arena, or the equality of the base numpy array object (be warned that you have to traverse a tree of up pointers to get this last). I say approximate because while A[::2] and A[1::2] share a memory arena, and even have overlapping extents, you can modify them independently of each other. In short, you need to think hard about exactly what you're testing for. But for unit tests I recommend using modifications to test for memory sharing. Anne On 19 July 2011 12:04, Chris Weisiger wrote: > Is there some way in Python to uniquely identify a given Numpy array? > E.g. to get a pointer to its location in memory or something similar? > I'm looking for some way to determine which operations will implicitly > create new arrays, just to verify that I'm not doing anything that > will seriously hurt my performance -- but this seems like something > that would be generally useful to know. > > Unfortunately ndarrays don't allow arbitrary additions to their > namespace; no doing "foo.myUniqueIdentifier = 1", for example. > > Thanks in advance! > > -Chris > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cweisiger at msg.ucsf.edu Tue Jul 19 12:30:28 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Tue, 19 Jul 2011 09:30:28 -0700 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: Thanks for the detailed response, both you and Robert Kern. My immediate problem is not especially significant; I have three arrays: one of data D, one of additive offsets O, and one of multiplicative modifiers M. The first is of ints and the latter two of floats, and I want to get D * M - O as ints and shove them into an existing buffer. This is not an especially expensive operation (the dataset is 512x512), but I found myself curious about what it's doing behind the scenes, and the most straightforward way I know of to track that kind of thing is to track allocations. I don't expect it would make a big difference to hyper-optimize this problem, but in the future I may need tighter code in some other application, and I'd rather know now than potentially go down a wrong path later. I know more now about what Numpy's doing than I did before this thread. Thanks for the prompt and detailed responses. :) -Chris On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald wrote: > This is a little more subtle than it sounds. Most python objects can > be compared for identity with "is" (e.g. "if x is None:"). This tests > for pointer equality, that is, it confirms that you have the same > dynamically-allocated heap object. This will work for arrays, but it > might be too specific for what you want: a numpy array actually > consists of two heap objects, a python object that describes the > array, and a memory arena. Slicing operations like A[::-1] are fast > because while they create a new python object, the memory arena is > untouched. So you need to decide whether what you care about is any > change at all to the array, or whether what you care about is whether > a new memory arena has been allocated. > > A brief aside: people often think they care about allocation of new > arrays, but in most cases they're mistaken. 
malloc() is an extremely > fast operation, especially for large arrays, in which case it's > usually a direct call to the OS's mmap (and free really does free the > memory back to the system). If what you're worried about is that your > code is slower than it should be, making sure there are no extra > allocations is not the best place to look. In-place operations have > their own limitations, things like cache-coherency issues and cache > efficiency of strided memory access. This is not theoretical: I had > some code, a few years ago, that manipulated large arrays and was > slow. So I painstakingly went through and made it use in-place > operations where possible and avoid malloc()ing new arrays. Not only > did it get slower, the memory usage increased. > > On the other hand, if you want to know whether you're getting slices > that allow you to modify the original array or freshly-allocated > arenas, the bluntest available instrument is to write to the one and > see if the other changes. There are some more subtle approaches that > are a little approximate, things like checking the address of the > memory arena, or the equality of the base numpy array object (be > warned that you have to traverse a tree of up pointers to get this > last). I say approximate because while A[::2] and A[1::2] share a > memory arena, and even have overlapping extents, you can modify them > independently of each other. > > In short, you need to think hard about exactly what you're testing > for. But for unit tests I recommend using modifications to test for > memory sharing. > > Anne > > On 19 July 2011 12:04, Chris Weisiger wrote: >> Is there some way in Python to uniquely identify a given Numpy array? >> E.g. to get a pointer to its location in memory or something similar? >> I'm looking for some way to determine which operations will implicitly >> create new arrays, just to verify that I'm not doing anything that >> will seriously hurt my performance -- but this seems like something >> that would be generally useful to know. >> >> Unfortunately ndarrays don't allow arbitrary additions to their >> namespace; no doing "foo.myUniqueIdentifier = 1", for example. >> >> Thanks in advance! >> >> -Chris >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From aarchiba at physics.mcgill.ca Tue Jul 19 12:37:27 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 19 Jul 2011 12:37:27 -0400 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: I know you were looking for tools to answer the question and not answers to the question but: The easiest way to do what you want is: output[...] = D*M-O This will convert D to floats, multiply it by M, subtract O, then store the result into output, converting to ints on the fly. I'm not sure whether a floatified version of D is allocated, but I think so. You could do all this in-place at the cost of extra roundings by using the np.multiply(a,b,out) forms of ufuncs. Anne On 19 July 2011 12:30, Chris Weisiger wrote: > Thanks for the detailed response, both you and Robert Kern. My > immediate problem is not especially significant; I have three arrays: > one of data D, one of additive offsets O, and one of multiplicative > modifiers M. 
The first is of ints and the latter two of floats, and I > want to get D * M - O as ints and shove them into an existing buffer. > This is not an especially expensive operation (the dataset is > 512x512), but I found myself curious about what it's doing behind the > scenes, and the most straightforward way I know of to track that kind > of thing is to track allocations. I don't expect it would make a big > difference to hyper-optimize this problem, but in the future I may > need tighter code in some other application, and I'd rather know now > than potentially go down a wrong path later. > > I know more now about what Numpy's doing than I did before this > thread. Thanks for the prompt and detailed responses. :) > > -Chris > > On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald > wrote: >> This is a little more subtle than it sounds. Most python objects can >> be compared for identity with "is" (e.g. "if x is None:"). This tests >> for pointer equality, that is, it confirms that you have the same >> dynamically-allocated heap object. This will work for arrays, but it >> might be too specific for what you want: a numpy array actually >> consists of two heap objects, a python object that describes the >> array, and a memory arena. Slicing operations like A[::-1] are fast >> because while they create a new python object, the memory arena is >> untouched. So you need to decide whether what you care about is any >> change at all to the array, or whether what you care about is whether >> a new memory arena has been allocated. >> >> A brief aside: people often think they care about allocation of new >> arrays, but in most cases they're mistaken. malloc() is an extremely >> fast operation, especially for large arrays, in which case it's >> usually a direct call to the OS's mmap (and free really does free the >> memory back to the system). If what you're worried about is that your >> code is slower than it should be, making sure there are no extra >> allocations is not the best place to look. In-place operations have >> their own limitations, things like cache-coherency issues and cache >> efficiency of strided memory access. This is not theoretical: I had >> some code, a few years ago, that manipulated large arrays and was >> slow. So I painstakingly went through and made it use in-place >> operations where possible and avoid malloc()ing new arrays. Not only >> did it get slower, the memory usage increased. >> >> On the other hand, if you want to know whether you're getting slices >> that allow you to modify the original array or freshly-allocated >> arenas, the bluntest available instrument is to write to the one and >> see if the other changes. There are some more subtle approaches that >> are a little approximate, things like checking the address of the >> memory arena, or the equality of the base numpy array object (be >> warned that you have to traverse a tree of up pointers to get this >> last). I say approximate because while A[::2] and A[1::2] share a >> memory arena, and even have overlapping extents, you can modify them >> independently of each other. >> >> In short, you need to think hard about exactly what you're testing >> for. But for unit tests I recommend using modifications to test for >> memory sharing. >> >> Anne >> >> On 19 July 2011 12:04, Chris Weisiger wrote: >>> Is there some way in Python to uniquely identify a given Numpy array? >>> E.g. to get a pointer to its location in memory or something similar? 
>>> I'm looking for some way to determine which operations will implicitly >>> create new arrays, just to verify that I'm not doing anything that >>> will seriously hurt my performance -- but this seems like something >>> that would be generally useful to know. >>> >>> Unfortunately ndarrays don't allow arbitrary additions to their >>> namespace; no doing "foo.myUniqueIdentifier = 1", for example. >>> >>> Thanks in advance! >>> >>> -Chris >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From seb.haase at gmail.com Tue Jul 19 16:04:13 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 19 Jul 2011 22:04:13 +0200 Subject: [SciPy-User] Uniquely identify array In-Reply-To: References: Message-ID: To get the two operations done in one step without intermediate temporary, you should benefit from using numexpr. There is not much talk about http://code.google.com/p/numexpr anymore, but it got started out of discussions on this list. And since then, it now even supports float32 (not only float64), which is what you want for large image data sets. I always meant to use it myself .... Cheers, Sebastian Haase On Tue, Jul 19, 2011 at 6:37 PM, Anne Archibald wrote: > I know you were looking for tools to answer the question and not > answers to the question but: > > The easiest way to do what you want is: > > output[...] = D*M-O > > This will convert D to floats, multiply it by M, subtract O, then > store the result into output, converting to ints on the fly. I'm not > sure whether a floatified version of D is allocated, but I think so. > You could do all this in-place at the cost of extra roundings by using > the np.multiply(a,b,out) forms of ufuncs. > > Anne > > On 19 July 2011 12:30, Chris Weisiger wrote: >> Thanks for the detailed response, both you and Robert Kern. My >> immediate problem is not especially significant; I have three arrays: >> one of data D, one of additive offsets O, and one of multiplicative >> modifiers M. The first is of ints and the latter two of floats, and I >> want to get D * M - O as ints and shove them into an existing buffer. >> This is not an especially expensive operation (the dataset is >> 512x512), but I found myself curious about what it's doing behind the >> scenes, and the most straightforward way I know of to track that kind >> of thing is to track allocations. I don't expect it would make a big >> difference to hyper-optimize this problem, but in the future I may >> need tighter code in some other application, and I'd rather know now >> than potentially go down a wrong path later. >> >> I know more now about what Numpy's doing than I did before this >> thread. Thanks for the prompt and detailed responses. :) >> >> -Chris >> >> On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald >> wrote: >>> This is a little more subtle than it sounds. Most python objects can >>> be compared for identity with "is" (e.g. "if x is None:"). This tests >>> for pointer equality, that is, it confirms that you have the same >>> dynamically-allocated heap object. 
This will work for arrays, but it >>> might be too specific for what you want: a numpy array actually >>> consists of two heap objects, a python object that describes the >>> array, and a memory arena. Slicing operations like A[::-1] are fast >>> because while they create a new python object, the memory arena is >>> untouched. So you need to decide whether what you care about is any >>> change at all to the array, or whether what you care about is whether >>> a new memory arena has been allocated. >>> >>> A brief aside: people often think they care about allocation of new >>> arrays, but in most cases they're mistaken. malloc() is an extremely >>> fast operation, especially for large arrays, in which case it's >>> usually a direct call to the OS's mmap (and free really does free the >>> memory back to the system). If what you're worried about is that your >>> code is slower than it should be, making sure there are no extra >>> allocations is not the best place to look. In-place operations have >>> their own limitations, things like cache-coherency issues and cache >>> efficiency of strided memory access. This is not theoretical: I had >>> some code, a few years ago, that manipulated large arrays and was >>> slow. So I painstakingly went through and made it use in-place >>> operations where possible and avoid malloc()ing new arrays. Not only >>> did it get slower, the memory usage increased. >>> >>> On the other hand, if you want to know whether you're getting slices >>> that allow you to modify the original array or freshly-allocated >>> arenas, the bluntest available instrument is to write to the one and >>> see if the other changes. There are some more subtle approaches that >>> are a little approximate, things like checking the address of the >>> memory arena, or the equality of the base numpy array object (be >>> warned that you have to traverse a tree of up pointers to get this >>> last). I say approximate because while A[::2] and A[1::2] share a >>> memory arena, and even have overlapping extents, you can modify them >>> independently of each other. >>> >>> In short, you need to think hard about exactly what you're testing >>> for. But for unit tests I recommend using modifications to test for >>> memory sharing. >>> >>> Anne >>> >>> On 19 July 2011 12:04, Chris Weisiger wrote: >>>> Is there some way in Python to uniquely identify a given Numpy array? >>>> E.g. to get a pointer to its location in memory or something similar? >>>> I'm looking for some way to determine which operations will implicitly >>>> create new arrays, just to verify that I'm not doing anything that >>>> will seriously hurt my performance -- but this seems like something >>>> that would be generally useful to know. >>>> >>>> Unfortunately ndarrays don't allow arbitrary additions to their >>>> namespace; no doing "foo.myUniqueIdentifier = 1", for example. >>>> >>>> Thanks in advance! >>>> >>>> -Chris From qingkunlqk at gmail.com Tue Jul 19 17:37:28 2011 From: qingkunlqk at gmail.com (qingkunl) Date: Tue, 19 Jul 2011 14:37:28 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] different answers got from scipy.cluster.vq.kmeans Message-ID: <32094443.post@talk.nabble.com> Hello, When I run scipy.cluster.vq.kmeans with exactly the same parameters passed to the function, I got different answers each time. 
>>> a array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[2, 3, 4], [7, 8, 9]]), 1.7320508075688774) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[7, 8, 9], [2, 3, 4]]), 1.7320508075688774) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[1, 2, 3], [5, 6, 7]]), 1.7320508075688774) >>> kmeans(a, 2, iter=20, thresh=1e-5) (array([[5, 6, 7], [1, 2, 3]]), 1.7320508075688774) I know they are all correct answers. But is there a way for me to get a deterministic answer? Thanks, Qingkun -- View this message in context: http://old.nabble.com/different-answers-got-from-scipy.cluster.vq.kmeans-tp32094443p32094443.html Sent from the Scipy-User mailing list archive at Nabble.com. From kwgoodman at gmail.com Tue Jul 19 17:48:56 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 19 Jul 2011 14:48:56 -0700 Subject: [SciPy-User] [SciPy-user] different answers got from scipy.cluster.vq.kmeans In-Reply-To: <32094443.post@talk.nabble.com> References: <32094443.post@talk.nabble.com> Message-ID: On Tue, Jul 19, 2011 at 2:37 PM, qingkunl wrote: > When I run scipy.cluster.vq.kmeans with exactly the same parameters passed > to the function, I got different answers each time. > > I know they are all correct answers. But is there a way for me to get a > deterministic answer? >From the kmeans docstring: k_or_guess: The initial k centroids are chosen by randomly selecting observations from the observation matrix. Alternatively, passing a k by N array specifies the initial k centroids. For example: >>> kmeans(a, k_or_guess=a, iter=20, thresh=1e-5) From mail.till at gmx.de Tue Jul 19 20:25:38 2011 From: mail.till at gmx.de (Till Stensitzki) Date: Wed, 20 Jul 2011 00:25:38 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Ported_mls=5Falloc_to_Python=2C_can_be_use?= =?utf-8?q?d_to_solve_bounded_linear_lst=2E_sqr=2E_problems=2E?= Message-ID: Hello, due the fact, that scipy is missing a bounded linear least square solver, i ported a working peace of code from matlab central. It's a 1 to 1 copy from mls_alloc in Qcat by Ola Harkegard. Qcat is used in control theory, so it's also possible to define weights for the vectors. See qcats docs for more information. To use it simply as bounded lsq solver, have a look a the file, where a example is given. The port can be found at: https://bitbucket.org/tillsten/pymls/overview All comments appreciated, Till From thkoe002 at gmail.com Wed Jul 20 03:22:55 2011 From: thkoe002 at gmail.com (=?ISO-8859-1?Q?Thomas_K=F6nigstein?=) Date: Wed, 20 Jul 2011 09:22:55 +0200 Subject: [SciPy-User] Convert hdf5 file content to numpy array Message-ID: Hi all, attached to this email (or if the attachment doesn't show up, alternatively at http://dl.dropbox.com/u/15199/vs001_3d_particles.h5 ), you find a 400kb hdf5 file with a number of nodes, which I would like to "import" as numpy array. The code that I use so far (also attached, and here http://dl.dropbox.com/u/15199/mwe.py ) is this: import tables hdf5=tables.openFile("vs001_3d_particles.h5") root=hdf5.root ptcls_names=[_ for _ in dir(root) if _.startswith("Electrons_at_PE_")] ptcls=[eval("root."+_) for _ in ptcls_names] for ptcl in ptcls: ptcl_data=ptcl.read() print type(ptcl_data) # print ptcl_data.dtype # [('cell', ' -------------- next part -------------- A non-text attachment was scrubbed... Name: vs001_3d_particles.h5 Type: application/octet-stream Size: 419188 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mwe.py Type: application/octet-stream Size: 688 bytes Desc: not available URL: From jr at sun.ac.za Wed Jul 20 04:23:35 2011 From: jr at sun.ac.za (Johann Rohwer) Date: Wed, 20 Jul 2011 10:23:35 +0200 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: References: Message-ID: <201107201023.35822.jr@sun.ac.za> On Wednesday 20 July 2011, Thomas K?nigstein wrote: > Hi all, attached to this email (or if the attachment doesn't show > up, alternatively at > http://dl.dropbox.com/u/15199/vs001_3d_particles.h5 ), you find a > 400kb hdf5 file with a number of nodes, which I would like to > "import" as numpy array. The code that I use so far (also > attached, and here http://dl.dropbox.com/u/15199/mwe.py ) is this: > > import tables > > hdf5=tables.openFile("vs001_3d_particles.h5") > root=hdf5.root > > ptcls_names=[_ for _ in dir(root) if > _.startswith("Electrons_at_PE_")] ptcls=[eval("root."+_) for _ in > ptcls_names] > for ptcl in ptcls: > ptcl_data=ptcl.read() > print type(ptcl_data) # > print ptcl_data.dtype # [('cell', ' ' ('weight', ' ptcl_data*=2 # TypeError: unsupported operand type(s) for > *=: 'numpy.ndarray' and 'int' > x=ptcl_data[:,1] # I'd like to do stuff like that > > now, it is type(ptcl_data) == numpy.ndarray, but the contents of > the array are some kind of lists. > How can I now transform this weird, "custom/proprietaty" data > format into an ordinary numpy array? So that I can use operations > as, for example, node1*=2, or slicing, and so on? I use the h5py module for this, not the tables module. In short, import numpy, h5py f = h5py.File('myhdf5file.h5','r') data = f.get('path/to/my/dataset') data_as_array = numpy.array(data) Then you have a normal numpy array with which you can work further. HTH, Johann From johradinger at googlemail.com Wed Jul 20 04:37:16 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Wed, 20 Jul 2011 10:37:16 +0200 Subject: [SciPy-User] optimize: RuntimeWarning Tolerance reached Message-ID: Hej, I posted that already in another thread which was topically different and this is actually a new problem: I use the optimize function to solve: def func(x,s1,s2,m,A,p): return (p) * stats.norm.cdf(x, loc=m, scale=s1) + (1-p) * stats.norm.cdf(x, loc=m, scale=s2) - A x1=optimize.zeros.newton(func, 1., args=(s1,s2,m,A,p)) where m=0, A=0.6827 and following value-triples(s1,s2,p) causing problems: ['0.453567', '56.449087', '0.945475'] ['0.109604', '32.540055', '0.574013'] ['0.152876', '7.009490', '0.646816'] The error I get is like: Warning (from warnings module): File "/Library/Frameworks/Python.framework/Versions/2.6/lib/ python2.6/site-packages/scipy/optimize/zeros.py", line 125 warnings.warn(msg, RuntimeWarning) RuntimeWarning: Tolerance of 1697.3557819 reached but why is here the tolerance reached? what can I do to improve that because the results I get aren't correct. /Johannes From franckkalala at googlemail.com Wed Jul 20 06:22:28 2011 From: franckkalala at googlemail.com (franck kalala) Date: Wed, 20 Jul 2011 11:22:28 +0100 Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution Message-ID: Hi list, I want to generate number between 0 and 75 from normal distribution, what can I use? Thank you -- ********** ++++ --- * -------------- next part -------------- An HTML attachment was scrubbed... 
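One way to draw such values with scipy.stats.truncnorm (a small sketch; the mean and standard deviation below are placeholders to be chosen for the problem at hand):

import numpy as np
from scipy import stats

mean, sd = 37.5, 15.0                          # placeholder parameters
a, b = (0 - mean) / sd, (75 - mean) / sd       # truncation bounds in units of sd
draws = stats.truncnorm.rvs(a, b, loc=mean, scale=sd, size=1000)
ints = np.round(draws).astype(int)             # integers in 0..75
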
URL: From brennan.williams at visualreservoir.com Wed Jul 20 06:35:32 2011 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Wed, 20 Jul 2011 22:35:32 +1200 Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution In-Reply-To: References: Message-ID: <4E26AF74.6060800@visualreservoir.com> For a normal distribution you would use... scipy.stats.norm(loc=mean,scale=stddev) but if you want to restrict the range of generated values to being >=0 and <=75 you will need to use a truncated normal. Is that what you want to do? On 20/07/2011 10:22 p.m., franck kalala wrote: > Hi list, > > > I want to generate number between 0 and 75 from normal distribution, > what can I use? > > > Thank you > > -- > ********** > ++++ > --- > * > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From franckkalala at googlemail.com Wed Jul 20 06:38:15 2011 From: franckkalala at googlemail.com (franck kalala) Date: Wed, 20 Jul 2011 11:38:15 +0100 Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution In-Reply-To: <4E26AF74.6060800@visualreservoir.com> References: <4E26AF74.6060800@visualreservoir.com> Message-ID: yes I want to restrict these numbers between 0 and 75, 2011/7/20 Brennan Williams > > For a normal distribution you would use... > > scipy.stats.norm(loc=mean,scale=stddev) > > but if you want to restrict the range of generated values to being >=0 and > <=75 you will need to use a truncated normal. Is that what you want to do? > > > > On 20/07/2011 10:22 p.m., franck kalala wrote: > > Hi list, > > > I want to generate number between 0 and 75 from normal distribution, what > can I use? > > > Thank you > > -- > ********** > ++++ > --- > * > > > > _______________________________________________ > SciPy-User mailing listSciPy-User at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- ********** ++++ --- * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Jul 20 12:27:18 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 20 Jul 2011 11:27:18 -0500 Subject: [SciPy-User] [Scipy-User] Building Scipy Docs Message-ID: I'm trying to build the html scipy docs and it keeps failing. In the docs directory here's what I run and the output. >make html mkdir -p build/html build/doctrees LANG=C sphinx-build -b html -d build/doctrees source build/html Running Sphinx v1.0.7 Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) Extension error: Could not import extension plot_directive (exception: No module named plot_directive) make: *** [html] Error 1 I can't figure out why it's failing. The plot_directive stuff is mentioned in the numpy docs documents, but nowhere in the scipy docs documents. And the numpy docs are building fine. I'm running scipy 0.10.0 dev, numpy 2.0.0 dev and matplotlib 1.0.1, sphinx 1.0.7, and numpydoc 0.4 on Ubuntu 10.10. Any ideas what's happening? Thanks, Chris Jordan-Squire -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Wed Jul 20 13:11:37 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 20 Jul 2011 19:11:37 +0200 Subject: [SciPy-User] [Scipy-User] Building Scipy Docs In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 6:27 PM, Christopher Jordan-Squire wrote: > I'm trying to build the html scipy docs and it keeps failing. > > In the docs directory here's what I run and the output. > > >make html > > mkdir -p build/html build/doctrees > LANG=C sphinx-build -b html -d build/doctrees source build/html > Running Sphinx v1.0.7 > Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) > > Extension error: > Could not import extension plot_directive (exception: No module named > plot_directive) > make: *** [html] Error 1 > > I can't figure out why it's failing. The plot_directive stuff is mentioned > in the numpy docs documents, but nowhere in the scipy docs documents. And > the numpy docs are building fine. > > I'm running scipy 0.10.0 dev, numpy 2.0.0 dev and matplotlib 1.0.1, sphinx > 1.0.7, and numpydoc 0.4 on Ubuntu 10.10. > > Any ideas what's happening? > > If you can build the numpy docs your setup should be fine. But scipy relies on the Sphinx extensions in the numpy tree. The scipy release script does this before building the docs: mkdir doc/sphinxext cp -R ../numpy/doc/sphinxext/ doc/sphinxext/ Cheers, Ralf Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpm at u.washington.edu Wed Jul 20 11:10:35 2011 From: fpm at u.washington.edu (cassiope) Date: Wed, 20 Jul 2011 08:10:35 -0700 (PDT) Subject: [SciPy-User] generate integer numbers between 0 and 75 from a normal distribution In-Reply-To: References: <4E26AF74.6060800@visualreservoir.com> Message-ID: So make yourself a function that re-samples the distribution if a given draw exceeds your boundaries. No, the result won't be perfectly normal, but that was already given by your non-normal boundaries. There are many more complex ways of achieving this, mostly less flexible in terms of flexible mean/sd. If the mean & sd are fixed, you might be able to come up with a faster method that adequately approximates "normal" from a set of integer samples. On Jul 20, 3:38?am, franck kalala wrote: > yes I want to restrict these numbers between 0 and 75, > > 2011/7/20 Brennan Williams > > > > > > > For a normal distribution you would use... > > > scipy.stats.norm(loc=mean,scale=stddev) > > > but if you want to restrict the range of generated values to being >=0 and > > <=75 you will need to use a truncated normal. Is that what you want to do? > > > On 20/07/2011 10:22 p.m., franck kalala wrote: > > > Hi list, > > > I want to generate number between 0 and 75 from normal distribution, ?what > > can I use? > > > Thank you > > > -- > > ********** > > ? ++++ > > ? ? --- > > ? ? ?* > > > _______________________________________________ > > SciPy-User mailing listSciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > ********** > ? ++++ > ? ? --- > ? ? ?* > > _______________________________________________ > SciPy-User mailing list > SciPy-U... 
at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user 
From gustavo.goretkin at gmail.com Tue Jul 19 13:55:26 2011 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Tue, 19 Jul 2011 13:55:26 -0400 Subject: [SciPy-User] SciPy function to return a default value on invalid index Message-ID: Say there's an array A of size 10. The indices are 0,...9. Is there a function to return a default value (e.g. 0) when an index outside this range is provided? I'd like to fancy-index the array. -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From thkoe002 at gmail.com Wed Jul 20 17:27:43 2011 From: thkoe002 at gmail.com (=?ISO-8859-1?Q?Thomas_K=F6nigstein?=) Date: Wed, 20 Jul 2011 23:27:43 +0200 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: <201107201023.35822.jr@sun.ac.za> References: <201107201023.35822.jr@sun.ac.za> Message-ID: Okay, thanks, I used h5py and then numpy.array(f.values[0]) , which worked just fine, thanks again! I just wonder why it doesn't work with pytables, and also why the f.get()-method that you proposed doesn't work for me... I get a "get() takes 3 arguments, 2 given" error... any clues? I just installed h5py (version 1.2.1 on python 2 .6), the code I tried was 'f.get("/root/PE_Electrons_1")'.. anyways, thanks for the help, cheers Thomas On Wed, Jul 20, 2011 at 10:23, Johann Rohwer wrote: > f = h5py.File('myhdf5file.h5','r') > data = f.get('path/to/my/dataset') > data_as_array = numpy.array(data) > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From e.antero.tammi at gmail.com Wed Jul 20 17:43:37 2011 From: e.antero.tammi at gmail.com (eat) Date: Thu, 21 Jul 2011 00:43:37 +0300 Subject: [SciPy-User] SciPy function to return a default value on invalid index In-Reply-To: References: Message-ID: Hi On Tue, Jul 19, 2011 at 8:55 PM, Gustavo Goretkin < gustavo.goretkin at gmail.com> wrote: > Say there's an array A of size 10. The indices are 0,...9. Is there a > function to return a default value (e.g. 0) when an index outside this range > is provided? I'd like to fancy-index the array. > Wouldn't it just be feasible to set index values above a threshold to the default one, juts like: ind[ind> 9]= 0 before utilizing ind in fancy-indexing?
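Or, if the goal is really a default *value* rather than whatever happens to sit at index 0, a small sketch along the same lines (here treating negative indices as invalid as well):

import numpy as np

A = np.arange(10) * 10
idx = np.array([2, 5, 12, -3])
valid = (idx >= 0) & (idx < A.size)
result = np.where(valid, A[np.clip(idx, 0, A.size - 1)], 0)   # -> [20, 50, 0, 0]
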
-eat > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Wed Jul 20 18:31:56 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Wed, 20 Jul 2011 18:31:56 -0400 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: <1311201116.3630.71.camel@SietchTabr> Skipper, I've done what you suggested. What do you think the next step should be? Should I wait for more feedback on scipy-dev, or should I file a pull request? Visualized diff, provided by github: https://github.com/collinstocks/scipy/compare/master...qr-with-pivoting Thanks, Collin -------------- next part -------------- An embedded message was scrubbed... From: Skipper Seabold Subject: Re: [SciPy-User] generic_flapack.pyf and geqp3 Date: Mon, 11 Jul 2011 23:17:06 -0500 Size: 6027 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From andrew.collette at gmail.com Wed Jul 20 19:14:14 2011 From: andrew.collette at gmail.com (Andrew Collette) Date: Wed, 20 Jul 2011 16:14:14 -0700 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: References: <201107201023.35822.jr@sun.ac.za> Message-ID: Hi Thomas, > f.get()-method that you proposed doesn't work for me... I get a "get() takes > 3 arguments, 2 given" error... any clues? I just installed h5py (version > 1.2.1 on python 2 .6), the code I tried was > 'f.get("/root/PE_Electrons_1")'.. anyways, thanks for the help, cheers With h5py you're welcome to just do f["/path/to/dataset"] (like a dictionary) to retrieve a dataset, although get() is certainly also supported. Version 1.2.1 is pretty old; I think get() has changed since then which may be why you're having different behavior. On the resulting dataset it's likewise recommended to do "dset[0]" (numpy-like indexing) rather than going through dset.value, which is there for backwards compatibility. Andrew Collette From joonpyro at gmail.com Wed Jul 20 22:52:52 2011 From: joonpyro at gmail.com (Joon Ro) Date: Wed, 20 Jul 2011 21:52:52 -0500 Subject: [SciPy-User] Convert hdf5 file content to numpy array In-Reply-To: References: <201107201023.35822.jr@sun.ac.za> Message-ID: On Wed, 20 Jul 2011 16:27:43 -0500, Thomas K?nigstein wrote: > I just wonder why it doesn't work with pytables, and also why the > f.get()-method that you proposed doesn't work for me... I get a "get() > takes > 3 arguments, 2 given" error... any clues? I just installed h5py (version > 1.2.1 on python 2 .6), the code I tried was > 'f.get("/root/PE_Electrons_1")'.. anyways, thanks for the help, cheers > Thomas > I'm not sure if you used pytables to create the hdf5 file, but if you did, it seems you used createTable method. In this case what you get is a structured array. (http://www.scipy.org/Cookbook/Recarray) And each record is numpy.void type. If you want to store a numpy array without variable names, use createArray method instead of createTable. Anyway, you can convert the structured array into ndarray with: >>> array([ptcl[:][col] for col in test.dtype.names]).T There might be better way to do this but I haven't found one. 
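Spelled out with consistent names (the 'test' in that one-liner should be the structured array itself; below is a self-contained version with a stand-in dtype resembling the one in this thread, and note that integer fields get upcast to float):

import numpy as np

# stand-in for the structured array read from the HDF5 table
ptcl_data = np.zeros(3, dtype=[('cell', 'i4'), ('x', 'f8'), ('weight', 'f8')])

plain = np.array([ptcl_data[name] for name in ptcl_data.dtype.names]).T
# plain has shape (n_records, n_fields), here (3, 3), with dtype float64
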
-Joon From flyzzx at gmail.com Wed Jul 20 22:51:30 2011 From: flyzzx at gmail.com (Chong Yang) Date: Thu, 21 Jul 2011 10:51:30 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points Message-ID: Hi, Recently I have been working with 1D curve fitting on dense GPS point cloud representing road segments. For all I know, some roads may be highly curved and their corresponding x value are not necessary monotonic order. I tried splrep. It works very well on simple arcs but fails to give meaningful result on highly curved data - either because the polynomial order 'k' is not sufficient to fit the data well enough or the curve is vertically aligned. For now I get around vertical curves by flipping the axes and doing splrep with (y, x), but maybe a better approach is possible. My guess is splprep. However I am not sure how to properly convert the noisy data points into their parametric form. Does anyone have an idea on this or faced a similar problem before? Thanks. -- Regards, CY -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgdunn at gmail.com Thu Jul 21 01:17:28 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Thu, 21 Jul 2011 01:17:28 -0400 Subject: [SciPy-User] SciPy Central: a file and link sharing site - now active Message-ID: Hi The SciPy Central website is now active and waiting for high-quality code snippets and links to scientific resources of interest to the SciPy community. The website is at http://scipy-central.org and you can read some background about the site's history at http://scipy-central.org/about While we are officially in beta-mode, the site should be pretty stable. If you detect any bugs, or have any suggestions for improvements, please post them at https://github.com/kgdunn/SciPyCentral/issues/ Thanks, Kevin From zachary.pincus at yale.edu Thu Jul 21 08:19:59 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 21 Jul 2011 08:19:59 -0400 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Hi Chong, If you have "time-of-acquisition" data for each (x,y) GPS point, use that as your parameter. If you don't, then the problem of finding a parameter essentially becomes one of "manifold learning" (admittedly a simple case). You could check out "locally linear embeddings" or "isomap" for the basic algorithms in this field -- I think scikits.learn has one or both. But that might take you pretty far afield. Zach On Jul 20, 2011, at 10:51 PM, Chong Yang wrote: > Hi, > > Recently I have been working with 1D curve fitting on dense GPS point cloud representing road segments. For all I know, some roads may be highly curved and their corresponding x value are not necessary monotonic order. > > I tried splrep. It works very well on simple arcs but fails to give meaningful result on highly curved data - either because the polynomial order 'k' is not sufficient to fit the data well enough or the curve is vertically aligned. For now I get around vertical curves by flipping the axes and doing splrep with (y, x), but maybe a better approach is possible. > > My guess is splprep. However I am not sure how to properly convert the noisy data points into their parametric form. Does anyone have an idea on this or faced a similar problem before? Thanks. 
> > -- > Regards, > CY > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From johradinger at googlemail.com Thu Jul 21 11:02:21 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Thu, 21 Jul 2011 08:02:21 -0700 (PDT) Subject: [SciPy-User] optimize: RuntimeWarning Tolerance reached In-Reply-To: References: Message-ID: nobody any idea why these datasets fail? around 150 other datasets work really good! /j From gabe at squirrelsoup.net Thu Jul 21 11:24:25 2011 From: gabe at squirrelsoup.net (Gabriel Dulac-Arnold) Date: Thu, 21 Jul 2011 17:24:25 +0200 Subject: [SciPy-User] check_format in sparse matrix creation Message-ID: <4E2844A9.2040300@squirrelsoup.net> Is there a good reason that check_format cannot be disabled when creating sparse arrays in scipy.sparse? If the correct ndarrays are provided in the constructor, a lot of what check_format does (pruning, typecasting to native byteorder) is not necessary, and wastes a lot of time in situations where sparse arrays are being generated in large numbers. Has this been considered / was there a good reason not to allow disabling it? Thanks, Gabe From cjordan1 at uw.edu Thu Jul 21 12:38:10 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Thu, 21 Jul 2011 11:38:10 -0500 Subject: [SciPy-User] [Scipy-User] Building Scipy Docs In-Reply-To: References: Message-ID: Thanks. It builds fine now. -Chris JS On Wed, Jul 20, 2011 at 12:11 PM, Ralf Gommers wrote: > > > On Wed, Jul 20, 2011 at 6:27 PM, Christopher Jordan-Squire < > cjordan1 at uw.edu> wrote: > >> I'm trying to build the html scipy docs and it keeps failing. >> >> In the docs directory here's what I run and the output. >> >> >make html >> >> mkdir -p build/html build/doctrees >> LANG=C sphinx-build -b html -d build/doctrees source build/html >> Running Sphinx v1.0.7 >> Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) >> >> Extension error: >> Could not import extension plot_directive (exception: No module named >> plot_directive) >> make: *** [html] Error 1 >> >> I can't figure out why it's failing. The plot_directive stuff is mentioned >> in the numpy docs documents, but nowhere in the scipy docs documents. And >> the numpy docs are building fine. >> >> I'm running scipy 0.10.0 dev, numpy 2.0.0 dev and matplotlib 1.0.1, sphinx >> 1.0.7, and numpydoc 0.4 on Ubuntu 10.10. >> >> Any ideas what's happening? >> >> If you can build the numpy docs your setup should be fine. But scipy > relies on the Sphinx extensions in the numpy tree. The scipy release script > does this before building the docs: > > mkdir doc/sphinxext > cp -R ../numpy/doc/sphinxext/ doc/sphinxext/ > > Cheers, > Ralf > Ralf > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From flyzzx at gmail.com Thu Jul 21 13:24:56 2011 From: flyzzx at gmail.com (Chong Yang) Date: Fri, 22 Jul 2011 01:24:56 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Hi Zach, Unfortunately the 'time-of-acquisition' is hardly helpful in this case due to the different time of arrival of each point in the cloud, which may or may not begin at the same location. But thanks for pointing me in the right direction! 
I will look into LLE and Isomap (which look promising enough for simple cases like mine) and test their performances. I am dealing with a range of 10k points so speed could be a concern. On Thu, Jul 21, 2011 at 8:19 PM, Zachary Pincus wrote: > Hi Chong, > > If you have "time-of-acquisition" data for each (x,y) GPS point, use that > as your parameter. If you don't, then the problem of finding a parameter > essentially becomes one of "manifold learning" (admittedly a simple case). > You could check out "locally linear embeddings" or "isomap" for the basic > algorithms in this field -- I think scikits.learn has one or both. > > But that might take you pretty far afield. > > Zach > > > > On Jul 20, 2011, at 10:51 PM, Chong Yang wrote: > > > Hi, > > > > Recently I have been working with 1D curve fitting on dense GPS point > cloud representing road segments. For all I know, some roads may be highly > curved and their corresponding x value are not necessary monotonic order. > > > > I tried splrep. It works very well on simple arcs but fails to give > meaningful result on highly curved data - either because the polynomial > order 'k' is not sufficient to fit the data well enough or the curve is > vertically aligned. For now I get around vertical curves by flipping the > axes and doing splrep with (y, x), but maybe a better approach is possible. > > > > My guess is splprep. However I am not sure how to properly convert the > noisy data points into their parametric form. Does anyone have an idea on > this or faced a similar problem before? Thanks. > > > > -- > > Regards, > > CY > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Regards, Goh Chong Yang -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Thu Jul 21 13:38:10 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 21 Jul 2011 13:38:10 -0400 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: <8F6D487F-75C7-482E-9541-9EADB8F5C85C@yale.edu> I just remembered that another classic algorithm in this family is "principal curves". http://www.iro.umontreal.ca/~kegl/research/pcurves/ This might be even simpler and faster than LLE/Isomap. Zach On Jul 21, 2011, at 1:24 PM, Chong Yang wrote: > Hi Zach, > > Unfortunately the 'time-of-acquisition' is hardly helpful in this case due to the different time of arrival of each point in the cloud, which may or may not begin at the same location. > > But thanks for pointing me in the right direction! I will look into LLE and Isomap (which look promising enough for simple cases like mine) and test their performances. I am dealing with a range of 10k points so speed could be a concern. > > On Thu, Jul 21, 2011 at 8:19 PM, Zachary Pincus wrote: > Hi Chong, > > If you have "time-of-acquisition" data for each (x,y) GPS point, use that as your parameter. If you don't, then the problem of finding a parameter essentially becomes one of "manifold learning" (admittedly a simple case). You could check out "locally linear embeddings" or "isomap" for the basic algorithms in this field -- I think scikits.learn has one or both. > > But that might take you pretty far afield. 
> > Zach > > > > On Jul 20, 2011, at 10:51 PM, Chong Yang wrote: > > > Hi, > > > > Recently I have been working with 1D curve fitting on dense GPS point cloud representing road segments. For all I know, some roads may be highly curved and their corresponding x value are not necessary monotonic order. > > > > I tried splrep. It works very well on simple arcs but fails to give meaningful result on highly curved data - either because the polynomial order 'k' is not sufficient to fit the data well enough or the curve is vertically aligned. For now I get around vertical curves by flipping the axes and doing splrep with (y, x), but maybe a better approach is possible. > > > > My guess is splprep. However I am not sure how to properly convert the noisy data points into their parametric form. Does anyone have an idea on this or faced a similar problem before? Thanks. > > > > -- > > Regards, > > CY > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Regards, > Goh Chong Yang > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Thu Jul 21 15:07:27 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 Jul 2011 19:07:27 +0000 (UTC) Subject: [SciPy-User] check_format in sparse matrix creation References: <4E2844A9.2040300@squirrelsoup.net> Message-ID: On Thu, 21 Jul 2011 17:24:25 +0200, Gabriel Dulac-Arnold wrote: > Is there a good reason that check_format cannot be disabled when > creating sparse arrays in scipy.sparse? If the correct ndarrays are > provided in the constructor, a lot of what check_format does (pruning, > typecasting to native byteorder) is not necessary, and wastes a lot of > time in situations where sparse arrays are being generated in large > numbers. Has this been considered / was there a good reason not to > allow disabling it? Probably this did not occur to the author of scipy.sparse. I don't see a problem in adding a new "unsafe" constructor to the classes, or a flag to the constructor. -- Pauli Virtanen From wesmckinn at gmail.com Thu Jul 21 15:23:28 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 21 Jul 2011 15:23:28 -0400 Subject: [SciPy-User] Blog post about pandas, datarray, data structures in general, and where to from here Message-ID: I can't help myself from continuing to kick the hornet's nest: http://news.ycombinator.org/item?id=2790762 but in short after SciPy 2011, I'm very excited to keep the dialog going and keep making progress. - Wes From e.antero.tammi at gmail.com Thu Jul 21 15:58:04 2011 From: e.antero.tammi at gmail.com (eat) Date: Thu, 21 Jul 2011 22:58:04 +0300 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Hi, On Thu, Jul 21, 2011 at 5:51 AM, Chong Yang wrote: > Hi, > > Recently I have been working with 1D curve fitting on dense GPS point cloud > representing road segments. For all I know, some roads may be highly curved > and their corresponding x value are not necessary monotonic order. > > I tried splrep. 
It works very well on simple arcs but fails to give > meaningful result on highly curved data - either because the polynomial > order 'k' is not sufficient to fit the data well enough or the curve is > vertically aligned. For now I get around vertical curves by flipping the > axes and doing splrep with (y, x), but maybe a better approach is possible. > > My guess is splprep. However I am not sure how to properly convert the > noisy data points into their parametric form. Does anyone have an idea on > this or faced a similar problem before? Thanks. > Fundamentally roads are designed based on line segments and circular arc segments, possible joined by clothoids (with some contiguous criteria). Now, when the roads are actually constructed they will only follow approximately the originally intended geometry. Now, without really knowing your ultimate goals, I'll simply suggest you to try to fit consecutive segments based for example on RANSAC ( http://en.wikipedia.org/wiki/RANSAC). So perhaps your first estimate would be based only on line segments fitted by RANSAC (and fine tuned later if needed). Please note that for the fitting (of 2D line segments) you need to utilize orthogonal distance regression. (Now with this knowledge you could indeed project the line segments to 1D and fine tune it more there. However splines wont help you, because possible discontinuities, like connection of line and arc segments. Here you may like to study more on based on http://en.wikipedia.org/wiki/Segmented_regression), Anyway, please feel free to elaborate more on your particular case. -eat > > -- > Regards, > CY > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jul 21 16:43:05 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 21 Jul 2011 22:43:05 +0200 Subject: [SciPy-User] ANN: NumPy 1.6.1 release Message-ID: Hi, I am pleased to announce the availability of NumPy 1.6.1. This is a bugfix release for the 1.6.x series; the list of fixed bugs is given below. Sources and binaries can be found at http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/ Thanks to anyone who contributed to this release. Enjoy, The NumPy developers Bug fixes for NumPy 1.6.1 ------------------------- #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() #1895/1896 iter: writeonly operands weren't always being buffered correctly -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Thu Jul 21 20:22:07 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Thu, 21 Jul 2011 17:22:07 -0700 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT Message-ID: Hi all, I am working with a system of 16 differential equations that simulates an epidemic in a city. Because there are many cities interacting with each other, I need to run my ode's for a single day, stop them, modify the initial conditions and run them again. 
Because the ode is running only for a day, I defined my tspan to have only two points, like this: tspan = tspan = linspace(day, day+1, 2) I wrote my equations in Python and I am using scipy.odeint to solve it. Here is my code: def advanceODEoneDay(self,day): #rename variables for convenience N0, N1 = self.children, self.adults S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 #create a vector of times for integration: tspan = linspace(day, day+1, 2) #set initial conditions. To do this, I need to look in the array in the day initCond = [S0[day,0], S0[day,1], S1[day,0], S1[day,1], A0[day,0], A0[day,1], A1[day,0], A1[day,1], I0[day,0], I0[day,1], I1[day,0], I1[day,1], RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] #run the ode: sir_sol = odeint(sir2groups,initCond, tspan, args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) Most of the time, it works just fine. However, there are some times where the following message appears: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information. lsoda-- at current t (=r1), mxstep (=i1) steps taken on this call before reaching tout In above message, I1 = 500 In above message, R1 = 0.1170754095027E+03 I should mention that it is NOT only a warning. This is repeated over and over (thousands of times) and then it will break the rest of my code Ok, after searching in this mailing list, someone else posted a similar warning message and it was suggested to him that " *In your case, you might simply be computing for to coarse a mesh in t,** so "too much work" has to be done for each step. " *Is this what is happening to me? the problem is that I don't get this error every single time, so I don't even know how to run it with full_output = 1 to get the info... I really don't know what to do. Any help will be very very very very appreciated! thank you very * * -- Laura -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Thu Jul 21 21:08:45 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 21 Jul 2011 19:08:45 -0600 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 6:22 PM, Laura Matrajt wrote: > Hi all, > I am working with a system of 16 differential equations that simulates an > epidemic in a city. Because there are many cities interacting with each > other, I need to run my ode's for a single day, stop them, modify the > initial conditions and run them again. Because the ode is running only for > a day, I defined my tspan to have only two points, like this: > > tspan = tspan = linspace(day, day+1, 2) > > I wrote my equations in Python and I am using scipy.odeint to solve it. > > Here is my code: > def advanceODEoneDay(self,day): > > #rename variables for convenience > N0, N1 = self.children, self.adults > S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = > self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 > > > > #create a vector of times for integration: > tspan = linspace(day, day+1, 2) > > > #set initial conditions. 
To do this, I need to look in the array in > the day > initCond = [S0[day,0], S0[day,1], > S1[day,0], S1[day,1], > A0[day,0], A0[day,1], A1[day,0], A1[day,1], > I0[day,0], I0[day,1], I1[day,0], I1[day,1], > RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] > > > #run the ode: > sir_sol = odeint(sir2groups,initCond, tspan, > args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) > > > Most of the time, it works just fine. However, there are some times where > the following message appears: > > Excess work done on this call (perhaps wrong Dfun type). > Run with full_output = 1 to get quantitative information. > lsoda-- at current t (=r1), mxstep (=i1) steps > taken on this call before reaching tout > In above message, I1 = 500 > In above message, R1 = 0.1170754095027E+03 > > I should mention that it is NOT only a warning. This is repeated over and > over (thousands of times) and then it will break the rest of my code > Ok, after searching in this mailing list, someone else posted a similar > warning message and it was suggested to him that " > > *In your case, you might simply be computing for to coarse a mesh in t,** so "too much work" has to be done for each step. " > > > *Is this what is happening to me? the problem is that I don't get this error every single time, so I don't even know how to run it with full_output = 1 to get the info... > > > I really don't know what to do. Any help will be very very very very appreciated! > thank you very * > * > > The function odeint has the keyword argument 'mxstep' that determines the maximum number of internal steps allowed between requested time values. The default is 500. It is not unusual to have to increase this, especially in a case like yours where you simply want the value at the end of a long time interval. Try increasing it to, say, mxstep=5000. Warren * > * > > > > > -- > Laura > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Fri Jul 22 01:11:43 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Thu, 21 Jul 2011 22:11:43 -0700 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: Hi Warren, thanks for your fast reply. As I said in my email, 90% of the time this runs perfectly well. Is 'mxstep' a number that will be computed at each iteration or is it just there in case of more steps are needed? I am worried that by increasing it I will make my code slower... Also, would increasing my tspan solve the issue? If the answer is yes, which of the two solutions would be better? Also, I am not passing the Jacobian to my function. Do you think that it will make a difference, both in terms of speed and in terms of this error to pass the Jacobian? THank you very very much! I was really worried about this! On Thu, Jul 21, 2011 at 6:08 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Thu, Jul 21, 2011 at 6:22 PM, Laura Matrajt wrote: > >> Hi all, >> I am working with a system of 16 differential equations that simulates an >> epidemic in a city. Because there are many cities interacting with each >> other, I need to run my ode's for a single day, stop them, modify the >> initial conditions and run them again. 
Because the ode is running only for >> a day, I defined my tspan to have only two points, like this: >> >> tspan = tspan = linspace(day, day+1, 2) >> >> I wrote my equations in Python and I am using scipy.odeint to solve it. >> >> Here is my code: >> def advanceODEoneDay(self,day): >> >> #rename variables for convenience >> N0, N1 = self.children, self.adults >> S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = >> self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 >> >> >> >> #create a vector of times for integration: >> tspan = linspace(day, day+1, 2) >> >> >> #set initial conditions. To do this, I need to look in the array >> in the day >> initCond = [S0[day,0], S0[day,1], >> S1[day,0], S1[day,1], >> A0[day,0], A0[day,1], A1[day,0], A1[day,1], >> I0[day,0], I0[day,1], I1[day,0], I1[day,1], >> RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] >> >> >> #run the ode: >> sir_sol = odeint(sir2groups,initCond, tspan, >> args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) >> >> >> Most of the time, it works just fine. However, there are some times where >> the following message appears: >> >> Excess work done on this call (perhaps wrong Dfun type). >> Run with full_output = 1 to get quantitative information. >> lsoda-- at current t (=r1), mxstep (=i1) steps >> taken on this call before reaching tout >> In above message, I1 = 500 >> In above message, R1 = 0.1170754095027E+03 >> >> I should mention that it is NOT only a warning. This is repeated over and >> over (thousands of times) and then it will break the rest of my code >> Ok, after searching in this mailing list, someone else posted a similar >> warning message and it was suggested to him that " >> >> *In your case, you might simply be computing for to coarse a mesh in t,** so "too much work" has to be done for each step. " >> >> >> *Is this what is happening to me? the problem is that I don't get this error every single time, so I don't even know how to run it with full_output = 1 to get the info... >> >> >> >> I really don't know what to do. Any help will be very very very very appreciated! >> thank you very * >> * >> >> > The function odeint has the keyword argument 'mxstep' that determines the > maximum number of internal steps allowed between requested time values. The > default is 500. It is not unusual to have to increase this, especially in a > case like yours where you simply want the value at the end of a long time > interval. Try increasing it to, say, mxstep=5000. > > Warren > > * >> * >> >> >> >> >> -- >> Laura >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Laura -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Fri Jul 22 01:39:39 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 22 Jul 2011 01:39:39 -0400 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: Hi Laura, odeint uses an adaptive method (I believe an "embedded 4-5 Runge-Kutta") which evaluates a small number of points and estimates the error produced by using those points. 
If the error is small - and odeint has parameters that let you set how big "small" is - it'll return this estimate, otherwise it'll subdivide the region and repeat the process on each subdivision. To avoid infinite loops, this bails out if too many steps are taken, with a warning. That's what you're seeing; increasing mxstep will make it go further before giving up. It won't make a difference in cases where all those steps aren't necessary. It's worth thinking about why the integrator is taking all those steps. It generally happens because there's some sort of kink or complex behaviour in the solution there; this can be a genuine feature of the solution, or (more often in my experience) it can be because your RHS function has some discontinuity (perhaps due to a bug). So it's worth checking out those segments. I believe that with full_output you can check how many steps were needed and have your program bail out with a set of parameters that trigger the problem. Asking odeint for more intermediate points you don't actually care about will be a less-efficient way of increasing maxstep - with one exception. If you know there are problem points in your domain you can put them in your list to make sure the integrator notices them. Otherwise, just rely on it to figure out which parts of the integration need extra work. Passing the Jacobian in will definitely help reduce the number of steps needed, if you can compute it analytically. (Otherwise let the integrator build up a numerical Jacobian rather than trying to do it yourself.) Finally, I should say that apart from the stream of ugly messages, when you hit mxstep what you get is still an estimate of the solution, just not a very good one (that is, the integrator thinks there's too much error). If you're okay with crummy solutions in a few corner cases, you can just live with the warnings. Anne On 22 July 2011 01:11, Laura Matrajt wrote: > Hi Warren, > ?thanks for your fast reply. > As I said in my email, 90% of the time this runs perfectly well. > Is 'mxstep' a number that will be computed at each iteration or is it just > there in case of more steps are needed? I am worried that by increasing it I > will make my code slower... > Also, would increasing my tspan solve the issue? > If the answer is yes, which of the two solutions would be better? > Also, I am not passing the Jacobian to my function. Do you think that it > will make a difference, both in terms of speed and in terms of this error to > pass the Jacobian? > > THank you very very much! I was really worried about this! > > > On Thu, Jul 21, 2011 at 6:08 PM, Warren Weckesser > wrote: >> >> >> On Thu, Jul 21, 2011 at 6:22 PM, Laura Matrajt wrote: >>> >>> Hi all, >>> ?I am working with a system of 16 differential equations that simulates >>> an epidemic in a city.? Because there are many cities interacting with each >>> other, I need to run my ode's for a single day, stop them, modify the >>> initial conditions and run them again.? Because the ode is running only for >>> a day, I defined my tspan to have only two points, like this: >>> >>> tspan = tspan = linspace(day, day+1, 2) >>> >>> I wrote my equations in Python and I am using scipy.odeint to solve it. >>> >>> Here is my code: >>> def advanceODEoneDay(self,day): >>> >>> ??????? #rename variables for convenience >>> ??????? N0, N1 = self.children, self.adults >>> ??????? S0,S1,A0,A1,I0,I1,RA0,RA1,RI0,RI1 = >>> self.S0,self.S1,self.A0,self.A1,self.I0,self.I1,self.RA0,self.RA1,self.RI0,self.RI1 >>> >>> >>> >>> ??????? 
#create a vector of times for integration: >>> ??????? tspan = linspace(day, day+1, 2) >>> >>> >>> ??????? #set initial conditions. To do this, I need to look in the array >>> in the day >>> ??????? initCond = [S0[day,0], S0[day,1], >>> ??????????????????? S1[day,0], S1[day,1], >>> ??????????????????? A0[day,0], A0[day,1], A1[day,0], A1[day,1], >>> ??????????????????? I0[day,0], I0[day,1], I1[day,0], I1[day,1], >>> ??????????????????? RA0[day,0], RA1[day,0], RI0[day,0], RI1[day,0] ] >>> >>> >>> ??????? #run the ode: >>> ??????? sir_sol = odeint(sir2groups,initCond, tspan, >>> args=(self.p,self.C,self.m,self.VEs,self.VEi,self.VEp,N0,N1))#,self.gamma,self.rho)) >>> >>> >>> Most of the time, it works just fine. However, there are some times where >>> the following message appears: >>> >>> Excess work done on this call (perhaps wrong Dfun type). >>> Run with full_output = 1 to get quantitative information. >>> ?lsoda--? at current t (=r1), mxstep (=i1) steps >>> ?????? taken on this call before reaching tout >>> ????? In above message,? I1 =?????? 500 >>> ????? In above message,? R1 =? 0.1170754095027E+03 >>> >>> I should mention that it is NOT only a warning. This is repeated over and >>> over (thousands of times) and then it will break the rest of my code >>> Ok, after searching in this mailing list, someone else posted a similar >>> warning message and it was suggested to him that " >>> >>> In your case, you might simply be computing for to coarse a mesh in t, >>> so "too much work" has to be done for each step. " >>> >>> >>> Is this what is happening to me? the problem is that I don't get this >>> error every single time, so I don't even know how to run it with full_output >>> = 1 to get the info... >>> >>> >>> >>> >>> I really don't know what to do. Any help will be very very very very >>> appreciated! >>> thank you very >> >> The function odeint has the keyword argument 'mxstep' that determines the >> maximum number of internal steps allowed between requested time values.? The >> default is 500.? It is not unusual to have to increase this, especially in a >> case like yours where you simply want the value at the end of a long time >> interval.? Try increasing it to, say, mxstep=5000. >> >> Warren >> >>> >>> >>> >>> >>> >>> >>> -- >>> Laura >>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > Laura > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From flyzzx at gmail.com Fri Jul 22 14:24:58 2011 From: flyzzx at gmail.com (Chong Yang) Date: Sat, 23 Jul 2011 02:24:58 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: Thanks for the helpful information. Now I can see that there are many ways to curve fitting without necessarily using spline. -------------- next part -------------- An HTML attachment was scrubbed... URL: From flyzzx at gmail.com Fri Jul 22 14:37:22 2011 From: flyzzx at gmail.com (Chong Yang) Date: Sat, 23 Jul 2011 02:37:22 +0800 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: For my purpose, faithful reconstruction of the road segment is not required. 
I only need to approximate the curve for further segmentation and trace the outer boundaries, so that the geometries can be stored for further use. Also, discontinuities between adjacent segments is not a primary concern. The fact is this is part of a bigger project and this step only serves to facilitate data extraction. @Zach: The principal curve demo looks awesome indeed! I tried to look for a similar Python implementation of the curve fitting algorithm, but could not find anything other than PCA toolkit. Do you happen to know any existing library? Besides that I applied LLE using MDP, and it does provide a good parametric representation of the curve (though I have to reduce the sample size, otherwise it runs forever) -- unfortunately the subsequent splprep does not work as good as in the non-parametric case. @eat: I am looking at the possibility LLE + segmented regression to obtain the parameters, divide into intervals and then run linear regressions on each. Thanks for the suggestion. On Fri, Jul 22, 2011 at 3:58 AM, eat wrote: > Hi, > > On Thu, Jul 21, 2011 at 5:51 AM, Chong Yang wrote: > >> Hi, >> >> Recently I have been working with 1D curve fitting on dense GPS point >> cloud representing road segments. For all I know, some roads may be highly >> curved and their corresponding x value are not necessary monotonic order. >> >> I tried splrep. It works very well on simple arcs but fails to give >> meaningful result on highly curved data - either because the polynomial >> order 'k' is not sufficient to fit the data well enough or the curve is >> vertically aligned. For now I get around vertical curves by flipping the >> axes and doing splrep with (y, x), but maybe a better approach is possible. >> >> My guess is splprep. However I am not sure how to properly convert the >> noisy data points into their parametric form. Does anyone have an idea on >> this or faced a similar problem before? Thanks. >> > Fundamentally roads are designed based on line segments and circular arc > segments, possible joined by clothoids (with some contiguous criteria). Now, > when the roads are actually constructed they will only follow approximately > the originally intended geometry. > > Now, without really knowing your ultimate goals, I'll simply suggest you to > try to fit consecutive segments based for example on RANSAC ( > http://en.wikipedia.org/wiki/RANSAC). > > So perhaps your first estimate would be based only on line segments fitted > by RANSAC (and fine tuned later if needed). Please note that for the fitting > (of 2D line segments) you need to utilize orthogonal distance regression. > > (Now with this knowledge you could indeed project the line segments to 1D > and fine tune it more there. However splines wont help you, because > possible discontinuities, like connection of line and arc segments. Here you > may like to study more on based on > http://en.wikipedia.org/wiki/Segmented_regression), > > Anyway, please feel free to elaborate more on your particular case. > > -eat > >> >> -- >> Regards, >> CY >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Regards, Goh Chong Yang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ciampagg at usi.ch Fri Jul 22 17:09:18 2011 From: ciampagg at usi.ch (Giovanni Luca Ciampaglia) Date: Fri, 22 Jul 2011 14:09:18 -0700 Subject: [SciPy-User] Gaussian mixture models with censored data Message-ID: <4E29E6FE.709@usi.ch> Hello everybody, Before I delve into implementing it myself, is there any Python implementation of the EM algorithm used for fitting Gaussian mixtures that also handles (right-)censored observations? In particular I need to fit univariate data, so something like this: http://dx.doi.org/10.1080/00949659208811452 Best, -- Giovanni Luca Ciampaglia Ph.D. Candidate Faculty of Informatics University of Lugano Web: http://www.inf.usi.ch/phd/ciampaglia/ Bertastra?e 36 ? 8003 Z?rich ? Switzerland From tritemio at gmail.com Fri Jul 22 17:31:11 2011 From: tritemio at gmail.com (Antonio Ingargiola) Date: Fri, 22 Jul 2011 14:31:11 -0700 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: Hi to the list, I got an error using the pdf method on the binom distribution. Same error happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and pythonxy on windows). The error is the following: In [1]: from scipy.stats.distributions import binom In [2]: b = binom(20,0.8) In [3]: b.rvs() Out[3]: 17 In [4]: b.pdf(2) --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) C:\Users\anto\.xy\startups\ in () C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, x) 333 334 def pdf(self, x): #raises AttributeError in frozen discrete distribution --> 335 return self.dist.pdf(x, *self.args, **self.kwds) 336 337 def cdf(self, x): AttributeError: 'binom_gen' object has no attribute 'pdf' In [5]: Is this known problem? How can I get the binomial pdf, is there a workaround? Many thanks, Antonio -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Jul 22 17:38:44 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 22 Jul 2011 15:38:44 -0600 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: > Hi to the list, > > I got an error using the pdf method on the binom distribution. Same error > happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and > pythonxy on windows). > > The error is the following: > > In [1]: from scipy.stats.distributions import binom > > In [2]: b = binom(20,0.8) > > In [3]: b.rvs() > Out[3]: 17 > > In [4]: b.pdf(2) > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > > C:\Users\anto\.xy\startups\ in () > > C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, x) > 333 > 334 def pdf(self, x): #raises AttributeError in frozen discrete > distribution > --> 335 return self.dist.pdf(x, *self.args, **self.kwds) > 336 > 337 def cdf(self, x): > > AttributeError: 'binom_gen' object has no attribute 'pdf' > > In [5]: > > > Is this known problem? How can I get the binomial pdf, is there a > workaround? > Since binom is a discrete distribution, you want the pmf method: In [32]: b = binom(20, 0.8) In [33]: b.pmf(2) Out[33]: 3.1876710400000011e-11 In [34]: b.pmf(18) Out[34]: 0.13690942867206304 The behavior that you observed still looks like a bug to me--why does binom even have a pdf method, if calling it just raises a cryptic exception? 
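As a quick sketch of the workaround, the pmf value can also be cross-checked against the explicit binomial formula (a small factorial-based helper is used here so the example does not depend on where a comb function lives in any particular SciPy version):

from math import factorial
from scipy.stats import binom

def n_choose_k(n, k):
    # binomial coefficient via factorials, to keep the example dependency-free
    return factorial(n) // (factorial(k) * factorial(n - k))

n, p, k = 20, 0.8, 18
print(binom.pmf(k, n, p))                              # roughly 0.1369094
print(n_choose_k(n, k) * p ** k * (1 - p) ** (n - k))  # same value from the formula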
Warren > > Many thanks, > Antonio > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Fri Jul 22 17:49:05 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Fri, 22 Jul 2011 21:49:05 +0000 (UTC) Subject: [SciPy-User] Help!!!!!! having problems with ODEINT References: Message-ID: Anne Archibald physics.mcgill.ca> writes: > > Hi Laura, > > odeint uses an adaptive method (I believe an "embedded 4-5 > Runge-Kutta") which evaluates a small number of points and estimates > the error produced by using those points. If the error > is small - and odeint has parameters that let you set how big "small" > is - it'll return this estimate, otherwise it'll subdivide the region > and repeat the process on each subdivision. To avoid infinite loops, > this bails out if too many steps are taken, with a warning. That's > what you're seeing; increasing mxstep will make it go further before > giving up. It won't make a difference in cases where all those steps > aren't necessary. > > It's worth thinking about why the integrator is taking all those > steps. It generally happens because there's some sort of kink or > complex behaviour in the solution there; this can be a genuine feature > of the solution, or (more often in my experience) it can be because > your RHS function has some discontinuity (perhaps due to a bug). So > it's worth checking out those segments. I believe that with > full_output you can check how many steps were needed and have your > program bail out with a set of parameters that trigger the problem. > > Asking odeint for more intermediate points you don't actually care > about will be a less-efficient way of increasing maxstep - with one > exception. If you know there are problem points in your domain you can > put them in your list to make sure the integrator notices them. > Otherwise, just rely on it to figure out which parts of the > integration need extra work. > > Passing the Jacobian in will definitely help reduce the number of > steps needed, if you can compute it analytically. (Otherwise let the > integrator build up a numerical Jacobian rather than trying to do it > yourself.) > > Finally, I should say that apart from the stream of ugly messages, > when you hit mxstep what you get is still an estimate of the solution, > just not a very good one (that is, the integrator thinks there's too > much error). If you're okay with crummy solutions in a few corner > cases, you can just live with the warnings. > > Anne > Hi Anne, thank you very very much for your response.I appreciate your thorough response. I have one last question that is driving me crazy: why is this crashing my code eventhough it is just a warning? Also, regarding the Jacobian, I searched the documentation online and have one little question: suppose my system of odes looks like dy/dt = f(y,t) the Jacobian is only with respect to y? I can compute it analytically (I believe) but I am unsure about the correct syntax to pass it. Do you by any chance about a worked example? I am really sorry I am probably doing dumb questions. I am pretty overwhelmed by this right now and i searched in the internet for examples and I couldn't find one with the Jacobian on it. 
THanks again, From kgdunn at gmail.com Fri Jul 22 19:59:21 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Fri, 22 Jul 2011 19:59:21 -0400 Subject: [SciPy-User] Help!!!!!! having problems with ODEINT In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 17:49, Laura Matrajt wrote: > > ?Anne Archibald physics.mcgill.ca> writes: > >> >> Hi Laura, >> >> odeint uses an adaptive method (I believe an "embedded 4-5 >> Runge-Kutta") which evaluates a small number of points and estimates >> the error produced by using those points. If the error >> is small - and odeint has parameters that let you set how big "small" >> is - it'll return this estimate, otherwise it'll subdivide the region >> and repeat the process on each subdivision. To avoid infinite loops, >> this bails out if too many steps are taken, with a warning. That's >> what you're seeing; increasing mxstep will make it go further before >> giving up. It won't make a difference in cases where all those steps >> aren't necessary. >> >> It's worth thinking about why the integrator is taking all those >> steps. It generally happens because there's some sort of kink or >> complex behaviour in the solution there; this can be a genuine feature >> of the solution, or (more often in my experience) it can be because >> your RHS function has some discontinuity (perhaps due to a bug). So >> it's worth checking out those segments. I believe that with >> full_output you can check how many steps were needed and have your >> program bail out with a set of parameters that trigger the problem. >> >> Asking odeint for more intermediate points you don't actually care >> about will be a less-efficient way of increasing maxstep - with one >> exception. If you know there are problem points in your domain you can >> put them in your list to make sure the integrator notices them. >> Otherwise, just rely on it to figure out which parts of the >> integration need extra work. >> >> Passing the Jacobian in will definitely help reduce the number of >> steps needed, if you can compute it analytically. (Otherwise let the >> integrator build up a numerical Jacobian rather than trying to do it >> yourself.) >> >> Finally, I should say that apart from the stream of ugly messages, >> when you hit mxstep what you get is still an estimate of the solution, >> just not a very good one (that is, the integrator thinks there's too >> much error). If you're okay with crummy solutions in a few corner >> cases, you can just live with the warnings. >> >> Anne >> > > > > Hi Anne, > ?thank you very very much for your response.I appreciate your thorough response. > I have one last question that is driving me crazy: why is this crashing my code > eventhough it is just a warning? > Also, regarding the Jacobian, I searched the documentation online and have one > little question: > suppose my system of odes looks like > dy/dt = f(y,t) > the Jacobian is only with respect to y? > I can compute it analytically (I believe) but I am unsure about the correct > syntax to pass it. Do you by any chance about a worked example? > I am really sorry I am probably doing dumb questions. I am pretty overwhelmed by > this right now and i searched in the internet for examples and I couldn't find > one with the Jacobian on it. > THanks again, Hi Laura, There's an example in the SciPy documentation on using a Jacobian: http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html This is for the ``ode`` function, not the ``odeint`` function though. 
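For odeint itself, a minimal sketch of passing an analytic Jacobian would look roughly like this (a toy two-equation system with made-up parameters, not the poster's 16-equation model). The Jacobian is taken with respect to y only, with t and the extra arguments held fixed; note also that the ``ode`` class uses the reversed callback signature f(t, y) rather than odeint's f(y, t):

import numpy as np
from scipy.integrate import odeint

def rhs(y, t, beta, gamma):
    # toy SIR-like right-hand side; stands in for the real epidemic model
    s, i = y
    return [-beta * s * i, beta * s * i - gamma * i]

def jac(y, t, beta, gamma):
    # entry [m][n] is d(rhs[m]) / d(y[n]); derivatives w.r.t. y only,
    # same extra arguments as rhs
    s, i = y
    return [[-beta * i, -beta * s],
            [ beta * i,  beta * s - gamma]]

y0 = [0.99, 0.01]
tspan = np.linspace(0.0, 1.0, 2)
sol = odeint(rhs, y0, tspan, args=(0.5, 0.1), Dfun=jac, mxstep=5000)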
I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian over here: http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes Hope that helps, Kevin From tritemio at gmail.com Fri Jul 22 20:30:10 2011 From: tritemio at gmail.com (Antonio Ingargiola) Date: Fri, 22 Jul 2011 17:30:10 -0700 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: 2011/7/22 Warren Weckesser > > > On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: > >> Hi to the list, >> >> I got an error using the pdf method on the binom distribution. Same error >> happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and >> pythonxy on windows). >> >> The error is the following: >> >> In [1]: from scipy.stats.distributions import binom >> >> In [2]: b = binom(20,0.8) >> >> In [3]: b.rvs() >> Out[3]: 17 >> >> In [4]: b.pdf(2) >> >> --------------------------------------------------------------------------- >> AttributeError Traceback (most recent call >> last) >> >> C:\Users\anto\.xy\startups\ in () >> >> C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, >> x) >> 333 >> 334 def pdf(self, x): #raises AttributeError in frozen discrete >> distribution >> --> 335 return self.dist.pdf(x, *self.args, **self.kwds) >> 336 >> 337 def cdf(self, x): >> >> AttributeError: 'binom_gen' object has no attribute 'pdf' >> >> In [5]: >> >> >> Is this known problem? How can I get the binomial pdf, is there a >> workaround? >> > > > Since binom is a discrete distribution, you want the pmf method: > > In [32]: b = binom(20, 0.8) > > In [33]: b.pmf(2) > Out[33]: 3.1876710400000011e-11 > > In [34]: b.pmf(18) > Out[34]: 0.13690942867206304 > > > The behavior that you observed still looks like a bug to me--why does binom > even have a pdf method, if calling it just raises a cryptic exception? > Warren, Thanks for the clarification. And BTW yes I think that this behaviour is quite boguous. Should I fill a bug report? Antonio -------------- next part -------------- An HTML attachment was scrubbed... URL: From kpr0001 at comcast.net Fri Jul 22 11:59:17 2011 From: kpr0001 at comcast.net (Karen) Date: Fri, 22 Jul 2011 15:59:17 +0000 (UTC) Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython References: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Message-ID: Ralf Gommers googlemail.com> writes: > Numpy/Scipy don't work on .NET, unless forks are available from Enthought. But I haven't seen an announcement on that. What instructions are you referring to? Ralf > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Hmm, that would explain it - it looks like it installs, adds two things that I thought were libraries to the import auto-complete menu, but can't seem to make arrays of integers. Instructions were here http://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net http://blogs.msdn.com/b/volkerw/archive/2011/03/10/python-tools-for-visual- studio- beta.aspx Thanks. 
From warren.weckesser at enthought.com Fri Jul 22 21:42:00 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 22 Jul 2011 19:42:00 -0600 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 6:30 PM, Antonio Ingargiola wrote: > 2011/7/22 Warren Weckesser > >> >> >> On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: >> >>> Hi to the list, >>> >>> I got an error using the pdf method on the binom distribution. Same error >>> happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution and >>> pythonxy on windows). >>> >>> The error is the following: >>> >>> In [1]: from scipy.stats.distributions import binom >>> >>> In [2]: b = binom(20,0.8) >>> >>> In [3]: b.rvs() >>> Out[3]: 17 >>> >>> In [4]: b.pdf(2) >>> >>> --------------------------------------------------------------------------- >>> AttributeError Traceback (most recent call >>> last) >>> >>> C:\Users\anto\.xy\startups\ in () >>> >>> C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, >>> x) >>> 333 >>> 334 def pdf(self, x): #raises AttributeError in frozen >>> discrete distribution >>> --> 335 return self.dist.pdf(x, *self.args, **self.kwds) >>> 336 >>> 337 def cdf(self, x): >>> >>> AttributeError: 'binom_gen' object has no attribute 'pdf' >>> >>> In [5]: >>> >>> >>> Is this known problem? How can I get the binomial pdf, is there a >>> workaround? >>> >> >> >> Since binom is a discrete distribution, you want the pmf method: >> >> In [32]: b = binom(20, 0.8) >> >> In [33]: b.pmf(2) >> Out[33]: 3.1876710400000011e-11 >> >> In [34]: b.pmf(18) >> Out[34]: 0.13690942867206304 >> >> >> The behavior that you observed still looks like a bug to me--why does >> binom even have a pdf method, if calling it just raises a cryptic exception? >> > > Warren, Thanks for the clarification. And BTW yes I think that this > behaviour is quite boguous. Should I fill a bug report? > Good idea--please do. Warren > > Antonio > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjstickel at vcn.com Fri Jul 22 21:49:46 2011 From: jjstickel at vcn.com (Jonathan Stickel) Date: Fri, 22 Jul 2011 19:49:46 -0600 Subject: [SciPy-User] Spline Fitting on Dense and Noisy Points In-Reply-To: References: Message-ID: <4E2A28BA.70409@vcn.com> On 07/22/2011 04:29 PM, scipy-user-request at scipy.org wrote: > On Thu, Jul 21, 2011 at 5:51 AM, Chong Yang wrote: >> >> >> Hi, >> >> >> >> Recently I have been working with 1D curve fitting on dense GPS point >> >> cloud representing road segments. For all I know, some roads may be highly >> >> curved and their corresponding x value are not necessary monotonic order. >> >> >> >> I tried splrep. It works very well on simple arcs but fails to give >> >> meaningful result on highly curved data - either because the polynomial >> >> order 'k' is not sufficient to fit the data well enough or the curve is >> >> vertically aligned. For now I get around vertical curves by flipping the >> >> axes and doing splrep with (y, x), but maybe a better approach is possible. >> >> >> >> My guess is splprep. However I am not sure how to properly convert the >> >> noisy data points into their parametric form. Does anyone have an idea on >> >> this or faced a similar problem before? Thanks. 
>> >> I know I am replying to this thread a bit late, but you might also be interested in smoothing by regularization: http://packages.python.org/scikits.datasmooth/regularsmooth.html Scattered data are OK. However, curves that are vertical or bend back over themselves will present problems, as you have found with splines. HTH, Jonathan From ralf.gommers at googlemail.com Sun Jul 24 12:58:39 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 24 Jul 2011 18:58:39 +0200 Subject: [SciPy-User] Cannot use scipy into new clean install of visual studio / ironpython In-Reply-To: References: <1309629831.713610.1310748290444.JavaMail.root@sz0073a.westchester.pa.mail.comcast.net> Message-ID: On Fri, Jul 22, 2011 at 5:59 PM, Karen wrote: > > > Ralf Gommers googlemail.com> writes: > > > Numpy/Scipy don't work on .NET, unless forks are available from > Enthought. But > I > haven't seen an announcement on that. What instructions are you referring > to? > Ralf > > > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > Hmm, that would explain it - it looks like it installs, adds two things > that I > thought were libraries to the import auto-complete menu, but can't seem to > make > arrays of integers. > > Instructions were here > > http://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net > > http://blogs.msdn.com/b/volkerw/archive/2011/03/10/python-tools-for-visual- > studio- > beta.aspx > > So it is available. There's a support link at http://pytools.codeplex.com/, that looks like the appropriate place to ask. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrajt at gmail.com Mon Jul 25 13:09:10 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Mon, 25 Jul 2011 17:09:10 +0000 (UTC) Subject: [SciPy-User] Help!!!!!! having problems with ODEINT References: Message-ID: Kevin Dunn gmail.com> writes: > Hi Laura, > > There's an example in the SciPy documentation on using a Jacobian: > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html > > This is for the ``ode`` function, not the ``odeint`` function though. > > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian > over here: > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes > > Hope that helps, > Kevin > Hi Kevin, the example was good. I now have implemented my Jacobian. Thank you very much!!!!! From collinstocks at gmail.com Mon Jul 25 15:04:05 2011 From: collinstocks at gmail.com (collinstocks at gmail.com) Date: Mon, 25 Jul 2011 15:04:05 -0400 Subject: [SciPy-User] f2py Message-ID: Lists (scipy-dev and scipy-user), If anybody here is at least somewhat familiar with f2py, could you please review and comment on that section of my pull request: https://github.com/ scipy/scipy/pull/44 Specifically, please take a look at https://github.com/scipy/scipy/pull/44/files#diff-1 Thanks in advance. Collin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johann.cohentanugi at gmail.com Mon Jul 25 15:57:03 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Mon, 25 Jul 2011 21:57:03 +0200 Subject: [SciPy-User] Gaussian mixture models with censored data In-Reply-To: <4E29E6FE.709@usi.ch> References: <4E29E6FE.709@usi.ch> Message-ID: <4E2DCA8F.7090206@gmail.com> just googling for it, I found http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial Is that what you want? It is based on GSL rather than scipy, but requires numpy anyway. A numpy/scipy implementation of EM, I believe, can be found at http://www.cta-observatory.org/indico/conferenceDisplay.py?confId=40 Not sure that it has censoring, but David Cournapeau is reading this mailing list AFAIK There is also a google code em-python, etc..... I am sure you will find something useful, even if you need to code a dedicated higher level part of these for censored data. good luck Johann On 07/22/2011 11:09 PM, Giovanni Luca Ciampaglia wrote: > Hello everybody, > Before I delve into implementing it myself, is there any Python > implementation of the EM algorithm used for fitting Gaussian mixtures > that also handles (right-)censored observations? > > In particular I need to fit univariate data, so something like this: > http://dx.doi.org/10.1080/00949659208811452 > > Best, > From matrajt at gmail.com Mon Jul 25 17:53:57 2011 From: matrajt at gmail.com (Laura Matrajt) Date: Mon, 25 Jul 2011 21:53:57 +0000 (UTC) Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT References: Message-ID: Laura Matrajt gmail.com> writes: > > Kevin Dunn gmail.com> writes: > > > Hi Laura, > > > > There's an example in the SciPy documentation on using a Jacobian: > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html > > > > This is for the ``ode`` function, not the ``odeint`` function though. > > > > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian > > over here: > > > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes > > > > Hope that helps, > > Kevin > > > > Hi Kevin, > the example was good. I now have implemented my Jacobian. Thank you very much!!!!! > Hi all: sorry to bother you again! I implemented the Jacobian as Anne kindly suggested to me with the help of Kevin's pointers to the correct webpage AND I increased the maximum number of steps as Warren kindly said. I am now getting a new message: lsoda-- warning..internal t (=r1) and h (=r2) are such that in the machine, t + h = t on the next step (h = step size). solver will continue anyway In above, R1 = 0.1209062893646E+03 R2 = 0.9059171791494E-18 It is just a warning, and my code continues to run. But I am really worried about this being a bug. The problem is that I am coupling a system of ODE's with a stochastic process. Mainly, I simulate a day of an epidemic, stop the integrator, change some of the initial conditions stochastically (not just randomly, I do follow some rules) and I run the ODE again and so on. I have run this millions of times (and I am not exagerating about the millions) and it doesn't produce any warnings, but every now and then (~15 times) it does it. I don't know if this is a bug or just that not all of my domain will be good for the ODE's... If anyone has any suggestion of how to think about this problem, I will really appreciate it!!!!!! thanks to all the people that replied to me previously, you helped me sooo much already! 
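For readers following this thread: the integrate/perturb/restart loop Laura describes can be sketched as below. This is only a toy two-state system with made-up parameters and a placeholder stochastic update, not her actual epidemic model, but it shows where the analytic Jacobian (Dfun) and the larger step limit (mxstep) plug into odeint.

import numpy as np
from scipy.integrate import odeint

# Toy two-state stand-in for the epidemic model (illustrative only).
def rhs(y, t, beta, gamma):
    s, i = y
    return [-beta * s * i, beta * s * i - gamma * i]

def jac(y, t, beta, gamma):
    # d(rhs_i)/d(y_j), matching odeint's Dfun convention
    s, i = y
    return [[-beta * i, -beta * s],
            [ beta * i,  beta * s - gamma]]

beta, gamma = 0.5, 0.1
y = np.array([0.99, 0.01])
for day in range(365):
    t = np.linspace(day, day + 1, 25)
    traj = odeint(rhs, y, t, args=(beta, gamma), Dfun=jac, mxstep=5000)
    y = traj[-1]
    # placeholder for the stochastic daily update of the state
    y = np.clip(y + 0.01 * np.random.randn(2), 0.0, None)

Logging the state (and the random seed) whenever the lsoda warning fires makes it easy to re-run just those one-day segments with tighter tolerances or a different integrator, as suggested elsewhere in this thread.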
From jason.heeris at gmail.com Tue Jul 26 06:31:56 2011
From: jason.heeris at gmail.com (Jason Heeris)
Date: Tue, 26 Jul 2011 18:31:56 +0800
Subject: [SciPy-User] Questions about scipy.optimize.anneal
Message-ID: 

Hi,

I have a few questions about scipy.optimize.anneal. I'm using 0.9.0 under Debian Squeeze with Python 2.6.6, but the documentation for anneal is the same as for 0.10.0.

Firstly, I'm confused about the return values - the docs say that the return values are:

    xmin (ndarray), retval (int), Jmin (float), T (float), feval (int), iters (int), accept (int)

...however, running the attached script gives

    (array([-10.16682436,   6.46000538]), 12.045579564382059, 0.72724961763274643, 851, 16, 180, 5)
    [<type 'numpy.ndarray'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>]

Note that the types don't match and there's no value that could correspond to the documented options for retval! Am I reading things wrong, or are these inconsistent with the docs?

Secondly, since annealing is a random process, is there any way to control the source of random numbers it uses? Should I seed some global RNG, and if so, which one?

Thirdly, note that the script actually fails to find the optimal parameters, despite being given them as the starting point. I know that annealing is a far from reliable process, but this behaviour still seems a little strange given the simplicity of the function I'm using here. Are there additional constraints I should be supplying, or some other way to guide the annealing process?

Fourthly, in my actual application I have a function to optimise that is invariant to the overall scale of the inputs, i.e. f(a*array([x,y,z])) is the same for all non-zero values of "a". Is there a risk of anneal failing because of this degeneracy? Is there a way to structure the problem to avoid this? (Note that any component could legitimately be zero, so I can't just fix x = 1 ... I think.)

Cheers,
Jason

-------------- next part --------------
A non-text attachment was scrubbed...
Name: annealtest.py
Type: text/x-python
Size: 248 bytes
Desc: not available
URL: 

From keenanpepper at gmail.com Thu Jul 21 19:32:36 2011
From: keenanpepper at gmail.com (Keenan Pepper)
Date: Thu, 21 Jul 2011 16:32:36 -0700
Subject: [SciPy-User] Looking for a vectorized adaptive quadrature routine
Message-ID: 

I'm looking for a numerical quadrature routine with a combination of specific properties. My integrand function is much more efficient if called in batches (vec_func). That seems to eliminate "quad" and "fixed_quad". Also, my integrand is extremely non-uniform, so the quadrature routine must be adaptive. This eliminates "quadrature" and "romberg" (unless I'm really misunderstanding what "romberg" is), as well as the fixed-sample quadrature routines.

There seems to be nothing left! Does Scipy really lack a vectorized, adaptive quadrature routine? I'm sure there are other users who would benefit from such a feature.

Keenan Pepper

From offonoffoffonoff at gmail.com Mon Jul 25 14:56:39 2011
From: offonoffoffonoff at gmail.com (elliot)
Date: Mon, 25 Jul 2011 11:56:39 -0700 (PDT)
Subject: [SciPy-User] trusting splprep
Message-ID: <6b546480-a333-4171-af87-5fc6721cad7e@j9g2000prj.googlegroups.com>

backstory: I am working on an optical raytracing program written in Python. Currently, I am reworking the face that uses a spline approximation of a profile. Specifically, moving from second order to third order.
The evaluating of the spline is done in c code that I intend to write, and so in exploring what exactly the output of splprep means, and how to construct the curve in question from tck, I discovered that I need to know how to use splprep better. question: How can I be sure that my spline approximation is sufficiently accurate? In ray tracing, a slight inaccuracy in the derivative of the interpolation could have a large effect. My use of splprep is awkward, so i am looking for some advice. for instance, just trying a simple example: ##### >>>x = range(0,6) >>>y = [c**3 for c in x] >>>tck,uout = scipy.interpolate.splprep([x,y],s=.000005,k=3) >>>t = numpy.linspace(0,1,6) #evenly spaced parameters, should correspond to x >>> xy = scipy.interpolate.splev(t,tck) >>> print xy[0] #should be same as x array([ 0.10119217, 2.81103321, 3.959404 , 3.90686522, 3.85308141, 4.99999962] # !! x went backwards as parameter progressed: graph is not a function >>> inty = [c**3 for c in xy[0]] #differance between interpolated y's and cubes of interpolated x's >>> numpy.array(xy[1])-numpy.array(inty) array([ -6.36532821e-04, 2.50019069e+00, -1.25009170e+01, 1.53618385e+01, 4.31465996e+01, 2.89434607e-05]) #### Now, this is pretty large error considering that this is a cubic spline on a simple cubic function. This should be pretty accurate, if not exact, but an order of magnitude off just isn't right. changing the smoothness from .00005 to 5 doesn't make a difference. So, how do I use splprep correctly. And, does anyone have any pointers regarding ensuring a certain level of accuracy in the results, or at least raising an error if the spline is not accurate enough. thanks, Elliot From paul at bilokon.co.uk Tue Jul 26 08:30:53 2011 From: paul at bilokon.co.uk (Paul Bilokon) Date: Tue, 26 Jul 2011 13:30:53 +0100 Subject: [SciPy-User] Status of TimeSeries SciKit Message-ID: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> Hi, I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? Best wishes, Paul From kgdunn at gmail.com Tue Jul 26 09:18:24 2011 From: kgdunn at gmail.com (Kevin Dunn) Date: Tue, 26 Jul 2011 09:18:24 -0400 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: On Mon, Jul 25, 2011 at 17:53, Laura Matrajt wrote: > Laura Matrajt gmail.com> writes: > >> >> Kevin Dunn gmail.com> writes: >> >> > Hi Laura, >> > >> > There's an example in the SciPy documentation on using a Jacobian: >> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html >> > >> > This is for the ``ode`` function, not the ``odeint`` function though. >> > >> > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian >> > over here: >> > >> > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes >> > >> > Hope that helps, >> > Kevin >> > >> >> Hi Kevin, >> ?the example was good. I now have implemented my Jacobian. Thank you very > much!!!!! >> > > > Hi all: > sorry to bother you again! I implemented the Jacobian as Anne kindly suggested > to me with the help of Kevin's pointers to the correct webpage AND I increased > the maximum number of steps as Warren kindly said. > I am now getting a new message: > > lsoda-- ?warning..internal t (=r1) and h (=r2) are > ? ? ? such that in the machine, t + h = t on the next step > ? ? ? (h = step size). solver will continue anyway > ? ? ?In above, ?R1 = ?0.1209062893646E+03 ? 
R2 = ?0.9059171791494E-18 I would find the set(s) of initial conditions that give this warning, and repeat the integration with a different integrator and/or different tolerance settings on the integrator. Then compare the trajectories. If there is negligible difference, then can probably ignore the warning. That being said, I've not used the lsoda integrator before, so I have no idea how serious this warning might be. Which is why I recommend you try a different integrator also. Kevin > It is just a warning, and my code continues to run. But I am really worried > about this being a bug. The problem is that I am coupling a system of ODE's with > a stochastic process. Mainly, I simulate a day of an epidemic, stop the > integrator, change some of the initial conditions stochastically (not just > randomly, I do follow some rules) and I run the ODE again and so on. > ?I have run this millions of times (and I am not exagerating about the millions) > and it doesn't produce any warnings, but every now and then (~15 times) it does > it. I don't know if this is a bug or just that not all of my domain will be good > for the ODE's... > If anyone has any suggestion of how to think about this problem, I will really > appreciate it!!!!!! > thanks to all the people that replied to me previously, you helped me sooo much > already! From pgmdevlist at gmail.com Tue Jul 26 09:30:04 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Jul 2011 15:30:04 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> Message-ID: <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: > Hi, > > I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? Years is an overstatement... The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up-to-date. *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. That doesn't mean that you'd be on your own, questions will still be answered... From warren.weckesser at enthought.com Tue Jul 26 09:45:33 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 26 Jul 2011 08:45:33 -0500 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: Hi Laura, On Mon, Jul 25, 2011 at 4:53 PM, Laura Matrajt wrote: > Laura Matrajt gmail.com> writes: > >> >> Kevin Dunn gmail.com> writes: >> >> > Hi Laura, >> > >> > There's an example in the SciPy documentation on using a Jacobian: >> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.ode.html >> > >> > This is for the ``ode`` function, not the ``odeint`` function though. >> > >> > I've just posted an example on using ``ode`` with coupled ODEs, no Jacobian >> > over here: >> > >> > http://scipy-central.org/item/13/0/integrating-an-initial-value-problem-multiple-odes >> > >> > Hope that helps, >> > Kevin >> > >> >> Hi Kevin, >> ?the example was good. I now have implemented my Jacobian. Thank you very > much!!!!! >> > > > Hi all: > sorry to bother you again! 
I implemented the Jacobian as Anne kindly suggested > to me with the help of Kevin's pointers to the correct webpage AND I increased > the maximum number of steps as Warren kindly said. > I am now getting a new message: > > lsoda-- ?warning..internal t (=r1) and h (=r2) are > ? ? ? such that in the machine, t + h = t on the next step > ? ? ? (h = step size). solver will continue anyway > ? ? ?In above, ?R1 = ?0.1209062893646E+03 ? R2 = ?0.9059171791494E-18 Something about your system is causing the solver to reduce its internal step size down to about 1e-17 (and it can't go any smaller than that). Do you actually have a discontinuity in your equations? Is your system singularly perturbed, with shock-like or boundary layer solutions? As Anne said: "It's worth thinking about why the integrator is taking all those steps. It generally happens because there's some sort of kink or complex behaviour in the solution there; this can be a genuine feature of the solution, or (more often in my experience) it can be because your RHS function has some discontinuity (perhaps due to a bug). So it's worth checking out those segments." If you can isolate the initial conditions and parameters that lead to this warning, you could plot the solution that it generates to see what is going on. Warren > > It is just a warning, and my code continues to run. But I am really worried > about this being a bug. The problem is that I am coupling a system of ODE's with > a stochastic process. Mainly, I simulate a day of an epidemic, stop the > integrator, change some of the initial conditions stochastically (not just > randomly, I do follow some rules) and I run the ODE again and so on. > ?I have run this millions of times (and I am not exagerating about the millions) > and it doesn't produce any warnings, but every now and then (~15 times) it does > it. I don't know if this is a bug or just that not all of my domain will be good > for the ODE's... > If anyone has any suggestion of how to think about this problem, I will really > appreciate it!!!!!! > thanks to all the people that replied to me previously, you helped me sooo much > already! > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From wesmckinn at gmail.com Tue Jul 26 10:25:15 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 26 Jul 2011 10:25:15 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> Message-ID: On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: > > On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: > >> Hi, >> >> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? > > Years is an overstatement... > The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and ?unfortunately don't ?currently have the time to keep it up-to-date. > *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. > That doesn't mean that you'd be on your own, questions will still be answered... 
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > hi Paul, Skipper and I (statsmodels) relatively recently discussed moving scikits.timeseries to GitHub and maintaining it there since we work on models for time series analysis. I work very actively on time series-related functionality in pandas so it might not even be unthinkable to merge together the projects (scikits.timeseries and pandas) and integrate all the numpy.datetime64 stuff once the dust settles there. Just thinking out loud. - Wes From pgmdevlist at gmail.com Tue Jul 26 10:35:32 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Jul 2011 16:35:32 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> Message-ID: <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote: > On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: >> >> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: >> >>> Hi, >>> >>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? >> >> Years is an overstatement... >> The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up-to-date. >> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. >> That doesn't mean that you'd be on your own, questions will still be answered... >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > hi Paul, > > Skipper and I (statsmodels) relatively recently discussed moving > scikits.timeseries to GitHub and maintaining it there since we work on > models for time series analysis. Er? https://github.com/pierregm/scikits.timeseries/ https://github.com/pierregm/scikits.timeseries-sandbox/ the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray). > I work very actively on time > series-related functionality in pandas so it might not even be > unthinkable to merge together the projects (scikits.timeseries and > pandas) and integrate all the numpy.datetime64 stuff once the dust > settles there. Just thinking out loud. That's an idea. 
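As a small aside, the numpy.datetime64 behaviour being referred to (an integer step meaning "one unit at the date's own resolution") looks like this; the snippet assumes the post-1.6 datetime64 semantics and is independent of either package:

import numpy as np

d = np.datetime64('2011-07-01')
print(d + 1)    # 2011-07-02: integers step by the date's unit (days here)

m = np.datetime64('2011-07', 'M')
print(m + 1)    # 2011-08: the same idea at monthly resolution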
From jsseabold at gmail.com Tue Jul 26 10:42:40 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 26 Jul 2011 10:42:40 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> Message-ID: On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM wrote: > > On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote: > > > On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: > >> > >> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: > >> > >>> Hi, > >>> > >>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? > >> > >> Years is an overstatement... > >> The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and ?unfortunately don't ?currently have the time to keep it up-to-date. > >> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. > >> That doesn't mean that you'd be on your own, questions will still be answered... > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > > > hi Paul, > > > > Skipper and I (statsmodels) relatively recently discussed moving > > scikits.timeseries to GitHub and maintaining it there since we work on > > models for time series analysis. > > Er? > https://github.com/pierregm/scikits.timeseries/ > https://github.com/pierregm/scikits.timeseries-sandbox/ > Great. Is this the "official" advertised repo? I remember there was some chatter about this a few months back but lost track of the thread. > the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray). > > > > > I work very actively on time > > series-related functionality in pandas so it might not even be > > unthinkable to merge together the projects (scikits.timeseries and > > pandas) and integrate all the numpy.datetime64 stuff once the dust > > settles there. Just thinking out loud. > > That's an idea. > Any thoughts on the idea? Do you think it's reasonable and/or beneficial? There is also some talk with the scikits.learn and scikits.statsmodels to drop the scikits namespace, which would be better as a collective decision, so the merging could be a part of this? I use both packages now, and I, for one, would love to see them come together and share to the extent this is feasible. Others? I especially like the plotting stuff since it's great but I've had to make a few local patches here and there for mpl changes. 
Skipper From Dharhas.Pothina at twdb.state.tx.us Tue Jul 26 10:45:59 2011 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 26 Jul 2011 09:45:59 -0500 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> Message-ID: <4E2E8CD70200009B0003C090@GWWEB.twdb.state.tx.us> > models for time series analysis. I work very actively on time > series-related functionality in pandas so it might not even be > unthinkable to merge together the projects (scikits.timeseries and > pandas) and integrate all the numpy.datetime64 stuff once the dust > settles there. Just thinking out loud. There is functionality I like and use in both pandas and scikits.timeseries, moving towards and eventual goal of merging the two is a great idea. - dharhas -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Jul 26 11:01:51 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 26 Jul 2011 17:01:51 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> Message-ID: On Jul 26, 2011, at 4:42 PM, Skipper Seabold wrote: > On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM wrote: >> >> On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote: >> >>> On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM wrote: >>>> >>>> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote: >>>> >>>>> Hi, >>>>> >>>>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased? >>>> >>>> Years is an overstatement... >>>> The scikits hasn't been updated in a while, yes. The two developpers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up-to-date. >>>> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited. >>>> That doesn't mean that you'd be on your own, questions will still be answered... >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> hi Paul, >>> >>> Skipper and I (statsmodels) relatively recently discussed moving >>> scikits.timeseries to GitHub and maintaining it there since we work on >>> models for time series analysis. >> >> Er? >> https://github.com/pierregm/scikits.timeseries/ >> https://github.com/pierregm/scikits.timeseries-sandbox/ >> > > Great. Is this the "official" advertised repo? I remember there was > some chatter about this a few months back but lost track of the > thread. Yep. The scikits.timeseries is just the SVN site ported to git. The sandbox one was dubbed 'experimental' on this very list. > >> the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray). 
>> >> >> >>> I work very actively on time >>> series-related functionality in pandas so it might not even be >>> unthinkable to merge together the projects (scikits.timeseries and >>> pandas) and integrate all the numpy.datetime64 stuff once the dust >>> settles there. Just thinking out loud. >> >> That's an idea. >> > > Any thoughts on the idea? Do you think it's reasonable and/or > beneficial? There is also some talk with the scikits.learn and > scikits.statsmodels to drop the scikits namespace, which would be > better as a collective decision, so the merging could be a part of > this? I use both packages now, and I, for one, would love to see them > come together and share to the extent this is feasible. Others? I > especially like the plotting stuff since it's great but I've had to > make a few local patches here and there for mpl changes. No surprise for matplotlib. I kinda dropped the ball here (when I need to plot stuffs these days, I don't use mpl). I haven't used pandas yet, for the same reasons why I wasn't able to keep with updating scikits.timeseries. But if y'all use the two in parallel and have a need for porting scikits.timeseries to pandas, then go for it, you have my blessing. And you know where to contact me if you have some issues or questions. From cjordan1 at uw.edu Tue Jul 26 11:46:31 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 26 Jul 2011 10:46:31 -0500 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 5:31 AM, Jason Heeris wrote: > Hi, > > I have a few questions about scipy.optimize.anneal. I'm using 0.9.0 > under Debian Squeeze with Python 2.6.6, but the documentation for > anneal is the same as for 0.10.0. > > Firstly, I'm confused about the return values ? the docs say that the > return values are: > xmin (ndarray), retval (int), Jmin (float), T (float), feval (int), > iters (int), accept (int) > > ...however, running the attached script gives > > (array([-10.16682436, 6.46000538]), 12.045579564382059, > 0.72724961763274643, 851, 16, 180, 5) > > [, , > , , , , > ] > > Note that the types don't match and there's no value that could > correspond to the documented options for retval! Am I reading things > wrong, or are these inconsistent with the docs? > The docs are wrong. Judging from the source code, it appears that retval is the final entry in the tuple of returned values and the rest are just shifted up one. The floats in the docs are, apparently, really np.floats. > > Secondly, since annealing is a random process, is there any way to > control the source of random numbers it uses? Should I seed some > global RNG, and if so, which one? > > It appears from the source code that all sources of randomness in the function come from calls to random number generators in numpy.random. So numpy.random.seed should be the only seed you need to set. (And this seemed to be the case after testing it on your sample function.) > Thirdly, note that the script actually fails to find the optimal > parameters, despite being given them as the starting point. I know > that annealing is a far from reliable process, but this behaviour > still seems a little strange given the simplicity of the function I'm > using here. Are there additional constraints I should be supplying, or > some other way to guide the annealing process? > > The default schedule is 'fast', and it does seem to give really lousy results. It seems to cool too quickly. 
If you try the boltzmann or cauchy schedules, they take longer but give much better estimates of the minimum. > Fourthly, in my actual application I have a function to optimise that > is invariant to the overall scale of the inputs, ie. > f(a*array([x,y,z])) is the same for all non-zero values of "a". Is > there a risk of anneal failing because of this degeneracy? Is there a > way to structure the problem to avoid this? (Note that any component > could legitimately be zero, so I can't just fix x = 1 ... I think). > > Assuming your function is continuous, I think the biggest problem might be that your function doesn't increase as x becomes big. So your samples could wander to infinity. I'd be curious what would happen if you simply scaled each new sample so it had unit length. But to do that you'll need to download scipy's source and make the modifications yourself. Another possibility is simply giving reasonable upper and lower bounds for x using lower and upper. -Chris Jordan-Squire > Cheers, > Jason > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.heeris at gmail.com Tue Jul 26 12:18:36 2011 From: jason.heeris at gmail.com (Jason Heeris) Date: Wed, 27 Jul 2011 00:18:36 +0800 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On 26 July 2011 23:46, Christopher Jordan-Squire wrote: > The docs are wrong. Judging from the source code, it appears that retval is > the final entry in the tuple of returned values and the rest are just > shifted up one. The floats in the docs are, apparently, really np.floats. Thanks for all the info ? there's only one detail I'm still puzzled about: what would a "retval" of 5 indicate? ? Jason From cjordan1 at uw.edu Tue Jul 26 12:30:35 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 26 Jul 2011 11:30:35 -0500 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 11:18 AM, Jason Heeris wrote: > On 26 July 2011 23:46, Christopher Jordan-Squire wrote: > > The docs are wrong. Judging from the source code, it appears that retval > is > > the final entry in the tuple of returned values and the rest are just > > shifted up one. The floats in the docs are, apparently, really np.floats. > > Thanks for all the info ? there's only one detail I'm still puzzled > about: what would a "retval" of 5 indicate? > > As advertised by the warning, it just seems to indicate that the point the annealing cooled to wasn't the lowest point it encountered. Just a warning, I think, that things are pretty screwed up and you should try a different temperature schedule. But it's strange that's not in the docs. -Chris Jordan-Squire > ? Jason > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lechtlr at yahoo.com Tue Jul 26 13:54:02 2011 From: lechtlr at yahoo.com (lechtlr) Date: Tue, 26 Jul 2011 10:54:02 -0700 (PDT) Subject: [SciPy-User] VODE with BDF method Message-ID: <1311702842.31060.YahooMailClassic@web112710.mail.gq1.yahoo.com> Is there any option in Scipy to use "internally generated full Jacobian" with VODE/BDF method for stiff systems ? 
Any help would greatly be appreciated. -Lex From mattknox.ca at gmail.com Tue Jul 26 13:58:27 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 26 Jul 2011 17:58:27 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> Message-ID: > >>> I work very actively on time > >>> series-related functionality in pandas so it might not even be > >>> unthinkable to merge together the projects (scikits.timeseries and > >>> pandas) and integrate all the numpy.datetime64 stuff once the dust > >>> settles there. Just thinking out loud. > >> > >> That's an idea. > >> > > > > Any thoughts on the idea? Do you think it's reasonable and/or > > beneficial? There is also some talk with the scikits.learn and > > scikits.statsmodels to drop the scikits namespace, which would be > > better as a collective decision, so the merging could be a part of > > this? I use both packages now, and I, for one, would love to see them > > come together and share to the extent this is feasible. Others? I > > especially like the plotting stuff since it's great but I've had to > > make a few local patches here and there for mpl changes. > > No surprise for matplotlib. I kinda dropped the ball here (when I need to > plot stuffs these days, I don't use mpl). I haven't used pandas yet, for the > same reasons why I wasn't able to keep with updating scikits.timeseries. > But if y'all use the two in parallel and have a need for porting > scikits.timeseries to pandas, then go for it, you have my blessing. And you > know where to contact me if you have some issues or questions. I would basically echo Pierre's comments here. I don't have the time (or to be perfectly honest, the energy and motivation) to maintain the timeseries module anymore and would definitely be in favor of any efforts to merge its functionality into a better supported module. It's clear at this point that the timeseries module in its current form is a dead end given the lack of maintainers as well as the fundamental building blocks which are coming into place that would allow a better timeseries module. Those building blocks being: 1. datetime data type support in numpy 2. improved missing value support in numpy 3. data array / labelled array / pandas type of stuff which should (in theory) simplify indexing a timeseries with dates relative to the large hacks used in the current timeseries module In many ways, the timeseries module is a giant hack which tries to work around the fact that it is missing these key foundational pieces in numpy. If pandas is the module that unifies all these concepts into a cohesive package, then I think that is fantastic! And from lurking on the numpy and scipy mailing lists and monitoring all the threads on the related topics recently, I feel confident that I have little to contribute and that the problem rests in much more capable hands than my own :) - Matt Knox From johann.cohentanugi at gmail.com Tue Jul 26 14:18:18 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 26 Jul 2011 20:18:18 +0200 Subject: [SciPy-User] Gaussian mixture models with censored data In-Reply-To: <4E2DCA8F.7090206@gmail.com> References: <4E29E6FE.709@usi.ch> <4E2DCA8F.7090206@gmail.com> Message-ID: <4E2F04EA.7050507@gmail.com> ooops something weird happened when I copied-paste David's page. 
Here it is : http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/ On 07/25/2011 09:57 PM, Johann Cohen-Tanugi wrote: > just googling for it, I found > http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial > Is that what you want? It is based on GSL rather than scipy, but > requires numpy anyway. > > A numpy/scipy implementation of EM, I believe, can be found at > http://www.cta-observatory.org/indico/conferenceDisplay.py?confId=40 > Not sure that it has censoring, but David Cournapeau is reading this > mailing list AFAIK > > There is also a google code em-python, etc..... I am sure you will find > something useful, even if you need to code a dedicated higher level part > of these for censored data. > > good luck > Johann > > On 07/22/2011 11:09 PM, Giovanni Luca Ciampaglia wrote: >> Hello everybody, >> Before I delve into implementing it myself, is there any Python >> implementation of the EM algorithm used for fitting Gaussian mixtures >> that also handles (right-)censored observations? >> >> In particular I need to fit univariate data, so something like this: >> http://dx.doi.org/10.1080/00949659208811452 >> >> Best, >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From johann.cohentanugi at gmail.com Tue Jul 26 14:20:06 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 26 Jul 2011 20:20:06 +0200 Subject: [SciPy-User] Gaussian mixture models with censored data In-Reply-To: <4E2DCA8F.7090206@gmail.com> References: <4E29E6FE.709@usi.ch> <4E2DCA8F.7090206@gmail.com> Message-ID: <4E2F0556.9000002@gmail.com> ooops something weird happened when I copied-paste David's page. Here it is : http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/ On 07/25/2011 09:57 PM, Johann Cohen-Tanugi wrote: > just googling for it, I found > http://www.pymix.org/pymix/index.php?n=PyMix.Tutorial > Is that what you want? It is based on GSL rather than scipy, but > requires numpy anyway. > > A numpy/scipy implementation of EM, I believe, can be found at > http://www.cta-observatory.org/indico/conferenceDisplay.py?confId=40 > Not sure that it has censoring, but David Cournapeau is reading this > mailing list AFAIK > > There is also a google code em-python, etc..... I am sure you will find > something useful, even if you need to code a dedicated higher level part > of these for censored data. > > good luck > Johann > > On 07/22/2011 11:09 PM, Giovanni Luca Ciampaglia wrote: >> Hello everybody, >> Before I delve into implementing it myself, is there any Python >> implementation of the EM algorithm used for fitting Gaussian mixtures >> that also handles (right-)censored observations? >> >> In particular I need to fit univariate data, so something like this: >> http://dx.doi.org/10.1080/00949659208811452 >> >> Best, >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Tue Jul 26 15:29:15 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Tue, 26 Jul 2011 15:29:15 -0400 Subject: [SciPy-User] VODE with BDF method In-Reply-To: <1311702842.31060.YahooMailClassic@web112710.mail.gq1.yahoo.com> References: <1311702842.31060.YahooMailClassic@web112710.mail.gq1.yahoo.com> Message-ID: Lex, I hope you understand that the internally generated Jacobian is only a numeric finite-difference approximation. 
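For illustration, one way to avoid the finite-difference approximation is to generate an exact Jacobian symbolically with SymPy and hand it to scipy.integrate.ode. This is only a rough sketch; the two-equation system below is a toy, not Lex's model:

import numpy as np
import sympy as sp
from scipy.integrate import ode

# toy, mildly stiff system (illustrative only)
y1, y2 = sp.symbols('y1 y2')
f_sym = sp.Matrix([-50.0 * y1 + sp.sin(y2), -y2])
jac_sym = f_sym.jacobian([y1, y2])

f_num = sp.lambdify((y1, y2), f_sym, 'numpy')
jac_num = sp.lambdify((y1, y2), jac_sym, 'numpy')

def f(t, y):
    return np.ravel(f_num(y[0], y[1]))

def jac(t, y):
    return np.asarray(jac_num(y[0], y[1]), dtype=float)

solver = ode(f, jac).set_integrator('vode', method='bdf', nsteps=3000)
solver.set_initial_value([1.0, 1.0], 0.0)
while solver.successful() and solver.t < 1.0:
    solver.integrate(solver.t + 0.1)

The same lambdified Jacobian can also be passed to odeint through its Dfun argument (note that odeint expects the (y, t) argument order, the reverse of ode).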
If you use the set_integrator method you can select the stiff BDF solver by passing method='bdf' (VODE's BDF method supports orders up to 5):

ode15s = scipy.integrate.ode(f)
ode15s.set_integrator('vode', method='bdf', order=5, nsteps=3000)
ode15s.set_initial_value(u0, t0)

See http://mail.scipy.org/pipermail/scipy-dev/2002-February/000274.html ... as you can often compute your own symbolic Jacobian using a package such as SymPy, or import it from Mathematica or Maple, as shown in the above example. An exact Jacobian will always lead to much more efficient and accurate results for stiff systems.

-Rob

On Tue, Jul 26, 2011 at 1:54 PM, lechtlr wrote:
> Is there any option in Scipy to use "internally generated full Jacobian" with the VODE/BDF method for stiff systems? Any help would greatly be appreciated.
>
> -Lex
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From gael.varoquaux at normalesup.org Tue Jul 26 18:28:43 2011
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 27 Jul 2011 00:28:43 +0200
Subject: [SciPy-User] Status of TimeSeries SciKit
In-Reply-To: 
References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com>
Message-ID: <20110726222843.GB8920@phare.normalesup.org>

On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote:
> In many ways, the timeseries module is a giant hack which tries to work
> around the fact that it is missing these key foundational pieces in
> numpy.

I don't believe this statement is true. If you are doing statistics, you think that what is really missing in numpy is missing data support. If you are doing timeseries analysis, you are missing timeseries support. If you are doing spatial models, you are missing unstructured spatial data support with built-in interpolation. If you are doing general relativity, you are missing contra/co-variant tensor support.

In my opinion, the important thing to keep in mind is that while each domain-specific application calls for different specific data structures, they are not universal, and you cannot stick them all in one library. The good news is that with numpy arrays, you can build data structures and libraries that talk more or less together, sharing data across domains. However, the more you embed your specificities in your data structure, the more it becomes alien to people who don't have the same use cases. For instance, the various VTK data structures are amongst the most beautiful structures for encoding spatial information. Yet most people not coming from 3D data processing hate them, because they don't understand them, and others are very busy reinventing the same set of abstractions. Similarly, R is great for statistics, but people who don't do statistics find the syntax incomprehensible and the data structures too restrictive. Matlab is great for linear algebra, but if you move into an N-dimensional world it gets clumsy.

My point is: let us stop dreaming that a change to core numpy will solve our problems. I am not saying that it cannot be improved, but in my opinion, the reason numpy is so successful is that it is actually the intersection of many different domain-specific requirements, and not the union.
2 cents from the peanut gallery, Gael From wesmckinn at gmail.com Tue Jul 26 21:10:24 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 26 Jul 2011 21:10:24 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <20110726222843.GB8920@phare.normalesup.org> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> Message-ID: On Tue, Jul 26, 2011 at 6:28 PM, Gael Varoquaux wrote: > On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote: >> In many ways, the timeseries module is a giant hack which tries to work >> around the fact that it is missing these key foundational pieces in >> numpy. > > I don't believe this statement is true. If you are doing statistics, you > think that what is really missing in numpy is missing data support. If > you are doing timeseries analysis, you are missing timeseries support. If > you are doing spatial models, you are missing unstructured spatial data > support with builtin interpolation, if you are doing general relativity, > you are missing contra/co-variant tensor support. > > In my opinion, the important thing to keep in mind is that while each > domain-specific application calls for different specific data structures, > they are not universal, and you cannot stick them all in one library. The > good new is that with numpy arrays, you can build data structures and > libraries that talk more or less together, sharing the data accross > domain. However, the more you embedded your specificities in your data > structure, the more it becomes alien to people who don't have the same > usecases. For instance the various VTK data structures are amongst the > most beautiful structures for encoding spatial information. Yet most > people not coming from 3D data processing hate them, because they don't > understand them, and others are very busy reinventing the same set of > abstractions. Similarly, R is great for statistics, but people who don't > do statistics find the syntax incomprehensible and the data structures > too restrictive. Matlab is great for linear alegbra, but if you move in > N-dimensional word it gets clumsy. > > My point is: let us stop dreaming that a change to core numpy will solve > our problems. I am not saying that it cannot be improved, but in my > opinion, the reason numpy is so successful is that it is actually the > intersection of many different domain-specific requirements, and not the > union. +1, I agree completely: NumPy will provide the fundamental building blocks we can use to build domain-specific data structures-- there will be no deus ex machina =) > 2 cents from the peanut gallery, > > Gael > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jason.heeris at gmail.com Tue Jul 26 22:38:03 2011 From: jason.heeris at gmail.com (Jason Heeris) Date: Wed, 27 Jul 2011 10:38:03 +0800 Subject: [SciPy-User] Questions about scipy.optimize.anneal In-Reply-To: References: Message-ID: On 26 July 2011 18:31, Jason Heeris wrote: > Fourthly, in my actual application I have a function to optimise that > is invariant to the overall scale of the inputs, ie. > f(a*array([x,y,z])) is the same for all non-zero values of "a". Is > there a risk of anneal failing because of this degeneracy? Is there a > way to structure the problem to avoid this? 
(Note that any component > could legitimately be zero, so I can't just fix x = 1 ... I think). Just in case anyone else is interested, I realised the answer to this after I'd had my morning coffee. Since the origin is not a valid input, I really just want to anneal over the points on the surface of a 5-sphere of arbitrary non-zero radius. So I can fix the radius to be 1 and constrain my input vectors to have five coordinates which are angles, ie. hyperspherical coordinates (ala. http://en.wikipedia.org/wiki/N-sphere#Hyperspherical_coordinates) ? Jason From hugoslv at gmail.com Wed Jul 27 05:07:05 2011 From: hugoslv at gmail.com (Hugo Silva) Date: Wed, 27 Jul 2011 10:07:05 +0100 Subject: [SciPy-User] Problem with scipy.signal Message-ID: <7A28C61E-4CC6-41DD-8A7D-7A32D14D35C1@gmail.com> Hi, I've just installed Python 2.7 on a MacBook Pro 2.7GHz i7 / 8Gb / Mac OS 10.6.8 and I'm facing the following error when trying to import scipy.signal: ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/signal/sigtools.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/signal/sigtools.so: no matching architecture in universal wrapper Has anyone faced the same problem or can share some insight on what can be a possible solution for this issue? Thank's in advance, Hugo Silva From bhanukiran.perabathini at gmail.com Tue Jul 26 09:49:58 2011 From: bhanukiran.perabathini at gmail.com (bhanukiran perabathini) Date: Tue, 26 Jul 2011 19:19:58 +0530 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: unsubscribe From mattknox.ca at gmail.com Wed Jul 27 09:06:21 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 13:06:21 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> Message-ID: Gael Varoquaux normalesup.org> writes: > > On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote: > > In many ways, the timeseries module is a giant hack which tries to work > > around the fact that it is missing these key foundational pieces in > > numpy. > > I don't believe this statement is true. If you are doing statistics, you > think that what is really missing in numpy is missing data support. If > you are doing timeseries analysis, you are missing timeseries support. If > you are doing spatial models, you are missing unstructured spatial data > support with builtin interpolation, if you are doing general relativity, > you are missing contra/co-variant tensor support. Ok, perhaps my statement was a bit harsh :) . But the point I was trying to make is that the timeseries module could be dramatically simplified and cleaned up internally with some of those forthcoming foundational pieces in numpy, even if the API and functionality of the timeseries module is kept identical to what it is right now. > My point is: let us stop dreaming that a change to core numpy will solve > our problems. I am not saying that it cannot be improved, but in my > opinion, the reason numpy is so successful is that it is actually the > intersection of many different domain-specific requirements, and not the > union. You are right. There is no such thing as a one size fits all data structure. 
It just so happens that Wes' use cases (from my understanding) are basically the same as mine (finance, etc). So from my own selfish point of view, the idea of pandas swallowing up the timeseries module and incorporating its functionality sounds kind of nice since that would give ME (and probably most of the people that work in the finance domain) an awesome swiss army knife data structure that solves all the problems that I care about :) - Matt Knox From gael.varoquaux at normalesup.org Wed Jul 27 10:12:51 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 27 Jul 2011 16:12:51 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> Message-ID: <20110727141251.GB30024@phare.normalesup.org> On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote: > Ok, perhaps my statement was a bit harsh :) . But the point I was > trying to make is that the timeseries module could be dramatically > simplified and cleaned up internally with some of those forthcoming > foundational pieces in numpy, Eventhough I do not know the timeseries module, I wouldn't be surprised that it is indeed the case. It is probably very valuable if you are able to identify localized enhancements to numpy that make your life easier, as they might make many other people's life easier. > It just so happens that Wes' use cases (from my understanding) are > basically the same as mine (finance, etc). So from my own selfish point > of view, the idea of pandas swallowing up the timeseries module and > incorporating its functionality sounds kind of nice since that would > give ME (and probably most of the people that work in the finance > domain) I think that it is really great if the different packages doing time series analysis unite. It will probably give better packages technically, and there is a lot of value to the community in such work. Gael From wesmckinn at gmail.com Wed Jul 27 10:23:17 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 10:23:17 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <20110727141251.GB30024@phare.normalesup.org> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 10:12 AM, Gael Varoquaux wrote: > On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote: >> Ok, perhaps my statement was a bit harsh :) . But the point I was >> trying to make is that the timeseries module could be dramatically >> simplified and cleaned up internally with some of those forthcoming >> foundational pieces in numpy, > > Eventhough I do not know the timeseries module, I wouldn't be surprised > that it is indeed the case. It is probably very valuable if you are able > to identify localized enhancements to numpy that make your life easier, > as they might make many other people's life easier. > >> It just so happens that Wes' use cases (from my understanding) are >> basically the same as mine (finance, etc). 
So from my own selfish point >> of view, the idea of pandas swallowing up the timeseries module and >> incorporating its functionality sounds kind of nice since that would >> give ME (and probably most of the people that work in the finance >> domain) > > I think that it is really great if the different packages doing time > series analysis unite. It will probably give better packages technically, > and there is a lot of value to the community in such work. I agree. I already have 50% or more of the features in scikits.timeseries, so this gets back to my fragmentation argument (users being stuck with a confusing choice between multiple libraries). Let's make it happen! > Gael > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Wed Jul 27 12:09:03 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Wed, 27 Jul 2011 12:09:03 -0400 Subject: [SciPy-User] Help!!!!!! having NEW problems with ODEINT In-Reply-To: References: Message-ID: Hi Laura, >> Hi all: >> sorry to bother you again! I implemented the Jacobian as Anne kindly suggested >> to me with the help of Kevin's pointers to the correct webpage AND I increased >> the maximum number of steps as Warren kindly said. >> I am now getting a new message: >> >> lsoda-- ?warning..internal t (=r1) and h (=r2) are >> ? ? ? such that in the machine, t + h = t on the next step >> ? ? ? (h = step size). solver will continue anyway >> ? ? ?In above, ?R1 = ?0.1209062893646E+03 ? R2 = ?0.9059171791494E-18 > The other recommendations are a good place to start fixing this, except the one suggesting you unsubscribe :) But, short of finding an implementation of a true stochastic DE solver (which I'm afraid I can't help with as I'm not a big expert) you should find it easier to introduce specific noise signals to your system if you use PyDSTool's external input signals that can appear in the RHS function of your DE. Alternatively, you could run your problem as a "hybrid" model where a daily "event" in your code will cause the discrete state transition to the new values drawn from the noise distribution. I am not sure if this will fix your problem with the step size going to zero, but that will depend on your parameters and exactly how you've implemented the running of the system with the discontinuity in state values (and how large they are). But since you are doing this step millions of times, PyDSTool should speed up your runs significantly, especially if you also use a C-based integrator that will compile your RHS too. If you decide to try this out there is an example interp_vode_test.py in the package's tests directory that uses external input signals, and others that demonstrate hybrid systems. But I'd be willing to help you get it working for you if you send me your code, as using this to simulate systems with noise is not yet a well-documented feature. Best, Rob From mattknox.ca at gmail.com Wed Jul 27 12:12:00 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 16:12:00 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Wes McKinney gmail.com> writes: > > I agree. 
I already have 50% or more of the features in > scikits.timeseries, so this gets back to my fragmentation argument > (users being stuck with a confusing choice between multiple > libraries). Let's make it happen! Ok. In the interest of moving this forward, here is a quick list of things I see missing in pandas that scikits.timeseries does. For brevity I will skip the reasons that these features exist, but if the usefulness is not obvious please ask me to clarify. Frequency conversion flexibility: - when going from a higher frequency to lower frequency (eg. daily to monthly), the timeseries module adds an extra dimension and groups the points so you still have all the original data rather than discarding data - allow you to specify where to place the value - the start or end of the period - when converting from lower frequency to higher frequency (eg. monthly to daily) - support of a larger number of frequencies Indexing: - slicing with dates (looks like "truncate" method does this, but would be nice to be able to just use slicing directly) - simple arithmetic on dates ("date + 1" means "add one unit at the current frequency") - various date/series attributes such as year, qyear, quarter, month, week, day, day_of_year, etc... (ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information) - full missing value support (TimeSeries class is a subclass of MaskedArray) - moving (rolling) median/min/max - Matt Knox From lists at hilboll.de Wed Jul 27 12:28:35 2011 From: lists at hilboll.de (Andreas) Date: Wed, 27 Jul 2011 18:28:35 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: <4E303CB3.1020808@hilboll.de> While we're at it: > Frequency conversion flexibility: > - when going from a higher frequency to lower frequency (eg. daily to > monthly), the timeseries module adds an extra dimension and groups the > points so you still have all the original data rather than discarding > data I'm using scikits.timeseries for analysis of atmospheric measurements. I've always wanted several things, and now that discussion is under way, maybe it's a good time to point them out: * When plotting a series, have the flexibility to have the value marked down at the center of the frequency. What I mean is, when I have monthly data and make a plot of one year, have each value be printed at the middle of the corresponding month, e.g. Jan 16, etc. Otherwise, It's not obvious to the reader whether the value printed on July 1 is actually that for June or that for July. * Have full support for n-dimensional series. When I have a n-d array of data values for each point in time (n>0), many things don't work. The biggest problem here seems to be that pickling actually *seems* to work (a file is created), but when I load the file again, the entries in the array are somehow screwed up (like transposed). * Enable rolling means for sparse data. For example, if I have irregular (in time) measurements, say, every one to six days, I would still like to be able to calculate a rolling n-day-average. Missing values should be ignored (speaking numpy: timeslice.compressed().mean()) I don't know if any of this is already implemented in pandas, as I've never used it up till now. 
But perhaps someone would be interested in implementing these issues ... Cheers, Andreas. From wesmckinn at gmail.com Wed Jul 27 12:57:44 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 12:57:44 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 12:12 PM, Matt Knox wrote: > Wes McKinney gmail.com> writes: >> >> I agree. I already have 50% or more of the features in >> scikits.timeseries, so this gets back to my fragmentation argument >> (users being stuck with a confusing choice between multiple >> libraries). Let's make it happen! > > Ok. In the interest of moving this forward, here is a quick list of things I > see missing in pandas that scikits.timeseries does. For brevity I will skip the > reasons that these features exist, but if the usefulness is not obvious please > ask me to clarify. > > Frequency conversion flexibility: > ? ?- when going from a higher frequency to lower frequency (eg. daily to > ? ? ?monthly), the timeseries module adds an extra dimension and groups the > ? ? ?points so you still have all the original data rather than discarding > ? ? ?data This is basically just a group by (reduceat) operation. I've been working a lot on groupby lately and resampling (frequency conversion has always existed, and lo-to-high is simple, but not easy downsampling/aggregation) will fall out as an afterthought. Should not require any C code either. > ? ?- allow you to specify where to place the value - the start or end of the > ? ? ?period - when converting from lower frequency to higher frequency (eg. > ? ? ?monthly to daily) I'll make sure to make this available as an option. down going low-to-high you have two interpolation options: forward fill (aka "pad") and back fill, which I think is what you're saying? > ? ?- support of a larger number of frequencies Which ones are you thinking of? Currently I have: - hourly, minutely, secondly (and things like 5-minutely can be done, e.g. Minute(5)) - daily / business daily - weekly (anchored on a particular weekday) - monthly / business month-end - (business) quarterly, anchored on jan/feb/march - annual / business annual (start and end) there is also a generic delta wrapping dateutil.relativedelta, so it's possible to go beyond these. the scikits.timeseries code is far more comprehensive and complete, completely agree, so if numpy.datetime64 isn't good enough it will hopefully be straightforward to augment. hopefully numpy.datetime64 will reduce the need for a lot of pandas.core.datetools-- although there are still merits (in my view) to having tools for working with Python datetime.datetime objects. > Indexing: > ? ?- slicing with dates (looks like "truncate" method does this, but would > ? ? ?be nice to be able to just use slicing directly) you can use fancy indexing to do this now, e.g: ts.ix[d1:d2] I could push this down into __getitem__ and __setitem__ too without much work > - simple arithmetic on dates ("date + 1" means "add one unit at the current > ?frequency") numpy.datetime64 will do this, which is very nice. the pandas date offsets work on Python datetimes. 
so I can do stuff like: In [35]: datetime.today() + 5 * datetools.bday Out[35]: datetime.datetime(2011, 8, 3, 0, 0) and if you have a whole DateRange (semi-equiv of DateArray) you can easily shift by the current frequency: In [38]: dr Out[38]: offset: <1 BusinessDay>, tzinfo: None [2000-01-03 00:00:00, ..., 2004-12-31 00:00:00] length: 1305 In [39]: dr.shift(10) Out[39]: offset: <1 BusinessDay>, tzinfo: None [2000-01-17 00:00:00, ..., 2005-01-14 00:00:00] length: 1305 > - various date/series attributes such as year, qyear, quarter, month, week, > ?day, day_of_year, etc... > ?(ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information) I agree this would be nice and very straightforward to add > - full missing value support (TimeSeries class is a subclass of MaskedArray) I challenge you to find a (realistic) use case where the missing value support in pandas in inadequate. I'm being completely serious =) But I've been very vocal about my dislike of MaskedArrays in the missing data discussions. They're hard for (normal) people to use, degrade performance, use extra memory, etc. They add a layer of complication for working with time series that strikes me as completely unnecessary. > - moving (rolling) median/min/max In [41]: pandas.rolling_ pandas.rolling_apply pandas.rolling_median pandas.rolling_corr pandas.rolling_min pandas.rolling_count pandas.rolling_quantile pandas.rolling_cov pandas.rolling_skew pandas.rolling_kurt pandas.rolling_std pandas.rolling_max pandas.rolling_sum pandas.rolling_mean pandas.rolling_var there's also bottleneck, although it doesn't provide the min_periods argument that I need (though I should look at the perf hit of using bottleneck.move_nan* functions and nulling out results not having enough observations after the fact...) > - Matt Knox > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > this is good feedback =) i think we're on the right track - Wes From wesmckinn at gmail.com Wed Jul 27 13:16:35 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 13:16:35 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: <4E303CB3.1020808@hilboll.de> References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: > While we're at it: > >> Frequency conversion flexibility: >> ? ? - when going from a higher frequency to lower frequency (eg. daily to >> ? ? ? monthly), the timeseries module adds an extra dimension and groups the >> ? ? ? points so you still have all the original data rather than discarding >> ? ? ? data > > I'm using scikits.timeseries for analysis of atmospheric measurements. > I've always wanted several things, and now that discussion is under way, > maybe it's a good time to point them out: > > * When plotting a series, have the flexibility to have the value marked > down at the center of the frequency. What I mean is, when I have monthly > data and make a plot of one year, have each value be printed at the > middle of the corresponding month, e.g. Jan 16, etc. Otherwise, It's not > obvious to the reader whether the value printed on July 1 is actually > that for June or that for July. 
Seems like this could be pretty easy to do, need only add an "tick_offset" option to the plotting function, I think. > * Have full support for n-dimensional series. When I have a n-d array of > data values for each point in time (n>0), many things don't work. The > biggest problem here seems to be that pickling actually *seems* to work > (a file is created), but when I load the file again, the entries in the > array are somehow screwed up (like transposed). support in pandas is very good for working with multiple univariate time series using DataFrame, not quite as good for panel data (3d), but I'm planing to build out an n-dimensional NDFrame which could potentially address your needs. If you can show me your data and tell me what you need to be able to do with it, it would be helpful to me. The majority of my work in pandas has been motivated by use cases I've experienced in applications. > * Enable rolling means for sparse data. For example, if I have irregular > (in time) measurements, say, every one to six days, I would still like > to be able to calculate a rolling n-day-average. Missing values should > be ignored (speaking numpy: timeslice.compressed().mean()) Either pandas or bottleneck will do this for you, so you can say something like: rolling_mean(ts, window=50, min_periods=5) and any sample with at least 5 data points in the window will compute a value, missing (NaN) data will be excluded. Bottleneck has move_mean and move_nanmean which will outperform pandas.rolling_mean a little bit since the Cython code is more specialized. > I don't know if any of this is already implemented in pandas, as I've > never used it up till now. But perhaps someone would be interested in > implementing these issues ... > > Cheers, > Andreas. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Wed Jul 27 13:27:15 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jul 2011 10:27:15 -0700 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney wrote: > On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: >> * Enable rolling means for sparse data. For example, if I have irregular >> (in time) measurements, say, every one to six days, I would still like >> to be able to calculate a rolling n-day-average. Missing values should >> be ignored (speaking numpy: timeslice.compressed().mean()) > > Either pandas or bottleneck will do this for you, so you can say something like: > > rolling_mean(ts, window=50, min_periods=5) > > and any sample with at least 5 data points in the window will compute > a value, missing (NaN) data will be excluded. Bottleneck has move_mean > and move_nanmean which will outperform pandas.rolling_mean a little > bit since the Cython code is more specialized. Another use case is when your data is irregularly spaced in time but you still want a moving min/mean/median/whatever over a fixed time window instead of a fixed number of data points. That might be Andreas's use case. 
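A minimal NumPy sketch of that kind of calculation -- a trailing mean over a fixed time window rather than a fixed number of samples, with NaN treated as missing -- assuming `times` is sorted and expressed in days (the function name and the example numbers below are only illustrative, not part of pandas, bottleneck or scikits.timeseries):

import numpy as np

def rolling_time_window_mean(times, values, window_days):
    # trailing mean over a fixed time window for irregularly spaced samples;
    # times must be sorted ascending, NaN in values marks missing data
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i, t in enumerate(times):
        # first index falling inside the window (t - window_days, t]
        start = np.searchsorted(times, t - window_days, side='right')
        window = values[start:i + 1]
        window = window[~np.isnan(window)]      # drop missing values
        out[i] = window.mean() if window.size else np.nan
    return out

# samples every one to six days, 10-day trailing mean
t = np.array([0., 2., 3., 9., 11., 16.])
x = np.array([1., 2., np.nan, 4., 5., 6.])
print(rolling_time_window_mean(t, x, 10.))

The Python loop is fine for modest series; for very long ones the same idea can be vectorized with a single np.searchsorted call plus cumulative sums of the values and the valid-sample counts.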
From wesmckinn at gmail.com Wed Jul 27 13:31:24 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 13:31:24 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: On Wed, Jul 27, 2011 at 1:27 PM, Keith Goodman wrote: > On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney wrote: >> On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: > >>> * Enable rolling means for sparse data. For example, if I have irregular >>> (in time) measurements, say, every one to six days, I would still like >>> to be able to calculate a rolling n-day-average. Missing values should >>> be ignored (speaking numpy: timeslice.compressed().mean()) >> >> Either pandas or bottleneck will do this for you, so you can say something like: >> >> rolling_mean(ts, window=50, min_periods=5) >> >> and any sample with at least 5 data points in the window will compute >> a value, missing (NaN) data will be excluded. Bottleneck has move_mean >> and move_nanmean which will outperform pandas.rolling_mean a little >> bit since the Cython code is more specialized. > > Another use case is when your data is irregularly spaced in time but > you still want a moving min/mean/median/whatever over a fixed time > window instead of a fixed number of data points. That might be > Andreas's use case. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > True. In pandas parlance I think what you would do is: rolling_mean(ts.valid(), window).reindex(ts.index, method='ffill') From mattknox.ca at gmail.com Wed Jul 27 13:54:13 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 17:54:13 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Wes McKinney gmail.com> writes: > > Frequency conversion flexibility:> > > ? ?- allow you to specify where to place the value - the start or end of the > > ? ? ?period - when converting from lower frequency to higher frequency (eg. > > ? ? ?monthly to daily) > > I'll make sure to make this available as an option. down going > low-to-high you have two interpolation options: forward fill (aka > "pad") and back fill, which I think is what you're saying? > I guess I had a bit of a misunderstanding when I wrote this comment because I was framing things in the context of how I think about the scikits.timeseries module. Monthly frequency dates (or TimeSeries) in the scikit don't have any day information at all. So when converting to daily you need to tell it where to place the value (eg. Jan 1, or Jan 31). Note that this is a SEPARATE decision from wanting to back fill or forward fill. However, since pandas uses regular datetime objects, the day of the month is already embedded in it. A potential drawback of this approach is that to support "start of period" stuff you need to add a separate frequency, effectively doubling the number of frequencies. 
And if you account for "business day end of month" and "regular day end of month", then you have to quadruple the number of frequencies. You'd have "EOM", "SOM", "BEOM", "BSOM". Similarly for all the quarterly frequencies, annual frequencies, and so on. Whether this is a major problem in practice or not, I don't know. > > ? ?- support of a larger number of frequencies > > Which ones are you thinking of? Currently I have: > > - hourly, minutely, secondly (and things like 5-minutely can be done, > e.g. Minute(5)) > - daily / business daily > - weekly (anchored on a particular weekday) > - monthly / business month-end > - (business) quarterly, anchored on jan/feb/march > - annual / business annual (start and end) I think it is missing quarterly frequencies anchored at the other 9 months of the year. If, for example, you work at a weird Canadian Bank like me, then your fiscal year end is October. Other than that, it has all the frequencies I care about. Semi-annual would be a nice touch, but not that important to me and timeseries module doesn't have it either. People have also asked for higher frequencies in the timeseries module before (eg. millisecond), but that is not something I personally care about. > > Indexing: > > ? ?- slicing with dates (looks like "truncate" method does this, but would > > ? ? ?be nice to be able to just use slicing directly) > > you can use fancy indexing to do this now, e.g: > > ts.ix[d1:d2] > > I could push this down into __getitem__ and __setitem__ too without much work I see. I'd be +1 on pushing it down into __getitem__ and __setitem__ > > - full missing value support (TimeSeries class is a subclass of MaskedArray) > > I challenge you to find a (realistic) use case where the missing value > support in pandas in inadequate. I'm being completely serious =) But > I've been very vocal about my dislike of MaskedArrays in the missing > data discussions. They're hard for (normal) people to use, degrade > performance, use extra memory, etc. They add a layer of complication > for working with time series that strikes me as completely > unnecessary. >From my understanding, pandas just uses nans for missing values. So that means strings, int's, or anything besides floats are not supported. So that is my major issue with it. I agree that masked arrays are overly complicated and it is not ideal. Hopefully the improved missing value support in numpy will provide the best of both worlds. - Matt From wesmckinn at gmail.com Wed Jul 27 14:09:07 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 14:09:07 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 1:54 PM, Matt Knox wrote: > > Wes McKinney gmail.com> writes: > >> > Frequency conversion flexibility:> >> > ? ?- allow you to specify where to place the value - the start or end of the >> > ? ? ?period - when converting from lower frequency to higher frequency (eg. >> > ? ? ?monthly to daily) >> >> I'll make sure to make this available as an option. down going >> low-to-high you have two interpolation options: forward fill (aka >> "pad") and back fill, which I think is what you're saying? 
>> > > I guess I had a bit of a misunderstanding when I wrote this comment because I > was framing things in the context of how I think about the scikits.timeseries > module. Monthly frequency dates (or TimeSeries) in the scikit don't have any > day information at all. So when converting to daily you need to tell it > where to place the value (eg. Jan 1, or Jan 31). Note that this is a SEPARATE > decision from wanting to back fill or forward fill. > > However, since pandas uses regular datetime objects, the day of the month is > already embedded in it. A potential drawback of this approach is that to > support "start of period" stuff you need to add a separate frequency, > effectively doubling the number of frequencies. And if you account for > "business day end of month" and "regular day end of month", then you have to > quadruple the number of frequencies. You'd have "EOM", "SOM", "BEOM", "BSOM". > Similarly for all the quarterly frequencies, annual frequencies, and so on. > Whether this is a major problem in practice or not, I don't know. I see what you mean. I'm going to wait until the dust on the NumPy stuff settles and then figure out what to do. Using datetime objects is good and bad-- it makes life a lot easier in many ways but some things are less clean as a result. Should start documenting all the use cases on a wiki somewhere. >> > ? ?- support of a larger number of frequencies >> >> Which ones are you thinking of? Currently I have: >> >> - hourly, minutely, secondly (and things like 5-minutely can be done, >> e.g. Minute(5)) >> - daily / business daily >> - weekly (anchored on a particular weekday) >> - monthly / business month-end >> - (business) quarterly, anchored on jan/feb/march >> - annual / business annual (start and end) > > I think it is missing quarterly frequencies anchored at the other 9 months of > the year. If, for example, you work at a weird Canadian Bank like me, then your > fiscal year end is October. For quarterly you need only anchor on Jan/Feb/March right? In [76]: list(DateRange('1/1/2000', '1/1/2002', offset=datetools.BQuarterEnd(startingMonth=1))) Out[76]: [datetime.datetime(2000, 1, 31, 0, 0), datetime.datetime(2000, 4, 28, 0, 0), datetime.datetime(2000, 7, 31, 0, 0), datetime.datetime(2000, 10, 31, 0, 0), datetime.datetime(2001, 1, 31, 0, 0), datetime.datetime(2001, 4, 30, 0, 0), datetime.datetime(2001, 7, 31, 0, 0), datetime.datetime(2001, 10, 31, 0, 0)] (I know, I'm trying to get rid of the camel casing floating around...) > Other than that, it has all the frequencies I care about. Semi-annual would be > a nice touch, but not that important to me and timeseries module doesn't have > it either. People have also asked for higher frequencies in the timeseries > module before (eg. millisecond), but that is not something I personally care > about. numpy.datetime64 will help here. I've a mind to start playing with TAQ (US equity tick data) in the near future in which case my requirements will change. >> > Indexing: >> > ? ?- slicing with dates (looks like "truncate" method does this, but would >> > ? ? ?be nice to be able to just use slicing directly) >> >> you can use fancy indexing to do this now, e.g: >> >> ts.ix[d1:d2] >> >> I could push this down into __getitem__ and __setitem__ too without much work > > I see. I'd be +1 on pushing it down into __getitem__ and __setitem__ I agree, little harm done. The main annoying detail here is working with integer labels. __getitem__ needs to be integer-based when you have integers, while using .ix[...] 
will do label-based always. >> > - full missing value support (TimeSeries class is a subclass of MaskedArray) >> >> I challenge you to find a (realistic) use case where the missing value >> support in pandas in inadequate. I'm being completely serious =) But >> I've been very vocal about my dislike of MaskedArrays in the missing >> data discussions. They're hard for (normal) people to use, degrade >> performance, use extra memory, etc. They add a layer of complication >> for working with time series that strikes me as completely >> unnecessary. > > From my understanding, pandas just uses nans for missing values. So that means > strings, int's, or anything besides floats are not supported. So that > is my major issue with it. I agree that masked arrays are overly complicated > and it is not ideal. Hopefully the improved missing value support in numpy will > provide the best of both worlds. It's admittedly a kludge but I use NaN as the universal missing-data marker for lack of a better alternative (basically I'm trying to emulate R as much as I can). so you can literally have: In [93]: df2 Out[93]: A B C D E 0 foo one -0.7883 0.7743 False 1 NaN one -0.5866 0.06009 False 2 foo two 0.9312 1.2 True 3 NaN three -0.6417 0.3444 False 4 foo two -0.8841 -0.08126 False 5 bar two 1.194 -0.7933 True 6 foo one -1.624 -0.1403 NaN 7 foo three 0.5046 0.5833 True To cope with this there are functions isnull and notnull which work on every dtype and can recognize NaNs in non-floating point arrays: In [96]: df2[notnull(df2['A'])] Out[96]: A B C D E 0 foo one -0.7883 0.7743 False 2 foo two 0.9312 1.2 True 4 foo two -0.8841 -0.08126 False 5 bar two 1.194 -0.7933 True 6 foo one -1.624 -0.1403 NaN 7 foo three 0.5046 0.5833 True In [98]: df2['E'].fillna('missing') Out[98]: 0 foo 1 missing 2 foo 3 missing 4 foo 5 bar 6 foo 7 foo trying to index with a "boolean" array with NAs gives a slightly helpful error message: In [101]: df2[df2['E']] ValueError: cannot index with vector containing NA / NaN values but In [102]: df2[df2['E'].fillna(False)] Out[102]: A B C D E 2 foo two 0.9312 1.2 True 5 bar two 1.194 -0.7933 True 7 foo three 0.5046 0.5833 True Really crossing my fingers for favorable NA support in NumPy. > - Matt > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pgmdevlist at gmail.com Wed Jul 27 15:16:14 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 Jul 2011 21:16:14 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Jul 27, 2011, at 8:09 PM, Wes McKinney wrote: > On Wed, Jul 27, 2011 at 1:54 PM, Matt Knox wrote: >> >> Wes McKinney gmail.com> writes: >> >>>> Frequency conversion flexibility:> >>>> - allow you to specify where to place the value - the start or end of the >>>> period - when converting from lower frequency to higher frequency (eg. >>>> monthly to daily) >>> >>> I'll make sure to make this available as an option. down going >>> low-to-high you have two interpolation options: forward fill (aka >>> "pad") and back fill, which I think is what you're saying? 
>>> >> >> I guess I had a bit of a misunderstanding when I wrote this comment because I >> was framing things in the context of how I think about the scikits.timeseries >> module. Monthly frequency dates (or TimeSeries) in the scikit don't have any >> day information at all. So when converting to daily you need to tell it >> where to place the value (eg. Jan 1, or Jan 31). Note that this is a SEPARATE >> decision from wanting to back fill or forward fill. >> >> However, since pandas uses regular datetime objects, the day of the month is >> already embedded in it. A potential drawback of this approach is that to >> support "start of period" stuff you need to add a separate frequency, >> effectively doubling the number of frequencies. And if you account for >> "business day end of month" and "regular day end of month", then you have to >> quadruple the number of frequencies. You'd have "EOM", "SOM", "BEOM", "BSOM". >> Similarly for all the quarterly frequencies, annual frequencies, and so on. >> Whether this is a major problem in practice or not, I don't know. > > I see what you mean. I'm going to wait until the dust on the NumPy > stuff settles and then figure out what to do. Using datetime objects > is good and bad-- it makes life a lot easier in many ways but some > things are less clean as a result. Should start documenting all the > use cases on a wiki somewhere. That's why we used integers to represent dates. We have rules to convert from integers to date times and back. >> >> I think it is missing quarterly frequencies anchored at the other 9 months of >> the year. If, for example, you work at a weird Canadian Bank like me, then your >> fiscal year end is October. > > For quarterly you need only anchor on Jan/Feb/March right? No. You need to be able to define your own quarters. For example, it's fairly common in climatology to define a winter as DJF, so your year actually start on March 1st > >>>> Indexing: >>>> - slicing with dates (looks like "truncate" method does this, but would >>>> be nice to be able to just use slicing directly) >>> >>> you can use fancy indexing to do this now, e.g: >>> >>> ts.ix[d1:d2] >>> >>> I could push this down into __getitem__ and __setitem__ too without much work >> >> I see. I'd be +1 on pushing it down into __getitem__ and __setitem__ > > I agree, little harm done. The main annoying detail here is working > with integer labels. __getitem__ needs to be integer-based when you > have integers, while using .ix[...] will do label-based always. Overloading __g/setitem__ isn't always ideal in Python. That was one aspect I tried to push to C but it still needs a lot of work. > >>>> - full missing value support (TimeSeries class is a subclass of MaskedArray) >>> >>> I challenge you to find a (realistic) use case where the missing value >>> support in pandas in inadequate. I'm being completely serious =) But >>> I've been very vocal about my dislike of MaskedArrays in the missing >>> data discussions. They're hard for (normal) people to use, degrade >>> performance, use extra memory, etc. They add a layer of complication >>> for working with time series that strikes me as completely >>> unnecessary. Let's wait a bit and see how missing/ignored values are getting supported, shall we ? 
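As a small illustration of the DJF-style seasonal grouping mentioned above, independent of either package, here is a plain-NumPy sketch that buckets a monthly series into meteorological seasons, counting December towards the following year's winter (the data here are made up for the example):

import numpy as np

# one value per calendar month, Jan 2000 .. Dec 2001 (made-up data)
years  = 2000 + np.arange(24) // 12
months = np.arange(24) % 12 + 1                      # 1..12
values = np.random.rand(24)

# Dec/Jan/Feb -> DJF, Mar/Apr/May -> MAM, Jun/Jul/Aug -> JJA, Sep/Oct/Nov -> SON
season = np.array(['DJF', 'MAM', 'JJA', 'SON'])[(months % 12) // 3]
# December is counted towards the *following* year's winter
season_year = np.where(months == 12, years + 1, years)

for yr in np.unique(season_year):
    mask = (season == 'DJF') & (season_year == yr)
    if mask.any():
        print('DJF %d mean: %.3f' % (yr, values[mask].mean()))

A library with first-class custom quarters (Q-OCT, DJF years, etc.) obviously makes this nicer, but the mapping itself is only a couple of lines.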
From mattknox.ca at gmail.com Wed Jul 27 15:42:51 2011 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 27 Jul 2011 19:42:51 +0000 (UTC) Subject: [SciPy-User] Status of TimeSeries SciKit References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Wes McKinney gmail.com> writes: > > I think it is missing quarterly frequencies anchored at the other 9 months of > > the year. If, for example, you work at a weird Canadian Bank like me, then your > > fiscal year end is October. > > For quarterly you need only anchor on Jan/Feb/March right? > > In [76]: list(DateRange('1/1/2000', '1/1/2002', > offset=datetools.BQuarterEnd(startingMonth=1))) > Out[76]: > [datetime.datetime(2000, 1, 31, 0, 0), > datetime.datetime(2000, 4, 28, 0, 0), > datetime.datetime(2000, 7, 31, 0, 0), > datetime.datetime(2000, 10, 31, 0, 0), > datetime.datetime(2001, 1, 31, 0, 0), > datetime.datetime(2001, 4, 30, 0, 0), > datetime.datetime(2001, 7, 31, 0, 0), > datetime.datetime(2001, 10, 31, 0, 0)] I guess this again gets back to the fact that it is datetime objects being used and the series itself doesn't really have any "frequency" information contained in it in pandas. So in pandas, a March based quarterly frequency really is identical to a June based quarterly frequency. My use case for this type of stuff would be "calendarizing" things like earnings. For example, lets say I had the following data: Company A - fiscal year end October 2009q1 15.7 2009q2 16.1 2009q3 16.6 etc... Company B - fiscal year end April 2009q1 12.9 2009q2 11.2 2009q3 13.5 etc... in the first case, 2009q1 is Nov 2008 - Jan 2009. In the second case it is May 2008 - July 2008. This can be handled without too much extra work in pandas, by preconverting your quarters to actual dates. I think it is a bit less clean than in the timeseries module where I would just specify Q-OCT for the frequency and then everything is done for me. But it is not something I would lose sleep over. And the workaround is not that onerous. From thambsup at gmail.com Wed Jul 27 16:17:24 2011 From: thambsup at gmail.com (timofey chernishev) Date: Thu, 28 Jul 2011 00:17:24 +0400 Subject: [SciPy-User] Some problem with defining parameters in f2py Message-ID: Hi. I use gfortran 4.4.3, f2py Version: 2, numpy Version: 1.3.0. when i try: subroutine poiss1d(phi0,pts,n_q,phi1) real,parameter::pi=3.14159265358979323846264338327950288,h=0.01,e=4.8E-10 real,parameter::Cq=2.0*pi*h**2 *e,Ca=0.5,Cd=0.5 ... end subroutine poiss1d and have an error: "Error: Parameter 'pi' at (1) has not been declared or is a variable, which does not reduce to a constant expression" but, when e is moved to the next string: subroutine poiss1d(phi0,pts,n_q,phi1) real,parameter::pi=3.14159265358979323846264338327950288,h=0.01 real,parameter::e=4.8E-10 real,parameter::Cq=2.0*pi*h**2 *e,Ca=0.5,Cd=0.5 ... end subroutine poiss1d all works -- why so? From ben.root at ou.edu Wed Jul 27 16:22:54 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 27 Jul 2011 15:22:54 -0500 Subject: [SciPy-User] unravel_index for pdist? Message-ID: Hello, I typically use pdist() from scipy.spatial.distance to keep some of my scripts lean wrt memory usage. However, determining which pairs of points a distance value refers to is a bit tricky. I was wondering if a "punravel_index()" function might be welcomed? 
Here is my algorithm I came up with to perform this action for a single result. I haven't tested it for anything more general than that. *pntCnt* is the number of points that pdist() was used for. *index* is an index from a function like "np.argmax(pdist(p))". rowcnts = np.cumsum(xrange(pntCnt - 1, 0, -1)) row = np.searchsorted(rowcnts, index) col = pntCnt - (rowcnts[row] - index) At the very least, I hope that this is helpful to anybody else using pdist. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Wed Jul 27 16:47:24 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 16:47:24 -0400 Subject: [SciPy-User] New article about group by functionality Message-ID: This may be of interest to many: http://wesmckinney.com/blog/?p=125 Would be interest to get thoughts on the matter as it relates to NumPy and other related functionality that's needed. - Wes From wesmckinn at gmail.com Wed Jul 27 17:30:13 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 17:30:13 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 3:42 PM, Matt Knox wrote: > > Wes McKinney gmail.com> writes: > >> > I think it is missing quarterly frequencies anchored at the other 9 months > of >> > the year. If, for example, you work at a weird Canadian Bank like me, then > your >> > fiscal year end is October. >> >> For quarterly you need only anchor on Jan/Feb/March right? >> >> In [76]: list(DateRange('1/1/2000', '1/1/2002', >> offset=datetools.BQuarterEnd(startingMonth=1))) >> Out[76]: >> [datetime.datetime(2000, 1, 31, 0, 0), >> ?datetime.datetime(2000, 4, 28, 0, 0), >> ?datetime.datetime(2000, 7, 31, 0, 0), >> ?datetime.datetime(2000, 10, 31, 0, 0), >> ?datetime.datetime(2001, 1, 31, 0, 0), >> ?datetime.datetime(2001, 4, 30, 0, 0), >> ?datetime.datetime(2001, 7, 31, 0, 0), >> ?datetime.datetime(2001, 10, 31, 0, 0)] > > I guess this again gets back to the fact that it is datetime objects being used > and the series itself doesn't really have any "frequency" information > contained in it in pandas. So in pandas, a March based quarterly frequency > really is identical to a June based quarterly frequency. > > My use case for this type of stuff would be "calendarizing" things like > earnings. > > For example, lets say I had the following data: > > Company A - fiscal year end October > 2009q1 15.7 > 2009q2 16.1 > 2009q3 16.6 > etc... > > Company B - fiscal year end April > 2009q1 12.9 > 2009q2 11.2 > 2009q3 13.5 > etc... > > in the first case, 2009q1 is Nov 2008 - Jan 2009. In the second case it is > May 2008 - July 2008. This can be handled without too much extra work in > pandas, by preconverting your quarters to actual dates. I think it is a bit > less clean than in the timeseries module where I would just specify Q-OCT for > the frequency and then everything is done for me. But it is not something I > would lose sleep over. And the workaround is not that onerous. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Thanks, I've got it now. 
A little slow on the uptake =) From timmichelsen at gmx-topmail.de Wed Jul 27 18:31:58 2011 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Thu, 28 Jul 2011 00:31:58 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: Hello all, similar to Dharhas, I was a strong user of the time series scikit from the very beginning. Since most of my code for meteorological data evaluations is based on it, I would be happy to receive infomation on the conclusion and how I need to adjust my code to upkeep with new developments. >>>>> - full missing value support (TimeSeries class is a subclass of MaskedArray) >>>> >>>> I challenge you to find a (realistic) use case where the missing value >>>> support in pandas in inadequate. I'm being completely serious =) But >>>> I've been very vocal about my dislike of MaskedArrays in the missing >>>> data discussions. They're hard for (normal) people to use, degrade >>>> performance, use extra memory, etc. They add a layer of complication >>>> for working with time series that strikes me as completely >>>> unnecessary. > > > Let's wait a bit and see how missing/ignored values are getting supported, shall we ? How does Pandas deal with missing values? This pages: http://pandas.sourceforge.net/missing_data.html?highlight=missing Is empty The convenient support for missing data (once date converterters were out) in timeseries helps a lot to quickly deal with measurement logs or incomplete data. Best regards, Timmie From wesmckinn at gmail.com Wed Jul 27 18:41:28 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 18:41:28 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 6:31 PM, Tim Michelsen wrote: > Hello all, > similar to Dharhas, I was a strong user of the time series scikit from > the very beginning. > Since most of my code for meteorological data evaluations is based on > it, I would be happy to receive infomation on the conclusion and how I > need to adjust my code to upkeep with new developments. When it gets to that point I'd be happy to help (including looking at some of your existing code and data). >>>>>> - full missing value support (TimeSeries class is a subclass of MaskedArray) >>>>> >>>>> I challenge you to find a (realistic) use case where the missing value >>>>> support in pandas in inadequate. I'm being completely serious =) But >>>>> I've been very vocal about my dislike of MaskedArrays in the missing >>>>> data discussions. They're hard for (normal) people to use, degrade >>>>> performance, use extra memory, etc. They add a layer of complication >>>>> for working with time series that strikes me as completely >>>>> unnecessary. >> >> >> Let's wait a bit and see how missing/ignored values are getting supported, shall we ? > How does Pandas deal with missing values? 
discussed a bit in my reply here: http://article.gmane.org/gmane.comp.python.scientific.user/29661 In short using NaN across the dtypes with special functions isnull/notnull to detect NaN in dtype=object arrays. I'm hopeful this can be replaced with native NumPy NA support in the relatively near future... > This pages: > http://pandas.sourceforge.net/missing_data.html?highlight=missing > Is empty > > The convenient support for missing data (once date converterters were > out) in timeseries helps a lot to quickly deal with measurement logs or > incomplete data. > > Best regards, > Timmie > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From questions.anon at gmail.com Thu Jul 28 00:44:22 2011 From: questions.anon at gmail.com (questions anon) Date: Thu, 28 Jul 2011 14:44:22 +1000 Subject: [SciPy-User] numpy mean Message-ID: Hi All, I thought this would be a relatively easy thing to do but the more I look the more confused I become! I have a netcdf file containing hourly temperature data for given region for a month. I would like to find the mean/average of particular periods (i.e. 3 hours). e.g. this is what I would like: array1=[2,4,8] array2=[4,8,12] array3=[9,3,15] meanofarrays=np.mean(array1,array2,array3) print meanofarrays >>>[5,5,11] Is there a routine that will do what I am after? If not I seem to be able to sum the arrays together and then divide by another array, but I will need to produce an array to match the extent and all values will need to be equal to the number of arrays I have summed. Can anyone help with producing this array? i.e. sumofarray=[15,15,35] numberofarrays=[3,3,3] meanofarrays=np.divide[sumofarrays,numberofarrays] print meanofarrays >>>[5,5,11] Any feedback will be greatly appreciated!!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Thu Jul 28 01:06:05 2011 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Thu, 28 Jul 2011 01:06:05 -0400 Subject: [SciPy-User] numpy mean In-Reply-To: References: Message-ID: Hi, you can do the following >>> arr = np.array([array1,array2,array3]) >>> arr array([[ 2, 4, 8], [ 4, 8, 12], [ 9, 3, 15]]) >>> np.mean(arr, axis = 1) array([ 4.66666667, 8. , 9. ]) >>> np.mean(arr, axis = 0) array([ 5. , 5. , 11.66666667]) cheers -- Oleksandr 2011/7/28 questions anon > Hi All, > I thought this would be a relatively easy thing to do but the more I look > the more confused I become! > > I have a netcdf file containing hourly temperature data for given region > for a month. > I would like to find the mean/average of particular periods (i.e. 3 hours). > > e.g. this is what I would like: > > array1=[2,4,8] > array2=[4,8,12] > array3=[9,3,15] > > meanofarrays=np.mean(array1,array2,array3) > print meanofarrays > >>>[5,5,11] > > Is there a routine that will do what I am after? > If not I seem to be able to sum the arrays together and then divide by > another array, but I will need to produce an array to match the extent and > all values will need to be equal to the number of arrays I have summed. Can > anyone help with producing this array? > > i.e. > sumofarray=[15,15,35] > numberofarrays=[3,3,3] > meanofarrays=np.divide[sumofarrays,numberofarrays] > print meanofarrays > >>>[5,5,11] > > Any feedback will be greatly appreciated!!! 
> > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From questions.anon at gmail.com Thu Jul 28 02:19:22 2011 From: questions.anon at gmail.com (questions anon) Date: Thu, 28 Jul 2011 16:19:22 +1000 Subject: [SciPy-User] numpy mean In-Reply-To: References: Message-ID: perfect (using axis=0), thank you! On Thu, Jul 28, 2011 at 3:06 PM, Oleksandr Huziy wrote: > Hi, > > you can do the following > >>> arr = np.array([array1,array2,array3]) > >>> arr > array([[ 2, 4, 8], > [ 4, 8, 12], > [ 9, 3, 15]]) > >>> np.mean(arr, axis = 1) > array([ 4.66666667, 8. , 9. ]) > >>> np.mean(arr, axis = 0) > array([ 5. , 5. , 11.66666667]) > > cheers > -- > Oleksandr > > 2011/7/28 questions anon > >> Hi All, >> I thought this would be a relatively easy thing to do but the more I look >> the more confused I become! >> >> I have a netcdf file containing hourly temperature data for given region >> for a month. >> I would like to find the mean/average of particular periods (i.e. 3 >> hours). >> >> e.g. this is what I would like: >> >> array1=[2,4,8] >> array2=[4,8,12] >> array3=[9,3,15] >> >> meanofarrays=np.mean(array1,array2,array3) >> print meanofarrays >> >>>[5,5,11] >> >> Is there a routine that will do what I am after? >> If not I seem to be able to sum the arrays together and then divide by >> another array, but I will need to produce an array to match the extent and >> all values will need to be equal to the number of arrays I have summed. Can >> anyone help with producing this array? >> >> i.e. >> sumofarray=[15,15,35] >> numberofarrays=[3,3,3] >> meanofarrays=np.divide[sumofarrays,numberofarrays] >> print meanofarrays >> >>>[5,5,11] >> >> Any feedback will be greatly appreciated!!! >> >> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jul 28 05:13:51 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 28 Jul 2011 09:13:51 +0000 (UTC) Subject: [SciPy-User] Some problem with defining parameters in f2py References: Message-ID: Thu, 28 Jul 2011 00:17:24 +0400, timofey chernishev wrote: > Hi. I use gfortran 4.4.3, f2py Version: 2, numpy Version: 1.3.0. > > when i try: > subroutine poiss1d(phi0,pts,n_q,phi1) > real,parameter::pi=3.14159265358979323846264338327950288,h=0.01,e=4.8E-10 > real,parameter::Cq=2.0*pi*h**2 *e,Ca=0.5,Cd=0.5 > ... > end subroutine poiss1d > > and have an error: "Error: Parameter 'pi' at (1) has not been declared > or is a variable, which does not reduce to a constant expression" Fortran 77 ignores characters beyond the 72th column (they do not fit on punch cards). Welcome back to the 60s. 
From andreas at hilboll.de Wed Jul 27 13:38:16 2011 From: andreas at hilboll.de (Andreas) Date: Wed, 27 Jul 2011 19:38:16 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> <4E303CB3.1020808@hilboll.de> Message-ID: <4E304D08.6040807@hilboll.de> On 2011-07-27 19:27, Keith Goodman wrote: > On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney wrote: >> On Wed, Jul 27, 2011 at 12:28 PM, Andreas wrote: > >>> * Enable rolling means for sparse data. For example, if I have irregular >>> (in time) measurements, say, every one to six days, I would still like >>> to be able to calculate a rolling n-day-average. Missing values should >>> be ignored (speaking numpy: timeslice.compressed().mean()) >> >> Either pandas or bottleneck will do this for you, so you can say something like: >> >> rolling_mean(ts, window=50, min_periods=5) >> >> and any sample with at least 5 data points in the window will compute >> a value, missing (NaN) data will be excluded. Bottleneck has move_mean >> and move_nanmean which will outperform pandas.rolling_mean a little >> bit since the Cython code is more specialized. > > Another use case is when your data is irregularly spaced in time but > you still want a moving min/mean/median/whatever over a fixed time > window instead of a fixed number of data points. That might be > Andreas's use case. Yes, this is exactly what I'm looking for. From gustavo.goretkin at gmail.com Wed Jul 27 17:45:20 2011 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Wed, 27 Jul 2011 17:45:20 -0400 Subject: [SciPy-User] maximally sparse subset of points Message-ID: I have a dataset of N points (in 4 dimensions) and I'd like to select a smaller subset, size M, of those points that are maximally spread out. An approximation is fine. Other than the K-d tree, is there anything in SciPy or other Python module to help accomplish this? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Jul 28 09:10:35 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 28 Jul 2011 15:10:35 +0200 Subject: [SciPy-User] maximally sparse subset of points In-Reply-To: References: Message-ID: <20110728131035.GN28396@phare.normalesup.org> On Wed, Jul 27, 2011 at 05:45:20PM -0400, Gustavo Goretkin wrote: > I have a dataset of N points (in 4 dimensions) and I'd like to select a > smaller subset, size M, of those points that are maximally spread out. The problem that you are trying to solve is close to the k-medoids problem. I don't know of Python modules implementing a k-medoids. Alternatively, the k_init function used to initialize the k-means in the scikits.learn [1] might be a useful approximation. It's a pretty brutal approximation, and it might not work for you, but it should be fast. Ga?l [1] https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/cluster/k_means_.py#L32 From tritemio at gmail.com Thu Jul 28 15:04:06 2011 From: tritemio at gmail.com (Antonio Ingargiola) Date: Thu, 28 Jul 2011 12:04:06 -0700 Subject: [SciPy-User] binom pdf error In-Reply-To: References: Message-ID: Hi, I created a ticket for this problem: http://projects.scipy.org/scipy/ticket/1487 Hope that somebody will fix it. 
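Another cheap option in the same spirit is greedy farthest-point sampling: start from one point and repeatedly add the point whose distance to the already-selected subset is largest. It is only a heuristic, but it is O(N*M), easy to write with scipy.spatial.distance.cdist, and tends to spread the subset out well. A rough sketch, assuming X is the (N, 4) data array and M the subset size (the function name is just for the example):

import numpy as np
from scipy.spatial.distance import cdist

def farthest_point_subset(X, M, first=0):
    # greedily pick M rows of X that are approximately maximally spread out
    X = np.asarray(X, dtype=float)
    chosen = [first]
    # distance from every point to its nearest already-chosen point
    mindist = cdist(X, X[[first]]).ravel()
    for _ in range(M - 1):
        nxt = int(mindist.argmax())          # farthest from the current subset
        chosen.append(nxt)
        mindist = np.minimum(mindist, cdist(X, X[[nxt]]).ravel())
    return np.array(chosen)

X = np.random.rand(1000, 4)
subset = X[farthest_point_subset(X, 20)]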
Regards, Antonio 2011/7/22 Warren Weckesser > > > On Fri, Jul 22, 2011 at 6:30 PM, Antonio Ingargiola wrote: > >> 2011/7/22 Warren Weckesser >> >>> >>> >>> On Fri, Jul 22, 2011 at 3:31 PM, Antonio Ingargiola wrote: >>> >>>> Hi to the list, >>>> >>>> I got an error using the pdf method on the binom distribution. Same >>>> error happens on scipy 0.8 and scipy 0.9 (respectively ubuntu distribution >>>> and pythonxy on windows). >>>> >>>> The error is the following: >>>> >>>> In [1]: from scipy.stats.distributions import binom >>>> >>>> In [2]: b = binom(20,0.8) >>>> >>>> In [3]: b.rvs() >>>> Out[3]: 17 >>>> >>>> In [4]: b.pdf(2) >>>> >>>> --------------------------------------------------------------------------- >>>> AttributeError Traceback (most recent call >>>> last) >>>> >>>> C:\Users\anto\.xy\startups\ in () >>>> >>>> C:\Python26\lib\site-packages\scipy\stats\distributions.pyc in pdf(self, >>>> x) >>>> 333 >>>> 334 def pdf(self, x): #raises AttributeError in frozen >>>> discrete distribution >>>> --> 335 return self.dist.pdf(x, *self.args, **self.kwds) >>>> 336 >>>> 337 def cdf(self, x): >>>> >>>> AttributeError: 'binom_gen' object has no attribute 'pdf' >>>> >>>> In [5]: >>>> >>>> >>>> Is this known problem? How can I get the binomial pdf, is there a >>>> workaround? >>>> >>> >>> >>> Since binom is a discrete distribution, you want the pmf method: >>> >>> In [32]: b = binom(20, 0.8) >>> >>> In [33]: b.pmf(2) >>> Out[33]: 3.1876710400000011e-11 >>> >>> In [34]: b.pmf(18) >>> Out[34]: 0.13690942867206304 >>> >>> >>> The behavior that you observed still looks like a bug to me--why does >>> binom even have a pdf method, if calling it just raises a cryptic exception? >>> >> >> Warren, Thanks for the clarification. And BTW yes I think that this >> behaviour is quite boguous. Should I fill a bug report? >> > > > Good idea--please do. > > Warren > > > >> >> Antonio >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Fri Jul 29 09:48:05 2011 From: denis-bz-gg at t-online.de (denis) Date: Fri, 29 Jul 2011 06:48:05 -0700 (PDT) Subject: [SciPy-User] maximally sparse subset of points In-Reply-To: References: Message-ID: <3ff31ed0-00bc-4925-92a8-3dcb978cbe43@l37g2000yqd.googlegroups.com> Gustavo, I'd go with KDTree, fast and easy in 4d; if you build a tree with big leaves, leafsize ~ N/M, then take the midpoints of each leaf, that should do ? To walk the leaves, add the below to .../scipy/spatial/kdtree.py (or .pyx, but building the tree is not much slower in pure python). cheers -- denis def forleaves( self, func, *args, **kwargs ): """ call func( data ) for each leaf, e.g. leafmid = [] def leaffunc( data, leafmid=leafmid ): leafmid.append( data.mean(axis=0 )) """ q = [self.tree] while q: node = heappop(q) if isinstance( node, KDTree.leafnode ): data = self.data[node.idx] func( data, *args, **kwargs ) # test-leaves.py else: heappush( q, node.less ) heappush( q, node.greater ) On Jul 27, 11:45?pm, Gustavo Goretkin wrote: > I have a dataset of N points (in 4 dimensions) and I'd like to select a > smaller subset, size M, of those points that are maximally spread out. An > approximation is fine. 
Other than the K-d tree, is there anything in SciPy From garyrob at me.com Fri Jul 29 10:04:43 2011 From: garyrob at me.com (Gary Robinson) Date: Fri, 29 Jul 2011 14:04:43 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?What_happened_with_chisquare=5Fcontingency?= =?utf-8?q?=3F?= References: Message-ID: > On Tue, Mar 1, 2011 at 8:00 PM, Yang Zhang gmail.com> wrote: > > I put another version of the code in this ticket:??? > http://projects.scipy.org/scipy/ticket/1203During SciPy 2010 and later, > Anthony Scopatz and I worked on a contingency table class, which is in > scipy/stats/contingency_table.py here:??? > https://github.com/scopatz/scipyThen other work (scipy bugs, > scipy.signal, and the work that pays the bills) pushed this down in my > "to do" list, and it never made it back up to the top.? But perhaps it > is time to bump this up again--thanks for the reminder!Warren I haven't noticed anything more about this since March... this is just a friendly note speaking up as one more person interested in the progress of this class! :) From ralf.gommers at googlemail.com Fri Jul 29 11:08:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 29 Jul 2011 17:08:29 +0200 Subject: [SciPy-User] What happened with chisquare_contingency? In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 4:04 PM, Gary Robinson wrote: > > On Tue, Mar 1, 2011 at 8:00 PM, Yang Zhang gmail.com> > wrote: > > > > I put another version of the code in this ticket: > > http://projects.scipy.org/scipy/ticket/1203During SciPy 2010 and later, > > Anthony Scopatz and I worked on a contingency table class, which is in > > scipy/stats/contingency_table.py here: > > https://github.com/scopatz/scipyThen other work (scipy bugs, > > scipy.signal, and the work that pays the bills) pushed this down in my > > "to do" list, and it never made it back up to the top. But perhaps it > > is time to bump this up again--thanks for the reminder!Warren > > I haven't noticed anything more about this since March... this is just a > friendly note speaking up as one more person interested in the progress of > this > class! :) > > It's almost ready to be merged: https://github.com/scipy/scipy/pull/34 -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Jul 29 11:08:59 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 29 Jul 2011 10:08:59 -0500 Subject: [SciPy-User] What happened with chisquare_contingency? In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 9:04 AM, Gary Robinson wrote: >> On Tue, Mar 1, 2011 at 8:00 PM, Yang Zhang gmail.com> wrote: >> >> I put another version of the code in this ticket: >> http://projects.scipy.org/scipy/ticket/1203During SciPy 2010 and later, >> Anthony Scopatz and I worked on a contingency table class, which is in >> scipy/stats/contingency_table.py here: >> https://github.com/scopatz/scipyThen other work (scipy bugs, >> scipy.signal, and the work that pays the bills) pushed this down in my >> "to do" list, and it never made it back up to the top.? But perhaps it >> is time to bump this up again--thanks for the reminder!Warren > > I haven't noticed anything more about this since March... this is just a > friendly note speaking up as one more person interested in the progress of this > class! :) I put the basic contingency table code in a pull request: https://github.com/scipy/scipy/pull/34 After a few more small changes, it will get pushed to the trunk. 
The contingency table class probably needs some more work, but it should definitely be in before the next release. Warren > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From claumann at physics.harvard.edu Fri Jul 29 11:20:18 2011 From: claumann at physics.harvard.edu (Chris Laumann) Date: Fri, 29 Jul 2011 11:20:18 -0400 Subject: [SciPy-User] Sparse eigensystem instability on OS X Lion Message-ID: <08D3BBA9-A043-47FA-8817-B5FAA0BA7EB9@physics.harvard.edu> Hi all- I've recently upgraded to Lion and, because it broke my python setup, reinstalled scipy from two sources -- Enthought's full 64 bit latest distro (Scipy 0.9.0 and Numpy 1.6.0) and Chris Fonnesbeck's Scipy Superpack (with the latest dev versions from a few weeks ago). I'm now finding a bunch of numerical instability in symmetric sparse eigensystem solutions in scipy.sparse.linalg.eigsh. These didn't exist before the upgrade when I was using Snow Leopard and scipy 0.8.0 (eigen_symmetric instead of eigsh). The smallest example I've isolated is: data = array([ 16., 18., -4., -2., -4., 25., -2., 4., -2., 17., 16., 18.]) indices = array([0, 1, 2, 6, 1, 2, 4, 3, 2, 4, 5, 6], dtype='int32') indptr = array([ 0, 1, 4, 7, 8, 10, 11, 12], dtype='int32') ham1 = scipy.sparse.csr_matrix((data, indices, indptr)) e1 = np.linalg.eigvalsh(ham1.todense()) e2 = scipy.sparse.linalg.eigsh(ham1, k=6, which='SA', return_eigenvectors = False) e2.sort() This leads to (using the superpack -- the results from Enthought are similarly noisy but different): >>> e1 array([ 4. , 15.57851688, 16. , 16. , 17.27739936, 18. , 27.14408376]) >>> e2 array([ 4.03171393, 15.47517729, 15.98093481, 16. , 16.76781718, 18.79560773]) These are big errors for a 7x7 matrix, especially for the minimum eigenvalue 4, which is well separated from the rest of the spectrum. Can anybody help? I believe the Superpack _arpack.so is linking against the apple Accelerate framework for LAPACK while I'm not sure about Enthoughts _arpack.so. As far as I can tell from a little otool snooping, Enthought seems to roll its own LAPACK functionality. In any event, neither seems to work. Thanks, Chris From claumann at physics.harvard.edu Fri Jul 29 12:19:41 2011 From: claumann at physics.harvard.edu (Chris Laumann) Date: Fri, 29 Jul 2011 12:19:41 -0400 Subject: [SciPy-User] DISREGARD: Sparse eigensystem instability on OS X Lion Message-ID: <3B3E7255-9988-4B75-8194-74A0CBD76B4E@physics.harvard.edu> Hey everybody- The matrix I sent out wasn't hermitian, hence the instability. Disregard the previous post.. Oops, Chris ---------------- Hi all- I've recently upgraded to Lion and, because it broke my python setup, reinstalled scipy from two sources -- Enthought's full 64 bit latest distro (Scipy 0.9.0 and Numpy 1.6.0) and Chris Fonnesbeck's Scipy Superpack (with the latest dev versions from a few weeks ago). I'm now finding a bunch of numerical instability in symmetric sparse eigensystem solutions in scipy.sparse.linalg.eigsh. These didn't exist before the upgrade when I was using Snow Leopard and scipy 0.8.0 (eigen_symmetric instead of eigsh). 
The smallest example I've isolated is: data = array([ 16., 18., -4., -2., -4., 25., -2., 4., -2., 17., 16., 18.]) indices = array([0, 1, 2, 6, 1, 2, 4, 3, 2, 4, 5, 6], dtype='int32') indptr = array([ 0, 1, 4, 7, 8, 10, 11, 12], dtype='int32') ham1 = scipy.sparse.csr_matrix((data, indices, indptr)) e1 = np.linalg.eigvalsh(ham1.todense()) e2 = scipy.sparse.linalg.eigsh(ham1, k=6, which='SA', return_eigenvectors = False) e2.sort() This leads to (using the superpack -- the results from Enthought are similarly noisy but different): >>> e1 array([ 4. , 15.57851688, 16. , 16. , 17.27739936, 18. , 27.14408376]) >>> e2 array([ 4.03171393, 15.47517729, 15.98093481, 16. , 16.76781718, 18.79560773]) These are big errors for a 7x7 matrix, especially for the minimum eigenvalue 4, which is well separated from the rest of the spectrum. Can anybody help? I believe the Superpack _arpack.so is linking against the apple Accelerate framework for LAPACK while I'm not sure about Enthoughts _arpack.so. As far as I can tell from a little otool snooping, Enthought seems to roll its own LAPACK functionality. In any event, neither seems to work. Thanks, Chris From fiolj at yahoo.com Fri Jul 29 18:36:04 2011 From: fiolj at yahoo.com (Juan Fiol) Date: Fri, 29 Jul 2011 15:36:04 -0700 (PDT) Subject: [SciPy-User] integral of oscillatory functions Message-ID: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Hi, I have to integrate a *highly* oscilatory function. I've been looking in the literature and found that there some "asymptotic methods" (that work better when oscillations are stronger and cancel most of the integrand), some methods derived from filon's method and other called Levin method. I've had taken quick looks into several mathematical papers on the subject but It will probably take me more than one month (and may be much more) to understand the subject and put it into a routine. Does anybody know if there is anything of the sort implemented in scipy? Otherwise, I would appreciate if I get advice for a more "practical" place where to look. The integrand is not strictly of the form f(t) e^(iwt). Any help would be welcome. Thanks Juan From charlesr.harris at gmail.com Fri Jul 29 20:10:25 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jul 2011 18:10:25 -0600 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> References: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Message-ID: On Fri, Jul 29, 2011 at 4:36 PM, Juan Fiol wrote: > Hi, I have to integrate a *highly* oscilatory function. I've been looking > in the literature and found that there some "asymptotic methods" (that work > better when oscillations are stronger and cancel most of the integrand), > some methods derived from filon's method and other called Levin method. I've > had taken quick looks into several mathematical papers on the subject but It > will probably take me more than one month (and may be much more) to > understand the subject and put it into a routine. Does anybody know if there > is anything of the sort implemented in scipy? Otherwise, I would appreciate > if I get advice for a more "practical" place where to look. > The integrand is not strictly of the form f(t) e^(iwt). > Any help would be welcome. Thanks > How oscillatory is *highly* oscillatory? Does the function have any particular form? Where does it come from? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua.stults at gmail.com Fri Jul 29 22:20:19 2011 From: joshua.stults at gmail.com (Joshua Stults) Date: Fri, 29 Jul 2011 22:20:19 -0400 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> References: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Message-ID: On Fri, Jul 29, 2011 at 6:36 PM, Juan Fiol wrote: > Hi, I have to integrate a *highly* oscilatory function. This came up recently on the Maxima list, maybe you'll find the linked paper helpful: http://www.math.utexas.edu/pipermail/maxima/2011/025577.html -- Joshua Stults Website: variousconsequences.com Hackerspace: daytondiode.org From pav at iki.fi Sat Jul 30 06:36:34 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 30 Jul 2011 10:36:34 +0000 (UTC) Subject: [SciPy-User] Sparse eigensystem instability on OS X Lion References: <08D3BBA9-A043-47FA-8817-B5FAA0BA7EB9@physics.harvard.edu> Message-ID: On Fri, 29 Jul 2011 11:20:18 -0400, Chris Laumann wrote: > I've recently upgraded to Lion and, because it broke my python setup, > reinstalled scipy from two sources -- Enthought's full 64 bit latest > distro (Scipy 0.9.0 and Numpy 1.6.0) and Chris Fonnesbeck's Scipy > Superpack (with the latest dev versions from a few weeks ago). I'm now > finding a bunch of numerical instability in symmetric sparse eigensystem > solutions in scipy.sparse.linalg.eigsh. These didn't exist before the > upgrade when I was using Snow Leopard and scipy 0.8.0 (eigen_symmetric > instead of eigsh). Your matrix is not hermitian. data = array([ 16., 18., -4., -2., -4., 25., -2., 4., -2., 17., 16., 18.]) indices = array([0, 1, 2, 6, 1, 2, 4, 3, 2, 4, 5, 6], dtype='int32') indptr = array([ 0, 1, 4, 7, 8, 10, 11, 12], dtype='int32') ham1 = scipy.sparse.csr_matrix((data, indices, indptr)) A = ham1.todense() print abs(A - A.T.conj()).max() # -> 2.0 From timmichelsen at gmx-topmail.de Sat Jul 30 07:40:59 2011 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Sat, 30 Jul 2011 13:40:59 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: >> Since most of my code for meteorological data evaluations is based on >> it, I would be happy to receive infomation on the conclusion and how I >> need to adjust my code to upkeep with new developments. > > When it gets to that point I'd be happy to help (including looking at > some of your existing code and data). In short my process goes like: * QC of incoming measurements data * visualisation and statistics (basics, disribution analysis) * reporting * back & forcasting with other (modeled) data * preparation of result data sets When it comes to QC I would need: * check on missing dates (i.e. failure of aquisitition equipment) * check on double dates (= failure of data logger) * data integrity and plausability tests with certain filters/flags All these need to be reported on: * data recovery * invalid data by filter/flag type So far, I have been using the masked arrays. Mainly because it is heaily used in the time series scikit and transfering masks from on array to another is quite once you learned the basics. Would you work these items out in pandas, as well? P.S. 
Your presentation "Time series analysis in Python with statsmodels" is really cool and has shown me good aspects about the HP filters Regards, Timmie From timmichelsen at gmx-topmail.de Sat Jul 30 11:50:12 2011 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Sat, 30 Jul 2011 17:50:12 +0200 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: >>> It just so happens that Wes' use cases (from my understanding) are >>> basically the same as mine (finance, etc). So from my own selfish point >>> of view, the idea of pandas swallowing up the timeseries module and >>> incorporating its functionality sounds kind of nice since that would >>> give ME (and probably most of the people that work in the finance >>> domain) >> >> I think that it is really great if the different packages doing time >> series analysis unite. It will probably give better packages technically, >> and there is a lot of value to the community in such work. > > I agree. I already have 50% or more of the features in > scikits.timeseries, so this gets back to my fragmentation argument > (users being stuck with a confusing choice between multiple > libraries). Let's make it happen! So what needs to be done to move things forward? Do we need to draw up a roadmap? A table with functions that respond to common use cases in natual science, computing, and economics? From wesmckinn at gmail.com Sat Jul 30 12:20:52 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 30 Jul 2011 12:20:52 -0400 Subject: [SciPy-User] Status of TimeSeries SciKit In-Reply-To: References: <08B2C8B1-DD0B-4D02-82F0-4CBCD304AA31@bilokon.co.uk> <7B9AF0B6-8015-4736-AE31-53725695DE40@gmail.com> <7B0D4803-D6E3-4451-B60E-957966CCC73D@gmail.com> <20110726222843.GB8920@phare.normalesup.org> <20110727141251.GB30024@phare.normalesup.org> Message-ID: On Sat, Jul 30, 2011 at 11:50 AM, Tim Michelsen wrote: >>>> It just so happens that Wes' use cases (from my understanding) are >>>> basically the same as mine (finance, etc). So from my own selfish point >>>> of view, the idea of pandas swallowing up the timeseries module and >>>> incorporating its functionality sounds kind of nice since that would >>>> give ME (and probably most of the people that work in the finance >>>> domain) >>> >>> I think that it is really great if the different packages doing time >>> series analysis unite. It will probably give better packages technically, >>> and there is a lot of value to the community in such work. >> >> I agree. I already have 50% or more of the features in >> scikits.timeseries, so this gets back to my fragmentation argument >> (users being stuck with a confusing choice between multiple >> libraries). Let's make it happen! > So what needs to be done to move things forward? > Do we need to draw up a roadmap? > A table with functions that respond to common use cases in natual > science, computing, and economics? > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Having a place to collect concrete use cases (like your list from the prior e-mail, but with illustrative code snippets) would be good. 
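To seed that list: one of the QC items above (missing and doubled timestamps) can already be sketched with plain numpy and datetime, independent of which library ends up owning the time series class. A rough illustration with made-up logger data at a nominal 15 min spacing:

import numpy as np
from datetime import datetime, timedelta

step = timedelta(minutes=15)
# made-up logger timestamps, already parsed to datetime
stamps = [datetime(2011, 7, 1) + i * step for i in range(8)]
stamps.insert(3, stamps[3])   # simulate a doubled record
del stamps[6]                 # simulate a missing record

gaps = np.diff(np.array(stamps))
print(np.nonzero(gaps == timedelta(0))[0])   # positions of duplicate dates
print(np.nonzero(gaps > step)[0])            # positions followed by a data gap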
You're welcome to start doing it here: https://github.com/wesm/pandas/wiki A good place to start, which I can do when I have some time, would be to start moving the scikits.timeseries code into pandas. There are several key components - Date and DateArray stuff, frequency implementations - masked array time series implementations (record array and not) - plotting - reporting, moving window functions, etc. We need to evaluate Date/DateArray as they relate to numpy.datetime64 and see what can be done. I haven't looked closely but I'm not sure if all the convenient attribute access stuff (day, month, day_of_week, weekday, etc.) is available in NumPy yet. I suspect it would be reasonably straightforward to wrap DateArray so it can be an Index for a pandas object. I won't have much time for this until mid-August, but a couple days' hacking should get most of the pieces into place. I guess we can just keep around the masked array classes for legacy API support and for feature completeness. - Wes From ahig321 at gmail.com Fri Jul 29 20:41:01 2011 From: ahig321 at gmail.com (Adam Higuera) Date: Fri, 29 Jul 2011 20:41:01 -0400 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: References: <1311978964.16438.YahooMailClassic@web113610.mail.gq1.yahoo.com> Message-ID: If you can write the function in the form e^(i M h(t)), which, you obviously can, just take h(t) = -i/M log(f(t)), where M is large, there are some methods you can use that involve finding zeros in the derivate of h(t), and a few other asymptotic methods. Asymptotic methods aren't the sort of thing you do with SciPy, though. They'd be the sort of thing you'd do with Mathematica/Maple, or with pen and paper. -Adam On Fri, Jul 29, 2011 at 8:10 PM, Charles R Harris wrote: > > > On Fri, Jul 29, 2011 at 4:36 PM, Juan Fiol wrote: > >> Hi, I have to integrate a *highly* oscilatory function. I've been looking >> in the literature and found that there some "asymptotic methods" (that work >> better when oscillations are stronger and cancel most of the integrand), >> some methods derived from filon's method and other called Levin method. I've >> had taken quick looks into several mathematical papers on the subject but It >> will probably take me more than one month (and may be much more) to >> understand the subject and put it into a routine. Does anybody know if there >> is anything of the sort implemented in scipy? Otherwise, I would appreciate >> if I get advice for a more "practical" place where to look. >> The integrand is not strictly of the form f(t) e^(iwt). >> Any help would be welcome. Thanks >> > > How oscillatory is *highly* oscillatory? Does the function have any > particular form? Where does it come from? > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu.lacage at alcmeon.com Sat Jul 30 16:39:40 2011 From: mathieu.lacage at alcmeon.com (mathieu lacage) Date: Sat, 30 Jul 2011 22:39:40 +0200 Subject: [SciPy-User] creating a view of an array Message-ID: hi, Let's say I have a big large array: a = numpy.empty((10000*1000,10)) and I want to create a view of that array to be able to process a subset of its data without making a copy. One column: b = a[:,1] Two adjacent columns: b = a[:,1:2] Can I do the same (no memory allocation) for two columns that are not adjacent ? i.e. 
if I want to create an array that is a view for the 1st and 3rd columns. Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanleeuwen.martin at gmail.com Sat Jul 30 17:48:48 2011 From: vanleeuwen.martin at gmail.com (Martin van Leeuwen) Date: Sat, 30 Jul 2011 14:48:48 -0700 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: Hi Mathieu, You can index using a tuple too, like: a[:,(0,2)] That way you can index into any columns. Martin 2011/7/30 mathieu lacage : > hi, > > Let's say I have a big large array: > > a = numpy.empty((10000*1000,10)) > > and I want to create a view of that array to be able to process a subset of > its data without making a copy. > > One column: > b = a[:,1] > > Two adjacent columns: > b = a[:,1:2] > > Can I do the same (no memory allocation) for two columns that are not > adjacent ? i.e. if I want to create an array that is a view for the 1st and > 3rd columns. > > > Mathieu > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From zachary.pincus at yale.edu Sat Jul 30 17:58:15 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 30 Jul 2011 17:58:15 -0400 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: <8E9DE6E3-B685-49C8-90AA-CDA1A7197AA8@yale.edu> Hi, Indexing like a[:,(0,2)] creates a copy. (The logic isn't smart enough to determine if the index tuple is regular enough that it might be able to represent it as a view...) import numpy a = numpy.zeros((1000,1000)) b = a[:,:3:2] # This is what you want b.base is a # True c = a[:,(0,2)] # Makes a copy c.base is a # False Pretty much any simple slice will just create a view. (Right, experts?) In addition, you could muck with the numpy.ndarray constructor and pass base, offset, dtype, and strides parameters to make a new view on an old array that does things that normal slicing couldn't. (Like zero-length strides, etc.) Zach On Jul 30, 2011, at 5:48 PM, Martin van Leeuwen wrote: > Hi Mathieu, > > You can index using a tuple too, like: > > a[:,(0,2)] > > That way you can index into any columns. > > Martin > > 2011/7/30 mathieu lacage : >> hi, >> >> Let's say I have a big large array: >> >> a = numpy.empty((10000*1000,10)) >> >> and I want to create a view of that array to be able to process a subset of >> its data without making a copy. >> >> One column: >> b = a[:,1] >> >> Two adjacent columns: >> b = a[:,1:2] >> >> Can I do the same (no memory allocation) for two columns that are not >> adjacent ? i.e. if I want to create an array that is a view for the 1st and >> 3rd columns. >> >> >> Mathieu >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From jkington at wisc.edu Sat Jul 30 18:02:33 2011 From: jkington at wisc.edu (Joe Kington) Date: Sat, 30 Jul 2011 14:02:33 -0800 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: On Sat, Jul 30, 2011 at 1:48 PM, Martin van Leeuwen < vanleeuwen.martin at gmail.com> wrote: > Hi Mathieu, > > You can index using a tuple too, like: > > a[:,(0,2)] > > That way you can index into any columns. 
> Just to clarify: Using a list or tuple for indexing is "fancy" indexing and returns a copy, not a view. To illustrate the difference: import numpy as np original = np.zeros((5,3)) normal_indexing = original[:,:2] fancy_indexing = original[:,(0,2)] normal_indexing[0] = 5 fancy_indexing[0] = 600 print original So, modifying the "fancy_indexing" version doesn't modify the original (and is a copy of the original, using more memory). However, modifying the "normal_indexing" version (which is a view of the same memory as the original) _does_ modify the original. I believe the OP was specifically wanting a view. (And I don't think it's directly possible, though you can write a simple wrapper class to do it.) Cheers, -Joe > > Martin > > 2011/7/30 mathieu lacage : > > hi, > > > > Let's say I have a big large array: > > > > a = numpy.empty((10000*1000,10)) > > > > and I want to create a view of that array to be able to process a subset > of > > its data without making a copy. > > > > One column: > > b = a[:,1] > > > > Two adjacent columns: > > b = a[:,1:2] > > > > Can I do the same (no memory allocation) for two columns that are not > > adjacent ? i.e. if I want to create an array that is a view for the 1st > and > > 3rd columns. > > > > > > Mathieu > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From collinstocks at gmail.com Sun Jul 31 00:59:53 2011 From: collinstocks at gmail.com (Collin Stocks) Date: Sun, 31 Jul 2011 00:59:53 -0400 Subject: [SciPy-User] generic_flapack.pyf and geqp3 In-Reply-To: References: <69cfb351-7ddc-4d92-b38e-4a9cff4deb51@w4g2000yqm.googlegroups.com> Message-ID: <1312088393.2992.6.camel@SietchTabr> Skipper, By any chance, do you know anyone who might be able to review the f2py parts of my prospective code contribution to scipy.linalg.qr? The pull request can be viewed here: https://github.com/scipy/scipy/pull/44 -- Collin -------------- next part -------------- An embedded message was scrubbed... From: Skipper Seabold Subject: Re: [SciPy-User] generic_flapack.pyf and geqp3 Date: Mon, 11 Jul 2011 23:17:06 -0500 Size: 6027 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From mathieu.lacage at alcmeon.com Sun Jul 31 02:53:59 2011 From: mathieu.lacage at alcmeon.com (mathieu lacage) Date: Sun, 31 Jul 2011 08:53:59 +0200 Subject: [SciPy-User] creating a view of an array In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 12:02 AM, Joe Kington wrote: > > a[:,(0,2)] >> > Indeed. It "just" creates a copy :/ > >> That way you can index into any columns. >> > You can index using a tuple too, like: > > Just to clarify: > > Using a list or tuple for indexing is "fancy" indexing and returns a copy, > not a view. > yes. > > I believe the OP was specifically wanting a view. (And I don't think it's > directly possible, though you can write a simple wrapper class to do it.) > With some creative use of index offsets, I came up with the following for two arbitrary columns in an array: # create big array a = numpy.empty((10000000,10)) # extract columns 2 and 4 (no copy !) 
b = a[::,2:5:2] This trick will obviously not work for three columns unless they are equally spaced. So, I wonder if there are other tricks I could use to build an array where the stride is irregular (say, extract columns 2, 4, and 5 from the above array without creating a copy) Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From fiolj at yahoo.com Sun Jul 31 03:16:38 2011 From: fiolj at yahoo.com (Juan Fiol) Date: Sun, 31 Jul 2011 00:16:38 -0700 (PDT) Subject: [SciPy-User] integral of oscillatory functions Message-ID: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> Thanks for all the answers. They were? very helpful. In general, seems that there is no other way that to tackle the specific problem with some analytical work. I'l look more deeply into that, and let you know if something interesting pops out. Joshua, thanks for the link. I think that the method at what the document refers is the one I've mentioned as "asymptotic". I'd already skimmed over most of the papers cited there but as I said only the surface. I think that this document has an amenable enough presentation that will make it useful to give a try. Adam: I'll look further into the asymptotic methods. The problem is that the oscillatory part is complicated enough to make it painful to go through the analytical work. I do not plan to do the work in scipy/numpy. I am trying to solve the problem in python but need then to adapt the solution to one of my fortran programs. Charles: I apologize for not being specific enough. The integrand itself is quite complicated. Moreover its form may change.? I am attaching a short pdf with my current definitions, but in python would be something as # Auxiliary functions and definitions # These values will be changing w0= 0.4 dw= 0.1 w=np.array([w0,w0+dw, w0-dw]) A=3. B=4. Omega= B - (A*(a/np.square(w))).sum() def h(t): ? return ((a/w)*(np.cos(w*t))).sum() + C def G(t): ? return (k_A*(a/np.square(w))*np.sin(w*t)).sum() # function to integrate def integrand(t): ? return np.exp(1j* G(t)) * np.exp(1j* Omega*t) * f(t) --- On Sat, 7/30/11, Charles R Harris wrote: From: Charles R Harris Subject: Re: [SciPy-User] integral of oscillatory functions To: fiolj at yahoo.com, "SciPy Users List" Date: Saturday, July 30, 2011, 1:10 AM On Fri, Jul 29, 2011 at 4:36 PM, Juan Fiol wrote: Hi, I have to integrate a *highly* oscilatory function. I've been looking in the literature and found that there some "asymptotic methods" (that work better when oscillations are stronger and cancel most of the integrand), some methods derived from filon's method and other called Levin method. I've had taken quick looks into several mathematical papers on the subject but It will probably take me more than one month (and may be much more) to understand the subject and put it into a routine. Does anybody know if there is anything of the sort implemented in scipy? Otherwise, I would appreciate if I get advice for a more "practical" place where to look. The integrand is not strictly of the form f(t) e^(iwt). Any help would be welcome. Thanks How oscillatory is *highly* oscillatory? Does the function have any particular form? Where does it come from? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tempo.pdf Type: application/pdf Size: 65680 bytes Desc: not available URL: From woiski at gmail.com Sun Jul 31 08:04:10 2011 From: woiski at gmail.com (Emanuel Woiski) Date: Sun, 31 Jul 2011 09:04:10 -0300 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> References: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> Message-ID: 2011/7/31 Juan Fiol > Thanks for all the answers. They were very helpful. In general, seems that > there is no other way that to tackle the specific problem with some > analytical work. I'l look more deeply into that, and let you know if > something interesting pops out. > > Joshua, thanks for the link. I think that the method at what the document > refers is the one I've mentioned as "asymptotic". I'd already skimmed over > most of the papers cited there but as I said only the surface. I think that > this document has an amenable enough presentation that will make it useful > to give a try. > > Adam: I'll look further into the asymptotic methods. The problem is that > the oscillatory part is complicated enough to make it painful to go through > the analytical work. I do not plan to do the work in scipy/numpy. I am > trying to solve the problem in python but need then to adapt the solution to > one of my fortran programs. > > Charles: I apologize for not being specific enough. The integrand itself is > quite complicated. Moreover its form may change. > I am attaching a short pdf with my current definitions, but in python would > be something as > > # Auxiliary functions and definitions > # These values will be changing > w0= 0.4 > dw= 0.1 > w=np.array([w0,w0+dw, w0-dw]) > A=3. > B=4. > > Omega= B - (A*(a/np.square(w))).sum() > > def h(t): > return ((a/w)*(np.cos(w*t))).sum() + C > > def G(t): > return (k_A*(a/np.square(w))*np.sin(w*t)).sum() > # function to integrate > def integrand(t): > return np.exp(1j* G(t)) * np.exp(1j* Omega*t) * f(t) > > > Upon examination of your equations, I thought you should try mpmath [1] or even better sympy [2], which is an open source Python library for symbolic mathematics, and uses mpmath internally. Another alternative is Sage [3] whose Mission statement is: *Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab*. I'd try sympy first. [1] code.google.com/p/*mpmath/* [2] code.google.com/p/*sympy*/ [3] http://www.sagemath.org/ > > regards woiski -------------- next part -------------- An HTML attachment was scrubbed... URL: From bblais at bryant.edu Sun Jul 31 08:48:00 2011 From: bblais at bryant.edu (Brian Blais) Date: Sun, 31 Jul 2011 08:48:00 -0400 Subject: [SciPy-User] recommendation for saving data Message-ID: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Hello, I was wondering if there are any recommendations for formats for saving scientific data. I am running a simulation, which has many somewhat-indepedent parts which have their own internal state and parameters. I've been using pickle (gzipped) to save the entire object (which contains subobjects, etc...), but it is getting too unwieldy and I think it is time to look for a more robust solution. Ideally I'd like to have something where I can call a save method on the simulation object, and it will call the save methods on all the children, on down the line all saving into one file. It'd also be nice if it were cross-platform, and I could depend on the files being readable into the future for a while. 
Are there any good standards for this? What do you use for saving scientific data? thank you, Brian Blais -- Brian Blais bblais at bryant.edu http://web.bryant.edu/~bblais http://bblais.blogspot.com/ From paul.anton.letnes at gmail.com Sun Jul 31 09:01:37 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 31 Jul 2011 15:01:37 +0200 Subject: [SciPy-User] recommendation for saving data In-Reply-To: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> References: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Message-ID: I would recommend writing and reading hdf5 with h5py (though there are other python packages). I find h5py to be very convenient in python, and the hdf5 library + wrappers exist for C, C++, Fortran90, and Java (and probably more). The hdf5 format is platform independent and processor architecture independent - that's one of their design goals. http://alfven.org/wp/hdf5-for-python/ http://www.hdfgroup.org/HDF5/ Paul On 31. juli 2011, at 14.48, Brian Blais wrote: > Hello, > > I was wondering if there are any recommendations for formats for saving scientific data. I am running a simulation, which has many somewhat-indepedent parts which have their own internal state and parameters. I've been using pickle (gzipped) to save the entire object (which contains subobjects, etc...), but it is getting too unwieldy and I think it is time to look for a more robust solution. Ideally I'd like to have something where I can call a save method on the simulation object, and it will call the save methods on all the children, on down the line all saving into one file. It'd also be nice if it were cross-platform, and I could depend on the files being readable into the future for a while. > > Are there any good standards for this? What do you use for saving scientific data? > > > thank you, > > Brian Blais > > > > -- > Brian Blais > bblais at bryant.edu > http://web.bryant.edu/~bblais > http://bblais.blogspot.com/ > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From joshua.stults at gmail.com Sun Jul 31 09:50:05 2011 From: joshua.stults at gmail.com (Joshua Stults) Date: Sun, 31 Jul 2011 09:50:05 -0400 Subject: [SciPy-User] integral of oscillatory functions In-Reply-To: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> References: <1312096598.56184.YahooMailClassic@web113620.mail.gq1.yahoo.com> Message-ID: On Sun, Jul 31, 2011 at 3:16 AM, Juan Fiol wrote: > > Joshua, thanks for the link. I think that the method at what the document refers is the one I've mentioned as "asymptotic". I'd already skimmed over most of the papers cited there but as I said only the surface. I think that this document has an amenable enough presentation that will make it useful to give a try. You're welcome, it was written well enough that even an engineer like me could understand, so I thought it was worth sharing ; - ) You've probably already come across this approach in your lit review, but this set of slides, http://www.newton.ac.uk/programmes/HOP/seminars/070509001.pdf has some interesting things on a *numerical* approach that uses a steepest descent integration path in the complex plane to improve convergence for integrals of oscillatory functions. Would it work for your case? 
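If the oscillation settles down to an essentially fixed frequency at large t, mpmath's quadosc routine may also be worth a quick test before investing in the analytic route. A minimal sketch with a stand-in integrand (not your actual h/G definitions, and a made-up frequency w):

from mpmath import mp, quadosc, cos, inf, pi

mp.dps = 15
w = 3.0                                  # hypothetical dominant frequency
f = lambda t: cos(w*t) / (1 + t**2)      # stand-in for the real integrand
val = quadosc(f, [0, inf], period=2*pi/w)
print(val)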
-- Joshua Stults Website: variousconsequences.com Hackerspace: daytondiode.org From fperez.net at gmail.com Sun Jul 31 13:19:51 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 31 Jul 2011 12:19:51 -0500 Subject: [SciPy-User] [ANN] IPython 0.11 is officially out Message-ID: Hi all, on behalf of the IPython development team, I'm thrilled to announce, after more than two years of development work, the official release of IPython 0.11. This release brings a long list of improvements and new features (along with hopefully few new bugs). We have completely refactored IPython, making it a much more friendly project to participate in by having better separated and organized internals. We hope you will not only use the new tools and libraries, but also join us with new ideas and development. After this very long development effort, we hope to make a few stabilization releases at a quicker pace, where we iron out the kinks in the new APIs and complete some remaining internal cleanup work. We will then make a (long awaited) IPython 1.0 release with these stable APIs. *Downloads* Download links and instructions are at: http://ipython.org/download.html And IPython is also on PyPI: http://pypi.python.org/pypi/ipython Those contain a built version of the HTML docs; if you want pure source downloads with no docs, those are available on github: Tarball: https://github.com/ipython/ipython/tarball/rel-0.11 Zipball: https://github.com/ipython/ipython/zipball/rel-0.11 * Features * Here is a quick listing of the major new features: - Standalone Qt console - High-level parallel computing with ZeroMQ - New model for GUI/plotting support in the terminal - A two-process architecture - Fully refactored internal project structure - Vim integration - Integration into Microsoft Visual Studio - Improved unicode support - Python 3 support - New profile model - SQLite storage for history - New configuration system - Pasting of code with prompts And many more... We closed over 500 tickets, merged over 200 pull requests, and more than 60 people contributed over 2200 commits for the final release. Please see our release notes for the full details on everything about this release: https://github.com/ipython/ipython/zipball/rel-0.11 As usual, if you find any problem, please file a ticket --or even better, a pull request fixing it-- on our github issues site (https://github.com/ipython/ipython/issues/). Many thanks to all who contributed! Fernando, on behalf of the IPython development team. http://ipython.org From ralf.gommers at googlemail.com Sun Jul 31 13:56:49 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 31 Jul 2011 19:56:49 +0200 Subject: [SciPy-User] deconvolution of 1-D signals Message-ID: Hi, For a measured signal that is the convolution of a real signal with a response function, plus measurement noise on top, I want to recover the real signal. Since I know what the response function is and the noise is high-frequency compared to the real signal, a straightforward approach is to smooth the measured signal (or fit a spline to it), then remove the response function by deconvolution. See example code below. Can anyone point me towards code that does the deconvolution efficiently? Perhaps signal.deconvolve would do the trick, but I can't seem to make it work (except for directly on the output of np.convolve(y, window, mode='valid')). 
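(For reference, the clean case does behave as expected; a minimal noise-free sketch using mode='full', where the polynomial division done by signal.deconvolve should undo the convolution:)

import numpy as np
from scipy import signal

x = np.linspace(0, 10, num=201)
y = np.sin(x + np.pi/5)
window = np.ones(11) / 11.
y_conv = np.convolve(y, window, mode='full')
# dividing the full, noise-free convolution by the window recovers y
y_rec, remainder = signal.deconvolve(y_conv, window)
print(np.abs(y_rec - y).max())   # should be close to machine precision here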
Thanks, Ralf import numpy as np from scipy import interpolate, signal import matplotlib.pyplot as plt # Real signal x = np.linspace(0, 10, num=201) y = np.sin(x + np.pi/5) # Noisy signal mode = 'valid' window_len = 11. window = np.ones(window_len) / window_len y_meas = np.convolve(y, window, mode=mode) y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 if mode == 'full': xstep = x[1] - x[0] x_meas = np.concatenate([ \ np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, num=window_len//2), x, np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, num=window_len//2)]) elif mode == 'valid': x_meas = x[window_len//2:-window_len//2+1] elif mode == 'same': x_meas = x # Approximating spline xs = np.linspace(0, 10, num=500) knots = np.array([1, 3, 5, 7, 9]) tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) ys = interpolate.splev(xs, tck, der=0) # Find (low-frequency part of) original signal by deconvolution of smoothed # measured signal and known window. y_deconv = signal.deconvolve(ys, window)[0] #FIXME # Plot all signals fig = plt.figure() ax = fig.add_subplot(111) ax.plot(x, y, 'b-', label="Original signal") ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") ax.plot(xs, ys, 'g-', label="Approximating spline") ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', label="signal.deconvolve result") ax.set_ylim([-1.2, 2]) ax.legend() plt.show() -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jul 31 15:10:55 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Jul 2011 15:10:55 -0400 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 1:56 PM, Ralf Gommers wrote: > Hi, > > For a measured signal that is the convolution of a real signal with a > response function, plus measurement noise on top, I want to recover the real > signal. Since I know what the response function is and the noise is > high-frequency compared to the real signal, a straightforward approach is to > smooth the measured signal (or fit a spline to it), then remove the response > function by deconvolution. See example code below. > > Can anyone point me towards code that does the deconvolution efficiently? > Perhaps signal.deconvolve would do the trick, but I can't seem to make it > work (except for directly on the output of np.convolve(y, window, > mode='valid')). > > Thanks, > Ralf > > > import numpy as np > from scipy import interpolate, signal > import matplotlib.pyplot as plt > > # Real signal > x = np.linspace(0, 10, num=201) > y = np.sin(x + np.pi/5) > > # Noisy signal > mode = 'valid' > window_len = 11. > window = np.ones(window_len) / window_len > y_meas = np.convolve(y, window, mode=mode) > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 > if mode == 'full': > ??? xstep = x[1] - x[0] > ??? x_meas = np.concatenate([ \ > ??????? np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, > num=window_len//2), > ??????? x, > ??????? np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, > num=window_len//2)]) > elif mode == 'valid': > ??? x_meas = x[window_len//2:-window_len//2+1] > elif mode == 'same': > ??? 
x_meas = x > > # Approximating spline > xs = np.linspace(0, 10, num=500) > knots = np.array([1, 3, 5, 7, 9]) > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) > ys = interpolate.splev(xs, tck, der=0) > > # Find (low-frequency part of) original signal by deconvolution of smoothed > # measured signal and known window. > y_deconv = signal.deconvolve(ys, window)[0]? #FIXME > > # Plot all signals > fig = plt.figure() > ax = fig.add_subplot(111) > > ax.plot(x, y, 'b-', label="Original signal") > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") > ax.plot(xs, ys, 'g-', label="Approximating spline") > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', > ??????? label="signal.deconvolve result") > ax.set_ylim([-1.2, 2]) > ax.legend() > > plt.show() signal.deconvolve is essentially signal.lfilter, but I don't quite understand what it does. I changed 2 lines, partly by trial and error and by analogy to ARMA models. I'm not quite sure the following changes are correct, but at least it produces a nice graph instead of deconvolve use lfilter directly y_deconv = signal.lfilter(window, [1.], ys[::-1])[::-1] and center lfiltered window: ax.plot(xs[window.size//2-1:-window.size//2], y_deconv[:-window.size+1], 'k-', If your signal is periodic, then I would go for the fft versions of convolution, and iir filtering. My initial guesses were that there is either something wrong (hidden assumption) about the starting values of signal convolve, or there are problems because of the non-stationarity. But maybe a signal expert knows better. Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jkington at wisc.edu Sun Jul 31 15:21:09 2011 From: jkington at wisc.edu (Joe Kington) Date: Sun, 31 Jul 2011 11:21:09 -0800 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: I'm not a signal processing expert by any means, but this is a standard problem in seismology. The problem is that your "window" has near-zero amplitude at high frequencies, so you're blowing up the high-frequency content of the noisy signal when you divide in the frequency domain. A "water-level" deconvolution is a very simple way around this, and often works well. It also allows you to skip the spline fitting, as it's effectively doing a low-pass filter. Basically, you just: 1) convert to the frequency domain 2) replace any amplitudes below some threshold with that threshold in the signal you're dividing by (the window, in your case) 3) pad the lengths to match 4) divide (the actual deconvolution) 5) convert back to the time domain As a simple implementation (I've left out the various different modes of padding here... This is effectively just mode='same'.) def water_level_decon(ys, window, eps=0.1): yfreq = np.fft.fft(ys) max_amp = yfreq.max() winfreq = np.fft.fft(window) winfreq[winfreq < eps] = eps padded = eps * np.ones_like(yfreq) padded[:winfreq.size] = winfreq newfreq = yfreq / padded newfreq *= max_amp / newfreq.max() return np.fft.ifft(newfreq) Hope that helps a bit. -Joe In most cases, you'll need to adjust the eps parameter to match the level of noise you want to remove. In your particular case On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers wrote: > Hi, > > For a measured signal that is the convolution of a real signal with a > response function, plus measurement noise on top, I want to recover the real > signal. 
Since I know what the response function is and the noise is > high-frequency compared to the real signal, a straightforward approach is to > smooth the measured signal (or fit a spline to it), then remove the response > function by deconvolution. See example code below. > > Can anyone point me towards code that does the deconvolution efficiently? > Perhaps signal.deconvolve would do the trick, but I can't seem to make it > work (except for directly on the output of np.convolve(y, window, > mode='valid')). > > Thanks, > Ralf > > > import numpy as np > from scipy import interpolate, signal > import matplotlib.pyplot as plt > > # Real signal > x = np.linspace(0, 10, num=201) > y = np.sin(x + np.pi/5) > > # Noisy signal > mode = 'valid' > window_len = 11. > window = np.ones(window_len) / window_len > y_meas = np.convolve(y, window, mode=mode) > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 > if mode == 'full': > xstep = x[1] - x[0] > x_meas = np.concatenate([ \ > np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, > num=window_len//2), > x, > np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, > num=window_len//2)]) > elif mode == 'valid': > x_meas = x[window_len//2:-window_len//2+1] > elif mode == 'same': > x_meas = x > > # Approximating spline > xs = np.linspace(0, 10, num=500) > knots = np.array([1, 3, 5, 7, 9]) > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) > ys = interpolate.splev(xs, tck, der=0) > > # Find (low-frequency part of) original signal by deconvolution of smoothed > # measured signal and known window. > y_deconv = signal.deconvolve(ys, window)[0] #FIXME > > # Plot all signals > fig = plt.figure() > ax = fig.add_subplot(111) > > ax.plot(x, y, 'b-', label="Original signal") > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") > ax.plot(xs, ys, 'g-', label="Approximating spline") > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', > label="signal.deconvolve result") > ax.set_ylim([-1.2, 2]) > ax.legend() > > plt.show() > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkington at wisc.edu Sun Jul 31 15:22:53 2011 From: jkington at wisc.edu (Joe Kington) Date: Sun, 31 Jul 2011 11:22:53 -0800 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: Gah, I hit send too soon! The default eps parameter in that function should be more like 1.0e-6 instead of 0.1. You'll generally need to adjust the eps parameter to match the signal-to-noise ratio of the two signals you're deconvolving. Hope it's useful, at any rate. -Joe On Sun, Jul 31, 2011 at 11:21 AM, Joe Kington wrote: > I'm not a signal processing expert by any means, but this is a standard > problem in seismology. > > The problem is that your "window" has near-zero amplitude at high > frequencies, so you're blowing up the high-frequency content of the noisy > signal when you divide in the frequency domain. > > A "water-level" deconvolution is a very simple way around this, and often > works well. It also allows you to skip the spline fitting, as it's > effectively doing a low-pass filter. 
> > Basically, you just: > 1) convert to the frequency domain > 2) replace any amplitudes below some threshold with that threshold in the > signal you're dividing by (the window, in your case) > 3) pad the lengths to match > 4) divide (the actual deconvolution) > 5) convert back to the time domain > > As a simple implementation (I've left out the various different modes of > padding here... This is effectively just mode='same'.) > > def water_level_decon(ys, window, eps=0.1): > yfreq = np.fft.fft(ys) > max_amp = yfreq.max() > > winfreq = np.fft.fft(window) > winfreq[winfreq < eps] = eps > > padded = eps * np.ones_like(yfreq) > padded[:winfreq.size] = winfreq > > newfreq = yfreq / padded > newfreq *= max_amp / newfreq.max() > > return np.fft.ifft(newfreq) > > > Hope that helps a bit. > -Joe > > > In most cases, you'll need to adjust the eps parameter to match the level > of noise you want to remove. In your particular case > On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers > wrote: > >> Hi, >> >> For a measured signal that is the convolution of a real signal with a >> response function, plus measurement noise on top, I want to recover the real >> signal. Since I know what the response function is and the noise is >> high-frequency compared to the real signal, a straightforward approach is to >> smooth the measured signal (or fit a spline to it), then remove the response >> function by deconvolution. See example code below. >> >> Can anyone point me towards code that does the deconvolution efficiently? >> Perhaps signal.deconvolve would do the trick, but I can't seem to make it >> work (except for directly on the output of np.convolve(y, window, >> mode='valid')). >> >> Thanks, >> Ralf >> >> >> import numpy as np >> from scipy import interpolate, signal >> import matplotlib.pyplot as plt >> >> # Real signal >> x = np.linspace(0, 10, num=201) >> y = np.sin(x + np.pi/5) >> >> # Noisy signal >> mode = 'valid' >> window_len = 11. >> window = np.ones(window_len) / window_len >> y_meas = np.convolve(y, window, mode=mode) >> y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >> if mode == 'full': >> xstep = x[1] - x[0] >> x_meas = np.concatenate([ \ >> np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >> num=window_len//2), >> x, >> np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >> num=window_len//2)]) >> elif mode == 'valid': >> x_meas = x[window_len//2:-window_len//2+1] >> elif mode == 'same': >> x_meas = x >> >> # Approximating spline >> xs = np.linspace(0, 10, num=500) >> knots = np.array([1, 3, 5, 7, 9]) >> tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >> ys = interpolate.splev(xs, tck, der=0) >> >> # Find (low-frequency part of) original signal by deconvolution of >> smoothed >> # measured signal and known window. >> y_deconv = signal.deconvolve(ys, window)[0] #FIXME >> >> # Plot all signals >> fig = plt.figure() >> ax = fig.add_subplot(111) >> >> ax.plot(x, y, 'b-', label="Original signal") >> ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >> ax.plot(xs, ys, 'g-', label="Approximating spline") >> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >> label="signal.deconvolve result") >> ax.set_ylim([-1.2, 2]) >> ax.legend() >> >> plt.show() >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sun Jul 31 15:41:51 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 31 Jul 2011 21:41:51 +0200 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 9:10 PM, wrote: > On Sun, Jul 31, 2011 at 1:56 PM, Ralf Gommers > wrote: > > Hi, > > > > For a measured signal that is the convolution of a real signal with a > > response function, plus measurement noise on top, I want to recover the > real > > signal. Since I know what the response function is and the noise is > > high-frequency compared to the real signal, a straightforward approach is > to > > smooth the measured signal (or fit a spline to it), then remove the > response > > function by deconvolution. See example code below. > > > > Can anyone point me towards code that does the deconvolution efficiently? > > Perhaps signal.deconvolve would do the trick, but I can't seem to make it > > work (except for directly on the output of np.convolve(y, window, > > mode='valid')). > > > > Thanks, > > Ralf > > > > > > import numpy as np > > from scipy import interpolate, signal > > import matplotlib.pyplot as plt > > > > # Real signal > > x = np.linspace(0, 10, num=201) > > y = np.sin(x + np.pi/5) > > > > # Noisy signal > > mode = 'valid' > > window_len = 11. > > window = np.ones(window_len) / window_len > > y_meas = np.convolve(y, window, mode=mode) > > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 > > if mode == 'full': > > xstep = x[1] - x[0] > > x_meas = np.concatenate([ \ > > np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, > > num=window_len//2), > > x, > > np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, > > num=window_len//2)]) > > elif mode == 'valid': > > x_meas = x[window_len//2:-window_len//2+1] > > elif mode == 'same': > > x_meas = x > > > > # Approximating spline > > xs = np.linspace(0, 10, num=500) > > knots = np.array([1, 3, 5, 7, 9]) > > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) > > ys = interpolate.splev(xs, tck, der=0) > > > > # Find (low-frequency part of) original signal by deconvolution of > smoothed > > # measured signal and known window. > > y_deconv = signal.deconvolve(ys, window)[0] #FIXME > > > > # Plot all signals > > fig = plt.figure() > > ax = fig.add_subplot(111) > > > > ax.plot(x, y, 'b-', label="Original signal") > > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") > > ax.plot(xs, ys, 'g-', label="Approximating spline") > > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', > > label="signal.deconvolve result") > > ax.set_ylim([-1.2, 2]) > > ax.legend() > > > > plt.show() > > signal.deconvolve is essentially signal.lfilter, but I don't quite > understand what it does. > > I changed 2 lines, partly by trial and error and by analogy to ARMA > models. I'm not quite sure the following changes are correct, but at > least it produces a nice graph > > It doesn't have artifacts, but y_meas is almost identical to the spline, not to the real signal. Not sure what's going wrong there, but it didn't perform a deconvolution. instead of deconvolve use lfilter directly > > y_deconv = signal.lfilter(window, [1.], ys[::-1])[::-1] > > and > > center lfiltered window: > > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv[:-window.size+1], > 'k-', > > If your signal is periodic, then I would go for the fft versions of > convolution, and iir filtering. > The signal is not periodic. 
Ralf My initial guesses were that there is either something wrong (hidden > assumption) about the starting values of signal convolve, or there are > problems because of the non-stationarity. > > But maybe a signal expert knows better. > > Josef > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jul 31 17:22:24 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Jul 2011 17:22:24 -0400 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 3:41 PM, Ralf Gommers wrote: > > > On Sun, Jul 31, 2011 at 9:10 PM, wrote: >> >> On Sun, Jul 31, 2011 at 1:56 PM, Ralf Gommers >> wrote: >> > Hi, >> > >> > For a measured signal that is the convolution of a real signal with a >> > response function, plus measurement noise on top, I want to recover the >> > real >> > signal. Since I know what the response function is and the noise is >> > high-frequency compared to the real signal, a straightforward approach >> > is to >> > smooth the measured signal (or fit a spline to it), then remove the >> > response >> > function by deconvolution. See example code below. >> > >> > Can anyone point me towards code that does the deconvolution >> > efficiently? >> > Perhaps signal.deconvolve would do the trick, but I can't seem to make >> > it >> > work (except for directly on the output of np.convolve(y, window, >> > mode='valid')). >> > >> > Thanks, >> > Ralf >> > >> > >> > import numpy as np >> > from scipy import interpolate, signal >> > import matplotlib.pyplot as plt >> > >> > # Real signal >> > x = np.linspace(0, 10, num=201) >> > y = np.sin(x + np.pi/5) >> > >> > # Noisy signal >> > mode = 'valid' >> > window_len = 11. >> > window = np.ones(window_len) / window_len >> > y_meas = np.convolve(y, window, mode=mode) >> > y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >> > if mode == 'full': >> > ??? xstep = x[1] - x[0] >> > ??? x_meas = np.concatenate([ \ >> > ??????? np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >> > num=window_len//2), >> > ??????? x, >> > ??????? np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >> > num=window_len//2)]) >> > elif mode == 'valid': >> > ??? x_meas = x[window_len//2:-window_len//2+1] >> > elif mode == 'same': >> > ??? x_meas = x >> > >> > # Approximating spline >> > xs = np.linspace(0, 10, num=500) >> > knots = np.array([1, 3, 5, 7, 9]) >> > tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >> > ys = interpolate.splev(xs, tck, der=0) >> > >> > # Find (low-frequency part of) original signal by deconvolution of >> > smoothed >> > # measured signal and known window. >> > y_deconv = signal.deconvolve(ys, window)[0]? #FIXME >> > >> > # Plot all signals >> > fig = plt.figure() >> > ax = fig.add_subplot(111) >> > >> > ax.plot(x, y, 'b-', label="Original signal") >> > ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >> > ax.plot(xs, ys, 'g-', label="Approximating spline") >> > ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >> > ??????? 
label="signal.deconvolve result") >> > ax.set_ylim([-1.2, 2]) >> > ax.legend() >> > >> > plt.show() >> >> signal.deconvolve is essentially signal.lfilter, but I don't quite >> understand what it does. >> >> I changed 2 lines, partly by trial and error and by analogy to ARMA >> models. I'm not quite sure the following changes are correct, but at >> least it produces a nice graph >> > It doesn't have artifacts, but y_meas is almost identical to the spline, not > to the real signal. Not sure what's going wrong there, but it didn't perform > a deconvolution. I mixed up numerator and denominator for lfilter > >> instead of deconvolve use lfilter directly >> >> y_deconv = signal.lfilter(window, [1.], ?ys[::-1])[::-1] >> >> and >> >> center lfiltered window: >> >> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv[:-window.size+1], >> 'k-', >> >> If your signal is periodic, then I would go for the fft versions of >> convolution, and iir filtering. > > The signal is not periodic. > > Ralf > >> My initial guesses were that there is either something wrong (hidden >> assumption) about the starting values of signal convolve, or there are >> problems because of the non-stationarity. In terms of IIR filter the way I understand it from the ARMA analogy, your window is not invertible >>> r = np.roots(window) >>> r*r.conj() array([ 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j]) I'm not sure if this can work, but in IIR filters the way I know it, the coefficient for the last observation is normalized to 1. Since all roots are 1, the window cannot be inverted. I think lfilter uses the same assumptions. lfilter has also a funny way to determine initial conditions (zi) which I'm never quite sure how to use. It doesn't matter much with invertible and stationary processes, but in this case, I guess it does. If I change the window to an invertible window window_len = 11. window = np.ones(window_len) / window_len window[0] = 1. then my current version, which should be close to your original version, works, the deconvolved series looks similar to the original series. In either case, I think Joe's answer is more useful, since you can directly manipulate the frequency response. I only used a simple IIR frequency domain filter with an adaptation of fftconvolve. Josef >> >> But maybe a signal expert knows better. >> >> Josef >> >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ralf.gommers at googlemail.com Sun Jul 31 18:05:24 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 1 Aug 2011 00:05:24 +0200 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 9:22 PM, Joe Kington wrote: > Gah, I hit send too soon! > > The default eps parameter in that function should be more like 1.0e-6 > instead of 0.1. > > You'll generally need to adjust the eps parameter to match the > signal-to-noise ratio of the two signals you're deconvolving. > Thanks Joe. 
I had tried something similar, but now I have a name for the method and a confirmation that doing something like that makes sense. I'll play with this idea some more tomorrow. Cheers, Ralf > > Hope it's useful, at any rate. > -Joe > > > On Sun, Jul 31, 2011 at 11:21 AM, Joe Kington wrote: > >> I'm not a signal processing expert by any means, but this is a standard >> problem in seismology. >> >> The problem is that your "window" has near-zero amplitude at high >> frequencies, so you're blowing up the high-frequency content of the noisy >> signal when you divide in the frequency domain. >> >> A "water-level" deconvolution is a very simple way around this, and often >> works well. It also allows you to skip the spline fitting, as it's >> effectively doing a low-pass filter. >> >> Basically, you just: >> 1) convert to the frequency domain >> 2) replace any amplitudes below some threshold with that threshold in the >> signal you're dividing by (the window, in your case) >> 3) pad the lengths to match >> 4) divide (the actual deconvolution) >> 5) convert back to the time domain >> >> As a simple implementation (I've left out the various different modes of >> padding here... This is effectively just mode='same'.) >> >> def water_level_decon(ys, window, eps=0.1): >> yfreq = np.fft.fft(ys) >> max_amp = yfreq.max() >> >> winfreq = np.fft.fft(window) >> winfreq[winfreq < eps] = eps >> >> padded = eps * np.ones_like(yfreq) >> padded[:winfreq.size] = winfreq >> >> newfreq = yfreq / padded >> newfreq *= max_amp / newfreq.max() >> >> return np.fft.ifft(newfreq) >> >> >> Hope that helps a bit. >> -Joe >> >> >> In most cases, you'll need to adjust the eps parameter to match the level >> of noise you want to remove. In your particular case >> On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi, >>> >>> For a measured signal that is the convolution of a real signal with a >>> response function, plus measurement noise on top, I want to recover the real >>> signal. Since I know what the response function is and the noise is >>> high-frequency compared to the real signal, a straightforward approach is to >>> smooth the measured signal (or fit a spline to it), then remove the response >>> function by deconvolution. See example code below. >>> >>> Can anyone point me towards code that does the deconvolution efficiently? >>> Perhaps signal.deconvolve would do the trick, but I can't seem to make it >>> work (except for directly on the output of np.convolve(y, window, >>> mode='valid')). >>> >>> Thanks, >>> Ralf >>> >>> >>> import numpy as np >>> from scipy import interpolate, signal >>> import matplotlib.pyplot as plt >>> >>> # Real signal >>> x = np.linspace(0, 10, num=201) >>> y = np.sin(x + np.pi/5) >>> >>> # Noisy signal >>> mode = 'valid' >>> window_len = 11. 
>>> window = np.ones(window_len) / window_len >>> y_meas = np.convolve(y, window, mode=mode) >>> y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >>> if mode == 'full': >>> xstep = x[1] - x[0] >>> x_meas = np.concatenate([ \ >>> np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >>> num=window_len//2), >>> x, >>> np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >>> num=window_len//2)]) >>> elif mode == 'valid': >>> x_meas = x[window_len//2:-window_len//2+1] >>> elif mode == 'same': >>> x_meas = x >>> >>> # Approximating spline >>> xs = np.linspace(0, 10, num=500) >>> knots = np.array([1, 3, 5, 7, 9]) >>> tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >>> ys = interpolate.splev(xs, tck, der=0) >>> >>> # Find (low-frequency part of) original signal by deconvolution of >>> smoothed >>> # measured signal and known window. >>> y_deconv = signal.deconvolve(ys, window)[0] #FIXME >>> >>> # Plot all signals >>> fig = plt.figure() >>> ax = fig.add_subplot(111) >>> >>> ax.plot(x, y, 'b-', label="Original signal") >>> ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >>> ax.plot(xs, ys, 'g-', label="Approximating spline") >>> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >>> label="signal.deconvolve result") >>> ax.set_ylim([-1.2, 2]) >>> ax.legend() >>> >>> plt.show() >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lou_boog2000 at yahoo.com Sun Jul 31 18:09:33 2011 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Sun, 31 Jul 2011 15:09:33 -0700 (PDT) Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: <1312150173.14383.YahooMailNeo@web34408.mail.mud.yahoo.com> I'm surprised that no one has brought it up, yet, but deconvolution has to be *very carefully* done. ?Look up "ill-posed problems". ?Yes, it can be a filter process, but just dividing by the filter FFT coefficients is dangerous since they approach zero (usually) as the frequency increases. ?That's the ill-posed part and it has to be controlled. ? "Regularization" is the what people call the controlling process. ?It's an extra assumption that one invokes on the deconvolution that limits the problems... if it's done right. ?There are surely lots of books and articles on Regularization and Deconvolution. ?Start with Google and Wikipedia and Google Scholar. ?It's not hard, but it is not a one-step problem, but more like an optimization problem. ? ? -- Lou Pecora, my views are my own. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkington at wisc.edu Sun Jul 31 18:10:24 2011 From: jkington at wisc.edu (Joe Kington) Date: Sun, 31 Jul 2011 14:10:24 -0800 Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 2:05 PM, Ralf Gommers wrote: > > > On Sun, Jul 31, 2011 at 9:22 PM, Joe Kington wrote: > >> Gah, I hit send too soon! >> >> The default eps parameter in that function should be more like 1.0e-6 >> instead of 0.1. >> >> You'll generally need to adjust the eps parameter to match the >> signal-to-noise ratio of the two signals you're deconvolving. >> > > Thanks Joe. 
I had tried something similar, but now I have a name for the > method and a confirmation that doing something like that makes sense. I'll > play with this idea some more tomorrow. > For what it's worth, the implementation I showed there is completely wrong. I wasn't thinking clearly when I wrote that out. The same concept should still work, though. The most glaring mistake is that things should be padded before converting to the frequency domain. More like this: def water_level_decon(y_meas, window, eps=0.1): padded = np.zeros_like(y_meas) padded[:window.size] = window yfreq = np.fft.fft(y_meas) winfreq = np.fft.fft(padded) winfreq[winfreq < eps] = eps newfreq = yfreq / winfreq return np.fft.ifft(newfreq) Hope it's useful, anyway. -Joe > > Cheers, > Ralf > > > >> >> Hope it's useful, at any rate. >> -Joe >> >> >> On Sun, Jul 31, 2011 at 11:21 AM, Joe Kington wrote: >> >>> I'm not a signal processing expert by any means, but this is a standard >>> problem in seismology. >>> >>> The problem is that your "window" has near-zero amplitude at high >>> frequencies, so you're blowing up the high-frequency content of the noisy >>> signal when you divide in the frequency domain. >>> >>> A "water-level" deconvolution is a very simple way around this, and often >>> works well. It also allows you to skip the spline fitting, as it's >>> effectively doing a low-pass filter. >>> >>> Basically, you just: >>> 1) convert to the frequency domain >>> 2) replace any amplitudes below some threshold with that threshold in the >>> signal you're dividing by (the window, in your case) >>> 3) pad the lengths to match >>> 4) divide (the actual deconvolution) >>> 5) convert back to the time domain >>> >>> As a simple implementation (I've left out the various different modes of >>> padding here... This is effectively just mode='same'.) >>> >>> def water_level_decon(ys, window, eps=0.1): >>> yfreq = np.fft.fft(ys) >>> max_amp = yfreq.max() >>> >>> winfreq = np.fft.fft(window) >>> winfreq[winfreq < eps] = eps >>> >>> padded = eps * np.ones_like(yfreq) >>> padded[:winfreq.size] = winfreq >>> >>> newfreq = yfreq / padded >>> newfreq *= max_amp / newfreq.max() >>> >>> return np.fft.ifft(newfreq) >>> >>> >>> Hope that helps a bit. >>> -Joe >>> >>> >>> In most cases, you'll need to adjust the eps parameter to match the level >>> of noise you want to remove. In your particular case >>> On Sun, Jul 31, 2011 at 9:56 AM, Ralf Gommers < >>> ralf.gommers at googlemail.com> wrote: >>> >>>> Hi, >>>> >>>> For a measured signal that is the convolution of a real signal with a >>>> response function, plus measurement noise on top, I want to recover the real >>>> signal. Since I know what the response function is and the noise is >>>> high-frequency compared to the real signal, a straightforward approach is to >>>> smooth the measured signal (or fit a spline to it), then remove the response >>>> function by deconvolution. See example code below. >>>> >>>> Can anyone point me towards code that does the deconvolution >>>> efficiently? Perhaps signal.deconvolve would do the trick, but I can't seem >>>> to make it work (except for directly on the output of np.convolve(y, window, >>>> mode='valid')). >>>> >>>> Thanks, >>>> Ralf >>>> >>>> >>>> import numpy as np >>>> from scipy import interpolate, signal >>>> import matplotlib.pyplot as plt >>>> >>>> # Real signal >>>> x = np.linspace(0, 10, num=201) >>>> y = np.sin(x + np.pi/5) >>>> >>>> # Noisy signal >>>> mode = 'valid' >>>> window_len = 11. 
>>>> window = np.ones(window_len) / window_len >>>> y_meas = np.convolve(y, window, mode=mode) >>>> y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 >>>> if mode == 'full': >>>> xstep = x[1] - x[0] >>>> x_meas = np.concatenate([ \ >>>> np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, >>>> num=window_len//2), >>>> x, >>>> np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, >>>> num=window_len//2)]) >>>> elif mode == 'valid': >>>> x_meas = x[window_len//2:-window_len//2+1] >>>> elif mode == 'same': >>>> x_meas = x >>>> >>>> # Approximating spline >>>> xs = np.linspace(0, 10, num=500) >>>> knots = np.array([1, 3, 5, 7, 9]) >>>> tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) >>>> ys = interpolate.splev(xs, tck, der=0) >>>> >>>> # Find (low-frequency part of) original signal by deconvolution of >>>> smoothed >>>> # measured signal and known window. >>>> y_deconv = signal.deconvolve(ys, window)[0] #FIXME >>>> >>>> # Plot all signals >>>> fig = plt.figure() >>>> ax = fig.add_subplot(111) >>>> >>>> ax.plot(x, y, 'b-', label="Original signal") >>>> ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") >>>> ax.plot(xs, ys, 'g-', label="Approximating spline") >>>> ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', >>>> label="signal.deconvolve result") >>>> ax.set_ylim([-1.2, 2]) >>>> ax.legend() >>>> >>>> plt.show() >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srey at asu.edu Sun Jul 31 20:23:25 2011 From: srey at asu.edu (Serge Rey) Date: Sun, 31 Jul 2011 17:23:25 -0700 Subject: [SciPy-User] ANN: PySAL 1.2 Message-ID: Hi all, On behalf of the PySAL development team, I'm happy to announce the official release of PySAL 1.2. PySAL is a library of tools for spatial data analysis and geocomputation written in Python. PySAL 1.2, the third official release of PySAL, brings a number of new features: - Directional (Space-Time) LISA Analytics - LISA Markov Spillover Test - Spatial Markov Homogeneity Tests - Markov Mobillity Indices - Support for a wide variety of spatial weights formats including - AcGIS SWM, Text, DBF - STATA - MATLAB MAT - MatrixMarket MTX - Lotus WK1 - DAT - and others - RTree spatial index - Getis-Ord G statistics for global and local autocorrelation - Optimized conditional randomization for local statistics - Optimized Block/Regime Spatial Weights - Thin Spatial Sparse Weights Class (WSP) along with many smaller enhancements and bug fixes. PySAL modules ------------- - pysal.core ? Core Data Structures and IO - pysal.cg ? Computational Geometry - pysal.esda ? Exploratory Spatial Data Analysis - pysal.inequality ? Spatial Inequality Analysis - pysal.spatial_dynamics ? Spatial Dynamics - pysal.spreg - Regression and Diagnostics - pysal.region ? Spatially Constrained Clustering - pysal.weights ? Spatial Weights - pysal.FileIO ? 
PySAL FileIO: Module for reading and writing various file types in a Pythonic way Downloads -------------- Binary installers and source distributions are available for download at http://code.google.com/p/pysal/downloads/list Documentation ------------- The documentation site is here http://pysal.org/1.2/contents.html Web sites --------- PySAL's home is here http://pysal.org/ The developer's site is here http://code.google.com/p/pysal/ Mailing Lists ------------- Please see the developer's list here http://groups.google.com/group/pysal-dev Help for users is here http://groups.google.com/group/openspace-list Bug reports ----------- To search for or report bugs, please see http://code.google.com/p/pysal/issues/list License information ------------------- See the file "LICENSE.txt" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES. Many thanks to all who contributed! Serge, on behalf of the PySAL development team. -- Sergio (Serge) Rey Professor, School of Geographical Sciences and Urban Planning GeoDa Center for Geospatial Analysis and Computation Arizona State University http://geoplan.asu.edu/rey Editor, International Regional Science Review http://irx.sagepub.com From david_baddeley at yahoo.com.au Sun Jul 31 20:51:28 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Sun, 31 Jul 2011 17:51:28 -0700 (PDT) Subject: [SciPy-User] deconvolution of 1-D signals In-Reply-To: References: Message-ID: <1312159888.10394.YahooMailRC@web113403.mail.gq1.yahoo.com> Hi Ralf, I do a reasonable amount of (2 & 3D) deconvolution of microscopy images and the method I use depends quite a lot on the exact properties of the signal. You can usually get away with fft based convolutions even if your signal is not periodic as long as your kernel is significantly smaller than the signal extent. As Joe mentioned, for a noisy signal convolving with the inverse or performing fourier domain division doesn't work as you end up amplifying high frequency noise components. You thus need some form of regularisation. The thresholding of fourier components that Joe suggests does this, but you might also want to explore more sophisticated options, the simplest of which is probably Wiener filtering (http://en.wikipedia.org/wiki/Wiener_deconvolution). If you've got a signal which is constrained to be positive, it's often useful to introduce a positivity constraint on the deconvolution result which generally means you need an iterative algorithm. The choice of algorithm should also depend on the type of noise that is present in your signal - my image data is constrained to be +ve and typically has either Poisson or a mixture of Poisson and Gaussian noise and I use either the Richardson-Lucy or a weighted version of ICTM (Iterative Constrained Tikhonov-Miller) algorithm. I can provide more details of these if required. cheers, David ________________________________ From: Ralf Gommers To: SciPy Users List Sent: Mon, 1 August, 2011 5:56:49 AM Subject: [SciPy-User] deconvolution of 1-D signals Hi, For a measured signal that is the convolution of a real signal with a response function, plus measurement noise on top, I want to recover the real signal. Since I know what the response function is and the noise is high-frequency compared to the real signal, a straightforward approach is to smooth the measured signal (or fit a spline to it), then remove the response function by deconvolution. See example code below. 
Can anyone point me towards code that does the deconvolution efficiently? Perhaps signal.deconvolve would do the trick, but I can't seem to make it work (except for directly on the output of np.convolve(y, window, mode='valid')). Thanks, Ralf import numpy as np from scipy import interpolate, signal import matplotlib.pyplot as plt # Real signal x = np.linspace(0, 10, num=201) y = np.sin(x + np.pi/5) # Noisy signal mode = 'valid' window_len = 11. window = np.ones(window_len) / window_len y_meas = np.convolve(y, window, mode=mode) y_meas += 0.2 * np.random.rand(y_meas.size) - 0.1 if mode == 'full': xstep = x[1] - x[0] x_meas = np.concatenate([ \ np.linspace(x[0] - window_len//2 * xstep, x[0] - xstep, num=window_len//2), x, np.linspace(x[-1] + xstep, x[-1] + window_len//2 * xstep, num=window_len//2)]) elif mode == 'valid': x_meas = x[window_len//2:-window_len//2+1] elif mode == 'same': x_meas = x # Approximating spline xs = np.linspace(0, 10, num=500) knots = np.array([1, 3, 5, 7, 9]) tck = interpolate.splrep(x_meas, y_meas, s=0, k=3, t=knots, task=-1) ys = interpolate.splev(xs, tck, der=0) # Find (low-frequency part of) original signal by deconvolution of smoothed # measured signal and known window. y_deconv = signal.deconvolve(ys, window)[0] #FIXME # Plot all signals fig = plt.figure() ax = fig.add_subplot(111) ax.plot(x, y, 'b-', label="Original signal") ax.plot(x_meas, y_meas, 'r-', label="Measured, noisy signal") ax.plot(xs, ys, 'g-', label="Approximating spline") ax.plot(xs[window.size//2-1:-window.size//2], y_deconv, 'k-', label="signal.deconvolve result") ax.set_ylim([-1.2, 2]) ax.legend() plt.show() -------------- next part -------------- An HTML attachment was scrubbed... URL:
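For completeness, here is a minimal sketch of the Wiener deconvolution that David Baddeley points to above, applied to the thread's running example. This is not code from the thread: the function name wiener_deconvolve is made up for illustration, a flat noise spectrum is assumed, and the constant noise-to-signal power ratio nsr is an assumption that has to be tuned to the data, much like the eps water level in Joe Kington's version.

import numpy as np

def wiener_deconvolve(y_meas, window, nsr=0.05):
    # Frequency-domain Wiener deconvolution with an assumed constant
    # noise-to-signal power ratio `nsr`. As nsr -> 0 this reduces to plain
    # division by the window's transfer function; the nsr term keeps the
    # near-zero |H(f)| frequencies from amplifying the noise (the
    # ill-posedness Lou Pecora warns about).
    n = len(y_meas)
    H = np.fft.rfft(window, n)        # transfer function, zero-padded to n
    Y = np.fft.rfft(y_meas, n)
    G = np.conj(H) / (np.abs(H)**2 + nsr)
    return np.fft.irfft(Y * G, n)

# Usage on the thread's example (mode='same' so the lengths match):
x = np.linspace(0, 10, num=201)
y = np.sin(x + np.pi/5)
window = np.ones(11) / 11.
y_meas = np.convolve(y, window, mode='same') + 0.2 * np.random.rand(x.size) - 0.1

y_est = wiener_deconvolve(y_meas, window, nsr=0.05)
# The window is padded at the origin rather than centred, so the estimate comes
# back circularly shifted by half the window length; roll it back before
# comparing with y (some distortion near the boundaries remains).
y_est = np.roll(y_est, len(window) // 2)

With a larger nsr the result is smoother but more biased; with a smaller one more high-frequency noise leaks through, which is the usual regularisation trade-off.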