From contact at pythonxy.com Fri Jan 1 08:43:17 2010
From: contact at pythonxy.com (Pierre Raybaut)
Date: Fri, 1 Jan 2010 14:43:17 +0100
Subject: [SciPy-User] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation
Message-ID: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com>

Hi David,

Following your announcement of the 'toydist' module, I think that your project is very promising: this is certainly a great idea, and it will be very controversial, but that's because people's expectations are high on this matter (distutils is so disappointing indeed).

Anyway, if I may be useful, I'll gladly contribute to it. In time, I could change the whole Python(x,y) packaging system (which is currently quite ugly... but easy/quick to manage/maintain) to use/promote this new module.

Happy New Year! and Long Live Scientific Python! ;-)

Cheers,
Pierre

From cournape at gmail.com Sat Jan 2 02:51:38 2010
From: cournape at gmail.com (David Cournapeau)
Date: Sat, 2 Jan 2010 16:51:38 +0900
Subject: [SciPy-User] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation
In-Reply-To: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com>
References: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com>
Message-ID: <5b8d13221001012351w4feda89bj13e67d102318076d@mail.gmail.com>

On Fri, Jan 1, 2010 at 10:43 PM, Pierre Raybaut wrote:
> [...]
> Anyway, if I may be useful, I'll gladly contribute to it.
> In time, I could change the whole Python(x,y) packaging system (which
> is currently quite ugly... but easy/quick to manage/maintain) to
> use/promote this new module.

That would be a good way to test toydist on a real, complex package. I am not familiar at all with python(x,y) internals. Do you have some explanation I could look at somewhere?

In the meantime, I will try to clean up the code to have a first experimental release.

cheers,
David

From contact at pythonxy.com Sat Jan 2 05:40:16 2010
From: contact at pythonxy.com (Pierre Raybaut)
Date: Sat, 2 Jan 2010 11:40:16 +0100
Subject: [SciPy-User] [SPAM] Re: [Numpy-discussion] Announcing toydist, improving distribution and packaging situation
In-Reply-To: <5b8d13221001012351w4feda89bj13e67d102318076d@mail.gmail.com>
References: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com> <5b8d13221001012351w4feda89bj13e67d102318076d@mail.gmail.com>
Message-ID: <629b08a41001020240y642518f4r68f4a6a3860a3eee@mail.gmail.com>

2010/1/2 David Cournapeau:
> That would be a good way to test toydist on a real, complex package. I
> am not familiar at all with python(x,y) internals.
> Do you have some explanation I could look at somewhere?

Honestly, let's assume that there is currently no packaging system... it would not be very far from the truth. I did it when I was young and naive regarding Python. Actually I almost did it without having written any code in Python (approx. two months after hearing about the Python language for the first time): it's an ugly collection of AutoIt, NSIS and PHP scripts -- most of the tasks are automated, like updating the generated website pages and so on. So I'm not proud of it at all, but it was easy and very quick to do as it is, and it's still quite easy to maintain. But it's not satisfying in terms of code "purity" -- I've been wanting to rewrite all this in Python for a year and a half, but since the features are there, there is no real motivation to do the work (in other words, Python(x,y) users would not see the difference, at least at the beginning).

Another thing: Python(x,y) plugins are not built from source but from existing binaries (it's a pity, I know, but it was incredibly faster to do it this way). For example, eggs or distutils .exe installers may be converted into Python(x,y) plugins directly (same internal directory structure). So it may be different from the idea you had in mind (it's not like EPD, which is entirely generated from source, AFAIK).

> In the meantime, I will try to clean-up the code to have a first
> experimental release.

Ok, keep up the good work!

Cheers,
Pierre

From tpk at kraussfamily.org Sat Jan 2 15:42:30 2010
From: tpk at kraussfamily.org (Tom K.)
Date: Sat, 2 Jan 2010 12:42:30 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] [ANN] upfirdn 0.2.0
Message-ID: <26996317.post@talk.nabble.com>

ANNOUNCEMENT

I am pleased to announce a new release of "upfirdn" - version 0.2.0. This package provides an efficient polyphase FIR resampler object (SWIG-ed C++) and some Python wrappers. This release greatly improves installation with distutils relative to the initial 0.1.0 release. 0.2.0 includes no functional changes relative to 0.1.0. Also, the source code is now browsable online through a Google Code site with a Mercurial repository.

https://opensource.motorola.com/sf/projects/upfirdn
http://code.google.com/p/upfirdn/

Thanks to Google for providing this hosting service!

From peter.shepard at gmail.com Sat Jan 2 18:16:27 2010
From: peter.shepard at gmail.com (Pete Shepard)
Date: Sat, 2 Jan 2010 15:16:27 -0800
Subject: [SciPy-User] fisher's exact.py stalls?
Message-ID: <5c2c43621001021516w1285568ci18c75db9e54409a5@mail.gmail.com>

Hello,

I am using "fishersexact.py" to compare two long (~10,000-entry) lists of ratios. Each time I do this, the program gets stuck. I can print the two ratios that come just before the program stalls, and if I give these numbers directly to the subroutine it seems to process them just fine. I am wondering if anyone has had a similar issue with the "fishersexact.py" subroutine?

Thanks,

From josef.pktd at gmail.com Sat Jan 2 18:42:38 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 2 Jan 2010 18:42:38 -0500
Subject: [SciPy-User] fisher's exact.py stalls?
In-Reply-To: <5c2c43621001021516w1285568ci18c75db9e54409a5@mail.gmail.com>
References: <5c2c43621001021516w1285568ci18c75db9e54409a5@mail.gmail.com>
Message-ID: <1cd32cbb1001021542iafa2d7fn7c8a09694cbd466@mail.gmail.com>

On Sat, Jan 2, 2010 at 6:16 PM, Pete Shepard wrote:
> I am using "fishersexact.py" to compare two long (~10,000-entry) lists of
> ratios. Each time I do this, the program gets stuck. [...]

Do you mean fisherexact in the scipy trac?

If yes, did you apply http://projects.scipy.org/scipy/ticket/956#comment:10 by tkharris, which found and fixes one endless loop?

I was working on the ticket, but got stuck with some test failures that I haven't figured out.

Do you know for which table the function gets stuck?

Josef

From peter.shepard at gmail.com Sat Jan 2 19:10:58 2010
From: peter.shepard at gmail.com (Pete Shepard)
Date: Sat, 2 Jan 2010 16:10:58 -0800
Subject: [SciPy-User] fisher's exact.py stalls?
In-Reply-To: <1cd32cbb1001021542iafa2d7fn7c8a09694cbd466@mail.gmail.com>
References: <5c2c43621001021516w1285568ci18c75db9e54409a5@mail.gmail.com> <1cd32cbb1001021542iafa2d7fn7c8a09694cbd466@mail.gmail.com>
Message-ID: <5c2c43621001021610g26b4746cu6a77659b91aa2d79@mail.gmail.com>

That did the trick, thanks.

On Sat, Jan 2, 2010 at 3:42 PM, josef.pktd wrote:
> [...]

From timmichelsen at gmx-topmail.de Mon Jan 4 12:11:42 2010
From: timmichelsen at gmx-topmail.de (Tim Michelsen)
Date: Mon, 4 Jan 2010 17:11:42 +0000 (UTC)
Subject: [SciPy-User] scikits.timeseries.tsfromtxt & guess

Hello,
I first want to stress again that tsfromtxt in the timeseries scikit is a real killer function. Once one has understood the ease of the "dateconverter" function, it becomes a quick exercise to read in time series from ASCII files.
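To illustrate the pattern, a minimal converter-based read might look like the sketch below (this is not from Tim's mail; the file name and column layout are invented for illustration):

import scikits.timeseries as ts

# Hypothetical file whose first three columns hold year, month and day.
def daily_converter(year, month, day):
    return ts.Date('D', year=int(year), month=int(month), day=int(day))

series = ts.tsfromtxt('measurements.txt', skiprows=1, datecols=(0, 1, 2),
                      dateconverter=daily_converter)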
As I am currently predefining a set of dateconverters for frequently used date-time combinations in different formats, I have the following question: is it possible to integrate "ts.extras.guess_freq(dates)" into the function scikits.timeseries.tsfromtxt? Currently, I would need to read a file twice: once to guess the frequency based on a created list of dates, and then again to create the timeseries.

Ideally, I would like to do:

def mydateconverter(year, month, day, hour):
    freq = ts.extras.guess_freq(year, month, day, hour)
    ts_date = ts.Date(freq, year=int(year), month=int(month), day=int(day))
    return ts_date

myts = ts.tsfromtxt(datafile, skiprows=1, names=None,
                    datecols=(1,2,3), guess_freq=True,
                    dateconverter=mydateconverter)

Or is this already possible and I am just not getting this right?

How can I pass a frequency value to the dateconverter argument? Like:

def mydateconverter(year, month, day, hour, freq='T'):
    freq = ts.extras.guess_freq(year, month, day, hour)
    ts_date = ts.Date(freq, year=int(year), month=int(month), day=int(day))
    return ts_date

myts = ts.tsfromtxt(datafile, skiprows=1, names=None,
                    datecols=(1,2,3), guess_freq=True,
                    dateconverter=mydateconverter(freq='H'))

I get this error then:
TypeError: mydateconverter() takes at least 2 non-keyword arguments (0 given)

Thanks in advance for any hints,
Timmie

From timmichelsen at gmx-topmail.de Mon Jan 4 12:19:20 2010
From: timmichelsen at gmx-topmail.de (Tim Michelsen)
Date: Mon, 4 Jan 2010 17:19:20 +0000 (UTC)
Subject: [SciPy-User] How to concatenate timeseries

Hello,
I am reading timeseries data from different files covering various successive time intervals. What is the best method to concatenate these into one long-running time series?

I tried:

import scikits.timeseries as ts
series = ts.time_series([0,1,2,3], start_date=ts.Date(freq='A', year=2005))
series1 = ts.time_series([0,1,2,3], start_date=ts.Date(freq='A', year=2009))

import numpy as np
full = np.concatenate([series, series1])

But the full series then has the frequency 'U' for undefined. What am I missing?

Thanks,
Timmie

From pgmdevlist at gmail.com Mon Jan 4 12:50:34 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Mon, 4 Jan 2010 12:50:34 -0500
Subject: [SciPy-User] How to concatenate timeseries
Message-ID: <255AD4CD-3076-4032-AA76-692E19B2A945@gmail.com>

On Jan 4, 2010, at 12:19 PM, Tim Michelsen wrote:
> What is the best method to concatenate these into one long-running time series?
> [...]

Use the concatenate function that comes with scikits.timeseries:

>>> ts.concatenate([series,series1])
timeseries([0 1 2 3 0 1 2 3],
   dates = [2005 ... 2012],
   freq  = A-DEC)

ts.concatenate tests whether the series have the same frequency, and optional parameters let you decide what to do with duplicates.
From pgmdevlist at gmail.com Mon Jan 4 13:21:44 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Mon, 4 Jan 2010 13:21:44 -0500
Subject: [SciPy-User] scikits.timeseries.tsfromtxt & guess
Message-ID: <8DEEF015-ECDF-49F5-9B1B-8E940A7249D2@gmail.com>

On Jan 4, 2010, at 12:11 PM, Tim Michelsen wrote:
> Hello,
> I first want to stress again that the tsfromtxt in the timeseries scikit is a
> real killer function.

My, thanks a lot.

> Once one has understood the ease of the "dateconverter" function it becomes
> a quick exercise to read in time series from ASCII files.
>
> As I am currently predefining a set of dateconverters for frequently used
> date-time combinations in different formats, I have the following question:
> is it possible to integrate "ts.extras.guess_freq(dates)" into the function
> scikits.timeseries.tsfromtxt?

Probably, but I doubt it'll be very different from the current behavior. See, the dateconverter function transforms a series of strings into a unique Date, independently for each row of the input. You can't guess the frequency of an individual Date; you need several Dates to compare their lags. That means that, no matter what, you'll have to reprocess the array. I'd prefer to leave this operation up to the user...

> Currently, I would need to read a file twice: once for guessing the frequency
> based on a created list of dates and then read the file again to create the
> timeseries.

I'd first create the time series from the input, then try to guess the frequency from the DateArray.

> How can I pass a frequency value to the dateconverter argument?
> [...]
> I get this error then:
> TypeError: mydateconverter() takes at least 2 non-keyword arguments (0 given)

Please send a small example of datafile so that I can test what goes wrong. If I have to guess: the line `dateconverter=mydateconverter(freq='H')` forces a call to mydateconverter without any argument (but for the frequency). Of course, that won't fly. What you want is to have `mydateconverter(freq='H')` callable. You should probably create a class that takes a frequency as instantiation input and that has a __call__ method, something like:

class myconverter(object):
    def __init__(self, freq='D'):
        self.freq = freq
    def __call__(self, y, m, d, h):
        return ts.Date(self.freq, year=int(y), month=int(m),
                       day=int(d), hour=int(h))

That way, myconverter(freq='T') becomes a valid function (you can call it).
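A usage sketch of the class above (hedged: the file name and column layout are assumptions, not from the thread):

# Instantiate once per frequency, then hand the instance to tsfromtxt;
# the four datecols feed the y, m, d, h arguments of __call__.
hourly = myconverter(freq='H')
series = ts.tsfromtxt('logger.csv', skiprows=1, datecols=(0, 1, 2, 3),
                      dateconverter=hourly)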
From timmichelsen at gmx-topmail.de Mon Jan 4 14:58:33 2010
From: timmichelsen at gmx-topmail.de (Tim Michelsen)
Date: Mon, 04 Jan 2010 20:58:33 +0100
Subject: [SciPy-User] How to concatenate timeseries
In-Reply-To: <255AD4CD-3076-4032-AA76-692E19B2A945@gmail.com>
References: <255AD4CD-3076-4032-AA76-692E19B2A945@gmail.com>

> Use the concatenate function that comes with scikits.timeseries
>>>> ts.concatenate([series,series1])
> timeseries([0 1 2 3 0 1 2 3], dates = [2005 ... 2012], freq = A-DEC)
>
> ts.concatenate tests whether the series have the same frequency, and
> optional parameters let you decide what to do with duplicates.

Must have overlooked that. But it isn't in the docs either:
http://pytseries.sourceforge.net/search.html?q=concatenate

From timmichelsen at gmx-topmail.de Mon Jan 4 15:25:30 2010
From: timmichelsen at gmx-topmail.de (Tim Michelsen)
Date: Mon, 04 Jan 2010 21:25:30 +0100
Subject: [SciPy-User] scikits.timeseries.tsfromtxt & guess
In-Reply-To: <8DEEF015-ECDF-49F5-9B1B-8E940A7249D2@gmail.com>
References: <8DEEF015-ECDF-49F5-9B1B-8E940A7249D2@gmail.com>

>> I first want to stress again that the tsfromtxt in the timeseries scikit is a
>> real killer function.
>
> My, thanks a lot

Yes, you may remember all my questions (still at the beginning of my scipy learning curve) on the data loading and creation of masked time series... This is now all obsolete. And as I receive (logger) data in wicked formats, counting hours not from 0-23 but rather from 1-24, I appreciate the dateconverters, which are based on robust datetime manipulations.

> I'd first create the time series from the input, then try to guess the frequency from the DateArray

So you'd recommend creating the timeseries using the user-defined frequency ('U'):

def mydateconverter(year, month, day, hour, freq='U'):
    freq = ts.extras.guess_freq(year, month, day, hour)
    ts_date = ts.Date(freq, year=int(year), month=int(month), day=int(day))
    return ts_date

and then using guess_freq to assign the correct one?

I want to have the dateconverters in a flexible style, varying only by input format and columns used. They should work regardless of the frequency (be the data set hourly or minutely).

> What you want is to have `mydateconverter(freq='H')` callable. You should
> probably create a class that takes a frequency as instantiation input and
> that has a __call__ method.
> [...]
> That way, myconverter(freq='T') becomes a valid function (you can call it).

Thanks. I will try this way.

Best regards,
Timmie
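One possible shape for the two-pass approach discussed above (a sketch under stated assumptions only: that guess_freq accepts a list of datetime objects, and that ts.date_array can rebuild the dates at the guessed frequency; datafile and mydateconverter as in Tim's mails):

import scikits.timeseries as ts

# Pass 1: read with an undefined ('U') frequency.
raw = ts.tsfromtxt(datafile, skiprows=1, datecols=(1, 2, 3),
                   dateconverter=mydateconverter)
# Pass 2: guess the frequency from the resulting dates and rebuild.
dates = raw.dates.tolist()
freq = ts.extras.guess_freq(dates)
fixed = ts.time_series(raw.series, dates=ts.date_array(dates, freq=freq))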
From pgmdevlist at gmail.com Mon Jan 4 15:35:57 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Mon, 4 Jan 2010 15:35:57 -0500
Subject: [SciPy-User] scikits.timeseries.tsfromtxt & guess
References: <8DEEF015-ECDF-49F5-9B1B-8E940A7249D2@gmail.com>
Message-ID: <3CD2A9F8-78D1-4FFF-901E-BFB60B25414F@gmail.com>

On Jan 4, 2010, at 3:25 PM, Tim Michelsen wrote:
>> I'd first create the time series from the input, then try to guess the frequency from the DateArray
> So you'd recommend creating the timeseries using the user-defined frequency ('U')
> [...]
> and then using guess_freq to assign the correct one?

Basically, yes. Note that guess_freq is only for convenience; it might not be fool-proof...

> I want to have the dateconverters in a flexible style, varying only by
> input format and columns used. They should work regardless of the
> frequency (be the data set hourly or minutely).

Well, you could define a converter class that takes freq as input and test in the __call__ for the value of the freq. You could have a variable number of inputs in __call__ and test for the number of parameters (year, month, day...). It won't be as efficient as defining a specific converter for your data, though...

From dpfrota at yahoo.com.br Wed Jan 6 00:15:37 2010
From: dpfrota at yahoo.com.br (dpfrota)
Date: Tue, 5 Jan 2010 21:15:37 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6
In-Reply-To: <3d375d730911172231i4cf42760l80038a00f84fa7c8@mail.gmail.com>
References: <4AE5DEDF.7070701@asu.edu> <26402986.post@talk.nabble.com> <3d375d730911172231i4cf42760l80038a00f84fa7c8@mail.gmail.com>
Message-ID: <27026778.post@talk.nabble.com>

Robert Kern-2 wrote:
> On Wed, Nov 18, 2009 at 00:29, dpfrota wrote:
>> What is the meaning of these addresses? I opened these files, and they
>> have some strange lines. The first file has only
>> " __import__('pkg_resources').declare_namespace(__name__) ". Is module
>> PKG necessary?
>
> These enable the scikits namespace such that you can have multiple
> scikits packages installed (possibly to separate locations).

I made some tests and I am almost sure the problem is with this file: "C:\Python26\Lib\site-packages\scikits\audiolab\pysndfile\_sndfile.pyd". But I don't know how to see its contents or fix the problem. Any more tips? (Please!)

From j33433 at gmail.com Wed Jan 6 22:43:39 2010
From: j33433 at gmail.com (James)
Date: Wed, 6 Jan 2010 22:43:39 -0500
Subject: [SciPy-User] timeseries and candlestick()

Has anyone managed to plot a candlestick chart with a timeseries? Is there an easy way to wrap the matplotlib.finance.candlestick call?

James
From jordi_molins at hotmail.com Thu Jan 7 04:09:55 2010
From: jordi_molins at hotmail.com (Jordi Molins Coronado)
Date: Thu, 7 Jan 2010 10:09:55 +0100
Subject: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup?

Hello, I am new to this forum. I am looking for a numerical solution to the inverse problem of an Ising model (or a model not unlike the Ising model, see below). I have seen an old discussion, but a very interesting one, about this subject on this forum (http://mail.scipy.org/pipermail/scipy-user/2006-October/009703.html). I would like to pose my problem (which is quite similar to the problem discussed in the thread above) and kindly ask your opinion on it.

My space is a set of discrete nodes s_i, where i=1,...,N, which can take two values, {0,1}. Empirically I have the following information: <s_i>_emp and <s_i s_j>_emp, where i,j=1,...,N with i!=j.

It is well known in the literature that the Ising model

P(s_1, s_2, ..., s_N) = 1/Z * exp( sum_i(h_i*s_i) + 0.5*sum_{i!=j}(J_ij*s_i*s_j) )

maximizes entropy with the constraints given above (in fact, this is not the Ising model, because the Ising model assumes only nearest-neighbour interactions, and I have interactions with all other nodes, but I believe it is still true that the above P(s_1,...,s_N) maximizes entropy given the constraints).

What I would like is to solve the inverse problem of finding the h_i and J_ij which maximize entropy given my constraints. However, I would like to restrict the number of h_i and J_ij possible, since having complete freedom could become an unwieldy problem. For example, I could restrict h_i = H and J_ij = J for all i,j=1,...,N, i!=j; or I could have a partition of my nodes, say nodes from 1 to M having h_i = H1 and J_ij = J1 for i,j=1,...,M, i!=j, and h_i = H2 and J_ij = J2 for i,j=M+1,...,N, i!=j.

If I understand correctly the discussion in the thread shown above, a numerical solution for the inverse problem would be:

hi_{new} = hi_{old} + K * (<s_i> - <s_i>_{emp})
Jij_{new} = Jij_{old} + K' * (<s_i s_j> - <s_i s_j>_{emp})

where K and K' are positive "step size" constants. (On the RHS, <s_i> and <s_i s_j> are w.r.t. hi_{old} and Jij_{old}.)

Have I understood all this correctly? In particular, for the case h_i = H and J_ij = J for all i,j=1,...,N, i!=j, could I simplify the previous algorithm by restricting the calculations to, say, i=1 only (i=2,...,N should be the same?), and for the partitioned case, simplify it by restricting the calculations to, say, i=1 and i=M+1?

Thank you for your help, and sorry if I am new here and have committed some "etiquette" mistake.

Jordi

From jordi_molins at hotmail.com Thu Jan 7 04:19:30 2010
From: jordi_molins at hotmail.com (Jordi Molins Coronado)
Date: Thu, 7 Jan 2010 10:19:30 +0100
Subject: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup?

Sorry, I see my previous message has been a disaster in formatting. I will try now in a different way. Sorry for the inconvenience.

Hello, I am new to this forum. I am looking for a numerical solution to the inverse problem of an Ising model (or a model not unlike the Ising model, see below). I have seen an old discussion, but a very interesting one, about this subject on this forum (http://mail.scipy.org/pipermail/scipy-user/2006-October/009703.html).
I would like to pose my problem (which is quite similar to the problem discussed in the thread above) and kindly ask your opinion on it.

My space is a set of discrete nodes s_i, where i=1,...,N, which can take two values, {0,1}. Empirically I have the following information: <s_i>_emp and <s_i s_j>_emp, where i,j=1,...,N with i!=j.

It is well known in the literature that the Ising model

P(s_1, s_2, ..., s_N) = 1/Z * exp( sum{for all i}(h_i*s_i) + 0.5*sum{for all i!=j}(J_ij*s_i*s_j) )

maximizes entropy with the constraints given above (in fact, this is not the Ising model, because the Ising model assumes only nearest-neighbour interactions, and I have interactions with all other nodes, but I believe it is still true that the above P(s_1,...,s_N) still maximizes entropy given the constraints above).

What I would like is to solve the inverse problem of finding the h_i and J_ij which maximize entropy given my constraints. However, I would like to restrict the number of h_i and J_ij possible, since having complete freedom could become an unwieldy problem. For example, I could restrict h_i = H and J_ij = J for all i,j=1,...,N, i!=j; or I could have a partition of my nodes, say nodes from 1 to M having h_i = H1 and J_ij = J1 for i,j=1,...,M, i!=j, and h_i = H2 and J_ij = J2 for i,j=M+1,...,N, i!=j, and J_ij = J3 for i=1,...,M and j=M+1,...,N.

If I understand correctly the discussion in the thread shown above, a numerical solution for the inverse problem would be:

hi_{new} = hi_{old} + K * (<s_i> - <s_i>_{emp})
Jij_{new} = Jij_{old} + K' * (<s_i s_j> - <s_i s_j>_{emp})

where K and K' are positive "step size" constants. (On the RHS, <s_i> and <s_i s_j> are w.r.t. hi_{old} and Jij_{old}.)

Have I understood all this correctly? In particular, for the case h_i = H and J_ij = J for all i,j=1,...,N, i!=j, could I simplify the previous algorithm by restricting the calculations to, say, i=1 only (i=2,...,N should be the same?), and for the partitioned case, simplify it by restricting the calculations to, say, i=1 and i=M+1?

Thank you for your help, and sorry if I am new here and have committed some "etiquette" mistake.

Jordi

From josef.pktd at gmail.com Thu Jan 7 12:04:59 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 7 Jan 2010 12:04:59 -0500
Subject: [SciPy-User] multidimensional signal.convolve semivalid
Message-ID: <1cd32cbb1001070904u53b07fe5laa7654446b2b5a5c@mail.gmail.com>

Simplest case: I have two signals and I want to apply two linear filters with convolve. As a result, I want to get two signals, given by the convolution of the input signal with each of the filter arrays.

I can either loop over the filter arrays with valid mode, which produces the desired result:

signal.convolve(x, a3f[:,:,0], mode='valid')
signal.convolve(x, a3f[:,:,1], mode='valid')

or I can do one 3-dimensional convolution and throw away two thirds of the calculation:

signal.convolve(x[:,:,None], a3f)[:,1,:]

I didn't manage to get valid or same mode to return the results that I wanted. Is there a way to do it without a loop or redundant calculations?

Background: this will be the fastest way to filter and work with vector autoregressive processes. Example below.

Thanks,
Josef

>>> x = np.arange(40).reshape((2,20)).T
>>> a3f[:,:,0]
array([[ 0.5,  1. ],
       [ 0.5,  1. ]])
>>> a3f[:,:,1]
array([[ 1. ,  0.5],
       [ 1. ,  0.5]])
>>> signal.convolve(x[:,:,None],a3f)[:,1,:]
array([[ 10. ,  20. ],
       [ 21.5,  41.5],
       [ 24.5,  44.5],
       [ 27.5,  47.5],
       [ 30.5,  50.5],
       [ 33.5,  53.5],
       [ 36.5,  56.5],
       [ 39.5,  59.5],
       [ 42.5,  62.5],
       [ 45.5,  65.5],
       [ 48.5,  68.5],
       [ 51.5,  71.5],
       [ 54.5,  74.5],
       [ 57.5,  77.5],
       [ 60.5,  80.5],
       [ 63.5,  83.5],
       [ 66.5,  86.5],
       [ 69.5,  89.5],
       [ 72.5,  92.5],
       [ 75.5,  95.5],
       [ 38.5,  48.5]])
>>> signal.fftconvolve(x[:,:,None],a3f).shape
(21, 3, 2)
>>> signal.fftconvolve(x[:,:,None],a3f)[:,1,:]
array([[ 10. ,  20. ],
       [ 21.5,  41.5],
       [ 24.5,  44.5],
       [ 27.5,  47.5],
       [ 30.5,  50.5],
       [ 33.5,  53.5],
       [ 36.5,  56.5],
       [ 39.5,  59.5],
       [ 42.5,  62.5],
       [ 45.5,  65.5],
       [ 48.5,  68.5],
       [ 51.5,  71.5],
       [ 54.5,  74.5],
       [ 57.5,  77.5],
       [ 60.5,  80.5],
       [ 63.5,  83.5],
       [ 66.5,  86.5],
       [ 69.5,  89.5],
       [ 72.5,  92.5],
       [ 75.5,  95.5],
       [ 38.5,  48.5]])
>>> signal.fftconvolve(x[:,:],a3f[:,:,0]).shape
(21, 3)
>>> signal.fftconvolve(x[:,:],a3f[:,:,0], mode='valid')
array([[ 21.5],
       [ 24.5],
       [ 27.5],
       [ 30.5],
       [ 33.5],
       [ 36.5],
       [ 39.5],
       [ 42.5],
       [ 45.5],
       [ 48.5],
       [ 51.5],
       [ 54.5],
       [ 57.5],
       [ 60.5],
       [ 63.5],
       [ 66.5],
       [ 69.5],
       [ 72.5],
       [ 75.5]])
>>> signal.fftconvolve(x[:,:],a3f[:,:,1], mode='valid')
array([[ 41.5],
       [ 44.5],
       [ 47.5],
       [ 50.5],
       [ 53.5],
       [ 56.5],
       [ 59.5],
       [ 62.5],
       [ 65.5],
       [ 68.5],
       [ 71.5],
       [ 74.5],
       [ 77.5],
       [ 80.5],
       [ 83.5],
       [ 86.5],
       [ 89.5],
       [ 92.5],
       [ 95.5]])

From dwf at cs.toronto.edu Thu Jan 7 14:35:48 2010
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Thu, 7 Jan 2010 14:35:48 -0500
Subject: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup?
Message-ID: <90B92C0A-E286-4E7E-8BBA-E3DAC0792E28@cs.toronto.edu>

On 7-Jan-10, at 4:19 AM, Jordi Molins Coronado wrote:
> However, I would like to restrict the number of h_i and J_ij
> possible, since having complete freedom could become an unwieldy
> problem. [...]
> If I understand correctly the discussion in the thread shown above,
> a numerical solution for the inverse problem would be:
> hi_{new} = hi_{old} + K * (<s_i> - <s_i>_{emp})
> Jij_{new} = Jij_{old} + K' * (<s_i s_j> - <s_i s_j>_{emp})

That's correct; the way that you'd usually calculate <s_i s_j> is by starting from some state and running several iterations of Gibbs sampling to generate a new state, measuring your s_i * s_j in that state, then running it for a whole bunch more steps and gathering the s_i * s_j, etc., until you have enough measurements for a decent Monte Carlo approximation. The Gibbs iterations form a Markov chain whose equilibrium distribution is P(s_1, s_2, ..., s_N), the distribution of interest; the problem is that there's no good way to know when you've run sufficiently many Gibbs steps so that the sample you draw is from the equilibrium distribution P. However, one can often get away with just running a small fixed number of steps. There is some analysis of the convergence properties of this trick here: http://www.cs.toronto.edu/~hinton/absps/cdmiguel.pdf (refer to the sections on "Visible Boltzmann machines")
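For concreteness, one Gibbs sweep for the 0/1 model above could look like the following rough sketch (this is not code from the thread; all names are illustrative):

import numpy as np

def gibbs_sweep(s, h, J):
    # One sweep over all units of P(s) ~ exp(sum_i h_i*s_i
    # + 0.5*sum_{i!=j} J_ij*s_i*s_j); J symmetric with zero diagonal.
    for i in range(len(s)):
        a = h[i] + np.dot(J[i], s)        # field on unit i from the others
        p = 1.0 / (1.0 + np.exp(-a))      # P(s_i = 1 | all other units)
        s[i] = float(np.random.rand() < p)
    return s

Averaging s_i * s_j over states collected from many such sweeps gives the Monte Carlo estimate of <s_i s_j> used in the updates.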
I've never really heard of a situation where you'd really want to tie together parameters like you're suggesting, but it's possible and quite trivial to implement. Let's say you wanted to constrain hi and hj to be the same. Then you'd start them off at the same initial value and, at every update, use the following equation instead:

hi_{new} = hj_{new} = hi_{old} + K/2 * (<s_i> - <s_i>_{emp}) + K/2 * (<s_j> - <s_j>_{emp})

If you wanted Jij = Jkl, set them to the same initial value and use the update:

Jij_{new} = Jkl_{new} = Jij_{old} + K'/2 * (<s_i s_j> - <s_i s_j>_{emp}) + K'/2 * (<s_k s_l> - <s_k s_l>_{emp})

Similarly, if you wanted to tie a whole set of these together, you'd just average the updates and apply the result to all of them at once.

David

From mattknox.ca at gmail.com Thu Jan 7 17:37:15 2010
From: mattknox.ca at gmail.com (Matt Knox)
Date: Thu, 7 Jan 2010 22:37:15 +0000 (UTC)
Subject: [SciPy-User] timeseries and candlestick()

James <j33433 at gmail.com> writes:
> Has anyone managed to plot a candlestick chart with a timeseries? Is there an
> easy way to wrap the matplotlib.finance.candlestick call?

I don't know anything about the candlestick function, but you can get the underlying MaskedArray for a TimeSeries object with the .series property of the TimeSeries object. You can get raw datetime objects for the time axis by doing mytimeseries.dates.tolist(), so from there it should be fairly straightforward to pass the data into matplotlib functions, I think.

- Matt
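A hedged sketch of that suggestion (assumptions: a structured TimeSeries named myseries with 'open', 'close', 'high' and 'low' fields, and the matplotlib-0.99-era finance API, where quotes are (time, open, close, high, low) tuples):

import matplotlib.pyplot as plt
from matplotlib.dates import date2num
from matplotlib.finance import candlestick

# Convert the TimeSeries dates to matplotlib date numbers.
dates = [date2num(d) for d in myseries.dates.tolist()]
quotes = [(t, o, c, h, l) for t, o, c, h, l in
          zip(dates, myseries['open'], myseries['close'],
              myseries['high'], myseries['low'])]

fig = plt.figure()
ax = fig.add_subplot(111)
candlestick(ax, quotes)
ax.xaxis_date()
plt.show()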
From bruce at clearscienceinc.com Thu Jan 7 19:43:21 2010
From: bruce at clearscienceinc.com (Bruce Ford)
Date: Thu, 7 Jan 2010 19:43:21 -0500
Subject: [SciPy-User] 2D Interpolation

All,

I'm endeavoring to interpolate global 2.5 degree data (73x144) onto a 1 degree grid (181x360). I'm not sure if I'm barking up the right tree with a cubic spline interpolation.

Below is my code, based on an example I found at http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

The process hangs at the line "tck = interpolate.bisplrep(x,y,z,s=0)". I'm unsure whether this is a bug or there is an error in my code (which is below).

Secondly, has anyone done a similar interpolation (from a lower-resolution regular 2D grid to a higher-resolution regular 2D grid) who would share a little code? All my efforts have been fruitless!

Thanks in advance!

Bruce

(code follows...)
*****************************************************************
import matplotlib
import matplotlib.pyplot as pyplot    # used to build contour and wind barbs plots
import matplotlib.colors as pycolors  # used to build color schemes for plots
import numpy.ma as M                  # matrix manipulation functions
import numpy as np                    # used to perform simple math functions on data
from numpy import *
import cgi                            # used to easily parse form variables
from sys import exit as die           # used to kill the python script early
from netCDF4 import Dataset           # interprets NetCDF files
import Nio
from scipy import interpolate

filepath = "/media/BACKUP1/reanal-2/6hr/pgb/pgb.197901"
grb_file = Nio.open_file(filepath, mode='r', options=None, history='', format='grb')

z = grb_file.variables["HGT_2_ISBL_10"][1,1,:,:]
print z.shape

x,y = np.mgrid[90:-90:73j,0:357.5:144j]

print x.shape  # (73,144)
print y.shape  # (73,144)
print z.shape  # (73,144)

xnew,ynew = np.mgrid[-90:90:180j,0:359:360j]
print xnew.shape  # (180,360)
tck = interpolate.bisplrep(x,y,z,s=0)
# python freezes on the above line
znew = interpolate.bisplev(xnew[:,0],ynew[0,:],tck)

---------------------------------------
Bruce W. Ford
Clear Science, Inc.
bruce at clearscienceinc.com
http://www.ClearScienceInc.com
8241 Parkridge Circle N.
Jacksonville, FL 32211
Skype: bruce.w.ford
Google Talk: fordbw at gmail.com

From burak.o.cankurtaran at alumni.uts.edu.au Thu Jan 7 20:09:09 2010
From: burak.o.cankurtaran at alumni.uts.edu.au (Burak1327)
Date: Thu, 7 Jan 2010 17:09:09 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] 2D Interpolation
Message-ID: <27069900.post@talk.nabble.com>

Hi Bruce,

I recently received help for what you need. I've listed the code that does the interpolation from a coarse grid to a finer (regular) grid. It uses an image manipulation package, "ndimage":

pes40 = ReadPES("pes-0.5.xsf", 40)
# Interpolate
newx, newy, newz = mgrid[0:40:0.5, 0:40:0.5, 0:40:0.5]
coords = array([newx, newy, newz])
pes80 = ndimage.map_coordinates(pes40, coords, order=1)

So the coarse grid is 40x40x40 and I'm interpolating it onto an 80x80x80 grid. This is done with the notation 0:40:0.5. You can change the 0.5 interval length to a complex number if you want to specify the number of intervals instead of the interval length. Basically, the coords array holds the actual positions at which the interpolation should occur. Obviously, get rid of the third dimension for 2D. To do higher-order interpolation, change the "order" parameter in the map_coordinates function. This is a link to a quick tutorial on 2D: http://www.scipy.org/Cookbook/Interpolation

Thanks
Burak
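In 2D, the same recipe reduces to something like the sketch below (illustrative only; here pes40 would be a 40x40 array rather than Burak's 40x40x40 volume):

from numpy import mgrid, array
from scipy import ndimage

newx, newy = mgrid[0:40:0.5, 0:40:0.5]   # 80x80 grid of target positions
coords = array([newx, newy])             # index coordinates into pes40
pes80 = ndimage.map_coordinates(pes40, coords, order=1)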
From timmichelsen at gmx-topmail.de Fri Jan 8 07:49:21 2010
From: timmichelsen at gmx-topmail.de (Tim Michelsen)
Date: Fri, 8 Jan 2010 12:49:21 +0000 (UTC)
Subject: [SciPy-User] scikits.timeseries: moving difference

Hello,
the scikits.timeseries module has implemented some moving window functions: http://pytseries.sourceforge.net/lib.moving_funcs.html

I would like to expand these to include the (absolute and relative) difference between one value and its successor in a time series. How could I do this?

Best regards,
Timmie

From pgmdevlist at gmail.com Fri Jan 8 08:07:08 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 8 Jan 2010 08:07:08 -0500
Subject: [SciPy-User] scikits.timeseries: moving difference
Message-ID: <175355BA-2C0F-425E-BAED-F8540213920D@gmail.com>

On Jan 8, 2010, at 7:49 AM, Tim Michelsen wrote:
> I would like to expand these to include the (absolute and relative) difference
> between one value and its successor in a time series.

Like np.diff? Could you give us a short example of what you have in mind?
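If np.diff is indeed what is wanted, the two quantities might be as simple as this (a guess at the intent, not code from the thread; series is assumed to be a 1-D TimeSeries):

import numpy as np

absdiff = np.diff(series.series)          # each value minus its predecessor
reldiff = absdiff / series.series[:-1]    # the same, relative to the predecessor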
From robince at gmail.com Fri Jan 8 08:17:25 2010
From: robince at gmail.com (Robin)
Date: Fri, 8 Jan 2010 13:17:25 +0000
Subject: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup?
Message-ID: <2d5132a51001080517l35cd8020n55062c1b436f87aa@mail.gmail.com>

On Thu, Jan 7, 2010 at 9:19 AM, Jordi Molins Coronado wrote:
> [...]

Hi,

I'm not so familiar with the statistical mechanics notation, but you might be interested in the maxent module of the pyentropy package I have produced as part of my PhD: http://code.google.com/p/pyentropy/

The main purpose of pyentropy is the calculation of bias-corrected entropy and information values from limited data sets, but it includes the maxent module, which computes maximum entropy distributions over finite-alphabet spaces with marginal constraints of up to any order. (I am working in computational neuroscience, so much of the notation will probably be a bit different.) So I think a second-order solution from this framework over a binary space is the same as the Ising model. You can get the h_i's and J's directly from the results (they are called theta in the code), although I think they have a slightly different normalisation because of Ising being {-1,1} and this being {0,1}... http://pyentropy.googlecode.com/svn/docs/api.html#module-pyentropy.maxent

With this, on a normal computer, I can solve for about 18 binary variables in a reasonable amount of time (i.e. less than an hour for several runs), but it becomes highly exponential with more vectors. (I have a much more efficient but more hackish version of the same algorithm that I haven't added to pyentropy yet, but will in the next few weeks.)

In the case you describe, where the thetas are constrained to be equal at each order, the system can be solved much more efficiently. I have code to do this which is not released (and a bit messy), but if you are interested I could send it to you. In neuroscience this situation is called the 'pooled model'. You can see a description of how to solve this reduced case here: http://rsta.royalsocietypublishing.org/content/367/1901/3297.short

The method uses information geometry from Amari: by transforming between the P space (probability vector), eta space (marginals) and theta space (the h's, J's, etc.) it is possible to find the maximum entropy solution as a projection.

Anyway, I'm not sure if that helps... probably the documentation on how to get the thetas and the ordering of the vector might not be so clear, so give me a shout if you start using it and have any questions. If you don't have the full probability distribution, you can pass the eta vector to the solve function (which would be a vector of <s_i> and <s_i s_j>) and set the optional argument eta_given=True (hopefully clear from the option handling in the code).
Cheers,
Robin

From jgomezdans at gmail.com Fri Jan 8 10:08:16 2010
From: jgomezdans at gmail.com (Jose Gomez-Dans)
Date: Fri, 8 Jan 2010 15:08:16 +0000
Subject: [SciPy-User] 2D Interpolation
Message-ID: <91d218431001080708q5bdceb2cgf950bd2cb29dd5ad@mail.gmail.com>

Hi,

2010/1/8 Bruce Ford:
> I'm endeavoring to interpolate global 2.5 degree data (73x144) onto a
> 1 degree grid (181x360). I'm not sure if I'm barking up the right
> tree for a cubic spline interpolation.

Looks suspiciously like NCEP reanalysis data ;) For this task, you could use map_coordinates: http://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.interpolation.map_coordinates.html

Hope that helps,
Jose

From Dharhas.Pothina at twdb.state.tx.us Fri Jan 8 11:16:20 2010
From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina)
Date: Fri, 08 Jan 2010 10:16:20 -0600
Subject: [SciPy-User] Masking multiple fields in a structured timeseries object.
Message-ID: <4B4705F40200009B0002638B@GWWEB.twdb.state.tx.us>

Hi,

I have a structured time series object I have read in from a file. I am providing my script the following parameters: filename, startdate, enddate, parameter (All, Salinity, Temp., etc.), Max, Min, instrument type (2-digit code contained in the filename).

My timeseries is structured like:

timeseries([ ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, --, 22.0, 13.199999999999999, 28.949999999999999, --, 0.39928999999999998, --, --)
 ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, --, 22.100000000000001, 13.199999999999999, 28.690000000000001, --, 0.35965999999999998, --, --)
 ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, --, 22.100000000000001, 13.300000000000001, 28.420000000000002, --, 0.32917999999999997, --, --)
 ...,
 ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, 3.6699999999999999, 25.800000000000001, 15.699999999999999, 25.600000000000001, --, 1.5514300000000001, --, --)
 ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, 3.8900000000000001, 25.800000000000001, 15.699999999999999, 25.710000000000001, --, 1.61849, --, --)
 ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, 3.5899999999999999, 25.899999999999999, 15.699999999999999, 25.859999999999999, --, 1.6398200000000001, --, --)],
 dtype = [('Filename', '|S27'), ('Year', ...), ...],
 dates = [11-Jun-1996 21:00 11-Jun-1996 22:00 11-Jun-1996 23:00 ...,
          05-Oct-2000 09:00 05-Oct-2000 10:00 05-Oct-2000 11:00],
 freq = T)

I want to mask the data in the following way:

Mask all values between start & end dates that meet the following criteria:

1) selected parameter (mask all if blank)
2) selected filename (mask all if blank)
3) selected instrument (mask all if blank). Note the instrument is the 18th & 19th character in the filename, i.e. 'MW' in the example above.
4) parameter value lies between the given max and min values.

I'm having trouble working out how to check all these conditions at once or sequentially before masking.

From bruce at clearscienceinc.com Fri Jan 8 2010
From: bruce at clearscienceinc.com (Bruce Ford)
Date: Fri, 8 Jan 2010
Subject: [SciPy-User] 2D Interpolation
References: <91d218431001080708q5bdceb2cgf950bd2cb29dd5ad@mail.gmail.com>

Jose and Burak, thanks for the lead. Yes, Jose, this is NCEP Reanalysis-2 data. I'm trying to interpolate it to a 1 degree grid to plot alongside other model data that is 1 degree. I'm not quite there yet, but close. It's possible I'm not visualizing what this process is doing correctly. Below is a small, trimmed-down script that doesn't use external data, so you can see what's happening. It should work for you. The first plot is of a 73x144 grid, all of the value 1. This represents a 2.5 degree global grid. The second plot is what I get following the interpolation to a 1 degree grid (181x360). The value of one only appears in a region of the plot, instead of being interpolated across the whole domain. If you have experience with this, I'm hoping you can tell me what I'm doing wrong. I'm stumped and I've spent many hours on this problem. Any assistance would be appreciated!
Here's the script:
*****************************************
import matplotlib.pyplot as pyplot   # used to build contour and wind barbs plots
from numpy import *
from sys import exit as die          # used to kill the python script early
from scipy import interpolate, ndimage

x,y = mgrid[-90:90:2.5,0:357.5:2.5]
test_array = ones_like(x)   # a test array to interpolate from

print "********Shape of Test Array *********", test_array.shape
print "********Shape of X array *********", x.shape    # (73,144)
print "********Shape of Y Array *********", y.shape    # (73,144)

pyplot.figure()
pyplot.pcolor(y,x,test_array)
pyplot.colorbar()
pyplot.title("Sparsely sampled function.")
pyplot.show()

xnew,ynew = mgrid[-90:91:1,0:360:1]
coords = array([xnew,ynew])
print "********Shape of Coordinate Array *********", coords.shape

interpolated = ndimage.map_coordinates(test_array, coords, order=3)
print "********Shape of Interpolated Array *********", interpolated.shape   # (181,360)

pyplot.figure()
pyplot.pcolor(ynew,xnew,interpolated)
pyplot.colorbar()
pyplot.title("Interpolated function.")
pyplot.show()

---------------------------------------
Bruce W. Ford
Clear Science, Inc.
bruce at clearscienceinc.com
bruce.w.ford.ctr at navy.smil.mil
http://www.ClearScienceInc.com
Phone/Fax: 904-379-9704
8241 Parkridge Circle N.
Jacksonville, FL 32211
Skype: bruce.w.ford
Google Talk: fordbw at gmail.com

On Fri, Jan 8, 2010 at 10:08 AM, Jose Gomez-Dans wrote:
> [...]

From pgmdevlist at gmail.com Fri Jan 8 13:20:42 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 8 Jan 2010 13:20:42 -0500
Subject: [SciPy-User] Masking multiple fields in a structured timeseries object.
In-Reply-To: <4B4705F40200009B0002638B@GWWEB.twdb.state.tx.us>
References: <4B4705F40200009B0002638B@GWWEB.twdb.state.tx.us>

On Jan 8, 2010, at 11:16 AM, Dharhas Pothina wrote:
> [...]
> I'm having trouble working out how to check all these conditions at once
> or sequentially before masking.
Step by step, it's gonna be easier to debug. Take the simpler example:

>>> ndtype = [('name','|S3'),('v1',float),('v2',float)]
>>> series = ts.time_series([("ABC",1.1,10.),("ABD",2.2,20.),("ABE",3.3,30)],
...                         dtype=ndtype, start_date=ts.now('D'))
>>> _series = series.series

_series is only a masked array; that's gonna keep things nice and easy (no need to carry the dates).

Mask a record (viz, a full row) if v2 > 25:

>>> series[_series['v2'] > 25] = ma.masked

Mask a record if the last character of the name is "C". This one is trickier, as we need to test whether the field 'name' is masked:

>>> maskonnames = []
>>> for _ in _series['name']:
...     if _ is ma.masked:
...         maskonnames.append(False)
...     else:
...         maskonnames.append(_[-1] == 'C')
>>> series[np.array(maskonnames)] = ma.masked

(maskonnames is a list that we need to transform into a bool ndarray to have fancy indexing. Otherwise, we're just gonna take the first or second record (depending on whether maskonnames is False (0) or True (1)), and that's not what we want.)

So, so far:

>>> series
timeseries([(--, --, --) ('ABD', 2.2000000000000002, 20.0) (--, --, --)],
    dtype = [('name', '|S3'), ('v1', '<f8'), ('v2', '<f8')],
    dates = [...],
    freq  = D)

To mask only a field (and not the whole record):

>>> _series['v1'][_series['v1'] < 3] = ma.masked
>>> series
timeseries([(--, --, --) ('ABD', --, 20.0) (--, --, --)],
    dtype = [('name', '|S3'), ('v1', '<f8'), ('v2', '<f8')],
    dates = [...],
    freq  = D)

If you prefer to combine the conditions before masking:

>>> global_condition = np.zeros(len(series), dtype=bool)
>>> global_condition |= (_series['v2'] > 25)
>>> global_condition |= maskonnames
>>> series[global_condition] = ma.masked

HIH,
P.

From Dharhas.Pothina at twdb.state.tx.us Fri Jan 8 16:33:58 2010
From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina)
Date: Fri, 08 Jan 2010 15:33:58 -0600
Subject: [SciPy-User] Masking multiple fields in a structured timeseries object.
References: <4B4705F40200009B0002638B@GWWEB.twdb.state.tx.us>
Message-ID: <4B475066.63BA.009B.0@twdb.state.tx.us>

Thanks. I'll try implementing your approaches on Tuesday and will probably be back with questions after that.

- dharhas

>>> Pierre GM 1/8/2010 12:20 PM >>>
> [...]
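Applying the same maskonnames idea to criterion 3 above (the two-character instrument code at positions 18-19 of the filename) might look like this sketch, which assumes series and _series built as in Pierre's example but with a 'Filename' field as in Dharhas's data:

import numpy as np
import numpy.ma as ma

instrument = 'MW'
# f[17:19] picks out the 18th and 19th characters of each filename.
oninstrument = np.array([False if f is ma.masked else f[17:19] == instrument
                         for f in _series['Filename']])
series[oninstrument] = ma.masked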
> > I'm having trouble working out how to check all these conditions at once or sequentially before masking. Step by step, it's gonna be easier to debug. Take the simpler example: >>> ndtype=[('name','|S3'),('v1',float),('v2',float)] >>> series=ts.time_series([("ABC",1.1,10.),("ABD",2.2,20.),("ABE",3.3,30)], dtype=ndtype, start_date=ts.now('D')) >>> _series=series.series _series is only a masked array, that's gonna keep things nice and easy (no need to carry the dates) Mask a record (viz, a full row) if v2>25 >>> series[_series['v2']>25]=ma.masked Mask a record if the last character of the name is "C". This one is trickier, as we need to test whether the field 'name' is masked >>> maskonnames = [] >>> for _ in _series['name']: >>> if _ is ma.masked: >>> maskonnames.append(False) >>> else: >>> maskonnames.append(_[-1]=='C') >>> series[np.array(maskonnames)] = ma.masked (maskonnames is a list that we need to transform into a bool ndarray to have fancy indexing. Otherwise, we just gonna take the first or second record (depending on whether maskonnames is False (0) or True (1)), and that's not what we want. So, so far >>> series timeseries([(--, --, --) ('ABD', 2.2000000000000002, 20.0) (--, --, --)], dtype = [('name', '|S3'), ('v1', '>> _series['v1'][_series['v1']<3]=ma.masked >>> series timeseries([(--, --, --) ('ABD', --, 20.0) (--, --, --)], dtype = [('name', '|S3'), ('v1', '>> global_condition = np.zeros(len(series), dtype=bool) >>> global_condition |= _series[_series['v2']>25]=ma.masked >>> global_condition |= maskonnames >>> series[global_condition]=ma.masked HIH P. _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From jgomezdans at gmail.com Sat Jan 9 11:18:22 2010 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Sat, 9 Jan 2010 16:18:22 +0000 Subject: [SciPy-User] 2D Interpolation In-Reply-To: References: <91d218431001080708q5bdceb2cgf950bd2cb29dd5ad@mail.gmail.com> Message-ID: <91d218431001090818g791027d1q9276e06fc3d66a7@mail.gmail.com> Hi Bruce! 2010/1/8 Bruce Ford Here's your problem: > coords = array([xnew,ynew]) > interpolated = ndimage.map_coordinates(test_array, coords, order=3) > coords (and hence xnew and ynew) need to be specified in array units, so you need to calculate where your 1 degree grid falls within the 2.5 degree original grid, so you can define ynew = numpy.linspace (0,360,360)/2.5 xnew = numpy.linspace (0,180, 180)/2.5 coords = numpy.array([xnew, ynew]) and feed that into map_coordinates. Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Sat Jan 9 13:22:39 2010 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Sat, 09 Jan 2010 19:22:39 +0100 Subject: [SciPy-User] spline interpolation and matplotlib interaction Message-ID: <1263061359.31641.9.camel@PCTerrusse> Hello folks, has anyone ever thought of a piece of code that - use a parametrization of a function with B-splines, i.e. with knots and coefficients, - let a user manipulate this parametrization through matplotlib (and event handling mechanisms) ? Within a scientific app, I would like to be able to handle simplified representation of time-varying quantities that could be obtained from measurement or from academic signals. 
I thought scipy.interpolation module could help me in such a way : - measurement of time signals - using splrep to least-square fitting and identification of knots and coefficients (with respect to a smoothing factor and a polynomial order) - checking : show the approximated function with matplotlib and let the user manually modify the control points. It seems that I have not the sufficient understanding of b-splines as the values output by splrep look strange even if the result of splev is nice... -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051) From contact at pythonxy.com Sun Jan 10 10:50:20 2010 From: contact at pythonxy.com (Pierre Raybaut) Date: Sun, 10 Jan 2010 16:50:20 +0100 Subject: [SciPy-User] [ANN] Spyder v1.0.2 released Message-ID: <4B49F73C.9060402@pythonxy.com> Hi all, I'm pleased to announce here that Spyder version 1.0.2 has been released: http://packages.python.org/spyder Previously known as Pydee, Spyder (Scientific PYthon Development EnviRonment) is a free open-source Python development environment providing MATLAB-like features in a simple and light-weighted software, available for Windows XP/Vista/7, GNU/Linux and MacOS X: * advanced code editing features (code analysis, ...) * interactive console with MATLAB-like workpace (with GUI-based list, dictionary, tuple, text and array editors -- screenshots: http://packages.python.org/spyder/console.html#the-workspace) and integrated matplotlib figures * external console to open an interpreter or run a script in a separate process (with a global variable explorer providing the same features as the interactive console's workspace) * code analysis with pyflakes and pylint * search in files features * documentation viewer: automatically retrieves docstrings or source code of the function/class called in the interactive/external console * integrated file/directories explorer * MATLAB-like path management ...and more! Spyder is part of spyderlib, a Python module based on PyQt4 and QScintilla2 which provides powerful console-related PyQt4 widgets. Spyder v1.0.2 is a bugfix release: * External console: subprocess python calls were using the external console's sitecustomize.py (instead of system sitecustomize.py) * Added workaround for PyQt4 v4.6+ major bug with matplotlib * Added option to customize the way matplotlib figures are embedded (docked or floating window) * Matplotlib's "Option" dialog box is now supporting subplots * Array editor now supports complex arrays * Editor: replaced "Run selection or current line" option by "Run selection or current block" (without selection, this feature is similar to MATLAB's cell mode) * ...and a lot of minor bugfixes. - Pierre From totalbull at mac.com Sun Jan 10 16:35:35 2010 From: totalbull at mac.com (totalbull at mac.com) Date: Sun, 10 Jan 2010 21:35:35 +0000 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function References: Message-ID: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> Hello, Excel and scipy.stats.linregress are disagreeing on the standard error of a regression. I need to find the standard errors of a bunch of regressions, and prefer to use pure Python than RPy. 
So I am going to scipy.stats.linregress, as advised at: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress >>> from scipy import stats >>> x = [5.05, 6.75, 3.21, 2.66] >>> y = [1.65, 26.5, -5.93, 7.96] >>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) >>> gradient 5.3935773611970186 >>> intercept -16.281127993087829 >>> r_value 0.72443514211849758 >>> r_value**2 0.52480627513624778 >>> std_err 3.6290901222878866 The problem is that the std error calculation does not agree with what is returned in Microsoft Excel's STEYX function (whereas all the other output does). From Excel: Anybody knows what's going on? Any alternative way of getting the standard error without going to R? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.tiff Type: image/tiff Size: 33948 bytes Desc: not available URL: From jsseabold at gmail.com Sun Jan 10 16:59:43 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 10 Jan 2010 16:59:43 -0500 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> Message-ID: On Sun, Jan 10, 2010 at 4:35 PM, wrote: > > Hello, Excel and scipy.stats.linregress are disagreeing on the standard > error of a regression. > > I need to find the standard errors of a bunch of regressions, and prefer to > use pure Python than RPy. So I am going to scipy.stats.linregress, as > advised at: > > http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress > > from scipy import stats > > x = [5.05, 6.75, 3.21, 2.66] > > y = [1.65, 26.5, -5.93, 7.96] > > gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) > > gradient > > 5.3935773611970186 > > intercept > > -16.281127993087829 > > r_value > > 0.72443514211849758 > > r_value**2 > > 0.52480627513624778 > > std_err > > 3.6290901222878866 > > > The problem is that the std error calculation does not agree with what is > returned in Microsoft Excel's STEYX function (whereas all the other output > does). From Excel: > > > > > Anybody knows what's going on? Any alternative way of getting the standard > error without going to R? > > > > 'std_err' is the standard error of 'gradient' above, not the standard error of the regression as reported in Excel. You might want to have a look at the statsmodels scikit as a possible alternative to R. I recommend getting the trunk source until the next release, which should be soon. http://statsmodels.sourceforge.net/ Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.tiff Type: image/tiff Size: 33948 bytes Desc: not available URL: From bsouthey at gmail.com Sun Jan 10 20:21:17 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 10 Jan 2010 19:21:17 -0600 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> Message-ID: On Sun, Jan 10, 2010 at 3:35 PM, wrote: > > Hello, Excel and scipy.stats.linregress are disagreeing on the standard > error of a regression. 
> > I need to find the standard errors of a bunch of regressions, and prefer to > use pure Python than RPy. So I am going to scipy.stats.linregress, as > advised at: > > http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress > > from scipy import stats > > x = [5.05, 6.75, 3.21, 2.66] > > y = [1.65, 26.5, -5.93, 7.96] > > gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) > > gradient > > 5.3935773611970186 > > intercept > > -16.281127993087829 > > r_value > > 0.72443514211849758 > > r_value**2 > > 0.52480627513624778 > > std_err > > 3.6290901222878866 > > > The problem is that the std error calculation does not agree with what is > returned in Microsoft Excel's STEYX function (whereas all the other output > does). From Excel: > > > > > Anybody knows what's going on? Any alternative way of getting the standard > error without going to R? > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > The Excel help is rather cryptic by :"Returns the standard error of the predicted y-value for each x in the regression. The standard error is a measure of the amount of error in the prediction of y for an individual x." But clearly this is not the same as the standard error of the 'gradient' (slope) returned by linregress. Without checking the formula, STEYX appears returns the square root what most people call the mean square error (MSE). Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.tiff Type: image/tiff Size: 33948 bytes Desc: not available URL: From josef.pktd at gmail.com Sun Jan 10 20:41:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 10 Jan 2010 20:41:29 -0500 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> Message-ID: <1cd32cbb1001101741t205f2fe2icbc6b10bf61c0be9@mail.gmail.com> On Sun, Jan 10, 2010 at 8:21 PM, Bruce Southey wrote: > > > On Sun, Jan 10, 2010 at 3:35 PM, wrote: > >> >> Hello, Excel and scipy.stats.linregress are disagreeing on the standard >> error of a regression. >> >> I need to find the standard errors of a bunch of regressions, and prefer >> to use pure Python than RPy. So I am going to scipy.stats.linregress, as >> advised at: >> >> http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress >> >> from scipy import stats >> >> x = [5.05, 6.75, 3.21, 2.66] >> >> y = [1.65, 26.5, -5.93, 7.96] >> >> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) >> >> gradient >> >> 5.3935773611970186 >> >> intercept >> >> -16.281127993087829 >> >> r_value >> >> 0.72443514211849758 >> >> r_value**2 >> >> 0.52480627513624778 >> >> std_err >> >> 3.6290901222878866 >> >> >> The problem is that the std error calculation does not agree with what is >> returned in Microsoft Excel's STEYX function (whereas all the other output >> does). From Excel: >> >> >> >> >> Anybody knows what's going on? Any alternative way of getting the standard >> error without going to R? 
>> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > The Excel help is rather cryptic by :"Returns the standard error of the > predicted y-value for each x in the regression. The standard error is a > measure of the amount of error in the prediction of y for an individual x." > But clearly this is not the same as the standard error of the 'gradient' > (slope) returned by linregress. Without checking the formula, STEYX appears > returns the square root what most people call the mean square error (MSE). > > Bruce > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > >>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) >>> ((y-intercept-np.array(x)*gradient)**2).sum()/(4.-2.) 136.80611125682617 >>> np.sqrt(_) 11.6964144615701 I think this should be the estimate of the standard deviation of the noise/error term. Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.tiff Type: image/tiff Size: 33948 bytes Desc: not available URL: From timmichelsen at gmx-topmail.de Mon Jan 11 05:11:37 2010 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Mon, 11 Jan 2010 10:11:37 +0000 (UTC) Subject: [SciPy-User] scikits.timeseries: moving difference References: <175355BA-2C0F-425E-BAED-F8540213920D@gmail.com> Message-ID: > > the scikits.timeseries has implemented some moving windows functions: > > http://pytseries.sourceforge.net/lib.moving_funcs.html > > > > I would like to expand these to include the (absolute and relative) difference > > between one value and its successor in a time series. > > Like np.diff ? Could you give us a short example of what you have in mind ? Yes, thanks. This was what I was searching for. Regards, Timmie From bnuttall at uky.edu Mon Jan 11 10:47:44 2010 From: bnuttall at uky.edu (Nuttall, Brandon C) Date: Mon, 11 Jan 2010 10:47:44 -0500 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> Message-ID: For what it's worth, using by the definition of standard error of the estimate in Crow, Davis, and Maxfield, 1960, Statistics Manual: Dover Publications (p. 156), the Excel function provides the "correct" standard error of the estimate. Using notation from Crow, Davis, and Maxfield: import numpy as np n = 4.0 x = np.array([5.05, 6.75, 3.21, 2.66]) y = np.array([1.65, 26.5, -5.93, 7.96]) x2 = x*x y2 = y*y s2x = (4.0*x2.sum()-x.sum()*x.sum())/(n*(n-1.0)) s2y = (4.0*y2.sum()-y.sum()*y.sum())/(n*(n-1.0)) xy = x * y b = (4.0*xy.sum()-x.sum()*y.sum())/(4.0*x2.sum()-x.sum()*x.sum()) a = (y.sum()-b*x.sum())/n s2xy = ((n-1.0)/(n-2.0))*(s2y-b*b*s2x) ste = np.sqrt(s2xy) r=b*np.sqrt(s2x)/np.sqrt(s2y) print "intercept: ",a print "gradient (slope): ",b print "correlation coefficient, r: ",r print "std err est: ",ste Produces the output : intercept: -16.2811279931 gradient (slope): 5.3935773612 correlation coefficient, r: 0.724435142118 std err est: 11.6964144616 This same value for the standard error of the estimate is reported with the sample x,y data at the VassarStats, Statistical Computation Web Site, http://faculty.vassar.edu/lowry/VassarStats.html. 
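
Incidentally, the two numbers in this thread appear to be consistent with each other: linregress's std_err is the standard error of the slope, and the standard error of the estimate (Excel's STEYX) is that same value multiplied by sqrt(Sxx). A quick sketch to check this (only verified against the x, y from this thread):

import numpy as np
from scipy import stats

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])
slope, intercept, r, p, se_slope = stats.linregress(x, y)

# standard error of the estimate, i.e. Excel's STEYX: sqrt(SSE/(n-2))
resid = y - (intercept + slope*x)
steyx = np.sqrt((resid**2).sum()/(len(x) - 2.0))

# the slope's standard error differs from STEYX by a factor of sqrt(Sxx)
sxx = ((x - x.mean())**2).sum()
print "STEYX:               ", steyx                    # ~11.696, matches Excel
print "se_slope*sqrt(Sxx):  ", se_slope*np.sqrt(sxx)    # same number

So both tools are computing sensible quantities; they just report different ones.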
Brandon Nuttall, KRPG-1364 Kentucky Geological Survey www.uky.edu/kgs bnuttall at uky.edu (KGS, Mo-We) Brandon.nuttall at ky.gov (EEC, Th-Fr) 859-257-5500 ext 30544 (main) 859-323-0544 (direct) 859-684-7473 (cell) 859-257-1147 (FAX) From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Bruce Southey Sent: Sunday, January 10, 2010 8:21 PM To: SciPy Users List Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress function On Sun, Jan 10, 2010 at 3:35 PM, > wrote: Hello, Excel and scipy.stats.linregress are disagreeing on the standard error of a regression. I need to find the standard errors of a bunch of regressions, and prefer to use pure Python than RPy. So I am going to scipy.stats.linregress, as advised at: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress from scipy import stats x = [5.05, 6.75, 3.21, 2.66] y = [1.65, 26.5, -5.93, 7.96] gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) gradient 5.3935773611970186 intercept -16.281127993087829 r_value 0.72443514211849758 r_value**2 0.52480627513624778 std_err 3.6290901222878866 The problem is that the std error calculation does not agree with what is returned in Microsoft Excel's STEYX function (whereas all the other output does). From Excel: [cid:image001.png at 01CA92A7.C1C66980] Anybody knows what's going on? Any alternative way of getting the standard error without going to R? _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user The Excel help is rather cryptic by :"Returns the standard error of the predicted y-value for each x in the regression. The standard error is a measure of the amount of error in the prediction of y for an individual x." But clearly this is not the same as the standard error of the 'gradient' (slope) returned by linregress. Without checking the formula, STEYX appears returns the square root what most people call the mean square error (MSE). Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1973 bytes Desc: image001.png URL: From bruce at clearscienceinc.com Mon Jan 11 11:40:42 2010 From: bruce at clearscienceinc.com (Bruce Ford) Date: Mon, 11 Jan 2010 11:40:42 -0500 Subject: [SciPy-User] ValueError: setting and array element with a sequence Message-ID: I'm new at this and I'm getting this error. It looks straightforward enough. Any ideas? ynew = numpy.linspace (0,360,360)/2.5 xnew = numpy.linspace (0,180, 180)/2.5 coords = numpy.array([xnew, ynew]) yeilds: ValueError: setting and array element with a sequence Bruce --------------------------------------- Bruce W. Ford Clear Science, Inc. bruce at clearscienceinc.com bruce.w.ford.ctr at navy.smil.mil http://www.ClearScienceInc.com Phone/Fax: 904-379-9704 8241 Parkridge Circle N. Jacksonville, FL 32211 Skype: bruce.w.ford Google Talk: fordbw at gmail.com From robert.kern at gmail.com Mon Jan 11 11:42:27 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Jan 2010 10:42:27 -0600 Subject: [SciPy-User] ValueError: setting and array element with a sequence In-Reply-To: References: Message-ID: <3d375d731001110842x1ff2d488i63284f57004d66e6@mail.gmail.com> On Mon, Jan 11, 2010 at 10:40, Bruce Ford wrote: > I'm new at this and I'm getting this error. ?It looks straightforward > enough. 
?Any ideas? > > ynew = numpy.linspace (0,360,360)/2.5 > xnew = numpy.linspace (0,180, 180)/2.5 > coords = numpy.array([xnew, ynew]) > > yeilds: ?ValueError: ?setting and array element with a sequence ynew has 360 elements. xnew has 180. They need to be the same if you want to make an (2,N)-shape array from them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Mon Jan 11 14:41:31 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 11 Jan 2010 11:41:31 -0800 Subject: [SciPy-User] ValueError: setting and array element with a sequence In-Reply-To: <3d375d731001110842x1ff2d488i63284f57004d66e6@mail.gmail.com> References: <3d375d731001110842x1ff2d488i63284f57004d66e6@mail.gmail.com> Message-ID: <4B4B7EEB.9050600@noaa.gov> Robert Kern wrote: > On Mon, Jan 11, 2010 at 10:40, Bruce Ford wrote: >> I'm new at this and I'm getting this error. It looks straightforward >> enough. Any ideas? >> >> ynew = numpy.linspace (0,360,360)/2.5 >> xnew = numpy.linspace (0,180, 180)/2.5 >> coords = numpy.array([xnew, ynew]) >> >> yeilds: ValueError: setting and array element with a sequence > > ynew has 360 elements. xnew has 180. They need to be the same if you > want to make an (2,N)-shape array from them. if you want a 360x180 (or 180x360) arrays, then you can do: In [22]: X,Y = np.meshgrid(xnew, ynew) In [23]: X.shape Out[23]: (360, 180) In [24]: Y.shape Out[24]: (360, 180) or, better yet rely on numpy broadcasting: In [25]: xnew.shape = (1, -1) # make x a single row In [26]: ynew.shape = (-1, 1) # make y a single column In [27]: z = xnew * ynew**2 # they are then broadcast when combined In [28]: z.shape Out[28]: (360, 180) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From bnuttall at uky.edu Mon Jan 11 15:07:02 2010 From: bnuttall at uky.edu (Nuttall, Brandon C) Date: Mon, 11 Jan 2010 15:07:02 -0500 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: <1cd32cbb1001101741t205f2fe2icbc6b10bf61c0be9@mail.gmail.com> References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> <1cd32cbb1001101741t205f2fe2icbc6b10bf61c0be9@mail.gmail.com> Message-ID: OK, I think I've figured it out. The numpy covariance function doesn't seem to return the actual sample variances (it returns a population variance?). What this means is that for the linregress() function in the stats.py source file, the quantity sterrest is not calculated correctly and needs to be adjusted to the sample variance. In addition, it includes the quantity ssxm, sum of squares for x (?) and I can't find documentation for its inclusion. # as implemented # sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) # should be corrected to sterrest = np.sqrt((1-r*r)*(ssym*n)/df) Having made this correction, both the example provided and the example in Crow, Davis, and Maxfield (Table 6.1, p. 154) provide the same value for the standard error of the estimate and the value matches what is calculated by Excel. I don't know anything about SVN or submitting a correction, so someone will have to help me out or do it for me. Thanks. 
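
In the meantime, here is a self-contained way to compare the two formulas side by side without touching stats.py (a sketch; I'm assuming ssxm and ssym are the biased variances that np.cov(x, y, bias=1) returns, which is how I read the source):

import numpy as np

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])
n = len(x)
df = n - 2.0
# biased (population) variances/covariance, unpacked in the same layout
# I believe stats.py uses (an assumption on my part)
ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
r = ssxym/np.sqrt(ssxm*ssym)
print "as implemented:", np.sqrt((1 - r*r)*ssym/ssxm/df)  # ~3.629, current std_err
print "proposed:      ", np.sqrt((1 - r*r)*ssym*n/df)     # ~11.696, Excel's STEYX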
Brandon Brandon Nuttall, KRPG-1364 Kentucky Geological Survey www.uky.edu/kgs bnuttall at uky.edu (KGS, Mo-We) Brandon.nuttall at ky.gov (EEC, Th-Fr) 859-257-5500 ext 30544 (main) 859-323-0544 (direct) 859-684-7473 (cell) 859-257-1147 (FAX) From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of josef.pktd at gmail.com Sent: Sunday, January 10, 2010 8:41 PM To: SciPy Users List Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress function On Sun, Jan 10, 2010 at 8:21 PM, Bruce Southey > wrote: On Sun, Jan 10, 2010 at 3:35 PM, > wrote: Hello, Excel and scipy.stats.linregress are disagreeing on the standard error of a regression. I need to find the standard errors of a bunch of regressions, and prefer to use pure Python than RPy. So I am going to scipy.stats.linregress, as advised at: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress from scipy import stats x = [5.05, 6.75, 3.21, 2.66] y = [1.65, 26.5, -5.93, 7.96] gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) gradient 5.3935773611970186 intercept -16.281127993087829 r_value 0.72443514211849758 r_value**2 0.52480627513624778 std_err 3.6290901222878866 The problem is that the std error calculation does not agree with what is returned in Microsoft Excel's STEYX function (whereas all the other output does). From Excel: [cid:image001.png at 01CA92CD.B1C81030] Anybody knows what's going on? Any alternative way of getting the standard error without going to R? _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user The Excel help is rather cryptic by :"Returns the standard error of the predicted y-value for each x in the regression. The standard error is a measure of the amount of error in the prediction of y for an individual x." But clearly this is not the same as the standard error of the 'gradient' (slope) returned by linregress. Without checking the formula, STEYX appears returns the square root what most people call the mean square error (MSE). Bruce _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user >>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) >>> ((y-intercept-np.array(x)*gradient)**2).sum()/(4.-2.) 136.80611125682617 >>> np.sqrt(_) 11.6964144615701 I think this should be the estimate of the standard deviation of the noise/error term. Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1973 bytes Desc: image001.png URL: From totalbull at mac.com Mon Jan 11 15:08:46 2010 From: totalbull at mac.com (totalbull at mac.com) Date: Mon, 11 Jan 2010 20:08:46 +0000 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> Message-ID: <003BDC42-BD00-440A-A237-1F240BDE1AD1@mac.com> Thanks very much to all who have helped with this. I am going to go with the first-principles formulae as per below. 
Otherwise I also asked on Stack Overflow and one person answered with a scikits example: http://stackoverflow.com/questions/2038667/scipy-linregress-function-erroneous-standard-error-return On 11 Jan 2010, at 15:47, Nuttall, Brandon C wrote: > For what it?s worth, using by the definition of standard error of the estimate in Crow, Davis, and Maxfield, 1960, Statistics Manual: Dover Publications (p. 156), the Excel function provides the ?correct? standard error of the estimate. Using notation from Crow, Davis, and Maxfield: > > import numpy as np > n = 4.0 > x = np.array([5.05, 6.75, 3.21, 2.66]) > y = np.array([1.65, 26.5, -5.93, 7.96]) > x2 = x*x > y2 = y*y > s2x = (4.0*x2.sum()-x.sum()*x.sum())/(n*(n-1.0)) > s2y = (4.0*y2.sum()-y.sum()*y.sum())/(n*(n-1.0)) > xy = x * y > b = (4.0*xy.sum()-x.sum()*y.sum())/(4.0*x2.sum()-x.sum()*x.sum()) > a = (y.sum()-b*x.sum())/n > s2xy = ((n-1.0)/(n-2.0))*(s2y-b*b*s2x) > ste = np.sqrt(s2xy) > r=b*np.sqrt(s2x)/np.sqrt(s2y) > print "intercept: ",a > print "gradient (slope): ",b > print "correlation coefficient, r: ",r > print "std err est: ",ste > > Produces the output : > > intercept: -16.2811279931 > gradient (slope): 5.3935773612 > correlation coefficient, r: 0.724435142118 > std err est: 11.6964144616 > > This same value for the standard error of the estimate is reported with the sample x,y data at the VassarStats, Statistical Computation Web Site,http://faculty.vassar.edu/lowry/VassarStats.html. > > Brandon Nuttall, KRPG-1364 > Kentucky Geological Survey > www.uky.edu/kgs > bnuttall at uky.edu (KGS, Mo-We) > Brandon.nuttall at ky.gov (EEC, Th-Fr) > 859-257-5500 ext 30544 (main) > 859-323-0544 (direct) > 859-684-7473 (cell) > 859-257-1147 (FAX) > > From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Bruce Southey > Sent: Sunday, January 10, 2010 8:21 PM > To: SciPy Users List > Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress function > > > > On Sun, Jan 10, 2010 at 3:35 PM, wrote: > > Hello, Excel and scipy.stats.linregress are disagreeing on the standard error of a regression. > > I need to find the standard errors of a bunch of regressions, and prefer to use pure Python than RPy. So I am going to scipy.stats.linregress, as advised at: > http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress > > > from scipy import stats > x = [5.05, 6.75, 3.21, 2.66] > y = [1.65, 26.5, -5.93, 7.96] > gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) > gradient > 5.3935773611970186 > > intercept > -16.281127993087829 > > r_value > 0.72443514211849758 > > r_value**2 > 0.52480627513624778 > > std_err > 3.6290901222878866 > > > The problem is that the std error calculation does not agree with what is returned in Microsoft Excel's STEYX function (whereas all the other output does). From Excel: > > > > > Anybody knows what's going on? Any alternative way of getting the standard error without going to R? > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > The Excel help is rather cryptic by :"Returns the standard error of the predicted y-value for each x in the regression. The standard error is a measure of the amount of error in the prediction of y for an individual x." But clearly this is not the same as the standard error of the 'gradient' (slope) returned by linregress. 
Without checking the formula, STEYX appears returns the square root what most people call the mean square error (MSE). > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 11 16:19:48 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jan 2010 16:19:48 -0500 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: <003BDC42-BD00-440A-A237-1F240BDE1AD1@mac.com> References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> <003BDC42-BD00-440A-A237-1F240BDE1AD1@mac.com> Message-ID: <1cd32cbb1001111319x19e9b5bfveb4ca55258a55592@mail.gmail.com> On Mon, Jan 11, 2010 at 3:08 PM, wrote: > Thanks very much to all who have helped with this. > I am going to go with the first-principles formulae as per below. > Otherwise I also asked on Stack Overflow and one person answered with a > scikits example: > http://stackoverflow.com/questions/2038667/scipy-linregress-function-erroneous-standard-error-return If the old version of linregress matched excel, as you say, then I unintentionally changed the meaning of this value in response to a previous bug report (see http://projects.scipy.org/scipy/ticket/874 ) It's sometimes difficult to figure out what a value is supposed to be, if there are neither sufficient documentation nor tests for it. I had the numbers of linregress verified against statsmodels, but the standard error just means something different than the definition in excel. But as Skipper said, for all but the simplest regression case, scikits.statsmodels is much more general and produces more results. Josef > > On 11 Jan 2010, at 15:47, Nuttall, Brandon C wrote: > > For what it?s worth, using by the definition of standard error of the > estimate in Crow, Davis, and Maxfield, 1960, Statistics Manual: Dover > Publications (p. 156), the Excel function provides the ?correct? standard > error of the estimate. ?Using notation from Crow, Davis, and Maxfield: > > import numpy as np > n = 4.0 > x = np.array([5.05, 6.75, 3.21, 2.66]) > y = np.array([1.65, 26.5, -5.93, 7.96]) > x2 = x*x > y2 = y*y > s2x = (4.0*x2.sum()-x.sum()*x.sum())/(n*(n-1.0)) > s2y = (4.0*y2.sum()-y.sum()*y.sum())/(n*(n-1.0)) > xy = x * y > b = (4.0*xy.sum()-x.sum()*y.sum())/(4.0*x2.sum()-x.sum()*x.sum()) > a = (y.sum()-b*x.sum())/n > s2xy = ((n-1.0)/(n-2.0))*(s2y-b*b*s2x) > ste = np.sqrt(s2xy) > r=b*np.sqrt(s2x)/np.sqrt(s2y) > print "intercept: ",a > print "gradient (slope): ",b > print "correlation coefficient, r: ",r > print "std err est: ",ste > > Produces the output : > > intercept:? -16.2811279931 > gradient (slope):? 5.3935773612 > correlation coefficient, r:? 0.724435142118 > std err est:? 11.6964144616 > > This same value for the standard error of the estimate is reported with the > sample x,y data at the VassarStats, Statistical Computation Web > Site,http://faculty.vassar.edu/lowry/VassarStats.html. 
> > Brandon Nuttall, KRPG-1364 > Kentucky Geological Survey > www.uky.edu/kgs > bnuttall at uky.edu?(KGS, Mo-We) > Brandon.nuttall at ky.gov?(EEC, Th-Fr) > 859-257-5500 ext 30544 (main) > 859-323-0544 (direct) > 859-684-7473 (cell) > 859-257-1147 (FAX) > > From:?scipy-user-bounces at scipy.org?[mailto:scipy-user-bounces at scipy.org]?On > Behalf Of?Bruce Southey > Sent:?Sunday, January 10, 2010 8:21 PM > To:?SciPy Users List > Subject:?Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress > function > > > > > On Sun, Jan 10, 2010 at 3:35 PM, wrote: > > Hello, Excel and scipy.stats.linregress are disagreeing on the standard > error of a regression. > > I need to find the standard errors of a bunch of regressions, and prefer to > use pure Python than RPy. So I am going to scipy.stats.linregress, as > advised at: > http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress > > > from scipy import stats > > x = [5.05, 6.75, 3.21, 2.66] > > y = [1.65, 26.5, -5.93, 7.96] > > gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) > > gradient > > 5.3935773611970186 > > intercept > > -16.281127993087829 > > r_value > > 0.72443514211849758 > > r_value**2 > > 0.52480627513624778 > > std_err > > 3.6290901222878866 > > > The problem is that the std error calculation does not agree with what is > returned in Microsoft Excel's STEYX function (whereas all the other output > does). From Excel: > > > > > Anybody knows what's going on? Any alternative way of getting the standard > error without going to R? > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > The Excel help is rather cryptic by ? :"Returns the standard error of the > predicted y-value for each x in the regression. The standard error is a > measure of the amount of error in the prediction of y for an individual x." > But clearly this is not the same as the standard error of the 'gradient' > (slope) returned by linregress. Without checking the formula, STEYX appears > returns the square root what most people call the mean square error (MSE). > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From totalbull at mac.com Mon Jan 11 16:34:19 2010 From: totalbull at mac.com (totalbull at mac.com) Date: Mon, 11 Jan 2010 21:34:19 +0000 Subject: [SciPy-User] StdErr Problem with Gary Strangman's linregress function In-Reply-To: <1cd32cbb1001111319x19e9b5bfveb4ca55258a55592@mail.gmail.com> References: <92B7777B-E679-4A4F-8867-D91A2ED85FA9@mac.com> <003BDC42-BD00-440A-A237-1F240BDE1AD1@mac.com> <1cd32cbb1001111319x19e9b5bfveb4ca55258a55592@mail.gmail.com> Message-ID: <07D7CD52-2453-4EE1-AA2D-0F5F66FD738C@mac.com> not a problem Josef. The new output was luckily wildly different enough so that "that something had happened" was easy to spot. Again thanks for all the help. Tom On 11 Jan 2010, at 21:19, josef.pktd at gmail.com wrote: > On Mon, Jan 11, 2010 at 3:08 PM, wrote: >> Thanks very much to all who have helped with this. >> I am going to go with the first-principles formulae as per below. 
>> Otherwise I also asked on Stack Overflow and one person answered with a >> scikits example: >> http://stackoverflow.com/questions/2038667/scipy-linregress-function-erroneous-standard-error-return > > If the old version of linregress matched excel, as you say, then I > unintentionally changed the meaning of this value in response to a > previous bug report (see http://projects.scipy.org/scipy/ticket/874 ) > > It's sometimes difficult to figure out what a value is supposed to be, > if there are neither sufficient documentation nor tests for it. I had > the numbers of linregress verified against statsmodels, but the > standard error just means something different than the definition in > excel. > > But as Skipper said, for all but the simplest regression case, > scikits.statsmodels is much more general and produces more results. > > Josef > > > >> >> On 11 Jan 2010, at 15:47, Nuttall, Brandon C wrote: >> >> For what it?s worth, using by the definition of standard error of the >> estimate in Crow, Davis, and Maxfield, 1960, Statistics Manual: Dover >> Publications (p. 156), the Excel function provides the ?correct? standard >> error of the estimate. Using notation from Crow, Davis, and Maxfield: >> >> import numpy as np >> n = 4.0 >> x = np.array([5.05, 6.75, 3.21, 2.66]) >> y = np.array([1.65, 26.5, -5.93, 7.96]) >> x2 = x*x >> y2 = y*y >> s2x = (4.0*x2.sum()-x.sum()*x.sum())/(n*(n-1.0)) >> s2y = (4.0*y2.sum()-y.sum()*y.sum())/(n*(n-1.0)) >> xy = x * y >> b = (4.0*xy.sum()-x.sum()*y.sum())/(4.0*x2.sum()-x.sum()*x.sum()) >> a = (y.sum()-b*x.sum())/n >> s2xy = ((n-1.0)/(n-2.0))*(s2y-b*b*s2x) >> ste = np.sqrt(s2xy) >> r=b*np.sqrt(s2x)/np.sqrt(s2y) >> print "intercept: ",a >> print "gradient (slope): ",b >> print "correlation coefficient, r: ",r >> print "std err est: ",ste >> >> Produces the output : >> >> intercept: -16.2811279931 >> gradient (slope): 5.3935773612 >> correlation coefficient, r: 0.724435142118 >> std err est: 11.6964144616 >> >> This same value for the standard error of the estimate is reported with the >> sample x,y data at the VassarStats, Statistical Computation Web >> Site,http://faculty.vassar.edu/lowry/VassarStats.html. >> >> Brandon Nuttall, KRPG-1364 >> Kentucky Geological Survey >> www.uky.edu/kgs >> bnuttall at uky.edu (KGS, Mo-We) >> Brandon.nuttall at ky.gov (EEC, Th-Fr) >> 859-257-5500 ext 30544 (main) >> 859-323-0544 (direct) >> 859-684-7473 (cell) >> 859-257-1147 (FAX) >> >> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On >> Behalf Of Bruce Southey >> Sent: Sunday, January 10, 2010 8:21 PM >> To: SciPy Users List >> Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress >> function >> >> >> >> >> On Sun, Jan 10, 2010 at 3:35 PM, wrote: >> >> Hello, Excel and scipy.stats.linregress are disagreeing on the standard >> error of a regression. >> >> I need to find the standard errors of a bunch of regressions, and prefer to >> use pure Python than RPy. 
So I am going to scipy.stats.linregress, as >> advised at: >> http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress >> >> >> from scipy import stats >> >> x = [5.05, 6.75, 3.21, 2.66] >> >> y = [1.65, 26.5, -5.93, 7.96] >> >> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y) >> >> gradient >> >> 5.3935773611970186 >> >> intercept >> >> -16.281127993087829 >> >> r_value >> >> 0.72443514211849758 >> >> r_value**2 >> >> 0.52480627513624778 >> >> std_err >> >> 3.6290901222878866 >> >> >> The problem is that the std error calculation does not agree with what is >> returned in Microsoft Excel's STEYX function (whereas all the other output >> does). From Excel: >> >> >> >> >> Anybody knows what's going on? Any alternative way of getting the standard >> error without going to R? >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> The Excel help is rather cryptic by :"Returns the standard error of the >> predicted y-value for each x in the regression. The standard error is a >> measure of the amount of error in the prediction of y for an individual x." >> But clearly this is not the same as the standard error of the 'gradient' >> (slope) returned by linregress. Without checking the formula, STEYX appears >> returns the square root what most people call the mean square error (MSE). >> >> Bruce >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From henrylindsaysmith at gmail.com Mon Jan 11 18:06:16 2010 From: henrylindsaysmith at gmail.com (henry lindsay smith) Date: Mon, 11 Jan 2010 23:06:16 +0000 Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 In-Reply-To: <27026778.post@talk.nabble.com> References: <4AE5DEDF.7070701@asu.edu> <26402986.post@talk.nabble.com> <3d375d730911172231i4cf42760l80038a00f84fa7c8@mail.gmail.com> <27026778.post@talk.nabble.com> Message-ID: <6f0383341001111506w31a07522xbc869c94fb0bcd04@mail.gmail.com> On Wed, Jan 6, 2010 at 5:15 AM, dpfrota wrote: > > > > Robert Kern-2 wrote: > > > > On Wed, Nov 18, 2009 at 00:29, dpfrota wrote: > >> > >> What is the meaning of these adresses? > >> I opened these files, and they has some strange lines. The first file > has > >> only " __import__('pkg_resources').declare_namespace(__name__) ". Is > >> module > >> PKG necessary? > > > > These enable the scikits namespace such that you can have multiple > > scikits packages installed (possibly to separate locations). > > > > -- > > Robert Kern > > > > "I have come to believe that the whole world is an enigma, a harmless > > enigma that is made terrible by our own mad attempt to interpret it as > > though it had an underlying truth." > > -- Umberto Eco > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > I made some tests and I am almost sure the problem is with this file: > "C:\Python26\Lib\site-packages\scikits\audiolab\pysndfile\_sndfile.pyd". 
>
> But I don't know how to see its contents or fix the problem.
>
> Any more tips? (Please!)
> --
> View this message in context:
> http://old.nabble.com/Audiolab-on-Py2.6-tp26064218p27026778.html
> Sent from the Scipy-User mailing list archive at Nabble.com.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

I have this problem as well; as far as I am aware it's not fixed and is a problem linking to sndfile.dll. What are you using audiolab for? I have got round the problem by using wavfile in scipy to open and read wav files and pyaudiere to play audio. I even got 24-bit wav files to read, though I had to alter wavfile.py in my scipy distribution, which is not advisable.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alan at ajackson.org Mon Jan 11 18:22:54 2010
From: alan at ajackson.org (alan at ajackson.org)
Date: Mon, 11 Jan 2010 17:22:54 -0600
Subject: [SciPy-User] Trying to use PIL and numpy
Message-ID: <20100111172254.692ed02a@ajackson.org>

I'm having some issues trying to use PIL and numpy (for the first time). It's probably something simple, it usually is.

When I run the following, the output is all buggered up. It looks like the array indices got switched about somewhere.

import Image
import numpy as np  # needed for np.asarray below

im = Image.open('test.ppm')
im2 = im.convert(mode='F')

a = np.asarray(im2)
imback2 = Image.fromarray(a)

imback = imback2.convert(mode='RGB')
imback.save('testout.png')

I tried removing bits, and it is the asarray -> fromarray sequence that messes stuff up.

I'm running Karmic Koala with
Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
numpy 1.3.0
Image 1.1.6

--
-----------------------------------------------------------------------
| Alan K. Jackson   | To see a World in a Grain of Sand       |
| alan at ajackson.org | And a Heaven in a Wild Flower,          |
| www.ajackson.org  | Hold Infinity in the palm of your hand  |
| Houston, Texas    | And Eternity in an hour. - Blake        |
-----------------------------------------------------------------------

From afraser at lanl.gov Mon Jan 11 18:47:46 2010
From: afraser at lanl.gov (Andy Fraser)
Date: Mon, 11 Jan 2010 16:47:46 -0700
Subject: [SciPy-User] Trying to use PIL and numpy
In-Reply-To: <20100111172254.692ed02a@ajackson.org> (alan@ajackson.org's message of "Mon\, 11 Jan 2010 17\:22\:54 -0600")
References: <20100111172254.692ed02a@ajackson.org>
Message-ID: <87ocl0gq59.fsf@lanl.gov>

>>>>> "AJ" == writes:

AJ> I'm having some issues trying to use PIL and numpy (for the
AJ> first time). It's probably something simple, it usually is.

I'm learning to work with images too. I started with PIL.Image for looking at images, but now I am moving towards pyFltk and ImageMagick. I find it difficult to keep track of how bits in arrays get mapped to pixels on the screen. I end up with lines like A = A.transpose((1,0,2))[::-1,:,:] in my code.
Here is a utility that depends on PIL that I use to look at data:

import numpy
from PIL import Image # /usr/share/pyshared/PIL/Image.py

def display(A, msg='Default msg for displaying an array', MAX=None):
    import tempfile, os
    if A.ndim == 3:
        if A.shape[0] == 3: # This is gdal format
            A = A.transpose((1,2,0))
        else:
            A = A.transpose((1,0,2))[::-1,:,:]
    if A.dtype == numpy.dtype(numpy.bool):
        A = numpy.array(A*255, numpy.uint8)
    if A.dtype != numpy.dtype(numpy.uint8):
        # rescale to the 0-255 range for display
        if MAX == None:
            MAX = A.max(0).max(0)
        MIN = A.min(0).min(0)
        scale = 1.0/(MAX-MIN)
        T = (A-MIN)*scale
        A = numpy.array(T*256, numpy.uint8)
    Name = tempfile.mktemp(dir='temp')
    image = Image.fromarray(A)
    image.save(Name, 'PPM')
    #os.system('eog %s'%Name) # eog is eye of gnome
    os.system('display %s'%Name) # display from ImageMagick
    print msg
    os.system('rm %s'%Name)
    return

From joebarfett at yahoo.ca Mon Jan 11 19:18:07 2010
From: joebarfett at yahoo.ca (Joe Barfett)
Date: Mon, 11 Jan 2010 16:18:07 -0800 (PST)
Subject: [SciPy-User] ifft on images, symmetry artifacts?
Message-ID: <440587.18866.qm@web59411.mail.ac4.yahoo.com>

Hello,
I'm using scipy (numpy.fft.fft2) to transform an image into the frequency domain. Then by using numpy.fft.ifft2 to transform the same image back into the spatial domain, I find that I get symmetry in the image around a reflection line (and not the original image).
Google has revealed websites like this one:
http://www.rzuser.uni-heidelberg.de/~ge6/Programing/convolution.html
This is the code snippet they use:

def Convolution(image1,image2):
    """ Simple convolution example """
    fftimage = fft2(image1)*fft2(image2)
    return ifft2(fftimage).real
#end of Convolution

which uses ifft but generates appropriate output. They do however only use the real component of the frequency domain image.
I find the exact same > approach does not work in my case, but rather gives these weird symmetries. > It's been a few weeks of hacking and I would really appreciate the guidance > of someone more experienced than me. Thanks a great deal if you know the > answer! > joe > Something I recently found that might be helpful http://blogs.mathworks.com/steve/2009/12/04/fourier-transform-visualization-using-windowing/ http://shalin.wordpress.com/2009/12/06/fftifft/ (I know next to nothing about image processing but struggle with fft in general) Josef > > > ------------------------------ > The new Internet Explorer? 8 - Faster, safer, easier. Optimized for Yahoo! > *Get it Now for Free!* > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_baddeley at yahoo.com.au Mon Jan 11 20:21:25 2010 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Mon, 11 Jan 2010 17:21:25 -0800 (PST) Subject: [SciPy-User] ifft on images, symmetry artifacts? In-Reply-To: <440587.18866.qm@web59411.mail.ac4.yahoo.com> References: <440587.18866.qm@web59411.mail.ac4.yahoo.com> Message-ID: <243264.91705.qm@web33001.mail.mud.yahoo.com> There are few possibilities - the most likely is that you are taking either the real part or the absolute value in the frequency domain. This kills all the phase information and results in a symmetric image. Note that the code snippet you cite takes the real part AFTER the inverse transformation, which is perfectly legit. hope this helps, David ________________________________ From: Joe Barfett To: scipy-user at scipy.org Sent: Tue, 12 January, 2010 1:18:07 PM Subject: [SciPy-User] ifft on images, symmetry artifacts? Hello, I'm using scipy (numpy.fft.fft2) to transform an image into the frequency domain. Then by using numpy.fft.ifft2 to transform the same image back into the spatial domain, I find that I get symmetry in the image around a reflection line (and not the original image). Google has revealed websites like this one: http://www.rzuser.uni-heidelberg.de/~ge6/Programing/convolution.html This is the code snippet they use: def Convolution(image1,image2): """ Simple convolution example """ fftimage = fft2(image1)*fft2(image2) return ifft2(fftimage).real #end of Convolution which uses ifft but generates appropriate output. They do however only use the real component of the frequency domain image. I find the exact same approach does not work in my case, but rather gives these weird symmetries. It's been a few weeks of hacking and I would really appreciate the guidance of someone more experienced than me. Thanks a great deal if you know the answer! joe ________________________________ The new Internet Explorer? 8 - Faster, safer, easier. Optimized for Yahoo! Get it Now for Free! -------------- next part -------------- An HTML attachment was scrubbed... URL: From koepsell at gmail.com Mon Jan 11 21:02:53 2010 From: koepsell at gmail.com (Kilian Koepsell) Date: Mon, 11 Jan 2010 18:02:53 -0800 Subject: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup? In-Reply-To: References: Message-ID: <32EE5911-8078-4F1B-887E-00FBE702057E@gmail.com> Jordi, > On Jan 7, 2010, at 1:09 AM, Jordi Molins Coronado wrote: >> >> Hello, I am new to this forum. 
I am looking for a numerical >> solution to the inverse problem of an Ising model (or a model not- >> unlike the Ising model, see below). I have seen an old discussion, >> but very interesting, about this subject on this forum (http://mail.scipy.org/pipermail/scipy-user/2006-October/009703.html >> ). >> You might want to check out a recent method developed in our group, called "Minimum Probability Flow Learning" that allows very fast parameter estimation of basically any distribution -- including the Ising model. A 100 unit ising model can be fitted within about 1 minute (see Fig. 3). The paper is here: http://arxiv.org/abs/0906.4779 Kilian -- Kilian Koepsell, PhD Redwood Center for Theoretical Neuroscience Helen Wills Neuroscience Institute, UC Berkeley 156 Stanley Hall, MC# 3220 , Berkeley, CA 94720 From Jim.Vickroy at noaa.gov Tue Jan 12 10:27:53 2010 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Tue, 12 Jan 2010 08:27:53 -0700 Subject: [SciPy-User] Trying to use PIL and numpy In-Reply-To: <20100111172254.692ed02a@ajackson.org> References: <20100111172254.692ed02a@ajackson.org> Message-ID: <4B4C94F9.9060207@noaa.gov> alan at ajackson.org wrote: > I'm having some issues trying to use PIL and numpy (for the first time). > It's probably something simple, it usually is. > > When I run the following, the output is all buggered up. It looks like > the array indicies got switched about somewhere. > > import Image > im = Image.open('test.ppm') > im2 = im.convert(mode='F') > > a = np.asarray(im2) > imback2 = Image.fromarray(a) > > imback = imback2.convert(mode='RGB') > imback.save('testout.png') > > I tried removing bits, and it is the asarray -> fromarray sequence that > messes stuff up. > > I'm running Karmic Koala with > Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15) > numpy 1.3.0 > Image 1.1.6 > > > I believe there is a logic error in the PIL 1.1.6 fromarray() procedure (see http://mail.scipy.org/pipermail/numpy-discussion/2006-December/024903.html) that may be relevant. Try explicitly specifying the mode parameter in the fromarray(...) call. -- jv -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.shepard at gmail.com Tue Jan 12 10:50:10 2010 From: peter.shepard at gmail.com (Pete Shepard) Date: Tue, 12 Jan 2010 07:50:10 -0800 Subject: [SciPy-User] dendogram axis display Message-ID: <5c2c43621001120750n7e064488n5dbbff1bd0b46a6c@mail.gmail.com> Hello, I am making a dendogram of clusters using "hcluster.py", the x-axis contains the distances between each cluster. I would like for the y-axis to also display the distances between the clusters, is this possible? Also, can the scale of the graph be controlled, eg display clusters that are separated by distances of <100? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dharhas.Pothina at twdb.state.tx.us Tue Jan 12 13:39:13 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 12 Jan 2010 12:39:13 -0600 Subject: [SciPy-User] Masking multiple fields in a structured timeseriesobject. Message-ID: <4B4C6D710200009B00026515@GWWEB.twdb.state.tx.us> Sorry I'm still having trouble figuring out how to do multiple masking on a limited date range rather than the entire series. 
For a simpler example, look at the below ts construct:

>>> ndtype=[('name','|S3'),('v1',float),('v2',float)]
>>> series=ts.time_series([("ABBC",1.1,10.),("ABD",2.2,20.),("ABBE",3.3,30),("ABBF",4.4,40),("ABG",5.5,50),("ABH",6.6,60)],dtype=ndtype, start_date=ts.now('D'))
>>> sdate = series.dates[1]
>>> edate = series.dates[4]

now I want to mask the v1 value between sdate and edate for entries that contain 'BB' in the name and have v1<4 and v2>10. ie the 3rd element ("ABBE",3.3,30) would become ("ABBE",--,30)

thanks,

- dharhas

From pgmdevlist at gmail.com Tue Jan 12 15:10:48 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Tue, 12 Jan 2010 15:10:48 -0500
Subject: [SciPy-User] Masking multiple fields in a structured timeseriesobject.
In-Reply-To: <4B4C6D710200009B00026515@GWWEB.twdb.state.tx.us>
References: <4B4C6D710200009B00026515@GWWEB.twdb.state.tx.us>
Message-ID:

On Jan 12, 2010, at 1:39 PM, Dharhas Pothina wrote:
> Sorry I'm still having trouble figuring out how to do multiple masking on a limited date range rather than the entire series. For a simpler example, look at the below ts construct:
>
>>>> ndtype=[('name','|S3'),('v1',float),('v2',float)]
>>>> series=ts.time_series([("ABBC",1.1,10.),("ABD",2.2,20.),("ABBE",3.3,30),("ABBF",4.4,40),("ABG",5.5,50),("ABH",6.6,60)],dtype=ndtype, start_date=ts.now('D'))
>>>> sdate = series.dates[1]
>>>> edate = series.dates[4]
>
> now I want to mask the v1 value between sdate and edate for entries that contain 'BB' in the name and have v1<4 and v2>10. ie the 3rd element ("ABBE",3.3,30) would become ("ABBE",--,30)

Well, if I do your job for you, where's the fun ;) ? Seriously, why don't you build several masks and combine them as you want ?
* Make a mask M1 for the 'BB' in name (use an approach similar to the one I posted last time)
* Make a mask M2 that tests the values:
>>> M2=(_series['v1']<4)&(_series['v2']>10)
* Make a mask M3 that tests for the dates:
>>> M3=(series.dates>=sdate)&(series.dates<=edate)
* Combine the masks:
>>> Mall=np.array(M1&M2&M3, dtype=bool)
(we need to make sure that Mall is a boolean ndarray, and not an array of 0 and 1, else we mess up fancy indexing)
* Mask 'v1' according to the new mask:
>>> series['v1'][Mall]=ma.masked

Notes:
* you use 3 characters for name, but try to put strings with 4 characters. Expect problems.
* When you build the masks, use series.series as much as you can (that'll save you some time)

From Dharhas.Pothina at twdb.state.tx.us Tue Jan 12 15:33:47 2010
From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina)
Date: Tue, 12 Jan 2010 14:33:47 -0600
Subject: [SciPy-User] Masking multiple fields in a structuredtimeseriesobject.
Message-ID: <4B4C884B0200009B0002653A@GWWEB.twdb.state.tx.us>

Thank you, I finally got it. I guess I had difficulty in conceptually treating the series and dates separately. I kept trying to apply the masks using 'series[start:end]' and ended up with my indices mismatching.
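
For the archive, here's roughly what I ended up with on the toy series above (a sketch; M1/M2/M3 are just my labels from Pierre's outline, and I haven't run this against the real data yet):

import numpy as np
import numpy.ma as ma

_series = series.series  # plain masked array, no dates to carry around
M1 = np.array(['BB' in name for name in _series['name']])
M2 = (_series['v1'] < 4) & (_series['v2'] > 10)
M3 = (series.dates >= sdate) & (series.dates <= edate)
Mall = np.array(M1 & M2 & M3, dtype=bool)  # force a bool ndarray for fancy indexing
series['v1'][Mall] = ma.masked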
For a simpler example, look at the below ts construct: > >>>> ndtype=[('name','|S3'),('v1',float),('v2',float)] >>>> series=ts.time_series([("ABBC",1.1,10.),("ABD",2.2,20.),("ABBE",3.3,30),("ABBF",4.4,40),("ABG",5.5,50),("ABH",6.6,60)],dtype=ndtype, start_date=ts.now('D')) >>>> sdate = series.dates[1] >>>> edate = series.dates[4] > > Now I want to mask the v1 value between sdate and edate for rows that contain 'BB' in the name and have v1<4 and v2>10, i.e. the 3rd element ("ABBE",3.3,30) would become ("ABBE",--,30). Well, if I do your job for you, where's the fun ;) ? Seriously, why don't you build several masks and combine them as you want? * Make a mask M1 for the 'BB' in name (use an approach similar to the one I posted last time) * Make a mask M2 that tests the values: >>> M2=(_series['v1']<4)&(_series['v2']>10) * Make a mask M3 that tests for the dates: >>> M3=(series.dates>=sdate)&(series.dates<=edate) * Combine the three masks: >>> Mall=np.array(M1&M2&M3, dtype=bool) (we need to make sure that Mall is a boolean ndarray, and not an array of 0s and 1s, else we mess up fancy indexing) * Mask 'v1' according to the new mask: >>> series['v1'][Mall]=ma.masked Notes: * you use 3 characters for name, but try to put in strings with 4 characters. Expect problems. * When you build the masks, use series.series as much as you can (that'll save you some time) _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pgmdevlist at gmail.com Tue Jan 12 15:56:30 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 12 Jan 2010 15:56:30 -0500 Subject: [SciPy-User] Masking multiple fields in a structuredtimeseriesobject. In-Reply-To: <4B4C884B0200009B0002653A@GWWEB.twdb.state.tx.us> References: <4B4C884B0200009B0002653A@GWWEB.twdb.state.tx.us> Message-ID: <7720AAA9-CAA2-454D-A62B-63EFAEDF7A81@gmail.com> On Jan 12, 2010, at 3:33 PM, Dharhas Pothina wrote: > Thank you, I finally got it. I guess I had difficulty in conceptually treating the series and dates separately. I kept trying to apply the masks using 'series[start:end]' and ended up with my indices mismatching. > > On a related note, is there any way to do the following without using a loop? > > _series['name'][1:3] == 'BB' > > right now this gives me the 1st and 2nd entries in _series['name'] rather than the 1st and 2nd characters of all entries in _series['name'] _series['name'] is a 1D array w/ dtype '|S3'. What you'd want is to transform it into a 2D array of '|S1'. You could try to look at chararray, but I'm not sure it'll help you. I'm afraid you're gonna have to stick w/ the for loop. You may get it inlined, though: ['BB' in _ for _ in _series['name']] > thanks. From Dharhas.Pothina at twdb.state.tx.us Tue Jan 12 16:01:47 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 12 Jan 2010 15:01:47 -0600 Subject: [SciPy-User] Masking multiple fields in astructuredtimeseriesobject. In-Reply-To: <7720AAA9-CAA2-454D-A62B-63EFAEDF7A81@gmail.com> References: <4B4C884B0200009B0002653A@GWWEB.twdb.state.tx.us> <7720AAA9-CAA2-454D-A62B-63EFAEDF7A81@gmail.com> Message-ID: <4B4C8EDA.63BA.009B.0@twdb.state.tx.us> _series['name'] is a 1D array w/ dtype '|S3'. What you'd want is to transform it into a 2D array of '|S1'. You could try to look at chararray, but I'm not sure it'll help you. I'm afraid you're gonna have to stick w/ the for loop. You may get it inlined, though: ['BB' in _ for _ in _series['name']] thanks. Would the inline version be any faster or is it pretty much equivalent to an ordinary loop?
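For what it's worth, numpy's vectorized string routines offer an alternative to the Python-level loop. A minimal sketch (requires a reasonably recent numpy; the names array below is hypothetical, not the series from this thread):

import numpy as np

names = np.array(['ABBC', 'ABD', 'ABBE', 'ABG'])
# element-wise substring search; np.char.find returns -1 where 'BB' is absent
mask = np.char.find(names, 'BB') >= 0
# mask is now array([ True, False,  True, False], dtype=bool)

np.char pushes the loop into compiled code, which mostly matters for large arrays.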
- dharhas From pgmdevlist at gmail.com Tue Jan 12 16:06:10 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 12 Jan 2010 16:06:10 -0500 Subject: [SciPy-User] Masking multiple fields in astructuredtimeseriesobject. In-Reply-To: <4B4C8EDA.63BA.009B.0@twdb.state.tx.us> References: <4B4C884B0200009B0002653A@GWWEB.twdb.state.tx.us> <7720AAA9-CAA2-454D-A62B-63EFAEDF7A81@gmail.com> <4B4C8EDA.63BA.009B.0@twdb.state.tx.us> Message-ID: On Jan 12, 2010, at 4:01 PM, Dharhas Pothina wrote: > > > > _series['name'] is a 1D array w/ dtype '|S3'. What you'd want is to transform it into a 2D array of '|S1'. You could try to look chararray, but I'm not sure it'll help you. I'm afraid you gonna have to stick w/ the for loop. You may get it inlined, though: > ['BB' in _ for _ in _series['name']] > > thanks. Would the inline version be any faster or is it pretty much equivalent to an ordinary loop? I think inlined loops are a tad faster than the regular ones (they get optimized by the interpreter, if I understand correctly). Not 100% sure, though. From emmanuelle.gouillart at normalesup.org Tue Jan 12 17:02:51 2010 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Tue, 12 Jan 2010 23:02:51 +0100 Subject: [SciPy-User] is it worth working on ndimage documentation? Message-ID: <20100112220251.GC7417@phare.normalesup.org> Hello, as I'm using quite frequently some functions in scipy.ndimage (mostly mathematical morphology operations), I was considering working on their docstrings on the doc wiki. Docstrings indeed don't conform to the documentation standard and are often quite terse. However, I would like to know beforehand whether ndimage has a future in scipy, or whether if will be replaced at some point by the scikit image? So, it is worth improving the docstrings in ndimage? Cheers, Emmanuelle From dwf at cs.toronto.edu Tue Jan 12 17:25:41 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 12 Jan 2010 17:25:41 -0500 Subject: [SciPy-User] is it worth working on ndimage documentation? In-Reply-To: <20100112220251.GC7417@phare.normalesup.org> References: <20100112220251.GC7417@phare.normalesup.org> Message-ID: <26A3BF94-13B2-464E-8133-65BF5EE9F98A@cs.toronto.edu> Hi Emmanuelle, I think it certainly does. If scikits.image does ever supersede ndimage (and I don't think it will - scikits.image is mainly focused on 2D images whereas I think ndimage is used for lots of 3D and 4D voxel images too?), it will likely take on functions from ndimage as well... in fact I think there is a ticket somewhere that contains parts of ndimage rewritten in Cython by the CellProfiler people (I don't have time to dig through my email to find it). Needless to say I think there is enough current use of ndimage that it's not going anywhere any time soon. David On 12-Jan-10, at 5:02 PM, Emmanuelle Gouillart wrote: > Hello, > > as I'm using quite frequently some functions in scipy.ndimage > (mostly mathematical morphology operations), I was considering > working on > their docstrings on the doc wiki. Docstrings indeed don't conform to > the > documentation standard and are often quite terse. > > However, I would like to know beforehand whether ndimage has a > future in scipy, or whether if will be replaced at some point by the > scikit image? So, it is worth improving the docstrings in ndimage? 
> > Cheers, > > Emmanuelle > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cycomanic at gmail.com Tue Jan 12 17:54:09 2010 From: cycomanic at gmail.com (Jochen Schroeder) Date: Wed, 13 Jan 2010 09:54:09 +1100 Subject: [SciPy-User] ifft on images, symmetry artifacts? In-Reply-To: <440587.18866.qm@web59411.mail.ac4.yahoo.com> References: <440587.18866.qm@web59411.mail.ac4.yahoo.com> Message-ID: <20100112225407.GA2238@cudos0803> On 01/11/10 16:18, Joe Barfett wrote: > Hello, > I'm using scipy (numpy.fft.fft2) to transform an image into the frequency > domain. Then by using numpy.fft.ifft2 to transform the same image back into the > spatial domain, I find that I get symmetry in the image around a reflection > line (and not the original image). I'm struggling a bit to understand what exactly you're doing. In general you have to be careful when you plot your resulting function, i.e. do you want to plot the real part or the absolute value of the image? Anyway can you maybe post your code and the image you're converting? can sometimes lead to weird symmetry artifacts, e.g if you > Google has revealed websites like this one: http://www.rzuser.uni-heidelberg.de > /~ge6/Programing/convolution.html > This is the code snippet they use: > > def Convolution(image1,image2): > """ Simple convolution example """ > fftimage = fft2(image1)*fft2(image2) > return ifft2(fftimage).real > #end of Convolution > > which uses ifft but generates appropriate output. They do however only use the > real component of the frequency domain image. I find the exact same approach > does not work in my case, but rather gives these weird symmetries. > It's been a few weeks of hacking and I would really appreciate the guidance of > someone more experienced than me. Thanks a great deal if you know the answer! > joe > > > ??????????????????????????????????????????????????????????????????????????????? > The new Internet Explorer 8 - Faster, safer, easier. Optimized for Yahoo! Get > it Now for Free! > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From emmanuelle.gouillart at normalesup.org Wed Jan 13 02:56:19 2010 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Wed, 13 Jan 2010 08:56:19 +0100 Subject: [SciPy-User] is it worth working on ndimage documentation? In-Reply-To: <26A3BF94-13B2-464E-8133-65BF5EE9F98A@cs.toronto.edu> References: <20100112220251.GC7417@phare.normalesup.org> <26A3BF94-13B2-464E-8133-65BF5EE9F98A@cs.toronto.edu> Message-ID: <20100113075619.GA6894@phare.normalesup.org> Thanks for your answer, David! Emmanuelle On Tue, Jan 12, 2010 at 05:25:41PM -0500, David Warde-Farley wrote: > Hi Emmanuelle, > I think it certainly does. If scikits.image does ever supersede > ndimage (and I don't think it will - scikits.image is mainly focused > on 2D images whereas I think ndimage is used for lots of 3D and 4D > voxel images too?), it will likely take on functions from ndimage as > well... in fact I think there is a ticket somewhere that contains > parts of ndimage rewritten in Cython by the CellProfiler people (I > don't have time to dig through my email to find it). > Needless to say I think there is enough current use of ndimage that > it's not going anywhere any time soon. 
> David > On 12-Jan-10, at 5:02 PM, Emmanuelle Gouillart wrote: > > Hello, > > as I'm using quite frequently some functions in scipy.ndimage > > (mostly mathematical morphology operations), I was considering > > working on > > their docstrings on the doc wiki. Docstrings indeed don't conform to > > the > > documentation standard and are often quite terse. > > However, I would like to know beforehand whether ndimage has a > > future in scipy, or whether if will be replaced at some point by the > > scikit image? So, it is worth improving the docstrings in ndimage? > > Cheers, > > Emmanuelle > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From jordi_molins at hotmail.com Wed Jan 13 03:42:07 2010 From: jordi_molins at hotmail.com (Jordi Molins Coronado) Date: Wed, 13 Jan 2010 09:42:07 +0100 Subject: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup? In-Reply-To: <32EE5911-8078-4F1B-887E-00FBE702057E@gmail.com> References: , <32EE5911-8078-4F1B-887E-00FBE702057E@gmail.com> Message-ID: Hello, I find all the ideas posted in reply to my message very interesting, thank you very much to all who have answered to my question. Especially, I would like to know more about Kilian's and Robin's suggestions. In particular, I find difficult to understand and translate the ideas posted by them into my background. Of course, this is not Kilian's or Robin's fault, but my complete fault due to lack of knowledge. To Robin: - Is there a paper covering your package, but explained in layman's terms, not requiring previous knowledge on the subject? Or maybe a simple but fully-worked example (ideally closely related to the Ising model) that can be used in your package to see how everything works. To Kilian: - Do you have a computer package that covers that computations in your paper? Or do you have the Ising code available to distribution? I would be very interested to know more about the Ising implementation of your paper. Kind regards Jordi > CC: jordi_molins at hotmail.com > From: koepsell at gmail.com > To: scipy-user at scipy.org > Subject: Re: [SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup? > Date: Mon, 11 Jan 2010 18:02:53 -0800 > > Jordi, > > > On Jan 7, 2010, at 1:09 AM, Jordi Molins Coronado wrote: > >> > >> Hello, I am new to this forum. I am looking for a numerical > >> solution to the inverse problem of an Ising model (or a model not- > >> unlike the Ising model, see below). I have seen an old discussion, > >> but very interesting, about this subject on this forum (http://mail.scipy.org/pipermail/scipy-user/2006-October/009703.html > >> ). > >> > > You might want to check out a recent method developed in our group, > called "Minimum Probability Flow Learning" that allows very fast > parameter > estimation of basically any distribution -- including the Ising model. > A 100 unit ising model can be fitted within about 1 minute (see Fig. 3). > The paper is here: http://arxiv.org/abs/0906.4779 > > Kilian > > -- > Kilian Koepsell, PhD > Redwood Center for Theoretical Neuroscience > Helen Wills Neuroscience Institute, UC Berkeley > 156 Stanley Hall, MC# 3220 , Berkeley, CA 94720 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gorkypl at gmail.com Wed Jan 13 18:15:59 2010 From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=) Date: Thu, 14 Jan 2010 00:15:59 +0100 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? Message-ID: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> hello, I'm doing some research on climate data, using Python with NumPy. Yesterday I started implementing the scikits.timeseries package in my work, which occured to be almost perfect idea, but recently I ran into a problem with data visualisation. After some (not many) tests it seems that something weird happens when there is a gap in the data - the drawing is stopped there. To be more clear - after compiling the first example from the page: http://pytseries.sourceforge.net/lib.plotting.examples.html the result is: http://img191.imageshack.us/img191/5506/testg.png So it looks like the plotting was somehow 'stopped' after the first occurence of a hole in the data. As you can see the horizontal scale is correct (it's the same as on the webpage), but the one along the y-axis seems to be aligned to fit the broken plot. The other two examples (with consistent datasets) are plotted without a problem. Do you have any idea what could be the reason of this? What settings/packages should I check? greetings, Pawe? Rumian From pgmdevlist at gmail.com Wed Jan 13 19:00:51 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 Jan 2010 19:00:51 -0500 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? In-Reply-To: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> References: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> Message-ID: <67D67277-03EF-43DF-8FFD-2D42C727544E@gmail.com> On Jan 13, 2010, at 6:15 PM, Pawe? Rumian wrote: > hello, > > I'm doing some research on climate data, using Python with NumPy. Cool ! You can also check scikits.hydroclimpy, a set of extensions to scikits.timeseries with focus on climate analysis. > Yesterday I started implementing the scikits.timeseries package in my > work, which occured to be almost perfect idea, but recently I ran into > a problem with data visualisation. > > After some (not many) tests it seems that something weird happens when > there is a gap in the data - the drawing is stopped there. Ah. You're using the same data, right ? > > > The other two examples (with consistent datasets) are plotted without a problem. > > Do you have any idea what could be the reason of this? Might be a bug recently introduced. Let me check and get back to you. Note that this should not deter you from using scikits.timeseries. You can always plot your data using the regular matplotlib options (using your dates as x and your series as y). Lemme know if you need more help or if the doc is lacking on some aspects. From davide_fiocco at yahoo.it Wed Jan 13 19:47:44 2010 From: davide_fiocco at yahoo.it (davide_fiocco at yahoo.it) Date: Wed, 13 Jan 2010 16:47:44 -0800 (PST) Subject: [SciPy-User] Get array of separation vectors from an array a vectors Message-ID: <31b649b6-b1c0-4905-aa5a-e49e25e0fc62@z41g2000yqz.googlegroups.com> Hi folks, I'm new to Python and I'm trying to implement a basic molecular dynamics code. The problem I have is the following: Suppose you have an array of N vectors in R^3 like: A = [ [x1,y1,z1], [x2,y2,z2], ..., [xN,yN,zN] ] what I need is to get N!/(2! (N-2)!) separation vectors between the vectors in A, i.e. 
D = [ [x1-x2,y1-y2,z1-z2], [x1-x3,y1-y3,z1-z3], ..., [x2-x3,y2-y3,z2- z3], ..., [x_i-x_j,y_i-y_j,z_i-z_j],...] and I need the code to be FAST! Else I think I'll switch to a Fortran/ F2Py implementation. I'd say this task is not too different to what scipy.spatial.distance.pdist() does, with the difference that i don't need (the euclidean, say) distance but the differences between all the pairs of vectors in A. All suggestions will be very welcome, and I apologize if this is too trivial! Thank you. Davide From josef.pktd at gmail.com Wed Jan 13 20:34:11 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 13 Jan 2010 20:34:11 -0500 Subject: [SciPy-User] Get array of separation vectors from an array a vectors In-Reply-To: <31b649b6-b1c0-4905-aa5a-e49e25e0fc62@z41g2000yqz.googlegroups.com> References: <31b649b6-b1c0-4905-aa5a-e49e25e0fc62@z41g2000yqz.googlegroups.com> Message-ID: <1cd32cbb1001131734o1414e0b3r6dea85940ee8349@mail.gmail.com> On Wed, Jan 13, 2010 at 7:47 PM, davide_fiocco at yahoo.it wrote: > Hi folks, > I'm new to Python and I'm trying to implement a basic molecular > dynamics code. > > The problem I have is the following: > Suppose you have an array of N vectors in R^3 like: > A = [ [x1,y1,z1], [x2,y2,z2], ..., [xN,yN,zN] ] > > what I need is to get N!/(2! (N-2)!) separation vectors between the > vectors in A, i.e. > D = [ [x1-x2,y1-y2,z1-z2], [x1-x3,y1-y3,z1-z3], ..., [x2-x3,y2-y3,z2- > z3], ..., ?[x_i-x_j,y_i-y_j,z_i-z_j],...] > > and I need the code to be FAST! Else I think I'll switch to a Fortran/ > F2Py implementation. > > I'd say this task is not too different to what > scipy.spatial.distance.pdist() does, with the difference that i don't > need (the euclidean, say) distance but the differences between all the > pairs of vectors in A. > > All suggestions will be very welcome, and I apologize if this is too > trivial! Thank you. Something along the following, is the only thing I can come up with. Still requires intermediate arrays, and I thought I saw somewhere in numpy a function that creates the indices for a triu (but don't remember where) import numpy as np n = 5 #4 a = np.arange(n*3).reshape(n,3) print a #full ind0, ind1 = np.mgrid[0:n,0:n] ind0, ind1 = ind0.ravel(), ind1.ravel() d = a[ind1,:]-a[ind0,:] print d #reduced triuind0, triuind1 = np.nonzero(np.triu(np.ones((n,n)),k=1)) dr = a[triuind0,:]-a[triuind1,:] print dr ''' >>> import scipy >>> scipy.comb(4,2,exact=1) 6L >>> scipy.comb(5,2,exact=1) 10L ''' Warning quickly written and untested. >>> a array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11], [12, 13, 14]]) >>> dr array([[ -3, -3, -3], [ -6, -6, -6], [ -9, -9, -9], [-12, -12, -12], [ -3, -3, -3], [ -6, -6, -6], [ -9, -9, -9], [ -3, -3, -3], [ -6, -6, -6], [ -3, -3, -3]]) Josef > > Davide > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From j33433 at gmail.com Wed Jan 13 20:34:56 2010 From: j33433 at gmail.com (James) Date: Wed, 13 Jan 2010 20:34:56 -0500 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? In-Reply-To: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> References: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> Message-ID: This is purely a guess, but I wonder if quotes_historical_yahoo failed to fully fetch the quotes, then perhaps cached the bad data. On Wed, Jan 13, 2010 at 6:15 PM, Pawe? 
Rumian wrote: > hello, > > I'm doing some research on climate data, using Python with NumPy. > Yesterday I started implementing the scikits.timeseries package in my > work, which occured to be almost perfect idea, but recently I ran into > a problem with data visualisation. > > After some (not many) tests it seems that something weird happens when > there is a gap in the data - the drawing is stopped there. > > To be more clear - after compiling the first example from the page: > http://pytseries.sourceforge.net/lib.plotting.examples.html > the result is: > http://img191.imageshack.us/img191/5506/testg.png > > So it looks like the plotting was somehow 'stopped' after the first > occurence of a hole in the data. > > As you can see the horizontal scale is correct (it's the same as on > the webpage), but the one along the y-axis seems to be aligned to fit > the broken plot. > > The other two examples (with consistent datasets) are plotted without a problem. > > Do you have any idea what could be the reason of this? > What settings/packages should I check? > > greetings, > Pawe? Rumian > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From alan at ajackson.org Wed Jan 13 22:32:04 2010 From: alan at ajackson.org (alan at ajackson.org) Date: Wed, 13 Jan 2010 21:32:04 -0600 Subject: [SciPy-User] Trying to use PIL and numpy - SOLVED- In-Reply-To: <4B4C94F9.9060207@noaa.gov> References: <20100111172254.692ed02a@ajackson.org> <4B4C94F9.9060207@noaa.gov> Message-ID: <20100113213204.2057ab13@ajackson.org> >alan at ajackson.org wrote: >> I'm having some issues trying to use PIL and numpy (for the first time). >> It's probably something simple, it usually is. >> >> When I run the following, the output is all buggered up. It looks like >> the array indicies got switched about somewhere. >> >> import Image >> im = Image.open('test.ppm') >> im2 = im.convert(mode='F') >> >> a = np.asarray(im2) >> imback2 = Image.fromarray(a) >> >> imback = imback2.convert(mode='RGB') >> imback.save('testout.png') >> >> I tried removing bits, and it is the asarray -> fromarray sequence that >> messes stuff up. >> >> I'm running Karmic Koala with >> Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15) >> numpy 1.3.0 >> Image 1.1.6 >> >> >> >I believe there is a logic error in the PIL 1.1.6 fromarray() procedure >(see >http://mail.scipy.org/pipermail/numpy-discussion/2006-December/024903.html) >that may be relevant. >Try explicitly specifying the mode parameter in the fromarray(...) call. > -- jv Bingo! editing that line to imback2 = Image.fromarray(a, mode='F') fixes the problem. -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From gorkypl at gmail.com Thu Jan 14 03:15:19 2010 From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=) Date: Thu, 14 Jan 2010 09:15:19 +0100 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? 
In-Reply-To: <67D67277-03EF-43DF-8FFD-2D42C727544E@gmail.com> References: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> <67D67277-03EF-43DF-8FFD-2D42C727544E@gmail.com> Message-ID: <5158a0651001140015j253df04y94879107bb902046@mail.gmail.com> > Cool ! You can also check scikits.hydroclimpy, a set of extensions to > scikits.timeseries with focus on climate analysis. I'm reading the docs right now - seems that I've reinvented the wheel sometimes... The good side is that I've started two weeks ago, so I didn't manage to waste too much time. > Might be a bug recently introduced. Let me check and get back to you. After more testing it seems to me more like a bug in matplotlib. It occurs only when plotting lines, using '-' or '--'. When I changed marks to '.', it worked. > Lemme know if you need more help or if the doc is lacking on some aspects. I will be playing with this stuff for at least a year, so I probably will :) Anyway - great job! greetings, Pawe? Rumian From qa at takb.net Thu Jan 14 04:23:28 2010 From: qa at takb.net (Torsten Andre) Date: Thu, 14 Jan 2010 10:23:28 +0100 Subject: [SciPy-User] Integration of double integral with integration variable as Message-ID: <4B4EE290.7070600@takb.net> Hey everyone, I am new to SciPy, but need to integrate something like this, where the boundaries of the inner integral are terms of outer variable's integration variable: \int{\int{sin(y)dy}_{-x}^{+x}dx}_0^1 Is this feasible in SciPy? I tried using quad but it only complains that x is not defined. Unfortunately I was unable to find anything on the list or in the documentation. Thanks for your time. Cheers, Torsten From ljmamoreira at gmail.com Thu Jan 14 08:21:04 2010 From: ljmamoreira at gmail.com (Jose Amoreira) Date: Thu, 14 Jan 2010 13:21:04 +0000 Subject: [SciPy-User] Integration of double integral with integration variable as In-Reply-To: <4B4EE290.7070600@takb.net> References: <4B4EE290.7070600@takb.net> Message-ID: <201001141321.04818.ljmamoreira@gmail.com> Torsten, Your example is easy! Since sin(y) is an odd function, integrating over [-x,x] gives zero and that's it. For a more illustrative example, replace sin with cos (still easy enough to do it quicker analytically). Using scipy.integrate.quad, you do it like this (excerpt from an idle session): >>> def g(x): return quad(cos,-x,x)[0] >>> quad(g,0.,1.)[0] 0.91939538826372047 The reason I take the zero-th element of the quad output is that the remaining is an estimate of the error. Maybe you should also look into scipy.integrate.dblquad. Hope this helps. jose On Thursday 14 January 2010 09:23:28 am Torsten Andre wrote: > Hey everyone, > > I am new to SciPy, but need to integrate something like this, where the > boundaries of the inner integral are terms of outer variable's > integration variable: > > \int{\int{sin(y)dy}_{-x}^{+x}dx}_0^1 > > Is this feasible in SciPy? I tried using quad but it only complains that > x is not defined. Unfortunately I was unable to find anything on the > list or in the documentation. > > Thanks for your time. > > Cheers, > Torsten > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From Dharhas.Pothina at twdb.state.tx.us Thu Jan 14 09:02:16 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 14 Jan 2010 08:02:16 -0600 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? 
In-Reply-To: <5158a0651001140015j253df04y94879107bb902046@mail.gmail.com> References: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> <67D67277-03EF-43DF-8FFD-2D42C727544E@gmail.com> <5158a0651001140015j253df04y94879107bb902046@mail.gmail.com> Message-ID: <4B4ECF88.63BA.009B.0@twdb.state.tx.us> I've had this problem before. From the email exchange I had on this list about a year ago, I basically worked out that any symbols work fine, i.e. dot, circle, diamond, etc. When you use line types like '-' or '--' and have missing or masked data in the timeseries, the plotting functions don't know what to do and just fail. From what I understand this is a matplotlib issue, and a workaround is to compress the array to remove the missing values before plotting. See: http://old.nabble.com/Re%3A-Still-having-plotting-issue-with-latest%09svnscikits.timeseries-ts20941722.html#a20944512 - dharhas >>> Paweł Rumian 1/14/2010 2:15 AM >>> > Cool ! You can also check scikits.hydroclimpy, a set of extensions to > scikits.timeseries with focus on climate analysis. I'm reading the docs right now - seems that I've reinvented the wheel sometimes... The good side is that I've started two weeks ago, so I didn't manage to waste too much time. > Might be a bug recently introduced. Let me check and get back to you. After more testing it seems to me more like a bug in matplotlib. It occurs only when plotting lines, using '-' or '--'. When I changed marks to '.', it worked. > Lemme know if you need more help or if the doc is lacking on some aspects. I will be playing with this stuff for at least a year, so I probably will :) Anyway - great job! greetings, Paweł Rumian _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From gorkypl at gmail.com Thu Jan 14 09:15:18 2010 From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=) Date: Thu, 14 Jan 2010 15:15:18 +0100 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? In-Reply-To: <4B4ECF88.63BA.009B.0@twdb.state.tx.us> References: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> <67D67277-03EF-43DF-8FFD-2D42C727544E@gmail.com> <5158a0651001140015j253df04y94879107bb902046@mail.gmail.com> <4B4ECF88.63BA.009B.0@twdb.state.tx.us> Message-ID: <5158a0651001140615o41a9028dse24ebae157c0c069@mail.gmail.com> 2010/1/14 Dharhas Pothina : > > I've had this problem before. From the email exchange I had on this > list about a year ago, I basically worked out that any symbols work fine, > i.e. dot, circle, diamond, etc. When you use line types like '-' or '--' > and have missing or masked data in the timeseries, the plotting > functions don't know what to do and just fail. From what I understand > this is a matplotlib issue, and a workaround is to compress the array to > remove the missing values before plotting. See: > > http://old.nabble.com/Re%3A-Still-having-plotting-issue-with-latest%09svnscikits.timeseries-ts20941722.html#a20944512 That's exactly the point! I've just written this to matplotlib-users http://old.nabble.com/line-drawing-bug-or-it's-me-doing-something-wrong--td27159104.html So thank you very much - I've already almost gone mad with this, now I can cool down :) greetings, Paweł From Dharhas.Pothina at twdb.state.tx.us Thu Jan 14 10:26:22 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 14 Jan 2010 09:26:22 -0600 Subject: [SciPy-User] Timseries ts.tofile() Remove brackets.
Message-ID: <4B4EE33E0200009B0002667D@GWWEB.twdb.state.tx.us> Hi, I'm trying to format the output of ts.tofile() and I can't find anyway to suppress the use of open and close brackets on each line. ie using tseries.tofile(cleanfile,format='%Y,%m,%d,%H,%M,%S',separator=',') saves as : 1996,06,11,21,00,00,('JOB_20090812_CXT_MW9999.csv', 0, 13.199999999999999, 28.949999999999999) 1996,06,11,22,00,00,('JOB_20090812_CXT_MW9999.csv', 0, 13.199999999999999, 28.690000000000001) ... etc While what I want is: 1996,06,11,21,00,00,'JOB_20090812_CXT_MW9999.csv', 0, 13.199999999999999, 28.949999999999999 1996,06,11,22,00,00,'JOB_20090812_CXT_MW9999.csv', 0, 13.199999999999999, 28.690000000000001 ... etc anyway of doing this without reopening the file and removing the brackets. thanks - dharhas From gorkypl at gmail.com Thu Jan 14 13:48:04 2010 From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=) Date: Thu, 14 Jan 2010 19:48:04 +0100 Subject: [SciPy-User] scikits.timeseries or matplotlib plotting problem? In-Reply-To: <5158a0651001140615o41a9028dse24ebae157c0c069@mail.gmail.com> References: <5158a0651001131515r3996331eue85ac3164987e5f0@mail.gmail.com> <67D67277-03EF-43DF-8FFD-2D42C727544E@gmail.com> <5158a0651001140015j253df04y94879107bb902046@mail.gmail.com> <4B4ECF88.63BA.009B.0@twdb.state.tx.us> <5158a0651001140615o41a9028dse24ebae157c0c069@mail.gmail.com> Message-ID: <5158a0651001141048y40a01d91uec7dc0200af29068@mail.gmail.com> However not as good as I supposed... Someone in the (mentioned above) matplotlib-users group redirected me to another example: http://matplotlib.sourceforge.net/examples/pylab_examples/masked_demo.html and it doesn't work - the green line is not being drawn, until the line is changed to marks. So it looks like there is still something wrong with handling masked arrays by my instance of matplotlib... Anyway - it's probably not scikits related, but if someone would know any solution I'd be very thankful - I hope it wouldn't be considered a big offtopic... greetings, Pawe? From totalbull at mac.com Thu Jan 14 14:44:13 2010 From: totalbull at mac.com (totalbull at mac.com) Date: Thu, 14 Jan 2010 19:44:13 +0000 Subject: [SciPy-User] Seasonal adjustment in scipy/python? References: <60BE5A67-DB97-4B52-A281-AC67E82B3339@me.com> Message-ID: <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> Hello, I am looking to seasonally adjust some data series in Python - specifically economics in emerging markets. As you can see on the charts (www.emconfidential.com) there is a lot of seasonality to monthly data series. Example 1 (retail sales) is obvious. Example 2, CPI, is somewhat less so, but there is still some seasonality here with price falls around January and fairly high prices in December. How would I go about seasonally adjusting this data using Python and Scipy? Any canned functions? Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Thu Jan 14 14:53:33 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 14 Jan 2010 14:53:33 -0500 Subject: [SciPy-User] Timseries ts.tofile() Remove brackets. In-Reply-To: <4B4EE33E0200009B0002667D@GWWEB.twdb.state.tx.us> References: <4B4EE33E0200009B0002667D@GWWEB.twdb.state.tx.us> Message-ID: <5585199C-A83E-4E4D-B3D3-29B8DDA62BBC@gmail.com> On Jan 14, 2010, at 10:26 AM, Dharhas Pothina wrote: > Hi, > > I'm trying to format the output of ts.tofile() and I can't find anyway to suppress the use of open and close brackets on each line. ie using > ... 
> anyway of doing this without reopening the file and removing the brackets. Fixing the code :) Could you file a ticket? Thanks a lot in advance. But here's a workaround: >>> _tmp=ts.time_series([('AAA',1,1.),('BBB',1,2.)],dtype=[('a','|S3'),('b',int),('c',float)],start_date=ts.now('D')) >>> [tuple([d]+list(s)) for (d,s) in zip(_tmp.dates,_tmp.series)] [(<D : 14-Jan-2010>, 'AAA', 1, 1.0), (<D : 15-Jan-2010>, 'BBB', 1, 2.0)] From josef.pktd at gmail.com Thu Jan 14 14:59:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 14 Jan 2010 14:59:57 -0500 Subject: [SciPy-User] Seasonal adjustment in scipy/python? In-Reply-To: <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> References: <60BE5A67-DB97-4B52-A281-AC67E82B3339@me.com> <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> Message-ID: <1cd32cbb1001141159o6b410655l26a44dc9a602f754@mail.gmail.com> On Thu, Jan 14, 2010 at 2:44 PM, wrote: > Hello, > > I am looking to seasonally adjust some data series in Python - specifically > economics in emerging markets. As you can see on the charts > (www.emconfidential.com) there is a lot of seasonality to monthly data > series. Example 1 (retail sales) is obvious. Example 2, CPI, is somewhat > less so, but there is still some seasonality here with price falls around > January and fairly high prices in December. > > How would I go about seasonally adjusting this data using Python and Scipy? > Any canned functions? I haven't seen any canned functions; the simplest would be to use annual differences, or to estimate the monthly base level with a regression on month dummy variables and take the residual. From the graph, it doesn't look like assuming a functional form for the monthly base level (seasonal trend) would be useful. There should be more sophisticated ways of filtering, but nothing canned, and I don't think X11 is available in Python. It would also depend on how long your timeseries is and what you want to do with it. Josef > > Tom > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From pgmdevlist at gmail.com Thu Jan 14 15:03:25 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 14 Jan 2010 15:03:25 -0500 Subject: [SciPy-User] Seasonal adjustment in scipy/python? In-Reply-To: <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> References: <60BE5A67-DB97-4B52-A281-AC67E82B3339@me.com> <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> Message-ID: On Jan 14, 2010, at 2:44 PM, totalbull at mac.com wrote: > > Hello, > > I am looking to seasonally adjust some data series in Python - specifically economics in emerging markets. As you can see on the charts (www.emconfidential.com) there is a lot of seasonality to monthly data series. Example 1 (retail sales) is obvious. Example 2, CPI, is somewhat less so, but there is still some seasonality here with price falls around January and fairly high prices in December. > > How would I go about seasonally adjusting this data using Python and Scipy? Any canned functions? Have a look at scikits.timeseries; the package was designed to simplify the handling of a series. You can also check scikits.hydroclimpy, a derived package: there's a 'deseasonalize' function in the second package that makes it easy to compute seasonal anomalies and normalize them. You may not have to install the whole package, just check the source and copy the function.
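A minimal numpy-only sketch of the monthly-anomaly idea described above (subtract each calendar month's sample average). It assumes a monthly series whose length is a multiple of 12 and that starts in January; this is an illustration, not the scikits 'deseasonalize' function itself:

import numpy as np

# hypothetical ten years of monthly data with an additive seasonal pattern
x = np.random.randn(120) + np.tile(np.arange(12.), 10)
# average level of each calendar month across the years
monthly_mean = x.reshape(-1, 12).mean(axis=0)
# seasonally adjusted series: deviation from the month's average
adjusted = (x.reshape(-1, 12) - monthly_mean).ravel()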
pytseries.sourceforge.net hydroclimpy.sourceforge.net From aisaac at american.edu Thu Jan 14 15:10:32 2010 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 14 Jan 2010 15:10:32 -0500 Subject: [SciPy-User] Seasonal adjustment in scipy/python? In-Reply-To: <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> References: <60BE5A67-DB97-4B52-A281-AC67E82B3339@me.com> <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> Message-ID: <4B4F7A38.50408@american.edu> On 1/14/2010 2:44 PM, totalbull at mac.com wrote: > I am looking to seasonally adjust some data series in Python - >>> help(np.diff) Help on function diff in module numpy.lib.function_base: diff(a, n=1, axis=-1) Calculate the nth order discrete difference along given axis. hth, Alan Isaac From josef.pktd at gmail.com Thu Jan 14 15:20:53 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 14 Jan 2010 15:20:53 -0500 Subject: [SciPy-User] Seasonal adjustment in scipy/python? In-Reply-To: <4B4F7A38.50408@american.edu> References: <60BE5A67-DB97-4B52-A281-AC67E82B3339@me.com> <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> <4B4F7A38.50408@american.edu> Message-ID: <1cd32cbb1001141220g2b15d081nb3075e2e8f8ec319@mail.gmail.com> On Thu, Jan 14, 2010 at 3:10 PM, Alan G Isaac wrote: > On 1/14/2010 2:44 PM, totalbull at mac.com wrote: >> I am looking to seasonally adjust some data series in Python - > >>>> help(np.diff) > Help on function diff in module numpy.lib.function_base: > > diff(a, n=1, axis=-1) > Calculate the nth order discrete difference along given axis. diff doesn't work for seasonal adjustment: the order is (1-L)^n, not (1-L^n) (although the latter is possible after reshaping). BTW: the residual of the regression on month dummies is just the same as subtracting the sample average for that month. Josef > > hth, > Alan Isaac > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From Dharhas.Pothina at twdb.state.tx.us Thu Jan 14 15:21:51 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 14 Jan 2010 14:21:51 -0600 Subject: [SciPy-User] Timseries ts.tofile() Remove brackets. In-Reply-To: <5585199C-A83E-4E4D-B3D3-29B8DDA62BBC@gmail.com> References: <4B4EE33E0200009B0002667D@GWWEB.twdb.state.tx.us> <5585199C-A83E-4E4D-B3D3-29B8DDA62BBC@gmail.com> Message-ID: <4B4F287F.63BA.009B.0@twdb.state.tx.us> Thanks, I created a ticket. - d >>> Pierre GM 1/14/2010 1:53 PM >>> On Jan 14, 2010, at 10:26 AM, Dharhas Pothina wrote: > Hi, > > I'm trying to format the output of ts.tofile() and I can't find anyway to suppress the use of open and close brackets on each line. ie using > ... > anyway of doing this without reopening the file and removing the brackets. Fixing the code :) Could you file a ticket? Thanks a lot in advance. But here's a workaround: >>> _tmp=ts.time_series([('AAA',1,1.),('BBB',1,2.)],dtype=[('a','|S3'),('b',int),('c',float)],start_date=ts.now('D')) >>> [tuple([d]+list(s)) for (d,s) in zip(_tmp.dates,_tmp.series)] [(<D : 14-Jan-2010>, 'AAA', 1, 1.0), (<D : 15-Jan-2010>, 'BBB', 1, 2.0)] _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pgmdevlist at gmail.com Thu Jan 14 15:27:06 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 14 Jan 2010 15:27:06 -0500 Subject: [SciPy-User] Timseries ts.tofile() Remove brackets.
In-Reply-To: <4B4F287F.63BA.009B.0@twdb.state.tx.us> References: <4B4EE33E0200009B0002667D@GWWEB.twdb.state.tx.us> <5585199C-A83E-4E4D-B3D3-29B8DDA62BBC@gmail.com> <4B4F287F.63BA.009B.0@twdb.state.tx.us> Message-ID: On Jan 14, 2010, at 3:21 PM, Dharhas Pothina wrote: > > Thanks, I created a ticket. Got it ! Thanks again for reporting From aisaac at american.edu Thu Jan 14 15:41:18 2010 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 14 Jan 2010 15:41:18 -0500 Subject: [SciPy-User] Seasonal adjustment in scipy/python? In-Reply-To: <1cd32cbb1001141220g2b15d081nb3075e2e8f8ec319@mail.gmail.com> References: <60BE5A67-DB97-4B52-A281-AC67E82B3339@me.com> <891B0F5A-52D1-4086-BC65-E32749E7C6D7@mac.com> <4B4F7A38.50408@american.edu> <1cd32cbb1001141220g2b15d081nb3075e2e8f8ec319@mail.gmail.com> Message-ID: <4B4F816E.3060501@american.edu> On 1/14/2010 3:20 PM, josef.pktd at gmail.com wrote: > diff doesn't work for seasonal, order is (1-L)^n not (1-L^n) > (although possible after reshaping) Yep. Engaged fingers before brain... Alan From gokhansever at gmail.com Thu Jan 14 17:30:20 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 14 Jan 2010 16:30:20 -0600 Subject: [SciPy-User] Wording question regarding to distributions Message-ID: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> Hello, What is the right way to express: Do we fit data to a distribution or distribution to data? Thanks. G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Thu Jan 14 17:37:01 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 14 Jan 2010 17:37:01 -0500 Subject: [SciPy-User] Wording question regarding to distributions In-Reply-To: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> References: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> Message-ID: <49577BC2-5B2B-4E1B-8E8D-01920E2A1BD4@cs.toronto.edu> On 14-Jan-10, at 5:30 PM, G?khan Sever wrote: > Hello, > > What is the right way to express: > > Do we fit data to a distribution or distribution to data? > > Thanks. I would say the latter. Assuming we are talking about the same scenario, the data follow some unknown distribution, which you try to approximate with some parametric form using maximum likelihood estimators and such. So you are fitting a (particular) distribution (or more specifically, a model of the underlying process which *uses* that particular distribution) to observed data. My $0.02, David From josef.pktd at gmail.com Thu Jan 14 18:13:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 14 Jan 2010 18:13:54 -0500 Subject: [SciPy-User] Wording question regarding to distributions In-Reply-To: <49577BC2-5B2B-4E1B-8E8D-01920E2A1BD4@cs.toronto.edu> References: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> <49577BC2-5B2B-4E1B-8E8D-01920E2A1BD4@cs.toronto.edu> Message-ID: <1cd32cbb1001141513t4599e9a3nde663feb58454aaa@mail.gmail.com> On Thu, Jan 14, 2010 at 5:37 PM, David Warde-Farley wrote: > > On 14-Jan-10, at 5:30 PM, G?khan Sever wrote: > >> Hello, >> >> What is the right way to express: >> >> Do we fit data to a distribution or distribution to data? >> >> Thanks. > > I would say the latter. Assuming we are talking about the same > scenario, the data follow some unknown distribution, which you try to > approximate with some parametric form using maximum likelihood > estimators and such. 
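For instance, a minimal sketch of such a fit with scipy.stats (the sample here is synthetic; the continuous distributions expose a .fit method that returns maximum-likelihood estimates of shape, location and scale):

import numpy as np
from scipy import stats

# synthetic stand-in for observed data
data = stats.lognorm.rvs(0.5, scale=np.exp(1.0), size=1000)
# maximum-likelihood fit; returns (shape, loc, scale)
shape, loc, scale = stats.lognorm.fit(data)
# the usual log-normal parameters (approximate when loc != 0)
mu, sigma = np.log(scale), shape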
So you are fitting a (particular) distribution > (or more specifically, a model of the underlying process which *uses* > that particular distribution) to observed data. I agree, unless you are "massaging" your data to fit the distribution to get nicer results. Josef > > My $0.02, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Thu Jan 14 18:48:14 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 14 Jan 2010 15:48:14 -0800 Subject: [SciPy-User] Wording question regarding to distributions In-Reply-To: <1cd32cbb1001141513t4599e9a3nde663feb58454aaa@mail.gmail.com> References: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> <49577BC2-5B2B-4E1B-8E8D-01920E2A1BD4@cs.toronto.edu> <1cd32cbb1001141513t4599e9a3nde663feb58454aaa@mail.gmail.com> Message-ID: <45d1ab481001141548ka4c2c94ka6f88b97c27cd6ee@mail.gmail.com> On Thu, Jan 14, 2010 at 3:13 PM, wrote: > On Thu, Jan 14, 2010 at 5:37 PM, David Warde-Farley wrote: >> >> On 14-Jan-10, at 5:30 PM, G?khan Sever wrote: >> >>> Hello, >>> >>> What is the right way to express: >>> >>> Do we fit data to a distribution or distribution to data? >>> >>> Thanks. >> >> I would say the latter. Assuming we are talking about the same > > I agree, unless you are "massaging" your data to fit the distribution > to get nicer results. > > Josef Exactly: you're only "fitting data to a distribution" if you're fiddling w/ the data to make it fit; otherwise, your "fitting the distribution to the data." My $2e6. DG From davide_fiocco at yahoo.it Thu Jan 14 19:26:33 2010 From: davide_fiocco at yahoo.it (davide_fiocco at yahoo.it) Date: Thu, 14 Jan 2010 16:26:33 -0800 (PST) Subject: [SciPy-User] Get array of separation vectors from an array of vectors and use it to compute the force in a MD code In-Reply-To: <1cd32cbb1001131734o1414e0b3r6dea85940ee8349@mail.gmail.com> References: <31b649b6-b1c0-4905-aa5a-e49e25e0fc62@z41g2000yqz.googlegroups.com> <1cd32cbb1001131734o1414e0b3r6dea85940ee8349@mail.gmail.com> Message-ID: <4645dbd7-c7f6-41be-88d8-fb99436f7f3a@r24g2000yqd.googlegroups.com> Thanks Josef! I post here the code i wrote to compute the matrix ff of the forces between all the pairs of particles in a given set interacting under the Lennard-Jones potential. Note that: - coordinates of the i-th particle is stored in self.txyz[i,1:]. - the returned matrix ff contains at f[i,j,:] the three components of the force due to the interaction between i and j. - the for loop is the way I used to rebuild a triangular matrix from its reduced representation I guess it can't be considered good code...and it'd be cool if someone could point out its major flaws! Thanks a lot again! 
Davide def get_forces(self): if self.pair_style == 'lj/cut': #Josef suggestion to get the reduced array of separation vectors R I, J = numpy.nonzero(numpy.triu(numpy.ones((self.natoms, self.natoms)), k=1)) R = self.atoms.txyz[I,1:] - self.atoms.txyz[J,1:] #invoking a vectorized function to apply the minimum image convention to the separation vectors R = minimum_image(R, self.boxes[-1].bounds) #compute the array of inverse distances S = 1/numpy.sqrt(numpy.add.reduce((R*R).transpose())) #in f I will store the information about the upper triangular part of the matrix of forces f = numpy.zeros((S.size, 3)) invcut = 1./2.5 #compute Lennard Jones force for distances below a given cutoff f[S > invcut, :] = (R[S > invcut,:])*((24.*(-2.*S[S > invcut]**13 + S[S > invcut]**7))*S[S > invcut]).reshape(-1,1) ff = numpy.zeros((self.natoms, self.natoms, 3)) #convert reduced array of forces into an antisymmetric matrix ff (f contains all the information about its triu) for i in range(self.natoms): ff[i,i+1:,:] = f[self.natoms*i - i*(i+1)/2:self.natoms*(i+1) - (i + 1)*(i + 2)/2,:] ff[i+1:,i,:] = -f[self.natoms*i - i*(i+1)/2:self.natoms*(i+1) - (i + 1)*(i + 2)/2,:] return ff #apply the minimum image convention def minimum_image_scalar(dx, box): dx = dx - int(round(dx/box))*box return dx minimum_image = numpy.vectorize(minimum_image_scalar) From josef.pktd at gmail.com Thu Jan 14 20:01:13 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 14 Jan 2010 20:01:13 -0500 Subject: [SciPy-User] Get array of separation vectors from an array of vectors and use it to compute the force in a MD code In-Reply-To: <4645dbd7-c7f6-41be-88d8-fb99436f7f3a@r24g2000yqd.googlegroups.com> References: <31b649b6-b1c0-4905-aa5a-e49e25e0fc62@z41g2000yqz.googlegroups.com> <1cd32cbb1001131734o1414e0b3r6dea85940ee8349@mail.gmail.com> <4645dbd7-c7f6-41be-88d8-fb99436f7f3a@r24g2000yqd.googlegroups.com> Message-ID: <1cd32cbb1001141701t21a8d6d3hcb6048c77336dc21@mail.gmail.com> On Thu, Jan 14, 2010 at 7:26 PM, davide_fiocco at yahoo.it wrote: > Thanks Josef! > I post here the code i wrote to compute the matrix ff of the forces > between all the pairs of particles in a given set interacting under > the Lennard-Jones potential. > Note that: > - coordinates of the i-th particle is stored in self.txyz[i,1:]. > - the returned matrix ff contains at f[i,j,:] the three components of > the force due to the interaction between i and j. > - the for loop is the way I used to rebuild a triangular matrix from > its reduced representation When you are rebuilding the triu, or the full symmetric distance matrix ff from the vectorized version then you can use again the intial triu indices I,J, and inplace add the transpose. might require a bit of thinking to get the 3rd axis right, but something like this: ff[I,J,:] = f # unless numpy switches axis ff += np.swapaxis(ff,2,1) # diagonal is zero so not duplicate to worry about You might want to try on a simple example, but I'm pretty sure something like this should work Josef > > I guess it can't be considered good code...and it'd be cool if someone > could point out its major flaws! > Thanks a lot again! > > Davide > > ? ? ? ?def get_forces(self): > ? ? ? ? ? ? ? ?if self.pair_style == 'lj/cut': > ? ? ? ? ? ? ? ? ? ? ? ?#Josef suggestion to get the reduced array of separation vectors R > ? ? ? ? ? ? ? ? ? ? ? ?I, J = numpy.nonzero(numpy.triu(numpy.ones((self.natoms, > self.natoms)), k=1)) > ? ? ? ? ? ? ? ? ? ? ? ?R = self.atoms.txyz[I,1:] - self.atoms.txyz[J,1:] > ? ? ? ? ? ? ? ? ? ? ? 
?#invoking a vectorized function to apply the > minimum image convention to the separation vectors > ? ? ? ? ? ? ? ? ? ? ? ?R = minimum_image(R, self.boxes[-1].bounds) > ? ? ? ? ? ? ? ? ? ? ? ?#compute the array of inverse distances > ? ? ? ? ? ? ? ? ? ? ? ?S = 1/numpy.sqrt(numpy.add.reduce((R*R).transpose())) isn't the transpose here just choosing the axis ? 1/numpy.sqrt(((R*R).sum(0))) it won't make much difference but I find it easier to read > ? ? ? ? ? ? ? ? ? ? ? ?#in f I will store the information about the upper triangular part > of the matrix of forces > ? ? ? ? ? ? ? ? ? ? ? ?f = numpy.zeros((S.size, 3)) > ? ? ? ? ? ? ? ? ? ? ? ?invcut = 1./2.5 > ? ? ? ? ? ? ? ? ? ? ? ?#compute Lennard Jones force for distances > below a given cutoff > ? ? ? ? ? ? ? ? ? ? ? ?f[S > invcut, :] = (R[S > invcut,:])*((24.*(-2.*S[S > invcut]**13 + > S[S > invcut]**7))*S[S > invcut]).reshape(-1,1) you might want to replace the repeated comparison with a temp variable: mask = S > invcut Josef > ? ? ? ? ? ? ? ? ? ? ? ?ff = numpy.zeros((self.natoms, self.natoms, 3)) > ? ? ? ? ? ? ? ? ? ? ? ?#convert reduced array of forces into an > antisymmetric matrix ff (f contains all the information about its > triu) > ? ? ? ? ? ? ? ? ? ? ? ?for i in range(self.natoms): > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ff[i,i+1:,:] = ?f[self.natoms*i - i*(i+1)/2:self.natoms*(i+1) - (i > + 1)*(i + 2)/2,:] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ff[i+1:,i,:] = -f[self.natoms*i - i*(i+1)/2:self.natoms*(i+1) - (i > + 1)*(i + 2)/2,:] > > ? ? ? ? ? ? ? ? ? ? ? ?return ff > ? ? ? ?#apply the minimum image convention > ? ? ? ?def minimum_image_scalar(dx, box): > ? ? ? ? ? ? ? ?dx = dx - int(round(dx/box))*box > ? ? ? ? ? ? ? ?return dx > ? ? ? ?minimum_image = numpy.vectorize(minimum_image_scalar) > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gokhansever at gmail.com Thu Jan 14 20:10:17 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 14 Jan 2010 19:10:17 -0600 Subject: [SciPy-User] Wording question regarding to distributions In-Reply-To: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> References: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> Message-ID: <49d6b3501001141710r4719d184tf826dfd31c28ba2@mail.gmail.com> On Thu, Jan 14, 2010 at 4:30 PM, G?khan Sever wrote: > Hello, > > What is the right way to express: > > Do we fit data to a distribution or distribution to data? > > Thanks. > > > > G?khan > Here is how the question arise in my mind. Previously, I had asked a question to fit a log-normal distribution on my data on this thread http://mail.scipy.org/pipermail/scipy-user/2009-November/023320.html Well the work is unfinished there, and I started to dig-in to the same subject again. 
For R, I have found a function that lets me estimate parameters from my binned data pair (i.e. bin sizes - measurements) to construct a log-normal fit:

http://www.exposurescience.org/heR.doc/library/heR.Misc/html/bin2lnorm.html

The description given for the function is in conflict with itself:

The title says: "Fit binned data to a log-normal distribution"

However, the description says otherwise:

"This function takes binned data and fits a lognormal model to it, using weighted least squares, and optionally plotting the fit and the data together"

I couldn't find a way to estimate log-normal parameters in Python (maybe I will need the same for the gamma distributions as well) given in the form that bin2lnorm expects (i.e. l - bin limits, and h - corresponding heights; measurements in my case); that is the reason I use that R function. Any alternative suggestions are welcome at this point.

Similarly, while studying my Cloud and Precipitation Parameterizations book today (distributions are extremely important in the bulk parameterization of clouds and cloud constituents/products, e.g. aerosols, cloud droplets, rain, hail etc.), I see a couple of figures (please see the book review at http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521883382&ss=exc and go to pg 9, Figure 1.2) with captions like: "gamma curves fit to data."

It's clearer now after reading your inputs. Thanks again.

--
Gökhan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com Thu Jan 14 20:12:34 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 14 Jan 2010 20:12:34 -0500
Subject: [SciPy-User] Get array of separation vectors from an array of vectors and use it to compute the force in a MD code
In-Reply-To: <1cd32cbb1001141701t21a8d6d3hcb6048c77336dc21@mail.gmail.com>
References: <31b649b6-b1c0-4905-aa5a-e49e25e0fc62@z41g2000yqz.googlegroups.com> <1cd32cbb1001131734o1414e0b3r6dea85940ee8349@mail.gmail.com> <4645dbd7-c7f6-41be-88d8-fb99436f7f3a@r24g2000yqd.googlegroups.com> <1cd32cbb1001141701t21a8d6d3hcb6048c77336dc21@mail.gmail.com>
Message-ID: <1cd32cbb1001141712q66f43e5bg7663ec4d274d9ef4@mail.gmail.com>

On Thu, Jan 14, 2010 at 8:01 PM, wrote:
> On Thu, Jan 14, 2010 at 7:26 PM, davide_fiocco at yahoo.it wrote:
>> Thanks Josef!
>> I post here the code I wrote to compute the matrix ff of the forces between all the pairs of particles in a given set interacting under the Lennard-Jones potential.
>> Note that:
>> - the coordinates of the i-th particle are stored in self.txyz[i,1:].
>> - the returned matrix ff contains at f[i,j,:] the three components of the force due to the interaction between i and j.
>> - the for loop is the way I used to rebuild a triangular matrix from its reduced representation
>
> When you are rebuilding the triu, or the full symmetric distance matrix ff from the vectorized version, then you can use again the initial triu indices I, J, and inplace add the transpose.
> It might require a bit of thinking to get the 3rd axis right, but something like this:
>
> ff[I,J,:] = f    # unless numpy switches axis
> ff += np.swapaxes(ff,2,1)  # diagonal is zero so there are no duplicates to worry about
>
> You might want to try it on a simple example, but I'm pretty sure something like this should work
>
> Josef
>
>>
>> I guess it can't be considered good code... and it'd be cool if someone could point out its major flaws!
>> Thanks a lot again!
>>
>> Davide
>>
>>        def get_forces(self):
>>                if self.pair_style == 'lj/cut':
>>                        #Josef's suggestion to get the reduced array of separation vectors R
>>                        I, J = numpy.nonzero(numpy.triu(numpy.ones((self.natoms, self.natoms)), k=1))
>>                        R = self.atoms.txyz[I,1:] - self.atoms.txyz[J,1:]
>>                        #invoke a vectorized function to apply the minimum image convention to the separation vectors
>>                        R = minimum_image(R, self.boxes[-1].bounds)
>>                        #compute the array of inverse distances
>>                        S = 1/numpy.sqrt(numpy.add.reduce((R*R).transpose()))
>
> isn't the transpose here just choosing the axis ? 1/numpy.sqrt(((R*R).sum(0)))
> it won't make much difference but I find it easier to read

Typo: I think it's axis=1 in the sum.

And thanks for posting, it's nice to see whether my answers are helpful or not.

Josef

>
>>                        #in f I will store the information about the upper triangular part of the matrix of forces
>>                        f = numpy.zeros((S.size, 3))
>>                        invcut = 1./2.5
>>                        #compute the Lennard-Jones force for distances below a given cutoff
>>                        f[S > invcut, :] = (R[S > invcut,:])*((24.*(-2.*S[S > invcut]**13 + S[S > invcut]**7))*S[S > invcut]).reshape(-1,1)
>
> you might want to replace the repeated comparison with a temp variable:  mask = S > invcut
>
> Josef
>
>>                        ff = numpy.zeros((self.natoms, self.natoms, 3))
>>                        #convert the reduced array of forces into an antisymmetric matrix ff (f contains all the information about its triu)
>>                        for i in range(self.natoms):
>>                                ff[i,i+1:,:] =  f[self.natoms*i - i*(i+1)/2:self.natoms*(i+1) - (i + 1)*(i + 2)/2,:]
>>                                ff[i+1:,i,:] = -f[self.natoms*i - i*(i+1)/2:self.natoms*(i+1) - (i + 1)*(i + 2)/2,:]
>>
>>                        return ff
>>        #apply the minimum image convention
>>        def minimum_image_scalar(dx, box):
>>                dx = dx - int(round(dx/box))*box
>>                return dx
>>        minimum_image = numpy.vectorize(minimum_image_scalar)
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>

From bsouthey at gmail.com Thu Jan 14 22:14:21 2010
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 14 Jan 2010 21:14:21 -0600
Subject: [SciPy-User] Wording question regarding to distributions
In-Reply-To: <49d6b3501001141710r4719d184tf826dfd31c28ba2@mail.gmail.com>
References: <49d6b3501001141430i6dd7155ah9ea1b181404d46fc@mail.gmail.com> <49d6b3501001141710r4719d184tf826dfd31c28ba2@mail.gmail.com>
Message-ID:

On Thu, Jan 14, 2010 at 7:10 PM, Gökhan Sever wrote:
>
> On Thu, Jan 14, 2010 at 4:30 PM, Gökhan Sever wrote:
>>
>> Hello,
>>
>> What is the right way to express:
>>
>> Do we fit data to a distribution or a distribution to data?
>>
>> Thanks.
>>
>> Gökhan
>
> Here is how the question arose in my mind.
>
> Previously, I had asked a question about fitting a log-normal distribution to my data, in this thread:
> http://mail.scipy.org/pipermail/scipy-user/2009-November/023320.html
>
> Well, the work is unfinished there, and I started to dig in to the same
> subject again.
> For R, I have found a function that lets me estimate parameters from my binned data pair (i.e. bin sizes - measurements) to construct a log-normal fit:
>
> http://www.exposurescience.org/heR.doc/library/heR.Misc/html/bin2lnorm.html
>
> The description given for the function is in conflict with itself:
>
> The title says: "Fit binned data to a log-normal distribution"
>
> However, the description says otherwise:
>
> "This function takes binned data and fits a lognormal model to it, using weighted least squares, and optionally plotting the fit and the data together"
>
> I couldn't find a way to estimate log-normal parameters in Python (maybe I will need the same for the gamma distributions as well) given in the form that bin2lnorm expects (i.e. l - bin limits, and h - corresponding heights; measurements in my case); that is the reason I use that R function. Any alternative suggestions are welcome at this point.
>
> Similarly, while studying my Cloud and Precipitation Parameterizations book today (distributions are extremely important in the bulk parameterization of clouds and cloud constituents/products, e.g. aerosols, cloud droplets, rain, hail etc.), I see a couple of figures (please see the book review at
> http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521883382&ss=exc
> and go to pg 9, Figure 1.2) with captions like: "gamma curves fit to data."
>
> It's clearer now after reading your inputs.
>
> Thanks again.
> --
> Gökhan

Depends on what you mean by 'data'. However, like many things, terminology is rather flexible, misused or just incomplete.

Typically you have random variables (http://en.wikipedia.org/wiki/Random_variables) from some distribution, such as the multivariate normal. Note that a distribution is a rather complex thing which has various properties (http://en.wikipedia.org/wiki/Probability_distribution).

When you want to see if the data are from some distribution that you do not know, then you are testing the hypothesis that your data, as a whole, have certain characteristics of random variables from that distribution. The central limit theorem makes many distributions very similar (i.e. like a normal distribution) with sufficient observations, when it holds. However, you cannot say that the data are random variables from that distribution, nor that all data points are from the distribution.

So if your data are random variables, then neither phrasing is correct.

Bruce

From qa at takb.net Fri Jan 15 03:19:33 2010
From: qa at takb.net (Torsten Andre)
Date: Fri, 15 Jan 2010 09:19:33 +0100
Subject: [SciPy-User] Integration of double integral with integration variable as
In-Reply-To: <201001141321.04818.ljmamoreira@gmail.com>
References: <4B4EE290.7070600@takb.net> <201001141321.04818.ljmamoreira@gmail.com>
Message-ID: <4B502515.3020809@takb.net>

Jose,

I thank you very much for your help. Well, my example is easy to solve, indeed. Though this was only an example. But the trick with the functions does it. Something one could have figured out... You bet it helped ;)

Torsten

Jose Amoreira wrote:
> Torsten,
> Your example is easy!
> Since sin(y) is an odd function, integrating over [-x,x] gives zero and that's it. For a more illustrative example, replace sin with cos (still easy enough to do it quicker analytically).
> Using scipy.integrate.quad, you do it like this (excerpt from an idle session):
>
> >>> def g(x):
>         return quad(cos, -x, x)[0]
>
> >>> quad(g, 0., 1.)[0]
> 0.91939538826372047
>
> The reason I take the zero-th element of the quad output is that the remainder is an estimate of the error.
> Maybe you should also look into scipy.integrate.dblquad.
> Hope this helps.
> jose
>
> On Thursday 14 January 2010 09:23:28 am Torsten Andre wrote:
>> Hey everyone,
>>
>> I am new to SciPy, but need to integrate something like this, where the
>> boundaries of the inner integral depend on the outer integral's
>> integration variable:
>>
>> \int_0^1 \int_{-x}^{+x} \sin(y) \, dy \, dx
>>
>> Is this feasible in SciPy? I tried using quad but it only complains that
>> x is not defined. Unfortunately I was unable to find anything on the
>> list or in the documentation.
>>
>> Thanks for your time.
>>
>> Cheers,
>> Torsten
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From gorkypl at gmail.com Fri Jan 15 09:24:46 2010
From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=)
Date: Fri, 15 Jan 2010 15:24:46 +0100
Subject: [SciPy-User] Quick question about selecting periodical data with scikits.timeseries
Message-ID: <5158a0651001150624vcc4a6f6y61ba4a2c201818a@mail.gmail.com>

hello,

Working more and more with scikits.timeseries and hydroclimpy, I'm still impressed by their performance and abilities.

I cannot find, however, a native method of selecting the data included in a given (regular) period. Is there one?

In my particular case, I have daily data for the last fifteen years, and I'd like to split them into fifteen annual series, or 15*12 monthly series, and so on...

I know I can select data from one period using something like:
series['1996-01-01':'1996-12-31']
and of course I can write a function that will iterate over all years - but since I've found that many of the functions I wrote were already included in the package, I don't want to make this mistake once more ;)

greetings,
Paweł

From kwgoodman at gmail.com Fri Jan 15 13:07:28 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 15 Jan 2010 10:07:28 -0800
Subject: [SciPy-User] scipy.stats.nanstd, bias and ddof
Message-ID:

By default np.std and scipy.std normalize by N.
But scipy.stats.nanstd normalizes by N-1.

>> x = np.random.rand(4)
>> np.std(x)
   0.12006913635950889
>> scipy.std(x)
   0.12006913635950889
>> scipy.stats.nanstd(x)
   0.13864389639705668
>> scipy.stats.nanstd(x, bias=True)
   0.12006913635950889

Can the default for nanstd be changed to bias=True? Or would that break code?

Even better, I guess, would be to replace the bias keyword with ddof as used in np.std and scipy.std. So

    if bias:
        m2c = m2 / n
    else:
        m2c = m2 / (n - 1.)

in scipy.stats.nanstd would become

    m2c = m2 / (n - ddof)

For me it doesn't matter if the default ddof is 0 or 1. But it is nice when all std functions use the same default.

From josef.pktd at gmail.com Fri Jan 15 13:48:11 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 15 Jan 2010 13:48:11 -0500
Subject: [SciPy-User] scipy.stats.nanstd, bias and ddof
In-Reply-To: References: Message-ID: <1cd32cbb1001151048t824458if34e85b3464e380@mail.gmail.com>

On Fri, Jan 15, 2010 at 1:07 PM, Keith Goodman wrote:
> By default np.std and scipy.std normalize by N. But scipy.stats.nanstd
> normalizes by N-1.
>
>>> x = np.random.rand(4)
>>> np.std(x)
>   0.12006913635950889
>>> scipy.std(x)
>   0.12006913635950889
>>> scipy.stats.nanstd(x)
>   0.13864389639705668
>>> scipy.stats.nanstd(x, bias=True)
>   0.12006913635950889
>
> Can the default for nanstd be changed to bias=True? Or would that break code?
>
> Even better, I guess, would be to replace the bias keyword with ddof as
> used in np.std and scipy.std. So
>
>    if bias:
>        m2c = m2 / n
>    else:
>        m2c = m2 / (n - 1.)
>
> in scipy.stats.nanstd would become
>
>     m2c = m2 / (n - ddof)
>
> For me it doesn't matter if the default ddof is 0 or 1. But it is nice
> when all std functions use the same default.

I agree with the consistency across function arguments. But changing the degrees of freedom will affect user code, and we would have to go through a warning period, and maybe add the ddof argument in the meantime. (But having both bias and ddof as arguments would be a bit messy.)

Or maybe numpy should get a nanmean and nanvar, nanstd, similar to nansum? Then it would be easier to deprecate them like the other stats functions that moved to numpy.

Josef

> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From kwgoodman at gmail.com Fri Jan 15 13:56:10 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 15 Jan 2010 10:56:10 -0800
Subject: [SciPy-User] scipy.stats.nanstd, bias and ddof
In-Reply-To: <1cd32cbb1001151048t824458if34e85b3464e380@mail.gmail.com>
References: <1cd32cbb1001151048t824458if34e85b3464e380@mail.gmail.com>
Message-ID:

On Fri, Jan 15, 2010 at 10:48 AM, wrote:
> Or maybe numpy should get a nanmean and nanvar, nanstd, similar to
> nansum? Then it would be easier to deprecate them like the other stats
> functions that moved to numpy.

That's a great idea. Adding nanstd to numpy would not break any code.

It would also be nice to have a nanmedian in numpy, one that doesn't do a full sort. A pony would be nice too.

From Dharhas.Pothina at twdb.state.tx.us Fri Jan 15 15:11:49 2010
From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina)
Date: Fri, 15 Jan 2010 14:11:49 -0600
Subject: [SciPy-User] timeseries tsfromtxt missing_values bug?
Message-ID: <4B5077A4.63BA.009B.0@twdb.state.tx.us>

Hi,

I'm having issues with tsfromtxt masking fields using the missing_values parameter.

>>> dateconverter = lambda y, m, d, hh, mm : datetime(year=int(y), month=int(m), day=int(d), hour=int(hh), minute=int(mm))
>>> rseries = ts.tsfromtxt('test.csv', freq='T', comments='#', dateconverter=dateconverter, datecols=(1,2,3,4,5), usecols=(1,2,3,4,5,8), delimiter=',', missing_values=-999.0)

gives:

timeseries([(-999.0,) (-999.0,) (-999.0,)],
           dtype = [('f5', '<f8')])

>>> rseries = ts.tsfromtxt('test.csv', freq='T', comments='#', dateconverter=dateconverter, datecols=(1,2,3,4,5), usecols=(1,2,3,4,5,8), delimiter=',', missing_values=-999.0, names='data')

gives:

timeseries([(--,) (--,) (--,)],
           dtype = [('_tmp4', '<f8')])

From pgmdevlist at gmail.com Fri Jan 15 15:56:47 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 15 Jan 2010 15:56:47 -0500
Subject: [SciPy-User] Quick question about selecting periodical data with scikits.timeseries
In-Reply-To: <5158a0651001150624vcc4a6f6y61ba4a2c201818a@mail.gmail.com>
References: <5158a0651001150624vcc4a6f6y61ba4a2c201818a@mail.gmail.com>
Message-ID: <94C2AFA5-4418-4F10-9E05-43377C73929D@gmail.com>

On Jan 15, 2010, at 9:24 AM, Paweł
Rumian wrote:
> hello,
>
> Working more and more with scikits.timeseries and hydroclimpy, I'm
> still impressed by their performance and abilities.
>
> I cannot find, however, a native method of selecting the data included in
> a given (regular) period. Is there one?
>
> In my particular case, I have daily data for the last fifteen years,
> and I'd like to split them into fifteen annual series, or 15*12
> monthly series, and so on...
>
> I know I can select data from one period using something like:
> series['1996-01-01':'1996-12-31']
> and of course I can write a function that will iterate over all years
> - but since I've found that many of the functions I wrote were already
> included in the package, I don't want to make this mistake once more
> ;)

The easiest is to use the .convert method described here:
http://pytseries.sourceforge.net/generated/scikits.timeseries.TimeSeries.convert.html

In your case, choose 'A' for the output frequency and func=None (the default) to get a Nx366 array of data; each row will correspond to a year, each column to a day of the year (hence 366 columns, to keep track of leap years; for non-leap years, the 366th element is masked).

A second possibility is to convert first to monthly frequency with an aggregation function (e.g., func=sum or func=mean) to get a series of monthly aggregated data, then convert to 'A' to get a Nx12 series.

Let me know if it helps or if you have more specific questions.

Cheers
P.

From gorkypl at gmail.com Fri Jan 15 17:16:17 2010
From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=)
Date: Fri, 15 Jan 2010 23:16:17 +0100
Subject: [SciPy-User] Quick question about selecting periodical data with scikits.timeseries
In-Reply-To: <94C2AFA5-4418-4F10-9E05-43377C73929D@gmail.com>
References: <5158a0651001150624vcc4a6f6y61ba4a2c201818a@mail.gmail.com> <94C2AFA5-4418-4F10-9E05-43377C73929D@gmail.com>
Message-ID: <5158a0651001151416x74b8c110t5f786d79c766e821@mail.gmail.com>

2010/1/15 Pierre GM :
> the easiest is to use the .convert method described here:
> http://pytseries.sourceforge.net/generated/scikits.timeseries.TimeSeries.convert.html

I don't know how I could miss it - while already using ts.convert and np.ma.mean to convert hourly data to daily averages, I totally overlooked the fact that this method doesn't have to use any interpolation.

> let me know if it helps or if you have more specific questions.

Of course it works perfectly, and so far this package has everything I need - you've done a great job.

Anyway, one more question - is there any convenient method to plot such converted data?
If I use simple tsplot, the dates are aligned horizontally, which is not what I want. Now I'm viewing them with something like:

for row in converted_data:
    plot(row)

but I wonder if there is a more 'proper' way of handling this.

greetings,
Paweł

From pgmdevlist at gmail.com Fri Jan 15 18:34:44 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 15 Jan 2010 18:34:44 -0500
Subject: [SciPy-User] Quick question about selecting periodical data with scikits.timeseries
In-Reply-To: <5158a0651001151416x74b8c110t5f786d79c766e821@mail.gmail.com>
References: <5158a0651001150624vcc4a6f6y61ba4a2c201818a@mail.gmail.com> <94C2AFA5-4418-4F10-9E05-43377C73929D@gmail.com> <5158a0651001151416x74b8c110t5f786d79c766e821@mail.gmail.com>
Message-ID: <0D540F50-39DB-49A6-8B55-1388E594801B@gmail.com>

On Jan 15, 2010, at 5:16 PM, Paweł Rumian wrote:
>
> Anyway, one more question - is there any convenient method to plot
> such converted data? If I use simple tsplot, the dates are aligned
> horizontally, which is not what I want. Now I'm viewing them with
> something like:
> for row in converted_data: plot(row)
> but I wonder if there is a more 'proper' way of handling this.

Ah, you wanna plot your data year by year, right? In that case, you don't really need the dates anymore, and I suggest you plot the .series attribute instead (that's the masked array that stores only the data; it has faster access than the whole timeseries because you don't have to access the dates anymore).

Now, your problem simplifies into: how to plot multiple rows at once. You can loop on the rows, or check in matplotlib if there's not another trick (try a LineCollection if you don't need different colors for different years/rows).

From gorkypl at gmail.com Sat Jan 16 04:20:03 2010
From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=)
Date: Sat, 16 Jan 2010 10:20:03 +0100
Subject: [SciPy-User] Quick question about selecting periodical data with scikits.timeseries
In-Reply-To: <0D540F50-39DB-49A6-8B55-1388E594801B@gmail.com>
References: <5158a0651001150624vcc4a6f6y61ba4a2c201818a@mail.gmail.com> <94C2AFA5-4418-4F10-9E05-43377C73929D@gmail.com> <5158a0651001151416x74b8c110t5f786d79c766e821@mail.gmail.com> <0D540F50-39DB-49A6-8B55-1388E594801B@gmail.com>
Message-ID: <5158a0651001160120r55e6c219h1c732dfc045ed7a1@mail.gmail.com>

> Ah, you wanna plot your data year by year, right? In that case, you don't really need the dates anymore, and I suggest you plot the .series attribute instead (that's the masked array that stores only the data; it has faster access than the whole timeseries because you don't have to access the dates anymore).
> Now, your problem simplifies into: how to plot multiple rows at once. You can loop on the rows, or check in matplotlib if there's not another trick (try a LineCollection if you don't need different colors for different years/rows)

That clears everything - no more questions for now :)

Paweł

From resurgo at gmail.com Sat Jan 16 09:37:13 2010
From: resurgo at gmail.com (Peter Clarke)
Date: Sat, 16 Jan 2010 14:37:13 +0000
Subject: [SciPy-User] Python coders for Haiti disaster relief
Message-ID:

Apologies for off-topic posting, but I think this is an important project.

Python programmers are required immediately for assistance in coding a disaster management framework for the earthquake in Haiti.

From http://wiki.python.org/moin/VolunteerOpportunities:

-----------------
URGENT REQUEST, Sahana Disaster Management System, Haiti Earthquake

*Job Description*: This is an urgent call for experienced Python programmers to help in the Sahana Disaster Management System immediately - knowledge of the Web2Py platform would be best. The Sahana Disaster Management System is used to coordinate relief efforts. Please recruit any available programmers for the Haiti effort as quickly as possible and have them contact me immediately so that I can put them in touch with the correct people. Thank you kindly and I do hope that we can quickly identify some contributors for this monumental effort - they are needed ASAP. http://sahanapy.org/ is the developer site and the demo is http://demo.sahanapy.org/

- *Contact*: Connie White, PhD, Institute for Emergency Preparedness, Jacksonville State University
- *E-mail contact*: connie.m.white at gmail.com
- *Web*: http://sahanapy.org/
-----------------------------

Please help if you can.

-Peter Clarke
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rchrdlyon1 at gmail.com Sat Jan 16 10:33:32 2010
From: rchrdlyon1 at gmail.com (Richard Lyon)
Date: Sun, 17 Jan 2010 02:33:32 +1100
Subject: [SciPy-User] Issues with lfilter after version upgrade
Message-ID: <4B51DC4C.4090906@googlemail.com>

Hi,

Problem:
=======

Been successfully using scipy to run various signal processing simulations. Recently upgraded python, numpy and scipy. Now I find that lfilter in signal processing appears to crash the Python interpreter.

Details:
========

Windows Vista
python 2.6.4
pywin32 214
numpy 1.4.0
scipy 0.7.1

The following code crashes

----------------------------------------------------------
from numpy import zeros
from scipy.signal import lfilter

print 'Testing lfilter'
B = [ +9.9416310E-01, -1.9883262E+00, +9.9416310E-01 ]
A = [ +1.0000000E+00, -1.9882920E+00, +9.8836030E-01 ]
z = zeros(2)
x0 = zeros(80)
# fails on this next line
(x1, z) = lfilter(B, A, x0, -1, z)
print 'Finished'
----------------------------------------------------------

Haven't seen any other problems yet.

Regards
RLYON

From josef.pktd at gmail.com Sat Jan 16 12:57:17 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 16 Jan 2010 12:57:17 -0500
Subject: [SciPy-User] Issues with lfilter after version upgrade
In-Reply-To: <4B51DC4C.4090906@googlemail.com>
References: <4B51DC4C.4090906@googlemail.com>
Message-ID: <1cd32cbb1001160957v2e77436dn7e7a78ad90425ef3@mail.gmail.com>

On Sat, Jan 16, 2010 at 10:33 AM, Richard Lyon wrote:
> Hi,
>
> Problem:
> =======
>
> Been successfully using scipy to run various signal processing
> simulations. Recently upgraded python, numpy and scipy. Now I find that
> lfilter in signal processing appears to crash the Python interpreter.
>
> Details:
> ========
>
> Windows Vista
> python 2.6.4
> pywin32 214
> numpy 1.4.0
> scipy 0.7.1
>
> The following code crashes
>
> ----------------------------------------------------------
> from numpy import zeros
> from scipy.signal import lfilter
>
> print 'Testing lfilter'
> B = [ +9.9416310E-01, -1.9883262E+00, +9.9416310E-01 ]
> A = [ +1.0000000E+00, -1.9882920E+00, +9.8836030E-01 ]
> z = zeros(2)
> x0 = zeros(80)
> # fails on this next line
> (x1, z) = lfilter(B, A, x0, -1, z)
> print 'Finished'
> ----------------------------------------------------------

scipy 0.7.x has binary incompatibility problems if it has been compiled against numpy 1.3 and is run against numpy 1.4.

When I run your script with scipy (trunk) compiled against numpy 1.4.0, I don't have any problem.

When I run your script with scipy-0.7.1.dev5744 compiled against numpy 1.3, and run it with the release version of numpy 1.4.0, then I also have the crash. (I don't have a virtualenv with the scipy-0.7.1 release to try out.)

There are two options: either you recompile scipy against numpy 1.4, or downgrade to numpy 1.3 until numpy 1.4 compatible scipy binaries are available.

(I still think there are other binary incompatibilities besides the cython problem.)

I'm on Windows XP.

Josef

>
> Haven't seen any other problems yet.
>
> Regards
> RLYON
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From perfreem at gmail.com Sat Jan 16 17:28:37 2010
From: perfreem at gmail.com (per freem)
Date: Sat, 16 Jan 2010 17:28:37 -0500
Subject: [SciPy-User] smoothing in scipy/matplotlib
Message-ID:

hi all,

i am using gaussian_kde to fit a gaussian kernel estimator to a bunch of data.
the lines i get are often quite jaggy and very sensitive to fluctuations in the data. is there a way to "smooth" the estimate more? typically in gaussian kdes there is a smoothing parameter, but i do not see one in the documentation.

is there a way to do this?

thanks.

From josef.pktd at gmail.com Sat Jan 16 18:33:50 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 16 Jan 2010 18:33:50 -0500
Subject: [SciPy-User] smoothing in scipy/matplotlib
In-Reply-To: References: Message-ID: <1cd32cbb1001161533p3f732ccdl6a0fb2333728791a@mail.gmail.com>

On Sat, Jan 16, 2010 at 5:28 PM, per freem wrote:
> hi all,
>
> i am using gaussian_kde to fit a gaussian kernel estimator to a bunch
> of data. the lines i get are often quite jaggy and very sensitive to
> fluctuations in the data. is there a way to "smooth" the estimate
> more? typically in gaussian kdes there is a smoothing parameter, but i
> do not see one in the documentation.
>
> is there a way to do this?

Not yet, I never committed the change. The cleanest way currently is by subclassing gaussian_kde; the dirtier version is by monkey patching. I can look for my example scripts for both later tonight. There is also some information on the mailing list, e.g. a subclassing example by Anne (maybe one and a half years ago).

I'm a bit surprised about undersmoothing, because I did the changes for the case of oversmoothing by gaussian_kde.

Josef

>
> thanks.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From josef.pktd at gmail.com Sat Jan 16 23:37:16 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 16 Jan 2010 23:37:16 -0500
Subject: [SciPy-User] smoothing in scipy/matplotlib
In-Reply-To: <1cd32cbb1001161533p3f732ccdl6a0fb2333728791a@mail.gmail.com>
References: <1cd32cbb1001161533p3f732ccdl6a0fb2333728791a@mail.gmail.com>
Message-ID: <1cd32cbb1001162037r5808ccd7v7c61966df662adad@mail.gmail.com>

On Sat, Jan 16, 2010 at 6:33 PM, wrote:
> On Sat, Jan 16, 2010 at 5:28 PM, per freem wrote:
>> hi all,
>>
>> i am using gaussian_kde to fit a gaussian kernel estimator to a bunch
>> of data. the lines i get are often quite jaggy and very sensitive to
>> fluctuations in the data. is there a way to "smooth" the estimate
>> more? typically in gaussian kdes there is a smoothing parameter, but i
>> do not see one in the documentation.
>>
>> is there a way to do this?
>
> Not yet, I never committed the change. The cleanest way currently is
> by subclassing gaussian_kde; the dirtier version is by monkey
> patching. I can look for my example scripts for both later tonight.
> There is also some information on the mailing list, e.g. a subclassing
> example by Anne (maybe one and a half years ago).
>
> I'm a bit surprised about undersmoothing, because I did the changes
> for the case of oversmoothing by gaussian_kde.
>
> Josef

In the attachment is my subclass of stats.gaussian_kde. The main change is to allow setting or resetting the smoothing factor to a float. It plots several examples.

Initially this was intended to be a continuation to this story, but I never got around to finishing it (my file is dated May, and I haven't looked at it in a long time):

http://jpktd.blogspot.com/2009/03/using-gaussian-kernel-density.html

I hope this helps, ask if something is not clear.
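In case it helps, a minimal usage sketch for the attached subclass (untested as written; the sample and the 0.25 factor below are just placeholders, not recommendations):

import numpy as np
# gaussian_kde_covfact is defined in the attached script
data = np.random.randn(500)              # placeholder sample
gkde = gaussian_kde_covfact(data, 0.25)  # fix the covariance factor to a float
grid = np.linspace(-4, 4, 201)
dens = gkde.evaluate(grid)               # density estimate on the grid
gkde.reset_covfact('scotts')             # back to the default rule of thumb

A smaller factor follows the data more closely (less smoothing), a larger one gives a smoother estimate.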
I don't find a ticket or mailing list thread on my draft for the enhancement (keyword option for the bandwidth) to gaussian_kde; the initial monkey patch version is here:
http://mail.scipy.org/pipermail/scipy-user/2009-January/019201.html

Josef

>
>
>>
>>
>>>
>>> thanks.
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>
>
-------------- next part --------------
'''subclassing kde

Author: josef pktd
'''

import numpy as np
import scipy
from scipy import stats
import matplotlib.pylab as plt


class gaussian_kde_set_covariance(stats.gaussian_kde):
    '''
    from Anne Archibald in mailinglist:
    http://www.nabble.com/Width-of-the-gaussian-in-stats.kde.gaussian_kde---td19558924.html#a19558924
    '''
    def __init__(self, dataset, covariance):
        self.covariance = covariance
        scipy.stats.gaussian_kde.__init__(self, dataset)

    def _compute_covariance(self):
        self.inv_cov = np.linalg.inv(self.covariance)
        self._norm_factor = np.sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n


class gaussian_kde_covfact(stats.gaussian_kde):
    def __init__(self, dataset, covfact='scotts'):
        self.covfact = covfact
        scipy.stats.gaussian_kde.__init__(self, dataset)

    def _compute_covariance_(self):
        '''not used'''
        self.inv_cov = np.linalg.inv(self.covariance)
        self._norm_factor = np.sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n

    def covariance_factor(self):
        if self.covfact in ['sc', 'scotts']:
            return self.scotts_factor()
        if self.covfact in ['si', 'silverman']:
            return self.silverman_factor()
        elif self.covfact:
            return float(self.covfact)
        else:
            raise ValueError, \
                'covariance factor has to be scotts, silverman or a number'

    def reset_covfact(self, covfact):
        self.covfact = covfact
        self.covariance_factor()
        self._compute_covariance()


def plotkde(covfact):
    gkde.reset_covfact(covfact)
    kdepdf = gkde.evaluate(ind)
    plt.figure()
    # plot histogram of sample
    plt.hist(xn, bins=20, normed=1)
    # plot estimated density
    plt.plot(ind, kdepdf, label='kde', color="g")
    # plot data generating density
    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow)
                  + (1-alpha) * stats.norm.pdf(ind, loc=mhigh),
             color="r", label='DGP: normal mix')
    plt.title('Kernel Density Estimation - ' + str(gkde.covfact))
    plt.legend()


from numpy.testing import assert_array_almost_equal, \
    assert_almost_equal, assert_

def test_kde_1d():
    np.random.seed(8765678)
    n_basesample = 500
    xn = np.random.randn(n_basesample)
    xnmean = xn.mean()
    xnstd = xn.std(ddof=1)
    print xnmean, xnstd

    # get kde for original sample
    gkde = stats.gaussian_kde(xn)

    # evaluate the density function for the kde for some points
    xs = np.linspace(-7, 7, 501)
    kdepdf = gkde.evaluate(xs)
    normpdf = stats.norm.pdf(xs, loc=xnmean, scale=xnstd)
    print 'MSE', np.sum((kdepdf - normpdf)**2)
    print 'maxabserror', np.max(np.abs(kdepdf - normpdf))
    intervall = xs[1] - xs[0]
    assert_(np.sum((kdepdf - normpdf)**2)*intervall < 0.01)
    #assert_array_almost_equal(kdepdf, normpdf, decimal=2)
    print gkde.integrate_gaussian(0.0, 1.0)
    print gkde.integrate_box_1d(-np.inf, 0.0)
    print gkde.integrate_box_1d(0.0, np.inf)
    print gkde.integrate_box_1d(-np.inf, xnmean)
    print gkde.integrate_box_1d(xnmean, np.inf)

    assert_almost_equal(gkde.integrate_box_1d(xnmean, np.inf), 0.5, decimal=1)
    assert_almost_equal(gkde.integrate_box_1d(-np.inf, xnmean), 0.5, decimal=1)
    assert_almost_equal(gkde.integrate_box(xnmean, np.inf), 0.5, decimal=1)
    assert_almost_equal(gkde.integrate_box(-np.inf, xnmean), 0.5, decimal=1)

    assert_almost_equal(gkde.integrate_kde(gkde),
                        (kdepdf**2).sum()*intervall, decimal=2)
    assert_almost_equal(gkde.integrate_gaussian(xnmean, xnstd**2),
                        (kdepdf*normpdf).sum()*intervall, decimal=2)
##    assert_almost_equal(gkde.integrate_gaussian(0.0, 1.0),
##                        (kdepdf*normpdf).sum()*intervall, decimal=2)


if __name__ == '__main__':
    # generate a sample
    n_basesample = 1000
    np.random.seed(8765678)
    alpha = 0.6  # weight for (prob of) lower distribution
    mlow, mhigh = (-3, 3)  # mean locations for gaussian mixture
    xn = np.concatenate([mlow + np.random.randn(alpha * n_basesample),
                         mhigh + np.random.randn((1-alpha) * n_basesample)])

    # get kde for original sample
    #gkde = stats.gaussian_kde(xn)
    gkde = gaussian_kde_covfact(xn, 0.1)
    # evaluate the density function for the kde for some points
    ind = np.linspace(-7, 7, 101)
    kdepdf = gkde.evaluate(ind)

    plt.figure()
    # plot histogram of sample
    plt.hist(xn, bins=20, normed=1)
    # plot estimated density
    plt.plot(ind, kdepdf, label='kde', color="g")
    # plot data generating density
    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow)
                  + (1-alpha) * stats.norm.pdf(ind, loc=mhigh),
             color="r", label='DGP: normal mix')
    plt.title('Kernel Density Estimation')
    plt.legend()

    gkde = gaussian_kde_covfact(xn, 'scotts')
    kdepdf = gkde.evaluate(ind)
    plt.figure()
    # plot histogram of sample
    plt.hist(xn, bins=20, normed=1)
    # plot estimated density
    plt.plot(ind, kdepdf, label='kde', color="g")
    # plot data generating density
    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow)
                  + (1-alpha) * stats.norm.pdf(ind, loc=mhigh),
             color="r", label='DGP: normal mix')
    plt.title('Kernel Density Estimation')
    plt.legend()
    #plt.show()

    for cv in ['scotts', 'silverman', 0.05, 0.1, 0.5]:
        plotkde(cv)

    test_kde_1d()

    np.random.seed(8765678)
    n_basesample = 1000
    xn = np.random.randn(n_basesample)
    xnmean = xn.mean()
    xnstd = xn.std(ddof=1)

    # get kde for original sample
    gkde = stats.gaussian_kde(xn)

From josef.pktd at gmail.com Sun Jan 17 00:03:12 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 17 Jan 2010 00:03:12 -0500
Subject: [SciPy-User] smoothing in scipy/matplotlib
In-Reply-To: <1cd32cbb1001162037r5808ccd7v7c61966df662adad@mail.gmail.com>
References: <1cd32cbb1001161533p3f732ccdl6a0fb2333728791a@mail.gmail.com> <1cd32cbb1001162037r5808ccd7v7c61966df662adad@mail.gmail.com>
Message-ID: <1cd32cbb1001162103o4510ecf2gb227ed20ee11a679@mail.gmail.com>

On Sat, Jan 16, 2010 at 11:37 PM, wrote:
> On Sat, Jan 16, 2010 at 6:33 PM, wrote:
>> On Sat, Jan 16, 2010 at 5:28 PM, per freem wrote:
>>> hi all,
>>>
>>> i am using gaussian_kde to fit a gaussian kernel estimator to a bunch
>>> of data. the lines i get are often quite jaggy and very sensitive to
>>> fluctuations in the data. is there a way to "smooth" the estimate
>>> more? typically in gaussian kdes there is a smoothing parameter, but i
>>> do not see one in the documentation.
>>>
>>> is there a way to do this?
>>
>> Not yet, I never committed the change. The cleanest way currently is
>> by subclassing gaussian_kde; the dirtier version is by monkey
>> patching. I can look for my example scripts for both later tonight.
>> There is also some information on the mailing list, e.g. a subclassing
>> example by Anne (maybe one and a half years ago).
>>
>> I'm a bit surprised about undersmoothing, because I did the changes
>> for the case of oversmoothing by gaussian_kde.
>>
>> Josef
>
> In the attachment is my subclass of stats.gaussian_kde. The main
> change is to allow setting or resetting the smoothing factor to a
> float.
> It plots several examples.
>
> Initially this was intended to be a continuation to this story, but I
> never got around to finishing it (my file is dated May, and I haven't
> looked at it in a long time):
>
> http://jpktd.blogspot.com/2009/03/using-gaussian-kernel-density.html
>
> I hope this helps, ask if something is not clear.
>
> I don't find a ticket or mailing list thread on my draft for the
> enhancement (keyword option for the bandwidth) to gaussian_kde; the
> initial monkey patch version is here:
> http://mail.scipy.org/pipermail/scipy-user/2009-January/019201.html
>
> Josef

I just created http://projects.scipy.org/scipy/ticket/1092 so I don't forget about it again.

I appreciate any comments about what changes would be useful for the bandwidth choice.

Josef

>
>
>>
>>
>>
>>>
>>> thanks.
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>
>

From yves.frederix at gmail.com Sun Jan 17 05:25:40 2010
From: yves.frederix at gmail.com (Yves Frederix)
Date: Sun, 17 Jan 2010 11:25:40 +0100
Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1
Message-ID: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com>

Hi,

I stumbled upon the following illogical behavior of scipy.interpolate.splev. When presented with a length-1 array, the output is converted to a scalar.

import scipy.interpolate
import numpy as N

x = N.arange(5.)
y = N.arange(5.)
tck = scipy.interpolate.splrep(x,y)
x_eval = N.asarray([1.])
y_eval = scipy.interpolate.splev(x_eval, tck)
print 'scipy.interpolate.splev(x_eval, tck):', y_eval
print 'type(x_eval):', type(x_eval)
print 'type(y_eval):', type(y_eval)

with output

scipy.interpolate.splev(x_eval, tck): 1.0
type(x_eval): <type 'numpy.ndarray'>
type(y_eval): <type 'numpy.float64'>

It was rather unexpected that the types of the input and output data are different. After checking interpolate/fitpack.py, it seems that this behavior results from the fact that the length-1 case is explicitly treated differently (probably to be able to deal with the case of scalar input, for which scalar output is expected):

434 def splev(x,tck,der=0):
...
487     if ier: raise TypeError,"An error occurred"
488     if len(y)>1: return y
489     return y[0]
490

Wouldn't it be less confusing to have the return value always have the same type as the input data?

Cheers,
YVES

From perfreem at gmail.com Sun Jan 17 08:39:35 2010
From: perfreem at gmail.com (per freem)
Date: Sun, 17 Jan 2010 08:39:35 -0500
Subject: [SciPy-User] smoothing in scipy/matplotlib
In-Reply-To: <1cd32cbb1001162103o4510ecf2gb227ed20ee11a679@mail.gmail.com>
References: <1cd32cbb1001161533p3f732ccdl6a0fb2333728791a@mail.gmail.com> <1cd32cbb1001162037r5808ccd7v7c61966df662adad@mail.gmail.com> <1cd32cbb1001162103o4510ecf2gb227ed20ee11a679@mail.gmail.com>
Message-ID:

hi josef,

thank you so much - your patch worked brilliantly. i simply changed the smoothing factor to 0.25 and got the correct result. it was very straightforward to use!

it would be great if your subclass of kde were incorporated into scipy. if you're interested in seeing the graphs before (with the default kde) and with your version, i can send you those.

thanks again.

On Sun, Jan 17, 2010 at 12:03 AM, wrote:
> On Sat, Jan 16, 2010 at 11:37 PM, wrote:
>> On Sat, Jan 16, 2010 at 6:33 PM, wrote:
>>> On Sat, Jan 16, 2010 at 5:28 PM, per freem wrote:
>>>> hi all,
>>>>
>>>> i am using gaussian_kde to fit a gaussian kernel estimator to a bunch
>>>> of data.
>>>> the lines i get are often quite jaggy and very sensitive to
>>>> fluctuations in the data. is there a way to "smooth" the estimate
>>>> more? typically in gaussian kdes there is a smoothing parameter, but i
>>>> do not see one in the documentation.
>>>>
>>>> is there a way to do this?
>>>
>>> Not yet, I never committed the change. The cleanest way currently is
>>> by subclassing gaussian_kde; the dirtier version is by monkey
>>> patching. I can look for my example scripts for both later tonight.
>>> There is also some information on the mailing list, e.g. a subclassing
>>> example by Anne (maybe one and a half years ago).
>>>
>>> I'm a bit surprised about undersmoothing, because I did the changes
>>> for the case of oversmoothing by gaussian_kde.
>>>
>>> Josef
>>
>> In the attachment is my subclass of stats.gaussian_kde. The main
>> change is to allow setting or resetting the smoothing factor to a
>> float. It plots several examples.
>>
>> Initially this was intended to be a continuation to this story, but I
>> never got around to finishing it (my file is dated May, and I haven't
>> looked at it in a long time):
>>
>> http://jpktd.blogspot.com/2009/03/using-gaussian-kernel-density.html
>>
>> I hope this helps, ask if something is not clear.
>>
>> I don't find a ticket or mailing list thread on my draft for the
>> enhancement (keyword option for the bandwidth) to gaussian_kde; the
>> initial monkey patch version is here:
>> http://mail.scipy.org/pipermail/scipy-user/2009-January/019201.html
>>
>> Josef
>
> I just created http://projects.scipy.org/scipy/ticket/1092 so I don't
> forget about it again.
>
> I appreciate any comments about what changes would be useful for the
> bandwidth choice.
>
> Josef
>
>
>>
>> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From gorkypl at gmail.com Sun Jan 17 10:15:00 2010
From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=)
Date: Sun, 17 Jan 2010 16:15:00 +0100
Subject: [SciPy-User] scikits.timeseries plot and utf8 fonts
Message-ID: <5158a0651001170715o687dedcbs9c5487c2a5975ff1@mail.gmail.com>

hello,

Still working with scikits.timeseries and matplotlib, I've encountered another problem - this time with displaying month names in UTF-8.

In my language (Polish), month names contain non-ASCII characters (like ń or ź).
When plotting data series with matplotlib methods they are displayed correctly, but when using the scikits tsplot I get rectangles in those places.
The other texts (axis titles, legends and so on) are OK.

For example - there is no problem in this demo:
http://matplotlib.sourceforge.net/examples/pylab_examples/date_demo2.html
But the names are not displayed correctly when plotting the first example from:
http://pytseries.sourceforge.net/lib.plotting.examples.html

I wonder if there is a configuration problem or an issue with TimeSeriesPlot?

greetings,
Paweł
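A quick way to narrow this down is to force a font that is known to carry the Polish glyphs and re-run the failing example; if the rectangles disappear, it is a font-selection problem in the plotting code rather than an encoding one. A minimal sketch ('DejaVu Sans' is only an example name and assumes that font is installed):

import matplotlib
# assumption: a font covering Polish glyphs is installed under this name
matplotlib.rcParams['font.family'] = 'DejaVu Sans'
# ...then run the first example from
# http://pytseries.sourceforge.net/lib.plotting.examples.html unchanged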
From contact at pythonxy.com Sun Jan 17 12:07:08 2010
From: contact at pythonxy.com (Pierre Raybaut)
Date: Sun, 17 Jan 2010 18:07:08 +0100
Subject: [SciPy-User] [ANN] Spyder v1.0.3 released
Message-ID: <4B5343BC.3070703@pythonxy.com>

Hi all,

I'm pleased to announce here that Spyder version 1.0.3 has been released:
http://packages.python.org/spyder

__Important__

Spyder v1.0.3 is a *critical* bugfix release (bonus: new "Apply" button in matplotlib's figure options editor).

Previously known as Pydee, Spyder (Scientific PYthon Development EnviRonment) is a free open-source Python development environment providing MATLAB-like features in a simple and lightweight package, available for Windows XP/Vista/7, GNU/Linux and MacOS X:

* advanced code editing features (code analysis, ...)
* interactive console with MATLAB-like workspace (with GUI-based list, dictionary, tuple, text and array editors -- screenshots: http://packages.python.org/spyder/console.html#the-workspace) and integrated matplotlib figures
* external console to open an interpreter or run a script in a separate process (with a global variable explorer providing the same features as the interactive console's workspace)
* code analysis with pyflakes and pylint
* search in files features
* documentation viewer: automatically retrieves docstrings or source code of the function/class called in the interactive/external console
* integrated file/directories explorer
* MATLAB-like path management

...and more!

Spyder is part of spyderlib, a Python module based on PyQt4 and QScintilla2 which provides powerful console-related PyQt4 widgets.

- Pierre

From fiolj at yahoo.com Sun Jan 17 13:28:23 2010
From: fiolj at yahoo.com (Juan)
Date: Sun, 17 Jan 2010 15:28:23 -0300
Subject: [SciPy-User] f2py segfault
Message-ID: <4B5356C7.80908@yahoo.com>

Hi, I don't know if this is the right place (if it is not, please point me in the right direction).

I am using f2py with some of my own programs and I am going insane with a segmentation fault. It is probably a problem in my code, but I'd like to know if someone has any hint to give me, since I've been trying different things for two days already.

I've got a few routines in fortran with in/out arrays. When I call one of the routines it works well. The second routine I call crashes the program. I've been changing routines and it seems that it does not matter which routines I use.

Basically, the fortran routines have the signature:

subroutine sub1(y, Np, Nt)
   integer(4), intent(IN) :: Np
   integer(4), intent(IN) :: Nt
   real(8), intent(INOUT), dimension(6*Np, Nt) :: y

and I call them from python as:

import mymod
r = np.zeros((Ncoord, Ntraj), dtype=np.float64, order='Fortran')
mymod.sub1(r)

I am using python 2.6. Probably the statement of the problem is too vague to get an answer, but I'll settle for just some ideas on how to proceed. I've used the debugging option --debug-capi, but it does not provide more information: it only tells me that it checks the array and segfaults (before analyzing the integer arguments Np, Nt).

Thanks, Juan

From pgmdevlist at gmail.com Sun Jan 17 15:37:21 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Sun, 17 Jan 2010 15:37:21 -0500
Subject: [SciPy-User] scikits.timeseries plot and utf8 fonts
In-Reply-To: <5158a0651001170715o687dedcbs9c5487c2a5975ff1@mail.gmail.com>
References: <5158a0651001170715o687dedcbs9c5487c2a5975ff1@mail.gmail.com>
Message-ID:

On Jan 17, 2010, at 10:15 AM, Paweł
Rumian wrote:
> hello,
>
> Still working with scikits.timeseries and matplotlib, I've encountered
> another problem - this time with displaying month names in UTF-8.
>
> In my language (Polish), month names contain non-ASCII characters
> (like ń or ź).
> When plotting data series with matplotlib methods they are displayed
> correctly, but when using the scikits tsplot I get rectangles in those
> places.
> The other texts (axis titles, legends and so on) are OK.
>
> For example - there is no problem in this demo:
> http://matplotlib.sourceforge.net/examples/pylab_examples/date_demo2.html
> But the names are not displayed correctly when plotting the first example from:
> http://pytseries.sourceforge.net/lib.plotting.examples.html
>
> I wonder if there is a configuration problem or an issue with TimeSeriesPlot?

Oh, I'm afraid it's just bugs on the scikits part; the lib.plotlib section has been lagging behind matplotlib. Let me check and get back to you next week (meanwhile, please open a ticket at http://projects.scipy.org/scikits, that'll be easier to manage).

Sorry for the inconvenience.
P.

From dagss at student.matnat.uio.no Sun Jan 17 16:28:27 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sun, 17 Jan 2010 22:28:27 +0100
Subject: [SciPy-User] f2py segfault
In-Reply-To: <4B5356C7.80908@yahoo.com>
References: <4B5356C7.80908@yahoo.com>
Message-ID: <4B5380FB.2000509@student.matnat.uio.no>

Juan wrote:
> Hi, I don't know if this is the right place (if it is not, please point me in
> the right direction).
> I am using f2py with some of my own programs and I am going insane with a
> segmentation fault. It is probably a problem in my code, but I'd like to know
> if someone has any hint to give me, since I've been trying different things
> for two days already.
>
> I've got a few routines in fortran with in/out arrays. When I call one of the
> routines it works well. The second routine I call crashes the program. I've been
> changing routines and it seems that it does not matter which routines I use.
>
> Basically, the fortran routines have the signature:
>
> subroutine sub1(y, Np, Nt)
>   integer(4), intent(IN) :: Np
>   integer(4), intent(IN) :: Nt
>   real(8), intent(INOUT), dimension(6*Np, Nt) :: y
>
> and I call them from python as:
>
> import mymod
> r = np.zeros((Ncoord, Ntraj), dtype=np.float64, order='Fortran')
> mymod.sub1(r)
>
> I am using python 2.6. Probably the statement of the problem is too vague to get
> an answer, but I'll settle for just some ideas on how to proceed. I've used the
> debugging option --debug-capi, but it does not provide more information: it only
> tells me that it checks the array and segfaults (before analyzing the integer
> arguments Np, Nt).

From what little I know of f2py, the "6*Np" seems like the problematic part. If f2py isn't smart enough to take the array shape and divide by 6 (which, in general, requires solving a symbolic equation, and somehow I doubt f2py is that smart, though perhaps it deals with simple things like this explicitly? *shrug*), then Np is going to be passed as too big a number (try to print out Np from your Fortran program to confirm...).
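An untested sketch of what that means on the calling side: the generated wrapper exposes Np and Nt as optional arguments derived from y's shape, so you can pass them explicitly instead of relying on the inferred value (mymod and sub1 as in this thread; the sizes are made up):

import numpy as np
import mymod                   # the f2py-built module from this thread

Np, Nt = 4, 100                # made-up sizes
y = np.zeros((6 * Np, Nt), dtype=np.float64, order='F')
mymod.sub1(y, Np, Nt)          # both extents passed, nothing left to infer

If the explicit call behaves while the bare mymod.sub1(y) does not, the shape inference is the culprit.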
Dag Sverre

From gorkypl at gmail.com Sun Jan 17 16:39:15 2010
From: gorkypl at gmail.com (=?UTF-8?Q?Pawe=C5=82_Rumian?=)
Date: Sun, 17 Jan 2010 22:39:15 +0100
Subject: [SciPy-User] scikits.timeseries plot and utf8 fonts
In-Reply-To: References: <5158a0651001170715o687dedcbs9c5487c2a5975ff1@mail.gmail.com>
Message-ID: <5158a0651001171339r7ec20d09kcef9f5ec5f8457c2@mail.gmail.com>

2010/1/17 Pierre GM :
> Oh, I'm afraid it's just bugs on the scikits part; the lib.plotlib section has been lagging behind matplotlib. Let me check and get back to you next week (meanwhile, please open a ticket at http://projects.scipy.org/scikits, that'll be easier to manage).
> Sorry for the inconvenience.

No problem, thanks for the response :)

Paweł

From kwmsmith at gmail.com Sun Jan 17 16:43:44 2010
From: kwmsmith at gmail.com (Kurt Smith)
Date: Sun, 17 Jan 2010 15:43:44 -0600
Subject: [SciPy-User] f2py segfault
In-Reply-To: <4B5380FB.2000509@student.matnat.uio.no>
References: <4B5356C7.80908@yahoo.com> <4B5380FB.2000509@student.matnat.uio.no>
Message-ID:

On Sun, Jan 17, 2010 at 3:28 PM, Dag Sverre Seljebotn wrote:
> Juan wrote:
>> Hi, I don't know if this is the right place (if it is not, please point me in
>> the right direction).
>> I am using f2py with some of my own programs and I am going insane with a
>> segmentation fault. It is probably a problem in my code, but I'd like to know
>> if someone has any hint to give me, since I've been trying different things
>> for two days already.
>>
>> I've got a few routines in fortran with in/out arrays. When I call one of the
>> routines it works well. The second routine I call crashes the program. I've been
>> changing routines and it seems that it does not matter which routines I use.
>>
>> Basically, the fortran routines have the signature:
>>
>> subroutine sub1(y, Np, Nt)
>>   integer(4), intent(IN) :: Np
>>   integer(4), intent(IN) :: Nt
>>   real(8), intent(INOUT), dimension(6*Np, Nt) :: y
>>
>> and I call them from python as:
>>
>> import mymod
>> r = np.zeros((Ncoord, Ntraj), dtype=np.float64, order='Fortran')
>> mymod.sub1(r)
>>
>> I am using python 2.6. Probably the statement of the problem is too vague to get
>> an answer, but I'll settle for just some ideas on how to proceed. I've used the
>> debugging option --debug-capi, but it does not provide more information: it only
>> tells me that it checks the array and segfaults (before analyzing the integer
>> arguments Np, Nt).
>
> From what little I know of f2py, the "6*Np" seems like the problematic
> part. If f2py isn't smart enough to take the array shape and divide by 6
> (which, in general, requires solving a symbolic equation, and somehow I
> doubt f2py is that smart, though perhaps it deals with simple things
> like this explicitly? *shrug*), then Np is going to be passed as too big
> a number (try to print out Np from your Fortran program to confirm...).

That's what I suspected at first, too. f2py tries to handle this, although it uses integer division, and I think that leads to a bug. I don't get a segfault, though, so your problem might be something else. If you can redo things to get rid of the '6*Np', it might be worth trying.
ksmith at lothario:~/test-f2py$ cat foo.f90 subroutine sub1(y, Np, Nt) integer(4), intent(in) :: Np integer(4), intent(in) :: Nt real(8), intent(inout), dimension(6*Np, Nt) :: y y = 1.0 end subroutine sub1 ksmith at lothario:~/test-f2py$ f2py --debug-capi -c foo.f90 [f2py output] ksmith at lothario:~/test-f2py$ cat test_foo.py import numpy as np import untitled print untitled.sub1.__doc__ # this works -- note the array extents -- multiples of 6, so the integer division works... r = np.zeros((6, 6), dtype=np.float64, order="Fortran") untitled.sub1(r) print r assert np.all(r == 1.0) # this doesn't work -- note the array extents. r = np.zeros((5, 5), dtype=np.float64, order="Fortran") untitled.sub1(r) print r assert np.all(r == 1.0) ksmith at lothario:~/test-f2py$ python test_foo.py sub1 - Function signature: sub1(y,[np,nt]) Required arguments: y : in/output rank-2 array('d') with bounds (6 * np,nt) Optional arguments: np := (shape(y,0))/(6) input int nt := shape(y,1) input int debug-capi:Python C/API function untitled.sub1(y,np=(shape(y,0))/(6),nt=shape(y,1)) debug-capi:double y=:inoutput,required,array,dims(6 * np|6 * np,nt|nt) debug-capi:int np=(shape(y,0))/(6):input,optional,scalar debug-capi:np=1 debug-capi:Checking `(shape(y,0))/(6)==np' debug-capi:int nt=shape(y,1):input,optional,scalar debug-capi:nt=6 debug-capi:Checking `shape(y,1)==nt' debug-capi:Fortran subroutine `sub1(y,&np,&nt)' debug-capi:np=1 debug-capi:nt=6 debug-capi:Building return value. debug-capi:Python C/API function untitled.sub1: successful. debug-capi:Freeing memory. [[ 1. 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1. 1.]] debug-capi:Python C/API function untitled.sub1(y,np=(shape(y,0))/(6),nt=shape(y,1)) debug-capi:double y=:inoutput,required,array,dims(6 * np|6 * np,nt|nt) debug-capi:int np=(shape(y,0))/(6):input,optional,scalar debug-capi:np=0 debug-capi:Checking `(shape(y,0))/(6)==np' debug-capi:int nt=shape(y,1):input,optional,scalar debug-capi:nt=5 debug-capi:Checking `shape(y,1)==nt' debug-capi:Fortran subroutine `sub1(y,&np,&nt)' debug-capi:np=0 debug-capi:nt=5 debug-capi:Building return value. debug-capi:Python C/API function untitled.sub1: successful. debug-capi:Freeing memory. [[ 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 0.]] Traceback (most recent call last): File "test_foo.py", line 16, in assert np.all(r == 1.0) AssertionError From sfalsharif at gmail.com Sun Jan 17 18:44:56 2010 From: sfalsharif at gmail.com (Sharaf Al-Sharif) Date: Sun, 17 Jan 2010 23:44:56 +0000 Subject: [SciPy-User] Amplitude scaling in fft Message-ID: <57c570d21001171544p5f9b3835g513b2575adc6af33@mail.gmail.com> Hi, I'm a bit confused regarding how the amplitudes returned by np.fft.fft (or np.fft.rfft) relate to the amplitudes of the original signal in time domain. If: A = np.fft.rfft(a,n=2048) but, n_pts = len(a) < 2048, will the physical amplitudes in time domain be np.abs(A)*2/2048 , or np.abs(A)*2/n_pts? Or something else? Thank you for your help. Sharaf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fiolj at yahoo.com Mon Jan 18 07:19:48 2010 From: fiolj at yahoo.com (Juan) Date: Mon, 18 Jan 2010 09:19:48 -0300 Subject: [SciPy-User] Fwd: f2py segfault Message-ID: <4B5451E4.5020806@yahoo.com> Hi, thanks for the advice. I did not notice that the integer division could be a source for trouble. Now I changed all the routines. However, I still have the same segmentation fault. 
debug-capi:Python C/API function mymod.sub0(state,ndim=shape(state,0),ntrajectories=shape(state,1))
debug-capi:double state=:inoutput,required,array,dims(ndim|ndim,ntrajectories|ntrajectories)
debug-capi:int ndim=shape(state,0):input,optional,scalar
debug-capi:ndim=24
debug-capi:Checking `shape(state,0)==ndim'
debug-capi:int ntrajectories=shape(state,1):input,optional,scalar
debug-capi:ntrajectories=100
debug-capi:Checking `shape(state,1)==ntrajectories'
debug-capi:Fortran subroutine `sub0(state,&ndim,&ntrajectories)'
debug-capi:ndim=24
debug-capi:ntrajectories=100
debug-capi:Building return value.
debug-capi:Python C/API function mymod.sub0: successful.
debug-capi:Freeing memory.
debug-capi:Python C/API function mymod.sub1(state,d_i,d_f,ndim=shape(state,0),ntrajectories=shape(state,1))
debug-capi:double state=:inoutput,required,array,dims(ndim|ndim,ntrajectories|ntrajectories)
Segmentation fault

The current sub1 has two other arguments, d_i and d_f, which are real scalars; the full signatures are:

subroutine sub0(state, Ndim, Ntrajectories)
   integer(I32), intent(IN) :: Ndim
   integer(I32), intent(IN) :: Ntrajectories
   real(R64), intent(INOUT), dimension(Ndim,Ntrajectories) :: state
   ...
end subroutine sub0

subroutine sub1(state, d_i, d_f, Ndim, Ntrajectories)
   integer(4), intent(IN) :: Ndim
   integer(4), intent(IN) :: Ntrajectories
   real(8), intent(INOUT), dimension(Ndim, Ntrajectories) :: state
   real(8), intent(IN) :: d_i
   real(8), intent(IN) :: d_f
   print *, shape(state), Ndim, Ntrajectories
   ...
end subroutine sub1

and I am calling from my script as:

import mymod
Ndim = 24
Ntrajectories = 10
di, df = 0., 10.
r = np.zeros((Ndim, Ntrajectories), dtype=np.float64, order='Fortran')

mymod.sub0(r)
mymod.sub1(r, di, df)

As can be seen from the debug output, f2py is checking the arguments for sub0, but it segfaults before checking the args in sub1 (with no very informative messages).

It may well be a problem related to the workings of the routines, but they work when I use them in tests on pure fortran code. Additionally, I get a very similar error message if I call sub0 (mymod.sub0(r)) instead of sub1 (mymod.sub1(r, di, df)) the second time in the python script.

Any ideas? Thanks again. Juan

-------- Original Message --------
Subject: f2py segfault
Date: Sun, 17 Jan 2010 15:28:23 -0300
From: Juan
To: scipy-user at scipy.org

Hi, I don't know if this is the right place (if it is not, please point me in the right direction).
I am using f2py with some of my own programs and I am going insane with a segmentation fault. It is probably a problem in my code, but I'd like to know if someone has any hint to give me, since I've been trying different things for two days already.

I've got a few routines in fortran with in/out arrays. When I call one of the routines it works well. The second routine I call crashes the program. I've been changing routines and it seems that it does not matter which routines I use.

Basically, the fortran routines have the signature:

From dagss at student.matnat.uio.no Mon Jan 18 10:55:34 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Mon, 18 Jan 2010 16:55:34 +0100
Subject: [SciPy-User] Fwd: f2py segfault
In-Reply-To: <4B5451E4.5020806@yahoo.com>
References: <4B5451E4.5020806@yahoo.com>
Message-ID: <4B548476.9060400@student.matnat.uio.no>

Juan wrote:
> Hi, thanks for the advice. I did not notice that the integer division could be a
> source for trouble. Now I changed all the routines. However, I still have the
> same segmentation fault.
> [clip]
>
> Any ideas? Thanks again. Juan
>
Did you say which Fortran compiler you were using? f2py makes some
blatant assumptions about the Fortran compiler which are nowhere in any
standard. If you don't use gfortran you may get problems.

Dag Sverre

From dagss at student.matnat.uio.no  Mon Jan 18 10:56:42 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Mon, 18 Jan 2010 16:56:42 +0100
Subject: [SciPy-User] Fwd: f2py segfault
In-Reply-To: <4B548476.9060400@student.matnat.uio.no>
References: <4B5451E4.5020806@yahoo.com> <4B548476.9060400@student.matnat.uio.no>
Message-ID: <4B5484BA.7050109@student.matnat.uio.no>

Dag Sverre Seljebotn wrote:
> Juan wrote:
>> Hi, thanks for the advice. I did not notice that the integer division
>> could be a source for trouble. Now I changed all the routines. However,
>> I still have the same segmentation fault.
>>
>> [clip]
>>
>> Any ideas? Thanks again. Juan
>>
> Did you say which Fortran compiler you were using? f2py makes some
> blatant assumptions about the Fortran compiler which are nowhere in any
> standard. If you don't use gfortran you may get problems.
>
> Dag Sverre
>
Actually, other compilers may work very well -- I just don't know
myself, but know that that is a possible source of problems...

Dag Sverre

From josef.pktd at gmail.com  Mon Jan 18 10:59:46 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 18 Jan 2010 10:59:46 -0500
Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1
In-Reply-To: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com>
References: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com>
Message-ID: <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com>

On Sun, Jan 17, 2010 at 5:25 AM, Yves Frederix wrote:
> Hi,
>
> I stumbled upon the following illogical behavior of
> scipy.interpolate.splev. When presented with a length-1 array, the
> output is converted to a scalar.
>
> import scipy.interpolate
> import numpy as N
>
> x = N.arange(5.)
> y = N.arange(5.)
> tck = scipy.interpolate.splrep(x,y)
>
> x_eval = N.asarray([1.])
> y_eval = scipy.interpolate.splev(x_eval, tck)
>
> print 'scipy.interpolate.splev(x_eval, tck):', y_eval
> print 'type(x_eval):', type(x_eval)
> print 'type(y_eval):', type(y_eval)
>
> with output
>
> scipy.interpolate.splev(x_eval, tck): 1.0
> type(x_eval): <type 'numpy.ndarray'>
> type(y_eval): <type 'numpy.float64'>
>
> It was rather unexpected that the type of input and output data are
> different. After checking interpolate/fitpack.py it seems that this
> behavior results from the fact that the length-1 case is explicitly
> treated differently (probably to be able to deal with the case of
> scalar input, for which scalar output is expected):
>
> 434 def splev(x,tck,der=0):
> ...
> 487         if ier: raise TypeError,"An error occurred"
> 488         if len(y)>1: return y
> 489         return y[0]
> 490
>
> Wouldn't it be less confusing to have the return value always have the
> same type as the input data?

I don't know of any "official" policy.

scipy.stats has switched for the most part to the same behavior. I
think, mainly it is just a convention to have a nicer output when the
return value is a scalar.

One problem with making the output depend on the input type or shape
is that in most functions I know, this information is not kept inside
the function. Usually the input of array_like (arrays, lists, tuples,
scalar numbers) is converted to an ndarray with np.asarray or
np.array.
The output then is independent of the input type (which also hurts if
a user wants to work with matrices or other subclasses of ndarrays).

On the other hand, if I want to use a list as input for convenience, I
don't really want a list as output, I want an ndarray.

That's my view, I don't really care in which direction the convention
goes, but I like the consistency.

Josef

> Cheers,
> YVES
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From amenity at enthought.com  Mon Jan 18 11:55:20 2010
From: amenity at enthought.com (Amenity Applewhite)
Date: Mon, 18 Jan 2010 10:55:20 -0600
Subject: [SciPy-User] EPD 6.0 and IPython Webinar Friday
References: <0AE0D056-D7BB-498B-A14D-AAF9A90ED8F2@enthought.com>
Message-ID: <94400778-AE3F-46A4-8B92-C86CB6DAD95A@enthought.com>

Email not displaying correctly? View it in your browser.

Happy 2010! To start the year off, we've released a new version of EPD
and lined up a solid set of training options.

Scientific Computing with Python Webinar
This Friday, Travis Oliphant will provide an introduction to
multiprocessing and IPython.kernel.

Scientific Computing with Python Webinar
Multiprocessing and IPython.kernel
Friday, January 22: 1pm CST/7pm UTC
Register

Enthought Live Training
Enthought's intensive training courses are offered in 3-5 day sessions.
The Python skills you'll acquire will save you and your organization
time and money in 2010.

Enthought Open Course
February 22-26, Austin, TX
* Python for Scientists and Engineers
* Interfacing with C / C++ and Fortran
* Introduction to UIs and Visualization

Enjoy!
The Enthought Team

EPD 6.0 Released
Now available in our repository, EPD 6.0 includes Python 2.6, PiCloud's
cloud library, and NumPy 1.4... Not to mention 64-bit support for
Windows, OSX, and Linux. Details. Download now.

New: Enthought channel on YouTube
Short instructional videos straight from the desktops of our developers.
Get started with a 4-part series on interpolation with SciPy.

Our mailing address is:
Enthought, Inc.
515 Congress Ave.
Austin, TX 78701

Copyright (C) 2009 Enthought, Inc. All rights reserved.
Forward this email to a friend
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cycomanic at gmail.com  Mon Jan 18 20:19:36 2010
From: cycomanic at gmail.com (Jochen Schroeder)
Date: Tue, 19 Jan 2010 12:19:36 +1100
Subject: [SciPy-User] Amplitude scaling in fft
In-Reply-To: <57c570d21001171544p5f9b3835g513b2575adc6af33@mail.gmail.com>
References: <57c570d21001171544p5f9b3835g513b2575adc6af33@mail.gmail.com>
Message-ID: <20100119011934.GA2266@cudos0803>

Hi,

your question doesn't really have a clear answer. Say raw_fft/raw_ifft is
an fft/ifft pair without normalization; then:

A = raw_ifft(raw_fft(a, n=2**11), n=2**11)
  = N*a

where N=2**11, not len(a). However numpy does perform a normalization step
in the ifft part, so that

numpy.fft.ifft = raw_ifft / N

This way we can use the fft just as a Fourier transform and also
fft(\delta) is constant 1.

Hope that explains things a bit.

Cheers
Jochen

On 01/17/10 23:44, Sharaf Al-Sharif wrote:
> Hi,
> I'm a bit confused regarding how the amplitudes returned by np.fft.fft (or
> np.fft.rfft) relate to the amplitudes of the original signal in time domain.
> If:
> A = np.fft.rfft(a,n=2048)
> but,
> n_pts = len(a) < 2048,
>
> will the physical amplitudes in time domain be np.abs(A)*2/2048, or
> np.abs(A)*2/n_pts? Or something else?
> Thank you for your help.
>
> Sharaf

> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From pav+sp at iki.fi  Tue Jan 19 04:41:23 2010
From: pav+sp at iki.fi (Pauli Virtanen)
Date: Tue, 19 Jan 2010 09:41:23 +0000 (UTC)
Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1
References: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com> <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com>
Message-ID:

Mon, 18 Jan 2010 10:59:46 -0500, josef.pktd wrote:
> On Sun, Jan 17, 2010 at 5:25 AM, Yves Frederix
> wrote:
[clip]
>>> It was rather unexpected that the type of input and output data are
>>> different. After checking interpolate/fitpack.py it seems that this
>>> behavior results from the fact that the length-1 case is explicitly
>>> treated differently (probably to be able to deal with the case of
>>> scalar input, for which scalar output is expected):
>>>
>>> 434 def splev(x,tck,der=0):
>>> ...
>>> 487         if ier: raise TypeError,"An error occurred"
>>> 488         if len(y)>1: return y
>>> 489         return y[0]
>>> 490
>>>
>>> Wouldn't it be less confusing to have the return value always have the
>>> same type as the input data?
>>
>> I don't know of any "official" policy.

I think (unstructured) interpolation should respect

    input.shape == output.shape

also for 0-d. So yes, it's a wart, IMHO.

Another question is: how many people actually have code that depends on
this wart, and can it be fixed? I'd guess there's not much problem: (1,)
arrays function nicely as scalars, but not vice versa because of
mutability.
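For concreteness, a minimal script exercising the behavior in question (a
sketch only; with the length-1 special case quoted above, the n == 1 call
is the one that loses its shape):

import numpy as np
from scipy import interpolate

x = np.arange(5.)
y = np.cos(x)
tck = interpolate.splrep(x, y)

for n in (3, 2, 1):
    x_eval = x[:n]
    y_eval = interpolate.splev(x_eval, tck)
    # With the special case described above, n == 1 yields a 0-d scalar
    # instead of a shape-(1,) array, so input and output shapes disagree.
    print n, x_eval.shape, np.shape(y_eval)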
-- Pauli Virtanen From yves.frederix at gmail.com Tue Jan 19 04:45:11 2010 From: yves.frederix at gmail.com (Yves Frederix) Date: Tue, 19 Jan 2010 10:45:11 +0100 Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1 In-Reply-To: <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com> References: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com> <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com> Message-ID: <62e6eafb1001190145s6847b08ald13ec237cdbca9c@mail.gmail.com> Hi, In fact, I totally agree with you. Full matching of output to the type of the input does not make sense. But one could expect that array_like input results in ndarray output and scalar input in scalar output. As far as I can see, scipy.stats behaves exactly in this way. Anyway, I checked some other files and, e.g., in scipy/interpolate/polyint.py the input is explicitly tested to be scalar. In attachment you can find a patch for scipy/interpolate/fitpack.py so that it behaves 'correctly'. Regards, YVES > scipy.stats has switched for the most part to the same behavior. I > think, mainly it is just a convention to have a nicer output when the > return value is a scalar. > > One problem with making the output depend on the input type or shape > is that in most functions I know, this information is not kept inside > the function. Usually the input of array_like (arrays, lists, tuples, > scalar numbers) is converted to an ndarray with np.asarray or > np.array. > The output then is independent of the input type (which hurts also if > a user wants to work with matrices or other subclasses of ndarrays). > > On the other hand, if I want to use a list as input for convenience, I > don't really want a list as output, I want an ndarray. > > That's my view, I don't really care in which direction the convention > goes, but I like the consistency. > > Josef > >> Cheers, >> YVES >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: splev_patch.diff Type: application/octet-stream Size: 1112 bytes Desc: not available URL: From pgmdevlist at gmail.com Wed Jan 20 03:53:34 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 20 Jan 2010 03:53:34 -0500 Subject: [SciPy-User] timeseries tsfromtxt missing_values bug? In-Reply-To: <4B5077A4.63BA.009B.0@twdb.state.tx.us> References: <4B5077A4.63BA.009B.0@twdb.state.tx.us> Message-ID: <1C4012B8-4C61-4C97-9DE0-F4619383BC94@gmail.com> On Jan 15, 2010, at 3:11 PM, Dharhas Pothina wrote: > Hi, > > I'm having issues with tsfromtxt masking fields using the missing_values parameter. 
>
>>>> dateconverter = lambda y, m, d, hh, mm : datetime(year=int(y), month=int(m), day=int(d), hour=int(hh), minute=int(mm))
>>>> rseries = ts.tsfromtxt('test.csv',freq='T',comments='#',dateconverter=dateconverter,datecols=(1,2,3,4,5),usecols=(1,2,3,4,5,8),delimiter=',',missing_values=-999.0)
>
> gives :
>
> timeseries([(-999.0,) (-999.0,) (-999.0,)],
>            dtype = [('f5', '<f8')],
>            dates = [02-May-2000 06:00 12-May-2000 08:00 13-May-2000 00:00],
>            freq = T)
>
> While :
>
>>>> rseries = ts.tsfromtxt('test.csv',freq='T',comments='#',dateconverter=dateconverter,datecols=(1,2,3,4,5),usecols=(1,2,3,4,5,8),delimiter=',',missing_values=-999.0,names='data')
>
> gives :
>
> timeseries([(--,) (--,) (--,)],
>            dtype = [('_tmp4', '<f8')],
>            dates = [02-May-2000 06:00 12-May-2000 08:00 13-May-2000 00:00],
>            freq = T)
>
> So if I use the 'names' argument the missing values are masked correctly
> but the field name is set to '_tmp4' rather than 'data'. If I don't use
> the 'names' argument the missing values are not masked. I've attached a
> small file to demonstrate. Am I doing something wrong or is this a bug.
>

Dharhas,
Sorry for the delay. So yes, you uncovered two bugs:
(1) when no names were given, the missing values were skipped (if they
were not strings);
(2) when using usecols, the names were not properly propagated.
I fixed them on SVN, would you mind giving it a try?

From icy.flame.gm at gmail.com  Wed Jan 20 10:20:10 2010
From: icy.flame.gm at gmail.com (iCy-fLaME)
Date: Wed, 20 Jan 2010 15:20:10 +0000
Subject: [SciPy-User] How to do symmetry detection?
Message-ID:

Hello,

I have some signals in mirror pairs in an 1D/2D array, and I am trying
to identify the symmetry axis.

A simplified example of the signal pair can look like this:
[0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0]

The ideal output in this case will probably be:
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

As long as the symmetry point has the largest value, it will be fine.

There can be multiple pairs of signals in the array, and the length of
separation and duration of the signal can vary from pair to pair. The
overall length of the array is about 1k points. The output array
should reflect the level of likeness between the two sides of the
array.

I tried doing a loop as follows:

############ Begin ############
from numpy import array
from numpy import zeros
from numpy import arange

data = array([0,0,0,0,2,3,4,0,0,0,4,3,2,0])
length = len(data)
result = zeros(length)

left = arange(length)
left[0] = 0             # Index to be used for the end of the left portion

right = arange(length) + 1
right[-1] = length - 1  # Index to be used for the beginning of the right hand portion

for i in range(length):
    l_part = zeros(length)  # Default values to be zero, so non-overlapping region will
    r_part = zeros(length)  #   return zero after the multiplication.

    l_part[:left[i]] = data[:left[i]][::-1]      # Take the left hand side and mirror it
    r_part[:length-right[i]] = data[right[i]:]   # Take the right hand side
    result[i] = sum(l_part*r_part)/length  # Use the product and integral to find the similarity metric.

    print l_part
    print r_part
    print "===============================", result[i]


print result


############ END ############

But it is rather slow for a 1000x1000 2D array, anyone got any
suggestion for a more elegant solution?

Thanks in advance!

From josef.pktd at gmail.com  Wed Jan 20 10:43:57 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 20 Jan 2010 10:43:57 -0500
Subject: [SciPy-User] How to do symmetry detection?
In-Reply-To:
References:
Message-ID: <1cd32cbb1001200743w20a36140h9421a201b2a39330@mail.gmail.com>

On Wed, Jan 20, 2010 at 10:20 AM, iCy-fLaME wrote:
> Hello,
>
> I have some signals in mirror pairs in an 1D/2D array, and I am trying
> to identify the symmetry axis.
>
> [clip]
>
> But it is rather slow for a 1000x1000 2D array, anyone got any
> suggestion for a more elegant solution?

not as general and flexible but fast

>>> a=np.array([0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0])
>>> kw = [-1.,-1,-1,0,1,1,1]

>>> (signal.convolve(a,kw,'valid')==0).astype(int)
array([0, 0, 0, 0, 0, 1, 0, 0])

convolve can handle also nd

One idea might be to use something like this in a first round, and use
the more correct loop solution only if there are several shorter
mirrors found by convolve. Also a guess on the likely length might
improve the choice of window.
Distance measure is additive not multiplicative.

Josef

> Thanks in advance!
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From josef.pktd at gmail.com  Wed Jan 20 10:47:40 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 20 Jan 2010 10:47:40 -0500
Subject: [SciPy-User] How to do symmetry detection?
>> >> A simplified example of the signal pair can look like this: >> [0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0] >> >> The ideal output in this case will probably be: >> [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0] >> >> As long as the symmetry point has the largest value, it will be fine. >> >> There can be multiple pairs of signals in the array, and the length of >> separation and duration of the signal can vary from pair to pair. The >> overall length of the array is about 1k points. The output array >> should reflect the level of likeness between the two sides of the >> array. >> >> I tried doing a loop as follows: >> >> ############ Begin ############ >> from numpy import array >> from numpy import zeros >> from numpy import arange >> >> data = ?array([0,0,0,0,2,3,4,0,0,0,4,3,2,0]) >> length = len(data) >> result = zeros(length) >> >> left = arange(length) >> left[0] = 0 ? ? ? ? ? ? ? ? # Index to be used for the end of the left portion >> >> right = arange(length) + 1 >> right[-1] = length - 1 ? ? ?# Index to be used for the begining of the >> right hand portion >> >> for i in range(length): >> ? ?l_part = zeros(length) ?# Default values to be zero, so >> non-overlapping region will >> ? ?r_part = zeros(length) ?# ? return zero after the multiplication. >> >> ? ?l_part[:left[i]] = data[:left[i]][::-1] ? ? # Take the left hand >> side and mirror it >> ? ?r_part[:length-right[i]] = data[right[i]:] ?# Take the right hand side >> ? ?result[i] = sum(l_part*r_part)/length ? # Use the product and >> integral to find the similarity metric. >> >> ? ?print l_part >> ? ?print r_part >> ? ?print "===============================", result[i] >> >> >> print result >> >> >> ############ END ############ >> >> >> But it is rather slow for a 1000x1000 2D array, anyone got any >> suggestion for a more elegant solution? > > > not as general and flexible but fast > >>>> a=np.array([0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0]) >>>> kw = [-1.,-1,-1,0,1,1,1] > >>>> (signal.convolve(a,kw,'valid')==0).astype(int) > array([0, 0, 0, 0, 0, 1, 0, 0]) > > convolve can handle also nd > > One idea might be to use something like this in a first round, and use > the more correct loop solution only if there are several shorter > mirrors found by convolve. Also a guess on the likely length might > improve the choice of window. > Distance measure is additive not multiplicative. or maybe this is not a great idea. if you have integers, there might be many cancellations and wrong detections. Josef > Josef > > >> >> Thanks in advance! >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From josef.pktd at gmail.com Wed Jan 20 10:59:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 20 Jan 2010 10:59:12 -0500 Subject: [SciPy-User] How to do symmetry detection? In-Reply-To: <1cd32cbb1001200747t19f6y117390d7800b835e@mail.gmail.com> References: <1cd32cbb1001200743w20a36140h9421a201b2a39330@mail.gmail.com> <1cd32cbb1001200747t19f6y117390d7800b835e@mail.gmail.com> Message-ID: <1cd32cbb1001200759r422d3b9cvd8ebbbdaddfe85c0@mail.gmail.com> On Wed, Jan 20, 2010 at 10:47 AM, wrote: > On Wed, Jan 20, 2010 at 10:43 AM, ? wrote: >> On Wed, Jan 20, 2010 at 10:20 AM, iCy-fLaME wrote: >>> Hello, >>> >>> I have some signals in mirror pairs in an 1D/2D array, and I am trying >>> to identify the symmetry axis. 
>>> >>> A simplified example of the signal pair can look like this: >>> [0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0] >>> >>> The ideal output in this case will probably be: >>> [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0] >>> >>> As long as the symmetry point has the largest value, it will be fine. >>> >>> There can be multiple pairs of signals in the array, and the length of >>> separation and duration of the signal can vary from pair to pair. The >>> overall length of the array is about 1k points. The output array >>> should reflect the level of likeness between the two sides of the >>> array. >>> >>> I tried doing a loop as follows: >>> >>> ############ Begin ############ >>> from numpy import array >>> from numpy import zeros >>> from numpy import arange >>> >>> data = ?array([0,0,0,0,2,3,4,0,0,0,4,3,2,0]) >>> length = len(data) >>> result = zeros(length) >>> >>> left = arange(length) >>> left[0] = 0 ? ? ? ? ? ? ? ? # Index to be used for the end of the left portion >>> >>> right = arange(length) + 1 >>> right[-1] = length - 1 ? ? ?# Index to be used for the begining of the >>> right hand portion >>> >>> for i in range(length): >>> ? ?l_part = zeros(length) ?# Default values to be zero, so >>> non-overlapping region will >>> ? ?r_part = zeros(length) ?# ? return zero after the multiplication. >>> >>> ? ?l_part[:left[i]] = data[:left[i]][::-1] ? ? # Take the left hand >>> side and mirror it >>> ? ?r_part[:length-right[i]] = data[right[i]:] ?# Take the right hand side >>> ? ?result[i] = sum(l_part*r_part)/length ? # Use the product and >>> integral to find the similarity metric. >>> >>> ? ?print l_part >>> ? ?print r_part >>> ? ?print "===============================", result[i] >>> >>> >>> print result >>> >>> >>> ############ END ############ >>> >>> >>> But it is rather slow for a 1000x1000 2D array, anyone got any >>> suggestion for a more elegant solution? >> >> >> not as general and flexible but fast >> >>>>> a=np.array([0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0]) >>>>> kw = [-1.,-1,-1,0,1,1,1] >> >>>>> (signal.convolve(a,kw,'valid')==0).astype(int) >> array([0, 0, 0, 0, 0, 1, 0, 0]) >> >> convolve can handle also nd >> >> One idea might be to use something like this in a first round, and use >> the more correct loop solution only if there are several shorter >> mirrors found by convolve. Also a guess on the likely length might >> improve the choice of window. >> Distance measure is additive not multiplicative. > > or maybe this is not a great idea. if you have integers, there might > be many cancellations and wrong detections. or maybe not with a window liike >>> ws=3;kw = (np.pi/3.15)**np.abs(np.arange(-ws,ws+1))*np.sign(np.arange(-ws,ws+1)) >>> kw array([-0.99201436, -0.99466913, -0.997331 , 0. , 0.997331 , 0.99466913, 0.99201436]) Josef From charlesr.harris at gmail.com Wed Jan 20 12:15:47 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Jan 2010 10:15:47 -0700 Subject: [SciPy-User] How to do symmetry detection? In-Reply-To: References: Message-ID: On Wed, Jan 20, 2010 at 8:20 AM, iCy-fLaME wrote: > Hello, > > I have some signals in mirror pairs in an 1D/2D array, and I am trying > to identify the symmetry axis. 
>
> A simplified example of the signal pair can look like this:
> [0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0]
>

In [8]: a=np.array([0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0])

In [9]: center = np.convolve(a,a).argmax()*.5

In [10]: center
Out[10]: 8.0

In [11]: a[center - 4: center + 5]
Out[11]: array([2, 3, 4, 0, 0, 0, 4, 3, 2])

Essentially this computes the component of the original along the
reversed version for different shifts, looking for the best match. The
center can be between two indices, which is why it is computed as a
float.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From icy.flame.gm at gmail.com  Wed Jan 20 14:26:04 2010
From: icy.flame.gm at gmail.com (iCy-fLaME)
Date: Wed, 20 Jan 2010 19:26:04 +0000
Subject: [SciPy-User] How to do symmetry detection?
In-Reply-To:
References:
Message-ID:

Thanks for the replies!

Perhaps I should clarify that the input data can be int or float, and
most of them will have a very large DC offset (i.e. sum(data) >> 0),
and no, the signal duration can be anything, I cannot "guess".

The problem with convolution (scipy.signal.convolve) with self is that
it will only produce one "valid" point in the middle, because anywhere
else there is a mis-match of array shape.

I believe scipy.signal.convolve does not take into account the number
of points being integrated, and in the case of a large DC offset, any
matches far from the middle of the data will be drowned by other areas
which have more points to integrate over.

Self convolution also has a problem of signal features matching
itself.
Imagine the input of the following:

data:       ______W____M_____
data[::-1]: _____M____W______

As you do the convolution, feature W will match itself first, then the
W-M pair matching, then the M-M matching. Whereas a valid algorithm
should only produce results for the W-M pair matching.

I hope I am making the problem more clear now, but it's not the easiest
concept to describe for me.

Thanks!

From charlesr.harris at gmail.com  Wed Jan 20 14:39:06 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 20 Jan 2010 12:39:06 -0700
Subject: [SciPy-User] How to do symmetry detection?
In-Reply-To:
References:
Message-ID:

On Wed, Jan 20, 2010 at 12:26 PM, iCy-fLaME wrote:

> Thanks for the replies!
>
> Perhaps I should clarify that the input data can be int or float, and
> most of them will have a very large DC offset (i.e. sum(data) >> 0),
> and no, the signal duration can be anything, I cannot "guess".
>
You should remove the offset, it is translation invariant anyway and gives
no symmetry information.

> The problem with convolution (scipy.signal.convolve) with self is that
> it will only produce one "valid" point in the middle, because anywhere
> else there is a mis-match of array shape.
>
This can be a problem if the symmetry is near an end, but won't matter much
if the relevant part is short or near the middle. The end effect will be a
problem no matter what method you use. Think of convolution as a matched
filter.

> I believe scipy.signal.convolve does not take into account the number
> of points being integrated, and in the case of a large DC offset, any
> matches far from the middle of the data will be drowned by other areas
> which have more points to integrate over.
>
> Self convolution also has a problem of signal features matching
> itself. Imagine the input of the following:
>
> data:       ______W____M_____
> data[::-1]: _____M____W______
>
> As you do the convolution, feature W will match itself first, then the
> W-M pair matching, then the M-M matching. Whereas a valid algorithm
> should only produce results for the W-M pair matching.
>
Well, there is no symmetry in that example. If you don't know if there is
symmetry then you have to account for that possibility in setting up the
statistics. I'm thinking Bayesian here.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Wed Jan 20 15:00:46 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 20 Jan 2010 15:00:46 -0500
Subject: [SciPy-User] How to do symmetry detection?
In-Reply-To:
References:
Message-ID: <1cd32cbb1001201200qb76fa6i841af63ff959fff7@mail.gmail.com>

On Wed, Jan 20, 2010 at 2:39 PM, Charles R Harris wrote:
> [clip]
>
> Well, there is no symmetry in that example. If you don't know if there is
> symmetry then you have to account for that possibility in setting up the
> statistics. I'm thinking Bayesian here.
>
> Chuck

And I think that convolve, especially fftconvolve for longer series has
such a large speed advantage that running your loop to confirm the
results (or several candidates) will still be much faster than the
python loop over the entire array.

Also, if the series is normalized to mean zero then the out-of-bounds
effect of the full self convolution will not matter so much.
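As a rough sketch of how those pieces fit together on the toy signal from
the start of the thread (illustrative only; the demeaning step is the one
Chuck recommended above):

import numpy as np
from scipy import signal

a = np.array([0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 4, 3, 2, 0], dtype=float)

# Remove the DC offset first; it is translation invariant and would
# otherwise dominate the match scores.
a0 = a - a.mean()

# Convolving a signal with itself correlates it against its mirror
# image, so the peak of the full convolution marks the mirror axis.
score = signal.fftconvolve(a0, a0, mode='full')
center = score.argmax() / 2.0

print center  # 8.0, the symmetry axis of the example above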
Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From peridot.faceted at gmail.com Wed Jan 20 15:00:47 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 20 Jan 2010 15:00:47 -0500 Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1 In-Reply-To: References: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com> <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com> Message-ID: 2010/1/19 Pauli Virtanen : > Mon, 18 Jan 2010 10:59:46 -0500, josef.pktd wrote: >> On Sun, Jan 17, 2010 at 5:25 AM, Yves Frederix >> wrote: > [clip] >>> It was rather unexpected that the type of input and output data are >>> different. After checking interpolate/fitpack.py it seems that this >>> behavior results from the fact that the length-1 case is explicitly >>> treated differently (probably to be able to deal with the case of >>> scalar input, for which scalar output is expected): >>> >>> 434 def splev(x,tck,der=0): >>> >>> 487 if ier: raise TypeError,"An error occurred" 488 >>> if len(y)>1: return y 489 return y[0] >>> 490 >>> >>> Wouldn't it be less confusing to have the return value always have the >>> same type as the input data? >> >> I don't know of any "official" policy. > > I think (unstructured) interpolation should respect > > input.shape == output.shape > > also for 0-d. So yes, it's a wart, IMHO. > > Another question is: how many people actually have code that depends on > this wart, and can it be fixed? I'd guess there's not much problem: (1,) > arrays function nicely as scalars, but not vice versa because of > mutability. More generally, I think many functions should preserve the shape of the input array. Unfortunately it's often a hassle to do this: a few functions I have written start by checking whether the input is a scalar, setting a boolean and converting it to an array of size one; then at the end, I check the boolean and strip the array wrapping if the input is a scalar. It's annoying boilerplate, and I suspect that many functions don't handle this just because it's a nuisance. Some handy utility code might help. It would also be good to have a generic test one could apply to many functions to check that they preserve array shapes (0-d, 1-d of size 1, many-dimensional, many-dimensional with a zero dimension), and scalarness. Together with a test for preservation of arbitrary array subclasses (and correct functioning when handed matrices), one might be able to shake out a lot of minor easy-to-fix nuisances. Anne From charlesr.harris at gmail.com Wed Jan 20 15:39:05 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Jan 2010 13:39:05 -0700 Subject: [SciPy-User] How to do symmetry detection? In-Reply-To: References: Message-ID: On Wed, Jan 20, 2010 at 12:39 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Wed, Jan 20, 2010 at 12:26 PM, iCy-fLaME wrote: > >> Thanks for the replies! >> >> Perhaps I should clarify that the input data can be int or float, and >> most of them will have a very large DC offset (i.e. sum(data) >> 0), >> and no, the signal duration can be anything, I can not "guess" >> >> > You should remove the offset, it is translation invariant anyway and gives > no symmetry information. 
>
>> [clip]
>
> Well, there is no symmetry in that example. If you don't know if there is
> symmetry then you have to account for that possibility in setting up the
> statistics. I'm thinking Bayesian here.
>
In particular, there should be some sort of threshold for detecting
symmetry, some fraction of the signal variance, for instance. That
assumes the data has been demeaned. The symmetry detection problem can
be pretty difficult: noise can be a problem, the end effects can be a
problem, etc., etc. Any a priori information about the nature of the
signal can be useful.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Wed Jan 20 16:18:39 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 20 Jan 2010 16:18:39 -0500
Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1
In-Reply-To:
References: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com> <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com>
Message-ID: <1cd32cbb1001201318l4b6a8822j34cec23bce33a263@mail.gmail.com>

On Wed, Jan 20, 2010 at 3:00 PM, Anne Archibald wrote:
> 2010/1/19 Pauli Virtanen:
> [clip]
>
> More generally, I think many functions should preserve the shape of
> the input array. [clip]
>
> Anne
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

I just checked again, the conversion in the distributions is weaker:

    if output.ndim == 0:
        return output[()]

as a result:

>>> stats.norm.pdf(np.array([1]))
array([ 0.24197072])
>>> stats.norm.pdf(np.array(1))
0.24197072451914337

I just followed the pattern of Travis in this.

Handling and preserving array subclasses is a lot of work and increases
the size of simple functions considerably and triples (? not checked)
the number of required tests (I just tried with stats.gmean, hmean and
zscore).

I don't see a way to write generic tests that would work across
different signatures and argument types.

Josef

From tonyyu at MIT.EDU  Wed Jan 20 17:56:33 2010
From: tonyyu at MIT.EDU (Tony S Yu)
Date: Wed, 20 Jan 2010 17:56:33 -0500
Subject: [SciPy-User] Splines in scipy.signal vs scipy.interpolation
Message-ID: <9AF13441-AFE5-4568-9438-4E98D6E99EDF@mit.edu>

I'm having trouble making splines from scipy.signal work with those in
scipy.interpolate. Both packages have functions for creating
(`signal.cspline1d`/`interpolate.splrep`) and evaluating
(`signal.cspline1d_eval`/`interpolate.splev`) splines.

There are, of course, huge differences between these functions, which is
why I'm trying to get them to talk to each other. In particular, I'd like
to create a smoothing spline using `cspline1d` (which allows easier
smoothing) and evaluate using `splev` (which allows me to get derivatives
of the spline).

I believe the main difference between the two spline representations
(assuming cubic splines with no smoothing) is their boundary conditions
(right?). Is there any way to condition the inputs such that I can feed
in a spline from `cspline1d` and get a sensible result from `splev`?
(Below is an example of what I mean by "conditioning the inputs").
Alternatively, is there another way to get functionality similar to
matlab's `spaps` function?
Thanks,
-Tony

#---- Failed attempt to get cspline1d/splev roundtrip

import numpy as np
from scipy import signal, interpolate

x = np.linspace(1, 10, 20)
y = np.cos(x)

tck_interp = interpolate.splrep(x, y)
c_signal = signal.cspline1d(y, 0) # set lambda to zero to eliminate smoothing

# knots and coefficients from splrep have more values at boundaries
t_match = np.hstack(([x[0]]*4, x[2:-2], [x[-1]]*4))
c_match = np.hstack((c_signal, [0]*4))
tck_signal = [t_match, c_match, 3]

y_signal = signal.cspline1d_eval(c_signal, x, dx=x[1]-x[0], x0=x[0])
y_signal_interp = interpolate.splev(x, tck_signal, der=0)
y_interp = interpolate.splev(x, tck_interp, der=0)

print 'spline orders match? ', np.allclose(tck_signal[2], tck_interp[2]) #True
print 'knots match? ', np.allclose(tck_signal[0], tck_interp[0]) #True
print 'spline coefficients match? ', np.allclose(tck_signal[1], tck_interp[1]) #False
print 'y (signal roundtrip) matches? ', np.allclose(y, y_signal) #True
print 'y (interp roundtrip) matches? ', np.allclose(y, y_interp) #True
print 'y (signal in/interp out) matches? ', np.allclose(y, y_signal_interp) #False

From sfalsharif at gmail.com  Thu Jan 21 14:09:25 2010
From: sfalsharif at gmail.com (Sharaf Al-Sharif)
Date: Thu, 21 Jan 2010 19:09:25 +0000
Subject: [SciPy-User] Amplitude scaling in fft
In-Reply-To: <20100119011934.GA2266@cudos0803>
References: <57c570d21001171544p5f9b3835g513b2575adc6af33@mail.gmail.com> <20100119011934.GA2266@cudos0803>
Message-ID: <57c570d21001211109t2570e1b7wcbec53f1db1069c6@mail.gmail.com>

Thank you for your answer.

Sharaf

2010/1/19 Jochen Schroeder
> Hi,
>
> your question doesn't really have a clear answer. Say raw_fft/raw_ifft
> is an fft/ifft pair without normalization; then:
>
> A = raw_ifft(raw_fft(a, n=2**11), n=2**11)
>   = N*a
>
> where N=2**11, not len(a). However numpy does perform a normalization
> step in the ifft part, so that
>
> numpy.fft.ifft = raw_ifft / N
>
> This way we can use the fft just as a Fourier transform and also
> fft(\delta) is constant 1.
>
> Hope that explains things a bit.
>
> Cheers
> Jochen
>
> On 01/17/10 23:44, Sharaf Al-Sharif wrote:
> > Hi,
> > I'm a bit confused regarding how the amplitudes returned by np.fft.fft (or
> > np.fft.rfft) relate to the amplitudes of the original signal in time domain.
>> nanmedian(np.array(1)) --------------------------------------------------------------------------- ValueError: axis must be less than arr.ndim; axis=0, rank=0. >> nanmedian(np.array([1, 2, 3])) array(2.0) Changing the function from the original: def nanmedian(x, axis=0): x, axis = _chk_asarray(x,axis) x = x.copy() return np.apply_along_axis(_nanmedian,axis,x) to this (I know, it is not pretty): def nanmedian(x, axis=0): if np.isscalar(x): return float(x) x, axis = _chk_asarray(x, axis) if x.ndim == 0: return float(x.tolist()) x = x.copy() x = np.apply_along_axis(_nanmedian, axis, x) if x.ndim == 0: x = float(x.tolist()) return x gives the expected results: >> nanmedian(1) 1.0 >> nanmedian(True) 1.0 >> nanmedian(np.array(1)) 1.0 >> nanmedian(np.array([1, 2, 3])) 2.0 which agree with numpy: >> np.median(1) 1.0 >> np.median(True) 1.0 >> np.median(np.array(1)) 1.0 >> np.median(np.array([1, 2, 3])) 2.0 I'm keeping a local copy of the changes I made for my own package. But it would be nice (for me) if this was fixed upstream. Are the changes above good enough for scipy? (Another difference from np.median that I noticed is that the default axis for np.median is None and for scipy.stats.nanmean it is 0. But perhaps it is too late to change that.) From josef.pktd at gmail.com Thu Jan 21 21:15:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 21 Jan 2010 21:15:59 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: Message-ID: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> On Thu, Jan 21, 2010 at 8:54 PM, Keith Goodman wrote: > I noticed a couple of issues with nanmedian in scipy.stats: > >>> from scipy.stats import nanmedian >>> nanmedian(1) > --------------------------------------------------------------------------- > ValueError: axis must be less than arr.ndim; axis=0, rank=0. >>> nanmedian(True) > --------------------------------------------------------------------------- > ValueError: axis must be less than arr.ndim; axis=0, rank=0. >>> nanmedian(np.array(1)) > --------------------------------------------------------------------------- > ValueError: axis must be less than arr.ndim; axis=0, rank=0. >>> nanmedian(np.array([1, 2, 3])) > ? array(2.0) > > Changing the function from the original: > > def nanmedian(x, axis=0): > ? ?x, axis = _chk_asarray(x,axis) > ? ?x = x.copy() > ? ?return np.apply_along_axis(_nanmedian,axis,x) > > to this (I know, it is not pretty): > > def nanmedian(x, axis=0): > ? ?if np.isscalar(x): > ? ? ? ?return float(x) > ? ?x, axis = _chk_asarray(x, axis) > ? ?if x.ndim == 0: > ? ? ? ?return float(x.tolist()) > ? ?x = x.copy() > ? ?x = np.apply_along_axis(_nanmedian, axis, x) > ? ?if x.ndim == 0: > ? ? ? ?x = float(x.tolist()) > ? ?return x Can you open a ticket, so that I don't forget to look at it? I will need to play with it. There are some things that I don't understand right away. What's the difference between isscalar and ndim=0 ? Why do you have the tolist() Thanks, Josef > gives the expected results: > >>> nanmedian(1) > ? 1.0 >>> nanmedian(True) > ? 1.0 >>> nanmedian(np.array(1)) > ? 1.0 >>> nanmedian(np.array([1, 2, 3])) > ? 2.0 > > which agree with numpy: > >>> np.median(1) > ? 1.0 >>> np.median(True) > ? 1.0 >>> np.median(np.array(1)) > ? 1.0 >>> np.median(np.array([1, 2, 3])) > ? 2.0 > > I'm keeping a local copy of the changes I made for my own package. But > it would be nice (for me) if this was fixed upstream. Are the changes > above good enough for scipy? 
> > (Another difference from np.median that I noticed is that the default > axis for np.median is None and for scipy.stats.nanmean it is 0. But > perhaps it is too late to change that.) > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Thu Jan 21 21:28:47 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 21 Jan 2010 18:28:47 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> Message-ID: On Thu, Jan 21, 2010 at 6:15 PM, wrote: > Can you open a ticket, so that I don't forget to look at it? Sure. > I will need to play with it. There are some things that I don't > understand right away. > What's the difference between isscalar and ndim=0 ? A scalar doesn't have a ndim method. But now I see that there is a ndim function. I'll use that instead. > Why do you have the tolist() That's the only was I was able to figure out how to pull 1.0 out of np.array(1.0). Is there a better way? >> np.array(1.0).tolist() 1.0 From pgmdevlist at gmail.com Thu Jan 21 21:41:37 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 21 Jan 2010 21:41:37 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> Message-ID: <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: > On Thu, Jan 21, 2010 at 6:15 PM, wrote: >> Can you open a ticket, so that I don't forget to look at it? > > Sure. > >> I will need to play with it. There are some things that I don't >> understand right away. >> What's the difference between isscalar and ndim=0 ? > > A scalar doesn't have a ndim method. But now I see that there is a > ndim function. I'll use that instead. > >> Why do you have the tolist() > > That's the only was I was able to figure out how to pull 1.0 out of > np.array(1.0). Is there a better way? .item() From josef.pktd at gmail.com Thu Jan 21 21:42:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 21 Jan 2010 21:42:14 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> Message-ID: <1cd32cbb1001211842j2a955c8fg2a768c4894b90ed6@mail.gmail.com> On Thu, Jan 21, 2010 at 9:28 PM, Keith Goodman wrote: > On Thu, Jan 21, 2010 at 6:15 PM, ? wrote: >> Can you open a ticket, so that I don't forget to look at it? > > Sure. > >> I will need to play with it. There are some things that I don't >> understand right away. >> What's the difference between isscalar and ndim=0 ? > > A scalar doesn't have a ndim method. But now I see that there is a > ndim function. I'll use that instead. > >> Why do you have the tolist() > > That's the only was I was able to figure out how to pull 1.0 out of > np.array(1.0). Is there a better way? > >>> np.array(1.0).tolist() > ? 
1.0 >>> np.array(1.0) array(1.0) >>> np.array(1.0)[()] 1.0 Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Thu Jan 21 21:56:53 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 21 Jan 2010 21:56:53 -0500 Subject: [SciPy-User] Return type of scipy.interpolate.splev for input array of length 1 In-Reply-To: <62e6eafb1001190145s6847b08ald13ec237cdbca9c@mail.gmail.com> References: <62e6eafb1001170225h49632e1bw3c47c3d62f0cce2f@mail.gmail.com> <1cd32cbb1001180759k74c06be8q276128724cce61ed@mail.gmail.com> <62e6eafb1001190145s6847b08ald13ec237cdbca9c@mail.gmail.com> Message-ID: <1cd32cbb1001211856l342c5477x283fa72ce2b2e3ba@mail.gmail.com> On Tue, Jan 19, 2010 at 4:45 AM, Yves Frederix wrote: > Hi, > > In fact, I totally agree with you. Full matching of output to the type > of the input does not make sense. But one could expect that array_like > input results in ndarray output and scalar input in scalar output. As > far as I can see, scipy.stats behaves exactly in this way. > > Anyway, I checked some other files and, e.g., in > scipy/interpolate/polyint.py the input is explicitly tested to be > scalar. In attachment you can find a patch for > scipy/interpolate/fitpack.py so that it behaves 'correctly'. I also found a related http://projects.scipy.org/scipy/ticket/600 I don't know what the status of it is. Josef > Regards, > YVES > >> scipy.stats has switched for the most part to the same behavior. I >> think, mainly it is just a convention to have a nicer output when the >> return value is a scalar. >> >> One problem with making the output depend on the input type or shape >> is that in most functions I know, this information is not kept inside >> the function. Usually the input of array_like (arrays, lists, tuples, >> scalar numbers) is converted to an ndarray with np.asarray or >> np.array. >> The output then is independent of the input type (which hurts also if >> a user wants to work with matrices or other subclasses of ndarrays). >> >> On the other hand, if I want to use a list as input for convenience, I >> don't really want a list as output, I want an ndarray. >> >> That's my view, I don't really care in which direction the convention >> goes, but I like the consistency. >> >> Josef >> >>> Cheers, >>> YVES From kwgoodman at gmail.com Thu Jan 21 22:01:17 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 21 Jan 2010 19:01:17 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> Message-ID: On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: > On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >> That's the only was I was able to figure out how to pull 1.0 out of >> np.array(1.0). Is there a better way? > > > .item() Thanks. item() looks better than tolist(). 
I simplified the function:

def nanmedian(x, axis=0):
    x, axis = _chk_asarray(x,axis)
    if x.ndim == 0:
        return float(x.item())
    x = x.copy()
    x = np.apply_along_axis(_nanmedian,axis,x)
    if x.ndim == 0:
        x = float(x.item())
    return x

and opened a ticket:

http://projects.scipy.org/scipy/ticket/1098

From josef.pktd at gmail.com Thu Jan 21 23:18:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 21 Jan 2010 23:18:31 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> Message-ID: <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: > On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>> That's the only was I was able to figure out how to pull 1.0 out of >>> np.array(1.0). Is there a better way? >> >> >> .item() > > Thanks. item() looks better than tolist(). > > I simplified the function:
>
> def nanmedian(x, axis=0):
>     x, axis = _chk_asarray(x,axis)
>     if x.ndim == 0:
>         return float(x.item())
>     x = x.copy()
>     x = np.apply_along_axis(_nanmedian,axis,x)
>     if x.ndim == 0:
>         x = float(x.item())
>     return x
>
> and opened a ticket:
>
> http://projects.scipy.org/scipy/ticket/1098

How about getting rid of apply_along_axis? see attachment I don't know whether or how much faster it is, but there is a ticket that the current version is slow. No hidden bug or corner case guarantee yet. Josef

-------------- next part --------------
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 20 10:18:32 2010
Author: josef-pktd
"""
import numpy as np
from scipy import stats

def nanmedian(x, axis = 0):
    x, axis = stats.stats._chk_asarray(x, axis)
    if x.ndim == 0:
        return 1.0*x[()]
    x = np.sort(x, axis=axis)
    nall = x.shape[axis]
    notnancount = nall - np.isnan(x).sum(axis=axis)
    (idx, rmd) = divmod(notnancount, 2)
    indx = [np.arange(x.shape[ax]) for ax in range(x.ndim)]
    indxlo = indx[:]
    indxlo[axis] = idx
    indxhi = indx[:]
    indxhi[axis] = idx - (1-rmd)
    nanmed = (x[indxlo] + x[indxhi])/2.
    if nanmed.ndim == 0:
        return nanmed[()]
    return nanmed

for axis in [0,1]:
    for i in range(5):
        # for complex
        #x = 1j+np.arange(20).reshape(4,5)
        x = np.arange(20).reshape(4,5).astype(float)
        x[zip(np.random.randint(4, size=(2,5)))] = np.nan
        print nanmedian(x, axis=0)
        print stats.nanmedian(x, axis=0)

print nanmedian(1)
print nanmedian(np.array(1))
print nanmedian(np.array([1]))

From Dharhas.Pothina at twdb.state.tx.us Fri Jan 22 10:48:39 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Fri, 22 Jan 2010 09:48:39 -0600 Subject: [SciPy-User] timeseries tsfromtxt missing_values bug? In-Reply-To: <1C4012B8-4C61-4C97-9DE0-F4619383BC94@gmail.com> References: <4B5077A4.63BA.009B.0@twdb.state.tx.us> <1C4012B8-4C61-4C97-9DE0-F4619383BC94@gmail.com> Message-ID: <4B597476.63BA.009B.0@twdb.state.tx.us> Is there any way to install the svn version on windows? This script is being primarily used on a windows box. If not I'll test on linux. - dharhas >>> Pierre GM 1/20/2010 2:53 AM >>> On Jan 15, 2010, at 3:11 PM, Dharhas Pothina wrote: > Hi, > > I'm having issues with tsfromtxt masking fields using the missing_values parameter.
> >>>> dateconverter = lambda y, m, d, hh, mm : datetime(year=int(y), month=int(m), day=int(d), hour=int(hh), minute=int(mm)) >>>> rseries = ts.tsfromtxt('test.csv',freq='T',comments='#',dateconverter=dateconverter,datecols=(1,2,3,4,5),usecols=(1,2,3,4,5,8),delimiter=',',missing_values=-999.0) > > gives : > > timeseries([(-999.0,) (-999.0,) (-999.0,)], > dtype = [('f5', ' dates = [02-May-2000 06:00 12-May-2000 08:00 13-May-2000 00:00], > freq = T) > > While : > >>>> rseries = ts.tsfromtxt('test.csv',freq='T',comments='#',dateconverter=dateconverter,datecols=(1,2,3,4,5),usecols=(1,2,3,4,5,8),delimiter=',',missing_values=-999.0,names='data') > > gives : > > timeseries([(--,) (--,) (--,)], > dtype = [('_tmp4', ' dates = [02-May-2000 06:00 12-May-2000 08:00 13-May-2000 00:00], > freq = T) > > So if I uses the 'names' argument the missing values are masked correctly but the field name is set to '_tmp4' rather than 'data'. If I don't use the 'names' argument the missing values are not masked. I've attached a small file to demonstrate. Am I doing something wrong or is this a bug. > Dharhas, Sorry for the delay. So yes, you uncovered two bugs: (1) when no names were given, the missing values were skipped (if they were not strings); (2) when using usecols, the names were properly propagated. I fixed them on SVN, would you mind giving a try ? _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From bsouthey at gmail.com Fri Jan 22 10:58:07 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 22 Jan 2010 09:58:07 -0600 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> Message-ID: <4B59CB0F.8060502@gmail.com> On 01/21/2010 10:18 PM, josef.pktd at gmail.com wrote: > On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: > >> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >> >>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>> >>>> That's the only was I was able to figure out how to pull 1.0 out of >>>> np.array(1.0). Is there a better way? >>>> >>> >>> .item() >>> >> Thanks. item() looks better than tolist(). >> >> I simplified the function: >> >> def nanmedian(x, axis=0): >> x, axis = _chk_asarray(x,axis) >> if x.ndim == 0: >> return float(x.item()) >> x = x.copy() >> x = np.apply_along_axis(_nanmedian,axis,x) >> if x.ndim == 0: >> x = float(x.item()) >> return x >> >> and opened a ticket: >> >> http://projects.scipy.org/scipy/ticket/1098 >> > > How about getting rid of apply_along_axis? see attachment > > I don't know whether or how much faster it is, but there is a ticket > that the current version is slow. > No hidden bug or corner case guarantee yet. > > > Josef > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Personally, I think using masked arrays is a far better solution than the various nan methods in stats.py. Is the call to _chk_asarray that much different than the call to ma? Both require conversion or checking if the input is a np array. As stated in the documentation, _nanmedian only works on 1d arrays. So any 'axis' argument without changing the main function is perhaps a hack at best. 
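(A minimal sketch of the masked-array route mentioned above, added for illustration; it is not code from the thread.)

# Mask the NaNs and let np.ma.median do the per-axis bookkeeping;
# all-masked slices are refilled with NaN at the end.
import numpy as np

def nanmedian_ma(x, axis=0):
    x = np.asarray(x, dtype=float)
    med = np.ma.median(np.ma.masked_array(x, np.isnan(x)), axis=axis)
    return np.ma.filled(med, np.nan)

x = np.array([[1.0, np.nan], [3.0, 4.0]])
print nanmedian_ma(x, axis=0)   # [ 2.  4.]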
Is it possible to adapt Sturla's version? http://projects.scipy.org/numpy/ticket/1213 I do not know the algorithm to suggest anything but perhaps the select method could be adapted to handle nan. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jan 22 11:09:17 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 08:09:17 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> Message-ID: On Thu, Jan 21, 2010 at 8:18 PM, wrote: > On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: >> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>>> That's the only was I was able to figure out how to pull 1.0 out of >>>> np.array(1.0). Is there a better way? >>> >>> >>> .item() >> >> Thanks. item() looks better than tolist(). >> >> I simplified the function: >> >> def nanmedian(x, axis=0): >> ? ?x, axis = _chk_asarray(x,axis) >> ? ?if x.ndim == 0: >> ? ? ? ?return float(x.item()) >> ? ?x = x.copy() >> ? ?x = np.apply_along_axis(_nanmedian,axis,x) >> ? ?if x.ndim == 0: >> ? ? ? ?x = float(x.item()) >> ? ?return x >> >> and opened a ticket: >> >> http://projects.scipy.org/scipy/ticket/1098 > > > How about getting rid of apply_along_axis? ? ?see attachment > > I don't know whether or how much faster it is, but there is a ticket > that the current version is slow. > No hidden bug or corner case guarantee yet. It is faster. But here is one case it does not handle: >> nanmedian([1, 2]) array([ 1.5]) >> np.median([1, 2]) 1.5 I'm sure it could be fixed. But having to fix it, and the fact that it is a larger change, decreases the likelihood that it will make it into the next version of scipy. One option is to make the small bug fix I suggested (ticket #1098) and add the corresponding unit tests. Then we can take our time to design a better version of nanmedian. From josef.pktd at gmail.com Fri Jan 22 11:14:56 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 11:14:56 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <4B59CB0F.8060502@gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <4B59CB0F.8060502@gmail.com> Message-ID: <1cd32cbb1001220814x565d38cx6ef0bd029136b58f@mail.gmail.com> On Fri, Jan 22, 2010 at 10:58 AM, Bruce Southey wrote: > On 01/21/2010 10:18 PM, josef.pktd at gmail.com wrote: > > On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: > > > On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: > > > On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: > > > That's the only was I was able to figure out how to pull 1.0 out of > np.array(1.0). Is there a better way? > > > .item() > > > Thanks. item() looks better than tolist(). > > I simplified the function: > > def nanmedian(x, axis=0): > ? ?x, axis = _chk_asarray(x,axis) > ? ?if x.ndim == 0: > ? ? ? ?return float(x.item()) > ? ?x = x.copy() > ? ?x = np.apply_along_axis(_nanmedian,axis,x) > ? ?if x.ndim == 0: > ? ? ? ?x = float(x.item()) > ? ?return x > > and opened a ticket: > > http://projects.scipy.org/scipy/ticket/1098 > > > How about getting rid of apply_along_axis? 
see attachment > > I don't know whether or how much faster it is, but there is a ticket > that the current version is slow. > No hidden bug or corner case guarantee yet. > > > Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > Personally, I think using masked arrays is a far better solution than the > various nan methods in stats.py. Is the call to _chk_asarray that much > different than the call to ma? Both require conversion or checking if the > input is a np array. I like arrays with nans better than masked arrays, and I checked, np.ma.median also uses apply_along_axis and I wanted to see if I can get a vectorized version. > > As stated in the documentation, _nanmedian only works on 1d arrays. So any > 'axis' argument without changing the main function is perhaps a hack at > best. _nanmedian is an internal function, stats.nanmedian and my rewritten version are supposed to handle any dimension and any axis. > > Is it possible to adapt Sturla's version? > http://projects.scipy.org/numpy/ticket/1213 > I do not know the algorithm to suggest anything but perhaps the select > method could be adapted to handle nan. I guess Sturla's version is a lot better, but not my kind of fish, it's more for the algorithm and c experts. I will gladly use it once it or something similar is in numpy. Josef > > Bruce > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From kwgoodman at gmail.com Fri Jan 22 11:17:48 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 08:17:48 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <4B59CB0F.8060502@gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <4B59CB0F.8060502@gmail.com> Message-ID: On Fri, Jan 22, 2010 at 7:58 AM, Bruce Southey wrote: > Is it possible to adapt Sturla's version? > http://projects.scipy.org/numpy/ticket/1213 > I do not know the algorithm to suggest anything but perhaps the select > method could be adapted to handle nan. I recently needed to calculate the median in an inner loop. It would have been nice to have a median function that doesn't do a full sort. I wanted to compile Sturla's version, but I didn't even know which of the attachments to download. I've never compiled a cython function. From josef.pktd at gmail.com Fri Jan 22 11:46:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 11:46:10 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> Message-ID: <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman wrote: > On Thu, Jan 21, 2010 at 8:18 PM, ? wrote: >> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: >>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>>>> That's the only was I was able to figure out how to pull 1.0 out of >>>>> np.array(1.0). Is there a better way? >>>> >>>> >>>> .item() >>> >>> Thanks. item() looks better than tolist(). 
>>> >>> I simplified the function: >>> >>> def nanmedian(x, axis=0): >>> ? ?x, axis = _chk_asarray(x,axis) >>> ? ?if x.ndim == 0: >>> ? ? ? ?return float(x.item()) >>> ? ?x = x.copy() >>> ? ?x = np.apply_along_axis(_nanmedian,axis,x) >>> ? ?if x.ndim == 0: >>> ? ? ? ?x = float(x.item()) >>> ? ?return x >>> >>> and opened a ticket: >>> >>> http://projects.scipy.org/scipy/ticket/1098 >> >> >> How about getting rid of apply_along_axis? ? ?see attachment >> >> I don't know whether or how much faster it is, but there is a ticket >> that the current version is slow. >> No hidden bug or corner case guarantee yet. > > It is faster. But here is one case it does not handle: > >>> nanmedian([1, 2]) > ? array([ 1.5]) >>> np.median([1, 2]) > ? 1.5 > > I'm sure it could be fixed. But having to fix it, and the fact that it > is a larger change, decreases the likelihood that it will make it into > the next version of scipy. One option is to make the small bug fix I > suggested (ticket #1098) and add the corresponding unit tests. Then we > can take our time to design a better version of nanmedian. I didn't see the difference to np.median for this case, I think I was taking the shape answer from the other thread on the return of splines and interpolation. If I change the last 3 lines to if nanmed.size == 1: return nanmed.item() return nanmed then I get agreement with numpy for the following test cases print nanmedian(1), np.median(1) print nanmedian(np.array(1)), np.median(1) print nanmedian(np.array([1])), np.median(np.array([1])) print nanmedian(np.array([[1]])), np.median(np.array([[1]])) print nanmedian(np.array([1,2])), np.median(np.array([1,2])) print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0) print nanmedian([1]), np.median([1]) print nanmedian([[1]]), np.median([[1]]) print nanmedian([1,2]), np.median([1,2]) print nanmedian([[1,2]]), np.median([[1,2]],axis=0) print nanmedian([1j,2]), np.median([1j,2]) Am I still missing any cases? The vectorized version should be faster for this case http://projects.scipy.org/scipy/ticket/740 but maybe not for long and narrow arrays. Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Fri Jan 22 11:52:50 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 08:52:50 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> Message-ID: On Fri, Jan 22, 2010 at 8:46 AM, wrote: > On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman wrote: >> On Thu, Jan 21, 2010 at 8:18 PM, ? wrote: >>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: >>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>>>>> That's the only was I was able to figure out how to pull 1.0 out of >>>>>> np.array(1.0). Is there a better way? >>>>> >>>>> >>>>> .item() >>>> >>>> Thanks. item() looks better than tolist(). >>>> >>>> I simplified the function: >>>> >>>> def nanmedian(x, axis=0): >>>> ? ?x, axis = _chk_asarray(x,axis) >>>> ? ?if x.ndim == 0: >>>> ? ? ? ?return float(x.item()) >>>> ? ?x = x.copy() >>>> ? ?x = np.apply_along_axis(_nanmedian,axis,x) >>>> ? 
?if x.ndim == 0: >>>> ? ? ? ?x = float(x.item()) >>>> ? ?return x >>>> >>>> and opened a ticket: >>>> >>>> http://projects.scipy.org/scipy/ticket/1098 >>> >>> >>> How about getting rid of apply_along_axis? ? ?see attachment >>> >>> I don't know whether or how much faster it is, but there is a ticket >>> that the current version is slow. >>> No hidden bug or corner case guarantee yet. >> >> It is faster. But here is one case it does not handle: >> >>>> nanmedian([1, 2]) >> ? array([ 1.5]) >>>> np.median([1, 2]) >> ? 1.5 >> >> I'm sure it could be fixed. But having to fix it, and the fact that it >> is a larger change, decreases the likelihood that it will make it into >> the next version of scipy. One option is to make the small bug fix I >> suggested (ticket #1098) and add the corresponding unit tests. Then we >> can take our time to design a better version of nanmedian. > > I didn't see the difference to np.median for this case, I think I was > taking the shape answer from the other thread on the return of splines > and interpolation. > > If I change the last 3 lines to > ? ?if nanmed.size == 1: > ? ? ? return nanmed.item() > ? ?return nanmed > > then I get agreement with numpy for the following test cases > > print nanmedian(1), np.median(1) > print nanmedian(np.array(1)), np.median(1) > print nanmedian(np.array([1])), np.median(np.array([1])) > print nanmedian(np.array([[1]])), np.median(np.array([[1]])) > print nanmedian(np.array([1,2])), np.median(np.array([1,2])) > print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0) > print nanmedian([1]), np.median([1]) > print nanmedian([[1]]), np.median([[1]]) > print nanmedian([1,2]), np.median([1,2]) > print nanmedian([[1,2]]), np.median([[1,2]],axis=0) > print nanmedian([1j,2]), np.median([1j,2]) > > Am I still missing any cases? > > The vectorized version should be faster for this case > http://projects.scipy.org/scipy/ticket/740 > but maybe not for long and narrow arrays. Here is an odd one: >> nanmedian(True) 1.0 >> nanmedian([True]) 0.5 # <--- strange >> np.median(True) 1.0 >> np.median([True]) 1.0 From kwgoodman at gmail.com Fri Jan 22 11:58:21 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 08:58:21 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> Message-ID: On Fri, Jan 22, 2010 at 8:52 AM, Keith Goodman wrote: > On Fri, Jan 22, 2010 at 8:46 AM, ? wrote: >> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman wrote: >>> On Thu, Jan 21, 2010 at 8:18 PM, ? wrote: >>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: >>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>>>>>> That's the only was I was able to figure out how to pull 1.0 out of >>>>>>> np.array(1.0). Is there a better way? >>>>>> >>>>>> >>>>>> .item() >>>>> >>>>> Thanks. item() looks better than tolist(). >>>>> >>>>> I simplified the function: >>>>> >>>>> def nanmedian(x, axis=0): >>>>> ? ?x, axis = _chk_asarray(x,axis) >>>>> ? ?if x.ndim == 0: >>>>> ? ? ? ?return float(x.item()) >>>>> ? ?x = x.copy() >>>>> ? ?x = np.apply_along_axis(_nanmedian,axis,x) >>>>> ? ?if x.ndim == 0: >>>>> ? ? ? ?x = float(x.item()) >>>>> ? 
?return x >>>>> >>>>> and opened a ticket: >>>>> >>>>> http://projects.scipy.org/scipy/ticket/1098 >>>> >>>> >>>> How about getting rid of apply_along_axis? ? ?see attachment >>>> >>>> I don't know whether or how much faster it is, but there is a ticket >>>> that the current version is slow. >>>> No hidden bug or corner case guarantee yet. >>> >>> It is faster. But here is one case it does not handle: >>> >>>>> nanmedian([1, 2]) >>> ? array([ 1.5]) >>>>> np.median([1, 2]) >>> ? 1.5 >>> >>> I'm sure it could be fixed. But having to fix it, and the fact that it >>> is a larger change, decreases the likelihood that it will make it into >>> the next version of scipy. One option is to make the small bug fix I >>> suggested (ticket #1098) and add the corresponding unit tests. Then we >>> can take our time to design a better version of nanmedian. >> >> I didn't see the difference to np.median for this case, I think I was >> taking the shape answer from the other thread on the return of splines >> and interpolation. >> >> If I change the last 3 lines to >> ? ?if nanmed.size == 1: >> ? ? ? return nanmed.item() >> ? ?return nanmed >> >> then I get agreement with numpy for the following test cases >> >> print nanmedian(1), np.median(1) >> print nanmedian(np.array(1)), np.median(1) >> print nanmedian(np.array([1])), np.median(np.array([1])) >> print nanmedian(np.array([[1]])), np.median(np.array([[1]])) >> print nanmedian(np.array([1,2])), np.median(np.array([1,2])) >> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0) >> print nanmedian([1]), np.median([1]) >> print nanmedian([[1]]), np.median([[1]]) >> print nanmedian([1,2]), np.median([1,2]) >> print nanmedian([[1,2]]), np.median([[1,2]],axis=0) >> print nanmedian([1j,2]), np.median([1j,2]) >> >> Am I still missing any cases? >> >> The vectorized version should be faster for this case >> http://projects.scipy.org/scipy/ticket/740 >> but maybe not for long and narrow arrays. > > Here is an odd one: > >>> nanmedian(True) > ? 1.0 >>> nanmedian([True]) > ? 0.5 ?# <--- strange > >>> np.median(True) > ? 1.0 >>> np.median([True]) > ? 1.0 Another one: >> x = np.random.randn(3,4,5) >> nanmedian(x) ValueError: shape mismatch: objects cannot be broadcast to a single shape If anything we should add a full set of unit tests for nanmedian. One reason why the current unit tests did not catch the problem I ran into is that >> np.array(2.0) == 2.0 True So nanmedian was returning np.array(2.0) and np.median was returning 2.0 which when compared passed the unit test. From josef.pktd at gmail.com Fri Jan 22 12:03:26 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 12:03:26 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> Message-ID: <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> On Fri, Jan 22, 2010 at 11:52 AM, Keith Goodman wrote: > On Fri, Jan 22, 2010 at 8:46 AM, ? wrote: >> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman wrote: >>> On Thu, Jan 21, 2010 at 8:18 PM, ? 
wrote: >>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: >>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>>>>>> That's the only was I was able to figure out how to pull 1.0 out of >>>>>>> np.array(1.0). Is there a better way? >>>>>> >>>>>> >>>>>> .item() >>>>> >>>>> Thanks. item() looks better than tolist(). >>>>> >>>>> I simplified the function: >>>>> >>>>> def nanmedian(x, axis=0): >>>>> ? ?x, axis = _chk_asarray(x,axis) >>>>> ? ?if x.ndim == 0: >>>>> ? ? ? ?return float(x.item()) >>>>> ? ?x = x.copy() >>>>> ? ?x = np.apply_along_axis(_nanmedian,axis,x) >>>>> ? ?if x.ndim == 0: >>>>> ? ? ? ?x = float(x.item()) >>>>> ? ?return x >>>>> >>>>> and opened a ticket: >>>>> >>>>> http://projects.scipy.org/scipy/ticket/1098 >>>> >>>> >>>> How about getting rid of apply_along_axis? ? ?see attachment >>>> >>>> I don't know whether or how much faster it is, but there is a ticket >>>> that the current version is slow. >>>> No hidden bug or corner case guarantee yet. >>> >>> It is faster. But here is one case it does not handle: >>> >>>>> nanmedian([1, 2]) >>> ? array([ 1.5]) >>>>> np.median([1, 2]) >>> ? 1.5 >>> >>> I'm sure it could be fixed. But having to fix it, and the fact that it >>> is a larger change, decreases the likelihood that it will make it into >>> the next version of scipy. One option is to make the small bug fix I >>> suggested (ticket #1098) and add the corresponding unit tests. Then we >>> can take our time to design a better version of nanmedian. >> >> I didn't see the difference to np.median for this case, I think I was >> taking the shape answer from the other thread on the return of splines >> and interpolation. >> >> If I change the last 3 lines to >> ? ?if nanmed.size == 1: >> ? ? ? return nanmed.item() >> ? ?return nanmed >> >> then I get agreement with numpy for the following test cases >> >> print nanmedian(1), np.median(1) >> print nanmedian(np.array(1)), np.median(1) >> print nanmedian(np.array([1])), np.median(np.array([1])) >> print nanmedian(np.array([[1]])), np.median(np.array([[1]])) >> print nanmedian(np.array([1,2])), np.median(np.array([1,2])) >> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0) >> print nanmedian([1]), np.median([1]) >> print nanmedian([[1]]), np.median([[1]]) >> print nanmedian([1,2]), np.median([1,2]) >> print nanmedian([[1,2]]), np.median([[1,2]],axis=0) >> print nanmedian([1j,2]), np.median([1j,2]) >> >> Am I still missing any cases? >> >> The vectorized version should be faster for this case >> http://projects.scipy.org/scipy/ticket/740 >> but maybe not for long and narrow arrays. > > Here is an odd one: > >>> nanmedian(True) > ? 1.0 >>> nanmedian([True]) > ? 0.5 ?# <--- strange > >>> np.median(True) > ? 1.0 >>> np.median([True]) > ? 1.0 definitely weird >>> (np.array(True)+np.array(True))/2. 
0.5 >>> np.array([True, True]).sum() 2 >>> np.array([True, True]).mean() 1.0 I assumed mean (is used by np.ma.median) is the same as adding and dividing by 2 Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From thomas.robitaille at gmail.com Fri Jan 22 12:17:20 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Fri, 22 Jan 2010 12:17:20 -0500 Subject: [SciPy-User] interp1d and out of bounds values Message-ID: <0C3454D1-56E0-4D6F-B556-11367E2519F2@gmail.com> Hello, I've been using scipy.interpolate.interp1d to interpolate values in a number of different projects. However, something I often need is the following: if the interpolating function is defined as f(x) from xmin to xmax, if I specify an x value smaller than xmin, I would like the value set to f(xmin), and if the value is above xmax, I would like the value set to xmax. While this is strictly extrapolation, I'm wondering if there is a way that fill_value could be set to a certain string value, for example 'nearest', to indicate that this is the desired behavior? I could see this being commonly used. If it is not possible to modify scipy directly, what would be the best way to wrap interp1d to allow this? Since interp1d(x,y) takes y as an n-dimensional array, I'm not sure how I could code this up. Thanks in advance for any advice, Thomas From pgmdevlist at gmail.com Fri Jan 22 15:29:14 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 22 Jan 2010 15:29:14 -0500 Subject: [SciPy-User] timeseries tsfromtxt missing_values bug? In-Reply-To: <4B597476.63BA.009B.0@twdb.state.tx.us> References: <4B5077A4.63BA.009B.0@twdb.state.tx.us> <1C4012B8-4C61-4C97-9DE0-F4619383BC94@gmail.com> <4B597476.63BA.009B.0@twdb.state.tx.us> Message-ID: <8E6CFE0C-B843-4B13-87A0-D3F22CAC5DD8@gmail.com> On Jan 22, 2010, at 10:48 AM, Dharhas Pothina wrote: > > Is there any way to install the svn version on windows? This script is being primarily used on a windows box. If not I'll test on linux. Of course, just download the code from the SVN server (as described in the doc) and install it (python setup.py install). if you have any problem, contact me offline. You may not have have to compile if you're ready to get your hands dirty: edit the source files on your machine directly. From josef.pktd at gmail.com Fri Jan 22 16:08:23 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 16:08:23 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> References: <1cd32cbb1001211815n7d6b1289w2e758c359051b271@mail.gmail.com> <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> Message-ID: <1cd32cbb1001221308s14935c60sc37636eaa35ff47e@mail.gmail.com> On Fri, Jan 22, 2010 at 12:03 PM, wrote: > On Fri, Jan 22, 2010 at 11:52 AM, Keith Goodman wrote: >> On Fri, Jan 22, 2010 at 8:46 AM, ? wrote: >>> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman wrote: >>>> On Thu, Jan 21, 2010 at 8:18 PM, ? wrote: >>>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman wrote: >>>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM wrote: >>>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote: >>>>>>>> That's the only was I was able to figure out how to pull 1.0 out of >>>>>>>> np.array(1.0). 
Is there a better way? >>>>>>> >>>>>>> >>>>>>> .item() >>>>>> >>>>>> Thanks. item() looks better than tolist(). >>>>>> >>>>>> I simplified the function: >>>>>> >>>>>> def nanmedian(x, axis=0): >>>>>> ? ?x, axis = _chk_asarray(x,axis) >>>>>> ? ?if x.ndim == 0: >>>>>> ? ? ? ?return float(x.item()) >>>>>> ? ?x = x.copy() >>>>>> ? ?x = np.apply_along_axis(_nanmedian,axis,x) >>>>>> ? ?if x.ndim == 0: >>>>>> ? ? ? ?x = float(x.item()) >>>>>> ? ?return x >>>>>> >>>>>> and opened a ticket: >>>>>> >>>>>> http://projects.scipy.org/scipy/ticket/1098 >>>>> >>>>> >>>>> How about getting rid of apply_along_axis? ? ?see attachment >>>>> >>>>> I don't know whether or how much faster it is, but there is a ticket >>>>> that the current version is slow. >>>>> No hidden bug or corner case guarantee yet. >>>> >>>> It is faster. But here is one case it does not handle: >>>> >>>>>> nanmedian([1, 2]) >>>> ? array([ 1.5]) >>>>>> np.median([1, 2]) >>>> ? 1.5 >>>> >>>> I'm sure it could be fixed. But having to fix it, and the fact that it >>>> is a larger change, decreases the likelihood that it will make it into >>>> the next version of scipy. One option is to make the small bug fix I >>>> suggested (ticket #1098) and add the corresponding unit tests. Then we >>>> can take our time to design a better version of nanmedian. >>> >>> I didn't see the difference to np.median for this case, I think I was >>> taking the shape answer from the other thread on the return of splines >>> and interpolation. >>> >>> If I change the last 3 lines to >>> ? ?if nanmed.size == 1: >>> ? ? ? return nanmed.item() >>> ? ?return nanmed >>> >>> then I get agreement with numpy for the following test cases >>> >>> print nanmedian(1), np.median(1) >>> print nanmedian(np.array(1)), np.median(1) >>> print nanmedian(np.array([1])), np.median(np.array([1])) >>> print nanmedian(np.array([[1]])), np.median(np.array([[1]])) >>> print nanmedian(np.array([1,2])), np.median(np.array([1,2])) >>> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0) >>> print nanmedian([1]), np.median([1]) >>> print nanmedian([[1]]), np.median([[1]]) >>> print nanmedian([1,2]), np.median([1,2]) >>> print nanmedian([[1,2]]), np.median([[1,2]],axis=0) >>> print nanmedian([1j,2]), np.median([1j,2]) >>> >>> Am I still missing any cases? >>> >>> The vectorized version should be faster for this case >>> http://projects.scipy.org/scipy/ticket/740 >>> but maybe not for long and narrow arrays. >> >> Here is an odd one: >> >>>> nanmedian(True) >> ? 1.0 >>>> nanmedian([True]) >> ? 0.5 ?# <--- strange >> >>>> np.median(True) >> ? 1.0 >>>> np.median([True]) >> ? 1.0 > > definitely weird > >>>> (np.array(True)+np.array(True))/2. > 0.5 >>>> np.array([True, True]).sum() > 2 >>>> np.array([True, True]).mean() > 1.0 > > I assumed mean (is used by np.ma.median) is the same as adding and dividing by 2 > > Josef > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > It got a bit ugly, too many shapes for a single number, and tricky axis handling. Your idea of making just the smallish changes looks more attractive now. 
and almost correct

>>> np.median(np.array([[[1]]]),axis=0).shape
(1, 1)
>>> nanmedian(np.array([[[1]]]),axis=0).shape
(1, 1)
>>> np.median(np.array([[[1]]]),axis=-1).shape
(1, 1)
>>> nanmedian(np.array([[[1]]]),axis=-1).shape
(1, 1)

but there slipped a python in

>>> nanmedian(np.array([[[1]]]),axis=None).__class__
>>> np.median(np.array([[[1]]]),axis=None).__class__

.item() returns a python number not a numpy number

>>> np.array([[[1]]]).item().__class__
>>> np.array([[[1]]]).flat[0].__class__

I also didn't know this:

>>> None < 0
True

Josef

-------------- next part --------------
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 20 10:18:32 2010
Author: josef-pktd
"""
import numpy as np
from scipy import stats

def nanmedian(x, axis = 0):
    keepshape = list(np.shape(x))
    x, axis2 = stats.stats._chk_asarray(x, axis)
    if (not axis is None) and axis<0 : # and x.ndim>2:
        axis = x.ndim + axis
    #print 'axis', axis
    #print x, keepshape
    if x.ndim == 0 or (x.size==1 and axis is None):
        return 1.0*x.item()
    if keepshape and not axis is None :
        keepshape.pop(axis)
    if x.size == 1: #x.ndim == 0:
        return 1.0*x.reshape(keepshape)
    if x.dtype == np.bool:
        x = x.astype(int)
    if axis is None:
        axis = 0
    x = np.sort(x, axis=axis)
    nall = x.shape[axis]
    notnancount = nall - np.isnan(x).sum(axis=axis)
    #print 'notnancount', notnancount
    (idx, rmd) = divmod(notnancount, 2)
    #idx = np.atleast_1d(idx)
    #print 'idx', idx
    #print idx.shape
    indx = np.ogrid[[slice(i) for i in x.shape]]
    indxlo = indx[:]
    idxslice = map(slice, idx.shape)
    idxslice.insert(axis, None)
    #print idxslice
    idx = np.atleast_1d(idx)
    indxlo[axis] = idx[idxslice]
    #print indxlo
    indxhi = indx[:]
    indxhi[axis] = (idx - (1-rmd))[idxslice] #[idx.shape[:axis]+(None,)+idx.shape[axis:]]
    #print map(np.shape, indxhi)
    #print indxhi
    nanmed = (x[indxlo] + x[indxhi])/2.
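    # (editor's note, not in the original attachment) indxlo and indxhi
    # are open-grid index lists; along `axis` they pick the two middle
    # order statistics of the sorted slice. NaNs sort to the end, so
    # averaging these two entries yields the median of the non-NaN
    # values, with notnancount controlling which positions are "middle".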
    #print 'keepshape, nanmed.shape',keepshape, nanmed.shape
    if np.ndim == 0:
        return nanmed.item()
        #return nanmed.reshape(keepshape)
        return np.squeeze(nanmed) #.reshape(keepshape)
    if nanmed.size == 1:
        return nanmed.reshape(keepshape)
        return nanmed.item()
    return nanmed

from numpy.testing import assert_equal, assert_almost_equal

for axis in [0,1, None, -1]:
    for i in range(5):
        # for complex
        #x = 1j+np.arange(20).reshape(4,5)
        x = np.arange(20).reshape(4,5).astype(float)
        x[zip(np.random.randint(4, size=(2,5)))] = np.nan
        assert_equal(nanmedian(x, axis=0), stats.nanmedian(x, axis=0))

for axis in [0,1,2, None, -1]:
    for i in range(5):
        x = np.arange(3*4*5).reshape(3,4,5).astype(float)
        x[np.random.randint(3, size=(3,4,5))] = np.nan
        assert_equal(nanmedian(x, axis=0), stats.nanmedian(x, axis=0))

xli = [[1], [[1]], [1,2], [1j], [1j,2], [True], [False], [True,False],
       [True, False, True], np.round(np.random.randn(2,4,5),4)]
xxli = xli + map(np.array, xli)

for axis in [0, -1, None]:
    print '\n next case', axis
    for x in xxli:
        try:
            assert_equal(nanmedian(x, axis=axis), np.median(x, axis=axis))
            assert_equal(np.shape(nanmedian(x, axis=axis)), np.shape(np.median(x, axis=axis)))
        except:
            print 'failure with', x
            print nanmedian(x, axis=axis), np.median(x, axis=axis)
            raise

y = np.round(np.random.randn(2,3,5),4)
axis = -1 #None
print np.median(y,axis=axis)
nm = nanmedian(y,axis=axis)
print nm
print np.median(y,axis=axis) - nm

#np.broadcast(array([[[0]],[[1]]]), array([[[1, 1, 1, 1, 1]], [[1, 1, 1, 1, 1]]]), array([[[0, 1, 2, 3, 4]]]))

From kwgoodman at gmail.com Fri Jan 22 16:44:56 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 13:44:56 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001221308s14935c60sc37636eaa35ff47e@mail.gmail.com> References: <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> <1cd32cbb1001221308s14935c60sc37636eaa35ff47e@mail.gmail.com> Message-ID: On Fri, Jan 22, 2010 at 1:08 PM, wrote: > .item() returns a python number not a numpy number > >>>> np.array([[[1]]]).item().__class__ > >>>> np.array([[[1]]]).flat[0].__class__ > Good catch. Thanks. I'll update my local copy of nanmedian. From kwgoodman at gmail.com Fri Jan 22 16:52:04 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 13:52:04 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <16A50238-D3F1-4F51-A229-4FCD8267F320@gmail.com> <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> <1cd32cbb1001221308s14935c60sc37636eaa35ff47e@mail.gmail.com> Message-ID: On Fri, Jan 22, 2010 at 1:44 PM, Keith Goodman wrote: > On Fri, Jan 22, 2010 at 1:08 PM, wrote: >> .item() returns a python number not a numpy number >> >>>>> np.array([[[1]]]).item().__class__ >> >>>>> np.array([[[1]]]).flat[0].__class__ >> > > Good catch. Thanks. I'll update my local copy of nanmedian. Looks like we don't even need .tolist(), .item(), or [()]. This should do the trick: >> np.float64(np.array(1)) 1.0 >> type(np.float64(np.array(1))) And what if the input is float32?
Well, numpy turns that into float64 so nothing to worry about: >> x = np.array(1, dtype=np.float32) >> m = np.median(x) >> type(m) From josef.pktd at gmail.com Fri Jan 22 16:55:03 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 16:55:03 -0500 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: References: <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> <1cd32cbb1001221308s14935c60sc37636eaa35ff47e@mail.gmail.com> Message-ID: <1cd32cbb1001221355n43777036pbc4b74298002a92@mail.gmail.com> On Fri, Jan 22, 2010 at 4:52 PM, Keith Goodman wrote: > On Fri, Jan 22, 2010 at 1:44 PM, Keith Goodman wrote: >> On Fri, Jan 22, 2010 at 1:08 PM, ? wrote: >>> .item() returns a python number not a numpy number >>> >>>>>> np.array([[[1]]]).item().__class__ >>> >>>>>> np.array([[[1]]]).flat[0].__class__ >>> >> >> Good catch. Thanks. I'll update my local copy of nanmedian. > > Looks like we don't even need .tolist(), .item(), or [()]. This should > do the trick: > >>> np.float64(np.array(1)) > ? 1.0 >>> type(np.float64(np.array(1))) > ? > > And what if the input is float32? Well, numpy turns that into float64 > so nothing to worry about: > >>> x = np.array(1, dtype=np.float32) >>> m = np.median(x) >>> type(m) > ? but it breaks complex >>> np.float64(np.array(1.j)) 0.0 I did one more shape correction in my version Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Fri Jan 22 17:19:10 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 14:19:10 -0800 Subject: [SciPy-User] scipy.stats.nanmedian In-Reply-To: <1cd32cbb1001221355n43777036pbc4b74298002a92@mail.gmail.com> References: <1cd32cbb1001212018s21e892f8rb7b210033e9d2fa6@mail.gmail.com> <1cd32cbb1001220846g26603687lc8f4524655b2e7d5@mail.gmail.com> <1cd32cbb1001220903h3a5e0d17k91045d0d36963983@mail.gmail.com> <1cd32cbb1001221308s14935c60sc37636eaa35ff47e@mail.gmail.com> <1cd32cbb1001221355n43777036pbc4b74298002a92@mail.gmail.com> Message-ID: On Fri, Jan 22, 2010 at 1:55 PM, wrote: > On Fri, Jan 22, 2010 at 4:52 PM, Keith Goodman wrote: >> On Fri, Jan 22, 2010 at 1:44 PM, Keith Goodman wrote: >>> On Fri, Jan 22, 2010 at 1:08 PM, ? wrote: >>>> .item() returns a python number not a numpy number >>>> >>>>>>> np.array([[[1]]]).item().__class__ >>>> >>>>>>> np.array([[[1]]]).flat[0].__class__ >>>> >>> >>> Good catch. Thanks. I'll update my local copy of nanmedian. >> >> Looks like we don't even need .tolist(), .item(), or [()]. This should >> do the trick: >> >>>> np.float64(np.array(1)) >> ? 1.0 >>>> type(np.float64(np.array(1))) >> ? >> >> And what if the input is float32? Well, numpy turns that into float64 >> so nothing to worry about: >> >>>> x = np.array(1, dtype=np.float32) >>>> m = np.median(x) >>>> type(m) >> ? > > but it breaks complex >>>> np.float64(np.array(1.j)) > 0.0 > > I did one more shape correction in my version Crap. So many corner cases. After collecting all the input cases (all without NaNs) we could extend your automated test by adding an outer loop over [nanmean, nanmedian, nanstd] and make sure it gives the same results as the numpy versions. It might be good to check for dtype too since 1 == 1.0. 
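(A sketch of the test loop described above, added for illustration; the pairing of each scipy.stats nan-function with a numpy counterpart is an assumption, and nanstd is left out because its default normalization need not match np.std's.)

# Compare scipy.stats nan-functions against their numpy counterparts
# on NaN-free inputs, checking values, shapes, and dtypes. With the
# scipy of this thread some cases are expected to fail -- catching
# exactly those corner cases is the point of the test.
import numpy as np
from numpy.testing import assert_equal
from scipy import stats

cases = [[1], [[1]], [1, 2], [1j, 2], [True], [True, False],
         np.round(np.random.randn(2, 4, 5), 4)]
pairs = [(stats.nanmean, np.mean), (stats.nanmedian, np.median)]

for nanfunc, npfunc in pairs:
    for x in cases:
        for axis in [0, None]:
            ref = npfunc(x, axis=axis)
            got = nanfunc(x, axis=axis)
            assert_equal(got, ref)
            assert_equal(np.shape(got), np.shape(ref))
            # 1 == 1.0 passes assert_equal, so compare dtypes as well
            assert_equal(np.asarray(got).dtype, np.asarray(ref).dtype)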
From dgorman at berkeley.edu Fri Jan 22 19:17:19 2010 From: dgorman at berkeley.edu (Dylan Gorman) Date: Fri, 22 Jan 2010 16:17:19 -0800 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices Message-ID: Hi Folks, I'd like to exponentiate very large sparse matrices with scipy, and I would be very grateful for any suggestions. Currently, I'm running into memory errors exponentiating random matrices of order 10^3x10^3 with the standard linalg.expm() routine. However, I suspect that I may realistically be able to handle somewhat larger matrices since the actual matrices I will be using are quite sparse. Ideally, I'd like to be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. However, there does not seem to be any linalg.sparse.expm() function-- is this because there is in fact no advantage to exponentiating sparse matrices? Or would I need to implement something by hand? Thank you very much, Dylan Gorman From josef.pktd at gmail.com Fri Jan 22 19:38:08 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 19:38:08 -0500 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: Message-ID: <1cd32cbb1001221638n5d8b455bjca56996ba99622cf@mail.gmail.com> On Fri, Jan 22, 2010 at 7:17 PM, Dylan Gorman wrote: > Hi Folks, > > I'd like to exponentiate very large sparse matrices with scipy, and I > would be very grateful for any suggestions. Currently, I'm running > into memory errors exponentiating random matrices of order 10^3x10^3 > with the standard linalg.expm() routine. However, I suspect that I may > realistically be able to handle somewhat larger matrices since the > actual matrices I will be using are quite sparse. Ideally, I'd like to > be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. > However, there does not seem to be any linalg.sparse.expm() function-- > is this because there is in fact no advantage to exponentiating sparse > matrices? Or would I need to implement something by hand? Just an idea until the experts come along. I would try for a sparse eigenvector decomposition and then only the eigenvalues need to be exponentiated. However, I never tried it with sparse, and I don't know how sparse the eigenvectors would be. Josef > > Thank you very much, > Dylan Gorman > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Fri Jan 22 20:04:06 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 Jan 2010 18:04:06 -0700 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: Message-ID: On Fri, Jan 22, 2010 at 5:17 PM, Dylan Gorman wrote: > Hi Folks, > > I'd like to exponentiate very large sparse matrices with scipy, and I > would be very grateful for any suggestions. Currently, I'm running > into memory errors exponentiating random matrices of order 10^3x10^3 > with the standard linalg.expm() routine. However, I suspect that I may > realistically be able to handle somewhat larger matrices since the > actual matrices I will be using are quite sparse. Ideally, I'd like to > be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. > However, there does not seem to be any linalg.sparse.expm() function-- > is this because there is in fact no advantage to exponentiating sparse > matrices? Or would I need to implement something by hand? 
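(An editor's sketch of the eigendecomposition idea Josef floats above; it is not tested code from the thread, and it assumes a symmetric sparse L and the ARPACK wrapper scipy.sparse.linalg.eigsh. Truncating to k eigenpairs makes it an approximation, not an exact matrix exponential.)

# Diagonalize L once, exponentiate only the eigenvalues, and apply the
# result to rho0: exp(L*t)*rho0 ~= V * diag(exp(w*t)) * V.T * rho0
import numpy as np
import scipy.sparse.linalg as spla

def expm_action(L, rho0, t, k=50):
    w, V = spla.eigsh(L, k=k)   # k eigenpairs of the sparse, symmetric L
    return V.dot(np.exp(w * t) * V.T.dot(rho0))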
> > Out of curiosity, do the matrices have any special structure? For instance, are they banded or symmetric? Also, why to you want to exponentiate them? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.stults at gmail.com Fri Jan 22 20:16:12 2010 From: joshua.stults at gmail.com (Joshua Stults) Date: Fri, 22 Jan 2010 20:16:12 -0500 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: Message-ID: On Fri, Jan 22, 2010 at 8:04 PM, Charles R Harris wrote: > > > On Fri, Jan 22, 2010 at 5:17 PM, Dylan Gorman wrote: >> >> Hi Folks, >> >> I'd like to exponentiate very large sparse matrices with scipy, and I >> would be very grateful for any suggestions. Currently, I'm running >> into memory errors exponentiating random matrices of order 10^3x10^3 >> with the standard linalg.expm() routine. However, I suspect that I may >> realistically be able to handle somewhat larger matrices since the >> actual matrices I will be using are quite sparse. Ideally, I'd like to >> be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. >> However, there does not seem to be any linalg.sparse.expm() function-- >> is this because there is in fact no advantage to exponentiating sparse >> matrices? Or would I need to implement something by hand? >> > > Out of curiosity, do the matrices have any special structure? For instance, > are they banded or symmetric? Also, why to you want to exponentiate them? > Another question: do you need the matrix exponential explicitly, or do you just need it's action on a vector? > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Joshua Stults Website: http://j-stults.blogspot.com From dgorman at berkeley.edu Fri Jan 22 20:24:00 2010 From: dgorman at berkeley.edu (Dylan Gorman) Date: Fri, 22 Jan 2010 17:24:00 -0800 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: Message-ID: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> Dear Chuck and Joshua, It's a problem in quantum simulation. I'm trying to solve d(rho)/dt = L*rho for a sparse matrix L. L should be symmetric, and in principle I just need to compute (e^L*t)*(rho(0)) or something--not e^(L*t) explicitly. Regards, Dylan On Jan 22, 2010, at 5:16 PM, Joshua Stults wrote: > On Fri, Jan 22, 2010 at 8:04 PM, Charles R Harris > wrote: >> >> >> On Fri, Jan 22, 2010 at 5:17 PM, Dylan Gorman >> wrote: >>> >>> Hi Folks, >>> >>> I'd like to exponentiate very large sparse matrices with scipy, >>> and I >>> would be very grateful for any suggestions. Currently, I'm running >>> into memory errors exponentiating random matrices of order 10^3x10^3 >>> with the standard linalg.expm() routine. However, I suspect that I >>> may >>> realistically be able to handle somewhat larger matrices since the >>> actual matrices I will be using are quite sparse. Ideally, I'd >>> like to >>> be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. >>> However, there does not seem to be any linalg.sparse.expm() >>> function-- >>> is this because there is in fact no advantage to exponentiating >>> sparse >>> matrices? Or would I need to implement something by hand? >>> >> >> Out of curiosity, do the matrices have any special structure? For >> instance, >> are they banded or symmetric? Also, why to you want to exponentiate >> them? 
>> > Another question: do you need the matrix exponential explicitly, or do > you just need it's action on a vector? > >> Chuck >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > > > -- > Joshua Stults > Website: http://j-stults.blogspot.com > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From joshua.stults at gmail.com Fri Jan 22 20:33:15 2010 From: joshua.stults at gmail.com (Joshua Stults) Date: Fri, 22 Jan 2010 20:33:15 -0500 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> Message-ID: On Fri, Jan 22, 2010 at 8:24 PM, Dylan Gorman wrote: > Dear Chuck and Joshua, > > It's a problem in quantum simulation. I'm trying to solve d(rho)/dt = > L*rho for a sparse matrix L. L should be symmetric, and in principle I > just need to compute (e^L*t)*(rho(0)) or something--not e^(L*t) > explicitly. If you end up rolling your own, it sounds like 'Method 20: Krylov space methods', in 'Nineteen Dubious Ways to Compute the Exponential of a Matrix' is for you: http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf > > Regards, > Dylan > > On Jan 22, 2010, at 5:16 PM, Joshua Stults wrote: > >> On Fri, Jan 22, 2010 at 8:04 PM, Charles R Harris >> wrote: >>> >>> >>> On Fri, Jan 22, 2010 at 5:17 PM, Dylan Gorman >>> wrote: >>>> >>>> Hi Folks, >>>> >>>> I'd like to exponentiate very large sparse matrices with scipy, >>>> and I >>>> would be very grateful for any suggestions. Currently, I'm running >>>> into memory errors exponentiating random matrices of order 10^3x10^3 >>>> with the standard linalg.expm() routine. However, I suspect that I >>>> may >>>> realistically be able to handle somewhat larger matrices since the >>>> actual matrices I will be using are quite sparse. Ideally, I'd >>>> like to >>>> be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. >>>> However, there does not seem to be any linalg.sparse.expm() >>>> function-- >>>> is this because there is in fact no advantage to exponentiating >>>> sparse >>>> matrices? Or would I need to implement something by hand? >>>> >>> >>> Out of curiosity, do the matrices have any special structure? For >>> instance, >>> are they banded or symmetric? Also, why to you want to exponentiate >>> them? >>> >> Another question: do you need the matrix exponential explicitly, or do >> you just need it's action on a vector? >> >>> Chuck Maybe there's something already lurking in Scipy that's appropriate? -- Joshua Stults Website: http://j-stults.blogspot.com From burak.o.cankurtaran at alumni.uts.edu.au Fri Jan 22 20:58:33 2010 From: burak.o.cankurtaran at alumni.uts.edu.au (Burak1327) Date: Fri, 22 Jan 2010 17:58:33 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> Message-ID: <27282461.post@talk.nabble.com> It looks like you are propagating the density on a real-space grid. 
Don't know the specifics to your problem, but generally calculating the exponential of the matrix happens thousands of times for the whole evolution time, so you need to approximate the exponential with a method like Joshua referenced or you will have to let your future grandchildren finish the simulation :) The most simple approximations are polynomial expansions. Chebychev polynomials are GREAT, even a 2nd order Taylor expansion is good enough in a lot of cases, specific to your type of problem. Which leads to actual scipy discussion. I'm no scipy expert, but the above mentioned methods are probably in the library. Thanks Burak Joshua Stults wrote: > > On Fri, Jan 22, 2010 at 8:24 PM, Dylan Gorman > wrote: >> Dear Chuck and Joshua, >> >> It's a problem in quantum simulation. I'm trying to solve d(rho)/dt = >> L*rho for a sparse matrix L. L should be symmetric, and in principle I >> just need to compute (e^L*t)*(rho(0)) or something--not e^(L*t) >> explicitly. > > If you end up rolling your own, it sounds like 'Method 20: Krylov > space methods', in 'Nineteen Dubious Ways to Compute the Exponential > of a Matrix' is for you: > http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf > >> >> Regards, >> Dylan >> >> On Jan 22, 2010, at 5:16 PM, Joshua Stults wrote: >> >>> On Fri, Jan 22, 2010 at 8:04 PM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Fri, Jan 22, 2010 at 5:17 PM, Dylan Gorman >>>> wrote: >>>>> >>>>> Hi Folks, >>>>> >>>>> I'd like to exponentiate very large sparse matrices with scipy, >>>>> and I >>>>> would be very grateful for any suggestions. Currently, I'm running >>>>> into memory errors exponentiating random matrices of order 10^3x10^3 >>>>> with the standard linalg.expm() routine. However, I suspect that I >>>>> may >>>>> realistically be able to handle somewhat larger matrices since the >>>>> actual matrices I will be using are quite sparse. Ideally, I'd >>>>> like to >>>>> be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. >>>>> However, there does not seem to be any linalg.sparse.expm() >>>>> function-- >>>>> is this because there is in fact no advantage to exponentiating >>>>> sparse >>>>> matrices? Or would I need to implement something by hand? >>>>> >>>> >>>> Out of curiosity, do the matrices have any special structure? For >>>> instance, >>>> are they banded or symmetric? Also, why to you want to exponentiate >>>> them? >>>> >>> Another question: do you need the matrix exponential explicitly, or do >>> you just need it's action on a vector? >>> >>>> Chuck > > Maybe there's something already lurking in Scipy that's appropriate? > > -- > Joshua Stults > Website: http://j-stults.blogspot.com > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Matrix-Exponentials-For-Very-Large-Sparse-Matrices-tp27281924p27282461.html Sent from the Scipy-User mailing list archive at Nabble.com. From charlesr.harris at gmail.com Fri Jan 22 21:18:16 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 Jan 2010 19:18:16 -0700 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> Message-ID: On Fri, Jan 22, 2010 at 6:24 PM, Dylan Gorman wrote: > Dear Chuck and Joshua, > > It's a problem in quantum simulation. 
I'm trying to solve d(rho)/dt = > L*rho for a sparse matrix L. L should be symmetric, and in principle I > just need to compute (e^L*t)*(rho(0)) or something--not e^(L*t) > explicitly. > > Does L approximate a differential operator or is it an expansion in some other basis set? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.stults at gmail.com Fri Jan 22 21:47:18 2010 From: joshua.stults at gmail.com (Joshua Stults) Date: Fri, 22 Jan 2010 21:47:18 -0500 Subject: [SciPy-User] [SciPy-user] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: <27282461.post@talk.nabble.com> References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> <27282461.post@talk.nabble.com> Message-ID: On Fri, Jan 22, 2010 at 8:58 PM, Burak1327 wrote: > > > The most simple approximations are polynomial expansions. > Chebychev polynomials are GREAT, even a 2nd order > Taylor expansion is good enough in a lot of cases, specific to > your type of problem. > > Which leads to actual scipy discussion. I'm no scipy expert, but > the above mentioned methods are probably in the library. > Here's an example of using f2py to compile expokit (see slides 15 - 21): http://sf.anu.edu.au/~mhk900/Python_Workshop/short.pdf Expokit website: http://www.maths.uq.edu.au/expokit/ Uses Krylov methods for sparse matrices; these will use more memory than the polynomial expansion methods that Burak mentioned. -- Joshua Stults Website: http://j-stults.blogspot.com From josef.pktd at gmail.com Fri Jan 22 22:39:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 22:39:57 -0500 Subject: [SciPy-User] 2d array to Latex Message-ID: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> Is there a function somewhere to print a 2d array with Latex markup? No fancy table required, just for a quick copy and paste into a Latex document. Thanks, Josef From pgmdevlist at gmail.com Fri Jan 22 23:59:51 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 22 Jan 2010 23:59:51 -0500 Subject: [SciPy-User] 2d array to Latex In-Reply-To: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> References: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> Message-ID: <362F5EA2-6B64-4AC4-8ACC-E8D25C30876A@gmail.com> On Jan 22, 2010, at 10:39 PM, josef.pktd at gmail.com wrote: > Is there a function somewhere to print a 2d array with Latex markup? > > No fancy table required, just for a quick copy and paste into a Latex document. Check scikits.timeseries.lib.reportlib, I think Matt coded something to that effect. You'll probably have to adapt it a bit, but that should get you started. Or you could do it yourself: * print the \begin{table} header * print "\\\n".join(["&".join(map(str,line)) for line in array]) * print the \end footer From josef.pktd at gmail.com Sat Jan 23 00:37:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 23 Jan 2010 00:37:22 -0500 Subject: [SciPy-User] 2d array to Latex In-Reply-To: <362F5EA2-6B64-4AC4-8ACC-E8D25C30876A@gmail.com> References: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> <362F5EA2-6B64-4AC4-8ACC-E8D25C30876A@gmail.com> Message-ID: <1cd32cbb1001222137v50bf95dai5982e6dd87a8729f@mail.gmail.com> On Fri, Jan 22, 2010 at 11:59 PM, Pierre GM wrote: > On Jan 22, 2010, at 10:39 PM, josef.pktd at gmail.com wrote: >> Is there a function somewhere to print a 2d array with Latex markup? >> >> No fancy table required, just for a quick copy and paste into a Latex document. 
> > Check scikits.timeseries.lib.reportlib, I think Matt coded something to that effect. You'll probably have to adapt it a bit, but that should get you started. > Or you could do it yourself: > * print the \begin{table} header > * print "\\\n".join(["&".join(map(str,line)) for line in array]) > * print the \end footer Thanks, I will look at the scikits. Your example looks simple enough, but I don't feel like debugging latex (and struggle with formatting) if I can avoid it. Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From aisaac at american.edu Sat Jan 23 08:35:00 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 23 Jan 2010 08:35:00 -0500 Subject: [SciPy-User] 2d array to Latex In-Reply-To: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> References: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> Message-ID: <4B5AFB04.2020501@american.edu> On 1/22/2010 10:39 PM, josef.pktd at gmail.com wrote: > Is there a function somewhere to print a 2d array with Latex markup? Try SimpleTable: http://econpy.googlecode.com/svn/trunk/utilities/text.py Alan Isaac From josef.pktd at gmail.com Sat Jan 23 08:45:26 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 23 Jan 2010 08:45:26 -0500 Subject: [SciPy-User] 2d array to Latex In-Reply-To: <4B5AFB04.2020501@american.edu> References: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> <4B5AFB04.2020501@american.edu> Message-ID: <1cd32cbb1001230545j3ddb6f78gc159597e5b532fe0@mail.gmail.com> On Sat, Jan 23, 2010 at 8:35 AM, Alan G Isaac wrote: > On 1/22/2010 10:39 PM, josef.pktd at gmail.com wrote: >> Is there a function somewhere to print a 2d array with Latex markup? > > Try SimpleTable: > http://econpy.googlecode.com/svn/trunk/utilities/text.py from econpy.utilities.text import SimpleTable print SimpleTable(np.eye(3*2,3*2,k=1),fmt={'data_fmt':["%d"]}).as_latex_tabular() Thanks, I had looked in econpy, but I didn't see it. Josef > > Alan Isaac > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tmp50 at ukr.net Sat Jan 23 11:33:32 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 23 Jan 2010 18:33:32 +0200 Subject: [SciPy-User] annoying bug in isfinite - it yields warning each time Message-ID: Why numpy.isfinite yields warning like this: >>> __version__ '1.5.0.dev8078' (latest, as well as lots of previous) >>> isfinite(inf) Warning: invalid value encountered in isfinite False >>> isfinite([inf,1.0]) Warning: invalid value encountered in isfinite array([False,? True], dtype=bool) >>> isfinite(array([inf,1.0])) Warning: invalid value encountered in isfinite array([False,? True], dtype=bool) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jsseabold at gmail.com Sat Jan 23 12:21:37 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Jan 2010 12:21:37 -0500 Subject: [SciPy-User] 2d array to Latex In-Reply-To: <1cd32cbb1001230545j3ddb6f78gc159597e5b532fe0@mail.gmail.com> References: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> <4B5AFB04.2020501@american.edu> <1cd32cbb1001230545j3ddb6f78gc159597e5b532fe0@mail.gmail.com> Message-ID: On Sat, Jan 23, 2010 at 8:45 AM, wrote: > On Sat, Jan 23, 2010 at 8:35 AM, Alan G Isaac wrote: >> On 1/22/2010 10:39 PM, josef.pktd at gmail.com wrote: >>> Is there a function somewhere to print a 2d array with Latex markup? >> >> Try SimpleTable: >> http://econpy.googlecode.com/svn/trunk/utilities/text.py > > from econpy.utilities.text import SimpleTable > print SimpleTable(np.eye(3*2,3*2,k=1),fmt={'data_fmt':["%d"]}).as_latex_tabular() > > Thanks, I had looked in econpy, but I didn't see it. > FYI, this is also in the statsmodels sandbox for when we get around to having nicer output. Skipper From josef.pktd at gmail.com Sat Jan 23 13:43:11 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 23 Jan 2010 13:43:11 -0500 Subject: [SciPy-User] 2d array to Latex In-Reply-To: References: <1cd32cbb1001221939k1f21930eu351b6f4b8836b271@mail.gmail.com> <4B5AFB04.2020501@american.edu> <1cd32cbb1001230545j3ddb6f78gc159597e5b532fe0@mail.gmail.com> Message-ID: <1cd32cbb1001231043t1a7abaf3te1c6903d20f86f12@mail.gmail.com> On Sat, Jan 23, 2010 at 12:21 PM, Skipper Seabold wrote: > On Sat, Jan 23, 2010 at 8:45 AM, wrote: >> On Sat, Jan 23, 2010 at 8:35 AM, Alan G Isaac wrote: >>> On 1/22/2010 10:39 PM, josef.pktd at gmail.com wrote: >>>> Is there a function somewhere to print a 2d array with Latex markup? >>> >>> Try SimpleTable: >>> http://econpy.googlecode.com/svn/trunk/utilities/text.py >> >> from econpy.utilities.text import SimpleTable >> print SimpleTable(np.eye(3*2,3*2,k=1),fmt={'data_fmt':["%d"]}).as_latex_tabular() >> >> Thanks, I had looked in econpy, but I didn't see it. >> > > FYI, this is also in the statsmodels sandbox for when we get around to > having nicer output. ouch, Josef "ouch: Ornstein-Uhlenbeck models for phylogenetic comparative hypotheses" > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From g.statkute at gmail.com Sat Jan 23 16:03:25 2010 From: g.statkute at gmail.com (gintare statkute) Date: Sat, 23 Jan 2010 23:03:25 +0200 Subject: [SciPy-User] ATLAS Message-ID: <8e7295c71001231303l62de53e2ubad0a716959b33ca@mail.gmail.com> Hello, I cannot install ATLAS and would like to ask if I could get a working and installable version of ATLAS from somebody. In the tar archives which I download from http://math-atlas.sourceforge.net/ (tried several different versions), header files are missing from the source directory in the ./configure step, i.e. in the folders which I downloaded in tar format. With the tar archives which I download from http://www.netlib.org/atlas/index.html the installation does not stop. I left it for 5 hours and the installation continued with the message 0: NFLOP=0, tim=0.000000, without errors. My OS is Linux. I posted to the ATLAS newsgroup about the missing header files, but most probably people in this group have installed ATLAS and I could borrow a *.tar.bz2 file from them.
regards, gintare From nwagner at iam.uni-stuttgart.de Sun Jan 24 04:11:19 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Sun, 24 Jan 2010 10:11:19 +0100 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> Message-ID: On Fri, 22 Jan 2010 20:33:15 -0500 Joshua Stults wrote: > On Fri, Jan 22, 2010 at 8:24 PM, Dylan Gorman > wrote: >> Dear Chuck and Joshua, >> >> It's a problem in quantum simulation. I'm trying to >>solve d(rho)/dt = >> L*rho for a sparse matrix L. L should be symmetric, and >>in principle I >> just need to compute (e^L*t)*(rho(0)) or something--not >>e^(L*t) >> explicitly. > > If you end up rolling your own, it sounds like 'Method >20: Krylov > space methods', in 'Nineteen Dubious Ways to Compute the >Exponential > of a Matrix' is for you: > http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf > You might be interested in http://dx.doi.org/10.1137/S0036142995280572 http://dx.doi.org/10.1137/S1064827595295337 Nils From gokhansever at gmail.com Sun Jan 24 12:54:37 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sun, 24 Jan 2010 11:54:37 -0600 Subject: [SciPy-User] Interact with matplotlib in Sage Message-ID: <49d6b3501001240954l42188cb9m6af15d9e11bf1af3@mail.gmail.com> Hello, I have thought of this might interesting to share. Register at www.sagenb.org or try on your local Sage-notebook and using the following code: # Simple example demonstrating how to interact with matplotlib directly. # Comment plt.clf() to get the plots overlay in each update. # Gokhan Sever & Harald Schilly (2010-01-24) from scipy import stats import numpy as np import matplotlib.pyplot as plt @interact def plot_norm(loc=(0,(0,10)), scale=(1,(1,10))): rv = stats.norm(loc, scale) x = np.linspace(-10,10,1000) plt.plot(x,rv.pdf(x)) plt.grid(True) plt.savefig('plt.png') plt.clf() A very easy to use example, also well-suited for learning and demonstration purposes. Posted at: http://wiki.sagemath.org/interact/graphics#Interactwithmatplotlib Have fun ;) -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Sun Jan 24 11:53:36 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Sun, 24 Jan 2010 17:53:36 +0100 Subject: [SciPy-User] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> Message-ID: For me your problem looks like an ODE and therefore you should use an ODE integrator. This should work also for very large problems in principle. For quantum mechanics there are special ODE integrators that have special invariances, e.g. that the total probability is fixed 1. If I remember correctly symplectic integration schemes are quite useful: http://en.wikipedia.org/wiki/Symplectic_integrator On Sat, Jan 23, 2010 at 2:24 AM, Dylan Gorman wrote: > Dear Chuck and Joshua, > > It's a problem in quantum simulation. I'm trying to solve d(rho)/dt = > L*rho for a sparse matrix L. L should be symmetric, and in principle I > just need to compute (e^L*t)*(rho(0)) or something--not e^(L*t) > explicitly. 
> > Regards, > Dylan > > On Jan 22, 2010, at 5:16 PM, Joshua Stults wrote: > >> On Fri, Jan 22, 2010 at 8:04 PM, Charles R Harris >> wrote: >>> >>> >>> On Fri, Jan 22, 2010 at 5:17 PM, Dylan Gorman >>> wrote: >>>> >>>> Hi Folks, >>>> >>>> I'd like to exponentiate very large sparse matrices with scipy, >>>> and I >>>> would be very grateful for any suggestions. Currently, I'm running >>>> into memory errors exponentiating random matrices of order 10^3x10^3 >>>> with the standard linalg.expm() routine. However, I suspect that I >>>> may >>>> realistically be able to handle somewhat larger matrices since the >>>> actual matrices I will be using are quite sparse. Ideally, I'd >>>> like to >>>> be able to exponentiate matrices of size 10^5-10^6 x 10^5 - 10^6. >>>> However, there does not seem to be any linalg.sparse.expm() >>>> function-- >>>> is this because there is in fact no advantage to exponentiating >>>> sparse >>>> matrices? Or would I need to implement something by hand? >>>> >>> >>> Out of curiosity, do the matrices have any special structure? For >>> instance, >>> are they banded or symmetric? Also, why to you want to exponentiate >>> them? >>> >> Another question: do you need the matrix exponential explicitly, or do >> you just need it's action on a vector? >> >>> Chuck >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> >> >> -- >> Joshua Stults >> Website: http://j-stults.blogspot.com >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From g.statkute at gmail.com Sun Jan 24 04:22:07 2010 From: g.statkute at gmail.com (gintare statkute) Date: Sun, 24 Jan 2010 11:22:07 +0200 Subject: [SciPy-User] Fwd: ATLAS, addition Message-ID: <8e7295c71001240122w2f6dbc12h812ff539ad0a7a51@mail.gmail.com> Hello, 3) The same message. 0: NFLOP=0, tim=0.000000 and endless instaliation happenes with *.tar.gz from http://packages.debian.org/source/stable/atlas I am installing without F77, since in Lapack installation help it was written that i should remove all f77 compiler and use F95 instead. There are empty lines in "Make.Linux_Debian86SSE2_2", which i left unfilled. This file "Make.Linux_Debian86SSE2_2" is generated automaticallly during ATLAS make in command line with user prompt. # ------------------------------------ # Reference and system libraries # ------------------------------------ BLASlib = FBLASlib = FLAPACKlib = LIBS = -lpthread -lm # ---------------------------------------------------------- # ATLAS install resources (include arch default directories) # ---------------------------------------------------------- ARCHDEF = MMDEF = INSTFLAGS = 4) One more repository: http://debs.astraw.com/dapper/ apt-get can not access src from it. ---------- Forwarded message ---------- From: gintare statkute Date: Sat, 23 Jan 2010 23:03:25 +0200 Subject: ATLAS To: scipy-user at scipy.org Hello, I can not install ATLAS and would like to ask if i could get from somebody working and installable version of ATLAS. 1) In tar archives, which i download from http://math-atlas.sourceforge.net/ (tried sevral different versions) - in the ./configure step header files are missing in source directory, i.e. 
in folders which i downloaded in tar format. 2) In tar archives, which i download from http://www.netlib.org/atlas/index.html installaiton do not stop. I left it for 5 hours and installation continues with message. 0: NFLOP=0, tim=0.000000 without errors. My OS is linux. I posted to ATLAS newsgroup about missing header files. Nevertheless most probably people in this scipy group have installed ATLAS and i could borriw *.tar.bz2 file form them. regards, gintare From gnurser at googlemail.com Sun Jan 24 06:56:27 2010 From: gnurser at googlemail.com (George Nurser) Date: Sun, 24 Jan 2010 11:56:27 +0000 Subject: [SciPy-User] Fwd: f2py segfault In-Reply-To: <4B5451E4.5020806@yahoo.com> References: <4B5451E4.5020806@yahoo.com> Message-ID: <1d1e6ea71001240356n420574efgcf681b3daf436727@mail.gmail.com> Juan, If I replace I32 with 4 and R64 with 8 in sub0 your code works OK for me. This is with gfortran 4.3.3, python 2.5.2 and numpy v 1.4.0 --George. 2010/1/18 Juan : > Hi, thanks for the advice. I did not notice that the integer division could be a > source for trouble. Now I changed all the routines. However, I still have the > same segmentation fault. > > > debug-capi:Python C/API function > mymod.sub0(state,ndim=shape(state,0),ntrajectories=shape(state,1)) > debug-capi:double > state=:inoutput,required,array,dims(ndim|ndim,ntrajectories|ntrajectories) > debug-capi:int ndim=shape(state,0):input,optional,scalar > debug-capi:ndim=24 > debug-capi:Checking `shape(state,0)==ndim' > debug-capi:int ntrajectories=shape(state,1):input,optional,scalar > debug-capi:ntrajectories=100 > debug-capi:Checking `shape(state,1)==ntrajectories' > debug-capi:Fortran subroutine `sub0(state,&ndim,&ntrajectories)' > debug-capi:ndim=24 > debug-capi:ntrajectories=100 > debug-capi:Building return value. > debug-capi:Python C/API function mymod.sub0: successful. > debug-capi:Freeing memory. > debug-capi:Python C/API function > mymod.sub1(state,d_i,d_f,ndim=shape(state,0),ntrajectories=shape(state,1)) > debug-capi:double > state=:inoutput,required,array,dims(ndim|ndim,ntrajectories|ntrajectories) > Segmentation fault > > The working sub1 has two other arguments d_i and d_f which are real scalars, the > full signatures are: > > ?subroutine sub0(state, Ndim, Ntrajectories) > ? ?integer(I32), intent(IN) :: Ndim > ? ?integer(I32), intent(IN) :: Ntrajectories > ? ?real(R64), intent(INOUT), dimension(Ndim,Ntrajectories) :: state > ? ?... > ?end subroutine sub0 > > ?subroutine sub1(state,d_i,d_f, Ndim,Ntrajectories) > ? ?integer(4), intent(IN) :: Ndim > ? ?integer(4), intent(IN) :: Ntrajectories > ? ?real(8), intent(INOUT), dimension(Ndim, Ntrajectories) :: state > ? ?real(8), intent(IN) :: d_i > ? ?real(8), intent(IN) :: d_f > ? ?print *, shape(state), Ndim, Ntrajectories > ? ?... > ?end subroutine sub1 > > and I am calling from my script as: > > import mymod > Ndim=24 > Ntrajectories=10 > di=0., df=10. > r= np.zeros((Ndim,Ntrajectories),dtype=np.float64, order='Fortran') > > mymod.sub0(r) > mymod.sub1(r, di, df) > > As it can be seen from the debug output, f2py is checking the arguments for sub0 > but it segfault before checking the args in sub1 (with no very informative > messages). > > It may well be a problem related to theworkings of the routines but they work > when I use them in tests on pure fortran code. Additionally I get a very similar > error message if I call sub0 (mymod.sub0(r)) instead of sub1 (mymod.sub1(r, di, > df)) the second time in the python script. > > Any ideas? Thanks again. 
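A side note on those named kinds, for readers hitting the same wall: f2py does not know what integer(I32) or real(R64) mean unless it is told, which fits George's observation that literal kinds work. One documented way to keep the named kinds is a .f2py_f2cmap file in the directory where f2py is invoked; a minimal sketch, assuming I32/R64 are the only custom kind parameters in the source:

# .f2py_f2cmap -- a plain-text file that f2py reads from the build
# directory. It maps Fortran kind parameters to C types, so the
# generated wrappers use the right sizes instead of guessing.
dict(real=dict(R64='double'),
     integer=dict(I32='int'))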
Juan > > > -------- Original Message -------- > Subject: f2py segfault > Date: Sun, 17 Jan 2010 15:28:23 -0300 > From: Juan > To: scipy-user at scipy.org > > Hi, I don't know if this is the right place (if it is not, please point me in > the right direction). > I am using f2py with some own programs and I am going insane with a segmentation > fault. It is probably a problem in my code but I'd like to know if someone has > any hint > to give me since I've been trying different things for two days already. > > I've got a few routines in fortran with in/out arrays. When I call one of the > routines it works well. The second routine I call crashes the program. I've been > changing routines and it seems that it does not matter with routines I use. > > Basically, the fortran routines have the signature: > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cournape at gmail.com Sun Jan 24 22:31:07 2010 From: cournape at gmail.com (David Cournapeau) Date: Mon, 25 Jan 2010 12:31:07 +0900 Subject: [SciPy-User] ATLAS In-Reply-To: <8e7295c71001231303l62de53e2ubad0a716959b33ca@mail.gmail.com> References: <8e7295c71001231303l62de53e2ubad0a716959b33ca@mail.gmail.com> Message-ID: <5b8d13221001241931gbfde798u4002e7dfd2b15009@mail.gmail.com> On Sun, Jan 24, 2010 at 6:03 AM, gintare statkute wrote: > Hello, > > I can not install ATLAS and would like to ask if i could get from > somebody working and installable version of ATLAS. Just use the one packaged from Debian: apt-get install libatlas-base-dev libatlas3gf-base David From jeremy at jeremysanders.net Mon Jan 25 15:26:21 2010 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Mon, 25 Jan 2010 20:26:21 +0000 Subject: [SciPy-User] ANN: Veusz 1.6 Message-ID: Veusz 1.6 --------- Velvet Ember Under Sky Zenith ----------------------------- http://home.gna.org/veusz/ Veusz is Copyright (C) 2003-2010 Jeremy Sanders Licenced under the GPL (version 2 or greater). Veusz is a Qt4 based scientific plotting package. It is written in Python, using PyQt4 for display and user-interfaces, and numpy for handling the numeric data. Veusz is designed to produce publication-ready Postscript/PDF/SVG output. The user interface aims to be simple, consistent and powerful. Veusz provides a GUI, command line, embedding and scripting interface (based on Python) to its plotting facilities. It also allows for manipulation and editing of datasets. Data can be captured from external sources such as internet sockets or other programs. Changes in 1.6: * User defined constants, functions or external Python imports can be defined for use when evaluating expressions. * Import descriptor is much more tolerant of syntax, e.g. "x,+- y,+,-" can now be specified as "x +- y + -". * New SVG export (PyQt >= 4.6). Supports clipping and exports text as paths for full WYSIWYG. * Dataset names can now contain any character except "`". Names containing non-alphanumeric characters can be quoted in expressions `like so`*1.23 * Widget names can contain any character except "/" * A transparency dataset can be provided to specify the per-pixel transparency of the image widget. * A polygon widget has been added. * There is a new option to place axis ticks outside the plot (outer ticks setting on axis widget) * Several new line styles have been added. * Several new plotting markers have been added. * The capture dialog can optionally retain the last N values captured. 
Minor changes: * Use of flat cap line style for plotting error bars for exactness. * Add fixes for saving imported unicode text. * Fix image colors for big endian systems (e.g. Mac PPC). * Add boxfill error bar style, plotting errors as filled boxes. * Positive and negative error bars are forced to have the correct sign. Features of package: * X-Y plots (with errorbars) * Line and function plots * Contour plots * Images (with colour mappings and colorbars) * Stepped plots (for histograms) * Bar graphs * Plotting dates * Fitting functions to data * Stacked plots and arrays of plots * Plot keys * Plot labels * Shapes and arrows on plots * LaTeX-like formatting for text * EPS/PDF/PNG/SVG/EMF export * Scripting interface * Dataset creation/manipulation * Embed Veusz within other programs * Text, CSV and FITS importing * Data can be captured from external sources Requirements for source install: Python (2.4 or greater required) http://www.python.org/ Qt >= 4.3 (free edition) http://www.trolltech.com/products/qt/ PyQt >= 4.3 (SIP is required to be installed first) http://www.riverbankcomputing.co.uk/pyqt/ http://www.riverbankcomputing.co.uk/sip/ numpy >= 1.0 http://numpy.scipy.org/ Optional: Microsoft Core Fonts (recommended for nice output) http://corefonts.sourceforge.net/ PyFITS >= 1.1 (optional for FITS import) http://www.stsci.edu/resources/software_hardware/pyfits pyemf >= 2.0.0 (optional for EMF export) http://pyemf.sourceforge.net/ For EMF and better SVG export, PyQt >= 4.6 or better is required, to fix a bug in the C++ wrapping For documentation on using Veusz, see the "Documents" directory. The manual is in PDF, HTML and text format (generated from docbook). The examples are also useful documentation. Issues with the current version: * Due to Qt, hatched regions sometimes look rather poor when exported to PostScript, PDF or SVG. * Due to a bug in Qt, some long lines, or using log scales, can lead to very slow plot times under X11. It is fixed by upgrading to Qt-4.5.1 (or using a binary). Switching off antialiasing in the options may help. If you enjoy using Veusz, I would love to hear from you. Please join the mailing lists at https://gna.org/mail/?group=veusz to discuss new features or if you'd like to contribute code. The latest code can always be found in the SVN repository. Jeremy Sanders From dgorman at berkeley.edu Mon Jan 25 17:04:53 2010 From: dgorman at berkeley.edu (Dylan Gorman) Date: Mon, 25 Jan 2010 14:04:53 -0800 Subject: [SciPy-User] [SciPy-user] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> <27282461.post@talk.nabble.com> Message-ID: <42EC8E70-685F-4AB9-9B32-AF83435053C9@berkeley.edu> Joshua, Thanks for the expokit suggestion. I seem to have gotten it installed on my Mac OS X Leopard system. The presentation you linked indicated that I need to change the call to matvec in expokit.f to include n, so I first replaced all instances of 'call matvec(' in expokit.f with 'call matvec(n,', but I'm not sure exactly when this should be done. I then executed: f2py -m expokit -h expokit.pyf expokit.f f2py -c expokit.pyf expokit.f --link-lapack-opt which produced the expokit.so file. However, now I'm trying to reproduce the example given in the presentation you linked, and the call to dmexpv() does not seem to work. 
It wants a lot of arguments: >>> from scipy import * >>> from expokit import dmexpv >>> dmexpv() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: expokit.dmexpv() takes at least 11 arguments (0 given) The most confusing thing it wants me to pass is matvec, which is apparently an external matrix-vector multiplication function. I can't seem to figure out how to get this function to work. The presentation uses a much simpler call: dmexpv(m, t, v, wsp, iwsp, A) Can you offer any insight into this problem? Thank you, Dylan On Jan 22, 2010, at 6:47 PM, Joshua Stults wrote: > On Fri, Jan 22, 2010 at 8:58 PM, Burak1327 > wrote: >> >> >> The simplest approximations are polynomial expansions. >> Chebyshev polynomials are GREAT; even a 2nd-order >> Taylor expansion is good enough in a lot of cases, depending on >> your type of problem. >> >> Which leads to the actual scipy discussion. I'm no scipy expert, but >> the above-mentioned methods are probably in the library. >> > > Here's an example of using f2py to compile expokit (see slides 15 - > 21): > http://sf.anu.edu.au/~mhk900/Python_Workshop/short.pdf > > Expokit website: http://www.maths.uq.edu.au/expokit/ > > Uses Krylov methods for sparse matrices; these will use more memory > than the polynomial expansion methods that Burak mentioned. > > -- > Joshua Stults > Website: http://j-stults.blogspot.com > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From joshua.stults at gmail.com Mon Jan 25 17:18:20 2010 From: joshua.stults at gmail.com (Joshua Stults) Date: Mon, 25 Jan 2010 17:18:20 -0500 Subject: [SciPy-User] [SciPy-user] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: <42EC8E70-685F-4AB9-9B32-AF83435053C9@berkeley.edu> References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> <27282461.post@talk.nabble.com> <42EC8E70-685F-4AB9-9B32-AF83435053C9@berkeley.edu> Message-ID: On Mon, Jan 25, 2010 at 5:04 PM, Dylan Gorman wrote: > Joshua, > > Thanks for the expokit suggestion. I seem to have gotten it installed > on my Mac OS X Leopard system. The presentation you linked indicated > that I need to change the call to matvec in expokit.f to include n, so > I first replaced all instances of 'call matvec(' in expokit.f with > 'call matvec(n,', but I'm not sure exactly when this should be done. I > then executed: > > f2py -m expokit -h expokit.pyf expokit.f > f2py -c expokit.pyf expokit.f --link-lapack-opt > > which produced the expokit.so file. > > However, now I'm trying to reproduce the example given in the > presentation you linked, and the call to dmexpv() does not seem to > work. It wants a lot of arguments: > > >>> from scipy import * > >>> from expokit import dmexpv > >>> dmexpv() > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: expokit.dmexpv() takes at least 11 arguments (0 given) > > The most confusing thing it wants me to pass is matvec, which is > apparently an external matrix-vector multiplication function. I can't > seem to figure out how to get this function to work. The presentation > uses a much simpler call: > > dmexpv(m, t, v, wsp, iwsp, A) > > Can you offer any insight into this problem? > Maybe a little: the Krylov methods will require a function that gives the action of your matrix on a vector, just like a Krylov method for solving a linear system would.
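To make the "action on a vector" point concrete, here is a minimal sketch (not code from this thread) that approximates exp(t*L)*v using only sparse mat-vec products: substepping plus a truncated Taylor series, the crudest member of the polynomial-expansion family Burak mentioned. The function name and the substeps/order defaults are illustrative rather than tuned; the Krylov route in expokit is the robust version of the same idea.

import numpy as np
import scipy.sparse as sp

def expm_taylor_apply(L, v, t, substeps=100, order=4):
    # Approximate exp(t*L)*v without ever forming exp(t*L): split t
    # into small substeps and apply a truncated Taylor series of
    # exp(dt*L) at each substep, using nothing but sparse mat-vecs.
    dt = t / float(substeps)
    w = np.asarray(v, dtype=float)
    for _ in range(substeps):
        term = w.copy()
        total = w.copy()
        for k in range(1, order + 1):
            term = (dt / k) * L.dot(term)   # k-th Taylor term
            total = total + term
        w = total
    return w

# tiny smoke test on a random symmetric sparse matrix
n = 1000
A = sp.rand(n, n, density=1e-3, format='csr')
L = 0.5 * (A + A.T)
v = np.ones(n) / np.sqrt(n)
w = expm_taylor_apply(L, v, 0.1)

Accuracy and stability depend on dt and on the spectrum of L, so the substep count has to shrink as the norm of L grows.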
F2py usually does a pretty good job of generating doc strings that give all the arguments and their dimensions, have you taken a look at print dmexpv.__doc__ ? > Thank you, > Dylan > > On Jan 22, 2010, at 6:47 PM, Joshua Stults wrote: > >> On Fri, Jan 22, 2010 at 8:58 PM, Burak1327 >> wrote: >>> >>> >>> The most simple approximations are polynomial expansions. >>> Chebychev polynomials are GREAT, even a 2nd order >>> Taylor expansion is good enough in a lot of cases, specific to >>> your type of problem. >>> >>> Which leads to actual scipy discussion. I'm no scipy expert, but >>> the above mentioned methods are probably in the library. >>> >> >> Here's an example of using f2py to compile expokit (see slides 15 - >> 21): >> http://sf.anu.edu.au/~mhk900/Python_Workshop/short.pdf >> >> Expokit website: http://www.maths.uq.edu.au/expokit/ >> >> Uses Krylov methods for sparse matrices; these will use more memory >> than the polynomial expansion methods that Burak mentioned. >> -- Joshua Stults Website: http://j-stults.blogspot.com From ggellner at uoguelph.ca Mon Jan 25 22:01:30 2010 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Mon, 25 Jan 2010 22:01:30 -0500 Subject: [SciPy-User] indexing array without changing the ndims Message-ID: I really want an easy way to index an array but not have numpy simplify the shape (if you know R I want their drop=FALSE behavior). np.newaxis only seems useful when you need some simple fixes, I want something that works without having to check the index to figure out which parts of shape need a 1. Any ideas? thanks, Gabriel From warren.weckesser at enthought.com Mon Jan 25 22:08:34 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 25 Jan 2010 21:08:34 -0600 Subject: [SciPy-User] indexing array without changing the ndims In-Reply-To: References: Message-ID: <4B5E5CB2.9010700@enthought.com> Gabriel Gellner wrote: > I really want an easy way to index an array but not have numpy > simplify the shape (if you know R I want their drop=FALSE behavior). > For those of us who aren't familiar with R, could you give a concrete example of what you want to do? Warren > np.newaxis only seems useful when you need some simple fixes, I want > something that works without having to check the index to figure out > which parts of shape need a 1. Any ideas? > > thanks, > Gabriel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From joshua.stults at gmail.com Mon Jan 25 22:09:07 2010 From: joshua.stults at gmail.com (Joshua Stults) Date: Mon, 25 Jan 2010 22:09:07 -0500 Subject: [SciPy-User] [SciPy-user] Matrix Exponentials For Very Large Sparse Matrices In-Reply-To: References: <6F8E0DFF-23B9-443B-BE6E-98B58AE0AC7A@berkeley.edu> <27282461.post@talk.nabble.com> <42EC8E70-685F-4AB9-9B32-AF83435053C9@berkeley.edu> Message-ID: On Mon, Jan 25, 2010 at 5:18 PM, Joshua Stults wrote: > On Mon, Jan 25, 2010 at 5:04 PM, Dylan Gorman wrote: >> Joshua, >> >> Thanks for the expokit suggestion. I seem to have gotten it installed >> on my Mac OS X Leopard system. The presentation you linked indicated >> that I need to change the call to matvec in expokit.f to include n, so >> I first replaced all instances of 'call matvec(' in expokit.f with >> 'call matvec(n,', but I'm not sure exactly when this should be done. 
I >> then executed: The expokit site also gives some tips about defining 'matvecs': http://www.maths.uq.edu.au/expokit/support.html >> >> f2py -m expokit -h expokit.pyf expokit.f >> f2py -c expokit.pyf expokit.f --link-lapack-opt >> >> which produced the expokit.so file. >> >> However, now I'm trying to reproduce the example given in the >> presentation you linked, and the call to dmexpv() does not seem to >> work. It wants a lot of arguments: >> >> ?>>> from scipy import * >> ?>>> from expokit import dmexpv >> ?>>> dmexpv() >> Traceback (most recent call last): >> ? File "", line 1, in >> TypeError: expokit.dmexpv() takes at least 11 arguments (0 given) >> >> The most confusing thing it wants me to pass is matvec, which is >> apparently an external matrix-vector multiplication function. I can't >> seem to figure out how to get this function to work. Tthe presentation >> uses a much simpler call: >> >> dmexpv(m, t, v, wsp, iwsp, A) >> >> Can you offer any insight into this problem? >> > > Maybe a little, the Krylov methods will require a function that gives > the action of your matrix on a vector, just like a Krylov method for > solving a linear system would. ?F2py usually does a pretty good job of > generating doc strings that give all the arguments and their > dimensions, have you taken a look at > > print dmexpv.__doc__ > > ? > >> Thank you, >> Dylan >> >> On Jan 22, 2010, at 6:47 PM, Joshua Stults wrote: >> >>> On Fri, Jan 22, 2010 at 8:58 PM, Burak1327 >>> wrote: >>>> >>>> >>>> The most simple approximations are polynomial expansions. >>>> Chebychev polynomials are GREAT, even a 2nd order >>>> Taylor expansion is good enough in a lot of cases, specific to >>>> your type of problem. >>>> >>>> Which leads to actual scipy discussion. I'm no scipy expert, but >>>> the above mentioned methods are probably in the library. >>>> >>> >>> Here's an example of using f2py to compile expokit (see slides 15 - >>> 21): >>> http://sf.anu.edu.au/~mhk900/Python_Workshop/short.pdf >>> >>> Expokit website: http://www.maths.uq.edu.au/expokit/ >>> >>> Uses Krylov methods for sparse matrices; these will use more memory >>> than the polynomial expansion methods that Burak mentioned. >>> > > > -- > Joshua Stults > Website: http://j-stults.blogspot.com > -- Joshua Stults Website: http://j-stults.blogspot.com From ggellner at uoguelph.ca Tue Jan 26 01:39:55 2010 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Tue, 26 Jan 2010 01:39:55 -0500 Subject: [SciPy-User] indexing array without changing the ndims In-Reply-To: <4B5E5CB2.9010700@enthought.com> References: <4B5E5CB2.9010700@enthought.com> Message-ID: On Mon, Jan 25, 2010 at 10:08 PM, Warren Weckesser wrote: > Gabriel Gellner wrote: >> I really want an easy way to index an array but not have numpy >> simplify the shape (if you know R I want their drop=FALSE behavior). >> > > For those of us who aren't familiar with R, could you give a concrete > example of what you want to do? > It would be the similar to what the numpy.matrix class does, namely when you use an index like `mat[0, :]` you still have ndims == 2 (a column matrix in this case). So I want this behavior for an ndarray so I could be certain that if I do any indexing the ndims of the returned array is the same as the original array. In R any array can be indexed with an extra keyword argument drop=FALSE to give this behavior so for the above I would have `mat[0, :, drop=False]` (in pretend python notation, in R we would write mat[1,, drop=F]) and it would do the right thing. 
An even more extreme example would be to do something like `zeros((3, 3, 3))[0, 0, 0, drop=False]` (In R `array(0, c(3, 3, 3))[1, 1, 1, drop=F]`) which would return an array with shape == (1, 1, 1) instead of (). Now this drop notation is just for explanation, I know it is not possible in python, but I was hoping their is some equivalent way of getting this nice behavior. Looking at the matrix source code suggest this is not the case and it needs to be coded by hand, I was hoping this is not the case! thanks, Gabriel From josef.pktd at gmail.com Tue Jan 26 01:59:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 26 Jan 2010 01:59:46 -0500 Subject: [SciPy-User] indexing array without changing the ndims In-Reply-To: References: <4B5E5CB2.9010700@enthought.com> Message-ID: <1cd32cbb1001252259g189ba814v6ea2908a9bb9fac2@mail.gmail.com> On Tue, Jan 26, 2010 at 1:39 AM, Gabriel Gellner wrote: > On Mon, Jan 25, 2010 at 10:08 PM, Warren Weckesser > wrote: >> Gabriel Gellner wrote: >>> I really want an easy way to index an array but not have numpy >>> simplify the shape (if you know R I want their drop=FALSE behavior). >>> >> >> For those of us who aren't familiar with R, could you give a concrete >> example of what you want to do? >> > It would be the similar to what the numpy.matrix class does, namely > when you use an index like > `mat[0, :]` you still have ndims == 2 (a column matrix in this case). > So I want this behavior for an ndarray so I could be certain that if I > do any indexing the ndims of the returned array is the same as the > original array. > > In R any array can be indexed with an extra keyword argument > drop=FALSE to give this behavior so for the above I would have > `mat[0, :, drop=False]` (in pretend python notation, in R we would > write mat[1,, drop=F]) and it would do the right thing. An even more > extreme example would be to do something like > > `zeros((3, 3, 3))[0, 0, 0, drop=False]` (In R `array(0, c(3, 3, 3))[1, > 1, 1, drop=F]`) which would return an array with shape == (1, 1, 1) > instead of (). > Now this drop notation is just for explanation, I know it is not > possible in python, but I was hoping their is some equivalent way of > getting this nice behavior. Looking at the matrix source code suggest > this is not the case and it needs to be coded by hand, I was hoping > this is not the case! I use slice indices to keep (or np.newaxis, or np.expand_dims to add the dropped dim back in) >>> i,j,k = 0,3,2; np.arange(2*3*4).reshape(2,3,4)[i:i+1, j:j+1, k:k+1] array([], shape=(1, 0, 1), dtype=int32) >>> i,j,k = 0,2,2; np.arange(2*3*4).reshape(2,3,4)[i:i+1, j:j+1, k:k+1] array([[[10]]]) >>> _.shape (1, 1, 1) Josef > > thanks, > Gabriel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From warren.weckesser at enthought.com Tue Jan 26 02:00:42 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 26 Jan 2010 01:00:42 -0600 Subject: [SciPy-User] indexing array without changing the ndims In-Reply-To: References: <4B5E5CB2.9010700@enthought.com> Message-ID: <4B5E931A.2050209@enthought.com> Gabriel Gellner wrote: > On Mon, Jan 25, 2010 at 10:08 PM, Warren Weckesser > wrote: > >> Gabriel Gellner wrote: >> >>> I really want an easy way to index an array but not have numpy >>> simplify the shape (if you know R I want their drop=FALSE behavior). 
>>> >>> >> For those of us who aren't familiar with R, could you give a concrete >> example of what you want to do? >> >> > It would be the similar to what the numpy.matrix class does, namely > when you use an index like > `mat[0, :]` you still have ndims == 2 (a column matrix in this case). > So I want this behavior for an ndarray so I could be certain that if I > do any indexing the ndims of the returned array is the same as the > original array. > One way you could do this is to always use a slice instead of a single number as the index: mat[0:1, :]. Or as in this example, where a[:, 1:2] pulls out the second column as a 2D numpy array with shape (3,1): ----- In [1]: import numpy as np In [2]: a = np.arange(12).reshape(3,4) In [3]: a Out[3]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) In [4]: a[:,1] # a 1D slice, but not what you want. Out[4]: array([1, 5, 9]) In [5]: a[:,1:2] # a slice with shape (3,1) Out[5]: array([[1], [5], [9]]) ----- Warren > In R any array can be indexed with an extra keyword argument > drop=FALSE to give this behavior so for the above I would have > `mat[0, :, drop=False]` (in pretend python notation, in R we would > write mat[1,, drop=F]) and it would do the right thing. An even more > extreme example would be to do something like > > `zeros((3, 3, 3))[0, 0, 0, drop=False]` (In R `array(0, c(3, 3, 3))[1, > 1, 1, drop=F]`) which would return an array with shape == (1, 1, 1) > instead of (). > Now this drop notation is just for explanation, I know it is not > possible in python, but I was hoping their is some equivalent way of > getting this nice behavior. Looking at the matrix source code suggest > this is not the case and it needs to be coded by hand, I was hoping > this is not the case! > > thanks, > Gabriel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From massimodisasha at yahoo.it Tue Jan 26 05:15:10 2010 From: massimodisasha at yahoo.it (Massimo Di Stefano) Date: Tue, 26 Jan 2010 11:15:10 +0100 Subject: [SciPy-User] find intervall in array Message-ID: Hi All, I have an Nx2 array where : - first column = integer values sorted from min to max - second column = float values (range 0 - 1) an example array can look like : from numpy import zeros, array import random a = zeros((100,2),float) for i in range(100): a[i,0] = random.randrange(1000,2000,10) a[i,1] = random.random() the first column represents "Z" (elevation values) the second column represents a percentage, 0 = 0% , 1 = 100% (pixel % coverage in a map at a given Z) I need to detect the Z value corresponding to a precise percentage (25%, 50%, 75%) The Z value I need to find is derived from a formula like : z = z1 + ((z2 - z1) / (f2 - f1)) * (f - f1) where : f = precise percentage (known value) -> (0.25, 0.50, 0.75) [this value may not be present in the a[i,1] array] f1, f2 = the a[i,1] values near the " f " value, where [ f1 <= f <= f2 ] z1, z2 = the Z values corresponding to f1, f2 as example : array a = z f 1234 0.03 1345 0.58 1456 0.24 1457 0.63 1458 0.41 1459 0.78 1365 0.7 1468 0.56 1545 0.54 if f = 0.5 : z = 1545 + ((1468 - 1545) / (0.56 - 0.54)) * (0.5 - 0.54) any suggestion on how I can find "f1 , f2" ? thanks!!! Massimo.
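A minimal sketch of one way to get the bracketing pair with numpy (not from the thread itself; it assumes the rows are first sorted on the f column, and that f falls strictly inside the sorted f range so that ind-1 and ind are both valid indices). The reply below arrives at the same searchsorted idea.

import numpy as np

# the example array from the question: column 0 = z, column 1 = f
a = np.array([[1234, 0.03], [1345, 0.58], [1456, 0.24],
              [1457, 0.63], [1458, 0.41], [1459, 0.78],
              [1365, 0.70], [1468, 0.56], [1545, 0.54]])

s = a[np.argsort(a[:, 1])]        # sort rows on the f column
f_col, z_col = s[:, 1], s[:, 0]

f = 0.5
ind = np.searchsorted(f_col, f)   # f_col[ind-1] < f <= f_col[ind]
f1, f2 = f_col[ind - 1], f_col[ind]
z1, z2 = z_col[ind - 1], z_col[ind]
z = z1 + (z2 - z1) / (f2 - f1) * (f - f1)

# np.interp does the same linear interpolation in one call
assert abs(z - np.interp(f, f_col, z_col)) < 1e-12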
From josef.pktd at gmail.com Tue Jan 26 05:31:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 26 Jan 2010 05:31:31 -0500 Subject: [SciPy-User] find intervall in array In-Reply-To: References: Message-ID: <1cd32cbb1001260231y30eba903u9eebee909c29fc43@mail.gmail.com> On Tue, Jan 26, 2010 at 5:15 AM, Massimo Di Stefano wrote: > Hi All, > > I have an Nx2 array where : > > - first column = integer values sorted from min to max > - second column = float values (range 0 - 1) > > an example array can look like : > > from numpy import zeros, array > import random > a = zeros((100,2),float) > for i in range(100): > a[i,0] = random.randrange(1000,2000,10) > a[i,1] = random.random() > > > > the first column represents "Z" (elevation values) > the second column represents a percentage, 0 = 0% , 1 = 100% (pixel % coverage in a map at a given Z) > > I need to detect the Z value corresponding to a precise percentage (25%, 50%, 75%) > > The Z value I need to find is derived from a formula like : > > z = z1 + ((z2 - z1) / (f2 - f1)) * (f - f1) > > where : > > f = precise percentage (known value) -> (0.25, 0.50, 0.75) [this value may not be present in the a[i,1] array] > > f1, f2 = the a[i,1] values near the " f " value, where [ f1 <= f <= f2 ] > z1, z2 = the Z values corresponding to f1, f2 > > as example : > > array a = > > z f > 1234 0.03 > 1345 0.58 > 1456 0.24 > 1457 0.63 > 1458 0.41 > 1459 0.78 > 1365 0.7 > 1468 0.56 > 1545 0.54 > > if > f = 0.5 : > z = 1545 + ((1468 - 1545) / (0.56 - 0.54)) * (0.5 - 0.54) > > any suggestion on how I can find "f1 , f2" ? > > > thanks!!! > > Massimo. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > maybe something like : ind = np.searchsorted(a[:,1], f) f1,f2 = a[[ind, ind+1], 1] after fixing the indexing. Josef From ggellner at uoguelph.ca Tue Jan 26 07:14:27 2010 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Tue, 26 Jan 2010 07:14:27 -0500 Subject: [SciPy-User] indexing array without changing the ndims In-Reply-To: References: <4B5E5CB2.9010700@enthought.com> Message-ID: Perfect! I always wondered what the use for i:i+1 slices is! It will take a little bit of thinking on how to use this for an arbitrary key, but you guys rock! Thanks. Gabriel On Tue, Jan 26, 2010 at 1:39 AM, Gabriel Gellner wrote: > On Mon, Jan 25, 2010 at 10:08 PM, Warren Weckesser > wrote: >> Gabriel Gellner wrote: >>> I really want an easy way to index an array but not have numpy >>> simplify the shape (if you know R I want their drop=FALSE behavior). >>> >> >> For those of us who aren't familiar with R, could you give a concrete >> example of what you want to do? >> > It would be the similar to what the numpy.matrix class does, namely > when you use an index like > `mat[0, :]` you still have ndims == 2 (a column matrix in this case). > So I want this behavior for an ndarray so I could be certain that if I > do any indexing the ndims of the returned array is the same as the > original array. > > In R any array can be indexed with an extra keyword argument > drop=FALSE to give this behavior so for the above I would have > `mat[0, :, drop=False]` (in pretend python notation, in R we would > write mat[1,, drop=F]) and it would do the right thing.
An even more > extreme example would be to do something like > > `zeros((3, 3, 3))[0, 0, 0, drop=False]` (In R `array(0, c(3, 3, 3))[1, > 1, 1, drop=F]`) which would return an array with shape == (1, 1, 1) > instead of (). > Now this drop notation is just for explanation, I know it is not > possible in python, but I was hoping their is some equivalent way of > getting this nice behavior. Looking at the matrix source code suggest > this is not the case and it needs to be coded by hand, I was hoping > this is not the case! > > thanks, > Gabriel > From eadrogue at gmx.net Tue Jan 26 08:36:08 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 26 Jan 2010 14:36:08 +0100 Subject: [SciPy-User] ATLAS In-Reply-To: <5b8d13221001241931gbfde798u4002e7dfd2b15009@mail.gmail.com> References: <8e7295c71001231303l62de53e2ubad0a716959b33ca@mail.gmail.com> <5b8d13221001241931gbfde798u4002e7dfd2b15009@mail.gmail.com> Message-ID: <20100126133608.GB7938@doriath.local> 25/01/10 @ 12:31 (+0900), thus spake David Cournapeau: > On Sun, Jan 24, 2010 at 6:03 AM, gintare statkute wrote: > > Hello, > > > > I can not install ATLAS and would like to ask if i could get from > > somebody working and installable version of ATLAS. > > Just use the one packaged from Debian: apt-get install > libatlas-base-dev libatlas3gf-base Last time I checked (about 2 weeks ago) it crashed with a segmentation fault while running numpy's tests. Bye. Ernest From cournape at gmail.com Tue Jan 26 20:23:57 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 27 Jan 2010 10:23:57 +0900 Subject: [SciPy-User] ATLAS In-Reply-To: <20100126133608.GB7938@doriath.local> References: <8e7295c71001231303l62de53e2ubad0a716959b33ca@mail.gmail.com> <5b8d13221001241931gbfde798u4002e7dfd2b15009@mail.gmail.com> <20100126133608.GB7938@doriath.local> Message-ID: <5b8d13221001261723h24f364f6m84619548deada8f3@mail.gmail.com> 2010/1/26 Ernest Adrogu? : > 25/01/10 @ 12:31 (+0900), thus spake David Cournapeau: >> On Sun, Jan 24, 2010 at 6:03 AM, gintare statkute wrote: >> > Hello, >> > >> > I can not install ATLAS and would like to ask if i could get from >> > somebody working and installable version of ATLAS. >> >> Just use the one packaged from Debian: apt-get install >> libatlas-base-dev libatlas3gf-base > > Last time I checked (about 2 weeks ago) it crashed with a segmentation > fault while running numpy's tests. Then please file a bug report with the exact crash output. David From cournape at gmail.com Tue Jan 26 20:31:41 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 27 Jan 2010 10:31:41 +0900 Subject: [SciPy-User] Fwd: ATLAS, addition In-Reply-To: <8e7295c71001240122w2f6dbc12h812ff539ad0a7a51@mail.gmail.com> References: <8e7295c71001240122w2f6dbc12h812ff539ad0a7a51@mail.gmail.com> Message-ID: <5b8d13221001261731u3464de8bw71f04b98b496d10c@mail.gmail.com> On Sun, Jan 24, 2010 at 6:22 PM, gintare statkute wrote: > Hello, > > 3) The same message. 0: NFLOP=0, tim=0.000000 > and endless instaliation happenes with *.tar.gz from > http://packages.debian.org/source/stable/atlas Just install the binary package, using apt-get as said in my previous email. Do not try to build from the sources, especially the patched debian sources (which should not be built using make but with dpkg-buildpackage instead). 
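For anyone following the ATLAS thread who wants to confirm which BLAS/LAPACK their numpy actually linked against before blaming the packages, a quick check; both calls below are standard numpy:

import numpy as np

np.show_config()   # prints the BLAS/LAPACK sections found at build time

# small smoke test that exercises the linked LAPACK
a = np.random.rand(500, 500)
m = np.dot(a, a.T) + 500 * np.eye(500)   # symmetric positive definite
np.linalg.cholesky(m)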
David From markus.proeller at ifm.com Wed Jan 27 02:59:44 2010 From: markus.proeller at ifm.com (markus.proeller at ifm.com) Date: Wed, 27 Jan 2010 08:59:44 +0100 Subject: [SciPy-User] error import scipy.spatial Message-ID: Hi, I get an error message when importing scipy.spatial: >>> import scipy.spatial Traceback (most recent call last): File "", line 1, in File "C:\Python26\lib\site-packages\scipy\spatial\__init__.py", line 7, in from ckdtree import * File "numpy.pxd", line 30, in scipy.spatial.ckdtree (scipy\spatial\ckdtree.c:6087) ValueError: numpy.dtype does not appear to be the correct type object when I try to import it again, it seems to work: >>> import scipy.spatial >>> scipy.spatial.distance.cdist(array([[0,0,0]]),array([[1,1,1]])) array([[ 1.73205081]]) Any idea? Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Wed Jan 27 03:36:15 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 27 Jan 2010 09:36:15 +0100 Subject: [SciPy-User] good LGPL/BSD licensed NLP solvers? Message-ID: I'm looking for a good SQP NLP solver that can solve problems of the form: min_x f(x) s.t. 0 <= h(x) 0 = g(x) L <= x <= U I've been using SNOPT so far but it would be nice if I wouldn't have to rely on proprietary software. Does anyone know a good LGPL or BSD licensed SQP solver? regards, Sebastian From josef.pktd at gmail.com Wed Jan 27 10:45:47 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 27 Jan 2010 10:45:47 -0500 Subject: [SciPy-User] error import scipy.spatial In-Reply-To: References: Message-ID: <1cd32cbb1001270745t117e3a9dh440c1c4d9de8770a@mail.gmail.com> On Wed, Jan 27, 2010 at 2:59 AM, wrote: > > Hi, > > I get an error message when importing scipy.spatial: > >>>> import scipy.spatial > Traceback (most recent call last): > ? File "", line 1, in > ? File "C:\Python26\lib\site-packages\scipy\spatial\__init__.py", line 7, in > > ? ? from ckdtree import * > ? File "numpy.pxd", line 30, in scipy.spatial.ckdtree > (scipy\spatial\ckdtree.c:6087) > ValueError: numpy.dtype does not appear to be the correct type object > > when I try to import it again, it seems to work: >>>> import scipy.spatial >>>> scipy.spatial.distance.cdist(array([[0,0,0]]),array([[1,1,1]])) > array([[ 1.73205081]]) > > Any idea? scipy 0.7.x has binary incompatibility problems if it has been compiled against numpy 1.3 and is run against numpy 1.4 There are 3 options, either you recompile scipy against numpy 1.4., or downgrade to numpy 1.3 until numpy 1.4 compatible scipy binaries are available, or hope that you don't run into a case where python crashes. More details are in several threads on the mailing lists. Josef > > Markus > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From eadrogue at gmx.net Wed Jan 27 13:46:32 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Wed, 27 Jan 2010 19:46:32 +0100 Subject: [SciPy-User] ATLAS In-Reply-To: <5b8d13221001261723h24f364f6m84619548deada8f3@mail.gmail.com> References: <8e7295c71001231303l62de53e2ubad0a716959b33ca@mail.gmail.com> <5b8d13221001241931gbfde798u4002e7dfd2b15009@mail.gmail.com> <20100126133608.GB7938@doriath.local> <5b8d13221001261723h24f364f6m84619548deada8f3@mail.gmail.com> Message-ID: <20100127184632.GA3882@doriath.local> 27/01/10 @ 10:23 (+0900), thus spake David Cournapeau: > 2010/1/26 Ernest Adrogu? 
: > > 25/01/10 @ 12:31 (+0900), thus spake David Cournapeau: > >> On Sun, Jan 24, 2010 at 6:03 AM, gintare statkute wrote: > >> > Hello, > >> > > >> > I can not install ATLAS and would like to ask if i could get from > >> > somebody working and installable version of ATLAS. > >> > >> Just use the one packaged from Debian: apt-get install > >> libatlas-base-dev libatlas3gf-base > > > > Last time I checked (about 2 weeks ago) it crashed with a segmentation > > fault while running numpy's tests. > > Then please file a bug report with the exact crash output. It's already been reported, here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=544274 I was wrong about being a segfault, it simply hangs with a *** glibc detected *** message. It happens with Python 2.5 and 2.6, and Numpy 1.3.0. I have not checked other versions. Bye. Ernest From ferrell at diablotech.com Wed Jan 27 16:18:09 2010 From: ferrell at diablotech.com (Robert Ferrell) Date: Wed, 27 Jan 2010 14:18:09 -0700 Subject: [SciPy-User] find particular day_of_week Message-ID: I have a time series, I want to operate (read only) on all the items with a particular day_of_week. What is an efficient way to get at those? I've tried: desiredDates = [dt for dt in myTS.dates if dt.day_of_week == day_of_week] desiredSeries = myTS[desiredDates] but that seems quite slow. The second line is the one which is taking all the time. Is there a faster way? I don't need a copy, just a view, if that helps. thanks, -robert From josef.pktd at gmail.com Wed Jan 27 16:25:45 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 27 Jan 2010 16:25:45 -0500 Subject: [SciPy-User] find particular day_of_week In-Reply-To: References: Message-ID: <1cd32cbb1001271325g45277f8ak6bad918c2080a0a9@mail.gmail.com> On Wed, Jan 27, 2010 at 4:18 PM, Robert Ferrell wrote: > I have a time series, I want to operate (read only) on all the items > with a particular day_of_week. ?What is an efficient way to get at > those? > > I've tried: > > ? ? ? ? ? ? ? ?desiredDates = [dt for dt in myTS.dates if dt.day_of_week == > day_of_week] > > ? ? ? ? ? ? ? ? desiredSeries = myTS[desiredDates] > > but that seems quite slow. ?The second line is the one which is taking > all the time. ?Is there a faster way? ?I don't need a copy, just a > view, if that helps. I think numpy-discussion jan 20th "dates, np.where finding months" answers this. Josef > > thanks, > -robert > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ferrell at diablotech.com Wed Jan 27 16:37:46 2010 From: ferrell at diablotech.com (Robert Ferrell) Date: Wed, 27 Jan 2010 14:37:46 -0700 Subject: [SciPy-User] find particular day_of_week In-Reply-To: <1cd32cbb1001271325g45277f8ak6bad918c2080a0a9@mail.gmail.com> References: <1cd32cbb1001271325g45277f8ak6bad918c2080a0a9@mail.gmail.com> Message-ID: <31FF4CE5-5DDE-49C6-AC74-D5427B91BC9A@diablotech.com> On Jan 27, 2010, at 2:25 PM, josef.pktd at gmail.com wrote: > On Wed, Jan 27, 2010 at 4:18 PM, Robert Ferrell > wrote: >> I have a time series, I want to operate (read only) on all the items >> with a particular day_of_week. What is an efficient way to get at >> those? >> >> I've tried: >> >> desiredDates = [dt for dt in myTS.dates if >> dt.day_of_week == >> day_of_week] >> >> desiredSeries = myTS[desiredDates] >> >> but that seems quite slow. The second line is the one which is >> taking >> all the time. Is there a faster way? 
I don't need a copy, just a >> view, if that helps. > > I think numpy-discussion jan 20th "dates, np.where finding months" > answers this. > You're right, of course. I thought this sounded familiar, but I couldn't think of the right search terms. Luckily, I came up with a solution I like even better just after I sent this email: desiredSeries = myTS[myTS.day_of_week == day_of_week] -robert From pgmdevlist at gmail.com Wed Jan 27 16:42:09 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 Jan 2010 16:42:09 -0500 Subject: [SciPy-User] find particular day_of_week In-Reply-To: References: Message-ID: On Jan 27, 2010, at 4:18 PM, Robert Ferrell wrote: > I have a time series, I want to operate (read only) on all the items > with a particular day_of_week. What is an efficient way to get at > those? >>> import scikits.timeseries as ts >>> s = ts.time_series(np.arange(365), start_date="2010-01-01", freq="D") >>> s[s.day_of_week == 1] That'll return you the points falling on Tuesdays (Mon=0, Sun=6) From ferrell at diablotech.com Wed Jan 27 16:48:31 2010 From: ferrell at diablotech.com (Robert Ferrell) Date: Wed, 27 Jan 2010 14:48:31 -0700 Subject: [SciPy-User] find particular day_of_week In-Reply-To: References: Message-ID: <875BD29B-CAA9-4FE9-B987-6478A9754C0B@diablotech.com> On Jan 27, 2010, at 2:42 PM, Pierre GM wrote: > On Jan 27, 2010, at 4:18 PM, Robert Ferrell wrote: >> I have a time series, I want to operate (read only) on all the items >> with a particular day_of_week. What is an efficient way to get at >> those? > > >>>> import scikits.timeseries as ts >>>> s = ts.time_series(np.arange(365), start_date="2010-01-01", >>>> freq="D") >>>> s[s.day_of_week == 1] > That'll return you the points falling on Tuesdays (Mon=0, Sun=6) Thanks. That's the solution I eventually found. I need to remember to look for the cool stuff in time series first, since it's usually in there. Fast, and easy to understand, too. thanks, -robert From kwgoodman at gmail.com Wed Jan 27 21:10:43 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jan 2010 18:10:43 -0800 Subject: [SciPy-User] [ANN] New open source project for labeled arrays Message-ID: I recently opened sourced one of my packages. It is a labeled array that I call larry. A two-dimensional larry, for example, contains a 2d NumPy array with labels on each row and column. A larry can have any dimension. Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys. larry has built-in methods such as movingsum, ranking, merge, shuffle, zscore, demean, lag as well as typical NumPy methods like sum, max, std, sign, clip. NaNs are treated as missing data. You can archive larrys in HDF5 format using save and load or using a dictionary-like interface. I'm working towards a 0.1 release. In the meantime, comments, suggestions, critiques are all appreciated. To use larry you need Python and NumPy 1.4 or newer. To save and load larrys in HDF5 format, you need h5py with HDF5 1.8. larry currently contains no extensions, just Python code, so there is nothing to compile. Just save the la package and make sure Python can find it. 
docs http://larry.sourceforge.net code https://launchpad.net/larry From wesmckinn at gmail.com Wed Jan 27 21:33:32 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jan 2010 21:33:32 -0500 Subject: [SciPy-User] [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: <6c476c8a1001271833m331828a1sfde1c8fe27a67ea6@mail.gmail.com> On Wed, Jan 27, 2010 at 9:10 PM, Keith Goodman wrote: > I recently opened sourced one of my packages. It is a labeled array > that I call larry. > > A two-dimensional larry, for example, contains a 2d NumPy array with > labels on each row and column. A larry can have any dimension. > > Alignment by label is automatic when you add (or subtract, multiply, > divide) two larrys. > > larry has built-in methods such as movingsum, ranking, merge, shuffle, > zscore, demean, lag as well as typical NumPy methods like sum, max, > std, sign, clip. NaNs are treated as missing data. > > You can archive larrys in HDF5 format using save and load or using a > dictionary-like interface. > > I'm working towards a 0.1 release. In the meantime, comments, > suggestions, critiques are all appreciated. > > To use larry you need Python and NumPy 1.4 or newer. To save and load > larrys in HDF5 format, you need h5py with HDF5 1.8. > > larry currently contains no extensions, just Python code, so there is > nothing to compile. Just save the la package and make sure Python can > find it. > > docs ?http://larry.sourceforge.net > code ?https://launchpad.net/larry > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Cool! Thanks for releasing. Looks like you're solving some similar problems to the ones I built pandas for (http://pandas.sourceforge.net). I'll have to have a closer look at the implementation to see if there are some design commonalities we can benefit from. - Wes From kwgoodman at gmail.com Wed Jan 27 21:57:41 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jan 2010 18:57:41 -0800 Subject: [SciPy-User] [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: <6c476c8a1001271833m331828a1sfde1c8fe27a67ea6@mail.gmail.com> References: <6c476c8a1001271833m331828a1sfde1c8fe27a67ea6@mail.gmail.com> Message-ID: On Wed, Jan 27, 2010 at 6:33 PM, Wes McKinney wrote: > On Wed, Jan 27, 2010 at 9:10 PM, Keith Goodman wrote: >> I recently opened sourced one of my packages. It is a labeled array >> that I call larry. >> >> A two-dimensional larry, for example, contains a 2d NumPy array with >> labels on each row and column. A larry can have any dimension. >> >> Alignment by label is automatic when you add (or subtract, multiply, >> divide) two larrys. >> >> larry has built-in methods such as movingsum, ranking, merge, shuffle, >> zscore, demean, lag as well as typical NumPy methods like sum, max, >> std, sign, clip. NaNs are treated as missing data. >> >> You can archive larrys in HDF5 format using save and load or using a >> dictionary-like interface. >> >> I'm working towards a 0.1 release. In the meantime, comments, >> suggestions, critiques are all appreciated. >> >> To use larry you need Python and NumPy 1.4 or newer. To save and load >> larrys in HDF5 format, you need h5py with HDF5 1.8. >> >> larry currently contains no extensions, just Python code, so there is >> nothing to compile. Just save the la package and make sure Python can >> find it. 
>> >> docs ?http://larry.sourceforge.net >> code ?https://launchpad.net/larry >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Cool! Thanks for releasing. > > Looks like you're solving some similar problems to the ones I built > pandas for (http://pandas.sourceforge.net). I'll have to have a closer > look at the implementation to see if there are some design > commonalities we can benefit from. Yes, I hope we have some overlap so that we can share code. As far as design goes, larry contains a Numpy array for the data and a list of lists (one list for each dimension) for the labels. Most of the larry methods have underlying Numpy array functions that could easily be used by other projects. There are also functions for repacking HDF5 archives and for creating intermediate HDF5 Groups when saving a Dataset inside nested Groups. All this is transparent to the user but hopefully useful for other projects. From denis-bz-gg at t-online.de Thu Jan 28 07:04:50 2010 From: denis-bz-gg at t-online.de (denis) Date: Thu, 28 Jan 2010 04:04:50 -0800 (PST) Subject: [SciPy-User] interp1d and out of bounds values In-Reply-To: <0C3454D1-56E0-4D6F-B556-11367E2519F2@gmail.com> References: <0C3454D1-56E0-4D6F-B556-11367E2519F2@gmail.com> Message-ID: On Jan 22, 6:17?pm, Thomas Robitaille wrote: > Hello, > > I've been using scipy.interpolate.interp1d to interpolate values in a number of different projects. However, something I often need is the following: if the interpolating function is defined as f(x) from xmin to xmax, if I specify an x value smaller than xmin, I would like the value set to f(xmin), and if the value is above xmax, I would like the value set to xmax. Thomas, interp1d( np.clip(x, xmin, xmax), y ) ? cheers -- denis From denis-bz-gg at t-online.de Thu Jan 28 09:20:20 2010 From: denis-bz-gg at t-online.de (denis) Date: Thu, 28 Jan 2010 06:20:20 -0800 (PST) Subject: [SciPy-User] Splines in scipy.signal vs scipy.interpolation In-Reply-To: <9AF13441-AFE5-4568-9438-4E98D6E99EDF@mit.edu> References: <9AF13441-AFE5-4568-9438-4E98D6E99EDF@mit.edu> Message-ID: <8b9578e2-7308-4c8b-95cf-57bb49572029@v25g2000yqk.googlegroups.com> On Jan 20, 11:56?pm, Tony S Yu wrote: > I'm having trouble making splines from scipy.signal work with those in scipy.interpolation. > > Both packages have functions for creating (`signal.cspline1d`/`interpolate.splrep`) and evaluating (`signal.cspline1d_eval`/`interpolate.splev`) splines. There are, of course, huge differences between these functions, which is why I'm trying to get them to talk to each other. > > In particular, I'd like to create a smoothing spline using `cspline1d` (which allows easier smoothing) and evaluate using `splev` (which allows me to get derivatives of the spline). Tony, bouncing between two murky packages doesn't sound as though it'll converge ... interpolate though has both smoothing and derivs -- interpolator = interpolate.UnivariateSpline( x, y, k=3, s=s ) # s=0 interpolates yy = interpolator( xx ) y1 = interpolator( xx, 1 ) # deriv Just curious, are your real knots uniform, how many ? See also http://projects.scipy.org/scipy/ticket/864 "The documentation for class scipy.interpolate.UnivariateSpline? is misleading, and maybe completely wrong. UnivariateSpline? behaves in ways that are unpredictable ... (Fitpack is just a big dense package => big dense doc.) 
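A runnable version of that UnivariateSpline recipe, with made-up noisy data (s trades smoothness against fidelity; s=0 interpolates):

import numpy as np
from scipy import interpolate

x = np.linspace(0, 2*np.pi, 50)
y = np.sin(x) + 0.1*np.random.randn(50)

# s=0 interpolates; larger s gives a smoother curve.
spline = interpolate.UnivariateSpline(x, y, k=3, s=0.5)

xx = np.linspace(0, 2*np.pi, 200)
yy = spline(xx)      # smoothed values
y1 = spline(xx, 1)   # first derivative, as in interpolator(xx, 1) above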
cheers -- denis From gokhansever at gmail.com Thu Jan 28 12:36:14 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 28 Jan 2010 11:36:14 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats Message-ID: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> Hello, Could someone explain to me why doesn't scipy explicitly use the location and scaling parameters representing its PDF? From http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html#scipy.stats.gamma gamma.pdf(x,a) = x**(a-1)*exp(-x)/gamma(a) for x >= 0, a > 0. Gamma is represented with two or three parameters; as the function prototype shows correctly. Also both GSL and R is fine representing the Gamma PDF with two parameters: http://www.gnu.org/software/gsl/manual/html_node/The-Gamma-Distribution.html http://stat.ethz.ch/R-manual/R-patched/library/stats/html/GammaDist.html Should these instances need to be updated? PS: I am wrapping the distribution definitions given in the GSL library for SAGE. While doing this I want to make sure my understanding and use of the terms and concepts are correct. Thanks. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 28 12:45:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 12:45:01 -0500 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> Message-ID: <1cd32cbb1001280945u785c2f55ud450d567088f8281@mail.gmail.com> On Thu, Jan 28, 2010 at 12:36 PM, G?khan Sever wrote: > Hello, > > Could someone explain to me why doesn't scipy explicitly use the location > and scaling parameters representing its PDF? > > From > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html#scipy.stats.gamma > gamma.pdf(x,a) = x**(a-1)*exp(-x)/gamma(a) for x >= 0, a > 0. > > Gamma is represented with two or three parameters; as the function prototype > shows correctly. > > Also both GSL and R is fine representing the Gamma PDF with two parameters: > > http://www.gnu.org/software/gsl/manual/html_node/The-Gamma-Distribution.html > http://stat.ethz.ch/R-manual/R-patched/library/stats/html/GammaDist.html > > Should these instances need to be updated? This should be the same if you replace `x` in the pdf by `(x-loc)/scale` loc and scale are handled generically, while the individual _pdf method only defines the standardized distribution. I think I checked the algebra once for gamma, but I'm not sure. Josef > > PS: I am wrapping the distribution definitions given in the GSL library for > SAGE. While doing this I want to make sure my understanding and use of the > terms and concepts are correct. > > Thanks. 
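A numerical check of the loc/scale handling josef describes -- note that the transformed density also picks up a factor 1/scale. A sketch assuming scipy.stats is importable; the parameter values are arbitrary:

from scipy import stats
import numpy as np

a, loc, scale = 2.5, 1.0, 3.0
x = np.linspace(1.5, 20.0, 5)

# Generic loc/scale handling: pdf(x; a, loc, scale)
# equals pdf((x - loc)/scale; a) / scale.
p1 = stats.gamma.pdf(x, a, loc=loc, scale=scale)
p2 = stats.gamma.pdf((x - loc) / scale, a) / scale
print(np.allclose(p1, p2))   # True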
> > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From robert.kern at gmail.com Thu Jan 28 12:50:13 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Jan 2010 11:50:13 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> Message-ID: <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> On Thu, Jan 28, 2010 at 11:36, G?khan Sever wrote: > Hello, > > Could someone explain to me why doesn't scipy explicitly use the location > and scaling parameters representing its PDF? Because the transformation for the location and scale parameters are the same for every PDF and is well known. However, including them in the formula often clutters it up and obscures the differences between PDFs. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Thu Jan 28 13:08:19 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 28 Jan 2010 12:08:19 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> Message-ID: <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> On Thu, Jan 28, 2010 at 11:50 AM, Robert Kern wrote: > On Thu, Jan 28, 2010 at 11:36, G?khan Sever wrote: > > Hello, > > > > Could someone explain to me why doesn't scipy explicitly use the location > > and scaling parameters representing its PDF? > > Because the transformation for the location and scale parameters are > the same for every PDF and is well known. However, including them in > the formula often clutters it up and obscures the differences between > PDFs. > > GSL and R doesn't use the location parameter then. And Numpy's Gamma PDF includes scaling in the formulae ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gamma.html#numpy.random.gamma). I am guessing that everyone has its own style when it comes to represent the distributions. Not so surprisingly it is shown in a different form in my Cloud and Precipitation Parametrization book. I suggest to add a statement like: Gamma distribution is mainly used to represent precipitation distribution in bulk cloud parametrization schemes. > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Thu Jan 28 13:16:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 13:16:51 -0500 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> Message-ID: <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> On Thu, Jan 28, 2010 at 1:08 PM, G?khan Sever wrote: > > > On Thu, Jan 28, 2010 at 11:50 AM, Robert Kern wrote: >> >> On Thu, Jan 28, 2010 at 11:36, G?khan Sever wrote: >> > Hello, >> > >> > Could someone explain to me why doesn't scipy explicitly use the >> > location >> > and scaling parameters representing its PDF? >> >> Because the transformation for the location and scale parameters are >> the same for every PDF and is well known. However, including them in >> the formula often clutters it up and obscures the differences between >> PDFs. >> > > GSL and R doesn't use the location parameter then. And Numpy's Gamma PDF > includes scaling in the formulae > (http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gamma.html#numpy.random.gamma). > I am guessing that everyone has its own style when it comes to represent the > distributions. Not so surprisingly it is shown in a different form in my > Cloud and Precipitation Parametrization book. > > I suggest to add a statement like: Gamma distribution is mainly used to > represent precipitation distribution in bulk cloud parametrization schemes. I don't think "mainly" is a correct description, a random quote after a short google search "The Gamma distribution is widely used in engineering, science, and business, to model continuous variables that are always positive and have skewed distributions" It's a pretty common distribution. Josef > > >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." 
>> ?-- Umberto Eco >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From gokhansever at gmail.com Thu Jan 28 13:24:49 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 28 Jan 2010 12:24:49 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> Message-ID: <49d6b3501001281024j3a707dd3p1cbd537a56e180c1@mail.gmail.com> On Thu, Jan 28, 2010 at 12:16 PM, wrote: > On Thu, Jan 28, 2010 at 1:08 PM, G?khan Sever > wrote: > > > > > > On Thu, Jan 28, 2010 at 11:50 AM, Robert Kern > wrote: > >> > >> On Thu, Jan 28, 2010 at 11:36, G?khan Sever > wrote: > >> > Hello, > >> > > >> > Could someone explain to me why doesn't scipy explicitly use the > >> > location > >> > and scaling parameters representing its PDF? > >> > >> Because the transformation for the location and scale parameters are > >> the same for every PDF and is well known. However, including them in > >> the formula often clutters it up and obscures the differences between > >> PDFs. > >> > > > > GSL and R doesn't use the location parameter then. And Numpy's Gamma PDF > > includes scaling in the formulae > > ( > http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gamma.html#numpy.random.gamma > ). > > I am guessing that everyone has its own style when it comes to represent > the > > distributions. Not so surprisingly it is shown in a different form in my > > Cloud and Precipitation Parametrization book. > > > > I suggest to add a statement like: Gamma distribution is mainly used to > > represent precipitation distribution in bulk cloud parametrization > schemes. > > I don't think "mainly" is a correct description, > I wanted to say that it is the most commonly used distribution in cloud modelling. Most of time Gamma is used but sometimes Log-normal distribution and other distribution forms are used as well. Maybe adding an "also" -- is also mainly used" makes it clearer? > > a random quote after a short google search > "The Gamma distribution is widely used in engineering, science, and > business, to model continuous variables that are always positive and > have skewed distributions" > > It's a pretty common distribution. > > Josef > > > > > > > >> > >> -- > >> Robert Kern > >> > >> "I have come to believe that the whole world is an enigma, a harmless > >> enigma that is made terrible by our own mad attempt to interpret it as > >> though it had an underlying truth." 
> >> -- Umberto Eco > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > G?khan > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jan 28 13:36:26 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Jan 2010 12:36:26 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <49d6b3501001281024j3a707dd3p1cbd537a56e180c1@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <49d6b3501001281024j3a707dd3p1cbd537a56e180c1@mail.gmail.com> Message-ID: <3d375d731001281036h1c0639daq4974643bff732432@mail.gmail.com> On Thu, Jan 28, 2010 at 12:24, G?khan Sever wrote: > I wanted to say that it is the most commonly used distribution in cloud > modelling. Most of time Gamma is used but sometimes Log-normal distribution > and other distribution forms are used as well. > > Maybe adding an "also" -- is also mainly used" makes it clearer? I don't think that adding such specific use cases is beneficial. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From bsouthey at gmail.com Thu Jan 28 14:07:45 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 28 Jan 2010 13:07:45 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> Message-ID: <4B61E081.1090009@gmail.com> On 01/28/2010 12:16 PM, josef.pktd at gmail.com wrote: > On Thu, Jan 28, 2010 at 1:08 PM, G?khan Sever wrote: > >> >> On Thu, Jan 28, 2010 at 11:50 AM, Robert Kern wrote: >> >>> On Thu, Jan 28, 2010 at 11:36, G?khan Sever wrote: >>> >>>> Hello, >>>> >>>> Could someone explain to me why doesn't scipy explicitly use the >>>> location >>>> and scaling parameters representing its PDF? >>>> >>> Because the transformation for the location and scale parameters are >>> the same for every PDF and is well known. However, including them in >>> the formula often clutters it up and obscures the differences between >>> PDFs. >>> >>> >> GSL and R doesn't use the location parameter then. And Numpy's Gamma PDF >> includes scaling in the formulae >> (http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gamma.html#numpy.random.gamma). >> I am guessing that everyone has its own style when it comes to represent the >> distributions. Not so surprisingly it is shown in a different form in my >> Cloud and Precipitation Parametrization book. 
>> >> I suggest to add a statement like: Gamma distribution is mainly used to >> represent precipitation distribution in bulk cloud parametrization schemes. >> > I don't think "mainly" is a correct description, > > a random quote after a short google search > "The Gamma distribution is widely used in engineering, science, and > business, to model continuous variables that are always positive and > have skewed distributions" > > It's a pretty common distribution. > > Josef > > > I think what G?khan is getting at is that limited description provided in the documentation link: "The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant." I actually think that part should be removed from the documentation. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jan 28 14:12:48 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Jan 2010 13:12:48 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <4B61E081.1090009@gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <4B61E081.1090009@gmail.com> Message-ID: <3d375d731001281112h7a6fb436w8384bb163a9d94b0@mail.gmail.com> On Thu, Jan 28, 2010 at 13:07, Bruce Southey wrote: > I think what? G?khan is getting at is that limited description provided in > the documentation link: > "The Gamma distribution is often used to model the times to failure of > electronic components, and arises naturally in processes for which the > waiting times between Poisson distributed events are relevant." > > I actually think that part should be removed from the documentation. I think "of electronic components" should be removed. I think the rest is fine and useful. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Thu Jan 28 15:22:40 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 28 Jan 2010 14:22:40 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <4B61E081.1090009@gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <4B61E081.1090009@gmail.com> Message-ID: <49d6b3501001281222i79fe1af5rf3c3536959ef59d2@mail.gmail.com> On Thu, Jan 28, 2010 at 1:07 PM, Bruce Southey wrote: > On 01/28/2010 12:16 PM, josef.pktd at gmail.com wrote: > > On Thu, Jan 28, 2010 at 1:08 PM, G?khan Sever wrote: > > > On Thu, Jan 28, 2010 at 11:50 AM, Robert Kern wrote: > > > On Thu, Jan 28, 2010 at 11:36, G?khan Sever wrote: > > > Hello, > > Could someone explain to me why doesn't scipy explicitly use the > location > and scaling parameters representing its PDF? > > > Because the transformation for the location and scale parameters are > the same for every PDF and is well known. However, including them in > the formula often clutters it up and obscures the differences between > PDFs. 
> > > > GSL and R doesn't use the location parameter then. And Numpy's Gamma PDF > includes scaling in the formulae > (http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gamma.html#numpy.random.gamma). > I am guessing that everyone has its own style when it comes to represent the > distributions. Not so surprisingly it is shown in a different form in my > Cloud and Precipitation Parametrization book. > > I suggest to add a statement like: Gamma distribution is mainly used to > represent precipitation distribution in bulk cloud parametrization schemes. > > > I don't think "mainly" is a correct description, > > a random quote after a short google search > "The Gamma distribution is widely used in engineering, science, and > business, to model continuous variables that are always positive and > have skewed distributions" > > It's a pretty common distribution. > > Josef > > > > > I think what G?khan is getting at is that limited description provided in > the documentation link: > "The Gamma distribution is often used to model the times to failure of > electronic components, and arises naturally in processes for which the > waiting times between Poisson distributed events are relevant." > > I actually think that part should be removed from the documentation. > > Bruce > > It doesn't really too much matter to me to include some extra information or not to this description, but for the sake of consistency it might be a better idea to leave this description from the Gamma distribution page. Since neither it nor my proposed addition properly generalizes the use of the distribution. Additionally, if such examples exist in other distributions they should be removed as well. > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 28 15:34:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 15:34:22 -0500 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <49d6b3501001281222i79fe1af5rf3c3536959ef59d2@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <4B61E081.1090009@gmail.com> <49d6b3501001281222i79fe1af5rf3c3536959ef59d2@mail.gmail.com> Message-ID: <1cd32cbb1001281234k35bdf1b7u1969d881fb12f9ed@mail.gmail.com> On Thu, Jan 28, 2010 at 3:22 PM, G?khan Sever wrote: > > > On Thu, Jan 28, 2010 at 1:07 PM, Bruce Southey wrote: >> >> On 01/28/2010 12:16 PM, josef.pktd at gmail.com wrote: >> >> On Thu, Jan 28, 2010 at 1:08 PM, G?khan Sever >> wrote: >> >> >> On Thu, Jan 28, 2010 at 11:50 AM, Robert Kern >> wrote: >> >> >> On Thu, Jan 28, 2010 at 11:36, G?khan Sever wrote: >> >> >> Hello, >> >> Could someone explain to me why doesn't scipy explicitly use the >> location >> and scaling parameters representing its PDF? >> >> >> Because the transformation for the location and scale parameters are >> the same for every PDF and is well known. However, including them in >> the formula often clutters it up and obscures the differences between >> PDFs. >> >> >> >> GSL and R doesn't use the location parameter then. 
And Numpy's Gamma PDF >> includes scaling in the formulae >> >> (http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gamma.html#numpy.random.gamma). >> I am guessing that everyone has its own style when it comes to represent >> the >> distributions. Not so surprisingly it is shown in a different form in my >> Cloud and Precipitation Parametrization book. >> >> I suggest to add a statement like: Gamma distribution is mainly used to >> represent precipitation distribution in bulk cloud parametrization >> schemes. >> >> >> I don't think "mainly" is a correct description, >> >> a random quote after a short google search >> "The Gamma distribution is widely used in engineering, science, and >> business, to model continuous variables that are always positive and >> have skewed distributions" >> >> It's a pretty common distribution. >> >> Josef >> >> >> >> >> I think what? G?khan is getting at is that limited description provided in >> the documentation link: >> "The Gamma distribution is often used to model the times to failure of >> electronic components, and arises naturally in processes for which the >> waiting times between Poisson distributed events are relevant." >> >> I actually think that part should be removed from the documentation. >> >> Bruce >> > > It doesn't really too much matter to me to include some extra information or > not to this description, but for the sake of consistency it might be a > better idea to leave this description from the Gamma distribution page. > Since neither it nor my proposed addition properly generalizes the use of > the distribution. Additionally, if such examples exist in other > distributions they should be removed as well. this reminded me of a thread a while ago where I looked this up: For a given Poisson arrival process, the time between two arrivals is exponentially distributed, the time between k arrivals is gamma distributed. For many distribution, there are tables how different distributions are linked. I don't know whether some of this would be useful information in the docs. In many cases a quick look on Wikipedia is very informative about common application and the relationship between distributions. Josef >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ryanlists at gmail.com Thu Jan 28 15:39:21 2010 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu, 28 Jan 2010 14:39:21 -0600 Subject: [SciPy-User] bug in signal.lsim2 In-Reply-To: References: Message-ID: I believe I have discovered a bug in signal.lsim2. ?I believe the short attached script illustrates the problem. ?I was trying to predict the response of a transfer function with a pure integrator: ? ? ? ? ? ? ?g G = ------------- ? ? ? ? ?s(s+p) to a finite width pulse. ?lsim2 seems to handle the step response just fine, but says that the pulse response is exactly 0.0 for the entire time of the simulation. ?Obviously, this isn't the right answer. I am running scipy 0.7.0 and numpy 1.2.1 on Ubuntu 9.04, but I also have the same problem on Windows running 0.7.1 and 1.4.0. Thanks, Ryan -------------- next part -------------- A non-text attachment was scrubbed... 
Name: lsim2_problem.py Type: text/x-python Size: 360 bytes Desc: not available URL: From afraser at lanl.gov Thu Jan 28 15:37:09 2010 From: afraser at lanl.gov (Andy Fraser) Date: Thu, 28 Jan 2010 13:37:09 -0700 Subject: [SciPy-User] I want something like numpy.put Message-ID: <87vdemezka.fsf@lanl.gov> I want to "paint" a distorted image onto a background. The distorted map is described by an array of ordered pairs called "ij". I get the effect that I want from the following loop:

for x in xrange(w):
    for y in xrange(h):
        dest[ij[x,y,0], ij[x,y,1]] = source[x,y]

Each assignment operates on an RGB pixel vector. Is there a single fast numpy call that achieves the same effect? Andy From josef.pktd at gmail.com Thu Jan 28 16:05:53 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 16:05:53 -0500 Subject: [SciPy-User] bug in signal.lsim2 In-Reply-To: References: Message-ID: <1cd32cbb1001281305r38e56f1cp3d7269abe61c7302@mail.gmail.com> On Thu, Jan 28, 2010 at 3:39 PM, Ryan Krauss wrote:
> I believe I have discovered a bug in signal.lsim2. I believe the
> short attached script illustrates the problem. I was trying to
> predict the response of a transfer function with a pure integrator:
>
>              g
> G = -------------
>          s(s+p)
>
> to a finite width pulse. lsim2 seems to handle the step response just
> fine, but says that the pulse response is exactly 0.0 for the entire
> time of the simulation. Obviously, this isn't the right answer.
>
> I am running scipy 0.7.0 and numpy 1.2.1 on Ubuntu 9.04, but I also
> have the same problem on Windows running 0.7.1 and 1.4.0.
>
> Thanks,
>
> Ryan

When I add a small noise

u2 = zeros(N) + 1e-14

or for

u2[:50] = amp

or for

u2[50:200] = amp

it seems to work. This might be a tricky bug. Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From bsouthey at gmail.com Thu Jan 28 16:14:50 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 28 Jan 2010 15:14:50 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <1cd32cbb1001281234k35bdf1b7u1969d881fb12f9ed@mail.gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <4B61E081.1090009@gmail.com> <49d6b3501001281222i79fe1af5rf3c3536959ef59d2@mail.gmail.com> <1cd32cbb1001281234k35bdf1b7u1969d881fb12f9ed@mail.gmail.com> Message-ID: <4B61FE4A.6030003@gmail.com> On 01/28/2010 02:34 PM, josef.pktd at gmail.com wrote: > [snip] > > For many distribution, there are tables how different distributions > are linked. I don't know whether some of this would be useful > information in the docs. In many cases a quick look on Wikipedia is > very informative about common application and the relationship between > distributions. > > Josef Somewhat off topic, but see Leemis (1986), 'Relationships among common univariate distributions', American Statistician 40:143-146, http://www.jstor.org/stable/2684876 (preprint: www.math.wm.edu/~leemis/2008amstat.pdf). Also see: http://www.johndcook.com/distribution_chart.html Bruce -------------- next part -------------- An HTML attachment was scrubbed...
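One vectorized candidate for Andy's double loop is a single fancy-indexed assignment -- a sketch with made-up shapes, assuming ij has shape (w, h, 2) and dest/source hold RGB vectors:

import numpy as np

w, h = 4, 3
source = np.random.randint(0, 255, size=(w, h, 3))
dest = np.zeros((10, 10, 3), dtype=source.dtype)
ij = np.random.randint(0, 10, size=(w, h, 2))

# One assignment replaces the double loop; if several source pixels
# map to the same destination pixel, only one of them is kept.
dest[ij[..., 0], ij[..., 1]] = source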
URL: From josef.pktd at gmail.com Thu Jan 28 16:29:52 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 16:29:52 -0500 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <4B61FE4A.6030003@gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <4B61E081.1090009@gmail.com> <49d6b3501001281222i79fe1af5rf3c3536959ef59d2@mail.gmail.com> <1cd32cbb1001281234k35bdf1b7u1969d881fb12f9ed@mail.gmail.com> <4B61FE4A.6030003@gmail.com> Message-ID: <1cd32cbb1001281329g36d4d265o91f0487649626cc0@mail.gmail.com> On Thu, Jan 28, 2010 at 4:14 PM, Bruce Southey wrote: > On 01/28/2010 02:34 PM, josef.pktd at gmail.com wrote: > > [snip] > > For many distribution, there are tables how different distributions > are linked. I don't know whether some of this would be useful > information in the docs. In many cases a quick look on Wikipedia is > very informative about common application and the relationship between > distributions. > > Josef > > > Some what off topic but see Leemis (1986) 'relationships among common > univariate distributions' in American Statistician 40:143-146 > http://www.jstor.org/stable/2684876 > www.math.wm.edu/~leemis/2008amstat.pdf > > > Also see: > http://www.johndcook.com/distribution_chart.html Yes that's what I was thinking of, I have the first article, but I don't think I have seen johndcook before. Josef > > Bruce > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From tsyu80 at gmail.com Thu Jan 28 17:14:23 2010 From: tsyu80 at gmail.com (Tony S Yu) Date: Thu, 28 Jan 2010 17:14:23 -0500 Subject: [SciPy-User] Splines in scipy.signal vs scipy.interpolation In-Reply-To: <8b9578e2-7308-4c8b-95cf-57bb49572029@v25g2000yqk.googlegroups.com> References: <9AF13441-AFE5-4568-9438-4E98D6E99EDF@mit.edu> <8b9578e2-7308-4c8b-95cf-57bb49572029@v25g2000yqk.googlegroups.com> Message-ID: <733616BB-EC5C-43E5-B8F9-FA26EB538B9E@gmail.com> On Jan 28, 2010, at 9:20 AM, denis wrote: > > > On Jan 20, 11:56 pm, Tony S Yu wrote: >> I'm having trouble making splines from scipy.signal work with those in scipy.interpolation. >> >> Both packages have functions for creating (`signal.cspline1d`/`interpolate.splrep`) and evaluating (`signal.cspline1d_eval`/`interpolate.splev`) splines. There are, of course, huge differences between these functions, which is why I'm trying to get them to talk to each other. >> >> In particular, I'd like to create a smoothing spline using `cspline1d` (which allows easier smoothing) and evaluate using `splev` (which allows me to get derivatives of the spline). > > Tony, > bouncing between two murky packages doesn't sound as though it'll > converge ... Agreed. This was more of a naive attempt to try and get the results that I wanted. > interpolate though has both smoothing and derivs -- > interpolator = interpolate.UnivariateSpline( x, y, k=3, s=s ) > # s=0 interpolates > yy = interpolator( xx ) > y1 = interpolator( xx, 1 ) # deriv You're right. When I originally read the docs for splrep, I had it in my head that the splines in scipy.interpolation didn't provide the "right" type of smoothing (don't ask me what "right" means---I have no idea). 
After taking some time to understand the interpolation module, I realize it does what I want. Thanks, Denis!

> Just curious, are your real knots uniform, how many ?

I'm actually converting some matlab code which tries to find the optimal smoothed spline, so the number of knots really depends on the data (the data is uniformly spaced with about 1000 points, but the knots depend on the smoothing---if I understand your question correctly).

BTW, if anyone else ever needs to compare results from matlab's `spaps` with smoothed splines using splrep or UnivariateSpline: note there's a big difference in the **default** error calculations in matlab and scipy. If you want to match matlab's error calculation, you need to pass in weights ("w") that match the weights used for the trapezoidal rule. Also there's a subtle but important difference between the error equations: in matlab "w" is outside the square of the differences; in scipy "w" is inside the square of the differences. In short, to match matlab's error calculation, you need to pass "w" to splrep or UnivariateSpline, where w = np.sqrt(trapz_weights(x))

>>> def trapz_weights(x):
>>>     dx = np.diff(x)
>>>     w = np.empty(x.shape)
>>>     w[1:-1] = (dx[1:] + dx[:-1])/2.
>>>     w[0] = dx[0] / 2.
>>>     w[-1] = dx[-1] / 2.
>>>     return w

Unfortunately, the splines produced by matlab and scipy don't really match (not sure why---different smoothing algorithms?), but at least their errors are the same. Cheers, -T

> See also http://projects.scipy.org/scipy/ticket/864
> "The documentation for class scipy.interpolate.UnivariateSpline is
> misleading, and maybe completely wrong.
> UnivariateSpline behaves in ways that are unpredictable ...
> (Fitpack is just a big dense package => big dense doc.)
>
> cheers
> -- denis
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From ryanlists at gmail.com Thu Jan 28 18:00:19 2010 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu, 28 Jan 2010 17:00:19 -0600 Subject: [SciPy-User] bug in signal.lsim2 In-Reply-To: <1cd32cbb1001281305r38e56f1cp3d7269abe61c7302@mail.gmail.com> References: <1cd32cbb1001281305r38e56f1cp3d7269abe61c7302@mail.gmail.com> Message-ID: Hmmm. Thanks. That solves the immediate problem. I am letting my students choose between Matlab and Python for projects in my course. This one may erode their confidence in Python/Scipy a bit.
> > Josef > > >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Thu Jan 28 19:24:41 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 28 Jan 2010 19:24:41 -0500 Subject: [SciPy-User] I want something like numpy.put In-Reply-To: <87vdemezka.fsf@lanl.gov> References: <87vdemezka.fsf@lanl.gov> Message-ID: scipy.ndimage.map_coordinates is probably the closest you'll get, but it doesn't appear to be an exact match for your use case... It's a bit tricky to figure out, but in might be useful. I think there are a few tutorials about how it works online. Zach On Jan 28, 2010, at 3:37 PM, Andy Fraser wrote: > I want to "paint" a distorted image onto a background. The distorted > map is described by an array of ordered pairs called "ij". I get the > effect that I want from the following loop: > > > > for x in xrange(w): > for y in xrange(h): > dest[ij[x,y,0],ij[x,y,1]] = source[x,y] > > Each assignment operates on an rgb pixel vector. Is there a single > fast numpy call that achieves the same effect? > > Andy > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From m.abdollahi at gmail.com Thu Jan 28 19:51:58 2010 From: m.abdollahi at gmail.com (persia) Date: Thu, 28 Jan 2010 16:51:58 -0800 (PST) Subject: [SciPy-User] [SciPy-user] help about lfilter ???? Message-ID: <27363601.post@talk.nabble.com> lfilter is the linear filtering function in the scipy.signal module. can someone tell me what is wrong with this function please ? I used this function in the following way and it gave me that error !! : In [204]: lfilter(array([1,2,3]),array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) --------------------------------------------------------------------------- Traceback (most recent call last) : linear_filter not available for this type why is it so ??!!!! what is wrong with this "type " ? what is it meant by this type anyway ?! i tried with the same arguments in matlab with the corresponding function and it worked !! and if i multiply the second argument vector say by .4, this will happen : lfilter(array([1,2,3]),.4*array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) (array([ -5.625 , -23.28125 , -68.7890625 , -148.40820312, -312.44384766, -633.16711426, -1276.37466431, -2557.71900177, -5120.46074867]), array([-10242.1544385 , -7682.56612301])) see ? no problem then ?!! isnt this wierd ? please help ! thanx -- View this message in context: http://old.nabble.com/help-about-lfilter------tp27363601p27363601.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Thu Jan 28 20:03:17 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 20:03:17 -0500 Subject: [SciPy-User] bug in signal.lsim2 In-Reply-To: References: <1cd32cbb1001281305r38e56f1cp3d7269abe61c7302@mail.gmail.com> Message-ID: <1cd32cbb1001281703h26f1f4f9y9355a7192a156188@mail.gmail.com> On Thu, Jan 28, 2010 at 6:00 PM, Ryan Krauss wrote: > Hmmm. ?Thanks. ?That solves the immediate problem. ?I am letting my > students choose between Matlab and Python for projects in my course. > This one my erode their confidence in Python/Scipy a bit. 
It's always good to have a rough idea about whether the results are correct and not trust the computer too much, whether it's matlab or scipy. But if some of your students are willing to submit bug reports or tests, then the next generation of students can be more confident that it's the bugs in their own program that might be causing problems and not the code in scipy. Josef > > On Thu, Jan 28, 2010 at 3:05 PM, ? wrote: >> On Thu, Jan 28, 2010 at 3:39 PM, Ryan Krauss wrote: >>> I believe I have discovered a bug in signal.lsim2. ?I believe the >>> short attached script illustrates the problem. ?I was trying to >>> predict the response of a transfer function with a pure integrator: >>> >>> ? ? ? ? ? ? ?g >>> G = ------------- >>> ? ? ? ? ?s(s+p) >>> >>> to a finite width pulse. ?lsim2 seems to handle the step response just >>> fine, but says that the pulse response is exactly 0.0 for the entire >>> time of the simulation. ?Obviously, this isn't the right answer. >>> >>> I am running scipy 0.7.0 and numpy 1.2.1 on Ubuntu 9.04, but I also >>> have the same problem on Windows running 0.7.1 and 1.4.0. >>> >>> Thanks, >>> >>> Ryan >> >> When I add a small noise >> >> u2 = zeros(N) + 1e-14 >> >> or for >> u2[:50] = amp >> >> or for >> u2[50:200] = amp >> >> it seems to work. >> >> This might be a tricky bug. >> >> Josef >> >> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gokhansever at gmail.com Thu Jan 28 20:10:26 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 28 Jan 2010 19:10:26 -0600 Subject: [SciPy-User] Gamma distribution in scipy.stats In-Reply-To: <4B61FE4A.6030003@gmail.com> References: <49d6b3501001280936u7e92401ap279baf5c662426a6@mail.gmail.com> <3d375d731001280950j184142a2o8e1e75d20b64a81c@mail.gmail.com> <49d6b3501001281008u14d26b42saa2f5583ce92b018@mail.gmail.com> <1cd32cbb1001281016l249ff126n402317a1365cf24d@mail.gmail.com> <4B61E081.1090009@gmail.com> <49d6b3501001281222i79fe1af5rf3c3536959ef59d2@mail.gmail.com> <1cd32cbb1001281234k35bdf1b7u1969d881fb12f9ed@mail.gmail.com> <4B61FE4A.6030003@gmail.com> Message-ID: <49d6b3501001281710p17af8b60w1394aa6308331b58@mail.gmail.com> On Thu, Jan 28, 2010 at 3:14 PM, Bruce Southey wrote: > On 01/28/2010 02:34 PM, josef.pktd at gmail.com wrote: > > [snip] > > For many distribution, there are tables how different distributions > are linked. I don't know whether some of this would be useful > information in the docs. In many cases a quick look on Wikipedia is > very informative about common application and the relationship between > distributions. > > Josef > > > > Some what off topic but see Leemis (1986) 'relationships among common > univariate distributions' in American Statistician 40:143-146 > http://www.jstor.org/stable/2684876 > www.math.wm.edu/~leemis/2008amstat.pdf > > > Also see: > http://www.johndcook.com/distribution_chart.html > > Bruce > Thanks Bruce, Very useful sources for me. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 28 20:10:40 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 20:10:40 -0500 Subject: [SciPy-User] [SciPy-user] help about lfilter ???? In-Reply-To: <27363601.post@talk.nabble.com> References: <27363601.post@talk.nabble.com> Message-ID: <1cd32cbb1001281710m3ddd7e26x309295a5b235fd50@mail.gmail.com> On Thu, Jan 28, 2010 at 7:51 PM, persia wrote: > > lfilter is the linear filtering function in the scipy.signal module. can > someone tell me what is wrong with this function please ? > > I used this function in the following way and it gave me that error !! : > > In [204]: > lfilter(array([1,2,3]),array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) > --------------------------------------------------------------------------- > ? ? ? ? ? ?Traceback (most recent call last) > > > > : linear_filter not available for this type > > > why is it so ??!!!! what is wrong with this "type " ? what is it meant by > this type anyway ?! i tried with the same arguments in matlab with the > corresponding function and it worked !! > and if i multiply the second argument vector say by .4, this will happen : > > lfilter(array([1,2,3]),.4*array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) > > > (array([ ? -5.625 ? ? , ? -23.28125 ? , ? -68.7890625 , ?-148.40820312, > ? ? ? ?-312.44384766, ?-633.16711426, -1276.37466431, -2557.71900177, > ? ? ? -5120.46074867]), > ?array([-10242.1544385 , ?-7682.56612301])) > > > see ? no problem then ?!! isnt this wierd ? please help ! it looks like lfilter doesn't like integers >>> signal.lfilter(array([1,2,3]),array([-4.,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) (array([ -2.25 , -9.3125 , -27.515625 , -59.36328125, -124.97753906, -253.2668457 , -510.54986572, -1023.08760071, -2048.18429947]), array([-4096.8617754, -3073.0264492])) note I only changed a 4 to 4. can you file a ticket? Josef > > thanx > -- > View this message in context: http://old.nabble.com/help-about-lfilter------tp27363601p27363601.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cournape at gmail.com Thu Jan 28 20:14:45 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 29 Jan 2010 10:14:45 +0900 Subject: [SciPy-User] [SciPy-user] help about lfilter ???? In-Reply-To: <1cd32cbb1001281710m3ddd7e26x309295a5b235fd50@mail.gmail.com> References: <27363601.post@talk.nabble.com> <1cd32cbb1001281710m3ddd7e26x309295a5b235fd50@mail.gmail.com> Message-ID: <5b8d13221001281714v722efe93tb61a7674e3212741@mail.gmail.com> On Fri, Jan 29, 2010 at 10:10 AM, wrote: > On Thu, Jan 28, 2010 at 7:51 PM, persia wrote: >> >> lfilter is the linear filtering function in the scipy.signal module. can >> someone tell me what is wrong with this function please ? >> >> I used this function in the following way and it gave me that error !! : >> >> In [204]: >> lfilter(array([1,2,3]),array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) >> --------------------------------------------------------------------------- >> ? ? ? ? ? 
?Traceback (most recent call last) >> >> >> >> : linear_filter not available for this type >> >> >> why is it so ??!!!! what is wrong with this "type " ? what is it meant by >> this type anyway ?! i tried with the same arguments in matlab with the >> corresponding function and it worked !! >> and if i multiply the second argument vector say by .4, this will happen : >> >> lfilter(array([1,2,3]),.4*array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) >> >> >> (array([ ? -5.625 ? ? , ? -23.28125 ? , ? -68.7890625 , ?-148.40820312, >> ? ? ? ?-312.44384766, ?-633.16711426, -1276.37466431, -2557.71900177, >> ? ? ? -5120.46074867]), >> ?array([-10242.1544385 , ?-7682.56612301])) >> >> >> see ? no problem then ?!! isnt this wierd ? please help ! > > it looks like lfilter doesn't like integers Yes - the error message could be improved at least. Integers are not supported as is because the difference equation requires division (for IIR). I wonder whether it would make sense to automatically convert to floating point if everything is integer. cheers, David From josef.pktd at gmail.com Thu Jan 28 20:20:36 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 20:20:36 -0500 Subject: [SciPy-User] [SciPy-user] help about lfilter ???? In-Reply-To: <5b8d13221001281714v722efe93tb61a7674e3212741@mail.gmail.com> References: <27363601.post@talk.nabble.com> <1cd32cbb1001281710m3ddd7e26x309295a5b235fd50@mail.gmail.com> <5b8d13221001281714v722efe93tb61a7674e3212741@mail.gmail.com> Message-ID: <1cd32cbb1001281720s58d937ffofb5b4859ce93b111@mail.gmail.com> On Thu, Jan 28, 2010 at 8:14 PM, David Cournapeau wrote: > On Fri, Jan 29, 2010 at 10:10 AM, ? wrote: >> On Thu, Jan 28, 2010 at 7:51 PM, persia wrote: >>> >>> lfilter is the linear filtering function in the scipy.signal module. can >>> someone tell me what is wrong with this function please ? >>> >>> I used this function in the following way and it gave me that error !! : >>> >>> In [204]: >>> lfilter(array([1,2,3]),array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) >>> --------------------------------------------------------------------------- >>> ? ? ? ? ? ?Traceback (most recent call last) >>> >>> >>> >>> : linear_filter not available for this type >>> >>> >>> why is it so ??!!!! what is wrong with this "type " ? what is it meant by >>> this type anyway ?! i tried with the same arguments in matlab with the >>> corresponding function and it worked !! >>> and if i multiply the second argument vector say by .4, this will happen : >>> >>> lfilter(array([1,2,3]),.4*array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) >>> >>> >>> (array([ ? -5.625 ? ? , ? -23.28125 ? , ? -68.7890625 , ?-148.40820312, >>> ? ? ? ?-312.44384766, ?-633.16711426, -1276.37466431, -2557.71900177, >>> ? ? ? -5120.46074867]), >>> ?array([-10242.1544385 , ?-7682.56612301])) >>> >>> >>> see ? no problem then ?!! isnt this wierd ? please help ! >> >> it looks like lfilter doesn't like integers > > Yes - the error message could be improved at least. > > Integers are not supported as is because the difference equation > requires division (for IIR). I wonder whether it would make sense to > automatically convert to floating point if everything is integer. Yes, I would think so, if it doesn't work or doesn't make sense with integers than automatic conversion is the best in these cases, e.g. fftconvolve, np.mean, ... 
Josef > > cheers, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From m.abdollahi at gmail.com Thu Jan 28 20:26:37 2010 From: m.abdollahi at gmail.com (persia) Date: Thu, 28 Jan 2010 17:26:37 -0800 (PST) Subject: [SciPy-User] [SciPy-user] help about lfilter ???? In-Reply-To: <1cd32cbb1001281720s58d937ffofb5b4859ce93b111@mail.gmail.com> References: <27363601.post@talk.nabble.com> <1cd32cbb1001281710m3ddd7e26x309295a5b235fd50@mail.gmail.com> <5b8d13221001281714v722efe93tb61a7674e3212741@mail.gmail.com> <1cd32cbb1001281720s58d937ffofb5b4859ce93b111@mail.gmail.com> Message-ID: <27365988.post@talk.nabble.com> Thank you so much for what you pointed out, and I agree with you on automatic type conversion. persia josef.pktd wrote: > > On Thu, Jan 28, 2010 at 8:14 PM, David Cournapeau > wrote: >> On Fri, Jan 29, 2010 at 10:10 AM, ? wrote: >>> On Thu, Jan 28, 2010 at 7:51 PM, persia wrote: >>>> >>>> lfilter is the linear filtering function in the scipy.signal module. >>>> can >>>> someone tell me what is wrong with this function please ? >>>> >>>> I used this function in the following way and it gave me that error !! >>>> : >>>> >>>> In [204]: >>>> lfilter(array([1,2,3]),array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) >>>> --------------------------------------------------------------------------- >>>> ? ? ? ? ? ?Traceback (most recent call >>>> last) >>>> >>>> >>>> >>>> : linear_filter not available for this >>>> type >>>> >>>> >>>> why is it so ??!!!! what is wrong with this "type " ? what is it meant >>>> by >>>> this type anyway ?! i tried with the same arguments in matlab with the >>>> corresponding function and it worked !! >>>> and if i multiply the second argument vector say by .4, this will >>>> happen : >>>> >>>> lfilter(array([1,2,3]),.4*array([-4,5,6]),array([9,8,7,6,5,4,3,2,1]),zi=array([0,0])) >>>> >>>> >>>> (array([ ? -5.625 ? ? , ? -23.28125 ? , ? -68.7890625 , ?-148.40820312, >>>> ? ? ? ?-312.44384766, ?-633.16711426, -1276.37466431, -2557.71900177, >>>> ? ? ? -5120.46074867]), >>>> ?array([-10242.1544385 , ?-7682.56612301])) >>>> >>>> >>>> see ? no problem then ?!! isnt this wierd ? please help ! >>> >>> it looks like lfilter doesn't like integers >> >> Yes - the error message could be improved at least. >> >> Integers are not supported as is because the difference equation >> requires division (for IIR). I wonder whether it would make sense to >> automatically convert to floating point if everything is integer. > > Yes, I would think so, if it doesn't work or doesn't make sense with > integers than automatic conversion is the best in these cases, e.g. > fftconvolve, np.mean, ... > > Josef > >> >> cheers, >> >> David >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/help-about-lfilter------tp27363601p27365988.html Sent from the Scipy-User mailing list archive at Nabble.com. 
From warren.weckesser at enthought.com  Thu Jan 28 20:50:05 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Thu, 28 Jan 2010 19:50:05 -0600
Subject: [SciPy-User] bug in signal.lsim2
In-Reply-To:
References:
Message-ID: <4B623ECD.4040403@enthought.com>

Ryan,

The problem is that the ODE solver used by lsim2 is too good. :)

It uses scipy.integrate.odeint, which in turn uses the Fortran library
LSODA.  Like any good solver, LSODA is an adaptive solver--it adjusts its
step size to be as large as possible while keeping estimates of the error
bounded.  For the problem you are solving, with initial condition 0, the
exact solution is initially exactly 0.  This is such a nice smooth solution
that the solver's step size quickly grows--so big, in fact, that it skips
right over your pulse and never sees it.

So how does it create all those intermediate points at the requested time
values?  It uses interpolation between the steps that it computed to create
the solution values at the times that you requested.  So using a finer grid
of time values won't help.  (If lsim2 gave you a hook into the parameters
passed to odeint, you could set odeint's 'hmax' to a value smaller than your
pulse width, which would force the solver to see the pulse.  But there is no
way to set that parameter from lsim2.)

The basic problem is you are passing in a discontinuous function to a solver
that expects a smooth function.  A better way to solve this problem is to
explicitly account for the discontinuity.  One possibility is the attached
script.

This is an excellent "learning opportunity" for your students on the hazards
of numerical computing!

Warren

Ryan Krauss wrote:
> I believe I have discovered a bug in signal.lsim2.  I believe the
> short attached script illustrates the problem.  I was trying to
> predict the response of a transfer function with a pure integrator:
>
>             g
> G = -------------
>          s(s+p)
>
> to a finite width pulse.  lsim2 seems to handle the step response just
> fine, but says that the pulse response is exactly 0.0 for the entire
> time of the simulation.  Obviously, this isn't the right answer.
>
> I am running scipy 0.7.0 and numpy 1.2.1 on Ubuntu 9.04, but I also
> have the same problem on Windows running 0.7.1 and 1.4.0.
>
> Thanks,
>
> Ryan
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lsim2_solution.py
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lsim2_solution.png
Type: image/png
Size: 13212 bytes
Desc: not available
URL:

From josef.pktd at gmail.com  Thu Jan 28 20:57:31 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 28 Jan 2010 20:57:31 -0500
Subject: [SciPy-User] bug in signal.lsim2
In-Reply-To: <4B623ECD.4040403@enthought.com>
References: <4B623ECD.4040403@enthought.com>
Message-ID: <1cd32cbb1001281757x14b7b581xba02f4ae21d3acfb@mail.gmail.com>

On Thu, Jan 28, 2010 at 8:50 PM, Warren Weckesser wrote:
> Ryan,
>
> The problem is that the ODE solver used by lsim2 is too good. :)
> [explanation quoted above - snipped]
> (If lsim2 gave you a hook into the parameters
> passed to odeint, you could set odeint's 'hmax' to a value smaller than your
> pulse width, which would force the solver to see the pulse.  But there is no
> way to set that parameter from lsim2.)

It's what I suspected. I don't know much about odeint, but do you
think it would be useful to let lsim2 pass through some parameters to
odeint?

Josef

> The basic problem is you are passing in a discontinuous function to a solver
> that expects a smooth function.  A better way to solve this problem is to
> explicitly account for the discontinuity.  One possibility is the attached
> script.
>
> This is an excellent "learning opportunity" for your students on the hazards
> of numerical computing!
>
> Warren
>
> Ryan Krauss wrote:
>> [Ryan's original report quoted - snipped]
>
> from pylab import *
> from scipy import signal
>
> g = 100.0
> p = 15.0
> G = signal.ltisys.lti(g, [1,p,0])
>
> t = arange(0, 1.0, 0.002)
> N = len(t)
>
> # u for the whole interval (not used in lsim2, only for plotting later).
> amp = 50.0
> u = zeros(N)
> k1 = 50
> k2 = 100
> u[k1:k2] = amp
>
> # Create input functions for each smooth interval. (This could be
> # simpler, since u is constant on each interval.)
> a = float(k1)/N
> b = float(k2)/N
> T1 = linspace(0, a, 201)
> u1 = zeros_like(T1)
> T2 = linspace(a, b, 201)
> u2 = amp*ones_like(T2)
> T3 = linspace(b, 1.0, 201)
> u3 = zeros_like(T3)
>
> # Solve on each interval; use the final value of one solution as the
> # starting point of the next solution.
> # (We could skip the first calculation, since we know the solution will be 0.)
> (t1, y1, x1) = signal.lsim2(G, u1, T1)
> (t2, y2, x2) = signal.lsim2(G, u2, T2, X0=x1[-1])
> (t3, y3, x3) = signal.lsim2(G, u3, T3, X0=x2[-1])
>
> figure(1)
> clf()
> plot(t, u, 'k', linewidth=3)
> plot(t1, y1, 'y', linewidth=3)
> plot(t2, y2, 'b', linewidth=3)
> plot(t3, y3, 'g', linewidth=3)
>
> show()
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From warren.weckesser at enthought.com  Thu Jan 28 21:05:28 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Thu, 28 Jan 2010 20:05:28 -0600
Subject: [SciPy-User] bug in signal.lsim2
In-Reply-To: <4B623ECD.4040403@enthought.com>
References: <4B623ECD.4040403@enthought.com>
Message-ID: <4B624268.2030408@enthought.com>

Or, you could modify your script lsim2_problem.py to eliminate the
initial interval during which the solution is exactly zero, by changing
this line:

    u2[50:100] = amp

to this:

    u2[:50] = amp

This just translates the pulse so that it starts at t=0.  Since the
default initial condition used by lsim2 is zero, it will give the same
solution, just translated in time.

Warren

Warren Weckesser wrote:
> Ryan,
>
> The problem is that the ODE solver used by lsim2 is too good. :)
> [previous message quoted in full - snipped]
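To make the 'hmax' remark above concrete, here is a small sketch (an
editorial illustration, not code from the thread): it bypasses lsim2,
builds the same system with signal.tf2ss, and calls odeint directly so
that hmax can be capped below the pulse width. With hmax at its default,
odeint should reproduce lsim2's behavior and step over the pulse.

import numpy as np
from scipy import signal
from scipy.integrate import odeint

# State-space form of G(s) = g / (s*(s+p)) with g=100, p=15.
A, B, C, D = signal.tf2ss([100.0], [1.0, 15.0, 0.0])

def u(t):
    # The rectangular pulse: on for 0.1 <= t < 0.2, as in Warren's script.
    return 50.0 if 0.1 <= t < 0.2 else 0.0

def rhs(x, t):
    # x' = A x + B u(t)
    return np.dot(A, x) + np.dot(B, [u(t)])

T = np.arange(0.0, 1.0, 0.002)
# hmax=0.05 keeps every internal step shorter than the 0.1 s pulse,
# forcing the adaptive solver to "see" the discontinuity.
X = odeint(rhs, np.zeros(2), T, hmax=0.05)
Y = np.dot(X, C.T).ravel()   # output y = C x  (D is zero for this system)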
From cournape at gmail.com  Thu Jan 28 21:21:57 2010
From: cournape at gmail.com (David Cournapeau)
Date: Fri, 29 Jan 2010 11:21:57 +0900
Subject: [SciPy-User] [SciPy-user] help about lfilter ????
In-Reply-To: <5b8d13221001281714v722efe93tb61a7674e3212741@mail.gmail.com>
References: <27363601.post@talk.nabble.com>
	<1cd32cbb1001281710m3ddd7e26x309295a5b235fd50@mail.gmail.com>
	<5b8d13221001281714v722efe93tb61a7674e3212741@mail.gmail.com>
Message-ID: <5b8d13221001281821g68a028cbic666bf45856e6ba3@mail.gmail.com>

On Fri, Jan 29, 2010 at 10:14 AM, David Cournapeau wrote:
>
> Yes - the error message could be improved at least.

This is done. Now, your error message would be:

NotImplementedError: input type 'int64' not supported

Or something like that depending on your platform - I think this is
much more informative.

I am still not sure about whether implicit conversion is a good idea,
so this will wait.

David

From warren.weckesser at enthought.com  Thu Jan 28 22:33:09 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Thu, 28 Jan 2010 21:33:09 -0600
Subject: [SciPy-User] bug in signal.lsim2
In-Reply-To: <1cd32cbb1001281757x14b7b581xba02f4ae21d3acfb@mail.gmail.com>
References: <4B623ECD.4040403@enthought.com>
	<1cd32cbb1001281757x14b7b581xba02f4ae21d3acfb@mail.gmail.com>
Message-ID: <4B6256F5.3050802@enthought.com>

josef.pktd at gmail.com wrote:
> On Thu, Jan 28, 2010 at 8:50 PM, Warren Weckesser wrote:
>> [Warren's explanation quoted in full - snipped]
>
> It's what I suspected. I don't know much about odeint, but do you
> think it would be useful to let lsim2 pass through some parameters to
> odeint?

Sounds useful to me.  A simple implementation is an optional keyword
argument that is a dict of odeint arguments.  But this would almost
certainly break if lsim2 were ever reimplemented with a different
solver.  So perhaps it should allow a common set of ODE solver
parameters (e.g. absolute and relative error tolerances, max and min
step sizes, others?).

Perhaps this should wait until after the ODE solver redesign that is
occasionally discussed:
    http://projects.scipy.org/scipy/wiki/OdeintRedesign
Then the solver itself could be an optional argument to lsim2.

Warren

> [rest of the quoted message and attached script - snipped]

From josef.pktd at gmail.com  Thu Jan 28 23:00:21 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 28 Jan 2010 23:00:21 -0500
Subject: [SciPy-User] bug in signal.lsim2
In-Reply-To: <4B6256F5.3050802@enthought.com>
References: <4B623ECD.4040403@enthought.com>
	<1cd32cbb1001281757x14b7b581xba02f4ae21d3acfb@mail.gmail.com>
	<4B6256F5.3050802@enthought.com>
Message-ID: <1cd32cbb1001282000x332df230n3242e7e1063ee303@mail.gmail.com>

On Thu, Jan 28, 2010 at 10:33 PM, Warren Weckesser wrote:
> josef.pktd at gmail.com wrote:
>> [earlier exchange quoted - snipped]
>
> Sounds useful to me.  A simple implementation is an optional keyword
> argument that is a dict of odeint arguments.  But this would almost
> certainly break if lsim2 were ever reimplemented with a different
> solver.  So perhaps it should allow a common set of ODE solver
> parameters (e.g. absolute and relative error tolerances, max and min
> step sizes, others?).
>
> Perhaps this should wait until after the ODE solver redesign that is
> occasionally discussed:
>     http://projects.scipy.org/scipy/wiki/OdeintRedesign
> Then the solver itself could be an optional argument to lsim2.

I was just thinking of adding to the argument list a **kwds argument
that is directly passed on to whatever ODE solver is used. This should
be pretty flexible for any changes and be backwards compatible.

I've seen and used it in a similar way for calls to optimization
routines; optimize.curve_fit, for example, also does it. Which
keywords are actually valid would depend on which function is called.

(But I'm not a user of lsim, I'm just stealing some ideas from lti and
friends for time series analysis.)

Josef

> [remainder of the quoted thread and script - snipped]
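As a sketch of the pass-through idea josef describes: a variant of lsim2
that forwards extra keyword arguments straight to odeint. This is an
editorial illustration only -- lsim2_kw is a hypothetical name, not scipy
API, and it uses a zero-order hold on the input samples where lsim2 proper
interpolates.

import numpy as np
from scipy import signal
from scipy.integrate import odeint

def lsim2_kw(num, den, U, T, X0=None, **odeint_kwargs):
    """Hypothetical lsim2 variant: **odeint_kwargs (e.g. hmax, rtol,
    atol) are passed straight through to scipy.integrate.odeint."""
    A, B, C, D = signal.tf2ss(num, den)
    U = np.asarray(U, dtype=float)
    T = np.asarray(T, dtype=float)
    if X0 is None:
        X0 = np.zeros(A.shape[0])

    def fprime(x, t):
        # Zero-order hold: use the most recent input sample at time t.
        k = np.searchsorted(T, t, side='right') - 1
        k = min(max(k, 0), len(T) - 1)
        return np.dot(A, x) + np.dot(B, [U[k]])

    X = odeint(fprime, X0, T, **odeint_kwargs)
    Y = np.dot(X, C.T).ravel() + float(D) * U
    return T, Y, X

# With hmax smaller than the pulse width, the solver no longer skips the
# pulse, e.g.:  t, y, x = lsim2_kw([100.0], [1.0, 15.0, 0.0], u, t, hmax=0.05)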
From ryanlists at gmail.com  Fri Jan 29 07:44:48 2010
From: ryanlists at gmail.com (Ryan Krauss)
Date: Fri, 29 Jan 2010 06:44:48 -0600
Subject: [SciPy-User] bug in signal.lsim2
In-Reply-To: <1cd32cbb1001282000x332df230n3242e7e1063ee303@mail.gmail.com>
References: <4B623ECD.4040403@enthought.com>
	<1cd32cbb1001281757x14b7b581xba02f4ae21d3acfb@mail.gmail.com>
	<4B6256F5.3050802@enthought.com>
	<1cd32cbb1001282000x332df230n3242e7e1063ee303@mail.gmail.com>
Message-ID:

Thanks to Warren and Josef for their time and thoughts. I feel like I
now understand the underlying problem and have some good options to
solve my short-term issues (I assigned the project last night and they
need to be able to start working on it immediately).

I actually use a TransferFunction class that derives from ltisys. I
could override its lsim2 method to try out some of these solutions
quickly and fairly easily.

Ryan

On Thu, Jan 28, 2010 at 10:00 PM, josef.pktd at gmail.com wrote:
> [the full thread quoted - snipped]
_______________________________________________
SciPy-User mailing list
SciPy-User at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

From rcsqtc at iqac.csic.es  Fri Jan 29 07:00:55 2010
From: rcsqtc at iqac.csic.es (Ramon Crehuet)
Date: Fri, 29 Jan 2010 13:00:55 +0100
Subject: [SciPy-User] F_CONTIGUOUS and C_CONTIGUOUS
Message-ID: <4B62CDF7.8070509@iqac.csic.es>

Hi all,
I have some doubts about the meaning of F_CONTIGUOUS and C_CONTIGUOUS. I
thought they referred to storing matrices "in rows" or "in columns",
but... Imagine 2 arrays:

y=np.zeros((10000, 10))
y2=np.zeros((10000, 10), order='F')

I can understand y.flags and y2.flags; however, I would expect
y[0,:].flags to show F_CONTIGUOUS False, because it is the last index
which is changing, and y[:,0].flags to show C_CONTIGUOUS False, because
this is a column of that matrix. I am wrong in both.
Similarly, I don't understand why:

In [114]: y2[:,0].flags
Out[114]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True

and:

In [113]: y2[0,:].flags
Out[113]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False

So I guess I have some deep misunderstanding about the meaning of these
flags and I would appreciate some enlightenment. Thanks!
Ramon

From pav+sp at iki.fi  Fri Jan 29 08:32:24 2010
From: pav+sp at iki.fi (Pauli Virtanen)
Date: Fri, 29 Jan 2010 13:32:24 +0000 (UTC)
Subject: [SciPy-User] F_CONTIGUOUS and C_CONTIGUOUS
References: <4B62CDF7.8070509@iqac.csic.es>
Message-ID:

Fri, 29 Jan 2010 13:00:55 +0100, Ramon Crehuet wrote:
> I have some doubts about the meaning of F_CONTIGUOUS and C_CONTIGUOUS. I
> thought they referred to storing matrices "in rows" or "in columns",
> but... Imagine 2 arrays:
> y=np.zeros((10000, 10))
> y2=np.zeros((10000, 10), order='F')
>
> I can understand y.flags and y2.flags; however, I would expect
> y[0,:].flags to show F_CONTIGUOUS False, because it is the last index
> which is changing, and y[:,0].flags to show C_CONTIGUOUS False, because
> this is a column of that matrix. I am wrong in both.

Both y[0,:] and y[:,0] are 1-d arrays. For 1-d arrays, there is no
distinction between Fortran-contiguous and C-contiguous: a 1-d array is
either contiguous or not.

> Similarly, I don't understand why:
> In [114]: y2[:,0].flags
> Out[114]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : True

Here, the elements span a contiguous block of memory. The memory layout
of y2 is (r1c2 = element (row=1, column=2))

    [ r0c0 r1c0 ... r0c1 r1c1 ... ]

Note that with order='C' it would be instead

    [ r0c0 r0c1 ... r1c0 r1c1 ... ]

Consequently, y2[:,0] has here the layout

    [ r0c0 r1c0 ... rnc0 ]

where all elements occur immediately after each other. Hence, it's
contiguous.

> and:
> In [113]: y2[0,:].flags
> Out[113]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : False

Now y2[0,:] has the layout

    [ r0c0 ### ... r0c1 ### ... ]

i.e., all elements except those belonging to the first row are skipped.
The memory layout is discontiguous.

> So I guess I have some deep misunderstanding about the meaning of these
> flags and I would appreciate some enlightenment. Thanks!

The main point is (very tersely) explained here:
http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray
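Pauli's point can also be checked directly through the strides (a short
editorial aside; the byte counts assume 8-byte floats):

>>> import numpy as np
>>> y2 = np.zeros((10000, 10), order='F')
>>> y2[:, 0].strides    # adjacent 8-byte steps: this slice is contiguous
(8,)
>>> y2[0, :].strides    # 80000-byte steps: one element per column, far apart
(80000,)
>>> y2[:, 0].flags['C_CONTIGUOUS'], y2[:, 0].flags['F_CONTIGUOUS']
(True, True)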
From ferrell at diablotech.com  Fri Jan 29 12:58:13 2010
From: ferrell at diablotech.com (Robert Ferrell)
Date: Fri, 29 Jan 2010 10:58:13 -0700
Subject: [SciPy-User] Align empty time series
Message-ID:

ts.align_series fails if both series are empty. As a feature request,
could this special case be handled and return an empty series? As it
is, I have to special-case this in my code, which adds some clutter.

thanks,
-robert

> In [1566]: e1 = ts.time_series(freq='d', data=[], dates=[])
>
> In [1567]: e2 = ts.time_series(freq='d', data=[], dates=[])
>
> In [1568]: ts.align_series(e1,e2)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
>
> /Users/Shared/Develop/Financial/LakMerc/Reports/maFiltersTest.py in ()
> ----> 1
>       2
>       3
>       4
>       5
>
> /Library/Python/2.6/site-packages/scikits.timeseries-0.91.3-py2.6-macosx-10.6-universal.egg/scikits/timeseries/tseries.pyc
> in align_series(*series, **kwargs)
>    1841         start_date = kwargs.pop('start_date',
>    1842                         min([x.start_date for x in filled_series
> -> 1843                              if x.start_date is not None]))
>    1844     if isinstance(start_date, str):
>    1845         start_date = Date(common_freq, string=start_date)
>
> ValueError: min() arg is an empty sequence
>
> /Library/Python/2.6/site-packages/scikits.timeseries-0.91.3-py2.6-macosx-10.6-universal.egg/scikits/timeseries/tseries.py(1843)align_series()
>    1842                         min([x.start_date for x in filled_series
> -> 1843                              if x.start_date is not None]))
>    1844     if isinstance(start_date, str):

From pgmdevlist at gmail.com  Fri Jan 29 13:09:05 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 29 Jan 2010 13:09:05 -0500
Subject: [SciPy-User] Align empty time series
In-Reply-To:
References:
Message-ID:

On Jan 29, 2010, at 12:58 PM, Robert Ferrell wrote:
> ts.align_series fails if both series are empty. As a feature request,
> could this special case be handled and return an empty series? As it
> is, I have to special-case this in my code, which adds some clutter.

That sounds reasonable. You could open a ticket and suggest a patch;
that'll make the issue easier to track down.

From ferrell at diablotech.com  Fri Jan 29 13:16:16 2010
From: ferrell at diablotech.com (Robert Ferrell)
Date: Fri, 29 Jan 2010 11:16:16 -0700
Subject: [SciPy-User] Sum duplicate dates in a series
Message-ID:

How can I sum data for duplicate dates in a time series? I can do it
with a loop, but I wonder if there is some tricky magic I might use.

For instance, I've got a series:

> In [1597]: s
> Out[1597]:
> timeseries([ 10.  11.   1.   2.   3.],
>    dates = [12-Jan-2010 12-Jan-2010 22-Jan-2010 22-Jan-2010 22-Jan-2010],
>    freq  = D)

and I'd like to sum the Jan 12 data together, and the Jan 22 data
together, and return a new series with just two dates.

> timeseries([ 21.   6.],
>    dates = [12-Jan-2010 22-Jan-2010],
>    freq  = D)

Is there an easy way?

Thanks,
-robert

From pgmdevlist at gmail.com  Fri Jan 29 13:42:42 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 29 Jan 2010 13:42:42 -0500
Subject: [SciPy-User] Sum duplicate dates in a series
In-Reply-To:
References:
Message-ID:

On Jan 29, 2010, at 1:16 PM, Robert Ferrell wrote:
> How can I sum data for duplicate dates in a time series? I can do it
> with a loop, but I wonder if there is some tricky magic I might use.
> [example snipped]

Unfortunately, not that easy. You can use ts.find_duplicated_dates to
get a dictionary (duplicated dates, indices in the series). From there,
you can easily get a dictionary (dates, sum of the series for those
dates).

>>> s = ts.time_series([1,2,3,4,5],dates=ts.date_array(["2001-01","2001-01","2001-02","2001-03","2001-03"],freq="M"))
>>> summed = dict((k,s._series[v].sum()) for (k,v) in ts.find_duplicated_dates(s).items())

You can then reinject summed into a new series

>>> dropped = ts.remove_duplicated_dates(s)
>>> import operator
>>> [operator.setitem(dropped,k,v) for (k,v) in summed.items()]

Thinking about it, we could probably overload ts.remove_duplicated_dates
to accept a func argument that tells how to deal with those duplicated
dates... Do you mind opening a ticket?

From jdh2358 at gmail.com  Fri Jan 29 14:00:06 2010
From: jdh2358 at gmail.com (John Hunter)
Date: Fri, 29 Jan 2010 13:00:06 -0600
Subject: [SciPy-User] Sum duplicate dates in a series
In-Reply-To:
References:
Message-ID: <88e473831001291100j66f111ddh9e6e94583fb4bb02@mail.gmail.com>

On Fri, Jan 29, 2010 at 12:42 PM, Pierre GM wrote:
> On Jan 29, 2010, at 1:16 PM, Robert Ferrell wrote:
>> How can I sum data for duplicate dates in a time series? I can do it
>> with a loop, but I wonder if there is some tricky magic I might use.

If you can put your data in a record array, you can use
matplotlib.mlab.rec_groupby

http://matplotlib.sourceforge.net/api/mlab_api.html#matplotlib.mlab.rec_groupby

http://matplotlib.sourceforge.net/examples/misc/rec_groupby_demo.html

JDH

From pgmdevlist at gmail.com  Fri Jan 29 14:13:44 2010
From: pgmdevlist at gmail.com (Pierre GM)
Date: Fri, 29 Jan 2010 14:13:44 -0500
Subject: [SciPy-User] Sum duplicate dates in a series
In-Reply-To: <88e473831001291100j66f111ddh9e6e94583fb4bb02@mail.gmail.com>
References: <88e473831001291100j66f111ddh9e6e94583fb4bb02@mail.gmail.com>
Message-ID: <7A4DA2E1-1A50-4792-9ADB-1117036420C7@gmail.com>

On Jan 29, 2010, at 2:00 PM, John Hunter wrote:
> [rec_groupby suggestion quoted - snipped]

John,
Could you have a look into numpy.lib.recfunctions? That's an attempt to
homogenize what you did for matplotlib, and it'd be great if you could
help.
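For completeness, the summing step can also be written with plain numpy
(an editorial sketch, assuming the dates are already sorted, as they are
in a time series; the day numbers here are hypothetical):

import numpy as np

dates = np.array([14987, 14987, 14997, 14997, 14997])   # placeholder day numbers
values = np.array([10., 11., 1., 2., 3.])

# return_index gives the first position of each distinct date;
# add.reduceat then sums each run of duplicates in one shot.
uniq_dates, first_idx = np.unique(dates, return_index=True)
sums = np.add.reduceat(values, first_idx)
# uniq_dates -> [14987 14997],  sums -> [ 21.   6.]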
From josef.pktd at gmail.com  Fri Jan 29 14:36:42 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 29 Jan 2010 14:36:42 -0500
Subject: [SciPy-User] Sum duplicate dates in a series
In-Reply-To: <7A4DA2E1-1A50-4792-9ADB-1117036420C7@gmail.com>
References: <88e473831001291100j66f111ddh9e6e94583fb4bb02@mail.gmail.com>
	<7A4DA2E1-1A50-4792-9ADB-1117036420C7@gmail.com>
Message-ID: <1cd32cbb1001291136g279f65f4ifa84e7d343295626@mail.gmail.com>

On Fri, Jan 29, 2010 at 2:13 PM, Pierre GM wrote:
> [exchange about rec_groupby and numpy.lib.recfunctions - snipped]

I just wanted to show that there will be some advantages when it is
possible to easily move between packages

>>> import scikits.timeseries as ts
>>> import la
>>> s = ts.time_series([1,2,3,4,5],dates=ts.date_array(["2001-01","2001-01","2001-02","2001-03","2001-03"],freq="M"))
>>> dta = la.larry(s.data, label=[range(len(s.data))])
>>> dat = la.larry(s.dates.tolist(), label=[range(len(s.data))])
>>> s2 = ts.time_series(dta.group_mean(dat).x,dates=ts.date_array(dat.x,freq="M"))
>>> s
timeseries([1 2 3 4 5],
   dates = [Jan-2001 Jan-2001 Feb-2001 Mar-2001 Mar-2001],
   freq  = M)
>>> s2
timeseries([ 1.5  1.5  3.   4.5  4.5],
   dates = [Jan-2001 Jan-2001 Feb-2001 Mar-2001 Mar-2001],
   freq  = M)
>>> s2u = ts.remove_duplicated_dates(s2)
>>> s2u
timeseries([ 1.5  3.   4.5],
   dates = [Jan-2001 ... Mar-2001],
   freq  = M)
>>> s2u.dates
DateArray([Jan-2001, Feb-2001, Mar-2001], freq='M')

It's not so easy yet. But it would be nice if we could use timeseries,
pandas and la for different things, depending on which representation
is more convenient.

Josef

From kwgoodman at gmail.com  Fri Jan 29 15:09:42 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 29 Jan 2010 12:09:42 -0800
Subject: [SciPy-User] Sum duplicate dates in a series
In-Reply-To: <1cd32cbb1001291136g279f65f4ifa84e7d343295626@mail.gmail.com>
References: <88e473831001291100j66f111ddh9e6e94583fb4bb02@mail.gmail.com>
	<7A4DA2E1-1A50-4792-9ADB-1117036420C7@gmail.com>
	<1cd32cbb1001291136g279f65f4ifa84e7d343295626@mail.gmail.com>
Message-ID:

On Fri, Jan 29, 2010 at 11:36 AM, josef.pktd at gmail.com wrote:
> I just wanted to show that there will be some advantages when it is
> possible to easily move between packages
>
>>>> dta = la.larry(s.data, label=[range(len(s.data))])
>>>> dat = la.larry(s.dates.tolist(), label=[range(len(s.data))])

Clever use of larry. The default label is range(n), so you can just do

>>> dta = la.larry(s.data)
>>> dat = la.larry(s.dates.tolist())

From bala1486 at gmail.com  Fri Jan 29 20:30:03 2010
From: bala1486 at gmail.com (Balachandar)
Date: Fri, 29 Jan 2010 20:30:03 -0500
Subject: [SciPy-User] Filter problem
Message-ID: <844404061001291730r40c7b9bdtf12c368809e5540c@mail.gmail.com>

Hello,

I am a newbie to SciPy and also to filtering. I have data from a
pressure sensor that needs to be filtered. The graph can be seen in the
attachment: the top one is unfiltered and the bottom one is filtered. As
you can see, the shape looks the same. I need a good smooth line. My
code goes like this:

b, a = butter(10, 0.2, 'low')
ICPfiltered = lfilter(b, a, ICPunfiltered)

where ICPfiltered is the filtered value and ICPunfiltered is the data
from the sensor (a 1-dimensional array). Am I doing anything wrong with
the usage of the APIs? I have tried to change the cut-off frequency too,
but it isn't of any use. Should I use a filter other than a Butterworth
IIR to get a smooth curve? Thank you...

Thanks,
Bala
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot.png
Type: image/png
Size: 31409 bytes
Desc: not available
URL:

From warren.weckesser at enthought.com  Sat Jan 30 14:55:30 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Sat, 30 Jan 2010 13:55:30 -0600
Subject: [SciPy-User] Filter problem
In-Reply-To: <844404061001291730r40c7b9bdtf12c368809e5540c@mail.gmail.com>
References: <844404061001291730r40c7b9bdtf12c368809e5540c@mail.gmail.com>
Message-ID: <4B648EB2.5040605@enthought.com>

Balachandar wrote:
> [question quoted - snipped]

Bala,

The second argument to butter() is (roughly) the cutoff frequency Wn,
expressed as a fraction of the Nyquist rate, which is half the sampling
rate of the original signal. Wn=0.2 implies that the cutoff frequency of
the lowpass filter is 0.1 times the sampling rate of the signal. If most
of the noise that you want to remove is below this frequency, you won't
see much difference in the filtered signal.

The attached script provides a demonstration of how altering Wn affects
the result of filtering with butter(). It creates a test signal as the
sum of several sinusoidal functions, and then applies a Butterworth
filter for several values of Wn. The attached plot shows the result.

Whether you should use butter() or some other filtering algorithm
depends on your objectives. "A good smooth line" does not give a
well-defined set of criteria for choosing an appropriate filter. :)

Warren
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: filter_demo.py
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: filter_demo.png
Type: image/png
Size: 69967 bytes
Desc: not available
URL:
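A compact sketch of the normalization Warren describes (an editorial
illustration; the sampling rate fs is an assumption, since the thread
never states one):

import numpy as np
from scipy.signal import butter, filtfilt

fs = 125.0                      # assumed sampling rate in Hz (placeholder)
nyq = fs / 2.0                  # Nyquist rate
cutoff_hz = 5.0                 # keep everything below 5 Hz
b, a = butter(4, cutoff_hz / nyq, 'low')   # Wn = cutoff as a fraction of Nyquist

t = np.arange(0.0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 30.0 * t)
y = filtfilt(b, a, x)           # zero-phase variant of lfilter: no time lag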
From robert.pickel at gmail.com  Sat Jan 30 21:04:17 2010
From: robert.pickel at gmail.com (Robert Pickel)
Date: Sat, 30 Jan 2010 21:04:17 -0500
Subject: [SciPy-User] C-API - Dealing with pointers to cfloat type
Message-ID: <4b64e4ef.9653f10a.0899.ffffa99c@mx.google.com>

Hello,

I'm attempting to integrate an algorithm in a numpy extension written
in C. The extension compiles, but testing shows the data is not being
transferred properly.

On the Python side:

thing = array([1.0+1j, 1.0+1j, 1.0+1j], dtype=cfloat)
mymodule.dothing(thing)

On the C side:

npy_cfloat *mydat;
mydat = (npy_cfloat)PyArray_GETPTR1(passedobj, 1);
printf("%f", mydat->real);
...
return(...)

I always get 0.000 printed... Modifying the code to pass an array of
floats from Python, with mydat of type float, seems to work as
expected. The problem seems to be dealing with complex data types.

My build environment is Win7 (64), VS2008, Python 2.6 (win32) + numpy 1.4.0.

I'd appreciate any advice I can get on this.

Thanks,
Bob
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jsseabold at gmail.com  Sun Jan 31 23:39:21 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Sun, 31 Jan 2010 23:39:21 -0500
Subject: [SciPy-User] all() and any() with array-like input
Message-ID:

I'm sure this has come up before, but it's difficult to search for
"any" and "all"...

This is tripping me up, but maybe there is just something I am missing.

In [1]: import numpy as np

In [2]: X = [.2,.2,.2,.2,.2]

In [3]: np.all(X <= 1)
Out[3]: False

In [4]: np.all(X >= 0)
Out[4]: True

In [5]: np.all(np.asarray(X) <= 1)
Out[5]: True

In [6]: np.any(X>1)
Out[6]: True

I guess it's simple enough to use asarray, but I was just curious what
drives this behavior, since the docs indicate that it should work with
array-like structures.

Skipper
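A likely explanation, consistent with the outputs above: X is a plain
Python list, so the comparison X <= 1 is evaluated by Python 2's
mixed-type comparison rules before numpy ever sees it. On CPython 2,
comparing a list to an int falls back to an arbitrary but consistent
type-based ordering, so the comparison yields a single bool rather than
an elementwise array, and np.all() merely reduces that scalar:

>>> X = [.2, .2, .2, .2, .2]
>>> X <= 1        # list vs. int: one bool, not five (CPython 2 behavior)
False
>>> X >= 0        # same rule, hence the confusing True
True
>>> import numpy as np
>>> np.all(np.asarray(X) <= 1)   # converting first makes <= elementwise
True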