From rob.clewley at gmail.com Sat May 1 11:36:05 2010 From: rob.clewley at gmail.com (Rob Clewley) Date: Sat, 1 May 2010 11:36:05 -0400 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: Hi Tom, On Fri, Apr 30, 2010 at 4:38 PM, Charrett, Thomas wrote: > Hello all, > I would like to announce a interactive environment for python, PythonToolkit, or PTK, is loosely based on the MATLAB gui interface and is written in wxpython.: > http://pythontoolkit.sourceforge.net This looks like something I might want to teach with too. I hope to be able to try this out soon as I try to teach with python in my classes. I would benefit from an easy to learn and use graphical interactive environment for my students (some of whom will already be familiar with matlab). I look forward to seeing some tutorial examples to show off how it might be used in a teaching session. But most pressingly you need to provide information on your web page about what steps need to be taken to "install" it and what version restrictions there are. I cannot see any information in the download zip file either. I have wxPython installed and I'm on a Mac OS X 10.4 with Python 2.4. It looks like there is no setup.py and so maybe PTK.pyw is supposed to be just run as-is, but I don't know whether the ptk directory is supposed to be dumped into site-packages or be standalone, and whether path environment changes etc. are needed. Running PTK.pyw doesn't work for me: File "/Users/rob/ptk/app/__init__.py", line 18, in ? startdir = __main__.__file__.rpartition(os.sep)[0] AttributeError: 'str' object has no attribute 'rpartition' I think rpartition is only in Python 2.5+ ? -Rob From stefan at sun.ac.za Sat May 1 15:02:40 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 1 May 2010 21:02:40 +0200 Subject: [SciPy-User] watershed question In-Reply-To: <20100429193826.GA9432@phare.normalesup.org> References: <20100425123445.GA25175@phare.normalesup.org> <20100429193826.GA9432@phare.normalesup.org> Message-ID: Hi Emanuelle The CellProfiler team has been very generous in sharing code so far: http://stefanv.github.com/scikits.image/contribute.html#merge-code-provided-by-cellprofiler-team I have just been too busy working on my PhD to get it all integrated :( [Of course, I'd be very happy if someone would help with this task!] I see the svn links on that page is broken; I'll see where the code is hosted now. I'm not sure if the watershed code is covered. Cheers St?fan On 29 April 2010 21:38, Emmanuelle Gouillart wrote: > ? ? ? ?Hi St?fan and Zach, > > ? ? ? ?thank you for your answers. So it seems that > ndimage.watershed_ift is quite buggy, maybe some warnings should be > added to its docstring? We can't afford people spending too much time > trying to use the function if it doesn't work. > > ? ? ? ?Using the cellprofile/cpmath package is a neat trick, I tried it > and it works perfectly. It even works for 3-D arrays (using the > fast_watershed function)! Too bad that it's GPL-licensed and it's not > possible to integrate the code in the image processing scikit :(. > > ? ? ? ?Thanks again, > > ? ? ? ?Emmanuelle > > > > On Wed, Apr 28, 2010 at 01:18:11PM -0400, Zachary Pincus wrote: >> > Unless I'm also missing something obvious, the code returns an invalid >> > result. ?I even adjusted the depths of the two "pits", but always one >> > region overruns the other---not what I would expect to happen. 
?I >> > haven't delved into the ndimage code at all, but I wonder weather we >> > shouldn't implement one of the simpler algorithms as part of >> > scikits.image.segment for comparison? > >> Cellprofiler has a watershed algorithm, I believe. And like most of >> the cellprofiler stuff, the implementation seems pretty high-quality >> and well-thought-out. > >> I wound up extracting the cpmath sub-package, and (after a few >> setup.py changes) it works great standalone with just scipy and numpy >> as dependencies: >> https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/cellprofiler/cpmath/ > >> Zach >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From stefan at sun.ac.za Sat May 1 15:15:22 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 1 May 2010 21:15:22 +0200 Subject: [SciPy-User] watershed question In-Reply-To: References: <20100425123445.GA25175@phare.normalesup.org> <20100429193826.GA9432@phare.normalesup.org> Message-ID: I've updated the links to the repository. Lee, I assume the watershed code covered under the arrangement? Looks like we've got some hands to help! Regards St?fan 2010/5/1 St?fan van der Walt : > Hi Emanuelle > > The CellProfiler team has been very generous in sharing code so far: > > http://stefanv.github.com/scikits.image/contribute.html#merge-code-provided-by-cellprofiler-team > > I have just been too busy working on my PhD to get it all integrated > :( ?[Of course, I'd be very happy if someone would help with this > task!] > > I see the svn links on that page is broken; I'll see where the code is > hosted now. ?I'm not sure if the watershed code is covered. > > Cheers > St?fan > > On 29 April 2010 21:38, Emmanuelle Gouillart > wrote: >> ? ? ? ?Hi St?fan and Zach, >> >> ? ? ? ?thank you for your answers. So it seems that >> ndimage.watershed_ift is quite buggy, maybe some warnings should be >> added to its docstring? We can't afford people spending too much time >> trying to use the function if it doesn't work. >> >> ? ? ? ?Using the cellprofile/cpmath package is a neat trick, I tried it >> and it works perfectly. It even works for 3-D arrays (using the >> fast_watershed function)! Too bad that it's GPL-licensed and it's not >> possible to integrate the code in the image processing scikit :(. >> >> ? ? ? ?Thanks again, >> >> ? ? ? ?Emmanuelle >> >> >> >> On Wed, Apr 28, 2010 at 01:18:11PM -0400, Zachary Pincus wrote: >>> > Unless I'm also missing something obvious, the code returns an invalid >>> > result. ?I even adjusted the depths of the two "pits", but always one >>> > region overruns the other---not what I would expect to happen. ?I >>> > haven't delved into the ndimage code at all, but I wonder weather we >>> > shouldn't implement one of the simpler algorithms as part of >>> > scikits.image.segment for comparison? >> >>> Cellprofiler has a watershed algorithm, I believe. And like most of >>> the cellprofiler stuff, the implementation seems pretty high-quality >>> and well-thought-out. 
>> >>> I wound up extracting the cpmath sub-package, and (after a few >>> setup.py changes) it works great standalone with just scipy and numpy >>> as dependencies: >>> https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/cellprofiler/cpmath/ >> >>> Zach >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From cr.anil at gmail.com Sun May 2 07:36:05 2010 From: cr.anil at gmail.com (Anil C R) Date: Sun, 2 May 2010 17:06:05 +0530 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: This looks good :D great job Tom!! Anil On Sat, May 1, 2010 at 2:08 AM, Charrett, Thomas wrote: > Hello all, > I would like to announce a interactive environment for python, > PythonToolkit, or PTK, is loosely based on the MATLAB gui interface and is > written in wxpython.: > http://pythontoolkit.sourceforge.net > > It started life a personal project for use in day-to-day lab work and > teaching (including myself) python so it is far from complete, but usable. > Key features: > > - A console window with support for muliple isolated python interpreters > (engines) running as external processes, > - External (process) engines allow interactive use of GUI toolkits > (currently wxPython and Tk) > - Full object inspection, auto-completions and call tips in internal and > external engines. > - Matlab style namespace/workspace browser than can be extended/customised > for different python types. > - GUI views for strings, unicode, lists and numpy arrays (more can be > easily added) > - Python path management . > - A simple python code editor. > - Searchable command history that is stored between sessions. > - Extendible via a tool plugin system. > > I would be interested in any comments/suggestion/feedback. > > Thanks, > > Tom > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cr.anil at gmail.com Sun May 2 07:43:48 2010 From: cr.anil at gmail.com (Anil C R) Date: Sun, 2 May 2010 17:13:48 +0530 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: Rob, the requirements says it needs Python 2.6+ http://pythontoolkit.sourceforge.net/about.html Anil On Sun, May 2, 2010 at 5:06 PM, Anil C R wrote: > This looks good :D great job Tom!! > > Anil > > > > On Sat, May 1, 2010 at 2:08 AM, Charrett, Thomas < > t.charrett at cranfield.ac.uk> wrote: > >> Hello all, >> I would like to announce a interactive environment for python, >> PythonToolkit, or PTK, is loosely based on the MATLAB gui interface and is >> written in wxpython.: >> http://pythontoolkit.sourceforge.net >> >> It started life a personal project for use in day-to-day lab work and >> teaching (including myself) python so it is far from complete, but usable. 
>> Key features: >> >> - A console window with support for muliple isolated python interpreters >> (engines) running as external processes, >> - External (process) engines allow interactive use of GUI toolkits >> (currently wxPython and Tk) >> - Full object inspection, auto-completions and call tips in internal and >> external engines. >> - Matlab style namespace/workspace browser than can be extended/customised >> for different python types. >> - GUI views for strings, unicode, lists and numpy arrays (more can be >> easily added) >> - Python path management . >> - A simple python code editor. >> - Searchable command history that is stored between sessions. >> - Extendible via a tool plugin system. >> >> I would be interested in any comments/suggestion/feedback. >> >> Thanks, >> >> Tom >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cr.anil at gmail.com Sun May 2 07:54:13 2010 From: cr.anil at gmail.com (Anil C R) Date: Sun, 2 May 2010 17:24:13 +0530 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: Tom, are you planning on integrating ipython too? I think that would be nice. Also I use matplotlib, and every time i need to do an imshow, I need to do something like this: img = imread('image.png') imshow(img),show() is this a problem with matplotlib or with your software?? any workarounds to avoid the show() call? Thanks Anil -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Mon May 3 06:04:51 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 3 May 2010 12:04:51 +0200 Subject: [SciPy-User] deterministic random variable Message-ID: Hi, As far as I can see scipy.stats does not support the deterministic distribution. Would it be a good idea to implement this also? In my opinion this distribution is very useful to use as a test case, for debugging purposes for instance. bye Nicky From josef.pktd at gmail.com Mon May 3 09:16:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 3 May 2010 09:16:54 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: > Hi, > > As far as I can see scipy.stats does not support the deterministic > distribution. Would it be a good idea to implement this also? In my > opinion this distribution is very useful to use as a test case, for > debugging purposes for instance. You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution (I never heard the term deterministic distribution before). If the support is an integer, then rv_discrete might work, looks good see below Are there any useful operations, that we could do with it? I think I can see a case for debugging programs that use the distributions in scipy.stats, but almost degenerate might also work for debugging. What I would like to have is a discrete distribution on the real line, instead of the integers, like rv_discrete but with support on arbitrary floats. This could use the machinery of rv_discrete but would need a generalizing rewrite. 
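As a rough sketch of the kind of thing I mean (a hypothetical helper, not
anything that exists in scipy.stats), float support can already be faked by
letting rv_discrete draw integer indices and mapping them back onto the
float values:

import numpy as np
from scipy import stats

def float_discrete(support, probs, name='float_discrete'):
    # hypothetical wrapper: rv_discrete handles the integer indices
    # 0..k-1, which are then mapped back onto the float support points
    support = np.asarray(support, dtype=float)
    probs = np.asarray(probs, dtype=float)
    idx = stats.rv_discrete(values=(np.arange(len(support)), probs), name=name)
    def rvs(size=1):
        return support[idx.rvs(size=size)]
    def cdf(x):
        return np.array([probs[support <= xi].sum() for xi in np.atleast_1d(x)])
    return rvs, cdf

rvs, cdf = float_discrete([0.5, 1.25, 3.0], [0.2, 0.5, 0.3])
print rvs(size=10)
print cdf([0.0, 0.5, 2.0, 3.0])

The degenerate case would then just be float_discrete([2.5], [1.0]).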
this looks good >>> stats.rv_discrete(values=([0],[1]), name='degenerate') >>> deg=stats.rv_discrete(values=([0],[1]), name='degenerate') >>> deg.rvs(size=10) array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>> deg.pmf(np.arange(-5,6)) array([ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]) >>> deg.cdf(np.arange(-5,6)) array([ 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]) >>> deg.sf(np.arange(-5,6)) array([ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.]) >>> deg.ppf(np.linspace(0,1,11)) array([-1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) >>> deg.stats() (array(0.0), array(0.0)) >>> deg.stats(moments='mvsk') (array(0.0), array(0.0), array(-1.#IND), array(-1.#IND)) degenerate Bernoulli has a nan problem in pmf >>> stats.bernoulli.rvs(0,size=10) array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>> stats.bernoulli.pmf(np.arange(-5,6),0.) array([ 0., 0., 0., 0., 0., NaN, 0., 0., 0., 0., 0.]) >>> stats.bernoulli.cdf(np.arange(-5,6),0.) array([ 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]) >>> stats.bernoulli.pmf(np.arange(-5,6),1.) array([ 0., 0., 0., 0., 0., 0., NaN, 0., 0., 0., 0.]) >>> stats.bernoulli.ppf(np.linspace(0,1,11),0.) array([-1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]) >>> stats.bernoulli.ppf(np.linspace(0,1,11),1.) array([-1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) >>> stats.bernoulli.stats(0., moments='mvsk') (array(0.0), array(0.0), array(1.#INF), array(1.#INF)) and almost degenerate Bernoulli >>> stats.bernoulli.pmf(np.arange(-5,6),1e-16) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 1.00000000e-16, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.bernoulli.pmf(np.arange(-5,6),1-1e-16) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.11022302e-16, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.bernoulli.ppf(np.linspace(0,1,11),1-1e-16) array([-1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) Josef > > bye > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon May 3 09:35:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 3 May 2010 09:35:42 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 3, 2010 at 9:16 AM, wrote: > On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: >> Hi, >> >> As far as I can see scipy.stats does not support the deterministic >> distribution. Would it be a good idea to implement this also? In my >> opinion this distribution is very useful to use as a test case, for >> debugging purposes for instance. > > You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution > (I never heard the term deterministic distribution before). > > If the support is an integer, then rv_discrete might work, looks good see below > > Are there any useful operations, that we could do with it? > I think I can see a case for debugging programs that use the > distributions in scipy.stats, but almost degenerate might also work > for debugging. > > What I would like to have is a discrete distribution on the real line, > instead of the integers, like rv_discrete but with support on > arbitrary floats. This could use the machinery of rv_discrete but > would need a generalizing rewrite. 
> > > this looks good > >>>> stats.rv_discrete(values=([0],[1]), name='degenerate') > >>>> deg=stats.rv_discrete(values=([0],[1]), name='degenerate') >>>> deg.rvs(size=10) > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>>> deg.pmf(np.arange(-5,6)) > array([ 0., ?0., ?0., ?0., ?0., ?1., ?0., ?0., ?0., ?0., ?0.]) >>>> deg.cdf(np.arange(-5,6)) > array([ 0., ?0., ?0., ?0., ?0., ?1., ?1., ?1., ?1., ?1., ?1.]) >>>> deg.sf(np.arange(-5,6)) > array([ 1., ?1., ?1., ?1., ?1., ?0., ?0., ?0., ?0., ?0., ?0.]) >>>> deg.ppf(np.linspace(0,1,11)) > array([-1., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0.]) >>>> deg.stats() > (array(0.0), array(0.0)) >>>> deg.stats(moments='mvsk') > (array(0.0), array(0.0), array(-1.#IND), array(-1.#IND)) > > > degenerate Bernoulli has a nan problem in pmf > >>>> stats.bernoulli.rvs(0,size=10) > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>>> stats.bernoulli.pmf(np.arange(-5,6),0.) > array([ ?0., ? 0., ? 0., ? 0., ? 0., ?NaN, ? 0., ? 0., ? 0., ? 0., ? 0.]) >>>> stats.bernoulli.cdf(np.arange(-5,6),0.) > array([ 0., ?0., ?0., ?0., ?0., ?1., ?1., ?1., ?1., ?1., ?1.]) >>>> stats.bernoulli.pmf(np.arange(-5,6),1.) > array([ ?0., ? 0., ? 0., ? 0., ? 0., ? 0., ?NaN, ? 0., ? 0., ? 0., ? 0.]) >>>> stats.bernoulli.ppf(np.linspace(0,1,11),0.) > array([-1., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?1.]) >>>> stats.bernoulli.ppf(np.linspace(0,1,11),1.) > array([-1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1.]) >>>> stats.bernoulli.stats(0., moments='mvsk') > (array(0.0), array(0.0), array(1.#INF), array(1.#INF)) > > > and almost degenerate Bernoulli > >>>> stats.bernoulli.pmf(np.arange(-5,6),1e-16) > array([ ?0.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00, ? 1.00000000e+00, > ? ? ? ? 1.00000000e-16, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00]) >>>> stats.bernoulli.pmf(np.arange(-5,6),1-1e-16) > array([ ?0.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00, ? 1.11022302e-16, > ? ? ? ? 1.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00]) >>>> stats.bernoulli.ppf(np.linspace(0,1,11),1-1e-16) > array([-1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1.]) for the record (and future searches) almost degenerate normal also seems to work, http://en.wikipedia.org/wiki/Dirac_delta_function >>> stats.norm.rvs(loc=2.5, scale=1e-10, size=10) array([ 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5]) >>> stats.norm.cdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-10) array([ 0. , 0. , 0. , 0. , 0. , 0.5, 1. , 1. , 1. , 1. , 1. ]) >>> stats.norm.pdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-10) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.98942280e+09, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.norm.cdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-16) array([ 0. , 0. , 0. , 0. , 0. , 0.5, 1. , 1. , 1. , 1. , 1. 
]) >>> stats.norm.pdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-16) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.98942280e+15, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.norm.ppf(np.linspace(0,1,11),loc=2.5, scale=1e-16) array([-Inf, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, Inf]) >>> Josef > > Josef > >> >> bye >> >> Nicky >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From amcmorl at gmail.com Mon May 3 12:49:44 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 3 May 2010 12:49:44 -0400 Subject: [SciPy-User] loadmat, savemat and strings Message-ID: Hi all, I'm running in to problems trying to save some metadata in a matlab file variable written with savemat, and read in with loadmat. I've condensed the problem to the following: import numpy as np from scipy import io a = np.array(['HelloWorld', 'Foobar']) io.savemat('tmp.mat', dict(a=a)) res = io.loadmat('tmp.mat') print res['a'] -> array([u'HloolWw', u'elWrdo'], dtype='U10')) -> array([u'\U48000000\U6c000000\U6f000000\U6f000000\U6c000000\U57000000\U77000000', u'\U65000000\U6c000000\U57000000\U72000000\U64000000\U6f000000'], dtype='>U10') and res['a'].byteswap() gives the same result. Finally, I've tried coercing the input into a unicode type before saving... print a.astype('U') ->array([u'HelloWorld', u'Wow'], dtype=' 111 matfile_dict = MR.get_variables() 112 if mdict is not None: 113 mdict.update(matfile_dict) /usr/lib/python2.6/dist-packages/scipy/io/matlab/miobase.pyc in get_variables(self, variable_names) 359 getter.to_next() 360 continue --> 361 res = getter.get_array() 362 mdict[name] = res 363 if getter.is_global: /usr/lib/python2.6/dist-packages/scipy/io/matlab/miobase.pyc in get_array(self) 400 def get_array(self): 401 ''' Gets an array from matrix, and applies any necessary processing ''' --> 402 arr = self.get_raw_array() 403 return self.array_reader.processor_func(arr, self) 404 /usr/lib/python2.6/dist-packages/scipy/io/matlab/mio5.pyc in get_raw_array(self) 442 dtype=np.dtype('U1'), 443 buffer=np.array(res), --> 444 order='F').copy() 445 446 TypeError: buffer is too small for requested array Is this a bug (I guess it shouldn't throw errors quite like this in any case), and is there a successful method for saving string types into matlab files and retrieving them? Thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon May 3 13:32:08 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 May 2010 10:32:08 -0700 Subject: [SciPy-User] loadmat, savemat and strings In-Reply-To: References: Message-ID: Hi, > from scipy import io > a = np.array(['HelloWorld', 'Foobar']) > io.savemat('tmp.mat', dict(a=a)) > res = io.loadmat('tmp.mat') > print res['a'] > -> > array([u'HloolWw', u'elWrdo'], > ?? ? ?dtype=' io.savemat('tmp.mat', dict(a=a.astype('U'))) > res = io.loadmat('tmp.mat') > TypeError: buffer is too small for requested array > Is this a bug (I guess it shouldn't throw errors quite like this in any > case), and is there a successful method for saving string types into matlab > files and retrieving them? 
Certainly a bug - I'll have a look later today, thanks for the report, Best, Matthew From t.charrett at cranfield.ac.uk Mon May 3 14:39:29 2010 From: t.charrett at cranfield.ac.uk (Charrett, Thomas) Date: Mon, 3 May 2010 19:39:29 +0100 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, Message-ID: Anil, Thanks for the comments. The show() call is 'feature' of matplotlib, to avoid slow redraws when constructing figures. To get rid of it you can put pylab into interactive mode using pylab.ion() (pylab.ioff() does the opposite) or by editing your matplotlib configuration files. IPython may well turn it on automatically. You will also probably need to make sure you are using the correct matplotlib backend for the engine type - for internal/ wxExternal engines this is one of the wx backends, for the TkExternal engine use the Tk backend etc... As for integrating IPython probably not, but I may decide to add the same magic commands that IPython uses, I'm not sure yet as some of them seem a bit pointless, %run = python execfile command, and others are replaced by the gui , such as %who/%whos. Tom ------------------------------ Message: 3 Date: Sun, 2 May 2010 17:24:13 +0530 From: Anil C R Subject: Re: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment To: SciPy Users List Message-ID: Content-Type: text/plain; charset="iso-8859-1" Tom, are you planning on integrating ipython too? I think that would be nice. Also I use matplotlib, and every time i need to do an imshow, I need to do something like this: img = imread('image.png') imshow(img),show() is this a problem with matplotlib or with your software?? any workarounds to avoid the show() call? Thanks Anil From vanforeest at gmail.com Mon May 3 15:32:02 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 3 May 2010 21:32:02 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: Hi Josef, Thanks for your answer. On 3 May 2010 15:16, wrote: > On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: >> Hi, >> >> As far as I can see scipy.stats does not support the deterministic >> distribution. Would it be a good idea to implement this also? In my >> opinion this distribution is very useful to use as a test case, for >> debugging purposes for instance. One case is the M/D/1 queue, a single server with exponentially distributed interarrival times and deterministic service times. Another case is an inventory system with periodic replenishments, and random demands. A first simple model would be to use deterministically distributed interreplenishment times. The size of demand can also be taken to be deterministic, as an interesting limiting case. > > You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution > (I never heard the term deterministic distribution before). Yes. > > If the support is an integer, then rv_discrete might work, looks good see below > > Are there any useful operations, that we could do with it? Yes, like simulating the M/D/1 queue. Suppose I would like to build a queueing simulator. I would like to set this up in a generic way, and pass rv_arrival and rv_service as frozen rvs, Like this I can experiment with several distributions, including the deterministic distribution as a limiting case or simple case, all within the same framework. > I think I can see a case for debugging programs that use the > distributions in scipy.stats, but almost degenerate might also work > for debugging. 
Sure, but sometimes you just want to exclude random effects. Moreover, I would like to see "rv = stats.deterministic(...)" in the code, for the purpose of readability. > > What I would like to have is a discrete distribution on the real line, > instead of the integers, like rv_discrete but with support on > arbitrary floats. Yes, indeed. Please let me know your opinion. bye Nicky From thomas.robitaille at gmail.com Mon May 3 17:02:27 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Mon, 3 May 2010 17:02:27 -0400 Subject: [SciPy-User] Serious issue with interp2d Message-ID: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> Hi, I'm having issues getting interp2d to work for even simple arrays with linear interpolation. The following example: import numpy as np from scipy.interpolate import interp2d # Create image nx, ny = 10, 10 image = np.zeros((nx,ny)) for i in range(nx): for j in range(ny): image[i,j] = float(i + j/2) # Define pixel centers along each direction x = np.linspace(0.5, float(nx) - 0.5, nx) y = np.linspace(0.5, float(ny) - 0.5, ny) # Create interpolating function f = interp2d(x,y,image, kind='linear') Returns the following warning Warning: No more knots can be added because the number of B-spline coefficients already exceeds the number of data points m. Probably causes: either s or m too small. (fp>s) kx,ky=1,1 nx,ny=13,12 m=100 fp=0.000000 s=0.000000 and some of the results using the interpolating function are wrong. What is going on? I don't understand why spline coefficients are even mentioned, because I specified that I just wanted linear interpolation. Can anyone reproduce this issue? I'm using scipy svn r6368. Thanks, Thomas From matthew.brett at gmail.com Tue May 4 02:48:16 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 May 2010 23:48:16 -0700 Subject: [SciPy-User] loadmat, savemat and strings In-Reply-To: References: Message-ID: Hi, >> from scipy import io >> a = np.array(['HelloWorld', 'Foobar']) >> io.savemat('tmp.mat', dict(a=a)) >> res = io.loadmat('tmp.mat') >> print res['a'] >> -> >> array([u'HloolWw', u'elWrdo'], >> ?? ? ?dtype=' >> io.savemat('tmp.mat', dict(a=a.astype('U'))) >> res = io.loadmat('tmp.mat') > >> TypeError: buffer is too small for requested array >> Is this a bug (I guess it shouldn't throw errors quite like this in any >> case), and is there a successful method for saving string types into matlab >> files and retrieving them? > > Certainly a bug - I'll have a look later today, thanks for the report, The two bugs you found should be fixed in latest SVN... Best, Matthew From chris at simplistix.co.uk Tue May 4 07:00:05 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 04 May 2010 12:00:05 +0100 Subject: [SciPy-User] problems with build Message-ID: <4BDFFE35.8060107@simplistix.co.uk> Hi All, Now that I've finally managed to subscribe to this list, I haev a question about installation of numpy and scipy. 
So, I tried this to get the latest numpy installed on an Ubuntu box: sudo apt-get build-dep python-numpy Then, inside the virtual_env I'm working in: bin/easy_install bin/easy_install numpy ...which left me with: Installed .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg Processing dependencies for numpy Finished processing dependencies for numpy Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs func(*targs, **kargs) File "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", line 248, in clean_up_temporary_directory SystemError: Parent module 'numpy.distutils' not loaded Error in sys.exitfunc: Traceback (most recent call last): File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs func(*targs, **kargs) File "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", line 248, in clean_up_temporary_directory SystemError: Parent module 'numpy.distutils' not loaded ...and yet: $ bin/python Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> Any idea what those weird atexit handlers are supposed to do?! They seem to fire not only when numpy is installed but also when anything that depends on numpy is installed... cheers, Chris From denis-bz-gg at t-online.de Tue May 4 07:33:50 2010 From: denis-bz-gg at t-online.de (denis) Date: Tue, 4 May 2010 04:33:50 -0700 (PDT) Subject: [SciPy-User] Serious issue with interp2d In-Reply-To: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> References: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> Message-ID: <10404874-c937-4653-a3bc-1ac9db69962a@p2g2000yqh.googlegroups.com> Thomas. RectBivariateSpline works in 0.7.1: from __future__ import division import numpy as np from scipy.interpolate import RectBivariateSpline np.set_printoptions( 2, threshold=100, suppress=True ) # .2f # Create image nx, ny = 10, 10 image = np.zeros((nx,ny)) for i in range(nx): for j in range(ny): image[i,j] = i + j/2 # Define pixel centers along each direction x = np.linspace(0.5, nx - 0.5, nx) y = np.linspace(0.5, ny - 0.5, ny) # Create interpolating function f = RectBivariateSpline( x,y,image, kx=1, ky=1 ) # s=0: interpolate # and use it xi = np.linspace(0, nx, 2*nx + 1) yi = np.linspace(0, ny, 2*ny + 1) zi = f( xi, yi ) print "RectBivariateSpline:", zi (There's an open ticket on interp2d 'linear' http://projects.scipy.org/scipy/ticket/898 but to me fitpack is murky / tryitandsee, lacks overview doc.) cheers -- denis From denis-bz-gg at t-online.de Tue May 4 07:33:50 2010 From: denis-bz-gg at t-online.de (denis) Date: Tue, 4 May 2010 04:33:50 -0700 (PDT) Subject: [SciPy-User] Serious issue with interp2d In-Reply-To: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> References: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> Message-ID: <10404874-c937-4653-a3bc-1ac9db69962a@p2g2000yqh.googlegroups.com> Thomas. 
RectBivariateSpline works in 0.7.1: from __future__ import division import numpy as np from scipy.interpolate import RectBivariateSpline np.set_printoptions( 2, threshold=100, suppress=True ) # .2f # Create image nx, ny = 10, 10 image = np.zeros((nx,ny)) for i in range(nx): for j in range(ny): image[i,j] = i + j/2 # Define pixel centers along each direction x = np.linspace(0.5, nx - 0.5, nx) y = np.linspace(0.5, ny - 0.5, ny) # Create interpolating function f = RectBivariateSpline( x,y,image, kx=1, ky=1 ) # s=0: interpolate # and use it xi = np.linspace(0, nx, 2*nx + 1) yi = np.linspace(0, ny, 2*ny + 1) zi = f( xi, yi ) print "RectBivariateSpline:", zi (There's an open ticket on interp2d 'linear' http://projects.scipy.org/scipy/ticket/898 but to me fitpack is murky / tryitandsee, lacks overview doc.) cheers -- denis From amcmorl at gmail.com Tue May 4 09:34:42 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Tue, 4 May 2010 09:34:42 -0400 Subject: [SciPy-User] Optimization with smoothing Message-ID: Hi all, I need to do some optimization where one of the parameters is a spline-smoothed 1-d sequence, with, say, 10 values. What's the best way to go about this using scipy (or any other numpy-compatible Python package)? I could imagine using one of the scipy.optimize routines and then smoothing the relevant parameters within the optimization loop, but it would be best if the next iteration's of parameters were chosen from the previous iteration's _smoothed_ parameters rather than their 'non-smooth' predecessors, as it seems like this would keep the optimization better behaved. Is this possible? Thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce.labitt at autoliv.com Tue May 4 15:48:55 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Tue, 4 May 2010 15:48:55 -0400 Subject: [SciPy-User] lstsq error under Windows? Message-ID: I have found an issue with scipy.linalg.lstsq, I think. The following code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should work in WinXP. Here is a minimum example: """==== program testlstsq.py ======================""" from scipy.linalg import eig, lstsq from numpy import angle, zeros, pi, arcsin A = zeros((4,1), dtype='complex') B = zeros((4,1), dtype='complex') A[0,0] = -0.535412460549-2.65798938848e-17j A[1,0] = -0.369432866546-0.131765700574j A[2,0] = -0.222906796932-0.263237285725j A[3,0] = -0.069087096386-0.38609560454j B[0,0] = -0.369432866546-0.131765700574j B[1,0] = -0.222906796932-0.263237285725j B[2,0] = -0.069087096386-0.38609560454j B[3,0] = 0.0882283631514-0.528093039953j try: print 'Got here' phi = lstsq(A,B) print 'Finished lstsq' except: print 'Exception Occurred' else: for a in range(len(phi)): print 'phi[',a,']=',phi[a] w = -angle( eig(phi[0])[0][:] ) d = 0.5 aa = arcsin( w / (2.0*pi) )*180./pi # in degrees print 'aa unsorted =', aa """ ==== end of testlstsq.py ==========================""" If this is run under ipython, or python on Ubuntu, the answer is: In [3]: run testlstsq.py Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aoa unsorted = [-3.780145] If testlstsq.py is run under WinXP-32 one gets the following result: > python testlstsq.py Got here ** On entry to ZGELSS parameter number 12 had an illegal value I think that ZGELSS is in LAPACK. After that, I am in over my head. 
Can someone help with this? Under Windows I am running: WinXP-x86-32 Python(x,y)2.6.2.0 --> Python 2.6.2 numpy 1.3.0 scipy 0.7.1 ipython 0.10 For Linux, Ubuntu 10.04 x86-64 Python 2.6.5 numpy 1.3.0 scipy 0.7.0-2 ipython 0.10-1 Thanks for any and all help! -Bruce
****************************** From josef.pktd at gmail.com Tue May 4 16:10:04 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 May 2010 16:10:04 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 3:48 PM, wrote: > I have found an issue with scipy.linalg.lstsq, I think. ?The following > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it should > work in WinXP. ?Here is a minimum example: > > """==== program testlstsq.py ======================""" > from scipy.linalg import eig, lstsq > from numpy import angle, zeros, pi, arcsin > > A = zeros((4,1), dtype='complex') > B = zeros((4,1), dtype='complex') > > A[0,0] = -0.535412460549-2.65798938848e-17j > A[1,0] = -0.369432866546-0.131765700574j > A[2,0] = -0.222906796932-0.263237285725j > A[3,0] = -0.069087096386-0.38609560454j > > B[0,0] = -0.369432866546-0.131765700574j > B[1,0] = -0.222906796932-0.263237285725j > B[2,0] = -0.069087096386-0.38609560454j > B[3,0] = ?0.0882283631514-0.528093039953j > > try: > ? ?print 'Got here' > ? ?phi = lstsq(A,B) > ? ?print 'Finished lstsq' > except: > ? ?print 'Exception Occurred' > else: > ? ?for a in range(len(phi)): > ? ? ? ?print 'phi[',a,']=',phi[a] > > ? ?w = -angle( eig(phi[0])[0][:] ) > ? ?d = 0.5 > ? ?aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees > ? ?print 'aa unsorted =', aa > """ ==== end of testlstsq.py ==========================""" > > If this is run under ipython, or python on Ubuntu, the answer is: > > In [3]: run testlstsq.py > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aoa unsorted = [-3.780145] > > If testlstsq.py is run under WinXP-32 one gets the following result: >> python testlstsq.py > Got here > ?** On entry to ZGELSS parameter number 12 had an illegal value > > I think that ZGELSS is in LAPACK. ?After that, I am in over my head. ?Can > someone help with this? > > Under Windows I am running: > WinXP-x86-32 > Python(x,y)2.6.2.0 --> Python 2.6.2 > numpy 1.3.0 > scipy 0.7.1 > ipython 0.10 no problem here, windowsXP, python 2.5, numpy 1.4.0, scipy 0.8dev_something Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aa unsorted = [-3.780145] after replacing scipy.linalg with numpy.linalg, same result Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aa unsorted = [-3.780145] Josef > > For Linux, Ubuntu 10.04 x86-64 > Python 2.6.5 > numpy 1.30 > scipy 0.7.0-2 > ipython 0.10-1 > > Thanks for any and all help! > -Bruce > > > > ****************************** > Neither the footer nor anything else in this E-mail is intended to or constitutes an
> ****************************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bruce.labitt at autoliv.com Tue May 4 16:27:16 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Tue, 4 May 2010 16:27:16 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: Message-ID: scipy-user-bounces at scipy.org wrote on 05/04/2010 03:48:55 PM: > I have found an issue with scipy.linalg.lstsq, I think. The following > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should > work in WinXP. Here is a minimum example: > > """==== program testlstsq.py ======================""" > from scipy.linalg import eig, lstsq > from numpy import angle, zeros, pi, arcsin > > A = zeros((4,1), dtype='complex') > B = zeros((4,1), dtype='complex') > > A[0,0] = -0.535412460549-2.65798938848e-17j > A[1,0] = -0.369432866546-0.131765700574j > A[2,0] = -0.222906796932-0.263237285725j > A[3,0] = -0.069087096386-0.38609560454j > > B[0,0] = -0.369432866546-0.131765700574j > B[1,0] = -0.222906796932-0.263237285725j > B[2,0] = -0.069087096386-0.38609560454j > B[3,0] = 0.0882283631514-0.528093039953j > > try: > print 'Got here' > phi = lstsq(A,B) > print 'Finished lstsq' > except: > print 'Exception Occurred' > else: > for a in range(len(phi)): > print 'phi[',a,']=',phi[a] > > w = -angle( eig(phi[0])[0][:] ) > d = 0.5 > aa = arcsin( w / (2.0*pi) )*180./pi # in degrees > print 'aa unsorted =', aa > """ ==== end of testlstsq.py ==========================""" > > If this is run under ipython, or python on Ubuntu, the answer is: > > In [3]: run testlstsq.py > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aoa unsorted = [-3.780145] > > If testlstsq.py is run under WinXP-32 one gets the following result: > > python testlstsq.py > Got here > ** On entry to ZGELSS parameter number 12 had an illegal value > > I think that ZGELSS is in LAPACK. After that, I am in over my head. Can > someone help with this? > > Under Windows I am running: > WinXP-x86-32 > Python(x,y)2.6.2.0 --> Python 2.6.2 > numpy 1.3.0 > scipy 0.7.1 > ipython 0.10 > > For Linux, Ubuntu 10.04 x86-64 > Python 2.6.5 > numpy 1.30 > scipy 0.7.0-2 > ipython 0.10-1 > > Thanks for any and all help! > -Bruce > > Per Josef's observation that scipy.linalg.lstsq and numpy.lstsq behaved the same for him, I tried changing my code to - from scipy.linalg import eig, lstsq + from scipy.linalg import eig + from numpy.linalg import lstsq and reran the code and got - >pythonw -u "testlstsq.py" Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aoa unsorted = [-3.780145] >Exit code: 0 So scipy.linalg.lstsq version 0.7.1 under WinXP-32 faults using the CME and numpy.linalg.lstsq version 1.3.0 does not. It looks like I have an answer for now, and scipy dev's might have a mini-testbench to test against. Thanks for the help! -Bruce ****************************** Neither the footer nor anything else in this E-mail is intended to or constitutes an
****************************** From josef.pktd at gmail.com Tue May 4 16:32:30 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 May 2010 16:32:30 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 4:27 PM, wrote: > scipy-user-bounces at scipy.org wrote on 05/04/2010 03:48:55 PM: > >> I have found an issue with scipy.linalg.lstsq, I think. ?The following >> code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it > should >> work in WinXP. ?Here is a minimum example: >> >> """==== program testlstsq.py ======================""" >> from scipy.linalg import eig, lstsq >> from numpy import angle, zeros, pi, arcsin >> >> A = zeros((4,1), dtype='complex') >> B = zeros((4,1), dtype='complex') >> >> A[0,0] = -0.535412460549-2.65798938848e-17j >> A[1,0] = -0.369432866546-0.131765700574j >> A[2,0] = -0.222906796932-0.263237285725j >> A[3,0] = -0.069087096386-0.38609560454j >> >> B[0,0] = -0.369432866546-0.131765700574j >> B[1,0] = -0.222906796932-0.263237285725j >> B[2,0] = -0.069087096386-0.38609560454j >> B[3,0] = ?0.0882283631514-0.528093039953j >> >> try: >> ? ? print 'Got here' >> ? ? phi = lstsq(A,B) >> ? ? print 'Finished lstsq' >> except: >> ? ? print 'Exception Occurred' >> else: >> ? ? for a in range(len(phi)): >> ? ? ? ? print 'phi[',a,']=',phi[a] >> >> ? ? w = -angle( eig(phi[0])[0][:] ) >> ? ? d = 0.5 >> ? ? aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees >> ? ? print 'aa unsorted =', aa >> """ ==== end of testlstsq.py ==========================""" >> >> If this is run under ipython, or python on Ubuntu, the answer is: >> >> In [3]: run testlstsq.py >> Got here >> Finished lstsq >> phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> phi[ 1 ]= [-0.04649206-0.01274722j] >> phi[ 2 ]= 1 >> phi[ 3 ]= [ 0.84459073] >> aoa unsorted = [-3.780145] >> >> If testlstsq.py is run under WinXP-32 one gets the following result: >> > python testlstsq.py >> Got here >> ?** On entry to ZGELSS parameter number 12 had an illegal value >> >> I think that ZGELSS is in LAPACK. ?After that, I am in over my head. Can > >> someone help with this? >> >> Under Windows I am running: >> WinXP-x86-32 >> Python(x,y)2.6.2.0 --> Python 2.6.2 >> numpy 1.3.0 >> scipy 0.7.1 >> ipython 0.10 >> >> For Linux, Ubuntu 10.04 x86-64 >> Python 2.6.5 >> numpy 1.30 >> scipy 0.7.0-2 >> ipython 0.10-1 >> >> Thanks for any and all help! >> -Bruce >> >> > > Per Josef's observation that scipy.linalg.lstsq and numpy.lstsq behaved > the same for him, I tried changing my code to > > - from scipy.linalg import eig, lstsq > + from scipy.linalg import eig > + from numpy.linalg import lstsq > > and reran the code and got - >>pythonw -u "testlstsq.py" > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aoa unsorted = [-3.780145] >>Exit code: 0 > > So scipy.linalg.lstsq version 0.7.1 under WinXP-32 faults using the CME > and > numpy.linalg.lstsq version 1.3.0 does not. > > It looks like I have an answer for now, and scipy dev's might have a > mini-testbench to test against. It could be that this is a problem with the transition to python 2.6, that might have gone away in the meantime. I never had problems with python 2.5 and scipy.linalg Josef > > Thanks for the help! > -Bruce > > ****************************** > Neither the footer nor anything else in this E-mail is intended to or constitutes an
> ****************************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bruce.labitt at autoliv.com Tue May 4 16:15:42 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Tue, 4 May 2010 16:15:42 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: Message-ID: scipy-user-bounces at scipy.org wrote on 05/04/2010 04:10:04 PM: > On Tue, May 4, 2010 at 3:48 PM, wrote: > > I have found an issue with scipy.linalg.lstsq, I think. The following > > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should > > work in WinXP. Here is a minimum example: > > > > """==== program testlstsq.py ======================""" > > from scipy.linalg import eig, lstsq > > from numpy import angle, zeros, pi, arcsin > > > > A = zeros((4,1), dtype='complex') > > B = zeros((4,1), dtype='complex') > > > > A[0,0] = -0.535412460549-2.65798938848e-17j > > A[1,0] = -0.369432866546-0.131765700574j > > A[2,0] = -0.222906796932-0.263237285725j > > A[3,0] = -0.069087096386-0.38609560454j > > > > B[0,0] = -0.369432866546-0.131765700574j > > B[1,0] = -0.222906796932-0.263237285725j > > B[2,0] = -0.069087096386-0.38609560454j > > B[3,0] = 0.0882283631514-0.528093039953j > > > > try: > > print 'Got here' > > phi = lstsq(A,B) > > print 'Finished lstsq' > > except: > > print 'Exception Occurred' > > else: > > for a in range(len(phi)): > > print 'phi[',a,']=',phi[a] > > > > w = -angle( eig(phi[0])[0][:] ) > > d = 0.5 > > aa = arcsin( w / (2.0*pi) )*180./pi # in degrees > > print 'aa unsorted =', aa > > """ ==== end of testlstsq.py ==========================""" > > > > If this is run under ipython, or python on Ubuntu, the answer is: > > > > In [3]: run testlstsq.py > > Got here > > Finished lstsq > > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > > phi[ 1 ]= [-0.04649206-0.01274722j] > > phi[ 2 ]= 1 > > phi[ 3 ]= [ 0.84459073] > > aoa unsorted = [-3.780145] > > > > If testlstsq.py is run under WinXP-32 one gets the following result: > >> python testlstsq.py > > Got here > > ** On entry to ZGELSS parameter number 12 had an illegal value > > > > I think that ZGELSS is in LAPACK. After that, I am in over my head. Can > > someone help with this? > > > > Under Windows I am running: > > WinXP-x86-32 > > Python(x,y)2.6.2.0 --> Python 2.6.2 > > numpy 1.3.0 > > scipy 0.7.1 > > ipython 0.10 > > no problem here, windowsXP, python 2.5, numpy 1.4.0, scipy 0.8dev_something Is this WinXP-32? I notice you have an earlier python, and a later version of numpy and scipy. I thought numpy 1.4.0 was "recalled" a while back. I'll try numpy.linalg to see if there is a difference. > > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aa unsorted = [-3.780145] > > after replacing scipy.linalg with numpy.linalg, same result > > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aa unsorted = [-3.780145] > > Josef Thanks for trying this! -Bruce > > > > > For Linux, Ubuntu 10.04 x86-64 > > Python 2.6.5 > > numpy 1.30 > > scipy 0.7.0-2 > > ipython 0.10-1 > > > > Thanks for any and all help! > > -Bruce > > > > > > ****************************** Neither the footer nor anything else in this E-mail is intended to or constitutes an
****************************** From josef.pktd at gmail.com Tue May 4 17:22:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 May 2010 17:22:55 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 4:15 PM, wrote: > scipy-user-bounces at scipy.org wrote on 05/04/2010 04:10:04 PM: > >> On Tue, May 4, 2010 at 3:48 PM, ? wrote: >> > I have found an issue with scipy.linalg.lstsq, I think. ?The following >> > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it > should >> > work in WinXP. ?Here is a minimum example: >> > >> > """==== program testlstsq.py ======================""" >> > from scipy.linalg import eig, lstsq >> > from numpy import angle, zeros, pi, arcsin >> > >> > A = zeros((4,1), dtype='complex') >> > B = zeros((4,1), dtype='complex') >> > >> > A[0,0] = -0.535412460549-2.65798938848e-17j >> > A[1,0] = -0.369432866546-0.131765700574j >> > A[2,0] = -0.222906796932-0.263237285725j >> > A[3,0] = -0.069087096386-0.38609560454j >> > >> > B[0,0] = -0.369432866546-0.131765700574j >> > B[1,0] = -0.222906796932-0.263237285725j >> > B[2,0] = -0.069087096386-0.38609560454j >> > B[3,0] = ?0.0882283631514-0.528093039953j >> > >> > try: >> > ? ?print 'Got here' >> > ? ?phi = lstsq(A,B) >> > ? ?print 'Finished lstsq' >> > except: >> > ? ?print 'Exception Occurred' >> > else: >> > ? ?for a in range(len(phi)): >> > ? ? ? ?print 'phi[',a,']=',phi[a] >> > >> > ? ?w = -angle( eig(phi[0])[0][:] ) >> > ? ?d = 0.5 >> > ? ?aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees >> > ? ?print 'aa unsorted =', aa >> > """ ==== end of testlstsq.py ==========================""" >> > >> > If this is run under ipython, or python on Ubuntu, the answer is: >> > >> > In [3]: run testlstsq.py >> > Got here >> > Finished lstsq >> > phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> > phi[ 1 ]= [-0.04649206-0.01274722j] >> > phi[ 2 ]= 1 >> > phi[ 3 ]= [ 0.84459073] >> > aoa unsorted = [-3.780145] >> > >> > If testlstsq.py is run under WinXP-32 one gets the following result: >> >> python testlstsq.py >> > Got here >> > ?** On entry to ZGELSS parameter number 12 had an illegal value >> > >> > I think that ZGELSS is in LAPACK. ?After that, I am in over my head. > ?Can >> > someone help with this? >> > >> > Under Windows I am running: >> > WinXP-x86-32 >> > Python(x,y)2.6.2.0 --> Python 2.6.2 >> > numpy 1.3.0 >> > scipy 0.7.1 >> > ipython 0.10 >> >> no problem here, windowsXP, python 2.5, numpy 1.4.0, scipy > 0.8dev_something > > Is this WinXP-32? Yes > > I notice you have an earlier python, and a later version of numpy and > scipy. ?I thought numpy 1.4.0 was "recalled" a while back. ?I'll try > numpy.linalg to see if there is a difference. I recompiled most packages after numpy 1.4.0 came out and I'm too lazy or too busy to figure out what I need to recompile to switch to numpy 1.4.1. (I just avoid the things that crash with 1.4.0) Josef > >> >> Got here >> Finished lstsq >> phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> phi[ 1 ]= [-0.04649206-0.01274722j] >> phi[ 2 ]= 1 >> phi[ 3 ]= [ 0.84459073] >> aa unsorted = [-3.780145] >> >> after replacing scipy.linalg with numpy.linalg, same result >> >> Got here >> Finished lstsq >> phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> phi[ 1 ]= [-0.04649206-0.01274722j] >> phi[ 2 ]= 1 >> phi[ 3 ]= [ 0.84459073] >> aa unsorted = [-3.780145] >> >> Josef > > Thanks for trying this! 
> -Bruce > >> >> > >> > For Linux, Ubuntu 10.04 x86-64 >> > Python 2.6.5 >> > numpy 1.30 >> > scipy 0.7.0-2 >> > ipython 0.10-1 >> > >> > Thanks for any and all help! >> > -Bruce >> > >> > >> > > > > > ****************************** > Neither the footer nor anything else in this E-mail is intended to or constitutes an
> ****************************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From denis-bz-gg at t-online.de Wed May 5 07:23:54 2010 From: denis-bz-gg at t-online.de (denis) Date: Wed, 5 May 2010 04:23:54 -0700 (PDT) Subject: [SciPy-User] Interpolate based on three closest points In-Reply-To: References: Message-ID: <1e43539b-ae21-4d45-a9f6-e35a4b7c2e87@p2g2000yqh.googlegroups.com> On Apr 21, 4:04?pm, Tom Foutz wrote: > Hi everybody, > I have an irregular mesh of ~1e5 data points, with unreliable connection > data. ?I am trying to interpolate based on these points. ? ... Tom, instead of iterating to a triangle containing the point of interest, you could always take say 6 neighbors (see below), then average their 6 values, distance-weighted as Anne suggests. It's a clear speed / accuracy tradeoff: there's some chance that a point is not in the convex hull of its 6 nearest neighbors, but even then you're using 6 values, not 3. What's the probability that N random points around the origin land in >= 3 of 4 quadrants in 2d, or >= 5 of 8 octants in 3d ? My back-of-the-envelope estimate of this probability (unchecked) is ndim N prob one-sided ~ 2*ndim/2^N ? ------------------------------------ 2 6 6 % 3 6 9 % 3 7 5 % ==> taking 6 neighbors or so is seldom one-sided. Experts correct me ? By the way, exactly-N-nearest interpolation may have discontinuities. Consider N+1 points on a circle around the point of interest: the result depends on which N you take. In practice, not a problem. Also, RBF of N initial points sets up and solves an N x N linear system so is impractical for large N. Furthermore the matrix can be ill- conditioned. cheers -- denis From agile.aspect at gmail.com Wed May 5 14:49:16 2010 From: agile.aspect at gmail.com (Agile Aspect) Date: Wed, 5 May 2010 11:49:16 -0700 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 12:48 PM, wrote: > I have found an issue with scipy.linalg.lstsq, I think. ?The following > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it should > work in WinXP. ?Here is a minimum example: > > """==== program testlstsq.py ======================""" > from scipy.linalg import eig, lstsq > from numpy import angle, zeros, pi, arcsin > > A = zeros((4,1), dtype='complex') > B = zeros((4,1), dtype='complex') > > A[0,0] = -0.535412460549-2.65798938848e-17j > A[1,0] = -0.369432866546-0.131765700574j > A[2,0] = -0.222906796932-0.263237285725j > A[3,0] = -0.069087096386-0.38609560454j > > B[0,0] = -0.369432866546-0.131765700574j > B[1,0] = -0.222906796932-0.263237285725j > B[2,0] = -0.069087096386-0.38609560454j > B[3,0] = ?0.0882283631514-0.528093039953j > > try: > ? ?print 'Got here' > ? ?phi = lstsq(A,B) > ? ?print 'Finished lstsq' > except: > ? ?print 'Exception Occurred' > else: > ? ?for a in range(len(phi)): > ? ? ? ?print 'phi[',a,']=',phi[a] > > ? ?w = -angle( eig(phi[0])[0][:] ) > ? ?d = 0.5 > ? ?aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees > ? ?print 'aa unsorted =', aa > """ ==== end of testlstsq.py ==========================""" > On Ubuntu, what version of ALTAS is being used? If I modify the 'except' to read print 'Exception Occurred', sys.exc_info()[0] the above code generates the enclosed error on CentOS 5 and Fedora 9 running python 2.5 and python 2.6, respectively, and both using scipy-0.71 and nump 1.30. Both plaforms use the same version of ATLAS, namely 3.8.2. 
All software was built from source on the respective platforms. Traceback (most recent call last): File "./lstsq.py", line 22, in phi = lstsq(A,B) File "/usr/devtools/lib/python2.6/site-packages/scipy/linalg/basic.py", line 549, in lstsq overwrite_b = overwrite_b) ValueError: On entry to ZGELSS parameter number 12 had an illegal value -- Enjoy global warming while it lasts. From eijkhout at tacc.utexas.edu Wed May 5 22:18:11 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Wed, 5 May 2010 21:18:11 -0500 Subject: [SciPy-User] getting started with ndarray Message-ID: I found the "guide to numpy" book, but I can't figure out how to create a multi-dimensional array. Is there a short tutorial? Or can someone give me a short example program with the most relevant features? Victor. From kwgoodman at gmail.com Wed May 5 22:24:16 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 5 May 2010 19:24:16 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:18 PM, Victor Eijkhout wrote: > I found the "guide to numpy" book, but I can't figure out how to > create a multi-dimensional array. Is there a short tutorial? Or can > someone give me a short example program with the most relevant features? Here's one way to create arrays: One dimensional: >> import numpy as np >> x1 = np.array([1, 2, 3]) >> x1.ndim 1 >> x1 array([1, 2, 3]) Two dimensional: >> x2 = np.array([[1, 2], [3, 4]]) >> x2.ndim 2 >> x2 array([[1, 2], [3, 4]]) Three dimensional: >> x3 = np.random.rand(2, 3, 4) >> x3.ndim 3 >> x3.shape (2, 3, 4) >> x3 array([[[ 0.85887601, 0.2988635 , 0.93155938, 0.48419988], [ 0.677853 , 0.67478433, 0.7065251 , 0.49045808], [ 0.87160361, 0.55503905, 0.36378423, 0.39314846]], [[ 0.80761194, 0.54838378, 0.80576339, 0.08248982], [ 0.16729305, 0.16320019, 0.5628961 , 0.77325458], [ 0.7073337 , 0.08927084, 0.89050264, 0.54985488]]]) From eijkhout at tacc.utexas.edu Wed May 5 22:28:41 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Wed, 5 May 2010 21:28:41 -0500 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > Here's one way to create arrays: Thanks. Suppose I don't have the data yet, but I simple want to allocate a, oh let's say, 5000x300x20 array? Victor. From d.l.goldsmith at gmail.com Wed May 5 22:31:17 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 19:31:17 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: It's rare that one wants to "manually" create arrays using np.array, however. Typically, the arrays used in numerics are "standard" in some way (e.g., the array corresponding to the n x n identity matrix, obtained from np.eye(n)) and thus numpy has many, _many_ functions which provide these various standard arrays for you. Even in the case of an array created from data in a file, one doesn't read in the data and then pass it as an argument to np.array - there's a numpy function which reads the file-stored data directly into an array for you. DG On Wed, May 5, 2010 at 7:24 PM, Keith Goodman wrote: > On Wed, May 5, 2010 at 7:18 PM, Victor Eijkhout > wrote: > > I found the "guide to numpy" book, but I can't figure out how to > > create a multi-dimensional array. Is there a short tutorial? Or can > > someone give me a short example program with the most relevant features? 
> > Here's one way to create arrays: > > One dimensional: > > >> import numpy as np > >> x1 = np.array([1, 2, 3]) > >> x1.ndim > 1 > >> x1 > array([1, 2, 3]) > > Two dimensional: > > >> x2 = np.array([[1, 2], [3, 4]]) > >> x2.ndim > 2 > >> x2 > array([[1, 2], > [3, 4]]) > > Three dimensional: > > >> x3 = np.random.rand(2, 3, 4) > >> x3.ndim > 3 > >> x3.shape > (2, 3, 4) > >> x3 > array([[[ 0.85887601, 0.2988635 , 0.93155938, 0.48419988], > [ 0.677853 , 0.67478433, 0.7065251 , 0.49045808], > [ 0.87160361, 0.55503905, 0.36378423, 0.39314846]], > > [[ 0.80761194, 0.54838378, 0.80576339, 0.08248982], > [ 0.16729305, 0.16320019, 0.5628961 , 0.77325458], > [ 0.7073337 , 0.08927084, 0.89050264, 0.54985488]]]) > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed May 5 22:44:45 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 19:44:45 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout wrote: > > On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > > > Here's one way to create arrays: > > Thanks. Suppose I don't have the data yet, but I simple want to > allocate a, oh let's say, 5000x300x20 array? > >>> import numpy as np >>> a = np.zeros((5000, 300, 20)) >>> a.shape (5000L, 300L, 20L) IIRC, there's a quicker way if you don't need the array's values initialized, but I forget what it is. DG > > Victor. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Wed May 5 22:52:09 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 5 May 2010 19:52:09 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout wrote: > > On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > >> Here's one way to create arrays: > > Thanks. Suppose I don't have the data yet, but I simple want to > allocate a, oh let's say, 5000x300x20 array? >> import numpy as np >> x = np.zeros((5000, 300, 20)) >> x.ndim 3 >> x.sum() 0.0 >> x[0,0,0] = 9 >> x.sum() 9.0 >> x[0] = 1 # All elements set to zero of first 300x20 slice >> x.sum() 6000.0 From ben.v.root at gmail.com Wed May 5 22:57:27 2010 From: ben.v.root at gmail.com (Benjamin Root) Date: Wed, 5 May 2010 21:57:27 -0500 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 9:44 PM, David Goldsmith wrote: > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout wrote: > >> >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: >> >> > Here's one way to create arrays: >> >> Thanks. Suppose I don't have the data yet, but I simple want to >> allocate a, oh let's say, 5000x300x20 array? 
>> > > >>> import numpy as np > >>> a = np.zeros((5000, 300, 20)) > >>> a.shape > (5000L, 300L, 20L) > > IIRC, there's a quicker way if you don't need the array's values > initialized, but I forget what it is. > >>> import numpy as np >>> a = np.empty((5000, 300, 20)) >>> a.shape (5000, 300, 20) Ben > DG > >> >> Victor. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Wed May 5 22:58:47 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 5 May 2010 19:58:47 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:44 PM, David Goldsmith wrote: > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout > wrote: >> >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: >> >> > Here's one way to create arrays: >> >> Thanks. Suppose I don't have the data yet, but I simple want to >> allocate a, oh let's say, 5000x300x20 array? > >>>> import numpy as np >>>> a = np.zeros((5000, 300, 20)) >>>> a.shape > (5000L, 300L, 20L) > > IIRC, there's a quicker way if you don't need the array's values > initialized, but I forget what it is. Faster to computer, but slower to grok: >> timeit np.zeros((5000, 300, 20)) 10 loops, best of 3: 119 ms per loop >> timeit np.empty((5000, 300, 20)) 100000 loops, best of 3: 6.72 us per loop >> timeit x = np.empty((5000, 300, 20)); x.fill(0.0) 10 loops, best of 3: 115 ms per loop From d.l.goldsmith at gmail.com Thu May 6 00:48:58 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 21:48:58 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:58 PM, Keith Goodman wrote: > On Wed, May 5, 2010 at 7:44 PM, David Goldsmith > wrote: > > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout < > eijkhout at tacc.utexas.edu> > > wrote: > >> > >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > >> > >> > Here's one way to create arrays: > >> > >> Thanks. Suppose I don't have the data yet, but I simple want to > >> allocate a, oh let's say, 5000x300x20 array? > > > >>>> import numpy as np > >>>> a = np.zeros((5000, 300, 20)) > >>>> a.shape > > (5000L, 300L, 20L) > > > > IIRC, there's a quicker way if you don't need the array's values > > initialized, but I forget what it is. > > Faster to computer, but slower to grok: > > >> timeit np.zeros((5000, 300, 20)) > 10 loops, best of 3: 119 ms per loop > >> timeit np.empty((5000, 300, 20)) > 100000 loops, best of 3: 6.72 us per loop > >> timeit x = np.empty((5000, 300, 20)); x.fill(0.0) > 10 loops, best of 3: 115 ms per loop > Interesting, thanks guys! DG -------------- next part -------------- An HTML attachment was scrubbed... 
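As a minimal sketch of the zeros/empty trade-off timed above (not taken from the thread): np.empty only allocates, so its contents are arbitrary until every element has been written.

    import numpy as np

    a = np.empty((5000, 300, 20))    # fast: memory is allocated but NOT initialized
    # a may hold arbitrary values here; write every element before reading any
    a.fill(0.0)                      # now equivalent to np.zeros((5000, 300, 20))
    b = np.zeros((5000, 300, 20))    # allocate and zero in one step
    print a.shape, b.shape

np.empty only pays off when the array is about to be overwritten anyway, for example as the output buffer of a later computation.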
URL: From d.l.goldsmith at gmail.com Thu May 6 00:59:06 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 21:59:06 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: BTW Victor: FWIW, the "center" of the ndarray universe is NumPy - though I'm sure there's a great deal of overlap in the subscribers lists, for your ndarray-specific questions (indeed, for any object that resides in NumPy, as opposed to SciPy), it might be more efficient to post to and monitor numpy-discussion at scipy.org. As a general rule: if it gets imported from NumPy, ask on the numpy-discussion list, if it gets imported from SciPy, ask here (scipy-user). But this is just a suggestion - you'll get your numpy questions answered in either place... (but scipy questions on the numpy list, that's probably a different story). Again, FWIW DG On Wed, May 5, 2010 at 9:48 PM, David Goldsmith wrote: > On Wed, May 5, 2010 at 7:58 PM, Keith Goodman wrote: > >> On Wed, May 5, 2010 at 7:44 PM, David Goldsmith >> wrote: >> > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout < >> eijkhout at tacc.utexas.edu> >> > wrote: >> >> >> >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: >> >> >> >> > Here's one way to create arrays: >> >> >> >> Thanks. Suppose I don't have the data yet, but I simple want to >> >> allocate a, oh let's say, 5000x300x20 array? >> > >> >>>> import numpy as np >> >>>> a = np.zeros((5000, 300, 20)) >> >>>> a.shape >> > (5000L, 300L, 20L) >> > >> > IIRC, there's a quicker way if you don't need the array's values >> > initialized, but I forget what it is. >> >> Faster to computer, but slower to grok: >> >> >> timeit np.zeros((5000, 300, 20)) >> 10 loops, best of 3: 119 ms per loop >> >> timeit np.empty((5000, 300, 20)) >> 100000 loops, best of 3: 6.72 us per loop >> >> timeit x = np.empty((5000, 300, 20)); x.fill(0.0) >> 10 loops, best of 3: 115 ms per loop >> > > Interesting, thanks guys! > > DG > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Thu May 6 05:28:48 2010 From: denis-bz-gg at t-online.de (denis) Date: Thu, 6 May 2010 02:28:48 -0700 (PDT) Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: <99301b0f-35c9-4cba-aaa3-4bf1100b80cc@i9g2000yqi.googlegroups.com> I'd recommend http://scipy.org/Cookbook/BuildingArrays, then http://scipy.org/Cookbook/Indexing (can't resist quoting from Indexing: "numpy and scipy provide a few other types that behave like arrays, in particular matrices and sparse matrices. Their indexing can differ from that of arrays in surprising ways") Also http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html is a nice 2-page cheat sheet. cheers -- denis From bruce.labitt at autoliv.com Thu May 6 09:23:54 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Thu, 6 May 2010 09:23:54 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: Message-ID: scipy-user-bounces at scipy.org wrote on 05/05/2010 02:49:16 PM: > On Tue, May 4, 2010 at 12:48 PM, wrote: > > I have found an issue with scipy.linalg.lstsq, I think. The following > > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should > > work in WinXP. 
Here is a minimum example:
> >
> > """==== program testlstsq.py ======================"""
> > from scipy.linalg import eig, lstsq
> > from numpy import angle, zeros, pi, arcsin
> >
> > A = zeros((4,1), dtype='complex')
> > B = zeros((4,1), dtype='complex')
> >
> > A[0,0] = -0.535412460549-2.65798938848e-17j
> > A[1,0] = -0.369432866546-0.131765700574j
> > A[2,0] = -0.222906796932-0.263237285725j
> > A[3,0] = -0.069087096386-0.38609560454j
> >
> > B[0,0] = -0.369432866546-0.131765700574j
> > B[1,0] = -0.222906796932-0.263237285725j
> > B[2,0] = -0.069087096386-0.38609560454j
> > B[3,0] = 0.0882283631514-0.528093039953j
> >
> > try:
> >     print 'Got here'
> >     phi = lstsq(A,B)
> >     print 'Finished lstsq'
> > except:
> >     print 'Exception Occurred'
> > else:
> >     for a in range(len(phi)):
> >         print 'phi[',a,']=',phi[a]
> >
> >     w = -angle( eig(phi[0])[0][:] )
> >     d = 0.5
> >     aa = arcsin( w / (2.0*pi) )*180./pi     # in degrees
> >     print 'aa unsorted =', aa
> > """ ==== end of testlstsq.py =========================="""
> >
>
> On Ubuntu, what version of ATLAS is being used?

I'm embarrassed to say that I haven't compiled and run ATLAS on 10.04 yet. On my todo list.
The latest ATLAS is 3.8.3 I think. So it appears I have the reference BLAS, uggh.

Previously, I had ATLAS 3.8.2 on my machine (Ubuntu 9.10). However, I have no idea which
lib the scipy and numpy Ubuntu packages link to. How does one find out? And how does one
link scipy and numpy to the better ATLAS and LAPACK libs that one has optimized for one's
machine?

> If I modify the 'except' to read
>
>     print 'Exception Occurred', sys.exc_info()[0]
>
> the above code generates the enclosed error on CentOS 5 and Fedora 9
> running python 2.5 and python 2.6, respectively, and both using
> scipy-0.71 and numpy 1.30.
>
> Both platforms use the same version of ATLAS, namely 3.8.2.
>
> All software was built from source on the respective platforms.

I need to do this...

> Traceback (most recent call last):
>   File "./lstsq.py", line 22, in
>     phi = lstsq(A,B)
>   File "/usr/devtools/lib/python2.6/site-packages/scipy/linalg/basic.py", line 549, in lstsq
>     overwrite_b = overwrite_b)
> ValueError: On entry to ZGELSS parameter number 12 had an illegal value

ZGELSS is the LAPACK linear least squares solver; it is for double precision complex numbers.
Having an illegal value on entry looks like a bug, no?

> --
> Enjoy global warming while it lasts.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
****************************** From jsseabold at gmail.com Thu May 6 10:44:40 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 6 May 2010 10:44:40 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Thu, May 6, 2010 at 9:23 AM, wrote: >> On Ubuntu, what version of ALTAS is being used? > > I'm embarassed to say that I haven't compiled and run ATLAS on 10.04 yet. > On my todo list. > The latest ATLAS is 3.8.3 I think. ?So it appears I have the reference > BLAS, uggh. > > Previously, I had ATLAS 3.8.2 on my machine (Ubuntu 9.10). ?However, I > have no idea which lib the scipy and numpy Ubuntu packages link to. ?How > does one find out? ?And how does one link scipy and numpy to the better > ATLAS and LAPACK libs that one has optimized for one's machine? > import numpy as np np.show_config() To link to your own, you need to edit site.cfg when you install. Skipper From hihighsky at gmail.com Thu May 6 10:54:35 2010 From: hihighsky at gmail.com (Tingting HAN) Date: Thu, 6 May 2010 16:54:35 +0200 Subject: [SciPy-User] problem with installing scipy Message-ID: Dear Officer, I work on linux and have python originally installed in the system: shau at tityro:/home/hantingting/Downloads$ python Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55) [GCC 4.4.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. I want to install package scipy, but there is the following problem: shau at tityro:/home/hantingting/Downloads/triMC3D/python$ sudo apt-get install scipy [sudo] password for shau: Reading package lists... Done Building dependency tree Reading state information... Done E: Couldn't find package scipy Could you please give me some advice to solve the problem or to properly install scipy? -- Yours sincerely, Sofia -------------- next part -------------- An HTML attachment was scrubbed... URL: From cr.anil at gmail.com Thu May 6 10:56:21 2010 From: cr.anil at gmail.com (Anil C R) Date: Thu, 6 May 2010 20:26:21 +0530 Subject: [SciPy-User] problem with installing scipy In-Reply-To: References: Message-ID: it's "sudo apt-get install python-scipy" Anil On Thu, May 6, 2010 at 8:24 PM, Tingting HAN wrote: > Dear Officer, > > I work on linux and have python originally installed in the system: > > shau at tityro:/home/hantingting/Downloads$ python > Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55) > [GCC 4.4.1] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > > I want to install package scipy, but there is the following problem: > > > shau at tityro:/home/hantingting/Downloads/triMC3D/python$ sudo apt-get > install scipy > [sudo] password for shau: > Reading package lists... Done > Building dependency tree > Reading state information... Done > E: Couldn't find package scipy > > Could you please give me some advice to solve the problem or to properly > install scipy? > -- > Yours sincerely, > > Sofia > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
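As a quick illustrative follow-up to both answers above (a sketch, not something posted in the thread): confirm that the packaged modules import, and ask numpy which BLAS/LAPACK it was built against.

    import numpy as np
    import scipy

    print 'numpy', np.__version__, 'scipy', scipy.__version__
    np.show_config()    # blas_opt_info / lapack_opt_info reveal ATLAS vs. reference BLAS

If show_config() only lists the reference BLAS/LAPACK, the usual fix is rebuilding numpy and scipy with a site.cfg pointing at an optimized ATLAS, as Skipper notes.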
URL: From d.l.goldsmith at gmail.com Thu May 6 12:43:50 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 6 May 2010 09:43:50 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: <99301b0f-35c9-4cba-aaa3-4bf1100b80cc@i9g2000yqi.googlegroups.com> References: <99301b0f-35c9-4cba-aaa3-4bf1100b80cc@i9g2000yqi.googlegroups.com> Message-ID: On Thu, May 6, 2010 at 2:28 AM, denis wrote: > I'd recommend http://scipy.org/Cookbook/BuildingArrays, then > http://scipy.org/Cookbook/Indexing > (can't resist quoting from Indexing: > "numpy and scipy provide a few other types that behave like arrays, in > particular matrices and sparse matrices. > Their indexing can differ from that of arrays in surprising ways") > > Also > http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html > is a nice 2-page cheat sheet. > Nice indeed, I just bookmarked it! Is there a link to that on the scipy Site? There should be! DG > cheers > -- denis > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From amenity at enthought.com Thu May 6 14:53:12 2010 From: amenity at enthought.com (Amenity Applewhite) Date: Thu, 6 May 2010 13:53:12 -0500 Subject: [SciPy-User] SciPy 2010: Bioinformatic & Parallel/cloud talks announced...& register now! References: Message-ID: <48351A11-7A2D-4A12-B9EE-9483046DA10A@enthought.com> Hello! Things are moving quickly in preparation for SciPy 2010: Last week we announced the General Conference schedule (http://conference.scipy.org/scipy2010/schedule.html ), Tuesday we announced our student sponsorship recipients (http://conference.scipy.org/scipy2010/student.html ) and now we're ready to tell you give you a look at the talks we have lined up for our Bioinformatics and Parallel Processing /Cloud Computing tracks. ===Parallel Processing & Cloud Computing track=== We really appreciate Brian and Ken's work organizing the papers for this specialized track. And of course, thanks to everyone who submitted a paper. There has been a great deal of interest in this set of talks ? and word on the street is that Brian may even have a HPC tutorial up his sleeve... * StarCluster - NumPy/SciPy Computing in the Cloud- Justin Riley * pomsets: workflow management for your cloud- Michael J Pan * Getting Down with Big Data Jared Flatow, Anita Lillie, Ville Tuulos * StarFlow: A Cloud-Enables Python Workflow Engine for Scientific Analysis Pipelines Elaine Angelino, Dan Yamins, Margo Seltzer * A Programmatic Interface for Particle Plasma Simulation in Python, and Early Backend Results with PyCUDA Min Ragan-Kelley * Parallel Computing with IPython: an Application to Air Pollution Modeling B.E. Granger, J.G. Hemann * Astronomy App in the Cloud using Google Geo APIs and Python App Engine Shawn Shen ===Bioinformatics track=== Once again, we are indebted to Glen Otero, from Dell, for putting together the Bioinformatics track. He received some fantastic papers and we're really looking forward to these presentations: * Protein Folding with Python on Supercomputers Jan H. Meinke * Can Python Save Next-Generation Sequencing? 
* The Use of Galaxy for the Research and the Teaching of Genomics Roy Weckiewicz, Jim Hu, and Rodolfo Aramayo ===Early registration ends next Monday=== That's right: Only a few days left before rates increase! Think of all the BBQ and breakfast tacos you can buy with that $50-$100 you'll save by registering early. If that doesn't convince you, consider: -Cheap flights to Austin- Buy your tickets now for some very nice prices: $275 from Chicago, $330 from San Francisco, $380 from New York City, $810 from London...(prices from Kayak.com) -Convenient & affordable hotel- We got an fantastic deal for on-site accommodations at the AT&T Conference Center. Pay only $89/night for single occupancy or $105/ night for double occupancy. It will be great to have everyone staying in the same spot. Once you register, you'll get a code to book your hotel reservation. The discounted rate will be applied automatically. https://conference.scipy.org/scipy2010/accommodation.html No car necessary to get to the conference... and see Austin! An airport bus (http://capmetro.org/riding/current_schedules/maps/rt100_sb.pdf ) runs straight to and from the AT&T center, so you won't have to rent a car at all. Plus, the UT campus area is in walking distance to a number of great restaurants and activities. For any longer trips you'd like to make Austin has a great public bus system. Not to mention all of the mind-blowing things you'll learn and outstanding people you'll meet and catch up with. So what are you waiting for? Register: https://conference.scipy.org/scipy2010/registration.html Best, The SciPy 2010 Team @SciPy2010 on Twitter From stefan at sun.ac.za Thu May 6 17:40:42 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 6 May 2010 23:40:42 +0200 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: Hi Victor On 6 May 2010 04:18, Victor Eijkhout wrote: > I found the "guide to numpy" book, but I can't figure out how to > create a multi-dimensional array. Is there a short tutorial? I've got a short NumPy tutorial here which might help: http://mentat.za.net/numpy/intro/intro.html Regards St?fan From josef.pktd at gmail.com Fri May 7 12:40:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 12:40:05 -0400 Subject: [SciPy-User] inverse function of a spline Message-ID: I have a function y = f(x) which is monotonically increasing (a cumulative distribution function) f is defined by piecewise polynomial interpolation, an interpolating spline on some points I would like to get the inverse function (ppf) x = f^{-1} (y) if the spline is of higher order than linear In the linear case it's trivial, because the inverse function is also just a piecewise linear interpolation. If I have a cubic spline, or any other smooth interpolator in scipy, is there a way to get the inverse function directly? 
I don't know much about general properties of splines, and would appreciate any hints, so I can avoid numerical inversion (fsolve or similar) Thanks, Josef From charlesr.harris at gmail.com Fri May 7 13:57:04 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 May 2010 11:57:04 -0600 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 10:40 AM, wrote: > I have a function y = f(x) which is monotonically increasing (a > cumulative distribution function) > f is defined by piecewise polynomial interpolation, an interpolating > spline on some points > > I would like to get the inverse function (ppf) x = f^{-1} (y) > if the spline is of higher order than linear > > In the linear case it's trivial, because the inverse function is also > just a piecewise linear interpolation. > > If I have a cubic spline, or any other smooth interpolator in scipy, > is there a way to get the > inverse function directly? > > I don't know much about general properties of splines, and would > appreciate any hints, > so I can avoid numerical inversion (fsolve or similar) > > Since the curve is piecewise cubic the problem reduces to inverting a piece of a cubic, which inverse won't itself be a cubic in general. I think your best bet is interpolate the same points with x,y reversed, or resample using your spline and interpolate the new samples with x,y reversed. It won't be a exact inverse, but then, the original is probably not exact either. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 7 14:34:02 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 14:34:02 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 1:57 PM, Charles R Harris wrote: > > > On Fri, May 7, 2010 at 10:40 AM, wrote: >> >> I have a function ?y = f(x) which is monotonically increasing (a >> cumulative distribution function) >> f is defined by piecewise polynomial interpolation, an interpolating >> spline on some points >> >> I would like to get the inverse function (ppf) ?x = f^{-1} (y) >> if the spline is of higher order than linear >> >> In the linear case it's trivial, because the inverse function is also >> just a piecewise linear interpolation. >> >> If I have a cubic spline, or any other smooth interpolator in scipy, >> is there a way to get the >> inverse function directly? >> >> I don't know much about general properties of splines, and would >> appreciate any hints, >> so I can avoid numerical inversion (fsolve or similar) >> > > Since the curve is piecewise cubic the problem reduces to inverting a piece > of a cubic, which inverse won't itself be a cubic in general. I think your > best bet is interpolate the same points with x,y reversed, or resample using > your spline and interpolate the new samples with x,y reversed. It won't be a > exact inverse, but then, the original is probably not exact either. That's what I suspected, I was hoping for a trick (like one interpolator is the "natural" inverse of another one). resampling should give a good enough approximation. Without resampling, the error for round tripping x= f^{-1} ( f(x) ) might be too large to give consistent results. (Even if there are sampling and approximation errors, I still would prefer consistency.) Just a follow-up question on approximation: for the cdf (e.g. 
normal distribution) f:R->[0,1] f^{-1}:[0,1]->R Is it better to start with a spline on the inverse function (ppf), f^{-1}, because it has compact support, resample from it, and then create the cdf f from a resampled ppf; or the other way around, or it wouldn't really matter? Thanks for the information and hint, Josef > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Fri May 7 15:37:36 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 May 2010 13:37:36 -0600 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 12:34 PM, wrote: > On Fri, May 7, 2010 at 1:57 PM, Charles R Harris > wrote: > > > > > > On Fri, May 7, 2010 at 10:40 AM, wrote: > >> > >> I have a function y = f(x) which is monotonically increasing (a > >> cumulative distribution function) > >> f is defined by piecewise polynomial interpolation, an interpolating > >> spline on some points > >> > >> I would like to get the inverse function (ppf) x = f^{-1} (y) > >> if the spline is of higher order than linear > >> > >> In the linear case it's trivial, because the inverse function is also > >> just a piecewise linear interpolation. > >> > >> If I have a cubic spline, or any other smooth interpolator in scipy, > >> is there a way to get the > >> inverse function directly? > >> > >> I don't know much about general properties of splines, and would > >> appreciate any hints, > >> so I can avoid numerical inversion (fsolve or similar) > >> > > > > Since the curve is piecewise cubic the problem reduces to inverting a > piece > > of a cubic, which inverse won't itself be a cubic in general. I think > your > > best bet is interpolate the same points with x,y reversed, or resample > using > > your spline and interpolate the new samples with x,y reversed. It won't > be a > > exact inverse, but then, the original is probably not exact either. > > That's what I suspected, I was hoping for a trick (like one interpolator is > the > "natural" inverse of another one). > > resampling should give a good enough approximation. Without resampling, the > error for round tripping x= f^{-1} ( f(x) ) might be too large to give > consistent results. > (Even if there are sampling and approximation errors, I still would > prefer consistency.) > > > Just a follow-up question on approximation: > > for the cdf (e.g. normal distribution) f:R->[0,1] f^{-1}:[0,1]->R > > Is it better to start with a spline on the inverse function (ppf), > f^{-1}, because it has > compact support, resample from it, and then create the cdf f from a > resampled ppf; > or the other way around, or it wouldn't really matter? > > Thanks for the information and hint, > > I don't know the answer to that although I suspect starting from the inverse might be superior. The end points might be a problem though if the curve goes vertical. You might have to experiment a bit or use a spline in combination with other functions. There has probably been a small industry out there dealing with these sorts of problem but I don't know who they are. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
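A rough sketch of the resample-and-swap idea Chuck describes, not code from the thread: it uses the normal cdf purely as an illustration, and it assumes the resampled cdf values come out strictly increasing, since InterpolatedUnivariateSpline requires increasing x data.

    import numpy as np
    from scipy.interpolate import InterpolatedUnivariateSpline
    from scipy.stats import norm

    # sample points of a cdf (here the normal cdf, just for illustration)
    x = np.linspace(-3, 3, 15)
    y = norm.cdf(x)

    cdf = InterpolatedUnivariateSpline(x, y, k=3)    # spline approximation of f
    # resample on a fine grid and fit the swapped data to approximate f^{-1}
    xf = np.linspace(-3, 3, 200)
    yf = cdf(xf)
    ppf = InterpolatedUnivariateSpline(yf, xf, k=3)  # only valid if yf is monotone

    xt = np.array([0.5])
    print xt, ppf(cdf(xt))    # round trip, should come back close to 0.5

Checking x - ppf(cdf(x)) over the sampled range gives a direct measure of how consistent the approximate pair is.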
URL: From peter.shepard at gmail.com Fri May 7 15:45:08 2010 From: peter.shepard at gmail.com (Pete Shepard) Date: Fri, 7 May 2010 12:45:08 -0700 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s Message-ID: Hello List, I am using "fisherexact.py" to calculate the p-value of two ratios however, when large #s are involved, it returns "NA". Is there a way to override this? TIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 7 16:15:19 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 16:15:19 -0400 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 3:45 PM, Pete Shepard wrote: > Hello List, > > > I am using "fisherexact.py" to calculate the p-value of two ratios however, > when large #s are involved, it returns "NA". Is there a way to override > this? You mean fisherexact in http://projects.scipy.org/scipy/ticket/956 ? Do you have an example? Can you add it to the ticket? Do you have large ratios or large numbers in each cell? If you have a large number of entries in each cell, then the chisquare test or similar asymptotic tests should be pretty reliable. Last time I tried, I didn't manage to get rid of incorrect results if the first cell is zero. And I didn't understand the details of the algorithm well enough to figure out what's going on (within a reasonable time). If you add some print statements, you could find out if the nan comes from a 0./0. division or from the hypergeometric distribution. Do you get the same result if you permute rows or columns? fisherexact works very well over a large range of values, but I'm waiting for someone to provide a patch for the cases that don't work. Josef > > TIA > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From peridot.faceted at gmail.com Fri May 7 16:24:27 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 7 May 2010 16:24:27 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On 7 May 2010 12:40, wrote: > I have a function ?y = f(x) which is monotonically increasing (a > cumulative distribution function) > f is defined by piecewise polynomial interpolation, an interpolating > spline on some points > > I would like to get the inverse function (ppf) ?x = f^{-1} (y) > if the spline is of higher order than linear > > In the linear case it's trivial, because the inverse function is also > just a piecewise linear interpolation. > > If I have a cubic spline, or any other smooth interpolator in scipy, > is there a way to get the > inverse function directly? > > I don't know much about general properties of splines, and would > appreciate any hints, > so I can avoid numerical inversion (fsolve or similar) I should first say that even though your input points are monotonic, the spline is not guaranteed to be. (Though in practice if your sampled points have no sharp corners you're probably fine.) If this matters to you, there are algorithms for enforcing monotonicity of splines, some of which are simply procedures for jiggering the interpolation points just enough to avoid non-monotonicty and some of which are more clever. Sadly none are implemented in scipy. As Charles Harris pointed out, the inverse of a cubic is not a cubic, so the inverse function won't be a spline. 
But you can nevertheless efficiently evaluate it with scipy.interpolate.sproot, which a special-purpose numerical solver. I'm not sure whether it uses cubic root solvers or an optimized numerical solver with knowledge about the bounding properties of spline control points, but in any case it's quite efficient. It only finds zeros, but (check this) you should be able to shift a spline vertically by subtracting a constant from the coefficient array (c in t,c,k). Since you are constructing the spline in the first place, you should also think about whether you're evaluating f or f inverse more often and choose which one to be the spline appropriately. Anne From amcmorl at gmail.com Fri May 7 16:28:21 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Fri, 7 May 2010 16:28:21 -0400 Subject: [SciPy-User] trouble loading one .mat file Message-ID: Hi all, After upgrading to svn scipy 0.8.0.dev6369, to take advantage of Matthew Brett's bugfix to the scipy.io code (thanks for that, Matthew), I now have one matlab file which I cannot load using scipy.io.loadmat. Trying it gives the following error: /usr/local/lib/python2.6/dist-packages/scipy/io/matlab/mio5.pyc in get_variables(self, variable_names) 397 mdict['__globals__'] = [] 398 while not self.end_of_stream(): --> 399 hdr, next_position = self.read_var_header() 400 name = hdr.name 401 if name == '': /usr/local/lib/python2.6/dist-packages/scipy/io/matlab/mio5.pyc in read_var_header(self) 352 next_pos = self.mat_stream.tell() + byte_count 353 if mdtype == miCOMPRESSED: # make new stream from compressed data --> 354 stream = StringIO(zlib.decompress(self.mat_stream.read(byte_count))) 355 self._matrix_reader.set_stream(stream) 356 mdtype, byte_count = self._matrix_reader.read_full_tag() error: Error -5 while decompressing data I could definitely read this file before using the ubuntu karmic package 0.7.0-2, but I've also upgraded to lucid recently and I'm unsure whether I had successfully read it with lucid and packaged scipy before upgrading to scipy svn. In any case, a number of very similar files can be read fine using the new setup and a colleague has verified that the problem file can be opened with Matlab okay. Has anyone come across and solved this sort of problem before, or have any idea what might be causing it? It seems impolite to distribute the file on the list here, but I could send it to someone who has the capability to tackle debugging the problem. Many thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 7 16:36:15 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 16:36:15 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 4:24 PM, Anne Archibald wrote: > On 7 May 2010 12:40, ? wrote: >> I have a function ?y = f(x) which is monotonically increasing (a >> cumulative distribution function) >> f is defined by piecewise polynomial interpolation, an interpolating >> spline on some points >> >> I would like to get the inverse function (ppf) ?x = f^{-1} (y) >> if the spline is of higher order than linear >> >> In the linear case it's trivial, because the inverse function is also >> just a piecewise linear interpolation. >> >> If I have a cubic spline, or any other smooth interpolator in scipy, >> is there a way to get the >> inverse function directly? 
>> >> I don't know much about general properties of splines, and would >> appreciate any hints, >> so I can avoid numerical inversion (fsolve or similar) > > I should first say that even though your input points are monotonic, > the spline is not guaranteed to be. (Though in practice if your > sampled points have no sharp corners you're probably fine.) If this > matters to you, there are algorithms for enforcing monotonicity of > splines, some of which are simply procedures for jiggering the > interpolation points just enough to avoid non-monotonicty and some of > which are more clever. Sadly none are implemented in scipy. > > As Charles Harris pointed out, the inverse of a cubic is not a cubic, > so the inverse function won't be a spline. But you can nevertheless > efficiently evaluate it with scipy.interpolate.sproot, which a > special-purpose numerical solver. I'm not sure whether it uses cubic > root solvers or an optimized numerical solver with knowledge about the > bounding properties of spline control points, but in any case it's > quite efficient. It only finds zeros, but (check this) you should be > able to shift a spline vertically by subtracting a constant from the > coefficient array (c in t,c,k). > > Since you are constructing the spline in the first place, you should > also think about whether you're evaluating f or f inverse more often > and choose which one to be the spline appropriately. Thanks, I will try to figure out the sproot version. For now I'm stuck (and go somewhere else) because in the examples that I tried out, I get small non-monotonicities most of the time. The spline of the inverse function is backwards bending. I will stick with linear interpolation and kernel density estimation for the smooth case. BTW: I'm writing some histogram distribution and variants of empirical distribution classes that have the same methods as the ones in scipy.stats. Josef > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From vanforeest at gmail.com Fri May 7 16:37:33 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 7 May 2010 22:37:33 +0200 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: Hi Josef, > If I have a cubic spline, or any other smooth interpolator in scipy, > is there a way to get the > inverse function directly? How can you ensure that the cubic spline approx is non-decreasing? I actually wonder whether using cubic splines is the best way to approximate distribution functions. Nicky From josef.pktd at gmail.com Fri May 7 16:44:44 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 16:44:44 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 4:37 PM, nicky van foreest wrote: > Hi Josef, > >> If I have a cubic spline, or any other smooth interpolator in scipy, >> is there a way to get the >> inverse function directly? > > How can you ensure that the cubic spline approx is non-decreasing? I > actually wonder whether using cubic splines is the best way to > approximate distribution functions. Now I know it's not, but I was designing the extension to the linear case on paper instead of in the interpreter, and got stuck on the wrong problem. Maybe I ask the question again when scipy has monotonic interpolators. 
Josef > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri May 7 18:45:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 18:45:51 -0400 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 5:44 PM, Vincent Davis wrote: > @ Josef, I assume you know about this reference from the wikipedia page. > http://mathworld.wolfram.com/FishersExactTest.html > > I have it in my second comment to the ticket. But from this it's still a long way to figure out where the zero is supposed to go in the strict or weak inequalities in the binary search. And why does the second path work but not the first ? I wasn't patient enough. Josef > Vincent > > On Fri, May 7, 2010 at 2:15 PM, wrote: > >> On Fri, May 7, 2010 at 3:45 PM, Pete Shepard >> wrote: >> > Hello List, >> > >> > >> > I am using "fisherexact.py" to calculate the p-value of two ratios >> however, >> > when large #s are involved, it returns "NA". Is there a way to override >> > this? >> >> >> You mean fisherexact in http://projects.scipy.org/scipy/ticket/956 ? >> >> Do you have an example? Can you add it to the ticket? >> >> Do you have large ratios or large numbers in each cell? >> If you have a large number of entries in each cell, then the chisquare >> test or similar >> asymptotic tests should be pretty reliable. >> >> Last time I tried, I didn't manage to get rid of incorrect results if >> the first cell is zero. >> And I didn't understand the details of the algorithm well enough to >> figure out what's >> going on (within a reasonable time). >> >> If you add some print statements, you could find out if the nan comes from >> a >> 0./0. division or from the hypergeometric distribution. >> Do you get the same result if you permute rows or columns? >> >> fisherexact works very well over a large range of values, but I'm >> waiting for someone >> to provide a patch for the cases that don't work. >> >> Josef >> >> >> >> >> >> > >> > TIA >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > *Vincent Davis > 720-301-3003 * > vincent at vincentdavis.net > my blog | LinkedIn > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Sat May 8 12:14:09 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 8 May 2010 12:14:09 -0400 Subject: [SciPy-User] Optimization with smoothing In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 9:34 AM, Angus McMorland wrote: > Hi all, > I need to do some optimization where one of the parameters is a > spline-smoothed 1-d sequence, with, say, 10 values. What's the best way to > go about this using scipy (or any other numpy-compatible Python package)? 
I > could imagine using one of the scipy.optimize routines and then smoothing > the relevant parameters within the optimization loop, but it would be best > if the next iteration's of parameters were chosen from the previous > iteration's _smoothed_ parameters rather than their 'non-smooth' > predecessors, as it seems like this would keep the optimization better > behaved. Is this possible? I would think you could modify the callback function in the source of your chosen optimization routine from callback(xk) to xk = callback(xk) Though you would probably want to recompute the gradient and Hessian at the new smoothed parameters. Sorry, I don't have a better answer, but I've often wondered the same thing and I'm hoping someone might know better than I. Skipper From josef.pktd at gmail.com Sat May 8 13:02:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 8 May 2010 13:02:16 -0400 Subject: [SciPy-User] Optimization with smoothing In-Reply-To: References: Message-ID: On Sat, May 8, 2010 at 12:14 PM, Skipper Seabold wrote: > On Tue, May 4, 2010 at 9:34 AM, Angus McMorland wrote: >> Hi all, >> I need to do some optimization where one of the parameters is a >> spline-smoothed 1-d sequence, with, say, 10 values. What's the best way to >> go about this using scipy (or any other numpy-compatible Python package)? I >> could imagine using one of the scipy.optimize routines and then smoothing >> the relevant parameters within the optimization loop, but it would be best >> if the next iteration's of parameters were chosen from the previous >> iteration's _smoothed_ parameters rather than their 'non-smooth' >> predecessors, as it seems like this would keep the optimization better >> behaved. Is this possible? > > I would think you could modify the callback function in the source of > your chosen optimization routine from > > callback(xk) > > to > > xk = callback(xk) > > Though you would probably want to recompute the gradient and Hessian > at the new smoothed parameters. ?Sorry, I don't have a better answer, > but I've often wondered the same thing and I'm hoping someone might > know better than I. I was wondering more whether you really have a well defined optimization problem if you don't really use the parameters. Does the argmin really end up at the smoothed values, or at some parameterization of the spline? I would attempt to put the smoothness restriction in a constraint or rewrite the problem in terms of some lower dimensional "hyper parameters". Doing it directly might require adjustments to the optimization algorithm, e.g. how a new point is found, so that it ends up hardcoding the smoothness constraint into the optimization function. My impression, and 2.5 cents, Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tmp50 at ukr.net Sun May 9 10:02:33 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 09 May 2010 17:02:33 +0300 Subject: [SciPy-User] [OT] any ways to run PETSc4py on several CPU? Message-ID: hi all, sorry for using the mail list but I haven't found more suitable. Are there any ways to run PETSc4py on several CPU, i.e. something like mpirun -np 4? Currently I have >>> print PETSc.COMM_WORLD.Get_size() 1 Thank you in advance, D. -------------- next part -------------- An HTML attachment was scrubbed... 
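For what it is worth, a minimal sketch of launching a petsc4py script under MPI (not from the thread; the file name petsc_size.py is just an example, and it assumes petsc4py was built against an MPI-enabled PETSc and that mpirun comes from the same MPI installation):

    # save as petsc_size.py and start it as:  mpirun -np 4 python petsc_size.py
    import sys
    import petsc4py
    petsc4py.init(sys.argv)          # initialize PETSc before importing the PETSc module
    from petsc4py import PETSc

    # under "mpirun -np 4" each of the 4 processes prints 4;
    # started as a plain "python petsc_size.py" it prints 1
    print PETSc.COMM_WORLD.Get_size()

If the size stays at 1 even under mpirun, a serial (non-MPI) PETSc build or a mismatched mpirun are the usual suspects.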
URL: From dagss at student.matnat.uio.no Sun May 9 13:35:58 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 09 May 2010 19:35:58 +0200 Subject: [SciPy-User] [OT] any ways to run PETSc4py on several CPU? In-Reply-To: References: Message-ID: <4BE6F27E.5040503@student.matnat.uio.no> Dmitrey wrote: > hi all, > sorry for using the mail list but I haven't found more suitable. > Are there any ways to run PETSc4py on several CPU, i.e. something like > mpirun -np 4? I'd ask on the mpi4py mailing list, as Lisandro is a developer of both, and it's an MPI-related thing. -- Dag Sverre From cool-rr at cool-rr.com Mon May 10 07:37:29 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Mon, 10 May 2010 13:37:29 +0200 Subject: [SciPy-User] Distributing SciPy and NumPy Message-ID: Hello, I have a project called GarlicSim which I want to distribute as an executable, packaged using py2exe. I want to package numpy and scipy with it, so they can be used by the end user. This means I'll be distributing an installer which installs scipy and numpy to my application's library. Are there any licensing issues I should be aware of? Is there any LGPL or GPL licensing in there? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Mon May 10 08:08:57 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 10 May 2010 14:08:57 +0200 Subject: [SciPy-User] Distributing SciPy and NumPy In-Reply-To: References: Message-ID: <4BE7F759.7060100@student.matnat.uio.no> cool-RR wrote: > Hello, > > I have a project called GarlicSim which I want to distribute as an > executable, packaged using py2exe. I want to package numpy and scipy > with it, so they can be used by the end user. This means I'll be > distributing an installer which installs scipy and numpy to my > application's library. > > Are there any licensing issues I should be aware of? Is there any LGPL > or GPL licensing in there? You need a LAPACK implementation to use SciPy, and those come with various licenses. But SciPy+ATLAS is a common combination which is all BSD. SciPy developers are pretty conscious about keeping GPL or LGPL code out of the main SciPy library (though some scikits libraries are under GPL). Dag Sverre From cool-rr at cool-rr.com Mon May 10 09:18:22 2010 From: cool-rr at cool-rr.com (Ram Rachum) Date: Mon, 10 May 2010 13:18:22 +0000 (UTC) Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: Dag Sverre Seljebotn student.matnat.uio.no> writes: > > cool-RR wrote: > > > Are there any licensing issues I should be aware of? Is there any LGPL > > or GPL licensing in there? > > You need a LAPACK implementation to use SciPy, and those come with > various licenses. But SciPy+ATLAS is a common combination which is all BSD. > > SciPy developers are pretty conscious about keeping GPL or LGPL code out > of the main SciPy library (though some scikits libraries are under GPL). > > Dag Sverre > Hey Dag, I've installed numpy and scipy using the standard installers from the website. (Not EPD or Python(x,y)). Is this installation free of any GPL/LGPL? Should I worry about those scikits libraries? Are they in numpy/scipy? Ram. 
From pav at iki.fi Mon May 10 09:27:15 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 10 May 2010 13:27:15 +0000 (UTC) Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: Mon, 10 May 2010 13:18:22 +0000, Ram Rachum wrote: > I've installed numpy and scipy using the standard installers from the > website. (Not EPD or Python(x,y)). Is this installation free of any > GPL/LGPL? Should be. > Should I worry about those scikits libraries? Only if you have installed some of them. > Are they in numpy/scipy? No. They are separate libraries. -- Pauli Virtanen From cool-rr at cool-rr.com Mon May 10 10:20:32 2010 From: cool-rr at cool-rr.com (Ram Rachum) Date: Mon, 10 May 2010 14:20:32 +0000 (UTC) Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: Pauli Virtanen iki.fi> writes: | > I've installed numpy and scipy using the standard installers from the | > website. (Not EPD or Python(x,y)). Is this installation free of any | > GPL/LGPL? | Should be. | > Should I worry about those scikits libraries? | Only if you have installed some of them. | > Are they in numpy/scipy? | No. They are separate libraries. Great. Thanks for your help, Ram. From matthew.brett at gmail.com Mon May 10 15:56:25 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 May 2010 12:56:25 -0700 Subject: [SciPy-User] trouble loading one .mat file In-Reply-To: References: Message-ID: Hi, >> After upgrading to svn scipy 0.8.0.dev6369, to take advantage of Matthew >> Brett's bugfix to the scipy.io code (thanks for that, Matthew), I now have >> one matlab file which I cannot load using scipy.io.loadmat. Trying it gives >> the following error: ... >> --> 354 ? ? ? ? ? ? stream = >> StringIO(zlib.decompress(self.mat_stream.read(byte_count))) >> ?? ?355 ? ? ? ? ? ? self._matrix_reader.set_stream(stream) >> ?? ?356 ? ? ? ? ? ? mdtype, byte_count = self._matrix_reader.read_full_tag() >> error: Error -5 while decompressing data This proved to be an odd one; http://bugs.python.org/issue7191 I've committed the workaround I put in the bug report above; it seems to have a tiny performance penalty. Please do let me know if the fix doesn't help or causes more problems, See you, Matthew From martin.felder at zsw-bw.de Wed May 5 06:03:05 2010 From: martin.felder at zsw-bw.de (Martin Felder) Date: Wed, 05 May 2010 12:03:05 +0200 Subject: [SciPy-User] scikits.timeseries: How to define frequency of 15minutes Message-ID: Hi *, just for the record, I'm having the exact same problem as Georges. I read through your discussion from three weeks ago, but I also don't feel up to modifying the C code myself (being a Fortran kind of guy...). I understand implementing custom user-defined frequencies is probably a lot of effort, but maybe it's less troublesome to just add some frequencies often used (=by Georges and me, and hopefully others?) to the currently implemented ones? I'd be extremely happy to have 12h, 6h, 3h, 15min and 10min intervals in addition to the existing ones. If you could point me to the part of the code that would have to be modified for that, maybe I can find someone more apt in C who can implement it. Thanks, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: martin.felder.vcf Type: text/x-vcard Size: 298 bytes Desc: Card for Martin Felder URL: From kmichael.aye at gmail.com Tue May 11 10:04:31 2010 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Tue, 11 May 2010 16:04:31 +0200 Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: On 2010-05-10 15:18:22 +0200, Ram Rachum said: > Dag Sverre Seljebotn student.matnat.uio.no> writes: > >> >> cool-RR wrote: >> >>> Are there any licensing issues I should be aware of? Is there any LGPL >>> or GPL licensing in there? >> >> You need a LAPACK implementation to use SciPy, and those come with >> various licenses. But SciPy+ATLAS is a common combination which is all BSD. >> >> SciPy developers are pretty conscious about keeping GPL or LGPL code out >> of the main SciPy library (though some scikits libraries are under GPL). >> >> Dag Sverre >> > > Hey Dag, > > I've installed numpy and scipy using the standard installers from the website. > (Not EPD or Python(x,y)). Is this installation free of any GPL/LGPL? Please excuse my ignorance respectively my legal insecurity, but am I right in assuming, that the only 'problem' I would have with scipy or numpy being released under GPL/LGPL, if I were to release my app NOT under GPL/LGPL? In other words, if i release my app using libraries under GPL/LPGL, all I have to worry is, to release it the same way, right? (Assuming I don't want to earn money with it). This legal stuff confuses the hell outta me... :S Best regards, Michael > > Should I worry about those scikits libraries? Are they in numpy/scipy? > > Ram. From ben.root at ou.edu Tue May 11 11:04:30 2010 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 11 May 2010 10:04:30 -0500 Subject: [SciPy-User] Distributing SciPy and NumPy In-Reply-To: References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: On Tue, May 11, 2010 at 9:04 AM, K.-Michael Aye wrote: > On 2010-05-10 15:18:22 +0200, Ram Rachum said: > > > Dag Sverre Seljebotn student.matnat.uio.no> writes: > > > >> > >> cool-RR wrote: > >> > >>> Are there any licensing issues I should be aware of? Is there any LGPL > >>> or GPL licensing in there? > >> > >> You need a LAPACK implementation to use SciPy, and those come with > >> various licenses. But SciPy+ATLAS is a common combination which is all > BSD. > >> > >> SciPy developers are pretty conscious about keeping GPL or LGPL code out > >> of the main SciPy library (though some scikits libraries are under GPL). > >> > >> Dag Sverre > >> > > > > Hey Dag, > > > > I've installed numpy and scipy using the standard installers from the > website. > > (Not EPD or Python(x,y)). Is this installation free of any GPL/LGPL? > > Please excuse my ignorance respectively my legal insecurity, but am I > right in assuming, that the only 'problem' I would have with scipy or > numpy being released under GPL/LGPL, if I were to release my app NOT > under GPL/LGPL? > The GPL/LGPL is a distribution license, so it can only dictate terms for redistribution of code. Software using GPL'ed code must also be released under a GPL-compatible license. All of the source codes (including changes you made to the original code) must remain open. Software using LGPL'ed code can be released using other licenses, but the LGPL'ed code (and any changes you made to it) must remain open. It is best practice to have the source code accompany the software package, but as far as I understand, this isn't a requirement so long as the code is available by request. 
Someone else can correct me on this. > In other words, if i release my app using libraries under GPL/LPGL, all > I have to worry is, to release it the same way, right? (Assuming I > don't want to earn money with it). > Argh! You can make money on open source code! This isn't the proper place to discuss it, but the open-source community is not a charity case. Open-source is a very viable business model. > This legal stuff confuses the hell outta me... :S > Same here. Also, IANAL, so this isn't legal advice, merely the distillation of various discussions on this topic. Sincerely, Ben Root > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Tue May 11 13:32:18 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 11 May 2010 13:32:18 -0400 Subject: [SciPy-User] Distributing SciPy and NumPy In-Reply-To: References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: On 11 May 2010 11:04, Benjamin Root wrote: > Argh! You can make money on open source code! This isn't the proper place > to discuss it, but the open-source community is not a charity case. > Open-source is a very viable business model. Not to spawn a discussion, but this is germane - Enthought, that maintains mayavi and pays many of the main numpy/scipy developers, is a for-profit open-source-based company. They just have a different business model than Microsoft (thankfully!). >> This legal stuff confuses the hell outta me... :S > > Same here. Also, IANAL, so this isn't legal advice, merely the distillation > of various discussions on this topic. Copyright law is a nightmare. Anne From permafacture at gmail.com Tue May 11 13:37:42 2010 From: permafacture at gmail.com (Elliot Hallmark) Date: Tue, 11 May 2010 12:37:42 -0500 Subject: [SciPy-User] help interpreting univariate spline In-Reply-To: References: Message-ID: > It's documented, in the FITPACK user's manual, and possibly in that book that I pointed you to in another reply. I had seen this before, and I think joseph is right that the coeeficents given are for the form given by wikipedia. So, my lack of understanding is just mathematical. For others who come across this, here is the solution I used. First, we wanted the spline in bezier form, so we found code for adding knots to a bspline to get the the bezier knots. These knots are all on the curve and are sufficent to define the curve so I used linear albegra to solve for the coefficents from the points. the code to put the spline in bezier form is http://mail.scipy.org/pipermail/scipy-dev/2007-February/006651.html (actually, I got it from http://old.nabble.com/bezier-curve-through-set-of-2D-points-td27158642.html) We used a quadratic spline (to start with) which is defined by three points. I had to determine the normal vector at some point on the curve, so here is that function (in cython code) computing the derivative given the 2D quadratic bezier knots. line 363 in https://bitbucket.org/permafacture/solar-concentrator-design/changeset/6b90db3cf454#chg-raytrace/cfaces.pyx just calculating the determinant and adjoint matrix to solve y = ax^2 + bx + c for A,B and C given three (x,y) pairs. thanks all. 
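For anyone who lands here later: the derivative/normal part is just the standard quadratic Bezier formulas. A minimal numpy sketch (this is not the cython code linked above, and the function names here are made up for illustration):

import numpy as np

def bezier2(p0, p1, p2, t):
    # point on the quadratic Bezier curve B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2
    p0, p1, p2 = [np.asarray(p, dtype=float) for p in (p0, p1, p2)]
    return (1 - t)**2 * p0 + 2 * (1 - t) * t * p1 + t**2 * p2

def bezier2_normal(p0, p1, p2, t):
    # unit normal at parameter t, from the derivative B'(t) = 2(1-t)(P1-P0) + 2t(P2-P1)
    p0, p1, p2 = [np.asarray(p, dtype=float) for p in (p0, p1, p2)]
    tangent = 2 * (1 - t) * (p1 - p0) + 2 * t * (p2 - p1)
    normal = np.array([-tangent[1], tangent[0]])   # rotate the tangent by 90 degrees
    return normal / np.sqrt(np.dot(normal, normal))

# e.g. bezier2_normal((0, 0), (1, 2), (2, 0), 0.5) gives array([ 0.,  1.])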
From gideon.simpson at gmail.com Tue May 11 15:09:32 2010 From: gideon.simpson at gmail.com (Gideon) Date: Tue, 11 May 2010 12:09:32 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran Message-ID: I've previously used the FortranFile.py to read in binary data generated by fortran computations, but now I'd like to write data from NumPy/SciPy to binary which can be read in by a fortran program. Does anyone have an example of using fortranfile.py to create and write data to binary? Alternatively, can anyone suggest a way to write numpy arrays to binary in away that permits me to specify the correct offset (4 bytes on my machine) for fortran to then properly read the data in? From kwmsmith at gmail.com Tue May 11 15:29:05 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Tue, 11 May 2010 14:29:05 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: Message-ID: On Tue, May 11, 2010 at 2:09 PM, Gideon wrote: > I've previously used the FortranFile.py to read in binary data > generated by fortran computations, but now I'd like to write data from > NumPy/SciPy to binary which can be read in by a fortran program. ?Does > anyone have an example of using fortranfile.py to create and write > data to binary? ?Alternatively, can anyone suggest a way to write > numpy arrays to binary in away that permits me to specify the correct > offset (4 bytes on my machine) for fortran to then properly read the > data in? I have a couple of simple fortran reading/writing routines (in python) that work with numpy arrays. I've attached them to this email -- use as you see fit. Hopefully they help, or at least show you how to do what you want. Kurt -------------- next part -------------- A non-text attachment was scrubbed... Name: fio.py Type: text/x-python Size: 3375 bytes Desc: not available URL: From nmb at wartburg.edu Tue May 11 15:58:37 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Tue, 11 May 2010 14:58:37 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: Message-ID: <4BE9B6ED.1020603@wartburg.edu> On 2010-05-11 14:09, Gideon wrote: > I've previously used the FortranFile.py to read in binary data > generated by fortran computations, but now I'd like to write data from > NumPy/SciPy to binary which can be read in by a fortran program. Does > anyone have an example of using fortranfile.py to create and write > data to binary? Alternatively, can anyone suggest a way to write > numpy arrays to binary in away that permits me to specify the correct > offset (4 bytes on my machine) for fortran to then properly read the > data in? You can use the writeReals method of a FortranFile object: In [1]: import fortranfile In [2]: import numpy as np In [3]: F = fortranfile.FortranFile('test.unf',mode='w') In [4]: F.writeReals(np.linspace(0,1,10)) In [5]: F.close() In [6]: !ls -l 'test.unf' -rw-r--r-- 1 nmb nmb 48 2010-05-11 14:56 test.unf There are also writeInts and writeString methods. Like usual, FortranFile only writes and reads homogeneous records: all integers, all reals, etc. To write fortran files with items of different types in a single record, you will have to work harder, perhaps using the struct module directly. 
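For example, a record containing one integer followed by an array of doubles could be written along these lines -- a rough sketch only, assuming the common convention of 4-byte record markers and little-endian data (gfortran's defaults; other compilers or flags may differ), and the function name is just for illustration:

import struct
import numpy as np

def write_mixed_record(f, n, reals):
    # a Fortran sequential unformatted record is the payload framed by a
    # 4-byte byte count written before and after it
    payload = struct.pack('<i', n) + np.asarray(reals, dtype='<f8').tostring()
    marker = struct.pack('<i', len(payload))
    f.write(marker + payload + marker)

f = open('mixed.bin', 'wb')
write_mixed_record(f, 10, np.random.rand(10))
f.close()

A fortran program could then read that single record with something like "read(80) n, x", where n is an integer and x a double precision array of length 10.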
-Neil From goodfellow.ian at gmail.com Tue May 11 16:04:00 2010 From: goodfellow.ian at gmail.com (Ian Goodfellow) Date: Tue, 11 May 2010 16:04:00 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug Message-ID: I've find that (scipy/numpy).linalg.eig have a problem where given a symmetric matrix they return complex eigenvalues. I can use scipy.io to save this matrix in matlab format, load it in matlab, and use matlab's eig function to succesfully decompose it with real eigenvalues, so the problem seems to be with scipy/numpy or their dependencies, not with my matrix. Is this a known issue? And is there a good workaround? I saw another mailing post elsewhere that recommended using scipy.sparse.linalg.eigen.arpack.eigen as an alternative but it doesn't seem to work at all. Can anyone recommend some other way of getting an eigenvalue decomposition in scipy or explain how to use arpack? My failed attempts at using arpack are below. Thanks, Ian >>> A = N.random.randn(3,3) >>> B = arpack.eigen(A) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scip y/sparse/linalg/eigen/arpack/arpack.py", line 172, in eigen raise ValueError("ncv must be k<=ncv<=n, ncv=%s"%ncv) ValueError: ncv must be k<=ncv<=n, ncv=3 >>> B = arpack.eigen(A,3) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 165, in eigen raise ValueError("k must be less than rank(A), k=%d"%k) ValueError: k must be less than rank(A), k=3 >>> B = arpack.eigen(A,2) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 220, in eigen raise RuntimeError("Error info=%d in arpack"%info) RuntimeError: Error info=-3 in arpack >>> B = arpack.eigen(A,2) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 220, in eigen raise RuntimeError("Error info=%d in arpack"%info) RuntimeError: Error info=-3 in arpack From josef.pktd at gmail.com Tue May 11 16:20:45 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 May 2010 16:20:45 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: On Tue, May 11, 2010 at 4:04 PM, Ian Goodfellow wrote: > I've find that (scipy/numpy).linalg.eig have a problem where given a > symmetric matrix they return complex eigenvalues. I can use scipy.io > to save this matrix in matlab format, load it in matlab, and use > matlab's eig function to succesfully decompose it with real > eigenvalues, so the problem seems to be with scipy/numpy or their > dependencies, not with my matrix. Is this a known issue? And is there > a good workaround? you could try linalg.eigh it's more specialized and I found that it produces the usual expected results for symmetric matrices Josef > > I saw another mailing post elsewhere that recommended using > scipy.sparse.linalg.eigen.arpack.eigen as an alternative but it > doesn't seem to work at all. Can anyone recommend some other way of > getting an eigenvalue decomposition in scipy or explain how to use > arpack? > > My failed attempts at using arpack are below. 
> > Thanks, > Ian > >>>> A = N.random.randn(3,3) >>>> B = arpack.eigen(A) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scip > ? ? ? ? ? ? ? ?y/sparse/linalg/eigen/arpack/arpack.py", line 172, in > eigen > ? ?raise ValueError("ncv must be k<=ncv<=n, ncv=%s"%ncv) > ValueError: ncv must be k<=ncv<=n, ncv=3 >>>> B = arpack.eigen(A,3) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", > line 165, in eigen > ? ?raise ValueError("k must be less than rank(A), k=%d"%k) > ValueError: k must be less than rank(A), k=3 >>>> B = arpack.eigen(A,2) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", > line 220, in eigen > ? ?raise RuntimeError("Error info=%d in arpack"%info) > RuntimeError: Error info=-3 in arpack >>>> B = arpack.eigen(A,2) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", > line 220, in eigen > ? ?raise RuntimeError("Error info=%d in arpack"%info) > RuntimeError: Error info=-3 in arpack > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pav at iki.fi Tue May 11 16:39:59 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 May 2010 20:39:59 +0000 (UTC) Subject: [SciPy-User] Eigenvalue decomposition bug References: Message-ID: Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: > I've find that (scipy/numpy).linalg.eig have a problem where given a > symmetric matrix they return complex eigenvalues. I can use scipy.io to > save this matrix in matlab format, load it in matlab, and use matlab's > eig function to succesfully decompose it with real eigenvalues, so the > problem seems to be with scipy/numpy or their dependencies, not with my > matrix. Is this a known issue? And is there a good workaround? Use the eigh function if you know your matrix is symmetric. Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a symmetric-specific eigensolver. Numpy and Scipy don't do this automatic check. A nonsymmetric eigensolver cannot know that your matrix is supposed to have real eigenvalues, so it's possible some of them explode to complex pairs because of minuscule numerical error. The imaginary part, however, is typically small. -- Pauli Virtanen From nahumoz at gmail.com Tue May 11 23:47:27 2010 From: nahumoz at gmail.com (Oz Nahum) Date: Tue, 11 May 2010 20:47:27 -0700 Subject: [SciPy-User] finding max value in a vector which contains NaN's Message-ID: Hi All, I have a code that needs to find a max value in a vector, which has also NaN. using max(cr), I get answer: nan, even though, the largest value is 15.1879.... Anyone has an idea how to avoid this problem ? I don't want to make a loop to kick out all the NaN values, although that would be a solution... 
Thanks in advance, -- Oz Nahum Graduate Student Zentrum f?r Angewandte Geologie Universit?t T?bingen --- Imagine there's no countries it isn't hard to do Nothing to kill or die for And no religion too Imagine all the people Living life in peace From zachary.pincus at yale.edu Wed May 12 00:02:07 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 00:02:07 -0400 Subject: [SciPy-User] mail not getting through? Message-ID: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Hi and sorry for the spam, The last couple of times I've replied to messages from scipy-user, it would appear that the mail never comes through to the list, but it doesn't bounce back to me either. (I replied to the symmetric eigenvalue message, but nothing came back on the list, e.g.) If this email gets through, has anyone else seen this issue? Zach From pgmdevlist at gmail.com Wed May 12 00:24:35 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 12 May 2010 00:24:35 -0400 Subject: [SciPy-User] finding max value in a vector which contains NaN's In-Reply-To: References: Message-ID: <8C6714EC-31C3-4EA9-BEE6-06C8E71AD427@gmail.com> On May 11, 2010, at 11:47 PM, Oz Nahum wrote: > Hi All, > I have a code that needs to find a max value in a vector, which has also NaN. > using max(cr), I get answer: nan, even though, the largest value is 15.1879.... > > Anyone has an idea how to avoid this problem ? I don't want to make a > loop to kick out all the NaN values, although that would be a > solution... Use `nanmax` (a numpy function). From ariver at enthought.com Wed May 12 01:21:19 2010 From: ariver at enthought.com (Aaron River) Date: Wed, 12 May 2010 00:21:19 -0500 Subject: [SciPy-User] mail not getting through? In-Reply-To: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: Hello Zach, I'm the IT Administrator at Enthought. This is a known issue which I'm working to rectify. I'm hoping to have it all ironed out tomorrow. Thanks, -- Aaron On Tuesday, May 11, 2010, Zachary Pincus wrote: > Hi and sorry for the spam, > > The last couple of times I've replied to messages from scipy-user, it > would appear that the mail never comes through to the list, but it > doesn't bounce back to me either. (I replied to the symmetric > eigenvalue message, but nothing came back on the list, e.g.) > > If this email gets through, has anyone else seen this issue? > > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Wed May 12 01:56:49 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 01:56:49 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers Message-ID: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Hi all, I've been meaning for a long time to look into cobbling together some non-broken, maintainable (e.g. non-PIL) image IO library that can deal with scientific (16-bit and floating-point) image formats. I finally bit the bullet yesterday and whipped together a ctypes wrapper for the FreeImage library. (FreeImage is portable and largely if not entirely dependency-free; Windows binaries are available and it compiles cleanly on os x as well as other unixes: http://freeimage.sourceforge.net/ Check out the manual: http://downloads.sourceforge.net/freeimage/FreeImage3131.pdf , particularly the appendix that shows the supported image types and pixel formats: pretty impressive. 
Also note that there is a "FreeImagePy" project that has ctypes wrappers for FreeImage, but the code is... idiosyncratic... and doesn't interface with numpy anyway.) The underlying library and wrappers I wrote support reading and writing of greyscale, RGB, and RGBA images with 8- and 16-bit int/uint and 32-bit float pixels, as well as greyscale images with 64-bit float and 128-bit complex pixels. (The TIFF spec supports all of these, at least, as does FreeImage, but most other TIFF readers probably don't. The PNG format itself is a bit more limited, but FreeImage can read/ write everything in the spec, I think. Most other formats are 8-bit only.) Multipage image IO is also supported, and there's currently a bit of support for reading EXIF tags, which could easily be beefed up. The wrapper code is pretty compact and straightforward, and the FreeImage library seems pretty robust and simple (once one notes that it uses BGRA ordering on little-endian systems). Overall I feel a lot better about using this than dealing with PIL and its broken memory model and worse patch-acceptance track record. If anyone wants to test the wrappers out, I'll send you the code. Going forward, I'll look into getting this into the scikits image IO system, but I don't really have free cycles for that right now. Zach PS. FreeImage is dual licensed: GPL and a "FreeImage license", the latter of which I have no idea if is BSD compatible -- it says it's "less restrictive" than GPL but I'm unable to parse the license's many clauses. In any case, as long as users are required to provide their own FreeImage dll/so/dylib, it's not really a problem. From josef.pktd at gmail.com Wed May 12 02:15:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 12 May 2010 02:15:01 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: On Wed, May 12, 2010 at 1:56 AM, Zachary Pincus wrote: > Hi all, > > I've been meaning for a long time to look into cobbling together some > non-broken, maintainable (e.g. non-PIL) image IO library that can deal > with scientific (16-bit and floating-point) image formats. I finally > bit the bullet yesterday and whipped together a ctypes wrapper for the > FreeImage library. (FreeImage is portable and largely if not entirely > dependency-free; Windows binaries are available and it compiles > cleanly on os x as well as other unixes: http://freeimage.sourceforge.net/ > ?Check out the manual: http://downloads.sourceforge.net/freeimage/FreeImage3131.pdf > ?, particularly the appendix that shows the supported image types and > pixel formats: pretty impressive. Also note that there is a > "FreeImagePy" project that has ctypes wrappers for FreeImage, but the > code is... idiosyncratic... and doesn't interface with numpy anyway.) > > The underlying library and wrappers I wrote support reading and > writing of greyscale, RGB, and RGBA images with 8- and 16-bit int/uint > and 32-bit float pixels, as well as greyscale images with 64-bit float > and 128-bit complex pixels. (The TIFF spec supports all of these, at > least, as does FreeImage, but most other TIFF readers probably don't. > The PNG format itself is a bit more limited, but FreeImage can read/ > write everything in the spec, I think. Most other formats are 8-bit > only.) Multipage image IO is also supported, and there's currently a > bit of support for reading EXIF tags, which could easily be beefed up. 
> > The wrapper code is pretty compact and straightforward, and the > FreeImage library seems pretty robust and simple (once one notes that > it uses BGRA ordering on little-endian systems). Overall I feel a lot > better about using this than dealing with PIL and its broken memory > model and worse patch-acceptance track record. > > If anyone wants to test the wrappers out, I'll send you the code. > Going forward, I'll look into getting this into the scikits image IO > system, but I don't really have free cycles for that right now. > > Zach > > PS. FreeImage is dual licensed: GPL and a "FreeImage license", the > latter of which I have no idea if is BSD compatible -- it says it's > "less restrictive" than GPL but I'm unable to parse the license's many > clauses. In any case, as long as users are required to provide their > own FreeImage dll/so/dylib, it's not really a problem. "FreeImage Public license" looks like http://www.mozilla.org/MPL/MPL-1.1.html no item 13 is the only difference from a quick look Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From goodfellow.ian at gmail.com Wed May 12 08:43:50 2010 From: goodfellow.ian at gmail.com (Ian Goodfellow) Date: Wed, 12 May 2010 08:43:50 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: Great, thanks. eigh seems to be working. -Ian On Tue, May 11, 2010 at 4:39 PM, Pauli Virtanen wrote: > Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >> I've find that (scipy/numpy).linalg.eig have a problem where given a >> symmetric matrix they return complex eigenvalues. I can use scipy.io to >> save this matrix in matlab format, load it in matlab, and use matlab's >> eig function to succesfully decompose it with real eigenvalues, so the >> problem seems to be with scipy/numpy or their dependencies, not with my >> matrix. Is this a known issue? And is there a good workaround? > > Use the eigh function if you know your matrix is symmetric. > > Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a > symmetric-specific eigensolver. Numpy and Scipy don't do this automatic > check. > > A nonsymmetric eigensolver cannot know that your matrix is supposed to > have real eigenvalues, so it's possible some of them explode to complex > pairs because of minuscule numerical error. The imaginary part, however, > is typically small. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From seb.haase at gmail.com Wed May 12 10:41:16 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 12 May 2010 16:41:16 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: Hi Zach, this sounds exciting and I might find some time to try it out ... BTW, the Python image-sig should not be a "PIL only" mailing list. So (eventually) I feel, this issue could be brought up there, too. But most importantly, I think it would be great to finally have a "small footprint" image-format library that does not try to reproduce all kinds of operations that we can do easily in numpy. Do you know if FreeImage does anything via memory-mapping ? 
I'm mostly interested in TIFF-memmap, which exists according to libtif, but I have now idea how useful it is ..... (I need memmap for GB-size multipage images) Thanks, Sebastian On Wed, May 12, 2010 at 7:56 AM, Zachary Pincus wrote: > Hi all, > > I've been meaning for a long time to look into cobbling together some > non-broken, maintainable (e.g. non-PIL) image IO library that can deal > with scientific (16-bit and floating-point) image formats. I finally > bit the bullet yesterday and whipped together a ctypes wrapper for the > FreeImage library. (FreeImage is portable and largely if not entirely > dependency-free; Windows binaries are available and it compiles > cleanly on os x as well as other unixes: http://freeimage.sourceforge.net/ > ?Check out the manual: http://downloads.sourceforge.net/freeimage/FreeImage3131.pdf > ?, particularly the appendix that shows the supported image types and > pixel formats: pretty impressive. Also note that there is a > "FreeImagePy" project that has ctypes wrappers for FreeImage, but the > code is... idiosyncratic... and doesn't interface with numpy anyway.) > > The underlying library and wrappers I wrote support reading and > writing of greyscale, RGB, and RGBA images with 8- and 16-bit int/uint > and 32-bit float pixels, as well as greyscale images with 64-bit float > and 128-bit complex pixels. (The TIFF spec supports all of these, at > least, as does FreeImage, but most other TIFF readers probably don't. > The PNG format itself is a bit more limited, but FreeImage can read/ > write everything in the spec, I think. Most other formats are 8-bit > only.) Multipage image IO is also supported, and there's currently a > bit of support for reading EXIF tags, which could easily be beefed up. > > The wrapper code is pretty compact and straightforward, and the > FreeImage library seems pretty robust and simple (once one notes that > it uses BGRA ordering on little-endian systems). Overall I feel a lot > better about using this than dealing with PIL and its broken memory > model and worse patch-acceptance track record. > > If anyone wants to test the wrappers out, I'll send you the code. > Going forward, I'll look into getting this into the scikits image IO > system, but I don't really have free cycles for that right now. > > Zach > > PS. FreeImage is dual licensed: GPL and a "FreeImage license", the > latter of which I have no idea if is BSD compatible -- it says it's > "less restrictive" than GPL but I'm unable to parse the license's many > clauses. In any case, as long as users are required to provide their > own FreeImage dll/so/dylib, it's not really a problem. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ben.root at ou.edu Wed May 12 10:48:35 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 12 May 2010 09:48:35 -0500 Subject: [SciPy-User] finding max value in a vector which contains NaN's In-Reply-To: <8C6714EC-31C3-4EA9-BEE6-06C8E71AD427@gmail.com> References: <8C6714EC-31C3-4EA9-BEE6-06C8E71AD427@gmail.com> Message-ID: FYI, if you are coming from another language like Matlab, you may have been used to using NaNs to indicate bad values and such (not that there is anything wrong with that!). However, Numpy offers an interesting way to deal with bad values in arrays called "Masked Arrays". 
>>> import numpy as np >>> x = np.array([2, 1, 3, np.nan, 5, 2, 3, np.nan]) >>> np.max(x) nan >>> m = np.ma.masked_array(x, np.isnan(x)) >>> np.max(m) 5.0 Ben Root On Tue, May 11, 2010 at 11:24 PM, Pierre GM wrote: > On May 11, 2010, at 11:47 PM, Oz Nahum wrote: > > Hi All, > > I have a code that needs to find a max value in a vector, which has also > NaN. > > using max(cr), I get answer: nan, even though, the largest value is > 15.1879.... > > > > Anyone has an idea how to avoid this problem ? I don't want to make a > > loop to kick out all the NaN values, although that would be a > > solution... > > > Use `nanmax` (a numpy function). > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Wed May 12 13:10:19 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 13:10:19 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: > Do you know if FreeImage does anything via memory-mapping ? I'm mostly > interested in TIFF-memmap, which exists according to libtif, but I > have now idea how useful it is ..... (I need memmap for GB-size > multipage images) I don't know a ton about how memmapping works, but check out these functions from FreeImage: > FreeImage_OpenMemory > DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data > FI_DEFAULT(0), DWORD > size_in_bytes FI_DEFAULT(0)); > > Open a memory stream. The function returns a pointer to the opened > memory stream. > When called with default arguments (0), this function opens a memory > stream for read / write > access. The stream will support loading and saving of FIBITMAP in a > memory file (managed > internally by FreeImage). It will also support seeking and telling > in the memory file. > This function can also be used to wrap a memory buffer provided by > the application driving > FreeImage. A buffer containing image data is given as function > arguments data (start of the > buffer) and size_in_bytes (buffer size in bytes). A memory buffer > wrapped by FreeImage is > read only. Images can be loaded but cannot be saved. > FreeImage_LoadFromHandle > DLL_API FIBITMAP *DLL_CALLCONV > FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, > FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); > > FreeImage has the unique feature to load a bitmap from an arbitrary > source. This source > might for example be a cabinet file, a zip file or an Internet > stream. Handling of these arbitrary > sources is not directly handled in the FREEIMAGE.DLL, but can be > easily added by using a > FreeImageIO structure as defined in FREEIMAGE.H. > FreeImageIO is a structure that contains 4 function pointers: one to > read from a source, one > to write to a source, one to seek in the source and one to tell > where in the source we currently > are. When you populate the FreeImageIO structure with pointers to > functions and pass that > structure to FreeImage_LoadFromHandle, FreeImage will call your > functions to read, seek > and tell in a file. The handle-parameter (third parameter from the > left) is used in this to > differentiate between different contexts, e.g. different files or > different Internet streams. 
With the first, I think you could just pass the void* pointer returned from memmapping a file; with the second, I think you could wrap a memmapped file with a file-like interface (implemented in python callbacks, even). Not sure, of course, if that will work OK... Might be easier to work with wrappers to libtiff directly? Zach From kwmsmith at gmail.com Wed May 12 13:36:22 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Wed, 12 May 2010 12:36:22 -0500 Subject: [SciPy-User] Bug in ndimage.map_coordinates with mode='wrap' ? In-Reply-To: References: Message-ID: On Mon, Mar 22, 2010 at 11:24 AM, Kurt Smith wrote: > On Mon, Mar 22, 2010 at 11:20 AM, Kurt Smith wrote: >> Hi, >> >> Testing the example code in ndimage.map_coordinate's docstring, I >> can't get things to work with mode='wrap'. ?What am I doing wrong? >> >> In [31]: a >> Out[31]: >> array([[ ?0., ? 1., ? 2.], >> ? ? ? [ ?3., ? 4., ? 5.], >> ? ? ? [ ?6., ? 7., ? 8.], >> ? ? ? [ ?9., ?10., ?11.]]) >> >> In [32]: ndimage.map_coordinates(a, [range(5), [0]*5], order=1, >> mode='wrap') ?# should be 0, 3, 6, 9, 0 -- right? >> Out[32]: array([ 0., ?3., ?6., ?9., ?3.]) >> >> In [33]: ndimage.map_coordinates(a, [[0]*4, range(4)], order=1, >> mode='wrap') ?# should be 0, 1, 2, 0 -- right? >> Out[33]: array([ 0., ?1., ?2., ?1.]) >> >> Here's the output when extending the sampling range: >> >> In [36]: ndimage.map_coordinates(a, [range(10), [0]*10], order=1, >> mode='wrap') ?# should be 0, 3, 6, 9, 0, 3, 6, 9, ... >> Out[36]: array([ 0., ?3., ?6., ?9., ?3., ?6., ?0., ?3., ?6., ?0.]) >> >> In [37]: ndimage.map_coordinates(a, [[0]*8, range(8)], order=1, mode='wrap') >> Out[37]: array([ 0., ?1., ?2., ?1., ?0., ?1., ?0., ?1.]) >> >> >> If it's a bug, where can I file a report, and what can I do to help >> fix it? ?Looks like the wrapping code is in a compiled extension >> module -- I'll take a look. > > I forgot to include: > > In [39]: sp.version.version > Out[39]: '0.8.0.dev6120' > Looks like the above is another version of this bug: http://projects.scipy.org/scipy/ticket/796 It affects any scipy.ndimage routines that use mode='wrap'. The patch has been helpfully submitted and it's in 'needs review' status -- any chance it could see some action? Otherwise I'll just patch scipy locally. Kurt From rmb62 at cornell.edu Wed May 12 13:38:53 2010 From: rmb62 at cornell.edu (Robin M Baur) Date: Wed, 12 May 2010 13:38:53 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: On Wed, May 12, 2010 at 01:56, Zachary Pincus wrote: [snip] > The wrapper code is pretty compact and straightforward, and the > FreeImage library seems pretty robust and simple (once one notes that > it uses BGRA ordering on little-endian systems). Overall I feel a lot > better about using this than dealing with PIL and its broken memory > model and worse patch-acceptance track record. > > If anyone wants to test the wrappers out, I'll send you the code. > Going forward, I'll look into getting this into the scikits image IO > system, but I don't really have free cycles for that right now. > > Zach I'm definitely interested, having had several nightmarish attempts at making PIL play nice with my 16-bit TIFF data. I don't have a ton of spare time myself right now, but I'd like to give it a shot. 
Robin From gideon.simpson at gmail.com Wed May 12 13:56:20 2010 From: gideon.simpson at gmail.com (Gideon) Date: Wed, 12 May 2010 10:56:20 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BE9B6ED.1020603@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> Message-ID: I've tried the following. In Python: import numpy as np from FortranFile import FortranFile x = np.random.rand(10) f = FortranFile('test.bin',mode='w') f.writeReals(x) f.close() In Fortran: program bintest double precision x(10) integer j open(unit=80, file='test.bin', status='old', form='unformatted') read(80) x close(80) do j=1,10 write(*,*) x(j) enddo end then at the command line, gfortran bintest.f -o bintest ./bintest At line 9 of file bintest.f (unit = 80, file = 'test.bin') Fortran runtime error: I/O past end of record on unformatted file Note, I have no difficulty reading the test.bin file back in, while in python, using the FortranFile.py routines. On May 11, 3:58?pm, Neil Martinsen-Burrell wrote: > On 2010-05-11 14:09, Gideon wrote: > > > I've previously used the FortranFile.py to read in binary data > > generated by fortran computations, but now I'd like to write data from > > NumPy/SciPy to binary which can be read in by a fortran program. ?Does > > anyone have an example of using fortranfile.py to create and write > > data to binary? ?Alternatively, can anyone suggest a way to write > > numpy arrays to binary in away that permits me to specify the correct > > offset (4 bytes on my machine) for fortran to then properly read the > > data in? > > You can use the writeReals method of a FortranFile object: > > In [1]: import fortranfile > > In [2]: import numpy as np > > In [3]: F = fortranfile.FortranFile('test.unf',mode='w') > > In [4]: F.writeReals(np.linspace(0,1,10)) > > In [5]: F.close() > > In [6]: !ls -l 'test.unf' > -rw-r--r-- 1 nmb nmb 48 2010-05-11 14:56 test.unf > > There are also writeInts and writeString methods. ?Like usual, > FortranFile only writes and reads homogeneous records: all integers, all > reals, etc. ?To write fortran files with items of different types in a > single record, you will have to work harder, perhaps using the struct > module directly. > > -Neil > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > -- > You received this message because you are subscribed to the Google Groups "SciPy-user" group. > To post to this group, send email to scipy-user at googlegroups.com. > To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. > For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. From nmb at wartburg.edu Wed May 12 14:00:36 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 12 May 2010 13:00:36 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: <4BE9B6ED.1020603@wartburg.edu> Message-ID: <4BEAECC4.5050508@wartburg.edu> On 2010-05-12 12:56, Gideon wrote: > I've tried the following. 
> > In Python: > import numpy as np > from FortranFile import FortranFile > > x = np.random.rand(10) > f = FortranFile('test.bin',mode='w') > f.writeReals(x) > f.close() > > In Fortran: > program bintest > > double precision x(10) > integer j > > open(unit=80, file='test.bin', status='old', form='unformatted') > > read(80) x > close(80) > > do j=1,10 > write(*,*) x(j) > > enddo > > > end > > then at the command line, > > gfortran bintest.f -o bintest > ./bintest > At line 9 of file bintest.f (unit = 80, file = 'test.bin') > Fortran runtime error: I/O past end of record on unformatted file > > Note, I have no difficulty reading the test.bin file back in, while in > python, using the FortranFile.py routines. It is likely that the problem is with the endian-ness of the file being created by FortranFile not matching what is expected by the fortran compiler. (There is a reason that the format of unformatted I/O is not specified in the Fortran standard.) Try the above with different settings of FortranFile(..., endian='<') or '>' or '='. -Neil From sebastian.walter at gmail.com Wed May 12 14:48:43 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 12 May 2010 20:48:43 +0200 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: Hello Pauli, On what kind of matrix did you observe such unstable behavior? Were there repeated eigenvalues? Sebastian On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: > Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >> I've find that (scipy/numpy).linalg.eig have a problem where given a >> symmetric matrix they return complex eigenvalues. I can use scipy.io to >> save this matrix in matlab format, load it in matlab, and use matlab's >> eig function to succesfully decompose it with real eigenvalues, so the >> problem seems to be with scipy/numpy or their dependencies, not with my >> matrix. Is this a known issue? And is there a good workaround? > > Use the eigh function if you know your matrix is symmetric. > > Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a > symmetric-specific eigensolver. Numpy and Scipy don't do this automatic > check. > > A nonsymmetric eigensolver cannot know that your matrix is supposed to > have real eigenvalues, so it's possible some of them explode to complex > pairs because of minuscule numerical error. The imaginary part, however, > is typically small. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gideon.simpson at gmail.com Wed May 12 15:58:28 2010 From: gideon.simpson at gmail.com (Gideon) Date: Wed, 12 May 2010 12:58:28 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEAECC4.5050508@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> Message-ID: Tried both, but I got the same error in both cases. On May 12, 2:00?pm, Neil Martinsen-Burrell wrote: > On 2010-05-12 12:56, Gideon wrote: > > > > > > > I've tried the following. > > > In Python: > > import numpy as np > > from FortranFile import FortranFile > > > x = np.random.rand(10) > > f = FortranFile('test.bin',mode='w') > > f.writeReals(x) > > f.close() > > > In Fortran: > > ? ? ? ?program bintest > > > ? ? ? ?double precision x(10) > > ? ? ? ?integer j > > > ? ? ? ?open(unit=80, file='test.bin', status='old', form='unformatted') > > > ? ? ? ?read(80) x > > ? ? ? ?close(80) > > > ? ? ? 
?do j=1,10 > > ? ? ? ? ? write(*,*) x(j) > > > ? ? ? ?enddo > > > ? ? ? ?end > > > then at the command line, > > > gfortran bintest.f -o bintest > > ./bintest > > At line 9 of file bintest.f (unit = 80, file = 'test.bin') > > Fortran runtime error: I/O past end of record on unformatted file > > > Note, I have no difficulty reading the test.bin file back in, while in > > python, using the FortranFile.py routines. > > It is likely that the problem is with the endian-ness of the file being > created by FortranFile not matching what is expected by the fortran > compiler. ?(There is a reason that the format of unformatted I/O is not > specified in the Fortran standard.) ?Try the above with different > settings of FortranFile(..., endian='<') or '>' or '='. > > -Neil > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > -- > You received this message because you are subscribed to the Google Groups "SciPy-user" group. > To post to this group, send email to scipy-user at googlegroups.com. > To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. > For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. From nmb at wartburg.edu Wed May 12 16:21:17 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 12 May 2010 15:21:17 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> Message-ID: <4BEB0DBD.4080000@wartburg.edu> On 2010-05-12 14:58, Gideon wrote: > Tried both, but I got the same error in both cases. If you want doubles in your file, you have to request them: F.writeReals(x, prec='d') makes everything work for me (Ubuntu 10.04, python 2.6.5, gfortran 4.4.3). Note that looking at the size of the file that you would expect to have for the data you are expecting to read would have demonstrated this: 10 doubles at eight bytes per double plus two 4-byte integers would have given you 88 bytes for the file, rather than the 48 that were being produced. I use fortranfile most heavily for reading files, rather than writing them, so I may have missed this opportunity, but do you think that the precision used in writeReals should be auto-detected from the data type that it is passed. That is, would def writeReals(self, reals, prec=None): if prec is None: prec = reals.dtype.char ... be better for your use? That would have made your original code work as written. -Neil From seb.haase at gmail.com Wed May 12 16:31:33 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 12 May 2010 22:31:33 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: Zach, Thanks for the reply, I have looked at the sourceforge - few comments: - is there currently only one person behind FreeImage ? - it seems there are some problems with 64 bit windows - apparently related to inline assembly... - the discussion group seems not very responsive / active -- might also of course mean that it mostly "just works" ;-) - the "who uses FreeImage" list seems really quite long - but then PIL is probably also used by many ... How large is the DLL actually ? Thanks, Sebastian On Wed, May 12, 2010 at 7:10 PM, Zachary Pincus wrote: >> Do you know if FreeImage does anything via memory-mapping ? 
I'm mostly >> interested in TIFF-memmap, which exists according to libtif, but I >> have now idea how useful it is ..... ?(I need memmap for GB-size >> multipage images) > > I don't know a ton about how memmapping works, but check out these > functions from FreeImage: > >> FreeImage_OpenMemory >> DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data >> FI_DEFAULT(0), DWORD >> size_in_bytes FI_DEFAULT(0)); >> >> Open a memory stream. The function returns a pointer to the opened >> memory stream. >> When called with default arguments (0), this function opens a memory >> stream for read / write >> access. The stream will support loading and saving of FIBITMAP in a >> memory file (managed >> internally by FreeImage). It will also support seeking and telling >> in the memory file. >> This function can also be used to wrap a memory buffer provided by >> the application driving >> FreeImage. A buffer containing image data is given as function >> arguments data (start of the >> buffer) and size_in_bytes (buffer size in bytes). A memory buffer >> wrapped by FreeImage is >> read only. Images can be loaded but cannot be saved. > >> FreeImage_LoadFromHandle >> DLL_API FIBITMAP *DLL_CALLCONV >> FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, >> FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); >> >> FreeImage has the unique feature to load a bitmap from an arbitrary >> source. This source >> might for example be a cabinet file, a zip file or an Internet >> stream. Handling of these arbitrary >> sources is not directly handled in the FREEIMAGE.DLL, but can be >> easily added by using a >> FreeImageIO structure as defined in FREEIMAGE.H. >> FreeImageIO is a structure that contains 4 function pointers: one to >> read from a source, one >> to write to a source, one to seek in the source and one to tell >> where in the source we currently >> are. When you populate the FreeImageIO structure with pointers to >> functions and pass that >> structure to FreeImage_LoadFromHandle, FreeImage will call your >> functions to read, seek >> and tell in a file. The handle-parameter (third parameter from the >> left) is used in this to >> differentiate between different contexts, e.g. different files or >> different Internet streams. > > With the first, I think you could just pass the void* pointer returned > from memmapping a file; with the second, I think you could wrap a > memmapped file with a file-like interface (implemented in python > callbacks, even). Not sure, of course, if that will work OK... Might > be easier to work with wrappers to libtiff directly? > > Zach > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Wed May 12 17:16:11 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 17:16:11 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: <2A97F654-19A1-43D4-BDEA-D212AB669EBA@yale.edu> Hi all, > I'm definitely interested, having had several nightmarish attempts at > making PIL play nice with my 16-bit TIFF data. I don't have a ton of > spare time myself right now, but I'd like to give it a shot. Wrappers attached. Currently, they try to load "FreeImage.[dll|dylib| so]" (depending on the platform) from the same directory as the module. This is of course easy to change. 
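(Very roughly, the loading step looks like the sketch below; the exact names in the attached image.py may differ, and on Windows the library's stdcall calling convention may mean ctypes.WinDLL is needed rather than CDLL.)

import ctypes
import os.path
import sys

_libname = {'win32': 'FreeImage.dll',
            'darwin': 'FreeImage.dylib'}.get(sys.platform, 'FreeImage.so')
_here = os.path.dirname(os.path.abspath(__file__))
_FI = ctypes.CDLL(os.path.join(_here, _libname))  # or ctypes.WinDLL on win32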
The rest is basically straightforward and at least partially documented. Let me know how it works out. Note that right now there's no support for palettized images, though that could be added too. And as for license, assume this code, such as it is, is BSD. Zach -------------- next part -------------- A non-text attachment was scrubbed... Name: image.py Type: text/x-python-script Size: 10285 bytes Desc: not available URL: -------------- next part -------------- From zachary.pincus at yale.edu Wed May 12 17:25:29 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 17:25:29 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> > - is there currently only one person behind FreeImage ? No idea... > - it seems there are some problems with 64 bit windows - apparently > related to inline assembly... Ugh, didn't notice that. > - the discussion group seems not very responsive / active -- might > also of course mean that it mostly "just works" ;-) > - the "who uses FreeImage" list seems really quite long - but then PIL > is probably also used by many ... Yeah, it doesn't seem to have a super-active community, but so far it seems to just work, which is OK for now, I hope. Not sure how clean the C code is, but going from the API at least someone has spent time thinking about making things clean and portable, etc., and hopefully maintainable. I think I'd rather hack on FreeImage's C guts than PIL's, but that's without any experience with the former. > How large is the DLL actually ? Win32 dll = 2.3 MB, OS X intel-only dylib (with debug symbols) = 2.9 MB. Not terrible. Zach > Thanks, > Sebastian > > > On Wed, May 12, 2010 at 7:10 PM, Zachary Pincus > wrote: >>> Do you know if FreeImage does anything via memory-mapping ? I'm >>> mostly >>> interested in TIFF-memmap, which exists according to libtif, but I >>> have now idea how useful it is ..... (I need memmap for GB-size >>> multipage images) >> >> I don't know a ton about how memmapping works, but check out these >> functions from FreeImage: >> >>> FreeImage_OpenMemory >>> DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data >>> FI_DEFAULT(0), DWORD >>> size_in_bytes FI_DEFAULT(0)); >>> >>> Open a memory stream. The function returns a pointer to the opened >>> memory stream. >>> When called with default arguments (0), this function opens a memory >>> stream for read / write >>> access. The stream will support loading and saving of FIBITMAP in a >>> memory file (managed >>> internally by FreeImage). It will also support seeking and telling >>> in the memory file. >>> This function can also be used to wrap a memory buffer provided by >>> the application driving >>> FreeImage. A buffer containing image data is given as function >>> arguments data (start of the >>> buffer) and size_in_bytes (buffer size in bytes). A memory buffer >>> wrapped by FreeImage is >>> read only. Images can be loaded but cannot be saved. >> >>> FreeImage_LoadFromHandle >>> DLL_API FIBITMAP *DLL_CALLCONV >>> FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, >>> FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); >>> >>> FreeImage has the unique feature to load a bitmap from an arbitrary >>> source. This source >>> might for example be a cabinet file, a zip file or an Internet >>> stream. 
Handling of these arbitrary >>> sources is not directly handled in the FREEIMAGE.DLL, but can be >>> easily added by using a >>> FreeImageIO structure as defined in FREEIMAGE.H. >>> FreeImageIO is a structure that contains 4 function pointers: one to >>> read from a source, one >>> to write to a source, one to seek in the source and one to tell >>> where in the source we currently >>> are. When you populate the FreeImageIO structure with pointers to >>> functions and pass that >>> structure to FreeImage_LoadFromHandle, FreeImage will call your >>> functions to read, seek >>> and tell in a file. The handle-parameter (third parameter from the >>> left) is used in this to >>> differentiate between different contexts, e.g. different files or >>> different Internet streams. >> >> With the first, I think you could just pass the void* pointer >> returned >> from memmapping a file; with the second, I think you could wrap a >> memmapped file with a file-like interface (implemented in python >> callbacks, even). Not sure, of course, if that will work OK... Might >> be easier to work with wrappers to libtiff directly? >> >> Zach >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From gideon.simpson at gmail.com Wed May 12 18:05:55 2010 From: gideon.simpson at gmail.com (Gideon) Date: Wed, 12 May 2010 15:05:55 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEB0DBD.4080000@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> <4BEB0DBD.4080000@wartburg.edu> Message-ID: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> Yea, that worked for me on my OS X machine. Thanks so much. To be honest, in the 10 years I've been doing floating point calculations for ODEs and PDEs, I don't think I've ever used single precision arithmetic. So I am surprised it doesn't default to double precision. Obviously, different people have different needs. On May 12, 4:21?pm, Neil Martinsen-Burrell wrote: > On 2010-05-12 14:58, Gideon wrote: > > > Tried both, but I got the same error in both cases. > > If you want doubles in your file, you have to request them: > > F.writeReals(x, prec='d') > > makes everything work for me (Ubuntu 10.04, python 2.6.5, gfortran > 4.4.3). ?Note that looking at the size of the file that you would expect > to have for the data you are expecting to read would have demonstrated > this: 10 doubles at eight bytes per double plus two 4-byte integers > would have given you 88 bytes for the file, rather than the 48 that were > being produced. > > I use fortranfile most heavily for reading files, rather than writing > them, so I may have missed this opportunity, but do you think that the > precision used in writeReals should be auto-detected from the data type > that it is passed. ?That is, would > > def writeReals(self, reals, prec=None): > ? ? ?if prec is None: > ? ? ? ? ?prec = reals.dtype.char > ? ? ?... > > be better for your use? ?That would have made your original code work as > written. > > -Neil > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > -- > You received this message because you are subscribed to the Google Groups "SciPy-user" group. 
> To post to this group, send email to scipy-user at googlegroups.com. > To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. > For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. From josef.pktd at gmail.com Thu May 13 00:35:07 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 13 May 2010 00:35:07 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 3, 2010 at 3:32 PM, nicky van foreest wrote: > Hi Josef, > > Thanks for your answer. > > On 3 May 2010 15:16, ? wrote: >> On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: >>> Hi, >>> >>> As far as I can see scipy.stats does not support the deterministic >>> distribution. Would it be a good idea to implement this also? In my >>> opinion this distribution is very useful to use as a test case, for >>> debugging purposes for instance. > > One case is the M/D/1 queue, a single server with exponentially > distributed interarrival times and deterministic service times. > Another case is an inventory system with periodic replenishments, and > random demands. A first simple model would be to use deterministically > distributed interreplenishment times. The size of demand can also be > taken to be deterministic, as an interesting limiting case. > >> >> You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution >> (I never heard the term deterministic distribution before). > > Yes. > > >> >> If the support is an integer, then rv_discrete might work, looks good see below >> >> Are there any useful operations, that we could do with it? > > Yes, like simulating the M/D/1 queue. Suppose I would like to build a > queueing simulator. I would like to set this up in a generic way, and > pass rv_arrival and ?rv_service as frozen rvs, Like this I can > experiment with several distributions, including the deterministic > distribution as a limiting case or simple case, ?all within the same > framework. > >> I think I can see a case for debugging programs that use the >> distributions in scipy.stats, but almost degenerate might also work >> for debugging. > > Sure, but sometimes you just want to exclude random effects. Moreover, > I would like to see "rv = stats.deterministic(...)" in the ?code, for > the purpose of readability. > >> >> What I would like to have is a discrete distribution on the real line, >> instead of the integers, like rv_discrete but with support on >> arbitrary floats. > > Yes, indeed. > > Please let me know your opinion. I can see that a onepoint distribution can be quite useful as a plugin degenerate distribution but also for other purposes like mixture distributions with a continuous and a discrete part (masspoints). Actually, if the onepoint distribution directly subclasses rv_generic then it wouldn't rely on or interfere with the generic framework in rv_continuous or rv_discrete (where it wouldn't really fit in if onepoint is on reals), and it might be relatively easy to provide all the methods of the distributions for a single point distribution. Choice of name: to me, "deterministic random variable" sounds like an oxymoron, although I found some references to deterministic distribution (mainly or exclusively in queuing theory and http://isi.cbs.nl/glossary/term902.htm) I would prefer a boring "onepoint" distribution, or "degenerate", or ... ? Google brings up more statistics/probability references for one-point or degenerate distribution. 
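(As a stop-gap for debugging, when the mass point happens to be an integer, rv_discrete already more or less does the job -- just a sketch of current usage, not a proposal for the final interface:

from scipy import stats

# a "deterministic" rv with all of its mass at the integer 5
det5 = stats.rv_discrete(name='det5', values=([5], [1.0]))
draws = det5.rvs(size=4)            # every draw is 5
m, v = det5.mean(), det5.var()      # 5.0 and 0.0

A proper onepoint/degenerate distribution with its mass at an arbitrary real number would still need its own class, as discussed above.)
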
Can you file a ticket with what you would like to have? I started to work again a bit on enhancing the distributions, mainly I'm experimenting with several generic estimation methods. My target is to have a working estimator for any distribution in scipy.stats and for several additional distributions. I worry a bit that a deterministic distribution might not fit into a general framework for distributions and might need to be special cased for some methods. (but see above) In my new code, I went away from using distributions by name eg. in arguments for function, so I don't care anymore whether a distribution is defined in scipy.stats or in some other module, i.e. no more getattr(scipy.stats, distname) One problem is that, once new functions/classes are in scipy, backwards compatibility considerations make development a lot more sluggish, and for many parts I know what I don't like, but I'm not sure yet what an improvement should look like. In case you are interested, I'm having fun in the sandbox http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ Cheers, Josef > > bye > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ariver at enthought.com Thu May 13 01:44:17 2010 From: ariver at enthought.com (Aaron River) Date: Thu, 13 May 2010 00:44:17 -0500 Subject: [SciPy-User] mail not getting through? In-Reply-To: References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: Okay, it's fixed now. I disabled spamassassin a long while back, and I don't know if someone re-enabled it, or if I just spaced and didn't make it persistent. Either way, that is what was blocking your emails. I'm sending a notification of the issue to subscribers of the affected lists that were caught by this problem up to a month ago. Thanks for your patience! :) -- Aaron On Wed, May 12, 2010 at 00:21, Aaron River wrote: > Hello Zach, > > I'm the IT Administrator at Enthought. > > This is a known issue which I'm working to rectify. I'm hoping to have > it all ironed out tomorrow. > > Thanks, > > -- > Aaron > > On Tuesday, May 11, 2010, Zachary Pincus wrote: >> Hi and sorry for the spam, >> >> The last couple of times I've replied to messages from scipy-user, it >> would appear that the mail never comes through to the list, but it >> doesn't bounce back to me either. (I replied to the symmetric >> eigenvalue message, but nothing came back on the list, e.g.) >> >> If this email gets through, has anyone else seen this issue? >> >> Zach >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From dwf at cs.toronto.edu Thu May 13 02:13:43 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 13 May 2010 02:13:43 -0400 (EDT) Subject: [SciPy-User] mail not getting through? In-Reply-To: References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: On Thu, 13 May 2010, Aaron River wrote: > Okay, it's fixed now. > > I disabled spamassassin a long while back, and I don't know if someone > re-enabled it, or if I just spaced and didn't make it persistent. > > Either way, that is what was blocking your emails. > > I'm sending a notification of the issue to subscribers of the affected > lists that were caught by this problem up to a month ago. > > Thanks for your patience! 
:) > > -- > Aaron > > On Wed, May 12, 2010 at 00:21, Aaron River wrote: >> Hello Zach, >> >> I'm the IT Administrator at Enthought. >> >> This is a known issue which I'm working to rectify. I'm hoping to have q>> it all ironed out tomorrow. >> >> Thanks, >> >> -- >> Aaron >> >> On Tuesday, May 11, 2010, Zachary Pincus wrote: >>> Hi and sorry for the spam, >>> >>> The last couple of times I've replied to messages from scipy-user, it >>> would appear that the mail never comes through to the list, but it >>> doesn't bounce back to me either. (I replied to the symmetric >>> eigenvalue message, but nothing came back on the list, e.g.) >>> >>> If this email gets through, has anyone else seen this issue? >>> >>> Zach >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dwf at cs.toronto.edu Thu May 13 02:15:06 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 13 May 2010 02:15:06 -0400 (EDT) Subject: [SciPy-User] mail not getting through? In-Reply-To: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: Yep. Had that happening to me too. If this gets through then hopefully it's fixed. I accidentally sent another reply too, sorry for the noise. David From dagss at student.matnat.uio.no Thu May 13 04:17:27 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 13 May 2010 10:17:27 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEAECC4.5050508@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> Message-ID: <4BEBB597.7060908@student.matnat.uio.no> Neil Martinsen-Burrell wrote: > On 2010-05-12 12:56, Gideon wrote: > >> I've tried the following. >> >> In Python: >> import numpy as np >> from FortranFile import FortranFile >> >> x = np.random.rand(10) >> f = FortranFile('test.bin',mode='w') >> f.writeReals(x) >> f.close() >> >> In Fortran: >> program bintest >> >> double precision x(10) >> integer j >> >> open(unit=80, file='test.bin', status='old', form='unformatted') >> >> read(80) x >> close(80) >> >> do j=1,10 >> write(*,*) x(j) >> >> enddo >> >> >> end >> >> then at the command line, >> >> gfortran bintest.f -o bintest >> ./bintest >> At line 9 of file bintest.f (unit = 80, file = 'test.bin') >> Fortran runtime error: I/O past end of record on unformatted file >> >> Note, I have no difficulty reading the test.bin file back in, while in >> python, using the FortranFile.py routines. >> > > It is likely that the problem is with the endian-ness of the file being > created by FortranFile not matching what is expected by the fortran > compiler. (There is a reason that the format of unformatted I/O is not > specified in the Fortran standard.) Try the above with different > settings of FortranFile(..., endian='<') or '>' or '='. > A fully reliable way of reading such files is to wrap Fortran code reading it with f2py (and fwrap, when that is done). Then, compile with the Fortran compiler in question. 
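(If the file was written on the same machine, the quick-and-dirty alternative is of course to peel off the record markers yourself with numpy -- a minimal sketch, assuming gfortran's default 4-byte markers and native byte order:

import numpy as np

def read_record(f, dtype):
    # a sequential unformatted record is framed by its byte count, before and after
    nbytes = int(np.fromfile(f, dtype=np.int32, count=1)[0])
    data = np.fromfile(f, dtype=dtype, count=nbytes // np.dtype(dtype).itemsize)
    np.fromfile(f, dtype=np.int32, count=1)   # skip the trailing byte count
    return data

fh = open('test.bin', 'rb')
x = read_record(fh, np.float64)

but only wrapping the actual Fortran read is guaranteed to agree with the compiler that wrote the file.)
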
Dag Sverre From chris at simplistix.co.uk Thu May 13 05:05:48 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 13 May 2010 10:05:48 +0100 Subject: [SciPy-User] problems with build In-Reply-To: <4BDFFE35.8060107@simplistix.co.uk> References: <4BDFFE35.8060107@simplistix.co.uk> Message-ID: <4BEBC0EC.3030506@simplistix.co.uk> It's been almost a week, does no-one want to shed any light on this? (and even longer, trying again now the lists are fixed) Chris Chris Withers wrote: > So, I tried this to get the latest numpy installed on an Ubuntu box: > > sudo apt-get build-dep python-numpy > > Then, inside the virtual_env I'm working in: > > bin/easy_install bin/easy_install numpy > > ...which left me with: > > Installed .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg > Processing dependencies for numpy > Finished processing dependencies for numpy > Error in atexit._run_exitfuncs: > Traceback (most recent call last): > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > func(*targs, **kargs) > File > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > line 248, in clean_up_temporary_directory > SystemError: Parent module 'numpy.distutils' not loaded > Error in sys.exitfunc: > Traceback (most recent call last): > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > func(*targs, **kargs) > File > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > line 248, in clean_up_temporary_directory > SystemError: Parent module 'numpy.distutils' not loaded > > ...and yet: > > $ bin/python > Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy > >>> > > Any idea what those weird atexit handlers are supposed to do?! > > They seem to fire not only when numpy is installed but also when > anything that depends on numpy is installed... cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From hasslerjc at comcast.net Thu May 13 07:38:56 2010 From: hasslerjc at comcast.net (John Hassler) Date: Thu, 13 May 2010 07:38:56 -0400 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> <4BEB0DBD.4080000@wartburg.edu> <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> Message-ID: <4BEBE4D0.7090003@comcast.net> An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Thu May 13 08:12:12 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 13 May 2010 13:12:12 +0100 Subject: [SciPy-User] mail not getting through? In-Reply-To: References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: <4BEBEC9C.9000106@simplistix.co.uk> Aaron River wrote: > Okay, it's fixed now. > > I disabled spamassassin a long while back, and I don't know if someone > re-enabled it, or if I just spaced and didn't make it persistent. > > Either way, that is what was blocking your emails. > > I'm sending a notification of the issue to subscribers of the affected > lists that were caught by this problem up to a month ago. > > Thanks for your patience! :) My mails still don't appear to be getting through... 
Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From faltet at pytables.org Thu May 13 08:18:19 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 13 May 2010 14:18:19 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEBE4D0.7090003@comcast.net> References: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> <4BEBE4D0.7090003@comcast.net> Message-ID: <201005131418.19355.faltet@pytables.org> A Thursday 13 May 2010 13:38:56 John Hassler escrigu?: > "Back in the day," double precision was MUCH slower than single precision > arithmetic, so Fortran used single precision by default. You used double > precision only when absolutely necessary, and you had to call it > explicitly. Fortran even had separate "built-in" functions for single and > double - eg., sin, dsin, log, dlog, etc. - that the user called > explicitly. (I haven't used Fortran for 20 years, but I think modern > Fortran recognizes the type of argument, now.) > > Single and double precision are about the same speed on modern processors, > and double is sometimes even faster than single on 64 bit processors Beware! This is so only for basic arithmetic operations. For computation of transcendent functions (sin, cos, atanh, sqrt, log...), single precision is still way faster (they require much less computations to reach the precision). > (because of the ancillary data shuffling, I think). However, Fortran is > dragging nearly 60 years of history along with it, so I'm not surprised > that it defaults to single precision. > > john -- Francesc Alted From ariver at enthought.com Thu May 13 09:42:03 2010 From: ariver at enthought.com (Aaron River) Date: Thu, 13 May 2010 08:42:03 -0500 Subject: [SciPy-User] mail not getting through? In-Reply-To: <4BEBEC9C.9000106@simplistix.co.uk> References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> <4BEBEC9C.9000106@simplistix.co.uk> Message-ID: On Thu, May 13, 2010 at 07:12, Chris Withers wrote: > Aaron River wrote: >> >> Okay, it's fixed now. > > My mails still don't appear to be getting through... Hi Chris, I see two emails from you sent to scipy-user in the past 7 hours. (Both show up in the scipy-user archives.) 4:06am -- http://mail.scipy.org/pipermail/scipy-user/2010-May/025298.html 7:11am -- http://mail.scipy.org/pipermail/scipy-user/2010-May/025300.html Did you send any more than that? (If so, they never touched the scipy.org mta.) I've switched on "acknowledgment" for emails you send to this list. This will send you a small email confirming future posts to the list. If you wish to turn this off, or adjust your settings further, you may visit ... http://mail.scipy.org/mailman/listinfo/scipy-user ... and use "Unsubscribe or edit options" at the bottom of the page. If you have any additional or continued troubles, let me know directly, offline from the list. Thanks, -- Aaron From gael.varoquaux at normalesup.org Thu May 13 09:31:24 2010 From: gael.varoquaux at normalesup.org (=?ISO-8859-1?Q?Ga=EBl_Varoquaux?=) Date: Thu, 13 May 2010 15:31:24 +0200 Subject: [SciPy-User] EuroScipy is finally open for registration Message-ID: The registration for EuroScipyis finally open. To register, go to the website, create an account, and you will see a *?register to the conference?* button on the left. Follow it to a page which presents a *?shoping cart?*. 
Simply submitting this information registers you to the conference, and on the left of the website, the button will now display *?You are registered for the conference?*. The registration fee is 50 euros for the conference, and 50 euros for the tutorial. Right now there is no payment system: you will be contacted later (in a week) with instructions for paying. We apologize for such a late set up. We do realize this has come as an inconvenience to people. *Do not wait to register: the number of people we can host is limited.* An exciting program Tutorials: from beginners to experts We have two tutorial tracks: - *Introductory tutorial* : to get you to speed on scientific programming with Python. - *Advanced tutorial* : experts sharing their knowledge on specific techniques and libraries. We are very fortunate to have a top notch set of presenters. Scientific track: doing new science in Python Although the abstract submission is not yet over, We can say that we are going to have a rich set of talks, looking at the current submissions. In addition to the contributed talks, we have: - *Keynote speakers* : Hans Petter Langtangen and Konrard Hinsen, two major player of scientific computing in Python. - *Lightning talks* : one hour will be open for people to come up and present in a flash an interesting project. Publishing papers We are talking with the editors of a major scientific computing journal, and the odds are quite high that we will be able to publish a special issue on scientific computing in Python based on the proceedings of the conference. The papers will undergo peer-review independently from the conference, to ensure high quality of the final publication. Call for papers Abstract submission is still open, though not for long. We are soliciting contributions on scientific libraries and tools developed with Python and on scientific or engineering achievements using Python. These include applications, teaching, future development directions, and current research. See the call for papers . *We are very much looking forward to passionate discussions about Python in science in Paris* *Nicolas Chauvat and Ga?l Varoquaux* -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Thu May 13 10:11:40 2010 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 13 May 2010 16:11:40 +0200 Subject: [SciPy-User] ANN: SfePy 2010.2 Message-ID: <4BEC089C.1000609@ntc.zcu.cz> (resending - the Monday post did not get through) I am pleased to announce release 2010.2 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. 
Mailing lists, issue tracking, git repository: http://sfepy.org Home page: http://sfepy.kme.zcu.cz Documentation: http://docs.sfepy.org/doc Highlights of this release -------------------------- - significantly updated documentation - new wiki pages: - SfePy Primer [1] - How to use Salome for generating meshes [2] [1] http://code.google.com/p/sfepy/wiki/Primer [2] http://code.google.com/p/sfepy/wiki/ExampleUsingSalomeWithSfePy Major improvements ------------------ Apart from many bug-fixes, let us mention: - new mesh readers (MED (Salome, PythonOCC), Gambit NEU, UserMeshIO) - mechanics: - ElasticConstants class - conversion formulas for elastic constants - StressTransform class to convert various stress tensors - basic tensor transformations - new examples: - usage of functions to define various parameter - usage of probes - new tests and many new terms For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2010.2_RELEASE_NOTES.txt (full release notes, rather long). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladim?r Luke?, Andre Smit, Logan Sorenson, Zuzana Z?horov? From stefan at sun.ac.za Thu May 13 10:20:33 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 13 May 2010 16:20:33 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <2A97F654-19A1-43D4-BDEA-D212AB669EBA@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> <2A97F654-19A1-43D4-BDEA-D212AB669EBA@yale.edu> Message-ID: Hey Zach On 12 May 2010 23:16, Zachary Pincus wrote: > Hi all, > >> I'm definitely interested, having had several nightmarish attempts at >> making PIL play nice with my 16-bit TIFF data. I don't have a ton of >> spare time myself right now, but I'd like to give it a shot. > > Wrappers attached. Currently, they try to load "FreeImage.[dll|dylib| > so]" (depending on the platform) from the same directory as the > module. This is of course easy to change. I converted your wrappers to plugins for scikits.image. At the moment, it still segfaults---could you help me to iron out the problems? http://github.com/stefanv/scikits.image/tree/freeimage Cheers St?fan From jrennie at gmail.com Thu May 13 10:26:45 2010 From: jrennie at gmail.com (Jason Rennie) Date: Thu, 13 May 2010 10:26:45 -0400 Subject: [SciPy-User] sparse array hstack Message-ID: It appears that numpy.hstack doesn't work with scipy sparse arrays. I'm using scipy 0.6.0 (Debian stable). Am I observing correctly? Does a later version of numpy/scipy fix this? Or, is there code available which will do an hstack on sparse arrays? Thanks, Jason -- Jason Rennie Research Scientist, ITA Software 617-714-2645 http://www.itasoftware.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From seb.haase at gmail.com Thu May 13 13:31:29 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 13 May 2010 19:31:29 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> Message-ID: I got another question: One nice thing about PIL is that I could just throw any image file at it and it finds by itself the right format/plugin to load it. Does FreeImage have a similar feature ? - i.e. determining the image format (not just depending on file name extension would be quite important ... 
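i.e. I'd hope for something one could call roughly like the two lines below, if such functions exist -- the names are just guesses on my part, and 'fi' stands for however the wrappers expose the loaded library:

# hypothetical usage: sniff the format from the file content, fall back to the extension
fmt = fi.FreeImage_GetFileType('mystery_image', 0)
if fmt == -1:   # FIF_UNKNOWN
    fmt = fi.FreeImage_GetFIFFromFilename('mystery_image')
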
- Sebastian On Wed, May 12, 2010 at 11:25 PM, Zachary Pincus wrote: >> - is there currently only one person behind FreeImage ? > No idea... > >> - it seems there are some problems with 64 bit windows - apparently >> related to inline assembly... > Ugh, didn't notice that. > >> - the discussion group seems not very responsive / active -- might >> also of course mean that it mostly "just works" ;-) >> - the "who uses FreeImage" list seems really quite long - but then PIL >> is probably also used by many ... > Yeah, it doesn't seem to have a super-active community, but so far it > seems to just work, which is OK for now, I hope. > > Not sure how clean the C code is, but going from the API at least > someone has spent time thinking about making things clean and > portable, etc., and hopefully maintainable. I think I'd rather hack on > FreeImage's C guts than PIL's, but that's without any experience with > the former. > >> How large is the DLL actually ? > > Win32 dll = 2.3 MB, OS X intel-only dylib (with debug symbols) = 2.9 MB. > > Not terrible. > > Zach > > >> Thanks, >> Sebastian >> >> >> On Wed, May 12, 2010 at 7:10 PM, Zachary Pincus > > wrote: >>>> Do you know if FreeImage does anything via memory-mapping ? I'm >>>> mostly >>>> interested in TIFF-memmap, which exists according to libtif, but I >>>> have now idea how useful it is ..... ?(I need memmap for GB-size >>>> multipage images) >>> >>> I don't know a ton about how memmapping works, but check out these >>> functions from FreeImage: >>> >>>> FreeImage_OpenMemory >>>> DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data >>>> FI_DEFAULT(0), DWORD >>>> size_in_bytes FI_DEFAULT(0)); >>>> >>>> Open a memory stream. The function returns a pointer to the opened >>>> memory stream. >>>> When called with default arguments (0), this function opens a memory >>>> stream for read / write >>>> access. The stream will support loading and saving of FIBITMAP in a >>>> memory file (managed >>>> internally by FreeImage). It will also support seeking and telling >>>> in the memory file. >>>> This function can also be used to wrap a memory buffer provided by >>>> the application driving >>>> FreeImage. A buffer containing image data is given as function >>>> arguments data (start of the >>>> buffer) and size_in_bytes (buffer size in bytes). A memory buffer >>>> wrapped by FreeImage is >>>> read only. Images can be loaded but cannot be saved. >>> >>>> FreeImage_LoadFromHandle >>>> DLL_API FIBITMAP *DLL_CALLCONV >>>> FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, >>>> FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); >>>> >>>> FreeImage has the unique feature to load a bitmap from an arbitrary >>>> source. This source >>>> might for example be a cabinet file, a zip file or an Internet >>>> stream. Handling of these arbitrary >>>> sources is not directly handled in the FREEIMAGE.DLL, but can be >>>> easily added by using a >>>> FreeImageIO structure as defined in FREEIMAGE.H. >>>> FreeImageIO is a structure that contains 4 function pointers: one to >>>> read from a source, one >>>> to write to a source, one to seek in the source and one to tell >>>> where in the source we currently >>>> are. When you populate the FreeImageIO structure with pointers to >>>> functions and pass that >>>> structure to FreeImage_LoadFromHandle, FreeImage will call your >>>> functions to read, seek >>>> and tell in a file. The handle-parameter (third parameter from the >>>> left) is used in this to >>>> differentiate between different contexts, e.g. 
different files or >>>> different Internet streams. >>> >>> With the first, I think you could just pass the void* pointer >>> returned >>> from memmapping a file; with the second, I think you could wrap a >>> memmapped file with a file-like interface (implemented in python >>> callbacks, even). Not sure, of course, if that will work OK... Might >>> be easier to work with wrappers to libtiff directly? >>> >>> Zach >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Thu May 13 13:43:45 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 13 May 2010 13:43:45 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> Message-ID: > I got another question: > One nice thing about PIL is that I could just throw any image file at > it and it finds by itself the right format/plugin to load it. > Does FreeImage have a similar feature ? > - i.e. determining the image format (not just depending on file name > extension would be quite important ... Yeah, it has tools to sniff the type from a file, and also ones to determine type based on name alone: FreeImage_GetFileType and FreeImage_GetFIFFromFilename, respectively. My wrappers use the former on reading, and the latter for writing, but this is easy enough to modify. The return values of the above functions are int constants that identify each file format plugin -- and, from a brief look, it looks like it should be possible to implement new format plugins in python, using the ctypes callback tools. Overall, and to a first approximation, I'm pretty happy with the API FreeImage exposes. Zach From david at silveregg.co.jp Thu May 13 22:51:03 2010 From: david at silveregg.co.jp (David) Date: Fri, 14 May 2010 11:51:03 +0900 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <201005131418.19355.faltet@pytables.org> References: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> <4BEBE4D0.7090003@comcast.net> <201005131418.19355.faltet@pytables.org> Message-ID: <4BECBA97.40809@silveregg.co.jp> On 05/13/2010 09:18 PM, Francesc Alted wrote: > A Thursday 13 May 2010 13:38:56 John Hassler escrigu?: >> "Back in the day," double precision was MUCH slower than single precision >> arithmetic, so Fortran used single precision by default. You used double >> precision only when absolutely necessary, and you had to call it >> explicitly. Fortran even had separate "built-in" functions for single and >> double - eg., sin, dsin, log, dlog, etc. - that the user called >> explicitly. (I haven't used Fortran for 20 years, but I think modern >> Fortran recognizes the type of argument, now.) >> >> Single and double precision are about the same speed on modern processors, >> and double is sometimes even faster than single on 64 bit processors > > Beware! This is so only for basic arithmetic operations. 
For computation of > transcendent functions (sin, cos, atanh, sqrt, log...), single precision is > still way faster (they require much less computations to reach the precision). Also, float and double operations are at the same speed only considering everything is in the registers... So concretely, single precision is much faster for almost any code which is memory bound (it is very easy to check with numpy: something as simple as dot is around twice faster for single than double precision, assuming dot uses atlas or similarly optimized library). cheers, David From faltet at pytables.org Fri May 14 03:31:15 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 14 May 2010 09:31:15 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BECBA97.40809@silveregg.co.jp> References: <201005131418.19355.faltet@pytables.org> <4BECBA97.40809@silveregg.co.jp> Message-ID: <201005140931.15397.faltet@pytables.org> A Friday 14 May 2010 04:51:03 David escrigu?: > > Beware! This is so only for basic arithmetic operations. For > > computation of transcendent functions (sin, cos, atanh, sqrt, log...), > > single precision is still way faster (they require much less computations > > to reach the precision). > > Also, float and double operations are at the same speed only considering > everything is in the registers... So concretely, single precision is > much faster for almost any code which is memory bound (it is very easy > to check with numpy: something as simple as dot is around twice faster > for single than double precision, assuming dot uses atlas or similarly > optimized library). True. Although you don't even need atlas to see this: In [3]: a = np.arange(1e6, dtype=np.float64) In [4]: b = np.arange(1e6, dtype=np.float64) In [5]: timeit a*b 100 loops, best of 3: 5.02 ms per loop In [6]: a = np.arange(1e6, dtype=np.float32) In [7]: b = np.arange(1e6, dtype=np.float32) In [8]: timeit a*b 100 loops, best of 3: 2.68 ms per loop -- Francesc Alted From faltet at pytables.org Fri May 14 04:57:17 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 14 May 2010 10:57:17 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: Message-ID: <201005141057.17170.faltet@pytables.org> A Tuesday 11 May 2010 21:09:32 Gideon escrigu?: > I've previously used the FortranFile.py to read in binary data > generated by fortran computations, but now I'd like to write data from > NumPy/SciPy to binary which can be read in by a fortran program. Does > anyone have an example of using fortranfile.py to create and write > data to binary? Alternatively, can anyone suggest a way to write > numpy arrays to binary in away that permits me to specify the correct > offset (4 bytes on my machine) for fortran to then properly read the > data in? Just for completeness to other solutions offered, I'm attaching a BinaryFile class that allows you to read/write fortran files (in general, binary files). From its docstrings: """ BinaryFile: A class for accessing data to/from large binary files ================================================================= The data is meant to be read/write sequentially from/to a binary file. One can request to read a piece of data with a specific type and shape from it. Also, it supports the notion of Fortran and C ordered data, so that the returned data is always well-behaved (C-contiguous and aligned). This class is seeking capable. 
""" It differs from the solutions that other presented here in that it does not use the struct module at all, so it is much more faster. For example, when using Neil's fortranfile module, one have: In [1]: import fortranfile In [2]: import numpy as np In [3]: f = fortranfile.FortranFile('/tmp/test.unf',mode='w') In [5]: time f.writeReals(np.arange(1e7)) CPU times: user 6.06 s, sys: 0.14 s, total: 6.21 s Wall time: 6.41 s In [7]: f.close() In [8]: f = fortranfile.FortranFile('/tmp/test.unf',mode='r') In [9]: time f.readReals() CPU times: user 0.64 s, sys: 0.35 s, total: 0.99 s Wall time: 1.00 s Out[10]: array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999700e+06, 9.99999800e+06, 9.99999900e+06], dtype=float32) while using my binaryfile module gives: In [1]: import numpy as np In [2]: from binaryfile import BinaryFile In [3]: f = BinaryFile('/tmp/test.bin', mode="w+", order='fortran') In [4]: time f.write(np.arange(1e7)) CPU times: user 0.04 s, sys: 0.19 s, total: 0.24 s Wall time: 0.24 s # 26x times faster than fortranfile In [6]: f.seek(0) In [7]: time f.read('f8', (int(1e7),)) CPU times: user 0.03 s, sys: 0.12 s, total: 0.15 s Wall time: 0.15 s # 6.6 times faster than fortranfile Out[8]: array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999700e+06, 9.99999800e+06, 9.99999900e+06]) Also, binaryfile supports all the types in NumPy, even strings and records. HTH, -- Francesc Alted -------------- next part -------------- A non-text attachment was scrubbed... Name: binaryfile.py Type: text/x-python Size: 4910 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_binaryfile.py Type: text/x-python Size: 6853 bytes Desc: not available URL: From paul.m.edwards at gmail.com Fri May 14 06:30:27 2010 From: paul.m.edwards at gmail.com (Paul Edwards) Date: Fri, 14 May 2010 11:30:27 +0100 Subject: [SciPy-User] compile on win64 for intel and msvc Message-ID: Hi, I am trying to build scipy on win64 with the intel 10.1 compiler and ms visual studio 2008. I am having the same problem as reported here: http://mail.scipy.org/pipermail/scipy-user/2009-December/023642.html The problem is with the flags for the intel compiler shown here: http://mail.scipy.org/pipermail/scipy-user/2009-December/023654.html This was reported to be fixed at the end of the thread by modifying Python/scipy - could anyone tell me what I need to change in order to make this compile? Thanks in advance, Paul From paul.m.edwards at gmail.com Fri May 14 06:56:09 2010 From: paul.m.edwards at gmail.com (Paul Edwards) Date: Fri, 14 May 2010 11:56:09 +0100 Subject: [SciPy-User] compile on win64 for intel and msvc In-Reply-To: References: Message-ID: BTW If I try and use numscons instead I get: 8<--------------------------------------------------------------------- D:\scratch\SS02\pkgs\build\numpy-1.4.1>..\Python-2.6.5\PCbuild\amd64\python.exe setupscons.py scons -b --fcompiler=ifort --compiler=msvc config Running from numpy source directory. Forcing DISTUTILS_USE_SDK=1 non-existing path in 'numpy\\core': 'code_generators\\numpy_api_order.txt' non-existing path in 'numpy\\core': 'code_generators\\ufunc_api_order.txt' non-existing path in 'numpy\\core': 'include/numpy\\numpyconfig.h.in' running scons Executing scons command (pkg is numpy.core): D:\scratch\SS02\pkgs\build\Python-2.6.5\PCbuild\amd64\python.exe "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site- packages\numscons\scons-local\scons.py" -f numpy\core\SConstruct -I. 
scons_tool_path="" src_dir="numpy\core" pkg_path="numpy\core" pkg_name="numpy.core" log_lev el=50 distutils_libdir="..\..\..\..\build\lib.win-amd64-2.6" distutils_clibdir="..\..\..\..\build\temp.win-amd64-2.6" distutils_install_prefix="D:\scratch\SS02\ pkgs\build\Python-2.6.5\Lib\site-packages\numpy\core" cc_opt=msvc cc_opt=msvc debug=0 f77_opt=ifort cxx_opt=msvc include_bootstrap=..\..\..\..\numpy\core\includ e bypass=1 import_env=0 silent=0 bootstrapping=1 scons: Reading SConscript files ... Mkdir("build\scons\numpy\core") WindowsError: [Error 2] The system cannot find the file specified: File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\numpy\core\SConstruct", line 2: GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript') File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", line 135: build_dir = '$build_dir', src_dir = '$src_dir') File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", line 553: return apply(_SConscript, [self.fs,] + files, subst_kw) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", line 262: exec _file_ in call_stack[-1].globals File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\build\scons\numpy\core\SConscript", line 38: env = GetNumpyEnvironment(ARGUMENTS) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", line 23: env = _get_numpy_env(args) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", line 63: initialize_tools(env) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", line 186: initialize_f77(env) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", line 119: env.Tool(name) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", line 125: get_numscons_toolpaths(self)) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Environment.py", line 1704: tool(self) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Tool\__init__.py", line 181: apply(self.generate, ( env, ) + args, kw) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", line 44: return generate_win32(env) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", line 30: pdir = product_dir_fc(versdict[vers[0]]) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\intel_common\win32.py", line 77: return _winreg.QueryValueEx(k, "ProductDir")[0] error: Error while executing scons command. See above for more information. If you think it is a problem in numscons, you can also try executing the scons command with --log-level option for more detailed output of what numscons is doing, for example --log-level=0; the lowest the level is, the more detailed the output it. --------------------------------------------------------------------->8 Regards, Paul ---------- Forwarded message ---------- From: Paul Edwards Date: 14 May 2010 11:30 Subject: compile on win64 for intel and msvc To: scipy-user at scipy.org Hi, I am trying to build scipy on win64 with the intel 10.1 compiler and ms visual studio 2008. ?I am having the same problem as reported here: ? 
?http://mail.scipy.org/pipermail/scipy-user/2009-December/023642.html The problem is with the flags for the intel compiler shown here: ? ?http://mail.scipy.org/pipermail/scipy-user/2009-December/023654.html This was reported to be fixed at the end of the thread by modifying Python/scipy - could anyone tell me what I need to change in order to make this compile? Thanks in advance, Paul From nmb at wartburg.edu Fri May 14 10:30:37 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Fri, 14 May 2010 09:30:37 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <201005141057.17170.faltet@pytables.org> References: <201005141057.17170.faltet@pytables.org> Message-ID: <4BED5E8D.80902@wartburg.edu> On 2010-05-14 03:57 , Francesc Alted wrote: > A Tuesday 11 May 2010 21:09:32 Gideon escrigu?: >> I've previously used the FortranFile.py to read in binary data >> generated by fortran computations, but now I'd like to write data from >> NumPy/SciPy to binary which can be read in by a fortran program. Does >> anyone have an example of using fortranfile.py to create and write >> data to binary? Alternatively, can anyone suggest a way to write >> numpy arrays to binary in away that permits me to specify the correct >> offset (4 bytes on my machine) for fortran to then properly read the >> data in? > > Just for completeness to other solutions offered, I'm attaching a BinaryFile > class that allows you to read/write fortran files (in general, binary files). > From its docstrings: > > """ > BinaryFile: A class for accessing data to/from large binary files > ================================================================= > > The data is meant to be read/write sequentially from/to a binary file. > One can request to read a piece of data with a specific type and shape > from it. Also, it supports the notion of Fortran and C ordered data, > so that the returned data is always well-behaved (C-contiguous and > aligned). > > This class is seeking capable. > """ > > It differs from the solutions that other presented here in that it does not > use the struct module at all, so it is much more faster. For example, when > using Neil's fortranfile module, one have: > > In [1]: import fortranfile > > In [2]: import numpy as np > > In [3]: f = fortranfile.FortranFile('/tmp/test.unf',mode='w') > > In [5]: time f.writeReals(np.arange(1e7)) > CPU times: user 6.06 s, sys: 0.14 s, total: 6.21 s > Wall time: 6.41 s > > In [7]: f.close() > > In [8]: f = fortranfile.FortranFile('/tmp/test.unf',mode='r') > > In [9]: time f.readReals() > CPU times: user 0.64 s, sys: 0.35 s, total: 0.99 s > Wall time: 1.00 s > Out[10]: > array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., > 9.99999700e+06, 9.99999800e+06, 9.99999900e+06], dtype=float32) > > while using my binaryfile module gives: > > In [1]: import numpy as np > > In [2]: from binaryfile import BinaryFile > > In [3]: f = BinaryFile('/tmp/test.bin', mode="w+", order='fortran') > > In [4]: time f.write(np.arange(1e7)) > CPU times: user 0.04 s, sys: 0.19 s, total: 0.24 s > Wall time: 0.24 s # 26x times faster than fortranfile > > In [6]: f.seek(0) > > In [7]: time f.read('f8', (int(1e7),)) > CPU times: user 0.03 s, sys: 0.12 s, total: 0.15 s > Wall time: 0.15 s # 6.6 times faster than fortranfile > Out[8]: > array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., > 9.99999700e+06, 9.99999800e+06, 9.99999900e+06]) > > Also, binaryfile supports all the types in NumPy, even strings and records. Wonderful speed! 
But, alas, binaryfile does not produce fortran unformatted output. The format that you've written is what Fortran calls stream output and is a relatively recent addition to that language. While fortranfile is certainly slow due to its use of the struct module for all writes and reads, it allows it to read and write Fortran's record-oriented (not like numpy records) format with a great deal of flexibility. It was designed to be able to read data files created by Fortran simulation codes that may have been produced on machines with different integer sizes and endian-ness than the machine doing the reading. Your binaryfile does not do this, although I do not doubt that it could be done. Any improvements that make fortranfile faster will be gladly accepted! -Neil From faltet at pytables.org Fri May 14 10:51:29 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 14 May 2010 16:51:29 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BED5E8D.80902@wartburg.edu> References: <201005141057.17170.faltet@pytables.org> <4BED5E8D.80902@wartburg.edu> Message-ID: <201005141651.29769.faltet@pytables.org> A Friday 14 May 2010 16:30:37 Neil Martinsen-Burrell escrigu?: > Wonderful speed! But, alas, binaryfile does not produce fortran > unformatted output. The format that you've written is what Fortran > calls stream output and is a relatively recent addition to that > language. Mmh. I'm rather ignorant in this matter, but I'm wondering if what you call 'stream output' would be the same than the venerable 'sequential access' mode (that exists at least since Fortran 90)? > While fortranfile is certainly slow due to its use of the > struct module for all writes and reads, it allows it to read and write > Fortran's record-oriented (not like numpy records) format with a great > deal of flexibility. You are right. I suppose that what you call 'record-oriented' is the 'direct access' mode in literature. Yup, this is not supported by binaryfile. > It was designed to be able to read data files > created by Fortran simulation codes that may have been produced on > machines with different integer sizes and endian-ness than the machine > doing the reading. Your binaryfile does not do this, although I do not > doubt that it could be done. Any improvements that make fortranfile > faster will be gladly accepted! Well, I suppose that if you can get rid of the struct module in fortranfile you may get much better performance. I don't think this would require a lot of work. -- Francesc Alted From ralf.gommers at googlemail.com Fri May 14 11:45:27 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 14 May 2010 23:45:27 +0800 Subject: [SciPy-User] problems with build In-Reply-To: <4BEBC0EC.3030506@simplistix.co.uk> References: <4BDFFE35.8060107@simplistix.co.uk> <4BEBC0EC.3030506@simplistix.co.uk> Message-ID: On Thu, May 13, 2010 at 5:05 PM, Chris Withers wrote: > It's been almost a week, does no-one want to shed any light on this? 
> (and even longer, trying again now the lists are fixed) > > Chris > > Chris Withers wrote: > > So, I tried this to get the latest numpy installed on an Ubuntu box: > > > > sudo apt-get build-dep python-numpy > > > > Then, inside the virtual_env I'm working in: > > > > bin/easy_install bin/easy_install numpy > > > > ...which left me with: > > > > Installed > .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg > > Processing dependencies for numpy > > Finished processing dependencies for numpy > > Error in atexit._run_exitfuncs: > > Traceback (most recent call last): > > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > > func(*targs, **kargs) > > File > > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > > line 248, in clean_up_temporary_directory > > SystemError: Parent module 'numpy.distutils' not loaded > > Error in sys.exitfunc: > > Traceback (most recent call last): > > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > > func(*targs, **kargs) > > File > > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > > line 248, in clean_up_temporary_directory > > SystemError: Parent module 'numpy.distutils' not loaded > > > > ...and yet: > > > > $ bin/python > > Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) > > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > >>> import numpy > > >>> > > > > Any idea what those weird atexit handlers are supposed to do?! > Looks like they should be cleaning up temporary dirs after the install is completed and python (or in this case easy_install) exits. Do you see this also when installing numpy or a package that depends on numpy with regular "python setup.py build/install"? Given how often easy_install fails to install anything but pure-python packages that would be my first guess of where the problem is. Cheers, Ralf > > > > They seem to fire not only when numpy is installed but also when > > anything that depends on numpy is installed... > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Fri May 14 13:44:06 2010 From: opossumnano at gmail.com (Tiziano Zito) Date: Fri, 14 May 2010 19:44:06 +0200 Subject: [SciPy-User] ANN: MDP release 2.6 and MDP Sprint 2010 Message-ID: <20100514174406.GE29048@multivac.zonafranca> We are glad to announce release 2.6 of the Modular toolkit for Data Processing (MDP). MDP is a Python library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. The base of available algorithms includes, to name but the most common, Principal Component Analysis (PCA and NIPALS), several Independent Component Analysis algorithms (CuBICA, FastICA, TDSEP, JADE, and XSFA), Slow Feature Analysis, Restricted Boltzmann Machine, and Locally Linear Embedding. What's new in version 2.6? -------------------------- - Several new classifier nodes have been added. - A new node extension mechanism makes it possible to dynamically add methods or attributes for specific features to node classes, enabling aspect-oriented programming in MDP. 
Several MDP features (like parallelization) are now based on this mechanism, and users can add their own custom node extensions. - BiMDP is a large new package in MDP that introduces bidirectional data flows to MDP, including backpropagation and even loops. BiMDP also enables the transportation of additional data in flows via messages. - BiMDP includes a new flow inspection tool, that runs as as a graphical debugger in the webrowser to step through complex flows. It can be extended by users for the analysis and visualization of intermediate data. - As usual, tons of bug fixes The new additions in the library have been thoroughly tested but, as usual after a public release, we especially welcome user's feedback and bug reports. MDP Sprint 2010 --------------- Following our tradition of sprint-driven development, the team of the core developers decided to organize a programming sprint open to external participants. We invite in particular all users who implemented new algorithms and would like to see them integrated in MDP: you will work together with a core developer! More info: http://sourceforge.net/apps/mediawiki/mdp-toolkit/index.php?title=MDP_Sprint_2010 Resources --------- Download: http://sourceforge.net/projects/mdp-toolkit/files Homepage: http://mdp-toolkit.sourceforge.net Mailing list: http://lists.sourceforge.net/mailman/listinfo/mdp-toolkit-users -- Pietro Berkes Volen Center for Complex Systems Brandeis University Waltham, MA, USA Rike-Benjamin Schuppner Berlin, Germany Niko Wilbert Institute for Theoretical Biology Humboldt-University Berlin, Germany Tiziano Zito Modelling of Cognitive Processes Berlin Institute of Technology and Bernstein Center for Computational Neuroscience Berlin, Germany From saintmlx at apstat.com Fri May 14 14:05:22 2010 From: saintmlx at apstat.com (Xavier Saint-Mleux) Date: Fri, 14 May 2010 14:05:22 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: <4BED90E2.4000707@apstat.com> Sebastian Walter wrote: > Hello Pauli, > On what kind of matrix did you observe such unstable behavior? > Were there repeated eigenvalues? > It happens to me a lot with complex covariance matrices (Hermitian). Here's a simple example that returns non-real eigenvalues for an Hermitian matrix: >>> np.random.seed(0) >>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>> x = (x+x.T.conj())/2 # make it Hermitian >>> x == x.T.conj() # ensure it is Hermitian array([[ True, True, True], [ True, True, True], [ True, True, True]], dtype=bool) >>> np.linalg.eigvals(x) # returns complex values array([ 1.99062044 -4.98523579e-17j, 0.18062978 -9.36952928e-19j, -0.23511915 -2.19606549e-17j]) >>> np.linalg.eigvalsh(x) # imag always zero array([-0.23511915+0.j, 0.18062978+0.j, 1.99062044+0.j]) >>> Xavier > Sebastian > > > > > > On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: > >> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >> >>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>> save this matrix in matlab format, load it in matlab, and use matlab's >>> eig function to succesfully decompose it with real eigenvalues, so the >>> problem seems to be with scipy/numpy or their dependencies, not with my >>> matrix. Is this a known issue? And is there a good workaround? >>> >> Use the eigh function if you know your matrix is symmetric. 
>> >> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >> check. >> >> A nonsymmetric eigensolver cannot know that your matrix is supposed to >> have real eigenvalues, so it's possible some of them explode to complex >> pairs because of minuscule numerical error. The imaginary part, however, >> is typically small. >> >> -- >> Pauli Virtanen >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From aarchiba at physics.mcgill.ca Fri May 14 14:12:37 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 14 May 2010 14:12:37 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: <4BED90E2.4000707@apstat.com> References: <4BED90E2.4000707@apstat.com> Message-ID: On 14 May 2010 14:05, Xavier Saint-Mleux wrote: > Sebastian Walter wrote: >> Hello Pauli, >> On what kind of matrix did you observe such unstable behavior? >> Were there repeated eigenvalues? >> > > It happens to me a lot with complex covariance matrices (Hermitian). > Here's a simple example that returns non-real eigenvalues for an > Hermitian matrix: Uh, not to be difficult, but these values are not actually complex. The complex component is within a floating-point epsilon of zero. The only way to do better than this is to explicitly notice that the matrix is Hermitian and branch to special-case code. And if the matrix is only numerically Hermitian, i.e. values that should be equal differ by a floating-point epsilon, even this won't help. Anne > >>>> np.random.seed(0) >>>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>>> x = (x+x.T.conj())/2 # make it Hermitian >>>> x == x.T.conj() # ensure it is Hermitian > array([[ True, ?True, ?True], > ? ? ? [ True, ?True, ?True], > ? ? ? [ True, ?True, ?True]], dtype=bool) >>>> np.linalg.eigvals(x) # returns complex values > array([ 1.99062044 -4.98523579e-17j, ?0.18062978 -9.36952928e-19j, > ? ? ? -0.23511915 -2.19606549e-17j]) >>>> np.linalg.eigvalsh(x) # imag always zero > array([-0.23511915+0.j, ?0.18062978+0.j, ?1.99062044+0.j]) >>>> > > > > Xavier > > > >> Sebastian >> >> >> >> >> >> On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: >> >>> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >>> >>>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>>> save this matrix in matlab format, load it in matlab, and use matlab's >>>> eig function to succesfully decompose it with real eigenvalues, so the >>>> problem seems to be with scipy/numpy or their dependencies, not with my >>>> matrix. Is this a known issue? And is there a good workaround? >>>> >>> Use the eigh function if you know your matrix is symmetric. >>> >>> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >>> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >>> check. >>> >>> A nonsymmetric eigensolver cannot know that your matrix is supposed to >>> have real eigenvalues, so it's possible some of them explode to complex >>> pairs because of minuscule numerical error. The imaginary part, however, >>> is typically small. 
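For reference, a minimal sketch of the point above, built the same way as the Hermitian test matrix earlier in the thread (the matrix values themselves are arbitrary): the general eigensolver leaves floating-point-sized imaginary noise on the eigenvalues, while the symmetric/Hermitian solver does not.

import numpy as np

np.random.seed(0)
x = np.random.random((3, 3)) + 1j * np.random.random((3, 3))
x = (x + x.T.conj()) / 2              # Hermitian by construction

w_general = np.linalg.eigvals(x)      # general solver, complex output
w_hermitian = np.linalg.eigvalsh(x)   # Hermitian-specific solver

print(np.abs(w_general.imag).max())   # tiny, on the order of 1e-17
print(np.sort(w_general.real))        # agrees with eigvalsh up to rounding
print(np.sort(w_hermitian.real))

The real parts from both routines match to within rounding error; only the spurious imaginary part differs.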
>>> >>> -- >>> Pauli Virtanen >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From chris.d.burns at gmail.com Fri May 14 14:23:09 2010 From: chris.d.burns at gmail.com (Christopher Burns) Date: Fri, 14 May 2010 11:23:09 -0700 Subject: [SciPy-User] problems with build In-Reply-To: <4BEBC0EC.3030506@simplistix.co.uk> References: <4BDFFE35.8060107@simplistix.co.uk> <4BEBC0EC.3030506@simplistix.co.uk> Message-ID: I ran into the same error yesterday: "SystemError: Parent module 'numpy.distutils' not loaded" It turned out to be a broken install of numpy. I was installing mayavi with synaptic and it pulled in numpy as a dependency. At the end of the install it gave me this error msg: E: python-numpy: subprocess post-installation script returned error exit status 1 Synaptic installed numpy here: /usr/lib/python2.5/site-packages ... but this numpy was not importable. To fix it I manually removed numpy, then used synaptic to install numpy only... tested the import and ran numpy tests... once that was ok, then I installed mayavi and everything worked. Chris On Thu, May 13, 2010 at 2:05 AM, Chris Withers wrote: > It's been almost a week, does no-one want to shed any light on this? > (and even longer, trying again now the lists are fixed) > > Chris > > Chris Withers wrote: >> So, I tried this to get the latest numpy installed on an Ubuntu box: >> >> sudo apt-get build-dep python-numpy >> >> Then, inside the virtual_env I'm working in: >> >> bin/easy_install bin/easy_install numpy >> >> ...which left me with: >> >> Installed .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg >> Processing dependencies for numpy >> Finished processing dependencies for numpy >> Error in atexit._run_exitfuncs: >> Traceback (most recent call last): >> ? ?File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs >> ? ? ?func(*targs, **kargs) >> ? ?File >> "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", >> line 248, in clean_up_temporary_directory >> SystemError: Parent module 'numpy.distutils' not loaded >> Error in sys.exitfunc: >> Traceback (most recent call last): >> ? ?File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs >> ? ? ?func(*targs, **kargs) >> ? ?File >> "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", >> line 248, in clean_up_temporary_directory >> SystemError: Parent module 'numpy.distutils' not loaded >> >> ...and yet: >> >> $ bin/python >> Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) >> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> ?>>> import numpy >> ?>>> >> >> Any idea what those weird atexit handlers are supposed to do?! >> >> They seem to fire not only when numpy is installed but also when >> anything that depends on numpy is installed... > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > ? ? ? ? ? 
?- http://www.simplistix.co.uk > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Christopher Burns Senior Software Engineer O.N. Diagnostics, LLC 64 Shattuck Sq. Suite 220, Berkeley, CA 94704 _____________________________________ If you receive this message in error, please delete it immediately. This message may contain information that is privileged, confidential and exempt from disclosure and dissemination under applicable law. From josef.pktd at gmail.com Fri May 14 14:31:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 May 2010 14:31:20 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: <4BED90E2.4000707@apstat.com> Message-ID: On Fri, May 14, 2010 at 2:12 PM, Anne Archibald wrote: > On 14 May 2010 14:05, Xavier Saint-Mleux wrote: >> Sebastian Walter wrote: >>> Hello Pauli, >>> On what kind of matrix did you observe such unstable behavior? >>> Were there repeated eigenvalues? >>> >> >> It happens to me a lot with complex covariance matrices (Hermitian). >> Here's a simple example that returns non-real eigenvalues for an >> Hermitian matrix: > > Uh, not to be difficult, but these values are not actually complex. > The complex component is within a floating-point epsilon of zero. The > only way to do better than this is to explicitly notice that the > matrix is Hermitian and branch to special-case code. And if the matrix > is only numerically Hermitian, i.e. values that should be equal differ > by a floating-point epsilon, even this won't help. this might help to get rid of complex noise numpy.real_if_close(a, tol=100) If complex input returns a real array if complex parts are close to zero. Josef > > Anne > >> >>>>> np.random.seed(0) >>>>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>>>> x = (x+x.T.conj())/2 # make it Hermitian >>>>> x == x.T.conj() # ensure it is Hermitian >> array([[ True, ?True, ?True], >> ? ? ? [ True, ?True, ?True], >> ? ? ? [ True, ?True, ?True]], dtype=bool) >>>>> np.linalg.eigvals(x) # returns complex values >> array([ 1.99062044 -4.98523579e-17j, ?0.18062978 -9.36952928e-19j, >> ? ? ? -0.23511915 -2.19606549e-17j]) >>>>> np.linalg.eigvalsh(x) # imag always zero >> array([-0.23511915+0.j, ?0.18062978+0.j, ?1.99062044+0.j]) >>>>> >> >> >> >> Xavier >> >> >> >>> Sebastian >>> >>> >>> >>> >>> >>> On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: >>> >>>> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >>>> >>>>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>>>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>>>> save this matrix in matlab format, load it in matlab, and use matlab's >>>>> eig function to succesfully decompose it with real eigenvalues, so the >>>>> problem seems to be with scipy/numpy or their dependencies, not with my >>>>> matrix. Is this a known issue? And is there a good workaround? >>>>> >>>> Use the eigh function if you know your matrix is symmetric. >>>> >>>> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >>>> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >>>> check. >>>> >>>> A nonsymmetric eigensolver cannot know that your matrix is supposed to >>>> have real eigenvalues, so it's possible some of them explode to complex >>>> pairs because of minuscule numerical error. The imaginary part, however, >>>> is typically small. 
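A quick illustration of the np.real_if_close suggestion above, using the eigenvalues reported by np.linalg.eigvals earlier in the thread:

import numpy as np

# values as returned by np.linalg.eigvals in the example above
w = np.array([1.99062044 - 4.98523579e-17j,
              0.18062978 - 9.36952928e-19j,
              -0.23511915 - 2.19606549e-17j])

print(np.real_if_close(w, tol=100))   # imaginary parts below tol*eps: real array returned
print(np.real_if_close(w + 0.1j))     # imaginary part too large: input returned unchanged

The tol argument is in units of machine epsilon, so tol=100 drops imaginary components below roughly 2e-14 for float64.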
>>>> >>>> -- >>>> Pauli Virtanen >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From saintmlx at apstat.com Fri May 14 15:00:23 2010 From: saintmlx at apstat.com (Xavier Saint-Mleux) Date: Fri, 14 May 2010 15:00:23 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: <4BED90E2.4000707@apstat.com> Message-ID: <4BED9DC7.8020804@apstat.com> josef.pktd at gmail.com wrote: > On Fri, May 14, 2010 at 2:12 PM, Anne Archibald > wrote: > >> On 14 May 2010 14:05, Xavier Saint-Mleux wrote: >> >>> Sebastian Walter wrote: >>> >>>> Hello Pauli, >>>> On what kind of matrix did you observe such unstable behavior? >>>> Were there repeated eigenvalues? >>>> >>>> >>> It happens to me a lot with complex covariance matrices (Hermitian). >>> Here's a simple example that returns non-real eigenvalues for an >>> Hermitian matrix: >>> >> Uh, not to be difficult, but these values are not actually complex. >> The complex component is within a floating-point epsilon of zero. The >> only way to do better than this is to explicitly notice that the >> matrix is Hermitian and branch to special-case code. And if the matrix >> is only numerically Hermitian, i.e. values that should be equal differ >> by a floating-point epsilon, even this won't help. >> > > this might help to get rid of complex noise > > numpy.real_if_close(a, tol=100) > If complex input returns a real array if complex parts are close to zero. > Thanks, Josef! I was just dropping the imaginary part whenever I needed to, but 'real_if_close' looks like a much cleaner solution. Xavier > Josef > > > >> Anne >> >> >>>>>> np.random.seed(0) >>>>>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>>>>> x = (x+x.T.conj())/2 # make it Hermitian >>>>>> x == x.T.conj() # ensure it is Hermitian >>>>>> >>> array([[ True, True, True], >>> [ True, True, True], >>> [ True, True, True]], dtype=bool) >>> >>>>>> np.linalg.eigvals(x) # returns complex values >>>>>> >>> array([ 1.99062044 -4.98523579e-17j, 0.18062978 -9.36952928e-19j, >>> -0.23511915 -2.19606549e-17j]) >>> >>>>>> np.linalg.eigvalsh(x) # imag always zero >>>>>> >>> array([-0.23511915+0.j, 0.18062978+0.j, 1.99062044+0.j]) >>> >>> >>> Xavier >>> >>> >>> >>> >>>> Sebastian >>>> >>>> >>>> >>>> >>>> >>>> On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: >>>> >>>> >>>>> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >>>>> >>>>> >>>>>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>>>>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>>>>> save this matrix in matlab format, load it in matlab, and use matlab's >>>>>> eig function to succesfully decompose it with real eigenvalues, so the >>>>>> problem seems to be with scipy/numpy or their dependencies, not with my >>>>>> matrix. Is this a known issue? And is there a good workaround? >>>>>> >>>>>> >>>>> Use the eigh function if you know your matrix is symmetric. 
>>>>> >>>>> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >>>>> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >>>>> check. >>>>> >>>>> A nonsymmetric eigensolver cannot know that your matrix is supposed to >>>>> have real eigenvalues, so it's possible some of them explode to complex >>>>> pairs because of minuscule numerical error. The imaginary part, however, >>>>> is typically small. >>>>> >>>>> -- >>>>> Pauli Virtanen >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From briedel at wisc.edu Fri May 14 15:01:06 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Fri, 14 May 2010 14:01:06 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit Message-ID: Hey, I am fairly new Scipy and am trying to do a least square fit to a set of data. Currently, I am using following code: fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) errfunc = lambda p, x, y: (y-fitfunc(p,x)) pinit = [20,20.] out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) I am now trying to get the goodness of fit out of this data. I am sort of running into a brick wall because I found a lot of conflicting ways of how to calculate it. I am aware of the chisquare function in stats function, but the documentation seems a little confusing to me. Any help would be greatly appreciates. Thanks very much in advance. Cheers, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 14 15:51:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 May 2010 15:51:29 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel wrote: > Hey, > > I am fairly new Scipy and am trying to do a least square fit to a set of > data. Currently, I am using following code: > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > pinit = [20,20.] > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) > > I am now trying to get the goodness of fit out of this data. I am sort of > running into a brick wall because I found a lot of conflicting ways of how > to calculate it. For regression the usual is http://en.wikipedia.org/wiki/Coefficient_of_determination coefficient of determination is R^2 = 1 - {SS_{err} / SS_{tot}} Note your fitfunc is linear in parameters and can be better estimated by linear least squares, OLS. linear regression is handled in statsmodels and you can get lot's of statistics without worrying about the formulas. 
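As a concrete sketch of that formula (the data below is synthetic, generated just for illustration, with the same exponential model as in the original question):

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1]*np.exp(-x)
errfunc = lambda p, x, y: y - fitfunc(p, x)

# made-up data for illustration
x = np.linspace(0, 5, 50)
y = 20 + 15*np.exp(-x) + 0.5*np.random.randn(50)

p, ier = leastsq(errfunc, [20.0, 20.0], args=(x, y))

ss_err = (errfunc(p, x, y)**2).sum()
ss_tot = ((y - y.mean())**2).sum()
print(1 - ss_err/ss_tot)              # R^2, coefficient of determination

The same residual function that leastsq minimizes is reused to form SS_err, so no extra bookkeeping is needed.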
If you only have one slope parameter, then scipy.stats.linregress also works scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance of the parameter estimates. http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > I am aware of the chisquare function in stats function, but the > documentation seems a little confusing to me. Any help would be greatly > appreciates. chisquare and others like kolmogorov-smirnov are more for testing the goodness-of-fit of entire distributions, not for how well a curve or line fits the data. Josef > > Thanks very much in advance. > > Cheers, > > Ben > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From lpc at cmu.edu Fri May 14 16:50:11 2010 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Fri, 14 May 2010 16:50:11 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers Message-ID: <201005141650.17597.lpc@cmu.edu> On Wednesday, Sebastian Haase wrote: > this sounds exciting and I might find some time to try it out ... > BTW, the Python image-sig should not be a "PIL only" mailing list. So > (eventually) I feel, this issue could be brought up there, too. I have created a mailing list for python computer vision topics (things that are images but not PIL related): http://groups.google.com/group/pythonvision?pli=1 It is currently very low traffic since it just started (this is my first public announcement). * Btw, for the same sort of issues (opening 16-bit TIFFs in particular), I once wrote a wrapper around imagemagick's C++ image opening functions: http://github.com/luispedro/readmagick I works nicely on linux, but some people were trying to use it on Mac or Windows and got really stuck b/c they didn't know how to compile it and I couldn't help them, so I gave up on trying to make this more widely used. HTH -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From rajs2010 at gmail.com Sat May 15 00:05:31 2010 From: rajs2010 at gmail.com (Rajeev Singh) Date: Sat, 15 May 2010 09:35:31 +0530 Subject: [SciPy-User] weave newbie question Message-ID: Hi, The following program is not doing what I expect it to do a, b = 1, 2 code = \ ''' int temp; temp = a; a = b; b = temp; ''' weave.inline(code, ['a', 'b']) print a, b whereas the following is working fine a = np.arange(5) b = np.arange(5,10) print a, b code = \ ''' double temp; int i; for (i=0; i<5; i++) { temp = a[i]; a[i] = b[i]; b[i] = temp; } ''' weave.inline(code, ['a', 'b']) print a, b I think I am missing something very basic. Can someone help me out here? Best wishes, Rajeev -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat May 15 05:03:33 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 15 May 2010 18:03:33 +0900 Subject: [SciPy-User] compile on win64 for intel and msvc In-Reply-To: References: Message-ID: On Fri, May 14, 2010 at 7:56 PM, Paul Edwards wrote: > BTW If I try and use numscons instead I get: > > 8<--------------------------------------------------------------------- > D:\scratch\SS02\pkgs\build\numpy-1.4.1>..\Python-2.6.5\PCbuild\amd64\python.exe > setupscons.py scons -b --fcompiler=ifort --compiler=msvc config > Running from numpy source directory. 
> Forcing DISTUTILS_USE_SDK=1 > non-existing path in 'numpy\\core': 'code_generators\\numpy_api_order.txt' > non-existing path in 'numpy\\core': 'code_generators\\ufunc_api_order.txt' > non-existing path in 'numpy\\core': 'include/numpy\\numpyconfig.h.in' > running scons > Executing scons command (pkg is numpy.core): > D:\scratch\SS02\pkgs\build\Python-2.6.5\PCbuild\amd64\python.exe > "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site- > packages\numscons\scons-local\scons.py" -f numpy\core\SConstruct -I. > scons_tool_path="" src_dir="numpy\core" pkg_path="numpy\core" > pkg_name="numpy.core" log_lev > el=50 distutils_libdir="..\..\..\..\build\lib.win-amd64-2.6" > distutils_clibdir="..\..\..\..\build\temp.win-amd64-2.6" > distutils_install_prefix="D:\scratch\SS02\ > pkgs\build\Python-2.6.5\Lib\site-packages\numpy\core" cc_opt=msvc > cc_opt=msvc debug=0 f77_opt=ifort cxx_opt=msvc > include_bootstrap=..\..\..\..\numpy\core\includ > e bypass=1 import_env=0 silent=0 bootstrapping=1 > scons: Reading SConscript files ... > Mkdir("build\scons\numpy\core") > WindowsError: [Error 2] The system cannot find the file specified: > ?File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\numpy\core\SConstruct", line 2: > ? ?GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript') > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", > line 135: > ? ?build_dir = '$build_dir', src_dir = '$src_dir') > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", > line 553: > ? ?return apply(_SConscript, [self.fs,] + files, subst_kw) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", > line 262: > ? ?exec _file_ in call_stack[-1].globals > ?File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\build\scons\numpy\core\SConscript", > line 38: > ? ?env = GetNumpyEnvironment(ARGUMENTS) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", > line 23: > ? ?env = _get_numpy_env(args) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", > line 63: > ? ?initialize_tools(env) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", > line 186: > ? ?initialize_f77(env) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", > line 119: > ? ?env.Tool(name) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", > line 125: > ? ?get_numscons_toolpaths(self)) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Environment.py", > line 1704: > ? ?tool(self) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Tool\__init__.py", > line 181: > ? ?apply(self.generate, ( env, ) + args, kw) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", > line 44: > ? ?return generate_win32(env) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", > line 30: > ? ?pdir = product_dir_fc(versdict[vers[0]]) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\intel_common\win32.py", > line 77: > ? ?return _winreg.QueryValueEx(k, "ProductDir")[0] > error: Error while executing scons command. See above for more information. 
> If you think it is a problem in numscons, you can also try executing the scons > command with --log-level option for more detailed output of what numscons is > doing, for example --log-level=0; the lowest the level is, the more detailed > the output it. Where is VS 2008 installed ? Are you sure you have the 64 bits SDK (the free version does not have it AFAIK) ? I should update numscons scons copy to a more recent version, but I don't have time to work on numscons ATM. cheers, David From 3njoywind at gmail.com Sat May 15 05:25:37 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sat, 15 May 2010 17:25:37 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous Message-ID: Traceback (most recent call last): File "D:\Yt.py", line 31, in r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 266, in leastsq m = check_func(func,x0,args,n)[0] File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 12, in check_func res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) File "D:\Yt.py", line 26, in residuals return y - Yt(x, p) File "D:\Yt.py", line 20, in Yt for i in range(0, Et(x)): File "D:\Yt.py", line 11, in Et if t == 1995: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() --------------------------------------------------------------------------------------------------------- When running the following code: -------------------------------------------------------------------------------------------- from scipy.optimize import leastsq import numpy as np def Iv(t): if t == 1995: return t + 2 else: return t def Et(t): if t == 1995: return t + 2 else: return t def Yt(x, p): a, pa = p sum = 0 for i in range(0, Et(x)): v = x - et + i sum += a*(1+p)**(v)*Iv(v) return sum def residuals(p, y, x): return y - Yt(x, p) T = np.array([1995,1996,1997,1998,1999]) Y = np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) A, Pa = r[0] print "A=",A,"Pa=",Pa ---------------------------------------------------------------------------------------------- I know the error occurs when I compare t like: "if t == 1995",but I have no idea how to handle it correctly. Any help would be greatly appreciated. Zhe Wang -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 15 07:02:40 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 May 2010 07:02:40 -0400 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: > Traceback (most recent call last): > ??File "D:\Yt.py", line 31, in > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 266, > in leastsq > ?? ?m = check_func(func,x0,args,n)[0] > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 12, > in check_func > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > ??File "D:\Yt.py", line 26, in residuals > ?? ?return y - Yt(x, p) > ??File "D:\Yt.py", line 20, in Yt > ?? ?for i in range(0, Et(x)): > ??File "D:\Yt.py", line 11, in Et > ?? 
?if t == 1995: > ValueError: The truth value of an array with more than one element is > ambiguous. Use a.any() or a.all() > --------------------------------------------------------------------------------------------------------- > > When running the following code: > > -------------------------------------------------------------------------------------------- > > from scipy.optimize import leastsq > import numpy as np > > def Iv(t): > if t == 1995: > return t + 2 > else: > return t > > def Et(t): > if t == 1995: > return t + 2 > else: > return t > > def Yt(x, p): > a, pa = p > sum = 0 > > for i in range(0, Et(x)): > v = x - et + i > sum += a*(1+p)**(v)*Iv(v) > return sum > > def residuals(p, y, x): > return y - Yt(x, p) > > T = np.array([1995,1996,1997,1998,1999]) > Y = > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > A, Pa = r[0] > print "A=",A,"Pa=",Pa > > ---------------------------------------------------------------------------------------------- > > I know the error occurs when I compare t like: "if t == 1995",but I have no > idea how to handle it correctly. try the vectorized version of a conditional assignment, e.g. np.where(t == 1995, t, t+2) I didn't read enough of your example, to tell whether your Yt loop can be vectorized with a single sum, but I guess so. optimize leastsq expects an array, so residuals (and Yt) need to return an array not a single value, maybe np.cusum and conditional or data dependent slicing/indexing works Josef > > Any help would be greatly appreciated. > > Zhe Wang > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From 3njoywind at gmail.com Sat May 15 08:49:06 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sat, 15 May 2010 20:49:06 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: Josef: Thanks for your reply:) Actually I want to fit this equation: Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be calculated by e() and I(). I rewrote my code like this: ---------------------------------------------------------------------------------------------- from scipy.optimize import leastsq import numpy as np def Iv(t): return 4 def Yt(x, et): a, pa = x sum = np.array([0,0,0,0,0]) for i in range(0, len(et)): for j in range(0, et[i]): v = T[i] - et[i] + j sum[i] += a*(1+pa)**(v)*Iv(v) return sum - Y T = np.array([1995,1996,1997,1998,1999]) Y = np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) E = np.array([10,11,12,13,14]) r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) A, Pa = r[0] print "A=",A,"Pa=",Pa ---------------------------------------------------------------------------------------------- the output is: A= 1.0 Pa = 0.0 ---------------------------------------------------------------------------------------------- I don't think it is correct. Hope for your guidence. 
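For readers following the thread, a direct NumPy transcription of that sum might look like the sketch below; e_func and I_func are hypothetical stand-ins for the poster's e() and I(), which are not shown here, and the constant I(v) = 4 mirrors the test value used later in the thread:

import numpy as np

def Y_model(t, a, p, e_func, I_func):
    # Y(t) = sum over v = t - e(t), ..., t of a*(1+p)**v * I(v)
    v = np.arange(t - e_func(t), t + 1)
    return np.sum(a * (1.0 + p)**v * I_func(v))

# illustrative stand-ins only
e_func = lambda t: 10
I_func = lambda v: 4.0

print(Y_model(1995, 1.0, 0.01, e_func, I_func))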
On Sat, May 15, 2010 at 7:02 PM, wrote: > On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > Traceback (most recent call last): > > File "D:\Yt.py", line 31, in > > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > 266, > > in leastsq > > m = check_func(func,x0,args,n)[0] > > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > 12, > > in check_func > > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > > File "D:\Yt.py", line 26, in residuals > > return y - Yt(x, p) > > File "D:\Yt.py", line 20, in Yt > > for i in range(0, Et(x)): > > File "D:\Yt.py", line 11, in Et > > if t == 1995: > > ValueError: The truth value of an array with more than one element is > > ambiguous. Use a.any() or a.all() > > > --------------------------------------------------------------------------------------------------------- > > > > When running the following code: > > > > > -------------------------------------------------------------------------------------------- > > > > from scipy.optimize import leastsq > > import numpy as np > > > > def Iv(t): > > if t == 1995: > > return t + 2 > > else: > > return t > > > > def Et(t): > > if t == 1995: > > return t + 2 > > else: > > return t > > > > def Yt(x, p): > > a, pa = p > > sum = 0 > > > > for i in range(0, Et(x)): > > v = x - et + i > > sum += a*(1+p)**(v)*Iv(v) > > return sum > > > > def residuals(p, y, x): > > return y - Yt(x, p) > > > > T = np.array([1995,1996,1997,1998,1999]) > > Y = > > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > > > > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > > A, Pa = r[0] > > print "A=",A,"Pa=",Pa > > > > > ---------------------------------------------------------------------------------------------- > > > > I know the error occurs when I compare t like: "if t == 1995",but I have > no > > idea how to handle it correctly. > > try the vectorized version of a conditional assignment, e.g. > np.where(t == 1995, t, t+2) > > I didn't read enough of your example, to tell whether your Yt loop can > be vectorized with a single sum, but I guess so. > > optimize leastsq expects an array, so residuals (and Yt) need to > return an array not a single value, maybe np.cusum and conditional or > data dependent slicing/indexing works > > Josef > > > > > > Any help would be greatly appreciated. > > > > Zhe Wang > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 15 09:08:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 May 2010 09:08:22 -0400 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: > Josef: > Thanks for your reply:) > Actually I want to fit this equation: > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > calculated by e() and I(). 
Do you always have a fixed start date as in your example >>> T = np.array([1995,1996,1997,1998,1999]) >>> E = np.array([10,11,12,13,14]) >>> T-E array([1985, 1985, 1985, 1985, 1985]) so that always v =range(T0, T+1) with fixed T0=1985 this would make it easier to work forwards than backwards, e.g. something like v = np.arange(...) Y = np.cusum((a*(1+p)**v)*I(v)) Josef > I rewrote my code like this: > ---------------------------------------------------------------------------------------------- > from scipy.optimize import leastsq > import numpy as np > def Iv(t): > ?? ?return 4 > def Yt(x, et): > ?? ?a, pa = x > ?? ?sum = np.array([0,0,0,0,0]) > ?? ?for i in range(0, len(et)): > ?? ? ? ?for j in range(0, et[i]): > ?? ? ? ? ? ?v = T[i] - et[i] + j > ?? ? ? ? ? ?sum[i] += a*(1+pa)**(v)*Iv(v) > ?? ?return sum - Y > T = np.array([1995,1996,1997,1998,1999]) > Y = > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > E = np.array([10,11,12,13,14]) > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > A, Pa = r[0] > print "A=",A,"Pa=",Pa > ---------------------------------------------------------------------------------------------- > the output is: > A= 1.0 Pa = 0.0 > ---------------------------------------------------------------------------------------------- > I don't think it is correct. Hope for your guidence. > On Sat, May 15, 2010 at 7:02 PM, wrote: >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> > Traceback (most recent call last): >> > ??File "D:\Yt.py", line 31, in >> > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line >> > 266, >> > in leastsq >> > ?? ?m = check_func(func,x0,args,n)[0] >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line >> > 12, >> > in check_func >> > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) >> > ??File "D:\Yt.py", line 26, in residuals >> > ?? ?return y - Yt(x, p) >> > ??File "D:\Yt.py", line 20, in Yt >> > ?? ?for i in range(0, Et(x)): >> > ??File "D:\Yt.py", line 11, in Et >> > ?? ?if t == 1995: >> > ValueError: The truth value of an array with more than one element is >> > ambiguous. Use a.any() or a.all() >> > >> > --------------------------------------------------------------------------------------------------------- >> > >> > When running the following code: >> > >> > >> > -------------------------------------------------------------------------------------------- >> > >> > from scipy.optimize import leastsq >> > import numpy as np >> > >> > def Iv(t): >> > ? ? if t == 1995: >> > ? ? ? ? return t + 2 >> > ? ? else: >> > ? ? ? ? return t >> > >> > def Et(t): >> > ? ? if t == 1995: >> > ? ? ? ? return t + 2 >> > ? ? else: >> > ? ? ? ? return t >> > >> > def Yt(x, p): >> > ? ? a, pa = p >> > ? ? sum = 0 >> > >> > ? ? for i in range(0, Et(x)): >> > ? ? ? ? v = x - et + i >> > ? ? ? ? sum += a*(1+p)**(v)*Iv(v) >> > ? ? return sum >> > >> > def residuals(p, y, x): >> > ? ? 
return y - Yt(x, p) >> > >> > T = np.array([1995,1996,1997,1998,1999]) >> > Y = >> > >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> > >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> > A, Pa = r[0] >> > print "A=",A,"Pa=",Pa >> > >> > >> > ---------------------------------------------------------------------------------------------- >> > >> > I know the error occurs when I compare t like: "if t == 1995",but I have >> > no >> > idea how to handle it correctly. >> >> try the vectorized version of a conditional assignment, e.g. >> np.where(t == 1995, t, t+2) >> >> I didn't read enough of your example, to tell whether your Yt loop can >> be vectorized with a single sum, but I guess so. >> >> optimize leastsq expects an array, so residuals (and Yt) need to >> return an array not a single value, maybe np.cusum and conditional or >> data dependent slicing/indexing works >> >> Josef >> >> >> > >> > Any help would be greatly appreciated. >> > >> > Zhe Wang >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From 3njoywind at gmail.com Sat May 15 11:06:04 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sat, 15 May 2010 23:06:04 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: Josef: Thanks, my example is just for test, it is not always have a fixed start, e.g >>> T = np.array([1995,1996,1997,1998,1999]) >>> E = np.array([14,12,11,15,12]) >>> T-E array([1981, 1984, 1986, 1983, 1987]) so, if I define the function: def func(x): #.... v = np.arange(...) Y = np.cumsum((a*(1+p)**v)*I(v)) return Y when I call leastsq(func, [1,0]), v should change as the element of T change, e.g. when t(one element of T) is 1995?v = np.arange(1981, 1995) when t is 1996, v = np.arange(1984, 1996) ...... this troubles me so much. Zhe Wang On Sat, May 15, 2010 at 9:08 PM, wrote: > On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > Josef: > > Thanks for your reply:) > > Actually I want to fit this equation: > > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > > calculated by e() and I(). > > Do you always have a fixed start date as in your example > > >>> T = np.array([1995,1996,1997,1998,1999]) > >>> E = np.array([10,11,12,13,14]) > >>> T-E > array([1985, 1985, 1985, 1985, 1985]) > > so that always v =range(T0, T+1) with fixed T0=1985 > > this would make it easier to work forwards than backwards, e.g. something > like > v = np.arange(...) 
> Y = np.cusum((a*(1+p)**v)*I(v)) > > Josef > > > > > > I rewrote my code like this: > > > ---------------------------------------------------------------------------------------------- > > from scipy.optimize import leastsq > > import numpy as np > > def Iv(t): > > return 4 > > def Yt(x, et): > > a, pa = x > > sum = np.array([0,0,0,0,0]) > > for i in range(0, len(et)): > > for j in range(0, et[i]): > > v = T[i] - et[i] + j > > sum[i] += a*(1+pa)**(v)*Iv(v) > > return sum - Y > > T = np.array([1995,1996,1997,1998,1999]) > > Y = > > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > > E = np.array([10,11,12,13,14]) > > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > > A, Pa = r[0] > > print "A=",A,"Pa=",Pa > > > ---------------------------------------------------------------------------------------------- > > the output is: > > A= 1.0 Pa = 0.0 > > > ---------------------------------------------------------------------------------------------- > > I don't think it is correct. Hope for your guidence. > > On Sat, May 15, 2010 at 7:02 PM, wrote: > >> > >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: > >> > Traceback (most recent call last): > >> > File "D:\Yt.py", line 31, in > >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > >> > 266, > >> > in leastsq > >> > m = check_func(func,x0,args,n)[0] > >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > >> > 12, > >> > in check_func > >> > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > >> > File "D:\Yt.py", line 26, in residuals > >> > return y - Yt(x, p) > >> > File "D:\Yt.py", line 20, in Yt > >> > for i in range(0, Et(x)): > >> > File "D:\Yt.py", line 11, in Et > >> > if t == 1995: > >> > ValueError: The truth value of an array with more than one element is > >> > ambiguous. Use a.any() or a.all() > >> > > >> > > --------------------------------------------------------------------------------------------------------- > >> > > >> > When running the following code: > >> > > >> > > >> > > -------------------------------------------------------------------------------------------- > >> > > >> > from scipy.optimize import leastsq > >> > import numpy as np > >> > > >> > def Iv(t): > >> > if t == 1995: > >> > return t + 2 > >> > else: > >> > return t > >> > > >> > def Et(t): > >> > if t == 1995: > >> > return t + 2 > >> > else: > >> > return t > >> > > >> > def Yt(x, p): > >> > a, pa = p > >> > sum = 0 > >> > > >> > for i in range(0, Et(x)): > >> > v = x - et + i > >> > sum += a*(1+p)**(v)*Iv(v) > >> > return sum > >> > > >> > def residuals(p, y, x): > >> > return y - Yt(x, p) > >> > > >> > T = np.array([1995,1996,1997,1998,1999]) > >> > Y = > >> > > >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> > > >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> > A, Pa = r[0] > >> > print "A=",A,"Pa=",Pa > >> > > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > > >> > I know the error occurs when I compare t like: "if t == 1995",but I > have > >> > no > >> > idea how to handle it correctly. > >> > >> try the vectorized version of a conditional assignment, e.g. > >> np.where(t == 1995, t, t+2) > >> > >> I didn't read enough of your example, to tell whether your Yt loop can > >> be vectorized with a single sum, but I guess so. 
> >> > >> optimize leastsq expects an array, so residuals (and Yt) need to > >> return an array not a single value, maybe np.cusum and conditional or > >> data dependent slicing/indexing works > >> > >> Josef > >> > >> > >> > > >> > Any help would be greatly appreciated. > >> > > >> > Zhe Wang > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 15 12:34:33 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 May 2010 12:34:33 -0400 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: > Josef: > Thanks, > my example is just for test, it is not always have a fixed start, e.g >>>> T = np.array([1995,1996,1997,1998,1999]) >>>> E = np.array([14,12,11,15,12]) >>>> T-E > array([1981, 1984, 1986, 1983, 1987]) > so, if I define the function: > def func(x): > ?? ?#.... > ?? ?v = np.arange(...) > ?? ?Y = np.cumsum((a*(1+p)**v)*I(v)) > ?? ?return Y > when I call leastsq(func, [1,0]), v should change as the element of T > change, e.g. > when t(one element of T) is 1995?v = np.arange(1981, 1995) > when t is 1996, v = np.arange(1984, 1996) are you sure v is supposed to be calender years and not number of years accumulated? i.e (1+p)**1984 or (1+p)**(1996-1984) The most efficient would be to use the formula for the sum of a finite geometric series, which would also avoid the sum. Josef > ...... > this troubles me so much. > Zhe Wang > > > On Sat, May 15, 2010 at 9:08 PM, wrote: >> >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> > Josef: >> > Thanks for your reply:) >> > Actually I want to fit this equation: >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] >> > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be >> > calculated by e() and I(). >> >> Do you always have a fixed start date as in your example >> >> >>> T = np.array([1995,1996,1997,1998,1999]) >> >>> E = np.array([10,11,12,13,14]) >> >>> T-E >> array([1985, 1985, 1985, 1985, 1985]) >> >> so that always v =range(T0, T+1) ? ? with fixed T0=1985 >> >> this would make it easier to work forwards than backwards, e.g. something >> like >> v = np.arange(...) >> Y = np.cusum((a*(1+p)**v)*I(v)) >> >> Josef >> >> >> >> >> > I rewrote my code like this: >> > >> > ---------------------------------------------------------------------------------------------- >> > from scipy.optimize import leastsq >> > import numpy as np >> > def Iv(t): >> > ?? ?return 4 >> > def Yt(x, et): >> > ?? ?a, pa = x >> > ?? ?sum = np.array([0,0,0,0,0]) >> > ?? ?for i in range(0, len(et)): >> > ?? ? ? ?for j in range(0, et[i]): >> > ?? ? ? ? ? ?v = T[i] - et[i] + j >> > ?? ? ? ? ? 
?sum[i] += a*(1+pa)**(v)*Iv(v) >> > ?? ?return sum - Y >> > T = np.array([1995,1996,1997,1998,1999]) >> > Y = >> > >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> > E = np.array([10,11,12,13,14]) >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) >> > A, Pa = r[0] >> > print "A=",A,"Pa=",Pa >> > >> > ---------------------------------------------------------------------------------------------- >> > the output is: >> > A= 1.0 Pa = 0.0 >> > >> > ---------------------------------------------------------------------------------------------- >> > I don't think it is correct. Hope for your guidence. >> > On Sat, May 15, 2010 at 7:02 PM, wrote: >> >> >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> >> > Traceback (most recent call last): >> >> > ??File "D:\Yt.py", line 31, in >> >> > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> > line >> >> > 266, >> >> > in leastsq >> >> > ?? ?m = check_func(func,x0,args,n)[0] >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> > line >> >> > 12, >> >> > in check_func >> >> > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) >> >> > ??File "D:\Yt.py", line 26, in residuals >> >> > ?? ?return y - Yt(x, p) >> >> > ??File "D:\Yt.py", line 20, in Yt >> >> > ?? ?for i in range(0, Et(x)): >> >> > ??File "D:\Yt.py", line 11, in Et >> >> > ?? ?if t == 1995: >> >> > ValueError: The truth value of an array with more than one element is >> >> > ambiguous. Use a.any() or a.all() >> >> > >> >> > >> >> > --------------------------------------------------------------------------------------------------------- >> >> > >> >> > When running the following code: >> >> > >> >> > >> >> > >> >> > -------------------------------------------------------------------------------------------- >> >> > >> >> > from scipy.optimize import leastsq >> >> > import numpy as np >> >> > >> >> > def Iv(t): >> >> > ? ? if t == 1995: >> >> > ? ? ? ? return t + 2 >> >> > ? ? else: >> >> > ? ? ? ? return t >> >> > >> >> > def Et(t): >> >> > ? ? if t == 1995: >> >> > ? ? ? ? return t + 2 >> >> > ? ? else: >> >> > ? ? ? ? return t >> >> > >> >> > def Yt(x, p): >> >> > ? ? a, pa = p >> >> > ? ? sum = 0 >> >> > >> >> > ? ? for i in range(0, Et(x)): >> >> > ? ? ? ? v = x - et + i >> >> > ? ? ? ? sum += a*(1+p)**(v)*Iv(v) >> >> > ? ? return sum >> >> > >> >> > def residuals(p, y, x): >> >> > ? ? return y - Yt(x, p) >> >> > >> >> > T = np.array([1995,1996,1997,1998,1999]) >> >> > Y = >> >> > >> >> > >> >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> >> > >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> > A, Pa = r[0] >> >> > print "A=",A,"Pa=",Pa >> >> > >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > >> >> > I know the error occurs when I compare t like: "if t == 1995",but I >> >> > have >> >> > no >> >> > idea how to handle it correctly. >> >> >> >> try the vectorized version of a conditional assignment, e.g. >> >> np.where(t == 1995, t, t+2) >> >> >> >> I didn't read enough of your example, to tell whether your Yt loop can >> >> be vectorized with a single sum, but I guess so. 
>> >> >> >> optimize leastsq expects an array, so residuals (and Yt) need to >> >> return an array not a single value, maybe np.cusum and conditional or >> >> data dependent slicing/indexing works >> >> >> >> Josef >> >> >> >> >> >> > >> >> > Any help would be greatly appreciated. >> >> > >> >> > Zhe Wang >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From alan at ajackson.org Sat May 15 20:09:02 2010 From: alan at ajackson.org (alan at ajackson.org) Date: Sat, 15 May 2010 19:09:02 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEBE4D0.7090003@comcast.net> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> <4BEB0DBD.4080000@wartburg.edu> <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> <4BEBE4D0.7090003@comcast.net> Message-ID: <20100515190902.25a3ed70@ajackson.org> A few years ago I was speaking with a colleague - a brilliant gentleman in his late seventies who had done the blast wave modeling for the Bikini Atoll tests on a slide rule - and I mentioned that for his finite difference elastic wave equation modeling code he must use a lot of double precision arithmetic. He looked very hurt, and replied "Oh no, single precision is all you need if you know what you're doing". >"Back in the day," double precision was MUCH slower than single precision arithmetic, so Fortran used single precision by default. You used double precision only when absolutely necessary, and you had to call it explicitly. Fortran even had separate "built-in" functions for single and double - eg., sin, dsin, log, dlog, etc. - that the user called explicitly. (I haven't used Fortran for 20 years, but I think modern Fortran recognizes the type of argument, now.) > >Single and double precision are about the same speed on modern processors, and double is sometimes even faster than single on 64 bit processors (because of the ancillary data shuffling, I think). However, Fortran is dragging nearly 60 years of history along with it, so I'm not surprised that it defaults to single precision. > >john > > > >On 5/12/2010 6:05 PM, Gideon wrote:Yea, that worked for me on my OS X machine. Thanks so much. > >To be honest, in the 10 years I've been doing floating point >calculations for ODEs and PDEs, I don't think I've ever used single >precision arithmetic. So I am surprised it doesn't default to double >precision. Obviously, different people have different needs. > >On May 12, 4:21 pm, Neil Martinsen-Burrell wrote: > On 2010-05-12 14:58, Gideon wrote: > > Tried both, but I got the same error in both cases. 
> >If you want doubles in your file, you have to request them: > >F.writeReals(x, prec='d') > >makes everything work for me (Ubuntu 10.04, python 2.6.5, gfortran >4.4.3). Note that looking at the size of the file that you would expect >to have for the data you are expecting to read would have demonstrated >this: 10 doubles at eight bytes per double plus two 4-byte integers >would have given you 88 bytes for the file, rather than the 48 that were >being produced. > >I use fortranfile most heavily for reading files, rather than writing >them, so I may have missed this opportunity, but do you think that the >precision used in writeReals should be auto-detected from the data type >that it is passed. That is, would > >def writeReals(self, reals, prec=None): > if prec is None: > prec = reals.dtype.char > ... > >be better for your use? That would have made your original code work as >written. > >-Neil >_______________________________________________ >SciPy-User mailing list >SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > >-- >You received this message because you are subscribed to the Google Groups "SciPy-user" group. >To post to this group, send email to scipy-user at googlegroups.com. >To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. >For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. > _______________________________________________ >SciPy-User mailing list >SciPy-User at scipy.org >http://mail.scipy.org/mailman/listinfo/scipy-user > > >No virus found in this incoming message. >Checked by AVG - www.avg.com >Version: 9.0.819 / Virus Database: 271.1.1/2869 - Release Date: 05/12/10 02:26:00 > > -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From 3njoywind at gmail.com Sat May 15 21:58:37 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sun, 16 May 2010 09:58:37 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: v is calender years e.g (1+p)**1984 May be you could help me with this simple example: I have a function defined like this(a0 is a parameter): ------------------------------------ def f(x): if x > 4: return x + a0 else: return x - a0 ------------------------------------ I generated the data when I let a0=2 : -------------------------------------- X = np.array([1,2,3,4,5,6,7,8,9]) Y = np.array([-1, 0, 1, 2,7,8,9,10,11]) -------------------------------------- How can I use leastsq() to fit f(x)? I have wrote some code like this: -------------------------------------- def func(p, x, y): a = p sum = np.array([0,0,0,0,0,0,0,0,0]) for i in range(0, len(x)): if x[i] > 4: sum[i] = x[i] + a else: sum[i] = x[i] - a return sum - y r = leastsq(func,1,args=(X,Y)) print r[0] -------------------------------------- the output is 1.0000000149, much different like 2. I doubt whether leastsq() is suitable for this kind of problem. Maybe I should try another way? 
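One detail worth checking in the snippet above: sum is created from a list of Python ints, so it is an integer array, and the float values x[i] + a are truncated when assigned into it; the residual then barely changes for small changes in a, which can leave leastsq stuck essentially at the starting guess (1 plus one finite-difference step is about 1.0000000149). A sketch of the same fit with a float residual, using np.where for the conditional, should recover a value close to 2:

import numpy as np
from scipy.optimize import leastsq

def func(p, x, y):
    a = p[0]
    model = np.where(x > 4, x + a, x - a)   # float result, vectorized conditional
    return model - y

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
Y = np.array([-1, 0, 1, 2, 7, 8, 9, 10, 11], dtype=float)

r = leastsq(func, [1.0], args=(X, Y))
print(r[0])                               # approximately [2.]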
Zhe On Sun, May 16, 2010 at 12:34 AM, wrote: > On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > Josef: > > Thanks, > > my example is just for test, it is not always have a fixed start, e.g > >>>> T = np.array([1995,1996,1997,1998,1999]) > >>>> E = np.array([14,12,11,15,12]) > >>>> T-E > > array([1981, 1984, 1986, 1983, 1987]) > > so, if I define the function: > > def func(x): > > #.... > > v = np.arange(...) > > Y = np.cumsum((a*(1+p)**v)*I(v)) > > return Y > > when I call leastsq(func, [1,0]), v should change as the element of T > > change, e.g. > > when t(one element of T) is 1995?v = np.arange(1981, 1995) > > when t is 1996, v = np.arange(1984, 1996) > > are you sure v is supposed to be calender years and not number of > years accumulated? i.e > > (1+p)**1984 > or > (1+p)**(1996-1984) > > The most efficient would be to use the formula for the sum of a finite > geometric series, which would also avoid the sum. > > Josef > > > > ...... > > this troubles me so much. > > Zhe Wang > > > > > > On Sat, May 15, 2010 at 9:08 PM, wrote: > >> > >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: > >> > Josef: > >> > Thanks for your reply:) > >> > Actually I want to fit this equation: > >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > >> > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > >> > calculated by e() and I(). > >> > >> Do you always have a fixed start date as in your example > >> > >> >>> T = np.array([1995,1996,1997,1998,1999]) > >> >>> E = np.array([10,11,12,13,14]) > >> >>> T-E > >> array([1985, 1985, 1985, 1985, 1985]) > >> > >> so that always v =range(T0, T+1) with fixed T0=1985 > >> > >> this would make it easier to work forwards than backwards, e.g. > something > >> like > >> v = np.arange(...) > >> Y = np.cusum((a*(1+p)**v)*I(v)) > >> > >> Josef > >> > >> > >> > >> > >> > I rewrote my code like this: > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > from scipy.optimize import leastsq > >> > import numpy as np > >> > def Iv(t): > >> > return 4 > >> > def Yt(x, et): > >> > a, pa = x > >> > sum = np.array([0,0,0,0,0]) > >> > for i in range(0, len(et)): > >> > for j in range(0, et[i]): > >> > v = T[i] - et[i] + j > >> > sum[i] += a*(1+pa)**(v)*Iv(v) > >> > return sum - Y > >> > T = np.array([1995,1996,1997,1998,1999]) > >> > Y = > >> > > >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> > E = np.array([10,11,12,13,14]) > >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > >> > A, Pa = r[0] > >> > print "A=",A,"Pa=",Pa > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > the output is: > >> > A= 1.0 Pa = 0.0 > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > I don't think it is correct. Hope for your guidence. 
> >> > On Sat, May 15, 2010 at 7:02 PM, wrote: > >> >> > >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> > wrote: > >> >> > Traceback (most recent call last): > >> >> > File "D:\Yt.py", line 31, in > >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> > line > >> >> > 266, > >> >> > in leastsq > >> >> > m = check_func(func,x0,args,n)[0] > >> >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> > line > >> >> > 12, > >> >> > in check_func > >> >> > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > >> >> > File "D:\Yt.py", line 26, in residuals > >> >> > return y - Yt(x, p) > >> >> > File "D:\Yt.py", line 20, in Yt > >> >> > for i in range(0, Et(x)): > >> >> > File "D:\Yt.py", line 11, in Et > >> >> > if t == 1995: > >> >> > ValueError: The truth value of an array with more than one element > is > >> >> > ambiguous. Use a.any() or a.all() > >> >> > > >> >> > > >> >> > > --------------------------------------------------------------------------------------------------------- > >> >> > > >> >> > When running the following code: > >> >> > > >> >> > > >> >> > > >> >> > > -------------------------------------------------------------------------------------------- > >> >> > > >> >> > from scipy.optimize import leastsq > >> >> > import numpy as np > >> >> > > >> >> > def Iv(t): > >> >> > if t == 1995: > >> >> > return t + 2 > >> >> > else: > >> >> > return t > >> >> > > >> >> > def Et(t): > >> >> > if t == 1995: > >> >> > return t + 2 > >> >> > else: > >> >> > return t > >> >> > > >> >> > def Yt(x, p): > >> >> > a, pa = p > >> >> > sum = 0 > >> >> > > >> >> > for i in range(0, Et(x)): > >> >> > v = x - et + i > >> >> > sum += a*(1+p)**(v)*Iv(v) > >> >> > return sum > >> >> > > >> >> > def residuals(p, y, x): > >> >> > return y - Yt(x, p) > >> >> > > >> >> > T = np.array([1995,1996,1997,1998,1999]) > >> >> > Y = > >> >> > > >> >> > > >> >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> >> > > >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> > A, Pa = r[0] > >> >> > print "A=",A,"Pa=",Pa > >> >> > > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > > >> >> > I know the error occurs when I compare t like: "if t == 1995",but I > >> >> > have > >> >> > no > >> >> > idea how to handle it correctly. > >> >> > >> >> try the vectorized version of a conditional assignment, e.g. > >> >> np.where(t == 1995, t, t+2) > >> >> > >> >> I didn't read enough of your example, to tell whether your Yt loop > can > >> >> be vectorized with a single sum, but I guess so. > >> >> > >> >> optimize leastsq expects an array, so residuals (and Yt) need to > >> >> return an array not a single value, maybe np.cusum and conditional or > >> >> data dependent slicing/indexing works > >> >> > >> >> Josef > >> >> > >> >> > >> >> > > >> >> > Any help would be greatly appreciated. 
> >> >> > > >> >> > Zhe Wang > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Sat May 15 22:25:14 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 15 May 2010 22:25:14 -0400 Subject: [SciPy-User] optimize.leastsq In-Reply-To: References: Message-ID: <4BEF578A.6050004@american.edu> On 5/15/2010 9:58 PM, Zhe Wang wrote: > sum = np.array([0,0,0,0,0,0,0,0,0]) sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) hth, Alan Isaac From 3njoywind at gmail.com Sat May 15 22:37:34 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sun, 16 May 2010 10:37:34 +0800 Subject: [SciPy-User] optimize.leastsq In-Reply-To: <4BEF578A.6050004@american.edu> References: <4BEF578A.6050004@american.edu> Message-ID: Alan: Thanks, it works. lol I'll try it in my current work. Regards Zhe On Sun, May 16, 2010 at 10:25 AM, Alan G Isaac wrote: > On 5/15/2010 9:58 PM, Zhe Wang wrote: > > sum = np.array([0,0,0,0,0,0,0,0,0]) > > sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) > > hth, > Alan Isaac > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From briedel at wisc.edu Sun May 16 00:12:21 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sat, 15 May 2010 23:12:21 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Fri, May 14, 2010 at 14:51, wrote: > On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel wrote: > > Hey, > > > > I am fairly new Scipy and am trying to do a least square fit to a set of > > data. Currently, I am using following code: > > > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > > pinit = [20,20.] > > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) > > > > I am now trying to get the goodness of fit out of this data. I am sort of > > running into a brick wall because I found a lot of conflicting ways of > how > > to calculate it. > > For regression the usual is > http://en.wikipedia.org/wiki/Coefficient_of_determination > coefficient of determination is > > R^2 = 1 - {SS_{err} / SS_{tot}} > > Note your fitfunc is linear in parameters and can be better estimated > by linear least squares, OLS. 
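A minimal illustration of both points (illustrative sketch only; x and y below are made-up stand-ins for the tau and R4ctsdataselect arrays in the thread). Because exp(-x) enters the model only through its coefficient, y = b + a*exp(-x) is linear in (b, a), so it can be solved with plain linear least squares, and R^2 follows directly from the residuals:

import numpy as np

# made-up stand-ins for the data in the thread (tau, R4ctsdataselect)
x = np.linspace(0.0, 5.0, 50)
y = 20.0 + 15.0 * np.exp(-x) + 0.1 * np.random.randn(50)

# design matrix for y = b*1 + a*exp(-x); the model is linear in (b, a)
A = np.column_stack((np.ones_like(x), np.exp(-x)))
coef, resid, rank, sv = np.linalg.lstsq(A, y)    # coef = [b, a]

yhat = A.dot(coef)
ss_err = np.sum((y - yhat) ** 2)        # SS_err
ss_tot = np.sum((y - y.mean()) ** 2)    # SS_tot, mean-corrected (a constant term is in the model)
rsquared = 1.0 - ss_err / ss_tot
print coef, rsquared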
> linear regression is handled in statsmodels and you can get lot's of > statistics without worrying about the formulas. > If you only have one slope parameter, then scipy.stats.linregress also > works > > Thanks for the information. I am still note quite sure if this is what my boss wants because there should not be an average y value. > scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance > of the parameter estimates. > http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > I have been trying this out, but the fit just looks horrid compared to using leastsq method even though they call the same function according to the documentation. > > I am aware of the chisquare function in stats function, but the > > documentation seems a little confusing to me. Any help would be greatly > > appreciates. > > chisquare and others like kolmogorov-smirnov are more for testing the > goodness-of-fit of entire distributions, not for how well a curve or > line fits the data. > > That is what I thought, which brought up my confusion when I asked other people and they told me to use that > Josef > > > > > Thanks very much in advance. > > > > Cheers, > > > > Ben > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Sun May 16 04:39:29 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 16 May 2010 11:39:29 +0300 Subject: [SciPy-User] solving ODE by FuncDesigner with automatic differentiation Message-ID: Hi all, if anyone is interested, I have implemented possibility to model ODE in FuncDesigner and solve it, involving automatic differentiation. For examples and more details see http://openopt.org/FuncDesignerDoc#Solving_ODE Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun May 16 06:50:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 May 2010 06:50:29 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel wrote: > > > On Fri, May 14, 2010 at 14:51, wrote: >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel wrote: >> > Hey, >> > >> > I am fairly new Scipy and am trying to do a least square fit to a set of >> > data. Currently, I am using following code: >> > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> > pinit = [20,20.] >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) >> > >> > I am now trying to get the goodness of fit out of this data. I am sort >> > of >> > running into a brick wall because I found a lot of conflicting ways of >> > how >> > to calculate it. >> >> For regression the usual is >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> coefficient of determination is >> >> ? 
?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> Note your fitfunc is linear in parameters and can be better estimated >> by linear least squares, OLS. >> linear regression is handled in statsmodels and you can get lot's of >> statistics without worrying about the formulas. >> If you only have one slope parameter, then scipy.stats.linregress also >> works >> > > Thanks for the information. I am still note quite sure if this is what my > boss wants because there should not be an average y value. The definition of Rsquared is pretty uncontroversial with the y.mean() correction, if there is a constant in the regression (although I know mainly the linear case for this). If there is no constant in the regression, the definition or Rsquared is not clear/unambiguous, but usually used without mean correction of y. Josef > >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance >> of the parameter estimates. >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > > I have been trying this out, but the fit just looks horrid compared to using > leastsq method even though they call the same function according to the > documentation. > >> >> > I am aware of the chisquare function in stats function, but the >> > documentation seems a little confusing to me. Any help would be greatly >> > appreciates. >> >> chisquare and others like kolmogorov-smirnov are more for testing the >> goodness-of-fit of entire distributions, not for how well a curve or >> line fits the data. >> > > That is what I thought, which brought up my confusion when I asked other > people and they told me to use that > >> >> Josef >> >> > >> > Thanks very much in advance. >> > >> > Cheers, >> > >> > Ben >> > >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From tmp50 at ukr.net Sun May 16 08:57:35 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 16 May 2010 15:57:35 +0300 Subject: [SciPy-User] Isn't it a bug in scipy.integrate.odeint doc? Message-ID: hi all, I see the following lines in odeint doc/docstring http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html dy/dt = func(y,t0,...) func : callable(y, t0, ...) Computes the derivative of y at t0. Dfun : callable(y, t0, ...) Gradient (Jacobian) of func. shouldn't it be "t" instead of "t0" there? Let me also note, that some input variables are undocumented there. D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From baker.alexander at gmail.com Sun May 16 12:10:40 2010 From: baker.alexander at gmail.com (alexander baker) Date: Sun, 16 May 2010 17:10:40 +0100 Subject: [SciPy-User] python for physics Message-ID: 3 friends Physics friends of mine are looking for a starting point to learn scientific computing in Python relevant to applied Physics, does anyone have any suggestions, hints or event a deck of slides that could be useful? 
Alex Mobile: 07788 872118 Blog: www.alexfb.com -- All science is either physics or stamp collecting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Sun May 16 12:22:53 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Sun, 16 May 2010 18:22:53 +0200 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: you are using integer arrays... this should work: import numpy as np from scipy.optimize import leastsq X = np.array([1,2,3,4,5,6,7,8,9],dtype=float) Y = np.array([-1, 0, 1, 2,7,8,9,10,11],dtype=float) def func(p, x, y): a = p sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) for i in range(0, len(x)): if x[i] > 4: sum[i] = x[i] + a else: sum[i] = x[i] - a return sum - y r = leastsq(func,1,args=(X,Y)) print r[0] regards, Sebastian On Sun, May 16, 2010 at 3:58 AM, Zhe Wang <3njoywind at gmail.com> wrote: > v is calender years e.g > (1+p)**1984 > May be you could help me with this simple example: > I have a function defined like this(a0 is a parameter): > ------------------------------------ > def f(x): > ?? ?if x > 4: > ?? ? ? ?return x + a0 > ?? ?else: > ?? ? ? ?return x - a0 > ------------------------------------ > I generated the data when I let a0=2 : > -------------------------------------- > X = np.array([1,2,3,4,5,6,7,8,9]) > Y = np.array([-1, 0, 1, 2,7,8,9,10,11]) > -------------------------------------- > How can I use leastsq() to fit f(x)? I have wrote some code like this: > -------------------------------------- > def func(p, x, y): > ?? ?a = p > ?? ?sum = np.array([0,0,0,0,0,0,0,0,0]) > ?? ?for i in range(0, len(x)): > ?? ? ? ?if x[i] > 4: > ?? ? ? ? ? ?sum[i] = x[i] + a > ?? ? ? ?else: > ?? ? ? ? ? ?sum[i] = x[i] - a > ?? ?return sum - y > r = leastsq(func,1,args=(X,Y)) > print r[0] > -------------------------------------- > the output is 1.0000000149, much different like 2. > I doubt whether leastsq() is suitable for this kind of problem. Maybe I > should try another way? > Zhe > On Sun, May 16, 2010 at 12:34 AM, wrote: >> >> On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> > Josef: >> > Thanks, >> > my example is just for test, it is not always have a fixed start, e.g >> >>>> T = np.array([1995,1996,1997,1998,1999]) >> >>>> E = np.array([14,12,11,15,12]) >> >>>> T-E >> > array([1981, 1984, 1986, 1983, 1987]) >> > so, if I define the function: >> > def func(x): >> > ?? ?#.... >> > ?? ?v = np.arange(...) >> > ?? ?Y = np.cumsum((a*(1+p)**v)*I(v)) >> > ?? ?return Y >> > when I call leastsq(func, [1,0]), v should change as the element of T >> > change, e.g. >> > when t(one element of T) is 1995?v = np.arange(1981, 1995) >> > when t is 1996, v = np.arange(1984, 1996) >> >> are you sure v is supposed to be calender years and not number of >> years accumulated? ?i.e >> >> (1+p)**1984 >> or >> (1+p)**(1996-1984) >> >> The most efficient would be to use the formula for the sum of a finite >> geometric series, which would also avoid the sum. >> >> Josef >> >> >> > ...... >> > this troubles me so much. >> > Zhe Wang >> > >> > >> > On Sat, May 15, 2010 at 9:08 PM, wrote: >> >> >> >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> >> > Josef: >> >> > Thanks for your reply:) >> >> > Actually I want to fit this equation: >> >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] >> >> > I got {t} and {Y(t)} and a, p are parameters. 
e(t) and I(v) can be >> >> > calculated by e() and I(). >> >> >> >> Do you always have a fixed start date as in your example >> >> >> >> >>> T = np.array([1995,1996,1997,1998,1999]) >> >> >>> E = np.array([10,11,12,13,14]) >> >> >>> T-E >> >> array([1985, 1985, 1985, 1985, 1985]) >> >> >> >> so that always v =range(T0, T+1) ? ? with fixed T0=1985 >> >> >> >> this would make it easier to work forwards than backwards, e.g. >> >> something >> >> like >> >> v = np.arange(...) >> >> Y = np.cusum((a*(1+p)**v)*I(v)) >> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> > I rewrote my code like this: >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > from scipy.optimize import leastsq >> >> > import numpy as np >> >> > def Iv(t): >> >> > ?? ?return 4 >> >> > def Yt(x, et): >> >> > ?? ?a, pa = x >> >> > ?? ?sum = np.array([0,0,0,0,0]) >> >> > ?? ?for i in range(0, len(et)): >> >> > ?? ? ? ?for j in range(0, et[i]): >> >> > ?? ? ? ? ? ?v = T[i] - et[i] + j >> >> > ?? ? ? ? ? ?sum[i] += a*(1+pa)**(v)*Iv(v) >> >> > ?? ?return sum - Y >> >> > T = np.array([1995,1996,1997,1998,1999]) >> >> > Y = >> >> > >> >> > >> >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> >> > E = np.array([10,11,12,13,14]) >> >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) >> >> > A, Pa = r[0] >> >> > print "A=",A,"Pa=",Pa >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > the output is: >> >> > A= 1.0 Pa = 0.0 >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > I don't think it is correct. Hope for your guidence. >> >> > On Sat, May 15, 2010 at 7:02 PM, wrote: >> >> >> >> >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> >> >> >> wrote: >> >> >> > Traceback (most recent call last): >> >> >> > ??File "D:\Yt.py", line 31, in >> >> >> > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> >> > line >> >> >> > 266, >> >> >> > in leastsq >> >> >> > ?? ?m = check_func(func,x0,args,n)[0] >> >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> >> > line >> >> >> > 12, >> >> >> > in check_func >> >> >> > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) >> >> >> > ??File "D:\Yt.py", line 26, in residuals >> >> >> > ?? ?return y - Yt(x, p) >> >> >> > ??File "D:\Yt.py", line 20, in Yt >> >> >> > ?? ?for i in range(0, Et(x)): >> >> >> > ??File "D:\Yt.py", line 11, in Et >> >> >> > ?? ?if t == 1995: >> >> >> > ValueError: The truth value of an array with more than one element >> >> >> > is >> >> >> > ambiguous. Use a.any() or a.all() >> >> >> > >> >> >> > >> >> >> > >> >> >> > --------------------------------------------------------------------------------------------------------- >> >> >> > >> >> >> > When running the following code: >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -------------------------------------------------------------------------------------------- >> >> >> > >> >> >> > from scipy.optimize import leastsq >> >> >> > import numpy as np >> >> >> > >> >> >> > def Iv(t): >> >> >> > ? ? if t == 1995: >> >> >> > ? ? ? ? return t + 2 >> >> >> > ? ? else: >> >> >> > ? ? ? ? return t >> >> >> > >> >> >> > def Et(t): >> >> >> > ? ? if t == 1995: >> >> >> > ? ? ? ? return t + 2 >> >> >> > ? ? 
else: >> >> >> > ? ? ? ? return t >> >> >> > >> >> >> > def Yt(x, p): >> >> >> > ? ? a, pa = p >> >> >> > ? ? sum = 0 >> >> >> > >> >> >> > ? ? for i in range(0, Et(x)): >> >> >> > ? ? ? ? v = x - et + i >> >> >> > ? ? ? ? sum += a*(1+p)**(v)*Iv(v) >> >> >> > ? ? return sum >> >> >> > >> >> >> > def residuals(p, y, x): >> >> >> > ? ? return y - Yt(x, p) >> >> >> > >> >> >> > T = np.array([1995,1996,1997,1998,1999]) >> >> >> > Y = >> >> >> > >> >> >> > >> >> >> > >> >> >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> >> >> > >> >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> >> > A, Pa = r[0] >> >> >> > print "A=",A,"Pa=",Pa >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > ---------------------------------------------------------------------------------------------- >> >> >> > >> >> >> > I know the error occurs when I compare t like: "if t == 1995",but >> >> >> > I >> >> >> > have >> >> >> > no >> >> >> > idea how to handle it correctly. >> >> >> >> >> >> try the vectorized version of a conditional assignment, e.g. >> >> >> np.where(t == 1995, t, t+2) >> >> >> >> >> >> I didn't read enough of your example, to tell whether your Yt loop >> >> >> can >> >> >> be vectorized with a single sum, but I guess so. >> >> >> >> >> >> optimize leastsq expects an array, so residuals (and Yt) need to >> >> >> return an array not a single value, maybe np.cusum and conditional >> >> >> or >> >> >> data dependent slicing/indexing works >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> >> > >> >> >> > Any help would be greatly appreciated. >> >> >> > >> >> >> > Zhe Wang >> >> >> > >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From aisaac at american.edu Sun May 16 12:33:10 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sun, 16 May 2010 12:33:10 -0400 Subject: [SciPy-User] python for physics In-Reply-To: References: Message-ID: <4BF01E46.3020700@american.edu> On 5/16/2010 12:10 PM, alexander baker wrote: > Physics friends of mine are looking for a starting point to learn > scientific computing in Python relevant to applied Physics, does anyone > have any suggestions, hints or event a deck of slides that could be useful? 
http://pages.physics.cornell.edu/sethna/StatMech/ http://pages.physics.cornell.edu/sethna/StatMech/ComputerExercises/ http://pages.physics.cornell.edu/sethna/StatMech/ComputerExercises/PythonSoftware/ hth, Alan Isaac From hasslerjc at comcast.net Sun May 16 12:43:59 2010 From: hasslerjc at comcast.net (John Hassler) Date: Sun, 16 May 2010 12:43:59 -0400 Subject: [SciPy-User] python for physics In-Reply-To: References: Message-ID: <4BF020CF.9040406@comcast.net> An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun May 16 12:51:43 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 May 2010 18:51:43 +0200 Subject: [SciPy-User] python for physics In-Reply-To: References: Message-ID: <20100516165143.GF19278@phare.normalesup.org> On Sun, May 16, 2010 at 05:10:40PM +0100, alexander baker wrote: > 3 friends Physics friends of mine are looking for a starting point to > learn scientific computing in Python relevant to applied Physics, does > anyone have any suggestions, hints or event a deck of slides that could be > useful? This is not really physics-related, and is more oriented towards image analysis than Physics, and on top of that it is unfinished, and I have been shying from publishing on the net, but the notes of the courses I give can be found here: http://gael-varoquaux.info/python4science-2x1.pdf Also, see Fernando's py4science page, full of useful material: http://fperez.org/py4science/starter_kit.html Ga?l From d.l.goldsmith at gmail.com Sun May 16 14:55:06 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 16 May 2010 11:55:06 -0700 Subject: [SciPy-User] Isn't it a bug in scipy.integrate.odeint doc? In-Reply-To: References: Message-ID: 2010/5/16 Dmitrey > hi all, > I see the following lines in odeint doc/docstring > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html > > dy/dt = func(y,t0,...)*func* : callable(y, t0, ...) > > Computes the derivative of y at t0. > > *Dfun* : callable(y, t0, ...) > > Gradient (Jacobian) of func. > > shouldn't it be "t" instead of "t0" there? > Let me also note, that some input variables are undocumented there. > D. > Let me take this opportunity to note publicly: the scipy (as opposed to the numpy) docs are not in a very advanced state (and that's putting it politely). Soon (hopefully tomorrow), I will be issuing a formal announcement of the commencement of the 2010 Summer _SciPy_ Documentation Marathon. This will be a formal solicitation of volunteers to work on the SciPy documentation; I'm hoping everyone concerned with the overall quality of SciPy will help (whether or not you've participated in the past Marathons). But just for the record, for better or worse, we don't have a ticketing system for reporting and tracking documentation "bugs"; rather, we have the doc Wiki docs.scipy.org/scipy. If you go to the docs page (click the Docstrings link at the top of the front page) you'll see a color-coded listing of all the objects in SciPy - light grey background = "Being written," and white background = "Needs editing" = never been touched (since having been imported into the Wiki database from the source code in SVN). If you go to the status page (click on stats), you'll see that presently 97% of SciPy's docstrings fall into one of these two categories (92% being in the "never been touched" category). 
So, while it is certainly helpful to inform the list of deficiencies such as above, please understand that problems like these are the overwhelming norm, not the exception, and the _most_ helpful thing one can do in these situations is to register as an editor (if one has not already done so; see http://docs.scipy.org/numpy/Front%20Page/, and especially "Before you start" on that page for instructions) and help fix the problem. (Don't worry if you feel you don't know enough about an object: if, in working on a docstring, you have questions about an object, email your questions to the list - getting these answered and then using that info to fix the docstring oneself will almost certainly get it fixed faster than simply reporting the problem and waiting for someone else to get around to it - that's the motivation for asking people to help: not that others don't want to do it, but that if everyone pitches in, it'll get done a whole lot faster.) Thanks, DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun May 16 16:37:28 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 May 2010 22:37:28 +0200 Subject: [SciPy-User] [sympy] EuroScipy abstract submission deadline extended Message-ID: <20100516203728.GJ19278@phare.normalesup.org> Given that we have been able to turn on registration only very late, the EuroScipy conference committee is extending the deadline for abstract submission for the 2010 EuroScipy conference. On Thursday May 20th, at midnight Samoa time, we will turn off the abstract submission on the conference site. Up to then, you can modify the already-submitted abstract, or submit new abstracts. We are very much looking forward to your submissions to the conference. Ga?l Varoquaux Nicolas Chauvat -- EuroScipy 2010 is the annual European conference for scientists using Python. It will be held July 8-11 2010, in ENS, Paris, France. Links: Conference website: http://www.euroscipy.org/conference/euroscipy2010 Call for papers: http://www.euroscipy.org/card/euroscipy2010_call_for_papers Practical information: http://www.euroscipy.org/card/euroscipy2010_practical_information From gael.varoquaux at normalesup.org Sat May 15 18:40:12 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 May 2010 00:40:12 +0200 Subject: [SciPy-User] EuroScipy abstract submission deadline extended Message-ID: <20100515224012.GC19412@phare.normalesup.org> Given that we have been able to turn on registration only very late, the EuroScipy conference committee is extending the deadline for abstract submission for the 2010 EuroScipy conference. On Thursday May 20th, at midnight Samoa time, we will turn off the abstract submission on the conference site. Up to then, you can modify the already-submitted abstract, or submit new abstracts. We are very much looking forward to your submissions to the conference. Ga?l Varoquaux Nicolas Chauvat -- EuroScipy 2010 is the annual conference for scientists using Python. It will be held July 8-11 2010, in ENS, Paris, France. 
Links: Conference website: http://www.euroscipy.org/conference/euroscipy2010 Call for papers: http://www.euroscipy.org/card/euroscipy2010_call_for_papers Practical information: http://www.euroscipy.org/card/euroscipy2010_practical_information From briedel at wisc.edu Sun May 16 21:05:54 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sun, 16 May 2010 20:05:54 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: What I still do not understand is the fact that curve_fit gives me a different output then leastsq, even though curve_fit calls leastsq. I tried to get the chi-squared because we want to plot contours of chi-square from the minimum to the maximum. I used following code: fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) errfunc = lambda p, x, y: (y-fitfunc(p,x)) pinit = [20,20.] def func(x, a, b): return a*exp(-x) + b pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, sigma=R4errctsdataselect) print pfinal print covar dof=size(tau)-size(pinit) print dof chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, tau)))/dof print chi2 I am not 100% sure I am doing the degrees of freedom calculation right. I got the chi-square formula from the Pearson chi-squared test. Thank you very much for the help so far. Cheers, Ben On Sun, May 16, 2010 at 05:50, wrote: > On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel > wrote: > > > > > > On Fri, May 14, 2010 at 14:51, wrote: > >> > >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel > wrote: > >> > Hey, > >> > > >> > I am fairly new Scipy and am trying to do a least square fit to a set > of > >> > data. Currently, I am using following code: > >> > > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> > pinit = [20,20.] > >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > full_output=1) > >> > > >> > I am now trying to get the goodness of fit out of this data. I am sort > >> > of > >> > running into a brick wall because I found a lot of conflicting ways of > >> > how > >> > to calculate it. > >> > >> For regression the usual is > >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >> coefficient of determination is > >> > >> R^2 = 1 - {SS_{err} / SS_{tot}} > >> > >> Note your fitfunc is linear in parameters and can be better estimated > >> by linear least squares, OLS. > >> linear regression is handled in statsmodels and you can get lot's of > >> statistics without worrying about the formulas. > >> If you only have one slope parameter, then scipy.stats.linregress also > >> works > >> > > > > Thanks for the information. I am still note quite sure if this is what my > > boss wants because there should not be an average y value. > > The definition of Rsquared is pretty uncontroversial with the y.mean() > correction, if there is a constant in the regression (although I know > mainly the linear case for this). > > If there is no constant in the regression, the definition or Rsquared > is not clear/unambiguous, but usually used without mean correction of > y. > > Josef > > > > >> > >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance > >> of the parameter estimates. > >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > > > > I have been trying this out, but the fit just looks horrid compared to > using > > leastsq method even though they call the same function according to the > > documentation. 
> > > >> > >> > I am aware of the chisquare function in stats function, but the > >> > documentation seems a little confusing to me. Any help would be > greatly > >> > appreciates. > >> > >> chisquare and others like kolmogorov-smirnov are more for testing the > >> goodness-of-fit of entire distributions, not for how well a curve or > >> line fits the data. > >> > > > > That is what I thought, which brought up my confusion when I asked other > > people and they told me to use that > > > >> > >> Josef > >> > >> > > >> > Thanks very much in advance. > >> > > >> > Cheers, > >> > > >> > Ben > >> > > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > Benedikt Riedel > > Graduate Student University of Wisconsin-Madison > > Department of Physics > > Office: 2304 Chamberlin Hall > > Lab: 6247 Chamberlin Hall > > Tel: (608) 301-5736 > > Cell: (213) 519-1771 > > Lab: (608) 262-5916 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From 3njoywind at gmail.com Sun May 16 21:33:43 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Mon, 17 May 2010 09:33:43 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: Sebasian: Thank you. I have found that too and solved my problem now. regards, Zhe On Mon, May 17, 2010 at 12:22 AM, Sebastian Walter < sebastian.walter at gmail.com> wrote: > you are using integer arrays... > > this should work: > import numpy as np > from scipy.optimize import leastsq > > X = np.array([1,2,3,4,5,6,7,8,9],dtype=float) > Y = np.array([-1, 0, 1, 2,7,8,9,10,11],dtype=float) > > def func(p, x, y): > a = p > sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) > for i in range(0, len(x)): > if x[i] > 4: > sum[i] = x[i] + a > else: > sum[i] = x[i] - a > return sum - y > > r = leastsq(func,1,args=(X,Y)) > > print r[0] > > > regards, > Sebastian > > > > On Sun, May 16, 2010 at 3:58 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > v is calender years e.g > > (1+p)**1984 > > May be you could help me with this simple example: > > I have a function defined like this(a0 is a parameter): > > ------------------------------------ > > def f(x): > > if x > 4: > > return x + a0 > > else: > > return x - a0 > > ------------------------------------ > > I generated the data when I let a0=2 : > > -------------------------------------- > > X = np.array([1,2,3,4,5,6,7,8,9]) > > Y = np.array([-1, 0, 1, 2,7,8,9,10,11]) > > -------------------------------------- > > How can I use leastsq() to fit f(x)? 
I have wrote some code like this: > > -------------------------------------- > > def func(p, x, y): > > a = p > > sum = np.array([0,0,0,0,0,0,0,0,0]) > > for i in range(0, len(x)): > > if x[i] > 4: > > sum[i] = x[i] + a > > else: > > sum[i] = x[i] - a > > return sum - y > > r = leastsq(func,1,args=(X,Y)) > > print r[0] > > -------------------------------------- > > the output is 1.0000000149, much different like 2. > > I doubt whether leastsq() is suitable for this kind of problem. Maybe I > > should try another way? > > Zhe > > On Sun, May 16, 2010 at 12:34 AM, wrote: > >> > >> On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: > >> > Josef: > >> > Thanks, > >> > my example is just for test, it is not always have a fixed start, e.g > >> >>>> T = np.array([1995,1996,1997,1998,1999]) > >> >>>> E = np.array([14,12,11,15,12]) > >> >>>> T-E > >> > array([1981, 1984, 1986, 1983, 1987]) > >> > so, if I define the function: > >> > def func(x): > >> > #.... > >> > v = np.arange(...) > >> > Y = np.cumsum((a*(1+p)**v)*I(v)) > >> > return Y > >> > when I call leastsq(func, [1,0]), v should change as the element of T > >> > change, e.g. > >> > when t(one element of T) is 1995?v = np.arange(1981, 1995) > >> > when t is 1996, v = np.arange(1984, 1996) > >> > >> are you sure v is supposed to be calender years and not number of > >> years accumulated? i.e > >> > >> (1+p)**1984 > >> or > >> (1+p)**(1996-1984) > >> > >> The most efficient would be to use the formula for the sum of a finite > >> geometric series, which would also avoid the sum. > >> > >> Josef > >> > >> > >> > ...... > >> > this troubles me so much. > >> > Zhe Wang > >> > > >> > > >> > On Sat, May 15, 2010 at 9:08 PM, wrote: > >> >> > >> >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> > wrote: > >> >> > Josef: > >> >> > Thanks for your reply:) > >> >> > Actually I want to fit this equation: > >> >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > >> >> > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > >> >> > calculated by e() and I(). > >> >> > >> >> Do you always have a fixed start date as in your example > >> >> > >> >> >>> T = np.array([1995,1996,1997,1998,1999]) > >> >> >>> E = np.array([10,11,12,13,14]) > >> >> >>> T-E > >> >> array([1985, 1985, 1985, 1985, 1985]) > >> >> > >> >> so that always v =range(T0, T+1) with fixed T0=1985 > >> >> > >> >> this would make it easier to work forwards than backwards, e.g. > >> >> something > >> >> like > >> >> v = np.arange(...) 
> >> >> Y = np.cusum((a*(1+p)**v)*I(v)) > >> >> > >> >> Josef > >> >> > >> >> > >> >> > >> >> > >> >> > I rewrote my code like this: > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > from scipy.optimize import leastsq > >> >> > import numpy as np > >> >> > def Iv(t): > >> >> > return 4 > >> >> > def Yt(x, et): > >> >> > a, pa = x > >> >> > sum = np.array([0,0,0,0,0]) > >> >> > for i in range(0, len(et)): > >> >> > for j in range(0, et[i]): > >> >> > v = T[i] - et[i] + j > >> >> > sum[i] += a*(1+pa)**(v)*Iv(v) > >> >> > return sum - Y > >> >> > T = np.array([1995,1996,1997,1998,1999]) > >> >> > Y = > >> >> > > >> >> > > >> >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> >> > E = np.array([10,11,12,13,14]) > >> >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > >> >> > A, Pa = r[0] > >> >> > print "A=",A,"Pa=",Pa > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > the output is: > >> >> > A= 1.0 Pa = 0.0 > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > I don't think it is correct. Hope for your guidence. > >> >> > On Sat, May 15, 2010 at 7:02 PM, wrote: > >> >> >> > >> >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> > >> >> >> wrote: > >> >> >> > Traceback (most recent call last): > >> >> >> > File "D:\Yt.py", line 31, in > >> >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> >> > File > "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> >> > line > >> >> >> > 266, > >> >> >> > in leastsq > >> >> >> > m = check_func(func,x0,args,n)[0] > >> >> >> > File > "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> >> > line > >> >> >> > 12, > >> >> >> > in check_func > >> >> >> > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > >> >> >> > File "D:\Yt.py", line 26, in residuals > >> >> >> > return y - Yt(x, p) > >> >> >> > File "D:\Yt.py", line 20, in Yt > >> >> >> > for i in range(0, Et(x)): > >> >> >> > File "D:\Yt.py", line 11, in Et > >> >> >> > if t == 1995: > >> >> >> > ValueError: The truth value of an array with more than one > element > >> >> >> > is > >> >> >> > ambiguous. 
Use a.any() or a.all() > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > --------------------------------------------------------------------------------------------------------- > >> >> >> > > >> >> >> > When running the following code: > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > -------------------------------------------------------------------------------------------- > >> >> >> > > >> >> >> > from scipy.optimize import leastsq > >> >> >> > import numpy as np > >> >> >> > > >> >> >> > def Iv(t): > >> >> >> > if t == 1995: > >> >> >> > return t + 2 > >> >> >> > else: > >> >> >> > return t > >> >> >> > > >> >> >> > def Et(t): > >> >> >> > if t == 1995: > >> >> >> > return t + 2 > >> >> >> > else: > >> >> >> > return t > >> >> >> > > >> >> >> > def Yt(x, p): > >> >> >> > a, pa = p > >> >> >> > sum = 0 > >> >> >> > > >> >> >> > for i in range(0, Et(x)): > >> >> >> > v = x - et + i > >> >> >> > sum += a*(1+p)**(v)*Iv(v) > >> >> >> > return sum > >> >> >> > > >> >> >> > def residuals(p, y, x): > >> >> >> > return y - Yt(x, p) > >> >> >> > > >> >> >> > T = np.array([1995,1996,1997,1998,1999]) > >> >> >> > Y = > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> >> >> > > >> >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> >> > A, Pa = r[0] > >> >> >> > print "A=",A,"Pa=",Pa > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > ---------------------------------------------------------------------------------------------- > >> >> >> > > >> >> >> > I know the error occurs when I compare t like: "if t == > 1995",but > >> >> >> > I > >> >> >> > have > >> >> >> > no > >> >> >> > idea how to handle it correctly. > >> >> >> > >> >> >> try the vectorized version of a conditional assignment, e.g. > >> >> >> np.where(t == 1995, t, t+2) > >> >> >> > >> >> >> I didn't read enough of your example, to tell whether your Yt loop > >> >> >> can > >> >> >> be vectorized with a single sum, but I guess so. > >> >> >> > >> >> >> optimize leastsq expects an array, so residuals (and Yt) need to > >> >> >> return an array not a single value, maybe np.cusum and conditional > >> >> >> or > >> >> >> data dependent slicing/indexing works > >> >> >> > >> >> >> Josef > >> >> >> > >> >> >> > >> >> >> > > >> >> >> > Any help would be greatly appreciated. 
> >> >> >> > > >> >> >> > Zhe Wang > >> >> >> > > >> >> >> > _______________________________________________ > >> >> >> > SciPy-User mailing list > >> >> >> > SciPy-User at scipy.org > >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> >> > > >> >> >> > > >> >> >> _______________________________________________ > >> >> >> SciPy-User mailing list > >> >> >> SciPy-User at scipy.org > >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun May 16 23:33:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 May 2010 23:33:31 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel wrote: > What I still do not understand is the fact that curve_fit gives me a > different output then leastsq, even though curve_fit calls leastsq. > > I tried to get the chi-squared because we want to plot contours of > chi-square from the minimum to the maximum. I used following code: > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > pinit = [20,20.] > > def func(x, a, b): > ???? return a*exp(-x) + b > > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > sigma=R4errctsdataselect) this uses weighted least squares sigma : None or N-length sequence If not None, it represents the standard-deviation of ydata. This vector, if given, will be used as weights in the least-squares problem In your initial example with leastsq you don't have any weighting, it's just ordinary least squares maybe that's the difference. > print pfinal > print covar > dof=size(tau)-size(pinit) > print dof > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, > tau)))/dof > print chi2 > > I am not 100% sure I am doing the degrees of freedom calculation right. I > got the chi-square formula from the Pearson chi-squared test. I don't recognize your formula for chi2, and I don't see the connection to Pearson chi-squared test . Do you have a reference? Josef > > Thank you very much for the help so far. 
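To see concretely what that weighting amounts to, a small sketch (x, y, and sigma below are made-up stand-ins for tau, R4ctsdataselect, and R4errctsdataselect): dividing the residuals by the per-point errors before handing them to leastsq gives the weighted fit, and the conventional reduced chi-square is built from the same weighted residuals:

import numpy as np
from scipy.optimize import leastsq

# made-up stand-ins for the data in the thread
x = np.linspace(0.0, 5.0, 40)
sigma = 0.5 * np.ones_like(x)                       # per-point standard deviations
y = 20.0 + 15.0 * np.exp(-x) + sigma * np.random.randn(x.size)

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)
# residuals divided by sigma: the weighting described above
werrfunc = lambda p, x, y, s: (y - fitfunc(p, x)) / s

p, cov, infodict, mesg, ier = leastsq(werrfunc, [20.0, 20.0],
                                      args=(x, y, sigma), full_output=1)

dof = y.size - len(p)                               # number of points minus number of parameters
chi2_red = np.sum(((y - fitfunc(p, x)) / sigma) ** 2) / dof
print p, chi2_red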
> > Cheers, > > Ben > > On Sun, May 16, 2010 at 05:50, wrote: >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >> wrote: >> > >> > >> > On Fri, May 14, 2010 at 14:51, wrote: >> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >> >> wrote: >> >> > Hey, >> >> > >> >> > I am fairly new Scipy and am trying to do a least square fit to a set >> >> > of >> >> > data. Currently, I am using following code: >> >> > >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> > pinit = [20,20.] >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >> >> > full_output=1) >> >> > >> >> > I am now trying to get the goodness of fit out of this data. I am >> >> > sort >> >> > of >> >> > running into a brick wall because I found a lot of conflicting ways >> >> > of >> >> > how >> >> > to calculate it. >> >> >> >> For regression the usual is >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> >> coefficient of determination is >> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> >> >> Note your fitfunc is linear in parameters and can be better estimated >> >> by linear least squares, OLS. >> >> linear regression is handled in statsmodels and you can get lot's of >> >> statistics without worrying about the formulas. >> >> If you only have one slope parameter, then scipy.stats.linregress also >> >> works >> >> >> > >> > Thanks for the information. I am still note quite sure if this is what >> > my >> > boss wants because there should not be an average y value. >> >> The definition of Rsquared is pretty uncontroversial with the y.mean() >> correction, if there is a constant in the regression (although I know >> mainly the linear case for this). >> >> If there is no constant in the regression, the definition or Rsquared >> is not clear/unambiguous, but usually used without mean correction of >> y. >> >> Josef >> >> > >> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance >> >> of the parameter estimates. >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >> > >> > I have been trying this out, but the fit just looks horrid compared to >> > using >> > leastsq method even though they call the same function according to the >> > documentation. >> > >> >> >> >> > I am aware of the chisquare function in stats function, but the >> >> > documentation seems a little confusing to me. Any help would be >> >> > greatly >> >> > appreciates. >> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing the >> >> goodness-of-fit of entire distributions, not for how well a curve or >> >> line fits the data. >> >> >> > >> > That is what I thought, which brought up my confusion when I asked other >> > people and they told me to use that >> > >> >> >> >> Josef >> >> >> >> > >> >> > Thanks very much in advance. 
>> >> > >> >> > Cheers, >> >> > >> >> > Ben >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > -- >> > Benedikt Riedel >> > Graduate Student University of Wisconsin-Madison >> > Department of Physics >> > Office: 2304 Chamberlin Hall >> > Lab: 6247 Chamberlin Hall >> > Tel: ?(608) 301-5736 >> > Cell: (213) 519-1771 >> > Lab: (608) 262-5916 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From briedel at wisc.edu Mon May 17 00:18:00 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sun, 16 May 2010 23:18:00 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Sun, May 16, 2010 at 22:33, wrote: > On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel wrote: > > What I still do not understand is the fact that curve_fit gives me a > > different output then leastsq, even though curve_fit calls leastsq. > > > > I tried to get the chi-squared because we want to plot contours of > > chi-square from the minimum to the maximum. I used following code: > > > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > > pinit = [20,20.] > > > > def func(x, a, b): > > return a*exp(-x) + b > > > > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > > sigma=R4errctsdataselect) > > this uses weighted least squares > sigma : None or N-length sequence > If not None, it represents the standard-deviation of ydata. This > vector, if given, will be used as weights in the least-squares problem > > In your initial example with leastsq you don't have any weighting, > it's just ordinary least squares > > maybe that's the difference. > > > Yeah I guess that will be it. > > > print pfinal > > print covar > > dof=size(tau)-size(pinit) > > print dof > > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, > > tau)))/dof > > print chi2 > > > > I am not 100% sure I am doing the degrees of freedom calculation right. I > > got the chi-square formula from the Pearson chi-squared test. > > I don't recognize your formula for chi2, and I don't see the > connection to Pearson chi-squared test . > > Do you have a reference? > > I based my use of the Pearson test from what I read in an Econometrics book, but wiki has the a pretty good description. I basically based it off the example there. Where the expected would be what comes out of the fit and what you is the "R4ctsdataselect" for those specific values. 
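In symbols, that is the Pearson form of the statistic from the Wikipedia page linked below, with the fitted values playing the role of the expected counts (obs and expected here are made-up stand-ins for R4ctsdataselect and the values returned by the fit):

import numpy as np

# made-up stand-ins: obs = measured values, expected = values predicted by the fit
obs = np.array([25.0, 21.0, 18.0, 16.0, 15.0])
expected = np.array([24.0, 22.0, 19.0, 15.5, 14.8])

dof = obs.size - 2                                   # N data points minus 2 fitted parameters
pearson_chi2 = np.sum((obs - expected) ** 2 / expected)   # sum of (O - E)**2 / E
print pearson_chi2, pearson_chi2 / dof

# when per-point measurement errors err are available, the usual reduced
# chi-square of a fit divides by err**2 instead of by the expected values:
#     np.sum(((obs - expected) / err) ** 2) / dof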
http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test > Josef > > Thanks again Ben > > > > Thank you very much for the help so far. > > > > Cheers, > > > > Ben > > > > On Sun, May 16, 2010 at 05:50, wrote: > >> > >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel > >> wrote: > >> > > >> > > >> > On Fri, May 14, 2010 at 14:51, wrote: > >> >> > >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel > >> >> wrote: > >> >> > Hey, > >> >> > > >> >> > I am fairly new Scipy and am trying to do a least square fit to a > set > >> >> > of > >> >> > data. Currently, I am using following code: > >> >> > > >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> >> > pinit = [20,20.] > >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > >> >> > full_output=1) > >> >> > > >> >> > I am now trying to get the goodness of fit out of this data. I am > >> >> > sort > >> >> > of > >> >> > running into a brick wall because I found a lot of conflicting ways > >> >> > of > >> >> > how > >> >> > to calculate it. > >> >> > >> >> For regression the usual is > >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >> >> coefficient of determination is > >> >> > >> >> R^2 = 1 - {SS_{err} / SS_{tot}} > >> >> > >> >> Note your fitfunc is linear in parameters and can be better estimated > >> >> by linear least squares, OLS. > >> >> linear regression is handled in statsmodels and you can get lot's of > >> >> statistics without worrying about the formulas. > >> >> If you only have one slope parameter, then scipy.stats.linregress > also > >> >> works > >> >> > >> > > >> > Thanks for the information. I am still note quite sure if this is what > >> > my > >> > boss wants because there should not be an average y value. > >> > >> The definition of Rsquared is pretty uncontroversial with the y.mean() > >> correction, if there is a constant in the regression (although I know > >> mainly the linear case for this). > >> > >> If there is no constant in the regression, the definition or Rsquared > >> is not clear/unambiguous, but usually used without mean correction of > >> y. > >> > >> Josef > >> > >> > > >> >> > >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance > >> >> of the parameter estimates. > >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > >> > > >> > I have been trying this out, but the fit just looks horrid compared to > >> > using > >> > leastsq method even though they call the same function according to > the > >> > documentation. > >> > > >> >> > >> >> > I am aware of the chisquare function in stats function, but the > >> >> > documentation seems a little confusing to me. Any help would be > >> >> > greatly > >> >> > appreciates. > >> >> > >> >> chisquare and others like kolmogorov-smirnov are more for testing the > >> >> goodness-of-fit of entire distributions, not for how well a curve or > >> >> line fits the data. > >> >> > >> > > >> > That is what I thought, which brought up my confusion when I asked > other > >> > people and they told me to use that > >> > > >> >> > >> >> Josef > >> >> > >> >> > > >> >> > Thanks very much in advance. 
> >> >> > > >> >> > Cheers, > >> >> > > >> >> > Ben > >> >> > > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > > >> > -- > >> > Benedikt Riedel > >> > Graduate Student University of Wisconsin-Madison > >> > Department of Physics > >> > Office: 2304 Chamberlin Hall > >> > Lab: 6247 Chamberlin Hall > >> > Tel: (608) 301-5736 > >> > Cell: (213) 519-1771 > >> > Lab: (608) 262-5916 > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > Benedikt Riedel > > Graduate Student University of Wisconsin-Madison > > Department of Physics > > Office: 2304 Chamberlin Hall > > Lab: 6247 Chamberlin Hall > > Tel: (608) 301-5736 > > Cell: (213) 519-1771 > > Lab: (608) 262-5916 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 17 01:20:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 May 2010 01:20:59 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel wrote: > > > On Sun, May 16, 2010 at 22:33, wrote: >> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel wrote: >> > What I still do not understand is the fact that curve_fit gives me a >> > different output then leastsq, even though curve_fit calls leastsq. >> > >> > I tried to get the chi-squared because we want to plot contours of >> > chi-square from the minimum to the maximum. I used following code: >> > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> > pinit = [20,20.] >> > >> > def func(x, a, b): >> > ???? return a*exp(-x) + b >> > >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, >> > sigma=R4errctsdataselect) >> >> this uses weighted least squares >> sigma : None or N-length sequence >> ? ?If not None, it represents the standard-deviation of ydata. This >> vector, if given, will be used as weights in the least-squares problem >> >> In your initial example with leastsq you don't have any weighting, >> it's just ordinary least squares >> >> maybe that's the difference. >> >> > > Yeah I guess that will be it. 
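A short sketch of the weighting difference described above: passing raw residuals to leastsq is ordinary least squares, while dividing each residual by its standard deviation (which is what supplying sigma to curve_fit amounts to) gives weighted least squares, and the two estimates generally differ. Here x, y and yerr are synthetic stand-ins for tau, R4ctsdataselect and R4errctsdataselect:

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)

x = np.linspace(0, 5, 50)
yerr = 0.2 + 0.3 * x                        # made-up, point-dependent standard deviations
y = 20 + 20 * np.exp(-x) + yerr * np.random.randn(x.size)

# ordinary least squares: every point counts the same
ols_err = lambda p, x, y: y - fitfunc(p, x)
p_ols, ier1 = leastsq(ols_err, [20, 20.], args=(x, y))

# weighted least squares: residuals divided by sigma, so noisy points count less
wls_err = lambda p, x, y, s: (y - fitfunc(p, x)) / s
p_wls, ier2 = leastsq(wls_err, [20, 20.], args=(x, y, yerr))

print(p_ols)
print(p_wls)   # generally not identical to p_ols when the sigmas vary across points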
> >> >> > print pfinal >> > print covar >> > dof=size(tau)-size(pinit) >> > print dof >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, >> > tau)))/dof >> > print chi2 >> > >> > I am not 100% sure I am doing the degrees of freedom calculation right. >> > I >> > got the chi-square formula from the Pearson chi-squared test. >> >> I don't recognize your formula for chi2, and I don't see the >> connection to Pearson chi-squared test . >> >> Do you have a reference? >> > > I based my use of the Pearson test from what I read in an Econometrics book, > but wiki has the a pretty good description. I basically based it off the > example there. Where the expected would be what comes out of the fit and > what you is the "R4ctsdataselect" for those specific values. > > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test I looked at that, but it's a completely different case, the values in the formulas are frequencies Oi = an observed frequency; Ei = an expected (theoretical) frequency, asserted by the null hypothesis; not points on a regression curve Josef > > >> >> Josef >> > > Thanks again > > Ben > > >> >> > >> > Thank you very much for the help so far. >> > >> > Cheers, >> > >> > Ben >> > >> > On Sun, May 16, 2010 at 05:50, wrote: >> >> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >> >> wrote: >> >> > >> >> > >> >> > On Fri, May 14, 2010 at 14:51, wrote: >> >> >> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >> >> >> wrote: >> >> >> > Hey, >> >> >> > >> >> >> > I am fairly new Scipy and am trying to do a least square fit to a >> >> >> > set >> >> >> > of >> >> >> > data. Currently, I am using following code: >> >> >> > >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> >> > pinit = [20,20.] >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >> >> >> > full_output=1) >> >> >> > >> >> >> > I am now trying to get the goodness of fit out of this data. I am >> >> >> > sort >> >> >> > of >> >> >> > running into a brick wall because I found a lot of conflicting >> >> >> > ways >> >> >> > of >> >> >> > how >> >> >> > to calculate it. >> >> >> >> >> >> For regression the usual is >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> >> >> coefficient of determination is >> >> >> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> >> >> >> >> Note your fitfunc is linear in parameters and can be better >> >> >> estimated >> >> >> by linear least squares, OLS. >> >> >> linear regression is handled in statsmodels and you can get lot's of >> >> >> statistics without worrying about the formulas. >> >> >> If you only have one slope parameter, then scipy.stats.linregress >> >> >> also >> >> >> works >> >> >> >> >> > >> >> > Thanks for the information. I am still note quite sure if this is >> >> > what >> >> > my >> >> > boss wants because there should not be an average y value. >> >> >> >> The definition of Rsquared is pretty uncontroversial with the y.mean() >> >> correction, if there is a constant in the regression (although I know >> >> mainly the linear case for this). >> >> >> >> If there is no constant in the regression, the definition or Rsquared >> >> is not clear/unambiguous, but usually used without mean correction of >> >> y. >> >> >> >> Josef >> >> >> >> > >> >> >> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the >> >> >> covariance >> >> >> of the parameter estimates. 
>> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >> >> > >> >> > I have been trying this out, but the fit just looks horrid compared >> >> > to >> >> > using >> >> > leastsq method even though they call the same function according to >> >> > the >> >> > documentation. >> >> > >> >> >> >> >> >> > I am aware of the chisquare function in stats function, but the >> >> >> > documentation seems a little confusing to me. Any help would be >> >> >> > greatly >> >> >> > appreciates. >> >> >> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing >> >> >> the >> >> >> goodness-of-fit of entire distributions, not for how well a curve or >> >> >> line fits the data. >> >> >> >> >> > >> >> > That is what I thought, which brought up my confusion when I asked >> >> > other >> >> > people and they told me to use that >> >> > >> >> >> >> >> >> Josef >> >> >> >> >> >> > >> >> >> > Thanks very much in advance. >> >> >> > >> >> >> > Cheers, >> >> >> > >> >> >> > Ben >> >> >> > >> >> >> > >> >> >> > >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> > >> >> > -- >> >> > Benedikt Riedel >> >> > Graduate Student University of Wisconsin-Madison >> >> > Department of Physics >> >> > Office: 2304 Chamberlin Hall >> >> > Lab: 6247 Chamberlin Hall >> >> > Tel: ?(608) 301-5736 >> >> > Cell: (213) 519-1771 >> >> > Lab: (608) 262-5916 >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > -- >> > Benedikt Riedel >> > Graduate Student University of Wisconsin-Madison >> > Department of Physics >> > Office: 2304 Chamberlin Hall >> > Lab: 6247 Chamberlin Hall >> > Tel: ?(608) 301-5736 >> > Cell: (213) 519-1771 >> > Lab: (608) 262-5916 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From briedel at wisc.edu Mon May 17 02:01:07 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Mon, 17 May 2010 01:01:07 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: Thanks for the clarification. I am still not sure how to get the chi-squared value of my regression though. 
When I use the formula under "Regression Analysis" here http://en.wikipedia.org/wiki/Goodness_of_fit I get a chi-square somewhere around 19, which seems way to large compared to the value of 3.2 I get for the same data set when I fit it using gnuplot. Where gnuplot supposedly used the weighted sum of squares of residuals. I do not fully this because of the results I get. Here is the python code I used: chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/pow(R4errctsdataselect,2)))/dof Sorry for being so thick headed, statistics is just beyond me at times. Cheers, Ben On Mon, May 17, 2010 at 00:20, wrote: > On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel > wrote: > > > > > > On Sun, May 16, 2010 at 22:33, wrote: > >> > >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel > wrote: > >> > What I still do not understand is the fact that curve_fit gives me a > >> > different output then leastsq, even though curve_fit calls leastsq. > >> > > >> > I tried to get the chi-squared because we want to plot contours of > >> > chi-square from the minimum to the maximum. I used following code: > >> > > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> > pinit = [20,20.] > >> > > >> > def func(x, a, b): > >> > return a*exp(-x) + b > >> > > >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > >> > sigma=R4errctsdataselect) > >> > >> this uses weighted least squares > >> sigma : None or N-length sequence > >> If not None, it represents the standard-deviation of ydata. This > >> vector, if given, will be used as weights in the least-squares problem > >> > >> In your initial example with leastsq you don't have any weighting, > >> it's just ordinary least squares > >> > >> maybe that's the difference. > >> > >> > > > > Yeah I guess that will be it. > > > >> > >> > print pfinal > >> > print covar > >> > dof=size(tau)-size(pinit) > >> > print dof > >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, > >> > tau)))/dof > >> > print chi2 > >> > > >> > I am not 100% sure I am doing the degrees of freedom calculation > right. > >> > I > >> > got the chi-square formula from the Pearson chi-squared test. > >> > >> I don't recognize your formula for chi2, and I don't see the > >> connection to Pearson chi-squared test . > >> > >> Do you have a reference? > >> > > > > I based my use of the Pearson test from what I read in an Econometrics > book, > > but wiki has the a pretty good description. I basically based it off the > > example there. Where the expected would be what comes out of the fit and > > what you is the "R4ctsdataselect" for those specific values. > > > > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test > > I looked at that, but it's a completely different case, the values in > the formulas are frequencies > > Oi = an observed frequency; > Ei = an expected (theoretical) frequency, asserted by the null > hypothesis; > > not points on a regression curve > > Josef > > > > > > >> > >> Josef > >> > > > > Thanks again > > > > Ben > > > > > >> > >> > > >> > Thank you very much for the help so far. 
> >> > > >> > Cheers, > >> > > >> > Ben > >> > > >> > On Sun, May 16, 2010 at 05:50, wrote: > >> >> > >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel > >> >> wrote: > >> >> > > >> >> > > >> >> > On Fri, May 14, 2010 at 14:51, wrote: > >> >> >> > >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel < > briedel at wisc.edu> > >> >> >> wrote: > >> >> >> > Hey, > >> >> >> > > >> >> >> > I am fairly new Scipy and am trying to do a least square fit to > a > >> >> >> > set > >> >> >> > of > >> >> >> > data. Currently, I am using following code: > >> >> >> > > >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> >> >> > pinit = [20,20.] > >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > >> >> >> > full_output=1) > >> >> >> > > >> >> >> > I am now trying to get the goodness of fit out of this data. I > am > >> >> >> > sort > >> >> >> > of > >> >> >> > running into a brick wall because I found a lot of conflicting > >> >> >> > ways > >> >> >> > of > >> >> >> > how > >> >> >> > to calculate it. > >> >> >> > >> >> >> For regression the usual is > >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >> >> >> coefficient of determination is > >> >> >> > >> >> >> R^2 = 1 - {SS_{err} / SS_{tot}} > >> >> >> > >> >> >> Note your fitfunc is linear in parameters and can be better > >> >> >> estimated > >> >> >> by linear least squares, OLS. > >> >> >> linear regression is handled in statsmodels and you can get lot's > of > >> >> >> statistics without worrying about the formulas. > >> >> >> If you only have one slope parameter, then scipy.stats.linregress > >> >> >> also > >> >> >> works > >> >> >> > >> >> > > >> >> > Thanks for the information. I am still note quite sure if this is > >> >> > what > >> >> > my > >> >> > boss wants because there should not be an average y value. > >> >> > >> >> The definition of Rsquared is pretty uncontroversial with the > y.mean() > >> >> correction, if there is a constant in the regression (although I know > >> >> mainly the linear case for this). > >> >> > >> >> If there is no constant in the regression, the definition or Rsquared > >> >> is not clear/unambiguous, but usually used without mean correction of > >> >> y. > >> >> > >> >> Josef > >> >> > >> >> > > >> >> >> > >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the > >> >> >> covariance > >> >> >> of the parameter estimates. > >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > >> >> > > >> >> > I have been trying this out, but the fit just looks horrid compared > >> >> > to > >> >> > using > >> >> > leastsq method even though they call the same function according to > >> >> > the > >> >> > documentation. > >> >> > > >> >> >> > >> >> >> > I am aware of the chisquare function in stats function, but the > >> >> >> > documentation seems a little confusing to me. Any help would be > >> >> >> > greatly > >> >> >> > appreciates. > >> >> >> > >> >> >> chisquare and others like kolmogorov-smirnov are more for testing > >> >> >> the > >> >> >> goodness-of-fit of entire distributions, not for how well a curve > or > >> >> >> line fits the data. > >> >> >> > >> >> > > >> >> > That is what I thought, which brought up my confusion when I asked > >> >> > other > >> >> > people and they told me to use that > >> >> > > >> >> >> > >> >> >> Josef > >> >> >> > >> >> >> > > >> >> >> > Thanks very much in advance. 
> >> >> >> > > >> >> >> > Cheers, > >> >> >> > > >> >> >> > Ben > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > _______________________________________________ > >> >> >> > SciPy-User mailing list > >> >> >> > SciPy-User at scipy.org > >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> >> > > >> >> >> > > >> >> >> _______________________________________________ > >> >> >> SciPy-User mailing list > >> >> >> SciPy-User at scipy.org > >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > Benedikt Riedel > >> >> > Graduate Student University of Wisconsin-Madison > >> >> > Department of Physics > >> >> > Office: 2304 Chamberlin Hall > >> >> > Lab: 6247 Chamberlin Hall > >> >> > Tel: (608) 301-5736 > >> >> > Cell: (213) 519-1771 > >> >> > Lab: (608) 262-5916 > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > > >> > -- > >> > Benedikt Riedel > >> > Graduate Student University of Wisconsin-Madison > >> > Department of Physics > >> > Office: 2304 Chamberlin Hall > >> > Lab: 6247 Chamberlin Hall > >> > Tel: (608) 301-5736 > >> > Cell: (213) 519-1771 > >> > Lab: (608) 262-5916 > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > Benedikt Riedel > > Graduate Student University of Wisconsin-Madison > > Department of Physics > > Office: 2304 Chamberlin Hall > > Lab: 6247 Chamberlin Hall > > Tel: (608) 301-5736 > > Cell: (213) 519-1771 > > Lab: (608) 262-5916 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 17 07:35:27 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 May 2010 07:35:27 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel wrote: > Thanks for the clarification. I am still not sure how to get the chi-squared > value of my regression though. When I use the formula under "Regression > Analysis" here > > http://en.wikipedia.org/wiki/Goodness_of_fit > > I get a chi-square somewhere around 19, which seems way to large compared to > the value of 3.2 I get for the same data set when I fit it using gnuplot. > Where gnuplot supposedly used the weighted sum of squares of residuals. 
I do > not fully this because of the results I get. > > Here is the python code I used: > > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), > 2)/pow(R4errctsdataselect,2)))/dof from some gnuplot help page it looks like what they call chisquare is WSSR/dof which would be something like chi2=(sum( ( R4ctsdataselect-fitfunc(pinit, tau)) / sqrt(R4errctsdataselect) )**2 )/dof I'm not sure whether the sqrt is in there or not, because I don't remember the normalization that is used, weights or weights squared Josef > > Sorry for being so thick headed, statistics is just beyond me at times. > > Cheers, > > Ben > > On Mon, May 17, 2010 at 00:20, wrote: >> >> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel >> wrote: >> > >> > >> > On Sun, May 16, 2010 at 22:33, wrote: >> >> >> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel >> >> wrote: >> >> > What I still do not understand is the fact that curve_fit gives me a >> >> > different output then leastsq, even though curve_fit calls leastsq. >> >> > >> >> > I tried to get the chi-squared because we want to plot contours of >> >> > chi-square from the minimum to the maximum. I used following code: >> >> > >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> > pinit = [20,20.] >> >> > >> >> > def func(x, a, b): >> >> > ???? return a*exp(-x) + b >> >> > >> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, >> >> > sigma=R4errctsdataselect) >> >> >> >> this uses weighted least squares >> >> sigma : None or N-length sequence >> >> ? ?If not None, it represents the standard-deviation of ydata. This >> >> vector, if given, will be used as weights in the least-squares problem >> >> >> >> In your initial example with leastsq you don't have any weighting, >> >> it's just ordinary least squares >> >> >> >> maybe that's the difference. >> >> >> >> >> > >> > Yeah I guess that will be it. >> > >> >> >> >> > print pfinal >> >> > print covar >> >> > dof=size(tau)-size(pinit) >> >> > print dof >> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, >> >> > tau)))/dof >> >> > print chi2 >> >> > >> >> > I am not 100% sure I am doing the degrees of freedom calculation >> >> > right. >> >> > I >> >> > got the chi-square formula from the Pearson chi-squared test. >> >> >> >> I don't recognize your formula for chi2, and I don't see the >> >> connection to Pearson chi-squared test . >> >> >> >> Do you have a reference? >> >> >> > >> > I based my use of the Pearson test from what I read in an Econometrics >> > book, >> > but wiki has the a pretty good description. I basically based it off the >> > example there. Where the expected would be what comes out of the fit and >> > what you is the "R4ctsdataselect" for those specific values. >> > >> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test >> >> I looked at that, but it's a completely different case, the values in >> the formulas are frequencies >> >> ? ?Oi = an observed frequency; >> ? ?Ei = an expected (theoretical) frequency, asserted by the null >> hypothesis; >> >> not points on a regression curve >> >> Josef >> >> > >> > >> >> >> >> Josef >> >> >> > >> > Thanks again >> > >> > Ben >> > >> > >> >> >> >> > >> >> > Thank you very much for the help so far. 
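A self-contained sketch of the reduced chi-square convention being discussed here (weighted sum of squared residuals over degrees of freedom, with the standard deviation sigma_i itself in the denominator, which is where the thread eventually lands); the data are synthetic stand-ins for the poster's arrays, and the model is evaluated at the fitted parameters:

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)
errfunc = lambda p, x, y, s: (y - fitfunc(p, x)) / s   # residuals in units of sigma

x = np.linspace(0, 5, 50)
yerr = 0.5 * np.ones(x.size)
y = 20 + 20 * np.exp(-x) + yerr * np.random.randn(x.size)

pfinal, ier = leastsq(errfunc, [20, 20.], args=(x, y, yerr))

dof = x.size - len(pfinal)                          # N data points minus fitted parameters
chi2 = (errfunc(pfinal, x, y, yerr) ** 2).sum()     # sum of ((y - f)/sigma)^2
chi2_red = chi2 / dof                               # roughly 1 when model and error bars are consistent
print(chi2_red)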
>> >> > >> >> > Cheers, >> >> > >> >> > Ben >> >> > >> >> > On Sun, May 16, 2010 at 05:50, wrote: >> >> >> >> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >> >> >> wrote: >> >> >> > >> >> >> > >> >> >> > On Fri, May 14, 2010 at 14:51, wrote: >> >> >> >> >> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >> >> >> >> >> >> >> >> wrote: >> >> >> >> > Hey, >> >> >> >> > >> >> >> >> > I am fairly new Scipy and am trying to do a least square fit to >> >> >> >> > a >> >> >> >> > set >> >> >> >> > of >> >> >> >> > data. Currently, I am using following code: >> >> >> >> > >> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> >> >> > pinit = [20,20.] >> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >> >> >> >> > full_output=1) >> >> >> >> > >> >> >> >> > I am now trying to get the goodness of fit out of this data. I >> >> >> >> > am >> >> >> >> > sort >> >> >> >> > of >> >> >> >> > running into a brick wall because I found a lot of conflicting >> >> >> >> > ways >> >> >> >> > of >> >> >> >> > how >> >> >> >> > to calculate it. >> >> >> >> >> >> >> >> For regression the usual is >> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> >> >> >> coefficient of determination is >> >> >> >> >> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> >> >> >> >> >> >> Note your fitfunc is linear in parameters and can be better >> >> >> >> estimated >> >> >> >> by linear least squares, OLS. >> >> >> >> linear regression is handled in statsmodels and you can get lot's >> >> >> >> of >> >> >> >> statistics without worrying about the formulas. >> >> >> >> If you only have one slope parameter, then scipy.stats.linregress >> >> >> >> also >> >> >> >> works >> >> >> >> >> >> >> > >> >> >> > Thanks for the information. I am still note quite sure if this is >> >> >> > what >> >> >> > my >> >> >> > boss wants because there should not be an average y value. >> >> >> >> >> >> The definition of Rsquared is pretty uncontroversial with the >> >> >> y.mean() >> >> >> correction, if there is a constant in the regression (although I >> >> >> know >> >> >> mainly the linear case for this). >> >> >> >> >> >> If there is no constant in the regression, the definition or >> >> >> Rsquared >> >> >> is not clear/unambiguous, but usually used without mean correction >> >> >> of >> >> >> y. >> >> >> >> >> >> Josef >> >> >> >> >> >> > >> >> >> >> >> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the >> >> >> >> covariance >> >> >> >> of the parameter estimates. >> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >> >> >> > >> >> >> > I have been trying this out, but the fit just looks horrid >> >> >> > compared >> >> >> > to >> >> >> > using >> >> >> > leastsq method even though they call the same function according >> >> >> > to >> >> >> > the >> >> >> > documentation. >> >> >> > >> >> >> >> >> >> >> >> > I am aware of the chisquare function in stats function, but the >> >> >> >> > documentation seems a little confusing to me. Any help would be >> >> >> >> > greatly >> >> >> >> > appreciates. >> >> >> >> >> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing >> >> >> >> the >> >> >> >> goodness-of-fit of entire distributions, not for how well a curve >> >> >> >> or >> >> >> >> line fits the data. 
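To make the distinction quoted above concrete, a small sketch of what those tests actually operate on: a sample (Kolmogorov-Smirnov) or binned counts (Pearson chi-square) compared against a reference distribution, not a fitted curve against (x, y) points. The N(0, 1) sample and the bin edges below are made up for the example:

import numpy as np
from scipy import stats

sample = np.random.normal(loc=0.0, scale=1.0, size=200)

# Kolmogorov-Smirnov: does the sample look like draws from N(0, 1)?
ks_stat, ks_pvalue = stats.kstest(sample, 'norm')

# Pearson chi-square: observed counts per bin vs the counts N(0, 1) predicts
edges = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
observed, _ = np.histogram(sample, bins=edges)
probs = np.diff(stats.norm.cdf(edges))
expected = probs / probs.sum() * observed.sum()     # normalized so the totals agree
chi2_stat, chi2_pvalue = stats.chisquare(observed, expected)

print(ks_pvalue, chi2_pvalue)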
>> >> >> >> >> >> >> > >> >> >> > That is what I thought, which brought up my confusion when I asked >> >> >> > other >> >> >> > people and they told me to use that >> >> >> > >> >> >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> > >> >> >> >> > Thanks very much in advance. >> >> >> >> > >> >> >> >> > Cheers, >> >> >> >> > >> >> >> >> > Ben >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > _______________________________________________ >> >> >> >> > SciPy-User mailing list >> >> >> >> > SciPy-User at scipy.org >> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> > >> >> >> >> > >> >> >> >> _______________________________________________ >> >> >> >> SciPy-User mailing list >> >> >> >> SciPy-User at scipy.org >> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Benedikt Riedel >> >> >> > Graduate Student University of Wisconsin-Madison >> >> >> > Department of Physics >> >> >> > Office: 2304 Chamberlin Hall >> >> >> > Lab: 6247 Chamberlin Hall >> >> >> > Tel: ?(608) 301-5736 >> >> >> > Cell: (213) 519-1771 >> >> >> > Lab: (608) 262-5916 >> >> >> > >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> > >> >> > -- >> >> > Benedikt Riedel >> >> > Graduate Student University of Wisconsin-Madison >> >> > Department of Physics >> >> > Office: 2304 Chamberlin Hall >> >> > Lab: 6247 Chamberlin Hall >> >> > Tel: ?(608) 301-5736 >> >> > Cell: (213) 519-1771 >> >> > Lab: (608) 262-5916 >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > -- >> > Benedikt Riedel >> > Graduate Student University of Wisconsin-Madison >> > Department of Physics >> > Office: 2304 Chamberlin Hall >> > Lab: 6247 Chamberlin Hall >> > Tel: ?(608) 301-5736 >> > Cell: (213) 519-1771 >> > Lab: (608) 262-5916 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From georges.schutz at internet.lu Mon May 17 08:39:34 2010 From: georges.schutz at internet.lu (Georges Schutz) Date: Mon, 17 May 2010 14:39:34 +0200 Subject: [SciPy-User] scikits.timeseries: How to define frequency of 15minutes In-Reply-To: References: Message-ID: <4BF13906.9030707@internet.lu> Hi Martin, It is good to hear 
that there are others facing the same problem because this my raise the importance of that issue for future plans. The solution you propose would be OK for me, I think I could live a while with being restricted to the proposed frequencies even if I would look foreword to customizable frequency on the long term. Thanks Georges Schutz On 05/05/2010 12:03, Martin Felder wrote: > Hi *, > > just for the record, I'm having the exact same problem as Georges. I > read through your discussion from three weeks ago, but I also don't feel > up to modifying the C code myself (being a Fortran kind of guy...). > > I understand implementing custom user-defined frequencies is probably a > lot of effort, but maybe it's less troublesome to just add some > frequencies often used (=by Georges and me, and hopefully others?) to > the currently implemented ones? I'd be extremely happy to have 12h, 6h, > 3h, 15min and 10min intervals in addition to the existing ones. > > If you could point me to the part of the code that would have to be > modified for that, maybe I can find someone more apt in C who can > implement it. > > Thanks, > Martin > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Mon May 17 11:20:13 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 May 2010 11:20:13 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 7:35 AM, wrote: > On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel wrote: >> Thanks for the clarification. I am still not sure how to get the chi-squared >> value of my regression though. When I use the formula under "Regression >> Analysis" here >> >> http://en.wikipedia.org/wiki/Goodness_of_fit >> >> I get a chi-square somewhere around 19, which seems way to large compared to >> the value of 3.2 I get for the same data set when I fit it using gnuplot. >> Where gnuplot supposedly used the weighted sum of squares of residuals. I do >> not fully this because of the results I get. >> >> Here is the python code I used: >> >> chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), >> 2)/pow(R4errctsdataselect,2)))/dof > > > from some gnuplot help page it looks like what they call chisquare is WSSR/dof > > which would be something like > > chi2=(sum( ?( R4ctsdataselect-fitfunc(pinit, tau)) / > sqrt(R4errctsdataselect) )**2 ?)/dof > > I'm not sure whether the sqrt is in there or not, because I don't > remember the normalization that is used, weights or weights squared (for reference) gnuplot is pretty vague on the denominator http://theochem.ki.ku.dk/on_line_docs/gnuplot/gnuplot_21.html#SEC81 a bit better explanation of the terminology http://www.graphpad.com/faq/viewfaq.cfm?faq=926 any more explicit reference has sigma in the denominator Josef > Josef > > > > >> >> Sorry for being so thick headed, statistics is just beyond me at times. >> >> Cheers, >> >> Ben >> >> On Mon, May 17, 2010 at 00:20, wrote: >>> >>> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel >>> wrote: >>> > >>> > >>> > On Sun, May 16, 2010 at 22:33, wrote: >>> >> >>> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel >>> >> wrote: >>> >> > What I still do not understand is the fact that curve_fit gives me a >>> >> > different output then leastsq, even though curve_fit calls leastsq. 
>>> >> > >>> >> > I tried to get the chi-squared because we want to plot contours of >>> >> > chi-square from the minimum to the maximum. I used following code: >>> >> > >>> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >>> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >>> >> > pinit = [20,20.] >>> >> > >>> >> > def func(x, a, b): >>> >> > ???? return a*exp(-x) + b >>> >> > >>> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, >>> >> > sigma=R4errctsdataselect) >>> >> >>> >> this uses weighted least squares >>> >> sigma : None or N-length sequence >>> >> ? ?If not None, it represents the standard-deviation of ydata. This >>> >> vector, if given, will be used as weights in the least-squares problem >>> >> >>> >> In your initial example with leastsq you don't have any weighting, >>> >> it's just ordinary least squares >>> >> >>> >> maybe that's the difference. >>> >> >>> >> >>> > >>> > Yeah I guess that will be it. >>> > >>> >> >>> >> > print pfinal >>> >> > print covar >>> >> > dof=size(tau)-size(pinit) >>> >> > print dof >>> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, >>> >> > tau)))/dof >>> >> > print chi2 >>> >> > >>> >> > I am not 100% sure I am doing the degrees of freedom calculation >>> >> > right. >>> >> > I >>> >> > got the chi-square formula from the Pearson chi-squared test. >>> >> >>> >> I don't recognize your formula for chi2, and I don't see the >>> >> connection to Pearson chi-squared test . >>> >> >>> >> Do you have a reference? >>> >> >>> > >>> > I based my use of the Pearson test from what I read in an Econometrics >>> > book, >>> > but wiki has the a pretty good description. I basically based it off the >>> > example there. Where the expected would be what comes out of the fit and >>> > what you is the "R4ctsdataselect" for those specific values. >>> > >>> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test >>> >>> I looked at that, but it's a completely different case, the values in >>> the formulas are frequencies >>> >>> ? ?Oi = an observed frequency; >>> ? ?Ei = an expected (theoretical) frequency, asserted by the null >>> hypothesis; >>> >>> not points on a regression curve >>> >>> Josef >>> >>> > >>> > >>> >> >>> >> Josef >>> >> >>> > >>> > Thanks again >>> > >>> > Ben >>> > >>> > >>> >> >>> >> > >>> >> > Thank you very much for the help so far. >>> >> > >>> >> > Cheers, >>> >> > >>> >> > Ben >>> >> > >>> >> > On Sun, May 16, 2010 at 05:50, wrote: >>> >> >> >>> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >>> >> >> wrote: >>> >> >> > >>> >> >> > >>> >> >> > On Fri, May 14, 2010 at 14:51, wrote: >>> >> >> >> >>> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >>> >> >> >> >>> >> >> >> wrote: >>> >> >> >> > Hey, >>> >> >> >> > >>> >> >> >> > I am fairly new Scipy and am trying to do a least square fit to >>> >> >> >> > a >>> >> >> >> > set >>> >> >> >> > of >>> >> >> >> > data. Currently, I am using following code: >>> >> >> >> > >>> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >>> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >>> >> >> >> > pinit = [20,20.] >>> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >>> >> >> >> > full_output=1) >>> >> >> >> > >>> >> >> >> > I am now trying to get the goodness of fit out of this data. I >>> >> >> >> > am >>> >> >> >> > sort >>> >> >> >> > of >>> >> >> >> > running into a brick wall because I found a lot of conflicting >>> >> >> >> > ways >>> >> >> >> > of >>> >> >> >> > how >>> >> >> >> > to calculate it. 
>>> >> >> >> >>> >> >> >> For regression the usual is >>> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >>> >> >> >> coefficient of determination is >>> >> >> >> >>> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >>> >> >> >> >>> >> >> >> Note your fitfunc is linear in parameters and can be better >>> >> >> >> estimated >>> >> >> >> by linear least squares, OLS. >>> >> >> >> linear regression is handled in statsmodels and you can get lot's >>> >> >> >> of >>> >> >> >> statistics without worrying about the formulas. >>> >> >> >> If you only have one slope parameter, then scipy.stats.linregress >>> >> >> >> also >>> >> >> >> works >>> >> >> >> >>> >> >> > >>> >> >> > Thanks for the information. I am still note quite sure if this is >>> >> >> > what >>> >> >> > my >>> >> >> > boss wants because there should not be an average y value. >>> >> >> >>> >> >> The definition of Rsquared is pretty uncontroversial with the >>> >> >> y.mean() >>> >> >> correction, if there is a constant in the regression (although I >>> >> >> know >>> >> >> mainly the linear case for this). >>> >> >> >>> >> >> If there is no constant in the regression, the definition or >>> >> >> Rsquared >>> >> >> is not clear/unambiguous, but usually used without mean correction >>> >> >> of >>> >> >> y. >>> >> >> >>> >> >> Josef >>> >> >> >>> >> >> > >>> >> >> >> >>> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the >>> >> >> >> covariance >>> >> >> >> of the parameter estimates. >>> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >>> >> >> > >>> >> >> > I have been trying this out, but the fit just looks horrid >>> >> >> > compared >>> >> >> > to >>> >> >> > using >>> >> >> > leastsq method even though they call the same function according >>> >> >> > to >>> >> >> > the >>> >> >> > documentation. >>> >> >> > >>> >> >> >> >>> >> >> >> > I am aware of the chisquare function in stats function, but the >>> >> >> >> > documentation seems a little confusing to me. Any help would be >>> >> >> >> > greatly >>> >> >> >> > appreciates. >>> >> >> >> >>> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing >>> >> >> >> the >>> >> >> >> goodness-of-fit of entire distributions, not for how well a curve >>> >> >> >> or >>> >> >> >> line fits the data. >>> >> >> >> >>> >> >> > >>> >> >> > That is what I thought, which brought up my confusion when I asked >>> >> >> > other >>> >> >> > people and they told me to use that >>> >> >> > >>> >> >> >> >>> >> >> >> Josef >>> >> >> >> >>> >> >> >> > >>> >> >> >> > Thanks very much in advance. 
>>> >> >> >> > >>> >> >> >> > Cheers, >>> >> >> >> > >>> >> >> >> > Ben >>> >> >> >> > >>> >> >> >> > >>> >> >> >> > >>> >> >> >> > _______________________________________________ >>> >> >> >> > SciPy-User mailing list >>> >> >> >> > SciPy-User at scipy.org >>> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> > >>> >> >> >> > >>> >> >> >> _______________________________________________ >>> >> >> >> SciPy-User mailing list >>> >> >> >> SciPy-User at scipy.org >>> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > -- >>> >> >> > Benedikt Riedel >>> >> >> > Graduate Student University of Wisconsin-Madison >>> >> >> > Department of Physics >>> >> >> > Office: 2304 Chamberlin Hall >>> >> >> > Lab: 6247 Chamberlin Hall >>> >> >> > Tel: ?(608) 301-5736 >>> >> >> > Cell: (213) 519-1771 >>> >> >> > Lab: (608) 262-5916 >>> >> >> > >>> >> >> > _______________________________________________ >>> >> >> > SciPy-User mailing list >>> >> >> > SciPy-User at scipy.org >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> > >>> >> >> > >>> >> >> _______________________________________________ >>> >> >> SciPy-User mailing list >>> >> >> SciPy-User at scipy.org >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > >>> >> > >>> >> > >>> >> > -- >>> >> > Benedikt Riedel >>> >> > Graduate Student University of Wisconsin-Madison >>> >> > Department of Physics >>> >> > Office: 2304 Chamberlin Hall >>> >> > Lab: 6247 Chamberlin Hall >>> >> > Tel: ?(608) 301-5736 >>> >> > Cell: (213) 519-1771 >>> >> > Lab: (608) 262-5916 >>> >> > >>> >> > _______________________________________________ >>> >> > SciPy-User mailing list >>> >> > SciPy-User at scipy.org >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > >>> >> > >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> > >>> > -- >>> > Benedikt Riedel >>> > Graduate Student University of Wisconsin-Madison >>> > Department of Physics >>> > Office: 2304 Chamberlin Hall >>> > Lab: 6247 Chamberlin Hall >>> > Tel: ?(608) 301-5736 >>> > Cell: (213) 519-1771 >>> > Lab: (608) 262-5916 >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> -- >> Benedikt Riedel >> Graduate Student University of Wisconsin-Madison >> Department of Physics >> Office: 2304 Chamberlin Hall >> Lab: 6247 Chamberlin Hall >> Tel: ?(608) 301-5736 >> Cell: (213) 519-1771 >> Lab: (608) 262-5916 >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From briedel at wisc.edu Mon May 17 13:26:59 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Mon, 17 May 2010 12:26:59 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: Thanks for the references. I have adjusted the code, such that R4errctsdataselect is sigma and not sigma_squared. 
Oddly, enough I made a stupid mistake by using the original guess for parameters rather than final guess of parameters in my chi-squared, which of course threw me off. Thanks again for the help. Cheers, Ben On Mon, May 17, 2010 at 10:20, wrote: > On Mon, May 17, 2010 at 7:35 AM, wrote: > > On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel > wrote: > >> Thanks for the clarification. I am still not sure how to get the > chi-squared > >> value of my regression though. When I use the formula under "Regression > >> Analysis" here > >> > >> http://en.wikipedia.org/wiki/Goodness_of_fit > >> > >> I get a chi-square somewhere around 19, which seems way to large > compared to > >> the value of 3.2 I get for the same data set when I fit it using > gnuplot. > >> Where gnuplot supposedly used the weighted sum of squares of residuals. > I do > >> not fully this because of the results I get. > >> > >> Here is the python code I used: > >> > >> chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), > >> 2)/pow(R4errctsdataselect,2)))/dof > > > > > > from some gnuplot help page it looks like what they call chisquare is > WSSR/dof > > > > which would be something like > > > > chi2=(sum( ( R4ctsdataselect-fitfunc(pinit, tau)) / > > sqrt(R4errctsdataselect) )**2 )/dof > > > > I'm not sure whether the sqrt is in there or not, because I don't > > remember the normalization that is used, weights or weights squared > > (for reference) > gnuplot is pretty vague on the denominator > http://theochem.ki.ku.dk/on_line_docs/gnuplot/gnuplot_21.html#SEC81 > > a bit better explanation of the terminology > http://www.graphpad.com/faq/viewfaq.cfm?faq=926 > > any more explicit reference has sigma in the denominator > > Josef > > > Josef > > > > > > > > > >> > >> Sorry for being so thick headed, statistics is just beyond me at times. > >> > >> Cheers, > >> > >> Ben > >> > >> On Mon, May 17, 2010 at 00:20, wrote: > >>> > >>> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel > >>> wrote: > >>> > > >>> > > >>> > On Sun, May 16, 2010 at 22:33, wrote: > >>> >> > >>> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel > >>> >> wrote: > >>> >> > What I still do not understand is the fact that curve_fit gives me > a > >>> >> > different output then leastsq, even though curve_fit calls > leastsq. > >>> >> > > >>> >> > I tried to get the chi-squared because we want to plot contours of > >>> >> > chi-square from the minimum to the maximum. I used following code: > >>> >> > > >>> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >>> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >>> >> > pinit = [20,20.] > >>> >> > > >>> >> > def func(x, a, b): > >>> >> > return a*exp(-x) + b > >>> >> > > >>> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > >>> >> > sigma=R4errctsdataselect) > >>> >> > >>> >> this uses weighted least squares > >>> >> sigma : None or N-length sequence > >>> >> If not None, it represents the standard-deviation of ydata. This > >>> >> vector, if given, will be used as weights in the least-squares > problem > >>> >> > >>> >> In your initial example with leastsq you don't have any weighting, > >>> >> it's just ordinary least squares > >>> >> > >>> >> maybe that's the difference. > >>> >> > >>> >> > >>> > > >>> > Yeah I guess that will be it. 
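A tiny numerical illustration of that fix, again with synthetic data: the reduced chi-square is only meaningful when the residuals are evaluated at the fitted parameters; at the starting guess it comes out much larger, which is exactly the symptom described above.

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)
errfunc = lambda p, x, y, s: (y - fitfunc(p, x)) / s

x = np.linspace(0, 5, 50)
yerr = 0.5 * np.ones(x.size)
y = 18 + 25 * np.exp(-x) + yerr * np.random.randn(x.size)   # true parameters differ from the guess

pinit = [20, 20.]
pfinal, ier = leastsq(errfunc, pinit, args=(x, y, yerr))
dof = x.size - len(pinit)

chi2_at_guess = (errfunc(pinit, x, y, yerr) ** 2).sum() / dof   # inflated by the wrong parameters
chi2_at_fit = (errfunc(pfinal, x, y, yerr) ** 2).sum() / dof    # the number that actually matters
print(chi2_at_guess, chi2_at_fit)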
> >>> > > >>> >> > >>> >> > print pfinal > >>> >> > print covar > >>> >> > dof=size(tau)-size(pinit) > >>> >> > print dof > >>> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), > 2)/fitfunc(pinit, > >>> >> > tau)))/dof > >>> >> > print chi2 > >>> >> > > >>> >> > I am not 100% sure I am doing the degrees of freedom calculation > >>> >> > right. > >>> >> > I > >>> >> > got the chi-square formula from the Pearson chi-squared test. > >>> >> > >>> >> I don't recognize your formula for chi2, and I don't see the > >>> >> connection to Pearson chi-squared test . > >>> >> > >>> >> Do you have a reference? > >>> >> > >>> > > >>> > I based my use of the Pearson test from what I read in an > Econometrics > >>> > book, > >>> > but wiki has the a pretty good description. I basically based it off > the > >>> > example there. Where the expected would be what comes out of the fit > and > >>> > what you is the "R4ctsdataselect" for those specific values. > >>> > > >>> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test > >>> > >>> I looked at that, but it's a completely different case, the values in > >>> the formulas are frequencies > >>> > >>> Oi = an observed frequency; > >>> Ei = an expected (theoretical) frequency, asserted by the null > >>> hypothesis; > >>> > >>> not points on a regression curve > >>> > >>> Josef > >>> > >>> > > >>> > > >>> >> > >>> >> Josef > >>> >> > >>> > > >>> > Thanks again > >>> > > >>> > Ben > >>> > > >>> > > >>> >> > >>> >> > > >>> >> > Thank you very much for the help so far. > >>> >> > > >>> >> > Cheers, > >>> >> > > >>> >> > Ben > >>> >> > > >>> >> > On Sun, May 16, 2010 at 05:50, wrote: > >>> >> >> > >>> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel < > briedel at wisc.edu> > >>> >> >> wrote: > >>> >> >> > > >>> >> >> > > >>> >> >> > On Fri, May 14, 2010 at 14:51, wrote: > >>> >> >> >> > >>> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel > >>> >> >> >> > >>> >> >> >> wrote: > >>> >> >> >> > Hey, > >>> >> >> >> > > >>> >> >> >> > I am fairly new Scipy and am trying to do a least square fit > to > >>> >> >> >> > a > >>> >> >> >> > set > >>> >> >> >> > of > >>> >> >> >> > data. Currently, I am using following code: > >>> >> >> >> > > >>> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >>> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >>> >> >> >> > pinit = [20,20.] > >>> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > >>> >> >> >> > full_output=1) > >>> >> >> >> > > >>> >> >> >> > I am now trying to get the goodness of fit out of this data. > I > >>> >> >> >> > am > >>> >> >> >> > sort > >>> >> >> >> > of > >>> >> >> >> > running into a brick wall because I found a lot of > conflicting > >>> >> >> >> > ways > >>> >> >> >> > of > >>> >> >> >> > how > >>> >> >> >> > to calculate it. > >>> >> >> >> > >>> >> >> >> For regression the usual is > >>> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >>> >> >> >> coefficient of determination is > >>> >> >> >> > >>> >> >> >> R^2 = 1 - {SS_{err} / SS_{tot}} > >>> >> >> >> > >>> >> >> >> Note your fitfunc is linear in parameters and can be better > >>> >> >> >> estimated > >>> >> >> >> by linear least squares, OLS. > >>> >> >> >> linear regression is handled in statsmodels and you can get > lot's > >>> >> >> >> of > >>> >> >> >> statistics without worrying about the formulas. 
> >>> >> >> >> If you only have one slope parameter, then > scipy.stats.linregress > >>> >> >> >> also > >>> >> >> >> works > >>> >> >> >> > >>> >> >> > > >>> >> >> > Thanks for the information. I am still note quite sure if this > is > >>> >> >> > what > >>> >> >> > my > >>> >> >> > boss wants because there should not be an average y value. > >>> >> >> > >>> >> >> The definition of Rsquared is pretty uncontroversial with the > >>> >> >> y.mean() > >>> >> >> correction, if there is a constant in the regression (although I > >>> >> >> know > >>> >> >> mainly the linear case for this). > >>> >> >> > >>> >> >> If there is no constant in the regression, the definition or > >>> >> >> Rsquared > >>> >> >> is not clear/unambiguous, but usually used without mean > correction > >>> >> >> of > >>> >> >> y. > >>> >> >> > >>> >> >> Josef > >>> >> >> > >>> >> >> > > >>> >> >> >> > >>> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the > >>> >> >> >> covariance > >>> >> >> >> of the parameter estimates. > >>> >> >> >> > http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > >>> >> >> > > >>> >> >> > I have been trying this out, but the fit just looks horrid > >>> >> >> > compared > >>> >> >> > to > >>> >> >> > using > >>> >> >> > leastsq method even though they call the same function > according > >>> >> >> > to > >>> >> >> > the > >>> >> >> > documentation. > >>> >> >> > > >>> >> >> >> > >>> >> >> >> > I am aware of the chisquare function in stats function, but > the > >>> >> >> >> > documentation seems a little confusing to me. Any help would > be > >>> >> >> >> > greatly > >>> >> >> >> > appreciates. > >>> >> >> >> > >>> >> >> >> chisquare and others like kolmogorov-smirnov are more for > testing > >>> >> >> >> the > >>> >> >> >> goodness-of-fit of entire distributions, not for how well a > curve > >>> >> >> >> or > >>> >> >> >> line fits the data. > >>> >> >> >> > >>> >> >> > > >>> >> >> > That is what I thought, which brought up my confusion when I > asked > >>> >> >> > other > >>> >> >> > people and they told me to use that > >>> >> >> > > >>> >> >> >> > >>> >> >> >> Josef > >>> >> >> >> > >>> >> >> >> > > >>> >> >> >> > Thanks very much in advance. 
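For completeness, a sketch of the "linear in parameters" point quoted above: y = p[0] + p[1]*exp(-x) is a straight line in the transformed regressor z = exp(-x), so scipy.stats.linregress can estimate both parameters directly and also reports r (hence R^2). x and y are synthetic stand-ins for the poster's data:

import numpy as np
from scipy import stats

x = np.linspace(0, 5, 50)
y = 20 + 20 * np.exp(-x) + np.random.normal(scale=0.5, size=x.size)

z = np.exp(-x)                                 # transformed regressor
slope, intercept, r, p, stderr = stats.linregress(z, y)

print(intercept, slope)                        # play the roles of p[0] and p[1]
print(r ** 2)                                  # coefficient of determination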
> >>> >> >> >> > > >>> >> >> >> > Cheers, > >>> >> >> >> > > >>> >> >> >> > Ben > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> > _______________________________________________ > >>> >> >> >> > SciPy-User mailing list > >>> >> >> >> > SciPy-User at scipy.org > >>> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> _______________________________________________ > >>> >> >> >> SciPy-User mailing list > >>> >> >> >> SciPy-User at scipy.org > >>> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> > > >>> >> >> > > >>> >> >> > > >>> >> >> > -- > >>> >> >> > Benedikt Riedel > >>> >> >> > Graduate Student University of Wisconsin-Madison > >>> >> >> > Department of Physics > >>> >> >> > Office: 2304 Chamberlin Hall > >>> >> >> > Lab: 6247 Chamberlin Hall > >>> >> >> > Tel: (608) 301-5736 > >>> >> >> > Cell: (213) 519-1771 > >>> >> >> > Lab: (608) 262-5916 > >>> >> >> > > >>> >> >> > _______________________________________________ > >>> >> >> > SciPy-User mailing list > >>> >> >> > SciPy-User at scipy.org > >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> > > >>> >> >> > > >>> >> >> _______________________________________________ > >>> >> >> SciPy-User mailing list > >>> >> >> SciPy-User at scipy.org > >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> > > >>> >> > > >>> >> > -- > >>> >> > Benedikt Riedel > >>> >> > Graduate Student University of Wisconsin-Madison > >>> >> > Department of Physics > >>> >> > Office: 2304 Chamberlin Hall > >>> >> > Lab: 6247 Chamberlin Hall > >>> >> > Tel: (608) 301-5736 > >>> >> > Cell: (213) 519-1771 > >>> >> > Lab: (608) 262-5916 > >>> >> > > >>> >> > _______________________________________________ > >>> >> > SciPy-User mailing list > >>> >> > SciPy-User at scipy.org > >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> > > >>> >> _______________________________________________ > >>> >> SciPy-User mailing list > >>> >> SciPy-User at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> > > >>> > > >>> > -- > >>> > Benedikt Riedel > >>> > Graduate Student University of Wisconsin-Madison > >>> > Department of Physics > >>> > Office: 2304 Chamberlin Hall > >>> > Lab: 6247 Chamberlin Hall > >>> > Tel: (608) 301-5736 > >>> > Cell: (213) 519-1771 > >>> > Lab: (608) 262-5916 > >>> > > >>> > _______________________________________________ > >>> > SciPy-User mailing list > >>> > SciPy-User at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> > > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > >> > >> -- > >> Benedikt Riedel > >> Graduate Student University of Wisconsin-Madison > >> Department of Physics > >> Office: 2304 Chamberlin Hall > >> Lab: 6247 Chamberlin Hall > >> Tel: (608) 301-5736 > >> Cell: (213) 519-1771 > >> Lab: (608) 262-5916 > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 
Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon May 17 13:32:23 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 17 May 2010 13:32:23 -0400 Subject: [SciPy-User] sparse array hstack In-Reply-To: References: Message-ID: On Thu, May 13, 2010 at 10:26 AM, Jason Rennie wrote: > It appears that numpy.hstack doesn't work with scipy sparse arrays. ?I'm > using scipy 0.6.0 (Debian stable). ?Am I observing correctly? ?Does a later > version of numpy/scipy fix this? ?Or, is there code available which will do > an hstack on sparse arrays? > Thanks, > Jason You want to use scipy.sparse.hstack, which works for me with recent scipy trunk In [20]: from scipy import sparse In [21]: a = sparse.lil_matrix((10,10)) In [22]: a[0,0]=100 In [23]: b = sparse.lil_matrix((10,10)) In [24]: b[0,0] = 99 In [25]: c = sparse.hstack([a,b]) In [26]: c.toarray() Skipper From wesmckinn at gmail.com Mon May 17 13:45:48 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 17 May 2010 13:45:48 -0400 Subject: [SciPy-User] scikits.timeseries: How to define frequency of 15minutes In-Reply-To: <4BF13906.9030707@internet.lu> References: <4BF13906.9030707@internet.lu> Message-ID: On Mon, May 17, 2010 at 8:39 AM, Georges Schutz wrote: > Hi Martin, > It is good to hear that there are others facing the same problem because > this my raise the importance of that issue for future plans. > > The solution you propose would be OK for me, I think I could live a > while with being restricted to the proposed frequencies even if I would > look foreword to customizable frequency on the long term. > > Thanks > Georges Schutz > > On 05/05/2010 12:03, Martin Felder wrote: >> Hi *, >> >> just for the record, I'm having the exact same problem as Georges. I >> read through your discussion from three weeks ago, but I also don't feel >> up to modifying the C code myself (being a Fortran kind of guy...). >> >> I understand implementing custom user-defined frequencies is probably a >> lot of effort, but maybe it's less troublesome to just add some >> frequencies often used (=by Georges and me, and hopefully others?) to >> the currently implemented ones? I'd be extremely happy to have 12h, 6h, >> 3h, 15min and 10min intervals in addition to the existing ones. >> >> If you could point me to the part of the code that would have to be >> modified for that, maybe I can find someone more apt in C who can >> implement it. >> >> Thanks, >> Martin >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > On this note and per an offline discussion I had with Martin-- I'd be interested to see what people think about the approach I've taken to dealing with this problem in pandas (http://code.google.com/p/pandas/). For example, it's relatively trivial to do something like: offset = Minute(15) ts_15min = ts.asfreq(offset) and to fill forward, interpolate the resulting series, among other things. One of the key differences between the pandas data structures and scikits.timeseries.TimeSeries is that data is not required to be fixed-frequency, but can be explicitly "reindexed" to the desired frequency. 
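A toy sketch of that reindexing idiom may help make this concrete. The dates, values and 15-minute grid below are made up, and the grid is built from plain datetime objects rather than pandas' own date-range helper (which I believe exists but is not shown here), so treat this as a sketch of the idea rather than a statement of the pandas API:

from datetime import datetime, timedelta
import numpy as np
from pandas import Series

# a hypothetical, irregularly sampled series
obs_times = [datetime(2010, 5, 17, 9, 3),
             datetime(2010, 5, 17, 9, 22),
             datetime(2010, 5, 17, 9, 50)]
ts = Series(np.array([1.0, 2.0, 3.0]), index=obs_times)

# a regular 15-minute "date range of interest"
start = datetime(2010, 5, 17, 9, 0)
grid = [start + timedelta(minutes=15 * i) for i in range(5)]

# conform the irregular data to the grid; slots with no observation
# come back as NaN and can then be filled forward or interpolated
conformed = ts.reindex(grid)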
I find in my applications that I will often generate a "date range of interest" (with the desired frequency) and then conform all my data to that date range, e.g.: conformed_data = data.reindex(date_range) Of course you trade performance for flexibility. But IO is still by and large the biggest bottleneck I've encountered. From vanforeest at gmail.com Mon May 17 15:10:21 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 17 May 2010 21:10:21 +0200 Subject: [SciPy-User] python for physics In-Reply-To: <20100516165143.GF19278@phare.normalesup.org> References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Hi, You might also find Hans Langtangen's book on scientific computation with pyhon interesting. Nicky On 16 May 2010 18:51, Gael Varoquaux wrote: > On Sun, May 16, 2010 at 05:10:40PM +0100, alexander baker wrote: >> ? ?3 friends Physics friends of mine are looking for a starting point to >> ? ?learn scientific computing in Python relevant to applied Physics, does >> ? ?anyone have any suggestions, hints or event a deck of slides that could be >> ? ?useful? > > This is not really physics-related, and is more oriented towards image > analysis than Physics, and on top of that it is unfinished, and I have > been shying from publishing on the net, but the notes of the courses I > give can be found here: > http://gael-varoquaux.info/python4science-2x1.pdf > > Also, see Fernando's py4science page, full of useful material: > http://fperez.org/py4science/starter_kit.html > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jlconlin at gmail.com Mon May 17 15:20:53 2010 From: jlconlin at gmail.com (Jeremy Conlin) Date: Mon, 17 May 2010 13:20:53 -0600 Subject: [SciPy-User] python for physics In-Reply-To: <20100516165143.GF19278@phare.normalesup.org> References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Ga?l, Thanks for posting these links, they look like a really good introduction which I can use to help my coworkers. (I'm not even the original poster.) One question though is how you got the output from iPython into your document. Of course you could just copy and paste it in, but for some reason I believe you have this process automated. Is it automated and are you willing to share how you did it? Thanks, Jeremy > This is not really physics-related, and is more oriented towards image > analysis than Physics, and on top of that it is unfinished, and I have > been shying from publishing on the net, but the notes of the courses I > give can be found here: > http://gael-varoquaux.info/python4science-2x1.pdf > > Also, see Fernando's py4science page, full of useful material: > http://fperez.org/py4science/starter_kit.html > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From baker.alexander at gmail.com Mon May 17 15:47:06 2010 From: baker.alexander at gmail.com (alexander baker) Date: Mon, 17 May 2010 20:47:06 +0100 Subject: [SciPy-User] python for physics In-Reply-To: References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Thank you all for the links thus far, I aim to try out the docs with folks in early part of June so will come back with some feedback on how things get on. Alex Mobile: 07788 872118 Blog: www.alexfb.com -- All science is either physics or stamp collecting. 
On 17 May 2010 20:20, Jeremy Conlin wrote: > Ga?l, > > Thanks for posting these links, they look like a really good > introduction which I can use to help my coworkers. (I'm not even the > original poster.) > > One question though is how you got the output from iPython into your > document. Of course you could just copy and paste it in, but for some > reason I believe you have this process automated. Is it automated and > are you willing to share how you did it? > > Thanks, > Jeremy > > > > This is not really physics-related, and is more oriented towards image > > analysis than Physics, and on top of that it is unfinished, and I have > > been shying from publishing on the net, but the notes of the courses I > > give can be found here: > > http://gael-varoquaux.info/python4science-2x1.pdf > > > > Also, see Fernando's py4science page, full of useful material: > > http://fperez.org/py4science/starter_kit.html > > > > Ga?l > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > >_______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Mon May 17 15:57:44 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 17 May 2010 21:57:44 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: Hi Josef, Thanks for the answer. > Actually, if the onepoint distribution directly subclasses rv_generic > then it wouldn't rely on or interfere with the generic framework in > rv_continuous or rv_discrete (where it wouldn't really fit in if > onepoint is on reals), and it might be relatively easy to provide all > the methods of the distributions for a single point distribution. I must admit that I haven't had a look at the innards of rv_generic, so I am afraid I cannot be of any relevant help in this respect. > > Choice of name: > to me, "deterministic random variable" sounds like an oxymoron, > although I found some references to deterministic distribution (mainly > or exclusively in queuing theory and > http://isi.cbs.nl/glossary/term902.htm) > I would prefer a boring "onepoint" distribution, or "degenerate", or ... ? Degenerate seems nice to me. I just checked the book Probability by Shiryaev, and he also uses the word `degenerate'. Interestingly, he introduces the degenerate distribution as the normal distribution with sigma = 0. I suspect that implementing the degenerate distribution like this is utterly stupid. > Can you file a ticket with what you would like to have? Sure. Sorry for bothering you with this, but how? > > I started to work again a bit on enhancing the distributions, mainly > I'm experimenting with several generic estimation methods. My target > is to have a working estimator for any distribution in scipy.stats and > for several additional distributions. This seems a nice idea, but quite ambitious. Have you also thought about estimators for heavy tailed distributions? This is, as far as I know, a very delicate topic. > > I worry a bit that a deterministic distribution might not fit into a > general framework for distributions and might need to be special cased > for some methods. (but see above) This must be fairly easy. Just the mean can be relevant. 
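For what it's worth, a minimal stand-alone sketch of such a one-point distribution is easy to write. This is deliberately not hooked into rv_continuous/rv_generic, since how (or whether) it should fit into that framework is exactly the open question here, and the class name is made up:

import numpy as np

class Degenerate(object):
    """Point mass at the value c (a sketch, not a scipy.stats subclass)."""
    def __init__(self, c=0.0):
        self.c = c
    def cdf(self, x):
        # step function: 0 below c, 1 at and above c
        return (np.asarray(x) >= self.c).astype(float)
    def sf(self, x):
        return 1.0 - self.cdf(x)
    def rvs(self, size=1):
        # every draw is c
        return self.c * np.ones(size)
    def mean(self):
        return self.c
    def var(self):
        return 0.0

Seen this way, Shiryaev's "normal with sigma = 0" description just says that this step-function cdf is the limit of the normal cdf as the scale goes to zero.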
> http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ I'll have a look. Thanks. Nicky From vanforeest at gmail.com Mon May 17 16:37:06 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 17 May 2010 22:37:06 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: While checking out your sandbox: >> http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ I came across the file stats_dhuard.py. Here you mention to use kernel density estimators to approximate densities. I suddenly recalled that I read Section 6.1.3 of Stachursky's book Economic Dynamics, theory and dynamics. This section on kernel density estimators may be quite (very?) useful for the problems you mentioned in another mail (using splines to approximate distributions). Nicky From jgomezdans at gmail.com Mon May 17 16:39:59 2010 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Mon, 17 May 2010 21:39:59 +0100 Subject: [SciPy-User] python for physics In-Reply-To: References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Hi, On 17 May 2010 20:20, Jeremy Conlin wrote: > One question though is how you got the output from iPython into your > document. Of course you could just copy and paste it in, but for some > reason I believe you have this process automated. Is it automated and > are you willing to share how you did it? I'm not Ga?l, but I think he used sphinxSee and in particular (ipython is here: < http://matplotlib.sourceforge.net/sampledoc/extensions.html#ipython-sessions >) Very easy to use and lovely results. Hope that helps, Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattknox.ca at gmail.com Mon May 17 18:59:13 2010 From: mattknox.ca at gmail.com (Matt Knox) Date: Mon, 17 May 2010 22:59:13 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?scikits=2Etimeseries=3A_How_to_define_freq?= =?utf-8?q?uency_of=0915minutes?= References: <4BF13906.9030707@internet.lu> Message-ID: Georges Schutz internet.lu> writes: > > Hi Martin, > It is good to hear that there are others facing the same problem because > this my raise the importance of that issue for future plans. > > The solution you propose would be OK for me, I think I could live a > while with being restricted to the proposed frequencies even if I would > look foreword to customizable frequency on the long term. > > Thanks > Georges Schutz > > On 05/05/2010 12:03, Martin Felder wrote: > > Hi *, > > > > just for the record, I'm having the exact same problem as Georges. I > > read through your discussion from three weeks ago, but I also don't feel > > up to modifying the C code myself (being a Fortran kind of guy...). > > > > I understand implementing custom user-defined frequencies is probably a > > lot of effort, but maybe it's less troublesome to just add some > > frequencies often used (=by Georges and me, and hopefully others?) to > > the currently implemented ones? I'd be extremely happy to have 12h, 6h, > > 3h, 15min and 10min intervals in addition to the existing ones. > > > > If you could point me to the part of the code that would have to be > > modified for that, maybe I can find someone more apt in C who can > > implement it. > > > > Thanks, > > Martin > > Sorry, missed this post earlier. The relevant C code is in the src and include subfolders in the c_dates.c and c_dates.h files. 
I don't have any objections to defining some extra frequencies like this as a stop gap solution along the way to a longer term more generic custom frequency solution. Or if Pierre feels that it is not appropriate to include these in the package, it should be easy enough to maintain a separate set of patches since that code doesn't really change much these days. And if it does change substantially it will likely be because we are doing a major overhaul of the package which would probably include support for custom frequencies anyway. - Matt PS. If you don't hear from Pierre or myself within several days on questions like this, feel free to ping me at my personal email to draw my attention to it because it probably means I just didn't notice it. You can find my email address somewhere in the timeseries documentation or source code. From seb.haase at gmail.com Tue May 18 03:20:40 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 18 May 2010 09:20:40 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <201005141650.17597.lpc@cmu.edu> References: <201005141650.17597.lpc@cmu.edu> Message-ID: Just thought of another question: So FreeImage doesn't even depend on libjpeg !? I'm asking because I remember problems with installing (building!?) PIL on OS-X where jpg wasn't working because of some problem related to libjpeg ... I don't remember the exact circumstances - but if FreeImage didn't have that dependency it would be another thing _less_ to worry about. -Sebastian On Fri, May 14, 2010 at 10:50 PM, Luis Pedro Coelho wrote: > On Wednesday, Sebastian Haase wrote: >> this sounds exciting and I might find some time to try it out ... >> BTW, the Python image-sig ?should not be a "PIL only" mailing list. So >> (eventually) I feel, this issue could be brought up there, too. > > I have created a mailing list for python computer vision topics (things that > are images but not PIL related): > > http://groups.google.com/group/pythonvision?pli=1 > > It is currently very low traffic since it just started (this is my first > public announcement). > > * > > Btw, for the same sort of issues (opening 16-bit TIFFs in particular), I once > wrote a wrapper around imagemagick's C++ image opening functions: > > http://github.com/luispedro/readmagick > > I works nicely on linux, but some people were trying to use it on Mac or > Windows and got really stuck b/c they didn't know how to compile it and I > couldn't help them, so I gave up on trying to make this more widely used. > > HTH > -- > Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From zachary.pincus at yale.edu Tue May 18 07:28:33 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 18 May 2010 07:28:33 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <201005141650.17597.lpc@cmu.edu> Message-ID: <1362D71A-5799-436D-BB21-01CE45ECFBD6@yale.edu> > Just thought of another question: > > So FreeImage doesn't even depend on libjpeg !? > I'm asking because I remember problems with installing (building!?) > PIL on OS-X where jpg wasn't working because of some problem related > to libjpeg ... > I don't remember the exact circumstances - but if FreeImage didn't > have that dependency it would be another thing _less_ to worry about. 
It doesn't depend on any external libraries, but uses libs jpeg, tiff, png, and z internally -- they're included with the source and compiled as part of the build process. From mikehulluk at googlemail.com Wed May 19 10:53:58 2010 From: mikehulluk at googlemail.com (Michael Hull) Date: Wed, 19 May 2010 15:53:58 +0100 Subject: [SciPy-User] PCA functions Message-ID: Hi Everybody, I am doing some work using numpy/scipy and wanted to find the principle components for some data. I can write a fairly simple function to do this, but was wondering if there was already a function in scipy to do this that I hadn't found before re-inventing the wheel Many thanks, Mike Hull From zachary.pincus at yale.edu Wed May 19 11:31:20 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 19 May 2010 11:31:20 -0400 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: <5398423A-13C4-4A7E-B8A8-127934EC5DB5@yale.edu> Hi Mike, Here's what I use. I don't think there's anything in scipy per se, but I might be wrong. Empirically, I find that doing the PCA with eigh is faster than with svd, but this might be based on the dimensionality of my data vs. the number of data points I use. The functions take in an (m,n)-shaped matrix of m data points in n dimensions, and return an array of shape (k,n) consisting of k principal components in n dimensions (where k=min(m,n)), a (k,)-shaped array of the variances of the data along each principal component, and a (m,k)-shaped array of the projection of each data point into the subspace spanned by the principal components. Zach import numpy def pca_svd(flat): u, s, vt = numpy.linalg.svd(flat, full_matrices = 0) pcs = vt v = numpy.transpose(vt) data_count = len(flat) variances = s**2 / data_count positions = u * s return pcs, variances, positions def pca_eig(flat): values, vectors = _symm_eig(flat) pcs = vectors.transpose() variances = values / len(flat) positions = numpy.dot(flat, vectors) return pcs, variances, positions def _symm_eig(a): """Given input a, return the non-zero eigenvectors and eigenvalues of the symmetric matrix a'a. If a has more columns than rows, then that matrix will be rank- deficient, and the non-zero eigenvalues and eigenvectors of a'a can be more easily extracted from the matrix aa'. From the properties of the SVD: if a of shape (m,n) has SVD u*s*v', then: a'a = v*s's*v' aa' = u*ss'*u' let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m) and s_hat * s = I(n,n). Thus, we can solve for u or v in terms of the other: v = a'*u*s_hat' u = a*v*s_hat """ m, n = a.shape if m >= n: # just return the eigenvalues and eigenvectors of a'a vecs, vals = _eigh(numpy.dot(a.transpose(), a)) vecs = numpy.where(vecs < 0, 0, vecs) return vecs, vals else: # figure out the eigenvalues and vectors based on aa', which is smaller sst_diag, u = _eigh(numpy.dot(a, a.transpose())) # in case due to numerical instabilities we have sst_diag < 0 anywhere, # peg them to zero sst_diag = numpy.where(sst_diag < 0, 0, sst_diag) # now get the inverse square root of the diagonal, which will form the # main diagonal of s_hat err = numpy.seterr(divide='ignore', invalid='ignore') s_hat_diag = 1/numpy.sqrt(sst_diag) numpy.seterr(**err) s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0) # s_hat_diag is a list of length m, a'u is (n,m), so we can just use # numpy's broadcasting instead of matrix multiplication, and only create # the upper mxm block of a'u, since that's all we'll use anyway... 
v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag return sst_diag, v def _eigh(m): values, vectors = numpy.linalg.eigh(m) order = numpy.flipud(values.argsort()) return values[order], vectors[:,order] On May 19, 2010, at 10:53 AM, Michael Hull wrote: > Hi Everybody, > I am doing some work using numpy/scipy and wanted to find the > principle components for some data. I can write a fairly simple > function to do this, but was wondering if there was already a function > in scipy to do this that I hadn't found before re-inventing the wheel > > Many thanks, > > > Mike Hull > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Wed May 19 11:44:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 19 May 2010 11:44:20 -0400 Subject: [SciPy-User] PCA functions In-Reply-To: <5398423A-13C4-4A7E-B8A8-127934EC5DB5@yale.edu> References: <5398423A-13C4-4A7E-B8A8-127934EC5DB5@yale.edu> Message-ID: On Wed, May 19, 2010 at 11:31 AM, Zachary Pincus wrote: > Hi Mike, > > Here's what I use. I don't think there's anything in scipy per se, but > I might be wrong. There is nothing directly in numpy/scipy but many packages have their own version, the most heavy duty version might be in MDP http://mail.scipy.org/pipermail/nipy-devel/2009-December/002528.html and the corresponding thread http://mail.scipy.org/pipermail/nipy-devel/2009-December/002474.html Josef > > Empirically, I find that doing the PCA with eigh is faster than with > svd, but this might be based on the dimensionality of my data vs. the > number of data points I use. The functions take in an (m,n)-shaped > matrix of m data points in n dimensions, and return an array of shape > (k,n) consisting of k principal components in n dimensions (where > k=min(m,n)), a (k,)-shaped array of the variances of the data along > each principal component, and a (m,k)-shaped array of the projection > of each data point into the subspace spanned by the principal > components. > > Zach > > > > import numpy > > def pca_svd(flat): > ? u, s, vt = numpy.linalg.svd(flat, full_matrices = 0) > ? pcs = vt > ? v = numpy.transpose(vt) > ? data_count = len(flat) > ? variances = s**2 / data_count > ? positions = ?u * s > ? return pcs, variances, positions > > def pca_eig(flat): > ? values, vectors = _symm_eig(flat) > ? pcs = vectors.transpose() > ? variances = values / len(flat) > ? positions = numpy.dot(flat, vectors) > ? return pcs, variances, positions > > def _symm_eig(a): > ? """Given input a, return the non-zero eigenvectors and eigenvalues > of the symmetric matrix a'a. > > ? If a has more columns than rows, then that matrix will be rank- > deficient, > ? and the non-zero eigenvalues and eigenvectors of a'a can be more > easily extracted > ? from the matrix aa'. From the properties of the SVD: > ? ? if a of shape (m,n) has SVD u*s*v', then: > ? ? ? a'a = v*s's*v' > ? ? ? aa' = u*ss'*u' > ? ? let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m) > ? ? and s_hat * s = I(n,n). Thus, we can solve for u or v in terms of > the other: > ? ? ? v = a'*u*s_hat' > ? ? ? u = a*v*s_hat > ? """ > ? m, n = a.shape > ? if m >= n: > ? ? # just return the eigenvalues and eigenvectors of a'a > ? ? vecs, vals = _eigh(numpy.dot(a.transpose(), a)) > ? ? vecs = numpy.where(vecs < 0, 0, vecs) > ? ? return vecs, vals > ? else: > ? ? # figure out the eigenvalues and vectors based on aa', which is > smaller > ? ? 
sst_diag, u = _eigh(numpy.dot(a, a.transpose())) > ? ? # in case due to numerical instabilities we have sst_diag < 0 > anywhere, > ? ? # peg them to zero > ? ? sst_diag = numpy.where(sst_diag < 0, 0, sst_diag) > ? ? # now get the inverse square root of the diagonal, which will > form the > ? ? # main diagonal of s_hat > ? ? err = numpy.seterr(divide='ignore', invalid='ignore') > ? ? s_hat_diag = 1/numpy.sqrt(sst_diag) > ? ? numpy.seterr(**err) > ? ? s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0) > ? ? # s_hat_diag is a list of length m, a'u is (n,m), so we can just > use > ? ? # numpy's broadcasting instead of matrix multiplication, and only > create > ? ? # the upper mxm block of a'u, since that's all we'll use anyway... > ? ? v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag > ? ? return sst_diag, v > > def _eigh(m): > ? values, vectors = numpy.linalg.eigh(m) > ? order = numpy.flipud(values.argsort()) > ? return values[order], vectors[:,order] > > > > > > > On May 19, 2010, at 10:53 AM, Michael Hull wrote: > >> Hi Everybody, >> I am doing some work using numpy/scipy and wanted to find the >> principle components for some data. I can write a fairly simple >> function to do this, but was wondering if there was already a function >> in scipy to do this that I hadn't found before re-inventing the wheel >> >> Many thanks, >> >> >> Mike Hull >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lesserwhirls at gmail.com Wed May 19 12:04:59 2010 From: lesserwhirls at gmail.com (Sean Arms) Date: Wed, 19 May 2010 11:04:59 -0500 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: Greetings Mike, Are you looking for just the PCA decomposition, or are you wanting to rotate the truncated PC's using something like promax, varimax, etc.? If so, I do not think MDP or NiPy have that capability. I have functions to do some of the basic rotations, and I've tested them against S+ and Matlab if you are looking for that functionality, but I'll probably need to clean them up a bit :-) Sean On Wed, May 19, 2010 at 9:53 AM, Michael Hull wrote: > Hi Everybody, > I am doing some work using numpy/scipy and wanted to find the > principle components for some data. I can write a fairly simple > function to do this, but was wondering if there was already a function > in scipy to do this that I hadn't found before re-inventing the wheel > > Many thanks, > > > Mike Hull > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mikehulluk at googlemail.com Wed May 19 16:13:07 2010 From: mikehulluk at googlemail.com (Michael Hull) Date: Wed, 19 May 2010 21:13:07 +0100 Subject: [SciPy-User] SciPy-User Digest, Vol 81, Issue 39 In-Reply-To: References: Message-ID: > Hi Mike, > > Here's what I use. I don't think there's anything in scipy per se, but > I might be wrong. > > Empirically, I find that doing the PCA with eigh is faster than with > svd, but this might be based on the dimensionality of my data vs. the > number of data points I use. 
The functions take in an (m,n)-shaped > matrix of m data points in n dimensions, and return an array of shape > (k,n) consisting of k principal components in n dimensions (where > k=min(m,n)), a (k,)-shaped array of the variances of the data along > each principal component, and a (m,k)-shaped array of the projection > of each data point into the subspace spanned by the principal > components. > > Zach > >> >> Here's what I use. I don't think there's anything in scipy per se, but >> I might be wrong. > > There is nothing directly in numpy/scipy but many packages have their own > version, > the most heavy duty version might be in MDP > http://mail.scipy.org/pipermail/nipy-devel/2009-December/002528.html > > and the corresponding thread > http://mail.scipy.org/pipermail/nipy-devel/2009-December/002474.html > > Josef > >> >> Empirically, I find that doing the PCA with eigh is faster than with >> svd, but this might be based on the dimensionality of my data vs. the >> number of data points I use. The functions take in an (m,n)-shaped >> matrix of m data points in n dimensions, and return an array of shape >> (k,n) consisting of k principal components in n dimensions (where >> k=min(m,n)), a (k,)-shaped array of the variances of the data along >> each principal component, and a (m,k)-shaped array of the projection >> of each data point into the subspace spanned by the principal >> components. >> >> Zach >> > > Message: 4 > Date: Wed, 19 May 2010 11:04:59 -0500 > From: Sean Arms > Subject: Re: [SciPy-User] PCA functions > To: SciPy Users List > Message-ID: > ? ? ? ? > Content-Type: text/plain; charset=ISO-8859-1 > > Greetings Mike, > > ? ? Are you looking for just the PCA decomposition, or are you > wanting to rotate the truncated PC's using something like promax, > varimax, etc.? ?If so, I do not think MDP or NiPy have that > capability. ?I have functions to do some of the basic rotations, and > I've tested them against S+ and Matlab if you are looking for that > functionality, but I'll probably need to clean them up a bit :-) > > Sean Hi Guys, Thanks very much for the quick responses. I was looking for something simple - the principle components of 1000 data points in a 3 dimensional space, just to do a bit of prelim data exploration, so speed was not so much of an issue - I just implemented something fairly simple with numpy.cov and numpy.eig. I was just wondering if there was something in scipy since this seems like something other people would also reimplement, but It sounds like that to implement this properly in scipy would require more thought/work as there can be more pca than I had thought.... (Apparently the matlab statistics toolbox has a pca function, but according to one colleague "Its just easier to write your own than deal with license servers" :) ) Many thanks, Mike From oliver.tomic at nofima.no Thu May 20 05:35:37 2010 From: oliver.tomic at nofima.no (Oliver Tomic) Date: Thu, 20 May 2010 11:35:37 +0200 Subject: [SciPy-User] PCA functions In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu May 20 09:38:11 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 20 May 2010 07:38:11 -0600 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: On Thu, May 20, 2010 at 3:35 AM, Oliver Tomic wrote: @Oliver, I posted your email over on the statsmodels list. I'll take a look at the link. Vincent > Hi, > > I already sent this link to Mike (off-list, since my mails kept bouncing > back). 
A while ago I supervised a student who implemented various flavours > of PCA (using SVD and NIPALS, in Python and C respectively) as part of a > semester project. There is quite a bit of documentation coming with the PCA > module. > > http://folk.uio.no/henninri/pca_module/ > > > I was considering to ask the pystatmodels-group whether they are interested > in including this code, however both code and documentation may need a > little bit of polishing first. Unfortunately, there is no validation > procedure available in the code to validate the model. I have plans on > implementing this if I ever should find some time to do this. > > Cheers > Oliver > > > > > > -----scipy-user-bounces at scipy.org wrote: ----- > > >To: SciPy Users List > >From: Sean Arms > >Sent by: scipy-user-bounces at scipy.org > >Date: 05/19/2010 06:04PM > >Subject: Re: [SciPy-User] PCA functions > > > > >Greetings Mike, > > > > Are you looking for just the PCA decomposition, or are you > >wanting to rotate the truncated PC's using something like promax, > >varimax, etc.? If so, I do not think MDP or NiPy have that > >capability. I have functions to do some of the basic rotations, and > >I've tested them against S+ and Matlab if you are looking for that > >functionality, but I'll probably need to clean them up a bit :-) > > > >Sean > > > >On Wed, May 19, 2010 at 9:53 AM, Michael Hull > > wrote: > >> Hi Everybody, > >> I am doing some work using numpy/scipy and wanted to find the > >> principle components for some data. I can write a fairly simple > >> function to do this, but was wondering if there was already a > >function > >> in scipy to do this that I hadn't found before re-inventing the > >wheel > >> > >> Many thanks, > >> > >> > >> Mike Hull > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >_______________________________________________ > >SciPy-User mailing list > >SciPy-User at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > *Vincent Davis 720-301-3003 * vincent at vincentdavis.net my blog | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu May 20 11:21:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 20 May 2010 11:21:22 -0400 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: On Thu, May 20, 2010 at 9:38 AM, Vincent Davis wrote: > On Thu, May 20, 2010 at 3:35 AM, Oliver Tomic wrote: > > @Oliver, I posted your email over on the statsmodels list. I'll take a look > at the link. > I briefly looked at the pca_module and it looks well written and documented already, although I think the matrix versions are redundant based on very fast skimming of the code and list of functions. statsmodels already has 3 implementations of pca, with eigh, svd and one wrapped in a class. And there is also an example how to do Principal Component Regression. But we don't have NIPALS yet, or a version that calculates only a few eigenvectors (with eigh). And the current versions in statsmodels are pretty basic, the eigh and svd versions are modeled after and tested against matlab princomp. 
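For concreteness, here is a rough numpy-only sketch of what "basic PCA via svd" plus "principal component regression" amounts to. This is the textbook recipe, not the statsmodels code, and the function names are made up:

import numpy as np

def pca_basic(x):
    # center, then take the SVD of the data matrix
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    components = vt                  # rows are the principal directions
    variances = s ** 2 / len(x)      # variance along each direction
    scores = u * s                   # data projected onto the components
    return components, variances, scores

def pcr(x, y, k):
    # principal component regression: regress y on the first k scores
    _, _, scores = pca_basic(x)
    z = np.column_stack((np.ones(len(x)), scores[:, :k]))
    beta = np.linalg.lstsq(z, y)[0]
    return beta

# e.g. 100 observations of 5 regressors, keeping 2 components
x = np.random.randn(100, 5)
y = np.random.randn(100)
coefs = pcr(x, y, 2)

Rerunning the regression with an increasing number of components, as described below, only requires recomputing the lstsq step, since the scores can be reused.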
I think, as the discussions on the nipy and scipy list show, a basic PCA version is easy to write, but everyone emphasizes different extras or performance features, e.g. rotation would be nice for factor analysis. I also think that scipy should have a basic version, just so we don't have to figure out or remember how to do eigh or what all the different parts of svd mean. For statsmodels, I looked at this mainly for regressions in a "data-rich environment", i.e. with lots of possible regressors. For (unsupervised) dimension reduction we still have to figure out how it fits in when pca gets out of the sandbox or when we expand in this area. Also, I don't know if statsmodels will eventually get factor analysis. (I have a multivariate analysis folder on my computer, but thought of leaving this area to pymvpa.) I stopped working on this for the moment, but I thought maybe a class that makes the usage of pca and the corresponding projections easy and self-explanatory would be useful. E.g. for regression we need to be able to rerun the regression with an increasing number of components and should reuse previous calculations. The second point, if there are different implementations, then we should have either automatic selection of the best one given the arguments or a comparative documentation when to use which version. Josef http://tinyurl.com/2dwyjt8 > > Vincent > > >> Hi, >> >> I already sent this link to Mike (off-list, since my mails kept bouncing >> back). A while ago I supervised a student who implemented various flavours >> of PCA (using SVD and NIPALS, in Python and C respectively) as part of a >> semester project. There is quite a bit of documentation coming with the PCA >> module. >> >> http://folk.uio.no/henninri/pca_module/ >> >> >> I was considering to ask the pystatmodels-group whether they are >> interested in including this code, however both code and documentation may >> need a little bit of polishing first. Unfortunately, there is no validation >> procedure available in the code to validate the model. I have plans on >> implementing this if I ever should find some time to do this. >> > >> Cheers >> Oliver >> >> >> >> >> >> -----scipy-user-bounces at scipy.org wrote: ----- >> >> >To: SciPy Users List >> >From: Sean Arms >> >Sent by: scipy-user-bounces at scipy.org >> >Date: 05/19/2010 06:04PM >> >Subject: Re: [SciPy-User] PCA functions >> >> > >> >Greetings Mike, >> > >> > Are you looking for just the PCA decomposition, or are you >> >wanting to rotate the truncated PC's using something like promax, >> >varimax, etc.? If so, I do not think MDP or NiPy have that >> >capability. I have functions to do some of the basic rotations, and >> >I've tested them against S+ and Matlab if you are looking for that >> >functionality, but I'll probably need to clean them up a bit :-) >> > >> >Sean >> > >> >On Wed, May 19, 2010 at 9:53 AM, Michael Hull >> > wrote: >> >> Hi Everybody, >> >> I am doing some work using numpy/scipy and wanted to find the >> >> principle components for some data. 
I can write a fairly simple >> >> function to do this, but was wondering if there was already a >> >function >> >> in scipy to do this that I hadn't found before re-inventing the >> >wheel >> >> >> >> Many thanks, >> >> >> >> >> >> Mike Hull >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >_______________________________________________ >> >SciPy-User mailing list >> >SciPy-User at scipy.org >> >http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > *Vincent Davis > 720-301-3003 * > vincent at vincentdavis.net > my blog | LinkedIn > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdekauwe at gmail.com Fri May 21 08:59:08 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 05:59:08 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... Message-ID: <28633477.post@talk.nabble.com> Hi, I am trying to extract data from a 4D array and store it in a 2D array, but avoid my current usage of the for loops for speed, as in reality the arrays sizes are quite big. Could someone also try and explain the solution as well if they have a spare moment as I am still finding it quite difficult to get over the habit of using loops (C convert for my sins). I get that one could precompute the indices's i and j i.e. i = np.arange(tsteps) j = np.arange(numpts) but just can't get my head round how i then use them... Thanks, Martin import numpy as np numpts=10 tsteps = 12 vari = 22 data = np.random.random((tsteps, vari, numpts, 1)) new_data = np.zeros((tsteps, numpts), dtype=np.float32) index = np.arange(numpts) for i in xrange(tsteps): for j in xrange(numpts): new_data[i,j] = data[i,5,index[j],0] -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html Sent from the Scipy-User mailing list archive at Nabble.com. From zachary.pincus at yale.edu Fri May 21 09:11:34 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 21 May 2010 09:11:34 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28633477.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> Message-ID: <74BDBBD3-B73B-45B1-B859-3F8C28902DE7@yale.edu> > import numpy as np > > numpts=10 > tsteps = 12 > vari = 22 > > data = np.random.random((tsteps, vari, numpts, 1)) > new_data = np.zeros((tsteps, numpts), dtype=np.float32) > index = np.arange(numpts) > > for i in xrange(tsteps): > for j in xrange(numpts): > new_data[i,j] = data[i,5,index[j],0] > new_data2 = data[:,5,index,0].astype(numpy.float32) numpy.all(new_data == new_data2) # returns True This assuming that your real "index" array is more interesting than just [0,1,2,3,...]... if not, new_data2 = data[:,5,:,0].astype(numpy.float32) would do fine. 
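A self-contained toy version of that comparison, with made-up sizes matching the example above, for anyone who wants to paste it into an interpreter (the assert just confirms that the loop and the fancy-indexed version agree):

import numpy as np

tsteps, vari, numpts = 12, 22, 10
data = np.random.random((tsteps, vari, numpts, 1))
index = np.arange(numpts)

# the original double loop
looped = np.zeros((tsteps, numpts), dtype=np.float32)
for i in xrange(tsteps):
    for j in xrange(numpts):
        looped[i, j] = data[i, 5, index[j], 0]

# the sliced / fancy-indexed version
vectorised = data[:, 5, index, 0].astype(np.float32)

assert np.allclose(looped, vectorised)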
That said, I've never been able to figure out whether it's possible to index particular points along multiple axes with index lists -- that is, how to make numpy do something like: new_data3 = data[index_x,5,index_y,0] Zach From josef.pktd at gmail.com Fri May 21 09:12:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 21 May 2010 09:12:57 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28633477.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> Message-ID: On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: > > Hi, > > I am trying to extract data from a 4D array and store it in a 2D array, but > avoid my current usage of the for loops for speed, as in reality the arrays > sizes are quite big. Could someone also try and explain the solution as well > if they have a spare moment as I am still finding it quite difficult to get > over the habit of using loops (C convert for my sins). I get that one could > precompute the indices's i and j i.e. > > i = np.arange(tsteps) > j = np.arange(numpts) > > but just can't get my head round how i then use them... > > Thanks, > Martin > > import numpy as np > > numpts=10 > tsteps = 12 > vari = 22 > > data = np.random.random((tsteps, vari, numpts, 1)) > new_data = np.zeros((tsteps, numpts), dtype=np.float32) > index = np.arange(numpts) > > for i in xrange(tsteps): > ? ?for j in xrange(numpts): > ? ? ? ?new_data[i,j] = data[i,5,index[j],0] The index arrays need to be broadcastable against each other. I think this should do it new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] Josef > > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Fri May 21 10:55:50 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 07:55:50 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> Message-ID: <28634924.post@talk.nabble.com> Thanks that works... So the way to do it is with np.arange(tsteps)[:,None], that was the step I was struggling with, so this forms a 2D array which replaces the the two for loops? Do I have that right? A lot quicker...! Martin josef.pktd wrote: > > On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >> >> Hi, >> >> I am trying to extract data from a 4D array and store it in a 2D array, >> but >> avoid my current usage of the for loops for speed, as in reality the >> arrays >> sizes are quite big. Could someone also try and explain the solution as >> well >> if they have a spare moment as I am still finding it quite difficult to >> get >> over the habit of using loops (C convert for my sins). I get that one >> could >> precompute the indices's i and j i.e. >> >> i = np.arange(tsteps) >> j = np.arange(numpts) >> >> but just can't get my head round how i then use them... >> >> Thanks, >> Martin >> >> import numpy as np >> >> numpts=10 >> tsteps = 12 >> vari = 22 >> >> data = np.random.random((tsteps, vari, numpts, 1)) >> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >> index = np.arange(numpts) >> >> for i in xrange(tsteps): >> for j in xrange(numpts): >> new_data[i,j] = data[i,5,index[j],0] > > The index arrays need to be broadcastable against each other. 
> > I think this should do it > > new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] > > Josef >> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 21 11:27:39 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 21 May 2010 11:27:39 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28634924.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> Message-ID: On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: > > Thanks that works... > > So the way to do it is with np.arange(tsteps)[:,None], that was the step I > was struggling with, so this forms a 2D array which replaces the the two for > loops? Do I have that right? Yes, but as Zachary showed, if you need the full index in a dimension, then you can use slicing. It might be faster. And a warning, mixing slices and index arrays with 3 or more dimensions can have some surprise switching of axes. Josef > > A lot quicker...! > > Martin > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>> >>> Hi, >>> >>> I am trying to extract data from a 4D array and store it in a 2D array, >>> but >>> avoid my current usage of the for loops for speed, as in reality the >>> arrays >>> sizes are quite big. Could someone also try and explain the solution as >>> well >>> if they have a spare moment as I am still finding it quite difficult to >>> get >>> over the habit of using loops (C convert for my sins). I get that one >>> could >>> precompute the indices's i and j i.e. >>> >>> i = np.arange(tsteps) >>> j = np.arange(numpts) >>> >>> but just can't get my head round how i then use them... >>> >>> Thanks, >>> Martin >>> >>> import numpy as np >>> >>> numpts=10 >>> tsteps = 12 >>> vari = 22 >>> >>> data = np.random.random((tsteps, vari, numpts, 1)) >>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>> index = np.arange(numpts) >>> >>> for i in xrange(tsteps): >>> ? ?for j in xrange(numpts): >>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >> >> The index arrays need to be broadcastable against each other. >> >> I think this should do it >> >> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >> >> Josef >>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Fri May 21 11:46:58 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 21 May 2010 11:46:58 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28634924.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> Message-ID: <3DEFBA54-C30E-4BA3-918F-E0640B5AF8F5@yale.edu> > Thanks that works... > > So the way to do it is with np.arange(tsteps)[:,None], that was the > step I > was struggling with, so this forms a 2D array which replaces the the > two for > loops? Do I have that right? If tsteps is just the size of the array in that dimension, you can use :, as before: data[:,5,index,0] which will be quicker and more straightforward. If you want to index with multiple list-of-indices along different axes, then Josef's point about broadcasting is a good one (and the answer to the question I'd asked, actually...) Given: a = numpy.arange(100).reshape((10,10)) Then: a[numpy.array([0,4,2])[:,numpy.newaxis], [1,2]] or equivalently: a[[[0],[4],[2]], [1,2]] yields: array([[ 1, 2], [41, 42], [21, 22]]) That is, the 0th, 4th, and 2nd rows, and the 1st and 2nd columns of a. Thanks Josef! Zach > A lot quicker...! > > Martin > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>> >>> Hi, >>> >>> I am trying to extract data from a 4D array and store it in a 2D >>> array, >>> but >>> avoid my current usage of the for loops for speed, as in reality the >>> arrays >>> sizes are quite big. Could someone also try and explain the >>> solution as >>> well >>> if they have a spare moment as I am still finding it quite >>> difficult to >>> get >>> over the habit of using loops (C convert for my sins). I get that >>> one >>> could >>> precompute the indices's i and j i.e. >>> >>> i = np.arange(tsteps) >>> j = np.arange(numpts) >>> >>> but just can't get my head round how i then use them... >>> >>> Thanks, >>> Martin >>> >>> import numpy as np >>> >>> numpts=10 >>> tsteps = 12 >>> vari = 22 >>> >>> data = np.random.random((tsteps, vari, numpts, 1)) >>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>> index = np.arange(numpts) >>> >>> for i in xrange(tsteps): >>> for j in xrange(numpts): >>> new_data[i,j] = data[i,5,index[j],0] >> >> The index arrays need to be broadcastable against each other. >> >> I think this should do it >> >> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >> >> Josef >>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From DParker at chromalloy.com Fri May 21 17:07:56 2010 From: DParker at chromalloy.com (DParker at chromalloy.com) Date: Fri, 21 May 2010 17:07:56 -0400 Subject: [SciPy-User] Sort geometric data by proximity Message-ID: I have a set of geometric data in x, y plane that represents a section of a turbine airfoil. The shape looks something like a fat boomerang with the coordinates wrapping around the entire shape (a completely closed loop). The coordinate points are in a random order and I need to sort or fit them by proximity to develop a dataset containing continuos shape of the airfoil. I started looking through the interpolation functions but I would need a method that ignores the order of the data (fits based on proximity of the points) and can handle data that forms a closed loop. The points are spaced closely enough along the airfoil surface so that they could be sorted by nearest neighbor - start with the first point find the next closest point and continue until all the points are "consumed". Any advice or pointers would be greatly appreciated. David Parker -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_baddeley at yahoo.com.au Fri May 21 17:30:30 2010 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Fri, 21 May 2010 14:30:30 -0700 (PDT) Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: Message-ID: <896588.9118.qm@web33006.mail.mud.yahoo.com> Hi David, I'd probably do a Delaunay triangularisation and then, starting at an arbitrary node, walk the shortest edges, collecting nodes as I went. You can get the triangulation from the scikits.delaunay package, which you'll probably already have if you've got matplotlib installed (in this case you can find it as matplotlib.delaunay). You'll need to write a loop (or recursive function) to do the walking, but that shouldn't be too tricky. I've done something similar (collecting 'blobs' of unstructured points which are closer than a certain cutoff) using this technique, so if you need any additional pointers, or ideas on how to optimise the procedure (I precompute a database mapping each vertex to all the edges leading from it - otherwise you've got to loop through the entire edge list on each iteration to find which edges go from the current node/vertex) give me a bell. cheers, David ________________________________ From: "DParker at chromalloy.com" To: scipy-user at scipy.org Sent: Sat, 22 May, 2010 9:07:56 AM Subject: [SciPy-User] Sort geometric data by proximity I have a set of geometric data in x, y plane that represents a section of a turbine airfoil. The shape looks something like a fat boomerang with the coordinates wrapping around the entire shape (a completely closed loop). 
The coordinate points are in a random order and I need to sort or fit them by proximity to develop a dataset containing continuos shape of the airfoil. I started looking through the interpolation functions but I would need a method that ignores the order of the data (fits based on proximity of the points) and can handle data that forms a closed loop. The points are spaced closely enough along the airfoil surface so that they could be sorted by nearest neighbor - start with the first point find the next closest point and continue until all the points are "consumed". Any advice or pointers would be greatly appreciated. David Parker From aarchiba at physics.mcgill.ca Fri May 21 17:46:10 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 21 May 2010 17:46:10 -0400 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: Message-ID: On 21 May 2010 17:07, wrote: > I have a set of geometric data in x, y plane that represents a section of a > turbine airfoil. The shape looks something like a fat boomerang with the > coordinates wrapping around the entire shape (a completely closed loop). The > coordinate points are in a random order and I need to sort or fit them by > proximity to develop a dataset containing continuos shape of the airfoil. > > I started looking through the interpolation functions but I would need a > method that ignores the order of the data (fits based on proximity of the > points) and can handle data that forms a closed loop. > > The points are spaced closely enough along the airfoil surface so that they > could be sorted by nearest neighbor - start with the first point find the > next closest point and continue until all the points are "consumed". > > Any advice or pointers would be greatly appreciated. The most direct approach is to pick a start point at random, then ask for its two nearest neighbours. Then pick one, and loop. For each point ask for its two nearest neighbours; one should be the last point you looked at, and one should be the next point on your curve. If ever this isn't true, you've found some place where your points don't sample closely enough to clearly describe the turbine shape. When you get your first point back, you're done. As described, this is a fairly slow process, but the dominating operation is not the python looping overhead but the time it takes to find each nearest neighbour. Fortunately scipy.spatial includes an object designed for this sort of problem, the kd-tree. So the way I'd solve your problem is construct a kd-tree from your array of points, then run a query asking the the three closest neighbours of each of your original points (three because each point is its own closest neighbour). Then just write a python loop to walk through the array of neighbours as I described above. This process should be nice and fast, and will diagnose some situations where you've inadequately sampled your object. Anne > > David Parker > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From mdekauwe at gmail.com Fri May 21 21:57:05 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 18:57:05 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> Message-ID: <28640602.post@talk.nabble.com> Yes as Zachary said index is only 0 to 15237, so both methods work. 
I don't quite get what you mean about slicing with axis > 3. Is there a link you can recommend I should read? Does that mean given I have 4dims that Josef's suggestion would be more advised in this case? Thanks. josef.pktd wrote: > > On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >> >> Thanks that works... >> >> So the way to do it is with np.arange(tsteps)[:,None], that was the step >> I >> was struggling with, so this forms a 2D array which replaces the the two >> for >> loops? Do I have that right? > > Yes, but as Zachary showed, if you need the full index in a dimension, > then you can use slicing. It might be faster. > And a warning, mixing slices and index arrays with 3 or more > dimensions can have some surprise switching of axes. > > Josef > >> >> A lot quicker...! >> >> Martin >> >> >> josef.pktd wrote: >>> >>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>> >>>> Hi, >>>> >>>> I am trying to extract data from a 4D array and store it in a 2D array, >>>> but >>>> avoid my current usage of the for loops for speed, as in reality the >>>> arrays >>>> sizes are quite big. Could someone also try and explain the solution as >>>> well >>>> if they have a spare moment as I am still finding it quite difficult to >>>> get >>>> over the habit of using loops (C convert for my sins). I get that one >>>> could >>>> precompute the indices's i and j i.e. >>>> >>>> i = np.arange(tsteps) >>>> j = np.arange(numpts) >>>> >>>> but just can't get my head round how i then use them... >>>> >>>> Thanks, >>>> Martin >>>> >>>> import numpy as np >>>> >>>> numpts=10 >>>> tsteps = 12 >>>> vari = 22 >>>> >>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>> index = np.arange(numpts) >>>> >>>> for i in xrange(tsteps): >>>> ? ?for j in xrange(numpts): >>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>> >>> The index arrays need to be broadcastable against each other. >>> >>> I think this should do it >>> >>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>> >>> Josef >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28640602.html Sent from the Scipy-User mailing list archive at Nabble.com. From mdekauwe at gmail.com Fri May 21 22:14:51 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 19:14:51 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... 
In-Reply-To: <28640602.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> Message-ID: <28640656.post@talk.nabble.com> Also I then need to remap the 2D array I make onto another grid (the world in this case). Which again I had am doing with a loop (note numpts is a lot bigger than my example above). wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan for i in xrange(numpts): # exclude the NaN, note masking them doesn't work in the stats func x = data1_snow[:,i] x = x[np.isfinite(x)] y = data2_snow[:,i] y = y[np.isfinite(y)] # wilcox signed rank test # make sure we have enough samples to do the test d = x - y d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero differences count = len(d) if count > 10: z, pval = stats.wilcoxon(x, y) # only map out sign different data if pval < 0.05: wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = np.mean(x - y) Now I think I can push the data in one move into the wilcoxStats_snow array by removing the index, but I can't see how I will get the individual x and y pts for each array member correctly without the loop, this was my attempt which of course doesn't work! x = data1_snow[:,:] x = x[np.isfinite(x)] y = data2_snow[:,:] y = y[np.isfinite(y)] # r^2 # exclude v.small arrays, i.e. we need just less over 4 years of data if len(x) and len(y) > 50: pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, y)[0])**2 thanks. mdekauwe wrote: > > Yes as Zachary said index is only 0 to 15237, so both methods work. > > I don't quite get what you mean about slicing with axis > 3. Is there a > link you can recommend I should read? Does that mean given I have 4dims > that Josef's suggestion would be more advised in this case? > > Thanks. > > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>> >>> Thanks that works... >>> >>> So the way to do it is with np.arange(tsteps)[:,None], that was the step >>> I >>> was struggling with, so this forms a 2D array which replaces the the two >>> for >>> loops? Do I have that right? >> >> Yes, but as Zachary showed, if you need the full index in a dimension, >> then you can use slicing. It might be faster. >> And a warning, mixing slices and index arrays with 3 or more >> dimensions can have some surprise switching of axes. >> >> Josef >> >>> >>> A lot quicker...! >>> >>> Martin >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>> array, >>>>> but >>>>> avoid my current usage of the for loops for speed, as in reality the >>>>> arrays >>>>> sizes are quite big. Could someone also try and explain the solution >>>>> as >>>>> well >>>>> if they have a spare moment as I am still finding it quite difficult >>>>> to >>>>> get >>>>> over the habit of using loops (C convert for my sins). I get that one >>>>> could >>>>> precompute the indices's i and j i.e. >>>>> >>>>> i = np.arange(tsteps) >>>>> j = np.arange(numpts) >>>>> >>>>> but just can't get my head round how i then use them... >>>>> >>>>> Thanks, >>>>> Martin >>>>> >>>>> import numpy as np >>>>> >>>>> numpts=10 >>>>> tsteps = 12 >>>>> vari = 22 >>>>> >>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>> index = np.arange(numpts) >>>>> >>>>> for i in xrange(tsteps): >>>>> ? ?for j in xrange(numpts): >>>>> ? ? ? 
?new_data[i,j] = data[i,5,index[j],0] >>>> >>>> The index arrays need to be broadcastable against each other. >>>> >>>> I think this should do it >>>> >>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>> >>>> Josef >>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 21 22:41:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 21 May 2010 22:41:54 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28640656.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> Message-ID: On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: > > Also I then need to remap the 2D array I make onto another grid (the world in > this case). Which again I had am doing with a loop (note numpts is a lot > bigger than my example above). > > wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan > for i in xrange(numpts): > ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats func > ? ? ? ?x = data1_snow[:,i] > ? ? ? ?x = x[np.isfinite(x)] > ? ? ? ?y = data2_snow[:,i] > ? ? ? ?y = y[np.isfinite(y)] > > ? ? ? ?# wilcox signed rank test > ? ? ? ?# make sure we have enough samples to do the test > ? ? ? ?d = x - y > ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero > differences > ? ? ? ?count = len(d) > ? ? ? ?if count > 10: > ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) > ? ? ? ? ? ?# only map out sign different data > ? ? ? ? ? ?if pval < 0.05: > ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > np.mean(x - y) > > Now I think I can push the data in one move into the wilcoxStats_snow array > by removing the index, > but I can't see how I will get the individual x and y pts for each array > member correctly without the loop, this was my attempt which of course > doesn't work! > > x = data1_snow[:,:] > x = x[np.isfinite(x)] > y = data2_snow[:,:] > y = y[np.isfinite(y)] > > # r^2 > # exclude v.small arrays, i.e. we need just less over 4 years of data > if len(x) and len(y) > 50: > ? 
?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, y)[0])**2 If you want to do pairwise comparisons with stats.wilcoxon, then you might be stuck with the loop, since wilcoxon takes only two 1d arrays at a time (if I read the help correctly). Also the presence of nans might force the use a loop. stats.mstats has masked array versions, but I didn't see wilcoxon in the list. (Even when vectorized operations would work with regular arrays, nan or masked array versions still have to loop in many cases.) If you have many columns with count <= 10, so that wilcoxon is not calculated then it might be worth to use only array operations up to that point. If wilcoxon is calculated most of the time, then it's not worth thinking too hard about this. Josef > > thanks. > > > > > mdekauwe wrote: >> >> Yes as Zachary said index is only 0 to 15237, so both methods work. >> >> I don't quite get what you mean about slicing with axis > 3. Is there a >> link you can recommend I should read? Does that mean given I have 4dims >> that Josef's suggestion would be more advised in this case? There were several discussions on the mailing lists (fancy slicing and indexing). Your case is safe, but if you run in future into funny shapes, you can look up the details. when in doubt, I use np.arange(...) Josef >> >> Thanks. >> >> >> >> josef.pktd wrote: >>> >>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>>> >>>> Thanks that works... >>>> >>>> So the way to do it is with np.arange(tsteps)[:,None], that was the step >>>> I >>>> was struggling with, so this forms a 2D array which replaces the the two >>>> for >>>> loops? Do I have that right? >>> >>> Yes, but as Zachary showed, if you need the full index in a dimension, >>> then you can use slicing. It might be faster. >>> And a warning, mixing slices and index arrays with 3 or more >>> dimensions can have some surprise switching of axes. >>> >>> Josef >>> >>>> >>>> A lot quicker...! >>>> >>>> Martin >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>> array, >>>>>> but >>>>>> avoid my current usage of the for loops for speed, as in reality the >>>>>> arrays >>>>>> sizes are quite big. Could someone also try and explain the solution >>>>>> as >>>>>> well >>>>>> if they have a spare moment as I am still finding it quite difficult >>>>>> to >>>>>> get >>>>>> over the habit of using loops (C convert for my sins). I get that one >>>>>> could >>>>>> precompute the indices's i and j i.e. >>>>>> >>>>>> i = np.arange(tsteps) >>>>>> j = np.arange(numpts) >>>>>> >>>>>> but just can't get my head round how i then use them... >>>>>> >>>>>> Thanks, >>>>>> Martin >>>>>> >>>>>> import numpy as np >>>>>> >>>>>> numpts=10 >>>>>> tsteps = 12 >>>>>> vari = 22 >>>>>> >>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>> index = np.arange(numpts) >>>>>> >>>>>> for i in xrange(tsteps): >>>>>> ? ?for j in xrange(numpts): >>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>> >>>>> The index arrays need to be broadcastable against each other. 
>>>>> >>>>> I think this should do it >>>>> >>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>> >>>>> Josef >>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Sat May 22 06:21:09 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Sat, 22 May 2010 03:21:09 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> Message-ID: <28642434.post@talk.nabble.com> Sounds like I am stuck with the loop as I need to do the comparison for each pixel of the world and then I have a basemap function call which I guess slows it down further...hmm i.e. 
def compareSnowData(jules_var): # Extract the 11 years of snow data and return outrows = 180 outcols = 360 numyears = 11 nummonths = 12 # Read various files fname="world_valid_jules_pts.ascii" (numpts, land_pts_index, latitude, longitude, rows, cols) = jo.read_land_points_ascii(fname, 1.0) fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ timesteps=132, numvars=26) fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ timesteps=132, numvars=26) # grab some space data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan # extract the data data1_snow = jules_data1[:,jules_var,:,0] data2_snow = jules_data2[:,jules_var,:,0] data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) #for month in xrange(numyears * nummonths): # for i in xrange(numpts): # data1 = jules_data1[month,jules_var,land_pts_index[i],0] # data2 = jules_data2[month,jules_var,land_pts_index[i],0] # if data1 >= 0.0: # data1_snow[month,i] = data1 # else: # data1_snow[month,i] = np.nan # if data2 > 0.0: # data2_snow[month,i] = data2 # else: # data2_snow[month,i] = np.nan # exclude any months from *both* arrays where we have dodgy data, else we # can't do the correlations correctly!! data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) # put data on a regular grid... print 'regridding landpts...' for i in xrange(numpts): # exclude the NaN, note masking them doesn't work in the stats func x = data1_snow[:,i] x = x[np.isfinite(x)] y = data2_snow[:,i] y = y[np.isfinite(y)] # r^2 # exclude v.small arrays, i.e. we need just less over 4 years of data if len(x) and len(y) > 50: pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = (stats.pearsonr(x, y)[0])**2 # wilcox signed rank test # make sure we have enough samples to do the test d = x - y d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero differences count = len(d) if count > 10: z, pval = stats.wilcoxon(x, y) # only map out sign different data if pval < 0.05: wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = np.mean(x - y) return (pearsonsr_snow, wilcoxStats_snow) josef.pktd wrote: > > On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >> >> Also I then need to remap the 2D array I make onto another grid (the >> world in >> this case). Which again I had am doing with a loop (note numpts is a lot >> bigger than my example above). >> >> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan >> for i in xrange(numpts): >> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >> func >> ? ? ? ?x = data1_snow[:,i] >> ? ? ? ?x = x[np.isfinite(x)] >> ? ? ? ?y = data2_snow[:,i] >> ? ? ? ?y = y[np.isfinite(y)] >> >> ? ? ? ?# wilcox signed rank test >> ? ? ? ?# make sure we have enough samples to do the test >> ? ? ? ?d = x - y >> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >> differences >> ? ? ? ?count = len(d) >> ? ? ? ?if count > 10: >> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >> ? ? ? ? ? ?# only map out sign different data >> ? ? ? ? ? ?if pval < 0.05: >> ? ? ? ? ? ? ? 
?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >> np.mean(x - y) >> >> Now I think I can push the data in one move into the wilcoxStats_snow >> array >> by removing the index, >> but I can't see how I will get the individual x and y pts for each array >> member correctly without the loop, this was my attempt which of course >> doesn't work! >> >> x = data1_snow[:,:] >> x = x[np.isfinite(x)] >> y = data2_snow[:,:] >> y = y[np.isfinite(y)] >> >> # r^2 >> # exclude v.small arrays, i.e. we need just less over 4 years of data >> if len(x) and len(y) > 50: >> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >> y)[0])**2 > > > If you want to do pairwise comparisons with stats.wilcoxon, then you > might be stuck with the loop, since wilcoxon takes only two 1d arrays > at a time (if I read the help correctly). > > Also the presence of nans might force the use a loop. stats.mstats has > masked array versions, but I didn't see wilcoxon in the list. (Even > when vectorized operations would work with regular arrays, nan or > masked array versions still have to loop in many cases.) > > If you have many columns with count <= 10, so that wilcoxon is not > calculated then it might be worth to use only array operations up to > that point. If wilcoxon is calculated most of the time, then it's not > worth thinking too hard about this. > > Josef > > >> >> thanks. >> >> >> >> >> mdekauwe wrote: >>> >>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>> >>> I don't quite get what you mean about slicing with axis > 3. Is there a >>> link you can recommend I should read? Does that mean given I have 4dims >>> that Josef's suggestion would be more advised in this case? > > There were several discussions on the mailing lists (fancy slicing and > indexing). Your case is safe, but if you run in future into funny > shapes, you can look up the details. > when in doubt, I use np.arange(...) > > Josef > >>> >>> Thanks. >>> >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>>>> >>>>> Thanks that works... >>>>> >>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>> step >>>>> I >>>>> was struggling with, so this forms a 2D array which replaces the the >>>>> two >>>>> for >>>>> loops? Do I have that right? >>>> >>>> Yes, but as Zachary showed, if you need the full index in a dimension, >>>> then you can use slicing. It might be faster. >>>> And a warning, mixing slices and index arrays with 3 or more >>>> dimensions can have some surprise switching of axes. >>>> >>>> Josef >>>> >>>>> >>>>> A lot quicker...! >>>>> >>>>> Martin >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>> array, >>>>>>> but >>>>>>> avoid my current usage of the for loops for speed, as in reality the >>>>>>> arrays >>>>>>> sizes are quite big. Could someone also try and explain the solution >>>>>>> as >>>>>>> well >>>>>>> if they have a spare moment as I am still finding it quite difficult >>>>>>> to >>>>>>> get >>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>> one >>>>>>> could >>>>>>> precompute the indices's i and j i.e. >>>>>>> >>>>>>> i = np.arange(tsteps) >>>>>>> j = np.arange(numpts) >>>>>>> >>>>>>> but just can't get my head round how i then use them... 
>>>>>>> >>>>>>> Thanks, >>>>>>> Martin >>>>>>> >>>>>>> import numpy as np >>>>>>> >>>>>>> numpts=10 >>>>>>> tsteps = 12 >>>>>>> vari = 22 >>>>>>> >>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>> index = np.arange(numpts) >>>>>>> >>>>>>> for i in xrange(tsteps): >>>>>>> ? ?for j in xrange(numpts): >>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>> >>>>>> The index arrays need to be broadcastable against each other. >>>>>> >>>>>> I think this should do it >>>>>> >>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>> >>>>>> Josef >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Sat May 22 08:59:50 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 22 May 2010 08:59:50 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28642434.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> Message-ID: On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: > > Sounds like I am stuck with the loop as I need to do the comparison for each > pixel of the world and then I have a basemap function call which I guess > slows it down further...hmm I don't see much that could be done differently, after a brief look. stats.pearsonr could be replaced by an array version using directly the formula for correlation even with nans. wilcoxon looks slow, and I never tried or seen a faster version. just a reminder, the p-values are for a single test, when you have many of them, then they don't have the right size/confidence level for an overall or joint test. 
(some packages report a Bonferroni correction in this case) Josef > > i.e. > > def compareSnowData(jules_var): > ? ?# Extract the 11 years of snow data and return > ? ?outrows = 180 > ? ?outcols = 360 > ? ?numyears = 11 > ? ?nummonths = 12 > > ? ?# Read various files > ? ?fname="world_valid_jules_pts.ascii" > ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = > jo.read_land_points_ascii(fname, 1.0) > > ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" > ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ > ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) > ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" > ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ > ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) > > ? ?# grab some space > ? ?data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) > ? ?data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) > ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan > ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * > np.nan > > ? ?# extract the data > ? ?data1_snow = jules_data1[:,jules_var,:,0] > ? ?data2_snow = jules_data2[:,jules_var,:,0] > ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) > ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) > ? ?#for month in xrange(numyears * nummonths): > ? ?# ? ?for i in xrange(numpts): > ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] > ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] > ? ?# ? ? ? ?if data1 >= 0.0: > ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 > ? ?# ? ? ? ?else: > ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan > ? ?# ? ? ? ?if data2 > 0.0: > ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 > ? ?# ? ? ? ?else: > ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan > > ? ?# exclude any months from *both* arrays where we have dodgy data, else > we > ? ?# can't do the correlations correctly!! > ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) > ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) > > ? ?# put data on a regular grid... > ? ?print 'regridding landpts...' > ? ?for i in xrange(numpts): > ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats func > ? ? ? ?x = data1_snow[:,i] > ? ? ? ?x = x[np.isfinite(x)] > ? ? ? ?y = data2_snow[:,i] > ? ? ? ?y = y[np.isfinite(y)] > > ? ? ? ?# r^2 > ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of > data > ? ? ? ?if len(x) and len(y) > 50: > ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > (stats.pearsonr(x, y)[0])**2 > > ? ? ? ?# wilcox signed rank test > ? ? ? ?# make sure we have enough samples to do the test > ? ? ? ?d = x - y > ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero > differences > ? ? ? ?count = len(d) > ? ? ? ?if count > 10: > ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) > ? ? ? ? ? ?# only map out sign different data > ? ? ? ? ? ?if pval < 0.05: > ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > np.mean(x - y) > > ? ?return (pearsonsr_snow, wilcoxStats_snow) > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>> >>> Also I then need to remap the 2D array I make onto another grid (the >>> world in >>> this case). Which again I had am doing with a loop (note numpts is a lot >>> bigger than my example above). 
>>> >>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan >>> for i in xrange(numpts): >>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>> func >>> ? ? ? ?x = data1_snow[:,i] >>> ? ? ? ?x = x[np.isfinite(x)] >>> ? ? ? ?y = data2_snow[:,i] >>> ? ? ? ?y = y[np.isfinite(y)] >>> >>> ? ? ? ?# wilcox signed rank test >>> ? ? ? ?# make sure we have enough samples to do the test >>> ? ? ? ?d = x - y >>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >>> differences >>> ? ? ? ?count = len(d) >>> ? ? ? ?if count > 10: >>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>> ? ? ? ? ? ?# only map out sign different data >>> ? ? ? ? ? ?if pval < 0.05: >>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>> np.mean(x - y) >>> >>> Now I think I can push the data in one move into the wilcoxStats_snow >>> array >>> by removing the index, >>> but I can't see how I will get the individual x and y pts for each array >>> member correctly without the loop, this was my attempt which of course >>> doesn't work! >>> >>> x = data1_snow[:,:] >>> x = x[np.isfinite(x)] >>> y = data2_snow[:,:] >>> y = y[np.isfinite(y)] >>> >>> # r^2 >>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>> if len(x) and len(y) > 50: >>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>> y)[0])**2 >> >> >> If you want to do pairwise comparisons with stats.wilcoxon, then you >> might be stuck with the loop, since wilcoxon takes only two 1d arrays >> at a time (if I read the help correctly). >> >> Also the presence of nans might force the use a loop. stats.mstats has >> masked array versions, but I didn't see wilcoxon in the list. (Even >> when vectorized operations would work with regular arrays, nan or >> masked array versions still have to loop in many cases.) >> >> If you have many columns with count <= 10, so that wilcoxon is not >> calculated then it might be worth to use only array operations up to >> that point. If wilcoxon is calculated most of the time, then it's not >> worth thinking too hard about this. >> >> Josef >> >> >>> >>> thanks. >>> >>> >>> >>> >>> mdekauwe wrote: >>>> >>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>> >>>> I don't quite get what you mean about slicing with axis > 3. Is there a >>>> link you can recommend I should read? Does that mean given I have 4dims >>>> that Josef's suggestion would be more advised in this case? >> >> There were several discussions on the mailing lists (fancy slicing and >> indexing). Your case is safe, but if you run in future into funny >> shapes, you can look up the details. >> when in doubt, I use np.arange(...) >> >> Josef >> >>>> >>>> Thanks. >>>> >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>>>>> >>>>>> Thanks that works... >>>>>> >>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>>> step >>>>>> I >>>>>> was struggling with, so this forms a 2D array which replaces the the >>>>>> two >>>>>> for >>>>>> loops? Do I have that right? >>>>> >>>>> Yes, but as Zachary showed, if you need the full index in a dimension, >>>>> then you can use slicing. It might be faster. >>>>> And a warning, mixing slices and index arrays with 3 or more >>>>> dimensions can have some surprise switching of axes. >>>>> >>>>> Josef >>>>> >>>>>> >>>>>> A lot quicker...! 
>>>>>> >>>>>> Martin >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>> array, >>>>>>>> but >>>>>>>> avoid my current usage of the for loops for speed, as in reality the >>>>>>>> arrays >>>>>>>> sizes are quite big. Could someone also try and explain the solution >>>>>>>> as >>>>>>>> well >>>>>>>> if they have a spare moment as I am still finding it quite difficult >>>>>>>> to >>>>>>>> get >>>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>>> one >>>>>>>> could >>>>>>>> precompute the indices's i and j i.e. >>>>>>>> >>>>>>>> i = np.arange(tsteps) >>>>>>>> j = np.arange(numpts) >>>>>>>> >>>>>>>> but just can't get my head round how i then use them... >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Martin >>>>>>>> >>>>>>>> import numpy as np >>>>>>>> >>>>>>>> numpts=10 >>>>>>>> tsteps = 12 >>>>>>>> vari = 22 >>>>>>>> >>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>> index = np.arange(numpts) >>>>>>>> >>>>>>>> for i in xrange(tsteps): >>>>>>>> ? ?for j in xrange(numpts): >>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>> >>>>>>> The index arrays need to be broadcastable against each other. >>>>>>> >>>>>>> I think this should do it >>>>>>> >>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>>> >>>>>>> Josef >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> View this message in context: >>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html > Sent from the Scipy-User mailing list archive at Nabble.com. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From yosefmel at post.tau.ac.il Sat May 22 10:06:09 2010 From: yosefmel at post.tau.ac.il (Yosef Meller) Date: Sat, 22 May 2010 17:06:09 +0300 Subject: [SciPy-User] Fwd: Announcing Tracer v0.2 In-Reply-To: References: Message-ID: This is a one-time message to announce the availability of version 0.2 of the Tracer package. About --------- Tracer is a ray-tracing package for Python. It is geared toward solar energy research; written for extensibility and scriptability, it is particularly suitable for use together with optimization programs, but your imagination is the limit. Tracer is free-software, distributed under the GPL v3.0 license. You are free to use, review the code or help improve it. Features ------------- The package contains two parts: a set of modules for building and running ray tracing scenes; and a set of pre-assembled models. The scene construction part provides: * Assemblies of arbitrary complexity * Surfaces with flat, spherical, cylindrical (new in v0.2) or paraboloidal shapes; API for developing other shapes. * Surface materials with specular-reflective or refractive surfaces; API for developing more complex models * Receiver surfaces for output collection. The models included in the second part: * One-side mirror * Rectangular kaleidoscope light-guide * Parabolic dishes with circular or hexagonal apertures. * Heliostats field (new in v0.2). * Spherical lens - any plano/convex/concave combination (new in v0.2). In addition, the package contains basic tools for GUI generation using MayaVi (new in v0.2). More information ------------------------- Project website: http://tracer.berlios.de/ Project mailing list: https://lists.berlios.de/mailman/listinfo/tracer-user From chris.michalski at gmail.com Sat May 22 17:22:23 2010 From: chris.michalski at gmail.com (Chris Michalski) Date: Sat, 22 May 2010 14:22:23 -0700 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: Message-ID: <0375DFE9-9E7E-4DFB-8F93-7D2A064B4D7A@gmail.com> Haven't worked a problem like this, but it seemed to me that combining the fact that nearby points are close in (x,y) position, couldn't you compute two metrics. The first metric is the measure of distance from each point on the edge to an arbitrary point (which could be inside or outside the boomerang). Since the distance isn't unique (there could be the same distance to two or three points on the surface) then construct a metric with is the angle to the point. Sort the distance metric. Then rely on the fact that the angle metric can't change quickly between adjacent points to select from a small range in the sorted distance metric which point is closest. This should put adjacent points close enough together that the true nearest neighbor involves a search over a only couple of points either side of any individual point. You might be able to sort both metric and run the search jumping between metrics. Chris On May 21, 2010, at 2:46 PM, Anne Archibald wrote: > On 21 May 2010 17:07, wrote: >> I have a set of geometric data in x, y plane that represents a section of a >> turbine airfoil. The shape looks something like a fat boomerang with the >> coordinates wrapping around the entire shape (a completely closed loop). The >> coordinate points are in a random order and I need to sort or fit them by >> proximity to develop a dataset containing continuos shape of the airfoil. 
>> >> I started looking through the interpolation functions but I would need a >> method that ignores the order of the data (fits based on proximity of the >> points) and can handle data that forms a closed loop. >> >> The points are spaced closely enough along the airfoil surface so that they >> could be sorted by nearest neighbor - start with the first point find the >> next closest point and continue until all the points are "consumed". >> >> Any advice or pointers would be greatly appreciated. > > The most direct approach is to pick a start point at random, then ask > for its two nearest neighbours. Then pick one, and loop. For each > point ask for its two nearest neighbours; one should be the last point > you looked at, and one should be the next point on your curve. If ever > this isn't true, you've found some place where your points don't > sample closely enough to clearly describe the turbine shape. When you > get your first point back, you're done. > > As described, this is a fairly slow process, but the dominating > operation is not the python looping overhead but the time it takes to > find each nearest neighbour. Fortunately scipy.spatial includes an > object designed for this sort of problem, the kd-tree. So the way I'd > solve your problem is construct a kd-tree from your array of points, > then run a query asking the the three closest neighbours of each of > your original points (three because each point is its own closest > neighbour). Then just write a python loop to walk through the array of > neighbours as I described above. This process should be nice and fast, > and will diagnose some situations where you've inadequately sampled > your object. > > Anne > >> >> David Parker >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From gruben at bigpond.net.au Sat May 22 20:46:21 2010 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun, 23 May 2010 10:46:21 +1000 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: <896588.9118.qm@web33006.mail.mud.yahoo.com> References: <896588.9118.qm@web33006.mail.mud.yahoo.com> Message-ID: <4BF87ADD.9060501@bigpond.net.au> I've previously done something similar to David's suggestion - used a Delaunay triangulation to get the points, then used NetworkX (NX) to turn this into a Euclidean/Geometric graph structure (Google delaunay2d_nx.py). The minimum spanning tree of this graph can be found and traversed by NX to visit all the nodes in the correct order. I used the NX dfs_preorder() traversal algorithm, culled off the side branches and spline fit the remaining ordered nodes. You could probably use astar_path() instead - I don't think this was available when I did it. Either way, you would also want to duplicate the first or last node to close the path. Gary R. David Baddeley wrote: > Hi David, > > I'd probably do a Delaunay triangularisation and then, starting at an > arbitrary node, walk the shortest edges, collecting nodes as I went. > You can get the triangulation from the scikits.delaunay package, > which you'll probably already have if you've got matplotlib installed > (in this case you can find it as matplotlib.delaunay). > > You'll need to write a loop (or recursive function) to do the > walking, but that shouldn't be too tricky. 
I've done something > similar (collecting 'blobs' of unstructured points which are closer > than a certain cutoff) using this technique, so if you need any > additional pointers, or ideas on how to optimise the procedure (I > precompute a database mapping each vertex to all the edges leading > from it - otherwise you've got to loop through the entire edge list > on each iteration to find which edges go from the current > node/vertex) give me a bell. > > cheers, David From charlesr.harris at gmail.com Sat May 22 22:38:38 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 May 2010 20:38:38 -0600 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: <4BF87ADD.9060501@bigpond.net.au> References: <896588.9118.qm@web33006.mail.mud.yahoo.com> <4BF87ADD.9060501@bigpond.net.au> Message-ID: On Sat, May 22, 2010 at 6:46 PM, Gary Ruben wrote: > I've previously done something similar to David's suggestion - used a > Delaunay triangulation to get the points, then used NetworkX (NX) to > turn this into a Euclidean/Geometric graph structure (Google > delaunay2d_nx.py). The minimum spanning tree of this graph can be found > and traversed by NX to visit all the nodes in the correct order. I used > the NX dfs_preorder() traversal algorithm, culled off the side branches > and spline fit the remaining ordered nodes. You could probably use > astar_path() instead - I don't think this was available when I did it. > Either way, you would also want to duplicate the first or last node to > close the path. > > That's interesting. From the homological point of view there are two cycles, an inner one and an outer one. I suppose the orientation of any triangle would then be the sign of the determinant of the matrix formed from the vectors of it's vertices in some given order, although there is probably a more efficient way to assign orientations. Remove the edges that cancel out and it should be easy to find a cycle. I'll bet there is software out there for that problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 22 23:01:52 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 May 2010 21:01:52 -0600 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: <896588.9118.qm@web33006.mail.mud.yahoo.com> <4BF87ADD.9060501@bigpond.net.au> Message-ID: On Sat, May 22, 2010 at 8:38 PM, Charles R Harris wrote: > > > On Sat, May 22, 2010 at 6:46 PM, Gary Ruben wrote: > >> I've previously done something similar to David's suggestion - used a >> Delaunay triangulation to get the points, then used NetworkX (NX) to >> turn this into a Euclidean/Geometric graph structure (Google >> delaunay2d_nx.py). The minimum spanning tree of this graph can be found >> and traversed by NX to visit all the nodes in the correct order. I used >> the NX dfs_preorder() traversal algorithm, culled off the side branches >> and spline fit the remaining ordered nodes. You could probably use >> astar_path() instead - I don't think this was available when I did it. >> Either way, you would also want to duplicate the first or last node to >> close the path. >> >> > That's interesting. From the homological point of view there are two > cycles, an inner one and an outer one. 
I suppose the orientation of any > triangle would then be the sign of the determinant of the matrix formed from > the vectors of it's vertices in some given order, although there is probably > a more efficient way to assign orientations. Remove the edges that cancel > out and it should be easy to find a cycle. I'll bet there is software out > there for that problem. > > Umm, boundaries, not cycles ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Sun May 23 11:42:53 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Sun, 23 May 2010 17:42:53 +0200 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? Message-ID: Hello, I'm still a newbie at NumPy/Scipy. I have a question: When I call one of SciPy's or NumPy's efficient C routines: Do they release the GIL? Thanks, Ram Rachum. (P.S. Please put me in the `to` field of any replies.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun May 23 13:47:54 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 23 May 2010 12:47:54 -0500 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? In-Reply-To: References: Message-ID: On Sun, May 23, 2010 at 10:42, cool-RR wrote: > Hello, > I'm still a newbie at NumPy/Scipy.?I have a question: When I call one of > SciPy's or NumPy's efficient C routines: Do they release the GIL? Sometimes. You will have to check the sources of the particular function in order to determine that. > Thanks, > Ram Rachum. > (P.S. Please put me in the `to` field of any replies.) If you want this, then you should set your Reply-To: header appropriately. But everyone would be much happier if you were to simply subscribe to the lists in which you are asking questions. You are placing a significant annoyance on those who would donate their time to help you. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From d.l.goldsmith at gmail.com Mon May 24 00:37:18 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 23 May 2010 21:37:18 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! Message-ID: Hi, folks. It's SciPy Marathon time again! For those who have just joined us: the past two summers, volunteers<%3Chttp://docs.scipy.org/numpy/contributors/%3E>from the NumPy/SciPy community have worked together to improve NumPy's documentation. So far, we have written most of the docs for NumPy (see, e.g., http://conference.scipy.org/proceedings/SciPy2009/paper_14/), using a Wiki application (pydocweb <%3Chttp://code.google.com/p/pydocweb/>, thanks to Pauli-Virtanen, Emmanuelle Gouillart, St?fan van der Walt, and Gael Varoquaux) that we use to edit and manage the docs<%3Chttp://docs.scipy.org/numpy/docs/%3E> . We have advanced the NumPy docstrings from only 8% being Ready For Review or better, to 85% being so (click here for details). This summer, we will focus on doing the same thing for SciPy. To participate (and we very much hope you will), please start by reading http://docs.scipy.org/numpy/Front%20Page/, and, in particular, the Before You Start section (but it's all important). (Despite the URL and any wording that may make it seem otherwise, the information there is as applicable to editing the SciPy documentation as it is to NumPy.) 
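A rough empirical check, if you don't want to dig through the sources, is to time the same call run twice sequentially versus run in two threads; on a multi-core machine, a routine that releases the GIL should come close to halving the wall time, while one that holds it will not. An untested sketch (np.dot and the array size are only placeholders for whichever routine you actually care about):

import threading
import time
import numpy as np

a = np.random.rand(1500, 1500)

def work():
    # spends essentially all of its time inside compiled code
    np.dot(a, a)

t0 = time.time()
work(); work()
serial = time.time() - t0

t0 = time.time()
threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - t0

print("serial %.2f s, threaded %.2f s" % (serial, threaded))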
As far as actually performing the work is concerned, we will again attack these things as teams: go to http://docs.scipy.org/scipy/Milestones/ and poke around. Figure out where you think you could do the most good, and join that team by appending your name below the Milestone(s). If you think you could help lead a team (i.e., answer technical questions about the subject(s) encompassed by a Milestone) and are willing to do so, please append (L) to your name. As the summer progresses, look for Marathon-related announcements on the scipy-dev email list (subscription strongly recommended for Marathon participants). Happy editing! David Goldsmith Editor Pro Tem -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Mon May 24 04:47:08 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Mon, 24 May 2010 10:47:08 +0200 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? In-Reply-To: References: Message-ID: On Sun, May 23, 2010 at 7:47 PM, Robert Kern wrote: > On Sun, May 23, 2010 at 10:42, cool-RR wrote: > > Hello, > > I'm still a newbie at NumPy/Scipy. I have a question: When I call one of > > SciPy's or NumPy's efficient C routines: Do they release the GIL? > > Sometimes. You will have to check the sources of the particular > function in order to determine that. Thanks for the info. > > Thanks, > > Ram Rachum. > > (P.S. Please put me in the `to` field of any replies.) > > If you want this, then you should set your Reply-To: header > appropriately. But everyone would be much happier if you were to > simply subscribe to the lists in which you are asking questions. You > are placing a significant annoyance on those who would donate their > time to help you. > I really don't know what to about this. If I'm setting the Reply-To, I should set it to both my address and the list's address. But I'm on G-Mail, and I can only set the Reply-To header globally. If I set it to my personal mail address, then by default when people will reply to me it will not get to the list, which would be undesirable as well. Regarding actually subscribing to the lists: I'm not sure how I'm supposed to deal with the 90% of messages that I'm not interested in. I'm active in a few dozen mailing lists, and I wouldn't want to receive every single message that is sent to each of them. The nicest arrangement I have is with lists managed by Google Groups: They have a feature where you can pick a specific thread and ask to receive emails only from it. That's really convenient. Of course, Google Groups has its own drawbacks. If you have any suggestion about what I can do, I'd be happy to hear it. (Of course you may reply to this off-list, this is probably not of interest to this list.) Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Mon May 24 06:45:44 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 24 May 2010 03:45:44 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: References: Message-ID: On Sun, May 23, 2010 at 9:37 PM, David Goldsmith wrote: > Hi, folks.? It's SciPy Marathon time again! Excellent! Thanks for spearheading this again David. The improvements to our documentation over the previous marathons have been staggering and I look forward to seeing how much is accomplished this summer. 
Thanks, Jarrod From harald.schilly at gmail.com Mon May 24 07:22:56 2010 From: harald.schilly at gmail.com (Harald Schilly) Date: Mon, 24 May 2010 13:22:56 +0200 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? In-Reply-To: References: Message-ID: On Mon, May 24, 2010 at 10:47, cool-RR wrote: > If you have any suggestion about what I can do, I'd be happy to hear it. in gmail, in the top menu in the "more actions" dropdown select "filter messages like these", apply a label "scipy" and check the "skip inbox" (or something like that). then you do not see the messages and you can click on the scipy label to read them. h From Dharhas.Pothina at twdb.state.tx.us Tue May 25 10:17:55 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 25 May 2010 09:17:55 -0500 Subject: [SciPy-User] Parameterizing a curve / Map curve to line Message-ID: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> Hi, I'm am trying to correct some bathymetric data. We have a series of x,y,z values that represent the boat path (x,y) and depth (z) on a river. Due to trees etc in some sections of the data the GPS went out leading to spurious x & y values. We have manually drawn in the boat path in those sections and I am trying to move the spurious points onto this manually drawn path. Essentially I have a points dataset (x,y,z) that contains all the data points and a boatpath dataset (x,y) that represents the actual boat path. The plan is to traverse the boat path curve and if the next point lies on the boatpath leave it alone, if it does not to use the boat speed (known value) to calculate how much farther along the boat path curve the next point should be and move it to that location. My plan to do this involved either parameterizing or mapping the boat path curve to a 1D line and then mapping the final corrected points back to the curve. One way of doing this that I thought of was to interpolate the boat path curve to a higher resolution (ie 0.1ft spacing etc) and then calculate the cumulative distance along the line from the origin. The boat path is not monotonically increasing in x, curves around and sometimes can loop around. Looking at the interp and spline functions in scipy I'm unsure how to interpolate to a 0.1ft spacing in this case. Also any other ideas for achieving the above is welcomed. - dharhas From zachary.pincus at yale.edu Tue May 25 11:20:58 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 25 May 2010 11:20:58 -0400 Subject: [SciPy-User] Parameterizing a curve / Map curve to line In-Reply-To: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> References: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> Message-ID: > Essentially I have a points dataset (x,y,z) that contains all the > data points and a boatpath dataset (x,y) that represents the actual > boat path. The plan is to traverse the boat path curve and if the > next point lies on the boatpath leave it alone, if it does not to > use the boat speed (known value) to calculate how much farther along > the boat path curve the next point should be and move it to that > location. > > My plan to do this involved either parameterizing or mapping the > boat path curve to a 1D line and then mapping the final corrected > points back to the curve. One way of doing this that I thought of > was to interpolate the boat path curve to a higher resolution (ie > 0.1ft spacing etc) and then calculate the cumulative distance along > the line from the origin. 
> > The boat path is not monotonically increasing in x, curves around > and sometimes can loop around. Looking at the interp and spline > functions in scipy I'm unsure how to interpolate to a 0.1ft spacing > in this case. Also any other ideas for achieving the above is > welcomed. You could use the scipy.interpolate.splprep routines to fit a parametric spline to your (x,y,p) data (where p is some parameter: timestamps if you have them, else just monotonically increasing indices), then use splev to resample the curve. (If you have timestamps and speeds, you could probably figure out a set of evaluation times that ought to yield 0.1ft resolution, but you probably don't need that kind of precision: you could just upsample by 10-fold or whatever if desired.) This is probably overkill, though... you could just use numpy.interp to linearly interpolate your data up by 10-fold or something. (Again, parameterized by t or some index: you just need to resample the x and y data separately and then put it back together... this is what the splprep routines do under the hood.) And of course for this part of your plan: > traverse the boat path curve and if the next point lies on the > boatpath leave it alone I would recommend not just testing points for equality, especially after resampling, but for each point, calculate the minimum distance from that point to the boatpath, and if it's below a threshold value, leave it alone. Zach From afraser at lanl.gov Tue May 25 12:39:55 2010 From: afraser at lanl.gov (Andy Fraser) Date: Tue, 25 May 2010 10:39:55 -0600 Subject: [SciPy-User] using multiple processors for particle filtering Message-ID: <8739xgndes.fsf@lanl.gov> I am using a particle filter to estimate the trajectory of a camera based on a sequence of images taken by the camera. The code is slow, but I have 8 processors in my desktop machine. I'd like to use them to get results 8 times faster. I've been looking at the following sections of http://docs.python.org/library: "16.6. multiprocessing" and "16.2. threading". I've also read some discussion from 2006 on scipy-user at scipy.org about seeds for random numbers in threads. I don't have any experience with multiprocessing and would appreciate advice. Here is a bit of code that I want to modify: for i in xrange(len(self.particles)): self.particles[i] = self.particles[i].random_fork() Each particle is a class instance that represents a possible camera state (position, orientation, and velocities). particle.random_fork() is a method that moves the position and orientation based on current velocities and then uses numpy.random.standard_normal((N,)) to perturb the velocities. I handle the correlation structure of the noise by matrices that are members of particle, and I do some of the calculations in c++. I would like to do something like: for i in xrange(len(self.particles)): nv = numpy.random.standard_normal((N,)) launch_on_any_available_processor( self.particles[i] = self.particles[i].random_fork(nv) ) wait_for_completions() But I don't see a command like "launch_on_any_available_processor". I would be grateful for any advice. -- Andy Fraser ISR-2 (MS:B244) afraser at lanl.gov Los Alamos National Laboratory 505 665 9448 Los Alamos, NM 87545 From yosefm at gmail.com Thu May 20 01:06:46 2010 From: yosefm at gmail.com (Yosef Meller) Date: Thu, 20 May 2010 08:06:46 +0300 Subject: [SciPy-User] Announcing Tracer v0.2 Message-ID: This is a one-time message to announce the availability of version 0.2 of the Tracer package. 
About --------- Tracer is a ray-tracing package for Python. It is geared toward solar energy research; written for extensibility and scriptability, it is particularly suitable for use together with optimization programs, but your imagination is the limit. Tracer is free-software, distributed under the GPL v3.0 license. You are free to use, review the code or help improve it. Features ------------- The package contains two parts: a set of modules for building and running ray tracing scenes; and a set of pre-assembled models. The scene construction part provides: * Assemblies of arbitrary complexity * Surfaces with flat, spherical, cylindrical (new in v0.2) or paraboloidal shapes; API for developing other shapes. * Surface materials with specular-reflective or refractive surfaces; API for developing more complex models * Receiver surfaces for output collection. The models included in the second part: * One-side mirror * Rectangular kaleidoscope light-guide * Parabolic dishes with circular or hexagonal apertures. * Heliostats field (new in v0.2). * Spherical lens - any plano/convex/concave combination (new in v0.2). In addition, the package contains basic tools for GUI generation using MayaVi (new in v0.2). More information ------------------------- Project website: http://tracer.berlios.de/ Project mailing list: https://lists.berlios.de/mailman/listinfo/tracer-user From Christer.Malmberg.0653 at student.uu.se Fri May 21 12:45:57 2010 From: Christer.Malmberg.0653 at student.uu.se (Christer Malmberg) Date: Fri, 21 May 2010 18:45:57 +0200 Subject: [SciPy-User] Weave compilation problems Message-ID: <20100521184557.u1k7rafqzowk8c00@webmail.uu.se> Hi, I'm trying to use weave, but I't doesn't work. Here is the error I get: ... File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 272, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 184, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 162, in setup raise SystemExit, error CompileError: error: Bad file descriptor I know nothing about C++ programming, the code is part of a package I need to use for curve fitting. I contacted the author, but he suggested the problem is in my setup of weave. I'm running windows, with the Visual Express 2008 compiler, latest SciPy/Numpy and python 2.6.5. I didn't find anything on this error while googleing. Anyone have an idea what might be the problem? Best regards, Christer Malmberg From Christer.Malmberg.0653 at student.uu.se Sat May 22 03:22:34 2010 From: Christer.Malmberg.0653 at student.uu.se (Christer Malmberg) Date: Sat, 22 May 2010 09:22:34 +0200 Subject: [SciPy-User] Weave compilation problems Message-ID: <20100522092234.vaco60cm2o48sogs@webmail6.uu.se> Hi, I'm trying to use weave, but I't doesn't work. Here is the error I get: ... File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 272, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 184, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 162, in setup raise SystemExit, error CompileError: error: Bad file descriptor I know nothing about C++ programming, the code is part of a package I need to use for curve fitting. I contacted the author, but he suggested the problem is in my setup of weave. 
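A minimal way to test the weave setup on its own, independent of the curve-fitting package, is to compile a trivial inline snippet. The lines below are only a generic smoke test (not code from that package), but they should exercise the same build_tools/distutils step that shows up in the traceback:

    from scipy import weave

    # Compiles a tiny one-line C snippet the first time it is run; if the
    # compiler setup is at fault, this will most likely stop with the same
    # kind of build error as the full package does.
    print weave.inline("return_val = 1 + 1;")   # expected result: 2
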
I'm running windows, with the Visual Express 2008 compiler, latest SciPy/Numpy and python 2.6.5. I didn't find anything on this error while googleing. Anyone have an idea what might be the problem? Best regards, Christer Malmberg From et.barthel at free.fr Sat May 22 08:06:54 2010 From: et.barthel at free.fr (et.barthel at free.fr) Date: Sat, 22 May 2010 14:06:54 +0200 Subject: [SciPy-User] complex numbers - sign problem Message-ID: <1274530014.4bf7c8de59422@imp.free.fr> Hi, not sure it's the right place to ask the question, but I don't really know where to send it. I have an issue with cmath sign handling: In [75]: -1-0.j Out[75]: (-1+0j) In [76]: -(1+0.j) Out[76]: (-1-0j) is a bit strange by itself - at least to me - but combined with a multivalued function which has a branch cut on the x-axis leads to significant and potentially harmful sign problem. Of course one can fiddle around the problem but I would like to be sure if this is a bug or if there is some sense to it. Thanks, Etienne From josef.pktd at gmail.com Tue May 25 13:04:00 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 25 May 2010 13:04:00 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: <1274530014.4bf7c8de59422@imp.free.fr> References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Sat, May 22, 2010 at 8:06 AM, wrote: > Hi, > not sure it's the right place to ask the question, but I don't really know where > to send it. > I have an issue with cmath sign handling: > In [75]: -1-0.j > Out[75]: (-1+0j) > > In [76]: -(1+0.j) > Out[76]: (-1-0j) > is a bit strange by itself - at least to me - but combined with a multivalued > function which has a branch cut on the x-axis leads to significant and > potentially harmful sign problem. Of course one can fiddle around the problem > but I would like to be sure if this is a bug or if there is some sense to it. (not an answer) negative zero depends on the operating system on Windows with Python 2.5: >>> -(1+0.j) (-1+0j) >>> (-1-0.j) (-1+0j) So relying on the sign of zero doesn't look like a good strategy to me Josef > Thanks, > Etienne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Tue May 25 13:46:22 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 25 May 2010 13:46:22 -0400 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: <8739xgndes.fsf@lanl.gov> References: <8739xgndes.fsf@lanl.gov> Message-ID: > I would like to do something like: > > for i in xrange(len(self.particles)): > nv = numpy.random.standard_normal((N,)) > launch_on_any_available_processor( > self.particles[i] = self.particles[i].random_fork(nv) > ) > wait_for_completions() > > But I don't see a command like "launch_on_any_available_processor". > I would be grateful for any advice. > Look more in depth at the multiprocessing library -- it's likely to be what you want... more or less. However, it might be a bit "less" than "more" because, from above, what it looks like you want to do is to launch many (millions?) of very lightweight tasks ("random_fork") on different processors. If you naively start up a fresh python process for each "random_fork" task, the startup costs will dominate (hugely so). 
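One way around that is to hand each process a single large block of work instead of one particle at a time. As a rough, untested sketch (the worker function, the block size and the per-worker seeding are made up for illustration, and a plain noise array stands in for the particle objects):

    import multiprocessing
    import numpy

    N = 6   # stand-in for the length of the velocity-noise vector

    def perturb_block(args):
        # Generate every perturbation for one block of particles in a single
        # call, so the process start-up and pickling cost is paid once per
        # block instead of once per particle.
        seed, n_particles = args
        rng = numpy.random.RandomState(seed)   # independent stream per worker
        return rng.standard_normal((n_particles, N))

    if __name__ == '__main__':
        total, n_workers = 100000, 8
        pool = multiprocessing.Pool(n_workers)
        blocks = pool.map(perturb_block,
                          [(seed, total // n_workers) for seed in range(n_workers)])
        noise = numpy.concatenate(blocks)   # one (total, N) array of perturbations

Each worker only returns an array, which also sidesteps the question of how to push updated particle objects back and forth between processes.
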
So you'll likely need to re-jigger the task you want to dispatch to each processor so that the chunks are larger: processes = [] for i in range(num_processors): processes.append(start_worker_process(num_particles=total_particles// num_processors)) wait_for_processes_to_end() self.particles = numpy.concatenate([process.particles]) Though even this might be a too-granular a task, if you have numerous time-steps for which you need to generate particles. (E.g. if the above would be within another large loop.) Then you'd need to write the worker processes as sort of "particle servers": processes = [] for i in range(num_processors): processes.append(start_worker()) for task in huge_task_list: sub_tasks = divide_tasks(num_processors) for process, sub_task in zip(processes, sub_tasks): process.start(sub_task) wait_for_processes_to_complete_task() self.results = assemble_results(processes) This is of course pretty naive still, but it'll be a start, and the architectures I've (roughly) outlined here fit better with the multiprocessing paradigm. You'll need to do some internet reading and looking at various examples first, to get the details of how to actually implement this stuff. I don't know anything off the top of my head, but perhaps others can chime in? Zach From robert.kern at gmail.com Tue May 25 14:10:58 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 May 2010 14:10:58 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: <1274530014.4bf7c8de59422@imp.free.fr> References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Sat, May 22, 2010 at 08:06, wrote: > Hi, > not sure it's the right place to ask the question, but I don't really know where > to send it. > I have an issue with cmath sign handling: > In [75]: -1-0.j > Out[75]: (-1+0j) > > In [76]: -(1+0.j) > Out[76]: (-1-0j) > is a bit strange by itself - at least to me - but combined with a multivalued > function which has a branch cut on the x-axis leads to significant and > potentially harmful sign problem. Of course one can fiddle around the problem > but I would like to be sure if this is a bug or if there is some sense to it. This looks like cornercase with Python's parsing of the imaginary literal. You can work around it by explicitly using the complex() constructor: In [7]: -1.0 - 0.0j Out[7]: (-1+0j) In [8]: complex(-1.0, -0.0) Out[8]: (-1-0j) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Tue May 25 14:17:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 25 May 2010 14:17:42 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Tue, May 25, 2010 at 2:10 PM, Robert Kern wrote: > On Sat, May 22, 2010 at 08:06, ? wrote: >> Hi, >> not sure it's the right place to ask the question, but I don't really know where >> to send it. >> I have an issue with cmath sign handling: >> In [75]: -1-0.j >> Out[75]: (-1+0j) >> >> In [76]: -(1+0.j) >> Out[76]: (-1-0j) >> is a bit strange by itself - at least to me - but combined with a multivalued >> function which has a branch cut on the x-axis leads to significant and >> potentially harmful sign problem. Of course one can fiddle around the problem >> but I would like to be sure if this is a bug or if there is some sense to it. 
> > This looks like cornercase with Python's parsing of the imaginary > literal. You can work around it by explicitly using the complex() > constructor: > > In [7]: -1.0 - 0.0j > Out[7]: (-1+0j) > > In [8]: complex(-1.0, -0.0) > Out[8]: (-1-0j) >>> complex(-1.0, -0.0) (-1+0j) Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Tue May 25 14:38:18 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 May 2010 14:38:18 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Tue, May 25, 2010 at 14:17, wrote: > On Tue, May 25, 2010 at 2:10 PM, Robert Kern wrote: >> On Sat, May 22, 2010 at 08:06, ? wrote: >>> Hi, >>> not sure it's the right place to ask the question, but I don't really know where >>> to send it. >>> I have an issue with cmath sign handling: >>> In [75]: -1-0.j >>> Out[75]: (-1+0j) >>> >>> In [76]: -(1+0.j) >>> Out[76]: (-1-0j) >>> is a bit strange by itself - at least to me - but combined with a multivalued >>> function which has a branch cut on the x-axis leads to significant and >>> potentially harmful sign problem. Of course one can fiddle around the problem >>> but I would like to be sure if this is a bug or if there is some sense to it. >> >> This looks like cornercase with Python's parsing of the imaginary >> literal. You can work around it by explicitly using the complex() >> constructor: >> >> In [7]: -1.0 - 0.0j >> Out[7]: (-1+0j) >> >> In [8]: complex(-1.0, -0.0) >> Out[8]: (-1-0j) > >>>> complex(-1.0, -0.0) > (-1+0j) ... and use Python 2.6. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robince at gmail.com Tue May 25 17:16:47 2010 From: robince at gmail.com (Robin) Date: Tue, 25 May 2010 22:16:47 +0100 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: <8739xgndes.fsf@lanl.gov> References: <8739xgndes.fsf@lanl.gov> Message-ID: On Tue, May 25, 2010 at 5:39 PM, Andy Fraser wrote: > I am using a particle filter to estimate the trajectory of a camera > based on a sequence of images taken by the camera. ?The code is slow, > but I have 8 processors in my desktop machine. ?I'd like to use them > to get results 8 times faster. ?I've been looking at the following > sections of http://docs.python.org/library: "16.6. multiprocessing" > and "16.2. threading". ?I've also read some discussion from 2006 on > scipy-user at scipy.org about seeds for random numbers in threads. ?I > don't have any experience with multiprocessing and would appreciate > advice. > > Here is a bit of code that I want to modify: > > ? ? ? ?for i in xrange(len(self.particles)): > ? ? ? ? ? ?self.particles[i] = self.particles[i].random_fork() If the updates are independent and don't have to be done sequentially you can use the multiprocessing.Pool interface which I've found very convenient for this sort of thing. Ideally if particles[i] is a class instance then random_fork could modify itself in place instad of returning a modified copy of the instance... 
then you could do something like def update_particle(self, i): nv = numpy.random.standard_normal((N,)) self.particles[i].random_fork(nv) p = multiprocessing.Pool(8) p.map(self.update_particle, range(len(self.particles))) this will distribute each update_particle call to a different process using all cores (providing the processing is independent). I'm not sure if random is multiprocessor safe for use like this so that would need checking but I hope this helps a bit... cheers Robin > Each particle is a class instance that represents a possible camera > state (position, orientation, and velocities). ?particle.random_fork() > is a method that moves the position and orientation based on current > velocities and then uses numpy.random.standard_normal((N,)) to perturb > the velocities. ?I handle the correlation structure of the noise by > matrices that are members of particle, and I do some of the > calculations in c++. > > I would like to do something like: > > ? ? ? ?for i in xrange(len(self.particles)): > ? ? ? ? ? ?nv = numpy.random.standard_normal((N,)) > ? ? ? ? ? ?launch_on_any_available_processor( > ? ? ? ? ? ? ? ?self.particles[i] = self.particles[i].random_fork(nv) > ? ? ? ? ? ?) > ? ? ? ?wait_for_completions() > > But I don't see a command like "launch_on_any_available_processor". > I would be grateful for any advice. > > -- > Andy Fraser ? ? ? ? ? ? ? ? ? ? ? ? ? ? ISR-2 ? (MS:B244) > afraser at lanl.gov ? ? ? ? ? ? ? ? ? ? ? ?Los Alamos National Laboratory > 505 665 9448 ? ? ? ? ? ? ? ? ? ? ? ? ? ?Los Alamos, NM 87545 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robince at gmail.com Tue May 25 17:19:38 2010 From: robince at gmail.com (Robin) Date: Tue, 25 May 2010 22:19:38 +0100 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: References: <8739xgndes.fsf@lanl.gov> Message-ID: On Tue, May 25, 2010 at 10:16 PM, Robin wrote: > If the updates are independent and don't have to be done sequentially > you can use the multiprocessing.Pool interface which I've found very > convenient for this sort of thing. > > Ideally if particles[i] is a class instance then random_fork could > modify itself in place instad of returning a modified copy of the > instance... then you could do something like > > def update_particle(self, i): > ? ?nv = numpy.random.standard_normal((N,)) > ? ?self.particles[i].random_fork(nv) > > p = multiprocessing.Pool(8) > p.map(self.update_particle, range(len(self.particles))) Sorry - just thought it probably doesn't make sense to use map in this case since your processing function isn't returning anything... you can check Pool.apply_async (which returns control and lets stuff continue in the background) and Pool.apply_sync (which is probably what you want). Cheers Robin > > this will distribute each update_particle call to a different process > using all cores (providing the processing is independent). > > I'm not sure if random is multiprocessor safe for use like this so > that would need checking but I hope this helps a bit... 
> From robince at gmail.com Tue May 25 17:50:54 2010 From: robince at gmail.com (Robin) Date: Tue, 25 May 2010 22:50:54 +0100 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: References: <8739xgndes.fsf@lanl.gov> Message-ID: On Tue, May 25, 2010 at 10:19 PM, Robin wrote: > Sorry - just thought it probably doesn't make sense to use map in this > case since your processing function isn't returning anything... you > can check Pool.apply_async (which returns control and lets stuff > continue in the background) and Pool.apply_sync (which is probably > what you want). I'm being a bit silly I think - this won't work properly of course because the particle instances will be changed in the subprocesses and not propagated back.. but hopefully the pointer to using Pool if useful. If you can split the task into an independent function then it's really handy... Maybe something like self.updated_particles = p.map(update_particle, self.particles) where update_particle takes a single particle instance, does the random number generation and returns the updated particle. Cheers Robin From warren.weckesser at enthought.com Tue May 25 18:00:24 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 25 May 2010 17:00:24 -0500 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: References: Message-ID: <4BFC4878.2050303@enthought.com> David Goldsmith wrote: > > Hi, folks. It's SciPy Marathon time again! > > > As far as actually performing the work is concerned, we will again > attack these things as teams: go to > http://docs.scipy.org/scipy/Milestones/ and poke around. > David, how was that list generated? Because of some refactoring I did in linalg and signal, many of the links in these modules will give a warning that the docstring is obsolete because the corresponding object is no longer present in SVN. Warren From d.l.goldsmith at gmail.com Tue May 25 18:52:42 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 25 May 2010 15:52:42 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: <4BFC4878.2050303@enthought.com> References: <4BFC4878.2050303@enthought.com> Message-ID: On Tue, May 25, 2010 at 3:00 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > David Goldsmith wrote: > > > > Hi, folks. It's SciPy Marathon time again! > > > > > > > As far as actually performing the work is concerned, we will again > > attack these things as teams: go to > > http://docs.scipy.org/scipy/Milestones/ and poke around. > > > > David, how was that list generated? http://docs.scipy.org/scipy/Milestones/log/ Jack Liddle created it "by hand" last summer. > Because of some refactoring I did > in linalg and signal, many of the links in these modules will give a > warning that the docstring is obsolete because the corresponding object > is no longer present in SVN. > Do you have a list of all the new objects you created? DG > > Warren > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at enthought.com Tue May 25 19:19:00 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 25 May 2010 18:19:00 -0500 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: References: <4BFC4878.2050303@enthought.com> Message-ID: <4BFC5AE4.2020104@enthought.com> David Goldsmith wrote: > On Tue, May 25, 2010 at 3:00 PM, Warren Weckesser > > wrote: > > David Goldsmith wrote: > > > > Hi, folks. It's SciPy Marathon time again! > > > > > > > As far as actually performing the work is concerned, we will again > > attack these things as teams: go to > > http://docs.scipy.org/scipy/Milestones/ and poke around. > > > > David, how was that list generated? > > > http://docs.scipy.org/scipy/Milestones/log/ > > Jack Liddle created it "by hand" last summer. > > > Because of some refactoring I did > in linalg and signal, many of the links in these modules will give a > warning that the docstring is obsolete because the corresponding > object > is no longer present in SVN. > > > Do you have a list of all the new objects you created? > I think this is it: linalg: new modules (reorganized basic.py and decomp.py): decomp_cholesky.py decomp_lu.py decomp_qr.py decomp_schur.py decomp_svd.py special_matrices.py linalg: actual new functions: decomp_cholesky.cho_solve_banded special_matrices.circulant special_matrices.companion special_matrices.hadamard special_matrices.leslie signal: new modules (moved window functions from signaltools.py): windows.py signal: actual new functions: ltisys.impulse2 ltisys.step2 waveforms.sweep_poly Warren > DG > > > Warren > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > -- > Mathematician: noun, someone who disavows certainty when their > uncertainty set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with > her lies, prevents mankind from committing a general suicide. (As > interpreted by Robert Graves) > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peter.shepard at gmail.com Tue May 25 21:35:26 2010 From: peter.shepard at gmail.com (Pete Shepard) Date: Tue, 25 May 2010 18:35:26 -0700 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s In-Reply-To: References: Message-ID: Hi Josef, An example of ratios that returns "nan"; 110:859 and 48:327 On Fri, May 7, 2010 at 1:15 PM, wrote: > On Fri, May 7, 2010 at 3:45 PM, Pete Shepard > wrote: > > Hello List, > > > > > > I am using "fisherexact.py" to calculate the p-value of two ratios > however, > > when large #s are involved, it returns "NA". Is there a way to override > > this? > > > You mean fisherexact in http://projects.scipy.org/scipy/ticket/956 ? > > Do you have an example? Can you add it to the ticket? > > Do you have large ratios or large numbers in each cell? > If you have a large number of entries in each cell, then the chisquare > test or similar > asymptotic tests should be pretty reliable. > > Last time I tried, I didn't manage to get rid of incorrect results if > the first cell is zero. > And I didn't understand the details of the algorithm well enough to > figure out what's > going on (within a reasonable time). 
> > If you add some print statements, you could find out if the nan comes from > a > 0./0. division or from the hypergeometric distribution. > Do you get the same result if you permute rows or columns? > > fisherexact works very well over a large range of values, but I'm > waiting for someone > to provide a patch for the cases that don't work. > > Josef > > > > > > > > > TIA > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed May 26 02:03:39 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 25 May 2010 23:03:39 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: <4BFC5AE4.2020104@enthought.com> References: <4BFC4878.2050303@enthought.com> <4BFC5AE4.2020104@enthought.com> Message-ID: On Tue, May 25, 2010 at 4:19 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > David Goldsmith wrote: > > On Tue, May 25, 2010 at 3:00 PM, Warren Weckesser > > > > wrote: > > > > David Goldsmith wrote: > > > > > > Hi, folks. It's SciPy Marathon time again! > > > > > > > > > > > As far as actually performing the work is concerned, we will again > > > attack these things as teams: go to > > > http://docs.scipy.org/scipy/Milestones/ and poke around. > > > > > > > David, how was that list generated? > > > > > > http://docs.scipy.org/scipy/Milestones/log/ > > > > Jack Liddle created it "by hand" last summer. > > > > > > Because of some refactoring I did > > in linalg and signal, many of the links in these modules will give a > > warning that the docstring is obsolete because the corresponding > > object > > is no longer present in SVN. > > > > > > Do you have a list of all the new objects you created? > > > > I think this is it: > > linalg: new modules (reorganized basic.py and decomp.py): > decomp_cholesky.py > decomp_lu.py > decomp_qr.py > decomp_schur.py > decomp_svd.py > special_matrices.py > > linalg: actual new functions: > decomp_cholesky.cho_solve_banded > special_matrices.circulant > special_matrices.companion > special_matrices.hadamard > special_matrices.leslie > > signal: new modules (moved window functions from signaltools.py): > windows.py > > signal: actual new functions: > ltisys.impulse2 > ltisys.step2 > waveforms.sweep_poly > OK, thanks, I'll make sure the Wiki is seeing them, then add them to the Milestones. DG > > > Warren > > > > DG > > > > > > Warren > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > > > -- > > Mathematician: noun, someone who disavows certainty when their > > uncertainty set is non-empty, even if that set has measure zero. > > > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with > > her lies, prevents mankind from committing a general suicide. 
(As > > interpreted by Robert Graves) > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dharhas.Pothina at twdb.state.tx.us Wed May 26 08:43:49 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Wed, 26 May 2010 07:43:49 -0500 Subject: [SciPy-User] Parameterizing a curve / Map curve to line In-Reply-To: References: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> Message-ID: <4BFCD135.63BA.009B.0@twdb.state.tx.us> >This is probably overkill, though... you could just use numpy.interp >to linearly interpolate your data up by 10-fold or something. (Again, >parameterized by t or some index: you just need to resample the x and >y data separately and then put it back together... this is what the >splprep routines do under the hood.) Thanks I used this technique with some modifications to get equal spacing and it seems to be working great. > I would recommend not just testing points for equality, especially > after resampling, but for each point, calculate the minimum distance > from that point to the boatpath, and if it's below a threshold value, > leave it alone. This was my original plan, I just left it out of the explaination for simplicity. - dharhas From chanley at stsci.edu Wed May 26 12:19:51 2010 From: chanley at stsci.edu (Christopher Hanley) Date: Wed, 26 May 2010 12:19:51 -0400 Subject: [SciPy-User] numpy and the Google App Engine Message-ID: Greetings, Google provides a product called App Engine. The description from their site follows, "Google App Engine enables you to build and host web apps on the same systems that power Google applications. App Engine offers fast development and deployment; simple administration, with no need to worry about hardware, patches or backups; and effortless scalability. " You can deploy applications written in either Python or JAVA. There are free and paid versions of the service. The Google App Engine would appear to be a powerful source of CPU cycles for scientific computing. Unfortunately this is currently not the case because numpy is not one of the supported libraries. The Python App Engine allows only the installation of user supplied pure Python code. I have recently returned from attending the Google I/O conference in San Francisco. While there I inquired into the possibility of getting numpy added. The basic response was that there doesn't appear to be much interest from the community given the amount of work it would take to vet and add numpy. I would like to ask your help in changing this perception. 
The quickest and easiest thing you can do would be to add your "me too" to this feature request (item #190) on the support site: http://code.google.com/p/googleappengine/issues/detail?id=190 If this issue is important to you could also consider raising this issue in the related Google Group: http://groups.google.com/group/google-appengine Letting Google know how you will use numpy would be helpful. If you or your institution would be willing to pay for service if you could deploy cloud applications that required numpy would be helpful to let them know as well. Finally, if you run into any App Engine developers (Guido included) let them know that you would like to see numpy added. Thank you for your time and consideration. Chris -- Christopher Hanley Senior Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From robert.kern at gmail.com Wed May 26 12:54:17 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 May 2010 12:54:17 -0400 Subject: [SciPy-User] [SciPy-Dev] numpy and the Google App Engine In-Reply-To: References: Message-ID: On Wed, May 26, 2010 at 12:19, Christopher Hanley wrote: > Greetings, > > Google provides a product called App Engine. ?The description from > their site follows, > > "Google App Engine enables you to build and host web apps on the same > systems that power Google applications. > App Engine offers fast development and deployment; simple > administration, with no need to worry about hardware, > patches or backups; and effortless scalability. " > > You can deploy applications written in either Python or JAVA. ?There > are free and paid versions of the service. > > The Google App Engine would appear to be a powerful source of CPU > cycles for scientific computing. Not really. It is not intended for such purposes. It is intended for the easy deployment and horizontal scaling of web applications. Each individual request is very short; it is limited to 10 seconds of CPU time. While numpy would be useful for scientific web applications (not least because it would help you keep to that 10 second limit when doing things like simple image processing or summary statistics or whatever), it is not a source of CPU cycles. Services like Amazon EC2 or Rackspace Cloud are much closer to what you want. PiCloud provides an even nicer interface for you: http://www.picloud.com/ Disclosure: Enthought partners with PiCloud to provide most EPD libraries. I can't say I'm disinterested in promoting it, but it *is* a really powerful product that *does* provide CPU cycles for scientific computing with an interface much more suited to it than GAE. >?Unfortunately this is currently not > the case because numpy is not one of the supported libraries. ?The > Python App Engine allows only the installation of user supplied pure > Python code. > > I have recently returned from attending the Google I/O conference in > San Francisco. ?While there I inquired into the possibility of getting > numpy added. ?The basic response was that there doesn't appear to be > much interest from the community given the amount of work it would > take to vet and add numpy. > > I would like to ask your help in changing this perception. > > The quickest and easiest thing you can do would be to add your "me > too" to this feature request (item #190) on the support site: > > http://code.google.com/p/googleappengine/issues/detail?id=190 My understanding is that they hate "me too" comments. They ask that you star the issue instead. 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mdekauwe at gmail.com Wed May 26 17:03:03 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Wed, 26 May 2010 14:03:03 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> Message-ID: <28686356.post@talk.nabble.com> Could you possibly if you have time explain further your comment re the p-values, your suggesting I am misusing them? Thanks. josef.pktd wrote: > > On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >> >> Sounds like I am stuck with the loop as I need to do the comparison for >> each >> pixel of the world and then I have a basemap function call which I guess >> slows it down further...hmm > > I don't see much that could be done differently, after a brief look. > > stats.pearsonr could be replaced by an array version using directly > the formula for correlation even with nans. wilcoxon looks slow, and I > never tried or seen a faster version. > > just a reminder, the p-values are for a single test, when you have > many of them, then they don't have the right size/confidence level for > an overall or joint test. (some packages report a Bonferroni > correction in this case) > > Josef > > >> >> i.e. >> >> def compareSnowData(jules_var): >> ? ?# Extract the 11 years of snow data and return >> ? ?outrows = 180 >> ? ?outcols = 360 >> ? ?numyears = 11 >> ? ?nummonths = 12 >> >> ? ?# Read various files >> ? ?fname="world_valid_jules_pts.ascii" >> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >> jo.read_land_points_ascii(fname, 1.0) >> >> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >> >> ? ?# grab some space >> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >> dtype=np.float32) >> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >> dtype=np.float32) >> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >> np.nan >> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >> np.nan >> >> ? ?# extract the data >> ? ?data1_snow = jules_data1[:,jules_var,:,0] >> ? ?data2_snow = jules_data2[:,jules_var,:,0] >> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >> ? ?#for month in xrange(numyears * nummonths): >> ? ?# ? ?for i in xrange(numpts): >> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >> ? ?# ? ? ? ?if data1 >= 0.0: >> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >> ? ?# ? ? ? ?else: >> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >> ? ?# ? ? ? ?if data2 > 0.0: >> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >> ? ?# ? ? ? ?else: >> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >> >> ? ?# exclude any months from *both* arrays where we have dodgy data, else >> we >> ? ?# can't do the correlations correctly!! >> ? 
?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >> >> ? ?# put data on a regular grid... >> ? ?print 'regridding landpts...' >> ? ?for i in xrange(numpts): >> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >> func >> ? ? ? ?x = data1_snow[:,i] >> ? ? ? ?x = x[np.isfinite(x)] >> ? ? ? ?y = data2_snow[:,i] >> ? ? ? ?y = y[np.isfinite(y)] >> >> ? ? ? ?# r^2 >> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >> data >> ? ? ? ?if len(x) and len(y) > 50: >> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >> (stats.pearsonr(x, y)[0])**2 >> >> ? ? ? ?# wilcox signed rank test >> ? ? ? ?# make sure we have enough samples to do the test >> ? ? ? ?d = x - y >> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >> differences >> ? ? ? ?count = len(d) >> ? ? ? ?if count > 10: >> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >> ? ? ? ? ? ?# only map out sign different data >> ? ? ? ? ? ?if pval < 0.05: >> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >> np.mean(x - y) >> >> ? ?return (pearsonsr_snow, wilcoxStats_snow) >> >> >> josef.pktd wrote: >>> >>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>> >>>> Also I then need to remap the 2D array I make onto another grid (the >>>> world in >>>> this case). Which again I had am doing with a loop (note numpts is a >>>> lot >>>> bigger than my example above). >>>> >>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>> np.nan >>>> for i in xrange(numpts): >>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>> func >>>> ? ? ? ?x = data1_snow[:,i] >>>> ? ? ? ?x = x[np.isfinite(x)] >>>> ? ? ? ?y = data2_snow[:,i] >>>> ? ? ? ?y = y[np.isfinite(y)] >>>> >>>> ? ? ? ?# wilcox signed rank test >>>> ? ? ? ?# make sure we have enough samples to do the test >>>> ? ? ? ?d = x - y >>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>> non-zero >>>> differences >>>> ? ? ? ?count = len(d) >>>> ? ? ? ?if count > 10: >>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>> ? ? ? ? ? ?# only map out sign different data >>>> ? ? ? ? ? ?if pval < 0.05: >>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>> np.mean(x - y) >>>> >>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>> array >>>> by removing the index, >>>> but I can't see how I will get the individual x and y pts for each >>>> array >>>> member correctly without the loop, this was my attempt which of course >>>> doesn't work! >>>> >>>> x = data1_snow[:,:] >>>> x = x[np.isfinite(x)] >>>> y = data2_snow[:,:] >>>> y = y[np.isfinite(y)] >>>> >>>> # r^2 >>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>> if len(x) and len(y) > 50: >>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>> y)[0])**2 >>> >>> >>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>> at a time (if I read the help correctly). >>> >>> Also the presence of nans might force the use a loop. stats.mstats has >>> masked array versions, but I didn't see wilcoxon in the list. (Even >>> when vectorized operations would work with regular arrays, nan or >>> masked array versions still have to loop in many cases.) 
>>> >>> If you have many columns with count <= 10, so that wilcoxon is not >>> calculated then it might be worth to use only array operations up to >>> that point. If wilcoxon is calculated most of the time, then it's not >>> worth thinking too hard about this. >>> >>> Josef >>> >>> >>>> >>>> thanks. >>>> >>>> >>>> >>>> >>>> mdekauwe wrote: >>>>> >>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>> >>>>> I don't quite get what you mean about slicing with axis > 3. Is there >>>>> a >>>>> link you can recommend I should read? Does that mean given I have >>>>> 4dims >>>>> that Josef's suggestion would be more advised in this case? >>> >>> There were several discussions on the mailing lists (fancy slicing and >>> indexing). Your case is safe, but if you run in future into funny >>> shapes, you can look up the details. >>> when in doubt, I use np.arange(...) >>> >>> Josef >>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>> wrote: >>>>>>> >>>>>>> Thanks that works... >>>>>>> >>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>>>> step >>>>>>> I >>>>>>> was struggling with, so this forms a 2D array which replaces the the >>>>>>> two >>>>>>> for >>>>>>> loops? Do I have that right? >>>>>> >>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>> dimension, >>>>>> then you can use slicing. It might be faster. >>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>> dimensions can have some surprise switching of axes. >>>>>> >>>>>> Josef >>>>>> >>>>>>> >>>>>>> A lot quicker...! >>>>>>> >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>> array, >>>>>>>>> but >>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>> the >>>>>>>>> arrays >>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>> solution >>>>>>>>> as >>>>>>>>> well >>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>> difficult >>>>>>>>> to >>>>>>>>> get >>>>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>>>> one >>>>>>>>> could >>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>> >>>>>>>>> i = np.arange(tsteps) >>>>>>>>> j = np.arange(numpts) >>>>>>>>> >>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> import numpy as np >>>>>>>>> >>>>>>>>> numpts=10 >>>>>>>>> tsteps = 12 >>>>>>>>> vari = 22 >>>>>>>>> >>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>> index = np.arange(numpts) >>>>>>>>> >>>>>>>>> for i in xrange(tsteps): >>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>> >>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>> >>>>>>>> I think this should do it >>>>>>>> >>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>>>> >>>>>>>> Josef >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Wed May 26 18:43:52 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 May 2010 18:43:52 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28686356.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> Message-ID: On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: > > Could you possibly if you have time explain further your comment re the > p-values, your suggesting I am misusing them? Depends on your use and interpretation test statistics, p-values are random variables, if you look at several tests at the same time, some p-values will be large just by chance. If, for example you just look at the largest test statistic, then the distribution for the max of several test statistics is not the same as the distribution for a single test statistic http://en.wikipedia.org/wiki/Multiple_comparisons http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm we also just had a related discussion for ANOVA post-hoc tests on the pystatsmodels group. Josef > > Thanks. 
> > > josef.pktd wrote: >> >> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>> >>> Sounds like I am stuck with the loop as I need to do the comparison for >>> each >>> pixel of the world and then I have a basemap function call which I guess >>> slows it down further...hmm >> >> I don't see much that could be done differently, after a brief look. >> >> stats.pearsonr could be replaced by an array version using directly >> the formula for correlation even with nans. wilcoxon looks slow, and I >> never tried or seen a faster version. >> >> just a reminder, the p-values are for a single test, when you have >> many of them, then they don't have the right size/confidence level for >> an overall or joint test. (some packages report a Bonferroni >> correction in this case) >> >> Josef >> >> >>> >>> i.e. >>> >>> def compareSnowData(jules_var): >>> ? ?# Extract the 11 years of snow data and return >>> ? ?outrows = 180 >>> ? ?outcols = 360 >>> ? ?numyears = 11 >>> ? ?nummonths = 12 >>> >>> ? ?# Read various files >>> ? ?fname="world_valid_jules_pts.ascii" >>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>> jo.read_land_points_ascii(fname, 1.0) >>> >>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>> >>> ? ?# grab some space >>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>> dtype=np.float32) >>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>> dtype=np.float32) >>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>> np.nan >>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>> np.nan >>> >>> ? ?# extract the data >>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>> ? ?#for month in xrange(numyears * nummonths): >>> ? ?# ? ?for i in xrange(numpts): >>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>> ? ?# ? ? ? ?if data1 >= 0.0: >>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>> ? ?# ? ? ? ?else: >>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>> ? ?# ? ? ? ?if data2 > 0.0: >>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>> ? ?# ? ? ? ?else: >>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>> >>> ? ?# exclude any months from *both* arrays where we have dodgy data, else >>> we >>> ? ?# can't do the correlations correctly!! >>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>> >>> ? ?# put data on a regular grid... >>> ? ?print 'regridding landpts...' >>> ? ?for i in xrange(numpts): >>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>> func >>> ? ? ? ?x = data1_snow[:,i] >>> ? ? ? ?x = x[np.isfinite(x)] >>> ? ? ? ?y = data2_snow[:,i] >>> ? ? ? ?y = y[np.isfinite(y)] >>> >>> ? ? ? ?# r^2 >>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >>> data >>> ? ? ? ?if len(x) and len(y) > 50: >>> ? ? ? ? ? 
?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>> (stats.pearsonr(x, y)[0])**2 >>> >>> ? ? ? ?# wilcox signed rank test >>> ? ? ? ?# make sure we have enough samples to do the test >>> ? ? ? ?d = x - y >>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >>> differences >>> ? ? ? ?count = len(d) >>> ? ? ? ?if count > 10: >>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>> ? ? ? ? ? ?# only map out sign different data >>> ? ? ? ? ? ?if pval < 0.05: >>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>> np.mean(x - y) >>> >>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>>> >>>>> Also I then need to remap the 2D array I make onto another grid (the >>>>> world in >>>>> this case). Which again I had am doing with a loop (note numpts is a >>>>> lot >>>>> bigger than my example above). >>>>> >>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>> np.nan >>>>> for i in xrange(numpts): >>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>> func >>>>> ? ? ? ?x = data1_snow[:,i] >>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>> ? ? ? ?y = data2_snow[:,i] >>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>> >>>>> ? ? ? ?# wilcox signed rank test >>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>> ? ? ? ?d = x - y >>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>> non-zero >>>>> differences >>>>> ? ? ? ?count = len(d) >>>>> ? ? ? ?if count > 10: >>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>> ? ? ? ? ? ?# only map out sign different data >>>>> ? ? ? ? ? ?if pval < 0.05: >>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>> np.mean(x - y) >>>>> >>>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>>> array >>>>> by removing the index, >>>>> but I can't see how I will get the individual x and y pts for each >>>>> array >>>>> member correctly without the loop, this was my attempt which of course >>>>> doesn't work! >>>>> >>>>> x = data1_snow[:,:] >>>>> x = x[np.isfinite(x)] >>>>> y = data2_snow[:,:] >>>>> y = y[np.isfinite(y)] >>>>> >>>>> # r^2 >>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>>> if len(x) and len(y) > 50: >>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>> y)[0])**2 >>>> >>>> >>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>>> at a time (if I read the help correctly). >>>> >>>> Also the presence of nans might force the use a loop. stats.mstats has >>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>> when vectorized operations would work with regular arrays, nan or >>>> masked array versions still have to loop in many cases.) >>>> >>>> If you have many columns with count <= 10, so that wilcoxon is not >>>> calculated then it might be worth to use only array operations up to >>>> that point. If wilcoxon is calculated most of the time, then it's not >>>> worth thinking too hard about this. >>>> >>>> Josef >>>> >>>> >>>>> >>>>> thanks. >>>>> >>>>> >>>>> >>>>> >>>>> mdekauwe wrote: >>>>>> >>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>>> >>>>>> I don't quite get what you mean about slicing with axis > 3. Is there >>>>>> a >>>>>> link you can recommend I should read? 
Does that mean given I have >>>>>> 4dims >>>>>> that Josef's suggestion would be more advised in this case? >>>> >>>> There were several discussions on the mailing lists (fancy slicing and >>>> indexing). Your case is safe, but if you run in future into funny >>>> shapes, you can look up the details. >>>> when in doubt, I use np.arange(...) >>>> >>>> Josef >>>> >>>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>> wrote: >>>>>>>> >>>>>>>> Thanks that works... >>>>>>>> >>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>>>>> step >>>>>>>> I >>>>>>>> was struggling with, so this forms a 2D array which replaces the the >>>>>>>> two >>>>>>>> for >>>>>>>> loops? Do I have that right? >>>>>>> >>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>> dimension, >>>>>>> then you can use slicing. It might be faster. >>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>> dimensions can have some surprise switching of axes. >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>>> >>>>>>>> A lot quicker...! >>>>>>>> >>>>>>>> Martin >>>>>>>> >>>>>>>> >>>>>>>> josef.pktd wrote: >>>>>>>>> >>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>>> array, >>>>>>>>>> but >>>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>>> the >>>>>>>>>> arrays >>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>> solution >>>>>>>>>> as >>>>>>>>>> well >>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>> difficult >>>>>>>>>> to >>>>>>>>>> get >>>>>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>>>>> one >>>>>>>>>> could >>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>> >>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>> j = np.arange(numpts) >>>>>>>>>> >>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> import numpy as np >>>>>>>>>> >>>>>>>>>> numpts=10 >>>>>>>>>> tsteps = 12 >>>>>>>>>> vari = 22 >>>>>>>>>> >>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>> index = np.arange(numpts) >>>>>>>>>> >>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>> >>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>> >>>>>>>>> I think this should do it >>>>>>>>> >>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>>>>> >>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> View this message in context: >>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
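A minimal loop-free version of the 4-D extraction discussed above, for
reference. It only restates the two suggestions already made in the
thread (broadcast index arrays, or plain slicing when the full range of
an axis is wanted), using the toy sizes from the original post:

import numpy as np

tsteps, vari, numpts = 12, 22, 10
data = np.random.random((tsteps, vari, numpts, 1))

# original double loop, kept for comparison
new_data = np.zeros((tsteps, numpts), dtype=np.float32)
for i in xrange(tsteps):
    for j in xrange(numpts):
        new_data[i, j] = data[i, 5, j, 0]

# fancy indexing: a (tsteps, 1) index array broadcast against a (numpts,) one
fancy = data[np.arange(tsteps)[:, None], 5, np.arange(numpts), 0]

# plain slicing also works here, because every element of both axes is taken
sliced = data[:, 5, :, 0]

print np.allclose(new_data, fancy), np.allclose(new_data, sliced)

Both loop-free forms agree with the double loop here; as noted above,
mixing slices with index arrays on arrays of three or more dimensions can
reorder axes in surprising ways, so a cheap np.allclose check against the
loop is good insurance.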
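And a rough sketch of the array version of the correlation mentioned
further up, computing r directly from its formula, one grid point
(column) at a time, with NaNs masked out. Untested against
stats.pearsonr; it assumes data1_snow and data2_snow are (months, points)
float arrays in which a bad month is NaN in both arrays, and that rows
and cols are integer index arrays, as in the function above:

import numpy as np

def pearsonr_columns(x, y, min_n=50):
    """Correlation of each column of x with the same column of y, ignoring NaNs.

    x, y : 2-d arrays, shape (nobs, npoints), with NaN marking missing data
    (missing in either array should be missing in both, as above).
    Columns with min_n or fewer valid rows get NaN, like the '> 50'
    cut-off in the loop version.
    """
    valid = np.isfinite(x) & np.isfinite(y)
    n = valid.sum(axis=0).astype(float)
    n[n == 0] = np.nan            # avoid 0/0 warnings for all-missing columns
    xz = np.where(valid, x, 0.0)
    yz = np.where(valid, y, 0.0)
    sx, sy = xz.sum(axis=0), yz.sum(axis=0)
    sxx = (xz * xz).sum(axis=0)
    syy = (yz * yz).sum(axis=0)
    sxy = (xz * yz).sum(axis=0)
    cov = sxy - sx * sy / n
    varx = sxx - sx * sx / n
    vary = syy - sy * sy / n
    r = cov / np.sqrt(varx * vary)
    return np.where(n > min_n, r, np.nan)

# r2 = pearsonr_columns(data1_snow, data2_snow) ** 2
# and the regridding can then be done in one assignment:
# pearsonsr_snow[(180 - 1) - (rows - 1), cols - 1] = r2

wilcoxon would still need the loop, but, as Josef says, the columns with
too few valid months can be skipped cheaply by computing n with array
operations first.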
>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> View this message in context: >>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From david at silveregg.co.jp Wed May 26 21:21:04 2010 From: david at silveregg.co.jp (David) Date: Thu, 27 May 2010 10:21:04 +0900 Subject: [SciPy-User] [SciPy-Dev] numpy and the Google App Engine In-Reply-To: References: Message-ID: <4BFDC900.8030802@silveregg.co.jp> On 05/27/2010 01:54 AM, Robert Kern wrote: > Not really. It is not intended for such purposes. It is intended for > the easy deployment and horizontal scaling of web applications. Each > individual request is very short; it is limited to 10 seconds of CPU > time. While numpy would be useful for scientific web applications (not > least because it would help you keep to that 10 second limit when > doing things like simple image processing or summary statistics or > whatever), it is not a source of CPU cycles. Besides what Robert said, I would also mention the datastore limitations given by the Google App Engine (no big blob of data, high latency, etc...) which make it quite hard to do something non-trivial even assuming numpy were available. 
It is also my understanding that EC2 is pretty competitive compared to GAE (but of course GAE does more for you). I did not know about picloud until two days ago, and I have only used GAE for a couple of months at work, but picloud seems like a much more usable service for scientific computing to me. cheers, David From chanley at stsci.edu Wed May 26 21:30:56 2010 From: chanley at stsci.edu (Christopher Hanley) Date: Wed, 26 May 2010 21:30:56 -0400 Subject: [SciPy-User] [SciPy-Dev] numpy and the Google App Engine In-Reply-To: <4BFDC900.8030802@silveregg.co.jp> References: <4BFDC900.8030802@silveregg.co.jp> Message-ID: On Wed, May 26, 2010 at 9:21 PM, David wrote: > On 05/27/2010 01:54 AM, Robert Kern wrote: > >> Not really. It is not intended for such purposes. It is intended for >> the easy deployment and horizontal scaling of web applications. Each >> individual request is very short; it is limited to 10 seconds of CPU >> time. While numpy would be useful for scientific web applications (not >> least because it would help you keep to that 10 second limit when >> doing things like simple image processing or summary statistics or >> whatever), it is not a source of CPU cycles. > > Besides what Robert said, I would also mention the datastore limitations > given by the Google App Engine (no big blob of data, high latency, > etc...) which make it quite hard to do something non-trivial even > assuming numpy were available. > > It is also my understanding that EC2 is pretty competitive compared to > GAE (but of course GAE does more for you). I did not know about picloud > until two days ago, and I have only used GAE for a couple of months at > work, but picloud seems like a much more usable service for scientific > computing to me. > > cheers, > > David If you are looking for a large data blob for GAE you might want to sign up for the Google Storage for Developers service. It was just introduced at Google I/O. You can find the project here: http://code.google.com/apis/storage/ Chris -- Christopher Hanley Senior Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From linda.polman at gmail.com Thu May 27 04:09:12 2010 From: linda.polman at gmail.com (Linda) Date: Thu, 27 May 2010 10:09:12 +0200 Subject: [SciPy-User] finding frequency of wav Message-ID: Hello all, I have a digital signal where the bits in it are encoded with frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a samplerate of 22050. My goal is to find the bits again so I can decode the message in it, for that I have chopped the wav up in pieces of 18 samples, which would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have a list of chunks of length 18. I thought I could just fft each chunks and find the max of the chunk-spectrum, to find out the bitfrequency in the chunk (and thus the bitvalue) But somehow I am stuck in the numbers, I was hopeing you could give me a hint. here is what I have: chunks[3] #this is one of the wavchunks, there should be a bit hidden in here Out[98]: array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, 0, 0], dtype=int16) test = fft(chunks[3]) # spectrum of the chunk, the peak should give me the value of the bitfrequency 1300 of 2100 Hz? 
test Out[100]: array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) I am unsure how to proceed from here, so I would really appreciate any tips.. I found fftfreq, but I am not sure how to use it? I read fftfreq? but I don't see how the example even uses the 'fourier' variable in the fftfreq there? Thanks in advance Linda -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Thu May 27 09:28:42 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 27 May 2010 10:28:42 -0300 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: Message-ID: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : > Hello all, > I have a digital signal where the bits in it are encoded with > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a > samplerate of 22050. > My goal is to find the bits again so I can decode the message in it, > for that I have chopped the wav up in pieces of 18 samples, which > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have > a list of chunks of length 18. I thought I could just fft each chunks > and find the max of the chunk-spectrum, to find out the bitfrequency > in the chunk (and thus the bitvalue) Correct me if I am wrong. You are cutting your signal into chunks that you expect to contain at least one period of the lower coding frequency. You then perform a fft on a very small signal (18 samples) which gives you (without zero padding) an estimation of the Fourier transform of your chunk computed on only 18 frequencies, i.e. with a really bad frequential resolution. It is possible if your coding frequencies are not too close. A raw Rayleight criteria leads to cut your signal into at least N=2*Fe/df_min where df_min is the minimal spacing between two coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a power of 2). > > But somehow I am stuck in the numbers, I was hopeing you could give me > a hint. here is what I have: > chunks[3] #this is one of the wavchunks, there should be a bit hidden in here > Out[98]: > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, 0, 0], dtype=int16) > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me the value of the bitfrequency 1300 of 2100 Hz? > test > Out[100]: > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) > > > I am unsure how to proceed from here, so I would really appreciate any > tips.. I found fftfreq, but I am not sure how to use it? I read > fftfreq? 
but I don't see how the example even uses the 'fourier' > variable in the fftfreq there? > Fftfreq is a function that constructs the frequency vector associated to the data computed by the fft algorithm. It is aware of how fft orders the frequency bins, and transform it in a more convenient way (it 'anti-aliases', centering the results on zero frequency). import numpy as np import matplotlib.pyplot as plt chunks[3]=.... test = np.fft.fft(chunks[3]) frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling period plt.plot(frequencies, np.abs(test), 'o') plt.show() but you won't see any things on this fft. I am suspicious due to the fact that the signal to noise ratio seems rather low leading to strong peak at Fe/2 In chunk[3], what do you expect to be the bit? Fabricio From josef.pktd at gmail.com Thu May 27 10:38:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 May 2010 10:38:46 -0400 Subject: [SciPy-User] script to rst converter for examples, tutorials Message-ID: a bit off-topic I would like to write a tutorial in a python script file that can then be converted to restructured text for sphinx. I have seen various versions in the past, the most recent in pymvpa. However, I would also like to automatically include interactive output or print-results as an option, which doesn't seem possible with the pymvpa version. http://pypi.python.org/pypi/Pweave looks interesting but is not based on a valid python script. Are there any recommendations? Thanks, Josef From linda.polman at gmail.com Thu May 27 10:42:05 2010 From: linda.polman at gmail.com (Linda) Date: Thu, 27 May 2010 16:42:05 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: Thanks for your reply. The explanation on fftfreq already made a few puzzle pieces fall into place. The signal I am trying to decode is a DSC transmission that is recorded in a wav file. (Digital Selective Calling, used in marine radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and there's a carrier at 1700Hz. That should be all frequencies involved (apart from noise). Currently I am used generated, clean signals. But probably I should get a clean '10101010'-signal first to try my work on. Since the bitrate is set at 1200bits/sec, the bit length would be samplerate/1200 = 18.4 samples at 22050. I can double the samplerate to 44100, but that still leaves me at only 36.8 samples per chunk. If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? I'm not sure what chunk[3] would have been, I should have used a dotting-signal instead of an unknown message to try this on. I will try this again with more useful data this afternoon. cheers, Linda On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: > Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : > > Hello all, > > > I have a digital signal where the bits in it are encoded with > > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a > > samplerate of 22050. > > My goal is to find the bits again so I can decode the message in it, > > for that I have chopped the wav up in pieces of 18 samples, which > > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have > > a list of chunks of length 18. 
I thought I could just fft each chunks > > and find the max of the chunk-spectrum, to find out the bitfrequency > > in the chunk (and thus the bitvalue) > > Correct me if I am wrong. You are cutting your signal into chunks that > you expect to contain at least one period of the lower coding frequency. > You then perform a fft on a very small signal (18 samples) which gives > you (without zero padding) an estimation of the Fourier transform of > your chunk computed on only 18 frequencies, i.e. with a really bad > frequential resolution. It is possible if your coding frequencies are > not too close. A raw Rayleight criteria leads to cut your signal into at > least N=2*Fe/df_min where df_min is the minimal spacing between two > coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a > power of 2). > > > > But somehow I am stuck in the numbers, I was hopeing you could give me > > a hint. here is what I have: > > > chunks[3] #this is one of the wavchunks, there should be a bit hidden in > here > > Out[98]: > > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, > 0, 0], dtype=int16) > > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me > the value of the bitfrequency 1300 of 2100 Hz? > > test > > Out[100]: > > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, > > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, > > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, > > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, > > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, > > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, > > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, > > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, > > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) > > > > > > I am unsure how to proceed from here, so I would really appreciate any > > tips.. I found fftfreq, but I am not sure how to use it? I read > > fftfreq? but I don't see how the example even uses the 'fourier' > > variable in the fftfreq there? > > > Fftfreq is a function that constructs the frequency vector associated to > the data computed by the fft algorithm. It is aware of how fft orders > the frequency bins, and transform it in a more convenient way (it > 'anti-aliases', centering the results on zero frequency). > > import numpy as np > import matplotlib.pyplot as plt > chunks[3]=.... > test = np.fft.fft(chunks[3]) > frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling > period > plt.plot(frequencies, np.abs(test), 'o') > plt.show() > > but you won't see any things on this fft. I am suspicious due to the > fact that the signal to noise ratio seems rather low leading to strong > peak at Fe/2 > In chunk[3], what do you expect to be the bit? > > Fabricio > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Thu May 27 11:05:33 2010 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 27 May 2010 11:05:33 -0400 Subject: [SciPy-User] script to rst converter for examples, tutorials In-Reply-To: References: Message-ID: <4BFE8A3D.9070103@american.edu> pylit? http://pylit.berlios.de/literate-programming/index.html pyreport? 
http://gael-varoquaux.info/computers/pyreport/ Alan From silva at lma.cnrs-mrs.fr Thu May 27 11:02:21 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 27 May 2010 12:02:21 -0300 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <1274972542.2121.60.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 27 mai 2010 ? 16:42 +0200, Linda a ?crit : > The signal I am trying to decode is a DSC transmission that is > recorded in a wav file. (Digital Selective Calling, used in marine > radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is > 1300 Hz and there's a carrier at 1700Hz. That should be all > frequencies involved (apart from noise). Currently I am used > generated, clean signals. But probably I should get a clean > '10101010'-signal first to try my work on. > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. I can double the samplerate > to 44100, but that still leaves me at only 36.8 samples per chunk. Then a chuck is what you can consider a stationary signal with a single frequency. Due to the bitrate, it has 18 samples, so that, due to the limited observation time range, its Fourier transform is a lobe centered on 2100 or 1300Hz whose relative bandwidth is the inverse of the number of periods that whould have been observed (neglecting noise). Result : width = frequency/#periods = 1300/{almost 1} i.e. the very limited number of samples in each sample lead to, in frequency domain, a lobe whose width is almost the same as the central frequency! > If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? In my previous answer, I only consider a very raw Rayleigth criteria: two coding frequencies would at least be spaced by two computed fft frequencies. Zero padding can easily deal with this problem. But the width of the lobes (linked with the limited number of periods observed) may be a lot more tricky to solve. But you can still use estimation methods based for example on the spectrum centro?d to determine whether the bit is now on 0 or 1. > > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will > try this again with more useful data this afternoon. From silva at lma.cnrs-mrs.fr Thu May 27 11:42:45 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 27 May 2010 12:42:45 -0300 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <1274974965.4445.9.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 27 mai 2010 ? 16:42 +0200, Linda a ?crit : > The signal I am trying to decode is a DSC transmission that is > recorded in a wav file. (Digital Selective Calling, used in marine > radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is > 1300 Hz and there's a carrier at 1700Hz. That should be all > frequencies involved (apart from noise). Currently I am used > generated, clean signals. But probably I should get a clean > '10101010'-signal first to try my work on. > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. I can double the samplerate > to 44100, but that still leaves me at only 36.8 samples per chunk. Then a chuck is what you can consider a stationary signal with a single frequency. 
Due to the bitrate and the sampling frequency, it has only 18 samples, so that, its Fourier transform is a lobe centered on 2100 or 1300Hz whose relative bandwidth is the inverse of the number of periods that whould have been observed (neglecting noise). Result : width = frequency/#periods = 1300/{almost 1} Hz i.e. the very limited number of samples in each sample lead to, in frequency domain, a lobe whose width is almost the same as the central frequency! because the bitrate and carrier are close... > If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? In my previous answer, I only consider a very raw Rayleigth criteria: two coding frequencies would at least be spaced by two computed fft frequencies. Zero padding can easily deal with this problem. But the width of the lobes (linked with the limited number of periods observed) may be a lot more tricky to solve. But you can still use estimation methods based for example on the spectrum centro?d to determine whether the bit is now on 0 or 1. > > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will > try this again with more useful data this afternoon. From linda.polman at gmail.com Thu May 27 12:44:16 2010 From: linda.polman at gmail.com (Linda) Date: Thu, 27 May 2010 18:44:16 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <1274974965.4445.9.camel@Portable-s2m.cnrs-mrs.fr> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <1274974965.4445.9.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: Thanks again for you explanation :-) This certainly helps. On Thu, May 27, 2010 at 17:42, Fabrice Silva wrote: > Le jeudi 27 mai 2010 ? 16:42 +0200, Linda a ?crit : > > > The signal I am trying to decode is a DSC transmission that is > > recorded in a wav file. (Digital Selective Calling, used in marine > > radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is > > 1300 Hz and there's a carrier at 1700Hz. That should be all > > frequencies involved (apart from noise). Currently I am used > > generated, clean signals. But probably I should get a clean > > '10101010'-signal first to try my work on. > > Since the bitrate is set at 1200bits/sec, the bit length would be > > samplerate/1200 = 18.4 samples at 22050. I can double the samplerate > > to 44100, but that still leaves me at only 36.8 samples per chunk. > > Then a chuck is what you can consider a stationary signal with a single > frequency. Due to the bitrate and the sampling frequency, it has only 18 > samples, so that, its Fourier transform is a lobe centered > on 2100 or 1300Hz whose relative bandwidth is the inverse of the number > of periods that whould have been observed (neglecting noise). > Result : width = frequency/#periods = 1300/{almost 1} Hz > i.e. the very limited number of samples in each sample lead to, in > frequency domain, a lobe whose width is almost the same as the central > frequency! because the bitrate and carrier are close... > > > If I understand what you say correctly, I would need at least 55 (64) > samples in each chunk? > In my previous answer, I only consider a very raw Rayleigth criteria: > two coding frequencies would at least be spaced by two computed fft > frequencies. Zero padding can easily deal with this problem. But the > width of the lobes (linked with the limited number of periods observed) > may be a lot more tricky to solve. 
But you can still use estimation > methods based for example on the spectrum centro?d to determine whether > the bit is now on 0 or 1. > > > > > > I'm not sure what chunk[3] would have been, I should have used a > > dotting-signal instead of an unknown message to try this on. I will > > try this again with more useful data this afternoon. > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From justin.t.riley at gmail.com Thu May 27 13:53:31 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Thu, 27 May 2010 13:53:31 -0400 Subject: [SciPy-User] StarCluster 0.91 - NumPy/SciPy Clusters on EC2 In-Reply-To: References: Message-ID: <4BFEB19B.5040804@gmail.com> This is a one-time message to announce the availability of version 0.91 of the StarCluster package. Why should you care? StarCluster allows you to create NumPy/SciPy clusters configured with NFS-shared filesystems and the Sun Grid Engine queueing system out of the box on Amazon's Elastic Compute Cloud (EC2). The NumPy/SciPy installations have been compiled against a custom-compiled ATLAS for the larger EC2 instances. About ----- There is an article about StarCluster on www.hpcinthecloud.com: http://www.hpcinthecloud.com/features/StarCluster-Brings-HPC-to-the-Amazon-Cloud-94099324.html There is also a screencast of installing, configuring, launching, and terminating an HPC cluster on Amazon EC2: http://www.hpcinthecloud.com/blogs/MITs-StarCluster-An-Update-with-Screencast-94599554.html Project description from PyPI: StarCluster is a utility for creating and managing scientific computing clusters hosted on Amazon's Elastic Compute Cloud (EC2). StarCluster utilizes Amazon's EC2 web service to create and destroy clusters of Linux virtual machines on demand. To get started, the user creates a simple configuration file with their AWS account details and a few cluster preferences (e.g. number of machines, machine type, ssh keypairs, etc). After creating the configuration file and running StarCluster's "start" command, a cluster of Linux machines configured with the Sun Grid Engine queuing system, an NFS-shared /home directory, and OpenMPI with password-less ssh is created and ready to go out-of-the-box. Running StarCluster's "stop" command will shutdown the cluster and stop paying for service. This allows the user to only pay for what they use. StarCluster provides a Ubuntu-based Amazon Machine Image (AMI) in 32bit and 64bit architectures. The AMI contains an optimized NumPy/SciPy/Atlas/Blas/Lapack installation compiled for the larger Amazon EC2 instance types. The AMI also comes with Sun Grid Engine (SGE) and OpenMPI compiled with SGE support. The public AMI can easily be customized by launching a single instance of the public AMI, installing additional software on the instance, and then using StarCluster can also utilize Amazon's Elastic Block Storage (EBS) volumes to provide persistent data storage for a cluster. EBS volumes allow you to store large amounts of data in the Amazon cloud and are also easy to back-up and replicate in the cloud. StarCluster will mount and NFS-share any volumes specified in the config. StarCluster's "createvolume" command provides the ability to automatically create, format, and partition new EBS volumes for use with StarCluster. 
Download -------- StarCluster is available on PyPI (http://pypi.python.org/pypi/StarCluster) and also on the project's website: http://web.mit.edu/starcluster You will find the docs as well as links to the StarCluster mailing list on the website. New in this version: -------------------- * support for launching and managing multiple clusters on EC2 * added "listclusters" command for showing all active clusters on EC2 * support for attaching and NFS-sharing multiple EBS volumes * added createimage and createvolume commands for easily creating new AMIs and EBS volumes for use with StarCluster * experimental support for launching clusters using spot instances * added support for StarCluster "plugins" that provide the ability to perform additional configuration/setup routines on top of StarCluster's default cluster configuration * added "listpublic" command for listing all available public StarCluser AMIs that can be used with StarCluster * bash/zsh command line completion for StarCluster's command line interface From david_baddeley at yahoo.com.au Thu May 27 17:12:12 2010 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 27 May 2010 14:12:12 -0700 (PDT) Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <792740.87711.qm@web33006.mail.mud.yahoo.com> Hi Linda, I probably wouldn't divide the signal up into chunks before procesing, and also suspect that the FFT might be the wrong tool for the job (I'd certainly take the fft of the whole signal just to check that you have the right frequencies in it though). The problem with dividing into chunks and processing each separately is that you don't necessarily know where each bit start and stops - your chunks are thus, more likely than not, going to be misaligned. I'd probably tackle the problem with a strategy directly analogous to that used in analogue circuitry for decoding PSK - I'd either mix the carrier out & do I-Q detection (multiply with a complex exponential and then look at the low pass filtered real & imaginary parts of the result), or just look for the two frequency components separately by multiplying with a complex exponential at each frequency & low pass filtering the amplitude (I'd probably use a boxcar filter the same length as your symbols/frames). After doing this you can then start to decide where your frame boundaries are. If you've filtered as described, you should just be able to start at some offset and then take every 18th value. hope this gives you some ideas, David ________________________________ From: Linda To: SciPy Users List Sent: Fri, 28 May, 2010 2:42:05 AM Subject: Re: [SciPy-User] finding frequency of wav Thanks for your reply. The explanation on fftfreq already made a few puzzle pieces fall into place. The signal I am trying to decode is a DSC transmission that is recorded in a wav file. (Digital Selective Calling, used in marine radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and there's a carrier at 1700Hz. That should be all frequencies involved (apart from noise). Currently I am used generated, clean signals. But probably I should get a clean '10101010'-signal first to try my work on. Since the bitrate is set at 1200bits/sec, the bit length would be samplerate/1200 = 18.4 samples at 22050. I can double the samplerate to 44100, but that still leaves me at only 36.8 samples per chunk. If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? 
I'm not sure what chunk[3] would have been, I should have used a dotting-signal instead of an unknown message to try this on. I will try this again with more useful data this afternoon. cheers, Linda On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : > >> Hello all, > >>> I have a digital signal where the bits in it are encoded with >>> frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >>> samplerate of 22050. >>> My goal is to find the bits again so I can decode the message in it, >>> for that I have chopped the wav up in pieces of 18 samples, which >>> would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >>> a list of chunks of length 18. I thought I could just fft each chunks >>> and find the max of the chunk-spectrum, to find out the bitfrequency >>> in the chunk (and thus the bitvalue) > >Correct me if I am wrong. You are cutting your signal into chunks that >>you expect to contain at least one period of the lower coding frequency. >>You then perform a fft on a very small signal (18 samples) which gives >>you (without zero padding) an estimation of the Fourier transform of >>your chunk computed on only 18 frequencies, i.e. with a really bad >>frequential resolution. It is possible if your coding frequencies are >>not too close. A raw Rayleight criteria leads to cut your signal into at >>least N=2*Fe/df_min where df_min is the minimal spacing between two >>coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >>power of 2). > >> >>> But somehow I am stuck in the numbers, I was hopeing you could give me >>> a hint. here is what I have: > >>> chunks[3] #this is one of the wavchunks, there should be a bit hidden in here >>> Out[98]: >>> array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, 0, 0], dtype=int16) >>> test = fft(chunks[3]) # spectrum of the chunk, the peak should give me the value of the bitfrequency 1300 of 2100 Hz? >>> test >>> Out[100]: >>> array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >>> 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >>> 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >>> -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >>> 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >>> 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >>> -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >>> 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >>> 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >>> >>> >>> I am unsure how to proceed from here, so I would really appreciate any >>> tips.. I found fftfreq, but I am not sure how to use it? I read >>> fftfreq? but I don't see how the example even uses the 'fourier' >>> variable in the fftfreq there? >>> >Fftfreq is a function that constructs the frequency vector associated to >>the data computed by the fft algorithm. It is aware of how fft orders >>the frequency bins, and transform it in a more convenient way (it >>'anti-aliases', centering the results on zero frequency). > >>import numpy as np >>import matplotlib.pyplot as plt >>chunks[3]=.... >>test = np.fft.fft(chunks[3]) >>frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling period >>plt.plot(frequencies, np.abs(test), 'o') >>plt.show() > >>but you won't see any things on this fft. I am suspicious due to the >>fact that the signal to noise ratio seems rather low leading to strong >>peak at Fe/2 >>In chunk[3], what do you expect to be the bit? 
> >>Fabricio > >>_______________________________________________ >>SciPy-User mailing list >SciPy-User at scipy.org >http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From afraser at lanl.gov Thu May 27 17:37:07 2010 From: afraser at lanl.gov (Andy Fraser) Date: Thu, 27 May 2010 15:37:07 -0600 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: (Robin's message of "Tue\, 25 May 2010 22\:50\:54 +0100") References: <8739xgndes.fsf@lanl.gov> Message-ID: <8763292fi4.fsf@lanl.gov> Thanks for the replies and pointers. I got multiprocessing.Pool to work, but it eats up memory and time. I append two implementation segments below. The multiprocessing version is about 33 times _slower_ than the single processor version. Unless I use a small number of processors, memory fills up and I kill the job to make the computer usable again. The following segments of code are inside a loop that steps over 115 lines of pixels. def func(job): return job[0].random_fork(job[1]) . . . . . . #Multiprocessing version: noise = numpy.random.standard_normal((N_particles,noise_df)) jobs = zip(self.particles,noise) self.particles = self.pool.map(func, jobs, self.chunk_size) return (m,v) . . . . . . #Single processing version noise = numpy.random.standard_normal((N_particles,noise_df)) jobs = zip(self.particles,noise) self.particles = map(func, jobs) return (m,v) -- Andy Fraser ISR-2 (MS:B244) afraser at lanl.gov Los Alamos National Laboratory 505 665 9448 Los Alamos, NM 87545 From zachary.pincus at yale.edu Thu May 27 23:13:20 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 27 May 2010 23:13:20 -0400 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: <8763292fi4.fsf@lanl.gov> References: <8739xgndes.fsf@lanl.gov> <8763292fi4.fsf@lanl.gov> Message-ID: <97132D07-D91C-45F1-BACD-AAE476E91F9F@yale.edu> > Thanks for the replies and pointers. I got multiprocessing.Pool to > work, but it eats up memory and time. I append two implementation > segments below. The multiprocessing version is about 33 times > _slower_ than the single processor version. Unless I use a small > number of processors, memory fills up and I kill the job to make the > computer usable again. The following segments of code are inside a > loop that steps over 115 lines of pixels. Several problems here: (1) I am sorry I didn't mention this earlier, but looking over your original email, it appears that your single-process code might be very inefficient: it seems to perturb each particle individually in a for- loop rather than working on an array of all the particles. Perhaps you should try to fix that before adding multiprocessing? Basically, you should hopefully be able to write random_fork to work on a number of particles at once using numpy broadcasting, etc. This way, the for- loop that steps through the elements is implemented in compiled C, rather than interpreted python. 
Check out various numpy tutorials for details, but here's the general gist: points = numpy.arange(6000).reshape((3000,2)) # 3000 x,y points perturbations = numpy.random.normal(size=(3000,2)) def perturb_bad(points, perturbations): for point, perturbation in zip(points, perturbations): point += perturbation def perturb_good(points, perturbations): points += perturbations timeit perturb_bad(points, perturbations) # 10 loops, best of 3: 18.7 milliseconds per loop timeit perturb_good(points, perturbations) # 10000 loops, best of 3: 161 microseconds per loop Compare this orders-of-magnitude gain to the at-best-8-fold gain you'd get from multiprocessing the bad code. Also note that "map" is basically just an interpreted for-loop under the hood: import operator timeit map(operator.add, points, perturbations) # 10 loops, best of 3: 18.7 milliseconds per loop The moral here is to avoid looping constructs in python when working with sets of numbers and instead use numpy operations that operate on lots of numbers with one python command. (2) From the slowdowns you report, it looks like overhead costs are completely dominating. For each job, the code and data need to be serialized (pickled, I think, is how the multiprocessing library handles it), written to a pipe, unpickled, executed, and the results need to be pickled, sent back, and unpickled. Perhaps using memmap to share state might be better? Or you can make sure that the function parameters and results can be very rapidly pickled and unpickled (single numpy arrays, e.g., not lists-of-sub-arrays or something). Still, tune the single-processor code first. Perhaps you can send more detailed code samples and folks on the list can offer some advice about how to make it numpy-friendly and fast. Zach On May 27, 2010, at 5:37 PM, Andy Fraser wrote: > Thanks for the replies and pointers. I got multiprocessing.Pool to > work, but it eats up memory and time. I append two implementation > segments below. The multiprocessing version is about 33 times > _slower_ than the single processor version. Unless I use a small > number of processors, memory fills up and I kill the job to make the > computer usable again. The following segments of code are inside a > loop that steps over 115 lines of pixels. > > def func(job): > return job[0].random_fork(job[1]) > > . > . > . > . > . > . > > > #Multiprocessing version: > > noise = numpy.random.standard_normal((N_particles,noise_df)) > jobs = zip(self.particles,noise) > self.particles = self.pool.map(func, jobs, self.chunk_size) > return (m,v) > > . > . > . > . > . > . > > #Single processing version > > noise = numpy.random.standard_normal((N_particles,noise_df)) > jobs = zip(self.particles,noise) > self.particles = map(func, jobs) > return (m,v) > > -- > Andy Fraser ISR-2 (MS:B244) > afraser at lanl.gov Los Alamos National Laboratory > 505 665 9448 Los Alamos, NM 87545 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From linda.polman at gmail.com Fri May 28 03:03:33 2010 From: linda.polman at gmail.com (Linda) Date: Fri, 28 May 2010 09:03:33 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <792740.87711.qm@web33006.mail.mud.yahoo.com> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Thank you, it certainly does give me ideas :-) I will look into this today. 
Linda On Thu, May 27, 2010 at 23:12, David Baddeley wrote: > Hi Linda, > > I probably wouldn't divide the signal up into chunks before procesing, and > also suspect that the FFT might be the wrong tool for the job (I'd certainly > take the fft of the whole signal just to check that you have the right > frequencies in it though). > > The problem with dividing into chunks and processing each separately is > that you don't necessarily know where each bit start and stops - your chunks > are thus, more likely than not, going to be misaligned. > > I'd probably tackle the problem with a strategy directly analogous to that > used in analogue circuitry for decoding PSK - I'd either mix the carrier out > & do I-Q detection (multiply with a complex exponential and then look at the > low pass filtered real & imaginary parts of the result), or just look for > the two frequency components separately by multiplying with a complex > exponential at each frequency & low pass filtering the amplitude (I'd > probably use a boxcar filter the same length as your symbols/frames). > > After doing this you can then start to decide where your frame boundaries > are. If you've filtered as described, you should just be able to start at > some offset and then take every 18th value. > > hope this gives you some ideas, > > David > ------------------------------ > *From:* Linda > *To:* SciPy Users List > *Sent:* Fri, 28 May, 2010 2:42:05 AM > *Subject:* Re: [SciPy-User] finding frequency of wav > > Thanks for your reply. The explanation on fftfreq already made a few puzzle > pieces fall into place. > > The signal I am trying to decode is a DSC transmission that is recorded in > a wav file. (Digital Selective Calling, used in marine radio) > It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and > there's a carrier at 1700Hz. That should be all frequencies involved (apart > from noise). Currently I am used generated, clean signals. But probably I > should get a clean '10101010'-signal first to try my work on. > > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. > I can double the samplerate to 44100, but that still leaves me at only 36.8 > samples per chunk. > If I understand what you say correctly, I would need at least 55 (64) > samples in each chunk? > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will try this > again with more useful data this afternoon. > > cheers, > Linda > > > On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: > >> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >> > Hello all, >> >> > I have a digital signal where the bits in it are encoded with >> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >> > samplerate of 22050. >> > My goal is to find the bits again so I can decode the message in it, >> > for that I have chopped the wav up in pieces of 18 samples, which >> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >> > a list of chunks of length 18. I thought I could just fft each chunks >> > and find the max of the chunk-spectrum, to find out the bitfrequency >> > in the chunk (and thus the bitvalue) >> >> Correct me if I am wrong. You are cutting your signal into chunks that >> you expect to contain at least one period of the lower coding frequency. 
>> You then perform a fft on a very small signal (18 samples) which gives >> you (without zero padding) an estimation of the Fourier transform of >> your chunk computed on only 18 frequencies, i.e. with a really bad >> frequential resolution. It is possible if your coding frequencies are >> not too close. A raw Rayleight criteria leads to cut your signal into at >> least N=2*Fe/df_min where df_min is the minimal spacing between two >> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >> power of 2). >> > >> > But somehow I am stuck in the numbers, I was hopeing you could give me >> > a hint. here is what I have: >> >> > chunks[3] #this is one of the wavchunks, there should be a bit hidden in >> here >> > Out[98]: >> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, >> 0, 0], dtype=int16) >> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >> the value of the bitfrequency 1300 of 2100 Hz? >> > test >> > Out[100]: >> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >> > >> > >> > I am unsure how to proceed from here, so I would really appreciate any >> > tips.. I found fftfreq, but I am not sure how to use it? I read >> > fftfreq? but I don't see how the example even uses the 'fourier' >> > variable in the fftfreq there? >> > >> Fftfreq is a function that constructs the frequency vector associated to >> the data computed by the fft algorithm. It is aware of how fft orders >> the frequency bins, and transform it in a more convenient way (it >> 'anti-aliases', centering the results on zero frequency). >> >> import numpy as np >> import matplotlib.pyplot as plt >> chunks[3]=.... >> test = np.fft.fft(chunks[3]) >> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >> period >> plt.plot(frequencies, np.abs(test), 'o') >> plt.show() >> >> but you won't see any things on this fft. I am suspicious due to the >> fact that the signal to noise ratio seems rather low leading to strong >> peak at Fe/2 >> In chunk[3], what do you expect to be the bit? >> >> Fabricio >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophermarkstrickland at gmail.com Fri May 28 07:29:26 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Fri, 28 May 2010 21:29:26 +1000 Subject: [SciPy-User] log pdf, cdf, etc Message-ID: Hi, When using any of the distributions of scipy.stats there does not seem to be the ability (or at least I cannot figure out how) to have the function return the log of the pdf, cdf, sf, etc. For statistical analysis this is essential. 
For instance suppose we are interested in an exponential distribution for a random variable x with a hyperparameter lambda there needs to be an option that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to calculate log(scipy.stats.expon.pdf(x,lambda)). Is there a way to do this using the distributions in scipy.stats? If there is not is it possible for me to suggest that this feature is added. There is such an excellent range of distributions, each with such an impressive range of options, it seems ashame to have to mostly manually code up the log of pdfs and often call the log of CDFs from R. Thanks, Chris. -------------- next part -------------- An HTML attachment was scrubbed... URL: From linda.polman at gmail.com Fri May 28 08:41:11 2010 From: linda.polman at gmail.com (Linda) Date: Fri, 28 May 2010 14:41:11 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <792740.87711.qm@web33006.mail.mud.yahoo.com> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Hello David, I decided that fft would not work on finding those bits, so I spent all day looking into your suggestions. I think my data signals are FSK (FM?) instead of PSK, there are no discontinuieties in the signals when I take a look at them in audacity. I'm not sure I completely understand the multiply with a complex exponential part. Should I multiply my data-array with exp( 1j * omega * T )? where omega would be 2 * pi * f_carrier (1700) and T would be 1/Fs? or the bitlength in samples? This task seems to be quite a bit more difficult than I initially thought (undergrad student) I would really appreciate some more help :-) Cheers Linda On Thu, May 27, 2010 at 23:12, David Baddeley wrote: > Hi Linda, > > I probably wouldn't divide the signal up into chunks before procesing, and > also suspect that the FFT might be the wrong tool for the job (I'd certainly > take the fft of the whole signal just to check that you have the right > frequencies in it though). > > The problem with dividing into chunks and processing each separately is > that you don't necessarily know where each bit start and stops - your chunks > are thus, more likely than not, going to be misaligned. > > I'd probably tackle the problem with a strategy directly analogous to that > used in analogue circuitry for decoding PSK - I'd either mix the carrier out > & do I-Q detection (multiply with a complex exponential and then look at the > low pass filtered real & imaginary parts of the result), or just look for > the two frequency components separately by multiplying with a complex > exponential at each frequency & low pass filtering the amplitude (I'd > probably use a boxcar filter the same length as your symbols/frames). > > After doing this you can then start to decide where your frame boundaries > are. If you've filtered as described, you should just be able to start at > some offset and then take every 18th value. > > hope this gives you some ideas, > > David > ------------------------------ > *From:* Linda > *To:* SciPy Users List > *Sent:* Fri, 28 May, 2010 2:42:05 AM > *Subject:* Re: [SciPy-User] finding frequency of wav > > Thanks for your reply. The explanation on fftfreq already made a few puzzle > pieces fall into place. > > The signal I am trying to decode is a DSC transmission that is recorded in > a wav file. 
(Digital Selective Calling, used in marine radio) > It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and > there's a carrier at 1700Hz. That should be all frequencies involved (apart > from noise). Currently I am used generated, clean signals. But probably I > should get a clean '10101010'-signal first to try my work on. > > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. > I can double the samplerate to 44100, but that still leaves me at only 36.8 > samples per chunk. > If I understand what you say correctly, I would need at least 55 (64) > samples in each chunk? > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will try this > again with more useful data this afternoon. > > cheers, > Linda > > > On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: > >> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >> > Hello all, >> >> > I have a digital signal where the bits in it are encoded with >> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >> > samplerate of 22050. >> > My goal is to find the bits again so I can decode the message in it, >> > for that I have chopped the wav up in pieces of 18 samples, which >> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >> > a list of chunks of length 18. I thought I could just fft each chunks >> > and find the max of the chunk-spectrum, to find out the bitfrequency >> > in the chunk (and thus the bitvalue) >> >> Correct me if I am wrong. You are cutting your signal into chunks that >> you expect to contain at least one period of the lower coding frequency. >> You then perform a fft on a very small signal (18 samples) which gives >> you (without zero padding) an estimation of the Fourier transform of >> your chunk computed on only 18 frequencies, i.e. with a really bad >> frequential resolution. It is possible if your coding frequencies are >> not too close. A raw Rayleight criteria leads to cut your signal into at >> least N=2*Fe/df_min where df_min is the minimal spacing between two >> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >> power of 2). >> > >> > But somehow I am stuck in the numbers, I was hopeing you could give me >> > a hint. here is what I have: >> >> > chunks[3] #this is one of the wavchunks, there should be a bit hidden in >> here >> > Out[98]: >> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, >> 0, 0], dtype=int16) >> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >> the value of the bitfrequency 1300 of 2100 Hz? >> > test >> > Out[100]: >> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >> > >> > >> > I am unsure how to proceed from here, so I would really appreciate any >> > tips.. I found fftfreq, but I am not sure how to use it? I read >> > fftfreq? but I don't see how the example even uses the 'fourier' >> > variable in the fftfreq there? 
>> > >> Fftfreq is a function that constructs the frequency vector associated to >> the data computed by the fft algorithm. It is aware of how fft orders >> the frequency bins, and transform it in a more convenient way (it >> 'anti-aliases', centering the results on zero frequency). >> >> import numpy as np >> import matplotlib.pyplot as plt >> chunks[3]=.... >> test = np.fft.fft(chunks[3]) >> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >> period >> plt.plot(frequencies, np.abs(test), 'o') >> plt.show() >> >> but you won't see any things on this fft. I am suspicious due to the >> fact that the signal to noise ratio seems rather low leading to strong >> peak at Fe/2 >> In chunk[3], what do you expect to be the bit? >> >> Fabricio >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 10:15:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 10:15:55 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 7:29 AM, Chris Strickland wrote: > Hi, > > When using any of the distributions of scipy.stats there does not seem to be > the ability (or at least I cannot figure out how) to have the function > return > the log of the pdf, cdf, sf, etc. For statistical analysis this is > essential. > For instance suppose we are interested in an exponential distribution for a > random variable x with a hyperparameter lambda there needs to be an option > that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to > calculate log(scipy.stats.expon.pdf(x,lambda)). > > Is there a way to do this using the distributions in scipy.stats? It would need a new method for each distribution, e.g. _loglike, _logpdf So, this is work, and for some distributions the log wouldn't simplify much. I proposed this once together with other improvements (but without response). The second useful method for estimation would be _fitstart, which provides distribution specific starting values for fit, e.g. a moment estimator, or a simple rules of thumb http://projects.scipy.org/scipy/ticket/808 Here are some of my currently planned enhancements to the distributions: http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py but I just checked, it looks like I forgot to copy the _loglike method that I started from my experimental scripts. For a few distributions, where this is possible, it would also be useful to add the gradient with respect to the parameters, (or even the Hessian). But this is currently mostly just an idea, since we need some analytical gradients in the estimation of stats models. > > If there is not is it possible for me to suggest that this feature is added. > There is such an excellent range of distributions, each with such an > impressive range of options, it seems ashame to have to mostly manually code > up the log of pdfs and often call the log of CDFs from R. So far I only thought about log pdf, because I wanted it for Maximum Likelihood estimation. Do you have a rough idea for which distributions log cdf would work? 
that is, for which distribution is an analytical or efficient numerical expression possible. I also think that scipy.stats.distributions could be one of the best (broadest, consistent) collection of univariate distributions that I have seen so far, once we fill in some missing pieces. As a way forward, I think we could make the distributions into a numerical encyclopedia by adding private methods to those distributions where it makes sense, like log pdf, log cdf and I also started to add characteristic functions to some distributions in my experimental scripts. If you have a collection of logpdf, logcdf, we could add a trac ticket for this. However, this would miss the generic broadcasting part of the public functions, pdf, cdf,... but for estimation I wouldn't necessarily call those because of the overhead. I'm working on and off on this, so it's moving only slowly (and my wishlist is big). (for example, I was reading up on extreme value distributions in actuarial science and hydrology to get a better overview over the estimators.) So, I really love to hear any ideas, feedback, and see contributions to improving the distributions. Josef > > Thanks, > Chris. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ben.root at ou.edu Fri May 28 10:27:31 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 28 May 2010 09:27:31 -0500 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Linda, I am not very familiar with this particular topic, but you might want to look into Wavelet Analysis: http://en.wikipedia.org/wiki/Wavelet http://www.amara.com/IEEEwave/IW_wave_ana.html Fourier transforms are definately the wrong tool here because it assumes that the different frequency waves exists for the entire sample. My (basic) understanding of wavelet analysis is that it does not make that assumption. Hope this helps, Ben Root On Fri, May 28, 2010 at 7:41 AM, Linda wrote: > Hello David, > I decided that fft would not work on finding those bits, so I spent all day > looking into your suggestions. > > I think my data signals are FSK (FM?) instead of PSK, there are no > discontinuieties in the signals when I take a look at them in audacity. > > I'm not sure I completely understand the multiply with a complex > exponential part. Should I multiply my data-array with exp( 1j * omega * T > )? where omega would be 2 * pi * f_carrier (1700) and T would be 1/Fs? or > the bitlength in samples? > > This task seems to be quite a bit more difficult than I initially thought > (undergrad student) > > I would really appreciate some more help :-) > > Cheers > Linda > > On Thu, May 27, 2010 at 23:12, David Baddeley > wrote: > >> Hi Linda, >> >> I probably wouldn't divide the signal up into chunks before procesing, and >> also suspect that the FFT might be the wrong tool for the job (I'd certainly >> take the fft of the whole signal just to check that you have the right >> frequencies in it though). >> >> The problem with dividing into chunks and processing each separately is >> that you don't necessarily know where each bit start and stops - your chunks >> are thus, more likely than not, going to be misaligned. 
>> >> I'd probably tackle the problem with a strategy directly analogous to that >> used in analogue circuitry for decoding PSK - I'd either mix the carrier out >> & do I-Q detection (multiply with a complex exponential and then look at the >> low pass filtered real & imaginary parts of the result), or just look for >> the two frequency components separately by multiplying with a complex >> exponential at each frequency & low pass filtering the amplitude (I'd >> probably use a boxcar filter the same length as your symbols/frames). >> >> After doing this you can then start to decide where your frame boundaries >> are. If you've filtered as described, you should just be able to start at >> some offset and then take every 18th value. >> >> hope this gives you some ideas, >> >> David >> ------------------------------ >> *From:* Linda >> *To:* SciPy Users List >> *Sent:* Fri, 28 May, 2010 2:42:05 AM >> *Subject:* Re: [SciPy-User] finding frequency of wav >> >> Thanks for your reply. The explanation on fftfreq already made a few >> puzzle pieces fall into place. >> >> The signal I am trying to decode is a DSC transmission that is recorded in >> a wav file. (Digital Selective Calling, used in marine radio) >> It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and >> there's a carrier at 1700Hz. That should be all frequencies involved (apart >> from noise). Currently I am used generated, clean signals. But probably I >> should get a clean '10101010'-signal first to try my work on. >> >> Since the bitrate is set at 1200bits/sec, the bit length would be >> samplerate/1200 = 18.4 samples at 22050. >> I can double the samplerate to 44100, but that still leaves me at only >> 36.8 samples per chunk. >> If I understand what you say correctly, I would need at least 55 (64) >> samples in each chunk? >> >> I'm not sure what chunk[3] would have been, I should have used a >> dotting-signal instead of an unknown message to try this on. I will try this >> again with more useful data this afternoon. >> >> cheers, >> Linda >> >> >> On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: >> >>> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >>> > Hello all, >>> >>> > I have a digital signal where the bits in it are encoded with >>> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >>> > samplerate of 22050. >>> > My goal is to find the bits again so I can decode the message in it, >>> > for that I have chopped the wav up in pieces of 18 samples, which >>> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >>> > a list of chunks of length 18. I thought I could just fft each chunks >>> > and find the max of the chunk-spectrum, to find out the bitfrequency >>> > in the chunk (and thus the bitvalue) >>> >>> Correct me if I am wrong. You are cutting your signal into chunks that >>> you expect to contain at least one period of the lower coding frequency. >>> You then perform a fft on a very small signal (18 samples) which gives >>> you (without zero padding) an estimation of the Fourier transform of >>> your chunk computed on only 18 frequencies, i.e. with a really bad >>> frequential resolution. It is possible if your coding frequencies are >>> not too close. A raw Rayleight criteria leads to cut your signal into at >>> least N=2*Fe/df_min where df_min is the minimal spacing between two >>> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >>> power of 2). 
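As a rough illustration of the two ideas quoted above (David's "multiply with a complex exponential at each frequency" and Fabrice's minimum window length N = 2*Fe/df_min ~ 55, rounded up to 64), here is a minimal sketch of a per-window tone comparison. It assumes the 22050 Hz sample rate and the 1300/2100 Hz coding frequencies from this thread; the names (detect_bit, sig) and the step of 18 samples are only placeholders, and real decoding would still need the bit boundaries found first.

import numpy as np

fs = 22050.0                 # sample rate from the thread
f0, f1 = 1300.0, 2100.0      # '0' and '1' coding frequencies
nwin = 64                    # window length from the Rayleigh estimate above

def detect_bit(window):
    # correlate the window against one complex exponential per tone
    # (a single-bin DFT at each coding frequency) and keep the stronger one
    n = np.arange(len(window))
    a0 = np.abs(np.sum(window * np.exp(-2j * np.pi * f0 * n / fs)))
    a1 = np.abs(np.sum(window * np.exp(-2j * np.pi * f1 * n / fs)))
    return int(a1 > a0)

# illustrative use on a float array sig (samples read from the wav);
# the offset/step below is arbitrary until the frame alignment is known:
# bits = [detect_bit(sig[i:i + nwin]) for i in range(0, len(sig) - nwin, 18)]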
>>> > >>> > But somehow I am stuck in the numbers, I was hopeing you could give me >>> > a hint. here is what I have: >>> >>> > chunks[3] #this is one of the wavchunks, there should be a bit hidden >>> in here >>> > Out[98]: >>> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, >>> 0, 0, 0], dtype=int16) >>> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >>> the value of the bitfrequency 1300 of 2100 Hz? >>> > test >>> > Out[100]: >>> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >>> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >>> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >>> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >>> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >>> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >>> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >>> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >>> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >>> > >>> > >>> > I am unsure how to proceed from here, so I would really appreciate any >>> > tips.. I found fftfreq, but I am not sure how to use it? I read >>> > fftfreq? but I don't see how the example even uses the 'fourier' >>> > variable in the fftfreq there? >>> > >>> Fftfreq is a function that constructs the frequency vector associated to >>> the data computed by the fft algorithm. It is aware of how fft orders >>> the frequency bins, and transform it in a more convenient way (it >>> 'anti-aliases', centering the results on zero frequency). >>> >>> import numpy as np >>> import matplotlib.pyplot as plt >>> chunks[3]=.... >>> test = np.fft.fft(chunks[3]) >>> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >>> period >>> plt.plot(frequencies, np.abs(test), 'o') >>> plt.show() >>> >>> but you won't see any things on this fft. I am suspicious due to the >>> fact that the signal to noise ratio seems rather low leading to strong >>> peak at Fe/2 >>> In chunk[3], what do you expect to be the bit? >>> >>> Fabricio >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From linda.polman at gmail.com Fri May 28 11:26:06 2010 From: linda.polman at gmail.com (Linda) Date: Fri, 28 May 2010 17:26:06 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Thank you, I will take a look at this :-) On Fri, May 28, 2010 at 16:27, Benjamin Root wrote: > Linda, > > I am not very familiar with this particular topic, but you might want to > look into Wavelet Analysis: > > http://en.wikipedia.org/wiki/Wavelet > http://www.amara.com/IEEEwave/IW_wave_ana.html > > Fourier transforms are definately the wrong tool here because it assumes > that the different frequency waves exists for the entire sample. My (basic) > understanding of wavelet analysis is that it does not make that assumption. 
> > Hope this helps, > Ben Root > > > On Fri, May 28, 2010 at 7:41 AM, Linda wrote: > >> Hello David, >> I decided that fft would not work on finding those bits, so I spent all >> day looking into your suggestions. >> >> I think my data signals are FSK (FM?) instead of PSK, there are no >> discontinuieties in the signals when I take a look at them in audacity. >> >> I'm not sure I completely understand the multiply with a complex >> exponential part. Should I multiply my data-array with exp( 1j * omega * T >> )? where omega would be 2 * pi * f_carrier (1700) and T would be 1/Fs? or >> the bitlength in samples? >> >> This task seems to be quite a bit more difficult than I initially thought >> (undergrad student) >> >> I would really appreciate some more help :-) >> >> Cheers >> Linda >> >> On Thu, May 27, 2010 at 23:12, David Baddeley < >> david_baddeley at yahoo.com.au> wrote: >> >>> Hi Linda, >>> >>> I probably wouldn't divide the signal up into chunks before procesing, >>> and also suspect that the FFT might be the wrong tool for the job (I'd >>> certainly take the fft of the whole signal just to check that you have the >>> right frequencies in it though). >>> >>> The problem with dividing into chunks and processing each separately is >>> that you don't necessarily know where each bit start and stops - your chunks >>> are thus, more likely than not, going to be misaligned. >>> >>> I'd probably tackle the problem with a strategy directly analogous to >>> that used in analogue circuitry for decoding PSK - I'd either mix the >>> carrier out & do I-Q detection (multiply with a complex exponential and then >>> look at the low pass filtered real & imaginary parts of the result), or just >>> look for the two frequency components separately by multiplying with a >>> complex exponential at each frequency & low pass filtering the amplitude >>> (I'd probably use a boxcar filter the same length as your symbols/frames). >>> >>> After doing this you can then start to decide where your frame boundaries >>> are. If you've filtered as described, you should just be able to start at >>> some offset and then take every 18th value. >>> >>> hope this gives you some ideas, >>> >>> David >>> ------------------------------ >>> *From:* Linda >>> *To:* SciPy Users List >>> *Sent:* Fri, 28 May, 2010 2:42:05 AM >>> *Subject:* Re: [SciPy-User] finding frequency of wav >>> >>> Thanks for your reply. The explanation on fftfreq already made a few >>> puzzle pieces fall into place. >>> >>> The signal I am trying to decode is a DSC transmission that is recorded >>> in a wav file. (Digital Selective Calling, used in marine radio) >>> It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and >>> there's a carrier at 1700Hz. That should be all frequencies involved (apart >>> from noise). Currently I am used generated, clean signals. But probably I >>> should get a clean '10101010'-signal first to try my work on. >>> >>> Since the bitrate is set at 1200bits/sec, the bit length would be >>> samplerate/1200 = 18.4 samples at 22050. >>> I can double the samplerate to 44100, but that still leaves me at only >>> 36.8 samples per chunk. >>> If I understand what you say correctly, I would need at least 55 (64) >>> samples in each chunk? >>> >>> I'm not sure what chunk[3] would have been, I should have used a >>> dotting-signal instead of an unknown message to try this on. I will try this >>> again with more useful data this afternoon. 
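A clean dotting signal of the kind mentioned just above can also be synthesised directly, which gives a known bit pattern to test the decoder against before touching recorded data. This is only a sketch of a continuous-phase FSK '1010...' signal, assuming the 22050 Hz sample rate and 1200 bit/s rate from the thread; all names here are placeholders.

import numpy as np

fs = 22050.0
bit_len = fs / 1200.0                # ~18.375 samples per bit
bits = np.array([1, 0] * 20)         # dotting pattern to test the decoder on
nsamp = int(len(bits) * bit_len)

which_bit = (np.arange(nsamp) / bit_len).astype(int)         # bit index of each sample
inst_freq = np.where(bits[which_bit] == 1, 2100.0, 1300.0)   # '1' -> 2100 Hz, '0' -> 1300 Hz
phase = 2.0 * np.pi * np.cumsum(inst_freq) / fs              # integrate -> continuous phase
sig = np.sin(phase)                                          # clean FSK signal, no noise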
>>> >>> cheers, >>> Linda >>> >>> >>> On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: >>> >>>> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >>>> > Hello all, >>>> >>>> > I have a digital signal where the bits in it are encoded with >>>> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >>>> > samplerate of 22050. >>>> > My goal is to find the bits again so I can decode the message in it, >>>> > for that I have chopped the wav up in pieces of 18 samples, which >>>> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >>>> > a list of chunks of length 18. I thought I could just fft each chunks >>>> > and find the max of the chunk-spectrum, to find out the bitfrequency >>>> > in the chunk (and thus the bitvalue) >>>> >>>> Correct me if I am wrong. You are cutting your signal into chunks that >>>> you expect to contain at least one period of the lower coding frequency. >>>> You then perform a fft on a very small signal (18 samples) which gives >>>> you (without zero padding) an estimation of the Fourier transform of >>>> your chunk computed on only 18 frequencies, i.e. with a really bad >>>> frequential resolution. It is possible if your coding frequencies are >>>> not too close. A raw Rayleight criteria leads to cut your signal into at >>>> least N=2*Fe/df_min where df_min is the minimal spacing between two >>>> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >>>> power of 2). >>>> > >>>> > But somehow I am stuck in the numbers, I was hopeing you could give me >>>> > a hint. here is what I have: >>>> >>>> > chunks[3] #this is one of the wavchunks, there should be a bit hidden >>>> in here >>>> > Out[98]: >>>> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, >>>> 0, 0, 0], dtype=int16) >>>> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >>>> the value of the bitfrequency 1300 of 2100 Hz? >>>> > test >>>> > Out[100]: >>>> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >>>> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >>>> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >>>> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >>>> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >>>> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >>>> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >>>> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >>>> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >>>> > >>>> > >>>> > I am unsure how to proceed from here, so I would really appreciate any >>>> > tips.. I found fftfreq, but I am not sure how to use it? I read >>>> > fftfreq? but I don't see how the example even uses the 'fourier' >>>> > variable in the fftfreq there? >>>> > >>>> Fftfreq is a function that constructs the frequency vector associated to >>>> the data computed by the fft algorithm. It is aware of how fft orders >>>> the frequency bins, and transform it in a more convenient way (it >>>> 'anti-aliases', centering the results on zero frequency). >>>> >>>> import numpy as np >>>> import matplotlib.pyplot as plt >>>> chunks[3]=.... >>>> test = np.fft.fft(chunks[3]) >>>> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >>>> period >>>> plt.plot(frequencies, np.abs(test), 'o') >>>> plt.show() >>>> >>>> but you won't see any things on this fft. 
I am suspicious due to the >>>> fact that the signal to noise ratio seems rather low leading to strong >>>> peak at Fe/2 >>>> In chunk[3], what do you expect to be the bit? >>>> >>>> Fabricio >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.biesinger at gmail.com Fri May 28 12:30:22 2010 From: jake.biesinger at gmail.com (Jacob Biesinger) Date: Fri, 28 May 2010 09:30:22 -0700 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error Message-ID: Hi! Having some trouble with a gaussian_kde on uniform integer data. $ python --version Python 2.6.5 $ ipython --Version 0.10 $ ipython In [1]: from scipy.stats import gaussian_kde In [2]: import scipy In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) # should be uniform on [-250,250) In [4]: k = gaussian_kde(randDistFromCenter) /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: DeprecationWarning: scipy.stats.cov is deprecated; please update your code to use numpy.cov. Please note that: - numpy.cov rowvar argument defaults to true, not false - numpy.cov bias argument defaults to false, not true """, DeprecationWarning) /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: DeprecationWarning: scipy.stats.mean is deprecated; please update your code to use numpy.mean. Please note that: - numpy.mean axis argument defaults to None, not 0 - numpy.mean has a ddof argument to replace bias in a more general manner. scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, axis=0, ddof=1). axis=0, ddof=1).""", DeprecationWarning) In [5]: x = scipy.linspace(-300,300,100) In [6]: y = k.evaluate(x) In [7]: y Out[7]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # Perhaps it's a bandwidth issue, though there should still be at least a few non-zero entries!: In [8]: k.covariance Out[8]: array([[ 523.56767608]]) -- Jake Biesinger Graduate Student Xie Lab, UC Irvine (949) 231-7587 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 12:46:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 12:46:24 -0400 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 12:30 PM, Jacob Biesinger wrote: > Hi! > Having some trouble with a gaussian_kde on uniform integer data. 
> $ python --version > Python 2.6.5 > $ ipython --Version > 0.10 > $ ipython > In [1]: from scipy.stats import gaussian_kde > In [2]: import scipy > In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) ?# > should be uniform on [-250,250) > In [4]: k = gaussian_kde(randDistFromCenter) > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: > DeprecationWarning: scipy.stats.cov is deprecated; please update your code > to use numpy.cov. > Please note that: > ?? ?- numpy.cov rowvar argument defaults to true, not false > ?? ?- numpy.cov bias argument defaults to false, not true > ??""", DeprecationWarning) > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: > DeprecationWarning: scipy.stats.mean is deprecated; please update your code > to use numpy.mean. > Please note that: > ?? ?- numpy.mean axis argument defaults to None, not 0 > ?? ?- numpy.mean has a ddof argument to replace bias in a more general > manner. > ?? ? ?scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, > axis=0, ddof=1). > ??axis=0, ddof=1).""", DeprecationWarning) > In [5]: x = scipy.linspace(-300,300,100) > In [6]: y = k.evaluate(x) > In [7]: y > Out[7]: > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0]) > > # ?Perhaps it's a bandwidth issue, though there should still be at least a > few non-zero entries!: > In [8]: k.covariance > Out[8]: array([[ 523.56767608]]) kde doesn't like integers, Can you file a ticket for this? if you don't convert the original sample to integers, or convert them to float, e.g k = gaussian_kde(np.array(randDistFromCenter, float)) then I get y= [ 2.37718449e-05 4.59333855e-05 8.33629081e-05 1.42283310e-04 2.28733649e-04 3.46958715e-04 4.97636815e-04 6.76566775e-04 8.74448533e-04 1.07809127e-03 1.27285862e-03 1.44565290e-03 1.58750741e-03 1.69501260e-03 1.77025571e-03 1.81947393e-03 .... I don't know how good the bandwidth is for a uniform distribution of the sample, but you will get a lot of spillover/smoothing at the boundary. Thanks for reporting, Josef > > -- > Jake Biesinger > Graduate Student > Xie Lab, UC Irvine > (949) 231-7587 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jake.biesinger at gmail.com Fri May 28 13:03:14 2010 From: jake.biesinger at gmail.com (Jacob Biesinger) Date: Fri, 28 May 2010 10:03:14 -0700 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error In-Reply-To: References: Message-ID: Oh, great. Thanks for the workaround. I opened a ticket for this on Scipy Trac: http://projects.scipy.org/scipy/ticket/1181 -- Jake Biesinger Graduate Student Xie Lab, UC Irvine (949) 231-7587 On Fri, May 28, 2010 at 9:46 AM, wrote: > On Fri, May 28, 2010 at 12:30 PM, Jacob Biesinger > wrote: > > Hi! > > Having some trouble with a gaussian_kde on uniform integer data. 
> > $ python --version > > Python 2.6.5 > > $ ipython --Version > > 0.10 > > $ ipython > > In [1]: from scipy.stats import gaussian_kde > > In [2]: import scipy > > In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) # > > should be uniform on [-250,250) > > In [4]: k = gaussian_kde(randDistFromCenter) > > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: > > DeprecationWarning: scipy.stats.cov is deprecated; please update your > code > > to use numpy.cov. > > Please note that: > > - numpy.cov rowvar argument defaults to true, not false > > - numpy.cov bias argument defaults to false, not true > > """, DeprecationWarning) > > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: > > DeprecationWarning: scipy.stats.mean is deprecated; please update your > code > > to use numpy.mean. > > Please note that: > > - numpy.mean axis argument defaults to None, not 0 > > - numpy.mean has a ddof argument to replace bias in a more general > > manner. > > scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, > > axis=0, ddof=1). > > axis=0, ddof=1).""", DeprecationWarning) > > In [5]: x = scipy.linspace(-300,300,100) > > In [6]: y = k.evaluate(x) > > In [7]: y > > Out[7]: > > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0]) > > > > # Perhaps it's a bandwidth issue, though there should still be at least > a > > few non-zero entries!: > > In [8]: k.covariance > > Out[8]: array([[ 523.56767608]]) > > > kde doesn't like integers, Can you file a ticket for this? > > if you don't convert the original sample to integers, or convert them > to float, e.g > > k = gaussian_kde(np.array(randDistFromCenter, float)) > > then I get y= > [ 2.37718449e-05 4.59333855e-05 8.33629081e-05 1.42283310e-04 > 2.28733649e-04 3.46958715e-04 4.97636815e-04 6.76566775e-04 > 8.74448533e-04 1.07809127e-03 1.27285862e-03 1.44565290e-03 > 1.58750741e-03 1.69501260e-03 1.77025571e-03 1.81947393e-03 > .... > > I don't know how good the bandwidth is for a uniform distribution of > the sample, but you will get a lot of spillover/smoothing at the > boundary. > > Thanks for reporting, > > Josef > > > > > -- > > Jake Biesinger > > Graduate Student > > Xie Lab, UC Irvine > > (949) 231-7587 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 13:18:41 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 13:18:41 -0400 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 1:03 PM, Jacob Biesinger wrote: > Oh, great. ?Thanks for the workaround. 
?I opened a ticket for this on Scipy > Trac: ?http://projects.scipy.org/scipy/ticket/1181 Thanks, Josef > > -- > Jake Biesinger > Graduate Student > Xie Lab, UC Irvine > (949) 231-7587 > > > On Fri, May 28, 2010 at 9:46 AM, wrote: >> >> On Fri, May 28, 2010 at 12:30 PM, Jacob Biesinger >> wrote: >> > Hi! >> > Having some trouble with a gaussian_kde on uniform integer data. >> > $ python --version >> > Python 2.6.5 >> > $ ipython --Version >> > 0.10 >> > $ ipython >> > In [1]: from scipy.stats import gaussian_kde >> > In [2]: import scipy >> > In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) ?# >> > should be uniform on [-250,250) >> > In [4]: k = gaussian_kde(randDistFromCenter) >> > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: >> > DeprecationWarning: scipy.stats.cov is deprecated; please update your >> > code >> > to use numpy.cov. >> > Please note that: >> > ?? ?- numpy.cov rowvar argument defaults to true, not false >> > ?? ?- numpy.cov bias argument defaults to false, not true >> > ??""", DeprecationWarning) >> > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: >> > DeprecationWarning: scipy.stats.mean is deprecated; please update your >> > code >> > to use numpy.mean. >> > Please note that: >> > ?? ?- numpy.mean axis argument defaults to None, not 0 >> > ?? ?- numpy.mean has a ddof argument to replace bias in a more general >> > manner. >> > ?? ? ?scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, >> > axis=0, ddof=1). >> > ??axis=0, ddof=1).""", DeprecationWarning) >> > In [5]: x = scipy.linspace(-300,300,100) >> > In [6]: y = k.evaluate(x) >> > In [7]: y >> > Out[7]: >> > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0]) >> > >> > # ?Perhaps it's a bandwidth issue, though there should still be at least >> > a >> > few non-zero entries!: >> > In [8]: k.covariance >> > Out[8]: array([[ 523.56767608]]) >> >> >> kde doesn't like integers, Can you file a ticket for this? >> >> if you don't convert the original sample to integers, or convert them >> to float, e.g >> >> k = gaussian_kde(np.array(randDistFromCenter, float)) >> >> then I get y= >> [ ?2.37718449e-05 ? 4.59333855e-05 ? 8.33629081e-05 ? 1.42283310e-04 >> ? 2.28733649e-04 ? 3.46958715e-04 ? 4.97636815e-04 ? 6.76566775e-04 >> ? 8.74448533e-04 ? 1.07809127e-03 ? 1.27285862e-03 ? 1.44565290e-03 >> ? 1.58750741e-03 ? 1.69501260e-03 ? 1.77025571e-03 ? 1.81947393e-03 >> .... >> >> I don't know how good the bandwidth is for a uniform distribution of >> the sample, but you will get a lot of spillover/smoothing at the >> boundary. 
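For reference, a complete float-based version of the workaround quoted above, showing what the boundary smoothing looks like. This is only a sketch: the uniform sample is regenerated here rather than reusing the one from the original message.

import numpy as np
from scipy.stats import gaussian_kde

sample = np.random.rand(10000) * 500.0 - 250.0   # same uniform data, kept as floats
k = gaussian_kde(sample)
x = np.linspace(-300, 300, 100)
y = k.evaluate(x)
# y is now nonzero, close to the true uniform density 1/500 = 0.002 inside
# [-250, 250), and tapers smoothly past the edges because of the Gaussian kernel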
>> >> Thanks for reporting, >> >> Josef >> >> > >> > -- >> > Jake Biesinger >> > Graduate Student >> > Xie Lab, UC Irvine >> > (949) 231-7587 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Fri May 28 13:38:11 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 13:38:11 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 3:57 PM, nicky van foreest wrote: > Hi Josef, > > Thanks for the answer. > >> Actually, if the onepoint distribution directly subclasses rv_generic >> then it wouldn't rely on or interfere with the generic framework in >> rv_continuous or rv_discrete (where it wouldn't really fit in if >> onepoint is on reals), and it might be relatively easy to provide all >> the methods of the distributions for a single point distribution. > > I must admit that I haven't had a look at the innards of rv_generic, > so I am afraid I cannot be of any relevant help in this respect. > >> >> Choice of name: >> to me, "deterministic random variable" sounds like an oxymoron, >> although I found some references to deterministic distribution (mainly >> or exclusively in queuing theory and >> http://isi.cbs.nl/glossary/term902.htm) >> I would prefer a boring "onepoint" distribution, or "degenerate", or ... ? > > Degenerate seems nice to me. I just checked the book Probability by > Shiryaev, and he also uses the word `degenerate'. Interestingly, he > introduces the degenerate distribution as the normal distribution with > sigma = 0. I suspect that implementing the degenerate distribution > like this is utterly stupid. > >> Can you file a ticket with what you would like to have? > > Sure. Sorry for bothering you with this, but how? Nicky, Sorry about the delay, another thread I lost track of. http://projects.scipy.org/scipy/newticket has a form to fill out, you might have to sign up first It's a good place to file things so they don't get forgotten. > >> >> I started to work again a bit on enhancing the distributions, mainly >> I'm experimenting with several generic estimation methods. My target >> is to have a working estimator for any distribution in scipy.stats and >> for several additional distributions. > > This seems a nice idea, but quite ambitious. Have you also thought > about estimators for heavy tailed distributions? This is, as far as I > know, a very delicate topic. > >> >> I worry a bit that a deterministic distribution might not fit into a >> general framework for distributions and might need to be special cased >> for some methods. (but see above) > > This must be fairly easy. Just the mean can be relevant. We would want to provide all the same methods, even if most of them are trivial. BTW: I had started to work on the discrete distribution on the real line. 
Some methods works easily, but then I ran into the "hashtable on floats" problem (from another thread) pdf(x), cdf(x) with x float would need to know whether x is a support point, but which might not be equal to the actual point because of floating point problems. So, the direct translation of rv_discrete doesn't work, and it looks like at least pdf needs to be accessible either pointwise for queries or using known support points for actual calculations. No fun, and EDA dropped. Josef > >> http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ > > I'll have a look. Thanks. > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Fri May 28 14:11:04 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 28 May 2010 11:11:04 -0700 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 10:38 AM, wrote: > pdf(x), cdf(x) ?with x float would need to know whether x is a support > point, but which might not be equal to the actual point because of > floating point problems. > So, the direct translation of rv_discrete doesn't work, and it looks > like at least pdf needs to be accessible either pointwise for queries > or using known support points for actual calculations. Discrete distributions on the real line don't *have* a pdf... -- Nathaniel From josef.pktd at gmail.com Fri May 28 14:21:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 14:21:54 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 2:11 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 10:38 AM, ? wrote: >> pdf(x), cdf(x) ?with x float would need to know whether x is a support >> point, but which might not be equal to the actual point because of >> floating point problems. >> So, the direct translation of rv_discrete doesn't work, and it looks >> like at least pdf needs to be accessible either pointwise for queries >> or using known support points for actual calculations. > > Discrete distributions on the real line don't *have* a pdf... sorry, pmf, it's a pain if continuous and discrete have different names, one of my favorite typos in this. What do we call it when we have a mixture distribution with mass points and a density? pmdf ? or just pf ? Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Fri May 28 15:16:05 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 28 May 2010 12:16:05 -0700 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 11:21 AM, wrote: >> Discrete distributions on the real line don't *have* a pdf... > > sorry, pmf, > it's a pain if continuous and discrete have different names, one of my > favorite typos in this. It's not just a difference in terminology -- I brought it up because the practical problems you were running into are closely related to the reasons why it's impossible to define a pdf for a point mass on the real line. 
A pdf and pmf on the real line have to be used differently -- pdf's are mostly meaningful as things to integrate, but imagine integrating (some function of) a pmf over the real line as if it were a pdf -- you always get 0... > What do we call it when we have a mixture distribution with mass > points and a density? I don't know -- before trying to name it, can it even be defined? What value does it have at the locations where there's a point mass? Inf? IEEE954 neglected to include values for Dirac delta functions :-). -- Nathaniel From josef.pktd at gmail.com Fri May 28 15:41:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 15:41:05 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 3:16 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 11:21 AM, ? wrote: >>> Discrete distributions on the real line don't *have* a pdf... >> >> sorry, pmf, >> it's a pain if continuous and discrete have different names, one of my >> favorite typos in this. > > It's not just a difference in terminology -- I brought it up because > the practical problems you were running into are closely related to > the reasons why it's impossible to define a pdf for a point mass on > the real line. > > A pdf and pmf on the real line have to be used differently -- pdf's > are mostly meaningful as things to integrate, but imagine integrating > (some function of) a pmf over the real line as if it were a pdf -- you > always get 0... I was trying to convert rv_discrete to be defined on an arbitrary (countable) number of masspoints on the real line instead of over integers. I'm summing, not integrating over points. The problem is that we can specify the set of integers relatively easily, but looking for a finite (or countable) number of points on the real line is more difficult. I'm not trying to reuse anything from the continuous distributions, except that one application will be to work with discretized continuous distributions. > >> What do we call it when we have a mixture distribution with mass >> points and a density? > > I don't know -- before trying to name it, can it even be defined? What > value does it have at the locations where there's a point mass? Inf? It would have to combine integration for the continuous part with addition of mass points for the discrete part. Tweedie distribution for some parameters is an example. A mass point at zero and continuous on the positive real line. "Invented" for rainfall, with positive probability it doesn't rain, but if it rains then the amount of rainfall is a continuous random variable. There are some decomposition theorems for this, approximately every distribution can be represented as the sum of discrete mass points plus a continuous distribution. > > IEEE954 neglected to include values for Dirac delta functions :-). I think integration over singularities in a function doesn't work too well in scipy. Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Fri May 28 15:50:09 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 28 May 2010 14:50:09 -0500 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 13:11, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 10:38 AM, ? 
wrote: >> pdf(x), cdf(x) ?with x float would need to know whether x is a support >> point, but which might not be equal to the actual point because of >> floating point problems. >> So, the direct translation of rv_discrete doesn't work, and it looks >> like at least pdf needs to be accessible either pointwise for queries >> or using known support points for actual calculations. > > Discrete distributions on the real line don't *have* a pdf... Well, they *have* one; they just can't be implemented in floating point. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mdekauwe at gmail.com Fri May 28 15:53:42 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 12:53:42 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> Message-ID: <28711249.post@talk.nabble.com> Ok thanks...I'll take a look. Back to my loops issue. What if instead this time I wanted to take an average so every march in 11 years, is there a quicker way to go about doing that than my current method? nummonths = 12 numyears = 11 for month in xrange(nummonths): for i in xrange(numpts): for ym in xrange(month, numyears * nummonths, nummonths): data[month, i] += array[ym, VAR, land_pts_index[i], 0] so for each point in the array for a given month i am jumping through and getting the next years month and so on, summing it. Thanks... josef.pktd wrote: > > On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >> >> Could you possibly if you have time explain further your comment re the >> p-values, your suggesting I am misusing them? > > Depends on your use and interpretation > > test statistics, p-values are random variables, if you look at several > tests at the same time, some p-values will be large just by chance. > If, for example you just look at the largest test statistic, then the > distribution for the max of several test statistics is not the same as > the distribution for a single test statistic > > http://en.wikipedia.org/wiki/Multiple_comparisons > http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm > > we also just had a related discussion for ANOVA post-hoc tests on the > pystatsmodels group. > > Josef >> >> Thanks. >> >> >> josef.pktd wrote: >>> >>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>> >>>> Sounds like I am stuck with the loop as I need to do the comparison for >>>> each >>>> pixel of the world and then I have a basemap function call which I >>>> guess >>>> slows it down further...hmm >>> >>> I don't see much that could be done differently, after a brief look. >>> >>> stats.pearsonr could be replaced by an array version using directly >>> the formula for correlation even with nans. wilcoxon looks slow, and I >>> never tried or seen a faster version. >>> >>> just a reminder, the p-values are for a single test, when you have >>> many of them, then they don't have the right size/confidence level for >>> an overall or joint test. (some packages report a Bonferroni >>> correction in this case) >>> >>> Josef >>> >>> >>>> >>>> i.e. >>>> >>>> def compareSnowData(jules_var): >>>> ? ?# Extract the 11 years of snow data and return >>>> ? ?outrows = 180 >>>> ? ?outcols = 360 >>>> ? 
?numyears = 11 >>>> ? ?nummonths = 12 >>>> >>>> ? ?# Read various files >>>> ? ?fname="world_valid_jules_pts.ascii" >>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>> jo.read_land_points_ascii(fname, 1.0) >>>> >>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>> \ >>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>> \ >>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>> >>>> ? ?# grab some space >>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>> dtype=np.float32) >>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>> dtype=np.float32) >>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>> np.nan >>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>> np.nan >>>> >>>> ? ?# extract the data >>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>> ? ?#for month in xrange(numyears * nummonths): >>>> ? ?# ? ?for i in xrange(numpts): >>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>> ? ?# ? ? ? ?else: >>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>> ? ?# ? ? ? ?if data2 > 0.0: >>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>> ? ?# ? ? ? ?else: >>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>> >>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>> else >>>> we >>>> ? ?# can't do the correlations correctly!! >>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>> >>>> ? ?# put data on a regular grid... >>>> ? ?print 'regridding landpts...' >>>> ? ?for i in xrange(numpts): >>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>> func >>>> ? ? ? ?x = data1_snow[:,i] >>>> ? ? ? ?x = x[np.isfinite(x)] >>>> ? ? ? ?y = data2_snow[:,i] >>>> ? ? ? ?y = y[np.isfinite(y)] >>>> >>>> ? ? ? ?# r^2 >>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >>>> data >>>> ? ? ? ?if len(x) and len(y) > 50: >>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>> (stats.pearsonr(x, y)[0])**2 >>>> >>>> ? ? ? ?# wilcox signed rank test >>>> ? ? ? ?# make sure we have enough samples to do the test >>>> ? ? ? ?d = x - y >>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>> non-zero >>>> differences >>>> ? ? ? ?count = len(d) >>>> ? ? ? ?if count > 10: >>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>> ? ? ? ? ? ?# only map out sign different data >>>> ? ? ? ? ? ?if pval < 0.05: >>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>> np.mean(x - y) >>>> >>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>>>> >>>>>> Also I then need to remap the 2D array I make onto another grid (the >>>>>> world in >>>>>> this case). 
Which again I had am doing with a loop (note numpts is a >>>>>> lot >>>>>> bigger than my example above). >>>>>> >>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>> np.nan >>>>>> for i in xrange(numpts): >>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>> func >>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>> >>>>>> ? ? ? ?# wilcox signed rank test >>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>> ? ? ? ?d = x - y >>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>> non-zero >>>>>> differences >>>>>> ? ? ? ?count = len(d) >>>>>> ? ? ? ?if count > 10: >>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>> np.mean(x - y) >>>>>> >>>>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>>>> array >>>>>> by removing the index, >>>>>> but I can't see how I will get the individual x and y pts for each >>>>>> array >>>>>> member correctly without the loop, this was my attempt which of >>>>>> course >>>>>> doesn't work! >>>>>> >>>>>> x = data1_snow[:,:] >>>>>> x = x[np.isfinite(x)] >>>>>> y = data2_snow[:,:] >>>>>> y = y[np.isfinite(y)] >>>>>> >>>>>> # r^2 >>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>>>> if len(x) and len(y) > 50: >>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>> y)[0])**2 >>>>> >>>>> >>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>>>> at a time (if I read the help correctly). >>>>> >>>>> Also the presence of nans might force the use a loop. stats.mstats has >>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>> when vectorized operations would work with regular arrays, nan or >>>>> masked array versions still have to loop in many cases.) >>>>> >>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>> calculated then it might be worth to use only array operations up to >>>>> that point. If wilcoxon is calculated most of the time, then it's not >>>>> worth thinking too hard about this. >>>>> >>>>> Josef >>>>> >>>>> >>>>>> >>>>>> thanks. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> mdekauwe wrote: >>>>>>> >>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>>>> >>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>> there >>>>>>> a >>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>> 4dims >>>>>>> that Josef's suggestion would be more advised in this case? >>>>> >>>>> There were several discussions on the mailing lists (fancy slicing and >>>>> indexing). Your case is safe, but if you run in future into funny >>>>> shapes, you can look up the details. >>>>> when in doubt, I use np.arange(...) >>>>> >>>>> Josef >>>>> >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Thanks that works... 
>>>>>>>>> >>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>> the >>>>>>>>> step >>>>>>>>> I >>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>> the >>>>>>>>> two >>>>>>>>> for >>>>>>>>> loops? Do I have that right? >>>>>>>> >>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>> dimension, >>>>>>>> then you can use slicing. It might be faster. >>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>>> >>>>>>>>> A lot quicker...! >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> >>>>>>>>> josef.pktd wrote: >>>>>>>>>> >>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>>>> array, >>>>>>>>>>> but >>>>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>>>> the >>>>>>>>>>> arrays >>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>> solution >>>>>>>>>>> as >>>>>>>>>>> well >>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>> difficult >>>>>>>>>>> to >>>>>>>>>>> get >>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>> that >>>>>>>>>>> one >>>>>>>>>>> could >>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>> >>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>> >>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> import numpy as np >>>>>>>>>>> >>>>>>>>>>> numpts=10 >>>>>>>>>>> tsteps = 12 >>>>>>>>>>> vari = 22 >>>>>>>>>>> >>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>> >>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>> >>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>> >>>>>>>>>> I think this should do it >>>>>>>>>> >>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), >>>>>>>>>> 0] >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> View this message in context: >>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 28 16:05:17 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 16:05:17 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711249.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> Message-ID: On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: > > Ok thanks...I'll take a look. > > Back to my loops issue. What if instead this time I wanted to take an > average so every march in 11 years, is there a quicker way to go about doing > that than my current method? > > nummonths = 12 > numyears = 11 > > for month in xrange(nummonths): > ? ?for i in xrange(numpts): > ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): > ? ? ? ? ? 
?data[month, i] += array[ym, VAR, land_pts_index[i], 0] x[start:end:12,:] gives you every 12th row of an array x something like this should work to get rid of the inner loop, or you could directly put range(month, numyears * nummonths, nummonths) into the array instead of ym and sum() Josef > > so for each point in the array for a given month i am jumping through and > getting the next years month and so on, summing it. > > Thanks... > > > josef.pktd wrote: >> >> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>> >>> Could you possibly if you have time explain further your comment re the >>> p-values, your suggesting I am misusing them? >> >> Depends on your use and interpretation >> >> test statistics, p-values are random variables, if you look at several >> tests at the same time, some p-values will be large just by chance. >> If, for example you just look at the largest test statistic, then the >> distribution for the max of several test statistics is not the same as >> the distribution for a single test statistic >> >> http://en.wikipedia.org/wiki/Multiple_comparisons >> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >> >> we also just had a related discussion for ANOVA post-hoc tests on the >> pystatsmodels group. >> >> Josef >>> >>> Thanks. >>> >>> >>> josef.pktd wrote: >>>> >>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>>> >>>>> Sounds like I am stuck with the loop as I need to do the comparison for >>>>> each >>>>> pixel of the world and then I have a basemap function call which I >>>>> guess >>>>> slows it down further...hmm >>>> >>>> I don't see much that could be done differently, after a brief look. >>>> >>>> stats.pearsonr could be replaced by an array version using directly >>>> the formula for correlation even with nans. wilcoxon looks slow, and I >>>> never tried or seen a faster version. >>>> >>>> just a reminder, the p-values are for a single test, when you have >>>> many of them, then they don't have the right size/confidence level for >>>> an overall or joint test. (some packages report a Bonferroni >>>> correction in this case) >>>> >>>> Josef >>>> >>>> >>>>> >>>>> i.e. >>>>> >>>>> def compareSnowData(jules_var): >>>>> ? ?# Extract the 11 years of snow data and return >>>>> ? ?outrows = 180 >>>>> ? ?outcols = 360 >>>>> ? ?numyears = 11 >>>>> ? ?nummonths = 12 >>>>> >>>>> ? ?# Read various files >>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>> jo.read_land_points_ascii(fname, 1.0) >>>>> >>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>>> \ >>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>>> \ >>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>> >>>>> ? ?# grab some space >>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>> dtype=np.float32) >>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>> dtype=np.float32) >>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>> np.nan >>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>> np.nan >>>>> >>>>> ? ?# extract the data >>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>> ? 
?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>> ? ?#for month in xrange(numyears * nummonths): >>>>> ? ?# ? ?for i in xrange(numpts): >>>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>> ? ?# ? ? ? ?else: >>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>> ? ?# ? ? ? ?else: >>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>> >>>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>>> else >>>>> we >>>>> ? ?# can't do the correlations correctly!! >>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>> >>>>> ? ?# put data on a regular grid... >>>>> ? ?print 'regridding landpts...' >>>>> ? ?for i in xrange(numpts): >>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>> func >>>>> ? ? ? ?x = data1_snow[:,i] >>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>> ? ? ? ?y = data2_snow[:,i] >>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>> >>>>> ? ? ? ?# r^2 >>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >>>>> data >>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>> (stats.pearsonr(x, y)[0])**2 >>>>> >>>>> ? ? ? ?# wilcox signed rank test >>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>> ? ? ? ?d = x - y >>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>> non-zero >>>>> differences >>>>> ? ? ? ?count = len(d) >>>>> ? ? ? ?if count > 10: >>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>> ? ? ? ? ? ?# only map out sign different data >>>>> ? ? ? ? ? ?if pval < 0.05: >>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>> np.mean(x - y) >>>>> >>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>>>>> >>>>>>> Also I then need to remap the 2D array I make onto another grid (the >>>>>>> world in >>>>>>> this case). Which again I had am doing with a loop (note numpts is a >>>>>>> lot >>>>>>> bigger than my example above). >>>>>>> >>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>> np.nan >>>>>>> for i in xrange(numpts): >>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>>> func >>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>> >>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>> ? ? ? ?d = x - y >>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>> non-zero >>>>>>> differences >>>>>>> ? ? ? ?count = len(d) >>>>>>> ? ? ? ?if count > 10: >>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>> ? ? ? ? ? ? ? 
?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>> np.mean(x - y) >>>>>>> >>>>>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>>>>> array >>>>>>> by removing the index, >>>>>>> but I can't see how I will get the individual x and y pts for each >>>>>>> array >>>>>>> member correctly without the loop, this was my attempt which of >>>>>>> course >>>>>>> doesn't work! >>>>>>> >>>>>>> x = data1_snow[:,:] >>>>>>> x = x[np.isfinite(x)] >>>>>>> y = data2_snow[:,:] >>>>>>> y = y[np.isfinite(y)] >>>>>>> >>>>>>> # r^2 >>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>>>>> if len(x) and len(y) > 50: >>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>> y)[0])**2 >>>>>> >>>>>> >>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>>>>> at a time (if I read the help correctly). >>>>>> >>>>>> Also the presence of nans might force the use a loop. stats.mstats has >>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>>> when vectorized operations would work with regular arrays, nan or >>>>>> masked array versions still have to loop in many cases.) >>>>>> >>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>> calculated then it might be worth to use only array operations up to >>>>>> that point. If wilcoxon is calculated most of the time, then it's not >>>>>> worth thinking too hard about this. >>>>>> >>>>>> Josef >>>>>> >>>>>> >>>>>>> >>>>>>> thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> mdekauwe wrote: >>>>>>>> >>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>>>>> >>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>> there >>>>>>>> a >>>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>>> 4dims >>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>> >>>>>> There were several discussions on the mailing lists (fancy slicing and >>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>> shapes, you can look up the details. >>>>>> when in doubt, I use np.arange(...) >>>>>> >>>>>> Josef >>>>>> >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> josef.pktd wrote: >>>>>>>>> >>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Thanks that works... >>>>>>>>>> >>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>>> the >>>>>>>>>> step >>>>>>>>>> I >>>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>>> the >>>>>>>>>> two >>>>>>>>>> for >>>>>>>>>> loops? Do I have that right? >>>>>>>>> >>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>> dimension, >>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>> >>>>>>>>> Josef >>>>>>>>> >>>>>>>>>> >>>>>>>>>> A lot quicker...! 
>>>>>>>>>> >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> josef.pktd wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>>>>> array, >>>>>>>>>>>> but >>>>>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>>>>> the >>>>>>>>>>>> arrays >>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>> solution >>>>>>>>>>>> as >>>>>>>>>>>> well >>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>> difficult >>>>>>>>>>>> to >>>>>>>>>>>> get >>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>> that >>>>>>>>>>>> one >>>>>>>>>>>> could >>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>> >>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>> >>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> import numpy as np >>>>>>>>>>>> >>>>>>>>>>>> numpts=10 >>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>> vari = 22 >>>>>>>>>>>> >>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>> >>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>> >>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>> >>>>>>>>>>> I think this should do it >>>>>>>>>>> >>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), >>>>>>>>>>> 0] >>>>>>>>>>> >>>>>>>>>>> Josef >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> View this message in context: >>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> View this message in context: >>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
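For concreteness, a small sketch of the suggestion above for dropping the inner year loop; the array shape, VAR and land_pts_index are invented stand-ins for this example, not the real files from the thread:

import numpy as np

nummonths, numyears, numpts = 12, 11, 5
VAR = 3                                  # invented variable index
land_pts_index = np.arange(numpts)       # stand-in for the real land-point index
array = np.random.random((numyears * nummonths, 26, numpts, 1))

monthly_total = np.zeros((nummonths, numpts), dtype=np.float32)
for month in range(nummonths):
    # variant 1: put the year steps into an index array and sum over them
    rows = np.arange(month, numyears * nummonths, nummonths)
    total_rows = array[rows[:, None], VAR, land_pts_index, 0].sum(axis=0)
    # variant 2: a strided slice picks out the same rows without an index array
    total_slice = array[month::nummonths, VAR, land_pts_index, 0].sum(axis=0)
    assert np.allclose(total_rows, total_slice)
    monthly_total[month, :] = total_slice

Only the twelve-iteration month loop is left; the year loop and the loop over points are absorbed into the indexing and the sum along axis 0.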
>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Fri May 28 16:14:56 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 13:14:56 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> Message-ID: <28711444.post@talk.nabble.com> ok - something like this then...but how would i get the index for the month for the data array (where month is 0, 1, 2, 4 ... 11)? data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0] and would that be quicker than making an array months... months = np.arange(numyears * nummonths) and you that instead like you suggested x[start:end:12,:]? Many thanks again... josef.pktd wrote: > > On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >> >> Ok thanks...I'll take a look. >> >> Back to my loops issue. What if instead this time I wanted to take an >> average so every march in 11 years, is there a quicker way to go about >> doing >> that than my current method? >> >> nummonths = 12 >> numyears = 11 >> >> for month in xrange(nummonths): >> ? ?for i in xrange(numpts): >> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >> ? ? ? ? ? 
?data[month, i] += array[ym, VAR, land_pts_index[i], 0] > > > x[start:end:12,:] gives you every 12th row of an array x > > something like this should work to get rid of the inner loop, or you > could directly put > range(month, numyears * nummonths, nummonths) into the array instead > of ym and sum() > > Josef > > >> >> so for each point in the array for a given month i am jumping through and >> getting the next years month and so on, summing it. >> >> Thanks... >> >> >> josef.pktd wrote: >>> >>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>> >>>> Could you possibly if you have time explain further your comment re the >>>> p-values, your suggesting I am misusing them? >>> >>> Depends on your use and interpretation >>> >>> test statistics, p-values are random variables, if you look at several >>> tests at the same time, some p-values will be large just by chance. >>> If, for example you just look at the largest test statistic, then the >>> distribution for the max of several test statistics is not the same as >>> the distribution for a single test statistic >>> >>> http://en.wikipedia.org/wiki/Multiple_comparisons >>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>> >>> we also just had a related discussion for ANOVA post-hoc tests on the >>> pystatsmodels group. >>> >>> Josef >>>> >>>> Thanks. >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>>>> >>>>>> Sounds like I am stuck with the loop as I need to do the comparison >>>>>> for >>>>>> each >>>>>> pixel of the world and then I have a basemap function call which I >>>>>> guess >>>>>> slows it down further...hmm >>>>> >>>>> I don't see much that could be done differently, after a brief look. >>>>> >>>>> stats.pearsonr could be replaced by an array version using directly >>>>> the formula for correlation even with nans. wilcoxon looks slow, and I >>>>> never tried or seen a faster version. >>>>> >>>>> just a reminder, the p-values are for a single test, when you have >>>>> many of them, then they don't have the right size/confidence level for >>>>> an overall or joint test. (some packages report a Bonferroni >>>>> correction in this case) >>>>> >>>>> Josef >>>>> >>>>> >>>>>> >>>>>> i.e. >>>>>> >>>>>> def compareSnowData(jules_var): >>>>>> ? ?# Extract the 11 years of snow data and return >>>>>> ? ?outrows = 180 >>>>>> ? ?outcols = 360 >>>>>> ? ?numyears = 11 >>>>>> ? ?nummonths = 12 >>>>>> >>>>>> ? ?# Read various files >>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>> >>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>> numcols=1, >>>>>> \ >>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>> numcols=1, >>>>>> \ >>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>> >>>>>> ? ?# grab some space >>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>> dtype=np.float32) >>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>> dtype=np.float32) >>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>> np.nan >>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>> np.nan >>>>>> >>>>>> ? ?# extract the data >>>>>> ? 
?data1_snow = jules_data1[:,jules_var,:,0] >>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>> ? ?# ? ? ? ?else: >>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>> ? ?# ? ? ? ?else: >>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>> >>>>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>>>> else >>>>>> we >>>>>> ? ?# can't do the correlations correctly!! >>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>> >>>>>> ? ?# put data on a regular grid... >>>>>> ? ?print 'regridding landpts...' >>>>>> ? ?for i in xrange(numpts): >>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>> func >>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>> >>>>>> ? ? ? ?# r^2 >>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years >>>>>> of >>>>>> data >>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>> >>>>>> ? ? ? ?# wilcox signed rank test >>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>> ? ? ? ?d = x - y >>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>> non-zero >>>>>> differences >>>>>> ? ? ? ?count = len(d) >>>>>> ? ? ? ?if count > 10: >>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>> np.mean(x - y) >>>>>> >>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>> wrote: >>>>>>>> >>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>> (the >>>>>>>> world in >>>>>>>> this case). Which again I had am doing with a loop (note numpts is >>>>>>>> a >>>>>>>> lot >>>>>>>> bigger than my example above). >>>>>>>> >>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>>> np.nan >>>>>>>> for i in xrange(numpts): >>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>> stats >>>>>>>> func >>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>> >>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>> ? ? ? ?d = x - y >>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>> non-zero >>>>>>>> differences >>>>>>>> ? ? ? ?count = len(d) >>>>>>>> ? ? ? ?if count > 10: >>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>> ? ? ? ? 
? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>> np.mean(x - y) >>>>>>>> >>>>>>>> Now I think I can push the data in one move into the >>>>>>>> wilcoxStats_snow >>>>>>>> array >>>>>>>> by removing the index, >>>>>>>> but I can't see how I will get the individual x and y pts for each >>>>>>>> array >>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>> course >>>>>>>> doesn't work! >>>>>>>> >>>>>>>> x = data1_snow[:,:] >>>>>>>> x = x[np.isfinite(x)] >>>>>>>> y = data2_snow[:,:] >>>>>>>> y = y[np.isfinite(y)] >>>>>>>> >>>>>>>> # r^2 >>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>> data >>>>>>>> if len(x) and len(y) > 50: >>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>>> y)[0])**2 >>>>>>> >>>>>>> >>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>> arrays >>>>>>> at a time (if I read the help correctly). >>>>>>> >>>>>>> Also the presence of nans might force the use a loop. stats.mstats >>>>>>> has >>>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>> masked array versions still have to loop in many cases.) >>>>>>> >>>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>>> calculated then it might be worth to use only array operations up to >>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>> not >>>>>>> worth thinking too hard about this. >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> thanks. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> mdekauwe wrote: >>>>>>>>> >>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>> work. >>>>>>>>> >>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>> there >>>>>>>>> a >>>>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>>>> 4dims >>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>> >>>>>>> There were several discussions on the mailing lists (fancy slicing >>>>>>> and >>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>> shapes, you can look up the details. >>>>>>> when in doubt, I use np.arange(...) >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> josef.pktd wrote: >>>>>>>>>> >>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Thanks that works... >>>>>>>>>>> >>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>>>> the >>>>>>>>>>> step >>>>>>>>>>> I >>>>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>>>> the >>>>>>>>>>> two >>>>>>>>>>> for >>>>>>>>>>> loops? Do I have that right? >>>>>>>>>> >>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>> dimension, >>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> A lot quicker...! 
>>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a >>>>>>>>>>>>> 2D >>>>>>>>>>>>> array, >>>>>>>>>>>>> but >>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>> reality >>>>>>>>>>>>> the >>>>>>>>>>>>> arrays >>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>> solution >>>>>>>>>>>>> as >>>>>>>>>>>>> well >>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>> difficult >>>>>>>>>>>>> to >>>>>>>>>>>>> get >>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>>> that >>>>>>>>>>>>> one >>>>>>>>>>>>> could >>>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>>> >>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>> >>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>> >>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>> >>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>> >>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>> >>>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>>> >>>>>>>>>>>> I think this should do it >>>>>>>>>>>> >>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>> 0] >>>>>>>>>>>> >>>>>>>>>>>> Josef >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> View this message in context: >>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> View this message in context: >>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 28 16:25:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 16:25:59 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711444.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> Message-ID: On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: > > ok - something like this then...but how would i get the index for the month > for the data array (where month is 0, 1, 2, 4 ... 11)? > > data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0] you would still need to start at the right month data[month,:] = array[xrange(month, numyears * nummonths, nummonths),VAR,:,0] or data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] an alternative would be a reshape with an extra month dimension and then sum only once over the year axis. this might be faster but trickier to get the correct reshape . Josef > > and would that be quicker than making an array months... > > months = np.arange(numyears * nummonths) > > and you that instead like you suggested x[start:end:12,:]? > > Many thanks again... 
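On the quicker-or-not question quoted just above: a strided slice such as x[month::12, :] gives a view onto the original data, while indexing with an integer array like np.arange(month, numyears * nummonths, nummonths) always makes a copy, so the slice is generally the cheaper of the two. A tiny sketch of the difference (the array here is just an invented example):

import numpy as np

x = np.arange(24.0).reshape(12, 2)

v = x[0::12, :]                   # slicing -> a view that shares memory with x
v[0, 0] = -1.0
assert x[0, 0] == -1.0            # the write shows up in x

f = x[np.arange(0, 12, 12), :]    # an index array -> an independent copy
f[0, 0] = 99.0
assert x[0, 0] == -1.0            # x is unchanged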
> > > josef.pktd wrote: >> >> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >>> >>> Ok thanks...I'll take a look. >>> >>> Back to my loops issue. What if instead this time I wanted to take an >>> average so every march in 11 years, is there a quicker way to go about >>> doing >>> that than my current method? >>> >>> nummonths = 12 >>> numyears = 11 >>> >>> for month in xrange(nummonths): >>> ? ?for i in xrange(numpts): >>> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >>> ? ? ? ? ? ?data[month, i] += array[ym, VAR, land_pts_index[i], 0] >> >> >> x[start:end:12,:] gives you every 12th row of an array x >> >> something like this should work to get rid of the inner loop, or you >> could directly put >> range(month, numyears * nummonths, nummonths) into the array instead >> of ym and sum() >> >> Josef >> >> >>> >>> so for each point in the array for a given month i am jumping through and >>> getting the next years month and so on, summing it. >>> >>> Thanks... >>> >>> >>> josef.pktd wrote: >>>> >>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>>> >>>>> Could you possibly if you have time explain further your comment re the >>>>> p-values, your suggesting I am misusing them? >>>> >>>> Depends on your use and interpretation >>>> >>>> test statistics, p-values are random variables, if you look at several >>>> tests at the same time, some p-values will be large just by chance. >>>> If, for example you just look at the largest test statistic, then the >>>> distribution for the max of several test statistics is not the same as >>>> the distribution for a single test statistic >>>> >>>> http://en.wikipedia.org/wiki/Multiple_comparisons >>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>>> >>>> we also just had a related discussion for ANOVA post-hoc tests on the >>>> pystatsmodels group. >>>> >>>> Josef >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>>>>> >>>>>>> Sounds like I am stuck with the loop as I need to do the comparison >>>>>>> for >>>>>>> each >>>>>>> pixel of the world and then I have a basemap function call which I >>>>>>> guess >>>>>>> slows it down further...hmm >>>>>> >>>>>> I don't see much that could be done differently, after a brief look. >>>>>> >>>>>> stats.pearsonr could be replaced by an array version using directly >>>>>> the formula for correlation even with nans. wilcoxon looks slow, and I >>>>>> never tried or seen a faster version. >>>>>> >>>>>> just a reminder, the p-values are for a single test, when you have >>>>>> many of them, then they don't have the right size/confidence level for >>>>>> an overall or joint test. (some packages report a Bonferroni >>>>>> correction in this case) >>>>>> >>>>>> Josef >>>>>> >>>>>> >>>>>>> >>>>>>> i.e. >>>>>>> >>>>>>> def compareSnowData(jules_var): >>>>>>> ? ?# Extract the 11 years of snow data and return >>>>>>> ? ?outrows = 180 >>>>>>> ? ?outcols = 360 >>>>>>> ? ?numyears = 11 >>>>>>> ? ?nummonths = 12 >>>>>>> >>>>>>> ? ?# Read various files >>>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>>> >>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>> numcols=1, >>>>>>> \ >>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>>> ? 
?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>> numcols=1, >>>>>>> \ >>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>> >>>>>>> ? ?# grab some space >>>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>>> dtype=np.float32) >>>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>>> dtype=np.float32) >>>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>> np.nan >>>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>> np.nan >>>>>>> >>>>>>> ? ?# extract the data >>>>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>>>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>>> ? ?# ? ? ? ?else: >>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>>> ? ?# ? ? ? ?else: >>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>>> >>>>>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>>>>> else >>>>>>> we >>>>>>> ? ?# can't do the correlations correctly!! >>>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>>> >>>>>>> ? ?# put data on a regular grid... >>>>>>> ? ?print 'regridding landpts...' >>>>>>> ? ?for i in xrange(numpts): >>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>>> func >>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>> >>>>>>> ? ? ? ?# r^2 >>>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years >>>>>>> of >>>>>>> data >>>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>>> >>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>> ? ? ? ?d = x - y >>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>> non-zero >>>>>>> differences >>>>>>> ? ? ? ?count = len(d) >>>>>>> ? ? ? ?if count > 10: >>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>> np.mean(x - y) >>>>>>> >>>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>>> (the >>>>>>>>> world in >>>>>>>>> this case). Which again I had am doing with a loop (note numpts is >>>>>>>>> a >>>>>>>>> lot >>>>>>>>> bigger than my example above). >>>>>>>>> >>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>>>> np.nan >>>>>>>>> for i in xrange(numpts): >>>>>>>>> ? ? ? 
?# exclude the NaN, note masking them doesn't work in the >>>>>>>>> stats >>>>>>>>> func >>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>> >>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>> ? ? ? ?d = x - y >>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>> non-zero >>>>>>>>> differences >>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>>> np.mean(x - y) >>>>>>>>> >>>>>>>>> Now I think I can push the data in one move into the >>>>>>>>> wilcoxStats_snow >>>>>>>>> array >>>>>>>>> by removing the index, >>>>>>>>> but I can't see how I will get the individual x and y pts for each >>>>>>>>> array >>>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>>> course >>>>>>>>> doesn't work! >>>>>>>>> >>>>>>>>> x = data1_snow[:,:] >>>>>>>>> x = x[np.isfinite(x)] >>>>>>>>> y = data2_snow[:,:] >>>>>>>>> y = y[np.isfinite(y)] >>>>>>>>> >>>>>>>>> # r^2 >>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>>> data >>>>>>>>> if len(x) and len(y) > 50: >>>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>>>> y)[0])**2 >>>>>>>> >>>>>>>> >>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>>> arrays >>>>>>>> at a time (if I read the help correctly). >>>>>>>> >>>>>>>> Also the presence of nans might force the use a loop. stats.mstats >>>>>>>> has >>>>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>>> masked array versions still have to loop in many cases.) >>>>>>>> >>>>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>>>> calculated then it might be worth to use only array operations up to >>>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>>> not >>>>>>>> worth thinking too hard about this. >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> mdekauwe wrote: >>>>>>>>>> >>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>>> work. >>>>>>>>>> >>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>>> there >>>>>>>>>> a >>>>>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>>>>> 4dims >>>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>>> >>>>>>>> There were several discussions on the mailing lists (fancy slicing >>>>>>>> and >>>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>>> shapes, you can look up the details. >>>>>>>> when in doubt, I use np.arange(...) >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> josef.pktd wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks that works... 
>>>>>>>>>>>> >>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>>>>> the >>>>>>>>>>>> step >>>>>>>>>>>> I >>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>>>>> the >>>>>>>>>>>> two >>>>>>>>>>>> for >>>>>>>>>>>> loops? Do I have that right? >>>>>>>>>>> >>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>>> dimension, >>>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>>>> >>>>>>>>>>> Josef >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> A lot quicker...! >>>>>>>>>>>> >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a >>>>>>>>>>>>>> 2D >>>>>>>>>>>>>> array, >>>>>>>>>>>>>> but >>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>>> reality >>>>>>>>>>>>>> the >>>>>>>>>>>>>> arrays >>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>>> solution >>>>>>>>>>>>>> as >>>>>>>>>>>>>> well >>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>>> difficult >>>>>>>>>>>>>> to >>>>>>>>>>>>>> get >>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>>>> that >>>>>>>>>>>>>> one >>>>>>>>>>>>>> could >>>>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>>>> >>>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>>> >>>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>>> >>>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>>> >>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>>> >>>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>>> >>>>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>>>> >>>>>>>>>>>>> I think this should do it >>>>>>>>>>>>> >>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>>> 0] >>>>>>>>>>>>> >>>>>>>>>>>>> Josef >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> View this message in context: >>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. 
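The reshape alternative mentioned in the reply above can be sketched as follows; the shapes are invented for the example, and it is only correct if the time axis really is laid out as whole years of consecutive months, which is presumably the "trickier" part referred to:

import numpy as np

nummonths, numyears, numpts = 12, 11, 5
VAR = 3                                  # invented variable index
array = np.random.random((numyears * nummonths, 26, numpts, 1))

# pull out one variable -> (numyears * nummonths, numpts), split the time axis
# into (year, month) and sum over the years in a single operation
series = array[:, VAR, :, 0]
monthly_total = series.reshape(numyears, nummonths, numpts).sum(axis=0)

# same answer as stepping through the months with a strided slice
check = np.array([series[m::nummonths, :].sum(axis=0) for m in range(nummonths)])
assert np.allclose(monthly_total, check)

This removes the month loop as well; dividing monthly_total by numyears gives the eleven-year mean for each calendar month.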
>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Fri May 28 16:28:12 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 13:28:12 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> Message-ID: <28711581.post@talk.nabble.com> OK so I just need to have a quick loop across the 12 months then, that is fine, just thought there might have been a sneaky way! Really appreciated, getting there slowly! josef.pktd wrote: > > On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: >> >> ok - something like this then...but how would i get the index for the >> month >> for the data array (where month is 0, 1, 2, 4 ... 11)? >> >> data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0] > > you would still need to start at the right month > data[month,:] = array[xrange(month, numyears * nummonths, > nummonths),VAR,:,0] > or > data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] > > an alternative would be a reshape with an extra month dimension and > then sum only once over the year axis. this might be faster but > trickier to get the correct reshape . > > Josef > >> >> and would that be quicker than making an array months... >> >> months = np.arange(numyears * nummonths) >> >> and you that instead like you suggested x[start:end:12,:]? >> >> Many thanks again... >> >> >> josef.pktd wrote: >>> >>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >>>> >>>> Ok thanks...I'll take a look. >>>> >>>> Back to my loops issue. What if instead this time I wanted to take an >>>> average so every march in 11 years, is there a quicker way to go about >>>> doing >>>> that than my current method? >>>> >>>> nummonths = 12 >>>> numyears = 11 >>>> >>>> for month in xrange(nummonths): >>>> ? ?for i in xrange(numpts): >>>> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >>>> ? ? ? ? ? ?data[month, i] += array[ym, VAR, land_pts_index[i], 0] >>> >>> >>> x[start:end:12,:] gives you every 12th row of an array x >>> >>> something like this should work to get rid of the inner loop, or you >>> could directly put >>> range(month, numyears * nummonths, nummonths) into the array instead >>> of ym and sum() >>> >>> Josef >>> >>> >>>> >>>> so for each point in the array for a given month i am jumping through >>>> and >>>> getting the next years month and so on, summing it. >>>> >>>> Thanks... >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>>>> >>>>>> Could you possibly if you have time explain further your comment re >>>>>> the >>>>>> p-values, your suggesting I am misusing them? 
>>>>> >>>>> Depends on your use and interpretation >>>>> >>>>> test statistics, p-values are random variables, if you look at several >>>>> tests at the same time, some p-values will be large just by chance. >>>>> If, for example you just look at the largest test statistic, then the >>>>> distribution for the max of several test statistics is not the same as >>>>> the distribution for a single test statistic >>>>> >>>>> http://en.wikipedia.org/wiki/Multiple_comparisons >>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>>>> >>>>> we also just had a related discussion for ANOVA post-hoc tests on the >>>>> pystatsmodels group. >>>>> >>>>> Josef >>>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe >>>>>>> wrote: >>>>>>>> >>>>>>>> Sounds like I am stuck with the loop as I need to do the comparison >>>>>>>> for >>>>>>>> each >>>>>>>> pixel of the world and then I have a basemap function call which I >>>>>>>> guess >>>>>>>> slows it down further...hmm >>>>>>> >>>>>>> I don't see much that could be done differently, after a brief look. >>>>>>> >>>>>>> stats.pearsonr could be replaced by an array version using directly >>>>>>> the formula for correlation even with nans. wilcoxon looks slow, and >>>>>>> I >>>>>>> never tried or seen a faster version. >>>>>>> >>>>>>> just a reminder, the p-values are for a single test, when you have >>>>>>> many of them, then they don't have the right size/confidence level >>>>>>> for >>>>>>> an overall or joint test. (some packages report a Bonferroni >>>>>>> correction in this case) >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> i.e. >>>>>>>> >>>>>>>> def compareSnowData(jules_var): >>>>>>>> ? ?# Extract the 11 years of snow data and return >>>>>>>> ? ?outrows = 180 >>>>>>>> ? ?outcols = 360 >>>>>>>> ? ?numyears = 11 >>>>>>>> ? ?nummonths = 12 >>>>>>>> >>>>>>>> ? ?# Read various files >>>>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>>>> >>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>> numcols=1, >>>>>>>> \ >>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>> numcols=1, >>>>>>>> \ >>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>> >>>>>>>> ? ?# grab some space >>>>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>> dtype=np.float32) >>>>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>> dtype=np.float32) >>>>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>>> np.nan >>>>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>> * >>>>>>>> np.nan >>>>>>>> >>>>>>>> ? ?# extract the data >>>>>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>>>> ? ?# ? ? ? ?data1 = >>>>>>>> jules_data1[month,jules_var,land_pts_index[i],0] >>>>>>>> ? ?# ? ? ? 
?data2 = >>>>>>>> jules_data2[month,jules_var,land_pts_index[i],0] >>>>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>>>> ? ?# ? ? ? ?else: >>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>>>> ? ?# ? ? ? ?else: >>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>>>> >>>>>>>> ? ?# exclude any months from *both* arrays where we have dodgy >>>>>>>> data, >>>>>>>> else >>>>>>>> we >>>>>>>> ? ?# can't do the correlations correctly!! >>>>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>>>> >>>>>>>> ? ?# put data on a regular grid... >>>>>>>> ? ?print 'regridding landpts...' >>>>>>>> ? ?for i in xrange(numpts): >>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>> stats >>>>>>>> func >>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>> >>>>>>>> ? ? ? ?# r^2 >>>>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 >>>>>>>> years >>>>>>>> of >>>>>>>> data >>>>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>>>> >>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>> ? ? ? ?d = x - y >>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>> non-zero >>>>>>>> differences >>>>>>>> ? ? ? ?count = len(d) >>>>>>>> ? ? ? ?if count > 10: >>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>> np.mean(x - y) >>>>>>>> >>>>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>>>> >>>>>>>> >>>>>>>> josef.pktd wrote: >>>>>>>>> >>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>>>> (the >>>>>>>>>> world in >>>>>>>>>> this case). Which again I had am doing with a loop (note numpts >>>>>>>>>> is >>>>>>>>>> a >>>>>>>>>> lot >>>>>>>>>> bigger than my example above). >>>>>>>>>> >>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>>>> * >>>>>>>>>> np.nan >>>>>>>>>> for i in xrange(numpts): >>>>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>>>> stats >>>>>>>>>> func >>>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>>> >>>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>>> ? ? ? ?d = x - y >>>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>>> non-zero >>>>>>>>>> differences >>>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>>> ? ? ? ? ? ? ? 
?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] >>>>>>>>>> = >>>>>>>>>> np.mean(x - y) >>>>>>>>>> >>>>>>>>>> Now I think I can push the data in one move into the >>>>>>>>>> wilcoxStats_snow >>>>>>>>>> array >>>>>>>>>> by removing the index, >>>>>>>>>> but I can't see how I will get the individual x and y pts for >>>>>>>>>> each >>>>>>>>>> array >>>>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>>>> course >>>>>>>>>> doesn't work! >>>>>>>>>> >>>>>>>>>> x = data1_snow[:,:] >>>>>>>>>> x = x[np.isfinite(x)] >>>>>>>>>> y = data2_snow[:,:] >>>>>>>>>> y = y[np.isfinite(y)] >>>>>>>>>> >>>>>>>>>> # r^2 >>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>>>> data >>>>>>>>>> if len(x) and len(y) > 50: >>>>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>>>>> y)[0])**2 >>>>>>>>> >>>>>>>>> >>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then >>>>>>>>> you >>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>>>> arrays >>>>>>>>> at a time (if I read the help correctly). >>>>>>>>> >>>>>>>>> Also the presence of nans might force the use a loop. stats.mstats >>>>>>>>> has >>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. >>>>>>>>> (Even >>>>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>>>> masked array versions still have to loop in many cases.) >>>>>>>>> >>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>>>>> calculated then it might be worth to use only array operations up >>>>>>>>> to >>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>>>> not >>>>>>>>> worth thinking too hard about this. >>>>>>>>> >>>>>>>>> Josef >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> mdekauwe wrote: >>>>>>>>>>> >>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>>>> work. >>>>>>>>>>> >>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>>>> there >>>>>>>>>>> a >>>>>>>>>>> link you can recommend I should read? Does that mean given I >>>>>>>>>>> have >>>>>>>>>>> 4dims >>>>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>>>> >>>>>>>>> There were several discussions on the mailing lists (fancy slicing >>>>>>>>> and >>>>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>>>> shapes, you can look up the details. >>>>>>>>> when in doubt, I use np.arange(...) >>>>>>>>> >>>>>>>>> Josef >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks that works... >>>>>>>>>>>>> >>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that >>>>>>>>>>>>> was >>>>>>>>>>>>> the >>>>>>>>>>>>> step >>>>>>>>>>>>> I >>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces >>>>>>>>>>>>> the >>>>>>>>>>>>> the >>>>>>>>>>>>> two >>>>>>>>>>>>> for >>>>>>>>>>>>> loops? Do I have that right? >>>>>>>>>>>> >>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>>>> dimension, >>>>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>>>> dimensions can have some surprise switching of axes. 
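A quick illustration of the axis-switching warning quoted just above (a toy example with made-up shapes, not taken from the thread). When two index arrays are separated by a slice, NumPy moves their broadcast dimensions to the front of the result; when they sit next to each other they stay in place:

import numpy as np

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# index arrays on the first and last axes with a slice in between:
# their broadcast shape (3, 5) moves to the front -> (3, 5, 4)
print a[np.arange(3)[:, None], :, np.arange(5)].shape

# adjacent index arrays keep their position -> (3, 4)
print a[:, np.arange(4), np.arange(4)].shape

That reordering is the "surprise switching of axes" referred to above; it does not bite in the tsteps/numpts example from earlier in the thread because only the two index arrays contribute dimensions to the result.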
>>>>>>>>>>>> >>>>>>>>>>>> Josef >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> A lot quicker...! >>>>>>>>>>>>> >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>> 2D >>>>>>>>>>>>>>> array, >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>>>> reality >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> arrays >>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>>>> solution >>>>>>>>>>>>>>> as >>>>>>>>>>>>>>> well >>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>>>> difficult >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> get >>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> one >>>>>>>>>>>>>>> could >>>>>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>>>> >>>>>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this should do it >>>>>>>>>>>>>> >>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>>>> 0] >>>>>>>>>>>>>> >>>>>>>>>>>>>> Josef >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
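A sketch of the month-by-month averaging discussed above without the loop over points. Assumptions not all stated in the thread: array has shape (numyears * nummonths, nvars, numpts, 1) with the time axis ordered year by year, VAR / numyears / nummonths / numpts are defined as in the earlier posts, and negative values are fill values to be excluded as in compareSnowData. Untested against the original data:

import numpy as np

# every 12th timestep, starting at `month`, then average over the years
mean_loop = np.zeros((nummonths, numpts), dtype=np.float32)
for month in xrange(nummonths):
    mean_loop[month, :] = array[month::nummonths, VAR, :, 0].mean(axis=0)

# the reshape alternative mentioned above: make the (year, month) axes
# explicit and average over the year axis in one call
yearly = array[:, VAR, :, 0].reshape(numyears, nummonths, numpts)
mean_reshape = yearly.mean(axis=0)                 # shape (nummonths, numpts)

# per-point means that skip the fill values (< 0) instead of dropping points
valid = yearly >= 0.0
counts = valid.sum(axis=0)                         # valid years per (month, point)
sums = np.where(valid, yearly, 0.0).sum(axis=0)
mean_masked = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

The small loop over the 12 months is cheap; the reshape form simply avoids it. Whether the masked version matches the intent depends on whether a point with no valid years should come out as NaN, which is what is assumed here.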
From vanforeest at gmail.com Fri May 28 16:28:24 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 28 May 2010 22:28:24 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: Hi, Nice to see the issue to be taken up again. >> Discrete distributions on the real line don't *have* a pdf... > > Well, they *have* one; they just can't be implemented in floating point. :-) A distribution function can be decomposed in a part that can be represented by a pdf (absolute continuous), and a part that can be represented by a pmf (jumps), and some extra stuff (Cantor like functions) that we can safely neglect from a numerical point of view. (The discussion above is resolved in any book on measure theory, and covered by the Lebesgue decomposition theorem, for the interested...) I don't know how to resolve the name problem about pdf and pmf. I must admit I find it quite disturbing, since I also make these typo's, but I don't know how to resolve this neatly. >>> snip pdf(x), cdf(x) with x float would need to know whether x is a support point, but which might not be equal to the actual point because of floating point problems. So, the direct translation of rv_discrete doesn't work, and it looks like at least pdf needs to be accessible either pointwise for queries or using known support points for actual calculations. >>> About representing floats in a hashtable, this is indeed hard to resolve. However, for the particular purpose of defining a random variable with support on a finite set of reals, it might suffice to represent these reals by fractions, for instance, \pi \approx 22/7 (I realize better approximations exist.), and then store 22 and 7 separately. Then generalize rv_discrete such that it accepts tuples like (22, 7, 1.) with dtype (int, int, float). >>> No fun, and EDA dropped. >>> EDA dropped? I don't know what EDA means. I hope it does not have severe consequences. Nicky From mdekauwe at gmail.com Fri May 28 16:42:44 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 13:42:44 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711581.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> <28711581.post@talk.nabble.com> Message-ID: <28711708.post@talk.nabble.com> In my original attempt I was only averaging values greater than 0.0, would this be the work around, I wonder if it is a bit clumsy...? for month in xrange(nummonths): temp[:] = array[month:numyears * nummonths:nummonths,VAR,:,0] temp = temp[temp>0.0] data[month, :] = np.mean(temp[:]) mdekauwe wrote: > > OK so I just need to have a quick loop across the 12 months then, that is > fine, just thought there might have been a sneaky way! > > Really appreciated, getting there slowly! > > > > josef.pktd wrote: >> >> On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: >>> >>> ok - something like this then...but how would i get the index for the >>> month >>> for the data array (where month is 0, 1, 2, 4 ... 11)? 
>>> >>> data[month,:] = array[xrange(0, numyears * nummonths, >>> nummonths),VAR,:,0] >> >> you would still need to start at the right month >> data[month,:] = array[xrange(month, numyears * nummonths, >> nummonths),VAR,:,0] >> or >> data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] >> >> an alternative would be a reshape with an extra month dimension and >> then sum only once over the year axis. this might be faster but >> trickier to get the correct reshape . >> >> Josef >> >>> >>> and would that be quicker than making an array months... >>> >>> months = np.arange(numyears * nummonths) >>> >>> and you that instead like you suggested x[start:end:12,:]? >>> >>> Many thanks again... >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >>>>> >>>>> Ok thanks...I'll take a look. >>>>> >>>>> Back to my loops issue. What if instead this time I wanted to take an >>>>> average so every march in 11 years, is there a quicker way to go about >>>>> doing >>>>> that than my current method? >>>>> >>>>> nummonths = 12 >>>>> numyears = 11 >>>>> >>>>> for month in xrange(nummonths): >>>>> ? ?for i in xrange(numpts): >>>>> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >>>>> ? ? ? ? ? ?data[month, i] += array[ym, VAR, land_pts_index[i], 0] >>>> >>>> >>>> x[start:end:12,:] gives you every 12th row of an array x >>>> >>>> something like this should work to get rid of the inner loop, or you >>>> could directly put >>>> range(month, numyears * nummonths, nummonths) into the array instead >>>> of ym and sum() >>>> >>>> Josef >>>> >>>> >>>>> >>>>> so for each point in the array for a given month i am jumping through >>>>> and >>>>> getting the next years month and so on, summing it. >>>>> >>>>> Thanks... >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>>>>> >>>>>>> Could you possibly if you have time explain further your comment re >>>>>>> the >>>>>>> p-values, your suggesting I am misusing them? >>>>>> >>>>>> Depends on your use and interpretation >>>>>> >>>>>> test statistics, p-values are random variables, if you look at >>>>>> several >>>>>> tests at the same time, some p-values will be large just by chance. >>>>>> If, for example you just look at the largest test statistic, then the >>>>>> distribution for the max of several test statistics is not the same >>>>>> as >>>>>> the distribution for a single test statistic >>>>>> >>>>>> http://en.wikipedia.org/wiki/Multiple_comparisons >>>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>>>>> >>>>>> we also just had a related discussion for ANOVA post-hoc tests on the >>>>>> pystatsmodels group. >>>>>> >>>>>> Josef >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Sounds like I am stuck with the loop as I need to do the >>>>>>>>> comparison >>>>>>>>> for >>>>>>>>> each >>>>>>>>> pixel of the world and then I have a basemap function call which I >>>>>>>>> guess >>>>>>>>> slows it down further...hmm >>>>>>>> >>>>>>>> I don't see much that could be done differently, after a brief >>>>>>>> look. >>>>>>>> >>>>>>>> stats.pearsonr could be replaced by an array version using directly >>>>>>>> the formula for correlation even with nans. wilcoxon looks slow, >>>>>>>> and I >>>>>>>> never tried or seen a faster version. 
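A sketch of the "array version" of pearsonr mentioned just above, applying the computational formula for r column by column. It assumes data1_snow and data2_snow are (ntime, numpts) arrays in which missing months have already been set to NaN in both arrays, as in the posted function; columns with too few valid months should still be screened afterwards (e.g. the len > 50 check). Not benchmarked against stats.pearsonr here:

import numpy as np

def pearsonr_columns(x, y):
    # columnwise Pearson r, ignoring rows that are NaN in either array
    valid = np.isfinite(x) & np.isfinite(y)
    n = valid.sum(axis=0).astype(np.float64)
    xv = np.where(valid, x, 0.0)
    yv = np.where(valid, y, 0.0)
    sx, sy = xv.sum(axis=0), yv.sum(axis=0)
    cov = (xv * yv).sum(axis=0) - sx * sy / n
    varx = (xv * xv).sum(axis=0) - sx * sx / n
    vary = (yv * yv).sum(axis=0) - sy * sy / n
    # columns with n == 0 or zero variance come out as nan/inf; screen them
    return cov / np.sqrt(varx * vary)

# r2_map = pearsonr_columns(data1_snow, data2_snow) ** 2   # one r^2 per point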
>>>>>>>> >>>>>>>> just a reminder, the p-values are for a single test, when you have >>>>>>>> many of them, then they don't have the right size/confidence level >>>>>>>> for >>>>>>>> an overall or joint test. (some packages report a Bonferroni >>>>>>>> correction in this case) >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> i.e. >>>>>>>>> >>>>>>>>> def compareSnowData(jules_var): >>>>>>>>> ? ?# Extract the 11 years of snow data and return >>>>>>>>> ? ?outrows = 180 >>>>>>>>> ? ?outcols = 360 >>>>>>>>> ? ?numyears = 11 >>>>>>>>> ? ?nummonths = 12 >>>>>>>>> >>>>>>>>> ? ?# Read various files >>>>>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>>>>> >>>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>>> numcols=1, >>>>>>>>> \ >>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>>> numcols=1, >>>>>>>>> \ >>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>>> >>>>>>>>> ? ?# grab some space >>>>>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>>> dtype=np.float32) >>>>>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>>> dtype=np.float32) >>>>>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>>> * >>>>>>>>> np.nan >>>>>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), >>>>>>>>> dtype=np.float32) * >>>>>>>>> np.nan >>>>>>>>> >>>>>>>>> ? ?# extract the data >>>>>>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>>>>> ? ?# ? ? ? ?data1 = >>>>>>>>> jules_data1[month,jules_var,land_pts_index[i],0] >>>>>>>>> ? ?# ? ? ? ?data2 = >>>>>>>>> jules_data2[month,jules_var,land_pts_index[i],0] >>>>>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>>>>> ? ?# ? ? ? ?else: >>>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>>>>> ? ?# ? ? ? ?else: >>>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>>>>> >>>>>>>>> ? ?# exclude any months from *both* arrays where we have dodgy >>>>>>>>> data, >>>>>>>>> else >>>>>>>>> we >>>>>>>>> ? ?# can't do the correlations correctly!! >>>>>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>>>>> >>>>>>>>> ? ?# put data on a regular grid... >>>>>>>>> ? ?print 'regridding landpts...' >>>>>>>>> ? ?for i in xrange(numpts): >>>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>>> stats >>>>>>>>> func >>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>> >>>>>>>>> ? ? ? ?# r^2 >>>>>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 >>>>>>>>> years >>>>>>>>> of >>>>>>>>> data >>>>>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>>>>> ? ? ? 
? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>>>>> >>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>> ? ? ? ?d = x - y >>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>> non-zero >>>>>>>>> differences >>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>>> np.mean(x - y) >>>>>>>>> >>>>>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>>>>> >>>>>>>>> >>>>>>>>> josef.pktd wrote: >>>>>>>>>> >>>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>>>>> (the >>>>>>>>>>> world in >>>>>>>>>>> this case). Which again I had am doing with a loop (note numpts >>>>>>>>>>> is >>>>>>>>>>> a >>>>>>>>>>> lot >>>>>>>>>>> bigger than my example above). >>>>>>>>>>> >>>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>>>>> * >>>>>>>>>>> np.nan >>>>>>>>>>> for i in xrange(numpts): >>>>>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>>>>> stats >>>>>>>>>>> func >>>>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>>>> >>>>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>>>> ? ? ? ?d = x - y >>>>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>>>> non-zero >>>>>>>>>>> differences >>>>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] >>>>>>>>>>> = >>>>>>>>>>> np.mean(x - y) >>>>>>>>>>> >>>>>>>>>>> Now I think I can push the data in one move into the >>>>>>>>>>> wilcoxStats_snow >>>>>>>>>>> array >>>>>>>>>>> by removing the index, >>>>>>>>>>> but I can't see how I will get the individual x and y pts for >>>>>>>>>>> each >>>>>>>>>>> array >>>>>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>>>>> course >>>>>>>>>>> doesn't work! >>>>>>>>>>> >>>>>>>>>>> x = data1_snow[:,:] >>>>>>>>>>> x = x[np.isfinite(x)] >>>>>>>>>>> y = data2_snow[:,:] >>>>>>>>>>> y = y[np.isfinite(y)] >>>>>>>>>>> >>>>>>>>>>> # r^2 >>>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>>>>> data >>>>>>>>>>> if len(x) and len(y) > 50: >>>>>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = >>>>>>>>>>> (stats.pearsonr(x, >>>>>>>>>>> y)[0])**2 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then >>>>>>>>>> you >>>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>>>>> arrays >>>>>>>>>> at a time (if I read the help correctly). >>>>>>>>>> >>>>>>>>>> Also the presence of nans might force the use a loop. >>>>>>>>>> stats.mstats >>>>>>>>>> has >>>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. 
>>>>>>>>>> (Even >>>>>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>>>>> masked array versions still have to loop in many cases.) >>>>>>>>>> >>>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is >>>>>>>>>> not >>>>>>>>>> calculated then it might be worth to use only array operations up >>>>>>>>>> to >>>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>>>>> not >>>>>>>>>> worth thinking too hard about this. >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> thanks. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> mdekauwe wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>>>>> work. >>>>>>>>>>>> >>>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>>>>> there >>>>>>>>>>>> a >>>>>>>>>>>> link you can recommend I should read? Does that mean given I >>>>>>>>>>>> have >>>>>>>>>>>> 4dims >>>>>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>>>>> >>>>>>>>>> There were several discussions on the mailing lists (fancy >>>>>>>>>> slicing >>>>>>>>>> and >>>>>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>>>>> shapes, you can look up the details. >>>>>>>>>> when in doubt, I use np.arange(...) >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks that works... >>>>>>>>>>>>>> >>>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that >>>>>>>>>>>>>> was >>>>>>>>>>>>>> the >>>>>>>>>>>>>> step >>>>>>>>>>>>>> I >>>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces >>>>>>>>>>>>>> the >>>>>>>>>>>>>> the >>>>>>>>>>>>>> two >>>>>>>>>>>>>> for >>>>>>>>>>>>>> loops? Do I have that right? >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>>>>> dimension, >>>>>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>>>>>> >>>>>>>>>>>>> Josef >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> A lot quicker...! >>>>>>>>>>>>>> >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>> 2D >>>>>>>>>>>>>>>> array, >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>>>>> reality >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> arrays >>>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>>>>> solution >>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>> well >>>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>>>>> difficult >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> get >>>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I >>>>>>>>>>>>>>>> get >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>> precompute the indices's i and j i.e. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The index arrays need to be broadcastable against each >>>>>>>>>>>>>>> other. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think this should do it >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>>>>> 0] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Josef >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at >>>>>>>>>>>>>>>> Nabble.com. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> View this message in context: >>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711708.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 28 16:48:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 16:48:16 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 4:28 PM, nicky van foreest wrote: > Hi, > > Nice to see the issue to be taken up again. > >>> Discrete distributions on the real line don't *have* a pdf... >> >> Well, they *have* one; they just can't be implemented in floating point. :-) > > A distribution function can be decomposed in a part that can be > represented by a pdf (absolute continuous), and a part that can be > represented by a pmf (jumps), and some extra stuff (Cantor like > functions) that we can safely neglect from a numerical point of view. > (The discussion above is resolved in any book on measure theory, and > covered by the Lebesgue decomposition theorem, for the interested...) > > I don't know how to resolve the name problem about pdf and pmf. I must > admit I find it quite disturbing, since I also make these typo's, but > I don't know how to resolve this neatly. > >>>> snip > pdf(x), cdf(x) ?with x float would need to know whether x is a support > point, but which might not be equal to the actual point because of > floating point problems. 
> So, the direct translation of rv_discrete doesn't work, and it looks > like at least pdf needs to be accessible either pointwise for queries > or using known support points for actual calculations. >>>> > About representing floats in a hashtable, this is indeed hard to > resolve. However, for the particular purpose of defining a random > variable with support on a finite set of reals, it might suffice to > represent these reals by fractions, for instance, \pi \approx 22/7 (I > realize better approximations exist.), and then store 22 and 7 > separately. Then generalize rv_discrete such that it accepts tuples > like (22, 7, 1.) with dtype (int, int, float). What is the float in this? how do you find which fractions to use? I don't want to restrict necessarily to finite number of points, but countable, e.g. what's the distribution of sqrt(x) where x is Poisson (just made up). I still need to think about this, I thought the cheapest might be approx_equal rounding, or searchsorted for the finite case. But I think the direct access for a specific x won't be a big usecase, because the calculations for expectation, cdf or other calculations can loop over the array of support points. That's why I was thinking about dual access to pmf. > >>>> > No fun, and EDA dropped. >>>> > EDA dropped? I don't know what EDA means. I hope it does not have > severe consequences. today is my lucky day with typos, how about ETA http://en.wikipedia.org/wiki/Estimated_time_of_arrival Josef http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ben.root at ou.edu Fri May 28 17:49:28 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 28 May 2010 16:49:28 -0500 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711581.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> <28711581.post@talk.nabble.com> Message-ID: If you want an average for each month from your timeseries, then the sneaky way would be to reshape your array so that the time dimension is split into two (month, year) dimensions. For a 1-D array, this would be: > dataarray = numpy.mod(numpy.arange(36), 12) > print dataarray array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) > datamatrix = dataarray.reshape((-1, 12)) > print datamatrix array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) Hope that helps. Ben Root On Fri, May 28, 2010 at 3:28 PM, mdekauwe wrote: > > OK so I just need to have a quick loop across the 12 months then, that is > fine, just thought there might have been a sneaky way! > > Really appreciated, getting there slowly! > > > > josef.pktd wrote: > > > > On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: > >> > >> ok - something like this then...but how would i get the index for the > >> month > >> for the data array (where month is 0, 1, 2, 4 ... 11)? 
> >> > >> data[month,:] = array[xrange(0, numyears * nummonths, > nummonths),VAR,:,0] > > > > you would still need to start at the right month > > data[month,:] = array[xrange(month, numyears * nummonths, > > nummonths),VAR,:,0] > > or > > data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] > > > > an alternative would be a reshape with an extra month dimension and > > then sum only once over the year axis. this might be faster but > > trickier to get the correct reshape . > > > > Josef > > > >> > >> and would that be quicker than making an array months... > >> > >> months = np.arange(numyears * nummonths) > >> > >> and you that instead like you suggested x[start:end:12,:]? > >> > >> Many thanks again... > >> > >> > >> josef.pktd wrote: > >>> > >>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: > >>>> > >>>> Ok thanks...I'll take a look. > >>>> > >>>> Back to my loops issue. What if instead this time I wanted to take an > >>>> average so every march in 11 years, is there a quicker way to go about > >>>> doing > >>>> that than my current method? > >>>> > >>>> nummonths = 12 > >>>> numyears = 11 > >>>> > >>>> for month in xrange(nummonths): > >>>> for i in xrange(numpts): > >>>> for ym in xrange(month, numyears * nummonths, nummonths): > >>>> data[month, i] += array[ym, VAR, land_pts_index[i], 0] > >>> > >>> > >>> x[start:end:12,:] gives you every 12th row of an array x > >>> > >>> something like this should work to get rid of the inner loop, or you > >>> could directly put > >>> range(month, numyears * nummonths, nummonths) into the array instead > >>> of ym and sum() > >>> > >>> Josef > >>> > >>> > >>>> > >>>> so for each point in the array for a given month i am jumping through > >>>> and > >>>> getting the next years month and so on, summing it. > >>>> > >>>> Thanks... > >>>> > >>>> > >>>> josef.pktd wrote: > >>>>> > >>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe > wrote: > >>>>>> > >>>>>> Could you possibly if you have time explain further your comment re > >>>>>> the > >>>>>> p-values, your suggesting I am misusing them? > >>>>> > >>>>> Depends on your use and interpretation > >>>>> > >>>>> test statistics, p-values are random variables, if you look at > several > >>>>> tests at the same time, some p-values will be large just by chance. > >>>>> If, for example you just look at the largest test statistic, then the > >>>>> distribution for the max of several test statistics is not the same > as > >>>>> the distribution for a single test statistic > >>>>> > >>>>> http://en.wikipedia.org/wiki/Multiple_comparisons > >>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm > >>>>> > >>>>> we also just had a related discussion for ANOVA post-hoc tests on the > >>>>> pystatsmodels group. > >>>>> > >>>>> Josef > >>>>>> > >>>>>> Thanks. > >>>>>> > >>>>>> > >>>>>> josef.pktd wrote: > >>>>>>> > >>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Sounds like I am stuck with the loop as I need to do the > comparison > >>>>>>>> for > >>>>>>>> each > >>>>>>>> pixel of the world and then I have a basemap function call which I > >>>>>>>> guess > >>>>>>>> slows it down further...hmm > >>>>>>> > >>>>>>> I don't see much that could be done differently, after a brief > look. > >>>>>>> > >>>>>>> stats.pearsonr could be replaced by an array version using directly > >>>>>>> the formula for correlation even with nans. wilcoxon looks slow, > and > >>>>>>> I > >>>>>>> never tried or seen a faster version. 
> >>>>>>> > >>>>>>> just a reminder, the p-values are for a single test, when you have > >>>>>>> many of them, then they don't have the right size/confidence level > >>>>>>> for > >>>>>>> an overall or joint test. (some packages report a Bonferroni > >>>>>>> correction in this case) > >>>>>>> > >>>>>>> Josef > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> i.e. > >>>>>>>> > >>>>>>>> def compareSnowData(jules_var): > >>>>>>>> # Extract the 11 years of snow data and return > >>>>>>>> outrows = 180 > >>>>>>>> outcols = 360 > >>>>>>>> numyears = 11 > >>>>>>>> nummonths = 12 > >>>>>>>> > >>>>>>>> # Read various files > >>>>>>>> fname="world_valid_jules_pts.ascii" > >>>>>>>> (numpts, land_pts_index, latitude, longitude, rows, cols) = > >>>>>>>> jo.read_land_points_ascii(fname, 1.0) > >>>>>>>> > >>>>>>>> fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" > >>>>>>>> jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, > >>>>>>>> numcols=1, > >>>>>>>> \ > >>>>>>>> timesteps=132, numvars=26) > >>>>>>>> fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" > >>>>>>>> jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, > >>>>>>>> numcols=1, > >>>>>>>> \ > >>>>>>>> timesteps=132, numvars=26) > >>>>>>>> > >>>>>>>> # grab some space > >>>>>>>> data1_snow = np.zeros((nummonths * numyears, numpts), > >>>>>>>> dtype=np.float32) > >>>>>>>> data2_snow = np.zeros((nummonths * numyears, numpts), > >>>>>>>> dtype=np.float32) > >>>>>>>> pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) > * > >>>>>>>> np.nan > >>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), > dtype=np.float32) > >>>>>>>> * > >>>>>>>> np.nan > >>>>>>>> > >>>>>>>> # extract the data > >>>>>>>> data1_snow = jules_data1[:,jules_var,:,0] > >>>>>>>> data2_snow = jules_data2[:,jules_var,:,0] > >>>>>>>> data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) > >>>>>>>> data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) > >>>>>>>> #for month in xrange(numyears * nummonths): > >>>>>>>> # for i in xrange(numpts): > >>>>>>>> # data1 = > >>>>>>>> jules_data1[month,jules_var,land_pts_index[i],0] > >>>>>>>> # data2 = > >>>>>>>> jules_data2[month,jules_var,land_pts_index[i],0] > >>>>>>>> # if data1 >= 0.0: > >>>>>>>> # data1_snow[month,i] = data1 > >>>>>>>> # else: > >>>>>>>> # data1_snow[month,i] = np.nan > >>>>>>>> # if data2 > 0.0: > >>>>>>>> # data2_snow[month,i] = data2 > >>>>>>>> # else: > >>>>>>>> # data2_snow[month,i] = np.nan > >>>>>>>> > >>>>>>>> # exclude any months from *both* arrays where we have dodgy > >>>>>>>> data, > >>>>>>>> else > >>>>>>>> we > >>>>>>>> # can't do the correlations correctly!! > >>>>>>>> data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) > >>>>>>>> data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) > >>>>>>>> > >>>>>>>> # put data on a regular grid... > >>>>>>>> print 'regridding landpts...' > >>>>>>>> for i in xrange(numpts): > >>>>>>>> # exclude the NaN, note masking them doesn't work in the > >>>>>>>> stats > >>>>>>>> func > >>>>>>>> x = data1_snow[:,i] > >>>>>>>> x = x[np.isfinite(x)] > >>>>>>>> y = data2_snow[:,i] > >>>>>>>> y = y[np.isfinite(y)] > >>>>>>>> > >>>>>>>> # r^2 > >>>>>>>> # exclude v.small arrays, i.e. 
we need just less over 4 > >>>>>>>> years > >>>>>>>> of > >>>>>>>> data > >>>>>>>> if len(x) and len(y) > 50: > >>>>>>>> pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > >>>>>>>> (stats.pearsonr(x, y)[0])**2 > >>>>>>>> > >>>>>>>> # wilcox signed rank test > >>>>>>>> # make sure we have enough samples to do the test > >>>>>>>> d = x - y > >>>>>>>> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all > >>>>>>>> non-zero > >>>>>>>> differences > >>>>>>>> count = len(d) > >>>>>>>> if count > 10: > >>>>>>>> z, pval = stats.wilcoxon(x, y) > >>>>>>>> # only map out sign different data > >>>>>>>> if pval < 0.05: > >>>>>>>> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > >>>>>>>> np.mean(x - y) > >>>>>>>> > >>>>>>>> return (pearsonsr_snow, wilcoxStats_snow) > >>>>>>>> > >>>>>>>> > >>>>>>>> josef.pktd wrote: > >>>>>>>>> > >>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Also I then need to remap the 2D array I make onto another grid > >>>>>>>>>> (the > >>>>>>>>>> world in > >>>>>>>>>> this case). Which again I had am doing with a loop (note numpts > >>>>>>>>>> is > >>>>>>>>>> a > >>>>>>>>>> lot > >>>>>>>>>> bigger than my example above). > >>>>>>>>>> > >>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) > >>>>>>>>>> * > >>>>>>>>>> np.nan > >>>>>>>>>> for i in xrange(numpts): > >>>>>>>>>> # exclude the NaN, note masking them doesn't work in the > >>>>>>>>>> stats > >>>>>>>>>> func > >>>>>>>>>> x = data1_snow[:,i] > >>>>>>>>>> x = x[np.isfinite(x)] > >>>>>>>>>> y = data2_snow[:,i] > >>>>>>>>>> y = y[np.isfinite(y)] > >>>>>>>>>> > >>>>>>>>>> # wilcox signed rank test > >>>>>>>>>> # make sure we have enough samples to do the test > >>>>>>>>>> d = x - y > >>>>>>>>>> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all > >>>>>>>>>> non-zero > >>>>>>>>>> differences > >>>>>>>>>> count = len(d) > >>>>>>>>>> if count > 10: > >>>>>>>>>> z, pval = stats.wilcoxon(x, y) > >>>>>>>>>> # only map out sign different data > >>>>>>>>>> if pval < 0.05: > >>>>>>>>>> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] > >>>>>>>>>> = > >>>>>>>>>> np.mean(x - y) > >>>>>>>>>> > >>>>>>>>>> Now I think I can push the data in one move into the > >>>>>>>>>> wilcoxStats_snow > >>>>>>>>>> array > >>>>>>>>>> by removing the index, > >>>>>>>>>> but I can't see how I will get the individual x and y pts for > >>>>>>>>>> each > >>>>>>>>>> array > >>>>>>>>>> member correctly without the loop, this was my attempt which of > >>>>>>>>>> course > >>>>>>>>>> doesn't work! > >>>>>>>>>> > >>>>>>>>>> x = data1_snow[:,:] > >>>>>>>>>> x = x[np.isfinite(x)] > >>>>>>>>>> y = data2_snow[:,:] > >>>>>>>>>> y = y[np.isfinite(y)] > >>>>>>>>>> > >>>>>>>>>> # r^2 > >>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of > >>>>>>>>>> data > >>>>>>>>>> if len(x) and len(y) > 50: > >>>>>>>>>> pearsonsr_snow[((180-1)-(rows-1)),cols-1] = > (stats.pearsonr(x, > >>>>>>>>>> y)[0])**2 > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then > >>>>>>>>> you > >>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d > >>>>>>>>> arrays > >>>>>>>>> at a time (if I read the help correctly). > >>>>>>>>> > >>>>>>>>> Also the presence of nans might force the use a loop. > stats.mstats > >>>>>>>>> has > >>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. 
> >>>>>>>>> (Even > >>>>>>>>> when vectorized operations would work with regular arrays, nan or > >>>>>>>>> masked array versions still have to loop in many cases.) > >>>>>>>>> > >>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is > not > >>>>>>>>> calculated then it might be worth to use only array operations up > >>>>>>>>> to > >>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's > >>>>>>>>> not > >>>>>>>>> worth thinking too hard about this. > >>>>>>>>> > >>>>>>>>> Josef > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> thanks. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> mdekauwe wrote: > >>>>>>>>>>> > >>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods > >>>>>>>>>>> work. > >>>>>>>>>>> > >>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is > >>>>>>>>>>> there > >>>>>>>>>>> a > >>>>>>>>>>> link you can recommend I should read? Does that mean given I > >>>>>>>>>>> have > >>>>>>>>>>> 4dims > >>>>>>>>>>> that Josef's suggestion would be more advised in this case? > >>>>>>>>> > >>>>>>>>> There were several discussions on the mailing lists (fancy > slicing > >>>>>>>>> and > >>>>>>>>> indexing). Your case is safe, but if you run in future into funny > >>>>>>>>> shapes, you can look up the details. > >>>>>>>>> when in doubt, I use np.arange(...) > >>>>>>>>> > >>>>>>>>> Josef > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Thanks. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> josef.pktd wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe < > mdekauwe at gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks that works... > >>>>>>>>>>>>> > >>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that > >>>>>>>>>>>>> was > >>>>>>>>>>>>> the > >>>>>>>>>>>>> step > >>>>>>>>>>>>> I > >>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces > >>>>>>>>>>>>> the > >>>>>>>>>>>>> the > >>>>>>>>>>>>> two > >>>>>>>>>>>>> for > >>>>>>>>>>>>> loops? Do I have that right? > >>>>>>>>>>>> > >>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a > >>>>>>>>>>>> dimension, > >>>>>>>>>>>> then you can use slicing. It might be faster. > >>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more > >>>>>>>>>>>> dimensions can have some surprise switching of axes. > >>>>>>>>>>>> > >>>>>>>>>>>> Josef > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> A lot quicker...! > >>>>>>>>>>>>> > >>>>>>>>>>>>> Martin > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> josef.pktd wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in > >>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>> 2D > >>>>>>>>>>>>>>> array, > >>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in > >>>>>>>>>>>>>>> reality > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> arrays > >>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the > >>>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>> well > >>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite > >>>>>>>>>>>>>>> difficult > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>> get > >>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). 
I > get > >>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>> one > >>>>>>>>>>>>>>> could > >>>>>>>>>>>>>>> precompute the indices's i and j i.e. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> i = np.arange(tsteps) > >>>>>>>>>>>>>>> j = np.arange(numpts) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> but just can't get my head round how i then use them... > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Martin > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> import numpy as np > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> numpts=10 > >>>>>>>>>>>>>>> tsteps = 12 > >>>>>>>>>>>>>>> vari = 22 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) > >>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) > >>>>>>>>>>>>>>> index = np.arange(numpts) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> for i in xrange(tsteps): > >>>>>>>>>>>>>>> for j in xrange(numpts): > >>>>>>>>>>>>>>> new_data[i,j] = data[i,5,index[j],0] > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The index arrays need to be broadcastable against each > other. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think this should do it > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, > >>>>>>>>>>>>>> np.arange(numpts), > >>>>>>>>>>>>>> 0] > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Josef > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>> View this message in context: > >>>>>>>>>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html > >>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at > Nabble.com. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -- > >>>>>>>>>>>>> View this message in context: > >>>>>>>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html > >>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>>>>>>>>>>> > >>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>>> > >>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> View this message in context: > >>>>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html > >>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
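For completeness, a self-contained version of the broadcast-indexing answer given above, checked against both the original double loop and plain slicing; the sizes are the small made-up ones from the example.

import numpy as np

numpts = 10
tsteps = 12
vari = 22

data = np.random.random((tsteps, vari, numpts, 1))

# the double loop from the original post
new_data_loop = np.zeros((tsteps, numpts), dtype=np.float32)
for i in range(tsteps):
    for j in range(numpts):
        new_data_loop[i, j] = data[i, 5, j, 0]

# broadcasting version: a (tsteps, 1) column of row indices against a
# (numpts,) row of column indices gives a (tsteps, numpts) result
new_data_fancy = data[np.arange(tsteps)[:, None], 5, np.arange(numpts), 0]

# plain slicing also works here, because the full range of both axes is taken
new_data_slice = data[:, 5, :, 0]

assert np.allclose(new_data_loop, new_data_fancy)
assert np.allclose(new_data_fancy, new_data_slice)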
> >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> SciPy-User mailing list > >>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> SciPy-User mailing list > >>>>>>>>> SciPy-User at scipy.org > >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> View this message in context: > >>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html > >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> SciPy-User mailing list > >>>>>>>> SciPy-User at scipy.org > >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> SciPy-User mailing list > >>>>>>> SciPy-User at scipy.org > >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> -- > >>>>>> View this message in context: > >>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html > >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>>>> > >>>>>> _______________________________________________ > >>>>>> SciPy-User mailing list > >>>>>> SciPy-User at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> > >>>>> _______________________________________________ > >>>>> SciPy-User mailing list > >>>>> SciPy-User at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>>> > >>>> > >>>> -- > >>>> View this message in context: > >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html > >>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>> > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >>> > >> > >> -- > >> View this message in context: > >> http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html > >> Sent from the Scipy-User mailing list archive at Nabble.com. > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > -- > View this message in context: > http://old.nabble.com/removing-for-loops...-tp28633477p28711581.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Fri May 28 18:05:48 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 29 May 2010 00:05:48 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: >> like (22, 7, 1.) with dtype (int, int, float). > > What is the float in this? 
The float is intended to refer to the probability mass at the atom. > how do you find which fractions to use? In part this is trivial, e.g., 0.05 = 5/100. A division by the greatest common denominator is assumed (and can be implemented in the background.). Another way would be to use a continued fraction approximation for a given float. There exist (as far as i know) very fast recursive algorithms to compute continued fractions, and it is known that in some sense these fractions are the most efficient to approximate reals. > > I don't want to restrict necessarily to finite number of points, but > countable, e.g. what's the distribution of sqrt(x) where x is Poisson > (just made up). Sure, but numerically this cannot be a problem. At the risk of being mathematically pedantic, but since the range of the the distribution function is bounded (in fact, it is [0,1]) the number of jumps is at most countable. However, even if the number of atoms is countable, most (that is, nearly all) of these atoms cannot be seen by the computer, as these atoms are `too small'. The largest number of atoms that can be seen is roughly 10e-16 (assuming floats, rather than doubles). I cannot image any distribution functions based on empirical data that contains this amount of atoms. > I still need to think about this, I thought the cheapest might be > approx_equal rounding I did not know of this function. , or searchsorted I suppose this is much slower than using fractions in hash tables. > But I think the direct access for a specific x won't be a big usecase, > because the calculations for expectation, cdf or other calculations > can loop over the array of support points. That's why I was thinking > about dual access to pmf. I don't follow you here. > today is my lucky day with typos, how about ETA > http://en.wikipedia.org/wiki/Estimated_time_of_arrival My wife is complaining about my ETA :-) its bed time here. bye Nicky From christophermarkstrickland at gmail.com Fri May 28 21:03:40 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 11:03:40 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:15 AM, wrote: > > It would need a new method for each distribution, e.g. _loglike, _logpdf > So, this is work, and for some distributions the log wouldn't simplify > much. > > I am not sure what you mean the log wouldn't simply much. > I proposed this once together with other improvements (but without > response). > > This is a little disappointing, it significantly reduces how useful the library is. In actual fact I have not been able to use a single function for anything other than testing (although, I have been using numpy.random for random numbers, this scipy.stats collection seems far more complete). This would dramatically change if a log version of the distribution were available. I think in most cases this would be a straightforward addition at least for the pdf. > The second useful method for estimation would be _fitstart, which > provides distribution specific starting values for fit, e.g. a moment > estimator, or a simple rules of thumb > http://projects.scipy.org/scipy/ticket/808 > > > Here are some of my currently planned enhancements to the distributions: > > > http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py > > but I just checked, it looks like I forgot to copy the _loglike method > that I started from my experimental scripts. 
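A small illustration of the numerical point being made in this thread, using the exponential distribution with arbitrary numbers: once the pdf has underflowed, taking the log afterwards cannot recover the value, whereas the analytic log-density is unproblematic.

import numpy as np
from scipy import stats

lam = 2.0       # scale parameter
x = 2000.0      # far out in the tail

pdf_val = stats.expon.pdf(x, scale=lam)
print(pdf_val)                   # 0.0, since exp(-1000)/2 underflows in double precision
print(np.log(pdf_val))           # -inf (with a warning)
print(-np.log(lam) - x / lam)    # -1000.69..., the analytic log-density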
> > For a few distributions, where this is possible, it would also be > useful to add the gradient with respect to the parameters, (or even > the Hessian). But this is currently mostly just an idea, since we need > some analytical gradients in the estimation of stats models. > > This certainly would be nice as well. > > > > > If there is not is it possible for me to suggest that this feature is > added. > > There is such an excellent range of distributions, each with such an > > impressive range of options, it seems ashame to have to mostly manually > code > > up the log of pdfs and often call the log of CDFs from R. > > So far I only thought about log pdf, because I wanted it for Maximum > Likelihood estimation. > > It is also necessary for MCMC. > Do you have a rough idea for which distributions log cdf would work? > that is, for which distribution is an analytical or efficient > numerical expression possible. > Not sure off the top of my head as I mainly require the only the pdf. I was, however, doing a little survival analysis the other day though and it was required. The log of the survival and hazard functions would be nice also. So far I have only required the exponential (analytical), weibull (analytical), normal (numerical) and powernormal (analytical function of the log of the normal cdf). I just had a peak at the R source code for pnorm (R's code for the normal cdf). The function is not big and also licensed under the GNU public licence. I assume it could be fairly easily ported to scipy. > > I also think that scipy.stats.distributions could be one of the best > (broadest, consistent) collection of univariate distributions that I > have seen so far, once we fill in some missing pieces. > > As a way forward, I think we could make the distributions into a > numerical encyclopedia by adding private methods to those > distributions where it makes sense, like log pdf, log cdf and I also > started to add characteristic functions to some distributions in my > experimental scripts. > If you have a collection of logpdf, logcdf, we could add a trac ticket for > this. > I could fairly easy whip up a collection of functions to compute the logpdf for a large number of distributions. Not sure about the CDFs but I can look into it as well. The pdf's are definitely far more urgent for my own work. I am a bit busy at work though for the next three weeks so it would have to be after that. > > However, this would miss the generic broadcasting part of the public > functions, pdf, cdf,... but for estimation I wouldn't necessarily call > those because of the overhead. > > > I'm working on and off on this, so it's moving only slowly (and my > wishlist is big). > (for example, I was reading up on extreme value distributions in > actuarial science and hydrology to get a better overview over the > estimators.) > > > So, I really love to hear any ideas, feedback, and see contributions > to improving the distributions. > > Josef > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 22:53:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 22:53:37 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 9:03 PM, Chris Strickland wrote: > > > On Sat, May 29, 2010 at 12:15 AM, wrote: >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify >> much. 
>> > I am not sure what you mean the log wouldn't simply much. > >> >> I proposed this once together with other improvements (but without >> response). >> > This is a little disappointing, it significantly reduces how useful the > library is. In actual fact I have not been able to use a single function for > anything other than testing (although, I have been using numpy.random for > random numbers, this scipy.stats collection seems far more complete). This > would dramatically change if a log version of the distribution were > available. I think in most cases this would be a straightforward addition at > least for the pdf. I don't think for many use cases log(stats.t.pdf) or many other distributions the performance and accuracy hit would be large enough to make it useless. At least, I haven't seen any other comments in this direction. On of the main use cases for me of stats.distributions are all the statistical test distributions, t, F, chi2 and so on. Howver, in statsmodels we have a mixture of calls to the pdf/cdf of stats.distributions and reimplementations of loglikelhood functions, where the scipy version is also just used for testing. > > >> >> The second useful method for estimation would be _fitstart, which >> provides distribution specific starting values for fit, e.g. a moment >> estimator, or a simple rules of thumb >> http://projects.scipy.org/scipy/ticket/808 >> >> >> Here are some of my currently planned enhancements to the distributions: >> >> >> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py >> >> but I just checked, it looks like I forgot to copy the _loglike method >> that I started from my experimental scripts. >> >> For a few distributions, where this is possible, it would also be >> useful to add the gradient with respect to the parameters, (or even >> the Hessian). But this is currently mostly just an idea, since we need >> some analytical gradients in the estimation of stats models. >> > This certainly would be nice as well. >> >> > >> > If there is not is it possible for me to suggest that this feature is >> > added. >> > There is such an excellent range of distributions, each with such an >> > impressive range of options, it seems ashame to have to mostly manually >> > code >> > up the log of pdfs and often call the log of CDFs from R. >> >> So far I only thought about log pdf, because I wanted it for Maximum >> Likelihood estimation. >> > It is also necessary for MCMC. pymc has many distributions with loglike in fortran for speed, but for most distributions only loglike and rvs are defined, if I remember correctly. > >> >> Do you have a rough idea for which distributions log cdf would work? >> that is, for which distribution is an analytical or efficient >> numerical expression possible. > > Not sure off the top of my head as I mainly require the only the pdf. I was, > however, doing a little survival analysis the other day though and it was > required. The log of the survival and hazard functions would be nice also. > So far I have only required the exponential (analytical), weibull > (analytical), normal (numerical) and powernormal (analytical function of the > log of the normal cdf). I just had a peak at the R source code for pnorm > (R's code for the normal cdf). The function is not big and also licensed > under the GNU public licence. I assume it could be fairly easily ported to > scipy. R's license, GPL, is incompatible with the license of scipy, BSD. 
While they are allowed to look at our code, code that goes into scipy cannot be based on GPL licensed code. If never seen it mentioned before that there is a direct function for log(norm.cdf). Which functions and packages in R implement the logarithm of the cdf of these distributions? The cdf for several distributions (including normal) is implement in Fortran or C in scipy.special, and I've never seen a log version for them. >> >> I also think that scipy.stats.distributions could be one of the best >> (broadest, consistent) collection of univariate distributions that I >> have seen so far, once we fill in some missing pieces. >> >> As a way forward, I think we could make the distributions into a >> numerical encyclopedia by adding private methods to those >> distributions where it makes sense, like log pdf, log cdf and I also >> started to add characteristic functions to some distributions in my >> experimental scripts. >> If you have a collection of logpdf, logcdf, we could add a trac ticket for >> this. > > I could fairly easy whip up a collection of functions to compute the logpdf > for a large number of distributions. Not sure about the CDFs but I can look > into it as well. The pdf's are definitely far more urgent for my own work. I > am a bit busy at work though for the next three weeks so it would have to be > after that. I looked at some of the distributions, and logpdf could be more efficiently calculated in many of them and very often also logcdf I opened a ticket for this http://projects.scipy.org/scipy/ticket/1184 I also saw that there are still smaller, numerical improvements possible in several distributions. Thanks, Josef >> >> However, this would miss the generic broadcasting part of the public >> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >> those because of the overhead. >> >> >> I'm working on and off on this, so it's moving only slowly (and my >> wishlist is big). >> (for example, I was reading up on extreme value distributions in >> actuarial science and hydrology to get a better overview over the >> estimators.) >> >> >> So, I really love to hear any ideas, feedback, and see contributions >> to improving the distributions. >> >> Josef >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From njs at pobox.com Fri May 28 23:24:21 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 28 May 2010 20:24:21 -0700 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 7:53 PM, wrote: > I don't think for many use cases log(stats.t.pdf) or many other > distributions the performance and accuracy hit would be large enough > to make it useless. At least, I haven't seen any other comments in > this direction. "Useless" is a value judgement, of course, but it doesn't seem *too* far off to me either. I myself basically always find myself wanting log-space values, and even if you're just doing statistical tests, numerical precision in the tails can become very practically relevant when doing multiple hypothesis correction. > R's license, GPL, is incompatible with the license of scipy, BSD. > While they are allowed to look at our code, code that goes into scipy > cannot be based on GPL licensed code. You mean, they're allowed to copy our code, and we're allowed to look at their code for reference but can't use it directly :-). 
> If never seen it mentioned before that there is a direct function for > log(norm.cdf). Which functions and packages in R implement the > logarithm of the cdf of these distributions? > > The cdf for several distributions (including normal) is implement in > Fortran or C in scipy.special, and I've never seen a log version for > them. Yet R does in fact use specialized code for computing the log-cdf for the normal distribution... at least over some parts of its range. I'm not sure how much difference it makes or anything, I'm just reporting on the existence of 'if' statements in the source :-). See the base R distribution, src/nmath/pnorm.c (which also contains references). -- Nathaniel From christophermarkstrickland at gmail.com Fri May 28 23:34:50 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 13:34:50 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:53 PM, wrote: > > I don't think for many use cases log(stats.t.pdf) or many other > distributions the performance and accuracy hit would be large enough > to make it useless. At least, I haven't seen any other comments in > this direction. > > On of the main use cases for me of stats.distributions are all the > statistical test distributions, t, F, chi2 and so on. Howver, in > statsmodels we have a mixture of calls to the pdf/cdf of > stats.distributions and reimplementations of loglikelhood functions, > where the scipy version is also just used for testing. > > The main use for me is in specifying (log) prior distributions, (log) posterior distributions and log-likelihood functions. There is simply no way around using the log pdf in the vast majority of cases in MCMC analysis. Whilst it is trivial for me to simply write functions when I need them it would obviously benefit the statistical community as a whole if the option was available in the excellent set of distributions that are available as a part of Scipy. R's license, GPL, is incompatible with the license of scipy, BSD. > While they are allowed to look at our code, code that goes into scipy > cannot be based on GPL licensed code. > > Fair enough. Still at least for the normal cdf we could simply use the references in the R code to write a Scipy version. > If never seen it mentioned before that there is a direct function for > log(norm.cdf). Which functions and packages in R implement the > logarithm of the cdf of these distributions? > pnorm it is in the stats package for the log of the normal CDF. Kind of essential for distributions like the powernormal as well that use the normal cdf as a part of their pdf. > > The cdf for several distributions (including normal) is implement in > Fortran or C in scipy.special, and I've never seen a log version for > them. > > I looked at some of the distributions, and logpdf could be more > efficiently calculated in many of them and very often also logcdf > > I opened a ticket for this > http://projects.scipy.org/scipy/ticket/1184 > > I also saw that there are still smaller, numerical improvements > possible in several distributions. > > Thanks, > > Josef > > ______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sat May 29 00:00:07 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:00:07 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 7:53 PM, ? wrote: >> I don't think for many use cases log(stats.t.pdf) or many other >> distributions the performance and accuracy hit would be large enough >> to make it useless. At least, I haven't seen any other comments in >> this direction. > > "Useless" is a value judgement, of course, but it doesn't seem *too* > far off to me either. I myself basically always find myself wanting > log-space values, and even if you're just doing statistical tests, > numerical precision in the tails can become very practically relevant > when doing multiple hypothesis correction. > >> R's license, GPL, is incompatible with the license of scipy, BSD. >> While they are allowed to look at our code, code that goes into scipy >> cannot be based on GPL licensed code. > > You mean, they're allowed to copy our code, and we're allowed to look > at their code for reference but can't use it directly :-). We are allowed to look at their manuals but not their code. (Life ain't fair.) > >> If never seen it mentioned before that there is a direct function for >> log(norm.cdf). Which functions and packages in R implement the >> logarithm of the cdf of these distributions? >> >> The cdf for several distributions (including normal) is implement in >> Fortran or C in scipy.special, and I've never seen a log version for >> them. > > Yet R does in fact use specialized code for computing the log-cdf for > the normal distribution... at least over some parts of its range. I'm > not sure how much difference it makes or anything, I'm just reporting > on the existence of 'if' statements in the source :-). See the base R > distribution, src/nmath/pnorm.c (which also contains references). pnorm is the cdf not the log of the cdf, that's what I thought, but I just saw that they have a "log.p" option. from the R manual: """ For pnorm, based on Cody, W. D. (1993) Algorithm 715: SPECFUN ? A portable FORTRAN package of special function routines and test drivers """ this sounds similar to the fortran or c code that scipy.special has. I never tried to read that one, except for the doc comments. 
accuracy doesn't seem to be a problem np.log(stats.norm.cdf(np.linspace(-20,20,21))) - [r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)] array([ -2.84217094e-14, -2.84217094e-14, -2.84217094e-14, 0.00000000e+00, -1.42108547e-14, -7.10542736e-15, -7.10542736e-15, -3.55271368e-15, -1.77635684e-15, -4.44089210e-16, 0.00000000e+00, 0.00000000e+00, -5.42101086e-20, -5.53867815e-17, -4.40377573e-17, 7.61985302e-24, 1.77648211e-33, 7.79353682e-45, 6.38875440e-58, 9.74094892e-73, 2.75362412e-89]) except the small numbers in the tail look much better in R >>> np.log(stats.norm.cdf(np.linspace(-20,20,21))) array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02, -1.01563034e+02, -7.54106730e+01, -5.32312852e+01, -3.50134372e+01, -2.07367689e+01, -1.03601015e+01, -3.78318433e+00, -6.93147181e-01, -2.30129093e-02, -3.16717434e-05, -9.86587701e-10, -6.66133815e-16, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> np.array([r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)]) array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02, -1.01563034e+02, -7.54106730e+01, -5.32312852e+01, -3.50134372e+01, -2.07367689e+01, -1.03601015e+01, -3.78318433e+00, -6.93147181e-01, -2.30129093e-02, -3.16717434e-05, -9.86587646e-10, -6.22096057e-16, -7.61985302e-24, -1.77648211e-33, -7.79353682e-45, -6.38875440e-58, -9.74094892e-73, -2.75362412e-89]) except if we use a branch cut >>> np.log1p(-stats.norm.sf(np.linspace(-20,20,21))) array([ -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -3.49450411e+01, -2.07367689e+01, -1.03601015e+01, -3.78318433e+00, -6.93147181e-01, -2.30129093e-02, -3.16717434e-05, -9.86587646e-10, -6.22096057e-16, -7.61985302e-24, -1.77648211e-33, -7.79353682e-45, -6.38875440e-58, -9.74094892e-73, -2.75362412e-89]) I have no idea about speed. Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Sat May 29 00:15:49 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:15:49 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 11:34 PM, Chris Strickland wrote: > > > On Sat, May 29, 2010 at 12:53 PM, wrote: >> >> I don't think for many use cases log(stats.t.pdf) or many other >> distributions the performance and accuracy hit would be large enough >> to make it useless. At least, I haven't seen any other comments in >> this direction. >> >> On of the main use cases for me of stats.distributions are all the >> statistical test distributions, t, F, chi2 and so on. Howver, in >> statsmodels we have a mixture of calls to the pdf/cdf of >> stats.distributions and reimplementations of loglikelhood functions, >> where the scipy version is also just used for testing. >> > The main use for me is in specifying (log) prior distributions, (log) > posterior distributions and log-likelihood functions. There is simply no way > around using the log pdf in the vast majority of cases in MCMC analysis. > Whilst it is trivial for me to simply write functions when I need them it > would obviously benefit the statistical community as a whole if the option > was available in the excellent set of distributions that are available as a > part of Scipy. I agree that it would be very good to have this generally available, and I will appreciate it for maximum likelihood. 
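To see the same tail behaviour without calling out to R: for large negative x the standard asymptotic expansion Phi(x) ~ phi(x)/|x| gives a finite and reasonably accurate value of log Phi(x) even after the cdf itself has underflowed. This is only a first-order sketch with made-up evaluation points; a dedicated logcdf implementation would be more accurate.

import numpy as np
from scipy import stats

x = np.array([-10.0, -20.0, -40.0])

# naive route: the cdf underflows to 0.0 below roughly x = -38, so the log becomes -inf
naive = np.log(stats.norm.cdf(x))

# first-order asymptotic: log Phi(x) ~ -x**2/2 - 0.5*log(2*pi) - log(|x|) as x -> -inf
asym = -0.5 * x**2 - 0.5 * np.log(2 * np.pi) - np.log(np.abs(x))

print(naive)   # approx [ -53.23, -203.92, -inf ]
print(asym)    # approx [ -53.22, -203.91, -804.61 ]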
For MCMC (where I know only little about the details), it might, however, always be faster to work with dedicated code as in pymc. > >> >> R's license, GPL, is incompatible with the license of scipy, BSD. >> While they are allowed to look at our code, code that goes into scipy >> cannot be based on GPL licensed code. >> > Fair enough. Still at least for the normal cdf we could simply use the > references in the R code to write a Scipy version. If it's the C or Fortran implementation, then it is out of my competence, I'm a pure scripting language person. Another idea for this would be to see if any of the pymc code for this would fit into scipy. Since I leave Fortran to others, I never looked at it. I think if we get the easier cases, logpdf and logcdf that don't require compiled versions, we would be able to cover already a considerable range of the distributions. However, I also agree now, having norm.logcdf would also be useful for many other distributions. > >> >> If never seen it mentioned before that there is a direct function for >> log(norm.cdf). Which functions and packages in R implement the >> logarithm of the cdf of these distributions? > > pnorm it is in the stats package for the log of the normal CDF. Kind of > essential for distributions like the powernormal as well that use the normal > cdf as a part of their pdf. see previous message, I never paid enough attention to see the log.p option Josef >> >> The cdf for several distributions (including normal) is implement in >> Fortran or C in scipy.special, and I've never seen a log version for >> them. >> >> I looked at some of the distributions, and logpdf could be more >> efficiently calculated in many of them and very often also logcdf >> >> I opened a ticket for this >> http://projects.scipy.org/scipy/ticket/1184 >> >> I also saw that there are still smaller, numerical improvements >> possible in several distributions. >> >> Thanks, >> >> Josef >> >> ______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Sat May 29 00:20:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:20:51 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:15 AM, wrote: > On Fri, May 28, 2010 at 11:34 PM, Chris Strickland > wrote: >> >> >> On Sat, May 29, 2010 at 12:53 PM, wrote: >>> >>> I don't think for many use cases log(stats.t.pdf) or many other >>> distributions the performance and accuracy hit would be large enough >>> to make it useless. At least, I haven't seen any other comments in >>> this direction. >>> >>> On of the main use cases for me of stats.distributions are all the >>> statistical test distributions, t, F, chi2 and so on. Howver, in >>> statsmodels we have a mixture of calls to the pdf/cdf of >>> stats.distributions and reimplementations of loglikelhood functions, >>> where the scipy version is also just used for testing. >>> >> The main use for me is in specifying (log) prior distributions, (log) >> posterior distributions and log-likelihood functions. There is simply no way >> around using the log pdf in the vast majority of cases in MCMC analysis. 
>> Whilst it is trivial for me to simply write functions when I need them it >> would obviously benefit the statistical community as a whole if the option >> was available in the excellent set of distributions that are available as a >> part of Scipy. > > I agree that it would be very good to have this generally available, > and I will appreciate it for maximum likelihood. > For MCMC (where I know only little about the details), it might, > however, always be faster to work with dedicated code as in pymc. > >> >>> >>> R's license, GPL, is incompatible with the license of scipy, BSD. >>> While they are allowed to look at our code, code that goes into scipy >>> cannot be based on GPL licensed code. >>> >> Fair enough. Still at least for the normal cdf we could simply use the >> references in the R code to write a Scipy version. > > If it's the C or Fortran implementation, then it is out of my > competence, I'm a pure scripting language person. > > Another idea for this would be to see if any of the pymc code for this > would fit into scipy. Since I leave Fortran to others, I never looked > at it. I'm contradicting and confusing myself, I don't think pymc has any cdf code, only pdf. Josef > > I think if we get the easier cases, logpdf and logcdf that don't > require compiled versions, we would be able to cover already a > considerable range of the distributions. > > However, I also agree now, having norm.logcdf would also be useful for > many other distributions. > >> >>> >>> If never seen it mentioned before that there is a direct function for >>> log(norm.cdf). Which functions and packages in R implement the >>> logarithm of the cdf of these distributions? >> >> pnorm it is in the stats package for the log of the normal CDF. Kind of >> essential for distributions like the powernormal as well that use the normal >> cdf as a part of their pdf. > > see previous message, I never paid enough attention to see the log.p option > > Josef > >>> >>> The cdf for several distributions (including normal) is implement in >>> Fortran or C in scipy.special, and I've never seen a log version for >>> them. >>> >>> I looked at some of the distributions, and logpdf could be more >>> efficiently calculated in many of them and very often also logcdf >>> >>> I opened a ticket for this >>> http://projects.scipy.org/scipy/ticket/1184 >>> >>> I also saw that there are still smaller, numerical improvements >>> possible in several distributions. >>> >>> Thanks, >>> >>> Josef >>> >>> ______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From christophermarkstrickland at gmail.com Sat May 29 00:32:48 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 14:32:48 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 2:20 PM, wrote: > > Josef > > > > > I think if we get the easier cases, logpdf and logcdf that don't > > require compiled versions, we would be able to cover already a > > considerable range of the distributions. > > > > However, I also agree now, having norm.logcdf would also be useful for > > many other distributions. 
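As an example of the kind of distribution Chris mentions, where the normal cdf appears inside another pdf: scipy's powernorm has pdf(x, c) = c * phi(x) * Phi(-x)**(c-1), so its log-pdf needs the log of the normal cdf directly. The sketch below assumes a helper for that log; scipy.special.log_ndtr provides one in later SciPy releases, and any accurate substitute would serve.

import numpy as np
from scipy import stats, special

def powernorm_logpdf(x, c):
    # log of c * phi(x) * Phi(-x)**(c-1); the (c-1)*log Phi(-x) term is where a
    # dedicated log-cdf matters, because Phi(-x) underflows for large positive x
    log_phi = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    log_Phi_neg = special.log_ndtr(-x)   # log of the standard normal cdf at -x
    return np.log(c) + log_phi + (c - 1.0) * log_Phi_neg

x = np.array([1.0, 5.0, 40.0])
c = 2.5
print(powernorm_logpdf(x, c))              # finite everywhere
print(np.log(stats.powernorm.pdf(x, c)))   # -inf at x = 40, where phi(x) has underflowed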
> > > I can write in C and Fortran (I prefer Fortran with Python) so I could easily write code ( in around three weeks when my workload reduces) for cases where compiled languages are required. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 29 00:43:04 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:43:04 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:32 AM, Chris Strickland wrote: > > > On Sat, May 29, 2010 at 2:20 PM, wrote: >> >> Josef >> >> > >> > I think if we get the easier cases, logpdf and logcdf that don't >> > require compiled versions, we would be able to cover already a >> > considerable range of the distributions. >> > >> > However, I also agree now, having norm.logcdf would also be useful for >> > many other distributions. >> > > > I? can write in C and Fortran (I prefer Fortran with Python) so I could > easily write code ( in around three weeks when my workload reduces) for > cases where compiled languages are required. I'm busy for another month. Just a warning in case you don't know: scipy is still stuck at fortran 77 because some platforms (e.g. Windows with mingw - which I use) support only g77. I don't know when the upgrade will happen. cython would be an alternative that's easier to maintain. Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From christophermarkstrickland at gmail.com Sat May 29 00:55:26 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 14:55:26 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 2:43 PM, wrote: > On Sat, May 29, 2010 at 12:32 AM, Chris Strickland > wrote: > > > > > > On Sat, May 29, 2010 at 2:20 PM, wrote: > >> > >> Josef > >> > >> > > >> > I think if we get the easier cases, logpdf and logcdf that don't > >> > require compiled versions, we would be able to cover already a > >> > considerable range of the distributions. > >> > > >> > However, I also agree now, having norm.logcdf would also be useful for > >> > many other distributions. > >> > > > > > I can write in C and Fortran (I prefer Fortran with Python) so I could > > easily write code ( in around three weeks when my workload reduces) for > > cases where compiled languages are required. > > I'm busy for another month. > > Just a warning in case you don't know: scipy is still stuck at fortran > 77 because some platforms (e.g. Windows with mingw - which I use) > support only g77. I don't know when the upgrade will happen. > > cython would be an alternative that's easier to maintain. > > Josef > > Fortran77 isn't a problem assuming we can just link our Fortran using f2py. I don't really have any experience linking code manually. Hmm, the g77 compiler has been superseded by the gfortran complier. Does this not work under Windows? > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jsalvati at u.washington.edu Sat May 29 01:15:21 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Fri, 28 May 2010 22:15:21 -0700 Subject: [SciPy-User] log pdf, cdf, etc Message-ID: The package PyMC(http://code.google.com/p/pymc/) contains fortran log likelihood functions for a lot of distributions, but you would have to look at the source code to figure out how to use them since they are meant mostly for internal use. They are not ufuncs but can handle arrays or single values for each parameter. A recent PyMC branch also contains similar log likelihood gradient functions for the same distributions ( http://github.com/pymc-devs/pymc/tree/gradientBranch). Hope that is useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.boumans at gmx.net Sat May 29 00:47:33 2010 From: m.boumans at gmx.net (bowie_22) Date: Sat, 29 May 2010 04:47:33 +0000 (UTC) Subject: [SciPy-User] leastsq interface and features Message-ID: Hello, at the moment am I am evaluating scipy as a subsitute for Matlab. One important use case for me is to fit a model to measured data. In Matlab I use lsqnonlin from the Optimization Toolbox. In Scipy I would use leastsq. By comparing the 2 approaches with a "daily use" point of view I see the following improvements for the scipy module 1) setting the options for the algorithm: ML uses a structure together with optimset optimget --> lsqnonlin has a quite short signature IMPROVEMENT: introduce a common options structure for all optimization algos 2) there is the possibilty to set an output function that is called in each iteration step in ML. That can be used for displaying the current status of the optimization. For me a quite important point as my customers want to "see" what happens (not just throwing measured data to an algorithm and get back a set of numbers) IMPROVEMENT: introduce a output function that can be called each iteration 3) give lower and upper bounds for the optimization variables. Also quite important as in my uses cases you have normally an idea in which range your parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide this knowledge as lower bounds and upper bounds to lsqnonlin. IMPROVEMENT: introduce lower and upper bounds My problem: How can I help to get this improvements to scipy? Is this the correct address to ask? Regs Marcus From josef.pktd at gmail.com Sat May 29 10:46:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 10:46:14 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 1:15 AM, John Salvatier wrote: > The package PyMC(http://code.google.com/p/pymc/) contains fortran log > likelihood functions for a lot of distributions, but you would have to look > at the source code to figure out how to use them since they are meant mostly > for internal use. They are not ufuncs but can handle arrays or single values > for each parameter. A recent PyMC branch also contains similar log > likelihood gradient functions for the same distributions > (http://github.com/pymc-devs/pymc/tree/gradientBranch). Thanks, I will have a look at the gradient branch To get started, I added a test script to the ticked that makes it easier to add and test new methods for lnpdf and lncdf. It's adapted from the scipy tests and tests all distributions that have a _lnpdf or _lncdf method. The new methods can just be added to the script to monkey patch scipy.stats.distributions. 
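For readers who want to try this without the ticket script, a minimal monkey patch of the kind described might look as follows. The expressions are the standard-form log-densities; loc/scale handling, broadcasting and argument checking are all left to the existing machinery, and the private name _logpdf anticipates the naming settled on later in the thread.

import numpy as np
from scipy import stats

def _logpdf_norm(self, x):
    # standard normal: log phi(x)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def _logpdf_expon(self, x):
    # standard exponential: log(exp(-x)) for x >= 0
    return np.where(x >= 0, -x, -np.inf)

# patch the private standard-form methods onto the distribution classes
type(stats.norm)._logpdf = _logpdf_norm
type(stats.expon)._logpdf = _logpdf_expon

# sanity check against the log of the existing pdf where nothing underflows
xs = np.linspace(0.1, 5.0, 7)
assert np.allclose(stats.norm._logpdf(xs), np.log(stats.norm.pdf(xs)))
assert np.allclose(stats.expon._logpdf(xs), np.log(stats.expon.pdf(xs)))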
I monkey patched 13 easy cases so far, mainly to check that the script works. (Still far too go for full coverage of cases where this makes sense.) The tests use nosetests and test for (almost) equality of the new methods with the log of the old methods, and check a simple broadcasting case. Everyone is invited to add new cases, and to report any problems with the script. I hope that helps to get the ball rolling. Josef > > Hope that is useful. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From vincent at vincentdavis.net Sat May 29 11:06:42 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 29 May 2010 09:06:42 -0600 Subject: [SciPy-User] leastsq interface and features In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 10:47 PM, bowie_22 wrote: > Hello, > > at the moment am I am evaluating scipy as a subsitute for Matlab. > One important use case for me is to fit a model to measured data. > > In Matlab I use lsqnonlin from the Optimization Toolbox. > In Scipy I would use leastsq. > > By comparing the 2 approaches with a "daily use" point of view I see the > following improvements for the scipy module > > 1) setting the options for the algorithm: > ML uses a structure together with optimset optimget > --> lsqnonlin has a quite short signature > IMPROVEMENT: introduce a common options structure for all optimization > algos > > 2) there is the possibilty to set an output function that is called in each > iteration step in ML. That can be used for displaying the current status of > the > optimization. For me a quite important point as my customers want to "see" > what > happens (not just throwing measured data to an algorithm and get back a set > of > numbers) > IMPROVEMENT: introduce a output function that can be called each iteration > > 3) give lower and upper bounds for the optimization variables. Also quite > important as in my uses cases you have normally an idea in which range your > parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide > this > knowledge as lower bounds and upper bounds to lsqnonlin. > IMPROVEMENT: introduce lower and upper bounds > > My problem: > How can I help to get this improvements to scipy? Is this the correct > address to > ask? > This is a good place to ask, I am surprised you have not already gotten several responses. I see in the docs that (?leastsq? is a wrapper around MINPACK?s lmdif and lmder algorithms.) You can also file a ticket at scipy, an example would be http://projects.scipy.org/scipy/ticket/808 You can take a look at the source code. http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py That said you might want to look at http://statsmodels.sourceforge.net/ And you could help by contributing :) This is not a part of the project I am real familiar with but should be. I don't have much more in the way of answers, I just don't know them :) Vincent > Regs > > Marcus > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > *Vincent Davis 720-301-3003 * vincent at vincentdavis.net my blog | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sat May 29 11:42:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 11:42:12 -0400 Subject: [SciPy-User] leastsq interface and features In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 11:06 AM, Vincent Davis wrote: > On Fri, May 28, 2010 at 10:47 PM, bowie_22 wrote: > >> Hello, >> >> at the moment am I am evaluating scipy as a subsitute for Matlab. >> One important use case for me is to fit a model to measured data. >> >> In Matlab I use lsqnonlin from the Optimization Toolbox. >> In Scipy I would use leastsq. >> >> By comparing the 2 approaches with a "daily use" point of view I see the >> following improvements for the scipy module >> >> 1) setting the options for the algorithm: >> ML uses a structure together with optimset optimget >> --> lsqnonlin has a quite short signature >> IMPROVEMENT: introduce a common options structure for all optimization >> algos >> >> 2) there is the possibilty to set an output function that is called in >> each >> iteration step in ML. That can be used for displaying the current status >> of the >> optimization. For me a quite important point as my customers want to "see" >> what >> happens (not just throwing measured data to an algorithm and get back a >> set of >> numbers) >> IMPROVEMENT: introduce a output function that can be called each >> iteration >> >> 3) give lower and upper bounds for the optimization variables. Also quite >> important as in my uses cases you have normally an idea in which range >> your >> parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide >> this >> knowledge as lower bounds and upper bounds to lsqnonlin. >> IMPROVEMENT: introduce lower and upper bounds >> >> My problem: >> How can I help to get this improvements to scipy? Is this the correct >> address to >> ask? >> > > This is a good place to ask, I am surprised you have not already gotten > several responses. > > I see in the docs that (?leastsq? is a wrapper around MINPACK?s lmdif and > lmder algorithms.) > > You can also file a ticket at scipy, an example would be > http://projects.scipy.org/scipy/ticket/808 > You can take a look at the source code. > http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py > > That said you might want to look at http://statsmodels.sourceforge.net/ And you could help by contributing :) This is not a part of the project I > am real familiar with but should be. > statsmodels has nothing to offer for this case. A consistent interface to solvers is in openopt. I don't know if leastsq could be extended, but for constraint minimization scipy has other minimizers available. I think many of the other fmins have callbacks and printing Josef > > I don't have much more in the way of answers, I just don't know them :) > > Vincent > > >> Regs >> >> Marcus >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > *Vincent Davis > 720-301-3003 * > vincent at vincentdavis.net > my blog | LinkedIn > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
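A concrete sketch of the workaround for the bounds and progress-reporting requests in this thread: minimise the sum of squared residuals with fmin_l_bfgs_b, which accepts box bounds, and wrap the objective to report progress by hand, since leastsq offers no per-iteration hook. The decay model, the data and the bound values are invented for illustration.

import numpy as np
from scipy import optimize

# invented data for a model y = a * exp(-b * t)
t = np.linspace(0.0, 10.0, 50)
a_true, b_true = 1500.0, 0.3
y = a_true * np.exp(-b_true * t) + np.random.normal(scale=20.0, size=t.shape)

def sum_of_squares(params):
    a, b = params
    resid = y - a * np.exp(-b * t)
    return np.dot(resid, resid)

# a hand-rolled "output function": report every objective evaluation
def reporting_objective(params):
    val = sum_of_squares(params)
    print(params, val)
    return val

# box constraints, e.g. 1200 <= a <= 1800 and 0 <= b <= 1
bounds = [(1200.0, 1800.0), (0.0, 1.0)]
xopt, fval, info = optimize.fmin_l_bfgs_b(reporting_objective, x0=[1300.0, 0.5],
                                          bounds=bounds, approx_grad=True)
print(xopt)   # estimated (a, b)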
URL: From j.reid at mail.cryst.bbk.ac.uk Sat May 29 12:18:07 2010 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Sat, 29 May 2010 17:18:07 +0100 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, May 28, 2010 at 7:29 AM, Chris Strickland > wrote: >> Hi, >> >> When using any of the distributions of scipy.stats there does not seem to be >> the ability (or at least I cannot figure out how) to have the function >> return >> the log of the pdf, cdf, sf, etc. For statistical analysis this is >> essential. >> For instance suppose we are interested in an exponential distribution for a >> random variable x with a hyperparameter lambda there needs to be an option >> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >> calculate log(scipy.stats.expon.pdf(x,lambda)). >> >> Is there a way to do this using the distributions in scipy.stats? > > It would need a new method for each distribution, e.g. _loglike, _logpdf > So, this is work, and for some distributions the log wouldn't simplify much. Presumably it would be easy to add a method to all distributions that called the pdf and took its log. This could be over-riden for those distributions for which a specialised log_pdf implementation was available. This would make the entry cost of providing the functionality lower. From josef.pktd at gmail.com Sat May 29 12:49:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 12:49:06 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:18 PM, John Reid wrote: > josef.pktd at gmail.com wrote: >> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >> wrote: >>> Hi, >>> >>> When using any of the distributions of scipy.stats there does not seem to be >>> the ability (or at least I cannot figure out how) to have the function >>> return >>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>> essential. >>> For instance suppose we are interested in an exponential distribution for a >>> random variable x with a hyperparameter lambda there needs to be an option >>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>> >>> Is there a way to do this using the distributions in scipy.stats? >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify much. > > Presumably it would be easy to add a method to all distributions that > called the pdf and took its log. This could be over-riden for those > distributions for which a specialised log_pdf implementation was > available. This would make the entry cost of providing the functionality > lower. Yes, I haven't thought about it yet for this case, but that's how the system for the current methods works, only _pdf or _cdf is required, all other methods have generic substitutes (which are sometimes very slow.) For testing, not having the generic version is easier, I have to figure out again how to check whether a method was defined in the super or the sub class (instead of using hasattr). a naming question _lnpdf or _logpdf ? _lncdf or _logcdf ? 
Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Sat May 29 13:21:53 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 29 May 2010 13:21:53 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:49 PM, wrote: > a naming question > _lnpdf or _logpdf ? _lncdf or _logcdf ? > My vote would be for log over ln since np.log(np.e) == 1. Skipper From josef.pktd at gmail.com Sat May 29 14:20:19 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 14:20:19 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 1:21 PM, Skipper Seabold wrote: > On Sat, May 29, 2010 at 12:49 PM, ? wrote: >> a naming question >> _lnpdf or _logpdf ? _lncdf or _logcdf ? >> > > My vote would be for log over ln since np.log(np.e) == 1. Yes, I don't know what I was thinking early in the morning, nobody seems to use ln anymore I edited the script rename ln ->log make print optional add generic method replace hasattr by (not sure this is the best way) def isnotgeneric(distfn, methodname): sub = getattr(distfn, methodname) gen = getattr(stats.distributions.rv_continuous, methodname) return not sub.im_func is gen.im_func (generic methods don't pass the tests for all distributions, there are some problems with broadcasting in the current _pdf, _cdf implementations for some distributions) Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Sat May 29 14:29:10 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 May 2010 12:29:10 -0600 Subject: [SciPy-User] leastsq interface and features In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 10:47 PM, bowie_22 wrote: > Hello, > > at the moment am I am evaluating scipy as a subsitute for Matlab. > One important use case for me is to fit a model to measured data. > > In Matlab I use lsqnonlin from the Optimization Toolbox. > In Scipy I would use leastsq. > > By comparing the 2 approaches with a "daily use" point of view I see the > following improvements for the scipy module > > 1) setting the options for the algorithm: > ML uses a structure together with optimset optimget > --> lsqnonlin has a quite short signature > IMPROVEMENT: introduce a common options structure for all optimization > algos > > I hate the options structure. It's a hidden super-huge signature with poorly chosen defaults. > 2) there is the possibilty to set an output function that is called in each > iteration step in ML. That can be used for displaying the current status of > the > optimization. For me a quite important point as my customers want to "see" > what > happens (not just throwing measured data to an algorithm and get back a set > of > numbers) > IMPROVEMENT: introduce a output function that can be called each iteration > > Possible, I think. > 3) give lower and upper bounds for the optimization variables. Also quite > important as in my uses cases you have normally an idea in which range your > parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide > this > knowledge as lower bounds and upper bounds to lsqnonlin. > IMPROVEMENT: introduce lower and upper bounds > > Suggest using a different function for this. 
Matlab tends to over-overload it's functions. > My problem: > How can I help to get this improvements to scipy? Is this the correct > address to > ask? > > Yes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat May 29 15:44:20 2010 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 29 May 2010 12:44:20 -0700 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 9:00 PM, wrote: > On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: >> On Fri, May 28, 2010 at 7:53 PM, ? wrote: >>> R's license, GPL, is incompatible with the license of scipy, BSD. >>> While they are allowed to look at our code, code that goes into scipy >>> cannot be based on GPL licensed code. >> >> You mean, they're allowed to copy our code, and we're allowed to look >> at their code for reference but can't use it directly :-). > > We are allowed to look at their manuals but not their code. > (Life ain't fair.) It sounds like you guys have this well in hand, but just a point here -- you certainly are allowed to look at their code, just not copy the "expressive aspects" of it. (Saying you can't *look* at it because of the license is like saying writers can't read other people's novels!) "Expressive" is a tricky term, of course -- IIUC it's basically anything that could be changed while preserving functionality (because the functionality, the algorithm itself, is not covered by copyright). So, say, variable names certainly count as expressive, decisions about which way to lay out the code, etc. If one wants to be really safe, one can write down a textual description of the algorithm and then ask someone else to translate back to code (the "clean room" method). So you do have to be a bit careful, but when you have code that contains valuable information that isn't really written down anywhere else then I'd say it's worth it. -- Nathaniel From oliphant at enthought.com Sat May 29 16:51:38 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Sat, 29 May 2010 15:51:38 -0500 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: > On Fri, May 28, 2010 at 7:29 AM, Chris Strickland > wrote: >> Hi, >> >> When using any of the distributions of scipy.stats there does not seem to be >> the ability (or at least I cannot figure out how) to have the function >> return >> the log of the pdf, cdf, sf, etc. For statistical analysis this is >> essential. >> For instance suppose we are interested in an exponential distribution for a >> random variable x with a hyperparameter lambda there needs to be an option >> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >> calculate log(scipy.stats.expon.pdf(x,lambda)). >> >> Is there a way to do this using the distributions in scipy.stats? > > It would need a new method for each distribution, e.g. _loglike, _logpdf > So, this is work, and for some distributions the log wouldn't simplify much. > > I proposed this once together with other improvements (but without response). > > The second useful method for estimation would be _fitstart, which > provides distribution specific starting values for fit, e.g. 
a moment > estimator, or a simple rules of thumb > http://projects.scipy.org/scipy/ticket/808 > > > Here are some of my currently planned enhancements to the distributions: > > http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py Hey Josef, I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. Do you have updated code I could look at. These are relatively easy adds that I would like to put in today. Do you have check-in rights to SciPy? Thanks, -Travis > > but I just checked, it looks like I forgot to copy the _loglike method > that I started from my experimental scripts. > > For a few distributions, where this is possible, it would also be > useful to add the gradient with respect to the parameters, (or even > the Hessian). But this is currently mostly just an idea, since we need > some analytical gradients in the estimation of stats models. > > >> >> If there is not is it possible for me to suggest that this feature is added. >> There is such an excellent range of distributions, each with such an >> impressive range of options, it seems ashame to have to mostly manually code >> up the log of pdfs and often call the log of CDFs from R. > > So far I only thought about log pdf, because I wanted it for Maximum > Likelihood estimation. > > Do you have a rough idea for which distributions log cdf would work? > that is, for which distribution is an analytical or efficient > numerical expression possible. > > I also think that scipy.stats.distributions could be one of the best > (broadest, consistent) collection of univariate distributions that I > have seen so far, once we fill in some missing pieces. > > As a way forward, I think we could make the distributions into a > numerical encyclopedia by adding private methods to those > distributions where it makes sense, like log pdf, log cdf and I also > started to add characteristic functions to some distributions in my > experimental scripts. > If you have a collection of logpdf, logcdf, we could add a trac ticket for this. > > However, this would miss the generic broadcasting part of the public > functions, pdf, cdf,... but for estimation I wouldn't necessarily call > those because of the overhead. > > > I'm working on and off on this, so it's moving only slowly (and my > wishlist is big). > (for example, I was reading up on extreme value distributions in > actuarial science and hydrology to get a better overview over the > estimators.) > > > So, I really love to hear any ideas, feedback, and see contributions > to improving the distributions. > > Josef > > >> >> Thanks, >> Chris. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user --- Travis Oliphant Enthought, Inc. 
oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From d.l.goldsmith at gmail.com Sat May 29 17:14:35 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 29 May 2010 14:14:35 -0700 Subject: [SciPy-User] [OT] Fwd: (Former student of mine) Searching for Python studies Message-ID: Hi! RaNaldo is a former student of mine; he's sought my advice for formally continuing his programming studies in Python, and I'm afraid I'm at a loss for specific resources to which he may be referred. If any one can help him out, he (and I) would be most appreciative. Thanks! David Goldsmith On Wed, May 12, 2010 at 11:24 AM, RaNaldo Shorter wrote: Hello Mr. Goldsmith, Could you help me find formal training to learn Python? college preferred any accredited sort is best. I cloud simply study at home.... I was introduced to Python from you during studies at The Art Institute of Seattle, Spring 2008. Thank you for your time helping me. >From RaNaldo Shorter www.ranaldos.com ---------- Forwarded message ---------- From: RaNaldo Shorter Date: Wed, May 12, 2010 at 12:19 PM Subject: RE: Searching for Python studies To: David Goldsmith Thank you for remembering me! I've answered your questions below. Thanks. From: David Goldsmith [mailto:d.l.goldsmith at gmail.com] Sent: Wednesday, May 12, 2010 11:43 AM To: RaNaldo Shorter Subject: Re: Searching for Python studies Hi, RaNaldo, I remember you. I'm going to "punt" this one to some lists I'm on; some info people will want to know: 0) How much Python programming have you done since my class? Ans: I've not deeply delved into Python since your work with us. [DG: For reference, we used Dawson, M. 2006 "Python Programming for the Absolute Beginner, 2nd Ed." as the text, which uses game development as its motivational basis, and is organized the "traditional" way of presenting procedural/structured programming needs and techniques first, OO concepts and techniques second.] However, I currently use Digital Tutors " when I have immediate needs for information. 1) How much, and what kind of, programming have you done in languages other than Python? Ans: Other languages I use (need and want) includes Maya Embedded Script Language (MEL) and standard X-HTML, XML, CSS, and some Java Script. 2) How much and what kind of non-programming computer experience do you have? Ans: I have a fair amount. My first computer was IBM in 1985 when I nearly started learning fundamental PC languages before Windows. 3) Is online study an option (i.e., do you need it to be a traditional, lecture-format course)? Ans: On-line study is a good option, yet I love classroom possibilities... 4) What sort of Python programming do you want to (perhaps ultimately) learn (e.g., general purpose, Web development, UI design & implementation, database, data processing, graphic design/animation, scripting other programs w/ Python) and how far do you want to go with it, i.e., what's your motivation? Ans: Am currently learning MEL from Digital Tutors . I'm motivated by the frequent choices to use Python in many of my simulation programs especially "Real Flow " a fluid dynamics simulation application. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sat May 29 17:20:25 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 17:20:25 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: > > On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: > >> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >> wrote: >>> Hi, >>> >>> When using any of the distributions of scipy.stats there does not seem to be >>> the ability (or at least I cannot figure out how) to have the function >>> return >>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>> essential. >>> For instance suppose we are interested in an exponential distribution for a >>> random variable x with a hyperparameter lambda there needs to be an option >>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>> >>> Is there a way to do this using the distributions in scipy.stats? >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify much. >> >> I proposed this once together with other improvements (but without response). >> >> The second useful method for estimation would be _fitstart, which >> provides distribution specific starting values for fit, e.g. a moment >> estimator, or a simple rules of thumb >> http://projects.scipy.org/scipy/ticket/808 >> >> >> Here are some of my currently planned enhancements to the distributions: >> >> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py > > Hey Josef, > > I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). > > I also added your _fitstart suggestion. ? I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. > > Do you have updated code I could look at. ? These are relatively easy adds that I would like to put in today. ? ? Do you have check-in rights to SciPy? I just committed the changes for the _logpdf, ..., I didn't see any changes of yours in the timeline, nor in svn changes, plus a fix to internal wrapcauchy_cdf generic _logpdf, logcdf and the 13 cases of my test script are in svn Josef Josef > > Thanks, > > -Travis > >> >> but I just checked, it looks like I forgot to copy the _loglike method >> that I started from my experimental scripts. >> >> For a few distributions, where this is possible, it would also be >> useful to add the gradient with respect to the parameters, (or even >> the Hessian). But this is currently mostly just an idea, since we need >> some analytical gradients in the estimation of stats models. >> >> >>> >>> If there is not is it possible for me to suggest that this feature is added. >>> There is such an excellent range of distributions, each with such an >>> impressive range of options, it seems ashame to have to mostly manually code >>> up the log of pdfs and often call the log of CDFs from R. >> >> So far I only thought about log pdf, because I wanted it for Maximum >> Likelihood estimation. >> >> Do you have a rough idea for which distributions log cdf would work? 
>> that is, for which distribution is an analytical or efficient >> numerical expression possible. >> >> I also think that scipy.stats.distributions could be one of the best >> (broadest, consistent) collection of univariate distributions that I >> have seen so far, once we fill in some missing pieces. >> >> As a way forward, I think we could make the distributions into a >> numerical encyclopedia by adding private methods to those >> distributions where it makes sense, like log pdf, log cdf and I also >> started to add characteristic functions to some distributions in my >> experimental scripts. >> If you have a collection of logpdf, logcdf, we could add a trac ticket for this. >> >> However, this would miss the generic broadcasting part of the public >> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >> those because of the overhead. >> >> >> I'm working on and off on this, so it's moving only slowly (and my >> wishlist is big). >> (for example, I was reading up on extreme value distributions in >> actuarial science and hydrology to get a better overview over the >> estimators.) >> >> >> So, I really love to hear any ideas, feedback, and see contributions >> to improving the distributions. >> >> Josef >> >> >>> >>> Thanks, >>> Chris. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Sat May 29 17:38:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 17:38:46 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: > > On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: > >> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >> wrote: >>> Hi, >>> >>> When using any of the distributions of scipy.stats there does not seem to be >>> the ability (or at least I cannot figure out how) to have the function >>> return >>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>> essential. >>> For instance suppose we are interested in an exponential distribution for a >>> random variable x with a hyperparameter lambda there needs to be an option >>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>> >>> Is there a way to do this using the distributions in scipy.stats? >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify much. >> >> I proposed this once together with other improvements (but without response). >> >> The second useful method for estimation would be _fitstart, which >> provides distribution specific starting values for fit, e.g. 
a moment >> estimator, or a simple rules of thumb >> http://projects.scipy.org/scipy/ticket/808 >> >> >> Here are some of my currently planned enhancements to the distributions: >> >> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py > > Hey Josef, > > I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). I would like to get the private _logpdf in a useful (vectorized or broadcastable) version because for estimation and optimization, I want to avoid the logpdf overhead. So, my testing will be on the underline versions. > > I also added your _fitstart suggestion. ? I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. I have written a semi-frozen fit function and posted to the mailing list a long time ago, but since I'm not sure about the API and I'm expanding to several new estimators, I kept this under work-in-progress. Similar _fitstart might need extra options, for estimation when some parameters are fixed, e.g. there are good moment estimators that work when some of the parameters (e.g. loc or scale) are fixed. Also _fitstart is currently used only by my fit_frozen. I was hoping to get this done this year, maybe together with the enhancements that Per Brodtkorb proposed two years ago, e.g. Method of Maximum Spacings. I also have a Generalized Method of Moments estimator based on matching quantiles and moments in the works. So, I don't want yet to be pinned down with any API for the estimation enhancements. Josef > > Do you have updated code I could look at. ? These are relatively easy adds that I would like to put in today. ? ? Do you have check-in rights to SciPy? > > Thanks, > > -Travis > >> >> but I just checked, it looks like I forgot to copy the _loglike method >> that I started from my experimental scripts. >> >> For a few distributions, where this is possible, it would also be >> useful to add the gradient with respect to the parameters, (or even >> the Hessian). But this is currently mostly just an idea, since we need >> some analytical gradients in the estimation of stats models. >> >> >>> >>> If there is not is it possible for me to suggest that this feature is added. >>> There is such an excellent range of distributions, each with such an >>> impressive range of options, it seems ashame to have to mostly manually code >>> up the log of pdfs and often call the log of CDFs from R. >> >> So far I only thought about log pdf, because I wanted it for Maximum >> Likelihood estimation. >> >> Do you have a rough idea for which distributions log cdf would work? >> that is, for which distribution is an analytical or efficient >> numerical expression possible. >> >> I also think that scipy.stats.distributions could be one of the best >> (broadest, consistent) collection of univariate distributions that I >> have seen so far, once we fill in some missing pieces. >> >> As a way forward, I think we could make the distributions into a >> numerical encyclopedia by adding private methods to those >> distributions where it makes sense, like log pdf, log cdf and I also >> started to add characteristic functions to some distributions in my >> experimental scripts. >> If you have a collection of logpdf, logcdf, we could add a trac ticket for this. 
>> >> However, this would miss the generic broadcasting part of the public >> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >> those because of the overhead. >> >> >> I'm working on and off on this, so it's moving only slowly (and my >> wishlist is big). >> (for example, I was reading up on extreme value distributions in >> actuarial science and hydrology to get a better overview over the >> estimators.) >> >> >> So, I really love to hear any ideas, feedback, and see contributions >> to improving the distributions. >> >> Josef >> >> >>> >>> Thanks, >>> Chris. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Sat May 29 17:53:15 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 29 May 2010 14:53:15 -0700 Subject: [SciPy-User] [OT] Analysis for Applied Mathematics Message-ID: Hi! A long time ago, I asked Bill Derrick, my advisor @ the Univ. of Montana, to recommend a text on Analysis written for applied mathematicians and he suggested Cheney, "Analysis for Applied Mathematics." I'm finally in a position to add such a volume to my library and I'm wondering if A) anyone reading this would strongly disagree w/ this recommendation (and if so, why), and B) in particular, has it since been superseded by something superior? Thanks! DG PS: I'm also in the market for a treatise on Noise (i.e., I'm interested in something that attempts to be pretty comprehensive, covering theory and applications, looking at it - in its various "colors", i.e., white, pink, brown, etc. - from the variety of disciplines in which it plays an important part, etc., etc.) Thanks again! -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 29 17:58:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 17:58:31 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sat, May 29, 2010 at 5:38 PM, wrote: > On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: >> >> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: >> >>> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >>> wrote: >>>> Hi, >>>> >>>> When using any of the distributions of scipy.stats there does not seem to be >>>> the ability (or at least I cannot figure out how) to have the function >>>> return >>>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>>> essential. 
>>>> For instance suppose we are interested in an exponential distribution for a >>>> random variable x with a hyperparameter lambda there needs to be an option >>>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>>> >>>> Is there a way to do this using the distributions in scipy.stats? >>> >>> It would need a new method for each distribution, e.g. _loglike, _logpdf >>> So, this is work, and for some distributions the log wouldn't simplify much. >>> >>> I proposed this once together with other improvements (but without response). >>> >>> The second useful method for estimation would be _fitstart, which >>> provides distribution specific starting values for fit, e.g. a moment >>> estimator, or a simple rules of thumb >>> http://projects.scipy.org/scipy/ticket/808 >>> >>> >>> Here are some of my currently planned enhancements to the distributions: >>> >>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py >> >> Hey Josef, >> >> I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). > > I would like to get the private _logpdf in a useful (vectorized or > broadcastable) version because for estimation and optimization, I want > to avoid the logpdf overhead. So, my testing will be on the underline > versions. > >> >> I also added your _fitstart suggestion. ? I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. > > I have written a semi-frozen fit function and posted to the mailing > list a long time ago, but since I'm not sure about the API and I'm > expanding to several new estimators, I kept this under > work-in-progress. > > Similar _fitstart might need extra options, for estimation when some > parameters are fixed, e.g. there are good moment estimators that work > when some of the parameters (e.g. loc or scale) are fixed. Also > _fitstart is currently used only by my fit_frozen. > > I was hoping to get this done this year, maybe together with the > enhancements that Per Brodtkorb proposed two years ago, e.g. Method of > Maximum Spacings. > > I also have a Generalized Method of Moments estimator based on > matching quantiles and moments in the works. > > So, I don't want yet to be pinned down with any API for the estimation > enhancements. > > Josef > >> >> Do you have updated code I could look at. ? These are relatively easy adds that I would like to put in today. ? ? Do you have check-in rights to SciPy? http://projects.scipy.org/scipy/log/trunk/scipy/stats/distributions.py >> >> Thanks, >> >> -Travis >> >>> >>> but I just checked, it looks like I forgot to copy the _loglike method >>> that I started from my experimental scripts. >>> >>> For a few distributions, where this is possible, it would also be >>> useful to add the gradient with respect to the parameters, (or even >>> the Hessian). But this is currently mostly just an idea, since we need >>> some analytical gradients in the estimation of stats models. >>> >>> >>>> >>>> If there is not is it possible for me to suggest that this feature is added. >>>> There is such an excellent range of distributions, each with such an >>>> impressive range of options, it seems ashame to have to mostly manually code >>>> up the log of pdfs and often call the log of CDFs from R. 
>>> >>> So far I only thought about log pdf, because I wanted it for Maximum >>> Likelihood estimation. >>> >>> Do you have a rough idea for which distributions log cdf would work? >>> that is, for which distribution is an analytical or efficient >>> numerical expression possible. >>> >>> I also think that scipy.stats.distributions could be one of the best >>> (broadest, consistent) collection of univariate distributions that I >>> have seen so far, once we fill in some missing pieces. >>> >>> As a way forward, I think we could make the distributions into a >>> numerical encyclopedia by adding private methods to those >>> distributions where it makes sense, like log pdf, log cdf and I also >>> started to add characteristic functions to some distributions in my >>> experimental scripts. >>> If you have a collection of logpdf, logcdf, we could add a trac ticket for this. >>> >>> However, this would miss the generic broadcasting part of the public >>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >>> those because of the overhead. >>> >>> >>> I'm working on and off on this, so it's moving only slowly (and my >>> wishlist is big). >>> (for example, I was reading up on extreme value distributions in >>> actuarial science and hydrology to get a better overview over the >>> estimators.) >>> >>> >>> So, I really love to hear any ideas, feedback, and see contributions >>> to improving the distributions. >>> >>> Josef >>> >>> >>>> >>>> Thanks, >>>> Chris. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> --- >> Travis Oliphant >> Enthought, Inc. >> oliphant at enthought.com >> 1-512-536-1057 >> http://www.enthought.com >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From uclamathguy at gmail.com Sat May 29 21:05:32 2010 From: uclamathguy at gmail.com (Ryan Rosario) Date: Sat, 29 May 2010 18:05:32 -0700 Subject: [SciPy-User] Problem with np.load() on Huge Sparse Matrix Message-ID: Hi, I have a very huge sparse (395000 x 395000) CSC matrix that I cannot save in one pass, so I saved the data, indices, indptr and shape in separate files as suggested by Dave Wade-Farley a few years back. When I try to read back the indices pickle: >> np.save("indices.pickle", mymatrix.indices) >>> indices = np.load("indices.pickle.npy") >>> indices array([394852, 394649, 394533, ..., 0, 0, 0], dtype=int32) >>> intersection_matrix.indices array([394852, 394649, 394533, ..., 1557, 1223, 285], dtype=int32) Why is this happening? My only workaround is to print all of entries of intersection_matrix.indices to a file, and read in back which takes up to 2 hours. It would be great if I could get np.load to work because it is much faster. Thanks, Ryan From jsseabold at gmail.com Sun May 30 00:20:45 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 30 May 2010 00:20:45 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 3:44 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 9:00 PM, ? wrote: >> On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: >>> On Fri, May 28, 2010 at 7:53 PM, ? 
wrote: >>>> R's license, GPL, is incompatible with the license of scipy, BSD. >>>> While they are allowed to look at our code, code that goes into scipy >>>> cannot be based on GPL licensed code. >>> >>> You mean, they're allowed to copy our code, and we're allowed to look >>> at their code for reference but can't use it directly :-). >> >> We are allowed to look at their manuals but not their code. >> (Life ain't fair.) > > It sounds like you guys have this well in hand, but just a point here > -- you certainly are allowed to look at their code, just not copy the > "expressive aspects" of it. (Saying you can't *look* at it because of > the license is like saying writers can't read other people's novels!) > "Expressive" is a tricky term, of course -- IIUC it's basically > anything that could be changed while preserving functionality (because > the functionality, the algorithm itself, is not covered by copyright). > So, say, variable names certainly count as expressive, decisions about > which way to lay out the code, etc. If one wants to be really safe, > one can write down a textual description of the algorithm and then ask > someone else to translate back to code (the "clean room" method). > > So you do have to be a bit careful, but when you have code that > contains valuable information that isn't really written down anywhere > else then I'd say it's worth it. > Thanks, this is useful to know. I've always erred on the side of caution and just compared the results of functions/algorithms that *should* be the same vs, say, R, but if I could do this and then look at implementation details this could relieve substantial headaches. It still seems like such a fine line though. Skipper From aarchiba at physics.mcgill.ca Sun May 30 00:36:08 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Sun, 30 May 2010 01:36:08 -0300 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On 30 May 2010 01:20, Skipper Seabold wrote: > On Sat, May 29, 2010 at 3:44 PM, Nathaniel Smith wrote: >> On Fri, May 28, 2010 at 9:00 PM, ? wrote: >>> On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: >>>> On Fri, May 28, 2010 at 7:53 PM, ? wrote: >>>>> R's license, GPL, is incompatible with the license of scipy, BSD. >>>>> While they are allowed to look at our code, code that goes into scipy >>>>> cannot be based on GPL licensed code. >>>> >>>> You mean, they're allowed to copy our code, and we're allowed to look >>>> at their code for reference but can't use it directly :-). >>> >>> We are allowed to look at their manuals but not their code. >>> (Life ain't fair.) >> >> It sounds like you guys have this well in hand, but just a point here >> -- you certainly are allowed to look at their code, just not copy the >> "expressive aspects" of it. (Saying you can't *look* at it because of >> the license is like saying writers can't read other people's novels!) >> "Expressive" is a tricky term, of course -- IIUC it's basically >> anything that could be changed while preserving functionality (because >> the functionality, the algorithm itself, is not covered by copyright). >> So, say, variable names certainly count as expressive, decisions about >> which way to lay out the code, etc. If one wants to be really safe, >> one can write down a textual description of the algorithm and then ask >> someone else to translate back to code (the "clean room" method). 
>> >> So you do have to be a bit careful, but when you have code that >> contains valuable information that isn't really written down anywhere >> else then I'd say it's worth it. >> > > Thanks, this is useful to know. ?I've always erred on the side of > caution and just compared the results of functions/algorithms that > *should* be the same vs, say, R, but if I could do this and then look > at implementation details this could relieve substantial headaches. > It still seems like such a fine line though. This is exactly the problem. I don't think the R community is particularly litigious, but as a rule of thumb, doing something that is technically legal but for which the legality is subtle opens one up to lawsuits. The problem is that even when you are right, a lawsuit is tremendously destructive. So things that are legal but subtle should probably be avoided by a group as penniless as the community as scipy developers. So it's probably better to just not read their source code. Anne > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From m.boumans at gmx.net Mon May 31 01:03:48 2010 From: m.boumans at gmx.net (bowie_22) Date: Mon, 31 May 2010 05:03:48 +0000 (UTC) Subject: [SciPy-User] OpenOpt and Scipy.optimize Message-ID: Hello together, during my evaluation of scipy as subsitute for Matlab I started to look at the optimization features of sciypy by looking at the optimze module. I posted a question and one answer contained a hint to OpenOpt. Now I am a little bit unsure how to proceed. Does it make more sense to look at OpenOpt rather then evaluating scipy.optimize? Regrads Marcus From ralf.gommers at googlemail.com Mon May 31 07:39:41 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 31 May 2010 19:39:41 +0800 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sun, May 30, 2010 at 5:38 AM, wrote: > On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant > wrote: > > > > Hey Josef, > > > > I've been playing with distributions.py today and added logpdf, logcdf, > logsf methods (based on _logpdf, _logcdf, _logsf methods in each > distribution). > > I would like to get the private _logpdf in a useful (vectorized or > broadcastable) version because for estimation and optimization, I want > to avoid the logpdf overhead. So, my testing will be on the underline > versions. > > > > > I also added your _fitstart suggestion. I would like to do something > like your nnlf_fit method that allows you to fix some parameters and only > solve for others, but I haven't thought through all the issues yet. > > I have written a semi-frozen fit function and posted to the mailing > list a long time ago, but since I'm not sure about the API and I'm > expanding to several new estimators, I kept this under > work-in-progress. > > Similar _fitstart might need extra options, for estimation when some > parameters are fixed, e.g. there are good moment estimators that work > when some of the parameters (e.g. loc or scale) are fixed. Also > _fitstart is currently used only by my fit_frozen. > > I was hoping to get this done this year, maybe together with the > enhancements that Per Brodtkorb proposed two years ago, e.g. Method of > Maximum Spacings. > > I also have a Generalized Method of Moments estimator based on > matching quantiles and moments in the works. 
> > So, I don't want yet to be pinned down with any API for the estimation > enhancements. > > These recent changes are a bit problematic for several reasons: - there are many new methods for distributions without tests. - there are no docs for many new private and public methods - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 - the old rv_continuous doc template was put back in This, plus Josef saying that he doesn't want to fix the API for some methods yet, makes me want to take it out of the 0.8.x branch. Any objections to that Travis or Josef? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon May 31 11:59:39 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 31 May 2010 09:59:39 -0600 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Mon, May 31, 2010 at 5:39 AM, Ralf Gommers wrote: > > > On Sun, May 30, 2010 at 5:38 AM, wrote: > >> On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant >> wrote: >> > >> > Hey Josef, >> > >> > I've been playing with distributions.py today and added logpdf, logcdf, >> logsf methods (based on _logpdf, _logcdf, _logsf methods in each >> distribution). >> >> I would like to get the private _logpdf in a useful (vectorized or >> broadcastable) version because for estimation and optimization, I want >> to avoid the logpdf overhead. So, my testing will be on the underline >> versions. >> >> > >> > I also added your _fitstart suggestion. I would like to do something >> like your nnlf_fit method that allows you to fix some parameters and only >> solve for others, but I haven't thought through all the issues yet. >> >> I have written a semi-frozen fit function and posted to the mailing >> list a long time ago, but since I'm not sure about the API and I'm >> expanding to several new estimators, I kept this under >> work-in-progress. >> >> Similar _fitstart might need extra options, for estimation when some >> parameters are fixed, e.g. there are good moment estimators that work >> when some of the parameters (e.g. loc or scale) are fixed. Also >> _fitstart is currently used only by my fit_frozen. >> >> I was hoping to get this done this year, maybe together with the >> enhancements that Per Brodtkorb proposed two years ago, e.g. Method of >> Maximum Spacings. >> >> I also have a Generalized Method of Moments estimator based on >> matching quantiles and moments in the works. >> >> So, I don't want yet to be pinned down with any API for the estimation >> enhancements. >> >> These recent changes are a bit problematic for several reasons: > - there are many new methods for distributions without tests. > - there are no docs for many new private and public methods > - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 > - the old rv_continuous doc template was put back in > > This, plus Josef saying that he doesn't want to fix the API for some > methods yet, makes me want to take it out of the 0.8.x branch. Any > objections to that Travis or Josef? > > I'm thinking it should be taken out of the trunk as well as the 0.8.x branch. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From akshaysrinivasan at gmail.com Mon May 31 12:34:55 2010 From: akshaysrinivasan at gmail.com (Akshay Srinivasan) Date: Mon, 31 May 2010 22:04:55 +0530 Subject: [SciPy-User] Kinpy Message-ID: <4C03E52F.1010309@gmail.com> Hello, I have been doing a lot Chemical Kinetic simulation lately, I particularly found that generating the code for solving a given set of reactions is a lot more time consuming and mechanistic than the time taken to do the rest of the work. I wrote Kinpy as a simple script to generate the Python code for doing exactly this from the natural representation of a set of chemical reactions. Its not really a *project* per se - its just one file! I couldn't any other place to put it, so it ended up on google code. You can find the the source code and information on usage here: http://code.google.com/p/kinpy/ Regards, Akshay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jr at sun.ac.za Mon May 31 14:56:31 2010 From: jr at sun.ac.za (Johann Rohwer) Date: Mon, 31 May 2010 20:56:31 +0200 Subject: [SciPy-User] Kinpy In-Reply-To: <4C03E52F.1010309@gmail.com> References: <4C03E52F.1010309@gmail.com> Message-ID: <4C04065F.5090509@sun.ac.za> You might be interested in PySCeS, the Python Simulator for Cellular Systems (http://pysces.sf.net), which is a package (runs on top of scipy and numpy) dedicated to solving (bio)chemical reaction networks and does, amongst others, time-course simulation, steady-state analysis, higher-order analyses such as stability analysis and metabolic control analysis, and more. I just don't want you to re-invent the wheel, and solving the kind of numerical problem you mention on your page is a breeze with PySCeS. Regards Johann On 31/05/2010 18:34, Akshay Srinivasan wrote: > Hello, > I have been doing a lot Chemical Kinetic simulation lately, I > particularly found that generating the code for solving a given set of > reactions is a lot more time consuming and mechanistic than the time > taken to do the rest of the work. I wrote Kinpy as a simple script to > generate the Python code for doing exactly this from the natural > representation of a set of chemical reactions. Its not really a > *project* per se - its just one file! I couldn't any other place to put > it, so it ended up on google code. > You can find the the source code and information on usage here: > http://code.google.com/p/kinpy/ > > Regards, > Akshay > From matthew.brett at gmail.com Mon May 31 18:42:24 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 15:42:24 -0700 Subject: [SciPy-User] Kinpy In-Reply-To: <4C04065F.5090509@sun.ac.za> References: <4C03E52F.1010309@gmail.com> <4C04065F.5090509@sun.ac.za> Message-ID: Hi, On Mon, May 31, 2010 at 11:56 AM, Johann Rohwer wrote: > You might be interested in PySCeS, the Python Simulator for Cellular Systems > (http://pysces.sf.net), I bow low in respect for that excellent name. 
I don't know who came up with it, but whoever it was deserves due honor ;) Matthew From fernando.ferreira at poli.ufrj.br Mon May 31 18:54:17 2010 From: fernando.ferreira at poli.ufrj.br (=?ISO-8859-1?Q?Fernando_Guimar=E3es_Ferreira?=) Date: Mon, 31 May 2010 19:54:17 -0300 Subject: [SciPy-User] scipy.io.matlab.loadmat error Message-ID: Hi, I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 For some reason, I can't load matlab files using scipy.io.matlab.loadmat: scipy.io.matlab.loadmat('all_data.mat') /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.py:84: FutureWarning: Using struct_as_record default value (False) This will change to True in future versions return MatFile5Reader(byte_stream, **kwargs) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/fguimara/Documents/UFRJ/mestrado/CPE782_-_ICA/time_series_ica/script/ in () /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs) 109 ''' 110 MR = mat_reader_factory(file_name, appendmat, **kwargs) --> 111 matfile_dict = MR.get_variables() 112 if mdict is not None: 113 mdict.update(matfile_dict) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/miobase.pyc in get_variables(self, variable_names) 444 mdict['__globals__'] = [] 445 while not self.end_of_stream(): --> 446 getter = self.matrix_getter_factory() 447 name = getter.name 448 if variable_names and name not in variable_names: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 694 695 def matrix_getter_factory(self): --> 696 return self._array_reader.matrix_getter_factory() 697 698 def guess_byte_order(self): /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 313 elif not mdtype == miMATRIX: 314 raise TypeError, \ --> 315 'Expecting miMATRIX type here, got %d' % mdtype 316 else: 317 getter = self.current_getter(byte_count) TypeError: Expecting miMATRIX type here, got 1296630016 Can't understand why... This is the info about the file % > file all_data.mat all_data.mat: Matlab v5 mat-file (little endian) version 0x0100 Anything? Cheers Fernando -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon May 31 19:08:13 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 16:08:13 -0700 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: Hi, 2010/5/31 Fernando Guimar?es Ferreira : > Hi, > I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 > For some reason, I can't load matlab files using?scipy.io.matlab.loadmat: > scipy.io.matlab.loadmat('all_data.mat') ... > ?? ?317 ? ? ? ? ? ? getter = self.current_getter(byte_count) > TypeError: Expecting miMATRIX type here, got 1296630016 > > Can't understand why... I don't know either I'm afraid. Can you try the latest version? Is there some way you can get me the .mat file so I can debug the problem in more detail? 
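A small, self-contained test file for this kind of report can be produced directly from scipy; a minimal round-trip sketch (the file name and array below simply mirror the test case that appears next in the thread, they are not part of the original message):

import numpy as np
from scipy import io

io.savemat('teste.mat', {'x': np.array([0, 1, 3, 0, 1, 3, 4, 5, 7, 7])})
print(io.loadmat('teste.mat')['x'])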
Best, Matthew From fernando.ferreira at poli.ufrj.br Mon May 31 21:10:55 2010 From: fernando.ferreira at poli.ufrj.br (=?ISO-8859-1?Q?Fernando_Guimar=E3es_Ferreira?=) Date: Mon, 31 May 2010 22:10:55 -0300 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: In the last email I meant python 2.6.5 The mat file is attached... This is a test, there is just an array 'x' with few elements.. Didn't work either Thanks, Fernando On Mon, May 31, 2010 at 8:08 PM, Matthew Brett wrote: > Hi, > > 2010/5/31 Fernando Guimar?es Ferreira : > > Hi, > > I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 > > For some reason, I can't load matlab files using scipy.io.matlab.loadmat: > > scipy.io.matlab.loadmat('all_data.mat') > ... > > 317 getter = self.current_getter(byte_count) > > TypeError: Expecting miMATRIX type here, got 1296630016 > > > > Can't understand why... > > I don't know either I'm afraid. Can you try the latest version? Is > there some way you can get me the .mat file so I can debug the problem > in more detail? > > Best, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: teste.mat Type: application/octet-stream Size: 183 bytes Desc: not available URL: From vincent at vincentdavis.net Mon May 31 22:09:32 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Mon, 31 May 2010 20:09:32 -0600 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: Just as a note, this probably is not it but I recently ran into this with a csv file saved using excel on a mac. I guess it saves it as a unicode format, the error reported is a EOL when opening with genfromtxt but thats not quite right. Anyway if the file is saved using matlab on you mac this unicode might be the problem. Of course I am seeing this through skewed glasses I just couldn't not mention it. Vincent 2010/5/31 Fernando Guimar?es Ferreira : > In the last email I meant python 2.6.5 > The mat file is attached... This is a test, there is just an array 'x' with > few elements.. > Didn't work either > > Thanks, > Fernando > > > > > On Mon, May 31, 2010 at 8:08 PM, Matthew Brett > wrote: >> >> Hi, >> >> 2010/5/31 Fernando Guimar?es Ferreira : >> > Hi, >> > I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 >> > For some reason, I can't load matlab files >> > using?scipy.io.matlab.loadmat: >> > scipy.io.matlab.loadmat('all_data.mat') >> ... >> > ?? ?317 ? ? ? ? ? ? getter = self.current_getter(byte_count) >> > TypeError: Expecting miMATRIX type here, got 1296630016 >> > >> > Can't understand why... >> >> I don't know either I'm afraid. ?Can you try the latest version? ?Is >> there some way you can get me the .mat file so I can debug the problem >> in more detail? 
>> >> Best, >> >> Matthew >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jsseabold at gmail.com Mon May 31 22:16:53 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 31 May 2010 22:16:53 -0400 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: 2010/5/31 Fernando Guimar?es Ferreira : > In the last email I meant python 2.6.5 > The mat file is attached... This is a test, there is just an array 'x' with > few elements.. > Didn't work either Works for me with a recent trunk of scipy. In [1]: from scipy import io In [2]: dta = io.loadmat('./teste.mat') In [3]: dta['x'] Out[3]: array([[0, 1, 3, 0, 1, 3, 4, 5, 7, 7]], dtype=uint8) In [4]: from scipy import __version__ as v In [5]: v Out[5]: '0.9.0.dev6447' Skipper From matthew.brett at gmail.com Mon May 31 22:36:04 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 19:36:04 -0700 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: Hi, 2010/5/31 Fernando Guimar?es Ferreira : > In the last email I meant python 2.6.5 > The mat file is attached... This is a test, there is just an array 'x' with > few elements.. It works for me with 0.7.2. I wonder what's going on? [mb312 at blair ~/tmp]$ uname -a Darwin blair 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386 [mb312 at blair ~/tmp]$ ipython Python 2.6.4 (r264:75706, Dec 22 2009, 14:55:30) Type "copyright", "credits" or "license" for more information. IPython 0.11.alpha1.bzr.r1223 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: import scipy In [2]: scipy.__version__ Out[2]: '0.7.2' In [3]: import scipy.io.matlab In [4]: scipy.io.matlab.loadmat('/Users/mb312/Downloads/teste.mat') /Users/mb312/usr/local/lib/python2.6/site-packages/scipy/io/matlab/mio.py:84: FutureWarning: Using struct_as_record default value (False) This will change to True in future versions return MatFile5Reader(byte_stream, **kwargs) Out[4]: {'__globals__': [], '__header__': 'MATLAB 5.0 MAT-file, Platform: MACI, Created on: Mon May 31 21:06:09 2010', '__version__': '1.0', 'x': array([[0, 1, 3, 0, 1, 3, 4, 5, 7, 7]], dtype=uint8)} Best, Matthew From pgmdevlist at gmail.com Mon May 31 22:38:55 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 31 May 2010 22:38:55 -0400 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com> On May 31, 2010, at 10:09 PM, Vincent Davis wrote: > Just as a note, this probably is not it but I recently ran into this > with a csv file saved using excel on a mac. I guess it saves it as a > unicode format, the error reported is a EOL when opening with > genfromtxt but thats not quite right. But I thought I had fixed that on the SVN ??? 
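A quick way to narrow down this kind of machine-specific failure is to confirm which numpy/scipy the failing interpreter actually imports and whether the bundled io tests pass there; a sketch, assuming a nose-based scipy 0.7.x install:

import numpy
import scipy
import scipy.io

print("numpy %s from %s" % (numpy.__version__, numpy.__file__))
print("scipy %s from %s" % (scipy.__version__, scipy.__file__))
scipy.io.test()   # requires the nose package; reports OK or FAILED when finished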
From fernando.ferreira at poli.ufrj.br Mon May 31 22:43:40 2010 From: fernando.ferreira at poli.ufrj.br (=?ISO-8859-1?Q?Fernando_Guimar=E3es_Ferreira?=) Date: Mon, 31 May 2010 23:43:40 -0300 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com> References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com> Message-ID: 14 [fguimara] script > uname -a Darwin warley.local 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386 15 [fguimara] script > ipython Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) Type "copyright", "credits" or "license" for more information. IPython 0.10 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: import scipy In [2]: scipy.__version__ Out[2]: '0.7.2' In [3]: import scipy.io.matlab In [4]: scipy.io.matlab.loadmat('teste.mat') /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.py:84: FutureWarning: Using struct_as_record default value (False) This will change to True in future versions return MatFile5Reader(byte_stream, **kwargs) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/fguimara/Documents/UFRJ/mestrado/CPE782_-_ICA/time_series_ica/script/ in () /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs) 109 ''' 110 MR = mat_reader_factory(file_name, appendmat, **kwargs) --> 111 matfile_dict = MR.get_variables() 112 if mdict is not None: 113 mdict.update(matfile_dict) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/miobase.pyc in get_variables(self, variable_names) 444 mdict['__globals__'] = [] 445 while not self.end_of_stream(): --> 446 getter = self.matrix_getter_factory() 447 name = getter.name 448 if variable_names and name not in variable_names: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 694 695 def matrix_getter_factory(self): --> 696 return self._array_reader.matrix_getter_factory() 697 698 def guess_byte_order(self): /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 313 elif not mdtype == miMATRIX: 314 raise TypeError, \ --> 315 'Expecting miMATRIX type here, got %d' % mdtype 316 else: 317 getter = self.current_getter(byte_count) TypeError: Expecting miMATRIX type here, got 1296630016 In [5]: Same file.... But it doesnot work at all... Cheers, Fernando On Mon, May 31, 2010 at 11:38 PM, Pierre GM wrote: > On May 31, 2010, at 10:09 PM, Vincent Davis wrote: > > Just as a note, this probably is not it but I recently ran into this > > with a csv file saved using excel on a mac. I guess it saves it as a > > unicode format, the error reported is a EOL when opening with > > genfromtxt but thats not quite right. > > But I thought I had fixed that on the SVN ??? > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From matthew.brett at gmail.com Mon May 31 22:58:24 2010
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 31 May 2010 19:58:24 -0700
Subject: [SciPy-User] scipy.io.matlab.loadmat error
In-Reply-To:
References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
Message-ID:

Hi,

...
> TypeError: Expecting miMATRIX type here, got 1296630016
> In [5]:
>
> Same file.... But it does not work at all...

What version of numpy do you have? I can't imagine it makes a difference, but still.

Did you run the scipy tests? Did the scipy.io.matlab tests pass?

Best,

Matthew

From vincent at vincentdavis.net Mon May 31 23:53:34 2010
From: vincent at vincentdavis.net (Vincent Davis)
Date: Mon, 31 May 2010 21:53:34 -0600
Subject: [SciPy-User] scipy.io.matlab.loadmat error
In-Reply-To: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
Message-ID:

On Mon, May 31, 2010 at 8:38 PM, Pierre GM wrote:
> On May 31, 2010, at 10:09 PM, Vincent Davis wrote:
>> Just as a note, this probably is not it but I recently ran into this
>> with a csv file saved using excel on a mac. I guess it saves it as a
>> unicode format, the error reported is a EOL when opening with
>> genfromtxt but thats not quite right.
>
> But I thought I had fixed that on the SVN ???

You did but I assume that only applied to csv (type?) files.
I was thinking that they may have a "similar" problem with this mat
file. But I tried to clearly say I have no idea.

Vincent

>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From matthew.brett at gmail.com Mon May 31 23:58:21 2010
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 31 May 2010 20:58:21 -0700
Subject: [SciPy-User] scipy.io.matlab.loadmat error
In-Reply-To:
References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
Message-ID:

Hi,

>> But I thought I had fixed that on the SVN ???
>
> You did but I assume that only applied to csv (type?) files.
> I was thinking that they may have a "similar" problem with this mat
> file. But I tried to clearly say I have no idea.

Actually the .mat files are a custom binary format by matlab - we
don't use the genfromtxt stuff to load them...

Matthew

From skorpio11 at gmail.com Tue May 25 20:33:04 2010
From: skorpio11 at gmail.com (Leon Adams)
Date: Tue, 25 May 2010 20:33:04 -0400
Subject: [SciPy-User] Triangular Distribution ppf method
Message-ID:

Hi all,

There seems to be a bug of some sort in evaluating the ppf method of the
scipy.stats.triang distribution. Evaluating the distribution with a location
parameter of 1 or greater seems to be problematic. I am looking for
confirmation of this behavior and suggestions for a workaround.

Thanks in advance

--
Leon Adams
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tanwp at gis.a-star.edu.sg Wed May 26 00:07:35 2010
From: tanwp at gis.a-star.edu.sg (Padma TAN)
Date: Wed, 26 May 2010 12:07:35 +0800
Subject: [SciPy-User] Python scipy error.
Message-ID:

Hi,

This is the error message I got when I needed to run this. Please assist!
[rjauch at giswk002 pwm_scanner]$ python pwm_scanner.py
Traceback (most recent call last):
  File "pwm_scanner.py", line 36, in
    from glbase import *
  File "/home/rjauch/glbase/__init__.py", line 57, in
    from glglob import glglob
  File "/home/rjauch/glbase/glglob.py", line 27, in
    from scipy.stats import spearmanr, pearsonr
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/stats/__init__.py", line 7, in
    from stats import *
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/stats/stats.py", line 199, in
    import scipy.linalg as linalg
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/__init__.py", line 8, in
    from basic import *
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/basic.py", line 389, in
    import decomp
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/decomp.py", line 23, in
    from blas import get_blas_funcs
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/blas.py", line 14, in
    from scipy.linalg import fblas
ImportError: /usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/fblas.so: undefined symbol: srotmg_

Can I safely ignore this?

These are the messages shown when running python setup.py build for scipy.

customize UnixCCompiler
customize UnixCCompiler using build_clib
customize GnuFCompiler
Found executable /usr/bin/g77
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler using build_clib
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
extending extension 'scipy.sparse.linalg.dsolve._zsuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
extending extension 'scipy.sparse.linalg.dsolve._dsuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
extending extension 'scipy.sparse.linalg.dsolve._csuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
extending extension 'scipy.sparse.linalg.dsolve._ssuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize GnuFCompiler
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler using build_ext
running scons
[root at giswk002 scipy-0.7.2]#

SYSTEM PYTHON INFO

[root at giswk002 local]# python -c 'from numpy.f2py.diagnose import run; run()'
------
os.name='posix'
------
sys.platform='linux2'
------
sys.version:
2.6.2 (r262:71600, Jul 15 2009, 19:48:50)
[GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)]
------
sys.prefix:
/usr/local/Python-2.6.2
------
sys.path=':/usr/local/Python-2.6.2/lib/python2.6/site-packages:/usr/local:/usr/local/Python-2.6.2/lib/python26.zip:/usr/local/Python-2.6.2/lib/python2.6:/usr/local/Python-2.6.2/lib/python2.6/plat-linux2:/usr/local/Python-2.6.2/lib/python2.6/lib-tk:/usr/local/Python-2.6.2/lib/python2.6/lib-old:/usr/local/Python-2.6.2/lib/python2.6/lib-dynload:/root/.local/lib/python2.6/site-packages'
------
Failed to import Numeric: No module named Numeric
Failed to import numarray: No module named numarray
Found new numpy version '1.3.0' in /usr/local/Python-2.6.2/lib/python2.6/site-packages/numpy/__init__.pyc
Found f2py2e version '2' in /usr/local/Python-2.6.2/lib/python2.6/site-packages/numpy/f2py/f2py2e.pyc
Found numpy.distutils version '0.4.0' in '/usr/local/Python-2.6.2/lib/python2.6/site-packages/numpy/distutils/__init__.pyc'
------
Importing numpy.distutils.fcompiler ... ok
------
Checking availability of supported Fortran compilers:
GnuFCompiler instance properties:
  archiver        = ['/usr/bin/g77', '-cr']
  compile_switch  = '-c'
  compiler_f77    = ['/usr/bin/g77', '-g', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  compiler_f90    = None
  compiler_fix    = None
  libraries       = ['g2c']
  library_dirs    = []
  linker_exe      = ['/usr/bin/g77', '-g', '-Wall', '-g', '-Wall']
  linker_so       = ['/usr/bin/g77', '-g', '-Wall', '-g', '-Wall', '-shared']
  object_switch   = '-o '
  ranlib          = ['/usr/bin/g77']
  version         = LooseVersion ('3.4.3')
  version_cmd     = ['/usr/bin/g77', '--version']
Gnu95FCompiler instance properties:
  archiver        = ['/usr/bin/gfortran', '-cr']
  compile_switch  = '-c'
  compiler_f77    = ['/usr/bin/gfortran', '-Wall', '-ffixed-form', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  compiler_f90    = ['/usr/bin/gfortran', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  compiler_fix    = ['/usr/bin/gfortran', '-Wall', '-ffixed-form', '-fno-second-underscore', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  libraries       = ['gfortran']
  library_dirs    = []
  linker_exe      = ['/usr/bin/gfortran', '-Wall', '-Wall']
  linker_so       = ['/usr/bin/gfortran', '-Wall', '-Wall', '-shared']
  object_switch   = '-o '
  ranlib          = ['/usr/bin/gfortran']
  version         = LooseVersion ('4.0.0')
  version_cmd     = ['/usr/bin/gfortran', '--version']
Fortran compilers found:
  --fcompiler=gnu    GNU Fortran 77 compiler (3.4.3)
  --fcompiler=gnu95  GNU Fortran 95 compiler (4.0.0)
Compilers available for this platform, but not found:
  --fcompiler=absoft   Absoft Corp Fortran Compiler
  --fcompiler=compaq   Compaq Fortran Compiler
  --fcompiler=g95      G95 Fortran Compiler
  --fcompiler=intel    Intel Fortran Compiler for 32-bit apps
  --fcompiler=intele   Intel Fortran Compiler for Itanium apps
  --fcompiler=intelem  Intel Fortran Compiler for EM64T-based apps
  --fcompiler=lahey    Lahey/Fujitsu Fortran 95 Compiler
  --fcompiler=nag      NAGWare Fortran 95 Compiler
  --fcompiler=pg       Portland Group Fortran Compiler
  --fcompiler=vast     Pacific-Sierra Research Fortran 90 Compiler
Compilers not available on this platform:
  --fcompiler=hpux     HP Fortran 90 Compiler
  --fcompiler=ibm      IBM XL Fortran Compiler
  --fcompiler=intelev  Intel Visual Fortran Compiler for Itanium apps
  --fcompiler=intelv   Intel Visual Fortran Compiler for 32-bit apps
  --fcompiler=mips     MIPSpro Fortran Compiler
  --fcompiler=none     Fake Fortran compiler
  --fcompiler=sun      Sun or Forte Fortran 95 Compiler
For compiler details, run 'config_fc --verbose' setup command.
------
Importing numpy.distutils.cpuinfo ... ok
------
CPU information: CPUInfoBase__get_nbits getNCPUs has_mmx has_sse has_sse2 has_sse3 is_32bit is_Intel is_Nocona is_XEON is_Xeon is_singleCPU
------
[root at giswk002 local]#

Thanks a lot in advance!!!

Regards,
Padma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From christopher.strickland at qut.edu.au Thu May 27 07:22:54 2010
From: christopher.strickland at qut.edu.au (Chris Strickland)
Date: Thu, 27 May 2010 21:22:54 +1000
Subject: [SciPy-User] log pdf, cdf, etc
Message-ID: <201005272122.54911.christopher.strickland@qut.edu.au>

Hi,

When using any of the distributions of scipy.stats there does not seem to be
the ability (or at least I cannot figure out how) to have the function return
the log of the pdf, cdf, sf, etc. For statistical analysis this is essential.
For instance, suppose we are interested in an exponential distribution for a
random variable x with a hyperparameter lambda; there needs to be an option
that returns -log(lambda) - x/lambda. It is not sufficient (numerically) to
calculate log(scipy.stats.expon.pdf(x,lambda)).

Is there a way to do this using the distributions in scipy.stats? If there is
not, is it possible for me to suggest that this feature is added? There is
such an excellent range of distributions, each with such an impressive range
of options, that it seems a shame to have to mostly code up the log of pdfs
by hand and often call the log of CDFs from R.

Thanks,
Chris.

From thoeger at fys.ku.dk Fri May 28 09:29:27 2010
From: thoeger at fys.ku.dk (=?ISO-8859-1?Q?Th=F8ger?= Emil Juul Thorsen)
Date: Fri, 28 May 2010 15:29:27 +0200
Subject: [SciPy-User] matplotlib woes
Message-ID: <1275053367.1431.7.camel@falconeer>

Hello SciPy list;

For my thesis I have an image which is also a spectrum of an object. I
want to plot the image using imshow along with a data plot of the
intensity, as can be seen on http://yfrog.com/0tforscipylistp .

I have two questions:

1) imshow() sets the ticks on the two upper subplots as pixel
coordinates. What I want to show as tick labels on my x-axis is the
wavelength coordinates of the lower plot on the upper images (since
there is a straightforward pixel-to-wavelength conversion). I have
googled everywhere but can't seem to find a solution; is it possible?

2) Is there any possible way to make the subplots layout look a bit
nicer? Ideally to squeeze the two upper plots closer together and
stretch the lower plot vertically, or at least to make the two upper
subplots take up an equal amount of space?

Best regards;

Emil, python-newb and (former) IDL-user, Master student of Astrophysics
at the University of Copenhagen, Niels Bohr Institute.
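On the two matplotlib questions just above, the usual approach is to let imshow map pixels onto wavelength through its extent keyword (question 1) and to place the three axes by hand with add_axes so their heights and spacing can differ (question 2). The sketch below uses made-up data and made-up calibration constants wl0 and dwl (wavelength of the first pixel and wavelength step per pixel); substitute the real wavelength solution.

import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for the 2-D spectrum image and the extracted 1-D intensity.
image = np.random.rand(40, 400)
wl0, dwl = 4000.0, 1.5                                # hypothetical calibration
wavelengths = wl0 + dwl * np.arange(image.shape[1])
intensity = image.sum(axis=0)

fig = plt.figure()
# Axes placed by hand: two thin image panels close together on top and one
# taller data panel below.  Rectangles are [left, bottom, width, height] in
# figure fractions.
ax1 = fig.add_axes([0.10, 0.80, 0.85, 0.13])
ax2 = fig.add_axes([0.10, 0.65, 0.85, 0.13])
ax3 = fig.add_axes([0.10, 0.10, 0.85, 0.45])

# extent=(left, right, bottom, top) puts the image into wavelength
# coordinates, so the x tick labels become wavelengths instead of pixels.
ext = (wavelengths[0], wavelengths[-1], 0, image.shape[0])
ax1.imshow(image, aspect='auto', extent=ext)
ax2.imshow(image, aspect='auto', extent=ext)
ax3.plot(wavelengths, intensity)
ax3.set_xlabel('Wavelength')

plt.show()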
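Coming back to the "log pdf, cdf, etc" message above: until scipy.stats offers log variants of these methods directly, the usual workaround is to write the log density itself rather than taking log(pdf(...)), so that small densities do not underflow to zero before the log is applied. Below is a minimal sketch for the exponential case; expon_logpdf is just a hypothetical helper name, and lam plays the role of the lambda hyperparameter in that message.

import numpy as np
from scipy import stats

def expon_logpdf(x, lam):
    # log f(x) = -log(lam) - x/lam for x >= 0, written directly so nothing
    # underflows before the log is taken.
    x = np.asarray(x, dtype=float)
    out = -np.log(lam) - x / lam
    return np.where(x >= 0, out, -np.inf)

x = np.array([0.5, 50.0, 5000.0])
lam = 2.0

print expon_logpdf(x, lam)
# For moderate x this agrees with log(pdf(x)), but for large x the pdf
# underflows to 0.0 and the naive version returns -inf:
print np.log(stats.expon.pdf(x, scale=lam))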
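And on the earlier "Triangular Distribution ppf method" message: scipy.stats.triang takes a shape parameter c between 0 and 1 (the position of the mode as a fraction of the support) plus the usual loc and scale. Because it is a location-scale family, the shift and stretch can also be applied by hand, which gives a workaround to compare against if passing a loc of 1 or more directly misbehaves on a given scipy version. The numbers below are made up for illustration.

import numpy as np
from scipy import stats

# A triangular distribution on [a, b] with mode m maps onto scipy.stats as
#   triang(c, loc=a, scale=b - a)  with  c = (m - a) / (b - a).
a, m, b = 1.0, 2.0, 4.0            # support starting at loc = 1
c = (m - a) / (b - a)
q = np.array([0.1, 0.5, 0.9])

# Quantiles asked for directly with loc/scale ...
direct = stats.triang.ppf(q, c, loc=a, scale=b - a)

# ... and the same quantiles built from the standard distribution on [0, 1],
# shifted and scaled by hand.  The two should agree; if they do not, the
# hand-shifted version is a safe workaround.
shifted = a + (b - a) * stats.triang.ppf(q, c)

print direct
print shifted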