From rob.clewley at gmail.com Sat May 1 11:36:05 2010 From: rob.clewley at gmail.com (Rob Clewley) Date: Sat, 1 May 2010 11:36:05 -0400 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: Hi Tom, On Fri, Apr 30, 2010 at 4:38 PM, Charrett, Thomas wrote: > Hello all, > I would like to announce a interactive environment for python, PythonToolkit, or PTK, is loosely based on the MATLAB gui interface and is written in wxpython.: > http://pythontoolkit.sourceforge.net This looks like something I might want to teach with too. I hope to be able to try this out soon as I try to teach with python in my classes. I would benefit from an easy to learn and use graphical interactive environment for my students (some of whom will already be familiar with matlab). I look forward to seeing some tutorial examples to show off how it might be used in a teaching session. But most pressingly you need to provide information on your web page about what steps need to be taken to "install" it and what version restrictions there are. I cannot see any information in the download zip file either. I have wxPython installed and I'm on a Mac OS X 10.4 with Python 2.4. It looks like there is no setup.py and so maybe PTK.pyw is supposed to be just run as-is, but I don't know whether the ptk directory is supposed to be dumped into site-packages or be standalone, and whether path environment changes etc. are needed. Running PTK.pyw doesn't work for me: File "/Users/rob/ptk/app/__init__.py", line 18, in ? startdir = __main__.__file__.rpartition(os.sep)[0] AttributeError: 'str' object has no attribute 'rpartition' I think rpartition is only in Python 2.5+ ? -Rob From stefan at sun.ac.za Sat May 1 15:02:40 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 1 May 2010 21:02:40 +0200 Subject: [SciPy-User] watershed question In-Reply-To: <20100429193826.GA9432@phare.normalesup.org> References: <20100425123445.GA25175@phare.normalesup.org> <20100429193826.GA9432@phare.normalesup.org> Message-ID: Hi Emanuelle The CellProfiler team has been very generous in sharing code so far: http://stefanv.github.com/scikits.image/contribute.html#merge-code-provided-by-cellprofiler-team I have just been too busy working on my PhD to get it all integrated :( [Of course, I'd be very happy if someone would help with this task!] I see the svn links on that page is broken; I'll see where the code is hosted now. I'm not sure if the watershed code is covered. Cheers St?fan On 29 April 2010 21:38, Emmanuelle Gouillart wrote: > ? ? ? ?Hi St?fan and Zach, > > ? ? ? ?thank you for your answers. So it seems that > ndimage.watershed_ift is quite buggy, maybe some warnings should be > added to its docstring? We can't afford people spending too much time > trying to use the function if it doesn't work. > > ? ? ? ?Using the cellprofile/cpmath package is a neat trick, I tried it > and it works perfectly. It even works for 3-D arrays (using the > fast_watershed function)! Too bad that it's GPL-licensed and it's not > possible to integrate the code in the image processing scikit :(. > > ? ? ? ?Thanks again, > > ? ? ? ?Emmanuelle > > > > On Wed, Apr 28, 2010 at 01:18:11PM -0400, Zachary Pincus wrote: >> > Unless I'm also missing something obvious, the code returns an invalid >> > result. ?I even adjusted the depths of the two "pits", but always one >> > region overruns the other---not what I would expect to happen. 
?I >> > haven't delved into the ndimage code at all, but I wonder weather we >> > shouldn't implement one of the simpler algorithms as part of >> > scikits.image.segment for comparison? > >> Cellprofiler has a watershed algorithm, I believe. And like most of >> the cellprofiler stuff, the implementation seems pretty high-quality >> and well-thought-out. > >> I wound up extracting the cpmath sub-package, and (after a few >> setup.py changes) it works great standalone with just scipy and numpy >> as dependencies: >> https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/cellprofiler/cpmath/ > >> Zach >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From stefan at sun.ac.za Sat May 1 15:15:22 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 1 May 2010 21:15:22 +0200 Subject: [SciPy-User] watershed question In-Reply-To: References: <20100425123445.GA25175@phare.normalesup.org> <20100429193826.GA9432@phare.normalesup.org> Message-ID: I've updated the links to the repository. Lee, I assume the watershed code covered under the arrangement? Looks like we've got some hands to help! Regards St?fan 2010/5/1 St?fan van der Walt : > Hi Emanuelle > > The CellProfiler team has been very generous in sharing code so far: > > http://stefanv.github.com/scikits.image/contribute.html#merge-code-provided-by-cellprofiler-team > > I have just been too busy working on my PhD to get it all integrated > :( ?[Of course, I'd be very happy if someone would help with this > task!] > > I see the svn links on that page is broken; I'll see where the code is > hosted now. ?I'm not sure if the watershed code is covered. > > Cheers > St?fan > > On 29 April 2010 21:38, Emmanuelle Gouillart > wrote: >> ? ? ? ?Hi St?fan and Zach, >> >> ? ? ? ?thank you for your answers. So it seems that >> ndimage.watershed_ift is quite buggy, maybe some warnings should be >> added to its docstring? We can't afford people spending too much time >> trying to use the function if it doesn't work. >> >> ? ? ? ?Using the cellprofile/cpmath package is a neat trick, I tried it >> and it works perfectly. It even works for 3-D arrays (using the >> fast_watershed function)! Too bad that it's GPL-licensed and it's not >> possible to integrate the code in the image processing scikit :(. >> >> ? ? ? ?Thanks again, >> >> ? ? ? ?Emmanuelle >> >> >> >> On Wed, Apr 28, 2010 at 01:18:11PM -0400, Zachary Pincus wrote: >>> > Unless I'm also missing something obvious, the code returns an invalid >>> > result. ?I even adjusted the depths of the two "pits", but always one >>> > region overruns the other---not what I would expect to happen. ?I >>> > haven't delved into the ndimage code at all, but I wonder weather we >>> > shouldn't implement one of the simpler algorithms as part of >>> > scikits.image.segment for comparison? >> >>> Cellprofiler has a watershed algorithm, I believe. And like most of >>> the cellprofiler stuff, the implementation seems pretty high-quality >>> and well-thought-out. 
>> >>> I wound up extracting the cpmath sub-package, and (after a few >>> setup.py changes) it works great standalone with just scipy and numpy >>> as dependencies: >>> https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/cellprofiler/cpmath/ >> >>> Zach >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From cr.anil at gmail.com Sun May 2 07:36:05 2010 From: cr.anil at gmail.com (Anil C R) Date: Sun, 2 May 2010 17:06:05 +0530 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: This looks good :D great job Tom!! Anil On Sat, May 1, 2010 at 2:08 AM, Charrett, Thomas wrote: > Hello all, > I would like to announce a interactive environment for python, > PythonToolkit, or PTK, is loosely based on the MATLAB gui interface and is > written in wxpython.: > http://pythontoolkit.sourceforge.net > > It started life a personal project for use in day-to-day lab work and > teaching (including myself) python so it is far from complete, but usable. > Key features: > > - A console window with support for muliple isolated python interpreters > (engines) running as external processes, > - External (process) engines allow interactive use of GUI toolkits > (currently wxPython and Tk) > - Full object inspection, auto-completions and call tips in internal and > external engines. > - Matlab style namespace/workspace browser than can be extended/customised > for different python types. > - GUI views for strings, unicode, lists and numpy arrays (more can be > easily added) > - Python path management . > - A simple python code editor. > - Searchable command history that is stored between sessions. > - Extendible via a tool plugin system. > > I would be interested in any comments/suggestion/feedback. > > Thanks, > > Tom > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cr.anil at gmail.com Sun May 2 07:43:48 2010 From: cr.anil at gmail.com (Anil C R) Date: Sun, 2 May 2010 17:13:48 +0530 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: Rob, the requirements says it needs Python 2.6+ http://pythontoolkit.sourceforge.net/about.html Anil On Sun, May 2, 2010 at 5:06 PM, Anil C R wrote: > This looks good :D great job Tom!! > > Anil > > > > On Sat, May 1, 2010 at 2:08 AM, Charrett, Thomas < > t.charrett at cranfield.ac.uk> wrote: > >> Hello all, >> I would like to announce a interactive environment for python, >> PythonToolkit, or PTK, is loosely based on the MATLAB gui interface and is >> written in wxpython.: >> http://pythontoolkit.sourceforge.net >> >> It started life a personal project for use in day-to-day lab work and >> teaching (including myself) python so it is far from complete, but usable. 
>> Key features: >> >> - A console window with support for muliple isolated python interpreters >> (engines) running as external processes, >> - External (process) engines allow interactive use of GUI toolkits >> (currently wxPython and Tk) >> - Full object inspection, auto-completions and call tips in internal and >> external engines. >> - Matlab style namespace/workspace browser than can be extended/customised >> for different python types. >> - GUI views for strings, unicode, lists and numpy arrays (more can be >> easily added) >> - Python path management . >> - A simple python code editor. >> - Searchable command history that is stored between sessions. >> - Extendible via a tool plugin system. >> >> I would be interested in any comments/suggestion/feedback. >> >> Thanks, >> >> Tom >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cr.anil at gmail.com Sun May 2 07:54:13 2010 From: cr.anil at gmail.com (Anil C R) Date: Sun, 2 May 2010 17:24:13 +0530 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment In-Reply-To: References: Message-ID: Tom, are you planning on integrating ipython too? I think that would be nice. Also I use matplotlib, and every time i need to do an imshow, I need to do something like this: img = imread('image.png') imshow(img),show() is this a problem with matplotlib or with your software?? any workarounds to avoid the show() call? Thanks Anil -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Mon May 3 06:04:51 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 3 May 2010 12:04:51 +0200 Subject: [SciPy-User] deterministic random variable Message-ID: Hi, As far as I can see scipy.stats does not support the deterministic distribution. Would it be a good idea to implement this also? In my opinion this distribution is very useful to use as a test case, for debugging purposes for instance. bye Nicky From josef.pktd at gmail.com Mon May 3 09:16:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 3 May 2010 09:16:54 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: > Hi, > > As far as I can see scipy.stats does not support the deterministic > distribution. Would it be a good idea to implement this also? In my > opinion this distribution is very useful to use as a test case, for > debugging purposes for instance. You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution (I never heard the term deterministic distribution before). If the support is an integer, then rv_discrete might work, looks good see below Are there any useful operations, that we could do with it? I think I can see a case for debugging programs that use the distributions in scipy.stats, but almost degenerate might also work for debugging. What I would like to have is a discrete distribution on the real line, instead of the integers, like rv_discrete but with support on arbitrary floats. This could use the machinery of rv_discrete but would need a generalizing rewrite. 
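As a rough sketch of the kind of thing I mean (a hypothetical helper, not
anything that exists in scipy.stats), float support can already be faked by
letting rv_discrete draw integer indices and mapping them back onto the
float values:

import numpy as np
from scipy import stats

def float_discrete(support, probs, name='float_discrete'):
    # hypothetical wrapper: rv_discrete handles the integer indices
    # 0..k-1, which are then mapped back onto the float support points
    support = np.asarray(support, dtype=float)
    probs = np.asarray(probs, dtype=float)
    idx = stats.rv_discrete(values=(np.arange(len(support)), probs), name=name)
    def rvs(size=1):
        return support[idx.rvs(size=size)]
    def cdf(x):
        return np.array([probs[support <= xi].sum() for xi in np.atleast_1d(x)])
    return rvs, cdf

rvs, cdf = float_discrete([0.5, 1.25, 3.0], [0.2, 0.5, 0.3])
print rvs(size=10)
print cdf([0.0, 0.5, 2.0, 3.0])

The degenerate case would then just be float_discrete([2.5], [1.0]).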
this looks good >>> stats.rv_discrete(values=([0],[1]), name='degenerate') >>> deg=stats.rv_discrete(values=([0],[1]), name='degenerate') >>> deg.rvs(size=10) array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>> deg.pmf(np.arange(-5,6)) array([ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]) >>> deg.cdf(np.arange(-5,6)) array([ 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]) >>> deg.sf(np.arange(-5,6)) array([ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.]) >>> deg.ppf(np.linspace(0,1,11)) array([-1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) >>> deg.stats() (array(0.0), array(0.0)) >>> deg.stats(moments='mvsk') (array(0.0), array(0.0), array(-1.#IND), array(-1.#IND)) degenerate Bernoulli has a nan problem in pmf >>> stats.bernoulli.rvs(0,size=10) array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>> stats.bernoulli.pmf(np.arange(-5,6),0.) array([ 0., 0., 0., 0., 0., NaN, 0., 0., 0., 0., 0.]) >>> stats.bernoulli.cdf(np.arange(-5,6),0.) array([ 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]) >>> stats.bernoulli.pmf(np.arange(-5,6),1.) array([ 0., 0., 0., 0., 0., 0., NaN, 0., 0., 0., 0.]) >>> stats.bernoulli.ppf(np.linspace(0,1,11),0.) array([-1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]) >>> stats.bernoulli.ppf(np.linspace(0,1,11),1.) array([-1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) >>> stats.bernoulli.stats(0., moments='mvsk') (array(0.0), array(0.0), array(1.#INF), array(1.#INF)) and almost degenerate Bernoulli >>> stats.bernoulli.pmf(np.arange(-5,6),1e-16) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 1.00000000e-16, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.bernoulli.pmf(np.arange(-5,6),1-1e-16) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.11022302e-16, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.bernoulli.ppf(np.linspace(0,1,11),1-1e-16) array([-1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) Josef > > bye > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon May 3 09:35:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 3 May 2010 09:35:42 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 3, 2010 at 9:16 AM, wrote: > On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: >> Hi, >> >> As far as I can see scipy.stats does not support the deterministic >> distribution. Would it be a good idea to implement this also? In my >> opinion this distribution is very useful to use as a test case, for >> debugging purposes for instance. > > You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution > (I never heard the term deterministic distribution before). > > If the support is an integer, then rv_discrete might work, looks good see below > > Are there any useful operations, that we could do with it? > I think I can see a case for debugging programs that use the > distributions in scipy.stats, but almost degenerate might also work > for debugging. > > What I would like to have is a discrete distribution on the real line, > instead of the integers, like rv_discrete but with support on > arbitrary floats. This could use the machinery of rv_discrete but > would need a generalizing rewrite. 
> > > this looks good > >>>> stats.rv_discrete(values=([0],[1]), name='degenerate') > >>>> deg=stats.rv_discrete(values=([0],[1]), name='degenerate') >>>> deg.rvs(size=10) > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>>> deg.pmf(np.arange(-5,6)) > array([ 0., ?0., ?0., ?0., ?0., ?1., ?0., ?0., ?0., ?0., ?0.]) >>>> deg.cdf(np.arange(-5,6)) > array([ 0., ?0., ?0., ?0., ?0., ?1., ?1., ?1., ?1., ?1., ?1.]) >>>> deg.sf(np.arange(-5,6)) > array([ 1., ?1., ?1., ?1., ?1., ?0., ?0., ?0., ?0., ?0., ?0.]) >>>> deg.ppf(np.linspace(0,1,11)) > array([-1., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0.]) >>>> deg.stats() > (array(0.0), array(0.0)) >>>> deg.stats(moments='mvsk') > (array(0.0), array(0.0), array(-1.#IND), array(-1.#IND)) > > > degenerate Bernoulli has a nan problem in pmf > >>>> stats.bernoulli.rvs(0,size=10) > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>>> stats.bernoulli.pmf(np.arange(-5,6),0.) > array([ ?0., ? 0., ? 0., ? 0., ? 0., ?NaN, ? 0., ? 0., ? 0., ? 0., ? 0.]) >>>> stats.bernoulli.cdf(np.arange(-5,6),0.) > array([ 0., ?0., ?0., ?0., ?0., ?1., ?1., ?1., ?1., ?1., ?1.]) >>>> stats.bernoulli.pmf(np.arange(-5,6),1.) > array([ ?0., ? 0., ? 0., ? 0., ? 0., ? 0., ?NaN, ? 0., ? 0., ? 0., ? 0.]) >>>> stats.bernoulli.ppf(np.linspace(0,1,11),0.) > array([-1., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?1.]) >>>> stats.bernoulli.ppf(np.linspace(0,1,11),1.) > array([-1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1.]) >>>> stats.bernoulli.stats(0., moments='mvsk') > (array(0.0), array(0.0), array(1.#INF), array(1.#INF)) > > > and almost degenerate Bernoulli > >>>> stats.bernoulli.pmf(np.arange(-5,6),1e-16) > array([ ?0.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00, ? 1.00000000e+00, > ? ? ? ? 1.00000000e-16, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00]) >>>> stats.bernoulli.pmf(np.arange(-5,6),1-1e-16) > array([ ?0.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00, ? 1.11022302e-16, > ? ? ? ? 1.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? 0.00000000e+00, ? 0.00000000e+00]) >>>> stats.bernoulli.ppf(np.linspace(0,1,11),1-1e-16) > array([-1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1., ?1.]) for the record (and future searches) almost degenerate normal also seems to work, http://en.wikipedia.org/wiki/Dirac_delta_function >>> stats.norm.rvs(loc=2.5, scale=1e-10, size=10) array([ 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5]) >>> stats.norm.cdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-10) array([ 0. , 0. , 0. , 0. , 0. , 0.5, 1. , 1. , 1. , 1. , 1. ]) >>> stats.norm.pdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-10) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.98942280e+09, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.norm.cdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-16) array([ 0. , 0. , 0. , 0. , 0. , 0.5, 1. , 1. , 1. , 1. , 1. 
]) >>> stats.norm.pdf(np.linspace(2.1,2.9,11),loc=2.5, scale=1e-16) array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.98942280e+15, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> stats.norm.ppf(np.linspace(0,1,11),loc=2.5, scale=1e-16) array([-Inf, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, Inf]) >>> Josef > > Josef > >> >> bye >> >> Nicky >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From amcmorl at gmail.com Mon May 3 12:49:44 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 3 May 2010 12:49:44 -0400 Subject: [SciPy-User] loadmat, savemat and strings Message-ID: Hi all, I'm running in to problems trying to save some metadata in a matlab file variable written with savemat, and read in with loadmat. I've condensed the problem to the following: import numpy as np from scipy import io a = np.array(['HelloWorld', 'Foobar']) io.savemat('tmp.mat', dict(a=a)) res = io.loadmat('tmp.mat') print res['a'] -> array([u'HloolWw', u'elWrdo'], dtype='U10')) -> array([u'\U48000000\U6c000000\U6f000000\U6f000000\U6c000000\U57000000\U77000000', u'\U65000000\U6c000000\U57000000\U72000000\U64000000\U6f000000'], dtype='>U10') and res['a'].byteswap() gives the same result. Finally, I've tried coercing the input into a unicode type before saving... print a.astype('U') ->array([u'HelloWorld', u'Wow'], dtype=' 111 matfile_dict = MR.get_variables() 112 if mdict is not None: 113 mdict.update(matfile_dict) /usr/lib/python2.6/dist-packages/scipy/io/matlab/miobase.pyc in get_variables(self, variable_names) 359 getter.to_next() 360 continue --> 361 res = getter.get_array() 362 mdict[name] = res 363 if getter.is_global: /usr/lib/python2.6/dist-packages/scipy/io/matlab/miobase.pyc in get_array(self) 400 def get_array(self): 401 ''' Gets an array from matrix, and applies any necessary processing ''' --> 402 arr = self.get_raw_array() 403 return self.array_reader.processor_func(arr, self) 404 /usr/lib/python2.6/dist-packages/scipy/io/matlab/mio5.pyc in get_raw_array(self) 442 dtype=np.dtype('U1'), 443 buffer=np.array(res), --> 444 order='F').copy() 445 446 TypeError: buffer is too small for requested array Is this a bug (I guess it shouldn't throw errors quite like this in any case), and is there a successful method for saving string types into matlab files and retrieving them? Thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon May 3 13:32:08 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 May 2010 10:32:08 -0700 Subject: [SciPy-User] loadmat, savemat and strings In-Reply-To: References: Message-ID: Hi, > from scipy import io > a = np.array(['HelloWorld', 'Foobar']) > io.savemat('tmp.mat', dict(a=a)) > res = io.loadmat('tmp.mat') > print res['a'] > -> > array([u'HloolWw', u'elWrdo'], > ?? ? ?dtype=' io.savemat('tmp.mat', dict(a=a.astype('U'))) > res = io.loadmat('tmp.mat') > TypeError: buffer is too small for requested array > Is this a bug (I guess it shouldn't throw errors quite like this in any > case), and is there a successful method for saving string types into matlab > files and retrieving them? 
Certainly a bug - I'll have a look later today, thanks for the report, Best, Matthew From t.charrett at cranfield.ac.uk Mon May 3 14:39:29 2010 From: t.charrett at cranfield.ac.uk (Charrett, Thomas) Date: Mon, 3 May 2010 19:39:29 +0100 Subject: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, Message-ID: Anil, Thanks for the comments. The show() call is 'feature' of matplotlib, to avoid slow redraws when constructing figures. To get rid of it you can put pylab into interactive mode using pylab.ion() (pylab.ioff() does the opposite) or by editing your matplotlib configuration files. IPython may well turn it on automatically. You will also probably need to make sure you are using the correct matplotlib backend for the engine type - for internal/ wxExternal engines this is one of the wx backends, for the TkExternal engine use the Tk backend etc... As for integrating IPython probably not, but I may decide to add the same magic commands that IPython uses, I'm not sure yet as some of them seem a bit pointless, %run = python execfile command, and others are replaced by the gui , such as %who/%whos. Tom ------------------------------ Message: 3 Date: Sun, 2 May 2010 17:24:13 +0530 From: Anil C R Subject: Re: [SciPy-User] [ANN] PythonToolkit (PTK) - open source, wxPython interactive environment To: SciPy Users List Message-ID: Content-Type: text/plain; charset="iso-8859-1" Tom, are you planning on integrating ipython too? I think that would be nice. Also I use matplotlib, and every time i need to do an imshow, I need to do something like this: img = imread('image.png') imshow(img),show() is this a problem with matplotlib or with your software?? any workarounds to avoid the show() call? Thanks Anil From vanforeest at gmail.com Mon May 3 15:32:02 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 3 May 2010 21:32:02 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: Hi Josef, Thanks for your answer. On 3 May 2010 15:16, wrote: > On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: >> Hi, >> >> As far as I can see scipy.stats does not support the deterministic >> distribution. Would it be a good idea to implement this also? In my >> opinion this distribution is very useful to use as a test case, for >> debugging purposes for instance. One case is the M/D/1 queue, a single server with exponentially distributed interarrival times and deterministic service times. Another case is an inventory system with periodic replenishments, and random demands. A first simple model would be to use deterministically distributed interreplenishment times. The size of demand can also be taken to be deterministic, as an interesting limiting case. > > You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution > (I never heard the term deterministic distribution before). Yes. > > If the support is an integer, then rv_discrete might work, looks good see below > > Are there any useful operations, that we could do with it? Yes, like simulating the M/D/1 queue. Suppose I would like to build a queueing simulator. I would like to set this up in a generic way, and pass rv_arrival and rv_service as frozen rvs, Like this I can experiment with several distributions, including the deterministic distribution as a limiting case or simple case, all within the same framework. > I think I can see a case for debugging programs that use the > distributions in scipy.stats, but almost degenerate might also work > for debugging. 
Sure, but sometimes you just want to exclude random effects. Moreover, I would like to see "rv = stats.deterministic(...)" in the code, for the purpose of readability. > > What I would like to have is a discrete distribution on the real line, > instead of the integers, like rv_discrete but with support on > arbitrary floats. Yes, indeed. Please let me know your opinion. bye Nicky From thomas.robitaille at gmail.com Mon May 3 17:02:27 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Mon, 3 May 2010 17:02:27 -0400 Subject: [SciPy-User] Serious issue with interp2d Message-ID: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> Hi, I'm having issues getting interp2d to work for even simple arrays with linear interpolation. The following example: import numpy as np from scipy.interpolate import interp2d # Create image nx, ny = 10, 10 image = np.zeros((nx,ny)) for i in range(nx): for j in range(ny): image[i,j] = float(i + j/2) # Define pixel centers along each direction x = np.linspace(0.5, float(nx) - 0.5, nx) y = np.linspace(0.5, float(ny) - 0.5, ny) # Create interpolating function f = interp2d(x,y,image, kind='linear') Returns the following warning Warning: No more knots can be added because the number of B-spline coefficients already exceeds the number of data points m. Probably causes: either s or m too small. (fp>s) kx,ky=1,1 nx,ny=13,12 m=100 fp=0.000000 s=0.000000 and some of the results using the interpolating function are wrong. What is going on? I don't understand why spline coefficients are even mentioned, because I specified that I just wanted linear interpolation. Can anyone reproduce this issue? I'm using scipy svn r6368. Thanks, Thomas From matthew.brett at gmail.com Tue May 4 02:48:16 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 May 2010 23:48:16 -0700 Subject: [SciPy-User] loadmat, savemat and strings In-Reply-To: References: Message-ID: Hi, >> from scipy import io >> a = np.array(['HelloWorld', 'Foobar']) >> io.savemat('tmp.mat', dict(a=a)) >> res = io.loadmat('tmp.mat') >> print res['a'] >> -> >> array([u'HloolWw', u'elWrdo'], >> ?? ? ?dtype=' >> io.savemat('tmp.mat', dict(a=a.astype('U'))) >> res = io.loadmat('tmp.mat') > >> TypeError: buffer is too small for requested array >> Is this a bug (I guess it shouldn't throw errors quite like this in any >> case), and is there a successful method for saving string types into matlab >> files and retrieving them? > > Certainly a bug - I'll have a look later today, thanks for the report, The two bugs you found should be fixed in latest SVN... Best, Matthew From chris at simplistix.co.uk Tue May 4 07:00:05 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 04 May 2010 12:00:05 +0100 Subject: [SciPy-User] problems with build Message-ID: <4BDFFE35.8060107@simplistix.co.uk> Hi All, Now that I've finally managed to subscribe to this list, I haev a question about installation of numpy and scipy. 
So, I tried this to get the latest numpy installed on an Ubuntu box: sudo apt-get build-dep python-numpy Then, inside the virtual_env I'm working in: bin/easy_install bin/easy_install numpy ...which left me with: Installed .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg Processing dependencies for numpy Finished processing dependencies for numpy Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs func(*targs, **kargs) File "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", line 248, in clean_up_temporary_directory SystemError: Parent module 'numpy.distutils' not loaded Error in sys.exitfunc: Traceback (most recent call last): File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs func(*targs, **kargs) File "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", line 248, in clean_up_temporary_directory SystemError: Parent module 'numpy.distutils' not loaded ...and yet: $ bin/python Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> Any idea what those weird atexit handlers are supposed to do?! They seem to fire not only when numpy is installed but also when anything that depends on numpy is installed... cheers, Chris From denis-bz-gg at t-online.de Tue May 4 07:33:50 2010 From: denis-bz-gg at t-online.de (denis) Date: Tue, 4 May 2010 04:33:50 -0700 (PDT) Subject: [SciPy-User] Serious issue with interp2d In-Reply-To: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> References: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> Message-ID: <10404874-c937-4653-a3bc-1ac9db69962a@p2g2000yqh.googlegroups.com> Thomas. RectBivariateSpline works in 0.7.1: from __future__ import division import numpy as np from scipy.interpolate import RectBivariateSpline np.set_printoptions( 2, threshold=100, suppress=True ) # .2f # Create image nx, ny = 10, 10 image = np.zeros((nx,ny)) for i in range(nx): for j in range(ny): image[i,j] = i + j/2 # Define pixel centers along each direction x = np.linspace(0.5, nx - 0.5, nx) y = np.linspace(0.5, ny - 0.5, ny) # Create interpolating function f = RectBivariateSpline( x,y,image, kx=1, ky=1 ) # s=0: interpolate # and use it xi = np.linspace(0, nx, 2*nx + 1) yi = np.linspace(0, ny, 2*ny + 1) zi = f( xi, yi ) print "RectBivariateSpline:", zi (There's an open ticket on interp2d 'linear' http://projects.scipy.org/scipy/ticket/898 but to me fitpack is murky / tryitandsee, lacks overview doc.) cheers -- denis From denis-bz-gg at t-online.de Tue May 4 07:33:50 2010 From: denis-bz-gg at t-online.de (denis) Date: Tue, 4 May 2010 04:33:50 -0700 (PDT) Subject: [SciPy-User] Serious issue with interp2d In-Reply-To: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> References: <4F4047BA-7DCB-47E7-A7EB-114A9A8BE699@gmail.com> Message-ID: <10404874-c937-4653-a3bc-1ac9db69962a@p2g2000yqh.googlegroups.com> Thomas. 
RectBivariateSpline works in 0.7.1: from __future__ import division import numpy as np from scipy.interpolate import RectBivariateSpline np.set_printoptions( 2, threshold=100, suppress=True ) # .2f # Create image nx, ny = 10, 10 image = np.zeros((nx,ny)) for i in range(nx): for j in range(ny): image[i,j] = i + j/2 # Define pixel centers along each direction x = np.linspace(0.5, nx - 0.5, nx) y = np.linspace(0.5, ny - 0.5, ny) # Create interpolating function f = RectBivariateSpline( x,y,image, kx=1, ky=1 ) # s=0: interpolate # and use it xi = np.linspace(0, nx, 2*nx + 1) yi = np.linspace(0, ny, 2*ny + 1) zi = f( xi, yi ) print "RectBivariateSpline:", zi (There's an open ticket on interp2d 'linear' http://projects.scipy.org/scipy/ticket/898 but to me fitpack is murky / tryitandsee, lacks overview doc.) cheers -- denis From amcmorl at gmail.com Tue May 4 09:34:42 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Tue, 4 May 2010 09:34:42 -0400 Subject: [SciPy-User] Optimization with smoothing Message-ID: Hi all, I need to do some optimization where one of the parameters is a spline-smoothed 1-d sequence, with, say, 10 values. What's the best way to go about this using scipy (or any other numpy-compatible Python package)? I could imagine using one of the scipy.optimize routines and then smoothing the relevant parameters within the optimization loop, but it would be best if the next iteration's of parameters were chosen from the previous iteration's _smoothed_ parameters rather than their 'non-smooth' predecessors, as it seems like this would keep the optimization better behaved. Is this possible? Thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce.labitt at autoliv.com Tue May 4 15:48:55 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Tue, 4 May 2010 15:48:55 -0400 Subject: [SciPy-User] lstsq error under Windows? Message-ID: I have found an issue with scipy.linalg.lstsq, I think. The following code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should work in WinXP. Here is a minimum example: """==== program testlstsq.py ======================""" from scipy.linalg import eig, lstsq from numpy import angle, zeros, pi, arcsin A = zeros((4,1), dtype='complex') B = zeros((4,1), dtype='complex') A[0,0] = -0.535412460549-2.65798938848e-17j A[1,0] = -0.369432866546-0.131765700574j A[2,0] = -0.222906796932-0.263237285725j A[3,0] = -0.069087096386-0.38609560454j B[0,0] = -0.369432866546-0.131765700574j B[1,0] = -0.222906796932-0.263237285725j B[2,0] = -0.069087096386-0.38609560454j B[3,0] = 0.0882283631514-0.528093039953j try: print 'Got here' phi = lstsq(A,B) print 'Finished lstsq' except: print 'Exception Occurred' else: for a in range(len(phi)): print 'phi[',a,']=',phi[a] w = -angle( eig(phi[0])[0][:] ) d = 0.5 aa = arcsin( w / (2.0*pi) )*180./pi # in degrees print 'aa unsorted =', aa """ ==== end of testlstsq.py ==========================""" If this is run under ipython, or python on Ubuntu, the answer is: In [3]: run testlstsq.py Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aoa unsorted = [-3.780145] If testlstsq.py is run under WinXP-32 one gets the following result: > python testlstsq.py Got here ** On entry to ZGELSS parameter number 12 had an illegal value I think that ZGELSS is in LAPACK. After that, I am in over my head. 
Can someone help with this? Under Windows I am running: WinXP-x86-32 Python(x,y)2.6.2.0 --> Python 2.6.2 numpy 1.3.0 scipy 0.7.1 ipython 0.10 For Linux, Ubuntu 10.04 x86-64 Python 2.6.5 numpy 1.3.0 scipy 0.7.0-2 ipython 0.10-1 Thanks for any and all help! -Bruce
****************************** From josef.pktd at gmail.com Tue May 4 16:10:04 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 May 2010 16:10:04 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 3:48 PM, wrote: > I have found an issue with scipy.linalg.lstsq, I think. ?The following > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it should > work in WinXP. ?Here is a minimum example: > > """==== program testlstsq.py ======================""" > from scipy.linalg import eig, lstsq > from numpy import angle, zeros, pi, arcsin > > A = zeros((4,1), dtype='complex') > B = zeros((4,1), dtype='complex') > > A[0,0] = -0.535412460549-2.65798938848e-17j > A[1,0] = -0.369432866546-0.131765700574j > A[2,0] = -0.222906796932-0.263237285725j > A[3,0] = -0.069087096386-0.38609560454j > > B[0,0] = -0.369432866546-0.131765700574j > B[1,0] = -0.222906796932-0.263237285725j > B[2,0] = -0.069087096386-0.38609560454j > B[3,0] = ?0.0882283631514-0.528093039953j > > try: > ? ?print 'Got here' > ? ?phi = lstsq(A,B) > ? ?print 'Finished lstsq' > except: > ? ?print 'Exception Occurred' > else: > ? ?for a in range(len(phi)): > ? ? ? ?print 'phi[',a,']=',phi[a] > > ? ?w = -angle( eig(phi[0])[0][:] ) > ? ?d = 0.5 > ? ?aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees > ? ?print 'aa unsorted =', aa > """ ==== end of testlstsq.py ==========================""" > > If this is run under ipython, or python on Ubuntu, the answer is: > > In [3]: run testlstsq.py > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aoa unsorted = [-3.780145] > > If testlstsq.py is run under WinXP-32 one gets the following result: >> python testlstsq.py > Got here > ?** On entry to ZGELSS parameter number 12 had an illegal value > > I think that ZGELSS is in LAPACK. ?After that, I am in over my head. ?Can > someone help with this? > > Under Windows I am running: > WinXP-x86-32 > Python(x,y)2.6.2.0 --> Python 2.6.2 > numpy 1.3.0 > scipy 0.7.1 > ipython 0.10 no problem here, windowsXP, python 2.5, numpy 1.4.0, scipy 0.8dev_something Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aa unsorted = [-3.780145] after replacing scipy.linalg with numpy.linalg, same result Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aa unsorted = [-3.780145] Josef > > For Linux, Ubuntu 10.04 x86-64 > Python 2.6.5 > numpy 1.30 > scipy 0.7.0-2 > ipython 0.10-1 > > Thanks for any and all help! > -Bruce > > > > ****************************** > Neither the footer nor anything else in this E-mail is intended to or constitutes an
> ****************************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bruce.labitt at autoliv.com Tue May 4 16:27:16 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Tue, 4 May 2010 16:27:16 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: Message-ID: scipy-user-bounces at scipy.org wrote on 05/04/2010 03:48:55 PM: > I have found an issue with scipy.linalg.lstsq, I think. The following > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should > work in WinXP. Here is a minimum example: > > """==== program testlstsq.py ======================""" > from scipy.linalg import eig, lstsq > from numpy import angle, zeros, pi, arcsin > > A = zeros((4,1), dtype='complex') > B = zeros((4,1), dtype='complex') > > A[0,0] = -0.535412460549-2.65798938848e-17j > A[1,0] = -0.369432866546-0.131765700574j > A[2,0] = -0.222906796932-0.263237285725j > A[3,0] = -0.069087096386-0.38609560454j > > B[0,0] = -0.369432866546-0.131765700574j > B[1,0] = -0.222906796932-0.263237285725j > B[2,0] = -0.069087096386-0.38609560454j > B[3,0] = 0.0882283631514-0.528093039953j > > try: > print 'Got here' > phi = lstsq(A,B) > print 'Finished lstsq' > except: > print 'Exception Occurred' > else: > for a in range(len(phi)): > print 'phi[',a,']=',phi[a] > > w = -angle( eig(phi[0])[0][:] ) > d = 0.5 > aa = arcsin( w / (2.0*pi) )*180./pi # in degrees > print 'aa unsorted =', aa > """ ==== end of testlstsq.py ==========================""" > > If this is run under ipython, or python on Ubuntu, the answer is: > > In [3]: run testlstsq.py > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aoa unsorted = [-3.780145] > > If testlstsq.py is run under WinXP-32 one gets the following result: > > python testlstsq.py > Got here > ** On entry to ZGELSS parameter number 12 had an illegal value > > I think that ZGELSS is in LAPACK. After that, I am in over my head. Can > someone help with this? > > Under Windows I am running: > WinXP-x86-32 > Python(x,y)2.6.2.0 --> Python 2.6.2 > numpy 1.3.0 > scipy 0.7.1 > ipython 0.10 > > For Linux, Ubuntu 10.04 x86-64 > Python 2.6.5 > numpy 1.30 > scipy 0.7.0-2 > ipython 0.10-1 > > Thanks for any and all help! > -Bruce > > Per Josef's observation that scipy.linalg.lstsq and numpy.lstsq behaved the same for him, I tried changing my code to - from scipy.linalg import eig, lstsq + from scipy.linalg import eig + from numpy.linalg import lstsq and reran the code and got - >pythonw -u "testlstsq.py" Got here Finished lstsq phi[ 0 ]= [[ 0.88271111+0.38811028j]] phi[ 1 ]= [-0.04649206-0.01274722j] phi[ 2 ]= 1 phi[ 3 ]= [ 0.84459073] aoa unsorted = [-3.780145] >Exit code: 0 So scipy.linalg.lstsq version 0.7.1 under WinXP-32 faults using the CME and numpy.linalg.lstsq version 1.3.0 does not. It looks like I have an answer for now, and scipy dev's might have a mini-testbench to test against. Thanks for the help! -Bruce ****************************** Neither the footer nor anything else in this E-mail is intended to or constitutes an
****************************** From josef.pktd at gmail.com Tue May 4 16:32:30 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 May 2010 16:32:30 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 4:27 PM, wrote: > scipy-user-bounces at scipy.org wrote on 05/04/2010 03:48:55 PM: > >> I have found an issue with scipy.linalg.lstsq, I think. ?The following >> code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it > should >> work in WinXP. ?Here is a minimum example: >> >> """==== program testlstsq.py ======================""" >> from scipy.linalg import eig, lstsq >> from numpy import angle, zeros, pi, arcsin >> >> A = zeros((4,1), dtype='complex') >> B = zeros((4,1), dtype='complex') >> >> A[0,0] = -0.535412460549-2.65798938848e-17j >> A[1,0] = -0.369432866546-0.131765700574j >> A[2,0] = -0.222906796932-0.263237285725j >> A[3,0] = -0.069087096386-0.38609560454j >> >> B[0,0] = -0.369432866546-0.131765700574j >> B[1,0] = -0.222906796932-0.263237285725j >> B[2,0] = -0.069087096386-0.38609560454j >> B[3,0] = ?0.0882283631514-0.528093039953j >> >> try: >> ? ? print 'Got here' >> ? ? phi = lstsq(A,B) >> ? ? print 'Finished lstsq' >> except: >> ? ? print 'Exception Occurred' >> else: >> ? ? for a in range(len(phi)): >> ? ? ? ? print 'phi[',a,']=',phi[a] >> >> ? ? w = -angle( eig(phi[0])[0][:] ) >> ? ? d = 0.5 >> ? ? aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees >> ? ? print 'aa unsorted =', aa >> """ ==== end of testlstsq.py ==========================""" >> >> If this is run under ipython, or python on Ubuntu, the answer is: >> >> In [3]: run testlstsq.py >> Got here >> Finished lstsq >> phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> phi[ 1 ]= [-0.04649206-0.01274722j] >> phi[ 2 ]= 1 >> phi[ 3 ]= [ 0.84459073] >> aoa unsorted = [-3.780145] >> >> If testlstsq.py is run under WinXP-32 one gets the following result: >> > python testlstsq.py >> Got here >> ?** On entry to ZGELSS parameter number 12 had an illegal value >> >> I think that ZGELSS is in LAPACK. ?After that, I am in over my head. Can > >> someone help with this? >> >> Under Windows I am running: >> WinXP-x86-32 >> Python(x,y)2.6.2.0 --> Python 2.6.2 >> numpy 1.3.0 >> scipy 0.7.1 >> ipython 0.10 >> >> For Linux, Ubuntu 10.04 x86-64 >> Python 2.6.5 >> numpy 1.30 >> scipy 0.7.0-2 >> ipython 0.10-1 >> >> Thanks for any and all help! >> -Bruce >> >> > > Per Josef's observation that scipy.linalg.lstsq and numpy.lstsq behaved > the same for him, I tried changing my code to > > - from scipy.linalg import eig, lstsq > + from scipy.linalg import eig > + from numpy.linalg import lstsq > > and reran the code and got - >>pythonw -u "testlstsq.py" > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aoa unsorted = [-3.780145] >>Exit code: 0 > > So scipy.linalg.lstsq version 0.7.1 under WinXP-32 faults using the CME > and > numpy.linalg.lstsq version 1.3.0 does not. > > It looks like I have an answer for now, and scipy dev's might have a > mini-testbench to test against. It could be that this is a problem with the transition to python 2.6, that might have gone away in the meantime. I never had problems with python 2.5 and scipy.linalg Josef > > Thanks for the help! > -Bruce > > ****************************** > Neither the footer nor anything else in this E-mail is intended to or constitutes an
> ****************************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bruce.labitt at autoliv.com Tue May 4 16:15:42 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Tue, 4 May 2010 16:15:42 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: Message-ID: scipy-user-bounces at scipy.org wrote on 05/04/2010 04:10:04 PM: > On Tue, May 4, 2010 at 3:48 PM, wrote: > > I have found an issue with scipy.linalg.lstsq, I think. The following > > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should > > work in WinXP. Here is a minimum example: > > > > """==== program testlstsq.py ======================""" > > from scipy.linalg import eig, lstsq > > from numpy import angle, zeros, pi, arcsin > > > > A = zeros((4,1), dtype='complex') > > B = zeros((4,1), dtype='complex') > > > > A[0,0] = -0.535412460549-2.65798938848e-17j > > A[1,0] = -0.369432866546-0.131765700574j > > A[2,0] = -0.222906796932-0.263237285725j > > A[3,0] = -0.069087096386-0.38609560454j > > > > B[0,0] = -0.369432866546-0.131765700574j > > B[1,0] = -0.222906796932-0.263237285725j > > B[2,0] = -0.069087096386-0.38609560454j > > B[3,0] = 0.0882283631514-0.528093039953j > > > > try: > > print 'Got here' > > phi = lstsq(A,B) > > print 'Finished lstsq' > > except: > > print 'Exception Occurred' > > else: > > for a in range(len(phi)): > > print 'phi[',a,']=',phi[a] > > > > w = -angle( eig(phi[0])[0][:] ) > > d = 0.5 > > aa = arcsin( w / (2.0*pi) )*180./pi # in degrees > > print 'aa unsorted =', aa > > """ ==== end of testlstsq.py ==========================""" > > > > If this is run under ipython, or python on Ubuntu, the answer is: > > > > In [3]: run testlstsq.py > > Got here > > Finished lstsq > > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > > phi[ 1 ]= [-0.04649206-0.01274722j] > > phi[ 2 ]= 1 > > phi[ 3 ]= [ 0.84459073] > > aoa unsorted = [-3.780145] > > > > If testlstsq.py is run under WinXP-32 one gets the following result: > >> python testlstsq.py > > Got here > > ** On entry to ZGELSS parameter number 12 had an illegal value > > > > I think that ZGELSS is in LAPACK. After that, I am in over my head. Can > > someone help with this? > > > > Under Windows I am running: > > WinXP-x86-32 > > Python(x,y)2.6.2.0 --> Python 2.6.2 > > numpy 1.3.0 > > scipy 0.7.1 > > ipython 0.10 > > no problem here, windowsXP, python 2.5, numpy 1.4.0, scipy 0.8dev_something Is this WinXP-32? I notice you have an earlier python, and a later version of numpy and scipy. I thought numpy 1.4.0 was "recalled" a while back. I'll try numpy.linalg to see if there is a difference. > > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aa unsorted = [-3.780145] > > after replacing scipy.linalg with numpy.linalg, same result > > Got here > Finished lstsq > phi[ 0 ]= [[ 0.88271111+0.38811028j]] > phi[ 1 ]= [-0.04649206-0.01274722j] > phi[ 2 ]= 1 > phi[ 3 ]= [ 0.84459073] > aa unsorted = [-3.780145] > > Josef Thanks for trying this! -Bruce > > > > > For Linux, Ubuntu 10.04 x86-64 > > Python 2.6.5 > > numpy 1.30 > > scipy 0.7.0-2 > > ipython 0.10-1 > > > > Thanks for any and all help! > > -Bruce > > > > > > ****************************** Neither the footer nor anything else in this E-mail is intended to or constitutes an
****************************** From josef.pktd at gmail.com Tue May 4 17:22:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 May 2010 17:22:55 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 4:15 PM, wrote: > scipy-user-bounces at scipy.org wrote on 05/04/2010 04:10:04 PM: > >> On Tue, May 4, 2010 at 3:48 PM, ? wrote: >> > I have found an issue with scipy.linalg.lstsq, I think. ?The following >> > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it > should >> > work in WinXP. ?Here is a minimum example: >> > >> > """==== program testlstsq.py ======================""" >> > from scipy.linalg import eig, lstsq >> > from numpy import angle, zeros, pi, arcsin >> > >> > A = zeros((4,1), dtype='complex') >> > B = zeros((4,1), dtype='complex') >> > >> > A[0,0] = -0.535412460549-2.65798938848e-17j >> > A[1,0] = -0.369432866546-0.131765700574j >> > A[2,0] = -0.222906796932-0.263237285725j >> > A[3,0] = -0.069087096386-0.38609560454j >> > >> > B[0,0] = -0.369432866546-0.131765700574j >> > B[1,0] = -0.222906796932-0.263237285725j >> > B[2,0] = -0.069087096386-0.38609560454j >> > B[3,0] = ?0.0882283631514-0.528093039953j >> > >> > try: >> > ? ?print 'Got here' >> > ? ?phi = lstsq(A,B) >> > ? ?print 'Finished lstsq' >> > except: >> > ? ?print 'Exception Occurred' >> > else: >> > ? ?for a in range(len(phi)): >> > ? ? ? ?print 'phi[',a,']=',phi[a] >> > >> > ? ?w = -angle( eig(phi[0])[0][:] ) >> > ? ?d = 0.5 >> > ? ?aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees >> > ? ?print 'aa unsorted =', aa >> > """ ==== end of testlstsq.py ==========================""" >> > >> > If this is run under ipython, or python on Ubuntu, the answer is: >> > >> > In [3]: run testlstsq.py >> > Got here >> > Finished lstsq >> > phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> > phi[ 1 ]= [-0.04649206-0.01274722j] >> > phi[ 2 ]= 1 >> > phi[ 3 ]= [ 0.84459073] >> > aoa unsorted = [-3.780145] >> > >> > If testlstsq.py is run under WinXP-32 one gets the following result: >> >> python testlstsq.py >> > Got here >> > ?** On entry to ZGELSS parameter number 12 had an illegal value >> > >> > I think that ZGELSS is in LAPACK. ?After that, I am in over my head. > ?Can >> > someone help with this? >> > >> > Under Windows I am running: >> > WinXP-x86-32 >> > Python(x,y)2.6.2.0 --> Python 2.6.2 >> > numpy 1.3.0 >> > scipy 0.7.1 >> > ipython 0.10 >> >> no problem here, windowsXP, python 2.5, numpy 1.4.0, scipy > 0.8dev_something > > Is this WinXP-32? Yes > > I notice you have an earlier python, and a later version of numpy and > scipy. ?I thought numpy 1.4.0 was "recalled" a while back. ?I'll try > numpy.linalg to see if there is a difference. I recompiled most packages after numpy 1.4.0 came out and I'm too lazy or too busy to figure out what I need to recompile to switch to numpy 1.4.1. (I just avoid the things that crash with 1.4.0) Josef > >> >> Got here >> Finished lstsq >> phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> phi[ 1 ]= [-0.04649206-0.01274722j] >> phi[ 2 ]= 1 >> phi[ 3 ]= [ 0.84459073] >> aa unsorted = [-3.780145] >> >> after replacing scipy.linalg with numpy.linalg, same result >> >> Got here >> Finished lstsq >> phi[ 0 ]= [[ 0.88271111+0.38811028j]] >> phi[ 1 ]= [-0.04649206-0.01274722j] >> phi[ 2 ]= 1 >> phi[ 3 ]= [ 0.84459073] >> aa unsorted = [-3.780145] >> >> Josef > > Thanks for trying this! 
> -Bruce > >> >> > >> > For Linux, Ubuntu 10.04 x86-64 >> > Python 2.6.5 >> > numpy 1.30 >> > scipy 0.7.0-2 >> > ipython 0.10-1 >> > >> > Thanks for any and all help! >> > -Bruce >> > >> > >> > > > > > ****************************** > Neither the footer nor anything else in this E-mail is intended to or constitutes an
> ****************************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From denis-bz-gg at t-online.de Wed May 5 07:23:54 2010 From: denis-bz-gg at t-online.de (denis) Date: Wed, 5 May 2010 04:23:54 -0700 (PDT) Subject: [SciPy-User] Interpolate based on three closest points In-Reply-To: References: Message-ID: <1e43539b-ae21-4d45-a9f6-e35a4b7c2e87@p2g2000yqh.googlegroups.com> On Apr 21, 4:04?pm, Tom Foutz wrote: > Hi everybody, > I have an irregular mesh of ~1e5 data points, with unreliable connection > data. ?I am trying to interpolate based on these points. ? ... Tom, instead of iterating to a triangle containing the point of interest, you could always take say 6 neighbors (see below), then average their 6 values, distance-weighted as Anne suggests. It's a clear speed / accuracy tradeoff: there's some chance that a point is not in the convex hull of its 6 nearest neighbors, but even then you're using 6 values, not 3. What's the probability that N random points around the origin land in >= 3 of 4 quadrants in 2d, or >= 5 of 8 octants in 3d ? My back-of-the-envelope estimate of this probability (unchecked) is ndim N prob one-sided ~ 2*ndim/2^N ? ------------------------------------ 2 6 6 % 3 6 9 % 3 7 5 % ==> taking 6 neighbors or so is seldom one-sided. Experts correct me ? By the way, exactly-N-nearest interpolation may have discontinuities. Consider N+1 points on a circle around the point of interest: the result depends on which N you take. In practice, not a problem. Also, RBF of N initial points sets up and solves an N x N linear system so is impractical for large N. Furthermore the matrix can be ill- conditioned. cheers -- denis From agile.aspect at gmail.com Wed May 5 14:49:16 2010 From: agile.aspect at gmail.com (Agile Aspect) Date: Wed, 5 May 2010 11:49:16 -0700 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 12:48 PM, wrote: > I have found an issue with scipy.linalg.lstsq, I think. ?The following > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. ?I think it should > work in WinXP. ?Here is a minimum example: > > """==== program testlstsq.py ======================""" > from scipy.linalg import eig, lstsq > from numpy import angle, zeros, pi, arcsin > > A = zeros((4,1), dtype='complex') > B = zeros((4,1), dtype='complex') > > A[0,0] = -0.535412460549-2.65798938848e-17j > A[1,0] = -0.369432866546-0.131765700574j > A[2,0] = -0.222906796932-0.263237285725j > A[3,0] = -0.069087096386-0.38609560454j > > B[0,0] = -0.369432866546-0.131765700574j > B[1,0] = -0.222906796932-0.263237285725j > B[2,0] = -0.069087096386-0.38609560454j > B[3,0] = ?0.0882283631514-0.528093039953j > > try: > ? ?print 'Got here' > ? ?phi = lstsq(A,B) > ? ?print 'Finished lstsq' > except: > ? ?print 'Exception Occurred' > else: > ? ?for a in range(len(phi)): > ? ? ? ?print 'phi[',a,']=',phi[a] > > ? ?w = -angle( eig(phi[0])[0][:] ) > ? ?d = 0.5 > ? ?aa = arcsin( w / (2.0*pi) )*180./pi ? ? # in degrees > ? ?print 'aa unsorted =', aa > """ ==== end of testlstsq.py ==========================""" > On Ubuntu, what version of ALTAS is being used? If I modify the 'except' to read print 'Exception Occurred', sys.exc_info()[0] the above code generates the enclosed error on CentOS 5 and Fedora 9 running python 2.5 and python 2.6, respectively, and both using scipy-0.71 and nump 1.30. Both plaforms use the same version of ATLAS, namely 3.8.2. 
All software was built from source on the respective platforms. Traceback (most recent call last): File "./lstsq.py", line 22, in phi = lstsq(A,B) File "/usr/devtools/lib/python2.6/site-packages/scipy/linalg/basic.py", line 549, in lstsq overwrite_b = overwrite_b) ValueError: On entry to ZGELSS parameter number 12 had an illegal value -- Enjoy global warming while it lasts. From eijkhout at tacc.utexas.edu Wed May 5 22:18:11 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Wed, 5 May 2010 21:18:11 -0500 Subject: [SciPy-User] getting started with ndarray Message-ID: I found the "guide to numpy" book, but I can't figure out how to create a multi-dimensional array. Is there a short tutorial? Or can someone give me a short example program with the most relevant features? Victor. From kwgoodman at gmail.com Wed May 5 22:24:16 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 5 May 2010 19:24:16 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:18 PM, Victor Eijkhout wrote: > I found the "guide to numpy" book, but I can't figure out how to > create a multi-dimensional array. Is there a short tutorial? Or can > someone give me a short example program with the most relevant features? Here's one way to create arrays: One dimensional: >> import numpy as np >> x1 = np.array([1, 2, 3]) >> x1.ndim 1 >> x1 array([1, 2, 3]) Two dimensional: >> x2 = np.array([[1, 2], [3, 4]]) >> x2.ndim 2 >> x2 array([[1, 2], [3, 4]]) Three dimensional: >> x3 = np.random.rand(2, 3, 4) >> x3.ndim 3 >> x3.shape (2, 3, 4) >> x3 array([[[ 0.85887601, 0.2988635 , 0.93155938, 0.48419988], [ 0.677853 , 0.67478433, 0.7065251 , 0.49045808], [ 0.87160361, 0.55503905, 0.36378423, 0.39314846]], [[ 0.80761194, 0.54838378, 0.80576339, 0.08248982], [ 0.16729305, 0.16320019, 0.5628961 , 0.77325458], [ 0.7073337 , 0.08927084, 0.89050264, 0.54985488]]]) From eijkhout at tacc.utexas.edu Wed May 5 22:28:41 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Wed, 5 May 2010 21:28:41 -0500 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > Here's one way to create arrays: Thanks. Suppose I don't have the data yet, but I simple want to allocate a, oh let's say, 5000x300x20 array? Victor. From d.l.goldsmith at gmail.com Wed May 5 22:31:17 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 19:31:17 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: It's rare that one wants to "manually" create arrays using np.array, however. Typically, the arrays used in numerics are "standard" in some way (e.g., the array corresponding to the n x n identity matrix, obtained from np.eye(n)) and thus numpy has many, _many_ functions which provide these various standard arrays for you. Even in the case of an array created from data in a file, one doesn't read in the data and then pass it as an argument to np.array - there's a numpy function which reads the file-stored data directly into an array for you. DG On Wed, May 5, 2010 at 7:24 PM, Keith Goodman wrote: > On Wed, May 5, 2010 at 7:18 PM, Victor Eijkhout > wrote: > > I found the "guide to numpy" book, but I can't figure out how to > > create a multi-dimensional array. Is there a short tutorial? Or can > > someone give me a short example program with the most relevant features? 
> > Here's one way to create arrays: > > One dimensional: > > >> import numpy as np > >> x1 = np.array([1, 2, 3]) > >> x1.ndim > 1 > >> x1 > array([1, 2, 3]) > > Two dimensional: > > >> x2 = np.array([[1, 2], [3, 4]]) > >> x2.ndim > 2 > >> x2 > array([[1, 2], > [3, 4]]) > > Three dimensional: > > >> x3 = np.random.rand(2, 3, 4) > >> x3.ndim > 3 > >> x3.shape > (2, 3, 4) > >> x3 > array([[[ 0.85887601, 0.2988635 , 0.93155938, 0.48419988], > [ 0.677853 , 0.67478433, 0.7065251 , 0.49045808], > [ 0.87160361, 0.55503905, 0.36378423, 0.39314846]], > > [[ 0.80761194, 0.54838378, 0.80576339, 0.08248982], > [ 0.16729305, 0.16320019, 0.5628961 , 0.77325458], > [ 0.7073337 , 0.08927084, 0.89050264, 0.54985488]]]) > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed May 5 22:44:45 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 19:44:45 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout wrote: > > On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > > > Here's one way to create arrays: > > Thanks. Suppose I don't have the data yet, but I simple want to > allocate a, oh let's say, 5000x300x20 array? > >>> import numpy as np >>> a = np.zeros((5000, 300, 20)) >>> a.shape (5000L, 300L, 20L) IIRC, there's a quicker way if you don't need the array's values initialized, but I forget what it is. DG > > Victor. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Wed May 5 22:52:09 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 5 May 2010 19:52:09 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout wrote: > > On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > >> Here's one way to create arrays: > > Thanks. Suppose I don't have the data yet, but I simple want to > allocate a, oh let's say, 5000x300x20 array? >> import numpy as np >> x = np.zeros((5000, 300, 20)) >> x.ndim 3 >> x.sum() 0.0 >> x[0,0,0] = 9 >> x.sum() 9.0 >> x[0] = 1 # All elements set to zero of first 300x20 slice >> x.sum() 6000.0 From ben.v.root at gmail.com Wed May 5 22:57:27 2010 From: ben.v.root at gmail.com (Benjamin Root) Date: Wed, 5 May 2010 21:57:27 -0500 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 9:44 PM, David Goldsmith wrote: > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout wrote: > >> >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: >> >> > Here's one way to create arrays: >> >> Thanks. Suppose I don't have the data yet, but I simple want to >> allocate a, oh let's say, 5000x300x20 array? 
>> > > >>> import numpy as np > >>> a = np.zeros((5000, 300, 20)) > >>> a.shape > (5000L, 300L, 20L) > > IIRC, there's a quicker way if you don't need the array's values > initialized, but I forget what it is. > >>> import numpy as np >>> a = np.empty((5000, 300, 20)) >>> a.shape (5000, 300, 20) Ben > DG > >> >> Victor. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Wed May 5 22:58:47 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 5 May 2010 19:58:47 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:44 PM, David Goldsmith wrote: > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout > wrote: >> >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: >> >> > Here's one way to create arrays: >> >> Thanks. Suppose I don't have the data yet, but I simple want to >> allocate a, oh let's say, 5000x300x20 array? > >>>> import numpy as np >>>> a = np.zeros((5000, 300, 20)) >>>> a.shape > (5000L, 300L, 20L) > > IIRC, there's a quicker way if you don't need the array's values > initialized, but I forget what it is. Faster to computer, but slower to grok: >> timeit np.zeros((5000, 300, 20)) 10 loops, best of 3: 119 ms per loop >> timeit np.empty((5000, 300, 20)) 100000 loops, best of 3: 6.72 us per loop >> timeit x = np.empty((5000, 300, 20)); x.fill(0.0) 10 loops, best of 3: 115 ms per loop From d.l.goldsmith at gmail.com Thu May 6 00:48:58 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 21:48:58 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: On Wed, May 5, 2010 at 7:58 PM, Keith Goodman wrote: > On Wed, May 5, 2010 at 7:44 PM, David Goldsmith > wrote: > > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout < > eijkhout at tacc.utexas.edu> > > wrote: > >> > >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: > >> > >> > Here's one way to create arrays: > >> > >> Thanks. Suppose I don't have the data yet, but I simple want to > >> allocate a, oh let's say, 5000x300x20 array? > > > >>>> import numpy as np > >>>> a = np.zeros((5000, 300, 20)) > >>>> a.shape > > (5000L, 300L, 20L) > > > > IIRC, there's a quicker way if you don't need the array's values > > initialized, but I forget what it is. > > Faster to computer, but slower to grok: > > >> timeit np.zeros((5000, 300, 20)) > 10 loops, best of 3: 119 ms per loop > >> timeit np.empty((5000, 300, 20)) > 100000 loops, best of 3: 6.72 us per loop > >> timeit x = np.empty((5000, 300, 20)); x.fill(0.0) > 10 loops, best of 3: 115 ms per loop > Interesting, thanks guys! DG -------------- next part -------------- An HTML attachment was scrubbed... 
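As a minimal sketch of the zeros/empty trade-off timed above (not taken from the thread): np.empty only allocates, so its contents are arbitrary until every element has been written.

    import numpy as np

    a = np.empty((5000, 300, 20))    # fast: memory is allocated but NOT initialized
    # a may hold arbitrary values here; write every element before reading any
    a.fill(0.0)                      # now equivalent to np.zeros((5000, 300, 20))
    b = np.zeros((5000, 300, 20))    # allocate and zero in one step
    print a.shape, b.shape

np.empty only pays off when the array is about to be overwritten anyway, for example as the output buffer of a later computation.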
URL: From d.l.goldsmith at gmail.com Thu May 6 00:59:06 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 5 May 2010 21:59:06 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: BTW Victor: FWIW, the "center" of the ndarray universe is NumPy - though I'm sure there's a great deal of overlap in the subscribers lists, for your ndarray-specific questions (indeed, for any object that resides in NumPy, as opposed to SciPy), it might be more efficient to post to and monitor numpy-discussion at scipy.org. As a general rule: if it gets imported from NumPy, ask on the numpy-discussion list, if it gets imported from SciPy, ask here (scipy-user). But this is just a suggestion - you'll get your numpy questions answered in either place... (but scipy questions on the numpy list, that's probably a different story). Again, FWIW DG On Wed, May 5, 2010 at 9:48 PM, David Goldsmith wrote: > On Wed, May 5, 2010 at 7:58 PM, Keith Goodman wrote: > >> On Wed, May 5, 2010 at 7:44 PM, David Goldsmith >> wrote: >> > On Wed, May 5, 2010 at 7:28 PM, Victor Eijkhout < >> eijkhout at tacc.utexas.edu> >> > wrote: >> >> >> >> On 2010/05/05, at 9:24 PM, Keith Goodman wrote: >> >> >> >> > Here's one way to create arrays: >> >> >> >> Thanks. Suppose I don't have the data yet, but I simple want to >> >> allocate a, oh let's say, 5000x300x20 array? >> > >> >>>> import numpy as np >> >>>> a = np.zeros((5000, 300, 20)) >> >>>> a.shape >> > (5000L, 300L, 20L) >> > >> > IIRC, there's a quicker way if you don't need the array's values >> > initialized, but I forget what it is. >> >> Faster to computer, but slower to grok: >> >> >> timeit np.zeros((5000, 300, 20)) >> 10 loops, best of 3: 119 ms per loop >> >> timeit np.empty((5000, 300, 20)) >> 100000 loops, best of 3: 6.72 us per loop >> >> timeit x = np.empty((5000, 300, 20)); x.fill(0.0) >> 10 loops, best of 3: 115 ms per loop >> > > Interesting, thanks guys! > > DG > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Thu May 6 05:28:48 2010 From: denis-bz-gg at t-online.de (denis) Date: Thu, 6 May 2010 02:28:48 -0700 (PDT) Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: <99301b0f-35c9-4cba-aaa3-4bf1100b80cc@i9g2000yqi.googlegroups.com> I'd recommend http://scipy.org/Cookbook/BuildingArrays, then http://scipy.org/Cookbook/Indexing (can't resist quoting from Indexing: "numpy and scipy provide a few other types that behave like arrays, in particular matrices and sparse matrices. Their indexing can differ from that of arrays in surprising ways") Also http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html is a nice 2-page cheat sheet. cheers -- denis From bruce.labitt at autoliv.com Thu May 6 09:23:54 2010 From: bruce.labitt at autoliv.com (bruce.labitt at autoliv.com) Date: Thu, 6 May 2010 09:23:54 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: Message-ID: scipy-user-bounces at scipy.org wrote on 05/05/2010 02:49:16 PM: > On Tue, May 4, 2010 at 12:48 PM, wrote: > > I have found an issue with scipy.linalg.lstsq, I think. The following > > code works in Ubuntu 10.04 x86-64, but not in WinXP-32. I think it should > > work in WinXP. 
Here is a minimum example:
> >
> > """==== program testlstsq.py ======================"""
> > from scipy.linalg import eig, lstsq
> > from numpy import angle, zeros, pi, arcsin
> >
> > A = zeros((4,1), dtype='complex')
> > B = zeros((4,1), dtype='complex')
> >
> > A[0,0] = -0.535412460549-2.65798938848e-17j
> > A[1,0] = -0.369432866546-0.131765700574j
> > A[2,0] = -0.222906796932-0.263237285725j
> > A[3,0] = -0.069087096386-0.38609560454j
> >
> > B[0,0] = -0.369432866546-0.131765700574j
> > B[1,0] = -0.222906796932-0.263237285725j
> > B[2,0] = -0.069087096386-0.38609560454j
> > B[3,0] = 0.0882283631514-0.528093039953j
> >
> > try:
> >     print 'Got here'
> >     phi = lstsq(A,B)
> >     print 'Finished lstsq'
> > except:
> >     print 'Exception Occurred'
> > else:
> >     for a in range(len(phi)):
> >         print 'phi[',a,']=',phi[a]
> >
> >     w = -angle( eig(phi[0])[0][:] )
> >     d = 0.5
> >     aa = arcsin( w / (2.0*pi) )*180./pi     # in degrees
> >     print 'aa unsorted =', aa
> > """ ==== end of testlstsq.py =========================="""
> >
>
> On Ubuntu, what version of ATLAS is being used?

I'm embarrassed to say that I haven't compiled and run ATLAS on 10.04 yet. On my todo list.
The latest ATLAS is 3.8.3 I think. So it appears I have the reference BLAS, uggh.

Previously, I had ATLAS 3.8.2 on my machine (Ubuntu 9.10). However, I have no idea which
lib the scipy and numpy Ubuntu packages link to. How does one find out? And how does one
link scipy and numpy to the better ATLAS and LAPACK libs that one has optimized for one's
machine?

> If I modify the 'except' to read
>
>     print 'Exception Occurred', sys.exc_info()[0]
>
> the above code generates the enclosed error on CentOS 5 and Fedora 9
> running python 2.5 and python 2.6, respectively, and both using
> scipy-0.71 and numpy 1.30.
>
> Both platforms use the same version of ATLAS, namely 3.8.2.
>
> All software was built from source on the respective platforms.

I need to do this...

> Traceback (most recent call last):
>   File "./lstsq.py", line 22, in
>     phi = lstsq(A,B)
>   File "/usr/devtools/lib/python2.6/site-packages/scipy/linalg/basic.py", line 549, in lstsq
>     overwrite_b = overwrite_b)
> ValueError: On entry to ZGELSS parameter number 12 had an illegal value

ZGELSS is the LAPACK linear least squares solver; it is for double precision complex numbers.
Having an illegal value on entry looks like a bug, no?

> --
> Enjoy global warming while it lasts.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
****************************** From jsseabold at gmail.com Thu May 6 10:44:40 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 6 May 2010 10:44:40 -0400 Subject: [SciPy-User] lstsq error under Windows? In-Reply-To: References: Message-ID: On Thu, May 6, 2010 at 9:23 AM, wrote: >> On Ubuntu, what version of ALTAS is being used? > > I'm embarassed to say that I haven't compiled and run ATLAS on 10.04 yet. > On my todo list. > The latest ATLAS is 3.8.3 I think. ?So it appears I have the reference > BLAS, uggh. > > Previously, I had ATLAS 3.8.2 on my machine (Ubuntu 9.10). ?However, I > have no idea which lib the scipy and numpy Ubuntu packages link to. ?How > does one find out? ?And how does one link scipy and numpy to the better > ATLAS and LAPACK libs that one has optimized for one's machine? > import numpy as np np.show_config() To link to your own, you need to edit site.cfg when you install. Skipper From hihighsky at gmail.com Thu May 6 10:54:35 2010 From: hihighsky at gmail.com (Tingting HAN) Date: Thu, 6 May 2010 16:54:35 +0200 Subject: [SciPy-User] problem with installing scipy Message-ID: Dear Officer, I work on linux and have python originally installed in the system: shau at tityro:/home/hantingting/Downloads$ python Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55) [GCC 4.4.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. I want to install package scipy, but there is the following problem: shau at tityro:/home/hantingting/Downloads/triMC3D/python$ sudo apt-get install scipy [sudo] password for shau: Reading package lists... Done Building dependency tree Reading state information... Done E: Couldn't find package scipy Could you please give me some advice to solve the problem or to properly install scipy? -- Yours sincerely, Sofia -------------- next part -------------- An HTML attachment was scrubbed... URL: From cr.anil at gmail.com Thu May 6 10:56:21 2010 From: cr.anil at gmail.com (Anil C R) Date: Thu, 6 May 2010 20:26:21 +0530 Subject: [SciPy-User] problem with installing scipy In-Reply-To: References: Message-ID: it's "sudo apt-get install python-scipy" Anil On Thu, May 6, 2010 at 8:24 PM, Tingting HAN wrote: > Dear Officer, > > I work on linux and have python originally installed in the system: > > shau at tityro:/home/hantingting/Downloads$ python > Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55) > [GCC 4.4.1] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > > I want to install package scipy, but there is the following problem: > > > shau at tityro:/home/hantingting/Downloads/triMC3D/python$ sudo apt-get > install scipy > [sudo] password for shau: > Reading package lists... Done > Building dependency tree > Reading state information... Done > E: Couldn't find package scipy > > Could you please give me some advice to solve the problem or to properly > install scipy? > -- > Yours sincerely, > > Sofia > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
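As a quick illustrative follow-up to both answers above (a sketch, not something posted in the thread): confirm that the packaged modules import, and ask numpy which BLAS/LAPACK it was built against.

    import numpy as np
    import scipy

    print 'numpy', np.__version__, 'scipy', scipy.__version__
    np.show_config()    # blas_opt_info / lapack_opt_info reveal ATLAS vs. reference BLAS

If show_config() only lists the reference BLAS/LAPACK, the usual fix is rebuilding numpy and scipy with a site.cfg pointing at an optimized ATLAS, as Skipper notes.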
URL: From d.l.goldsmith at gmail.com Thu May 6 12:43:50 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 6 May 2010 09:43:50 -0700 Subject: [SciPy-User] getting started with ndarray In-Reply-To: <99301b0f-35c9-4cba-aaa3-4bf1100b80cc@i9g2000yqi.googlegroups.com> References: <99301b0f-35c9-4cba-aaa3-4bf1100b80cc@i9g2000yqi.googlegroups.com> Message-ID: On Thu, May 6, 2010 at 2:28 AM, denis wrote: > I'd recommend http://scipy.org/Cookbook/BuildingArrays, then > http://scipy.org/Cookbook/Indexing > (can't resist quoting from Indexing: > "numpy and scipy provide a few other types that behave like arrays, in > particular matrices and sparse matrices. > Their indexing can differ from that of arrays in surprising ways") > > Also > http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html > is a nice 2-page cheat sheet. > Nice indeed, I just bookmarked it! Is there a link to that on the scipy Site? There should be! DG > cheers > -- denis > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From amenity at enthought.com Thu May 6 14:53:12 2010 From: amenity at enthought.com (Amenity Applewhite) Date: Thu, 6 May 2010 13:53:12 -0500 Subject: [SciPy-User] SciPy 2010: Bioinformatic & Parallel/cloud talks announced...& register now! References: Message-ID: <48351A11-7A2D-4A12-B9EE-9483046DA10A@enthought.com> Hello! Things are moving quickly in preparation for SciPy 2010: Last week we announced the General Conference schedule (http://conference.scipy.org/scipy2010/schedule.html ), Tuesday we announced our student sponsorship recipients (http://conference.scipy.org/scipy2010/student.html ) and now we're ready to tell you give you a look at the talks we have lined up for our Bioinformatics and Parallel Processing /Cloud Computing tracks. ===Parallel Processing & Cloud Computing track=== We really appreciate Brian and Ken's work organizing the papers for this specialized track. And of course, thanks to everyone who submitted a paper. There has been a great deal of interest in this set of talks ? and word on the street is that Brian may even have a HPC tutorial up his sleeve... * StarCluster - NumPy/SciPy Computing in the Cloud- Justin Riley * pomsets: workflow management for your cloud- Michael J Pan * Getting Down with Big Data Jared Flatow, Anita Lillie, Ville Tuulos * StarFlow: A Cloud-Enables Python Workflow Engine for Scientific Analysis Pipelines Elaine Angelino, Dan Yamins, Margo Seltzer * A Programmatic Interface for Particle Plasma Simulation in Python, and Early Backend Results with PyCUDA Min Ragan-Kelley * Parallel Computing with IPython: an Application to Air Pollution Modeling B.E. Granger, J.G. Hemann * Astronomy App in the Cloud using Google Geo APIs and Python App Engine Shawn Shen ===Bioinformatics track=== Once again, we are indebted to Glen Otero, from Dell, for putting together the Bioinformatics track. He received some fantastic papers and we're really looking forward to these presentations: * Protein Folding with Python on Supercomputers Jan H. Meinke * Can Python Save Next-Generation Sequencing? 
* The Use of Galaxy for the Research and the Teaching of Genomics Roy Weckiewicz, Jim Hu, and Rodolfo Aramayo ===Early registration ends next Monday=== That's right: Only a few days left before rates increase! Think of all the BBQ and breakfast tacos you can buy with that $50-$100 you'll save by registering early. If that doesn't convince you, consider: -Cheap flights to Austin- Buy your tickets now for some very nice prices: $275 from Chicago, $330 from San Francisco, $380 from New York City, $810 from London...(prices from Kayak.com) -Convenient & affordable hotel- We got an fantastic deal for on-site accommodations at the AT&T Conference Center. Pay only $89/night for single occupancy or $105/ night for double occupancy. It will be great to have everyone staying in the same spot. Once you register, you'll get a code to book your hotel reservation. The discounted rate will be applied automatically. https://conference.scipy.org/scipy2010/accommodation.html No car necessary to get to the conference... and see Austin! An airport bus (http://capmetro.org/riding/current_schedules/maps/rt100_sb.pdf ) runs straight to and from the AT&T center, so you won't have to rent a car at all. Plus, the UT campus area is in walking distance to a number of great restaurants and activities. For any longer trips you'd like to make Austin has a great public bus system. Not to mention all of the mind-blowing things you'll learn and outstanding people you'll meet and catch up with. So what are you waiting for? Register: https://conference.scipy.org/scipy2010/registration.html Best, The SciPy 2010 Team @SciPy2010 on Twitter From stefan at sun.ac.za Thu May 6 17:40:42 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 6 May 2010 23:40:42 +0200 Subject: [SciPy-User] getting started with ndarray In-Reply-To: References: Message-ID: Hi Victor On 6 May 2010 04:18, Victor Eijkhout wrote: > I found the "guide to numpy" book, but I can't figure out how to > create a multi-dimensional array. Is there a short tutorial? I've got a short NumPy tutorial here which might help: http://mentat.za.net/numpy/intro/intro.html Regards St?fan From josef.pktd at gmail.com Fri May 7 12:40:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 12:40:05 -0400 Subject: [SciPy-User] inverse function of a spline Message-ID: I have a function y = f(x) which is monotonically increasing (a cumulative distribution function) f is defined by piecewise polynomial interpolation, an interpolating spline on some points I would like to get the inverse function (ppf) x = f^{-1} (y) if the spline is of higher order than linear In the linear case it's trivial, because the inverse function is also just a piecewise linear interpolation. If I have a cubic spline, or any other smooth interpolator in scipy, is there a way to get the inverse function directly? 
I don't know much about general properties of splines, and would appreciate any hints, so I can avoid numerical inversion (fsolve or similar) Thanks, Josef From charlesr.harris at gmail.com Fri May 7 13:57:04 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 May 2010 11:57:04 -0600 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 10:40 AM, wrote: > I have a function y = f(x) which is monotonically increasing (a > cumulative distribution function) > f is defined by piecewise polynomial interpolation, an interpolating > spline on some points > > I would like to get the inverse function (ppf) x = f^{-1} (y) > if the spline is of higher order than linear > > In the linear case it's trivial, because the inverse function is also > just a piecewise linear interpolation. > > If I have a cubic spline, or any other smooth interpolator in scipy, > is there a way to get the > inverse function directly? > > I don't know much about general properties of splines, and would > appreciate any hints, > so I can avoid numerical inversion (fsolve or similar) > > Since the curve is piecewise cubic the problem reduces to inverting a piece of a cubic, which inverse won't itself be a cubic in general. I think your best bet is interpolate the same points with x,y reversed, or resample using your spline and interpolate the new samples with x,y reversed. It won't be a exact inverse, but then, the original is probably not exact either. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 7 14:34:02 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 14:34:02 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 1:57 PM, Charles R Harris wrote: > > > On Fri, May 7, 2010 at 10:40 AM, wrote: >> >> I have a function ?y = f(x) which is monotonically increasing (a >> cumulative distribution function) >> f is defined by piecewise polynomial interpolation, an interpolating >> spline on some points >> >> I would like to get the inverse function (ppf) ?x = f^{-1} (y) >> if the spline is of higher order than linear >> >> In the linear case it's trivial, because the inverse function is also >> just a piecewise linear interpolation. >> >> If I have a cubic spline, or any other smooth interpolator in scipy, >> is there a way to get the >> inverse function directly? >> >> I don't know much about general properties of splines, and would >> appreciate any hints, >> so I can avoid numerical inversion (fsolve or similar) >> > > Since the curve is piecewise cubic the problem reduces to inverting a piece > of a cubic, which inverse won't itself be a cubic in general. I think your > best bet is interpolate the same points with x,y reversed, or resample using > your spline and interpolate the new samples with x,y reversed. It won't be a > exact inverse, but then, the original is probably not exact either. That's what I suspected, I was hoping for a trick (like one interpolator is the "natural" inverse of another one). resampling should give a good enough approximation. Without resampling, the error for round tripping x= f^{-1} ( f(x) ) might be too large to give consistent results. (Even if there are sampling and approximation errors, I still would prefer consistency.) Just a follow-up question on approximation: for the cdf (e.g. 
normal distribution) f:R->[0,1] f^{-1}:[0,1]->R Is it better to start with a spline on the inverse function (ppf), f^{-1}, because it has compact support, resample from it, and then create the cdf f from a resampled ppf; or the other way around, or it wouldn't really matter? Thanks for the information and hint, Josef > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Fri May 7 15:37:36 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 May 2010 13:37:36 -0600 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 12:34 PM, wrote: > On Fri, May 7, 2010 at 1:57 PM, Charles R Harris > wrote: > > > > > > On Fri, May 7, 2010 at 10:40 AM, wrote: > >> > >> I have a function y = f(x) which is monotonically increasing (a > >> cumulative distribution function) > >> f is defined by piecewise polynomial interpolation, an interpolating > >> spline on some points > >> > >> I would like to get the inverse function (ppf) x = f^{-1} (y) > >> if the spline is of higher order than linear > >> > >> In the linear case it's trivial, because the inverse function is also > >> just a piecewise linear interpolation. > >> > >> If I have a cubic spline, or any other smooth interpolator in scipy, > >> is there a way to get the > >> inverse function directly? > >> > >> I don't know much about general properties of splines, and would > >> appreciate any hints, > >> so I can avoid numerical inversion (fsolve or similar) > >> > > > > Since the curve is piecewise cubic the problem reduces to inverting a > piece > > of a cubic, which inverse won't itself be a cubic in general. I think > your > > best bet is interpolate the same points with x,y reversed, or resample > using > > your spline and interpolate the new samples with x,y reversed. It won't > be a > > exact inverse, but then, the original is probably not exact either. > > That's what I suspected, I was hoping for a trick (like one interpolator is > the > "natural" inverse of another one). > > resampling should give a good enough approximation. Without resampling, the > error for round tripping x= f^{-1} ( f(x) ) might be too large to give > consistent results. > (Even if there are sampling and approximation errors, I still would > prefer consistency.) > > > Just a follow-up question on approximation: > > for the cdf (e.g. normal distribution) f:R->[0,1] f^{-1}:[0,1]->R > > Is it better to start with a spline on the inverse function (ppf), > f^{-1}, because it has > compact support, resample from it, and then create the cdf f from a > resampled ppf; > or the other way around, or it wouldn't really matter? > > Thanks for the information and hint, > > I don't know the answer to that although I suspect starting from the inverse might be superior. The end points might be a problem though if the curve goes vertical. You might have to experiment a bit or use a spline in combination with other functions. There has probably been a small industry out there dealing with these sorts of problem but I don't know who they are. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
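A rough sketch of the resample-and-swap idea Chuck describes, not code from the thread: it uses the normal cdf purely as an illustration, and it assumes the resampled cdf values come out strictly increasing, since InterpolatedUnivariateSpline requires increasing x data.

    import numpy as np
    from scipy.interpolate import InterpolatedUnivariateSpline
    from scipy.stats import norm

    # sample points of a cdf (here the normal cdf, just for illustration)
    x = np.linspace(-3, 3, 15)
    y = norm.cdf(x)

    cdf = InterpolatedUnivariateSpline(x, y, k=3)    # spline approximation of f
    # resample on a fine grid and fit the swapped data to approximate f^{-1}
    xf = np.linspace(-3, 3, 200)
    yf = cdf(xf)
    ppf = InterpolatedUnivariateSpline(yf, xf, k=3)  # only valid if yf is monotone

    xt = np.array([0.5])
    print xt, ppf(cdf(xt))    # round trip, should come back close to 0.5

Checking x - ppf(cdf(x)) over the sampled range gives a direct measure of how consistent the approximate pair is.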
URL: From peter.shepard at gmail.com Fri May 7 15:45:08 2010 From: peter.shepard at gmail.com (Pete Shepard) Date: Fri, 7 May 2010 12:45:08 -0700 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s Message-ID: Hello List, I am using "fisherexact.py" to calculate the p-value of two ratios however, when large #s are involved, it returns "NA". Is there a way to override this? TIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 7 16:15:19 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 16:15:19 -0400 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 3:45 PM, Pete Shepard wrote: > Hello List, > > > I am using "fisherexact.py" to calculate the p-value of two ratios however, > when large #s are involved, it returns "NA". Is there a way to override > this? You mean fisherexact in http://projects.scipy.org/scipy/ticket/956 ? Do you have an example? Can you add it to the ticket? Do you have large ratios or large numbers in each cell? If you have a large number of entries in each cell, then the chisquare test or similar asymptotic tests should be pretty reliable. Last time I tried, I didn't manage to get rid of incorrect results if the first cell is zero. And I didn't understand the details of the algorithm well enough to figure out what's going on (within a reasonable time). If you add some print statements, you could find out if the nan comes from a 0./0. division or from the hypergeometric distribution. Do you get the same result if you permute rows or columns? fisherexact works very well over a large range of values, but I'm waiting for someone to provide a patch for the cases that don't work. Josef > > TIA > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From peridot.faceted at gmail.com Fri May 7 16:24:27 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 7 May 2010 16:24:27 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On 7 May 2010 12:40, wrote: > I have a function ?y = f(x) which is monotonically increasing (a > cumulative distribution function) > f is defined by piecewise polynomial interpolation, an interpolating > spline on some points > > I would like to get the inverse function (ppf) ?x = f^{-1} (y) > if the spline is of higher order than linear > > In the linear case it's trivial, because the inverse function is also > just a piecewise linear interpolation. > > If I have a cubic spline, or any other smooth interpolator in scipy, > is there a way to get the > inverse function directly? > > I don't know much about general properties of splines, and would > appreciate any hints, > so I can avoid numerical inversion (fsolve or similar) I should first say that even though your input points are monotonic, the spline is not guaranteed to be. (Though in practice if your sampled points have no sharp corners you're probably fine.) If this matters to you, there are algorithms for enforcing monotonicity of splines, some of which are simply procedures for jiggering the interpolation points just enough to avoid non-monotonicty and some of which are more clever. Sadly none are implemented in scipy. As Charles Harris pointed out, the inverse of a cubic is not a cubic, so the inverse function won't be a spline. 
But you can nevertheless efficiently evaluate it with scipy.interpolate.sproot, which a special-purpose numerical solver. I'm not sure whether it uses cubic root solvers or an optimized numerical solver with knowledge about the bounding properties of spline control points, but in any case it's quite efficient. It only finds zeros, but (check this) you should be able to shift a spline vertically by subtracting a constant from the coefficient array (c in t,c,k). Since you are constructing the spline in the first place, you should also think about whether you're evaluating f or f inverse more often and choose which one to be the spline appropriately. Anne From amcmorl at gmail.com Fri May 7 16:28:21 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Fri, 7 May 2010 16:28:21 -0400 Subject: [SciPy-User] trouble loading one .mat file Message-ID: Hi all, After upgrading to svn scipy 0.8.0.dev6369, to take advantage of Matthew Brett's bugfix to the scipy.io code (thanks for that, Matthew), I now have one matlab file which I cannot load using scipy.io.loadmat. Trying it gives the following error: /usr/local/lib/python2.6/dist-packages/scipy/io/matlab/mio5.pyc in get_variables(self, variable_names) 397 mdict['__globals__'] = [] 398 while not self.end_of_stream(): --> 399 hdr, next_position = self.read_var_header() 400 name = hdr.name 401 if name == '': /usr/local/lib/python2.6/dist-packages/scipy/io/matlab/mio5.pyc in read_var_header(self) 352 next_pos = self.mat_stream.tell() + byte_count 353 if mdtype == miCOMPRESSED: # make new stream from compressed data --> 354 stream = StringIO(zlib.decompress(self.mat_stream.read(byte_count))) 355 self._matrix_reader.set_stream(stream) 356 mdtype, byte_count = self._matrix_reader.read_full_tag() error: Error -5 while decompressing data I could definitely read this file before using the ubuntu karmic package 0.7.0-2, but I've also upgraded to lucid recently and I'm unsure whether I had successfully read it with lucid and packaged scipy before upgrading to scipy svn. In any case, a number of very similar files can be read fine using the new setup and a colleague has verified that the problem file can be opened with Matlab okay. Has anyone come across and solved this sort of problem before, or have any idea what might be causing it? It seems impolite to distribute the file on the list here, but I could send it to someone who has the capability to tackle debugging the problem. Many thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 7 16:36:15 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 16:36:15 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 4:24 PM, Anne Archibald wrote: > On 7 May 2010 12:40, ? wrote: >> I have a function ?y = f(x) which is monotonically increasing (a >> cumulative distribution function) >> f is defined by piecewise polynomial interpolation, an interpolating >> spline on some points >> >> I would like to get the inverse function (ppf) ?x = f^{-1} (y) >> if the spline is of higher order than linear >> >> In the linear case it's trivial, because the inverse function is also >> just a piecewise linear interpolation. >> >> If I have a cubic spline, or any other smooth interpolator in scipy, >> is there a way to get the >> inverse function directly? 
>> >> I don't know much about general properties of splines, and would >> appreciate any hints, >> so I can avoid numerical inversion (fsolve or similar) > > I should first say that even though your input points are monotonic, > the spline is not guaranteed to be. (Though in practice if your > sampled points have no sharp corners you're probably fine.) If this > matters to you, there are algorithms for enforcing monotonicity of > splines, some of which are simply procedures for jiggering the > interpolation points just enough to avoid non-monotonicty and some of > which are more clever. Sadly none are implemented in scipy. > > As Charles Harris pointed out, the inverse of a cubic is not a cubic, > so the inverse function won't be a spline. But you can nevertheless > efficiently evaluate it with scipy.interpolate.sproot, which a > special-purpose numerical solver. I'm not sure whether it uses cubic > root solvers or an optimized numerical solver with knowledge about the > bounding properties of spline control points, but in any case it's > quite efficient. It only finds zeros, but (check this) you should be > able to shift a spline vertically by subtracting a constant from the > coefficient array (c in t,c,k). > > Since you are constructing the spline in the first place, you should > also think about whether you're evaluating f or f inverse more often > and choose which one to be the spline appropriately. Thanks, I will try to figure out the sproot version. For now I'm stuck (and go somewhere else) because in the examples that I tried out, I get small non-monotonicities most of the time. The spline of the inverse function is backwards bending. I will stick with linear interpolation and kernel density estimation for the smooth case. BTW: I'm writing some histogram distribution and variants of empirical distribution classes that have the same methods as the ones in scipy.stats. Josef > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From vanforeest at gmail.com Fri May 7 16:37:33 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 7 May 2010 22:37:33 +0200 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: Hi Josef, > If I have a cubic spline, or any other smooth interpolator in scipy, > is there a way to get the > inverse function directly? How can you ensure that the cubic spline approx is non-decreasing? I actually wonder whether using cubic splines is the best way to approximate distribution functions. Nicky From josef.pktd at gmail.com Fri May 7 16:44:44 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 16:44:44 -0400 Subject: [SciPy-User] inverse function of a spline In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 4:37 PM, nicky van foreest wrote: > Hi Josef, > >> If I have a cubic spline, or any other smooth interpolator in scipy, >> is there a way to get the >> inverse function directly? > > How can you ensure that the cubic spline approx is non-decreasing? I > actually wonder whether using cubic splines is the best way to > approximate distribution functions. Now I know it's not, but I was designing the extension to the linear case on paper instead of in the interpreter, and got stuck on the wrong problem. Maybe I ask the question again when scipy has monotonic interpolators. 
Josef > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri May 7 18:45:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 7 May 2010 18:45:51 -0400 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s In-Reply-To: References: Message-ID: On Fri, May 7, 2010 at 5:44 PM, Vincent Davis wrote: > @ Josef, I assume you know about this reference from the wikipedia page. > http://mathworld.wolfram.com/FishersExactTest.html > > I have it in my second comment to the ticket. But from this it's still a long way to figure out where the zero is supposed to go in the strict or weak inequalities in the binary search. And why does the second path work but not the first ? I wasn't patient enough. Josef > Vincent > > On Fri, May 7, 2010 at 2:15 PM, wrote: > >> On Fri, May 7, 2010 at 3:45 PM, Pete Shepard >> wrote: >> > Hello List, >> > >> > >> > I am using "fisherexact.py" to calculate the p-value of two ratios >> however, >> > when large #s are involved, it returns "NA". Is there a way to override >> > this? >> >> >> You mean fisherexact in http://projects.scipy.org/scipy/ticket/956 ? >> >> Do you have an example? Can you add it to the ticket? >> >> Do you have large ratios or large numbers in each cell? >> If you have a large number of entries in each cell, then the chisquare >> test or similar >> asymptotic tests should be pretty reliable. >> >> Last time I tried, I didn't manage to get rid of incorrect results if >> the first cell is zero. >> And I didn't understand the details of the algorithm well enough to >> figure out what's >> going on (within a reasonable time). >> >> If you add some print statements, you could find out if the nan comes from >> a >> 0./0. division or from the hypergeometric distribution. >> Do you get the same result if you permute rows or columns? >> >> fisherexact works very well over a large range of values, but I'm >> waiting for someone >> to provide a patch for the cases that don't work. >> >> Josef >> >> >> >> >> >> > >> > TIA >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > *Vincent Davis > 720-301-3003 * > vincent at vincentdavis.net > my blog | LinkedIn > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Sat May 8 12:14:09 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 8 May 2010 12:14:09 -0400 Subject: [SciPy-User] Optimization with smoothing In-Reply-To: References: Message-ID: On Tue, May 4, 2010 at 9:34 AM, Angus McMorland wrote: > Hi all, > I need to do some optimization where one of the parameters is a > spline-smoothed 1-d sequence, with, say, 10 values. What's the best way to > go about this using scipy (or any other numpy-compatible Python package)? 
I > could imagine using one of the scipy.optimize routines and then smoothing > the relevant parameters within the optimization loop, but it would be best > if the next iteration's of parameters were chosen from the previous > iteration's _smoothed_ parameters rather than their 'non-smooth' > predecessors, as it seems like this would keep the optimization better > behaved. Is this possible? I would think you could modify the callback function in the source of your chosen optimization routine from callback(xk) to xk = callback(xk) Though you would probably want to recompute the gradient and Hessian at the new smoothed parameters. Sorry, I don't have a better answer, but I've often wondered the same thing and I'm hoping someone might know better than I. Skipper From josef.pktd at gmail.com Sat May 8 13:02:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 8 May 2010 13:02:16 -0400 Subject: [SciPy-User] Optimization with smoothing In-Reply-To: References: Message-ID: On Sat, May 8, 2010 at 12:14 PM, Skipper Seabold wrote: > On Tue, May 4, 2010 at 9:34 AM, Angus McMorland wrote: >> Hi all, >> I need to do some optimization where one of the parameters is a >> spline-smoothed 1-d sequence, with, say, 10 values. What's the best way to >> go about this using scipy (or any other numpy-compatible Python package)? I >> could imagine using one of the scipy.optimize routines and then smoothing >> the relevant parameters within the optimization loop, but it would be best >> if the next iteration's of parameters were chosen from the previous >> iteration's _smoothed_ parameters rather than their 'non-smooth' >> predecessors, as it seems like this would keep the optimization better >> behaved. Is this possible? > > I would think you could modify the callback function in the source of > your chosen optimization routine from > > callback(xk) > > to > > xk = callback(xk) > > Though you would probably want to recompute the gradient and Hessian > at the new smoothed parameters. ?Sorry, I don't have a better answer, > but I've often wondered the same thing and I'm hoping someone might > know better than I. I was wondering more whether you really have a well defined optimization problem if you don't really use the parameters. Does the argmin really end up at the smoothed values, or at some parameterization of the spline? I would attempt to put the smoothness restriction in a constraint or rewrite the problem in terms of some lower dimensional "hyper parameters". Doing it directly might require adjustments to the optimization algorithm, e.g. how a new point is found, so that it ends up hardcoding the smoothness constraint into the optimization function. My impression, and 2.5 cents, Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tmp50 at ukr.net Sun May 9 10:02:33 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 09 May 2010 17:02:33 +0300 Subject: [SciPy-User] [OT] any ways to run PETSc4py on several CPU? Message-ID: hi all, sorry for using the mail list but I haven't found more suitable. Are there any ways to run PETSc4py on several CPU, i.e. something like mpirun -np 4? Currently I have >>> print PETSc.COMM_WORLD.Get_size() 1 Thank you in advance, D. -------------- next part -------------- An HTML attachment was scrubbed... 
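For what it is worth, a minimal sketch of launching a petsc4py script under MPI (not from the thread; the file name petsc_size.py is just an example, and it assumes petsc4py was built against an MPI-enabled PETSc and that mpirun comes from the same MPI installation):

    # save as petsc_size.py and start it as:  mpirun -np 4 python petsc_size.py
    import sys
    import petsc4py
    petsc4py.init(sys.argv)          # initialize PETSc before importing the PETSc module
    from petsc4py import PETSc

    # under "mpirun -np 4" each of the 4 processes prints 4;
    # started as a plain "python petsc_size.py" it prints 1
    print PETSc.COMM_WORLD.Get_size()

If the size stays at 1 even under mpirun, a serial (non-MPI) PETSc build or a mismatched mpirun are the usual suspects.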
URL: From dagss at student.matnat.uio.no Sun May 9 13:35:58 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 09 May 2010 19:35:58 +0200 Subject: [SciPy-User] [OT] any ways to run PETSc4py on several CPU? In-Reply-To: References: Message-ID: <4BE6F27E.5040503@student.matnat.uio.no> Dmitrey wrote: > hi all, > sorry for using the mail list but I haven't found more suitable. > Are there any ways to run PETSc4py on several CPU, i.e. something like > mpirun -np 4? I'd ask on the mpi4py mailing list, as Lisandro is a developer of both, and it's an MPI-related thing. -- Dag Sverre From cool-rr at cool-rr.com Mon May 10 07:37:29 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Mon, 10 May 2010 13:37:29 +0200 Subject: [SciPy-User] Distributing SciPy and NumPy Message-ID: Hello, I have a project called GarlicSim which I want to distribute as an executable, packaged using py2exe. I want to package numpy and scipy with it, so they can be used by the end user. This means I'll be distributing an installer which installs scipy and numpy to my application's library. Are there any licensing issues I should be aware of? Is there any LGPL or GPL licensing in there? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Mon May 10 08:08:57 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 10 May 2010 14:08:57 +0200 Subject: [SciPy-User] Distributing SciPy and NumPy In-Reply-To: References: Message-ID: <4BE7F759.7060100@student.matnat.uio.no> cool-RR wrote: > Hello, > > I have a project called GarlicSim which I want to distribute as an > executable, packaged using py2exe. I want to package numpy and scipy > with it, so they can be used by the end user. This means I'll be > distributing an installer which installs scipy and numpy to my > application's library. > > Are there any licensing issues I should be aware of? Is there any LGPL > or GPL licensing in there? You need a LAPACK implementation to use SciPy, and those come with various licenses. But SciPy+ATLAS is a common combination which is all BSD. SciPy developers are pretty conscious about keeping GPL or LGPL code out of the main SciPy library (though some scikits libraries are under GPL). Dag Sverre From cool-rr at cool-rr.com Mon May 10 09:18:22 2010 From: cool-rr at cool-rr.com (Ram Rachum) Date: Mon, 10 May 2010 13:18:22 +0000 (UTC) Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: Dag Sverre Seljebotn student.matnat.uio.no> writes: > > cool-RR wrote: > > > Are there any licensing issues I should be aware of? Is there any LGPL > > or GPL licensing in there? > > You need a LAPACK implementation to use SciPy, and those come with > various licenses. But SciPy+ATLAS is a common combination which is all BSD. > > SciPy developers are pretty conscious about keeping GPL or LGPL code out > of the main SciPy library (though some scikits libraries are under GPL). > > Dag Sverre > Hey Dag, I've installed numpy and scipy using the standard installers from the website. (Not EPD or Python(x,y)). Is this installation free of any GPL/LGPL? Should I worry about those scikits libraries? Are they in numpy/scipy? Ram. 
From pav at iki.fi Mon May 10 09:27:15 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 10 May 2010 13:27:15 +0000 (UTC) Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: Mon, 10 May 2010 13:18:22 +0000, Ram Rachum wrote: > I've installed numpy and scipy using the standard installers from the > website. (Not EPD or Python(x,y)). Is this installation free of any > GPL/LGPL? Should be. > Should I worry about those scikits libraries? Only if you have installed some of them. > Are they in numpy/scipy? No. They are separate libraries. -- Pauli Virtanen From cool-rr at cool-rr.com Mon May 10 10:20:32 2010 From: cool-rr at cool-rr.com (Ram Rachum) Date: Mon, 10 May 2010 14:20:32 +0000 (UTC) Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: Pauli Virtanen iki.fi> writes: | > I've installed numpy and scipy using the standard installers from the | > website. (Not EPD or Python(x,y)). Is this installation free of any | > GPL/LGPL? | Should be. | > Should I worry about those scikits libraries? | Only if you have installed some of them. | > Are they in numpy/scipy? | No. They are separate libraries. Great. Thanks for your help, Ram. From matthew.brett at gmail.com Mon May 10 15:56:25 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 May 2010 12:56:25 -0700 Subject: [SciPy-User] trouble loading one .mat file In-Reply-To: References: Message-ID: Hi, >> After upgrading to svn scipy 0.8.0.dev6369, to take advantage of Matthew >> Brett's bugfix to the scipy.io code (thanks for that, Matthew), I now have >> one matlab file which I cannot load using scipy.io.loadmat. Trying it gives >> the following error: ... >> --> 354 ? ? ? ? ? ? stream = >> StringIO(zlib.decompress(self.mat_stream.read(byte_count))) >> ?? ?355 ? ? ? ? ? ? self._matrix_reader.set_stream(stream) >> ?? ?356 ? ? ? ? ? ? mdtype, byte_count = self._matrix_reader.read_full_tag() >> error: Error -5 while decompressing data This proved to be an odd one; http://bugs.python.org/issue7191 I've committed the workaround I put in the bug report above; it seems to have a tiny performance penalty. Please do let me know if the fix doesn't help or causes more problems, See you, Matthew From martin.felder at zsw-bw.de Wed May 5 06:03:05 2010 From: martin.felder at zsw-bw.de (Martin Felder) Date: Wed, 05 May 2010 12:03:05 +0200 Subject: [SciPy-User] scikits.timeseries: How to define frequency of 15minutes Message-ID: Hi *, just for the record, I'm having the exact same problem as Georges. I read through your discussion from three weeks ago, but I also don't feel up to modifying the C code myself (being a Fortran kind of guy...). I understand implementing custom user-defined frequencies is probably a lot of effort, but maybe it's less troublesome to just add some frequencies often used (=by Georges and me, and hopefully others?) to the currently implemented ones? I'd be extremely happy to have 12h, 6h, 3h, 15min and 10min intervals in addition to the existing ones. If you could point me to the part of the code that would have to be modified for that, maybe I can find someone more apt in C who can implement it. Thanks, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: martin.felder.vcf Type: text/x-vcard Size: 298 bytes Desc: Card for Martin Felder URL: From kmichael.aye at gmail.com Tue May 11 10:04:31 2010 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Tue, 11 May 2010 16:04:31 +0200 Subject: [SciPy-User] Distributing SciPy and NumPy References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: On 2010-05-10 15:18:22 +0200, Ram Rachum said: > Dag Sverre Seljebotn student.matnat.uio.no> writes: > >> >> cool-RR wrote: >> >>> Are there any licensing issues I should be aware of? Is there any LGPL >>> or GPL licensing in there? >> >> You need a LAPACK implementation to use SciPy, and those come with >> various licenses. But SciPy+ATLAS is a common combination which is all BSD. >> >> SciPy developers are pretty conscious about keeping GPL or LGPL code out >> of the main SciPy library (though some scikits libraries are under GPL). >> >> Dag Sverre >> > > Hey Dag, > > I've installed numpy and scipy using the standard installers from the website. > (Not EPD or Python(x,y)). Is this installation free of any GPL/LGPL? Please excuse my ignorance respectively my legal insecurity, but am I right in assuming, that the only 'problem' I would have with scipy or numpy being released under GPL/LGPL, if I were to release my app NOT under GPL/LGPL? In other words, if i release my app using libraries under GPL/LPGL, all I have to worry is, to release it the same way, right? (Assuming I don't want to earn money with it). This legal stuff confuses the hell outta me... :S Best regards, Michael > > Should I worry about those scikits libraries? Are they in numpy/scipy? > > Ram. From ben.root at ou.edu Tue May 11 11:04:30 2010 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 11 May 2010 10:04:30 -0500 Subject: [SciPy-User] Distributing SciPy and NumPy In-Reply-To: References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: On Tue, May 11, 2010 at 9:04 AM, K.-Michael Aye wrote: > On 2010-05-10 15:18:22 +0200, Ram Rachum said: > > > Dag Sverre Seljebotn student.matnat.uio.no> writes: > > > >> > >> cool-RR wrote: > >> > >>> Are there any licensing issues I should be aware of? Is there any LGPL > >>> or GPL licensing in there? > >> > >> You need a LAPACK implementation to use SciPy, and those come with > >> various licenses. But SciPy+ATLAS is a common combination which is all > BSD. > >> > >> SciPy developers are pretty conscious about keeping GPL or LGPL code out > >> of the main SciPy library (though some scikits libraries are under GPL). > >> > >> Dag Sverre > >> > > > > Hey Dag, > > > > I've installed numpy and scipy using the standard installers from the > website. > > (Not EPD or Python(x,y)). Is this installation free of any GPL/LGPL? > > Please excuse my ignorance respectively my legal insecurity, but am I > right in assuming, that the only 'problem' I would have with scipy or > numpy being released under GPL/LGPL, if I were to release my app NOT > under GPL/LGPL? > The GPL/LGPL is a distribution license, so it can only dictate terms for redistribution of code. Software using GPL'ed code must also be released under a GPL-compatible license. All of the source codes (including changes you made to the original code) must remain open. Software using LGPL'ed code can be released using other licenses, but the LGPL'ed code (and any changes you made to it) must remain open. It is best practice to have the source code accompany the software package, but as far as I understand, this isn't a requirement so long as the code is available by request. 
Someone else can correct me on this. > In other words, if i release my app using libraries under GPL/LPGL, all > I have to worry is, to release it the same way, right? (Assuming I > don't want to earn money with it). > Argh! You can make money on open source code! This isn't the proper place to discuss it, but the open-source community is not a charity case. Open-source is a very viable business model. > This legal stuff confuses the hell outta me... :S > Same here. Also, IANAL, so this isn't legal advice, merely the distillation of various discussions on this topic. Sincerely, Ben Root > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Tue May 11 13:32:18 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 11 May 2010 13:32:18 -0400 Subject: [SciPy-User] Distributing SciPy and NumPy In-Reply-To: References: <4BE7F759.7060100@student.matnat.uio.no> Message-ID: On 11 May 2010 11:04, Benjamin Root wrote: > Argh! You can make money on open source code! This isn't the proper place > to discuss it, but the open-source community is not a charity case. > Open-source is a very viable business model. Not to spawn a discussion, but this is germane - Enthought, that maintains mayavi and pays many of the main numpy/scipy developers, is a for-profit open-source-based company. They just have a different business model than Microsoft (thankfully!). >> This legal stuff confuses the hell outta me... :S > > Same here. Also, IANAL, so this isn't legal advice, merely the distillation > of various discussions on this topic. Copyright law is a nightmare. Anne From permafacture at gmail.com Tue May 11 13:37:42 2010 From: permafacture at gmail.com (Elliot Hallmark) Date: Tue, 11 May 2010 12:37:42 -0500 Subject: [SciPy-User] help interpreting univariate spline In-Reply-To: References: Message-ID: > It's documented, in the FITPACK user's manual, and possibly in that book that I pointed you to in another reply. I had seen this before, and I think joseph is right that the coeeficents given are for the form given by wikipedia. So, my lack of understanding is just mathematical. For others who come across this, here is the solution I used. First, we wanted the spline in bezier form, so we found code for adding knots to a bspline to get the the bezier knots. These knots are all on the curve and are sufficent to define the curve so I used linear albegra to solve for the coefficents from the points. the code to put the spline in bezier form is http://mail.scipy.org/pipermail/scipy-dev/2007-February/006651.html (actually, I got it from http://old.nabble.com/bezier-curve-through-set-of-2D-points-td27158642.html) We used a quadratic spline (to start with) which is defined by three points. I had to determine the normal vector at some point on the curve, so here is that function (in cython code) computing the derivative given the 2D quadratic bezier knots. line 363 in https://bitbucket.org/permafacture/solar-concentrator-design/changeset/6b90db3cf454#chg-raytrace/cfaces.pyx just calculating the determinant and adjoint matrix to solve y = ax^2 + bx + c for A,B and C given three (x,y) pairs. thanks all. 
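For anyone who lands here later: the derivative/normal part is just the standard quadratic Bezier formulas. A minimal numpy sketch (this is not the cython code linked above, and the function names here are made up for illustration):

import numpy as np

def bezier2(p0, p1, p2, t):
    # point on the quadratic Bezier curve B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2
    p0, p1, p2 = [np.asarray(p, dtype=float) for p in (p0, p1, p2)]
    return (1 - t)**2 * p0 + 2 * (1 - t) * t * p1 + t**2 * p2

def bezier2_normal(p0, p1, p2, t):
    # unit normal at parameter t, from the derivative B'(t) = 2(1-t)(P1-P0) + 2t(P2-P1)
    p0, p1, p2 = [np.asarray(p, dtype=float) for p in (p0, p1, p2)]
    tangent = 2 * (1 - t) * (p1 - p0) + 2 * t * (p2 - p1)
    normal = np.array([-tangent[1], tangent[0]])   # rotate the tangent by 90 degrees
    return normal / np.sqrt(np.dot(normal, normal))

# e.g. bezier2_normal((0, 0), (1, 2), (2, 0), 0.5) gives array([ 0.,  1.])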
From gideon.simpson at gmail.com Tue May 11 15:09:32 2010 From: gideon.simpson at gmail.com (Gideon) Date: Tue, 11 May 2010 12:09:32 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran Message-ID: I've previously used the FortranFile.py to read in binary data generated by fortran computations, but now I'd like to write data from NumPy/SciPy to binary which can be read in by a fortran program. Does anyone have an example of using fortranfile.py to create and write data to binary? Alternatively, can anyone suggest a way to write numpy arrays to binary in away that permits me to specify the correct offset (4 bytes on my machine) for fortran to then properly read the data in? From kwmsmith at gmail.com Tue May 11 15:29:05 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Tue, 11 May 2010 14:29:05 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: Message-ID: On Tue, May 11, 2010 at 2:09 PM, Gideon wrote: > I've previously used the FortranFile.py to read in binary data > generated by fortran computations, but now I'd like to write data from > NumPy/SciPy to binary which can be read in by a fortran program. ?Does > anyone have an example of using fortranfile.py to create and write > data to binary? ?Alternatively, can anyone suggest a way to write > numpy arrays to binary in away that permits me to specify the correct > offset (4 bytes on my machine) for fortran to then properly read the > data in? I have a couple of simple fortran reading/writing routines (in python) that work with numpy arrays. I've attached them to this email -- use as you see fit. Hopefully they help, or at least show you how to do what you want. Kurt -------------- next part -------------- A non-text attachment was scrubbed... Name: fio.py Type: text/x-python Size: 3375 bytes Desc: not available URL: From nmb at wartburg.edu Tue May 11 15:58:37 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Tue, 11 May 2010 14:58:37 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: Message-ID: <4BE9B6ED.1020603@wartburg.edu> On 2010-05-11 14:09, Gideon wrote: > I've previously used the FortranFile.py to read in binary data > generated by fortran computations, but now I'd like to write data from > NumPy/SciPy to binary which can be read in by a fortran program. Does > anyone have an example of using fortranfile.py to create and write > data to binary? Alternatively, can anyone suggest a way to write > numpy arrays to binary in away that permits me to specify the correct > offset (4 bytes on my machine) for fortran to then properly read the > data in? You can use the writeReals method of a FortranFile object: In [1]: import fortranfile In [2]: import numpy as np In [3]: F = fortranfile.FortranFile('test.unf',mode='w') In [4]: F.writeReals(np.linspace(0,1,10)) In [5]: F.close() In [6]: !ls -l 'test.unf' -rw-r--r-- 1 nmb nmb 48 2010-05-11 14:56 test.unf There are also writeInts and writeString methods. Like usual, FortranFile only writes and reads homogeneous records: all integers, all reals, etc. To write fortran files with items of different types in a single record, you will have to work harder, perhaps using the struct module directly. 
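For example, a record containing one integer followed by an array of doubles could be written along these lines -- a rough sketch only, assuming the common convention of 4-byte record markers and little-endian data (gfortran's defaults; other compilers or flags may differ), and the function name is just for illustration:

import struct
import numpy as np

def write_mixed_record(f, n, reals):
    # a Fortran sequential unformatted record is the payload framed by a
    # 4-byte byte count written before and after it
    payload = struct.pack('<i', n) + np.asarray(reals, dtype='<f8').tostring()
    marker = struct.pack('<i', len(payload))
    f.write(marker + payload + marker)

f = open('mixed.bin', 'wb')
write_mixed_record(f, 10, np.random.rand(10))
f.close()

A fortran program could then read that single record with something like "read(80) n, x", where n is an integer and x a double precision array of length 10.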
-Neil From goodfellow.ian at gmail.com Tue May 11 16:04:00 2010 From: goodfellow.ian at gmail.com (Ian Goodfellow) Date: Tue, 11 May 2010 16:04:00 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug Message-ID: I've find that (scipy/numpy).linalg.eig have a problem where given a symmetric matrix they return complex eigenvalues. I can use scipy.io to save this matrix in matlab format, load it in matlab, and use matlab's eig function to succesfully decompose it with real eigenvalues, so the problem seems to be with scipy/numpy or their dependencies, not with my matrix. Is this a known issue? And is there a good workaround? I saw another mailing post elsewhere that recommended using scipy.sparse.linalg.eigen.arpack.eigen as an alternative but it doesn't seem to work at all. Can anyone recommend some other way of getting an eigenvalue decomposition in scipy or explain how to use arpack? My failed attempts at using arpack are below. Thanks, Ian >>> A = N.random.randn(3,3) >>> B = arpack.eigen(A) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scip y/sparse/linalg/eigen/arpack/arpack.py", line 172, in eigen raise ValueError("ncv must be k<=ncv<=n, ncv=%s"%ncv) ValueError: ncv must be k<=ncv<=n, ncv=3 >>> B = arpack.eigen(A,3) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 165, in eigen raise ValueError("k must be less than rank(A), k=%d"%k) ValueError: k must be less than rank(A), k=3 >>> B = arpack.eigen(A,2) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 220, in eigen raise RuntimeError("Error info=%d in arpack"%info) RuntimeError: Error info=-3 in arpack >>> B = arpack.eigen(A,2) Traceback (most recent call last): File "", line 1, in File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 220, in eigen raise RuntimeError("Error info=%d in arpack"%info) RuntimeError: Error info=-3 in arpack From josef.pktd at gmail.com Tue May 11 16:20:45 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 May 2010 16:20:45 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: On Tue, May 11, 2010 at 4:04 PM, Ian Goodfellow wrote: > I've find that (scipy/numpy).linalg.eig have a problem where given a > symmetric matrix they return complex eigenvalues. I can use scipy.io > to save this matrix in matlab format, load it in matlab, and use > matlab's eig function to succesfully decompose it with real > eigenvalues, so the problem seems to be with scipy/numpy or their > dependencies, not with my matrix. Is this a known issue? And is there > a good workaround? you could try linalg.eigh it's more specialized and I found that it produces the usual expected results for symmetric matrices Josef > > I saw another mailing post elsewhere that recommended using > scipy.sparse.linalg.eigen.arpack.eigen as an alternative but it > doesn't seem to work at all. Can anyone recommend some other way of > getting an eigenvalue decomposition in scipy or explain how to use > arpack? > > My failed attempts at using arpack are below. 
> > Thanks, > Ian > >>>> A = N.random.randn(3,3) >>>> B = arpack.eigen(A) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scip > ? ? ? ? ? ? ? ?y/sparse/linalg/eigen/arpack/arpack.py", line 172, in > eigen > ? ?raise ValueError("ncv must be k<=ncv<=n, ncv=%s"%ncv) > ValueError: ncv must be k<=ncv<=n, ncv=3 >>>> B = arpack.eigen(A,3) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", > line 165, in eigen > ? ?raise ValueError("k must be less than rank(A), k=%d"%k) > ValueError: k must be less than rank(A), k=3 >>>> B = arpack.eigen(A,2) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", > line 220, in eigen > ? ?raise RuntimeError("Error info=%d in arpack"%info) > RuntimeError: Error info=-3 in arpack >>>> B = arpack.eigen(A,2) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/u/lisa/local/export.soft.lisa.master/linux-x86_64-fc9.x86_64//lib64/python2.5/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", > line 220, in eigen > ? ?raise RuntimeError("Error info=%d in arpack"%info) > RuntimeError: Error info=-3 in arpack > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pav at iki.fi Tue May 11 16:39:59 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 May 2010 20:39:59 +0000 (UTC) Subject: [SciPy-User] Eigenvalue decomposition bug References: Message-ID: Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: > I've find that (scipy/numpy).linalg.eig have a problem where given a > symmetric matrix they return complex eigenvalues. I can use scipy.io to > save this matrix in matlab format, load it in matlab, and use matlab's > eig function to succesfully decompose it with real eigenvalues, so the > problem seems to be with scipy/numpy or their dependencies, not with my > matrix. Is this a known issue? And is there a good workaround? Use the eigh function if you know your matrix is symmetric. Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a symmetric-specific eigensolver. Numpy and Scipy don't do this automatic check. A nonsymmetric eigensolver cannot know that your matrix is supposed to have real eigenvalues, so it's possible some of them explode to complex pairs because of minuscule numerical error. The imaginary part, however, is typically small. -- Pauli Virtanen From nahumoz at gmail.com Tue May 11 23:47:27 2010 From: nahumoz at gmail.com (Oz Nahum) Date: Tue, 11 May 2010 20:47:27 -0700 Subject: [SciPy-User] finding max value in a vector which contains NaN's Message-ID: Hi All, I have a code that needs to find a max value in a vector, which has also NaN. using max(cr), I get answer: nan, even though, the largest value is 15.1879.... Anyone has an idea how to avoid this problem ? I don't want to make a loop to kick out all the NaN values, although that would be a solution... 
Thanks in advance, -- Oz Nahum Graduate Student Zentrum f?r Angewandte Geologie Universit?t T?bingen --- Imagine there's no countries it isn't hard to do Nothing to kill or die for And no religion too Imagine all the people Living life in peace From zachary.pincus at yale.edu Wed May 12 00:02:07 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 00:02:07 -0400 Subject: [SciPy-User] mail not getting through? Message-ID: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Hi and sorry for the spam, The last couple of times I've replied to messages from scipy-user, it would appear that the mail never comes through to the list, but it doesn't bounce back to me either. (I replied to the symmetric eigenvalue message, but nothing came back on the list, e.g.) If this email gets through, has anyone else seen this issue? Zach From pgmdevlist at gmail.com Wed May 12 00:24:35 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 12 May 2010 00:24:35 -0400 Subject: [SciPy-User] finding max value in a vector which contains NaN's In-Reply-To: References: Message-ID: <8C6714EC-31C3-4EA9-BEE6-06C8E71AD427@gmail.com> On May 11, 2010, at 11:47 PM, Oz Nahum wrote: > Hi All, > I have a code that needs to find a max value in a vector, which has also NaN. > using max(cr), I get answer: nan, even though, the largest value is 15.1879.... > > Anyone has an idea how to avoid this problem ? I don't want to make a > loop to kick out all the NaN values, although that would be a > solution... Use `nanmax` (a numpy function). From ariver at enthought.com Wed May 12 01:21:19 2010 From: ariver at enthought.com (Aaron River) Date: Wed, 12 May 2010 00:21:19 -0500 Subject: [SciPy-User] mail not getting through? In-Reply-To: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: Hello Zach, I'm the IT Administrator at Enthought. This is a known issue which I'm working to rectify. I'm hoping to have it all ironed out tomorrow. Thanks, -- Aaron On Tuesday, May 11, 2010, Zachary Pincus wrote: > Hi and sorry for the spam, > > The last couple of times I've replied to messages from scipy-user, it > would appear that the mail never comes through to the list, but it > doesn't bounce back to me either. (I replied to the symmetric > eigenvalue message, but nothing came back on the list, e.g.) > > If this email gets through, has anyone else seen this issue? > > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Wed May 12 01:56:49 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 01:56:49 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers Message-ID: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Hi all, I've been meaning for a long time to look into cobbling together some non-broken, maintainable (e.g. non-PIL) image IO library that can deal with scientific (16-bit and floating-point) image formats. I finally bit the bullet yesterday and whipped together a ctypes wrapper for the FreeImage library. (FreeImage is portable and largely if not entirely dependency-free; Windows binaries are available and it compiles cleanly on os x as well as other unixes: http://freeimage.sourceforge.net/ Check out the manual: http://downloads.sourceforge.net/freeimage/FreeImage3131.pdf , particularly the appendix that shows the supported image types and pixel formats: pretty impressive. 
Also note that there is a "FreeImagePy" project that has ctypes wrappers for FreeImage, but the code is... idiosyncratic... and doesn't interface with numpy anyway.) The underlying library and wrappers I wrote support reading and writing of greyscale, RGB, and RGBA images with 8- and 16-bit int/uint and 32-bit float pixels, as well as greyscale images with 64-bit float and 128-bit complex pixels. (The TIFF spec supports all of these, at least, as does FreeImage, but most other TIFF readers probably don't. The PNG format itself is a bit more limited, but FreeImage can read/ write everything in the spec, I think. Most other formats are 8-bit only.) Multipage image IO is also supported, and there's currently a bit of support for reading EXIF tags, which could easily be beefed up. The wrapper code is pretty compact and straightforward, and the FreeImage library seems pretty robust and simple (once one notes that it uses BGRA ordering on little-endian systems). Overall I feel a lot better about using this than dealing with PIL and its broken memory model and worse patch-acceptance track record. If anyone wants to test the wrappers out, I'll send you the code. Going forward, I'll look into getting this into the scikits image IO system, but I don't really have free cycles for that right now. Zach PS. FreeImage is dual licensed: GPL and a "FreeImage license", the latter of which I have no idea if is BSD compatible -- it says it's "less restrictive" than GPL but I'm unable to parse the license's many clauses. In any case, as long as users are required to provide their own FreeImage dll/so/dylib, it's not really a problem. From josef.pktd at gmail.com Wed May 12 02:15:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 12 May 2010 02:15:01 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: On Wed, May 12, 2010 at 1:56 AM, Zachary Pincus wrote: > Hi all, > > I've been meaning for a long time to look into cobbling together some > non-broken, maintainable (e.g. non-PIL) image IO library that can deal > with scientific (16-bit and floating-point) image formats. I finally > bit the bullet yesterday and whipped together a ctypes wrapper for the > FreeImage library. (FreeImage is portable and largely if not entirely > dependency-free; Windows binaries are available and it compiles > cleanly on os x as well as other unixes: http://freeimage.sourceforge.net/ > ?Check out the manual: http://downloads.sourceforge.net/freeimage/FreeImage3131.pdf > ?, particularly the appendix that shows the supported image types and > pixel formats: pretty impressive. Also note that there is a > "FreeImagePy" project that has ctypes wrappers for FreeImage, but the > code is... idiosyncratic... and doesn't interface with numpy anyway.) > > The underlying library and wrappers I wrote support reading and > writing of greyscale, RGB, and RGBA images with 8- and 16-bit int/uint > and 32-bit float pixels, as well as greyscale images with 64-bit float > and 128-bit complex pixels. (The TIFF spec supports all of these, at > least, as does FreeImage, but most other TIFF readers probably don't. > The PNG format itself is a bit more limited, but FreeImage can read/ > write everything in the spec, I think. Most other formats are 8-bit > only.) Multipage image IO is also supported, and there's currently a > bit of support for reading EXIF tags, which could easily be beefed up. 
> > The wrapper code is pretty compact and straightforward, and the > FreeImage library seems pretty robust and simple (once one notes that > it uses BGRA ordering on little-endian systems). Overall I feel a lot > better about using this than dealing with PIL and its broken memory > model and worse patch-acceptance track record. > > If anyone wants to test the wrappers out, I'll send you the code. > Going forward, I'll look into getting this into the scikits image IO > system, but I don't really have free cycles for that right now. > > Zach > > PS. FreeImage is dual licensed: GPL and a "FreeImage license", the > latter of which I have no idea if is BSD compatible -- it says it's > "less restrictive" than GPL but I'm unable to parse the license's many > clauses. In any case, as long as users are required to provide their > own FreeImage dll/so/dylib, it's not really a problem. "FreeImage Public license" looks like http://www.mozilla.org/MPL/MPL-1.1.html no item 13 is the only difference from a quick look Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From goodfellow.ian at gmail.com Wed May 12 08:43:50 2010 From: goodfellow.ian at gmail.com (Ian Goodfellow) Date: Wed, 12 May 2010 08:43:50 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: Great, thanks. eigh seems to be working. -Ian On Tue, May 11, 2010 at 4:39 PM, Pauli Virtanen wrote: > Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >> I've find that (scipy/numpy).linalg.eig have a problem where given a >> symmetric matrix they return complex eigenvalues. I can use scipy.io to >> save this matrix in matlab format, load it in matlab, and use matlab's >> eig function to succesfully decompose it with real eigenvalues, so the >> problem seems to be with scipy/numpy or their dependencies, not with my >> matrix. Is this a known issue? And is there a good workaround? > > Use the eigh function if you know your matrix is symmetric. > > Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a > symmetric-specific eigensolver. Numpy and Scipy don't do this automatic > check. > > A nonsymmetric eigensolver cannot know that your matrix is supposed to > have real eigenvalues, so it's possible some of them explode to complex > pairs because of minuscule numerical error. The imaginary part, however, > is typically small. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From seb.haase at gmail.com Wed May 12 10:41:16 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 12 May 2010 16:41:16 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: Hi Zach, this sounds exciting and I might find some time to try it out ... BTW, the Python image-sig should not be a "PIL only" mailing list. So (eventually) I feel, this issue could be brought up there, too. But most importantly, I think it would be great to finally have a "small footprint" image-format library that does not try to reproduce all kinds of operations that we can do easily in numpy. Do you know if FreeImage does anything via memory-mapping ? 
I'm mostly interested in TIFF-memmap, which exists according to libtif, but I have now idea how useful it is ..... (I need memmap for GB-size multipage images) Thanks, Sebastian On Wed, May 12, 2010 at 7:56 AM, Zachary Pincus wrote: > Hi all, > > I've been meaning for a long time to look into cobbling together some > non-broken, maintainable (e.g. non-PIL) image IO library that can deal > with scientific (16-bit and floating-point) image formats. I finally > bit the bullet yesterday and whipped together a ctypes wrapper for the > FreeImage library. (FreeImage is portable and largely if not entirely > dependency-free; Windows binaries are available and it compiles > cleanly on os x as well as other unixes: http://freeimage.sourceforge.net/ > ?Check out the manual: http://downloads.sourceforge.net/freeimage/FreeImage3131.pdf > ?, particularly the appendix that shows the supported image types and > pixel formats: pretty impressive. Also note that there is a > "FreeImagePy" project that has ctypes wrappers for FreeImage, but the > code is... idiosyncratic... and doesn't interface with numpy anyway.) > > The underlying library and wrappers I wrote support reading and > writing of greyscale, RGB, and RGBA images with 8- and 16-bit int/uint > and 32-bit float pixels, as well as greyscale images with 64-bit float > and 128-bit complex pixels. (The TIFF spec supports all of these, at > least, as does FreeImage, but most other TIFF readers probably don't. > The PNG format itself is a bit more limited, but FreeImage can read/ > write everything in the spec, I think. Most other formats are 8-bit > only.) Multipage image IO is also supported, and there's currently a > bit of support for reading EXIF tags, which could easily be beefed up. > > The wrapper code is pretty compact and straightforward, and the > FreeImage library seems pretty robust and simple (once one notes that > it uses BGRA ordering on little-endian systems). Overall I feel a lot > better about using this than dealing with PIL and its broken memory > model and worse patch-acceptance track record. > > If anyone wants to test the wrappers out, I'll send you the code. > Going forward, I'll look into getting this into the scikits image IO > system, but I don't really have free cycles for that right now. > > Zach > > PS. FreeImage is dual licensed: GPL and a "FreeImage license", the > latter of which I have no idea if is BSD compatible -- it says it's > "less restrictive" than GPL but I'm unable to parse the license's many > clauses. In any case, as long as users are required to provide their > own FreeImage dll/so/dylib, it's not really a problem. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ben.root at ou.edu Wed May 12 10:48:35 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 12 May 2010 09:48:35 -0500 Subject: [SciPy-User] finding max value in a vector which contains NaN's In-Reply-To: <8C6714EC-31C3-4EA9-BEE6-06C8E71AD427@gmail.com> References: <8C6714EC-31C3-4EA9-BEE6-06C8E71AD427@gmail.com> Message-ID: FYI, if you are coming from another language like Matlab, you may have been used to using NaNs to indicate bad values and such (not that there is anything wrong with that!). However, Numpy offers an interesting way to deal with bad values in arrays called "Masked Arrays". 
>>> import numpy as np >>> x = np.array([2, 1, 3, np.nan, 5, 2, 3, np.nan]) >>> np.max(x) nan >>> m = np.ma.masked_array(x, np.isnan(x)) >>> np.max(m) 5.0 Ben Root On Tue, May 11, 2010 at 11:24 PM, Pierre GM wrote: > On May 11, 2010, at 11:47 PM, Oz Nahum wrote: > > Hi All, > > I have a code that needs to find a max value in a vector, which has also > NaN. > > using max(cr), I get answer: nan, even though, the largest value is > 15.1879.... > > > > Anyone has an idea how to avoid this problem ? I don't want to make a > > loop to kick out all the NaN values, although that would be a > > solution... > > > Use `nanmax` (a numpy function). > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Wed May 12 13:10:19 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 13:10:19 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: > Do you know if FreeImage does anything via memory-mapping ? I'm mostly > interested in TIFF-memmap, which exists according to libtif, but I > have now idea how useful it is ..... (I need memmap for GB-size > multipage images) I don't know a ton about how memmapping works, but check out these functions from FreeImage: > FreeImage_OpenMemory > DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data > FI_DEFAULT(0), DWORD > size_in_bytes FI_DEFAULT(0)); > > Open a memory stream. The function returns a pointer to the opened > memory stream. > When called with default arguments (0), this function opens a memory > stream for read / write > access. The stream will support loading and saving of FIBITMAP in a > memory file (managed > internally by FreeImage). It will also support seeking and telling > in the memory file. > This function can also be used to wrap a memory buffer provided by > the application driving > FreeImage. A buffer containing image data is given as function > arguments data (start of the > buffer) and size_in_bytes (buffer size in bytes). A memory buffer > wrapped by FreeImage is > read only. Images can be loaded but cannot be saved. > FreeImage_LoadFromHandle > DLL_API FIBITMAP *DLL_CALLCONV > FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, > FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); > > FreeImage has the unique feature to load a bitmap from an arbitrary > source. This source > might for example be a cabinet file, a zip file or an Internet > stream. Handling of these arbitrary > sources is not directly handled in the FREEIMAGE.DLL, but can be > easily added by using a > FreeImageIO structure as defined in FREEIMAGE.H. > FreeImageIO is a structure that contains 4 function pointers: one to > read from a source, one > to write to a source, one to seek in the source and one to tell > where in the source we currently > are. When you populate the FreeImageIO structure with pointers to > functions and pass that > structure to FreeImage_LoadFromHandle, FreeImage will call your > functions to read, seek > and tell in a file. The handle-parameter (third parameter from the > left) is used in this to > differentiate between different contexts, e.g. different files or > different Internet streams. 
With the first, I think you could just pass the void* pointer returned from memmapping a file; with the second, I think you could wrap a memmapped file with a file-like interface (implemented in python callbacks, even). Not sure, of course, if that will work OK... Might be easier to work with wrappers to libtiff directly? Zach From kwmsmith at gmail.com Wed May 12 13:36:22 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Wed, 12 May 2010 12:36:22 -0500 Subject: [SciPy-User] Bug in ndimage.map_coordinates with mode='wrap' ? In-Reply-To: References: Message-ID: On Mon, Mar 22, 2010 at 11:24 AM, Kurt Smith wrote: > On Mon, Mar 22, 2010 at 11:20 AM, Kurt Smith wrote: >> Hi, >> >> Testing the example code in ndimage.map_coordinate's docstring, I >> can't get things to work with mode='wrap'. ?What am I doing wrong? >> >> In [31]: a >> Out[31]: >> array([[ ?0., ? 1., ? 2.], >> ? ? ? [ ?3., ? 4., ? 5.], >> ? ? ? [ ?6., ? 7., ? 8.], >> ? ? ? [ ?9., ?10., ?11.]]) >> >> In [32]: ndimage.map_coordinates(a, [range(5), [0]*5], order=1, >> mode='wrap') ?# should be 0, 3, 6, 9, 0 -- right? >> Out[32]: array([ 0., ?3., ?6., ?9., ?3.]) >> >> In [33]: ndimage.map_coordinates(a, [[0]*4, range(4)], order=1, >> mode='wrap') ?# should be 0, 1, 2, 0 -- right? >> Out[33]: array([ 0., ?1., ?2., ?1.]) >> >> Here's the output when extending the sampling range: >> >> In [36]: ndimage.map_coordinates(a, [range(10), [0]*10], order=1, >> mode='wrap') ?# should be 0, 3, 6, 9, 0, 3, 6, 9, ... >> Out[36]: array([ 0., ?3., ?6., ?9., ?3., ?6., ?0., ?3., ?6., ?0.]) >> >> In [37]: ndimage.map_coordinates(a, [[0]*8, range(8)], order=1, mode='wrap') >> Out[37]: array([ 0., ?1., ?2., ?1., ?0., ?1., ?0., ?1.]) >> >> >> If it's a bug, where can I file a report, and what can I do to help >> fix it? ?Looks like the wrapping code is in a compiled extension >> module -- I'll take a look. > > I forgot to include: > > In [39]: sp.version.version > Out[39]: '0.8.0.dev6120' > Looks like the above is another version of this bug: http://projects.scipy.org/scipy/ticket/796 It affects any scipy.ndimage routines that use mode='wrap'. The patch has been helpfully submitted and it's in 'needs review' status -- any chance it could see some action? Otherwise I'll just patch scipy locally. Kurt From rmb62 at cornell.edu Wed May 12 13:38:53 2010 From: rmb62 at cornell.edu (Robin M Baur) Date: Wed, 12 May 2010 13:38:53 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: On Wed, May 12, 2010 at 01:56, Zachary Pincus wrote: [snip] > The wrapper code is pretty compact and straightforward, and the > FreeImage library seems pretty robust and simple (once one notes that > it uses BGRA ordering on little-endian systems). Overall I feel a lot > better about using this than dealing with PIL and its broken memory > model and worse patch-acceptance track record. > > If anyone wants to test the wrappers out, I'll send you the code. > Going forward, I'll look into getting this into the scikits image IO > system, but I don't really have free cycles for that right now. > > Zach I'm definitely interested, having had several nightmarish attempts at making PIL play nice with my 16-bit TIFF data. I don't have a ton of spare time myself right now, but I'd like to give it a shot. 
Robin From gideon.simpson at gmail.com Wed May 12 13:56:20 2010 From: gideon.simpson at gmail.com (Gideon) Date: Wed, 12 May 2010 10:56:20 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BE9B6ED.1020603@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> Message-ID: I've tried the following. In Python: import numpy as np from FortranFile import FortranFile x = np.random.rand(10) f = FortranFile('test.bin',mode='w') f.writeReals(x) f.close() In Fortran: program bintest double precision x(10) integer j open(unit=80, file='test.bin', status='old', form='unformatted') read(80) x close(80) do j=1,10 write(*,*) x(j) enddo end then at the command line, gfortran bintest.f -o bintest ./bintest At line 9 of file bintest.f (unit = 80, file = 'test.bin') Fortran runtime error: I/O past end of record on unformatted file Note, I have no difficulty reading the test.bin file back in, while in python, using the FortranFile.py routines. On May 11, 3:58?pm, Neil Martinsen-Burrell wrote: > On 2010-05-11 14:09, Gideon wrote: > > > I've previously used the FortranFile.py to read in binary data > > generated by fortran computations, but now I'd like to write data from > > NumPy/SciPy to binary which can be read in by a fortran program. ?Does > > anyone have an example of using fortranfile.py to create and write > > data to binary? ?Alternatively, can anyone suggest a way to write > > numpy arrays to binary in away that permits me to specify the correct > > offset (4 bytes on my machine) for fortran to then properly read the > > data in? > > You can use the writeReals method of a FortranFile object: > > In [1]: import fortranfile > > In [2]: import numpy as np > > In [3]: F = fortranfile.FortranFile('test.unf',mode='w') > > In [4]: F.writeReals(np.linspace(0,1,10)) > > In [5]: F.close() > > In [6]: !ls -l 'test.unf' > -rw-r--r-- 1 nmb nmb 48 2010-05-11 14:56 test.unf > > There are also writeInts and writeString methods. ?Like usual, > FortranFile only writes and reads homogeneous records: all integers, all > reals, etc. ?To write fortran files with items of different types in a > single record, you will have to work harder, perhaps using the struct > module directly. > > -Neil > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > -- > You received this message because you are subscribed to the Google Groups "SciPy-user" group. > To post to this group, send email to scipy-user at googlegroups.com. > To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. > For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. From nmb at wartburg.edu Wed May 12 14:00:36 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 12 May 2010 13:00:36 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: <4BE9B6ED.1020603@wartburg.edu> Message-ID: <4BEAECC4.5050508@wartburg.edu> On 2010-05-12 12:56, Gideon wrote: > I've tried the following. 
> > In Python: > import numpy as np > from FortranFile import FortranFile > > x = np.random.rand(10) > f = FortranFile('test.bin',mode='w') > f.writeReals(x) > f.close() > > In Fortran: > program bintest > > double precision x(10) > integer j > > open(unit=80, file='test.bin', status='old', form='unformatted') > > read(80) x > close(80) > > do j=1,10 > write(*,*) x(j) > > enddo > > > end > > then at the command line, > > gfortran bintest.f -o bintest > ./bintest > At line 9 of file bintest.f (unit = 80, file = 'test.bin') > Fortran runtime error: I/O past end of record on unformatted file > > Note, I have no difficulty reading the test.bin file back in, while in > python, using the FortranFile.py routines. It is likely that the problem is with the endian-ness of the file being created by FortranFile not matching what is expected by the fortran compiler. (There is a reason that the format of unformatted I/O is not specified in the Fortran standard.) Try the above with different settings of FortranFile(..., endian='<') or '>' or '='. -Neil From sebastian.walter at gmail.com Wed May 12 14:48:43 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 12 May 2010 20:48:43 +0200 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: Hello Pauli, On what kind of matrix did you observe such unstable behavior? Were there repeated eigenvalues? Sebastian On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: > Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >> I've find that (scipy/numpy).linalg.eig have a problem where given a >> symmetric matrix they return complex eigenvalues. I can use scipy.io to >> save this matrix in matlab format, load it in matlab, and use matlab's >> eig function to succesfully decompose it with real eigenvalues, so the >> problem seems to be with scipy/numpy or their dependencies, not with my >> matrix. Is this a known issue? And is there a good workaround? > > Use the eigh function if you know your matrix is symmetric. > > Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a > symmetric-specific eigensolver. Numpy and Scipy don't do this automatic > check. > > A nonsymmetric eigensolver cannot know that your matrix is supposed to > have real eigenvalues, so it's possible some of them explode to complex > pairs because of minuscule numerical error. The imaginary part, however, > is typically small. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gideon.simpson at gmail.com Wed May 12 15:58:28 2010 From: gideon.simpson at gmail.com (Gideon) Date: Wed, 12 May 2010 12:58:28 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEAECC4.5050508@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> Message-ID: Tried both, but I got the same error in both cases. On May 12, 2:00?pm, Neil Martinsen-Burrell wrote: > On 2010-05-12 12:56, Gideon wrote: > > > > > > > I've tried the following. > > > In Python: > > import numpy as np > > from FortranFile import FortranFile > > > x = np.random.rand(10) > > f = FortranFile('test.bin',mode='w') > > f.writeReals(x) > > f.close() > > > In Fortran: > > ? ? ? ?program bintest > > > ? ? ? ?double precision x(10) > > ? ? ? ?integer j > > > ? ? ? ?open(unit=80, file='test.bin', status='old', form='unformatted') > > > ? ? ? ?read(80) x > > ? ? ? ?close(80) > > > ? ? ? 
?do j=1,10 > > ? ? ? ? ? write(*,*) x(j) > > > ? ? ? ?enddo > > > ? ? ? ?end > > > then at the command line, > > > gfortran bintest.f -o bintest > > ./bintest > > At line 9 of file bintest.f (unit = 80, file = 'test.bin') > > Fortran runtime error: I/O past end of record on unformatted file > > > Note, I have no difficulty reading the test.bin file back in, while in > > python, using the FortranFile.py routines. > > It is likely that the problem is with the endian-ness of the file being > created by FortranFile not matching what is expected by the fortran > compiler. ?(There is a reason that the format of unformatted I/O is not > specified in the Fortran standard.) ?Try the above with different > settings of FortranFile(..., endian='<') or '>' or '='. > > -Neil > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > -- > You received this message because you are subscribed to the Google Groups "SciPy-user" group. > To post to this group, send email to scipy-user at googlegroups.com. > To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. > For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. From nmb at wartburg.edu Wed May 12 16:21:17 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 12 May 2010 15:21:17 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> Message-ID: <4BEB0DBD.4080000@wartburg.edu> On 2010-05-12 14:58, Gideon wrote: > Tried both, but I got the same error in both cases. If you want doubles in your file, you have to request them: F.writeReals(x, prec='d') makes everything work for me (Ubuntu 10.04, python 2.6.5, gfortran 4.4.3). Note that looking at the size of the file that you would expect to have for the data you are expecting to read would have demonstrated this: 10 doubles at eight bytes per double plus two 4-byte integers would have given you 88 bytes for the file, rather than the 48 that were being produced. I use fortranfile most heavily for reading files, rather than writing them, so I may have missed this opportunity, but do you think that the precision used in writeReals should be auto-detected from the data type that it is passed. That is, would def writeReals(self, reals, prec=None): if prec is None: prec = reals.dtype.char ... be better for your use? That would have made your original code work as written. -Neil From seb.haase at gmail.com Wed May 12 16:31:33 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 12 May 2010 22:31:33 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: Zach, Thanks for the reply, I have looked at the sourceforge - few comments: - is there currently only one person behind FreeImage ? - it seems there are some problems with 64 bit windows - apparently related to inline assembly... - the discussion group seems not very responsive / active -- might also of course mean that it mostly "just works" ;-) - the "who uses FreeImage" list seems really quite long - but then PIL is probably also used by many ... How large is the DLL actually ? Thanks, Sebastian On Wed, May 12, 2010 at 7:10 PM, Zachary Pincus wrote: >> Do you know if FreeImage does anything via memory-mapping ? 
I'm mostly >> interested in TIFF-memmap, which exists according to libtif, but I >> have now idea how useful it is ..... ?(I need memmap for GB-size >> multipage images) > > I don't know a ton about how memmapping works, but check out these > functions from FreeImage: > >> FreeImage_OpenMemory >> DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data >> FI_DEFAULT(0), DWORD >> size_in_bytes FI_DEFAULT(0)); >> >> Open a memory stream. The function returns a pointer to the opened >> memory stream. >> When called with default arguments (0), this function opens a memory >> stream for read / write >> access. The stream will support loading and saving of FIBITMAP in a >> memory file (managed >> internally by FreeImage). It will also support seeking and telling >> in the memory file. >> This function can also be used to wrap a memory buffer provided by >> the application driving >> FreeImage. A buffer containing image data is given as function >> arguments data (start of the >> buffer) and size_in_bytes (buffer size in bytes). A memory buffer >> wrapped by FreeImage is >> read only. Images can be loaded but cannot be saved. > >> FreeImage_LoadFromHandle >> DLL_API FIBITMAP *DLL_CALLCONV >> FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, >> FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); >> >> FreeImage has the unique feature to load a bitmap from an arbitrary >> source. This source >> might for example be a cabinet file, a zip file or an Internet >> stream. Handling of these arbitrary >> sources is not directly handled in the FREEIMAGE.DLL, but can be >> easily added by using a >> FreeImageIO structure as defined in FREEIMAGE.H. >> FreeImageIO is a structure that contains 4 function pointers: one to >> read from a source, one >> to write to a source, one to seek in the source and one to tell >> where in the source we currently >> are. When you populate the FreeImageIO structure with pointers to >> functions and pass that >> structure to FreeImage_LoadFromHandle, FreeImage will call your >> functions to read, seek >> and tell in a file. The handle-parameter (third parameter from the >> left) is used in this to >> differentiate between different contexts, e.g. different files or >> different Internet streams. > > With the first, I think you could just pass the void* pointer returned > from memmapping a file; with the second, I think you could wrap a > memmapped file with a file-like interface (implemented in python > callbacks, even). Not sure, of course, if that will work OK... Might > be easier to work with wrappers to libtiff directly? > > Zach > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Wed May 12 17:16:11 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 17:16:11 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: <2A97F654-19A1-43D4-BDEA-D212AB669EBA@yale.edu> Hi all, > I'm definitely interested, having had several nightmarish attempts at > making PIL play nice with my 16-bit TIFF data. I don't have a ton of > spare time myself right now, but I'd like to give it a shot. Wrappers attached. Currently, they try to load "FreeImage.[dll|dylib| so]" (depending on the platform) from the same directory as the module. This is of course easy to change. 
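(Very roughly, the loading step looks like the sketch below; the exact names in the attached image.py may differ, and on Windows the library's stdcall calling convention may mean ctypes.WinDLL is needed rather than CDLL.)

import ctypes
import os.path
import sys

_libname = {'win32': 'FreeImage.dll',
            'darwin': 'FreeImage.dylib'}.get(sys.platform, 'FreeImage.so')
_here = os.path.dirname(os.path.abspath(__file__))
_FI = ctypes.CDLL(os.path.join(_here, _libname))  # or ctypes.WinDLL on win32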
The rest is basically straightforward and at least partially documented. Let me know how it works out. Note that right now there's no support for palettized images, though that could be added too. And as for license, assume this code, such as it is, is BSD. Zach -------------- next part -------------- A non-text attachment was scrubbed... Name: image.py Type: text/x-python-script Size: 10285 bytes Desc: not available URL: -------------- next part -------------- From zachary.pincus at yale.edu Wed May 12 17:25:29 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 12 May 2010 17:25:29 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> Message-ID: <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> > - is there currently only one person behind FreeImage ? No idea... > - it seems there are some problems with 64 bit windows - apparently > related to inline assembly... Ugh, didn't notice that. > - the discussion group seems not very responsive / active -- might > also of course mean that it mostly "just works" ;-) > - the "who uses FreeImage" list seems really quite long - but then PIL > is probably also used by many ... Yeah, it doesn't seem to have a super-active community, but so far it seems to just work, which is OK for now, I hope. Not sure how clean the C code is, but going from the API at least someone has spent time thinking about making things clean and portable, etc., and hopefully maintainable. I think I'd rather hack on FreeImage's C guts than PIL's, but that's without any experience with the former. > How large is the DLL actually ? Win32 dll = 2.3 MB, OS X intel-only dylib (with debug symbols) = 2.9 MB. Not terrible. Zach > Thanks, > Sebastian > > > On Wed, May 12, 2010 at 7:10 PM, Zachary Pincus > wrote: >>> Do you know if FreeImage does anything via memory-mapping ? I'm >>> mostly >>> interested in TIFF-memmap, which exists according to libtif, but I >>> have now idea how useful it is ..... (I need memmap for GB-size >>> multipage images) >> >> I don't know a ton about how memmapping works, but check out these >> functions from FreeImage: >> >>> FreeImage_OpenMemory >>> DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data >>> FI_DEFAULT(0), DWORD >>> size_in_bytes FI_DEFAULT(0)); >>> >>> Open a memory stream. The function returns a pointer to the opened >>> memory stream. >>> When called with default arguments (0), this function opens a memory >>> stream for read / write >>> access. The stream will support loading and saving of FIBITMAP in a >>> memory file (managed >>> internally by FreeImage). It will also support seeking and telling >>> in the memory file. >>> This function can also be used to wrap a memory buffer provided by >>> the application driving >>> FreeImage. A buffer containing image data is given as function >>> arguments data (start of the >>> buffer) and size_in_bytes (buffer size in bytes). A memory buffer >>> wrapped by FreeImage is >>> read only. Images can be loaded but cannot be saved. >> >>> FreeImage_LoadFromHandle >>> DLL_API FIBITMAP *DLL_CALLCONV >>> FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, >>> FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); >>> >>> FreeImage has the unique feature to load a bitmap from an arbitrary >>> source. This source >>> might for example be a cabinet file, a zip file or an Internet >>> stream. 
Handling of these arbitrary >>> sources is not directly handled in the FREEIMAGE.DLL, but can be >>> easily added by using a >>> FreeImageIO structure as defined in FREEIMAGE.H. >>> FreeImageIO is a structure that contains 4 function pointers: one to >>> read from a source, one >>> to write to a source, one to seek in the source and one to tell >>> where in the source we currently >>> are. When you populate the FreeImageIO structure with pointers to >>> functions and pass that >>> structure to FreeImage_LoadFromHandle, FreeImage will call your >>> functions to read, seek >>> and tell in a file. The handle-parameter (third parameter from the >>> left) is used in this to >>> differentiate between different contexts, e.g. different files or >>> different Internet streams. >> >> With the first, I think you could just pass the void* pointer >> returned >> from memmapping a file; with the second, I think you could wrap a >> memmapped file with a file-like interface (implemented in python >> callbacks, even). Not sure, of course, if that will work OK... Might >> be easier to work with wrappers to libtiff directly? >> >> Zach >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From gideon.simpson at gmail.com Wed May 12 18:05:55 2010 From: gideon.simpson at gmail.com (Gideon) Date: Wed, 12 May 2010 15:05:55 -0700 (PDT) Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEB0DBD.4080000@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> <4BEB0DBD.4080000@wartburg.edu> Message-ID: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> Yea, that worked for me on my OS X machine. Thanks so much. To be honest, in the 10 years I've been doing floating point calculations for ODEs and PDEs, I don't think I've ever used single precision arithmetic. So I am surprised it doesn't default to double precision. Obviously, different people have different needs. On May 12, 4:21?pm, Neil Martinsen-Burrell wrote: > On 2010-05-12 14:58, Gideon wrote: > > > Tried both, but I got the same error in both cases. > > If you want doubles in your file, you have to request them: > > F.writeReals(x, prec='d') > > makes everything work for me (Ubuntu 10.04, python 2.6.5, gfortran > 4.4.3). ?Note that looking at the size of the file that you would expect > to have for the data you are expecting to read would have demonstrated > this: 10 doubles at eight bytes per double plus two 4-byte integers > would have given you 88 bytes for the file, rather than the 48 that were > being produced. > > I use fortranfile most heavily for reading files, rather than writing > them, so I may have missed this opportunity, but do you think that the > precision used in writeReals should be auto-detected from the data type > that it is passed. ?That is, would > > def writeReals(self, reals, prec=None): > ? ? ?if prec is None: > ? ? ? ? ?prec = reals.dtype.char > ? ? ?... > > be better for your use? ?That would have made your original code work as > written. > > -Neil > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > -- > You received this message because you are subscribed to the Google Groups "SciPy-user" group. 
> To post to this group, send email to scipy-user at googlegroups.com. > To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. > For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. From josef.pktd at gmail.com Thu May 13 00:35:07 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 13 May 2010 00:35:07 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 3, 2010 at 3:32 PM, nicky van foreest wrote: > Hi Josef, > > Thanks for your answer. > > On 3 May 2010 15:16, ? wrote: >> On Mon, May 3, 2010 at 6:04 AM, nicky van foreest wrote: >>> Hi, >>> >>> As far as I can see scipy.stats does not support the deterministic >>> distribution. Would it be a good idea to implement this also? In my >>> opinion this distribution is very useful to use as a test case, for >>> debugging purposes for instance. > > One case is the M/D/1 queue, a single server with exponentially > distributed interarrival times and deterministic service times. > Another case is an inventory system with periodic replenishments, and > random demands. A first simple model would be to use deterministically > distributed interreplenishment times. The size of demand can also be > taken to be deterministic, as an interesting limiting case. > >> >> You mean something like http://en.wikipedia.org/wiki/Degenerate_distribution >> (I never heard the term deterministic distribution before). > > Yes. > > >> >> If the support is an integer, then rv_discrete might work, looks good see below >> >> Are there any useful operations, that we could do with it? > > Yes, like simulating the M/D/1 queue. Suppose I would like to build a > queueing simulator. I would like to set this up in a generic way, and > pass rv_arrival and ?rv_service as frozen rvs, Like this I can > experiment with several distributions, including the deterministic > distribution as a limiting case or simple case, ?all within the same > framework. > >> I think I can see a case for debugging programs that use the >> distributions in scipy.stats, but almost degenerate might also work >> for debugging. > > Sure, but sometimes you just want to exclude random effects. Moreover, > I would like to see "rv = stats.deterministic(...)" in the ?code, for > the purpose of readability. > >> >> What I would like to have is a discrete distribution on the real line, >> instead of the integers, like rv_discrete but with support on >> arbitrary floats. > > Yes, indeed. > > Please let me know your opinion. I can see that a onepoint distribution can be quite useful as a plugin degenerate distribution but also for other purposes like mixture distributions with a continuous and a discrete part (masspoints). Actually, if the onepoint distribution directly subclasses rv_generic then it wouldn't rely on or interfere with the generic framework in rv_continuous or rv_discrete (where it wouldn't really fit in if onepoint is on reals), and it might be relatively easy to provide all the methods of the distributions for a single point distribution. Choice of name: to me, "deterministic random variable" sounds like an oxymoron, although I found some references to deterministic distribution (mainly or exclusively in queuing theory and http://isi.cbs.nl/glossary/term902.htm) I would prefer a boring "onepoint" distribution, or "degenerate", or ... ? Google brings up more statistics/probability references for one-point or degenerate distribution. 
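(As a stop-gap for debugging, when the mass point happens to be an integer, rv_discrete already more or less does the job -- just a sketch of current usage, not a proposal for the final interface:

from scipy import stats

# a "deterministic" rv with all of its mass at the integer 5
det5 = stats.rv_discrete(name='det5', values=([5], [1.0]))
draws = det5.rvs(size=4)            # every draw is 5
m, v = det5.mean(), det5.var()      # 5.0 and 0.0

A proper onepoint/degenerate distribution with its mass at an arbitrary real number would still need its own class, as discussed above.)
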
Can you file a ticket with what you would like to have? I started to work again a bit on enhancing the distributions, mainly I'm experimenting with several generic estimation methods. My target is to have a working estimator for any distribution in scipy.stats and for several additional distributions. I worry a bit that a deterministic distribution might not fit into a general framework for distributions and might need to be special cased for some methods. (but see above) In my new code, I went away from using distributions by name eg. in arguments for function, so I don't care anymore whether a distribution is defined in scipy.stats or in some other module, i.e. no more getattr(scipy.stats, distname) One problem is that, once new functions/classes are in scipy, backwards compatibility considerations make development a lot more sluggish, and for many parts I know what I don't like, but I'm not sure yet what an improvement should look like. In case you are interested, I'm having fun in the sandbox http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ Cheers, Josef > > bye > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ariver at enthought.com Thu May 13 01:44:17 2010 From: ariver at enthought.com (Aaron River) Date: Thu, 13 May 2010 00:44:17 -0500 Subject: [SciPy-User] mail not getting through? In-Reply-To: References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: Okay, it's fixed now. I disabled spamassassin a long while back, and I don't know if someone re-enabled it, or if I just spaced and didn't make it persistent. Either way, that is what was blocking your emails. I'm sending a notification of the issue to subscribers of the affected lists that were caught by this problem up to a month ago. Thanks for your patience! :) -- Aaron On Wed, May 12, 2010 at 00:21, Aaron River wrote: > Hello Zach, > > I'm the IT Administrator at Enthought. > > This is a known issue which I'm working to rectify. I'm hoping to have > it all ironed out tomorrow. > > Thanks, > > -- > Aaron > > On Tuesday, May 11, 2010, Zachary Pincus wrote: >> Hi and sorry for the spam, >> >> The last couple of times I've replied to messages from scipy-user, it >> would appear that the mail never comes through to the list, but it >> doesn't bounce back to me either. (I replied to the symmetric >> eigenvalue message, but nothing came back on the list, e.g.) >> >> If this email gets through, has anyone else seen this issue? >> >> Zach >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From dwf at cs.toronto.edu Thu May 13 02:13:43 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 13 May 2010 02:13:43 -0400 (EDT) Subject: [SciPy-User] mail not getting through? In-Reply-To: References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: On Thu, 13 May 2010, Aaron River wrote: > Okay, it's fixed now. > > I disabled spamassassin a long while back, and I don't know if someone > re-enabled it, or if I just spaced and didn't make it persistent. > > Either way, that is what was blocking your emails. > > I'm sending a notification of the issue to subscribers of the affected > lists that were caught by this problem up to a month ago. > > Thanks for your patience! 
:) > > -- > Aaron > > On Wed, May 12, 2010 at 00:21, Aaron River wrote: >> Hello Zach, >> >> I'm the IT Administrator at Enthought. >> >> This is a known issue which I'm working to rectify. I'm hoping to have q>> it all ironed out tomorrow. >> >> Thanks, >> >> -- >> Aaron >> >> On Tuesday, May 11, 2010, Zachary Pincus wrote: >>> Hi and sorry for the spam, >>> >>> The last couple of times I've replied to messages from scipy-user, it >>> would appear that the mail never comes through to the list, but it >>> doesn't bounce back to me either. (I replied to the symmetric >>> eigenvalue message, but nothing came back on the list, e.g.) >>> >>> If this email gets through, has anyone else seen this issue? >>> >>> Zach >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dwf at cs.toronto.edu Thu May 13 02:15:06 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 13 May 2010 02:15:06 -0400 (EDT) Subject: [SciPy-User] mail not getting through? In-Reply-To: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: Yep. Had that happening to me too. If this gets through then hopefully it's fixed. I accidentally sent another reply too, sorry for the noise. David From dagss at student.matnat.uio.no Thu May 13 04:17:27 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 13 May 2010 10:17:27 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEAECC4.5050508@wartburg.edu> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> Message-ID: <4BEBB597.7060908@student.matnat.uio.no> Neil Martinsen-Burrell wrote: > On 2010-05-12 12:56, Gideon wrote: > >> I've tried the following. >> >> In Python: >> import numpy as np >> from FortranFile import FortranFile >> >> x = np.random.rand(10) >> f = FortranFile('test.bin',mode='w') >> f.writeReals(x) >> f.close() >> >> In Fortran: >> program bintest >> >> double precision x(10) >> integer j >> >> open(unit=80, file='test.bin', status='old', form='unformatted') >> >> read(80) x >> close(80) >> >> do j=1,10 >> write(*,*) x(j) >> >> enddo >> >> >> end >> >> then at the command line, >> >> gfortran bintest.f -o bintest >> ./bintest >> At line 9 of file bintest.f (unit = 80, file = 'test.bin') >> Fortran runtime error: I/O past end of record on unformatted file >> >> Note, I have no difficulty reading the test.bin file back in, while in >> python, using the FortranFile.py routines. >> > > It is likely that the problem is with the endian-ness of the file being > created by FortranFile not matching what is expected by the fortran > compiler. (There is a reason that the format of unformatted I/O is not > specified in the Fortran standard.) Try the above with different > settings of FortranFile(..., endian='<') or '>' or '='. > A fully reliable way of reading such files is to wrap Fortran code reading it with f2py (and fwrap, when that is done). Then, compile with the Fortran compiler in question. 
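(If the file was written on the same machine, the quick-and-dirty alternative is of course to peel off the record markers yourself with numpy -- a minimal sketch, assuming gfortran's default 4-byte markers and native byte order:

import numpy as np

def read_record(f, dtype):
    # a sequential unformatted record is framed by its byte count, before and after
    nbytes = int(np.fromfile(f, dtype=np.int32, count=1)[0])
    data = np.fromfile(f, dtype=dtype, count=nbytes // np.dtype(dtype).itemsize)
    np.fromfile(f, dtype=np.int32, count=1)   # skip the trailing byte count
    return data

fh = open('test.bin', 'rb')
x = read_record(fh, np.float64)

but only wrapping the actual Fortran read is guaranteed to agree with the compiler that wrote the file.)
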
Dag Sverre From chris at simplistix.co.uk Thu May 13 05:05:48 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 13 May 2010 10:05:48 +0100 Subject: [SciPy-User] problems with build In-Reply-To: <4BDFFE35.8060107@simplistix.co.uk> References: <4BDFFE35.8060107@simplistix.co.uk> Message-ID: <4BEBC0EC.3030506@simplistix.co.uk> It's been almost a week, does no-one want to shed any light on this? (and even longer, trying again now the lists are fixed) Chris Chris Withers wrote: > So, I tried this to get the latest numpy installed on an Ubuntu box: > > sudo apt-get build-dep python-numpy > > Then, inside the virtual_env I'm working in: > > bin/easy_install bin/easy_install numpy > > ...which left me with: > > Installed .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg > Processing dependencies for numpy > Finished processing dependencies for numpy > Error in atexit._run_exitfuncs: > Traceback (most recent call last): > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > func(*targs, **kargs) > File > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > line 248, in clean_up_temporary_directory > SystemError: Parent module 'numpy.distutils' not loaded > Error in sys.exitfunc: > Traceback (most recent call last): > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > func(*targs, **kargs) > File > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > line 248, in clean_up_temporary_directory > SystemError: Parent module 'numpy.distutils' not loaded > > ...and yet: > > $ bin/python > Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy > >>> > > Any idea what those weird atexit handlers are supposed to do?! > > They seem to fire not only when numpy is installed but also when > anything that depends on numpy is installed... cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From hasslerjc at comcast.net Thu May 13 07:38:56 2010 From: hasslerjc at comcast.net (John Hassler) Date: Thu, 13 May 2010 07:38:56 -0400 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> <4BEB0DBD.4080000@wartburg.edu> <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> Message-ID: <4BEBE4D0.7090003@comcast.net> An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Thu May 13 08:12:12 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 13 May 2010 13:12:12 +0100 Subject: [SciPy-User] mail not getting through? In-Reply-To: References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> Message-ID: <4BEBEC9C.9000106@simplistix.co.uk> Aaron River wrote: > Okay, it's fixed now. > > I disabled spamassassin a long while back, and I don't know if someone > re-enabled it, or if I just spaced and didn't make it persistent. > > Either way, that is what was blocking your emails. > > I'm sending a notification of the issue to subscribers of the affected > lists that were caught by this problem up to a month ago. > > Thanks for your patience! :) My mails still don't appear to be getting through... 
Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From faltet at pytables.org Thu May 13 08:18:19 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 13 May 2010 14:18:19 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEBE4D0.7090003@comcast.net> References: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> <4BEBE4D0.7090003@comcast.net> Message-ID: <201005131418.19355.faltet@pytables.org> A Thursday 13 May 2010 13:38:56 John Hassler escrigu?: > "Back in the day," double precision was MUCH slower than single precision > arithmetic, so Fortran used single precision by default. You used double > precision only when absolutely necessary, and you had to call it > explicitly. Fortran even had separate "built-in" functions for single and > double - eg., sin, dsin, log, dlog, etc. - that the user called > explicitly. (I haven't used Fortran for 20 years, but I think modern > Fortran recognizes the type of argument, now.) > > Single and double precision are about the same speed on modern processors, > and double is sometimes even faster than single on 64 bit processors Beware! This is so only for basic arithmetic operations. For computation of transcendent functions (sin, cos, atanh, sqrt, log...), single precision is still way faster (they require much less computations to reach the precision). > (because of the ancillary data shuffling, I think). However, Fortran is > dragging nearly 60 years of history along with it, so I'm not surprised > that it defaults to single precision. > > john -- Francesc Alted From ariver at enthought.com Thu May 13 09:42:03 2010 From: ariver at enthought.com (Aaron River) Date: Thu, 13 May 2010 08:42:03 -0500 Subject: [SciPy-User] mail not getting through? In-Reply-To: <4BEBEC9C.9000106@simplistix.co.uk> References: <24CA9945-8E01-4A41-BF9C-5F6F280866C3@yale.edu> <4BEBEC9C.9000106@simplistix.co.uk> Message-ID: On Thu, May 13, 2010 at 07:12, Chris Withers wrote: > Aaron River wrote: >> >> Okay, it's fixed now. > > My mails still don't appear to be getting through... Hi Chris, I see two emails from you sent to scipy-user in the past 7 hours. (Both show up in the scipy-user archives.) 4:06am -- http://mail.scipy.org/pipermail/scipy-user/2010-May/025298.html 7:11am -- http://mail.scipy.org/pipermail/scipy-user/2010-May/025300.html Did you send any more than that? (If so, they never touched the scipy.org mta.) I've switched on "acknowledgment" for emails you send to this list. This will send you a small email confirming future posts to the list. If you wish to turn this off, or adjust your settings further, you may visit ... http://mail.scipy.org/mailman/listinfo/scipy-user ... and use "Unsubscribe or edit options" at the bottom of the page. If you have any additional or continued troubles, let me know directly, offline from the list. Thanks, -- Aaron From gael.varoquaux at normalesup.org Thu May 13 09:31:24 2010 From: gael.varoquaux at normalesup.org (=?ISO-8859-1?Q?Ga=EBl_Varoquaux?=) Date: Thu, 13 May 2010 15:31:24 +0200 Subject: [SciPy-User] EuroScipy is finally open for registration Message-ID: The registration for EuroScipyis finally open. To register, go to the website, create an account, and you will see a *?register to the conference?* button on the left. Follow it to a page which presents a *?shoping cart?*. 
Simply submitting this information registers you to the conference, and on the left of the website, the button will now display *?You are registered for the conference?*. The registration fee is 50 euros for the conference, and 50 euros for the tutorial. Right now there is no payment system: you will be contacted later (in a week) with instructions for paying. We apologize for such a late set up. We do realize this has come as an inconvenience to people. *Do not wait to register: the number of people we can host is limited.* An exciting program Tutorials: from beginners to experts We have two tutorial tracks: - *Introductory tutorial* : to get you to speed on scientific programming with Python. - *Advanced tutorial* : experts sharing their knowledge on specific techniques and libraries. We are very fortunate to have a top notch set of presenters. Scientific track: doing new science in Python Although the abstract submission is not yet over, We can say that we are going to have a rich set of talks, looking at the current submissions. In addition to the contributed talks, we have: - *Keynote speakers* : Hans Petter Langtangen and Konrard Hinsen, two major player of scientific computing in Python. - *Lightning talks* : one hour will be open for people to come up and present in a flash an interesting project. Publishing papers We are talking with the editors of a major scientific computing journal, and the odds are quite high that we will be able to publish a special issue on scientific computing in Python based on the proceedings of the conference. The papers will undergo peer-review independently from the conference, to ensure high quality of the final publication. Call for papers Abstract submission is still open, though not for long. We are soliciting contributions on scientific libraries and tools developed with Python and on scientific or engineering achievements using Python. These include applications, teaching, future development directions, and current research. See the call for papers . *We are very much looking forward to passionate discussions about Python in science in Paris* *Nicolas Chauvat and Ga?l Varoquaux* -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Thu May 13 10:11:40 2010 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 13 May 2010 16:11:40 +0200 Subject: [SciPy-User] ANN: SfePy 2010.2 Message-ID: <4BEC089C.1000609@ntc.zcu.cz> (resending - the Monday post did not get through) I am pleased to announce release 2010.2 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. 
Mailing lists, issue tracking, git repository: http://sfepy.org Home page: http://sfepy.kme.zcu.cz Documentation: http://docs.sfepy.org/doc Highlights of this release -------------------------- - significantly updated documentation - new wiki pages: - SfePy Primer [1] - How to use Salome for generating meshes [2] [1] http://code.google.com/p/sfepy/wiki/Primer [2] http://code.google.com/p/sfepy/wiki/ExampleUsingSalomeWithSfePy Major improvements ------------------ Apart from many bug-fixes, let us mention: - new mesh readers (MED (Salome, PythonOCC), Gambit NEU, UserMeshIO) - mechanics: - ElasticConstants class - conversion formulas for elastic constants - StressTransform class to convert various stress tensors - basic tensor transformations - new examples: - usage of functions to define various parameter - usage of probes - new tests and many new terms For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2010.2_RELEASE_NOTES.txt (full release notes, rather long). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladim?r Luke?, Andre Smit, Logan Sorenson, Zuzana Z?horov? From stefan at sun.ac.za Thu May 13 10:20:33 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 13 May 2010 16:20:33 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <2A97F654-19A1-43D4-BDEA-D212AB669EBA@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> <2A97F654-19A1-43D4-BDEA-D212AB669EBA@yale.edu> Message-ID: Hey Zach On 12 May 2010 23:16, Zachary Pincus wrote: > Hi all, > >> I'm definitely interested, having had several nightmarish attempts at >> making PIL play nice with my 16-bit TIFF data. I don't have a ton of >> spare time myself right now, but I'd like to give it a shot. > > Wrappers attached. Currently, they try to load "FreeImage.[dll|dylib| > so]" (depending on the platform) from the same directory as the > module. This is of course easy to change. I converted your wrappers to plugins for scikits.image. At the moment, it still segfaults---could you help me to iron out the problems? http://github.com/stefanv/scikits.image/tree/freeimage Cheers St?fan From jrennie at gmail.com Thu May 13 10:26:45 2010 From: jrennie at gmail.com (Jason Rennie) Date: Thu, 13 May 2010 10:26:45 -0400 Subject: [SciPy-User] sparse array hstack Message-ID: It appears that numpy.hstack doesn't work with scipy sparse arrays. I'm using scipy 0.6.0 (Debian stable). Am I observing correctly? Does a later version of numpy/scipy fix this? Or, is there code available which will do an hstack on sparse arrays? Thanks, Jason -- Jason Rennie Research Scientist, ITA Software 617-714-2645 http://www.itasoftware.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From seb.haase at gmail.com Thu May 13 13:31:29 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 13 May 2010 19:31:29 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> Message-ID: I got another question: One nice thing about PIL is that I could just throw any image file at it and it finds by itself the right format/plugin to load it. Does FreeImage have a similar feature ? - i.e. determining the image format (not just depending on file name extension would be quite important ... 
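i.e. I'd hope for something one could call roughly like the two lines below, if such functions exist -- the names are just guesses on my part, and 'fi' stands for however the wrappers expose the loaded library:

# hypothetical usage: sniff the format from the file content, fall back to the extension
fmt = fi.FreeImage_GetFileType('mystery_image', 0)
if fmt == -1:   # FIF_UNKNOWN
    fmt = fi.FreeImage_GetFIFFromFilename('mystery_image')
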
- Sebastian On Wed, May 12, 2010 at 11:25 PM, Zachary Pincus wrote: >> - is there currently only one person behind FreeImage ? > No idea... > >> - it seems there are some problems with 64 bit windows - apparently >> related to inline assembly... > Ugh, didn't notice that. > >> - the discussion group seems not very responsive / active -- might >> also of course mean that it mostly "just works" ;-) >> - the "who uses FreeImage" list seems really quite long - but then PIL >> is probably also used by many ... > Yeah, it doesn't seem to have a super-active community, but so far it > seems to just work, which is OK for now, I hope. > > Not sure how clean the C code is, but going from the API at least > someone has spent time thinking about making things clean and > portable, etc., and hopefully maintainable. I think I'd rather hack on > FreeImage's C guts than PIL's, but that's without any experience with > the former. > >> How large is the DLL actually ? > > Win32 dll = 2.3 MB, OS X intel-only dylib (with debug symbols) = 2.9 MB. > > Not terrible. > > Zach > > >> Thanks, >> Sebastian >> >> >> On Wed, May 12, 2010 at 7:10 PM, Zachary Pincus > > wrote: >>>> Do you know if FreeImage does anything via memory-mapping ? I'm >>>> mostly >>>> interested in TIFF-memmap, which exists according to libtif, but I >>>> have now idea how useful it is ..... ?(I need memmap for GB-size >>>> multipage images) >>> >>> I don't know a ton about how memmapping works, but check out these >>> functions from FreeImage: >>> >>>> FreeImage_OpenMemory >>>> DLL_API FIMEMORY *DLL_CALLCONV FreeImage_OpenMemory(BYTE *data >>>> FI_DEFAULT(0), DWORD >>>> size_in_bytes FI_DEFAULT(0)); >>>> >>>> Open a memory stream. The function returns a pointer to the opened >>>> memory stream. >>>> When called with default arguments (0), this function opens a memory >>>> stream for read / write >>>> access. The stream will support loading and saving of FIBITMAP in a >>>> memory file (managed >>>> internally by FreeImage). It will also support seeking and telling >>>> in the memory file. >>>> This function can also be used to wrap a memory buffer provided by >>>> the application driving >>>> FreeImage. A buffer containing image data is given as function >>>> arguments data (start of the >>>> buffer) and size_in_bytes (buffer size in bytes). A memory buffer >>>> wrapped by FreeImage is >>>> read only. Images can be loaded but cannot be saved. >>> >>>> FreeImage_LoadFromHandle >>>> DLL_API FIBITMAP *DLL_CALLCONV >>>> FreeImage_LoadFromHandle(FREE_IMAGE_FORMAT fif, >>>> FreeImageIO *io, fi_handle handle, int flags FI_DEFAULT(0)); >>>> >>>> FreeImage has the unique feature to load a bitmap from an arbitrary >>>> source. This source >>>> might for example be a cabinet file, a zip file or an Internet >>>> stream. Handling of these arbitrary >>>> sources is not directly handled in the FREEIMAGE.DLL, but can be >>>> easily added by using a >>>> FreeImageIO structure as defined in FREEIMAGE.H. >>>> FreeImageIO is a structure that contains 4 function pointers: one to >>>> read from a source, one >>>> to write to a source, one to seek in the source and one to tell >>>> where in the source we currently >>>> are. When you populate the FreeImageIO structure with pointers to >>>> functions and pass that >>>> structure to FreeImage_LoadFromHandle, FreeImage will call your >>>> functions to read, seek >>>> and tell in a file. The handle-parameter (third parameter from the >>>> left) is used in this to >>>> differentiate between different contexts, e.g. 
different files or >>>> different Internet streams. >>> >>> With the first, I think you could just pass the void* pointer >>> returned >>> from memmapping a file; with the second, I think you could wrap a >>> memmapped file with a file-like interface (implemented in python >>> callbacks, even). Not sure, of course, if that will work OK... Might >>> be easier to work with wrappers to libtiff directly? >>> >>> Zach >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Thu May 13 13:43:45 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 13 May 2010 13:43:45 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <68E7297B-CFDB-41B5-A6F5-F07B8AD671BD@yale.edu> <1CFC398B-16A8-4D41-8453-8B2DFE26722F@yale.edu> Message-ID: > I got another question: > One nice thing about PIL is that I could just throw any image file at > it and it finds by itself the right format/plugin to load it. > Does FreeImage have a similar feature ? > - i.e. determining the image format (not just depending on file name > extension would be quite important ... Yeah, it has tools to sniff the type from a file, and also ones to determine type based on name alone: FreeImage_GetFileType and FreeImage_GetFIFFromFilename, respectively. My wrappers use the former on reading, and the latter for writing, but this is easy enough to modify. The return values of the above functions are int constants that identify each file format plugin -- and, from a brief look, it looks like it should be possible to implement new format plugins in python, using the ctypes callback tools. Overall, and to a first approximation, I'm pretty happy with the API FreeImage exposes. Zach From david at silveregg.co.jp Thu May 13 22:51:03 2010 From: david at silveregg.co.jp (David) Date: Fri, 14 May 2010 11:51:03 +0900 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <201005131418.19355.faltet@pytables.org> References: <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> <4BEBE4D0.7090003@comcast.net> <201005131418.19355.faltet@pytables.org> Message-ID: <4BECBA97.40809@silveregg.co.jp> On 05/13/2010 09:18 PM, Francesc Alted wrote: > A Thursday 13 May 2010 13:38:56 John Hassler escrigu?: >> "Back in the day," double precision was MUCH slower than single precision >> arithmetic, so Fortran used single precision by default. You used double >> precision only when absolutely necessary, and you had to call it >> explicitly. Fortran even had separate "built-in" functions for single and >> double - eg., sin, dsin, log, dlog, etc. - that the user called >> explicitly. (I haven't used Fortran for 20 years, but I think modern >> Fortran recognizes the type of argument, now.) >> >> Single and double precision are about the same speed on modern processors, >> and double is sometimes even faster than single on 64 bit processors > > Beware! This is so only for basic arithmetic operations. 
For computation of > transcendent functions (sin, cos, atanh, sqrt, log...), single precision is > still way faster (they require much less computations to reach the precision). Also, float and double operations are at the same speed only considering everything is in the registers... So concretely, single precision is much faster for almost any code which is memory bound (it is very easy to check with numpy: something as simple as dot is around twice faster for single than double precision, assuming dot uses atlas or similarly optimized library). cheers, David From faltet at pytables.org Fri May 14 03:31:15 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 14 May 2010 09:31:15 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BECBA97.40809@silveregg.co.jp> References: <201005131418.19355.faltet@pytables.org> <4BECBA97.40809@silveregg.co.jp> Message-ID: <201005140931.15397.faltet@pytables.org> A Friday 14 May 2010 04:51:03 David escrigu?: > > Beware! This is so only for basic arithmetic operations. For > > computation of transcendent functions (sin, cos, atanh, sqrt, log...), > > single precision is still way faster (they require much less computations > > to reach the precision). > > Also, float and double operations are at the same speed only considering > everything is in the registers... So concretely, single precision is > much faster for almost any code which is memory bound (it is very easy > to check with numpy: something as simple as dot is around twice faster > for single than double precision, assuming dot uses atlas or similarly > optimized library). True. Although you don't even need atlas to see this: In [3]: a = np.arange(1e6, dtype=np.float64) In [4]: b = np.arange(1e6, dtype=np.float64) In [5]: timeit a*b 100 loops, best of 3: 5.02 ms per loop In [6]: a = np.arange(1e6, dtype=np.float32) In [7]: b = np.arange(1e6, dtype=np.float32) In [8]: timeit a*b 100 loops, best of 3: 2.68 ms per loop -- Francesc Alted From faltet at pytables.org Fri May 14 04:57:17 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 14 May 2010 10:57:17 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: References: Message-ID: <201005141057.17170.faltet@pytables.org> A Tuesday 11 May 2010 21:09:32 Gideon escrigu?: > I've previously used the FortranFile.py to read in binary data > generated by fortran computations, but now I'd like to write data from > NumPy/SciPy to binary which can be read in by a fortran program. Does > anyone have an example of using fortranfile.py to create and write > data to binary? Alternatively, can anyone suggest a way to write > numpy arrays to binary in away that permits me to specify the correct > offset (4 bytes on my machine) for fortran to then properly read the > data in? Just for completeness to other solutions offered, I'm attaching a BinaryFile class that allows you to read/write fortran files (in general, binary files). From its docstrings: """ BinaryFile: A class for accessing data to/from large binary files ================================================================= The data is meant to be read/write sequentially from/to a binary file. One can request to read a piece of data with a specific type and shape from it. Also, it supports the notion of Fortran and C ordered data, so that the returned data is always well-behaved (C-contiguous and aligned). This class is seeking capable. 
""" It differs from the solutions that other presented here in that it does not use the struct module at all, so it is much more faster. For example, when using Neil's fortranfile module, one have: In [1]: import fortranfile In [2]: import numpy as np In [3]: f = fortranfile.FortranFile('/tmp/test.unf',mode='w') In [5]: time f.writeReals(np.arange(1e7)) CPU times: user 6.06 s, sys: 0.14 s, total: 6.21 s Wall time: 6.41 s In [7]: f.close() In [8]: f = fortranfile.FortranFile('/tmp/test.unf',mode='r') In [9]: time f.readReals() CPU times: user 0.64 s, sys: 0.35 s, total: 0.99 s Wall time: 1.00 s Out[10]: array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999700e+06, 9.99999800e+06, 9.99999900e+06], dtype=float32) while using my binaryfile module gives: In [1]: import numpy as np In [2]: from binaryfile import BinaryFile In [3]: f = BinaryFile('/tmp/test.bin', mode="w+", order='fortran') In [4]: time f.write(np.arange(1e7)) CPU times: user 0.04 s, sys: 0.19 s, total: 0.24 s Wall time: 0.24 s # 26x times faster than fortranfile In [6]: f.seek(0) In [7]: time f.read('f8', (int(1e7),)) CPU times: user 0.03 s, sys: 0.12 s, total: 0.15 s Wall time: 0.15 s # 6.6 times faster than fortranfile Out[8]: array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999700e+06, 9.99999800e+06, 9.99999900e+06]) Also, binaryfile supports all the types in NumPy, even strings and records. HTH, -- Francesc Alted -------------- next part -------------- A non-text attachment was scrubbed... Name: binaryfile.py Type: text/x-python Size: 4910 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_binaryfile.py Type: text/x-python Size: 6853 bytes Desc: not available URL: From paul.m.edwards at gmail.com Fri May 14 06:30:27 2010 From: paul.m.edwards at gmail.com (Paul Edwards) Date: Fri, 14 May 2010 11:30:27 +0100 Subject: [SciPy-User] compile on win64 for intel and msvc Message-ID: Hi, I am trying to build scipy on win64 with the intel 10.1 compiler and ms visual studio 2008. I am having the same problem as reported here: http://mail.scipy.org/pipermail/scipy-user/2009-December/023642.html The problem is with the flags for the intel compiler shown here: http://mail.scipy.org/pipermail/scipy-user/2009-December/023654.html This was reported to be fixed at the end of the thread by modifying Python/scipy - could anyone tell me what I need to change in order to make this compile? Thanks in advance, Paul From paul.m.edwards at gmail.com Fri May 14 06:56:09 2010 From: paul.m.edwards at gmail.com (Paul Edwards) Date: Fri, 14 May 2010 11:56:09 +0100 Subject: [SciPy-User] compile on win64 for intel and msvc In-Reply-To: References: Message-ID: BTW If I try and use numscons instead I get: 8<--------------------------------------------------------------------- D:\scratch\SS02\pkgs\build\numpy-1.4.1>..\Python-2.6.5\PCbuild\amd64\python.exe setupscons.py scons -b --fcompiler=ifort --compiler=msvc config Running from numpy source directory. Forcing DISTUTILS_USE_SDK=1 non-existing path in 'numpy\\core': 'code_generators\\numpy_api_order.txt' non-existing path in 'numpy\\core': 'code_generators\\ufunc_api_order.txt' non-existing path in 'numpy\\core': 'include/numpy\\numpyconfig.h.in' running scons Executing scons command (pkg is numpy.core): D:\scratch\SS02\pkgs\build\Python-2.6.5\PCbuild\amd64\python.exe "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site- packages\numscons\scons-local\scons.py" -f numpy\core\SConstruct -I. 
scons_tool_path="" src_dir="numpy\core" pkg_path="numpy\core" pkg_name="numpy.core" log_lev el=50 distutils_libdir="..\..\..\..\build\lib.win-amd64-2.6" distutils_clibdir="..\..\..\..\build\temp.win-amd64-2.6" distutils_install_prefix="D:\scratch\SS02\ pkgs\build\Python-2.6.5\Lib\site-packages\numpy\core" cc_opt=msvc cc_opt=msvc debug=0 f77_opt=ifort cxx_opt=msvc include_bootstrap=..\..\..\..\numpy\core\includ e bypass=1 import_env=0 silent=0 bootstrapping=1 scons: Reading SConscript files ... Mkdir("build\scons\numpy\core") WindowsError: [Error 2] The system cannot find the file specified: File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\numpy\core\SConstruct", line 2: GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript') File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", line 135: build_dir = '$build_dir', src_dir = '$src_dir') File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", line 553: return apply(_SConscript, [self.fs,] + files, subst_kw) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", line 262: exec _file_ in call_stack[-1].globals File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\build\scons\numpy\core\SConscript", line 38: env = GetNumpyEnvironment(ARGUMENTS) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", line 23: env = _get_numpy_env(args) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", line 63: initialize_tools(env) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", line 186: initialize_f77(env) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", line 119: env.Tool(name) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", line 125: get_numscons_toolpaths(self)) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Environment.py", line 1704: tool(self) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Tool\__init__.py", line 181: apply(self.generate, ( env, ) + args, kw) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", line 44: return generate_win32(env) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", line 30: pdir = product_dir_fc(versdict[vers[0]]) File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\intel_common\win32.py", line 77: return _winreg.QueryValueEx(k, "ProductDir")[0] error: Error while executing scons command. See above for more information. If you think it is a problem in numscons, you can also try executing the scons command with --log-level option for more detailed output of what numscons is doing, for example --log-level=0; the lowest the level is, the more detailed the output it. --------------------------------------------------------------------->8 Regards, Paul ---------- Forwarded message ---------- From: Paul Edwards Date: 14 May 2010 11:30 Subject: compile on win64 for intel and msvc To: scipy-user at scipy.org Hi, I am trying to build scipy on win64 with the intel 10.1 compiler and ms visual studio 2008. ?I am having the same problem as reported here: ? 
?http://mail.scipy.org/pipermail/scipy-user/2009-December/023642.html The problem is with the flags for the intel compiler shown here: ? ?http://mail.scipy.org/pipermail/scipy-user/2009-December/023654.html This was reported to be fixed at the end of the thread by modifying Python/scipy - could anyone tell me what I need to change in order to make this compile? Thanks in advance, Paul From nmb at wartburg.edu Fri May 14 10:30:37 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Fri, 14 May 2010 09:30:37 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <201005141057.17170.faltet@pytables.org> References: <201005141057.17170.faltet@pytables.org> Message-ID: <4BED5E8D.80902@wartburg.edu> On 2010-05-14 03:57 , Francesc Alted wrote: > A Tuesday 11 May 2010 21:09:32 Gideon escrigu?: >> I've previously used the FortranFile.py to read in binary data >> generated by fortran computations, but now I'd like to write data from >> NumPy/SciPy to binary which can be read in by a fortran program. Does >> anyone have an example of using fortranfile.py to create and write >> data to binary? Alternatively, can anyone suggest a way to write >> numpy arrays to binary in away that permits me to specify the correct >> offset (4 bytes on my machine) for fortran to then properly read the >> data in? > > Just for completeness to other solutions offered, I'm attaching a BinaryFile > class that allows you to read/write fortran files (in general, binary files). > From its docstrings: > > """ > BinaryFile: A class for accessing data to/from large binary files > ================================================================= > > The data is meant to be read/write sequentially from/to a binary file. > One can request to read a piece of data with a specific type and shape > from it. Also, it supports the notion of Fortran and C ordered data, > so that the returned data is always well-behaved (C-contiguous and > aligned). > > This class is seeking capable. > """ > > It differs from the solutions that other presented here in that it does not > use the struct module at all, so it is much more faster. For example, when > using Neil's fortranfile module, one have: > > In [1]: import fortranfile > > In [2]: import numpy as np > > In [3]: f = fortranfile.FortranFile('/tmp/test.unf',mode='w') > > In [5]: time f.writeReals(np.arange(1e7)) > CPU times: user 6.06 s, sys: 0.14 s, total: 6.21 s > Wall time: 6.41 s > > In [7]: f.close() > > In [8]: f = fortranfile.FortranFile('/tmp/test.unf',mode='r') > > In [9]: time f.readReals() > CPU times: user 0.64 s, sys: 0.35 s, total: 0.99 s > Wall time: 1.00 s > Out[10]: > array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., > 9.99999700e+06, 9.99999800e+06, 9.99999900e+06], dtype=float32) > > while using my binaryfile module gives: > > In [1]: import numpy as np > > In [2]: from binaryfile import BinaryFile > > In [3]: f = BinaryFile('/tmp/test.bin', mode="w+", order='fortran') > > In [4]: time f.write(np.arange(1e7)) > CPU times: user 0.04 s, sys: 0.19 s, total: 0.24 s > Wall time: 0.24 s # 26x times faster than fortranfile > > In [6]: f.seek(0) > > In [7]: time f.read('f8', (int(1e7),)) > CPU times: user 0.03 s, sys: 0.12 s, total: 0.15 s > Wall time: 0.15 s # 6.6 times faster than fortranfile > Out[8]: > array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., > 9.99999700e+06, 9.99999800e+06, 9.99999900e+06]) > > Also, binaryfile supports all the types in NumPy, even strings and records. Wonderful speed! 
But, alas, binaryfile does not produce fortran unformatted output. The format that you've written is what Fortran calls stream output and is a relatively recent addition to that language. While fortranfile is certainly slow due to its use of the struct module for all writes and reads, it allows it to read and write Fortran's record-oriented (not like numpy records) format with a great deal of flexibility. It was designed to be able to read data files created by Fortran simulation codes that may have been produced on machines with different integer sizes and endian-ness than the machine doing the reading. Your binaryfile does not do this, although I do not doubt that it could be done. Any improvements that make fortranfile faster will be gladly accepted! -Neil From faltet at pytables.org Fri May 14 10:51:29 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 14 May 2010 16:51:29 +0200 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BED5E8D.80902@wartburg.edu> References: <201005141057.17170.faltet@pytables.org> <4BED5E8D.80902@wartburg.edu> Message-ID: <201005141651.29769.faltet@pytables.org> A Friday 14 May 2010 16:30:37 Neil Martinsen-Burrell escrigu?: > Wonderful speed! But, alas, binaryfile does not produce fortran > unformatted output. The format that you've written is what Fortran > calls stream output and is a relatively recent addition to that > language. Mmh. I'm rather ignorant in this matter, but I'm wondering if what you call 'stream output' would be the same than the venerable 'sequential access' mode (that exists at least since Fortran 90)? > While fortranfile is certainly slow due to its use of the > struct module for all writes and reads, it allows it to read and write > Fortran's record-oriented (not like numpy records) format with a great > deal of flexibility. You are right. I suppose that what you call 'record-oriented' is the 'direct access' mode in literature. Yup, this is not supported by binaryfile. > It was designed to be able to read data files > created by Fortran simulation codes that may have been produced on > machines with different integer sizes and endian-ness than the machine > doing the reading. Your binaryfile does not do this, although I do not > doubt that it could be done. Any improvements that make fortranfile > faster will be gladly accepted! Well, I suppose that if you can get rid of the struct module in fortranfile you may get much better performance. I don't think this would require a lot of work. -- Francesc Alted From ralf.gommers at googlemail.com Fri May 14 11:45:27 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 14 May 2010 23:45:27 +0800 Subject: [SciPy-User] problems with build In-Reply-To: <4BEBC0EC.3030506@simplistix.co.uk> References: <4BDFFE35.8060107@simplistix.co.uk> <4BEBC0EC.3030506@simplistix.co.uk> Message-ID: On Thu, May 13, 2010 at 5:05 PM, Chris Withers wrote: > It's been almost a week, does no-one want to shed any light on this? 
> (and even longer, trying again now the lists are fixed) > > Chris > > Chris Withers wrote: > > So, I tried this to get the latest numpy installed on an Ubuntu box: > > > > sudo apt-get build-dep python-numpy > > > > Then, inside the virtual_env I'm working in: > > > > bin/easy_install bin/easy_install numpy > > > > ...which left me with: > > > > Installed > .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg > > Processing dependencies for numpy > > Finished processing dependencies for numpy > > Error in atexit._run_exitfuncs: > > Traceback (most recent call last): > > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > > func(*targs, **kargs) > > File > > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > > line 248, in clean_up_temporary_directory > > SystemError: Parent module 'numpy.distutils' not loaded > > Error in sys.exitfunc: > > Traceback (most recent call last): > > File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs > > func(*targs, **kargs) > > File > > "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", > > line 248, in clean_up_temporary_directory > > SystemError: Parent module 'numpy.distutils' not loaded > > > > ...and yet: > > > > $ bin/python > > Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) > > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > >>> import numpy > > >>> > > > > Any idea what those weird atexit handlers are supposed to do?! > Looks like they should be cleaning up temporary dirs after the install is completed and python (or in this case easy_install) exits. Do you see this also when installing numpy or a package that depends on numpy with regular "python setup.py build/install"? Given how often easy_install fails to install anything but pure-python packages that would be my first guess of where the problem is. Cheers, Ralf > > > > They seem to fire not only when numpy is installed but also when > > anything that depends on numpy is installed... > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Fri May 14 13:44:06 2010 From: opossumnano at gmail.com (Tiziano Zito) Date: Fri, 14 May 2010 19:44:06 +0200 Subject: [SciPy-User] ANN: MDP release 2.6 and MDP Sprint 2010 Message-ID: <20100514174406.GE29048@multivac.zonafranca> We are glad to announce release 2.6 of the Modular toolkit for Data Processing (MDP). MDP is a Python library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. The base of available algorithms includes, to name but the most common, Principal Component Analysis (PCA and NIPALS), several Independent Component Analysis algorithms (CuBICA, FastICA, TDSEP, JADE, and XSFA), Slow Feature Analysis, Restricted Boltzmann Machine, and Locally Linear Embedding. What's new in version 2.6? -------------------------- - Several new classifier nodes have been added. - A new node extension mechanism makes it possible to dynamically add methods or attributes for specific features to node classes, enabling aspect-oriented programming in MDP. 
Several MDP features (like parallelization) are now based on this mechanism, and users can add their own custom node extensions. - BiMDP is a large new package in MDP that introduces bidirectional data flows to MDP, including backpropagation and even loops. BiMDP also enables the transportation of additional data in flows via messages. - BiMDP includes a new flow inspection tool, that runs as as a graphical debugger in the webrowser to step through complex flows. It can be extended by users for the analysis and visualization of intermediate data. - As usual, tons of bug fixes The new additions in the library have been thoroughly tested but, as usual after a public release, we especially welcome user's feedback and bug reports. MDP Sprint 2010 --------------- Following our tradition of sprint-driven development, the team of the core developers decided to organize a programming sprint open to external participants. We invite in particular all users who implemented new algorithms and would like to see them integrated in MDP: you will work together with a core developer! More info: http://sourceforge.net/apps/mediawiki/mdp-toolkit/index.php?title=MDP_Sprint_2010 Resources --------- Download: http://sourceforge.net/projects/mdp-toolkit/files Homepage: http://mdp-toolkit.sourceforge.net Mailing list: http://lists.sourceforge.net/mailman/listinfo/mdp-toolkit-users -- Pietro Berkes Volen Center for Complex Systems Brandeis University Waltham, MA, USA Rike-Benjamin Schuppner Berlin, Germany Niko Wilbert Institute for Theoretical Biology Humboldt-University Berlin, Germany Tiziano Zito Modelling of Cognitive Processes Berlin Institute of Technology and Bernstein Center for Computational Neuroscience Berlin, Germany From saintmlx at apstat.com Fri May 14 14:05:22 2010 From: saintmlx at apstat.com (Xavier Saint-Mleux) Date: Fri, 14 May 2010 14:05:22 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: Message-ID: <4BED90E2.4000707@apstat.com> Sebastian Walter wrote: > Hello Pauli, > On what kind of matrix did you observe such unstable behavior? > Were there repeated eigenvalues? > It happens to me a lot with complex covariance matrices (Hermitian). Here's a simple example that returns non-real eigenvalues for an Hermitian matrix: >>> np.random.seed(0) >>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>> x = (x+x.T.conj())/2 # make it Hermitian >>> x == x.T.conj() # ensure it is Hermitian array([[ True, True, True], [ True, True, True], [ True, True, True]], dtype=bool) >>> np.linalg.eigvals(x) # returns complex values array([ 1.99062044 -4.98523579e-17j, 0.18062978 -9.36952928e-19j, -0.23511915 -2.19606549e-17j]) >>> np.linalg.eigvalsh(x) # imag always zero array([-0.23511915+0.j, 0.18062978+0.j, 1.99062044+0.j]) >>> Xavier > Sebastian > > > > > > On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: > >> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >> >>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>> save this matrix in matlab format, load it in matlab, and use matlab's >>> eig function to succesfully decompose it with real eigenvalues, so the >>> problem seems to be with scipy/numpy or their dependencies, not with my >>> matrix. Is this a known issue? And is there a good workaround? >>> >> Use the eigh function if you know your matrix is symmetric. 
>> >> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >> check. >> >> A nonsymmetric eigensolver cannot know that your matrix is supposed to >> have real eigenvalues, so it's possible some of them explode to complex >> pairs because of minuscule numerical error. The imaginary part, however, >> is typically small. >> >> -- >> Pauli Virtanen >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From aarchiba at physics.mcgill.ca Fri May 14 14:12:37 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 14 May 2010 14:12:37 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: <4BED90E2.4000707@apstat.com> References: <4BED90E2.4000707@apstat.com> Message-ID: On 14 May 2010 14:05, Xavier Saint-Mleux wrote: > Sebastian Walter wrote: >> Hello Pauli, >> On what kind of matrix did you observe such unstable behavior? >> Were there repeated eigenvalues? >> > > It happens to me a lot with complex covariance matrices (Hermitian). > Here's a simple example that returns non-real eigenvalues for an > Hermitian matrix: Uh, not to be difficult, but these values are not actually complex. The complex component is within a floating-point epsilon of zero. The only way to do better than this is to explicitly notice that the matrix is Hermitian and branch to special-case code. And if the matrix is only numerically Hermitian, i.e. values that should be equal differ by a floating-point epsilon, even this won't help. Anne > >>>> np.random.seed(0) >>>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>>> x = (x+x.T.conj())/2 # make it Hermitian >>>> x == x.T.conj() # ensure it is Hermitian > array([[ True, ?True, ?True], > ? ? ? [ True, ?True, ?True], > ? ? ? [ True, ?True, ?True]], dtype=bool) >>>> np.linalg.eigvals(x) # returns complex values > array([ 1.99062044 -4.98523579e-17j, ?0.18062978 -9.36952928e-19j, > ? ? ? -0.23511915 -2.19606549e-17j]) >>>> np.linalg.eigvalsh(x) # imag always zero > array([-0.23511915+0.j, ?0.18062978+0.j, ?1.99062044+0.j]) >>>> > > > > Xavier > > > >> Sebastian >> >> >> >> >> >> On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: >> >>> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >>> >>>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>>> save this matrix in matlab format, load it in matlab, and use matlab's >>>> eig function to succesfully decompose it with real eigenvalues, so the >>>> problem seems to be with scipy/numpy or their dependencies, not with my >>>> matrix. Is this a known issue? And is there a good workaround? >>>> >>> Use the eigh function if you know your matrix is symmetric. >>> >>> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >>> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >>> check. >>> >>> A nonsymmetric eigensolver cannot know that your matrix is supposed to >>> have real eigenvalues, so it's possible some of them explode to complex >>> pairs because of minuscule numerical error. The imaginary part, however, >>> is typically small. 
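For reference, a minimal sketch of the point above, built the same way as the Hermitian test matrix earlier in the thread (the matrix values themselves are arbitrary): the general eigensolver leaves floating-point-sized imaginary noise on the eigenvalues, while the symmetric/Hermitian solver does not.

import numpy as np

np.random.seed(0)
x = np.random.random((3, 3)) + 1j * np.random.random((3, 3))
x = (x + x.T.conj()) / 2              # Hermitian by construction

w_general = np.linalg.eigvals(x)      # general solver, complex output
w_hermitian = np.linalg.eigvalsh(x)   # Hermitian-specific solver

print(np.abs(w_general.imag).max())   # tiny, on the order of 1e-17
print(np.sort(w_general.real))        # agrees with eigvalsh up to rounding
print(np.sort(w_hermitian.real))

The real parts from both routines match to within rounding error; only the spurious imaginary part differs.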
>>> >>> -- >>> Pauli Virtanen >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From chris.d.burns at gmail.com Fri May 14 14:23:09 2010 From: chris.d.burns at gmail.com (Christopher Burns) Date: Fri, 14 May 2010 11:23:09 -0700 Subject: [SciPy-User] problems with build In-Reply-To: <4BEBC0EC.3030506@simplistix.co.uk> References: <4BDFFE35.8060107@simplistix.co.uk> <4BEBC0EC.3030506@simplistix.co.uk> Message-ID: I ran into the same error yesterday: "SystemError: Parent module 'numpy.distutils' not loaded" It turned out to be a broken install of numpy. I was installing mayavi with synaptic and it pulled in numpy as a dependency. At the end of the install it gave me this error msg: E: python-numpy: subprocess post-installation script returned error exit status 1 Synaptic installed numpy here: /usr/lib/python2.5/site-packages ... but this numpy was not importable. To fix it I manually removed numpy, then used synaptic to install numpy only... tested the import and ran numpy tests... once that was ok, then I installed mayavi and everything worked. Chris On Thu, May 13, 2010 at 2:05 AM, Chris Withers wrote: > It's been almost a week, does no-one want to shed any light on this? > (and even longer, trying again now the lists are fixed) > > Chris > > Chris Withers wrote: >> So, I tried this to get the latest numpy installed on an Ubuntu box: >> >> sudo apt-get build-dep python-numpy >> >> Then, inside the virtual_env I'm working in: >> >> bin/easy_install bin/easy_install numpy >> >> ...which left me with: >> >> Installed .../lib/python2.5/site-packages/numpy-1.4.1-py2.5-linux-x86_64.egg >> Processing dependencies for numpy >> Finished processing dependencies for numpy >> Error in atexit._run_exitfuncs: >> Traceback (most recent call last): >> ? ?File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs >> ? ? ?func(*targs, **kargs) >> ? ?File >> "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", >> line 248, in clean_up_temporary_directory >> SystemError: Parent module 'numpy.distutils' not loaded >> Error in sys.exitfunc: >> Traceback (most recent call last): >> ? ?File "/usr/lib/python2.5/atexit.py", line 24, in _run_exitfuncs >> ? ? ?func(*targs, **kargs) >> ? ?File >> "/tmp/easy_install-TFDAD2/numpy-1.4.1/numpy/distutils/misc_util.py", >> line 248, in clean_up_temporary_directory >> SystemError: Parent module 'numpy.distutils' not loaded >> >> ...and yet: >> >> $ bin/python >> Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) >> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> ?>>> import numpy >> ?>>> >> >> Any idea what those weird atexit handlers are supposed to do?! >> >> They seem to fire not only when numpy is installed but also when >> anything that depends on numpy is installed... > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > ? ? ? ? ? 
?- http://www.simplistix.co.uk > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Christopher Burns Senior Software Engineer O.N. Diagnostics, LLC 64 Shattuck Sq. Suite 220, Berkeley, CA 94704 _____________________________________ If you receive this message in error, please delete it immediately. This message may contain information that is privileged, confidential and exempt from disclosure and dissemination under applicable law. From josef.pktd at gmail.com Fri May 14 14:31:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 May 2010 14:31:20 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: <4BED90E2.4000707@apstat.com> Message-ID: On Fri, May 14, 2010 at 2:12 PM, Anne Archibald wrote: > On 14 May 2010 14:05, Xavier Saint-Mleux wrote: >> Sebastian Walter wrote: >>> Hello Pauli, >>> On what kind of matrix did you observe such unstable behavior? >>> Were there repeated eigenvalues? >>> >> >> It happens to me a lot with complex covariance matrices (Hermitian). >> Here's a simple example that returns non-real eigenvalues for an >> Hermitian matrix: > > Uh, not to be difficult, but these values are not actually complex. > The complex component is within a floating-point epsilon of zero. The > only way to do better than this is to explicitly notice that the > matrix is Hermitian and branch to special-case code. And if the matrix > is only numerically Hermitian, i.e. values that should be equal differ > by a floating-point epsilon, even this won't help. this might help to get rid of complex noise numpy.real_if_close(a, tol=100) If complex input returns a real array if complex parts are close to zero. Josef > > Anne > >> >>>>> np.random.seed(0) >>>>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>>>> x = (x+x.T.conj())/2 # make it Hermitian >>>>> x == x.T.conj() # ensure it is Hermitian >> array([[ True, ?True, ?True], >> ? ? ? [ True, ?True, ?True], >> ? ? ? [ True, ?True, ?True]], dtype=bool) >>>>> np.linalg.eigvals(x) # returns complex values >> array([ 1.99062044 -4.98523579e-17j, ?0.18062978 -9.36952928e-19j, >> ? ? ? -0.23511915 -2.19606549e-17j]) >>>>> np.linalg.eigvalsh(x) # imag always zero >> array([-0.23511915+0.j, ?0.18062978+0.j, ?1.99062044+0.j]) >>>>> >> >> >> >> Xavier >> >> >> >>> Sebastian >>> >>> >>> >>> >>> >>> On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: >>> >>>> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >>>> >>>>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>>>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>>>> save this matrix in matlab format, load it in matlab, and use matlab's >>>>> eig function to succesfully decompose it with real eigenvalues, so the >>>>> problem seems to be with scipy/numpy or their dependencies, not with my >>>>> matrix. Is this a known issue? And is there a good workaround? >>>>> >>>> Use the eigh function if you know your matrix is symmetric. >>>> >>>> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >>>> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >>>> check. >>>> >>>> A nonsymmetric eigensolver cannot know that your matrix is supposed to >>>> have real eigenvalues, so it's possible some of them explode to complex >>>> pairs because of minuscule numerical error. The imaginary part, however, >>>> is typically small. 
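A quick illustration of the np.real_if_close suggestion above, using the eigenvalues reported by np.linalg.eigvals earlier in the thread:

import numpy as np

# values as returned by np.linalg.eigvals in the example above
w = np.array([1.99062044 - 4.98523579e-17j,
              0.18062978 - 9.36952928e-19j,
              -0.23511915 - 2.19606549e-17j])

print(np.real_if_close(w, tol=100))   # imaginary parts below tol*eps: real array returned
print(np.real_if_close(w + 0.1j))     # imaginary part too large: input returned unchanged

The tol argument is in units of machine epsilon, so tol=100 drops imaginary components below roughly 2e-14 for float64.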
>>>> >>>> -- >>>> Pauli Virtanen >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From saintmlx at apstat.com Fri May 14 15:00:23 2010 From: saintmlx at apstat.com (Xavier Saint-Mleux) Date: Fri, 14 May 2010 15:00:23 -0400 Subject: [SciPy-User] Eigenvalue decomposition bug In-Reply-To: References: <4BED90E2.4000707@apstat.com> Message-ID: <4BED9DC7.8020804@apstat.com> josef.pktd at gmail.com wrote: > On Fri, May 14, 2010 at 2:12 PM, Anne Archibald > wrote: > >> On 14 May 2010 14:05, Xavier Saint-Mleux wrote: >> >>> Sebastian Walter wrote: >>> >>>> Hello Pauli, >>>> On what kind of matrix did you observe such unstable behavior? >>>> Were there repeated eigenvalues? >>>> >>>> >>> It happens to me a lot with complex covariance matrices (Hermitian). >>> Here's a simple example that returns non-real eigenvalues for an >>> Hermitian matrix: >>> >> Uh, not to be difficult, but these values are not actually complex. >> The complex component is within a floating-point epsilon of zero. The >> only way to do better than this is to explicitly notice that the >> matrix is Hermitian and branch to special-case code. And if the matrix >> is only numerically Hermitian, i.e. values that should be equal differ >> by a floating-point epsilon, even this won't help. >> > > this might help to get rid of complex noise > > numpy.real_if_close(a, tol=100) > If complex input returns a real array if complex parts are close to zero. > Thanks, Josef! I was just dropping the imaginary part whenever I needed to, but 'real_if_close' looks like a much cleaner solution. Xavier > Josef > > > >> Anne >> >> >>>>>> np.random.seed(0) >>>>>> x = np.random.random((3,3)) + np.random.random((3,3)) * 1j >>>>>> x = (x+x.T.conj())/2 # make it Hermitian >>>>>> x == x.T.conj() # ensure it is Hermitian >>>>>> >>> array([[ True, True, True], >>> [ True, True, True], >>> [ True, True, True]], dtype=bool) >>> >>>>>> np.linalg.eigvals(x) # returns complex values >>>>>> >>> array([ 1.99062044 -4.98523579e-17j, 0.18062978 -9.36952928e-19j, >>> -0.23511915 -2.19606549e-17j]) >>> >>>>>> np.linalg.eigvalsh(x) # imag always zero >>>>>> >>> array([-0.23511915+0.j, 0.18062978+0.j, 1.99062044+0.j]) >>> >>> >>> Xavier >>> >>> >>> >>> >>>> Sebastian >>>> >>>> >>>> >>>> >>>> >>>> On Tue, May 11, 2010 at 10:39 PM, Pauli Virtanen wrote: >>>> >>>> >>>>> Tue, 11 May 2010 16:04:00 -0400, Ian Goodfellow wrote: >>>>> >>>>> >>>>>> I've find that (scipy/numpy).linalg.eig have a problem where given a >>>>>> symmetric matrix they return complex eigenvalues. I can use scipy.io to >>>>>> save this matrix in matlab format, load it in matlab, and use matlab's >>>>>> eig function to succesfully decompose it with real eigenvalues, so the >>>>>> problem seems to be with scipy/numpy or their dependencies, not with my >>>>>> matrix. Is this a known issue? And is there a good workaround? >>>>>> >>>>>> >>>>> Use the eigh function if you know your matrix is symmetric. 
>>>>> >>>>> Matlab IIRC checks first if the matrix is symmetric, and if yes, uses a >>>>> symmetric-specific eigensolver. Numpy and Scipy don't do this automatic >>>>> check. >>>>> >>>>> A nonsymmetric eigensolver cannot know that your matrix is supposed to >>>>> have real eigenvalues, so it's possible some of them explode to complex >>>>> pairs because of minuscule numerical error. The imaginary part, however, >>>>> is typically small. >>>>> >>>>> -- >>>>> Pauli Virtanen >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From briedel at wisc.edu Fri May 14 15:01:06 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Fri, 14 May 2010 14:01:06 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit Message-ID: Hey, I am fairly new Scipy and am trying to do a least square fit to a set of data. Currently, I am using following code: fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) errfunc = lambda p, x, y: (y-fitfunc(p,x)) pinit = [20,20.] out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) I am now trying to get the goodness of fit out of this data. I am sort of running into a brick wall because I found a lot of conflicting ways of how to calculate it. I am aware of the chisquare function in stats function, but the documentation seems a little confusing to me. Any help would be greatly appreciates. Thanks very much in advance. Cheers, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 14 15:51:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 May 2010 15:51:29 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel wrote: > Hey, > > I am fairly new Scipy and am trying to do a least square fit to a set of > data. Currently, I am using following code: > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > pinit = [20,20.] > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) > > I am now trying to get the goodness of fit out of this data. I am sort of > running into a brick wall because I found a lot of conflicting ways of how > to calculate it. For regression the usual is http://en.wikipedia.org/wiki/Coefficient_of_determination coefficient of determination is R^2 = 1 - {SS_{err} / SS_{tot}} Note your fitfunc is linear in parameters and can be better estimated by linear least squares, OLS. linear regression is handled in statsmodels and you can get lot's of statistics without worrying about the formulas. 
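As a concrete sketch of that formula (the data below is synthetic, generated just for illustration, with the same exponential model as in the original question):

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1]*np.exp(-x)
errfunc = lambda p, x, y: y - fitfunc(p, x)

# made-up data for illustration
x = np.linspace(0, 5, 50)
y = 20 + 15*np.exp(-x) + 0.5*np.random.randn(50)

p, ier = leastsq(errfunc, [20.0, 20.0], args=(x, y))

ss_err = (errfunc(p, x, y)**2).sum()
ss_tot = ((y - y.mean())**2).sum()
print(1 - ss_err/ss_tot)              # R^2, coefficient of determination

The same residual function that leastsq minimizes is reused to form SS_err, so no extra bookkeeping is needed.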
If you only have one slope parameter, then scipy.stats.linregress also works scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance of the parameter estimates. http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > I am aware of the chisquare function in stats function, but the > documentation seems a little confusing to me. Any help would be greatly > appreciates. chisquare and others like kolmogorov-smirnov are more for testing the goodness-of-fit of entire distributions, not for how well a curve or line fits the data. Josef > > Thanks very much in advance. > > Cheers, > > Ben > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From lpc at cmu.edu Fri May 14 16:50:11 2010 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Fri, 14 May 2010 16:50:11 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers Message-ID: <201005141650.17597.lpc@cmu.edu> On Wednesday, Sebastian Haase wrote: > this sounds exciting and I might find some time to try it out ... > BTW, the Python image-sig should not be a "PIL only" mailing list. So > (eventually) I feel, this issue could be brought up there, too. I have created a mailing list for python computer vision topics (things that are images but not PIL related): http://groups.google.com/group/pythonvision?pli=1 It is currently very low traffic since it just started (this is my first public announcement). * Btw, for the same sort of issues (opening 16-bit TIFFs in particular), I once wrote a wrapper around imagemagick's C++ image opening functions: http://github.com/luispedro/readmagick I works nicely on linux, but some people were trying to use it on Mac or Windows and got really stuck b/c they didn't know how to compile it and I couldn't help them, so I gave up on trying to make this more widely used. HTH -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From rajs2010 at gmail.com Sat May 15 00:05:31 2010 From: rajs2010 at gmail.com (Rajeev Singh) Date: Sat, 15 May 2010 09:35:31 +0530 Subject: [SciPy-User] weave newbie question Message-ID: Hi, The following program is not doing what I expect it to do a, b = 1, 2 code = \ ''' int temp; temp = a; a = b; b = temp; ''' weave.inline(code, ['a', 'b']) print a, b whereas the following is working fine a = np.arange(5) b = np.arange(5,10) print a, b code = \ ''' double temp; int i; for (i=0; i<5; i++) { temp = a[i]; a[i] = b[i]; b[i] = temp; } ''' weave.inline(code, ['a', 'b']) print a, b I think I am missing something very basic. Can someone help me out here? Best wishes, Rajeev -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat May 15 05:03:33 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 15 May 2010 18:03:33 +0900 Subject: [SciPy-User] compile on win64 for intel and msvc In-Reply-To: References: Message-ID: On Fri, May 14, 2010 at 7:56 PM, Paul Edwards wrote: > BTW If I try and use numscons instead I get: > > 8<--------------------------------------------------------------------- > D:\scratch\SS02\pkgs\build\numpy-1.4.1>..\Python-2.6.5\PCbuild\amd64\python.exe > setupscons.py scons -b --fcompiler=ifort --compiler=msvc config > Running from numpy source directory. 
> Forcing DISTUTILS_USE_SDK=1 > non-existing path in 'numpy\\core': 'code_generators\\numpy_api_order.txt' > non-existing path in 'numpy\\core': 'code_generators\\ufunc_api_order.txt' > non-existing path in 'numpy\\core': 'include/numpy\\numpyconfig.h.in' > running scons > Executing scons command (pkg is numpy.core): > D:\scratch\SS02\pkgs\build\Python-2.6.5\PCbuild\amd64\python.exe > "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site- > packages\numscons\scons-local\scons.py" -f numpy\core\SConstruct -I. > scons_tool_path="" src_dir="numpy\core" pkg_path="numpy\core" > pkg_name="numpy.core" log_lev > el=50 distutils_libdir="..\..\..\..\build\lib.win-amd64-2.6" > distutils_clibdir="..\..\..\..\build\temp.win-amd64-2.6" > distutils_install_prefix="D:\scratch\SS02\ > pkgs\build\Python-2.6.5\Lib\site-packages\numpy\core" cc_opt=msvc > cc_opt=msvc debug=0 f77_opt=ifort cxx_opt=msvc > include_bootstrap=..\..\..\..\numpy\core\includ > e bypass=1 import_env=0 silent=0 bootstrapping=1 > scons: Reading SConscript files ... > Mkdir("build\scons\numpy\core") > WindowsError: [Error 2] The system cannot find the file specified: > ?File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\numpy\core\SConstruct", line 2: > ? ?GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript') > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", > line 135: > ? ?build_dir = '$build_dir', src_dir = '$src_dir') > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", > line 553: > ? ?return apply(_SConscript, [self.fs,] + files, subst_kw) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Script\SConscript.py", > line 262: > ? ?exec _file_ in call_stack[-1].globals > ?File "D:\scratch\SS02\pkgs\build\numpy-1.4.1\build\scons\numpy\core\SConscript", > line 38: > ? ?env = GetNumpyEnvironment(ARGUMENTS) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", > line 23: > ? ?env = _get_numpy_env(args) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\starter.py", > line 63: > ? ?initialize_tools(env) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", > line 186: > ? ?initialize_f77(env) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\initialization.py", > line 119: > ? ?env.Tool(name) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\core\numpyenv.py", > line 125: > ? ?get_numscons_toolpaths(self)) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Environment.py", > line 1704: > ? ?tool(self) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\scons-local\scons-local-1.2.0\SCons\Tool\__init__.py", > line 181: > ? ?apply(self.generate, ( env, ) + args, kw) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", > line 44: > ? ?return generate_win32(env) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\ifort.py", > line 30: > ? ?pdir = product_dir_fc(versdict[vers[0]]) > ?File "D:\scratch\SS02\pkgs\build\Python-2.6.5\lib\site-packages\numscons\tools\intel_common\win32.py", > line 77: > ? ?return _winreg.QueryValueEx(k, "ProductDir")[0] > error: Error while executing scons command. See above for more information. 
> If you think it is a problem in numscons, you can also try executing the scons > command with --log-level option for more detailed output of what numscons is > doing, for example --log-level=0; the lowest the level is, the more detailed > the output it. Where is VS 2008 installed ? Are you sure you have the 64 bits SDK (the free version does not have it AFAIK) ? I should update numscons scons copy to a more recent version, but I don't have time to work on numscons ATM. cheers, David From 3njoywind at gmail.com Sat May 15 05:25:37 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sat, 15 May 2010 17:25:37 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous Message-ID: Traceback (most recent call last): File "D:\Yt.py", line 31, in r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 266, in leastsq m = check_func(func,x0,args,n)[0] File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 12, in check_func res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) File "D:\Yt.py", line 26, in residuals return y - Yt(x, p) File "D:\Yt.py", line 20, in Yt for i in range(0, Et(x)): File "D:\Yt.py", line 11, in Et if t == 1995: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() --------------------------------------------------------------------------------------------------------- When running the following code: -------------------------------------------------------------------------------------------- from scipy.optimize import leastsq import numpy as np def Iv(t): if t == 1995: return t + 2 else: return t def Et(t): if t == 1995: return t + 2 else: return t def Yt(x, p): a, pa = p sum = 0 for i in range(0, Et(x)): v = x - et + i sum += a*(1+p)**(v)*Iv(v) return sum def residuals(p, y, x): return y - Yt(x, p) T = np.array([1995,1996,1997,1998,1999]) Y = np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) A, Pa = r[0] print "A=",A,"Pa=",Pa ---------------------------------------------------------------------------------------------- I know the error occurs when I compare t like: "if t == 1995",but I have no idea how to handle it correctly. Any help would be greatly appreciated. Zhe Wang -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 15 07:02:40 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 May 2010 07:02:40 -0400 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: > Traceback (most recent call last): > ??File "D:\Yt.py", line 31, in > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 266, > in leastsq > ?? ?m = check_func(func,x0,args,n)[0] > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line 12, > in check_func > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > ??File "D:\Yt.py", line 26, in residuals > ?? ?return y - Yt(x, p) > ??File "D:\Yt.py", line 20, in Yt > ?? ?for i in range(0, Et(x)): > ??File "D:\Yt.py", line 11, in Et > ?? 
?if t == 1995: > ValueError: The truth value of an array with more than one element is > ambiguous. Use a.any() or a.all() > --------------------------------------------------------------------------------------------------------- > > When running the following code: > > -------------------------------------------------------------------------------------------- > > from scipy.optimize import leastsq > import numpy as np > > def Iv(t): > if t == 1995: > return t + 2 > else: > return t > > def Et(t): > if t == 1995: > return t + 2 > else: > return t > > def Yt(x, p): > a, pa = p > sum = 0 > > for i in range(0, Et(x)): > v = x - et + i > sum += a*(1+p)**(v)*Iv(v) > return sum > > def residuals(p, y, x): > return y - Yt(x, p) > > T = np.array([1995,1996,1997,1998,1999]) > Y = > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > A, Pa = r[0] > print "A=",A,"Pa=",Pa > > ---------------------------------------------------------------------------------------------- > > I know the error occurs when I compare t like: "if t == 1995",but I have no > idea how to handle it correctly. try the vectorized version of a conditional assignment, e.g. np.where(t == 1995, t, t+2) I didn't read enough of your example, to tell whether your Yt loop can be vectorized with a single sum, but I guess so. optimize leastsq expects an array, so residuals (and Yt) need to return an array not a single value, maybe np.cusum and conditional or data dependent slicing/indexing works Josef > > Any help would be greatly appreciated. > > Zhe Wang > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From 3njoywind at gmail.com Sat May 15 08:49:06 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sat, 15 May 2010 20:49:06 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: Josef: Thanks for your reply:) Actually I want to fit this equation: Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be calculated by e() and I(). I rewrote my code like this: ---------------------------------------------------------------------------------------------- from scipy.optimize import leastsq import numpy as np def Iv(t): return 4 def Yt(x, et): a, pa = x sum = np.array([0,0,0,0,0]) for i in range(0, len(et)): for j in range(0, et[i]): v = T[i] - et[i] + j sum[i] += a*(1+pa)**(v)*Iv(v) return sum - Y T = np.array([1995,1996,1997,1998,1999]) Y = np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) E = np.array([10,11,12,13,14]) r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) A, Pa = r[0] print "A=",A,"Pa=",Pa ---------------------------------------------------------------------------------------------- the output is: A= 1.0 Pa = 0.0 ---------------------------------------------------------------------------------------------- I don't think it is correct. Hope for your guidence. 
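For readers following the thread, a direct NumPy transcription of that sum might look like the sketch below; e_func and I_func are hypothetical stand-ins for the poster's e() and I(), which are not shown here, and the constant I(v) = 4 mirrors the test value used later in the thread:

import numpy as np

def Y_model(t, a, p, e_func, I_func):
    # Y(t) = sum over v = t - e(t), ..., t of a*(1+p)**v * I(v)
    v = np.arange(t - e_func(t), t + 1)
    return np.sum(a * (1.0 + p)**v * I_func(v))

# illustrative stand-ins only
e_func = lambda t: 10
I_func = lambda v: 4.0

print(Y_model(1995, 1.0, 0.01, e_func, I_func))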
On Sat, May 15, 2010 at 7:02 PM, wrote: > On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > Traceback (most recent call last): > > File "D:\Yt.py", line 31, in > > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > 266, > > in leastsq > > m = check_func(func,x0,args,n)[0] > > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > 12, > > in check_func > > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > > File "D:\Yt.py", line 26, in residuals > > return y - Yt(x, p) > > File "D:\Yt.py", line 20, in Yt > > for i in range(0, Et(x)): > > File "D:\Yt.py", line 11, in Et > > if t == 1995: > > ValueError: The truth value of an array with more than one element is > > ambiguous. Use a.any() or a.all() > > > --------------------------------------------------------------------------------------------------------- > > > > When running the following code: > > > > > -------------------------------------------------------------------------------------------- > > > > from scipy.optimize import leastsq > > import numpy as np > > > > def Iv(t): > > if t == 1995: > > return t + 2 > > else: > > return t > > > > def Et(t): > > if t == 1995: > > return t + 2 > > else: > > return t > > > > def Yt(x, p): > > a, pa = p > > sum = 0 > > > > for i in range(0, Et(x)): > > v = x - et + i > > sum += a*(1+p)**(v)*Iv(v) > > return sum > > > > def residuals(p, y, x): > > return y - Yt(x, p) > > > > T = np.array([1995,1996,1997,1998,1999]) > > Y = > > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > > > > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > > A, Pa = r[0] > > print "A=",A,"Pa=",Pa > > > > > ---------------------------------------------------------------------------------------------- > > > > I know the error occurs when I compare t like: "if t == 1995",but I have > no > > idea how to handle it correctly. > > try the vectorized version of a conditional assignment, e.g. > np.where(t == 1995, t, t+2) > > I didn't read enough of your example, to tell whether your Yt loop can > be vectorized with a single sum, but I guess so. > > optimize leastsq expects an array, so residuals (and Yt) need to > return an array not a single value, maybe np.cusum and conditional or > data dependent slicing/indexing works > > Josef > > > > > > Any help would be greatly appreciated. > > > > Zhe Wang > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 15 09:08:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 May 2010 09:08:22 -0400 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: > Josef: > Thanks for your reply:) > Actually I want to fit this equation: > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > calculated by e() and I(). 
Do you always have a fixed start date as in your example >>> T = np.array([1995,1996,1997,1998,1999]) >>> E = np.array([10,11,12,13,14]) >>> T-E array([1985, 1985, 1985, 1985, 1985]) so that always v =range(T0, T+1) with fixed T0=1985 this would make it easier to work forwards than backwards, e.g. something like v = np.arange(...) Y = np.cusum((a*(1+p)**v)*I(v)) Josef > I rewrote my code like this: > ---------------------------------------------------------------------------------------------- > from scipy.optimize import leastsq > import numpy as np > def Iv(t): > ?? ?return 4 > def Yt(x, et): > ?? ?a, pa = x > ?? ?sum = np.array([0,0,0,0,0]) > ?? ?for i in range(0, len(et)): > ?? ? ? ?for j in range(0, et[i]): > ?? ? ? ? ? ?v = T[i] - et[i] + j > ?? ? ? ? ? ?sum[i] += a*(1+pa)**(v)*Iv(v) > ?? ?return sum - Y > T = np.array([1995,1996,1997,1998,1999]) > Y = > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > E = np.array([10,11,12,13,14]) > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > A, Pa = r[0] > print "A=",A,"Pa=",Pa > ---------------------------------------------------------------------------------------------- > the output is: > A= 1.0 Pa = 0.0 > ---------------------------------------------------------------------------------------------- > I don't think it is correct. Hope for your guidence. > On Sat, May 15, 2010 at 7:02 PM, wrote: >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> > Traceback (most recent call last): >> > ??File "D:\Yt.py", line 31, in >> > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line >> > 266, >> > in leastsq >> > ?? ?m = check_func(func,x0,args,n)[0] >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line >> > 12, >> > in check_func >> > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) >> > ??File "D:\Yt.py", line 26, in residuals >> > ?? ?return y - Yt(x, p) >> > ??File "D:\Yt.py", line 20, in Yt >> > ?? ?for i in range(0, Et(x)): >> > ??File "D:\Yt.py", line 11, in Et >> > ?? ?if t == 1995: >> > ValueError: The truth value of an array with more than one element is >> > ambiguous. Use a.any() or a.all() >> > >> > --------------------------------------------------------------------------------------------------------- >> > >> > When running the following code: >> > >> > >> > -------------------------------------------------------------------------------------------- >> > >> > from scipy.optimize import leastsq >> > import numpy as np >> > >> > def Iv(t): >> > ? ? if t == 1995: >> > ? ? ? ? return t + 2 >> > ? ? else: >> > ? ? ? ? return t >> > >> > def Et(t): >> > ? ? if t == 1995: >> > ? ? ? ? return t + 2 >> > ? ? else: >> > ? ? ? ? return t >> > >> > def Yt(x, p): >> > ? ? a, pa = p >> > ? ? sum = 0 >> > >> > ? ? for i in range(0, Et(x)): >> > ? ? ? ? v = x - et + i >> > ? ? ? ? sum += a*(1+p)**(v)*Iv(v) >> > ? ? return sum >> > >> > def residuals(p, y, x): >> > ? ? 
return y - Yt(x, p) >> > >> > T = np.array([1995,1996,1997,1998,1999]) >> > Y = >> > >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> > >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> > A, Pa = r[0] >> > print "A=",A,"Pa=",Pa >> > >> > >> > ---------------------------------------------------------------------------------------------- >> > >> > I know the error occurs when I compare t like: "if t == 1995",but I have >> > no >> > idea how to handle it correctly. >> >> try the vectorized version of a conditional assignment, e.g. >> np.where(t == 1995, t, t+2) >> >> I didn't read enough of your example, to tell whether your Yt loop can >> be vectorized with a single sum, but I guess so. >> >> optimize leastsq expects an array, so residuals (and Yt) need to >> return an array not a single value, maybe np.cusum and conditional or >> data dependent slicing/indexing works >> >> Josef >> >> >> > >> > Any help would be greatly appreciated. >> > >> > Zhe Wang >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From 3njoywind at gmail.com Sat May 15 11:06:04 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sat, 15 May 2010 23:06:04 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: Josef: Thanks, my example is just for test, it is not always have a fixed start, e.g >>> T = np.array([1995,1996,1997,1998,1999]) >>> E = np.array([14,12,11,15,12]) >>> T-E array([1981, 1984, 1986, 1983, 1987]) so, if I define the function: def func(x): #.... v = np.arange(...) Y = np.cumsum((a*(1+p)**v)*I(v)) return Y when I call leastsq(func, [1,0]), v should change as the element of T change, e.g. when t(one element of T) is 1995?v = np.arange(1981, 1995) when t is 1996, v = np.arange(1984, 1996) ...... this troubles me so much. Zhe Wang On Sat, May 15, 2010 at 9:08 PM, wrote: > On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > Josef: > > Thanks for your reply:) > > Actually I want to fit this equation: > > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > > calculated by e() and I(). > > Do you always have a fixed start date as in your example > > >>> T = np.array([1995,1996,1997,1998,1999]) > >>> E = np.array([10,11,12,13,14]) > >>> T-E > array([1985, 1985, 1985, 1985, 1985]) > > so that always v =range(T0, T+1) with fixed T0=1985 > > this would make it easier to work forwards than backwards, e.g. something > like > v = np.arange(...) 
> Y = np.cusum((a*(1+p)**v)*I(v)) > > Josef > > > > > > I rewrote my code like this: > > > ---------------------------------------------------------------------------------------------- > > from scipy.optimize import leastsq > > import numpy as np > > def Iv(t): > > return 4 > > def Yt(x, et): > > a, pa = x > > sum = np.array([0,0,0,0,0]) > > for i in range(0, len(et)): > > for j in range(0, et[i]): > > v = T[i] - et[i] + j > > sum[i] += a*(1+pa)**(v)*Iv(v) > > return sum - Y > > T = np.array([1995,1996,1997,1998,1999]) > > Y = > > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > > E = np.array([10,11,12,13,14]) > > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > > A, Pa = r[0] > > print "A=",A,"Pa=",Pa > > > ---------------------------------------------------------------------------------------------- > > the output is: > > A= 1.0 Pa = 0.0 > > > ---------------------------------------------------------------------------------------------- > > I don't think it is correct. Hope for your guidence. > > On Sat, May 15, 2010 at 7:02 PM, wrote: > >> > >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: > >> > Traceback (most recent call last): > >> > File "D:\Yt.py", line 31, in > >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > >> > 266, > >> > in leastsq > >> > m = check_func(func,x0,args,n)[0] > >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", line > >> > 12, > >> > in check_func > >> > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > >> > File "D:\Yt.py", line 26, in residuals > >> > return y - Yt(x, p) > >> > File "D:\Yt.py", line 20, in Yt > >> > for i in range(0, Et(x)): > >> > File "D:\Yt.py", line 11, in Et > >> > if t == 1995: > >> > ValueError: The truth value of an array with more than one element is > >> > ambiguous. Use a.any() or a.all() > >> > > >> > > --------------------------------------------------------------------------------------------------------- > >> > > >> > When running the following code: > >> > > >> > > >> > > -------------------------------------------------------------------------------------------- > >> > > >> > from scipy.optimize import leastsq > >> > import numpy as np > >> > > >> > def Iv(t): > >> > if t == 1995: > >> > return t + 2 > >> > else: > >> > return t > >> > > >> > def Et(t): > >> > if t == 1995: > >> > return t + 2 > >> > else: > >> > return t > >> > > >> > def Yt(x, p): > >> > a, pa = p > >> > sum = 0 > >> > > >> > for i in range(0, Et(x)): > >> > v = x - et + i > >> > sum += a*(1+p)**(v)*Iv(v) > >> > return sum > >> > > >> > def residuals(p, y, x): > >> > return y - Yt(x, p) > >> > > >> > T = np.array([1995,1996,1997,1998,1999]) > >> > Y = > >> > > >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> > > >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> > A, Pa = r[0] > >> > print "A=",A,"Pa=",Pa > >> > > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > > >> > I know the error occurs when I compare t like: "if t == 1995",but I > have > >> > no > >> > idea how to handle it correctly. > >> > >> try the vectorized version of a conditional assignment, e.g. > >> np.where(t == 1995, t, t+2) > >> > >> I didn't read enough of your example, to tell whether your Yt loop can > >> be vectorized with a single sum, but I guess so. 
> >> > >> optimize leastsq expects an array, so residuals (and Yt) need to > >> return an array not a single value, maybe np.cusum and conditional or > >> data dependent slicing/indexing works > >> > >> Josef > >> > >> > >> > > >> > Any help would be greatly appreciated. > >> > > >> > Zhe Wang > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 15 12:34:33 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 May 2010 12:34:33 -0400 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: > Josef: > Thanks, > my example is just for test, it is not always have a fixed start, e.g >>>> T = np.array([1995,1996,1997,1998,1999]) >>>> E = np.array([14,12,11,15,12]) >>>> T-E > array([1981, 1984, 1986, 1983, 1987]) > so, if I define the function: > def func(x): > ?? ?#.... > ?? ?v = np.arange(...) > ?? ?Y = np.cumsum((a*(1+p)**v)*I(v)) > ?? ?return Y > when I call leastsq(func, [1,0]), v should change as the element of T > change, e.g. > when t(one element of T) is 1995?v = np.arange(1981, 1995) > when t is 1996, v = np.arange(1984, 1996) are you sure v is supposed to be calender years and not number of years accumulated? i.e (1+p)**1984 or (1+p)**(1996-1984) The most efficient would be to use the formula for the sum of a finite geometric series, which would also avoid the sum. Josef > ...... > this troubles me so much. > Zhe Wang > > > On Sat, May 15, 2010 at 9:08 PM, wrote: >> >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> > Josef: >> > Thanks for your reply:) >> > Actually I want to fit this equation: >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] >> > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be >> > calculated by e() and I(). >> >> Do you always have a fixed start date as in your example >> >> >>> T = np.array([1995,1996,1997,1998,1999]) >> >>> E = np.array([10,11,12,13,14]) >> >>> T-E >> array([1985, 1985, 1985, 1985, 1985]) >> >> so that always v =range(T0, T+1) ? ? with fixed T0=1985 >> >> this would make it easier to work forwards than backwards, e.g. something >> like >> v = np.arange(...) >> Y = np.cusum((a*(1+p)**v)*I(v)) >> >> Josef >> >> >> >> >> > I rewrote my code like this: >> > >> > ---------------------------------------------------------------------------------------------- >> > from scipy.optimize import leastsq >> > import numpy as np >> > def Iv(t): >> > ?? ?return 4 >> > def Yt(x, et): >> > ?? ?a, pa = x >> > ?? ?sum = np.array([0,0,0,0,0]) >> > ?? ?for i in range(0, len(et)): >> > ?? ? ? ?for j in range(0, et[i]): >> > ?? ? ? ? ? ?v = T[i] - et[i] + j >> > ?? ? ? ? ? 
?sum[i] += a*(1+pa)**(v)*Iv(v) >> > ?? ?return sum - Y >> > T = np.array([1995,1996,1997,1998,1999]) >> > Y = >> > >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> > E = np.array([10,11,12,13,14]) >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) >> > A, Pa = r[0] >> > print "A=",A,"Pa=",Pa >> > >> > ---------------------------------------------------------------------------------------------- >> > the output is: >> > A= 1.0 Pa = 0.0 >> > >> > ---------------------------------------------------------------------------------------------- >> > I don't think it is correct. Hope for your guidence. >> > On Sat, May 15, 2010 at 7:02 PM, wrote: >> >> >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> >> > Traceback (most recent call last): >> >> > ??File "D:\Yt.py", line 31, in >> >> > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> > line >> >> > 266, >> >> > in leastsq >> >> > ?? ?m = check_func(func,x0,args,n)[0] >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> > line >> >> > 12, >> >> > in check_func >> >> > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) >> >> > ??File "D:\Yt.py", line 26, in residuals >> >> > ?? ?return y - Yt(x, p) >> >> > ??File "D:\Yt.py", line 20, in Yt >> >> > ?? ?for i in range(0, Et(x)): >> >> > ??File "D:\Yt.py", line 11, in Et >> >> > ?? ?if t == 1995: >> >> > ValueError: The truth value of an array with more than one element is >> >> > ambiguous. Use a.any() or a.all() >> >> > >> >> > >> >> > --------------------------------------------------------------------------------------------------------- >> >> > >> >> > When running the following code: >> >> > >> >> > >> >> > >> >> > -------------------------------------------------------------------------------------------- >> >> > >> >> > from scipy.optimize import leastsq >> >> > import numpy as np >> >> > >> >> > def Iv(t): >> >> > ? ? if t == 1995: >> >> > ? ? ? ? return t + 2 >> >> > ? ? else: >> >> > ? ? ? ? return t >> >> > >> >> > def Et(t): >> >> > ? ? if t == 1995: >> >> > ? ? ? ? return t + 2 >> >> > ? ? else: >> >> > ? ? ? ? return t >> >> > >> >> > def Yt(x, p): >> >> > ? ? a, pa = p >> >> > ? ? sum = 0 >> >> > >> >> > ? ? for i in range(0, Et(x)): >> >> > ? ? ? ? v = x - et + i >> >> > ? ? ? ? sum += a*(1+p)**(v)*Iv(v) >> >> > ? ? return sum >> >> > >> >> > def residuals(p, y, x): >> >> > ? ? return y - Yt(x, p) >> >> > >> >> > T = np.array([1995,1996,1997,1998,1999]) >> >> > Y = >> >> > >> >> > >> >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> >> > >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> > A, Pa = r[0] >> >> > print "A=",A,"Pa=",Pa >> >> > >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > >> >> > I know the error occurs when I compare t like: "if t == 1995",but I >> >> > have >> >> > no >> >> > idea how to handle it correctly. >> >> >> >> try the vectorized version of a conditional assignment, e.g. >> >> np.where(t == 1995, t, t+2) >> >> >> >> I didn't read enough of your example, to tell whether your Yt loop can >> >> be vectorized with a single sum, but I guess so. 
>> >> >> >> optimize leastsq expects an array, so residuals (and Yt) need to >> >> return an array not a single value, maybe np.cusum and conditional or >> >> data dependent slicing/indexing works >> >> >> >> Josef >> >> >> >> >> >> > >> >> > Any help would be greatly appreciated. >> >> > >> >> > Zhe Wang >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From alan at ajackson.org Sat May 15 20:09:02 2010 From: alan at ajackson.org (alan at ajackson.org) Date: Sat, 15 May 2010 19:09:02 -0500 Subject: [SciPy-User] writing data to binary for fortran In-Reply-To: <4BEBE4D0.7090003@comcast.net> References: <4BE9B6ED.1020603@wartburg.edu> <4BEAECC4.5050508@wartburg.edu> <4BEB0DBD.4080000@wartburg.edu> <7b63ab5f-16af-44a7-8682-fda863b21b01@p2g2000yqh.googlegroups.com> <4BEBE4D0.7090003@comcast.net> Message-ID: <20100515190902.25a3ed70@ajackson.org> A few years ago I was speaking with a colleague - a brilliant gentleman in his late seventies who had done the blast wave modeling for the Bikini Atoll tests on a slide rule - and I mentioned that for his finite difference elastic wave equation modeling code he must use a lot of double precision arithmetic. He looked very hurt, and replied "Oh no, single precision is all you need if you know what you're doing". >"Back in the day," double precision was MUCH slower than single precision arithmetic, so Fortran used single precision by default. You used double precision only when absolutely necessary, and you had to call it explicitly. Fortran even had separate "built-in" functions for single and double - eg., sin, dsin, log, dlog, etc. - that the user called explicitly. (I haven't used Fortran for 20 years, but I think modern Fortran recognizes the type of argument, now.) > >Single and double precision are about the same speed on modern processors, and double is sometimes even faster than single on 64 bit processors (because of the ancillary data shuffling, I think). However, Fortran is dragging nearly 60 years of history along with it, so I'm not surprised that it defaults to single precision. > >john > > > >On 5/12/2010 6:05 PM, Gideon wrote:Yea, that worked for me on my OS X machine. Thanks so much. > >To be honest, in the 10 years I've been doing floating point >calculations for ODEs and PDEs, I don't think I've ever used single >precision arithmetic. So I am surprised it doesn't default to double >precision. Obviously, different people have different needs. > >On May 12, 4:21 pm, Neil Martinsen-Burrell wrote: > On 2010-05-12 14:58, Gideon wrote: > > Tried both, but I got the same error in both cases. 
> >If you want doubles in your file, you have to request them: > >F.writeReals(x, prec='d') > >makes everything work for me (Ubuntu 10.04, python 2.6.5, gfortran >4.4.3). Note that looking at the size of the file that you would expect >to have for the data you are expecting to read would have demonstrated >this: 10 doubles at eight bytes per double plus two 4-byte integers >would have given you 88 bytes for the file, rather than the 48 that were >being produced. > >I use fortranfile most heavily for reading files, rather than writing >them, so I may have missed this opportunity, but do you think that the >precision used in writeReals should be auto-detected from the data type >that it is passed. That is, would > >def writeReals(self, reals, prec=None): > if prec is None: > prec = reals.dtype.char > ... > >be better for your use? That would have made your original code work as >written. > >-Neil >_______________________________________________ >SciPy-User mailing list >SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > >-- >You received this message because you are subscribed to the Google Groups "SciPy-user" group. >To post to this group, send email to scipy-user at googlegroups.com. >To unsubscribe from this group, send email to scipy-user+unsubscribe at googlegroups.com. >For more options, visit this group athttp://groups.google.com/group/scipy-user?hl=en. > _______________________________________________ >SciPy-User mailing list >SciPy-User at scipy.org >http://mail.scipy.org/mailman/listinfo/scipy-user > > >No virus found in this incoming message. >Checked by AVG - www.avg.com >Version: 9.0.819 / Virus Database: 271.1.1/2869 - Release Date: 05/12/10 02:26:00 > > -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From 3njoywind at gmail.com Sat May 15 21:58:37 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sun, 16 May 2010 09:58:37 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: v is calender years e.g (1+p)**1984 May be you could help me with this simple example: I have a function defined like this(a0 is a parameter): ------------------------------------ def f(x): if x > 4: return x + a0 else: return x - a0 ------------------------------------ I generated the data when I let a0=2 : -------------------------------------- X = np.array([1,2,3,4,5,6,7,8,9]) Y = np.array([-1, 0, 1, 2,7,8,9,10,11]) -------------------------------------- How can I use leastsq() to fit f(x)? I have wrote some code like this: -------------------------------------- def func(p, x, y): a = p sum = np.array([0,0,0,0,0,0,0,0,0]) for i in range(0, len(x)): if x[i] > 4: sum[i] = x[i] + a else: sum[i] = x[i] - a return sum - y r = leastsq(func,1,args=(X,Y)) print r[0] -------------------------------------- the output is 1.0000000149, much different like 2. I doubt whether leastsq() is suitable for this kind of problem. Maybe I should try another way? 
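One detail worth checking in the snippet above: sum is created from a list of Python ints, so it is an integer array, and the float values x[i] + a are truncated when assigned into it; the residual then barely changes for small changes in a, which can leave leastsq stuck essentially at the starting guess (1 plus one finite-difference step is about 1.0000000149). A sketch of the same fit with a float residual, using np.where for the conditional, should recover a value close to 2:

import numpy as np
from scipy.optimize import leastsq

def func(p, x, y):
    a = p[0]
    model = np.where(x > 4, x + a, x - a)   # float result, vectorized conditional
    return model - y

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
Y = np.array([-1, 0, 1, 2, 7, 8, 9, 10, 11], dtype=float)

r = leastsq(func, [1.0], args=(X, Y))
print(r[0])                               # approximately [2.]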
Zhe On Sun, May 16, 2010 at 12:34 AM, wrote: > On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > Josef: > > Thanks, > > my example is just for test, it is not always have a fixed start, e.g > >>>> T = np.array([1995,1996,1997,1998,1999]) > >>>> E = np.array([14,12,11,15,12]) > >>>> T-E > > array([1981, 1984, 1986, 1983, 1987]) > > so, if I define the function: > > def func(x): > > #.... > > v = np.arange(...) > > Y = np.cumsum((a*(1+p)**v)*I(v)) > > return Y > > when I call leastsq(func, [1,0]), v should change as the element of T > > change, e.g. > > when t(one element of T) is 1995?v = np.arange(1981, 1995) > > when t is 1996, v = np.arange(1984, 1996) > > are you sure v is supposed to be calender years and not number of > years accumulated? i.e > > (1+p)**1984 > or > (1+p)**(1996-1984) > > The most efficient would be to use the formula for the sum of a finite > geometric series, which would also avoid the sum. > > Josef > > > > ...... > > this troubles me so much. > > Zhe Wang > > > > > > On Sat, May 15, 2010 at 9:08 PM, wrote: > >> > >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: > >> > Josef: > >> > Thanks for your reply:) > >> > Actually I want to fit this equation: > >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > >> > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > >> > calculated by e() and I(). > >> > >> Do you always have a fixed start date as in your example > >> > >> >>> T = np.array([1995,1996,1997,1998,1999]) > >> >>> E = np.array([10,11,12,13,14]) > >> >>> T-E > >> array([1985, 1985, 1985, 1985, 1985]) > >> > >> so that always v =range(T0, T+1) with fixed T0=1985 > >> > >> this would make it easier to work forwards than backwards, e.g. > something > >> like > >> v = np.arange(...) > >> Y = np.cusum((a*(1+p)**v)*I(v)) > >> > >> Josef > >> > >> > >> > >> > >> > I rewrote my code like this: > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > from scipy.optimize import leastsq > >> > import numpy as np > >> > def Iv(t): > >> > return 4 > >> > def Yt(x, et): > >> > a, pa = x > >> > sum = np.array([0,0,0,0,0]) > >> > for i in range(0, len(et)): > >> > for j in range(0, et[i]): > >> > v = T[i] - et[i] + j > >> > sum[i] += a*(1+pa)**(v)*Iv(v) > >> > return sum - Y > >> > T = np.array([1995,1996,1997,1998,1999]) > >> > Y = > >> > > >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> > E = np.array([10,11,12,13,14]) > >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > >> > A, Pa = r[0] > >> > print "A=",A,"Pa=",Pa > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > the output is: > >> > A= 1.0 Pa = 0.0 > >> > > >> > > ---------------------------------------------------------------------------------------------- > >> > I don't think it is correct. Hope for your guidence. 
> >> > On Sat, May 15, 2010 at 7:02 PM, wrote: > >> >> > >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> > wrote: > >> >> > Traceback (most recent call last): > >> >> > File "D:\Yt.py", line 31, in > >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> > line > >> >> > 266, > >> >> > in leastsq > >> >> > m = check_func(func,x0,args,n)[0] > >> >> > File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> > line > >> >> > 12, > >> >> > in check_func > >> >> > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > >> >> > File "D:\Yt.py", line 26, in residuals > >> >> > return y - Yt(x, p) > >> >> > File "D:\Yt.py", line 20, in Yt > >> >> > for i in range(0, Et(x)): > >> >> > File "D:\Yt.py", line 11, in Et > >> >> > if t == 1995: > >> >> > ValueError: The truth value of an array with more than one element > is > >> >> > ambiguous. Use a.any() or a.all() > >> >> > > >> >> > > >> >> > > --------------------------------------------------------------------------------------------------------- > >> >> > > >> >> > When running the following code: > >> >> > > >> >> > > >> >> > > >> >> > > -------------------------------------------------------------------------------------------- > >> >> > > >> >> > from scipy.optimize import leastsq > >> >> > import numpy as np > >> >> > > >> >> > def Iv(t): > >> >> > if t == 1995: > >> >> > return t + 2 > >> >> > else: > >> >> > return t > >> >> > > >> >> > def Et(t): > >> >> > if t == 1995: > >> >> > return t + 2 > >> >> > else: > >> >> > return t > >> >> > > >> >> > def Yt(x, p): > >> >> > a, pa = p > >> >> > sum = 0 > >> >> > > >> >> > for i in range(0, Et(x)): > >> >> > v = x - et + i > >> >> > sum += a*(1+p)**(v)*Iv(v) > >> >> > return sum > >> >> > > >> >> > def residuals(p, y, x): > >> >> > return y - Yt(x, p) > >> >> > > >> >> > T = np.array([1995,1996,1997,1998,1999]) > >> >> > Y = > >> >> > > >> >> > > >> >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> >> > > >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> > A, Pa = r[0] > >> >> > print "A=",A,"Pa=",Pa > >> >> > > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > > >> >> > I know the error occurs when I compare t like: "if t == 1995",but I > >> >> > have > >> >> > no > >> >> > idea how to handle it correctly. > >> >> > >> >> try the vectorized version of a conditional assignment, e.g. > >> >> np.where(t == 1995, t, t+2) > >> >> > >> >> I didn't read enough of your example, to tell whether your Yt loop > can > >> >> be vectorized with a single sum, but I guess so. > >> >> > >> >> optimize leastsq expects an array, so residuals (and Yt) need to > >> >> return an array not a single value, maybe np.cusum and conditional or > >> >> data dependent slicing/indexing works > >> >> > >> >> Josef > >> >> > >> >> > >> >> > > >> >> > Any help would be greatly appreciated. 
> >> >> > > >> >> > Zhe Wang > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Sat May 15 22:25:14 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 15 May 2010 22:25:14 -0400 Subject: [SciPy-User] optimize.leastsq In-Reply-To: References: Message-ID: <4BEF578A.6050004@american.edu> On 5/15/2010 9:58 PM, Zhe Wang wrote: > sum = np.array([0,0,0,0,0,0,0,0,0]) sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) hth, Alan Isaac From 3njoywind at gmail.com Sat May 15 22:37:34 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Sun, 16 May 2010 10:37:34 +0800 Subject: [SciPy-User] optimize.leastsq In-Reply-To: <4BEF578A.6050004@american.edu> References: <4BEF578A.6050004@american.edu> Message-ID: Alan: Thanks, it works. lol I'll try it in my current work. Regards Zhe On Sun, May 16, 2010 at 10:25 AM, Alan G Isaac wrote: > On 5/15/2010 9:58 PM, Zhe Wang wrote: > > sum = np.array([0,0,0,0,0,0,0,0,0]) > > sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) > > hth, > Alan Isaac > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From briedel at wisc.edu Sun May 16 00:12:21 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sat, 15 May 2010 23:12:21 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Fri, May 14, 2010 at 14:51, wrote: > On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel wrote: > > Hey, > > > > I am fairly new Scipy and am trying to do a least square fit to a set of > > data. Currently, I am using following code: > > > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > > pinit = [20,20.] > > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) > > > > I am now trying to get the goodness of fit out of this data. I am sort of > > running into a brick wall because I found a lot of conflicting ways of > how > > to calculate it. > > For regression the usual is > http://en.wikipedia.org/wiki/Coefficient_of_determination > coefficient of determination is > > R^2 = 1 - {SS_{err} / SS_{tot}} > > Note your fitfunc is linear in parameters and can be better estimated > by linear least squares, OLS. 
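A minimal illustration of both points (illustrative sketch only; x and y below are made-up stand-ins for the tau and R4ctsdataselect arrays in the thread). Because exp(-x) enters the model only through its coefficient, y = b + a*exp(-x) is linear in (b, a), so it can be solved with plain linear least squares, and R^2 follows directly from the residuals:

import numpy as np

# made-up stand-ins for the data in the thread (tau, R4ctsdataselect)
x = np.linspace(0.0, 5.0, 50)
y = 20.0 + 15.0 * np.exp(-x) + 0.1 * np.random.randn(50)

# design matrix for y = b*1 + a*exp(-x); the model is linear in (b, a)
A = np.column_stack((np.ones_like(x), np.exp(-x)))
coef, resid, rank, sv = np.linalg.lstsq(A, y)    # coef = [b, a]

yhat = A.dot(coef)
ss_err = np.sum((y - yhat) ** 2)        # SS_err
ss_tot = np.sum((y - y.mean()) ** 2)    # SS_tot, mean-corrected (a constant term is in the model)
rsquared = 1.0 - ss_err / ss_tot
print coef, rsquared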
> linear regression is handled in statsmodels and you can get lot's of > statistics without worrying about the formulas. > If you only have one slope parameter, then scipy.stats.linregress also > works > > Thanks for the information. I am still note quite sure if this is what my boss wants because there should not be an average y value. > scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance > of the parameter estimates. > http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > I have been trying this out, but the fit just looks horrid compared to using leastsq method even though they call the same function according to the documentation. > > I am aware of the chisquare function in stats function, but the > > documentation seems a little confusing to me. Any help would be greatly > > appreciates. > > chisquare and others like kolmogorov-smirnov are more for testing the > goodness-of-fit of entire distributions, not for how well a curve or > line fits the data. > > That is what I thought, which brought up my confusion when I asked other people and they told me to use that > Josef > > > > > Thanks very much in advance. > > > > Cheers, > > > > Ben > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Sun May 16 04:39:29 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 16 May 2010 11:39:29 +0300 Subject: [SciPy-User] solving ODE by FuncDesigner with automatic differentiation Message-ID: Hi all, if anyone is interested, I have implemented possibility to model ODE in FuncDesigner and solve it, involving automatic differentiation. For examples and more details see http://openopt.org/FuncDesignerDoc#Solving_ODE Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun May 16 06:50:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 May 2010 06:50:29 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel wrote: > > > On Fri, May 14, 2010 at 14:51, wrote: >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel wrote: >> > Hey, >> > >> > I am fairly new Scipy and am trying to do a least square fit to a set of >> > data. Currently, I am using following code: >> > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> > pinit = [20,20.] >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), full_output=1) >> > >> > I am now trying to get the goodness of fit out of this data. I am sort >> > of >> > running into a brick wall because I found a lot of conflicting ways of >> > how >> > to calculate it. >> >> For regression the usual is >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> coefficient of determination is >> >> ? 
?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> Note your fitfunc is linear in parameters and can be better estimated >> by linear least squares, OLS. >> linear regression is handled in statsmodels and you can get lot's of >> statistics without worrying about the formulas. >> If you only have one slope parameter, then scipy.stats.linregress also >> works >> > > Thanks for the information. I am still note quite sure if this is what my > boss wants because there should not be an average y value. The definition of Rsquared is pretty uncontroversial with the y.mean() correction, if there is a constant in the regression (although I know mainly the linear case for this). If there is no constant in the regression, the definition or Rsquared is not clear/unambiguous, but usually used without mean correction of y. Josef > >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance >> of the parameter estimates. >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > > I have been trying this out, but the fit just looks horrid compared to using > leastsq method even though they call the same function according to the > documentation. > >> >> > I am aware of the chisquare function in stats function, but the >> > documentation seems a little confusing to me. Any help would be greatly >> > appreciates. >> >> chisquare and others like kolmogorov-smirnov are more for testing the >> goodness-of-fit of entire distributions, not for how well a curve or >> line fits the data. >> > > That is what I thought, which brought up my confusion when I asked other > people and they told me to use that > >> >> Josef >> >> > >> > Thanks very much in advance. >> > >> > Cheers, >> > >> > Ben >> > >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From tmp50 at ukr.net Sun May 16 08:57:35 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 16 May 2010 15:57:35 +0300 Subject: [SciPy-User] Isn't it a bug in scipy.integrate.odeint doc? Message-ID: hi all, I see the following lines in odeint doc/docstring http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html dy/dt = func(y,t0,...) func : callable(y, t0, ...) Computes the derivative of y at t0. Dfun : callable(y, t0, ...) Gradient (Jacobian) of func. shouldn't it be "t" instead of "t0" there? Let me also note, that some input variables are undocumented there. D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From baker.alexander at gmail.com Sun May 16 12:10:40 2010 From: baker.alexander at gmail.com (alexander baker) Date: Sun, 16 May 2010 17:10:40 +0100 Subject: [SciPy-User] python for physics Message-ID: 3 friends Physics friends of mine are looking for a starting point to learn scientific computing in Python relevant to applied Physics, does anyone have any suggestions, hints or event a deck of slides that could be useful? 
Alex Mobile: 07788 872118 Blog: www.alexfb.com -- All science is either physics or stamp collecting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Sun May 16 12:22:53 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Sun, 16 May 2010 18:22:53 +0200 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: you are using integer arrays... this should work: import numpy as np from scipy.optimize import leastsq X = np.array([1,2,3,4,5,6,7,8,9],dtype=float) Y = np.array([-1, 0, 1, 2,7,8,9,10,11],dtype=float) def func(p, x, y): a = p sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) for i in range(0, len(x)): if x[i] > 4: sum[i] = x[i] + a else: sum[i] = x[i] - a return sum - y r = leastsq(func,1,args=(X,Y)) print r[0] regards, Sebastian On Sun, May 16, 2010 at 3:58 AM, Zhe Wang <3njoywind at gmail.com> wrote: > v is calender years e.g > (1+p)**1984 > May be you could help me with this simple example: > I have a function defined like this(a0 is a parameter): > ------------------------------------ > def f(x): > ?? ?if x > 4: > ?? ? ? ?return x + a0 > ?? ?else: > ?? ? ? ?return x - a0 > ------------------------------------ > I generated the data when I let a0=2 : > -------------------------------------- > X = np.array([1,2,3,4,5,6,7,8,9]) > Y = np.array([-1, 0, 1, 2,7,8,9,10,11]) > -------------------------------------- > How can I use leastsq() to fit f(x)? I have wrote some code like this: > -------------------------------------- > def func(p, x, y): > ?? ?a = p > ?? ?sum = np.array([0,0,0,0,0,0,0,0,0]) > ?? ?for i in range(0, len(x)): > ?? ? ? ?if x[i] > 4: > ?? ? ? ? ? ?sum[i] = x[i] + a > ?? ? ? ?else: > ?? ? ? ? ? ?sum[i] = x[i] - a > ?? ?return sum - y > r = leastsq(func,1,args=(X,Y)) > print r[0] > -------------------------------------- > the output is 1.0000000149, much different like 2. > I doubt whether leastsq() is suitable for this kind of problem. Maybe I > should try another way? > Zhe > On Sun, May 16, 2010 at 12:34 AM, wrote: >> >> On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> > Josef: >> > Thanks, >> > my example is just for test, it is not always have a fixed start, e.g >> >>>> T = np.array([1995,1996,1997,1998,1999]) >> >>>> E = np.array([14,12,11,15,12]) >> >>>> T-E >> > array([1981, 1984, 1986, 1983, 1987]) >> > so, if I define the function: >> > def func(x): >> > ?? ?#.... >> > ?? ?v = np.arange(...) >> > ?? ?Y = np.cumsum((a*(1+p)**v)*I(v)) >> > ?? ?return Y >> > when I call leastsq(func, [1,0]), v should change as the element of T >> > change, e.g. >> > when t(one element of T) is 1995?v = np.arange(1981, 1995) >> > when t is 1996, v = np.arange(1984, 1996) >> >> are you sure v is supposed to be calender years and not number of >> years accumulated? ?i.e >> >> (1+p)**1984 >> or >> (1+p)**(1996-1984) >> >> The most efficient would be to use the formula for the sum of a finite >> geometric series, which would also avoid the sum. >> >> Josef >> >> >> > ...... >> > this troubles me so much. >> > Zhe Wang >> > >> > >> > On Sat, May 15, 2010 at 9:08 PM, wrote: >> >> >> >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> wrote: >> >> > Josef: >> >> > Thanks for your reply:) >> >> > Actually I want to fit this equation: >> >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] >> >> > I got {t} and {Y(t)} and a, p are parameters. 
e(t) and I(v) can be >> >> > calculated by e() and I(). >> >> >> >> Do you always have a fixed start date as in your example >> >> >> >> >>> T = np.array([1995,1996,1997,1998,1999]) >> >> >>> E = np.array([10,11,12,13,14]) >> >> >>> T-E >> >> array([1985, 1985, 1985, 1985, 1985]) >> >> >> >> so that always v =range(T0, T+1) ? ? with fixed T0=1985 >> >> >> >> this would make it easier to work forwards than backwards, e.g. >> >> something >> >> like >> >> v = np.arange(...) >> >> Y = np.cusum((a*(1+p)**v)*I(v)) >> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> > I rewrote my code like this: >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > from scipy.optimize import leastsq >> >> > import numpy as np >> >> > def Iv(t): >> >> > ?? ?return 4 >> >> > def Yt(x, et): >> >> > ?? ?a, pa = x >> >> > ?? ?sum = np.array([0,0,0,0,0]) >> >> > ?? ?for i in range(0, len(et)): >> >> > ?? ? ? ?for j in range(0, et[i]): >> >> > ?? ? ? ? ? ?v = T[i] - et[i] + j >> >> > ?? ? ? ? ? ?sum[i] += a*(1+pa)**(v)*Iv(v) >> >> > ?? ?return sum - Y >> >> > T = np.array([1995,1996,1997,1998,1999]) >> >> > Y = >> >> > >> >> > >> >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> >> > E = np.array([10,11,12,13,14]) >> >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) >> >> > A, Pa = r[0] >> >> > print "A=",A,"Pa=",Pa >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > the output is: >> >> > A= 1.0 Pa = 0.0 >> >> > >> >> > >> >> > ---------------------------------------------------------------------------------------------- >> >> > I don't think it is correct. Hope for your guidence. >> >> > On Sat, May 15, 2010 at 7:02 PM, wrote: >> >> >> >> >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> >> >> >> wrote: >> >> >> > Traceback (most recent call last): >> >> >> > ??File "D:\Yt.py", line 31, in >> >> >> > ?? ?r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> >> > line >> >> >> > 266, >> >> >> > in leastsq >> >> >> > ?? ?m = check_func(func,x0,args,n)[0] >> >> >> > ??File "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", >> >> >> > line >> >> >> > 12, >> >> >> > in check_func >> >> >> > ?? ?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) >> >> >> > ??File "D:\Yt.py", line 26, in residuals >> >> >> > ?? ?return y - Yt(x, p) >> >> >> > ??File "D:\Yt.py", line 20, in Yt >> >> >> > ?? ?for i in range(0, Et(x)): >> >> >> > ??File "D:\Yt.py", line 11, in Et >> >> >> > ?? ?if t == 1995: >> >> >> > ValueError: The truth value of an array with more than one element >> >> >> > is >> >> >> > ambiguous. Use a.any() or a.all() >> >> >> > >> >> >> > >> >> >> > >> >> >> > --------------------------------------------------------------------------------------------------------- >> >> >> > >> >> >> > When running the following code: >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -------------------------------------------------------------------------------------------- >> >> >> > >> >> >> > from scipy.optimize import leastsq >> >> >> > import numpy as np >> >> >> > >> >> >> > def Iv(t): >> >> >> > ? ? if t == 1995: >> >> >> > ? ? ? ? return t + 2 >> >> >> > ? ? else: >> >> >> > ? ? ? ? return t >> >> >> > >> >> >> > def Et(t): >> >> >> > ? ? if t == 1995: >> >> >> > ? ? ? ? return t + 2 >> >> >> > ? ? 
else: >> >> >> > ? ? ? ? return t >> >> >> > >> >> >> > def Yt(x, p): >> >> >> > ? ? a, pa = p >> >> >> > ? ? sum = 0 >> >> >> > >> >> >> > ? ? for i in range(0, Et(x)): >> >> >> > ? ? ? ? v = x - et + i >> >> >> > ? ? ? ? sum += a*(1+p)**(v)*Iv(v) >> >> >> > ? ? return sum >> >> >> > >> >> >> > def residuals(p, y, x): >> >> >> > ? ? return y - Yt(x, p) >> >> >> > >> >> >> > T = np.array([1995,1996,1997,1998,1999]) >> >> >> > Y = >> >> >> > >> >> >> > >> >> >> > >> >> >> > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) >> >> >> > >> >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) >> >> >> > A, Pa = r[0] >> >> >> > print "A=",A,"Pa=",Pa >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > ---------------------------------------------------------------------------------------------- >> >> >> > >> >> >> > I know the error occurs when I compare t like: "if t == 1995",but >> >> >> > I >> >> >> > have >> >> >> > no >> >> >> > idea how to handle it correctly. >> >> >> >> >> >> try the vectorized version of a conditional assignment, e.g. >> >> >> np.where(t == 1995, t, t+2) >> >> >> >> >> >> I didn't read enough of your example, to tell whether your Yt loop >> >> >> can >> >> >> be vectorized with a single sum, but I guess so. >> >> >> >> >> >> optimize leastsq expects an array, so residuals (and Yt) need to >> >> >> return an array not a single value, maybe np.cusum and conditional >> >> >> or >> >> >> data dependent slicing/indexing works >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> >> > >> >> >> > Any help would be greatly appreciated. >> >> >> > >> >> >> > Zhe Wang >> >> >> > >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From aisaac at american.edu Sun May 16 12:33:10 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sun, 16 May 2010 12:33:10 -0400 Subject: [SciPy-User] python for physics In-Reply-To: References: Message-ID: <4BF01E46.3020700@american.edu> On 5/16/2010 12:10 PM, alexander baker wrote: > Physics friends of mine are looking for a starting point to learn > scientific computing in Python relevant to applied Physics, does anyone > have any suggestions, hints or event a deck of slides that could be useful? 
http://pages.physics.cornell.edu/sethna/StatMech/ http://pages.physics.cornell.edu/sethna/StatMech/ComputerExercises/ http://pages.physics.cornell.edu/sethna/StatMech/ComputerExercises/PythonSoftware/ hth, Alan Isaac From hasslerjc at comcast.net Sun May 16 12:43:59 2010 From: hasslerjc at comcast.net (John Hassler) Date: Sun, 16 May 2010 12:43:59 -0400 Subject: [SciPy-User] python for physics In-Reply-To: References: Message-ID: <4BF020CF.9040406@comcast.net> An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun May 16 12:51:43 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 May 2010 18:51:43 +0200 Subject: [SciPy-User] python for physics In-Reply-To: References: Message-ID: <20100516165143.GF19278@phare.normalesup.org> On Sun, May 16, 2010 at 05:10:40PM +0100, alexander baker wrote: > 3 friends Physics friends of mine are looking for a starting point to > learn scientific computing in Python relevant to applied Physics, does > anyone have any suggestions, hints or event a deck of slides that could be > useful? This is not really physics-related, and is more oriented towards image analysis than Physics, and on top of that it is unfinished, and I have been shying from publishing on the net, but the notes of the courses I give can be found here: http://gael-varoquaux.info/python4science-2x1.pdf Also, see Fernando's py4science page, full of useful material: http://fperez.org/py4science/starter_kit.html Ga?l From d.l.goldsmith at gmail.com Sun May 16 14:55:06 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 16 May 2010 11:55:06 -0700 Subject: [SciPy-User] Isn't it a bug in scipy.integrate.odeint doc? In-Reply-To: References: Message-ID: 2010/5/16 Dmitrey > hi all, > I see the following lines in odeint doc/docstring > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html > > dy/dt = func(y,t0,...)*func* : callable(y, t0, ...) > > Computes the derivative of y at t0. > > *Dfun* : callable(y, t0, ...) > > Gradient (Jacobian) of func. > > shouldn't it be "t" instead of "t0" there? > Let me also note, that some input variables are undocumented there. > D. > Let me take this opportunity to note publicly: the scipy (as opposed to the numpy) docs are not in a very advanced state (and that's putting it politely). Soon (hopefully tomorrow), I will be issuing a formal announcement of the commencement of the 2010 Summer _SciPy_ Documentation Marathon. This will be a formal solicitation of volunteers to work on the SciPy documentation; I'm hoping everyone concerned with the overall quality of SciPy will help (whether or not you've participated in the past Marathons). But just for the record, for better or worse, we don't have a ticketing system for reporting and tracking documentation "bugs"; rather, we have the doc Wiki docs.scipy.org/scipy. If you go to the docs page (click the Docstrings link at the top of the front page) you'll see a color-coded listing of all the objects in SciPy - light grey background = "Being written," and white background = "Needs editing" = never been touched (since having been imported into the Wiki database from the source code in SVN). If you go to the status page (click on stats), you'll see that presently 97% of SciPy's docstrings fall into one of these two categories (92% being in the "never been touched" category). 
So, while it is certainly helpful to inform the list of deficiencies such as above, please understand that problems like these are the overwhelming norm, not the exception, and the _most_ helpful thing one can do in these situations is to register as an editor (if one has not already done so; see http://docs.scipy.org/numpy/Front%20Page/, and especially "Before you start" on that page for instructions) and help fix the problem. (Don't worry if you feel you don't know enough about an object: if, in working on a docstring, you have questions about an object, email your questions to the list - getting these answered and then using that info to fix the docstring oneself will almost certainly get it fixed faster than simply reporting the problem and waiting for someone else to get around to it - that's the motivation for asking people to help: not that others don't want to do it, but that if everyone pitches in, it'll get done a whole lot faster.) Thanks, DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun May 16 16:37:28 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 May 2010 22:37:28 +0200 Subject: [SciPy-User] [sympy] EuroScipy abstract submission deadline extended Message-ID: <20100516203728.GJ19278@phare.normalesup.org> Given that we have been able to turn on registration only very late, the EuroScipy conference committee is extending the deadline for abstract submission for the 2010 EuroScipy conference. On Thursday May 20th, at midnight Samoa time, we will turn off the abstract submission on the conference site. Up to then, you can modify the already-submitted abstract, or submit new abstracts. We are very much looking forward to your submissions to the conference. Ga?l Varoquaux Nicolas Chauvat -- EuroScipy 2010 is the annual European conference for scientists using Python. It will be held July 8-11 2010, in ENS, Paris, France. Links: Conference website: http://www.euroscipy.org/conference/euroscipy2010 Call for papers: http://www.euroscipy.org/card/euroscipy2010_call_for_papers Practical information: http://www.euroscipy.org/card/euroscipy2010_practical_information From gael.varoquaux at normalesup.org Sat May 15 18:40:12 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 16 May 2010 00:40:12 +0200 Subject: [SciPy-User] EuroScipy abstract submission deadline extended Message-ID: <20100515224012.GC19412@phare.normalesup.org> Given that we have been able to turn on registration only very late, the EuroScipy conference committee is extending the deadline for abstract submission for the 2010 EuroScipy conference. On Thursday May 20th, at midnight Samoa time, we will turn off the abstract submission on the conference site. Up to then, you can modify the already-submitted abstract, or submit new abstracts. We are very much looking forward to your submissions to the conference. Ga?l Varoquaux Nicolas Chauvat -- EuroScipy 2010 is the annual conference for scientists using Python. It will be held July 8-11 2010, in ENS, Paris, France. 
Links: Conference website: http://www.euroscipy.org/conference/euroscipy2010 Call for papers: http://www.euroscipy.org/card/euroscipy2010_call_for_papers Practical information: http://www.euroscipy.org/card/euroscipy2010_practical_information From briedel at wisc.edu Sun May 16 21:05:54 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sun, 16 May 2010 20:05:54 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: What I still do not understand is the fact that curve_fit gives me a different output then leastsq, even though curve_fit calls leastsq. I tried to get the chi-squared because we want to plot contours of chi-square from the minimum to the maximum. I used following code: fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) errfunc = lambda p, x, y: (y-fitfunc(p,x)) pinit = [20,20.] def func(x, a, b): return a*exp(-x) + b pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, sigma=R4errctsdataselect) print pfinal print covar dof=size(tau)-size(pinit) print dof chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, tau)))/dof print chi2 I am not 100% sure I am doing the degrees of freedom calculation right. I got the chi-square formula from the Pearson chi-squared test. Thank you very much for the help so far. Cheers, Ben On Sun, May 16, 2010 at 05:50, wrote: > On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel > wrote: > > > > > > On Fri, May 14, 2010 at 14:51, wrote: > >> > >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel > wrote: > >> > Hey, > >> > > >> > I am fairly new Scipy and am trying to do a least square fit to a set > of > >> > data. Currently, I am using following code: > >> > > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> > pinit = [20,20.] > >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > full_output=1) > >> > > >> > I am now trying to get the goodness of fit out of this data. I am sort > >> > of > >> > running into a brick wall because I found a lot of conflicting ways of > >> > how > >> > to calculate it. > >> > >> For regression the usual is > >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >> coefficient of determination is > >> > >> R^2 = 1 - {SS_{err} / SS_{tot}} > >> > >> Note your fitfunc is linear in parameters and can be better estimated > >> by linear least squares, OLS. > >> linear regression is handled in statsmodels and you can get lot's of > >> statistics without worrying about the formulas. > >> If you only have one slope parameter, then scipy.stats.linregress also > >> works > >> > > > > Thanks for the information. I am still note quite sure if this is what my > > boss wants because there should not be an average y value. > > The definition of Rsquared is pretty uncontroversial with the y.mean() > correction, if there is a constant in the regression (although I know > mainly the linear case for this). > > If there is no constant in the regression, the definition or Rsquared > is not clear/unambiguous, but usually used without mean correction of > y. > > Josef > > > > >> > >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance > >> of the parameter estimates. > >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > > > > I have been trying this out, but the fit just looks horrid compared to > using > > leastsq method even though they call the same function according to the > > documentation. 
> > > >> > >> > I am aware of the chisquare function in stats function, but the > >> > documentation seems a little confusing to me. Any help would be > greatly > >> > appreciates. > >> > >> chisquare and others like kolmogorov-smirnov are more for testing the > >> goodness-of-fit of entire distributions, not for how well a curve or > >> line fits the data. > >> > > > > That is what I thought, which brought up my confusion when I asked other > > people and they told me to use that > > > >> > >> Josef > >> > >> > > >> > Thanks very much in advance. > >> > > >> > Cheers, > >> > > >> > Ben > >> > > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > Benedikt Riedel > > Graduate Student University of Wisconsin-Madison > > Department of Physics > > Office: 2304 Chamberlin Hall > > Lab: 6247 Chamberlin Hall > > Tel: (608) 301-5736 > > Cell: (213) 519-1771 > > Lab: (608) 262-5916 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From 3njoywind at gmail.com Sun May 16 21:33:43 2010 From: 3njoywind at gmail.com (Zhe Wang) Date: Mon, 17 May 2010 09:33:43 +0800 Subject: [SciPy-User] optimize.leastsq - Value Error: The truth value of an array with more than one element is ambiguous In-Reply-To: References: Message-ID: Sebasian: Thank you. I have found that too and solved my problem now. regards, Zhe On Mon, May 17, 2010 at 12:22 AM, Sebastian Walter < sebastian.walter at gmail.com> wrote: > you are using integer arrays... > > this should work: > import numpy as np > from scipy.optimize import leastsq > > X = np.array([1,2,3,4,5,6,7,8,9],dtype=float) > Y = np.array([-1, 0, 1, 2,7,8,9,10,11],dtype=float) > > def func(p, x, y): > a = p > sum = np.array([0,0,0,0,0,0,0,0,0],dtype=float) > for i in range(0, len(x)): > if x[i] > 4: > sum[i] = x[i] + a > else: > sum[i] = x[i] - a > return sum - y > > r = leastsq(func,1,args=(X,Y)) > > print r[0] > > > regards, > Sebastian > > > > On Sun, May 16, 2010 at 3:58 AM, Zhe Wang <3njoywind at gmail.com> wrote: > > v is calender years e.g > > (1+p)**1984 > > May be you could help me with this simple example: > > I have a function defined like this(a0 is a parameter): > > ------------------------------------ > > def f(x): > > if x > 4: > > return x + a0 > > else: > > return x - a0 > > ------------------------------------ > > I generated the data when I let a0=2 : > > -------------------------------------- > > X = np.array([1,2,3,4,5,6,7,8,9]) > > Y = np.array([-1, 0, 1, 2,7,8,9,10,11]) > > -------------------------------------- > > How can I use leastsq() to fit f(x)? 
I have wrote some code like this: > > -------------------------------------- > > def func(p, x, y): > > a = p > > sum = np.array([0,0,0,0,0,0,0,0,0]) > > for i in range(0, len(x)): > > if x[i] > 4: > > sum[i] = x[i] + a > > else: > > sum[i] = x[i] - a > > return sum - y > > r = leastsq(func,1,args=(X,Y)) > > print r[0] > > -------------------------------------- > > the output is 1.0000000149, much different like 2. > > I doubt whether leastsq() is suitable for this kind of problem. Maybe I > > should try another way? > > Zhe > > On Sun, May 16, 2010 at 12:34 AM, wrote: > >> > >> On Sat, May 15, 2010 at 11:06 AM, Zhe Wang <3njoywind at gmail.com> wrote: > >> > Josef: > >> > Thanks, > >> > my example is just for test, it is not always have a fixed start, e.g > >> >>>> T = np.array([1995,1996,1997,1998,1999]) > >> >>>> E = np.array([14,12,11,15,12]) > >> >>>> T-E > >> > array([1981, 1984, 1986, 1983, 1987]) > >> > so, if I define the function: > >> > def func(x): > >> > #.... > >> > v = np.arange(...) > >> > Y = np.cumsum((a*(1+p)**v)*I(v)) > >> > return Y > >> > when I call leastsq(func, [1,0]), v should change as the element of T > >> > change, e.g. > >> > when t(one element of T) is 1995?v = np.arange(1981, 1995) > >> > when t is 1996, v = np.arange(1984, 1996) > >> > >> are you sure v is supposed to be calender years and not number of > >> years accumulated? i.e > >> > >> (1+p)**1984 > >> or > >> (1+p)**(1996-1984) > >> > >> The most efficient would be to use the formula for the sum of a finite > >> geometric series, which would also avoid the sum. > >> > >> Josef > >> > >> > >> > ...... > >> > this troubles me so much. > >> > Zhe Wang > >> > > >> > > >> > On Sat, May 15, 2010 at 9:08 PM, wrote: > >> >> > >> >> On Sat, May 15, 2010 at 8:49 AM, Zhe Wang <3njoywind at gmail.com> > wrote: > >> >> > Josef: > >> >> > Thanks for your reply:) > >> >> > Actually I want to fit this equation: > >> >> > Y(t) = sigma(v=t-e(t), t)[(a*(1+p)**v)*I(v)] > >> >> > I got {t} and {Y(t)} and a, p are parameters. e(t) and I(v) can be > >> >> > calculated by e() and I(). > >> >> > >> >> Do you always have a fixed start date as in your example > >> >> > >> >> >>> T = np.array([1995,1996,1997,1998,1999]) > >> >> >>> E = np.array([10,11,12,13,14]) > >> >> >>> T-E > >> >> array([1985, 1985, 1985, 1985, 1985]) > >> >> > >> >> so that always v =range(T0, T+1) with fixed T0=1985 > >> >> > >> >> this would make it easier to work forwards than backwards, e.g. > >> >> something > >> >> like > >> >> v = np.arange(...) 
> >> >> Y = np.cusum((a*(1+p)**v)*I(v)) > >> >> > >> >> Josef > >> >> > >> >> > >> >> > >> >> > >> >> > I rewrote my code like this: > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > from scipy.optimize import leastsq > >> >> > import numpy as np > >> >> > def Iv(t): > >> >> > return 4 > >> >> > def Yt(x, et): > >> >> > a, pa = x > >> >> > sum = np.array([0,0,0,0,0]) > >> >> > for i in range(0, len(et)): > >> >> > for j in range(0, et[i]): > >> >> > v = T[i] - et[i] + j > >> >> > sum[i] += a*(1+pa)**(v)*Iv(v) > >> >> > return sum - Y > >> >> > T = np.array([1995,1996,1997,1998,1999]) > >> >> > Y = > >> >> > > >> >> > > >> >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> >> > E = np.array([10,11,12,13,14]) > >> >> > r = leastsq(Yt, [1,0], args = (E), maxfev=10000000) > >> >> > A, Pa = r[0] > >> >> > print "A=",A,"Pa=",Pa > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > the output is: > >> >> > A= 1.0 Pa = 0.0 > >> >> > > >> >> > > >> >> > > ---------------------------------------------------------------------------------------------- > >> >> > I don't think it is correct. Hope for your guidence. > >> >> > On Sat, May 15, 2010 at 7:02 PM, wrote: > >> >> >> > >> >> >> On Sat, May 15, 2010 at 5:25 AM, Zhe Wang <3njoywind at gmail.com> > >> >> >> wrote: > >> >> >> > Traceback (most recent call last): > >> >> >> > File "D:\Yt.py", line 31, in > >> >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> >> > File > "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> >> > line > >> >> >> > 266, > >> >> >> > in leastsq > >> >> >> > m = check_func(func,x0,args,n)[0] > >> >> >> > File > "D:\Python26\lib\site-packages\scipy\optimize\minpack.py", > >> >> >> > line > >> >> >> > 12, > >> >> >> > in check_func > >> >> >> > res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > >> >> >> > File "D:\Yt.py", line 26, in residuals > >> >> >> > return y - Yt(x, p) > >> >> >> > File "D:\Yt.py", line 20, in Yt > >> >> >> > for i in range(0, Et(x)): > >> >> >> > File "D:\Yt.py", line 11, in Et > >> >> >> > if t == 1995: > >> >> >> > ValueError: The truth value of an array with more than one > element > >> >> >> > is > >> >> >> > ambiguous. 
Use a.any() or a.all() > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > --------------------------------------------------------------------------------------------------------- > >> >> >> > > >> >> >> > When running the following code: > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > -------------------------------------------------------------------------------------------- > >> >> >> > > >> >> >> > from scipy.optimize import leastsq > >> >> >> > import numpy as np > >> >> >> > > >> >> >> > def Iv(t): > >> >> >> > if t == 1995: > >> >> >> > return t + 2 > >> >> >> > else: > >> >> >> > return t > >> >> >> > > >> >> >> > def Et(t): > >> >> >> > if t == 1995: > >> >> >> > return t + 2 > >> >> >> > else: > >> >> >> > return t > >> >> >> > > >> >> >> > def Yt(x, p): > >> >> >> > a, pa = p > >> >> >> > sum = 0 > >> >> >> > > >> >> >> > for i in range(0, Et(x)): > >> >> >> > v = x - et + i > >> >> >> > sum += a*(1+p)**(v)*Iv(v) > >> >> >> > return sum > >> >> >> > > >> >> >> > def residuals(p, y, x): > >> >> >> > return y - Yt(x, p) > >> >> >> > > >> >> >> > T = np.array([1995,1996,1997,1998,1999]) > >> >> >> > Y = > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > np.array([639300.36866,664872.383407,691467.278743,719125.969893,747891.008688]) > >> >> >> > > >> >> >> > r = leastsq(residuals, [1,0], args=(Y,T), maxfev=10000000) > >> >> >> > A, Pa = r[0] > >> >> >> > print "A=",A,"Pa=",Pa > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > ---------------------------------------------------------------------------------------------- > >> >> >> > > >> >> >> > I know the error occurs when I compare t like: "if t == > 1995",but > >> >> >> > I > >> >> >> > have > >> >> >> > no > >> >> >> > idea how to handle it correctly. > >> >> >> > >> >> >> try the vectorized version of a conditional assignment, e.g. > >> >> >> np.where(t == 1995, t, t+2) > >> >> >> > >> >> >> I didn't read enough of your example, to tell whether your Yt loop > >> >> >> can > >> >> >> be vectorized with a single sum, but I guess so. > >> >> >> > >> >> >> optimize leastsq expects an array, so residuals (and Yt) need to > >> >> >> return an array not a single value, maybe np.cusum and conditional > >> >> >> or > >> >> >> data dependent slicing/indexing works > >> >> >> > >> >> >> Josef > >> >> >> > >> >> >> > >> >> >> > > >> >> >> > Any help would be greatly appreciated. 
> >> >> >> > > >> >> >> > Zhe Wang > >> >> >> > > >> >> >> > _______________________________________________ > >> >> >> > SciPy-User mailing list > >> >> >> > SciPy-User at scipy.org > >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> >> > > >> >> >> > > >> >> >> _______________________________________________ > >> >> >> SciPy-User mailing list > >> >> >> SciPy-User at scipy.org > >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun May 16 23:33:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 May 2010 23:33:31 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel wrote: > What I still do not understand is the fact that curve_fit gives me a > different output then leastsq, even though curve_fit calls leastsq. > > I tried to get the chi-squared because we want to plot contours of > chi-square from the minimum to the maximum. I used following code: > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > pinit = [20,20.] > > def func(x, a, b): > ???? return a*exp(-x) + b > > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > sigma=R4errctsdataselect) this uses weighted least squares sigma : None or N-length sequence If not None, it represents the standard-deviation of ydata. This vector, if given, will be used as weights in the least-squares problem In your initial example with leastsq you don't have any weighting, it's just ordinary least squares maybe that's the difference. > print pfinal > print covar > dof=size(tau)-size(pinit) > print dof > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, > tau)))/dof > print chi2 > > I am not 100% sure I am doing the degrees of freedom calculation right. I > got the chi-square formula from the Pearson chi-squared test. I don't recognize your formula for chi2, and I don't see the connection to Pearson chi-squared test . Do you have a reference? Josef > > Thank you very much for the help so far. 
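To see concretely what that weighting amounts to, a small sketch (x, y, and sigma below are made-up stand-ins for tau, R4ctsdataselect, and R4errctsdataselect): dividing the residuals by the per-point errors before handing them to leastsq gives the weighted fit, and the conventional reduced chi-square is built from the same weighted residuals:

import numpy as np
from scipy.optimize import leastsq

# made-up stand-ins for the data in the thread
x = np.linspace(0.0, 5.0, 40)
sigma = 0.5 * np.ones_like(x)                       # per-point standard deviations
y = 20.0 + 15.0 * np.exp(-x) + sigma * np.random.randn(x.size)

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)
# residuals divided by sigma: the weighting described above
werrfunc = lambda p, x, y, s: (y - fitfunc(p, x)) / s

p, cov, infodict, mesg, ier = leastsq(werrfunc, [20.0, 20.0],
                                      args=(x, y, sigma), full_output=1)

dof = y.size - len(p)                               # number of points minus number of parameters
chi2_red = np.sum(((y - fitfunc(p, x)) / sigma) ** 2) / dof
print p, chi2_red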
> > Cheers, > > Ben > > On Sun, May 16, 2010 at 05:50, wrote: >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >> wrote: >> > >> > >> > On Fri, May 14, 2010 at 14:51, wrote: >> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >> >> wrote: >> >> > Hey, >> >> > >> >> > I am fairly new Scipy and am trying to do a least square fit to a set >> >> > of >> >> > data. Currently, I am using following code: >> >> > >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> > pinit = [20,20.] >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >> >> > full_output=1) >> >> > >> >> > I am now trying to get the goodness of fit out of this data. I am >> >> > sort >> >> > of >> >> > running into a brick wall because I found a lot of conflicting ways >> >> > of >> >> > how >> >> > to calculate it. >> >> >> >> For regression the usual is >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> >> coefficient of determination is >> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> >> >> Note your fitfunc is linear in parameters and can be better estimated >> >> by linear least squares, OLS. >> >> linear regression is handled in statsmodels and you can get lot's of >> >> statistics without worrying about the formulas. >> >> If you only have one slope parameter, then scipy.stats.linregress also >> >> works >> >> >> > >> > Thanks for the information. I am still note quite sure if this is what >> > my >> > boss wants because there should not be an average y value. >> >> The definition of Rsquared is pretty uncontroversial with the y.mean() >> correction, if there is a constant in the regression (although I know >> mainly the linear case for this). >> >> If there is no constant in the regression, the definition or Rsquared >> is not clear/unambiguous, but usually used without mean correction of >> y. >> >> Josef >> >> > >> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance >> >> of the parameter estimates. >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >> > >> > I have been trying this out, but the fit just looks horrid compared to >> > using >> > leastsq method even though they call the same function according to the >> > documentation. >> > >> >> >> >> > I am aware of the chisquare function in stats function, but the >> >> > documentation seems a little confusing to me. Any help would be >> >> > greatly >> >> > appreciates. >> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing the >> >> goodness-of-fit of entire distributions, not for how well a curve or >> >> line fits the data. >> >> >> > >> > That is what I thought, which brought up my confusion when I asked other >> > people and they told me to use that >> > >> >> >> >> Josef >> >> >> >> > >> >> > Thanks very much in advance. 
>> >> > >> >> > Cheers, >> >> > >> >> > Ben >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > -- >> > Benedikt Riedel >> > Graduate Student University of Wisconsin-Madison >> > Department of Physics >> > Office: 2304 Chamberlin Hall >> > Lab: 6247 Chamberlin Hall >> > Tel: ?(608) 301-5736 >> > Cell: (213) 519-1771 >> > Lab: (608) 262-5916 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From briedel at wisc.edu Mon May 17 00:18:00 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sun, 16 May 2010 23:18:00 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Sun, May 16, 2010 at 22:33, wrote: > On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel wrote: > > What I still do not understand is the fact that curve_fit gives me a > > different output then leastsq, even though curve_fit calls leastsq. > > > > I tried to get the chi-squared because we want to plot contours of > > chi-square from the minimum to the maximum. I used following code: > > > > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > > pinit = [20,20.] > > > > def func(x, a, b): > > return a*exp(-x) + b > > > > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > > sigma=R4errctsdataselect) > > this uses weighted least squares > sigma : None or N-length sequence > If not None, it represents the standard-deviation of ydata. This > vector, if given, will be used as weights in the least-squares problem > > In your initial example with leastsq you don't have any weighting, > it's just ordinary least squares > > maybe that's the difference. > > > Yeah I guess that will be it. > > > print pfinal > > print covar > > dof=size(tau)-size(pinit) > > print dof > > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, > > tau)))/dof > > print chi2 > > > > I am not 100% sure I am doing the degrees of freedom calculation right. I > > got the chi-square formula from the Pearson chi-squared test. > > I don't recognize your formula for chi2, and I don't see the > connection to Pearson chi-squared test . > > Do you have a reference? > > I based my use of the Pearson test from what I read in an Econometrics book, but wiki has the a pretty good description. I basically based it off the example there. Where the expected would be what comes out of the fit and what you is the "R4ctsdataselect" for those specific values. 
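In symbols, that is the Pearson form of the statistic from the Wikipedia page linked below, with the fitted values playing the role of the expected counts (obs and expected here are made-up stand-ins for R4ctsdataselect and the values returned by the fit):

import numpy as np

# made-up stand-ins: obs = measured values, expected = values predicted by the fit
obs = np.array([25.0, 21.0, 18.0, 16.0, 15.0])
expected = np.array([24.0, 22.0, 19.0, 15.5, 14.8])

dof = obs.size - 2                                   # N data points minus 2 fitted parameters
pearson_chi2 = np.sum((obs - expected) ** 2 / expected)   # sum of (O - E)**2 / E
print pearson_chi2, pearson_chi2 / dof

# when per-point measurement errors err are available, the usual reduced
# chi-square of a fit divides by err**2 instead of by the expected values:
#     np.sum(((obs - expected) / err) ** 2) / dof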
http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test > Josef > > Thanks again Ben > > > > Thank you very much for the help so far. > > > > Cheers, > > > > Ben > > > > On Sun, May 16, 2010 at 05:50, wrote: > >> > >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel > >> wrote: > >> > > >> > > >> > On Fri, May 14, 2010 at 14:51, wrote: > >> >> > >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel > >> >> wrote: > >> >> > Hey, > >> >> > > >> >> > I am fairly new Scipy and am trying to do a least square fit to a > set > >> >> > of > >> >> > data. Currently, I am using following code: > >> >> > > >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> >> > pinit = [20,20.] > >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > >> >> > full_output=1) > >> >> > > >> >> > I am now trying to get the goodness of fit out of this data. I am > >> >> > sort > >> >> > of > >> >> > running into a brick wall because I found a lot of conflicting ways > >> >> > of > >> >> > how > >> >> > to calculate it. > >> >> > >> >> For regression the usual is > >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >> >> coefficient of determination is > >> >> > >> >> R^2 = 1 - {SS_{err} / SS_{tot}} > >> >> > >> >> Note your fitfunc is linear in parameters and can be better estimated > >> >> by linear least squares, OLS. > >> >> linear regression is handled in statsmodels and you can get lot's of > >> >> statistics without worrying about the formulas. > >> >> If you only have one slope parameter, then scipy.stats.linregress > also > >> >> works > >> >> > >> > > >> > Thanks for the information. I am still note quite sure if this is what > >> > my > >> > boss wants because there should not be an average y value. > >> > >> The definition of Rsquared is pretty uncontroversial with the y.mean() > >> correction, if there is a constant in the regression (although I know > >> mainly the linear case for this). > >> > >> If there is no constant in the regression, the definition or Rsquared > >> is not clear/unambiguous, but usually used without mean correction of > >> y. > >> > >> Josef > >> > >> > > >> >> > >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance > >> >> of the parameter estimates. > >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > >> > > >> > I have been trying this out, but the fit just looks horrid compared to > >> > using > >> > leastsq method even though they call the same function according to > the > >> > documentation. > >> > > >> >> > >> >> > I am aware of the chisquare function in stats function, but the > >> >> > documentation seems a little confusing to me. Any help would be > >> >> > greatly > >> >> > appreciates. > >> >> > >> >> chisquare and others like kolmogorov-smirnov are more for testing the > >> >> goodness-of-fit of entire distributions, not for how well a curve or > >> >> line fits the data. > >> >> > >> > > >> > That is what I thought, which brought up my confusion when I asked > other > >> > people and they told me to use that > >> > > >> >> > >> >> Josef > >> >> > >> >> > > >> >> > Thanks very much in advance. 
> >> >> > > >> >> > Cheers, > >> >> > > >> >> > Ben > >> >> > > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > > >> > -- > >> > Benedikt Riedel > >> > Graduate Student University of Wisconsin-Madison > >> > Department of Physics > >> > Office: 2304 Chamberlin Hall > >> > Lab: 6247 Chamberlin Hall > >> > Tel: (608) 301-5736 > >> > Cell: (213) 519-1771 > >> > Lab: (608) 262-5916 > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > Benedikt Riedel > > Graduate Student University of Wisconsin-Madison > > Department of Physics > > Office: 2304 Chamberlin Hall > > Lab: 6247 Chamberlin Hall > > Tel: (608) 301-5736 > > Cell: (213) 519-1771 > > Lab: (608) 262-5916 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 17 01:20:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 May 2010 01:20:59 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel wrote: > > > On Sun, May 16, 2010 at 22:33, wrote: >> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel wrote: >> > What I still do not understand is the fact that curve_fit gives me a >> > different output then leastsq, even though curve_fit calls leastsq. >> > >> > I tried to get the chi-squared because we want to plot contours of >> > chi-square from the minimum to the maximum. I used following code: >> > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> > pinit = [20,20.] >> > >> > def func(x, a, b): >> > ???? return a*exp(-x) + b >> > >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, >> > sigma=R4errctsdataselect) >> >> this uses weighted least squares >> sigma : None or N-length sequence >> ? ?If not None, it represents the standard-deviation of ydata. This >> vector, if given, will be used as weights in the least-squares problem >> >> In your initial example with leastsq you don't have any weighting, >> it's just ordinary least squares >> >> maybe that's the difference. >> >> > > Yeah I guess that will be it. 
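A short sketch of the weighting difference described above: passing raw residuals to leastsq is ordinary least squares, while dividing each residual by its standard deviation (which is what supplying sigma to curve_fit amounts to) gives weighted least squares, and the two estimates generally differ. Here x, y and yerr are synthetic stand-ins for tau, R4ctsdataselect and R4errctsdataselect:

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)

x = np.linspace(0, 5, 50)
yerr = 0.2 + 0.3 * x                        # made-up, point-dependent standard deviations
y = 20 + 20 * np.exp(-x) + yerr * np.random.randn(x.size)

# ordinary least squares: every point counts the same
ols_err = lambda p, x, y: y - fitfunc(p, x)
p_ols, ier1 = leastsq(ols_err, [20, 20.], args=(x, y))

# weighted least squares: residuals divided by sigma, so noisy points count less
wls_err = lambda p, x, y, s: (y - fitfunc(p, x)) / s
p_wls, ier2 = leastsq(wls_err, [20, 20.], args=(x, y, yerr))

print(p_ols)
print(p_wls)   # generally not identical to p_ols when the sigmas vary across points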
> >> >> > print pfinal >> > print covar >> > dof=size(tau)-size(pinit) >> > print dof >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, >> > tau)))/dof >> > print chi2 >> > >> > I am not 100% sure I am doing the degrees of freedom calculation right. >> > I >> > got the chi-square formula from the Pearson chi-squared test. >> >> I don't recognize your formula for chi2, and I don't see the >> connection to Pearson chi-squared test . >> >> Do you have a reference? >> > > I based my use of the Pearson test from what I read in an Econometrics book, > but wiki has the a pretty good description. I basically based it off the > example there. Where the expected would be what comes out of the fit and > what you is the "R4ctsdataselect" for those specific values. > > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test I looked at that, but it's a completely different case, the values in the formulas are frequencies Oi = an observed frequency; Ei = an expected (theoretical) frequency, asserted by the null hypothesis; not points on a regression curve Josef > > >> >> Josef >> > > Thanks again > > Ben > > >> >> > >> > Thank you very much for the help so far. >> > >> > Cheers, >> > >> > Ben >> > >> > On Sun, May 16, 2010 at 05:50, wrote: >> >> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >> >> wrote: >> >> > >> >> > >> >> > On Fri, May 14, 2010 at 14:51, wrote: >> >> >> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >> >> >> wrote: >> >> >> > Hey, >> >> >> > >> >> >> > I am fairly new Scipy and am trying to do a least square fit to a >> >> >> > set >> >> >> > of >> >> >> > data. Currently, I am using following code: >> >> >> > >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> >> > pinit = [20,20.] >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >> >> >> > full_output=1) >> >> >> > >> >> >> > I am now trying to get the goodness of fit out of this data. I am >> >> >> > sort >> >> >> > of >> >> >> > running into a brick wall because I found a lot of conflicting >> >> >> > ways >> >> >> > of >> >> >> > how >> >> >> > to calculate it. >> >> >> >> >> >> For regression the usual is >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> >> >> coefficient of determination is >> >> >> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> >> >> >> >> Note your fitfunc is linear in parameters and can be better >> >> >> estimated >> >> >> by linear least squares, OLS. >> >> >> linear regression is handled in statsmodels and you can get lot's of >> >> >> statistics without worrying about the formulas. >> >> >> If you only have one slope parameter, then scipy.stats.linregress >> >> >> also >> >> >> works >> >> >> >> >> > >> >> > Thanks for the information. I am still note quite sure if this is >> >> > what >> >> > my >> >> > boss wants because there should not be an average y value. >> >> >> >> The definition of Rsquared is pretty uncontroversial with the y.mean() >> >> correction, if there is a constant in the regression (although I know >> >> mainly the linear case for this). >> >> >> >> If there is no constant in the regression, the definition or Rsquared >> >> is not clear/unambiguous, but usually used without mean correction of >> >> y. >> >> >> >> Josef >> >> >> >> > >> >> >> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the >> >> >> covariance >> >> >> of the parameter estimates. 
>> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >> >> > >> >> > I have been trying this out, but the fit just looks horrid compared >> >> > to >> >> > using >> >> > leastsq method even though they call the same function according to >> >> > the >> >> > documentation. >> >> > >> >> >> >> >> >> > I am aware of the chisquare function in stats function, but the >> >> >> > documentation seems a little confusing to me. Any help would be >> >> >> > greatly >> >> >> > appreciates. >> >> >> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing >> >> >> the >> >> >> goodness-of-fit of entire distributions, not for how well a curve or >> >> >> line fits the data. >> >> >> >> >> > >> >> > That is what I thought, which brought up my confusion when I asked >> >> > other >> >> > people and they told me to use that >> >> > >> >> >> >> >> >> Josef >> >> >> >> >> >> > >> >> >> > Thanks very much in advance. >> >> >> > >> >> >> > Cheers, >> >> >> > >> >> >> > Ben >> >> >> > >> >> >> > >> >> >> > >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> > >> >> > -- >> >> > Benedikt Riedel >> >> > Graduate Student University of Wisconsin-Madison >> >> > Department of Physics >> >> > Office: 2304 Chamberlin Hall >> >> > Lab: 6247 Chamberlin Hall >> >> > Tel: ?(608) 301-5736 >> >> > Cell: (213) 519-1771 >> >> > Lab: (608) 262-5916 >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > -- >> > Benedikt Riedel >> > Graduate Student University of Wisconsin-Madison >> > Department of Physics >> > Office: 2304 Chamberlin Hall >> > Lab: 6247 Chamberlin Hall >> > Tel: ?(608) 301-5736 >> > Cell: (213) 519-1771 >> > Lab: (608) 262-5916 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From briedel at wisc.edu Mon May 17 02:01:07 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Mon, 17 May 2010 01:01:07 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: Thanks for the clarification. I am still not sure how to get the chi-squared value of my regression though. 
When I use the formula under "Regression Analysis" here http://en.wikipedia.org/wiki/Goodness_of_fit I get a chi-square somewhere around 19, which seems way to large compared to the value of 3.2 I get for the same data set when I fit it using gnuplot. Where gnuplot supposedly used the weighted sum of squares of residuals. I do not fully this because of the results I get. Here is the python code I used: chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/pow(R4errctsdataselect,2)))/dof Sorry for being so thick headed, statistics is just beyond me at times. Cheers, Ben On Mon, May 17, 2010 at 00:20, wrote: > On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel > wrote: > > > > > > On Sun, May 16, 2010 at 22:33, wrote: > >> > >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel > wrote: > >> > What I still do not understand is the fact that curve_fit gives me a > >> > different output then leastsq, even though curve_fit calls leastsq. > >> > > >> > I tried to get the chi-squared because we want to plot contours of > >> > chi-square from the minimum to the maximum. I used following code: > >> > > >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> > pinit = [20,20.] > >> > > >> > def func(x, a, b): > >> > return a*exp(-x) + b > >> > > >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > >> > sigma=R4errctsdataselect) > >> > >> this uses weighted least squares > >> sigma : None or N-length sequence > >> If not None, it represents the standard-deviation of ydata. This > >> vector, if given, will be used as weights in the least-squares problem > >> > >> In your initial example with leastsq you don't have any weighting, > >> it's just ordinary least squares > >> > >> maybe that's the difference. > >> > >> > > > > Yeah I guess that will be it. > > > >> > >> > print pfinal > >> > print covar > >> > dof=size(tau)-size(pinit) > >> > print dof > >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, > >> > tau)))/dof > >> > print chi2 > >> > > >> > I am not 100% sure I am doing the degrees of freedom calculation > right. > >> > I > >> > got the chi-square formula from the Pearson chi-squared test. > >> > >> I don't recognize your formula for chi2, and I don't see the > >> connection to Pearson chi-squared test . > >> > >> Do you have a reference? > >> > > > > I based my use of the Pearson test from what I read in an Econometrics > book, > > but wiki has the a pretty good description. I basically based it off the > > example there. Where the expected would be what comes out of the fit and > > what you is the "R4ctsdataselect" for those specific values. > > > > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test > > I looked at that, but it's a completely different case, the values in > the formulas are frequencies > > Oi = an observed frequency; > Ei = an expected (theoretical) frequency, asserted by the null > hypothesis; > > not points on a regression curve > > Josef > > > > > > >> > >> Josef > >> > > > > Thanks again > > > > Ben > > > > > >> > >> > > >> > Thank you very much for the help so far. 
> >> > > >> > Cheers, > >> > > >> > Ben > >> > > >> > On Sun, May 16, 2010 at 05:50, wrote: > >> >> > >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel > >> >> wrote: > >> >> > > >> >> > > >> >> > On Fri, May 14, 2010 at 14:51, wrote: > >> >> >> > >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel < > briedel at wisc.edu> > >> >> >> wrote: > >> >> >> > Hey, > >> >> >> > > >> >> >> > I am fairly new Scipy and am trying to do a least square fit to > a > >> >> >> > set > >> >> >> > of > >> >> >> > data. Currently, I am using following code: > >> >> >> > > >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >> >> >> > pinit = [20,20.] > >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > >> >> >> > full_output=1) > >> >> >> > > >> >> >> > I am now trying to get the goodness of fit out of this data. I > am > >> >> >> > sort > >> >> >> > of > >> >> >> > running into a brick wall because I found a lot of conflicting > >> >> >> > ways > >> >> >> > of > >> >> >> > how > >> >> >> > to calculate it. > >> >> >> > >> >> >> For regression the usual is > >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >> >> >> coefficient of determination is > >> >> >> > >> >> >> R^2 = 1 - {SS_{err} / SS_{tot}} > >> >> >> > >> >> >> Note your fitfunc is linear in parameters and can be better > >> >> >> estimated > >> >> >> by linear least squares, OLS. > >> >> >> linear regression is handled in statsmodels and you can get lot's > of > >> >> >> statistics without worrying about the formulas. > >> >> >> If you only have one slope parameter, then scipy.stats.linregress > >> >> >> also > >> >> >> works > >> >> >> > >> >> > > >> >> > Thanks for the information. I am still note quite sure if this is > >> >> > what > >> >> > my > >> >> > boss wants because there should not be an average y value. > >> >> > >> >> The definition of Rsquared is pretty uncontroversial with the > y.mean() > >> >> correction, if there is a constant in the regression (although I know > >> >> mainly the linear case for this). > >> >> > >> >> If there is no constant in the regression, the definition or Rsquared > >> >> is not clear/unambiguous, but usually used without mean correction of > >> >> y. > >> >> > >> >> Josef > >> >> > >> >> > > >> >> >> > >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the > >> >> >> covariance > >> >> >> of the parameter estimates. > >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > >> >> > > >> >> > I have been trying this out, but the fit just looks horrid compared > >> >> > to > >> >> > using > >> >> > leastsq method even though they call the same function according to > >> >> > the > >> >> > documentation. > >> >> > > >> >> >> > >> >> >> > I am aware of the chisquare function in stats function, but the > >> >> >> > documentation seems a little confusing to me. Any help would be > >> >> >> > greatly > >> >> >> > appreciates. > >> >> >> > >> >> >> chisquare and others like kolmogorov-smirnov are more for testing > >> >> >> the > >> >> >> goodness-of-fit of entire distributions, not for how well a curve > or > >> >> >> line fits the data. > >> >> >> > >> >> > > >> >> > That is what I thought, which brought up my confusion when I asked > >> >> > other > >> >> > people and they told me to use that > >> >> > > >> >> >> > >> >> >> Josef > >> >> >> > >> >> >> > > >> >> >> > Thanks very much in advance. 
> >> >> >> > > >> >> >> > Cheers, > >> >> >> > > >> >> >> > Ben > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > _______________________________________________ > >> >> >> > SciPy-User mailing list > >> >> >> > SciPy-User at scipy.org > >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> >> > > >> >> >> > > >> >> >> _______________________________________________ > >> >> >> SciPy-User mailing list > >> >> >> SciPy-User at scipy.org > >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > Benedikt Riedel > >> >> > Graduate Student University of Wisconsin-Madison > >> >> > Department of Physics > >> >> > Office: 2304 Chamberlin Hall > >> >> > Lab: 6247 Chamberlin Hall > >> >> > Tel: (608) 301-5736 > >> >> > Cell: (213) 519-1771 > >> >> > Lab: (608) 262-5916 > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > > >> > -- > >> > Benedikt Riedel > >> > Graduate Student University of Wisconsin-Madison > >> > Department of Physics > >> > Office: 2304 Chamberlin Hall > >> > Lab: 6247 Chamberlin Hall > >> > Tel: (608) 301-5736 > >> > Cell: (213) 519-1771 > >> > Lab: (608) 262-5916 > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > -- > > Benedikt Riedel > > Graduate Student University of Wisconsin-Madison > > Department of Physics > > Office: 2304 Chamberlin Hall > > Lab: 6247 Chamberlin Hall > > Tel: (608) 301-5736 > > Cell: (213) 519-1771 > > Lab: (608) 262-5916 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 17 07:35:27 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 May 2010 07:35:27 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel wrote: > Thanks for the clarification. I am still not sure how to get the chi-squared > value of my regression though. When I use the formula under "Regression > Analysis" here > > http://en.wikipedia.org/wiki/Goodness_of_fit > > I get a chi-square somewhere around 19, which seems way to large compared to > the value of 3.2 I get for the same data set when I fit it using gnuplot. > Where gnuplot supposedly used the weighted sum of squares of residuals. 
I do > not fully this because of the results I get. > > Here is the python code I used: > > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), > 2)/pow(R4errctsdataselect,2)))/dof from some gnuplot help page it looks like what they call chisquare is WSSR/dof which would be something like chi2=(sum( ( R4ctsdataselect-fitfunc(pinit, tau)) / sqrt(R4errctsdataselect) )**2 )/dof I'm not sure whether the sqrt is in there or not, because I don't remember the normalization that is used, weights or weights squared Josef > > Sorry for being so thick headed, statistics is just beyond me at times. > > Cheers, > > Ben > > On Mon, May 17, 2010 at 00:20, wrote: >> >> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel >> wrote: >> > >> > >> > On Sun, May 16, 2010 at 22:33, wrote: >> >> >> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel >> >> wrote: >> >> > What I still do not understand is the fact that curve_fit gives me a >> >> > different output then leastsq, even though curve_fit calls leastsq. >> >> > >> >> > I tried to get the chi-squared because we want to plot contours of >> >> > chi-square from the minimum to the maximum. I used following code: >> >> > >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> > pinit = [20,20.] >> >> > >> >> > def func(x, a, b): >> >> > ???? return a*exp(-x) + b >> >> > >> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, >> >> > sigma=R4errctsdataselect) >> >> >> >> this uses weighted least squares >> >> sigma : None or N-length sequence >> >> ? ?If not None, it represents the standard-deviation of ydata. This >> >> vector, if given, will be used as weights in the least-squares problem >> >> >> >> In your initial example with leastsq you don't have any weighting, >> >> it's just ordinary least squares >> >> >> >> maybe that's the difference. >> >> >> >> >> > >> > Yeah I guess that will be it. >> > >> >> >> >> > print pfinal >> >> > print covar >> >> > dof=size(tau)-size(pinit) >> >> > print dof >> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, >> >> > tau)))/dof >> >> > print chi2 >> >> > >> >> > I am not 100% sure I am doing the degrees of freedom calculation >> >> > right. >> >> > I >> >> > got the chi-square formula from the Pearson chi-squared test. >> >> >> >> I don't recognize your formula for chi2, and I don't see the >> >> connection to Pearson chi-squared test . >> >> >> >> Do you have a reference? >> >> >> > >> > I based my use of the Pearson test from what I read in an Econometrics >> > book, >> > but wiki has the a pretty good description. I basically based it off the >> > example there. Where the expected would be what comes out of the fit and >> > what you is the "R4ctsdataselect" for those specific values. >> > >> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test >> >> I looked at that, but it's a completely different case, the values in >> the formulas are frequencies >> >> ? ?Oi = an observed frequency; >> ? ?Ei = an expected (theoretical) frequency, asserted by the null >> hypothesis; >> >> not points on a regression curve >> >> Josef >> >> > >> > >> >> >> >> Josef >> >> >> > >> > Thanks again >> > >> > Ben >> > >> > >> >> >> >> > >> >> > Thank you very much for the help so far. 
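A self-contained sketch of the reduced chi-square convention being discussed here (weighted sum of squared residuals over degrees of freedom, with the standard deviation sigma_i itself in the denominator, which is where the thread eventually lands); the data are synthetic stand-ins for the poster's arrays, and the model is evaluated at the fitted parameters:

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)
errfunc = lambda p, x, y, s: (y - fitfunc(p, x)) / s   # residuals in units of sigma

x = np.linspace(0, 5, 50)
yerr = 0.5 * np.ones(x.size)
y = 20 + 20 * np.exp(-x) + yerr * np.random.randn(x.size)

pfinal, ier = leastsq(errfunc, [20, 20.], args=(x, y, yerr))

dof = x.size - len(pfinal)                          # N data points minus fitted parameters
chi2 = (errfunc(pfinal, x, y, yerr) ** 2).sum()     # sum of ((y - f)/sigma)^2
chi2_red = chi2 / dof                               # roughly 1 when model and error bars are consistent
print(chi2_red)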
>> >> > >> >> > Cheers, >> >> > >> >> > Ben >> >> > >> >> > On Sun, May 16, 2010 at 05:50, wrote: >> >> >> >> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >> >> >> wrote: >> >> >> > >> >> >> > >> >> >> > On Fri, May 14, 2010 at 14:51, wrote: >> >> >> >> >> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >> >> >> >> >> >> >> >> wrote: >> >> >> >> > Hey, >> >> >> >> > >> >> >> >> > I am fairly new Scipy and am trying to do a least square fit to >> >> >> >> > a >> >> >> >> > set >> >> >> >> > of >> >> >> >> > data. Currently, I am using following code: >> >> >> >> > >> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >> >> >> >> > pinit = [20,20.] >> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >> >> >> >> > full_output=1) >> >> >> >> > >> >> >> >> > I am now trying to get the goodness of fit out of this data. I >> >> >> >> > am >> >> >> >> > sort >> >> >> >> > of >> >> >> >> > running into a brick wall because I found a lot of conflicting >> >> >> >> > ways >> >> >> >> > of >> >> >> >> > how >> >> >> >> > to calculate it. >> >> >> >> >> >> >> >> For regression the usual is >> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >> >> >> >> coefficient of determination is >> >> >> >> >> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >> >> >> >> >> >> >> >> Note your fitfunc is linear in parameters and can be better >> >> >> >> estimated >> >> >> >> by linear least squares, OLS. >> >> >> >> linear regression is handled in statsmodels and you can get lot's >> >> >> >> of >> >> >> >> statistics without worrying about the formulas. >> >> >> >> If you only have one slope parameter, then scipy.stats.linregress >> >> >> >> also >> >> >> >> works >> >> >> >> >> >> >> > >> >> >> > Thanks for the information. I am still note quite sure if this is >> >> >> > what >> >> >> > my >> >> >> > boss wants because there should not be an average y value. >> >> >> >> >> >> The definition of Rsquared is pretty uncontroversial with the >> >> >> y.mean() >> >> >> correction, if there is a constant in the regression (although I >> >> >> know >> >> >> mainly the linear case for this). >> >> >> >> >> >> If there is no constant in the regression, the definition or >> >> >> Rsquared >> >> >> is not clear/unambiguous, but usually used without mean correction >> >> >> of >> >> >> y. >> >> >> >> >> >> Josef >> >> >> >> >> >> > >> >> >> >> >> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the >> >> >> >> covariance >> >> >> >> of the parameter estimates. >> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >> >> >> > >> >> >> > I have been trying this out, but the fit just looks horrid >> >> >> > compared >> >> >> > to >> >> >> > using >> >> >> > leastsq method even though they call the same function according >> >> >> > to >> >> >> > the >> >> >> > documentation. >> >> >> > >> >> >> >> >> >> >> >> > I am aware of the chisquare function in stats function, but the >> >> >> >> > documentation seems a little confusing to me. Any help would be >> >> >> >> > greatly >> >> >> >> > appreciates. >> >> >> >> >> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing >> >> >> >> the >> >> >> >> goodness-of-fit of entire distributions, not for how well a curve >> >> >> >> or >> >> >> >> line fits the data. 
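To make the distinction quoted above concrete, a small sketch of what those tests actually operate on: a sample (Kolmogorov-Smirnov) or binned counts (Pearson chi-square) compared against a reference distribution, not a fitted curve against (x, y) points. The N(0, 1) sample and the bin edges below are made up for the example:

import numpy as np
from scipy import stats

sample = np.random.normal(loc=0.0, scale=1.0, size=200)

# Kolmogorov-Smirnov: does the sample look like draws from N(0, 1)?
ks_stat, ks_pvalue = stats.kstest(sample, 'norm')

# Pearson chi-square: observed counts per bin vs the counts N(0, 1) predicts
edges = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
observed, _ = np.histogram(sample, bins=edges)
probs = np.diff(stats.norm.cdf(edges))
expected = probs / probs.sum() * observed.sum()     # normalized so the totals agree
chi2_stat, chi2_pvalue = stats.chisquare(observed, expected)

print(ks_pvalue, chi2_pvalue)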
>> >> >> >> >> >> >> > >> >> >> > That is what I thought, which brought up my confusion when I asked >> >> >> > other >> >> >> > people and they told me to use that >> >> >> > >> >> >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> > >> >> >> >> > Thanks very much in advance. >> >> >> >> > >> >> >> >> > Cheers, >> >> >> >> > >> >> >> >> > Ben >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > _______________________________________________ >> >> >> >> > SciPy-User mailing list >> >> >> >> > SciPy-User at scipy.org >> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> > >> >> >> >> > >> >> >> >> _______________________________________________ >> >> >> >> SciPy-User mailing list >> >> >> >> SciPy-User at scipy.org >> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Benedikt Riedel >> >> >> > Graduate Student University of Wisconsin-Madison >> >> >> > Department of Physics >> >> >> > Office: 2304 Chamberlin Hall >> >> >> > Lab: 6247 Chamberlin Hall >> >> >> > Tel: ?(608) 301-5736 >> >> >> > Cell: (213) 519-1771 >> >> >> > Lab: (608) 262-5916 >> >> >> > >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> > >> >> > -- >> >> > Benedikt Riedel >> >> > Graduate Student University of Wisconsin-Madison >> >> > Department of Physics >> >> > Office: 2304 Chamberlin Hall >> >> > Lab: 6247 Chamberlin Hall >> >> > Tel: ?(608) 301-5736 >> >> > Cell: (213) 519-1771 >> >> > Lab: (608) 262-5916 >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > -- >> > Benedikt Riedel >> > Graduate Student University of Wisconsin-Madison >> > Department of Physics >> > Office: 2304 Chamberlin Hall >> > Lab: 6247 Chamberlin Hall >> > Tel: ?(608) 301-5736 >> > Cell: (213) 519-1771 >> > Lab: (608) 262-5916 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > Benedikt Riedel > Graduate Student University of Wisconsin-Madison > Department of Physics > Office: 2304 Chamberlin Hall > Lab: 6247 Chamberlin Hall > Tel: ?(608) 301-5736 > Cell: (213) 519-1771 > Lab: (608) 262-5916 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From georges.schutz at internet.lu Mon May 17 08:39:34 2010 From: georges.schutz at internet.lu (Georges Schutz) Date: Mon, 17 May 2010 14:39:34 +0200 Subject: [SciPy-User] scikits.timeseries: How to define frequency of 15minutes In-Reply-To: References: Message-ID: <4BF13906.9030707@internet.lu> Hi Martin, It is good to hear 
that there are others facing the same problem because this my raise the importance of that issue for future plans. The solution you propose would be OK for me, I think I could live a while with being restricted to the proposed frequencies even if I would look foreword to customizable frequency on the long term. Thanks Georges Schutz On 05/05/2010 12:03, Martin Felder wrote: > Hi *, > > just for the record, I'm having the exact same problem as Georges. I > read through your discussion from three weeks ago, but I also don't feel > up to modifying the C code myself (being a Fortran kind of guy...). > > I understand implementing custom user-defined frequencies is probably a > lot of effort, but maybe it's less troublesome to just add some > frequencies often used (=by Georges and me, and hopefully others?) to > the currently implemented ones? I'd be extremely happy to have 12h, 6h, > 3h, 15min and 10min intervals in addition to the existing ones. > > If you could point me to the part of the code that would have to be > modified for that, maybe I can find someone more apt in C who can > implement it. > > Thanks, > Martin > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Mon May 17 11:20:13 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 May 2010 11:20:13 -0400 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 7:35 AM, wrote: > On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel wrote: >> Thanks for the clarification. I am still not sure how to get the chi-squared >> value of my regression though. When I use the formula under "Regression >> Analysis" here >> >> http://en.wikipedia.org/wiki/Goodness_of_fit >> >> I get a chi-square somewhere around 19, which seems way to large compared to >> the value of 3.2 I get for the same data set when I fit it using gnuplot. >> Where gnuplot supposedly used the weighted sum of squares of residuals. I do >> not fully this because of the results I get. >> >> Here is the python code I used: >> >> chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), >> 2)/pow(R4errctsdataselect,2)))/dof > > > from some gnuplot help page it looks like what they call chisquare is WSSR/dof > > which would be something like > > chi2=(sum( ?( R4ctsdataselect-fitfunc(pinit, tau)) / > sqrt(R4errctsdataselect) )**2 ?)/dof > > I'm not sure whether the sqrt is in there or not, because I don't > remember the normalization that is used, weights or weights squared (for reference) gnuplot is pretty vague on the denominator http://theochem.ki.ku.dk/on_line_docs/gnuplot/gnuplot_21.html#SEC81 a bit better explanation of the terminology http://www.graphpad.com/faq/viewfaq.cfm?faq=926 any more explicit reference has sigma in the denominator Josef > Josef > > > > >> >> Sorry for being so thick headed, statistics is just beyond me at times. >> >> Cheers, >> >> Ben >> >> On Mon, May 17, 2010 at 00:20, wrote: >>> >>> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel >>> wrote: >>> > >>> > >>> > On Sun, May 16, 2010 at 22:33, wrote: >>> >> >>> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel >>> >> wrote: >>> >> > What I still do not understand is the fact that curve_fit gives me a >>> >> > different output then leastsq, even though curve_fit calls leastsq. 
>>> >> > >>> >> > I tried to get the chi-squared because we want to plot contours of >>> >> > chi-square from the minimum to the maximum. I used following code: >>> >> > >>> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >>> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >>> >> > pinit = [20,20.] >>> >> > >>> >> > def func(x, a, b): >>> >> > ???? return a*exp(-x) + b >>> >> > >>> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, >>> >> > sigma=R4errctsdataselect) >>> >> >>> >> this uses weighted least squares >>> >> sigma : None or N-length sequence >>> >> ? ?If not None, it represents the standard-deviation of ydata. This >>> >> vector, if given, will be used as weights in the least-squares problem >>> >> >>> >> In your initial example with leastsq you don't have any weighting, >>> >> it's just ordinary least squares >>> >> >>> >> maybe that's the difference. >>> >> >>> >> >>> > >>> > Yeah I guess that will be it. >>> > >>> >> >>> >> > print pfinal >>> >> > print covar >>> >> > dof=size(tau)-size(pinit) >>> >> > print dof >>> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit, >>> >> > tau)))/dof >>> >> > print chi2 >>> >> > >>> >> > I am not 100% sure I am doing the degrees of freedom calculation >>> >> > right. >>> >> > I >>> >> > got the chi-square formula from the Pearson chi-squared test. >>> >> >>> >> I don't recognize your formula for chi2, and I don't see the >>> >> connection to Pearson chi-squared test . >>> >> >>> >> Do you have a reference? >>> >> >>> > >>> > I based my use of the Pearson test from what I read in an Econometrics >>> > book, >>> > but wiki has the a pretty good description. I basically based it off the >>> > example there. Where the expected would be what comes out of the fit and >>> > what you is the "R4ctsdataselect" for those specific values. >>> > >>> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test >>> >>> I looked at that, but it's a completely different case, the values in >>> the formulas are frequencies >>> >>> ? ?Oi = an observed frequency; >>> ? ?Ei = an expected (theoretical) frequency, asserted by the null >>> hypothesis; >>> >>> not points on a regression curve >>> >>> Josef >>> >>> > >>> > >>> >> >>> >> Josef >>> >> >>> > >>> > Thanks again >>> > >>> > Ben >>> > >>> > >>> >> >>> >> > >>> >> > Thank you very much for the help so far. >>> >> > >>> >> > Cheers, >>> >> > >>> >> > Ben >>> >> > >>> >> > On Sun, May 16, 2010 at 05:50, wrote: >>> >> >> >>> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel >>> >> >> wrote: >>> >> >> > >>> >> >> > >>> >> >> > On Fri, May 14, 2010 at 14:51, wrote: >>> >> >> >> >>> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel >>> >> >> >> >>> >> >> >> wrote: >>> >> >> >> > Hey, >>> >> >> >> > >>> >> >> >> > I am fairly new Scipy and am trying to do a least square fit to >>> >> >> >> > a >>> >> >> >> > set >>> >> >> >> > of >>> >> >> >> > data. Currently, I am using following code: >>> >> >> >> > >>> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) >>> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) >>> >> >> >> > pinit = [20,20.] >>> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), >>> >> >> >> > full_output=1) >>> >> >> >> > >>> >> >> >> > I am now trying to get the goodness of fit out of this data. I >>> >> >> >> > am >>> >> >> >> > sort >>> >> >> >> > of >>> >> >> >> > running into a brick wall because I found a lot of conflicting >>> >> >> >> > ways >>> >> >> >> > of >>> >> >> >> > how >>> >> >> >> > to calculate it. 
>>> >> >> >> >>> >> >> >> For regression the usual is >>> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination >>> >> >> >> coefficient of determination is >>> >> >> >> >>> >> >> >> ? ?R^2 = 1 - {SS_{err} / SS_{tot}} >>> >> >> >> >>> >> >> >> Note your fitfunc is linear in parameters and can be better >>> >> >> >> estimated >>> >> >> >> by linear least squares, OLS. >>> >> >> >> linear regression is handled in statsmodels and you can get lot's >>> >> >> >> of >>> >> >> >> statistics without worrying about the formulas. >>> >> >> >> If you only have one slope parameter, then scipy.stats.linregress >>> >> >> >> also >>> >> >> >> works >>> >> >> >> >>> >> >> > >>> >> >> > Thanks for the information. I am still note quite sure if this is >>> >> >> > what >>> >> >> > my >>> >> >> > boss wants because there should not be an average y value. >>> >> >> >>> >> >> The definition of Rsquared is pretty uncontroversial with the >>> >> >> y.mean() >>> >> >> correction, if there is a constant in the regression (although I >>> >> >> know >>> >> >> mainly the linear case for this). >>> >> >> >>> >> >> If there is no constant in the regression, the definition or >>> >> >> Rsquared >>> >> >> is not clear/unambiguous, but usually used without mean correction >>> >> >> of >>> >> >> y. >>> >> >> >>> >> >> Josef >>> >> >> >>> >> >> > >>> >> >> >> >>> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the >>> >> >> >> covariance >>> >> >> >> of the parameter estimates. >>> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit >>> >> >> > >>> >> >> > I have been trying this out, but the fit just looks horrid >>> >> >> > compared >>> >> >> > to >>> >> >> > using >>> >> >> > leastsq method even though they call the same function according >>> >> >> > to >>> >> >> > the >>> >> >> > documentation. >>> >> >> > >>> >> >> >> >>> >> >> >> > I am aware of the chisquare function in stats function, but the >>> >> >> >> > documentation seems a little confusing to me. Any help would be >>> >> >> >> > greatly >>> >> >> >> > appreciates. >>> >> >> >> >>> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing >>> >> >> >> the >>> >> >> >> goodness-of-fit of entire distributions, not for how well a curve >>> >> >> >> or >>> >> >> >> line fits the data. >>> >> >> >> >>> >> >> > >>> >> >> > That is what I thought, which brought up my confusion when I asked >>> >> >> > other >>> >> >> > people and they told me to use that >>> >> >> > >>> >> >> >> >>> >> >> >> Josef >>> >> >> >> >>> >> >> >> > >>> >> >> >> > Thanks very much in advance. 
>>> >> >> >> > >>> >> >> >> > Cheers, >>> >> >> >> > >>> >> >> >> > Ben >>> >> >> >> > >>> >> >> >> > >>> >> >> >> > >>> >> >> >> > _______________________________________________ >>> >> >> >> > SciPy-User mailing list >>> >> >> >> > SciPy-User at scipy.org >>> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> > >>> >> >> >> > >>> >> >> >> _______________________________________________ >>> >> >> >> SciPy-User mailing list >>> >> >> >> SciPy-User at scipy.org >>> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > -- >>> >> >> > Benedikt Riedel >>> >> >> > Graduate Student University of Wisconsin-Madison >>> >> >> > Department of Physics >>> >> >> > Office: 2304 Chamberlin Hall >>> >> >> > Lab: 6247 Chamberlin Hall >>> >> >> > Tel: ?(608) 301-5736 >>> >> >> > Cell: (213) 519-1771 >>> >> >> > Lab: (608) 262-5916 >>> >> >> > >>> >> >> > _______________________________________________ >>> >> >> > SciPy-User mailing list >>> >> >> > SciPy-User at scipy.org >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> > >>> >> >> > >>> >> >> _______________________________________________ >>> >> >> SciPy-User mailing list >>> >> >> SciPy-User at scipy.org >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > >>> >> > >>> >> > >>> >> > -- >>> >> > Benedikt Riedel >>> >> > Graduate Student University of Wisconsin-Madison >>> >> > Department of Physics >>> >> > Office: 2304 Chamberlin Hall >>> >> > Lab: 6247 Chamberlin Hall >>> >> > Tel: ?(608) 301-5736 >>> >> > Cell: (213) 519-1771 >>> >> > Lab: (608) 262-5916 >>> >> > >>> >> > _______________________________________________ >>> >> > SciPy-User mailing list >>> >> > SciPy-User at scipy.org >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > >>> >> > >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> > >>> > -- >>> > Benedikt Riedel >>> > Graduate Student University of Wisconsin-Madison >>> > Department of Physics >>> > Office: 2304 Chamberlin Hall >>> > Lab: 6247 Chamberlin Hall >>> > Tel: ?(608) 301-5736 >>> > Cell: (213) 519-1771 >>> > Lab: (608) 262-5916 >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> -- >> Benedikt Riedel >> Graduate Student University of Wisconsin-Madison >> Department of Physics >> Office: 2304 Chamberlin Hall >> Lab: 6247 Chamberlin Hall >> Tel: ?(608) 301-5736 >> Cell: (213) 519-1771 >> Lab: (608) 262-5916 >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From briedel at wisc.edu Mon May 17 13:26:59 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Mon, 17 May 2010 12:26:59 -0500 Subject: [SciPy-User] Least Square fit and goodness of fit In-Reply-To: References: Message-ID: Thanks for the references. I have adjusted the code, such that R4errctsdataselect is sigma and not sigma_squared. 
Oddly, enough I made a stupid mistake by using the original guess for parameters rather than final guess of parameters in my chi-squared, which of course threw me off. Thanks again for the help. Cheers, Ben On Mon, May 17, 2010 at 10:20, wrote: > On Mon, May 17, 2010 at 7:35 AM, wrote: > > On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel > wrote: > >> Thanks for the clarification. I am still not sure how to get the > chi-squared > >> value of my regression though. When I use the formula under "Regression > >> Analysis" here > >> > >> http://en.wikipedia.org/wiki/Goodness_of_fit > >> > >> I get a chi-square somewhere around 19, which seems way to large > compared to > >> the value of 3.2 I get for the same data set when I fit it using > gnuplot. > >> Where gnuplot supposedly used the weighted sum of squares of residuals. > I do > >> not fully this because of the results I get. > >> > >> Here is the python code I used: > >> > >> chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), > >> 2)/pow(R4errctsdataselect,2)))/dof > > > > > > from some gnuplot help page it looks like what they call chisquare is > WSSR/dof > > > > which would be something like > > > > chi2=(sum( ( R4ctsdataselect-fitfunc(pinit, tau)) / > > sqrt(R4errctsdataselect) )**2 )/dof > > > > I'm not sure whether the sqrt is in there or not, because I don't > > remember the normalization that is used, weights or weights squared > > (for reference) > gnuplot is pretty vague on the denominator > http://theochem.ki.ku.dk/on_line_docs/gnuplot/gnuplot_21.html#SEC81 > > a bit better explanation of the terminology > http://www.graphpad.com/faq/viewfaq.cfm?faq=926 > > any more explicit reference has sigma in the denominator > > Josef > > > Josef > > > > > > > > > >> > >> Sorry for being so thick headed, statistics is just beyond me at times. > >> > >> Cheers, > >> > >> Ben > >> > >> On Mon, May 17, 2010 at 00:20, wrote: > >>> > >>> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel > >>> wrote: > >>> > > >>> > > >>> > On Sun, May 16, 2010 at 22:33, wrote: > >>> >> > >>> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel > >>> >> wrote: > >>> >> > What I still do not understand is the fact that curve_fit gives me > a > >>> >> > different output then leastsq, even though curve_fit calls > leastsq. > >>> >> > > >>> >> > I tried to get the chi-squared because we want to plot contours of > >>> >> > chi-square from the minimum to the maximum. I used following code: > >>> >> > > >>> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >>> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >>> >> > pinit = [20,20.] > >>> >> > > >>> >> > def func(x, a, b): > >>> >> > return a*exp(-x) + b > >>> >> > > >>> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit, > >>> >> > sigma=R4errctsdataselect) > >>> >> > >>> >> this uses weighted least squares > >>> >> sigma : None or N-length sequence > >>> >> If not None, it represents the standard-deviation of ydata. This > >>> >> vector, if given, will be used as weights in the least-squares > problem > >>> >> > >>> >> In your initial example with leastsq you don't have any weighting, > >>> >> it's just ordinary least squares > >>> >> > >>> >> maybe that's the difference. > >>> >> > >>> >> > >>> > > >>> > Yeah I guess that will be it. 
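A tiny numerical illustration of that fix, again with synthetic data: the reduced chi-square is only meaningful when the residuals are evaluated at the fitted parameters; at the starting guess it comes out much larger, which is exactly the symptom described above.

import numpy as np
from scipy.optimize import leastsq

fitfunc = lambda p, x: p[0] + p[1] * np.exp(-x)
errfunc = lambda p, x, y, s: (y - fitfunc(p, x)) / s

x = np.linspace(0, 5, 50)
yerr = 0.5 * np.ones(x.size)
y = 18 + 25 * np.exp(-x) + yerr * np.random.randn(x.size)   # true parameters differ from the guess

pinit = [20, 20.]
pfinal, ier = leastsq(errfunc, pinit, args=(x, y, yerr))
dof = x.size - len(pinit)

chi2_at_guess = (errfunc(pinit, x, y, yerr) ** 2).sum() / dof   # inflated by the wrong parameters
chi2_at_fit = (errfunc(pfinal, x, y, yerr) ** 2).sum() / dof    # the number that actually matters
print(chi2_at_guess, chi2_at_fit)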
> >>> > > >>> >> > >>> >> > print pfinal > >>> >> > print covar > >>> >> > dof=size(tau)-size(pinit) > >>> >> > print dof > >>> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), > 2)/fitfunc(pinit, > >>> >> > tau)))/dof > >>> >> > print chi2 > >>> >> > > >>> >> > I am not 100% sure I am doing the degrees of freedom calculation > >>> >> > right. > >>> >> > I > >>> >> > got the chi-square formula from the Pearson chi-squared test. > >>> >> > >>> >> I don't recognize your formula for chi2, and I don't see the > >>> >> connection to Pearson chi-squared test . > >>> >> > >>> >> Do you have a reference? > >>> >> > >>> > > >>> > I based my use of the Pearson test from what I read in an > Econometrics > >>> > book, > >>> > but wiki has the a pretty good description. I basically based it off > the > >>> > example there. Where the expected would be what comes out of the fit > and > >>> > what you is the "R4ctsdataselect" for those specific values. > >>> > > >>> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test > >>> > >>> I looked at that, but it's a completely different case, the values in > >>> the formulas are frequencies > >>> > >>> Oi = an observed frequency; > >>> Ei = an expected (theoretical) frequency, asserted by the null > >>> hypothesis; > >>> > >>> not points on a regression curve > >>> > >>> Josef > >>> > >>> > > >>> > > >>> >> > >>> >> Josef > >>> >> > >>> > > >>> > Thanks again > >>> > > >>> > Ben > >>> > > >>> > > >>> >> > >>> >> > > >>> >> > Thank you very much for the help so far. > >>> >> > > >>> >> > Cheers, > >>> >> > > >>> >> > Ben > >>> >> > > >>> >> > On Sun, May 16, 2010 at 05:50, wrote: > >>> >> >> > >>> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel < > briedel at wisc.edu> > >>> >> >> wrote: > >>> >> >> > > >>> >> >> > > >>> >> >> > On Fri, May 14, 2010 at 14:51, wrote: > >>> >> >> >> > >>> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel > >>> >> >> >> > >>> >> >> >> wrote: > >>> >> >> >> > Hey, > >>> >> >> >> > > >>> >> >> >> > I am fairly new Scipy and am trying to do a least square fit > to > >>> >> >> >> > a > >>> >> >> >> > set > >>> >> >> >> > of > >>> >> >> >> > data. Currently, I am using following code: > >>> >> >> >> > > >>> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x) > >>> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x)) > >>> >> >> >> > pinit = [20,20.] > >>> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect), > >>> >> >> >> > full_output=1) > >>> >> >> >> > > >>> >> >> >> > I am now trying to get the goodness of fit out of this data. > I > >>> >> >> >> > am > >>> >> >> >> > sort > >>> >> >> >> > of > >>> >> >> >> > running into a brick wall because I found a lot of > conflicting > >>> >> >> >> > ways > >>> >> >> >> > of > >>> >> >> >> > how > >>> >> >> >> > to calculate it. > >>> >> >> >> > >>> >> >> >> For regression the usual is > >>> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination > >>> >> >> >> coefficient of determination is > >>> >> >> >> > >>> >> >> >> R^2 = 1 - {SS_{err} / SS_{tot}} > >>> >> >> >> > >>> >> >> >> Note your fitfunc is linear in parameters and can be better > >>> >> >> >> estimated > >>> >> >> >> by linear least squares, OLS. > >>> >> >> >> linear regression is handled in statsmodels and you can get > lot's > >>> >> >> >> of > >>> >> >> >> statistics without worrying about the formulas. 
> >>> >> >> >> If you only have one slope parameter, then > scipy.stats.linregress > >>> >> >> >> also > >>> >> >> >> works > >>> >> >> >> > >>> >> >> > > >>> >> >> > Thanks for the information. I am still note quite sure if this > is > >>> >> >> > what > >>> >> >> > my > >>> >> >> > boss wants because there should not be an average y value. > >>> >> >> > >>> >> >> The definition of Rsquared is pretty uncontroversial with the > >>> >> >> y.mean() > >>> >> >> correction, if there is a constant in the regression (although I > >>> >> >> know > >>> >> >> mainly the linear case for this). > >>> >> >> > >>> >> >> If there is no constant in the regression, the definition or > >>> >> >> Rsquared > >>> >> >> is not clear/unambiguous, but usually used without mean > correction > >>> >> >> of > >>> >> >> y. > >>> >> >> > >>> >> >> Josef > >>> >> >> > >>> >> >> > > >>> >> >> >> > >>> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the > >>> >> >> >> covariance > >>> >> >> >> of the parameter estimates. > >>> >> >> >> > http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit > >>> >> >> > > >>> >> >> > I have been trying this out, but the fit just looks horrid > >>> >> >> > compared > >>> >> >> > to > >>> >> >> > using > >>> >> >> > leastsq method even though they call the same function > according > >>> >> >> > to > >>> >> >> > the > >>> >> >> > documentation. > >>> >> >> > > >>> >> >> >> > >>> >> >> >> > I am aware of the chisquare function in stats function, but > the > >>> >> >> >> > documentation seems a little confusing to me. Any help would > be > >>> >> >> >> > greatly > >>> >> >> >> > appreciates. > >>> >> >> >> > >>> >> >> >> chisquare and others like kolmogorov-smirnov are more for > testing > >>> >> >> >> the > >>> >> >> >> goodness-of-fit of entire distributions, not for how well a > curve > >>> >> >> >> or > >>> >> >> >> line fits the data. > >>> >> >> >> > >>> >> >> > > >>> >> >> > That is what I thought, which brought up my confusion when I > asked > >>> >> >> > other > >>> >> >> > people and they told me to use that > >>> >> >> > > >>> >> >> >> > >>> >> >> >> Josef > >>> >> >> >> > >>> >> >> >> > > >>> >> >> >> > Thanks very much in advance. 
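For completeness, a sketch of the "linear in parameters" point quoted above: y = p[0] + p[1]*exp(-x) is a straight line in the transformed regressor z = exp(-x), so scipy.stats.linregress can estimate both parameters directly and also reports r (hence R^2). x and y are synthetic stand-ins for the poster's data:

import numpy as np
from scipy import stats

x = np.linspace(0, 5, 50)
y = 20 + 20 * np.exp(-x) + np.random.normal(scale=0.5, size=x.size)

z = np.exp(-x)                                 # transformed regressor
slope, intercept, r, p, stderr = stats.linregress(z, y)

print(intercept, slope)                        # play the roles of p[0] and p[1]
print(r ** 2)                                  # coefficient of determination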
> >>> >> >> >> > > >>> >> >> >> > Cheers, > >>> >> >> >> > > >>> >> >> >> > Ben > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> > _______________________________________________ > >>> >> >> >> > SciPy-User mailing list > >>> >> >> >> > SciPy-User at scipy.org > >>> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> _______________________________________________ > >>> >> >> >> SciPy-User mailing list > >>> >> >> >> SciPy-User at scipy.org > >>> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> > > >>> >> >> > > >>> >> >> > > >>> >> >> > -- > >>> >> >> > Benedikt Riedel > >>> >> >> > Graduate Student University of Wisconsin-Madison > >>> >> >> > Department of Physics > >>> >> >> > Office: 2304 Chamberlin Hall > >>> >> >> > Lab: 6247 Chamberlin Hall > >>> >> >> > Tel: (608) 301-5736 > >>> >> >> > Cell: (213) 519-1771 > >>> >> >> > Lab: (608) 262-5916 > >>> >> >> > > >>> >> >> > _______________________________________________ > >>> >> >> > SciPy-User mailing list > >>> >> >> > SciPy-User at scipy.org > >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> > > >>> >> >> > > >>> >> >> _______________________________________________ > >>> >> >> SciPy-User mailing list > >>> >> >> SciPy-User at scipy.org > >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> > > >>> >> > > >>> >> > -- > >>> >> > Benedikt Riedel > >>> >> > Graduate Student University of Wisconsin-Madison > >>> >> > Department of Physics > >>> >> > Office: 2304 Chamberlin Hall > >>> >> > Lab: 6247 Chamberlin Hall > >>> >> > Tel: (608) 301-5736 > >>> >> > Cell: (213) 519-1771 > >>> >> > Lab: (608) 262-5916 > >>> >> > > >>> >> > _______________________________________________ > >>> >> > SciPy-User mailing list > >>> >> > SciPy-User at scipy.org > >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> > > >>> >> _______________________________________________ > >>> >> SciPy-User mailing list > >>> >> SciPy-User at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> > > >>> > > >>> > -- > >>> > Benedikt Riedel > >>> > Graduate Student University of Wisconsin-Madison > >>> > Department of Physics > >>> > Office: 2304 Chamberlin Hall > >>> > Lab: 6247 Chamberlin Hall > >>> > Tel: (608) 301-5736 > >>> > Cell: (213) 519-1771 > >>> > Lab: (608) 262-5916 > >>> > > >>> > _______________________________________________ > >>> > SciPy-User mailing list > >>> > SciPy-User at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> > > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > >> > >> -- > >> Benedikt Riedel > >> Graduate Student University of Wisconsin-Madison > >> Department of Physics > >> Office: 2304 Chamberlin Hall > >> Lab: 6247 Chamberlin Hall > >> Tel: (608) 301-5736 > >> Cell: (213) 519-1771 > >> Lab: (608) 262-5916 > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 
Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon May 17 13:32:23 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 17 May 2010 13:32:23 -0400 Subject: [SciPy-User] sparse array hstack In-Reply-To: References: Message-ID: On Thu, May 13, 2010 at 10:26 AM, Jason Rennie wrote: > It appears that numpy.hstack doesn't work with scipy sparse arrays. ?I'm > using scipy 0.6.0 (Debian stable). ?Am I observing correctly? ?Does a later > version of numpy/scipy fix this? ?Or, is there code available which will do > an hstack on sparse arrays? > Thanks, > Jason You want to use scipy.sparse.hstack, which works for me with recent scipy trunk In [20]: from scipy import sparse In [21]: a = sparse.lil_matrix((10,10)) In [22]: a[0,0]=100 In [23]: b = sparse.lil_matrix((10,10)) In [24]: b[0,0] = 99 In [25]: c = sparse.hstack([a,b]) In [26]: c.toarray() Skipper From wesmckinn at gmail.com Mon May 17 13:45:48 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 17 May 2010 13:45:48 -0400 Subject: [SciPy-User] scikits.timeseries: How to define frequency of 15minutes In-Reply-To: <4BF13906.9030707@internet.lu> References: <4BF13906.9030707@internet.lu> Message-ID: On Mon, May 17, 2010 at 8:39 AM, Georges Schutz wrote: > Hi Martin, > It is good to hear that there are others facing the same problem because > this my raise the importance of that issue for future plans. > > The solution you propose would be OK for me, I think I could live a > while with being restricted to the proposed frequencies even if I would > look foreword to customizable frequency on the long term. > > Thanks > Georges Schutz > > On 05/05/2010 12:03, Martin Felder wrote: >> Hi *, >> >> just for the record, I'm having the exact same problem as Georges. I >> read through your discussion from three weeks ago, but I also don't feel >> up to modifying the C code myself (being a Fortran kind of guy...). >> >> I understand implementing custom user-defined frequencies is probably a >> lot of effort, but maybe it's less troublesome to just add some >> frequencies often used (=by Georges and me, and hopefully others?) to >> the currently implemented ones? I'd be extremely happy to have 12h, 6h, >> 3h, 15min and 10min intervals in addition to the existing ones. >> >> If you could point me to the part of the code that would have to be >> modified for that, maybe I can find someone more apt in C who can >> implement it. >> >> Thanks, >> Martin >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > On this note and per an offline discussion I had with Martin-- I'd be interested to see what people think about the approach I've taken to dealing with this problem in pandas (http://code.google.com/p/pandas/). For example, it's relatively trivial to do something like: offset = Minute(15) ts_15min = ts.asfreq(offset) and to fill forward, interpolate the resulting series, among other things. One of the key differences between the pandas data structures and scikits.timeseries.TimeSeries is that data is not required to be fixed-frequency, but can be explicitly "reindexed" to the desired frequency. 
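A toy sketch of that reindexing idiom may help make this concrete. The dates, values and 15-minute grid below are made up, and the grid is built from plain datetime objects rather than pandas' own date-range helper (which I believe exists but is not shown here), so treat this as a sketch of the idea rather than a statement of the pandas API:

from datetime import datetime, timedelta
import numpy as np
from pandas import Series

# a hypothetical, irregularly sampled series
obs_times = [datetime(2010, 5, 17, 9, 3),
             datetime(2010, 5, 17, 9, 22),
             datetime(2010, 5, 17, 9, 50)]
ts = Series(np.array([1.0, 2.0, 3.0]), index=obs_times)

# a regular 15-minute "date range of interest"
start = datetime(2010, 5, 17, 9, 0)
grid = [start + timedelta(minutes=15 * i) for i in range(5)]

# conform the irregular data to the grid; slots with no observation
# come back as NaN and can then be filled forward or interpolated
conformed = ts.reindex(grid)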
I find in my applications that I will often generate a "date range of interest" (with the desired frequency) and then conform all my data to that date range, e.g.: conformed_data = data.reindex(date_range) Of course you trade performance for flexibility. But IO is still by and large the biggest bottleneck I've encountered. From vanforeest at gmail.com Mon May 17 15:10:21 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 17 May 2010 21:10:21 +0200 Subject: [SciPy-User] python for physics In-Reply-To: <20100516165143.GF19278@phare.normalesup.org> References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Hi, You might also find Hans Langtangen's book on scientific computation with pyhon interesting. Nicky On 16 May 2010 18:51, Gael Varoquaux wrote: > On Sun, May 16, 2010 at 05:10:40PM +0100, alexander baker wrote: >> ? ?3 friends Physics friends of mine are looking for a starting point to >> ? ?learn scientific computing in Python relevant to applied Physics, does >> ? ?anyone have any suggestions, hints or event a deck of slides that could be >> ? ?useful? > > This is not really physics-related, and is more oriented towards image > analysis than Physics, and on top of that it is unfinished, and I have > been shying from publishing on the net, but the notes of the courses I > give can be found here: > http://gael-varoquaux.info/python4science-2x1.pdf > > Also, see Fernando's py4science page, full of useful material: > http://fperez.org/py4science/starter_kit.html > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jlconlin at gmail.com Mon May 17 15:20:53 2010 From: jlconlin at gmail.com (Jeremy Conlin) Date: Mon, 17 May 2010 13:20:53 -0600 Subject: [SciPy-User] python for physics In-Reply-To: <20100516165143.GF19278@phare.normalesup.org> References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Ga?l, Thanks for posting these links, they look like a really good introduction which I can use to help my coworkers. (I'm not even the original poster.) One question though is how you got the output from iPython into your document. Of course you could just copy and paste it in, but for some reason I believe you have this process automated. Is it automated and are you willing to share how you did it? Thanks, Jeremy > This is not really physics-related, and is more oriented towards image > analysis than Physics, and on top of that it is unfinished, and I have > been shying from publishing on the net, but the notes of the courses I > give can be found here: > http://gael-varoquaux.info/python4science-2x1.pdf > > Also, see Fernando's py4science page, full of useful material: > http://fperez.org/py4science/starter_kit.html > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From baker.alexander at gmail.com Mon May 17 15:47:06 2010 From: baker.alexander at gmail.com (alexander baker) Date: Mon, 17 May 2010 20:47:06 +0100 Subject: [SciPy-User] python for physics In-Reply-To: References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Thank you all for the links thus far, I aim to try out the docs with folks in early part of June so will come back with some feedback on how things get on. Alex Mobile: 07788 872118 Blog: www.alexfb.com -- All science is either physics or stamp collecting. 
On 17 May 2010 20:20, Jeremy Conlin wrote: > Ga?l, > > Thanks for posting these links, they look like a really good > introduction which I can use to help my coworkers. (I'm not even the > original poster.) > > One question though is how you got the output from iPython into your > document. Of course you could just copy and paste it in, but for some > reason I believe you have this process automated. Is it automated and > are you willing to share how you did it? > > Thanks, > Jeremy > > > > This is not really physics-related, and is more oriented towards image > > analysis than Physics, and on top of that it is unfinished, and I have > > been shying from publishing on the net, but the notes of the courses I > > give can be found here: > > http://gael-varoquaux.info/python4science-2x1.pdf > > > > Also, see Fernando's py4science page, full of useful material: > > http://fperez.org/py4science/starter_kit.html > > > > Ga?l > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > >_______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Mon May 17 15:57:44 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 17 May 2010 21:57:44 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: Hi Josef, Thanks for the answer. > Actually, if the onepoint distribution directly subclasses rv_generic > then it wouldn't rely on or interfere with the generic framework in > rv_continuous or rv_discrete (where it wouldn't really fit in if > onepoint is on reals), and it might be relatively easy to provide all > the methods of the distributions for a single point distribution. I must admit that I haven't had a look at the innards of rv_generic, so I am afraid I cannot be of any relevant help in this respect. > > Choice of name: > to me, "deterministic random variable" sounds like an oxymoron, > although I found some references to deterministic distribution (mainly > or exclusively in queuing theory and > http://isi.cbs.nl/glossary/term902.htm) > I would prefer a boring "onepoint" distribution, or "degenerate", or ... ? Degenerate seems nice to me. I just checked the book Probability by Shiryaev, and he also uses the word `degenerate'. Interestingly, he introduces the degenerate distribution as the normal distribution with sigma = 0. I suspect that implementing the degenerate distribution like this is utterly stupid. > Can you file a ticket with what you would like to have? Sure. Sorry for bothering you with this, but how? > > I started to work again a bit on enhancing the distributions, mainly > I'm experimenting with several generic estimation methods. My target > is to have a working estimator for any distribution in scipy.stats and > for several additional distributions. This seems a nice idea, but quite ambitious. Have you also thought about estimators for heavy tailed distributions? This is, as far as I know, a very delicate topic. > > I worry a bit that a deterministic distribution might not fit into a > general framework for distributions and might need to be special cased > for some methods. (but see above) This must be fairly easy. Just the mean can be relevant. 
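For what it's worth, a minimal stand-alone sketch of such a one-point distribution is easy to write. This is deliberately not hooked into rv_continuous/rv_generic, since how (or whether) it should fit into that framework is exactly the open question here, and the class name is made up:

import numpy as np

class Degenerate(object):
    """Point mass at the value c (a sketch, not a scipy.stats subclass)."""
    def __init__(self, c=0.0):
        self.c = c
    def cdf(self, x):
        # step function: 0 below c, 1 at and above c
        return (np.asarray(x) >= self.c).astype(float)
    def sf(self, x):
        return 1.0 - self.cdf(x)
    def rvs(self, size=1):
        # every draw is c
        return self.c * np.ones(size)
    def mean(self):
        return self.c
    def var(self):
        return 0.0

Seen this way, Shiryaev's "normal with sigma = 0" description just says that this step-function cdf is the limit of the normal cdf as the scale goes to zero.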
> http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ I'll have a look. Thanks. Nicky From vanforeest at gmail.com Mon May 17 16:37:06 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 17 May 2010 22:37:06 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: While checking out your sandbox: >> http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ I came across the file stats_dhuard.py. Here you mention to use kernel density estimators to approximate densities. I suddenly recalled that I read Section 6.1.3 of Stachursky's book Economic Dynamics, theory and dynamics. This section on kernel density estimators may be quite (very?) useful for the problems you mentioned in another mail (using splines to approximate distributions). Nicky From jgomezdans at gmail.com Mon May 17 16:39:59 2010 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Mon, 17 May 2010 21:39:59 +0100 Subject: [SciPy-User] python for physics In-Reply-To: References: <20100516165143.GF19278@phare.normalesup.org> Message-ID: Hi, On 17 May 2010 20:20, Jeremy Conlin wrote: > One question though is how you got the output from iPython into your > document. Of course you could just copy and paste it in, but for some > reason I believe you have this process automated. Is it automated and > are you willing to share how you did it? I'm not Ga?l, but I think he used sphinxSee and in particular (ipython is here: < http://matplotlib.sourceforge.net/sampledoc/extensions.html#ipython-sessions >) Very easy to use and lovely results. Hope that helps, Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattknox.ca at gmail.com Mon May 17 18:59:13 2010 From: mattknox.ca at gmail.com (Matt Knox) Date: Mon, 17 May 2010 22:59:13 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?scikits=2Etimeseries=3A_How_to_define_freq?= =?utf-8?q?uency_of=0915minutes?= References: <4BF13906.9030707@internet.lu> Message-ID: Georges Schutz internet.lu> writes: > > Hi Martin, > It is good to hear that there are others facing the same problem because > this my raise the importance of that issue for future plans. > > The solution you propose would be OK for me, I think I could live a > while with being restricted to the proposed frequencies even if I would > look foreword to customizable frequency on the long term. > > Thanks > Georges Schutz > > On 05/05/2010 12:03, Martin Felder wrote: > > Hi *, > > > > just for the record, I'm having the exact same problem as Georges. I > > read through your discussion from three weeks ago, but I also don't feel > > up to modifying the C code myself (being a Fortran kind of guy...). > > > > I understand implementing custom user-defined frequencies is probably a > > lot of effort, but maybe it's less troublesome to just add some > > frequencies often used (=by Georges and me, and hopefully others?) to > > the currently implemented ones? I'd be extremely happy to have 12h, 6h, > > 3h, 15min and 10min intervals in addition to the existing ones. > > > > If you could point me to the part of the code that would have to be > > modified for that, maybe I can find someone more apt in C who can > > implement it. > > > > Thanks, > > Martin > > Sorry, missed this post earlier. The relevant C code is in the src and include subfolders in the c_dates.c and c_dates.h files. 
I don't have any objections to defining some extra frequencies like this as a stop gap solution along the way to a longer term more generic custom frequency solution. Or if Pierre feels that it is not appropriate to include these in the package, it should be easy enough to maintain a separate set of patches since that code doesn't really change much these days. And if it does change substantially it will likely be because we are doing a major overhaul of the package which would probably include support for custom frequencies anyway. - Matt PS. If you don't hear from Pierre or myself within several days on questions like this, feel free to ping me at my personal email to draw my attention to it because it probably means I just didn't notice it. You can find my email address somewhere in the timeseries documentation or source code. From seb.haase at gmail.com Tue May 18 03:20:40 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 18 May 2010 09:20:40 +0200 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: <201005141650.17597.lpc@cmu.edu> References: <201005141650.17597.lpc@cmu.edu> Message-ID: Just thought of another question: So FreeImage doesn't even depend on libjpeg !? I'm asking because I remember problems with installing (building!?) PIL on OS-X where jpg wasn't working because of some problem related to libjpeg ... I don't remember the exact circumstances - but if FreeImage didn't have that dependency it would be another thing _less_ to worry about. -Sebastian On Fri, May 14, 2010 at 10:50 PM, Luis Pedro Coelho wrote: > On Wednesday, Sebastian Haase wrote: >> this sounds exciting and I might find some time to try it out ... >> BTW, the Python image-sig ?should not be a "PIL only" mailing list. So >> (eventually) I feel, this issue could be brought up there, too. > > I have created a mailing list for python computer vision topics (things that > are images but not PIL related): > > http://groups.google.com/group/pythonvision?pli=1 > > It is currently very low traffic since it just started (this is my first > public announcement). > > * > > Btw, for the same sort of issues (opening 16-bit TIFFs in particular), I once > wrote a wrapper around imagemagick's C++ image opening functions: > > http://github.com/luispedro/readmagick > > I works nicely on linux, but some people were trying to use it on Mac or > Windows and got really stuck b/c they didn't know how to compile it and I > couldn't help them, so I gave up on trying to make this more widely used. > > HTH > -- > Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From zachary.pincus at yale.edu Tue May 18 07:28:33 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 18 May 2010 07:28:33 -0400 Subject: [SciPy-User] FreeImage <-> numpy IO wrappers In-Reply-To: References: <201005141650.17597.lpc@cmu.edu> Message-ID: <1362D71A-5799-436D-BB21-01CE45ECFBD6@yale.edu> > Just thought of another question: > > So FreeImage doesn't even depend on libjpeg !? > I'm asking because I remember problems with installing (building!?) > PIL on OS-X where jpg wasn't working because of some problem related > to libjpeg ... > I don't remember the exact circumstances - but if FreeImage didn't > have that dependency it would be another thing _less_ to worry about. 
It doesn't depend on any external libraries, but uses libs jpeg, tiff, png, and z internally -- they're included with the source and compiled as part of the build process. From mikehulluk at googlemail.com Wed May 19 10:53:58 2010 From: mikehulluk at googlemail.com (Michael Hull) Date: Wed, 19 May 2010 15:53:58 +0100 Subject: [SciPy-User] PCA functions Message-ID: Hi Everybody, I am doing some work using numpy/scipy and wanted to find the principle components for some data. I can write a fairly simple function to do this, but was wondering if there was already a function in scipy to do this that I hadn't found before re-inventing the wheel Many thanks, Mike Hull From zachary.pincus at yale.edu Wed May 19 11:31:20 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 19 May 2010 11:31:20 -0400 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: <5398423A-13C4-4A7E-B8A8-127934EC5DB5@yale.edu> Hi Mike, Here's what I use. I don't think there's anything in scipy per se, but I might be wrong. Empirically, I find that doing the PCA with eigh is faster than with svd, but this might be based on the dimensionality of my data vs. the number of data points I use. The functions take in an (m,n)-shaped matrix of m data points in n dimensions, and return an array of shape (k,n) consisting of k principal components in n dimensions (where k=min(m,n)), a (k,)-shaped array of the variances of the data along each principal component, and a (m,k)-shaped array of the projection of each data point into the subspace spanned by the principal components. Zach import numpy def pca_svd(flat): u, s, vt = numpy.linalg.svd(flat, full_matrices = 0) pcs = vt v = numpy.transpose(vt) data_count = len(flat) variances = s**2 / data_count positions = u * s return pcs, variances, positions def pca_eig(flat): values, vectors = _symm_eig(flat) pcs = vectors.transpose() variances = values / len(flat) positions = numpy.dot(flat, vectors) return pcs, variances, positions def _symm_eig(a): """Given input a, return the non-zero eigenvectors and eigenvalues of the symmetric matrix a'a. If a has more columns than rows, then that matrix will be rank- deficient, and the non-zero eigenvalues and eigenvectors of a'a can be more easily extracted from the matrix aa'. From the properties of the SVD: if a of shape (m,n) has SVD u*s*v', then: a'a = v*s's*v' aa' = u*ss'*u' let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m) and s_hat * s = I(n,n). Thus, we can solve for u or v in terms of the other: v = a'*u*s_hat' u = a*v*s_hat """ m, n = a.shape if m >= n: # just return the eigenvalues and eigenvectors of a'a vecs, vals = _eigh(numpy.dot(a.transpose(), a)) vecs = numpy.where(vecs < 0, 0, vecs) return vecs, vals else: # figure out the eigenvalues and vectors based on aa', which is smaller sst_diag, u = _eigh(numpy.dot(a, a.transpose())) # in case due to numerical instabilities we have sst_diag < 0 anywhere, # peg them to zero sst_diag = numpy.where(sst_diag < 0, 0, sst_diag) # now get the inverse square root of the diagonal, which will form the # main diagonal of s_hat err = numpy.seterr(divide='ignore', invalid='ignore') s_hat_diag = 1/numpy.sqrt(sst_diag) numpy.seterr(**err) s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0) # s_hat_diag is a list of length m, a'u is (n,m), so we can just use # numpy's broadcasting instead of matrix multiplication, and only create # the upper mxm block of a'u, since that's all we'll use anyway... 
v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag return sst_diag, v def _eigh(m): values, vectors = numpy.linalg.eigh(m) order = numpy.flipud(values.argsort()) return values[order], vectors[:,order] On May 19, 2010, at 10:53 AM, Michael Hull wrote: > Hi Everybody, > I am doing some work using numpy/scipy and wanted to find the > principle components for some data. I can write a fairly simple > function to do this, but was wondering if there was already a function > in scipy to do this that I hadn't found before re-inventing the wheel > > Many thanks, > > > Mike Hull > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Wed May 19 11:44:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 19 May 2010 11:44:20 -0400 Subject: [SciPy-User] PCA functions In-Reply-To: <5398423A-13C4-4A7E-B8A8-127934EC5DB5@yale.edu> References: <5398423A-13C4-4A7E-B8A8-127934EC5DB5@yale.edu> Message-ID: On Wed, May 19, 2010 at 11:31 AM, Zachary Pincus wrote: > Hi Mike, > > Here's what I use. I don't think there's anything in scipy per se, but > I might be wrong. There is nothing directly in numpy/scipy but many packages have their own version, the most heavy duty version might be in MDP http://mail.scipy.org/pipermail/nipy-devel/2009-December/002528.html and the corresponding thread http://mail.scipy.org/pipermail/nipy-devel/2009-December/002474.html Josef > > Empirically, I find that doing the PCA with eigh is faster than with > svd, but this might be based on the dimensionality of my data vs. the > number of data points I use. The functions take in an (m,n)-shaped > matrix of m data points in n dimensions, and return an array of shape > (k,n) consisting of k principal components in n dimensions (where > k=min(m,n)), a (k,)-shaped array of the variances of the data along > each principal component, and a (m,k)-shaped array of the projection > of each data point into the subspace spanned by the principal > components. > > Zach > > > > import numpy > > def pca_svd(flat): > ? u, s, vt = numpy.linalg.svd(flat, full_matrices = 0) > ? pcs = vt > ? v = numpy.transpose(vt) > ? data_count = len(flat) > ? variances = s**2 / data_count > ? positions = ?u * s > ? return pcs, variances, positions > > def pca_eig(flat): > ? values, vectors = _symm_eig(flat) > ? pcs = vectors.transpose() > ? variances = values / len(flat) > ? positions = numpy.dot(flat, vectors) > ? return pcs, variances, positions > > def _symm_eig(a): > ? """Given input a, return the non-zero eigenvectors and eigenvalues > of the symmetric matrix a'a. > > ? If a has more columns than rows, then that matrix will be rank- > deficient, > ? and the non-zero eigenvalues and eigenvectors of a'a can be more > easily extracted > ? from the matrix aa'. From the properties of the SVD: > ? ? if a of shape (m,n) has SVD u*s*v', then: > ? ? ? a'a = v*s's*v' > ? ? ? aa' = u*ss'*u' > ? ? let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m) > ? ? and s_hat * s = I(n,n). Thus, we can solve for u or v in terms of > the other: > ? ? ? v = a'*u*s_hat' > ? ? ? u = a*v*s_hat > ? """ > ? m, n = a.shape > ? if m >= n: > ? ? # just return the eigenvalues and eigenvectors of a'a > ? ? vecs, vals = _eigh(numpy.dot(a.transpose(), a)) > ? ? vecs = numpy.where(vecs < 0, 0, vecs) > ? ? return vecs, vals > ? else: > ? ? # figure out the eigenvalues and vectors based on aa', which is > smaller > ? ? 
sst_diag, u = _eigh(numpy.dot(a, a.transpose())) > ? ? # in case due to numerical instabilities we have sst_diag < 0 > anywhere, > ? ? # peg them to zero > ? ? sst_diag = numpy.where(sst_diag < 0, 0, sst_diag) > ? ? # now get the inverse square root of the diagonal, which will > form the > ? ? # main diagonal of s_hat > ? ? err = numpy.seterr(divide='ignore', invalid='ignore') > ? ? s_hat_diag = 1/numpy.sqrt(sst_diag) > ? ? numpy.seterr(**err) > ? ? s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0) > ? ? # s_hat_diag is a list of length m, a'u is (n,m), so we can just > use > ? ? # numpy's broadcasting instead of matrix multiplication, and only > create > ? ? # the upper mxm block of a'u, since that's all we'll use anyway... > ? ? v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag > ? ? return sst_diag, v > > def _eigh(m): > ? values, vectors = numpy.linalg.eigh(m) > ? order = numpy.flipud(values.argsort()) > ? return values[order], vectors[:,order] > > > > > > > On May 19, 2010, at 10:53 AM, Michael Hull wrote: > >> Hi Everybody, >> I am doing some work using numpy/scipy and wanted to find the >> principle components for some data. I can write a fairly simple >> function to do this, but was wondering if there was already a function >> in scipy to do this that I hadn't found before re-inventing the wheel >> >> Many thanks, >> >> >> Mike Hull >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lesserwhirls at gmail.com Wed May 19 12:04:59 2010 From: lesserwhirls at gmail.com (Sean Arms) Date: Wed, 19 May 2010 11:04:59 -0500 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: Greetings Mike, Are you looking for just the PCA decomposition, or are you wanting to rotate the truncated PC's using something like promax, varimax, etc.? If so, I do not think MDP or NiPy have that capability. I have functions to do some of the basic rotations, and I've tested them against S+ and Matlab if you are looking for that functionality, but I'll probably need to clean them up a bit :-) Sean On Wed, May 19, 2010 at 9:53 AM, Michael Hull wrote: > Hi Everybody, > I am doing some work using numpy/scipy and wanted to find the > principle components for some data. I can write a fairly simple > function to do this, but was wondering if there was already a function > in scipy to do this that I hadn't found before re-inventing the wheel > > Many thanks, > > > Mike Hull > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mikehulluk at googlemail.com Wed May 19 16:13:07 2010 From: mikehulluk at googlemail.com (Michael Hull) Date: Wed, 19 May 2010 21:13:07 +0100 Subject: [SciPy-User] SciPy-User Digest, Vol 81, Issue 39 In-Reply-To: References: Message-ID: > Hi Mike, > > Here's what I use. I don't think there's anything in scipy per se, but > I might be wrong. > > Empirically, I find that doing the PCA with eigh is faster than with > svd, but this might be based on the dimensionality of my data vs. the > number of data points I use. 
The functions take in an (m,n)-shaped > matrix of m data points in n dimensions, and return an array of shape > (k,n) consisting of k principal components in n dimensions (where > k=min(m,n)), a (k,)-shaped array of the variances of the data along > each principal component, and a (m,k)-shaped array of the projection > of each data point into the subspace spanned by the principal > components. > > Zach > >> >> Here's what I use. I don't think there's anything in scipy per se, but >> I might be wrong. > > There is nothing directly in numpy/scipy but many packages have their own > version, > the most heavy duty version might be in MDP > http://mail.scipy.org/pipermail/nipy-devel/2009-December/002528.html > > and the corresponding thread > http://mail.scipy.org/pipermail/nipy-devel/2009-December/002474.html > > Josef > >> >> Empirically, I find that doing the PCA with eigh is faster than with >> svd, but this might be based on the dimensionality of my data vs. the >> number of data points I use. The functions take in an (m,n)-shaped >> matrix of m data points in n dimensions, and return an array of shape >> (k,n) consisting of k principal components in n dimensions (where >> k=min(m,n)), a (k,)-shaped array of the variances of the data along >> each principal component, and a (m,k)-shaped array of the projection >> of each data point into the subspace spanned by the principal >> components. >> >> Zach >> > > Message: 4 > Date: Wed, 19 May 2010 11:04:59 -0500 > From: Sean Arms > Subject: Re: [SciPy-User] PCA functions > To: SciPy Users List > Message-ID: > ? ? ? ? > Content-Type: text/plain; charset=ISO-8859-1 > > Greetings Mike, > > ? ? Are you looking for just the PCA decomposition, or are you > wanting to rotate the truncated PC's using something like promax, > varimax, etc.? ?If so, I do not think MDP or NiPy have that > capability. ?I have functions to do some of the basic rotations, and > I've tested them against S+ and Matlab if you are looking for that > functionality, but I'll probably need to clean them up a bit :-) > > Sean Hi Guys, Thanks very much for the quick responses. I was looking for something simple - the principle components of 1000 data points in a 3 dimensional space, just to do a bit of prelim data exploration, so speed was not so much of an issue - I just implemented something fairly simple with numpy.cov and numpy.eig. I was just wondering if there was something in scipy since this seems like something other people would also reimplement, but It sounds like that to implement this properly in scipy would require more thought/work as there can be more pca than I had thought.... (Apparently the matlab statistics toolbox has a pca function, but according to one colleague "Its just easier to write your own than deal with license servers" :) ) Many thanks, Mike From oliver.tomic at nofima.no Thu May 20 05:35:37 2010 From: oliver.tomic at nofima.no (Oliver Tomic) Date: Thu, 20 May 2010 11:35:37 +0200 Subject: [SciPy-User] PCA functions In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu May 20 09:38:11 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 20 May 2010 07:38:11 -0600 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: On Thu, May 20, 2010 at 3:35 AM, Oliver Tomic wrote: @Oliver, I posted your email over on the statsmodels list. I'll take a look at the link. Vincent > Hi, > > I already sent this link to Mike (off-list, since my mails kept bouncing > back). 
A while ago I supervised a student who implemented various flavours > of PCA (using SVD and NIPALS, in Python and C respectively) as part of a > semester project. There is quite a bit of documentation coming with the PCA > module. > > http://folk.uio.no/henninri/pca_module/ > > > I was considering to ask the pystatmodels-group whether they are interested > in including this code, however both code and documentation may need a > little bit of polishing first. Unfortunately, there is no validation > procedure available in the code to validate the model. I have plans on > implementing this if I ever should find some time to do this. > > Cheers > Oliver > > > > > > -----scipy-user-bounces at scipy.org wrote: ----- > > >To: SciPy Users List > >From: Sean Arms > >Sent by: scipy-user-bounces at scipy.org > >Date: 05/19/2010 06:04PM > >Subject: Re: [SciPy-User] PCA functions > > > > >Greetings Mike, > > > > Are you looking for just the PCA decomposition, or are you > >wanting to rotate the truncated PC's using something like promax, > >varimax, etc.? If so, I do not think MDP or NiPy have that > >capability. I have functions to do some of the basic rotations, and > >I've tested them against S+ and Matlab if you are looking for that > >functionality, but I'll probably need to clean them up a bit :-) > > > >Sean > > > >On Wed, May 19, 2010 at 9:53 AM, Michael Hull > > wrote: > >> Hi Everybody, > >> I am doing some work using numpy/scipy and wanted to find the > >> principle components for some data. I can write a fairly simple > >> function to do this, but was wondering if there was already a > >function > >> in scipy to do this that I hadn't found before re-inventing the > >wheel > >> > >> Many thanks, > >> > >> > >> Mike Hull > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >_______________________________________________ > >SciPy-User mailing list > >SciPy-User at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > *Vincent Davis 720-301-3003 * vincent at vincentdavis.net my blog | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu May 20 11:21:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 20 May 2010 11:21:22 -0400 Subject: [SciPy-User] PCA functions In-Reply-To: References: Message-ID: On Thu, May 20, 2010 at 9:38 AM, Vincent Davis wrote: > On Thu, May 20, 2010 at 3:35 AM, Oliver Tomic wrote: > > @Oliver, I posted your email over on the statsmodels list. I'll take a look > at the link. > I briefly looked at the pca_module and it looks well written and documented already, although I think the matrix versions are redundant based on very fast skimming of the code and list of functions. statsmodels already has 3 implementations of pca, with eigh, svd and one wrapped in a class. And there is also an example how to do Principal Component Regression. But we don't have NIPALS yet, or a version that calculates only a few eigenvectors (with eigh). And the current versions in statsmodels are pretty basic, the eigh and svd versions are modeled after and tested against matlab princomp. 
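For concreteness, here is a rough numpy-only sketch of what "basic PCA via svd" plus "principal component regression" amounts to. This is the textbook recipe, not the statsmodels code, and the function names are made up:

import numpy as np

def pca_basic(x):
    # center, then take the SVD of the data matrix
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    components = vt                  # rows are the principal directions
    variances = s ** 2 / len(x)      # variance along each direction
    scores = u * s                   # data projected onto the components
    return components, variances, scores

def pcr(x, y, k):
    # principal component regression: regress y on the first k scores
    _, _, scores = pca_basic(x)
    z = np.column_stack((np.ones(len(x)), scores[:, :k]))
    beta = np.linalg.lstsq(z, y)[0]
    return beta

# e.g. 100 observations of 5 regressors, keeping 2 components
x = np.random.randn(100, 5)
y = np.random.randn(100)
coefs = pcr(x, y, 2)

Rerunning the regression with an increasing number of components, as described below, only requires recomputing the lstsq step, since the scores can be reused.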
I think, as the discussions on the nipy and scipy list show, a basic PCA version is easy to write, but everyone emphasizes different extras or performance features, e.g. rotation would be nice for factor analysis. I also think that scipy should have a basic version, just so we don't have to figure out or remember how to do eigh or what all the different parts of svd mean. For statsmodels, I looked at this mainly for regressions in a "data-rich environment", i.e. with lots of possible regressors. For (unsupervised) dimension reduction we still have to figure out how it fits in when pca gets out of the sandbox or when we expand in this area. Also, I don't know if statsmodels will eventually get factor analysis. (I have a multivariate analysis folder on my computer, but thought of leaving this area to pymvpa.) I stopped working on this for the moment, but I thought maybe a class that makes the usage of pca and the corresponding projections easy and self-explanatory would be useful. E.g. for regression we need to be able to rerun the regression with an increasing number of components and should reuse previous calculations. The second point, if there are different implementations, then we should have either automatic selection of the best one given the arguments or a comparative documentation when to use which version. Josef http://tinyurl.com/2dwyjt8 > > Vincent > > >> Hi, >> >> I already sent this link to Mike (off-list, since my mails kept bouncing >> back). A while ago I supervised a student who implemented various flavours >> of PCA (using SVD and NIPALS, in Python and C respectively) as part of a >> semester project. There is quite a bit of documentation coming with the PCA >> module. >> >> http://folk.uio.no/henninri/pca_module/ >> >> >> I was considering to ask the pystatmodels-group whether they are >> interested in including this code, however both code and documentation may >> need a little bit of polishing first. Unfortunately, there is no validation >> procedure available in the code to validate the model. I have plans on >> implementing this if I ever should find some time to do this. >> > >> Cheers >> Oliver >> >> >> >> >> >> -----scipy-user-bounces at scipy.org wrote: ----- >> >> >To: SciPy Users List >> >From: Sean Arms >> >Sent by: scipy-user-bounces at scipy.org >> >Date: 05/19/2010 06:04PM >> >Subject: Re: [SciPy-User] PCA functions >> >> > >> >Greetings Mike, >> > >> > Are you looking for just the PCA decomposition, or are you >> >wanting to rotate the truncated PC's using something like promax, >> >varimax, etc.? If so, I do not think MDP or NiPy have that >> >capability. I have functions to do some of the basic rotations, and >> >I've tested them against S+ and Matlab if you are looking for that >> >functionality, but I'll probably need to clean them up a bit :-) >> > >> >Sean >> > >> >On Wed, May 19, 2010 at 9:53 AM, Michael Hull >> > wrote: >> >> Hi Everybody, >> >> I am doing some work using numpy/scipy and wanted to find the >> >> principle components for some data. 
I can write a fairly simple >> >> function to do this, but was wondering if there was already a >> >function >> >> in scipy to do this that I hadn't found before re-inventing the >> >wheel >> >> >> >> Many thanks, >> >> >> >> >> >> Mike Hull >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >_______________________________________________ >> >SciPy-User mailing list >> >SciPy-User at scipy.org >> >http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > *Vincent Davis > 720-301-3003 * > vincent at vincentdavis.net > my blog | LinkedIn > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdekauwe at gmail.com Fri May 21 08:59:08 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 05:59:08 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... Message-ID: <28633477.post@talk.nabble.com> Hi, I am trying to extract data from a 4D array and store it in a 2D array, but avoid my current usage of the for loops for speed, as in reality the arrays sizes are quite big. Could someone also try and explain the solution as well if they have a spare moment as I am still finding it quite difficult to get over the habit of using loops (C convert for my sins). I get that one could precompute the indices's i and j i.e. i = np.arange(tsteps) j = np.arange(numpts) but just can't get my head round how i then use them... Thanks, Martin import numpy as np numpts=10 tsteps = 12 vari = 22 data = np.random.random((tsteps, vari, numpts, 1)) new_data = np.zeros((tsteps, numpts), dtype=np.float32) index = np.arange(numpts) for i in xrange(tsteps): for j in xrange(numpts): new_data[i,j] = data[i,5,index[j],0] -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html Sent from the Scipy-User mailing list archive at Nabble.com. From zachary.pincus at yale.edu Fri May 21 09:11:34 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 21 May 2010 09:11:34 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28633477.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> Message-ID: <74BDBBD3-B73B-45B1-B859-3F8C28902DE7@yale.edu> > import numpy as np > > numpts=10 > tsteps = 12 > vari = 22 > > data = np.random.random((tsteps, vari, numpts, 1)) > new_data = np.zeros((tsteps, numpts), dtype=np.float32) > index = np.arange(numpts) > > for i in xrange(tsteps): > for j in xrange(numpts): > new_data[i,j] = data[i,5,index[j],0] > new_data2 = data[:,5,index,0].astype(numpy.float32) numpy.all(new_data == new_data2) # returns True This assuming that your real "index" array is more interesting than just [0,1,2,3,...]... if not, new_data2 = data[:,5,:,0].astype(numpy.float32) would do fine. 
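A self-contained toy version of that comparison, with made-up sizes matching the example above, for anyone who wants to paste it into an interpreter (the assert just confirms that the loop and the fancy-indexed version agree):

import numpy as np

tsteps, vari, numpts = 12, 22, 10
data = np.random.random((tsteps, vari, numpts, 1))
index = np.arange(numpts)

# the original double loop
looped = np.zeros((tsteps, numpts), dtype=np.float32)
for i in xrange(tsteps):
    for j in xrange(numpts):
        looped[i, j] = data[i, 5, index[j], 0]

# the sliced / fancy-indexed version
vectorised = data[:, 5, index, 0].astype(np.float32)

assert np.allclose(looped, vectorised)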
That said, I've never been able to figure out whether it's possible to index particular points along multiple axes with index lists -- that is, how to make numpy do something like: new_data3 = data[index_x,5,index_y,0] Zach From josef.pktd at gmail.com Fri May 21 09:12:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 21 May 2010 09:12:57 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28633477.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> Message-ID: On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: > > Hi, > > I am trying to extract data from a 4D array and store it in a 2D array, but > avoid my current usage of the for loops for speed, as in reality the arrays > sizes are quite big. Could someone also try and explain the solution as well > if they have a spare moment as I am still finding it quite difficult to get > over the habit of using loops (C convert for my sins). I get that one could > precompute the indices's i and j i.e. > > i = np.arange(tsteps) > j = np.arange(numpts) > > but just can't get my head round how i then use them... > > Thanks, > Martin > > import numpy as np > > numpts=10 > tsteps = 12 > vari = 22 > > data = np.random.random((tsteps, vari, numpts, 1)) > new_data = np.zeros((tsteps, numpts), dtype=np.float32) > index = np.arange(numpts) > > for i in xrange(tsteps): > ? ?for j in xrange(numpts): > ? ? ? ?new_data[i,j] = data[i,5,index[j],0] The index arrays need to be broadcastable against each other. I think this should do it new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] Josef > > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Fri May 21 10:55:50 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 07:55:50 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> Message-ID: <28634924.post@talk.nabble.com> Thanks that works... So the way to do it is with np.arange(tsteps)[:,None], that was the step I was struggling with, so this forms a 2D array which replaces the the two for loops? Do I have that right? A lot quicker...! Martin josef.pktd wrote: > > On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >> >> Hi, >> >> I am trying to extract data from a 4D array and store it in a 2D array, >> but >> avoid my current usage of the for loops for speed, as in reality the >> arrays >> sizes are quite big. Could someone also try and explain the solution as >> well >> if they have a spare moment as I am still finding it quite difficult to >> get >> over the habit of using loops (C convert for my sins). I get that one >> could >> precompute the indices's i and j i.e. >> >> i = np.arange(tsteps) >> j = np.arange(numpts) >> >> but just can't get my head round how i then use them... >> >> Thanks, >> Martin >> >> import numpy as np >> >> numpts=10 >> tsteps = 12 >> vari = 22 >> >> data = np.random.random((tsteps, vari, numpts, 1)) >> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >> index = np.arange(numpts) >> >> for i in xrange(tsteps): >> for j in xrange(numpts): >> new_data[i,j] = data[i,5,index[j],0] > > The index arrays need to be broadcastable against each other. 
> > I think this should do it > > new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] > > Josef >> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 21 11:27:39 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 21 May 2010 11:27:39 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28634924.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> Message-ID: On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: > > Thanks that works... > > So the way to do it is with np.arange(tsteps)[:,None], that was the step I > was struggling with, so this forms a 2D array which replaces the the two for > loops? Do I have that right? Yes, but as Zachary showed, if you need the full index in a dimension, then you can use slicing. It might be faster. And a warning, mixing slices and index arrays with 3 or more dimensions can have some surprise switching of axes. Josef > > A lot quicker...! > > Martin > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>> >>> Hi, >>> >>> I am trying to extract data from a 4D array and store it in a 2D array, >>> but >>> avoid my current usage of the for loops for speed, as in reality the >>> arrays >>> sizes are quite big. Could someone also try and explain the solution as >>> well >>> if they have a spare moment as I am still finding it quite difficult to >>> get >>> over the habit of using loops (C convert for my sins). I get that one >>> could >>> precompute the indices's i and j i.e. >>> >>> i = np.arange(tsteps) >>> j = np.arange(numpts) >>> >>> but just can't get my head round how i then use them... >>> >>> Thanks, >>> Martin >>> >>> import numpy as np >>> >>> numpts=10 >>> tsteps = 12 >>> vari = 22 >>> >>> data = np.random.random((tsteps, vari, numpts, 1)) >>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>> index = np.arange(numpts) >>> >>> for i in xrange(tsteps): >>> ? ?for j in xrange(numpts): >>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >> >> The index arrays need to be broadcastable against each other. >> >> I think this should do it >> >> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >> >> Josef >>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Fri May 21 11:46:58 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 21 May 2010 11:46:58 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28634924.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> Message-ID: <3DEFBA54-C30E-4BA3-918F-E0640B5AF8F5@yale.edu> > Thanks that works... > > So the way to do it is with np.arange(tsteps)[:,None], that was the > step I > was struggling with, so this forms a 2D array which replaces the the > two for > loops? Do I have that right? If tsteps is just the size of the array in that dimension, you can use :, as before: data[:,5,index,0] which will be quicker and more straightforward. If you want to index with multiple list-of-indices along different axes, then Josef's point about broadcasting is a good one (and the answer to the question I'd asked, actually...) Given: a = numpy.arange(100).reshape((10,10)) Then: a[numpy.array([0,4,2])[:,numpy.newaxis], [1,2]] or equivalently: a[[[0],[4],[2]], [1,2]] yields: array([[ 1, 2], [41, 42], [21, 22]]) That is, the 0th, 4th, and 2nd rows, and the 1st and 2nd columns of a. Thanks Josef! Zach > A lot quicker...! > > Martin > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>> >>> Hi, >>> >>> I am trying to extract data from a 4D array and store it in a 2D >>> array, >>> but >>> avoid my current usage of the for loops for speed, as in reality the >>> arrays >>> sizes are quite big. Could someone also try and explain the >>> solution as >>> well >>> if they have a spare moment as I am still finding it quite >>> difficult to >>> get >>> over the habit of using loops (C convert for my sins). I get that >>> one >>> could >>> precompute the indices's i and j i.e. >>> >>> i = np.arange(tsteps) >>> j = np.arange(numpts) >>> >>> but just can't get my head round how i then use them... >>> >>> Thanks, >>> Martin >>> >>> import numpy as np >>> >>> numpts=10 >>> tsteps = 12 >>> vari = 22 >>> >>> data = np.random.random((tsteps, vari, numpts, 1)) >>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>> index = np.arange(numpts) >>> >>> for i in xrange(tsteps): >>> for j in xrange(numpts): >>> new_data[i,j] = data[i,5,index[j],0] >> >> The index arrays need to be broadcastable against each other. >> >> I think this should do it >> >> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >> >> Josef >>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From DParker at chromalloy.com Fri May 21 17:07:56 2010 From: DParker at chromalloy.com (DParker at chromalloy.com) Date: Fri, 21 May 2010 17:07:56 -0400 Subject: [SciPy-User] Sort geometric data by proximity Message-ID: I have a set of geometric data in x, y plane that represents a section of a turbine airfoil. The shape looks something like a fat boomerang with the coordinates wrapping around the entire shape (a completely closed loop). The coordinate points are in a random order and I need to sort or fit them by proximity to develop a dataset containing continuos shape of the airfoil. I started looking through the interpolation functions but I would need a method that ignores the order of the data (fits based on proximity of the points) and can handle data that forms a closed loop. The points are spaced closely enough along the airfoil surface so that they could be sorted by nearest neighbor - start with the first point find the next closest point and continue until all the points are "consumed". Any advice or pointers would be greatly appreciated. David Parker -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_baddeley at yahoo.com.au Fri May 21 17:30:30 2010 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Fri, 21 May 2010 14:30:30 -0700 (PDT) Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: Message-ID: <896588.9118.qm@web33006.mail.mud.yahoo.com> Hi David, I'd probably do a Delaunay triangularisation and then, starting at an arbitrary node, walk the shortest edges, collecting nodes as I went. You can get the triangulation from the scikits.delaunay package, which you'll probably already have if you've got matplotlib installed (in this case you can find it as matplotlib.delaunay). You'll need to write a loop (or recursive function) to do the walking, but that shouldn't be too tricky. I've done something similar (collecting 'blobs' of unstructured points which are closer than a certain cutoff) using this technique, so if you need any additional pointers, or ideas on how to optimise the procedure (I precompute a database mapping each vertex to all the edges leading from it - otherwise you've got to loop through the entire edge list on each iteration to find which edges go from the current node/vertex) give me a bell. cheers, David ________________________________ From: "DParker at chromalloy.com" To: scipy-user at scipy.org Sent: Sat, 22 May, 2010 9:07:56 AM Subject: [SciPy-User] Sort geometric data by proximity I have a set of geometric data in x, y plane that represents a section of a turbine airfoil. The shape looks something like a fat boomerang with the coordinates wrapping around the entire shape (a completely closed loop). 
The coordinate points are in a random order and I need to sort or fit them by proximity to develop a dataset containing continuos shape of the airfoil. I started looking through the interpolation functions but I would need a method that ignores the order of the data (fits based on proximity of the points) and can handle data that forms a closed loop. The points are spaced closely enough along the airfoil surface so that they could be sorted by nearest neighbor - start with the first point find the next closest point and continue until all the points are "consumed". Any advice or pointers would be greatly appreciated. David Parker From aarchiba at physics.mcgill.ca Fri May 21 17:46:10 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 21 May 2010 17:46:10 -0400 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: Message-ID: On 21 May 2010 17:07, wrote: > I have a set of geometric data in x, y plane that represents a section of a > turbine airfoil. The shape looks something like a fat boomerang with the > coordinates wrapping around the entire shape (a completely closed loop). The > coordinate points are in a random order and I need to sort or fit them by > proximity to develop a dataset containing continuos shape of the airfoil. > > I started looking through the interpolation functions but I would need a > method that ignores the order of the data (fits based on proximity of the > points) and can handle data that forms a closed loop. > > The points are spaced closely enough along the airfoil surface so that they > could be sorted by nearest neighbor - start with the first point find the > next closest point and continue until all the points are "consumed". > > Any advice or pointers would be greatly appreciated. The most direct approach is to pick a start point at random, then ask for its two nearest neighbours. Then pick one, and loop. For each point ask for its two nearest neighbours; one should be the last point you looked at, and one should be the next point on your curve. If ever this isn't true, you've found some place where your points don't sample closely enough to clearly describe the turbine shape. When you get your first point back, you're done. As described, this is a fairly slow process, but the dominating operation is not the python looping overhead but the time it takes to find each nearest neighbour. Fortunately scipy.spatial includes an object designed for this sort of problem, the kd-tree. So the way I'd solve your problem is construct a kd-tree from your array of points, then run a query asking the the three closest neighbours of each of your original points (three because each point is its own closest neighbour). Then just write a python loop to walk through the array of neighbours as I described above. This process should be nice and fast, and will diagnose some situations where you've inadequately sampled your object. Anne > > David Parker > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From mdekauwe at gmail.com Fri May 21 21:57:05 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 18:57:05 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> Message-ID: <28640602.post@talk.nabble.com> Yes as Zachary said index is only 0 to 15237, so both methods work. 
I don't quite get what you mean about slicing with axis > 3. Is there a link you can recommend I should read? Does that mean given I have 4dims that Josef's suggestion would be more advised in this case? Thanks. josef.pktd wrote: > > On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >> >> Thanks that works... >> >> So the way to do it is with np.arange(tsteps)[:,None], that was the step >> I >> was struggling with, so this forms a 2D array which replaces the the two >> for >> loops? Do I have that right? > > Yes, but as Zachary showed, if you need the full index in a dimension, > then you can use slicing. It might be faster. > And a warning, mixing slices and index arrays with 3 or more > dimensions can have some surprise switching of axes. > > Josef > >> >> A lot quicker...! >> >> Martin >> >> >> josef.pktd wrote: >>> >>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>> >>>> Hi, >>>> >>>> I am trying to extract data from a 4D array and store it in a 2D array, >>>> but >>>> avoid my current usage of the for loops for speed, as in reality the >>>> arrays >>>> sizes are quite big. Could someone also try and explain the solution as >>>> well >>>> if they have a spare moment as I am still finding it quite difficult to >>>> get >>>> over the habit of using loops (C convert for my sins). I get that one >>>> could >>>> precompute the indices's i and j i.e. >>>> >>>> i = np.arange(tsteps) >>>> j = np.arange(numpts) >>>> >>>> but just can't get my head round how i then use them... >>>> >>>> Thanks, >>>> Martin >>>> >>>> import numpy as np >>>> >>>> numpts=10 >>>> tsteps = 12 >>>> vari = 22 >>>> >>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>> index = np.arange(numpts) >>>> >>>> for i in xrange(tsteps): >>>> ? ?for j in xrange(numpts): >>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>> >>> The index arrays need to be broadcastable against each other. >>> >>> I think this should do it >>> >>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>> >>> Josef >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28640602.html Sent from the Scipy-User mailing list archive at Nabble.com. From mdekauwe at gmail.com Fri May 21 22:14:51 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 21 May 2010 19:14:51 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... 
In-Reply-To: <28640602.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> Message-ID: <28640656.post@talk.nabble.com> Also I then need to remap the 2D array I make onto another grid (the world in this case). Which again I had am doing with a loop (note numpts is a lot bigger than my example above). wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan for i in xrange(numpts): # exclude the NaN, note masking them doesn't work in the stats func x = data1_snow[:,i] x = x[np.isfinite(x)] y = data2_snow[:,i] y = y[np.isfinite(y)] # wilcox signed rank test # make sure we have enough samples to do the test d = x - y d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero differences count = len(d) if count > 10: z, pval = stats.wilcoxon(x, y) # only map out sign different data if pval < 0.05: wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = np.mean(x - y) Now I think I can push the data in one move into the wilcoxStats_snow array by removing the index, but I can't see how I will get the individual x and y pts for each array member correctly without the loop, this was my attempt which of course doesn't work! x = data1_snow[:,:] x = x[np.isfinite(x)] y = data2_snow[:,:] y = y[np.isfinite(y)] # r^2 # exclude v.small arrays, i.e. we need just less over 4 years of data if len(x) and len(y) > 50: pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, y)[0])**2 thanks. mdekauwe wrote: > > Yes as Zachary said index is only 0 to 15237, so both methods work. > > I don't quite get what you mean about slicing with axis > 3. Is there a > link you can recommend I should read? Does that mean given I have 4dims > that Josef's suggestion would be more advised in this case? > > Thanks. > > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>> >>> Thanks that works... >>> >>> So the way to do it is with np.arange(tsteps)[:,None], that was the step >>> I >>> was struggling with, so this forms a 2D array which replaces the the two >>> for >>> loops? Do I have that right? >> >> Yes, but as Zachary showed, if you need the full index in a dimension, >> then you can use slicing. It might be faster. >> And a warning, mixing slices and index arrays with 3 or more >> dimensions can have some surprise switching of axes. >> >> Josef >> >>> >>> A lot quicker...! >>> >>> Martin >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>> array, >>>>> but >>>>> avoid my current usage of the for loops for speed, as in reality the >>>>> arrays >>>>> sizes are quite big. Could someone also try and explain the solution >>>>> as >>>>> well >>>>> if they have a spare moment as I am still finding it quite difficult >>>>> to >>>>> get >>>>> over the habit of using loops (C convert for my sins). I get that one >>>>> could >>>>> precompute the indices's i and j i.e. >>>>> >>>>> i = np.arange(tsteps) >>>>> j = np.arange(numpts) >>>>> >>>>> but just can't get my head round how i then use them... >>>>> >>>>> Thanks, >>>>> Martin >>>>> >>>>> import numpy as np >>>>> >>>>> numpts=10 >>>>> tsteps = 12 >>>>> vari = 22 >>>>> >>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>> index = np.arange(numpts) >>>>> >>>>> for i in xrange(tsteps): >>>>> ? ?for j in xrange(numpts): >>>>> ? ? ? 
?new_data[i,j] = data[i,5,index[j],0] >>>> >>>> The index arrays need to be broadcastable against each other. >>>> >>>> I think this should do it >>>> >>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>> >>>> Josef >>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 21 22:41:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 21 May 2010 22:41:54 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28640656.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> Message-ID: On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: > > Also I then need to remap the 2D array I make onto another grid (the world in > this case). Which again I had am doing with a loop (note numpts is a lot > bigger than my example above). > > wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan > for i in xrange(numpts): > ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats func > ? ? ? ?x = data1_snow[:,i] > ? ? ? ?x = x[np.isfinite(x)] > ? ? ? ?y = data2_snow[:,i] > ? ? ? ?y = y[np.isfinite(y)] > > ? ? ? ?# wilcox signed rank test > ? ? ? ?# make sure we have enough samples to do the test > ? ? ? ?d = x - y > ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero > differences > ? ? ? ?count = len(d) > ? ? ? ?if count > 10: > ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) > ? ? ? ? ? ?# only map out sign different data > ? ? ? ? ? ?if pval < 0.05: > ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > np.mean(x - y) > > Now I think I can push the data in one move into the wilcoxStats_snow array > by removing the index, > but I can't see how I will get the individual x and y pts for each array > member correctly without the loop, this was my attempt which of course > doesn't work! > > x = data1_snow[:,:] > x = x[np.isfinite(x)] > y = data2_snow[:,:] > y = y[np.isfinite(y)] > > # r^2 > # exclude v.small arrays, i.e. we need just less over 4 years of data > if len(x) and len(y) > 50: > ? 
?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, y)[0])**2 If you want to do pairwise comparisons with stats.wilcoxon, then you might be stuck with the loop, since wilcoxon takes only two 1d arrays at a time (if I read the help correctly). Also the presence of nans might force the use a loop. stats.mstats has masked array versions, but I didn't see wilcoxon in the list. (Even when vectorized operations would work with regular arrays, nan or masked array versions still have to loop in many cases.) If you have many columns with count <= 10, so that wilcoxon is not calculated then it might be worth to use only array operations up to that point. If wilcoxon is calculated most of the time, then it's not worth thinking too hard about this. Josef > > thanks. > > > > > mdekauwe wrote: >> >> Yes as Zachary said index is only 0 to 15237, so both methods work. >> >> I don't quite get what you mean about slicing with axis > 3. Is there a >> link you can recommend I should read? Does that mean given I have 4dims >> that Josef's suggestion would be more advised in this case? There were several discussions on the mailing lists (fancy slicing and indexing). Your case is safe, but if you run in future into funny shapes, you can look up the details. when in doubt, I use np.arange(...) Josef >> >> Thanks. >> >> >> >> josef.pktd wrote: >>> >>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>>> >>>> Thanks that works... >>>> >>>> So the way to do it is with np.arange(tsteps)[:,None], that was the step >>>> I >>>> was struggling with, so this forms a 2D array which replaces the the two >>>> for >>>> loops? Do I have that right? >>> >>> Yes, but as Zachary showed, if you need the full index in a dimension, >>> then you can use slicing. It might be faster. >>> And a warning, mixing slices and index arrays with 3 or more >>> dimensions can have some surprise switching of axes. >>> >>> Josef >>> >>>> >>>> A lot quicker...! >>>> >>>> Martin >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>> array, >>>>>> but >>>>>> avoid my current usage of the for loops for speed, as in reality the >>>>>> arrays >>>>>> sizes are quite big. Could someone also try and explain the solution >>>>>> as >>>>>> well >>>>>> if they have a spare moment as I am still finding it quite difficult >>>>>> to >>>>>> get >>>>>> over the habit of using loops (C convert for my sins). I get that one >>>>>> could >>>>>> precompute the indices's i and j i.e. >>>>>> >>>>>> i = np.arange(tsteps) >>>>>> j = np.arange(numpts) >>>>>> >>>>>> but just can't get my head round how i then use them... >>>>>> >>>>>> Thanks, >>>>>> Martin >>>>>> >>>>>> import numpy as np >>>>>> >>>>>> numpts=10 >>>>>> tsteps = 12 >>>>>> vari = 22 >>>>>> >>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>> index = np.arange(numpts) >>>>>> >>>>>> for i in xrange(tsteps): >>>>>> ? ?for j in xrange(numpts): >>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>> >>>>> The index arrays need to be broadcastable against each other. 
>>>>> >>>>> I think this should do it >>>>> >>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>> >>>>> Josef >>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Sat May 22 06:21:09 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Sat, 22 May 2010 03:21:09 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> Message-ID: <28642434.post@talk.nabble.com> Sounds like I am stuck with the loop as I need to do the comparison for each pixel of the world and then I have a basemap function call which I guess slows it down further...hmm i.e. 
def compareSnowData(jules_var): # Extract the 11 years of snow data and return outrows = 180 outcols = 360 numyears = 11 nummonths = 12 # Read various files fname="world_valid_jules_pts.ascii" (numpts, land_pts_index, latitude, longitude, rows, cols) = jo.read_land_points_ascii(fname, 1.0) fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ timesteps=132, numvars=26) fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ timesteps=132, numvars=26) # grab some space data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan # extract the data data1_snow = jules_data1[:,jules_var,:,0] data2_snow = jules_data2[:,jules_var,:,0] data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) #for month in xrange(numyears * nummonths): # for i in xrange(numpts): # data1 = jules_data1[month,jules_var,land_pts_index[i],0] # data2 = jules_data2[month,jules_var,land_pts_index[i],0] # if data1 >= 0.0: # data1_snow[month,i] = data1 # else: # data1_snow[month,i] = np.nan # if data2 > 0.0: # data2_snow[month,i] = data2 # else: # data2_snow[month,i] = np.nan # exclude any months from *both* arrays where we have dodgy data, else we # can't do the correlations correctly!! data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) # put data on a regular grid... print 'regridding landpts...' for i in xrange(numpts): # exclude the NaN, note masking them doesn't work in the stats func x = data1_snow[:,i] x = x[np.isfinite(x)] y = data2_snow[:,i] y = y[np.isfinite(y)] # r^2 # exclude v.small arrays, i.e. we need just less over 4 years of data if len(x) and len(y) > 50: pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = (stats.pearsonr(x, y)[0])**2 # wilcox signed rank test # make sure we have enough samples to do the test d = x - y d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero differences count = len(d) if count > 10: z, pval = stats.wilcoxon(x, y) # only map out sign different data if pval < 0.05: wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = np.mean(x - y) return (pearsonsr_snow, wilcoxStats_snow) josef.pktd wrote: > > On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >> >> Also I then need to remap the 2D array I make onto another grid (the >> world in >> this case). Which again I had am doing with a loop (note numpts is a lot >> bigger than my example above). >> >> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan >> for i in xrange(numpts): >> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >> func >> ? ? ? ?x = data1_snow[:,i] >> ? ? ? ?x = x[np.isfinite(x)] >> ? ? ? ?y = data2_snow[:,i] >> ? ? ? ?y = y[np.isfinite(y)] >> >> ? ? ? ?# wilcox signed rank test >> ? ? ? ?# make sure we have enough samples to do the test >> ? ? ? ?d = x - y >> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >> differences >> ? ? ? ?count = len(d) >> ? ? ? ?if count > 10: >> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >> ? ? ? ? ? ?# only map out sign different data >> ? ? ? ? ? ?if pval < 0.05: >> ? ? ? ? ? ? ? 
?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >> np.mean(x - y) >> >> Now I think I can push the data in one move into the wilcoxStats_snow >> array >> by removing the index, >> but I can't see how I will get the individual x and y pts for each array >> member correctly without the loop, this was my attempt which of course >> doesn't work! >> >> x = data1_snow[:,:] >> x = x[np.isfinite(x)] >> y = data2_snow[:,:] >> y = y[np.isfinite(y)] >> >> # r^2 >> # exclude v.small arrays, i.e. we need just less over 4 years of data >> if len(x) and len(y) > 50: >> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >> y)[0])**2 > > > If you want to do pairwise comparisons with stats.wilcoxon, then you > might be stuck with the loop, since wilcoxon takes only two 1d arrays > at a time (if I read the help correctly). > > Also the presence of nans might force the use a loop. stats.mstats has > masked array versions, but I didn't see wilcoxon in the list. (Even > when vectorized operations would work with regular arrays, nan or > masked array versions still have to loop in many cases.) > > If you have many columns with count <= 10, so that wilcoxon is not > calculated then it might be worth to use only array operations up to > that point. If wilcoxon is calculated most of the time, then it's not > worth thinking too hard about this. > > Josef > > >> >> thanks. >> >> >> >> >> mdekauwe wrote: >>> >>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>> >>> I don't quite get what you mean about slicing with axis > 3. Is there a >>> link you can recommend I should read? Does that mean given I have 4dims >>> that Josef's suggestion would be more advised in this case? > > There were several discussions on the mailing lists (fancy slicing and > indexing). Your case is safe, but if you run in future into funny > shapes, you can look up the details. > when in doubt, I use np.arange(...) > > Josef > >>> >>> Thanks. >>> >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>>>> >>>>> Thanks that works... >>>>> >>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>> step >>>>> I >>>>> was struggling with, so this forms a 2D array which replaces the the >>>>> two >>>>> for >>>>> loops? Do I have that right? >>>> >>>> Yes, but as Zachary showed, if you need the full index in a dimension, >>>> then you can use slicing. It might be faster. >>>> And a warning, mixing slices and index arrays with 3 or more >>>> dimensions can have some surprise switching of axes. >>>> >>>> Josef >>>> >>>>> >>>>> A lot quicker...! >>>>> >>>>> Martin >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>> array, >>>>>>> but >>>>>>> avoid my current usage of the for loops for speed, as in reality the >>>>>>> arrays >>>>>>> sizes are quite big. Could someone also try and explain the solution >>>>>>> as >>>>>>> well >>>>>>> if they have a spare moment as I am still finding it quite difficult >>>>>>> to >>>>>>> get >>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>> one >>>>>>> could >>>>>>> precompute the indices's i and j i.e. >>>>>>> >>>>>>> i = np.arange(tsteps) >>>>>>> j = np.arange(numpts) >>>>>>> >>>>>>> but just can't get my head round how i then use them... 
>>>>>>> >>>>>>> Thanks, >>>>>>> Martin >>>>>>> >>>>>>> import numpy as np >>>>>>> >>>>>>> numpts=10 >>>>>>> tsteps = 12 >>>>>>> vari = 22 >>>>>>> >>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>> index = np.arange(numpts) >>>>>>> >>>>>>> for i in xrange(tsteps): >>>>>>> ? ?for j in xrange(numpts): >>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>> >>>>>> The index arrays need to be broadcastable against each other. >>>>>> >>>>>> I think this should do it >>>>>> >>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>> >>>>>> Josef >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Sat May 22 08:59:50 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 22 May 2010 08:59:50 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28642434.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> Message-ID: On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: > > Sounds like I am stuck with the loop as I need to do the comparison for each > pixel of the world and then I have a basemap function call which I guess > slows it down further...hmm I don't see much that could be done differently, after a brief look. stats.pearsonr could be replaced by an array version using directly the formula for correlation even with nans. wilcoxon looks slow, and I never tried or seen a faster version. just a reminder, the p-values are for a single test, when you have many of them, then they don't have the right size/confidence level for an overall or joint test. 
(some packages report a Bonferroni correction in this case) Josef > > i.e. > > def compareSnowData(jules_var): > ? ?# Extract the 11 years of snow data and return > ? ?outrows = 180 > ? ?outcols = 360 > ? ?numyears = 11 > ? ?nummonths = 12 > > ? ?# Read various files > ? ?fname="world_valid_jules_pts.ascii" > ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = > jo.read_land_points_ascii(fname, 1.0) > > ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" > ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ > ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) > ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" > ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ > ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) > > ? ?# grab some space > ? ?data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) > ? ?data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32) > ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan > ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * > np.nan > > ? ?# extract the data > ? ?data1_snow = jules_data1[:,jules_var,:,0] > ? ?data2_snow = jules_data2[:,jules_var,:,0] > ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) > ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) > ? ?#for month in xrange(numyears * nummonths): > ? ?# ? ?for i in xrange(numpts): > ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] > ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] > ? ?# ? ? ? ?if data1 >= 0.0: > ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 > ? ?# ? ? ? ?else: > ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan > ? ?# ? ? ? ?if data2 > 0.0: > ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 > ? ?# ? ? ? ?else: > ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan > > ? ?# exclude any months from *both* arrays where we have dodgy data, else > we > ? ?# can't do the correlations correctly!! > ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) > ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) > > ? ?# put data on a regular grid... > ? ?print 'regridding landpts...' > ? ?for i in xrange(numpts): > ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats func > ? ? ? ?x = data1_snow[:,i] > ? ? ? ?x = x[np.isfinite(x)] > ? ? ? ?y = data2_snow[:,i] > ? ? ? ?y = y[np.isfinite(y)] > > ? ? ? ?# r^2 > ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of > data > ? ? ? ?if len(x) and len(y) > 50: > ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > (stats.pearsonr(x, y)[0])**2 > > ? ? ? ?# wilcox signed rank test > ? ? ? ?# make sure we have enough samples to do the test > ? ? ? ?d = x - y > ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero > differences > ? ? ? ?count = len(d) > ? ? ? ?if count > 10: > ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) > ? ? ? ? ? ?# only map out sign different data > ? ? ? ? ? ?if pval < 0.05: > ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > np.mean(x - y) > > ? ?return (pearsonsr_snow, wilcoxStats_snow) > > > josef.pktd wrote: >> >> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>> >>> Also I then need to remap the 2D array I make onto another grid (the >>> world in >>> this case). Which again I had am doing with a loop (note numpts is a lot >>> bigger than my example above). 
>>> >>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan >>> for i in xrange(numpts): >>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>> func >>> ? ? ? ?x = data1_snow[:,i] >>> ? ? ? ?x = x[np.isfinite(x)] >>> ? ? ? ?y = data2_snow[:,i] >>> ? ? ? ?y = y[np.isfinite(y)] >>> >>> ? ? ? ?# wilcox signed rank test >>> ? ? ? ?# make sure we have enough samples to do the test >>> ? ? ? ?d = x - y >>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >>> differences >>> ? ? ? ?count = len(d) >>> ? ? ? ?if count > 10: >>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>> ? ? ? ? ? ?# only map out sign different data >>> ? ? ? ? ? ?if pval < 0.05: >>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>> np.mean(x - y) >>> >>> Now I think I can push the data in one move into the wilcoxStats_snow >>> array >>> by removing the index, >>> but I can't see how I will get the individual x and y pts for each array >>> member correctly without the loop, this was my attempt which of course >>> doesn't work! >>> >>> x = data1_snow[:,:] >>> x = x[np.isfinite(x)] >>> y = data2_snow[:,:] >>> y = y[np.isfinite(y)] >>> >>> # r^2 >>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>> if len(x) and len(y) > 50: >>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>> y)[0])**2 >> >> >> If you want to do pairwise comparisons with stats.wilcoxon, then you >> might be stuck with the loop, since wilcoxon takes only two 1d arrays >> at a time (if I read the help correctly). >> >> Also the presence of nans might force the use a loop. stats.mstats has >> masked array versions, but I didn't see wilcoxon in the list. (Even >> when vectorized operations would work with regular arrays, nan or >> masked array versions still have to loop in many cases.) >> >> If you have many columns with count <= 10, so that wilcoxon is not >> calculated then it might be worth to use only array operations up to >> that point. If wilcoxon is calculated most of the time, then it's not >> worth thinking too hard about this. >> >> Josef >> >> >>> >>> thanks. >>> >>> >>> >>> >>> mdekauwe wrote: >>>> >>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>> >>>> I don't quite get what you mean about slicing with axis > 3. Is there a >>>> link you can recommend I should read? Does that mean given I have 4dims >>>> that Josef's suggestion would be more advised in this case? >> >> There were several discussions on the mailing lists (fancy slicing and >> indexing). Your case is safe, but if you run in future into funny >> shapes, you can look up the details. >> when in doubt, I use np.arange(...) >> >> Josef >> >>>> >>>> Thanks. >>>> >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe wrote: >>>>>> >>>>>> Thanks that works... >>>>>> >>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>>> step >>>>>> I >>>>>> was struggling with, so this forms a 2D array which replaces the the >>>>>> two >>>>>> for >>>>>> loops? Do I have that right? >>>>> >>>>> Yes, but as Zachary showed, if you need the full index in a dimension, >>>>> then you can use slicing. It might be faster. >>>>> And a warning, mixing slices and index arrays with 3 or more >>>>> dimensions can have some surprise switching of axes. >>>>> >>>>> Josef >>>>> >>>>>> >>>>>> A lot quicker...! 
>>>>>> >>>>>> Martin >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>> array, >>>>>>>> but >>>>>>>> avoid my current usage of the for loops for speed, as in reality the >>>>>>>> arrays >>>>>>>> sizes are quite big. Could someone also try and explain the solution >>>>>>>> as >>>>>>>> well >>>>>>>> if they have a spare moment as I am still finding it quite difficult >>>>>>>> to >>>>>>>> get >>>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>>> one >>>>>>>> could >>>>>>>> precompute the indices's i and j i.e. >>>>>>>> >>>>>>>> i = np.arange(tsteps) >>>>>>>> j = np.arange(numpts) >>>>>>>> >>>>>>>> but just can't get my head round how i then use them... >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Martin >>>>>>>> >>>>>>>> import numpy as np >>>>>>>> >>>>>>>> numpts=10 >>>>>>>> tsteps = 12 >>>>>>>> vari = 22 >>>>>>>> >>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>> index = np.arange(numpts) >>>>>>>> >>>>>>>> for i in xrange(tsteps): >>>>>>>> ? ?for j in xrange(numpts): >>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>> >>>>>>> The index arrays need to be broadcastable against each other. >>>>>>> >>>>>>> I think this should do it >>>>>>> >>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>>> >>>>>>> Josef >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> View this message in context: >>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html > Sent from the Scipy-User mailing list archive at Nabble.com. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From yosefmel at post.tau.ac.il Sat May 22 10:06:09 2010 From: yosefmel at post.tau.ac.il (Yosef Meller) Date: Sat, 22 May 2010 17:06:09 +0300 Subject: [SciPy-User] Fwd: Announcing Tracer v0.2 In-Reply-To: References: Message-ID: This is a one-time message to announce the availability of version 0.2 of the Tracer package. About --------- Tracer is a ray-tracing package for Python. It is geared toward solar energy research; written for extensibility and scriptability, it is particularly suitable for use together with optimization programs, but your imagination is the limit. Tracer is free-software, distributed under the GPL v3.0 license. You are free to use, review the code or help improve it. Features ------------- The package contains two parts: a set of modules for building and running ray tracing scenes; and a set of pre-assembled models. The scene construction part provides: * Assemblies of arbitrary complexity * Surfaces with flat, spherical, cylindrical (new in v0.2) or paraboloidal shapes; API for developing other shapes. * Surface materials with specular-reflective or refractive surfaces; API for developing more complex models * Receiver surfaces for output collection. The models included in the second part: * One-side mirror * Rectangular kaleidoscope light-guide * Parabolic dishes with circular or hexagonal apertures. * Heliostats field (new in v0.2). * Spherical lens - any plano/convex/concave combination (new in v0.2). In addition, the package contains basic tools for GUI generation using MayaVi (new in v0.2). More information ------------------------- Project website: http://tracer.berlios.de/ Project mailing list: https://lists.berlios.de/mailman/listinfo/tracer-user From chris.michalski at gmail.com Sat May 22 17:22:23 2010 From: chris.michalski at gmail.com (Chris Michalski) Date: Sat, 22 May 2010 14:22:23 -0700 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: Message-ID: <0375DFE9-9E7E-4DFB-8F93-7D2A064B4D7A@gmail.com> Haven't worked a problem like this, but it seemed to me that combining the fact that nearby points are close in (x,y) position, couldn't you compute two metrics. The first metric is the measure of distance from each point on the edge to an arbitrary point (which could be inside or outside the boomerang). Since the distance isn't unique (there could be the same distance to two or three points on the surface) then construct a metric with is the angle to the point. Sort the distance metric. Then rely on the fact that the angle metric can't change quickly between adjacent points to select from a small range in the sorted distance metric which point is closest. This should put adjacent points close enough together that the true nearest neighbor involves a search over a only couple of points either side of any individual point. You might be able to sort both metric and run the search jumping between metrics. Chris On May 21, 2010, at 2:46 PM, Anne Archibald wrote: > On 21 May 2010 17:07, wrote: >> I have a set of geometric data in x, y plane that represents a section of a >> turbine airfoil. The shape looks something like a fat boomerang with the >> coordinates wrapping around the entire shape (a completely closed loop). The >> coordinate points are in a random order and I need to sort or fit them by >> proximity to develop a dataset containing continuos shape of the airfoil. 
>> >> I started looking through the interpolation functions but I would need a >> method that ignores the order of the data (fits based on proximity of the >> points) and can handle data that forms a closed loop. >> >> The points are spaced closely enough along the airfoil surface so that they >> could be sorted by nearest neighbor - start with the first point find the >> next closest point and continue until all the points are "consumed". >> >> Any advice or pointers would be greatly appreciated. > > The most direct approach is to pick a start point at random, then ask > for its two nearest neighbours. Then pick one, and loop. For each > point ask for its two nearest neighbours; one should be the last point > you looked at, and one should be the next point on your curve. If ever > this isn't true, you've found some place where your points don't > sample closely enough to clearly describe the turbine shape. When you > get your first point back, you're done. > > As described, this is a fairly slow process, but the dominating > operation is not the python looping overhead but the time it takes to > find each nearest neighbour. Fortunately scipy.spatial includes an > object designed for this sort of problem, the kd-tree. So the way I'd > solve your problem is construct a kd-tree from your array of points, > then run a query asking the the three closest neighbours of each of > your original points (three because each point is its own closest > neighbour). Then just write a python loop to walk through the array of > neighbours as I described above. This process should be nice and fast, > and will diagnose some situations where you've inadequately sampled > your object. > > Anne > >> >> David Parker >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From gruben at bigpond.net.au Sat May 22 20:46:21 2010 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun, 23 May 2010 10:46:21 +1000 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: <896588.9118.qm@web33006.mail.mud.yahoo.com> References: <896588.9118.qm@web33006.mail.mud.yahoo.com> Message-ID: <4BF87ADD.9060501@bigpond.net.au> I've previously done something similar to David's suggestion - used a Delaunay triangulation to get the points, then used NetworkX (NX) to turn this into a Euclidean/Geometric graph structure (Google delaunay2d_nx.py). The minimum spanning tree of this graph can be found and traversed by NX to visit all the nodes in the correct order. I used the NX dfs_preorder() traversal algorithm, culled off the side branches and spline fit the remaining ordered nodes. You could probably use astar_path() instead - I don't think this was available when I did it. Either way, you would also want to duplicate the first or last node to close the path. Gary R. David Baddeley wrote: > Hi David, > > I'd probably do a Delaunay triangularisation and then, starting at an > arbitrary node, walk the shortest edges, collecting nodes as I went. > You can get the triangulation from the scikits.delaunay package, > which you'll probably already have if you've got matplotlib installed > (in this case you can find it as matplotlib.delaunay). > > You'll need to write a loop (or recursive function) to do the > walking, but that shouldn't be too tricky. 
I've done something > similar (collecting 'blobs' of unstructured points which are closer > than a certain cutoff) using this technique, so if you need any > additional pointers, or ideas on how to optimise the procedure (I > precompute a database mapping each vertex to all the edges leading > from it - otherwise you've got to loop through the entire edge list > on each iteration to find which edges go from the current > node/vertex) give me a bell. > > cheers, David From charlesr.harris at gmail.com Sat May 22 22:38:38 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 May 2010 20:38:38 -0600 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: <4BF87ADD.9060501@bigpond.net.au> References: <896588.9118.qm@web33006.mail.mud.yahoo.com> <4BF87ADD.9060501@bigpond.net.au> Message-ID: On Sat, May 22, 2010 at 6:46 PM, Gary Ruben wrote: > I've previously done something similar to David's suggestion - used a > Delaunay triangulation to get the points, then used NetworkX (NX) to > turn this into a Euclidean/Geometric graph structure (Google > delaunay2d_nx.py). The minimum spanning tree of this graph can be found > and traversed by NX to visit all the nodes in the correct order. I used > the NX dfs_preorder() traversal algorithm, culled off the side branches > and spline fit the remaining ordered nodes. You could probably use > astar_path() instead - I don't think this was available when I did it. > Either way, you would also want to duplicate the first or last node to > close the path. > > That's interesting. From the homological point of view there are two cycles, an inner one and an outer one. I suppose the orientation of any triangle would then be the sign of the determinant of the matrix formed from the vectors of it's vertices in some given order, although there is probably a more efficient way to assign orientations. Remove the edges that cancel out and it should be easy to find a cycle. I'll bet there is software out there for that problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 22 23:01:52 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 22 May 2010 21:01:52 -0600 Subject: [SciPy-User] Sort geometric data by proximity In-Reply-To: References: <896588.9118.qm@web33006.mail.mud.yahoo.com> <4BF87ADD.9060501@bigpond.net.au> Message-ID: On Sat, May 22, 2010 at 8:38 PM, Charles R Harris wrote: > > > On Sat, May 22, 2010 at 6:46 PM, Gary Ruben wrote: > >> I've previously done something similar to David's suggestion - used a >> Delaunay triangulation to get the points, then used NetworkX (NX) to >> turn this into a Euclidean/Geometric graph structure (Google >> delaunay2d_nx.py). The minimum spanning tree of this graph can be found >> and traversed by NX to visit all the nodes in the correct order. I used >> the NX dfs_preorder() traversal algorithm, culled off the side branches >> and spline fit the remaining ordered nodes. You could probably use >> astar_path() instead - I don't think this was available when I did it. >> Either way, you would also want to duplicate the first or last node to >> close the path. >> >> > That's interesting. From the homological point of view there are two > cycles, an inner one and an outer one. 
I suppose the orientation of any > triangle would then be the sign of the determinant of the matrix formed from > the vectors of it's vertices in some given order, although there is probably > a more efficient way to assign orientations. Remove the edges that cancel > out and it should be easy to find a cycle. I'll bet there is software out > there for that problem. > > Umm, boundaries, not cycles ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Sun May 23 11:42:53 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Sun, 23 May 2010 17:42:53 +0200 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? Message-ID: Hello, I'm still a newbie at NumPy/Scipy. I have a question: When I call one of SciPy's or NumPy's efficient C routines: Do they release the GIL? Thanks, Ram Rachum. (P.S. Please put me in the `to` field of any replies.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun May 23 13:47:54 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 23 May 2010 12:47:54 -0500 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? In-Reply-To: References: Message-ID: On Sun, May 23, 2010 at 10:42, cool-RR wrote: > Hello, > I'm still a newbie at NumPy/Scipy.?I have a question: When I call one of > SciPy's or NumPy's efficient C routines: Do they release the GIL? Sometimes. You will have to check the sources of the particular function in order to determine that. > Thanks, > Ram Rachum. > (P.S. Please put me in the `to` field of any replies.) If you want this, then you should set your Reply-To: header appropriately. But everyone would be much happier if you were to simply subscribe to the lists in which you are asking questions. You are placing a significant annoyance on those who would donate their time to help you. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From d.l.goldsmith at gmail.com Mon May 24 00:37:18 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 23 May 2010 21:37:18 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! Message-ID: Hi, folks. It's SciPy Marathon time again! For those who have just joined us: the past two summers, volunteers<%3Chttp://docs.scipy.org/numpy/contributors/%3E>from the NumPy/SciPy community have worked together to improve NumPy's documentation. So far, we have written most of the docs for NumPy (see, e.g., http://conference.scipy.org/proceedings/SciPy2009/paper_14/), using a Wiki application (pydocweb <%3Chttp://code.google.com/p/pydocweb/>, thanks to Pauli-Virtanen, Emmanuelle Gouillart, St?fan van der Walt, and Gael Varoquaux) that we use to edit and manage the docs<%3Chttp://docs.scipy.org/numpy/docs/%3E> . We have advanced the NumPy docstrings from only 8% being Ready For Review or better, to 85% being so (click here for details). This summer, we will focus on doing the same thing for SciPy. To participate (and we very much hope you will), please start by reading http://docs.scipy.org/numpy/Front%20Page/, and, in particular, the Before You Start section (but it's all important). (Despite the URL and any wording that may make it seem otherwise, the information there is as applicable to editing the SciPy documentation as it is to NumPy.) 
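A rough empirical check, if you don't want to dig through the sources, is to time the same call run twice sequentially versus run in two threads; on a multi-core machine, a routine that releases the GIL should come close to halving the wall time, while one that holds it will not. An untested sketch (np.dot and the array size are only placeholders for whichever routine you actually care about):

import threading
import time
import numpy as np

a = np.random.rand(1500, 1500)

def work():
    # spends essentially all of its time inside compiled code
    np.dot(a, a)

t0 = time.time()
work(); work()
serial = time.time() - t0

t0 = time.time()
threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - t0

print("serial %.2f s, threaded %.2f s" % (serial, threaded))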
As far as actually performing the work is concerned, we will again attack these things as teams: go to http://docs.scipy.org/scipy/Milestones/ and poke around. Figure out where you think you could do the most good, and join that team by appending your name below the Milestone(s). If you think you could help lead a team (i.e., answer technical questions about the subject(s) encompassed by a Milestone) and are willing to do so, please append (L) to your name. As the summer progresses, look for Marathon-related announcements on the scipy-dev email list (subscription strongly recommended for Marathon participants). Happy editing! David Goldsmith Editor Pro Tem -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Mon May 24 04:47:08 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Mon, 24 May 2010 10:47:08 +0200 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? In-Reply-To: References: Message-ID: On Sun, May 23, 2010 at 7:47 PM, Robert Kern wrote: > On Sun, May 23, 2010 at 10:42, cool-RR wrote: > > Hello, > > I'm still a newbie at NumPy/Scipy. I have a question: When I call one of > > SciPy's or NumPy's efficient C routines: Do they release the GIL? > > Sometimes. You will have to check the sources of the particular > function in order to determine that. Thanks for the info. > > Thanks, > > Ram Rachum. > > (P.S. Please put me in the `to` field of any replies.) > > If you want this, then you should set your Reply-To: header > appropriately. But everyone would be much happier if you were to > simply subscribe to the lists in which you are asking questions. You > are placing a significant annoyance on those who would donate their > time to help you. > I really don't know what to about this. If I'm setting the Reply-To, I should set it to both my address and the list's address. But I'm on G-Mail, and I can only set the Reply-To header globally. If I set it to my personal mail address, then by default when people will reply to me it will not get to the list, which would be undesirable as well. Regarding actually subscribing to the lists: I'm not sure how I'm supposed to deal with the 90% of messages that I'm not interested in. I'm active in a few dozen mailing lists, and I wouldn't want to receive every single message that is sent to each of them. The nicest arrangement I have is with lists managed by Google Groups: They have a feature where you can pick a specific thread and ask to receive emails only from it. That's really convenient. Of course, Google Groups has its own drawbacks. If you have any suggestion about what I can do, I'd be happy to hear it. (Of course you may reply to this off-list, this is probably not of interest to this list.) Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Mon May 24 06:45:44 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 24 May 2010 03:45:44 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: References: Message-ID: On Sun, May 23, 2010 at 9:37 PM, David Goldsmith wrote: > Hi, folks.? It's SciPy Marathon time again! Excellent! Thanks for spearheading this again David. The improvements to our documentation over the previous marathons have been staggering and I look forward to seeing how much is accomplished this summer. 
Thanks, Jarrod From harald.schilly at gmail.com Mon May 24 07:22:56 2010 From: harald.schilly at gmail.com (Harald Schilly) Date: Mon, 24 May 2010 13:22:56 +0200 Subject: [SciPy-User] Do NumPy and SciPy release the GIL? In-Reply-To: References: Message-ID: On Mon, May 24, 2010 at 10:47, cool-RR wrote: > If you have any suggestion about what I can do, I'd be happy to hear it. in gmail, in the top menu in the "more actions" dropdown select "filter messages like these", apply a label "scipy" and check the "skip inbox" (or something like that). then you do not see the messages and you can click on the scipy label to read them. h From Dharhas.Pothina at twdb.state.tx.us Tue May 25 10:17:55 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 25 May 2010 09:17:55 -0500 Subject: [SciPy-User] Parameterizing a curve / Map curve to line Message-ID: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> Hi, I'm am trying to correct some bathymetric data. We have a series of x,y,z values that represent the boat path (x,y) and depth (z) on a river. Due to trees etc in some sections of the data the GPS went out leading to spurious x & y values. We have manually drawn in the boat path in those sections and I am trying to move the spurious points onto this manually drawn path. Essentially I have a points dataset (x,y,z) that contains all the data points and a boatpath dataset (x,y) that represents the actual boat path. The plan is to traverse the boat path curve and if the next point lies on the boatpath leave it alone, if it does not to use the boat speed (known value) to calculate how much farther along the boat path curve the next point should be and move it to that location. My plan to do this involved either parameterizing or mapping the boat path curve to a 1D line and then mapping the final corrected points back to the curve. One way of doing this that I thought of was to interpolate the boat path curve to a higher resolution (ie 0.1ft spacing etc) and then calculate the cumulative distance along the line from the origin. The boat path is not monotonically increasing in x, curves around and sometimes can loop around. Looking at the interp and spline functions in scipy I'm unsure how to interpolate to a 0.1ft spacing in this case. Also any other ideas for achieving the above is welcomed. - dharhas From zachary.pincus at yale.edu Tue May 25 11:20:58 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 25 May 2010 11:20:58 -0400 Subject: [SciPy-User] Parameterizing a curve / Map curve to line In-Reply-To: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> References: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> Message-ID: > Essentially I have a points dataset (x,y,z) that contains all the > data points and a boatpath dataset (x,y) that represents the actual > boat path. The plan is to traverse the boat path curve and if the > next point lies on the boatpath leave it alone, if it does not to > use the boat speed (known value) to calculate how much farther along > the boat path curve the next point should be and move it to that > location. > > My plan to do this involved either parameterizing or mapping the > boat path curve to a 1D line and then mapping the final corrected > points back to the curve. One way of doing this that I thought of > was to interpolate the boat path curve to a higher resolution (ie > 0.1ft spacing etc) and then calculate the cumulative distance along > the line from the origin. 
> > The boat path is not monotonically increasing in x, curves around > and sometimes can loop around. Looking at the interp and spline > functions in scipy I'm unsure how to interpolate to a 0.1ft spacing > in this case. Also any other ideas for achieving the above is > welcomed. You could use the scipy.interpolate.splprep routines to fit a parametric spline to your (x,y,p) data (where p is some parameter: timestamps if you have them, else just monotonically increasing indices), then use splev to resample the curve. (If you have timestamps and speeds, you could probably figure out a set of evaluation times that ought to yield 0.1ft resolution, but you probably don't need that kind of precision: you could just upsample by 10-fold or whatever if desired.) This is probably overkill, though... you could just use numpy.interp to linearly interpolate your data up by 10-fold or something. (Again, parameterized by t or some index: you just need to resample the x and y data separately and then put it back together... this is what the splprep routines do under the hood.) And of course for this part of your plan: > traverse the boat path curve and if the next point lies on the > boatpath leave it alone I would recommend not just testing points for equality, especially after resampling, but for each point, calculate the minimum distance from that point to the boatpath, and if it's below a threshold value, leave it alone. Zach From afraser at lanl.gov Tue May 25 12:39:55 2010 From: afraser at lanl.gov (Andy Fraser) Date: Tue, 25 May 2010 10:39:55 -0600 Subject: [SciPy-User] using multiple processors for particle filtering Message-ID: <8739xgndes.fsf@lanl.gov> I am using a particle filter to estimate the trajectory of a camera based on a sequence of images taken by the camera. The code is slow, but I have 8 processors in my desktop machine. I'd like to use them to get results 8 times faster. I've been looking at the following sections of http://docs.python.org/library: "16.6. multiprocessing" and "16.2. threading". I've also read some discussion from 2006 on scipy-user at scipy.org about seeds for random numbers in threads. I don't have any experience with multiprocessing and would appreciate advice. Here is a bit of code that I want to modify: for i in xrange(len(self.particles)): self.particles[i] = self.particles[i].random_fork() Each particle is a class instance that represents a possible camera state (position, orientation, and velocities). particle.random_fork() is a method that moves the position and orientation based on current velocities and then uses numpy.random.standard_normal((N,)) to perturb the velocities. I handle the correlation structure of the noise by matrices that are members of particle, and I do some of the calculations in c++. I would like to do something like: for i in xrange(len(self.particles)): nv = numpy.random.standard_normal((N,)) launch_on_any_available_processor( self.particles[i] = self.particles[i].random_fork(nv) ) wait_for_completions() But I don't see a command like "launch_on_any_available_processor". I would be grateful for any advice. -- Andy Fraser ISR-2 (MS:B244) afraser at lanl.gov Los Alamos National Laboratory 505 665 9448 Los Alamos, NM 87545 From yosefm at gmail.com Thu May 20 01:06:46 2010 From: yosefm at gmail.com (Yosef Meller) Date: Thu, 20 May 2010 08:06:46 +0300 Subject: [SciPy-User] Announcing Tracer v0.2 Message-ID: This is a one-time message to announce the availability of version 0.2 of the Tracer package. 
About --------- Tracer is a ray-tracing package for Python. It is geared toward solar energy research; written for extensibility and scriptability, it is particularly suitable for use together with optimization programs, but your imagination is the limit. Tracer is free-software, distributed under the GPL v3.0 license. You are free to use, review the code or help improve it. Features ------------- The package contains two parts: a set of modules for building and running ray tracing scenes; and a set of pre-assembled models. The scene construction part provides: * Assemblies of arbitrary complexity * Surfaces with flat, spherical, cylindrical (new in v0.2) or paraboloidal shapes; API for developing other shapes. * Surface materials with specular-reflective or refractive surfaces; API for developing more complex models * Receiver surfaces for output collection. The models included in the second part: * One-side mirror * Rectangular kaleidoscope light-guide * Parabolic dishes with circular or hexagonal apertures. * Heliostats field (new in v0.2). * Spherical lens - any plano/convex/concave combination (new in v0.2). In addition, the package contains basic tools for GUI generation using MayaVi (new in v0.2). More information ------------------------- Project website: http://tracer.berlios.de/ Project mailing list: https://lists.berlios.de/mailman/listinfo/tracer-user From Christer.Malmberg.0653 at student.uu.se Fri May 21 12:45:57 2010 From: Christer.Malmberg.0653 at student.uu.se (Christer Malmberg) Date: Fri, 21 May 2010 18:45:57 +0200 Subject: [SciPy-User] Weave compilation problems Message-ID: <20100521184557.u1k7rafqzowk8c00@webmail.uu.se> Hi, I'm trying to use weave, but I't doesn't work. Here is the error I get: ... File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 272, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 184, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 162, in setup raise SystemExit, error CompileError: error: Bad file descriptor I know nothing about C++ programming, the code is part of a package I need to use for curve fitting. I contacted the author, but he suggested the problem is in my setup of weave. I'm running windows, with the Visual Express 2008 compiler, latest SciPy/Numpy and python 2.6.5. I didn't find anything on this error while googleing. Anyone have an idea what might be the problem? Best regards, Christer Malmberg From Christer.Malmberg.0653 at student.uu.se Sat May 22 03:22:34 2010 From: Christer.Malmberg.0653 at student.uu.se (Christer Malmberg) Date: Sat, 22 May 2010 09:22:34 +0200 Subject: [SciPy-User] Weave compilation problems Message-ID: <20100522092234.vaco60cm2o48sogs@webmail6.uu.se> Hi, I'm trying to use weave, but I't doesn't work. Here is the error I get: ... File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 272, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 184, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 162, in setup raise SystemExit, error CompileError: error: Bad file descriptor I know nothing about C++ programming, the code is part of a package I need to use for curve fitting. I contacted the author, but he suggested the problem is in my setup of weave. 
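A minimal way to test the weave setup on its own, independent of the curve-fitting package, is to compile a trivial inline snippet. The lines below are only a generic smoke test (not code from that package), but they should exercise the same build_tools/distutils step that shows up in the traceback:

    from scipy import weave

    # Compiles a tiny one-line C snippet the first time it is run; if the
    # compiler setup is at fault, this will most likely stop with the same
    # kind of build error as the full package does.
    print weave.inline("return_val = 1 + 1;")   # expected result: 2
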
I'm running windows, with the Visual Express 2008 compiler, latest SciPy/Numpy and python 2.6.5. I didn't find anything on this error while googleing. Anyone have an idea what might be the problem? Best regards, Christer Malmberg From et.barthel at free.fr Sat May 22 08:06:54 2010 From: et.barthel at free.fr (et.barthel at free.fr) Date: Sat, 22 May 2010 14:06:54 +0200 Subject: [SciPy-User] complex numbers - sign problem Message-ID: <1274530014.4bf7c8de59422@imp.free.fr> Hi, not sure it's the right place to ask the question, but I don't really know where to send it. I have an issue with cmath sign handling: In [75]: -1-0.j Out[75]: (-1+0j) In [76]: -(1+0.j) Out[76]: (-1-0j) is a bit strange by itself - at least to me - but combined with a multivalued function which has a branch cut on the x-axis leads to significant and potentially harmful sign problem. Of course one can fiddle around the problem but I would like to be sure if this is a bug or if there is some sense to it. Thanks, Etienne From josef.pktd at gmail.com Tue May 25 13:04:00 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 25 May 2010 13:04:00 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: <1274530014.4bf7c8de59422@imp.free.fr> References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Sat, May 22, 2010 at 8:06 AM, wrote: > Hi, > not sure it's the right place to ask the question, but I don't really know where > to send it. > I have an issue with cmath sign handling: > In [75]: -1-0.j > Out[75]: (-1+0j) > > In [76]: -(1+0.j) > Out[76]: (-1-0j) > is a bit strange by itself - at least to me - but combined with a multivalued > function which has a branch cut on the x-axis leads to significant and > potentially harmful sign problem. Of course one can fiddle around the problem > but I would like to be sure if this is a bug or if there is some sense to it. (not an answer) negative zero depends on the operating system on Windows with Python 2.5: >>> -(1+0.j) (-1+0j) >>> (-1-0.j) (-1+0j) So relying on the sign of zero doesn't look like a good strategy to me Josef > Thanks, > Etienne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Tue May 25 13:46:22 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 25 May 2010 13:46:22 -0400 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: <8739xgndes.fsf@lanl.gov> References: <8739xgndes.fsf@lanl.gov> Message-ID: > I would like to do something like: > > for i in xrange(len(self.particles)): > nv = numpy.random.standard_normal((N,)) > launch_on_any_available_processor( > self.particles[i] = self.particles[i].random_fork(nv) > ) > wait_for_completions() > > But I don't see a command like "launch_on_any_available_processor". > I would be grateful for any advice. > Look more in depth at the multiprocessing library -- it's likely to be what you want... more or less. However, it might be a bit "less" than "more" because, from above, what it looks like you want to do is to launch many (millions?) of very lightweight tasks ("random_fork") on different processors. If you naively start up a fresh python process for each "random_fork" task, the startup costs will dominate (hugely so). 
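One way around that is to hand each process a single large block of work instead of one particle at a time. As a rough, untested sketch (the worker function, the block size and the per-worker seeding are made up for illustration, and a plain noise array stands in for the particle objects):

    import multiprocessing
    import numpy

    N = 6   # stand-in for the length of the velocity-noise vector

    def perturb_block(args):
        # Generate every perturbation for one block of particles in a single
        # call, so the process start-up and pickling cost is paid once per
        # block instead of once per particle.
        seed, n_particles = args
        rng = numpy.random.RandomState(seed)   # independent stream per worker
        return rng.standard_normal((n_particles, N))

    if __name__ == '__main__':
        total, n_workers = 100000, 8
        pool = multiprocessing.Pool(n_workers)
        blocks = pool.map(perturb_block,
                          [(seed, total // n_workers) for seed in range(n_workers)])
        noise = numpy.concatenate(blocks)   # one (total, N) array of perturbations

Each worker only returns an array, which also sidesteps the question of how to push updated particle objects back and forth between processes.
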
So you'll likely need to re-jigger the task you want to dispatch to each processor so that the chunks are larger: processes = [] for i in range(num_processors): processes.append(start_worker_process(num_particles=total_particles// num_processors)) wait_for_processes_to_end() self.particles = numpy.concatenate([process.particles]) Though even this might be a too-granular a task, if you have numerous time-steps for which you need to generate particles. (E.g. if the above would be within another large loop.) Then you'd need to write the worker processes as sort of "particle servers": processes = [] for i in range(num_processors): processes.append(start_worker()) for task in huge_task_list: sub_tasks = divide_tasks(num_processors) for process, sub_task in zip(processes, sub_tasks): process.start(sub_task) wait_for_processes_to_complete_task() self.results = assemble_results(processes) This is of course pretty naive still, but it'll be a start, and the architectures I've (roughly) outlined here fit better with the multiprocessing paradigm. You'll need to do some internet reading and looking at various examples first, to get the details of how to actually implement this stuff. I don't know anything off the top of my head, but perhaps others can chime in? Zach From robert.kern at gmail.com Tue May 25 14:10:58 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 May 2010 14:10:58 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: <1274530014.4bf7c8de59422@imp.free.fr> References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Sat, May 22, 2010 at 08:06, wrote: > Hi, > not sure it's the right place to ask the question, but I don't really know where > to send it. > I have an issue with cmath sign handling: > In [75]: -1-0.j > Out[75]: (-1+0j) > > In [76]: -(1+0.j) > Out[76]: (-1-0j) > is a bit strange by itself - at least to me - but combined with a multivalued > function which has a branch cut on the x-axis leads to significant and > potentially harmful sign problem. Of course one can fiddle around the problem > but I would like to be sure if this is a bug or if there is some sense to it. This looks like cornercase with Python's parsing of the imaginary literal. You can work around it by explicitly using the complex() constructor: In [7]: -1.0 - 0.0j Out[7]: (-1+0j) In [8]: complex(-1.0, -0.0) Out[8]: (-1-0j) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Tue May 25 14:17:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 25 May 2010 14:17:42 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Tue, May 25, 2010 at 2:10 PM, Robert Kern wrote: > On Sat, May 22, 2010 at 08:06, ? wrote: >> Hi, >> not sure it's the right place to ask the question, but I don't really know where >> to send it. >> I have an issue with cmath sign handling: >> In [75]: -1-0.j >> Out[75]: (-1+0j) >> >> In [76]: -(1+0.j) >> Out[76]: (-1-0j) >> is a bit strange by itself - at least to me - but combined with a multivalued >> function which has a branch cut on the x-axis leads to significant and >> potentially harmful sign problem. Of course one can fiddle around the problem >> but I would like to be sure if this is a bug or if there is some sense to it. 
> > This looks like cornercase with Python's parsing of the imaginary > literal. You can work around it by explicitly using the complex() > constructor: > > In [7]: -1.0 - 0.0j > Out[7]: (-1+0j) > > In [8]: complex(-1.0, -0.0) > Out[8]: (-1-0j) >>> complex(-1.0, -0.0) (-1+0j) Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Tue May 25 14:38:18 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 May 2010 14:38:18 -0400 Subject: [SciPy-User] complex numbers - sign problem In-Reply-To: References: <1274530014.4bf7c8de59422@imp.free.fr> Message-ID: On Tue, May 25, 2010 at 14:17, wrote: > On Tue, May 25, 2010 at 2:10 PM, Robert Kern wrote: >> On Sat, May 22, 2010 at 08:06, ? wrote: >>> Hi, >>> not sure it's the right place to ask the question, but I don't really know where >>> to send it. >>> I have an issue with cmath sign handling: >>> In [75]: -1-0.j >>> Out[75]: (-1+0j) >>> >>> In [76]: -(1+0.j) >>> Out[76]: (-1-0j) >>> is a bit strange by itself - at least to me - but combined with a multivalued >>> function which has a branch cut on the x-axis leads to significant and >>> potentially harmful sign problem. Of course one can fiddle around the problem >>> but I would like to be sure if this is a bug or if there is some sense to it. >> >> This looks like cornercase with Python's parsing of the imaginary >> literal. You can work around it by explicitly using the complex() >> constructor: >> >> In [7]: -1.0 - 0.0j >> Out[7]: (-1+0j) >> >> In [8]: complex(-1.0, -0.0) >> Out[8]: (-1-0j) > >>>> complex(-1.0, -0.0) > (-1+0j) ... and use Python 2.6. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robince at gmail.com Tue May 25 17:16:47 2010 From: robince at gmail.com (Robin) Date: Tue, 25 May 2010 22:16:47 +0100 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: <8739xgndes.fsf@lanl.gov> References: <8739xgndes.fsf@lanl.gov> Message-ID: On Tue, May 25, 2010 at 5:39 PM, Andy Fraser wrote: > I am using a particle filter to estimate the trajectory of a camera > based on a sequence of images taken by the camera. ?The code is slow, > but I have 8 processors in my desktop machine. ?I'd like to use them > to get results 8 times faster. ?I've been looking at the following > sections of http://docs.python.org/library: "16.6. multiprocessing" > and "16.2. threading". ?I've also read some discussion from 2006 on > scipy-user at scipy.org about seeds for random numbers in threads. ?I > don't have any experience with multiprocessing and would appreciate > advice. > > Here is a bit of code that I want to modify: > > ? ? ? ?for i in xrange(len(self.particles)): > ? ? ? ? ? ?self.particles[i] = self.particles[i].random_fork() If the updates are independent and don't have to be done sequentially you can use the multiprocessing.Pool interface which I've found very convenient for this sort of thing. Ideally if particles[i] is a class instance then random_fork could modify itself in place instad of returning a modified copy of the instance... 
then you could do something like def update_particle(self, i): nv = numpy.random.standard_normal((N,)) self.particles[i].random_fork(nv) p = multiprocessing.Pool(8) p.map(self.update_particle, range(len(self.particles))) this will distribute each update_particle call to a different process using all cores (providing the processing is independent). I'm not sure if random is multiprocessor safe for use like this so that would need checking but I hope this helps a bit... cheers Robin > Each particle is a class instance that represents a possible camera > state (position, orientation, and velocities). ?particle.random_fork() > is a method that moves the position and orientation based on current > velocities and then uses numpy.random.standard_normal((N,)) to perturb > the velocities. ?I handle the correlation structure of the noise by > matrices that are members of particle, and I do some of the > calculations in c++. > > I would like to do something like: > > ? ? ? ?for i in xrange(len(self.particles)): > ? ? ? ? ? ?nv = numpy.random.standard_normal((N,)) > ? ? ? ? ? ?launch_on_any_available_processor( > ? ? ? ? ? ? ? ?self.particles[i] = self.particles[i].random_fork(nv) > ? ? ? ? ? ?) > ? ? ? ?wait_for_completions() > > But I don't see a command like "launch_on_any_available_processor". > I would be grateful for any advice. > > -- > Andy Fraser ? ? ? ? ? ? ? ? ? ? ? ? ? ? ISR-2 ? (MS:B244) > afraser at lanl.gov ? ? ? ? ? ? ? ? ? ? ? ?Los Alamos National Laboratory > 505 665 9448 ? ? ? ? ? ? ? ? ? ? ? ? ? ?Los Alamos, NM 87545 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robince at gmail.com Tue May 25 17:19:38 2010 From: robince at gmail.com (Robin) Date: Tue, 25 May 2010 22:19:38 +0100 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: References: <8739xgndes.fsf@lanl.gov> Message-ID: On Tue, May 25, 2010 at 10:16 PM, Robin wrote: > If the updates are independent and don't have to be done sequentially > you can use the multiprocessing.Pool interface which I've found very > convenient for this sort of thing. > > Ideally if particles[i] is a class instance then random_fork could > modify itself in place instad of returning a modified copy of the > instance... then you could do something like > > def update_particle(self, i): > ? ?nv = numpy.random.standard_normal((N,)) > ? ?self.particles[i].random_fork(nv) > > p = multiprocessing.Pool(8) > p.map(self.update_particle, range(len(self.particles))) Sorry - just thought it probably doesn't make sense to use map in this case since your processing function isn't returning anything... you can check Pool.apply_async (which returns control and lets stuff continue in the background) and Pool.apply_sync (which is probably what you want). Cheers Robin > > this will distribute each update_particle call to a different process > using all cores (providing the processing is independent). > > I'm not sure if random is multiprocessor safe for use like this so > that would need checking but I hope this helps a bit... 
> From robince at gmail.com Tue May 25 17:50:54 2010 From: robince at gmail.com (Robin) Date: Tue, 25 May 2010 22:50:54 +0100 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: References: <8739xgndes.fsf@lanl.gov> Message-ID: On Tue, May 25, 2010 at 10:19 PM, Robin wrote: > Sorry - just thought it probably doesn't make sense to use map in this > case since your processing function isn't returning anything... you > can check Pool.apply_async (which returns control and lets stuff > continue in the background) and Pool.apply_sync (which is probably > what you want). I'm being a bit silly I think - this won't work properly of course because the particle instances will be changed in the subprocesses and not propagated back.. but hopefully the pointer to using Pool if useful. If you can split the task into an independent function then it's really handy... Maybe something like self.updated_particles = p.map(update_particle, self.particles) where update_particle takes a single particle instance, does the random number generation and returns the updated particle. Cheers Robin From warren.weckesser at enthought.com Tue May 25 18:00:24 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 25 May 2010 17:00:24 -0500 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: References: Message-ID: <4BFC4878.2050303@enthought.com> David Goldsmith wrote: > > Hi, folks. It's SciPy Marathon time again! > > > As far as actually performing the work is concerned, we will again > attack these things as teams: go to > http://docs.scipy.org/scipy/Milestones/ and poke around. > David, how was that list generated? Because of some refactoring I did in linalg and signal, many of the links in these modules will give a warning that the docstring is obsolete because the corresponding object is no longer present in SVN. Warren From d.l.goldsmith at gmail.com Tue May 25 18:52:42 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 25 May 2010 15:52:42 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: <4BFC4878.2050303@enthought.com> References: <4BFC4878.2050303@enthought.com> Message-ID: On Tue, May 25, 2010 at 3:00 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > David Goldsmith wrote: > > > > Hi, folks. It's SciPy Marathon time again! > > > > > > > As far as actually performing the work is concerned, we will again > > attack these things as teams: go to > > http://docs.scipy.org/scipy/Milestones/ and poke around. > > > > David, how was that list generated? http://docs.scipy.org/scipy/Milestones/log/ Jack Liddle created it "by hand" last summer. > Because of some refactoring I did > in linalg and signal, many of the links in these modules will give a > warning that the docstring is obsolete because the corresponding object > is no longer present in SVN. > Do you have a list of all the new objects you created? DG > > Warren > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at enthought.com Tue May 25 19:19:00 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 25 May 2010 18:19:00 -0500 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: References: <4BFC4878.2050303@enthought.com> Message-ID: <4BFC5AE4.2020104@enthought.com> David Goldsmith wrote: > On Tue, May 25, 2010 at 3:00 PM, Warren Weckesser > > wrote: > > David Goldsmith wrote: > > > > Hi, folks. It's SciPy Marathon time again! > > > > > > > As far as actually performing the work is concerned, we will again > > attack these things as teams: go to > > http://docs.scipy.org/scipy/Milestones/ and poke around. > > > > David, how was that list generated? > > > http://docs.scipy.org/scipy/Milestones/log/ > > Jack Liddle created it "by hand" last summer. > > > Because of some refactoring I did > in linalg and signal, many of the links in these modules will give a > warning that the docstring is obsolete because the corresponding > object > is no longer present in SVN. > > > Do you have a list of all the new objects you created? > I think this is it: linalg: new modules (reorganized basic.py and decomp.py): decomp_cholesky.py decomp_lu.py decomp_qr.py decomp_schur.py decomp_svd.py special_matrices.py linalg: actual new functions: decomp_cholesky.cho_solve_banded special_matrices.circulant special_matrices.companion special_matrices.hadamard special_matrices.leslie signal: new modules (moved window functions from signaltools.py): windows.py signal: actual new functions: ltisys.impulse2 ltisys.step2 waveforms.sweep_poly Warren > DG > > > Warren > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > -- > Mathematician: noun, someone who disavows certainty when their > uncertainty set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with > her lies, prevents mankind from committing a general suicide. (As > interpreted by Robert Graves) > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peter.shepard at gmail.com Tue May 25 21:35:26 2010 From: peter.shepard at gmail.com (Pete Shepard) Date: Tue, 25 May 2010 18:35:26 -0700 Subject: [SciPy-User] fisherexact.py returns "NA" for large #s In-Reply-To: References: Message-ID: Hi Josef, An example of ratios that returns "nan"; 110:859 and 48:327 On Fri, May 7, 2010 at 1:15 PM, wrote: > On Fri, May 7, 2010 at 3:45 PM, Pete Shepard > wrote: > > Hello List, > > > > > > I am using "fisherexact.py" to calculate the p-value of two ratios > however, > > when large #s are involved, it returns "NA". Is there a way to override > > this? > > > You mean fisherexact in http://projects.scipy.org/scipy/ticket/956 ? > > Do you have an example? Can you add it to the ticket? > > Do you have large ratios or large numbers in each cell? > If you have a large number of entries in each cell, then the chisquare > test or similar > asymptotic tests should be pretty reliable. > > Last time I tried, I didn't manage to get rid of incorrect results if > the first cell is zero. > And I didn't understand the details of the algorithm well enough to > figure out what's > going on (within a reasonable time). 
> > If you add some print statements, you could find out if the nan comes from > a > 0./0. division or from the hypergeometric distribution. > Do you get the same result if you permute rows or columns? > > fisherexact works very well over a large range of values, but I'm > waiting for someone > to provide a patch for the cases that don't work. > > Josef > > > > > > > > > TIA > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed May 26 02:03:39 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 25 May 2010 23:03:39 -0700 Subject: [SciPy-User] [Announcement] The 2010 Summer Documentation Marathon is on! In-Reply-To: <4BFC5AE4.2020104@enthought.com> References: <4BFC4878.2050303@enthought.com> <4BFC5AE4.2020104@enthought.com> Message-ID: On Tue, May 25, 2010 at 4:19 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > David Goldsmith wrote: > > On Tue, May 25, 2010 at 3:00 PM, Warren Weckesser > > > > wrote: > > > > David Goldsmith wrote: > > > > > > Hi, folks. It's SciPy Marathon time again! > > > > > > > > > > > As far as actually performing the work is concerned, we will again > > > attack these things as teams: go to > > > http://docs.scipy.org/scipy/Milestones/ and poke around. > > > > > > > David, how was that list generated? > > > > > > http://docs.scipy.org/scipy/Milestones/log/ > > > > Jack Liddle created it "by hand" last summer. > > > > > > Because of some refactoring I did > > in linalg and signal, many of the links in these modules will give a > > warning that the docstring is obsolete because the corresponding > > object > > is no longer present in SVN. > > > > > > Do you have a list of all the new objects you created? > > > > I think this is it: > > linalg: new modules (reorganized basic.py and decomp.py): > decomp_cholesky.py > decomp_lu.py > decomp_qr.py > decomp_schur.py > decomp_svd.py > special_matrices.py > > linalg: actual new functions: > decomp_cholesky.cho_solve_banded > special_matrices.circulant > special_matrices.companion > special_matrices.hadamard > special_matrices.leslie > > signal: new modules (moved window functions from signaltools.py): > windows.py > > signal: actual new functions: > ltisys.impulse2 > ltisys.step2 > waveforms.sweep_poly > OK, thanks, I'll make sure the Wiki is seeing them, then add them to the Milestones. DG > > > Warren > > > > DG > > > > > > Warren > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > > > -- > > Mathematician: noun, someone who disavows certainty when their > > uncertainty set is non-empty, even if that set has measure zero. > > > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with > > her lies, prevents mankind from committing a general suicide. 
(As > > interpreted by Robert Graves) > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dharhas.Pothina at twdb.state.tx.us Wed May 26 08:43:49 2010 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Wed, 26 May 2010 07:43:49 -0500 Subject: [SciPy-User] Parameterizing a curve / Map curve to line In-Reply-To: References: <4BFB95C4.63BA.009B.0@twdb.state.tx.us> Message-ID: <4BFCD135.63BA.009B.0@twdb.state.tx.us> >This is probably overkill, though... you could just use numpy.interp >to linearly interpolate your data up by 10-fold or something. (Again, >parameterized by t or some index: you just need to resample the x and >y data separately and then put it back together... this is what the >splprep routines do under the hood.) Thanks I used this technique with some modifications to get equal spacing and it seems to be working great. > I would recommend not just testing points for equality, especially > after resampling, but for each point, calculate the minimum distance > from that point to the boatpath, and if it's below a threshold value, > leave it alone. This was my original plan, I just left it out of the explaination for simplicity. - dharhas From chanley at stsci.edu Wed May 26 12:19:51 2010 From: chanley at stsci.edu (Christopher Hanley) Date: Wed, 26 May 2010 12:19:51 -0400 Subject: [SciPy-User] numpy and the Google App Engine Message-ID: Greetings, Google provides a product called App Engine. The description from their site follows, "Google App Engine enables you to build and host web apps on the same systems that power Google applications. App Engine offers fast development and deployment; simple administration, with no need to worry about hardware, patches or backups; and effortless scalability. " You can deploy applications written in either Python or JAVA. There are free and paid versions of the service. The Google App Engine would appear to be a powerful source of CPU cycles for scientific computing. Unfortunately this is currently not the case because numpy is not one of the supported libraries. The Python App Engine allows only the installation of user supplied pure Python code. I have recently returned from attending the Google I/O conference in San Francisco. While there I inquired into the possibility of getting numpy added. The basic response was that there doesn't appear to be much interest from the community given the amount of work it would take to vet and add numpy. I would like to ask your help in changing this perception. 
The quickest and easiest thing you can do would be to add your "me too" to this feature request (item #190) on the support site: http://code.google.com/p/googleappengine/issues/detail?id=190 If this issue is important to you could also consider raising this issue in the related Google Group: http://groups.google.com/group/google-appengine Letting Google know how you will use numpy would be helpful. If you or your institution would be willing to pay for service if you could deploy cloud applications that required numpy would be helpful to let them know as well. Finally, if you run into any App Engine developers (Guido included) let them know that you would like to see numpy added. Thank you for your time and consideration. Chris -- Christopher Hanley Senior Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From robert.kern at gmail.com Wed May 26 12:54:17 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 May 2010 12:54:17 -0400 Subject: [SciPy-User] [SciPy-Dev] numpy and the Google App Engine In-Reply-To: References: Message-ID: On Wed, May 26, 2010 at 12:19, Christopher Hanley wrote: > Greetings, > > Google provides a product called App Engine. ?The description from > their site follows, > > "Google App Engine enables you to build and host web apps on the same > systems that power Google applications. > App Engine offers fast development and deployment; simple > administration, with no need to worry about hardware, > patches or backups; and effortless scalability. " > > You can deploy applications written in either Python or JAVA. ?There > are free and paid versions of the service. > > The Google App Engine would appear to be a powerful source of CPU > cycles for scientific computing. Not really. It is not intended for such purposes. It is intended for the easy deployment and horizontal scaling of web applications. Each individual request is very short; it is limited to 10 seconds of CPU time. While numpy would be useful for scientific web applications (not least because it would help you keep to that 10 second limit when doing things like simple image processing or summary statistics or whatever), it is not a source of CPU cycles. Services like Amazon EC2 or Rackspace Cloud are much closer to what you want. PiCloud provides an even nicer interface for you: http://www.picloud.com/ Disclosure: Enthought partners with PiCloud to provide most EPD libraries. I can't say I'm disinterested in promoting it, but it *is* a really powerful product that *does* provide CPU cycles for scientific computing with an interface much more suited to it than GAE. >?Unfortunately this is currently not > the case because numpy is not one of the supported libraries. ?The > Python App Engine allows only the installation of user supplied pure > Python code. > > I have recently returned from attending the Google I/O conference in > San Francisco. ?While there I inquired into the possibility of getting > numpy added. ?The basic response was that there doesn't appear to be > much interest from the community given the amount of work it would > take to vet and add numpy. > > I would like to ask your help in changing this perception. > > The quickest and easiest thing you can do would be to add your "me > too" to this feature request (item #190) on the support site: > > http://code.google.com/p/googleappengine/issues/detail?id=190 My understanding is that they hate "me too" comments. They ask that you star the issue instead. 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mdekauwe at gmail.com Wed May 26 17:03:03 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Wed, 26 May 2010 14:03:03 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> Message-ID: <28686356.post@talk.nabble.com> Could you possibly if you have time explain further your comment re the p-values, your suggesting I am misusing them? Thanks. josef.pktd wrote: > > On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >> >> Sounds like I am stuck with the loop as I need to do the comparison for >> each >> pixel of the world and then I have a basemap function call which I guess >> slows it down further...hmm > > I don't see much that could be done differently, after a brief look. > > stats.pearsonr could be replaced by an array version using directly > the formula for correlation even with nans. wilcoxon looks slow, and I > never tried or seen a faster version. > > just a reminder, the p-values are for a single test, when you have > many of them, then they don't have the right size/confidence level for > an overall or joint test. (some packages report a Bonferroni > correction in this case) > > Josef > > >> >> i.e. >> >> def compareSnowData(jules_var): >> ? ?# Extract the 11 years of snow data and return >> ? ?outrows = 180 >> ? ?outcols = 360 >> ? ?numyears = 11 >> ? ?nummonths = 12 >> >> ? ?# Read various files >> ? ?fname="world_valid_jules_pts.ascii" >> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >> jo.read_land_points_ascii(fname, 1.0) >> >> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >> >> ? ?# grab some space >> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >> dtype=np.float32) >> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >> dtype=np.float32) >> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >> np.nan >> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >> np.nan >> >> ? ?# extract the data >> ? ?data1_snow = jules_data1[:,jules_var,:,0] >> ? ?data2_snow = jules_data2[:,jules_var,:,0] >> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >> ? ?#for month in xrange(numyears * nummonths): >> ? ?# ? ?for i in xrange(numpts): >> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >> ? ?# ? ? ? ?if data1 >= 0.0: >> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >> ? ?# ? ? ? ?else: >> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >> ? ?# ? ? ? ?if data2 > 0.0: >> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >> ? ?# ? ? ? ?else: >> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >> >> ? ?# exclude any months from *both* arrays where we have dodgy data, else >> we >> ? ?# can't do the correlations correctly!! >> ? 
?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >> >> ? ?# put data on a regular grid... >> ? ?print 'regridding landpts...' >> ? ?for i in xrange(numpts): >> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >> func >> ? ? ? ?x = data1_snow[:,i] >> ? ? ? ?x = x[np.isfinite(x)] >> ? ? ? ?y = data2_snow[:,i] >> ? ? ? ?y = y[np.isfinite(y)] >> >> ? ? ? ?# r^2 >> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >> data >> ? ? ? ?if len(x) and len(y) > 50: >> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >> (stats.pearsonr(x, y)[0])**2 >> >> ? ? ? ?# wilcox signed rank test >> ? ? ? ?# make sure we have enough samples to do the test >> ? ? ? ?d = x - y >> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >> differences >> ? ? ? ?count = len(d) >> ? ? ? ?if count > 10: >> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >> ? ? ? ? ? ?# only map out sign different data >> ? ? ? ? ? ?if pval < 0.05: >> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >> np.mean(x - y) >> >> ? ?return (pearsonsr_snow, wilcoxStats_snow) >> >> >> josef.pktd wrote: >>> >>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>> >>>> Also I then need to remap the 2D array I make onto another grid (the >>>> world in >>>> this case). Which again I had am doing with a loop (note numpts is a >>>> lot >>>> bigger than my example above). >>>> >>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>> np.nan >>>> for i in xrange(numpts): >>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>> func >>>> ? ? ? ?x = data1_snow[:,i] >>>> ? ? ? ?x = x[np.isfinite(x)] >>>> ? ? ? ?y = data2_snow[:,i] >>>> ? ? ? ?y = y[np.isfinite(y)] >>>> >>>> ? ? ? ?# wilcox signed rank test >>>> ? ? ? ?# make sure we have enough samples to do the test >>>> ? ? ? ?d = x - y >>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>> non-zero >>>> differences >>>> ? ? ? ?count = len(d) >>>> ? ? ? ?if count > 10: >>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>> ? ? ? ? ? ?# only map out sign different data >>>> ? ? ? ? ? ?if pval < 0.05: >>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>> np.mean(x - y) >>>> >>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>> array >>>> by removing the index, >>>> but I can't see how I will get the individual x and y pts for each >>>> array >>>> member correctly without the loop, this was my attempt which of course >>>> doesn't work! >>>> >>>> x = data1_snow[:,:] >>>> x = x[np.isfinite(x)] >>>> y = data2_snow[:,:] >>>> y = y[np.isfinite(y)] >>>> >>>> # r^2 >>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>> if len(x) and len(y) > 50: >>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>> y)[0])**2 >>> >>> >>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>> at a time (if I read the help correctly). >>> >>> Also the presence of nans might force the use a loop. stats.mstats has >>> masked array versions, but I didn't see wilcoxon in the list. (Even >>> when vectorized operations would work with regular arrays, nan or >>> masked array versions still have to loop in many cases.) 
>>> >>> If you have many columns with count <= 10, so that wilcoxon is not >>> calculated then it might be worth to use only array operations up to >>> that point. If wilcoxon is calculated most of the time, then it's not >>> worth thinking too hard about this. >>> >>> Josef >>> >>> >>>> >>>> thanks. >>>> >>>> >>>> >>>> >>>> mdekauwe wrote: >>>>> >>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>> >>>>> I don't quite get what you mean about slicing with axis > 3. Is there >>>>> a >>>>> link you can recommend I should read? Does that mean given I have >>>>> 4dims >>>>> that Josef's suggestion would be more advised in this case? >>> >>> There were several discussions on the mailing lists (fancy slicing and >>> indexing). Your case is safe, but if you run in future into funny >>> shapes, you can look up the details. >>> when in doubt, I use np.arange(...) >>> >>> Josef >>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>> wrote: >>>>>>> >>>>>>> Thanks that works... >>>>>>> >>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>>>> step >>>>>>> I >>>>>>> was struggling with, so this forms a 2D array which replaces the the >>>>>>> two >>>>>>> for >>>>>>> loops? Do I have that right? >>>>>> >>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>> dimension, >>>>>> then you can use slicing. It might be faster. >>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>> dimensions can have some surprise switching of axes. >>>>>> >>>>>> Josef >>>>>> >>>>>>> >>>>>>> A lot quicker...! >>>>>>> >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>> array, >>>>>>>>> but >>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>> the >>>>>>>>> arrays >>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>> solution >>>>>>>>> as >>>>>>>>> well >>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>> difficult >>>>>>>>> to >>>>>>>>> get >>>>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>>>> one >>>>>>>>> could >>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>> >>>>>>>>> i = np.arange(tsteps) >>>>>>>>> j = np.arange(numpts) >>>>>>>>> >>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> import numpy as np >>>>>>>>> >>>>>>>>> numpts=10 >>>>>>>>> tsteps = 12 >>>>>>>>> vari = 22 >>>>>>>>> >>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>> index = np.arange(numpts) >>>>>>>>> >>>>>>>>> for i in xrange(tsteps): >>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>> >>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>> >>>>>>>> I think this should do it >>>>>>>> >>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>>>> >>>>>>>> Josef >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Wed May 26 18:43:52 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 May 2010 18:43:52 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28686356.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> Message-ID: On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: > > Could you possibly if you have time explain further your comment re the > p-values, your suggesting I am misusing them? Depends on your use and interpretation test statistics, p-values are random variables, if you look at several tests at the same time, some p-values will be large just by chance. If, for example you just look at the largest test statistic, then the distribution for the max of several test statistics is not the same as the distribution for a single test statistic http://en.wikipedia.org/wiki/Multiple_comparisons http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm we also just had a related discussion for ANOVA post-hoc tests on the pystatsmodels group. Josef > > Thanks. 
> > > josef.pktd wrote: >> >> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>> >>> Sounds like I am stuck with the loop as I need to do the comparison for >>> each >>> pixel of the world and then I have a basemap function call which I guess >>> slows it down further...hmm >> >> I don't see much that could be done differently, after a brief look. >> >> stats.pearsonr could be replaced by an array version using directly >> the formula for correlation even with nans. wilcoxon looks slow, and I >> never tried or seen a faster version. >> >> just a reminder, the p-values are for a single test, when you have >> many of them, then they don't have the right size/confidence level for >> an overall or joint test. (some packages report a Bonferroni >> correction in this case) >> >> Josef >> >> >>> >>> i.e. >>> >>> def compareSnowData(jules_var): >>> ? ?# Extract the 11 years of snow data and return >>> ? ?outrows = 180 >>> ? ?outcols = 360 >>> ? ?numyears = 11 >>> ? ?nummonths = 12 >>> >>> ? ?# Read various files >>> ? ?fname="world_valid_jules_pts.ascii" >>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>> jo.read_land_points_ascii(fname, 1.0) >>> >>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \ >>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>> >>> ? ?# grab some space >>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>> dtype=np.float32) >>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>> dtype=np.float32) >>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>> np.nan >>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>> np.nan >>> >>> ? ?# extract the data >>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>> ? ?#for month in xrange(numyears * nummonths): >>> ? ?# ? ?for i in xrange(numpts): >>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>> ? ?# ? ? ? ?if data1 >= 0.0: >>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>> ? ?# ? ? ? ?else: >>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>> ? ?# ? ? ? ?if data2 > 0.0: >>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>> ? ?# ? ? ? ?else: >>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>> >>> ? ?# exclude any months from *both* arrays where we have dodgy data, else >>> we >>> ? ?# can't do the correlations correctly!! >>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>> >>> ? ?# put data on a regular grid... >>> ? ?print 'regridding landpts...' >>> ? ?for i in xrange(numpts): >>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>> func >>> ? ? ? ?x = data1_snow[:,i] >>> ? ? ? ?x = x[np.isfinite(x)] >>> ? ? ? ?y = data2_snow[:,i] >>> ? ? ? ?y = y[np.isfinite(y)] >>> >>> ? ? ? ?# r^2 >>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >>> data >>> ? ? ? ?if len(x) and len(y) > 50: >>> ? ? ? ? ? 
?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>> (stats.pearsonr(x, y)[0])**2 >>> >>> ? ? ? ?# wilcox signed rank test >>> ? ? ? ?# make sure we have enough samples to do the test >>> ? ? ? ?d = x - y >>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero >>> differences >>> ? ? ? ?count = len(d) >>> ? ? ? ?if count > 10: >>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>> ? ? ? ? ? ?# only map out sign different data >>> ? ? ? ? ? ?if pval < 0.05: >>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>> np.mean(x - y) >>> >>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>>> >>>>> Also I then need to remap the 2D array I make onto another grid (the >>>>> world in >>>>> this case). Which again I had am doing with a loop (note numpts is a >>>>> lot >>>>> bigger than my example above). >>>>> >>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>> np.nan >>>>> for i in xrange(numpts): >>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>> func >>>>> ? ? ? ?x = data1_snow[:,i] >>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>> ? ? ? ?y = data2_snow[:,i] >>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>> >>>>> ? ? ? ?# wilcox signed rank test >>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>> ? ? ? ?d = x - y >>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>> non-zero >>>>> differences >>>>> ? ? ? ?count = len(d) >>>>> ? ? ? ?if count > 10: >>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>> ? ? ? ? ? ?# only map out sign different data >>>>> ? ? ? ? ? ?if pval < 0.05: >>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>> np.mean(x - y) >>>>> >>>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>>> array >>>>> by removing the index, >>>>> but I can't see how I will get the individual x and y pts for each >>>>> array >>>>> member correctly without the loop, this was my attempt which of course >>>>> doesn't work! >>>>> >>>>> x = data1_snow[:,:] >>>>> x = x[np.isfinite(x)] >>>>> y = data2_snow[:,:] >>>>> y = y[np.isfinite(y)] >>>>> >>>>> # r^2 >>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>>> if len(x) and len(y) > 50: >>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>> y)[0])**2 >>>> >>>> >>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>>> at a time (if I read the help correctly). >>>> >>>> Also the presence of nans might force the use a loop. stats.mstats has >>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>> when vectorized operations would work with regular arrays, nan or >>>> masked array versions still have to loop in many cases.) >>>> >>>> If you have many columns with count <= 10, so that wilcoxon is not >>>> calculated then it might be worth to use only array operations up to >>>> that point. If wilcoxon is calculated most of the time, then it's not >>>> worth thinking too hard about this. >>>> >>>> Josef >>>> >>>> >>>>> >>>>> thanks. >>>>> >>>>> >>>>> >>>>> >>>>> mdekauwe wrote: >>>>>> >>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>>> >>>>>> I don't quite get what you mean about slicing with axis > 3. Is there >>>>>> a >>>>>> link you can recommend I should read? 
Does that mean given I have >>>>>> 4dims >>>>>> that Josef's suggestion would be more advised in this case? >>>> >>>> There were several discussions on the mailing lists (fancy slicing and >>>> indexing). Your case is safe, but if you run in future into funny >>>> shapes, you can look up the details. >>>> when in doubt, I use np.arange(...) >>>> >>>> Josef >>>> >>>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>> wrote: >>>>>>>> >>>>>>>> Thanks that works... >>>>>>>> >>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the >>>>>>>> step >>>>>>>> I >>>>>>>> was struggling with, so this forms a 2D array which replaces the the >>>>>>>> two >>>>>>>> for >>>>>>>> loops? Do I have that right? >>>>>>> >>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>> dimension, >>>>>>> then you can use slicing. It might be faster. >>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>> dimensions can have some surprise switching of axes. >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>>> >>>>>>>> A lot quicker...! >>>>>>>> >>>>>>>> Martin >>>>>>>> >>>>>>>> >>>>>>>> josef.pktd wrote: >>>>>>>>> >>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>>> array, >>>>>>>>>> but >>>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>>> the >>>>>>>>>> arrays >>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>> solution >>>>>>>>>> as >>>>>>>>>> well >>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>> difficult >>>>>>>>>> to >>>>>>>>>> get >>>>>>>>>> over the habit of using loops (C convert for my sins). I get that >>>>>>>>>> one >>>>>>>>>> could >>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>> >>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>> j = np.arange(numpts) >>>>>>>>>> >>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> import numpy as np >>>>>>>>>> >>>>>>>>>> numpts=10 >>>>>>>>>> tsteps = 12 >>>>>>>>>> vari = 22 >>>>>>>>>> >>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>> index = np.arange(numpts) >>>>>>>>>> >>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>> >>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>> >>>>>>>>> I think this should do it >>>>>>>>> >>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0] >>>>>>>>> >>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> View this message in context: >>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
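A minimal loop-free version of the 4-D extraction discussed above, for
reference. It only restates the two suggestions already made in the
thread (broadcast index arrays, or plain slicing when the full range of
an axis is wanted), using the toy sizes from the original post:

import numpy as np

tsteps, vari, numpts = 12, 22, 10
data = np.random.random((tsteps, vari, numpts, 1))

# original double loop, kept for comparison
new_data = np.zeros((tsteps, numpts), dtype=np.float32)
for i in xrange(tsteps):
    for j in xrange(numpts):
        new_data[i, j] = data[i, 5, j, 0]

# fancy indexing: a (tsteps, 1) index array broadcast against a (numpts,) one
fancy = data[np.arange(tsteps)[:, None], 5, np.arange(numpts), 0]

# plain slicing also works here, because every element of both axes is taken
sliced = data[:, 5, :, 0]

print np.allclose(new_data, fancy), np.allclose(new_data, sliced)

Both loop-free forms agree with the double loop here; as noted above,
mixing slices with index arrays on arrays of three or more dimensions can
reorder axes in surprising ways, so a cheap np.allclose check against the
loop is good insurance.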
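And a rough sketch of the array version of the correlation mentioned
further up, computing r directly from its formula, one grid point
(column) at a time, with NaNs masked out. Untested against
stats.pearsonr; it assumes data1_snow and data2_snow are (months, points)
float arrays in which a bad month is NaN in both arrays, and that rows
and cols are integer index arrays, as in the function above:

import numpy as np

def pearsonr_columns(x, y, min_n=50):
    """Correlation of each column of x with the same column of y, ignoring NaNs.

    x, y : 2-d arrays, shape (nobs, npoints), with NaN marking missing data
    (missing in either array should be missing in both, as above).
    Columns with min_n or fewer valid rows get NaN, like the '> 50'
    cut-off in the loop version.
    """
    valid = np.isfinite(x) & np.isfinite(y)
    n = valid.sum(axis=0).astype(float)
    n[n == 0] = np.nan            # avoid 0/0 warnings for all-missing columns
    xz = np.where(valid, x, 0.0)
    yz = np.where(valid, y, 0.0)
    sx, sy = xz.sum(axis=0), yz.sum(axis=0)
    sxx = (xz * xz).sum(axis=0)
    syy = (yz * yz).sum(axis=0)
    sxy = (xz * yz).sum(axis=0)
    cov = sxy - sx * sy / n
    varx = sxx - sx * sx / n
    vary = syy - sy * sy / n
    r = cov / np.sqrt(varx * vary)
    return np.where(n > min_n, r, np.nan)

# r2 = pearsonr_columns(data1_snow, data2_snow) ** 2
# and the regridding can then be done in one assignment:
# pearsonsr_snow[(180 - 1) - (rows - 1), cols - 1] = r2

wilcoxon would still need the loop, but, as Josef says, the columns with
too few valid months can be skipped cheaply by computing n with array
operations first.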
>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> View this message in context: >>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From david at silveregg.co.jp Wed May 26 21:21:04 2010 From: david at silveregg.co.jp (David) Date: Thu, 27 May 2010 10:21:04 +0900 Subject: [SciPy-User] [SciPy-Dev] numpy and the Google App Engine In-Reply-To: References: Message-ID: <4BFDC900.8030802@silveregg.co.jp> On 05/27/2010 01:54 AM, Robert Kern wrote: > Not really. It is not intended for such purposes. It is intended for > the easy deployment and horizontal scaling of web applications. Each > individual request is very short; it is limited to 10 seconds of CPU > time. While numpy would be useful for scientific web applications (not > least because it would help you keep to that 10 second limit when > doing things like simple image processing or summary statistics or > whatever), it is not a source of CPU cycles. Besides what Robert said, I would also mention the datastore limitations given by the Google App Engine (no big blob of data, high latency, etc...) which make it quite hard to do something non-trivial even assuming numpy were available. 
It is also my understanding that EC2 is pretty competitive compared to GAE (but of course GAE does more for you). I did not know about picloud until two days ago, and I have only used GAE for a couple of months at work, but picloud seems like a much more usable service for scientific computing to me. cheers, David From chanley at stsci.edu Wed May 26 21:30:56 2010 From: chanley at stsci.edu (Christopher Hanley) Date: Wed, 26 May 2010 21:30:56 -0400 Subject: [SciPy-User] [SciPy-Dev] numpy and the Google App Engine In-Reply-To: <4BFDC900.8030802@silveregg.co.jp> References: <4BFDC900.8030802@silveregg.co.jp> Message-ID: On Wed, May 26, 2010 at 9:21 PM, David wrote: > On 05/27/2010 01:54 AM, Robert Kern wrote: > >> Not really. It is not intended for such purposes. It is intended for >> the easy deployment and horizontal scaling of web applications. Each >> individual request is very short; it is limited to 10 seconds of CPU >> time. While numpy would be useful for scientific web applications (not >> least because it would help you keep to that 10 second limit when >> doing things like simple image processing or summary statistics or >> whatever), it is not a source of CPU cycles. > > Besides what Robert said, I would also mention the datastore limitations > given by the Google App Engine (no big blob of data, high latency, > etc...) which make it quite hard to do something non-trivial even > assuming numpy were available. > > It is also my understanding that EC2 is pretty competitive compared to > GAE (but of course GAE does more for you). I did not know about picloud > until two days ago, and I have only used GAE for a couple of months at > work, but picloud seems like a much more usable service for scientific > computing to me. > > cheers, > > David If you are looking for a large data blob for GAE you might want to sign up for the Google Storage for Developers service. It was just introduced at Google I/O. You can find the project here: http://code.google.com/apis/storage/ Chris -- Christopher Hanley Senior Systems Software Engineer Space Telescope Science Institute 3700 San Martin Drive Baltimore MD, 21218 (410) 338-4338 From linda.polman at gmail.com Thu May 27 04:09:12 2010 From: linda.polman at gmail.com (Linda) Date: Thu, 27 May 2010 10:09:12 +0200 Subject: [SciPy-User] finding frequency of wav Message-ID: Hello all, I have a digital signal where the bits in it are encoded with frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a samplerate of 22050. My goal is to find the bits again so I can decode the message in it, for that I have chopped the wav up in pieces of 18 samples, which would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have a list of chunks of length 18. I thought I could just fft each chunks and find the max of the chunk-spectrum, to find out the bitfrequency in the chunk (and thus the bitvalue) But somehow I am stuck in the numbers, I was hopeing you could give me a hint. here is what I have: chunks[3] #this is one of the wavchunks, there should be a bit hidden in here Out[98]: array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, 0, 0], dtype=int16) test = fft(chunks[3]) # spectrum of the chunk, the peak should give me the value of the bitfrequency 1300 of 2100 Hz? 
test Out[100]: array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) I am unsure how to proceed from here, so I would really appreciate any tips.. I found fftfreq, but I am not sure how to use it? I read fftfreq? but I don't see how the example even uses the 'fourier' variable in the fftfreq there? Thanks in advance Linda -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Thu May 27 09:28:42 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 27 May 2010 10:28:42 -0300 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: Message-ID: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : > Hello all, > I have a digital signal where the bits in it are encoded with > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a > samplerate of 22050. > My goal is to find the bits again so I can decode the message in it, > for that I have chopped the wav up in pieces of 18 samples, which > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have > a list of chunks of length 18. I thought I could just fft each chunks > and find the max of the chunk-spectrum, to find out the bitfrequency > in the chunk (and thus the bitvalue) Correct me if I am wrong. You are cutting your signal into chunks that you expect to contain at least one period of the lower coding frequency. You then perform a fft on a very small signal (18 samples) which gives you (without zero padding) an estimation of the Fourier transform of your chunk computed on only 18 frequencies, i.e. with a really bad frequential resolution. It is possible if your coding frequencies are not too close. A raw Rayleight criteria leads to cut your signal into at least N=2*Fe/df_min where df_min is the minimal spacing between two coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a power of 2). > > But somehow I am stuck in the numbers, I was hopeing you could give me > a hint. here is what I have: > chunks[3] #this is one of the wavchunks, there should be a bit hidden in here > Out[98]: > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, 0, 0], dtype=int16) > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me the value of the bitfrequency 1300 of 2100 Hz? > test > Out[100]: > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) > > > I am unsure how to proceed from here, so I would really appreciate any > tips.. I found fftfreq, but I am not sure how to use it? I read > fftfreq? 
but I don't see how the example even uses the 'fourier' > variable in the fftfreq there? > Fftfreq is a function that constructs the frequency vector associated to the data computed by the fft algorithm. It is aware of how fft orders the frequency bins, and transform it in a more convenient way (it 'anti-aliases', centering the results on zero frequency). import numpy as np import matplotlib.pyplot as plt chunks[3]=.... test = np.fft.fft(chunks[3]) frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling period plt.plot(frequencies, np.abs(test), 'o') plt.show() but you won't see any things on this fft. I am suspicious due to the fact that the signal to noise ratio seems rather low leading to strong peak at Fe/2 In chunk[3], what do you expect to be the bit? Fabricio From josef.pktd at gmail.com Thu May 27 10:38:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 May 2010 10:38:46 -0400 Subject: [SciPy-User] script to rst converter for examples, tutorials Message-ID: a bit off-topic I would like to write a tutorial in a python script file that can then be converted to restructured text for sphinx. I have seen various versions in the past, the most recent in pymvpa. However, I would also like to automatically include interactive output or print-results as an option, which doesn't seem possible with the pymvpa version. http://pypi.python.org/pypi/Pweave looks interesting but is not based on a valid python script. Are there any recommendations? Thanks, Josef From linda.polman at gmail.com Thu May 27 10:42:05 2010 From: linda.polman at gmail.com (Linda) Date: Thu, 27 May 2010 16:42:05 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: Thanks for your reply. The explanation on fftfreq already made a few puzzle pieces fall into place. The signal I am trying to decode is a DSC transmission that is recorded in a wav file. (Digital Selective Calling, used in marine radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and there's a carrier at 1700Hz. That should be all frequencies involved (apart from noise). Currently I am used generated, clean signals. But probably I should get a clean '10101010'-signal first to try my work on. Since the bitrate is set at 1200bits/sec, the bit length would be samplerate/1200 = 18.4 samples at 22050. I can double the samplerate to 44100, but that still leaves me at only 36.8 samples per chunk. If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? I'm not sure what chunk[3] would have been, I should have used a dotting-signal instead of an unknown message to try this on. I will try this again with more useful data this afternoon. cheers, Linda On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: > Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : > > Hello all, > > > I have a digital signal where the bits in it are encoded with > > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a > > samplerate of 22050. > > My goal is to find the bits again so I can decode the message in it, > > for that I have chopped the wav up in pieces of 18 samples, which > > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have > > a list of chunks of length 18. 
I thought I could just fft each chunks > > and find the max of the chunk-spectrum, to find out the bitfrequency > > in the chunk (and thus the bitvalue) > > Correct me if I am wrong. You are cutting your signal into chunks that > you expect to contain at least one period of the lower coding frequency. > You then perform a fft on a very small signal (18 samples) which gives > you (without zero padding) an estimation of the Fourier transform of > your chunk computed on only 18 frequencies, i.e. with a really bad > frequential resolution. It is possible if your coding frequencies are > not too close. A raw Rayleight criteria leads to cut your signal into at > least N=2*Fe/df_min where df_min is the minimal spacing between two > coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a > power of 2). > > > > But somehow I am stuck in the numbers, I was hopeing you could give me > > a hint. here is what I have: > > > chunks[3] #this is one of the wavchunks, there should be a bit hidden in > here > > Out[98]: > > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, > 0, 0], dtype=int16) > > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me > the value of the bitfrequency 1300 of 2100 Hz? > > test > > Out[100]: > > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, > > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, > > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, > > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, > > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, > > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, > > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, > > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, > > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) > > > > > > I am unsure how to proceed from here, so I would really appreciate any > > tips.. I found fftfreq, but I am not sure how to use it? I read > > fftfreq? but I don't see how the example even uses the 'fourier' > > variable in the fftfreq there? > > > Fftfreq is a function that constructs the frequency vector associated to > the data computed by the fft algorithm. It is aware of how fft orders > the frequency bins, and transform it in a more convenient way (it > 'anti-aliases', centering the results on zero frequency). > > import numpy as np > import matplotlib.pyplot as plt > chunks[3]=.... > test = np.fft.fft(chunks[3]) > frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling > period > plt.plot(frequencies, np.abs(test), 'o') > plt.show() > > but you won't see any things on this fft. I am suspicious due to the > fact that the signal to noise ratio seems rather low leading to strong > peak at Fe/2 > In chunk[3], what do you expect to be the bit? > > Fabricio > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Thu May 27 11:05:33 2010 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 27 May 2010 11:05:33 -0400 Subject: [SciPy-User] script to rst converter for examples, tutorials In-Reply-To: References: Message-ID: <4BFE8A3D.9070103@american.edu> pylit? http://pylit.berlios.de/literate-programming/index.html pyreport? 
http://gael-varoquaux.info/computers/pyreport/ Alan From silva at lma.cnrs-mrs.fr Thu May 27 11:02:21 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 27 May 2010 12:02:21 -0300 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <1274972542.2121.60.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 27 mai 2010 ? 16:42 +0200, Linda a ?crit : > The signal I am trying to decode is a DSC transmission that is > recorded in a wav file. (Digital Selective Calling, used in marine > radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is > 1300 Hz and there's a carrier at 1700Hz. That should be all > frequencies involved (apart from noise). Currently I am used > generated, clean signals. But probably I should get a clean > '10101010'-signal first to try my work on. > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. I can double the samplerate > to 44100, but that still leaves me at only 36.8 samples per chunk. Then a chuck is what you can consider a stationary signal with a single frequency. Due to the bitrate, it has 18 samples, so that, due to the limited observation time range, its Fourier transform is a lobe centered on 2100 or 1300Hz whose relative bandwidth is the inverse of the number of periods that whould have been observed (neglecting noise). Result : width = frequency/#periods = 1300/{almost 1} i.e. the very limited number of samples in each sample lead to, in frequency domain, a lobe whose width is almost the same as the central frequency! > If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? In my previous answer, I only consider a very raw Rayleigth criteria: two coding frequencies would at least be spaced by two computed fft frequencies. Zero padding can easily deal with this problem. But the width of the lobes (linked with the limited number of periods observed) may be a lot more tricky to solve. But you can still use estimation methods based for example on the spectrum centro?d to determine whether the bit is now on 0 or 1. > > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will > try this again with more useful data this afternoon. From silva at lma.cnrs-mrs.fr Thu May 27 11:42:45 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 27 May 2010 12:42:45 -0300 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <1274974965.4445.9.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 27 mai 2010 ? 16:42 +0200, Linda a ?crit : > The signal I am trying to decode is a DSC transmission that is > recorded in a wav file. (Digital Selective Calling, used in marine > radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is > 1300 Hz and there's a carrier at 1700Hz. That should be all > frequencies involved (apart from noise). Currently I am used > generated, clean signals. But probably I should get a clean > '10101010'-signal first to try my work on. > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. I can double the samplerate > to 44100, but that still leaves me at only 36.8 samples per chunk. Then a chuck is what you can consider a stationary signal with a single frequency. 
Due to the bitrate and the sampling frequency, it has only 18 samples, so that, its Fourier transform is a lobe centered on 2100 or 1300Hz whose relative bandwidth is the inverse of the number of periods that whould have been observed (neglecting noise). Result : width = frequency/#periods = 1300/{almost 1} Hz i.e. the very limited number of samples in each sample lead to, in frequency domain, a lobe whose width is almost the same as the central frequency! because the bitrate and carrier are close... > If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? In my previous answer, I only consider a very raw Rayleigth criteria: two coding frequencies would at least be spaced by two computed fft frequencies. Zero padding can easily deal with this problem. But the width of the lobes (linked with the limited number of periods observed) may be a lot more tricky to solve. But you can still use estimation methods based for example on the spectrum centro?d to determine whether the bit is now on 0 or 1. > > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will > try this again with more useful data this afternoon. From linda.polman at gmail.com Thu May 27 12:44:16 2010 From: linda.polman at gmail.com (Linda) Date: Thu, 27 May 2010 18:44:16 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <1274974965.4445.9.camel@Portable-s2m.cnrs-mrs.fr> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <1274974965.4445.9.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: Thanks again for you explanation :-) This certainly helps. On Thu, May 27, 2010 at 17:42, Fabrice Silva wrote: > Le jeudi 27 mai 2010 ? 16:42 +0200, Linda a ?crit : > > > The signal I am trying to decode is a DSC transmission that is > > recorded in a wav file. (Digital Selective Calling, used in marine > > radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is > > 1300 Hz and there's a carrier at 1700Hz. That should be all > > frequencies involved (apart from noise). Currently I am used > > generated, clean signals. But probably I should get a clean > > '10101010'-signal first to try my work on. > > Since the bitrate is set at 1200bits/sec, the bit length would be > > samplerate/1200 = 18.4 samples at 22050. I can double the samplerate > > to 44100, but that still leaves me at only 36.8 samples per chunk. > > Then a chuck is what you can consider a stationary signal with a single > frequency. Due to the bitrate and the sampling frequency, it has only 18 > samples, so that, its Fourier transform is a lobe centered > on 2100 or 1300Hz whose relative bandwidth is the inverse of the number > of periods that whould have been observed (neglecting noise). > Result : width = frequency/#periods = 1300/{almost 1} Hz > i.e. the very limited number of samples in each sample lead to, in > frequency domain, a lobe whose width is almost the same as the central > frequency! because the bitrate and carrier are close... > > > If I understand what you say correctly, I would need at least 55 (64) > samples in each chunk? > In my previous answer, I only consider a very raw Rayleigth criteria: > two coding frequencies would at least be spaced by two computed fft > frequencies. Zero padding can easily deal with this problem. But the > width of the lobes (linked with the limited number of periods observed) > may be a lot more tricky to solve. 
But you can still use estimation > methods based for example on the spectrum centro?d to determine whether > the bit is now on 0 or 1. > > > > > > I'm not sure what chunk[3] would have been, I should have used a > > dotting-signal instead of an unknown message to try this on. I will > > try this again with more useful data this afternoon. > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From justin.t.riley at gmail.com Thu May 27 13:53:31 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Thu, 27 May 2010 13:53:31 -0400 Subject: [SciPy-User] StarCluster 0.91 - NumPy/SciPy Clusters on EC2 In-Reply-To: References: Message-ID: <4BFEB19B.5040804@gmail.com> This is a one-time message to announce the availability of version 0.91 of the StarCluster package. Why should you care? StarCluster allows you to create NumPy/SciPy clusters configured with NFS-shared filesystems and the Sun Grid Engine queueing system out of the box on Amazon's Elastic Compute Cloud (EC2). The NumPy/SciPy installations have been compiled against a custom-compiled ATLAS for the larger EC2 instances. About ----- There is an article about StarCluster on www.hpcinthecloud.com: http://www.hpcinthecloud.com/features/StarCluster-Brings-HPC-to-the-Amazon-Cloud-94099324.html There is also a screencast of installing, configuring, launching, and terminating an HPC cluster on Amazon EC2: http://www.hpcinthecloud.com/blogs/MITs-StarCluster-An-Update-with-Screencast-94599554.html Project description from PyPI: StarCluster is a utility for creating and managing scientific computing clusters hosted on Amazon's Elastic Compute Cloud (EC2). StarCluster utilizes Amazon's EC2 web service to create and destroy clusters of Linux virtual machines on demand. To get started, the user creates a simple configuration file with their AWS account details and a few cluster preferences (e.g. number of machines, machine type, ssh keypairs, etc). After creating the configuration file and running StarCluster's "start" command, a cluster of Linux machines configured with the Sun Grid Engine queuing system, an NFS-shared /home directory, and OpenMPI with password-less ssh is created and ready to go out-of-the-box. Running StarCluster's "stop" command will shutdown the cluster and stop paying for service. This allows the user to only pay for what they use. StarCluster provides a Ubuntu-based Amazon Machine Image (AMI) in 32bit and 64bit architectures. The AMI contains an optimized NumPy/SciPy/Atlas/Blas/Lapack installation compiled for the larger Amazon EC2 instance types. The AMI also comes with Sun Grid Engine (SGE) and OpenMPI compiled with SGE support. The public AMI can easily be customized by launching a single instance of the public AMI, installing additional software on the instance, and then using StarCluster can also utilize Amazon's Elastic Block Storage (EBS) volumes to provide persistent data storage for a cluster. EBS volumes allow you to store large amounts of data in the Amazon cloud and are also easy to back-up and replicate in the cloud. StarCluster will mount and NFS-share any volumes specified in the config. StarCluster's "createvolume" command provides the ability to automatically create, format, and partition new EBS volumes for use with StarCluster. 
Download -------- StarCluster is available on PyPI (http://pypi.python.org/pypi/StarCluster) and also on the project's website: http://web.mit.edu/starcluster You will find the docs as well as links to the StarCluster mailing list on the website. New in this version: -------------------- * support for launching and managing multiple clusters on EC2 * added "listclusters" command for showing all active clusters on EC2 * support for attaching and NFS-sharing multiple EBS volumes * added createimage and createvolume commands for easily creating new AMIs and EBS volumes for use with StarCluster * experimental support for launching clusters using spot instances * added support for StarCluster "plugins" that provide the ability to perform additional configuration/setup routines on top of StarCluster's default cluster configuration * added "listpublic" command for listing all available public StarCluser AMIs that can be used with StarCluster * bash/zsh command line completion for StarCluster's command line interface From david_baddeley at yahoo.com.au Thu May 27 17:12:12 2010 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 27 May 2010 14:12:12 -0700 (PDT) Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <792740.87711.qm@web33006.mail.mud.yahoo.com> Hi Linda, I probably wouldn't divide the signal up into chunks before procesing, and also suspect that the FFT might be the wrong tool for the job (I'd certainly take the fft of the whole signal just to check that you have the right frequencies in it though). The problem with dividing into chunks and processing each separately is that you don't necessarily know where each bit start and stops - your chunks are thus, more likely than not, going to be misaligned. I'd probably tackle the problem with a strategy directly analogous to that used in analogue circuitry for decoding PSK - I'd either mix the carrier out & do I-Q detection (multiply with a complex exponential and then look at the low pass filtered real & imaginary parts of the result), or just look for the two frequency components separately by multiplying with a complex exponential at each frequency & low pass filtering the amplitude (I'd probably use a boxcar filter the same length as your symbols/frames). After doing this you can then start to decide where your frame boundaries are. If you've filtered as described, you should just be able to start at some offset and then take every 18th value. hope this gives you some ideas, David ________________________________ From: Linda To: SciPy Users List Sent: Fri, 28 May, 2010 2:42:05 AM Subject: Re: [SciPy-User] finding frequency of wav Thanks for your reply. The explanation on fftfreq already made a few puzzle pieces fall into place. The signal I am trying to decode is a DSC transmission that is recorded in a wav file. (Digital Selective Calling, used in marine radio) It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and there's a carrier at 1700Hz. That should be all frequencies involved (apart from noise). Currently I am used generated, clean signals. But probably I should get a clean '10101010'-signal first to try my work on. Since the bitrate is set at 1200bits/sec, the bit length would be samplerate/1200 = 18.4 samples at 22050. I can double the samplerate to 44100, but that still leaves me at only 36.8 samples per chunk. If I understand what you say correctly, I would need at least 55 (64) samples in each chunk? 
I'm not sure what chunk[3] would have been, I should have used a dotting-signal instead of an unknown message to try this on. I will try this again with more useful data this afternoon. cheers, Linda On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : > >> Hello all, > >>> I have a digital signal where the bits in it are encoded with >>> frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >>> samplerate of 22050. >>> My goal is to find the bits again so I can decode the message in it, >>> for that I have chopped the wav up in pieces of 18 samples, which >>> would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >>> a list of chunks of length 18. I thought I could just fft each chunks >>> and find the max of the chunk-spectrum, to find out the bitfrequency >>> in the chunk (and thus the bitvalue) > >Correct me if I am wrong. You are cutting your signal into chunks that >>you expect to contain at least one period of the lower coding frequency. >>You then perform a fft on a very small signal (18 samples) which gives >>you (without zero padding) an estimation of the Fourier transform of >>your chunk computed on only 18 frequencies, i.e. with a really bad >>frequential resolution. It is possible if your coding frequencies are >>not too close. A raw Rayleight criteria leads to cut your signal into at >>least N=2*Fe/df_min where df_min is the minimal spacing between two >>coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >>power of 2). > >> >>> But somehow I am stuck in the numbers, I was hopeing you could give me >>> a hint. here is what I have: > >>> chunks[3] #this is one of the wavchunks, there should be a bit hidden in here >>> Out[98]: >>> array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, 0, 0], dtype=int16) >>> test = fft(chunks[3]) # spectrum of the chunk, the peak should give me the value of the bitfrequency 1300 of 2100 Hz? >>> test >>> Out[100]: >>> array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >>> 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >>> 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >>> -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >>> 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >>> 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >>> -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >>> 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >>> 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >>> >>> >>> I am unsure how to proceed from here, so I would really appreciate any >>> tips.. I found fftfreq, but I am not sure how to use it? I read >>> fftfreq? but I don't see how the example even uses the 'fourier' >>> variable in the fftfreq there? >>> >Fftfreq is a function that constructs the frequency vector associated to >>the data computed by the fft algorithm. It is aware of how fft orders >>the frequency bins, and transform it in a more convenient way (it >>'anti-aliases', centering the results on zero frequency). > >>import numpy as np >>import matplotlib.pyplot as plt >>chunks[3]=.... >>test = np.fft.fft(chunks[3]) >>frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling period >>plt.plot(frequencies, np.abs(test), 'o') >>plt.show() > >>but you won't see any things on this fft. I am suspicious due to the >>fact that the signal to noise ratio seems rather low leading to strong >>peak at Fe/2 >>In chunk[3], what do you expect to be the bit? 
> >>Fabricio > >>_______________________________________________ >>SciPy-User mailing list >SciPy-User at scipy.org >http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From afraser at lanl.gov Thu May 27 17:37:07 2010 From: afraser at lanl.gov (Andy Fraser) Date: Thu, 27 May 2010 15:37:07 -0600 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: (Robin's message of "Tue\, 25 May 2010 22\:50\:54 +0100") References: <8739xgndes.fsf@lanl.gov> Message-ID: <8763292fi4.fsf@lanl.gov> Thanks for the replies and pointers. I got multiprocessing.Pool to work, but it eats up memory and time. I append two implementation segments below. The multiprocessing version is about 33 times _slower_ than the single processor version. Unless I use a small number of processors, memory fills up and I kill the job to make the computer usable again. The following segments of code are inside a loop that steps over 115 lines of pixels. def func(job): return job[0].random_fork(job[1]) . . . . . . #Multiprocessing version: noise = numpy.random.standard_normal((N_particles,noise_df)) jobs = zip(self.particles,noise) self.particles = self.pool.map(func, jobs, self.chunk_size) return (m,v) . . . . . . #Single processing version noise = numpy.random.standard_normal((N_particles,noise_df)) jobs = zip(self.particles,noise) self.particles = map(func, jobs) return (m,v) -- Andy Fraser ISR-2 (MS:B244) afraser at lanl.gov Los Alamos National Laboratory 505 665 9448 Los Alamos, NM 87545 From zachary.pincus at yale.edu Thu May 27 23:13:20 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 27 May 2010 23:13:20 -0400 Subject: [SciPy-User] using multiple processors for particle filtering In-Reply-To: <8763292fi4.fsf@lanl.gov> References: <8739xgndes.fsf@lanl.gov> <8763292fi4.fsf@lanl.gov> Message-ID: <97132D07-D91C-45F1-BACD-AAE476E91F9F@yale.edu> > Thanks for the replies and pointers. I got multiprocessing.Pool to > work, but it eats up memory and time. I append two implementation > segments below. The multiprocessing version is about 33 times > _slower_ than the single processor version. Unless I use a small > number of processors, memory fills up and I kill the job to make the > computer usable again. The following segments of code are inside a > loop that steps over 115 lines of pixels. Several problems here: (1) I am sorry I didn't mention this earlier, but looking over your original email, it appears that your single-process code might be very inefficient: it seems to perturb each particle individually in a for- loop rather than working on an array of all the particles. Perhaps you should try to fix that before adding multiprocessing? Basically, you should hopefully be able to write random_fork to work on a number of particles at once using numpy broadcasting, etc. This way, the for- loop that steps through the elements is implemented in compiled C, rather than interpreted python. 
Check out various numpy tutorials for details, but here's the general gist: points = numpy.arange(6000).reshape((3000,2)) # 3000 x,y points perturbations = numpy.random.normal(size=(3000,2)) def perturb_bad(points, perturbations): for point, perturbation in zip(points, perturbations): point += perturbation def perturb_good(points, perturbations): points += perturbations timeit perturb_bad(points, perturbations) # 10 loops, best of 3: 18.7 milliseconds per loop timeit perturb_good(points, perturbations) # 10000 loops, best of 3: 161 microseconds per loop Compare this orders-of-magnitude gain to the at-best-8-fold gain you'd get from multiprocessing the bad code. Also note that "map" is basically just an interpreted for-loop under the hood: import operator timeit map(operator.add, points, perturbations) # 10 loops, best of 3: 18.7 milliseconds per loop The moral here is to avoid looping constructs in python when working with sets of numbers and instead use numpy operations that operate on lots of numbers with one python command. (2) From the slowdowns you report, it looks like overhead costs are completely dominating. For each job, the code and data need to be serialized (pickled, I think, is how the multiprocessing library handles it), written to a pipe, unpickled, executed, and the results need to be pickled, sent back, and unpickled. Perhaps using memmap to share state might be better? Or you can make sure that the function parameters and results can be very rapidly pickled and unpickled (single numpy arrays, e.g., not lists-of-sub-arrays or something). Still, tune the single-processor code first. Perhaps you can send more detailed code samples and folks on the list can offer some advice about how to make it numpy-friendly and fast. Zach On May 27, 2010, at 5:37 PM, Andy Fraser wrote: > Thanks for the replies and pointers. I got multiprocessing.Pool to > work, but it eats up memory and time. I append two implementation > segments below. The multiprocessing version is about 33 times > _slower_ than the single processor version. Unless I use a small > number of processors, memory fills up and I kill the job to make the > computer usable again. The following segments of code are inside a > loop that steps over 115 lines of pixels. > > def func(job): > return job[0].random_fork(job[1]) > > . > . > . > . > . > . > > > #Multiprocessing version: > > noise = numpy.random.standard_normal((N_particles,noise_df)) > jobs = zip(self.particles,noise) > self.particles = self.pool.map(func, jobs, self.chunk_size) > return (m,v) > > . > . > . > . > . > . > > #Single processing version > > noise = numpy.random.standard_normal((N_particles,noise_df)) > jobs = zip(self.particles,noise) > self.particles = map(func, jobs) > return (m,v) > > -- > Andy Fraser ISR-2 (MS:B244) > afraser at lanl.gov Los Alamos National Laboratory > 505 665 9448 Los Alamos, NM 87545 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From linda.polman at gmail.com Fri May 28 03:03:33 2010 From: linda.polman at gmail.com (Linda) Date: Fri, 28 May 2010 09:03:33 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <792740.87711.qm@web33006.mail.mud.yahoo.com> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Thank you, it certainly does give me ideas :-) I will look into this today. 
Linda On Thu, May 27, 2010 at 23:12, David Baddeley wrote: > Hi Linda, > > I probably wouldn't divide the signal up into chunks before procesing, and > also suspect that the FFT might be the wrong tool for the job (I'd certainly > take the fft of the whole signal just to check that you have the right > frequencies in it though). > > The problem with dividing into chunks and processing each separately is > that you don't necessarily know where each bit start and stops - your chunks > are thus, more likely than not, going to be misaligned. > > I'd probably tackle the problem with a strategy directly analogous to that > used in analogue circuitry for decoding PSK - I'd either mix the carrier out > & do I-Q detection (multiply with a complex exponential and then look at the > low pass filtered real & imaginary parts of the result), or just look for > the two frequency components separately by multiplying with a complex > exponential at each frequency & low pass filtering the amplitude (I'd > probably use a boxcar filter the same length as your symbols/frames). > > After doing this you can then start to decide where your frame boundaries > are. If you've filtered as described, you should just be able to start at > some offset and then take every 18th value. > > hope this gives you some ideas, > > David > ------------------------------ > *From:* Linda > *To:* SciPy Users List > *Sent:* Fri, 28 May, 2010 2:42:05 AM > *Subject:* Re: [SciPy-User] finding frequency of wav > > Thanks for your reply. The explanation on fftfreq already made a few puzzle > pieces fall into place. > > The signal I am trying to decode is a DSC transmission that is recorded in > a wav file. (Digital Selective Calling, used in marine radio) > It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and > there's a carrier at 1700Hz. That should be all frequencies involved (apart > from noise). Currently I am used generated, clean signals. But probably I > should get a clean '10101010'-signal first to try my work on. > > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. > I can double the samplerate to 44100, but that still leaves me at only 36.8 > samples per chunk. > If I understand what you say correctly, I would need at least 55 (64) > samples in each chunk? > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will try this > again with more useful data this afternoon. > > cheers, > Linda > > > On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: > >> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >> > Hello all, >> >> > I have a digital signal where the bits in it are encoded with >> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >> > samplerate of 22050. >> > My goal is to find the bits again so I can decode the message in it, >> > for that I have chopped the wav up in pieces of 18 samples, which >> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >> > a list of chunks of length 18. I thought I could just fft each chunks >> > and find the max of the chunk-spectrum, to find out the bitfrequency >> > in the chunk (and thus the bitvalue) >> >> Correct me if I am wrong. You are cutting your signal into chunks that >> you expect to contain at least one period of the lower coding frequency. 
>> You then perform a fft on a very small signal (18 samples) which gives >> you (without zero padding) an estimation of the Fourier transform of >> your chunk computed on only 18 frequencies, i.e. with a really bad >> frequential resolution. It is possible if your coding frequencies are >> not too close. A raw Rayleight criteria leads to cut your signal into at >> least N=2*Fe/df_min where df_min is the minimal spacing between two >> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >> power of 2). >> > >> > But somehow I am stuck in the numbers, I was hopeing you could give me >> > a hint. here is what I have: >> >> > chunks[3] #this is one of the wavchunks, there should be a bit hidden in >> here >> > Out[98]: >> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, >> 0, 0], dtype=int16) >> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >> the value of the bitfrequency 1300 of 2100 Hz? >> > test >> > Out[100]: >> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >> > >> > >> > I am unsure how to proceed from here, so I would really appreciate any >> > tips.. I found fftfreq, but I am not sure how to use it? I read >> > fftfreq? but I don't see how the example even uses the 'fourier' >> > variable in the fftfreq there? >> > >> Fftfreq is a function that constructs the frequency vector associated to >> the data computed by the fft algorithm. It is aware of how fft orders >> the frequency bins, and transform it in a more convenient way (it >> 'anti-aliases', centering the results on zero frequency). >> >> import numpy as np >> import matplotlib.pyplot as plt >> chunks[3]=.... >> test = np.fft.fft(chunks[3]) >> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >> period >> plt.plot(frequencies, np.abs(test), 'o') >> plt.show() >> >> but you won't see any things on this fft. I am suspicious due to the >> fact that the signal to noise ratio seems rather low leading to strong >> peak at Fe/2 >> In chunk[3], what do you expect to be the bit? >> >> Fabricio >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophermarkstrickland at gmail.com Fri May 28 07:29:26 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Fri, 28 May 2010 21:29:26 +1000 Subject: [SciPy-User] log pdf, cdf, etc Message-ID: Hi, When using any of the distributions of scipy.stats there does not seem to be the ability (or at least I cannot figure out how) to have the function return the log of the pdf, cdf, sf, etc. For statistical analysis this is essential. 
For instance suppose we are interested in an exponential distribution for a random variable x with a hyperparameter lambda there needs to be an option that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to calculate log(scipy.stats.expon.pdf(x,lambda)). Is there a way to do this using the distributions in scipy.stats? If there is not is it possible for me to suggest that this feature is added. There is such an excellent range of distributions, each with such an impressive range of options, it seems ashame to have to mostly manually code up the log of pdfs and often call the log of CDFs from R. Thanks, Chris. -------------- next part -------------- An HTML attachment was scrubbed... URL: From linda.polman at gmail.com Fri May 28 08:41:11 2010 From: linda.polman at gmail.com (Linda) Date: Fri, 28 May 2010 14:41:11 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: <792740.87711.qm@web33006.mail.mud.yahoo.com> References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Hello David, I decided that fft would not work on finding those bits, so I spent all day looking into your suggestions. I think my data signals are FSK (FM?) instead of PSK, there are no discontinuieties in the signals when I take a look at them in audacity. I'm not sure I completely understand the multiply with a complex exponential part. Should I multiply my data-array with exp( 1j * omega * T )? where omega would be 2 * pi * f_carrier (1700) and T would be 1/Fs? or the bitlength in samples? This task seems to be quite a bit more difficult than I initially thought (undergrad student) I would really appreciate some more help :-) Cheers Linda On Thu, May 27, 2010 at 23:12, David Baddeley wrote: > Hi Linda, > > I probably wouldn't divide the signal up into chunks before procesing, and > also suspect that the FFT might be the wrong tool for the job (I'd certainly > take the fft of the whole signal just to check that you have the right > frequencies in it though). > > The problem with dividing into chunks and processing each separately is > that you don't necessarily know where each bit start and stops - your chunks > are thus, more likely than not, going to be misaligned. > > I'd probably tackle the problem with a strategy directly analogous to that > used in analogue circuitry for decoding PSK - I'd either mix the carrier out > & do I-Q detection (multiply with a complex exponential and then look at the > low pass filtered real & imaginary parts of the result), or just look for > the two frequency components separately by multiplying with a complex > exponential at each frequency & low pass filtering the amplitude (I'd > probably use a boxcar filter the same length as your symbols/frames). > > After doing this you can then start to decide where your frame boundaries > are. If you've filtered as described, you should just be able to start at > some offset and then take every 18th value. > > hope this gives you some ideas, > > David > ------------------------------ > *From:* Linda > *To:* SciPy Users List > *Sent:* Fri, 28 May, 2010 2:42:05 AM > *Subject:* Re: [SciPy-User] finding frequency of wav > > Thanks for your reply. The explanation on fftfreq already made a few puzzle > pieces fall into place. > > The signal I am trying to decode is a DSC transmission that is recorded in > a wav file. 
(Digital Selective Calling, used in marine radio) > It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and > there's a carrier at 1700Hz. That should be all frequencies involved (apart > from noise). Currently I am used generated, clean signals. But probably I > should get a clean '10101010'-signal first to try my work on. > > Since the bitrate is set at 1200bits/sec, the bit length would be > samplerate/1200 = 18.4 samples at 22050. > I can double the samplerate to 44100, but that still leaves me at only 36.8 > samples per chunk. > If I understand what you say correctly, I would need at least 55 (64) > samples in each chunk? > > I'm not sure what chunk[3] would have been, I should have used a > dotting-signal instead of an unknown message to try this on. I will try this > again with more useful data this afternoon. > > cheers, > Linda > > > On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: > >> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >> > Hello all, >> >> > I have a digital signal where the bits in it are encoded with >> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >> > samplerate of 22050. >> > My goal is to find the bits again so I can decode the message in it, >> > for that I have chopped the wav up in pieces of 18 samples, which >> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >> > a list of chunks of length 18. I thought I could just fft each chunks >> > and find the max of the chunk-spectrum, to find out the bitfrequency >> > in the chunk (and thus the bitvalue) >> >> Correct me if I am wrong. You are cutting your signal into chunks that >> you expect to contain at least one period of the lower coding frequency. >> You then perform a fft on a very small signal (18 samples) which gives >> you (without zero padding) an estimation of the Fourier transform of >> your chunk computed on only 18 frequencies, i.e. with a really bad >> frequential resolution. It is possible if your coding frequencies are >> not too close. A raw Rayleight criteria leads to cut your signal into at >> least N=2*Fe/df_min where df_min is the minimal spacing between two >> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >> power of 2). >> > >> > But somehow I am stuck in the numbers, I was hopeing you could give me >> > a hint. here is what I have: >> >> > chunks[3] #this is one of the wavchunks, there should be a bit hidden in >> here >> > Out[98]: >> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, 0, >> 0, 0], dtype=int16) >> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >> the value of the bitfrequency 1300 of 2100 Hz? >> > test >> > Out[100]: >> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >> > >> > >> > I am unsure how to proceed from here, so I would really appreciate any >> > tips.. I found fftfreq, but I am not sure how to use it? I read >> > fftfreq? but I don't see how the example even uses the 'fourier' >> > variable in the fftfreq there? 
>> > >> Fftfreq is a function that constructs the frequency vector associated to >> the data computed by the fft algorithm. It is aware of how fft orders >> the frequency bins, and transform it in a more convenient way (it >> 'anti-aliases', centering the results on zero frequency). >> >> import numpy as np >> import matplotlib.pyplot as plt >> chunks[3]=.... >> test = np.fft.fft(chunks[3]) >> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >> period >> plt.plot(frequencies, np.abs(test), 'o') >> plt.show() >> >> but you won't see any things on this fft. I am suspicious due to the >> fact that the signal to noise ratio seems rather low leading to strong >> peak at Fe/2 >> In chunk[3], what do you expect to be the bit? >> >> Fabricio >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 10:15:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 10:15:55 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 7:29 AM, Chris Strickland wrote: > Hi, > > When using any of the distributions of scipy.stats there does not seem to be > the ability (or at least I cannot figure out how) to have the function > return > the log of the pdf, cdf, sf, etc. For statistical analysis this is > essential. > For instance suppose we are interested in an exponential distribution for a > random variable x with a hyperparameter lambda there needs to be an option > that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to > calculate log(scipy.stats.expon.pdf(x,lambda)). > > Is there a way to do this using the distributions in scipy.stats? It would need a new method for each distribution, e.g. _loglike, _logpdf So, this is work, and for some distributions the log wouldn't simplify much. I proposed this once together with other improvements (but without response). The second useful method for estimation would be _fitstart, which provides distribution specific starting values for fit, e.g. a moment estimator, or a simple rules of thumb http://projects.scipy.org/scipy/ticket/808 Here are some of my currently planned enhancements to the distributions: http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py but I just checked, it looks like I forgot to copy the _loglike method that I started from my experimental scripts. For a few distributions, where this is possible, it would also be useful to add the gradient with respect to the parameters, (or even the Hessian). But this is currently mostly just an idea, since we need some analytical gradients in the estimation of stats models. > > If there is not is it possible for me to suggest that this feature is added. > There is such an excellent range of distributions, each with such an > impressive range of options, it seems ashame to have to mostly manually code > up the log of pdfs and often call the log of CDFs from R. So far I only thought about log pdf, because I wanted it for Maximum Likelihood estimation. Do you have a rough idea for which distributions log cdf would work? 
that is, for which distribution is an analytical or efficient numerical expression possible. I also think that scipy.stats.distributions could be one of the best (broadest, consistent) collection of univariate distributions that I have seen so far, once we fill in some missing pieces. As a way forward, I think we could make the distributions into a numerical encyclopedia by adding private methods to those distributions where it makes sense, like log pdf, log cdf and I also started to add characteristic functions to some distributions in my experimental scripts. If you have a collection of logpdf, logcdf, we could add a trac ticket for this. However, this would miss the generic broadcasting part of the public functions, pdf, cdf,... but for estimation I wouldn't necessarily call those because of the overhead. I'm working on and off on this, so it's moving only slowly (and my wishlist is big). (for example, I was reading up on extreme value distributions in actuarial science and hydrology to get a better overview over the estimators.) So, I really love to hear any ideas, feedback, and see contributions to improving the distributions. Josef > > Thanks, > Chris. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ben.root at ou.edu Fri May 28 10:27:31 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 28 May 2010 09:27:31 -0500 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Linda, I am not very familiar with this particular topic, but you might want to look into Wavelet Analysis: http://en.wikipedia.org/wiki/Wavelet http://www.amara.com/IEEEwave/IW_wave_ana.html Fourier transforms are definately the wrong tool here because it assumes that the different frequency waves exists for the entire sample. My (basic) understanding of wavelet analysis is that it does not make that assumption. Hope this helps, Ben Root On Fri, May 28, 2010 at 7:41 AM, Linda wrote: > Hello David, > I decided that fft would not work on finding those bits, so I spent all day > looking into your suggestions. > > I think my data signals are FSK (FM?) instead of PSK, there are no > discontinuieties in the signals when I take a look at them in audacity. > > I'm not sure I completely understand the multiply with a complex > exponential part. Should I multiply my data-array with exp( 1j * omega * T > )? where omega would be 2 * pi * f_carrier (1700) and T would be 1/Fs? or > the bitlength in samples? > > This task seems to be quite a bit more difficult than I initially thought > (undergrad student) > > I would really appreciate some more help :-) > > Cheers > Linda > > On Thu, May 27, 2010 at 23:12, David Baddeley > wrote: > >> Hi Linda, >> >> I probably wouldn't divide the signal up into chunks before procesing, and >> also suspect that the FFT might be the wrong tool for the job (I'd certainly >> take the fft of the whole signal just to check that you have the right >> frequencies in it though). >> >> The problem with dividing into chunks and processing each separately is >> that you don't necessarily know where each bit start and stops - your chunks >> are thus, more likely than not, going to be misaligned. 
>> >> I'd probably tackle the problem with a strategy directly analogous to that >> used in analogue circuitry for decoding PSK - I'd either mix the carrier out >> & do I-Q detection (multiply with a complex exponential and then look at the >> low pass filtered real & imaginary parts of the result), or just look for >> the two frequency components separately by multiplying with a complex >> exponential at each frequency & low pass filtering the amplitude (I'd >> probably use a boxcar filter the same length as your symbols/frames). >> >> After doing this you can then start to decide where your frame boundaries >> are. If you've filtered as described, you should just be able to start at >> some offset and then take every 18th value. >> >> hope this gives you some ideas, >> >> David >> ------------------------------ >> *From:* Linda >> *To:* SciPy Users List >> *Sent:* Fri, 28 May, 2010 2:42:05 AM >> *Subject:* Re: [SciPy-User] finding frequency of wav >> >> Thanks for your reply. The explanation on fftfreq already made a few >> puzzle pieces fall into place. >> >> The signal I am trying to decode is a DSC transmission that is recorded in >> a wav file. (Digital Selective Calling, used in marine radio) >> It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and >> there's a carrier at 1700Hz. That should be all frequencies involved (apart >> from noise). Currently I am used generated, clean signals. But probably I >> should get a clean '10101010'-signal first to try my work on. >> >> Since the bitrate is set at 1200bits/sec, the bit length would be >> samplerate/1200 = 18.4 samples at 22050. >> I can double the samplerate to 44100, but that still leaves me at only >> 36.8 samples per chunk. >> If I understand what you say correctly, I would need at least 55 (64) >> samples in each chunk? >> >> I'm not sure what chunk[3] would have been, I should have used a >> dotting-signal instead of an unknown message to try this on. I will try this >> again with more useful data this afternoon. >> >> cheers, >> Linda >> >> >> On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: >> >>> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >>> > Hello all, >>> >>> > I have a digital signal where the bits in it are encoded with >>> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >>> > samplerate of 22050. >>> > My goal is to find the bits again so I can decode the message in it, >>> > for that I have chopped the wav up in pieces of 18 samples, which >>> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >>> > a list of chunks of length 18. I thought I could just fft each chunks >>> > and find the max of the chunk-spectrum, to find out the bitfrequency >>> > in the chunk (and thus the bitvalue) >>> >>> Correct me if I am wrong. You are cutting your signal into chunks that >>> you expect to contain at least one period of the lower coding frequency. >>> You then perform a fft on a very small signal (18 samples) which gives >>> you (without zero padding) an estimation of the Fourier transform of >>> your chunk computed on only 18 frequencies, i.e. with a really bad >>> frequential resolution. It is possible if your coding frequencies are >>> not too close. A raw Rayleight criteria leads to cut your signal into at >>> least N=2*Fe/df_min where df_min is the minimal spacing between two >>> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >>> power of 2). 
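As a rough illustration of the two ideas quoted above (David's "multiply with a complex exponential at each frequency" and Fabrice's minimum window length N = 2*Fe/df_min ~ 55, rounded up to 64), here is a minimal sketch of a per-window tone comparison. It assumes the 22050 Hz sample rate and the 1300/2100 Hz coding frequencies from this thread; the names (detect_bit, sig) and the step of 18 samples are only placeholders, and real decoding would still need the bit boundaries found first.

import numpy as np

fs = 22050.0                 # sample rate from the thread
f0, f1 = 1300.0, 2100.0      # '0' and '1' coding frequencies
nwin = 64                    # window length from the Rayleigh estimate above

def detect_bit(window):
    # correlate the window against one complex exponential per tone
    # (a single-bin DFT at each coding frequency) and keep the stronger one
    n = np.arange(len(window))
    a0 = np.abs(np.sum(window * np.exp(-2j * np.pi * f0 * n / fs)))
    a1 = np.abs(np.sum(window * np.exp(-2j * np.pi * f1 * n / fs)))
    return int(a1 > a0)

# illustrative use on a float array sig (samples read from the wav);
# the offset/step below is arbitrary until the frame alignment is known:
# bits = [detect_bit(sig[i:i + nwin]) for i in range(0, len(sig) - nwin, 18)]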
>>> > >>> > But somehow I am stuck in the numbers, I was hopeing you could give me >>> > a hint. here is what I have: >>> >>> > chunks[3] #this is one of the wavchunks, there should be a bit hidden >>> in here >>> > Out[98]: >>> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, >>> 0, 0, 0], dtype=int16) >>> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >>> the value of the bitfrequency 1300 of 2100 Hz? >>> > test >>> > Out[100]: >>> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >>> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >>> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >>> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >>> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >>> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >>> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >>> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >>> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >>> > >>> > >>> > I am unsure how to proceed from here, so I would really appreciate any >>> > tips.. I found fftfreq, but I am not sure how to use it? I read >>> > fftfreq? but I don't see how the example even uses the 'fourier' >>> > variable in the fftfreq there? >>> > >>> Fftfreq is a function that constructs the frequency vector associated to >>> the data computed by the fft algorithm. It is aware of how fft orders >>> the frequency bins, and transform it in a more convenient way (it >>> 'anti-aliases', centering the results on zero frequency). >>> >>> import numpy as np >>> import matplotlib.pyplot as plt >>> chunks[3]=.... >>> test = np.fft.fft(chunks[3]) >>> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >>> period >>> plt.plot(frequencies, np.abs(test), 'o') >>> plt.show() >>> >>> but you won't see any things on this fft. I am suspicious due to the >>> fact that the signal to noise ratio seems rather low leading to strong >>> peak at Fe/2 >>> In chunk[3], what do you expect to be the bit? >>> >>> Fabricio >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From linda.polman at gmail.com Fri May 28 11:26:06 2010 From: linda.polman at gmail.com (Linda) Date: Fri, 28 May 2010 17:26:06 +0200 Subject: [SciPy-User] finding frequency of wav In-Reply-To: References: <1274966922.2121.33.camel@Portable-s2m.cnrs-mrs.fr> <792740.87711.qm@web33006.mail.mud.yahoo.com> Message-ID: Thank you, I will take a look at this :-) On Fri, May 28, 2010 at 16:27, Benjamin Root wrote: > Linda, > > I am not very familiar with this particular topic, but you might want to > look into Wavelet Analysis: > > http://en.wikipedia.org/wiki/Wavelet > http://www.amara.com/IEEEwave/IW_wave_ana.html > > Fourier transforms are definately the wrong tool here because it assumes > that the different frequency waves exists for the entire sample. My (basic) > understanding of wavelet analysis is that it does not make that assumption. 
> > Hope this helps, > Ben Root > > > On Fri, May 28, 2010 at 7:41 AM, Linda wrote: > >> Hello David, >> I decided that fft would not work on finding those bits, so I spent all >> day looking into your suggestions. >> >> I think my data signals are FSK (FM?) instead of PSK, there are no >> discontinuieties in the signals when I take a look at them in audacity. >> >> I'm not sure I completely understand the multiply with a complex >> exponential part. Should I multiply my data-array with exp( 1j * omega * T >> )? where omega would be 2 * pi * f_carrier (1700) and T would be 1/Fs? or >> the bitlength in samples? >> >> This task seems to be quite a bit more difficult than I initially thought >> (undergrad student) >> >> I would really appreciate some more help :-) >> >> Cheers >> Linda >> >> On Thu, May 27, 2010 at 23:12, David Baddeley < >> david_baddeley at yahoo.com.au> wrote: >> >>> Hi Linda, >>> >>> I probably wouldn't divide the signal up into chunks before procesing, >>> and also suspect that the FFT might be the wrong tool for the job (I'd >>> certainly take the fft of the whole signal just to check that you have the >>> right frequencies in it though). >>> >>> The problem with dividing into chunks and processing each separately is >>> that you don't necessarily know where each bit start and stops - your chunks >>> are thus, more likely than not, going to be misaligned. >>> >>> I'd probably tackle the problem with a strategy directly analogous to >>> that used in analogue circuitry for decoding PSK - I'd either mix the >>> carrier out & do I-Q detection (multiply with a complex exponential and then >>> look at the low pass filtered real & imaginary parts of the result), or just >>> look for the two frequency components separately by multiplying with a >>> complex exponential at each frequency & low pass filtering the amplitude >>> (I'd probably use a boxcar filter the same length as your symbols/frames). >>> >>> After doing this you can then start to decide where your frame boundaries >>> are. If you've filtered as described, you should just be able to start at >>> some offset and then take every 18th value. >>> >>> hope this gives you some ideas, >>> >>> David >>> ------------------------------ >>> *From:* Linda >>> *To:* SciPy Users List >>> *Sent:* Fri, 28 May, 2010 2:42:05 AM >>> *Subject:* Re: [SciPy-User] finding frequency of wav >>> >>> Thanks for your reply. The explanation on fftfreq already made a few >>> puzzle pieces fall into place. >>> >>> The signal I am trying to decode is a DSC transmission that is recorded >>> in a wav file. (Digital Selective Calling, used in marine radio) >>> It is a phase modulated digital signal: '1' is 2100Hz, '0' is 1300 Hz and >>> there's a carrier at 1700Hz. That should be all frequencies involved (apart >>> from noise). Currently I am used generated, clean signals. But probably I >>> should get a clean '10101010'-signal first to try my work on. >>> >>> Since the bitrate is set at 1200bits/sec, the bit length would be >>> samplerate/1200 = 18.4 samples at 22050. >>> I can double the samplerate to 44100, but that still leaves me at only >>> 36.8 samples per chunk. >>> If I understand what you say correctly, I would need at least 55 (64) >>> samples in each chunk? >>> >>> I'm not sure what chunk[3] would have been, I should have used a >>> dotting-signal instead of an unknown message to try this on. I will try this >>> again with more useful data this afternoon. 
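A clean dotting signal of the kind mentioned just above can also be synthesised directly, which gives a known bit pattern to test the decoder against before touching recorded data. This is only a sketch of a continuous-phase FSK '1010...' signal, assuming the 22050 Hz sample rate and 1200 bit/s rate from the thread; all names here are placeholders.

import numpy as np

fs = 22050.0
bit_len = fs / 1200.0                # ~18.375 samples per bit
bits = np.array([1, 0] * 20)         # dotting pattern to test the decoder on
nsamp = int(len(bits) * bit_len)

which_bit = (np.arange(nsamp) / bit_len).astype(int)         # bit index of each sample
inst_freq = np.where(bits[which_bit] == 1, 2100.0, 1300.0)   # '1' -> 2100 Hz, '0' -> 1300 Hz
phase = 2.0 * np.pi * np.cumsum(inst_freq) / fs              # integrate -> continuous phase
sig = np.sin(phase)                                          # clean FSK signal, no noise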
>>> >>> cheers, >>> Linda >>> >>> >>> On Thu, May 27, 2010 at 15:28, Fabrice Silva wrote: >>> >>>> Le jeudi 27 mai 2010 ? 10:09 +0200, Linda a ?crit : >>>> > Hello all, >>>> >>>> > I have a digital signal where the bits in it are encoded with >>>> > frequencies 1300 and 2100 Hz. The message is sent as a wav-file with a >>>> > samplerate of 22050. >>>> > My goal is to find the bits again so I can decode the message in it, >>>> > for that I have chopped the wav up in pieces of 18 samples, which >>>> > would be the bitlength (at 1200 Bit/s > 22050/1200=18.375). So I have >>>> > a list of chunks of length 18. I thought I could just fft each chunks >>>> > and find the max of the chunk-spectrum, to find out the bitfrequency >>>> > in the chunk (and thus the bitvalue) >>>> >>>> Correct me if I am wrong. You are cutting your signal into chunks that >>>> you expect to contain at least one period of the lower coding frequency. >>>> You then perform a fft on a very small signal (18 samples) which gives >>>> you (without zero padding) an estimation of the Fourier transform of >>>> your chunk computed on only 18 frequencies, i.e. with a really bad >>>> frequential resolution. It is possible if your coding frequencies are >>>> not too close. A raw Rayleight criteria leads to cut your signal into at >>>> least N=2*Fe/df_min where df_min is the minimal spacing between two >>>> coding frequencies df_min=2100-1300 here thus N=55 (so 64 to have a >>>> power of 2). >>>> > >>>> > But somehow I am stuck in the numbers, I was hopeing you could give me >>>> > a hint. here is what I have: >>>> >>>> > chunks[3] #this is one of the wavchunks, there should be a bit hidden >>>> in here >>>> > Out[98]: >>>> > array([ 2, -1, 1, -2, 2, -2, 2, -1, 0, 0, 0, 1, -2, 2, -1, >>>> 0, 0, 0], dtype=int16) >>>> > test = fft(chunks[3]) # spectrum of the chunk, the peak should give me >>>> the value of the bitfrequency 1300 of 2100 Hz? >>>> > test >>>> > Out[100]: >>>> > array([ 1.00000000 +0.00000000e+00j, 1.00000000 +2.37564698e-01j, >>>> > 1.46791111 +4.90375770e-01j, 2.50000000 +8.66025404e-01j, >>>> > 2.65270364 -7.37891832e-01j, 1.00000000 +3.01762603e+00j, >>>> > -0.50000000 -2.59807621e+00j, 1.00000000 -2.41609109e+00j, >>>> > 4.87938524 +1.43601897e+01j, 7.00000000 -6.88706904e-15j, >>>> > 4.87938524 -1.43601897e+01j, 1.00000000 +2.41609109e+00j, >>>> > -0.50000000 +2.59807621e+00j, 1.00000000 -3.01762603e+00j, >>>> > 2.65270364 +7.37891832e-01j, 2.50000000 -8.66025404e-01j, >>>> > 1.46791111 -4.90375770e-01j, 1.00000000 -2.37564698e-01j]) >>>> > >>>> > >>>> > I am unsure how to proceed from here, so I would really appreciate any >>>> > tips.. I found fftfreq, but I am not sure how to use it? I read >>>> > fftfreq? but I don't see how the example even uses the 'fourier' >>>> > variable in the fftfreq there? >>>> > >>>> Fftfreq is a function that constructs the frequency vector associated to >>>> the data computed by the fft algorithm. It is aware of how fft orders >>>> the frequency bins, and transform it in a more convenient way (it >>>> 'anti-aliases', centering the results on zero frequency). >>>> >>>> import numpy as np >>>> import matplotlib.pyplot as plt >>>> chunks[3]=.... >>>> test = np.fft.fft(chunks[3]) >>>> frequencies = np.fft.fftfreq(len(test), d=1./22050.) # d is the sampling >>>> period >>>> plt.plot(frequencies, np.abs(test), 'o') >>>> plt.show() >>>> >>>> but you won't see any things on this fft. 
I am suspicious due to the >>>> fact that the signal to noise ratio seems rather low leading to strong >>>> peak at Fe/2 >>>> In chunk[3], what do you expect to be the bit? >>>> >>>> Fabricio >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.biesinger at gmail.com Fri May 28 12:30:22 2010 From: jake.biesinger at gmail.com (Jacob Biesinger) Date: Fri, 28 May 2010 09:30:22 -0700 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error Message-ID: Hi! Having some trouble with a gaussian_kde on uniform integer data. $ python --version Python 2.6.5 $ ipython --Version 0.10 $ ipython In [1]: from scipy.stats import gaussian_kde In [2]: import scipy In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) # should be uniform on [-250,250) In [4]: k = gaussian_kde(randDistFromCenter) /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: DeprecationWarning: scipy.stats.cov is deprecated; please update your code to use numpy.cov. Please note that: - numpy.cov rowvar argument defaults to true, not false - numpy.cov bias argument defaults to false, not true """, DeprecationWarning) /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: DeprecationWarning: scipy.stats.mean is deprecated; please update your code to use numpy.mean. Please note that: - numpy.mean axis argument defaults to None, not 0 - numpy.mean has a ddof argument to replace bias in a more general manner. scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, axis=0, ddof=1). axis=0, ddof=1).""", DeprecationWarning) In [5]: x = scipy.linspace(-300,300,100) In [6]: y = k.evaluate(x) In [7]: y Out[7]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # Perhaps it's a bandwidth issue, though there should still be at least a few non-zero entries!: In [8]: k.covariance Out[8]: array([[ 523.56767608]]) -- Jake Biesinger Graduate Student Xie Lab, UC Irvine (949) 231-7587 -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 12:46:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 12:46:24 -0400 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 12:30 PM, Jacob Biesinger wrote: > Hi! > Having some trouble with a gaussian_kde on uniform integer data. 
> $ python --version > Python 2.6.5 > $ ipython --Version > 0.10 > $ ipython > In [1]: from scipy.stats import gaussian_kde > In [2]: import scipy > In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) ?# > should be uniform on [-250,250) > In [4]: k = gaussian_kde(randDistFromCenter) > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: > DeprecationWarning: scipy.stats.cov is deprecated; please update your code > to use numpy.cov. > Please note that: > ?? ?- numpy.cov rowvar argument defaults to true, not false > ?? ?- numpy.cov bias argument defaults to false, not true > ??""", DeprecationWarning) > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: > DeprecationWarning: scipy.stats.mean is deprecated; please update your code > to use numpy.mean. > Please note that: > ?? ?- numpy.mean axis argument defaults to None, not 0 > ?? ?- numpy.mean has a ddof argument to replace bias in a more general > manner. > ?? ? ?scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, > axis=0, ddof=1). > ??axis=0, ddof=1).""", DeprecationWarning) > In [5]: x = scipy.linspace(-300,300,100) > In [6]: y = k.evaluate(x) > In [7]: y > Out[7]: > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0]) > > # ?Perhaps it's a bandwidth issue, though there should still be at least a > few non-zero entries!: > In [8]: k.covariance > Out[8]: array([[ 523.56767608]]) kde doesn't like integers, Can you file a ticket for this? if you don't convert the original sample to integers, or convert them to float, e.g k = gaussian_kde(np.array(randDistFromCenter, float)) then I get y= [ 2.37718449e-05 4.59333855e-05 8.33629081e-05 1.42283310e-04 2.28733649e-04 3.46958715e-04 4.97636815e-04 6.76566775e-04 8.74448533e-04 1.07809127e-03 1.27285862e-03 1.44565290e-03 1.58750741e-03 1.69501260e-03 1.77025571e-03 1.81947393e-03 .... I don't know how good the bandwidth is for a uniform distribution of the sample, but you will get a lot of spillover/smoothing at the boundary. Thanks for reporting, Josef > > -- > Jake Biesinger > Graduate Student > Xie Lab, UC Irvine > (949) 231-7587 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jake.biesinger at gmail.com Fri May 28 13:03:14 2010 From: jake.biesinger at gmail.com (Jacob Biesinger) Date: Fri, 28 May 2010 10:03:14 -0700 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error In-Reply-To: References: Message-ID: Oh, great. Thanks for the workaround. I opened a ticket for this on Scipy Trac: http://projects.scipy.org/scipy/ticket/1181 -- Jake Biesinger Graduate Student Xie Lab, UC Irvine (949) 231-7587 On Fri, May 28, 2010 at 9:46 AM, wrote: > On Fri, May 28, 2010 at 12:30 PM, Jacob Biesinger > wrote: > > Hi! > > Having some trouble with a gaussian_kde on uniform integer data. 
> > $ python --version > > Python 2.6.5 > > $ ipython --Version > > 0.10 > > $ ipython > > In [1]: from scipy.stats import gaussian_kde > > In [2]: import scipy > > In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) # > > should be uniform on [-250,250) > > In [4]: k = gaussian_kde(randDistFromCenter) > > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: > > DeprecationWarning: scipy.stats.cov is deprecated; please update your > code > > to use numpy.cov. > > Please note that: > > - numpy.cov rowvar argument defaults to true, not false > > - numpy.cov bias argument defaults to false, not true > > """, DeprecationWarning) > > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: > > DeprecationWarning: scipy.stats.mean is deprecated; please update your > code > > to use numpy.mean. > > Please note that: > > - numpy.mean axis argument defaults to None, not 0 > > - numpy.mean has a ddof argument to replace bias in a more general > > manner. > > scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, > > axis=0, ddof=1). > > axis=0, ddof=1).""", DeprecationWarning) > > In [5]: x = scipy.linspace(-300,300,100) > > In [6]: y = k.evaluate(x) > > In [7]: y > > Out[7]: > > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, > > 0, 0, 0, 0, 0, 0, 0, 0]) > > > > # Perhaps it's a bandwidth issue, though there should still be at least > a > > few non-zero entries!: > > In [8]: k.covariance > > Out[8]: array([[ 523.56767608]]) > > > kde doesn't like integers, Can you file a ticket for this? > > if you don't convert the original sample to integers, or convert them > to float, e.g > > k = gaussian_kde(np.array(randDistFromCenter, float)) > > then I get y= > [ 2.37718449e-05 4.59333855e-05 8.33629081e-05 1.42283310e-04 > 2.28733649e-04 3.46958715e-04 4.97636815e-04 6.76566775e-04 > 8.74448533e-04 1.07809127e-03 1.27285862e-03 1.44565290e-03 > 1.58750741e-03 1.69501260e-03 1.77025571e-03 1.81947393e-03 > .... > > I don't know how good the bandwidth is for a uniform distribution of > the sample, but you will get a lot of spillover/smoothing at the > boundary. > > Thanks for reporting, > > Josef > > > > > -- > > Jake Biesinger > > Graduate Student > > Xie Lab, UC Irvine > > (949) 231-7587 > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 13:18:41 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 13:18:41 -0400 Subject: [SciPy-User] Trouble with gaussian_kde on uniform integer data, possibly deprecation error In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 1:03 PM, Jacob Biesinger wrote: > Oh, great. ?Thanks for the workaround. 
?I opened a ticket for this on Scipy > Trac: ?http://projects.scipy.org/scipy/ticket/1181 Thanks, Josef > > -- > Jake Biesinger > Graduate Student > Xie Lab, UC Irvine > (949) 231-7587 > > > On Fri, May 28, 2010 at 9:46 AM, wrote: >> >> On Fri, May 28, 2010 at 12:30 PM, Jacob Biesinger >> wrote: >> > Hi! >> > Having some trouble with a gaussian_kde on uniform integer data. >> > $ python --version >> > Python 2.6.5 >> > $ ipython --Version >> > 0.10 >> > $ ipython >> > In [1]: from scipy.stats import gaussian_kde >> > In [2]: import scipy >> > In [3]: randDistFromCenter = map(int, scipy.rand(10000) * 500 - 250) ?# >> > should be uniform on [-250,250) >> > In [4]: k = gaussian_kde(randDistFromCenter) >> > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1486: >> > DeprecationWarning: scipy.stats.cov is deprecated; please update your >> > code >> > to use numpy.cov. >> > Please note that: >> > ?? ?- numpy.cov rowvar argument defaults to true, not false >> > ?? ?- numpy.cov bias argument defaults to false, not true >> > ??""", DeprecationWarning) >> > /usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: >> > DeprecationWarning: scipy.stats.mean is deprecated; please update your >> > code >> > to use numpy.mean. >> > Please note that: >> > ?? ?- numpy.mean axis argument defaults to None, not 0 >> > ?? ?- numpy.mean has a ddof argument to replace bias in a more general >> > manner. >> > ?? ? ?scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, >> > axis=0, ddof=1). >> > ??axis=0, ddof=1).""", DeprecationWarning) >> > In [5]: x = scipy.linspace(-300,300,100) >> > In [6]: y = k.evaluate(x) >> > In [7]: y >> > Out[7]: >> > array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> > 0, >> > ?? ? ? 0, 0, 0, 0, 0, 0, 0, 0]) >> > >> > # ?Perhaps it's a bandwidth issue, though there should still be at least >> > a >> > few non-zero entries!: >> > In [8]: k.covariance >> > Out[8]: array([[ 523.56767608]]) >> >> >> kde doesn't like integers, Can you file a ticket for this? >> >> if you don't convert the original sample to integers, or convert them >> to float, e.g >> >> k = gaussian_kde(np.array(randDistFromCenter, float)) >> >> then I get y= >> [ ?2.37718449e-05 ? 4.59333855e-05 ? 8.33629081e-05 ? 1.42283310e-04 >> ? 2.28733649e-04 ? 3.46958715e-04 ? 4.97636815e-04 ? 6.76566775e-04 >> ? 8.74448533e-04 ? 1.07809127e-03 ? 1.27285862e-03 ? 1.44565290e-03 >> ? 1.58750741e-03 ? 1.69501260e-03 ? 1.77025571e-03 ? 1.81947393e-03 >> .... >> >> I don't know how good the bandwidth is for a uniform distribution of >> the sample, but you will get a lot of spillover/smoothing at the >> boundary. 
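For reference, a complete float-based version of the workaround quoted above, showing what the boundary smoothing looks like. This is only a sketch: the uniform sample is regenerated here rather than reusing the one from the original message.

import numpy as np
from scipy.stats import gaussian_kde

sample = np.random.rand(10000) * 500.0 - 250.0   # same uniform data, kept as floats
k = gaussian_kde(sample)
x = np.linspace(-300, 300, 100)
y = k.evaluate(x)
# y is now nonzero, close to the true uniform density 1/500 = 0.002 inside
# [-250, 250), and tapers smoothly past the edges because of the Gaussian kernel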
>> >> Thanks for reporting, >> >> Josef >> >> > >> > -- >> > Jake Biesinger >> > Graduate Student >> > Xie Lab, UC Irvine >> > (949) 231-7587 >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Fri May 28 13:38:11 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 13:38:11 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Mon, May 17, 2010 at 3:57 PM, nicky van foreest wrote: > Hi Josef, > > Thanks for the answer. > >> Actually, if the onepoint distribution directly subclasses rv_generic >> then it wouldn't rely on or interfere with the generic framework in >> rv_continuous or rv_discrete (where it wouldn't really fit in if >> onepoint is on reals), and it might be relatively easy to provide all >> the methods of the distributions for a single point distribution. > > I must admit that I haven't had a look at the innards of rv_generic, > so I am afraid I cannot be of any relevant help in this respect. > >> >> Choice of name: >> to me, "deterministic random variable" sounds like an oxymoron, >> although I found some references to deterministic distribution (mainly >> or exclusively in queuing theory and >> http://isi.cbs.nl/glossary/term902.htm) >> I would prefer a boring "onepoint" distribution, or "degenerate", or ... ? > > Degenerate seems nice to me. I just checked the book Probability by > Shiryaev, and he also uses the word `degenerate'. Interestingly, he > introduces the degenerate distribution as the normal distribution with > sigma = 0. I suspect that implementing the degenerate distribution > like this is utterly stupid. > >> Can you file a ticket with what you would like to have? > > Sure. Sorry for bothering you with this, but how? Nicky, Sorry about the delay, another thread I lost track of. http://projects.scipy.org/scipy/newticket has a form to fill out, you might have to sign up first It's a good place to file things so they don't get forgotten. > >> >> I started to work again a bit on enhancing the distributions, mainly >> I'm experimenting with several generic estimation methods. My target >> is to have a working estimator for any distribution in scipy.stats and >> for several additional distributions. > > This seems a nice idea, but quite ambitious. Have you also thought > about estimators for heavy tailed distributions? This is, as far as I > know, a very delicate topic. > >> >> I worry a bit that a deterministic distribution might not fit into a >> general framework for distributions and might need to be special cased >> for some methods. (but see above) > > This must be fairly easy. Just the mean can be relevant. We would want to provide all the same methods, even if most of them are trivial. BTW: I had started to work on the discrete distribution on the real line. 
Some methods works easily, but then I ran into the "hashtable on floats" problem (from another thread) pdf(x), cdf(x) with x float would need to know whether x is a support point, but which might not be equal to the actual point because of floating point problems. So, the direct translation of rv_discrete doesn't work, and it looks like at least pdf needs to be accessible either pointwise for queries or using known support points for actual calculations. No fun, and EDA dropped. Josef > >> http://bazaar.launchpad.net/~josef-pktd/statsmodels/statsmodels-josef-experimental/files/head:/scikits/statsmodels/sandbox/stats/ > > I'll have a look. Thanks. > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Fri May 28 14:11:04 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 28 May 2010 11:11:04 -0700 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 10:38 AM, wrote: > pdf(x), cdf(x) ?with x float would need to know whether x is a support > point, but which might not be equal to the actual point because of > floating point problems. > So, the direct translation of rv_discrete doesn't work, and it looks > like at least pdf needs to be accessible either pointwise for queries > or using known support points for actual calculations. Discrete distributions on the real line don't *have* a pdf... -- Nathaniel From josef.pktd at gmail.com Fri May 28 14:21:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 14:21:54 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 2:11 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 10:38 AM, ? wrote: >> pdf(x), cdf(x) ?with x float would need to know whether x is a support >> point, but which might not be equal to the actual point because of >> floating point problems. >> So, the direct translation of rv_discrete doesn't work, and it looks >> like at least pdf needs to be accessible either pointwise for queries >> or using known support points for actual calculations. > > Discrete distributions on the real line don't *have* a pdf... sorry, pmf, it's a pain if continuous and discrete have different names, one of my favorite typos in this. What do we call it when we have a mixture distribution with mass points and a density? pmdf ? or just pf ? Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Fri May 28 15:16:05 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 28 May 2010 12:16:05 -0700 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 11:21 AM, wrote: >> Discrete distributions on the real line don't *have* a pdf... > > sorry, pmf, > it's a pain if continuous and discrete have different names, one of my > favorite typos in this. It's not just a difference in terminology -- I brought it up because the practical problems you were running into are closely related to the reasons why it's impossible to define a pdf for a point mass on the real line. 
A pdf and pmf on the real line have to be used differently -- pdf's are mostly meaningful as things to integrate, but imagine integrating (some function of) a pmf over the real line as if it were a pdf -- you always get 0... > What do we call it when we have a mixture distribution with mass > points and a density? I don't know -- before trying to name it, can it even be defined? What value does it have at the locations where there's a point mass? Inf? IEEE954 neglected to include values for Dirac delta functions :-). -- Nathaniel From josef.pktd at gmail.com Fri May 28 15:41:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 15:41:05 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 3:16 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 11:21 AM, ? wrote: >>> Discrete distributions on the real line don't *have* a pdf... >> >> sorry, pmf, >> it's a pain if continuous and discrete have different names, one of my >> favorite typos in this. > > It's not just a difference in terminology -- I brought it up because > the practical problems you were running into are closely related to > the reasons why it's impossible to define a pdf for a point mass on > the real line. > > A pdf and pmf on the real line have to be used differently -- pdf's > are mostly meaningful as things to integrate, but imagine integrating > (some function of) a pmf over the real line as if it were a pdf -- you > always get 0... I was trying to convert rv_discrete to be defined on an arbitrary (countable) number of masspoints on the real line instead of over integers. I'm summing, not integrating over points. The problem is that we can specify the set of integers relatively easily, but looking for a finite (or countable) number of points on the real line is more difficult. I'm not trying to reuse anything from the continuous distributions, except that one application will be to work with discretized continuous distributions. > >> What do we call it when we have a mixture distribution with mass >> points and a density? > > I don't know -- before trying to name it, can it even be defined? What > value does it have at the locations where there's a point mass? Inf? It would have to combine integration for the continuous part with addition of mass points for the discrete part. Tweedie distribution for some parameters is an example. A mass point at zero and continuous on the positive real line. "Invented" for rainfall, with positive probability it doesn't rain, but if it rains then the amount of rainfall is a continuous random variable. There are some decomposition theorems for this, approximately every distribution can be represented as the sum of discrete mass points plus a continuous distribution. > > IEEE954 neglected to include values for Dirac delta functions :-). I think integration over singularities in a function doesn't work too well in scipy. Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Fri May 28 15:50:09 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 28 May 2010 14:50:09 -0500 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 13:11, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 10:38 AM, ? 
wrote: >> pdf(x), cdf(x) ?with x float would need to know whether x is a support >> point, but which might not be equal to the actual point because of >> floating point problems. >> So, the direct translation of rv_discrete doesn't work, and it looks >> like at least pdf needs to be accessible either pointwise for queries >> or using known support points for actual calculations. > > Discrete distributions on the real line don't *have* a pdf... Well, they *have* one; they just can't be implemented in floating point. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mdekauwe at gmail.com Fri May 28 15:53:42 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 12:53:42 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> Message-ID: <28711249.post@talk.nabble.com> Ok thanks...I'll take a look. Back to my loops issue. What if instead this time I wanted to take an average so every march in 11 years, is there a quicker way to go about doing that than my current method? nummonths = 12 numyears = 11 for month in xrange(nummonths): for i in xrange(numpts): for ym in xrange(month, numyears * nummonths, nummonths): data[month, i] += array[ym, VAR, land_pts_index[i], 0] so for each point in the array for a given month i am jumping through and getting the next years month and so on, summing it. Thanks... josef.pktd wrote: > > On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >> >> Could you possibly if you have time explain further your comment re the >> p-values, your suggesting I am misusing them? > > Depends on your use and interpretation > > test statistics, p-values are random variables, if you look at several > tests at the same time, some p-values will be large just by chance. > If, for example you just look at the largest test statistic, then the > distribution for the max of several test statistics is not the same as > the distribution for a single test statistic > > http://en.wikipedia.org/wiki/Multiple_comparisons > http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm > > we also just had a related discussion for ANOVA post-hoc tests on the > pystatsmodels group. > > Josef >> >> Thanks. >> >> >> josef.pktd wrote: >>> >>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>> >>>> Sounds like I am stuck with the loop as I need to do the comparison for >>>> each >>>> pixel of the world and then I have a basemap function call which I >>>> guess >>>> slows it down further...hmm >>> >>> I don't see much that could be done differently, after a brief look. >>> >>> stats.pearsonr could be replaced by an array version using directly >>> the formula for correlation even with nans. wilcoxon looks slow, and I >>> never tried or seen a faster version. >>> >>> just a reminder, the p-values are for a single test, when you have >>> many of them, then they don't have the right size/confidence level for >>> an overall or joint test. (some packages report a Bonferroni >>> correction in this case) >>> >>> Josef >>> >>> >>>> >>>> i.e. >>>> >>>> def compareSnowData(jules_var): >>>> ? ?# Extract the 11 years of snow data and return >>>> ? ?outrows = 180 >>>> ? ?outcols = 360 >>>> ? 
?numyears = 11 >>>> ? ?nummonths = 12 >>>> >>>> ? ?# Read various files >>>> ? ?fname="world_valid_jules_pts.ascii" >>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>> jo.read_land_points_ascii(fname, 1.0) >>>> >>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>> \ >>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>> \ >>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>> >>>> ? ?# grab some space >>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>> dtype=np.float32) >>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>> dtype=np.float32) >>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>> np.nan >>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>> np.nan >>>> >>>> ? ?# extract the data >>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>> ? ?#for month in xrange(numyears * nummonths): >>>> ? ?# ? ?for i in xrange(numpts): >>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>> ? ?# ? ? ? ?else: >>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>> ? ?# ? ? ? ?if data2 > 0.0: >>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>> ? ?# ? ? ? ?else: >>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>> >>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>> else >>>> we >>>> ? ?# can't do the correlations correctly!! >>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>> >>>> ? ?# put data on a regular grid... >>>> ? ?print 'regridding landpts...' >>>> ? ?for i in xrange(numpts): >>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>> func >>>> ? ? ? ?x = data1_snow[:,i] >>>> ? ? ? ?x = x[np.isfinite(x)] >>>> ? ? ? ?y = data2_snow[:,i] >>>> ? ? ? ?y = y[np.isfinite(y)] >>>> >>>> ? ? ? ?# r^2 >>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >>>> data >>>> ? ? ? ?if len(x) and len(y) > 50: >>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>> (stats.pearsonr(x, y)[0])**2 >>>> >>>> ? ? ? ?# wilcox signed rank test >>>> ? ? ? ?# make sure we have enough samples to do the test >>>> ? ? ? ?d = x - y >>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>> non-zero >>>> differences >>>> ? ? ? ?count = len(d) >>>> ? ? ? ?if count > 10: >>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>> ? ? ? ? ? ?# only map out sign different data >>>> ? ? ? ? ? ?if pval < 0.05: >>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>> np.mean(x - y) >>>> >>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>>>> >>>>>> Also I then need to remap the 2D array I make onto another grid (the >>>>>> world in >>>>>> this case). 
Which again I had am doing with a loop (note numpts is a >>>>>> lot >>>>>> bigger than my example above). >>>>>> >>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>> np.nan >>>>>> for i in xrange(numpts): >>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>> func >>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>> >>>>>> ? ? ? ?# wilcox signed rank test >>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>> ? ? ? ?d = x - y >>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>> non-zero >>>>>> differences >>>>>> ? ? ? ?count = len(d) >>>>>> ? ? ? ?if count > 10: >>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>> np.mean(x - y) >>>>>> >>>>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>>>> array >>>>>> by removing the index, >>>>>> but I can't see how I will get the individual x and y pts for each >>>>>> array >>>>>> member correctly without the loop, this was my attempt which of >>>>>> course >>>>>> doesn't work! >>>>>> >>>>>> x = data1_snow[:,:] >>>>>> x = x[np.isfinite(x)] >>>>>> y = data2_snow[:,:] >>>>>> y = y[np.isfinite(y)] >>>>>> >>>>>> # r^2 >>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>>>> if len(x) and len(y) > 50: >>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>> y)[0])**2 >>>>> >>>>> >>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>>>> at a time (if I read the help correctly). >>>>> >>>>> Also the presence of nans might force the use a loop. stats.mstats has >>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>> when vectorized operations would work with regular arrays, nan or >>>>> masked array versions still have to loop in many cases.) >>>>> >>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>> calculated then it might be worth to use only array operations up to >>>>> that point. If wilcoxon is calculated most of the time, then it's not >>>>> worth thinking too hard about this. >>>>> >>>>> Josef >>>>> >>>>> >>>>>> >>>>>> thanks. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> mdekauwe wrote: >>>>>>> >>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>>>> >>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>> there >>>>>>> a >>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>> 4dims >>>>>>> that Josef's suggestion would be more advised in this case? >>>>> >>>>> There were several discussions on the mailing lists (fancy slicing and >>>>> indexing). Your case is safe, but if you run in future into funny >>>>> shapes, you can look up the details. >>>>> when in doubt, I use np.arange(...) >>>>> >>>>> Josef >>>>> >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Thanks that works... 
>>>>>>>>> >>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>> the >>>>>>>>> step >>>>>>>>> I >>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>> the >>>>>>>>> two >>>>>>>>> for >>>>>>>>> loops? Do I have that right? >>>>>>>> >>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>> dimension, >>>>>>>> then you can use slicing. It might be faster. >>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>>> >>>>>>>>> A lot quicker...! >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> >>>>>>>>> josef.pktd wrote: >>>>>>>>>> >>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>>>> array, >>>>>>>>>>> but >>>>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>>>> the >>>>>>>>>>> arrays >>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>> solution >>>>>>>>>>> as >>>>>>>>>>> well >>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>> difficult >>>>>>>>>>> to >>>>>>>>>>> get >>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>> that >>>>>>>>>>> one >>>>>>>>>>> could >>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>> >>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>> >>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> import numpy as np >>>>>>>>>>> >>>>>>>>>>> numpts=10 >>>>>>>>>>> tsteps = 12 >>>>>>>>>>> vari = 22 >>>>>>>>>>> >>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>> >>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>> >>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>> >>>>>>>>>> I think this should do it >>>>>>>>>> >>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), >>>>>>>>>> 0] >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> View this message in context: >>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 28 16:05:17 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 16:05:17 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711249.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> Message-ID: On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: > > Ok thanks...I'll take a look. > > Back to my loops issue. What if instead this time I wanted to take an > average so every march in 11 years, is there a quicker way to go about doing > that than my current method? > > nummonths = 12 > numyears = 11 > > for month in xrange(nummonths): > ? ?for i in xrange(numpts): > ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): > ? ? ? ? ? 
?data[month, i] += array[ym, VAR, land_pts_index[i], 0] x[start:end:12,:] gives you every 12th row of an array x something like this should work to get rid of the inner loop, or you could directly put range(month, numyears * nummonths, nummonths) into the array instead of ym and sum() Josef > > so for each point in the array for a given month i am jumping through and > getting the next years month and so on, summing it. > > Thanks... > > > josef.pktd wrote: >> >> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>> >>> Could you possibly if you have time explain further your comment re the >>> p-values, your suggesting I am misusing them? >> >> Depends on your use and interpretation >> >> test statistics, p-values are random variables, if you look at several >> tests at the same time, some p-values will be large just by chance. >> If, for example you just look at the largest test statistic, then the >> distribution for the max of several test statistics is not the same as >> the distribution for a single test statistic >> >> http://en.wikipedia.org/wiki/Multiple_comparisons >> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >> >> we also just had a related discussion for ANOVA post-hoc tests on the >> pystatsmodels group. >> >> Josef >>> >>> Thanks. >>> >>> >>> josef.pktd wrote: >>>> >>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>>> >>>>> Sounds like I am stuck with the loop as I need to do the comparison for >>>>> each >>>>> pixel of the world and then I have a basemap function call which I >>>>> guess >>>>> slows it down further...hmm >>>> >>>> I don't see much that could be done differently, after a brief look. >>>> >>>> stats.pearsonr could be replaced by an array version using directly >>>> the formula for correlation even with nans. wilcoxon looks slow, and I >>>> never tried or seen a faster version. >>>> >>>> just a reminder, the p-values are for a single test, when you have >>>> many of them, then they don't have the right size/confidence level for >>>> an overall or joint test. (some packages report a Bonferroni >>>> correction in this case) >>>> >>>> Josef >>>> >>>> >>>>> >>>>> i.e. >>>>> >>>>> def compareSnowData(jules_var): >>>>> ? ?# Extract the 11 years of snow data and return >>>>> ? ?outrows = 180 >>>>> ? ?outcols = 360 >>>>> ? ?numyears = 11 >>>>> ? ?nummonths = 12 >>>>> >>>>> ? ?# Read various files >>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>> jo.read_land_points_ascii(fname, 1.0) >>>>> >>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>>> \ >>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, >>>>> \ >>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>> >>>>> ? ?# grab some space >>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>> dtype=np.float32) >>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>> dtype=np.float32) >>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>> np.nan >>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>> np.nan >>>>> >>>>> ? ?# extract the data >>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>> ? 
?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>> ? ?#for month in xrange(numyears * nummonths): >>>>> ? ?# ? ?for i in xrange(numpts): >>>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>> ? ?# ? ? ? ?else: >>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>> ? ?# ? ? ? ?else: >>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>> >>>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>>> else >>>>> we >>>>> ? ?# can't do the correlations correctly!! >>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>> >>>>> ? ?# put data on a regular grid... >>>>> ? ?print 'regridding landpts...' >>>>> ? ?for i in xrange(numpts): >>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>> func >>>>> ? ? ? ?x = data1_snow[:,i] >>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>> ? ? ? ?y = data2_snow[:,i] >>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>> >>>>> ? ? ? ?# r^2 >>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years of >>>>> data >>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>> (stats.pearsonr(x, y)[0])**2 >>>>> >>>>> ? ? ? ?# wilcox signed rank test >>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>> ? ? ? ?d = x - y >>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>> non-zero >>>>> differences >>>>> ? ? ? ?count = len(d) >>>>> ? ? ? ?if count > 10: >>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>> ? ? ? ? ? ?# only map out sign different data >>>>> ? ? ? ? ? ?if pval < 0.05: >>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>> np.mean(x - y) >>>>> >>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe wrote: >>>>>>> >>>>>>> Also I then need to remap the 2D array I make onto another grid (the >>>>>>> world in >>>>>>> this case). Which again I had am doing with a loop (note numpts is a >>>>>>> lot >>>>>>> bigger than my example above). >>>>>>> >>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>> np.nan >>>>>>> for i in xrange(numpts): >>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>>> func >>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>> >>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>> ? ? ? ?d = x - y >>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>> non-zero >>>>>>> differences >>>>>>> ? ? ? ?count = len(d) >>>>>>> ? ? ? ?if count > 10: >>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>> ? ? ? ? ? ? ? 
?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>> np.mean(x - y) >>>>>>> >>>>>>> Now I think I can push the data in one move into the wilcoxStats_snow >>>>>>> array >>>>>>> by removing the index, >>>>>>> but I can't see how I will get the individual x and y pts for each >>>>>>> array >>>>>>> member correctly without the loop, this was my attempt which of >>>>>>> course >>>>>>> doesn't work! >>>>>>> >>>>>>> x = data1_snow[:,:] >>>>>>> x = x[np.isfinite(x)] >>>>>>> y = data2_snow[:,:] >>>>>>> y = y[np.isfinite(y)] >>>>>>> >>>>>>> # r^2 >>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data >>>>>>> if len(x) and len(y) > 50: >>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>> y)[0])**2 >>>>>> >>>>>> >>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays >>>>>> at a time (if I read the help correctly). >>>>>> >>>>>> Also the presence of nans might force the use a loop. stats.mstats has >>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>>> when vectorized operations would work with regular arrays, nan or >>>>>> masked array versions still have to loop in many cases.) >>>>>> >>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>> calculated then it might be worth to use only array operations up to >>>>>> that point. If wilcoxon is calculated most of the time, then it's not >>>>>> worth thinking too hard about this. >>>>>> >>>>>> Josef >>>>>> >>>>>> >>>>>>> >>>>>>> thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> mdekauwe wrote: >>>>>>>> >>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work. >>>>>>>> >>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>> there >>>>>>>> a >>>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>>> 4dims >>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>> >>>>>> There were several discussions on the mailing lists (fancy slicing and >>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>> shapes, you can look up the details. >>>>>> when in doubt, I use np.arange(...) >>>>>> >>>>>> Josef >>>>>> >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> josef.pktd wrote: >>>>>>>>> >>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Thanks that works... >>>>>>>>>> >>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>>> the >>>>>>>>>> step >>>>>>>>>> I >>>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>>> the >>>>>>>>>> two >>>>>>>>>> for >>>>>>>>>> loops? Do I have that right? >>>>>>>>> >>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>> dimension, >>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>> >>>>>>>>> Josef >>>>>>>>> >>>>>>>>>> >>>>>>>>>> A lot quicker...! 
>>>>>>>>>> >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> josef.pktd wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D >>>>>>>>>>>> array, >>>>>>>>>>>> but >>>>>>>>>>>> avoid my current usage of the for loops for speed, as in reality >>>>>>>>>>>> the >>>>>>>>>>>> arrays >>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>> solution >>>>>>>>>>>> as >>>>>>>>>>>> well >>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>> difficult >>>>>>>>>>>> to >>>>>>>>>>>> get >>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>> that >>>>>>>>>>>> one >>>>>>>>>>>> could >>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>> >>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>> >>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> import numpy as np >>>>>>>>>>>> >>>>>>>>>>>> numpts=10 >>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>> vari = 22 >>>>>>>>>>>> >>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>> >>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>> >>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>> >>>>>>>>>>> I think this should do it >>>>>>>>>>> >>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), >>>>>>>>>>> 0] >>>>>>>>>>> >>>>>>>>>>> Josef >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> View this message in context: >>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> View this message in context: >>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
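For concreteness, a small sketch of the suggestion above for dropping the inner year loop; the array shape, VAR and land_pts_index are invented stand-ins for this example, not the real files from the thread:

import numpy as np

nummonths, numyears, numpts = 12, 11, 5
VAR = 3                                  # invented variable index
land_pts_index = np.arange(numpts)       # stand-in for the real land-point index
array = np.random.random((numyears * nummonths, 26, numpts, 1))

monthly_total = np.zeros((nummonths, numpts), dtype=np.float32)
for month in range(nummonths):
    # variant 1: put the year steps into an index array and sum over them
    rows = np.arange(month, numyears * nummonths, nummonths)
    total_rows = array[rows[:, None], VAR, land_pts_index, 0].sum(axis=0)
    # variant 2: a strided slice picks out the same rows without an index array
    total_slice = array[month::nummonths, VAR, land_pts_index, 0].sum(axis=0)
    assert np.allclose(total_rows, total_slice)
    monthly_total[month, :] = total_slice

Only the twelve-iteration month loop is left; the year loop and the loop over points are absorbed into the indexing and the sum along axis 0.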
>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Fri May 28 16:14:56 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 13:14:56 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> Message-ID: <28711444.post@talk.nabble.com> ok - something like this then...but how would i get the index for the month for the data array (where month is 0, 1, 2, 4 ... 11)? data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0] and would that be quicker than making an array months... months = np.arange(numyears * nummonths) and you that instead like you suggested x[start:end:12,:]? Many thanks again... josef.pktd wrote: > > On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >> >> Ok thanks...I'll take a look. >> >> Back to my loops issue. What if instead this time I wanted to take an >> average so every march in 11 years, is there a quicker way to go about >> doing >> that than my current method? >> >> nummonths = 12 >> numyears = 11 >> >> for month in xrange(nummonths): >> ? ?for i in xrange(numpts): >> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >> ? ? ? ? ? 
?data[month, i] += array[ym, VAR, land_pts_index[i], 0] > > > x[start:end:12,:] gives you every 12th row of an array x > > something like this should work to get rid of the inner loop, or you > could directly put > range(month, numyears * nummonths, nummonths) into the array instead > of ym and sum() > > Josef > > >> >> so for each point in the array for a given month i am jumping through and >> getting the next years month and so on, summing it. >> >> Thanks... >> >> >> josef.pktd wrote: >>> >>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>> >>>> Could you possibly if you have time explain further your comment re the >>>> p-values, your suggesting I am misusing them? >>> >>> Depends on your use and interpretation >>> >>> test statistics, p-values are random variables, if you look at several >>> tests at the same time, some p-values will be large just by chance. >>> If, for example you just look at the largest test statistic, then the >>> distribution for the max of several test statistics is not the same as >>> the distribution for a single test statistic >>> >>> http://en.wikipedia.org/wiki/Multiple_comparisons >>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>> >>> we also just had a related discussion for ANOVA post-hoc tests on the >>> pystatsmodels group. >>> >>> Josef >>>> >>>> Thanks. >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>>>> >>>>>> Sounds like I am stuck with the loop as I need to do the comparison >>>>>> for >>>>>> each >>>>>> pixel of the world and then I have a basemap function call which I >>>>>> guess >>>>>> slows it down further...hmm >>>>> >>>>> I don't see much that could be done differently, after a brief look. >>>>> >>>>> stats.pearsonr could be replaced by an array version using directly >>>>> the formula for correlation even with nans. wilcoxon looks slow, and I >>>>> never tried or seen a faster version. >>>>> >>>>> just a reminder, the p-values are for a single test, when you have >>>>> many of them, then they don't have the right size/confidence level for >>>>> an overall or joint test. (some packages report a Bonferroni >>>>> correction in this case) >>>>> >>>>> Josef >>>>> >>>>> >>>>>> >>>>>> i.e. >>>>>> >>>>>> def compareSnowData(jules_var): >>>>>> ? ?# Extract the 11 years of snow data and return >>>>>> ? ?outrows = 180 >>>>>> ? ?outcols = 360 >>>>>> ? ?numyears = 11 >>>>>> ? ?nummonths = 12 >>>>>> >>>>>> ? ?# Read various files >>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>> >>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>> numcols=1, >>>>>> \ >>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>> numcols=1, >>>>>> \ >>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>> >>>>>> ? ?# grab some space >>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>> dtype=np.float32) >>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>> dtype=np.float32) >>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>> np.nan >>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>> np.nan >>>>>> >>>>>> ? ?# extract the data >>>>>> ? 
?data1_snow = jules_data1[:,jules_var,:,0] >>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>> ? ?# ? ? ? ?else: >>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>> ? ?# ? ? ? ?else: >>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>> >>>>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>>>> else >>>>>> we >>>>>> ? ?# can't do the correlations correctly!! >>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>> >>>>>> ? ?# put data on a regular grid... >>>>>> ? ?print 'regridding landpts...' >>>>>> ? ?for i in xrange(numpts): >>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>> func >>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>> >>>>>> ? ? ? ?# r^2 >>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years >>>>>> of >>>>>> data >>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>> >>>>>> ? ? ? ?# wilcox signed rank test >>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>> ? ? ? ?d = x - y >>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>> non-zero >>>>>> differences >>>>>> ? ? ? ?count = len(d) >>>>>> ? ? ? ?if count > 10: >>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>> np.mean(x - y) >>>>>> >>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>> wrote: >>>>>>>> >>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>> (the >>>>>>>> world in >>>>>>>> this case). Which again I had am doing with a loop (note numpts is >>>>>>>> a >>>>>>>> lot >>>>>>>> bigger than my example above). >>>>>>>> >>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>>> np.nan >>>>>>>> for i in xrange(numpts): >>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>> stats >>>>>>>> func >>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>> >>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>> ? ? ? ?d = x - y >>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>> non-zero >>>>>>>> differences >>>>>>>> ? ? ? ?count = len(d) >>>>>>>> ? ? ? ?if count > 10: >>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>> ? ? ? ? 
? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>> np.mean(x - y) >>>>>>>> >>>>>>>> Now I think I can push the data in one move into the >>>>>>>> wilcoxStats_snow >>>>>>>> array >>>>>>>> by removing the index, >>>>>>>> but I can't see how I will get the individual x and y pts for each >>>>>>>> array >>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>> course >>>>>>>> doesn't work! >>>>>>>> >>>>>>>> x = data1_snow[:,:] >>>>>>>> x = x[np.isfinite(x)] >>>>>>>> y = data2_snow[:,:] >>>>>>>> y = y[np.isfinite(y)] >>>>>>>> >>>>>>>> # r^2 >>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>> data >>>>>>>> if len(x) and len(y) > 50: >>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>>> y)[0])**2 >>>>>>> >>>>>>> >>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>> arrays >>>>>>> at a time (if I read the help correctly). >>>>>>> >>>>>>> Also the presence of nans might force the use a loop. stats.mstats >>>>>>> has >>>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>> masked array versions still have to loop in many cases.) >>>>>>> >>>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>>> calculated then it might be worth to use only array operations up to >>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>> not >>>>>>> worth thinking too hard about this. >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> thanks. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> mdekauwe wrote: >>>>>>>>> >>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>> work. >>>>>>>>> >>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>> there >>>>>>>>> a >>>>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>>>> 4dims >>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>> >>>>>>> There were several discussions on the mailing lists (fancy slicing >>>>>>> and >>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>> shapes, you can look up the details. >>>>>>> when in doubt, I use np.arange(...) >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> josef.pktd wrote: >>>>>>>>>> >>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Thanks that works... >>>>>>>>>>> >>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>>>> the >>>>>>>>>>> step >>>>>>>>>>> I >>>>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>>>> the >>>>>>>>>>> two >>>>>>>>>>> for >>>>>>>>>>> loops? Do I have that right? >>>>>>>>>> >>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>> dimension, >>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> A lot quicker...! 
>>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a >>>>>>>>>>>>> 2D >>>>>>>>>>>>> array, >>>>>>>>>>>>> but >>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>> reality >>>>>>>>>>>>> the >>>>>>>>>>>>> arrays >>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>> solution >>>>>>>>>>>>> as >>>>>>>>>>>>> well >>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>> difficult >>>>>>>>>>>>> to >>>>>>>>>>>>> get >>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>>> that >>>>>>>>>>>>> one >>>>>>>>>>>>> could >>>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>>> >>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>> >>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>> >>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>> >>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>> >>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>> >>>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>>> >>>>>>>>>>>> I think this should do it >>>>>>>>>>>> >>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>> 0] >>>>>>>>>>>> >>>>>>>>>>>> Josef >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> View this message in context: >>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> View this message in context: >>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> View this message in context: >>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 28 16:25:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 16:25:59 -0400 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711444.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> Message-ID: On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: > > ok - something like this then...but how would i get the index for the month > for the data array (where month is 0, 1, 2, 4 ... 11)? > > data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0] you would still need to start at the right month data[month,:] = array[xrange(month, numyears * nummonths, nummonths),VAR,:,0] or data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] an alternative would be a reshape with an extra month dimension and then sum only once over the year axis. this might be faster but trickier to get the correct reshape . Josef > > and would that be quicker than making an array months... > > months = np.arange(numyears * nummonths) > > and you that instead like you suggested x[start:end:12,:]? > > Many thanks again... 
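On the quicker-or-not question quoted just above: a strided slice such as x[month::12, :] gives a view onto the original data, while indexing with an integer array like np.arange(month, numyears * nummonths, nummonths) always makes a copy, so the slice is generally the cheaper of the two. A tiny sketch of the difference (the array here is just an invented example):

import numpy as np

x = np.arange(24.0).reshape(12, 2)

v = x[0::12, :]                   # slicing -> a view that shares memory with x
v[0, 0] = -1.0
assert x[0, 0] == -1.0            # the write shows up in x

f = x[np.arange(0, 12, 12), :]    # an index array -> an independent copy
f[0, 0] = 99.0
assert x[0, 0] == -1.0            # x is unchanged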
> > > josef.pktd wrote: >> >> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >>> >>> Ok thanks...I'll take a look. >>> >>> Back to my loops issue. What if instead this time I wanted to take an >>> average so every march in 11 years, is there a quicker way to go about >>> doing >>> that than my current method? >>> >>> nummonths = 12 >>> numyears = 11 >>> >>> for month in xrange(nummonths): >>> ? ?for i in xrange(numpts): >>> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >>> ? ? ? ? ? ?data[month, i] += array[ym, VAR, land_pts_index[i], 0] >> >> >> x[start:end:12,:] gives you every 12th row of an array x >> >> something like this should work to get rid of the inner loop, or you >> could directly put >> range(month, numyears * nummonths, nummonths) into the array instead >> of ym and sum() >> >> Josef >> >> >>> >>> so for each point in the array for a given month i am jumping through and >>> getting the next years month and so on, summing it. >>> >>> Thanks... >>> >>> >>> josef.pktd wrote: >>>> >>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>>> >>>>> Could you possibly if you have time explain further your comment re the >>>>> p-values, your suggesting I am misusing them? >>>> >>>> Depends on your use and interpretation >>>> >>>> test statistics, p-values are random variables, if you look at several >>>> tests at the same time, some p-values will be large just by chance. >>>> If, for example you just look at the largest test statistic, then the >>>> distribution for the max of several test statistics is not the same as >>>> the distribution for a single test statistic >>>> >>>> http://en.wikipedia.org/wiki/Multiple_comparisons >>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>>> >>>> we also just had a related discussion for ANOVA post-hoc tests on the >>>> pystatsmodels group. >>>> >>>> Josef >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe wrote: >>>>>>> >>>>>>> Sounds like I am stuck with the loop as I need to do the comparison >>>>>>> for >>>>>>> each >>>>>>> pixel of the world and then I have a basemap function call which I >>>>>>> guess >>>>>>> slows it down further...hmm >>>>>> >>>>>> I don't see much that could be done differently, after a brief look. >>>>>> >>>>>> stats.pearsonr could be replaced by an array version using directly >>>>>> the formula for correlation even with nans. wilcoxon looks slow, and I >>>>>> never tried or seen a faster version. >>>>>> >>>>>> just a reminder, the p-values are for a single test, when you have >>>>>> many of them, then they don't have the right size/confidence level for >>>>>> an overall or joint test. (some packages report a Bonferroni >>>>>> correction in this case) >>>>>> >>>>>> Josef >>>>>> >>>>>> >>>>>>> >>>>>>> i.e. >>>>>>> >>>>>>> def compareSnowData(jules_var): >>>>>>> ? ?# Extract the 11 years of snow data and return >>>>>>> ? ?outrows = 180 >>>>>>> ? ?outcols = 360 >>>>>>> ? ?numyears = 11 >>>>>>> ? ?nummonths = 12 >>>>>>> >>>>>>> ? ?# Read various files >>>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>>> >>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>> numcols=1, >>>>>>> \ >>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>>> ? 
?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>> numcols=1, >>>>>>> \ >>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>> >>>>>>> ? ?# grab some space >>>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>>> dtype=np.float32) >>>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>>> dtype=np.float32) >>>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>> np.nan >>>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>> np.nan >>>>>>> >>>>>>> ? ?# extract the data >>>>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>>> ? ?# ? ? ? ?data1 = jules_data1[month,jules_var,land_pts_index[i],0] >>>>>>> ? ?# ? ? ? ?data2 = jules_data2[month,jules_var,land_pts_index[i],0] >>>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>>> ? ?# ? ? ? ?else: >>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>>> ? ?# ? ? ? ?else: >>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>>> >>>>>>> ? ?# exclude any months from *both* arrays where we have dodgy data, >>>>>>> else >>>>>>> we >>>>>>> ? ?# can't do the correlations correctly!! >>>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>>> >>>>>>> ? ?# put data on a regular grid... >>>>>>> ? ?print 'regridding landpts...' >>>>>>> ? ?for i in xrange(numpts): >>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the stats >>>>>>> func >>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>> >>>>>>> ? ? ? ?# r^2 >>>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 years >>>>>>> of >>>>>>> data >>>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>>> >>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>> ? ? ? ?d = x - y >>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>> non-zero >>>>>>> differences >>>>>>> ? ? ? ?count = len(d) >>>>>>> ? ? ? ?if count > 10: >>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>> np.mean(x - y) >>>>>>> >>>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>>> (the >>>>>>>>> world in >>>>>>>>> this case). Which again I had am doing with a loop (note numpts is >>>>>>>>> a >>>>>>>>> lot >>>>>>>>> bigger than my example above). >>>>>>>>> >>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>>>> np.nan >>>>>>>>> for i in xrange(numpts): >>>>>>>>> ? ? ? 
?# exclude the NaN, note masking them doesn't work in the >>>>>>>>> stats >>>>>>>>> func >>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>> >>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>> ? ? ? ?d = x - y >>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>> non-zero >>>>>>>>> differences >>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>>> np.mean(x - y) >>>>>>>>> >>>>>>>>> Now I think I can push the data in one move into the >>>>>>>>> wilcoxStats_snow >>>>>>>>> array >>>>>>>>> by removing the index, >>>>>>>>> but I can't see how I will get the individual x and y pts for each >>>>>>>>> array >>>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>>> course >>>>>>>>> doesn't work! >>>>>>>>> >>>>>>>>> x = data1_snow[:,:] >>>>>>>>> x = x[np.isfinite(x)] >>>>>>>>> y = data2_snow[:,:] >>>>>>>>> y = y[np.isfinite(y)] >>>>>>>>> >>>>>>>>> # r^2 >>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>>> data >>>>>>>>> if len(x) and len(y) > 50: >>>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>>>> y)[0])**2 >>>>>>>> >>>>>>>> >>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you >>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>>> arrays >>>>>>>> at a time (if I read the help correctly). >>>>>>>> >>>>>>>> Also the presence of nans might force the use a loop. stats.mstats >>>>>>>> has >>>>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even >>>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>>> masked array versions still have to loop in many cases.) >>>>>>>> >>>>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>>>> calculated then it might be worth to use only array operations up to >>>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>>> not >>>>>>>> worth thinking too hard about this. >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> mdekauwe wrote: >>>>>>>>>> >>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>>> work. >>>>>>>>>> >>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>>> there >>>>>>>>>> a >>>>>>>>>> link you can recommend I should read? Does that mean given I have >>>>>>>>>> 4dims >>>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>>> >>>>>>>> There were several discussions on the mailing lists (fancy slicing >>>>>>>> and >>>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>>> shapes, you can look up the details. >>>>>>>> when in doubt, I use np.arange(...) >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> josef.pktd wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks that works... 
>>>>>>>>>>>> >>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was >>>>>>>>>>>> the >>>>>>>>>>>> step >>>>>>>>>>>> I >>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces the >>>>>>>>>>>> the >>>>>>>>>>>> two >>>>>>>>>>>> for >>>>>>>>>>>> loops? Do I have that right? >>>>>>>>>>> >>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>>> dimension, >>>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>>>> >>>>>>>>>>> Josef >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> A lot quicker...! >>>>>>>>>>>> >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a >>>>>>>>>>>>>> 2D >>>>>>>>>>>>>> array, >>>>>>>>>>>>>> but >>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>>> reality >>>>>>>>>>>>>> the >>>>>>>>>>>>>> arrays >>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>>> solution >>>>>>>>>>>>>> as >>>>>>>>>>>>>> well >>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>>> difficult >>>>>>>>>>>>>> to >>>>>>>>>>>>>> get >>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>>>> that >>>>>>>>>>>>>> one >>>>>>>>>>>>>> could >>>>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>>>> >>>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>>> >>>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>>> >>>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>>> >>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>>> >>>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>>> >>>>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>>>> >>>>>>>>>>>>> I think this should do it >>>>>>>>>>>>> >>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>>> 0] >>>>>>>>>>>>> >>>>>>>>>>>>> Josef >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> View this message in context: >>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. 
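The reshape alternative mentioned in the reply above can be sketched as follows; the shapes are invented for the example, and it is only correct if the time axis really is laid out as whole years of consecutive months, which is presumably the "trickier" part referred to:

import numpy as np

nummonths, numyears, numpts = 12, 11, 5
VAR = 3                                  # invented variable index
array = np.random.random((numyears * nummonths, 26, numpts, 1))

# pull out one variable -> (numyears * nummonths, numpts), split the time axis
# into (year, month) and sum over the years in a single operation
series = array[:, VAR, :, 0]
monthly_total = series.reshape(numyears, nummonths, numpts).sum(axis=0)

# same answer as stepping through the months with a strided slice
check = np.array([series[m::nummonths, :].sum(axis=0) for m in range(nummonths)])
assert np.allclose(monthly_total, check)

This removes the month loop as well; dividing monthly_total by numyears gives the eleven-year mean for each calendar month.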
>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mdekauwe at gmail.com Fri May 28 16:28:12 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 13:28:12 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> Message-ID: <28711581.post@talk.nabble.com> OK so I just need to have a quick loop across the 12 months then, that is fine, just thought there might have been a sneaky way! Really appreciated, getting there slowly! josef.pktd wrote: > > On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: >> >> ok - something like this then...but how would i get the index for the >> month >> for the data array (where month is 0, 1, 2, 4 ... 11)? >> >> data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0] > > you would still need to start at the right month > data[month,:] = array[xrange(month, numyears * nummonths, > nummonths),VAR,:,0] > or > data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] > > an alternative would be a reshape with an extra month dimension and > then sum only once over the year axis. this might be faster but > trickier to get the correct reshape . > > Josef > >> >> and would that be quicker than making an array months... >> >> months = np.arange(numyears * nummonths) >> >> and you that instead like you suggested x[start:end:12,:]? >> >> Many thanks again... >> >> >> josef.pktd wrote: >>> >>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >>>> >>>> Ok thanks...I'll take a look. >>>> >>>> Back to my loops issue. What if instead this time I wanted to take an >>>> average so every march in 11 years, is there a quicker way to go about >>>> doing >>>> that than my current method? >>>> >>>> nummonths = 12 >>>> numyears = 11 >>>> >>>> for month in xrange(nummonths): >>>> ? ?for i in xrange(numpts): >>>> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >>>> ? ? ? ? ? ?data[month, i] += array[ym, VAR, land_pts_index[i], 0] >>> >>> >>> x[start:end:12,:] gives you every 12th row of an array x >>> >>> something like this should work to get rid of the inner loop, or you >>> could directly put >>> range(month, numyears * nummonths, nummonths) into the array instead >>> of ym and sum() >>> >>> Josef >>> >>> >>>> >>>> so for each point in the array for a given month i am jumping through >>>> and >>>> getting the next years month and so on, summing it. >>>> >>>> Thanks... >>>> >>>> >>>> josef.pktd wrote: >>>>> >>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>>>> >>>>>> Could you possibly if you have time explain further your comment re >>>>>> the >>>>>> p-values, your suggesting I am misusing them? 
>>>>> >>>>> Depends on your use and interpretation >>>>> >>>>> test statistics, p-values are random variables, if you look at several >>>>> tests at the same time, some p-values will be large just by chance. >>>>> If, for example you just look at the largest test statistic, then the >>>>> distribution for the max of several test statistics is not the same as >>>>> the distribution for a single test statistic >>>>> >>>>> http://en.wikipedia.org/wiki/Multiple_comparisons >>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>>>> >>>>> we also just had a related discussion for ANOVA post-hoc tests on the >>>>> pystatsmodels group. >>>>> >>>>> Josef >>>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>> josef.pktd wrote: >>>>>>> >>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe >>>>>>> wrote: >>>>>>>> >>>>>>>> Sounds like I am stuck with the loop as I need to do the comparison >>>>>>>> for >>>>>>>> each >>>>>>>> pixel of the world and then I have a basemap function call which I >>>>>>>> guess >>>>>>>> slows it down further...hmm >>>>>>> >>>>>>> I don't see much that could be done differently, after a brief look. >>>>>>> >>>>>>> stats.pearsonr could be replaced by an array version using directly >>>>>>> the formula for correlation even with nans. wilcoxon looks slow, and >>>>>>> I >>>>>>> never tried or seen a faster version. >>>>>>> >>>>>>> just a reminder, the p-values are for a single test, when you have >>>>>>> many of them, then they don't have the right size/confidence level >>>>>>> for >>>>>>> an overall or joint test. (some packages report a Bonferroni >>>>>>> correction in this case) >>>>>>> >>>>>>> Josef >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> i.e. >>>>>>>> >>>>>>>> def compareSnowData(jules_var): >>>>>>>> ? ?# Extract the 11 years of snow data and return >>>>>>>> ? ?outrows = 180 >>>>>>>> ? ?outcols = 360 >>>>>>>> ? ?numyears = 11 >>>>>>>> ? ?nummonths = 12 >>>>>>>> >>>>>>>> ? ?# Read various files >>>>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>>>> >>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>> numcols=1, >>>>>>>> \ >>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>> numcols=1, >>>>>>>> \ >>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>> >>>>>>>> ? ?# grab some space >>>>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>> dtype=np.float32) >>>>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>> dtype=np.float32) >>>>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * >>>>>>>> np.nan >>>>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>> * >>>>>>>> np.nan >>>>>>>> >>>>>>>> ? ?# extract the data >>>>>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>>>> ? ?# ? ? ? ?data1 = >>>>>>>> jules_data1[month,jules_var,land_pts_index[i],0] >>>>>>>> ? ?# ? ? ? 
?data2 = >>>>>>>> jules_data2[month,jules_var,land_pts_index[i],0] >>>>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>>>> ? ?# ? ? ? ?else: >>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>>>> ? ?# ? ? ? ?else: >>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>>>> >>>>>>>> ? ?# exclude any months from *both* arrays where we have dodgy >>>>>>>> data, >>>>>>>> else >>>>>>>> we >>>>>>>> ? ?# can't do the correlations correctly!! >>>>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>>>> >>>>>>>> ? ?# put data on a regular grid... >>>>>>>> ? ?print 'regridding landpts...' >>>>>>>> ? ?for i in xrange(numpts): >>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>> stats >>>>>>>> func >>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>> >>>>>>>> ? ? ? ?# r^2 >>>>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 >>>>>>>> years >>>>>>>> of >>>>>>>> data >>>>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>>>> ? ? ? ? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>>>> >>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>> ? ? ? ?d = x - y >>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>> non-zero >>>>>>>> differences >>>>>>>> ? ? ? ?count = len(d) >>>>>>>> ? ? ? ?if count > 10: >>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>> np.mean(x - y) >>>>>>>> >>>>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>>>> >>>>>>>> >>>>>>>> josef.pktd wrote: >>>>>>>>> >>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>>>> (the >>>>>>>>>> world in >>>>>>>>>> this case). Which again I had am doing with a loop (note numpts >>>>>>>>>> is >>>>>>>>>> a >>>>>>>>>> lot >>>>>>>>>> bigger than my example above). >>>>>>>>>> >>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>>>> * >>>>>>>>>> np.nan >>>>>>>>>> for i in xrange(numpts): >>>>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>>>> stats >>>>>>>>>> func >>>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>>> >>>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>>> ? ? ? ?d = x - y >>>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>>> non-zero >>>>>>>>>> differences >>>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>>> ? ? ? ? ? ? ? 
?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] >>>>>>>>>> = >>>>>>>>>> np.mean(x - y) >>>>>>>>>> >>>>>>>>>> Now I think I can push the data in one move into the >>>>>>>>>> wilcoxStats_snow >>>>>>>>>> array >>>>>>>>>> by removing the index, >>>>>>>>>> but I can't see how I will get the individual x and y pts for >>>>>>>>>> each >>>>>>>>>> array >>>>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>>>> course >>>>>>>>>> doesn't work! >>>>>>>>>> >>>>>>>>>> x = data1_snow[:,:] >>>>>>>>>> x = x[np.isfinite(x)] >>>>>>>>>> y = data2_snow[:,:] >>>>>>>>>> y = y[np.isfinite(y)] >>>>>>>>>> >>>>>>>>>> # r^2 >>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>>>> data >>>>>>>>>> if len(x) and len(y) > 50: >>>>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, >>>>>>>>>> y)[0])**2 >>>>>>>>> >>>>>>>>> >>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then >>>>>>>>> you >>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>>>> arrays >>>>>>>>> at a time (if I read the help correctly). >>>>>>>>> >>>>>>>>> Also the presence of nans might force the use a loop. stats.mstats >>>>>>>>> has >>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. >>>>>>>>> (Even >>>>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>>>> masked array versions still have to loop in many cases.) >>>>>>>>> >>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is not >>>>>>>>> calculated then it might be worth to use only array operations up >>>>>>>>> to >>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>>>> not >>>>>>>>> worth thinking too hard about this. >>>>>>>>> >>>>>>>>> Josef >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> mdekauwe wrote: >>>>>>>>>>> >>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>>>> work. >>>>>>>>>>> >>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>>>> there >>>>>>>>>>> a >>>>>>>>>>> link you can recommend I should read? Does that mean given I >>>>>>>>>>> have >>>>>>>>>>> 4dims >>>>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>>>> >>>>>>>>> There were several discussions on the mailing lists (fancy slicing >>>>>>>>> and >>>>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>>>> shapes, you can look up the details. >>>>>>>>> when in doubt, I use np.arange(...) >>>>>>>>> >>>>>>>>> Josef >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks that works... >>>>>>>>>>>>> >>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that >>>>>>>>>>>>> was >>>>>>>>>>>>> the >>>>>>>>>>>>> step >>>>>>>>>>>>> I >>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces >>>>>>>>>>>>> the >>>>>>>>>>>>> the >>>>>>>>>>>>> two >>>>>>>>>>>>> for >>>>>>>>>>>>> loops? Do I have that right? >>>>>>>>>>>> >>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>>>> dimension, >>>>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>>>> dimensions can have some surprise switching of axes. 
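A quick illustration of the axis-switching warning quoted just above (a toy example with made-up shapes, not taken from the thread). When two index arrays are separated by a slice, NumPy moves their broadcast dimensions to the front of the result; when they sit next to each other they stay in place:

import numpy as np

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# index arrays on the first and last axes with a slice in between:
# their broadcast shape (3, 5) moves to the front -> (3, 5, 4)
print a[np.arange(3)[:, None], :, np.arange(5)].shape

# adjacent index arrays keep their position -> (3, 4)
print a[:, np.arange(4), np.arange(4)].shape

That reordering is the "surprise switching of axes" referred to above; it does not bite in the tsteps/numpts example from earlier in the thread because only the two index arrays contribute dimensions to the result.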
>>>>>>>>>>>> >>>>>>>>>>>> Josef >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> A lot quicker...! >>>>>>>>>>>>> >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>> 2D >>>>>>>>>>>>>>> array, >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>>>> reality >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> arrays >>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>>>> solution >>>>>>>>>>>>>>> as >>>>>>>>>>>>>>> well >>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>>>> difficult >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> get >>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> one >>>>>>>>>>>>>>> could >>>>>>>>>>>>>>> precompute the indices's i and j i.e. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>>>> >>>>>>>>>>>>>> The index arrays need to be broadcastable against each other. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this should do it >>>>>>>>>>>>>> >>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>>>> 0] >>>>>>>>>>>>>> >>>>>>>>>>>>>> Josef >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
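A sketch of the month-by-month averaging discussed above without the loop over points. Assumptions not all stated in the thread: array has shape (numyears * nummonths, nvars, numpts, 1) with the time axis ordered year by year, VAR / numyears / nummonths / numpts are defined as in the earlier posts, and negative values are fill values to be excluded as in compareSnowData. Untested against the original data:

import numpy as np

# every 12th timestep, starting at `month`, then average over the years
mean_loop = np.zeros((nummonths, numpts), dtype=np.float32)
for month in xrange(nummonths):
    mean_loop[month, :] = array[month::nummonths, VAR, :, 0].mean(axis=0)

# the reshape alternative mentioned above: make the (year, month) axes
# explicit and average over the year axis in one call
yearly = array[:, VAR, :, 0].reshape(numyears, nummonths, numpts)
mean_reshape = yearly.mean(axis=0)                 # shape (nummonths, numpts)

# per-point means that skip the fill values (< 0) instead of dropping points
valid = yearly >= 0.0
counts = valid.sum(axis=0)                         # valid years per (month, point)
sums = np.where(valid, yearly, 0.0).sum(axis=0)
mean_masked = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

The small loop over the 12 months is cheap; the reshape form simply avoids it. Whether the masked version matches the intent depends on whether a point with no valid years should come out as NaN, which is what is assumed here.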
From vanforeest at gmail.com Fri May 28 16:28:24 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 28 May 2010 22:28:24 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: Hi, Nice to see the issue to be taken up again. >> Discrete distributions on the real line don't *have* a pdf... > > Well, they *have* one; they just can't be implemented in floating point. :-) A distribution function can be decomposed in a part that can be represented by a pdf (absolute continuous), and a part that can be represented by a pmf (jumps), and some extra stuff (Cantor like functions) that we can safely neglect from a numerical point of view. (The discussion above is resolved in any book on measure theory, and covered by the Lebesgue decomposition theorem, for the interested...) I don't know how to resolve the name problem about pdf and pmf. I must admit I find it quite disturbing, since I also make these typo's, but I don't know how to resolve this neatly. >>> snip pdf(x), cdf(x) with x float would need to know whether x is a support point, but which might not be equal to the actual point because of floating point problems. So, the direct translation of rv_discrete doesn't work, and it looks like at least pdf needs to be accessible either pointwise for queries or using known support points for actual calculations. >>> About representing floats in a hashtable, this is indeed hard to resolve. However, for the particular purpose of defining a random variable with support on a finite set of reals, it might suffice to represent these reals by fractions, for instance, \pi \approx 22/7 (I realize better approximations exist.), and then store 22 and 7 separately. Then generalize rv_discrete such that it accepts tuples like (22, 7, 1.) with dtype (int, int, float). >>> No fun, and EDA dropped. >>> EDA dropped? I don't know what EDA means. I hope it does not have severe consequences. Nicky From mdekauwe at gmail.com Fri May 28 16:42:44 2010 From: mdekauwe at gmail.com (mdekauwe) Date: Fri, 28 May 2010 13:42:44 -0700 (PDT) Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711581.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> <28711581.post@talk.nabble.com> Message-ID: <28711708.post@talk.nabble.com> In my original attempt I was only averaging values greater than 0.0, would this be the work around, I wonder if it is a bit clumsy...? for month in xrange(nummonths): temp[:] = array[month:numyears * nummonths:nummonths,VAR,:,0] temp = temp[temp>0.0] data[month, :] = np.mean(temp[:]) mdekauwe wrote: > > OK so I just need to have a quick loop across the 12 months then, that is > fine, just thought there might have been a sneaky way! > > Really appreciated, getting there slowly! > > > > josef.pktd wrote: >> >> On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: >>> >>> ok - something like this then...but how would i get the index for the >>> month >>> for the data array (where month is 0, 1, 2, 4 ... 11)? 
>>> >>> data[month,:] = array[xrange(0, numyears * nummonths, >>> nummonths),VAR,:,0] >> >> you would still need to start at the right month >> data[month,:] = array[xrange(month, numyears * nummonths, >> nummonths),VAR,:,0] >> or >> data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] >> >> an alternative would be a reshape with an extra month dimension and >> then sum only once over the year axis. this might be faster but >> trickier to get the correct reshape . >> >> Josef >> >>> >>> and would that be quicker than making an array months... >>> >>> months = np.arange(numyears * nummonths) >>> >>> and you that instead like you suggested x[start:end:12,:]? >>> >>> Many thanks again... >>> >>> >>> josef.pktd wrote: >>>> >>>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: >>>>> >>>>> Ok thanks...I'll take a look. >>>>> >>>>> Back to my loops issue. What if instead this time I wanted to take an >>>>> average so every march in 11 years, is there a quicker way to go about >>>>> doing >>>>> that than my current method? >>>>> >>>>> nummonths = 12 >>>>> numyears = 11 >>>>> >>>>> for month in xrange(nummonths): >>>>> ? ?for i in xrange(numpts): >>>>> ? ? ? ?for ym in xrange(month, numyears * nummonths, nummonths): >>>>> ? ? ? ? ? ?data[month, i] += array[ym, VAR, land_pts_index[i], 0] >>>> >>>> >>>> x[start:end:12,:] gives you every 12th row of an array x >>>> >>>> something like this should work to get rid of the inner loop, or you >>>> could directly put >>>> range(month, numyears * nummonths, nummonths) into the array instead >>>> of ym and sum() >>>> >>>> Josef >>>> >>>> >>>>> >>>>> so for each point in the array for a given month i am jumping through >>>>> and >>>>> getting the next years month and so on, summing it. >>>>> >>>>> Thanks... >>>>> >>>>> >>>>> josef.pktd wrote: >>>>>> >>>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe wrote: >>>>>>> >>>>>>> Could you possibly if you have time explain further your comment re >>>>>>> the >>>>>>> p-values, your suggesting I am misusing them? >>>>>> >>>>>> Depends on your use and interpretation >>>>>> >>>>>> test statistics, p-values are random variables, if you look at >>>>>> several >>>>>> tests at the same time, some p-values will be large just by chance. >>>>>> If, for example you just look at the largest test statistic, then the >>>>>> distribution for the max of several test statistics is not the same >>>>>> as >>>>>> the distribution for a single test statistic >>>>>> >>>>>> http://en.wikipedia.org/wiki/Multiple_comparisons >>>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm >>>>>> >>>>>> we also just had a related discussion for ANOVA post-hoc tests on the >>>>>> pystatsmodels group. >>>>>> >>>>>> Josef >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> josef.pktd wrote: >>>>>>>> >>>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Sounds like I am stuck with the loop as I need to do the >>>>>>>>> comparison >>>>>>>>> for >>>>>>>>> each >>>>>>>>> pixel of the world and then I have a basemap function call which I >>>>>>>>> guess >>>>>>>>> slows it down further...hmm >>>>>>>> >>>>>>>> I don't see much that could be done differently, after a brief >>>>>>>> look. >>>>>>>> >>>>>>>> stats.pearsonr could be replaced by an array version using directly >>>>>>>> the formula for correlation even with nans. wilcoxon looks slow, >>>>>>>> and I >>>>>>>> never tried or seen a faster version. 
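A sketch of the "array version" of pearsonr mentioned just above, applying the computational formula for r column by column. It assumes data1_snow and data2_snow are (ntime, numpts) arrays in which missing months have already been set to NaN in both arrays, as in the posted function; columns with too few valid months should still be screened afterwards (e.g. the len > 50 check). Not benchmarked against stats.pearsonr here:

import numpy as np

def pearsonr_columns(x, y):
    # columnwise Pearson r, ignoring rows that are NaN in either array
    valid = np.isfinite(x) & np.isfinite(y)
    n = valid.sum(axis=0).astype(np.float64)
    xv = np.where(valid, x, 0.0)
    yv = np.where(valid, y, 0.0)
    sx, sy = xv.sum(axis=0), yv.sum(axis=0)
    cov = (xv * yv).sum(axis=0) - sx * sy / n
    varx = (xv * xv).sum(axis=0) - sx * sx / n
    vary = (yv * yv).sum(axis=0) - sy * sy / n
    # columns with n == 0 or zero variance come out as nan/inf; screen them
    return cov / np.sqrt(varx * vary)

# r2_map = pearsonr_columns(data1_snow, data2_snow) ** 2   # one r^2 per point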
>>>>>>>> >>>>>>>> just a reminder, the p-values are for a single test, when you have >>>>>>>> many of them, then they don't have the right size/confidence level >>>>>>>> for >>>>>>>> an overall or joint test. (some packages report a Bonferroni >>>>>>>> correction in this case) >>>>>>>> >>>>>>>> Josef >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> i.e. >>>>>>>>> >>>>>>>>> def compareSnowData(jules_var): >>>>>>>>> ? ?# Extract the 11 years of snow data and return >>>>>>>>> ? ?outrows = 180 >>>>>>>>> ? ?outcols = 360 >>>>>>>>> ? ?numyears = 11 >>>>>>>>> ? ?nummonths = 12 >>>>>>>>> >>>>>>>>> ? ?# Read various files >>>>>>>>> ? ?fname="world_valid_jules_pts.ascii" >>>>>>>>> ? ?(numpts, land_pts_index, latitude, longitude, rows, cols) = >>>>>>>>> jo.read_land_points_ascii(fname, 1.0) >>>>>>>>> >>>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" >>>>>>>>> ? ?jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>>> numcols=1, >>>>>>>>> \ >>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>>> ? ?fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" >>>>>>>>> ? ?jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, >>>>>>>>> numcols=1, >>>>>>>>> \ >>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? timesteps=132, numvars=26) >>>>>>>>> >>>>>>>>> ? ?# grab some space >>>>>>>>> ? ?data1_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>>> dtype=np.float32) >>>>>>>>> ? ?data2_snow = np.zeros((nummonths * numyears, numpts), >>>>>>>>> dtype=np.float32) >>>>>>>>> ? ?pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>>> * >>>>>>>>> np.nan >>>>>>>>> ? ?wilcoxStats_snow = np.ones((outrows, outcols), >>>>>>>>> dtype=np.float32) * >>>>>>>>> np.nan >>>>>>>>> >>>>>>>>> ? ?# extract the data >>>>>>>>> ? ?data1_snow = jules_data1[:,jules_var,:,0] >>>>>>>>> ? ?data2_snow = jules_data2[:,jules_var,:,0] >>>>>>>>> ? ?data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) >>>>>>>>> ? ?data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) >>>>>>>>> ? ?#for month in xrange(numyears * nummonths): >>>>>>>>> ? ?# ? ?for i in xrange(numpts): >>>>>>>>> ? ?# ? ? ? ?data1 = >>>>>>>>> jules_data1[month,jules_var,land_pts_index[i],0] >>>>>>>>> ? ?# ? ? ? ?data2 = >>>>>>>>> jules_data2[month,jules_var,land_pts_index[i],0] >>>>>>>>> ? ?# ? ? ? ?if data1 >= 0.0: >>>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = data1 >>>>>>>>> ? ?# ? ? ? ?else: >>>>>>>>> ? ?# ? ? ? ? ? ?data1_snow[month,i] = np.nan >>>>>>>>> ? ?# ? ? ? ?if data2 > 0.0: >>>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = data2 >>>>>>>>> ? ?# ? ? ? ?else: >>>>>>>>> ? ?# ? ? ? ? ? ?data2_snow[month,i] = np.nan >>>>>>>>> >>>>>>>>> ? ?# exclude any months from *both* arrays where we have dodgy >>>>>>>>> data, >>>>>>>>> else >>>>>>>>> we >>>>>>>>> ? ?# can't do the correlations correctly!! >>>>>>>>> ? ?data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) >>>>>>>>> ? ?data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) >>>>>>>>> >>>>>>>>> ? ?# put data on a regular grid... >>>>>>>>> ? ?print 'regridding landpts...' >>>>>>>>> ? ?for i in xrange(numpts): >>>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>>> stats >>>>>>>>> func >>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>> >>>>>>>>> ? ? ? ?# r^2 >>>>>>>>> ? ? ? ?# exclude v.small arrays, i.e. we need just less over 4 >>>>>>>>> years >>>>>>>>> of >>>>>>>>> data >>>>>>>>> ? ? ? ?if len(x) and len(y) > 50: >>>>>>>>> ? ? ? 
? ? ?pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>>> (stats.pearsonr(x, y)[0])**2 >>>>>>>>> >>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>> ? ? ? ?d = x - y >>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>> non-zero >>>>>>>>> differences >>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = >>>>>>>>> np.mean(x - y) >>>>>>>>> >>>>>>>>> ? ?return (pearsonsr_snow, wilcoxStats_snow) >>>>>>>>> >>>>>>>>> >>>>>>>>> josef.pktd wrote: >>>>>>>>>> >>>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Also I then need to remap the 2D array I make onto another grid >>>>>>>>>>> (the >>>>>>>>>>> world in >>>>>>>>>>> this case). Which again I had am doing with a loop (note numpts >>>>>>>>>>> is >>>>>>>>>>> a >>>>>>>>>>> lot >>>>>>>>>>> bigger than my example above). >>>>>>>>>>> >>>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) >>>>>>>>>>> * >>>>>>>>>>> np.nan >>>>>>>>>>> for i in xrange(numpts): >>>>>>>>>>> ? ? ? ?# exclude the NaN, note masking them doesn't work in the >>>>>>>>>>> stats >>>>>>>>>>> func >>>>>>>>>>> ? ? ? ?x = data1_snow[:,i] >>>>>>>>>>> ? ? ? ?x = x[np.isfinite(x)] >>>>>>>>>>> ? ? ? ?y = data2_snow[:,i] >>>>>>>>>>> ? ? ? ?y = y[np.isfinite(y)] >>>>>>>>>>> >>>>>>>>>>> ? ? ? ?# wilcox signed rank test >>>>>>>>>>> ? ? ? ?# make sure we have enough samples to do the test >>>>>>>>>>> ? ? ? ?d = x - y >>>>>>>>>>> ? ? ? ?d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all >>>>>>>>>>> non-zero >>>>>>>>>>> differences >>>>>>>>>>> ? ? ? ?count = len(d) >>>>>>>>>>> ? ? ? ?if count > 10: >>>>>>>>>>> ? ? ? ? ? ?z, pval = stats.wilcoxon(x, y) >>>>>>>>>>> ? ? ? ? ? ?# only map out sign different data >>>>>>>>>>> ? ? ? ? ? ?if pval < 0.05: >>>>>>>>>>> ? ? ? ? ? ? ? ?wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] >>>>>>>>>>> = >>>>>>>>>>> np.mean(x - y) >>>>>>>>>>> >>>>>>>>>>> Now I think I can push the data in one move into the >>>>>>>>>>> wilcoxStats_snow >>>>>>>>>>> array >>>>>>>>>>> by removing the index, >>>>>>>>>>> but I can't see how I will get the individual x and y pts for >>>>>>>>>>> each >>>>>>>>>>> array >>>>>>>>>>> member correctly without the loop, this was my attempt which of >>>>>>>>>>> course >>>>>>>>>>> doesn't work! >>>>>>>>>>> >>>>>>>>>>> x = data1_snow[:,:] >>>>>>>>>>> x = x[np.isfinite(x)] >>>>>>>>>>> y = data2_snow[:,:] >>>>>>>>>>> y = y[np.isfinite(y)] >>>>>>>>>>> >>>>>>>>>>> # r^2 >>>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of >>>>>>>>>>> data >>>>>>>>>>> if len(x) and len(y) > 50: >>>>>>>>>>> ? ?pearsonsr_snow[((180-1)-(rows-1)),cols-1] = >>>>>>>>>>> (stats.pearsonr(x, >>>>>>>>>>> y)[0])**2 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then >>>>>>>>>> you >>>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d >>>>>>>>>> arrays >>>>>>>>>> at a time (if I read the help correctly). >>>>>>>>>> >>>>>>>>>> Also the presence of nans might force the use a loop. >>>>>>>>>> stats.mstats >>>>>>>>>> has >>>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. 
>>>>>>>>>> (Even >>>>>>>>>> when vectorized operations would work with regular arrays, nan or >>>>>>>>>> masked array versions still have to loop in many cases.) >>>>>>>>>> >>>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is >>>>>>>>>> not >>>>>>>>>> calculated then it might be worth to use only array operations up >>>>>>>>>> to >>>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's >>>>>>>>>> not >>>>>>>>>> worth thinking too hard about this. >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> thanks. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> mdekauwe wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods >>>>>>>>>>>> work. >>>>>>>>>>>> >>>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is >>>>>>>>>>>> there >>>>>>>>>>>> a >>>>>>>>>>>> link you can recommend I should read? Does that mean given I >>>>>>>>>>>> have >>>>>>>>>>>> 4dims >>>>>>>>>>>> that Josef's suggestion would be more advised in this case? >>>>>>>>>> >>>>>>>>>> There were several discussions on the mailing lists (fancy >>>>>>>>>> slicing >>>>>>>>>> and >>>>>>>>>> indexing). Your case is safe, but if you run in future into funny >>>>>>>>>> shapes, you can look up the details. >>>>>>>>>> when in doubt, I use np.arange(...) >>>>>>>>>> >>>>>>>>>> Josef >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks that works... >>>>>>>>>>>>>> >>>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that >>>>>>>>>>>>>> was >>>>>>>>>>>>>> the >>>>>>>>>>>>>> step >>>>>>>>>>>>>> I >>>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces >>>>>>>>>>>>>> the >>>>>>>>>>>>>> the >>>>>>>>>>>>>> two >>>>>>>>>>>>>> for >>>>>>>>>>>>>> loops? Do I have that right? >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a >>>>>>>>>>>>> dimension, >>>>>>>>>>>>> then you can use slicing. It might be faster. >>>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more >>>>>>>>>>>>> dimensions can have some surprise switching of axes. >>>>>>>>>>>>> >>>>>>>>>>>>> Josef >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> A lot quicker...! >>>>>>>>>>>>>> >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> josef.pktd wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>> 2D >>>>>>>>>>>>>>>> array, >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in >>>>>>>>>>>>>>>> reality >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> arrays >>>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the >>>>>>>>>>>>>>>> solution >>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>> well >>>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite >>>>>>>>>>>>>>>> difficult >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> get >>>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I >>>>>>>>>>>>>>>> get >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>> precompute the indices's i and j i.e. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> i = np.arange(tsteps) >>>>>>>>>>>>>>>> j = np.arange(numpts) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> but just can't get my head round how i then use them... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> numpts=10 >>>>>>>>>>>>>>>> tsteps = 12 >>>>>>>>>>>>>>>> vari = 22 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) >>>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) >>>>>>>>>>>>>>>> index = np.arange(numpts) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> for i in xrange(tsteps): >>>>>>>>>>>>>>>> ? ?for j in xrange(numpts): >>>>>>>>>>>>>>>> ? ? ? ?new_data[i,j] = data[i,5,index[j],0] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The index arrays need to be broadcastable against each >>>>>>>>>>>>>>> other. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think this should do it >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, >>>>>>>>>>>>>>> np.arange(numpts), >>>>>>>>>>>>>>> 0] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Josef >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html >>>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at >>>>>>>>>>>>>>>> Nabble.com. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> View this message in context: >>>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html >>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> View this message in context: >>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html >>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> SciPy-User mailing list >>>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> SciPy-User mailing list >>>>>>>>>> SciPy-User at scipy.org >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> View this message in context: >>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html >>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> SciPy-User mailing list >>>>>>>>> SciPy-User at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> SciPy-User mailing list >>>>>>>> SciPy-User at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html >>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> SciPy-User mailing list >>>>>>> SciPy-User at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html >>>>> Sent from the Scipy-User mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711708.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri May 28 16:48:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 16:48:16 -0400 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 4:28 PM, nicky van foreest wrote: > Hi, > > Nice to see the issue to be taken up again. > >>> Discrete distributions on the real line don't *have* a pdf... >> >> Well, they *have* one; they just can't be implemented in floating point. :-) > > A distribution function can be decomposed in a part that can be > represented by a pdf (absolute continuous), and a part that can be > represented by a pmf (jumps), and some extra stuff (Cantor like > functions) that we can safely neglect from a numerical point of view. > (The discussion above is resolved in any book on measure theory, and > covered by the Lebesgue decomposition theorem, for the interested...) > > I don't know how to resolve the name problem about pdf and pmf. I must > admit I find it quite disturbing, since I also make these typo's, but > I don't know how to resolve this neatly. > >>>> snip > pdf(x), cdf(x) ?with x float would need to know whether x is a support > point, but which might not be equal to the actual point because of > floating point problems. 
> So, the direct translation of rv_discrete doesn't work, and it looks > like at least pdf needs to be accessible either pointwise for queries > or using known support points for actual calculations. >>>> > About representing floats in a hashtable, this is indeed hard to > resolve. However, for the particular purpose of defining a random > variable with support on a finite set of reals, it might suffice to > represent these reals by fractions, for instance, \pi \approx 22/7 (I > realize better approximations exist.), and then store 22 and 7 > separately. Then generalize rv_discrete such that it accepts tuples > like (22, 7, 1.) with dtype (int, int, float). What is the float in this? how do you find which fractions to use? I don't want to restrict necessarily to finite number of points, but countable, e.g. what's the distribution of sqrt(x) where x is Poisson (just made up). I still need to think about this, I thought the cheapest might be approx_equal rounding, or searchsorted for the finite case. But I think the direct access for a specific x won't be a big usecase, because the calculations for expectation, cdf or other calculations can loop over the array of support points. That's why I was thinking about dual access to pmf. > >>>> > No fun, and EDA dropped. >>>> > EDA dropped? I don't know what EDA means. I hope it does not have > severe consequences. today is my lucky day with typos, how about ETA http://en.wikipedia.org/wiki/Estimated_time_of_arrival Josef http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ben.root at ou.edu Fri May 28 17:49:28 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 28 May 2010 16:49:28 -0500 Subject: [SciPy-User] re[SciPy-user] moving for loops... In-Reply-To: <28711581.post@talk.nabble.com> References: <28633477.post@talk.nabble.com> <28634924.post@talk.nabble.com> <28640602.post@talk.nabble.com> <28640656.post@talk.nabble.com> <28642434.post@talk.nabble.com> <28686356.post@talk.nabble.com> <28711249.post@talk.nabble.com> <28711444.post@talk.nabble.com> <28711581.post@talk.nabble.com> Message-ID: If you want an average for each month from your timeseries, then the sneaky way would be to reshape your array so that the time dimension is split into two (month, year) dimensions. For a 1-D array, this would be: > dataarray = numpy.mod(numpy.arange(36), 12) > print dataarray array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) > datamatrix = dataarray.reshape((-1, 12)) > print datamatrix array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) Hope that helps. Ben Root On Fri, May 28, 2010 at 3:28 PM, mdekauwe wrote: > > OK so I just need to have a quick loop across the 12 months then, that is > fine, just thought there might have been a sneaky way! > > Really appreciated, getting there slowly! > > > > josef.pktd wrote: > > > > On Fri, May 28, 2010 at 4:14 PM, mdekauwe wrote: > >> > >> ok - something like this then...but how would i get the index for the > >> month > >> for the data array (where month is 0, 1, 2, 4 ... 11)? 
> >> > >> data[month,:] = array[xrange(0, numyears * nummonths, > nummonths),VAR,:,0] > > > > you would still need to start at the right month > > data[month,:] = array[xrange(month, numyears * nummonths, > > nummonths),VAR,:,0] > > or > > data[month,:] = array[month: numyears * nummonths : nummonths),VAR,:,0] > > > > an alternative would be a reshape with an extra month dimension and > > then sum only once over the year axis. this might be faster but > > trickier to get the correct reshape . > > > > Josef > > > >> > >> and would that be quicker than making an array months... > >> > >> months = np.arange(numyears * nummonths) > >> > >> and you that instead like you suggested x[start:end:12,:]? > >> > >> Many thanks again... > >> > >> > >> josef.pktd wrote: > >>> > >>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe wrote: > >>>> > >>>> Ok thanks...I'll take a look. > >>>> > >>>> Back to my loops issue. What if instead this time I wanted to take an > >>>> average so every march in 11 years, is there a quicker way to go about > >>>> doing > >>>> that than my current method? > >>>> > >>>> nummonths = 12 > >>>> numyears = 11 > >>>> > >>>> for month in xrange(nummonths): > >>>> for i in xrange(numpts): > >>>> for ym in xrange(month, numyears * nummonths, nummonths): > >>>> data[month, i] += array[ym, VAR, land_pts_index[i], 0] > >>> > >>> > >>> x[start:end:12,:] gives you every 12th row of an array x > >>> > >>> something like this should work to get rid of the inner loop, or you > >>> could directly put > >>> range(month, numyears * nummonths, nummonths) into the array instead > >>> of ym and sum() > >>> > >>> Josef > >>> > >>> > >>>> > >>>> so for each point in the array for a given month i am jumping through > >>>> and > >>>> getting the next years month and so on, summing it. > >>>> > >>>> Thanks... > >>>> > >>>> > >>>> josef.pktd wrote: > >>>>> > >>>>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe > wrote: > >>>>>> > >>>>>> Could you possibly if you have time explain further your comment re > >>>>>> the > >>>>>> p-values, your suggesting I am misusing them? > >>>>> > >>>>> Depends on your use and interpretation > >>>>> > >>>>> test statistics, p-values are random variables, if you look at > several > >>>>> tests at the same time, some p-values will be large just by chance. > >>>>> If, for example you just look at the largest test statistic, then the > >>>>> distribution for the max of several test statistics is not the same > as > >>>>> the distribution for a single test statistic > >>>>> > >>>>> http://en.wikipedia.org/wiki/Multiple_comparisons > >>>>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm > >>>>> > >>>>> we also just had a related discussion for ANOVA post-hoc tests on the > >>>>> pystatsmodels group. > >>>>> > >>>>> Josef > >>>>>> > >>>>>> Thanks. > >>>>>> > >>>>>> > >>>>>> josef.pktd wrote: > >>>>>>> > >>>>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Sounds like I am stuck with the loop as I need to do the > comparison > >>>>>>>> for > >>>>>>>> each > >>>>>>>> pixel of the world and then I have a basemap function call which I > >>>>>>>> guess > >>>>>>>> slows it down further...hmm > >>>>>>> > >>>>>>> I don't see much that could be done differently, after a brief > look. > >>>>>>> > >>>>>>> stats.pearsonr could be replaced by an array version using directly > >>>>>>> the formula for correlation even with nans. wilcoxon looks slow, > and > >>>>>>> I > >>>>>>> never tried or seen a faster version. 
> >>>>>>> > >>>>>>> just a reminder, the p-values are for a single test, when you have > >>>>>>> many of them, then they don't have the right size/confidence level > >>>>>>> for > >>>>>>> an overall or joint test. (some packages report a Bonferroni > >>>>>>> correction in this case) > >>>>>>> > >>>>>>> Josef > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> i.e. > >>>>>>>> > >>>>>>>> def compareSnowData(jules_var): > >>>>>>>> # Extract the 11 years of snow data and return > >>>>>>>> outrows = 180 > >>>>>>>> outcols = 360 > >>>>>>>> numyears = 11 > >>>>>>>> nummonths = 12 > >>>>>>>> > >>>>>>>> # Read various files > >>>>>>>> fname="world_valid_jules_pts.ascii" > >>>>>>>> (numpts, land_pts_index, latitude, longitude, rows, cols) = > >>>>>>>> jo.read_land_points_ascii(fname, 1.0) > >>>>>>>> > >>>>>>>> fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra" > >>>>>>>> jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, > >>>>>>>> numcols=1, > >>>>>>>> \ > >>>>>>>> timesteps=132, numvars=26) > >>>>>>>> fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra" > >>>>>>>> jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, > >>>>>>>> numcols=1, > >>>>>>>> \ > >>>>>>>> timesteps=132, numvars=26) > >>>>>>>> > >>>>>>>> # grab some space > >>>>>>>> data1_snow = np.zeros((nummonths * numyears, numpts), > >>>>>>>> dtype=np.float32) > >>>>>>>> data2_snow = np.zeros((nummonths * numyears, numpts), > >>>>>>>> dtype=np.float32) > >>>>>>>> pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) > * > >>>>>>>> np.nan > >>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), > dtype=np.float32) > >>>>>>>> * > >>>>>>>> np.nan > >>>>>>>> > >>>>>>>> # extract the data > >>>>>>>> data1_snow = jules_data1[:,jules_var,:,0] > >>>>>>>> data2_snow = jules_data2[:,jules_var,:,0] > >>>>>>>> data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow) > >>>>>>>> data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow) > >>>>>>>> #for month in xrange(numyears * nummonths): > >>>>>>>> # for i in xrange(numpts): > >>>>>>>> # data1 = > >>>>>>>> jules_data1[month,jules_var,land_pts_index[i],0] > >>>>>>>> # data2 = > >>>>>>>> jules_data2[month,jules_var,land_pts_index[i],0] > >>>>>>>> # if data1 >= 0.0: > >>>>>>>> # data1_snow[month,i] = data1 > >>>>>>>> # else: > >>>>>>>> # data1_snow[month,i] = np.nan > >>>>>>>> # if data2 > 0.0: > >>>>>>>> # data2_snow[month,i] = data2 > >>>>>>>> # else: > >>>>>>>> # data2_snow[month,i] = np.nan > >>>>>>>> > >>>>>>>> # exclude any months from *both* arrays where we have dodgy > >>>>>>>> data, > >>>>>>>> else > >>>>>>>> we > >>>>>>>> # can't do the correlations correctly!! > >>>>>>>> data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow) > >>>>>>>> data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow) > >>>>>>>> > >>>>>>>> # put data on a regular grid... > >>>>>>>> print 'regridding landpts...' > >>>>>>>> for i in xrange(numpts): > >>>>>>>> # exclude the NaN, note masking them doesn't work in the > >>>>>>>> stats > >>>>>>>> func > >>>>>>>> x = data1_snow[:,i] > >>>>>>>> x = x[np.isfinite(x)] > >>>>>>>> y = data2_snow[:,i] > >>>>>>>> y = y[np.isfinite(y)] > >>>>>>>> > >>>>>>>> # r^2 > >>>>>>>> # exclude v.small arrays, i.e. 
we need just less over 4 > >>>>>>>> years > >>>>>>>> of > >>>>>>>> data > >>>>>>>> if len(x) and len(y) > 50: > >>>>>>>> pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > >>>>>>>> (stats.pearsonr(x, y)[0])**2 > >>>>>>>> > >>>>>>>> # wilcox signed rank test > >>>>>>>> # make sure we have enough samples to do the test > >>>>>>>> d = x - y > >>>>>>>> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all > >>>>>>>> non-zero > >>>>>>>> differences > >>>>>>>> count = len(d) > >>>>>>>> if count > 10: > >>>>>>>> z, pval = stats.wilcoxon(x, y) > >>>>>>>> # only map out sign different data > >>>>>>>> if pval < 0.05: > >>>>>>>> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = > >>>>>>>> np.mean(x - y) > >>>>>>>> > >>>>>>>> return (pearsonsr_snow, wilcoxStats_snow) > >>>>>>>> > >>>>>>>> > >>>>>>>> josef.pktd wrote: > >>>>>>>>> > >>>>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Also I then need to remap the 2D array I make onto another grid > >>>>>>>>>> (the > >>>>>>>>>> world in > >>>>>>>>>> this case). Which again I had am doing with a loop (note numpts > >>>>>>>>>> is > >>>>>>>>>> a > >>>>>>>>>> lot > >>>>>>>>>> bigger than my example above). > >>>>>>>>>> > >>>>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) > >>>>>>>>>> * > >>>>>>>>>> np.nan > >>>>>>>>>> for i in xrange(numpts): > >>>>>>>>>> # exclude the NaN, note masking them doesn't work in the > >>>>>>>>>> stats > >>>>>>>>>> func > >>>>>>>>>> x = data1_snow[:,i] > >>>>>>>>>> x = x[np.isfinite(x)] > >>>>>>>>>> y = data2_snow[:,i] > >>>>>>>>>> y = y[np.isfinite(y)] > >>>>>>>>>> > >>>>>>>>>> # wilcox signed rank test > >>>>>>>>>> # make sure we have enough samples to do the test > >>>>>>>>>> d = x - y > >>>>>>>>>> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all > >>>>>>>>>> non-zero > >>>>>>>>>> differences > >>>>>>>>>> count = len(d) > >>>>>>>>>> if count > 10: > >>>>>>>>>> z, pval = stats.wilcoxon(x, y) > >>>>>>>>>> # only map out sign different data > >>>>>>>>>> if pval < 0.05: > >>>>>>>>>> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] > >>>>>>>>>> = > >>>>>>>>>> np.mean(x - y) > >>>>>>>>>> > >>>>>>>>>> Now I think I can push the data in one move into the > >>>>>>>>>> wilcoxStats_snow > >>>>>>>>>> array > >>>>>>>>>> by removing the index, > >>>>>>>>>> but I can't see how I will get the individual x and y pts for > >>>>>>>>>> each > >>>>>>>>>> array > >>>>>>>>>> member correctly without the loop, this was my attempt which of > >>>>>>>>>> course > >>>>>>>>>> doesn't work! > >>>>>>>>>> > >>>>>>>>>> x = data1_snow[:,:] > >>>>>>>>>> x = x[np.isfinite(x)] > >>>>>>>>>> y = data2_snow[:,:] > >>>>>>>>>> y = y[np.isfinite(y)] > >>>>>>>>>> > >>>>>>>>>> # r^2 > >>>>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of > >>>>>>>>>> data > >>>>>>>>>> if len(x) and len(y) > 50: > >>>>>>>>>> pearsonsr_snow[((180-1)-(rows-1)),cols-1] = > (stats.pearsonr(x, > >>>>>>>>>> y)[0])**2 > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then > >>>>>>>>> you > >>>>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d > >>>>>>>>> arrays > >>>>>>>>> at a time (if I read the help correctly). > >>>>>>>>> > >>>>>>>>> Also the presence of nans might force the use a loop. > stats.mstats > >>>>>>>>> has > >>>>>>>>> masked array versions, but I didn't see wilcoxon in the list. 
> >>>>>>>>> (Even > >>>>>>>>> when vectorized operations would work with regular arrays, nan or > >>>>>>>>> masked array versions still have to loop in many cases.) > >>>>>>>>> > >>>>>>>>> If you have many columns with count <= 10, so that wilcoxon is > not > >>>>>>>>> calculated then it might be worth to use only array operations up > >>>>>>>>> to > >>>>>>>>> that point. If wilcoxon is calculated most of the time, then it's > >>>>>>>>> not > >>>>>>>>> worth thinking too hard about this. > >>>>>>>>> > >>>>>>>>> Josef > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> thanks. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> mdekauwe wrote: > >>>>>>>>>>> > >>>>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods > >>>>>>>>>>> work. > >>>>>>>>>>> > >>>>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is > >>>>>>>>>>> there > >>>>>>>>>>> a > >>>>>>>>>>> link you can recommend I should read? Does that mean given I > >>>>>>>>>>> have > >>>>>>>>>>> 4dims > >>>>>>>>>>> that Josef's suggestion would be more advised in this case? > >>>>>>>>> > >>>>>>>>> There were several discussions on the mailing lists (fancy > slicing > >>>>>>>>> and > >>>>>>>>> indexing). Your case is safe, but if you run in future into funny > >>>>>>>>> shapes, you can look up the details. > >>>>>>>>> when in doubt, I use np.arange(...) > >>>>>>>>> > >>>>>>>>> Josef > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Thanks. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> josef.pktd wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe < > mdekauwe at gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks that works... > >>>>>>>>>>>>> > >>>>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that > >>>>>>>>>>>>> was > >>>>>>>>>>>>> the > >>>>>>>>>>>>> step > >>>>>>>>>>>>> I > >>>>>>>>>>>>> was struggling with, so this forms a 2D array which replaces > >>>>>>>>>>>>> the > >>>>>>>>>>>>> the > >>>>>>>>>>>>> two > >>>>>>>>>>>>> for > >>>>>>>>>>>>> loops? Do I have that right? > >>>>>>>>>>>> > >>>>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a > >>>>>>>>>>>> dimension, > >>>>>>>>>>>> then you can use slicing. It might be faster. > >>>>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more > >>>>>>>>>>>> dimensions can have some surprise switching of axes. > >>>>>>>>>>>> > >>>>>>>>>>>> Josef > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> A lot quicker...! > >>>>>>>>>>>>> > >>>>>>>>>>>>> Martin > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> josef.pktd wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in > >>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>> 2D > >>>>>>>>>>>>>>> array, > >>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in > >>>>>>>>>>>>>>> reality > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> arrays > >>>>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the > >>>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>> well > >>>>>>>>>>>>>>> if they have a spare moment as I am still finding it quite > >>>>>>>>>>>>>>> difficult > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>> get > >>>>>>>>>>>>>>> over the habit of using loops (C convert for my sins). 
I > get > >>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>> one > >>>>>>>>>>>>>>> could > >>>>>>>>>>>>>>> precompute the indices's i and j i.e. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> i = np.arange(tsteps) > >>>>>>>>>>>>>>> j = np.arange(numpts) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> but just can't get my head round how i then use them... > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Martin > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> import numpy as np > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> numpts=10 > >>>>>>>>>>>>>>> tsteps = 12 > >>>>>>>>>>>>>>> vari = 22 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1)) > >>>>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32) > >>>>>>>>>>>>>>> index = np.arange(numpts) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> for i in xrange(tsteps): > >>>>>>>>>>>>>>> for j in xrange(numpts): > >>>>>>>>>>>>>>> new_data[i,j] = data[i,5,index[j],0] > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The index arrays need to be broadcastable against each > other. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think this should do it > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, > >>>>>>>>>>>>>> np.arange(numpts), > >>>>>>>>>>>>>> 0] > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Josef > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>> View this message in context: > >>>>>>>>>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html > >>>>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at > Nabble.com. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -- > >>>>>>>>>>>>> View this message in context: > >>>>>>>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html > >>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>>>>>>>>>>> > >>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>>> > >>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>> SciPy-User mailing list > >>>>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> View this message in context: > >>>>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html > >>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. 
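For completeness, a self-contained version of the broadcast-indexing answer given above, checked against both the original double loop and plain slicing; the sizes are the small made-up ones from the example.

import numpy as np

numpts = 10
tsteps = 12
vari = 22

data = np.random.random((tsteps, vari, numpts, 1))

# the double loop from the original post
new_data_loop = np.zeros((tsteps, numpts), dtype=np.float32)
for i in range(tsteps):
    for j in range(numpts):
        new_data_loop[i, j] = data[i, 5, j, 0]

# broadcasting version: a (tsteps, 1) column of row indices against a
# (numpts,) row of column indices gives a (tsteps, numpts) result
new_data_fancy = data[np.arange(tsteps)[:, None], 5, np.arange(numpts), 0]

# plain slicing also works here, because the full range of both axes is taken
new_data_slice = data[:, 5, :, 0]

assert np.allclose(new_data_loop, new_data_fancy)
assert np.allclose(new_data_fancy, new_data_slice)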
> >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> SciPy-User mailing list > >>>>>>>>>> SciPy-User at scipy.org > >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> SciPy-User mailing list > >>>>>>>>> SciPy-User at scipy.org > >>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> View this message in context: > >>>>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html > >>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> SciPy-User mailing list > >>>>>>>> SciPy-User at scipy.org > >>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> SciPy-User mailing list > >>>>>>> SciPy-User at scipy.org > >>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> -- > >>>>>> View this message in context: > >>>>>> > http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html > >>>>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>>>> > >>>>>> _______________________________________________ > >>>>>> SciPy-User mailing list > >>>>>> SciPy-User at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> > >>>>> _______________________________________________ > >>>>> SciPy-User mailing list > >>>>> SciPy-User at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>>> > >>>> > >>>> -- > >>>> View this message in context: > >>>> http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html > >>>> Sent from the Scipy-User mailing list archive at Nabble.com. > >>>> > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >>> > >> > >> -- > >> View this message in context: > >> http://old.nabble.com/removing-for-loops...-tp28633477p28711444.html > >> Sent from the Scipy-User mailing list archive at Nabble.com. > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > -- > View this message in context: > http://old.nabble.com/removing-for-loops...-tp28633477p28711581.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Fri May 28 18:05:48 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 29 May 2010 00:05:48 +0200 Subject: [SciPy-User] deterministic random variable In-Reply-To: References: Message-ID: >> like (22, 7, 1.) with dtype (int, int, float). > > What is the float in this? 
The float is intended to refer to the probability mass at the atom. > how do you find which fractions to use? In part this is trivial, e.g., 0.05 = 5/100. A division by the greatest common denominator is assumed (and can be implemented in the background.). Another way would be to use a continued fraction approximation for a given float. There exist (as far as i know) very fast recursive algorithms to compute continued fractions, and it is known that in some sense these fractions are the most efficient to approximate reals. > > I don't want to restrict necessarily to finite number of points, but > countable, e.g. what's the distribution of sqrt(x) where x is Poisson > (just made up). Sure, but numerically this cannot be a problem. At the risk of being mathematically pedantic, but since the range of the the distribution function is bounded (in fact, it is [0,1]) the number of jumps is at most countable. However, even if the number of atoms is countable, most (that is, nearly all) of these atoms cannot be seen by the computer, as these atoms are `too small'. The largest number of atoms that can be seen is roughly 10e-16 (assuming floats, rather than doubles). I cannot image any distribution functions based on empirical data that contains this amount of atoms. > I still need to think about this, I thought the cheapest might be > approx_equal rounding I did not know of this function. , or searchsorted I suppose this is much slower than using fractions in hash tables. > But I think the direct access for a specific x won't be a big usecase, > because the calculations for expectation, cdf or other calculations > can loop over the array of support points. That's why I was thinking > about dual access to pmf. I don't follow you here. > today is my lucky day with typos, how about ETA > http://en.wikipedia.org/wiki/Estimated_time_of_arrival My wife is complaining about my ETA :-) its bed time here. bye Nicky From christophermarkstrickland at gmail.com Fri May 28 21:03:40 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 11:03:40 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:15 AM, wrote: > > It would need a new method for each distribution, e.g. _loglike, _logpdf > So, this is work, and for some distributions the log wouldn't simplify > much. > > I am not sure what you mean the log wouldn't simply much. > I proposed this once together with other improvements (but without > response). > > This is a little disappointing, it significantly reduces how useful the library is. In actual fact I have not been able to use a single function for anything other than testing (although, I have been using numpy.random for random numbers, this scipy.stats collection seems far more complete). This would dramatically change if a log version of the distribution were available. I think in most cases this would be a straightforward addition at least for the pdf. > The second useful method for estimation would be _fitstart, which > provides distribution specific starting values for fit, e.g. a moment > estimator, or a simple rules of thumb > http://projects.scipy.org/scipy/ticket/808 > > > Here are some of my currently planned enhancements to the distributions: > > > http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py > > but I just checked, it looks like I forgot to copy the _loglike method > that I started from my experimental scripts. 
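A small illustration of the numerical point being made in this thread, using the exponential distribution with arbitrary numbers: once the pdf has underflowed, taking the log afterwards cannot recover the value, whereas the analytic log-density is unproblematic.

import numpy as np
from scipy import stats

lam = 2.0       # scale parameter
x = 2000.0      # far out in the tail

pdf_val = stats.expon.pdf(x, scale=lam)
print(pdf_val)                   # 0.0, since exp(-1000)/2 underflows in double precision
print(np.log(pdf_val))           # -inf (with a warning)
print(-np.log(lam) - x / lam)    # -1000.69..., the analytic log-density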
> > For a few distributions, where this is possible, it would also be > useful to add the gradient with respect to the parameters, (or even > the Hessian). But this is currently mostly just an idea, since we need > some analytical gradients in the estimation of stats models. > > This certainly would be nice as well. > > > > > If there is not is it possible for me to suggest that this feature is > added. > > There is such an excellent range of distributions, each with such an > > impressive range of options, it seems ashame to have to mostly manually > code > > up the log of pdfs and often call the log of CDFs from R. > > So far I only thought about log pdf, because I wanted it for Maximum > Likelihood estimation. > > It is also necessary for MCMC. > Do you have a rough idea for which distributions log cdf would work? > that is, for which distribution is an analytical or efficient > numerical expression possible. > Not sure off the top of my head as I mainly require the only the pdf. I was, however, doing a little survival analysis the other day though and it was required. The log of the survival and hazard functions would be nice also. So far I have only required the exponential (analytical), weibull (analytical), normal (numerical) and powernormal (analytical function of the log of the normal cdf). I just had a peak at the R source code for pnorm (R's code for the normal cdf). The function is not big and also licensed under the GNU public licence. I assume it could be fairly easily ported to scipy. > > I also think that scipy.stats.distributions could be one of the best > (broadest, consistent) collection of univariate distributions that I > have seen so far, once we fill in some missing pieces. > > As a way forward, I think we could make the distributions into a > numerical encyclopedia by adding private methods to those > distributions where it makes sense, like log pdf, log cdf and I also > started to add characteristic functions to some distributions in my > experimental scripts. > If you have a collection of logpdf, logcdf, we could add a trac ticket for > this. > I could fairly easy whip up a collection of functions to compute the logpdf for a large number of distributions. Not sure about the CDFs but I can look into it as well. The pdf's are definitely far more urgent for my own work. I am a bit busy at work though for the next three weeks so it would have to be after that. > > However, this would miss the generic broadcasting part of the public > functions, pdf, cdf,... but for estimation I wouldn't necessarily call > those because of the overhead. > > > I'm working on and off on this, so it's moving only slowly (and my > wishlist is big). > (for example, I was reading up on extreme value distributions in > actuarial science and hydrology to get a better overview over the > estimators.) > > > So, I really love to hear any ideas, feedback, and see contributions > to improving the distributions. > > Josef > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 28 22:53:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 28 May 2010 22:53:37 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 9:03 PM, Chris Strickland wrote: > > > On Sat, May 29, 2010 at 12:15 AM, wrote: >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify >> much. 
>> > I am not sure what you mean the log wouldn't simply much. > >> >> I proposed this once together with other improvements (but without >> response). >> > This is a little disappointing, it significantly reduces how useful the > library is. In actual fact I have not been able to use a single function for > anything other than testing (although, I have been using numpy.random for > random numbers, this scipy.stats collection seems far more complete). This > would dramatically change if a log version of the distribution were > available. I think in most cases this would be a straightforward addition at > least for the pdf. I don't think for many use cases log(stats.t.pdf) or many other distributions the performance and accuracy hit would be large enough to make it useless. At least, I haven't seen any other comments in this direction. On of the main use cases for me of stats.distributions are all the statistical test distributions, t, F, chi2 and so on. Howver, in statsmodels we have a mixture of calls to the pdf/cdf of stats.distributions and reimplementations of loglikelhood functions, where the scipy version is also just used for testing. > > >> >> The second useful method for estimation would be _fitstart, which >> provides distribution specific starting values for fit, e.g. a moment >> estimator, or a simple rules of thumb >> http://projects.scipy.org/scipy/ticket/808 >> >> >> Here are some of my currently planned enhancements to the distributions: >> >> >> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py >> >> but I just checked, it looks like I forgot to copy the _loglike method >> that I started from my experimental scripts. >> >> For a few distributions, where this is possible, it would also be >> useful to add the gradient with respect to the parameters, (or even >> the Hessian). But this is currently mostly just an idea, since we need >> some analytical gradients in the estimation of stats models. >> > This certainly would be nice as well. >> >> > >> > If there is not is it possible for me to suggest that this feature is >> > added. >> > There is such an excellent range of distributions, each with such an >> > impressive range of options, it seems ashame to have to mostly manually >> > code >> > up the log of pdfs and often call the log of CDFs from R. >> >> So far I only thought about log pdf, because I wanted it for Maximum >> Likelihood estimation. >> > It is also necessary for MCMC. pymc has many distributions with loglike in fortran for speed, but for most distributions only loglike and rvs are defined, if I remember correctly. > >> >> Do you have a rough idea for which distributions log cdf would work? >> that is, for which distribution is an analytical or efficient >> numerical expression possible. > > Not sure off the top of my head as I mainly require the only the pdf. I was, > however, doing a little survival analysis the other day though and it was > required. The log of the survival and hazard functions would be nice also. > So far I have only required the exponential (analytical), weibull > (analytical), normal (numerical) and powernormal (analytical function of the > log of the normal cdf). I just had a peak at the R source code for pnorm > (R's code for the normal cdf). The function is not big and also licensed > under the GNU public licence. I assume it could be fairly easily ported to > scipy. R's license, GPL, is incompatible with the license of scipy, BSD. 
While they are allowed to look at our code, code that goes into scipy cannot be based on GPL licensed code. If never seen it mentioned before that there is a direct function for log(norm.cdf). Which functions and packages in R implement the logarithm of the cdf of these distributions? The cdf for several distributions (including normal) is implement in Fortran or C in scipy.special, and I've never seen a log version for them. >> >> I also think that scipy.stats.distributions could be one of the best >> (broadest, consistent) collection of univariate distributions that I >> have seen so far, once we fill in some missing pieces. >> >> As a way forward, I think we could make the distributions into a >> numerical encyclopedia by adding private methods to those >> distributions where it makes sense, like log pdf, log cdf and I also >> started to add characteristic functions to some distributions in my >> experimental scripts. >> If you have a collection of logpdf, logcdf, we could add a trac ticket for >> this. > > I could fairly easy whip up a collection of functions to compute the logpdf > for a large number of distributions. Not sure about the CDFs but I can look > into it as well. The pdf's are definitely far more urgent for my own work. I > am a bit busy at work though for the next three weeks so it would have to be > after that. I looked at some of the distributions, and logpdf could be more efficiently calculated in many of them and very often also logcdf I opened a ticket for this http://projects.scipy.org/scipy/ticket/1184 I also saw that there are still smaller, numerical improvements possible in several distributions. Thanks, Josef >> >> However, this would miss the generic broadcasting part of the public >> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >> those because of the overhead. >> >> >> I'm working on and off on this, so it's moving only slowly (and my >> wishlist is big). >> (for example, I was reading up on extreme value distributions in >> actuarial science and hydrology to get a better overview over the >> estimators.) >> >> >> So, I really love to hear any ideas, feedback, and see contributions >> to improving the distributions. >> >> Josef >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From njs at pobox.com Fri May 28 23:24:21 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 28 May 2010 20:24:21 -0700 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 7:53 PM, wrote: > I don't think for many use cases log(stats.t.pdf) or many other > distributions the performance and accuracy hit would be large enough > to make it useless. At least, I haven't seen any other comments in > this direction. "Useless" is a value judgement, of course, but it doesn't seem *too* far off to me either. I myself basically always find myself wanting log-space values, and even if you're just doing statistical tests, numerical precision in the tails can become very practically relevant when doing multiple hypothesis correction. > R's license, GPL, is incompatible with the license of scipy, BSD. > While they are allowed to look at our code, code that goes into scipy > cannot be based on GPL licensed code. You mean, they're allowed to copy our code, and we're allowed to look at their code for reference but can't use it directly :-). 
> If never seen it mentioned before that there is a direct function for > log(norm.cdf). Which functions and packages in R implement the > logarithm of the cdf of these distributions? > > The cdf for several distributions (including normal) is implement in > Fortran or C in scipy.special, and I've never seen a log version for > them. Yet R does in fact use specialized code for computing the log-cdf for the normal distribution... at least over some parts of its range. I'm not sure how much difference it makes or anything, I'm just reporting on the existence of 'if' statements in the source :-). See the base R distribution, src/nmath/pnorm.c (which also contains references). -- Nathaniel From christophermarkstrickland at gmail.com Fri May 28 23:34:50 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 13:34:50 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:53 PM, wrote: > > I don't think for many use cases log(stats.t.pdf) or many other > distributions the performance and accuracy hit would be large enough > to make it useless. At least, I haven't seen any other comments in > this direction. > > On of the main use cases for me of stats.distributions are all the > statistical test distributions, t, F, chi2 and so on. Howver, in > statsmodels we have a mixture of calls to the pdf/cdf of > stats.distributions and reimplementations of loglikelhood functions, > where the scipy version is also just used for testing. > > The main use for me is in specifying (log) prior distributions, (log) posterior distributions and log-likelihood functions. There is simply no way around using the log pdf in the vast majority of cases in MCMC analysis. Whilst it is trivial for me to simply write functions when I need them it would obviously benefit the statistical community as a whole if the option was available in the excellent set of distributions that are available as a part of Scipy. R's license, GPL, is incompatible with the license of scipy, BSD. > While they are allowed to look at our code, code that goes into scipy > cannot be based on GPL licensed code. > > Fair enough. Still at least for the normal cdf we could simply use the references in the R code to write a Scipy version. > If never seen it mentioned before that there is a direct function for > log(norm.cdf). Which functions and packages in R implement the > logarithm of the cdf of these distributions? > pnorm it is in the stats package for the log of the normal CDF. Kind of essential for distributions like the powernormal as well that use the normal cdf as a part of their pdf. > > The cdf for several distributions (including normal) is implement in > Fortran or C in scipy.special, and I've never seen a log version for > them. > > I looked at some of the distributions, and logpdf could be more > efficiently calculated in many of them and very often also logcdf > > I opened a ticket for this > http://projects.scipy.org/scipy/ticket/1184 > > I also saw that there are still smaller, numerical improvements > possible in several distributions. > > Thanks, > > Josef > > ______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sat May 29 00:00:07 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:00:07 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 7:53 PM, ? wrote: >> I don't think for many use cases log(stats.t.pdf) or many other >> distributions the performance and accuracy hit would be large enough >> to make it useless. At least, I haven't seen any other comments in >> this direction. > > "Useless" is a value judgement, of course, but it doesn't seem *too* > far off to me either. I myself basically always find myself wanting > log-space values, and even if you're just doing statistical tests, > numerical precision in the tails can become very practically relevant > when doing multiple hypothesis correction. > >> R's license, GPL, is incompatible with the license of scipy, BSD. >> While they are allowed to look at our code, code that goes into scipy >> cannot be based on GPL licensed code. > > You mean, they're allowed to copy our code, and we're allowed to look > at their code for reference but can't use it directly :-). We are allowed to look at their manuals but not their code. (Life ain't fair.) > >> If never seen it mentioned before that there is a direct function for >> log(norm.cdf). Which functions and packages in R implement the >> logarithm of the cdf of these distributions? >> >> The cdf for several distributions (including normal) is implement in >> Fortran or C in scipy.special, and I've never seen a log version for >> them. > > Yet R does in fact use specialized code for computing the log-cdf for > the normal distribution... at least over some parts of its range. I'm > not sure how much difference it makes or anything, I'm just reporting > on the existence of 'if' statements in the source :-). See the base R > distribution, src/nmath/pnorm.c (which also contains references). pnorm is the cdf not the log of the cdf, that's what I thought, but I just saw that they have a "log.p" option. from the R manual: """ For pnorm, based on Cody, W. D. (1993) Algorithm 715: SPECFUN ? A portable FORTRAN package of special function routines and test drivers """ this sounds similar to the fortran or c code that scipy.special has. I never tried to read that one, except for the doc comments. 
accuracy doesn't seem to be a problem np.log(stats.norm.cdf(np.linspace(-20,20,21))) - [r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)] array([ -2.84217094e-14, -2.84217094e-14, -2.84217094e-14, 0.00000000e+00, -1.42108547e-14, -7.10542736e-15, -7.10542736e-15, -3.55271368e-15, -1.77635684e-15, -4.44089210e-16, 0.00000000e+00, 0.00000000e+00, -5.42101086e-20, -5.53867815e-17, -4.40377573e-17, 7.61985302e-24, 1.77648211e-33, 7.79353682e-45, 6.38875440e-58, 9.74094892e-73, 2.75362412e-89]) except the small numbers in the tail look much better in R >>> np.log(stats.norm.cdf(np.linspace(-20,20,21))) array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02, -1.01563034e+02, -7.54106730e+01, -5.32312852e+01, -3.50134372e+01, -2.07367689e+01, -1.03601015e+01, -3.78318433e+00, -6.93147181e-01, -2.30129093e-02, -3.16717434e-05, -9.86587701e-10, -6.66133815e-16, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]) >>> np.array([r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)]) array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02, -1.01563034e+02, -7.54106730e+01, -5.32312852e+01, -3.50134372e+01, -2.07367689e+01, -1.03601015e+01, -3.78318433e+00, -6.93147181e-01, -2.30129093e-02, -3.16717434e-05, -9.86587646e-10, -6.22096057e-16, -7.61985302e-24, -1.77648211e-33, -7.79353682e-45, -6.38875440e-58, -9.74094892e-73, -2.75362412e-89]) except if we use a branch cut >>> np.log1p(-stats.norm.sf(np.linspace(-20,20,21))) array([ -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -3.49450411e+01, -2.07367689e+01, -1.03601015e+01, -3.78318433e+00, -6.93147181e-01, -2.30129093e-02, -3.16717434e-05, -9.86587646e-10, -6.22096057e-16, -7.61985302e-24, -1.77648211e-33, -7.79353682e-45, -6.38875440e-58, -9.74094892e-73, -2.75362412e-89]) I have no idea about speed. Josef > > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Sat May 29 00:15:49 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:15:49 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 11:34 PM, Chris Strickland wrote: > > > On Sat, May 29, 2010 at 12:53 PM, wrote: >> >> I don't think for many use cases log(stats.t.pdf) or many other >> distributions the performance and accuracy hit would be large enough >> to make it useless. At least, I haven't seen any other comments in >> this direction. >> >> On of the main use cases for me of stats.distributions are all the >> statistical test distributions, t, F, chi2 and so on. Howver, in >> statsmodels we have a mixture of calls to the pdf/cdf of >> stats.distributions and reimplementations of loglikelhood functions, >> where the scipy version is also just used for testing. >> > The main use for me is in specifying (log) prior distributions, (log) > posterior distributions and log-likelihood functions. There is simply no way > around using the log pdf in the vast majority of cases in MCMC analysis. > Whilst it is trivial for me to simply write functions when I need them it > would obviously benefit the statistical community as a whole if the option > was available in the excellent set of distributions that are available as a > part of Scipy. I agree that it would be very good to have this generally available, and I will appreciate it for maximum likelihood. 
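To see the same tail behaviour without calling out to R: for large negative x the standard asymptotic expansion Phi(x) ~ phi(x)/|x| gives a finite and reasonably accurate value of log Phi(x) even after the cdf itself has underflowed. This is only a first-order sketch with made-up evaluation points; a dedicated logcdf implementation would be more accurate.

import numpy as np
from scipy import stats

x = np.array([-10.0, -20.0, -40.0])

# naive route: the cdf underflows to 0.0 below roughly x = -38, so the log becomes -inf
naive = np.log(stats.norm.cdf(x))

# first-order asymptotic: log Phi(x) ~ -x**2/2 - 0.5*log(2*pi) - log(|x|) as x -> -inf
asym = -0.5 * x**2 - 0.5 * np.log(2 * np.pi) - np.log(np.abs(x))

print(naive)   # approx [ -53.23, -203.92, -inf ]
print(asym)    # approx [ -53.22, -203.91, -804.61 ]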
For MCMC (where I know only little about the details), it might, however, always be faster to work with dedicated code as in pymc. > >> >> R's license, GPL, is incompatible with the license of scipy, BSD. >> While they are allowed to look at our code, code that goes into scipy >> cannot be based on GPL licensed code. >> > Fair enough. Still at least for the normal cdf we could simply use the > references in the R code to write a Scipy version. If it's the C or Fortran implementation, then it is out of my competence, I'm a pure scripting language person. Another idea for this would be to see if any of the pymc code for this would fit into scipy. Since I leave Fortran to others, I never looked at it. I think if we get the easier cases, logpdf and logcdf that don't require compiled versions, we would be able to cover already a considerable range of the distributions. However, I also agree now, having norm.logcdf would also be useful for many other distributions. > >> >> If never seen it mentioned before that there is a direct function for >> log(norm.cdf). Which functions and packages in R implement the >> logarithm of the cdf of these distributions? > > pnorm it is in the stats package for the log of the normal CDF. Kind of > essential for distributions like the powernormal as well that use the normal > cdf as a part of their pdf. see previous message, I never paid enough attention to see the log.p option Josef >> >> The cdf for several distributions (including normal) is implement in >> Fortran or C in scipy.special, and I've never seen a log version for >> them. >> >> I looked at some of the distributions, and logpdf could be more >> efficiently calculated in many of them and very often also logcdf >> >> I opened a ticket for this >> http://projects.scipy.org/scipy/ticket/1184 >> >> I also saw that there are still smaller, numerical improvements >> possible in several distributions. >> >> Thanks, >> >> Josef >> >> ______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Sat May 29 00:20:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:20:51 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:15 AM, wrote: > On Fri, May 28, 2010 at 11:34 PM, Chris Strickland > wrote: >> >> >> On Sat, May 29, 2010 at 12:53 PM, wrote: >>> >>> I don't think for many use cases log(stats.t.pdf) or many other >>> distributions the performance and accuracy hit would be large enough >>> to make it useless. At least, I haven't seen any other comments in >>> this direction. >>> >>> On of the main use cases for me of stats.distributions are all the >>> statistical test distributions, t, F, chi2 and so on. Howver, in >>> statsmodels we have a mixture of calls to the pdf/cdf of >>> stats.distributions and reimplementations of loglikelhood functions, >>> where the scipy version is also just used for testing. >>> >> The main use for me is in specifying (log) prior distributions, (log) >> posterior distributions and log-likelihood functions. There is simply no way >> around using the log pdf in the vast majority of cases in MCMC analysis. 
>> Whilst it is trivial for me to simply write functions when I need them it >> would obviously benefit the statistical community as a whole if the option >> was available in the excellent set of distributions that are available as a >> part of Scipy. > > I agree that it would be very good to have this generally available, > and I will appreciate it for maximum likelihood. > For MCMC (where I know only little about the details), it might, > however, always be faster to work with dedicated code as in pymc. > >> >>> >>> R's license, GPL, is incompatible with the license of scipy, BSD. >>> While they are allowed to look at our code, code that goes into scipy >>> cannot be based on GPL licensed code. >>> >> Fair enough. Still at least for the normal cdf we could simply use the >> references in the R code to write a Scipy version. > > If it's the C or Fortran implementation, then it is out of my > competence, I'm a pure scripting language person. > > Another idea for this would be to see if any of the pymc code for this > would fit into scipy. Since I leave Fortran to others, I never looked > at it. I'm contradicting and confusing myself, I don't think pymc has any cdf code, only pdf. Josef > > I think if we get the easier cases, logpdf and logcdf that don't > require compiled versions, we would be able to cover already a > considerable range of the distributions. > > However, I also agree now, having norm.logcdf would also be useful for > many other distributions. > >> >>> >>> If never seen it mentioned before that there is a direct function for >>> log(norm.cdf). Which functions and packages in R implement the >>> logarithm of the cdf of these distributions? >> >> pnorm it is in the stats package for the log of the normal CDF. Kind of >> essential for distributions like the powernormal as well that use the normal >> cdf as a part of their pdf. > > see previous message, I never paid enough attention to see the log.p option > > Josef > >>> >>> The cdf for several distributions (including normal) is implement in >>> Fortran or C in scipy.special, and I've never seen a log version for >>> them. >>> >>> I looked at some of the distributions, and logpdf could be more >>> efficiently calculated in many of them and very often also logcdf >>> >>> I opened a ticket for this >>> http://projects.scipy.org/scipy/ticket/1184 >>> >>> I also saw that there are still smaller, numerical improvements >>> possible in several distributions. >>> >>> Thanks, >>> >>> Josef >>> >>> ______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From christophermarkstrickland at gmail.com Sat May 29 00:32:48 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 14:32:48 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 2:20 PM, wrote: > > Josef > > > > > I think if we get the easier cases, logpdf and logcdf that don't > > require compiled versions, we would be able to cover already a > > considerable range of the distributions. > > > > However, I also agree now, having norm.logcdf would also be useful for > > many other distributions. 
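As an example of the kind of distribution Chris mentions, where the normal cdf appears inside another pdf: scipy's powernorm has pdf(x, c) = c * phi(x) * Phi(-x)**(c-1), so its log-pdf needs the log of the normal cdf directly. The sketch below assumes a helper for that log; scipy.special.log_ndtr provides one in later SciPy releases, and any accurate substitute would serve.

import numpy as np
from scipy import stats, special

def powernorm_logpdf(x, c):
    # log of c * phi(x) * Phi(-x)**(c-1); the (c-1)*log Phi(-x) term is where a
    # dedicated log-cdf matters, because Phi(-x) underflows for large positive x
    log_phi = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    log_Phi_neg = special.log_ndtr(-x)   # log of the standard normal cdf at -x
    return np.log(c) + log_phi + (c - 1.0) * log_Phi_neg

x = np.array([1.0, 5.0, 40.0])
c = 2.5
print(powernorm_logpdf(x, c))              # finite everywhere
print(np.log(stats.powernorm.pdf(x, c)))   # -inf at x = 40, where phi(x) has underflowed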
> > > I can write in C and Fortran (I prefer Fortran with Python) so I could easily write code ( in around three weeks when my workload reduces) for cases where compiled languages are required. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 29 00:43:04 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 00:43:04 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:32 AM, Chris Strickland wrote: > > > On Sat, May 29, 2010 at 2:20 PM, wrote: >> >> Josef >> >> > >> > I think if we get the easier cases, logpdf and logcdf that don't >> > require compiled versions, we would be able to cover already a >> > considerable range of the distributions. >> > >> > However, I also agree now, having norm.logcdf would also be useful for >> > many other distributions. >> > > > I? can write in C and Fortran (I prefer Fortran with Python) so I could > easily write code ( in around three weeks when my workload reduces) for > cases where compiled languages are required. I'm busy for another month. Just a warning in case you don't know: scipy is still stuck at fortran 77 because some platforms (e.g. Windows with mingw - which I use) support only g77. I don't know when the upgrade will happen. cython would be an alternative that's easier to maintain. Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From christophermarkstrickland at gmail.com Sat May 29 00:55:26 2010 From: christophermarkstrickland at gmail.com (Chris Strickland) Date: Sat, 29 May 2010 14:55:26 +1000 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 2:43 PM, wrote: > On Sat, May 29, 2010 at 12:32 AM, Chris Strickland > wrote: > > > > > > On Sat, May 29, 2010 at 2:20 PM, wrote: > >> > >> Josef > >> > >> > > >> > I think if we get the easier cases, logpdf and logcdf that don't > >> > require compiled versions, we would be able to cover already a > >> > considerable range of the distributions. > >> > > >> > However, I also agree now, having norm.logcdf would also be useful for > >> > many other distributions. > >> > > > > > I can write in C and Fortran (I prefer Fortran with Python) so I could > > easily write code ( in around three weeks when my workload reduces) for > > cases where compiled languages are required. > > I'm busy for another month. > > Just a warning in case you don't know: scipy is still stuck at fortran > 77 because some platforms (e.g. Windows with mingw - which I use) > support only g77. I don't know when the upgrade will happen. > > cython would be an alternative that's easier to maintain. > > Josef > > Fortran77 isn't a problem assuming we can just link our Fortran using f2py. I don't really have any experience linking code manually. Hmm, the g77 compiler has been superseded by the gfortran complier. Does this not work under Windows? > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jsalvati at u.washington.edu Sat May 29 01:15:21 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Fri, 28 May 2010 22:15:21 -0700 Subject: [SciPy-User] log pdf, cdf, etc Message-ID: The package PyMC(http://code.google.com/p/pymc/) contains fortran log likelihood functions for a lot of distributions, but you would have to look at the source code to figure out how to use them since they are meant mostly for internal use. They are not ufuncs but can handle arrays or single values for each parameter. A recent PyMC branch also contains similar log likelihood gradient functions for the same distributions ( http://github.com/pymc-devs/pymc/tree/gradientBranch). Hope that is useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.boumans at gmx.net Sat May 29 00:47:33 2010 From: m.boumans at gmx.net (bowie_22) Date: Sat, 29 May 2010 04:47:33 +0000 (UTC) Subject: [SciPy-User] leastsq interface and features Message-ID: Hello, at the moment am I am evaluating scipy as a subsitute for Matlab. One important use case for me is to fit a model to measured data. In Matlab I use lsqnonlin from the Optimization Toolbox. In Scipy I would use leastsq. By comparing the 2 approaches with a "daily use" point of view I see the following improvements for the scipy module 1) setting the options for the algorithm: ML uses a structure together with optimset optimget --> lsqnonlin has a quite short signature IMPROVEMENT: introduce a common options structure for all optimization algos 2) there is the possibilty to set an output function that is called in each iteration step in ML. That can be used for displaying the current status of the optimization. For me a quite important point as my customers want to "see" what happens (not just throwing measured data to an algorithm and get back a set of numbers) IMPROVEMENT: introduce a output function that can be called each iteration 3) give lower and upper bounds for the optimization variables. Also quite important as in my uses cases you have normally an idea in which range your parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide this knowledge as lower bounds and upper bounds to lsqnonlin. IMPROVEMENT: introduce lower and upper bounds My problem: How can I help to get this improvements to scipy? Is this the correct address to ask? Regs Marcus From josef.pktd at gmail.com Sat May 29 10:46:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 10:46:14 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 1:15 AM, John Salvatier wrote: > The package PyMC(http://code.google.com/p/pymc/) contains fortran log > likelihood functions for a lot of distributions, but you would have to look > at the source code to figure out how to use them since they are meant mostly > for internal use. They are not ufuncs but can handle arrays or single values > for each parameter. A recent PyMC branch also contains similar log > likelihood gradient functions for the same distributions > (http://github.com/pymc-devs/pymc/tree/gradientBranch). Thanks, I will have a look at the gradient branch To get started, I added a test script to the ticked that makes it easier to add and test new methods for lnpdf and lncdf. It's adapted from the scipy tests and tests all distributions that have a _lnpdf or _lncdf method. The new methods can just be added to the script to monkey patch scipy.stats.distributions. 
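For readers who want to try this without the ticket script, a minimal monkey patch of the kind described might look as follows. The expressions are the standard-form log-densities; loc/scale handling, broadcasting and argument checking are all left to the existing machinery, and the private name _logpdf anticipates the naming settled on later in the thread.

import numpy as np
from scipy import stats

def _logpdf_norm(self, x):
    # standard normal: log phi(x)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def _logpdf_expon(self, x):
    # standard exponential: log(exp(-x)) for x >= 0
    return np.where(x >= 0, -x, -np.inf)

# patch the private standard-form methods onto the distribution classes
type(stats.norm)._logpdf = _logpdf_norm
type(stats.expon)._logpdf = _logpdf_expon

# sanity check against the log of the existing pdf where nothing underflows
xs = np.linspace(0.1, 5.0, 7)
assert np.allclose(stats.norm._logpdf(xs), np.log(stats.norm.pdf(xs)))
assert np.allclose(stats.expon._logpdf(xs), np.log(stats.expon.pdf(xs)))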
I monkey patched 13 easy cases so far, mainly to check that the script works. (Still far too go for full coverage of cases where this makes sense.) The tests use nosetests and test for (almost) equality of the new methods with the log of the old methods, and check a simple broadcasting case. Everyone is invited to add new cases, and to report any problems with the script. I hope that helps to get the ball rolling. Josef > > Hope that is useful. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From vincent at vincentdavis.net Sat May 29 11:06:42 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 29 May 2010 09:06:42 -0600 Subject: [SciPy-User] leastsq interface and features In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 10:47 PM, bowie_22 wrote: > Hello, > > at the moment am I am evaluating scipy as a subsitute for Matlab. > One important use case for me is to fit a model to measured data. > > In Matlab I use lsqnonlin from the Optimization Toolbox. > In Scipy I would use leastsq. > > By comparing the 2 approaches with a "daily use" point of view I see the > following improvements for the scipy module > > 1) setting the options for the algorithm: > ML uses a structure together with optimset optimget > --> lsqnonlin has a quite short signature > IMPROVEMENT: introduce a common options structure for all optimization > algos > > 2) there is the possibilty to set an output function that is called in each > iteration step in ML. That can be used for displaying the current status of > the > optimization. For me a quite important point as my customers want to "see" > what > happens (not just throwing measured data to an algorithm and get back a set > of > numbers) > IMPROVEMENT: introduce a output function that can be called each iteration > > 3) give lower and upper bounds for the optimization variables. Also quite > important as in my uses cases you have normally an idea in which range your > parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide > this > knowledge as lower bounds and upper bounds to lsqnonlin. > IMPROVEMENT: introduce lower and upper bounds > > My problem: > How can I help to get this improvements to scipy? Is this the correct > address to > ask? > This is a good place to ask, I am surprised you have not already gotten several responses. I see in the docs that (?leastsq? is a wrapper around MINPACK?s lmdif and lmder algorithms.) You can also file a ticket at scipy, an example would be http://projects.scipy.org/scipy/ticket/808 You can take a look at the source code. http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py That said you might want to look at http://statsmodels.sourceforge.net/ And you could help by contributing :) This is not a part of the project I am real familiar with but should be. I don't have much more in the way of answers, I just don't know them :) Vincent > Regs > > Marcus > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > *Vincent Davis 720-301-3003 * vincent at vincentdavis.net my blog | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sat May 29 11:42:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 11:42:12 -0400 Subject: [SciPy-User] leastsq interface and features In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 11:06 AM, Vincent Davis wrote: > On Fri, May 28, 2010 at 10:47 PM, bowie_22 wrote: > >> Hello, >> >> at the moment am I am evaluating scipy as a subsitute for Matlab. >> One important use case for me is to fit a model to measured data. >> >> In Matlab I use lsqnonlin from the Optimization Toolbox. >> In Scipy I would use leastsq. >> >> By comparing the 2 approaches with a "daily use" point of view I see the >> following improvements for the scipy module >> >> 1) setting the options for the algorithm: >> ML uses a structure together with optimset optimget >> --> lsqnonlin has a quite short signature >> IMPROVEMENT: introduce a common options structure for all optimization >> algos >> >> 2) there is the possibilty to set an output function that is called in >> each >> iteration step in ML. That can be used for displaying the current status >> of the >> optimization. For me a quite important point as my customers want to "see" >> what >> happens (not just throwing measured data to an algorithm and get back a >> set of >> numbers) >> IMPROVEMENT: introduce a output function that can be called each >> iteration >> >> 3) give lower and upper bounds for the optimization variables. Also quite >> important as in my uses cases you have normally an idea in which range >> your >> parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide >> this >> knowledge as lower bounds and upper bounds to lsqnonlin. >> IMPROVEMENT: introduce lower and upper bounds >> >> My problem: >> How can I help to get this improvements to scipy? Is this the correct >> address to >> ask? >> > > This is a good place to ask, I am surprised you have not already gotten > several responses. > > I see in the docs that (?leastsq? is a wrapper around MINPACK?s lmdif and > lmder algorithms.) > > You can also file a ticket at scipy, an example would be > http://projects.scipy.org/scipy/ticket/808 > You can take a look at the source code. > http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py > > That said you might want to look at http://statsmodels.sourceforge.net/ And you could help by contributing :) This is not a part of the project I > am real familiar with but should be. > statsmodels has nothing to offer for this case. A consistent interface to solvers is in openopt. I don't know if leastsq could be extended, but for constraint minimization scipy has other minimizers available. I think many of the other fmins have callbacks and printing Josef > > I don't have much more in the way of answers, I just don't know them :) > > Vincent > > >> Regs >> >> Marcus >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > *Vincent Davis > 720-301-3003 * > vincent at vincentdavis.net > my blog | LinkedIn > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
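A concrete sketch of the workaround for the bounds and progress-reporting requests in this thread: minimise the sum of squared residuals with fmin_l_bfgs_b, which accepts box bounds, and wrap the objective to report progress by hand, since leastsq offers no per-iteration hook. The decay model, the data and the bound values are invented for illustration.

import numpy as np
from scipy import optimize

# invented data for a model y = a * exp(-b * t)
t = np.linspace(0.0, 10.0, 50)
a_true, b_true = 1500.0, 0.3
y = a_true * np.exp(-b_true * t) + np.random.normal(scale=20.0, size=t.shape)

def sum_of_squares(params):
    a, b = params
    resid = y - a * np.exp(-b * t)
    return np.dot(resid, resid)

# a hand-rolled "output function": report every objective evaluation
def reporting_objective(params):
    val = sum_of_squares(params)
    print(params, val)
    return val

# box constraints, e.g. 1200 <= a <= 1800 and 0 <= b <= 1
bounds = [(1200.0, 1800.0), (0.0, 1.0)]
xopt, fval, info = optimize.fmin_l_bfgs_b(reporting_objective, x0=[1300.0, 0.5],
                                          bounds=bounds, approx_grad=True)
print(xopt)   # estimated (a, b)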
URL: From j.reid at mail.cryst.bbk.ac.uk Sat May 29 12:18:07 2010 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Sat, 29 May 2010 17:18:07 +0100 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: josef.pktd at gmail.com wrote: > On Fri, May 28, 2010 at 7:29 AM, Chris Strickland > wrote: >> Hi, >> >> When using any of the distributions of scipy.stats there does not seem to be >> the ability (or at least I cannot figure out how) to have the function >> return >> the log of the pdf, cdf, sf, etc. For statistical analysis this is >> essential. >> For instance suppose we are interested in an exponential distribution for a >> random variable x with a hyperparameter lambda there needs to be an option >> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >> calculate log(scipy.stats.expon.pdf(x,lambda)). >> >> Is there a way to do this using the distributions in scipy.stats? > > It would need a new method for each distribution, e.g. _loglike, _logpdf > So, this is work, and for some distributions the log wouldn't simplify much. Presumably it would be easy to add a method to all distributions that called the pdf and took its log. This could be over-riden for those distributions for which a specialised log_pdf implementation was available. This would make the entry cost of providing the functionality lower. From josef.pktd at gmail.com Sat May 29 12:49:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 12:49:06 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:18 PM, John Reid wrote: > josef.pktd at gmail.com wrote: >> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >> wrote: >>> Hi, >>> >>> When using any of the distributions of scipy.stats there does not seem to be >>> the ability (or at least I cannot figure out how) to have the function >>> return >>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>> essential. >>> For instance suppose we are interested in an exponential distribution for a >>> random variable x with a hyperparameter lambda there needs to be an option >>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>> >>> Is there a way to do this using the distributions in scipy.stats? >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify much. > > Presumably it would be easy to add a method to all distributions that > called the pdf and took its log. This could be over-riden for those > distributions for which a specialised log_pdf implementation was > available. This would make the entry cost of providing the functionality > lower. Yes, I haven't thought about it yet for this case, but that's how the system for the current methods works, only _pdf or _cdf is required, all other methods have generic substitutes (which are sometimes very slow.) For testing, not having the generic version is easier, I have to figure out again how to check whether a method was defined in the super or the sub class (instead of using hasattr). a naming question _lnpdf or _logpdf ? _lncdf or _logcdf ? 
Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Sat May 29 13:21:53 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 29 May 2010 13:21:53 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 12:49 PM, wrote: > a naming question > _lnpdf or _logpdf ? _lncdf or _logcdf ? > My vote would be for log over ln since np.log(np.e) == 1. Skipper From josef.pktd at gmail.com Sat May 29 14:20:19 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 14:20:19 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 1:21 PM, Skipper Seabold wrote: > On Sat, May 29, 2010 at 12:49 PM, ? wrote: >> a naming question >> _lnpdf or _logpdf ? _lncdf or _logcdf ? >> > > My vote would be for log over ln since np.log(np.e) == 1. Yes, I don't know what I was thinking early in the morning, nobody seems to use ln anymore I edited the script rename ln ->log make print optional add generic method replace hasattr by (not sure this is the best way) def isnotgeneric(distfn, methodname): sub = getattr(distfn, methodname) gen = getattr(stats.distributions.rv_continuous, methodname) return not sub.im_func is gen.im_func (generic methods don't pass the tests for all distributions, there are some problems with broadcasting in the current _pdf, _cdf implementations for some distributions) Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Sat May 29 14:29:10 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 May 2010 12:29:10 -0600 Subject: [SciPy-User] leastsq interface and features In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 10:47 PM, bowie_22 wrote: > Hello, > > at the moment am I am evaluating scipy as a subsitute for Matlab. > One important use case for me is to fit a model to measured data. > > In Matlab I use lsqnonlin from the Optimization Toolbox. > In Scipy I would use leastsq. > > By comparing the 2 approaches with a "daily use" point of view I see the > following improvements for the scipy module > > 1) setting the options for the algorithm: > ML uses a structure together with optimset optimget > --> lsqnonlin has a quite short signature > IMPROVEMENT: introduce a common options structure for all optimization > algos > > I hate the options structure. It's a hidden super-huge signature with poorly chosen defaults. > 2) there is the possibilty to set an output function that is called in each > iteration step in ML. That can be used for displaying the current status of > the > optimization. For me a quite important point as my customers want to "see" > what > happens (not just throwing measured data to an algorithm and get back a set > of > numbers) > IMPROVEMENT: introduce a output function that can be called each iteration > > Possible, I think. > 3) give lower and upper bounds for the optimization variables. Also quite > important as in my uses cases you have normally an idea in which range your > parameter should be (mass of a car 1200 - 1800 kg). In ML you can provide > this > knowledge as lower bounds and upper bounds to lsqnonlin. > IMPROVEMENT: introduce lower and upper bounds > > Suggest using a different function for this. 
Matlab tends to over-overload it's functions. > My problem: > How can I help to get this improvements to scipy? Is this the correct > address to > ask? > > Yes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat May 29 15:44:20 2010 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 29 May 2010 12:44:20 -0700 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Fri, May 28, 2010 at 9:00 PM, wrote: > On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: >> On Fri, May 28, 2010 at 7:53 PM, ? wrote: >>> R's license, GPL, is incompatible with the license of scipy, BSD. >>> While they are allowed to look at our code, code that goes into scipy >>> cannot be based on GPL licensed code. >> >> You mean, they're allowed to copy our code, and we're allowed to look >> at their code for reference but can't use it directly :-). > > We are allowed to look at their manuals but not their code. > (Life ain't fair.) It sounds like you guys have this well in hand, but just a point here -- you certainly are allowed to look at their code, just not copy the "expressive aspects" of it. (Saying you can't *look* at it because of the license is like saying writers can't read other people's novels!) "Expressive" is a tricky term, of course -- IIUC it's basically anything that could be changed while preserving functionality (because the functionality, the algorithm itself, is not covered by copyright). So, say, variable names certainly count as expressive, decisions about which way to lay out the code, etc. If one wants to be really safe, one can write down a textual description of the algorithm and then ask someone else to translate back to code (the "clean room" method). So you do have to be a bit careful, but when you have code that contains valuable information that isn't really written down anywhere else then I'd say it's worth it. -- Nathaniel From oliphant at enthought.com Sat May 29 16:51:38 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Sat, 29 May 2010 15:51:38 -0500 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: > On Fri, May 28, 2010 at 7:29 AM, Chris Strickland > wrote: >> Hi, >> >> When using any of the distributions of scipy.stats there does not seem to be >> the ability (or at least I cannot figure out how) to have the function >> return >> the log of the pdf, cdf, sf, etc. For statistical analysis this is >> essential. >> For instance suppose we are interested in an exponential distribution for a >> random variable x with a hyperparameter lambda there needs to be an option >> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >> calculate log(scipy.stats.expon.pdf(x,lambda)). >> >> Is there a way to do this using the distributions in scipy.stats? > > It would need a new method for each distribution, e.g. _loglike, _logpdf > So, this is work, and for some distributions the log wouldn't simplify much. > > I proposed this once together with other improvements (but without response). > > The second useful method for estimation would be _fitstart, which > provides distribution specific starting values for fit, e.g. 
a moment > estimator, or a simple rules of thumb > http://projects.scipy.org/scipy/ticket/808 > > > Here are some of my currently planned enhancements to the distributions: > > http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py Hey Josef, I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. Do you have updated code I could look at. These are relatively easy adds that I would like to put in today. Do you have check-in rights to SciPy? Thanks, -Travis > > but I just checked, it looks like I forgot to copy the _loglike method > that I started from my experimental scripts. > > For a few distributions, where this is possible, it would also be > useful to add the gradient with respect to the parameters, (or even > the Hessian). But this is currently mostly just an idea, since we need > some analytical gradients in the estimation of stats models. > > >> >> If there is not is it possible for me to suggest that this feature is added. >> There is such an excellent range of distributions, each with such an >> impressive range of options, it seems ashame to have to mostly manually code >> up the log of pdfs and often call the log of CDFs from R. > > So far I only thought about log pdf, because I wanted it for Maximum > Likelihood estimation. > > Do you have a rough idea for which distributions log cdf would work? > that is, for which distribution is an analytical or efficient > numerical expression possible. > > I also think that scipy.stats.distributions could be one of the best > (broadest, consistent) collection of univariate distributions that I > have seen so far, once we fill in some missing pieces. > > As a way forward, I think we could make the distributions into a > numerical encyclopedia by adding private methods to those > distributions where it makes sense, like log pdf, log cdf and I also > started to add characteristic functions to some distributions in my > experimental scripts. > If you have a collection of logpdf, logcdf, we could add a trac ticket for this. > > However, this would miss the generic broadcasting part of the public > functions, pdf, cdf,... but for estimation I wouldn't necessarily call > those because of the overhead. > > > I'm working on and off on this, so it's moving only slowly (and my > wishlist is big). > (for example, I was reading up on extreme value distributions in > actuarial science and hydrology to get a better overview over the > estimators.) > > > So, I really love to hear any ideas, feedback, and see contributions > to improving the distributions. > > Josef > > >> >> Thanks, >> Chris. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user --- Travis Oliphant Enthought, Inc. 
oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From d.l.goldsmith at gmail.com Sat May 29 17:14:35 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 29 May 2010 14:14:35 -0700 Subject: [SciPy-User] [OT] Fwd: (Former student of mine) Searching for Python studies Message-ID: Hi! RaNaldo is a former student of mine; he's sought my advice for formally continuing his programming studies in Python, and I'm afraid I'm at a loss for specific resources to which he may be referred. If any one can help him out, he (and I) would be most appreciative. Thanks! David Goldsmith On Wed, May 12, 2010 at 11:24 AM, RaNaldo Shorter wrote: Hello Mr. Goldsmith, Could you help me find formal training to learn Python? college preferred any accredited sort is best. I cloud simply study at home.... I was introduced to Python from you during studies at The Art Institute of Seattle, Spring 2008. Thank you for your time helping me. >From RaNaldo Shorter www.ranaldos.com ---------- Forwarded message ---------- From: RaNaldo Shorter Date: Wed, May 12, 2010 at 12:19 PM Subject: RE: Searching for Python studies To: David Goldsmith Thank you for remembering me! I've answered your questions below. Thanks. From: David Goldsmith [mailto:d.l.goldsmith at gmail.com] Sent: Wednesday, May 12, 2010 11:43 AM To: RaNaldo Shorter Subject: Re: Searching for Python studies Hi, RaNaldo, I remember you. I'm going to "punt" this one to some lists I'm on; some info people will want to know: 0) How much Python programming have you done since my class? Ans: I've not deeply delved into Python since your work with us. [DG: For reference, we used Dawson, M. 2006 "Python Programming for the Absolute Beginner, 2nd Ed." as the text, which uses game development as its motivational basis, and is organized the "traditional" way of presenting procedural/structured programming needs and techniques first, OO concepts and techniques second.] However, I currently use Digital Tutors " when I have immediate needs for information. 1) How much, and what kind of, programming have you done in languages other than Python? Ans: Other languages I use (need and want) includes Maya Embedded Script Language (MEL) and standard X-HTML, XML, CSS, and some Java Script. 2) How much and what kind of non-programming computer experience do you have? Ans: I have a fair amount. My first computer was IBM in 1985 when I nearly started learning fundamental PC languages before Windows. 3) Is online study an option (i.e., do you need it to be a traditional, lecture-format course)? Ans: On-line study is a good option, yet I love classroom possibilities... 4) What sort of Python programming do you want to (perhaps ultimately) learn (e.g., general purpose, Web development, UI design & implementation, database, data processing, graphic design/animation, scripting other programs w/ Python) and how far do you want to go with it, i.e., what's your motivation? Ans: Am currently learning MEL from Digital Tutors . I'm motivated by the frequent choices to use Python in many of my simulation programs especially "Real Flow " a fluid dynamics simulation application. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sat May 29 17:20:25 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 17:20:25 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: > > On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: > >> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >> wrote: >>> Hi, >>> >>> When using any of the distributions of scipy.stats there does not seem to be >>> the ability (or at least I cannot figure out how) to have the function >>> return >>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>> essential. >>> For instance suppose we are interested in an exponential distribution for a >>> random variable x with a hyperparameter lambda there needs to be an option >>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>> >>> Is there a way to do this using the distributions in scipy.stats? >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify much. >> >> I proposed this once together with other improvements (but without response). >> >> The second useful method for estimation would be _fitstart, which >> provides distribution specific starting values for fit, e.g. a moment >> estimator, or a simple rules of thumb >> http://projects.scipy.org/scipy/ticket/808 >> >> >> Here are some of my currently planned enhancements to the distributions: >> >> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py > > Hey Josef, > > I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). > > I also added your _fitstart suggestion. ? I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. > > Do you have updated code I could look at. ? These are relatively easy adds that I would like to put in today. ? ? Do you have check-in rights to SciPy? I just committed the changes for the _logpdf, ..., I didn't see any changes of yours in the timeline, nor in svn changes, plus a fix to internal wrapcauchy_cdf generic _logpdf, logcdf and the 13 cases of my test script are in svn Josef Josef > > Thanks, > > -Travis > >> >> but I just checked, it looks like I forgot to copy the _loglike method >> that I started from my experimental scripts. >> >> For a few distributions, where this is possible, it would also be >> useful to add the gradient with respect to the parameters, (or even >> the Hessian). But this is currently mostly just an idea, since we need >> some analytical gradients in the estimation of stats models. >> >> >>> >>> If there is not is it possible for me to suggest that this feature is added. >>> There is such an excellent range of distributions, each with such an >>> impressive range of options, it seems ashame to have to mostly manually code >>> up the log of pdfs and often call the log of CDFs from R. >> >> So far I only thought about log pdf, because I wanted it for Maximum >> Likelihood estimation. >> >> Do you have a rough idea for which distributions log cdf would work? 
>> that is, for which distribution is an analytical or efficient >> numerical expression possible. >> >> I also think that scipy.stats.distributions could be one of the best >> (broadest, consistent) collection of univariate distributions that I >> have seen so far, once we fill in some missing pieces. >> >> As a way forward, I think we could make the distributions into a >> numerical encyclopedia by adding private methods to those >> distributions where it makes sense, like log pdf, log cdf and I also >> started to add characteristic functions to some distributions in my >> experimental scripts. >> If you have a collection of logpdf, logcdf, we could add a trac ticket for this. >> >> However, this would miss the generic broadcasting part of the public >> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >> those because of the overhead. >> >> >> I'm working on and off on this, so it's moving only slowly (and my >> wishlist is big). >> (for example, I was reading up on extreme value distributions in >> actuarial science and hydrology to get a better overview over the >> estimators.) >> >> >> So, I really love to hear any ideas, feedback, and see contributions >> to improving the distributions. >> >> Josef >> >> >>> >>> Thanks, >>> Chris. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Sat May 29 17:38:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 17:38:46 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: > > On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: > >> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >> wrote: >>> Hi, >>> >>> When using any of the distributions of scipy.stats there does not seem to be >>> the ability (or at least I cannot figure out how) to have the function >>> return >>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>> essential. >>> For instance suppose we are interested in an exponential distribution for a >>> random variable x with a hyperparameter lambda there needs to be an option >>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>> >>> Is there a way to do this using the distributions in scipy.stats? >> >> It would need a new method for each distribution, e.g. _loglike, _logpdf >> So, this is work, and for some distributions the log wouldn't simplify much. >> >> I proposed this once together with other improvements (but without response). >> >> The second useful method for estimation would be _fitstart, which >> provides distribution specific starting values for fit, e.g. 
a moment >> estimator, or a simple rules of thumb >> http://projects.scipy.org/scipy/ticket/808 >> >> >> Here are some of my currently planned enhancements to the distributions: >> >> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py > > Hey Josef, > > I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). I would like to get the private _logpdf in a useful (vectorized or broadcastable) version because for estimation and optimization, I want to avoid the logpdf overhead. So, my testing will be on the underline versions. > > I also added your _fitstart suggestion. ? I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. I have written a semi-frozen fit function and posted to the mailing list a long time ago, but since I'm not sure about the API and I'm expanding to several new estimators, I kept this under work-in-progress. Similar _fitstart might need extra options, for estimation when some parameters are fixed, e.g. there are good moment estimators that work when some of the parameters (e.g. loc or scale) are fixed. Also _fitstart is currently used only by my fit_frozen. I was hoping to get this done this year, maybe together with the enhancements that Per Brodtkorb proposed two years ago, e.g. Method of Maximum Spacings. I also have a Generalized Method of Moments estimator based on matching quantiles and moments in the works. So, I don't want yet to be pinned down with any API for the estimation enhancements. Josef > > Do you have updated code I could look at. ? These are relatively easy adds that I would like to put in today. ? ? Do you have check-in rights to SciPy? > > Thanks, > > -Travis > >> >> but I just checked, it looks like I forgot to copy the _loglike method >> that I started from my experimental scripts. >> >> For a few distributions, where this is possible, it would also be >> useful to add the gradient with respect to the parameters, (or even >> the Hessian). But this is currently mostly just an idea, since we need >> some analytical gradients in the estimation of stats models. >> >> >>> >>> If there is not is it possible for me to suggest that this feature is added. >>> There is such an excellent range of distributions, each with such an >>> impressive range of options, it seems ashame to have to mostly manually code >>> up the log of pdfs and often call the log of CDFs from R. >> >> So far I only thought about log pdf, because I wanted it for Maximum >> Likelihood estimation. >> >> Do you have a rough idea for which distributions log cdf would work? >> that is, for which distribution is an analytical or efficient >> numerical expression possible. >> >> I also think that scipy.stats.distributions could be one of the best >> (broadest, consistent) collection of univariate distributions that I >> have seen so far, once we fill in some missing pieces. >> >> As a way forward, I think we could make the distributions into a >> numerical encyclopedia by adding private methods to those >> distributions where it makes sense, like log pdf, log cdf and I also >> started to add characteristic functions to some distributions in my >> experimental scripts. >> If you have a collection of logpdf, logcdf, we could add a trac ticket for this. 
>> >> However, this would miss the generic broadcasting part of the public >> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >> those because of the overhead. >> >> >> I'm working on and off on this, so it's moving only slowly (and my >> wishlist is big). >> (for example, I was reading up on extreme value distributions in >> actuarial science and hydrology to get a better overview over the >> estimators.) >> >> >> So, I really love to hear any ideas, feedback, and see contributions >> to improving the distributions. >> >> Josef >> >> >>> >>> Thanks, >>> Chris. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Sat May 29 17:53:15 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 29 May 2010 14:53:15 -0700 Subject: [SciPy-User] [OT] Analysis for Applied Mathematics Message-ID: Hi! A long time ago, I asked Bill Derrick, my advisor @ the Univ. of Montana, to recommend a text on Analysis written for applied mathematicians and he suggested Cheney, "Analysis for Applied Mathematics." I'm finally in a position to add such a volume to my library and I'm wondering if A) anyone reading this would strongly disagree w/ this recommendation (and if so, why), and B) in particular, has it since been superseded by something superior? Thanks! DG PS: I'm also in the market for a treatise on Noise (i.e., I'm interested in something that attempts to be pretty comprehensive, covering theory and applications, looking at it - in its various "colors", i.e., white, pink, brown, etc. - from the variety of disciplines in which it plays an important part, etc., etc.) Thanks again! -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat May 29 17:58:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 29 May 2010 17:58:31 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sat, May 29, 2010 at 5:38 PM, wrote: > On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: >> >> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote: >> >>> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland >>> wrote: >>>> Hi, >>>> >>>> When using any of the distributions of scipy.stats there does not seem to be >>>> the ability (or at least I cannot figure out how) to have the function >>>> return >>>> the log of the pdf, cdf, sf, etc. For statistical analysis this is >>>> essential. 
>>>> For instance suppose we are interested in an exponential distribution for a >>>> random variable x with a hyperparameter lambda there needs to be an option >>>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to >>>> calculate log(scipy.stats.expon.pdf(x,lambda)). >>>> >>>> Is there a way to do this using the distributions in scipy.stats? >>> >>> It would need a new method for each distribution, e.g. _loglike, _logpdf >>> So, this is work, and for some distributions the log wouldn't simplify much. >>> >>> I proposed this once together with other improvements (but without response). >>> >>> The second useful method for estimation would be _fitstart, which >>> provides distribution specific starting values for fit, e.g. a moment >>> estimator, or a simple rules of thumb >>> http://projects.scipy.org/scipy/ticket/808 >>> >>> >>> Here are some of my currently planned enhancements to the distributions: >>> >>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py >> >> Hey Josef, >> >> I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). > > I would like to get the private _logpdf in a useful (vectorized or > broadcastable) version because for estimation and optimization, I want > to avoid the logpdf overhead. So, my testing will be on the underline > versions. > >> >> I also added your _fitstart suggestion. ? I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. > > I have written a semi-frozen fit function and posted to the mailing > list a long time ago, but since I'm not sure about the API and I'm > expanding to several new estimators, I kept this under > work-in-progress. > > Similar _fitstart might need extra options, for estimation when some > parameters are fixed, e.g. there are good moment estimators that work > when some of the parameters (e.g. loc or scale) are fixed. Also > _fitstart is currently used only by my fit_frozen. > > I was hoping to get this done this year, maybe together with the > enhancements that Per Brodtkorb proposed two years ago, e.g. Method of > Maximum Spacings. > > I also have a Generalized Method of Moments estimator based on > matching quantiles and moments in the works. > > So, I don't want yet to be pinned down with any API for the estimation > enhancements. > > Josef > >> >> Do you have updated code I could look at. ? These are relatively easy adds that I would like to put in today. ? ? Do you have check-in rights to SciPy? http://projects.scipy.org/scipy/log/trunk/scipy/stats/distributions.py >> >> Thanks, >> >> -Travis >> >>> >>> but I just checked, it looks like I forgot to copy the _loglike method >>> that I started from my experimental scripts. >>> >>> For a few distributions, where this is possible, it would also be >>> useful to add the gradient with respect to the parameters, (or even >>> the Hessian). But this is currently mostly just an idea, since we need >>> some analytical gradients in the estimation of stats models. >>> >>> >>>> >>>> If there is not is it possible for me to suggest that this feature is added. >>>> There is such an excellent range of distributions, each with such an >>>> impressive range of options, it seems ashame to have to mostly manually code >>>> up the log of pdfs and often call the log of CDFs from R. 
>>> >>> So far I only thought about log pdf, because I wanted it for Maximum >>> Likelihood estimation. >>> >>> Do you have a rough idea for which distributions log cdf would work? >>> that is, for which distribution is an analytical or efficient >>> numerical expression possible. >>> >>> I also think that scipy.stats.distributions could be one of the best >>> (broadest, consistent) collection of univariate distributions that I >>> have seen so far, once we fill in some missing pieces. >>> >>> As a way forward, I think we could make the distributions into a >>> numerical encyclopedia by adding private methods to those >>> distributions where it makes sense, like log pdf, log cdf and I also >>> started to add characteristic functions to some distributions in my >>> experimental scripts. >>> If you have a collection of logpdf, logcdf, we could add a trac ticket for this. >>> >>> However, this would miss the generic broadcasting part of the public >>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call >>> those because of the overhead. >>> >>> >>> I'm working on and off on this, so it's moving only slowly (and my >>> wishlist is big). >>> (for example, I was reading up on extreme value distributions in >>> actuarial science and hydrology to get a better overview over the >>> estimators.) >>> >>> >>> So, I really love to hear any ideas, feedback, and see contributions >>> to improving the distributions. >>> >>> Josef >>> >>> >>>> >>>> Thanks, >>>> Chris. >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> --- >> Travis Oliphant >> Enthought, Inc. >> oliphant at enthought.com >> 1-512-536-1057 >> http://www.enthought.com >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From uclamathguy at gmail.com Sat May 29 21:05:32 2010 From: uclamathguy at gmail.com (Ryan Rosario) Date: Sat, 29 May 2010 18:05:32 -0700 Subject: [SciPy-User] Problem with np.load() on Huge Sparse Matrix Message-ID: Hi, I have a very huge sparse (395000 x 395000) CSC matrix that I cannot save in one pass, so I saved the data, indices, indptr and shape in separate files as suggested by Dave Wade-Farley a few years back. When I try to read back the indices pickle: >> np.save("indices.pickle", mymatrix.indices) >>> indices = np.load("indices.pickle.npy") >>> indices array([394852, 394649, 394533, ..., 0, 0, 0], dtype=int32) >>> intersection_matrix.indices array([394852, 394649, 394533, ..., 1557, 1223, 285], dtype=int32) Why is this happening? My only workaround is to print all of entries of intersection_matrix.indices to a file, and read in back which takes up to 2 hours. It would be great if I could get np.load to work because it is much faster. Thanks, Ryan From jsseabold at gmail.com Sun May 30 00:20:45 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 30 May 2010 00:20:45 -0400 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On Sat, May 29, 2010 at 3:44 PM, Nathaniel Smith wrote: > On Fri, May 28, 2010 at 9:00 PM, ? wrote: >> On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: >>> On Fri, May 28, 2010 at 7:53 PM, ? 
wrote: >>>> R's license, GPL, is incompatible with the license of scipy, BSD. >>>> While they are allowed to look at our code, code that goes into scipy >>>> cannot be based on GPL licensed code. >>> >>> You mean, they're allowed to copy our code, and we're allowed to look >>> at their code for reference but can't use it directly :-). >> >> We are allowed to look at their manuals but not their code. >> (Life ain't fair.) > > It sounds like you guys have this well in hand, but just a point here > -- you certainly are allowed to look at their code, just not copy the > "expressive aspects" of it. (Saying you can't *look* at it because of > the license is like saying writers can't read other people's novels!) > "Expressive" is a tricky term, of course -- IIUC it's basically > anything that could be changed while preserving functionality (because > the functionality, the algorithm itself, is not covered by copyright). > So, say, variable names certainly count as expressive, decisions about > which way to lay out the code, etc. If one wants to be really safe, > one can write down a textual description of the algorithm and then ask > someone else to translate back to code (the "clean room" method). > > So you do have to be a bit careful, but when you have code that > contains valuable information that isn't really written down anywhere > else then I'd say it's worth it. > Thanks, this is useful to know. I've always erred on the side of caution and just compared the results of functions/algorithms that *should* be the same vs, say, R, but if I could do this and then look at implementation details this could relieve substantial headaches. It still seems like such a fine line though. Skipper From aarchiba at physics.mcgill.ca Sun May 30 00:36:08 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Sun, 30 May 2010 01:36:08 -0300 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: Message-ID: On 30 May 2010 01:20, Skipper Seabold wrote: > On Sat, May 29, 2010 at 3:44 PM, Nathaniel Smith wrote: >> On Fri, May 28, 2010 at 9:00 PM, ? wrote: >>> On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith wrote: >>>> On Fri, May 28, 2010 at 7:53 PM, ? wrote: >>>>> R's license, GPL, is incompatible with the license of scipy, BSD. >>>>> While they are allowed to look at our code, code that goes into scipy >>>>> cannot be based on GPL licensed code. >>>> >>>> You mean, they're allowed to copy our code, and we're allowed to look >>>> at their code for reference but can't use it directly :-). >>> >>> We are allowed to look at their manuals but not their code. >>> (Life ain't fair.) >> >> It sounds like you guys have this well in hand, but just a point here >> -- you certainly are allowed to look at their code, just not copy the >> "expressive aspects" of it. (Saying you can't *look* at it because of >> the license is like saying writers can't read other people's novels!) >> "Expressive" is a tricky term, of course -- IIUC it's basically >> anything that could be changed while preserving functionality (because >> the functionality, the algorithm itself, is not covered by copyright). >> So, say, variable names certainly count as expressive, decisions about >> which way to lay out the code, etc. If one wants to be really safe, >> one can write down a textual description of the algorithm and then ask >> someone else to translate back to code (the "clean room" method). 
>> >> So you do have to be a bit careful, but when you have code that >> contains valuable information that isn't really written down anywhere >> else then I'd say it's worth it. >> > > Thanks, this is useful to know. ?I've always erred on the side of > caution and just compared the results of functions/algorithms that > *should* be the same vs, say, R, but if I could do this and then look > at implementation details this could relieve substantial headaches. > It still seems like such a fine line though. This is exactly the problem. I don't think the R community is particularly litigious, but as a rule of thumb, doing something that is technically legal but for which the legality is subtle opens one up to lawsuits. The problem is that even when you are right, a lawsuit is tremendously destructive. So things that are legal but subtle should probably be avoided by a group as penniless as the community as scipy developers. So it's probably better to just not read their source code. Anne > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From m.boumans at gmx.net Mon May 31 01:03:48 2010 From: m.boumans at gmx.net (bowie_22) Date: Mon, 31 May 2010 05:03:48 +0000 (UTC) Subject: [SciPy-User] OpenOpt and Scipy.optimize Message-ID: Hello together, during my evaluation of scipy as subsitute for Matlab I started to look at the optimization features of sciypy by looking at the optimze module. I posted a question and one answer contained a hint to OpenOpt. Now I am a little bit unsure how to proceed. Does it make more sense to look at OpenOpt rather then evaluating scipy.optimize? Regrads Marcus From ralf.gommers at googlemail.com Mon May 31 07:39:41 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 31 May 2010 19:39:41 +0800 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Sun, May 30, 2010 at 5:38 AM, wrote: > On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant > wrote: > > > > Hey Josef, > > > > I've been playing with distributions.py today and added logpdf, logcdf, > logsf methods (based on _logpdf, _logcdf, _logsf methods in each > distribution). > > I would like to get the private _logpdf in a useful (vectorized or > broadcastable) version because for estimation and optimization, I want > to avoid the logpdf overhead. So, my testing will be on the underline > versions. > > > > > I also added your _fitstart suggestion. I would like to do something > like your nnlf_fit method that allows you to fix some parameters and only > solve for others, but I haven't thought through all the issues yet. > > I have written a semi-frozen fit function and posted to the mailing > list a long time ago, but since I'm not sure about the API and I'm > expanding to several new estimators, I kept this under > work-in-progress. > > Similar _fitstart might need extra options, for estimation when some > parameters are fixed, e.g. there are good moment estimators that work > when some of the parameters (e.g. loc or scale) are fixed. Also > _fitstart is currently used only by my fit_frozen. > > I was hoping to get this done this year, maybe together with the > enhancements that Per Brodtkorb proposed two years ago, e.g. Method of > Maximum Spacings. > > I also have a Generalized Method of Moments estimator based on > matching quantiles and moments in the works. 
> > So, I don't want yet to be pinned down with any API for the estimation > enhancements. > > These recent changes are a bit problematic for several reasons: - there are many new methods for distributions without tests. - there are no docs for many new private and public methods - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 - the old rv_continuous doc template was put back in This, plus Josef saying that he doesn't want to fix the API for some methods yet, makes me want to take it out of the 0.8.x branch. Any objections to that Travis or Josef? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon May 31 11:59:39 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 31 May 2010 09:59:39 -0600 Subject: [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: On Mon, May 31, 2010 at 5:39 AM, Ralf Gommers wrote: > > > On Sun, May 30, 2010 at 5:38 AM, wrote: > >> On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant >> wrote: >> > >> > Hey Josef, >> > >> > I've been playing with distributions.py today and added logpdf, logcdf, >> logsf methods (based on _logpdf, _logcdf, _logsf methods in each >> distribution). >> >> I would like to get the private _logpdf in a useful (vectorized or >> broadcastable) version because for estimation and optimization, I want >> to avoid the logpdf overhead. So, my testing will be on the underline >> versions. >> >> > >> > I also added your _fitstart suggestion. I would like to do something >> like your nnlf_fit method that allows you to fix some parameters and only >> solve for others, but I haven't thought through all the issues yet. >> >> I have written a semi-frozen fit function and posted to the mailing >> list a long time ago, but since I'm not sure about the API and I'm >> expanding to several new estimators, I kept this under >> work-in-progress. >> >> Similar _fitstart might need extra options, for estimation when some >> parameters are fixed, e.g. there are good moment estimators that work >> when some of the parameters (e.g. loc or scale) are fixed. Also >> _fitstart is currently used only by my fit_frozen. >> >> I was hoping to get this done this year, maybe together with the >> enhancements that Per Brodtkorb proposed two years ago, e.g. Method of >> Maximum Spacings. >> >> I also have a Generalized Method of Moments estimator based on >> matching quantiles and moments in the works. >> >> So, I don't want yet to be pinned down with any API for the estimation >> enhancements. >> >> These recent changes are a bit problematic for several reasons: > - there are many new methods for distributions without tests. > - there are no docs for many new private and public methods > - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 > - the old rv_continuous doc template was put back in > > This, plus Josef saying that he doesn't want to fix the API for some > methods yet, makes me want to take it out of the 0.8.x branch. Any > objections to that Travis or Josef? > > I'm thinking it should be taken out of the trunk as well as the 0.8.x branch. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From akshaysrinivasan at gmail.com Mon May 31 12:34:55 2010 From: akshaysrinivasan at gmail.com (Akshay Srinivasan) Date: Mon, 31 May 2010 22:04:55 +0530 Subject: [SciPy-User] Kinpy Message-ID: <4C03E52F.1010309@gmail.com> Hello, I have been doing a lot Chemical Kinetic simulation lately, I particularly found that generating the code for solving a given set of reactions is a lot more time consuming and mechanistic than the time taken to do the rest of the work. I wrote Kinpy as a simple script to generate the Python code for doing exactly this from the natural representation of a set of chemical reactions. Its not really a *project* per se - its just one file! I couldn't any other place to put it, so it ended up on google code. You can find the the source code and information on usage here: http://code.google.com/p/kinpy/ Regards, Akshay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jr at sun.ac.za Mon May 31 14:56:31 2010 From: jr at sun.ac.za (Johann Rohwer) Date: Mon, 31 May 2010 20:56:31 +0200 Subject: [SciPy-User] Kinpy In-Reply-To: <4C03E52F.1010309@gmail.com> References: <4C03E52F.1010309@gmail.com> Message-ID: <4C04065F.5090509@sun.ac.za> You might be interested in PySCeS, the Python Simulator for Cellular Systems (http://pysces.sf.net), which is a package (runs on top of scipy and numpy) dedicated to solving (bio)chemical reaction networks and does, amongst others, time-course simulation, steady-state analysis, higher-order analyses such as stability analysis and metabolic control analysis, and more. I just don't want you to re-invent the wheel, and solving the kind of numerical problem you mention on your page is a breeze with PySCeS. Regards Johann On 31/05/2010 18:34, Akshay Srinivasan wrote: > Hello, > I have been doing a lot Chemical Kinetic simulation lately, I > particularly found that generating the code for solving a given set of > reactions is a lot more time consuming and mechanistic than the time > taken to do the rest of the work. I wrote Kinpy as a simple script to > generate the Python code for doing exactly this from the natural > representation of a set of chemical reactions. Its not really a > *project* per se - its just one file! I couldn't any other place to put > it, so it ended up on google code. > You can find the the source code and information on usage here: > http://code.google.com/p/kinpy/ > > Regards, > Akshay > From matthew.brett at gmail.com Mon May 31 18:42:24 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 15:42:24 -0700 Subject: [SciPy-User] Kinpy In-Reply-To: <4C04065F.5090509@sun.ac.za> References: <4C03E52F.1010309@gmail.com> <4C04065F.5090509@sun.ac.za> Message-ID: Hi, On Mon, May 31, 2010 at 11:56 AM, Johann Rohwer wrote: > You might be interested in PySCeS, the Python Simulator for Cellular Systems > (http://pysces.sf.net), I bow low in respect for that excellent name. 
I don't know who came up with it, but whoever it was deserves due honor ;) Matthew From fernando.ferreira at poli.ufrj.br Mon May 31 18:54:17 2010 From: fernando.ferreira at poli.ufrj.br (=?ISO-8859-1?Q?Fernando_Guimar=E3es_Ferreira?=) Date: Mon, 31 May 2010 19:54:17 -0300 Subject: [SciPy-User] scipy.io.matlab.loadmat error Message-ID: Hi, I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 For some reason, I can't load matlab files using scipy.io.matlab.loadmat: scipy.io.matlab.loadmat('all_data.mat') /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.py:84: FutureWarning: Using struct_as_record default value (False) This will change to True in future versions return MatFile5Reader(byte_stream, **kwargs) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/fguimara/Documents/UFRJ/mestrado/CPE782_-_ICA/time_series_ica/script/ in () /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs) 109 ''' 110 MR = mat_reader_factory(file_name, appendmat, **kwargs) --> 111 matfile_dict = MR.get_variables() 112 if mdict is not None: 113 mdict.update(matfile_dict) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/miobase.pyc in get_variables(self, variable_names) 444 mdict['__globals__'] = [] 445 while not self.end_of_stream(): --> 446 getter = self.matrix_getter_factory() 447 name = getter.name 448 if variable_names and name not in variable_names: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 694 695 def matrix_getter_factory(self): --> 696 return self._array_reader.matrix_getter_factory() 697 698 def guess_byte_order(self): /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 313 elif not mdtype == miMATRIX: 314 raise TypeError, \ --> 315 'Expecting miMATRIX type here, got %d' % mdtype 316 else: 317 getter = self.current_getter(byte_count) TypeError: Expecting miMATRIX type here, got 1296630016 Can't understand why... This is the info about the file % > file all_data.mat all_data.mat: Matlab v5 mat-file (little endian) version 0x0100 Anything? Cheers Fernando -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon May 31 19:08:13 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 16:08:13 -0700 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: Hi, 2010/5/31 Fernando Guimar?es Ferreira : > Hi, > I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 > For some reason, I can't load matlab files using?scipy.io.matlab.loadmat: > scipy.io.matlab.loadmat('all_data.mat') ... > ?? ?317 ? ? ? ? ? ? getter = self.current_getter(byte_count) > TypeError: Expecting miMATRIX type here, got 1296630016 > > Can't understand why... I don't know either I'm afraid. Can you try the latest version? Is there some way you can get me the .mat file so I can debug the problem in more detail? 
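A small, self-contained test file for this kind of report can be produced directly from scipy; a minimal round-trip sketch (the file name and array below simply mirror the test case that appears next in the thread, they are not part of the original message):

import numpy as np
from scipy import io

io.savemat('teste.mat', {'x': np.array([0, 1, 3, 0, 1, 3, 4, 5, 7, 7])})
print(io.loadmat('teste.mat')['x'])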
Best, Matthew From fernando.ferreira at poli.ufrj.br Mon May 31 21:10:55 2010 From: fernando.ferreira at poli.ufrj.br (=?ISO-8859-1?Q?Fernando_Guimar=E3es_Ferreira?=) Date: Mon, 31 May 2010 22:10:55 -0300 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: In the last email I meant python 2.6.5 The mat file is attached... This is a test, there is just an array 'x' with few elements.. Didn't work either Thanks, Fernando On Mon, May 31, 2010 at 8:08 PM, Matthew Brett wrote: > Hi, > > 2010/5/31 Fernando Guimar?es Ferreira : > > Hi, > > I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 > > For some reason, I can't load matlab files using scipy.io.matlab.loadmat: > > scipy.io.matlab.loadmat('all_data.mat') > ... > > 317 getter = self.current_getter(byte_count) > > TypeError: Expecting miMATRIX type here, got 1296630016 > > > > Can't understand why... > > I don't know either I'm afraid. Can you try the latest version? Is > there some way you can get me the .mat file so I can debug the problem > in more detail? > > Best, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: teste.mat Type: application/octet-stream Size: 183 bytes Desc: not available URL: From vincent at vincentdavis.net Mon May 31 22:09:32 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Mon, 31 May 2010 20:09:32 -0600 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: Just as a note, this probably is not it but I recently ran into this with a csv file saved using excel on a mac. I guess it saves it as a unicode format, the error reported is a EOL when opening with genfromtxt but thats not quite right. Anyway if the file is saved using matlab on you mac this unicode might be the problem. Of course I am seeing this through skewed glasses I just couldn't not mention it. Vincent 2010/5/31 Fernando Guimar?es Ferreira : > In the last email I meant python 2.6.5 > The mat file is attached... This is a test, there is just an array 'x' with > few elements.. > Didn't work either > > Thanks, > Fernando > > > > > On Mon, May 31, 2010 at 8:08 PM, Matthew Brett > wrote: >> >> Hi, >> >> 2010/5/31 Fernando Guimar?es Ferreira : >> > Hi, >> > I'm using scipy under MacOS Snow Leopard v:0.7.2 with python 2.6.2 >> > For some reason, I can't load matlab files >> > using?scipy.io.matlab.loadmat: >> > scipy.io.matlab.loadmat('all_data.mat') >> ... >> > ?? ?317 ? ? ? ? ? ? getter = self.current_getter(byte_count) >> > TypeError: Expecting miMATRIX type here, got 1296630016 >> > >> > Can't understand why... >> >> I don't know either I'm afraid. ?Can you try the latest version? ?Is >> there some way you can get me the .mat file so I can debug the problem >> in more detail? 
>> >> Best, >> >> Matthew >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jsseabold at gmail.com Mon May 31 22:16:53 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 31 May 2010 22:16:53 -0400 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: 2010/5/31 Fernando Guimar?es Ferreira : > In the last email I meant python 2.6.5 > The mat file is attached... This is a test, there is just an array 'x' with > few elements.. > Didn't work either Works for me with a recent trunk of scipy. In [1]: from scipy import io In [2]: dta = io.loadmat('./teste.mat') In [3]: dta['x'] Out[3]: array([[0, 1, 3, 0, 1, 3, 4, 5, 7, 7]], dtype=uint8) In [4]: from scipy import __version__ as v In [5]: v Out[5]: '0.9.0.dev6447' Skipper From matthew.brett at gmail.com Mon May 31 22:36:04 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 19:36:04 -0700 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: Hi, 2010/5/31 Fernando Guimar?es Ferreira : > In the last email I meant python 2.6.5 > The mat file is attached... This is a test, there is just an array 'x' with > few elements.. It works for me with 0.7.2. I wonder what's going on? [mb312 at blair ~/tmp]$ uname -a Darwin blair 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386 [mb312 at blair ~/tmp]$ ipython Python 2.6.4 (r264:75706, Dec 22 2009, 14:55:30) Type "copyright", "credits" or "license" for more information. IPython 0.11.alpha1.bzr.r1223 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: import scipy In [2]: scipy.__version__ Out[2]: '0.7.2' In [3]: import scipy.io.matlab In [4]: scipy.io.matlab.loadmat('/Users/mb312/Downloads/teste.mat') /Users/mb312/usr/local/lib/python2.6/site-packages/scipy/io/matlab/mio.py:84: FutureWarning: Using struct_as_record default value (False) This will change to True in future versions return MatFile5Reader(byte_stream, **kwargs) Out[4]: {'__globals__': [], '__header__': 'MATLAB 5.0 MAT-file, Platform: MACI, Created on: Mon May 31 21:06:09 2010', '__version__': '1.0', 'x': array([[0, 1, 3, 0, 1, 3, 4, 5, 7, 7]], dtype=uint8)} Best, Matthew From pgmdevlist at gmail.com Mon May 31 22:38:55 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 31 May 2010 22:38:55 -0400 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: References: Message-ID: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com> On May 31, 2010, at 10:09 PM, Vincent Davis wrote: > Just as a note, this probably is not it but I recently ran into this > with a csv file saved using excel on a mac. I guess it saves it as a > unicode format, the error reported is a EOL when opening with > genfromtxt but thats not quite right. But I thought I had fixed that on the SVN ??? 
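A quick way to narrow down this kind of machine-specific failure is to confirm which numpy/scipy the failing interpreter actually imports and whether the bundled io tests pass there; a sketch, assuming a nose-based scipy 0.7.x install:

import numpy
import scipy
import scipy.io

print("numpy %s from %s" % (numpy.__version__, numpy.__file__))
print("scipy %s from %s" % (scipy.__version__, scipy.__file__))
scipy.io.test()   # requires the nose package; reports OK or FAILED when finished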
From fernando.ferreira at poli.ufrj.br Mon May 31 22:43:40 2010 From: fernando.ferreira at poli.ufrj.br (=?ISO-8859-1?Q?Fernando_Guimar=E3es_Ferreira?=) Date: Mon, 31 May 2010 23:43:40 -0300 Subject: [SciPy-User] scipy.io.matlab.loadmat error In-Reply-To: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com> References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com> Message-ID: 14 [fguimara] script > uname -a Darwin warley.local 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386 15 [fguimara] script > ipython Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) Type "copyright", "credits" or "license" for more information. IPython 0.10 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: import scipy In [2]: scipy.__version__ Out[2]: '0.7.2' In [3]: import scipy.io.matlab In [4]: scipy.io.matlab.loadmat('teste.mat') /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.py:84: FutureWarning: Using struct_as_record default value (False) This will change to True in future versions return MatFile5Reader(byte_stream, **kwargs) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/fguimara/Documents/UFRJ/mestrado/CPE782_-_ICA/time_series_ica/script/ in () /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs) 109 ''' 110 MR = mat_reader_factory(file_name, appendmat, **kwargs) --> 111 matfile_dict = MR.get_variables() 112 if mdict is not None: 113 mdict.update(matfile_dict) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/miobase.pyc in get_variables(self, variable_names) 444 mdict['__globals__'] = [] 445 while not self.end_of_stream(): --> 446 getter = self.matrix_getter_factory() 447 name = getter.name 448 if variable_names and name not in variable_names: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 694 695 def matrix_getter_factory(self): --> 696 return self._array_reader.matrix_getter_factory() 697 698 def guess_byte_order(self): /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in matrix_getter_factory(self) 313 elif not mdtype == miMATRIX: 314 raise TypeError, \ --> 315 'Expecting miMATRIX type here, got %d' % mdtype 316 else: 317 getter = self.current_getter(byte_count) TypeError: Expecting miMATRIX type here, got 1296630016 In [5]: Same file.... But it doesnot work at all... Cheers, Fernando On Mon, May 31, 2010 at 11:38 PM, Pierre GM wrote: > On May 31, 2010, at 10:09 PM, Vincent Davis wrote: > > Just as a note, this probably is not it but I recently ran into this > > with a csv file saved using excel on a mac. I guess it saves it as a > > unicode format, the error reported is a EOL when opening with > > genfromtxt but thats not quite right. > > But I thought I had fixed that on the SVN ??? > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From matthew.brett at gmail.com Mon May 31 22:58:24 2010
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 31 May 2010 19:58:24 -0700
Subject: [SciPy-User] scipy.io.matlab.loadmat error
In-Reply-To:
References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
Message-ID:

Hi,

...
> TypeError: Expecting miMATRIX type here, got 1296630016
> In [5]:
>
> Same file.... But it does not work at all...

What version of numpy do you have? I can't imagine it makes a difference, but still.

Did you run the scipy tests? Did the scipy.io.matlab tests pass?

Best,

Matthew

From vincent at vincentdavis.net Mon May 31 23:53:34 2010
From: vincent at vincentdavis.net (Vincent Davis)
Date: Mon, 31 May 2010 21:53:34 -0600
Subject: [SciPy-User] scipy.io.matlab.loadmat error
In-Reply-To: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
Message-ID:

On Mon, May 31, 2010 at 8:38 PM, Pierre GM wrote:
> On May 31, 2010, at 10:09 PM, Vincent Davis wrote:
>> Just as a note, this probably is not it but I recently ran into this
>> with a csv file saved using excel on a mac. I guess it saves it as a
>> unicode format, the error reported is a EOL when opening with
>> genfromtxt but thats not quite right.
>
> But I thought I had fixed that on the SVN ???

You did but I assume that only applied to csv (type?) files.
I was thinking that they may have a "similar" problem with this mat
file. But I tried to clearly say I have no idea.

Vincent

>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From matthew.brett at gmail.com Mon May 31 23:58:21 2010
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 31 May 2010 20:58:21 -0700
Subject: [SciPy-User] scipy.io.matlab.loadmat error
In-Reply-To:
References: <8CA9D85A-CA93-4B7F-8434-02F633C44090@gmail.com>
Message-ID:

Hi,

>> But I thought I had fixed that on the SVN ???
>
> You did but I assume that only applied to csv (type?) files.
> I was thinking that they may have a "similar" problem with this mat
> file. But I tried to clearly say I have no idea.

Actually the .mat files are a custom binary format by matlab - we
don't use the genfromtxt stuff to load them...

Matthew

From skorpio11 at gmail.com Tue May 25 20:33:04 2010
From: skorpio11 at gmail.com (Leon Adams)
Date: Tue, 25 May 2010 20:33:04 -0400
Subject: [SciPy-User] Triangular Distribution ppf method
Message-ID:

Hi all,

There seems to be a bug of some sort in evaluating the ppf method of the
scipy.stats.triang distribution. Evaluating the distribution with a location
parameter of 1 or greater seems to be problematic. I am looking for
confirmation of this behavior and suggestions for a workaround.

Thanks in advance

--
Leon Adams
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tanwp at gis.a-star.edu.sg Wed May 26 00:07:35 2010
From: tanwp at gis.a-star.edu.sg (Padma TAN)
Date: Wed, 26 May 2010 12:07:35 +0800
Subject: [SciPy-User] Python scipy error.
Message-ID:

Hi,

This is the error message I got when I needed to run this. Please assist!
[rjauch at giswk002 pwm_scanner]$ python pwm_scanner.py
Traceback (most recent call last):
  File "pwm_scanner.py", line 36, in
    from glbase import *
  File "/home/rjauch/glbase/__init__.py", line 57, in
    from glglob import glglob
  File "/home/rjauch/glbase/glglob.py", line 27, in
    from scipy.stats import spearmanr, pearsonr
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/stats/__init__.py", line 7, in
    from stats import *
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/stats/stats.py", line 199, in
    import scipy.linalg as linalg
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/__init__.py", line 8, in
    from basic import *
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/basic.py", line 389, in
    import decomp
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/decomp.py", line 23, in
    from blas import get_blas_funcs
  File "/usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/blas.py", line 14, in
    from scipy.linalg import fblas
ImportError: /usr/local/Python-2.6.2/lib/python2.6/site-packages/scipy/linalg/fblas.so: undefined symbol: srotmg_

Can I safely ignore this?

These are the messages shown when running python setup.py build for scipy.

customize UnixCCompiler
customize UnixCCompiler using build_clib
customize GnuFCompiler
Found executable /usr/bin/g77
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler using build_clib
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
extending extension 'scipy.sparse.linalg.dsolve._zsuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
extending extension 'scipy.sparse.linalg.dsolve._dsuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
extending extension 'scipy.sparse.linalg.dsolve._csuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
extending extension 'scipy.sparse.linalg.dsolve._ssuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)]
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize GnuFCompiler
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler
gnu: no Fortran 90 compiler found
gnu: no Fortran 90 compiler found
customize GnuFCompiler using build_ext
running scons
[root at giswk002 scipy-0.7.2]#

SYSTEM PYTHON INFO

[root at giswk002 local]# python -c 'from numpy.f2py.diagnose import run; run()'
------
os.name='posix'
------
sys.platform='linux2'
------
sys.version:
2.6.2 (r262:71600, Jul 15 2009, 19:48:50)
[GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)]
------
sys.prefix:
/usr/local/Python-2.6.2
------
sys.path=':/usr/local/Python-2.6.2/lib/python2.6/site-packages:/usr/local:/usr/local/Python-2.6.2/lib/python26.zip:/usr/local/Python-2.6.2/lib/python2.6:/usr/local/Python-2.6.2/lib/python2.6/plat-linux2:/usr/local/Python-2.6.2/lib/python2.6/lib-tk:/usr/local/Python-2.6.2/lib/python2.6/lib-old:/usr/local/Python-2.6.2/lib/python2.6/lib-dynload:/root/.local/lib/python2.6/site-packages'
------
Failed to import Numeric: No module named Numeric
Failed to import numarray: No module named numarray
Found new numpy version '1.3.0' in /usr/local/Python-2.6.2/lib/python2.6/site-packages/numpy/__init__.pyc
Found f2py2e version '2' in /usr/local/Python-2.6.2/lib/python2.6/site-packages/numpy/f2py/f2py2e.pyc
Found numpy.distutils version '0.4.0' in '/usr/local/Python-2.6.2/lib/python2.6/site-packages/numpy/distutils/__init__.pyc'
------
Importing numpy.distutils.fcompiler ... ok
------
Checking availability of supported Fortran compilers:
GnuFCompiler instance properties:
  archiver        = ['/usr/bin/g77', '-cr']
  compile_switch  = '-c'
  compiler_f77    = ['/usr/bin/g77', '-g', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  compiler_f90    = None
  compiler_fix    = None
  libraries       = ['g2c']
  library_dirs    = []
  linker_exe      = ['/usr/bin/g77', '-g', '-Wall', '-g', '-Wall']
  linker_so       = ['/usr/bin/g77', '-g', '-Wall', '-g', '-Wall', '-shared']
  object_switch   = '-o '
  ranlib          = ['/usr/bin/g77']
  version         = LooseVersion ('3.4.3')
  version_cmd     = ['/usr/bin/g77', '--version']
Gnu95FCompiler instance properties:
  archiver        = ['/usr/bin/gfortran', '-cr']
  compile_switch  = '-c'
  compiler_f77    = ['/usr/bin/gfortran', '-Wall', '-ffixed-form', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  compiler_f90    = ['/usr/bin/gfortran', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  compiler_fix    = ['/usr/bin/gfortran', '-Wall', '-ffixed-form', '-fno-second-underscore', '-Wall', '-fno-second-underscore', '-fPIC', '-O3', '-funroll-loops']
  libraries       = ['gfortran']
  library_dirs    = []
  linker_exe      = ['/usr/bin/gfortran', '-Wall', '-Wall']
  linker_so       = ['/usr/bin/gfortran', '-Wall', '-Wall', '-shared']
  object_switch   = '-o '
  ranlib          = ['/usr/bin/gfortran']
  version         = LooseVersion ('4.0.0')
  version_cmd     = ['/usr/bin/gfortran', '--version']
Fortran compilers found:
  --fcompiler=gnu    GNU Fortran 77 compiler (3.4.3)
  --fcompiler=gnu95  GNU Fortran 95 compiler (4.0.0)
Compilers available for this platform, but not found:
  --fcompiler=absoft   Absoft Corp Fortran Compiler
  --fcompiler=compaq   Compaq Fortran Compiler
  --fcompiler=g95      G95 Fortran Compiler
  --fcompiler=intel    Intel Fortran Compiler for 32-bit apps
  --fcompiler=intele   Intel Fortran Compiler for Itanium apps
  --fcompiler=intelem  Intel Fortran Compiler for EM64T-based apps
  --fcompiler=lahey    Lahey/Fujitsu Fortran 95 Compiler
  --fcompiler=nag      NAGWare Fortran 95 Compiler
  --fcompiler=pg       Portland Group Fortran Compiler
  --fcompiler=vast     Pacific-Sierra Research Fortran 90 Compiler
Compilers not available on this platform:
  --fcompiler=hpux     HP Fortran 90 Compiler
  --fcompiler=ibm      IBM XL Fortran Compiler
  --fcompiler=intelev  Intel Visual Fortran Compiler for Itanium apps
  --fcompiler=intelv   Intel Visual Fortran Compiler for 32-bit apps
  --fcompiler=mips     MIPSpro Fortran Compiler
  --fcompiler=none     Fake Fortran compiler
  --fcompiler=sun      Sun or Forte Fortran 95 Compiler
For compiler details, run 'config_fc --verbose' setup command.
------
Importing numpy.distutils.cpuinfo ... ok
------
CPU information: CPUInfoBase__get_nbits getNCPUs has_mmx has_sse has_sse2 has_sse3 is_32bit is_Intel is_Nocona is_XEON is_Xeon is_singleCPU
------
[root at giswk002 local]#

Thanks a lot in advance!!!

Regards,
Padma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From christopher.strickland at qut.edu.au Thu May 27 07:22:54 2010
From: christopher.strickland at qut.edu.au (Chris Strickland)
Date: Thu, 27 May 2010 21:22:54 +1000
Subject: [SciPy-User] log pdf, cdf, etc
Message-ID: <201005272122.54911.christopher.strickland@qut.edu.au>

Hi,

When using any of the distributions of scipy.stats there does not seem to be
the ability (or at least I cannot figure out how) to have the function return
the log of the pdf, cdf, sf, etc. For statistical analysis this is essential.
For instance, suppose we are interested in an exponential distribution for a
random variable x with a hyperparameter lambda; there needs to be an option
that returns -log(lambda) - x/lambda. It is not sufficient (numerically) to
calculate log(scipy.stats.expon.pdf(x,lambda)).

Is there a way to do this using the distributions in scipy.stats? If there is
not, is it possible for me to suggest that this feature is added? There is
such an excellent range of distributions, each with such an impressive range
of options, that it seems a shame to have to mostly code up the log of pdfs
by hand and often call the log of CDFs from R.

Thanks,
Chris.

From thoeger at fys.ku.dk Fri May 28 09:29:27 2010
From: thoeger at fys.ku.dk (=?ISO-8859-1?Q?Th=F8ger?= Emil Juul Thorsen)
Date: Fri, 28 May 2010 15:29:27 +0200
Subject: [SciPy-User] matplotlib woes
Message-ID: <1275053367.1431.7.camel@falconeer>

Hello SciPy list;

For my thesis I have an image which is also a spectrum of an object. I
want to plot the image using imshow along with a data plot of the
intensity, as can be seen on http://yfrog.com/0tforscipylistp .

I have two questions:

1) imshow() sets the ticks on the two upper subplots as pixel
coordinates. What I want to show as tick labels on my x-axis is the
wavelength coordinates of the lower plot on the upper images (since
there is a straightforward pixel-to-wavelength conversion). I have
googled everywhere but can't seem to find a solution; is it possible?

2) Is there any possible way to make the subplots layout look a bit
nicer? Ideally to squeeze the two upper plots closer together and
stretch the lower plot vertically, or at least to make the two upper
subplots take up an equal amount of space?

Best regards;

Emil, python-newb and (former) IDL-user, Master student of Astrophysics
at the University of Copenhagen, Niels Bohr Institute.
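On the two matplotlib questions just above, the usual approach is to let imshow map pixels onto wavelength through its extent keyword (question 1) and to place the three axes by hand with add_axes so their heights and spacing can differ (question 2). The sketch below uses made-up data and made-up calibration constants wl0 and dwl (wavelength of the first pixel and wavelength step per pixel); substitute the real wavelength solution.

import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for the 2-D spectrum image and the extracted 1-D intensity.
image = np.random.rand(40, 400)
wl0, dwl = 4000.0, 1.5                                # hypothetical calibration
wavelengths = wl0 + dwl * np.arange(image.shape[1])
intensity = image.sum(axis=0)

fig = plt.figure()
# Axes placed by hand: two thin image panels close together on top and one
# taller data panel below.  Rectangles are [left, bottom, width, height] in
# figure fractions.
ax1 = fig.add_axes([0.10, 0.80, 0.85, 0.13])
ax2 = fig.add_axes([0.10, 0.65, 0.85, 0.13])
ax3 = fig.add_axes([0.10, 0.10, 0.85, 0.45])

# extent=(left, right, bottom, top) puts the image into wavelength
# coordinates, so the x tick labels become wavelengths instead of pixels.
ext = (wavelengths[0], wavelengths[-1], 0, image.shape[0])
ax1.imshow(image, aspect='auto', extent=ext)
ax2.imshow(image, aspect='auto', extent=ext)
ax3.plot(wavelengths, intensity)
ax3.set_xlabel('Wavelength')

plt.show()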
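Coming back to the "log pdf, cdf, etc" message above: until scipy.stats offers log variants of these methods directly, the usual workaround is to write the log density itself rather than taking log(pdf(...)), so that small densities do not underflow to zero before the log is applied. Below is a minimal sketch for the exponential case; expon_logpdf is just a hypothetical helper name, and lam plays the role of the lambda hyperparameter in that message.

import numpy as np
from scipy import stats

def expon_logpdf(x, lam):
    # log f(x) = -log(lam) - x/lam for x >= 0, written directly so nothing
    # underflows before the log is taken.
    x = np.asarray(x, dtype=float)
    out = -np.log(lam) - x / lam
    return np.where(x >= 0, out, -np.inf)

x = np.array([0.5, 50.0, 5000.0])
lam = 2.0

print expon_logpdf(x, lam)
# For moderate x this agrees with log(pdf(x)), but for large x the pdf
# underflows to 0.0 and the naive version returns -inf:
print np.log(stats.expon.pdf(x, scale=lam))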
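And on the earlier "Triangular Distribution ppf method" message: scipy.stats.triang takes a shape parameter c between 0 and 1 (the position of the mode as a fraction of the support) plus the usual loc and scale. Because it is a location-scale family, the shift and stretch can also be applied by hand, which gives a workaround to compare against if passing a loc of 1 or more directly misbehaves on a given scipy version. The numbers below are made up for illustration.

import numpy as np
from scipy import stats

# A triangular distribution on [a, b] with mode m maps onto scipy.stats as
#   triang(c, loc=a, scale=b - a)  with  c = (m - a) / (b - a).
a, m, b = 1.0, 2.0, 4.0            # support starting at loc = 1
c = (m - a) / (b - a)
q = np.array([0.1, 0.5, 0.9])

# Quantiles asked for directly with loc/scale ...
direct = stats.triang.ppf(q, c, loc=a, scale=b - a)

# ... and the same quantiles built from the standard distribution on [0, 1],
# shifted and scaled by hand.  The two should agree; if they do not, the
# hand-shifted version is a safe workaround.
shifted = a + (b - a) * stats.triang.ppf(q, c)

print direct
print shifted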