From ndbecker2 at gmail.com Sun Nov 1 08:26:18 2009
From: ndbecker2 at gmail.com (Neal Becker)
Date: Sun, 01 Nov 2009 08:26:18 -0500
Subject: [SciPy-User] [SciPy-user] least-square filter design
References: <26139404.post@talk.nabble.com>
Message-ID: 

Tom K. wrote:
> 
> 
> Neal Becker wrote:
>> 
>> Anyone have code for least square (minimum mean square error) FIR filter
>> design?
>> 
> 
> Could you be a little more specific?  scipy.signal.firwin almost designs a
> least square low pass FIR filter if you use a rectangular window (I say
> almost because like other packages the filter's response is normalized to
> unity at DC so technically it is not least squares although the difference
> is slight and decreases with increasing filter order).
> 
> Do you need a transition band?  What type of FIR filter: lowpass, highpass,
> bandpass, bandstop, or multiband?  Are discrete samples OK, or do you need a
> continuous band (or set of bands)?  Which type of filter - is symmetric OK,
> or do you need antisymmetric?
> 
> Or, are you talking about an adaptive filter?
> 

I'm looking for something like this:
http://www.mathworks.com/access/helpdesk/help/toolbox/filterdesign/ref/firls.html

From amenity at enthought.com Sun Nov 1 08:59:42 2009
From: amenity at enthought.com (Amenity Applewhite)
Date: Sun, 1 Nov 2009 07:59:42 -0600
Subject: [SciPy-User] November 6 EPD Webinar: How do I... use Envisage for GUIs?
References: <74384495.1256959480096.JavaMail.root@p2-ws608.ad.prodcc.net>
Message-ID: 

Friday, November 6: How do I... use Envisage for GUIs?

Dear Leah,

Envisage is a Python-based framework for building extensible applications.
The Envisage Core and corresponding Envisage Plugins are components of the
Enthought Tool Suite. We've found that Envisage grants us a degree of
immediate functionality in our custom applications and have come to rely on
the framework in much of our development. For November's EPD webinar, Corran
Webster will show how you can hook together existing Envisage plugins to
quickly create a new GUI. We'll also look at how you can easily turn an
existing Traits UI interface into an Envisage plugin.

New: Linux-ready webinars! In order to better serve the Linux-users among our
subscribers, we've decided to begin hosting our EPD webinars on WebEx instead
of GoToMeeting. This means that our original limit of 35 attendees will be
scaled back to 30. As usual, EPD subscribers at a Basic level or above will be
guaranteed seats for the event while the general public may add their name to
the wait list here.

EPD Webinar: How do I... use Envisage for GUIs?
Friday, November 6
1pm CDT/6pm UTC

We look forward to seeing you Friday! As always, feel free to contact us with
questions, concerns, or suggestions for future webinar topics.

Thanks,
The Enthought Team

From tpk at kraussfamily.org Sun Nov 1 15:26:55 2009
From: tpk at kraussfamily.org (Tom K.)
Date: Sun, 1 Nov 2009 12:26:55 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] least-square filter design
In-Reply-To: 
References: <26139404.post@talk.nabble.com>
Message-ID: <26154273.post@talk.nabble.com>

Neal Becker wrote:
> 
> Tom K. wrote:
> 
>> 
>> 
>> Neal Becker wrote:
>>> 
>>> Anyone have code for least square (minimum mean square error) FIR filter
>>> design?
>>> 
> I'm looking for something like this:
> http://www.mathworks.com/access/helpdesk/help/toolbox/filterdesign/ref/firls.html
> 
> 

Here's something that works for odd length, symmetric filters with constant
magnitude per band - it doesn't support everything that MathWorks' firls
supports (e.g. design of even length, differentiators, antisymmetric filters,
sloping bands) but hopefully this meets your need.

import numpy as np
from scipy.special import sinc

def firls(N, f, D=None):
    """Least-squares FIR filter.

    N -- filter length, must be odd
    f -- list of tuples of band edges
         Units of band edges are Hz with 0.5 Hz == Nyquist
         and assumed 1 Hz sampling frequency
    D -- list of desired responses, one per band
    """
    if D is None: D = [1, 0]
    assert len(D) == len(f), "must have one desired response per band"
    assert N%2 == 1, 'filter length must be odd'
    L = (N-1)//2

    k = np.arange(L+1)
    k.shape = (1, L+1)
    j = k.T

    R = 0
    r = 0
    for i, (f0, f1) in enumerate(f):
        R += np.pi*f1*sinc(2*(j-k)*f1) - np.pi*f0*sinc(2*(j-k)*f0) + \
             np.pi*f1*sinc(2*(j+k)*f1) - np.pi*f0*sinc(2*(j+k)*f0)
        r += D[i]*(2*np.pi*f1*sinc(2*j*f1) - 2*np.pi*f0*sinc(2*j*f0))

    a = np.dot(np.linalg.inv(R), r)
    a.shape = (-1,)
    h = np.zeros(N)
    h[:L] = a[:0:-1]/2.
    h[L] = a[0]
    h[L+1:] = a[1:]/2.
    return h

def plot_response(h, name):
    H = np.fft.fft(h, 2000)
    f = np.arange(2000)/2000.
    figure()
    semilogy(f, abs(H))
    grid()
    setp(gca(), xlim=(0, .5))
    xlabel('frequency (Hz)')
    ylabel('magnitude')
    title(name)

if __name__ == '__main__':
    h = firls(31, [(0, .2), (.3, .5)])
    from matplotlib.pyplot import *
    plot_response(h, 'lowpass')

    h = firls(51, [(0, .25), (.35, .5)], [0, 1])
    plot_response(h, 'highpass')

    h = firls(51, [(0, .1), (.2, .3), (.4, .5)], [0, 1, 0])
    plot_response(h, 'bandpass')

    show()

-- 
View this message in context: http://old.nabble.com/least-square-filter-design-tp26083443p26154273.html
Sent from the Scipy-User mailing list archive at Nabble.com.

From arun.gokule at gmail.com Sun Nov 1 15:30:11 2009
From: arun.gokule at gmail.com (Arun Gokule)
Date: Sun, 1 Nov 2009 12:30:11 -0800
Subject: [SciPy-User] scipy.linalg.det TypeError
In-Reply-To: <20091030172409.GA1977@wombat.atmos.colostate.edu>
References: <20091028190701.GA30122@wombat.atmos.colostate.edu>
	<45d1ab480910290158u7d274687t737a27690fd08497@mail.gmail.com>
	<20091030172409.GA1977@wombat.atmos.colostate.edu>
Message-ID: 

Let us know if you need more help.

On Fri, Oct 30, 2009 at 9:24 AM, Norm Wood wrote:
> 
> 
> On 29 Oct., David Goldsmith wrote:
> > Uncertain why you're having a problem - your sample code works for me:
> >
> > >>> import scipy.linalg
> > >>> import numpy.linalg
> > >>> A=np.matrix([[1.1, 1.9],[1.9,3.5]])
> > >>> y = numpy.linalg.det(A); y
> > 0.23999999999999988
> > >>> y = scipy.linalg.det(A); y
> > 0.23999999999999988
> > >>> scipy.__version__
> > '0.7.1'
> > >>> np.__version__
> > '1.3.0rc2'
> > Python 2.5 on Windoze Vista HPE
> >
> 
> Thanks for checking, David.  I'll have to take a closer look at how the
> "get_flinalg_funcs" procedure works, and will probably try
> rebuilding & reinstalling LAPACK, ATLAS, numpy and scipy from scratch.
>
> Norm

From tpk at kraussfamily.org Sun Nov 1 20:28:22 2009
From: tpk at kraussfamily.org (Tom K.)
Date: Sun, 1 Nov 2009 17:28:22 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] least-square filter design
In-Reply-To: <26154273.post@talk.nabble.com>
References: <26139404.post@talk.nabble.com> <26154273.post@talk.nabble.com>
Message-ID: <26155428.post@talk.nabble.com>

Tom K. wrote:
> 
> a = np.dot(np.linalg.inv(R), r)
> 

It occurred to me that "inv" is not the right choice here...
Replace that line with:

    a = np.linalg.solve(R, r)

-- 
View this message in context: http://old.nabble.com/least-square-filter-design-tp26083443p26155428.html
Sent from the Scipy-User mailing list archive at Nabble.com.

From josef.pktd at gmail.com Sun Nov 1 20:45:09 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 1 Nov 2009 21:45:09 -0400
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
Message-ID: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>

This is just an exercise. In econometrics (including statsmodels) we have
a lot of quadratic forms that are usually calculate with a matrix inverse.
I finally spent some time to figure out how to do this with a Cholesky or
LU decomposition, which should be numerically more stable or accurate
(and faster).

"Don't let that INV go past your eyes" (matlab file exchange)

Josef

"""Use cholesky or LU decomposition to calculate quadratic forms

different ways to calculate matrix product B.T * inv(A) * B

Note: calling convention in sparse not consistent, sparse requires
loop over right hand side

Author: josef-pktd
"""

import numpy as np
from scipy import linalg

A = np.array([[2., 1.], [1., 3.]])   # symmetric positive definite example matrix
B = np.ones((3,2)).T
B = np.arange(6).reshape((3,2)).T

print 'using inv'
Ainv = linalg.inv(A)
print np.dot(Ainv, B[:,0])
print np.dot(Ainv, B)
print reduce(np.dot, [B.T, Ainv, B])

print 'using cholesky'
F = linalg.cho_factor(A)
print linalg.cho_solve(F, B[:,0])
print linalg.cho_solve(F, B)
print np.dot(B.T, linalg.cho_solve(F, B))

print 'using lu'
F = linalg.lu_factor(A)
print linalg.lu_solve(F, B[:,0])
print linalg.lu_solve(F, B)
print np.dot(B.T, linalg.lu_solve(F, B))

from scipy import sparse
import scipy.sparse.linalg

Asp = sparse.csr_matrix(A)

print 'using sparse symmetric lu'
F = sparse.linalg.splu(Asp.tocsc())
print F.solve(B[:,0])
#print F.solve(B)   # wrong results but no exception
AiB = np.column_stack([F.solve(Bcol) for Bcol in B.T])
print AiB
print np.dot(B.T, AiB)
#not:
#Bsp = sparse.csr_matrix(B)
#print B.T * F.solve(Bsp)   # argument to solve must be dense array

print 'using sparse lu'
F = sparse.linalg.factorized(Asp.tocsc())
print F(B[:,0])
#print F(B)   # wrong results but no exception
AiB = np.column_stack([F(Bcol) for Bcol in B.T])
print np.dot(B.T, AiB)

From sturla at molden.no Sun Nov 1 22:03:14 2009
From: sturla at molden.no (Sturla Molden)
Date: Mon, 02 Nov 2009 04:03:14 +0100
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
In-Reply-To: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
Message-ID: <4AEE4BF2.4060403@molden.no>

josef.pktd at gmail.com skrev:
> In econometrics (including statsmodels) we have a lot of quadratic
> forms that are usually calculate with a matrix inverse.

That is a sign of numerical incompetence.
You see this often in statistics as well, people who think matrix inverse
is the way to calculate Mahalanobis distances, when you should really use
a Cholesky.

As for LU, I'd rather use an SVD as it is numerically more stable. Using
LU, you are betting on singular values not being tiny. With SVD you can
solve an ill-conditioned system by zeroing tiny singular values. With LU
you just get astronomic rounding errors.

Sturla

From gael.varoquaux at normalesup.org Sun Nov 1 22:06:52 2009
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 2 Nov 2009 04:06:52 +0100
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
In-Reply-To: <4AEE4BF2.4060403@molden.no>
References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
	<4AEE4BF2.4060403@molden.no>
Message-ID: <20091102030652.GB27768@phare.normalesup.org>

On Mon, Nov 02, 2009 at 04:03:14AM +0100, Sturla Molden wrote:
> josef.pktd at gmail.com skrev:
> > In econometrics (including statsmodels) we have a lot of quadratic
> > forms that are usually calculate with a matrix inverse.

> That is a sign of numerical incompetence.

Yup, but you'd be surprised to see how much inverse is used. I was
astonished to find out that senior researchers that I respect a lot were
not even aware of the problem.

Gaël

From josef.pktd at gmail.com Sun Nov 1 22:25:42 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 1 Nov 2009 23:25:42 -0400
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
In-Reply-To: <4AEE4BF2.4060403@molden.no>
References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
	<4AEE4BF2.4060403@molden.no>
Message-ID: <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com>

On Sun, Nov 1, 2009 at 11:03 PM, Sturla Molden wrote:
> josef.pktd at gmail.com skrev:
>> In econometrics (including statsmodels) we have a lot of quadratic
>> forms that are usually calculate with a matrix inverse.

> That is a sign of numerical incompetence.

I agree, but that's the training. Last time I did principal components
with SVD, it took me a long time to figure out how to get it to work, and
I still don't understand it. The only matrix decomposition that I'm
familiar with is eigenvalue decomposition. But we had this part of the
discussion before: in applied econometrics, if we have enough
multicollinearity that numerical precision matters, then we are screwed
anyway and have to rethink the data analysis or the model, or do a pca.

>
> You see this often in statistics as well, people who think matrix
> inverse is the way to calculate Mahalanobis distances, when you should
> really use a Cholesky.
>
> As for LU, I'd rather use an SVD as it is numerically more stable.
> Using LU, you are betting on singular values not being tiny. With SVD
> you can solve an ill-conditioned system by zeroing tiny singular values.
> With LU you just get astronomic rounding errors.

How can you calculate the quadratic form or the product inv(A)*B with
SVD? Solving the equations is ok, since pinv and lstsq are based on SVD
internally.

In MATLAB there is also a version for QR, but I haven't figured out how
to do this in scipy without an inverse.
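Is it supposed to be something like the following? (untested sketch with a
made-up A; I'm not sure solve_triangular is available in every scipy
version, plain linalg.solve would do in its place)

import numpy as np
from scipy import linalg

A = np.array([[2., 1.], [1., 3.]])      # small symmetric example
B = np.arange(6.).reshape((3,2)).T

# SVD route: A = U*diag(s)*Vt, so B' inv(A) B = (Vt B)' diag(1/s) (U' B)
u, s, vt = linalg.svd(A)
quad_svd = np.dot(np.dot(vt, B).T, np.dot(u.T, B) / s[:,None])

# QR route: A = Q R with R triangular, so inv(A) B = solve(R, Q' B)
q, r = linalg.qr(A)
quad_qr = np.dot(B.T, linalg.solve_triangular(r, np.dot(q.T, B)))

print(quad_svd)
print(quad_qr)    # both should equal B.T * inv(A) * B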
Josef > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Sun Nov 1 23:33:26 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 02 Nov 2009 05:33:26 +0100 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> Message-ID: <4AEE6116.6070602@molden.no> josef.pktd at gmail.com skrev: > if we have enough multicollinearity that numerical > precision matters, then we are screwed anyway and have to rethink the > data analysis or the model, or do a pca. > > And PCA has nothing to do with SVD, right? Or ... what what would you call a procesure that takes your data, subtracts the mean, and does an SVD? :-D Sturla From josef.pktd at gmail.com Mon Nov 2 00:15:41 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 01:15:41 -0400 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEE6116.6070602@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> Message-ID: <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> On Mon, Nov 2, 2009 at 12:33 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: >> if we have enough multicollinearity that numerical >> precision matters, then we are screwed anyway and have to rethink the >> data analysis or the model, or do a pca. >> >> > And PCA has nothing to do with SVD, right? > Or ... what what would you call a procesure that takes your data, > subtracts the mean, and does an SVD? All the explanations I read where in terms of eigenvalue decomposition and not with SVD. I'm pretty good in removing negative eigenvalues when I'm supposed to have a positive definite matrix, but SVD has too many parts. (Besides I don't like pca for regression, and I'm still struggling how to do partial least squares with SVD.) Josef > > :-D > > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Mon Nov 2 00:19:50 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 2 Nov 2009 00:19:50 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> Message-ID: <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> On Mon, Nov 2, 2009 at 00:15, wrote: > On Mon, Nov 2, 2009 at 12:33 AM, Sturla Molden wrote: >> josef.pktd at gmail.com skrev: >>> if we have enough multicollinearity that numerical >>> precision matters, then we are screwed anyway and have to rethink the >>> data analysis or the model, or do a pca. >>> >>> >> And PCA has nothing to do with SVD, right? > >> Or ... what what would you call a procesure that takes your data, >> subtracts the mean, and does an SVD? 
> > All the explanations I read where in terms of eigenvalue decomposition > and not with SVD. Eigenvalues of the covariance matrix. The SVD gives you eigenvalues of the covariance matrix directly from the demeaned data matrix without explicitly forming the covariance matrix. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Mon Nov 2 00:55:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 01:55:42 -0400 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> Message-ID: <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> On Mon, Nov 2, 2009 at 1:19 AM, Robert Kern wrote: > On Mon, Nov 2, 2009 at 00:15, ? wrote: >> On Mon, Nov 2, 2009 at 12:33 AM, Sturla Molden wrote: >>> josef.pktd at gmail.com skrev: >>>> if we have enough multicollinearity that numerical >>>> precision matters, then we are screwed anyway and have to rethink the >>>> data analysis or the model, or do a pca. >>>> >>>> >>> And PCA has nothing to do with SVD, right? >> >>> Or ... what what would you call a procesure that takes your data, >>> subtracts the mean, and does an SVD? >> >> All the explanations I read where in terms of eigenvalue decomposition >> and not with SVD. > > Eigenvalues of the covariance matrix. The SVD gives you eigenvalues of > the covariance matrix directly from the demeaned data matrix without > explicitly forming the covariance matrix. Good, I didn't realize this when I worked on the eig and svd versions of the pca. In a similar way, I was initially puzzled that pinv can be used on the data matrix or on the covariance matrix (only the latter I have seen in books). I will go back to do my homework, I just saw that numpy.linalg.pinv directly works with the svd. I never read the source of the linalgs, because I thought they are just direct calls to Lapack and Blas. Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 2 01:09:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 02:09:52 -0400 Subject: [SciPy-User] characteristic functions of probability distributions Message-ID: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> The characteristic function is just the (continuous) fourier transform of the probability density function. I tried to use fft and ifft to convert between the characteristic function and the density function but I don't manage to get the units or discretization correctly. Does anyone have an example script for any distribution. Right now it's mostly a theoretical exercise, but there are some interesting applications in finance. 
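To make the question concrete, the discretization I have in mind looks
roughly like this for the standard normal (sketch; the grid limits and N
are picked arbitrarily, and the scaling is exactly the part I'm unsure
about):

import numpy as np
from scipy import stats

# density on a regular grid x_j = a + j*dx, j = 0, ..., N-1
N, a, b = 1024, -10.0, 10.0
dx = (b - a) / N
x = a + dx * np.arange(N)
f = stats.norm.pdf(x)

# phi(t) = integral exp(1j*t*x) f(x) dx, approximated at t_k = 2*pi*k/(N*dx)
k = np.arange(N)
t = 2 * np.pi * k / (N * dx)
phi = dx * np.exp(1j * t * a) * N * np.fft.ifft(f)

# for the standard normal this should reproduce exp(-t**2/2) for the first
# few t_k, if the units are right
print(np.abs(phi[:5]))
print(np.exp(-t[:5]**2 / 2.))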
Second related question, since I'm not good with complex numbers. scipy.integrate.quad of a complex function returns the absolute value. Is there a numerical integration function in scipy that returns the complex integral or do I have to integrate the real and imaginary parts separately? Thanks, Josef From sturla at molden.no Mon Nov 2 04:37:26 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 02 Nov 2009 10:37:26 +0100 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> Message-ID: <4AEEA856.2060109@molden.no> josef.pktd at gmail.com skrev: > Good, I didn't realize this when I worked on the eig and svd versions of > the pca. In a similar way, I was initially puzzled that pinv can be used > on the data matrix or on the covariance matrix (only the latter I have seen > in books). > > I'll try to explain... If you have a matrix C, you can factorize like this, with Sigma being a diagonal matrix: C = U * Sigma * V' >>> u,s,vt = np.linalg.svd(c) If C is square (rank n x n), we now have the inverse C**-1 = V * [S**-1] * U' >>> c_inv = np.mat(vt.T) * np.mat(np.eye(4)/s) * np.mat(u.T) And here you have the pathology diagnosis: A small value of s, will cause a huge value of 1/s. This is "ill-conditioning" that e.g. happens with multicolinearity. You get a small s, you divide by it, and rounding error skyrockets. We can improve the situation by editing the tiny values in Sigma to zero. That just changes C by a tiny amount, but might have a dramatic stabilizing effect on C**-1. Now you can do your LU and not worry. It might not be clear from statistics textbooks why multicolinearity is problem. But using SVD, we see both the problem and the solution very clearly: A small singular value might not contribute significantly to C, but could or severly affect or even dominate in C**-1. We can thus get a biased but numerically better approximation to C**-1 by deleting it from the equation. So after editing s, we could e.g. do: >>> c_fixed = np.mat(u) * np.mat(np.eye(4)*s) * np.mat(vt) and continue with LU on c_fixed to get the quadratic form. Also beware that you can solve C * x = b like this x = (V * [S**-1]) * (U' * b) But if we are to reapeat this for several values of b, it would make more sence to reconstruct C and go for the LU. Soving with LU also involves two matrix multiplications: L * y = b U * x = y but the computational demand is reduced by the triangular structure of L and U. Please don't say you'd rather preprocess data with a PCA. If C was a covariance matrix, we just threw out the smallest principal components out of the data. Deleting tiny singular values is in fact why PCA helps! Also beware that pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0) So we can get PCA from SVD without even calculating the covariance. Now you have the standard deviations in Sigma, the principal components in V, and the factor loadings in U. SVD is how PCA is usually computed. It is better than estimating Cov(X), and then apply Jacobi rotations to get the eigenvalues and eigenvectors of Cov(X). 
One reason is that Cov(X) should be estimated using a "two-pass algorithm" to cancel accumulating rounding error (Am Stat, 37: p. 242-247). But that equation is not shown in most statistics textbooks, so most practitioners tend to not know of it . We can solve the common least squares problem using an SVD: b = argmin { || X * b - Y || ** 2 } If we do an SVD of X, we can compute b = sum( ((u[i,:] * Y )/s[i]) * vt[:,i].T ) Unlike the other methods of fitting least squares, this one cannot fail. And you also see clearly what a PCA will do: Skip "(u[i,:] * Y )/s[i]" for too small values of s[i] So you can preprocess with PCA anf fit LS in one shot. Ridge-regression (Tychonov regularization) is another solution to the multicollinearity problem: (A'A + lambda*I)*x = A'b But how would you choose the numerically optimal value of lambda? It turns out to be a case of SVD as well. Goloub & van Loan has that on page 583. QR with column pivoting can be seen as a case of SVD. Many use this for least-squares, not even knowing it is SVD. So SVD is ubiquitous in data modelling, even if you don't know it. :-) One more thing: The Cholesky factorization is always stabile, the LU is not. But don't be fooled: This only applies to the facotization itself. If you have multicolinearity, the problem is there even if you use Cholesky. You get the "singular value disease" (astronomic rounding error) when you solve the triangular system. A Cholesky can tell you if a covariance matrix is singular at your numerical precision. An SVD can tell you how close to singularity it is, and how to fix it. SVD comes at a cost, which is slower computation. But usually it is worth the extra investment in CPU cycles. Sturla Molden From bsouthey at gmail.com Mon Nov 2 09:46:22 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Nov 2009 08:46:22 -0600 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEE4BF2.4060403@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> Message-ID: <4AEEF0BE.3010508@gmail.com> On 11/01/2009 09:03 PM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: > >> In econometrics (including statsmodels) we have a lot of quadratic >> forms that are usually calculate with a matrix inverse. >> > That is a sign of numerical incompetence. > By whom? :-) Sure there are cases that just require solving of a linear system when inverses perhaps should not be used. But there are a lot of other cases especially statistical that you require an inverse such as getting standard errors and using solving algorithms that are way faster than than those that do not use inverses. Although, statistical problems are usually bad numerically, some of the issues like speed and precision get further away with modern 64-bit cpus. Really it is getting the right tool for the job in hand. 
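For example, plain OLS standard errors need the whole diagonal of
inv(X'X), not just a single solve, so some explicit (generalized) inverse
is hard to avoid there. A rough sketch with made-up data:

import numpy as np
from scipy import linalg

np.random.seed(0)
X = np.random.randn(100, 3)
y = np.dot(X, [1., 2., 3.]) + np.random.randn(100)

beta = linalg.lstsq(X, y)[0]
resid = y - np.dot(X, beta)
sigma2 = np.dot(resid, resid) / (X.shape[0] - X.shape[1])

# full inverse of X'X via its Cholesky factor; the diagonal gives the variances
xtx_inv = linalg.cho_solve(linalg.cho_factor(np.dot(X.T, X)), np.eye(X.shape[1]))
se = np.sqrt(sigma2 * np.diag(xtx_inv))
print(se)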
Bruce From souheil.inati at nyu.edu Mon Nov 2 09:50:36 2009 From: souheil.inati at nyu.edu (Souheil Inati) Date: Mon, 2 Nov 2009 09:50:36 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEEF0BE.3010508@gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <4AEEF0BE.3010508@gmail.com> Message-ID: <4A9052E2-0E8F-4EE8-A01A-20E93D38ACC5@nyu.edu> On Nov 2, 2009, at 9:46 AM, Bruce Southey wrote: > On 11/01/2009 09:03 PM, Sturla Molden wrote: >> josef.pktd at gmail.com skrev: >> >>> In econometrics (including statsmodels) we have a lot of quadratic >>> forms that are usually calculate with a matrix inverse. >>> >> That is a sign of numerical incompetence. >> > By whom? :-) > Sure there are cases that just require solving of a linear system when > inverses perhaps should not be used. But there are a lot of other > cases > especially statistical that you require an inverse such as getting > standard errors and using solving algorithms that are way faster than > than those that do not use inverses. > > Although, statistical problems are usually bad numerically, some of > the > issues like speed and precision get further away with modern 64-bit > cpus. Really it is getting the right tool for the job in hand. > > Bruce > Sorry, this statement is misleading. Precision has nothing to do with 64-bit cpus - that's just of storing bigger matrices. -Souheil --------------------------------- Souheil Inati, PhD Research Associate Professor Center for Neural Science and Department of Psychology Chief Physicist, NYU Center for Brain Imaging New York University 4 Washington Place, Room 809 New York, N.Y., 10003-6621 Office: (212) 998-3741 Fax: (212) 995-4011 Email: souheil.inati at nyu.edu From souheil.inati at nyu.edu Mon Nov 2 09:53:25 2009 From: souheil.inati at nyu.edu (Souheil Inati) Date: Mon, 2 Nov 2009 09:53:25 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEEA856.2060109@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> Message-ID: <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> On Nov 2, 2009, at 4:37 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: >> Good, I didn't realize this when I worked on the eig and svd >> versions of >> the pca. In a similar way, I was initially puzzled that pinv can be >> used >> on the data matrix or on the covariance matrix (only the latter I >> have seen >> in books). >> >> > > I'll try to explain... If you have a matrix C, you can factorize like > this, with Sigma being a diagonal matrix: > > C = U * Sigma * V' > >>>> u,s,vt = np.linalg.svd(c) > > If C is square (rank n x n), we now have the inverse > > C**-1 = V * [S**-1] * U' > >>>> c_inv = np.mat(vt.T) * np.mat(np.eye(4)/s) * np.mat(u.T) > > And here you have the pathology diagnosis: > > A small value of s, will cause a huge value of 1/s. This is > "ill-conditioning" that e.g. happens with multicolinearity. You get a > small s, you divide by it, and rounding error skyrockets. We can > improve > the situation by editing the tiny values in Sigma to zero. 
That just > changes C by a tiny amount, but might have a dramatic stabilizing > effect > on C**-1. Now you can do your LU and not worry. It might not be clear > from statistics textbooks why multicolinearity is problem. But using > SVD, we see both the problem and the solution very clearly: A small > singular value might not contribute significantly to C, but could or > severly affect or even dominate in C**-1. We can thus get a biased but > numerically better approximation to C**-1 by deleting it from the > equation. So after editing s, we could e.g. do: > >>>> c_fixed = np.mat(u) * np.mat(np.eye(4)*s) * np.mat(vt) > > and continue with LU on c_fixed to get the quadratic form. > > Also beware that you can solve > > C * x = b > > like this > > x = (V * [S**-1]) * (U' * b) > > But if we are to reapeat this for several values of b, it would make > more sence to reconstruct C and go for the LU. Soving with LU also > involves two matrix multiplications: > > L * y = b > U * x = y > > but the computational demand is reduced by the triangular structure > of L > and U. > > Please don't say you'd rather preprocess data with a PCA. If C was a > covariance matrix, we just threw out the smallest principal components > out of the data. Deleting tiny singular values is in fact why PCA > helps! > > Also beware that > > pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0) > > So we can get PCA from SVD without even calculating the covariance. > Now > you have the standard deviations in Sigma, the principal components in > V, and the factor loadings in U. SVD is how PCA is usually computed. > It > is better than estimating Cov(X), and then apply Jacobi rotations to > get > the eigenvalues and eigenvectors of Cov(X). One reason is that Cov(X) > should be estimated using a "two-pass algorithm" to cancel > accumulating > rounding error (Am Stat, 37: p. 242-247). But that equation is not > shown in most statistics textbooks, so most practitioners tend to not > know of it . > We can solve the common least squares problem using an SVD: > > b = argmin { || X * b - Y || ** 2 } > > If we do an SVD of X, we can compute > > b = sum( ((u[i,:] * Y )/s[i]) * vt[:,i].T ) > > Unlike the other methods of fitting least squares, this one cannot > fail. > And you also see clearly what a PCA will do: > > Skip "(u[i,:] * Y )/s[i]" for too small values of s[i] > > So you can preprocess with PCA anf fit LS in one shot. > > Ridge-regression (Tychonov regularization) is another solution to the > multicollinearity problem: > > (A'A + lambda*I)*x = A'b > > But how would you choose the numerically optimal value of lambda? It > turns out to be a case of SVD as well. Goloub & van Loan has that on > page 583. > > QR with column pivoting can be seen as a case of SVD. Many use this > for > least-squares, not even knowing it is SVD. So SVD is ubiquitous in > data > modelling, even if you don't know it. :-) > > One more thing: The Cholesky factorization is always stabile, the LU > is > not. But don't be fooled: This only applies to the facotization > itself. > If you have multicolinearity, the problem is there even if you use > Cholesky. You get the "singular value disease" (astronomic rounding > error) when you solve the triangular system. A Cholesky can tell you > if > a covariance matrix is singular at your numerical precision. An SVD > can > tell you how close to singularity it is, and how to fix it. SVD > comes at > a cost, which is slower computation. But usually it is worth the > extra > investment in CPU cycles. 
> > Sturla Molden I agree with Sturla's comment's above 100%. You should almost always use SVD to understand your linear system properties. For least squares fitting QR is the modern, stable algorithm of choise. (see for example the matlab \ operator). It's really a crime that we don't teach SVD and QR. There are two sources of error: 1. noise in the measurement and 2. noise in the numerics (rounding, division, etc.). A properly constructed linear system solver will take care of the second type of error (rounding, etc.). If your system is ill-conditioned, then you need to control the inversion so that the signal is maintained and the noise is not amplified too much. In the overwhelming majority of applications, the SNR isn't better than 1000:1. If you know your the relative size of your noise and signal, then you can control the SNR in your parameter estimates by choosing the svd truncation (noise amplification factor). For those of you that want an accessible reference for numerical stability in linear algebra, this book is a must read: Numerical Linear Algebra, Lloyd Trefethen http://www.amazon.com/Numerical-Linear-Algebra-Lloyd-Trefethen/dp/0898713617 Cheers, Souheil --------------------------------- Souheil Inati, PhD Research Associate Professor Center for Neural Science and Department of Psychology Chief Physicist, NYU Center for Brain Imaging New York University 4 Washington Place, Room 809 New York, N.Y., 10003-6621 Office: (212) 998-3741 Fax: (212) 995-4011 Email: souheil.inati at nyu.edu From josef.pktd at gmail.com Mon Nov 2 10:38:59 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 10:38:59 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> Message-ID: <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> On Mon, Nov 2, 2009 at 9:53 AM, Souheil Inati wrote: > > On Nov 2, 2009, at 4:37 AM, Sturla Molden wrote: > >> josef.pktd at gmail.com skrev: >>> Good, I didn't realize this when I worked on the eig and svd >>> versions of >>> the pca. In a similar way, I was initially puzzled that pinv can be >>> used >>> on the data matrix or on the covariance matrix (only the latter I >>> have seen >>> in books). >>> >>> >> >> I'll try to explain... If you have a matrix C, you can factorize like >> this, with Sigma being a diagonal matrix: >> >> ? C = U * Sigma * V' >> >>>>> u,s,vt = np.linalg.svd(c) >> >> If C is square (rank n x n), we now have the inverse >> >> ? C**-1 = V * [S**-1] * U' >> >>>>> c_inv = np.mat(vt.T) * np.mat(np.eye(4)/s) * np.mat(u.T) >> >> And here you have the pathology diagnosis: >> >> A small value of s, will cause a huge value of 1/s. This is >> "ill-conditioning" that e.g. happens with multicolinearity. You get a >> small s, you divide by it, and rounding error skyrockets. We can >> improve >> the situation by editing the tiny values in Sigma to zero. That just >> changes C by a tiny amount, but might have a dramatic stabilizing >> effect >> on C**-1. Now you can do your LU and not worry. 
It might not be clear >> from statistics textbooks why multicolinearity is problem. But using >> SVD, we see both the problem and the solution very clearly: A small >> singular value might not contribute significantly to C, but could or >> severly affect or even dominate in C**-1. We can thus get a biased but >> numerically better approximation to C**-1 by deleting it from the >> equation. So after editing s, we could e.g. do: >> >>>>> c_fixed = np.mat(u) * np.mat(np.eye(4)*s) * np.mat(vt) >> >> and continue with LU on c_fixed to get the quadratic form. >> >> Also beware that you can solve >> >> ? C * x = b >> >> like this >> >> ? x = (V * [S**-1]) * (U' * b) >> >> But if we are to reapeat this for several values of b, it would make >> more sence to reconstruct C and go for the LU. Soving with LU also >> involves two matrix multiplications: >> >> ? L * y = b >> ? U * x = y >> >> but the computational demand is reduced by the triangular structure >> of L >> and U. >> >> Please don't say you'd rather preprocess data with a PCA. If C was a >> covariance matrix, we just threw out the smallest principal components >> out of the data. Deleting tiny singular values is in fact why PCA >> helps! >> >> Also beware that >> >> ? pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0) >> >> So we can get PCA from SVD without even calculating the covariance. >> Now >> you have the standard deviations in Sigma, the principal components in >> V, and the factor loadings in U. SVD is how PCA is usually computed. >> It >> is better than estimating Cov(X), and then apply Jacobi rotations to >> get >> the eigenvalues and eigenvectors of ?Cov(X). One reason is that Cov(X) >> should be estimated using a "two-pass algorithm" to cancel >> accumulating >> rounding error (Am Stat, 37: p. ?242-247). But that equation is not >> shown in most statistics textbooks, so most practitioners tend to not >> know of it . >> We can solve the common least squares problem using an SVD: >> >> ? b = argmin { || X * b - Y ?|| ?** ?2 } >> >> If we do an SVD of X, we can compute >> >> ? b = sum( ((u[i,:] * Y )/s[i]) * vt[:,i].T ) >> >> Unlike the other methods of fitting least squares, this one cannot >> fail. >> And you also see clearly what a PCA will do: >> >> ? Skip "(u[i,:] * Y )/s[i]" for too small values of s[i] >> >> So you can preprocess with PCA anf fit LS in one shot. >> >> Ridge-regression (Tychonov regularization) is another solution to the >> multicollinearity problem: >> >> ? ?(A'A + lambda*I)*x = A'b >> >> But how would you choose the numerically optimal value of lambda? It >> turns out to be a case of SVD as well. Goloub & van Loan has that on >> page 583. >> >> QR with column pivoting can be seen as a case of SVD. Many use this >> for >> least-squares, not even knowing it is SVD. So SVD is ubiquitous in >> data >> modelling, even if you don't know it. :-) >> >> One more thing: The Cholesky factorization is always stabile, the LU >> is >> not. But don't be fooled: This only applies to the facotization >> itself. >> If you have multicolinearity, the problem is there even if you use >> Cholesky. You get the "singular value disease" (astronomic rounding >> error) when you solve the triangular system. A Cholesky can tell you >> if >> a covariance matrix is singular at your numerical precision. An SVD >> can >> tell you how close to singularity it is, and how to fix it. SVD >> comes at >> a cost, which is ?slower ?computation. But usually it is worth the >> extra >> investment in CPU cycles. 
>> >> Sturla Molden Thanks Sturla, I'm going to slowly work my way through this. at least I'm able now to calculate a inverse matrix squareroot, which is useful for the quadratic form and we will switch away from some of the remaining uses of the matrix inverse in statsmodels. > > > I agree with Sturla's comment's above 100%. ? ?You should almost > always use SVD to understand your linear system properties. ? For > least squares fitting QR is the modern, stable algorithm of choise. > (see for example the matlab \ operator). ? It's really a crime that we > don't teach SVD and QR. > > There are two sources of error: 1. noise in the measurement and 2. > noise in the numerics (rounding, division, etc.). ? A properly > constructed linear system solver will take care of the second type of > error (rounding, etc.). ?If your system is ill-conditioned, then you > need to control the inversion so that the signal is maintained and the > noise is not amplified too much. ?In the overwhelming majority of > applications, the SNR isn't better than 1000:1. ?If you know your the > relative size of your noise and signal, then you can control the SNR > in your parameter estimates by choosing the svd truncation (noise > amplification factor). > > For those of you that want an accessible reference for numerical > stability in linear algebra, this book is a must read: > Numerical Linear Algebra, Lloyd Trefethen > http://www.amazon.com/Numerical-Linear-Algebra-Lloyd-Trefethen/dp/0898713617 It really depends on the application. From the applications I know, pca is used for dimension reduction, when there are way too many regressors to avoid overfitting. The most popular in econometrics might be Forecasting Using Principal Components from a Large Number of Predictors # James H. Stock and Mark W. Watson # Journal of the American Statistical Association, Vol. 97, No. 460 (Dec., 2002), pp. 1167-1179 A similar problem exists in chemometrics with more regressors than observations (at least from the descriptions I read when reading about NIPALS). I don't think that compared to the big stochastic errors, numerical precision plays much of a role. When we have large estimation errors in small samples in statistics, we don't have to worry, for example, about 10e-15 precision when our sampling errors are 10e-1. Of course, there are other applications, and I'm working my way slowly through the numerical issues. Josef > > Cheers, > Souheil > > --------------------------------- > > Souheil Inati, PhD > Research Associate Professor > Center for Neural Science and Department of Psychology > Chief Physicist, NYU Center for Brain Imaging > New York University > 4 Washington Place, Room 809 > New York, N.Y., 10003-6621 > Office: (212) 998-3741 > Fax: ? ? 
(212) 995-4011 > Email: souheil.inati at nyu.edu > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From souheil.inati at nyu.edu Mon Nov 2 11:26:07 2009 From: souheil.inati at nyu.edu (Souheil Inati) Date: Mon, 2 Nov 2009 11:26:07 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> Message-ID: On Nov 2, 2009, at 10:38 AM, josef.pktd at gmail.com wrote: > > snip > > It really depends on the application. From the applications I know, > pca is used for dimension reduction, when there are way too many > regressors to avoid overfitting. The most popular in econometrics > might be > > Forecasting Using Principal Components from a Large Number of > Predictors > # James H. Stock and Mark W. Watson > # Journal of the American Statistical Association, Vol. 97, No. 460 > (Dec., 2002), pp. 1167-1179 > > A similar problem exists in chemometrics with more regressors than > observations (at least from the descriptions I read when reading about > NIPALS). > > I don't think that compared to the big stochastic errors, numerical > precision plays much of a role. When we have large estimation errors > in small samples in statistics, we don't have to worry, for example, > about 10e-15 precision when our sampling errors are 10e-1. > > Of course, there are other applications, and I'm working my way slowly > through the numerical issues. > > Josef Hi Josef, I have a strong opinion about this, and I am almost certainly in the minority, but my feeling is this: once you have ill-conditioning all bets are off. Once the problem is ill-conditioned, then there are an infinite number of solutions that match your data in a least-squares sense. You are then required to say something further about how you want to pick a particular solution from among the infinite number of equivalent solutions. SVD/PCA is a procedure to find the minimum-two-norm solution that fits the data. The minimum two-norm solution is unique. For the general case, SVD is the only method that has a proper theory. There is no proper theory for anything else, PERIOD. The only other useful thing one can say is that if you expect your solution to be sparse, then you can use the newly developed theory of compressed sensing (tao and candes). This says that the minimum one- norm solution is best in a statistical sense and provides an algorithm to find it. The difference between SVD and compressed sensing is that the former spreads the power out equally among the coefficients, while the latter picks the solution that maximizes the magnitude of some of the cofficients and sets others to zero (i.e. picks a sparse answer). So if you're problem is ill-conditioned, then you are in trouble. Your only legitimate options are to you use the SVD to pick the minimum two-norm answer, or to use compressed sensing and pick the minimum one-norm answer. 
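To see what the two-norm choice does in the simplest possible case - two
identical columns, so infinitely many least-squares solutions - the
SVD-based pseudoinverse quietly returns the smallest-norm one (sketch):

import numpy as np

X = np.array([[1., 1.], [2., 2.], [3., 3.]])   # rank 1: any b1 + b2 = 1 fits
y = np.array([1., 2., 3.])

b = np.dot(np.linalg.pinv(X), y)
print(b)                                 # [0.5, 0.5], the minimum two-norm solution
print(np.linalg.norm(y - np.dot(X, b)))  # residual is (numerically) zero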
Everything else is completely nonsense. Cheers, Souheil From sturla at molden.no Mon Nov 2 11:31:16 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 02 Nov 2009 17:31:16 +0100 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> Message-ID: <4AEF0954.8060509@molden.no> josef.pktd at gmail.com skrev: > It really depends on the application. From the applications I know, > pca is used for dimension reduction, when there are way too many > regressors to avoid overfitting. Too many regressors gives you one or more tiny singular values in the covariance matrix (X'X), which you use in: betas = (X'X)**-1 * X' * y So the inverse of X'X is heavily influenced by one or more of these "singular values" that do not contribute significantly to X'X. That is obviously ridicilous, because we want the factors that determines X'X to determinate the inverse, (X'X)**-1, as well. I.e. we want the regressors (betas) we estimate to be determined by the same factors that determines X'X. So we proceed by doing SVD on X'X and throw the offenders out. And in statistics, that is called "PCA". And small singular values in X'X is known as "multicolinearity". When multicolinearity is present, numerical stability is the problem: 1 / s[i] becomes infinite for s[i] == 0, and thus s[i] dominates (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute to X'X. So it makes sence to edit too small s[i] values out, so that only the values of s[i] important for X'X is used to compute (X'X)**-1 and betas. And that is what PCA does. Statistics textbooks usually don't teach this. They just say "multicolinearity is bad". Yes PCA is used for "dimensionality reduction" and avoiding overfitting. But why is overfitting a problem anyway? And why does PCA help? This is actually all entagled. The main issue is alwys that 1/s[i] is big when s[i] is small. Overfitting gives you a lot of these big 1/s values. And now the betas you solved does not reflect the signal in X'X, so the model has no predictive power. Sturla From cimrman3 at ntc.zcu.cz Mon Nov 2 11:46:57 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 02 Nov 2009 17:46:57 +0100 Subject: [SciPy-User] reverse Cuthill-McKee Message-ID: <4AEF0D01.1080209@ntc.zcu.cz> Hi, I need an implementation of the (symmetric) reverse Cuthill-McKee matrix reordering algorithm. Is anyone aware of an implementation callable from Python? A scipy CSR/CSC matrix-based one would be the best, of course. thanks, r. 
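To be concrete about what is meant: plain symmetric RCM is just a
breadth-first ordering that visits neighbours in order of increasing
degree and then reverses the result. A slow pure-Python sketch over the
CSR structure (assuming a symmetric nonzero pattern, and skipping the
usual pseudo-peripheral choice of starting node):

import numpy as np
from scipy import sparse

def rcm_order(A):
    """Reverse Cuthill-McKee permutation for a matrix with symmetric pattern."""
    A = sparse.csr_matrix(A)
    n = A.shape[0]
    degrees = np.diff(A.indptr)            # stored entries per row
    visited = np.zeros(n, dtype=bool)
    order = []
    for start in np.argsort(degrees):      # start each component at a low-degree node
        if visited[start]:
            continue
        visited[start] = True
        queue = [start]
        while queue:
            node = queue.pop(0)
            order.append(node)
            nbrs = A.indices[A.indptr[node]:A.indptr[node+1]]
            for m in sorted((m for m in nbrs if not visited[m]),
                            key=lambda m: degrees[m]):
                visited[m] = True
                queue.append(m)
    return np.array(order[::-1])           # reversing Cuthill-McKee gives RCM

Permuting rows and columns with the returned array should then concentrate
the nonzeros near the diagonal; what is missing is something that does this
(with a proper pseudo-peripheral start) at C speed.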
From bsouthey at gmail.com Mon Nov 2 12:31:53 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Nov 2009 11:31:53 -0600 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEF0954.8060509@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> <4AEF0954.8060509@molden.no> Message-ID: <4AEF1789.5060405@gmail.com> On 11/02/2009 10:31 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: > >> It really depends on the application. From the applications I know, >> pca is used for dimension reduction, when there are way too many >> regressors to avoid overfitting. >> > Too many regressors gives you one or more tiny singular values in the > covariance matrix (X'X), which you use in: > > betas = (X'X)**-1 * X' * y > > So the inverse of X'X is heavily influenced by one or more of these > "singular values" that do not contribute significantly to X'X. That is > obviously ridicilous, because we want the factors that determines X'X to > determinate the inverse, (X'X)**-1, as well. I.e. we want the regressors > (betas) we estimate to be determined by the same factors that determines > X'X. > > So we proceed by doing SVD on X'X and throw the offenders out. And in > statistics, that is called "PCA". And small singular values in X'X is > known as "multicolinearity". > > > > When multicolinearity is present, numerical stability is the problem: > > 1 / s[i] becomes infinite for s[i] == 0, and thus s[i] dominates > (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute > to X'X. So it makes sence to edit too small s[i] values out, so that > only the values of s[i] important for X'X is used to compute (X'X)**-1 > and betas. And that is what PCA does. Statistics textbooks usually don't > teach this. They just say "multicolinearity is bad". > > Yes PCA is used for "dimensionality reduction" and avoiding overfitting. > But why is overfitting a problem anyway? And why does PCA help? This is > actually all entagled. The main issue is alwys that 1/s[i] is big when > s[i] is small. Overfitting gives you a lot of these big 1/s values. And > now the betas you solved does not reflect the signal in X'X, so the > model has no predictive power. > > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Well that is fine if you are doing feature extraction but not feature selection. Most of statistical problems involve feature selection so obviously it gets more space and time. Feature extraction has relatively very limited use in statistics (usually when 'black boxes' are useful) so it is usually taught as an advanced topic. 
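The difference shows up directly in code: with extraction every original
column still enters through the projection, while selection really drops
columns (rough sketch with made-up data):

import numpy as np

np.random.seed(1)
X = np.random.randn(50, 10)
Xc = X - X.mean(axis=0)

# feature extraction: scores on the first 3 principal components (all columns used)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)
scores = np.dot(Xc, vt[:3].T)

# feature selection: keep 3 of the original columns, discard the rest
subset = Xc[:, [0, 4, 7]]

print(scores.shape)
print(subset.shape)    # both (50, 3), but very different sets of regressors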
Bruce From josef.pktd at gmail.com Mon Nov 2 12:40:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 12:40:47 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEF0954.8060509@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> <4AEF0954.8060509@molden.no> Message-ID: <1cd32cbb0911020940x741047e5m57e9d31d46f0f6bd@mail.gmail.com> On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati wrote: > > I have a strong opinion about this, and I am almost certainly in the > minority, but my feeling is this: once you have ill-conditioning all > bets are off. > > Once the problem is ill-conditioned, then there are an infinite number > of solutions that match your data in a least-squares sense. You are > then required to say something further about how you want to pick a > particular solution from among the infinite number of equivalent > solutions. I think, that's the point. However, the solution in economics is not to replace the decision about your solution by a numerical procedure that selects one for the researcher. In statsmodels, I looked at the estimation results using pinv, which is exactly svd plus throw away tiny singular values (np.linalg.pinv). The problem is that this provides a nice solution and doesn't ring an alarm bell, I want to have exceptions or infinite standard errors for the parameter estimates. Handling multicollinearity has to be an explicit task and a conscious choice by the researcher, e.g. I used Ridge Regression (Tychonov), Bayesian priors, reparameterization and variable selection in the past. The choice of multicollinearity correction has to be reported in the results. If pinv (or svd) is blindly used, because there is no warning, then we will see researchers presenting their "nice" parameter estimates, which completely hide the fact that the parameters are actually not identified. I think I worry more about numerical precision and efficiency when the multicollinearity is not yet so extreme that we have to drop (near)zero eigenvalues. On Mon, Nov 2, 2009 at 11:31 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: >> It really depends on the application. From the applications I know, >> pca is used for dimension reduction, when there are way too many >> regressors to avoid overfitting. > > Too many regressors gives you one or more tiny singular values in the > covariance matrix (X'X), which you use in: > > ? betas = (X'X)**-1 * X' * y > > So the inverse of X'X is heavily influenced by one or more of these > "singular values" that do not contribute significantly to X'X. That is > obviously ridicilous, because we want the factors that determines X'X to > determinate the inverse, (X'X)**-1, as well. I.e. we want the regressors > (betas) we estimate to be determined by the same factors that determines > X'X. > > So we proceed by doing SVD on X'X and throw the offenders out. And in > statistics, that is called "PCA". And small singular values in X'X is > known as "multicolinearity". 
> I think this applies to forecasting, but not when parameter estimates and standard errors of the parameter estimates are the primary interest. > > When multicolinearity is present, numerical stability is the problem: > > 1 / s[i] ?becomes infinite for s[i] == 0, and thus s[i] dominates > (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute > to X'X. So it makes sence to edit too small s[i] values out, so that > only the values of s[i] important for X'X is used to compute (X'X)**-1 > and betas. And that is what PCA does. Statistics textbooks usually don't > teach this. They just say "multicolinearity is bad". > > Yes PCA is used for "dimensionality reduction" and avoiding overfitting. > But why is overfitting a problem anyway? And why does PCA help? This is > actually all entagled. The main issue is alwys that 1/s[i] is big when > s[i] is small. Overfitting gives you a lot of these big 1/s values. And > now the betas you solved does not reflect the signal in X'X, so the > model has no predictive power. I'm not sure you need high multicollinearity to have overfitting. Overfitting is still a problem after dropping the near zero singular values, if many of the variables just capture variation in the past data that doesn't really reflect the data generating process. I think, cross validation and parameter selection usually select fewer variables than would be required for positive definiteness. Josef > > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 2 13:31:28 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Nov 2009 12:31:28 -0600 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911020940x741047e5m57e9d31d46f0f6bd@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> <4AEF0954.8060509@molden.no> <1cd32cbb0911020940x741047e5m57e9d31d46f0f6bd@mail.gmail.com> Message-ID: <4AEF2580.3050804@gmail.com> On 11/02/2009 11:40 AM, josef.pktd at gmail.com wrote: > On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati wrote: > >> I have a strong opinion about this, and I am almost certainly in the >> minority, but my feeling is this: once you have ill-conditioning all >> bets are off. >> >> Once the problem is ill-conditioned, then there are an infinite number >> of solutions that match your data in a least-squares sense. You are >> then required to say something further about how you want to pick a >> particular solution from among the infinite number of equivalent >> solutions. >> > I think, that's the point. However, the solution in economics is not to > replace the decision about your solution by a numerical procedure > that selects one for the researcher. > > In statsmodels, I looked at the estimation results using pinv, which is > exactly svd plus throw away tiny singular values (np.linalg.pinv). > Please do not confuse SVD with pinv as these are not the same functions. 
pinv returns a Moore Penrose inverse: http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse Thus pinv is implemented using SVD but that is not the only way to get a Moore Penrose inverse. > The problem is that this provides a nice solution and doesn't ring > an alarm bell, I want to have exceptions or infinite standard errors for > the parameter estimates. > Handling multicollinearity has to be an explicit task and a conscious choice > by the researcher, e.g. I used Ridge Regression (Tychonov), Bayesian priors, > reparameterization and variable selection in the past. > The choice of multicollinearity correction has to be reported in the results. > If pinv (or svd) is blindly used, because there is no warning, then we will see > researchers presenting their "nice" parameter estimates, which completely > hide the fact that the parameters are actually not identified. > There is no people 'blindly' using these methods as these are the basics of linear algebra and really has nothing to do with multicollinearity. When you have an overdetermined system to solve then there are an infinite number of solutions and you can not use the inverse to solve the normal equations. The most common approach is to rely on a generalized inverse (http://en.wikipedia.org/wiki/Generalized_inverse - not a great reference) to solve it - of which the Moore Penrose inverse is one specific type. When these are used such as in analysis of variance, then the results are not wrong, not hidden and totally accepted by the scientific community. But it does rely on the user to know when things are not as expected (which is usually trivial because the degrees of freedom are not as expected). Bruce From dominique.orban at gmail.com Mon Nov 2 14:32:50 2009 From: dominique.orban at gmail.com (Dominique Orban) Date: Mon, 2 Nov 2009 15:32:50 -0400 Subject: [SciPy-User] reverse Cuthill-McKee Message-ID: <8793ae6e0911021132m37fd2560ma00ed66a4effcc86@mail.gmail.com> > ---------- Forwarded message ---------- > From:?Robert Cimrman > To:?SciPy Users List > Date:?Mon, 02 Nov 2009 17:46:57 +0100 > Subject:?[SciPy-User] reverse Cuthill-McKee > Hi, > > I need an implementation of the (symmetric) reverse Cuthill-McKee matrix reordering algorithm. Is anyone aware of an implementation callable from Python? A scipy CSR/CSC matrix-based one would be the best, of course. > > thanks, > r. Hi Robert, I just pushed my repo to GitHub so you can try it out. I'm using the implementation from the Harwell Subroutine Library which you'll need to grab from their website. Specify your source dir in site.cfg and you should be good to go. git clone git://github.com/dpo/pyorder.git Cheers, Dominique From vanforeest at gmail.com Mon Nov 2 15:51:06 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 2 Nov 2009 21:51:06 +0100 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> Message-ID: Hi Josef, 2009/11/2 : > The characteristic function is just the (continuous) fourier transform > of the probability density function. > > I tried to use fft and ifft to convert between the characteristic > function and the density function but I don't manage to get the units > or discretization correctly. Does anyone have an example script for > any distribution. Right now it's mostly a theoretical exercise, but > there are some interesting applications in finance. 
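As a concrete example script for one distribution: the standard normal has characteristic function exp(-t**2/2), and its density can be recovered by numerically integrating the inversion formula f(x) = (1/(2*pi)) * int exp(-i*t*x)*phi(t) dt. Since quad only handles real integrands, the real and imaginary parts have to be integrated separately; for a real-valued density the imaginary part integrates to zero, so only the real part is kept below. This is only a sketch for this one case, not a general-purpose inverter.

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def cf(t):
    # characteristic function of the standard normal
    return np.exp(-0.5 * t**2)

def pdf_from_cf(x):
    # inversion formula, keeping only the real part of the integrand
    integrand = lambda t: (np.exp(-1j * t * x) * cf(t)).real
    val, err = quad(integrand, -np.inf, np.inf)
    return val / (2 * np.pi)

print pdf_from_cf(0.5), norm.pdf(0.5)  # should agree to quad's tolerance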
There is an inversion formula used to invert the characteristic function to the distribution function (the density should follow easily then), see e.g. Chung (or any other book on graduate probability). I don't know about its numerical properties though. The formula is used to prove the central limit theorem. I also recall that Ward Whitt (see his homepage) used Fourier theory to invert Laplace transforms. He was also concerned with numerical properties, so this might be the best place to look for. He also uses the inversion formula, and refers to Feller. > > Second related question, since I'm not good with complex numbers. > > scipy.integrate.quad of a complex function returns the absolute value. > Is there a numerical integration function in scipy that returns the > complex integral or do I have to integrate the real and imaginary > parts separately? You want to compute \int_w^z f(t) dt? When f is analytic (i.e., satisfies the Cauchy Riemann equations) this integral is path independent. Otherwise the path from w to z is of importance. You might like the book Visual Complex Analysis by Needham for intuition. bye Nicky > > Thanks, > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lorenzo.isella at gmail.com Tue Nov 3 06:16:19 2009 From: lorenzo.isella at gmail.com (Lorenzo Isella) Date: Tue, 3 Nov 2009 12:16:19 +0100 Subject: [SciPy-User] Wrapping C/C++ Code Message-ID: Dear All, I hope this is not too off-topic. If you were asked to wrap C/C++ codes into a Python application (potentially relying on NumPy/SciPy) which route would you follow? Bear in mind that the initial C/C++ code is a standalone program which was not written having Python in mind at all. Many thanks Lorenzo From cimrman3 at ntc.zcu.cz Tue Nov 3 10:05:36 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 03 Nov 2009 16:05:36 +0100 Subject: [SciPy-User] reverse Cuthill-McKee In-Reply-To: <8793ae6e0911021132m37fd2560ma00ed66a4effcc86@mail.gmail.com> References: <8793ae6e0911021132m37fd2560ma00ed66a4effcc86@mail.gmail.com> Message-ID: <4AF046C0.7090706@ntc.zcu.cz> Dominique Orban wrote: >> ---------- Forwarded message ---------- >> From: Robert Cimrman >> To: SciPy Users List >> Date: Mon, 02 Nov 2009 17:46:57 +0100 >> Subject: [SciPy-User] reverse Cuthill-McKee >> Hi, >> >> I need an implementation of the (symmetric) reverse Cuthill-McKee matrix reordering algorithm. Is anyone aware of an implementation callable from Python? A scipy CSR/CSC matrix-based one would be the best, of course. >> >> thanks, >> r. > > Hi Robert, > > I just pushed my repo to GitHub so you can try it out. I'm using the > implementation from the Harwell Subroutine Library which you'll need > to grab from their website. Specify your source dir in site.cfg and > you should be good to go. > > git clone git://github.com/dpo/pyorder.git > > Cheers, > Dominique Hi Dominique, thanks! I may ultimately need something BSD-ish, but your code would be great to test the stuff against. cheers, r. From rpg.314 at gmail.com Tue Nov 3 10:23:03 2009 From: rpg.314 at gmail.com (Rohit Garg) Date: Tue, 3 Nov 2009 20:53:03 +0530 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: References: Message-ID: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> The first thing to do is to expose an API from your program that your script can access. It'll likely not be done as it was written with one language in mind. 
After that it's your call whether you want to embed or extend the interpreter. For extending, IMHO, SWIG is your friend. On Tue, Nov 3, 2009 at 4:46 PM, Lorenzo Isella wrote: > Dear All, > I hope this is not too off-topic. > If you were asked to wrap ?C/C++ codes into a Python application > (potentially relying on NumPy/SciPy) which route would you follow? > Bear in mind that the initial C/C++ code is a standalone program which > was not written having Python in mind at all. > Many thanks > > Lorenzo > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From bsouthey at gmail.com Tue Nov 3 11:06:43 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 03 Nov 2009 10:06:43 -0600 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> References: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> Message-ID: <4AF05513.8080307@gmail.com> On 11/03/2009 09:23 AM, Rohit Garg wrote: > The first thing to do is to expose an API from your program that your > script can access. It'll likely not be done as it was written with > one language in mind. > > After that it's your call whether you want to embed or extend the > interpreter. For extending, IMHO, SWIG is your friend. > > On Tue, Nov 3, 2009 at 4:46 PM, Lorenzo Isella wrote: > >> Dear All, >> I hope this is not too off-topic. >> If you were asked to wrap C/C++ codes into a Python application >> (potentially relying on NumPy/SciPy) which route would you follow? >> Bear in mind that the initial C/C++ code is a standalone program which >> was not written having Python in mind at all. >> Many thanks >> >> Lorenzo >> First you should determine if it is worth accessing that code/program. Since you are going to use numpy then it may be worth the effort to rewrite the required parts using numpy/scipy/Cython. If you have no control over the development or it needs to be a standalone program then you probably should call it through Python. The reason is that you probably have little control over code maintenance and how any changes will impact your code. If the code is a stable then I agree with Rohit that swig is a viable option. Bruce From zachary.pincus at yale.edu Tue Nov 3 11:25:52 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 3 Nov 2009 11:25:52 -0500 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: <4AF05513.8080307@gmail.com> References: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> <4AF05513.8080307@gmail.com> Message-ID: <22A68FE3-73C7-4AC0-BBAF-B38448AC9219@yale.edu> Another option instead of SWIG, if you have a reasonably stable C API and a pre-built shared library exporting the same, is to use ctypes to call into it. This works well enough with many numeric APIs, too where you can allocate arrays with numpy, and then use the array's ctypes attribute to get at a pointer to the memory suitable for passing into the C code. The downside is that (as far as I know) there's no good way to build pure-C libraries as part of a "python setup.py build" step (though some functionality along these lines might be now in the numpy distutuls?). 
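A minimal sketch of that route; the shared library name and the C signature (a hypothetical libexample.so exporting void scale(double *x, int n, double factor)) are made up purely for illustration and would be replaced by your own library.

import numpy as np
import ctypes

# hypothetical library and function, assumed here only for illustration
lib = ctypes.CDLL("libexample.so")
lib.scale.restype = None
lib.scale.argtypes = [ctypes.POINTER(ctypes.c_double),
                      ctypes.c_int, ctypes.c_double]

# allocate with numpy, making sure the buffer is contiguous and the right dtype
x = np.ascontiguousarray(np.arange(10), dtype=np.float64)
# the array's .ctypes attribute exposes a pointer into the numpy buffer
lib.scale(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), x.size, 2.0)
print x  # modified in place by the C code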
Zach From rmrndr at unife.it Tue Nov 3 14:04:57 2009 From: rmrndr at unife.it (ANDREA ARMAROLI) Date: Tue, 3 Nov 2009 20:04:57 +0100 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: References: Message-ID: <20091103185905.M2354@unife.it> Dear users, I'm trying to solve an ODE system that models a parametric oscillator with complex amplitudes at two freqencies. I'm new to python. This problem is simply solved in matlab using ode45. If I try to use odeint I cannot set parameters like tolerances or max step. The result is constant-valued solutions. Trying with ode class and ZVODE integrator, I have that total intensity is not conserved. I have weird quasi-periodic oscillations. I do know that my equations have excess degrees of freedom and I know from symmetries what the integrals of motion are, but in Matlab this works pretty well. Here is the code with odeint import numpy as N import scipy as S import scipy.integrate import pylab as P def deriv(Y,t): A1 = Y[0] A2 = Y[1] A1d = 1j*A2*N.conj(A1) A2d = 1j*(A1**2/2.0 + dk*A2) return [A1d, A2d] nplot = 10000 zmax = 10.0 zstep = zmax/nplot dk = 0.5 # normalised detuning/dispersion phi0 = 0.0*N.pi # initial dephasing eta0 = 0.3 # initial pump intensity # initial values u20 = N.sqrt(eta0)*N.exp(1j*phi0) u10 = N.sqrt(2*(1-eta0)) H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); Y,info = scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_output=True,printmessg = True) # are conserved quantities constant? Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) # then Hamiltonian... #... p1 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) p2 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) Thank you very much for your help. Andrea Armaroli From peridot.faceted at gmail.com Tue Nov 3 14:28:09 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 3 Nov 2009 14:28:09 -0500 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <20091103185905.M2354@unife.it> References: <20091103185905.M2354@unife.it> Message-ID: 2009/11/3 ANDREA ARMAROLI : > Dear users, > > I'm trying to solve an ODE system that models a parametric oscillator with > complex amplitudes at two freqencies. > > I'm new to python. This problem is simply solved in matlab using ode45. > > If I try to use odeint I cannot set parameters like tolerances or max step. You can in fact control these using the (optional) parameters hmax, rtol, and atol. > The result is constant-valued solutions. After a bit of experimentation (in particular, a print statement inside your derivative function) it turns out that the problem is that odeint does not support complex values (it silently discards imaginary parts). This is not a major obstacle, since you can just pack and unpack the values yourself. When I do that, the plot I get is two oscillatory results (the absolute values), and one nice flat line (for the total, which I'm guessing you need to be conserved). I have to say, it would be very good if odeint either reported an error or worked with complex values, so you would have found this easier to track down. But it looks like it does a reasonable job of solving your problem once you work around its lack of complex support. Anne > Trying with ode class and ZVODE integrator, I have that total intensity is not > conserved. I have weird quasi-periodic oscillations. 
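(For reference, a minimal sketch of the pack-and-unpack workaround just described; the foo.py attachment was scrubbed from the archive, so this is an independent illustration rather than the attached script. The real and imaginary parts are stacked into one real vector for odeint and the complex amplitudes are rebuilt afterwards.)

import numpy as np
from scipy.integrate import odeint

dk = 0.5  # normalised detuning/dispersion

def deriv_complex(A, z):
    A1, A2 = A
    return np.array([1j * A2 * np.conj(A1), 1j * (A1**2 / 2.0 + dk * A2)])

def deriv_packed(y, z):
    # y = [real parts, imaginary parts]: rebuild the complex amplitudes,
    # evaluate the complex derivative, then split it again for odeint
    A = y[:2] + 1j * y[2:]
    dA = deriv_complex(A, z)
    return np.concatenate([dA.real, dA.imag])

eta0 = 0.3
A0 = np.array([np.sqrt(2 * (1 - eta0)), np.sqrt(eta0)], dtype=complex)
y0 = np.concatenate([A0.real, A0.imag])
z = np.linspace(0.0, 10.0, 1001)

Y = odeint(deriv_packed, y0, z, rtol=1e-10, atol=1e-10)
A1 = Y[:, 0] + 1j * Y[:, 2]
A2 = Y[:, 1] + 1j * Y[:, 3]
print np.ptp(np.abs(A1)**2 / 2 + np.abs(A2)**2)  # conserved total; should be ~0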
> > I do know that my equations have excess degrees of freedom and I know from > symmetries what the integrals of motion are, but in Matlab this works pretty well. > > Here is the code with odeint > > import numpy as N > import scipy as S > import scipy.integrate > import pylab as P > > def deriv(Y,t): > > A1 = Y[0] > A2 = Y[1] > A1d = 1j*A2*N.conj(A1) > A2d = 1j*(A1**2/2.0 + dk*A2) > return [A1d, A2d] > > > > > nplot = 10000 > zmax = 10.0 > zstep = zmax/nplot > > dk = 0.5 # normalised detuning/dispersion > phi0 = 0.0*N.pi # initial dephasing > eta0 = 0.3 # initial pump intensity > > # initial values > u20 = N.sqrt(eta0)*N.exp(1j*phi0) > u10 = N.sqrt(2*(1-eta0)) > > H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); > > Y,info = > scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_output=True,printmessg > = True) > # are conserved quantities constant? > Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 > P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) > # then Hamiltonian... > #... > > p1 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) > p2 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) > > > Thank you very much for your help. > > Andrea Armaroli > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: foo.py Type: text/x-python Size: 1235 bytes Desc: not available URL: From fperez.net at gmail.com Tue Nov 3 14:28:55 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 3 Nov 2009 11:28:55 -0800 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar Message-ID: Hi folks, if you reside in the San Francisco Bay Area, you may be interested in a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our regular py4science meeting series. Guido van Rossum, the creator of the Python language, will visit for a session where we will first do a very rapid overview of a number of scientific projects that use Python (in a lightning talk format) and then we will have an open discussion with Guido with hopefully interesting questions going in both directions. The meeting is open to all, bring your questions! More details on this seminar series (including location) can be found here: https://cirl.berkeley.edu/view/Py4Science Cheers, f From jagan_cbe2003 at yahoo.co.in Tue Nov 3 14:58:32 2009 From: jagan_cbe2003 at yahoo.co.in (jagan prabhu) Date: Wed, 4 Nov 2009 01:28:32 +0530 (IST) Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed Message-ID: <807497.44749.qm@web8316.mail.in.yahoo.com> Dear users, Problem is " Bounded with inequality constrains", in slssqp often Bounds are not obeyed, its deviates the bounds. So if deviates i made it to come back with in the region(bounds). But i face a problem in execution. i get error like, File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", line 277, in fmin_slsqp ??? c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) ? File "/usr/lib/python2.5/site-packages/scipy/optimize/optimize.py", line 97, in function_wrapper ??? return function(x, *args) TypeError: () takes exactly 1 argument (2 given) program will look like, #import os #import scipy.optimize from scipy import * import numpy from scipy import optimize from numpy import asarray from math import * def cst(aParams,bounds): ? aParams = numpy.asarray(aParams) ? 
for par in range(len(aParams)): ?? if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): ???? pass ?? else: ???? if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] ???? if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] ? x = aParams[0] ? y = aParams[1] ? z = aParams[2] # objective function ? eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) ? return eqn #Initial guess Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] # inequality constraints x must be least,y larger than x smaller than z,and z the largest of all con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime = None, args=(bounds,), full_output=True, iter=20000, iprint=2, acc=0.001) print '****************************************' print opt[0] print opt[1] print opt[2] print opt[4] Problems are, 1, bounds i could not able to pass to the function as args( ). 2, Whether implementation of the ineq. constraints are correct? any better way? 3, How to avoid bounds deviation? Please help me. Regards, Prabhu Now, send attachments up to 25MB with Yahoo! India Mail. Learn how. http://in.overview.mail.yahoo.com/photos -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Tue Nov 3 15:09:17 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 3 Nov 2009 15:09:17 -0500 Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed In-Reply-To: <807497.44749.qm@web8316.mail.in.yahoo.com> References: <807497.44749.qm@web8316.mail.in.yahoo.com> Message-ID: <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> Hello, > Problem is " Bounded with inequality constrains", in slssqp often > Bounds are not obeyed, its deviates the bounds. So if deviates i > made it to come back with in the region(bounds). But i face a > problem in execution. i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", > line 277, in fmin_slsqp > c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) > File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper > return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) I can't help with the bounds not being obeyed... but I can say that you'll probably need to provide more detail about what you mean by this for others to help though -- is it that the optimizer will occasionally evaluate the objective outside of the bounds (which I understand is normal?) or that the final results are out-of-bounds? Anyhow, the traceback explains exactly what the problem with the execution is. You define your inequality constraint as: con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) but from the traceback, you can see that it is being called like: function(x, *args) The error is quite clear on the problem: your lambda takes one argument, but it is called with two. I assume that x is the current position, and args is just what you passed to the slsqp. So you should rewrite con1 as lambda x, args: whatever... Zach On Nov 3, 2009, at 2:58 PM, jagan prabhu wrote: > Dear users, > > Problem is " Bounded with inequality constrains", in slssqp often > Bounds are not obeyed, its deviates the bounds. So if deviates i > made it to come back with in the region(bounds). But i face a > problem in execution. 
i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", > line 277, in fmin_slsqp > c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) > File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper > return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) > > program will look like, > > > #import os > #import scipy.optimize > from scipy import * > import numpy > from scipy import optimize > from numpy import asarray > from math import * > > > def cst(aParams,bounds): > aParams = numpy.asarray(aParams) > for par in range(len(aParams)): > if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): > pass > else: > if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] > if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] > > x = aParams[0] > y = aParams[1] > z = aParams[2] > # objective function > eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) > return eqn > > > #Initial guess > Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z > bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] > # inequality constraints x must be least,y larger than x smaller > than z,and z the largest of all > con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) > > > opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime = > None, args=(bounds,), full_output=True, iter=20000, iprint=2, > acc=0.001) > > print '****************************************' > > print opt[0] > print opt[1] > print opt[2] > print opt[4] > > Problems are, > 1, bounds i could not able to pass to the function as args( ). > 2, Whether implementation of the ineq. constraints are correct? any > better way? > 3, How to avoid bounds deviation? > > Please help me. > > Regards, > Prabhu > > Add whatever you love to the Yahoo! India homepage. Try now! > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From girishvs at gmail.com Tue Nov 3 16:28:21 2009 From: girishvs at gmail.com (Girish Venkatasubramanian) Date: Tue, 3 Nov 2009 13:28:21 -0800 Subject: [SciPy-User] GPG key issues when trying to install on RHEL Message-ID: Hello, I followed your instructions and copied the appropriate .repo file to /etc/yum.repos.d/ on my RHEL5 x86_64 machine. But when I try to install the python-numpy and python-scipy using yum, I get the following error Warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID eda8433f GPG key retrieval failed: [ Errno 4] IOError: I am trying to install via a proxy - and I have configured the yum.conf and set proxy_http and proxy_ftp directives. Any help is appreciated. Thanks From dwf at cs.toronto.edu Tue Nov 3 17:44:36 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 3 Nov 2009 17:44:36 -0500 Subject: [SciPy-User] GPG key issues when trying to install on RHEL In-Reply-To: References: Message-ID: On 3-Nov-09, at 4:28 PM, Girish Venkatasubramanian wrote: > Warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID > eda8433f > > GPG key retrieval failed: [ Errno 4] IOError: 'Connection timed out')> > > I am trying to install via a proxy - and I have configured the > yum.conf and set proxy_http and proxy_ftp directives. Hmm... You'd be better off asking Red Hat support (or possibly on a Fedora forum), the SciPy project doesn't maintain those package repositories. 
David From girishvs at gmail.com Tue Nov 3 18:09:17 2009 From: girishvs at gmail.com (Girish Venkatasubramanian) Date: Tue, 3 Nov 2009 15:09:17 -0800 Subject: [SciPy-User] GPG key issues when trying to install on RHEL In-Reply-To: References: Message-ID: Thanks David - but I managed to figure it out. The problem was that the key could not be retrieved because of proxy issues (even though the rpms themselves were being downloaded). So I downloaded the key (from the location in the .repo file) and imported it using rpm --import. After that, the installation went through OK Thanks. On Tue, Nov 3, 2009 at 2:44 PM, David Warde-Farley wrote: > On 3-Nov-09, at 4:28 PM, Girish Venkatasubramanian wrote: > >> Warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID >> eda8433f >> >> GPG key retrieval failed: [ Errno 4] IOError: > 'Connection timed out')> >> >> I am trying to install via a proxy - and I have configured the >> yum.conf and set proxy_http and proxy_ftp directives. > > > Hmm... You'd be better off asking Red Hat support (or possibly on a > Fedora forum), > the SciPy project doesn't maintain those package repositories. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Tue Nov 3 22:40:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Nov 2009 22:40:47 -0500 Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed In-Reply-To: <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> References: <807497.44749.qm@web8316.mail.in.yahoo.com> <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> Message-ID: <1cd32cbb0911031940y2e2778b3o5a0867dd4942bfe6@mail.gmail.com> On Tue, Nov 3, 2009 at 3:09 PM, Zachary Pincus wrote: > Hello, > >> Problem is " Bounded with inequality constrains", in slssqp often >> Bounds are not obeyed, its deviates the bounds. So if deviates i >> made it to come back with in the region(bounds). But i face a >> problem in execution. i get error like, >> >> File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", >> line 277, in fmin_slsqp >> ? ? c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >> ? File "/usr/lib/python2.5/site-packages/scipy/optimize/ >> optimize.py", line 97, in function_wrapper >> ? ? return function(x, *args) >> TypeError: () takes exactly 1 argument (2 given) > > I can't help with the bounds not being obeyed... but I can say that > you'll probably need to provide more detail about what you mean by > this for others to help though -- is it that the optimizer will > occasionally evaluate the objective outside of the bounds (which I > understand is normal?) or that the final results are out-of-bounds? > > Anyhow, the traceback explains exactly what the problem with the > execution is. You define your inequality constraint as: > con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) > > but from the traceback, you can see that it is being called like: > function(x, *args) > > The error is quite clear on the problem: your lambda takes one > argument, but it is called with two. > > I assume that x is the current position, and args is just what you > passed to the slsqp. So you should rewrite con1 as lambda x, args: > whatever... > > Zach > > > > On Nov 3, 2009, at 2:58 PM, jagan prabhu wrote: > >> Dear users, >> >> Problem is " Bounded with inequality constrains", in slssqp often >> Bounds are not obeyed, its deviates the bounds. 
So if deviates i >> made it to come back with in the region(bounds). But i face a >> problem in execution. i get error like, >> >> File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", >> line 277, in fmin_slsqp >> ? ? c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >> ? File "/usr/lib/python2.5/site-packages/scipy/optimize/ >> optimize.py", line 97, in function_wrapper >> ? ? return function(x, *args) >> TypeError: () takes exactly 1 argument (2 given) >> >> program will look like, >> >> >> #import os >> #import scipy.optimize >> from scipy import * >> import numpy >> from scipy import optimize >> from numpy import asarray >> from math import * >> >> >> def cst(aParams,bounds): >> ? aParams = numpy.asarray(aParams) >> ? for par in range(len(aParams)): >> ? ?if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): >> ? ? ?pass >> ? ?else: >> ? ? ?if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] >> ? ? ?if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] >> >> ? x = aParams[0] >> ? y = aParams[1] >> ? z = aParams[2] >> # objective function >> ? eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) >> ? return eqn >> >> >> #Initial guess >> Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z >> bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] >> # inequality constraints x must be least,y larger than x smaller >> than z,and z the largest of all >> con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) >> >> >> opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime = >> None, args=(bounds,), full_output=True, iter=20000, iprint=2, >> acc=0.001) >> >> print '****************************************' >> >> print opt[0] >> print opt[1] >> print opt[2] >> print opt[4] >> >> Problems are, >> 1, bounds i could not able to pass to the function as args( ). >> 2, Whether implementation of the ineq. constraints are correct? any >> better way? >> 3, How to avoid bounds deviation? >> >> Please help me. >> >> Regards, >> Prabhu >> According to the slsqp help inequality constraints are supposed to be a list of functions running your example with the args as mentioned by Zachary, produced a result that violated the second inequality changing to this con1 = [lambda x,args: x[1]-x[0], lambda x,args: x[2]-x[1]] opt = optimize.fmin_slsqp(cst, Init, ieqcons= con1 , bounds=bounds, fprime = None, args=(bounds,), full_output=True, iter=20000, iprint=2, acc=0.001) gives results with second ineq constraint binding: **************************************** [6.2838144667527311, 15.724071907665964, 15.724071907665964] -5.72459190155 6 Optimization terminated successfully. I don't know why in your example np.asarray(x[1]-x[0], x[2]-x[1]) doesn't raise an exception, the second argument to asarray is dtype, so this should be wrong. (missing []) >>> xx=opt[0] >>> xx [6.2838144667527311, 15.724071907665964, 15.724071907665964] >>> np.asarray(xx[1]-xx[0], xx[2]-xx[1]) array(9.440257440913232) There was also recently a nice example for slsqp on the mailing list. Josef From sebastian.walter at gmail.com Wed Nov 4 05:05:52 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 4 Nov 2009 11:05:52 +0100 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: <22A68FE3-73C7-4AC0-BBAF-B38448AC9219@yale.edu> References: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> <4AF05513.8080307@gmail.com> <22A68FE3-73C7-4AC0-BBAF-B38448AC9219@yale.edu> Message-ID: 1) I'd also use ctypes whenever possible. 
Numpy offers good builtin support to make it easy to call C/Fortran functions that expect pointers to arrays. There is a nice tutorial on http://www.scipy.org/Cookbook/Ctypes Unfortunately, this route only works for C and not for C++, so you would have to write a C interface to a C++ library. 2) I use boost::python to wrap existing C++ projects in a quite verbose way, e.g. in http://github.com/b45ch1/pyadolc/blob/master/adolc/src/py_adolc.hpp It works reasonably well when you know what you are doing and it's also quite flexible. The downside is the documentation, the long compilation times and the "magic" template implementation that is hard to understand. hope that helps a little, Sebastian On Tue, Nov 3, 2009 at 5:25 PM, Zachary Pincus wrote: > Another option instead of SWIG, if you have a reasonably stable C API > and a pre-built shared library exporting the same, is to use ctypes to > call into it. This works well enough with many numeric APIs, too where > you can allocate arrays with numpy, and then use the array's ctypes > attribute to get at a pointer to the memory suitable for passing into > the C code. > > The downside is that (as far as I know) there's no good way to build > pure-C libraries as part of a "python setup.py build" step (though > some functionality along these lines might be now in the numpy > distutuls?). > > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jagan_cbe2003 at yahoo.co.in Wed Nov 4 05:16:43 2009 From: jagan_cbe2003 at yahoo.co.in (jagan prabhu) Date: Wed, 4 Nov 2009 15:46:43 +0530 (IST) Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed In-Reply-To: <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> Message-ID: <396448.739.qm@web8314.mail.in.yahoo.com> Thank you for your answers, the program is working now referring to "outside bounds" optimizer occasionally evaluate the objective outside of the bounds (which are quite abnormal for my case?) For example: occasionally fmin_slsqp passes parameters like? -45.0,-12.0,16 which perfectly obeys inequality constrain, but out of bounds.. my code is quite sensitive, at any case it should stick with in bounds. Please help me. Is there any problem in my Bound syntax ? * i face out of bounds problem in all Constrained (multivariate) optimization methods. --- On Wed, 4/11/09, Zachary Pincus wrote: From: Zachary Pincus Subject: Re: [SciPy-User] fmin_slsqp- Bounds are not obeyed To: "SciPy Users List" Date: Wednesday, 4 November, 2009, 1:39 AM Hello, > Problem is " Bounded with inequality constrains", in slssqp often? > Bounds are not obeyed, its deviates the bounds. So if deviates i? > made it to come back with in the region(bounds). But i face a? > problem in execution. i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py",? > line 277, in fmin_slsqp >? ???c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >???File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper >? ???return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) I can't help with the bounds not being obeyed... but I can say that? you'll probably need to provide more detail about what you mean by? this for others to help though -- is it that the optimizer will? occasionally evaluate the objective outside of the bounds (which I? understand is normal?) or that the final results are out-of-bounds? 
Anyhow, the traceback explains exactly what the problem with the? execution is. You define your inequality constraint as: con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) but from the traceback, you can see that it is being called like: function(x, *args) The error is quite clear on the problem: your lambda takes one? argument, but it is called with two. I assume that x is the current position, and args is just what you? passed to the slsqp. So you should rewrite con1 as lambda x, args:? whatever... Zach On Nov 3, 2009, at 2:58 PM, jagan prabhu wrote: > Dear users, > > Problem is " Bounded with inequality constrains", in slssqp often? > Bounds are not obeyed, its deviates the bounds. So if deviates i? > made it to come back with in the region(bounds). But i face a? > problem in execution. i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py",? > line 277, in fmin_slsqp >? ???c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >???File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper >? ???return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) > > program will look like, > > > #import os > #import scipy.optimize > from scipy import * > import numpy > from scipy import optimize > from numpy import asarray > from math import * > > > def cst(aParams,bounds): >???aParams = numpy.asarray(aParams) >???for par in range(len(aParams)): >? ? if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): >? ? ? pass >? ? else: >? ? ? if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] >? ? ? if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] > >???x = aParams[0] >???y = aParams[1] >???z = aParams[2] > # objective function >???eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) >???return eqn > > > #Initial guess > Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z > bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] > # inequality constraints x must be least,y larger than x smaller? > than z,and z the largest of all > con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) > > > opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime =? > None, args=(bounds,), full_output=True, iter=20000, iprint=2,? > acc=0.001) > > print '****************************************' > > print opt[0] > print opt[1] > print opt[2] > print opt[4] > > Problems are, > 1, bounds i could not able to pass to the function as args( ). > 2, Whether implementation of the ineq. constraints are correct? any? > better way? > 3, How to avoid bounds deviation? > > Please help me. > > Regards, > Prabhu > > Add whatever you love to the Yahoo! India homepage. Try now! > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user Yahoo! India has a new look. Take a sneak peek http://in.yahoo.com/trynew -------------- next part -------------- An HTML attachment was scrubbed... URL: From zunbeltz at gmail.com Wed Nov 4 07:10:26 2009 From: zunbeltz at gmail.com (Zunbeltz Izaola) Date: Wed, 04 Nov 2009 13:10:26 +0100 Subject: [SciPy-User] iterating in a timeseries Message-ID: <1257336626.13402.3.camel@mineat2.hmi.de> Hi, I am using scikits.timeseries. I have a time series with daily frequency, with data of 2 years. 
I have data almost every day, but some days are missing. I want to iterate over the timeseries to get the values of the first and last day of each month. I had tried to convert freq to 'M' and other things, but I can not find an easy way. Any idea? TIA, Zunbeltz From kmichael.aye at googlemail.com Wed Nov 4 08:09:46 2009 From: kmichael.aye at googlemail.com (Michael Aye) Date: Wed, 4 Nov 2009 05:09:46 -0800 (PST) Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: References: <20091103185905.M2354@unife.it> Message-ID: <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> I think this example is worth to be kept in the cookbook, what do you guys think? It is showing how to do the oscillator modelling and can show the 'danger' of lack of complex support of odeint. Just my feeling it could be quite helpful. I myself certainly copied this into my treasure of worth-to-remember evernotes.. ;) Regards, Michael On Nov 3, 8:28?pm, Anne Archibald wrote: > 2009/11/3 ANDREA ARMAROLI : > > > Dear users, > > > I'm trying to solve an ODE system that models a parametric oscillator with > > complex amplitudes at two freqencies. > > > I'm new to python. This problem is simply solved in matlab using ode45. > > > If I try to use odeint I cannot set parameters like tolerances or max step. > > You can in fact control these using the (optional) parameters hmax, > rtol, and atol. > > > The result is constant-valued solutions. > > After a bit of experimentation (in particular, a print statement > inside your derivative function) it turns out that the problem is that > odeint does not support complex values (it silently discards imaginary > parts). This is not a major obstacle, since you can just pack and > unpack the values yourself. When I do that, the plot I get is two > oscillatory results (the absolute values), and one nice flat line (for > the total, which I'm guessing you need to be conserved). > > I have to say, it would be very good if odeint either reported an > error or worked with complex values, so you would have found this > easier to track down. But it looks like it does a reasonable job of > solving your problem once you work around its lack of complex support. > > Anne > > > > > Trying with ode class and ZVODE integrator, I have that total intensity is not > > conserved. I have weird quasi-periodic oscillations. > > > I do know that my equations have excess degrees of freedom and I know from > > symmetries what the integrals of motion are, but in Matlab this works pretty well. > > > Here is the code with odeint > > > import numpy as N > > import scipy as S > > import scipy.integrate > > import pylab as P > > > def deriv(Y,t): > > > ? ?A1 = Y[0] > > ? ?A2 = Y[1] > > ? ?A1d = 1j*A2*N.conj(A1) > > ? ?A2d = 1j*(A1**2/2.0 + dk*A2) > > ? ?return [A1d, A2d] > > > nplot = 10000 > > zmax = 10.0 > > zstep = zmax/nplot > > > dk = 0.5 # normalised detuning/dispersion > > phi0 = 0.0*N.pi # initial dephasing > > eta0 = 0.3 # initial pump intensity > > > # initial values > > u20 = N.sqrt(eta0)*N.exp(1j*phi0) > > u10 = N.sqrt(2*(1-eta0)) > > > H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); > > > Y,info = > > scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou tput=True,printmessg > > = True) > > # are conserved quantities constant? > > Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 > > P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) > > # then Hamiltonian... > > #... 
> > > p1 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) > > p2 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) > > > Thank you very much for your help. > > > Andrea Armaroli > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > > > ?foo.py > 1KViewDownload > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From dave.hirschfeld at gmail.com Wed Nov 4 08:49:27 2009 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 4 Nov 2009 13:49:27 +0000 (UTC) Subject: [SciPy-User] iterating in a timeseries References: <1257336626.13402.3.camel@mineat2.hmi.de> Message-ID: Zunbeltz Izaola gmail.com> writes: > > Hi, > > I am using scikits.timeseries. > > I have a time series with daily frequency, with data of 2 years. I have > data almost every day, but some days are missing. > > I want to iterate over the timeseries to get the values of the first and > last day of each month. I had tried to convert freq to 'M' and other > things, but I can not find an easy way. Any idea? > > TIA, > > Zunbeltz > Does this do what you're looking for? from numpy.random import randint import scikits.timeseries as ts dates = ts.date_array(ts.Date('D','01-Jan-2009'),ts.Date('D','31-Dec-2011')) series = ts.time_series(dates.day,dates) monthly_series = series.convert('M') ts.first_unmasked_val(monthly_series,axis=1) ts.last_unmasked_val(monthly_series,axis=1) # Example with missing data series[randint(0,series.size,512)] = ma.masked monthly_series = series.convert('M') ts.first_unmasked_val(monthly_series,axis=1) ts.last_unmasked_val(monthly_series,axis=1) HTH, Dave From Jim.Vickroy at noaa.gov Wed Nov 4 10:03:39 2009 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Wed, 04 Nov 2009 08:03:39 -0700 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> References: <20091103185905.M2354@unife.it> <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> Message-ID: <4AF197CB.2040402@noaa.gov> Michael Aye wrote: > I think this example is worth to be kept in the cookbook, what do you > guys think? > +1 I thought the same but did not speak up. -- jv > It is showing how to do the oscillator modelling and can show the > 'danger' of lack of complex support of odeint. > > Just my feeling it could be quite helpful. > I myself certainly copied this into my treasure of worth-to-remember > evernotes.. ;) > > Regards, > Michael > > On Nov 3, 8:28 pm, Anne Archibald wrote: > >> 2009/11/3 ANDREA ARMAROLI : >> >> >>> Dear users, >>> >>> I'm trying to solve an ODE system that models a parametric oscillator with >>> complex amplitudes at two freqencies. >>> >>> I'm new to python. This problem is simply solved in matlab using ode45. >>> >>> If I try to use odeint I cannot set parameters like tolerances or max step. >>> >> You can in fact control these using the (optional) parameters hmax, >> rtol, and atol. >> >> >>> The result is constant-valued solutions. >>> >> After a bit of experimentation (in particular, a print statement >> inside your derivative function) it turns out that the problem is that >> odeint does not support complex values (it silently discards imaginary >> parts). This is not a major obstacle, since you can just pack and >> unpack the values yourself. 
When I do that, the plot I get is two >> oscillatory results (the absolute values), and one nice flat line (for >> the total, which I'm guessing you need to be conserved). >> >> I have to say, it would be very good if odeint either reported an >> error or worked with complex values, so you would have found this >> easier to track down. But it looks like it does a reasonable job of >> solving your problem once you work around its lack of complex support. >> >> Anne >> >> >> >> >>> Trying with ode class and ZVODE integrator, I have that total intensity is not >>> conserved. I have weird quasi-periodic oscillations. >>> >>> I do know that my equations have excess degrees of freedom and I know from >>> symmetries what the integrals of motion are, but in Matlab this works pretty well. >>> >>> Here is the code with odeint >>> >>> import numpy as N >>> import scipy as S >>> import scipy.integrate >>> import pylab as P >>> >>> def deriv(Y,t): >>> >>> A1 = Y[0] >>> A2 = Y[1] >>> A1d = 1j*A2*N.conj(A1) >>> A2d = 1j*(A1**2/2.0 + dk*A2) >>> return [A1d, A2d] >>> >>> nplot = 10000 >>> zmax = 10.0 >>> zstep = zmax/nplot >>> >>> dk = 0.5 # normalised detuning/dispersion >>> phi0 = 0.0*N.pi # initial dephasing >>> eta0 = 0.3 # initial pump intensity >>> >>> # initial values >>> u20 = N.sqrt(eta0)*N.exp(1j*phi0) >>> u10 = N.sqrt(2*(1-eta0)) >>> >>> H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); >>> >>> Y,info = >>> scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou tput=True,printmessg >>> = True) >>> # are conserved quantities constant? >>> Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 >>> P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) >>> # then Hamiltonian... >>> #... >>> >>> p1 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) >>> p2 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) >>> >>> Thank you very much for your help. >>> >>> Andrea Armaroli >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-U... at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> foo.py >> 1KViewDownload >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Wed Nov 4 10:11:34 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 4 Nov 2009 10:11:34 -0500 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <4AF197CB.2040402@noaa.gov> References: <20091103185905.M2354@unife.it> <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> <4AF197CB.2040402@noaa.gov> Message-ID: 2009/11/4 Jim Vickroy : > Michael Aye wrote: > > I think this example is worth to be kept in the cookbook, what do you > guys think? In the short term it's probably worth putting something like this in the cookbook (though don't use the OP's code without permission!) but really the solution is to fix odeint (which shouldn't be too difficult). I put a ticket in Trac, but haven't had time to actually fix the problem yet. Anne > +1 > > I thought the same but did not speak up. > -- jv > > It is showing how to do the oscillator modelling and can show the > 'danger' of lack of complex support of odeint. > > Just my feeling it could be quite helpful. 
> I myself certainly copied this into my treasure of worth-to-remember > evernotes.. ;) > > Regards, > Michael > > On Nov 3, 8:28?pm, Anne Archibald wrote: > > > 2009/11/3 ANDREA ARMAROLI : > > > > Dear users, > > > I'm trying to solve an ODE system that models a parametric oscillator with > complex amplitudes at two freqencies. > > > I'm new to python. This problem is simply solved in matlab using ode45. > > > If I try to use odeint I cannot set parameters like tolerances or max step. > > > You can in fact control these using the (optional) parameters hmax, > rtol, and atol. > > > > The result is constant-valued solutions. > > > After a bit of experimentation (in particular, a print statement > inside your derivative function) it turns out that the problem is that > odeint does not support complex values (it silently discards imaginary > parts). This is not a major obstacle, since you can just pack and > unpack the values yourself. When I do that, the plot I get is two > oscillatory results (the absolute values), and one nice flat line (for > the total, which I'm guessing you need to be conserved). > > I have to say, it would be very good if odeint either reported an > error or worked with complex values, so you would have found this > easier to track down. But it looks like it does a reasonable job of > solving your problem once you work around its lack of complex support. > > Anne > > > > > > Trying with ode class and ZVODE integrator, I have that total intensity is > not > conserved. I have weird quasi-periodic oscillations. > > > I do know that my equations have excess degrees of freedom and I know from > symmetries what the integrals of motion are, but in Matlab this works pretty > well. > > > Here is the code with odeint > > > import numpy as N > import scipy as S > import scipy.integrate > import pylab as P > > > def deriv(Y,t): > > > ? ?A1 = Y[0] > ? ?A2 = Y[1] > ? ?A1d = 1j*A2*N.conj(A1) > ? ?A2d = 1j*(A1**2/2.0 + dk*A2) > ? ?return [A1d, A2d] > > > nplot = 10000 > zmax = 10.0 > zstep = zmax/nplot > > > dk = 0.5 # normalised detuning/dispersion > phi0 = 0.0*N.pi # initial dephasing > eta0 = 0.3 # initial pump intensity > > > # initial values > u20 = N.sqrt(eta0)*N.exp(1j*phi0) > u10 = N.sqrt(2*(1-eta0)) > > > H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); > > > Y,info = > scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou > tput=True,printmessg > = True) > # are conserved quantities constant? > Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 > P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) > # then Hamiltonian... > #... > > > p1 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) > p2 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) > > > Thank you very much for your help. > > > Andrea Armaroli > > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > ?foo.py > 1KViewDownload > > _______________________________________________ > SciPy-User mailing list > SciPy-U... 
at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jsseabold at gmail.com Wed Nov 4 20:25:12 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 4 Nov 2009 20:25:12 -0500 Subject: [SciPy-User] Reshaping Question Message-ID: My brain is failing me. Is there a clean way to reshape an array like the following? import numpy as np c = np.arange(16).reshape(4, 2, 2) In [209]: c Out[209]: array([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]], [[ 8, 9], [10, 11]], [[12, 13], [14, 15]]]) So that c == d where d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) In [211]: d Out[211]: array([[ 0, 1, 4, 5], [ 2, 3, 6, 7], [ 8, 9, 12, 13], [10, 11, 14, 15]]) Cheers, Skipper From dwf at cs.toronto.edu Wed Nov 4 20:08:35 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 4 Nov 2009 20:08:35 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: Message-ID: <20091105010834.GA15007@rodimus> Hi Skipper, No, I don't believe so. The reason is that NumPy arrays have to obey constant stride along each dimension. Assuming dtype is int32, the reshaping you describe (assuming you want to reshape c into d) would require the stride along dim 2 to be 4 bytes to get from 0 to 1, and then 12 bytes to get to 4, and then 4 bytes again to get to 5. This isn't legal, you'd have to do a copy to construct this matrix. David On Wed, Nov 04, 2009 at 08:25:12PM -0500, Skipper Seabold wrote: > My brain is failing me. Is there a clean way to reshape an array like > the following? > > import numpy as np > > c = np.arange(16).reshape(4, 2, 2) > > In [209]: c > Out[209]: > array([[[ 0, 1], > [ 2, 3]], > > [[ 4, 5], > [ 6, 7]], > > [[ 8, 9], > [10, 11]], > > [[12, 13], > [14, 15]]]) > > So that c == d where > > d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) > > In [211]: d > Out[211]: > array([[ 0, 1, 4, 5], > [ 2, 3, 6, 7], > [ 8, 9, 12, 13], > [10, 11, 14, 15]]) > > Cheers, > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From peridot.faceted at gmail.com Wed Nov 4 21:05:54 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 4 Nov 2009 21:05:54 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: <20091105010834.GA15007@rodimus> References: <20091105010834.GA15007@rodimus> Message-ID: 2009/11/4 David Warde-Farley : > Hi Skipper, > > No, I don't believe so. The reason is that NumPy arrays have to obey constant stride along each dimension. Assuming > dtype is int32, the reshaping you describe (assuming you want to reshape c into d) would require the stride along dim > 2 to be 4 bytes to get from 0 to 1, and then 12 bytes to get to 4, and then 4 bytes again to get to 5. This isn't > legal, you'd have to do a copy to construct this matrix. Reshape sometimes creates copies. 
It tries hard not to, and if you assign the shape attribute rather than calling reshape it won't ever make a copy, but if necessary reshape will copy the input array: In [42]: np.transpose(c.reshape(2,2,2,2),(0,2,1,3)).reshape(4,4)Out[42]: array([[ 0, 1, 4, 5], [ 2, 3, 6, 7], [ 8, 9, 12, 13], [10, 11, 14, 15]]) The trick is to use transpose to do an arbitrary permutation of the input axes, and also to rearrange the first axis with an additional reshape. Anne > David > > On Wed, Nov 04, 2009 at 08:25:12PM -0500, Skipper Seabold wrote: >> My brain is failing me. ?Is there a clean way to reshape an array like >> the following? >> >> import numpy as np >> >> c = np.arange(16).reshape(4, 2, 2) >> >> In [209]: c >> Out[209]: >> array([[[ 0, ?1], >> ? ? ? ? [ 2, ?3]], >> >> ? ? ? ?[[ 4, ?5], >> ? ? ? ? [ 6, ?7]], >> >> ? ? ? ?[[ 8, ?9], >> ? ? ? ? [10, 11]], >> >> ? ? ? ?[[12, 13], >> ? ? ? ? [14, 15]]]) >> >> So that c == d where >> >> d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) >> >> In [211]: d >> Out[211]: >> array([[ 0, ?1, ?4, ?5], >> ? ? ? ?[ 2, ?3, ?6, ?7], >> ? ? ? ?[ 8, ?9, 12, 13], >> ? ? ? ?[10, 11, 14, 15]]) >> >> Cheers, >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dwf at cs.toronto.edu Wed Nov 4 22:12:18 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 4 Nov 2009 22:12:18 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: <20091105010834.GA15007@rodimus> Message-ID: <38D1783F-AF6F-4915-B597-569CE49707A4@cs.toronto.edu> On 4-Nov-09, at 9:05 PM, Anne Archibald wrote: > Reshape sometimes creates copies. It tries hard not to, and if you > assign the shape attribute rather than calling reshape it won't ever > make a copy, but if necessary reshape will copy the input array: > > In [42]: np.transpose(c.reshape(2,2,2,2), > (0,2,1,3)).reshape(4,4)Out[42]: > array([[ 0, 1, 4, 5], > [ 2, 3, 6, 7], > [ 8, 9, 12, 13], > [10, 11, 14, 15]]) > > The trick is to use transpose to do an arbitrary permutation of the > input axes, and also to rearrange the first axis with an additional > reshape. D'oh. When he said reshape I was thinking purely in terms of what could be done with .reshape(). I didn't even think about .transpose(). Is it then the .transpose() call that triggers the copy in this situation? David From peridot.faceted at gmail.com Wed Nov 4 22:49:57 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 4 Nov 2009 22:49:57 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: <38D1783F-AF6F-4915-B597-569CE49707A4@cs.toronto.edu> References: <20091105010834.GA15007@rodimus> <38D1783F-AF6F-4915-B597-569CE49707A4@cs.toronto.edu> Message-ID: 2009/11/4 David Warde-Farley : > > On 4-Nov-09, at 9:05 PM, Anne Archibald wrote: > >> Reshape sometimes creates copies. It tries hard not to, and if you >> assign the shape attribute rather than calling reshape it won't ever >> make a copy, but if necessary reshape will copy the input array: >> >> In [42]: np.transpose(c.reshape(2,2,2,2), >> (0,2,1,3)).reshape(4,4)Out[42]: >> array([[ 0, ?1, ?4, ?5], >> ? ? ? [ 2, ?3, ?6, ?7], >> ? ? ? [ 8, ?9, 12, 13], >> ? ? ? 
[10, 11, 14, 15]]) >> >> The trick is to use transpose to do an arbitrary permutation of the >> input axes, and also to rearrange the first axis with an additional >> reshape. > > D'oh. When he said reshape I was thinking purely in terms of what > could be done with .reshape(). I didn't even think about .transpose(). > > Is it then the .transpose() call that triggers the copy in this > situation? No, transpose() never needs to copy. It's the reshape. In [3]: a = np.arange(6) In [4]: a.shape = (2,3) Here a's shape can be changed without copying the data, so the assignment works. In [5]: b = a.T In [6]: b.reshape(6) Out[6]: array([0, 3, 1, 4, 2, 5]) Here b has been reshaped using the method, which returns a new array that has copied the underlying data. In [7]: b.shape = 6, --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/peridot/ in () AttributeError: incompatible shape for a non-contiguous array This didn't work because assignment to the shape attribute is not allowed to copy the data, and there's no way for this reshape to work without copying the data. The error message is misleading, because there are a number of copy-less rearrangements that are possible even with non-contiguous arrays: In [9]: c = np.arange(12)[::2] In [10]: c.shape = (2,3) In [11]: c.shape = (3,2) IIRC there are still a few rearrangements that can in principle be done without copying the data that numpy doesn't recognize, but it's fairly good at avoiding copies. This is not necessarily a good thing, since it can mean that users expect reshape() never to copy data and then become surprised when they fail to get a view when handed some array whose strides are especially peculiar. So I think the best rule is, if you want a view, always assign to the shape attribute. Anne From jsseabold at gmail.com Wed Nov 4 22:56:10 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 4 Nov 2009 22:56:10 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: <20091105010834.GA15007@rodimus> Message-ID: On Wed, Nov 4, 2009 at 9:05 PM, Anne Archibald wrote: > 2009/11/4 David Warde-Farley : >> Hi Skipper, >> >> No, I don't believe so. The reason is that NumPy arrays have to obey constant stride along each dimension. Assuming >> dtype is int32, the reshaping you describe (assuming you want to reshape c into d) would require the stride along dim >> 2 to be 4 bytes to get from 0 to 1, and then 12 bytes to get to 4, and then 4 bytes again to get to 5. This isn't >> legal, you'd have to do a copy to construct this matrix. Ah ok. This makes sense, and is kind of why I thought I couldn't do what I wanted as easily, as I'd like. > > Reshape sometimes creates copies. It tries hard not to, and if you > assign the shape attribute rather than calling reshape it won't ever > make a copy, but if necessary reshape will copy the input array: > > In [42]: np.transpose(c.reshape(2,2,2,2),(0,2,1,3)).reshape(4,4)Out[42]: > array([[ 0, ?1, ?4, ?5], > ? ? ? [ 2, ?3, ?6, ?7], > ? ? ? [ 8, ?9, 12, 13], > ? ? ? [10, 11, 14, 15]]) > > The trick is to use transpose to do an arbitrary permutation of the > input axes, and also to rearrange the first axis with an additional > reshape. > > Anne > This makes sense as well. This is kind of what I was looking for I just couldn't figure out the permutation. I was trying to roll the axes, though I guess this could still work if you add the extra axis. 
I don't know if I'd use this in the end though, as it might sacrifice too much readability in the code, but maybe that's just me... What if I had the outermost container as a list? Say, c = [np.arange(4).reshape(2,2),np.arange(4,8).reshape(2,2),np.arange(8,12).reshape(2,2),np.arange(12,16).reshape(2,2)] I seem to be running into much the same problems trying to use list comprehension to end up with d. It seems like I'm going to need a copy anyway, so maybe I'd be better off just allocating a new array and filling it up transparently? Skipper >> David >> >> On Wed, Nov 04, 2009 at 08:25:12PM -0500, Skipper Seabold wrote: >>> My brain is failing me. ?Is there a clean way to reshape an array like >>> the following? >>> >>> import numpy as np >>> >>> c = np.arange(16).reshape(4, 2, 2) >>> >>> In [209]: c >>> Out[209]: >>> array([[[ 0, ?1], >>> ? ? ? ? [ 2, ?3]], >>> >>> ? ? ? ?[[ 4, ?5], >>> ? ? ? ? [ 6, ?7]], >>> >>> ? ? ? ?[[ 8, ?9], >>> ? ? ? ? [10, 11]], >>> >>> ? ? ? ?[[12, 13], >>> ? ? ? ? [14, 15]]]) >>> >>> So that c == d where >>> >>> d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) >>> >>> In [211]: d >>> Out[211]: >>> array([[ 0, ?1, ?4, ?5], >>> ? ? ? ?[ 2, ?3, ?6, ?7], >>> ? ? ? ?[ 8, ?9, 12, 13], >>> ? ? ? ?[10, 11, 14, 15]]) >>> >>> Cheers, >>> >>> Skipper >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Wed Nov 4 23:42:57 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 4 Nov 2009 23:42:57 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: <20091105010834.GA15007@rodimus> Message-ID: On Wed, Nov 4, 2009 at 10:56 PM, Skipper Seabold wrote: >> >> Reshape sometimes creates copies. It tries hard not to, and if you >> assign the shape attribute rather than calling reshape it won't ever >> make a copy, but if necessary reshape will copy the input array: >> >> In [42]: np.transpose(c.reshape(2,2,2,2),(0,2,1,3)).reshape(4,4)Out[42]: >> array([[ 0, ?1, ?4, ?5], >> ? ? ? [ 2, ?3, ?6, ?7], >> ? ? ? [ 8, ?9, 12, 13], >> ? ? ? [10, 11, 14, 15]]) >> >> The trick is to use transpose to do an arbitrary permutation of the >> input axes, and also to rearrange the first axis with an additional >> reshape. >> >> Anne >> > > This makes sense as well. ?This is kind of what I was looking for I > just couldn't figure out the permutation. ?I was trying to roll the > axes, though I guess this could still work if you add the extra axis. > > I don't know if I'd use this in the end though, as it might sacrifice > too much readability in the code, but maybe that's just me... > The more I think about it, this is actually pretty elegant. I'm always going to have a c (it's really a Hessian from a multinomial logit) that's J**2*K x K, so I can just replace (2,2,2,2) with (J,J,K,K) and (4,4) with (J * K, J * K), and I think it's still pretty clear. Thanks! 
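For the record, a minimal sketch of that general case (the helper name block_reshape is purely illustrative; it assumes the Hessian is stored as J*J row-major blocks, each K x K, exactly like the toy array earlier in this thread):

import numpy as np

def block_reshape(c, J, K):
    # stack of J*J blocks of shape (K, K) -> one (J*K, J*K) block matrix,
    # using the transpose trick Anne showed above
    return np.transpose(c.reshape(J, J, K, K), (0, 2, 1, 3)).reshape(J*K, J*K)

# the small example from this thread corresponds to J = K = 2
c = np.arange(16).reshape(4, 2, 2)
d = np.array([[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]])
assert (block_reshape(c, 2, 2) == d).all()
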
Skipper From matthew.brett at gmail.com Thu Nov 5 03:04:07 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 5 Nov 2009 00:04:07 -0800 Subject: [SciPy-User] loading mat file in scipy In-Reply-To: <63321.93.37.128.116.1256227162.squirrel@webmail.sissa.it> References: <63321.93.37.128.116.1256227162.squirrel@webmail.sissa.it> Message-ID: <1e2af89e0911050004u5bb11942l9d28f8023963674b@mail.gmail.com> Hi, > I am sure the cause of ?slow loading of your file and that of mine are the > same. It took ~56sec on my computer to load your data into the python. If you are still interested in this problem, please consider taking a look at a branch I'm working on: git clone git://github.com/matthew-brett/scipy-work.git scipy-mb cd scipy-mb git checkout mio-optimization # make (largish) cython .c file cython scipy/io/matlab/mio5_utils.pyx python setup.py install It's about three times faster for loading Robin's file, at least (10s on my laptop). Best, Matthew From zunbeltz at gmail.com Thu Nov 5 03:45:32 2009 From: zunbeltz at gmail.com (Zunbeltz Izaola) Date: Thu, 05 Nov 2009 09:45:32 +0100 Subject: [SciPy-User] iterating in a timeseries In-Reply-To: References: <1257336626.13402.3.camel@mineat2.hmi.de> Message-ID: <1257410732.13402.5.camel@mineat2.hmi.de> On Wed, 2009-11-04 at 13:49 +0000, Dave Hirschfeld wrote: > Zunbeltz Izaola gmail.com> writes: > > > > > Hi, > > > > I am using scikits.timeseries. > > > > I have a time series with daily frequency, with data of 2 years. I have > > data almost every day, but some days are missing. > > > > I want to iterate over the timeseries to get the values of the first and > > last day of each month. I had tried to convert freq to 'M' and other > > things, but I can not find an easy way. Any idea? > > > > TIA, > > > > Zunbeltz > > > > > Does this do what you're looking for? > Thanks, It works perfectly, Zunbeltz > from numpy.random import randint > import scikits.timeseries as ts > dates = ts.date_array(ts.Date('D','01-Jan-2009'),ts.Date('D','31-Dec-2011')) > series = ts.time_series(dates.day,dates) > monthly_series = series.convert('M') > ts.first_unmasked_val(monthly_series,axis=1) > ts.last_unmasked_val(monthly_series,axis=1) > > # Example with missing data > series[randint(0,series.size,512)] = ma.masked > monthly_series = series.convert('M') > ts.first_unmasked_val(monthly_series,axis=1) > ts.last_unmasked_val(monthly_series,axis=1) > > HTH, > Dave > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From rmrndr at unife.it Thu Nov 5 05:17:15 2009 From: rmrndr at unife.it (ANDREA ARMAROLI) Date: Thu, 5 Nov 2009 11:17:15 +0100 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <4AF197CB.2040402@noaa.gov> References: <20091103185905.M2354@unife.it> <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> <4AF197CB.2040402@noaa.gov> Message-ID: <20091105101716.M4638@unife.it> Hi guys, I think it is worth placing in the cookbook. If you like, I can provide a few details of the framework it comes from. Thank you VERY much for the quick response. Andrea. ---------- Original Message ----------- From: Jim Vickroy To: SciPy Users List Sent: Wed, 04 Nov 2009 08:03:39 -0700 Subject: Re: [SciPy-User] Troubles with odeint or ode > Michael Aye wrote:I think this example is worth to be kept in the cookbook, what do you guys think? +1 > > I thought the same but did not speak up. 
> -- jv > It is showing how to do the oscillator modelling and can show the 'danger' of lack of complex support of odeint. Just my feeling it could be quite helpful. I myself certainly copied this into my treasure of worth-to-remember evernotes.. ;) Regards, Michael On Nov 3, 8:28?pm, Anne Archibald wrote: 2009/11/3 ANDREA ARMAROLI : Dear users, I'm trying to solve an ODE system that models a parametric oscillator with complex amplitudes at two freqencies. I'm new to python. This problem is simply solved in matlab using ode45. If I try to use odeint I cannot set parameters like tolerances or max step. You can in fact control these using the (optional) parameters hmax, rtol, and atol. The result is constant-valued solutions. After a bit of experimentation (in particular, a print statement inside your derivative function) it turns out that the problem is that odeint does not support complex values (it silently discards imaginary parts). This is not a major obstacle, since you can just pack and unpack the values yourself. When I do that, the plot I get is two oscillatory results (the absolute values), and one nice flat line (for the total, which I'm guessing you need to be conserved). I have to say, it would be very good if odeint either reported an error or worked with complex values, so you would have found this easier to track down. But it looks like it does a reasonable job of solving your problem once you work around its lack of complex support. Anne Trying with ode class and ZVODE integrator, I have that total intensity is not conserved. I have weird quasi-periodic oscillations. I do know that my equations have excess degrees of freedom and I know from symmetries what the integrals of motion are, but in Matlab this works pretty well. Here is the code with odeint import numpy as N import scipy as S import scipy.integrate import pylab as P def deriv(Y,t): ? ?A1 = Y[0] ? ?A2 = Y[1] ? ?A1d = 1j*A2*N.conj(A1) ? ?A2d = 1j*(A1**2/2.0 + dk*A2) ? ?return [A1d, A2d] nplot = 10000 zmax = 10.0 zstep = zmax/nplot dk = 0.5 # normalised detuning/dispersion phi0 = 0.0*N.pi # initial dephasing eta0 = 0.3 # initial pump intensity # initial values u20 = N.sqrt(eta0)*N.exp(1j*phi0) u10 = N.sqrt(2*(1-eta0)) H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); Y,info = scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou tput=True,printmessg = True) # are conserved quantities constant? Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) # then Hamiltonian... #... p1 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) p2 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) Thank you very much for your help. Andrea Armaroli _______________________________________________ SciPy-User mailing list SciPy-U... at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user ?foo.py 1KViewDownload _______________________________________________ SciPy-User mailing list SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user ------- End of Original Message ------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gaston.fiore at gmail.com Thu Nov 5 13:37:13 2009 From: gaston.fiore at gmail.com (Gaston Fiore) Date: Thu, 5 Nov 2009 13:37:13 -0500 Subject: [SciPy-User] SciPy installation problems Message-ID: <51dfb45c0911051037k58543095v3bc93784f6f8a41e@mail.gmail.com> Hello, I'm trying to install scipy from the SVN repository but I'm getting an error. I've also discovered that numpy doesn't pass the built-in unit tests, although it did seem to install without problems. Below is all the information that I think it's needed to determine the problem and its solution. Is it the fact that UMFPACK is missing? Thanks a lot, -Gaston gbrain2:~ gafiore$ sw_vers ProductName: Mac OS X ProductVersion: 10.5.8 BuildVersion: 9L30 gbrain2:~ gafiore$ gcc --version i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465) Copyright (C) 2005 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. gbrain2:~ gafiore$ gfortran --version GNU Fortran (GCC) 4.2.3 Copyright (C) 2007 Free Software Foundation, Inc. GNU Fortran comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of GNU Fortran under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING >>> import numpy >>> numpy.test('1','10') Traceback (most recent call last): File "", line 1, in File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/__init__.py", line 88, in test return NumpyTest().testall(level, verbosity) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/testing/numpytest.py", line 576, in testall for mthname in self._get_method_names(obj,abs(level)): TypeError: bad operand type for abs(): 'str' dhcp-0011377089-64-8f:scipy gafiore$ python setup.py build Warning: No configuration returned, assuming unavailable. blas_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers'] lapack_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-msse3'] umfpack_info: libraries umfpack not found in /System/Library/Frameworks/Python.framework/Versions/2.5/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib libraries umfpack not found in /opt/local/lib /System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/system_info.py:401: UserWarning: UMFPACK sparse solver (http://www.cise.ufl.edu/research/sparse/umfpack/) not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [umfpack]) or by setting the UMFPACK environment variable. 
warnings.warn(self.notfounderror.__doc__) NOT AVAILABLE Traceback (most recent call last): File "setup.py", line 160, in setup_package() File "setup.py", line 152, in setup_package configuration=configuration ) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/core.py", line 144, in setup config = configuration() File "setup.py", line 118, in configuration config.add_subpackage('scipy') File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 765, in add_subpackage caller_level = 2) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 748, in get_subpackage caller_level = caller_level + 1) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 695, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "./scipy/setup.py", line 20, in configuration config.add_subpackage('special') File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 765, in add_subpackage caller_level = 2) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 748, in get_subpackage caller_level = caller_level + 1) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 680, in _get_configuration_from_setup_py ('.py', 'U', 1)) File "scipy/special/setup.py", line 7, in from numpy.distutils.misc_util import get_numpy_include_dirs, get_info ImportError: cannot import name get_info From robert.kern at gmail.com Thu Nov 5 15:20:25 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 5 Nov 2009 14:20:25 -0600 Subject: [SciPy-User] SciPy installation problems In-Reply-To: <51dfb45c0911051037k58543095v3bc93784f6f8a41e@mail.gmail.com> References: <51dfb45c0911051037k58543095v3bc93784f6f8a41e@mail.gmail.com> Message-ID: <3d375d730911051220u7a4f499erbb2629949ac768a2@mail.gmail.com> On Thu, Nov 5, 2009 at 12:37, Gaston Fiore wrote: > Hello, > > I'm trying to install scipy from the SVN repository but I'm getting an > error. I've also discovered that numpy doesn't pass the built-in unit > tests, although it did seem to install without problems. Below is all > the information that I think it's needed to determine the problem and > its solution. Is it the fact that UMFPACK is missing? No. >>>> import numpy >>>> numpy.test('1','10') It's "numpy.test(1, 10)". > dhcp-0011377089-64-8f:scipy gafiore$ python setup.py build > ?File "scipy/special/setup.py", line 7, in > ? ?from numpy.distutils.misc_util import get_numpy_include_dirs, get_info > ImportError: cannot import name get_info The version of numpy supplied with the OS is very old. You will need a newer version in order to build this version of scipy. I recommend avoiding the system's installation of Python entirely and using the binaries from python.org instead. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From gokhansever at gmail.com Thu Nov 5 21:21:15 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 5 Nov 2009 20:21:15 -0600 Subject: [SciPy-User] Comparing variable time-shifted two measurements Message-ID: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> Hello, I have two aircraft based aerosol measurements. The first one is dccnConSTP (blue), and the latter is CPCConc (red) as shown in this screen capture. ( http://img513.imageshack.us/img513/7498/ccncpclag.png). My goal is to compare these two measurements. It is expected to see that they must have a positive correlation throughout the flight. However, the instrument that gives CPCConc was experiencing a sampling issue and therefore making a varying time-shifted measurements with respect to the first instrument. (From the first box it is about 20 seconds, 24 from the seconds before the dccnConSTP measurements shows up.) In other words in different altitude levels, I have varying time differences in between these two measurements in terms of their shapes. So, my goal turns to addressing this variable shifting issue before I start doing the comparisons. Is there a known automated approach to correct this mentioned varying-lag issue? If so, how? Thank you. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Thu Nov 5 23:05:05 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 5 Nov 2009 20:05:05 -0800 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> Message-ID: <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> On Mon, Nov 2, 2009 at 12:51 PM, nicky van foreest wrote: > Hi Josef, > > Second related question, since I'm not good with complex numbers. > > > > scipy.integrate.quad of a complex function returns the absolute value. > > Is there a numerical integration function in scipy that returns the > > complex integral or do I have to integrate the real and imaginary > > parts separately? > > You want to compute \int_w^z f(t) dt? When f is analytic (i.e., > satisfies the Cauchy Riemann equations) this integral is path > independent. Otherwise the path from w to z is of importance. You > might like the book Visual Complex Analysis by Needham for intuition. > Furthermore, if f is analytic in an (open) region R homotopic to an (open) disc, then the integral (an integer number of times) around *any* _closed_ path wholly in R is identically equal to zero; there's a similar statement (though the end value is a multiple of 2ipi) if f has only poles of finite order in R. (Indeed, these properties should be used to unit test any numerical complex path integration routine.) Are any of your paths closed? DG > > bye > > Nicky > > > > > Thanks, > > > > Josef > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peridot.faceted at gmail.com Thu Nov 5 23:19:56 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 5 Nov 2009 23:19:56 -0500 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> Message-ID: 2009/11/5 David Goldsmith : > On Mon, Nov 2, 2009 at 12:51 PM, nicky van foreest > wrote: >> >> Hi Josef, >> > Second related question, since I'm not good with complex numbers. >> > >> > scipy.integrate.quad of a complex function returns the absolute value. >> > Is there a numerical integration function in scipy that returns the >> > complex integral or do I have to integrate the real and imaginary >> > parts separately? >> >> You want to compute \int_w^z f(t) dt? When f is analytic (i.e., >> satisfies the Cauchy Riemann equations) this integral is path >> independent. Otherwise the path from w to z is of importance. You >> might like the book Visual Complex Analysis by Needham for intuition. > > Furthermore, if f is analytic in an (open) region R homotopic to an (open) > disc, then the integral (an integer number of times) around *any* _closed_ > path wholly in R is identically equal to zero; there's a similar statement > (though the end value is a multiple of 2ipi) if f has only poles of finite > order in R.? (Indeed, these properties should be used to unit test any > numerical complex path integration routine.)? Are any of your paths closed? This may well be a red herring. It happens fairly often (to me at least) that I want to integrate or otherwise manipulate a function whose values are complex but whose independent variable is real. Such a function can arise by substituting a path into an analytic function, but there are potentially many other ways to get such a thing - for example you might choose to represent some random function R -> R2 as R -> C instead. Even if it's obtained by feeding a path into some function from C -> C, it happens very often that that function isn't analytic - say it involves an absolute value, or involves the complex conjugate. There are definitely situations in which all the clever machinery of analytic functions can be applied to integration problems (or for that matter, contour integration may be the best way available to evaluate some complex function), but there are also plenty of situations where what you want is just a real function whose values happen to be complex numbers. (Or vectors of length n for that matter.) But I don't think that any of the adaptive quadrature gizmos can handle such a case, so you might be stuck integrating the real and imaginary parts separately. If you *are* in a situation where you're dealing with an analytic function, then as long as you're well away from its poles and your path is nice enough, you may find that it's very well approximated by a polynomial of high degree, which will let you use Gaussian quadrature, which can very easily work with complex-valued functions. The Romberg integration might even work unmodified. 
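For the characteristic-function case specifically, a minimal sketch of splitting the integral into real and imaginary parts with scipy.integrate.quad (the name char_fn is just for illustration, and the standard normal is used only because its characteristic function exp(-t**2/2) is known exactly):

import numpy as np
from scipy import stats
from scipy.integrate import quad

def char_fn(t, pdf, lower=-np.inf, upper=np.inf):
    # E[exp(1j*t*X)]: quad only handles real-valued integrands, so
    # integrate the real and imaginary parts separately and recombine
    re, _ = quad(lambda x: np.cos(t * x) * pdf(x), lower, upper)
    im, _ = quad(lambda x: np.sin(t * x) * pdf(x), lower, upper)
    return re + 1j * im

t = 0.7
assert np.allclose(char_fn(t, stats.norm.pdf), np.exp(-t**2 / 2.0))
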
Anne > DG >> >> bye >> >> Nicky >> >> > >> > Thanks, >> > >> > Josef >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From peridot.faceted at gmail.com Thu Nov 5 23:48:21 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 5 Nov 2009 23:48:21 -0500 Subject: [SciPy-User] Comparing variable time-shifted two measurements In-Reply-To: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> References: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> Message-ID: 2009/11/5 G?khan Sever : > Hello, > > I have two aircraft based aerosol measurements. The first one is dccnConSTP > (blue), and the latter is CPCConc (red) as shown in this screen capture. > (http://img513.imageshack.us/img513/7498/ccncpclag.png). My goal is to > compare these two measurements. It is expected to see that they must have a > positive correlation throughout the flight. However, the instrument that > gives CPCConc was experiencing a sampling issue and therefore making a > varying time-shifted measurements with respect to the first instrument. > (From the first box it is about 20 seconds, 24 from the seconds before the > dccnConSTP measurements shows up.) In other words in different altitude > levels, I have varying time differences in between these two measurements in > terms of their shapes. So, my goal turns to addressing this variable > shifting issue before I start doing the comparisons. > > Is there a known automated approach to correct this mentioned varying-lag > issue? If so, how? There are several tools you can use, depending on exactly what the problem is. If the problem is that there's a constant lag for each data set but you don't know what it is, then you can use the correlation to fit for the lag - if you take the correlation of two vectors, then the highest peak in the correlation vector is the lag where the two vectors are most similar. Correlations can be calculated rapidly using FFTs. If the lag isn't constant over a data set, you can try using correlations to find the lag at several points in the data set and interpolate to get the lag as a function of time (but be careful - depending on what caused the lag, a steadily-drifting model isn't necessarily appropriate; maybe you'll have periods of constant offset separated by jumps). If you know the lag, but it isn't constant and you're not sure how to resample your data set to remove the lag, look at scipy's ndimage. This should have the tools to do what you want. If your data sets are unevenly sampled, so that you can't use simple correlations, I'm not sure quite what to suggest, except perhaps interpolating them to evenly-spaced samples and then running the correlation. For this try scipy.interpolate. If you do end up fitting for the lag, keep in mind that you'll have adjusted the lags to make the time series as similar as possible, so that there's a risk of overestimating their similarities. But the only way around that problem is to know the lags from some independent source. Anne > Thank you. 
> > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Fri Nov 6 00:02:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Nov 2009 00:02:47 -0500 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> Message-ID: <1cd32cbb0911052102r40a066e3v86a5ed62cdf954aa@mail.gmail.com> On Thu, Nov 5, 2009 at 11:19 PM, Anne Archibald wrote: > 2009/11/5 David Goldsmith : >> On Mon, Nov 2, 2009 at 12:51 PM, nicky van foreest >> wrote: >>> >>> Hi Josef, >>> > Second related question, since I'm not good with complex numbers. >>> > >>> > scipy.integrate.quad of a complex function returns the absolute value. >>> > Is there a numerical integration function in scipy that returns the >>> > complex integral or do I have to integrate the real and imaginary >>> > parts separately? >>> >>> You want to compute \int_w^z f(t) dt? When f is analytic (i.e., >>> satisfies the Cauchy Riemann equations) this integral is path >>> independent. Otherwise the path from w to z is of importance. You >>> might like the book Visual Complex Analysis by Needham for intuition. >> >> Furthermore, if f is analytic in an (open) region R homotopic to an (open) >> disc, then the integral (an integer number of times) around *any* _closed_ >> path wholly in R is identically equal to zero; there's a similar statement >> (though the end value is a multiple of 2ipi) if f has only poles of finite >> order in R.? (Indeed, these properties should be used to unit test any >> numerical complex path integration routine.)? Are any of your paths closed? > > This may well be a red herring. It happens fairly often (to me at > least) that I want to integrate or otherwise manipulate a function > whose values are complex but whose independent variable is real. > > Such a function can arise by substituting a path into an analytic > function, but there are potentially many other ways to get such a > thing - for example you might choose to represent some random function > R -> R2 as R -> C instead. Even if it's obtained by feeding a path > into some function from C -> C, it happens very often that that > function isn't analytic - say it involves an absolute value, or > involves the complex conjugate. > > There are definitely situations in which all the clever machinery of > analytic functions can be applied to integration problems (or for that > matter, contour integration may be the best way available to evaluate > some complex function), but there are also plenty of situations where > what you want is just a real function whose values happen to be > complex numbers. (Or vectors of length n for that matter.) But I don't > think that any of the adaptive quadrature gizmos can handle such a > case, so you might be stuck integrating the real and imaginary parts > separately. > > If you *are* in a situation where you're dealing with an analytic > function, then as long as you're well away from its poles and your > path is nice enough, you may find that it's very well approximated by > a polynomial of high degree, which will let you use Gaussian > quadrature, which can very easily work with complex-valued functions. > The Romberg integration might even work unmodified. 
Sorry for not coming back to this earlier, Thanks Nicky, I looked at some papers by Ward Whitt and they look interesting but much more than what I want to chew on right now. There is more background, that I would have to read, than I have time right now for this. I finally added "matlab" to my google searches, and I think I found some references that use discretization and fft more directly. The integration problem should be pretty "nice", just a continuous fourier transform and the inverse http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29#Definition http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29#Inversion_formulas For many distributions there is an explicit formula for both the density and the characteristic function, e.g. normal http://en.wikipedia.org/wiki/Normal_distribution#Characteristic_function For some distributions only the characteristic functions has a closed form expression, and the pdf or cdf has to be recovered numerically, and I would have liked to have a generic method to go between the two. I don't think I ever needed a path integral in my life, and I'm pretty much a newbie to complex numbers, so parts of your explanations are still quite a bit over my head. I think, I will come back to this after I looked more at the examples where the estimation of a statistical model or of a distribution is done in terms of the characteristic function instead of the density. The immediate example that I had tried, was (integration from -large number to +large number) integral exp(i t x)dF(x) = integrate.quad(exp(itx)*f(x)) or do I have to do integral exp(i t x)dF(x) = integrate.quad(real(exp(itx)*f(x))) + j * integrate.quad(imag(exp(itx)*f(x))) or is there another way? The solution/integral might be either real or complex. Thanks, Josef > > Anne > >> DG >>> >>> bye >>> >>> Nicky >>> >>> > >>> > Thanks, >>> > >>> > Josef From denis-bz-py at t-online.de Wed Nov 4 11:24:37 2009 From: denis-bz-py at t-online.de (denis) Date: Wed, 04 Nov 2009 17:24:37 +0100 Subject: [SciPy-User] RBF plot in reference/tutorial/interpolate.html Message-ID: Scipy doc people, in http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#d-example last updated Oct 08 the two 1d plots Interpolation using univariate spline Interpolation using RBF - multiquadrics look rather similar, hmm - plt.plot(xi, yi, 'g') + plt.plot(xi, fi, 'g') cheers -- denis From scott.sinclair.za at gmail.com Fri Nov 6 06:03:26 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 6 Nov 2009 13:03:26 +0200 Subject: [SciPy-User] RBF plot in reference/tutorial/interpolate.html In-Reply-To: References: Message-ID: <6a17e9ee0911060303x70c7854eub5c8ca2840c872f3@mail.gmail.com> >2009/11/4 denis : > Scipy doc people, > ? in http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#d-example > last updated Oct 08 > > the two 1d plots > ? ? Interpolation using univariate spline > ? ? Interpolation using RBF - multiquadrics > look rather similar, hmm > - plt.plot(xi, yi, 'g') > + plt.plot(xi, fi, 'g') Thanks for spotting that. It's fixed in the doc editor http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/interpolate.rst You are welcome to create an account on the doc wiki and help out with similar fixes in future. Take a look at http://docs.scipy.org/numpy/Front%20Page/ to see how you can contribute (esp. the section "Before you start"). 
Cheers, Scott From rcsqtc at iqac.csic.es Fri Nov 6 05:53:25 2009 From: rcsqtc at iqac.csic.es (Ramon Crehuet) Date: Fri, 06 Nov 2009 11:53:25 +0100 Subject: [SciPy-User] Contribution to Performance Python Message-ID: <4AF40025.3000906@iqac.csic.es> Hi all, After reading the "Performance Python" page at: http://www.scipy.org/PerformancePython?action=show I thought some code with Fortran 90/95 was missing, in partuclar considering its useful array features. So I have written a couple of examples with Fortran 90 arrays and with the Fortran 95 forall construct. Both are nicer (to me :-) ) than the FORTRAN77 loops and also faster! In my PC a 1000x1000 array gives: Doing 100 iterations on a 1000x1000 grid numeric took 5.86 seconds fortran77 took 3.53 seconds fortran90-arrays took 1.58 seconds fortran95-forall took 1.58 seconds slow (1 iteration) took 9.13 seconds 100 iterations should take about 913.000000 seconds If this is interesting to the community, who should I contact to have this included in the scipy web page? Cheers, Ramon This are the two new subroutines. I can send the modified laplace.py to whoever wants it. ****************************************************** ! File flaplace90_arrays.f90 subroutine timestep(u,n,m,dx,dy,error) implicit none real (kind=8), dimension(0:n-1,0:m-1), intent(inout):: u real (kind=8), intent(in) :: dx,dy real (kind=8), intent(out) :: error integer, intent(in) :: n,m real (kind=8), dimension(0:n-1,0:m-1) :: diff real (kind=8) :: dx2,dy2,dnr_inv !f2py intent(in) :: dx,dy !f2py intent(in,out) :: u !f2py intent(out) :: error !f2py intent(hide) :: n,m dx2 = dx*dx dy2 = dy*dy dnr_inv = 0.5d0 / (dx2+dy2) diff=u u(1:n-2, 1:m-2) = ((u(0:n-3, 1:m-2) + u(2:n-1, 1:m-2))*dy2 + & (u(1:n-2,0:m-3) + u(1:n-2, 2:m-1))*dx2)*dnr_inv error=sqrt(sum((u-diff)**2)) end subroutine ****************************************************** ! File flaplace95_forall.f90 subroutine timestep(u,n,m,dx,dy,error) implicit none real (kind=8), dimension(0:n-1,0:m-1), intent(inout):: u real (kind=8), intent(in) :: dx,dy real (kind=8), intent(out) :: error integer, intent(in) :: n,m real (kind=8), dimension(0:n-1,0:m-1) :: diff real (kind=8) :: dx2,dy2,dnr_inv integer :: i,j !f2py intent(in) :: dx,dy !f2py intent(in,out) :: u !f2py intent(out) :: error !f2py intent(hide) :: n,m dx2 = dx*dx dy2 = dy*dy dnr_inv = 0.5d0 / (dx2+dy2) diff=u forall (i=1:n-2,j=1:m-2) u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2+(u(i,j-1) + u(i,j+1))*dx2)*dnr_inv end forall error=sqrt(sum((u-diff)**2)) end subroutine ****************************************************** From fperez.net at gmail.com Fri Nov 6 07:18:53 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 6 Nov 2009 04:18:53 -0800 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar In-Reply-To: References: Message-ID: On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez wrote: > if you reside in the San Francisco Bay Area, you may be interested in > a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our > regular py4science meeting series. ?Guido van Rossum, the creator of > the Python language, will visit for a session where we will first do a > very rapid overview of a number of scientific projects that use Python > (in a lightning talk format) and then we will have an open discussion > with Guido with hopefully interesting questions going in both > directions. ?The meeting is open to all, bring your questions! 
Video of the event: http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum Slides: http://fperez.org/py4science/2009_guido_ucb/index.html A few blog posts about it: - Guido: http://neopythonic.blogspot.com/2009/11/python-in-scientific-world.html - Jarrod: http://jarrodmillman.blogspot.com/2009/11/visit-from-guido-van-rossum.html - Matthew: http://nipyworld.blogspot.com/2009/11/guido-van-rossum-talks-about-python-3.html - Me: http://fdoperez.blogspot.com/2009/11/guido-van-rossum-at-uc-berkeleys.html Attendance was excellent (standing room only, and I saw some people leave because it was too full). Many thanks to all the presenters! Cheers, f From nmb at wartburg.edu Fri Nov 6 12:17:27 2009 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Fri, 06 Nov 2009 11:17:27 -0600 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar In-Reply-To: References: Message-ID: <4AF45A27.3020805@wartburg.edu> On 2009-11-06 06:18 , Fernando Perez wrote: > On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez wrote: > >> if you reside in the San Francisco Bay Area, you may be interested in >> a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our >> regular py4science meeting series. Guido van Rossum, the creator of >> the Python language, will visit for a session where we will first do a >> very rapid overview of a number of scientific projects that use Python >> (in a lightning talk format) and then we will have an open discussion >> with Guido with hopefully interesting questions going in both >> directions. The meeting is open to all, bring your questions! > > Video of the event: > http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum > > Slides: http://fperez.org/py4science/2009_guido_ucb/index.html > > A few blog posts about it: > > - Guido: http://neopythonic.blogspot.com/2009/11/python-in-scientific-world.html > > - Jarrod: http://jarrodmillman.blogspot.com/2009/11/visit-from-guido-van-rossum.html > > - Matthew: http://nipyworld.blogspot.com/2009/11/guido-van-rossum-talks-about-python-3.html > > - Me: http://fdoperez.blogspot.com/2009/11/guido-van-rossum-at-uc-berkeleys.html > > Attendance was excellent (standing room only, and I saw some people > leave because it was too full). Many thanks to all the presenters! From the silent majority who lurk here, many thanks to you Fernando for setting this up (and for IPython). It is wonderful to know that the concerns and achievements of scientific computing in Python are on the radar of the group of people responsible for leading the language. If you have thoughts on how the wider community can contribute to this sort of communication in the future, please share. -Neil From dsdale24 at gmail.com Fri Nov 6 12:34:34 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 6 Nov 2009 12:34:34 -0500 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar In-Reply-To: <4AF45A27.3020805@wartburg.edu> References: <4AF45A27.3020805@wartburg.edu> Message-ID: On Fri, Nov 6, 2009 at 12:17 PM, Neil Martinsen-Burrell wrote: > On 2009-11-06 06:18 , Fernando Perez wrote: >> On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez ?wrote: >> >>> if you reside in the San Francisco Bay Area, you may be interested in >>> a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our >>> regular py4science meeting series. 
?Guido van Rossum, the creator of >>> the Python language, will visit for a session where we will first do a >>> very rapid overview of a number of scientific projects that use Python >>> (in a lightning talk format) and then we will have an open discussion >>> with Guido with hopefully interesting questions going in both >>> directions. ?The meeting is open to all, bring your questions! >> >> Video of the event: >> http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum >> >> Slides: http://fperez.org/py4science/2009_guido_ucb/index.html >> >> A few blog posts about it: >> >> - Guido: http://neopythonic.blogspot.com/2009/11/python-in-scientific-world.html >> >> - Jarrod: http://jarrodmillman.blogspot.com/2009/11/visit-from-guido-van-rossum.html >> >> - Matthew: http://nipyworld.blogspot.com/2009/11/guido-van-rossum-talks-about-python-3.html >> >> - Me: http://fdoperez.blogspot.com/2009/11/guido-van-rossum-at-uc-berkeleys.html >> >> Attendance was excellent (standing room only, and I saw some people >> leave because it was too full). Many thanks to all the presenters! > > ?From the silent majority who lurk here, many thanks to you Fernando for > setting this up (and for IPython). Yes, thank you Fernando. If you are at liberty to comment further on discussions concerning parallel computing and the GIL, I would be very interested to hear about it. Darren From gokhansever at gmail.com Fri Nov 6 18:36:23 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 6 Nov 2009 17:36:23 -0600 Subject: [SciPy-User] Comparing variable time-shifted two measurements In-Reply-To: References: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> Message-ID: <49d6b3500911061536v780f3686v594b58bf26e419c9@mail.gmail.com> On Thu, Nov 5, 2009 at 10:48 PM, Anne Archibald wrote: > 2009/11/5 G?khan Sever : > > Hello, > > > > I have two aircraft based aerosol measurements. The first one is > dccnConSTP > > (blue), and the latter is CPCConc (red) as shown in this screen capture. > > (http://img513.imageshack.us/img513/7498/ccncpclag.png). My goal is to > > compare these two measurements. It is expected to see that they must have > a > > positive correlation throughout the flight. However, the instrument that > > gives CPCConc was experiencing a sampling issue and therefore making a > > varying time-shifted measurements with respect to the first instrument. > > (From the first box it is about 20 seconds, 24 from the seconds before > the > > dccnConSTP measurements shows up.) In other words in different altitude > > levels, I have varying time differences in between these two measurements > in > > terms of their shapes. So, my goal turns to addressing this variable > > shifting issue before I start doing the comparisons. > > > > Is there a known automated approach to correct this mentioned varying-lag > > issue? If so, how? > > There are several tools you can use, depending on exactly what the problem > is. > > If the problem is that there's a constant lag for each data set but > you don't know what it is, then you can use the correlation to fit for > the lag - if you take the correlation of two vectors, then the highest > peak in the correlation vector is the lag where the two vectors are > most similar. That's how I discovered the varying lag. I was expecting a nicer correlation when I shifted the data at a constant value however, it turned wrong and later analysis showed that the lags are not constant. > Correlations can be calculated rapidly using FFTs. 
> I am curious to know how to use FFT in this case? > > If the lag isn't constant over a data set, you can try using > correlations to find the lag at several points in the data set and > interpolate to get the lag as a function of time (but be careful - > depending on what caused the lag, a steadily-drifting model isn't > necessarily appropriate; maybe you'll have periods of constant offset > separated by jumps). > Ok, good idea. Probably the more finer I correlate the data the higher accuracy I will get from the correlations therefore a better interpolated result. "steadily-drifting model" is another new term to me. > > If you know the lag, but it isn't constant and you're not sure how to > resample your data set to remove the lag, look at scipy's ndimage. > This should have the tools to do what you want. > This is a 1D data. Could you give me an example how to utilize the ndimage library for my case? > > If your data sets are unevenly sampled, so that you can't use simple > correlations, I'm not sure quite what to suggest, except perhaps > interpolating them to evenly-spaced samples and then running the > correlation. For this try scipy.interpolate. > I don't think uneven sampling is an issue in my case. Both instruments sample at 1Hz. One samples from 0.5 L/min flow, the other from 1.0 L/min where it cannot maintain this rate when the pressure gets lower. > > If you do end up fitting for the lag, keep in mind that you'll have > adjusted the lags to make the time series as similar as possible, so > that there's a risk of overestimating their similarities. But the only > way around that problem is to know the lags from some independent > source. > Thank you for your suggestions. For now I am sure that these varying lags are only determined via a manual inspection. If I had the sample flow rate recorded than it would be easy to correct the data, unfortunately this will be something for the future experiments. > Anne > > > Thank you. > > > > -- > > G?khan > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Fri Nov 6 19:13:20 2009 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 6 Nov 2009 19:13:20 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator Message-ID: Hi, I have implemented a simple Bayesian regression program (it takes events modulo one and returns a posterior probability that the data is phase-invariant plus a posterior distribution for two parameters (modulation fraction and phase) in case there is modulation). I'm rather new at this, so I'd like to construct some unit tests. Does anyone have any suggestions on how to go about this? For a frequentist periodicity detector, the return value is a probability that, given the null hypothesis is true, the statistic would be this extreme. So I can construct a strong unit test by generating a collection of data sets given the null hypothesis, evaluating the statistic, and seeing whether the number that claim to be significant at a 5% level is really 5%. (In fact I can use the binomial distribution to get limits on the number of false positive.) 
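In the meantime, a rough sketch of the FFT cross-correlation and ndimage steps suggested above (estimate_lag and the synthetic traces below are invented purely for illustration; the real correction would use the recorded CCN/CPC series and a manually checked lag):

import numpy as np
from scipy import ndimage

def estimate_lag(a, b):
    # number of samples by which b is delayed relative to a (positive means
    # b lags behind a), taken from the peak of the FFT-based cross-correlation
    n = len(a) + len(b) - 1
    a0, b0 = a - np.mean(a), b - np.mean(b)
    xcorr = np.fft.ifft(np.fft.fft(b0, n) * np.conj(np.fft.fft(a0, n))).real
    k = int(np.argmax(xcorr))
    return k - n if k > n // 2 else k

rng = np.random.RandomState(0)
ccn = ndimage.gaussian_filter1d(rng.randn(500), 5)  # smooth synthetic trace
cpc = np.roll(ccn, 24)                              # stand-in for the delayed CPC record
lag = estimate_lag(ccn, cpc)                        # should recover the 24-sample shift
aligned = ndimage.shift(cpc, -lag, mode='nearest')  # shift CPC back into alignment
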
This gives me a unit test that is completely orthogonal to my implementation, and that passes if and only if the code works. For a Bayesian hypothesis testing setup, I don't really see how to do something analogous. I can generate non-modulated data sets and confirm that my code returns a high probability that the data is not modulated, but how high should I expect the probability to be? I can generate data sets with models with known parameters and check that the best-fit parameters are close to the known parameters - but how close? Even if I do it many times, is the posterior mean unbiased? What about the posterior mode or median? I can even generate models and then data sets that are drawn from the prior distribution, but what should I expect from the code output on such a data set? I feel sure there's some test that verifies a statistical property of Bayesian estimators/hypothesis testers, but I cant quite put my finger on it. Suggestions welcome. Thanks, Anne From josef.pktd at gmail.com Fri Nov 6 22:37:44 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Nov 2009 22:37:44 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald wrote: > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. The Bayesian experts are at pymc, maybe you can look at there tests for inspiration. I don't know those, since I never looked at that part. I never tried to test a Bayesian estimator but many properties are still the same as in the non-Bayesian analysis. In my Bayesian past, I essentially only used normal and t distributions, and binomial. 
One of my first tests for these things is to create a huge sample and see whether the parameter estimates converge. With Bayesian analysis you still have the law of large numbers, (for non-dogmatic priors) Do you have an example with a known posterior? Then, the posterior with a large sample or the average in a Monte Carlo should still be approximately the true one. For symmetric distributions, the Bayesian posterior confidence intervals and posterior mean should be roughly the same as the frequentist estimates. With diffuse priors, in many cases the results are exactly the same in Bayesian and MLE. Another version I used in the past is to trace the posterior mean, as the prior variance is reduced, in one extreme you should get the prior back in the other extreme the MLE. > I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? If you ignore the Bayesian interpretation, then this is just a standard sampling problem, you draw prior parameters and observations, the rest is just finding the conditional and marginal probabilities. I think the posterior odds ratio should converge in a large Monte Carlo to the true one, and the significance levels should correspond to the one that has been set for the test (5%). (In simplest case of conjugate priors, you can just interpret the prior as a previous sample and you are back to a frequentist explanation.) The problem is that with an informative prior, you always have a biased estimator in small samples and the posterior odds ratio is affected by an informative prior. And "real" Bayesians don't care about sampling properties. What are your prior distributions and the likelihood function in your case? Can you model degenerate and diffuse priors, so that an informative prior doesn't influence you sampling results? I'm trying to think of special cases where you could remove the effect of the prior. It's a bit vague because I don't see the details, and I haven't looked at this in a while. > > Suggestions welcome. > > Thanks, > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tpk at kraussfamily.org Sat Nov 7 10:09:27 2009 From: tpk at kraussfamily.org (Tom K.) Date: Sat, 7 Nov 2009 07:09:27 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <26241135.post@talk.nabble.com> Hi Anne, interesting question. I'm not really sure what a Bayesian hypothesis tester is, but I expect that it results in a random variable. For a given input prior and measurement distribution, and choice of hypothesis (signal present or signal absent), can you know the distribution of this random variable? If so, it could come down to a test that this random variable - or a function of it such as mean or probability that it is greater than some value - behaves as expected. How do you create a unit-test that a random variable generator is working? If the random variables were all iid normal, you could average a bunch and then test that the mean of the sample was close to the mean of the distribution - which is going to be impossible to guarantee, since there is a non-zero probability that the mean is large. 
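For the iid normal case I am thinking of something as simple as this sketch:

import numpy as np

def test_sample_mean():
    n = 10000
    x = np.random.normal(0.0, 1.0, n)
    # the standard error of the mean is 1/sqrt(n); asking for agreement to
    # 4 standard errors means a correct generator fails only ~1 in 16000 runs
    assert abs(x.mean()) < 4.0 / np.sqrt(n)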
In practice however it is likely that your test will pass, but you will probably have to tack down the seeds and make sure that the probability of failure is really small so changing seeds (and hence the underlying sequence of "random" inputs) won't likely cause a failure. Is there anything in scipy's stats module to test that a series of random variables has a given distribution? Maybe scipy.stats.kstest? Who the heck are Kolmogorov and Smirnov anyway :-)? Anne Archibald-2 wrote: > > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. > > Suggestions welcome. > > Thanks, > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Unit-testing-of-Bayesian-estimator-tp26240654p26241135.html Sent from the Scipy-User mailing list archive at Nabble.com. From bsouthey at gmail.com Sat Nov 7 22:23:25 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 7 Nov 2009 21:23:25 -0600 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald wrote: > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I do not know your field, a little rusty on certain issues and I do not consider myself a Bayesian. Exactly what type of Bayesian did you use? I also do not know how you implemented it especially if it is empirical or Monte Carlo Markov Chains. 
> I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? Since this is a test, the theoretical 'correctness' is irrelevant. So I would guess that you should use very informative priors and data with a huge amount of information. That should make the posterior have an extremely narrow range so your modal estimate is very close to the true value within a very small range. After that it really depends on the algorithm, the data used and what you need to test. Basically you just have to say given this set of inputs I get this 'result' that I consider reasonable. After all, if the implementation of algorithm works then it is most likely the inputs that are a problem. In statistics, problems usually enter because the desired model can not be estimated from the provided data. Separation of user errors from a bug in the code usually identified by fitting simpler or alternative models. > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. > > Suggestions welcome. > > Thanks, > Anne Please do not mix Frequentist or Likelihood concepts with Bayesian. Also you never generate data for estimation from the prior distribution, you generate it from the posterior distribution as that is what your estimating. Really in Bayesian sense all this data generation is unnecessary because you have already calculated that information in computing the posteriors. The posterior of a parameter is a distribution not a single number so you just compare distributions. For example, you can compute modal values and construct Bayesian credible intervals of the parameters. These should make very strong sense to the original values simulated. For Bayesian work, you must address the data and the priors. In particular, you need to be careful about the informativeness of the prior. You can get great results just because your prior was sufficiently informative but you can get great results because you data was very informative. 
Depending on how it was implemented, a improper prior can be an issue because these do not guarantee a proper posterior (but often do lead to proper posteriors). So if your posterior is improper then you are in a very bad situation and can lead to weird results some or all of the time.Some times this is can easily be fixed such as by putting bounds on flat priors. Whereas proper priors give proper posteriors. But as a final comment, it should not matter which approach you use as if you do not get what you simulated then either your code is wrong or you did not simulate what your code implements. (Surprising how frequent the latter is.) Bruce From peridot.faceted at gmail.com Sun Nov 8 02:14:37 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 02:14:37 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> Message-ID: 2009/11/6 : > On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald > wrote: >> Hi, >> >> I have implemented a simple Bayesian regression program (it takes >> events modulo one and returns a posterior probability that the data is >> phase-invariant plus a posterior distribution for two parameters >> (modulation fraction and phase) in case there is modulation). I'm >> rather new at this, so I'd like to construct some unit tests. Does >> anyone have any suggestions on how to go about this? >> >> For a frequentist periodicity detector, the return value is a >> probability that, given the null hypothesis is true, the statistic >> would be this extreme. So I can construct a strong unit test by >> generating a collection of data sets given the null hypothesis, >> evaluating the statistic, and seeing whether the number that claim to >> be significant at a 5% level is really 5%. (In fact I can use the >> binomial distribution to get limits on the number of false positive.) >> This gives me a unit test that is completely orthogonal to my >> implementation, and that passes if and only if the code works. For a >> Bayesian hypothesis testing setup, I don't really see how to do >> something analogous. >> >> I can generate non-modulated data sets and confirm that my code >> returns a high probability that the data is not modulated, but how >> high should I expect the probability to be? I can generate data sets >> with models with known parameters and check that the best-fit >> parameters are close to the known parameters - but how close? Even if >> I do it many times, is the posterior mean unbiased? What about the >> posterior mode or median? I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? I feel sure there's >> some test that verifies a statistical property of Bayesian >> estimators/hypothesis testers, but I cant quite put my finger on it. > > > The Bayesian experts are at pymc, maybe you can look at there tests > for inspiration. I don't know those, since I never looked at that > part. > > I never tried to test a Bayesian estimator but many properties are > still the same as in the non-Bayesian analysis. In my Bayesian past, I > essentially only used normal and t distributions, and binomial. > > One of my first tests for these things is to create a huge sample and > see whether the parameter estimates converge. 
With Bayesian analysis > you still have the law of large numbers, (for non-dogmatic priors) As far as getting the code roughly working, this is what I used; just run it generating lots of photons and see that roughly the right parameters come out. Unfortunately, this isn't really very sensitive to all the things that are supposed to make a Bayesian estimator better than (say) a maximum-likelihood estimator; I could have the probability estimation pretty badly wrong, but there are so many photons that anything but the right parameters are such a horrible fit even a somewhat wrong algorithm won't select them. Maybe you meant something different: I could also try fixing some model parameters and generating just a handful of photons, so I get a crummy estimate, but then repeating the photon generation and fit many times, to see if the average value of the best-fit parameters comes out close to the true parameters. But this is a test for unbiasedness of the estimator, and it's not clear that this estimator should be unbiased even if correct. > Do you have an example with a known posterior? Then, the posterior > with a large sample or the average in a Monte Carlo should still be > approximately the true one. > For symmetric distributions, the Bayesian posterior confidence > intervals and posterior mean should be roughly the same as the > frequentist estimates. With diffuse priors, in many cases the results > are exactly the same in Bayesian and MLE. > Another version I used in the past is to trace the posterior mean, as > the prior variance is reduced, in one extreme you should get the prior > back in the other extreme the MLE. My priors are flat, on (0,1) in both phase and pulsed fraction. It seems a bit peculiar to use anything else in phase, but I can imagine some sort of logarithmic prior for pulsed fraction (making 0.01-0.1 equally likely to 0.1-1). I haven't experimented with introducing localized priors, but it seems like that too wouldn't be very sensitive to whether the Bayesian calculation is right; if the prior insists that the values are both 0.5, then any remotely sane algorithm will come up with posteriors that are also 0.5. >> I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? > > If you ignore the Bayesian interpretation, then this is just a > standard sampling problem, you draw prior parameters and observations, > the rest is just finding the conditional and marginal probabilities. I > think the posterior odds ratio should converge in a large Monte Carlo > to the true one, and the significance levels should correspond to the > one that has been set for the test (5%). > (In simplest case of conjugate priors, you can just interpret the > prior as a previous sample and you are back to a frequentist > explanation.) This sounds like what I was trying for - draw a model according to the priors, then generate a data set according to the model. I then get some numbers out: the simplest is a probability that the model was pulsed, but I can also get a credible interval or an estimated CDF for the model parameters. But I'm trying to figure out what test I should apply to those values to see if they make sense. For a credible interval, I suppose I could take (say) a 95% credible interval, then 95 times out of a hundred the model parameters I used to generate the data set should be in the credible interval. 
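Written out for a toy problem where the posterior is known exactly (a uniform prior on a binomial proportion, not my actual model), the loop I have in mind is roughly:

import numpy as np
from scipy import stats

M, n, level = 1000, 50, 0.95
tail = (1.0 - level) / 2.0
hits = 0
for i in range(M):
    p_true = np.random.uniform(0.0, 1.0)    # draw a model from the prior
    k = np.random.binomial(n, p_true)       # draw a data set from that model
    # uniform prior + binomial likelihood means the posterior is
    # Beta(1 + k, 1 + n - k), so the equal-tailed credible interval is exact
    lo, hi = stats.beta.ppf([tail, 1.0 - tail], 1 + k, 1 + n - k)
    if lo <= p_true <= hi:
        hits += 1
print hits, "out of", M    # should come out near level*M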
And I should be able to use the binomial distribution to put limits on how close to 95% I should get in M trials. This seems to work, but I'm not sure I understand why. The credible region is obtained from a probability distribution for the model parameters, but I am turning things around and testing the distribution of credible regions. In any case, that seems to work, so now I just need to figure out a similar test for the probability of being pulsed. > The problem is that with an informative prior, you always have a > biased estimator in small samples and the posterior odds ratio is > affected by an informative prior. ?And "real" Bayesians don't care > about sampling properties. > > What are your prior distributions and the likelihood function in your > case? Can you model degenerate and diffuse priors, so that an > informative prior doesn't influence you sampling results? > I'm trying to think of special cases where you could remove the effect > of the prior. I've put the code on github in case that helps make this any clearer: http://github.com/aarchiba/bayespf The model I'm using is either: completely uniform mod 1 (probability 0.5) or: the PDF is a cosine plus a constant, where the two parameters are the fraction of area under the cosine (as opposed to under the constant) and the phase offset of the cosine. The likelihood is just (modulo logs for range issues) the product over all observed phases x of PDF(fraction, phase, x). So the mode of the posterior is exactly the maximum-likelihood estimate (whether or not I got the math right, more or less). > It's a bit vague because I don't see the details, and I haven't looked > at this in a while. As is probably obvious, I'm pretty vague on Bayesian statistics in general. But I'm working on it. Anne > > > > > > >> >> Suggestions welcome. >> >> Thanks, >> Anne >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 02:25:04 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 02:25:04 -0500 Subject: [SciPy-User] [SciPy-user] Unit testing of Bayesian estimator In-Reply-To: <26241135.post@talk.nabble.com> References: <26241135.post@talk.nabble.com> Message-ID: 2009/11/7 Tom K. : > > Hi Anne, interesting question. > > I'm not really sure what a Bayesian hypothesis tester is, but I expect that > it results in a random variable. ?For a given input prior and measurement > distribution, and choice of hypothesis (signal present or signal absent), > can you know the distribution of this random variable? ?If so, it could come > down to a test that this random variable - or a function of it such as mean > or probability that it is greater than some value - behaves as expected. > How do you create a unit-test that a random variable generator is working? > If the random variables were all iid normal, you could average a bunch and > then test that the mean of the sample was close to the mean of the > distribution - which is going to be impossible to guarantee, since there is > a non-zero probability that the mean is large. 
?In practice however it is > likely that your test will pass, but you will probably have to tack down the > seeds and make sure that the probability of failure is really small so > changing seeds (and hence the underlying sequence of "random" inputs) won't > likely cause a failure. > > Is there anything in scipy's stats module to test that a series of random > variables has a given distribution? ?Maybe scipy.stats.kstest? ?Who the heck > are Kolmogorov and Smirnov anyway :-)? Focusing on the hypothesis testing part, what my code does is take a collection of photons and return the probability that they are drawn from a pulsed distribution. My prior has two alternatives: not pulsed (p=0.5) and pulsed (p=0.5, parameters randomly chosen). I feed this to a Bayesian gizmo and get back a probability that the photons were drawn from the former case. In terms of testing, the very crude tests pass: if I give it a zillion photons, it can correctly distinguish pulsed from unpulsed. But what I'd like to test is whether the probability it returns is correct. What I'd really like is some statistical test I can do on the procedure to check whether the returned numbers are correct. Of course, if I knew what distribution they were supposed to have, I could just feed them to the K-S test. But I don't. Part of the problem is that the data quality affects the result: if feed in a zillion unpulsed photons, I get a variety of probabilities that are close to zero - but how close is correct? I have no idea. If I use pulsed photons, it is even more complicated: for a large pulsed fraction, I'll get a variety of probabilities that are close to one. But if I either reduce the number of photons or the pulsed fraction, it gets harder to distinguish pulsed from unpulsed and the probabilities start to drop. But I have no real idea how much. Anne > Anne Archibald-2 wrote: >> >> Hi, >> >> I have implemented a simple Bayesian regression program (it takes >> events modulo one and returns a posterior probability that the data is >> phase-invariant plus a posterior distribution for two parameters >> (modulation fraction and phase) in case there is modulation). I'm >> rather new at this, so I'd like to construct some unit tests. Does >> anyone have any suggestions on how to go about this? >> >> For a frequentist periodicity detector, the return value is a >> probability that, given the null hypothesis is true, the statistic >> would be this extreme. So I can construct a strong unit test by >> generating a collection of data sets given the null hypothesis, >> evaluating the statistic, and seeing whether the number that claim to >> be significant at a 5% level is really 5%. (In fact I can use the >> binomial distribution to get limits on the number of false positive.) >> This gives me a unit test that is completely orthogonal to my >> implementation, and that passes if and only if the code works. For a >> Bayesian hypothesis testing setup, I don't really see how to do >> something analogous. >> >> I can generate non-modulated data sets and confirm that my code >> returns a high probability that the data is not modulated, but how >> high should I expect the probability to be? I can generate data sets >> with models with known parameters and check that the best-fit >> parameters are close to the known parameters - but how close? Even if >> I do it many times, is the posterior mean unbiased? What about the >> posterior mode or median? 
I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? I feel sure there's >> some test that verifies a statistical property of Bayesian >> estimators/hypothesis testers, but I cant quite put my finger on it. >> >> Suggestions welcome. >> >> Thanks, >> Anne >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/Unit-testing-of-Bayesian-estimator-tp26240654p26241135.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 02:47:04 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 02:47:04 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: 2009/11/7 Bruce Southey : > On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald > wrote: >> Hi, >> >> I have implemented a simple Bayesian regression program (it takes >> events modulo one and returns a posterior probability that the data is >> phase-invariant plus a posterior distribution for two parameters >> (modulation fraction and phase) in case there is modulation). > > I do not know your field, a little rusty on certain issues and I do > not consider myself a Bayesian. > > Exactly what type of Bayesian did you use? > I also do not know how you implemented it especially if it is > empirical or Monte Carlo Markov Chains. It's an ultra-simple toy problem, really: I did the numerical integration in the absolute simplest way possible, by evaluating the quantity to be evaluated on a grid and averaging. See github for details: http://github.com/aarchiba/bayespf I can certainly improve on this, but I'd rather get my testing issues sorted out first, so that I can test the tests, as it were, on an implementation I'm reasonably confident is correct, before changing it to a mathematically more subtle one. >> I'm >> rather new at this, so I'd like to construct some unit tests. Does >> anyone have any suggestions on how to go about this? > > Since this is a test, the theoretical 'correctness' is irrelevant. So > I would guess that you should use very informative priors and data > with a huge amount of information. That should make the posterior have > an extremely narrow range so your modal estimate is very close to the > true value within a very small range. This doesn't really test whether the estimator is doing a good job, since if I throw mountains of information at it, even a rather badly wrong implementation will eventually converge to the right answer. (This is painful experience speaking.) I disagree on the issue of theoretical correctness, though. The best tests do exactly that: test the theoretical correctness of the routine in question, ideally without any reference to the implementation. To test the SVD, for example, you just test that the two matrices are both orthogonal, and you test that multiplying them together with the singular values between gives you your original matrix. If your implementation passes this test, it is computing the SVD just fine, no matter what it looks like inside. 
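In code that check is only a few lines; here numpy's own svd stands in for the routine under test:

import numpy as np

def check_svd(a, u, s, vt):
    # the claimed factors must reproduce the input matrix...
    assert np.allclose(np.dot(u * s, vt), a)
    # ...and must be orthogonal
    assert np.allclose(np.dot(u.T, u), np.eye(u.shape[1]))
    assert np.allclose(np.dot(vt, vt.T), np.eye(vt.shape[0]))

a = np.random.randn(20, 5)
u, s, vt = np.linalg.svd(a, full_matrices=False)
check_svd(a, u, s, vt)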
With the frequentist signal-detection statistics I'm more familiar with, I can write exactly this sort of test. I talk a little more about it here: http://lighthouseinthesky.blogspot.com/2009/11/testing-statistical-tests.html This works too well, it turns out, to apply to scipy's K-S test or my own Kuiper test, since their p-values are calculated rather approximately, so they fail. > After that it really depends on the algorithm, the data used and what > you need to test. Basically you just have to say given this set of > inputs I get this 'result' that I consider reasonable. After all, if > the implementation of algorithm works then it is most likely the > inputs that are a problem. In statistics, problems usually enter > because the desired model can not be estimated from the provided data. > Separation of user errors from a bug in the code usually identified by > fitting simpler or alternative models. It's exactly the implementation I don't trust, here. I can scrutinize the implementation all I like, but I'd really like an independent check on my calculations, and staring at the code won't get me that. >> >> For a frequentist periodicity detector, the return value is a >> probability that, given the null hypothesis is true, the statistic >> would be this extreme. So I can construct a strong unit test by >> generating a collection of data sets given the null hypothesis, >> evaluating the statistic, and seeing whether the number that claim to >> be significant at a 5% level is really 5%. (In fact I can use the >> binomial distribution to get limits on the number of false positive.) >> This gives me a unit test that is completely orthogonal to my >> implementation, and that passes if and only if the code works. For a >> Bayesian hypothesis testing setup, I don't really see how to do >> something analogous. >> >> I can generate non-modulated data sets and confirm that my code >> returns a high probability that the data is not modulated, but how >> high should I expect the probability to be? I can generate data sets >> with models with known parameters and check that the best-fit >> parameters are close to the known parameters - but how close? Even if >> I do it many times, is the posterior mean unbiased? What about the >> posterior mode or median? I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? I feel sure there's >> some test that verifies a statistical property of Bayesian >> estimators/hypothesis testers, but I cant quite put my finger on it. >> >> Suggestions welcome. >> >> Thanks, >> Anne > > Please do not mix Frequentist or Likelihood concepts with Bayesian. > Also you never generate data for estimation from the prior > distribution, you generate it from the posterior distribution as that > is what your estimating. Um. I would be picking models from the prior distribution, not data. However I find the models, I have a well-defined way to generate data from the model. Why do you say it's a bad idea to mix Bayesian and frequentist approaches? It seems to me that as I use them to try to answer similar questions, it makes sense to compare them; and since I know how to test frequentist estimators, it's worth seeing whether I can cast Bayesian estimators in frequentist terms, at least for testing purposes. > Really in Bayesian sense all this data generation is unnecessary > because you have already calculated that information in computing the > posteriors. 
The posterior of a parameter is a distribution not a > single number so you just compare distributions. ?For example, you can > compute modal values and construct Bayesian credible intervals of the > parameters. These should make very strong sense to the original values > simulated. I take this to mean that I don't need to do simulations to get credible intervals (while I normally would have to to get confidence intervals), which I agree with. But this is a different question: I'm talking about constructing a test by simulating the whole Bayesian process and seeing whether it behaves as it should. The problem is coming up with a sufficiently clear mathematical definition of "should". > For Bayesian work, you must address the data and the priors. In > particular, you need to be careful about the informativeness of the > prior. You can get great results just because your prior was > sufficiently informative but you can get great results because you > data was very informative. > > Depending on how it was implemented, a improper prior can be an issue > because these do not guarantee a proper posterior (but often do lead > to proper posteriors). So if your posterior is improper then you are > in a very bad situation and can lead to weird results some or all of > the time.Some times this is can easily be fixed such as by putting > bounds on flat priors. Whereas proper priors give proper posteriors. Indeed. I think my priors are pretty safe: 50% chance it's pulsed, flat priors in phase and pulsed fraction. In the long run I might want a slightly smarter prior on pulsed fraction, but for the moment I think it's fine. > But as a final comment, it should not matter which approach you use as > if you do not get what you simulated then either your code is wrong or > you did not simulate what your code implements. (Surprising how > frequent the latter is.) This is a bit misleading. If I use a (fairly) small number of photons, and/or a fairly small pulsed fraction, I should be astonished if I got back the model parameters exactly. I know already that the data leave a lot of room for slop, so what I am trying to test is how well this Bayesian gizmo quantifies that slop. Anne > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tmp50 at ukr.net Sun Nov 8 06:22:34 2009 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 08 Nov 2009 13:22:34 +0200 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions Message-ID: Hi scipy.sparse developers and all other scipy users, I'm trying to take benefits for solving SLEs in FuncDesigner via involving scipy.sparse. Some examples are here http://openopt.org/FuncDesignerDoc#Solving_systems_of_linear_equations and example for sparse SLEs is here http://trac.openopt.org/openopt/browser/PythonPackages/FuncDesigner/FuncDesigner/examples/sparseSLE.py It already works faster than using dense matrices, but I want to speedup it even more, so I have some questions and seems like bug report (scipy.__version__ 0.7.0): from scipy import sparse from numpy import * a=sparse.lil_matrix((3,1)) a[0:3,:] = ones(3) print a.todense() #prints [[ 1.] ?[ 0.] ?[ 0.]] while I expect all-ones Questions: 1) Seems like a[some_ind,:]=something works very, very slow for lil. I have implemented a workaround, but can I use a[some_ind,:] for another format than lil? (seems like all other ones doesn't support it). 2) What is current situation with matmat and matvec functions? 
They say "deprecated" but no alternative is mentioned. 3) What is the current situation with scipy.sparse.linalg.spsolve? It says /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) But I don't want my code to be dependent on a scikits module. Is there another default/autoselect solver for sparse SLEs? If not, which one would you recommend I use as the default for sparse SLEs - bicg, gmres, something else? Thank you in advance, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eadrogue at gmx.net Sun Nov 8 10:16:25 2009 From: eadrogue at gmx.net (Ernest Adrogué) Date: Sun, 8 Nov 2009 16:16:25 +0100 Subject: [SciPy-User] the skellam distribution Message-ID: <20091108151625.GA561@doriath.local> Hi, In case somebody is interested, or you want to include it in scipy. I used these specs here from the R package: cran.r-project.org/web/packages/skellam/skellam.pdf Note that I am no statistician, somebody who knows what he's doing (as opposed to me ;) should verify it's correct.

import numpy
import scipy.stats.distributions

# Skellam distribution
ncx2 = scipy.stats.distributions.ncx2

class skellam_gen(scipy.stats.distributions.rv_discrete):
    def _pmf(self, x, mu1, mu2):
        if x < 0:
            px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2
        else:
            px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2
        return px
    def _cdf(self, x, mu1, mu2):
        x = numpy.floor(x)
        if x < 0:
            px = ncx2.cdf(2*mu2, x*(-2), 2*mu1)
        else:
            px = 1 - ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)
        return px
    def _stats(self, mu1, mu2):
        mean = mu1 - mu2
        var = mu1 + mu2
        g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3)
        g2 = 1 / (mu1 + mu2)
        return mean, var, g1, g2

skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
                      shapes="mu1,mu2", extradoc="")

Bye. -- Ernest From vanforeest at gmail.com Sun Nov 8 15:30:36 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 8 Nov 2009 21:30:36 +0100 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: <1cd32cbb0911052102r40a066e3v86a5ed62cdf954aa@mail.gmail.com> References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> <1cd32cbb0911052102r40a066e3v86a5ed62cdf954aa@mail.gmail.com> Message-ID: Hi Joseph, > Thanks Nicky, I looked at some papers by Ward Whitt and they look > interesting but much more than what I want to chew on right now. I understand. I wish I had the time to study these papers in more detail. > I don't think I ever needed a path integral in my life, That must be a most undesirable state of affairs, :-) > integral exp(i t x) dF(x) = integrate.quad(real(exp(itx)*f(x))) + j * > integrate.quad(imag(exp(itx)*f(x))) > or is there another way? Perhaps you recall that Re(exp(ix)) = cos(x), and Im(exp(ix)) = sin(x). Hence, you might try simply: integrate.quad(cos(t*x)*f(x)) + i * integrate.quad(sin(t*x)*f(x)) (untested code though..) I have my doubts about the stability of these integrations, although I am by no means an expert on this. Suppose that t is big. Then cos(tx) varies rapidly in comparison to f(x) as a function of x. Then you are adding lots of negative and positive numbers of roughly the same size... This must result in bogus.
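For what it is worth, here is a small sketch of what I mean, on the standard normal where the exact answer exp(-t**2/2) is known (again, untested beyond this toy case):

import numpy as np
from scipy import integrate, stats

def char_fn(t, pdf, lower, upper):
    # E[exp(itX)] = integral of cos(t*x)*f(x) plus i times integral of sin(t*x)*f(x)
    re = integrate.quad(lambda x: np.cos(t * x) * pdf(x), lower, upper, limit=200)[0]
    im = integrate.quad(lambda x: np.sin(t * x) * pdf(x), lower, upper, limit=200)[0]
    return re + 1j * im

for t in (0.5, 2.0, 10.0):
    print t, char_fn(t, stats.norm.pdf, -8.0, 8.0), np.exp(-t**2 / 2.0)

Already at t = 10 the exact value is about 2e-22, far below the absolute error of the quadrature, so whatever comes back at that point is mostly noise; that is the effect I mean.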
Perhaps it is better not to include any "generic" code to tranform the characteristic function into a density, unless the methods work reasonably well. bye Nicky From josef.pktd at gmail.com Sun Nov 8 17:14:58 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 8 Nov 2009 17:14:58 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> Message-ID: <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> On Sun, Nov 8, 2009 at 2:14 AM, Anne Archibald wrote: > 2009/11/6 ?: >> On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald >> wrote: >>> Hi, >>> >>> I have implemented a simple Bayesian regression program (it takes >>> events modulo one and returns a posterior probability that the data is >>> phase-invariant plus a posterior distribution for two parameters >>> (modulation fraction and phase) in case there is modulation). I'm >>> rather new at this, so I'd like to construct some unit tests. Does >>> anyone have any suggestions on how to go about this? >>> >>> For a frequentist periodicity detector, the return value is a >>> probability that, given the null hypothesis is true, the statistic >>> would be this extreme. So I can construct a strong unit test by >>> generating a collection of data sets given the null hypothesis, >>> evaluating the statistic, and seeing whether the number that claim to >>> be significant at a 5% level is really 5%. (In fact I can use the >>> binomial distribution to get limits on the number of false positive.) >>> This gives me a unit test that is completely orthogonal to my >>> implementation, and that passes if and only if the code works. For a >>> Bayesian hypothesis testing setup, I don't really see how to do >>> something analogous. >>> >>> I can generate non-modulated data sets and confirm that my code >>> returns a high probability that the data is not modulated, but how >>> high should I expect the probability to be? I can generate data sets >>> with models with known parameters and check that the best-fit >>> parameters are close to the known parameters - but how close? Even if >>> I do it many times, is the posterior mean unbiased? What about the >>> posterior mode or median? I can even generate models and then data >>> sets that are drawn from the prior distribution, but what should I >>> expect from the code output on such a data set? I feel sure there's >>> some test that verifies a statistical property of Bayesian >>> estimators/hypothesis testers, but I cant quite put my finger on it. >> >> >> The Bayesian experts are at pymc, maybe you can look at there tests >> for inspiration. I don't know those, since I never looked at that >> part. >> >> I never tried to test a Bayesian estimator but many properties are >> still the same as in the non-Bayesian analysis. In my Bayesian past, I >> essentially only used normal and t distributions, and binomial. >> >> One of my first tests for these things is to create a huge sample and >> see whether the parameter estimates converge. With Bayesian analysis >> you still have the law of large numbers, (for non-dogmatic priors) > > As far as getting the code roughly working, this is what I used; just > run it generating lots of photons and see that roughly the right > parameters come out. 
Unfortunately, this isn't really very sensitive > to all the things that are supposed to make a Bayesian estimator > better than (say) a maximum-likelihood estimator; I could have the > probability estimation pretty badly wrong, but there are so many > photons that anything but the right parameters are such a horrible fit > even a somewhat wrong algorithm won't select them. > > Maybe you meant something different: I could also try fixing some > model parameters and generating just a handful of photons, so I get a > crummy estimate, but then repeating the photon generation and fit many > times, to see if the average value of the best-fit parameters comes > out close to the true parameters. But this is a test for unbiasedness > of the estimator, and it's not clear that this estimator should be > unbiased even if correct. When I do a Monte Carlo for point estimates, I usually check bias, variance, mean squared error, and mean absolute and median absolute error (which is a more robust to outliers, e.g. because for some cases the estimator produces numerical nonsense because of non-convergence or other numerical problems). MSE captures better cases of biased estimators that are better in MSE sense. I ran your test, test_bayes.py for M = 50, 500 and 1000 adding "return in_interval_f" and inside = test_credible_interval() If my reading is correct inside should be 80% of M, and you are pretty close. (M=1000 is pretty slow on my notebook) >>> inside 39 >>> 39/50. 0.78000000000000003 >>> >>> inside 410 >>> inside/500. 0.81999999999999995 >>> >>> inside/1000. 0.81499999999999995 I haven't looked enough on the details yet, but I think this way you could test more quantiles of the distribution, to see whether the posterior distribution is roughly the same as the sampling distribution in the MonteCarlo. In each iteration of the Monte Carlo you get a full posterior distribution, after a large number of iterations you have a sampling distribution, and it should be possible to compare this distribution with the posterior distributions. I'm still not sure how. two questions to your algorithm Isn't np.random.shuffle(r) redundant? I didn't see anywhere were the sequence of observation in r would matter. Why do you subtract mx in the loglikelihood function? mx = np.amax(lpdf) p = np.exp(lpdf - mx)/np.average(np.exp(lpdf-mx)) > >> Do you have an example with a known posterior? Then, the posterior >> with a large sample or the average in a Monte Carlo should still be >> approximately the true one. >> For symmetric distributions, the Bayesian posterior confidence >> intervals and posterior mean should be roughly the same as the >> frequentist estimates. With diffuse priors, in many cases the results >> are exactly the same in Bayesian and MLE. >> Another version I used in the past is to trace the posterior mean, as >> the prior variance is reduced, in one extreme you should get the prior >> back in the other extreme the MLE. > > My priors are flat, on (0,1) in both phase and pulsed fraction. It > seems a bit peculiar to use anything else in phase, but I can imagine > some sort of logarithmic prior for pulsed fraction (making 0.01-0.1 > equally likely to 0.1-1). I haven't experimented with introducing > localized priors, but it seems like that too wouldn't be very > sensitive to whether the Bayesian calculation is right; if the prior > insists that the values are both 0.5, then any remotely sane algorithm > will come up with posteriors that are also 0.5. 
In the last sentence, you better hope that, if the true fraction is 0.1 than the posterior should be concentrated around 0.1 and not around 0.5. Right now you don't have an explicit prior, but once you use one, you might want to test the effects of an informative prior. For binomial (fraction) the natural prior is the beta distribution, if I remember correctly. But I don't know if the marginal posterior in this case would also be beta. > >>> I can even generate models and then data >>> sets that are drawn from the prior distribution, but what should I >>> expect from the code output on such a data set? >> >> If you ignore the Bayesian interpretation, then this is just a >> standard sampling problem, you draw prior parameters and observations, >> the rest is just finding the conditional and marginal probabilities. I >> think the posterior odds ratio should converge in a large Monte Carlo >> to the true one, and the significance levels should correspond to the >> one that has been set for the test (5%). >> (In simplest case of conjugate priors, you can just interpret the >> prior as a previous sample and you are back to a frequentist >> explanation.) > > This sounds like what I was trying for - draw a model according to the > priors, then generate a data set according to the model. I then get > some numbers out: the simplest is a probability that the model was > pulsed, but I can also get a credible interval or an estimated CDF for > the model parameters. ?But I'm trying to figure out what test I should > apply to those values to see if they make sense. > > For a credible interval, I suppose I could take (say) a 95% credible > interval, then 95 times out of a hundred the model parameters I used > to generate the data set should be in the credible interval. And I > should be able to use the binomial distribution to put limits on how > close to 95% I should get in M trials. This seems to work, but I'm not > sure I understand why. The credible region is obtained from a > probability distribution for the model parameters, but I am turning > things around and testing the distribution of credible regions. If you ignore the Bayesian belief interpretation, then it's just a problem of Probability Theory, and you are just checking the small and large sample behavior of an estimator and a test, whether it has a Bayesian origin or not. > > In any case, that seems to work, so now I just need to figure out a > similar test for the probability of being pulsed. "probability of being pulsed" I'm not sure what test you have in mind. There are two interpretations: In your current example, fraction is the fraction of observations that are pulsed and fraction=0 is a zero probability event. So you cannot really test fraction==0 versus fraction >0. In the other interpretation you would have a prior probability (mass) that your star is a pulsar with fraction >0 or a non-pulsing unit with fraction=0. The probabilities in both cases would be similar, but the interpretation of the test would be different, and differ between frequentists and Bayesians. Overall your results look almost too "nice", with 8000 observations you get a very narrow posterior in the plot. Josef > >> The problem is that with an informative prior, you always have a >> biased estimator in small samples and the posterior odds ratio is >> affected by an informative prior. ?And "real" Bayesians don't care >> about sampling properties. >> >> What are your prior distributions and the likelihood function in your >> case? 
Can you model degenerate and diffuse priors, so that an >> informative prior doesn't influence you sampling results? >> I'm trying to think of special cases where you could remove the effect >> of the prior. > > I've put the code on github in case that helps make this any clearer: > http://github.com/aarchiba/bayespf > > The model I'm using is either: completely uniform mod 1 (probability > 0.5) or: the PDF is a cosine plus a constant, where the two parameters > are the fraction of area under the cosine (as opposed to under the > constant) and the phase offset of the cosine. The likelihood is just > (modulo logs for range issues) the product over all observed phases x > of PDF(fraction, phase, x). So the mode of the posterior is exactly > the maximum-likelihood estimate (whether or not I got the math right, > more or less). > >> It's a bit vague because I don't see the details, and I haven't looked >> at this in a while. > > As is probably obvious, I'm pretty vague on Bayesian statistics in > general. But I'm working on it. > > Anne > >> >> >> >> >> >> >>> >>> Suggestions welcome. >>> >>> Thanks, >>> Anne >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 17:51:08 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 17:51:08 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> Message-ID: 2009/11/8 : > When I do a Monte Carlo for point estimates, I usually check bias, > variance, mean squared error, > and mean absolute and median absolute error (which is a more > robust to outliers, e.g. because for some cases the estimator produces > numerical nonsense because of non-convergence or other numerical > problems). MSE captures better cases of biased estimators that are > better in MSE sense. I can certainly compute all these quantities from a collection of Monte Carlo runs, but I don't have any idea what values would indicate correctness, apart from "not too big". > I ran your test, test_bayes.py for M = 50, 500 and 1000 adding "return > in_interval_f" > and inside = test_credible_interval() > > If my reading is correct inside should be 80% of M, and you are pretty close. > (M=1000 is pretty slow on my notebook) Yeah, that's the problem with using the world's simplest numerical integration scheme. >>>> inside > 39 >>>> 39/50. > 0.78000000000000003 >>>> >>>> inside > 410 >>>> inside/500. > 0.81999999999999995 >>>> >>>> inside/1000. > 0.81499999999999995 > > I haven't looked enough on the details yet, but I think this way you could > test more quantiles of the distribution, to see whether the posterior > distribution is roughly the same as the sampling distribution in the > MonteCarlo. 
I could test more quantiles, but I'm very distrustful of testing more than one quantile per randomly-generated sample: they should be covariant (if the 90% mark is too high, the 95% mark will almost certainly be too high as well) and I don't know how to take that into account. And running the test is currently so slow I'm inclined to spend my CPU time on a stricter test of a single quantile. Though unfortunately to increase the strictness I also need to improve the sampling in phase and fraction. > In each iteration of the Monte Carlo you get a full posterior distribution, > after a large number of iterations you have a sampling distribution, > and it should be possible to compare this distribution with the > posterior distributions. I'm still not sure how. I don't understand what you mean here. I do get a full posterior distribution out of every simulation. But how would I combine these different distributions, and what would the combined distribution mean? > two questions to your algorithm > > Isn't np.random.shuffle(r) redundant? > I didn't see anywhere were the sequence of observation in r would matter. It is technically redundant. But since the point of all this is that I don't trust my code to be right, I want to make sure there's no way it can "cheat" by taking advantage of the order. And in any case, the slow part is my far-too-simple numerical integration scheme. I'm pretty sure the phase integration, at least, could be done analytically. > Why do you subtract mx in the loglikelihood function? > ? ?mx = np.amax(lpdf) > ? ?p = np.exp(lpdf - mx)/np.average(np.exp(lpdf-mx)) This is to avoid overflows. I could just use logsumexp/logaddexp, but that's not yet in numpy on any of the machines I regularly use. It has no effect on the value, since it's subtracted from top and bottom both, but it ensures that the largest value exponentiated is exactly zero. >>>> I can even generate models and then data >>>> sets that are drawn from the prior distribution, but what should I >>>> expect from the code output on such a data set? >>> >>> If you ignore the Bayesian interpretation, then this is just a >>> standard sampling problem, you draw prior parameters and observations, >>> the rest is just finding the conditional and marginal probabilities. I >>> think the posterior odds ratio should converge in a large Monte Carlo >>> to the true one, and the significance levels should correspond to the >>> one that has been set for the test (5%). >>> (In simplest case of conjugate priors, you can just interpret the >>> prior as a previous sample and you are back to a frequentist >>> explanation.) >> >> This sounds like what I was trying for - draw a model according to the >> priors, then generate a data set according to the model. I then get >> some numbers out: the simplest is a probability that the model was >> pulsed, but I can also get a credible interval or an estimated CDF for >> the model parameters. ?But I'm trying to figure out what test I should >> apply to those values to see if they make sense. >> >> For a credible interval, I suppose I could take (say) a 95% credible >> interval, then 95 times out of a hundred the model parameters I used >> to generate the data set should be in the credible interval. And I >> should be able to use the binomial distribution to put limits on how >> close to 95% I should get in M trials. This seems to work, but I'm not >> sure I understand why. 
The credible region is obtained from a >> probability distribution for the model parameters, but I am turning >> things around and testing the distribution of credible regions. > > If you ignore the Bayesian belief interpretation, then it's just a > problem of Probability Theory, and you are just checking the > small and large sample behavior of an estimator and a test, > whether it has a Bayesian origin or not. Indeed. But with frequentist tests, I have a clear statement of what they're telling me that I can test against: "If you feed this test pure noise you'll get a result this high with probability p". I haven't figured out how to turn the p-value returned by this test into something I can test against. >> In any case, that seems to work, so now I just need to figure out a >> similar test for the probability of being pulsed. > > "probability of being pulsed" > I'm not sure what test you have in mind. > There are two interpretations: > In your current example, fraction is the fraction of observations that > are pulsed and fraction=0 is a zero probability event. So you cannot > really test fraction==0 versus fraction >0. > > In the other interpretation you would have a prior probability (mass) > that your star is a pulsar with fraction >0 or a non-pulsing unit > with fraction=0. This is what the code currently implements: I begin with a 50% chance the signal is unpulsed and a 50% chance the signal is pulsed with some fraction >= 0. > The probabilities in both cases would be similar, but the interpretation > of the test would be different, and differ between frequentists and > Bayesians. > > Overall your results look almost too "nice", with 8000 observations > you get a very narrow posterior in the plot. If you supply a fairly high pulsed fraction, it's indeed easy to tell that it's pulsed with 8000 photons; the difficulty comes when you're looking for a 10% pulsed fraction; it's much harder than 800 photons with a 100% pulsed fraction. If I were really interested in the many-photons case I'd want to think about a prior that made more sense for really small fractions. But I'm keeping things simple for now. Anne From josef.pktd at gmail.com Sun Nov 8 21:35:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 8 Nov 2009 21:35:18 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> Message-ID: <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> On Sun, Nov 8, 2009 at 5:51 PM, Anne Archibald wrote: > 2009/11/8 ?: > >> When I do a Monte Carlo for point estimates, I usually check bias, >> variance, mean squared error, >> and mean absolute and median absolute error (which is a more >> robust to outliers, e.g. because for some cases the estimator produces >> numerical nonsense because of non-convergence or other numerical >> problems). MSE captures better cases of biased estimators that are >> better in MSE sense. > > I can certainly compute all these quantities from a collection of > Monte Carlo runs, but I don't have any idea what values would indicate > correctness, apart from "not too big". I consider them mainly as an absolute standard to see how well an estimator works (or what the size and power of a test is) or to compare them to other estimators, which is a common case for publishing Monte Carlo studies for new estimators. 
> >> I ran your test, test_bayes.py for M = 50, 500 and 1000 adding "return >> in_interval_f" >> and inside = test_credible_interval() >> >> If my reading is correct inside should be 80% of M, and you are pretty close. >> (M=1000 is pretty slow on my notebook) > > Yeah, that's the problem with using the world's simplest numerical > integration scheme. > >>>>> inside >> 39 >>>>> 39/50. >> 0.78000000000000003 >>>>> >>>>> inside >> 410 >>>>> inside/500. >> 0.81999999999999995 >>>>> >>>>> inside/1000. >> 0.81499999999999995 >> >> I haven't looked enough on the details yet, but I think this way you could >> test more quantiles of the distribution, to see whether the posterior >> distribution is roughly the same as the sampling distribution in the >> MonteCarlo. > > I could test more quantiles, but I'm very distrustful of testing more > than one quantile per randomly-generated sample: they should be > covariant (if the 90% mark is too high, the 95% mark will almost > certainly be too high as well) and I don't know how to take that into > account. And running the test is currently so slow I'm inclined to > spend my CPU time on a stricter test of a single quantile. Though > unfortunately to increase the strictness I also need to improve the > sampling in phase and fraction. Adding additional quantiles might be relatively cheap, mainly the call to searchsorted. One or two quantiles could be consistent with many different distributions or e.g with fatter tails, so I usually check more points. > >> In each iteration of the Monte Carlo you get a full posterior distribution, >> after a large number of iterations you have a sampling distribution, >> and it should be possible to compare this distribution with the >> posterior distributions. I'm still not sure how. > > I don't understand what you mean here. I do get a full posterior > distribution out of every simulation. But how would I combine these > different distributions, and what would the combined distribution > mean? I'm still trying to think how this can be done. Checking more quantiles as discussed above might be doing it to some extend. (I also wonder whether it might be useful to fix the observations during the monte carlo and only vary the sampling of the parameters ?) > >> two questions to your algorithm >> >> Isn't np.random.shuffle(r) redundant? >> I didn't see anywhere were the sequence of observation in r would matter. > > It is technically redundant. But since the point of all this is that I > don't trust my code to be right, I want to make sure there's no way it > can "cheat" by taking advantage of the order. And in any case, the > slow part is my far-too-simple numerical integration scheme. I'm > pretty sure the phase integration, at least, could be done > analytically. > >> Why do you subtract mx in the loglikelihood function? >> ? ?mx = np.amax(lpdf) >> ? ?p = np.exp(lpdf - mx)/np.average(np.exp(lpdf-mx)) > > This is to avoid overflows. I could just use logsumexp/logaddexp, but > that's not yet in numpy on any of the machines I regularly use. It has > no effect on the value, since it's subtracted from top and bottom > both, but it ensures that the largest value exponentiated is exactly > zero. > >>>>> I can even generate models and then data >>>>> sets that are drawn from the prior distribution, but what should I >>>>> expect from the code output on such a data set? 
>>>> >>>> If you ignore the Bayesian interpretation, then this is just a >>>> standard sampling problem, you draw prior parameters and observations, >>>> the rest is just finding the conditional and marginal probabilities. I >>>> think the posterior odds ratio should converge in a large Monte Carlo >>>> to the true one, and the significance levels should correspond to the >>>> one that has been set for the test (5%). >>>> (In simplest case of conjugate priors, you can just interpret the >>>> prior as a previous sample and you are back to a frequentist >>>> explanation.) >>> >>> This sounds like what I was trying for - draw a model according to the >>> priors, then generate a data set according to the model. I then get >>> some numbers out: the simplest is a probability that the model was >>> pulsed, but I can also get a credible interval or an estimated CDF for >>> the model parameters. ?But I'm trying to figure out what test I should >>> apply to those values to see if they make sense. >>> >>> For a credible interval, I suppose I could take (say) a 95% credible >>> interval, then 95 times out of a hundred the model parameters I used >>> to generate the data set should be in the credible interval. And I >>> should be able to use the binomial distribution to put limits on how >>> close to 95% I should get in M trials. This seems to work, but I'm not >>> sure I understand why. The credible region is obtained from a >>> probability distribution for the model parameters, but I am turning >>> things around and testing the distribution of credible regions. >> >> If you ignore the Bayesian belief interpretation, then it's just a >> problem of Probability Theory, and you are just checking the >> small and large sample behavior of an estimator and a test, >> whether it has a Bayesian origin or not. > > Indeed. But with frequentist tests, I have a clear statement of what > they're telling me that I can test against: "If you feed this test > pure noise you'll get a result this high with probability p". I > haven't figured out how to turn the p-value returned by this test into > something I can test against. What exactly are the null and the alternative hypothesis that you want to test? This is still not clear to me, see also below. > >>> In any case, that seems to work, so now I just need to figure out a >>> similar test for the probability of being pulsed. >> >> "probability of being pulsed" >> I'm not sure what test you have in mind. >> There are two interpretations: >> In your current example, fraction is the fraction of observations that >> are pulsed and fraction=0 is a zero probability event. So you cannot >> really test fraction==0 versus fraction >0. >> >> In the other interpretation you would have a prior probability (mass) >> that your star is a pulsar with fraction >0 or a non-pulsing unit >> with fraction=0. > > This is what the code currently implements: I begin with a 50% chance > the signal is unpulsed and a 50% chance the signal is pulsed with some > fraction >= 0. I don't see this in generate you have m = np.random.binomial(n, fraction) where m is the number of pulsed observations. the probability of observing no pulsed observations is very small >>> stats.binom.pmf(0,100,0.05) 0.0059205292203340009 your likelihood function pdf_data_given_model also treats each observation with equal fraction to be pulsed or not. 
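A rough sketch of the kind of per-observation mixture likelihood being discussed; the cosine pulse profile below is an arbitrary illustrative choice, not necessarily the profile that pdf_data_given_model actually uses:

import numpy as np

def loglikelihood(phases, fraction, phase0):
    # each photon phase in [0, 1) is uniform with probability (1 - fraction)
    # and drawn from a pulse profile centered on phase0 with probability fraction
    profile = 1.0 + np.cos(2*np.pi*(phases - phase0))   # integrates to 1 on [0, 1)
    return np.sum(np.log((1.0 - fraction) + fraction*profile))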
I would have expected something like case1: pulsed according to binomial fraction, same as now case2: no observations are pulsed prior Prob(case1)=0.5, Prob(case2)=0.5 Or am I missing something? Where is there a test? You have a good posterior distribution for the fraction, which imply a point estimate and confidence interval, which look good from the tests. But I don't see a test hypothesis, (especially a Bayesian statement) Josef > >> The probabilities in both cases would be similar, but the interpretation >> of the test would be different, and differ between frequentists and >> Bayesians. >> >> Overall your results look almost too "nice", with 8000 observations >> you get a very narrow posterior in the plot. > > If you supply a fairly high pulsed fraction, it's indeed easy to tell > that it's pulsed with 8000 photons; the difficulty comes when you're > looking for a 10% pulsed fraction; it's much harder than 800 photons > with a 100% pulsed fraction. If I were really interested in the > many-photons case I'd want to think about a prior that made more sense > for really small fractions. But I'm keeping things simple for now. > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 22:22:36 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 22:22:36 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> Message-ID: 2009/11/8 : > On Sun, Nov 8, 2009 at 5:51 PM, Anne Archibald > wrote: >> 2009/11/8 ?: >> >>> When I do a Monte Carlo for point estimates, I usually check bias, >>> variance, mean squared error, >>> and mean absolute and median absolute error (which is a more >>> robust to outliers, e.g. because for some cases the estimator produces >>> numerical nonsense because of non-convergence or other numerical >>> problems). MSE captures better cases of biased estimators that are >>> better in MSE sense. >> >> I can certainly compute all these quantities from a collection of >> Monte Carlo runs, but I don't have any idea what values would indicate >> correctness, apart from "not too big". > > I consider them mainly as an absolute standard to see how well > an estimator works (or what the size and power of a test is) or > to compare them to other estimators, which is a common case for > publishing Monte Carlo studies for new estimators. Ah. Yes, I will definitely be wanting to compute these at some point. But first I'd like to make sure this estimator is doing what I want it to. >> I could test more quantiles, but I'm very distrustful of testing more >> than one quantile per randomly-generated sample: they should be >> covariant (if the 90% mark is too high, the 95% mark will almost >> certainly be too high as well) and I don't know how to take that into >> account. And running the test is currently so slow I'm inclined to >> spend my CPU time on a stricter test of a single quantile. Though >> unfortunately to increase the strictness I also need to improve the >> sampling in phase and fraction. > > Adding additional quantiles might be relatively cheap, mainly the > call to searchsorted. 
One or two quantiles could be consistent > with many different distributions or e.g with fatter tails, so I usually > check more points. As I said, I'm concerned about using more than one credible interval per simulation run, since the credible intervals for different quantiles will be different sizes. >>> In each iteration of the Monte Carlo you get a full posterior distribution, >>> after a large number of iterations you have a sampling distribution, >>> and it should be possible to compare this distribution with the >>> posterior distributions. I'm still not sure how. >> >> I don't understand what you mean here. I do get a full posterior >> distribution out of every simulation. But how would I combine these >> different distributions, and what would the combined distribution >> mean? > > I'm still trying to think how this can be done. Checking more quantiles > as discussed above might be doing it to some extend. > (I also wonder whether it might be useful to fix the observations > during the monte carlo and only vary the sampling of the parameters ?) I can see that fixing the data would be sort of nice, but it's not at all clear to me what it would even mean to vary the model while keeping the data constant - after all, the estimator has no access to the model, only the data, so varying the model would have no effect on the result returned. >> Indeed. But with frequentist tests, I have a clear statement of what >> they're telling me that I can test against: "If you feed this test >> pure noise you'll get a result this high with probability p". I >> haven't figured out how to turn the p-value returned by this test into >> something I can test against. > > What exactly are the null and the alternative hypothesis that you > want to test? This is still not clear to me, see also below. Null hypothesis: no pulsations, all photons are drawn from a uniform distributions. Alternative: photons are drawn from a distribution with pulsed fraction f and phase p. >> This is what the code currently implements: I begin with a 50% chance >> the signal is unpulsed and a 50% chance the signal is pulsed with some >> fraction >= 0. > > I don't see this > > in generate you have m = np.random.binomial(n, fraction) > where m is the number of pulsed observations. Generate can be used to generate photons from a uniform distribution by calling it with fraction set to zero. I don't actually do this while testing the credible intervals, because (as I understand it) the presence of this hypothesis does not affect the credible intervals. That is, the credible intervals I'm testing are the credible intervals assuming that the pulsations are real. I'm not at all sure how to incorporate the alternative hypothesis into my testing. > the probability of observing no pulsed observations is very > small >>>> stats.binom.pmf(0,100,0.05) > 0.0059205292203340009 > > your likelihood function pdf_data_given_model > also treats each observation with equal fraction to be pulsed or not. pdf_data_given_model computes the PDF given a set of model parameters. If you want to use it to get a likelihood with fraction=0, you can call it with fraction=0. But this likelihood is always zero. > I would have expected something like > case1: pulsed according to binomial fraction, same as now > case2: no observations are pulsed > prior Prob(case1)=0.5, Prob(case2)=0.5 > > Or am I missing something? > > Where is there a test? 
You have a good posterior distribution > for the fraction, which imply a point estimate and confidence interval, > which look good from the tests. > But I don't see a test hypothesis, (especially a Bayesian statement) When I came to implement it, the only place I actually needed to mention the null hypothesis was in the calculation of the pulsed probability, which is the last value returned by the inference routine. I did make the somewhat peculiar choice to have the model PDF returned normalized so that its total probability was one, rather than scaling all points by the probability that the system is pulsed at all. It turns out that the inference code just computes the total normalization S over all parameters; then the probability that the signal is pulsed is S/(S+1). Anne From josef.pktd at gmail.com Mon Nov 9 00:07:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 00:07:53 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> Message-ID: <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> On Sun, Nov 8, 2009 at 10:22 PM, Anne Archibald wrote: > 2009/11/8 ?: >> On Sun, Nov 8, 2009 at 5:51 PM, Anne Archibald >> wrote: >>> 2009/11/8 ?: >>> >>>> When I do a Monte Carlo for point estimates, I usually check bias, >>>> variance, mean squared error, >>>> and mean absolute and median absolute error (which is a more >>>> robust to outliers, e.g. because for some cases the estimator produces >>>> numerical nonsense because of non-convergence or other numerical >>>> problems). MSE captures better cases of biased estimators that are >>>> better in MSE sense. >>> >>> I can certainly compute all these quantities from a collection of >>> Monte Carlo runs, but I don't have any idea what values would indicate >>> correctness, apart from "not too big". >> >> I consider them mainly as an absolute standard to see how well >> an estimator works (or what the size and power of a test is) or >> to compare them to other estimators, which is a common case for >> publishing Monte Carlo studies for new estimators. > > Ah. Yes, I will definitely be wanting to compute these at some point. > But first I'd like to make sure this estimator is doing what I want it > to. > >>> I could test more quantiles, but I'm very distrustful of testing more >>> than one quantile per randomly-generated sample: they should be >>> covariant (if the 90% mark is too high, the 95% mark will almost >>> certainly be too high as well) and I don't know how to take that into >>> account. And running the test is currently so slow I'm inclined to >>> spend my CPU time on a stricter test of a single quantile. Though >>> unfortunately to increase the strictness I also need to improve the >>> sampling in phase and fraction. >> >> Adding additional quantiles might be relatively cheap, mainly the >> call to searchsorted. One or two quantiles could be consistent >> with many different distributions or e.g with fatter tails, so I usually >> check more points. > > As I said, I'm concerned about using more than one credible interval > per simulation run, since the credible intervals for different > quantiles will be different sizes. 
> >>>> In each iteration of the Monte Carlo you get a full posterior distribution, >>>> after a large number of iterations you have a sampling distribution, >>>> and it should be possible to compare this distribution with the >>>> posterior distributions. I'm still not sure how. >>> >>> I don't understand what you mean here. I do get a full posterior >>> distribution out of every simulation. But how would I combine these >>> different distributions, and what would the combined distribution >>> mean? >> >> I'm still trying to think how this can be done. Checking more quantiles >> as discussed above might be doing it to some extend. >> (I also wonder whether it might be useful to fix the observations >> during the monte carlo and only vary the sampling of the parameters ?) > > I can see that fixing the data would be sort of nice, but it's not at > all clear to me what it would even mean to vary the model while > keeping the data constant - after all, the estimator has no access to > the model, only the data, so varying the model would have no effect on > the result returned. > >>> Indeed. But with frequentist tests, I have a clear statement of what >>> they're telling me that I can test against: "If you feed this test >>> pure noise you'll get a result this high with probability p". I >>> haven't figured out how to turn the p-value returned by this test into >>> something I can test against. >> >> What exactly are the null and the alternative hypothesis that you >> want to test? This is still not clear to me, see also below. > > Null hypothesis: no pulsations, all photons are drawn from a uniform > distributions. > > Alternative: photons are drawn from a distribution with pulsed > fraction f and phase p. > >>> This is what the code currently implements: I begin with a 50% chance >>> the signal is unpulsed and a 50% chance the signal is pulsed with some >>> fraction >= 0. >> >> I don't see this >> >> in generate you have m = np.random.binomial(n, fraction) >> where m is the number of pulsed observations. > > Generate can be used to generate photons from a uniform distribution > by calling it with fraction set to zero. I don't actually do this > while testing the credible intervals, because ?(as I understand it) > the presence of this hypothesis does not affect the credible > intervals. That is, the credible intervals I'm testing are the > credible intervals assuming that the pulsations are real. I'm not at > all sure how to incorporate the alternative hypothesis into my > testing. > >> the probability of observing no pulsed observations is very >> small >>>>> stats.binom.pmf(0,100,0.05) >> 0.0059205292203340009 >> >> your likelihood function pdf_data_given_model >> also treats each observation with equal fraction to be pulsed or not. > > pdf_data_given_model computes the PDF given a set of model parameters. > If you want to use it to get a likelihood with fraction=0, you can > call it with fraction=0. But this likelihood is always zero. > >> I would have expected something like >> case1: pulsed according to binomial fraction, same as now >> case2: no observations are pulsed >> prior Prob(case1)=0.5, Prob(case2)=0.5 >> >> Or am I missing something? >> >> Where is there a test? You have a good posterior distribution >> for the fraction, which imply a point estimate and confidence interval, >> which look good from the tests. 
>> But I don't see a test hypothesis, (especially a Bayesian statement) > > When I came to implement it, the only place I actually needed to > mention the null hypothesis was in the calculation of the pulsed > probability, which is the last value returned by the inference > routine. I did make the somewhat peculiar choice to have the model PDF > returned normalized so that its total probability was one, rather than > scaling all points by the probability that the system is pulsed at > all. > > It turns out that the inference code just computes the total > normalization S over all parameters; then the probability that the > signal is pulsed is S/(S+1). Ok, I think I'm starting to see how this works, since you drop the prior probabilities (0.5, 0.5) and the likelihood under the uniform distribution is just 1, everything is pretty reduced form. From the posterior probability S/(S+1), you could construct a decision rule similar to a classical test, e.g. accept null if S/(S+1) < 0.95, and then construct a MonteCarlo with samples drawn from either the uniform or the pulsed distribution in the same way as for a classical test, and verify that the decision mistakes, alpha and beta errors, in the sample are close to the posterior probabilities. The posterior probability would be similar to the p-value in a classical test. If you want to balance alpha and beta errors, a threshold S/(S+1)<0.5 would be more appropriate, but for the unit tests it wouldn't matter. Running the example a few times, it looks like the power is relatively low for distinguishing uniform distribution from a pulsed distribution with fraction/binomial parameter 0.05 and sample size <1000. If you have strong beliefs that the fraction is really this low, then an informative prior for the fraction might improve the results. Josef > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 9 09:40:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 09:40:50 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <20091108151625.GA561@doriath.local> References: <20091108151625.GA561@doriath.local> Message-ID: <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> 2009/11/8 Ernest Adrogu? : > Hi, > > In case somebody is interested, or you want to include it > in scipy. I used these specs here from the R package: > cran.r-project.org/web/packages/skellam/skellam.pdf > > Note that I am no statician, somebody who knows what he's > doing (as opposed to me ;) should verify it's correct. > > > import numpy > import scipy.stats.distributions > > # Skellam distribution > > ncx2 = scipy.stats.distributions.ncx2 > > class skellam_gen(scipy.stats.distributions.rv_discrete): > ? ?def _pmf(self, x, mu1, mu2): > ? ? ? ?if x < 0: > ? ? ? ? ? ?px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 > ? ? ? ?else: > ? ? ? ? ? ?px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 > ? ? ? ?return px > ? ?def _cdf(self, x, mu1, mu2): > ? ? ? ?x = numpy.floor(x) > ? ? ? ?if x < 0: > ? ? ? ? ? ?px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) > ? ? ? ?else: > ? ? ? ? ? ?px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) > ? ? ? ?return px > ? ?def _stats(self, mu1, mu2): > ? ? ? ?mean = mu1 - mu2 > ? ? ? ?var = mu1 + mu2 > ? ? ? ?g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) > ? ? ? ?g2 = 1 / (mu1 + mu2) > ? ? ? ?return mean, var, g1, g2 > skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', > ?
? ? ? ? ? ? ? ? ? ?shapes="mu1,mu2", extradoc="") > Thanks, I think the distribution of the difference of two poisson distributed random variables could be useful. Would you please open an enhancement ticket for this at http://projects.scipy.org/scipy/report/1 I had only a brief look at it so far, I had never looked at the Skellam distribution before, and just read a few references. The "if x < 0 .. else ..." will have to be replaced with a "numpy.where" assignment, since the methods are supposed to work with arrays of x (as far as I remember) _rvs could be implemented directly instead of generically (I don't find the reference, where I saw it, right now). Documentation will be necessary, a brief description in the (currently) extradocs, and a listing of the properties for the description of the distributions currently in the stats tutorial. I have some background questions, which address the limitation of the implementation (but are not really necessary for inclusion into scipy). The description in R mentions several implementations of Skellam. Do you have a rough idea what the range of parameters are for which the implementation using ncx produces good results? Do you know if any other special functions would produce good results over a larger range, e.g. using Bessel function? Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also mentions (but doesn't describe) the case of Skellam distribution with correlated Poisson distributions. Do you know what the difference to your implementation would be? Tests for a new distribution will be picked up by the generic tests, but it would be useful to have some extra tests for extreme/uncommon parameter ranges. Do you have any comparisons with R, since you already looked at it? Thanks again, I'm always looking out for new useful distributions, (but I have to find the time to do the testing and actual implementation). Josef > Bye. > -- > Ernest > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 10:53:33 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 09:53:33 -0600 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> Message-ID: <4AF83AFD.60304@gmail.com> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: > 2009/11/8 Ernest Adrogu?: > >> Hi, >> >> In case somebody is interested, or you want to include it >> in scipy. I used these specs here from the R package: >> cran.r-project.org/web/packages/skellam/skellam.pdf >> >> Note that I am no statician, somebody who knows what he's >> doing (as opposed to me ;) should verify it's correct.
>> >> >> import numpy >> import scipy.stats.distributions >> >> # Skellam distribution >> >> ncx2 = scipy.stats.distributions.ncx2 >> >> class skellam_gen(scipy.stats.distributions.rv_discrete): >> def _pmf(self, x, mu1, mu2): >> if x< 0: >> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >> else: >> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >> return px >> def _cdf(self, x, mu1, mu2): >> x = numpy.floor(x) >> if x< 0: >> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >> else: >> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >> return px >> def _stats(self, mu1, mu2): >> mean = mu1 - mu2 >> var = mu1 + mu2 >> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >> g2 = 1 / (mu1 + mu2) >> return mean, var, g1, g2 >> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >> shapes="mu1,mu2", extradoc="") >> >> > Thanks, I think the distribution of the difference of two poisson > distributed random variables could be useful. > > Would you please open an enhancement ticket for this at > http://projects.scipy.org/scipy/report/1 > > I had only a brief look at it so far, I had never looked at the > Skellam distribution before, and just read a few references. > > The "if x< 0 .. else ..." will have to be replace with a > "numpy.where" assignment, since the methods are supposed to work with > arrays of x (as far as I remember) > > _rvs could be implemented directly instead of generically (I don't > find the reference, where I saw it, right now). > > Documentation will be necessary, a brief description in the > (currently) extradocs, and a listing of the properties for the > description of the distributions currently in the stats tutorial. > > I have some background questions, which address the limitation of the > implementation (but are not really necessary for inclusion into > scipy). > > The description in R mentions several implementation of Skellam. Do > you have a rough idea what the range of parameters are for which the > implementation using ncx produces good results? Do you know if any > other special functions would produce good results over a larger > range, e.g. using Bessel function? > > Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also > mentions (but doesn't describe) the case of Skellam distribution with > correlated Poisson distributions. Do you know what the difference to > your implementation would be? > > Tests for a new distribution will be picked up by the generic tests, > but it would be useful to have some extra tests for extreme/uncommon > parameter ranges. Do you have any comparisons with R, since you > already looked it? > > > Thanks again, I'm always looking out for new useful distributions, > (but I have to find the time to do the testing and actual > implementation). > > Josef > > > >> Bye. >> >> -- >> Ernest >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Generally any R code can not be used in numpy because R is GPL. Usually R code is also licensed under GPL so translation from R to Python/numpy still maintains the original license. So the code not used by numpy unless that code is licensed under a BSD compatible license. You *must* show that you implementation is from a BSD-compatible source not from the R package. 
I can see that your code is very simple so there should be an viable alternative source. Also, in the _stats function why do you do not re-use the mean and var variables in computing the g1 and g2 variables? What are 'x, mu1, mu2' ? This looks like a scalar implementation so you need to either check that or allow for array-like inputs. Bruce From cohen at lpta.in2p3.fr Mon Nov 9 11:01:44 2009 From: cohen at lpta.in2p3.fr (Johann Cohen-Tanugi) Date: Mon, 09 Nov 2009 17:01:44 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83AFD.60304@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> Message-ID: <4AF83CE8.7080507@lpta.in2p3.fr> From what I understand of the initial statement from Ernest: " In case somebody is interested, or you want to include it in scipy. I used these specs here from the R package: cran.r-project.org/web/packages/skellam/skellam.pdf " he used the spec, as defined in this pdf, and did not look at the code itself. If my interpretation of the small preamble above is correct, I believe his implementation is not GPL-tainted, right? Johann Bruce Southey wrote: > On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: > >> 2009/11/8 Ernest Adrogu?: >> >> >>> Hi, >>> >>> In case somebody is interested, or you want to include it >>> in scipy. I used these specs here from the R package: >>> cran.r-project.org/web/packages/skellam/skellam.pdf >>> >>> Note that I am no statician, somebody who knows what he's >>> doing (as opposed to me ;) should verify it's correct. >>> >>> >>> import numpy >>> import scipy.stats.distributions >>> >>> # Skellam distribution >>> >>> ncx2 = scipy.stats.distributions.ncx2 >>> >>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>> def _pmf(self, x, mu1, mu2): >>> if x< 0: >>> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>> else: >>> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>> return px >>> def _cdf(self, x, mu1, mu2): >>> x = numpy.floor(x) >>> if x< 0: >>> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>> else: >>> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>> return px >>> def _stats(self, mu1, mu2): >>> mean = mu1 - mu2 >>> var = mu1 + mu2 >>> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>> g2 = 1 / (mu1 + mu2) >>> return mean, var, g1, g2 >>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>> shapes="mu1,mu2", extradoc="") >>> >>> >>> >> Thanks, I think the distribution of the difference of two poisson >> distributed random variables could be useful. >> >> Would you please open an enhancement ticket for this at >> http://projects.scipy.org/scipy/report/1 >> >> I had only a brief look at it so far, I had never looked at the >> Skellam distribution before, and just read a few references. >> >> The "if x< 0 .. else ..." will have to be replace with a >> "numpy.where" assignment, since the methods are supposed to work with >> arrays of x (as far as I remember) >> >> _rvs could be implemented directly instead of generically (I don't >> find the reference, where I saw it, right now). >> >> Documentation will be necessary, a brief description in the >> (currently) extradocs, and a listing of the properties for the >> description of the distributions currently in the stats tutorial. >> >> I have some background questions, which address the limitation of the >> implementation (but are not really necessary for inclusion into >> scipy). >> >> The description in R mentions several implementation of Skellam. 
Do >> you have a rough idea what the range of parameters are for which the >> implementation using ncx produces good results? Do you know if any >> other special functions would produce good results over a larger >> range, e.g. using Bessel function? >> >> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >> mentions (but doesn't describe) the case of Skellam distribution with >> correlated Poisson distributions. Do you know what the difference to >> your implementation would be? >> >> Tests for a new distribution will be picked up by the generic tests, >> but it would be useful to have some extra tests for extreme/uncommon >> parameter ranges. Do you have any comparisons with R, since you >> already looked it? >> >> >> Thanks again, I'm always looking out for new useful distributions, >> (but I have to find the time to do the testing and actual >> implementation). >> >> Josef >> >> >> >> >>> Bye. >>> >>> -- >>> Ernest >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > Generally any R code can not be used in numpy because R is GPL. Usually > R code is also licensed under GPL so translation from R to Python/numpy > still maintains the original license. So the code not used by numpy > unless that code is licensed under a BSD compatible license. > > You *must* show that you implementation is from a BSD-compatible source > not from the R package. I can see that your code is very simple so there > should be an viable alternative source. > > Also, in the _stats function why do you do not re-use the mean and var > variables in computing the g1 and g2 variables? > > What are 'x, mu1, mu2' ? > This looks like a scalar implementation so you need to either check that > or allow for array-like inputs. > > Bruce > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Mon Nov 9 11:07:02 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 11:07:02 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83AFD.60304@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> Message-ID: <1cd32cbb0911090807x2cdcebbcu9b296b8a943630e9@mail.gmail.com> On Mon, Nov 9, 2009 at 10:53 AM, Bruce Southey wrote: > On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: >> 2009/11/8 Ernest Adrogu?: >> >>> Hi, >>> >>> In case somebody is interested, or you want to include it >>> in scipy. I used these specs here from the R package: >>> cran.r-project.org/web/packages/skellam/skellam.pdf >>> >>> Note that I am no statician, somebody who knows what he's >>> doing (as opposed to me ;) should verify it's correct. >>> >>> >>> import numpy >>> import scipy.stats.distributions >>> >>> # Skellam distribution >>> >>> ncx2 = scipy.stats.distributions.ncx2 >>> >>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>> ? ? def _pmf(self, x, mu1, mu2): >>> ? ? ? ? if x< ?0: >>> ? ? ? ? ? ? px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>> ? ? ? ? else: >>> ? ? ? ? ? ? px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>> ? ? ? ? return px >>> ? ? 
def _cdf(self, x, mu1, mu2): >>> ? ? ? ? x = numpy.floor(x) >>> ? ? ? ? if x< ?0: >>> ? ? ? ? ? ? px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>> ? ? ? ? else: >>> ? ? ? ? ? ? px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>> ? ? ? ? return px >>> ? ? def _stats(self, mu1, mu2): >>> ? ? ? ? mean = mu1 - mu2 >>> ? ? ? ? var = mu1 + mu2 >>> ? ? ? ? g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>> ? ? ? ? g2 = 1 / (mu1 + mu2) >>> ? ? ? ? return mean, var, g1, g2 >>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>> ? ? ? ? ? ? ? ? ? ? ? shapes="mu1,mu2", extradoc="") >>> >>> >> Thanks, I think the distribution of the difference of two poisson >> distributed random variables could be useful. >> >> Would you please open an enhancement ticket for this at >> http://projects.scipy.org/scipy/report/1 >> >> I had only a brief look at it so far, I had never looked at the >> Skellam distribution before, and just read a few references. >> >> The "if x< ?0 .. else ..." will have to be replace with a >> "numpy.where" assignment, since the methods are supposed to work with >> arrays of x (as far as I remember) >> >> _rvs could be implemented directly instead of generically (I don't >> find the reference, where I saw it, right now). >> >> Documentation will be necessary, ?a brief description in the >> (currently) extradocs, and a listing of the properties for the >> description of the distributions currently in the stats tutorial. >> >> I have some background questions, which address the limitation of the >> implementation (but are not really necessary for inclusion into >> scipy). >> >> The description in R mentions several implementation of Skellam. Do >> you have a rough idea what the range of parameters are for which the >> implementation using ncx produces good results? Do you know if any >> other special functions would produce good results over a larger >> range, e.g. using Bessel function? >> >> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >> mentions (but doesn't describe) the case of Skellam distribution with >> correlated Poisson distributions. Do you know what the difference to >> your implementation would be? >> >> Tests for a new distribution will be picked up by the generic tests, >> but it would be useful to have some extra tests for extreme/uncommon >> parameter ranges. Do you have any comparisons with R, since you >> already looked it? >> >> >> Thanks again, I'm always looking out for new useful distributions, >> (but I have to find the time to do the testing and actual >> implementation). >> >> Josef >> >> >> >>> Bye. >>> >>> -- >>> Ernest >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > Generally any R code can not be used in numpy because R is GPL. Usually > R code is also licensed under GPL ?so translation from R to Python/numpy > still maintains the original license. So the code not used by numpy > unless that code is licensed under a BSD compatible license. > > You *must* show that you implementation is from a BSD-compatible source > not from the R package. I can see that your code is very simple so there > should be an viable alternative source. 
We only need to read the R documentation and not the R code: pskellam(x,lambda1,lambda2) returns pchisq(2*lambda2, -2*x, 2*lambda1) for x <= 0 and 1 - pchisq(2*lambda1, 2*(x+1), 2*lambda2) for x >= 0. When pchisq incorrectly returns 0, a saddlepoint approximation is substituted, which typically gives at least 2-figure accuracy. The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function. For lower.tail=FALSE, the quantile is defined as the largest value x such that F(x;lower.tail=FALSE) >= p. rskellam is calculated as rpois(n,lambda1)-rpois(n,lambda2) dskellam. and The relation of dgamma to the modified Bessel function of the first kind was given by Skellam (1946). The relation of pgamma to the noncentral chi-square was given by Johnson (1959). Tables are given by Strackee and van der Gon (1962), which can be used to verify this implementation (cf. direct calculation in the examples below). The rest follows from the Wikipedia page (which is also in the list of references in the R docs); there is no copyright on the definition of a distribution. > > Also, in the _stats function why do you do not re-use the mean and var > variables in computing the g1 and g2 variables? > > What are 'x, mu1, mu2' ? x is the integer at which cdf or pmf are calculated, mu1,mu2 are the parameters of the poisson distributions, following wikipedia. Josef > This looks like a scalar implementation so you need to either check that > or allow for array-like inputs. > > Bruce > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 11:19:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 10:19:41 -0600 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83CE8.7080507@lpta.in2p3.fr> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> Message-ID: <4AF8411D.9050605@gmail.com> On 11/09/2009 10:01 AM, Johann Cohen-Tanugi wrote: > ? ?From what I understand of the initial statement from Ernest: > " > In case somebody is interested, or you want to include it > in scipy. I used these specs here from the R package: > cran.r-project.org/web/packages/skellam/skellam.pdf > " > > he used the spec, as defined in this pdf, and did not look at the code > itself. If my interpretation of the small preamble above is correct, I > believe his implementation is not GPL-tainted, right? > > Johann > Bruce Southey wrote: > >> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: >> >> >>> 2009/11/8 Ernest Adrogu?: >>> >>> >>> >>>> Hi, >>>> >>>> In case somebody is interested, or you want to include it >>>> in scipy. I used these specs here from the R package: >>>> cran.r-project.org/web/packages/skellam/skellam.pdf >>>> >>>> Note that I am no statician, somebody who knows what he's >>>> doing (as opposed to me ;) should verify it's correct.
>>>> >>>> >>>> import numpy >>>> import scipy.stats.distributions >>>> >>>> # Skellam distribution >>>> >>>> ncx2 = scipy.stats.distributions.ncx2 >>>> >>>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>>> def _pmf(self, x, mu1, mu2): >>>> if x< 0: >>>> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>>> else: >>>> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>>> return px >>>> def _cdf(self, x, mu1, mu2): >>>> x = numpy.floor(x) >>>> if x< 0: >>>> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>>> else: >>>> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>>> return px >>>> def _stats(self, mu1, mu2): >>>> mean = mu1 - mu2 >>>> var = mu1 + mu2 >>>> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>>> g2 = 1 / (mu1 + mu2) >>>> return mean, var, g1, g2 >>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>>> shapes="mu1,mu2", extradoc="") >>>> >>>> >>>> >>>> >>> Thanks, I think the distribution of the difference of two poisson >>> distributed random variables could be useful. >>> >>> Would you please open an enhancement ticket for this at >>> http://projects.scipy.org/scipy/report/1 >>> >>> I had only a brief look at it so far, I had never looked at the >>> Skellam distribution before, and just read a few references. >>> >>> The "if x< 0 .. else ..." will have to be replace with a >>> "numpy.where" assignment, since the methods are supposed to work with >>> arrays of x (as far as I remember) >>> >>> _rvs could be implemented directly instead of generically (I don't >>> find the reference, where I saw it, right now). >>> >>> Documentation will be necessary, a brief description in the >>> (currently) extradocs, and a listing of the properties for the >>> description of the distributions currently in the stats tutorial. >>> >>> I have some background questions, which address the limitation of the >>> implementation (but are not really necessary for inclusion into >>> scipy). >>> >>> The description in R mentions several implementation of Skellam. Do >>> you have a rough idea what the range of parameters are for which the >>> implementation using ncx produces good results? Do you know if any >>> other special functions would produce good results over a larger >>> range, e.g. using Bessel function? >>> >>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >>> mentions (but doesn't describe) the case of Skellam distribution with >>> correlated Poisson distributions. Do you know what the difference to >>> your implementation would be? >>> >>> Tests for a new distribution will be picked up by the generic tests, >>> but it would be useful to have some extra tests for extreme/uncommon >>> parameter ranges. Do you have any comparisons with R, since you >>> already looked it? >>> >>> >>> Thanks again, I'm always looking out for new useful distributions, >>> (but I have to find the time to do the testing and actual >>> implementation). >>> >>> Josef >>> >>> >>> >>> >>> >>>> Bye. >>>> >>>> -- >>>> Ernest >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >> Generally any R code can not be used in numpy because R is GPL. Usually >> R code is also licensed under GPL so translation from R to Python/numpy >> still maintains the original license. 
So the code not used by numpy >> unless that code is licensed under a BSD compatible license. >> >> You *must* show that you implementation is from a BSD-compatible source >> not from the R package. I can see that your code is very simple so there >> should be an viable alternative source. >> >> Also, in the _stats function why do you do not re-use the mean and var >> variables in computing the g1 and g2 variables? >> >> What are 'x, mu1, mu2' ? >> This looks like a scalar implementation so you need to either check that >> or allow for array-like inputs. >> >> Bruce >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I am not a lawyer! But I do not see that any reference to not seeing the code. Furthermore, there is insufficient information in the cited reference for this implementation (but I have not seen the actual code and would rather not have to see it). But, as Josef pointed out, there is a Wikipedia source so it should be trivial to show that this code is independent of the R implementation. Bruce From josef.pktd at gmail.com Mon Nov 9 11:58:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 11:58:04 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF8411D.9050605@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <4AF8411D.9050605@gmail.com> Message-ID: <1cd32cbb0911090858r60713d83jf4e351401a640b12@mail.gmail.com> On Mon, Nov 9, 2009 at 11:19 AM, Bruce Southey wrote: > On 11/09/2009 10:01 AM, Johann Cohen-Tanugi wrote: >> ? ?From what I understand of the initial statement from Ernest: >> " >> In case somebody is interested, or you want to include it >> in scipy. I used these specs here from the R package: >> cran.r-project.org/web/packages/skellam/skellam.pdf >> " >> >> he used the spec, as defined in this pdf, and did not look at the code >> itself. If my interpretation of the small preamble above is correct, I >> believe his implementation is not GPL-tainted, right? >> >> Johann >> Bruce Southey wrote: >> >>> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: >>> >>> >>>> 2009/11/8 Ernest Adrogu?: >>>> >>>> >>>> >>>>> Hi, >>>>> >>>>> In case somebody is interested, or you want to include it >>>>> in scipy. I used these specs here from the R package: >>>>> cran.r-project.org/web/packages/skellam/skellam.pdf >>>>> >>>>> Note that I am no statician, somebody who knows what he's >>>>> doing (as opposed to me ;) should verify it's correct. >>>>> >>>>> >>>>> import numpy >>>>> import scipy.stats.distributions >>>>> >>>>> # Skellam distribution >>>>> >>>>> ncx2 = scipy.stats.distributions.ncx2 >>>>> >>>>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>>>> ? ? ?def _pmf(self, x, mu1, mu2): >>>>> ? ? ? ? ?if x< ? 0: >>>>> ? ? ? ? ? ? ?px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>>>> ? ? ? ? ?else: >>>>> ? ? ? ? ? ? ?px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>>>> ? ? ? ? ?return px >>>>> ? ? ?def _cdf(self, x, mu1, mu2): >>>>> ? ? ? ? ?x = numpy.floor(x) >>>>> ? ? ? ? ?if x< ? 0: >>>>> ? ? ? ? ? ? ?px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>>>> ? ? ? ? ?else: >>>>> ? ? ? ? ? ? ?px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>>>> ? ? ? ? 
?return px >>>>> ? ? ?def _stats(self, mu1, mu2): >>>>> ? ? ? ? ?mean = mu1 - mu2 >>>>> ? ? ? ? ?var = mu1 + mu2 >>>>> ? ? ? ? ?g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>>>> ? ? ? ? ?g2 = 1 / (mu1 + mu2) >>>>> ? ? ? ? ?return mean, var, g1, g2 >>>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>>>> ? ? ? ? ? ? ? ? ? ? ? ?shapes="mu1,mu2", extradoc="") >>>>> >>>>> >>>>> >>>>> >>>> Thanks, I think the distribution of the difference of two poisson >>>> distributed random variables could be useful. >>>> >>>> Would you please open an enhancement ticket for this at >>>> http://projects.scipy.org/scipy/report/1 >>>> >>>> I had only a brief look at it so far, I had never looked at the >>>> Skellam distribution before, and just read a few references. >>>> >>>> The "if x< ? 0 .. else ..." will have to be replace with a >>>> "numpy.where" assignment, since the methods are supposed to work with >>>> arrays of x (as far as I remember) >>>> >>>> _rvs could be implemented directly instead of generically (I don't >>>> find the reference, where I saw it, right now). >>>> >>>> Documentation will be necessary, ?a brief description in the >>>> (currently) extradocs, and a listing of the properties for the >>>> description of the distributions currently in the stats tutorial. >>>> >>>> I have some background questions, which address the limitation of the >>>> implementation (but are not really necessary for inclusion into >>>> scipy). >>>> >>>> The description in R mentions several implementation of Skellam. Do >>>> you have a rough idea what the range of parameters are for which the >>>> implementation using ncx produces good results? Do you know if any >>>> other special functions would produce good results over a larger >>>> range, e.g. using Bessel function? >>>> >>>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >>>> mentions (but doesn't describe) the case of Skellam distribution with >>>> correlated Poisson distributions. Do you know what the difference to >>>> your implementation would be? same form, different interpretation: "The distributions of the difference between two independent and two bivariate (correlated) Poisson variates are of the same form. However, the interpretation of the parameters is different. Assuming that the bivariate Poisson distribution is the correct distribution, then the marginal means of x and y will be unbiased estimates of theta1 + theta3 and theta2 + theta3, respectively, instead of the parameters of interest theta1 and theta2. Therefore, the parameters of the PD distribution are not directly connected to the marginal means of the actual Poisson distributions." from Bayesian analysis of the differences of count data, D. Karlis and I. Ntzoufras, STATISTICS IN MEDICINE, Statist. Med. 2006; 25:1885-1905, published online 26 October 2005 in Wiley InterScience (www.interscience.wiley.com), DOI: 10.1002/sim.2382 they have some funny application to soccer scores http://stat-athens.aueb.gr/~jbn/publications.htm Josef >>>> >>>> Tests for a new distribution will be picked up by the generic tests, >>>> but it would be useful to have some extra tests for extreme/uncommon >>>> parameter ranges. Do you have any comparisons with R, since you >>>> already looked it? >>>> >>>> >>>> Thanks again, I'm always looking out for new useful distributions, >>>> (but I have to find the time to do the testing and actual >>>> implementation). >>>> >>>> Josef >>>> >>>> >>>> >>>> >>>>> Bye.
>>>>> >>>>> -- >>>>> Ernest >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>>> >>> Generally any R code can not be used in numpy because R is GPL. Usually >>> R code is also licensed under GPL ?so translation from R to Python/numpy >>> still maintains the original license. So the code not used by numpy >>> unless that code is licensed under a BSD compatible license. >>> >>> You *must* show that you implementation is from a BSD-compatible source >>> not from the R package. I can see that your code is very simple so there >>> should be an viable alternative source. >>> >>> Also, in the _stats function why do you do not re-use the mean and var >>> variables in computing the g1 and g2 variables? >>> >>> What are 'x, mu1, mu2' ? >>> This looks like a scalar implementation so you need to either check that >>> or allow for array-like inputs. >>> >>> Bruce >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > I am not a lawyer! But I do not see that any reference to not seeing the > code. Furthermore, there is insufficient information in the cited > reference for this implementation (but I have not seen the actual code > and would rather not have to see it). But, as Josef pointed out, there > is a Wikipedia source so it should be trivial to show that this code is > independent of the R implementation. > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 12:18:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 11:18:41 -0600 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <4AF84EF1.2090608@gmail.com> On 11/08/2009 01:47 AM, Anne Archibald wrote: > 2009/11/7 Bruce Southey: > >> On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald >> wrote: >> >>> Hi, >>> >>> I have implemented a simple Bayesian regression program (it takes >>> events modulo one and returns a posterior probability that the data is >>> phase-invariant plus a posterior distribution for two parameters >>> (modulation fraction and phase) in case there is modulation). >>> >> I do not know your field, a little rusty on certain issues and I do >> not consider myself a Bayesian. >> >> Exactly what type of Bayesian did you use? >> I also do not know how you implemented it especially if it is >> empirical or Monte Carlo Markov Chains. >> > It's an ultra-simple toy problem, really: I did the numerical > integration in the absolute simplest way possible, by evaluating the > quantity to be evaluated on a grid and averaging. 
See github for > details: > http://github.com/aarchiba/bayespf > > I can certainly improve on this, but I'd rather get my testing issues > sorted out first, so that I can test the tests, as it were, on an > implementation I'm reasonably confident is correct, before changing it > to a mathematically more subtle one. > I do not know what you are trying to do with the code as it is not my area. But you are using some empirical Bayesian estimator (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose much of the value of Bayesian as you are only dealing with modal estimates. Really you should be obtaining the distribution of "Probability the signal is pulsed" not just the modal estimate. >>> I'm >>> rather new at this, so I'd like to construct some unit tests. Does >>> anyone have any suggestions on how to go about this? >>> >> Since this is a test, the theoretical 'correctness' is irrelevant. So >> I would guess that you should use very informative priors and data >> with a huge amount of information. That should make the posterior have >> an extremely narrow range so your modal estimate is very close to the >> true value within a very small range. >> > This doesn't really test whether the estimator is doing a good job, > since if I throw mountains of information at it, even a rather badly > wrong implementation will eventually converge to the right answer. > (This is painful experience speaking.) > Are you testing the code or the method? My understanding of unit tests is that they test the code not the method. Unit tests tell me that my code is working correctly but do not necessary tell me if the method is right always. For example, if I need to iterate to get a solution, my test could stop after 1 or 2 rounds before convergence because I know that rest will be correct if the first rounds are correct. Testing the algorithm is relatively easy because you just have to use sensitivity analysis. Basically just use multiple data sets that vary in the number of observations and parameters to see how well these work. The hard part is making sense of the numbers. Also note that you have some explicit assumptions involved like the type of prior distribution. These tend to limit what you can do because if these assume a uniform prior then you can not use a non-uniform data set. Well you can but unless the data dominates the prior you will most likely get a weird answer. > I disagree on the issue of theoretical correctness, though. The best > tests do exactly that: test the theoretical correctness of the routine > in question, ideally without any reference to the implementation. To > test the SVD, for example, you just test that the two matrices are > both orthogonal, and you test that multiplying them together with the > singular values between gives you your original matrix. If your > implementation passes this test, it is computing the SVD just fine, no > matter what it looks like inside. > I agree that code must provide theoretical correctness. But I disagree that the code should always give the correct answer because the code is only as good as the algorithm. > With the frequentist signal-detection statistics I'm more familiar > with, I can write exactly this sort of test. I talk a little more > about it here: > http://lighthouseinthesky.blogspot.com/2009/11/testing-statistical-tests.html > > This works too well, it turns out, to apply to scipy's K-S test or my > own Kuiper test, since their p-values are calculated rather > approximately, so they fail. 
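A minimal sketch of that kind of calibration check, to make the idea concrete (the use of scipy.stats.kstest against a uniform null, the 5% level and the sample sizes are illustrative assumptions, not code from the thread or the blog post):

import numpy as np
from scipy import stats

def check_test_size(n_datasets=1000, n_events=100, alpha=0.05, seed=0):
    # Draw data sets from the null (uniform, unpulsed), run the test,
    # and count how often it claims significance at level alpha.
    rng = np.random.RandomState(seed)
    false_positives = 0
    for _ in range(n_datasets):
        events = rng.uniform(0.0, 1.0, size=n_events)
        D, p = stats.kstest(events, 'uniform')
        if p < alpha:
            false_positives += 1
    # under the null, false_positives ~ Binomial(n_datasets, alpha),
    # so check that it falls inside a 99% acceptance band
    lo, hi = stats.binom.ppf([0.005, 0.995], n_datasets, alpha)
    return false_positives, lo, hi

if __name__ == '__main__':
    fp, lo, hi = check_test_size()
    print("%d false positives, acceptance band [%d, %d]" % (fp, lo, hi))

If the p-values are only approximate, the count can drift outside the band even when the implementation itself is fine, which is the failure mode described above.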
> Again, this is a failure of the algorithm not the code. Often statistical tests rely on large sample sizes or rely on the central limit theorem so these break down when the data is not well approximated by the normal distribution and when the sample size is small. In the blog, your usage of the chi-squared approximation is an example of this - it will be inappropriate for small sample size as well as when the true probability is very extreme (usually consider it valid between about 0.2 and 0.8 obviously depending on sample size). >> After that it really depends on the algorithm, the data used and what >> you need to test. Basically you just have to say given this set of >> inputs I get this 'result' that I consider reasonable. After all, if >> the implementation of algorithm works then it is most likely the >> inputs that are a problem. In statistics, problems usually enter >> because the desired model can not be estimated from the provided data. >> Separation of user errors from a bug in the code usually identified by >> fitting simpler or alternative models. >> > It's exactly the implementation I don't trust, here. I can scrutinize > the implementation all I like, but I'd really like an independent > check on my calculations, and staring at the code won't get me that. > I think that you mean is the algorithm and I do agree that looking at the code will only tell you that you have implemented the algorithm and will not tell you if the algorithm can be trusted. >>> For a frequentist periodicity detector, the return value is a >>> probability that, given the null hypothesis is true, the statistic >>> would be this extreme. So I can construct a strong unit test by >>> generating a collection of data sets given the null hypothesis, >>> evaluating the statistic, and seeing whether the number that claim to >>> be significant at a 5% level is really 5%. (In fact I can use the >>> binomial distribution to get limits on the number of false positive.) >>> This gives me a unit test that is completely orthogonal to my >>> implementation, and that passes if and only if the code works. For a >>> Bayesian hypothesis testing setup, I don't really see how to do >>> something analogous. >>> >>> I can generate non-modulated data sets and confirm that my code >>> returns a high probability that the data is not modulated, but how >>> high should I expect the probability to be? I can generate data sets >>> with models with known parameters and check that the best-fit >>> parameters are close to the known parameters - but how close? Even if >>> I do it many times, is the posterior mean unbiased? What about the >>> posterior mode or median? I can even generate models and then data >>> sets that are drawn from the prior distribution, but what should I >>> expect from the code output on such a data set? I feel sure there's >>> some test that verifies a statistical property of Bayesian >>> estimators/hypothesis testers, but I cant quite put my finger on it. >>> >>> Suggestions welcome. >>> >>> Thanks, >>> Anne >>> >> Please do not mix Frequentist or Likelihood concepts with Bayesian. >> Also you never generate data for estimation from the prior >> distribution, you generate it from the posterior distribution as that >> is what your estimating. >> > Um. I would be picking models from the prior distribution, not data. > However I find the models, I have a well-defined way to generate data > from the model. > > Why do you say it's a bad idea to mix Bayesian and frequentist > approaches? 
It seems to me that as I use them to try to answer similar > questions, it makes sense to compare them; and since I know how to > test frequentist estimators, it's worth seeing whether I can cast > Bayesian estimators in frequentist terms, at least for testing > purposes. > This is the fundamental difference between Bayesian and Frequentist approaches. In Bayesian, the posterior provides everything that you know about a parameter because it is a distribution. However, the modal parameter estimates should agree between both approaches. >> Really in Bayesian sense all this data generation is unnecessary >> because you have already calculated that information in computing the >> posteriors. The posterior of a parameter is a distribution not a >> single number so you just compare distributions. For example, you can >> compute modal values and construct Bayesian credible intervals of the >> parameters. These should make very strong sense to the original values >> simulated. >> > I take this to mean that I don't need to do simulations to get > credible intervals (while I normally would have to to get confidence > intervals), which I agree with. But this is a different question: I'm > talking about constructing a test by simulating the whole Bayesian > process and seeing whether it behaves as it should. The problem is > coming up with a sufficiently clear mathematical definition of > "should". > In Bayesian, you should have the posterior distribution of the parameter which is far more than just the mode, mean and variance. So if the posterior is normal, then I know what the mean and variance should be and thus what the confidence interval should be. >> For Bayesian work, you must address the data and the priors. In >> particular, you need to be careful about the informativeness of the >> prior. You can get great results just because your prior was >> sufficiently informative but you can get great results because you >> data was very informative. >> >> Depending on how it was implemented, a improper prior can be an issue >> because these do not guarantee a proper posterior (but often do lead >> to proper posteriors). So if your posterior is improper then you are >> in a very bad situation and can lead to weird results some or all of >> the time.Some times this is can easily be fixed such as by putting >> bounds on flat priors. Whereas proper priors give proper posteriors. >> > Indeed. I think my priors are pretty safe: 50% chance it's pulsed, > flat priors in phase and pulsed fraction. In the long run I might want > a slightly smarter prior on pulsed fraction, but for the moment I > think it's fine. > It is not about whether or not the prior are 'safe'. Rather it is the relative amount of information given. Also, from other areas, I tend to distrust anything that assumes a 50:50 split because these often lead to special results that do not always occur. So you think great, my code is working as everything looks fine. Then everything crashes when you deviate from that assumption because the 50% probability 'hides' something. An example in my area, the formula (for genetic effect) is of the form: alpha=2*p*q*(a + d(q-p)) where alpha, a and d are parameters and p and q are frequencies that add to one. We could mistakenly assume that a=alpha but that is only true if d=0 or when p=q=0.5. So we may start to get incorrect results when p is not equal to q and d is not zero. 
>> But as a final comment, it should not matter which approach you use as >> if you do not get what you simulated then either your code is wrong or >> you did not simulate what your code implements. (Surprising how >> frequent the latter is.) >> > This is a bit misleading. If I use a (fairly) small number of photons, > and/or a fairly small pulsed fraction, I should be astonished if I got > back the model parameters exactly. I know already that the data leave > a lot of room for slop, so what I am trying to test is how well this > Bayesian gizmo quantifies that slop. > > Anne > > You should not be astonished to get the prior as that is exactly what should happen if the data contains no information. For your simplistic case, I would implore you to generate the posterior distribution of the probability that the signal is pulsed (yes, easier to say than do). If you can do that then you will not only you get the modal value but you can compute the area under that distribution that is say less than 0.5 or whatever threshold you want to use. Bruce From josef.pktd at gmail.com Mon Nov 9 13:02:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 13:02:38 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4AF84EF1.2090608@gmail.com> References: <4AF84EF1.2090608@gmail.com> Message-ID: <1cd32cbb0911091002s3bdb14cama1285a6596feda6a@mail.gmail.com> On Mon, Nov 9, 2009 at 12:18 PM, Bruce Southey wrote: > On 11/08/2009 01:47 AM, Anne Archibald wrote: >> 2009/11/7 Bruce Southey: >> >>> On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald >>> ?wrote: >>> >>>> Hi, >>>> >>>> I have implemented a simple Bayesian regression program (it takes >>>> events modulo one and returns a posterior probability that the data is >>>> phase-invariant plus a posterior distribution for two parameters >>>> (modulation fraction and phase) in case there is modulation). >>>> >>> I do not know your field, a little rusty on certain issues and I do >>> not consider myself a Bayesian. >>> >>> Exactly what type of Bayesian did you use? >>> I also do not know how you implemented it especially if it is >>> empirical or Monte Carlo Markov Chains. >>> >> It's an ultra-simple toy problem, really: I did the numerical >> integration in the absolute simplest way possible, by evaluating the >> quantity to be evaluated on a grid and averaging. See github for >> details: >> http://github.com/aarchiba/bayespf >> >> I can certainly improve on this, but I'd rather get my testing issues >> sorted out first, so that I can test the tests, as it were, on an >> implementation I'm reasonably confident is correct, before changing it >> to a mathematically more subtle one. >> > I do not know what you are trying to do with the code as it is not my > area. But you are using some empirical Bayesian estimator > (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose > much of the value of Bayesian as you are only dealing with modal > estimates. Really you should be obtaining the distribution of > "Probability the signal is pulsed" not just the modal estimate. > >>>> I'm >>>> rather new at this, so I'd like to construct some unit tests. Does >>>> anyone have any suggestions on how to go about this? >>>> >>> Since this is a test, the theoretical 'correctness' is irrelevant. So >>> I would guess that you should use very informative priors and data >>> with a huge amount of information. 
That should make the posterior have >>> an extremely narrow range so your modal estimate is very close to the >>> true value within a very small range. >>> >> This doesn't really test whether the estimator is doing a good job, >> since if I throw mountains of information at it, even a rather badly >> wrong implementation will eventually converge to the right answer. >> (This is painful experience speaking.) >> > Are you testing the code or the method? > My understanding of unit tests is that they test the code not the > method. Unit tests tell me that my code is working correctly but do not > necessary tell me if the method is right always. For example, if I need > to iterate to get a solution, my test could stop after 1 or 2 rounds > before convergence because I know that rest will be correct if the first > rounds are correct. > > Testing the algorithm is relatively easy because you just have to use > sensitivity analysis. Basically just use multiple data sets that vary in > the number of observations and parameters to see how well these work. > The hard part is making sense of the numbers. > > Also note that you have some explicit assumptions involved like the type > of prior distribution. These tend to limit what you can do because if > these assume a uniform prior then you can not use a non-uniform data > set. Well you can but unless the data dominates the prior you will most > likely get a weird answer. > >> I disagree on the issue of theoretical correctness, though. The best >> tests do exactly that: test the theoretical correctness of the routine >> in question, ideally without any reference to the implementation. To >> test the SVD, for example, you just test that the two matrices are >> both orthogonal, and you test that multiplying them together with the >> singular values between gives you your original matrix. If your >> implementation passes this test, it is computing the SVD just fine, no >> matter what it looks like inside. >> > I agree that code must provide theoretical correctness. But I disagree > that the code should always give the correct answer because the code is > only as good as the algorithm. > >> With the frequentist signal-detection statistics I'm more familiar >> with, I can write exactly this sort of test. I talk a little more >> about it here: >> http://lighthouseinthesky.blogspot.com/2009/11/testing-statistical-tests.html >> >> This works too well, it turns out, to apply to scipy's K-S test or my >> own Kuiper test, since their p-values are calculated rather >> approximately, so they fail. >> > Again, this is a failure of the algorithm not the code. Often > statistical tests rely on large sample sizes or rely on the central > limit theorem so these break down when the data is not well approximated > by the normal distribution and when the sample size is small. In the > blog, your usage of the chi-squared approximation is an example of this > - it will be inappropriate for small sample size as well as when the > true probability is very extreme (usually consider it valid between > about 0.2 and 0.8 obviously depending on sample size). > > >>> After that it really depends on the algorithm, the data used and what >>> you need to test. Basically you just have to say given this set of >>> inputs I get this 'result' that I consider reasonable. After all, if >>> the implementation of algorithm works then it is most likely the >>> inputs that are a problem. 
In statistics, problems usually enter >>> because the desired model can not be estimated from the provided data. >>> Separation of user errors from a bug in the code usually identified by >>> fitting simpler or alternative models. >>> >> It's exactly the implementation I don't trust, here. I can scrutinize >> the implementation all I like, but I'd really like an independent >> check on my calculations, and staring at the code won't get me that. >> > > I think that you mean is the algorithm and I do agree that looking at > the code will only tell you that you have implemented the algorithm and > will not tell you if the algorithm can be trusted. > >>>> For a frequentist periodicity detector, the return value is a >>>> probability that, given the null hypothesis is true, the statistic >>>> would be this extreme. So I can construct a strong unit test by >>>> generating a collection of data sets given the null hypothesis, >>>> evaluating the statistic, and seeing whether the number that claim to >>>> be significant at a 5% level is really 5%. (In fact I can use the >>>> binomial distribution to get limits on the number of false positive.) >>>> This gives me a unit test that is completely orthogonal to my >>>> implementation, and that passes if and only if the code works. For a >>>> Bayesian hypothesis testing setup, I don't really see how to do >>>> something analogous. >>>> >>>> I can generate non-modulated data sets and confirm that my code >>>> returns a high probability that the data is not modulated, but how >>>> high should I expect the probability to be? I can generate data sets >>>> with models with known parameters and check that the best-fit >>>> parameters are close to the known parameters - but how close? Even if >>>> I do it many times, is the posterior mean unbiased? What about the >>>> posterior mode or median? I can even generate models and then data >>>> sets that are drawn from the prior distribution, but what should I >>>> expect from the code output on such a data set? I feel sure there's >>>> some test that verifies a statistical property of Bayesian >>>> estimators/hypothesis testers, but I cant quite put my finger on it. >>>> >>>> Suggestions welcome. >>>> >>>> Thanks, >>>> Anne >>>> >>> Please do not mix Frequentist or Likelihood concepts with Bayesian. >>> Also you never generate data for estimation from the prior >>> distribution, you generate it from the posterior distribution as that >>> is what your estimating. >>> >> Um. I would be picking models from the prior distribution, not data. >> However I find the models, I have a well-defined way to generate data >> from the model. >> >> Why do you say it's a bad idea to mix Bayesian and frequentist >> approaches? It seems to me that as I use them to try to answer similar >> questions, it makes sense to compare them; and since I know how to >> test frequentist estimators, it's worth seeing whether I can cast >> Bayesian estimators in frequentist terms, at least for testing >> purposes. >> > > This is the fundamental difference between Bayesian and Frequentist > approaches. In Bayesian, the posterior provides everything that you know > about a parameter because it is a distribution. However, the modal > parameter estimates should agree between both approaches. > >>> Really in Bayesian sense all this data generation is unnecessary >>> because you have already calculated that information in computing the >>> posteriors. The posterior of a parameter is a distribution not a >>> single number so you just compare distributions. 
?For example, you can >>> compute modal values and construct Bayesian credible intervals of the >>> parameters. These should make very strong sense to the original values >>> simulated. >>> >> I take this to mean that I don't need to do simulations to get >> credible intervals (while I normally would have to to get confidence >> intervals), which I agree with. But this is a different question: I'm >> talking about constructing a test by simulating the whole Bayesian >> process and seeing whether it behaves as it should. The problem is >> coming up with a sufficiently clear mathematical definition of >> "should". >> > In Bayesian, you should have the posterior distribution of the parameter > which is far more than just the mode, mean and variance. So if the > posterior is normal, then I know what the mean and variance should be > and thus what the confidence interval should be. > > >>> For Bayesian work, you must address the data and the priors. In >>> particular, you need to be careful about the informativeness of the >>> prior. You can get great results just because your prior was >>> sufficiently informative but you can get great results because you >>> data was very informative. >>> >>> Depending on how it was implemented, a improper prior can be an issue >>> because these do not guarantee a proper posterior (but often do lead >>> to proper posteriors). So if your posterior is improper then you are >>> in a very bad situation and can lead to weird results some or all of >>> the time.Some times this is can easily be fixed such as by putting >>> bounds on flat priors. Whereas proper priors give proper posteriors. >>> >> Indeed. I think my priors are pretty safe: 50% chance it's pulsed, >> flat priors in phase and pulsed fraction. In the long run I might want >> a slightly smarter prior on pulsed fraction, but for the moment I >> think it's fine. >> > It is not about whether or not the prior are 'safe'. Rather it is the > relative amount of information given. > > Also, from other areas, I tend to distrust anything that assumes a 50:50 > split because these often lead to special results that do not always > occur. So you think great, my code is working as everything looks fine. > Then everything crashes when you deviate from that assumption because > the 50% probability 'hides' something. > An example in my area, the formula (for genetic effect) is of the form: > alpha=2*p*q*(a + d(q-p)) > where alpha, a and d are parameters and p and q are frequencies that add > to one. > We could mistakenly assume that a=alpha but that is only true if d=0 or > when p=q=0.5. So we may start to get incorrect results when p is not > equal to q and d is not zero. > > >>> But as a final comment, it should not matter which approach you use as >>> if you do not get what you simulated then either your code is wrong or >>> you did not simulate what your code implements. (Surprising how >>> frequent the latter is.) >>> >> This is a bit misleading. If I use a (fairly) small number of photons, >> and/or a fairly small pulsed fraction, I should be astonished if I got >> back the model parameters exactly. I know already that the data leave >> a lot of room for slop, so what I am trying to test is how well this >> Bayesian gizmo quantifies that slop. >> >> Anne >> >> > You should not be astonished to get the prior as that is exactly what > should happen if the data contains no information. 
> > For your simplistic case, I would implore you to generate the posterior > distribution of the probability that the signal is pulsed (yes, easier > to say than do). ?If you can do that then you will not only you get the > modal value but you can compute the area under that distribution that is > say less than 0.5 or whatever threshold you want to use. > > Bruce With the limitation of using only a single prior at the current stage, I think Anne has already implemented the standard Bayesian estimation including a nice graph of the joint posterior distribution of fraction and phase. Whether the signal is pulsed or not is just a binary variable, and the posterior belief is just the probability that it is pulsed. (I think that's correct, since there is no additional level where the "population" distribution of pulsing versus non-pulsing signals is a random variable.) I think it is very useful to look at the sampling distribution of a Bayesian estimator or test. For an individual Bayesian everything is summarized in the posterior distribution, but how well are a 1000 Bayesian econometricians/statisticians doing that look at a 1000 different stars, especially if I'm not sure they programmed their code correctly? At the end, it's only interesting if they have a lower MSE or a higher power in the test, otherwise I just use a different algorithm. I usually check and test the statistical properties of an algorithm and not just whether it is correctly implemented. And I think Anne is doing it in a similar way, checking whether the size of a statistical test is ok. Of course, for applications where only incorrectly sized (biased) tests are available, it is difficult to find a good tightness of the unit tests. Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Mon Nov 9 13:06:15 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 13:06:15 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4AF84EF1.2090608@gmail.com> References: <4AF84EF1.2090608@gmail.com> Message-ID: 2009/11/9 Bruce Southey : > I do not know what you are trying to do with the code as it is not my > area. But you are using some empirical Bayesian estimator > (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose > much of the value of Bayesian as you are only dealing with modal > estimates. Really you should be obtaining the distribution of > "Probability the signal is pulsed" not just the modal estimate. Um. Given a data set and a prior, I just do Bayesian hypothesis comparison. This gives me a single probability that the signal is pulsed. You seem to be imagining a probability distribution for this probability - but what would the independent variables be? The unpulsed distribution does not depend on any parameters, and I have integrated over all possible values for the pulsed distribution. So what I get should really be the probability, given the data, that the signal is pulsed. I'm not using an empirical Bayesian estimator; I'm doing the numerical integrations directly (and inefficiently). >> This doesn't really test whether the estimator is doing a good job, >> since if I throw mountains of information at it, even a rather badly >> wrong implementation will eventually converge to the right answer. >> (This is painful experience speaking.) >> > Are you testing the code or the method? 
> My understanding of unit tests is that they test the code not the > method. Unit tests tell me that my code is working correctly but do not > necessary tell me if the method is right always. For example, if I need > to iterate to get a solution, my test could stop after 1 or 2 rounds > before convergence because I know that rest will be correct if the first > rounds are correct. Unit tests can be used to do either. Since what I'm trying to do here is make sure I understand Bayesian inference, I'm most worried about the algorithm. > Testing the algorithm is relatively easy because you just have to use > sensitivity analysis. Basically just use multiple data sets that vary in > the number of observations and parameters to see how well these work. > The hard part is making sense of the numbers. It is exactly how to make sense of the numbers that I'm asking about. > Also note that you have some explicit assumptions involved like the type > of prior distribution. These tend to limit what you can do because if > these assume a uniform prior then you can not use a non-uniform data > set. Well you can but unless the data dominates the prior you will most > likely get a weird answer. I don't understand what you mean by a "non-uniform data set". Individual data sets are drawn from models, one of which is uniform. The priors define the distribution of models; the priors I use give a 50% chance the model is uniform and a 50% chance the model is pulsed. Anne From rpg.314 at gmail.com Mon Nov 9 13:11:29 2009 From: rpg.314 at gmail.com (Rohit Garg) Date: Mon, 9 Nov 2009 23:41:29 +0530 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster Message-ID: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Hi all, I have an embarrassingly parallel problem, very nicely suited to parallelization. I am looking for community feedback on how to best approach this matter? Basically, I just setup a bunch of tasks, and the various cpu's will pull data, process it, and send it back. Out of order arrival of results is no problem. The processing times involved are so large that the communication is effectively free, and hence I don't care how fast/slow the communication is. I thought I'll ask in case somebody has done this stuff before to avoid reinventing the wheel. Any other suggestions are welcome too. My only constraint is that it should be able to run a python extension (c++) with minimum of fuss. I want to minimize the headaches involved with setting up/writing the boilerplate code. Which framework/approach/library would you recommend? There is one method mentioned at [1], and of course, one could resort to something like mpi4py. 
[1] http://docs.python.org/library/multiprocessing.html {see the last example} -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From peridot.faceted at gmail.com Mon Nov 9 13:14:36 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 13:14:36 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> Message-ID: 2009/11/9 : > >From the posterior probability S/(S+1), you could construct > a decision rule similar to a classical test, e.g. accept null > if S/(S+1) < 0.95, and then construct a MonteCarlo > with samples drawn form either the uniform or the pulsed > distribution in the same way as for a classical test, and > verify that the decision mistakes, alpha and beta errors, in the > sample are close to the posterior probabilities. > The posterior probability would be similar to the p-value > in a classical test. If you want to balance alpha and > beta errors, a threshold S/(S+1)<0.5 would be more > appropriate, but for the unit tests it wouldn't matter. Unfortunately this doesn't work. Think of it this way: if my data size is 10000 photons, and I'm looking at the fraction of uniformly-distributed data sets that have a probability > 0.95 that they are pulsed, this won't happen with 5% of my fake data sets - it will almost never happen, since 10000 photons are enough to give a very solid answer (experiment confirms this). So I can't interpret my Bayesian probability as a frequentist probability of alpha error. > Running the example a few times, it looks like that the power > is relatively low for distinguishing uniform distribution from > a pulsed distribution with fraction/binomial parameter 0.05 > and sample size <1000. > If you have strong beliefs that the fraction is really this low > than an informative prior for the fraction, might improve the > results. I really don't want to encourage my code to return reports of pulsations. To be believed in this nest of frequentists I work with, I need a solid detection in spite of very conservative priors. Anne From eadrogue at gmx.net Mon Nov 9 13:16:51 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 9 Nov 2009 19:16:51 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> Message-ID: <20091109181650.GA5957@doriath.local> 9/11/09 @ 09:40 (-0500), thus spake josef.pktd at gmail.com: > Thanks, I think the distribution of the difference of two poisson > distributed random variables could be useful. > > Would you please open an enhancement ticket for this at > http://projects.scipy.org/scipy/report/1 Okay, I will. > I had only a brief look at it so far, I had never looked at the > Skellam distribution before, and just read a few references. > > The "if x < 0 .. else ..." will have to be replace with a > "numpy.where" assignment, since the methods are supposed to work with > arrays of x (as far as I remember) Done. 
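As an aside, the scalar-versus-array point behind that numpy.where change can be illustrated with a toy function (f_scalar and f_array are hypothetical names, not part of the Skellam code):

import numpy

def f_scalar(x):
    # works only if x is a scalar: "if x < 0" raises an error for arrays
    if x < 0:
        return -x
    else:
        return x

def f_array(x):
    # vectorised equivalent: both branches are evaluated,
    # then numpy.where selects elementwise
    x = numpy.asarray(x)
    return numpy.where(x < 0, -x, x)

print(f_array([-2, -1, 0, 3]))   # -> [2 1 0 3]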
> _rvs could be implemented directly instead of generically (I don't > find the reference, where I saw it, right now). I suppose it could be done, but I can't figure out how. As far as I understand, you can't derive the poisson parameters from the skellam parameters alone, you need the correlation coefficient (between the two poisson rv) too, is that right? > Documentation will be necessary, a brief description in the > (currently) extradocs, and a listing of the properties for the > description of the distributions currently in the stats tutorial. > > I have some background questions, which address the limitation of the > implementation (but are not really necessary for inclusion into > scipy). > > The description in R mentions several implementation of Skellam. Do > you have a rough idea what the range of parameters are for which the > implementation using ncx produces good results? Do you know if any > other special functions would produce good results over a larger > range, e.g. using Bessel function? No, sorry, I have no idea. The R paper says that in R the Bessel function is more accurate, but I don't think this means that the Bessel function implementation is more accurate in general. I compared results using the Bessel function in scipy and using the ncx2 implementation, and the differences were minimal, although I admit my testing wasn't extensive. Another problem is that I haven't got a table of values for this particular distribution, so in case results differ I can't tell which one is more accurate. > Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also > mentions (but doesn't describe) the case of Skellam distribution with > correlated Poisson distributions. Do you know what the difference to > your implementation would be? As far as I know, it makes no difference if the two Poisson variates are correlated or not. The Skellam parameters are defined as mu1 = lam1 - rho * sqrt(lam1*lam2) mu2 = lam2 - rho * sqrt(lam1*lam2) where lam1 and lam2 are the Poisson means and rho is the correlation coefficient. So, in case there is correlation it is implicit in the parameters mu1 and mu2, and it doesn't make any difference in terms of calculating the pmf or cdf. Again, I am no statician or mathematician. > Tests for a new distribution will be picked up by the generic tests, > but it would be useful to have some extra tests for extreme/uncommon > parameter ranges. Do you have any comparisons with R, since you > already looked it? I have this visual test, although it's usefulness is questionable. It shows the deviation between observed and expected frequencies for a Skellam random variable. Ideally errors should be random and centered around 0. To me it looks like errors tend to increase around the mean and are smaller along the tails, but I don't know if this means something or not. 
import numpy
import scipy.stats.distributions

poisson = scipy.stats.distributions.poisson
ncx2 = scipy.stats.distributions.ncx2

# Skellam distribution

class skellam_gen(scipy.stats.distributions.rv_discrete):
    def _pmf(self, x, mu1, mu2):
        px = numpy.where(x < 0,
                         ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2,
                         ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2)
        return px
    def _cdf(self, x, mu1, mu2):
        x = numpy.floor(x)
        px = numpy.where(x < 0,
                         ncx2.cdf(2*mu2, -2*x, 2*mu1),
                         1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2))
        return px
    def _stats(self, mu1, mu2):
        mean = mu1 - mu2
        var = mu1 + mu2
        g1 = (mean) / numpy.sqrt((var)**3)
        g2 = 1 / var
        return mean, var, g1, g2

skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
                      shapes="mu1,mu2", extradoc="")

if __name__ == '__main__':

    lam1 = 3.4
    lam2 = 6.1
    n = 5000

    poisson_var1 = numpy.random.poisson(lam1, n)
    poisson_var2 = numpy.random.poisson(lam2, n)
    skellam_var = poisson_var1-poisson_var2

    low = min(skellam_var)
    high = max(skellam_var)
    obs_freq = numpy.histogram(
        skellam_var, numpy.arange(low, high+2))[0] / float(n)

    rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0]
    mu1 = lam1 - rho * numpy.sqrt(lam1*lam2)
    mu2 = lam2 - rho * numpy.sqrt(lam1*lam2)
    exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2)
    print obs_freq-exp_freq

    # plot
    import matplotlib.pyplot as plt
    plt.figure().add_subplot(1,1,1).plot(obs_freq-exp_freq)
    plt.show()

Regards. -- Ernest From gael.varoquaux at normalesup.org Mon Nov 9 13:17:13 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Nov 2009 19:17:13 +0100 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: <20091109181713.GF28468@phare.normalesup.org> On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote: > Hi all, > I have an embarrassingly parallel problem, very nicely suited to > parallelization. A non-optimal solution that I like: http://gael-varoquaux.info/blog/?p=119 Gaël From peridot.faceted at gmail.com Mon Nov 9 13:18:46 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 13:18:46 -0500 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: 2009/11/9 Rohit Garg : > Hi all, > > I have an embarrassingly parallel problem, very nicely suited to > parallelization. I am looking for community feedback on how to best > approach this matter? Basically, I just setup a bunch of tasks, and > the various cpu's will pull data, process it, and send it back. Out of > order arrival of results is no problem. The processing times involved > are so large that the communication is effectively free, and hence I > don't care how fast/slow the communication is. I thought I'll ask in > case somebody has done this stuff before to avoid reinventing the > wheel. Any other suggestions are welcome too. > > My only constraint is that it should be able to run a python extension > (c++) with minimum of fuss. I want to minimize the headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? For our pulsar searches, we pick about the simplest possible method.
Each job is set up so that you run it from a UNIX shell in a directory containing all the needed files, and it saves any output to a common directory. We then submit jobs to the PBS batch system. We have some minor complications to this setup because copying the input data is quite network-intensive, so we make sure only one job starts at a time, but other than that the jobs have no interaction at all. Anne > There is one method mentioned at [1], and of course, one could resort > to something like mpi4py. > > [1] http://docs.python.org/library/multiprocessing.html {see the last example} > > -- > Rohit Garg > > http://rpg-314.blogspot.com/ > > Senior Undergraduate > Department of Physics > Indian Institute of Technology > Bombay > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From eadrogue at gmx.net Mon Nov 9 13:20:59 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 9 Nov 2009 19:20:59 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83CE8.7080507@lpta.in2p3.fr> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> Message-ID: <20091109182059.GB5957@doriath.local> 9/11/09 @ 17:01 (+0100), thus spake Johann Cohen-Tanugi: > From what I understand of the initial statement from Ernest: > " > In case somebody is interested, or you want to include it > in scipy. I used these specs here from the R package: > cran.r-project.org/web/packages/skellam/skellam.pdf > " > > he used the spec, as defined in this pdf, and did not look at the code > itself. If my interpretation of the small preamble above is correct, I > believe his implementation is not GPL-tainted, right? Your interpretation of my preamble is correct. I didn't look at the source code, I only read the PDF doc where they explain in plain English how the pmf and cdf are calculated. -- Ernest From pav+sp at iki.fi Mon Nov 9 13:28:39 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Mon, 9 Nov 2009 18:28:39 +0000 (UTC) Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: Mon, 09 Nov 2009 23:41:29 +0530, Rohit Garg wrote: [clip: embarassingly parallel problems] With multiprocessing, using Pool.imap_unordered to apply a computation function to a list of parameter sets is one good alternative. (IIRC, it balances load between subprocesses &c automatically.) Multiprocessing can however work on only one node at a time. With mpi4py, it's probably best to write a simple master-slave architecture. -- Pauli Virtanen From rpg.314 at gmail.com Mon Nov 9 13:28:55 2009 From: rpg.314 at gmail.com (Rohit Garg) Date: Mon, 9 Nov 2009 23:58:55 +0530 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <20091109181713.GF28468@phare.normalesup.org> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <20091109181713.GF28468@phare.normalesup.org> Message-ID: <4d5dd8c20911091028v3d98242dmd6d2cb5f11741ff7@mail.gmail.com> On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux wrote: > On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote: >> Hi all, > >> I have an embarrassingly parallel problem, very nicely suited to >> parallelization. 
> > A non-optimal solution that I like: > http://gael-varoquaux.info/blog/?p=119 Thanks, for the pointer, but after a quick read, it doesn't look like it supports distributed memory parallelism. Or does it? > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From gael.varoquaux at normalesup.org Mon Nov 9 13:35:15 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Nov 2009 19:35:15 +0100 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091028v3d98242dmd6d2cb5f11741ff7@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <20091109181713.GF28468@phare.normalesup.org> <4d5dd8c20911091028v3d98242dmd6d2cb5f11741ff7@mail.gmail.com> Message-ID: <20091109183515.GG28468@phare.normalesup.org> On Mon, Nov 09, 2009 at 11:58:55PM +0530, Rohit Garg wrote: > On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux > wrote: > > On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote: > >> Hi all, > >> I have an embarrassingly parallel problem, very nicely suited to > >> parallelization. > > A non-optimal solution that I like: > > http://gael-varoquaux.info/blog/?p=119 > Thanks, for the pointer, but after a quick read, it doesn't look like > it supports distributed memory parallelism. Or does it? If by distributed memory you mean shared memory, you won't get this, but the copy on write of Unix gives you part of it, but not all of it. One hack is to use memmapping to a file to share memory between processes (it won't cost you IO, because your OS will be smart-enough to cache everything). The right way to do it is to use a shared memory array, which Sturla and I started working on ages ago, but never found time to integrate to numpy. If you mean parallelism on architectures where 'fork' won't distributes the processes (like a cluster), than multiprocessing won't do the trick, and you will need to look at IPython or parallel Python. Ga?l From josef.pktd at gmail.com Mon Nov 9 13:44:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 13:44:50 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> Message-ID: <1cd32cbb0911091044h28e9a5a7x624be502df05629c@mail.gmail.com> On Mon, Nov 9, 2009 at 1:14 PM, Anne Archibald wrote: > 2009/11/9 ?: > >> >From the posterior probability S/(S+1), you could construct >> a decision rule similar to a classical test, e.g. accept null >> if S/(S+1) < 0.95, and then construct a MonteCarlo >> with samples drawn form either the uniform or the pulsed >> distribution in the same way as for a classical test, and >> verify that the decision mistakes, alpha and beta errors, in the >> sample are close to the posterior probabilities. >> The posterior probability would be similar to the p-value >> in a classical test. If you want to balance alpha and >> beta errors, a threshold S/(S+1)<0.5 would be more >> appropriate, but for the unit tests it wouldn't matter. > > Unfortunately this doesn't work. 
Think of it this way: if my data size > is 10000 photons, and I'm looking at the fraction of > uniformly-distributed data sets that have a probability > 0.95 that > they are pulsed, this won't happen with 5% of my fake data sets - it > will almost never happen, since 10000 photons are enough to give a > very solid answer (experiment confirms this). So I can't interpret my > Bayesian probability as a frequentist probability of alpha error. Doesn't this mean that the Bayesian posterior doesn't have the correct tail probabilities? If my posterior beliefs are that the probability that I make a mistake is 5% and I have the correct model, but the real probability that I make a mistake is only 0.1%, then my updating should have correctly taken into account that the signal is so informative and tightened my posterior distribution. With 8000 in your example, I get Probability the signal is pulsed: 0.999960 This makes it pretty obvious if the signal is pulsed or not. Do the tail probabilities work better for cases that are not so easy to distinguish? Josef > >> Running the example a few times, it looks like that the power >> is relatively low for distinguishing uniform distribution from >> a pulsed distribution with fraction/binomial parameter 0.05 >> and sample size <1000. >> If you have strong beliefs that the fraction is really this low >> than an informative prior for the fraction, might improve the >> results. > > I really don't want to encourage my code to return reports of > pulsations. To be believed in this nest of frequentists I work with, I > need a solid detection in spite of very conservative priors. When you have everything working, then you could check the sensitivity to the priors. For parameter estimation, I found it interesting to see which parameters change a lot when I varied the prior variance, and it helps in the defense against frequentists. Josef > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From coughlan at ski.org Mon Nov 9 13:18:03 2009 From: coughlan at ski.org (James Coughlan) Date: Mon, 09 Nov 2009 10:18:03 -0800 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: <4AF85CDB.5010908@ski.org> Rohit Garg wrote: > Hi all, > > I have an embarrassingly parallel problem, very nicely suited to > parallelization. I am looking for community feedback on how to best > approach this matter? Basically, I just setup a bunch of tasks, and > the various cpu's will pull data, process it, and send it back. Out of > order arrival of results is no problem. The processing times involved > are so large that the communication is effectively free, and hence I > don't care how fast/slow the communication is. I thought I'll ask in > case somebody has done this stuff before to avoid reinventing the > wheel. Any other suggestions are welcome too. > > My only constraint is that it should be able to run a python extension > (c++) with minimum of fuss. I want to minimize the headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? > > There is one method mentioned at [1], and of course, one could resort > to something like mpi4py. 
> > [1] http://docs.python.org/library/multiprocessing.html {see the last example} > > Hi, I've never done any parallel processing, but you might consider Shedskin, a Python to C++ compiler, which makes it easy to convert Python functions into fast C++ modules, and offers support for parallel processing: http://code.google.com/p/shedskin/ Best, James -- ------------------------------------------------------- James Coughlan, Ph.D., Scientist The Smith-Kettlewell Eye Research Institute Email: coughlan at ski.org URL: http://www.ski.org/Rehab/Coughlan_lab/ Phone: 415-345-2146 Fax: 415-345-8455 ------------------------------------------------------- From bsouthey at gmail.com Mon Nov 9 13:49:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 12:49:34 -0600 Subject: [SciPy-User] the skellam distribution In-Reply-To: <20091109182059.GB5957@doriath.local> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <20091109182059.GB5957@doriath.local> Message-ID: <4AF8643E.3020802@gmail.com> On 11/09/2009 12:20 PM, Ernest Adrogu? wrote: > 9/11/09 @ 17:01 (+0100), thus spake Johann Cohen-Tanugi: > >> From what I understand of the initial statement from Ernest: >> " >> In case somebody is interested, or you want to include it >> in scipy. I used these specs here from the R package: >> cran.r-project.org/web/packages/skellam/skellam.pdf >> " >> >> he used the spec, as defined in this pdf, and did not look at the code >> itself. If my interpretation of the small preamble above is correct, I >> believe his implementation is not GPL-tainted, right? >> > Your interpretation of my preamble is correct. > I didn't look at the source code, I only read the PDF doc > where they explain in plain English how the pmf and cdf are > calculated. > > Okay, Then just provide a suitable reference outside the R package where someone can derive it independently or get more details. This also allows people to go back and check the implementation when things are not as expected. Perhaps the reference that Josef provided provides the same formulation? Bruce From josef.pktd at gmail.com Mon Nov 9 14:18:33 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 14:18:33 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF8643E.3020802@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <20091109182059.GB5957@doriath.local> <4AF8643E.3020802@gmail.com> Message-ID: <1cd32cbb0911091118i588800dat2d6707690ea8f868@mail.gmail.com> On Mon, Nov 9, 2009 at 1:49 PM, Bruce Southey wrote: > On 11/09/2009 12:20 PM, Ernest Adrogu? wrote: >> ? 9/11/09 @ 17:01 (+0100), thus spake Johann Cohen-Tanugi: >> >>> ? ?From what I understand of the initial statement from Ernest: >>> " >>> In case somebody is interested, or you want to include it >>> in scipy. I used these specs here from the R package: >>> cran.r-project.org/web/packages/skellam/skellam.pdf >>> " >>> >>> he used the spec, as defined in this pdf, and did not look at the code >>> itself. If my interpretation of the small preamble above is correct, I >>> believe his implementation is not GPL-tainted, right? >>> >> Your interpretation of my preamble is correct. 
>> I didn't look at the source code, I only read the PDF doc >> where they explain in plain English how the pmf and cdf are >> calculated. >> >> > Okay, > Then just provide a suitable reference outside the R package where > someone can derive it independently or get more details. This also > allows people to go back and check the implementation when things are > not as expected. Perhaps the reference that Josef provided provides the > same formulation? The reference below is there if someone wants to check the relationship between the chisquare and the difference of two independent Poissons, but that's an implementation detail and not really informative in a docstring. On an Extension of the Connexion Between Poisson and χ2 Distributions Author(s): N. L. Johnson Source: Biometrika, Vol. 46, No. 3/4 (Dec., 1959), pp. 352-363 Josef > > Bruce > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 9 14:36:29 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 14:36:29 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <20091109181650.GA5957@doriath.local> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <20091109181650.GA5957@doriath.local> Message-ID: <1cd32cbb0911091136m26c9dd37r229051142c43c63d@mail.gmail.com> 2009/11/9 Ernest Adrogué : > 9/11/09 @ 09:40 (-0500), thus spake josef.pktd at gmail.com: >> Thanks, I think the distribution of the difference of two poisson >> distributed random variables could be useful. >> >> Would you please open an enhancement ticket for this at >> http://projects.scipy.org/scipy/report/1 > > Okay, I will. > >> I had only a brief look at it so far, I had never looked at the >> Skellam distribution before, and just read a few references. >> >> The "if x < 0 .. else ..." will have to be replaced with a >> "numpy.where" assignment, since the methods are supposed to work with >> arrays of x (as far as I remember) > > Done. > >> _rvs could be implemented directly instead of generically (I don't >> find the reference, where I saw it, right now). > > I suppose it could be done, but I can't figure out how. > As far as I understand, you can't derive the poisson parameters > from the skellam parameters alone, you need the correlation > coefficient (between the two poisson rv) too, is that right? It can be generated as the difference between two independent Poissons (see R docs). Correlation only matters for the interpretation, as you explain below and I found in a reference.
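A sketch of what a direct _rvs along those lines could look like (the standalone function is only for illustration, and the self._size attribute in the comment is an assumption about how rv_discrete passes the requested output size to _rvs):

import numpy

def skellam_rvs(mu1, mu2, size=1):
    # difference of two independent Poisson draws
    return (numpy.random.poisson(mu1, size) -
            numpy.random.poisson(mu2, size))

# inside the skellam_gen class this would become something like
#    def _rvs(self, mu1, mu2):
#        return (numpy.random.poisson(mu1, self._size) -
#                numpy.random.poisson(mu2, self._size))

if __name__ == '__main__':
    x = skellam_rvs(3.4, 6.1, size=100000)
    # sample mean and variance should be close to mu1 - mu2 and mu1 + mu2
    print("mean %.3f  var %.3f" % (x.mean(), x.var()))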
> I compared results using the Bessel function in scipy and using > the ncx2 implementation, and the differences were minimal, although > I admit my testing wasn't extensive. Another problem is that > I haven't got a table of values for this particular distribution, > so in case results differ I can't tell which one is more accurate I will look at it, but small numerical inaccuracies, 1e-??, can also be fixed later, if someone finds a better implementation., > >> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >> mentions (but doesn't describe) the case of Skellam distribution with >> correlated Poisson distributions. Do you know what the difference to >> your implementation would be? > > As far as I know, it makes no difference if the two Poisson > variates are correlated or not. The Skellam parameters are > defined as > > mu1 = lam1 - rho * sqrt(lam1*lam2) > mu2 = lam2 - rho * sqrt(lam1*lam2) > > where lam1 and lam2 are the Poisson means and rho is the > correlation coefficient. So, in case there is correlation it > is implicit in the parameters mu1 and mu2, and it doesn't > make any difference in terms of calculating the pmf or cdf. > Again, I am no statician or mathematician. I agree, that's what I have seen in the references. > >> Tests for a new distribution will be picked up by the generic tests, >> but it would be useful to have some extra tests for extreme/uncommon >> parameter ranges. Do you have any comparisons with R, since you >> already looked it? > > I have this visual test, although it's usefulness is > questionable. It shows the deviation between observed and > expected frequencies for a Skellam random variable. Ideally > errors should be random and centered around 0. To me > it looks like errors tend to increase around the mean > and are smaller along the tails, but I don't know if this > means something or not. > > import numpy > import scipy.stats.distributions > > poisson = scipy.stats.distributions.poisson > ncx2 = scipy.stats.distributions.ncx2 > > # Skellam distribution > > class skellam_gen(scipy.stats.distributions.rv_discrete): > ? ?def _pmf(self, x, mu1, mu2): > ? ? ? ?px = numpy.where(x < 0, ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2, > ? ? ? ? ? ? ? ? ? ? ? ? ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2) > ? ? ? ?return px > ? ?def _cdf(self, x, mu1, mu2): > ? ? ? ?x = numpy.floor(x) > ? ? ? ?px = numpy.where(x < 0, ncx2.cdf(2*mu2, -2*x, 2*mu1), > ? ? ? ? ? ? ? ? ? ? ? ? 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)) > ? ? ? ?return px > ? ?def _stats(self, mu1, mu2): > ? ? ? ?mean = mu1 - mu2 > ? ? ? ?var = mu1 + mu2 > ? ? ? ?g1 = (mean) / numpy.sqrt((var)**3) > ? ? ? ?g2 = 1 / var > ? ? ? ?return mean, var, g1, g2 > skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', > ? ? ? ? ? ? ? ? ? ? ?shapes="mu1,mu2", extradoc="") > > if __name__ == '__main__': > > ? ?lam1 = 3.4 > ? ?lam2 = 6.1 > ? ?n = 5000 > > ? ?poisson_var1 = numpy.random.poisson(lam1, n) > ? ?poisson_var2 = numpy.random.poisson(lam2, n) > ? ?skellam_var = poisson_var1-poisson_var2 > > ? ?low = min(skellam_var) > ? ?high = max(skellam_var) > ? ?obs_freq = numpy.histogram( > ? ? ? ?skellam_var, numpy.arange(low, high+2))[0] / float(n) > > ? ?rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0] > ? ?mu1 = lam1 - rho * numpy.sqrt(lam1*lam2) > ? ?mu2 = lam2 - rho * numpy.sqrt(lam1*lam2) > ? ?exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2) > ? ?print obs_freq-exp_freq > > ? ?# plot > ? ?import matplotlib.pyplot as plt > ? 
?plt.figure().add_subplot(1,1,1).plot(obs_freq-exp_freq) > ? ?plt.show() I'm not sure the example is correct. You are simulating two independent poisson variables, so the difference skellam_var should be distributed as skellam with mu1 = lam1 mu2 = lam2 and theoretical rho should be zero. Or am I missing something? thanks, the numpy.where looks good, I still have to actually run the examples. Josef > > > Regards. > -- > Ernest > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 14:47:10 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 13:47:10 -0600 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <4AF84EF1.2090608@gmail.com> Message-ID: <4AF871BE.6050300@gmail.com> On 11/09/2009 12:06 PM, Anne Archibald wrote: > 2009/11/9 Bruce Southey: > > >> I do not know what you are trying to do with the code as it is not my >> area. But you are using some empirical Bayesian estimator >> (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose >> much of the value of Bayesian as you are only dealing with modal >> estimates. Really you should be obtaining the distribution of >> "Probability the signal is pulsed" not just the modal estimate. >> > Um. Given a data set and a prior, I just do Bayesian hypothesis > comparison. This gives me a single probability that the signal is > pulsed. You seem to be imagining a probability distribution for this > probability - but what would the independent variables be? The > unpulsed distribution does not depend on any parameters, and I have > integrated over all possible values for the pulsed distribution. So > what I get should really be the probability, given the data, that the > signal is pulsed. I'm not using an empirical Bayesian estimator; I'm > doing the numerical integrations directly (and inefficiently). > Here are two links on what I mean with reference to the binomial case: http://lingpipe-blog.com/2009/09/11/batting-averages-bayesian-vs-mle-estimate/ TEACHING OF BAYESIAN ESTIMATION OF ?P? PROBABILITY IN A BERNOULLI PROCESS: http://www.stat.auckland.ac.nz/~iase/publications/17/C439.pdf I do not know your area but you should be able to do something similar. Bruce From david_baddeley at yahoo.com.au Mon Nov 9 15:56:31 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Mon, 9 Nov 2009 12:56:31 -0800 (PST) Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: <92881.60847.qm@web33004.mail.mud.yahoo.com> Hi Rohit, I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). 
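In case it helps to see how little is involved, the server logic boils down to something like the sketch below. This is typed from memory rather than my actual code, and the class and attribute names (TaskQueue, todo, inProgress) are just made up for the example - the real object is simply registered with Pyro and the workers call getTask/postTask on a proxy to it:

import time

class TaskQueue(object):
    """Hands out tasks to workers and collects the results.
    (In the real code an instance of this is exposed over Pyro.)"""

    def __init__(self, tasks, timeout=3600.0):
        # 'tasks' can be any picklable descriptions of the work to be done
        self.todo = [(i, t) for i, t in enumerate(tasks)]
        self.inProgress = {}   # taskID -> (task, time it was handed out)
        self.results = {}      # taskID -> result
        self.timeout = timeout

    def getTask(self):
        now = time.time()
        # re-queue anything whose worker seems to have died or been killed
        for tid, (task, t0) in list(self.inProgress.items()):
            if now - t0 > self.timeout:
                self.todo.append((tid, task))
                del self.inProgress[tid]
        if not self.todo:
            return None
        tid, task = self.todo.pop(0)
        self.inProgress[tid] = (task, now)
        return tid, task

    def postTask(self, tid, result):
        # a late duplicate from a re-assigned task just overwrites the result
        self.results[tid] = result
        self.inProgress.pop(tid, None)

A worker is then just a loop that calls getTask(), does the number crunching, and hands the result back with postTask().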
My tasks use a mixture of python and c, although no communication takes place in the c code. I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into. I can post the code for a minimal task server/client if you like. best wishes, David --- On Tue, 10/11/09, Rohit Garg wrote: > From: Rohit Garg > Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster > To: "SciPy Users List" , numpy-discussions at scipy.org > Received: Tuesday, 10 November, 2009, 7:11 AM > Hi all, > > I have an embarrassingly parallel problem, very nicely > suited to > parallelization. I am looking for community feedback on how > to best > approach this matter? Basically, I just setup a bunch of > tasks, and > the various cpu's will pull data, process it, and send it > back. Out of > order arrival of results is no problem. The processing > times involved > are so large that the communication is effectively free, > and hence I > don't care how fast/slow the communication is. I thought > I'll ask in > case somebody has done this stuff before to avoid > reinventing the > wheel. Any other suggestions are welcome too. > > My only constraint is that it should be able to run a > python extension > (c++) with minimum of fuss. I want to minimize the > headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? > > There is one method mentioned at [1], and of course, one > could resort > to something like mpi4py. > > [1] http://docs.python.org/library/multiprocessing.html???{see > the last example} > > -- > Rohit Garg > > http://rpg-314.blogspot.com/ > > Senior Undergraduate > Department of Physics > Indian Institute of Technology > Bombay > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From davide.cittaro at ifom-ieo-campus.it Mon Nov 9 15:58:28 2009 From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro) Date: Mon, 9 Nov 2009 21:58:28 +0100 Subject: [SciPy-User] poisson distribution in scipy.stats Message-ID: <077A6881-7537-43FD-8BA4-0A32554BC944@ifom-ieo-campus.it> Hi all, about the poisson generator... given l (expected) and k (found) I guess that the way to get the probability of k I have to do this: d = scipy.stats.poisson(l) p = pmf(k) which I found being the same of p = scipy.stats.poisson.pmf(l, k) I've here some code in which it is written: d = scipy.stats.poisson(l, k) p = d.pmf(k) which gives different results. Which is the right way to initialize the distribution (and get the PMF)? Thanks d /* Davide Cittaro Cogentech - Consortium for Genomic Technologies via adamello, 16 20139 Milano Italy tel.: +39(02)574303007 e-mail: davide.cittaro at ifom-ieo-campus.it */ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Nov 9 16:00:29 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Nov 2009 15:00:29 -0600 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: <077A6881-7537-43FD-8BA4-0A32554BC944@ifom-ieo-campus.it> References: <077A6881-7537-43FD-8BA4-0A32554BC944@ifom-ieo-campus.it> Message-ID: <3d375d730911091300h6d9b4df6u4be1e6ae0811b12@mail.gmail.com> On Mon, Nov 9, 2009 at 14:58, Davide Cittaro wrote: > Hi all, > about the poisson generator... given l (expected) and k (found) I guess that > the way to get the probability of k I have to do this: > > d = scipy.stats.poisson(l) > p = pmf(k) Correct. > which I found being the same of > p = scipy.stats.poisson.pmf(l, k) Also correct. > I've here some code in which it is written: > d = scipy.stats.poisson(l, k) That one is completely wrong. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From wesmckinn at gmail.com Mon Nov 9 16:03:18 2009 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 9 Nov 2009 16:03:18 -0500 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <92881.60847.qm@web33004.mail.mud.yahoo.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <92881.60847.qm@web33004.mail.mud.yahoo.com> Message-ID: <6c476c8a0911091303k3ecbbb24ve4a2abb862bb4da9@mail.gmail.com> On Mon, Nov 9, 2009 at 3:56 PM, David Baddeley wrote: > Hi Rohit, > > I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). My tasks use a mixture of python and c, although no communication takes place in the c code. > > I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into. > > I can post the code for a minimal task server/client if you like. > > best wishes, > David > > --- On Tue, 10/11/09, Rohit Garg wrote: > >> From: Rohit Garg >> Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster >> To: "SciPy Users List" , numpy-discussions at scipy.org >> Received: Tuesday, 10 November, 2009, 7:11 AM >> Hi all, >> >> I have an embarrassingly parallel problem, very nicely >> suited to >> parallelization. I am looking for community feedback on how >> to best >> approach this matter? Basically, I just setup a bunch of >> tasks, and >> the various cpu's will pull data, process it, and send it >> back. Out of >> order arrival of results is no problem. 
The processing >> times involved >> are so large that the communication is effectively free, >> and hence I >> don't care how fast/slow the communication is. I thought >> I'll ask in >> case somebody has done this stuff before to avoid >> reinventing the >> wheel. Any other suggestions are welcome too. >> >> My only constraint is that it should be able to run a >> python extension >> (c++) with minimum of fuss. I want to minimize the >> headaches involved >> with setting up/writing the boilerplate code. Which >> framework/approach/library would you recommend? >> >> There is one method mentioned at [1], and of course, one >> could resort >> to something like mpi4py. >> >> [1] http://docs.python.org/library/multiprocessing.html???{see >> the last example} >> >> -- >> Rohit Garg >> >> http://rpg-314.blogspot.com/ >> >> Senior Undergraduate >> Department of Physics >> Indian Institute of Technology >> Bombay >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Here's a little parallel processing library using Pyro which might be of interest to some: http://code.google.com/p/papyros/ From karl.young at ucsf.edu Mon Nov 9 16:05:49 2009 From: karl.young at ucsf.edu (Karl Young) Date: Mon, 09 Nov 2009 13:05:49 -0800 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: References: Message-ID: <4AF8842D.5010805@ucsf.edu> Sorry for the dumb question (but some of you know me by now !). I was able to stumble around and solve a differential equation I was working on in terms of Weierstrass elliptic functions (though an open source type of guy I have to thank Wolfram re. wloframalpha for help with that...). I'd like to evaluate the function for various sets of parameters and found that the special functions package for scipy has Jacobi elliptic functions available. I seem to recall that the Weierstrass elliptic functions are special cases of the Jacobi elliptic functions but haven't been able to locate any source that describes that in any detail. Anyone have any hints ? Thanks, -- Karl From vanforeest at gmail.com Mon Nov 9 16:17:44 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 9 Nov 2009 22:17:44 +0100 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: <4AF8842D.5010805@ucsf.edu> References: <4AF8842D.5010805@ucsf.edu> Message-ID: Hi Karl, I haven't checked.. you might try the books of Apostol (mathematical analysis), Courant and John, or Numerical recipes on this. bye Nicky 2009/11/9 Karl Young : > > Sorry for the dumb question (but some of you know me by now !). I was > able to stumble around and solve a differential equation I was working > on in terms of Weierstrass elliptic functions (though an open source > type of guy I have to thank Wolfram re. wloframalpha for help with > that...). I'd like to evaluate the function for various sets of > parameters and found that the special functions package for scipy has > Jacobi elliptic functions available. I seem to recall that the > Weierstrass elliptic functions are special cases of the Jacobi elliptic > functions but haven't been able to locate any source that describes that > in any detail. Anyone have any hints ? 
Thanks, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From davide.cittaro at ifom-ieo-campus.it Mon Nov 9 16:21:59 2009 From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro) Date: Mon, 9 Nov 2009 22:21:59 +0100 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: References: Message-ID: On Nov 9, 2009, at 10:03 PM, scipy-user-request at scipy.org wrote: > Message: 5 > Date: Mon, 9 Nov 2009 15:00:29 -0600 > From: Robert Kern > Subject: Re: [SciPy-User] poisson distribution in scipy.stats > To: SciPy Users List > Message-ID: > <3d375d730911091300h6d9b4df6u4be1e6ae0811b12 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Nov 9, 2009 at 14:58, Davide Cittaro > wrote: >> Hi all, >> about the poisson generator... given l (expected) and k (found) I >> guess that >> the way to get the probability of k I have to do this: >> >> d = scipy.stats.poisson(l) >> p = pmf(k) > > Correct. > >> which I found being the same of >> p = scipy.stats.poisson.pmf(l, k) > > Also correct. > Although I've just plotted values and they are sligthly different... the second version does have only a max value at l, whereas the first has two maxima (l and l-1) >> I've here some code in which it is written: >> d = scipy.stats.poisson(l, k) > > That one is completely wrong. > Ok, I see... It looks like it shifts the pmf of k... but how does it works? I mean, how the discrete distribution constructor interprets this kind of declaration? d /* Davide Cittaro Cogentech - Consortium for Genomic Technologies via adamello, 16 20139 Milano Italy tel.: +39(02)574303007 e-mail: davide.cittaro at ifom-ieo-campus.it */ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karl.young at ucsf.edu Mon Nov 9 16:13:05 2009 From: karl.young at ucsf.edu (Karl Young) Date: Mon, 09 Nov 2009 13:13:05 -0800 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <6c476c8a0911091303k3ecbbb24ve4a2abb862bb4da9@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <92881.60847.qm@web33004.mail.mud.yahoo.com> <6c476c8a0911091303k3ecbbb24ve4a2abb862bb4da9@mail.gmail.com> Message-ID: <4AF885E1.30709@ucsf.edu> If I were starting from scratch I'd learn how to do this using Ipython: http://ipython.scipy.org/moin/ but in the past I had great luck using pypar, a very simple interface to the MPI library; for embarrassingly parallel problems it's very easy to get up and running quickly by just copying examples (I knew a wee bit of MPI before using this but not much). http://datamining.anu.edu.au/~ole/pypar/ -- KY > On Mon, Nov 9, 2009 at 3:56 PM, David Baddeley > wrote: > >> Hi Rohit, >> >> I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). 
The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). My tasks use a mixture of python and c, although no communication takes place in the c code. >> >> I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into. >> >> I can post the code for a minimal task server/client if you like. >> >> best wishes, >> David >> >> --- On Tue, 10/11/09, Rohit Garg wrote: >> >> >>> From: Rohit Garg >>> Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster >>> To: "SciPy Users List" , numpy-discussions at scipy.org >>> Received: Tuesday, 10 November, 2009, 7:11 AM >>> Hi all, >>> >>> I have an embarrassingly parallel problem, very nicely >>> suited to >>> parallelization. I am looking for community feedback on how >>> to best >>> approach this matter? Basically, I just setup a bunch of >>> tasks, and >>> the various cpu's will pull data, process it, and send it >>> back. Out of >>> order arrival of results is no problem. The processing >>> times involved >>> are so large that the communication is effectively free, >>> and hence I >>> don't care how fast/slow the communication is. I thought >>> I'll ask in >>> case somebody has done this stuff before to avoid >>> reinventing the >>> wheel. Any other suggestions are welcome too. >>> >>> My only constraint is that it should be able to run a >>> python extension >>> (c++) with minimum of fuss. I want to minimize the >>> headaches involved >>> with setting up/writing the boilerplate code. Which >>> framework/approach/library would you recommend? >>> >>> There is one method mentioned at [1], and of course, one >>> could resort >>> to something like mpi4py. >>> >>> [1] http://docs.python.org/library/multiprocessing.html {see >>> the last example} >>> >>> -- >>> Rohit Garg >>> >>> http://rpg-314.blogspot.com/ >>> >>> Senior Undergraduate >>> Department of Physics >>> Indian Institute of Technology >>> Bombay >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > Here's a little parallel processing library using Pyro which might be > of interest to some: > > http://code.google.com/p/papyros/ > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > . 
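P.S. To give a flavour of how little pypar code an embarrassingly parallel job needs, the master/worker pattern is roughly the sketch below. This is typed from memory and untested, so check the pypar demos for the exact call signatures; do_one_task() is just a placeholder for whatever your python/C++ extension actually computes.

import pypar

def do_one_task(i):
    # placeholder for the real computation (e.g. a call into a C++ extension)
    return i ** 2

rank = pypar.rank()    # this process's id, 0 .. size-1
size = pypar.size()    # number of processes started by mpirun/mpiexec

ntasks = 1000
# static division of the work: process 'rank' takes every size-th task,
# so no communication is needed until the results are gathered at the end
my_results = [do_one_task(i) for i in range(rank, ntasks, size)]

if rank == 0:
    # rank 0 collects everybody else's results; arrival order doesn't matter
    all_results = list(my_results)
    for source in range(1, size):
        all_results.extend(pypar.receive(source))
    print '%d results collected' % len(all_results)
else:
    pypar.send(my_results, 0)

pypar.finalize()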
> > From josef.pktd at gmail.com Mon Nov 9 16:24:35 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 16:24:35 -0500 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: References: Message-ID: <1cd32cbb0911091324y47ad01edne8d3f666feb746e2@mail.gmail.com> On Mon, Nov 9, 2009 at 4:21 PM, Davide Cittaro wrote: > > On Nov 9, 2009, at 10:03 PM, scipy-user-request at scipy.org wrote: > > Message: 5 > Date: Mon, 9 Nov 2009 15:00:29 -0600 > From: Robert Kern > Subject: Re: [SciPy-User] poisson distribution in scipy.stats > To: SciPy Users List > Message-ID: > <3d375d730911091300h6d9b4df6u4be1e6ae0811b12 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Nov 9, 2009 at 14:58, Davide Cittaro > wrote: > > Hi all, > > about the poisson generator... given l (expected) and k (found) I guess that > > the way to get the probability of k I have to do this: > > d = scipy.stats.poisson(l) > > p = pmf(k) > > Correct. > > which I found being the same of > > p = scipy.stats.poisson.pmf(l, k) > > Also correct. > > > Although I've just plotted values and they are sligthly different... the > second version does have only a max value at l, whereas the first has two > maxima (l and l-1) > > I've here some code in which it is written: > > d = scipy.stats.poisson(l, k) k is interpreted as location same as stats.poisson(l, loc=k) loc shifts the distribution, and returns a frozen distribution with shifted support Josef > > That one is completely wrong. > > > Ok, I see... It looks like it shifts the pmf of k... but how does it works? > I mean, how the discrete distribution constructor interprets this kind of > declaration? > d > > /* > Davide Cittaro > Cogentech - Consortium for Genomic Technologies > via adamello, 16 > 20139 Milano > Italy > tel.: +39(02)574303007 > e-mail:?davide.cittaro at ifom-ieo-campus.it > */ > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From robert.kern at gmail.com Mon Nov 9 16:25:24 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Nov 2009 15:25:24 -0600 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: References: Message-ID: <3d375d730911091325g7c70aeb2x68fb7cb2137f0a06@mail.gmail.com> On Mon, Nov 9, 2009 at 15:21, Davide Cittaro wrote: > > On Nov 9, 2009, at 10:03 PM, scipy-user-request at scipy.org wrote: > > Message: 5 > Date: Mon, 9 Nov 2009 15:00:29 -0600 > From: Robert Kern > Subject: Re: [SciPy-User] poisson distribution in scipy.stats > To: SciPy Users List > Message-ID: > <3d375d730911091300h6d9b4df6u4be1e6ae0811b12 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Nov 9, 2009 at 14:58, Davide Cittaro > wrote: > > Hi all, > > about the poisson generator... given l (expected) and k (found) I guess that > > the way to get the probability of k I have to do this: > > d = scipy.stats.poisson(l) > > p = pmf(k) > > Correct. > > which I found being the same of > > p = scipy.stats.poisson.pmf(l, k) > > Also correct. > > > Although I've just plotted values and they are sligthly different... the > second version does have only a max value at l, whereas the first has two > maxima (l and l-1) I'm sorry. I meant "Wrong." p = scipy.stats.poisson.pmf(k, l) > I've here some code in which it is written: > > d = scipy.stats.poisson(l, k) > > That one is completely wrong. > > > Ok, I see... It looks like it shifts the pmf of k... but how does it works? 
> I mean, how the discrete distribution constructor interprets this kind of > declaration? Exactly as the documentation states: myrv = poisson(mu,loc=0) - frozen RV object with the same methods but holding the given shape and location fixed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From peridot.faceted at gmail.com Mon Nov 9 16:28:19 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 16:28:19 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4AF871BE.6050300@gmail.com> References: <4AF84EF1.2090608@gmail.com> <4AF871BE.6050300@gmail.com> Message-ID: 2009/11/9 Bruce Southey : > On 11/09/2009 12:06 PM, Anne Archibald wrote: >> 2009/11/9 Bruce Southey: >> >>> I do not know what you are trying to do with the code as it is not my >>> area. But you are using some empirical Bayesian estimator >>> (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose >>> much of the value of Bayesian as you are only dealing with modal >>> estimates. Really you should be obtaining the distribution of >>> "Probability the signal is pulsed" not just the modal estimate. >>> >> Um. Given a data set and a prior, I just do Bayesian hypothesis >> comparison. This gives me a single probability that the signal is >> pulsed. You seem to be imagining a probability distribution for this >> probability - but what would the independent variables be? The >> unpulsed distribution does not depend on any parameters, and I have >> integrated over all possible values for the pulsed distribution. So >> what I get should really be the probability, given the data, that the >> signal is pulsed. I'm not using an empirical Bayesian estimator; I'm >> doing the numerical integrations directly (and inefficiently). >> > Here are two links on what I mean with reference to the binomial case: > http://lingpipe-blog.com/2009/09/11/batting-averages-bayesian-vs-mle-estimate/ > > TEACHING OF BAYESIAN ESTIMATION OF ?P? PROBABILITY > IN A BERNOULLI PROCESS: > http://www.stat.auckland.ac.nz/~iase/publications/17/C439.pdf > > > I do not know your area but you should be able to do something similar. They are doing something essentially different from what I am doing. They have a single (parameterized) hypothesis, so they don't compute a probability of it being the case rather than some other hypothesis. Perhaps you are being misled by the fact that the system they are reasoning about is a binomial system, in which the parameter is "probability of occurrence". In my case, I am not working with a binomial system; the closest analog in my system to their p is my fraction parameter, and I seem to have a usable way to test the posterior distribution of this parameter. It is the hypothesis testing that I am trying to test at the moment. Anne From karl.young at ucsf.edu Mon Nov 9 18:44:40 2009 From: karl.young at ucsf.edu (Karl Young) Date: Mon, 09 Nov 2009 15:44:40 -0800 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: References: <4AF8842D.5010805@ucsf.edu> Message-ID: <4AF8A968.8030907@ucsf.edu> Hi Nicky, Thanks for the tips, -- Karl > Hi Karl, > > I haven't checked.. you might try the books of Apostol (mathematical > analysis), Courant and John, or Numerical recipes on this. > > bye > > Nicky > > 2009/11/9 Karl Young : > >> Sorry for the dumb question (but some of you know me by now !). 
I was >> able to stumble around and solve a differential equation I was working >> on in terms of Weierstrass elliptic functions (though an open source >> type of guy I have to thank Wolfram re. wloframalpha for help with >> that...). I'd like to evaluate the function for various sets of >> parameters and found that the special functions package for scipy has >> Jacobi elliptic functions available. I seem to recall that the >> Weierstrass elliptic functions are special cases of the Jacobi elliptic >> functions but haven't been able to locate any source that describes that >> in any detail. Anyone have any hints ? Thanks, >> >> -- Karl >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From sturla at molden.no Mon Nov 9 22:48:01 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 10 Nov 2009 04:48:01 +0100 Subject: [SciPy-User] scipy.org is down Message-ID: <4AF8E271.9090301@molden.no> Someone please restart the web-server. Sturla From denis-bz-gg at t-online.de Tue Nov 10 10:53:56 2009 From: denis-bz-gg at t-online.de (denis) Date: Tue, 10 Nov 2009 07:53:56 -0800 (PST) Subject: [SciPy-User] Contribution to Performance Python In-Reply-To: <4AF40025.3000906@iqac.csic.es> References: <4AF40025.3000906@iqac.csic.es> Message-ID: <65963269-c0fe-483a-8628-bbcc8a8f9b9e@c3g2000yqd.googlegroups.com> Ramon, not an answer to your question, just a word of agreement: FORALL is a fine construct, natural to write in pseudocode (I don't use fortran) in python, generators in C++: for( j k x y in parallel) is very fast on superscalar cpus (I use e.g. #define Forjkxy(...) a[j][k] = f(x,y), don't tell the C++ police) in testing: for a, b, c in product( [a...], [b...], [c...] ) Would you have any more examples ? They'd help spread FORALL. Apropos examples, one (1) laplace is a bit meager; has anyone (flameproof) done others, weave vs cython vs C or fortran ? cheers -- denis From robince at gmail.com Tue Nov 10 11:21:25 2009 From: robince at gmail.com (Robin) Date: Tue, 10 Nov 2009 16:21:25 +0000 Subject: [SciPy-User] Contribution to Performance Python In-Reply-To: <4AF40025.3000906@iqac.csic.es> References: <4AF40025.3000906@iqac.csic.es> Message-ID: <2d5132a50911100821l1332528bl9509f2f11e74e081@mail.gmail.com> On Fri, Nov 6, 2009 at 10:53 AM, Ramon Crehuet wrote: > If this is interesting to the community, who should I contact to have > this included in the scipy web page? Hi, It's a wiki, so I think you should be able to register an account and modify the page yourself. I'd certainly support the addition of the new fortran versions (although perhaps one is sufficient since they both seem to perform very closely) and perhaps updated timings section. Cheers Robin From dyamins at gmail.com Tue Nov 10 11:37:19 2009 From: dyamins at gmail.com (Dan Yamins) Date: Tue, 10 Nov 2009 11:37:19 -0500 Subject: [SciPy-User] Edge Detection Message-ID: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> Hi, I'm looking into using SciPy for a couple of edge-detection problems, involving detection of edges in images of text (in simple, clean fonts). If someone on this list could point me to a relevant resource / function, that would be excellent. 
(I have essentially no background in image processing, but am reasonably comfortable mathematically, and I would be happy to dive into something fairly technical.) thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Tue Nov 10 11:48:51 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 10 Nov 2009 11:48:51 -0500 Subject: [SciPy-User] Edge Detection In-Reply-To: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> Message-ID: <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> References: Start around just looking at the top google hits for "image processing edge detection" -- that should be a pretty good start. Also, google any unfamiliar terms below... I really find that there's a ton of good basic image-processing information available online. Code: Look at what's available in scipy.ndimage. There are functions for getting gradient magnitudes, as well as standard filters like Sobel etc. (which you'll learn about from the above), plus morphological operators for modifying binarized image regions (e.g. like erosion etc.; useful for getting rid of stray noise-induced edges), plus some basic functions for image smoothing like median filters, etc. For exploratory analysis, you might want some ability to interactively visualize images; you could use matplotlib or the imaging scikit, which is still pre-release but making fast progress: http://github.com/stefanv/scikits.image I've attached basic code for Canny edge detection, which should demonstrate a bit about how ndimage works, plus it's useful in its own right. There is also some code floating around for anisotropic diffusion and bilateral filtering, which are two noise-reduction methods that can be better than simple median filtering. Zach On Nov 10, 2009, at 11:37 AM, Dan Yamins wrote: > Hi, > > I'm looking into using SciPy for a couple of edge-detection > problems, involving detection of edges in images of text (in simple, > clean fonts). If someone on this list could point me to a relevant > resource / function, that would be excellent. (I have essentially > no background in image processing, but am reasonably comfortable > mathematically, and I would be happy to dive into something fairly > technical.) > > thanks, > Dan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- A non-text attachment was scrubbed... Name: canny.py Type: text/x-python-script Size: 2174 bytes Desc: not available URL: From agile.aspect at gmail.com Tue Nov 10 12:54:31 2009 From: agile.aspect at gmail.com (Agile Aspect) Date: Tue, 10 Nov 2009 09:54:31 -0800 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: <4AF8842D.5010805@ucsf.edu> References: <4AF8842D.5010805@ucsf.edu> Message-ID: On Mon, Nov 9, 2009 at 1:05 PM, Karl Young wrote: > > Sorry for the dumb question (but some of you know me by now !). I was > able to stumble around and solve a differential equation I was working > on in terms of Weierstrass elliptic functions (though an open source > type of guy I have to thank Wolfram re. wloframalpha for help with > that...). I'd like to evaluate the function for various sets of > parameters and found that the special functions package for scipy has > Jacobi elliptic functions available. 
I seem to recall that the > Weierstrass elliptic functions are special cases of the Jacobi elliptic > functions but haven't been able to locate any source that describes that > in any detail. Anyone have any hints ? Thanks, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Take a look at http://eom.springer.de/w/w097450.htm just before the references, or http://en.wikipedia.org/wiki/Weierstrass's_elliptic_functions#Relation_to_Jacobi_elliptic_functions just before the references. -- Enjoy global warming while it lasts. From mudit_19a at yahoo.com Tue Nov 10 13:16:53 2009 From: mudit_19a at yahoo.com (mudit sharma) Date: Tue, 10 Nov 2009 23:46:53 +0530 (IST) Subject: [SciPy-User] Pytseries numpy func error In-Reply-To: References: <4AF8842D.5010805@ucsf.edu> Message-ID: <835246.33088.qm@web94906.mail.in2.yahoo.com> series.sum() gives this error whereas series.data.sum() works. /usr/local/lib/python2.6/dist-packages/scikits.timeseries-0.91.1-py2.6-linux-x86_64.egg/scikits/timeseries/tseries.pyc in __call__(self, *args, **params) 471 (_dates, _series) = (instance._dates, instance._series) 472 func = getattr(_series, self.__name__) --> 473 result = func(*args, **params) 474 if _dates.size != _series.size: 475 axis = params.get('axis', None) /usr/local/lib/python2.6/dist-packages/numpy-1.3.0-py2.6-linux-x86_64.egg/numpy/ma/core.pyc in sum(self, axis, dtype, out) 3675 # No explicit output 3676 if out is None: -> 3677 result = self.filled(0).sum(axis, dtype=dtype).view(type(self)) 3678 if result.ndim: 3679 result.__setmask__(newmask) AttributeError: 'float' object has no attribute 'view' From rcsqtc at iqac.csic.es Tue Nov 10 13:26:10 2009 From: rcsqtc at iqac.csic.es (Ramon Crehuet) Date: Tue, 10 Nov 2009 19:26:10 +0100 Subject: [SciPy-User] Contribution to Performance Python Message-ID: <4AF9B042.3040402@iqac.csic.es> Robin, I know it is a wiki, but you need to get permission to modify it. I've tried registering at: http://docs.scipy.org/numpy/accounts/login but I always get an Authentication failed error. However if I try to register again, it complains that my username is already in use (it wasn't the first time I registered, so that is me! :-) ) My username is rcrehuet, and the address rcsqtc_at_iqac.csic.es Cheers, Ramon PS. David, I am willing to contribute to the cookbook. On Fri, Nov 6, 2009 at 10:53 AM, Ramon Crehuet wrote: > > If this is interesting to the community, who should I contact to have > > this included in the scipy web page? Hi, It's a wiki, so I think you should be able to register an account and modify the page yourself. I'd certainly support the addition of the new fortran versions (although perhaps one is sufficient since they both seem to perform very closely) and perhaps updated timings section. Cheers Robin From robince at gmail.com Tue Nov 10 13:40:37 2009 From: robince at gmail.com (Robin) Date: Tue, 10 Nov 2009 18:40:37 +0000 Subject: [SciPy-User] Contribution to Performance Python In-Reply-To: <4AF9B042.3040402@iqac.csic.es> References: <4AF9B042.3040402@iqac.csic.es> Message-ID: <2d5132a50911101040gfdfa593r3b914cfaf1f7e0a2@mail.gmail.com> On Tue, Nov 10, 2009 at 6:26 PM, Ramon Crehuet wrote: > Robin, > I know it is a wiki, but you need to get permission to modify it. I've > tried registering at: > http://docs.scipy.org/numpy/accounts/login > but I always get an Authentication failed error. 
However if I try to > register again, it complains that my username is already in use (it > wasn't the first time I registered, so that is me! :-) ) My username is > rcrehuet, and the address rcsqtc_at_iqac.csic.es I think docs.scipy.org is specifically for the documentation effort - it is a wiki whereby people can contribute to docstrings that will then be merged back into the numpy source code. It does require permissions to edit, but I thought the scipy website, which is a wiki at www.scipy.org requires registration, but I thought only certain pages (like the front page) were locked and the rest should be editable by any user. Try to follow the login link at the top right directly from the page you want to edit: http://www.scipy.org/PerformancePython There is a link from the login screen to register a wiki account here: http://www.scipy.org/UserPreferences So you could try that. Cheers Robin > Cheers, > Ramon > > PS. David, I am willing to contribute to the cookbook. > > > > On Fri, Nov 6, 2009 at 10:53 AM, Ramon Crehuet wrote: >> > If this is interesting to the community, who should I contact to have >> > this included in the scipy web page? > > Hi, > > It's a wiki, so I think you should be able to register an account and > modify the page yourself. I'd certainly support the addition of the > new fortran versions (although perhaps one is sufficient since they > both seem to perform very closely) and perhaps updated timings > section. > > Cheers > > Robin > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From david_baddeley at yahoo.com.au Tue Nov 10 15:51:14 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 10 Nov 2009 12:51:14 -0800 (PST) Subject: [SciPy-User] Saving record arrays to tab formatted txt Message-ID: <67025.17033.qm@web33008.mail.mud.yahoo.com> Hi all, does anyone know of an easy way to save record arrays as tab formatted txt? numpy.savetxt doesn't do the trick. I've got a record array with nested records, and mixed data types - dtype is as follows: dtype([('tIndex', ' In the code below (which is an extraction from a larger set of file), I can plot fine across the dateline and elsewhere, however any plots across the PM error out. I'm using OpenLayers to allow users to designate their plot area via a web form by dragging a map area. OpenLayers provide longitudes from 180/-180. The data I'm extracting has longitudes from 0-360 so some manipulation is necessary. I suspect the problem that is causing the error is the line that reads: "grid = grid[slat:nlat+1,wlong:elong+1]" because my wlong is greater than my elong. Does anyone have a suggestion for handling this at the PM. Any suggestions would be appreciated. 
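The only workaround I've come up with so far is to detect the wrap-around case and stitch two slices back together, roughly like the untested fragment below (meant to replace the single slicing line inside read_netcdf); I'm not sure this is the right approach, and I suspect the meshgrid used to build x,y for plotting would need the same treatment:

# untested idea: when the requested box straddles the prime meridian
# (wlong > elong after converting to 0-360), pull out the piece west of
# the meridian and the piece east of it, then glue them back together
# along the longitude axis
if wlong > elong:
    west_piece = grid[slat:nlat+1, wlong:]      # wlong .. 359
    east_piece = grid[slat:nlat+1, :elong+1]    # 0 .. elong
    grid = M.concatenate((west_piece, east_piece), axis=1)
else:
    grid = grid[slat:nlat+1, wlong:elong+1]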
The error is in the comments of the below script: import matplotlib import matplotlib.pyplot as pyplot #used to build contour and wind barbs plots import matplotlib.colors as pycolors #used to build color schemes for plots import numpy.ma as M #matrix manipulation functions from mpl_toolkits.basemap import Basemap import numpy as np #used to perform simple math functions on data from numpy import * ############################################################## def read_netcdf(filename, hour, param, wlong, elong, nlat, slat,day_time_level): from netCDF4 import Dataset #interprets NetCDF files nc_file = Dataset(filename, mode="r") nlat = nlat+90 slat = slat+90 swh = nc_file.variables['sig_wav_ht'][:] swh = np.squeeze(swh) grid = swh[int(day_time_level)-1, :,: ] grid = M.array(grid) grid = M.masked_where(grid < 0.001, grid) grid = grid[slat:nlat+1,wlong:elong+1] return grid ############################################################### map_res = "i" nlat = 7 slat = -7 #******************************************************* #when the below values for wlong and elong are as such # (across the Prime Meridian) I get the following error: #Traceback (most recent call last): # File "C:\Program Files\Wing\src\debug\tserver\_sandbox.py", line 58, in # File "C:\Python25\Lib\site-packages\numpy\ma\core.py", line 4262, in min # result = self.filled(fill_value).min(axis=axis, out=out).view(type(self)) #ValueError: zero-size array to ufunc.reduce without identity wlong = -12 elong = 8 #******************************************************* # When the below two values are as such, I get the figure I intend #wlong = -89 #elong = -75 day_raw = 1 param = "sig" year1 = 1993 month = "01" vint = 15 para_spacing = 10 merid_spacing = 10 hour = "%02d" % day_raw if wlong < 0: wlong = wlong+360 if elong < 0: elong = elong+360 vtype = "auto" filename = "x:/ww3/NetCDF/daily_means/ww3dm."+str(year1)+str(month)+ ".nc" grid = read_netcdf(filename, hour=hour, param=param, wlong=wlong, elong=elong, nlat=nlat, slat=slat, day_time_level=day_raw) m = Basemap(projection='cyl',resolution=map_res,llcrnrlon=wlong,llcrnrlat=slat,urcrnrlon=elong,urcrnrlat=nlat) x,y = m(*np.meshgrid(range(wlong,elong+1),range(slat,nlat+1))) print wlong print elong if vtype == 'auto': vmin = grid.min() print "
Vmin: ",vmin,"
" vmax = grid.max() print "
Vmax: ",vmax,"
" #vmin = 0 #vmax = 30 elif vtype== "manual": grid = M.masked_outside(grid,vmin,vmax) pyplot.jet() plot = m.contour(x,y,grid,int(vint)-1,linewidths=0.5,colors='k') plot = m.contourf(x,y,grid,int(vint)-1,cmap=pyplot.cm.jet) m.drawcoastlines() #draw coastlines m.drawmapboundary() #draw a line around the map region m.fillcontinents(color='0.8', lake_color=None, ax=None, zorder=None) #fill in continents with color (gray) m.drawparallels(np.arange(-90,90,para_spacing),labels=[1,0,0,0]) #draw parallels m.drawmeridians(np.arange(-180,180,merid_spacing),labels=[0,0,0,1]) #draw meridians pyplot.show() Bruce --------------------------------------- Bruce W. Ford Clear Science, Inc. bruce at clearscienceinc.com bruce.w.ford.ctr at navy.smil.mil http://www.ClearScienceInc.com Phone/Fax: 904-379-9704 8241 Parkridge Circle N. Jacksonville, FL 32211 Skype: bruce.w.ford Google Talk: fordbw at gmail.com From jsseabold at gmail.com Tue Nov 10 17:00:47 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 10 Nov 2009 17:00:47 -0500 Subject: [SciPy-User] Saving record arrays to tab formatted txt In-Reply-To: <67025.17033.qm@web33008.mail.mud.yahoo.com> References: <67025.17033.qm@web33008.mail.mud.yahoo.com> Message-ID: On Nov 10, 2009, at 3:51 PM, David Baddeley wrote: > Hi all, > > does anyone know of an easy way to save record arrays as tab > formatted txt? numpy.savetxt doesn't do the trick. > > I've got a record array with nested records, and mixed data types - > dtype is as follows: > > dtype([('tIndex', ' ' ('bx', ' ' ('bx', ' ('slicesUsed', [('x', [('start', ' ' ' ' > and want to flatten it so that each entry in the table becomes a row > in the .txt file. ie: > > tIndex fitResults_A fitResults_x0 ....\n > tIndex fitResults_A fitResults_x0 ....\n > etc... > > If necessary I'll write my own code to format up a string for each > line, but thought I'd ask first in case anyone knew of a pre- > existing solution. It'd ideally be generic as I'm also likely to > want to use variations on the data type. > > many thanks, > David > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I have just been using savetxt as a template and adding processing as needed. Would be interested in hearing any other solutions or extending savetxt to be more flexible / general. -Skipper From mattknox.ca at gmail.com Tue Nov 10 17:29:41 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 10 Nov 2009 22:29:41 +0000 (UTC) Subject: [SciPy-User] Pytseries numpy func error References: <4AF8842D.5010805@ucsf.edu> <835246.33088.qm@web94906.mail.in2.yahoo.com> Message-ID: > series.sum() gives this error whereas series.data.sum() > works. I don't get this error when trying a sum on a TimeSeries object. I noticed you are using an older version of the timeseries module. Can you try upgrading to the latest version and see if you still get an error? Also, if you still get the error please post a small example demonstrating how to get the error, thanks. Also, note that we will probably be doing a new minor bug fix release within the next week or two. 
- Matt From eadrogue at gmx.net Tue Nov 10 17:38:23 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 10 Nov 2009 23:38:23 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911091136m26c9dd37r229051142c43c63d@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <20091109181650.GA5957@doriath.local> <1cd32cbb0911091136m26c9dd37r229051142c43c63d@mail.gmail.com> Message-ID: <20091110223823.GA10421@doriath.local> 9/11/09 @ 14:36 (-0500), thus spake josef.pktd at gmail.com: > I'm not sure the example is correct. You are simulating two > independent poisson variables, so the difference skellam_var should be > distributed as skellam with > mu1 = lam1 > mu2 = lam2 > and theoretical rho should be zero. Or am I missing something? Yes, rho should be zero, but in practice, there may be a certain amount of correlation even though the variables are theoretically independent. If you set rho=0 when it's not actually zero, this will result in an artificially worse fit, which sort of defeats the purpose of this test. It would make sense to set rho=0 if we were testing whether the to Poisson variates are independent, though. Bye the way, I have opened a ticket here: http://projects.scipy.org/scipy/ticket/1050 Bye. Ernest From eadrogue at gmx.net Tue Nov 10 17:47:05 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 10 Nov 2009 23:47:05 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911090858r60713d83jf4e351401a640b12@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <4AF8411D.9050605@gmail.com> <1cd32cbb0911090858r60713d83jf4e351401a640b12@mail.gmail.com> Message-ID: <20091110224705.GB10421@doriath.local> 9/11/09 @ 11:58 (-0500), thus spake josef.pktd at gmail.com: > from > Bayesian analysis of the dierences of count data > D. Karlis and I. Ntzoufras > STATISTICS IN MEDICINE > Statist. Med. 2006; 25:1885?1905 > > they have some funny application to soccer scores > http://stat-athens.aueb.gr/~jbn/publications.htm Yes, I have working on modelling soccer scores for some time now, and have come to the conclusion that Poisson models are doomed, because in essence football scores are not Poisson distributed. If you are interested in these things let me suggest you to concentrate your efforts on the negative binomial model, no matter how tempting the Poisson model may be. Cheers. Ernest From devicerandom at gmail.com Wed Nov 11 11:26:23 2009 From: devicerandom at gmail.com (ms) Date: Wed, 11 Nov 2009 16:26:23 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters Message-ID: <4AFAE5AF.3020506@gmail.com> Hi, Probably it is a noobish question, but statistics is still not my cup of tea as I'd like it to be. :) Let's start with a simple example. Imagine I have several linear data sets y=ax+b which have different b (all of them are known) but that should fit to the same (unknown) a. To have my best estimate of a, I would want to fit them all together. In this case it is trivial, you just subtract the known b from the data set and fit them all at the same time. In my case it is a bit different, in the sense that I have to do conceptually the same thing but for a highly non-linear equation where the equivalent of "b" above is not so simple to separate. 
I wonder therefore if there is a way to do a simultaneous fit of different equations differing only in the known parameters and having a single output, possibly with the help of ODR. Is this possible? And/or what should be the best thing to do, in general, for this kind of problems? Many thanks, M. From bsouthey at gmail.com Wed Nov 11 12:04:14 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Nov 2009 11:04:14 -0600 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFAE5AF.3020506@gmail.com> References: <4AFAE5AF.3020506@gmail.com> Message-ID: <4AFAEE8E.9080000@gmail.com> On 11/11/2009 10:26 AM, ms wrote: > Hi, > > Probably it is a noobish question, but statistics is still not my cup of > tea as I'd like it to be. :) > > Let's start with a simple example. Imagine I have several linear data > sets y=ax+b which have different b (all of them are known) but that > should fit to the same (unknown) a. To have my best estimate of a, I > would want to fit them all together. In this case it is trivial, you > just subtract the known b from the data set and fit them all at the same > time. > Although b is known without error you still have potentially effects due to each data set. What I would do is fit: y= mu + dataset + a*x + dataset*a*x Where mu is some overall mean, dataset is the effect of the ith dataset - allows different intercepts for each data set dataset*a is the interaction between a and the dataset - allows different slopes for each dataset. Obviously you first test that interaction is zero. In theory, the difference between the solutions of dataset should equate to the differences between the known b's. > In my case it is a bit different, in the sense that I have to do > conceptually the same thing but for a highly non-linear equation where > the equivalent of "b" above is not so simple to separate. I wonder > therefore if there is a way to do a simultaneous fit of different > equations differing only in the known parameters and having a single > output, possibly with the help of ODR. Is this possible? And/or what > should be the best thing to do, in general, for this kind of problems? > > Many thanks, > M. > Now you just expand your linear model to nonlinear one. The formulation depends on your equation. But really you just replace f(a*x) with f(a*x+dataset*a*x). So I first try with a linear model before a nonlinear. Also I would see if I could linearize the non-linear function. Bruce From seefeld at sympatico.ca Wed Nov 11 12:01:47 2009 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 11 Nov 2009 12:01:47 -0500 Subject: [SciPy-User] Use of MPI in extension modules Message-ID: <4AFAEDFB.10808@sympatico.ca> Hello, I have a rather basic question about using (C++) extension modules with ipython. Sorry if this is the wrong list for this. I'm working on a signal & image processing library that uses MPI internally. I'd like to provide a Python interface to it, so I can integrate it into SciPy. With 'normal' Python this all works nicely. Just recently I have started to consider parallelism, i.e. I want to use the library's internal parallelism, by running it with ipython in parallel. My assumption was that all the engines started via 'ipcluster mpiexec ..." would already have MPI_Init called, and thus, my extension modules would merely share the global MPI state with the Python interpreter. 
That doesn't seem to be the case, as I either see all my module instances report rank 0, or, if I don't call MPI_Init, get a failure on the first MPI call I do. Can anybody help ? Do I need to initialize MPI myself in my extension module ? Any pointers are highly appreciated. Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From ellisonbg.net at gmail.com Wed Nov 11 13:23:42 2009 From: ellisonbg.net at gmail.com (Brian Granger) Date: Wed, 11 Nov 2009 10:23:42 -0800 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFAEDFB.10808@sympatico.ca> References: <4AFAEDFB.10808@sympatico.ca> Message-ID: <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> Stefan, This is probably a better topic for the IPython users list: http://mail.scipy.org/mailman/listinfo/ipython-user I'm working on a signal & image processing library that uses MPI internally. I'd like to provide a Python interface to it, so I can > integrate it into SciPy. With 'normal' Python this all works nicely. > Just recently I have started to consider parallelism, i.e. I want to use > the library's internal parallelism, by running it with ipython in parallel. > My assumption was that all the engines started via 'ipcluster mpiexec > ..." would already have MPI_Init called, and thus, my extension modules > would merely share the global MPI state with the Python interpreter. > That doesn't seem to be the case, as I either see all my module > instances report rank 0, or, if I don't call MPI_Init, get a failure on > the first MPI call I do. > > You do need to tell the IPython engine how the should call MPI_Init. The best way of doing this is to install mpi4py and then call ipcluster with the --mpi=mpi4py option. Once you do this, you can simply import your extension module and use it - you won't have to call MPI_Init again. The reason that IPython need to be told how MPI_Init is called is that we try to make sure that the engine ids match the MPI ranks. But, one question. Why not use mip4py for yor MPI calls? If you really need low-level C stuff mpi4py works very well with cython. All that would be much more pleasant than writing low level C/MPI code. The key is that mpi4py handles all the subtleties of the different MPI platforms, and OSs. Doing that yourself is quite painful. Cheers, Brian Can anybody help ? Do I need to initialize MPI myself in my extension > module ? > Any pointers are highly appreciated. > > Thanks, > Stefan > > -- > > ...ich hab' noch einen Koffer in Berlin... > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Wed Nov 11 13:28:24 2009 From: tmp50 at ukr.net (Dmitrey) Date: Wed, 11 Nov 2009 20:28:24 +0200 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions In-Reply-To: Message-ID: So, anyone doesn't know answers to the questions about scipy.sparse module mentioned below? As for the bug mentioned, I have installed latest numpy & scipy svn snapshots (1.4.0.dev7726 and 0.8.0.dev6096), the bug still exist. D. --- ???????? ????????? --- ?? ????: "Dmitrey" ????: scipy-user at scipy.org ????: 8 ??????, 13:22:34 ????: [SciPy-User] isn't it a bug in scipy.sparse? + some questions Hi scipy.sparse developers and all other scipy users, I'm trying to take benefits for solving SLEs in FuncDesigner via involving scipy.sparse. 
Some examples are here http://openopt.org/FuncDesignerDoc#Solving_systems_of_linear_equations and example for sparse SLEs is here http://trac.openopt.org/openopt/browser/PythonPackages/FuncDesigner/FuncDesigner/examples/sparseSLE.py It already works faster than using dense matrices, but I want to speedup it even more, so I have some questions and seems like bug report (scipy.__version__ 0.7.0): from scipy import sparse from numpy import * a=sparse.lil_matrix((3,1)) a[0:3,:] = ones(3) print a.todense() #prints [[ 1.] ?[ 0.] ?[ 0.]] while I expect all-ones Questions: 1) Seems like a[some_ind,:]=something works very, very slow for lil. I have implemented a workaround, but can I use a[some_ind,:] for another format than lil? (seems like all other ones doesn't support it). 2) What is current situation with matmat and matvec functions? They say "deprecated" but no alternative is mentioned. 3) What is current situation with scipy.sparse.linalg.spsolve? It says /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ? ' install scikits.umfpack instead', DeprecationWarning ) But I don't want my code to be dependent on a scikits module. Are there another default/autoselect solver for sparse SLEs? If no, which one would you recommend me to use as default for sparse SLEs - bicg, gmres, something else? Thank you in advance, D. _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Nov 11 14:36:31 2009 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 11 Nov 2009 21:36:31 +0200 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions In-Reply-To: References: Message-ID: <1257968191.4524.2.camel@idol> ke, 2009-11-11 kello 20:28 +0200, Dmitrey kirjoitti: > So, anyone doesn't know answers to the questions about scipy.sparse > module mentioned below? Or the people who know did not have time to immediately answer your question, and forgot about the mail afterwards. If you think it's a bug, please file a ticket in the Trac. Thanks! -- Pauli Virtanen From denis.laxalde at gmail.com Wed Nov 11 14:48:20 2009 From: denis.laxalde at gmail.com (Denis Laxalde) Date: Wed, 11 Nov 2009 14:48:20 -0500 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions In-Reply-To: References: Message-ID: <1257968900.22136.24.camel@157-rome.campus.mcgill.ca> Hi Dmitrey, Le mercredi 11 novembre 2009 ? 20:28 +0200, Dmitrey a ?crit : > So, anyone doesn't know answers to the questions about scipy.sparse > module mentioned below? > As for the bug mentioned, I have installed latest numpy & scipy svn > snapshots (1.4.0.dev7726 and 0.8.0.dev6096), the bug still exist. > D. > > --- ???????? ????????? --- > ?? ????: "Dmitrey" > ????: scipy-user at scipy.org > ????: 8 ??????, 13:22:34 > ????: [SciPy-User] isn't it a bug in scipy.sparse? + some questions > > Hi scipy.sparse developers and all other scipy users, > I'm trying to take benefits for solving SLEs in FuncDesigner > via involving scipy.sparse. 
> Some examples are here > http://openopt.org/FuncDesignerDoc#Solving_systems_of_linear_equations > and example for sparse SLEs is here > http://trac.openopt.org/openopt/browser/PythonPackages/FuncDesigner/FuncDesigner/examples/sparseSLE.py > It already works faster than using dense matrices, but I want > to speedup it even more, so I have some questions and seems > like bug report (scipy.__version__ 0.7.0): > > from scipy import sparse > from numpy import * > a=sparse.lil_matrix((3,1)) > a[0:3,:] = ones(3) > print a.todense() > #prints > [[ 1.] > [ 0.] > [ 0.]] > while I expect all-ones in this case, using: a[0:3,:] = 1 will do what you want. I don't know if it's really a bug. > > Questions: > 1) Seems like a[some_ind,:]=something works very, very slow > for lil. I have implemented a workaround, but can I use > a[some_ind,:] for another format than lil? (seems like all > other ones doesn't support it). >From what I understand, lil format is useful for building matrices terms by terms. As for advanced indexing operations, I guess coo format is more appropriate... -- Denis From sturla at molden.no Wed Nov 11 17:16:43 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Nov 2009 23:16:43 +0100 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFAEDFB.10808@sympatico.ca> References: <4AFAEDFB.10808@sympatico.ca> Message-ID: <4AFB37CB.7000807@molden.no> Stefan Seefeld skrev: > the library's internal parallelism, by running it with ipython in parallel. > My assumption was that all the engines started via 'ipcluster mpiexec > ..." would already have MPI_Init called, and thus, my extension modules > would merely share the global MPI state with the Python interpreter. I don't know ipython, but I use MPI now and then. You can e.g. spawn 4 processes of an executable using a statement like: $ mpiexec -n 4 executable Each process spawned ny mpiexec must call MPI_Init once and before any other MPI call. The call to MPI_Init is global to the process, it does not matter that Python extensions are DLLs. You need to call MPI_Init exactly once in each MPI-spawned process, and it does not matter how: - using ctypes - in an extension module - in C code embedding a Python interpreter - in a modified Python interpreter If you only get rank 0 reported, it means you spawned just one process. That could happen if you forget to specify how many processes you want in the call to mpiexec. Sturla From seefeld at sympatico.ca Wed Nov 11 17:45:35 2009 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 11 Nov 2009 17:45:35 -0500 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> References: <4AFAEDFB.10808@sympatico.ca> <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> Message-ID: <4AFB3E8F.3040009@sympatico.ca> On 11/11/2009 01:23 PM, Brian Granger wrote: > Stefan, > > This is probably a better topic for the IPython users list: > > http://mail.scipy.org/mailman/listinfo/ipython-user Thanks ! I didn't know that actually exists. It doesn't appear to be listed on either http://www.scipy.org or http://ipython.scipy.org, nor on http://www.scipy.org/Mailing_Lists. I'll cross-post there, so we may continue the conversation there, assuming I'm not moderated. > > I'm working on a signal & image processing library that uses MPI > > internally. I'd like to provide a Python interface to it, so I can > integrate it into SciPy. With 'normal' Python this all works nicely. 
> Just recently I have started to consider parallelism, i.e. I want > to use > the library's internal parallelism, by running it with ipython in > parallel. > My assumption was that all the engines started via 'ipcluster mpiexec > ..." would already have MPI_Init called, and thus, my extension > modules > would merely share the global MPI state with the Python interpreter. > That doesn't seem to be the case, as I either see all my module > instances report rank 0, or, if I don't call MPI_Init, get a > failure on > the first MPI call I do. > > > You do need to tell the IPython engine how the should call MPI_Init. > The best way of doing this > is to install mpi4py and then call ipcluster with the --mpi=mpi4py option. > > Once you do this, you can simply import your extension module and use > it - you won't have > to call MPI_Init again. The reason that IPython need to be told how > MPI_Init is called > is that we try to make sure that the engine ids match the MPI ranks. I'm not sure I understand. In fact, I had expected *only* ipython needed to know how to call MPI_Init. The rest of my own (extension) code then merely assumes it has been called with the appropriate arguments (which ultimately come from "mpirun", which itself is invoked by ipcluster, isn't it ? Is that not true ? Is there some documentation that explains the interaction between (i)python (the ipcluster.py module in particular), mpirun, and the ipengine script that the latter then invokes ? May be I can call MPI_Init() myself, if I know the arguments I need to pass along. > > But, one question. Why not use mip4py for yor MPI calls? If you > really need low-level C stuff > mpi4py works very well with cython. All that would be much more > pleasant than writing > low level C/MPI code. The key is that mpi4py handles all the > subtleties of the different MPI > platforms, and OSs. Doing that yourself is quite painful. Well, happily this is already done. :-) (I'm talking about http://www.codesourcery.com/vsiplplusplus) In fact, we have embedded most of the MPI subtleties deeply in our library. Let me (very quickly) outline the idea of our approach: The library provides a set of block types (for vectors, matrices, tensors), which may or may not be distributed. Most of the MPI calls need to be done on assignment, i.e. an equation "A = B" will result in communication if (and only if) A and B are distributed, and their distributions don't match. This programming paradigm is very similar to that used in pMatlab (http://www.ll.mit.edu/pMatlab) So, all of this is already done. I'm now merely interested in adding Python bindings to it. Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From seefeld at sympatico.ca Wed Nov 11 17:48:11 2009 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 11 Nov 2009 17:48:11 -0500 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFB37CB.7000807@molden.no> References: <4AFAEDFB.10808@sympatico.ca> <4AFB37CB.7000807@molden.no> Message-ID: <4AFB3F2B.8060706@sympatico.ca> On 11/11/2009 05:16 PM, Sturla Molden wrote: > > If you only get rank 0 reported, it means you spawned just one process. > That could happen if you forget to specify how many processes you want > in the call to mpiexec. > I get rank 0 reported in each of the spawned processes, which is caused by each one calling MPI_Init() as if it was the only process. What I need to know is precisely how to call MPI_Init(), i.e. 
what to arguments to pass (either from sys.argv, or some other list where this gets stored when ipython invokes iengine). Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From robert.kern at gmail.com Wed Nov 11 17:52:01 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 11 Nov 2009 16:52:01 -0600 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFB3E8F.3040009@sympatico.ca> References: <4AFAEDFB.10808@sympatico.ca> <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> <4AFB3E8F.3040009@sympatico.ca> Message-ID: <3d375d730911111452p67fe262t27fd39f3d6024425@mail.gmail.com> On Wed, Nov 11, 2009 at 16:45, Stefan Seefeld wrote: > On 11/11/2009 01:23 PM, Brian Granger wrote: >> Stefan, >> >> This is probably a better topic for the IPython users list: >> >> http://mail.scipy.org/mailman/listinfo/ipython-user > > Thanks ! > I didn't know that actually exists. It doesn't appear to be listed on > either http://www.scipy.org or http://ipython.scipy.org, It's under the section "Mailing Lists" on that page. > nor on > http://www.scipy.org/Mailing_Lists. I'll cross-post there, so we may > continue the conversation there, assuming I'm not moderated. The next time you need to move a thread over, please just start a new thread. Cross-posts have a way of not stopping when you want them to. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthew.brett at gmail.com Thu Nov 12 02:51:23 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 11 Nov 2009 23:51:23 -0800 Subject: [SciPy-User] loading mat file in scipy In-Reply-To: <2d5132a50910200925u279cd97av1855069e12e72112@mail.gmail.com> References: <3312.140.105.40.24.1256052796.squirrel@webmail.sissa.it> <3d375d730910200835p65aa28b3i92c72759dbed5f20@mail.gmail.com> <2d5132a50910200925u279cd97av1855069e12e72112@mail.gmail.com> Message-ID: <1e2af89e0911112351l339c015apc8ede71d7751f9fc@mail.gmail.com> Hi, > http://www.robince.net/robince/structs_cells.mat > http://mail.scipy.org/pipermail/scipy-user/2009-April/020860.html > > Make sure you are using a recent version of scipy. I think there was > some performance fixes that improved it - with current scipy SVN on a > macbook pro structs_cells.mat takes about 28s to load > (structs_as_record doesn't seem to make a difference). This is already > some improvement (40s in April, 4 minutes prior to that). On Matlab it > takes about 1.4s. The current code from the git branch that I posted is now running at around 1.5 s to load your file, on a fast machine. Matlab 7.4 is taking around 5s on the same machine. Best, Matthew From robince at gmail.com Thu Nov 12 05:44:44 2009 From: robince at gmail.com (Robin) Date: Thu, 12 Nov 2009 10:44:44 +0000 Subject: [SciPy-User] loading mat file in scipy In-Reply-To: <1e2af89e0911112351l339c015apc8ede71d7751f9fc@mail.gmail.com> References: <3312.140.105.40.24.1256052796.squirrel@webmail.sissa.it> <3d375d730910200835p65aa28b3i92c72759dbed5f20@mail.gmail.com> <2d5132a50910200925u279cd97av1855069e12e72112@mail.gmail.com> <1e2af89e0911112351l339c015apc8ede71d7751f9fc@mail.gmail.com> Message-ID: <2d5132a50911120244p7bcbd1eei496b62531d89f7fa@mail.gmail.com> On Thu, Nov 12, 2009 at 7:51 AM, Matthew Brett wrote: > The current code from the git branch that I posted is now running at > around 1.5 s to load your file, on a fast machine. 
Matlab 7.4 is > taking around 5s on the same machine. Thanks very much, that's a terrific improvement! I have been meaning to test your branch this week, but I wasn't sure if there was a way I could build it without reinstalling my current scipy version so I was waiting for some time to read about how building in place and stuff work - in the end I just moved the current installation sideways and installed your branch as normal. I see a tremendous improvement as well - for me the demo file loads in 2.5s with improved loadmat vs 1.6s for Matlab 7.8. This is on a couple of years old macbook pro. Because I'm building against python 2.5 I'm forced to use apple gcc 4.0 and I think this could account for some of the difference (I read gcc got a lot better in more recent versions). Thanks again, Robin From anderse at gmx.de Thu Nov 12 06:35:30 2009 From: anderse at gmx.de (anderse at gmx.de) Date: Thu, 12 Nov 2009 12:35:30 +0100 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines Message-ID: <20091112113530.279300@gmx.net> Hi, I'd like to get the derivatives of parametric splines. Looking at the tutorial (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html) I get a spline like this: >>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) >>> y = np.sin(x) >>> tck = interpolate.splrep(x, y, s = 0, k = 5) >>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) >>> ynew = interpolate.splev(xnew, tck, der = 0) now, the derivatives can be determined like this: >>> yder = interpolate.splev(xnew, tck, der = 1) >>> yder2 = interpolate.splev(xnew, tck, der = 2) >>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) The first derivative is about null at pi / 2, the second one at pi, as they should be: >>> interpolate.spalde(np.pi, tck) array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) >>> interpolate.spalde(np.pi / 2, tck) array([ 1. , -0.00199181, -0.99629386, 0.02365328, 0.90756527, -0.1387468 ]) Of course, the x-range is the same, no matter of der=#. Now the parametric version: >>> tckp, u = interpolate.splprep([x, y], s=0, k=5) >>> u array([ 0. , 0.13941767, 0.25 , 0.36058233, 0.5 , 0.63941767, 0.75 , 0.86058233, 1. ]) so pi is at 0.5, pi/2 is at 0.25. And this is what I get at these 'x' values: >>> interpolate.spalde(0.5, tckp) [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] >>> interpolate.spalde(0.25, tckp) [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] The first array states the x-values, the second one the y-values, respectively, AFAIK. This makes sense without derivatives, and I get a plot using >>> unew = np.arange(0, 1.01, 0.01) >>> out = interpolate.splev(unew, tckp, der = 0) >>> plt.plot(out[0], out[1]) which looks like the one above, but what about the derivatives? >>> der1 = interpolate.splev(unew, tckp, der = 1) >>> der2 = interpolate.splev(unew, tckp, der = 2) >>> plt.plot(der1[0], der1[1], der2[0], der2[1]) dont make sense to me at all. Thank you in advance for your help. Raimund -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From devicerandom at gmail.com Thu Nov 12 06:35:50 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 11:35:50 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFAEE8E.9080000@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <4AFAEE8E.9080000@gmail.com> Message-ID: <4AFBF316.30507@gmail.com> Hi Bruce, Thanks for your reply but there are several things I don't really grasp: Bruce Southey ha scritto: > On 11/11/2009 10:26 AM, ms wrote: >> Let's start with a simple example. Imagine I have several linear data >> sets y=ax+b which have different b (all of them are known) but that >> should fit to the same (unknown) a. To have my best estimate of a, I >> would want to fit them all together. In this case it is trivial, you >> just subtract the known b from the data set and fit them all at the same >> time. >> > Although b is known without error you still have potentially effects due > to each data set. > > What I would do is fit: > y= mu + dataset + a*x + dataset*a*x > > Where mu is some overall mean, Mean of what? The b's? > dataset is the effect of the ith dataset - allows different intercepts > for each data set > dataset*a is the interaction between a and the dataset - allows > different slopes for each dataset. I don't really understand what quantities you mean by "effect" and "interaction", and why should I want to allow different slopes for each dataset -the aim to fit one and only one slope from all datasets. > Obviously you first test that interaction is zero. In theory, the > difference between the solutions of dataset should equate to the > differences between the known b's. ...same as above... > Now you just expand your linear model to nonlinear one. The formulation > depends on your equation. But really you just replace f(a*x) with > f(a*x+dataset*a*x). > > So I first try with a linear model before a nonlinear. Also I would see > if I could linearize the non-linear function. Well, the function is for sure non linear (it has a sigmoidal shape). To linearize it is a good idea but I am doubtful it is doable. Thanks! m. From gnurser at googlemail.com Thu Nov 12 08:07:15 2009 From: gnurser at googlemail.com (George Nurser) Date: Thu, 12 Nov 2009 13:07:15 +0000 Subject: [SciPy-User] vectorplot scikit code patch Message-ID: <1d1e6ea70911120507q621c8269h5d5b0c3f934a1377@mail.gmail.com> Hi, Not sure where to post this. The lic_demo.py and lic_efield_demo.py scripts in the vectorplot scikit fail for me with Traceback (most recent call last): File "lic_efield_demo.py", line 55, in plt.figimage(image) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib/pyplot.py", line 404, in figimage sci(ret) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib/pyplot.py", line 160, in sci gca()._sci(im) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib/axes.py", line 1338, in _sci "Argument must be an image, collection, or ContourSet in this Axes") ValueError: Argument must be an image, collection, or ContourSet in this Axes I got both scripts to work by in each of them replacing plt.clf() plt.axis('off') plt.figimage(image) by fig = plt.figure() plt.clf() plt.axis('off') fig.figimage(image) --George. 
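For reference, George's change written out as a runnable sketch. The `image` array below is only a random stand-in for the LIC output that lic_demo.py / lic_efield_demo.py compute from the vector field; everything else follows his description and the traceback above.

import numpy as np
import matplotlib.pyplot as plt

image = np.random.random((300, 300))   # stand-in for the LIC result

# Original demo code -- fails because pyplot.figimage() calls sci(), which
# tries to register the figure image with the current Axes:
#   plt.clf()
#   plt.axis('off')
#   plt.figimage(image)

# Workaround: create the figure explicitly and call figimage on it directly.
fig = plt.figure()
plt.clf()
plt.axis('off')
fig.figimage(image)
plt.show()
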
From zachary.pincus at yale.edu Thu Nov 12 08:19:58 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 12 Nov 2009 08:19:58 -0500 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <20091112113530.279300@gmx.net> References: <20091112113530.279300@gmx.net> Message-ID: Without thinking deeply about this at all, aren't the derivatives of a parametric spline [x(p), y(p)] given as dx/dp and dy/dp, not the dx/dy that you are perhaps expecting? On Nov 12, 2009, at 6:35 AM, anderse at gmx.de wrote: > Hi, > > I'd like to get the derivatives of parametric splines. > Looking at the tutorial (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > ) > I get a spline like this: > >>>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) >>>> y = np.sin(x) >>>> tck = interpolate.splrep(x, y, s = 0, k = 5) >>>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) >>>> ynew = interpolate.splev(xnew, tck, der = 0) > > now, the derivatives can be determined like this: > >>>> yder = interpolate.splev(xnew, tck, der = 1) >>>> yder2 = interpolate.splev(xnew, tck, der = 2) > >>>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) > > The first derivative is about null at pi / 2, > the second one at pi, as they should be: > >>>> interpolate.spalde(np.pi, tck) > array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, > 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) > >>>> interpolate.spalde(np.pi / 2, tck) > array([ 1. , -0.00199181, -0.99629386, 0.02365328, > 0.90756527, > -0.1387468 ]) > > Of course, the x-range is the same, no matter of der=#. > > Now the parametric version: > >>>> tckp, u = interpolate.splprep([x, y], s=0, k=5) >>>> u > array([ 0. , 0.13941767, 0.25 , 0.36058233, > 0.5 , > 0.63941767, 0.75 , 0.86058233, 1. ]) > > so pi is at 0.5, pi/2 is at 0.25. > > And this is what I get at these 'x' values: > >>>> interpolate.spalde(0.5, tckp) > [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, > 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), > array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, > 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] > >>>> interpolate.spalde(0.25, tckp) > [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, > -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), > array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, > 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] > > The first array states the x-values, the second one the y-values, > respectively, AFAIK. > This makes sense without derivatives, and I get a plot using > >>>> unew = np.arange(0, 1.01, 0.01) >>>> out = interpolate.splev(unew, tckp, der = 0) >>>> plt.plot(out[0], out[1]) > > which looks like the one above, but what about the derivatives? > >>>> der1 = interpolate.splev(unew, tckp, der = 1) >>>> der2 = interpolate.splev(unew, tckp, der = 2) >>>> plt.plot(der1[0], der1[1], der2[0], der2[1]) > > dont make sense to me at all. > > Thank you in advance for your help. > > Raimund > > -- > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
> Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From anderse at gmx.de Thu Nov 12 08:44:56 2009 From: anderse at gmx.de (Raimund Andersen) Date: Thu, 12 Nov 2009 14:44:56 +0100 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: References: <20091112113530.279300@gmx.net> Message-ID: <20091112134456.279300@gmx.net> Hello Zachary Pincus, thanks for your answer. Maybe I didn't get you right. The first derivative at pi/2 should be 0 ( cos(pi/2) ). What I get from interpolate.spalde(0.25, tckp) is 7.44935679e+00 and -3.47491248e-01. Now, how do I get to 0? Why those different 'x' values at all? It should be always 1.57079633e+00, no? -------- Original-Nachricht -------- > Datum: Thu, 12 Nov 2009 08:19:58 -0500 > Von: Zachary Pincus > An: SciPy Users List > Betreff: Re: [SciPy-User] Interpolate: Derivatives of parametric splines > Without thinking deeply about this at all, aren't the derivatives of a > parametric spline [x(p), y(p)] given as dx/dp and dy/dp, not the dx/dy > that you are perhaps expecting? > > > On Nov 12, 2009, at 6:35 AM, anderse at gmx.de wrote: > > > Hi, > > > > I'd like to get the derivatives of parametric splines. > > Looking at the tutorial > (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > > ) > > I get a spline like this: > > > >>>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) > >>>> y = np.sin(x) > >>>> tck = interpolate.splrep(x, y, s = 0, k = 5) > >>>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) > >>>> ynew = interpolate.splev(xnew, tck, der = 0) > > > > now, the derivatives can be determined like this: > > > >>>> yder = interpolate.splev(xnew, tck, der = 1) > >>>> yder2 = interpolate.splev(xnew, tck, der = 2) > > > >>>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) > > > > The first derivative is about null at pi / 2, > > the second one at pi, as they should be: > > > >>>> interpolate.spalde(np.pi, tck) > > array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, > > 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) > > > >>>> interpolate.spalde(np.pi / 2, tck) > > array([ 1. , -0.00199181, -0.99629386, 0.02365328, > > 0.90756527, > > -0.1387468 ]) > > > > Of course, the x-range is the same, no matter of der=#. > > > > Now the parametric version: > > > >>>> tckp, u = interpolate.splprep([x, y], s=0, k=5) > >>>> u > > array([ 0. , 0.13941767, 0.25 , 0.36058233, > > 0.5 , > > 0.63941767, 0.75 , 0.86058233, 1. ]) > > > > so pi is at 0.5, pi/2 is at 0.25. > > > > And this is what I get at these 'x' values: > > > >>>> interpolate.spalde(0.5, tckp) > > [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, > > 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), > > array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, > > 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] > > > >>>> interpolate.spalde(0.25, tckp) > > [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, > > -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), > > array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, > > 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] > > > > The first array states the x-values, the second one the y-values, > > respectively, AFAIK. 
> > This makes sense without derivatives, and I get a plot using > > > >>>> unew = np.arange(0, 1.01, 0.01) > >>>> out = interpolate.splev(unew, tckp, der = 0) > >>>> plt.plot(out[0], out[1]) > > > > which looks like the one above, but what about the derivatives? > > > >>>> der1 = interpolate.splev(unew, tckp, der = 1) > >>>> der2 = interpolate.splev(unew, tckp, der = 2) > >>>> plt.plot(der1[0], der1[1], der2[0], der2[1]) > > > > dont make sense to me at all. > > > > Thank you in advance for your help. > > > > Raimund > > > > -- > > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! > > Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From stefan at sun.ac.za Thu Nov 12 09:08:38 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 12 Nov 2009 16:08:38 +0200 Subject: [SciPy-User] ANN: scikits.image v0.2 In-Reply-To: <9457e7c80911120603h6863532cy99499a352c2f6fca@mail.gmail.com> References: <9457e7c80911120603h6863532cy99499a352c2f6fca@mail.gmail.com> Message-ID: <9457e7c80911120608i1343d61an9fd02fb82707e668@mail.gmail.com> I'm glad to announce the second release of `scikits.image`, a collection of image processing routines for SciPy. On top of bug-fixes and improved documentation, the following changes/additions were made: - A new IO plugin infrastructure so that commands like 'imshow' are available via multiple backends (PIL, matplotlib, QT4, etc.) - ImageCollections (for cached loading of multiple images) and MultiImage (for working with multi-layered images) - More complete OpenCV wrappers - A graphical image viewer (also installed as a script `scivi`), that allows colour adjustments - Shortest path algorithm For version 0.3, we aim to - Incorporate some of the code offered by the Broad institute - Implement acquisition (grabbing images from cameras) and intrinsic camera calibration - Add real time video and camera display with processing - Improve filtering code - Add morphological operations More information is available at: http://stefanv.github.com/scikits.image/ Regards St?fan From lpc at cmu.edu Thu Nov 12 09:37:36 2009 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Thu, 12 Nov 2009 09:37:36 -0500 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster Message-ID: <200911120937.36690.lpc@cmu.edu> Rohit Garg wrote: > I have an embarrassingly parallel problem, very nicely suited to > parallelization. I have lots of those :) > My only constraint is that it should be able to run a python extension > (c++) with minimum of fuss. I want to minimize the headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? My own: It's called jug. 
See http://luispedro.org/software/jug ( Or download the code from github: http://github.com/luispedro/jug ) * It works with any set of processors that can either share a filesystem (plays well with NFS, but can be slow) or a connection to a redis database (which is very easy to set up and is probably as fast as any other approach if everyone is on the same processor). A major advantage is that you write mostly Python (and not something funny looking). For example, here's what a programme with that framework would look like: @TaskGenerator def preprocess(input): ... @TaskGenerator def compute(input, param): ... @TaskGenerator def collect(inputs): ... results = [] for input in glob('*.in'): intermediate = preprocess(input) results.append(compute(intermediate, param)) final = collect(results) The only step that's different w.r.t. to the linear version is adding the TaskGenerator decorator, which changes a call of preprocess(input) into Task(preprocess, input). Jug handles everything else. I have been using this now for almost year for all my research work and it works very well for me. HTH, Luis -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From josef.pktd at gmail.com Thu Nov 12 09:44:12 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Nov 2009 09:44:12 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFAE5AF.3020506@gmail.com> References: <4AFAE5AF.3020506@gmail.com> Message-ID: <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> On Wed, Nov 11, 2009 at 11:26 AM, ms wrote: > Hi, > > Probably it is a noobish question, but statistics is still not my cup of > tea as I'd like it to be. :) > > Let's start with a simple example. Imagine I have several linear data > sets y=ax+b which have different b (all of them are known) but that > should fit to the same (unknown) a. To have my best estimate of a, I > would want to fit them all together. In this case it is trivial, you > just subtract the known b from the data set and fit them all at the same > time. > > In my case it is a bit different, in the sense that I have to do > conceptually the same thing but for a highly non-linear equation where > the equivalent of "b" above is not so simple to separate. I wonder > therefore if there is a way to do a simultaneous fit of different > equations differing only in the known parameters and having a single > output, possibly with the help of ODR. Is this possible? And/or what > should be the best thing to do, in general, for this kind of problems? I don't know enough about ODR, but for least squares, optimize.leastsq or curve_fit, it seems you can just substitute any known parameters into your equation. y_i = f(x_i, a, b_i) for each group i plug in values for all b_i, gives reduced f(x_i, a) independent of specific parameters stack equations [y_i for all i] and [f(..) for all i] If you fit this in curve_fit you could also choose the weights, in case the error variance differs by groups. Does this work or am I missing the point? Josef > Many thanks, > M. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Thu Nov 12 10:04:39 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 15:04:39 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> Message-ID: <4AFC2407.1020902@gmail.com> josef.pktd at gmail.com ha scritto: > On Wed, Nov 11, 2009 at 11:26 AM, ms wrote: >> Let's start with a simple example. Imagine I have several linear data >> sets y=ax+b which have different b (all of them are known) but that >> should fit to the same (unknown) a. To have my best estimate of a, I >> would want to fit them all together. In this case it is trivial, you >> just subtract the known b from the data set and fit them all at the same >> time. >> >> In my case it is a bit different, in the sense that I have to do >> conceptually the same thing but for a highly non-linear equation where >> the equivalent of "b" above is not so simple to separate. I wonder >> therefore if there is a way to do a simultaneous fit of different >> equations differing only in the known parameters and having a single >> output, possibly with the help of ODR. Is this possible? And/or what >> should be the best thing to do, in general, for this kind of problems? > > I don't know enough about ODR, but for least squares, optimize.leastsq > or curve_fit, it seems you can just substitute any known parameters > into your equation. > > y_i = f(x_i, a, b_i) for each group i > plug in values for all b_i, gives reduced f(x_i, a) independent of > specific parameters > stack equations [y_i for all i] and [f(..) for all i] > > If you fit this in curve_fit you could also choose the weights, in > case the error variance differs by groups. > > Does this work or am I missing the point? Probably it's me missing it. Do you just mean to fit them all together separately and then make a weighted average of the fitted parameters, and using the standard deviation of the mean as the error of the fit? I am confused. sorry, m. From sccolbert at gmail.com Thu Nov 12 10:23:53 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 12 Nov 2009 16:23:53 +0100 Subject: [SciPy-User] Edge Detection In-Reply-To: <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> Message-ID: <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> All of the OpenCV edge detection routines are also available in scikits.image if you have opencv (>= 2.0) installed. On Tue, Nov 10, 2009 at 5:48 PM, Zachary Pincus wrote: > References: Start around just looking at the top google hits for "image > processing edge detection" -- that should be a pretty good start. Also, > google any unfamiliar terms below... I really find that there's a ton of > good basic image-processing information available online. > > Code: Look at what's available in scipy.ndimage. There are functions for > getting gradient magnitudes, as well as standard filters like Sobel etc. > (which you'll learn about from the above), plus morphological operators for > modifying binarized image regions (e.g. 
like erosion etc.; useful for > getting rid of stray noise-induced edges), plus some basic functions for > image smoothing like median filters, etc. > > For exploratory analysis, you might want some ability to interactively > visualize images; you could use matplotlib or the imaging scikit, which is > still pre-release but making fast progress: > http://github.com/stefanv/scikits.image > > I've attached basic code for Canny edge detection, which should demonstrate > a bit about how ndimage works, plus it's useful in its own right. There is > also some code floating around for anisotropic diffusion and bilateral > filtering, which are two noise-reduction methods that can be better than > simple median filtering. > > Zach > > > > On Nov 10, 2009, at 11:37 AM, Dan Yamins wrote: > >> Hi, >> >> I'm looking into using SciPy for a couple of edge-detection problems, >> involving detection of edges in images of text (in simple, clean fonts). >> If someone on this list could point me to a relevant resource / function, >> that would be excellent. ? (I have essentially no background in image >> processing, but am reasonably comfortable mathematically, and I would be >> happy to dive into something fairly technical.) >> >> thanks, >> Dan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From warren.weckesser at enthought.com Thu Nov 12 10:27:47 2009 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 12 Nov 2009 09:27:47 -0600 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <20091112134456.279300@gmx.net> References: <20091112113530.279300@gmx.net> <20091112134456.279300@gmx.net> Message-ID: <4AFC2973.6080800@enthought.com> Raimund, When you interpolate the curve using splprep, you get a spline representation of the parameterized curve (x(u), y(u)). As Zachary pointed out, the derivative values returned by spalde are the derivatives of x and y with respect to u. To get dy/dx, you can compute dy/dx = y'(u)/x'(u). This bit of code shows an example: ---------- import numpy as np from scipy import interpolate numpoints = 20 x = np.linspace(0, 2*np.pi, numpoints) y = np.sin(x) tckp, u = interpolate.splprep([x, y], s=0, k=5) u0 = 0.25 ders = interpolate.spalde(u0, tckp) x = ders[0][0] y = ders[1][0] dxdu = ders[0][1] dydu = ders[1][1] dydx = dydu / dxdu print "u =", u0, ": x =", x, " y =", y, " dy/dx = ", dydx ---------- Warren Raimund Andersen wrote: > Hello Zachary Pincus, > > thanks for your answer. Maybe I didn't get you right. > The first derivative at pi/2 should be 0 ( cos(pi/2) ). > What I get from interpolate.spalde(0.25, tckp) is > > 7.44935679e+00 and -3.47491248e-01. > > Now, how do I get to 0? Why those different 'x' values at all? > It should be always 1.57079633e+00, no? > > > -------- Original-Nachricht -------- > >> Datum: Thu, 12 Nov 2009 08:19:58 -0500 >> Von: Zachary Pincus >> An: SciPy Users List >> Betreff: Re: [SciPy-User] Interpolate: Derivatives of parametric splines >> > > >> Without thinking deeply about this at all, aren't the derivatives of a >> parametric spline [x(p), y(p)] given as dx/dp and dy/dp, not the dx/dy >> that you are perhaps expecting? 
>> >> >> On Nov 12, 2009, at 6:35 AM, anderse at gmx.de wrote: >> >> >>> Hi, >>> >>> I'd like to get the derivatives of parametric splines. >>> Looking at the tutorial >>> >> (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html >> >>> ) >>> I get a spline like this: >>> >>> >>>>>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) >>>>>> y = np.sin(x) >>>>>> tck = interpolate.splrep(x, y, s = 0, k = 5) >>>>>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) >>>>>> ynew = interpolate.splev(xnew, tck, der = 0) >>>>>> >>> now, the derivatives can be determined like this: >>> >>> >>>>>> yder = interpolate.splev(xnew, tck, der = 1) >>>>>> yder2 = interpolate.splev(xnew, tck, der = 2) >>>>>> >>>>>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) >>>>>> >>> The first derivative is about null at pi / 2, >>> the second one at pi, as they should be: >>> >>> >>>>>> interpolate.spalde(np.pi, tck) >>>>>> >>> array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, >>> 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) >>> >>> >>>>>> interpolate.spalde(np.pi / 2, tck) >>>>>> >>> array([ 1. , -0.00199181, -0.99629386, 0.02365328, >>> 0.90756527, >>> -0.1387468 ]) >>> >>> Of course, the x-range is the same, no matter of der=#. >>> >>> Now the parametric version: >>> >>> >>>>>> tckp, u = interpolate.splprep([x, y], s=0, k=5) >>>>>> u >>>>>> >>> array([ 0. , 0.13941767, 0.25 , 0.36058233, >>> 0.5 , >>> 0.63941767, 0.75 , 0.86058233, 1. ]) >>> >>> so pi is at 0.5, pi/2 is at 0.25. >>> >>> And this is what I get at these 'x' values: >>> >>> >>>>>> interpolate.spalde(0.5, tckp) >>>>>> >>> [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, >>> 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), >>> array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, >>> 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] >>> >>> >>>>>> interpolate.spalde(0.25, tckp) >>>>>> >>> [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, >>> -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), >>> array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, >>> 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] >>> >>> The first array states the x-values, the second one the y-values, >>> respectively, AFAIK. >>> This makes sense without derivatives, and I get a plot using >>> >>> >>>>>> unew = np.arange(0, 1.01, 0.01) >>>>>> out = interpolate.splev(unew, tckp, der = 0) >>>>>> plt.plot(out[0], out[1]) >>>>>> >>> which looks like the one above, but what about the derivatives? >>> >>> >>>>>> der1 = interpolate.splev(unew, tckp, der = 1) >>>>>> der2 = interpolate.splev(unew, tckp, der = 2) >>>>>> plt.plot(der1[0], der1[1], der2[0], der2[1]) >>>>>> >>> dont make sense to me at all. >>> >>> Thank you in advance for your help. >>> >>> Raimund >>> >>> -- >>> GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
>>> Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > From zachary.pincus at yale.edu Thu Nov 12 10:31:14 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 12 Nov 2009 10:31:14 -0500 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <20091112134456.279300@gmx.net> References: <20091112113530.279300@gmx.net> <20091112134456.279300@gmx.net> Message-ID: <5E5419A2-84AE-44A5-8A4C-1C96D4857CFD@yale.edu> Hi, > thanks for your answer. Maybe I didn't get you right. > The first derivative at pi/2 should be 0 ( cos(pi/2) ). > What I get from interpolate.spalde(0.25, tckp) is > > 7.44935679e+00 and -3.47491248e-01. The first value is dx/du at 0.25. If you look at der1[0] (e.g. dx/du), you'll see it's basically constant, which is what you expect since x and u are linear with one another. > Why those different 'x' values at all? > It should be always 1.57079633e+00, no? I don't know why you think dx/du ought to be pi/2: x goes from 0 to 2pi while u goes from 0 to 1, therefore the slope of the line x(u) is 2pi; thus dx/du ought to be 2pi as well. Which it is, more or less, except for endpoint effects. These effects are more pronounced with parametric splines since, basically, there's more degrees of freedom for what the spline can do beyond the range of the input data. (Check out how the spline goes beyond the endpoints of your original data -- the parametric spline goes nuts, because, essentially, dx/du isn't fixed at a constant, unlike in the nonparametric spline case. When fitting a function with fewer constraints, it should not be a surprise that the fit is worse.) Now, the second value you show above (-3.47491248e-01) is dy/du at 0.25. Because dx/du is ~constant, dy/du should have the same zeros as dy/dx. Now, -0.35 isn't exactly zero, but if you look at the plot of der1[1], you'll see that der1[1] does have a zero pretty close to 0.25 that point. So again, you're getting more or less the expected result, especially given that a parametric spline fit with all those extra degrees of freedom just won't fit a function y(x) as well as the nonparametric spline designed for fitting functions of the form y(x). Make sense? By the way, if I were to try to evaluate a periodic function with a spline, I'd use interpolate.splrep with per=1. And if I had a periodic parametric function (e.g. a closed plane curve), use splprep with per=1. Periodic functions are the only case where endpoint effects can be completely banished with spline fitting. Otherwise endpoints effects are just par for the course with non-periodic spline fits, and are, as above, more troublesome in the parametric case because there are even more degrees of freedom. Feed splrep more data and you'll get better results because there are more constraints. Alternately, use a lower-order spline -- which are less prone to "ringing" artifacts when under-constrained -- to get better results with sparser data. (Not also below my use of numpy.linspace, which is far easier than arange for the sort of things you're needing.) 
# Not much data, high order spline In [119]: x = np.linspace(0, 2*np.pi, 9) In [120]: y = np.sin(x) In [121]: tckp, u = interpolate.splprep([x, y], s=0, k=5) In [122]: interpolate.spalde(0.25, tckp)[1] array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] # More data, high order spline In [123]: x = np.linspace(0, 2*np.pi, 20) In [124]: y = np.sin(x) In [125]: tckp, u = interpolate.splprep([x, y], s=0, k=5) In [126]: interpolate.spalde(0.25, tckp)[1] array([ 9.99945859e-01, 4.44888721e-03, -5.85188610e+01, -3.32328600e+01, 1.61037915e+04, 2.86354557e+05])] # Not much data, but lower order spline In [127]: x = np.linspace(0, 2*np.pi, 9) In [128]: y = np.sin(x) In [129]: tckp, u = interpolate.splprep([x, y], s=0, k=3) In [130]: interpolate.spalde(0.25, tckp)[1] array([ 1.00000000e+00, -6.93498869e-02, -6.23269054e+01, 4.25319565e+02])] Zach From bsouthey at gmail.com Thu Nov 12 10:45:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Nov 2009 09:45:34 -0600 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFBF316.30507@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <4AFAEE8E.9080000@gmail.com> <4AFBF316.30507@gmail.com> Message-ID: <4AFC2D9E.2060302@gmail.com> On 11/12/2009 05:35 AM, ms wrote: > Hi Bruce, > > Thanks for your reply but there are several things I don't really grasp: > > Bruce Southey ha scritto: > >> On 11/11/2009 10:26 AM, ms wrote: >> >>> Let's start with a simple example. Imagine I have several linear data >>> sets y=ax+b which have different b (all of them are known) but that >>> should fit to the same (unknown) a. To have my best estimate of a, I >>> would want to fit them all together. In this case it is trivial, you >>> just subtract the known b from the data set and fit them all at the same >>> time. >>> >>> >> Although b is known without error you still have potentially effects due >> to each data set. >> >> What I would do is fit: >> y= mu + dataset + a*x + dataset*a*x >> >> Where mu is some overall mean, >> > Mean of what? The b's? > Depending on what your terms are, y=a*x +b can be viewed is a simple linear regression then b is an intercept and a is a slope. Under a different view (typically general linear modeling), b can be a factor or class variable where 'b' can have multiple levels. As in the model above, this is analysis of covariance. You can get your estimate of 'b' for each data set as mu plus the appropriate solution of dataset. (While you can parameterize the model as y= dataset + ..., it is not as easy to interpret as the one using mu.) The reason for using this type of model is that you can quantify the variation between the data sets. >> dataset is the effect of the ith dataset - allows different intercepts >> for each data set >> dataset*a is the interaction between a and the dataset - allows >> different slopes for each dataset. >> > I don't really understand what quantities you mean by "effect" and > "interaction", and why should I want to allow different slopes for each > dataset -the aim to fit one and only one slope from all datasets. > The reason is that you can test that the slopes are the same and see if any data sets appear unusual. If the slopes are the same then you are back to what you wanted to know. Otherwise, you need to address why one or more data sets are different from the others. >> Obviously you first test that interaction is zero. 
In theory, the >> difference between the solutions of dataset should equate to the >> differences between the known b's. >> > ...same as above... > > >> Now you just expand your linear model to nonlinear one. The formulation >> depends on your equation. But really you just replace f(a*x) with >> f(a*x+dataset*a*x). >> >> So I first try with a linear model before a nonlinear. Also I would see >> if I could linearize the non-linear function. >> > Well, the function is for sure non linear (it has a sigmoidal shape). To > linearize it is a good idea but I am doubtful it is doable. > > Thanks! > > m. > Again it depends on the function because some of these do have linearized forms or can be well approximated by a linear model. Bruce From devicerandom at gmail.com Thu Nov 12 10:46:43 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 15:46:43 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFC2D9E.2060302@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <4AFAEE8E.9080000@gmail.com> <4AFBF316.30507@gmail.com> <4AFC2D9E.2060302@gmail.com> Message-ID: <4AFC2DE3.1040908@gmail.com> Bruce Southey ha scritto: > On 11/12/2009 05:35 AM, ms wrote: >>> Although b is known without error you still have potentially effects due >>> to each data set. >>> >>> What I would do is fit: >>> y= mu + dataset + a*x + dataset*a*x >>> >>> Where mu is some overall mean, >>> >> Mean of what? The b's? >> > Depending on what your terms are, y=a*x +b can be viewed is a simple > linear regression then b is an intercept and a is a slope. Under a > different view (typically general linear modeling), b can be a factor or > class variable where 'b' can have multiple levels. As in the model > above, this is analysis of covariance. You can get your estimate of 'b' > for each data set as mu plus the appropriate solution of dataset. (While > you can parameterize the model as y= dataset + ..., it is not as easy to > interpret as the one using mu.) > > The reason for using this type of model is that you can quantify the > variation between the data sets. This sounds interesting, but my problem is much more mundane: what are the "mu" or "dataset" quantities? >>> dataset is the effect of the ith dataset - allows different intercepts >>> for each data set >>> dataset*a is the interaction between a and the dataset - allows >>> different slopes for each dataset. >>> >> I don't really understand what quantities you mean by "effect" and >> "interaction", and why should I want to allow different slopes for each >> dataset -the aim to fit one and only one slope from all datasets. >> > The reason is that you can test that the slopes are the same and see if > any data sets appear unusual. If the slopes are the same then you are > back to what you wanted to know. Otherwise, you need to address why one > or more data sets are different from the others. Agree. My point is however that the data sets fitting to the same slope is an *assumption* that I have to make. Of course checking it is a good idea, but again, I don't know what are the mathematical definitions of the quantities you are talking about. > Again it depends on the function because some of these do have > linearized forms or can be well approximated by a linear model. A linear model cannot do it for sure, and I don't think it can be linearized. thanks... m. 
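To make Bruce's formulation concrete: with dummy (0/1) variables the model y = mu + dataset + a*x + dataset*a*x is just an ordinary least-squares problem, and the "interaction" columns are the per-dataset slope deviations, which should come out near zero if a single slope really fits all sets. A minimal sketch, not from the thread -- the three simulated datasets, the reference-level coding and the use of np.linalg.lstsq are my own illustrative choices:

import numpy as np

ngroups, npoints = 3, 20
a_true = 0.5
b_true = np.array([1.0, 2.0, 3.0])           # the known intercepts, one per dataset
x = np.random.uniform(size=(ngroups, npoints))
y = a_true*x + b_true[:, None] + 0.05*np.random.normal(size=x.shape)

rows = []
for i in range(ngroups):
    for xij in x[i]:
        intercepts = np.zeros(ngroups)        # "dataset" effect: per-group intercept shift
        intercepts[i] = 1.0
        slopes = np.zeros(ngroups)            # "dataset*a" interaction: per-group slope shift
        slopes[i] = xij
        # drop the first group's columns so mu and the common slope stay identifiable
        rows.append(np.concatenate(([1.0, xij], intercepts[1:], slopes[1:])))
X = np.array(rows)

beta = np.linalg.lstsq(X, y.ravel())[0]
mu, a_common = beta[0], beta[1]
interaction = beta[2 + (ngroups - 1):]        # per-group slope deviations

# If the interaction terms are ~0, all datasets share one slope and a_common
# is the pooled estimate of a; mu plus the dataset effects recovers the known b's.
print(a_common, interaction)
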
From josef.pktd at gmail.com Thu Nov 12 11:55:40 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Nov 2009 11:55:40 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFC2407.1020902@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> Message-ID: <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Wed, Nov 11, 2009 at 11:26 AM, ms wrote: >>> Let's start with a simple example. Imagine I have several linear data >>> sets y=ax+b which have different b (all of them are known) but that >>> should fit to the same (unknown) a. To have my best estimate of a, I >>> would want to fit them all together. In this case it is trivial, you >>> just subtract the known b from the data set and fit them all at the same >>> time. >>> >>> In my case it is a bit different, in the sense that I have to do >>> conceptually the same thing but for a highly non-linear equation where >>> the equivalent of "b" above is not so simple to separate. I wonder >>> therefore if there is a way to do a simultaneous fit of different >>> equations differing only in the known parameters and having a single >>> output, possibly with the help of ODR. Is this possible? And/or what >>> should be the best thing to do, in general, for this kind of problems? >> >> I don't know enough about ODR, but for least squares, optimize.leastsq >> or curve_fit, it seems you can just substitute any known parameters >> into your equation. >> >> y_i = f(x_i, a, b_i) for each group i >> plug in values for all b_i, gives reduced f(x_i, a) independent of >> specific parameters >> stack equations [y_i for all i] and [f(..) for all i] >> >> If you fit this in curve_fit you could also choose the weights, in >> case the error variance differs by groups. >> >> Does this work or am I missing the point? > > Probably it's me missing it. Do you just mean to fit them all together > separately and then make a weighted average of the fitted parameters, > and using the standard deviation of the mean as the error of the fit? I > am confused. I meant stacking all equations into one big estimation problem y = f(x,a) and minimize squared residual over all equations. This assumes homoscedastic errors (identical noise in each equation). an example (quickly written and not optimized, there are parts I don't remember about curve_fit, fixed parameters could be better handled by a class) #################### """stack equations with different known parameters I didn't get curve_fit to work with only 1 parameter to estimate Created on Thu Nov 12 11:17:21 2009 Author: josef-pktd """ import numpy as np from scipy import optimize def fsingle(a,c,b,x): return b*x**a + c atrue = 1. ctrue = 10. b = np.array([[1.]*10, [2.]*10, [3.]*10]) b = np.array([1.,2.,3.]) x = np.random.uniform(size=(3,10)) y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) y += 0.1*np.random.normal(size=y.shape) def fun(x,a,c): #b is taken from enclosing scope #print x.shape xx=x.reshape((3,10)) return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) print 'true parameters ', atrue, ctrue print 'parameter estimate', res[0] print 'standard deviation', np.sqrt(np.diag(res[1])) #################### > > sorry, > m. 
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Thu Nov 12 12:28:00 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 17:28:00 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> Message-ID: <4AFC45A0.30502@gmail.com> josef.pktd at gmail.com ha scritto: >>> Does this work or am I missing the point? >> Probably it's me missing it. Do you just mean to fit them all together >> separately and then make a weighted average of the fitted parameters, >> and using the standard deviation of the mean as the error of the fit? I >> am confused. > > I meant stacking all equations into one big estimation problem y = > f(x,a) and minimize squared residual over all equations. > This assumes homoscedastic errors (identical noise in each equation). Yes, that's what I want! Thanks. I am going to read and try your code and see what I get and don't get of it. Thanks a lot :) m. > an example > (quickly written and not optimized, there are parts I don't remember > about curve_fit, fixed parameters could be better handled by a class) > > #################### > """stack equations with different known parameters > > I didn't get curve_fit to work with only 1 parameter to estimate > > Created on Thu Nov 12 11:17:21 2009 > Author: josef-pktd > """ > import numpy as np > from scipy import optimize > > > def fsingle(a,c,b,x): > return b*x**a + c > > atrue = 1. > ctrue = 10. > b = np.array([[1.]*10, [2.]*10, [3.]*10]) > b = np.array([1.,2.,3.]) > x = np.random.uniform(size=(3,10)) > y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) > y += 0.1*np.random.normal(size=y.shape) > > def fun(x,a,c): > #b is taken from enclosing scope > #print x.shape > xx=x.reshape((3,10)) > return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) > > res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) > > print 'true parameters ', atrue, ctrue > print 'parameter estimate', res[0] > print 'standard deviation', np.sqrt(np.diag(res[1])) > #################### > > > > >> sorry, >> m. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dyamins at gmail.com Thu Nov 12 13:23:33 2009 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 12 Nov 2009 13:23:33 -0500 Subject: [SciPy-User] Edge Detection In-Reply-To: <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> Message-ID: <15e4667e0911121023x33402f95l99f6cea7e1fb3cdd@mail.gmail.com> On Thu, Nov 12, 2009 at 10:23 AM, Chris Colbert wrote: > All of the OpenCV edge detection routines are also available in > scikits.image if you have opencv (>= 2.0) installed. 
> > On Tue, Nov 10, 2009 at 5:48 PM, Zachary Pincus > wrote: > > > > > Code: Look at what's available in scipy.ndimage. There are functions for > > getting gradient magnitudes, as well as standard filters like Sobel etc. > > (which you'll learn about from the above), plus morphological operators > for > > modifying binarized image regions (e.g. like erosion etc.; useful for > > getting rid of stray noise-induced edges), plus some basic functions for > > image smoothing like median filters, etc. > > > > For exploratory analysis, you might want some ability to interactively > > visualize images; you could use matplotlib or the imaging scikit, which > is > > still pre-release but making fast progress: > > http://github.com/stefanv/scikits.image > > > > I've attached basic code for Canny edge detection, which should > demonstrate > > a bit about how ndimage works, plus it's useful in its own right. There > is > > also some code floating around for anisotropic diffusion and bilateral > > filtering, which are two noise-reduction methods that can be better than > > simple median filtering. > > > Hi Chris and Zachary, thanks very much for your help. I really appreciate it. My goal was to recognize linear (and circular) strokes in images of text. After I wrote my question and did some further research, I realized that I was so ignorant that I didn't know enough to properly ask for what I wanted. Finding strokes in letters is actually more like "line detection" (as in "detecting lines as geometric features") than it is like edge detection (e.g. something that the sobel operator does well). I needed to localize the lines and describe them in some geometric way, not so much determine where their boundaries were. What I ended up doing is using the Radon transform (scipy.misc.radon), together with the hcluster package. The basic idea is that applying Radon transform to the image of a letter transforms the strokes into confined blobs whose position and extent in the resulting point/angle space describes the location, width, and angle of the original stroke. Then, I make a binary version of the transformed image by applying an indicated threshold on intensity -- e.g. a 1 at all points in the transformed image whose intensity are above the threshold, and 0 elsewhere. Then, I cluster this binary image, which ends up identifying clusters whose centroid and diameter correspond to features of idealized strokes. This algorithm seems to work pretty well. Thanks alot again for your help, the scipy.ndimage package really seems great. I read somewhere that the edge-detection routines will actually become part of the next version of the package. Is that still true? Thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Thu Nov 12 13:35:13 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 12 Nov 2009 19:35:13 +0100 Subject: [SciPy-User] Edge Detection In-Reply-To: <15e4667e0911121023x33402f95l99f6cea7e1fb3cdd@mail.gmail.com> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> <15e4667e0911121023x33402f95l99f6cea7e1fb3cdd@mail.gmail.com> Message-ID: <7f014ea60911121035h73cb69e7pbdd0996253e2adb1@mail.gmail.com> Dan, You may also want to look into HOG features as well (Histogram of oriented gradients). 
They are used quite often for shape characterization, and with proper normalization, can become scale and rotation invariant. Glad to see you got something working! Cheers, Chris On Thu, Nov 12, 2009 at 7:23 PM, Dan Yamins wrote: > > On Thu, Nov 12, 2009 at 10:23 AM, Chris Colbert wrote: >> >> All of the OpenCV edge detection routines are also available in >> scikits.image if you have opencv ?(>= 2.0) installed. >> >> On Tue, Nov 10, 2009 at 5:48 PM, Zachary Pincus >> wrote: >> >> > >> > Code: Look at what's available in scipy.ndimage. There are functions for >> > getting gradient magnitudes, as well as standard filters like Sobel etc. >> > (which you'll learn about from the above), plus morphological operators >> > for >> > modifying binarized image regions (e.g. like erosion etc.; useful for >> > getting rid of stray noise-induced edges), plus some basic functions for >> > image smoothing like median filters, etc. >> > >> > For exploratory analysis, you might want some ability to interactively >> > visualize images; you could use matplotlib or the imaging scikit, which >> > is >> > still pre-release but making fast progress: >> > http://github.com/stefanv/scikits.image >> > >> > I've attached basic code for Canny edge detection, which should >> > demonstrate >> > a bit about how ndimage works, plus it's useful in its own right. There >> > is >> > also some code floating around for anisotropic diffusion and bilateral >> > filtering, which are two noise-reduction methods that can be better than >> > simple median filtering. >> > > > Hi Chris and Zachary, thanks very much for your help.? I really appreciate > it. > > My goal was to recognize linear (and circular) strokes in images of text. > After I wrote my question and did some further research, I realized that I > was so ignorant that I didn't know enough to properly ask for what I > wanted.? Finding strokes in letters is actually more like "line detection" > (as in "detecting lines as geometric features") than it is like edge > detection (e.g. something that the sobel operator does well).? I needed to > localize the lines and describe them in some geometric way, not so much > determine where their boundaries were. > > What I ended up doing is using the Radon transform (scipy.misc.radon), > together with the hcluster package.? The basic idea is that applying Radon > transform to the image of a letter transforms the strokes into confined > blobs whose position and extent in the resulting point/angle space describes > the location, width, and angle of the original stroke. ? Then, I make a > binary version of the transformed image by applying an indicated threshold > on intensity -- e.g. a 1 at all points in the transformed image whose > intensity are above the threshold, and 0 elsewhere.?? Then, I cluster this > binary image, which ends up identifying clusters whose centroid and diameter > correspond to features of idealized strokes.????? This algorithm seems to > work pretty well. > > Thanks alot again for your help, the scipy.ndimage package really seems > great.? I read somewhere that the edge-detection routines will actually > become part of the next version of the package.? Is that still true? 
> > Thanks, > Dan > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From nwagner at iam.uni-stuttgart.de Thu Nov 12 13:53:55 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 12 Nov 2009 19:53:55 +0100 Subject: [SciPy-User] FAIL: test_lambertw.test_values Message-ID: Hi all, Can someone reproduce the following failure with recent svn ? ====================================================================== FAIL: test_lambertw.test_values ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_lambertw.py", line 80, in test_values FuncData(w, data, (0,1), 2, rtol=1e-10, atol=1e-13).check() File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/testutils.py", line 187, in check assert False, "\n".join(msg) AssertionError: Max |adiff|: 2.5797 Max |rdiff|: 3.81511 Bad results for the following points (in output 0): (-0.44800000000000001+0.40000000000000002j) 0j => (-1.2370928928166736-1.6588828572971359j) != (-0.11855133765652383+0.66570534313583418j) (rdiff 3.8151122286225245) Nils From robfelty at gmail.com Thu Nov 12 18:29:38 2009 From: robfelty at gmail.com (Robert Felty) Date: Thu, 12 Nov 2009 16:29:38 -0700 Subject: [SciPy-User] specify libgfortran.dylib location In-Reply-To: References: Message-ID: <5E9D6CE9-0272-4101-8830-9B36AB4F4544@gmail.com> I've been trying to get scipy working on snow leopard for several weeks now. I have seen several blogs with lots of suggestions, but none have worked for me, until I finally figured it out today. I kept getting an error that my libgfortan.dylib file was the wrong architecture. Here is the stack trace: >>> import scipy.stats Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/stats/__init__.py", line 7, in from stats import * File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/stats/stats.py", line 198, in import scipy.special as special File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/special/basic.py", line 8, in from _cephes import * ImportError: dlopen(/Library/Python/2.6/site-packages/ scipy-0.8.0.dev5975-py2.6-macosx-10.6-universal.egg/scipy/special/ _cephes.so, 2): Library not loaded: /usr/local/lib/libgfortran.2.dylib Referenced from: /Library/Python/2.6/site-packages/ scipy-0.8.0.dev5975-py2.6-macosx-10.6-universal.egg/scipy/special/ _cephes.so Reason: no suitable image found. 
Did find: /usr/local/lib/libgfortran.2.dylib: mach-o, but wrong architecture /usr/local/lib/libgfortran.2.dylib: mach-o, but wrong architecture I discovered today that I had several different libgfortran files: /usr/local/lib/libgfortran.2.0.0.dylib /usr/local/lib/libgfortran.2.dylib /usr/local/lib/libgfortran.a /usr/local/lib/libgfortran.dylib /usr/local/lib/libgfortran.la /usr/local/lib/ppc64/libgfortran.2.0.0.dylib /usr/local/lib/ppc64/libgfortran.2.dylib /usr/local/lib/ppc64/libgfortran.a /usr/local/lib/ppc64/libgfortran.dylib /usr/local/lib/ppc64/libgfortran.la /usr/local/lib/x86_64/libgfortran.2.0.0.dylib /usr/local/lib/x86_64/libgfortran.2.dylib /usr/local/lib/x86_64/libgfortran.a /usr/local/lib/x86_64/libgfortran.dylib /usr/local/lib/x86_64/libgfortran.la I tried copying the x86_64 file to /usr/local/lib, and now scipy works. However, this does not seem like the right way to do it. Is there a way to tell scipy that it should use the version in /usr/local/ lib/x86_64? I am using system Python 2.6.1 on Mac 10.6.1, which scipy 0.8.0 Thanks in advance for any suggestions. Rob From robert.kern at gmail.com Thu Nov 12 18:35:34 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 12 Nov 2009 17:35:34 -0600 Subject: [SciPy-User] specify libgfortran.dylib location In-Reply-To: <5E9D6CE9-0272-4101-8830-9B36AB4F4544@gmail.com> References: <5E9D6CE9-0272-4101-8830-9B36AB4F4544@gmail.com> Message-ID: <3d375d730911121535t589b29a3y1e595bab8aa601e8@mail.gmail.com> On Thu, Nov 12, 2009 at 17:29, Robert Felty wrote: > I tried copying the x86_64 file to /usr/local/lib, and now scipy > works. However, this does not seem like the right way to do it. Is > there a way to tell scipy that it should use the version in /usr/local/ > lib/x86_64? > I am using system Python 2.6.1 on Mac 10.6.1, which scipy 0.8.0 Where did you get your gfortran from? If you are using the R gfortran binaries, my version appears to have the files /usr/local/lib/x86_64/ as just symlinks back to the ones in /usr/local/lib/. Perhaps you should upgrade to the most recent release. Unless if you already are on the latest release and my slightly older release is better. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From kevinar18 at hotmail.com Thu Nov 12 22:41:14 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 22:41:14 -0500 Subject: [SciPy-User] Is there a Win 64bit version? Message-ID: The installer says: "Python version 2.6 required, which was not found in the registry." I did some searching and it seems like this may be a 32bit vs 64bit conflict. I'm running Vista 64bit and Python 2.6.4 64bit. Has anyone made an installer for 64bit Windows? _________________________________________________________________ Hotmail: Powerful Free email with security by Microsoft. http://clk.atdmt.com/GBL/go/171222986/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Nov 12 23:05:26 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:05:26 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: Message-ID: <4AFCDB06.9000105@molden.no> Kevin Ar18 skrev: > Has anyone made an installer for 64bit Windows? I tried the installer for NumPy ... no luck. NumPy segfaulted on import. 
I guess there is a good reason the release notes says "highly experimental". SciPy would be even further away from 64 bit support. I guess if you really need 64 bit, you should use Linux or Mac. In a perfect world, NumPy and SciPy would run on 64-bit Windows 7 and Python 3.1. But in the real world, we have to use Python 2.6.4 in 32-bit mode for stability. From dwf at cs.toronto.edu Thu Nov 12 23:09:35 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 12 Nov 2009 23:09:35 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: Message-ID: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> On 12-Nov-09, at 10:41 PM, Kevin Ar18 wrote: > The installer says: > "Python version 2.6 required, which was not found in the registry." > > I did some searching and it seems like this may be a 32bit vs 64bit > conflict. > > I'm running Vista 64bit and Python 2.6.4 64bit. > > Has anyone made an installer for 64bit Windows? So far, no. NumPy is relatively easy to build yourself I think, see: http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html SciPy a little less so, thanks to the mess created by the Fortran situation on Windows. As of June 2009, SciPy could be compiled with gfortran-mingw but would crash randomly, and there had been no success in debugging why. http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043571.html I'd suggest using 32-bit Python/NumPy/SciPy unless you really need it. David From cournape at gmail.com Thu Nov 12 23:21:18 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 13 Nov 2009 13:21:18 +0900 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> References: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> On Fri, Nov 13, 2009 at 1:09 PM, David Warde-Farley wrote: > > So far, no. NumPy is relatively easy to build yourself I think, see: > > ? ? ? ?http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html > Thanks to Enthought support, I have fixed all major *sources* issues so that both numpy and scipy can be build under VS 2008 + ifort combination for windows 64. The build process is complicated though, and require to have the MKL. I have not made much progress on building numpy and scipy with open source tools, though. David From kevinar18 at hotmail.com Thu Nov 12 23:28:17 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 23:28:17 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: > So far, no. NumPy is relatively easy to build yourself I think, see: > > http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html My version of MSVC only compiles to 32bit (it's free after all). This only makes me wish more that CLANG C++ support was complete and it could be used to compile all Python modules on the fly in Windows. :) Oh, wait, I need a Fortran compiler? What are my options? > SciPy a little less so, thanks to the mess created by the Fortran > situation on Windows. As of June 2009, SciPy could be compiled with > gfortran-mingw but would crash randomly, and there had been no success > in debugging why. > > http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043571.html > > I'd suggest using 32-bit Python/NumPy/SciPy unless you really need it. 
I only need Numpy -- which from what I am seeing may be possible to do? As for going back to 32bit, well that would be kind of a mess. I've have to figure out what to do with my current Python install, then I'd have to re-install the other Python modules I need, etc.... It's possible, I guess if there is no other option. Here's what I am currently using: Vista 64bit Python 2.6.4 64bit so, I need one for Python 2.6.4 64bit _________________________________________________________________ Hotmail: Free, trusted and rich email service. http://clk.atdmt.com/GBL/go/171222984/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinar18 at hotmail.com Thu Nov 12 23:30:07 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 23:30:07 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: > > So far, no. NumPy is relatively easy to build yourself I think, see: > > > > http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html > > > > Thanks to Enthought support, I have fixed all major *sources* issues > so that both numpy and scipy can be build under VS 2008 + ifort > combination for windows 64. The build process is complicated though, > and require to have the MKL. I have not made much progress on building > numpy and scipy with open source tools, though. Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ _________________________________________________________________ Your E-mail and More On-the-Go. Get Windows Live Hotmail Free. http://clk.atdmt.com/GBL/go/171222985/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Nov 12 23:36:05 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:36:05 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: <4AFCE235.803@molden.no> Kevin Ar18 skrev: > > Is numpy pure C? C and Python. > If so, would CLANG work? http://clang.llvm.org/ Probably not. Also beware of the CRT issue. From dwf at cs.toronto.edu Thu Nov 12 23:36:21 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 12 Nov 2009 23:36:21 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> References: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: <46421F92-AD72-40B1-AA55-4577A1DA653D@cs.toronto.edu> On 12-Nov-09, at 11:21 PM, David Cournapeau wrote: > Thanks to Enthought support, I have fixed all major *sources* issues > so that both numpy and scipy can be build under VS 2008 + ifort > combination for windows 64. The build process is complicated though, > and require to have the MKL. I have not made much progress on building > numpy and scipy with open source tools, though. Well, that's encouraging; at least there's SOME way to do it. Cheers to Enthought, and of course to Mr. Cournapeau. :) Does it require static linking, or is there a possibility (down the road, of course) that binaries can be built that dynamically link against a (separately installed) MKL? 
Also, any chance you have the build instructions written down somewhere? This is what we have for the Windows build instructions on the new website: http://projects.scipy.org/scipy/browser/scipy.org/source/building/windows.rst I'm sure it is far from complete, and mentions nothing about 64-bit. David From sturla at molden.no Thu Nov 12 23:44:19 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:44:19 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: <4AFCE423.6000300@molden.no> Kevin Ar18 skrev: > > Oh, wait, I need a Fortran compiler? What are my options? Only for SciPy. Your options are e.g. Intel (ifort), GNU (gfortran), Absoft, Lahey, or NAG. From dwf at cs.toronto.edu Thu Nov 12 23:47:05 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 12 Nov 2009 23:47:05 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: On 12-Nov-09, at 11:30 PM, Kevin Ar18 wrote: > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ The other David is the authority on these matters but I would imagine not, unless you also had somehow compiled a custom Python through that. Then maybe you'd have a shot. I think your only option is MSVC on 64-bit Windows. David From sturla at molden.no Thu Nov 12 23:58:10 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:58:10 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: <4AFCE762.3030907@molden.no> Kevin Ar18 skrev: > I only need Numpy -- which from what I am seeing may be possible to do? If you have a processor that supports Intel VT-X or AMD-V technology, you can always download Sun VirtualBox for free (PUEL license) and install a 64-bit Linux or OpenSolaris on it. From kevinar18 at hotmail.com Thu Nov 12 23:58:38 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 23:58:38 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com>, , Message-ID: > > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ > > The other David is the authority on these matters but I would imagine > not, unless you also had somehow compiled a custom Python through > that. Then maybe you'd have a shot. I think your only option is MSVC > on 64-bit Windows. > > David Is this the same David that said it would be easy to compile numpy for Win64 in that other link? Since I only need numpy, I'd be interested if anybody has made one for Win64 and just not uploaded it yet. :) _________________________________________________________________ Windows 7: Unclutter your desktop. http://go.microsoft.com/?linkid=9690331&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinar18 at hotmail.com Fri Nov 13 00:02:27 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Nov 2009 00:02:27 -0500 Subject: [SciPy-User] Is there a Win 64bit version? 
In-Reply-To: <4AFCE235.803@molden.no> References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com>, , <4AFCE235.803@molden.no> Message-ID: > > Is numpy pure C? > C and Python. > > > If so, would CLANG work? http://clang.llvm.org/ > Probably not. > > Also beware of the CRT issue. Parden the fact that I don't know much about this. The only c++ program I've compiled was to convert a Python program I made to something faster. :) I know nothing about the CRT issue. Anyways, I want to ask an off topic question.... Does this CRT issue only apply because Python itself was compiled in MSVC? In theory, would it be possible to eventually create a version of Python that compiled in Clang (after it finishes C++ support), and thus, be able to eventually also compile modules like numpy in Clang? Or does the problem go much deeper (unrelated to Python)? _________________________________________________________________ Hotmail: Powerful Free email with security by Microsoft. http://clk.atdmt.com/GBL/go/171222986/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinar18 at hotmail.com Fri Nov 13 00:09:18 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Nov 2009 00:09:18 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <4AFCE762.3030907@molden.no> References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <4AFCE762.3030907@molden.no> Message-ID: > > I only need Numpy -- which from what I am seeing may be possible to do? > If you have a processor that supports Intel VT-X or AMD-V technology, > you can always download Sun VirtualBox for free (PUEL license) and > install a 64-bit Linux or OpenSolaris on it. Yeah, "personal use" wouldn't work. It's really more trouble to setup an entirely new system for it all. Basically, I have Python like I want it right now and just wanted to add another module which required numpy. If I can't get a numpy for 64bit, I can alway install Python 32bit and then re-install the extra modules/add-ons that I've added to it; but, of course, that's probably also gonna take some time to do. Still, I do want to thank you all for trying to help out. I do appreciate it. :) _________________________________________________________________ Hotmail: Powerful Free email with security by Microsoft. http://clk.atdmt.com/GBL/go/171222986/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Nov 13 00:11:03 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 06:11:03 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com>, , <4AFCE235.803@molden.no> Message-ID: <4AFCEA67.8020208@molden.no> Kevin Ar18 skrev: > Anyways, I want to ask an off topic question.... > Does this CRT issue only apply because Python itself was compiled in MSVC? You always have to link extensions with the same C runtime as Python, otherwise bad things might happen. To be precise: you cannot share resources between different CRTs. E.g. you cannot fopen a FILE* with one CRT and fread on the FILE* with another. If you link with a different CRT than Python, a function like numpy.fromfile can mess up badly. 
> > In theory, would it be possible to eventually create a version of > Python that compiled in Clang (after it finishes C++ support), and > thus, be able to eventually also compile modules like numpy in Clang? > Or does the problem go much deeper (unrelated to Python)? I don't know which libraries clang link. But very likely ... if you want to build NumPy with clang, you also have to build Python with clang. From sturla at molden.no Fri Nov 13 00:14:13 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 06:14:13 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <4AFCE762.3030907@molden.no> Message-ID: <4AFCEB25.5080309@molden.no> Kevin Ar18 skrev: > Yeah, "personal use" wouldn't work. It's really more trouble to setup > an entirely new system for it all. > That is confusing. "Personal use" in Sun's license also covers academic use and commercial use. What is does not cover is automated mass deployment in an organisation, except academic institutions. From sturla at molden.no Fri Nov 13 00:18:48 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 06:18:48 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <4AFCEB25.5080309@molden.no> References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <4AFCE762.3030907@molden.no> <4AFCEB25.5080309@molden.no> Message-ID: <4AFCEC38.3080606@molden.no> Sturla Molden skrev: > Kevin Ar18 skrev: > >> Yeah, "personal use" wouldn't work. It's really more trouble to setup >> an entirely new system for it all. >> >> > That is confusing. "Personal use" in Sun's license also covers academic > use and commercial use. What is does not cover is automated mass > deployment in an organisation, except academic institutions. > > http://www.virtualbox.org/wiki/Licensing_FAQ 6. *What exactly do you mean by /personal use/ and /academic use/ in the Personal Use and Evaluation License ?* Personal use is when you install the product on one or more PCs yourself and you make use of it (or even your friend, sister and grandmother). It doesn't matter whether you just use it for fun or run your multi-million euro business with it. Also, if you install it on your work PC at some large company, this is still personal use. However, if you are an administrator and want to deploy it to the 500 desktops in your company, this would no longer qualify as /personal use/. Well, you could ask each of your 500 employees to install VirtualBox but don't you think we deserve some money in this case? We'd even assist you with any issue you might have. Use at academic institutions such as schools, colleges and universities by both teachers and students is covered. So in addition to the personal use which is always permitted, academic institutions may also choose to roll out the software in an automated way to make it available to its students and personnel. From kevinar18 at hotmail.com Fri Nov 13 00:48:03 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Nov 2009 00:48:03 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <4AFCEC38.3080606@molden.no> References: , , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , , , <4AFCE762.3030907@molden.no> , <4AFCEB25.5080309@molden.no>, <4AFCEC38.3080606@molden.no> Message-ID: > Sturla Molden skrev: > > Kevin Ar18 skrev: > > > >> Yeah, "personal use" wouldn't work. It's really more trouble to setup > >> an entirely new system for it all. 
> >> > >> > > That is confusing. "Personal use" in Sun's license also covers academic > > use and commercial use. What is does not cover is automated mass > > deployment in an organisation, except academic institutions. > > > > > > > http://www.virtualbox.org/wiki/Licensing_FAQ > > 6. *What exactly do you mean by /personal use/ and /academic use/ in > the Personal Use and Evaluation License ?* > > Personal use is when you install the product on one or more PCs > yourself and you make use of it (or even your friend, sister and > grandmother). It doesn't matter whether you just use it for fun or > run your multi-million euro business with it. Also, if you install > it on your work PC at some large company, this is still personal > use. However, if you are an administrator and want to deploy it to > the 500 desktops in your company, this would no longer qualify > as /personal use/. Well, you could ask each of your 500 employees to > install VirtualBox but don't you think we deserve some money in this > case? We'd even assist you with any issue you might have. Wow, well thanks for clearing that up. I would have never thought that is what they meant by personal use. :) Maybe I'll even use it for something fun one day. _________________________________________________________________ Hotmail: Trusted email with Microsoft?s powerful SPAM protection. http://clk.atdmt.com/GBL/go/177141664/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Fri Nov 13 00:28:17 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Nov 2009 14:28:17 +0900 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: <4AFCEE71.1050405@ar.media.kyoto-u.ac.jp> Kevin Ar18 wrote: > > > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ I doubt clang is well supported on windows. llvm itself has some issues on windows AFAIK. Since the project is backed up by Apple, I don't think there is a strong incentive to make this works well on windows. To build numpy on windows 64 bits, you need only one software besides python (which has to be 2.6), and that's Visual Studio 2008. If you are willing to spend a lot of time, you can download various packages from MS to get the free 64 bits compilers, but that's a lot of work compared to just using VS 2008. Note that you can download a fully functional trial edition of VS 2008 for free. David From david at ar.media.kyoto-u.ac.jp Fri Nov 13 00:43:22 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Nov 2009 14:43:22 +0900 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <46421F92-AD72-40B1-AA55-4577A1DA653D@cs.toronto.edu> References: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> <46421F92-AD72-40B1-AA55-4577A1DA653D@cs.toronto.edu> Message-ID: <4AFCF1FA.3010706@ar.media.kyoto-u.ac.jp> David Warde-Farley wrote: > Well, that's encouraging; at least there's SOME way to do it. Cheers > to Enthought, and of course to Mr. Cournapeau. :) Does it require > static linking, or is there a possibility (down the road, of course) > that binaries can be built that dynamically link against a (separately > installed) MKL? > It always require some dll, even if you compile the MKL statically. 
> Also, any chance you have the build instructions written down > somewhere? This is what we have for the Windows build instructions on > the new website: > > http://projects.scipy.org/scipy/browser/scipy.org/source/building/windows.rst > > I'm sure it is far from complete, and mentions nothing about 64-bit. The necessary tools have not all been released. David From dineshbvadhia at hotmail.com Fri Nov 13 03:30:40 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 13 Nov 2009 00:30:40 -0800 Subject: [SciPy-User] Is there a Win 64bit version? Message-ID: Ditto I'd be over-the-moon if someone has a working Numpy for Windows 64-bit and are willing to share it. Oh, yes ... bring it on! Dinesh -------------------------------------------------------------------------------- Message: 10 Date: Thu, 12 Nov 2009 23:58:38 -0500 From: Kevin Ar18 Subject: Re: [SciPy-User] Is there a Win 64bit version? To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" > > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ > > The other David is the authority on these matters but I would imagine > not, unless you also had somehow compiled a custom Python through > that. Then maybe you'd have a shot. I think your only option is MSVC > on 64-bit Windows. > > David Is this the same David that said it would be easy to compile numpy for Win64 in that other link? Since I only need numpy, I'd be interested if anybody has made one for Win64 and just not uploaded it yet. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Fri Nov 13 04:02:06 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 13 Nov 2009 01:02:06 -0800 Subject: [SciPy-User] Is there a Win 64bit version? Message-ID: Good to know about the trial version of VS 2008. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.hirschfeld at gmail.com Fri Nov 13 04:14:26 2009 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Fri, 13 Nov 2009 09:14:26 +0000 (UTC) Subject: [SciPy-User] scikits.timeseries concatenate Message-ID: It appears that when remove_duplicates is True (the default) ts.concatenate doesn't respect the dimensions of the data array c.f. In [1]: ts1 = ts.time_series(array([[1,2]]).repeat(10,axis=0), start_date=ts.Date('D','01-Jan-2009')) In [2]: ts2 = ts.time_series(array([[3,4]]).repeat(10,axis=0), start_date=ts.Date('D','11-Jan-2009')) In [3]: ts.concatenate([ts1,ts2],axis=0,remove_duplicates=False) Out[3]: timeseries( [[1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4]], dates = [01-Jan-2009 ... 20-Jan-2009], freq = D) In [4]: ts.concatenate([ts1,ts2],axis=0) Out[4]: timeseries([1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2], dates = [01-Jan-2009 ... 20-Jan-2009], freq = D) ?!? I think the fix is to pass the axis parameter as follows: result = ts.time_series(ndata.compress(orig,axis=axis), dates=ndates.compress(orig),freq=common_f) I haven't had time to test this thoroughly (execpt that it works for me) but thought I'd post it before the next bug-fix release got out. 
HTH, Dave From anderse at gmx.de Fri Nov 13 07:12:51 2009 From: anderse at gmx.de (Raimund Andersen) Date: Fri, 13 Nov 2009 13:12:51 +0100 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <5E5419A2-84AE-44A5-8A4C-1C96D4857CFD@yale.edu> References: <20091112113530.279300@gmx.net> <20091112134456.279300@gmx.net> <5E5419A2-84AE-44A5-8A4C-1C96D4857CFD@yale.edu> Message-ID: <20091113121251.239630@gmx.net> Hello Zach, once again thank you so much for your long and detailed answer. After testing this with many different kinds of input data, I really like it. Of course you are right, parametric splines are much less exact near the endpoints compared to the nonparametric ones. I didn't thought about that. Indeed, with 20 points of input data, even the second derivative is about 0 near the endpoints, and I am mainly interested in finding the zeros of the second derivative, so here I don't care much about how the slope goes in between. Maybe I can make use of the difference between x and (dx/du*u) as a kind of fuzzy quality rate; comparing (dx/du) between the first and the second derivative could bring some insights, too. This was really the help I needed, also thanks to you, Warren. Raimund -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From devicerandom at gmail.com Fri Nov 13 10:28:51 2009 From: devicerandom at gmail.com (ms) Date: Fri, 13 Nov 2009 15:28:51 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> Message-ID: <4AFD7B33.1040904@gmail.com> josef.pktd at gmail.com ha scritto: > On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: > an example > (quickly written and not optimized, there are parts I don't remember > about curve_fit, fixed parameters could be better handled by a class) Hmm, it seems I don't have curve_fit -I am constrained to use scipy-0.6.0 and there's no chance to change that (it's an external server). I am going to have a good look at what's doable with your approach anyway, but I am happy if someone gives me old-school alternatives :) cheers, m. > #################### > """stack equations with different known parameters > > I didn't get curve_fit to work with only 1 parameter to estimate > > Created on Thu Nov 12 11:17:21 2009 > Author: josef-pktd > """ > import numpy as np > from scipy import optimize > > > def fsingle(a,c,b,x): > return b*x**a + c > > atrue = 1. > ctrue = 10. > b = np.array([[1.]*10, [2.]*10, [3.]*10]) > b = np.array([1.,2.,3.]) > x = np.random.uniform(size=(3,10)) > y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) > y += 0.1*np.random.normal(size=y.shape) > > def fun(x,a,c): > #b is taken from enclosing scope > #print x.shape > xx=x.reshape((3,10)) > return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) > > res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) > > print 'true parameters ', atrue, ctrue > print 'parameter estimate', res[0] > print 'standard deviation', np.sqrt(np.diag(res[1])) > #################### > > > > >> sorry, >> m. 
>> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 13 11:00:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Nov 2009 11:00:49 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFD7B33.1040904@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> Message-ID: <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>> josef.pktd at gmail.com ha scritto: >> an example >> (quickly written and not optimized, there are parts I don't remember >> about curve_fit, fixed parameters could be better handled by a class) > > Hmm, it seems I don't have curve_fit -I am constrained to use > scipy-0.6.0 and there's no chance to change that (it's an external server). You can just copy the function (plus 2 helper functions) from the current trunk. You would need to add the imports. Alternatively you can just use optimize.leastsq directly, using curve_fit as a recipe. Josef http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py#L338 338 def _general_function(params, xdata, ydata, function): 339 return function(xdata, *params) - ydata 340 341 def _weighted_general_function(params, xdata, ydata, function, weights): 342 return weights * (function(xdata, *params) - ydata) 343 344 def curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw): > > I am going to have a good look at what's doable with your approach > anyway, but I am happy if someone gives me old-school alternatives :) > > cheers, > m. > >> #################### >> """stack equations with different known parameters >> >> I didn't get curve_fit to work with only 1 parameter to estimate >> >> Created on Thu Nov 12 11:17:21 2009 >> Author: josef-pktd >> """ >> import numpy as np >> from scipy import optimize >> >> >> def fsingle(a,c,b,x): >> ? ? return b*x**a + c >> >> atrue = 1. >> ctrue = 10. >> b = np.array([[1.]*10, [2.]*10, [3.]*10]) >> b = np.array([1.,2.,3.]) >> x = np.random.uniform(size=(3,10)) >> y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) >> y += 0.1*np.random.normal(size=y.shape) >> >> def fun(x,a,c): >> ? ? #b is taken from enclosing scope >> ? ? #print x.shape >> ? ? xx=x.reshape((3,10)) >> ? ? return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) >> >> res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) >> >> print 'true parameters ? ', atrue, ctrue >> print 'parameter estimate', res[0] >> print 'standard deviation', np.sqrt(np.diag(res[1])) >> #################### >> >> >> >> >>> sorry, >>> m. 
>>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Fri Nov 13 11:30:07 2009 From: devicerandom at gmail.com (ms) Date: Fri, 13 Nov 2009 16:30:07 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> Message-ID: <4AFD898F.2040308@gmail.com> josef.pktd at gmail.com ha scritto: > On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: >>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>> josef.pktd at gmail.com ha scritto: >>> an example >>> (quickly written and not optimized, there are parts I don't remember >>> about curve_fit, fixed parameters could be better handled by a class) >> Hmm, it seems I don't have curve_fit -I am constrained to use >> scipy-0.6.0 and there's no chance to change that (it's an external server). > > You can just copy the function (plus 2 helper functions) from the > current trunk. You would need to add the imports. Alternatively you > can just use optimize.leastsq directly, using curve_fit as a recipe. Thanks, but I've seen that with a bit of tweaking it works good with ODR too. Thanks a lot, really nice trick! A polished version of it should go in the cookbook in my opinion. thanks! m. > > Josef > > http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py#L338 > > 338 def _general_function(params, xdata, ydata, function): > 339 return function(xdata, *params) - ydata > 340 > 341 def _weighted_general_function(params, xdata, ydata, function, weights): > 342 return weights * (function(xdata, *params) - ydata) > 343 > 344 def curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw): > > > >> I am going to have a good look at what's doable with your approach >> anyway, but I am happy if someone gives me old-school alternatives :) >> >> cheers, >> m. >> >>> #################### >>> """stack equations with different known parameters >>> >>> I didn't get curve_fit to work with only 1 parameter to estimate >>> >>> Created on Thu Nov 12 11:17:21 2009 >>> Author: josef-pktd >>> """ >>> import numpy as np >>> from scipy import optimize >>> >>> >>> def fsingle(a,c,b,x): >>> return b*x**a + c >>> >>> atrue = 1. >>> ctrue = 10. 
>>> b = np.array([[1.]*10, [2.]*10, [3.]*10]) >>> b = np.array([1.,2.,3.]) >>> x = np.random.uniform(size=(3,10)) >>> y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) >>> y += 0.1*np.random.normal(size=y.shape) >>> >>> def fun(x,a,c): >>> #b is taken from enclosing scope >>> #print x.shape >>> xx=x.reshape((3,10)) >>> return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) >>> >>> res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) >>> >>> print 'true parameters ', atrue, ctrue >>> print 'parameter estimate', res[0] >>> print 'standard deviation', np.sqrt(np.diag(res[1])) >>> #################### >>> >>> >>> >>> >>>> sorry, >>>> m. >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Fri Nov 13 11:44:35 2009 From: devicerandom at gmail.com (ms) Date: Fri, 13 Nov 2009 16:44:35 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> Message-ID: <4AFD8CF3.7090508@gmail.com> josef.pktd at gmail.com ha scritto: > On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: >>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>> josef.pktd at gmail.com ha scritto: >>> an example >>> (quickly written and not optimized, there are parts I don't remember >>> about curve_fit, fixed parameters could be better handled by a class) >> Hmm, it seems I don't have curve_fit -I am constrained to use >> scipy-0.6.0 and there's no chance to change that (it's an external server). > > You can just copy the function (plus 2 helper functions) from the > current trunk. You would need to add the imports. Alternatively you > can just use optimize.leastsq directly, using curve_fit as a recipe. A further question: It seems to me it works only if the data sets have the same size, because what gets minimized is then the matrix. What about datasets with different sizes? thanks, m. 
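On the curve_fit question raised a few messages back: the same stacked fit can also be run directly with optimize.leastsq, which is available in scipy 0.6.0. A minimal sketch, reusing the simulated setup from Josef's earlier example; the starting values and noise level are illustrative only.

####################
import numpy as np
from scipy import optimize

def fsingle(a, c, b, x):
    return b*x**a + c

# same simulated setup as in the earlier post
atrue, ctrue = 1., 10.
b = np.array([1., 2., 3.])
x = np.random.uniform(size=(3, 10))
y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(3)])
y += 0.1*np.random.normal(size=y.shape)

def residuals(params):
    # stacked residual vector over all three groups
    a, c = params
    yhat = np.hstack([fsingle(a, c, b[i], x[i]) for i in range(3)])
    return yhat - y

p, cov_x, infodict, mesg, ier = optimize.leastsq(residuals, [2., 1.],
                                                 full_output=True)

# rescale cov_x by the residual variance, which is what curve_fit does
# internally, to get parameter standard errors
dof = y.size - len(p)
s_sq = (residuals(p)**2).sum()/dof
print 'true parameters   ', atrue, ctrue
print 'parameter estimate', p
print 'standard deviation', np.sqrt(np.diag(cov_x*s_sq))
####################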
From oliphant at enthought.com Fri Nov 13 11:56:21 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Fri, 13 Nov 2009 10:56:21 -0600 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFD7B33.1040904@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> Message-ID: <8530A83C-7525-4928-8C69-65E7249452F7@enthought.com> On Nov 13, 2009, at 9:28 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>> josef.pktd at gmail.com ha scritto: >> an example >> (quickly written and not optimized, there are parts I don't remember >> about curve_fit, fixed parameters could be better handled by a class) > > Hmm, it seems I don't have curve_fit -I am constrained to use > scipy-0.6.0 and there's no chance to change that (it's an external > server). > The code for curve_fit is pure python. You can grab it from the scipy trunk and just insert it into your own code. http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py Get lines 338 through 430 and make sure the module you put them in also has the lines: from scipy.optimize import leastsq from numpy import isscalar, asarray -Travis From josef.pktd at gmail.com Fri Nov 13 12:18:35 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Nov 2009 12:18:35 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFD8CF3.7090508@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> <4AFD8CF3.7090508@gmail.com> Message-ID: <1cd32cbb0911130918y71ef45a7h8515164d163a553d@mail.gmail.com> On Fri, Nov 13, 2009 at 11:44 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >>> josef.pktd at gmail.com ha scritto: >>>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>>> josef.pktd at gmail.com ha scritto: >>>> an example >>>> (quickly written and not optimized, there are parts I don't remember >>>> about curve_fit, fixed parameters could be better handled by a class) >>> Hmm, it seems I don't have curve_fit -I am constrained to use >>> scipy-0.6.0 and there's no chance to change that (it's an external server). >> >> You can just copy the function (plus 2 helper functions) from the >> current trunk. You would need to add the imports. Alternatively you >> can just use optimize.leastsq directly, using curve_fit as a recipe. > > A further question: It seems to me it works only if the data sets have > the same size, because what gets minimized is then the matrix. What > about datasets with different sizes? In the example, I just did the stacking based on the 2d array to have it quickly written, for unequal sized data groups it is easier to work directly with the stacked array, and just index into it, or for example create a `b` array that has the values repeated corresponding to the group sizes. (Same story as with balanced versus unbalance panels.) Do you have a non-linear ODR example? I didn't even know ODR can do non-linear parameter estimation. Josef > > thanks, > m. 
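A short sketch of the repeated-`b` idea for groups of different lengths; the group sizes and values below are made up for illustration.

####################
import numpy as np

# hypothetical group sizes and known per-group parameters
sizes = [5, 12, 8]
b_groups = np.array([1., 2., 3.])

# one long stacked sample instead of a rectangular (3, 10) array
x_all = np.concatenate([np.random.uniform(size=n) for n in sizes])
b_all = np.repeat(b_groups, sizes)   # known b value for every observation

def fun_stacked(x, a, c):
    # same model as fsingle above, evaluated on the stacked data in one go
    return b_all*x**a + c

# fun_stacked can now be passed to curve_fit, or wrapped in a residual
# function for optimize.leastsq, exactly as in the balanced example.
####################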
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 13 12:21:14 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Nov 2009 12:21:14 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911130918y71ef45a7h8515164d163a553d@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> <4AFD8CF3.7090508@gmail.com> <1cd32cbb0911130918y71ef45a7h8515164d163a553d@mail.gmail.com> Message-ID: <1cd32cbb0911130921y5e568718uf78f4c22227130c6@mail.gmail.com> On Fri, Nov 13, 2009 at 12:18 PM, wrote: > On Fri, Nov 13, 2009 at 11:44 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: >>> On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >>>> josef.pktd at gmail.com ha scritto: >>>>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>>>> josef.pktd at gmail.com ha scritto: >>>>> an example >>>>> (quickly written and not optimized, there are parts I don't remember >>>>> about curve_fit, fixed parameters could be better handled by a class) >>>> Hmm, it seems I don't have curve_fit -I am constrained to use >>>> scipy-0.6.0 and there's no chance to change that (it's an external server). >>> >>> You can just copy the function (plus 2 helper functions) from the >>> current trunk. You would need to add the imports. Alternatively you >>> can just use optimize.leastsq directly, using curve_fit as a recipe. >> >> A further question: It seems to me it works only if the data sets have >> the same size, because what gets minimized is then the matrix. What >> about datasets with different sizes? > > In the example, I just did the stacking based on the 2d array to have > it quickly written, for unequal sized data groups it is easier to work > directly with the stacked array, and just index into it, or for > example create a `b` array that has the values repeated corresponding > to the group sizes. (Same story as with balanced versus unbalance > panels.) > > Do you have a non-linear ODR example? I didn't even know ODR can do > non-linear parameter estimation. Or maybe I knew it and just forgot about it. > > Josef > > > >> >> thanks, >> m. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From d.l.goldsmith at gmail.com Fri Nov 13 15:23:10 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 13 Nov 2009 12:23:10 -0800 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer Message-ID: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Hi, David C. (and all). Searching the archives for "Windows BLAS," I found your 2008 post announcing an alpha version of a BLAS/LAPACK Windows installer "superpack," but the link therein appears to be dead (I get a 404 not found error). What's the status of this endeavor? Have you "recalled" or stopped developing the product? If so, what's your present recommendation for installing these in Vista? 
(I found an interesting 2008 paper "Choosing the optimal BLAS and LAPACK library," which has a list indicating ATLAS, Goto, and Intel MKL as "better-than-reference" tested implementations on Intel-based architecture, but in Linux. Also, it indicates that Goto is BLAS-only - is this all I need for a viable build of scipy from source?) Thanks! David G. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Fri Nov 13 17:08:02 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 13 Nov 2009 17:08:02 -0500 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: On 13-Nov-09, at 3:23 PM, David Goldsmith wrote: > (I found an interesting 2008 > paper "Choosing the optimal BLAS and LAPACK library," which has a list > indicating ATLAS, Goto, and Intel MKL as "better-than-reference" > tested > implementations on Intel-based architecture, but in Linux. Also, it > indicates that Goto is BLAS-only - is this all I need for a viable > build of > scipy from source?) Thanks! ATLAS is also "BLAS-only" (i.e. if you download the ATLAS source tarball and build, all you will get are ATLAS's tuned implementations of the Basic Linear Algebra Subroutines, LAPACK must be downloaded separately). LAPACK routines rely on BLAS, The difference is that I think ATLAS has support for building an (optimized?) LAPACK. I think it may also do some auto-tuning of LAPACK compiler flags or something. I'm afraid I have no clue about building on Windows, sorry. If you get it working, document your steps so we can put it on the website (I'm pretty sure the version on the wiki/in SVN now are out of date). David From sturla at molden.no Fri Nov 13 17:10:13 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 23:10:13 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: <4AFDD945.60507@molden.no> David Goldsmith skrev: > Also, it indicates that Goto is BLAS-only - is this all I need for a > viable build of scipy from source?) Thanks! The heavy lifiting in LAPACK is delegated to BLAS. So if you build LAPACK against ATLAS or GotoBLAS, LAPACK does not tend to be the bottleneck. Both ATLAS and GotoBLAS reimplements some routines from LAPACK. But they will not give you a full LAPACK. To avoid LAPACK overshadowing LAPACK routines in GotoBLAS, see this: http://jupiter.ethz.ch/~dmay/Research/GotoBLAS/index.html MKL has both LAPACK and BLAS, but it is very expensive. I don't know if Intel has reimplemented LAPACK, but that would surprise me, as it would suffice to make a fast BLAS. From sturla at molden.no Fri Nov 13 17:16:19 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 23:16:19 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: <4AFDDAB3.5060807@molden.no> David Warde-Farley skrev: > I'm afraid I have no clue about building on Windows, sorry. If you get > it working, document your steps so we can put it on the website (I'm > pretty sure the version on the wiki/in SVN now are out of date). 
> > At least ATLAS needs to be built using Cygwin, which is a PITA. And from what I know, there is no 64 bit support in Cygwin either, so we always end up with a 32 bit ATLAS. And considering that GotoBLAS is claimed to speed up Matlab (ATLAS or MKL being defaults), the choise is not difficult... From d.l.goldsmith at gmail.com Fri Nov 13 17:32:09 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 13 Nov 2009 14:32:09 -0800 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <4AFDDAB3.5060807@molden.no> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDDAB3.5060807@molden.no> Message-ID: <45d1ab480911131432h9e02a3ajcb5f06e9c2c12ffb@mail.gmail.com> Thanks, both, sounds like Goto is the way to go unless David C. still has/supports his MSI for BLAS for Windoze. (Speeding up Matlab isn't an issue for me - except to the extent that it implies speeding up numpy/scipy - as I've committed myself to eventually converting/refactoring all my old Matlab code to Python and its modules.) :-) DG On Fri, Nov 13, 2009 at 2:16 PM, Sturla Molden wrote: > David Warde-Farley skrev: > > I'm afraid I have no clue about building on Windows, sorry. If you get > > it working, document your steps so we can put it on the website (I'm > > pretty sure the version on the wiki/in SVN now are out of date). > > > > > > At least ATLAS needs to be built using Cygwin, which is a PITA. And from > what I know, there is no 64 bit support in Cygwin either, so we always > end up with a 32 bit ATLAS. > > And considering that GotoBLAS is claimed to speed up Matlab (ATLAS or > MKL being defaults), the choise is not difficult... > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Nov 13 17:44:47 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 23:44:47 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131432h9e02a3ajcb5f06e9c2c12ffb@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDDAB3.5060807@molden.no> <45d1ab480911131432h9e02a3ajcb5f06e9c2c12ffb@mail.gmail.com> Message-ID: <4AFDE15F.7020700@molden.no> David Goldsmith skrev: > Thanks, both, sounds like Goto is the way to go unless David C. still > has/supports his MSI for BLAS for Windoze. (Speeding up Matlab isn't > an issue for me - except to the extent that it implies speeding up > numpy/scipy - as I've committed myself to eventually > converting/refactoring all my old Matlab code to Python and its > modules.) :-) GotoBLAS will speed up linear algebra computation with NumPy and SciPy, and it is trivial to build. The catch is that GotoBLAS is only free for personal or academic use. Note that most use of NumPy or SciPy do not involve heavy use of LAPACK, and the computational bottleneck tends to be in the creation of temporary arrays, not in CPU-intensive computation. 
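[A small aside, not from the message above, to make the point about temporary arrays concrete: an expression like a * 2 + b allocates an intermediate array for a * 2 before the addition, while in-place operations reuse the result buffer.]

import numpy as np

a = np.ones(10**7)
b = np.ones(10**7)

c = a * 2 + b    # allocates a temporary for (a * 2), then the result array

c2 = a * 2       # one allocation, which is also the result array
c2 += b          # in-place add, no further temporaries

assert np.allclose(c, c2)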
From sturla at molden.no Fri Nov 13 18:06:46 2009 From: sturla at molden.no (Sturla Molden) Date: Sat, 14 Nov 2009 00:06:46 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: <4AFDE686.9080102@molden.no> David Goldsmith skrev: > Hi, David C. (and all). Searching the archives for "Windows BLAS," I > found your 2008 post announcing an alpha version of a BLAS/LAPACK > Windows installer "superpack," but the link therein appears to be dead > (I get a 404 not found error). I'll just mention that re-distribution of GotoBLAS is prohibited. So if SciPy is to provide a BLAS superpack it will have to be ATLAS. That is difficult as ATLAS requires more tuning parameters than GotoBLAS+LAPACK. The performance of Intel's MKL and AMD's ACML is only slightly less than GotoBLAS. If you cannot build NumPy with GotoBLAS and LAPACK, I suggest you use MKL or ACML instead. By the way, it would be better if NumPy and SciPy had a way of replacing it's BLAS and LAPACK libraries. For example if they were not statically linked, as today, they could be a DLL whose path would be found in a config file. Sturla From cournape at gmail.com Fri Nov 13 18:21:48 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 14 Nov 2009 08:21:48 +0900 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <4AFDE686.9080102@molden.no> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDE686.9080102@molden.no> Message-ID: <5b8d13220911131521qcbe5ac1laa3ab510b2e09680@mail.gmail.com> On Sat, Nov 14, 2009 at 8:06 AM, Sturla Molden wrote: > > By the way, it would be better if NumPy and ?SciPy had a way of > replacing it's BLAS and LAPACK libraries. For example if they were not > statically linked, as today, they could be a DLL whose path would be > found in a config file. Yes, we know, but that is difficult, partly because of windows limitations, partly because every blas/lapack has different conventions (name mangling for example). It is not impossible, but that's quite a lot of work to make it work reliably. David From pgmdevlist at gmail.com Fri Nov 13 18:54:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 13 Nov 2009 18:54:23 -0500 Subject: [SciPy-User] scikits.timeseries concatenate In-Reply-To: References: Message-ID: <0872A453-272D-4520-907B-8468146B8AB3@gmail.com> On Nov 13, 2009, at 4:14 AM, Dave Hirschfeld wrote: > > It appears that when remove_duplicates is True (the default) ts.concatenate > doesn't respect the dimensions of the data array c.f. Good call, and thanks for the fix. I gonna investigate some more and let you know... From d.l.goldsmith at gmail.com Fri Nov 13 19:16:07 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 13 Nov 2009 16:16:07 -0800 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <5b8d13220911131521qcbe5ac1laa3ab510b2e09680@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDE686.9080102@molden.no> <5b8d13220911131521qcbe5ac1laa3ab510b2e09680@mail.gmail.com> Message-ID: <45d1ab480911131616y7f34c41cx5a6c92723c9375ed@mail.gmail.com> Hi, David. I take it you concur w/ Sturla's rec. that try Goto first? 
DG On Fri, Nov 13, 2009 at 3:21 PM, David Cournapeau wrote: > On Sat, Nov 14, 2009 at 8:06 AM, Sturla Molden wrote: > > > > > By the way, it would be better if NumPy and SciPy had a way of > > replacing it's BLAS and LAPACK libraries. For example if they were not > > statically linked, as today, they could be a DLL whose path would be > > found in a config file. > > Yes, we know, but that is difficult, partly because of windows > limitations, partly because every blas/lapack has different > conventions (name mangling for example). > > It is not impossible, but that's quite a lot of work to make it work > reliably. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amenity at enthought.com Sat Nov 14 14:18:04 2009 From: amenity at enthought.com (Amenity Applewhite) Date: Sat, 14 Nov 2009 13:18:04 -0600 Subject: [SciPy-User] November 20 Webinar: Interpolation with NumPy/SciPy References: <1102826606498.1102424111856.4101.9.1814156C@scheduler> Message-ID: Having trouble viewing this email? Click here Friday, November 20: Interpolation with NumPy/SciPy Dear Amenity, It's time for our mid-month Scientific Computing with Python webinar! This month's topic is sure to prove very useful for data analysts: Interpolation with NumPy and SciPy. In many data-processing scenarios, it is necessary to use a discrete set of available data-points to infer the value of a function at a new data-point. One approach to this problem is interpolation, which constructs a new model-function that goes through the original data- points. There are many forms of interpolation - polynomial, spline, kriging, radial basis function, etc. - and SciPy includes some of these interpolation forms. This webinar will review the interpolation modules available in SciPy and in the larger Python community and provide instruction on their use via example. Scientific Computing with Python Webinar: Interpolation with NumPy/SciPy Friday, November 20 1pm CDT/7pm UTC Register at GoToMeeting We look forward to seeing you Friday! As always, feel free to contact us with questions, concerns, or suggestions for future webinar topics. Thanks, The Enthought Team QUICK LINKS ::: www.enthought.com code.enthought.com Facebook Enthought Blog Forward email This email was sent to amenity at enthought.com by amenity at enthought.com. Update Profile/Email Address | Instant removal with SafeUnsubscribe? | Privacy Policy. Enthought, Inc. | 515 Congress Ave. | Suite 2100 | Austin | TX | 78701 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattknox.ca at gmail.com Sun Nov 15 15:47:51 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Sun, 15 Nov 2009 20:47:51 +0000 (UTC) Subject: [SciPy-User] ANN: scikits.timeseries 0.91.3 Message-ID: We are pleased to announce the release of scikits.timeseries 0.91.3 This is a bug fix release and is recommended for all users. Home page: http://pytseries.sourceforge.net/ Please see the website for installation requirements and download details. 
Bug Fixes --------- * general improvements for tsfromtxt * accept datetime objects for 'value' positional arg in Date class * fixes for compatibility with matplotlib 0.99.1 * fix problem with '%j' directive in strftime method * fix problem with concatenate and 2-d series * fixed crash in reportlib.Report class when fixed_width=False and a header_row were specified at same time Thanks, Matt Knox & Pierre Gerard-Marchant From cohen at lpta.in2p3.fr Sun Nov 15 16:02:38 2009 From: cohen at lpta.in2p3.fr (Johann Cohen-Tanugi) Date: Sun, 15 Nov 2009 22:02:38 +0100 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <4B006C6E.6080800@lpta.in2p3.fr> Anne, do you know of a python implementation of Lomb-Scargle? Johann Anne Archibald wrote: > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. > > Suggestions welcome. > > Thanks, > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From dpfrota at yahoo.com.br Sun Nov 15 19:00:47 2009 From: dpfrota at yahoo.com.br (dpfrota) Date: Sun, 15 Nov 2009 16:00:47 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 Message-ID: <26355930.post@talk.nabble.com> I got the exactly same error... More tips? Thanks David Cournapeau wrote: > > On Tue, Oct 27, 2009 at 3:04 AM, Christopher Brown wrote: >> Thanks for the suggestion. However, audiolab didn't need it installed on >> Python 2.5. I also see the file _sndfile.dll in the audiolab folder, >> which I assume contains the sndfile code (it is ~3.5mb). >> >> I installed it anyway, and I copied the dll into the audiolab folder, >> but the error persists. Any other suggestions? 
> > There is indeed a problem on 2.6, but I have not found the time to > look at it. Most likely linked to the manifest nonsense on windows, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Audiolab-on-Py2.6-tp26064218p26355930.html Sent from the Scipy-User mailing list archive at Nabble.com. From mudit_19a at yahoo.com Sun Nov 15 19:25:31 2009 From: mudit_19a at yahoo.com (mudit sharma) Date: Mon, 16 Nov 2009 05:55:31 +0530 (IST) Subject: [SciPy-User] Pytseries numpy func error In-Reply-To: References: <4AF8842D.5010805@ucsf.edu> <835246.33088.qm@web94906.mail.in2.yahoo.com> Message-ID: <814366.30498.qm@web94914.mail.in2.yahoo.com> actually i figured that out it throws that error when data array is of dtype object In [74]: data = npy.array([-1840.0,-1550.0,-940.0,2660.0,190.0,3980.0,1130.0,2090.0,1980.0,1220.0,-1220.0,1140.0,-2420.0,2200.0,370.0,230.0,-60.0,2550.0,970.0,660.0,-20.0,50.0,-980.0,6580.0,4090.0,3240.0,-350.0,-1800.0,2020.0,5050.0,-110.0,-330.0,-2290.0], dtype=npy.object) In [75]: dates = "Mar-2007","Apr-2007","May-2007","Jun-2007","Jul-2007","Aug-2007","Sep-2007","Oct-2007","Nov-2007","Dec-2007","Jan-2008","Feb-2008","Mar-2008","Apr-2008","May-2008","Jun-2008","Jul-2008","Aug-2008","Sep-2008","Oct-2008","Nov-2008","Dec-2008","Jan-2009","Feb-2009","Mar-2009","Apr-2009","May-2009","Jun-2009","Jul-2009","Aug-2009","Sep-2009","Oct-2009","Nov-2009" In [76]: series = ts.time_series(data,dates, freq="M") ----- Original Message ---- From: Matt Knox To: scipy-user at scipy.org Sent: Tue, 10 November, 2009 22:29:41 Subject: Re: [SciPy-User] Pytseries numpy func error > series.sum() gives this error whereas series.data.sum() > works. I don't get this error when trying a sum on a TimeSeries object. I noticed you are using an older version of the timeseries module. Can you try upgrading to the latest version and see if you still get an error? Also, if you still get the error please post a small example demonstrating how to get the error, thanks. Also, note that we will probably be doing a new minor bug fix release within the next week or two. - Matt _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pgmdevlist at gmail.com Sun Nov 15 19:58:10 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 15 Nov 2009 19:58:10 -0500 Subject: [SciPy-User] Pytseries numpy func error In-Reply-To: <814366.30498.qm@web94914.mail.in2.yahoo.com> References: <4AF8842D.5010805@ucsf.edu> <835246.33088.qm@web94906.mail.in2.yahoo.com> <814366.30498.qm@web94914.mail.in2.yahoo.com> Message-ID: <11764FA5-B832-4397-9C76-6EEBF3A82AEA@gmail.com> On Nov 15, 2009, at 7:25 PM, mudit sharma wrote: > > actually i figured that out it throws that error when data array is of dtype object Confirmed. The bug is in numpy.ma, I'll check that later this evening... From peridot.faceted at gmail.com Sun Nov 15 21:49:23 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 15 Nov 2009 21:49:23 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4B006C6E.6080800@lpta.in2p3.fr> References: <4B006C6E.6080800@lpta.in2p3.fr> Message-ID: 2009/11/15 Johann Cohen-Tanugi : > Anne, do you know of a python implementation of Lomb-Scargle? I don't, unfortunately. 
But as there seems to be no clever FFT-like trick to it, it can probably be written in a few lines of numpy code - simply take an array of frequencies and an array of times, broadcast them together, and apply the formulas. If you have a lot of events or a lot of frequencies, a loop over the smaller array will save a big intermediate array, but beyond that I don't think there's much cleverness to be put in. Anne From peridot.faceted at gmail.com Sun Nov 15 22:22:28 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 15 Nov 2009 22:22:28 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <4AF84EF1.2090608@gmail.com> <4AF871BE.6050300@gmail.com> Message-ID: Thank you everyone for all your comments. I have managed to pull together a more-or-less satisfactory solution. If you're curious, I have written up the problem at: http://lighthouseinthesky.blogspot.com/2009/11/curve-fitting-part-3-bayesian-fitting.html and my solution so far at: http://lighthouseinthesky.blogspot.com/2009/11/curve-fitting-part-4-validating.html Thanks, Anne From gokhansever at gmail.com Tue Nov 17 00:44:17 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 16 Nov 2009 23:44:17 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data Message-ID: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Hello, I have a data which represents aerosol size distribution in between 0.1 to 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 nm. The data in this context is log-normally distributed. Therefore I am looking a way to fit a log-normal curve onto my data. Could you please give me some pointers to solve this problem? Thank you. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 00:51:19 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 16 Nov 2009 23:51:19 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Message-ID: <3d375d730911162151v6f4db525ka8ccca864a32b162@mail.gmail.com> On Mon, Nov 16, 2009 at 23:44, G?khan Sever wrote: > Hello, > > I have a data which represents aerosol size distribution in between 0.1 to > 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 > nm. The data in this context is log-normally distributed. Therefore I am > looking a way to fit a log-normal curve onto my data. Could you please give > me some pointers to solve this problem? Transform the data y=log(x) then estimate the mean and variance of y. With the appropriate transformations (which you will have to look up depending on the convention of the log-normal calculations that you are using), these are reasonable estimates of the log-normal distribution for your data. Or you could just stay in the transformed space. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
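[A short sketch of the transform approach described above; it is not from the original reply and uses a synthetic sample. Taking logs turns log-normal data into normal data, so the mean and standard deviation of the logs estimate the parameters; in scipy.stats' convention the log-normal shape is sigma and the scale is exp(mu).]

import numpy as np
from scipy import stats

x = stats.lognorm.rvs(0.8, scale=np.exp(0.2), size=1000)   # synthetic log-normal data

y = np.log(x)
mu, sigma = y.mean(), y.std()
print(mu, sigma)   # estimates of the underlying normal parameters

# frozen distribution with the estimated parameters (loc fixed at 0)
fitted = stats.lognorm(sigma, loc=0, scale=np.exp(mu))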
-- Umberto Eco From gokhansever at gmail.com Tue Nov 17 12:29:10 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 11:29:10 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Message-ID: <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett wrote: > Theory wise: > -Do a linear regression on your data. > -Apply a logrithmic transform to your data's dependent variable, and do > another linear regression. > -Apply a logrithmic transform to your data's independent variable, and do > another linear regression. > -Take the best regression (highest r^2 value) and execute a back transform. > > Then, to get your desired extrapolation, simply substitute in the size for > the independent variable to get the expected value. > > If, however, you're looking for how to implement this in NumPy or SciPy, I > can't really help :-P > Ian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > OK, before applying your suggestions. I have a few more questions. Here is 1 real-sample data that I will use as a part of the log-normal fitting. There is 15 elements in this arrays each being a concentration for corresponding 0.1 - 3.0 um size ranges. I[74]: conc O[74]: array([ 119.7681, 118.546 , 146.6548, 96.5478, 109.9911, 32.9974, 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221, 2.4443, 2.4443, 3.6664]) For now not calibrated size range I just assume a linear array: I[78]: sizes = linspace(0.1, 3.0, 15) I[79]: sizes O[79]: array([ 0.1 , 0.30714286, 0.51428571, 0.72142857, 0.92857143, 1.13571429, 1.34285714, 1.55 , 1.75714286, 1.96428571, 2.17142857, 2.37857143, 2.58571429, 2.79285714, 3. ]) Not a very ideal looking log-normal, but so far I don't know what else besides a log-normal fit would give me a better estimate: I[80]: figure(); plot(sizes, conc) http://img406.imageshack.us/img406/156/sizeconc.png scipy.stats has the lognorm.fit lognorm.fit(data,s,loc=0,scale=1) - Parameter estimates for lognorm data and applying this to my data. However not sure the right way of calling it, and not sure if this could be applied to my case? I[81]: stats.lognorm.fit(conc) O[81]: array([ 2.31386066, 1.19126064, 9.5748391 ]) Lastly, what is the way to create a ideal log-normal sample using the stats.lognorm.rvs? Thanks -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Nov 17 13:38:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Nov 2009 13:38:01 -0500 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> Message-ID: <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> On Tue, Nov 17, 2009 at 12:29 PM, G?khan Sever wrote: > > > On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett wrote: >> >> Theory wise: >> -Do a linear regression on your data. >> -Apply a logrithmic transform to your data's dependent variable, and do >> another linear regression. 
>> -Apply a logrithmic transform to your data's independent variable, and do >> another linear regression. >> -Take the best regression (highest r^2 value) and execute a back >> transform. >> >> Then, to get your desired extrapolation, simply substitute in the size for >> the independent variable to get the expected value. >> >> If, however, you're looking for how to implement this in NumPy or SciPy, I >> can't really help :-P >> Ian >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > OK, before applying your suggestions. I have a few more questions. Here is 1 > real-sample data that I will use as a part of the log-normal fitting. There > is 15 elements in this arrays each being a concentration for corresponding > 0.1 - 3.0 um size ranges. > > I[74]: conc > O[74]: > array([ 119.7681,? 118.546 ,? 146.6548,?? 96.5478,? 109.9911,?? 32.9974, > ???????? 20.7762,??? 6.1107,?? 12.2212,??? 3.6664,??? 3.6664,??? 1.2221, > ????????? 2.4443,??? 2.4443,??? 3.6664]) > > For now not calibrated size range I just assume a linear array: > > I[78]: sizes = linspace(0.1, 3.0, 15) > > I[79]: sizes > O[79]: > array([ 0.1?????? ,? 0.30714286,? 0.51428571,? 0.72142857,? 0.92857143, > ??????? 1.13571429,? 1.34285714,? 1.55????? ,? 1.75714286,? 1.96428571, > ??????? 2.17142857,? 2.37857143,? 2.58571429,? 2.79285714,? 3.??????? ]) > > > Not a very ideal looking log-normal, but so far I don't know what else > besides a log-normal fit would give me a better estimate: > I[80]: figure(); plot(sizes, conc) > http://img406.imageshack.us/img406/156/sizeconc.png > > scipy.stats has the lognorm.fit > > ??? lognorm.fit(data,s,loc=0,scale=1) > ??????? - Parameter estimates for lognorm data > > and applying this to my data. However not sure the right way of calling it, > and not sure if this could be applied to my case? > > I[81]: stats.lognorm.fit(conc) > O[81]: array([ 2.31386066,? 1.19126064,? 9.5748391 ]) > > Lastly, what is the way to create a ideal log-normal sample using the > stats.lognorm.rvs? I don't think I understand the connection to the log-normal distribution. You seem to have a non-linear relationship conc = f(size) where you want to find a non-linear relationship f If conc where just lognormal distributed, then you would not get any relationship between conc and size. If you have many observations with conc, size pairs then you could estimate a noisy model conc = f(size) + u where the noise u is for example log-normal distributed but you would still need to get an expression for the non-linear function f. Extending a non-linear function outside of the observed range is essentially always just a guess or by assumption. If you want to fit a curve f that has the same shape as the pdf of the log-normal, then you cannot do it with lognorm.fit, because that just assumes you have a random sample independent of size. So, it's not clear to me what you really want, or what your sample data looks like (do you have only one 15 element sample or lots of them). 
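[A rough illustration, not from the thread, of the distinction drawn above: lognorm.fit expects a sample of particle sizes, not a concentration-versus-size curve. One crude way to apply it to binned data is to expand the bin centres into a pseudo-sample weighted by the counts; this ignores bin widths and the censored region below 0.1 um, so treat it only as a sanity check.]

import numpy as np
from scipy import stats

conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])
sizes = np.linspace(0.1, 3.0, 15)    # nominal bin centres, as assumed above

# repeat each bin centre in proportion to its concentration to build a
# rough pseudo-sample of particle sizes (x10 so small bins survive rounding)
counts = np.round(conc * 10).astype(int)
pseudo_sample = np.repeat(sizes, counts)

# fix loc=0 so only the shape (sigma) and scale (exp(mu)) are estimated
s, loc, scale = stats.lognorm.fit(pseudo_sample, floc=0)
print(s, loc, scale)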
Josef > > Thanks > > > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From robert.kern at gmail.com Tue Nov 17 13:57:46 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 12:57:46 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> Message-ID: <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> On Tue, Nov 17, 2009 at 12:38, wrote: > So, it's not clear to me what you really want, or what your sample data > looks like (do you have only one 15 element sample or lots of them). I'm guessing that they aren't really samples of (conc, size) pairs so much as binned data. Particles with sizes between 0.1 and 0.3 um (for example; I don't know where the bin edges actually are in his data) have a concentration of 119.7681 particles/. This can be normalized to a more proper histogrammed distribution, except that the lower end of the distribution below 0.1 um has been censored by his measuring process. He then wants to infer the continuous distribution that generated that censored histogram so he can predict what the distribution is in the censored region. So, I would say that it's a bit trickier than fitting the log-normal PDF to the data for a couple of reasons. 1) Directly fitting PDFs to histogram values is usually not a great idea to begin with. 2) We don't know how much probability mass is in the censored region. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Tue Nov 17 14:28:45 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 13:28:45 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> Message-ID: <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> On Tue, Nov 17, 2009 at 12:38 PM, wrote: > On Tue, Nov 17, 2009 at 12:29 PM, G?khan Sever > wrote: > > > > > > On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett > wrote: > >> > >> Theory wise: > >> -Do a linear regression on your data. > >> -Apply a logrithmic transform to your data's dependent variable, and do > >> another linear regression. > >> -Apply a logrithmic transform to your data's independent variable, and > do > >> another linear regression. > >> -Take the best regression (highest r^2 value) and execute a back > >> transform. > >> > >> Then, to get your desired extrapolation, simply substitute in the size > for > >> the independent variable to get the expected value. 
> >> > >> If, however, you're looking for how to implement this in NumPy or SciPy, > I > >> can't really help :-P > >> Ian > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > OK, before applying your suggestions. I have a few more questions. Here > is 1 > > real-sample data that I will use as a part of the log-normal fitting. > There > > is 15 elements in this arrays each being a concentration for > corresponding > > 0.1 - 3.0 um size ranges. > > > > I[74]: conc > > O[74]: > > array([ 119.7681, 118.546 , 146.6548, 96.5478, 109.9911, 32.9974, > > 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221, > > 2.4443, 2.4443, 3.6664]) > > > > For now not calibrated size range I just assume a linear array: > > > > I[78]: sizes = linspace(0.1, 3.0, 15) > > > > I[79]: sizes > > O[79]: > > array([ 0.1 , 0.30714286, 0.51428571, 0.72142857, 0.92857143, > > 1.13571429, 1.34285714, 1.55 , 1.75714286, 1.96428571, > > 2.17142857, 2.37857143, 2.58571429, 2.79285714, 3. ]) > > > > > > Not a very ideal looking log-normal, but so far I don't know what else > > besides a log-normal fit would give me a better estimate: > > I[80]: figure(); plot(sizes, conc) > > http://img406.imageshack.us/img406/156/sizeconc.png > > > > scipy.stats has the lognorm.fit > > > > lognorm.fit(data,s,loc=0,scale=1) > > - Parameter estimates for lognorm data > > > > and applying this to my data. However not sure the right way of calling > it, > > and not sure if this could be applied to my case? > > > > I[81]: stats.lognorm.fit(conc) > > O[81]: array([ 2.31386066, 1.19126064, 9.5748391 ]) > > > > Lastly, what is the way to create a ideal log-normal sample using the > > stats.lognorm.rvs? > > R. Kern has nicely summarized my intention. Let me add some more onto his description. > I don't think I understand the connection to the log-normal distribution. > You seem to have a non-linear relationship > conc = f(size) where you want to find a non-linear relationship f > Here I am directly quoting from on of my cloud physics books: "Once a discrete model size distribution has been laid out, the initial particle number, volume, and mass concentrations must be distributed among model size bins. This can be accomplished by fitting measurements to a continuous size distribution, then discretizing the continuous distribution over the model bins. Three continuous distributions available for this procedure are the lognormal, Marshall?Palmer, and modified gamma distributions." My data are discrete in its nature, since have only 15 channels in between (0.1 to 3.0 um ranges). Say that (from the sample data that I used in my previous e-mail) the first channel is in between 0.10 to 0.31 um and I read the number concentration for this size-range as 119.77 #/cm^3 so on so forth. Since I am interested to estimate the number concentrations below the 0.1 um (preferably down to 0.01 um or 10 nm) I would like to fit a continuous distribution onto my dataset. Among the all three continuous distributions lognormal seems to be the easiest to implement, and log-normal distribution is commonly used to represent aerosol size distribution in the atmosphere. If there is a way to do this discretely I would like to know very much. > > If conc where just lognormal distributed, then you would not get any > relationship between conc and size. 
> > If you have many observations with conc, size pairs then you could > estimate a noisy model > conc = f(size) + u where the noise u is for example log-normal > distributed but you would still need to get an expression for the > non-linear function f. > I don't understand why I can't get a relation between sizes and conc values if conc is log-normally distributed. Can I elaborate this a bit more? The non-linear relationship part is also confusing me. If say to test the linear relationship of x and y data pairs we just fit a line, in this case what I am looking is to fit a log-normal to get a relation between size and conc. > Extending a non-linear function outside of the observed range > is essentially always just a guess or by assumption. > Yes, I am aware of this. Just trying to put my guesses into a well-defined form. So when I am describing the analysis process I will be able tell to others that this extrapolation is a result of log-normal fitting. > > If you want to fit a curve f that has the same shape as the pdf of > the log-normal, then you cannot do it with lognorm.fit, because that > just assumes you have a random sample independent of size. > Could you give an example on this? > > So, it's not clear to me what you really want, or what your sample data > looks like (do you have only one 15 element sample or lots of them). > I have many sample points (thousands) that are composed of this 15 elements. But the whole data don't look much different the sample I used. Most peaks are around 3rd - 4th channel and decaying as shown in the figure. > > Josef > > > > > > > Thanks > > > > > > -- > > G?khan > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Tue Nov 17 14:36:36 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 13:36:36 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> Message-ID: <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> On Tue, Nov 17, 2009 at 12:57 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 12:38, wrote: > > > So, it's not clear to me what you really want, or what your sample data > > looks like (do you have only one 15 element sample or lots of them). > > I'm guessing that they aren't really samples of (conc, size) pairs so > much as binned data. Correct. These are discrete sample points. > Particles with sizes between 0.1 and 0.3 um (for > example; I don't know where the bin edges actually are in his data) > have a concentration of 119.7681 particles/. True, in particles/cm^3 units > This can be normalized to a more proper histogrammed distribution, except > that the lower end of the distribution below 0.1 um has been censored > by his measuring process. 
He then wants to infer the continuous > distribution that generated that censored histogram so he can predict > what the distribution is in the censored region. > Exactly. Where later I am hoping to find a critical size point using another equation, and integrating upwards to obtain total concentration from that point on and do a comparison with another instrument. The 0.1 um threshold comes from the instrument limit. It can't measure below this level due to the constraint of the Mie scattering theory. > > So, I would say that it's a bit trickier than fitting the log-normal > PDF to the data for a couple of reasons. > > 1) Directly fitting PDFs to histogram values is usually not a great > idea to begin with. > 2) We don't know how much probability mass is in the censored region. > > So we agree that it is easy to implement a log-normal fit than a discrete one? > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 14:37:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 13:37:33 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> Message-ID: <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> On Tue, Nov 17, 2009 at 13:28, G?khan Sever wrote: > > > On Tue, Nov 17, 2009 at 12:38 PM, wrote: >> If conc where just lognormal distributed, then you would not get any >> relationship between conc and size. >> >> If you have many observations with conc, size pairs then you could >> estimate a noisy model >> conc = f(size) + u ?where the noise u is for example log-normal >> distributed but you would still need to get an expression for the >> non-linear function f. > > I don't understand why I can't get a relation between sizes and conc values > if conc is log-normally distributed. Can I elaborate this a bit more? The > non-linear relationship part is also confusing me. If say to test the linear > relationship of x and y data pairs we just fit a line, in this case what I > am looking is to fit a log-normal to get a relation between size and conc. It's a language issue. Your concentration values are not log-normally distributed. Your particle sizes are log-normally distributed (maybe). The concentration of a range of particle sizes is a measurement that is related to particle size the distribution, but you would not say that the measurements themselves are log-normally distributed. Josef was taking your language at face value. >> If you want to fit a curve f that has the same shape as the pdf of >> the log-normal, then you cannot do it with lognorm.fit, because that >> just assumes you have a random sample independent of size. > > Could you give an example on this? 
x = stats.norm.rvs() stats.norm.fit(x) >> So, it's not clear to me what you really want, or what your sample data >> looks like (do you have only one 15 element sample or lots of them). > > I have many sample points (thousands) that are composed of this 15 elements. > But the whole data don't look much different the sample I used. Most peaks > are around 3rd - 4th channel and decaying as shown in the figure. Do you need to fit a different distribution for each 15-vector? Or are all of these measurements supposed to be merged somehow? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Nov 17 14:40:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 13:40:33 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> Message-ID: <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> On Tue, Nov 17, 2009 at 13:36, G?khan Sever wrote: > On Tue, Nov 17, 2009 at 12:57 PM, Robert Kern wrote: >> So, I would say that it's a bit trickier than fitting the log-normal >> PDF to the data for a couple of reasons. >> >> 1) Directly fitting PDFs to histogram values is usually not a great >> idea to begin with. >> 2) We don't know how much probability mass is in the censored region. > > So we agree that it is easy to implement a log-normal fit than a discrete > one? No, none of the things we have suggested will work well for you. You have a more complicated task ahead of you. I have ideas that might work, but explaining them will take more time than I have. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Tue Nov 17 15:04:20 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Nov 2009 15:04:20 -0500 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> Message-ID: <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> On Tue, Nov 17, 2009 at 2:37 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 13:28, G?khan Sever wrote: >> >> >> On Tue, Nov 17, 2009 at 12:38 PM, wrote: > >>> If conc where just lognormal distributed, then you would not get any >>> relationship between conc and size. 
>>> >>> If you have many observations with conc, size pairs then you could >>> estimate a noisy model >>> conc = f(size) + u ?where the noise u is for example log-normal >>> distributed but you would still need to get an expression for the >>> non-linear function f. >> >> I don't understand why I can't get a relation between sizes and conc values >> if conc is log-normally distributed. Can I elaborate this a bit more? The >> non-linear relationship part is also confusing me. If say to test the linear >> relationship of x and y data pairs we just fit a line, in this case what I >> am looking is to fit a log-normal to get a relation between size and conc. > > It's a language issue. Your concentration values are not log-normally > distributed. Your particle sizes are log-normally distributed (maybe). > The concentration of a range of particle sizes is a measurement that > is related to particle size the distribution, but you would not say > that the measurements themselves are log-normally distributed. Josef > was taking your language at face value. The way I see it, you have to variables, size and counts (or concentration). My initial interpretation was you want to model the relationship between these two variables. When the total number of particles is fixed, then the conditional size distribution is univariate, and could be modeled by a log-normal distribution. (This still leaves the total count unmodelled.) If you have the total particle count per bin, then it should be possible to write down the likelihood function that is discretized to the bins from the continuous distribution. Given a random particle, what's the probability of being in bin 1, bin 2 and so on. Then add the log-likelihood over all particles and maximize as a function of the log-normal parameters. (There might be a numerical trick using fraction instead of conditional count, but I'm not sure what the analogous discrete distribution would be. ) Once the parameters of the log-normal distribution are estimated, the distribution would be defined over all of the real line (where the out of sample pdf is determined by assumption not data). Josef > >>> If you want to fit a curve f that has the same shape as the pdf of >>> the log-normal, then you cannot do it with lognorm.fit, because that >>> just assumes you have a random sample independent of size. >> >> Could you give an example on this? > > x = stats.norm.rvs() > stats.norm.fit(x) > >>> So, it's not clear to me what you really want, or what your sample data >>> looks like (do you have only one 15 element sample or lots of them). >> >> I have many sample points (thousands) that are composed of this 15 elements. >> But the whole data don't look much different the sample I used. Most peaks >> are around 3rd - 4th channel and decaying as shown in the figure. > > Do you need to fit a different distribution for each 15-vector? Or are > all of these measurements supposed to be merged somehow? > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
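[A sketch of the discretized likelihood described above; it is not code from the thread. The bin edges are assumed, the concentrations serve as count weights, the bin probabilities come from differences of the log-normal CDF, and they are renormalized by the mass in the observed range so the censored region below 0.1 um is handled as a truncation.]

import numpy as np
from scipy import stats, optimize

edges = np.linspace(0.1, 3.0, 16)    # assumed bin edges (um), 15 bins
conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])

def neg_loglike(params):
    s, scale = params
    if s <= 0 or scale <= 0:
        return np.inf
    cdf = stats.lognorm.cdf(edges, s, scale=scale)
    p = np.diff(cdf)                 # probability of landing in each observed bin
    p = p / (cdf[-1] - cdf[0])       # condition on the observed 0.1-3.0 um range
    if np.any(p <= 0):
        return np.inf
    # concentrations are proportional to counts, so use them as weights
    return -np.sum(conc * np.log(p))

s_hat, scale_hat = optimize.fmin(neg_loglike, [1.0, 0.5], disp=False)
print(s_hat, scale_hat)

# estimated fraction of particles below the 0.1 um detection limit
print(stats.lognorm.cdf(edges[0], s_hat, scale=scale_hat))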
> ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Tue Nov 17 15:41:47 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 14:41:47 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> Message-ID: <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> On Tue, Nov 17, 2009 at 14:04, wrote: > The way I see it, you have to variables, size and counts (or concentration). > My initial interpretation was you want to model the relationship between > these two variables. > When the total number of particles is fixed, then the conditional size > distribution is univariate, and could be modeled by a log-normal > distribution. (This still leaves the total count unmodelled.) > > If you have the total particle count per bin, then it > should be possible to write down the likelihood function that is > discretized to the bins from the continuous distribution. > Given a random particle, what's the probability of being in bin 1, > bin 2 and so on. Then add the log-likelihood over all particles > and maximize as a function of the log-normal parameters. > (There might be a numerical trick using fraction instead of > conditional count, but I'm not sure what the analogous discrete > distribution would be. ) I usually use the multinomial as the likelihood for such "histogram-fitting" exercises. The two problem points here are that we have real-valued concentrations, not integer-valued counts, and that we don't have a measurement for the censored region. For the former, I would suggest simply multiplying by the concentrations by a factor of 10 (equivalently, changing the units to particles/<10^n larger volume>) such that the resolution of the measurements is 1 particle/. Then just apply the multinomial. It should be a close enough approximation. I'm not entirely sure what to do about the censored probability mass. I think there might be a simple correction factor that you can apply to the multinomial likelihood, but I haven't worked it out. > Once the parameters of the log-normal distribution are > estimated, the distribution would be defined over all of > the real line (where the out of sample pdf is determined > by assumption not data). Since we are extrapolating to the censored region, it would probably be a good idea to estimate the uncertainty of the estimate. I would probably suggest using PyMC to do a Bayesian model. A parametric bootstrap might also serve. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From josef.pktd at gmail.com Tue Nov 17 16:01:56 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Nov 2009 16:01:56 -0500 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> Message-ID: <1cd32cbb0911171301u4691cf87ydd3132ed7f9375d4@mail.gmail.com> On Tue, Nov 17, 2009 at 3:41 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 14:04, ? wrote: > >> The way I see it, you have to variables, size and counts (or concentration). >> My initial interpretation was you want to model the relationship between >> these two variables. >> When the total number of particles is fixed, then the conditional size >> distribution is univariate, and could be modeled by a log-normal >> distribution. (This still leaves the total count unmodelled.) >> >> If you have the total particle count per bin, then it >> should be possible to write down the likelihood function that is >> discretized to the bins from the continuous distribution. >> Given a random particle, what's the probability of being in bin 1, >> bin 2 and so on. Then add the log-likelihood over all particles >> and maximize as a function of the log-normal parameters. >> (There might be a numerical trick using fraction instead of >> conditional count, but I'm not sure what the analogous discrete >> distribution would be. ) > > I usually use the multinomial as the likelihood for such > "histogram-fitting" exercises. The two problem points here are that we > have real-valued concentrations, not integer-valued counts, and that > we don't have a measurement for the censored region. For the former, I > would suggest simply multiplying by the concentrations by a factor of > 10 (equivalently, changing the units to particles/<10^n larger > volume>) such that the resolution of the measurements is 1 > particle/. Then just apply the multinomial. It should be a > close enough approximation. > > I'm not entirely sure what to do about the censored probability mass. > I think there might be a simple correction factor that you can apply > to the multinomial likelihood, but I haven't worked it out. I think, for the continuous distribution it would be just dividing by the probability of the not-censored region (which is also a function of the distribution parameters). This would then just be a truncated log-normal. multinomial might work the same, as long as the probabilities are defined by the discretization. Would you apply the multinomial directly? I don't see in that case how you would recover the parameters of the continuous distribution. Josef > >> Once the parameters of the log-normal distribution are >> estimated, the distribution would be defined over all of >> the real line (where the out of sample pdf is determined >> by assumption not data). > > Since we are extrapolating to the censored region, it would probably > be a good idea to estimate the uncertainty of the estimate. I would > probably suggest using PyMC to do a Bayesian model. A parametric > bootstrap might also serve. 
I would use bootstrap, since I still haven't figured out how to use MCMC. Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From david_baddeley at yahoo.com.au Tue Nov 17 16:11:59 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 17 Nov 2009 13:11:59 -0800 (PST) Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> Message-ID: <484601.32915.qm@web33004.mail.mud.yahoo.com> I guess it depends on how accurately you want to estimate the missing bin, and whether you can get any information about the amount of error in the individual measurements. Just looking at the curve you posted it looks like the variability at low particle sizes is a lot higher than at larger particle sizes. Although you would expect a similar effect due to the Poisson nature of counting, I'd expect it to be smaller. This might suggest that there is additional structure in your size distribution at these sizes, and that the best you can hope for with a log-normal model is a fairly rough approximation. If this is the case, I suspect you might be able to get away with just doing a least-squares fit of a log-normal model function to your measured values, potentially with weights which reflect the estimated error in each bin (obtained either by taking the std. deviation of repeated measurements, or by analysing the noise characteristics of the measurement instrument). Although it's not strictly optimal, and you ought to be aware of the potential hiccups, it's often good enough for the task at hand (I use it routinely to fit 2D Gaussians to point like objects in image data). cheers, David ----- Original Message ---- From: Robert Kern To: SciPy Users List Sent: Wed, 18 November, 2009 9:41:47 AM Subject: Re: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data On Tue, Nov 17, 2009 at 14:04, wrote: > The way I see it, you have to variables, size and counts (or concentration). > My initial interpretation was you want to model the relationship between > these two variables. > When the total number of particles is fixed, then the conditional size > distribution is univariate, and could be modeled by a log-normal > distribution. (This still leaves the total count unmodelled.) > > If you have the total particle count per bin, then it > should be possible to write down the likelihood function that is > discretized to the bins from the continuous distribution. > Given a random particle, what's the probability of being in bin 1, > bin 2 and so on. Then add the log-likelihood over all particles > and maximize as a function of the log-normal parameters. 
> (There might be a numerical trick using fraction instead of > conditional count, but I'm not sure what the analogous discrete > distribution would be. ) I usually use the multinomial as the likelihood for such "histogram-fitting" exercises. The two problem points here are that we have real-valued concentrations, not integer-valued counts, and that we don't have a measurement for the censored region. For the former, I would suggest simply multiplying by the concentrations by a factor of 10 (equivalently, changing the units to particles/<10^n larger volume>) such that the resolution of the measurements is 1 particle/. Then just apply the multinomial. It should be a close enough approximation. I'm not entirely sure what to do about the censored probability mass. I think there might be a simple correction factor that you can apply to the multinomial likelihood, but I haven't worked it out. > Once the parameters of the log-normal distribution are > estimated, the distribution would be defined over all of > the real line (where the out of sample pdf is determined > by assumption not data). Since we are extrapolating to the censored region, it would probably be a good idea to estimate the uncertainty of the estimate. I would probably suggest using PyMC to do a Bayesian model. A parametric bootstrap might also serve. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From robert.kern at gmail.com Tue Nov 17 16:12:17 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 15:12:17 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171301u4691cf87ydd3132ed7f9375d4@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> <1cd32cbb0911171301u4691cf87ydd3132ed7f9375d4@mail.gmail.com> Message-ID: <3d375d730911171312k5def059p16e90d7d6063f08@mail.gmail.com> On Tue, Nov 17, 2009 at 15:01, wrote: > On Tue, Nov 17, 2009 at 3:41 PM, Robert Kern wrote: >> On Tue, Nov 17, 2009 at 14:04, ? wrote: >> >>> The way I see it, you have to variables, size and counts (or concentration). >>> My initial interpretation was you want to model the relationship between >>> these two variables. >>> When the total number of particles is fixed, then the conditional size >>> distribution is univariate, and could be modeled by a log-normal >>> distribution. (This still leaves the total count unmodelled.) >>> >>> If you have the total particle count per bin, then it >>> should be possible to write down the likelihood function that is >>> discretized to the bins from the continuous distribution. >>> Given a random particle, what's the probability of being in bin 1, >>> bin 2 and so on. Then add the log-likelihood over all particles >>> and maximize as a function of the log-normal parameters. 
>>> (There might be a numerical trick using fraction instead of >>> conditional count, but I'm not sure what the analogous discrete >>> distribution would be. ) >> >> I usually use the multinomial as the likelihood for such >> "histogram-fitting" exercises. The two problem points here are that we >> have real-valued concentrations, not integer-valued counts, and that >> we don't have a measurement for the censored region. For the former, I >> would suggest simply multiplying by the concentrations by a factor of >> 10 (equivalently, changing the units to particles/<10^n larger >> volume>) such that the resolution of the measurements is 1 >> particle/. Then just apply the multinomial. It should be a >> close enough approximation. >> >> I'm not entirely sure what to do about the censored probability mass. >> I think there might be a simple correction factor that you can apply >> to the multinomial likelihood, but I haven't worked it out. > > I think, for the continuous distribution it would be just dividing by > the probability of the not-censored region (which is also a function of > the distribution parameters). This would then just be a truncated > log-normal. multinomial might work the same, as long as the > probabilities are defined by the discretization. > > Would you apply the multinomial directly? I don't see in that case > how you would recover the parameters of the continuous distribution. You would just be using the multinomial to build the likelihood. For each iteration in the likelihood maximization, you are given the parameters of the continuous distribution. Given the bin edges and those parameters, you compute the probability mass within each bin for that specific distribution (the difference of the CDF between bin edges). That is the p-vector for the multinomial. The probability of getting the observed counts is the likelihood for the given parameters of the continuous distribution. And now that I think about it, you don't need to apply any correction to the multinomial in the likelihood. The number of counts in the censored region is just another unknown parameter to optimize along with the continuous distribution's parameters. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lorenzo.isella at gmail.com Tue Nov 17 17:03:14 2009 From: lorenzo.isella at gmail.com (Lorenzo Isella) Date: Tue, 17 Nov 2009 23:03:14 +0100 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: References: Message-ID: <4B031DA2.8040400@gmail.com> > Date: Mon, 16 Nov 2009 23:44:17 -0600 > From: G?khan Sever > Subject: [SciPy-User] Fitting a curve on a log-normal distributed data > To: Discussion of Numerical Python , SciPy > Users List > Message-ID: > <49d6b3500911162144x1193e04cj1a103776092c4471 at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hello, > > I have a data which represents aerosol size distribution in between 0.1 to > 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 > nm. The data in this context is log-normally distributed. Therefore I am > looking a way to fit a log-normal curve onto my data. Could you please give > me some pointers to solve this problem? > > Thank you. > > Hello, I have not followed the many replies to this long post in detail, but by chance I happen to know quite in detail what you are talking about (probably SMPS data or similar). 
I normally resort to R for this kind of tasks (http://www.r-project.org/), but nothing prevents you from using Python instead. You just want to compare your empirical data binning with what would be expected from a lognormal distribution. Please have a look at http://tinyurl.com/ygmw4lc and at the functions defined there (A1, mu1 and myvar1 are the overall concentration, the geometric mean and the std of the number-size distribution, respectively). Cheers Lorenzo From gokhansever at gmail.com Tue Nov 17 17:07:17 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:07:17 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> Message-ID: <49d6b3500911171407o19efbbf1t996bd33c698b4e2b@mail.gmail.com> On Tue, Nov 17, 2009 at 1:37 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 13:28, G?khan Sever wrote: > > > > > > On Tue, Nov 17, 2009 at 12:38 PM, wrote: > > >> If conc where just lognormal distributed, then you would not get any > >> relationship between conc and size. > >> > >> If you have many observations with conc, size pairs then you could > >> estimate a noisy model > >> conc = f(size) + u where the noise u is for example log-normal > >> distributed but you would still need to get an expression for the > >> non-linear function f. > > > > I don't understand why I can't get a relation between sizes and conc > values > > if conc is log-normally distributed. Can I elaborate this a bit more? The > > non-linear relationship part is also confusing me. If say to test the > linear > > relationship of x and y data pairs we just fit a line, in this case what > I > > am looking is to fit a log-normal to get a relation between size and > conc. > > It's a language issue. Your concentration values are not log-normally > distributed. Your particle sizes are log-normally distributed (maybe). > The concentration of a range of particle sizes is a measurement that > is related to particle size the distribution, but you would not say > that the measurements themselves are log-normally distributed. Josef > was taking your language at face value. > > >> If you want to fit a curve f that has the same shape as the pdf of > >> the log-normal, then you cannot do it with lognorm.fit, because that > >> just assumes you have a random sample independent of size. > > > > Could you give an example on this? > > x = stats.norm.rvs() > stats.norm.fit(x) > > >> So, it's not clear to me what you really want, or what your sample data > >> looks like (do you have only one 15 element sample or lots of them). > > > > I have many sample points (thousands) that are composed of this 15 > elements. > > But the whole data don't look much different the sample I used. Most > peaks > > are around 3rd - 4th channel and decaying as shown in the figure. > > Do you need to fit a different distribution for each 15-vector? Or are > all of these measurements supposed to be merged somehow? > For my comparison case I will use an hour length of data, which are composed of 3600 sample points. At each minute I will average these points. 
This is because I am comparing data from two different instruments and by averaging I am trying to eliminate intrinsic measurement error. It is really not an easy task to make point by point comparison in my case. So in the end I will have 60 averaged data-points where each point composed of 15-elements in them. Later use the same fitting technique to guess the out-ouf-the-measurement-limits parts. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Tue Nov 17 17:21:27 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:21:27 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> Message-ID: <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> On Tue, Nov 17, 2009 at 1:40 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 13:36, G?khan Sever wrote: > > On Tue, Nov 17, 2009 at 12:57 PM, Robert Kern > wrote: > > >> So, I would say that it's a bit trickier than fitting the log-normal > >> PDF to the data for a couple of reasons. > >> > >> 1) Directly fitting PDFs to histogram values is usually not a great > >> idea to begin with. > >> 2) We don't know how much probability mass is in the censored region. > > > > So we agree that it is easy to implement a log-normal fit than a discrete > > one? > > No, none of the things we have suggested will work well for you. You > have a more complicated task ahead of you. I have ideas that might > work, but explaining them will take more time than I have. > Looking at some recent replies and re-reading them a couple times, I should say the techniques mentioned in them are beyond my technical skills or at least I need a professor to help me or a good statistics book to study further. I should also note that this is just a feasibility study comparing actual observed cloud condensation nuclei concentration measurements to the modelled concentrations using another instrument's size distribution data with the help of a thermodynamic particle activation equation which I will be able to infer an activation size limit. The results that are found in this study will not be placed on a journal, they will just be presented in my cloud physics class presentation. I am trying to assess the sources of errors and testing the usability of the size distributions from that particular instrument in this comparison study. Extending the size distribution beyond and below the instruments measurement limit is one of the biggest source of errors to represent the reality, but of course there other simplifications and assumptions that add uncertainties. 
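For reference, a minimal sketch of the plain least-squares fit of a log-normal
shape to binned concentrations that was suggested earlier in the thread. The
bin centres and concentrations below are made-up placeholders (not real PCASP
channels), and the (A, dg, sg) parametrisation is just one reasonable choice:

import numpy as np
from scipy import optimize, stats

# made-up bin centres (um) and concentrations -- replace with real channels
d = np.array([0.11, 0.14, 0.18, 0.23, 0.29, 0.37, 0.47, 0.60,
              0.76, 0.97, 1.23, 1.56, 1.98, 2.52, 3.0])
conc = np.array([400., 520., 440., 300., 180., 100., 55., 30.,
                 16., 9., 5., 3., 2., 1., 1.])

def model(p, d):
    # amplitude A, geometric mean dg (um), geometric std sg (dimensionless)
    A, dg, sg = p
    return A * stats.lognorm.pdf(d, np.log(sg), scale=dg)

def residuals(p, d, conc):
    return conc - model(p, d)

p0 = [conc.sum(), 0.2, 1.8]                     # rough starting guesses
p_fit, ier = optimize.leastsq(residuals, p0, args=(d, conc))

# extrapolate the fitted curve below the instrument range, e.g. to 0.01 um
d_ext = np.logspace(np.log10(0.01), np.log10(3.0), 200)
conc_ext = model(p_fit, d_ext)

The weighted variant mentioned by David would just divide the residuals by
per-bin uncertainty estimates, and the multinomial-likelihood approach
discussed above would replace `residuals` by a negative log-likelihood handed
to optimize.fmin.
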
Besides, what is wrong with using the spline interpolation technique? It fits nicely on my sample data. See the resulting image here: http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png (Green line represents the fit spline) > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 17:27:06 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 16:27:06 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> Message-ID: <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> On Tue, Nov 17, 2009 at 16:21, G?khan Sever wrote: > Besides, what is wrong with using the spline interpolation technique? It > fits nicely on my sample data. See the resulting image here: > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png??? (Green line > represents the fit spline) What spline interpolation technique? That certainly doesn't look like a good spline fit. In any case, splines may be fine for *interpolation*, but you need *extrapolation*, and splines are useless for that. You need a physically-motivated model like the distributions recommended by your textbook. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Tue Nov 17 17:30:56 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:30:56 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: <4B031DA2.8040400@gmail.com> References: <4B031DA2.8040400@gmail.com> Message-ID: <49d6b3500911171430i1f85ca94y8a56300ec1952607@mail.gmail.com> On Tue, Nov 17, 2009 at 4:03 PM, Lorenzo Isella wrote: > > Date: Mon, 16 Nov 2009 23:44:17 -0600 >> From: G?khan Sever >> Subject: [SciPy-User] Fitting a curve on a log-normal distributed data >> To: Discussion of Numerical Python , >> SciPy >> >> Users List >> Message-ID: >> <49d6b3500911162144x1193e04cj1a103776092c4471 at mail.gmail.com> >> Content-Type: text/plain; charset="utf-8" >> >> >> Hello, >> >> I have a data which represents aerosol size distribution in between 0.1 to >> 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 >> nm. The data in this context is log-normally distributed. Therefore I am >> looking a way to fit a log-normal curve onto my data. Could you please >> give >> me some pointers to solve this problem? >> >> Thank you. 
>> >> >> > Hello, > I have not followed the many replies to this long post in detail, but by > chance I happen to know quite in detail what you are talking about (probably > SMPS data or similar). > I normally resort to R for this kind of tasks (http://www.r-project.org/), > but nothing prevents you from using Python instead. You just want to compare > your empirical data binning with what would be expected from a lognormal > distribution. Please have a look at > http://tinyurl.com/ygmw4lc > and at the functions defined there (A1, mu1 and myvar1 are the overall > concentration, the geometric mean and the std of the number-size > distribution, respectively). > Cheers > Hey Lorenzo, Finally someone who knows the heart of the subject :) Thanks for stopping by. The data that I am using is Passive Cavity Aerosol Spectrometer (PCASP) measured size-distributions. Unfortunately even if we had the mains part of the SMPS instrument we couldn't fly it since the radioactive element was not reached during campaign. It is always an issue to deliver the radioactive parts out of the US :) Anyways assuming that the relative humidity was quite low in the measurement region I am not expecting a huge deviation from the dry-particle size definition. But as I said above this is just a feasibility study. I will test and see how much an error I will get with this method. Besides there is no information regarding to the chemical composition of the aerosols, therefore I am basing on kappa-kohler theory and making another simplification at that point. Could you please send this script off-list and the data file associated with it? Would be greatly appreciated. > > Lorenzo > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Tue Nov 17 17:42:44 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:42:44 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> Message-ID: <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 16:21, G?khan Sever wrote: > > > Besides, what is wrong with using the spline interpolation technique? It > > fits nicely on my sample data. See the resulting image here: > > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png (Green > line > > represents the fit spline) > > What spline interpolation technique? >From here http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html Spline interpolation in 1-d (interpolate.splXXX) That certainly doesn't look like > a good spline fit. True, because I used only 30 points. It looks much smoother with alot more point as you might expected. > In any case, splines may be fine for > *interpolation*, but you need *extrapolation*, and splines are useless > for that. 
> You need a physically-motivated model like the distributions > recommended by your textbook. > > Using spline-interp is a test case to see how good it will do on my data. I will use log-normal way as was in the original intention. Let me check with someone else in the department to get some feedback on this before I completely get lost in the matter. One quick question: "extrapolation" means to estimate a data both "beyond" and "below" the given limits, right? (For my example to guess less than 0.1um should I say downward-extrapolation and above 3.0 um upward-extrapolation or just extrapolation is enough?) > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 17:58:43 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 16:58:43 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> Message-ID: <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> On Tue, Nov 17, 2009 at 16:42, G?khan Sever wrote: > > On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern wrote: >> >> On Tue, Nov 17, 2009 at 16:21, G?khan Sever wrote: >> >> > Besides, what is wrong with using the spline interpolation technique? It >> > fits nicely on my sample data. See the resulting image here: >> > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png??? (Green >> > line >> > represents the fit spline) >> >> What spline interpolation technique? > > From here > http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > > Spline interpolation in 1-d (interpolate.splXXX) > >> That certainly doesn't look like >> a good spline fit. > > True, because I used only 30 points. It looks much smoother with alot more > point as you might expected. Don't judge it based on its smoothness at many points. The smooth appearance is simply a function of the number of points you choose to sample it with, not how well it fits the data. Even if you weren't dealing with an extrapolation problem, you shouldn't use spline interpolation* on noisy data. You would do something like least-squares fitting to a low-order spline. The spline should not go through the observed data points exactly. * And this brings up another terminological issue. I may have used the term "interpolation" in a couple of different ways. There is a general sense in which "interpolate" means "to make predictions about certain inputs (e.g. 
the concentration [prediction] for the given particle size [input]) within the range of observed inputs". Whereas, "interpolate" can also mean something much more specific: finding a curve that exactly goes through the given observations. "Spline interpolation" would be a form of the latter, and is not related to what you need. >> In any case, splines may be fine for >> *interpolation*, but you need *extrapolation*, and splines are useless >> for that. >> >> You need a physically-motivated model like the distributions >> recommended by your textbook. > > Using spline-interp is a test case to see how good it will do on my data. Good. I just wanted to make sure that you knew what was wrong with using splines in this case. :-) > I > will use log-normal way as was in the original intention. Let me check with > someone else in the department to get some feedback on this before I > completely get lost in the matter. Always wise. :-) > One quick question: "extrapolation" means to estimate a data both "beyond" > and "below" the given limits, right? (For my example to guess less than > 0.1um should I say downward-extrapolation and above 3.0 um > upward-extrapolation or just extrapolation is enough?) Just "extrapolation" can describe either case, yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Tue Nov 17 18:52:34 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 17:52:34 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> Message-ID: <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> On Tue, Nov 17, 2009 at 4:58 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 16:42, G?khan Sever wrote: > > > > On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern > wrote: > >> > >> On Tue, Nov 17, 2009 at 16:21, G?khan Sever > wrote: > >> > >> > Besides, what is wrong with using the spline interpolation technique? > It > >> > fits nicely on my sample data. See the resulting image here: > >> > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png > (Green > >> > line > >> > represents the fit spline) > >> > >> What spline interpolation technique? > > > > From here > > http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > > > > Spline interpolation in 1-d (interpolate.splXXX) > > > >> That certainly doesn't look like > >> a good spline fit. > > > > True, because I used only 30 points. It looks much smoother with alot > more > > point as you might expected. > > Don't judge it based on its smoothness at many points. 
The smooth > appearance is simply a function of the number of points you choose to > sample it with, not how well it fits the data. > > Even if you weren't dealing with an extrapolation problem, you > shouldn't use spline interpolation* on noisy data. You would do > something like least-squares fitting to a low-order spline. The spline > should not go through the observed data points exactly. > > * And this brings up another terminological issue. I may have used the > term "interpolation" in a couple of different ways. There is a general > sense in which "interpolate" means "to make predictions about certain > inputs (e.g. the concentration [prediction] for the given particle > size [input]) within the range of observed inputs". Whereas, > "interpolate" can also mean something much more specific: finding a > curve that exactly goes through the given observations. "Spline > interpolation" would be a form of the latter, and is not related to > what you need. > > >> In any case, splines may be fine for > >> *interpolation*, but you need *extrapolation*, and splines are useless > >> for that. > >> > >> You need a physically-motivated model like the distributions > >> recommended by your textbook. > > > > Using spline-interp is a test case to see how good it will do on my data. > > Good. I just wanted to make sure that you knew what was wrong with > using splines in this case. :-) > > > I > > will use log-normal way as was in the original intention. Let me check > with > > someone else in the department to get some feedback on this before I > > completely get lost in the matter. > > Always wise. :-) > Talking to another guy creating second modal (probably a normal distributed way) might be the other approach to take in addition to log-normally extrapolating the data. In any case, I should be able to parametrize the fits since I will do integration once I am done with the extrapolation part. I asked this in one of my early replies just repeating what is the way to get log-normal sample using scipy.stats? I will use it for a demonstrative case. For some reason, this never looks an expected log-normal sample to me: stats.lognorm.rvs(1,size=15) What am I missing here? > > > One quick question: "extrapolation" means to estimate a data both > "beyond" > > and "below" the given limits, right? (For my example to guess less than > > 0.1um should I say downward-extrapolation and above 3.0 um > > upward-extrapolation or just extrapolation is enough?) > > Just "extrapolation" can describe either case, yes. > > Thanks for your time and explanations Robert. I really appreciate your help. Probably I will include you in the acknowledgements part of my presentation. > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Tue Nov 17 19:00:40 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 18:00:40 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> Message-ID: <3d375d730911171600i5dbc9dd1mcf6d4dc5ccc9c568@mail.gmail.com> On Tue, Nov 17, 2009 at 17:52, G?khan Sever wrote: > I asked this in one of my early replies just repeating what is the way to > get log-normal sample using scipy.stats? I will use it for a demonstrative > case. > For some reason, this never looks an expected log-normal sample to me: > > stats.lognorm.rvs(1,size=15) > > What am I missing here? Are you expecting that to look like your 15-vector concentration data? That's not what you should expect. Instead, x = stats.lognorm.rvs(1, size=10000) h = np.histogram(x, bins=15) Now, the *histogram* of the samples should look like roughly like the shapes that you are expecting. .rvs() produces the samples themselves. I.e. pretend like each element is the size of an individual particle, not the concentration of a size class of particles. Taking the histogram "simulates" what your instrument does: it finds the amont of particles in each size class. > Thanks for your time and explanations Robert. I really appreciate your help. > Probably I will include you in the acknowledgements part of my presentation. Entirely unnecessary, of course. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From gokhansever at gmail.com Tue Nov 17 19:36:42 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 18:36:42 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171600i5dbc9dd1mcf6d4dc5ccc9c568@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> <3d375d730911171600i5dbc9dd1mcf6d4dc5ccc9c568@mail.gmail.com> Message-ID: <49d6b3500911171636n2fbb5bddt58bb21a0057742a6@mail.gmail.com> On Tue, Nov 17, 2009 at 6:00 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 17:52, G?khan Sever wrote: > > > I asked this in one of my early replies just repeating what is the way to > > get log-normal sample using scipy.stats? I will use it for a > demonstrative > > case. > > For some reason, this never looks an expected log-normal sample to me: > > > > stats.lognorm.rvs(1,size=15) > > > > What am I missing here? > > Are you expecting that to look like your 15-vector concentration data? > That's not what you should expect. Instead, > > x = stats.lognorm.rvs(1, size=10000) > h = np.histogram(x, bins=15) > > Now, the *histogram* of the samples should look like roughly like the > shapes that you are expecting. .rvs() produces the samples themselves. > I.e. pretend like each element is the size of an individual particle, > not the concentration of a size class of particles. Taking the > histogram "simulates" what your instrument does: it finds the amont of > particles in each size class. > Now, I see it better. Makes much more sense now. > > > Thanks for your time and explanations Robert. I really appreciate your > help. > > Probably I will include you in the acknowledgements part of my > presentation. > > Entirely unnecessary, of course. > Not at all ;) > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dpfrota at yahoo.com.br Wed Nov 18 01:29:02 2009 From: dpfrota at yahoo.com.br (dpfrota) Date: Tue, 17 Nov 2009 22:29:02 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 In-Reply-To: <4AE5DEDF.7070701@asu.edu> References: <4AE5DEDF.7070701@asu.edu> Message-ID: <26402986.post@talk.nabble.com> What is the meaning of these adresses? I opened these files, and they has some strange lines. The first file has only " __import__('pkg_resources').declare_namespace(__name__) ". Is module PKG necessary? And the second has a comment line that looks a code line. I am looking forward to run Audiolab... Thanks for helping, and forgive my (probably) mistakes! 
Christopher Brown wrote: > > Hi List, > > Has anyone gotten scikits.audiolab working with python 2.6? Here is the > error I get on a clean Python 2.6 install with numpy and audiolab > installed (using the audiolab 0.10.2 installer for py2.6 I downloaded > from pypi, and a clean Win XPSP3 install): > > >>> from scikits import audiolab > Traceback (most recent call last): > File "C:\Python26\lib\site-packages\scikits\audiolab\__init__.py", > line 25, in > from pysndfile import formatinfo, sndfile > File > "C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py", > line 1, in > from _sndfile import Sndfile, Format, available_file_formats, > available_encodings > ImportError: DLL load failed: The specified procedure could not be found. > > Any ideas? Everything works fine on py2.5. > > -- > Chris > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Audiolab-on-Py2.6-tp26064218p26402986.html Sent from the Scipy-User mailing list archive at Nabble.com. From robert.kern at gmail.com Wed Nov 18 01:31:57 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Nov 2009 00:31:57 -0600 Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 In-Reply-To: <26402986.post@talk.nabble.com> References: <4AE5DEDF.7070701@asu.edu> <26402986.post@talk.nabble.com> Message-ID: <3d375d730911172231i4cf42760l80038a00f84fa7c8@mail.gmail.com> On Wed, Nov 18, 2009 at 00:29, dpfrota wrote: > > What is the meaning of these adresses? > I opened these files, and they has some strange lines. The first file has > only " __import__('pkg_resources').declare_namespace(__name__) ". Is module > PKG necessary? These enable the scikits namespace such that you can have multiple scikits packages installed (possibly to separate locations). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jr at sun.ac.za Wed Nov 18 05:53:06 2009 From: jr at sun.ac.za (Johann Rohwer) Date: Wed, 18 Nov 2009 12:53:06 +0200 Subject: [SciPy-User] SciPy build error Message-ID: <200911181253.06501.jr@sun.ac.za> I get the following build error for scipy (both numpy and scipy fresh from SVN today): building 'scipy.special.lambertw' extension compiling C sources C compiler: gcc -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wformat - Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack- protector --param=ssp-buffer-size=4 -g -fPIC compile options: '-I/usr/lib64/python2.6/site- packages/numpy/core/include -I/usr/lib64/python2.6/site- packages/numpy/core/include -I/usr/include/python2.6 -c' gcc: scipy/special/lambertw.c scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zlog?: scipy/special/lambertw.c:562: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zexp?: scipy/special/lambertw.c:599: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zlog?: scipy/special/lambertw.c:562: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? 
scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zexp?: scipy/special/lambertw.c:599: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? error: Command "gcc -fno-strict-aliasing -DNDEBUG -O2 -g -pipe - Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - fstack-protector --param=ssp-buffer-size=4 -g -fPIC - I/usr/lib64/python2.6/site-packages/numpy/core/include - I/usr/lib64/python2.6/site-packages/numpy/core/include - I/usr/include/python2.6 -c scipy/special/lambertw.c -o build/temp.linux-x86_64-2.6/scipy/special/lambertw.o" failed with exit status 1 System: Linux x86_64 gcc version 4.4.1 Self compiled ATLAS 3.8.0 and LAPACK 3.1.1 (Numpy installs fine and passes all tests.) Any ideas? J. From gael.varoquaux at normalesup.org Wed Nov 18 08:46:13 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 14:46:13 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices Message-ID: <20091118134613.GB17382@phare.normalesup.org> Hi there, I would like to list the connect components of a graph (or a sparse matrix, same thing). I know of course of the bread-first traversal, as implemented eg in networkX, to find the connect components. However, I have a feeling that sparse linear algebra must be performing such searches, to decompose sparse matrices in blocks. I'd love to piggy back on such implementations, rather than code and maintain a C or cython version of breadth-first graph traversal. Any idea how I could squeeze the information out of the sparse linear algebra that we carry around with scipy? I thought about using arpack to get the largest eigen vectors of the transition matrix, but that was a stupid idea, as (AFAIK) it will not partition my graph in connect components, but only tell me how many connect components I have. Ga?l From zachary.pincus at yale.edu Wed Nov 18 09:03:18 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 09:03:18 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118134613.GB17382@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> Message-ID: <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> Hi Ga?l, > Any idea how I could squeeze the information out of the sparse linear > algebra that we carry around with scipy? I thought about using > arpack to > get the largest eigen vectors of the transition matrix, but that was a > stupid idea, as (AFAIK) it will not partition my graph in connect > components, but only tell me how many connect components I have. From this useful tutorial on spectral clustering: http://www.kyb.tuebingen.mpg.de/bs/people/ule/publications/publication_downloads/Luxburg07_tutorial.pdf > Thus, the matrix L has as many eigenvalues 0 as there are connected > components, and > the corresponding eigenvectors are the indicator vectors of the > connected components. (where L is the graph laplacian). Zach On Nov 18, 2009, at 8:46 AM, Gael Varoquaux wrote: > Hi there, > > I would like to list the connect components of a graph (or a sparse > matrix, same thing). I know of course of the bread-first traversal, as > implemented eg in networkX, to find the connect components. However, I > have a feeling that sparse linear algebra must be performing such > searches, to decompose sparse matrices in blocks. I'd love to piggy > back > on such implementations, rather than code and maintain a C or cython > version of breadth-first graph traversal. 
> > Any idea how I could squeeze the information out of the sparse linear > algebra that we carry around with scipy? I thought about using > arpack to > get the largest eigen vectors of the transition matrix, but that was a > stupid idea, as (AFAIK) it will not partition my graph in connect > components, but only tell me how many connect components I have. > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cimrman3 at ntc.zcu.cz Wed Nov 18 09:03:43 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 18 Nov 2009 15:03:43 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118134613.GB17382@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> Message-ID: <4B03FEBF.3080308@ntc.zcu.cz> Hi Gael, Gael Varoquaux wrote: > Hi there, > > I would like to list the connect components of a graph (or a sparse > matrix, same thing). I know of course of the bread-first traversal, as > implemented eg in networkX, to find the connect components. However, I > have a feeling that sparse linear algebra must be performing such > searches, to decompose sparse matrices in blocks. I'd love to piggy back > on such implementations, rather than code and maintain a C or cython > version of breadth-first graph traversal. I have a function in C (as a part of sfepy), that does that. But as it might be useful for more people, what about putting it scipy sparsetools? > Any idea how I could squeeze the information out of the sparse linear > algebra that we carry around with scipy? I thought about using arpack to > get the largest eigen vectors of the transition matrix, but that was a > stupid idea, as (AFAIK) it will not partition my graph in connect > components, but only tell me how many connect components I have. Getting eigenvectors is imho more costly that the graph search, no? r. From gael.varoquaux at normalesup.org Wed Nov 18 09:12:59 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 15:12:59 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> Message-ID: <20091118141259.GA3477@phare.normalesup.org> On Wed, Nov 18, 2009 at 09:03:18AM -0500, Zachary Pincus wrote: > From this useful tutorial on spectral clustering: > http://www.kyb.tuebingen.mpg.de/bs/people/ule/publications/publication_downloads/Luxburg07_tutorial.pdf > > Thus, the matrix L has as many eigenvalues 0 as there are connected > > components, and the corresponding eigenvectors are the indicator > > vectors of the connected components. > (where L is the graph laplacian). I read this tutorial (a very good one, by the way). But I am too dumb to figure out from the above assertion how to retrieve the connect components. Let me explain my problem on an example. Suppose that we have the trivial graph. Its adjacency matrix is the identity, the corresponding laplacian is null. An EVD of these matrices will result in an abitrary orthonormal basis of my vertex space. How do I figure out the connect components from that? The problem arises also on non trivial graphs, by the way. The problem is that doing an EVD of the transition of laplace matrix only gives a subspace of the kernel of the laplace matrix. 
I could probably do a sparse matrix factorization on that, but I see complexity and cost coming in, and I am trying to avoid that. Thanks for your answer, Ga?l From gael.varoquaux at normalesup.org Wed Nov 18 09:16:38 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 15:16:38 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B03FEBF.3080308@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> Message-ID: <20091118141638.GB3477@phare.normalesup.org> On Wed, Nov 18, 2009 at 03:03:43PM +0100, Robert Cimrman wrote: > Hi Gael, > Gael Varoquaux wrote: > > Hi there, > > I would like to list the connect components of a graph (or a sparse > > matrix, same thing). I know of course of the bread-first traversal, as > > implemented eg in networkX, to find the connect components. However, I > > have a feeling that sparse linear algebra must be performing such > > searches, to decompose sparse matrices in blocks. I'd love to piggy back > > on such implementations, rather than code and maintain a C or cython > > version of breadth-first graph traversal. > I have a function in C (as a part of sfepy), that does that. But as it > might be useful for more people, what about putting it scipy > sparsetools? I think it would be very useful. I would actually include it in the scipy.sparse namespace too. > > Any idea how I could squeeze the information out of the sparse linear > > algebra that we carry around with scipy? I thought about using arpack to > > get the largest eigen vectors of the transition matrix, but that was a > > stupid idea, as (AFAIK) it will not partition my graph in connect > > components, but only tell me how many connect components I have. > Getting eigenvectors is imho more costly that the graph search, no? Well, getting the largest eigenvector of the transition matrix is in o(n), using arpack, AFAIK. So the cost is similar, and on one side we have optimized C code, and on the other side I only had Python code (or C code that I don't want to maintain). In addition, as I am doing diffusion maps, I needed to call arpack anyhow. Cheers, Ga?l From cimrman3 at ntc.zcu.cz Wed Nov 18 09:24:37 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 18 Nov 2009 15:24:37 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118141638.GB3477@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> Message-ID: <4B0403A5.60903@ntc.zcu.cz> Gael Varoquaux wrote: > On Wed, Nov 18, 2009 at 03:03:43PM +0100, Robert Cimrman wrote: >> Hi Gael, > >> Gael Varoquaux wrote: >>> Hi there, > >>> I would like to list the connect components of a graph (or a sparse >>> matrix, same thing). I know of course of the bread-first traversal, as >>> implemented eg in networkX, to find the connect components. However, I >>> have a feeling that sparse linear algebra must be performing such >>> searches, to decompose sparse matrices in blocks. I'd love to piggy back >>> on such implementations, rather than code and maintain a C or cython >>> version of breadth-first graph traversal. > >> I have a function in C (as a part of sfepy), that does that. But as it >> might be useful for more people, what about putting it scipy >> sparsetools? > > I think it would be very useful. I would actually include it in the > scipy.sparse namespace too. 
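For reference, the traversal in question can be sketched in a few lines of
pure Python on top of scipy.sparse. This is only an illustrative sketch,
assuming a symmetric adjacency matrix, not the sfepy C routine:

import numpy as np
from scipy import sparse

def connected_components(A):
    """Label the connected components of the graph whose (symmetric)
    adjacency matrix is the sparse matrix A; returns one label per node."""
    A = sparse.csr_matrix(A)
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in xrange(n):
        if labels[seed] >= 0:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:                    # simple stack-based flood fill
            node = stack.pop()
            # neighbours read directly from the CSR structure
            for m in A.indices[A.indptr[node]:A.indptr[node + 1]]:
                if labels[m] < 0:
                    labels[m] = current
                    stack.append(m)
        current += 1
    return labels

# toy example with two components: {0, 1, 2} and {3, 4}
rows, cols = [0, 1, 3], [1, 2, 4]
A = sparse.coo_matrix((np.ones(3), (rows, cols)), shape=(5, 5))
print connected_components(A + A.T)     # -> [0 0 0 1 1]

The cost is O(nodes + edges), so for pure labelling it should be cheaper than
going through ARPACK for eigenvectors.
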
OK, I will give it a shot (soon), unless someone jumps in with a better solution. >>> Any idea how I could squeeze the information out of the sparse linear >>> algebra that we carry around with scipy? I thought about using arpack to >>> get the largest eigen vectors of the transition matrix, but that was a >>> stupid idea, as (AFAIK) it will not partition my graph in connect >>> components, but only tell me how many connect components I have. > >> Getting eigenvectors is imho more costly that the graph search, no? > > Well, getting the largest eigenvector of the transition matrix is in > o(n), using arpack, AFAIK. So the cost is similar, and on one side we > have optimized C code, and on the other side I only had Python code (or C > code that I don't want to maintain). In addition, as I am doing diffusion > maps, I needed to call arpack anyhow. I see. BTW. putting a code into scipy somewhat alleviates the maintenance burden ;) cheers, r. From cclarke at chrisdev.com Wed Nov 18 09:04:40 2009 From: cclarke at chrisdev.com (Chris Clarke) Date: Wed, 18 Nov 2009 10:04:40 -0400 Subject: [SciPy-User] timeseries forwardfill Message-ID: Hi I haven't used this library in a while but i seem to recall you could forward fill 2d arrays and set initial starting values etc.?? Am i correct ?? If so, any special reasons were they removed?? Regards Chris From gael.varoquaux at normalesup.org Wed Nov 18 09:31:19 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 15:31:19 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B0403A5.60903@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> Message-ID: <20091118143119.GC3477@phare.normalesup.org> On Wed, Nov 18, 2009 at 03:24:37PM +0100, Robert Cimrman wrote: > > Well, getting the largest eigenvector of the transition matrix is in > > o(n), using arpack, AFAIK. So the cost is similar, and on one side we > > have optimized C code, and on the other side I only had Python code (or C > > code that I don't want to maintain). In addition, as I am doing diffusion > > maps, I needed to call arpack anyhow. > I see. BTW. putting a code into scipy somewhat alleviates the > maintenance burden ;) I'd love to, but the code I am talking about is not something you want to see. I inherited it from the lab, and its been a horrible burden. Not that there are not good part in it (there are a lot of excellent alogrithms), but the problem is that it uses home grown vector abstractions, and graph structures, which makes it really hard to split out the good part. In the long run, I hope I will be able to trim out the bad parts and the vector library, and replace this by scipy components, and the work that David Cournapeau has been doing to expose numpy internals to C libraries. Once this is doing, we can think of moving things out to other libraries: scipy, networkx, or the machine learning scikit (we have an engineer hired to work on that, beginning in January). 
Ga?l From zachary.pincus at yale.edu Wed Nov 18 09:36:26 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 09:36:26 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118141259.GA3477@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> Message-ID: > I read this tutorial (a very good one, by the way). But I am too > dumb to > figure out from the above assertion how to retrieve the connect > components. Let me explain my problem on an example. Suppose that we > have > the trivial graph. Its adjacency matrix is the identity, the > corresponding laplacian is null. An EVD of these matrices will > result in > an abitrary orthonormal basis of my vertex space. How do I figure > out the > connect components from that? Good question! On the other hand, IIRC the eigenvectors of the zero matrix are usually defined to be the unit basis vectors, so that solves this particular edge case. Note that at least the non-sparse routines in numpy do this: numpy.linalg.eig(numpy.zeros((5,5))) (array([ 0., 0., 0., 0., 0.]), array([[ 1., 0., 0., 0., 0.], [ 0., 1., 0., 0., 0.], [ 0., 0., 1., 0., 0.], [ 0., 0., 0., 1., 0.], [ 0., 0., 0., 0., 1.]])) So there we have exactly what you want in the trivial case. > The problem arises also on non trivial graphs, by the way. The > problem is > that doing an EVD of the transition of laplace matrix only gives a > subspace of the kernel of the laplace matrix. I could probably do a > sparse matrix factorization on that, but I see complexity and cost > coming > in, and I am trying to avoid that. My linear algebra is only tenuous at best, so I don't exactly see why this is a problem. As far as I understand, to find the connected components, first you find the eigenvectors of the laplacian that have an eigenvalue of zero. Then for each node in the graph i, there will be exactly one eigenvector with a non-zero value at position i. The index of this eigenvector is the index of the connected component that i belongs to. Is that right? Again, my linear algebra is rusty. Zach From cimrman3 at ntc.zcu.cz Wed Nov 18 09:44:16 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 18 Nov 2009 15:44:16 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118143119.GC3477@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <20091118143119.GC3477@phare.normalesup.org> Message-ID: <4B040840.2050606@ntc.zcu.cz> Gael Varoquaux wrote: > On Wed, Nov 18, 2009 at 03:24:37PM +0100, Robert Cimrman wrote: >>> Well, getting the largest eigenvector of the transition matrix is in >>> o(n), using arpack, AFAIK. So the cost is similar, and on one side we >>> have optimized C code, and on the other side I only had Python code (or C >>> code that I don't want to maintain). In addition, as I am doing diffusion >>> maps, I needed to call arpack anyhow. > >> I see. BTW. putting a code into scipy somewhat alleviates the >> maintenance burden ;) > > I'd love to, but the code I am talking about is not something you want to > see. I inherited it from the lab, and its been a horrible burden. 
Not > that there are not good part in it (there are a lot of excellent > alogrithms), but the problem is that it uses home grown vector > abstractions, and graph structures, which makes it really hard to split > out the good part. In the long run, I hope I will be able to trim out the > bad parts and the vector library, and replace this by scipy components, > and the work that David Cournapeau has been doing to expose numpy > internals to C libraries. Once this is doing, we can think of moving > things out to other libraries: scipy, networkx, or the machine learning > scikit (we have an engineer hired to work on that, beginning in January). Now this is an interesting shift in attitude that I experienced myself - instead of putting all the cool stuff into own code, distribute it over well-known and maintained packages ;) cheers, r. From zachary.pincus at yale.edu Wed Nov 18 09:55:17 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 09:55:17 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> Message-ID: <79E5D595-F6DF-4E94-872B-68059F5E8001@yale.edu> >> The problem arises also on non trivial graphs, by the way. The >> problem is >> that doing an EVD of the transition of laplace matrix only gives a >> subspace of the kernel of the laplace matrix. I could probably do a >> sparse matrix factorization on that, but I see complexity and cost >> coming >> in, and I am trying to avoid that. > > My linear algebra is only tenuous at best, so I don't exactly see why > this is a problem. As far as I understand, to find the connected > components, first you find the eigenvectors of the laplacian that have > an eigenvalue of zero. Then for each node in the graph i, there will > be exactly one eigenvector with a non-zero value at position i. The > index of this eigenvector is the index of the connected component that > i belongs to. Wait... you're saying that the eigenvectors will only span a subspace of the kernel, so that there must be at least some position i where there is a zero value in each eigenvector? If this is correct then I see the problem; hopefully someone who actually knows what they're talking about can help me out here... From pgmdevlist at gmail.com Wed Nov 18 10:50:31 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 18 Nov 2009 10:50:31 -0500 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: References: Message-ID: On Nov 18, 2009, at 9:04 AM, Chris Clarke wrote: > Hi > I haven't used this library in a while but i seem to recall you could > forward fill 2d arrays and set initial starting values etc.?? > Am i correct ?? If so, any special reasons were they removed?? > Regards > Chris What do you mean, removed ? You can find `forward_fill` in scikits.timeseries.lib.interpolate. Am I answering your question ? P. From seb.haase at gmail.com Wed Nov 18 11:00:19 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 18 Nov 2009 17:00:19 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 Message-ID: Hi, Does anyone have a function that calculates delta-angles taking the wrap-around at 180 degrees into account ? 
I'm thinking of a function like: >>> diffAngle(190, -10) 160 My current version looks like this: def diffAngle(a1,a0): """ return a1-a0 handle wrap-around for -180 and +180 """ d = a1-a0 if d < -180: d=360+d if d> 180: d=360-d return d diffAngle=np.vectorize(diffAngle) But I'm not sure if this is handling all cases correctly ;-( Especially I have problems regarding the correct sign - in cases like this: diffAngle(20, -170) where I was expecting -170 , but I get 170. Thanks, Sebastian Haase From gael.varoquaux at normalesup.org Wed Nov 18 11:10:04 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 17:10:04 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> Message-ID: <20091118161004.GE17382@phare.normalesup.org> On Wed, Nov 18, 2009 at 09:36:26AM -0500, Zachary Pincus wrote: > Note that at least the non-sparse routines in numpy do this: > numpy.linalg.eig(numpy.zeros((5,5))) > (array([ 0., 0., 0., 0., 0.]), > array([[ 1., 0., 0., 0., 0.], > [ 0., 1., 0., 0., 0.], > [ 0., 0., 1., 0., 0.], > [ 0., 0., 0., 1., 0.], > [ 0., 0., 0., 0., 1.]])) Correct, and they do work OK on real-word graphs, but arpack doesn't, and its easy to see why (more on that below). > > The problem arises also on non trivial graphs, by the way. The > > problem is that doing an EVD of the transition of laplace matrix only > > gives a subspace of the kernel of the laplace matrix. I could > > probably do a sparse matrix factorization on that, but I see > > complexity and cost coming in, and I am trying to avoid that. > My linear algebra is only tenuous at best, so I don't exactly see why > this is a problem. As far as I understand, to find the connected > components, first you find the eigenvectors of the laplacian that have > an eigenvalue of zero. Then for each node in the graph i, there will > be exactly one eigenvector with a non-zero value at position i. The > index of this eigenvector is the index of the connected component that > i belongs to. > Is that right? Again, my linear algebra is rusty. Well, the problem is that if you have several eigen values that have the same value (0 for the laplacian, or 1 for the transition matrix), there is an infinity of eigen vectors defined: any combination of eigen vector corresponding to that eigen value is an eigen vector. What I am looking for is a set of particular eigen vectors. I suspect that the property that defines seem is sparsity (in a machine learning sens, rather than a sparse linear algebra sens): many of their coefficients are 0. There machine learning algorithms to find a sparse basis from a non-sparse one, but first of all it starts getting too complex for my liking, second I am unsure that the sparsity is really the exact property that will give me the connect components of my graph. Ga?l From guyer at nist.gov Wed Nov 18 11:19:59 2009 From: guyer at nist.gov (Jonathan Guyer) Date: Wed, 18 Nov 2009 11:19:59 -0500 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> Message-ID: On Nov 18, 2009, at 11:14 AM, I wrote: > return np.fmod(d + 540, 360) - 180 Actually, I think you can just write (d + 540) % 360 - 180 I think we used fmod because of some automatic weave inlining we do that didn't play nice with '%'. 
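As a side note, the modulo form vectorizes directly in NumPy (the % operator is elementwise), so the np.vectorize wrapper is not needed. A small sketch of that formula, plus a variant (an addition here, not from the thread) that returns +180 rather than -180 at the wrap-around point:

import numpy as np

def diff_angle(a1, a0):
    """Signed difference a1 - a0, wrapped into [-180, 180)."""
    return (np.asarray(a1) - np.asarray(a0) + 180.0) % 360.0 - 180.0

def diff_angle_closed(a1, a0):
    """Same difference, wrapped into (-180, 180] so the boundary case
    comes back as +180 instead of -180."""
    return -((np.asarray(a0) - np.asarray(a1) + 180.0) % 360.0 - 180.0)

diff_angle(190, -10)                                      # -> -160.0
diff_angle(np.array([20, -10]), np.array([-170, 180]))    # -> array([-170., 170.])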
From gael.varoquaux at normalesup.org Wed Nov 18 11:22:51 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 17:22:51 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <79E5D595-F6DF-4E94-872B-68059F5E8001@yale.edu> References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> <79E5D595-F6DF-4E94-872B-68059F5E8001@yale.edu> Message-ID: <20091118162251.GF17382@phare.normalesup.org> On Wed, Nov 18, 2009 at 09:55:17AM -0500, Zachary Pincus wrote: > >> The problem arises also on non trivial graphs, by the way. The > >> problem is > >> that doing an EVD of the transition of laplace matrix only gives a > >> subspace of the kernel of the laplace matrix. I could probably do a > >> sparse matrix factorization on that, but I see complexity and cost > >> coming > >> in, and I am trying to avoid that. > > My linear algebra is only tenuous at best, so I don't exactly see why > > this is a problem. As far as I understand, to find the connected > > components, first you find the eigenvectors of the laplacian that have > > an eigenvalue of zero. Then for each node in the graph i, there will > > be exactly one eigenvector with a non-zero value at position i. The > > index of this eigenvector is the index of the connected component that > > i belongs to. > Wait... you're saying that the eigenvectors will only span a subspace > of the kernel, so that there must be at least some position i where > there is a zero value in each eigenvector? Yes, that's it. The eigenvectors are defined only at a rotation. > If this is correct then I see the problem; hopefully someone who > actually knows what they're talking about can help me out here... So do I :) Thanks for your thoughts, Ga?l From cclarke at chrisdev.com Wed Nov 18 17:18:35 2009 From: cclarke at chrisdev.com (Chris Clarke) Date: Wed, 18 Nov 2009 18:18:35 -0400 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: References: Message-ID: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> Sorry for the later reply. Yes forward_fill is still there and it works!!! But it seemed to have some more capability (initial values, 2d arrays) when it was in the sandbox?? I may be wrong and mixing up with some other library. I just wanted to be sure that if i do my own patch i'm not reinventing the wheel!! As this is why we are standardizing on scikists.timeseries Regards Chris On Nov 18, 2009, at 11:50 AM, Pierre GM wrote: > > On Nov 18, 2009, at 9:04 AM, Chris Clarke wrote: > >> Hi >> I haven't used this library in a while but i seem to recall you >> could >> forward fill 2d arrays and set initial starting values etc.?? >> Am i correct ?? If so, any special reasons were they removed?? >> Regards >> Chris > > What do you mean, removed ? You can find `forward_fill` in > scikits.timeseries.lib.interpolate. > Am I answering your question ? > P. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sftrytry at gmail.com Wed Nov 18 17:56:53 2009 From: sftrytry at gmail.com (Jesse Fox) Date: Wed, 18 Nov 2009 17:56:53 -0500 Subject: [SciPy-User] scipy have problems with preinstalled arpack Message-ID: <6a2f0640911181456i3aa5fbffka85c5ba87a61b0a4@mail.gmail.com> I tried to compile scipy and numpy on my Archlinux box. I always got segment fault during scipy.test() on arpack related functions. 
I tried to remove my pre-installed arpack and recompile scipy. The test() ran without any problem. Is there any conflict between scipy and arpack? -------------- next part -------------- An HTML attachment was scrubbed... URL: From reakinator at gmail.com Wed Nov 18 18:13:42 2009 From: reakinator at gmail.com (Rich E) Date: Thu, 19 Nov 2009 00:13:42 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard Message-ID: Hi list, I just joined because I've had this mac for over a month and I still can't get a working module of scipy in Snow Leopard. The dmg says it needs 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. Installing MacPython seems to create more problems than anything else, and building scipy from source is a no-go (I saw various other posts in the archives about dependency problems, but no solutions yet. I am stuck at UMFPACK sourcers being missing.) Any help or guidance is greatly appreciate. Rich -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Wed Nov 18 18:50:40 2009 From: cool-rr at cool-rr.com (cool-RR) Date: Thu, 19 Nov 2009 01:50:40 +0200 Subject: [SciPy-User] Announcment and question Message-ID: Hello, Announcement: I've talked about it in this mailing list before, but yesterday I finally made the first alpha release of my open-source Python scientific computing project, GarlicSim. Check it out: http://garlicsim.org It is a Pythonic framework for working with simulations. Check out the page, also there is a yet-incomplete introductionto it, which goes more in-depth. My first priority right now is getting users and building a community around it, so I'd be available to help people to write their simulation packages and solve problems. Early users will have this benefit, and additionally will have more effect on the evolution of the software. So if you do any work with simulations, drop me a mail, and I can help you use GarlicSim. Also, I have a question. Up to now I've been supporting Python 2.4 through 3.1 with my project. Supporting 2.4 has been a real burden; I created a separate fork for it, because I don't want to limit my entire project to 2.4. (I love context managers, for example.) I'm considering dropping support for 2.4. The question is, how many people in the scientific Python community still use 2.4? Is it worth supporting? Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Nov 18 19:40:03 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 18 Nov 2009 19:40:03 -0500 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: <6324058A-FD14-47BF-9308-9BB5D123CA3A@gmail.com> On Nov 18, 2009, at 6:13 PM, Rich E wrote: > Hi list, > > I just joined because I've had this mac for over a month and I still can't get a working module of scipy in Snow Leopard. The dmg says it needs 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. So you're set: 2.6.1 is more recent than 2.6... But you probably shouldn't use a dmg: install Scipy from sources, it's far easier to help you. Assuming you have xcode installed, and a proper gfortran (I think this one is the recommended one: http://r.research.att.com/tools/) * Install numpy first. make a local install by using the --user flag when calling python setup.py install. 
No need to install an additional Python if you use --user, you won't be messing with your system * Then, install scipy, using the --user flag as well. Don't bother for UMFPACK for the moment * You may want to set CFLAGS="-arch x86_64" before installing numpy and scipy. * Let me know where your problems are (off-list for now to reduce the noise), post the log of your build somewhere. Don't worry, it's straightforward, provided you stick to the Python that comes w/ SnowLeopard Good luck P. From zachary.pincus at yale.edu Wed Nov 18 20:15:55 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 20:15:55 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice Message-ID: Hi all, A bit off-topic, but before I write some C or cython to do this, I thought I'd ask to see if anyone knows of existing code for the task of finding the shortest (weighted) path between two points on a lattice. Specifically, I have images with "start" and "end" pixels marked and I want to find the path through the image with the lowest integrated intensity. Trivial but tedious to implement, so if anyone has some good tips I'd be happy to know. (There's already a left-to-right- shortest-path-finder in the image scikit repository, but that's not quite what I need.) Thanks, Zach From pgmdevlist at gmail.com Wed Nov 18 20:17:24 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 18 Nov 2009 20:17:24 -0500 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> References: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> Message-ID: On Nov 18, 2009, at 5:18 PM, Chris Clarke wrote: > Sorry for the later reply. Yes forward_fill is still there and it > works!!! Good > But it seemed to have some more capability (initial values, 2d arrays) > when it was in the sandbox?? > I may be wrong and mixing up with some other library. That does sound familiar, but i don't think it was part of scikits.timeseries... A patch for 2D would be welcome, I'm not quite sure what you mean by initial value, though From david at ar.media.kyoto-u.ac.jp Wed Nov 18 23:31:39 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 19 Nov 2009 13:31:39 +0900 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> Rich E wrote: > Hi list, > > I just joined because I've had this mac for over a month and I still > can't get a working module of scipy in Snow Leopard. The dmg says it > needs 'python 2.6 or newer', but the one that comes with Snow Leopard > is 2.6.1. The dmg needs a python installed from python.org. If you want to get scipy with the included python, you need to build it yourself. UMFPACK is not needed - if you have a problem, please report the exact error as well as the command you used to build. 
Just saying it does not work is not enough to help you, cheers, David From cimrman3 at ntc.zcu.cz Thu Nov 19 06:51:34 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 19 Nov 2009 12:51:34 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B0403A5.60903@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> Message-ID: <4B053146.6020609@ntc.zcu.cz> Robert Cimrman wrote: > Gael Varoquaux wrote: >> On Wed, Nov 18, 2009 at 03:03:43PM +0100, Robert Cimrman wrote: >>> Hi Gael, >>> Gael Varoquaux wrote: >>>> Hi there, >>>> I would like to list the connect components of a graph (or a sparse >>>> matrix, same thing). I know of course of the bread-first traversal, as >>>> implemented eg in networkX, to find the connect components. However, I >>>> have a feeling that sparse linear algebra must be performing such >>>> searches, to decompose sparse matrices in blocks. I'd love to piggy back >>>> on such implementations, rather than code and maintain a C or cython >>>> version of breadth-first graph traversal. >>> I have a function in C (as a part of sfepy), that does that. But as it >>> might be useful for more people, what about putting it scipy >>> sparsetools? >> I think it would be very useful. I would actually include it in the >> scipy.sparse namespace too. > > OK, I will give it a shot (soon), unless someone jumps in with a better solution. > It's now in ticket #1057. r. From gael.varoquaux at normalesup.org Thu Nov 19 07:27:27 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 19 Nov 2009 13:27:27 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B053146.6020609@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: <20091119122727.GA1278@phare.normalesup.org> On Thu, Nov 19, 2009 at 12:51:34PM +0100, Robert Cimrman wrote: > > OK, I will give it a shot (soon), unless someone jumps in with a better solution. > It's now in ticket #1057. Excellent. I am looking at it right now. Too bad I am not getting the ticket mail :) Ga?l From seb.haase at gmail.com Thu Nov 19 07:48:22 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 19 Nov 2009 13:48:22 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> Message-ID: On Wed, Nov 18, 2009 at 5:19 PM, Jonathan Guyer wrote: > On Nov 18, 2009, at 11:14 AM, I wrote: > >> ? return np.fmod(d + 540, 360) - 180 > > Actually, I think you can just write (d + 540) % 360 - 180 > > I think we used fmod because of some automatic weave inlining we do > that didn't play nice with '%'. Hi Jonathan, thanks for your answer. I might prefer your solution simply for its brevity. However, there are also some sign "problems": for the angle from -10 to 180 I was expecting +170, but your solution returns -170. and for (to:)'190' (from) '-10' expected: '160' , yours returns -160. I have a list of 30 test cases, which these are the only 2 were yours gave unexpected results regarding the sign -- besides the fact that yours always returns -180 instead of +180, but that is obviously not really wrong. 
Thanks, Sebastian From sccolbert at gmail.com Thu Nov 19 08:34:25 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 19 Nov 2009 14:34:25 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> Message-ID: <7f014ea60911190534w3a838e52ne4bc46467656e46@mail.gmail.com> On Thu, Nov 19, 2009 at 1:48 PM, Sebastian Haase wrote: > On Wed, Nov 18, 2009 at 5:19 PM, Jonathan Guyer wrote: >> On Nov 18, 2009, at 11:14 AM, I wrote: >> >>> ? return np.fmod(d + 540, 360) - 180 >> >> Actually, I think you can just write (d + 540) % 360 - 180 >> >> I think we used fmod because of some automatic weave inlining we do >> that didn't play nice with '%'. > > Hi Jonathan, > thanks for your answer. I might prefer your solution simply for its brevity. > > However, there are also some sign "problems": > for the angle from -10 to 180 I was expecting +170, but your solution > returns -170. > > and for (to:)'190' (from) '-10' ?expected: '160' , yours returns -160. > -170 and -160 are the correct answers for those differences > I have a list of 30 test cases, which these are the only 2 were yours > gave unexpected results regarding the sign -- besides the fact that > yours always returns -180 instead of +180, but that is obviously not > really wrong. > > Thanks, > Sebastian > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From seb.haase at gmail.com Thu Nov 19 08:52:13 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 19 Nov 2009 14:52:13 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: <7f014ea60911190534w3a838e52ne4bc46467656e46@mail.gmail.com> References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> <7f014ea60911190534w3a838e52ne4bc46467656e46@mail.gmail.com> Message-ID: Thanks for the enlightenment ;-) -S. On Thu, Nov 19, 2009 at 2:34 PM, Chris Colbert wrote: > On Thu, Nov 19, 2009 at 1:48 PM, Sebastian Haase wrote: >> On Wed, Nov 18, 2009 at 5:19 PM, Jonathan Guyer wrote: >>> On Nov 18, 2009, at 11:14 AM, I wrote: >>> >>>> ? return np.fmod(d + 540, 360) - 180 >>> >>> Actually, I think you can just write (d + 540) % 360 - 180 >>> >>> I think we used fmod because of some automatic weave inlining we do >>> that didn't play nice with '%'. >> >> Hi Jonathan, >> thanks for your answer. I might prefer your solution simply for its brevity. >> >> However, there are also some sign "problems": >> for the angle from -10 to 180 I was expecting +170, but your solution >> returns -170. >> >> and for (to:)'190' (from) '-10' ?expected: '160' , yours returns -160. >> > > > -170 and -160 are the correct answers for those differences > > >> I have a list of 30 test cases, which these are the only 2 were yours >> gave unexpected results regarding the sign -- besides the fact that >> yours always returns -180 instead of +180, but that is obviously not >> really wrong. 
>> >> Thanks, >> Sebastian >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dcswest at gmail.com Thu Nov 19 10:58:33 2009 From: dcswest at gmail.com (Dennis C) Date: Thu, 19 Nov 2009 07:58:33 -0800 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard Message-ID: Greetings Rich; Another option that recently worked for me was to just install it through MacPorts. That does maintain its own library in /opt so it will also install Python even when it's already elsewhere on the system, but it'll take care of all other dependencies too including the NumPy... Good luck, Message: 3 Date: Thu, 19 Nov 2009 00:13:42 +0100 From: Rich E Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard To: scipy-user at scipy.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hi list, I just joined because I've had this mac for over a month and I still can't get a working module of scipy in Snow Leopard. The dmg says it needs 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. Installing MacPython seems to create more problems than anything else, and building scipy from source is a no-go (I saw various other posts in the archives about dependency problems, but no solutions yet. I am stuck at UMFPACK sourcers being missing.) Any help or guidance is greatly appreciate. Rich -------------- next part -------------- An HTML attachment was scrubbed... URL: From wnbell at gmail.com Thu Nov 19 11:01:19 2009 From: wnbell at gmail.com (Nathan Bell) Date: Thu, 19 Nov 2009 11:01:19 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B053146.6020609@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: On Thu, Nov 19, 2009 at 6:51 AM, Robert Cimrman wrote: > > It's now in ticket #1057. > Hi Robert, Sorry for getting on this thread so late, I've been extremely busy lately. I think we should definitely include more graph algorithms in scipy.sparse. The cost of extracting the same info via eigenvectors is high and the results are less trustworthy. We've implemented several such algorithms (like connected_components [1]) in PyAMG. Since the code is organized in similar fashion to scipy.sparse it would make sense to transfer some or all of the functionality in pyamg.graph into scipy.sparse.graph or some such namespace. I'd also like to add some reordering methods like RCM and nested bisection. 
[1] http://code.google.com/p/pyamg/source/browse/trunk/pyamg/graph.py#271 -- Nathan Bell wnbell at gmail.com http://www.wnbell.com/ From gael.varoquaux at normalesup.org Thu Nov 19 11:09:56 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 19 Nov 2009 17:09:56 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: <20091119160956.GB1278@phare.normalesup.org> On Thu, Nov 19, 2009 at 11:01:19AM -0500, Nathan Bell wrote: > I think we should definitely include more graph algorithms in > scipy.sparse. The cost of extracting the same info via eigenvectors > is high and the results are less trustworthy. > We've implemented several such algorithms (like connected_components > [1]) in PyAMG. I thought you might have. By the way, that thing (pyAMG) is just fantastic). > Since the code is organized in similar fashion to scipy.sparse it would > make sense to transfer some or all of the functionality in pyamg.graph > into scipy.sparse.graph or some such namespace. I'd love to see all of it, actually. > I'd also like to add some reordering methods like RCM and nested > bisection. I am really interested in all that. I don't have time to contribute in the short term, but in the long run (one to two years), I have a big interest there. I think moving these features in scipy would enable code sharing between a lot of other libraries (pyAMG, networkX, sfepy, and probably other PDE solvers). Beside, the nipy project has some graph algorithms for machine learning and computer vision that use custom structures, and should move to common structures in the long run, and maybe in a comon project (we are thinking of the scikit learn). Very exciting talk! Ga?l From mcmcclur at unca.edu Thu Nov 19 11:01:21 2009 From: mcmcclur at unca.edu (Mark McClure) Date: Thu, 19 Nov 2009 11:01:21 -0500 Subject: [SciPy-User] Numerical methods textbook recs? Message-ID: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> I'll be teaching an undergraduate level course in elementary numerical methods next semester. I would seriously consider using Python/SciPy as the computing environment for the course but I have not been able to find a textbook that is Python based. The appropriate level for the text would be similar to Kincaid and Cheney, as you can preview here: http://books.google.com/books?id=x69Q226WR8kC That book is more expensive than I'd like, however, and is not Python based. Any suggestions? Thanks, Mark McClure From cimrman3 at ntc.zcu.cz Thu Nov 19 11:21:56 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 19 Nov 2009 17:21:56 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: <4B0570A4.6090402@ntc.zcu.cz> Nathan Bell wrote: > On Thu, Nov 19, 2009 at 6:51 AM, Robert Cimrman wrote: >> It's now in ticket #1057. >> > > Hi Robert, > > Sorry for getting on this thread so late, I've been extremely busy lately. > > > I think we should definitely include more graph algorithms in > scipy.sparse. The cost of extracting the same info via eigenvectors > is high and the results are less trustworthy. 
> > We've implemented several such algorithms (like connected_components > [1]) in PyAMG. Since the code is organized in similar fashion to > scipy.sparse it would make sense to transfer some or all of the > functionality in pyamg.graph into scipy.sparse.graph or some such > namespace. I'd also like to add some reordering methods like RCM and > nested bisection. > > [1] http://code.google.com/p/pyamg/source/browse/trunk/pyamg/graph.py#271 Hi Nathan, I have implemented RCM into sfepy too... Fortunately, I already had a C functions lying around, so I did not waste too much time on that. It would be perfect to have all this in scipy instead! cheers, r. From cimrman3 at ntc.zcu.cz Thu Nov 19 11:24:25 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 19 Nov 2009 17:24:25 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091119160956.GB1278@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> <20091119160956.GB1278@phare.normalesup.org> Message-ID: <4B057139.7060001@ntc.zcu.cz> Gael Varoquaux wrote: > On Thu, Nov 19, 2009 at 11:01:19AM -0500, Nathan Bell wrote: >> I think we should definitely include more graph algorithms in >> scipy.sparse. The cost of extracting the same info via eigenvectors >> is high and the results are less trustworthy. > >> We've implemented several such algorithms (like connected_components >> [1]) in PyAMG. > > I thought you might have. By the way, that thing (pyAMG) is just > fantastic). +1. (BTW. I still have to explore why it does not work well with my matrices...) >> Since the code is organized in similar fashion to scipy.sparse it would >> make sense to transfer some or all of the functionality in pyamg.graph >> into scipy.sparse.graph or some such namespace. > > I'd love to see all of it, actually. > >> I'd also like to add some reordering methods like RCM and nested >> bisection. > > I am really interested in all that. I don't have time to contribute in > the short term, but in the long run (one to two years), I have a big > interest there. > > I think moving these features in scipy would enable code sharing between > a lot of other libraries (pyAMG, networkX, sfepy, and probably other PDE > solvers). Beside, the nipy project has some graph algorithms for machine > learning and computer vision that use custom structures, and should move > to common structures in the long run, and maybe in a comon project (we > are thinking of the scikit learn). Again, +1. I was forced to code some linear algebra/graph stuff, which is now in sfepy, but which I would prefer to have in scipy instead. r. From aisaac at american.edu Thu Nov 19 11:52:30 2009 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 19 Nov 2009 11:52:30 -0500 Subject: [SciPy-User] Numerical methods textbook recs? In-Reply-To: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> References: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> Message-ID: <4B0577CE.1060500@american.edu> On 11/19/2009 11:01 AM, Mark McClure wrote: > I'll be teaching an undergraduate level course in elementary numerical > methods next semester. I would seriously consider using Python/SciPy > as the computing environment for the course but I have not been able > to find a textbook that is Python based. 
http://www.amazon.com/Numerical-Methods-Engineering-Python-Kiusalaas/dp/0521852870/ref=sr_1_1?ie=UTF8&s=books&qid=1258649498&sr=8-1 hth, Alan Isaac From Dharhas.Pothina at twdb.state.tx.us Thu Nov 19 11:53:33 2009 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 19 Nov 2009 10:53:33 -0600 Subject: [SciPy-User] Reset IPython to original blank state. Message-ID: <4B0523AD.63BA.009B.0@twdb.state.tx.us> Hi All, I'm trying to do something like matlab's 'clear all; close all; fclose all;' this command basically resets matlab to a blank state by clearing all variables and closing all figures and files. It is hugely useful for avoiding old variables and data interfering with current work when interactively plotting and exploring data. With Ipython on Linux this is not too big of a deal since I can easily just quit and restart Ipython. On windows Ipython seems to take an inordinate amount of time to start so this is really an issue and causes the workflow to be interrupted. I've tried using %reset and while that seems to clear any variables in memory it doesn't seem to reset everything. I'm having lots of issues with matplotlib figures and other crashes related to 'too many open file handles' if I do not close and restart Ipython. Any way around this. Is there a small script I could use to clear everything and take Ipython back to its original startup state without restarting it? - dharhas From robert.kern at gmail.com Thu Nov 19 13:08:50 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 19 Nov 2009 12:08:50 -0600 Subject: [SciPy-User] Reset IPython to original blank state. In-Reply-To: <4B0523AD.63BA.009B.0@twdb.state.tx.us> References: <4B0523AD.63BA.009B.0@twdb.state.tx.us> Message-ID: <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> On Thu, Nov 19, 2009 at 10:53, Dharhas Pothina wrote: > Hi All, > > I'm trying to do something like matlab's 'clear all; close all; fclose all;' this command basically resets matlab to a blank state by clearing all variables and closing all figures and files. It is hugely useful for avoiding old variables and data interfering with current work when interactively plotting and exploring data. > > With Ipython on Linux this is not too big of a deal since I can easily just quit and restart Ipython. On windows Ipython seems to take an inordinate amount of time to start so this is really an issue and causes the workflow to be interrupted. > > I've tried using %reset and while that seems to clear any variables in memory it doesn't seem to reset everything. I'm having lots of issues with matplotlib figures and other crashes related to 'too many open file handles' if I do not close and restart Ipython. Where are these open files coming from? Most of the code in numpy/matplotlib/IPython should be properly closing files. If it is your code, it would be worth your time to fix your code to not keep files open longer than necessary rather than restarting IPython. > Any way around this. Is there a small script I could use to clear everything and take Ipython back to its original startup state without restarting it? Not really, no. Also, you will want to ask further IPython questions on the IPython mailing list: http://mail.scipy.org/mailman/listinfo/ipython-user -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Dharhas.Pothina at twdb.state.tx.us Thu Nov 19 13:41:42 2009 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 19 Nov 2009 12:41:42 -0600 Subject: [SciPy-User] Reset IPython to original blank state. In-Reply-To: <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> References: <4B0523AD.63BA.009B.0@twdb.state.tx.us> <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> Message-ID: <4B053D06.63BA.009B.0@twdb.state.tx.us> Sorry didn't realize IPython had a separate mailing list. Will repost there. I think I may have fixed the "too many files" problem but I the big problem I have is when rerunning the same script and having too many legend labels show up in matplotlib plots since it still has the old ones from the previous time the script run. - d >>> Robert Kern 11/19/2009 12:08 PM >>> On Thu, Nov 19, 2009 at 10:53, Dharhas Pothina wrote: > Hi All, > > I'm trying to do something like matlab's 'clear all; close all; fclose all;' this command basically resets matlab to a blank state by clearing all variables and closing all figures and files. It is hugely useful for avoiding old variables and data interfering with current work when interactively plotting and exploring data. > > With Ipython on Linux this is not too big of a deal since I can easily just quit and restart Ipython. On windows Ipython seems to take an inordinate amount of time to start so this is really an issue and causes the workflow to be interrupted. > > I've tried using %reset and while that seems to clear any variables in memory it doesn't seem to reset everything. I'm having lots of issues with matplotlib figures and other crashes related to 'too many open file handles' if I do not close and restart Ipython. Where are these open files coming from? Most of the code in numpy/matplotlib/IPython should be properly closing files. If it is your code, it would be worth your time to fix your code to not keep files open longer than necessary rather than restarting IPython. > Any way around this. Is there a small script I could use to clear everything and take Ipython back to its original startup state without restarting it? Not really, no. Also, you will want to ask further IPython questions on the IPython mailing list: http://mail.scipy.org/mailman/listinfo/ipython-user -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Thu Nov 19 14:29:37 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 19 Nov 2009 21:29:37 +0200 Subject: [SciPy-User] Reset IPython to original blank state. In-Reply-To: <4B053D06.63BA.009B.0@twdb.state.tx.us> References: <4B0523AD.63BA.009B.0@twdb.state.tx.us> <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> <4B053D06.63BA.009B.0@twdb.state.tx.us> Message-ID: <1258658976.6439.0.camel@idol> to, 2009-11-19 kello 12:41 -0600, Dharhas Pothina kirjoitti: > Sorry didn't realize IPython had a separate mailing list. Will repost there. > > I think I may have fixed the "too many files" problem but I the big > problem I have is when rerunning the same script and having too many > legend labels show up in matplotlib plots since it still has the old > ones from the previous time the script run. 
Clear the figure before replotting http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.clf From dav at alum.mit.edu Thu Nov 19 15:23:41 2009 From: dav at alum.mit.edu (Dav Clark) Date: Thu, 19 Nov 2009 12:23:41 -0800 Subject: [SciPy-User] Numerical methods textbook recs? In-Reply-To: <4B0577CE.1060500@american.edu> References: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> <4B0577CE.1060500@american.edu> Message-ID: <4354A80C-A51B-45BA-9BF0-EE9F6A9AE548@alum.mit.edu> On Nov 19, 2009, at 8:52 AM, Alan G Isaac wrote: > On 11/19/2009 11:01 AM, Mark McClure wrote: >> I'll be teaching an undergraduate level course in elementary >> numerical >> methods next semester. I would seriously consider using Python/SciPy >> as the computing environment for the course but I have not been able >> to find a textbook that is Python based. > > > http://www.amazon.com/Numerical-Methods-Engineering-Python-Kiusalaas/dp/0521852870/ref=sr_1_1?ie=UTF8&s=books&qid=1258649498&sr=8-1 I'm partial to the Strang book: http://www.amazon.com/Introduction-Applied-Mathematics-Gilbert-Strang/dp/0961408804 It's more a senior level text, not sure what level you need. Conceptually, it is one of the clearest texts I've come across - but you need to figure out the coding on your own. I've floated the idea of writing a "python complements" to this or any similar book that a group might get behind. If you're doing a course, that could be a nice way to organize such an enterprise. Let me (and the list) know if you do write / want help writing anything like that. Cheers, Dav From reakinator at gmail.com Thu Nov 19 18:39:53 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 00:39:53 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: When trying to install py-scipy in macports, I get the following error: ichard-Eakins-MacBook-Pro:scipy richardeakin$ sudo port install py-scipy Portfile changed since last build; discarding previous state. ---> Computing dependencies for py-scipy ---> Staging py-numpy into destroot Error: Target org.macports.destroot returned: error renaming "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py-numpy/work/destroot/opt/local/bin/f2py": no such file or directory Error: The following dependencies failed to build: py-numpy swig-python bison gsed python_select swig Error: Status 1 encountered during processing. I don't know if that is an issue for this list or the macports list. But, I'm moving on to other methods.. Rich On Thu, Nov 19, 2009 at 4:58 PM, Dennis C wrote: > Greetings Rich; > > Another option that recently worked for me was to just install it through > MacPorts. That does maintain its own library in /opt so it will also > install Python even when it's already elsewhere on the system, but it'll > take care of all other dependencies too including the NumPy... > > Good luck, > > > Message: 3 > Date: Thu, 19 Nov 2009 00:13:42 +0100 > > From: Rich E > Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard > To: scipy-user at scipy.org > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > > Hi list, > > I just joined because I've had this mac for over a month and I still can't > get a working module of scipy in Snow Leopard. The dmg says it needs > 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. 
> Installing MacPython seems to create more problems than anything else, and > building scipy from source is a no-go (I saw various other posts in the > archives about dependency problems, but no solutions yet. I am stuck at > UMFPACK sourcers being missing.) > > Any help or guidance is greatly appreciate. > > Rich > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reakinator at gmail.com Thu Nov 19 19:20:21 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 01:20:21 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> References: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> Message-ID: I just installed both python and scipy from the dmg files on their websites - not 64bit but I suppose that is unimportant at the moment. Now, I'm looking for pylab, but I don't see that anywhere (just matplotlib, although my scripts all use "import pylab"). Thanks for the advice. Rich On Thu, Nov 19, 2009 at 5:31 AM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Rich E wrote: > > Hi list, > > > > I just joined because I've had this mac for over a month and I still > > can't get a working module of scipy in Snow Leopard. The dmg says it > > needs 'python 2.6 or newer', but the one that comes with Snow Leopard > > is 2.6.1. > > The dmg needs a python installed from python.org. If you want to get > scipy with the included python, you need to build it yourself. > > UMFPACK is not needed - if you have a problem, please report the exact > error as well as the command you used to build. Just saying it does not > work is not enough to help you, > > cheers, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Nov 19 19:23:46 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 19 Nov 2009 18:23:46 -0600 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730911191623q4483a6c9of2d96997e7167d13@mail.gmail.com> On Thu, Nov 19, 2009 at 18:20, Rich E wrote: > I just installed both python and scipy from the dmg files on their websites > - not 64bit but I suppose that is unimportant at the moment. > > Now, I'm looking for pylab, but I don't see that anywhere (just matplotlib, > although my scripts all use "import pylab"). pylab is part of matplotlib. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lujitsu at hotmail.com Thu Nov 19 19:25:16 2009 From: lujitsu at hotmail.com (C. Campbell) Date: Thu, 19 Nov 2009 19:25:16 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: Message-ID: The package networkx has Dijktra's algorithm implemented; if I understand you correctly you'd just need to assign the intensities to the edge weights when you form the network. http://networkx.lanl.gov/ I hope this helps! 
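A rough sketch of that suggestion (the function name here is made up for illustration), treating the image as a 4-connected lattice and charging each step the mean intensity of the two pixels it joins; exactly how "integrated intensity" maps onto edge weights is a modelling choice:

import numpy as np
import networkx as nx

def lowest_intensity_path(image, start, end):
    """Shortest 4-connected pixel path from start to end, where the cost
    of each edge is the mean intensity of the two pixels it connects."""
    rows, cols = image.shape
    G = nx.grid_2d_graph(rows, cols)           # nodes are (row, col) tuples
    for u, v in G.edges():
        G[u][v]['weight'] = 0.5 * (image[u] + image[v])
    return nx.dijkstra_path(G, start, end)     # list of (row, col) nodes

img = np.random.rand(40, 60)
path = lowest_intensity_path(img, (0, 0), (39, 59))

Building the whole graph up front is memory-hungry for large images, which is presumably where a dedicated C or cython routine would pay off.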
Colin > From: zachary.pincus at yale.edu > To: scipy-user at scipy.org > Date: Wed, 18 Nov 2009 20:15:55 -0500 > Subject: [SciPy-User] Dijkstra's algorithm on a lattice > > Hi all, > > A bit off-topic, but before I write some C or cython to do this, I > thought I'd ask to see if anyone knows of existing code for the task > of finding the shortest (weighted) path between two points on a lattice. > > Specifically, I have images with "start" and "end" pixels marked and I > want to find the path through the image with the lowest integrated > intensity. Trivial but tedious to implement, so if anyone has some > good tips I'd be happy to know. (There's already a left-to-right- > shortest-path-finder in the image scikit repository, but that's not > quite what I need.) > > Thanks, > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user _________________________________________________________________ Windows 7: It works the way you want. Learn more. http://www.microsoft.com/Windows/windows-7/default.aspx?ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009v2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy.vancleve at gmail.com Thu Nov 19 19:35:58 2009 From: jeremy.vancleve at gmail.com (Jeremy Van Cleve) Date: Thu, 19 Nov 2009 17:35:58 -0700 Subject: [SciPy-User] compiling scipy with pgcc 9.0-3 on cray linux 2.2 Message-ID: <4B05E46E.6090907@gmail.com> I am trying to compile scipy 0.7.0 on the cray linux 2.2 nodes on the NICS kraken system. I've successfully built lapack, ATLAS, python 2.6.4, and numpy 1.3.0 but am having problems with scipy. Running "python setup.py install" in the scipy source directory yields the following error: ... compiling C sources C compiler: cc -DNDEBUG -fastsse -fPIC creating build/temp.linux-x86_64-2.6/build creating build/temp.linux-x86_64-2.6/build/src.linux-x86_64-2.6 creating build/temp.linux-x86_64-2.6/build/src.linux-x86_64-2.6/scipy creating build/temp.linux-x86_64-2.6/build/src.linux-x86_64-2.6/scipy/fftpack compile options: '-Ibuild/src.linux-x86_64-2.6 -I/lustre/scratch/jvanclev/opt/lib/python2.6/site-packages/numpy/core/include -I/lustre/scratch/jvanclev/opt/include/python2.6 -c' cc: build/src.linux-x86_64-2.6/fortranobject.c /opt/cray/xt-asyncpe/3.3/bin/cc: INFO: linux target is being used cc: scipy/fftpack/src/zfftnd.c /opt/cray/xt-asyncpe/3.3/bin/cc: INFO: linux target is being used PGC-W-0156-Type not specified, 'int' assumed (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0039-Use of undeclared variable caches_zfftnd_fftpack (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . 
or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-F-0008-Error limit exceeded (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC/x86-64 Linux 9.0-3: compilation aborted ... I was unable to find any information on a problem in scipy with this source file, so I was wondering whether there might be something specific to compiling with pgcc. Any thoughts? Thanks! best, Jeremy From perfreem at gmail.com Thu Nov 19 19:57:12 2009 From: perfreem at gmail.com (per freem) Date: Thu, 19 Nov 2009 19:57:12 -0500 Subject: [SciPy-User] error using SciPy on Mac OS X Snow Leopard (using scipy.maxentropy) Message-ID: hi all, i recently upgraded to Mac OS X Snow Leopard and moved from Python 2.5 to Python 2.6. 
I reinstalled scipy and it seemed to work, except when I try to execute: from scipy.maxentropy import logsumexp I get the errors: Traceback (most recent call last): File "myfile.py", line 6, in from scipy.maxentropy import logsumexp File "/Library/Python/2.6/site-packages/scipy/maxentropy/__init__.py", line 9, in from maxentropy import * File "/Library/Python/2.6/site-packages/scipy/maxentropy/maxentropy.py", line 74, in from scipy import optimize File "/Library/Python/2.6/site-packages/scipy/optimize/__init__.py", line 7, in from optimize import * File "/Library/Python/2.6/site-packages/scipy/optimize/optimize.py", line 28, in import linesearch File "/Library/Python/2.6/site-packages/scipy/optimize/linesearch.py", line 3, in from scipy.optimize import minpack2 ImportError: /Library/Python/2.6/site-packages/scipy/optimize/minpack2.so: no appropriate 64-bit architecture (see "man python" for running in 32-bit mode) any idea what could be wrong here? thanks. From reakinator at gmail.com Thu Nov 19 20:51:50 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 02:51:50 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <3d375d730911191623q4483a6c9of2d96997e7167d13@mail.gmail.com> References: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> <3d375d730911191623q4483a6c9of2d96997e7167d13@mail.gmail.com> Message-ID: Cool, I think I have everything I need. Just didn't see all the dmg's at first! Sorry for the waste of time. Rich On Fri, Nov 20, 2009 at 1:23 AM, Robert Kern wrote: > On Thu, Nov 19, 2009 at 18:20, Rich E wrote: > > I just installed both python and scipy from the dmg files on their > websites > > - not 64bit but I suppose that is unimportant at the moment. > > > > Now, I'm looking for pylab, but I don't see that anywhere (just > matplotlib, > > although my scripts all use "import pylab"). > > pylab is part of matplotlib. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karl.young at ucsf.edu Fri Nov 20 11:48:48 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Fri, 20 Nov 2009 08:48:48 -0800 Subject: [SciPy-User] finite element packages Message-ID: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). 
Thanks for any thoughts, -- Karl From d.l.goldsmith at gmail.com Fri Nov 20 12:10:21 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 20 Nov 2009 09:10:21 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> Message-ID: <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? DG On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl wrote: > > I'm trying to model a flexible flywheel (hence my question about Wierstrass > elliptic functions a couple of weeks ago - thanks again for the helpful > replies). I'm now trying to consider realistic models with elastic materials > that go beyond my abilities to model analytically and figured I need to look > at finite element models. > > I haven't used finite element packages and was wondering if anyone on the > list had any recommendations, preferably scipythonic but I'm just curious > generally about what people would consider using for a problem like this > (i.e. a rotating flexible rope type problem). Thanks for any thoughts, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramercer at gmail.com Fri Nov 20 12:33:13 2009 From: ramercer at gmail.com (Adam Mercer) Date: Fri, 20 Nov 2009 11:33:13 -0600 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> On Thu, Nov 19, 2009 at 17:39, Rich E wrote: > When trying to install py-scipy in macports, I get the following error: > > ichard-Eakins-MacBook-Pro:scipy richardeakin$ sudo port install py-scipy > Portfile changed since last build; discarding previous state. > --->? Computing dependencies for py-scipy > --->? Staging py-numpy into destroot > Error: Target org.macports.destroot returned: error renaming > "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py-numpy/work/destroot/opt/local/bin/f2py": > no such file or directory > Error: The following dependencies failed to build: py-numpy swig-python > bison gsed python_select swig > Error: Status 1 encountered during processing. This is http://trac.macports.org/ticket/22571 > I don't know if that is an issue for this list or the macports list.? But, > I'm moving on to other methods.. Any reason why you want the python2.4 version? If you're on Snow Leopard the python2.6 version will work much better. Anyway this is a MacPort issue. 
Cheers Adam From reakinator at gmail.com Fri Nov 20 12:52:39 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 18:52:39 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> References: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> Message-ID: Where do you see that I wanted the 2.4 version? I ended up installing 2.6 macpython, numpy, scipy, and matplotlib from dmg files, then ipython from source. It is working so far. Rich On Fri, Nov 20, 2009 at 6:33 PM, Adam Mercer wrote: > On Thu, Nov 19, 2009 at 17:39, Rich E wrote: > > When trying to install py-scipy in macports, I get the following error: > > > > ichard-Eakins-MacBook-Pro:scipy richardeakin$ sudo port install py-scipy > > Portfile changed since last build; discarding previous state. > > ---> Computing dependencies for py-scipy > > ---> Staging py-numpy into destroot > > Error: Target org.macports.destroot returned: error renaming > > > "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py-numpy/work/destroot/opt/local/bin/f2py": > > no such file or directory > > Error: The following dependencies failed to build: py-numpy swig-python > > bison gsed python_select swig > > Error: Status 1 encountered during processing. > > This is http://trac.macports.org/ticket/22571 > > > I don't know if that is an issue for this list or the macports list. > But, > > I'm moving on to other methods.. > > Any reason why you want the python2.4 version? If you're on Snow > Leopard the python2.6 version will work much better. Anyway this is a > MacPort issue. > > Cheers > > Adam > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramercer at gmail.com Fri Nov 20 13:01:07 2009 From: ramercer at gmail.com (Adam Mercer) Date: Fri, 20 Nov 2009 12:01:07 -0600 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> Message-ID: <799406d60911201001i7ad71417hec3dbfc2b3855f5b@mail.gmail.com> On Fri, Nov 20, 2009 at 11:52, Rich E wrote: > Where do you see that I wanted the 2.4 version?? I ended up installing 2.6 > macpython, numpy, scipy, and matplotlib from dmg files, then ipython from > source.? It is working so far. The fact that you tried to install py-scipy, this is the python2.4 version. Cheers Adam From karl.young at ucsf.edu Fri Nov 20 13:06:06 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Fri, 20 Nov 2009 10:06:06 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu>, <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> Message-ID: <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu> Hi David, Thanks for the quick reply. I'm at a fairly early stage with this and so it's still fairly exploratory. That said I guess the main goal is to help my friend, who already has a working prtotype of a flexible flywheel, model and balance various parameter choices like speed of the flywheel, deformation of the wheel based on parameters associated with various material choices,... 
I obtained my analytic model by appropriately modifying the force diagram from a paper on the "skipping rope" problem; I obtained a nonlinear differential equation for the form of the loops of the flywheel that had elliptic functions as solutions. To first order I'm hoping that I can do some useful static modeling, i.e. in the rotating frame, even with more realistic parameters for the loop material, i.e. I guess the answer to the question is that my initial interest is in steady-state models (though I guess at some point it would be nice to study spin up and spin down). Again, to first order I'm not that concerned about looking at stability-instability transitions or oscillatory mode amplification and damping because my friend has a working prototype that seems to be pretty deeply in a stable range, at least re. variation in rotation speeds. The hope is that I can model the system in a way such that small changes in things like material parameters won't effect the stability regime (the flexible flywheel, combined with a fancy gimbal system seems to have a sort of surprisingly large stability range, re. parameters like rotation speeds and loop radius). But I may need to eventually model oscillatory modes and stability transitions re. use of some materials for the loop. The first goal will be to compare the model/simulations with his prototype, i.e. experiment (e.g. we may take pictures as in some of the skipping rope papers). Maybe my approach sounds silly; it's very preliminary and exploratory. Physicists (and particularly me) are probably too dumb to think about hard mechanical engineering problems ! -- Karl ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] Sent: Friday, November 20, 2009 9:10 AM To: SciPy Users List Subject: Re: [SciPy-User] finite element packages Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? DG On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl > wrote: I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). 
Thanks for any thoughts, -- Karl _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From d.l.goldsmith at gmail.com Fri Nov 20 14:49:37 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 20 Nov 2009 11:49:37 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu> Message-ID: <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> On Fri, Nov 20, 2009 at 10:06 AM, Young, Karl wrote: > > Hi David, > > Thanks for the quick reply. I'm at a fairly early stage with this and so > it's still fairly exploratory. That said I guess the main goal is to help my > friend, who already has a working prtotype of a flexible flywheel, model and > balance various parameter choices like speed of the flywheel, deformation > of the wheel based on parameters associated with various material > choices,... > > I obtained my analytic model by appropriately modifying the force diagram > from a paper on the "skipping rope" problem; I obtained a nonlinear > differential equation for the form of the loops of the flywheel that had > elliptic functions as solutions. To first order I'm hoping that I can do > some useful static modeling, i.e. in the rotating frame, even with more > realistic parameters for the loop material, i.e. I guess the answer to the > question is that my initial interest is in steady-state models (though I > guess at some point it would be nice to study spin up and spin down). > > Again, to first order I'm not that concerned about looking at > stability-instability transitions or oscillatory mode amplification and > damping because my friend has a working prototype that seems to be pretty > deeply in a stable range, at least re. variation in rotation speeds. The > hope is that I can model the system in a way such that small changes in > things like material parameters won't effect the stability regime (the > flexible flywheel, combined with a fancy gimbal system seems to have a sort > of surprisingly large stability range, re. parameters like rotation speeds > and loop radius). But I may need to eventually model oscillatory modes and > stability transitions re. use of some materials for the loop. > > The first goal will be to compare the model/simulations with his prototype, > i.e. experiment (e.g. we may take pictures as in some of the skipping rope > papers). > > Maybe my approach sounds silly; it's very preliminary and exploratory. > Physicists (and particularly me) are probably too dumb to think about hard > mechanical engineering problems ! > No, but there is one key factor you're unclear as to how you're modeling, which an ME would consider among the first things to model, namely, a model for the elasticity of the "flexible material": how the flywheel deforms due to centripetal acceleration will clearly affect its moment of inertia, affecting its rotational momentum and kinetic energy, and in turn its elastic potential energy; elastic damping sounds like it is also important. 
In any event, I was hoping you'd supply the actual non-linear DE(s), as the FEM is not always well-suited to such problems: depending on the nature of the nonlinearities and your choice of basis functions, completing the required integration by parts may be intractable (or prohibitively difficult for a first iteration in an "exploratory" investigation). In particular, the physically-required periodicity of your solutions (whatever your solutions are at theta=0, they have to be the same at theta=2pi, unless your flywheel is experiencing a jump discontinuity there) suggest that a spectral method may be more appropriate (aka "Harmonic Balance"; "Article 125" in Zwillinger, D., 1998. "Handbook of Differential Equations, 3rd Ed." Academic Press [highly recommended] states: "Applicable to: Nonlinear ODE's w/ periodic solutions. Yields: An approximate solution valid over the entire period. There is a specified procedure for increasing the number of terms and, hence, for increasing the accuracy." Sounds like exactly what you need...the article furnishes an external reference which I can forward if desired. I'd be remiss if I did not mention however, that spectral and finite element methods are not necessarily mutually exclusive: periodic basis functions are among those for which the FEM is well-developed.) FWIW, DG > > -- Karl > > ________________________________________ > From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On > Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] > Sent: Friday, November 20, 2009 9:10 AM > To: SciPy Users List > Subject: Re: [SciPy-User] finite element packages > > Forgive me if you provided this in the previous thread, but, for reference, > what analytic model(s) (differential equations, presumably) are you using > that led you to elliptical functions? Also, are you interested in modeling > transient (time-dependent) or steady-state (d/dt=0), stability-instability > transitions, oscillatory mode amplification and damping, etc.? Finally, are > you comparing theory w/ experiment, i.e., do you also have experimental data > you're modeling and/or using to tweak your analytic models' parameters? > > DG > > On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl karl.young at ucsf.edu>> wrote: > > I'm trying to model a flexible flywheel (hence my question about Wierstrass > elliptic functions a couple of weeks ago - thanks again for the helpful > replies). I'm now trying to consider realistic models with elastic materials > that go beyond my abilities to model analytically and figured I need to look > at finite element models. > > I haven't used finite element packages and was wondering if anyone on the > list had any recommendations, preferably scipythonic but I'm just curious > generally about what people would consider using for a problem like this > (i.e. a rotating flexible rope type problem). Thanks for any thoughts, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david.trem at gmail.com Fri Nov 20 15:26:06 2009 From: david.trem at gmail.com (David Trem) Date: Fri, 20 Nov 2009 21:26:06 +0100 Subject: [SciPy-User] sinc interpolation Message-ID: <4B06FB5E.8070806@gmail.com> Hello, Is sinc interpolation available in Scipy ? I've just ask this question to Travis Oliphant during the entought webinar that had just ended but unfortunately I was not able to ear the reply due to poor sound quality just at that moment :-( Hope someone could give me his or a reply to this question. Thanks, David From karl.young at ucsf.edu Fri Nov 20 16:21:41 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Fri, 20 Nov 2009 13:21:41 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu>, <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> Message-ID: <72BBA065386338429D2C4E83E442CD4E28BB249592@EX02.net.ucsf.edu> Hi David, I was assuming that I'd have to just abandon the analytical form if I included elasticity so I didn't think to include the differential equation that I got. I don't have it handy but it was something pretty simple like y(x)'' - c * y(x)^3 = 0 and based on whether I included a couple of approximations or not there was a first derivative term as well; y is the radial extent of the loop and x is angle. My configuration was a little odd in that there were 4 "spokes" for the flywheel, which were just more of the rope tied together and the loopy part consisted of lengths that were longer than circular arcs (a configuration that he found empirically to be more stable). So I only accounted for 1 quarter of the loop (between spokes) and my boundary conditions were just y(0) = L, y(pi/2) = L where L is length of the "spoke". Re. generalizing to account for elasticity I found a nice paper that analyzed the catenary problem for "Neo-Hookean" materials (sort of the next step in sophistication from modeling deformation with Hooke's law, e.g. accounts for change in cross section as a function of stretching - though I'm sure you know about that already) and figured I'd start with that. Since I haven't done any finite element modeling I assumed I could just start with a model per element that included forces, boundary conditions, and elasticity parameters and get a numerical solution. Thanks much for the suggestion re. spectral methods I will definitely try to run down a copy of Zwillinger's article and take a look. -- Karl ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] Sent: Friday, November 20, 2009 11:49 AM To: SciPy Users List Subject: Re: [SciPy-User] finite element packages On Fri, Nov 20, 2009 at 10:06 AM, Young, Karl > wrote: Hi David, Thanks for the quick reply. I'm at a fairly early stage with this and so it's still fairly exploratory. That said I guess the main goal is to help my friend, who already has a working prtotype of a flexible flywheel, model and balance various parameter choices like speed of the flywheel, deformation of the wheel based on parameters associated with various material choices,... 
I obtained my analytic model by appropriately modifying the force diagram from a paper on the "skipping rope" problem; I obtained a nonlinear differential equation for the form of the loops of the flywheel that had elliptic functions as solutions. To first order I'm hoping that I can do some useful static modeling, i.e. in the rotating frame, even with more realistic parameters for the loop material, i.e. I guess the answer to the question is that my initial interest is in steady-state models (though I guess at some point it would be nice to study spin up and spin down). Again, to first order I'm not that concerned about looking at stability-instability transitions or oscillatory mode amplification and damping because my friend has a working prototype that seems to be pretty deeply in a stable range, at least re. variation in rotation speeds. The hope is that I can model the system in a way such that small changes in things like material parameters won't effect the stability regime (the flexible flywheel, combined with a fancy gimbal system seems to have a sort of surprisingly large stability range, re. parameters like rotation speeds and loop radius). But I may need to eventually model oscillatory modes and stability transitions re. use of some materials for the loop. The first goal will be to compare the model/simulations with his prototype, i.e. experiment (e.g. we may take pictures as in some of the skipping rope papers). Maybe my approach sounds silly; it's very preliminary and exploratory. Physicists (and particularly me) are probably too dumb to think about hard mechanical engineering problems ! No, but there is one key factor you're unclear as to how you're modeling, which an ME would consider among the first things to model, namely, a model for the elasticity of the "flexible material": how the flywheel deforms due to centripetal acceleration will clearly affect its moment of inertia, affecting its rotational momentum and kinetic energy, and in turn its elastic potential energy; elastic damping sounds like it is also important. In any event, I was hoping you'd supply the actual non-linear DE(s), as the FEM is not always well-suited to such problems: depending on the nature of the nonlinearities and your choice of basis functions, completing the required integration by parts may be intractable (or prohibitively difficult for a first iteration in an "exploratory" investigation). In particular, the physically-required periodicity of your solutions (whatever your solutions are at theta=0, they have to be the same at theta=2pi, unless your flywheel is experiencing a jump discontinuity there) suggest that a spectral method may be more appropriate (aka "Harmonic Balance"; "Article 125" in Zwillinger, D., 1998. "Handbook of Differential Equations, 3rd Ed." Academic Press [highly recommended] states: "Applicable to: Nonlinear ODE's w/ periodic solutions. Yields: An approximate solution valid over the entire period. There is a specified procedure for increasing the number of terms and, hence, for increasing the accuracy." Sounds like exactly what you need...the article furnishes an external reference which I can forward if desired. I'd be remiss if I did not mention however, that spectral and finite element methods are not necessarily mutually exclusive: periodic basis functions are among those for which the FEM is well-developed.) 
FWIW, DG -- Karl ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] Sent: Friday, November 20, 2009 9:10 AM To: SciPy Users List Subject: Re: [SciPy-User] finite element packages Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? DG On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl >> wrote: I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). Thanks for any thoughts, -- Karl _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org> http://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From ferrell at diablotech.com Fri Nov 20 17:55:47 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Fri, 20 Nov 2009 15:55:47 -0700 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <6324058A-FD14-47BF-9308-9BB5D123CA3A@gmail.com> References: <6324058A-FD14-47BF-9308-9BB5D123CA3A@gmail.com> Message-ID: On Nov 18, 2009, at 5:40 PM, Pierre GM wrote: > > On Nov 18, 2009, at 6:13 PM, Rich E wrote: > >> Hi list, >> >> I just joined because I've had this mac for over a month and I >> still can't get a working module of scipy in Snow Leopard. The dmg >> says it needs 'python 2.6 or newer', but the one that comes with >> Snow Leopard is 2.6.1. > > So you're set: 2.6.1 is more recent than 2.6... > But you probably shouldn't use a dmg: install Scipy from sources, > it's far easier to help you. > Assuming you have xcode installed, and a proper gfortran (I think > this one is the recommended one: http://r.research.att.com/tools/) > > * Install numpy first. make a local install by using the --user flag > when calling python setup.py install. No need to install an > additional Python if you use --user, you won't be messing with your > system Is there documentation for --user? http://docs.python.org/install/index.html documents --home. Is --user equivalent to --home=~? thanks, -robert > > * Then, install scipy, using the --user flag as well. Don't bother > for UMFPACK for the moment > > * You may want to set CFLAGS="-arch x86_64" before installing numpy > and scipy. > > * Let me know where your problems are (off-list for now to reduce > the noise), post the log of your build somewhere. 
> > Don't worry, it's straightforward, provided you stick to the Python > that comes w/ SnowLeopard > Good luck > P. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From emmanuelle.gouillart at normalesup.org Sat Nov 21 04:24:24 2009 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Sat, 21 Nov 2009 10:24:24 +0100 Subject: [SciPy-User] finite element packages In-Reply-To: <72BBA065386338429D2C4E83E442CD4E28BB249592@EX02.net.ucsf.edu> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> <72BBA065386338429D2C4E83E442CD4E28BB249592@EX02.net.ucsf.edu> Message-ID: <20091121092424.GA29506@phare.normalesup.org> Hello Karl, you may have already found it by googling "python finite elements", but sfepy (http://code.google.com/p/sfepy/) has a good reputation (I haven't used it myself, though). Check the examples page to see if it can suit your needs. Cheers, Emmanuelle On Fri, Nov 20, 2009 at 01:21:41PM -0800, Young, Karl wrote: > Hi David, > I was assuming that I'd have to just abandon the analytical form if I included elasticity so I didn't think to include the differential equation that I got. I don't have it handy but it was something pretty simple like y(x)'' - c * y(x)^3 = 0 and based on whether I included a couple of approximations or not there was a first derivative term as well; y is the radial extent of the loop and x is angle. My configuration was a little odd in that there were 4 "spokes" for the flywheel, which were just more of the rope tied together and the loopy part consisted of lengths that were longer than circular arcs (a configuration that he found empirically to be more stable). So I only accounted for 1 quarter of the loop (between spokes) and my boundary conditions were just y(0) = L, y(pi/2) = L where L is length of the "spoke". > Re. generalizing to account for elasticity I found a nice paper that analyzed the catenary problem for "Neo-Hookean" materials (sort of the next step in sophistication from modeling deformation with Hooke's law, e.g. accounts for change in cross section as a function of stretching - though I'm sure you know about that already) and figured I'd start with that. Since I haven't done any finite element modeling I assumed I could just start with a model per element that included forces, boundary conditions, and elasticity parameters and get a numerical solution. > Thanks much for the suggestion re. spectral methods I will definitely try to run down a copy of Zwillinger's article and take a look. > -- Karl > ________________________________________ > From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] > Sent: Friday, November 20, 2009 11:49 AM > To: SciPy Users List > Subject: Re: [SciPy-User] finite element packages > On Fri, Nov 20, 2009 at 10:06 AM, Young, Karl > wrote: > Hi David, > Thanks for the quick reply. I'm at a fairly early stage with this and so it's still fairly exploratory. That said I guess the main goal is to help my friend, who already has a working prtotype of a flexible flywheel, model and balance various parameter choices like speed of the flywheel, deformation of the wheel based on parameters associated with various material choices,... 
> I obtained my analytic model by appropriately modifying the force diagram from a paper on the "skipping rope" problem; I obtained a nonlinear differential equation for the form of the loops of the flywheel that had elliptic functions as solutions. To first order I'm hoping that I can do some useful static modeling, i.e. in the rotating frame, even with more realistic parameters for the loop material, i.e. I guess the answer to the question is that my initial interest is in steady-state models (though I guess at some point it would be nice to study spin up and spin down). > Again, to first order I'm not that concerned about looking at stability-instability transitions or oscillatory mode amplification and damping because my friend has a working prototype that seems to be pretty deeply in a stable range, at least re. variation in rotation speeds. The hope is that I can model the system in a way such that small changes in things like material parameters won't effect the stability regime (the flexible flywheel, combined with a fancy gimbal system seems to have a sort of surprisingly large stability range, re. parameters like rotation speeds and loop radius). But I may need to eventually model oscillatory modes and stability transitions re. use of some materials for the loop. > The first goal will be to compare the model/simulations with his prototype, i.e. experiment (e.g. we may take pictures as in some of the skipping rope papers). > Maybe my approach sounds silly; it's very preliminary and exploratory. Physicists (and particularly me) are probably too dumb to think about hard mechanical engineering problems ! > No, but there is one key factor you're unclear as to how you're modeling, which an ME would consider among the first things to model, namely, a model for the elasticity of the "flexible material": how the flywheel deforms due to centripetal acceleration will clearly affect its moment of inertia, affecting its rotational momentum and kinetic energy, and in turn its elastic potential energy; elastic damping sounds like it is also important. In any event, I was hoping you'd supply the actual non-linear DE(s), as the FEM is not always well-suited to such problems: depending on the nature of the nonlinearities and your choice of basis functions, completing the required integration by parts may be intractable (or prohibitively difficult for a first iteration in an "exploratory" investigation). In particular, the physically-required periodicity of your solutions (whatever your solutions are at theta=0, they have to be the same at theta=2pi, unless your flywheel is experiencing a j > ump discontinuity there) suggest that a spectral method may be more appropriate (aka "Harmonic Balance"; "Article 125" in Zwillinger, D., 1998. "Handbook of Differential Equations, 3rd Ed." Academic Press [highly recommended] states: "Applicable to: Nonlinear ODE's w/ periodic solutions. Yields: An approximate solution valid over the entire period. There is a specified procedure for increasing the number of terms and, hence, for increasing the accuracy." Sounds like exactly what you need...the article furnishes an external reference which I can forward if desired. I'd be remiss if I did not mention however, that spectral and finite element methods are not necessarily mutually exclusive: periodic basis functions are among those for which the FEM is well-developed.) 
> FWIW, > DG > -- Karl > ________________________________________ > From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] > Sent: Friday, November 20, 2009 9:10 AM > To: SciPy Users List > Subject: Re: [SciPy-User] finite element packages > Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? > DG > On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl >> wrote: > I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. > I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). Thanks for any thoughts, > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org> > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Sat Nov 21 05:18:46 2009 From: sturla at molden.no (Sturla Molden) Date: Sat, 21 Nov 2009 11:18:46 +0100 Subject: [SciPy-User] sinc interpolation In-Reply-To: <4B06FB5E.8070806@gmail.com> References: <4B06FB5E.8070806@gmail.com> Message-ID: <4B07BE86.6000006@molden.no> I have a least-sqaures interpolator similar to Matlab's interp function. Basically it just constructs a FIR filter that can be used with scipy.signal.lfilter. Also you can use FFTs for interpolation. Just rfft the signal, append zeros, and invert the transform. Sturla David Trem skrev: > Hello, > > Is sinc interpolation available in Scipy ? > > I've just ask this question to Travis Oliphant during > the entought webinar that had just ended but unfortunately > I was not able to ear the reply due to poor sound quality just at > that moment :-( > Hope someone could give me his or a reply to this question. > > Thanks, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From silva at lma.cnrs-mrs.fr Sat Nov 21 07:42:19 2009 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Sat, 21 Nov 2009 13:42:19 +0100 Subject: [SciPy-User] sinc interpolation In-Reply-To: <4B06FB5E.8070806@gmail.com> References: <4B06FB5E.8070806@gmail.com> Message-ID: <1258807340.2525.0.camel@PCTerrusse> Le vendredi 20 novembre 2009 ? 21:26 +0100, David Trem a ?crit : > Hello, > > Is sinc interpolation available in Scipy ? 
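(To illustrate the FFT route Sturla mentions above, here is a rough sketch. It assumes a real-valued, uniformly sampled signal and an integer oversampling factor; it is only an illustration, not code taken from scipy or from Sturla's interpolator. scipy.signal.resample performs this kind of Fourier-domain resampling as well.)

import numpy as np

def fft_resample(x, factor):
    # Sinc (Fourier) interpolation of a real, uniformly sampled signal:
    # zero-pad the one-sided spectrum and invert the transform.
    n = len(x)
    X = np.fft.rfft(x)
    X_pad = np.zeros(factor * n // 2 + 1, dtype=complex)
    X_pad[:len(X)] = X
    # irfft normalizes by the new length, so rescale to keep the amplitude;
    # the Nyquist bin is copied as-is, which is fine for a quick sketch.
    return np.fft.irfft(X_pad, factor * n) * factor

# e.g. interpolate one period of a sine wave onto a 4x finer grid
y = fft_resample(np.sin(2 * np.pi * 5 * np.arange(64) / 64.0), 4)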
David Cournapeau has a scikit for that : http://pypi.python.org/pypi/scikits.samplerate/ -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051) From stefan at sun.ac.za Sat Nov 21 09:38:54 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 21 Nov 2009 16:38:54 +0200 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: Message-ID: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> Hi Zach 2009/11/19 Zachary Pincus : > A bit off-topic, but before I write some C or cython to do this, I > thought I'd ask to see if anyone knows of existing code for the task > of finding the shortest (weighted) path between two points on a lattice. This is what the shortest path routine in scikits.image is meant to do. How can we modify it to make it more useful to you? Cheers St?fan From zachary.pincus at yale.edu Sat Nov 21 10:06:57 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 21 Nov 2009 10:06:57 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> Message-ID: <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> > 2009/11/19 Zachary Pincus : >> A bit off-topic, but before I write some C or cython to do this, I >> thought I'd ask to see if anyone knows of existing code for the task >> of finding the shortest (weighted) path between two points on a >> lattice. > > This is what the shortest path routine in scikits.image is meant to > do. How can we modify it to make it more useful to you? Hi St?fan, Based on just a rudimentary perusal of that code, I thought it only found the lowest-cost path from the left to the right of an array... is this still the case? I'd been needing something to go from an arbitrary point to any other arbitrary point. I just started working on some cython code that will compute the shortest path (under a given length) from any point to any other. It's not quite Dijkstra (for which one needs to keep a sorted list of the next pixels to visit), but more like breadth-first search to a given depth. I'll be happy to send it over if that sort of thing sounds useful. Zach From zachary.pincus at yale.edu Sat Nov 21 16:52:48 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 21 Nov 2009 16:52:48 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> Message-ID: <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> OK, here's what I have. Not Dijkstra's algorithm, but very simple and not bad for many purposes. You pass in a 2D costs array, start- and end-points, and a maximum number of iterations; the code then keeps track of the minimum cumulative cost to each pixel from the starting point, as well as the path thereto. It does this by keeping track of "active" pixels -- any time a lower cumulative cost to a given pixel is found, that pixel is made active. Each iteration, all the neighbors of the "active" pixels are examined to see if their costs can be lowered too. Basically breadth-first search. Limitations and oddities: - Currently, diagonal and vertical/horizontal steps are both allowed. Easy enough to make this a parameter. 
- Paths along the boundary aren't traced out because I didn't want to deal with an if-check in the inner loop to make sure that the x,y position plus the current offset wasn't out of bounds. This could be addressed by (a) padding the input array by one pixel on each side, (b) putting the if in the inner loop, or (c) having a second pass through the edge pixels. - In theory, the code could find the cheapest path from top-left to bottom-right in a single pass because "active" pixels are marked immediately as the code iterates through the array. So the max_iters parameter doesn't guarantee that paths longer than that will not be found. But it does guarantee that any path found less than that length is optimal... Let's say it's BSD licensed, in case anyone finds it of use. Zach -------------- next part -------------- A non-text attachment was scrubbed... Name: trace_path.zip Type: application/zip Size: 1263 bytes Desc: not available URL: -------------- next part -------------- From gokhansever at gmail.com Sat Nov 21 16:57:10 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sat, 21 Nov 2009 15:57:10 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Message-ID: <49d6b3500911211357h4b4870c8q7355c2d32db55fe6@mail.gmail.com> One more update on this subject. I have been looking through some of the papers on this topic, and I have finally found exactly what I need in this paper: Hussein, T., Dal Maso, M., Petaja, T., Koponen, I. K., Paatero, P., Aalto, P. P., Hameri, K., and Kulmala, M.: Evaluation of an automatic algorithm for ?tting the particle number size distributions, Boreal Environ. Res., 10, 337?355, 2005. Here is the abstract: "The multi log-normal distribution function is widely in use to parameterize the aerosol particle size distributions. The main purpose of such a parameterization is to quantitatively describe size distributions and to allow straightforward comparisons between different aerosol particle data sets. In this study, we developed and evaluated an algorithm to parameterize aerosol particle number size distributions with the multi log-normal distribution function. The current algorithm is automatic and does not need a user decision for the initial input parameters; it requires only the maximum number of possible modes and then it reduces this number, if possible, without affecting the fitting quality. The reduction of the number of modes is based on an overlapping test between adjacent modes. The algorithm was evaluated against a previous algorithm that can be considered as a standard procedure. It was also evaluated against a long-term data set and different types of measured aerosol particle size distributions in the ambient atmosphere. The evaluation of the current algorithm showed the following advantages: (I) it is suitable for different types of aerosol particles observed in different environments and conditions, (2) it showed agreement with the previous standard algorithm in about 90% of long-term data set, (3) it is not time-consuming, particularly when long-term data sets are analyzed, and (4) it is a useful tool in the studies of atmospheric aerosol particle formation and transformation." 
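(As a minimal starting point before the multi-modal algorithm of the paper: a single log-normal mode can be fit with scipy.optimize.curve_fit roughly as below. The numbers and the noise are made up for illustration; curve_fit needs a recent scipy, and optimize.leastsq can be used the same way on older versions.)

import numpy as np
from scipy.optimize import curve_fit

def lognormal_mode(dp, n_total, dpg, sigma_g):
    # dN/dlogDp for one log-normal mode; dp and dpg in micrometers
    return (n_total / (np.sqrt(2 * np.pi) * np.log10(sigma_g)) *
            np.exp(-(np.log10(dp) - np.log10(dpg)) ** 2 /
                   (2 * np.log10(sigma_g) ** 2)))

dp = np.logspace(-1, np.log10(3.0), 30)        # 0.1 - 3.0 micrometer bins
data = lognormal_mode(dp, 1500.0, 0.4, 1.8)    # synthetic "measurements"
data *= 1 + 0.05 * np.random.randn(dp.size)    # with a little noise

popt, pcov = curve_fit(lognormal_mode, dp, data, p0=[1000.0, 0.3, 2.0])

# the fitted parameters can then be evaluated outside the measured range,
# e.g. to extrapolate dN/dlogDp down to 10 nm (0.01 micrometer)
dp_ext = np.logspace(-2, np.log10(3.0), 100)
extrapolated = lognormal_mode(dp_ext, *popt)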
The full-text is freely available at: http://www.borenv.net/BER/pdfs/ber10/ber10-337.pdf On Mon, Nov 16, 2009 at 11:44 PM, G?khan Sever wrote: > Hello, > > I have a data which represents aerosol size distribution in between 0.1 to > 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 > nm. The data in this context is log-normally distributed. Therefore I am > looking a way to fit a log-normal curve onto my data. Could you please give > me some pointers to solve this problem? > > Thank you. > > -- > G?khan > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gruben at bigpond.net.au Sat Nov 21 19:48:27 2009 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun, 22 Nov 2009 11:48:27 +1100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> Message-ID: <4B088A5B.8000604@bigpond.net.au> Hi Zach, I haven't looked at your code, but your description sounds like you've got a very nice solution. When you originally asked this, I immediately thought of Lee's algorithm, or Jarvis's distance-transform based path planning, which uses a modified distance transform that fixes the start and goal point costs. I didn't mention them because they don't cover to your case, but I think your solution is a more general case or theirs - i.e. you can use yours for navigation/maze solving by setting the obstacle/wall values to something greater than the maximum distance to the goal and the floor values to 0. I think would be a very nice, general routine for scikits.image, Gary R. Zachary Pincus wrote: > OK, here's what I have. Not Dijkstra's algorithm, but very simple and > not bad for many purposes. > > You pass in a 2D costs array, start- and end-points, and a maximum > number of iterations; the code then keeps track of the minimum > cumulative cost to each pixel from the starting point, as well as the > path thereto. It does this by keeping track of "active" pixels -- any > time a lower cumulative cost to a given pixel is found, that pixel is > made active. Each iteration, all the neighbors of the "active" pixels > are examined to see if their costs can be lowered too. Basically > breadth-first search. > > Limitations and oddities: > - Currently, diagonal and vertical/horizontal steps are both allowed. > Easy enough to make this a parameter. > - Paths along the boundary aren't traced out because I didn't want to > deal with an if-check in the inner loop to make sure that the x,y > position plus the current offset wasn't out of bounds. This could be > addressed by (a) padding the input array by one pixel on each side, (b) > putting the if in the inner loop, or (c) having a second pass through > the edge pixels. > - In theory, the code could find the cheapest path from top-left to > bottom-right in a single pass because "active" pixels are marked > immediately as the code iterates through the array. So the max_iters > parameter doesn't guarantee that paths longer than that will not be > found. But it does guarantee that any path found less than that length > is optimal... > > Let's say it's BSD licensed, in case anyone finds it of use. 
> > Zach > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From zachary.pincus at yale.edu Sat Nov 21 23:45:26 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 21 Nov 2009 23:45:26 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <4B088A5B.8000604@bigpond.net.au> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> Message-ID: <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> Thanks Gary! Attached is a simplified version that addresses all of the caveats I had earlier, plus I added documentation and input-checking. Boundaries are now handled properly (the bounds-checking is not too slow, even on huge arrays), and the code now iterates until all paths have been fully traced. Still BSD licensed; if it might be useful to scikits.image, please feel free to include it. Here's a simple "maze" solving example / test. >>> import numpy >>> import trace_path >>> a = numpy.ones((8,8), dtype=numpy.float32) >>> a[1:-1,1] = 0 >>> a[1,1:-1] = 0 >>> a array([[ 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 0., 0., 0., 0., 0., 0., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32) >>> trace_path.trace_path(a, (1, 6), [(7, 2)]) (array([[ 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 0., 0., 0., 0., 0., 0., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 2., 2., 2., 2., 2.], [ 1., 0., 1., 2., 3., 3., 3., 3.], [ 1., 0., 1., 2., 3., 4., 4., 4.], [ 1., 0., 1., 2., 3., 4., 5., 5.], [ 1., 1., 1., 2., 3., 4., 5., 6.]], dtype=float32), [[(1, 6), (1, 5), (1, 4), (1, 3), (1, 2), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 2)]]) >>> trace_path.trace_path(a, (1, 6), [(7, 2)], diagonal_steps=False) (array([[ 2., 1., 1., 1., 1., 1., 1., 2.], [ 1., 0., 0., 0., 0., 0., 0., 1.], [ 1., 0., 1., 1., 1., 1., 1., 2.], [ 1., 0., 1., 2., 2., 2., 2., 3.], [ 1., 0., 1., 2., 3., 3., 3., 4.], [ 1., 0., 1., 2., 3., 4., 4., 5.], [ 1., 0., 1., 2., 3., 4., 5., 6.], [ 2., 1., 2., 3., 4., 5., 6., 7.]], dtype=float32), [[(1, 6), (1, 5), (1, 4), (1, 3), (1, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (6, 2), (7, 2)]]) Zach On Nov 21, 2009, at 7:48 PM, Gary Ruben wrote: > Hi Zach, > > I haven't looked at your code, but your description sounds like you've > got a very nice solution. When you originally asked this, I > immediately > thought of Lee's algorithm, or Jarvis's distance-transform based path > planning, which uses a modified distance transform that fixes the > start > and goal point costs. I didn't mention them because they don't cover > to > your case, but I think your solution is a more general case or > theirs - > i.e. you can use yours for navigation/maze solving by setting the > obstacle/wall values to something greater than the maximum distance to > the goal and the floor values to 0. > > I think would be a very nice, general routine for scikits.image, > > Gary R. > > Zachary Pincus wrote: >> OK, here's what I have. Not Dijkstra's algorithm, but very simple and >> not bad for many purposes. 
>> >> You pass in a 2D costs array, start- and end-points, and a maximum >> number of iterations; the code then keeps track of the minimum >> cumulative cost to each pixel from the starting point, as well as the >> path thereto. It does this by keeping track of "active" pixels -- any >> time a lower cumulative cost to a given pixel is found, that pixel is >> made active. Each iteration, all the neighbors of the "active" pixels >> are examined to see if their costs can be lowered too. Basically >> breadth-first search. >> >> Limitations and oddities: >> - Currently, diagonal and vertical/horizontal steps are both allowed. >> Easy enough to make this a parameter. >> - Paths along the boundary aren't traced out because I didn't want to >> deal with an if-check in the inner loop to make sure that the x,y >> position plus the current offset wasn't out of bounds. This could be >> addressed by (a) padding the input array by one pixel on each side, >> (b) >> putting the if in the inner loop, or (c) having a second pass through >> the edge pixels. >> - In theory, the code could find the cheapest path from top-left to >> bottom-right in a single pass because "active" pixels are marked >> immediately as the code iterates through the array. So the max_iters >> parameter doesn't guarantee that paths longer than that will not be >> found. But it does guarantee that any path found less than that >> length >> is optimal... >> >> Let's say it's BSD licensed, in case anyone finds it of use. >> >> Zach >> >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- A non-text attachment was scrubbed... Name: trace_path.zip Type: application/zip Size: 2025 bytes Desc: not available URL: From stefan at sun.ac.za Sun Nov 22 06:00:57 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Nov 2009 13:00:57 +0200 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> Message-ID: <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> Hi Zach 2009/11/22 Zachary Pincus : > Boundaries are now handled properly (the bounds-checking is not too > slow, even on huge arrays), and the code now iterates until all paths > have been fully traced. Still BSD licensed; if it might be useful to > scikits.image, please feel free to include it. This code looks really handy, and I'd love to add it. Would you consider putting your code in a branch on github? Simply go to the following URL and click "fork": http://github.com/stefanv/scikits.image Add your changes, push back to github and click the button "merge request", then I'll make sure it gets merged to the main branch. Thanks! 
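(Since the trace_path attachments are scrubbed from the archive, here is a rough, generic sketch of the kind of approach discussed in this thread: a heap-based Dijkstra variant for minimum cumulative cost paths on a 2D lattice, where each pixel's value is taken to be the cost of stepping onto it. This is an illustration only, not the attached trace_path code.)

import heapq
import numpy as np

def min_cost_path(costs, start, end):
    # Dijkstra over the 8-connected lattice; start and end are (row, col)
    # tuples, and end is assumed to be reachable.
    costs = np.asarray(costs, dtype=float)
    dist = np.full(costs.shape, np.inf)
    prev = {}
    dist[start] = costs[start]
    heap = [(dist[start], start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == end:
            break
        if d > dist[i, j]:
            continue                      # stale heap entry
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if ((di or dj) and 0 <= ni < costs.shape[0]
                        and 0 <= nj < costs.shape[1]):
                    nd = d + costs[ni, nj]
                    if nd < dist[ni, nj]:
                        dist[ni, nj] = nd
                        prev[ni, nj] = (i, j)
                        heapq.heappush(heap, (nd, (ni, nj)))
    # walk back from end to start to recover the path
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return dist[end], path[::-1]

# usage: total, path = min_cost_path(cost_array, (0, 0), (5, 7))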
Stéfan From stefan at sun.ac.za Sun Nov 22 06:07:14 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Nov 2009 13:07:14 +0200 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> Message-ID: <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> 2009/11/22 Stéfan van der Walt : > This code looks really handy, and I'd love to add it. Would you > consider putting your code in a branch on github? Actually, don't worry -- I'll add it quickly. Thanks for the contribution! Cheers Stéfan From cedrick.faury at freesbee.fr Sun Nov 22 07:50:47 2009 From: cedrick.faury at freesbee.fr (=?ISO-8859-1?Q?C=E9drick_FAURY?=) Date: Sun, 22 Nov 2009 13:50:47 +0100 Subject: [SciPy-User] Incoherent results with signal.impulse Message-ID: <4B0933A7.6020708@freesbee.fr> Hello, I have scipy 0.7.1, python 2.6, and when I do : n = scipy.array([1]) d = scipy.array([0.01, 0.2, 1.0]) T, yout = scipy.signal.impulse((n,d)) it gives incoherent results for yout. And that doesn't occur with a [1.0, 2.0, 1.0] denominator. Is it a bug? Am I doing something wrong? Does anybody know a solution? Thanks in advance Cédrick FAURY -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon+python at a-oben.org Sun Nov 22 13:42:26 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Sun, 22 Nov 2009 19:42:26 +0100 Subject: [SciPy-User] Mailing list? Message-ID: <20091122184226.GA9227@a-oben.org> Hi everybody, I sent a message about the K-Means algorithm a couple of days ago but it seems like it never made it on the list. Are new members moderated or something? Best Simon From simon+python at a-oben.org Sun Nov 22 13:46:21 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Sun, 22 Nov 2009 19:46:21 +0100 Subject: [SciPy-User] Mailing list? In-Reply-To: <20091122184226.GA9227@a-oben.org> References: <20091122184226.GA9227@a-oben.org> Message-ID: <20091122184621.GB9227@a-oben.org> Ok, apparently this message got through, so that answers my question. Here is my original message. Sorry for the confusion. Good Night Everybody, I just looked at the documentation for the K-Means vector quantization functions and I am a bit confused. On the one hand it says that normalization to unit variance would be beneficial; on the other hand there are a lot of "must"s in the descriptions. I was wondering if it is possible to use the functions without normalization or if there is a negative impact. This would make sense because it seems reasonable that one would want to build the codebook on some set and then quantize a different set. In this case the normalization would have to be the same or be omitted. I am also interested in literature recommendations concerning why this is a good idea in general. Any help would be greatly appreciated. Best Simon On 19:42 Sun 22.11.09, Simon Friedberger wrote: > Hi everybody, > > I sent a message about the K-Means algorithm a couple of days ago but it > seems like it never made it on the list. Are new members moderated or > something?
> > Best > Simon > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From warren.weckesser at enthought.com Sun Nov 22 13:59:09 2009 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 22 Nov 2009 12:59:09 -0600 Subject: [SciPy-User] Incoherent results with signal.impulse In-Reply-To: <4B0933A7.6020708@freesbee.fr> References: <4B0933A7.6020708@freesbee.fr> Message-ID: <4B0989FD.4060809@enthought.com> C?drick FAURY wrote: > Hello, > > I have scipy 0.7.1, python 2.6, and when I do : > > n = scipy.array([1]) > d = scipy.array([0.01, 0.2, 1.0]) > T, yout = scipy.signal.impulse((n,d)) > > it gives incoherent results for yout. > > And that doesn't occurs with [1.0, 2.0, 1.0] denominator. > > Is it a bug ? > I'm doing something wrong ? > Is anybody knows a solution ? Hi C?drick, scipy.signal.impulse assumes that the state matrix A is diagonalizable, so it does not give a correct result when A is defective. I would call that a bug. :) The attached file contains the function impulse_response() that uses a different method to compute the impulse response. If run as a script, the code at the bottom of the file plots impulse responses computed by impulse_response() and by scipy.signal.impulse() for your example, and for two other values of the leading coefficient of your denominator. Warren > Thanks by advance > > C?drick FAURY > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: impulse_response.py URL: From cedrick.faury at freesbee.fr Sun Nov 22 14:31:00 2009 From: cedrick.faury at freesbee.fr (=?ISO-8859-1?Q?C=E9drick_FAURY?=) Date: Sun, 22 Nov 2009 20:31:00 +0100 Subject: [SciPy-User] Incoherent results with signal.impulse References: 4B0933A7.6020708@freesbee.fr Message-ID: <4B099174.8020800@freesbee.fr> > > >/ Hello, > />/ > />/ I have scipy 0.7.1, python 2.6, and when I do : > />/ > />/ n = scipy.array([1]) > />/ d = scipy.array([0.01, 0.2, 1.0]) > />/ T, yout = scipy.signal.impulse((n,d)) > />/ > />/ it gives incoherent results for yout. > />/ > />/ And that doesn't occurs with [1.0, 2.0, 1.0] denominator. > />/ > />/ Is it a bug ? > />/ I'm doing something wrong ? > />/ Is anybody knows a solution ? > / > > scipy.signal.impulse assumes that the state matrix A is diagonalizable, > so it does not give a correct result when A is defective. I would call > that a bug. :) > > The attached file contains the function impulse_response() that uses a > different method to compute the impulse response. If run as a script, > the code at the bottom of the file plots impulse responses computed by > impulse_response() and by scipy.signal.impulse() for your example, and > for two other values of the leading coefficient of your denominator. > Thank you very much, it works fine now ! Actualy, if i'm right, the solution is to use lsim2 ? 
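For reference, a rough sketch of the lsim2-based workaround (the attached impulse_response.py is not shown here, so the function name, time grid and details are only a guess at the idea confirmed in the reply that follows: simulate the zero-input response starting from x(0) = B, which for a strictly proper system equals C*exp(A*t)*B, i.e. the impulse response):

import numpy as np
from scipy import signal

def impulse_via_lsim2(num, den, T=None):
    # impulse response as the zero-input response from x(0) = B
    # (assumes D = 0, i.e. a strictly proper transfer function)
    sys = signal.lti(num, den)
    if T is None:
        T = np.linspace(0, 10.0, 1001)
    U = np.zeros_like(T)
    T, yout, xout = signal.lsim2(sys, U, T, X0=np.squeeze(sys.B))
    return T, yout

T, y = impulse_via_lsim2([1.0], [0.01, 0.2, 1.0])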
C?drick From warren.weckesser at enthought.com Sun Nov 22 14:38:29 2009 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 22 Nov 2009 13:38:29 -0600 Subject: [SciPy-User] Incoherent results with signal.impulse In-Reply-To: <4B099174.8020800@freesbee.fr> References: 4B0933A7.6020708@freesbee.fr <4B099174.8020800@freesbee.fr> Message-ID: <4B099335.7070200@enthought.com> C?drick FAURY wrote: >>> / Hello, >>> >> />/ >> />/ I have scipy 0.7.1, python 2.6, and when I do : >> />/ >> />/ n = scipy.array([1]) >> />/ d = scipy.array([0.01, 0.2, 1.0]) >> />/ T, yout = scipy.signal.impulse((n,d)) >> />/ >> />/ it gives incoherent results for yout. >> />/ >> />/ And that doesn't occurs with [1.0, 2.0, 1.0] denominator. >> />/ >> />/ Is it a bug ? >> />/ I'm doing something wrong ? >> />/ Is anybody knows a solution ? >> / >> >> scipy.signal.impulse assumes that the state matrix A is diagonalizable, >> so it does not give a correct result when A is defective. I would call >> that a bug. :) >> >> The attached file contains the function impulse_response() that uses a >> different method to compute the impulse response. If run as a script, >> the code at the bottom of the file plots impulse responses computed by >> impulse_response() and by scipy.signal.impulse() for your example, and >> for two other values of the leading coefficient of your denominator. >> >> > Thank you very much, it works fine now ! > Actualy, if i'm right, the solution is to use lsim2 ? > Yes, it uses lsim2, with the input U all zeros, and with the initial condition set to the B matrix (plus the optional X0, if given). Warren > C?drick > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 23 00:43:57 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Nov 2009 00:43:57 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats Message-ID: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Following up on a question by Keith on the numpy list and his reminder that covariance can be calculated by the cross-product minus the product of the means, I redid and enhanced my moving stats functions. Suppose x and y are two time series, then the moving correlation requires the calculation of the mean, variance and covariance for each window. Currently in scipy stats intermediate results are usually thrown away on return (while rpy/R returns all intermediate results used for the calculation. Using a decorator/descriptor of Fernando written for nitime, I tried out to write the function as a class instead, so that any desired ( intermediate) calculations are only made on demand, but once they are calculated they are attached to the class as attributes or properties. This seems to be a useful "pattern". Are there any opinion for using the pattern in scipy.stats ? MovStats will currently go into statsmodels Below is the class (with cutting part of init), a full script is the attachment, including examples that test the class. 
about MovStats: y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with axis=1, should (but may not yet) work for nd arrays along any axis (signal.correlate docstring) nans are handled by dropping the corresponding observations from the window, not adding any additional observations, not tested if a window is empty because it contains only nans, nor if variance is zero (kern is intended for weighted statistics in the window but not tested yet, I still need to decide on normalization requirements) requires scipy.signal, all calculations done with signal.correlate, no loops as often, functions are one-liners all results are returned for valid observations only, initial observations with incomplete window are cut bonus: slope of moving regression of y on x, since it was trivial to add still some cleaning and documentation to do usage: ms = MovStats(x, y, axis=1) ms.yvar ms.xmean ms.yxcorr ms.yxcov ... Josef class MovStats(object): def __init__(self, y, x=None, kern=5, axis=0): self.y = y self.x = x if np.isscalar(kern): ws = kern <... snip> @OneTimeProperty def ymean(self): ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] ym = ys/self.n return ym @OneTimeProperty def yvar(self): ys2 = signal.correlate(self.y*self.y, self.kern, mode='same')[self.sslice] yvar = ys2/self.n - self.ymean**2 return yvar @OneTimeProperty def xmean(self): if self.x is None: return None else: xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice] xm = xs/self.n return xm @OneTimeProperty def xvar(self): if self.x is None: return None else: xs2 = signal.correlate(self.x*self.x, self.kern, mode='same')[self.sslice] xvar = xs2/self.n - self.xmean**2 return xvar @OneTimeProperty def yxcov(self): xys = signal.correlate(self.x*self.y, self.kern, mode='same')[self.sslice] return xys/self.n - self.ymean*self.xmean @OneTimeProperty def yxcorr(self): return self.yxcov/np.sqrt(self.yvar*self.xvar) @OneTimeProperty def yxslope(self): return self.yxcov/self.xvar -------------- next part -------------- # -*- coding: utf-8 -*- """ Created on Sat Nov 21 14:22:29 2009 Author: josef-pktd """ import numpy as np from scipy import signal class OneTimeProperty(object): """A descriptor to make special properties that become normal attributes. This is meant to be used mostly by the auto_attr decorator in this module. Author: Fernando Perez, copied from nitime """ def __init__(self,func): """Create a OneTimeProperty instance. Parameters ---------- func : method The method that will be called the first time to compute a value. Afterwards, the method's name will be a standard attribute holding the value of this computation. """ self.getter = func self.name = func.func_name def __get__(self,obj,type=None): """This will be called on attribute access on the class or instance. """ if obj is None: # Being called on the class, return the original function. This way, # introspection works on the class. 
#return func print 'class access' return self.getter val = self.getter(obj) #print "** auto_attr - loading '%s'" % self.name # dbg setattr(obj, self.name, val) return val def moving_slope(x,y): '''estimate moving slope coefficient of regression of y on x filters along axis=1, returns valid observations Todo: axis and lag options idea by John D'Errico ''' xx = np.column_stack((np.ones(x.shape), x)) pinvxx = np.linalg.pinv(xx)[1:,:] windsize = len(x) lead = windsize//2 - 1 return signal.correlate(y, pinvxx, 'full' )[:,windsize-lead:-(windsize+1*lead-2)] def corrxy(x, y, ws): # based on example by Keith d = np.nan * np.ones_like(y) for i in range(y.shape[0]): yi = y[i,:] xi = x[i,:] for j in range(ws-1, y.shape[1]): yj = yi[j+1-ws:j+1] xj = xi[j+1-ws:j+1] d[i,j] = np.corrcoef(xj, yj, bias=1)[0,1] return d x = np.sin(np.arange(20))[None,:] + np.random.randn(5, 20) #x = y**2 def movstats(y, x=None, ws=5, kind='mvcr', axis=0): ''' return moving correlation between two timeseries handles 1d or 2d data ''' kdim = [1]*y.ndim kdim[axis] = ws kern = np.ones(tuple(kdim)) sslice = [slice(None)]*y.ndim sslice[axis] = slice(ws//2, -ws//2+1) ys = signal.correlate(y, kern, mode='same')[sslice] ys2 = signal.correlate(y*y, kern, mode='same')[sslice] xs = signal.correlate(x, kern, mode='same')[sslice] xs2 = signal.correlate(x*x, kern, mode='same')[sslice] xys = signal.correlate(x*y, kern, mode='same')[sslice] n = ws ym = ys/(1.*n) xm = xs/(1.*n) yvar = ys2/(1.*n) - ym**2 xvar = xs2/(1.*n) - xm**2 xycov = xys/(1.*n) - ym*xm xycorr = xycov/np.sqrt(yvar*xvar) return xycorr class MovStats(object): def __init__(self, y, x=None, kern=5, axis=0): self.y = y self.x = x if np.isscalar(kern): ws = kern kdim = [1]*self.y.ndim #print ws, kdim, self.y.ndim kdim[axis] = ws self.kern = np.ones(tuple(kdim)) else: ws = y.shape[axis] if ((kern.ndim != self.y.ndim) or np.all([kern.shape(i) for i in self.y.ndim if not i==axis])): raise ValueError('kern has incorrect shape') self.kern = kern sslice = [slice(None)]*y.ndim sslice[axis] = slice(ws//2, -ws//2+1) self.sslice = sslice #Todo: add nan handling if self.x is None: ynotnan = ~np.isnan(self.y) else: self.x = np.copy(self.x) ynotnan = (~np.isnan(self.y))*(~np.isnan(self.x)) #ynotnan = ~np.logical_or(np.isnan(self.y), np.isnan(self.x)) self.x[~ynotnan] = 0 self.y = np.copy(self.y) self.y[~ynotnan] = 0 if ynotnan.all(): self.n = 1.0* ws else: self.n = signal.correlate(ynotnan, self.kern, mode='same')[self.sslice] @OneTimeProperty def ymean(self): ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] ym = ys/self.n return ym @OneTimeProperty def yvar(self): ys2 = signal.correlate(self.y*self.y, self.kern, mode='same')[self.sslice] yvar = ys2/self.n - self.ymean**2 return yvar @OneTimeProperty def xmean(self): if self.x is None: return None else: xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice] xm = xs/self.n return xm @OneTimeProperty def xvar(self): if self.x is None: return None else: xs2 = signal.correlate(self.x*self.x, self.kern, mode='same')[self.sslice] xvar = xs2/self.n - self.xmean**2 return xvar @OneTimeProperty def yxcov(self): xys = signal.correlate(self.x*self.y, self.kern, mode='same')[self.sslice] return xys/self.n - self.ymean*self.xmean @OneTimeProperty def yxcorr(self): return self.yxcov/np.sqrt(self.yvar*self.xvar) @OneTimeProperty def yxslope(self): return self.yxcov/self.xvar x = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) x = np.sin(np.arange(20))[None,:] + np.random.randn(5, 20) y = 1*np.arange(20)[None,:] + np.random.randn(5, 20) 
ws=5 ms = MovStats(x, y, axis=1) print dir(ms) xyc = MovStats(x, y, axis=1).yxcorr xyc_loop = corrxy(x, y, ws)[:,ws-1:] #testing #print xyc_loop #print xyc print np.corrcoef(y[0,:5],x[0,:5],bias=1) print np.corrcoef(y[0,2:7],x[0,2:7],bias=1) print np.corrcoef(y[1,:5],x[1,:5],bias=1) print np.corrcoef(y[-1,-5:],x[-1,-5:],bias=1) print 'maxabsdiff', np.max(np.abs(xyc_loop - xyc)) print 'test yxolsslope' from scipy import stats print stats.linregress(y[0,:5],x[0,:5])[0] print stats.linregress(y[0,2:7],x[0,2:7])[0] print stats.linregress(y[-1,-5:],x[-1,-5:])[0] print ms.yxslope print 'test axis=0' xyc_loopT = corrxy(x.T, y.T, ws)[:,ws-1:].T xycT = MovStats(x, y, axis=0).yxcorr print 'maxabsdiff', np.max(np.abs(xyc_loopT - xycT)) print 'testnan' xn = x.copy() xn[:, 2::5] = np.nan xync = MovStats(xn, y, axis=1).yxcorr #print xync xnr = xn[~np.isnan(xn)].reshape(5,-1) ynr = y[~np.isnan(xn)].reshape(5,-1) xync_loop = corrxy(xnr, ynr, 4)[:,4-1:] xyncr = xync[~np.isnan(xn)[:,4:]].reshape(5,-1) print 'maxabsdiff', np.max(np.abs(xync_loop - xyncr)) From dwf at cs.toronto.edu Mon Nov 23 01:10:03 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 23 Nov 2009 01:10:03 -0500 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091122184621.GB9227@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> Message-ID: <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> On 22-Nov-09, at 1:46 PM, Simon Friedberger wrote: > I just looked at the documentation for the K-Means vector quantization > functions and I am a bit confused. On the one hand it says that > normalization to unit variance would be beneficial on the other hand > there are a lot of "must"s in the descriptions. > I was wondering if it is possible to use the functions without > normalization or if there is a negative impact. Damian did use some strong language there. The kmeans function won't know whether you've normalized or not, but in some cases you can expect much better solutions with normalized input (the function is called "whiten" which is somewhat misleading, as "whitening" is often used in the literature to mean decorrelating i.e. rotating by the eigenvectors of the covariance). kmeans uses the Euclidean distance, meaning that the distance between two points is the sum of the squared difference of each point's coordinates. If you have different coordinates that have vastly different scales, say some in the thousands and some that are always less than one, then one or two coordinates can dominate the distance calculation and make the other coordinates nearly irrelevant in the clustering (if the difference in scale is _really_ big then you can lose precision in the rounding error, too). Units are almost always arbitrary, and so scaling by the standard deviation helps this in that it treats all of your features "equally" (you may also want to subtract the mean before calling whiten(), as well - this is in fact one of the standard tricks, it doesn't matter so much here but it can help with numerical conditioning in a lot of algorithms, particularly ones that involve gradient descent). If you want to quantize new vectors after clustering then you can simply apply the reverse transformation to your codebook/centroids (i.e. multiply by the std. dev. of the original data and add back the mean if you subtracted it) which will scale them to the correct position in the space of the original data. 
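A small sketch of that workflow with made-up data (only the scaling step is shown; if you also subtract the mean before clustering, add it back to the codebook in the same way):

import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

data = np.random.randn(1000, 3) * np.array([1.0, 10.0, 100.0])  # columns on very different scales

std_dev = data.std(axis=0)
wdata = whiten(data)                      # same as data / std_dev
codebook, distortion = kmeans(wdata, 4)

# scale the centroids back to the original units, then quantize new
# points without having to whiten them first
codebook_orig = codebook * std_dev
new_points = np.random.randn(10, 3) * np.array([1.0, 10.0, 100.0])
labels, dists = vq(new_points, codebook_orig)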
> This would make sense because it seems reasonable that one would > want to > build the codebook on some set and then quantize a different set. In > this case normalization would have to be the same or be omitted. Yes, you could apply the normalization you applied to the training data on every point you wish to quantize after the fact, but it's usually easier to just apply the inverse transformation to the codebook (especially if the number of points you want to quantize greatly exceeds the number of items in your codebook, in which case you save a lot of floating point ops). > I am also interested in literature recommendations concerning why this > is a good idea in general. Well, lots of references will tell you what I just told you (that high- variance features will dominate distance calculations if you don't normalize). There is some discussion of it in the 'Prototype Methods' chapter in 'The Elements of Statistical Learning': http://www-stat.stanford.edu/~tibs/ElemStatLearn/ David From pgmdevlist at gmail.com Mon Nov 23 01:39:16 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 23 Nov 2009 01:39:16 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Message-ID: On Nov 23, 2009, at 12:43 AM, josef.pktd at gmail.com wrote: > Following up on a question by Keith on the numpy list and his reminder > that covariance can be calculated by the cross-product minus the > product of the means, I redid and > enhanced my moving stats functions. > > Suppose x and y are two time series, then the moving correlation > requires the calculation of the mean, variance and covariance for each > window. Currently in scipy stats intermediate results are usually > thrown away on return (while rpy/R returns all intermediate results > used for the calculation. > > Using a decorator/descriptor of Fernando written for nitime, I tried > out to write the function as a class instead, so that any desired ( > intermediate) calculations are only made on demand, but once they are > calculated they are attached to the class as attributes or properties. > This seems to be a useful "pattern". > > Are there any opinion for using the pattern in scipy.stats ? MovStats > will currently go into statsmodels > > Below is the class (with cutting part of init), a full script is the > attachment, including examples that test the class. > > about MovStats: > y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with > axis=1, should (but may not yet) work for nd arrays along any axis > (signal.correlate docstring) > nans are handled by dropping the corresponding observations from the > window, not adding any additional observations, > not tested if a window is empty because it contains only nans, nor if > variance is zero > (kern is intended for weighted statistics in the window but not tested > yet, I still need to decide on normalization requirements) > requires scipy.signal, all calculations done with signal.correlate, no loops > as often, functions are one-liners > all results are returned for valid observations only, initial > observations with incomplete window are cut > bonus: slope of moving regression of y on x, since it was trivial to add > still some cleaning and documentation to do Can you add support for MaskedArrays ? The easiest would be to check whether your inputs are masked arrays. 
If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed. You can also check what Matt did w/ scikits.timeseries. About your suggestion: I'd leave it in statsmodels for now... From josef.pktd at gmail.com Mon Nov 23 02:13:28 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Nov 2009 02:13:28 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Message-ID: <1cd32cbb0911222313v3bd21f9aye671125d0fea28b0@mail.gmail.com> On Mon, Nov 23, 2009 at 1:39 AM, Pierre GM wrote: > On Nov 23, 2009, at 12:43 AM, josef.pktd at gmail.com wrote: >> Following up on a question by Keith on the numpy list and his reminder >> that covariance can be calculated by the cross-product minus the >> product of the means, I redid and >> enhanced my moving stats functions. >> >> Suppose x and y are two time series, then the moving correlation >> requires the calculation of the mean, variance and covariance for each >> window. Currently in scipy stats intermediate results are usually >> thrown away on return (while rpy/R returns all intermediate results >> used for the calculation. >> >> Using a decorator/descriptor of Fernando written for nitime, I tried >> out to write the function as a class instead, so that any desired ( >> intermediate) calculations are only made on demand, but once they are >> calculated they are attached to the class as attributes or properties. >> This seems to be a useful "pattern". >> >> Are there any opinion for using the pattern in scipy.stats ? MovStats >> will currently go into statsmodels >> >> Below is the class (with cutting part of init), a full script is the >> attachment, including examples that test the class. >> >> about MovStats: >> y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with >> axis=1, should (but may not yet) work for nd arrays along any axis >> (signal.correlate docstring) >> nans are handled by dropping the corresponding observations from the >> window, not adding any additional observations, >> not tested if a window is empty because it contains only nans, nor if >> variance is zero >> (kern is intended for weighted statistics in the window but not tested >> yet, I still need to decide on normalization requirements) >> requires scipy.signal, all calculations done with signal.correlate, no loops >> as often, functions are one-liners >> all results are returned for valid observations only, initial >> observations with incomplete window are cut >> bonus: slope of moving regression of y on x, since it was trivial to add >> still some cleaning and documentation to do > > > Can you add support for MaskedArrays ? > The easiest would be to check whether your inputs are masked arrays. If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed. Since only __init__ is affected this should be quite easy, I only need the mask for the calculation of the number of non-nan elements in a window, and to fill the data array with zeros. I haven't thought about different numeric types, I guess I should make sure that also for the non-ma arrays the calculations are done with floats. > You can also check what Matt did w/ scikits.timeseries. The way of calculating this, I initially got from scikits.timeseries autocovariance, your moving_funcs are mostly in c, cmov_window uses np.convolve which is only for 1d and needs to loop. 
The advantage of scipy.signal over numpy is that it does nd convolution. I will look at the mask handling in time series again. I always get mixed up with convolve versus correlate. Is there a standard sorting for time series, up to down or left to right by increasing time or reversed? I have to check this for non-flat window weights/kernels. > About your suggestion: I'd leave it in statsmodels for now... movstat goes into statsmodels.sandbox.tsa which is my playground for time series analysis for scipy.stats I was thinking more of existing or other functions, e.g. my version of groupstats, (mean, variance, demean, ... by groups) would follow the same pattern of partly expensive calculations on demand. Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From simon+python at a-oben.org Mon Nov 23 02:18:33 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Mon, 23 Nov 2009 08:18:33 +0100 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> Message-ID: <20091123071831.GA14634@a-oben.org> Hi David, thanks for your explanation. I agree with your arguments but couldn't it have the opposite effect: Weighing features that should have less discriminative power more because they have a small variance? I'm just not sure about it but I will check out the book you reference. I've had it lying around for a while anyway. On the case of inverting the transformation. Is this functionality built-in? I can't find anything in the docs. Thanks Simon From robert.kern at gmail.com Mon Nov 23 02:54:16 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Nov 2009 01:54:16 -0600 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091123071831.GA14634@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> Message-ID: <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> On Mon, Nov 23, 2009 at 01:18, Simon Friedberger wrote: > Hi David, > > thanks for your explanation. I agree with your arguments but couldn't it > have the opposite effect: Weighing features that should have less > discriminative power more because they have a small variance? If a variable has a small variance, a large deviation in that variable is *very* informative and should have a larger impact on the classification than a small deviation in a variable that has a large variance. Let's distinguish two cases: one in which each variable has its own units (let's say degrees Celsius and meters) and one in which each variable is commensurable and in the same units (let's say meters). Now, in the first case, you need some way to put all of the variables into the same units so you can sensibly compute a distance using all of the variables. A reasonable choice of units is "one standard deviation [of the marginal distribution for the variable]". In the second case, there *may* be a case for not doing prewhitening. If your points are actually 3D points in real space with a metric, then you may want to use that space's metric as the distance. 
However, if the process that created your data is creating "oblong" distributions of points, that may indicate that it is using a different notion of distance. In fact, you may want to do a PCA to find the right rotation such that your variables are orthogonal to the principal directions of variation. And then prewhiten in those directions. The key point is to find an appropriate definition of distance to use. Prewhitening is a good default when you don't have a model of your process, yet. And you usually don't. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthieu.brucher at gmail.com Mon Nov 23 08:46:40 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 23 Nov 2009 14:46:40 +0100 Subject: [SciPy-User] Modified Bessel functions of the first kind Message-ID: Hi, I need the zero-order modified Bessel function of the first kind. I've seen the scipy.special.besselpoly, but I don't know if it really is what I'm looking for... Does someone know? Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From eadrogue at gmx.net Mon Nov 23 09:11:33 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 23 Nov 2009 15:11:33 +0100 Subject: [SciPy-User] Modified Bessel functions of the first kind In-Reply-To: References: Message-ID: <20091123141133.GA3743@doriath.local> Hi, 23/11/09 @ 14:46 (+0100), thus spake Matthieu Brucher: > I need the zero-order modified Bessel function of the first kind. I've > seen the scipy.special.besselpoly, but I don't know if it really is > what I'm looking for... > Does someone know? It's scipy.special.iv(order, x) it gives the modified Bessel function of the first kind of order 'order' evaluated at 'x'. Bye. From matthieu.brucher at gmail.com Mon Nov 23 09:15:41 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 23 Nov 2009 15:15:41 +0100 Subject: [SciPy-User] Modified Bessel functions of the first kind In-Reply-To: <20091123141133.GA3743@doriath.local> References: <20091123141133.GA3743@doriath.local> Message-ID: Excellent! Thanks a lot for this. Matthieu 2009/11/23 Ernest Adrogu? : > Hi, > 23/11/09 @ 14:46 (+0100), thus spake Matthieu Brucher: >> I need the zero-order modified Bessel function of the first kind. I've >> seen the scipy.special.besselpoly, but I don't know if it really is >> what I'm looking for... >> Does someone know? > > It's scipy.special.iv(order, x) > it gives the modified Bessel function of the first kind of > order 'order' evaluated at 'x'. > > Bye. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From cclarke at chrisdev.com Mon Nov 23 08:59:21 2009 From: cclarke at chrisdev.com (Christopher Clarke) Date: Mon, 23 Nov 2009 09:59:21 -0400 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: <3E381758-BD11-4EAE-B035-16554519F724@chrisdev.com> References: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> <3E381758-BD11-4EAE-B035-16554519F724@chrisdev.com> Message-ID: <6fb517fa0911230559k51b8cbaat141c02e1d82acb77@mail.gmail.com> Hi Something seems to have gone wrong with my initial reply!! Anyway, I often encounter the "initial values" use case when I am creating business day time series out of RDBMS tables using a subset of the observations in the table. For example i have a SQL query fragment like WHERE symbol='SFC' and dateix BETWEEN '2009-01-01' AND '2009-09-01' Now suppose that 2009-01-02 and 2009-01-05 are missing (trading is sparse on many of the exchanges i'm dealing with) i am supposed to forward_fill using the last traded value for SFC which may or may not be 2008-12-31. Hence i have a query that find the last traded values and i use these as the "initial values". Anyway here is by forward_fill wrapper. Its not very efficient as i'm copying and forward_fill is copying etc but.. I'm actually starting to have reservation about the usefulness forward_fill on 2d as opposed to the individual the individual series arrays as i am finding that i've often got to do loads of transformations and checking on the individuals arrays before i can combine them into a single ma array for filling anyway def forward_fill2(marr,maxgap=None,init_vals=None): """ init_vals a list with the same no. of elements as marr.shape[1] """ arr=ma.array(marr,copy=True if arr.ndim == 1: if init_vals: if arr.mask.any() and arr.mask[0]: if init_vals: arr[0] = init_vals[0] return forward_fill(arr,maxgap) else: n = arr.shape[1] if init_vals: mask=ma.getmask(arr) if len(init_vals) != n: raise ValueError, 'Initial Values sequence does no match number of columns' for c in range(len(init_vals)): if arr.mask.any() and mask[0,c]: if init_vals[c]: arr[0,c]=init_vals[c] arr = ma.hsplit(arr, n)[0] return ma.column_stack([forward_fill(np.squeeze(a),maxgap) for a in arr]) On Thu, Nov 19, 2009 at 5:07 AM, Chris Clarke wrote: > Hi > The "initial value" > > On Nov 18, 2009, at 9:17 PM, Pierre GM wrote: > > > On Nov 18, 2009, at 5:18 PM, Chris Clarke wrote: > > Sorry for the later reply. Yes forward_fill is still there and it > > works!!! > > > Good > > But it seemed to have some more capability (initial values, 2d arrays) > > when it was in the sandbox?? > > I may be wrong and mixing up with some other library. > > > That does sound familiar, but i don't think it was part of > scikits.timeseries... > A patch for 2D would be welcome, I'm not quite sure what you mean by > initial value, though > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon+python at a-oben.org Mon Nov 23 10:29:43 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Mon, 23 Nov 2009 16:29:43 +0100 Subject: [SciPy-User] kmeans (Re: Mailing list?) 
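A tiny numeric illustration of the scaling issue being discussed in this sub-thread (the points are arbitrary, made up for the example): before rescaling, the large-scale second feature decides the Euclidean distances almost by itself; after dividing each column by its standard deviation both features contribute comparably.

import numpy as np
from scipy.cluster.vq import whiten

obs = np.array([[0.0, 1000.0],
                [5.0, 1010.0],
                [0.1, 2000.0]])

# raw distances from the first point: the second feature dominates
print np.sqrt(((obs[0] - obs[1])**2).sum()), np.sqrt(((obs[0] - obs[2])**2).sum())

w = whiten(obs)   # divide each column by its standard deviation
# now the two distances come out comparable
print np.sqrt(((w[0] - w[1])**2).sum()), np.sqrt(((w[0] - w[2])**2).sum())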
In-Reply-To: <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> Message-ID: <20091123152943.GC14634@a-oben.org> Hi Robert, I agree with you in every respect. :) Now, it only remains to see if anybody knows how to get to the 'whitening' transformation so I can invert it or apply it to my other data. Anybody? :) Best Simon From dwf at cs.toronto.edu Mon Nov 23 13:30:57 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 23 Nov 2009 13:30:57 -0500 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091123071831.GA14634@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> Message-ID: <0BA82226-42AD-46D7-A6B8-067759D1A8DD@cs.toronto.edu> On 23-Nov-09, at 2:18 AM, Simon Friedberger wrote: > thanks for your explanation. I agree with your arguments but > couldn't it > have the opposite effect: Weighing features that should have less > discriminative power more because they have a small variance? > I'm just not sure about it but I will check out the book you > reference. > I've had it lying around for a while anyway. It could, but typically when you're employing k-means, you have little reason to believe any of the variables have any more explanatory power than any of the others, so treating them "equally" is the simplest, most reasonable thing to do. It indeed will inflate the range of low variance. You also use the word "discriminative", which makes me think you're trying to do some sort of classification. Note that k-means can't take into account any label information and is thus ill-suited to classification, though it is sometimes used for this. > On the case of inverting the transformation. Is this functionality > built-in? I can't find anything in the docs. It isn't, but maybe it should be. It'd involve rethinking the cluster module a bit (which I've been planning on as a means to expand it, but oh, the time, where does it go?...). David From dwf at cs.toronto.edu Mon Nov 23 13:36:04 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 23 Nov 2009 13:36:04 -0500 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091123152943.GC14634@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> <20091123152943.GC14634@a-oben.org> Message-ID: On 23-Nov-09, at 10:29 AM, Simon Friedberger wrote: > Hi Robert, > I agree with you in every respect. :) > > Now, it only remains to see if anybody knows how to get to the > 'whitening' transformation so I can invert it or apply it to my other > data. The 'whiten' function is only two lines: std_dev = std(obs, axis=0) return obs / std_dev codebook *= std(youroriginaldata, axis=0) will invert the transformation done by whiten() and apply it to your codebook. David From cimrman3 at ntc.zcu.cz Tue Nov 24 07:10:58 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 24 Nov 2009 13:10:58 +0100 Subject: [SciPy-User] ANN: SfePy 2009.4 Message-ID: <4B0BCD52.4080706@ntc.zcu.cz> I am pleased to announce release 2009.4 of SfePy. 
Description ----------- SfePy (simple finite elements in Python) is a software, distributed under the BSD license, for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. Mailing lists, issue tracking, git repository: http://sfepy.org Home page: http://sfepy.kme.zcu.cz New documentation site: http://docs.sfepy.org/doc Many thanks to Logan Sorenson for the new documentation contents, and Vladimir Lukes for setting up the server. Highlights of this release -------------------------- - unified handling of user-defined functions (for defining subdomains, heterogeneous material properties, boundary conditions etc.) - greatly improved postprocessing and visualization capabilities, namely: - support for file sequences (evolutionary simulations) - animations (using ffmpeg) - automatic scalar bars - sfepy_gui.py: Mayavi2-based GUI to launch simulations Major improvements ------------------ Apart from many bug-fixes, let us mention: - quasistatic time stepping - graphical logging: - dynamic adding of data groups (new axes) to Log and ProcessPlotter - linear algebra: - reversed Cuthill-McKee permutation algorithm, graph in-place permutation - setting of parameter variables by a user-defined function - new tests and terms For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2009.4_RELEASE_NOTES.txt (full release notes, rather long). Best regards, Robert Cimrman From osman at fuse.net Tue Nov 24 19:21:33 2009 From: osman at fuse.net (osman) Date: Tue, 24 Nov 2009 19:21:33 -0500 Subject: [SciPy-User] ANN: SfePy 2009.4 In-Reply-To: <4B0BCD52.4080706@ntc.zcu.cz> References: <4B0BCD52.4080706@ntc.zcu.cz> Message-ID: <1259108493.28868.3.camel@osman-laptop> On Tue, 2009-11-24 at 13:10 +0100, Robert Cimrman wrote: > I am pleased to announce release 2009.4 of SfePy. Hi Robert, Thanks for the new version. I put it on my 32 bit ubuntu jaunty. All tests ran fine without any errors. Then I tried isfepy. I am getting an error: In [1]: pb, vec, data = pde_solve('input/poisson.py') sfepy: left over: ['__builtins__', '__file__', '__name__', '_filename', '__doc__', '__package__'] sfepy: reading mesh (database/simple.mesh)... sfepy: ...done in 0.04 s sfepy: setting up domain edges... sfepy: ...done in 0.01 s sfepy: setting up domain faces... sfepy: ...done in 0.01 s sfepy: creating regions... sfepy: Gamma_Right sfepy: Omega sfepy: Gamma_Left sfepy: ...done in 0.04 s sfepy: equation "Temperature": sfepy: dw_laplace.i1.Omega( coef.val, s, t ) = 0 sfepy: setting up dof connectivities... sfepy: ...done in 0.00 s sfepy: describing geometries... sfepy: ...done in 0.00 s sfepy: using solvers: nls: newton ls: ls sfepy: matrix shape: (300, 300) sfepy: assembling matrix graph... sfepy: ...done in 0.00 s sfepy: matrix structural nonzeros: 3538 (3.93e-02% fill) sfepy: updating materials... sfepy: coef sfepy: ...done in 0.01 s sfepy: updating variables... 
sfepy: ...done /usr/lib/python2.6/dist-packages/scipy/linsolve/__init__.py:4: DeprecationWarning: scipy.linsolve has moved to scipy.sparse.linalg.dsolve warn('scipy.linsolve has moved to scipy.sparse.linalg.dsolve', DeprecationWarning) sfepy: nls: iter: 0, residual: 1.176265e-01 (rel: 1.000000e+00) /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) sfepy: rezidual: 0.00 [s] sfepy: solve: 0.01 [s] sfepy: matrix: 0.00 [s] sfepy: nls: iter: 1, residual: 9.957055e-17 (rel: 8.464973e-16) In [2]: view = Viewer(pb.get_output_name()) In [3]: view() --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/osman/sfepy-release-2009.4/sfepy/interactive/__init__.py in () ----> 1 2 3 4 5 /home/osman/sfepy-release-2009.4/sfepy/postprocess/viewer.py in call_mlab(self, scene, show, is_3d, view, roll, layout, scalar_mode, vector_mode, rel_scaling, clamping, ranges, is_scalar_bar, rel_text_width, fig_filename, resolution, filter_names, only_names, step, anti_aliasing) 555 else: 556 gui = ViewerGUI(viewer=self) --> 557 scene = gui.scene.mayavi_scene 558 559 if scene is not self.scene: AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' 2009.3 release has no problem with isfepy. Best, Osman From artpoon at gmail.com Wed Nov 25 01:30:46 2009 From: artpoon at gmail.com (Art Poon) Date: Tue, 24 Nov 2009 22:30:46 -0800 Subject: [SciPy-User] gamma ppf weirdness Message-ID: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> Hello, I'm trying to hammer out some quick simulation code and need to calculate a bunch of inverse CDF values from the gamma distribution. SciPy seems like a great resource for this. However, I've encountered some strangeness that is probably my own fault: >>> g = stats.gamma(1,0,1) >>> g >>> g.ppf(0.1) 0.10536051565782635 >>> g.ppf(0.25) 0.0 >>> g.ppf(0.2500001) 0.28768220578512327 >>> g.ppf(0.2499999) 0.28768193911845646 >>> g.ppf(0.25) 0.0 I'm just dying to know where I've gone wrong here! In the meantime, I'm coding up a function to compute the inverse CDF from MATLAB code. I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. Thanks! - Art. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 01:44:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 01:44:47 -0500 Subject: [SciPy-User] gamma ppf weirdness In-Reply-To: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> References: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> Message-ID: <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> On Wed, Nov 25, 2009 at 1:30 AM, Art Poon wrote: > Hello, > I'm trying to hammer out some quick simulation code and need to calculate a > bunch of inverse CDF values from the gamma distribution. ?SciPy seems like a > great resource for this. ?However, I've encountered some strangeness that is > probably my own fault: >>>> g = stats.gamma(1,0,1) >>>> g > >>>> g.ppf(0.1) > 0.10536051565782635 >>>> g.ppf(0.25) > 0.0 >>>> g.ppf(0.2500001) > 0.28768220578512327 >>>> g.ppf(0.2499999) > 0.28768193911845646 >>>> g.ppf(0.25) > 0.0 > I'm just dying to know where I've gone wrong here! 
?In the meantime, I'm > coding up a function to compute the inverse CDF from MATLAB code. > I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python > 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. > Thanks! > - Art. this has been fixed in trunk, see http://projects.scipy.org/scipy/ticket/975 >>> stats.gamma.ppf(0.25, 1.,0.,1) 0.28768207245178096 >>> stats.gamma.ppf(0.2500001, 1.,0.,1) 0.28768220578512327 >>> stats.gamma.ppf(0.25 -0.00001, 1.,0.,1) 0.28766873920733566 >>> stats.gamma.ppf(0.25, 1,0,1) 0.28768207245178096 >>> stats.gamma(1,0,1).ppf(0.25) 0.28768207245178096 >>> import scipy >>> scipy.version.version '0.8.0.dev6118' Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Wed Nov 25 02:00:43 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 02:00:43 -0500 Subject: [SciPy-User] gamma ppf weirdness In-Reply-To: <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> References: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> Message-ID: <1cd32cbb0911242300j8a438eqa864edfc62a0b20@mail.gmail.com> On Wed, Nov 25, 2009 at 1:44 AM, wrote: > On Wed, Nov 25, 2009 at 1:30 AM, Art Poon wrote: >> Hello, >> I'm trying to hammer out some quick simulation code and need to calculate a >> bunch of inverse CDF values from the gamma distribution. ?SciPy seems like a >> great resource for this. ?However, I've encountered some strangeness that is >> probably my own fault: >>>>> g = stats.gamma(1,0,1) >>>>> g >> >>>>> g.ppf(0.1) >> 0.10536051565782635 >>>>> g.ppf(0.25) >> 0.0 >>>>> g.ppf(0.2500001) >> 0.28768220578512327 >>>>> g.ppf(0.2499999) >> 0.28768193911845646 >>>>> g.ppf(0.25) >> 0.0 >> I'm just dying to know where I've gone wrong here! ?In the meantime, I'm >> coding up a function to compute the inverse CDF from MATLAB code. >> I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python >> 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. >> Thanks! >> - Art. > > this has been fixed in trunk, see http://projects.scipy.org/scipy/ticket/975 > >>>> stats.gamma.ppf(0.25, 1.,0.,1) > 0.28768207245178096 >>>> stats.gamma.ppf(0.2500001, 1.,0.,1) > 0.28768220578512327 >>>> stats.gamma.ppf(0.25 -0.00001, 1.,0.,1) > 0.28766873920733566 >>>> stats.gamma.ppf(0.25, 1,0,1) > 0.28768207245178096 >>>> stats.gamma(1,0,1).ppf(0.25) > 0.28768207245178096 >>>> import scipy >>>> scipy.version.version > '0.8.0.dev6118' > > > Josef > >> scipy special also has the function for the gamma.isf (which is currently not used in stats.gamma) >>> special.gammainccinv(1, 1-0.25) 0.28768207245178096 You could check whether it is correct on 0.7.0, but I'm not sure whether special.gammaincinv(a,q) and special.gammainccinv(1, 1-0.25) are really independent implementation. 
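A sketch of the workaround on 0.7.1, following the pointer above (the loc/scale handling is just the usual scipy convention, and the comparison against the exponential quantile is only a sanity check for a == 1):

import numpy as np
from scipy import special

def gamma_ppf(q, a, loc=0.0, scale=1.0):
    # invert the regularized upper incomplete gamma function:
    # cdf(x) = q  <=>  gammaincc(a, x) = 1 - q
    return loc + scale * special.gammainccinv(a, 1.0 - q)

print gamma_ppf(0.25, 1.0)      # 0.28768207245178096
print -np.log(1.0 - 0.25)       # same value for a == 1 (exponential case)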
Josef >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From artpoon at gmail.com Wed Nov 25 02:05:31 2009 From: artpoon at gmail.com (Art Poon) Date: Tue, 24 Nov 2009 23:05:31 -0800 Subject: [SciPy-User] gamma ppf weirdness In-Reply-To: <1cd32cbb0911242300j8a438eqa864edfc62a0b20@mail.gmail.com> References: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> <1cd32cbb0911242300j8a438eqa864edfc62a0b20@mail.gmail.com> Message-ID: Excellent, thanks very much. - Art. On 2009-11-24, at 11:00 PM, josef.pktd at gmail.com wrote: > On Wed, Nov 25, 2009 at 1:44 AM, wrote: >> On Wed, Nov 25, 2009 at 1:30 AM, Art Poon wrote: >>> Hello, >>> I'm trying to hammer out some quick simulation code and need to calculate a >>> bunch of inverse CDF values from the gamma distribution. SciPy seems like a >>> great resource for this. However, I've encountered some strangeness that is >>> probably my own fault: >>>>>> g = stats.gamma(1,0,1) >>>>>> g >>> >>>>>> g.ppf(0.1) >>> 0.10536051565782635 >>>>>> g.ppf(0.25) >>> 0.0 >>>>>> g.ppf(0.2500001) >>> 0.28768220578512327 >>>>>> g.ppf(0.2499999) >>> 0.28768193911845646 >>>>>> g.ppf(0.25) >>> 0.0 >>> I'm just dying to know where I've gone wrong here! In the meantime, I'm >>> coding up a function to compute the inverse CDF from MATLAB code. >>> I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python >>> 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. >>> Thanks! >>> - Art. >> >> this has been fixed in trunk, see http://projects.scipy.org/scipy/ticket/975 >> >>>>> stats.gamma.ppf(0.25, 1.,0.,1) >> 0.28768207245178096 >>>>> stats.gamma.ppf(0.2500001, 1.,0.,1) >> 0.28768220578512327 >>>>> stats.gamma.ppf(0.25 -0.00001, 1.,0.,1) >> 0.28766873920733566 >>>>> stats.gamma.ppf(0.25, 1,0,1) >> 0.28768207245178096 >>>>> stats.gamma(1,0,1).ppf(0.25) >> 0.28768207245178096 >>>>> import scipy >>>>> scipy.version.version >> '0.8.0.dev6118' >> >> >> Josef >> >>> > > scipy special also has the function for the gamma.isf (which is > currently not used in stats.gamma) > >>>> special.gammainccinv(1, 1-0.25) > 0.28768207245178096 > > You could check whether it is correct on 0.7.0, but I'm not sure > whether special.gammaincinv(a,q) and special.gammainccinv(1, 1-0.25) > are really independent implementation. > > Josef > > >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cool-rr at cool-rr.com Wed Nov 25 02:05:06 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Wed, 25 Nov 2009 07:05:06 +0000 (UTC) Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals Message-ID: Hello, I've just started using scipy/numpy for some queue theory. I have a queue for which the arrival rate is a Poisson distribution. I also have the mean number of arrivals per time unit. I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that it could make a random variable for number of arrivals per time unit. But I want the time between consecutive arrivals, as a random variable. Does anyone know how I can get that? Thanks, Ram. 
From josef.pktd at gmail.com Wed Nov 25 02:42:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 02:42:31 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: Message-ID: <1cd32cbb0911242342x1aab227el3a615e067b9fca51@mail.gmail.com> On Wed, Nov 25, 2009 at 2:05 AM, Ram Rachum wrote: > Hello, > > I've just started using scipy/numpy for some queue theory. I have a queue for > which the arrival rate is a Poisson distribution. I also have the mean number of > arrivals per time unit. > > I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that > it could make a random variable for number of arrivals per time unit. But I want > the time between consecutive arrivals, as a random variable. > > Does anyone know how I can get that? I don't remember the relationship for the different random variables related to arrival processes (without looking it up again), but there is a related example in https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/stats_distributions.py Josef > > Thanks, > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Wed Nov 25 03:00:13 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 03:00:13 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: <1cd32cbb0911242342x1aab227el3a615e067b9fca51@mail.gmail.com> References: <1cd32cbb0911242342x1aab227el3a615e067b9fca51@mail.gmail.com> Message-ID: <1cd32cbb0911250000o70eb0f27n886074fab0fab308@mail.gmail.com> On Wed, Nov 25, 2009 at 2:42 AM, wrote: > On Wed, Nov 25, 2009 at 2:05 AM, Ram Rachum wrote: >> Hello, >> >> I've just started using scipy/numpy for some queue theory. I have a queue for >> which the arrival rate is a Poisson distribution. I also have the mean number of >> arrivals per time unit. >> >> I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that >> it could make a random variable for number of arrivals per time unit. But I want >> the time between consecutive arrivals, as a random variable. >> >> Does anyone know how I can get that? > > I don't remember the relationship for the different random variables > related to arrival > processes (without looking it up again), but there is a related example in > > https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/stats_distributions.py http://en.wikipedia.org/wiki/Queueing_theory#Role_of_Poisson_process.2C_exponential_distributions mentions the exponential distribution for the time between arrivals Josef > > Josef > >> >> Thanks, >> Ram. 
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From lucadeluge at gmail.com Wed Nov 25 05:37:23 2009 From: lucadeluge at gmail.com (Luca Delucchi) Date: Wed, 25 Nov 2009 11:37:23 +0100 Subject: [SciPy-User] problem with optimize.curve_fit Message-ID: Hi everybody i try to use optimize.curve_fit but i have a error gis at srvcavit:~/meteo_python$ python prova_optimize.py Traceback (most recent call last): File "prova_optimize.py", line 23, in popt, pcov = curve_fit(func,x,old_y,2) File "/usr/lib/python2.5/site-packages/scipy/optimize/minpack.py", line 423, in curve_fit raise RuntimeError, "Optimal parameters not found: " + mesg RuntimeError: Optimal parameters not found: Both actual and predicted relative reductions in the sum of squares are at most 0.000000 and the relative error between two consecutive iterates is at most 0.000000 i see that is a bug [0], i try to modify the script with the solution proposed by kael but nothing change, here [1] you can find the script that i use, how can i solve my problem? thanks Luca [0] http://projects.scipy.org/scipy/ticket/984 [1] http://pastebin.com/m3c721c6f From cimrman3 at ntc.zcu.cz Wed Nov 25 06:00:03 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 25 Nov 2009 12:00:03 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] Message-ID: <4B0D0E33.4010703@ntc.zcu.cz> Hi Osman, thanks for trying out the new version! As isfepy works for me, I assume it must be a version issue with mayavi (tested with 3.3.0). What is your mayavi version? If you cannot try 3.3.0, or use it already, could you send me the output of 'gui.scene.print_traits()'? Just put it prior to the offending line... cheers, r. PS: I guess we should discuss this on sfepy-devel only... osman wrote: > On Tue, 2009-11-24 at 13:10 +0100, Robert Cimrman wrote: >> I am pleased to announce release 2009.4 of SfePy. > > Hi Robert, > Thanks for the new version. I put it on my 32 bit ubuntu jaunty. All > tests ran fine without any errors. Then I tried isfepy. I am getting an > error: > In [1]: pb, vec, data = pde_solve('input/poisson.py') > sfepy: left over: ['__builtins__', '__file__', '__name__', '_filename', > '__doc__', '__package__'] > sfepy: reading mesh (database/simple.mesh)... > sfepy: ...done in 0.04 s > sfepy: setting up domain edges... > sfepy: ...done in 0.01 s > sfepy: setting up domain faces... > sfepy: ...done in 0.01 s > sfepy: creating regions... > sfepy: Gamma_Right > sfepy: Omega > sfepy: Gamma_Left > sfepy: ...done in 0.04 s > sfepy: equation "Temperature": > sfepy: dw_laplace.i1.Omega( coef.val, s, t ) = 0 > sfepy: setting up dof connectivities... > sfepy: ...done in 0.00 s > sfepy: describing geometries... > sfepy: ...done in 0.00 s > sfepy: using solvers: > nls: newton > ls: ls > sfepy: matrix shape: (300, 300) > sfepy: assembling matrix graph... > sfepy: ...done in 0.00 s > sfepy: matrix structural nonzeros: 3538 (3.93e-02% fill) > sfepy: updating materials... > sfepy: coef > sfepy: ...done in 0.01 s > sfepy: updating variables... 
> sfepy: ...done > /usr/lib/python2.6/dist-packages/scipy/linsolve/__init__.py:4: > DeprecationWarning: scipy.linsolve has moved to > scipy.sparse.linalg.dsolve > warn('scipy.linsolve has moved to scipy.sparse.linalg.dsolve', > DeprecationWarning) > sfepy: nls: iter: 0, residual: 1.176265e-01 (rel: 1.000000e+00) > /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead > ' install scikits.umfpack instead', DeprecationWarning ) > sfepy: rezidual: 0.00 [s] > sfepy: solve: 0.01 [s] > sfepy: matrix: 0.00 [s] > sfepy: nls: iter: 1, residual: 9.957055e-17 (rel: 8.464973e-16) > > In [2]: view = Viewer(pb.get_output_name()) > > In [3]: view() > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call > last) > > /home/osman/sfepy-release-2009.4/sfepy/interactive/__init__.py in > () > ----> 1 > 2 > 3 > 4 > 5 > > /home/osman/sfepy-release-2009.4/sfepy/postprocess/viewer.py in > call_mlab(self, scene, show, is_3d, view, roll, layout, scalar_mode, > vector_mode, rel_scaling, clamping, ranges, is_scalar_bar, > rel_text_width, fig_filename, resolution, filter_names, only_names, > step, anti_aliasing) > 555 else: > 556 gui = ViewerGUI(viewer=self) > --> 557 scene = gui.scene.mayavi_scene > 558 > 559 if scene is not self.scene: > > AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' > > > 2009.3 release has no problem with isfepy. > > Best, > Osman From gael.varoquaux at normalesup.org Wed Nov 25 06:06:43 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 25 Nov 2009 12:06:43 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <4B0D0E33.4010703@ntc.zcu.cz> References: <4B0D0E33.4010703@ntc.zcu.cz> Message-ID: <20091125110643.GB21484@phare.normalesup.org> On Wed, Nov 25, 2009 at 12:00:03PM +0100, Robert Cimrman wrote: > > 555 else: > > 556 gui = ViewerGUI(viewer=self) > > --> 557 scene = gui.scene.mayavi_scene > > 558 > > 559 if scene is not self.scene: > > AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' Yes, mayavi_scene is new in 3.3.0. We realized that this functionnality we needed a bit late :) Ga?l From cimrman3 at ntc.zcu.cz Wed Nov 25 06:33:23 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 25 Nov 2009 12:33:23 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <20091125110643.GB21484@phare.normalesup.org> References: <4B0D0E33.4010703@ntc.zcu.cz> <20091125110643.GB21484@phare.normalesup.org> Message-ID: <4B0D1603.40509@ntc.zcu.cz> Gael Varoquaux wrote: > On Wed, Nov 25, 2009 at 12:00:03PM +0100, Robert Cimrman wrote: >>> 555 else: >>> 556 gui = ViewerGUI(viewer=self) >>> --> 557 scene = gui.scene.mayavi_scene >>> 558 >>> 559 if scene is not self.scene: > >>> AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' > > Yes, mayavi_scene is new in 3.3.0. We realized that this functionnality > we needed a bit late :) > > Ga?l And I have started using it even later :) thanks for clarification! r. 
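Since mayavi_scene only exists from Mayavi 3.3.0 on, a small defensive helper (illustrative only, not part of sfepy) makes the failure explicit instead of an AttributeError deep inside the viewer:

def get_mayavi_scene(scene_model):
    # MlabSceneModel.mayavi_scene was added in Mayavi 3.3.0
    scene = getattr(scene_model, 'mayavi_scene', None)
    if scene is None:
        raise ImportError('Mayavi >= 3.3.0 is required '
                          '(MlabSceneModel.mayavi_scene not found)')
    return scene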
From almar.klein at gmail.com Wed Nov 25 07:16:18 2009 From: almar.klein at gmail.com (Almar Klein) Date: Wed, 25 Nov 2009 13:16:18 +0100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> Message-ID: Hi, I have an implementation of the Minimum Cost Path method too. It uses a binary heap to store the active pixels (I call it the front). Therefore it does not need to iterate over the the whole image at each iteration, but simply pops the pixel with the minimum cumulative cost from the heap. This significantly increase speed. Because I am still changing stuff to my MCP implementation, and I use it for my research, I am a bit reluctant to make it publicly available now. I'd be happy to share the binary heap implementation though. Looking at the posted code, I think it is incorrect. Each iteration, you should only check the neighbours of the pixel that has the minimum cumulative costs. That's why the binary heap is so important to get it fast. I short while ago I made a flash app with some examples. It also contains the pseudo code, although I use a slighly different terminology (cost=speed, cumulative cost=time): http://dl.dropbox.com/u/1463853/mcpExamples.swf Here's a snipet of how I use the binary heap: ===== from heap cimport BinaryHeapWithCrossRef front = BinaryHeapWithCrossRef(frozen_flat) value, ii= front.Pop() # to get the pixel of the front with the minimum cumulative cost front.Push(cumcost_flat[ii], ii) # to insert or update a value ===== I use flat arrays and scalar indices, so I need to store only one reference per pixel. This also makes the implementation work for 3D data (or even higher dimensions if you wish). frozen_flat is a flat array, the same size as the imput (cost or speed) image, that keeps track whether pixels are frozen (indicating they wont change). A pixel is frozen right after it is popped from the heap. I use the same array in the heap to be able to update values if a pixel is already in the heap. I hope this helps a bit. If not, feel free to ask. Almar 2009/11/22 St?fan van der Walt : > 2009/11/22 St?fan van der Walt : >> This code looks really handy, and I'd love to add it. ?Would you >> consider putting your code in a branch on github? > > Actually, don't worry -- I'll add it quickly. ?Thanks for the contribution! > > Cheers > St?fan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: heap.pxd Type: application/octet-stream Size: 1181 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: heap.pyx Type: application/octet-stream Size: 26347 bytes Desc: not available URL: From zachary.pincus at yale.edu Wed Nov 25 07:44:27 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 25 Nov 2009 07:44:27 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> Message-ID: <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Hi Almar, The binary heap code looks extremely useful in general -- thanks for making it available! Do you have any license you want it under? (BSD seems preferable if this is to be incorporated into a scikit, e.g.) It would be great if you would be interested in making your MCP code available too, even just as a base for others to hack on a bit (rather than as a finished contribution), but this is of course up to you. Otherwise I'll probably try to throw together something similar using the heap code. > Looking at the posted code, I think it is incorrect. Each iteration, > you should only check the neighbours of the pixel that has the > minimum cumulative costs. That's why the binary heap is so important > to get it fast. Incorrect means that the code might give a wrong result: is this the case? I *think* I had satisfied myself that the implementation (while suboptimal because it does extra work -- a lot in some cases!) would yield the correct path. (Note that the code doesn't terminate when the "end" pixel is first assigned a cost, but when no costs are changing anywhere. Basically, brute-force search instead of Dijkstra's algorithm. Again, while a lot more than necessary to just find the minimum cost to a single point, this condition should be sufficient to ensure that the minimum cost to *every* point in the array has been found, right? If my analysis is wrong, though, it wouldn't be the first time!) > I use flat arrays and scalar indices, so I need to store only one > reference per pixel. This also makes the implementation work for 3D > data (or even higher dimensions if you wish). Do you have code to take the flat index and the shape of the original array and return the indices to the neighboring pixels, or is there some other trick with that too? Anyhow, thanks for your suggestions and contribution! I look forward to making use of the heap. Best, Zach On Nov 25, 2009, at 7:16 AM, Almar Klein wrote: > Hi, > > I have an implementation of the Minimum Cost Path method too. It uses > a binary heap to store the > active pixels (I call it the front). Therefore it does not need to > iterate over the the whole image at > each iteration, but simply pops the pixel with the minimum cumulative > cost from the heap. This > significantly increase speed. > > Because I am still changing stuff to my MCP implementation, and I use > it for my research, I am a bit > reluctant to make it publicly available now. I'd be happy to share the > binary heap implementation though. > > Looking at the posted code, I think it is incorrect. Each iteration, > you should only check the neighbours > of the pixel that has the minimum cumulative costs. That's why the > binary heap is so important to get > it fast. > > I short while ago I made a flash app with some examples. 
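For readers following the thread: Almar's recipe above (pop the front pixel with the smallest cumulative cost, freeze it, push or update its neighbours) is essentially Dijkstra's algorithm on the pixel lattice. A rough pure-Python sketch using the standard-library heapq, for a 2D cost image and 4-connectivity, might look like this; it is not Almar's Cython code, and the function name and seed convention are invented:

import heapq
import numpy as np

def minimum_cumulative_cost(costs, seed):
    # cumulative minimum cost from `seed` (a (row, col) tuple) to every pixel
    cumcost = np.empty(costs.shape)
    cumcost.fill(np.inf)
    frozen = np.zeros(costs.shape, dtype=bool)
    cumcost[seed] = 0.0
    front = [(0.0, seed)]            # binary heap of (cumulative cost, pixel)
    while front:
        c, (i, j) = heapq.heappop(front)
        if frozen[i, j]:
            continue                 # stale entry, a cheaper path got there first
        frozen[i, j] = True          # its cumulative cost can no longer change
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < costs.shape[0] and 0 <= nj < costs.shape[1]
                    and not frozen[ni, nj]):
                new = c + costs[ni, nj]
                if new < cumcost[ni, nj]:
                    cumcost[ni, nj] = new
                    heapq.heappush(front, (new, (ni, nj)))
    return cumcost

A dedicated heap with cross-references, like the one attached above, can update a pixel's value in place instead of re-pushing it, which avoids the stale entries that this sketch simply skips.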
It also > contains the pseudo code, although > I use a slighly different terminology (cost=speed, cumulative > cost=time): > http://dl.dropbox.com/u/1463853/mcpExamples.swf > > > Here's a snipet of how I use the binary heap: > ===== > from heap cimport BinaryHeapWithCrossRef > front = BinaryHeapWithCrossRef(frozen_flat) > value, ii= front.Pop() # to get the pixel of the front with the > minimum cumulative cost > front.Push(cumcost_flat[ii], ii) # to insert or update a value > ===== > I use flat arrays and scalar indices, so I need to store only one > reference per pixel. This also > makes the implementation work for 3D data (or even higher dimensions > if you wish). > frozen_flat is a flat array, the same size as the imput (cost or > speed) image, that keeps track > whether pixels are frozen (indicating they wont change). A pixel is > frozen right after it is popped > from the heap. I use the same array in the heap to be able to update > values if a pixel is already > in the heap. > > I hope this helps a bit. If not, feel free to ask. > > Almar > > > > 2009/11/22 St?fan van der Walt : >> 2009/11/22 St?fan van der Walt : >>> This code looks really handy, and I'd love to add it. Would you >>> consider putting your code in a branch on github? >> >> Actually, don't worry -- I'll add it quickly. Thanks for the >> contribution! >> >> Cheers >> St?fan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From almar.klein at gmail.com Wed Nov 25 08:06:57 2009 From: almar.klein at gmail.com (Almar Klein) Date: Wed, 25 Nov 2009 14:06:57 +0100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Message-ID: Hi Zach, > The binary heap code looks extremely useful in general -- thanks for > making it available! Do you have any license you want it under? (BSD > seems preferable if this is to be incorporated into a scikit, e.g.) BSD's fine :) > It would be great if you would be interested in making your MCP code > available too, even just as a base for others to hack on a bit (rather > than as a finished contribution), but this is of course up to you. > Otherwise I'll probably try to throw together something similar using > the heap code. I'll send you my current implementation, you should be able to distill something usefull from that. The problem is that 1) I make use of another module of mine to deal with anisotropic data (because my data is anisotropic) 2) I use the MCP method in a specific way, therefore I needed to make the implementation more flexible. This makes it less easy to use for other people, and thus less handy to include in any toolkit as-is. >> Looking at the posted code, I think it is incorrect. Each iteration, >> you should only check the neighbours of the pixel that has the >> minimum cumulative costs. 
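Zach's question a little further up, about going from a flat index and the array shape back to the flat indices of the neighbouring pixels, comes up again just below. For a C-ordered array it is plain stride arithmetic and works in any number of dimensions; the helper here is only a guess at the kind of bookkeeping Almar means, not his actual code:

import numpy as np

def flat_neighbors(index, shape):
    # coordinates of the pixel, then the flat-index step along each axis
    coords = np.unravel_index(index, shape)
    steps = [int(np.prod(shape[k + 1:])) for k in range(len(shape))]
    neighbors = []
    for axis, step in enumerate(steps):
        if coords[axis] > 0:                    # boundary checks stop wrap-around
            neighbors.append(index - step)
        if coords[axis] < shape[axis] - 1:
            neighbors.append(index + step)
    return neighbors

# e.g. flat_neighbors(5, (3, 4)) gives [1, 9, 4, 6], the axis neighbours of pixel (1, 1)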
That's why the binary heap is so important >> to get it fast. > > Incorrect means that the code might give a wrong result: is this the > case? I *think* I had satisfied myself that the implementation (while > suboptimal because it does extra work -- a lot in some cases!) would > yield the correct path. (Note that the code doesn't terminate when the > "end" pixel is first assigned a cost, but when no costs are changing > anywhere. Basically, brute-force search instead of Dijkstra's > algorithm. Again, while a lot more than necessary to just find the > minimum cost to a single point, this condition should be sufficient to > ensure that the minimum cost to *every* point in the array has been > found, right? If my analysis is wrong, though, it wouldn't be the > first time!) I really mean wrong, sorry. You now select any pixel that is active (meaning an arbitrary pixel in the front), and from it calculate the cumulative cost for its neighbours. However, it might be that the cumulative cost of this pixel is changed later. Therefore you must take the active pixel with the lowest cumulative cost; so you know it won't be changed. >> I use flat arrays and scalar indices, so I need to store only one >> reference per pixel. This also makes the implementation work for 3D >> data (or even higher dimensions if you wish). > > > Do you have code to take the flat index and the shape of the original > array and return the indices to the neighboring pixels, or is there > some other trick with that too? Yes, it's in the code I'll send you. Cheers, Almar From zachary.pincus at yale.edu Wed Nov 25 08:50:00 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 25 Nov 2009 08:50:00 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Message-ID: > I'll send you my current implementation, you should be able to > distill something > usefull from that. The problem is that 1) I make use of another > module of > mine to deal with anisotropic data (because my data is anisotropic) > 2) I use > the MCP method in a specific way, therefore I needed to make the > implementation more flexible. This makes it less easy to use for > other people, > and thus less handy to include in any toolkit as-is. Thanks -- I'll see what I can distill! >>> Looking at the posted code, I think it is incorrect. Each iteration, >>> you should only check the neighbours of the pixel that has the >>> minimum cumulative costs. That's why the binary heap is so important >>> to get it fast. >> >> Incorrect means that the code might give a wrong result: is this the >> case? I *think* I had satisfied myself that the implementation (while >> suboptimal because it does extra work -- a lot in some cases!) would >> yield the correct path. (Note that the code doesn't terminate when >> the >> "end" pixel is first assigned a cost, but when no costs are changing >> anywhere. Basically, brute-force search instead of Dijkstra's >> algorithm. 
Again, while a lot more than necessary to just find the >> minimum cost to a single point, this condition should be sufficient >> to >> ensure that the minimum cost to *every* point in the array has been >> found, right? If my analysis is wrong, though, it wouldn't be the >> first time!) > > I really mean wrong, sorry. You now select any pixel that is active > (meaning > an arbitrary pixel in the front), and from it calculate the > cumulative cost > for its neighbours. However, it might be that the cumulative cost of > this pixel > is changed later. Therefore you must take the active pixel with the > lowest > cumulative cost; so you know it won't be changed. Each time a pixel's cumulative cost decreases, I put it back into the "active" set (i.e. the front), so then the neighbors get re-examined the next iteration, etc. This should suffice, right? Or am I *still* missing something? Not that it really matters, because the approach is rather inefficient for anything except finding the minimum cost to every single array element (and even then I'm not certain this is better). But I am curious if I just conceived of the whole problem wrong. In which case perhaps I'm not the guy you want implementing this for the scikit! Zach From almar.klein at gmail.com Wed Nov 25 09:05:02 2009 From: almar.klein at gmail.com (Almar Klein) Date: Wed, 25 Nov 2009 15:05:02 +0100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Message-ID: >>>> Looking at the posted code, I think it is incorrect. Each iteration, >>>> you should only check the neighbours of the pixel that has the >>>> minimum cumulative costs. That's why the binary heap is so important >>>> to get it fast. >>> >>> Incorrect means that the code might give a wrong result: is this the >>> case? I *think* I had satisfied myself that the implementation (while >>> suboptimal because it does extra work -- a lot in some cases!) would >>> yield the correct path. (Note that the code doesn't terminate when >>> the >>> "end" pixel is first assigned a cost, but when no costs are changing >>> anywhere. Basically, brute-force search instead of Dijkstra's >>> algorithm. Again, while a lot more than necessary to just find the >>> minimum cost to a single point, this condition should be sufficient >>> to >>> ensure that the minimum cost to *every* point in the array has been >>> found, right? If my analysis is wrong, though, it wouldn't be the >>> first time!) >> >> I really mean wrong, sorry. You now select any pixel that is active >> (meaning >> an arbitrary pixel in the front), and from it calculate the >> cumulative cost >> for its neighbours. However, it might be that the cumulative cost of >> this pixel >> is changed later. Therefore you must take the active pixel with the >> lowest >> cumulative cost; so you know it won't be changed. > > Each time a pixel's cumulative cost decreases, I put it back into the > "active" set (i.e. the front), so then the neighbors get re-examined > the next iteration, etc. This should suffice, right? Or am I *still* > missing something? 
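As an aside, the scheme Zach describes (put a pixel back on the active list whenever its cumulative cost drops, stop when nothing changes any more) is the classic label-correcting variant of the same idea. A sketch of that reading, again with invented names and not anyone's posted code, gives the same cumulative costs as the heap version but may re-examine a pixel many times, which is exactly the trade-off being discussed here:

import numpy as np
from collections import deque

def minimum_cumulative_cost_correcting(costs, seed):
    cumcost = np.empty(costs.shape)
    cumcost.fill(np.inf)
    cumcost[seed] = 0.0
    active = deque([seed])                 # the "front", in no particular order
    while active:
        i, j = active.popleft()
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < costs.shape[0] and 0 <= nj < costs.shape[1]:
                new = cumcost[i, j] + costs[ni, nj]
                if new < cumcost[ni, nj]:  # improved, so this neighbour must be revisited
                    cumcost[ni, nj] = new
                    active.append((ni, nj))
    return cumcost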
Not that it really matters, because the approach is > rather inefficient for anything except finding the minimum cost to > every single array element (and even then I'm not certain this is > better). But I am curious if I just conceived of the whole problem > wrong. In which case perhaps I'm not the guy you want implementing > this for the scikit! Ah, now I see. I'm sorry. Yes, your code should produce the correct result, although it will probably evaluate a lot of pixels more than once :) Almar From oliphant at enthought.com Wed Nov 25 09:48:09 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 25 Nov 2009 08:48:09 -0600 Subject: [SciPy-User] sinc interpolation Message-ID: On Nov 20, 2009, at 2:26 PM, David Trem wrote: > Hello, > > Is sinc interpolation available in Scipy ? Yes, use scipy.signal.resample which uses a Fourier method to downsample or upsample a signal: from scipy.signal import resample from numpy import r_, sin from pylab import plot x = r_[0:10] y = sin(x) yy = resample(x, 100) # This is a bit tricky to get the x-samples right xx = r_[0:10:101j][:-1] plot(x,y,'ro', xx, yy) -Travis From cournape at gmail.com Wed Nov 25 09:55:57 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 25 Nov 2009 23:55:57 +0900 Subject: [SciPy-User] sinc interpolation In-Reply-To: <1258807340.2525.0.camel@PCTerrusse> References: <4B06FB5E.8070806@gmail.com> <1258807340.2525.0.camel@PCTerrusse> Message-ID: <5b8d13220911250655p7c7ad73bqc4ea08b79bfa5722@mail.gmail.com> On Sat, Nov 21, 2009 at 9:42 PM, Fabricio Silva wrote: > Le vendredi 20 novembre 2009 ? 21:26 +0100, David Trem a ?crit : >> Hello, >> >> Is sinc interpolation available in Scipy ? > > David Cournapeau has a scikit for that : > http://pypi.python.org/pypi/scikits.samplerate/ It is mostly useful for audio signals, though, and limited to 1d signals. A more general sinc-based interpolation scheme would be nice for scipy.signal. David From vanforeest at gmail.com Wed Nov 25 10:54:28 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Nov 2009 16:54:28 +0100 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: Message-ID: Hi Ram, YOu should take the interarrival time between two consecutive arrivals to be exponentially distributed with rate lambda, where lambda is the arrival rate. LIke this the number of arrivals in a fixed period is Poisson distributed. I never tried, but I suppose scipy contains a module to generate exponentially distributed rv's. Nicky 2009/11/25 Ram Rachum : > Hello, > > I've just started using scipy/numpy for some queue theory. I have a queue for > which the arrival rate is a Poisson distribution. I also have the mean number of > arrivals per time unit. > > I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that > it could make a random variable for number of arrivals per time unit. But I want > the time between consecutive arrivals, as a random variable. > > Does anyone know how I can get that? > > Thanks, > Ram. 
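A small note on the resample example a few messages up: yy = resample(x, 100) looks like a typo for resample(y, 100), since it is the signal y = sin(x) that is being sinc-interpolated. A self-contained version of the same idea, keeping the convention used there for the new sample positions:

import numpy as np
from scipy.signal import resample

x = np.arange(10.0)
y = np.sin(x)

yy = resample(y, 100)          # Fourier-domain (sinc) interpolation of y
xx = np.r_[0:10:101j][:-1]     # positions of the 100 new samples

Plotting x, y, 'ro' against xx, yy as in the original message then overlays the coarse samples and the resampled curve.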
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dalloliogm at gmail.com Wed Nov 25 11:16:39 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 25 Nov 2009 17:16:39 +0100 Subject: [SciPy-User] sinc interpolation In-Reply-To: References: Message-ID: <5aa3b3570911250816t38ce53bejd693c7723d2f7924@mail.gmail.com> On Wed, Nov 25, 2009 at 3:48 PM, Travis Oliphant wrote: > > > from scipy.signal import resample > from numpy import r_, sin > from pylab import plot > > x = r_[0:10] > y = sin(x) > yy = resample(x, 100) > > # This is a bit tricky to get the x-samples right > xx = r_[0:10:101j][:-1] > just a question, why don't you use numpy.linspace(0, 10, 101) ? >>> n = numpy.linspace(0, 10, 101)[:-1] array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10. ]) >>> n == r_[0:10:101j][:-1] [True.....] -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 11:20:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 11:20:09 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: Message-ID: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> On Wed, Nov 25, 2009 at 10:54 AM, nicky van foreest wrote: > Hi Ram, > > YOu should take the interarrival time between two consecutive arrivals > to be exponentially distributed with rate lambda, where lambda is the > arrival rate. LIke this the number of arrivals in a fixed period is > Poisson distributed. I never tried, but I suppose scipy contains a > module to generate exponentially distributed rv's. The sum of iid exponential distributed rvs is gamma distributed http://en.wikipedia.org/wiki/Gamma_distribution all available in scipy.stats Josef > > Nicky > > 2009/11/25 Ram Rachum : >> Hello, >> >> I've just started using scipy/numpy for some queue theory. I have a queue for >> which the arrival rate is a Poisson distribution. I also have the mean number of >> arrivals per time unit. >> >> I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that >> it could make a random variable for number of arrivals per time unit. But I want >> the time between consecutive arrivals, as a random variable. >> >> Does anyone know how I can get that? >> >> Thanks, >> Ram. 
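To make Nicky's and Josef's answers concrete: with a Poisson arrival rate lam, the interarrival times are exponential with scale 1/lam, and the waiting time until the k-th arrival is gamma distributed with shape k. A small sketch with scipy.stats (the variable names are invented):

import numpy as np
from scipy import stats

lam = 3.0                                   # mean arrivals per time unit
gaps = stats.expon.rvs(scale=1.0 / lam, size=100000)
arrivals = np.cumsum(gaps)                  # arrival epochs of the process

# counts per unit-length window come out Poisson(lam): mean ~ variance ~ lam
counts, edges = np.histogram(arrivals, bins=np.arange(0, int(arrivals[-1]) + 1))

# waiting time until every 5th arrival: gamma with shape 5 and scale 1/lam
waits = gaps.reshape(-1, 5).sum(axis=1)
# waits.mean() should be close to stats.gamma(5, scale=1.0 / lam).mean(), i.e. 5/lam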
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Wed Nov 25 14:53:19 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 11:53:19 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. Message-ID: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> Are there enough applications of transform methods (by which I mean, Fourier, Laplace, Z, etc.) in probability & statistics for this to be considered its own specialty therein? Any text recommendations on it (even if it's only a chapter dedicated to it)? Thanks, DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 15:19:54 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 15:19:54 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> Message-ID: <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith wrote: > Are there enough applications of transform methods (by which I mean, > Fourier, Laplace, Z, etc.) in probability & statistics for this to be > considered its own specialty therein?? Any text recommendations on it (even > if it's only a chapter dedicated to it)?? Thanks, > Some information is in the thread on my recent question "characteristic functions of probability distributions" There is a large literature in econometrics and statistics about using the characteristic function for estimation and testing. The reference of Nicky for queuing theory uses mostly the Laplace transform (for discrete distributions), while for continuous distributions and mixtures the continuous fourier transform is used (definition of characteristic function). I started to work my way through part of the literature with application in finance. Main use I looked at was using the inverse Fourier transform when the characteristic function has an analytical expression and the pdf does not, e.g used for estimating difffusion processes by MLE. I haven't looked much at the Laplace transform, because I'm more interested in the continuous random variable case. Related methods work directly with the empirical characteristic function to do estimation and testing, but I haven't looked much at that yet. I looked at references from all over the place, essentially with google searches and searches of the main stats journal collections. (I have a unsorted collection of pdfs on my computer but no overview about what I actually read.) Of course the biggest and oldest use of the Fourier transform is the frequency domain analysis in time series analysis. It's not off topic because I try to get some of these methods programmed in python. 
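Since the characteristic function comes up repeatedly in this thread: the empirical characteristic function Josef mentions is just a sample average of exp(1j*t*x), so a minimal illustration is only a few lines (the standard normal is used purely as a test case because its exact characteristic function is known):

import numpy as np
from scipy import stats

x = stats.norm.rvs(size=1000)         # any sample would do
t = np.linspace(-5, 5, 201)

# empirical characteristic function: the sample mean of exp(i*t*x) at each t
ecf = np.exp(1j * t[:, None] * x[None, :]).mean(axis=1)

# exact characteristic function of the standard normal, for comparison
exact = np.exp(-t ** 2 / 2.0)
# abs(ecf - exact) stays small, shrinking roughly like 1/sqrt(len(x))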
Josef > DG > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From vanforeest at gmail.com Wed Nov 25 17:41:43 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Nov 2009 23:41:43 +0100 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> Message-ID: Hi, 2009/11/25 : > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith > wrote: >> Are there enough applications of transform methods (by which I mean, >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be >> considered its own specialty therein?? Any text recommendations on it (even >> if it's only a chapter dedicated to it)?? Thanks, >> > > Some information is in the thread on my recent question > "characteristic functions of probability distributions" > > There is a large literature in econometrics and statistics about using > the characteristic function for estimation and testing. > The reference of Nicky for queuing theory uses mostly the Laplace > transform (for discrete distributions), It has been some time ago (more than 5 years...), but I recall that Whitt, in his articles on the numerical inversion of Laplace transforms, discretized Laplace transforms to facilitate the inversion, The distributions themselves are not necessarily discrete. One example would be the waiting time distribution of customers in a queue, which is continuous for most service and arrival processes. There is certainly potential for dedicated numerical inversion algo's for the Laplace transforms of density and distribution functions. The latter form a somewhat specialized sort of function. Distribution functions are 0 at -\infty, and 1 at \infty, and are non decreasing. They may also have discontinuities, but not too many. These properties may affect the inversion. Besides these properties, the transforms are used to obtain insight into the behavior of the sum of independent random variables. Such sums can be rewritten as the product of the transforms of distribution. This product in turn requires inversion to, as some people call it, take away the Laplacian curtain. Nicky From mjtemkin at gmail.com Wed Nov 25 18:13:00 2009 From: mjtemkin at gmail.com (Michael Temkin) Date: Wed, 25 Nov 2009 15:13:00 -0800 Subject: [SciPy-User] scipy.linalg import issues on Mac OS X Snow Leopard Message-ID: <79a789c20911251513l228219a2mc80f33f51361e61a@mail.gmail.com> I've been having numerous issues getting scipy to work on Mac OX 10.6. I finally got it to build using the instructions from http://blog.hyperjeff.net/?p=160. For some reason the scipy superpack won't work on my machine (and neither would the macports release) so building from source was the best option. Even though the build was finally successful, I am still unable to use the library. 
The error message I am getting now is: from scipy.linalg import norm, inv File "/Library/Python/2.6/site-packages/scipy/linalg/__init__.py", line 8, in from basic import * File "/Library/Python/2.6/site-packages/scipy/linalg/basic.py", line 17, in from lapack import get_lapack_funcs File "/Library/Python/2.6/site-packages/scipy/linalg/lapack.py", line 17, in from scipy.linalg import flapack ImportError: dlopen(/Library/Python/2.6/site-packages/scipy/linalg/flapack.so, 2): Symbol not found: _f2pywrapdlamch_ Referenced from: /Library/Python/2.6/site-packages/scipy/linalg/flapack.so Expected in: dynamic lookup Everything is where it should be, and as far as I know I am not doing anything non-standard. I'm using python 2.6.4, all the libraries were built with no obvious issues. Does anyone have any ideas as to what could cause this issue? thanks. From josef.pktd at gmail.com Wed Nov 25 18:23:26 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 18:23:26 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> Message-ID: <1cd32cbb0911251523k460e67cdxa8b96341c1826144@mail.gmail.com> On Wed, Nov 25, 2009 at 5:41 PM, nicky van foreest wrote: > Hi, > > 2009/11/25 ?: >> On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >> wrote: >>> Are there enough applications of transform methods (by which I mean, >>> Fourier, Laplace, Z, etc.) in probability & statistics for this to be >>> considered its own specialty therein?? Any text recommendations on it (even >>> if it's only a chapter dedicated to it)?? Thanks, >>> >> >> Some information is in the thread on my recent question >> "characteristic functions of probability distributions" >> >> There is a large literature in econometrics and statistics about using >> the characteristic function for estimation and testing. >> The reference of Nicky for queuing theory uses mostly the Laplace >> transform (for discrete distributions), > > It has been some time ago (more than 5 years...), but I recall that > Whitt, in his articles on the numerical inversion of Laplace > transforms, discretized Laplace transforms to facilitate the > inversion, The distributions themselves are not necessarily discrete. > One example would be the waiting time distribution of customers in a > queue, which is continuous for most service and arrival processes. > > There is certainly potential for dedicated numerical inversion algo's > for the Laplace transforms of density and distribution functions. The > latter form a somewhat specialized sort of function. Distribution > functions are 0 at -\infty, and 1 at \infty, and are non decreasing. > They may also have discontinuities, but not too many. These properties > may affect the inversion. ?Besides these properties, the transforms > are used to obtain insight into the behavior of the sum of independent > random variables. Such sums can be rewritten as the product of the > transforms of distribution. This product in turn requires inversion > to, as some people call it, take away the Laplacian curtain. Is there an advantage to using Laplace instead of Fourier transform in this context? I had to stop working on this, because I have to finish up some other projects. 
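For the kind of numerical inversion being discussed in this thread (and in the Waller et al. reference cited just below), the real form of the Fourier inversion formula, pdf(x) = (1/pi) * integral from 0 to infinity of Re[exp(-i*t*x) * phi(t)] dt, drops straight into scipy.integrate.quad. A sketch, using the standard normal as a test case since its characteristic function is known in closed form:

import numpy as np
from scipy import integrate

def cf_normal(t):
    # characteristic function of the standard normal, E[exp(i*t*X)]
    return np.exp(-0.5 * t ** 2)

def pdf_from_cf(x, cf):
    # pointwise inversion; uses phi(-t) == conj(phi(t)) for a real-valued rv
    integrand = lambda t: (np.exp(-1j * t * x) * cf(t)).real
    value, err = integrate.quad(integrand, 0, np.inf)
    return value / np.pi

# pdf_from_cf(0.0, cf_normal) comes out close to 1/sqrt(2*pi) ~ 0.3989

An fft-based route (evaluating phi on a regular grid and doing this integral for all x at once) is discussed just below; setting up the grids correctly takes more care than fits in an aside like this.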
The advantages that I saw for the Fourier transform are that it has directly the interpretation as characteristic function with explicit formulas for many distributions, e.g. stable distribution which has no analytical expression for pdf or cdf, and the availability of fft to do fast inversion instead of pointwise integration. Except reading the definition of the Laplace transform, I don't know much about it and have no idea what the numerical advantages might be. Another application, besides the sum of rvs, that I looked at, are mixture distributions, e.g. Poisson mixture of continuous (lognormal) distributions, which are also easy to calculate in terms of the characteristic function, and I guess the Laplace transform. This is an older reference that is cited quite a bit: Waller, Lance A., Bruce W. Turnbull, and J. Michael Hardin. ?Obtaining Distribution Functions by Numerical Inversion of Characteristic Functions with Applications.? The American Statistician 49, no. 4 (November 1995): 346-350. Josef > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Wed Nov 25 18:24:37 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 15:24:37 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> Message-ID: <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> Good info, thanks; I'll look up "your" thread, Josef, on the archive and run down what look like relevant references. (FWIW, my interest is that I'm helping out (nominally, "tutoring," but this level, it's more akin to being a sounding board, checking his derivations, and "reminding" him of various subtleties that are emphasized in math, but not necessarily in EE, etc.) this guy working on his dissertation on air traffic control automation using wireless communication protocols, very probability heavy stuff, and for the second time yesterday, he presented me with a transform application - in this instance, the "Z" transform - in this probability-heavy stuff, and this is outside of my training in probability, so I want to "bone-up.") Thanks again, DG On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest wrote: > Hi, > > 2009/11/25 : > > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith > > wrote: > >> Are there enough applications of transform methods (by which I mean, > >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be > >> considered its own specialty therein? Any text recommendations on it > (even > >> if it's only a chapter dedicated to it)? Thanks, > >> > > > > Some information is in the thread on my recent question > > "characteristic functions of probability distributions" > > > > There is a large literature in econometrics and statistics about using > > the characteristic function for estimation and testing. > > The reference of Nicky for queuing theory uses mostly the Laplace > > transform (for discrete distributions), > > It has been some time ago (more than 5 years...), but I recall that > Whitt, in his articles on the numerical inversion of Laplace > transforms, discretized Laplace transforms to facilitate the > inversion, The distributions themselves are not necessarily discrete. 
> One example would be the waiting time distribution of customers in a > queue, which is continuous for most service and arrival processes. > > There is certainly potential for dedicated numerical inversion algo's > for the Laplace transforms of density and distribution functions. The > latter form a somewhat specialized sort of function. Distribution > functions are 0 at -\infty, and 1 at \infty, and are non decreasing. > They may also have discontinuities, but not too many. These properties > may affect the inversion. Besides these properties, the transforms > are used to obtain insight into the behavior of the sum of independent > random variables. Such sums can be rewritten as the product of the > transforms of distribution. This product in turn requires inversion > to, as some people call it, take away the Laplacian curtain. > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed Nov 25 18:27:57 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 15:27:57 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <1cd32cbb0911251523k460e67cdxa8b96341c1826144@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <1cd32cbb0911251523k460e67cdxa8b96341c1826144@mail.gmail.com> Message-ID: <45d1ab480911251527u473b5503i5a7856ba47e3fccf@mail.gmail.com> On Wed, Nov 25, 2009 at 3:23 PM, wrote: > On Wed, Nov 25, 2009 at 5:41 PM, nicky van foreest > wrote: > > Hi, > >This is an older reference that is cited quite a bit: > Waller, Lance A., Bruce W. Turnbull, and J. Michael Hardin. ?Obtaining > Distribution Functions by Numerical Inversion of Characteristic > Functions with Applications.? The American Statistician 49, no. 4 > (November 1995): 346-350. > Great, thanks! DG > > Josef > > > > > Nicky > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 18:45:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 18:45:38 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> Message-ID: <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith wrote: > Good info, thanks; I'll look up "your" thread, Josef, on the archive and run > down what look like relevant references.? (FWIW, my interest is that I'm > helping out (nominally, "tutoring," but this level, it's more akin to being > a sounding board, checking his derivations, and "reminding" him of various > subtleties that are emphasized in math, but not necessarily in EE, etc.) 
> this guy working on his dissertation on air traffic control automation using > wireless communication protocols, very probability heavy stuff, and for the > second time yesterday, he presented me with a transform application - in > this instance, the "Z" transform - in this probability-heavy stuff, and this > is outside of my training in probability, so I want to "bone-up.")? Thanks > again, I always have to look for your reply because you don't follow our bottom-posting policy. I have seen the z-transform only in the context of time series analysis http://en.wikipedia.org/wiki/Z-transform especially this http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation covered to some extend in scipy.signal, lfilter and lti so the other literature to Laplace transforms and characteristic functions might not be very closely related. Josef > > DG > > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest > wrote: >> >> Hi, >> >> 2009/11/25 ?: >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >> > wrote: >> >> Are there enough applications of transform methods (by which I mean, >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be >> >> considered its own specialty therein?? Any text recommendations on it >> >> (even >> >> if it's only a chapter dedicated to it)?? Thanks, >> >> >> > >> > Some information is in the thread on my recent question >> > "characteristic functions of probability distributions" >> > >> > There is a large literature in econometrics and statistics about using >> > the characteristic function for estimation and testing. >> > The reference of Nicky for queuing theory uses mostly the Laplace >> > transform (for discrete distributions), >> >> It has been some time ago (more than 5 years...), but I recall that >> Whitt, in his articles on the numerical inversion of Laplace >> transforms, discretized Laplace transforms to facilitate the >> inversion, The distributions themselves are not necessarily discrete. >> One example would be the waiting time distribution of customers in a >> queue, which is continuous for most service and arrival processes. >> >> There is certainly potential for dedicated numerical inversion algo's >> for the Laplace transforms of density and distribution functions. The >> latter form a somewhat specialized sort of function. Distribution >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. >> They may also have discontinuities, but not too many. These properties >> may affect the inversion. ?Besides these properties, the transforms >> are used to obtain insight into the behavior of the sum of independent >> random variables. Such sums can be rewritten as the product of the >> transforms of distribution. This product in turn requires inversion >> to, as some people call it, take away the Laplacian curtain. >> >> Nicky >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From d.l.goldsmith at gmail.com Wed Nov 25 19:31:00 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 16:31:00 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. 
In-Reply-To: <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> Message-ID: <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> On Wed, Nov 25, 2009 at 3:45 PM, wrote: > On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith > wrote: > > Good info, thanks; I'll look up "your" thread, Josef, on the archive and > run > > down what look like relevant references. (FWIW, my interest is that I'm > > helping out (nominally, "tutoring," but this level, it's more akin to > being > > a sounding board, checking his derivations, and "reminding" him of > various > > subtleties that are emphasized in math, but not necessarily in EE, etc.) > > this guy working on his dissertation on air traffic control automation > using > > wireless communication protocols, very probability heavy stuff, and for > the > > second time yesterday, he presented me with a transform application - in > > this instance, the "Z" transform - in this probability-heavy stuff, and > this > > is outside of my training in probability, so I want to "bone-up.") > Thanks > > again, > > I always have to look for your reply because you don't follow our > bottom-posting > policy. > Sorry, I tend to "follow" when I'm saying something in direct response to something I'm replying to and/or when I think that I'm likely _not_ terminating the thread, but when I'm responding generally and/or think that I am likely terminating the thread, then I tend to just reply at the top. I'll try to remember that we have a policy. :-) I have seen the z-transform only in the context of time series analysis > http://en.wikipedia.org/wiki/Z-transform > especially this > > http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation > covered to some extend in scipy.signal, lfilter and lti > Part of the problem was that it wasn't clear to either of us - myself or my "student" - why the authors of this particular paper were using the z-transform at all where they were - it seemed their result was easily derivable w/out it, so we were both baffled. so the other literature to Laplace transforms and characteristic functions > might not be very closely related. > Perhaps not directly (in any event, presently, I'm interested in theoretical/"analytical," i.e., not numerical, applications anyway), but my philosophy has always been, if I can be directed to something that is closer to on target than what I've been able to find on my own, then even if it's not a bulls-eye, I can often find a bulls-eye in the reference's references. For example, "Chung (or any other book on graduate probability)" sounds like a good starting point. So thanks for reminding me about the thread. (I knew it sounded familiar: I contributed to it! 
And on that note, I "let it lie" at the time, but now feel I should say, admittedly a little defensively, that of course Anne's comments were on the mark; the only reasons I felt it necessary to add what I did about complex integration over a closed path were: A) you had indicated that you were a bit of a novice in the field, and the result I was giving is, perhaps arguably, the subject's most fundamental result, and B) I felt that it was important that you were aware of it because, if any of your functions _were_ analytic and your paths closed, then you shouldn't be doing any (explicit) numerical (or symbolic, for that matter) integration at all - you should just be "hard-wiring" those integrals to zero! And for what it's worth: every time you integrate with respect to one (continuous) real variable, you're doing a path integration - one so comparatively trivial that we don't call it that, but a path integration nevertheless.) :-) DG > > Josef > > > > > > DG > > > > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest > > > wrote: > >> > >> Hi, > >> > >> 2009/11/25 : > >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith > >> > wrote: > >> >> Are there enoug applications of transform methods (by which I mean, > >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be > >> >> considered its own specialty therein? Any text recommendations on it > >> >> (even > >> >> if it's only a chapter dedicated to it)? Thanks, > >> >> > >> > > >> > Some information is in the thread on my recent question > >> > "characteristic functions of probability distributions" > >> > > >> > There is a large literature in econometrics and statistics about using > >> > the characteristic function for estimation and testing. > >> > The reference of Nicky for queuing theory uses mostly the Laplace > >> > transform (for discrete distributions), > >> > >> It has been some time ago (more than 5 years...), but I recall that > >> Whitt, in his articles on the numerical inversion of Laplace > >> transforms, discretized Laplace transforms to facilitate the > >> inversion, The distributions themselves are not necessarily discrete. > >> One example would be the waiting time distribution of customers in a > >> queue, which is continuous for most service and arrival processes. > >> > >> There is certainly potential for dedicated numerical inversion algo's > >> for the Laplace transforms of density and distribution functions. The > >> latter form a somewhat specialized sort of function. Distribution > >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. > >> They may also have discontinuities, but not too many. These properties > >> may affect the inversion. Besides these properties, the transforms > >> are used to obtain insight into the behavior of the sum of independent > >> random variables. Such sums can be rewritten as the product of the > >> transforms of distribution. This product in turn requires inversion > >> to, as some people call it, take away the Laplacian curtain. 
> >> > >> Nicky > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed Nov 25 19:54:58 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 16:54:58 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> Message-ID: <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> On Wed, Nov 25, 2009 at 4:31 PM, David Goldsmith wrote: > On Wed, Nov 25, 2009 at 3:45 PM, wrote: > >> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith >> wrote: >> > Good info, thanks; I'll look up "your" thread, Josef, on the archive and >> run >> > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. looks like a really good general reference, Nicky - I assume this is the "Chung" to which you were referring? Thanks!!! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 21:04:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 21:04:09 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> Message-ID: <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> On Wed, Nov 25, 2009 at 7:31 PM, David Goldsmith wrote: > On Wed, Nov 25, 2009 at 3:45 PM, wrote: >> >> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith >> wrote: >> > Good info, thanks; I'll look up "your" thread, Josef, on the archive and >> > run >> > down what look like relevant references.? (FWIW, my interest is that I'm >> > helping out (nominally, "tutoring," but this level, it's more akin to >> > being >> > a sounding board, checking his derivations, and "reminding" him of >> > various >> > subtleties that are emphasized in math, but not necessarily in EE, etc.) 
>> > this guy working on his dissertation on air traffic control automation >> > using >> > wireless communication protocols, very probability heavy stuff, and for >> > the >> > second time yesterday, he presented me with a transform application - in >> > this instance, the "Z" transform - in this probability-heavy stuff, and >> > this >> > is outside of my training in probability, so I want to "bone-up.") >> > Thanks >> > again, >> >> I always have to look for your reply because you don't follow our >> bottom-posting >> policy. > > Sorry, I tend to "follow" when I'm saying something in direct response to > something I'm replying to and/or when I think that I'm likely _not_ > terminating the thread, but when I'm responding generally and/or think that > I am likely terminating the thread, then I tend to just reply at the top. > I'll try to remember that we have a policy. :-) > >> I have seen the z-transform only in the context of time series analysis >> http://en.wikipedia.org/wiki/Z-transform >> especially this >> >> http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation >> covered to some extend in scipy.signal, lfilter and lti > > Part of the problem was that it wasn't clear to either of us - myself or my > "student" - why the authors of this particular paper were using the > z-transform at all where they were - it seemed their result was easily > derivable w/out it, so we were both baffled. > >> so the other literature to Laplace transforms and characteristic functions >> might not be very closely related. > > Perhaps not directly (in any event, presently, I'm interested in > theoretical/"analytical," i.e., not numerical, applications anyway), but my > philosophy has always been, if I can be directed to something that is closer > to on target than what I've been able to find on my own, then even if it's > not a bulls-eye, I can often find a bulls-eye in the reference's > references.? For example, "Chung (or any other book on graduate > probability)" sounds like a good starting point.? So thanks for reminding me > about the thread.? (I knew it sounded familiar: I contributed to it!? And on > that note, I "let it lie" at the time, but now feel I should say, admittedly > a little defensively, that of course Anne's comments were on the mark; the > only reasons I felt it necessary to add what I did about complex integration > over a closed path were: A) you had indicated that you were a bit of a > novice in the field, and the result I was giving is, perhaps arguably, the > subject's most fundamental result, and B) I felt that it was important that > you were aware of it because, if any of your functions _were_ analytic and > your paths closed, then you shouldn't be doing any (explicit) numerical (or > symbolic, for that matter) integration at all - you should just be > "hard-wiring" those integrals to zero!? And for what it's worth: every time > you integrate with respect to one (continuous) real variable, you're doing a > path integration - one so comparatively trivial that we don't call it that, > but a path integration nevertheless.) :-) I was just reading up a bit on contour integrals on wikipedia, and it looks too applied for Probability and Measure theory. It just tells you how to use some tricks to calculate specific Rieman integrals in the complex plane. I didn't see any hints for Lebesque integrals. All real analysis, and measure theory (that I have seen) is based on Lebesque integration or Lebesque-Stiltjes as in Chungs book. 
So for me contour integrals just falls in between the measure theory and the applied (real) calculations, and I never had to figure out what it does. I'm not doing path integration when I integrate with respect to a (probability) measure that has both continuous intervals and mass points (Lebesque not Rieman if you want to be picky) Josef > > DG > >> >> Josef >> >> >> > >> > DG >> > >> > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest >> > >> > wrote: >> >> >> >> Hi, >> >> >> >> 2009/11/25 ?: >> >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >> >> > wrote: >> >> >> Are there enoug applications of transform methods (by which I mean, >> >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to >> >> >> be >> >> >> considered its own specialty therein?? Any text recommendations on >> >> >> it >> >> >> (even >> >> >> if it's only a chapter dedicated to it)?? Thanks, >> >> >> >> >> > >> >> > Some information is in the thread on my recent question >> >> > "characteristic functions of probability distributions" >> >> > >> >> > There is a large literature in econometrics and statistics about >> >> > using >> >> > the characteristic function for estimation and testing. >> >> > The reference of Nicky for queuing theory uses mostly the Laplace >> >> > transform (for discrete distributions), >> >> >> >> It has been some time ago (more than 5 years...), but I recall that >> >> Whitt, in his articles on the numerical inversion of Laplace >> >> transforms, discretized Laplace transforms to facilitate the >> >> inversion, The distributions themselves are not necessarily discrete. >> >> One example would be the waiting time distribution of customers in a >> >> queue, which is continuous for most service and arrival processes. >> >> >> >> There is certainly potential for dedicated numerical inversion algo's >> >> for the Laplace transforms of density and distribution functions. The >> >> latter form a somewhat specialized sort of function. Distribution >> >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. >> >> They may also have discontinuities, but not too many. These properties >> >> may affect the inversion. ?Besides these properties, the transforms >> >> are used to obtain insight into the behavior of the sum of independent >> >> random variables. Such sums can be rewritten as the product of the >> >> transforms of distribution. This product in turn requires inversion >> >> to, as some people call it, take away the Laplacian curtain. >> >> >> >> Nicky >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Wed Nov 25 21:48:56 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 21:48:56 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. 
In-Reply-To: <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> Message-ID: <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> On Wed, Nov 25, 2009 at 9:04 PM, wrote: > On Wed, Nov 25, 2009 at 7:31 PM, David Goldsmith > wrote: >> On Wed, Nov 25, 2009 at 3:45 PM, wrote: >>> >>> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith >>> wrote: >>> > Good info, thanks; I'll look up "your" thread, Josef, on the archive and >>> > run >>> > down what look like relevant references.? (FWIW, my interest is that I'm >>> > helping out (nominally, "tutoring," but this level, it's more akin to >>> > being >>> > a sounding board, checking his derivations, and "reminding" him of >>> > various >>> > subtleties that are emphasized in math, but not necessarily in EE, etc.) >>> > this guy working on his dissertation on air traffic control automation >>> > using >>> > wireless communication protocols, very probability heavy stuff, and for >>> > the >>> > second time yesterday, he presented me with a transform application - in >>> > this instance, the "Z" transform - in this probability-heavy stuff, and >>> > this >>> > is outside of my training in probability, so I want to "bone-up.") >>> > Thanks >>> > again, >>> >>> I always have to look for your reply because you don't follow our >>> bottom-posting >>> policy. >> >> Sorry, I tend to "follow" when I'm saying something in direct response to >> something I'm replying to and/or when I think that I'm likely _not_ >> terminating the thread, but when I'm responding generally and/or think that >> I am likely terminating the thread, then I tend to just reply at the top. >> I'll try to remember that we have a policy. :-) >> >>> I have seen the z-transform only in the context of time series analysis >>> http://en.wikipedia.org/wiki/Z-transform >>> especially this >>> >>> http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation >>> covered to some extend in scipy.signal, lfilter and lti >> >> Part of the problem was that it wasn't clear to either of us - myself or my >> "student" - why the authors of this particular paper were using the >> z-transform at all where they were - it seemed their result was easily >> derivable w/out it, so we were both baffled. >> >>> so the other literature to Laplace transforms and characteristic functions >>> might not be very closely related. >> >> Perhaps not directly (in any event, presently, I'm interested in >> theoretical/"analytical," i.e., not numerical, applications anyway), but my >> philosophy has always been, if I can be directed to something that is closer >> to on target than what I've been able to find on my own, then even if it's >> not a bulls-eye, I can often find a bulls-eye in the reference's >> references.? For example, "Chung (or any other book on graduate >> probability)" sounds like a good starting point.? So thanks for reminding me >> about the thread.? (I knew it sounded familiar: I contributed to it!? 
And on >> that note, I "let it lie" at the time, but now feel I should say, admittedly >> a little defensively, that of course Anne's comments were on the mark; the >> only reasons I felt it necessary to add what I did about complex integration >> over a closed path were: A) you had indicated that you were a bit of a >> novice in the field, and the result I was giving is, perhaps arguably, the >> subject's most fundamental result, and B) I felt that it was important that >> you were aware of it because, if any of your functions _were_ analytic and >> your paths closed, then you shouldn't be doing any (explicit) numerical (or >> symbolic, for that matter) integration at all - you should just be >> "hard-wiring" those integrals to zero!? And for what it's worth: every time >> you integrate with respect to one (continuous) real variable, you're doing a >> path integration - one so comparatively trivial that we don't call it that, >> but a path integration nevertheless.) :-) > > I was just reading up a bit on contour integrals on wikipedia, and it > looks too applied for Probability and Measure theory. It just tells > you how to use some tricks to calculate specific Rieman integrals in > the complex plane. I didn't see any hints for Lebesque integrals. > All real analysis, and measure theory (that I have seen) is based on > Lebesque integration or Lebesque-Stiltjes as in Chungs book. So for me > contour integrals just falls in between the measure theory and the > applied (real) calculations, and I never had to figure out what it > does. > > I'm not doing path integration when I integrate with respect to a > (probability) measure that has both continuous intervals and mass > points (Lebesque not Rieman if you want to be picky) Maybe the last statement is wrong, it's too long ago that I struggled with this. Maybe I'm mixing up Lebesgue-integral, Lebesgue-measurable, and measures that are absolutely continuous with respect to Lebesgue-measure. Josef > > Josef > > > > >> >> DG >> >>> >>> Josef >>> >>> >>> > >>> > DG >>> > >>> > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest >>> > >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> 2009/11/25 ?: >>> >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >>> >> > wrote: >>> >> >> Are there enoug applications of transform methods (by which I mean, >>> >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to >>> >> >> be >>> >> >> considered its own specialty therein?? Any text recommendations on >>> >> >> it >>> >> >> (even >>> >> >> if it's only a chapter dedicated to it)?? Thanks, >>> >> >> >>> >> > >>> >> > Some information is in the thread on my recent question >>> >> > "characteristic functions of probability distributions" >>> >> > >>> >> > There is a large literature in econometrics and statistics about >>> >> > using >>> >> > the characteristic function for estimation and testing. >>> >> > The reference of Nicky for queuing theory uses mostly the Laplace >>> >> > transform (for discrete distributions), >>> >> >>> >> It has been some time ago (more than 5 years...), but I recall that >>> >> Whitt, in his articles on the numerical inversion of Laplace >>> >> transforms, discretized Laplace transforms to facilitate the >>> >> inversion, The distributions themselves are not necessarily discrete. >>> >> One example would be the waiting time distribution of customers in a >>> >> queue, which is continuous for most service and arrival processes. 
>>> >> >>> >> There is certainly potential for dedicated numerical inversion algo's >>> >> for the Laplace transforms of density and distribution functions. The >>> >> latter form a somewhat specialized sort of function. Distribution >>> >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. >>> >> They may also have discontinuities, but not too many. These properties >>> >> may affect the inversion. ?Besides these properties, the transforms >>> >> are used to obtain insight into the behavior of the sum of independent >>> >> random variables. Such sums can be rewritten as the product of the >>> >> transforms of distribution. This product in turn requires inversion >>> >> to, as some people call it, take away the Laplacian curtain. >>> >> >>> >> Nicky >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From david at ar.media.kyoto-u.ac.jp Wed Nov 25 23:45:45 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 26 Nov 2009 13:45:45 +0900 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> Message-ID: <4B0E07F9.8070409@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: > > Maybe the last statement is wrong, it's too long ago that I > struggled with this. Maybe I'm mixing up Lebesgue-integral, > Lebesgue-measurable, and measures that are absolutely continuous > with respect to Lebesgue-measure. > I am by no mean an expert on this, but I believe you are right. AFAIK, contour integrals require to have a piecewise-continuous parametrization of your path, and for me, the whole point of Lebesgue integrals is to handle cases where the set over which you integrate the function is not a (finite) union of intervals. I don't know if it makes sense to define something "like" contour integrals for lebesgue integrals. The fundamental reason why Lebesgue integrals work the way they do is because for a function f: E ->F, only the properties of F (and how the inversion function maps elements of the sigma algebra F) matter. And complex analysis is 'special' because of the special structure of E, not F. David From josef.pktd at gmail.com Thu Nov 26 01:19:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Nov 2009 01:19:38 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. 
In-Reply-To: <4B0E07F9.8070409@ar.media.kyoto-u.ac.jp> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> <4B0E07F9.8070409@ar.media.kyoto-u.ac.jp> Message-ID: <1cd32cbb0911252219p45046eddx88ac1272e9ed8ec4@mail.gmail.com> On Wed, Nov 25, 2009 at 11:45 PM, David Cournapeau wrote: > josef.pktd at gmail.com wrote: >> >> Maybe the last statement is wrong, it's too long ago that I >> struggled with this. Maybe I'm mixing up Lebesgue-integral, >> Lebesgue-measurable, and measures that are absolutely continuous >> with respect to Lebesgue-measure. >> > > I am by no mean an expert on this, but I believe you are right. AFAIK, > contour integrals require to have a piecewise-continuous parametrization > of your path, and for me, the whole point of Lebesgue integrals is to > handle cases where the set over which you integrate the function is not > a (finite) union of intervals. > > I don't know if it makes sense to define something "like" contour > integrals for lebesgue integrals. The fundamental reason why Lebesgue > integrals work the way they do is because for a function f: E ->F, only > the properties of F (and how the inversion function maps elements of the > sigma algebra F) matter. And complex analysis is 'special' because of > the special structure of E, not F. I think on the theoretical level I'm right, but from what I read the last few hours, contour integrals seem to provide a method to actually calculate the integral, while I haven't seen much practical applications of Lebesgue integration. For the simple examples that I tried so far for the inversion of the characteristic function, I didn't need contour nor Lebesque integrals. And I hope it stays this way when I get back to this, especially since I never had to learn anything about complex analysis and the special structure of complex numbers. Josef > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From oliphant at enthought.com Thu Nov 26 11:42:44 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 26 Nov 2009 10:42:44 -0600 Subject: [SciPy-User] sinc interpolation In-Reply-To: <5aa3b3570911250816t38ce53bejd693c7723d2f7924@mail.gmail.com> References: <5aa3b3570911250816t38ce53bejd693c7723d2f7924@mail.gmail.com> Message-ID: On Nov 25, 2009, at 10:16 AM, Giovanni Marco Dall'Olio wrote: > > > On Wed, Nov 25, 2009 at 3:48 PM, Travis Oliphant > wrote: > > > from scipy.signal import resample > from numpy import r_, sin > from pylab import plot > > x = r_[0:10] > y = sin(x) > yy = resample(x, 100) > > # This is a bit tricky to get the x-samples right > xx = r_[0:10:101j][:-1] > > just a question, why don't you use numpy.linspace(0, 10, 101) ? I do quite often (especially in module code), but r_ is less typing and I like the use of slice syntax to specify endpoints. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Thu Nov 26 12:19:19 2009 From: sccolbert at gmail.com (S. 
Chris Colbert) Date: Thu, 26 Nov 2009 18:19:19 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <4B0D1603.40509@ntc.zcu.cz> References: <4B0D0E33.4010703@ntc.zcu.cz> <20091125110643.GB21484@phare.normalesup.org> <4B0D1603.40509@ntc.zcu.cz> Message-ID: <200911261819.19584.sccolbert@gmail.com> I'm getting all sorts of errors trying to run sfepy tests and examples: It builds fine. But I fail one of the solvers test because of a bug with OpenMPI (whichever solver is using Petsc4py which btw, is not listed as a dependency). The schroedinger example runs, but produces erroneous output (~300% error). The poisson and valec examples produce error results. System specs: Kubuntu 9.10 x64 Self built/easy_insall: Numpy 1.3.0 Scipy 0.7.1 Newest umfpack scikit Newest Petsc4Py pytables from the repos: hdf5-serial openmpi 1.6.6 pysparse Any help would be awesome! Cheers, Chris > Gael Varoquaux wrote: > > On Wed, Nov 25, 2009 at 12:00:03PM +0100, Robert Cimrman wrote: > >>> 555 else: > >>> 556 gui = ViewerGUI(viewer=self) > >>> --> 557 scene = gui.scene.mayavi_scene > >>> 558 > >>> 559 if scene is not self.scene: > >>> > >>> AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' > > > > Yes, mayavi_scene is new in 3.3.0. We realized that this functionnality > > we needed a bit late :) > > > > Ga?l > > And I have started using it even later :) > > thanks for clarification! > r. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From vanforeest at gmail.com Thu Nov 26 15:34:37 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 26 Nov 2009 21:34:37 +0100 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> Message-ID: > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. > looks like a really good general reference, Nicky - I assume this is the > "Chung" to which you were referring?? Thanks!!! That is the one indeed. For z transforms you might like generatingfunctionoly (or something like this). Search on Herbert Wilf, On his website you can find a very nice book on the uses of z transforms. The first chapter is very accessible. HOpe this helps. NIcky From peter.combs at berkeley.edu Thu Nov 26 16:34:12 2009 From: peter.combs at berkeley.edu (Peter Combs) Date: Thu, 26 Nov 2009 13:34:12 -0800 Subject: [SciPy-User] Bivariate Spline Surface Fitting Message-ID: Hi all, I have localization data in 2 color channels that should agree with each other, but in practice, they don't to the level we want. I thought I'd try doing a straight polynomial least squares fit, and while that gives better registration between the two, I'm still not to the level I want. My next thought was a spline fit, so I'm trying to make two least-squares bivariate spline fits: one for taking (x,y) to x', and one for taking (x,y) to y'. import scipy.interpolate as interp ... 
def makeLSQspline(xl, yl, xr, yr): """docstring for makespline""" xmin = xr.min()-1 xmax = xr.max()+1 ymin = yr.min()-1 ymax = yr.max()+1 n = len(xl) print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax yknots, xknots = mgrid[ymin:ymax:10j, xmin:xmax:10j] # Makes an 11x11 regular grid of knot locations xspline = interp.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) yspline = interp.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) def mapping(xr, yr): xl = xspline.ev(xr, yr) yl = yspline.ev(xr, yr) return xl, yl return mapping I have a "Registration Error" function which calculates a mapping for all but the ith point, then plugs that point into the mapping and finds the difference between the predicted value and the known value. For the 2nd order polynomial fit, I get a mean registration error around 7nm, but for the spline fitting using the function above, the mean error is more like 20,000nm. Which (along with all the random junk that gets spit out, such as /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: UserWarning: Error on entry, no approximation returned. The following conditions must hold: xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 If iopt==-1, then xb References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> Message-ID: <45d1ab480911261405p6dbca723j484942a14cbae52@mail.gmail.com> Sounds good, thanks again! DG On Thu, Nov 26, 2009 at 12:34 PM, nicky van foreest wrote: > > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. > > looks like a really good general reference, Nicky - I assume this is the > > "Chung" to which you were referring? Thanks!!! > > That is the one indeed. > > For z transforms you might like generatingfunctionoly (or something > like this). Search on Herbert Wilf, On his website you can find a very > nice book on the uses of z transforms. The first chapter is very > accessible. > > HOpe this helps. > > NIcky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Nov 26 17:17:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Nov 2009 17:17:31 -0500 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: References: Message-ID: <1cd32cbb0911261417h36ea8a82lf59823bbeece05@mail.gmail.com> On Thu, Nov 26, 2009 at 4:34 PM, Peter Combs wrote: > Hi all, > I have localization data in 2 color channels that should agree with each other, but in practice, they don't to the level we want. I thought I'd try doing a straight polynomial least squares fit, and while that gives better registration between the two, I'm still not to the level I want. My next thought was a spline fit, so I'm trying to make two least-squares bivariate spline fits: one for taking (x,y) to x', and one for taking (x,y) to y'. > > > import scipy.interpolate as interp > ... > def makeLSQspline(xl, yl, xr, yr): > ? """docstring for makespline""" > > ? xmin = xr.min()-1 > ? xmax = xr.max()+1 > ? ymin = yr.min()-1 > ? 
ymax = yr.max()+1 > ? n = len(xl) > > ? print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax > > ? yknots, xknots = mgrid[ymin:ymax:10j, xmin:xmax:10j] ? # Makes an 11x11 regular grid of knot locations > > ? xspline = interp.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) > ? yspline = interp.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) > > ? def mapping(xr, yr): > ? ? ?xl = xspline.ev(xr, yr) > ? ? ?yl = yspline.ev(xr, yr) > ? ? ?return xl, yl > ? return mapping > > > I have a "Registration Error" function which calculates a mapping for all but the ith point, then plugs that point into the mapping and finds the difference between the predicted value and the known value. For the 2nd order polynomial fit, I get a mean registration error around 7nm, but for the spline fitting using the function above, the mean error is more like 20,000nm. Which (along with all the random junk that gets spit out, such as > /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: UserWarning: > Error on entry, no approximation returned. The following conditions > must hold: > xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 > If iopt==-1, then > xb yb warnings.warn(message) > > about 1 copy of this (or something similar) per call of makeLSQSpline: > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 0.00000000000000 0.00000000000000 > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 
25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 0.00000000000000 0.00000000000000 > > makes me think something isn't quite right. Any guesses what's going on? I have ~3400 data points, roughly evenly spread out over a 28,000nm x 23,000nm grid. > > On another note, how well is this going to scale up? If I end up collecting hundreds of thousands to low-millions of points, does spline fitting go as O(n^2), or more like O(n)? The error registration function runs as O(n*O(fitting)), and takes around 5 seconds now, so O(N) spline fitting is fine, about an hour run time total, but O(n^2) is very much not. > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sccolbert at gmail.com Thu Nov 26 17:45:49 2009 From: sccolbert at gmail.com (S. Chris Colbert) Date: Thu, 26 Nov 2009 23:45:49 +0100 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911261405p6dbca723j484942a14cbae52@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <45d1ab480911261405p6dbca723j484942a14cbae52@mail.gmail.com> Message-ID: <200911262345.49903.sccolbert@gmail.com> i dont know if it will be of any use to you, but: Laplace transforms and their inversions are also used extensively in control theory. I wrote some python for the numerical inversion awhile back (before i was any good in python, be warned!) 
There are two different inversion methods in the attached file: the method of riemann sums is the faster of the two, and here is the reference from which I made my implementation: http://books.google.com/books?id=CmX1aHur7jcC&pg=PA410&lpg=PA410&dq=%27%27%27This+algorithm+is+proposed+by+Tzou, +Ozisik+and+Chiffelle+%281994%29%27%27%27&source=bl&ots=NSiw3tKRvG&sig=cqfa_ka_baPbcnhoSA9Gcxo8Vj8&hl=en&ei=YgQPS-3dKJ2qmwPO6LncBQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CAgQ6AEwAA#v=onepage&q=%27%27%27This%20algorithm%20is%20proposed%20by%20Tzou%2C%20Ozisik%20and%20Chiffelle%20%281994%29%27%27%27&f=false You can probably throw out the stehfest method. I was using it originally as it was faster, but then I managed to vectorize the riemann method. I had no luck vectorizing the stehfest method. Cheers, Chris > Sounds good, thanks again! > > DG > > On Thu, Nov 26, 2009 at 12:34 PM, nicky van foreest wrote: > > > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. > > > looks like a really good general reference, Nicky - I assume this is > > > the "Chung" to which you were referring? Thanks!!! > > > > That is the one indeed. > > > > For z transforms you might like generatingfunctionoly (or something > > like this). Search on Herbert Wilf, On his website you can find a very > > nice book on the uses of z transforms. The first chapter is very > > accessible. > > > > HOpe this helps. > > > > NIcky > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: inverselaplace.py Type: text/x-python Size: 4864 bytes Desc: not available URL: From david_baddeley at yahoo.com.au Thu Nov 26 19:30:27 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 26 Nov 2009 16:30:27 -0800 (PST) Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: Message-ID: <393158.32904.qm@web33005.mail.mud.yahoo.com> Hi Peter, would that be localization microscopy data by any chance? Which method are you using? We're using a very similar approach to correct our chromatic shift, I suspect that the problem you're having with the standard spline is that if you have a few points where the shift estimation is way off (when you've got 1000's of points you're almost guaranteed to have a few in the tails of a distribution) and they're pulling the interpolation out of whack in their neighbourhood. To get around this, we've ended up using a smoothing spline rather than a simple bivariate spline (see code below). We also preprocess the data to throw out any shift measurements which are pointing in dramatically different directions to their neighbours. 
def genShiftVectorFieldSpline(x,y, sx, sy, err_sx, err_sy): '''interpolates shift vectors using smoothing splines''' wonky = findWonkyVectors(x, y, sx, sy, tol=2*err_sx.mean()) good = wonky == 0 print '%d wonky vectors found and discarded' % wonky.sum() spx = SmoothBivariateSpline(x[good], y[good], sx[good], 1./err_sx[good]) spy = SmoothBivariateSpline(x[good], y[good], sy[good], 1./err_sy[good]) X, Y = np.meshgrid(np.arange(0, 512*70, 100), np.arange(0, 256*70, 100)) dx = spx.ev(X.ravel(),Y.ravel()).reshape(X.shape) dy = spy.ev(X.ravel(),Y.ravel()).reshape(X.shape) return (dx.T, dy.T, spx, spy) I've never found that I need more than a few thousand points to calculate a shift field which will get the error down to the 10nm regime, and the most I've tried fitting is probably ~10-20K points, which would only have taken a couple of minutes. Evaluating the splines is fast though, so you should have no worries evaluating with millions of points. Cheers, David --- On Fri, 27/11/09, Peter Combs wrote: > From: Peter Combs > Subject: [SciPy-User] Bivariate Spline Surface Fitting > To: scipy-user at scipy.org > Received: Friday, 27 November, 2009, 10:34 AM > Hi all, > I have localization data in 2 color channels that should > agree with each other, but in practice, they don't to the > level we want. I thought I'd try doing a straight polynomial > least squares fit, and while that gives better registration > between the two, I'm still not to the level I want. My next > thought was a spline fit, so I'm trying to make two > least-squares bivariate spline fits: one for taking (x,y) to > x', and one for taking (x,y) to y'. > > > import scipy.interpolate as interp > ... > def makeLSQspline(xl, yl, xr, yr): > ???"""docstring for makespline""" > ??? > ???xmin = xr.min()-1 > ???xmax = xr.max()+1 > ???ymin = yr.min()-1 > ???ymax = yr.max()+1 > ???n = len(xl) > ??? > ???print "xrange: ", xmin, xmax, '\t', > "yrange: ", ymin, ymax > ??? > ???yknots, xknots = mgrid[ymin:ymax:10j, > xmin:xmax:10j]???# Makes an 11x11 regular > grid of knot locations > ??? > ???xspline = interp.LSQBivariateSpline(xr, > yr, xl, xknots.flat, yknots.flat) > ???yspline = interp.LSQBivariateSpline(xr, > yr, yl, xknots.flat, yknots.flat) > ??? > ???def mapping(xr, yr): > ? ? ? xl = xspline.ev(xr, yr) > ? ? ? yl = yspline.ev(xr, yr) > ? ? ? return xl, yl > ???return mapping > > > I have a "Registration Error" function which calculates a > mapping for all but the ith point, then plugs that point > into the mapping and finds the difference between the > predicted value and the known value. For the 2nd order > polynomial fit, I get a mean registration error around 7nm, > but for the spline fitting using the function above, the > mean error is more like 20,000nm. Which (along with all the > random junk that gets spit out, such as > /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: > UserWarning: > Error on entry, no approximation returned. 
The following > conditions > must hold: > xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, > i=0..m-1 > If iopt==-1, then > xb yb warnings.warn(message) > > about 1 copy of this (or something similar) per call of > makeLSQSpline: > tx= 0.00000000000000 0.00000000000000 0.00000000000000 > -264.022095089756 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 > ? 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 28778.4083647834 0.00000000000000 > 0.00000000000000 0.00000000000000 > tx= 0.00000000000000 0.00000000000000 0.00000000000000 > -264.022095089756 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 > ? 
12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 28778.4083647834 0.00000000000000 > 0.00000000000000 0.00000000000000 > > makes me think something isn't quite right. Any guesses > what's going on? I have ~3400 data points, roughly evenly > spread out over a 28,000nm x 23,000nm grid. > > On another note, how well is this going to scale up? If I > end up collecting hundreds of thousands to low-millions of > points, does spline fitting go as O(n^2), or more like O(n)? > The error registration function runs as O(n*O(fitting)), and > takes around 5 seconds now, so O(N) spline fitting is fine, > about an hour run time total, but O(n^2) is very much not. > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Thu Nov 26 23:03:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Nov 2009 23:03:04 -0500 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: References: Message-ID: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> On Thu, Nov 26, 2009 at 4:34 PM, Peter Combs wrote: > Hi all, > I have localization data in 2 color channels that should agree with each other, but in practice, they don't to the level we want. I thought I'd try doing a straight polynomial least squares fit, and while that gives better registration between the two, I'm still not to the level I want. My next thought was a spline fit, so I'm trying to make two least-squares bivariate spline fits: one for taking (x,y) to x', and one for taking (x,y) to y'. > > > import scipy.interpolate as interp > ... > def makeLSQspline(xl, yl, xr, yr): > ? """docstring for makespline""" > > ? xmin = xr.min()-1 > ? xmax = xr.max()+1 > ? ymin = yr.min()-1 > ? ymax = yr.max()+1 > ? n = len(xl) > > ? print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax > > ? yknots, xknots = mgrid[ymin:ymax:10j, xmin:xmax:10j] ? # Makes an 11x11 regular grid of knot locations knots should only specify the point of x and y not all grid points, I added an s to play with the border values following the example in the tests. most of it just trial and error, since I don't have a good example of what I should get as a result with the following knots, it finishes without warning and errors, and I get some numbers back that might be reasonable. s = 1.1 yknots = np.linspace(ymin+s,ymax-s,10) xknots = np.linspace(xmin+s,xmax-s,10) Some good examples for the use of the different options in the spline classes would be nice. 
the docs are still pretty bad, but there is: 473 Input: 474 x,y,z - 1-d sequences of data points (order is not 475 important) 476 tx,ty - strictly ordered 1-d sequences of knots 477 coordinates. 478 Optional input: 479 w - positive 1-d sequence of weights 480 bbox - 4-sequence specifying the boundary of 481 the rectangular approximation domain. 482 By default, bbox=[min(x,tx),max(x,tx), 483 min(y,ty),max(y,ty)] 484 kx,ky=3,3 - degrees of the bivariate spline. 485 eps - a threshold for determining the effective rank 486 of an over-determined linear system of 487 equations. 0 < eps < 1, default is 1e-16. Josef > > ? xspline = interp.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) > ? yspline = interp.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) > > ? def mapping(xr, yr): > ? ? ?xl = xspline.ev(xr, yr) > ? ? ?yl = yspline.ev(xr, yr) > ? ? ?return xl, yl > ? return mapping > > > I have a "Registration Error" function which calculates a mapping for all but the ith point, then plugs that point into the mapping and finds the difference between the predicted value and the known value. For the 2nd order polynomial fit, I get a mean registration error around 7nm, but for the spline fitting using the function above, the mean error is more like 20,000nm. Which (along with all the random junk that gets spit out, such as > /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: UserWarning: > Error on entry, no approximation returned. The following conditions > must hold: > xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 > If iopt==-1, then > xb yb warnings.warn(message) > > about 1 copy of this (or something similar) per call of makeLSQSpline: > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 
0.00000000000000 0.00000000000000 > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 0.00000000000000 0.00000000000000 > > makes me think something isn't quite right. Any guesses what's going on? I have ~3400 data points, roughly evenly spread out over a 28,000nm x 23,000nm grid. > > On another note, how well is this going to scale up? If I end up collecting hundreds of thousands to low-millions of points, does spline fitting go as O(n^2), or more like O(n)? The error registration function runs as O(n*O(fitting)), and takes around 5 seconds now, so O(N) spline fitting is fine, about an hour run time total, but O(n^2) is very much not. > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cool-rr at cool-rr.com Fri Nov 27 01:02:17 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Fri, 27 Nov 2009 06:02:17 +0000 (UTC) Subject: [SciPy-User] Tool for visualizing queues Message-ID: I'm working on a simulation in Queueing Theory, and I would like a good tool for visualizing clients standing in queues and servers serving them. I would eventually like the GUI to be interactive as well, so for example the user could drag a client from one queue to another. Any ideas? Ram. 
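For a bare-bones starting point (an illustrative sketch, not a recommendation from the thread; the M/M/1 set-up, rates, seed and helper names are all assumptions), a single-server queue with exponentially distributed interarrival and service times can be simulated and "visualized" as plain text like this:

import numpy as np

def simulate_mm1(lam=0.8, mu=1.0, n_customers=20, seed=0):
    # exponentially distributed interarrival (rate lam) and service (rate mu) times
    rng = np.random.RandomState(seed)
    arrival = np.cumsum(rng.exponential(1.0 / lam, size=n_customers))
    service = rng.exponential(1.0 / mu, size=n_customers)
    depart = np.empty(n_customers)
    for i in range(n_customers):
        # service starts when both the customer and the single server are free
        start = arrival[i] if i == 0 else max(arrival[i], depart[i - 1])
        depart[i] = start + service[i]
    return arrival, depart

def show_queue(arrival, depart):
    # crude text read-out: one '*' per customer in the system at each arrival epoch
    for t in arrival:
        in_system = int(np.sum((arrival <= t) & (depart > t)))
        print("t=%7.2f  %s" % (t, "*" * in_system))

arrival, depart = simulate_mm1()
show_queue(arrival, depart)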
From cool-rr at cool-rr.com Fri Nov 27 01:49:33 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Fri, 27 Nov 2009 06:49:33 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Mean_arrivals_per_time_unit_-=3E_Time_betw?= =?utf-8?q?een=09consecutive_arrivals?= References: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> Message-ID: gmail.com> writes: > > YOu should take the interarrival time between two consecutive arrivals > > to be exponentially distributed with rate lambda, where lambda is the > > arrival rate. LIke this the number of arrivals in a fixed period is > > Poisson distributed. I never tried, but I suppose scipy contains a > > module to generate exponentially distributed rv's. > > The sum of iid exponential distributed rvs is gamma distributed > http://en.wikipedia.org/wiki/Gamma_distribution > > all available in scipy.stats > > Josef I don't understand. So you mean that the exponential thing would NOT be the right thing for the time between consecutive arrivals? Also, why doesn't scipy automatically gives me the time between consecutive arrivals when I give the mean number of arrivals per time period? Ram. From cimrman3 at ntc.zcu.cz Fri Nov 27 03:08:44 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 27 Nov 2009 09:08:44 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <200911261819.19584.sccolbert@gmail.com> References: <4B0D0E33.4010703@ntc.zcu.cz> <20091125110643.GB21484@phare.normalesup.org> <4B0D1603.40509@ntc.zcu.cz> <200911261819.19584.sccolbert@gmail.com> Message-ID: <4B0F890C.107@ntc.zcu.cz> Hi Chris, thanks for trying sfepy! Let's discuss this at sfepy-devel, or, if you do not want to register there, write me personally, this is IMHO rather off-topic for the scipy-user list. S. Chris Colbert wrote: > I'm getting all sorts of errors trying to run sfepy tests and examples: Can you copy & paste the output and send it to me? I wonder what errors you get. > It builds fine. > > But I fail one of the solvers test because of a bug with OpenMPI (whichever > solver is using Petsc4py which btw, is not listed as a dependency). Yes, PETSc (with Petsc4py) are optional packages, used only by test_linear_solvers.py test file. It should not fail, however, only skip the test. > The schroedinger example runs, but produces erroneous output (~300% error). > The poisson and valec examples produce error results. > > System specs: > Kubuntu 9.10 x64 > > Self built/easy_insall: > Numpy 1.3.0 > Scipy 0.7.1 > Newest umfpack scikit > Newest Petsc4Py > pytables > > from the repos: > hdf5-serial > openmpi 1.6.6 > pysparse > > Any help would be awesome! I will gladly help you, but I need more information - could you send me (off-list) the full outputs of the simulations that do not work, or produce error results? Also attach the solution files if you find it necessary. Thanks, r. From peter.combs at berkeley.edu Fri Nov 27 05:42:51 2009 From: peter.combs at berkeley.edu (Peter Combs) Date: Fri, 27 Nov 2009 02:42:51 -0800 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> References: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> Message-ID: <57866FE3-9506-48EA-BDC8-E634B5F48E37@berkeley.edu> On Nov 26, 2009, at 4:30 PM, David Baddeley wrote: > Hi Peter, > > would that be localization microscopy data by any chance? Which method are you using? Indeed it is! 
The lab I'm working in does a lot of FIONA (Fluorescence Imaging with One Nanometer Accuracy), although the data I'm using is a couple steps removed from the usual assays that are done. On Nov 26, 2009, at 8:03 PM, josef.pktd at gmail.com wrote: > knots should only specify the point of x and y not all grid points, I > added an s to play with the border values following the example in the > tests. most of it just trial and error, since I don't have a good > example of what I should get as a result > Thanks, that pretty much did it, I think. I'm still playing with the number of knots to see what gives reasonable results. Taking it up to 75 brings my error down to under 4 nm, which is starting to get to the limit of what we could do in one channel anyways. > with the following knots, it finishes without warning and errors, and > I get some numbers back that might be reasonable. > Now I'm getting this warning, but given that the results are very usable, I'm not too worried: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=92). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) I think it's saying that there are some grid squares that don't have enough points to calculate a fit, is that right? I'm pretty sure these are at the edge of the mesh, and shouldn't be a big problem. > s = 1.1 > yknots = np.linspace(ymin+s,ymax-s,10) > xknots = np.linspace(xmin+s,xmax-s,10) > > Some good examples for the use of the different options in the spline > classes would be nice. > Yeah. I think once I get things mostly figured out I'll try and condense what I have into an example or two. My problem is that I still don't *really* understand the difference between all these different kinds of splines, so I'll probably want someone to make sure I'm not going totally off the deep end. Peter Combs peter.combs at berkeley.edu From vanforeest at gmail.com Fri Nov 27 07:33:04 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 27 Nov 2009 13:33:04 +0100 Subject: [SciPy-User] Tool for visualizing queues In-Reply-To: References: Message-ID: Hi Ram, You could have a look at omnetpp. In a simulator I used at Bell Labs there was a small number, like an index, that showed the number of customers in queue (the system). bye Nicky 2009/11/27 Ram Rachum : > I'm working on a simulation in Queueing Theory, and I would like a good tool for > visualizing clients standing in queues and servers serving them. I would > eventually like the GUI to be interactive as well, so for example the user could > drag a client from one queue to another. > > Any ideas? > > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From vanforeest at gmail.com Fri Nov 27 07:36:18 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 27 Nov 2009 13:36:18 +0100 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> Message-ID: > Also, why doesn't scipy automatically gives me the time between consecutive arrivals when I give the mean number of arrivals per time period? Consider a scenario in which precisely t time units fit in between two arrivals. The arrival rate would then be 1/t.
When customers arrive with exponentially distributed interarrival times and at rate 1/t, the arrival rate is the same in both scenarios, but the interarrival times not. > > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 27 09:08:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 09:08:01 -0500 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: <57866FE3-9506-48EA-BDC8-E634B5F48E37@berkeley.edu> References: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> <57866FE3-9506-48EA-BDC8-E634B5F48E37@berkeley.edu> Message-ID: <1cd32cbb0911270608x2a53f411l193111cf9bbc647c@mail.gmail.com> On Fri, Nov 27, 2009 at 5:42 AM, Peter Combs wrote: > On Nov 26, 2009, at 4:30 PM, David Baddeley wrote: > >> Hi Peter, >> >> would that be localization microscopy data by any chance? Which method are you using? > > Indeed it is! ?The lab I'm working in does a lot of FIONA (Fluorescence Imaging with One Nanometer Accuracy), although the data I'm using is a couple steps removed from the usual assays that are done. > > On Nov 26, 2009, at 8:03 PM, josef.pktd at gmail.com wrote: >> knots should only specify the point of x and y not all grid points, I >> added an s to play with the border values following the example in the >> tests. most of it just trial and error, since I don't have a good >> example of what I should get as a result >> > > Thanks, that pretty much did it, I think. ?I'm still playing with the number of knots to see what gives reasonable results. ?Taking it up to 75 brings my error down to under 4 nm, which is starting to get to the limit of what we could do in one channel anyways. > >> with the following knots, it finishes without warning and errors, and >> I get some numbers back that might be reasonable. >> > > Now I'm getting this warning, but given that the results are very usable, I'm not too worried: > The coefficients of the spline returned have been computed as the > minimal norm least-squares solution of a (numerically) rank deficient > system (deficiency=92). If deficiency is large, the results may be > inaccurate. Deficiency may strongly depend on the value of eps. > ?warnings.warn(message) > > I think it's saying that there are some grid squares that don't have enough points to calculate a fit, is that right? ?I'm pretty sure these are at the edge of the mesh, and shouldn't be a big problem. In a example from the tests, I got this message when the third variable z didn't have any variation. I guess for interpolation this might not have a strong effect. > >> ?s = 1.1 >> ?yknots = np.linspace(ymin+s,ymax-s,10) >> ?xknots = np.linspace(xmin+s,xmax-s,10) >> >> Some good examples for the use of the different options in the spline >> classes would be nice. >> > > Yeah. ?I think once I get things mostly figured out I'll try and condense what I have into an example or two. ?My problem is that I still don't *really* understand the difference between all these different kinds of splines, so I'll probably want someone to make sure I'm not going totally off the deep end. >From what I figured out so far, the main difference between SmoothBivariateSpline and LSQBivariateSpline is that in the first case the approximation is controlled by s and the knot points are adjusted, in the second case the knot points are fixed and will imply some s. 
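A minimal sketch of that difference on synthetic data (the data, the smoothing value s=len(x) and the knot grid below are illustrative assumptions, not values from this thread):

import numpy as np
from scipy import interpolate

rng = np.random.RandomState(0)
x = rng.uniform(0, 28000, 2000)
y = rng.uniform(0, 23000, 2000)
z = 1e-4 * x + np.sin(y / 5000.0) + 0.01 * rng.standard_normal(2000)

# SmoothBivariateSpline: you choose the smoothing factor s, the routine places the knots
# (s=len(x) is only a rule-of-thumb starting point; tune it for real data)
smooth = interpolate.SmoothBivariateSpline(x, y, z, s=len(x))
tx_auto, ty_auto = smooth.get_knots()   # the knots the routine settled on for this s

# LSQBivariateSpline: you choose the interior knots (1-d sequences inside the data range),
# and the amount of smoothing is whatever those knots imply
tx = np.linspace(x.min() + 1, x.max() - 1, 10)
ty = np.linspace(y.min() + 1, y.max() - 1, 10)
lsq = interpolate.LSQBivariateSpline(x, y, z, tx, ty)

# both objects are evaluated the same way
print(smooth.ev([14000], [11500]), lsq.ev([14000], [11500]))

One way to connect the two, as the comparison suggests, is to call get_knots() on a smooth fit for a trial s and use those knots (or a simplified version of them) as the fixed knots for the LSQ variant.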
(For the univariate splines, we had the discussion on the mailing list that the class names are not very descriptive and misleading) Controlling s (small positive number) looks easier to adjust, than choosing knots. One possibility might be to try out different values of s and see how the chosen knot points (get_knots()) compare to the ones that you are using now. I started to convert a scipy.tutorial example that uses the old wrapper to using spline classes to see the difference in the required arguments, but didn't get very far yet. Having a good (graphical) summary of the spline results, helps a lot to quickly see what the different "smoothing parameters" are doing. Josef > > > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 27 09:58:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 09:58:18 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> Message-ID: <1cd32cbb0911270658g57c35802l489d53a58f55410d@mail.gmail.com> On Fri, Nov 27, 2009 at 1:49 AM, Ram Rachum wrote: > ? gmail.com> writes: >> > YOu should take the interarrival time between two consecutive arrivals >> > to be exponentially distributed with rate lambda, where lambda is the >> > arrival rate. LIke this the number of arrivals in a fixed period is >> > Poisson distributed. I never tried, but I suppose scipy contains a >> > module to generate exponentially distributed rv's. >> >> The sum of iid exponential distributed rvs is gamma distributed >> http://en.wikipedia.org/wiki/Gamma_distribution >> >> all available in scipy.stats >> >> Josef > > I don't understand. So you mean that the exponential thing would NOT be the > right thing for the time between consecutive arrivals? What I meant was that the distribution of the time to the next arrival is exponential distributed. The time until you have k arrivals is the sum of k exponentially distributed random variables and is gamma distributed. For simulation of queuing models http://pypi.python.org/pypi/SimPy looks also useful, although I never used it. Josef > > Also, why doesn't scipy automatically gives me the time between consecutive > arrivals when I give the mean number of arrivals per time period? > > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cool-rr at cool-rr.com Fri Nov 27 10:42:11 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Fri, 27 Nov 2009 15:42:11 +0000 (UTC) Subject: [SciPy-User] Tool for visualizing queues References: Message-ID: nicky van foreest gmail.com> writes: > > Hi Ram, > > You could have a look at omnetpp. In a simulator I used at Bell Labs > there was a small number, like an index, that showed the number of > customers in queue (the system). > > bye > > Nicky > Thanks for the tip. Ram. 
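A quick numerical check of the exponential-interarrival / Poisson-count / gamma relationship discussed in this thread (the rate, sample sizes and seed below are arbitrary illustrative choices):

import numpy as np
from scipy import stats

lam = 3.0                      # mean number of arrivals per time unit
rng = np.random.RandomState(1)

# exponential interarrival times with rate lam, i.e. scale 1/lam
gaps = rng.exponential(scale=1.0 / lam, size=200000)
arrivals = np.cumsum(gaps)

# counts per unit-length interval should look Poisson(lam): mean and variance both near lam
edges = np.arange(int(arrivals[-1]) + 1)
counts = np.histogram(arrivals, bins=edges)[0]
print(counts.mean(), counts.var(), lam)

# the time until the k-th arrival is the sum of k exponentials, i.e. gamma(k, scale=1/lam)
k = 5
kth_times = rng.exponential(scale=1.0 / lam, size=(50000, k)).sum(axis=1)
print(kth_times.mean(), stats.gamma.mean(k, scale=1.0 / lam))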
From josef.pktd at gmail.com Fri Nov 27 13:07:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 13:07:53 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python Message-ID: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> I wanted to prepare some examples for the use of the Bivariate Spline classes, The second part of the attached script contains a translation of a scipy.tutorial example to using the 3 classes instead. However, this script keeps crashing on me. Initially I thought it is RectBivariateSpline, but now I think it might be matplotlib, TK backend and maybe my latest numpy build. I would like to know if the splines crash or if it is my current setup that causes the crash. It crashes in spyder, idle and when I close the windows when I run it on the commandline. The last might indicate that the problem is matplotlib related. Can someone run the script, preferably not in an interpreter where you want to keep your session alive? Josef -------------- next part -------------- # -*- coding: utf-8 -*- """ Created on Thu Nov 26 22:00:20 2009 Author: josef-pktd and scipy mailinglist example """ import numpy as np from scipy import interpolate import matplotlib.pyplot as plt # from mailing list - Peter Combs def makeLSQspline(xl, yl, xr, yr): """docstring for makespline""" xmin = xr.min()-1 xmax = xr.max()+1 ymin = yr.min()-1 ymax = yr.max()+1 n = len(xl) print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax s = 1.1 yknots, xknots = np.mgrid[ymin+s:ymax-s:10j, xmin+s:xmax-s:10j] # Makes an 11x11 regular grid of knot locations yknots = np.linspace(ymin+s,ymax-s,10) xknots = np.linspace(xmin+s,xmax-s,10) xspline = interpolate.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) yspline = interpolate.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) def mapping(xr, yr): xl = xspline.ev(xr, yr) yl = yspline.ev(xr, yr) return xl, yl return mapping, xspline, yspline xr = np.arange(20) yr = np.arange(20) s=0 xr, yr = np.mgrid[0+s:20-s:30j, 0+s:20-s:30j] xr = xr.ravel() yr = yr.ravel() xl = np.sin(xr) + 0.1*np.random.normal(size=xr.shape) yl = yr + 0.1*np.random.normal(size=yr.shape) smap, xspline, yspline = makeLSQspline(xl, yl, xr, yr) #print smap(xr, yr) plt.plot(xl) plt.plot(xr) #plt.show() xsp = interpolate.SmoothBivariateSpline(xr, yr, xl, kx=2,ky=2) print xsp.get_knots() #example from tests, testfitpack.py x = [1,1,1,2,2,2,3,3,3] y = [1,2,3,1,2,3,1,2,3] z = [3,3,4,4,5,6,3,3,3] s = 0.1 tx = [1+s,3-s] ty = [1+s,3-s] lut = interpolate.LSQBivariateSpline(x,y,z,tx,ty,kx=1,ky=1) import numpy as np from scipy import interpolate import matplotlib.pyplot as plt #2d spline interpolation example from the tutorial #------------------------------------------------- # Define function over sparse 20x20 grid x,y = np.mgrid[-1:1:20j,-1:1:20j] z = (x+y)*np.exp(-6.0*(x*x+y*y)) plt.figure() plt.pcolor(x,y,z) plt.colorbar() plt.title("Sparsely sampled function.") #plt.show() # Interpolate function over new 70x70 grid xnew,ynew = np.mgrid[-1:1:70j,-1:1:70j] tck = interpolate.bisplrep(x,y,z,s=0) znew = interpolate.bisplev(xnew[:,0],ynew[0,:],tck) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - bisplrep") #plt.show() #Use spline classes instead of original wrapper #---------------------------------------------- #use same example as before ### Define function over sparse 20x20 grid ## ##x,y = np.mgrid[-1:1:20j,-1:1:20j] ##z = (x+y)*np.exp(-6.0*(x*x+y*y)) ## ##plt.figure() 
##plt.pcolor(x,y,z) ##plt.colorbar() ##plt.title("Sparsely sampled function.") ###plt.show() #use SmoothBivariateSpline #^^^^^^^^^^^^^^^^^^^^^^^^^ xnew,ynew = np.mgrid[-1:1:70j,-1:1:70j] #tck = interpolate.bisplrep(x,y,z,s=0) intp = interpolate.SmoothBivariateSpline(x.ravel(),y.ravel(),z.ravel(),s=0.01) znew = intp.ev(xnew.ravel(),ynew.ravel()).reshape((70,70)) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - SmoothBivariateSpline") #plt.show() #use LSQBivariateSpline #^^^^^^^^^^^^^^^^^^^^^^^^^ xnew,ynew = np.mgrid[-1:1:70j,-1:1:70j] #get knots from previous example tx,ty = intp.get_knots() tx = tx[4:-4] # remove endpoints, 4 in this example ty = ty[4:-4] intp = interpolate.LSQBivariateSpline(x.ravel(),y.ravel(),z.ravel(), tx, ty) znew = intp.ev(xnew.ravel(),ynew.ravel()).reshape((70,70)) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - LSQBivariateSpline") #plt.show() #use RectBivariateSpline #^^^^^^^^^^^^^^^^^^^^^^^^ # this seems to cause a crash, for eg. s=0.001 # or maybe matplotlib related or maybe numpy ABI problems ? # or maybe some random crashing ? # I think it's matplotlib when closing windows intp = interpolate.RectBivariateSpline(x[:,0],y[0,:],z, s=0.001) znew = intp.ev(xnew.ravel(),ynew.ravel()).reshape((70,70)) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - RectBivariateSpline") plt.show() From jsseabold at gmail.com Fri Nov 27 13:15:48 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 27 Nov 2009 13:15:48 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> Message-ID: On Fri, Nov 27, 2009 at 1:07 PM, wrote: > I wanted to prepare some examples for the use of the Bivariate Spline classes, > The second part of the attached script contains a translation of a > scipy.tutorial example to using the 3 classes instead. > > However, this script keeps crashing on me. Initially I thought it is > RectBivariateSpline, but now I think it might be matplotlib, TK > backend and maybe my latest numpy build. > I would like to know if the splines crash or if it is my current setup > that causes the crash. It crashes in spyder, idle and when I close the > windows when I run it on the commandline. The last might indicate that > the problem is matplotlib related. > > Can someone run the script, preferably not in an interpreter where you > want to keep your session alive? > Runs fine for me and creates plots from within the interpreter and on the command line in Linux. I noticed that you recently ran into the segfault problem that occured somewhere in trunk towards the end summer (I forget when and why). Did you delete everything and rebuild matplotlib as well? I had to rebuild everything that had numpy/scipy as a dependency after that update. Don't know if that's what's going on though. Skipper From josef.pktd at gmail.com Fri Nov 27 13:26:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 13:26:49 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> Message-ID: <1cd32cbb0911271026g53c0fcf6v2015589216e806c2@mail.gmail.com> On Fri, Nov 27, 2009 at 1:15 PM, Skipper Seabold wrote: > On Fri, Nov 27, 2009 at 1:07 PM, ? 
wrote: >> I wanted to prepare some examples for the use of the Bivariate Spline classes, >> The second part of the attached script contains a translation of a >> scipy.tutorial example to using the 3 classes instead. >> >> However, this script keeps crashing on me. Initially I thought it is >> RectBivariateSpline, but now I think it might be matplotlib, TK >> backend and maybe my latest numpy build. >> I would like to know if the splines crash or if it is my current setup >> that causes the crash. It crashes in spyder, idle and when I close the >> windows when I run it on the commandline. The last might indicate that >> the problem is matplotlib related. >> >> Can someone run the script, preferably not in an interpreter where you >> want to keep your session alive? >> > > Runs fine for me and creates plots from within the interpreter and on > the command line in Linux. ?I noticed that you recently ran into the > segfault problem that occured somewhere in trunk towards the end > summer (I forget when and why). ?Did you delete everything and rebuild > matplotlib as well? ?I had to rebuild everything that had numpy/scipy > as a dependency after that update. ?Don't know if that's what's going > on though. On Windows I cannot rebuild matplotlib, I tried once but it has too many dependencies (and a at least a while ago couldn't be fully build with MingW.) That's why I'm worried about all the ABI breakage that was going on and I only recently started to work with the numpy trunk version. At least RectBivariateSpline works. I got a bit suspicious because it is missing in my (older) docs, and not in http://docs.scipy.org/scipy/docs/scipy-docs/interpolate.rst/ Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 27 13:27:32 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 13:27:32 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: <1cd32cbb0911271026g53c0fcf6v2015589216e806c2@mail.gmail.com> References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> <1cd32cbb0911271026g53c0fcf6v2015589216e806c2@mail.gmail.com> Message-ID: <1cd32cbb0911271027s50ca924dwa670e4996f035e7d@mail.gmail.com> On Fri, Nov 27, 2009 at 1:26 PM, wrote: > On Fri, Nov 27, 2009 at 1:15 PM, Skipper Seabold wrote: >> On Fri, Nov 27, 2009 at 1:07 PM, ? wrote: >>> I wanted to prepare some examples for the use of the Bivariate Spline classes, >>> The second part of the attached script contains a translation of a >>> scipy.tutorial example to using the 3 classes instead. >>> >>> However, this script keeps crashing on me. Initially I thought it is >>> RectBivariateSpline, but now I think it might be matplotlib, TK >>> backend and maybe my latest numpy build. >>> I would like to know if the splines crash or if it is my current setup >>> that causes the crash. It crashes in spyder, idle and when I close the >>> windows when I run it on the commandline. The last might indicate that >>> the problem is matplotlib related. >>> >>> Can someone run the script, preferably not in an interpreter where you >>> want to keep your session alive? >>> >> >> Runs fine for me and creates plots from within the interpreter and on >> the command line in Linux. ?I noticed that you recently ran into the >> segfault problem that occured somewhere in trunk towards the end >> summer (I forget when and why). 
?Did you delete everything and rebuild >> matplotlib as well? ?I had to rebuild everything that had numpy/scipy >> as a dependency after that update. ?Don't know if that's what's going >> on though. > > On Windows I cannot rebuild matplotlib, I tried once but it has too > many dependencies (and a at least a while ago couldn't be fully > build with MingW.) > > That's why I'm worried about all the ABI breakage that was going on > and I only recently started to work with the numpy trunk version. > > At least RectBivariateSpline works. I got a bit suspicious because > it is missing in my (older) docs, and not in > http://docs.scipy.org/scipy/docs/scipy-docs/interpolate.rst/ And thank you for checking the script. Josef > > Josef > >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From kmichael.aye at googlemail.com Fri Nov 27 14:49:10 2009 From: kmichael.aye at googlemail.com (Michael Aye) Date: Fri, 27 Nov 2009 11:49:10 -0800 (PST) Subject: [SciPy-User] How to find local minimum of 1d histogram Message-ID: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> Hi! I am still fairly new with scipy, so please forgive me, if this is a simple question. But I couldn't find an example for this. What is the easiest way of finding the local minimum between 2 gaussian-like peaks in a 1d Histogram? Background: Using a histogram on an image to identify 2 populations of intensities. The minimum between the gaussian-like peaks in the histogram shall be used as the masking limit to either show one or the other population of pixel intensities. My idea so far, but I'm not sure, if there is not a more obvious way? * Using interpolate1d to get a spline. * somehow get the coefficients of the spline function. * put them into poly1d * do derivative * get roots of derivative I am ready to go this way, but I wondered if it isn't easier? Best regards and a nice weekend! Michael From robince at gmail.com Fri Nov 27 15:12:20 2009 From: robince at gmail.com (Robin) Date: Fri, 27 Nov 2009 20:12:20 +0000 Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> References: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> Message-ID: <2d5132a50911271212j273088c2pe652c99f38062d1c@mail.gmail.com> On Fri, Nov 27, 2009 at 7:49 PM, Michael Aye wrote: > Hi! > > I am still fairly new with scipy, so please forgive me, if this is a > simple question. But I couldn't find an example for this. > > What is the easiest way of finding the local minimum between 2 > gaussian-like peaks in a 1d Histogram? > > Background: > Using a histogram on an image to identify 2 populations of > intensities. > The minimum between the gaussian-like peaks in the histogram shall be > used as the masking limit to either show one or the other population > of pixel intensities. > > My idea so far, but I'm not sure, if there is not a more obvious way? > * Using interpolate1d to get a spline. > * somehow get the coefficients of the spline function. > * put them into poly1d > * do derivative > * get roots of derivative > > I am ready to go this way, but I wondered if it isn't easier? >From the histogram you get a vector of counts - couldn't you do a diff on the vector and look for where that changes sign? If its a bit noisy you could look for where it changes sign or perhaps smooth before diffing. 
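A rough sketch of that idea, with made-up bimodal data and an arbitrary smoothing window:

import numpy as np

# made-up bimodal "intensity" data standing in for the image histogram
data = np.concatenate([np.random.normal(2.0, 0.5, 5000),
                       np.random.normal(5.0, 0.7, 5000)])
counts, edges = np.histogram(data, bins=100)

# short moving average to tame bin-to-bin noise
kernel = np.ones(5) / 5.0
smoothed = np.convolve(counts, kernel, mode='same')

# a local minimum is where the first difference changes sign from - to +
d = np.diff(smoothed)
minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
print(edges[minima])    # candidate thresholds between the two peaks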
Cheers Robin From dwf at cs.toronto.edu Fri Nov 27 16:12:15 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 27 Nov 2009 16:12:15 -0500 Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> References: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> Message-ID: <04150ACF-9244-4524-81E3-6B11F281E5FD@cs.toronto.edu> On 27-Nov-09, at 2:49 PM, Michael Aye wrote: > The minimum between the gaussian-like peaks in the histogram shall be > used as the masking limit to either show one or the other population > of pixel intensities. > > My idea so far, but I'm not sure, if there is not a more obvious way? > * Using interpolate1d to get a spline. > * somehow get the coefficients of the spline function. > * put them into poly1d > * do derivative > * get roots of derivative I had a similar problem, actually, and used scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete second derivative. The minimum should be pretty easy to locate (it will appear as a rather significant maximum peak in the transformed curve). David From josef.pktd at gmail.com Fri Nov 27 17:07:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 17:07:49 -0500 Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: <04150ACF-9244-4524-81E3-6B11F281E5FD@cs.toronto.edu> References: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> <04150ACF-9244-4524-81E3-6B11F281E5FD@cs.toronto.edu> Message-ID: <1cd32cbb0911271407n450f80d0nc37d1e17d87b0824@mail.gmail.com> On Fri, Nov 27, 2009 at 4:12 PM, David Warde-Farley wrote: > > On 27-Nov-09, at 2:49 PM, Michael Aye wrote: > >> The minimum between the gaussian-like peaks in the histogram shall be >> used as the masking limit to either show one or the other population >> of pixel intensities. >> >> My idea so far, but I'm not sure, if there is not a more obvious way? >> * Using interpolate1d to get a spline. >> * somehow get the coefficients of the spline function. >> * put them into poly1d >> * do derivative >> * get roots of derivative > > I had a similar problem, actually, and used > scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete > second derivative. The minimum should be pretty easy to locate (it > will appear as a rather significant maximum peak in the transformed > curve). In a similar direction, I thought of using gaussian_kde to get a smoothed probability distribution. and look for local minimum. Josef > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kgdunn at gmail.com Fri Nov 27 17:31:32 2009 From: kgdunn at gmail.com (Kevin Dunn) Date: Fri, 27 Nov 2009 17:31:32 -0500 Subject: [SciPy-User] How to find local minimum of 1d histogram Message-ID: > On Fri, Nov 27, 2009 at 4:12 PM, David Warde-Farley wrote: >> >> On 27-Nov-09, at 2:49 PM, Michael Aye wrote: >> >>> The minimum between the gaussian-like peaks in the histogram shall be >>> used as the masking limit to either show one or the other population >>> of pixel intensities. >>> >>> My idea so far, but I'm not sure, if there is not a more obvious way? >>> * Using interpolate1d to get a spline. >>> * somehow get the coefficients of the spline function. 
>>> * put them into poly1d >>> * do derivative >>> * get roots of derivative >> >> I had a similar problem, actually, and used >> scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete >> second derivative. The minimum should be pretty easy to locate (it >> will appear as a rather significant maximum peak in the transformed >> curve). > > In a similar direction, I thought of using gaussian_kde to get a > smoothed probability distribution. and look for local minimum. Yet another way: Otsu's method [1], which is a standard algorithm in image processing to segment an image. There are other methods as well. When I've used Otsu's method from real-time image processing (under unpredictable lighting), I use it only to provide a starting value. Then you move left or right along the smoothed histogram (I normally just use a moving average smoother, because other exotic smoothers take too much time and don't improve accuracy that much) until you land up in a minimum. Usually the Otsu initial guess isn't far off, but it can be under some circumstances. [1] http://en.wikipedia.org/wiki/Otsu%27s_method (also see the references at the bottom) HTH, Kevin > Josef > >> >> David >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From kmichael.aye at googlemail.com Fri Nov 27 19:43:42 2009 From: kmichael.aye at googlemail.com (Michael Aye) Date: Fri, 27 Nov 2009 16:43:42 -0800 (PST) Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: References: Message-ID: <02b80856-e735-4977-88fb-c39af845df23@m3g2000yqf.googlegroups.com> Thanks to you all, love this forum! That will keep me busy on the weekend! ;) BR, Michael On Nov 27, 11:31?pm, Kevin Dunn wrote: > > On Fri, Nov 27, 2009 at 4:12 PM, David Warde-Farley wrote: > > >> On 27-Nov-09, at 2:49 PM, Michael Aye wrote: > > >>> The minimum between the gaussian-like peaks in the histogram shall be > >>> used as the masking limit to either show one or the other population > >>> of pixel intensities. > > >>> My idea so far, but I'm not sure, if there is not a more obvious way? > >>> * Using interpolate1d to get a spline. > >>> * somehow get the coefficients of the spline function. > >>> * put them into poly1d > >>> * do derivative > >>> * get roots of derivative > > >> I had a similar problem, actually, and used > >> scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete > >> second derivative. The minimum should be pretty easy to locate (it > >> will appear as a rather significant maximum peak in the transformed > >> curve). > > > In a similar direction, I thought of using gaussian_kde to get a > > smoothed probability distribution. and look for local minimum. > > Yet another way: Otsu's method [1], which is a standard algorithm in > image processing to segment an image. ?There are other methods as > well. > > When I've used Otsu's method from real-time image processing (under > unpredictable lighting), I use it only to provide a starting value. > Then you move left or right along the smoothed histogram (I normally > just use a moving average smoother, because other exotic smoothers > take too much time and don't improve accuracy that much) until you > land up in a minimum. > > Usually the Otsu initial guess isn't far off, but it can be under some > circumstances. 
> > [1]http://en.wikipedia.org/wiki/Otsu%27s_method(also see the > references at the bottom) > > HTH, > Kevin > > > Josef > > >> David > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-U... at scipy.org > >>http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From nwagner at iam.uni-stuttgart.de Sat Nov 28 03:59:03 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Sat, 28 Nov 2009 09:59:03 +0100 Subject: [SciPy-User] splprep example Message-ID: Hi all, I am looking for a cookbook example wrt splprep. Any pointer would be appreciated. Nils From cool-rr at cool-rr.com Sat Nov 28 05:04:41 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Sat, 28 Nov 2009 10:04:41 +0000 (UTC) Subject: [SciPy-User] EPD doesn't run my code Message-ID: Hello, I have my project GarlicSim which runs in ordinary Python. I tried to run it in EPD, and it's failing, at two distinct points I could identify. (Possibly there are more.) Here's the project code: http://github.com/cool-RR/GarlicSim-for-Python-2.5 I identified one of the points in question. It's about the `win32api` and the `win32process` modules. When I load up the Python shell of my EPD, and try `import win32process`, I get this error dialog: python.exe - Entry Point Not Found The procedure entry point ?PyWinGlobals_Ensure@@YAXXZ could not be located in the dynamic link library pywintypes25.dll. When I try 'import win32api`, I get: python.exe - Entry Point Not Found The procedure entry point ?PyWinObject_AsHANDLE@@YAHPAU_object@@PAPAXH at Z could not be located in the dynamic link library pywintypes25.dll. The second point is that in my wxPython window, the images in the toolbar get cropped. Any idea? From cool-rr at cool-rr.com Sat Nov 28 05:48:21 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Sat, 28 Nov 2009 10:48:21 +0000 (UTC) Subject: [SciPy-User] EPD doesn't run my code References: Message-ID: Ram Rachum cool-rr.com> writes: > > Hello, > > I have my project GarlicSim which runs in ordinary Python. I tried to run in > EPD, and it's failing, at two distinct points I could identify. (Possibly ther > are more.) Apologies, I didn't notice the epd-users list. I'll post it there. Ram. From josef.pktd at gmail.com Sat Nov 28 08:11:19 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 28 Nov 2009 08:11:19 -0500 Subject: [SciPy-User] splprep example In-Reply-To: References: Message-ID: <1cd32cbb0911280511n6f4c0ac7m4ee09b7b3f8fca59@mail.gmail.com> On Sat, Nov 28, 2009 at 3:59 AM, Nils Wagner wrote: > Hi all, > > I am looking for a cookbook example wrt splprep. > Any pointer would be appreciated. There are some examples in the scipy tutorial for interpolate in the docs and in http://www.scipy.org/Cookbook/Interpolation?highlight=%28splprep%29#head-34818696f8d7066bb3188495567dd776a451cf11 A mailinglist search should also turn up a few examples. Do you have anything specific in mind? 
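In the meantime, a bare-bones sketch of splprep on a made-up noisy curve (the data and s are arbitrary):

import numpy as np
from scipy import interpolate

# noisy samples along an arc (made up)
theta = np.linspace(0, np.pi, 50)
x = np.cos(theta) + 0.05 * np.random.randn(50)
y = np.sin(theta) + 0.05 * np.random.randn(50)

# splprep fits a parametric spline; tck holds knots/coefficients/degree,
# u is the parameter value assigned to each input point
tck, u = interpolate.splprep([x, y], s=0.1)

# evaluate the smoothed curve on a finer parameter grid
unew = np.linspace(0, 1, 200)
xs, ys = interpolate.splev(unew, tck)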
Josef > > > Nils > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kalle-test at gmx.de Sat Nov 28 14:19:42 2009 From: kalle-test at gmx.de (Kalle) Date: Sat, 28 Nov 2009 20:19:42 +0100 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> Message-ID: Hello Josef, josef.pktd at gmail.com schrieb: > I wanted to prepare some examples for the use of the Bivariate Spline classes, > The second part of the attached script contains a translation of a > scipy.tutorial example to using the 3 classes instead. [...] > Can someone run the script, preferably not in an interpreter where you > want to keep your session alive? your Script runs fine here under Windows XP SP3 with python 2.5.4, scipy 0.7.1, matplotlib 0.98.5.3 (WX Backend) and numpy 1.2.1 There is only one warning, which might come from makeLSQspline i guess... Thanks for the example BTW, Kalle. From josef.pktd at gmail.com Sun Nov 29 20:42:08 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 29 Nov 2009 20:42:08 -0500 Subject: [SciPy-User] chebfun Message-ID: <1cd32cbb0911291742m1d3f3ab8r886009c7fba3cab9@mail.gmail.com> I just came by chance across this (for matlab) http://www2.maths.ox.ac.uk/chebfun/index.html http://www2.maths.ox.ac.uk/chebfun/license.html The documentation looks helpful for someone like me who doesn't know enough about what Chebyshev polynomials are good for. Josef From almar.klein at gmail.com Mon Nov 30 04:05:23 2009 From: almar.klein at gmail.com (Almar Klein) Date: Mon, 30 Nov 2009 10:05:23 +0100 Subject: [SciPy-User] ANN: visvis Message-ID: Hi all, I am pleased to announce the first release of visvis, a Python visualization library for of 1D to 4D data. Website: http://code.google.com/p/visvis/ Discussion group: http://groups.google.com/group/visvis/ Since this is the first release, it hasn't been tested on a large scale yet. Therefore I'm specifically interested to know whether it works for everyone. === Description === Visvis is a pure Python visualization library that uses OpenGl to display 1D to 4D data; it can be used from simple plotting tasks to rendering 3D volumetric data that moves in time. Visvis can be used in Python scripts, interactive Python sessions (as with IPython or IEP) and can be embedded in applications. Visvis employs an object oriented structure; each object being visualized (e.g. a line or a texture) has various properties that can be modified to change its behaviour or appearance. A Matlab-like interface in the form of a set of functions allows easy creation of these objects (e.g. plot(), imshow(), volshow()). Regards, Almar From jgomezdans at gmail.com Mon Nov 30 07:05:57 2009 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Mon, 30 Nov 2009 12:05:57 +0000 Subject: [SciPy-User] Parallel code Message-ID: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> Hi! I want to run some code in parallel, and I have toyed with the idea of either using the multiprocessing module, or using ipython (which is quite easy to use). The main idea is to run a number of class methods in parallel (unsurprisingly!), fed with some arguments. However, these methods will need (read-)access to a rather large numpy array. 
Ideally (and since this is running on a SMP box), this could be a chunk of shared memory. I am aware of Sturla Molden's suggestion of using ctypes, but I guess that I was wondering whether some magic simple stuff is available off the shelf for this shared memory business? Thanks! J -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Mon Nov 30 07:39:41 2009 From: robince at gmail.com (Robin) Date: Mon, 30 Nov 2009 12:39:41 +0000 Subject: [SciPy-User] Parallel code In-Reply-To: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> References: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> Message-ID: <2d5132a50911300439g2d406ee7wb244217f39a30e16@mail.gmail.com> If it is read only and you are on a platform with fork (ie not windows) that multiprocessing is great for this sort of situation... as long as the data is loaded before the fork, all the children can read it fine (but be sure not to write to it - on write the page will be copied for the child process leading to more memory use and changes not visible between children). Usually I put the variable to share in a module before calling pool... ie: import mymodule # a blank module mymodule.d = big_data_array p = Pool(8) p.map(function_which_does_something_to_mymodule.d, list_of_paraters) p.close() Cheers Robin On Mon, Nov 30, 2009 at 12:05 PM, Jose Gomez-Dans wrote: > Hi! > I want to run some code in parallel, and I have toyed with the idea of > either using the multiprocessing module, or using ipython (which is quite > easy to use). The main idea is to run a number of class methods in parallel > (unsurprisingly!), fed with some arguments. However, these methods will need > (read-)access to a rather large numpy array. Ideally (and since this is > running on a SMP box), this could be a chunk of shared memory. I am aware of > Sturla Molden's suggestion of using ctypes, but I guess that I was wondering > whether some magic simple stuff is available off the shelf for this shared > memory business? > > Thanks! > J > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From sturla at molden.no Mon Nov 30 08:28:56 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 30 Nov 2009 14:28:56 +0100 Subject: [SciPy-User] Parallel code In-Reply-To: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> References: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> Message-ID: <4B13C898.7030407@molden.no> What do you mean by my suggestion using ctypes? Why don't you use shared memory? Ga?l Varoquaux and I wrote a shared memory backend for ndarrays earlier this year. Sturla Jose Gomez-Dans skrev: > Hi! > I want to run some code in parallel, and I have toyed with the idea of > either using the multiprocessing module, or using ipython (which is > quite easy to use). The main idea is to run a number of class methods > in parallel (unsurprisingly!), fed with some arguments. However, these > methods will need (read-)access to a rather large numpy array. Ideally > (and since this is running on a SMP box), this could be a chunk of > shared memory. I am aware of Sturla Molden's suggestion of using > ctypes, but I guess that I was wondering whether some magic simple > stuff is available off the shelf for this shared memory business? > > Thanks! 
> J > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Mon Nov 30 08:37:41 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 30 Nov 2009 14:37:41 +0100 Subject: [SciPy-User] Parallel code In-Reply-To: <2d5132a50911300439g2d406ee7wb244217f39a30e16@mail.gmail.com> References: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> <2d5132a50911300439g2d406ee7wb244217f39a30e16@mail.gmail.com> Message-ID: <4B13CAA5.7050102@molden.no> Robin skrev: > If it is read only and you are on a platform with fork (ie not > windows) that multiprocessing is great for this sort of situation... > as long as the data is loaded before the fork, all the children canread it fine On a system with a copy-on-write optimized os.fork (i.e. almost anything but Cygwin), no shared memory are needed for shared read-only access. Anonymous shared memory (multiprocessing.Array) will work on Windows as well, as handles can be inherited. This must be instantiated prior to process creation. Named shared memory can be used for read-write access to shared memory created before of after forking. Sturla From bsouthey at gmail.com Mon Nov 30 16:05:03 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 30 Nov 2009 15:05:03 -0600 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Message-ID: <4B14337F.40208@gmail.com> On 11/22/2009 11:43 PM, josef.pktd at gmail.com wrote: > Following up on a question by Keith on the numpy list and his reminder > that covariance can be calculated by the cross-product minus the > product of the means, I redid and > enhanced my moving stats functions. > > Suppose x and y are two time series, then the moving correlation > requires the calculation of the mean, variance and covariance for each > window. Currently in scipy stats intermediate results are usually > thrown away on return (while rpy/R returns all intermediate results > used for the calculation. > > Using a decorator/descriptor of Fernando written for nitime, I tried > out to write the function as a class instead, so that any desired ( > intermediate) calculations are only made on demand, but once they are > calculated they are attached to the class as attributes or properties. > This seems to be a useful "pattern". > > Are there any opinion for using the pattern in scipy.stats ? MovStats > will currently go into statsmodels > > Below is the class (with cutting part of init), a full script is the > attachment, including examples that test the class. 
> > about MovStats: > y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with > axis=1, should (but may not yet) work for nd arrays along any axis > (signal.correlate docstring) > nans are handled by dropping the corresponding observations from the > window, not adding any additional observations, > not tested if a window is empty because it contains only nans, nor if > variance is zero > (kern is intended for weighted statistics in the window but not tested > yet, I still need to decide on normalization requirements) > requires scipy.signal, all calculations done with signal.correlate, no loops > as often, functions are one-liners > all results are returned for valid observations only, initial > observations with incomplete window are cut > bonus: slope of moving regression of y on x, since it was trivial to add > still some cleaning and documentation to do > > usage: > ms = MovStats(x, y, axis=1) > ms.yvar > ms.xmean > ms.yxcorr > ms.yxcov > ... > > > Josef > > class MovStats(object): > def __init__(self, y, x=None, kern=5, axis=0): > self.y = y > self.x = x > if np.isscalar(kern): > ws = kern > <... snip> > > @OneTimeProperty > def ymean(self): > ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] > ym = ys/self.n > return ym > > @OneTimeProperty > def yvar(self): > ys2 = signal.correlate(self.y*self.y, self.kern, > mode='same')[self.sslice] > yvar = ys2/self.n - self.ymean**2 > return yvar > > @OneTimeProperty > def xmean(self): > if self.x is None: > return None > else: > xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice] > xm = xs/self.n > return xm > > @OneTimeProperty > def xvar(self): > if self.x is None: > return None > else: > xs2 = signal.correlate(self.x*self.x, self.kern, > mode='same')[self.sslice] > xvar = xs2/self.n - self.xmean**2 > return xvar > @OneTimeProperty > def yxcov(self): > xys = signal.correlate(self.x*self.y, self.kern, > mode='same')[self.sslice] > return xys/self.n - self.ymean*self.xmean > > @OneTimeProperty > def yxcorr(self): > return self.yxcov/np.sqrt(self.yvar*self.xvar) > > @OneTimeProperty > def yxslope(self): > return self.yxcov/self.xvar > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I think your handling of NaN's is incorrect because you do not drop the corresponding observations. That is for two arrays y=np.array([[ 1.229563, -0.339428, 0.83891 , 4.026574, 3.069378, 5.95668 ]]) x=np.array([[-1.236469, 1.941089, -0.346566, -0.268529, np.nan, 0.191336]]) For a windows size of 5, in the first window, the first mean and variance of y should use all 5 elements of y, the mean and variance of X should use the first 4 elements of x and, the regression and correlation coefficients should use the first 4 elements of x and y. Some other points: 1) Your calculation of variance is susceptible to errors, see http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance Provided that you are using sufficient precision (like numpy defaults) it is probably not that big a problem. 2) You only use the 'full' windows so when the window width is 5, you miss the first two windows and the last 2 windows. At least the mean exists in these windows and the variance in most of these partial windows. This may provided unexpected results to a user if they do not release which windows are not returned. 
3) I think the user needs to define the kern argument for your MovStat class as there is probably no meaningful default value (except 42). 4) I do not know how you should handle positive and negative infinity. 5) Your code expects at least 2 dimensions so 1-d arrays fail because you can not do this assignment 'kdim[axis] = ws' with 1-d arrays. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Nov 30 16:43:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 30 Nov 2009 16:43:55 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: <4B14337F.40208@gmail.com> References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> <4B14337F.40208@gmail.com> Message-ID: <1cd32cbb0911301343r4859874r8059eb1ebee19b3c@mail.gmail.com> On Mon, Nov 30, 2009 at 4:05 PM, Bruce Southey wrote: > On 11/22/2009 11:43 PM, josef.pktd at gmail.com wrote: > > Following up on a question by Keith on the numpy list and his reminder > that covariance can be calculated by the cross-product minus the > product of the means, I redid and > enhanced my moving stats functions. > > Suppose x and y are two time series, then the moving correlation > requires the calculation of the mean, variance and covariance for each > window. Currently in scipy stats intermediate results are usually > thrown away on return (while rpy/R returns all intermediate results > used for the calculation. > > Using a decorator/descriptor of Fernando written for nitime, I tried > out to write the function as a class instead, so that any desired ( > intermediate) calculations are only made on demand, but once they are > calculated they are attached to the class as attributes or properties. > This seems to be a useful "pattern". > > Are there any opinion for using the pattern in scipy.stats ? MovStats > will currently go into statsmodels > > Below is the class (with cutting part of init), a full script is the > attachment, including examples that test the class. > > about MovStats: > y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with > axis=1, should (but may not yet) work for nd arrays along any axis > (signal.correlate docstring) > nans are handled by dropping the corresponding observations from the > window, not adding any additional observations, > not tested if a window is empty because it contains only nans, nor if > variance is zero > (kern is intended for weighted statistics in the window but not tested > yet, I still need to decide on normalization requirements) > requires scipy.signal, all calculations done with signal.correlate, no loops > as often, functions are one-liners > all results are returned for valid observations only, initial > observations with incomplete window are cut > bonus: slope of moving regression of y on x, since it was trivial to add > still some cleaning and documentation to do > > usage: > ms = MovStats(x, y, axis=1) > ms.yvar > ms.xmean > ms.yxcorr > ms.yxcov > ... > > > Josef > > class MovStats(object): > def __init__(self, y, x=None, kern=5, axis=0): > self.y = y > self.x = x > if np.isscalar(kern): > ws = kern > <... 
snip> > > @OneTimeProperty > def ymean(self): > ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] > ym = ys/self.n > return ym > > @OneTimeProperty > def yvar(self): > ys2 = signal.correlate(self.y*self.y, self.kern, > mode='same')[self.sslice] > yvar = ys2/self.n - self.ymean**2 > return yvar > > @OneTimeProperty > def xmean(self): > if self.x is None: > return None > else: > xs = signal.correlate(self.x, self.kern, > mode='same')[self.sslice] > xm = xs/self.n > return xm > > @OneTimeProperty > def xvar(self): > if self.x is None: > return None > else: > xs2 = signal.correlate(self.x*self.x, self.kern, > mode='same')[self.sslice] > xvar = xs2/self.n - self.xmean**2 > return xvar > @OneTimeProperty > def yxcov(self): > xys = signal.correlate(self.x*self.y, self.kern, > mode='same')[self.sslice] > return xys/self.n - self.ymean*self.xmean > > @OneTimeProperty > def yxcorr(self): > return self.yxcov/np.sqrt(self.yvar*self.xvar) > > @OneTimeProperty > def yxslope(self): > return self.yxcov/self.xvar > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Thanks for checking > I think your handling of NaN's is incorrect because you do not drop the > corresponding observations. That is for two arrays > > y=np.array([[ 1.229563, -0.339428,? 0.83891 ,? 4.026574,? 3.069378,? 5.95668 > ]]) > x=np.array([[-1.236469,? 1.941089, -0.346566, -0.268529,???? np.nan, > 0.191336]]) > > For a windows size of 5, in the first window, the first mean and variance of > y should use all 5 elements of y, the mean and variance of X should use the > first 4 elements of x and, the regression and correlation coefficients > should use the first 4 elements of x and y. What I do currently is a compromise, I don't want to calculate mean and variance twice. So the behavior now is, if only one array is given, then you get the mean and variance dropping the nan observations for that array. If two arrays are given then I drop observations in both arrays if either one has a nan. This way the user can choose whether they want the separate calculation. If a user provides two arrays, my assumption is that they want cov and corr. > > > Some other points: > 1) Your calculation of variance is susceptible to errors, see > http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance > Provided that you are using sufficient precision (like numpy defaults) it is > probably not that big a problem. Yes, I'm aware of this, for the example the difference to np.corrcoeff is around 1e-14. I will add a warning to the docstring that this is designed for speed with some precision loss. I might be able to use some preprocessing to at least treat some badly scaled data, but the usual higher numerical precision ways of calculating would require much slower loops. For reasonably short windows this seems to be an acceptable tradeoff. > 2) You only use the 'full' windows so when the window width is 5, you miss > the first two windows and the last 2 windows. At least the mean exists in > these windows and the variance in most of these partial windows. This may > provided unexpected results to a user if they do not release which windows > are not returned. Currently I'm returning "valid" observations, that have a full window. In a previous version I allowed for a lag, lead, centered option, Keith returns nans, I think scikits timeseries masks. 
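For reference, a numpy-only sketch of the cross-product moving variance over full ("valid") windows, with a slow per-window loop as a precision check (window length and data are arbitrary, and this is not the MovStats class itself):

import numpy as np

x = np.random.randn(200)
w = 5
kern = np.ones(w) / w

# moving mean and moving E[x**2] over full windows only:
# len(result) == len(x) - w + 1, the "valid observations" convention
m = np.convolve(x, kern, mode='valid')
m2 = np.convolve(x * x, kern, mode='valid')
var_fast = m2 - m ** 2                      # cross-product formula

# direct per-window computation, slower but numerically safer
var_loop = np.array([x[i:i + w].var() for i in range(len(x) - w + 1)])
print(abs(var_fast - var_loop).max())       # ~1e-16 for well-scaled data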
I don't know yet which or whether these options should be included in this function (class). > 3) I think the user needs to define the kern argument for your MovStat class > as there is probably no meaningful default value (except 42). at least the window length should be specified by the user. I picked 5 for business week mostly arbitrary to reduce typing (?) In a slightly updated version I switched to convolution instead of correlation to have the correct orientation for e.g. exponential weights. But I haven't tested this yet > 4) I do not know how you should handle positive and negative infinity. I haven't thought about this, but since most of the time I consider inf as a valid number, I think, it will return infs in the corresponding windows. With a masked array, infs could be masked, but for regular arrays, I don't want to convert infs to nans. However, I still have to figure out some corner cases, e.g. no valid observations in a window, windows with zero variance. it would also be possible to require a minimum of valid observations per window. > 5) Your code expects at least 2 dimensions so 1-d arrays fail because you > can not do this assignment 'kdim[axis] = ws' with 1-d arrays. Thanks, I have tested only the 2d case. I guess I have to review general axis handling again Josef > > Bruce > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From Scott.Askey at afit.edu Mon Nov 30 16:52:32 2009 From: Scott.Askey at afit.edu (Askey, Scott A Capt USAF AETC AFIT/ENY) Date: Mon, 30 Nov 2009 16:52:32 -0500 Subject: [SciPy-User] "profiling" a function References: Message-ID: <792700546363C941B876B9D41AF4475902D689CF@MS-AFIT-03.afit.edu> What are the tools available in Scipy for evaluating the (computational) cost of a function call? In particular I am solving nonlinear systems (with fsolve) and considering exact versus approximate Jacobians and trig functions versus their approximations. V/R Scott -----Original Message----- From: scipy-user-bounces at scipy.org on behalf of scipy-user-request at scipy.org Sent: Mon 11/30/2009 1:00 PM To: scipy-user at scipy.org Subject: SciPy-User Digest, Vol 75, Issue 70 Send SciPy-User mailing list submissions to scipy-user at scipy.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.scipy.org/mailman/listinfo/scipy-user or, via email, send a message with subject or body 'help' to scipy-user-request at scipy.org You can reach the person managing the list at scipy-user-owner at scipy.org When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-User digest..." Today's Topics: 1. chebfun (josef.pktd at gmail.com) 2. ANN: visvis (Almar Klein) 3. Parallel code (Jose Gomez-Dans) 4. Re: Parallel code (Robin) 5. Re: Parallel code (Sturla Molden) 6. Re: Parallel code (Sturla Molden) ---------------------------------------------------------------------- Message: 1 Date: Sun, 29 Nov 2009 20:42:08 -0500 From: josef.pktd at gmail.com Subject: [SciPy-User] chebfun To: SciPy Users List Message-ID: <1cd32cbb0911291742m1d3f3ab8r886009c7fba3cab9 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 I just came by chance across this (for matlab) http://www2.maths.ox.ac.uk/chebfun/index.html http://www2.maths.ox.ac.uk/chebfun/license.html The documentation looks helpful for someone like me who doesn't know enough about what Chebyshev polynomials are good for. 
Josef
Sturla ------------------------------ _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user End of SciPy-User Digest, Vol 75, Issue 70 ****************************************** From dwf at cs.toronto.edu Mon Nov 30 17:45:03 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 30 Nov 2009 17:45:03 -0500 Subject: [SciPy-User] "profiling" a function In-Reply-To: <792700546363C941B876B9D41AF4475902D689CF@MS-AFIT-03.afit.edu> References: <792700546363C941B876B9D41AF4475902D689CF@MS-AFIT-03.afit.edu> Message-ID: <66D17E78-C2BB-420E-BF6C-78C70DA3EBBB@cs.toronto.edu> On 30-Nov-09, at 4:52 PM, Askey, Scott A Capt USAF AETC AFIT/ENY wrote: > > What are the tools available in Scipy for evaluating the > (computational) cost of a function call? > > > > In particular I am solving nonlinear systems (with fsolve) and > considering exact versus approximate Jacobians > and trig functions versus their approximations. Nothing in SciPy itself, but Python contains the cProfile module, as well as hotshot. There's also Robert Kern's line_profiler: http://packages.python.org/line_profiler/ which is rather handy. If you'd just like to time things, the 'timeit' module is good, as well as IPython's shortcuts for it (i.e. %timeit -n 3 my_call()) David From Chris.Barker at noaa.gov Mon Nov 30 18:58:29 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 30 Nov 2009 15:58:29 -0800 Subject: [SciPy-User] scikits.timeseries question Message-ID: <4B145C25.7040303@noaa.gov> HI all, Maybe I'm missing something, but I can't seem to get this to work as I'd like. I have a bunch of data that is indexed by "day since Jan 1, 2001". It seemed I should be able to do a DateArray like this: In [40]: import scikits.timeseries as ts In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) In [42]: sd Out[42]: In [43]: da = ts.date_array((1,2,3,4), start_date=sd) In [44]: da Out[44]: DateArray([1, 2, 3, 4], freq='U') but it looks like it didn't get the frequency ffomr teh start date, so I did: In [46]: da = ts.date_array((1,2,3,4), start_date=sd, freq='D') In [47]: da Out[47]: DateArray([01-Jan-0001, 02-Jan-0001, 03-Jan-0001, 04-Jan-0001], freq='D') Now it's got the frequency, but it's using year 0001, instead of 2001, which is the same as I get if I don't use a start_date at all. What am I missing? In [50]: ts.__version__ Out[50]: '0.91.3' -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Nov 30 19:12:52 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 19:12:52 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B145C25.7040303@noaa.gov> References: <4B145C25.7040303@noaa.gov> Message-ID: <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> On Nov 30, 2009, at 6:58 PM, Christopher Barker wrote: > HI all, > > Maybe I'm missing something, but I can't seem to get this to work as I'd > like. I guess you're confusing DateArrays and TimeSeries. DateArrays are just arrays of dates (think a ndarray of datetime objects, or a ndarray with a datetime64 dtype). TimeSeries are like MaskedArrays, the combination of a ndarray of values with 2 others ndarrays: one array of booleans (the mask), one DateArray. > I have a bunch of data that is indexed by "day since Jan 1, 2001". 
It > seemed I should be able to do a DateArray like this: > > In [40]: import scikits.timeseries as ts > > In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) > > In [42]: sd > Out[42]: All is well here. > In [43]: da = ts.date_array((1,2,3,4), start_date=sd) Check the doc for date_array: the first argument can be * an existing :class:`DateArray` object; * a sequence of :class:`Date` objects with the same frequency; * a sequence of :class:`datetime.datetime` objects; * a sequence of dates in string format; * a sequence of integers corresponding to the representation of :class:`Date` objects. So, what you're trying to do is to build a an array of four dates (1,2,3,4) Instead, use that: >>> ts.time_series((1,2,3,4),start_date=sd) timeseries([1 2 3 4], dates = [01-Jan-2001 ... 04-Jan-2001], freq = D) If you think the doc is confusing to that respect, please let me know how to improve it. And of course, don't hesitate to contact me if you need further info P. From Chris.Barker at noaa.gov Mon Nov 30 19:23:52 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 30 Nov 2009 16:23:52 -0800 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> Message-ID: <4B146218.9000305@noaa.gov> Pierre GM wrote: > On Nov 30, 2009, at 6:58 PM, Christopher Barker wrote: > I guess you're confusing DateArrays and TimeSeries. > DateArrays are just arrays of dates (think a ndarray of datetime > objects, or a ndarray with a datetime64 dtype). TimeSeries are like > MaskedArrays, the combination of a ndarray of values with 2 others > ndarrays: one array of booleans (the mask), one DateArray. Actually, I think I got that. >> In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) >> >> In [42]: sd >> Out[42]: > > All is well here. yup. >> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > Check the doc for date_array: the first argument can be ... > * a sequence of integers corresponding to the representation of > :class:`Date` objects. That's what I'm trying to give it. > So, what you're trying to do is to build a an array of four dates (1,2,3,4) > Instead, use that: > >>>> ts.time_series((1,2,3,4),start_date=sd) > timeseries([1 2 3 4], > dates = [01-Jan-2001 ... 04-Jan-2001], > freq = D) Ah, but what I am trying to do is build that "dates" array -- in teh real case, I have 1212 pieces of data, associated with time, in terms of "days since Jan 1, 2001). So I need to construct that dates array to associate with the time_series data. So I want: dates = what_to_put_here? ts.time_series(an_array_of_data, start_date=sd) timeseries([1 2 3 4], dates = dates], freq = D) While I'm at it -- what I really have is a big 'ol 3-d array, which is gridded model output, of shape: (time, lat, lon). Time is expressed in days since... I need to do a moving average of the while grid over time. Can a time_series be n-d, with time as one of the axis? -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Nov 30 19:49:35 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 19:49:35 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146218.9000305@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> Message-ID: On Nov 30, 2009, at 7:23 PM, Christopher Barker wrote: > Pierre GM wrote: > ... > >> * a sequence of integers corresponding to the representation of >> :class:`Date` objects. > > That's what I'm trying to give it. Ah OK. Well, the answer is: that depends. iIf you know that your dates are just in daily increments from 2001-01-01 (like a range), then just use start_date and length. If you may have several duplicated dates (like 2001-01-01, 2001-01-02, 2001-01-02, 2001-01-03...), then the easiest is probably: >>> da = ts.date_array(np.array(0,1,1,2)+sd) np.array(...) + sd gives you a ndarray of Date objects (so its dtype is np.object), and you use that as the input of date_array. The frequency should be recognized properly. Note that if 1 in your data set means '2001-01-01', then use (sd-1) instead, but you would have guessed that. > While I'm at it -- what I really have is a big 'ol 3-d array, which is > gridded model output, of shape: (time, lat, lon). Time is expressed in > days since... > > I need to do a moving average of the while grid over time. Can a > time_serie be n-d, with time as one of the axis? Well, I never tried so I can tell you. Check wheter lib.moving_funcs supports 2D data. If not, not a big deal: just fill the missing dates (so that you have a regular-spaced series with masked elements for missing dates), and use whatever moving average you need on the .series attribute (which is just a MaskedArray). Or fill this .series with np.nans if your averaging function accepts floats but no missing values... Let me know how it goes P. From ferrell at diablotech.com Mon Nov 30 19:53:56 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 30 Nov 2009 17:53:56 -0700 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146218.9000305@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> Message-ID: <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> On Nov 30, 2009, at 5:23 PM, Christopher Barker wrote: > Pierre GM wrote: >> On Nov 30, 2009, at 6:58 PM, Christopher Barker wrote: >> I guess you're confusing DateArrays and TimeSeries. > >> DateArrays are just arrays of dates (think a ndarray of datetime >> objects, or a ndarray with a datetime64 dtype). TimeSeries are like >> MaskedArrays, the combination of a ndarray of values with 2 others >> ndarrays: one array of booleans (the mask), one DateArray. > > Actually, I think I got that. > >>> In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) >>> >>> In [42]: sd >>> Out[42]: >> >> All is well here. > > yup. > >>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) >> >> Check the doc for date_array: the first argument can be > > ... > >> * a sequence of integers corresponding to the representation >> of >> :class:`Date` objects. > > That's what I'm trying to give it. 
> >> So, what you're trying to do is to build a an array of four dates >> (1,2,3,4) >> Instead, use that: >> >>>>> ts.time_series((1,2,3,4),start_date=sd) >> timeseries([1 2 3 4], >> dates = [01-Jan-2001 ... 04-Jan-2001], >> freq = D) > > Ah, but what I am trying to do is build that "dates" array -- in teh > real case, I have 1212 pieces of data, associated with time, in > terms of > "days since Jan 1, 2001). So I need to construct that dates array to > associate with the time_series data. > > So I want: > > dates = what_to_put_here? I may be misunderstanding what you are trying to do, but here's what I do: In [68]: sd = ts.Date('d', '2001-01-01') In [69]: dates = ts.date_array(cumsum(ones(4)) + sd) In [70]: dates Out[70]: DateArray([02-Jan-2001, 03-Jan-2001, 04-Jan-2001, 05-Jan-2001], freq='D') If the dates aren't consecutive, you can always just use the known offsets: In [73]: days_since_beginning = array([1, 3, 4, 8]) In [74]: dates = ts.date_array(days_since_beginning + sd) In [75]: dates Out[75]: DateArray([02-Jan-2001, 04-Jan-2001, 05-Jan-2001, 09-Jan-2001], freq='D') There's probably an easier way... If this happens to be what you are trying to do, be careful of the counting of days (0 based, vs 1 based). -robert > > ts.time_series(an_array_of_data, > start_date=sd) > timeseries([1 2 3 4], > dates = dates], > freq = D) > > While I'm at it -- what I really have is a big 'ol 3-d array, which is > gridded model output, of shape: (time, lat, lon). Time is expressed in > days since... > > I need to do a moving average of the while grid over time. Can a > time_series be n-d, with time as one of the axis? > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pgmdevlist at gmail.com Mon Nov 30 20:06:44 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 20:06:44 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> Message-ID: <55C6FC2C-8242-47CA-817D-4E0289C7B9DD@gmail.com> On Nov 30, 2009, at 7:53 PM, Robert Ferrell wrote: > > I may be misunderstanding what you are trying to do, but here's what I > do: > > In [68]: sd = ts.Date('d', '2001-01-01') > > In [69]: dates = ts.date_array(cumsum(ones(4)) + sd) > > In [70]: dates > Out[70]: > DateArray([02-Jan-2001, 03-Jan-2001, 04-Jan-2001, 05-Jan-2001], > freq='D') The cumsum approach works only if you have irregular time steps as inputs (as in 1 day after the first, 1 day after that, 3 days after that...). If you have regular time steps of 1, just use arange+start_date (or even just length+start_date) From Chris.Barker at noaa.gov Mon Nov 30 20:16:41 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 30 Nov 2009 17:16:41 -0800 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> Message-ID: <4B146E79.7090407@noaa.gov> Pierre GM wrote: > Ah OK. Well, the answer is: that depends. 
iIf you know that your > dates are just in daily increments from 2001-01-01 (like a range), > then just use start_date and length. right -- but I don't know that. > If you may have several duplicated dates (like 2001-01-01, > 2001-01-02, 2001-01-02, 2001-01-03...), then the easiest is probably: > >>>> da = ts.date_array(np.array(0,1,1,2)+sd) nope -- not duplicated, but maybe there are missing ones. The point is that I have an array of "days since", and I want array of timeseries.dates (which is a DateArray, yes?) > np.array(...) + sd gives you a ndarray of Date objects (so its dtype > is np.object), and you use that as the input of date_array. The > frequency should be recognized properly. OK -- though it seems I SHOULD be able to go straight to an DateArray, and I'm still confused about what this means: >> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > Check the doc for date_array: the first argument can be > * an existing :class:`DateArray` object; > * a sequence of :class:`Date` objects with the same frequency; > * a sequence of :class:`datetime.datetime` objects; > * a sequence of dates in string format; > * a sequence of integers corresponding to the representation of > :class:`Date` objects. That's what I have: a sequence of integers corresponding to the representation of the Date objects (doesn't it represent them as "units since start date" where units is the "freq" ? If that's not what if means, then what does it mean? Robert Ferrell wrote: > If this happens to be what you are trying to do, be careful of the > counting of days (0 based, vs 1 based). yup -- thanks for the reminder. >> I need to do a moving average of the while grid over time. Can a >> time_series be n-d, with time as one of the axis? > Well, I never tried so I can tell you. Check wheter lib.moving_funcs > supports 2D data. hmm -- I see this: Definition: ts_lib.mov_average(data, span, dtype=None) Docstring: Calculates the moving average of a series. Parameters ---------- data : array-like Input data, as a sequence or (subclass of) ndarray. Masked arrays and TimeSeries objects are also accepted. The input array should be 1D or 2D at most. If the input array is 2D, the function is applied on each column. I've got a 3-d array -- darn! Maybe I'll poke into it and see if it can be generalized. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Nov 30 20:39:39 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 20:39:39 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146E79.7090407@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> Message-ID: <3EB215DA-3808-42CE-B2E0-6568B6B40C37@gmail.com> On Nov 30, 2009, at 8:16 PM, Christopher Barker wrote: > nope -- not duplicated, but maybe there are missing ones. The point is > that I have an array of "days since", and I want array of > timeseries.dates (which is a DateArray, yes?) Got it. Duplicated and/or missing dates correspond to the same problem: you can't assume that your dates are regularly spaced, so you can't use start_date and length. >> np.array(...) + sd gives you a ndarray of Date objects (so its dtype >> is np.object), and you use that as the input of date_array. 
The >> frequency should be recognized properly. > > OK -- though it seems I SHOULD be able to go straight to an DateArray, > and I'm still confused about what this means: Well, that depends on the type of starting date, actually. If it's a Date, adding a ndarray to it will give you a ndarray of Date objects. If it's a DateArray of length 1, it'll give you a DateArray. (Note to self: we could probably be a bit more consistent on this one...) >>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) >> >> Check the doc for date_array: the first argument can be >> * an existing :class:`DateArray` object; >> * a sequence of :class:`Date` objects with the same frequency; >> * a sequence of :class:`datetime.datetime` objects; >> * a sequence of dates in string format; >> * a sequence of integers corresponding to the representation of >> :class:`Date` objects. > > That's what I have: a sequence of integers corresponding to the > representation of the Date objects (doesn't it represent them as "units > since start date" where units is the "freq" ? No, not exactly: the representation of a Date objects is relative to an absolute build-in reference (Day #1 being 01/01/01). (Likewise, nump.datetime64 uses the standard 1970/01/01). We can't have a variable reference as it would be far too messy too quickly. Instead, you have to use the trick start_date + ndarray of integers to get what you want. > If that's not what if means, then what does it mean? If you have a 'A' frequency, that'd be a sequence like 2001, 2002, ... For a 'M' frequency, that'd be 24001 (for 2001/01), 24002 (for 2001/02)... For a 'D' frequency, that'd be 730486, 730487... for 2001/01/01, 2001/01/02... In other terms, the nb of units since the absolute reference. > > hmm -- I see this: > > Definition: > ts_lib.mov_average(data, span, dtype=None) > Docstring: > Calculates the moving average of a series. > > Parameters > ---------- > data : array-like > Input data, as a sequence or (subclass of) ndarray. > Masked arrays and TimeSeries objects are also accepted. > The input array should be 1D or 2D at most. > If the input array is 2D, the function is applied on each > column. > > I've got a 3-d array -- darn! Maybe I'll poke into it and see if it can > be generalized. 3D ? What are your actual variables ? Keep in mind that when we talk about dimensions with time series, we zap the time one, so if you have a series of maps, your array is only 2D in our terminology. If you have a time series of (lat, lon), mov_average will average your lats independently of your lons From ferrell at diablotech.com Mon Nov 30 21:59:32 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 30 Nov 2009 19:59:32 -0700 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146E79.7090407@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> Message-ID: <2B27D020-8DE3-4BFE-8B84-936E4A1B9FBE@diablotech.com> On Nov 30, 2009, at 6:16 PM, Christopher Barker wrote: > Pierre GM wrote: > >> Ah OK. Well, the answer is: that depends. iIf you know that your >> dates are just in daily increments from 2001-01-01 (like a range), >> then just use start_date and length. > > right -- but I don't know that. > >> If you may have several duplicated dates (like 2001-01-01, >> 2001-01-02, 2001-01-02, 2001-01-03...), then the easiest is probably: >> >>>>> da = ts.date_array(np.array(0,1,1,2)+sd) > > nope -- not duplicated, but maybe there are missing ones. 
The point is > that I have an array of "days since", and I want array of > timeseries.dates (which is a DateArray, yes?) I don't think so. An array of dates is not a DateArray. In [98]: sd = ts.Date('d', '2001-01-01') In [99]: zeros(4) + sd Out[99]: array([01-Jan-2001, 01-Jan-2001, 01-Jan-2001, 01-Jan-2001], dtype=object) This seems natural to me, (array + Date = array) although I do have to include an extra line sometimes to get a DateArray if I need it. If I need a timeseries, sometimes I can skip making the DateArray explicitly. In [109]: a = arange(4) + sd In [110]: a Out[110]: array([01-Jan-2001, 02-Jan-2001, 03-Jan-2001, 04-Jan-2001], dtype=object) In [111]: ts.time_series([1,2,3,4], dates=a) Out[111]: timeseries([1 2 3 4], dates = [01-Jan-2001 ... 04-Jan-2001], freq = D) > >> np.array(...) + sd gives you a ndarray of Date objects (so its dtype >> is np.object), and you use that as the input of date_array. The >> frequency should be recognized properly. > > OK -- though it seems I SHOULD be able to go straight to an DateArray, Is the issue that sd is a Date and not a DateArray? You can always make a DataArray with sd, of the correct length, and then add to that: In [83]: sd = ts.Date('d', '2001-01-01') In [84]: d1 = ts.date_array(zeros(4) + sd) In [85]: d1 Out[85]: DateArray([01-Jan-2001, 01-Jan-2001, 01-Jan-2001, 01-Jan-2001], freq='D') In [86]: d1 + array([0,2,3,5]) Out[86]: DateArray([01-Jan-2001, 03-Jan-2001, 04-Jan-2001, 06-Jan-2001], freq='D') I'm probably telling you things that are obvious and are not addressing your question. > and I'm still confused about what this means: > >>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) This throws an exception for me. : year=1 is before 1900; the datetime strftime() methods require year >= 1900 -robert From ferrell at diablotech.com Mon Nov 30 22:15:29 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 30 Nov 2009 20:15:29 -0700 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <55C6FC2C-8242-47CA-817D-4E0289C7B9DD@gmail.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> <55C6FC2C-8242-47CA-817D-4E0289C7B9DD@gmail.com> Message-ID: On Nov 30, 2009, at 6:06 PM, Pierre GM wrote: > On Nov 30, 2009, at 7:53 PM, Robert Ferrell wrote: >> >> I may be misunderstanding what you are trying to do, but here's >> what I >> do: >> >> In [68]: sd = ts.Date('d', '2001-01-01') >> >> In [69]: dates = ts.date_array(cumsum(ones(4)) + sd) >> >> In [70]: dates >> Out[70]: >> DateArray([02-Jan-2001, 03-Jan-2001, 04-Jan-2001, 05-Jan-2001], >> freq='D') > > The cumsum approach works only if you have irregular time steps as > inputs (as in 1 day after the first, 1 day after that, 3 days after > that...). If you have regular time steps of 1, just use arange > +start_date (or even just length+start_date) Sort of. The cumsum approach works even if the intervals are uniform, of course, but it may be overkill and arange may be sufficient. In any case, I get the impression that the OP has an array of integer offsets generated in some other fashion entirely. 
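Putting the pieces of this thread together for the original "days since" use case, a minimal sketch might look like the following; the offsets and values are made up for illustration, and the real arrays would come from the model output:

import numpy as np
import scikits.timeseries as ts

# Illustrative inputs: integer offsets in "days since 2001-01-01"
# (possibly with gaps) and one data value per offset.
offsets = np.array([0, 1, 3, 4, 8])
values = np.array([10.0, 11.5, 9.8, 12.1, 10.7])

sd = ts.Date(freq='D', year=2001, month=1, day=1)
# Adding an integer array to a Date gives an array of Date objects,
# which date_array accepts.  If a value of 1 (not 0) in the offsets
# means 2001-01-01, add them to (sd - 1) instead.
dates = ts.date_array(offsets + sd)
series = ts.time_series(values, dates=dates)
print(series)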
From pgmdevlist at gmail.com Mon Nov 30 23:03:22 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 23:03:22 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <2B27D020-8DE3-4BFE-8B84-936E4A1B9FBE@diablotech.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> <2B27D020-8DE3-4BFE-8B84-936E4A1B9FBE@diablotech.com> Message-ID: <17E9CA7E-7446-4202-997C-9AB5081977C0@gmail.com> On Nov 30, 2009, at 9:59 PM, Robert Ferrell wrote: > This seems natural to me, (array + Date = array) although I do have to > include an extra line sometimes to get a DateArray if I need it. If I > need a timeseries, sometimes I can skip making the DateArray explicitly. Well, keep in mind that Date was implemented a few years ago already, far before the new datetime64 dtype, and it was the easiest way we had to define a new datatype (well, a kind of datatype). I'll check how we can merge the two approaches when I'll have some time. Anyhow, in practice, a Date object will be seen as a np.object by numpy, and you end up having a ndarray with a np.object dtype. > Is the issue that sd is a Date and not a DateArray? You can always > make a DataArray with sd, of the correct length, and then add to that: > > In [83]: sd = ts.Date('d', '2001-01-01') > > In [84]: d1 = ts.date_array(zeros(4) + sd) Wow, that's overkill ! Just make sd a DateArray: >>> np.arange(4) + ts.DateArray(sd) Now, because DateArray is a subclass of ndarray with a higher priority, its _add__ method takes over and the ouput is a DateArray. > >> and I'm still confused about what this means: >> >>>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > This throws an exception for me. > > : year=1 is before 1900; the datetime > strftime() methods require year >= 1900 What version are you using ? And anyway, you get the exception only if you try to print it (as strftime is called only when calling repr/str) From mattknox.ca at gmail.com Mon Nov 30 23:13:55 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 1 Dec 2009 04:13:55 +0000 (UTC) Subject: [SciPy-User] scikits.timeseries question References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> Message-ID: Christopher Barker noaa.gov> writes: > >> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > > > Check the doc for date_array: the first argument can be > > * an existing :class:`DateArray` object; > > * a sequence of :class:`Date` objects with the same frequency; > > * a sequence of :class:`datetime.datetime` objects; > > * a sequence of dates in string format; > > * a sequence of integers corresponding to the representation of > > :class:`Date` objects. > > That's what I have: a sequence of integers corresponding to the > representation of the Date objects (doesn't it represent them as "units > since start date" where units is the "freq" ? > > If that's not what if means, then what does it mean? I agree the documentation is perhaps a bit confusing here. The sequence of integers being referred to are the internal representation of the Date objects (eg. ts.now('d').value) which is absolute, not relative (not relative to a custom start date anyway). Another thing you are missing is that the first argument (dlist) is not supposed to be used in conjunction with the start_date parameter. There are a couple ways to call date_array: 1. 
Using the `dlist` argument, possibly in combination with the `freq` argument if the frequency is not implicit in the dlist being passed. 2. Using the `start_date` parameter in combination with either the `length` or `end_date` parameter. This option would only be used for a continuous time series (i.e. no missing or duplicated dates). Whether this is a good API is probably debatable, but that is how it works currently. In addition to the methods described by Pierre and Robert, you could also do: >>> sd = ts.now('d') >>> relative_days = np.array([1,5,8]) >>> absolute_days = relative_days + sd.value >>> darray = ts.date_array(absolute_days, freq = sd.freq) which I think probably has the lowest overhead (but don't hold me to that :) ) if that matters for your application. - Matt
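For completeness, a short, untested sketch showing the two calling styles side by side, with illustrative offsets:

import numpy as np
import scikits.timeseries as ts

sd = ts.Date('d', '2001-01-01')

# Style 1: a dlist of absolute integer representations plus an explicit freq.
relative_days = np.array([0, 1, 2, 3])   # illustrative "days since" offsets
d1 = ts.date_array(relative_days + sd.value, freq=sd.freq)

# Style 2: start_date plus length (or end_date), for a continuous run of dates.
d2 = ts.date_array(start_date=sd, length=4)

print(d1)
print(d2)   # both should cover 01-Jan-2001 through 04-Jan-2001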