From simpson at math.toronto.edu Sun Feb 1 00:37:48 2009 From: simpson at math.toronto.edu (Gideon Simpson) Date: Sun, 1 Feb 2009 00:37:48 -0500 Subject: [SciPy-user] shared memory machines Message-ID: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> Has anyone been able to take advantage of shared memory machines with scipy? How did you do it? -gideon From karl.young at ucsf.edu Sun Feb 1 00:45:59 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Sat, 31 Jan 2009 21:45:59 -0800 Subject: [SciPy-user] Automating Matlab References: <4984F58C.5070605@gmail.com> <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> Message-ID: <9D202D4E86A4BF47BA6943ABDF21BE78058FAB62@EXVS06.net.ucsf.edu> >> Is there strong interest in automating matlab to numpy conversion? > Yes! Please post your code somewhere! seconded !!!!! I'm currently working on a grant that has turned out to involve porting a lot of matlab code to python; you will be gratefully acknowledged in whatever comes of the work of the grant. -- KY From gokhansever at gmail.com Sun Feb 1 02:49:32 2009 From: gokhansever at gmail.com (gsever) Date: Sat, 31 Jan 2009 23:49:32 -0800 (PST) Subject: [SciPy-user] Automating Matlab In-Reply-To: <4984F58C.5070605@gmail.com> References: <4984F58C.5070605@gmail.com> Message-ID: <27b4bf7a-bb75-457d-8ed0-eec3465b92f1@t13g2000yqc.googlegroups.com> I am interested with this project, too. Would be much better to have an automated tool than doing manual conversations. Just for your information, there is a IDL-to-Python conversation tool named i2py @ http://code.google.com/p/i2py/ On Jan 31, 7:06?pm, Eric Schug wrote: > Is there strong interest in automating matlab to numpy conversion? > > I have a working version of a matlab to python translator. > It allows translation of matlab scripts into numpy constructs, > supporting most of the matlab language. ?The parser is nearly complete. ? > Most of the remaining work involves providing a robust translation. Such as > ? ? * making sure that copies on assign are done when needed. > ? ? * correct indexing a(:) becomes a.flatten(1) when on the left hand > side (lhs) of equals > ? ? ? ?and a[:] when on the right hand side > > I've seen a few projects attempt to do this, but for one reason or > another have stopped it. > > _______________________________________________ > SciPy-user mailing list > SciPy-u... at scipy.orghttp://projects.scipy.org/mailman/listinfo/scipy-user From s.mientki at ru.nl Sun Feb 1 04:27:11 2009 From: s.mientki at ru.nl (Stef Mientki) Date: Sun, 01 Feb 2009 10:27:11 +0100 Subject: [SciPy-user] Automating Matlab In-Reply-To: <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> References: <4984F58C.5070605@gmail.com> <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> Message-ID: <49856AEF.9050605@ru.nl> Robert Kern wrote: > On Sat, Jan 31, 2009 at 19:06, Eric Schug wrote: > >> Is there strong interest in automating matlab to numpy conversion? >> > > Yes! Please post your code somewhere! > > +1 And this is a very good moment for the persons who are creating a Matlab like environment, including the Matlab-like workspace, to show there creations. 
cheers, Stef From gael.varoquaux at normalesup.org Sun Feb 1 04:57:46 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 1 Feb 2009 10:57:46 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> Message-ID: <20090201095746.GA1099@phare.normalesup.org> On Sun, Feb 01, 2009 at 12:37:48AM -0500, Gideon Simpson wrote: > Has anyone been able to take advantage of shared memory machines with > scipy? How did you do it? I am not sure I understand your question. You want to do parallel computing and share the arrays between processes, is that it? Ga?l From simpson at math.toronto.edu Sun Feb 1 10:03:30 2009 From: simpson at math.toronto.edu (Gideon Simpson) Date: Sun, 1 Feb 2009 10:03:30 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090201095746.GA1099@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <20090201095746.GA1099@phare.normalesup.org> Message-ID: Yes, but I'm talking about when you have a multiprocessor/multicore system, not a commodity cluster. In these shared memory configurations, were I using compiled code, I'd be able to use OpenMP to take advantage of the additional cores/processors. I'm wondering if anyone has looked at ways to take advantage of such configurations with scipy. -gideon On Feb 1, 2009, at 4:57 AM, Gael Varoquaux wrote: > On Sun, Feb 01, 2009 at 12:37:48AM -0500, Gideon Simpson wrote: >> Has anyone been able to take advantage of shared memory machines with >> scipy? How did you do it? > > I am not sure I understand your question. You want to do parallel > computing and share the arrays between processes, is that it? > > Ga?l > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From gael.varoquaux at normalesup.org Sun Feb 1 10:29:40 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 1 Feb 2009 16:29:40 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <20090201095746.GA1099@phare.normalesup.org> Message-ID: <20090201152940.GD9757@phare.normalesup.org> On Sun, Feb 01, 2009 at 10:03:30AM -0500, Gideon Simpson wrote: > Yes, but I'm talking about when you have a multiprocessor/multicore > system, not a commodity cluster. In these shared memory > configurations, were I using compiled code, I'd be able to use OpenMP > to take advantage of the additional cores/processors. I'm wondering > if anyone has looked at ways to take advantage of such configurations > with scipy. I use the multiprocessing module: http://docs.python.org/library/multiprocessing.html I also have some code to share arrays between processes. I'd love to submit it for integration with numpy, but first I'd like it to get more exposure so that the eventual flaws in the APIs are found. I am attaching it. Actually I wrote this code a few months ago, and now that I am looking at it, I realise that the SharedMemArray should probably be a subclass of numpy.ndarray, and implement the full array signature. I am not sure if this is possible or not (ie if it will still be easy to have multiprocessing share the data between processes or not). I don't really have time for polishing this right, anybody wants to have a go? 
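For the simple case where copying the arrays to the workers is acceptable, a plain multiprocessing.Pool already goes a long way. A rough sketch (the worker count, chunking and the row_norms function below are arbitrary, illustrative choices, not part of the attached module):

import numpy as np
from multiprocessing import Pool

def row_norms(block):
    # Each worker receives a pickled copy of its chunk of rows.
    return np.sqrt((block * block).sum(axis=1))

if __name__ == '__main__':
    data = np.random.random((10000, 50))
    chunks = np.array_split(data, 4)
    pool = Pool(processes=4)
    result = np.concatenate(pool.map(row_norms, chunks))
    pool.close()
    pool.join()

This copies the data to each process; the attached module below is aimed at the case where that copy is too expensive and the memory really has to be shared.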
Ga?l > On Feb 1, 2009, at 4:57 AM, Gael Varoquaux wrote: > > On Sun, Feb 01, 2009 at 12:37:48AM -0500, Gideon Simpson wrote: > >> Has anyone been able to take advantage of shared memory machines with > >> scipy? How did you do it? > > I am not sure I understand your question. You want to do parallel > > computing and share the arrays between processes, is that it? -------------- next part -------------- """ Small helper module to share arrays between processes without copying data. Numpy arrays can be converted to shared memory arrays, which implement the array protocole, but are allocated in memory that can be share transparently by the multiprocessing module. """ # Author: Gael Varoquaux # Copyright: Gael Varoquaux # License: BSD import numpy as np import multiprocessing import ctypes _ctypes_to_numpy = { ctypes.c_char : np.int8, ctypes.c_wchar : np.int16, ctypes.c_byte : np.int8, ctypes.c_ubyte : np.uint8, ctypes.c_short : np.int16, ctypes.c_ushort : np.uint16, ctypes.c_int : np.int32, ctypes.c_uint : np.int32, ctypes.c_long : np.int32, ctypes.c_ulong : np.int32, ctypes.c_float : np.float32, ctypes.c_double : np.float64 } _numpy_to_ctypes = dict((value, key) for key, value in _ctypes_to_numpy.iteritems()) def shmem_as_ndarray(data, dtype=float): """ Given a multiprocessing.Array object, as created by ndarray_to_shmem, returns an ndarray view on the data. """ dtype = np.dtype(dtype) size = data._wrapper.get_size()/dtype.itemsize arr = np.frombuffer(buffer=data, dtype=dtype, count=size) return arr def ndarray_to_shmem(arr): """ Converts a numpy.ndarray to a multiprocessing.Array object. The memory is copied, and the array is flattened. """ arr = arr.reshape((-1, )) data = multiprocessing.RawArray(_numpy_to_ctypes[arr.dtype.type], arr.size) ctypes.memmove(data, arr.data[:], len(arr.data)) return data def test_ndarray_conversion(): """ Check that the conversion to multiprocessing.Array and back works. """ a = np.random.random((100, )) a_sh = ndarray_to_shmem(a) b = shmem_as_ndarray(a_sh) np.testing.assert_almost_equal(a, b) def test_conversion_non_flat(): """ Check that the conversion also works with non-flat arrays. """ a = np.random.random((100, 2)) a_flat = a.flatten() a_sh = ndarray_to_shmem(a) b = shmem_as_ndarray(a_sh) np.testing.assert_almost_equal(a_flat, b) def test_conversion_non_contiguous(): """ Check that the conversion also works with non-contiguous arrays. """ a = np.indices((3, 3, 3)) a = a.T a_flat = a.flatten() a_sh = ndarray_to_shmem(a) b = shmem_as_ndarray(a_sh, dtype=a.dtype) np.testing.assert_almost_equal(a_flat, b) def test_no_copy(): """ Check that the data is not copied from the multiprocessing.Array. """ a = np.random.random((100, )) a_sh = ndarray_to_shmem(a) a = shmem_as_ndarray(a_sh) b = shmem_as_ndarray(a_sh) a[0] = 1 np.testing.assert_equal(a[0], b[0]) a[0] = 0 np.testing.assert_equal(a[0], b[0]) ################################################################################ # A class to carry around the relevant information ################################################################################ class SharedMemArray(object): """ Wrapper around multiprocessing.Array to share an array accross processes. """ def __init__(self, arr): """ Initialize a shared array from a numpy array. The data is copied. """ self.data = ndarray_to_shmem(arr) self.dtype = arr.dtype self.shape = arr.shape def __array__(self): """ Implement the array protocole. 
""" arr = shmem_as_ndarray(self.data, dtype=self.dtype) arr.shape = self.shape return arr def asarray(self): return self.__array__() def test_sharing_array(): """ Check that a SharedMemArray shared between processes is indeed modified in place. """ # Our worker function def f(arr): a = arr.asarray() a *= -1 a = np.random.random((10, 3, 1)) arr = SharedMemArray(a) # b is a copy of a b = arr.asarray() np.testing.assert_array_equal(a, b) multiprocessing.Process(target=f, args=(arr,)).run() np.testing.assert_equal(-b, a) if __name__ == '__main__': import nose nose.runmodule() From josef.pktd at gmail.com Sun Feb 1 12:02:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 1 Feb 2009 12:02:38 -0500 Subject: [SciPy-user] bug in stats.pdfapprox/stats.pdf_moment and new Gram-Charlier distribution Message-ID: <1cd32cbb0902010902ga75134dxac08747aa19d524c@mail.gmail.com> I wanted to create a new distribution by wrapping stats.pdfapprox and stats.pdf_moment in stats.morestats.py. However, these two function do not create a normal expansion if the first four moments are given. The inner loop in stats.pdf_moment is never entered when four moments are given. As a consequence the pdf that is returned is the unexpanded normal distribution. (There is also a small mistake in stats.pdfapprox in how the variance is calculated.) I didn't find any information which type of expansion is used by pdf_moment. I assume it is Gram-Charlier, but I didn't find any formulas to make sense out of the inner loop that calculates the coefficients (for multiplying with the Hermite polynomials). If someone could provide a (understandable) reference for these calculations or figure out what the loop is supposed to do, then we could correct the expansion for the general case. Since I couldn't fix `pdf_moment`, I wrote a new function that calculates the pdf for the Gram-Charlier expansion when the first four moments (or mean, variance, skew, kurtosis) are given. This uses the explicit formula for this expansion, and doesn't allow for higher order expansion. pdf_mvsk: get pdf of G-Ch normal expansion using mean, variance, skew, and excess kurtosis This I wrapped in a subclass of _distributionsrv_continuous: NormExpan_gen It works in the examples that I tried but is not fully tested or cleaned up yet. attachment: * try_pdfapprox.py shows problem with current function * distr_gch.py new expansion pdf, and NormExpan distribution I also wrote a skew normal and skew t distribution (as defined by Azzalini, A. & Capitanio, A., univariate only), which is not attached. 
Josef -------------- next part -------------- from scipy import stats, special from scipy.stats import distributions import numpy as np def mvsk2cm(args): mu,sig,sk,kur = args # Get central moments cnt = [None]*4 cnt[0] = mu cnt[1] = sig #*sig cnt[2] = sk * sig**1.5 cnt[3] = (kur+3.0) * sig**2.0 return cnt rvs = stats.norm.rvs(5,size=(2,100)).max(axis=0) mvsk = stats.describe(rvs)[2:] print 'sample: mu,sig,sk,kur' print mvsk mc = mvsk2cm(mvsk) pdffn1 = stats.pdfapprox(rvs) print '\npdf approximation from sample' print 'pdf at mean-1, mean+1', mc[0]-1,mc[0]+1 print pdffn1([mc[0]-1,mc[0]+1]) pdffn2 = stats.pdf_moments(mc) print '\npdf approximation from moments' print 'pdf at mean-1, mean+1', mc[0]-1,mc[0]+1 print pdffn2([mc[0]-1,mc[0]+1]) -------------- next part -------------- '''Gram-Charlier distribution, four-moment expansion of normal distribution ''' from scipy import stats, special from scipy.stats import distributions import numpy as np def mvsk2cm(*args): mu,sig2,sk,kur = args # Get central moments cnt = [None]*4 cnt[0] = mu cnt[1] = sig2 #*sig wrong in stats.pdfapprox cnt[2] = sk * sig2**1.5 cnt[3] = (kur+3.0) * sig2**2.0 return cnt def mc2mvsk(args): mc, mc2, mc3, mc4 = args skew = mc3 / mc2**1.5 kurt = mc4 / mc2**2.0 - 3.0 return (mc, mc2, skew, kurt) def pdf_mvsk(mvsk): """Return the Gaussian expanded pdf function given the list of 1st, 2nd moment and skew and Fisher (excess) kurtosis. Parameters ---------- mvsk : list of mu, mc2, skew, kurt distribution is matched to these four moments Returns ------- pdffunc : function function that evaluates the pdf(x), where x is the non-standardized random variable. Notes ----- Changed so it works only if four arguments are given. Uses explicit formula, not loop. This implements a Gram-Charlier expansion of the normal distribution where the first 2 moments coincide with those of the normal distribution but skew and kurtosis can deviate from it. In the Gram-Charlier distribution it is possible that the density becomes negative. This is the case when the deviation from the normal distribution is too large. References ---------- http://en.wikipedia.org/wiki/Edgeworth_series Johnson N.L., S. Kotz, N. Balakrishnan: Continuous Univariate Distributions, Volume 1, 2nd ed., p.30 """ N = len(mvsk) if N < 4: raise ValueError, "Four moments must be given to" + \ "approximate the pdf." mu, mc2, skew, kurt = mvsk totp = poly1d(1) sig = sqrt(mc2) if N > 2: Dvals = stats.morestats._hermnorm(N+1) C3 = skew/6.0 C4 = kurt/24.0 # Note: Hermite polynomial for order 3 in _hermnorm is negative # instead of positive totp = totp - C3*Dvals[3] + C4*Dvals[4] def pdffunc(x): xn = (x-mu)/sig return totp(xn)*np.exp(-xn*xn/2.0)/np.sqrt(2*np.pi)/sig return pdffunc class NormExpan_gen(distributions.rv_continuous): def __init__(self,args, **kwds): distributions.rv_continuous.__init__(self, name = 'Normal Expansion distribution', extradoc = ''' The distribution is defined as the Gram-Charlier expansion of the normal distribution using the first four moments. The pdf is given by pdf(x) = (1+ skew/6.0 * H(xc,3) + kurt/24.0 * H(xc,4))*normpdf(xc) where xc = (x-mu)/sig is the standardized value of the random variable and H(xc,3) and H(xc,4) are Hermite polynomials. Note: This distribution has to be parameterized during initialization and instantiation, and does not have a shape parameter after instantiation (similar to frozen distribution except for location and scale.) 
Location and scale can be used as with other distributions, however note, that they are relative to the initialized distribution. ''' ) mode = kwds.get('mode', 'sample') if mode == 'sample': mu,sig2,sk,kur = stats.describe(args)[2:] self.mvsk = (mu,sig2,sk,kur) cnt = mvsk2cm(mu,sig2,sk,kur) elif mode == 'mvsk': cnt = mvsk2cm(args) self.mvsk = args elif mode == 'centmom': cnt = args self.mvsk = mc2mvsk(cnt) else: raise ValueError, "mode must be 'mvsk' or centmom" self.cnt = cnt self._pdf = pdf_mvsk(self.mvsk) def _munp(self,n): # use pdf integration with _mom0_sc if only _pdf is defined. # default stats calculation uses ppf return self._mom0_sc(n) def _stats_skip(self): # skip for now to force numerical integration of pdf for testing return self.mvsk if __name__ == '__main__': rvs = skewnorm.rvs(5,size=100) normexpan = NormExpan_gen(rvs, mode='sample') smvsk = stats.describe(rvs)[2:] print 'sample: mu,sig2,sk,kur' print smvsk dmvsk = normexpan.stats(moments='mvsk') print 'normexpan: mu,sig2,sk,kur' print dmvsk print 'mvsk diff distribution - sample' print np.array(dmvsk) - np.array(smvsk) print '\nnormexpan attributes mvsk' print mc2mvsk(normexpan.cnt) print normexpan.mvsk print '\nusing methods' print normexpan.rvs(size=5) #slow print normexpan.cdf([-1,0,1,2]) print normexpan.pdf([-1,0,1,2]) print normexpan.ppf([0,0.1,0.5,0.9,0.95,1]) print normexpan.sf([0,0.1,0.5,0.9,0.95,1]) From timmichelsen at gmx-topmail.de Sun Feb 1 15:48:32 2009 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Sun, 01 Feb 2009 21:48:32 +0100 Subject: [SciPy-user] SciPy and GUI In-Reply-To: <20090125100556.GA29918@phare.normalesup.org> References: <20090125100556.GA29918@phare.normalesup.org> Message-ID: <49860AA0.1090300@gmx-topmail.de> Hello, Some ideas on adding a GUI to scientif scripts can be found in the following book: Python Scripting for Computational Science, by H. P. Langtangen. http://folk.uio.no/hpl/scripting/ I am currently as well at a point within my developments where user interaction is needed. Currently, I see three options with different levels of complexity. 1) use commandline (OptParse) with config files 2) add some simple GUIs that pop up where user input is needed. 3) code a real GUI with a Toolkit. > I would use traits (see > http://code.enthought.com/projects/traits/documentation.php, and > http://code.enthought.com/projects/traits/docs/html/tutorials/traits_ui_scientific_app.html > for documentation and a tutorial) I read your tutorial. I think it is one of the best I read that are targeting non-programmer scientists who need to to task specific coding. Your Physics Lab background shows that you know the difficulties of your readers. Well done! I shows that GUIs can be created with as little overhead code as possible. Nevertheless, I have some questions: * Where is the "science" in TraitsUI? (Why do you call it a scientific GUI?) E.g. I could also build a Wizard directly with wxPython. So why with Traits? * I tried the examples. What I did not understand is how one can control the buttons below the Traits objects. For the first example (section "An object and its representation"), there are 6 buttons in your image: Undo, Redo, Revert, OK, Cancel, Help. When I execute the code I only get OK, Cancel. May you tell how or where to find information how buttons can be contolled? * Input validation: I remember to have seen a example where a Traits Window was used to validate (numeric) input. If the user puts in invalid numers, it would turn read.. Do you know about this? 
* Is there a feature roadmap for traits? I would like to know where you intend it to develop it to before I settle on it. Others users may also be interested, so I relink to an earlier post: example application for a starter with TraitsUI http://thread.gmane.org/gmane.comp.python.enthought.devel/18246 It maybe of interest for many prospective beginners to see example applications. Why not listing all accessible applications built with TraitsUI on a website? I think that Enthought should put a strong pointer on their website (http://code.enthought.com/) indicating that actually a lot of documentation can also be found on the Trac wiki (https://svn.enthought.com/enthought/wiki). Kind regards, Timmie From marko.loparic at gmail.com Sun Feb 1 15:55:44 2009 From: marko.loparic at gmail.com (Marko Loparic) Date: Sun, 1 Feb 2009 21:55:44 +0100 Subject: [SciPy-user] python alternative to java rich ajax platform (RAP) for a thin client of a mathematical model? Message-ID: Hi, Do you know a python alternative to rich ajax platform (RAP)? For the development of a user interface for a mathematical model someone suggested me to use eclipse and that tool: http://www.eclipse.org/rap/ http://www.eclipse.org/rap/about.php If I understand correctly it allows you to design very easily a web client interacting with your application (in particular you write no javascript and the powerful javascript routines from qooxdoo are called for you). It seems to be a very interesting package except that... it is in java. So I would like to know if there is something with similar power made in python. Alternatively I would like to know if there is a way to use this tool doing the minimum in java and the most in python (probably not...). Thanks! Marko (sorry for crossposting with comp.lang.python, but it seems that scipy community is quite different) From alex.liberzon at gmail.com Sun Feb 1 16:27:10 2009 From: alex.liberzon at gmail.com (Alex Liberzon) Date: Sun, 1 Feb 2009 23:27:10 +0200 Subject: [SciPy-user] Automating Matlab Message-ID: <775f17a80902011327x225b9d7ve28eeff39f3024@mail.gmail.com> +1 On Sun, Feb 1, 2009 at 5:29 PM, wrote: > Send SciPy-user mailing list submissions to > scipy-user at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://projects.scipy.org/mailman/listinfo/scipy-user > or, via email, send a message with subject or body 'help' to > scipy-user-request at scipy.org > > You can reach the person managing the list at > scipy-user-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of SciPy-user digest..." > > > Today's Topics: > > 1. Automating Matlab (Eric Schug) > 2. Re: Automating Matlab (Robert Kern) > 3. Re: Automating Matlab (David Warde-Farley) > 4. shared memory machines (Gideon Simpson) > 5. Re: Automating Matlab (Young, Karl) > 6. Re: Automating Matlab (gsever) > 7. Re: Automating Matlab (Stef Mientki) > 8. Re: shared memory machines (Gael Varoquaux) > 9. Re: shared memory machines (Gideon Simpson) > 10. Re: shared memory machines (Gael Varoquaux) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 31 Jan 2009 20:06:20 -0500 > From: Eric Schug > Subject: [SciPy-user] Automating Matlab > To: scipy-user at scipy.org > Message-ID: <4984F58C.5070605 at gmail.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Is there strong interest in automating matlab to numpy conversion? 
> [...]
URL: From gael.varoquaux at normalesup.org Sun Feb 1 16:35:17 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 1 Feb 2009 22:35:17 +0100 Subject: [SciPy-user] SciPy and GUI In-Reply-To: <49860AA0.1090300@gmx-topmail.de> References: <20090125100556.GA29918@phare.normalesup.org> <49860AA0.1090300@gmx-topmail.de> Message-ID: <20090201213517.GG931@phare.normalesup.org> On Sun, Feb 01, 2009 at 09:48:32PM +0100, Tim Michelsen wrote: > > I would use traits (see > > http://code.enthought.com/projects/traits/documentation.php, and > > http://code.enthought.com/projects/traits/docs/html/tutorials/traits_ui_scientific_app.html > > for documentation and a tutorial) > * Where is the "science" in TraitsUI? (Why do you call it a scientific GUI?) > E.g. I could also build a Wizard directly with wxPython. So why with > Traits? There are two questions here: 1. What is scientific with Traits? 2. Why Traits rather than raw WxPython? Answer to 1): Per se Traits has nothing scientific and can be used for non-scientific applications. Now the people behind Traits do scientific computing. As a results Traits integrates perfectly with numpy, or Mayavi, or the Chaco visualization library. In addition there are plenty of widgets that are very relevant to scientific applications (such as slider bars). Answer to 2): Its a question of using the right abstraction level. WxPython is a library of widgets, events and eventloops. It forces you to think in these terms and not in terms of models and views. Traits makes you think in terms of building a model, making it live with a set of callbacks, and adding a view on top of it. The code is much clearer because it is not riddled with references to 'wx.TextField', and the reactive-programming model is much easier to follow than explicit registering of callbacks (it is interesting to note that Qt has started to move in this direction in Qt4, although the corresponding PyQt code is not terribly Pythonic). Moreover, the event loop is mostly hidden to the user, in Traits. This is possible because of the implicit View/Model separation and the 'message passing' programming style that comes from heavy use of callbacks on attribute modification. As a result, threading issues with event loops (which are a really bitch) are hidden with Traits: Traits, and TraitsUI is mostly thread-safe. In Wx, you will quickly have to understand the fine details of the event loop, which is interesting, but quite off-topic for the scientific programmer. But the really important thing about Traits is that is folds together a set of patterns and best-practices, such as validation, model-view separation, default-initialization, cheap callbacks/the observer pattern. Using Traits puts you on a good path building a good architecture to your application. If you are using the raw toolkit you can still architecture your application right, but you need more experience, more knowledge. It is so easy to mix model and view when manipulating widgets, and not an abstraction to them (I did this this summer without realizing it, and regretted it a lot much later). > * I tried the examples. > What I did not understand is how one can control the buttons below the > Traits objects. > For the first example (section "An object and its representation"), > there are 6 buttons in your image: > Undo, Redo, Revert, OK, Cancel, Help. > When I execute the code I only get OK, Cancel. > May you tell how or where to find information how buttons can be contolled? I can tell you this. 
You need to write a handler for your view: http://code.enthought.com/projects/traits/docs/html/TUIUG/handler.html To give you a short example to do this: from enthought.traits.api import HasTraits, Int from enthought.traits.ui.api import View, Handler class MyHandler(Handler): def closed(self, info, is_ok): if is_ok: print 'User closed the window, and is happy' else: print 'User closed the window, and is unhappy' class Model(HasTraits): a = Int view = View('a', handler=MyHandler(), buttons=['OK', 'Cancel']) model = Model() model.configure_traits(view=view) However, if you are not programming a reactive application, I would try to put as little code as possible in the handler, and put the logics in the code following the 'configure_traits' call. If you need to know if the user pressed 'OK' or 'Cancel', I would capture this and store it in the Handler, but I would put the processing logics later on. That's another case of separating the core logics (called 'model') from the view-related logics. > * Input validation: I remember to have seen a example where a Traits > Window was used to validate (numeric) input. If the user puts in invalid > numers, it would turn read.. Do you know about this? Sure, that's easy: when you specify the traits, you specify its type (in the above example it is an int), if the user enters a wrong type, the text box turns read, and the corresponding attribute is not changed. > * Is there a feature roadmap for traits? > I would like to know where you intend it to develop it to before I > settle on it. Traits 3 was release 6 months ago. It was a major overhaul (although the API didn't change much). Ever since development has been fairly limited. It seems that people are mainly happy with what we have right now. Of course Traits has limitations (including some design issues, nobody is perfect). In addition some specific needs might arise. Remember, there is a company behind Traits. Thus you may see some new developments, or additions. I don't expect a major change anytime soon. > It maybe of interest for many prospective beginners to see example > applications. Why not listing all accessible applications built with > TraitsUI on a website? Most of them are not open source. The open source ones (SciPyLab, Mayavi) are fairly complex, and I would advise a beginner to look into them. > I think that Enthought should put a strong pointer on their website > (http://code.enthought.com/) indicating that actually a lot of > documentation can also be found on the Trac wiki > (https://svn.enthought.com/enthought/wiki). You probably have a point. Documenting a beast like that is not easy, believe me :). HTH, Ga?l From sturla at molden.no Sun Feb 1 19:51:08 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 2 Feb 2009 01:51:08 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> Message-ID: <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> > Has anyone been able to take advantage of shared memory machines with > scipy? How did you do it? I have either used OpenML in C or Fortran 90 extension modules, or multiprocessing in Python. If you have lengthy calculations in extension libraries you can also use Python threads, given that your extension releases the GIL. I have been working on a multiprocessing + NumPy cookbook tutorial. 
For now the unfinished draft is here: http://folk.uio.no/sturlamo/python/multiprocessing-tutorial.pdf It only covers shared memory programming, though. I will also add how to use Queues for message-passing. Many prefer message-passing to shared memory. You avoid performance problems due to 'false sharing', and there is often less resource contention. The difference betwwen threading and multiprocessing should also be better covered. Both are applicable, but in defferent contexts. And with threading you can also choose between 'shared-objects' and message-passing (Queue.Queue). Sturla Molden From sturla at molden.no Sun Feb 1 20:07:22 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 2 Feb 2009 02:07:22 +0100 (CET) Subject: [SciPy-user] Automating Matlab In-Reply-To: <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> References: <4984F58C.5070605@gmail.com> <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> Message-ID: > On Sat, Jan 31, 2009 at 19:06, Eric Schug wrote: >> Is there strong interest in automating matlab to numpy conversion? > > Yes! Please post your code somewhere! For those who are interested, there are two ways of doing this: The most portable is to call the 'Matlab engine', which is a C and Fortran library for automating Matlab. This can be done using f2py or ctypes (wrap libeng.dll and libmx.dll). http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_external/f29148.html&http://www.google.no/search?rlz=1C1GGLS_noNO291NO303&aq=f&sourceid=chrome&ie=UTF-8&q=matlab+engine The other option (Windows only) is to use Matlab as an outproc COM server. This will require pywin32. S.M. From sturla at molden.no Sun Feb 1 20:33:38 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 2 Feb 2009 02:33:38 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: <20090201152940.GD9757@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <20090201095746.GA1099@phare.normalesup.org> <20090201152940.GD9757@phare.normalesup.org> Message-ID: <39cf90a17c7a1ea627e6273ce09e0da2.squirrel@webmail.uio.no> > On Sun, Feb 01, 2009 at 10:03:30AM -0500, Gideon Simpson wrote: > Actually I wrote this code a few months ago, and now that I am looking at > it, I realise that the SharedMemArray should probably be a subclass of > numpy.ndarray, and implement the full array signature. I am not sure if > this is possible or not (ie if it will still be easy to have > multiprocessing share the data between processes or not). ? You can use multiprocessing.Array to allocate shared memory, and use its buffer to create an ndarray with numpy.frombuffer. Basically multiprocessing can use whatever can be pickled. ndarrays copy their contents when pickled, and subclasses seem to inherit this behaviour. Note that this is perfectly okay if you are happy with a message-passing approach to parallel computing. When using mp.Array as shared memory, the object must be passed to multiprocessing.Process on instantiation. This is because of handle inheritance. Therefore you cannot pass an instance of mp.Array through a mp.Queue or mp.Pipe. If we use named share memory (System V IPC) instead of BSD mmap, we can probably get around this. But support for this li lacking in Python and SciPy. 
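To make the first point concrete, a minimal sketch of the RawArray + numpy.frombuffer pattern described above (dtype and length are hard-coded for brevity, and the names are illustrative; the key point is that the buffer is created before, and passed to, mp.Process):

import numpy as np
import multiprocessing as mp
import ctypes

def worker(shared, n):
    # Re-wrap the inherited shared buffer as an ndarray view (no copy).
    a = np.frombuffer(shared, dtype=np.float64, count=n)
    a *= -1

if __name__ == '__main__':
    n = 10
    # Allocate the shared buffer *before* creating the child process,
    # and pass it as an argument to mp.Process (it cannot go through a Queue).
    shared = mp.RawArray(ctypes.c_double, n)
    a = np.frombuffer(shared, dtype=np.float64, count=n)
    a[:] = np.arange(n)
    p = mp.Process(target=worker, args=(shared, n))
    p.start()
    p.join()
    print a   # negated in place by the child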
Sturla Molden From david at ar.media.kyoto-u.ac.jp Sun Feb 1 22:43:27 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 02 Feb 2009 12:43:27 +0900 Subject: [SciPy-user] Automating Matlab In-Reply-To: References: <4984F58C.5070605@gmail.com> <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> Message-ID: <49866BDF.2000809@ar.media.kyoto-u.ac.jp> Sturla Molden wrote: > > For those who are interested, there are two ways of doing this: > I think Eric talked about source code translation, that is .m to .py. > The most portable is to call the 'Matlab engine', which is a C and Fortran > library for automating Matlab. This can be done using f2py or ctypes (wrap > libeng.dll and libmx.dll). > If you are not aware of it, there is already code for it: http://svn.scipy.org/svn/scikits/trunk/mlabwrap/ cheers, David From gael.varoquaux at normalesup.org Mon Feb 2 01:38:33 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 2 Feb 2009 07:38:33 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> Message-ID: <20090202063833.GB9627@phare.normalesup.org> On Mon, Feb 02, 2009 at 01:51:08AM +0100, Sturla Molden wrote: > I have been working on a multiprocessing + NumPy cookbook tutorial. For > now the unfinished draft is here: > http://folk.uio.no/sturlamo/python/multiprocessing-tutorial.pdf Hey, it's a very interested document. It seems that you have quite a lot of insight on these problems. I hadn't realized that a numpy array with the memory alocated as shared memory would be automaticaly shared by multiprocessing (I tried, and to my surprise, it works). So it seems that shmem_as_ndarray (the implementation of which is fairly similar in your code and in mine), and probably probably some array creation helper like empty_shmem, is all we need to use multiprocessing with numpy. Do you concur? I also like a lot your code to figure out the number of processor. It is very useful in a multiprocessing scientific computing package. However my limitation is more often than not memory. Do you have cross platform code to analyse the percent of memory used, and the absolute amount of memory free? I think I should write empty_shmem, to complete hide the multiprocessing Array, delete my useless SharedMemArray class, integrate your number of processor function, and recirculate my code, if it is OK with you. In a few iterations we can propose this for integration in numpy. Cheers, Ga?l From robert.kern at gmail.com Mon Feb 2 01:51:51 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 2 Feb 2009 00:51:51 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090202063833.GB9627@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> Message-ID: <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> On Mon, Feb 2, 2009 at 00:38, Gael Varoquaux wrote: > I think I should write empty_shmem, to complete hide the multiprocessing > Array, delete my useless SharedMemArray class, integrate your number of > processor function, and recirculate my code, if it is OK with you. In a > few iterations we can propose this for integration in numpy. Here's mine, FWIW. 
It goes down directly to the multiprocessing.heap code underlying the Array stuff. On Windows, the objects transfer via pickle while under UNIX, they must be inherited. Windows mmap objects can be pickled while UNIX mmap objects can't. Like Sturla says, we'd have to use named shared memory to get around this on UNIX. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco -------------- next part -------------- A non-text attachment was scrubbed... Name: shared_array.py Type: text/x-python-script Size: 2728 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example.py Type: text/x-python-script Size: 1406 bytes Desc: not available URL: From bgbg.bg at gmail.com Mon Feb 2 03:12:33 2009 From: bgbg.bg at gmail.com (bgbg bg) Date: Mon, 2 Feb 2009 10:12:33 +0200 Subject: [SciPy-user] concatenating arrays of different dimensions Message-ID: <57b9201a0902020012o31524f04ic34cda82c27920b@mail.gmail.com> Hello, Consider an Octave code that concatenates an array and a vector: octave:1> a = [1, 2, 3]; octave:2> b = [ 11, 22, 33; 44, 55 66]; octave:3> c = [a; b] c = 1 2 3 11 22 33 44 55 66 octave:4> How do I emulate this behavior in Python (scipy)? This is what i tried: In [37]: from scipy import array In [38]: a = array([1,2,3]) In [39]: b = array([ [11,22,33], [44, 55, 66]]) In [40]: c = [a, b] In [41]: print c [array([1, 2, 3]), array([[11, 22, 33], [44, 55, 66]])] In [42]: # not good In [43]: c = concatenate((a,b)) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) ValueError: arrays must have same number of dimensions In [44]: c = concatenate((a,b),1) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) ValueError: arrays must have same number of dimensions In [45]: From pgmdevlist at gmail.com Mon Feb 2 03:23:26 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 2 Feb 2009 03:23:26 -0500 Subject: [SciPy-user] concatenating arrays of different dimensions In-Reply-To: <57b9201a0902020012o31524f04ic34cda82c27920b@mail.gmail.com> References: <57b9201a0902020012o31524f04ic34cda82c27920b@mail.gmail.com> Message-ID: <3777FF91-1F96-4179-9599-73A3607591B7@gmail.com> On Feb 2, 2009, at 3:12 AM, bgbg bg wrote: > Hello, > Consider an Octave code that concatenates an array and a vector: > octave:1> a = [1, 2, 3]; > octave:2> b = [ 11, 22, 33; 44, 55 66]; > octave:3> c = [a; b] > > > How do I emulate this behavior in Python (scipy)? 
This is what i > tried: > c = np.vstack((a,b)) For more info: http://www.scipy.org/Numpy_Functions_by_Category#head-ca5d5fe8c131a7ab8f7d7d38796ff84dbf4a2bd0 From ludovic.drouineau at ifremer.fr Mon Feb 2 04:41:50 2009 From: ludovic.drouineau at ifremer.fr (Ludovic DROUINEAU) Date: Mon, 02 Feb 2009 10:41:50 +0100 Subject: [SciPy-user] Problem reading NetCDF File In-Reply-To: <6a17e9ee0901300607l5345ca65oe927f32e48462592@mail.gmail.com> References: <4982F0EB.1000102@ifremer.fr> <6a17e9ee0901300607l5345ca65oe927f32e48462592@mail.gmail.com> Message-ID: <4986BFDE.2030007@ifremer.fr> Scott Sinclair a ?crit : >> 2009/1/30 Ludovic DROUINEAU : >> Hi all, >> >> When I try to open a NetCDF file, I have the following error: >> File "C:\Python25\lib\site-packages\scipy\io\netcdf.py", line 194, in >> _read_values >> count = n * bytes[nc_type-1] >> IndexError: list index out of range >> >> My code is: >> from scipy.io import netcdf >> >> nc = netcdf.netcdf_file ('test.nc', 'r') >> > > I'm not sure if anyone is actively maintaining scipy.io.netcdf (you'll > find out if there is a response to your query). In case there isn't, > you might have better luck with one of the following: > > http://code.google.com/p/netcdf4-python/ > http://matplotlib.sourceforge.net/basemap/doc/html/api/basemap_api.html#mpl_toolkits.basemap.NetCDFFile > http://www.pyngl.ucar.edu/Nio.shtml > http://pypi.python.org/pypi/pupynere/1.0 > > Cheers, > Scott > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > Hi all, I'm quite new to python and I had many problems installing netcdf4-python. I have installed netcdf, hdf5, szlib, zlib And I try to install netcdf4-python with: python setup.py install running install running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src building py_modules sources building extension "netCDF4" sources running build_py running build_ext No module named msvccompiler in numpy.distutils; trying from distutils error: Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py. Anyway, I have installed pupynere (with the tar.gz) With the egg Do I just have to run "easy-install pupinere.egg" ? And when I tried to test it with this line: |>>> from pupynere import NetCDFFile >>> f = nc('example.nc', 'w') | It failed with: NameError: name 'nc' is not defined May be the documentation in http://dealmeida.net/2008/07/14/pupynere is old But when I tried: f = netcdf_file('test.nc', 'r') print f.history time = f.variables['time'][:] lat = f.variables['lat'][:] Everything works fine Regards Ludovic -- Ludovic DROUINEAU NSE/ILE Ifremer Centre de Brest BP 70 - 29280 Plouzan? t?l. 
33 (0)2 98 22 40 94 email Ludovic.Drouineau at ifremer.fr From gael.varoquaux at normalesup.org Mon Feb 2 05:53:16 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 2 Feb 2009 11:53:16 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> Message-ID: <20090202105316.GE11955@phare.normalesup.org> On Mon, Feb 02, 2009 at 12:51:51AM -0600, Robert Kern wrote: > On Mon, Feb 2, 2009 at 00:38, Gael Varoquaux > wrote: > > I think I should write empty_shmem, to complete hide the multiprocessing > > Array, delete my useless SharedMemArray class, integrate your number of > > processor function, and recirculate my code, if it is OK with you. In a > > few iterations we can propose this for integration in numpy. > Here's mine, FWIW. It goes down directly to the multiprocessing.heap > code underlying the Array stuff. On Windows, the objects transfer via > pickle while under UNIX, they must be inherited. Windows mmap objects > can be pickled while UNIX mmap objects can't. Like Sturla says, we'd > have to use named shared memory to get around this on UNIX. Well, you know way more than I do about this. But I fear I am miss-understanding something. Does what you are saying means that an 'empty_shmem', that would create a multiprocessing Array, and expose it as a numpy array, is bound to fail under windows? My experiments seem to show that this works under Linux, and this would be a very simple way of doing shared memory. We could have a numpy.multiprocessing, with all kind of constructors for arrays (empty, zeros, ones, *_like, and maybe 'array') that would be shared between process. Am I out of my mind, and will this fail utterly? Cheers, Ga?l From david_baddeley at yahoo.com.au Mon Feb 2 06:36:11 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Mon, 2 Feb 2009 03:36:11 -0800 (PST) Subject: [SciPy-user] Matlab style line based profiling Message-ID: <365959.22116.qm@web33006.mail.mud.yahoo.com> Hi all, a while ago I drummed up a matlab like profiling module which gives times for individual lines. Since then I've found http://packages.python.org/line_profiler/ which seems to be a bit more refined and should be somewhat faster (mine is pure python, both hook the tracing functions). Where my code does have an advantage is that I've got it producing syntax highlighted html with the most expensive lines highlighted in red with the times in the margin, like the matlab profiler. I've also used a variant of the profile on, profile off, profile report syntax which should be familiar to matlab users. Would like to make it available, but am not sure how much demand there would be for a second line profiler module and whether it wouldn't be more sensible to see if the report generation couldn't be adapted to work with the aforementioned line_profiler module (there might be licensing issues here as my html generation borrows heavily from a GPL licensed python syntax highlighter and I'm not sure if Robert would be keen on having his module tainted). have attached the current code and would welcome any input, Cheers, David Get the world's best email - http://nz.mail.yahoo.com/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: colorize_db_t.py Type: text/x-python Size: 7971 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mProfile.py Type: text/x-python Size: 2710 bytes Desc: not available URL: From robert.kern at gmail.com Mon Feb 2 11:48:44 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 2 Feb 2009 10:48:44 -0600 Subject: [SciPy-user] Matlab style line based profiling In-Reply-To: <365959.22116.qm@web33006.mail.mud.yahoo.com> References: <365959.22116.qm@web33006.mail.mud.yahoo.com> Message-ID: <3d375d730902020848w13cf82aan1c8b024c029294a2@mail.gmail.com> On Mon, Feb 2, 2009 at 05:36, David Baddeley wrote: > Hi all, > > a while ago I drummed up a matlab like profiling module which gives times for individual lines. Since then I've found http://packages.python.org/line_profiler/ which seems to be a bit more refined and should be somewhat faster (mine is pure python, both hook the tracing functions). Where my code does have an advantage is that I've got it producing syntax highlighted html with the most expensive lines highlighted in red with the times in the margin, like the matlab profiler. I've also used a variant of the profile on, profile off, profile report syntax which should be familiar to matlab users. > > Would like to make it available, but am not sure how much demand there would be for a second line profiler module and whether it wouldn't be more sensible to see if the report generation couldn't be adapted to work with the aforementioned line_profiler module (there might be licensing issues here as my html generation borrows heavily from a GPL licensed python syntax highlighter and I'm not sure if Robert would be keen on having his module tainted). Not much! :-) However, there is a version of the colorization code that you used that is more palatably licensed: http://code.activestate.com/recipes/52298/ Here is IPython's version, which uses ANSI escape sequences for terminal color output, which would also be a nice addition to the text output: http://bazaar.launchpad.net/~ipython-dev/ipython/trunk/annotate/head%3A/IPython/PyColorize.py I would be happy to accept contributions in this vein. I was gearing up to release 1.0b2 in the next day or two, but if you would like to get a patch together in the next week, I can wait. Let me know if the default line_profiler workflow doesn't work well for you. If you have suggestions for alternatives, I'm happy to listen. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Mon Feb 2 12:24:18 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 2 Feb 2009 18:24:18 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: <20090202105316.GE11955@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> Message-ID: <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> > On Mon, Feb 02, 2009 at 12:51:51AM -0600, Robert Kern wrote: > Well, you know way more than I do about this. But I fear I am > miss-understanding something. 
Does what you are saying means that an > 'empty_shmem', that would create a multiprocessing Array, and expose it > as a numpy array, is bound to fail under windows? Linux: You can create shared memory using BSD mmap or System V IPC. Multiprocessing does the former. Shared memory created via BSD mmap is "unnamed". Thus, it has to be created in the parent prior prior to the call to fork(), otherwise the child cannot get access to it. That is why mp.Array must be created prior to mp.Process (the latter calls os.fork). Windows: There is no fork(). Shared memory can be named or unnamed. In the second case, it is passed to the spawned process via handle inheritance. This is what multiprocessing does. Again, the consequence is that it must med created prior to the creation of mp.Process. In this case it must actually be passed as an argument to to mp.Process when it is instantiated. However: If we had an ndarray that used named shared memory as buffer, it would be more convinient on Windows and Linux alike. Any process can map this segment if it knows its name. It would only pickle the name of the shared segment (as well as dtype and shape), and could thus be messaged between processes using mp.Queue. Currently we can only send copies of private memory arrays via mp.Queue. > My experiments seem to > show that this works under Linux, and this would be a very simple way of > doing shared memory. We could have a numpy.multiprocessing, with all kind > of constructors for arrays (empty, zeros, ones, *_like, and maybe > 'array') that would be shared between process. > > Am I out of my mind, and will this fail utterly? It will work. But we should use named shared memory (which requires some C or Cython coding), not BSD mmap as mp.Array currently does. Also we must override how ndarrays are pickled. Sturla Molden From robert.kern at gmail.com Mon Feb 2 12:29:06 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 2 Feb 2009 11:29:06 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090202105316.GE11955@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> Message-ID: <3d375d730902020929q68d1b163t5bcc424f52459c8a@mail.gmail.com> On Mon, Feb 2, 2009 at 04:53, Gael Varoquaux wrote: > On Mon, Feb 02, 2009 at 12:51:51AM -0600, Robert Kern wrote: >> On Mon, Feb 2, 2009 at 00:38, Gael Varoquaux >> wrote: >> > I think I should write empty_shmem, to complete hide the multiprocessing >> > Array, delete my useless SharedMemArray class, integrate your number of >> > processor function, and recirculate my code, if it is OK with you. In a >> > few iterations we can propose this for integration in numpy. > >> Here's mine, FWIW. It goes down directly to the multiprocessing.heap >> code underlying the Array stuff. On Windows, the objects transfer via >> pickle while under UNIX, they must be inherited. Windows mmap objects >> can be pickled while UNIX mmap objects can't. Like Sturla says, we'd >> have to use named shared memory to get around this on UNIX. > > Well, you know way more than I do about this. But I fear I am > miss-understanding something. Does what you are saying means that an > 'empty_shmem', that would create a multiprocessing Array, and expose it > as a numpy array, is bound to fail under windows? 
[These first two paragraphs are basically what Sturla says in his response. He's faster on the Send button than I am. :-)] Almost. On Windows, the subprocesses inherit nothing. All objects must be passed through pickles. Passing the Array works, but passing the ndarray won't because the ndarray pickler will pass-by-value. My approach registers a new pickler for ndarrays that recognizes my shared-memory ndarrays and makes a pickle that just references the shared memory. You could replicate that using Array as the memory allocator, but I think my approach which uses the "raw" allocators underneath Array is more straightforward. On UNIX, Arrays and the stuff underneath it don't pickle because the underlying mmap is not named. We'd need to wrap the appropriate APIs in order to do this. If you can arrange your program such that the arrays get inherited, you're fine because you don't need to pickle anything, but you can't pass these ndarrays through Queues and such. I've tried using the shm module, which does wrap those APIs, but I've never been able to get the memory to actually share unless if the subprocess inherits it. http://nikitathespider.com/python/shm/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Mon Feb 2 12:41:32 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 2 Feb 2009 18:41:32 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: <3d375d730902020929q68d1b163t5bcc424f52459c8a@mail.gmail.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <3d375d730902020929q68d1b163t5bcc424f52459c8a@mail.gmail.com> Message-ID: > On Mon, Feb 2, 2009 at 04:53, Gael Varoquaux > Almost. On Windows, the subprocesses inherit nothing. All objects must > be passed through pickles. Passing the Array works, but passing the > ndarray won't because the ndarray pickler will pass-by-value. Almost. A subprocess can be specified to inherit its parent's handles. The parent must then pass the value of the handle to the subprocess, e.g. via the stdin pipe. This is how mp.Array works on Windows. > On UNIX, Arrays and the stuff underneath it don't pickle because the > underlying mmap is not named. It is the same on Windows. Named shared memory is the cure in both cases. The advantage of named shared memory is that it can be created after the subprocesses are spawned/forked. Sturla Molden From timmichelsen at gmx-topmail.de Mon Feb 2 04:04:28 2009 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Mon, 02 Feb 2009 10:04:28 +0100 Subject: [SciPy-user] SciPy and GUI In-Reply-To: <20090201213517.GG931@phare.normalesup.org> References: <20090125100556.GA29918@phare.normalesup.org> <49860AA0.1090300@gmx-topmail.de> <20090201213517.GG931@phare.normalesup.org> Message-ID: Hello! > Answer to 2): > > Its a question of using the right abstraction level. WxPython is a [...] > thread-safe. In Wx, you will quickly have to understand the fine > details of the event loop, which is interesting, but quite off-topic > for the scientific programmer. [...] 
> But the really important thing about Traits is that is folds together > a set of patterns and best-practices, such as validation, model-view > separation, default-initialization, cheap callbacks/the observer > pattern. Using Traits puts you on a good path building a good > architecture to your application. If you are using the raw toolkit Hey, these well formulated explanations really convinced me to look more closely into ETS and GUI building! > However, if you are not programming a reactive application, I would try > to put as little code as possible in the handler, and put the logics in > the code following the 'configure_traits' call. If you need to know if > the user pressed 'OK' or 'Cancel', I would capture this and store it in > the Handler, but I would put the processing logics later on. That's > another case of separating the core logics (called 'model') from the > view-related logics. This is still something I have to discover closer. I hop to understand this once I digg deeper. > Sure, that's easy: when you specify the traits, you specify its type (in > the above example it is an int), if the user enters a wrong type, the > text box turns read, and the corresponding attribute is not changed. And can there also appear a message like: Please enter only data of "type"? >> It maybe of interest for many prospective beginners to see example >> applications. Why not listing all accessible applications built with >> TraitsUI on a website? > > Most of them are not open source. The open source ones (SciPyLab, Mayavi) > are fairly complex, and I would advise a beginner to look into them. > >> I think that Enthought should put a strong pointer on their website >> (http://code.enthought.com/) indicating that actually a lot of >> documentation can also be found on the Trac wiki >> (https://svn.enthought.com/enthought/wiki). > You probably have a point. Documenting a beast like that is not easy, > believe me :). I looked at all examples and demos in the ETS folder within the Python XY documentation folder. There are so many. I really think that the spread if ETS could benefit from a better advertisement of these demos. Look at the matplotlib gallery. The new user could quickly imagine why he/she should ponder about using the library. Thanks again & kind regards, Timmie From timmichelsen at gmx-topmail.de Mon Feb 2 13:23:29 2009 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Mon, 02 Feb 2009 19:23:29 +0100 Subject: [SciPy-user] timeseries: logging of defective time series Message-ID: Hello, I have a question on how to effectively log invalid timeseries. Such series may return may have one or more of the following properties: * duplicate dates (ts.time_series.has_duplicated_dates() ) * missing dates (ts.time_series.has_missing_dates() ) * masked values (ts.time_series.mask) The functions above in brackets return either "True" or "False" or the boolean mask array. But would be interested in the dates that my series are missing or the data points that are duplicated or masked (from input). May you give me an example how to retrieve these? I put some demo code with comments below. Example use cases: Someone sends you a data file from a datalogger or sensor recording device. * Due to battery problems, the logger did stop recording for some time (=> missing dates). It is important for inspection of the device setup to know when this happend or how long that period lasted. * The data file may have been reformatted or treated before sent to you. 
Due to this processing, some timestamps have been saved twice or more (=> duplicated dates). For a correction, one would like to know where to search in the input files.
* The input file has already NoData markers. They were used to mask data during loading in python (=> masked data). For error analysis the date and length of the masked period are important.

I would appreciate a pointer here.

Regards,
Timmie

#### demo code:
### using the examples from http://pytseries.sourceforge.net/core/TimeSeries.html
import numpy as np
import scikits.timeseries as ts

mlist_1 = ['2005-%02i' % i for i in range(1,10)]
mlist_1 += ['2006-%02i' % i for i in range(2,13)]
mdata_1 = np.arange(len(mlist_1))
mser_1 = ts.time_series(mdata_1, mlist_1, freq='M')

mser_1.has_missing_dates()
#<55> True
### how do I retrieve a new series which contains only the dates that are missing?

## a series with masked
mser_1_fill = mser_1.fill_missing_dates()
mser_1_fill.mask
# I tried "mser_1_fill.mask" but it returns the masked array. The timedate information is lost here.
### how do I retrieve a new series which contains only the dates that are masked?
### Basically it seems that I am looking for the opposite of mser_1_fill.compressed()

mser_1_annual = ts.time_series(mdata_1, mlist_1, freq='A')
mser_daily = mser_1.asfreq('D')
### how do I retrieve a new series which contains only the dates that are duplicated?
mser_daily.has_duplicated_dates()
#<53> True

From pgmdevlist at gmail.com Mon Feb 2 14:02:57 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 2 Feb 2009 14:02:57 -0500 Subject: [SciPy-user] timeseries: logging of defective time series In-Reply-To: References: Message-ID: <0946AD7A-E329-420B-BDD2-4550D1835783@gmail.com>

Timmie,
Remember that the mask is an array of booleans and can be used for indexing. I will also assume that your data is 1D.

* To find the dates corresponding to the missing values in your series:
>>> series.dates[series.mask]

* To find the missing dates, use fill_missing_dates first (to make sure the dates are continuous) and get the missing dates by
>>> series.dates[series.mask]
With your example:
>>> mser_1_filled = ts.fill_missing_dates(mser_1)
>>> missing_dates = mser_1_filled.dates[mser_1_filled.mask]
Note that if your initial `series` has already some missing dates, you'll pick those ones up as well. You should then check whether you have missing values in the first place, find the corresponding dates, fill the dates, recheck the missing ones, and take the difference between the two sets.

* To find duplicated dates: Things get a tad more complicated:
1. make sure that your `series` is sorted chronologically first
2. construct the following array:
>>> d = series.dates
>>> dupcheck = np.r_[False, (d[1:]==d[:-1])]
dupcheck is an ndarray of booleans with True values where the corresponding date is the same as the previous one. Note that the first occurrence of a duplicated date is flagged as False.

Gimme a few days to whip up a more usable function that would reproduce that (I think I already have something along those lines somewhere on my HD).
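A rough helper that pulls these steps together might look like the sketch below. It is illustrative only (not from the original thread): it assumes a 1-D, chronologically sorted scikits.timeseries series, it uses np.ma.getmaskarray so that a completely unmasked series is handled as well, and the function and variable names are made up for the example.

import numpy as np
import scikits.timeseries as ts

def report_defects(series):
    # dates whose values are masked in the input
    mask = np.ma.getmaskarray(series)           # full boolean mask, even when nothing is masked
    masked_dates = series.dates[mask]

    # missing dates: fill the series first, then look at the mask of the filled series
    filled = ts.fill_missing_dates(series)
    filled_mask = np.ma.getmaskarray(filled)
    missing_dates = filled.dates[filled_mask]   # note: also picks up the originally masked dates

    # duplicated dates: True where a date equals its predecessor
    d = series.dates
    dupcheck = np.r_[False, (d[1:] == d[:-1])]
    duplicated_dates = d[dupcheck]

    return masked_dates, missing_dates, duplicated_dates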
> > Such series may return may have one or more of the following > properties: > > * duplicate dates (ts.time_series.has_duplicated_dates() ) > * missing dates (ts.time_series.has_missing_dates() ) > * masked values (ts.time_series.mask) has_duplicated_dates and has_missing_dates were not really meant to be used directly, but more internally to keep track of some info on the distribution of dates From gael.varoquaux at normalesup.org Mon Feb 2 14:30:27 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 2 Feb 2009 20:30:27 +0100 Subject: [SciPy-user] SciPy and GUI In-Reply-To: References: <20090125100556.GA29918@phare.normalesup.org> <49860AA0.1090300@gmx-topmail.de> <20090201213517.GG931@phare.normalesup.org> Message-ID: <20090202193027.GB7568@phare.normalesup.org> On Mon, Feb 02, 2009 at 10:04:28AM +0100, Tim Michelsen wrote: > Hello! > > Answer to 2): > > Its a question of using the right abstraction level. WxPython is a > [...] > > thread-safe. In Wx, you will quickly have to understand the fine > > details of the event loop, which is interesting, but quite off-topic > > for the scientific programmer. > [...] > > But the really important thing about Traits is that is folds together > > a set of patterns and best-practices, such as validation, model-view > > separation, default-initialization, cheap callbacks/the observer > > pattern. Using Traits puts you on a good path building a good > > architecture to your application. If you are using the raw toolkit > Hey, these well formulated explanations really convinced me to look more > closely into ETS and GUI building! Well thanks. I actually find that these problems are hard to understand and to explain and that I do not have enough insight on them, and thus my explanations are confused and go into circles. But thanks for the encouragement. Ga?l From pav at iki.fi Mon Feb 2 14:34:07 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 2 Feb 2009 19:34:07 +0000 (UTC) Subject: [SciPy-user] concatenating arrays of different dimensions References: <57b9201a0902020012o31524f04ic34cda82c27920b@mail.gmail.com> <3777FF91-1F96-4179-9599-73A3607591B7@gmail.com> Message-ID: Mon, 02 Feb 2009 03:23:26 -0500, Pierre GM wrote: > On Feb 2, 2009, at 3:12 AM, bgbg bg wrote: > >> Hello, >> Consider an Octave code that concatenates an array and a vector: >> octave:1> a = [1, 2, 3]; >> octave:2> b = [ 11, 22, 33; 44, 55 66]; octave:3> c = [a; b] >> >> >> How do I emulate this behavior in Python (scipy)? This is what i tried: >> >> > c = np.vstack((a,b)) > > For more info: > http://www.scipy.org/Numpy_Functions_by_Category#head- ca5d5fe8c131a7ab8f7d7d38796ff84dbf4a2bd0 And http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html#joining-arrays -- Pauli Virtanen From mhhohn at gmail.com Mon Feb 2 15:40:03 2009 From: mhhohn at gmail.com (michael hohn) Date: Mon, 2 Feb 2009 20:40:03 +0000 (UTC) Subject: [SciPy-user] SciPy and GUI References: Message-ID: Lorenzo Isella gmail.com> writes: > > Dear All, > I hope this is not too off-topic. Given you Python code, relying on > SciPy for number-crunching, which tools would you use to create a GUI > in order to allow someone else to use it, without his knowing much (or > anything) about scipy and programming?I know Python is great for this, > but I do not know of anything specific. > Cheers > > Lorenzo > If you are interested in a general-purpose worksheet-style interface to Python, you can have a look at the l3gui at http://l3lang.sourceforge.net, especially example 2.3. 
Cheers, Michael From dwf at cs.toronto.edu Tue Feb 3 02:12:32 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 3 Feb 2009 02:12:32 -0500 Subject: [SciPy-user] "clustergrams"/hierarchical clustering heat maps Message-ID: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> Hi all, I was recently asked to cluster some data and I know from experience that people use these heat maps to look for patterns in multivariate data, often with a dendrogram off to the side. This involves sorting the rows and columns in a certain fashion, the details of which are somewhat fuzzy to me (and, truthfully, I'm happy with it staying that way for now). I notice that dendrogram plotting is available in scipy.cluster.hierarchy, and was wondering if the something for producing the associated sorted heat maps is available anywhere (within SciPy or otherwise). Many thanks, David From ondrej at certik.cz Tue Feb 3 04:08:51 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Tue, 3 Feb 2009 01:08:51 -0800 Subject: [SciPy-user] Matlab style line based profiling In-Reply-To: <3d375d730902020848w13cf82aan1c8b024c029294a2@mail.gmail.com> References: <365959.22116.qm@web33006.mail.mud.yahoo.com> <3d375d730902020848w13cf82aan1c8b024c029294a2@mail.gmail.com> Message-ID: <85b5c3130902030108l5c3d6d4cm7fe62da80e5d87f9@mail.gmail.com> On Mon, Feb 2, 2009 at 8:48 AM, Robert Kern wrote: > On Mon, Feb 2, 2009 at 05:36, David Baddeley > wrote: >> Hi all, >> >> a while ago I drummed up a matlab like profiling module which gives times for individual lines. Since then I've found http://packages.python.org/line_profiler/ which seems to be a bit more refined and should be somewhat faster (mine is pure python, both hook the tracing functions). Where my code does have an advantage is that I've got it producing syntax highlighted html with the most expensive lines highlighted in red with the times in the margin, like the matlab profiler. I've also used a variant of the profile on, profile off, profile report syntax which should be familiar to matlab users. >> >> Would like to make it available, but am not sure how much demand there would be for a second line profiler module and whether it wouldn't be more sensible to see if the report generation couldn't be adapted to work with the aforementioned line_profiler module (there might be licensing issues here as my html generation borrows heavily from a GPL licensed python syntax highlighter and I'm not sure if Robert would be keen on having his module tainted). > > Not much! :-) > > However, there is a version of the colorization code that you used > that is more palatably licensed: > > http://code.activestate.com/recipes/52298/ > > Here is IPython's version, which uses ANSI escape sequences for > terminal color output, which would also be a nice addition to the text > output: > > http://bazaar.launchpad.net/~ipython-dev/ipython/trunk/annotate/head%3A/IPython/PyColorize.py > > I would be happy to accept contributions in this vein. I was gearing > up to release 1.0b2 in the next day or two, but if you would like to > get a patch together in the next week, I can wait. Great, I am glad you are maintaining it. Your line profiler is very useful. 
Ondrej From cimrman3 at ntc.zcu.cz Tue Feb 3 04:46:03 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 03 Feb 2009 10:46:03 +0100 Subject: [SciPy-user] SciPy and GUI In-Reply-To: References: Message-ID: <4988125B.80201@ntc.zcu.cz> michael hohn wrote: > If you are interested in a general-purpose worksheet-style interface to > Python, you can have a look at the l3gui at > http://l3lang.sourceforge.net, especially example 2.3. Very nice! r. From zachary.pincus at yale.edu Tue Feb 3 09:43:26 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 3 Feb 2009 09:43:26 -0500 Subject: [SciPy-user] "clustergrams"/hierarchical clustering heat maps In-Reply-To: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> References: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> Message-ID: Hi David, I don't know about making heat-maps in Python, but what I recently used for the task was the combination of "Cluster 3" (an update of Mike Eisen's original hierarchical-clustering-for-microarrays tool) to do the clustering, and "Java TreeView" to draw the heatmap/dendrogram. Cluster 3 is a bit annoying to one used to scripting analyses (lots of GUI button-pressing), but there's also a python library. Or you could just scrutinize the output format (it barfs out a few text files) and use your own clustering tools. TreeView then accepts these text files and lets you manipulate the heatmap / dendrograms (e.g. flipping nodes to get visually better results). You can then export to PS or other formats. (The PS output is pretty clean, so you can edit in Illustrator or whatnot easily.) Zach On Feb 3, 2009, at 2:12 AM, David Warde-Farley wrote: > Hi all, > > I was recently asked to cluster some data and I know from experience > that people use these heat maps to look for patterns in multivariate > data, often with a dendrogram off to the side. This involves sorting > the rows and columns in a certain fashion, the details of which are > somewhat fuzzy to me (and, truthfully, I'm happy with it staying that > way for now). > > I notice that dendrogram plotting is available in > scipy.cluster.hierarchy, and was wondering if the something for > producing the associated sorted heat maps is available anywhere > (within SciPy or otherwise). > > Many thanks, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From gaedol at gmail.com Tue Feb 3 10:50:08 2009 From: gaedol at gmail.com (Marco) Date: Tue, 3 Feb 2009 16:50:08 +0100 Subject: [SciPy-user] FFT Filtering Message-ID: Hi list! Has anyone pointers to "applying a low pass filter to a signal's FFT" in scipy (and not only...)? Thanks, marco -- Quando sei una human pignata e la pazzo jacket si ? accorciata e non ti puoi liberare dai colpi di legno e di bastone dai petardi sul groppone Vinicio Capossela From wizzard028wise at gmail.com Tue Feb 3 19:23:53 2009 From: wizzard028wise at gmail.com (Dorian) Date: Wed, 4 Feb 2009 01:23:53 +0100 Subject: [SciPy-user] FFT Filtering In-Reply-To: References: Message-ID: <674a602a0902031623j1dd653b4jd77f3f990f3d12c5@mail.gmail.com> Could you rephrase your question ? Cheers 2009/2/3 Marco > Hi list! > > Has anyone pointers to "applying a low pass filter to a signal's FFT" > in scipy (and not only...)? > > Thanks, > > marco > > -- > > Quando sei una human pignata > e la pazzo jacket si ? 
accorciata > e non ti puoi liberare > dai colpi di legno e di bastone > dai petardi sul groppone > > Vinicio Capossela > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arokem at berkeley.edu Tue Feb 3 19:30:52 2009 From: arokem at berkeley.edu (Ariel Rokem) Date: Tue, 3 Feb 2009 16:30:52 -0800 Subject: [SciPy-user] FFT Filtering In-Reply-To: <674a602a0902031623j1dd653b4jd77f3f990f3d12c5@mail.gmail.com> References: <674a602a0902031623j1dd653b4jd77f3f990f3d12c5@mail.gmail.com> Message-ID: <79E709DF-013D-4863-A3B2-CF184E45B79B@berkeley.edu> Chapters 12 and 13 here: http://www.nrbook.com/nr3/ are a good place to start. On Feb 3, 2009, at 4:23 PM, Dorian wrote: > Could you rephrase your question ? > > Cheers > > 2009/2/3 Marco > Hi list! > > Has anyone pointers to "applying a low pass filter to a signal's FFT" > in scipy (and not only...)? > > Thanks, > > marco > > -- > > Quando sei una human pignata > e la pazzo jacket si ? accorciata > e non ti puoi liberare > dai colpi di legno e di bastone > dai petardi sul groppone > > Vinicio Capossela > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Wed Feb 4 00:38:18 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 4 Feb 2009 07:38:18 +0200 Subject: [SciPy-user] FFT Filtering In-Reply-To: References: Message-ID: <6a17e9ee0902032138y390d7025mb23b5bc8b30b41cb@mail.gmail.com> > 2009/2/3 Marco : > Has anyone pointers to "applying a low pass filter to a signal's FFT" > in scipy (and not only...)? The suggestion to read the relevant sections in Numerical Recipes is a good start. After that, you can probably find the tools you need in scipy.signal http://docs.scipy.org/doc/scipy/reference/tutorial/signal.html http://docs.scipy.org/doc/scipy/reference/signal.html also see this cookbook entry http://www.scipy.org/Cookbook/SavitzkyGolay Cheers, Scott From starsareblueandfaraway at gmail.com Wed Feb 4 11:31:27 2009 From: starsareblueandfaraway at gmail.com (Roy H. Han) Date: Wed, 4 Feb 2009 11:31:27 -0500 Subject: [SciPy-user] Mysterious kmeans() error Message-ID: <6a5569ec0902040831i39e6e683y6ab8f97d8363b606@mail.gmail.com> Has anyone seen this error before? I have no idea what it means. I'm using version 0.6.0 packaged for Fedora. 
I'm getting this error using the kmeans2() implementation in scipy.cluster.vq File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", line 55, in grapeCluster assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, iter=iterationCountPerBurst)[1] File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line 563, in kmeans2 clusters = init(data, k) File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line 469, in _krandinit x = N.dot(x, N.linalg.cholesky(cov).T) + mu File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", line 418, in cholesky Cholesky decomposition cannot be computed' numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed Thanks, RHH From starsareblueandfaraway at gmail.com Wed Feb 4 12:28:35 2009 From: starsareblueandfaraway at gmail.com (Roy H. Han) Date: Wed, 4 Feb 2009 12:28:35 -0500 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <6a5569ec0902040831i39e6e683y6ab8f97d8363b606@mail.gmail.com> References: <6a5569ec0902040831i39e6e683y6ab8f97d8363b606@mail.gmail.com> Message-ID: <6a5569ec0902040928l3e48680co404c9861ee6e067b@mail.gmail.com> As a side comment, if I use Pycluster, then the clustering proceeds without error. On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han wrote: > Has anyone seen this error before? I have no idea what it means. I'm > using version 0.6.0 packaged for Fedora. > I'm getting this error using the kmeans2() implementation in scipy.cluster.vq > > > File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", > line 55, in grapeCluster > assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, > iter=iterationCountPerBurst)[1] > File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line > 563, in kmeans2 > clusters = init(data, k) > File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line > 469, in _krandinit > x = N.dot(x, N.linalg.cholesky(cov).T) + mu > File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", > line 418, in cholesky > Cholesky decomposition cannot be computed' > numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - > Cholesky decomposition cannot be computed > > > Thanks, > RHH > From josef.pktd at gmail.com Wed Feb 4 12:44:27 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 4 Feb 2009 12:44:27 -0500 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <6a5569ec0902040928l3e48680co404c9861ee6e067b@mail.gmail.com> References: <6a5569ec0902040831i39e6e683y6ab8f97d8363b606@mail.gmail.com> <6a5569ec0902040928l3e48680co404c9861ee6e067b@mail.gmail.com> Message-ID: <1cd32cbb0902040944m306bbf0bia357c01d0f97fe6d@mail.gmail.com> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han wrote: > As a side comment, if I use Pycluster, then the clustering proceeds > without error. > > On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han > wrote: >> Has anyone seen this error before? I have no idea what it means. I'm >> using version 0.6.0 packaged for Fedora. 
>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq >> >> >> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", >> line 55, in grapeCluster >> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, >> iter=iterationCountPerBurst)[1] >> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >> 563, in kmeans2 >> clusters = init(data, k) >> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >> 469, in _krandinit >> x = N.dot(x, N.linalg.cholesky(cov).T) + mu >> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", >> line 418, in cholesky >> Cholesky decomposition cannot be computed' >> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - >> Cholesky decomposition cannot be computed This is just a general answer, I never used scipy.cluster The error message means that the covariance matrix of your np.cov(data) is not positive definite. Check your data, whether there is any linear dependence, eg. look at eigenvalues of np.cov(data). If that's not the source of the error, then a cluster expert is needed. Josef >> >> >> Thanks, >> RHH >> > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From grh at mur.at Wed Feb 4 15:04:42 2009 From: grh at mur.at (Georg Holzmann) Date: Wed, 04 Feb 2009 21:04:42 +0100 Subject: [SciPy-user] audiolab Problem Message-ID: <4989F4DA.7050509@mur.at> Hallo David! I have a problem with using the scikits.audiolab package (from svn) on latest ubuntu (python 2.5.2, numpy 1.1.1). When I want to import the module I get the following error: import scikits.audiolab File "/usr/lib/python2.5/site-packages/scikits/audiolab/__init__.py", line 37, in from numpy.testing import Tester ImportError: cannot import name Tester At least in my numpy version there is no Tester class in numpy.testing. Ok, so I just removed this line in the __init__ file ... However, I wanted to run the tests, because you wrote on the audiolab website, that there can be a nasty bug with integers which corrupts the audio ... But when I try to run the tests I get the next errors: Traceback (most recent call last): File "test_matapi.py", line 22, in class test_audiolab(TestCase): NameError: name 'TestCase' is not defined And there are a few more ... Do you have a clue whats the problem ? In my numpy version the test cases have a different name ... Or do I need to run these tests on latest Ubuntu - is the integer bug still a problem ? Thanks for any hints ! LG Georg From stefan at sun.ac.za Wed Feb 4 15:42:10 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 4 Feb 2009 22:42:10 +0200 Subject: [SciPy-user] audiolab Problem In-Reply-To: <4989F4DA.7050509@mur.at> References: <4989F4DA.7050509@mur.at> Message-ID: <9457e7c80902041242n6b2e85bat8cf0c9cdd5da1788@mail.gmail.com> Hi George 2009/2/4 Georg Holzmann : > I have a problem with using the scikits.audiolab package (from svn) on > latest ubuntu (python 2.5.2, numpy 1.1.1). > > When I want to import the module I get the following error: > import scikits.audiolab > File "/usr/lib/python2.5/site-packages/scikits/audiolab/__init__.py", > line 37, in > from numpy.testing import Tester > ImportError: cannot import name Tester > > At least in my numpy version there is no Tester class in numpy.testing. > Ok, so I just removed this line in the __init__ file ... "Tester" has been added after 1.1. 
Your workaround is OK. > However, I wanted to run the tests, because you wrote on the audiolab > website, that there can be a nasty bug with integers which corrupts the > audio ... > But when I try to run the tests I get the next errors: > Traceback (most recent call last): > File "test_matapi.py", line 22, in > class test_audiolab(TestCase): > NameError: name 'TestCase' is not defined > > And there are a few more ... Just replace TestCase with unittest.TestCase (you'll have to import unittest, too). You should then be able to run the tests with: nosetests scikits.audiolab Cheers St?fan From grh at mur.at Wed Feb 4 16:19:04 2009 From: grh at mur.at (Georg Holzmann) Date: Wed, 04 Feb 2009 22:19:04 +0100 Subject: [SciPy-user] audiolab Problem In-Reply-To: <9457e7c80902041242n6b2e85bat8cf0c9cdd5da1788@mail.gmail.com> References: <4989F4DA.7050509@mur.at> <9457e7c80902041242n6b2e85bat8cf0c9cdd5da1788@mail.gmail.com> Message-ID: <498A0648.6000409@mur.at> Hallo! >> At least in my numpy version there is no Tester class in numpy.testing. >> Ok, so I just removed this line in the __init__ file ... > > "Tester" has been added after 1.1. Your workaround is OK. Hm, I see ... >> However, I wanted to run the tests, because you wrote on the audiolab >> website, that there can be a nasty bug with integers which corrupts the >> audio ... >> But when I try to run the tests I get the next errors: >> Traceback (most recent call last): >> File "test_matapi.py", line 22, in >> class test_audiolab(TestCase): >> NameError: name 'TestCase' is not defined >> >> And there are a few more ... > > Just replace TestCase with unittest.TestCase (you'll have to import > unittest, too). Yes thanks, but there are more problems - not only TestCase ... Are there somewhere older packages of audiolab, which compile on standard systems ? (latest ubuntu, so I think this is quite up to date) I used this package a few months ago without problems ... Thanks, LG Georg From david at ar.media.kyoto-u.ac.jp Thu Feb 5 05:30:02 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 05 Feb 2009 19:30:02 +0900 Subject: [SciPy-user] audiolab Problem In-Reply-To: <4989F4DA.7050509@mur.at> References: <4989F4DA.7050509@mur.at> Message-ID: <498ABFAA.8090508@ar.media.kyoto-u.ac.jp> Georg Holzmann wrote: > Hallo David! > > I have a problem with using the scikits.audiolab package (from svn) on > latest ubuntu (python 2.5.2, numpy 1.1.1). > Hi Georg, Audiolab requires numpy 1.2 or above. I think we should push numpy 1.2 to Ubuntu 9.04 while there is still time - 1.2 was a significant release. > When I want to import the module I get the following error: > import scikits.audiolab > File "/usr/lib/python2.5/site-packages/scikits/audiolab/__init__.py", > line 37, in > from numpy.testing import Tester > ImportError: cannot import name Tester > This has nothing to do with the problem, but may I suggest not to install anything from sources into /usr/ ? You should install either in /usr/local or somewhere else, because if audiolab becomes packaged by Ubuntu, you will mess up things. It is a good idea to never install anything from sources in /usr, > At least in my numpy version there is no Tester class in numpy.testing. > Ok, so I just removed this line in the __init__ file ... > > However, I wanted to run the tests, because you wrote on the audiolab > website, that there can be a nasty bug with integers which corrupts the > audio ... > Yes, there was a ctypes bug in old Ubuntu. 
The new audiolab version should avoid this problem altogether. > Do you have a clue whats the problem ? > In my numpy version the test cases have a different name ... > Or do I need to run these tests on latest Ubuntu - is the integer bug > still a problem ? > Not on recent Ubuntu releases, no. David From bloodearnest at gmail.com Thu Feb 5 06:37:47 2009 From: bloodearnest at gmail.com (Wavy Davy) Date: Thu, 5 Feb 2009 11:37:47 +0000 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu Message-ID: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> Hi all I am using the mannwhitneyu in the stats module, and I was looking the code and I see this notice in the docstring. "Use only when the n in each condition is < 20 and you have 2 independent samples of ranks. " Am I reading it correctly that this test should only be used with sample sizes less than 20? I am not a statistican, more a python coder. I have been pointed and this test as a more robust version of the t-test, so forgive my ignorance. Any help would be much appreciated. -- Simon From alexander.borghgraef.rma at gmail.com Thu Feb 5 09:09:28 2009 From: alexander.borghgraef.rma at gmail.com (Alexander Borghgraef) Date: Thu, 5 Feb 2009 15:09:28 +0100 Subject: [SciPy-user] Vector field filtering Message-ID: <9e8c52a20902050609pe71a53fl970b1123d3a5c374@mail.gmail.com> Hi all, I'm trying to implement a mean shift filtering algorithm, and for that I need to apply a sliding window to a vector field or image, possibly with as output a vector field of different dimensions. So for example I could be filtering an RGB image of shape (3, height, width) and returning a (x, y)+RGB vectorfield containing the mean shift vector as wel as the color date, resulting in a shape (5,height, width). For solving this, I looked into scipy.ndimage.generic_filter, but that doesn't seem to do the trick. For one it can't handle input and output being of different shape (easy to circumvent by adding dummy (x,y) to the input), and more importantly, it doesn't feature an axis option, meaning that it shifts the filter footprint not only across the width and height, but also across the vector dimension, which is not what I need. So generic_filter is out, any suggestions for an alternative ready-made numpy solution? Anything in scikits? Or should I implement my own sliding window for vectorfields? -- Alex Borghgraef From josef.pktd at gmail.com Thu Feb 5 09:33:05 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 09:33:05 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> Message-ID: <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> On Thu, Feb 5, 2009 at 6:37 AM, Wavy Davy wrote: > Hi all > > I am using the mannwhitneyu in the stats module, and I was looking the > code and I see this notice in the docstring. > > "Use only when the n in each condition is < 20 and you have 2 > independent samples of ranks. " > > Am I reading it correctly that this test should only be used with > sample sizes less than 20? > > I am not a statistican, more a python coder. I have been pointed and > this test as a more robust version of the t-test, so forgive my > ignorance. > > Any help would be much appreciated. 
> > -- > Simon

I briefly looked at the test, the implementation of the test statistic is mostly as described in http://en.wikipedia.org/wiki/Mann-Whitney_U_test
It seems the test statistic is defined with the opposite sign from the definition in wikipedia.
The doc string statement "Use only when the n in each condition is < 20", I think should be >20, since the pvalue is based on the asymptotic distribution, which is only correct in larger samples.
I didn't see any unit tests for this test, but I will try to verify the results later today.
wilcoxon is a similar test for paired instead of independent samples, and there the recommendation in the docstring is for N>20.

Josef

From sturla at molden.no Thu Feb 5 09:32:59 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 15:32:59 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> Message-ID: <498AF89B.6070404@molden.no>

On 2/5/2009 12:37 PM, Wavy Davy wrote:
> I am using the mannwhitneyu in the stats module, and I was looking the
> code and I see this notice in the docstring.
>
> "Use only when the n in each condition is < 20 and you have 2
> independent samples of ranks. "
>
> Am I reading it correctly that this test should only be used with
> sample sizes less than 20?

First of all, the Mann-Whitney U-test should NEVER be used. It has assumptions that are mathematically problematic, known as the "Behrens-Fisher problem". What you probably want to use is the "Wilcoxon rank-sum test". Despite common belief, Mann-Whitney U and Wilcoxon rank-sum are not the same test. The latter assumes equal variance, the former does not. The Mann-Whitney U has even been shown to fail when distributions have unequal variance (Journal of Experimental Education, Vol. 60, 1992), so its justification over the Wilcoxon rank-sum test is questionable. Wikipedia says the Wilcoxon rank-sum test assumes equal sample sizes; this is not correct.

I would vote for the immediate removal of the Mann-Whitney U-test from SciPy. The only thing it should do is raise an exception and instruct the user to apply a t-test or Wilcoxon rank-sum test instead. As a side note, if you request a Mann-Whitney test in MINITAB, you actually get a Wilcoxon rank-sum test instead.

Then for your question: If N > 20, you can just as well use a t-test. Its assumptions will be asymptotically valid due to the central limit theorem, even though the data are not normally distributed. If you are worried about outliers, as opposed to systematic deviation from normality, use the Wilcoxon rank-sum test instead. When the data is transformed to rank scale and the two sample sizes are M and N respectively, the Mann-Whitney U-statistic has O(N*M) complexity whereas the Wilcoxon rank-sum statistic only has O(N+M) complexity. O(N*M) behaviour makes the Mann-Whitney U-statistic intractable for large samples.
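A minimal illustration of the two alternatives mentioned above (a t-test vs. a rank-based test); this sketch is not part of the original message, the data are made up, and it assumes scipy.stats.ttest_ind and scipy.stats.ranksums:

import numpy as np
from scipy import stats

np.random.seed(1)
x = np.random.standard_t(3, size=50)        # heavy-tailed sample
y = np.random.standard_t(3, size=50) + 0.5  # same shape, shifted by 0.5

t_stat, t_p = stats.ttest_ind(x, y)         # t-test, reasonable for large-ish N
z_stat, z_p = stats.ranksums(x, y)          # Wilcoxon rank-sum, normal approximation

print("t-test:   t=%.3f  p=%.4f" % (t_stat, t_p))
print("rank-sum: z=%.3f  p=%.4f" % (z_stat, z_p))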
Sturla Molden From sturla at molden.no Thu Feb 5 09:38:31 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 15:38:31 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> Message-ID: <498AF9E7.90200@molden.no> On 2/5/2009 3:33 PM, josef.pktd at gmail.com wrote: > wilcoxon is a similar test for paired instead of independent samples, > and there the recommendation in the docstring is for N>20. There are two Wilcoxon tests. The signed-rank test for paired samples and the rank-sum test for independent samples. S.M. From bsouthey at gmail.com Thu Feb 5 09:56:08 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 05 Feb 2009 08:56:08 -0600 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> Message-ID: <498AFE08.8020402@gmail.com> Wavy Davy wrote: > Hi all > > I am using the mannwhitneyu in the stats module, and I was looking the > code and I see this notice in the docstring. > > "Use only when the n in each condition is < 20 and you have 2 > independent samples of ranks. " > > Am I reading it correctly that this test should only be used with > sample sizes less than 20? > > I am not a statistican, more a python coder. I have been pointed and > this test as a more robust version of the t-test, so forgive my > ignorance. > > Any help would be much appreciated. > > I think the docstring is referring to the distribution of the actual U-test. So for small samples typically the pvalue is directly computed from the sampling distribution. However, Scipy is using the normal approximation is which is not meant to be that great. http://faculty.vassar.edu/lowry/utest.html http://faculty.vassar.edu/lowry/ch11a.html http://www.alglib.net/statistics/hypothesistesting/mannwhitneyu.php Bruce From josef.pktd at gmail.com Thu Feb 5 10:36:08 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 10:36:08 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498AF9E7.90200@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> Message-ID: <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> On Thu, Feb 5, 2009 at 9:38 AM, Sturla Molden wrote: > On 2/5/2009 3:33 PM, josef.pktd at gmail.com wrote: > >> wilcoxon is a similar test for paired instead of independent samples, >> and there the recommendation in the docstring is for N>20. > > There are two Wilcoxon tests. The signed-rank test for paired samples > and the rank-sum test for independent samples. > According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test for independent samples, just a different name. 
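For what it is worth, the relationship can be checked numerically: the rank-sum statistic W and the Mann-Whitney U differ only by the constant n1*(n1+1)/2, so the two carry the same information. The sketch below is illustrative only (not part of the original message) and assumes scipy.stats.rankdata and scipy.stats.mannwhitneyu:

import numpy as np
from scipy import stats

np.random.seed(0)
x = np.random.normal(size=30)
y = np.random.normal(loc=0.3, size=40)
n1, n2 = len(x), len(y)

ranks = stats.rankdata(np.concatenate((x, y)))
W = ranks[:n1].sum()               # Wilcoxon rank-sum statistic for x
U_x = W - n1 * (n1 + 1) / 2.0      # Mann-Whitney U for x
U_y = n1 * n2 - U_x                # Mann-Whitney U for y

u_small, p_one_sided = stats.mannwhitneyu(x, y)
# scipy returns min(U_x, U_y), so one of the two hand-computed values matches
print("U_x=%g  U_y=%g  scipy smallu=%g" % (U_x, U_y, u_small))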
Josef From sturla at molden.no Thu Feb 5 10:59:04 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 16:59:04 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> Message-ID: <498B0CC8.90007@molden.no> On 2/5/2009 4:36 PM, josef.pktd at gmail.com wrote: > According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test > for independent samples, just a different name. You should not trust Wikipedia. From stefan at sun.ac.za Thu Feb 5 11:09:43 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 5 Feb 2009 18:09:43 +0200 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B0CC8.90007@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <498B0CC8.90007@molden.no> Message-ID: <9457e7c80902050809wf101707y9ac654f64d86e8c4@mail.gmail.com> 2009/2/5 Sturla Molden : > On 2/5/2009 4:36 PM, josef.pktd at gmail.com wrote: > >> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test >> for independent samples, just a different name. > > You should not trust Wikipedia. Or, you can fix the entry on Wikipedia... St?fan From grh at mur.at Thu Feb 5 11:13:53 2009 From: grh at mur.at (Georg Holzmann) Date: Thu, 05 Feb 2009 17:13:53 +0100 Subject: [SciPy-user] audiolab Problem In-Reply-To: <498ABFAA.8090508@ar.media.kyoto-u.ac.jp> References: <4989F4DA.7050509@mur.at> <498ABFAA.8090508@ar.media.kyoto-u.ac.jp> Message-ID: <498B1041.2060807@mur.at> Hallo! > Audiolab requires numpy 1.2 or above. I think we should push numpy > 1.2 to Ubuntu 9.04 while there is still time - 1.2 was a significant > release. OK, I see. However, it would be nice to have the old working audiolab code somewhere, which can be used on recent systems ... > >> When I want to import the module I get the following error: >> import scikits.audiolab >> File "/usr/lib/python2.5/site-packages/scikits/audiolab/__init__.py", >> line 37, in >> from numpy.testing import Tester >> ImportError: cannot import name Tester >> > > This has nothing to do with the problem, but may I suggest not to > install anything from sources into /usr/ ? Hm ... I didn't notice that ... (was not my intention) I just typed 'python setup.py install'. >> Do you have a clue whats the problem ? >> In my numpy version the test cases have a different name ... >> Or do I need to run these tests on latest Ubuntu - is the integer bug >> still a problem ? >> > > Not on recent Ubuntu releases, no. OK - thanks for your feedback ! LG Georg From bloodearnest at gmail.com Thu Feb 5 11:24:45 2009 From: bloodearnest at gmail.com (Wavy Davy) Date: Thu, 5 Feb 2009 16:24:45 +0000 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498AFE08.8020402@gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <498AFE08.8020402@gmail.com> Message-ID: <5063d0650902050824m16bfc86aia38245113f448ef4@mail.gmail.com> 2009/2/5 Bruce Southey : > I think the docstring is referring to the distribution of the actual > U-test. 
So for small samples typically the pvalue is directly computed
> from the sampling distribution. However, Scipy is using the normal
> approximation which is not meant to be that great.
>
> http://faculty.vassar.edu/lowry/utest.html
> http://faculty.vassar.edu/lowry/ch11a.html
> http://www.alglib.net/statistics/hypothesistesting/mannwhitneyu.php

OK - that makes more sense. Thanks. I've ended up using the Kruskal-Wallis extension to Mann-Whitney anyway, as I have multiple data samples. Which of course scipy provides with the kruskal function. Confusing docstrings aside, it's been a pleasure to use :)

-- Simon

From bsouthey at gmail.com Thu Feb 5 11:32:51 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 05 Feb 2009 10:32:51 -0600 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <9457e7c80902050809wf101707y9ac654f64d86e8c4@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <498B0CC8.90007@molden.no> <9457e7c80902050809wf101707y9ac654f64d86e8c4@mail.gmail.com> Message-ID: <498B14B3.2070105@gmail.com>

Stéfan van der Walt wrote:
> 2009/2/5 Sturla Molden :
>> On 2/5/2009 4:36 PM, josef.pktd at gmail.com wrote:
>>> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
>>> for independent samples, just a different name.
>> You should not trust Wikipedia.
>
> Or, you can fix the entry on Wikipedia...
>
> Stéfan
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user

Or perhaps it is actually correct. My understanding (because I don't want to do it) is that these are equivalent and all major stats packages provide just one test. For example, Prof Brian Ripley's reply on the R list https://stat.ethz.ch/pipermail/r-help/2005-May/071544.html

> I am hoping someone could shed some light into the Wilcoxon Rank Sum Test
> for me? In looking through Stats references, the Mann-Whitney U-test and
> the Wilcoxon Rank Sum Test are statistically equivalent.

Yes, but not numerically: they differ by a constant (in the data, a function of the data size).

Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com Thu Feb 5 11:39:19 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 11:39:19 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> Message-ID: <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com>

On Thu, Feb 5, 2009 at 10:36 AM, wrote:
> On Thu, Feb 5, 2009 at 9:38 AM, Sturla Molden wrote:
>> On 2/5/2009 3:33 PM, josef.pktd at gmail.com wrote:
>>> wilcoxon is a similar test for paired instead of independent samples,
>>> and there the recommendation in the docstring is for N>20.
>>
>> There are two Wilcoxon tests. The signed-rank test for paired samples
>> and the rank-sum test for independent samples.
>
> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test
> for independent samples, just a different name.
> > Josef >

So far: According to R:
wilcox.test(x,y)
Performs one and two sample Wilcoxon tests on vectors of data; the latter is also known as 'Mann-Whitney' test.

I tried a normal random variable example ( no ties): the test statistic returned is exactly the same as the one returned by stats.mannwhitneyu(x,y) however the p-values differ. the pvalue in stats is half of the one in R (up to 1e-17) as stated in the docstring: one-tailed p-value. In R the test statistic is the same for the two sided and the one sided tests, but the reported p-values differ. I used sample size 100.

So there is an inconsistency in the reporting in stats.mannwhitneyu, the test statistic is for the two-sided test, but the p-value is half of the two sided p-value and should be multiplied by two. I haven't checked the tie handling.

Josef

From sturla at molden.no Thu Feb 5 11:56:27 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 17:56:27 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> Message-ID: <498B1A3B.8040603@molden.no>

On 2/5/2009 5:39 PM, josef.pktd at gmail.com wrote:
> According to R:
> wilcox.test(x,y)
> Performs one and two sample Wilcoxon tests on vectors of data; the
> latter is also known as 'Mann-Whitney' test.
>
> I tried a normal random variable example ( no ties): the test
> statistic returned is exactly the same as the one returned by
> stats.mannwhitneyu(x,y) however the p-values differ. the pvalue in
> stats is half of the one in R (up to 1e-17) as stated in the
> docstring: one-tailed p-value.

I believe there is a bug in SciPy:

def mannwhitneyu(x, y):
    """Calculates a Mann-Whitney U statistic on the provided scores and
    returns the result.  Use only when the n in each condition is < 20 and
    you have 2 independent samples of ranks.  REMEMBER: Mann-Whitney U is
    significant if the u-obtained is LESS THAN or equal to the critical
    value of U.

    Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
    """
    x = asarray(x)
    y = asarray(y)
    n1 = len(x)
    n2 = len(y)
    ranked = rankdata(np.concatenate((x,y)))
    rankx = ranked[0:n1]  # get the x-ranks
    #ranky = ranked[n1:]  # the rest are y-ranks
    u1 = n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx,axis=0)  # calc U for x
    u2 = n1*n2 - u1  # remainder is U for y
    bigu = max(u1,u2)
    smallu = min(u1,u2)
    T = np.sqrt(tiecorrect(ranked))  # correction factor for tied scores
    if T == 0:
        raise ValueError, 'All numbers are identical in amannwhitneyu'
    sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0)
    z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc
    return smallu, 1.0 - zprob(z)

Take a look at the last two lines? Do you see something peculiar?

Sturla Molden

From gaedol at gmail.com Thu Feb 5 12:17:55 2009 From: gaedol at gmail.com (Marco) Date: Thu, 5 Feb 2009 18:17:55 +0100 Subject: [SciPy-user] Lowpass Filter Message-ID: 

Hi list!

Let's suppose a to be a 1D array with N elements. Basically, it's a signal of some sort.

How do I apply a low pass filter (with selected frequency and width) to this signal? How to store the resulting, filtered, signal, in a new array?

I had a look at lp2lp() in scipy.signal, but it returns, if I am right, a filter object, which then I dunno how to use to filter my data.
Any ideas or pointers? TIA, marco -- Quando sei una human pignata e la pazzo jacket si ? accorciata e non ti puoi liberare dai colpi di legno e di bastone dai petardi sul groppone Vinicio Capossela From josef.pktd at gmail.com Thu Feb 5 12:23:16 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 12:23:16 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B1A3B.8040603@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> Message-ID: <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> On Thu, Feb 5, 2009 at 11:56 AM, Sturla Molden wrote: > On 2/5/2009 5:39 PM, josef.pktd at gmail.com wrote: > >> According to R: >> wilcox.test(x,y) >> Performs one and two sample Wilcoxon tests on vectors of data; the >> latter is also known as 'Mann-Whitney' test. >> >> I tried a normal random variable example ( no ties): the test >> statistic returned is exactly the same as the one returned by >> stats.mannwhitneyu(x,y) however the p-values differ. the pvalue in >> stats is half of the one in R (up to 1e-17) as stated in the >> docstring: one-tailed p-value. > > > I believe there is a bug in SciPy: > > > def mannwhitneyu(x, y): > """Calculates a Mann-Whitney U statistic on the provided scores and > returns the result. Use only when the n in each condition is < 20 and > you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is > significant if the u-obtained is LESS THAN or equal to the critical > value of U. > > Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) > """ > x = asarray(x) > y = asarray(y) > n1 = len(x) > n2 = len(y) > ranked = rankdata(np.concatenate((x,y))) > rankx = ranked[0:n1] # get the x-ranks > #ranky = ranked[n1:] # the rest are y-ranks > u1 = n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx,axis=0) # calc U for x > u2 = n1*n2 - u1 # remainder is U for y > bigu = max(u1,u2) > smallu = min(u1,u2) > T = np.sqrt(tiecorrect(ranked)) # correction factor for tied scores > if T == 0: > raise ValueError, 'All numbers are identical in amannwhitneyu' > sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0) > z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc > return smallu, 1.0 - zprob(z) > > > Take a look at the last two lines? Do you see something peculiar? > > Sturla Molden > you mean that it uses bigu for the p-value calculation but reports smallu as the test-statistic? I didn't try to figure out what the formula for the p-value actually is, but I'm pretty happy that we get the same result as R, except for the times 2. I looked some more at the R implementation : the main difference is that R uses by default a continuity correction "correct a logical indicating whether to apply continuity correction in the normal approximation for the p-value" >>> rwilcox=rpy.r('wilcox.test') >>> stats.mannwhitneyu(rvs1,rvs2)[1]*2 - rwilcox(rvs1,rvs2,correct = False)['p.value'] -1.5265566588595902e-016 The test statistic in R is not symmetric in its argument, although the p-values are, stats.mannwhitneyu is symmetric in statistic and p-value. 
>>> rresult = rwilcox(rvs2,rvs1) >>> rresult['statistic'] {'W': 5637.0} >>> rresult['p.value'] 0.11989439052971607 >>> rresult = rwilcox(rvs1,rvs2) >>> rresult['statistic'] {'W': 4363.0} >>> rresult['p.value'] 0.11989439052971618 So overall stats.mannwhitney, I think, looks pretty good but it could be expanded to include some of the options that R offers, and I also think we should multiply the pvalue by 2, so that the reported p-value actually corresponds to the test. Josef From sturla at molden.no Thu Feb 5 12:29:43 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 18:29:43 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> Message-ID: <498B2207.2030303@molden.no> On 2/5/2009 6:23 PM, josef.pktd at gmail.com wrote: > you mean that it uses bigu for the p-value calculation but reports > smallu as the test-statistic? Yes S.M. From arokem at berkeley.edu Thu Feb 5 12:34:48 2009 From: arokem at berkeley.edu (Ariel Rokem) Date: Thu, 5 Feb 2009 09:34:48 -0800 Subject: [SciPy-user] Lowpass Filter In-Reply-To: References: Message-ID: <43958ee60902050934q3c9f98d8r38631c1e500b5faf@mail.gmail.com> Hi - I don't know if this what you want (I don't know how to use lp2lp or scipy.signal), but one strategy that I have used is to convolve your signal with a box-car function of a length equal to the inverse of your cut-off. This is most definitely not the best filter known to man, but fwiw here is the code. For example (here I do a lowpass and then subtract the low-passed signal from the original, effectively doing a quick-and-ugly highpass) : box_car = np.ones(np.ceil(1.0/(f_c/TR))) #TR is the inverse of the sampling frequency in the fMRI signal I am analyzing, f_c is the cutoff box_car = box_car/(float(len(box_car))) print('Normalizing and detrending time series') for i in range(len(tSeries)): #Detrending #Start by applying a low-pass to the signal: #Pad the signal on each side with the initial and terminal signal value: pad_s = np.append(np.ones(len(box_car)) * tSeries[i][0], tSeries[i][:]) pad_s = np.append(pad_s, np.ones(len(box_car)) * tSeries[i][-1]) #Filter operation is a convolution with the box-car: conv_s = np.convolve(pad_s,box_car) #Extract the low pass signal (by excising the central len(tSeries) points: s_lp= conv_s[len(conv_s)/2-np.ceil(len(tSeries[i][:])/2.0):len(conv_s)/2+len(tSeries[i][:])/2] #ceil(/2.0) for cases where the tSeries has an odd number of points #Extract the high pass signal simply by subtracting the high pass signal #from the original signal: tSeries[i] = tSeries[i][:] - s_lp + np.mean(s_lp) #add mean to make sure that there are no negative values #Normalization tSeries[i] = tSeries[i]/np.mean(tSeries[i])-1 On Thu, Feb 5, 2009 at 9:17 AM, Marco wrote: > Hi list! > > Let's suppose a to be a 1D array with N elements. > Basically, it's a signal of some sort. > > How do I apply a low pass filter (with selected frequency and width) > to this signal? > How to store the resulting, filtered, signal, in a new array? 
> > I had a look at lp2lp() in scipy.signal, but it returns, if I am > right, a filter object, which then I dunno how to use to filter my > data. > > Any ideas or pointers? > > TIA, > > marco > > > > -- > > Quando sei una human pignata > e la pazzo jacket si ? accorciata > e non ti puoi liberare > dai colpi di legno e di bastone > dai petardi sul groppone > > Vinicio Capossela > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Thu Feb 5 12:35:46 2009 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 05 Feb 2009 12:35:46 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B0CC8.90007@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <498B0CC8.90007@molden.no> Message-ID: <498B2372.1000508@american.edu> > On 2/5/2009 4:36 PM, josef.pktd at gmail.com wrote: >> According to wikipedia, Mann-Whitney-U is the Wilcoxon rank-sum test >> for independent samples, just a different name. On 2/5/2009 10:59 AM Sturla Molden apparently wrote: > You should not trust Wikipedia. Or any other encyclopedia. But actually Wikipedia is usually pretty good on technical matters, and can easily be fixed. fwiw, Alan Isaac From josef.pktd at gmail.com Thu Feb 5 12:46:48 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 12:46:48 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B2207.2030303@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> Message-ID: <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> On Thu, Feb 5, 2009 at 12:29 PM, Sturla Molden wrote: > On 2/5/2009 6:23 PM, josef.pktd at gmail.com wrote: > >> you mean that it uses bigu for the p-value calculation but reports >> smallu as the test-statistic? > > Yes > Given that it works, I didn't want to spend time on this, but wikipedia again: "therefore, the absolute value of the z statistic calculated will be same whichever value of U is used." As I understand it, because the sum U1+U2 is fixed (given the sample sizes), many properties are equivalent, i.e. U1 - meanU = - (U2 - meanU) so whether bigU or smallU is used in the calculation of z doesn't matter, I have no idea why in this specific implementation both are calculated if smallU would be enough. Josef From sturla at molden.no Thu Feb 5 12:49:02 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 18:49:02 +0100 Subject: [SciPy-user] Lowpass Filter In-Reply-To: References: Message-ID: <498B268E.7090502@molden.no> On 2/5/2009 6:17 PM, Marco wrote: > How do I apply a low pass filter (with selected frequency and width) > to this signal? What kind of lowpass filter? Single-pole? Butterworth? Bessel? Gaussian? Cheychev? Elliptic? Truncated sinc with window? What kind of window? But basically: - First obtain your filter coefficients. 
Filter design is an extensive subject; I cannot cover it here. Consult a text book. - Short FIR or IIR: apply filter to signal with scipy.signal.lfilter. - Long FIR: use numpy.fft.rfft for convolution in the Fourier plane. (You will get faster results with FFTW instead of NumPy's FFT.) S.M. From c-b at asu.edu Thu Feb 5 12:28:47 2009 From: c-b at asu.edu (Christopher Brown) Date: Thu, 05 Feb 2009 10:28:47 -0700 Subject: [SciPy-user] Lowpass Filter In-Reply-To: References: Message-ID: <498B21CF.6040105@asu.edu> Hi Marco, M> Let's suppose a to be a 1D array with N elements. M> Basically, it's a signal of some sort. M> M> How do I apply a low pass filter (with selected frequency and width) M> to this signal? M> How to store the resulting, filtered, signal, in a new array? M> M> I had a look at lp2lp() in scipy.signal, but it returns, if I am M> right, a filter object, which then I dunno how to use to filter my M> data. M> M> Any ideas or pointers? The following is a low-pass Butterworth filter cutoff = 500. fs = 44100. nyq = fs/2. filterorder = 5 b,a = scipy.signal.filter_design.butter(filterorder,cutoff/nyq) filteredsignal = scipy.signal.lfilter(b,a,signal) -- Chris From sturla at molden.no Thu Feb 5 13:03:34 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 19:03:34 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> Message-ID: <498B29F6.4080508@molden.no> On 2/5/2009 6:46 PM, josef.pktd at gmail.com wrote: > so whether bigU or smallU is used in the calculation of z doesn't > matter, I have no idea why in this specific implementation both are > calculated if smallU would be enough. By the way, there is a fucntion scipy.stats.ranksums that does a Wilcoxon rank-sum test. It seems to be using a large-sample approximation, and has no correction for tied ranks. S.M. From pgmdevlist at gmail.com Thu Feb 5 13:11:55 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 5 Feb 2009 13:11:55 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B29F6.4080508@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> Message-ID: On Feb 5, 2009, at 1:03 PM, Sturla Molden wrote: > On 2/5/2009 6:46 PM, josef.pktd at gmail.com wrote: > >> so whether bigU or smallU is used in the calculation of z doesn't >> matter, I have no idea why in this specific implementation both are >> calculated if smallU would be enough. > > By the way, there is a fucntion scipy.stats.ranksums that does a > Wilcoxon rank-sum test. 
It seems to be using a large-sample > approximation, and has no correction for tied ranks. Please keep in mind that some of the tests have been reimplemented in scipy.stats.mstats to support masked/missing values in scipy.mstats and to take ties into accounts ... I trust y'all to let me know of any inconsistencies between the masked/ unmasked versions, whether in terms of signatures or assumptions. Thx a lot in advance... From josef.pktd at gmail.com Thu Feb 5 13:16:22 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 13:16:22 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B29F6.4080508@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> Message-ID: <1cd32cbb0902051016j3eeeef55v8e7217c84e30bd2e@mail.gmail.com> On Thu, Feb 5, 2009 at 1:03 PM, Sturla Molden wrote: > On 2/5/2009 6:46 PM, josef.pktd at gmail.com wrote: > >> so whether bigU or smallU is used in the calculation of z doesn't >> matter, I have no idea why in this specific implementation both are >> calculated if smallU would be enough. > > By the way, there is a fucntion scipy.stats.ranksums that does a > Wilcoxon rank-sum test. It seems to be using a large-sample > approximation, and has no correction for tied ranks. > > S.M. > Also, in the explanation for kruskal it says it's an extension of Mann-Whitney-U to more than 2 groups for 2 groups (no ties): >>> stats.kruskal(rvs1,rvs2)[1] - stats.mannwhitneyu(rvs1,rvs2)[1]*2 -4.8572257327350599e-016 >>> stats.kruskal(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1] -4.8572257327350599e-016 >>> stats.ranksums(rvs1,rvs2)[1] - stats.mannwhitneyu(rvs1,rvs2)[1]*2 0.0 It looks like there are some redundancies or small variations in these tests. A systematic list of all tests would be pretty useful. Josef From josef.pktd at gmail.com Thu Feb 5 13:31:00 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 13:31:00 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> Message-ID: <1cd32cbb0902051031o3120c70ahfd1e2030eb72e750@mail.gmail.com> On Thu, Feb 5, 2009 at 1:11 PM, Pierre GM wrote: > > On Feb 5, 2009, at 1:03 PM, Sturla Molden wrote: > >> On 2/5/2009 6:46 PM, josef.pktd at gmail.com wrote: >> >>> so whether bigU or smallU is used in the calculation of z doesn't >>> matter, I have no idea why in this specific implementation both are >>> calculated if smallU would be enough. >> >> By the way, there is a fucntion scipy.stats.ranksums that does a >> Wilcoxon rank-sum test. It seems to be using a large-sample >> approximation, and has no correction for tied ranks. 
> > > Please keep in mind that some of the tests have been reimplemented in > scipy.stats.mstats to support masked/missing values in scipy.mstats > and to take ties into accounts ... > I trust y'all to let me know of any inconsistencies between the masked/ > unmasked versions, whether in terms of signatures or assumptions. > Thx a lot in advance... > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > a quick check looks pretty good (still example without ties) >>> stats.mstats.kruskal(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1] -4.8572257327350599e-016 >>> stats.mstats.kruskalwallis(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1] -4.8572257327350599e-016 >>> stats.mstats.mannwhitneyu(rvs1,rvs2)[1] - stats.ranksums(rvs1,rvs2)[1] 0.00029058688269312238 >>> stats.mstats.mannwhitneyu(rvs1,rvs2) (4363.0, 0.11989439052971618) >>> stats.mstats.mannwhitneyu(rvs1,rvs2)[1] - rwilcox(rvs1,rvs2,correct = False)['p.value'] 0.00029058688269296973 >>> stats.mstats.mannwhitneyu(rvs1,rvs2)[1] - rwilcox(rvs1,rvs2)['p.value'] 0.0 stats.mstats.mannwhitneyu employs continuity correction by default as in R. Just calling this, according to docstring, requires sequence, correct usage is not clear: >>> stats.mstats.compare_medians_ms(rvs1,rvs2) Traceback (most recent call last): File "", line 1, in stats.mstats.compare_medians_ms(rvs1,rvs2) File "\Programs\Python25\Lib\site-packages\scipy\stats\mstats_extras.py", line 332, in compare_medians_ms (std_1, std_2) = (mstats.stde_median(group_1, axis=axis), File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py", line 1511, in stde_median return _stdemed_1D(data) File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py", line 1504, in _stdemed_1D n = len(sorted) TypeError: object of type 'builtin_function_or_method' has no len() Josef From pgmdevlist at gmail.com Thu Feb 5 13:35:35 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 5 Feb 2009 13:35:35 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902051031o3120c70ahfd1e2030eb72e750@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <1cd32cbb0902051031o3120c70ahfd1e2030eb72e750@mail.gmail.com> Message-ID: On Feb 5, 2009, at 1:31 PM, josef.pktd at gmail.com wrote: > > Just calling this, according to docstring, requires sequence, correct > usage is not clear: > >>>> stats.mstats.compare_medians_ms(rvs1,rvs2) OK, can you send me a test sample (ie, the rvs1& rvs2 you used that fail, and what we should have had)? I'll try to fix that this afternoon... 
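For anyone who wants to rerun these comparisons, a seeded sketch of the kind of sample and checks used in this thread (the seed and sample sizes are arbitrary; the *2 reflects the earlier point that stats.mannwhitneyu reports a one-tailed p-value):

import numpy as np
from scipy import stats

np.random.seed(12345)                       # arbitrary seed, for reproducibility
rvs1 = stats.norm.rvs(size=100)
rvs2 = 0.25 * stats.norm.rvs(size=100)

u, p_mwu = stats.mannwhitneyu(rvs1, rvs2)   # one-tailed p-value in this thread's scipy
h, p_kw = stats.kruskal(rvs1, rvs2)         # Kruskal-Wallis with two groups
z, p_rs = stats.ranksums(rvs1, rvs2)        # Wilcoxon rank-sum, two-sided

# With no ties and no continuity correction, 2*p_mwu, p_kw and p_rs
# should agree to floating-point precision, as reported above.
print(2 * p_mwu, p_kw, p_rs)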
From josef.pktd at gmail.com Thu Feb 5 13:53:37 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 13:53:37 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <1cd32cbb0902051031o3120c70ahfd1e2030eb72e750@mail.gmail.com> Message-ID: <1cd32cbb0902051053j2beb7127sbacf49611f367f2a@mail.gmail.com> On Thu, Feb 5, 2009 at 1:35 PM, Pierre GM wrote: > > On Feb 5, 2009, at 1:31 PM, josef.pktd at gmail.com wrote: > >> >> Just calling this, according to docstring, requires sequence, correct >> usage is not clear: >> >>>>> stats.mstats.compare_medians_ms(rvs1,rvs2) > > OK, can you send me a test sample (ie, the rvs1& rvs2 you used that > fail, and what we should have had)? I'll try to fix that this > afternoon... > I generated rvs1 and rvs2 (without fixed seed) rvs1 = stats.norm.rvs(size = 100) rvs2 = 0.25*stats.norm.rvs(size = 100) I didn't look at stats.mstats.compare_medians_ms specifically, but it sounded like it should do something similar to the other tests I was trying out. So, I don't know what the expected answer should be, but I would expect a p-values similar to the other non-parametric tests for equality of location. the problem is in stats.mstats_basic.stde_median. Note: it is not exported (I'm not sure how or why the imports work) and can be accessed only directly >>> stats.mstats.stde_median(rvs1,axis=0) Traceback (most recent call last): File "", line 1, in stats.mstats.stde_median(rvs1,axis=0) AttributeError: 'module' object has no attribute 'stde_median' calling it directly returns this: >>> stats.mstats_basic.stde_median(rvs1,axis=0) Traceback (most recent call last): File "", line 1, in stats.mstats_basic.stde_median(rvs1,axis=0) File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py", line 1514, in stde_median return ma.apply_along_axis(_stdemed_1D, axis, data) File "C:\Programs\Python25\lib\site-packages\numpy\ma\extras.py", line 185, in apply_along_axis res = func1d(arr[tuple(i.tolist())], *args, **kwargs) File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py", line 1504, in _stdemed_1D n = len(sorted) TypeError: object of type 'builtin_function_or_method' has no len() Josef From josef.pktd at gmail.com Thu Feb 5 13:58:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 13:58:31 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902051053j2beb7127sbacf49611f367f2a@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <1cd32cbb0902051031o3120c70ahfd1e2030eb72e750@mail.gmail.com> <1cd32cbb0902051053j2beb7127sbacf49611f367f2a@mail.gmail.com> Message-ID: <1cd32cbb0902051058ldaba9bah862e3b4e3692eb2c@mail.gmail.com> On Thu, Feb 5, 2009 at 1:53 PM, wrote: > On Thu, Feb 5, 2009 at 1:35 PM, Pierre GM wrote: >> >> On Feb 5, 2009, at 1:31 PM, josef.pktd at gmail.com wrote: >> >>> >>> Just calling this, according to docstring, requires sequence, 
correct >>> usage is not clear: >>> >>>>>> stats.mstats.compare_medians_ms(rvs1,rvs2) >> >> OK, can you send me a test sample (ie, the rvs1& rvs2 you used that >> fail, and what we should have had)? I'll try to fix that this >> afternoon... >> > > I generated rvs1 and rvs2 (without fixed seed) > > rvs1 = stats.norm.rvs(size = 100) > rvs2 = 0.25*stats.norm.rvs(size = 100) > > I didn't look at stats.mstats.compare_medians_ms specifically, but it > sounded like it should do something similar to the other tests I was > trying out. So, I don't know what the expected answer should be, but I > would expect a p-values similar to the other non-parametric tests for > equality of location. > > the problem is in stats.mstats_basic.stde_median. > Note: it is not exported (I'm not sure how or why the imports work) > and can be accessed only directly > >>>> stats.mstats.stde_median(rvs1,axis=0) > Traceback (most recent call last): > File "", line 1, in > stats.mstats.stde_median(rvs1,axis=0) > AttributeError: 'module' object has no attribute 'stde_median' > > calling it directly returns this: > >>>> stats.mstats_basic.stde_median(rvs1,axis=0) > Traceback (most recent call last): > File "", line 1, in > stats.mstats_basic.stde_median(rvs1,axis=0) > File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py", > line 1514, in stde_median > return ma.apply_along_axis(_stdemed_1D, axis, data) > File "C:\Programs\Python25\lib\site-packages\numpy\ma\extras.py", > line 185, in apply_along_axis > res = func1d(arr[tuple(i.tolist())], *args, **kwargs) > File "C:\Programs\Python25\lib\site-packages\scipy\stats\mstats_basic.py", > line 1504, in _stdemed_1D > n = len(sorted) > TypeError: object of type 'builtin_function_or_method' has no len() > doing : >>> np.source(stats.mstats_basic.stde_median) shows it's a reference by wrong name: you assign the sorted data to data, but then use "sorted" as a name def _stdemed_1D(data): data = np.sort(data.compressed()) n = len(sorted) Josef From sturla at molden.no Thu Feb 5 14:51:02 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 05 Feb 2009 20:51:02 +0100 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B29F6.4080508@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050633v91c8785s6ae993d6eb63aa01@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> Message-ID: <498B4326.7060207@molden.no> On 2/5/2009 7:03 PM, Sturla Molden wrote: > By the way, there is a fucntion scipy.stats.ranksums that does a > Wilcoxon rank-sum test. It seems to be using a large-sample > approximation, and has no correction for tied ranks. Here is a modification of SciPy's ranksums to allow small samples and correct for tied ranks. Sturla Molden import numpy as np import scipy import scipy.special zprob = scipy.special.ndtr def ranksums(x, y): """ Wilcoxon rank sum test Returns: W-statistic Z-statistic one-tailed p-value, asymptotic approximation one-tailed p-value, Monte Carlo approximation Corrected for ties. 
""" x,y = map(np.asarray, (x, y)) n1 = len(x) n2 = len(y) alldata = np.concatenate((x,y)) ranked = rankdata(alldata) x = ranked[:n1] y = ranked[n1:] w = np.sum(x,axis=0) def montecarlo(): shuffle = np.random.shuffle a = np.zeros(1000) shuffle(ranked) # bug in numpy: the first shuffle doesn't work for i in xrange(1000): shuffle(ranked) a[i] = np.sum(ranked[:n1],axis=0) return np.sum(a >= w) / 1000.0 def aymptotic_p(): expected = n1*(n1+n2+1) / 2.0 z = (w - expected) / np.sqrt(n1*n2*(n1+n2+1)/12.0) return 1.0 - zprob(z), z def aymptotic_p_ties(): t = [] _t = 0 for r in ranked: if r % 1: _t += 1 else: if _t: t.append(_t) _t = 0 if _t: t.append(_t) t = np.asarray(t) expected = n1*(n1+n2+1) / 2.0 tcorr = np.sum((t-1)*t*(t+1))/float((n1+n2)*(n1+n2-1)) z = (w - expected) / np.sqrt(n1*n2*(n1+n2+1-tcorr)/12.0) return 1.0 - zprob(z), z p_mc = montecarlo() if np.any(ranked % 1): p, z = aymptotic_p_ties() else: p, z = aymptotic_p() return w, z, p, p_mc def rankdata(a): a = np.ravel(a) n = len(a) svec, ivec = fastsort(a) sumranks = 0 dupcount = 0 newarray = np.zeros(n, float) for i in xrange(n): sumranks += i dupcount += 1 if i==n-1 or svec[i] != svec[i+1]: averank = sumranks / float(dupcount) + 1 for j in xrange(i-dupcount+1,i+1): newarray[ivec[j]] = averank sumranks = 0 dupcount = 0 return newarray def fastsort(a): it = np.argsort(a) as_ = a[it] return as_, it From christopher.paul.taylor at gmail.com Thu Feb 5 15:13:29 2009 From: christopher.paul.taylor at gmail.com (christopher taylor) Date: Thu, 5 Feb 2009 15:13:29 -0500 Subject: [SciPy-user] question about using speigs.ARPACK_eigs Message-ID: I'm currently working with sparse matrix of a size of roughly 65K by 65K. I'd like to compute the first 2 eigenvectors of this 65K x 65K matrix. I've been told to use speigs.ARPACK_eigs: # data_mat sizes data_mat_width=256 data_mat_height=256 #256*256 = 65536 # # i've noticed there's an assertion that will kill a call to this function if the last argument is >=4 # eigvals, eigvecs = speigs.ARPACK_eigs( data_mat.matvec, data_mat_width*data_mat_height, 2) Unfortunately, the function is only computing a result of 2x2 matrix. I need a result that's a matrix with a height of, roughly, 65536 and a width of 2. Any recommendations or tips? Thanks, ct From josef.pktd at gmail.com Thu Feb 5 15:30:07 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 15:30:07 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <498B4326.7060207@molden.no> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <498B4326.7060207@molden.no> Message-ID: <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> On Thu, Feb 5, 2009 at 2:51 PM, Sturla Molden wrote: > On 2/5/2009 7:03 PM, Sturla Molden wrote: > >> By the way, there is a fucntion scipy.stats.ranksums that does a >> Wilcoxon rank-sum test. It seems to be using a large-sample >> approximation, and has no correction for tied ranks. > > > Here is a modification of SciPy's ranksums to allow small samples and > correct for tied ranks. 
> there are absolute values missing abs(z-expected), I also prefer the correction p*2 since it is a two-sided test sample size 20, 9 ties this is with R wilcox.exact, ranksums is your ranksum >>> rwilcex(rvs1[:20],4*ind10+rvs2t[:20],exact=True)['p.value'] 0.15716005595098306 >>> ranksums(rvs1[:20],4*ind10+rvs2t[:20]) #wrong tail because no abs() (357.0, -1.4336547191212172, 0.9241645900073665, 0.92800000000000005) >>> ranksums(4*ind10+rvs2t[:20],rvs1[:20]) (463.0, 1.4336547191212172, 0.075835409992633496, 0.068000000000000005) >>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])[3]*2 0.186 >>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])[2]*2 0.15167081998526699 >>> stats.mannwhitneyu(rvs1[:20],4*ind10+rvs2t[:20])[1]*2 0.15167081998526699 With this correction, the normal distribution based p-value in ranksums looks exactly the same as stats.mannwhitneyu. your Monte Carlo p-value differs more from R's exact result than the normal distribution based p-value. Overall, the differences in p-values look pretty small in the examples I tried out, so my guess is that a Monte-Carlo on the correct size and power of the tests will show very similar rejection rates, at critical values of 0.05 or 0.1. But I don't have time for that now. Josef From josef.pktd at gmail.com Thu Feb 5 15:54:32 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 15:54:32 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <498B4326.7060207@molden.no> <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> Message-ID: <1cd32cbb0902051254g13ea0ae5yf879a5f648c8030e@mail.gmail.com> > > sample size 20, 9 ties > this is with R wilcox.exact, ranksums is your ranksum ... > > With this correction, the normal distribution based p-value in > ranksums looks exactly the same as stats.mannwhitneyu. this statement is not correct. I mixed up my variables and didn't actually have ties, now with ties, I still get essentially but not exactly the same results. Josef From pjungkun at nps.edu Thu Feb 5 17:57:52 2009 From: pjungkun at nps.edu (Patrick Jungkunz) Date: Thu, 5 Feb 2009 14:57:52 -0800 Subject: [SciPy-user] Fwd: Scipy sandbox - color.py Message-ID: <7794F558-D11F-4FFE-8441-6322FD5920CB@nps.edu> G'day out there Here is an answer about color.py I got from Robert Kern. I just wanted to forward this to the mailing list for the benefit of everyone. Thank you, Robert, for the immediate response. Patrick Begin forwarded message: > > > On Tue, Feb 3, 2009 at 18:06, Patrick Jungkunz > wrote: >> >> >> For a project I am working on using scipy, I need to convert an array >> representing an rgb image into the Lab color space. I found the >> color.py >> script in the scipy sandbox which seems to be taking care of that. >> I am >> writing you because you had been referenced as the author of that >> file. > > No, I'm just the last person to touch it. > >> I was wondering why this script is still in the sandbox and not >> integrated >> into the scipy release. Are there any issues I should be aware of >> before >> using it? 
Are there any specific setup procedures I need to follow >> in order >> to use the script. > > It's not really fully-baked. Copy it into your own code, for now. If > you're just interested in the standard transforms, I have a somewhat > cleaner version here: > > http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/colormap_explorer/file/tip/colormap_explorer/conversion.py > > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Feb 5 18:19:42 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 6 Feb 2009 00:19:42 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> Message-ID: <20090205231942.GB21014@phare.normalesup.org> On Mon, Feb 02, 2009 at 06:24:18PM +0100, Sturla Molden wrote: > > Am I out of my mind, and will this fail utterly? > It will work. But we should use named shared memory (which requires some C > or Cython coding), not BSD mmap as mp.Array currently does. Also we must > override how ndarrays are pickled. I just wanted to say that I am still interested in exploring this a bit deeper, but I got swamped suddenly. Besides, as Robert and you have sown, there is more than I thought to it. Cheers, Ga?l From robert.kern at gmail.com Thu Feb 5 18:23:32 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 5 Feb 2009 17:23:32 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090205231942.GB21014@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> Message-ID: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> On Thu, Feb 5, 2009 at 17:19, Gael Varoquaux wrote: > On Mon, Feb 02, 2009 at 06:24:18PM +0100, Sturla Molden wrote: >> > Am I out of my mind, and will this fail utterly? > >> It will work. But we should use named shared memory (which requires some C >> or Cython coding), not BSD mmap as mp.Array currently does. Also we must >> override how ndarrays are pickled. > > I just wanted to say that I am still interested in exploring this a bit > deeper, but I got swamped suddenly. Besides, as Robert and you have sown, > there is more than I thought to it. 
BTW, Philip Semanchuk, the maintainer of the aforementioned shm module, contacted Sturla and myself offlist to point out two, more up-to-date, modules which provide named shared memory on UNIX systems: http://semanchuk.com/philip/sysv_ipc/ http://semanchuk.com/philip/posix_ipc/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gael.varoquaux at normalesup.org Thu Feb 5 18:41:15 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 6 Feb 2009 00:41:15 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> Message-ID: <20090205234115.GC29684@phare.normalesup.org> On Thu, Feb 05, 2009 at 05:23:32PM -0600, Robert Kern wrote: > BTW, Philip Semanchuk, the maintainer of the aforementioned shm > module, contacted Sturla and myself offlist to point out two, more > up-to-date, modules which provide named shared memory on UNIX systems: > http://semanchuk.com/philip/sysv_ipc/ > http://semanchuk.com/philip/posix_ipc/ Interesting. I wonder how to use these. I would really like to see shared memory in numpy by itself at some point. I did not look at the code as it is GPL, from what I see. The core idea, from what I understand, would be to use the POSIX shm_open call to expose some named shared to numpy using eg from_buffer. Or can we simply make it point to the pointer of an existing array using shmat, if is is contiguous? That would avoid a copy (if contiguous). Finally, to make sure share memory works with multiprocessing, we would have to override pickling so that the pickling and unpicking are done simply by storing the name of the shared memory object or retrieving it. This is risky, because actual persistence would be destroyed. Under Window we would use CreateSharedMemory to perform the same trick using CreateFileMapping and MapViewOfFile? Sounds fun. Ga?l From josef.pktd at gmail.com Thu Feb 5 19:03:34 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Feb 2009 19:03:34 -0500 Subject: [SciPy-user] help with scipy.stats.mannwhitneyu In-Reply-To: <1cd32cbb0902051254g13ea0ae5yf879a5f648c8030e@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <498B4326.7060207@molden.no> <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> <1cd32cbb0902051254g13ea0ae5yf879a5f648c8030e@mail.gmail.com> Message-ID: <1cd32cbb0902051603w17d9a88ai39101ddf277341ca@mail.gmail.com> On Thu, Feb 5, 2009 at 3:54 PM, wrote: >> >> sample size 20, 9 ties >> this is with R wilcox.exact, ranksums is your ranksum > ... 
>> >> With this correction, the normal distribution based p-value in >> ranksums looks exactly the same as stats.mannwhitneyu. > > this statement is not correct. > > I mixed up my variables and didn't actually have ties, now with ties, > I still get essentially but not exactly the same results. > I think there is a mistake in the tie handling of stats.mannwhitneyu In the calculation of the standard error the sqrt is taken twice. T = np.sqrt(tiecorrect(ranked)) # correction factor for tied scores if T == 0: raise ValueError, 'All numbers are identical in amannwhitneyu' sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0) I don't have the formulas for the tie correction, but from looking at the tie correction in Sturlas version of ranksums, it seems that the first sqrt shouldn't be there. Can someone with access to the correct references verify this. Josef From ellisonbg.net at gmail.com Thu Feb 5 19:34:51 2009 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 5 Feb 2009 16:34:51 -0800 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090205234115.GC29684@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> Message-ID: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> This is quite interesting indeed. I am not familiar with this stuff at all, but I guess I have some reading to do. One important question though: Can these mechanisms be used to create shared memory amongst processes that are started in a completely independent manner. That is, they are not fork()'d. If so, then we should develop a shared memory version of numpy arrays that will work in any multiple-process setting. I am thinking multiprocessing *and* the IPython.kernel. Cheers, Brian On Thu, Feb 5, 2009 at 3:41 PM, Gael Varoquaux wrote: > On Thu, Feb 05, 2009 at 05:23:32PM -0600, Robert Kern wrote: >> BTW, Philip Semanchuk, the maintainer of the aforementioned shm >> module, contacted Sturla and myself offlist to point out two, more >> up-to-date, modules which provide named shared memory on UNIX systems: > >> http://semanchuk.com/philip/sysv_ipc/ >> http://semanchuk.com/philip/posix_ipc/ > > Interesting. I wonder how to use these. I would really like to see shared > memory in numpy by itself at some point. I did not look at the code as it > is GPL, from what I see. > > The core idea, from what I understand, would be to use the POSIX shm_open > call to expose some named shared to numpy using eg from_buffer. Or can we > simply make it point to the pointer of an existing array using shmat, if > is is contiguous? That would avoid a copy (if contiguous). > > Finally, to make sure share memory works with multiprocessing, we would > have to override pickling so that the pickling and unpicking are done > simply by storing the name of the shared memory object or retrieving it. > This is risky, because actual persistence would be destroyed. > > Under Window we would use CreateSharedMemory to perform the same trick > using CreateFileMapping and MapViewOfFile? > > Sounds fun. 
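One low-tech way to get much of what is being asked for here, with no new C code, is to back an ndarray with a file on a memory-backed filesystem; on Linux, /dev/shm is a tmpfs, so a numpy.memmap over a file there is POSIX shared memory in all but name, and it works between processes that were started completely independently. A rough sketch only (the path, dtype and shape are made up and would have to be agreed on out of band; this illustrates the pattern, not the API being proposed in the thread):

import numpy as np

SHM_PATH = "/dev/shm/scipy_shared_demo"    # hypothetical name on a Linux tmpfs
SHAPE = (1000, 1000)

# Process A: create and fill the shared block.
a = np.memmap(SHM_PATH, dtype=np.float64, mode="w+", shape=SHAPE)
a[:] = 0.0
a[0, 0] = 1.0
a.flush()

# Process B, started completely separately: attach to the same block,
# read and modify it in place, with no copy through a pipe or socket.
b = np.memmap(SHM_PATH, dtype=np.float64, mode="r+", shape=SHAPE)
b[0, 1] = 2.0                              # immediately visible to process A

Named shared memory proper (shm_open / CreateFileMapping, as in the posix_ipc and sysv_ipc modules linked above) gives the same effect with better control over naming and lifetime, which is roughly what the numpy support sketched in this thread would build on.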
> > Ga?l > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From philip at semanchuk.com Thu Feb 5 20:00:30 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Thu, 5 Feb 2009 20:00:30 -0500 Subject: [SciPy-user] shared memory machines Message-ID: <5AEE5578-71FA-4EF4-B305-9A29B65A827C@semanchuk.com> Brian Granger wrote: > This is quite interesting indeed. I am not familiar with this stuff > at all, but I guess I have some reading to do. One important question > though: > Can these mechanisms be used to create shared memory amongst processes > that are started in a completely independent manner. That is, they > are not fork()'d. > If so, then we should develop a shared memory version of numpy arrays > that will work in any multiple-process setting. I am thinking > multiprocessing *and* the IPython.kernel. Hi all, I'm the author of the aforementioned IPC modules and I thought I'd jump in even though I'm not a numpy guy. Yes, one can use IPC objects (Sys V or POSIX) in completely independent processes. There's a demo that comes along with both modules that demonstrates that. I guess numpy isn't GPLed? You could still download either one of the above packages and run the demo to observe the process independence. Ga?l, AFAIK shared memory is guaranteed to be contiguous. I'm making my assumption based on the fact that neither the Sys V nor POSIX API has any references to accessing different chunks of memory. It's treated as one logical block. In fact, the POSIX API for creating shared memory (shm_open) simply returns a file descriptor that one accesses as a memory mapped file: http://www.opengroup.org/onlinepubs/000095399/functions/shm_open.html HTH Philip From robert.kern at gmail.com Thu Feb 5 20:02:59 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 5 Feb 2009 19:02:59 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <5AEE5578-71FA-4EF4-B305-9A29B65A827C@semanchuk.com> References: <5AEE5578-71FA-4EF4-B305-9A29B65A827C@semanchuk.com> Message-ID: <3d375d730902051702kc6b6a9eh861d78132462cc78@mail.gmail.com> On Thu, Feb 5, 2009 at 19:00, Philip Semanchuk wrote: > Brian Granger wrote: > >> This is quite interesting indeed. I am not familiar with this stuff >> at all, but I guess I have some reading to do. One important question >> though: >> Can these mechanisms be used to create shared memory amongst processes >> that are started in a completely independent manner. That is, they >> are not fork()'d. >> If so, then we should develop a shared memory version of numpy arrays >> that will work in any multiple-process setting. I am thinking >> multiprocessing *and* the IPython.kernel. > > Hi all, > I'm the author of the aforementioned IPC modules and I thought I'd > jump in even though I'm not a numpy guy. > > Yes, one can use IPC objects (Sys V or POSIX) in completely > independent processes. There's a demo that comes along with both > modules that demonstrates that. I guess numpy isn't GPLed? No, we're BSD-licensed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From strawman at astraw.com Thu Feb 5 23:13:34 2009 From: strawman at astraw.com (Andrew Straw) Date: Thu, 05 Feb 2009 20:13:34 -0800 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090205234115.GC29684@phare.normalesup.org> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> Message-ID: <498BB8EE.10900@astraw.com> FWIW, I wrote some BSD licensed Pyrex code that does some shared memory stuff. I wouldn't attempt to resurrect the complete working module, but cut and paste at will: http://code.astraw.com/projects/motmot/browser/trunk/pycamiface/src/_cam_iface_shm.pyx?rev=328 (This was from a wrapper of a camera driver that used shared memory since the camera driver was very badly behaved and couldn't be trusted to run in the same process. I have since stopped using this code and wouldn't have time to get it working again, but it did open and use shared memory quite nicely on linux.) Also I found this web site very useful: http://www.ecst.csuchico.edu/~beej/guide/ipc/ Gael Varoquaux wrote: > On Thu, Feb 05, 2009 at 05:23:32PM -0600, Robert Kern wrote: >> BTW, Philip Semanchuk, the maintainer of the aforementioned shm >> module, contacted Sturla and myself offlist to point out two, more >> up-to-date, modules which provide named shared memory on UNIX systems: > >> http://semanchuk.com/philip/sysv_ipc/ >> http://semanchuk.com/philip/posix_ipc/ > > Interesting. I wonder how to use these. I would really like to see shared > memory in numpy by itself at some point. I did not look at the code as it > is GPL, from what I see. > > The core idea, from what I understand, would be to use the POSIX shm_open > call to expose some named shared to numpy using eg from_buffer. Or can we > simply make it point to the pointer of an existing array using shmat, if > is is contiguous? That would avoid a copy (if contiguous). > > Finally, to make sure share memory works with multiprocessing, we would > have to override pickling so that the pickling and unpicking are done > simply by storing the name of the shared memory object or retrieving it. > This is risky, because actual persistence would be destroyed. > > Under Window we would use CreateSharedMemory to perform the same trick > using CreateFileMapping and MapViewOfFile? > > Sounds fun. 
> > Ga?l > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From ellisonbg.net at gmail.com Thu Feb 5 23:20:04 2009 From: ellisonbg.net at gmail.com (Brian Granger) Date: Thu, 5 Feb 2009 20:20:04 -0800 Subject: [SciPy-user] shared memory machines In-Reply-To: <498BB8EE.10900@astraw.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <498BB8EE.10900@astraw.com> Message-ID: <6ce0ac130902052020l403bd877h42eb8f9b90f8158e@mail.gmail.com> Thanks! On Thu, Feb 5, 2009 at 8:13 PM, Andrew Straw wrote: > FWIW, I wrote some BSD licensed Pyrex code that does some shared memory > stuff. I wouldn't attempt to resurrect the complete working module, but > cut and paste at will: > > http://code.astraw.com/projects/motmot/browser/trunk/pycamiface/src/_cam_iface_shm.pyx?rev=328 > > (This was from a wrapper of a camera driver that used shared memory > since the camera driver was very badly behaved and couldn't be trusted > to run in the same process. I have since stopped using this code and > wouldn't have time to get it working again, but it did open and use > shared memory quite nicely on linux.) > > Also I found this web site very useful: > http://www.ecst.csuchico.edu/~beej/guide/ipc/ > > Gael Varoquaux wrote: >> On Thu, Feb 05, 2009 at 05:23:32PM -0600, Robert Kern wrote: >>> BTW, Philip Semanchuk, the maintainer of the aforementioned shm >>> module, contacted Sturla and myself offlist to point out two, more >>> up-to-date, modules which provide named shared memory on UNIX systems: >> >>> http://semanchuk.com/philip/sysv_ipc/ >>> http://semanchuk.com/philip/posix_ipc/ >> >> Interesting. I wonder how to use these. I would really like to see shared >> memory in numpy by itself at some point. I did not look at the code as it >> is GPL, from what I see. >> >> The core idea, from what I understand, would be to use the POSIX shm_open >> call to expose some named shared to numpy using eg from_buffer. Or can we >> simply make it point to the pointer of an existing array using shmat, if >> is is contiguous? That would avoid a copy (if contiguous). >> >> Finally, to make sure share memory works with multiprocessing, we would >> have to override pickling so that the pickling and unpicking are done >> simply by storing the name of the shared memory object or retrieving it. >> This is risky, because actual persistence would be destroyed. >> >> Under Window we would use CreateSharedMemory to perform the same trick >> using CreateFileMapping and MapViewOfFile? >> >> Sounds fun. 
>> >> Ga?l >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From gael.varoquaux at normalesup.org Fri Feb 6 01:36:45 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 6 Feb 2009 07:36:45 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> Message-ID: <20090206063645.GA1704@phare.normalesup.org> On Thu, Feb 05, 2009 at 04:34:51PM -0800, Brian Granger wrote: > If so, then we should develop a shared memory version of numpy arrays > that will work in any multiple-process setting. I am thinking > multiprocessing *and* the IPython.kernel. I am +1 on that, obviously. I'd love to see a 'fork'-based IPython version, though :). Ga?l From starsareblueandfaraway at gmail.com Fri Feb 6 08:42:01 2009 From: starsareblueandfaraway at gmail.com (Roy H. Han) Date: Fri, 6 Feb 2009 08:42:01 -0500 Subject: [SciPy-user] SciPy-user Digest, Vol 66, Issue 9 In-Reply-To: References: Message-ID: <6a5569ec0902060542n42e27cf9r22b9d24de9cbee8d@mail.gmail.com> Thanks, Josef. This doesn't really answer my question, but thanks for your response. Date: Wed, 4 Feb 2009 12:44:27 -0500 From: josef.pktd at gmail.com Subject: Re: [SciPy-user] Mysterious kmeans() error To: SciPy Users List Message-ID: <1cd32cbb0902040944m306bbf0bia357c01d0f97fe6d at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han wrote: > As a side comment, if I use Pycluster, then the clustering proceeds > without error. > > On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han > wrote: >> Has anyone seen this error before? I have no idea what it means. I'm >> using version 0.6.0 packaged for Fedora. >> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq >> >> >> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", >> line 55, in grapeCluster >> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, >> iter=iterationCountPerBurst)[1] >> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >> 563, in kmeans2 >> clusters = init(data, k) >> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >> 469, in _krandinit >> x = N.dot(x, N.linalg.cholesky(cov).T) + mu >> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", >> line 418, in cholesky >> Cholesky decomposition cannot be computed' >> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - >> Cholesky decomposition cannot be computed This is just a general answer, I never used scipy.cluster The error message means that the covariance matrix of your np.cov(data) is not positive definite. 
Check your data, whether there is any linear dependence, eg. look at eigenvalues of np.cov(data). If that's not the source of the error, then a cluster expert is needed. Josef From gaedol at gmail.com Fri Feb 6 08:52:44 2009 From: gaedol at gmail.com (Marco) Date: Fri, 6 Feb 2009 14:52:44 +0100 Subject: [SciPy-user] Lowpass Filter In-Reply-To: <498B21CF.6040105@asu.edu> References: <498B21CF.6040105@asu.edu> Message-ID: Thank you all for the pointers and ideas: I will try to do something, and let you know what comes out. Thanks, marco -- Quando sei una human pignata e la pazzo jacket si ? accorciata e non ti puoi liberare dai colpi di legno e di bastone dai petardi sul groppone Vinicio Capossela On Thu, Feb 5, 2009 at 6:28 PM, Christopher Brown wrote: > Hi Marco, > > M> Let's suppose a to be a 1D array with N elements. > M> Basically, it's a signal of some sort. > M> > M> How do I apply a low pass filter (with selected frequency and width) > M> to this signal? > M> How to store the resulting, filtered, signal, in a new array? > M> > M> I had a look at lp2lp() in scipy.signal, but it returns, if I am > M> right, a filter object, which then I dunno how to use to filter my > M> data. > M> > M> Any ideas or pointers? > > The following is a low-pass Butterworth filter > > cutoff = 500. > fs = 44100. > nyq = fs/2. > filterorder = 5 > > b,a = scipy.signal.filter_design.butter(filterorder,cutoff/nyq) > filteredsignal = scipy.signal.lfilter(b,a,signal) > > -- > Chris > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From starsareblueandfaraway at gmail.com Fri Feb 6 09:29:11 2009 From: starsareblueandfaraway at gmail.com (Roy H. Han) Date: Fri, 6 Feb 2009 09:29:11 -0500 Subject: [SciPy-user] Mysterious kmeans() error Message-ID: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> Thanks, Josef. It seems that it happens when one of the clusters becomes empty. Pycluster never seems to have the problem of empty clusters though. /usr/lib64/python2.5/site-packages/scipy/cluster/vq.py:477: UserWarning: One of the clusters is empty. Re-run kmean with a different initialization. warnings.warn("One of the clusters is empty. " Traceback (most recent call last): File "clusterProbabilities.py", line 88, in run(taskName, parameterByName) File "clusterProbabilities.py", line 57, in run locationGeoFrame = probability_process.cluster(targetLocationPath, probabilityPath, iterationCountPerBurst, maximumGeoDiameter, minimumGeoDiameter) File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", line 33, in cluster windowLocations = grapeCluster(vectors, iterationCountPerBurst, maximumPixelDiameter, minimumPixelDiameter) File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", line 66, in grapeCluster assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, iter=iterationCountPerBurst)[1] File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line 563, in kmeans2 clusters = init(data, k) File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line 469, in _krandinit x = N.dot(x, N.linalg.cholesky(cov).T) + mu File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", line 418, in cholesky Cholesky decomposition cannot be computed' numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed On Fri, Feb 6, 2009 at 9:08 AM, wrote: > On Fri, Feb 6, 2009 at 8:42 AM, Roy H. 
Han > wrote: >> Thanks, Josef. This doesn't really answer my question, but thanks for >> your response. >> >> >> Date: Wed, 4 Feb 2009 12:44:27 -0500 >> From: josef.pktd at gmail.com >> Subject: Re: [SciPy-user] Mysterious kmeans() error >> To: SciPy Users List >> Message-ID: >> <1cd32cbb0902040944m306bbf0bia357c01d0f97fe6d at mail.gmail.com> >> Content-Type: text/plain; charset=ISO-8859-1 >> >> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han >> wrote: >>> As a side comment, if I use Pycluster, then the clustering proceeds >>> without error. >>> >>> On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han >>> wrote: >>>> Has anyone seen this error before? I have no idea what it means. I'm >>>> using version 0.6.0 packaged for Fedora. >>>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq >>>> >>>> >>>> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", >>>> line 55, in grapeCluster >>>> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, >>>> iter=iterationCountPerBurst)[1] >>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >>>> 563, in kmeans2 >>>> clusters = init(data, k) >>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >>>> 469, in _krandinit >>>> x = N.dot(x, N.linalg.cholesky(cov).T) + mu >>>> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", >>>> line 418, in cholesky >>>> Cholesky decomposition cannot be computed' >>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - >>>> Cholesky decomposition cannot be computed >> >> This is just a general answer, I never used scipy.cluster >> >> The error message means that the covariance matrix of your >> np.cov(data) is not positive definite. Check your data, whether there >> is any linear dependence, eg. look at eigenvalues of np.cov(data). >> >> If that's not the source of the error, then a cluster expert is needed. >> >> Josef >> > > I had looked a bit more, and I get the same error if the data has more > columns than rows. > The assumption in scipy.cluster is that columns represent random > variables and rows represent > observations. So, if the matrix is transposed then also the same > exception is raised as in your case > > Josef > > BTW: it's better to reply to individual threads than to the Digest, > since that preserves the subject line and threading. > From starsareblueandfaraway at gmail.com Fri Feb 6 09:37:23 2009 From: starsareblueandfaraway at gmail.com (Roy H. Han) Date: Fri, 6 Feb 2009 09:37:23 -0500 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> References: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> Message-ID: <6a5569ec0902060637y6cd7d1ddt66da6e9becce8504@mail.gmail.com> Well I feel like there are numerical problems with scipy's kmeans2(), at least in the 0.6.0 version of scipy. I changed the code to try to ensure that no clusters were empty. Pycluster seems to be the better clustering algorithm for now. Even though the size (number of columns = 3) of each vector in the cluster is three, kmeans should still work even if one of the clusters contained a single vector (number of rows = 1). This is a bug. On Fri, Feb 6, 2009 at 9:29 AM, Roy H. Han wrote: > Thanks, Josef. > > It seems that it happens when one of the clusters becomes empty. > Pycluster never seems to have the problem of empty clusters though. 
> > > /usr/lib64/python2.5/site-packages/scipy/cluster/vq.py:477: > UserWarning: One of the clusters is empty. Re-run kmean with a > different initialization. > warnings.warn("One of the clusters is empty. " > > Traceback (most recent call last): > File "clusterProbabilities.py", line 88, in > run(taskName, parameterByName) > File "clusterProbabilities.py", line 57, in run > locationGeoFrame = probability_process.cluster(targetLocationPath, > probabilityPath, iterationCountPerBurst, maximumGeoDiameter, > minimumGeoDiameter) > File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", > line 33, in cluster > windowLocations = grapeCluster(vectors, iterationCountPerBurst, > maximumPixelDiameter, minimumPixelDiameter) > File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", > line 66, in grapeCluster > assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, > iter=iterationCountPerBurst)[1] > File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line > 563, in kmeans2 > clusters = init(data, k) > File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line > 469, in _krandinit > x = N.dot(x, N.linalg.cholesky(cov).T) + mu > File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", > line 418, in cholesky > Cholesky decomposition cannot be computed' > numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - > Cholesky decomposition cannot be computed > > > > On Fri, Feb 6, 2009 at 9:08 AM, wrote: >> On Fri, Feb 6, 2009 at 8:42 AM, Roy H. Han >> wrote: >>> Thanks, Josef. This doesn't really answer my question, but thanks for >>> your response. >>> >>> >>> Date: Wed, 4 Feb 2009 12:44:27 -0500 >>> From: josef.pktd at gmail.com >>> Subject: Re: [SciPy-user] Mysterious kmeans() error >>> To: SciPy Users List >>> Message-ID: >>> <1cd32cbb0902040944m306bbf0bia357c01d0f97fe6d at mail.gmail.com> >>> Content-Type: text/plain; charset=ISO-8859-1 >>> >>> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han >>> wrote: >>>> As a side comment, if I use Pycluster, then the clustering proceeds >>>> without error. >>>> >>>> On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han >>>> wrote: >>>>> Has anyone seen this error before? I have no idea what it means. I'm >>>>> using version 0.6.0 packaged for Fedora. >>>>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq >>>>> >>>>> >>>>> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", >>>>> line 55, in grapeCluster >>>>> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, >>>>> iter=iterationCountPerBurst)[1] >>>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >>>>> 563, in kmeans2 >>>>> clusters = init(data, k) >>>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line >>>>> 469, in _krandinit >>>>> x = N.dot(x, N.linalg.cholesky(cov).T) + mu >>>>> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", >>>>> line 418, in cholesky >>>>> Cholesky decomposition cannot be computed' >>>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - >>>>> Cholesky decomposition cannot be computed >>> >>> This is just a general answer, I never used scipy.cluster >>> >>> The error message means that the covariance matrix of your >>> np.cov(data) is not positive definite. Check your data, whether there >>> is any linear dependence, eg. look at eigenvalues of np.cov(data). >>> >>> If that's not the source of the error, then a cluster expert is needed. 
>>> >>> Josef >>> >> >> I had looked a bit more, and I get the same error if the data has more >> columns than rows. >> The assumption in scipy.cluster is that columns represent random >> variables and rows represent >> observations. So, if the matrix is transposed then also the same >> exception is raised as in your case >> >> Josef >> >> BTW: it's better to reply to individual threads than to the Digest, >> since that preserves the subject line and threading. >> > From sturla at molden.no Fri Feb 6 10:15:05 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 06 Feb 2009 16:15:05 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> Message-ID: <498C53F9.3070708@molden.no> On 2/6/2009 1:34 AM, Brian Granger wrote: > Can these mechanisms be used to create shared memory amongst processes > that are started in a completely independent manner. That is, they > are not fork()'d. Yes it can. If we know the name of the segment (an integer on Unix, a string on Windows), it can be mapped into any process. Similarly for named semaphores. This is different from the locks ans shared memory in multiprocessing, which must be shared through forking (Unix) or handle inheritance (Windows), and therefore created prior to instantiation of multiprocessing.Process. Otherwise, there is no valid handle to inherit. The question remains: should we base this on Cython or C (there is very little coding to do), or some third party extension, e.g. Philip Semanchuk's POSIX IPC and Mark Hammond's pywin32? I am thinking that at least for POSIX IPC, GPL is a severe limitation. Also we need some automatic clean up, which can only be accomplished with an extension object (that is, __dealloc__ in Cython will always be called, as opposed to __del__ in Python). In Pywin32 there is a PyHANDLE object that automatically calls CloseHandle when it is collected. But I don't think Semanchuk's POSIX IPC module will do the same. And avoiding dependencies on huge projects like pywin32 is also good. 
Sturla Molden From gael.varoquaux at normalesup.org Fri Feb 6 10:24:29 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 6 Feb 2009 16:24:29 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <498C53F9.3070708@molden.no> References: <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> Message-ID: <20090206152429.GA13894@phare.normalesup.org> On Fri, Feb 06, 2009 at 04:15:05PM +0100, Sturla Molden wrote: > The question remains: should we base this on Cython or C (there is very > little coding to do), or some third party extension, e.g. Philip > Semanchuk's POSIX IPC and Mark Hammond's pywin32? I am thinking that at > least for POSIX IPC, GPL is a severe limitation. Also we need some > automatic clean up, which can only be accomplished with an extension > object (that is, __dealloc__ in Cython will always be called, as opposed > to __del__ in Python). In Pywin32 there is a PyHANDLE object that > automatically calls CloseHandle when it is collected. But I don't think > Semanchuk's POSIX IPC module will do the same. And avoiding dependencies > on huge projects like pywin32 is also good. I am all for avoiding external dependencies (especially if they are GPL). multiprocessing is in the standard library, I would like to be able to do shared memory parallel computing with only numpy and the standard library. Actually I can see a near future where some algorithms of scipy could have the option of using multiple cores (I am thinking of eg non-parametric statistics). The __dealloc__ argument is a very good one for going with Cython. In addition I really like the feeling of Cython code. And am I wrong in thinking that it would make the transition to Python3 easier? Ga?l From philip at semanchuk.com Fri Feb 6 10:26:46 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Fri, 6 Feb 2009 10:26:46 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <498C53F9.3070708@molden.no> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> Message-ID: <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> On Feb 6, 2009, at 10:15 AM, Sturla Molden wrote: > On 2/6/2009 1:34 AM, Brian Granger wrote: > >> Can these mechanisms be used to create shared memory amongst >> processes >> that are started in a completely independent manner. That is, they >> are not fork()'d. > > Yes it can. If we know the name of the segment (an integer on Unix, a > string on Windows), it can be mapped into any process. Similarly for > named semaphores. 
A small correction -- SysV IPC objects are referred to with an integer key. POSIX IPC objects are referred to with a file system-ish name e.g. "/my_semaphore". > The question remains: should we base this on Cython or C (there is > very > little coding to do), or some third party extension, e.g. Philip > Semanchuk's POSIX IPC and Mark Hammond's pywin32? I am thinking that > at > least for POSIX IPC, GPL is a severe limitation. Also we need some > automatic clean up, which can only be accomplished with an extension > object (that is, __dealloc__ in Cython will always be called, as > opposed > to __del__ in Python). In Pywin32 there is a PyHANDLE object that > automatically calls CloseHandle when it is collected. But I don't > think > Semanchuk's POSIX IPC module will do the same. And avoiding > dependencies > on huge projects like pywin32 is also good. You are correct that posix_ipc doesn't close handles when deallocated. THis is a deliberate choice -- the documentation says that closing the handle makes the IPC object no longer available to the *process*. So if one has multiple handles to an IPC object (say, inside multiple threads), closing one would invalidate them all. But as I write this, I'm wondering if that's not just a documentation bug and something with which I ought to experiment. bye Philip From sturla at molden.no Fri Feb 6 10:38:17 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 06 Feb 2009 16:38:17 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> Message-ID: <498C5969.1040809@molden.no> On 2/6/2009 4:26 PM, Philip Semanchuk wrote: > You are correct that posix_ipc doesn't close handles when deallocated. > THis is a deliberate choice -- the documentation says that closing the > handle makes the IPC object no longer available to the *process*. So > if one has multiple handles to an IPC object (say, inside multiple > threads), closing one would invalidate them all. But as I write this, > I'm wondering if that's not just a documentation bug and something > with which I ought to experiment. I have been thinking about this as well. I am mostly familiar with Windows so excuse my terminology: We don't want an array to call CloseHandle() on a mapped segment that another array is still using. The effect would be global to the process. Thus, we would either need to maintain some sort of global reference count for all mapped shared resources, or make duplicates of the handle. On Windows there is a function called DuplicateHandle() that will do this. I am not sure about Unix. 
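Whatever the platform, the bookkeeping for the first option is simple enough to sketch in plain Python (the names are invented; in an extension module the GIL would serialize access to the table anyway):

# Module-level table of open segments, keyed by name.
_open_segments = {}   # name -> [handle, refcount]

def acquire(name, open_func):
    # open_func(name) is whatever platform call maps the segment
    entry = _open_segments.get(name)
    if entry is None:
        entry = _open_segments[name] = [open_func(name), 0]
    entry[1] += 1
    return entry[0]

def release(name, close_func):
    # close the underlying handle only when the last user lets go
    entry = _open_segments[name]
    entry[1] -= 1
    if entry[1] == 0:
        close_func(entry[0])
        del _open_segments[name]
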
Sturla Molden From sturla at molden.no Fri Feb 6 10:51:52 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 06 Feb 2009 16:51:52 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <498C5969.1040809@molden.no> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> Message-ID: <498C5C98.6000108@molden.no> On 2/6/2009 4:38 PM, Sturla Molden wrote: > I have been thinking about this as well. I am mostly familiar with > Windows so excuse my terminology: We don't want an array to call > CloseHandle() on a mapped segment that another array is still using. The > effect would be global to the process. Thus, we would either need to > maintain some sort of global reference count for all mapped shared > resources, or make duplicates of the handle. On Windows there is a > function called DuplicateHandle() that will do this. I am not sure about > Unix. On Unix we could possibly use a WeakValueDictionary with name as key and handle as value. And then let the handle object close itself when it is collected. So an array could first look for an open handle in the dictionary, before trying to map a new one. And since we are doing all this in Cython, the GIL will take care of the synchronization. This would also work on Windows, but there we have DuplicateHandle as another option. S.M. From philip at semanchuk.com Fri Feb 6 10:54:42 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Fri, 6 Feb 2009 10:54:42 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <498C5969.1040809@molden.no> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> Message-ID: <8FCF568D-EA63-45ED-9933-2DAB1B9714AC@semanchuk.com> On Feb 6, 2009, at 10:38 AM, Sturla Molden wrote: > On 2/6/2009 4:26 PM, Philip Semanchuk wrote: > >> You are correct that posix_ipc doesn't close handles when >> deallocated. >> THis is a deliberate choice -- the documentation says that closing >> the >> handle makes the IPC object no longer available to the *process*. So >> if one has multiple handles to an IPC object (say, inside multiple >> threads), closing one would invalidate them all. But as I write this, >> I'm wondering if that's not just a documentation bug and something >> with which I ought to experiment. > > I have been thinking about this as well. 
I am mostly familiar with > Windows so excuse my terminology: We don't want an array to call > CloseHandle() on a mapped segment that another array is still using. > The > effect would be global to the process. Thus, we would either need to > maintain some sort of global reference count for all mapped shared > resources, or make duplicates of the handle. On Windows there is a > function called DuplicateHandle() that will do this. I am not sure > about > Unix. On Unix, one can duplicate a file handle with a call to dup(). Note that the doc for shm_unlink() says this: "If one or more references to the shared memory object exist when the object is unlinked, the name shall be removed before shm_unlink() returns, but the removal of the memory object contents shall be postponed until all open and map references to the shared memory object have been removed." Furthermore (and this is where it gets tricky): "Even if the object continues to exist after the last shm_unlink(), reuse of the name shall subsequently cause shm_open() to behave as if no shared memory object of this name exists (that is, shm_open() will fail if O_CREAT is not set, or will create a new shared memory object if O_CREAT is set)." You'd have to do your testing very carefully to see if dup() really increments the kernel's reference count on a shared memory segment. From cournape at gmail.com Fri Feb 6 11:05:55 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 7 Feb 2009 01:05:55 +0900 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <6a5569ec0902060637y6cd7d1ddt66da6e9becce8504@mail.gmail.com> References: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> <6a5569ec0902060637y6cd7d1ddt66da6e9becce8504@mail.gmail.com> Message-ID: <5b8d13220902060805j1b16d281v203201ad53c0df54@mail.gmail.com> On Fri, Feb 6, 2009 at 11:37 PM, Roy H. Han wrote: > Well I feel like there are numerical problems with scipy's kmeans2(), > at least in the 0.6.0 version of scipy. kmeans and kmeans2 are fairly low level - they will fail if you have empty cluster, indeed. > I changed the code to try to ensure that no clusters were empty. > Pycluster seems to be the better clustering algorithm for now. Maybe - I am not familiar with pycluster. > Even though the size (number of columns = 3) of each vector in the > cluster is three, kmeans should still work even if one of the clusters > contained a single vector (number of rows = 1). Strictly speaking, kmeans is undefined in that case - there are various strategies which can be implemented, like cluster splitting, etc... Generally, I agree the code is not great. David From josef.pktd at gmail.com Fri Feb 6 11:25:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Feb 2009 11:25:31 -0500 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <5b8d13220902060805j1b16d281v203201ad53c0df54@mail.gmail.com> References: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> <6a5569ec0902060637y6cd7d1ddt66da6e9becce8504@mail.gmail.com> <5b8d13220902060805j1b16d281v203201ad53c0df54@mail.gmail.com> Message-ID: <1cd32cbb0902060825i65265e9idd359ee0a2522cea@mail.gmail.com> On Fri, Feb 6, 2009 at 11:05 AM, David Cournapeau wrote: > On Fri, Feb 6, 2009 at 11:37 PM, Roy H. Han > wrote: >> Well I feel like there are numerical problems with scipy's kmeans2(), >> at least in the 0.6.0 version of scipy. > > kmeans and kmeans2 are fairly low level - they will fail if you have > empty cluster, indeed. 
I thought that the tests test_kmeans_lost_cluster(self) verifies that empty clusters are handled. > >> I changed the code to try to ensure that no clusters were empty. >> Pycluster seems to be the better clustering algorithm for now. > > Maybe - I am not familiar with pycluster. > >> Even though the size (number of columns = 3) of each vector in the >> cluster is three, kmeans should still work even if one of the clusters >> contained a single vector (number of rows = 1). > > Strictly speaking, kmeans is undefined in that case - there are > various strategies which can be implemented, like cluster splitting, > etc... Generally, I agree the code is not great. > > David If the problem is just the cholesky decomposition in the random initialization, then it should be possible to switch to a different initialization scheme, or force a correct covariance matrix for the cholesky decomposition. Eg. replace with diagonal matrix or, ensure that cov has the right dimension and add a small diagonal array (as in Ridge regression or Tychonov penalization). Josef From sturla at molden.no Fri Feb 6 11:40:48 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 06 Feb 2009 17:40:48 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <8FCF568D-EA63-45ED-9933-2DAB1B9714AC@semanchuk.com> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <8FCF568D-EA63-45ED-9933-2DAB1B9714AC@semanchuk.com> Message-ID: <498C6810.7030702@molden.no> On 2/6/2009 4:54 PM, Philip Semanchuk wrote: > You'd have to do your testing very carefully to see if dup() really > increments the kernel's reference count on a shared memory segment. Ok, in that case it is probably better to let Python take care of the reference counting. S.M. From c-b at asu.edu Fri Feb 6 15:59:29 2009 From: c-b at asu.edu (Christopher Brown) Date: Fri, 06 Feb 2009 13:59:29 -0700 Subject: [SciPy-user] Zero crossings Message-ID: <498CA4B1.3030807@asu.edu> Hi List, What's the best way to find all zero crossings in my data? Is there something already written in scipy? -- Chris From sturla at molden.no Fri Feb 6 16:13:46 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 06 Feb 2009 22:13:46 +0100 Subject: [SciPy-user] Zero crossings In-Reply-To: <498CA4B1.3030807@asu.edu> References: <498CA4B1.3030807@asu.edu> Message-ID: <498CA80A.3000707@molden.no> On 2/6/2009 9:59 PM, Christopher Brown wrote: > Hi List, > > What's the best way to find all zero crossings in my data? Is there > something already written in scipy? > zc = numpy.where(numpy.sign(a[1:]) != numpy.sign(a[:-1])) ... or something like that. 
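For instance, on a toy sine this picks out the sign changes and even gives a crude frequency estimate (just a sketch; the 50 Hz test signal is made up):

import numpy as np

fs = 1000.0                                    # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)
a = np.sin(2 * np.pi * 50 * t)                 # 50 Hz test tone

# indices i where the sign changes between a[i] and a[i+1]
zc = np.where(np.sign(a[1:]) != np.sign(a[:-1]))[0]

# two sign changes per period -> crude frequency estimate
print(0.5 * len(zc) / (t[-1] - t[0]))          # roughly 50
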
Sturla Molden From sturla at molden.no Fri Feb 6 16:22:53 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 06 Feb 2009 22:22:53 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <498C5969.1040809@molden.no> References: <2AE6D153-799C-450E-8E69-CA80D12E2FF5@math.toronto.edu> <747c5db37a4e870a8e8f562a4636c6e7.squirrel@webmail.uio.no> <20090202063833.GB9627@phare.normalesup.org> <3d375d730902012251y1159737fk325a923a344f25cf@mail.gmail.com> <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> Message-ID: <498CAA2D.10102@molden.no> Ok, so this is approximately what I had in mind for Windows. It is a named mutex and shared memory that is pickled by name (given that I read the Python manuals on pickling extension objects correctly...) It still lacks an ndarray subclass that is pickled without making a copy of the buffer, and also a malloc similar to multiprocessing. And similar Cython code has to be written for posix... But this is a start. If anyone feel like contributing, please do. S.M. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sharedmemory_win.pyx URL: From jean-pascal.mercier at inrialpes.fr Fri Feb 6 16:31:16 2009 From: jean-pascal.mercier at inrialpes.fr (J-Pascal Mercier) Date: Fri, 6 Feb 2009 22:31:16 +0100 Subject: [SciPy-user] Zero crossings In-Reply-To: <498CA4B1.3030807@asu.edu> References: <498CA4B1.3030807@asu.edu> Message-ID: <20090206223116.65da02fd@utopia> On Fri, 06 Feb 2009 13:59:29 -0700 Christopher Brown wrote: > Hi List, > > What's the best way to find all zero crossings in my data? Is there > something already written in scipy? > Hi, To my knowledge, there is no such function in scipy. Are your data 1D, 2D, 3D, ... ? What kind of precision you need? Do you have to find every zero-crossing? An easy solution in 1D without sub-grid accuracy would be something like : scipy.where(A[:-1] * A[1:] < 0) cheers, J-Pascal -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From karl.young at ucsf.edu Fri Feb 6 17:08:01 2009 From: karl.young at ucsf.edu (Karl Young) Date: Fri, 06 Feb 2009 14:08:01 -0800 Subject: [SciPy-user] stupid array tricks In-Reply-To: <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <498AF9E7.90200@molden.no> <1cd32cbb0902050736v5ae55230l215910f312562f2a@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <498B4326.7060207@molden.no> <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> Message-ID: <498CB4C1.6040302@ucsf.edu> I know there are a number of array manipulation maestros on the list and wanted to run a problem by the list that my feeble mind is foundering on. 
I have three objects, 1) an array containing an "image" (could be any dimension), 2) a mask for the image (of the same dimensions as the image), and 3) a "template" which is just a list of offset coordinates from any point in the image. Say the template has n elements, the problem is to move the template over the image and build an m x n array containing image values (at the template positions) where m is the number of image indices such that the template lies within the image and all mask values are true. I currently do this using some raveling, compressing, and length comparison but I still haven't been able to figure out how to do it without looping through the image indices (and this is sloooowwww for big multidimensional "images"). I keep feeling like there must be some clever way to concatenate the template with the image and mask so as to do this without looping but haven't been able to come up with it. I could speed it up by doing the looping in C but that doesn't seem very elegant. Any thoughts welcome. --KY From guilherme at gpfreitas.com Fri Feb 6 18:19:50 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Fri, 6 Feb 2009 15:19:50 -0800 Subject: [SciPy-user] Computational Economics with SciPy In-Reply-To: References: Message-ID: Actually it is being shipped already. Mine arrived today from Amazon! I'm a econ graduate student trying to do the computational assignments in Python, and these resources seem *very* helpful. Especially when none of your professors or classmates ever used Python before, so you have nobody to ask for help. I don't know what's the policy for what should and what should not be linked on SciPy's website, but I think these resources should definitely be linked there (notably the "Cookbook" section and the "Topical Software" section). Best, Guilherme On Wed, Jan 28, 2009 at 12:39 PM, Peter Skomoroch wrote: > Just stumbled across a new book by John Stachurski using scipy which will > ship later this month > > Economic Dynamics: Theory and Computation > John Stachurski > MIT Press, 2009 > http://www.amazon.com/Economic-Dynamics-Computation-John-Stachurski/dp/0262012774 > http://johnstachurski.net/book/book.html > > There are some nice tutorials using scipy here as well: > > http://johnstachurski.net/lectures/index.html > > >> Economic Dynamics: Theory and Computation is a graduate level introduction >> to deterministic and stochastic dynamics, dynamic programming and >> computational methods with economic applications. >> >> Topics >> >> Programming techniques >> Basic analysis (real analysis, metric spaces, fixed points) >> Deterministic dynamic systems >> Finite state Markov chains >> Finite state dynamic programming >> Continuous state stochastic dynamics >> Continuous state dynamic programming > > -Pete > > > > > -- > Peter N. Skomoroch > peter.skomoroch at gmail.com > http://www.datawrangling.com > http://del.icio.us/pskomoroch > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > -- Guilherme P. de Freitas http://www.gpfreitas.com From robert.kern at gmail.com Fri Feb 6 18:27:26 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 6 Feb 2009 17:27:26 -0600 Subject: [SciPy-user] Computational Economics with SciPy In-Reply-To: References: Message-ID: <3d375d730902061527t47914da8i71427d3c65ab9a87@mail.gmail.com> On Fri, Feb 6, 2009 at 17:19, Guilherme P. de Freitas wrote: > Actually it is being shipped already. 
Mine arrived today from Amazon! > I'm a econ graduate student trying to do the computational assignments > in Python, and these resources seem *very* helpful. Especially when > none of your professors or classmates ever used Python before, so you > have nobody to ask for help. > > I don't know what's the policy for what should and what should not be > linked on SciPy's website, but I think these resources should > definitely be linked there (notably the "Cookbook" section and the > "Topical Software" section). I think a page describing the book and linking to its home page (not an Amazon link, for preference) would be good. I don't think it really fits into the Cookbook section, though. Hopefully, there will eventually be enough books to make a special category for such pages. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Fri Feb 6 19:14:55 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 7 Feb 2009 09:14:55 +0900 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <1cd32cbb0902060825i65265e9idd359ee0a2522cea@mail.gmail.com> References: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> <6a5569ec0902060637y6cd7d1ddt66da6e9becce8504@mail.gmail.com> <5b8d13220902060805j1b16d281v203201ad53c0df54@mail.gmail.com> <1cd32cbb0902060825i65265e9idd359ee0a2522cea@mail.gmail.com> Message-ID: <5b8d13220902061614q685e6d2ei413d3797d1812ae0@mail.gmail.com> On Sat, Feb 7, 2009 at 1:25 AM, wrote: > On Fri, Feb 6, 2009 at 11:05 AM, David Cournapeau wrote: >> On Fri, Feb 6, 2009 at 11:37 PM, Roy H. Han >> wrote: >>> Well I feel like there are numerical problems with scipy's kmeans2(), >>> at least in the 0.6.0 version of scipy. >> >> kmeans and kmeans2 are fairly low level - they will fail if you have >> empty cluster, indeed. > > I thought that the tests test_kmeans_lost_cluster(self) verifies that > empty clusters > are handled. Actually, it tests a warning/exception is raised, instead of silently fail - so you can for example repeat the kmeans procedure with different initializations values (that's how I use kmeans in the em toolbox). But again, a better kmeans algorithm implementation would be nice - I just not sure it should be in scipy, though, David From c-b at asu.edu Fri Feb 6 19:28:41 2009 From: c-b at asu.edu (Christopher Brown) Date: Fri, 06 Feb 2009 17:28:41 -0700 Subject: [SciPy-user] Zero crossings In-Reply-To: <498CA80A.3000707@molden.no> References: <498CA4B1.3030807@asu.edu> <498CA80A.3000707@molden.no> Message-ID: <498CD5B9.7060706@asu.edu> Thanks Sturla and J-Pascal, SM> zc = numpy.where(numpy.sign(a[1:]) != numpy.sign(a[:-1])) I want to estimate f0 based on zero crossings in recorded speech, and I've got something that looks pretty good for only a few hours of work. Does anyone have any interest in this kind of thing? -- Christopher Brown, Ph.D. 
Department of Speech and Hearing Science Arizona State University From robert.kern at gmail.com Fri Feb 6 19:31:57 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 6 Feb 2009 18:31:57 -0600 Subject: [SciPy-user] Zero crossings In-Reply-To: <498CD5B9.7060706@asu.edu> References: <498CA4B1.3030807@asu.edu> <498CA80A.3000707@molden.no> <498CD5B9.7060706@asu.edu> Message-ID: <3d375d730902061631m5b228828hfb49309580fe5000@mail.gmail.com> On Fri, Feb 6, 2009 at 18:28, Christopher Brown wrote: > Thanks Sturla and J-Pascal, > > SM> zc = numpy.where(numpy.sign(a[1:]) != numpy.sign(a[:-1])) > > I want to estimate f0 based on zero crossings in recorded speech, and > I've got something that looks pretty good for only a few hours of work. > > Does anyone have any interest in this kind of thing? Sure! It would make a good recipe for the Cookbook. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Fri Feb 6 19:35:08 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Feb 2009 19:35:08 -0500 Subject: [SciPy-user] Mysterious kmeans() error In-Reply-To: <5b8d13220902061614q685e6d2ei413d3797d1812ae0@mail.gmail.com> References: <6a5569ec0902060629u14b7b594vf568429ef1580375@mail.gmail.com> <6a5569ec0902060637y6cd7d1ddt66da6e9becce8504@mail.gmail.com> <5b8d13220902060805j1b16d281v203201ad53c0df54@mail.gmail.com> <1cd32cbb0902060825i65265e9idd359ee0a2522cea@mail.gmail.com> <5b8d13220902061614q685e6d2ei413d3797d1812ae0@mail.gmail.com> Message-ID: <1cd32cbb0902061635l336ed7cex695e344be1e64a3e@mail.gmail.com> On Fri, Feb 6, 2009 at 7:14 PM, David Cournapeau wrote: > On Sat, Feb 7, 2009 at 1:25 AM, wrote: >> On Fri, Feb 6, 2009 at 11:05 AM, David Cournapeau wrote: >>> On Fri, Feb 6, 2009 at 11:37 PM, Roy H. Han >>> wrote: >>>> Well I feel like there are numerical problems with scipy's kmeans2(), >>>> at least in the 0.6.0 version of scipy. >>> >>> kmeans and kmeans2 are fairly low level - they will fail if you have >>> empty cluster, indeed. >> >> I thought that the tests test_kmeans_lost_cluster(self) verifies that >> empty clusters >> are handled. > > Actually, it tests a warning/exception is raised, instead of silently > fail - so you can for example repeat the kmeans procedure with > different initializations values (that's how I use kmeans in the em > toolbox). Doesn't random initialization automatically restart with different random values. When I ran the example in test_kmeans_lost_cluster, it seemed to produce reasonable results after the warning, but I didn't verify any numbers. Also the follow up error that the OP got was in the cov calculation for the random init. So it seems to me that there is a failure in reinitializing the process. (But, I only looked at the source for this part and don't know how the cluster analysis in scipy is constructed overall.) Josef > > But again, a better kmeans algorithm implementation would be nice - I > just not sure it should be in scipy, though, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From guilherme at gpfreitas.com Sat Feb 7 06:20:52 2009 From: guilherme at gpfreitas.com (Guilherme P. 
de Freitas) Date: Sat, 7 Feb 2009 03:20:52 -0800 Subject: [SciPy-user] Computational Economics with SciPy In-Reply-To: <3d375d730902061527t47914da8i71427d3c65ab9a87@mail.gmail.com> References: <3d375d730902061527t47914da8i71427d3c65ab9a87@mail.gmail.com> Message-ID: On Fri, Feb 6, 2009 at 3:27 PM, Robert Kern wrote: > On Fri, Feb 6, 2009 at 17:19, Guilherme P. de Freitas > wrote: >> I don't know what's the policy for what should and what should not be >> linked on SciPy's website, but I think these resources should >> definitely be linked there (notably the "Cookbook" section and the >> "Topical Software" section). > > I think a page describing the book and linking to its home page (not > an Amazon link, for preference) would be good. I don't think it really > fits into the Cookbook section, though. Hopefully, there will > eventually be enough books to make a special category for such pages. Here's the link to the book page: http://johnstachurski.net/book/book.html As for the Cookbook section, I'm sorry, I should have been more specific. In the lectures, there are "Application" lectures, like "Finite-state Optimal Growth" http://johnstachurski.net/lectures/finite_growth.html And I feel these specific application lectures could be linked under the cookbook or topical software section. They are essentially recipes. However, the applications often refer to the book, but it is not strictly necessary, if you understand the problem. Just an idea. It's just that it took me a while to find this, and there's nothing like this in the SciPy website (which is a natural place to search). From stefan at sun.ac.za Sat Feb 7 12:36:46 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 7 Feb 2009 19:36:46 +0200 Subject: [SciPy-user] stupid array tricks In-Reply-To: <498CB4C1.6040302@ucsf.edu> References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com> <1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com> <498B1A3B.8040603@molden.no> <1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com> <498B2207.2030303@molden.no> <1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com> <498B29F6.4080508@molden.no> <498B4326.7060207@molden.no> <1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com> <498CB4C1.6040302@ucsf.edu> Message-ID: <9457e7c80902070936w792c9329o114f0a67acfbda3d@mail.gmail.com> 2009/2/7 Karl Young : > I have three objects, 1) an array containing an "image" (could be any > dimension), 2) a mask for the image (of the same dimensions as the > image), and 3) a "template" which is just a list of offset coordinates > from any point in the image. You can create a strided view of the image, so that the values around each position where the filter can be applied becomes a row. Thereafter, using the indexing tricks shown at http://mentat.za.net/numpy/numpy_advanced_slides/ index the view to produce the templated values at each position. Say your template has length n, then you'd have: template of shape (1, n) rows = np.arange(m)[:, None] with shape (m, 1) When using template and rows in a fancy indexing operating, you should get an output of shape (m, n). 
Here is a simplified example: # Strided view of your image In [25]: data Out[25]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [26]: rows Out[26]: array([[0], [1], [2], [3]]) In [27]: rows.shape Out[27]: (4, 1) In [28]: template Out[28]: array([[0, 2]]) In [29]: template.shape Out[29]: (1, 2) In [30]: data[rows, template] Out[30]: array([[ 0, 2], [ 3, 5], [ 6, 8], [ 9, 11]]) Hope that helps! Cheers St?fan From dav at alum.mit.edu Sat Feb 7 17:13:27 2009 From: dav at alum.mit.edu (Dav Clark) Date: Sat, 7 Feb 2009 14:13:27 -0800 Subject: [SciPy-user] failed easy_install on OSX Message-ID: <80AC4A04-B81C-4524-B504-B2FEA32C5AF0@alum.mit.edu> Hi, I found a small bug (more with OSX than with SciPy) but worth mentioning. If you upgrade setuptools on OS X without changing your path, for some reason /usr/bin/easy_install (system setuptools 0.6c7) will remain ahead of /usr/local/bin/easy_install (your current install). Then, if you try to do an easy_install of scipy, it fails because setuptools 0.6c7 doesn't provide the proper fcompiler attribute. Two solutions: 1) download and run setup.py manually - this will use the most recent setuptools via python or 2) Change your PATH so that /usr/local/bin comes before /usr/bin. Why this isn't the case already, I have no idea. I guess it's apples way of insulating casual users from hackers like us. I would like to put this advice here: http://www.scipy.org/Installing_SciPy/Mac_OS_X But I don't have permission. If you want to give me permission, I am DavClark on the scipy.org wiki. Cheers, Dav From robert.kern at gmail.com Sat Feb 7 17:27:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 7 Feb 2009 16:27:33 -0600 Subject: [SciPy-user] failed easy_install on OSX In-Reply-To: <80AC4A04-B81C-4524-B504-B2FEA32C5AF0@alum.mit.edu> References: <80AC4A04-B81C-4524-B504-B2FEA32C5AF0@alum.mit.edu> Message-ID: <3d375d730902071427r259ecebctd4383f9db41baf11@mail.gmail.com> On Sat, Feb 7, 2009 at 16:13, Dav Clark wrote: > Hi, > > I found a small bug (more with OSX than with SciPy) but worth > mentioning. > > If you upgrade setuptools on OS X without changing your path, for some > reason /usr/bin/easy_install (system setuptools 0.6c7) will remain > ahead of /usr/local/bin/easy_install (your current install). Then, if > you try to do an easy_install of scipy, it fails because setuptools > 0.6c7 doesn't provide the proper fcompiler attribute. No version of setuptools provides an fcompiler attribute. That's all numpy.distutils. I suspect there is a different problem going on. The system's Python comes with a 1.0.x series numpy. I think that is the root of the problem. > Two solutions: > > 1) download and run setup.py manually - this will use the most recent > setuptools via python > > or > > 2) Change your PATH so that /usr/local/bin comes before /usr/bin. Why > this isn't the case already, I have no idea. I guess it's apples way > of insulating casual users from hackers like us. I believe all of the Python binaries I am aware of (www.python.org, Activestate, and EPD) will modify your .bashrc or .bash_profile to place the appropriate bin/ path (not always /usr/local/bin/!) at the front of your $PATH. If you are using a different shell, you may have to do this manually. Additionally, the default installation location for scripts is not /usr/local/bin/ but /Library/Frameworks/Python.framework/Versions/Current/bin/, so I suspect you have modified your .pydistutilsrc file to point there. When you modify things, you are on your own. 
:-) > I would like to put this advice here: > > http://www.scipy.org/Installing_SciPy/Mac_OS_X > > But I don't have permission. If you want to give me permission, I am > DavClark on the scipy.org wiki. You have now been added to the EditorsGroup. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dav at alum.mit.edu Sat Feb 7 19:12:24 2009 From: dav at alum.mit.edu (Dav Clark) Date: Sat, 7 Feb 2009 16:12:24 -0800 Subject: [SciPy-user] failed easy_install on OSX In-Reply-To: <3d375d730902071427r259ecebctd4383f9db41baf11@mail.gmail.com> References: <80AC4A04-B81C-4524-B504-B2FEA32C5AF0@alum.mit.edu> <3d375d730902071427r259ecebctd4383f9db41baf11@mail.gmail.com> Message-ID: On Feb 7, 2009, at 2:27 PM, Robert Kern wrote: > On Sat, Feb 7, 2009 at 16:13, Dav Clark wrote: >> Hi, >> >> I found a small bug (more with OSX than with SciPy) but worth >> mentioning. >> >> If you upgrade setuptools on OS X without changing your path, for >> some >> reason /usr/bin/easy_install (system setuptools 0.6c7) will remain >> ahead of /usr/local/bin/easy_install (your current install). Then, >> if >> you try to do an easy_install of scipy, it fails because setuptools >> 0.6c7 doesn't provide the proper fcompiler attribute. > > No version of setuptools provides an fcompiler attribute. That's all > numpy.distutils. I suspect there is a different problem going on. The > system's Python comes with a 1.0.x series numpy. I think that is the > root of the problem. > >> Two solutions: >> >> 1) download and run setup.py manually - this will use the most recent >> setuptools via python >> >> or >> >> 2) Change your PATH so that /usr/local/bin comes before /usr/bin. >> Why >> this isn't the case already, I have no idea. I guess it's apples way >> of insulating casual users from hackers like us. > > I believe all of the Python binaries I am aware of (www.python.org, > Activestate, and EPD) will modify your .bashrc or .bash_profile to > place the appropriate bin/ path (not always /usr/local/bin/!) at the > front of your $PATH. If you are using a different shell, you may have > to do this manually. Additionally, the default installation location > for scripts is not /usr/local/bin/ but > /Library/Frameworks/Python.framework/Versions/Current/bin/, so I > suspect you have modified your .pydistutilsrc file to point there. > When you modify things, you are on your own. :-) This is a super-fresh install of OS X, using the system python. I have definitely not modified the .pydistutilsrc file... that's just where Apple set things to go by default for the system python. This problem shouldn't occur for a /Library/Framework install. Cheers, Dav From dmitrey15 at ukr.net Sun Feb 8 13:55:13 2009 From: dmitrey15 at ukr.net (Dmitrey) Date: Sun, 08 Feb 2009 20:55:13 +0200 Subject: [SciPy-user] MPS files, Python code - does anyone have? Message-ID: <498F2A91.9000201@ukr.net> Hi there, does anyone have Python-written code that can read and/or write MPS files? I could write it by myself but I'm short of time (I'm busy with other things). Having the one would be very helpful for connecting more LP/MILP solvers to openopt. Regards, D. 
From gael.varoquaux at normalesup.org Sun Feb 8 19:00:46 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Feb 2009 01:00:46 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <498CAA2D.10102@molden.no> References: <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> Message-ID: <20090209000046.GD12866@phare.normalesup.org> On Fri, Feb 06, 2009 at 10:22:53PM +0100, Sturla Molden wrote: > > Ok, so this is approximately what I had in mind for Windows. It is a named > mutex and shared memory that is pickled by name (given that I read the > Python manuals on pickling extension objects correctly...) > > It still lacks an ndarray subclass that is pickled without making a copy of > the buffer, and also a malloc similar to multiprocessing. > > And similar Cython code has to be written for posix... OK, I've given it try, but it seems that my sheer incompetence on these matters is about to be revealed. Running the attached test code, I get a bus error. The output of test.py is: {'/c0aa50edb5a04371b8414ef16a49a4fa': (3070545920L, 409600)} Buffer created Array created 3070545920 [1] 9882 bus error python test.py I am quite clueless as to where this comes from (I can see different posibilities) and how to debug this. Once again, this is from sheer incompetence, but I have never mmaped files throught the C API, and my days of C, especially memory allocation in C, are very far. I am posting this on the mailing list hoping that someone will have an idea as to what I am doing wrong. Once this work, we can start looking at making this clean to have posix and windows implementations work together. Ga?l -------------- next part -------------- A non-text attachment was scrubbed... Name: shared_arrays.zip Type: application/x-zip-compressed Size: 9628 bytes Desc: not available URL: From philip at semanchuk.com Sun Feb 8 19:22:20 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Sun, 8 Feb 2009 19:22:20 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090209000046.GD12866@phare.normalesup.org> References: <20090202105316.GE11955@phare.normalesup.org> <786d3e06228152ae2c30291b139983e4.squirrel@webmail.uio.no> <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> Message-ID: <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> On Feb 8, 2009, at 7:00 PM, Gael Varoquaux wrote: > On Fri, Feb 06, 2009 at 10:22:53PM +0100, Sturla Molden wrote: >> >> Ok, so this is approximately what I had in mind for Windows. It is >> a named >> mutex and shared memory that is pickled by name (given that I read >> the >> Python manuals on pickling extension objects correctly...) >> >> It still lacks an ndarray subclass that is pickled without making a >> copy of >> the buffer, and also a malloc similar to multiprocessing. 
>> >> And similar Cython code has to be written for posix... > > OK, I've given it try, but it seems that my sheer incompetence on > these > matters is about to be revealed. Running the attached test code, I > get a > bus error. The output of test.py is: > > {'/c0aa50edb5a04371b8414ef16a49a4fa': (3070545920L, 409600)} > Buffer created > Array created > 3070545920 > [1] 9882 bus error python test.py > > I am quite clueless as to where this comes from (I can see different > posibilities) and how to debug this. > > Once again, this is from sheer incompetence, but I have never mmaped > files throught the C API, and my days of C, especially memory > allocation > in C, are very far. Hi Ga?l, I believe one must call ftruncate() on the file handle returned by shm_open(). Look at the example at the bottom of this page: http://www.opengroup.org/onlinepubs/000095399/functions/shm_open.html I hope this info is useful. Philip From gael.varoquaux at normalesup.org Mon Feb 9 01:15:11 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Feb 2009 07:15:11 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> References: <20090205231942.GB21014@phare.normalesup.org> <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> Message-ID: <20090209061511.GB26350@phare.normalesup.org> On Sun, Feb 08, 2009 at 07:22:20PM -0500, Philip Semanchuk wrote: > Hi Ga?l, > I believe one must call ftruncate() on the file handle returned by > shm_open(). Look at the example at the bottom of this page: > http://www.opengroup.org/onlinepubs/000095399/functions/shm_open.html Hurray, that was it! The code snippet at the end of this page is very clear. Thank you for the pointer. > I hope this info is useful. It really was. Thanks a lot. I need to do a few more checks, but I believe I have a first version of some code sharing arrays by name. Ga?l From gael.varoquaux at normalesup.org Mon Feb 9 03:23:44 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Feb 2009 09:23:44 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090209061511.GB26350@phare.normalesup.org> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> Message-ID: <20090209082344.GA635@phare.normalesup.org> On Mon, Feb 09, 2009 at 07:15:11AM +0100, Gael Varoquaux wrote: > It really was. Thanks a lot. I need to do a few more checks, but I > believe I have a first version of some code sharing arrays by name. OK, I have a first working version under Unix (attached, with trivial test case). Now we need to make it so that the ndarray can be used in the mutliprocessing function call, rather than the buffer object. 
In other words we need to create an object that behaves as an ndarray, but implements a different pickling method. What do people suggest as a best approach here? Subclassing ndarray? Cheers, Ga?l -------------- next part -------------- A non-text attachment was scrubbed... Name: shared_arrays.zip Type: application/x-zip-compressed Size: 12108 bytes Desc: not available URL: From sturla at molden.no Mon Feb 9 06:20:33 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 12:20:33 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090209082344.GA635@phare.normalesup.org> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> Message-ID: <49901181.3040801@molden.no> On 2/9/2009 9:23 AM, Gael Varoquaux wrote: > What do people suggest as a best > approach here? Subclassing ndarray? I have been working on that. Basically using what Robert Kern posted a while ago and ripping out some code from multiprocessing's heap object. S.M. From gael.varoquaux at normalesup.org Mon Feb 9 06:23:47 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Feb 2009 12:23:47 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <49901181.3040801@molden.no> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <49901181.3040801@molden.no> Message-ID: <20090209112347.GC32331@phare.normalesup.org> On Mon, Feb 09, 2009 at 12:20:33PM +0100, Sturla Molden wrote: > On 2/9/2009 9:23 AM, Gael Varoquaux wrote: > > What do people suggest as a best > > approach here? Subclassing ndarray? > I have been working on that. Basically using what Robert Kern posted a > while ago and ripping out some code from multiprocessing's heap object. Fantastic. I have to worry about other things for a little while (real-work related), so I won't be competing without you to find a good solution :). Cheers, Ga?l From sturla at molden.no Mon Feb 9 06:28:59 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 12:28:59 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090209082344.GA635@phare.normalesup.org> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> Message-ID: <4990137B.6030802@molden.no> On 2/9/2009 9:23 AM, Gael Varoquaux wrote: > On Mon, Feb 09, 2009 at 07:15:11AM +0100, Gael Varoquaux wrote: >> It really was. 
Thanks a lot. I need to do a few more checks, but I >> believe I have a first version of some code sharing arrays by name. > > OK, I have a first working version under Unix (attached, with trivial > test case). By the way, how is memory reclaimed under your Posix code? On Windows, a memory mapping is removed when there is no open handles to it. That is what the Handle object does (i.e. preventing a sytem wide memory leak). On System V IPC a shared segment it has to be marked for removal, i.e. there are no reference counting in the kernel as in Windows. So I was thinking out marking it for removal when the attachment count is zero. But as you have used Posix V IPC I have no idea. Just make sure it does not produce a global memory leak. S.M. From gael.varoquaux at normalesup.org Mon Feb 9 06:38:35 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Feb 2009 12:38:35 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <4990137B.6030802@molden.no> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> Message-ID: <20090209113835.GD32331@phare.normalesup.org> On Mon, Feb 09, 2009 at 12:28:59PM +0100, Sturla Molden wrote: > On System V IPC a shared segment it has to be marked for removal, i.e. > there are no reference counting in the kernel as in Windows. So I was > thinking out marking it for removal when the attachment count is zero. > But as you have used Posix V IPC I have no idea. Just make sure it does > not produce a global memory leak. Hum, I believe you are right, and I have produced just that. This means that we probably need a shared reference counter :(. Sounds tedious to implement. Do people have any suggestions on how to implement this? I can see several possibilities: * Using multiprocessing to share the dictionnary of shared map addresses, but this induces a tight coupling with multiprocessing, and I am not sure we want this. * Sharing this dictionnary via a C structure, ie to do our own implementation of a shared state. * Add the ref count information in the shared array. For instance the first byte could be the ref count. This sounds the easiest option, but I am probably not seeing some of the problems that will arize from this approach. I think I am going to take a stab at option three, tonight or later in the week, but please, wise people of the list, give me feedback on what you think might work. 
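In outline, option three would be something like the sketch below, assuming the
first bytes of the mapped buffer are reserved for the count and that 'lock' is
some interprocess lock (a named semaphore, say). Every name here is a
placeholder, nothing below is existing code:

import struct

HEADER = struct.Struct('l')   # the count lives in the first bytes of the segment

def incref(buf, lock):
    # buf is the mmap'ed shared segment; lock is an interprocess lock (placeholder)
    with lock:
        (count,) = HEADER.unpack_from(buf, 0)
        HEADER.pack_into(buf, 0, count + 1)

def decref(buf, lock):
    with lock:
        (count,) = HEADER.unpack_from(buf, 0)
        HEADER.pack_into(buf, 0, count - 1)
        return count - 1      # whoever sees 0 here unlinks the segment

The array data itself would then start at offset HEADER.size; the lock is needed
because the read-modify-write of the counter is not atomic across processes.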
Ga?l From sturla at molden.no Mon Feb 9 06:56:56 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 12:56:56 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090209113835.GD32331@phare.normalesup.org> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> Message-ID: <49901A08.9020104@molden.no> On 2/9/2009 12:38 PM, Gael Varoquaux wrote: > This means that we probably need a shared reference counter :(. Sounds > tedious to implement. On System V, you can get the attachment count using shmctl with IPC_STAT. Then after calling shmdt, checking the count and marking for removal if it is zero: int cleanup(int shmid) { int ierr; struct shmid_ds buf; ierr = shmctl(shmid, IPC_STAT, &buf); if(ierr < 0) goto error; if (buf.shm_nattch == 0) { ierr = shmctl(shmid, IPC_RMID, NULL); if(ierr < 0) goto error; } return 0 error: return errno; } S.M. From philip at semanchuk.com Mon Feb 9 10:07:16 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Mon, 9 Feb 2009 10:07:16 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <49901A08.9020104@molden.no> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> Message-ID: On Feb 9, 2009, at 6:56 AM, Sturla Molden wrote: > On 2/9/2009 12:38 PM, Gael Varoquaux wrote: > >> This means that we probably need a shared reference counter :(. >> Sounds >> tedious to implement. > > On System V, you can get the attachment count using shmctl with > IPC_STAT. Then after calling shmdt, checking the count and marking for > removal if it is zero: > > int cleanup(int shmid) > { > int ierr; > struct shmid_ds buf; > ierr = shmctl(shmid, IPC_STAT, &buf); > if(ierr < 0) goto error; > if (buf.shm_nattch == 0) { > ierr = shmctl(shmid, IPC_RMID, NULL); > if(ierr < 0) goto error; > } > return 0 > error: > return errno; > } Unfortunately POSIX IPC doesn't report that information. Since I'm not a numpy user I'm a little lost as to how you're using the shared memory here, but I gather that it is effectively "magic" to a numpy user? i.e., he doesn't have any idea that a shared memory segment is being created on his behalf? If that's the case I don't see any way around reference counting. 
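Roughly, the "magic" under discussion is an ndarray subclass whose pickle
carries only the segment name and the metadata, so that sending it to another
process never copies the data. A minimal sketch, assuming a placeholder helper
open_shared_buffer(name) that attaches an existing named segment; views and
offsets are deliberately ignored here, although the real code would have to
deal with them:

import numpy as np

def _rebuild(name, dtype, shape):
    # Runs in the receiving process: re-attach the named segment, no data copy.
    buf = open_shared_buffer(name)                 # placeholder helper
    return SharedArray(shape, dtype=dtype, buffer=buf, name=name)

class SharedArray(np.ndarray):
    def __new__(cls, shape, dtype=float, buffer=None, name=None):
        obj = np.ndarray.__new__(cls, shape, dtype=dtype, buffer=buffer)
        obj._shm_name = name
        return obj

    def __array_finalize__(self, obj):
        self._shm_name = getattr(obj, '_shm_name', None)

    def __reduce__(self):
        # Only metadata goes into the pickle; the buffer stays in shared memory.
        return _rebuild, (self._shm_name, self.dtype, self.shape)

An instance passed through a multiprocessing.Queue would then cost only the
pickle of the name, dtype and shape rather than a copy of the whole buffer.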
From gael.varoquaux at normalesup.org Mon Feb 9 10:09:20 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Feb 2009 16:09:20 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: References: <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> Message-ID: <20090209150920.GC27832@phare.normalesup.org> On Mon, Feb 09, 2009 at 10:07:16AM -0500, Philip Semanchuk wrote: > Since I'm not a numpy user I'm a little lost as to how you're using > the shared memory here, but I gather that it is effectively "magic" to > a numpy user? i.e., he doesn't have any idea that a shared memory > segment is being created on his behalf? My goal would be that he shouldn't have to know, or to care. It should be as much transparent as possible. > If that's the case I don't see any way around reference counting. Thanks for your input, it is valued a lot. Ga?l From philip at semanchuk.com Mon Feb 9 10:28:44 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Mon, 9 Feb 2009 10:28:44 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090209082344.GA635@phare.normalesup.org> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> Message-ID: <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> On Feb 9, 2009, at 3:23 AM, Gael Varoquaux wrote: > On Mon, Feb 09, 2009 at 07:15:11AM +0100, Gael Varoquaux wrote: >> It really was. Thanks a lot. I need to do a few more checks, but I >> believe I have a first version of some code sharing arrays by name. > > OK, I have a first working version under Unix (attached, with trivial > test case). > > Now we need to make it so that the ndarray can be used in the > mutliprocessing function call, rather than the buffer object. In other > words we need to create an object that behaves as an ndarray, but > implements a different pickling method. What do people suggest as a > best > approach here? Subclassing ndarray? Ga?l, I notice that the size of the shared memory segment is set to "pages" * PAGESIZE. Who determines the value of "pages"? And what happens if the numpy object you're storing in the segment grows beyond that size? AFAIK ftruncate() can only be called *once* to resize the segment. That's true on OS X, anyway, so it's probably true elsewhere. I once wrote some code to implement a shared dict using shared memory, and this was a problem I ran into. What happens when an item grows? The solution I eventually developed was to have one shared memory segment for metadata and a collection of other shared memory segments to hold the actual data. The metadata segment stored a (pickled) free space map and if a request was made to store an item that was larger than any free space I had, I'd allocate a new segment of the appropriate size. 
Otherwise, I'd stick it in the smallest piece of free space that it would fit into in an existing segment. You can perhaps see where this is leading -- once one is tracking free space slots and so forth, one needs to think about memory compaction, too, because sooner or later items will get deleted from the dict and if nothing new is inserted all of that free space is sitting around going to waste. Also, is it consistent with your license to use code from Python itself? If so, then I have another minor suggestion. Cheers Philip From robert.kern at gmail.com Mon Feb 9 11:24:36 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Feb 2009 10:24:36 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> Message-ID: <3d375d730902090824y39a69cc0i8551d94df4be39b7@mail.gmail.com> On Mon, Feb 9, 2009 at 09:28, Philip Semanchuk wrote: > Also, is it consistent with your license to use code from Python > itself? If so, then I have another minor suggestion. Yup. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Mon Feb 9 11:42:36 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 17:42:36 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> Message-ID: <49905CFC.9010507@molden.no> On 2/9/2009 4:07 PM, Philip Semanchuk wrote: > Unfortunately POSIX IPC doesn't report that information. I'll suggest we use System V IPC instead, as it does report a ref count. Code example attached. It compiles with Cython but I have not done any testing except that. My suggestion is to spawn a thread in the creator process to monitor the attachment count for the segment, and mark it for removal when it has dropped to zero. There is a __dealloc__ in a Handle object that does the shmdt, and then Python should do the refcounting (similar to what is done for CloseHandle in Windows). We have to figure out what to do with ctrl-c. It is a source of trouble. With a daemonic GC thread it could cause a leak, with a non-daemonic GC thread it may hang forever (which is also a leak). So I opted for a daemonic GC thread. I also have a version of the Windows sharedmem with a small bugfix (I forgot to unmap the segment before closing the handle). I had to remove the mutex from the Windows code. It can be put in a separate module. We should also have a lock with a named Sys V semaphore. 
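The monitor thread described above would be little more than the following
sketch, where attach_count() and remove_segment() stand for thin wrappers
around shmctl(IPC_STAT) and shmctl(IPC_RMID) that do not exist under these
names:

import threading
import time

def _gc_monitor(shmid, interval=0.5):
    # Poll until no process has the segment attached, then mark it for removal.
    # The creator keeps its own attachment while it holds a reference, so the
    # count only reaches zero once everybody is done with the array.
    while True:
        if attach_count(shmid) == 0:       # placeholder for shmctl(IPC_STAT)
            remove_segment(shmid)          # placeholder for shmctl(IPC_RMID)
            return
        time.sleep(interval)

def start_gc(shmid):
    t = threading.Thread(target=_gc_monitor, args=(shmid,))
    t.daemon = True    # daemonic, so a ctrl-c in the creator may leak the segment
    t.start()
    return t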
> Since I'm not a numpy user I'm a little lost as to how you're using > the shared memory here, but I gather that it is effectively "magic" to > a numpy user? i.e., he doesn't have any idea that a shared memory > segment is being created on his behalf? If that's the case I don't see > any way around reference counting. We are going to use multiple processes as if they were threads. It is basically a hack to work around Python's GIL (global interpreter lock). Basically we want to create ndarray's with the same interface as before, except that they have shared memory as data. For example, import numpy a = numpy.zeros((4,1024), order='F', dtype=float) import scipy a = scipy.sharedmem.zeros((4,1024), order='F', dtype=float) should do the same, except that the latter uses shared memory. And when it is sent through a multiprocessing.Queue, only the segment name, offset, shape and dtype gets pickled. In the former case, a copy of the whole data buffer is made. Right now we are just creating the shared memory buffer to use as backend. In multiprocessing you will find an object called mp.Array. We can wrap its buffer with an ndarray, but it cannot be passes through a mp.Queue. In other words, all shared memory must be allocated in advance. And that is what we don't want. Sturla Molden -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sharedmemory_sysv.pyx URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sharedmemory_win.pyx URL: From philip at semanchuk.com Mon Feb 9 11:48:58 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Mon, 9 Feb 2009 11:48:58 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <3d375d730902090824y39a69cc0i8551d94df4be39b7@mail.gmail.com> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> <3d375d730902090824y39a69cc0i8551d94df4be39b7@mail.gmail.com> Message-ID: <4CF29518-2439-4938-A85B-E2B4DEE68D57@semanchuk.com> On Feb 9, 2009, at 11:24 AM, Robert Kern wrote: > On Mon, Feb 9, 2009 at 09:28, Philip Semanchuk > wrote: >> Also, is it consistent with your license to use code from Python >> itself? If so, then I have another minor suggestion. > > Yup. I'm not sure how prevalent the getpagesize() API is. You might want to consider using the following code (from Python's mmapmodule.c) to get the page size. 
#ifdef MS_WINDOWS #include static int my_getpagesize(void) { SYSTEM_INFO si; GetSystemInfo(&si); return si.dwPageSize; } #endif #ifdef UNIX #include #include #if defined(HAVE_SYSCONF) && defined(_SC_PAGESIZE) static int my_getpagesize(void) { return sysconf(_SC_PAGESIZE); } #else #define my_getpagesize getpagesize #endif #endif /* UNIX */ From sturla at molden.no Mon Feb 9 11:59:42 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 17:59:42 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> Message-ID: <499060FE.6020608@molden.no> On 2/9/2009 4:28 PM, Philip Semanchuk wrote: > I once wrote some code to implement a shared dict using shared memory, > and this was a problem I ran into. We should have that removed. The actual allocation will be rounded up to a multiple of the page size. So to prevent a leak we should round up before allocating and reporting the actual size. > What happens when an item grows? We don't want an array to grow or move once it has been created. But a process should be allowed to create subarray views. S.M. From simpson at math.toronto.edu Mon Feb 9 12:04:22 2009 From: simpson at math.toronto.edu (Gideon Simpson) Date: Mon, 9 Feb 2009 12:04:22 -0500 Subject: [SciPy-user] root finding in complex valued systems Message-ID: Do any of the SciPy root finding algorithms for systems of equations have native support for when the equations are complex valued? -gideon From karl.young at ucsf.edu Mon Feb 9 12:03:45 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Mon, 9 Feb 2009 09:03:45 -0800 Subject: [SciPy-user] stupid array tricks References: <5063d0650902050337s3a4b656k71f9a5634c589589@mail.gmail.com><1cd32cbb0902050839m4e587390s31ef7b5f6267c5d4@mail.gmail.com><498B1A3B.8040603@molden.no><1cd32cbb0902050923j492f5683icbe737533159a24e@mail.gmail.com><498B2207.2030303@molden.no><1cd32cbb0902050946i700777cdhc920711cb393353f@mail.gmail.com><498B29F6.4080508@molden.no> <498B4326.7060207@molden.no><1cd32cbb0902051230k2c024a0cnc25448f6c1613679@mail.gmail.com><498CB4C1.6040302@ucsf.edu> <9457e7c80902070936w792c9329o114f0a67acfbda3d@mail.gmail.com> Message-ID: <9D202D4E86A4BF47BA6943ABDF21BE78058FAB8A@EXVS06.net.ucsf.edu> Thanks Stefan ! The index meister comes through again. I was sort of thinking along those lines but couldn't quite take the final step of understanding how to get the row coordinates for arbitrary filter locations. BTW, I should send you the latest version of my modification of glcom; the "profiling" that showed this part of my code to be the current bottleneck was the result of the generalized glcom (arbitrary number of co-registered images and arbitrary templates) being so fast at generating the co-occurrence matrices (for others on the list, Stefan wrote a very nice ctypes module, glcom, that generates co-occurrence matrices from an image for doing texture analysis). 
Karl Young Center for Imaging of Neurodegenerative Disease, UCSF VA Medical Center, MRS Unit (114M) Phone: (415) 221-4810 x3114 FAX: (415) 668-2864 Email: karl young at ucsf edu -----Original Message----- From: scipy-user-bounces at scipy.org on behalf of St?fan van der Walt Sent: Sat 2/7/2009 9:36 AM To: SciPy Users List Subject: Re: [SciPy-user] stupid array tricks 2009/2/7 Karl Young : > I have three objects, 1) an array containing an "image" (could be any > dimension), 2) a mask for the image (of the same dimensions as the > image), and 3) a "template" which is just a list of offset coordinates > from any point in the image. You can create a strided view of the image, so that the values around each position where the filter can be applied becomes a row. Thereafter, using the indexing tricks shown at http://mentat.za.net/numpy/numpy_advanced_slides/ index the view to produce the templated values at each position. Say your template has length n, then you'd have: template of shape (1, n) rows = np.arange(m)[:, None] with shape (m, 1) When using template and rows in a fancy indexing operating, you should get an output of shape (m, n). Here is a simplified example: # Strided view of your image In [25]: data Out[25]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [26]: rows Out[26]: array([[0], [1], [2], [3]]) In [27]: rows.shape Out[27]: (4, 1) In [28]: template Out[28]: array([[0, 2]]) In [29]: template.shape Out[29]: (1, 2) In [30]: data[rows, template] Out[30]: array([[ 0, 2], [ 3, 5], [ 6, 8], [ 9, 11]]) Hope that helps! Cheers St?fan _______________________________________________ SciPy-user mailing list SciPy-user at scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user From robert.kern at gmail.com Mon Feb 9 12:42:13 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Feb 2009 11:42:13 -0600 Subject: [SciPy-user] root finding in complex valued systems In-Reply-To: References: Message-ID: <3d375d730902090942q542b70c3r77f5aaf55f72f88f@mail.gmail.com> On Mon, Feb 9, 2009 at 11:04, Gideon Simpson wrote: > Do any of the SciPy root finding algorithms for systems of equations > have native support for when the equations are complex valued? Not really. You have to separate out the .real and .imag separately. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Mon Feb 9 12:44:07 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 18:44:07 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <4CF29518-2439-4938-A85B-E2B4DEE68D57@semanchuk.com> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> <3d375d730902090824y39a69cc0i8551d94df4be39b7@mail.gmail.com> <4CF29518-2439-4938-A85B-E2B4DEE68D57@semanchuk.com> Message-ID: <49906B67.5070109@molden.no> On 2/9/2009 5:48 PM, Philip Semanchuk wrote: > I'm not sure how prevalent the getpagesize() API is. You might want to > consider using the following code (from Python's mmapmodule.c) to get > the page size. 
I think we can just use mmap.PAGESIZE :) S.M. From philip at semanchuk.com Mon Feb 9 12:49:06 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Mon, 9 Feb 2009 12:49:06 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <49905CFC.9010507@molden.no> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> Message-ID: <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> On Feb 9, 2009, at 11:42 AM, Sturla Molden wrote: > On 2/9/2009 4:07 PM, Philip Semanchuk wrote: > >> Unfortunately POSIX IPC doesn't report that information. > > I'll suggest we use System V IPC instead, as it does report a ref > count. Code example attached. It compiles with Cython but I have not > done any testing except that. > > My suggestion is to spawn a thread in the creator process to monitor > the attachment count for the segment, and mark it for removal when > it has dropped to zero. There is a __dealloc__ in a Handle object > that does the shmdt, and then Python should do the refcounting > (similar to what is done for CloseHandle in Windows). If you're destroying the segment when the attach count drops to zero, why not check that immediately after the call to shmdt()? > key = ftok( (self.name), 0) ftok() should probably be avoided as it returns duplicate keys: http://nikitathespider.com/python/shm/#ftok I'd recommend using a random number generator instead. I believe a key_t is guaranteed to fit into an int, so you could generate a random number anywhere from 1 to INT_MAX, taking care not to step on the value IPC_PRIVATE (unless you want to assume that that is always #defined to 0). From sturla at molden.no Mon Feb 9 13:05:35 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 19:05:35 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> Message-ID: <4990706F.7000505@molden.no> On 2/9/2009 6:49 PM, Philip Semanchuk wrote: > If you're destroying the segment when the attach count drops to zero, > why not check that immediately after the call to shmdt()? I thought it was only the owner/creator that was allowed to do that? > ftok() should probably be avoided as it returns duplicate keys: > http://nikitathespider.com/python/shm/#ftok Oh :( In that case I could rewrite the object to pickle the shmid instead of a random name (uuid string) on System V. > I'd recommend using a random number generator instead. 
I believe a > key_t is guaranteed to fit into an int, so you could generate a random > number anywhere from 1 to INT_MAX, taking care not to step on the > value IPC_PRIVATE (unless you want to assume that that is always > #defined to 0). I am not sure how big the problem is, as I pass an uuid as filename to ftok. S.M. From anjiro at cc.gatech.edu Mon Feb 9 13:03:38 2009 From: anjiro at cc.gatech.edu (Daniel Ashbrook) Date: Mon, 09 Feb 2009 13:03:38 -0500 Subject: [SciPy-user] suppress scientific notation printing for large numbers? Message-ID: <49906FFA.2000909@cc.gatech.edu> So using set_printoptions, I can set suppress=True, and suppress printing of tiny numbers using scientific notation. However, it doesn't do anything with respect to large numbers - is there any way to force large numbers in arrays to be printed as they are? Thanks, dan From sturla at molden.no Mon Feb 9 13:24:25 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Feb 2009 19:24:25 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <49901181.3040801@molden.no> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <20090205234115.GC29684@phare.normalesup.org> <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <49901181.3040801@molden.no> Message-ID: <499074D9.6060401@molden.no> On 2/9/2009 12:20 PM, Sturla Molden wrote: > I have been working on that. Basically using what Robert Kern posted a > while ago and ripping out some code from multiprocessing's heap object. Here is a first draft. The ftok issue is not fixed. I am not sure if Robert Kern's use of copy_reg affects ndarrays in general, or just the ones we create here. There are probably a few bugs to kill. And we need a setup script. S.M. -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmem.zip Type: application/x-zip-compressed Size: 7088 bytes Desc: not available URL: From robert.kern at gmail.com Mon Feb 9 13:26:19 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Feb 2009 12:26:19 -0600 Subject: [SciPy-user] suppress scientific notation printing for large numbers? In-Reply-To: <49906FFA.2000909@cc.gatech.edu> References: <49906FFA.2000909@cc.gatech.edu> Message-ID: <3d375d730902091026t51b60bf2jdcae6d5f9db03c94@mail.gmail.com> On Mon, Feb 9, 2009 at 12:03, Daniel Ashbrook wrote: > So using set_printoptions, I can set suppress=True, and suppress > printing of tiny numbers using scientific notation. However, it doesn't > do anything with respect to large numbers - is there any way to force > large numbers in arrays to be printed as they are? No. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Mon Feb 9 13:28:59 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Feb 2009 12:28:59 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <499074D9.6060401@molden.no> References: <3d375d730902051523q179b552bg11218a0b22a161b1@mail.gmail.com> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <49901181.3040801@molden.no> <499074D9.6060401@molden.no> Message-ID: <3d375d730902091028u7f00aa35i84699372c07ed94b@mail.gmail.com> On Mon, Feb 9, 2009 at 12:24, Sturla Molden wrote: > On 2/9/2009 12:20 PM, Sturla Molden wrote: > >> I have been working on that. Basically using what Robert Kern posted a >> while ago and ripping out some code from multiprocessing's heap object. > > Here is a first draft. The ftok issue is not fixed. > > I am not sure if Robert Kern's use of copy_reg affects ndarrays in general, > or just the ones we create here. It affects ndarrays in general. The reduce function should ideally be written to detect whether the ndarray is shared, or is a view eventually leading back to a shared ndarray, or is just a regular ndarray. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From philip at semanchuk.com Mon Feb 9 15:25:57 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Mon, 9 Feb 2009 15:25:57 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <4990706F.7000505@molden.no> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> <4990706F.7000505@molden.no> Message-ID: <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> On Feb 9, 2009, at 1:05 PM, Sturla Molden wrote: > On 2/9/2009 6:49 PM, Philip Semanchuk wrote: > >> If you're destroying the segment when the attach count drops to zero, >> why not check that immediately after the call to shmdt()? > > I thought it was only the owner/creator that was allowed to do that? Yes, sort of. Sys V IPC objects are owned by users, not processes, so if user foo creates a semaphore in one process and destroys it in another, that's OK. I just verified this on OS X. Unless portions of your SciPy application will be running under different users, I think the matter of which process destroys an IPC object is irrelevant. >> ftok() should probably be avoided as it returns duplicate keys: >> http://nikitathespider.com/python/shm/#ftok > > Oh :( > > In that case I could rewrite the object to pickle the shmid instead > of a > random name (uuid string) on System V. But you need the key, not the id, to pass to shmget() to get a handle to an existing IPC object. >> I'd recommend using a random number generator instead. 
I believe a >> key_t is guaranteed to fit into an int, so you could generate a >> random >> number anywhere from 1 to INT_MAX, taking care not to step on the >> value IPC_PRIVATE (unless you want to assume that that is always >> #defined to 0). > > I am not sure how big the problem is, as I pass an uuid as filename > to ftok. I'm not sure how big the problem is either. All I know is that in my experience, ftok() returned the same key for different files in the same directory. I realized, therefore, that my code needed to handle the case where ftok() didn't generate a useful key. Since I needed a second, more reliable method of key generation, why use ftok() at all? If I were you, rather than trying to figure out how broken ftok() is (and it might be broken in different ways on different platforms), I'd just abandon it altogether. It's not as if generating a random number instead is difficult. In fact, it's easier. Instead of generating a random uuid and passing that to ftok(), eliminate the middleman and generate a random key yourself. From sturla at molden.no Mon Feb 9 16:41:59 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 9 Feb 2009 22:41:59 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> <4990706F.7000505@molden.no> <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> Message-ID: > On Feb 9, 2009, at 1:05 PM, Sturla Molden wrote: > Yes, sort of. Sys V IPC objects are owned by users, not processes, so > if user foo creates a semaphore in one process and destroys it in > another, that's OK. I just verified this on OS X. Unless portions of > your SciPy application will be running under different users, I think > the matter of which process destroys an IPC object is irrelevant. My programs will certainly not do that (but I use Windows anyway). I have one process that spawns workers, and they run with the same user. I think that covers 99% of all use for this extension. But others may have a more complex design, so the safest method is to let the creator kill the segment. But it comes at the expense of a thread. Otherwise I could just assume the user is the same and do the check after shmdt. As I use Windows I have no personal preference here. Not using a thread avoids some of the ctrl-c issue, and it will be a bit faster. But then all processes sharing memory must run with the same user, otherwise there will be leaks and havoc. >> In that case I could rewrite the object to pickle the shmid instead >> of a >> random name (uuid string) on System V. > > But you need the key, not the id, to pass to shmget() to get a handle > to an existing IPC object. Yes, but if we use the shmid, we have the handle. So we can pass that integer from one process to another. At least that is what my old book on Linux programming describes. Since you say I need to use shmget, can I assume this method is not valid on all Unix incarnations? 
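The random-key scheme suggested above comes down to a loop like this one,
where sysv_shmget stands in for a wrapper around
shmget(key, size, IPC_CREAT | IPC_EXCL | 0600) that raises OSError on failure;
that wrapper does not exist yet and is only assumed here:

import errno
import random

INT_MAX = 2 ** 31 - 1                       # a key_t is assumed to fit in an int

def new_segment(size):
    while True:
        key = random.randint(1, INT_MAX)    # 0 == IPC_PRIVATE is never drawn
        try:
            shmid = sysv_shmget(key, size)  # placeholder wrapper, see above
        except OSError as e:
            if e.errno == errno.EEXIST:
                continue                    # key collision, draw another one
            raise                           # a real error, not a collision
        return key, shmid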
> If I were you, rather than trying to figure out how broken ftok() is > (and it might be broken in different ways on different platforms), I'd > just abandon it altogether. It's not as if generating a random number > instead is difficult. In fact, it's easier. Instead of generating a > random uuid and passing that to ftok(), eliminate the middleman and > generate a random key yourself. It has to be a unique key for the system, not just a random number. So I could try to call shmget multiple times with IPC_EXCL until it succeeds. Then I'll have to check why it failed as well. This is the first time I have found Windows to be the less annoying system. Thanks for your help. :-) Sturla Molden From sturla at molden.no Mon Feb 9 18:50:44 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 10 Feb 2009 00:50:44 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> <4990706F.7000505@molden.no> <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> Message-ID: <038c02f4374caf7e49b01c0203e8a6bd.squirrel@webmail.uio.no> Just a small update: I have removed ftok and done what Philip Semanchuk suggested. The System V version now uses numpy's random integer generator to create a key (and if it fails, checks errno for EEXIST). Clean up can be done using threads or without threads; keep your paws off os.setuid if you set gc_thread to False. S.M. -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmemory_sysv.pyx Type: / Size: 7996 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmemory_win.pyx Type: / Size: 5418 bytes Desc: not available URL: From matthew.brett at gmail.com Mon Feb 9 19:31:29 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 9 Feb 2009 16:31:29 -0800 Subject: [SciPy-user] scipy.org Message-ID: <1e2af89e0902091631s777226f7k17eedcee1657716c@mail.gmail.com> Hi, Could scipy.org be down again? Best, Matthew From robert.kern at gmail.com Mon Feb 9 19:37:57 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Feb 2009 18:37:57 -0600 Subject: [SciPy-user] scipy.org In-Reply-To: <1e2af89e0902091631s777226f7k17eedcee1657716c@mail.gmail.com> References: <1e2af89e0902091631s777226f7k17eedcee1657716c@mail.gmail.com> Message-ID: <3d375d730902091637s55ce128coc963c0bce36c8124@mail.gmail.com> On Mon, Feb 9, 2009 at 18:31, Matthew Brett wrote: > Hi, > > Could scipy.org be down again? It was. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From sturla at molden.no Mon Feb 9 20:23:09 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 10 Feb 2009 02:23:09 +0100 (CET) Subject: [SciPy-user] shared memory machines Message-ID: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> Ok, the work is basically done :) What remains is testing/debugging and a setup script. Perhaps we should move this debate to scipy-dev? I feel like I am spamming this list... Regards, Sturla Molden -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmem.zip Type: application/x-zip-compressed Size: 7117 bytes Desc: not available URL: From robert.kern at gmail.com Mon Feb 9 20:27:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Feb 2009 19:27:33 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> Message-ID: <3d375d730902091727x4e3c9760m6c164025d9f0aa36@mail.gmail.com> On Mon, Feb 9, 2009 at 19:23, Sturla Molden wrote: > Ok, the work is basically done :) > > What remains is testing/debugging and a setup script. > > Perhaps we should move this debate to scipy-dev? I feel like I am spamming > this list... Whatever. It would be nice, though, if you hosted the files somewhere, perhaps under source control, instead of passing them around in attachments. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From philip at semanchuk.com Tue Feb 10 00:05:42 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Tue, 10 Feb 2009 00:05:42 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> <4990706F.7000505@molden.no> <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> Message-ID: <92829ED2-44D4-497E-9AEF-2B49283778D7@semanchuk.com> On Feb 9, 2009, at 4:41 PM, Sturla Molden wrote: >>> In that case I could rewrite the object to pickle the shmid instead >>> of a >>> random name (uuid string) on System V. >> >> But you need the key, not the id, to pass to shmget() to get a handle >> to an existing IPC object. > > Yes, but if we use the shmid, we have the handle. So we can pass that > integer from one process to another. At least that is what my old > book on > Linux programming describes. Since you say I need to use shmget, can I > assume this method is not valid on all Unix incarnations? No, sorry, ignore what I said. I was not thinking clearly. >> If I were you, rather than trying to figure out how broken ftok() is >> (and it might be broken in different ways on different platforms), >> I'd >> just abandon it altogether. It's not as if generating a random number >> instead is difficult. In fact, it's easier. 
Instead of generating a >> random uuid and passing that to ftok(), eliminate the middleman and >> generate a random key yourself. > > It has to be a unique key for the system, not just a random number. > So I > could try to call shmget multiple times with IPC_EXCL until it > succeeds. > Then I'll have to check why it failed as well. Exactly. It's a pain in the arse but it is what must be done. > This is the first time I have found Windows to be the less annoying > system. Indeed, that's a rare occurrence. The POSIX API is better than Sys V in that there's a much larger "key" space, so large that collisions between randomly generated ids are statistically...hmmm, maybe I should watch my mouth on a statistically-oriented mailing list. =) Under POSIX, an IPC object's name is a string that starts with a slash. Under FreeBSD (the most restrictive API I've encountered), the name is limited to 14 filename-permissible characters. Subtracting the leading slash, that's space for 13 alphanumeric characters plus underscore and dot. Filenames permit 52 upper and lowercase characters plus 10 digits plus 2 punctuation characters = 64 characters. 64**13 is a lot more choices than INT_MAX. So OK, I stand by my statement. How does the Windows API resolve name/key collisions? > Thanks for your help. :-) Glad to be able to provide it. (Ni ?r v?lkomna.) Cheers Philip From gael.varoquaux at normalesup.org Tue Feb 10 01:09:38 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 10 Feb 2009 07:09:38 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <49906B67.5070109@molden.no> References: <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <64DD8CB4-3089-4695-91E7-F98907954E13@semanchuk.com> <3d375d730902090824y39a69cc0i8551d94df4be39b7@mail.gmail.com> <4CF29518-2439-4938-A85B-E2B4DEE68D57@semanchuk.com> <49906B67.5070109@molden.no> Message-ID: <20090210060938.GB4170@phare.normalesup.org> On Mon, Feb 09, 2009 at 06:44:07PM +0100, Sturla Molden wrote: > On 2/9/2009 5:48 PM, Philip Semanchuk wrote: > > I'm not sure how prevalent the getpagesize() API is. You might want to > > consider using the following code (from Python's mmapmodule.c) to get > > the page size. > I think we can just use mmap.PAGESIZE :) Good point :). I was using getpagesize, from unistd.h. Ga?l From jbh at broad.mit.edu Tue Feb 10 07:19:25 2009 From: jbh at broad.mit.edu (John Hanks) Date: Tue, 10 Feb 2009 07:19:25 -0500 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: References: Message-ID: I'm trying to build scipy for Python 2.6.1 using gcc-4.3.3. BLAS and LAPACK both build successfully as does ATLAS. numpy and scipy find the libraries and build without any obvious problems. But when I try to use scipy.linalg, I get this error: Python 2.6.1 (r261:67515, Feb 9 2009, 17:41:40) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import scipy.linalg Traceback (most recent call last): File "", line 1, in File "/broad/tools/Linux/x86_64/pkgs/python_2.6.1/lib/python2.6/site-packages/scipy/linalg/__init__.py", line 8, in from basic import * File "/broad/tools/Linux/x86_64/pkgs/python_2.6.1/lib/python2.6/site-packages/scipy/linalg/basic.py", line 17, in from lapack import get_lapack_funcs File "/broad/tools/Linux/x86_64/pkgs/python_2.6.1/lib/python2.6/site-packages/scipy/linalg/lapack.py", line 17, in from scipy.linalg import flapack ImportError: /broad/tools/Linux/x86_64/pkgs/python_2.6.1/lib/python2.6/site-packages/scipy/linalg/flapack.so: undefined symbol: iladlc_ I've repeated this using variations of every set of install instructions for scipy that I can find with google with the same result each time. Any suggestions about where to look for what I've broken would be appreciated. Thanks, jbh From david at ar.media.kyoto-u.ac.jp Tue Feb 10 07:16:41 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 10 Feb 2009 21:16:41 +0900 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: References: Message-ID: <49917029.3060203@ar.media.kyoto-u.ac.jp> John Hanks wrote: > I've repeated this using variations of every set of install > instructions for scipy that I can find with google with the same > result each time. Any suggestions about where to look for what I've > broken would be appreciated. > Lapack 3.2 is not supported - please use 3.1.1 or below, David From sturla at molden.no Tue Feb 10 08:01:28 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 10 Feb 2009 14:01:28 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <92829ED2-44D4-497E-9AEF-2B49283778D7@semanchuk.com> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> <4990706F.7000505@molden.no> <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> <92829ED2-44D4-497E-9AEF-2B49283778D7@semanchuk.com> Message-ID: <49917AA8.4020603@molden.no> On 2/10/2009 6:05 AM, Philip Semanchuk wrote: > How does the Windows API resolve name/key collisions? Right. On Windows a name is a string, and an UUIDs should be unique to the system. If a name exists, CreateFileMapping fails and GetLastError returns ERROR_INVALID_HANDLE. If the object alredy exist in the process, CreateFileMapping returns a valid handle but GetLastError returns ERROR_ALREADY_EXISTS. I'll put som some tests for that to be pedantic, albeit UUIDs should be unique. 
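From Python that check can be sketched with ctypes roughly as below; this is
only a sketch, and everything other than the ERROR_ALREADY_EXISTS case is left
to WinError:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileMappingW.restype = wintypes.HANDLE
kernel32.CreateFileMappingW.argtypes = [wintypes.HANDLE, ctypes.c_void_p,
    wintypes.DWORD, wintypes.DWORD, wintypes.DWORD, wintypes.LPCWSTR]

INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
PAGE_READWRITE = 0x04
ERROR_ALREADY_EXISTS = 183

def create_named_mapping(name, nbytes):
    # Pagefile-backed mapping; the second return value says whether the name
    # was already in use (GetLastError() == ERROR_ALREADY_EXISTS).
    handle = kernel32.CreateFileMappingW(INVALID_HANDLE_VALUE, None,
                                         PAGE_READWRITE, 0, nbytes, name)
    err = ctypes.get_last_error()
    if not handle:
        raise ctypes.WinError(err)
    return handle, err == ERROR_ALREADY_EXISTS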
Sturla Molden From sturla at molden.no Tue Feb 10 09:24:03 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 10 Feb 2009 15:24:03 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <49917AA8.4020603@molden.no> References: <6ce0ac130902051634i1408fddeiffd54c14f793f688@mail.gmail.com> <498C53F9.3070708@molden.no> <577F2D0E-D3B9-4400-912B-3EB6EFD5F48C@semanchuk.com> <498C5969.1040809@molden.no> <498CAA2D.10102@molden.no> <20090209000046.GD12866@phare.normalesup.org> <56B03E36-8CA9-4F34-B0C4-C38C833C8ACD@semanchuk.com> <20090209061511.GB26350@phare.normalesup.org> <20090209082344.GA635@phare.normalesup.org> <4990137B.6030802@molden.no> <20090209113835.GD32331@phare.normalesup.org> <49901A08.9020104@molden.no> <49905CFC.9010507@molden.no> <450D80F1-8023-43F0-A02C-28A87919F775@semanchuk.com> <4990706F.7000505@molden.no> <98CBDDB9-24E0-4769-A526-B58EACC89498@semanchuk.com> <92829ED2-44D4-497E-9AEF-2B49283778D7@semanchuk.com> <49917AA8.4020603@molden.no> Message-ID: <49918E03.5090404@molden.no> The Windows version seems to be working correctly on my computers. I will put updates here: http://folk.uio.no/sturlamo/python/sharedmem.zip Testing (particularly on Unix and friends) is appreciated. If you have comments, encounter bugs, or have corrections, please send them to my email. usage: import sharedmem as shm Now shm.zeros, shm.ones, and shm.empty should work like their numpy equivalents. They are pickled and depickled by name (hidden from sight), meaning only metadata is stored in the pickle. Call .copy() if you need the pickle to contain a copy of the data as well. Unlike multiprocessing.Array, these shared memory arrays can be sent through a multiprocessing.Queue, tcp, or any other IPC you may think of. That's it. I am done spamming this list with this for now. Regards, Sturla Molden From jbh at broad.mit.edu Tue Feb 10 09:30:39 2009 From: jbh at broad.mit.edu (John Hanks) Date: Tue, 10 Feb 2009 09:30:39 -0500 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: <49917029.3060203@ar.media.kyoto-u.ac.jp> References: <49917029.3060203@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Feb 10, 2009 at 7:16 AM, David Cournapeau wrote: > John Hanks wrote: >> I've repeated this using variations of every set of install >> instructions for scipy that I can find with google with the same >> result each time. Any suggestions about where to look for what I've >> broken would be appreciated. >> > > Lapack 3.2 is not supported - please use 3.1.1 or below, > I went back and rebuilt everything from LAPACK 3.1.1 and forward and still get the same error. 
Here's my compiler settings: ATLAS (from Make.inc) ICC = gcc ICCFLAGS = $(CDEFS) -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 SMC = gcc SMCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 DMC = gcc DMCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 SKC = gcc SKCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 DKC = gcc DKCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 XCC = gcc XCCFLAGS = $(CDEFS) -O -fomit-frame-pointer -fPIC -m64 F77 = gfortran F77FLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 SMAFLAGS = -fno-tree-loop-optimize DMAFLAGS = -fno-tree-loop-optimize LAPACK (from make.inc) FORTRAN = gfortran OPTS = -O2 -fPIC -m64 DRVOPTS = $(OPTS) NOOPT = -O0 -fPIC -m64 LOADER = gfortran LOADOPTS = The numpy and scipy setup finds gfortran and produces a mostly functional scipy install except for the undefined symbol error. Thanks, jbh From david at ar.media.kyoto-u.ac.jp Tue Feb 10 09:20:01 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 10 Feb 2009 23:20:01 +0900 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: References: <49917029.3060203@ar.media.kyoto-u.ac.jp> Message-ID: <49918D11.6090702@ar.media.kyoto-u.ac.jp> John Hanks wrote: > > I went back and rebuilt everything from LAPACK 3.1.1 and forward and > still get the same error. > If you get the same error, you forgot to rebuild something, or there is a leftover. The ILADLC function is specific to Lapack 3.2 AFAIK. If possible, you should really use lapack as packaged by your distribution, it will be much easier cheers, David From michael.abshoff at googlemail.com Tue Feb 10 09:53:06 2009 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Tue, 10 Feb 2009 06:53:06 -0800 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: References: <49917029.3060203@ar.media.kyoto-u.ac.jp> Message-ID: <499194D2.9090704@gmail.com> John Hanks wrote: > On Tue, Feb 10, 2009 at 7:16 AM, David Cournapeau > wrote: >> John Hanks wrote: Hi John, > I went back and rebuilt everything from LAPACK 3.1.1 and forward and > still get the same error. > > Here's my compiler settings: > > ATLAS (from Make.inc) > ICC = gcc > ICCFLAGS = $(CDEFS) -fomit-frame-pointer -mfpmath=387 -O2 > -falign-loops=4 -fPIC -m64 > SMC = gcc > SMCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 > DMC = gcc > DMCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 > SKC = gcc > SKCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 > DKC = gcc > DKCFLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 > XCC = gcc > XCCFLAGS = $(CDEFS) -O -fomit-frame-pointer -fPIC -m64 > F77 = gfortran > F77FLAGS = -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m64 > SMAFLAGS = -fno-tree-loop-optimize > DMAFLAGS = -fno-tree-loop-optimize > > LAPACK (from make.inc) > FORTRAN = gfortran > OPTS = -O2 -fPIC -m64 > DRVOPTS = $(OPTS) > NOOPT = -O0 -fPIC -m64 > LOADER = gfortran > LOADOPTS = > > The numpy and scipy setup finds gfortran and produces a mostly > functional scipy install except for the undefined symbol error. Hi, did you set the CFLAGS or did ATLAS pick them for you? I have seen scipy throw import errors in certain situation when I needed to set some flags for the Fortran compiler to build 64 bit code if it defaulted to a 32 bit target. 
If your toolchain produces 32 bit binaries per default there are a couple files in Scipy 0.6 that are not build cleanly via distutils, but that pick the Fortran compiler like gfortran directly and you end up attempting to link 32 bit object files into a 64 bit lib. This passes at link time, but blows up on import. > Thanks, > > jbh Cheers, Michael > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From cournape at gmail.com Tue Feb 10 10:43:39 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 11 Feb 2009 00:43:39 +0900 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: <499194D2.9090704@gmail.com> References: <49917029.3060203@ar.media.kyoto-u.ac.jp> <499194D2.9090704@gmail.com> Message-ID: <5b8d13220902100743j59243327l3481e9e34ece3d0@mail.gmail.com> On Tue, Feb 10, 2009 at 11:53 PM, Michael Abshoff wrote: > If your toolchain produces 32 bit binaries per default > there are a couple files in Scipy 0.6 that are not build cleanly via > distutils, but that pick the Fortran compiler like gfortran directly and > you end up attempting to link 32 bit object files into a 64 bit lib. > This passes at link time, but blows up on import. I don't think that's the problem here: the symbol not found simply does not exist in LAPACK < 3.2. Scipy and LAPACK 3.2 do not work together AFAIK. cheers, David From michael.abshoff at googlemail.com Tue Feb 10 10:55:52 2009 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Tue, 10 Feb 2009 07:55:52 -0800 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: <5b8d13220902100743j59243327l3481e9e34ece3d0@mail.gmail.com> References: <49917029.3060203@ar.media.kyoto-u.ac.jp> <499194D2.9090704@gmail.com> <5b8d13220902100743j59243327l3481e9e34ece3d0@mail.gmail.com> Message-ID: <4991A388.1080205@gmail.com> David Cournapeau wrote: > On Tue, Feb 10, 2009 at 11:53 PM, Michael Abshoff > wrote: >> If your toolchain produces 32 bit binaries per default >> there are a couple files in Scipy 0.6 that are not build cleanly via >> distutils, but that pick the Fortran compiler like gfortran directly and >> you end up attempting to link 32 bit object files into a 64 bit lib. >> This passes at link time, but blows up on import. > > I don't think that's the problem here: the symbol not found simply > does not exist in LAPACK < 3.2. Scipy and LAPACK 3.2 do not work > together AFAIK. Yes, it was a long short, but given the statement that John retried with Lapack 3.1.1 it seems odd. I don't know if he wiped the build directory and all that, so the problem might be something simple like that. > cheers, > > David Cheers, Michael > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From jbh at broad.mit.edu Tue Feb 10 12:25:50 2009 From: jbh at broad.mit.edu (John Hanks) Date: Tue, 10 Feb 2009 12:25:50 -0500 Subject: [SciPy-user] Problem with scipy.linalg and LAPACK. In-Reply-To: <4991A388.1080205@gmail.com> References: <49917029.3060203@ar.media.kyoto-u.ac.jp> <499194D2.9090704@gmail.com> <5b8d13220902100743j59243327l3481e9e34ece3d0@mail.gmail.com> <4991A388.1080205@gmail.com> Message-ID: On Tue, Feb 10, 2009 at 10:55 AM, Michael Abshoff wrote: > Yes, it was a long short, but given the statement that John retried with > Lapack 3.1.1 it seems odd. 
I don't know if he wiped the build directory > and all that, so the problem might be something simple like that. > Never attribute to software what can be explained by John's incompetence. I removed all traces of the libraries, scipy and numpy source and started over fresh. After rebuilding everything with the 3.1.1 LAPACK instead of 3.2 I now have a working version. It looks like 3.2 got stuck somewhere and wouldn't go away. Thanks for your help, jbh From gael.varoquaux at normalesup.org Tue Feb 10 16:06:06 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 10 Feb 2009 22:06:06 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> Message-ID: <20090210210606.GB9128@phare.normalesup.org> On Tue, Feb 10, 2009 at 02:23:09AM +0100, Sturla Molden wrote: > Ok, the work is basically done :) Congratulation, Sturla! I must admit that I am not too much enthousiastic about the thread+polling to do the cleaning up. I don't really understand why it is necessary. If the view of the array is the last to be decref, than 'buf.shm_nattch' should be 0, and as a result the freeing up of the memory can happen in the dealloc. Or did I miss something? Cheers, Ga?l From josegomez at gmx.net Tue Feb 10 16:36:00 2009 From: josegomez at gmx.net (Jose Luis Gomez Dans) Date: Tue, 10 Feb 2009 22:36:00 +0100 Subject: [SciPy-user] Array selection help Message-ID: <20090210213600.123400@gmx.net> Hi! Let's say I have two 2D arrays, arr1 and arr2. The elements of arr1 contain different numbers (such as labels, for example), and the elements of arr2 contain some floating point data (say, height above sea level or something like that). For each unique value in arr1, I want to work out the mean (... sum, std dev, etc) of arr2 for the overlapping region. So far, I have used the following code: #Get all the unique values in arr1 U = numpy.unique ( arr1 ) #Create a dictionary with the unique values as key, and the #locations of elements that have that value in arr1 R = dict (zip ( [U[i] for i in xrange(U.shape[0])], \ [ numpy.nonzero( arr1==U[i]) for i in xrange(U.shape[0]) ] ) ) #Now, calculate the eg mean of arr2 per arr1 "label" M = dict ( zip ( R.keys(), [ numpy.mean(arr2[R[i]]) for i in R.keys() ] ) ) # So I now have a dictionary with the unique values of arr1, and the mean # value of arr2 for those pixels. The code is fast and I was feeling rather smug and pleased with myself about it :) However, when numpy.unique( arr1 ) increases, [ numpy.nonzero( arr1==U[i]) for i in xrange(U.shape[0]) ] starts taking a long time (understandable, there are loads and loads of operations in that loop). At present, I can easily have numpy.unique ( arr1).shape[0] > 10000, so it does take a long time. Apart from looping through different values of arr1, can anyone think of an efficient way of achieving something similar to this? It doesn't have to be a dictionary as the output, an array or something else would do nicely. Thanks! jose -- Remote Sensing Unit | Env. Monitoring and Modelling Group Dept. of Geography | Dept. of Geography University College London | King's College London Gower St, London WC1E 6BT UK | Strand Campus, Strand, London WC2R 2LS UK -- Jetzt 1 Monat kostenlos! 
GMX FreeDSL - Telefonanschluss + DSL f?r nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a From stefan at sun.ac.za Tue Feb 10 17:14:31 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 11 Feb 2009 00:14:31 +0200 Subject: [SciPy-user] Array selection help In-Reply-To: <20090210213600.123400@gmx.net> References: <20090210213600.123400@gmx.net> Message-ID: <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> Hi Jose 2009/2/10 Jose Luis Gomez Dans : > Let's say I have two 2D arrays, arr1 and arr2. The elements of arr1 contain > different numbers (such as labels, for example), and the elements of arr2 > contain some floating point data (say, height above sea level or something > like that). For each unique value in arr1, I want to work out the mean (... > sum, std dev, etc) of arr2 for the overlapping region. So far, I have used > the following code: Also take a look at scipy.ndimage, which has functions to calculate means and variances over labeled data. Cheers St?fan From josef.pktd at gmail.com Tue Feb 10 17:17:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 10 Feb 2009 17:17:53 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> Message-ID: <1cd32cbb0902101417s1894df9h4409c2d0529108da@mail.gmail.com> I think this is also similar to a recent thread in scipy-dev. There I used a dict to build the unique indices. I don't know if this is fast for your case, since the use case was when there are only a few unique items. see thread at: http://projects.scipy.org/pipermail/scipy-dev/2009-January/010900.html Josef On 2/10/09, St?fan van der Walt wrote: > Hi Jose > > 2009/2/10 Jose Luis Gomez Dans : >> Let's say I have two 2D arrays, arr1 and arr2. The elements of arr1 >> contain >> different numbers (such as labels, for example), and the elements of arr2 >> contain some floating point data (say, height above sea level or something >> like that). For each unique value in arr1, I want to work out the mean >> (... >> sum, std dev, etc) of arr2 for the overlapping region. So far, I have used >> the following code: > > Also take a look at scipy.ndimage, which has functions to calculate > means and variances over labeled data. > > Cheers > St?fan > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From josegomez at gmx.net Tue Feb 10 17:56:42 2009 From: josegomez at gmx.net (Jose Luis Gomez Dans) Date: Tue, 10 Feb 2009 23:56:42 +0100 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> Message-ID: <20090210225642.141110@gmx.net> Hi St?fan, > > like that). For each unique value in arr1, I want to work out the mean > > (...sum, std dev, etc) of arr2 for the overlapping region. So far, > Also take a look at scipy.ndimage, which has functions to calculate > means and variances over labeled data. Oh, this looks very nice! In case someone is looking for this, you need to propose your labelling (using labels), and then use scipy.ndimage.means() and friends with your label definitions. 
Going back to my example, my labels grid is arr1, so I just have to do, eg: for i in numpy.unique ( arr1 ): print i, scipy.ndimage.mean ( arr2, labels=arr1, index=i ) I think that solves it! Thanks! Jose From gael.varoquaux at normalesup.org Tue Feb 10 18:13:56 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 11 Feb 2009 00:13:56 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> Message-ID: <20090210231356.GC9128@phare.normalesup.org> On Tue, Feb 10, 2009 at 02:23:09AM +0100, Sturla Molden wrote: > Ok, the work is basically done :) > What remains is testing/debugging and a setup script. I did a setup script, and I had to change a few detail because Cython was unhappy with the names of the modules (I suspect local imports happening instead of absolute ones). I had to add a __weakref__ attribute to the handle, to make it so that it can be weakref'd. Now I am stuck because shared memory allocation is not working. This boils down to the following traceback: Traceback (most recent call last): File "test.py", line 4, in a = shmem.shared_zeros(10) File "ndarray.py", line 135, in shared_zeros arr = shared_empty(shape, dtype, order) File "ndarray.py", line 126, in shared_empty wrapper = heap.BufferWrapper(nbytes) File "array_heap.py", line 168, in __init__ block = BufferWrapper._heap.malloc(size) File "array_heap.py", line 148, in malloc (arena, start, stop) = self._malloc(size) File "array_heap.py", line 70, in _malloc arena = Arena(length) File "array_heap.py", line 37, in __init__ self.buffer = SharedMemoryBuffer(size) File "sharedmemory_sysv.pyx", line 170, in sharedmemory_sysv.SharedMemoryBuffer.__init__ (sharedmemory_sysv.c:1400) raise OSError, "Failed to attach shared memory: permission denied" OSError: Failed to attach shared memory: permission denied Basically this means that the shmat on line 167 of sharedmemory_sysv.pyx is failing. I don't really know why, but I suspect this might be something stupid. I need to go to bed now, and I probably won't have time to look at that at all before thursday evening.
Maybe I will be in luck and someone more clever than me will have time to look at that in the mean time :). Cheers, Ga?l -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmem.zip Type: application/x-zip-compressed Size: 14704 bytes Desc: not available URL: From philip at semanchuk.com Tue Feb 10 18:23:13 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Tue, 10 Feb 2009 18:23:13 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090210231356.GC9128@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> Message-ID: <047B5950-BB40-494F-9230-2C46BD138E50@semanchuk.com> On Feb 10, 2009, at 6:13 PM, Gael Varoquaux wrote: > Now I am stuck because shared memory allocation is not working. This > boils down to the following traceback: > > Traceback (most recent call last): > File "test.py", line 4, in > a = shmem.shared_zeros(10) > File "ndarray.py", line 135, in shared_zeros > arr = shared_empty(shape, dtype, order) > File "ndarray.py", line 126, in shared_empty > wrapper = heap.BufferWrapper(nbytes) > File "array_heap.py", line 168, in __init__ > block = BufferWrapper._heap.malloc(size) > File "array_heap.py", line 148, in malloc > (arena, start, stop) = self._malloc(size) > File "array_heap.py", line 70, in _malloc > arena = Arena(length) > File "array_heap.py", line 37, in __init__ > self.buffer = SharedMemoryBuffer(size) > File "sharedmemory_sysv.pyx", line 170, in > sharedmemory_sysv.SharedMemoryBuffer.__init__ (sharedmemory_sysv.c: > 1400) > raise OSError, "Failed to attach shared memory: permission denied" > OSError: Failed to attach shared memory: permission denied > > Basically this means that the shmat on line 167 of > sharedmemory_sysv.pyx > is failing. I don't really know why, but I suspect this might be > something stupid. One problem I see is that the call to shmget() specifies no permissions. The third param to shmget() should contain two sets of bitwise params OR-ed together. The first set is IPC_CREAT and IPC_EXCL, the second set is the permissions. So you might want to change line 156 to this: shmid = shmget(key, buf_size, IPC_CREAT | IPC_EXCL | 0600) or this: shmid = shmget(key, buf_size, IPC_CREAT | IPC_EXCL | 0666) http://www.opengroup.org/onlinepubs/009695399/functions/shmget.html From robert.kern at gmail.com Tue Feb 10 18:24:37 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 10 Feb 2009 17:24:37 -0600 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090210231356.GC9128@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> Message-ID: <3d375d730902101524h2b96961bs93106ba7898556d1@mail.gmail.com> On Tue, Feb 10, 2009 at 17:13, Gael Varoquaux wrote: > On Tue, Feb 10, 2009 at 02:23:09AM +0100, Sturla Molden wrote: >> Ok, the work is basically done :) > >> What remains is testing/debugging and a setup script. > > I did a setup script, and I had to change a few detail because Cython was > unhappy with the names of the modules (I suspect local imports happening > instead of absolute ones). > > I had to add a __weakref__ attribute to the handle, to make it so that it > can be weakref'd. > > Now I am stuck because shared memory allocation is not working. 
This > boils down to the following traceback: > > Traceback (most recent call last): > File "test.py", line 4, in > a = shmem.shared_zeros(10) > File "ndarray.py", line 135, in shared_zeros > arr = shared_empty(shape, dtype, order) > File "ndarray.py", line 126, in shared_empty > wrapper = heap.BufferWrapper(nbytes) > File "array_heap.py", line 168, in __init__ > block = BufferWrapper._heap.malloc(size) > File "array_heap.py", line 148, in malloc > (arena, start, stop) = self._malloc(size) > File "array_heap.py", line 70, in _malloc > arena = Arena(length) > File "array_heap.py", line 37, in __init__ > self.buffer = SharedMemoryBuffer(size) > File "sharedmemory_sysv.pyx", line 170, in > sharedmemory_sysv.SharedMemoryBuffer.__init__ (sharedmemory_sysv.c:1400) > raise OSError, "Failed to attach shared memory: permission denied" > OSError: Failed to attach shared memory: permission denied I believe that was the error I kept running into when I was futzing around with this. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Wed Feb 11 02:39:21 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 11 Feb 2009 09:39:21 +0200 Subject: [SciPy-user] Array selection help In-Reply-To: <20090210225642.141110@gmx.net> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> Message-ID: <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> 2009/2/11 Jose Luis Gomez Dans : > Going back to my example, my labels grid is arr1, so I just have to do, eg: > for i in numpy.unique ( arr1 ): > print i, scipy.ndimage.mean ( arr2, labels=arr1, index=i ) Ndimage can also do the for loop: scipy.ndimage.mean(arr2, labels=arr1, index=np.unique(arr1)) St?fan From millman at berkeley.edu Wed Feb 11 03:26:56 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 11 Feb 2009 00:26:56 -0800 Subject: [SciPy-user] ANN: SciPy 0.7.0 Message-ID: I'm pleased to announce SciPy 0.7.0. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. This release comes sixteen months after the 0.6.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.7.0 requires Python 2.4 or greater (but not Python 3) and NumPy 1.2.0 or greater. For information, please see the release notes: https://sourceforge.net/project/shownotes.php?release_id=660191&group_id=27747 You can download the release from here: https://sourceforge.net/project/showfiles.php?group_id=27747&package_id=19531&release_id=660191 Thank you to everybody who contributed to this release. 
Enjoy, Jarrod Millman From gael.varoquaux at normalesup.org Wed Feb 11 03:35:04 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 11 Feb 2009 09:35:04 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <047B5950-BB40-494F-9230-2C46BD138E50@semanchuk.com> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <047B5950-BB40-494F-9230-2C46BD138E50@semanchuk.com> Message-ID: <20090211083504.GA13047@phare.normalesup.org> On Tue, Feb 10, 2009 at 06:23:13PM -0500, Philip Semanchuk wrote: > One problem I see is that the call to shmget() specifies no > permissions. The third param to shmget() should contain two sets of > bitwise params OR-ed together. The first set is IPC_CREAT and > IPC_EXCL, the second set is the permissions. So you might want to > change line 156 to this: > shmid = shmget(key, buf_size, IPC_CREAT | IPC_EXCL | 0600) > or this: > shmid = shmget(key, buf_size, IPC_CREAT | IPC_EXCL | 0666) Indeed, Philip, that was it. Thanks a lot for your help. Ga?l From gael.varoquaux at normalesup.org Wed Feb 11 06:46:20 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 11 Feb 2009 12:46:20 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090210231356.GC9128@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> Message-ID: <20090211114620.GB19956@phare.normalesup.org> I shouldn't be working on that, but this is way more fun :). So I found a few more simple errors in the code and fixed them (code attached). The garbage collector thread lock multiprocessing. I am not sure why. I disabled it just to see what would happen. I added a few print statement to try and debug eventual memory leaks. I think I have a memory leak, judging from the different prints. I am not sure though, and I wonder if there is a good way of checking this, other than running the test code in a big loop and checking if the test box eventually dies. Valgrind does report some 'possibly lost' blocks that increase with the size of the array. Anybody has a suggestion on how to debug this? Cheers, Ga?l -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmem.zip Type: application/x-zip-compressed Size: 11588 bytes Desc: not available URL: From sturla at molden.no Wed Feb 11 07:04:59 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Feb 2009 13:04:59 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090210231356.GC9128@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> Message-ID: <4992BEEB.4030507@molden.no> On 2/11/2009 12:13 AM, Gael Varoquaux wrote: > I did a setup script, and I had to change a few detail because Cython was > unhappy with the names of the modules (I suspect local imports happening > instead of absolute ones). > > I had to add a __weakref__ attribute to the handle, to make it so that it > can be weakref'd. Thanks Gael I've noticed you used the version I posted to the list, and not the latest on the web. So there is a lot of debugging you missed. I'll do a quick merge of what you posted with mine. I inherited via a Python class to allow a weakref to a Handle. Your solution is cleaner. As for the clean-up thread: A shared segment has an owner on Linux. Only the owner or superuser can mark it for deletion. 
Someone else but the owner may be the last to detach, and then marking for deletion will fail. I think we should remove the thread and raise an exception if marking for deletion fails. We cannot completely foolproof the clean-up against use of os.setuid anyway. Sturla Molden From sturla at molden.no Wed Feb 11 07:07:49 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Feb 2009 13:07:49 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090211114620.GB19956@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> Message-ID: <4992BF95.6050505@molden.no> On 2/11/2009 12:46 PM, Gael Varoquaux wrote: > I shouldn't be working on that, but this is way more fun :). > > So I found a few more simple errors in the code and fixed them Here is my working Windows version from yesterday: http://folk.uio.no/sturlamo/python/sharedmem.zip Sturla Molden From sturla at molden.no Wed Feb 11 07:31:53 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Feb 2009 13:31:53 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090211114620.GB19956@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> Message-ID: <4992C539.7020501@molden.no> On 2/11/2009 12:46 PM, Gael Varoquaux wrote: > So I found a few more simple errors in the code and fixed them (code > attached). The garbage collector thread lock multiprocessing. def __dealloc__(SharedMemoryBuffer self): print 'Calling __dealloc__ on buffer at %s' \ % self.mapped_address #DBG self.handle.dealloc() Why do you do this? The Handle should self destruct. Anyway, this is evil and will possibly case multiprocessing to hang, as well as segfaults. Sturla From sturla at molden.no Wed Feb 11 07:41:39 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Feb 2009 13:41:39 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090211114620.GB19956@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> Message-ID: <4992C783.6060807@molden.no> On 2/11/2009 12:46 PM, Gael Varoquaux wrote: > So I found a few more simple errors in the code and fixed them (code > attached). Gael, I tried to merge your changes with mine. My Python code worked yesterday (albeit not the version you've debugging), so I guess it still does. The gc thread is removed. Sturla -------------- next part -------------- A non-text attachment was scrubbed... Name: sharedmem.zip Type: application/x-zip-compressed Size: 7867 bytes Desc: not available URL: From cournape at gmail.com Wed Feb 11 07:46:05 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 11 Feb 2009 21:46:05 +0900 Subject: [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages Message-ID: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> Hi, I started to set up a PPA for scipy on launchpad, which enables to build ubuntu packages for various distributions/architectures. 
The link is there: https://edge.launchpad.net/~scipy/+archive/ppa So you just need to add one line to your /etc/apt/sources.list, and you will get uptodate numpy and scipy packages, cheers, David From python-ml at nn7.de Wed Feb 11 07:41:12 2009 From: python-ml at nn7.de (Soeren Sonnenburg) Date: Wed, 11 Feb 2009 13:41:12 +0100 Subject: [SciPy-user] sparse matrices again Message-ID: <1234356072.5642.16.camel@localhost> Dear all, is it somehow possible to interface to the C API of scipy's spars matrices? I know numpy does not have sparse matrix support but scipy does (at least it can be used from the python side). If it is not too unstable then I would invest some time to get some swig typemaps to connect to it. Soeren From scott.sinclair.za at gmail.com Wed Feb 11 08:13:10 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 11 Feb 2009 15:13:10 +0200 Subject: [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages In-Reply-To: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> References: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> Message-ID: <6a17e9ee0902110513x46905c03y7f6dcd1b93839406@mail.gmail.com> > 2009/2/11 David Cournapeau : > I started to set up a PPA for scipy on launchpad, which enables to > build ubuntu packages for various distributions/architectures. The > link is there: > > https://edge.launchpad.net/~scipy/+archive/ppa > > So you just need to add one line to your /etc/apt/sources.list, and > you will get uptodate numpy and scipy packages, Thanks! Cheers, Scott From faltet at pytables.org Wed Feb 11 08:20:49 2009 From: faltet at pytables.org (Francesc Alted) Date: Wed, 11 Feb 2009 14:20:49 +0100 Subject: [SciPy-user] ANN: Numexpr 1.2 released Message-ID: <200902111420.49579.faltet@pytables.org> ======================== Announcing Numexpr 1.2 ======================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. The main feature added in this version is the support of the Intel VML library (many thanks to Gregor Thalhammer for his nice work on this!). In addition, when the VML support is on, several processors can be used in parallel (see the new `set_vml_num_threads()` function). When the VML support is on, the computation of transcendental functions (like trigonometrical, exponential, logarithmic, hyperbolic, power...) can be accelerated quite a few. Typical speed-ups when using one single core for contiguous arrays are around 3x, with peaks of 7.5x (for the pow() function). When using 2 cores the speed-ups are around 4x and 14x respectively. In case you want to know more in detail what has changed in this version, have a look at the release notes: http://code.google.com/p/numexpr/wiki/ReleaseNotes Where I can find Numexpr? ========================= The project is hosted at Google code in: http://code.google.com/p/numexpr/ And you can get the packages from PyPI as well: http://pypi.python.org/pypi How it works? ============= See: http://code.google.com/p/numexpr/wiki/Overview for a detailed description of the package. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy! 
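A minimal usage sketch (the arrays and the expression here are arbitrary, and set_vml_num_threads() only applies to a VML-enabled build as described above):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = ne.evaluate("3*a + 4*b")   # same values as 3*a + 4*b, evaluated without large temporaries
# On a VML-enabled build, transcendental expressions can also use several cores:
# ne.set_vml_num_threads(2)
# d = ne.evaluate("exp(-a) * sin(b)")
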
-- Francesc Alted From gael.varoquaux at normalesup.org Wed Feb 11 08:33:06 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 11 Feb 2009 14:33:06 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <4992C539.7020501@molden.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> Message-ID: <20090211133305.GC19956@phare.normalesup.org> On Wed, Feb 11, 2009 at 01:31:53PM +0100, Sturla Molden wrote: > def __dealloc__(SharedMemoryBuffer self): > print 'Calling __dealloc__ on buffer at %s' \ > % self.mapped_address #DBG > self.handle.dealloc() > Why do you do this? The Handle should self destruct. Anyway, this is > evil and will possibly case multiprocessing to hang, as well as segfaults. This was for debugging. I do not understand why my test code shows only one call to __dealloc__ (see below), and I am trying to figure out why. I fear this has more to do with Python's garbage collector. I agree this is evil. However, if I don't add this code, the __dealloc__ method of the handler does not seem get called in my example. Here is what worries me: I run this test code: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ import ndarray as shmem import numpy as np def modify_array(ary): ary[:3] = 1 print 'Array address in sub program %s' % ary.ctypes.data from multiprocessing import Pool def main(): a = shmem.shared_zeros(10) p = Pool() print 'Array address in main program %s' % a.ctypes.data print a job = p.apply_async(modify_array, (a, )) p.close() p.join() print a main() import gc gc.collect() ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ I get the following output: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Array address in main program 47294723575808 [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Array address in sub program 47294723575808 Calling __dealloc__ on buffer at 47294723575808 Deallocated memory at 47294723575808 [ 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The two messages about deallocation are debug prints that I inserted in the two __dealloc__ methods. It seems to me that the array 'a' in the main program has not been dellocated. I thus believe that there is a memory leak (I haven't been able to really confirm). It seems to me that the __dealloc__ method of 'a' does not get called in the main program. I have also just added print of pid (not in above example), and the two calls to __dealloc__ do happen in the child process. Finally, if I do not call explictely __dealloc__ for the handler in the dealloc of the buffer, I do not see it being called. So I am wondering if we are not being tricked by the fact that Python calls the __del__ method lazily, in particular when quitting. Maybe the solution to this problem is to add an exit hook (seems like that's what other people did when faced with this problem: http://www.python.org/search/hypermail/python-recent/0635.html, follow up is also interresting: http://www.python.org/search/hypermail/python-recent/0636.html), however this is not terribly robust. I wonder how mutliprocessing deals with this problem. By the way, I have just found a trivial bug: if I call shared_zeros with 1e5 as an argument, the code does not realise it should process this as an int. 
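A sketch of the kind of coercion meant here (illustration only, not the actual sharedmem code): accept either a sequence or a bare number, including a float such as 1e5, and turn it into a tuple of ints.

def _normalize_shape(shape):
    # Illustration only -- not the shared_empty implementation.
    try:
        shape = tuple(shape)             # already a sequence, e.g. (10, 10)
    except TypeError:
        shape = (shape,)                 # a bare number, e.g. 10 or 1e5
    return tuple(int(n) for n in shape)

# _normalize_shape(1e5) == (100000,)
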
I suggest that shared_empty also accepts floats in the 'magic' cast from numbers to tuple for the shape, as this is what numpy does. Ga?l From sturla at molden.no Wed Feb 11 08:51:38 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Feb 2009 14:51:38 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090211133305.GC19956@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> Message-ID: <4992D7EA.5070404@molden.no> You need to do if __name__ == "__main__": main() in you testing code for multiprocessing to work correctly. Leaving it out is a source of mysterious errors. In fact on Windows, leaving it out creates something similar to a fork bomb. > So I am wondering if we are not being tricked by the fact that Python > calls the __del__ method lazily, in particular when quitting. Maybe the > solution to this problem is to add an exit hook (seems like that's what > other people did when faced with this problem: > http://www.python.org/search/hypermail/python-recent/0635.html, follow up > is also interresting: > http://www.python.org/search/hypermail/python-recent/0636.html), however > this is not terribly robust. I wonder how mutliprocessing deals with this > problem. multiprocessing.util.Finalize is an exit hook. That should do the clean-up in the main program. It clean up the BufferWrapper object, which owns the SharedMemoryBuffer. As long as the Heap object is destroyed, it will clean up. Try to put in some printing to see if the buffer is marked for removal. Do not use a Cython's print statement but something else (e.g. printf from stdio.h). Sturla From josegomez at gmx.net Wed Feb 11 08:56:48 2009 From: josegomez at gmx.net (Jose Luis Gomez Dans) Date: Wed, 11 Feb 2009 14:56:48 +0100 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> Message-ID: <20090211135648.67960@gmx.net> St?fan, On Wednesday 11 February 2009 07:39:21 St?fan van der Walt wrote: > scipy.ndimage.mean(arr2, labels=arr1, index=np.unique(arr1)) True. The question is, how do I get the output of your code back into my original array? Presumably, there's another function that does that quickly? Many thanks! Jose -- Remote Sensing Unit | Env. Monitoring and Modelling Group Dept. of Geography | Dept. of Geography University College London | King's College London Gower St, London WC1E 6BT UK | Strand Campus, Strand, London WC2R 2LS UK -- Psssst! Schon vom neuen GMX MultiMessenger geh?rt? 
Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01 From sturla at molden.no Wed Feb 11 08:58:24 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Feb 2009 14:58:24 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <4992D7EA.5070404@molden.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> Message-ID: <4992D980.2070402@molden.no> On 2/11/2009 2:51 PM, Sturla Molden wrote: > As long as the Heap object is destroyed, eh, Handle object. cdef extern from "stdio.h": void printf(char *str) cdef class Handle: """ Automatic shared segment deattachment - without this object we would need to do reference counting manually, as shmdt is global to the process. Do not instantiate this class, except from within SharedMemoryBuffer.__init__. """ cdef int shmid cdef object name cdef object cleanup cdef object __weakref__ def __init__(Handle self, shmid, name): self.shmid = shmid self.name = name def gethandle(Handle self): return int(self.shmid) def __dealloc__(Handle self): self.dealloc() def dealloc(Handle self): cdef shmid_ds buf cdef int _shmid= self.shmid cdef void *addr cdef int ierr try: ma, size = __mapped_addresses[ self.name ] addr = ( ma) ierr = shmdt(addr) if (ierr < 0): raise MemoryError, "shmdt failed." del __mapped_addresses[ self.name ] print "Deallocated memory at %s" % ma #DBG except KeyError: print __mapped_addresses #DBG print self.name #DBG print 'KeyError' #DBG #pass # this may happen and is not a problem if (shmctl(_shmid, IPC_STAT, &buf) == -1): raise OSError, \ "IPC_STAT failed, you could have a global memory leak!" if (buf.shm_nattch == 0): if( shmctl(_shmid, IPC_RMID, NULL) == -1 ): raise OSError, \ "IPC_RMID failed, you have a global memory leak!" else: printf("shared segment removed\n") S.M. From gael.varoquaux at normalesup.org Wed Feb 11 09:24:22 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 11 Feb 2009 15:24:22 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <4992D7EA.5070404@molden.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> Message-ID: <20090211142422.GD19956@phare.normalesup.org> On Wed, Feb 11, 2009 at 02:51:38PM +0100, Sturla Molden wrote: > As long as the Heap object is destroyed, it will clean up. Try to put in > some printing to see if the buffer is marked for removal. Do not use a > Cython's print statement but something else (e.g. printf from stdio.h). I have put in some more print statements. I have the fealing that it is not cleaned up. I am attaching my test code, and the modified sharedmemory_sysv.pyx for debug. The output is the following: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Array address in main program 47782588157952 (pid: 18024) [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Array address in sub program 47782588157952 (pid: 18033) Calling __dealloc__ on buffer at 47782588157952, in pid 18033 Checking for deallocating of memory at 47782588157952 Not deallocating: 8 attached segments [ 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.] 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ I do have the feeling that something is not getting garbage-collected. Ga?l -------------- next part -------------- # Written by Sturla Molden, 2009 # Released under SciPy license # ctypedef int size_t cdef extern from "errno.h": int EEXIST, errno int EACCES, errno int ENOMEM, errno cdef extern from "string.h": void memset(void *addr, int val, size_t len) void memcpy(void *trg, void *src, size_t len) cdef extern from "sys/types.h": ctypedef int key_t cdef extern from "sys/shm.h": ctypedef unsigned int shmatt_t cdef struct shmid_ds: shmatt_t shm_nattch int shmget(key_t key, size_t size, int shmflg) void *shmat(int shmid, void *shmaddr, int shmflg) int shmdt(void *shmaddr) int shmctl(int shmid, int cmd, shmid_ds *buf) nogil cdef extern from "stdio.h": void printf(char *str, ...) cdef extern from "sys/ipc.h": key_t ftok(char *path, int id) int IPC_STAT, IPC_RMID, IPC_CREAT, IPC_EXCL, IPC_PRIVATE cdef extern from "unistd.h": unsigned int sleep(unsigned int seconds) nogil import uuid import weakref import numpy import threading import os cdef object __mapped_addresses = dict() cdef object __open_handles = weakref.WeakValueDictionary() cdef class Handle: """ Automatic shared segment deattachment - without this object we would need to do reference counting manually, as shmdt is global to the process. Do not instantiate this class, except from within SharedMemoryBuffer.__init__. """ cdef int shmid cdef object name cdef object __weakref__ def __init__(Handle self, shmid, name): self.shmid = shmid self.name = name def gethandle(Handle self): return int(self.shmid) def __dealloc__(Handle self): self.dealloc() def dealloc(Handle self): cdef shmid_ds buf cdef int _shmid = self.shmid cdef void *addr cdef int ierr try: ma, size = __mapped_addresses[ self.name ] addr = ( ma) ierr = shmdt(addr) if (ierr < 0): raise MemoryError, "shmdt failed." del __mapped_addresses[ self.name ] printf("Checking for deallocating of memory at %lu\n", ma) #DBG except KeyError: print __mapped_addresses #DBG print self.name #DBG print 'KeyError' #DBG #pass if (shmctl(_shmid, IPC_STAT, &buf) == -1): raise OSError, \ "IPC_STAT failed, you could have a global memory leak!" if (buf.shm_nattch == 0): if( shmctl(_shmid, IPC_RMID, NULL) == -1 ): raise OSError, \ "IPC_RMID failed, you have a global memory leak!" else: printf("shared segment removed\n") else: printf('Not deallocating: %i attached segments\n', buf.shm_nattch) cdef class SharedMemoryBuffer: """ Windows API shared memory segment """ cdef void *mapped_address cdef object name cdef object handle cdef int shmid cdef unsigned long size def __init__(SharedMemoryBuffer self, unsigned int buf_size, name=None, unpickling=False): cdef void* mapped_address cdef long mode cdef int shmid cdef int ikey cdef key_t key lkey = 1 if IPC_PRIVATE < 0 else IPC_PRIVATE + 1 if (name is None) and (unpickling): raise TypeError, "Cannot unpickle without a kernel object name." 
elif (name is None) and not unpickling: # create a brand new shared segment while 1: self.name = numpy.random.random_integers(lkey, int(2147483646)) ikey = self.name memset( &key, 0, sizeof(key_t)) memcpy( &key, &ikey, sizeof(int)) # key_t is large enough to contain an int shmid = shmget(key, buf_size, IPC_CREAT|IPC_EXCL|0600) if (shmid < 0): if (errno != EEXIST): raise OSError, "Failed to open shared memory" else: # we have an open segment break self.handle = Handle(int(shmid), self.name) __open_handles[ self.name ] = self.handle mapped_address = shmat(shmid, NULL, 0) if (mapped_address == -1): if errno == EACCES: raise OSError, "Failed to attach shared memory: permission denied" elif errno == ENOMEM: raise OSError, "Failed to attach shared memory: insufficient memory" else: raise OSError, "Failed to attach shared memory" self.shmid = shmid self.size = buf_size self.mapped_address = mapped_address ma = int( self.mapped_address) size = int(buf_size) __mapped_addresses[ self.name ] = ma, size else: # unpickling self.name = name try: # check if this process has an open handle to # this segment already self.handle = __open_handles[ self.name ] self.shmid = self.handle.gethandle() ma, size = __mapped_addresses[ self.name ] self.mapped_address = ( ma) self.size = size except KeyError: # unpickle a segment created by another process ikey = self.name memset( &key, 0, sizeof(key_t)) memcpy( &key, &ikey, sizeof(int)) shmid = shmget(key, buf_size, 0) if (shmid < 0): raise OSError, "Failed to open shared memory" self.handle = Handle(int(shmid), name) __open_handles[ self.name ] = self.handle mapped_address = shmat(shmid, NULL, 0) if (mapped_address == -1): raise OSError, "Failed to attach shared memory" self.shmid = shmid self.size = buf_size self.mapped_address = mapped_address ma = int( self.mapped_address) size = int(buf_size) __mapped_addresses[ self.name ] = ma, size def __dealloc__(SharedMemoryBuffer self): printf('Calling __dealloc__ on buffer at %lu, in pid %i\n', #DBG self.mapped_address, #DBG os.getpid()) #DBG self.handle.dealloc() # return base address and segment size # this will be used by the heap object def getbuffer(SharedMemoryBuffer self): return int( self.mapped_address), int(self.size) # pickle def __reduce__(SharedMemoryBuffer self): return (__unpickle_shm, (self.size, self.name)) def __unpickle_shm(*args): s, n = args return SharedMemoryBuffer(s, name=n, unpickling=True) -------------- next part -------------- import ndarray as shmem import numpy as np import sys import os #a = shmem.shared_zeros(10) #print >>sys.stderr, 'Array created' #print a.ctypes.data #print a #print >>sys.stderr, 'Array printed' def modify_array(ary): ary[:3] = 1 print >>sys.stderr, 'Array address in sub program %s (pid: %s)' \ % (ary.ctypes.data, os.getpid()) from multiprocessing import Pool def main(): SIZE = 10 a = shmem.shared_zeros(SIZE) #a = np.zeros(SIZE) p = Pool() print >>sys.stderr, 'Array address in main program %s (pid: %s)' \ % (a.ctypes.data, os.getpid()) print >>sys.stderr, a job = p.apply_async(modify_array, (a, )) p.close() p.join() print >>sys.stderr, a if __name__ == '__main__': main() import gc gc.collect() From stefan at sun.ac.za Wed Feb 11 09:22:05 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 11 Feb 2009 16:22:05 +0200 Subject: [SciPy-user] Array selection help In-Reply-To: <20090211135648.67960@gmx.net> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> 
<9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> Message-ID: <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> 2009/2/11 Jose Luis Gomez Dans : > On Wednesday 11 February 2009 07:39:21 St?fan van der Walt wrote: >> scipy.ndimage.mean(arr2, labels=arr1, index=np.unique(arr1)) > > True. The question is, how do I get the output of your code back into my > original array? Presumably, there's another function that does that quickly? It is already in an array, so I'm not sure I understand. Maybe you mean out[:] = scipy.ndimage.mean(...) ? St?fan From wnbell at gmail.com Wed Feb 11 09:40:11 2009 From: wnbell at gmail.com (Nathan Bell) Date: Wed, 11 Feb 2009 09:40:11 -0500 Subject: [SciPy-user] sparse matrices again In-Reply-To: <1234356072.5642.16.camel@localhost> References: <1234356072.5642.16.camel@localhost> Message-ID: On Wed, Feb 11, 2009 at 7:41 AM, Soeren Sonnenburg wrote: > > is it somehow possible to interface to the C API of scipy's spars > matrices? I know numpy does not have sparse matrix support but scipy > does (at least it can be used from the python side). > > If it is not too unstable then I would invest some time to get some swig > typemaps to connect to it. > The interface is not guaranteed to be stable, but you can access the C++ functions that implement much of scipy.sparse through scipy.sparse.sparsetools. What do you want to do exactly? -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From josegomez at gmx.net Wed Feb 11 10:03:15 2009 From: josegomez at gmx.net (Jose Luis Gomez Dans) Date: Wed, 11 Feb 2009 16:03:15 +0100 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> Message-ID: <20090211150315.141380@gmx.net> > >> scipy.ndimage.mean(arr2, labels=arr1, index=np.unique(arr1)) > > > > True. The question is, how do I get the output of your code back into my > > original array? Presumably, there's another function that does that > quickly? > > It is already in an array, so I'm not sure I understand. Maybe you mean > > out[:] = scipy.ndimage.mean(...) ? Sorry, I was clumsy with my wording. What I meant is how to put together the results, so that I have a 2D array where the value of each element is the value that corresponds to the mean of the corresponding label. So if arr1[100,100] = 4 (say), and after running the mean of arr2 for elements that in arr1 are labeled as 4, the mean value is 2.3, I'd like to have an array (out, out.shape == arr1.shape) where the values of elements of out that share a common label are given the mean value (2.3 for those labeled as 4 in my previous example). In essence, I want to have an array where each element is the mean value for its corresponding class. many thanks! Jose -- Jetzt 1 Monat kostenlos! 
GMX FreeDSL - Telefonanschluss + DSL f?r nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a From stefan at sun.ac.za Wed Feb 11 10:26:31 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 11 Feb 2009 17:26:31 +0200 Subject: [SciPy-user] Array selection help In-Reply-To: <20090211150315.141380@gmx.net> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> Message-ID: <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> 2009/2/11 Jose Luis Gomez Dans : >> out[:] = scipy.ndimage.mean(...) ? > > Sorry, I was clumsy with my wording. What I meant is how to put together the results, so that I have a 2D array where the value of each element is the value that corresponds to the mean of the corresponding label. So if arr1[100,100] = 4 (say), and after running the mean of arr2 for elements that in arr1 are labeled as 4, the mean value is 2.3, I'd like to have an array (out, out.shape == arr1.shape) where the values of elements of out that share a common label are given the mean value (2.3 for those labeled as 4 in my previous example). > > In essence, I want to have an array where each element is the mean value for its corresponding class. Thanks, now I understand! In that case your for-loop should be fine (I guess you won't have too many unique indices?). Cheers St?fan From josegomez at gmx.net Wed Feb 11 10:41:15 2009 From: josegomez at gmx.net (Jose Luis Gomez Dans) Date: Wed, 11 Feb 2009 16:41:15 +0100 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> Message-ID: <20090211154115.67130@gmx.net> Hi, > > In essence, I want to have an array where each element is the mean value > for its corresponding class. > > Thanks, now I understand! In that case your for-loop should be fine > (I guess you won't have too many unique indices?). Well, there can be quite a lot of them (~10000 at least), so it does take a long while. I was just wondering whether some numpy/scipy array Jedi trick might speed it up :) jose -- Jetzt 1 Monat kostenlos! 
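One vectorised trick worth noting alongside the replies below, as a sketch that assumes the labels in arr1 are non-negative integers: numpy.bincount gives all per-label sums and counts in a single pass, with no Python loop over the ~10000 labels, and fancy indexing then broadcasts the means straight back onto the grid.

import numpy as np

labels = arr1.ravel()
values = arr2.ravel()
sums   = np.bincount(labels, weights=values)   # sum of arr2 over each label value
counts = np.bincount(labels)                   # number of pixels carrying each label value
means  = sums / np.maximum(counts, 1)          # per-label means (labels that never occur stay 0)
out    = means[arr1]                           # 2D array: every pixel replaced by its class mean
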
GMX FreeDSL - Telefonanschluss + DSL f?r nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a From josef.pktd at gmail.com Wed Feb 11 11:27:44 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 11:27:44 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <20090211154115.67130@gmx.net> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> Message-ID: <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> What's your average number of observations per label? If you have only a few number of observations per label, then using the looping once through your array in python is faster, then the way you were building your dict in the initial message: Below are some timing comparisons, first line is your usage of numpy, second line is one python loop. you see that the python loop scales much better Josef (length of observation array is 2000, labels are random integers) mean observation per label 40.0 0.404751721231 0.361348718907 >>> mean observation per label 200.0 0.149529060262 0.349892234903 >>> mean observation per label 4.0 2.87190969802 0.380998981716 >>> mean observation per label 2.0 4.87971013076 0.405277207021 >>> mean observation per label 400.0 0.117748205434 0.432144029481 for len(arr1) = 100000 and 10000 labels: mean observation per label 10.0 22.9237349998 0.292642780018 Note: the return types differ, version two return plain lists as dict values ------------------------- file----------------- import numpy as np from scipy import ndimage from numpy.testing import assert_array_equal n = 10000 size = 100000 print 'mean observation per label', size/float(n) rvs= np.random.randint(n,size=size) arr1 = rvs arr2 = float(n)-rvs def usendimage(arr1,arr2): for i in np.unique(arr1): print i, ndimage.mean(arr2, labels=arr1, index=i) labelsunique = np.unique(arr1) print labelsunique print ndimage.mean(arr2, labels=arr1, index=labelsunique) def labelcoord1(arr1, arr2): #Get all the unique values in arr1 U = np.unique ( arr1 ) #Create a dictionary with the unique values as key, and the #locations of elements that have that value in arr1 R = dict (zip ( [U[i] for i in xrange(U.shape[0])], \ [ np.nonzero( arr1==U[i]) for i in xrange(U.shape[0]) ] ) ) return R # value of dict is tuple def labelcoord2(arr1, arr2): #Get all the unique values in arr1 U = np.unique ( arr1 ) #Create a dictionary with the unique values as key, and the #locations of elements that have that value in arr1 R = {} for index, row in enumerate(zip(arr1,arr2)): R.setdefault(row[0],[]).append(index) return R # value of dict is list # So I now have a dictionary with the unique values of arr1, and the mean # value of arr2 for those pixels. import timeit t=timeit.Timer("labelcoord1(arr1, arr2)", "from __main__ import *") print t.timeit(1) t=timeit.Timer("labelcoord2(arr1, arr2)", "from __main__ import *") print t.timeit(1) R1 = labelcoord1(arr1, arr2) R2 = labelcoord2(arr1, arr2) for k in sorted(R1): assert_array_equal(R1[k][0], np.array(R2[k])) On 2/11/09, Jose Luis Gomez Dans wrote: > Hi, > >> > In essence, I want to have an array where each element is the mean value >> for its corresponding class. 
>> >> Thanks, now I understand! In that case your for-loop should be fine >> (I guess you won't have too many unique indices?). > > Well, there can be quite a lot of them (~10000 at least), so it does take a > long while. I was just wondering whether some numpy/scipy array Jedi trick > might speed it up :) > > jose > -- > Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL > f?r nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Wed Feb 11 11:47:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 11:47:52 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> Message-ID: <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> list comprehension is still a bit faster. That's about 90 times faster than your version for building the dict of indices for this case. Josef def labelcoord3(arr1, arr2): R = {} [R.setdefault(row[0],[]).append(index) for index, row in enumerate(zip(arr1,arr2))] return R mean observation per label 10.0 labelcoord2 0.374560733278 labelcoord3 0.254217505297 >>> len(R3) # number of different labels 10000 >>> len(arr1) # number of observations 100000 On 2/11/09, josef.pktd at gmail.com wrote: > What's your average number of observations per label? > > If you have only a few number of observations per label, then using > the looping once through your array in python is faster, then the way > you were building your dict in the initial message: > > Below are some timing comparisons, first line is your usage of numpy, > second line is one python loop. 
> you see that the python loop scales much better > > Josef > > > (length of observation array is 2000, labels are random integers) > > mean observation per label 40.0 > 0.404751721231 > 0.361348718907 >>>> > mean observation per label 200.0 > 0.149529060262 > 0.349892234903 >>>> > mean observation per label 4.0 > 2.87190969802 > 0.380998981716 >>>> > mean observation per label 2.0 > 4.87971013076 > 0.405277207021 >>>> > mean observation per label 400.0 > 0.117748205434 > 0.432144029481 > > for len(arr1) = 100000 and 10000 labels: > > mean observation per label 10.0 > 22.9237349998 > 0.292642780018 > > Note: the return types differ, version two return plain lists as dict > values > ------------------------- file----------------- > > > import numpy as np > from scipy import ndimage > from numpy.testing import assert_array_equal > > n = 10000 > size = 100000 > print 'mean observation per label', size/float(n) > rvs= np.random.randint(n,size=size) > arr1 = rvs > arr2 = float(n)-rvs > > def usendimage(arr1,arr2): > for i in np.unique(arr1): > print i, ndimage.mean(arr2, labels=arr1, index=i) > > labelsunique = np.unique(arr1) > print labelsunique > print ndimage.mean(arr2, labels=arr1, index=labelsunique) > > > > def labelcoord1(arr1, arr2): > #Get all the unique values in arr1 > U = np.unique ( arr1 ) > #Create a dictionary with the unique values as key, and the > #locations of elements that have that value in arr1 > R = dict (zip ( [U[i] for i in xrange(U.shape[0])], \ > [ np.nonzero( arr1==U[i]) for i in xrange(U.shape[0]) ] ) ) > return R # value of dict is tuple > > def labelcoord2(arr1, arr2): > > #Get all the unique values in arr1 > U = np.unique ( arr1 ) > #Create a dictionary with the unique values as key, and the > #locations of elements that have that value in arr1 > R = {} > for index, row in enumerate(zip(arr1,arr2)): > R.setdefault(row[0],[]).append(index) > return R # value of dict is list > > > # So I now have a dictionary with the unique values of arr1, and the > mean > # value of arr2 for those pixels. > > > > import timeit > t=timeit.Timer("labelcoord1(arr1, arr2)", "from __main__ import *") > print t.timeit(1) > t=timeit.Timer("labelcoord2(arr1, arr2)", "from __main__ import *") > print t.timeit(1) > > R1 = labelcoord1(arr1, arr2) > R2 = labelcoord2(arr1, arr2) > for k in sorted(R1): > assert_array_equal(R1[k][0], np.array(R2[k])) > > > > > On 2/11/09, Jose Luis Gomez Dans wrote: >> Hi, >> >>> > In essence, I want to have an array where each element is the mean >>> > value >>> for its corresponding class. >>> >>> Thanks, now I understand! In that case your for-loop should be fine >>> (I guess you won't have too many unique indices?). >> >> Well, there can be quite a lot of them (~10000 at least), so it does take >> a >> long while. I was just wondering whether some numpy/scipy array Jedi >> trick >> might speed it up :) >> >> jose >> -- >> Jetzt 1 Monat kostenlos! 
GMX FreeDSL - Telefonanschluss + DSL >> f?r nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > From jgomezdans at gmail.com Wed Feb 11 12:31:44 2009 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Wed, 11 Feb 2009 17:31:44 +0000 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> References: <20090210213600.123400@gmx.net> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> Message-ID: <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> Josef, Thanks for that 2009/2/11 > list comprehension is still a bit faster. That's about 90 times faster > than your version for building the dict of indices for this case. I still think ndimage is more obvious to use if you have images, and in my case it is fast (I didn't time it, but less than making a cup of horrible instant coffee ;p). My problem is that it takes a long time to go from a list/dictionary of class mean values (output from either St?fan's ndimage solution or your dictionary solution) back into the original 2D array. In fact, I'm thinking about using weave to achieve this, unless someone comes up with a better idea. Many thanks for your help, and for the code! Jose -- Remote Sensing Unit | Env. Monitoring and Modelling Group Dept. of Geography | Dept. of Geography University College London | King's College London Gower St, London WC1E 6BT UK | Strand Campus, Strand, London WC2R 2LS UK -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 11 13:23:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 13:23:42 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> Message-ID: <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> Putting the array back together with means also seems to be pretty fast this way. The test for correctness takes several times longer than the array creation. I think labelmeanfilter below does what you want. 
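A side note for the image case: since the labels in this thread really come from a 2D classification image, any of the 1D recipes can be wrapped in a ravel/reshape round trip. A rough sketch of that idea -- the function name and the searchsorted lookup are my own shorthand here, not code from the thread:

import numpy as np
from scipy import ndimage

def classmean_image(class_img, value_img):
    # replace every pixel by the mean of value_img over that pixel's class
    labels = class_img.ravel()
    values = value_img.ravel()
    uniq = np.unique(labels)            # sorted unique class ids
    means = np.asarray(ndimage.mean(values, labels=labels, index=uniq))
    # position of each pixel's class id inside uniq; valid because uniq is
    # sorted and contains every label that actually occurs
    pos = np.searchsorted(uniq, labels)
    return means[pos].reshape(class_img.shape)

The reshape at the end is the only image-specific step; the rest is the same 1D machinery discussed below.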
Josef timing again for large case: >>> arr1.size 100000 >>> labelsunique.size 10000 mean observation per label 10.0 labelcoord2 0.391759008477 labelcoord3 0.253704311581 labelmeanfilter 0.446776056733 def labelmeanfilter(arr1, arr2): R = {} [R.setdefault(row[0],[]).append(index) for index, row in enumerate(zip(arr1,arr2))] labelsunique = R.keys() #np.unique(arr1) labelmeans = ndimage.mean(arr2, labels=arr1, index=labelsunique) arr3 = np.zeros(arr1.shape) for k,v in zip(labelsunique,labelmeans): arr3[R[k]] = v return arr3 def test_labelmeanfilter(arr1, arr2): arr3b = labelmeanfilter(arr1, arr2) labmeandict = dict(zip(labelsunique,labelmeans)) for orig,means in zip(arr1,arr3b): assert_array_equal(means, labmeandict[orig], repr(orig)) On 2/11/09, Jose Gomez-Dans wrote: > Josef, > > Thanks for that > > 2009/2/11 > >> list comprehension is still a bit faster. That's about 90 times faster >> than your version for building the dict of indices for this case. > > > I still think ndimage is more obvious to use if you have images, and in my > case it is fast (I didn't time it, but less than making a cup of horrible > instant coffee ;p). My problem is that it takes a long time to go from a > list/dictionary of class mean values (output from either St?fan's ndimage > solution or your dictionary solution) back into the original 2D array. > > In fact, I'm thinking about using weave to achieve this, unless someone > comes up with a better idea. > > Many thanks for your help, and for the code! > Jose > > -- > Remote Sensing Unit | Env. Monitoring and Modelling Group > Dept. of Geography | Dept. of Geography > University College London | King's College London > Gower St, London WC1E 6BT UK | Strand Campus, Strand, London WC2R 2LS UK > From rpyle at post.harvard.edu Wed Feb 11 11:13:13 2009 From: rpyle at post.harvard.edu (Robert Pyle) Date: Wed, 11 Feb 2009 11:13:13 -0500 Subject: [SciPy-user] ANN: SciPy 0.7.0 In-Reply-To: References: Message-ID: <27CB252B-AFF8-4003-8304-74581F14C0EB@post.harvard.edu> Hi, I have a dual G5 running 10.5.6. I removed scipy-0.6.0 from site- packages, downloaded scipy-0.7.0-py2.5-macosx10.5.dmg, and installed it. The installation went surprisingly fast and claimed to have succeeded, but 0.7.0 was nowhere to be found. I then downloaded the tarball and successfully (if rather more slowly) went through the installation. Is something wrong with scipy-0.7.0-py2.5-macosx10.5.dmg? Bob Pyle On Feb 11, 2009, at 3:26 AM, Jarrod Millman wrote: > I'm pleased to announce SciPy 0.7.0. From josef.pktd at gmail.com Wed Feb 11 13:46:14 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 13:46:14 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> References: <20090210213600.123400@gmx.net> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> Message-ID: <1cd32cbb0902111046h6b3a0103q5c10bb77fd26aac5@mail.gmail.com> sorry, cut and paste error, the test function used globals, correction below. You can speed up a little bit more using itertools.izip instead of zip. 
(labelmeanfilter 0.377629000334) Josef def test_labelmeanfilter(arr1, arr2): arr3b = labelmeanfilter(arr1, arr2) labelsunique = np.unique(arr1) labelmeans = ndimage.mean(arr2, labels=arr1, index=labelsunique) labmeandict = dict(zip(labelsunique,labelmeans)) for orig,means in zip(arr1,arr3b): assert_array_equal(means, labmeandict[orig], repr(orig)) From strawman at astraw.com Wed Feb 11 13:54:43 2009 From: strawman at astraw.com (Andrew Straw) Date: Wed, 11 Feb 2009 10:54:43 -0800 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> Message-ID: <49931EF3.7010106@astraw.com> > def labelmeanfilter(arr1, arr2): > R = {} > [R.setdefault(row[0],[]).append(index) for index, row in > enumerate(zip(arr1,arr2))] > I think the following should produce a bit more speed (although I haven't benchmarked it) because it avoids creating len(arr1) empty lists. import collections def labelmeanfilter(arr1, arr2): R = collections.defaultdict(list) [R[row[0]].append(index) for index, row in enumerate(zip(arr1,arr2))] From josef.pktd at gmail.com Wed Feb 11 14:06:20 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 14:06:20 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <49931EF3.7010106@astraw.com> References: <20090210213600.123400@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> <49931EF3.7010106@astraw.com> Message-ID: <1cd32cbb0902111106g5dad3a58v13edb6fb1daafa52@mail.gmail.com> On Wed, Feb 11, 2009 at 1:54 PM, Andrew Straw wrote: > >> def labelmeanfilter(arr1, arr2): >> R = {} >> [R.setdefault(row[0],[]).append(index) for index, row in >> enumerate(zip(arr1,arr2))] >> > > > I think the following should produce a bit more speed (although I > haven't benchmarked it) because it avoids creating len(arr1) empty lists. > > import collections > > def labelmeanfilter(arr1, arr2): > R = collections.defaultdict(list) > [R[row[0]].append(index) for index, row in > enumerate(zip(arr1,arr2))] > Yes, its around 14% faster for the example, however, it requires python 2.5 and I am still thinking as if I were using 2.4. 
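For anyone comparing the two idioms side by side, here is a minimal self-contained sketch (the defaultdict variant needs Python 2.5+); both build the same label -> list-of-indices dict used in this thread:

import collections
import numpy as np

labels = np.random.randint(0, 100, size=1000)

# Python 2.4-compatible version: dict.setdefault
groups_a = {}
for i, lab in enumerate(labels):
    groups_a.setdefault(lab, []).append(i)

# Python 2.5+ version: defaultdict avoids the repeated setdefault calls
groups_b = collections.defaultdict(list)
for i, lab in enumerate(labels):
    groups_b[lab].append(i)

assert groups_a == dict(groups_b)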
Josef From josef.pktd at gmail.com Wed Feb 11 14:26:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 14:26:04 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902111106g5dad3a58v13edb6fb1daafa52@mail.gmail.com> References: <20090210213600.123400@gmx.net> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> <49931EF3.7010106@astraw.com> <1cd32cbb0902111106g5dad3a58v13edb6fb1daafa52@mail.gmail.com> Message-ID: <1cd32cbb0902111126h510fa857mcecae47165567a85@mail.gmail.com> labelmeanfilter 0.387612196522 labelmeanfilter1 0.0931486264316 #new version from itertools import izip def labelmeanfilter1(arr1, arr2): labelsunique = np.unique(arr1) labelmeans = ndimage.mean(arr2, labels=arr1, index=labelsunique) labmeandict = dict(izip(labelsunique,labelmeans)) arr3 = np.array([labmeandict[orig] for orig in arr1]) return arr3 arr3_0 = labelmeanfilter(arr1, arr2) arr3_1 = labelmeanfilter1(arr1, arr2) >>> np.all(arr3_1 == arr3_0) True >>> arr3_1.shape (100000,) I'm finished playing, it's simple and obvious. Josef From jgomezdans at gmail.com Wed Feb 11 15:00:45 2009 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Wed, 11 Feb 2009 20:00:45 +0000 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902111126h510fa857mcecae47165567a85@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <1cd32cbb0902110827n773a6897p7b6dca3043784843@mail.gmail.com> <1cd32cbb0902110847w4e234183u79ad1dbec6340c2f@mail.gmail.com> <91d218430902110931s769e0160o62c449fc22f8f95b@mail.gmail.com> <1cd32cbb0902111023y1958afbsa947064f0f3abd34@mail.gmail.com> <49931EF3.7010106@astraw.com> <1cd32cbb0902111106g5dad3a58v13edb6fb1daafa52@mail.gmail.com> <1cd32cbb0902111126h510fa857mcecae47165567a85@mail.gmail.com> Message-ID: <91d218430902111200v6ece97dayb85be9977d74feec@mail.gmail.com> Josef, 2009/2/11 > labelmeanfilter 0.387612196522 > labelmeanfilter1 0.0931486264316 #new version > I'm finished playing, it's simple and obvious. > Wow! That is a massive improvement on my efforts!!!Many thanks to all that helped, it's been very useful. Cheers, Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Feb 11 15:03:33 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 11 Feb 2009 22:03:33 +0200 Subject: [SciPy-user] Array selection help In-Reply-To: <20090211154115.67130@gmx.net> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> Message-ID: <9457e7c80902111203xf84ff89k6ddc06bfefab2ecf@mail.gmail.com> Hi Jose 2009/2/11 Jose Luis Gomez Dans : > Well, there can be quite a lot of them (~10000 at least), so it does take a long while. 
I was just wondering whether some numpy/scipy array Jedi trick might speed it up :) Since you have integer labels, you can make use of the following trick: In [54]: means = np.array([0.1, 0.2, 0.3]) In [55]: means[[1,1,0,1,0,0]] Out[55]: array([ 0.2, 0.2, 0.1, 0.2, 0.1, 0.1]) I implemented a solution using such a "translation table" (see attached). Regards St?fan -------------- next part -------------- A non-text attachment was scrubbed... Name: translate_labels.py Type: application/octet-stream Size: 1090 bytes Desc: not available URL: From stefan at sun.ac.za Wed Feb 11 15:08:03 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 11 Feb 2009 22:08:03 +0200 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902111203xf84ff89k6ddc06bfefab2ecf@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902101414v56ea9319m7fbc70a13229ef9c@mail.gmail.com> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <9457e7c80902111203xf84ff89k6ddc06bfefab2ecf@mail.gmail.com> Message-ID: <9457e7c80902111208g5116ac73odcc1fdefa45c6d73@mail.gmail.com> 2009/2/11 St?fan van der Walt : > In [55]: means[[1,1,0,1,0,0]] > Out[55]: array([ 0.2, 0.2, 0.1, 0.2, 0.1, 0.1]) > > I implemented a solution using such a "translation table" (see attached). Note that, for this approach to work, the labels must progress in increments of one from 0 to N. So labels 0, 1, 2, 3 are fine, but 0, 5, 10 are not. Cheers St?fan From josef.pktd at gmail.com Wed Feb 11 15:49:29 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 15:49:29 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <9457e7c80902111208g5116ac73odcc1fdefa45c6d73@mail.gmail.com> References: <20090210213600.123400@gmx.net> <20090210225642.141110@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <9457e7c80902111203xf84ff89k6ddc06bfefab2ecf@mail.gmail.com> <9457e7c80902111208g5116ac73odcc1fdefa45c6d73@mail.gmail.com> Message-ID: <1cd32cbb0902111249r8df5162m3cca1567d9f4f16f@mail.gmail.com> On Wed, Feb 11, 2009 at 3:08 PM, St?fan van der Walt wrote: > 2009/2/11 St?fan van der Walt : >> In [55]: means[[1,1,0,1,0,0]] >> Out[55]: array([ 0.2, 0.2, 0.1, 0.2, 0.1, 0.1]) >> >> I implemented a solution using such a "translation table" (see attached). > > Note that, for this approach to work, the labels must progress in > increments of one from 0 to N. So labels 0, 1, 2, 3 are fine, but 0, > 5, 10 are not. I just checked that ndimage can handle non-existing labels: >>> ndimage.mean(5.0-np.arange(5), labels=np.arange(1,10,2), index=np.arange(10)) [0.0, 5.0, 0.0, 4.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0] So your translation table should work if you replace labels_unique by range(max(labels)) in ndimage.mean(...). I tried some basic example but I didn't really test it. This would work then as long as labels are positive integers. 
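np.bincount is another numpy-only route with the same restriction to nonnegative integer labels -- it is not used in this thread, just a sketch for comparison:

import numpy as np

labels = np.random.randint(0, 10000, size=100000)   # nonnegative integer labels
values = 10000.0 - labels                            # observations

counts = np.bincount(labels)                  # observations per label, length max(labels)+1
sums = np.bincount(labels, weights=values)    # per-label sums
means = sums / np.maximum(counts, 1)          # guard against labels that never occur
filled = means[labels]                        # each observation replaced by its class mean

For labels that do occur, means agrees with the ndimage.mean translation table.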
Josef example: >>> tt = ndimage.mean(5.0-np.arange(5), labels=np.arange(1,10,2), index=np.arange(10)) >>> tt [0.0, 5.0, 0.0, 4.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0] >>> np.array(tt)[np.arange(1,10,2)] array([ 5., 4., 3., 2., 1.]) >>> From josef.pktd at gmail.com Wed Feb 11 16:31:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Feb 2009 16:31:31 -0500 Subject: [SciPy-user] Array selection help In-Reply-To: <1cd32cbb0902111249r8df5162m3cca1567d9f4f16f@mail.gmail.com> References: <20090210213600.123400@gmx.net> <9457e7c80902102339wea512bcj773811ff524a2828@mail.gmail.com> <20090211135648.67960@gmx.net> <9457e7c80902110622t294e1b98hf4c2612e34a16fb6@mail.gmail.com> <20090211150315.141380@gmx.net> <9457e7c80902110726y3d2ca16bmcd42dbccf1f9c55c@mail.gmail.com> <20090211154115.67130@gmx.net> <9457e7c80902111203xf84ff89k6ddc06bfefab2ecf@mail.gmail.com> <9457e7c80902111208g5116ac73odcc1fdefa45c6d73@mail.gmail.com> <1cd32cbb0902111249r8df5162m3cca1567d9f4f16f@mail.gmail.com> Message-ID: <1cd32cbb0902111331s666ea70co89846f810d4fc3a0@mail.gmail.com> On Wed, Feb 11, 2009 at 3:49 PM, wrote: > On Wed, Feb 11, 2009 at 3:08 PM, St?fan van der Walt wrote: >> 2009/2/11 St?fan van der Walt : >>> In [55]: means[[1,1,0,1,0,0]] >>> Out[55]: array([ 0.2, 0.2, 0.1, 0.2, 0.1, 0.1]) >>> >>> I implemented a solution using such a "translation table" (see attached). >> >> Note that, for this approach to work, the labels must progress in >> increments of one from 0 to N. So labels 0, 1, 2, 3 are fine, but 0, >> 5, 10 are not. > > I just checked that ndimage can handle non-existing labels: translation table version, another 10 times faster makes some missing labels >>> np.unique(arr1).shape (9998,) >>> np.unique(arr1)[:10] array([ 0, 1, 2, 3, 4, 7, 8, 9, 10, 11]) labelmeanfilter 0.383765171272 labelmeanfilter1 0.0916504471937 labelmeanfilter2 0.377427047292 labelmeanfilter3 0.00886087477598 # version with translation table with missing labels >>> np.all(arr3_3 == arr3_0) True def labelmeanfilter3(arr1, arr2): # requires integer labels labelsunique = np.arange(np.max(arr1)+1) labelmeans = np.array(ndimage.mean(arr2, labels=arr1, index=labelsunique)) arr3 = labelmeans[arr1] return arr3 I think we started out with more than 20 seconds. Josef From Juergen.Herrmann at XLhost.de Wed Feb 11 17:08:15 2009 From: Juergen.Herrmann at XLhost.de (=?iso8859-1?Q?J=FCrgen_Herrmann?=) Date: Wed, 11 Feb 2009 23:08:15 +0100 (CET) Subject: [SciPy-user] speaker crossover gui app project needs help Message-ID: <7179793a32e83a5bdd275de6c1aa27f1.squirrel@xlhost.de> hi there! i'm currently coding on a python gui application, that will generate coefficients and config for brutefir, a software convolution engine ( http://www.ludd.luth.se/~torger/brutefir.html ). the application offers signal routing between different filters and should allow the design of multi-way crossovers for speakers. i'm totally new to dsp and came accross scipy, which really looks interesting to me, but i have to admit that i hardly understand what i'm doing right now. i have been playing around with signal.butter and signal.remez and integrated basic fr graph plotting for them. but i simply lack the mathematical background for going deeper atm. my wishlist for configurable filters (in order of priority) would be: - low/higpass with configurable -3db freq and slope (preferrably in db/octave) - shelving low/highpass with configurable freq. 
slope and gain - notch/gain with freq, gain and q settings - all outputs will have configurable delay and gain (already implemented) so if someone with knowledge on this stuff and some hours for discussion on this topic, feel free to chime in! a screenshot of the current working state can be found here: http://t5.by/pyjackfir/screens/screen01.png by the time i have at least one basic filter type going i will release this (gpled) project. best regards and i'm looking forward to your answers. j?rgen herrmann -- >> XLhost.de - eXperts in Linux hosting ? << XLhost.de GmbH J?rgen Herrmann, Gesch?ftsf?hrer Boelckestrasse 21, 93051 Regensburg, Germany Gesch?ftsf?hrer: Volker Geith, J?rgen Herrmann Registriert unter: HRB9918 Umsatzsteuer-Identifikationsnummer: DE245931218 Fon: +49 (0)700 XLHOSTDE [0700 95467833] Fax: +49 (0)700 XLHOSTDE [0700 95467833] WEB: http://www.XLhost.de IRC: #XLhost at irc.quakenet.org -- >> XLhost.de - eXperts in Linux hosting ? << XLhost.de GmbH J?rgen Herrmann, Gesch?ftsf?hrer Boelckestrasse 21, 93051 Regensburg, Germany Gesch?ftsf?hrer: Volker Geith, J?rgen Herrmann Registriert unter: HRB9918 Umsatzsteuer-Identifikationsnummer: DE245931218 Fon: +49 (0)700 XLHOSTDE [0700 95467833] Fax: +49 (0)700 XLHOSTDE [0700 95467833] WEB: http://www.XLhost.de IRC: #XLhost at irc.quakenet.org From timmichelsen at gmx-topmail.de Wed Feb 11 17:59:52 2009 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Wed, 11 Feb 2009 23:59:52 +0100 Subject: [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages In-Reply-To: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> References: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> Message-ID: > https://edge.launchpad.net/~scipy/+archive/ppa Thanks. There is also: http://linux.pythonxy.com/ubuntu/ From fperez.net at gmail.com Wed Feb 11 18:11:10 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 11 Feb 2009 15:11:10 -0800 Subject: [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages In-Reply-To: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> References: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> Message-ID: On Wed, Feb 11, 2009 at 4:46 AM, David Cournapeau wrote: > Hi, > > I started to set up a PPA for scipy on launchpad, which enables to > build ubuntu packages for various distributions/architectures. The > link is there: > > https://edge.launchpad.net/~scipy/+archive/ppa Cool, thanks. Is it easy to provide also hardy packages, or does it require a lot of work on your part? Cheers, f From karl.young at ucsf.edu Wed Feb 11 18:31:08 2009 From: karl.young at ucsf.edu (Karl Young) Date: Wed, 11 Feb 2009 15:31:08 -0800 Subject: [SciPy-user] slice question In-Reply-To: References: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> Message-ID: <49935FBC.2080902@ucsf.edu> Sorry for the dumb question but I did search quite a bit before realizing it would only take a couple of keystrokes from an Illuminati to dispel my ignorance. I want to slice an array as: A[a:b,c:d,e:f,...] starting with two 1d arrays containing the lower and upper slice limits: B = [a,c,e,...] and C = [b,d,f,...] and I'd like to write a general expression for this given A,B,C (i.e. not specify the dimension) but I can't figure out how to turn B and C into a:b,c:d,e:f,... re indexing A - any thoughts ? 
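The building block is that a:b inside square brackets is just slice(a, b), and an index spanning several axes can be written as a tuple of slice objects -- a tiny sketch with made-up bounds:

import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = [0, 1, 1]     # lower limits, one entry per axis
C = [2, 3, 3]     # upper limits, one entry per axis

idx = tuple([slice(lo, hi) for lo, hi in zip(B, C)])
assert np.all(A[idx] == A[0:2, 1:3, 1:3])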
Thanks From robert.kern at gmail.com Wed Feb 11 18:34:35 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 11 Feb 2009 17:34:35 -0600 Subject: [SciPy-user] slice question In-Reply-To: <49935FBC.2080902@ucsf.edu> References: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> <49935FBC.2080902@ucsf.edu> Message-ID: <3d375d730902111534s3dea3e1axf5f6bf57a6980b22@mail.gmail.com> On Wed, Feb 11, 2009 at 17:31, Karl Young wrote: > > Sorry for the dumb question but I did search quite a bit before > realizing it would only take a couple of keystrokes from an Illuminati > to dispel my ignorance. > > I want to slice an array as: > > A[a:b,c:d,e:f,...] > > starting with two 1d arrays containing the lower and upper slice limits: > > B = [a,c,e,...] and C = [b,d,f,...] > > and I'd like to write a general expression for this given A,B,C (i.e. > not specify the dimension) but I can't figure out how to turn B and C > into a:b,c:d,e:f,... re indexing A - any thoughts ? Thanks A[tuple([slice(b,c) for b,c in zip(B,C)])] -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From karl.young at ucsf.edu Wed Feb 11 18:40:05 2009 From: karl.young at ucsf.edu (Karl Young) Date: Wed, 11 Feb 2009 15:40:05 -0800 Subject: [SciPy-user] slice question In-Reply-To: <3d375d730902111534s3dea3e1axf5f6bf57a6980b22@mail.gmail.com> References: <5b8d13220902110446x82e25a9ifb11bb563e468313@mail.gmail.com> <49935FBC.2080902@ucsf.edu> <3d375d730902111534s3dea3e1axf5f6bf57a6980b22@mail.gmail.com> Message-ID: <499361D5.2090308@ucsf.edu> Thanks maestro ! (not quite as trivial as I thought it would be) > On Wed, Feb 11, 2009 at 17:31, Karl Young wrote: > >> Sorry for the dumb question but I did search quite a bit before >> realizing it would only take a couple of keystrokes from an Illuminati >> to dispel my ignorance. >> >> I want to slice an array as: >> >> A[a:b,c:d,e:f,...] >> >> starting with two 1d arrays containing the lower and upper slice limits: >> >> B = [a,c,e,...] and C = [b,d,f,...] >> >> and I'd like to write a general expression for this given A,B,C (i.e. >> not specify the dimension) but I can't figure out how to turn B and C >> into a:b,c:d,e:f,... re indexing A - any thoughts ? Thanks >> > > A[tuple([slice(b,c) for b,c in zip(B,C)])] > > From bernardo.rocha at meduni-graz.at Thu Feb 12 02:23:44 2009 From: bernardo.rocha at meduni-graz.at (Bernardo M. Rocha) Date: Thu, 12 Feb 2009 08:23:44 +0100 Subject: [SciPy-user] intersect (matlab) Message-ID: <4993CE80.20204@meduni-graz.at> Hi Guys, Is there an equivalent in scipy/numpy to the following MATLAB code??? Or is there a way to do the same and get this ia and ib? A = [1 2 3 6]; B = [1 2 3 4 6 10 20]; [c, ia, ib] = intersect(A, B); disp([c; ia; ib]) 1 2 3 6 1 2 3 4 1 2 3 5 Best regards. Bernardo M. Rocha From robert.kern at gmail.com Thu Feb 12 02:56:21 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 12 Feb 2009 01:56:21 -0600 Subject: [SciPy-user] intersect (matlab) In-Reply-To: <4993CE80.20204@meduni-graz.at> References: <4993CE80.20204@meduni-graz.at> Message-ID: <3d375d730902112356k65a9e993ue42337b56c32b3b@mail.gmail.com> On Thu, Feb 12, 2009 at 01:23, Bernardo M. Rocha wrote: > Hi Guys, > > Is there an equivalent in scipy/numpy to the following MATLAB code??? Or > is there a way to do the same and get this ia and ib? 
> > A = [1 2 3 6]; B = [1 2 3 4 6 10 20]; > [c, ia, ib] = intersect(A, B); > disp([c; ia; ib]) 1 2 3 6 > 1 2 3 4 > 1 2 3 5 In [40]: A = array([1, 2, 3, 6]) In [41]: B = array([1,2,3,4,6,10,20]) In [42]: c = intersect1d(A, B) In [43]: c Out[43]: array([1, 2, 3, 6]) In [46]: ma = setmember1d(A, B) In [47]: ma Out[47]: array([ True, True, True, True], dtype=bool) In [48]: ia = nonzero(ma)[0] In [49]: ia Out[49]: array([0, 1, 2, 3]) In [50]: mb = setmember1d(B, A) In [51]: mb Out[51]: array([ True, True, True, False, True, False, False], dtype=bool) In [52]: ib = nonzero(mb)[0] In [53]: ib Out[53]: array([0, 1, 2, 4]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Thu Feb 12 09:53:12 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 12 Feb 2009 15:53:12 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <20090211142422.GD19956@phare.normalesup.org> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> <20090211142422.GD19956@phare.normalesup.org> Message-ID: <499437D8.9080103@molden.no> On 2/11/2009 3:24 PM, Gael Varoquaux wrote: > I have put in some more print statements. I have the fealing that it is > not cleaned up. I am attaching my test code, and the modified > sharedmemory_sysv.pyx for debug. The output is the following: I have reproduced the same error in Windows. A suspected, it stems from using multiprocessing's Finalizer. For some reason it is never called. Also having the heap object as a class attribute of the BufferWrapper prevents clean-up with a __del__ method. Whereas having it as a global variable in the module works ok. The problem is not the Cython extension code, it is the array_heap.py module. I think we should not use a malloc at all. If a segment cannot be reused (Heap.free) until all other handles to it is closed. This is a bug in my code. The easiest solution is to remove the heap malloc all together. In that case, the allocator for small arrays will just reuse a shared segment until it is exhausted, and then discared it. Large arrays will get their own segment. Otherwise we have to do refcounting for open handles (manually in Windows) and we are back to thread-based cleanup in the creator. Sturla Molden From rmay31 at gmail.com Thu Feb 12 13:04:53 2009 From: rmay31 at gmail.com (Ryan May) Date: Thu, 12 Feb 2009 12:04:53 -0600 Subject: [SciPy-user] odeint for calculating trajectories Message-ID: Hi, Is there a good way to use scipy.integrate.odeint to calculate trajectories from an observed velocity field? I know you can do this when you have an analytic expression for dx/dt, but in this case I have a spatial grid of values for dx/dt. The only way I've come up with is to make the function passed to odeint something that will interpolate fromt the grid to the given point. Thanks, Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from: Norman Oklahoma United States. -------------- next part -------------- An HTML attachment was scrubbed... 
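One bare-bones way to set up the interpolating callback described above is to sample the gridded field with ndimage.map_coordinates inside the function handed to odeint. Everything in the sketch below (grid, field, coordinate convention) is made up for illustration, and as the follow-ups point out, plain interpolation of the trajectory may serve better than an ODE solver here:

import numpy as np
from scipy import ndimage
from scipy.integrate import odeint

# made-up velocity field sampled on a regular grid, indexed as (row, col)
nrow, ncol = 50, 60
rows, cols = np.mgrid[0:nrow, 0:ncol]
v_row = np.sin(cols / 15.0)      # d(row)/dt on the grid
v_col = np.cos(rows / 10.0)      # d(col)/dt on the grid

def velocity(pos, t):
    # bilinear interpolation of the gridded velocities at the current point;
    # mode='nearest' keeps the field defined just outside the grid
    coords = np.array([[pos[0]], [pos[1]]])
    dr = ndimage.map_coordinates(v_row, coords, order=1, mode='nearest')[0]
    dc = ndimage.map_coordinates(v_col, coords, order=1, mode='nearest')[0]
    return [dr, dc]

t = np.linspace(0.0, 20.0, 200)
traj = odeint(velocity, [25.0, 5.0], t)   # trajectory in (row, col) coordinates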
URL: From rob.clewley at gmail.com Thu Feb 12 13:16:22 2009 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 12 Feb 2009 13:16:22 -0500 Subject: [SciPy-user] odeint for calculating trajectories In-Reply-To: References: Message-ID: > Is there a good way to use scipy.integrate.odeint to calculate trajectories > from an observed velocity field? I know you can do this when you have an > analytic expression for dx/dt, but in this case I have a spatial grid of > values for dx/dt. The only way I've come up with is to make the function > passed to odeint something that will interpolate fromt the grid to the given > point. I don't think odeint is the right tool for this job - there is no ODE integration to do if you do not have an explicit function for the vector field. You should think of it purely as an interpolation problem. You have (t,x) values and (t, dx/dt) values, so this defines a piecewise quadratic function which has continuous *second* derivative everywhere (i.e. the trajectory smoothly agrees at your mesh points). I would use the polynomial interpolation classes that were recently added to scipy by Anne Archibald (search this list for details about it). You pass it your arrays of values and you get back a function that smoothly interpolates through your points. This is the most accurate trajectory that you can derive from this finite mesh vector-field. -Rob From peridot.faceted at gmail.com Thu Feb 12 16:28:11 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 12 Feb 2009 16:28:11 -0500 Subject: [SciPy-user] odeint for calculating trajectories In-Reply-To: References: Message-ID: 2009/2/12 Rob Clewley : >> Is there a good way to use scipy.integrate.odeint to calculate trajectories >> from an observed velocity field? I know you can do this when you have an >> analytic expression for dx/dt, but in this case I have a spatial grid of >> values for dx/dt. The only way I've come up with is to make the function >> passed to odeint something that will interpolate fromt the grid to the given >> point. > > > I don't think odeint is the right tool for this job - there is no ODE > integration to do if you do not have an explicit function for the > vector field. You should think of it purely as an interpolation > problem. You have (t,x) values and (t, dx/dt) values, so this defines > a piecewise quadratic function which has continuous *second* > derivative everywhere (i.e. the trajectory smoothly agrees at your > mesh points). I would use the polynomial interpolation classes that > were recently added to scipy by Anne Archibald (search this list for > details about it). You pass it your arrays of values and you get back > a function that smoothly interpolates through your points. This is the > most accurate trajectory that you can derive from this finite mesh > vector-field. Put another way, odeint is full of cleverness to figure out how fast the derivative is changing, but that is of no use to you here (unless you have an extremely high-resolution, slowly-changing vector field). So an ode solver that walks along the grid is about as good as you can do. There is actually cython code to do exactly this in the scikit vectorplot, which uses it to implement line integral convolution. If you want fast and dirty trajectories, you may be interested in modifying that code. Using a piecewise polynomial will certainly give you a smoother trajectory, though. Anne From R.Springuel at umit.maine.edu Thu Feb 12 18:06:17 2009 From: R.Springuel at umit.maine.edu (R. 
Padraic Springuel) Date: Thu, 12 Feb 2009 18:06:17 -0500 Subject: [SciPy-user] isnotnan Message-ID: <4994AB69.7000304@umit.maine.edu> Is there a isnotnan function somewhere in the numpy or scipy library that functions similarly to isnan (except that the results are reversed)? -- R. Padraic Springuel Research Assistant Department of Physics and Astronomy University of Maine Bennett 309 Office Hours: By appointment only -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5627 bytes Desc: S/MIME Cryptographic Signature URL: From c-b at asu.edu Thu Feb 12 18:37:15 2009 From: c-b at asu.edu (Christopher Brown) Date: Thu, 12 Feb 2009 16:37:15 -0700 Subject: [SciPy-user] isnotnan In-Reply-To: <4994AB69.7000304@umit.maine.edu> References: <4994AB69.7000304@umit.maine.edu> Message-ID: <4994B2AB.3040005@asu.edu> Hi Padraic, PS> Is there a isnotnan function somewhere in the numpy or scipy library PS> that functions similarly to isnan (except that the results are PS> reversed)? I don't understand. Will 'not numpy.isnan' not work? -- Chris From pgmdevlist at gmail.com Thu Feb 12 18:37:37 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 12 Feb 2009 18:37:37 -0500 Subject: [SciPy-user] isnotnan In-Reply-To: <4994AB69.7000304@umit.maine.edu> References: <4994AB69.7000304@umit.maine.edu> Message-ID: On Feb 12, 2009, at 6:06 PM, R. Padraic Springuel wrote: > Is there a isnotnan function somewhere in the numpy or scipy library > that functions similarly to isnan (except that the results are > reversed)? Like np.logical_not(np.isnan(...)) ? From pgmdevlist at gmail.com Thu Feb 12 18:41:38 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 12 Feb 2009 18:41:38 -0500 Subject: [SciPy-user] isnotnan In-Reply-To: <4994B2AB.3040005@asu.edu> References: <4994AB69.7000304@umit.maine.edu> <4994B2AB.3040005@asu.edu> Message-ID: <52F18680-6E58-4274-9BDB-F09BB493A3E0@gmail.com> On Feb 12, 2009, at 6:37 PM, Christopher Brown wrote: > Hi Padraic, > > PS> Is there a isnotnan function somewhere in the numpy or scipy > library > PS> that functions similarly to isnan (except that the results are > PS> reversed)? > > I don't understand. Will 'not numpy.isnan' not work? Can't work: "not" works on booleans, not on arrays, and np.isnan returns a ndarray of booleans. You end up raising a ValueError exception: >>> x = np.array([1,np.nan,3.]) >>> not np.isnan(x) ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() Just use np.logical_not. From robert.kern at gmail.com Thu Feb 12 18:47:13 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 12 Feb 2009 17:47:13 -0600 Subject: [SciPy-user] isnotnan In-Reply-To: <4994AB69.7000304@umit.maine.edu> References: <4994AB69.7000304@umit.maine.edu> Message-ID: <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> On Thu, Feb 12, 2009 at 17:06, R. Padraic Springuel wrote: > Is there a isnotnan function somewhere in the numpy or scipy library that > functions similarly to isnan (except that the results are reversed)? ~isnan(x) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From yelvergm at gmail.com Thu Feb 12 20:39:07 2009 From: yelvergm at gmail.com (yelver huang) Date: Thu, 12 Feb 2009 20:39:07 -0500 Subject: [SciPy-user] Help on installation of scipy Message-ID: Hi all, I have encountered some problems on installing scipy on Windows, though it's quite simple. There is no fault for me to import numpy from Python25, but after I install the binary file of scipy, I could not import it. The error it shows is: Warning (from warnings module): File "G:\Python25\lib\site-packages\scipy\__init__.py", line 30 UserWarning) UserWarning: Numpy 1.2.0 or above is recommended for this version of scipy (detected version 1.0.4) Traceback (most recent call last): File "", line 1, in import scipy File "G:\Python25\Lib\site-packages\scipy\__init__.py", line 75, in from numpy.testing import Tester ImportError: cannot import name Tester The version of numpy in my computer is 1.2.1, I could not understand this message. Hope someone could help me. Thanks, Tao -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Feb 12 20:33:50 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Feb 2009 10:33:50 +0900 Subject: [SciPy-user] Help on installation of scipy In-Reply-To: References: Message-ID: <4994CDFE.8060706@ar.media.kyoto-u.ac.jp> Hi Tao, yelver huang wrote: > Hi all, > > I have encountered some problems on installing scipy on Windows, > though it's quite simple. There is no fault for me to import numpy > from Python25, but after I install the binary file of scipy, I could > not import it. The error it shows is: First, what does the following command tell you: python -c "import numpy; print numpy.version.version; print numpy.__file__" This gives both versions and numpy package location (I suspect that you have several numpy installs, and that scipy does not pick up the one you expect). How did you install numpy and scipy ? Did you use easy_install ? cheers, David From yelvergm at gmail.com Thu Feb 12 21:21:33 2009 From: yelvergm at gmail.com (yelver huang) Date: Thu, 12 Feb 2009 21:21:33 -0500 Subject: [SciPy-user] Help on installation of scipy In-Reply-To: <4994CDFE.8060706@ar.media.kyoto-u.ac.jp> References: <4994CDFE.8060706@ar.media.kyoto-u.ac.jp> Message-ID: Hi David, Thank you so much for you help. I have execute the command you suggest and the result is as you expected: the version is 1.0.4, and the result of 'print numpy.__file__' is: 'G:\Program Files\MGLTools 1.5.2\MGLToolsPckgs\numpy\__init__.pyc' This relates to my previously installed software Autodock, so can you give me some suggestion on what I can do further in order to implement numpy and scipy? Cheers, Tao On Thu, Feb 12, 2009 at 8:33 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi Tao, > > yelver huang wrote: > > Hi all, > > > > I have encountered some problems on installing scipy on Windows, > > though it's quite simple. There is no fault for me to import numpy > > from Python25, but after I install the binary file of scipy, I could > > not import it. The error it shows is: > > First, what does the following command tell you: > > python -c "import numpy; print numpy.version.version; print numpy.__file__" > > This gives both versions and numpy package location (I suspect that you > have several numpy installs, and that scipy does not pick up the one you > expect). How did you install numpy and scipy ? Did you use easy_install ? 
> > cheers, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -- Tao-wei Huang Department of Chemistry and Chemical Biology Rensselaer Polytechnic Institute Phone: 518-275-7997 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Feb 12 21:19:53 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Feb 2009 11:19:53 +0900 Subject: [SciPy-user] Help on installation of scipy In-Reply-To: References: <4994CDFE.8060706@ar.media.kyoto-u.ac.jp> Message-ID: <4994D8C9.4020503@ar.media.kyoto-u.ac.jp> yelver huang wrote: > Hi David, Thank you so much for you help. I have execute the command > you suggest and the result is as you expected: the version is 1.0.4, > and the result of 'print numpy.__file__' is: 'G:\Program > Files\MGLTools 1.5.2\MGLToolsPckgs\numpy\__init__.pyc' > > This relates to my previously installed software Autodock, so can you > give me some suggestion on what I can do further in order to implement > numpy and scipy? I don't know autodock, but I would guess that it adds MGLToolsPckgs to your PYTHONPATH environment variable. Ideally, if autodock uses its own local python packages, it should keep them private - if you put C:\Python25\libs\site-packages in front of your PYTHONPATH, I am afraid it will break autodock. That may be an issue worth discussing with autodock people, David From bsouthey at gmail.com Thu Feb 12 21:38:35 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Feb 2009 20:38:35 -0600 Subject: [SciPy-user] isnotnan In-Reply-To: <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> References: <4994AB69.7000304@umit.maine.edu> <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> Message-ID: Hi, Thanks to the Doc Marathon! Perhaps numpy.isfinite() is what you want: http://docs.scipy.org/doc/numpy/reference/generated/numpy.isfinite.html It is even linked from numpy.isnan() page http://docs.scipy.org/doc/numpy/reference/generated/numpy.isnan.html Bruce On Thu, Feb 12, 2009 at 5:47 PM, Robert Kern wrote: > On Thu, Feb 12, 2009 at 17:06, R. Padraic Springuel > wrote: >> Is there a isnotnan function somewhere in the numpy or scipy library that >> functions similarly to isnan (except that the results are reversed)? > > ~isnan(x) > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From wesmckinn at gmail.com Thu Feb 12 21:45:00 2009 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 12 Feb 2009 21:45:00 -0500 Subject: [SciPy-user] isnotnan In-Reply-To: References: <4994AB69.7000304@umit.maine.edu> <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> Message-ID: isfinite(x) is faster than -isnan(x), and it also gets INFs, so definitely the way to go. On Feb 12, 2009, at 9:38 PM, Bruce Southey wrote: > Hi, > Thanks to the Doc Marathon! 
> > Perhaps numpy.isfinite() is what you want: > http://docs.scipy.org/doc/numpy/reference/generated/ > numpy.isfinite.html > > It is even linked from numpy.isnan() page > http://docs.scipy.org/doc/numpy/reference/generated/numpy.isnan.html > > Bruce > > > > On Thu, Feb 12, 2009 at 5:47 PM, Robert Kern > wrote: >> On Thu, Feb 12, 2009 at 17:06, R. Padraic Springuel >> wrote: >>> Is there a isnotnan function somewhere in the numpy or scipy >>> library that >>> functions similarly to isnan (except that the results are reversed)? >> >> ~isnan(x) >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret >> it as >> though it had an underlying truth." >> -- Umberto Eco >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From david at ar.media.kyoto-u.ac.jp Thu Feb 12 21:30:05 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Feb 2009 11:30:05 +0900 Subject: [SciPy-user] isnotnan In-Reply-To: References: <4994AB69.7000304@umit.maine.edu> <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> Message-ID: <4994DB2D.6010004@ar.media.kyoto-u.ac.jp> Bruce Southey wrote: > Hi, > Thanks to the Doc Marathon! > > Perhaps numpy.isfinite() is what you want: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.isfinite.html > isfinite may be acceptable for the OP, but !isnan is not the same as isfinite. cheers, David From yelvergm at gmail.com Thu Feb 12 21:48:11 2009 From: yelvergm at gmail.com (yelver huang) Date: Thu, 12 Feb 2009 21:48:11 -0500 Subject: [SciPy-user] Help on installation of scipy In-Reply-To: <4994D8C9.4020503@ar.media.kyoto-u.ac.jp> References: <4994CDFE.8060706@ar.media.kyoto-u.ac.jp> <4994D8C9.4020503@ar.media.kyoto-u.ac.jp> Message-ID: Hi David, Thank you for your suggestion, I have uninstall the software of Autodock, which I might not use at present. And now there is no error after I import scipy. Hopefully I can start learn scipy now. Cheers, Tao On Thu, Feb 12, 2009 at 9:19 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > yelver huang wrote: > > Hi David, Thank you so much for you help. I have execute the command > > you suggest and the result is as you expected: the version is 1.0.4, > > and the result of 'print numpy.__file__' is: 'G:\Program > > Files\MGLTools 1.5.2\MGLToolsPckgs\numpy\__init__.pyc' > > > > This relates to my previously installed software Autodock, so can you > > give me some suggestion on what I can do further in order to implement > > numpy and scipy? > > I don't know autodock, but I would guess that it adds MGLToolsPckgs to > your PYTHONPATH environment variable. Ideally, if autodock uses its own > local python packages, it should keep them private - if you put > C:\Python25\libs\site-packages in front of your PYTHONPATH, I am afraid > it will break autodock. 
That may be an issue worth discussing with > autodock people, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -- Tao-wei Huang Department of Chemistry and Chemical Biology Rensselaer Polytechnic Institute Phone: 518-275-7997 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Thu Feb 12 21:50:41 2009 From: rmay31 at gmail.com (Ryan May) Date: Thu, 12 Feb 2009 20:50:41 -0600 Subject: [SciPy-user] scipy.ndimage.gaussian_filter for masked data? Message-ID: Hi, I have a 2D grid of spatial data that I wanted to smooth just using a simple gaussian filter. My grid comes from observational data, so there are some points that are masked. Is there something similar to scipy.ndimage.gaussian_filter anywhere that will work with masked points? Thanks in advance, Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Feb 12 22:11:49 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Feb 2009 21:11:49 -0600 Subject: [SciPy-user] isnotnan In-Reply-To: <4994DB2D.6010004@ar.media.kyoto-u.ac.jp> References: <4994AB69.7000304@umit.maine.edu> <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> <4994DB2D.6010004@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Feb 12, 2009 at 8:30 PM, David Cournapeau wrote: > Bruce Southey wrote: >> Hi, >> Thanks to the Doc Marathon! >> >> Perhaps numpy.isfinite() is what you want: >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.isfinite.html >> > > isfinite may be acceptable for the OP, but !isnan is not the same as > isfinite. > > cheers, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > Yes, isfinite is not the opposite of isnan unless there are no positive or negative infinity elements. Bruce From josef.pktd at gmail.com Thu Feb 12 22:16:26 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Feb 2009 22:16:26 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq Message-ID: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> I just saw the new optimize.curvefit which provides a wrapper around optimize.leastsq optimize.leastsq provides the raw covariance matrix (cov_x). As I mentioned once on the mailing list, this is not the covariance matrix of the parameter estimates. To get those, the raw covariance matrix has to be multiplied by the standard error of the residual. So, the docstring in optimize.curvefit doesn't correspond to the actual calculation. I'm preparing a test against an example from the NIST certified cases: http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml >>> SSE=np.sum((y-yh)**2) difference in standard deviation of the parameter estimates compared to NIST: >>> np.sqrt(SSE/12.0)*np.sqrt(np.diag(pcov))[0] - 3.1643950207E+00 2.4865573906573957e-006 >>> np.sqrt(SSE/12.0)*np.sqrt(np.diag(pcov))[1] - 4.2547321834E-06 3.6275492099440408e-012 The first parameter is not exactly high precision. The second problem is that, in weighted least squares, the calculation of the standard deviation of the parameter estimates has to take the weights into account. (But I don't have the formulas right now.) 
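For the unweighted case, the rescaling being described is just the residual variance times the raw cov_x; a compact sketch applied directly to what optimize.leastsq returns (the toy model and data are made up):

import numpy as np
from scipy import optimize

x = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * np.random.randn(50)

def residuals(p):
    return y - p[0] * np.exp(p[1] * x)

popt, cov_x, info, mesg, ier = optimize.leastsq(residuals, [1.0, -1.0],
                                                full_output=True)
# rescale the raw covariance by the residual variance to get the
# covariance of the parameter estimates
sse = np.sum(residuals(popt) ** 2)
s2 = sse / (len(y) - len(popt))
cov_beta = s2 * cov_x
perr = np.sqrt(np.diag(cov_beta))   # standard errors of the estimates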
I was looking at this to provide a general non-linear least squares class in stats. But for several calculation, the Jacobian would be necessary. optimize.leastsq only provides cov_x, but I was wondering whether the Jacobian can be calculated from the return of the minpack functions in optimize.leastsq, but I didn't have time to figure this out. Josef From c-b at asu.edu Thu Feb 12 22:22:19 2009 From: c-b at asu.edu (Christopher Brown) Date: Thu, 12 Feb 2009 20:22:19 -0700 Subject: [SciPy-user] isnotnan In-Reply-To: <4994DB2D.6010004@ar.media.kyoto-u.ac.jp> References: <4994AB69.7000304@umit.maine.edu> <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> <4994DB2D.6010004@ar.media.kyoto-u.ac.jp> Message-ID: <4994E76B.4060906@asu.edu> David Cournapeau wrote: > isfinite may be acceptable for the OP, but !isnan is not the same as > isfinite. So, what your saying is, not is not a number is not is infinite. Got it. :) From david at ar.media.kyoto-u.ac.jp Thu Feb 12 22:22:40 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Feb 2009 12:22:40 +0900 Subject: [SciPy-user] isnotnan In-Reply-To: <4994E76B.4060906@asu.edu> References: <4994AB69.7000304@umit.maine.edu> <3d375d730902121547j23a54345kd87b6a8ea5a8b5ff@mail.gmail.com> <4994DB2D.6010004@ar.media.kyoto-u.ac.jp> <4994E76B.4060906@asu.edu> Message-ID: <4994E780.8090403@ar.media.kyoto-u.ac.jp> Christopher Brown wrote: > David Cournapeau wrote: > >> isfinite may be acceptable for the OP, but !isnan is not the same as >> isfinite. >> > > So, what your saying is, not is not a number is not is infinite. Got it. :) > Or more simply, that inf is a number but not finite :) David From oliphant at enthought.com Thu Feb 12 23:09:19 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 12 Feb 2009 23:09:19 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> Message-ID: <4994F26F.4050808@enthought.com> josef.pktd at gmail.com wrote: > I just saw the new optimize.curvefit which provides a wrapper around > optimize.leastsq > > optimize.leastsq provides the raw covariance matrix (cov_x). As I > mentioned once on the mailing list, this is not the covariance matrix > of the parameter estimates. To get those, the raw covariance matrix > has to be multiplied by the standard error of the residual. So, the > docstring in optimize.curvefit doesn't correspond to the actual > calculation. > Thank you for the clarification. I had forgotten your earlier valid concerns. Help fixing the docstring is appreciated. If you can figure out how to improve the code, that is even better. I think it is good to at least report the cov, but the docstring should not mislead. > > The first parameter is not exactly high precision. > > The second problem is that, in weighted least squares, the calculation > of the standard deviation of the parameter estimates has to take the > weights into account. (But I don't have the formulas right now.) > > I was looking at this to provide a general non-linear least squares > class in stats. But for several calculation, the Jacobian would be > necessary. optimize.leastsq only provides cov_x, but I was wondering > whether the Jacobian can be calculated from the return of the minpack > functions in optimize.leastsq, but I didn't have time to figure this > out. > I'm not sure, but it might be. 
I would love to spend time on this, but don't have it. If somebody else can pick up, that would be great. -Travis -- Travis Oliphant Enthought, Inc. (512) 536-1057 (office) (512) 536-1059 (fax) http://www.enthought.com oliphant at enthought.com From oliphant at enthought.com Thu Feb 12 23:56:20 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Thu, 12 Feb 2009 23:56:20 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <4994F26F.4050808@enthought.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> Message-ID: <4994FD74.9080305@enthought.com> Travis E. Oliphant wrote: > josef.pktd at gmail.com wrote: > >> I just saw the new optimize.curvefit which provides a wrapper around >> optimize.leastsq >> >> optimize.leastsq provides the raw covariance matrix (cov_x). As I >> mentioned once on the mailing list, this is not the covariance matrix >> of the parameter estimates. To get those, the raw covariance matrix >> has to be multiplied by the standard error of the residual. So, the >> docstring in optimize.curvefit doesn't correspond to the actual >> calculation. >> >> > Thank you for the clarification. I had forgotten your earlier valid > concerns. Help fixing the docstring is appreciated. If you can > figure out how to improve the code, that is even better. I think it is > good to at least report the cov, but the docstring should not mislead. > >> The first parameter is not exactly high precision. >> >> The second problem is that, in weighted least squares, the calculation >> of the standard deviation of the parameter estimates has to take the >> weights into account. (But I don't have the formulas right now.) >> >> I was looking at this to provide a general non-linear least squares >> class in stats. But for several calculation, the Jacobian would be >> necessary. optimize.leastsq only provides cov_x, but I was wondering >> whether the Jacobian can be calculated from the return of the minpack >> functions in optimize.leastsq, but I didn't have time to figure this >> out. >> >> > I'm not sure, but it might be. I would love to spend time on this, > but don't have it. If somebody else can pick up, that would be great. O.K. So my desire to spend time on it outweighed my wisdom, and I went ahead and looked at the reference linked-to and multipled by the necessary scale factor. I fixed the documentation in leastsq as well. A sanity check on my work would be appreciated. I divided by the sum of the weights squared for the weighted case. I'm not sure if this is correct, but it's probably close. When someone can verify the formula that would be great. Adding a check against the test case referred-to would be great. -Travis -- Travis Oliphant Enthought, Inc. (512) 536-1057 (office) (512) 536-1059 (fax) http://www.enthought.com oliphant at enthought.com From josef.pktd at gmail.com Thu Feb 12 23:57:58 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Feb 2009 23:57:58 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <4994F26F.4050808@enthought.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> Message-ID: <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> On Thu, Feb 12, 2009 at 11:09 PM, Travis E. 
Oliphant wrote: > josef.pktd at gmail.com wrote: >> I just saw the new optimize.curvefit which provides a wrapper around >> optimize.leastsq >> >> optimize.leastsq provides the raw covariance matrix (cov_x). As I >> mentioned once on the mailing list, this is not the covariance matrix >> of the parameter estimates. To get those, the raw covariance matrix >> has to be multiplied by the standard error of the residual. So, the >> docstring in optimize.curvefit doesn't correspond to the actual >> calculation. >> > Thank you for the clarification. I had forgotten your earlier valid > concerns. Help fixing the docstring is appreciated. If you can > figure out how to improve the code, that is even better. I think it is > good to at least report the cov, but the docstring should not mislead. >> the standard deviation of the error can be calculated and the corrected (this is written for the use from outside of curvefit): yhat = func(x,popt[0], popt[1]) # get predicted observations SSE = np.sum((y-yhat)**2) sig2 = SSE/(len(y)-len(popt)) ecov = sig2*pcov # this is the variance-covariance matrix of the parameter estimates inside curvefit, this work (before the return): err = func(popt, *args) SSE = np.sum((err)**2) sig2 = SSE / (len(ydata) - len(popt)) pcov = sig2 * pcov >> The first parameter is not exactly high precision. >> >> The second problem is that, in weighted least squares, the calculation >> of the standard deviation of the parameter estimates has to take the >> weights into account. (But I don't have the formulas right now.) >> >> I was looking at this to provide a general non-linear least squares >> class in stats. But for several calculation, the Jacobian would be >> necessary. optimize.leastsq only provides cov_x, but I was wondering >> whether the Jacobian can be calculated from the return of the minpack >> functions in optimize.leastsq, but I didn't have time to figure this >> out. >> > I'm not sure, but it might be. I would love to spend time on this, > but don't have it. If somebody else can pick up, that would be great. > > -Travis > Below are two versions of the test function, the first is against curvefit with corrected pcov, the second is a test against an uncorrected curvefit. It uses only decimal=5 so the tests don't fail Josef ---------- test against corrected version ---------- def test_curvefit(): '''test against NIST certified case at http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml''' data = array([[ 10.07, 77.6 ], [ 14.73, 114.9 ], [ 17.94, 141.1 ], [ 23.93, 190.8 ], [ 29.61, 239.9 ], [ 35.18, 289. ], [ 40.02, 332.8 ], [ 44.82, 378.4 ], [ 50.76, 434.8 ], [ 55.05, 477.3 ], [ 61.01, 536.8 ], [ 66.4 , 593.1 ], [ 75.47, 689.1 ], [ 81.78, 760. 
]]) pstd_c = [3.1643950207E+00, 4.2547321834E-06] popt_c = [3.3799746163E+02, 3.9039091287E-04] SSE_c = 7.5464681533E-02 Rstd_c = 7.9301471998E-02 decimal = 5 #accuracy of parameter estimate and standard deviation def funct(x, b1, b2): return b1 * (1-(1+b2*x/2)**(-2)) start1 = [500, 0.0001] start2 = [300, 0.0002] for start in [start1, start2]: popt, pcov = curve_fit(funct, x, y, p0=start) pstd = np.sqrt(np.diag(pcov)) assert_almost_equal(popt, popt_c, decimal=decimal) assert_almost_equal(pstd, pstd_c, decimal=decimal) ------------------- test against current version: -------------------------------------- from numpy.testing import assert_almost_equal def test_curvefit_old(): '''test against NIST certified case at http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml''' data = array([[ 10.07, 77.6 ], [ 14.73, 114.9 ], [ 17.94, 141.1 ], [ 23.93, 190.8 ], [ 29.61, 239.9 ], [ 35.18, 289. ], [ 40.02, 332.8 ], [ 44.82, 378.4 ], [ 50.76, 434.8 ], [ 55.05, 477.3 ], [ 61.01, 536.8 ], [ 66.4 , 593.1 ], [ 75.47, 689.1 ], [ 81.78, 760. ]]) pstd_c = [3.1643950207E+00, 4.2547321834E-06] popt_c = [3.3799746163E+02, 3.9039091287E-04] SSE_c = 7.5464681533E-02 Rstd_c = 7.9301471998E-02 decimal = 5 #accuracy of parameter estimate and standard deviation def funct(x, b1, b2): return b1 * (1-(1+b2*x/2)**(-2)) start1 = [500, 0.0001] start2 = [300, 0.0002] for start in [start1, start2]: popt, pcov = curve_fit(funct, x, y, p0=start) yest = funct(x,popt[0], popt[1]) SSE = np.sum((y-yest)**2) dof = len(y)-len(popt) #Residual standard error Rstd = np.sqrt(SSE/(len(y)-len(popt))) #parameter standard error pstd = np.sqrt(SSE/(len(y)-len(popt))*np.diag(pcov)) assert_almost_equal(popt, popt_c, decimal=decimal) assert_almost_equal(pstd, pstd_c, decimal=decimal) assert_almost_equal(SSE, SSE_c) assert_almost_equal(Rstd, Rstd_c) From oliphant at enthought.com Fri Feb 13 00:24:13 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 13 Feb 2009 00:24:13 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> Message-ID: <499503FD.3050004@enthought.com> josef.pktd at gmail.com wrote: > On Thu, Feb 12, 2009 at 11:09 PM, Travis E. Oliphant > wrote: > >> josef.pktd at gmail.com wrote: >> >>> I just saw the new optimize.curvefit which provides a wrapper around >>> optimize.leastsq >>> >>> >>> > the standard deviation of the error can be calculated and the > corrected (this is written for the use from outside of curvefit): > > yhat = func(x,popt[0], popt[1]) # get predicted observations > SSE = np.sum((y-yhat)**2) > sig2 = SSE/(len(y)-len(popt)) > ecov = sig2*pcov # this is the variance-covariance matrix of > the parameter estimates > > > inside curvefit, this work (before the return): > > err = func(popt, *args) > SSE = np.sum((err)**2) > sig2 = SSE / (len(ydata) - len(popt)) > pcov = sig2 * pcov > Thanks very much for these... I appreciate your help in getting curve_fit in better shape. These functions were essentially added to curvefit so now curve_fit can be tested against those NIST pages. Thank you also for the unit tests. A good project idea for somebody wanting to get their feet wet with SciPy would be to write unit-tests of curve_fit against some more of those NIST data-sets --- those look nice. 
A combination of loadtxt and additional parsing fu and you could automate the whole thing --- nice project for a student out there ;-) -Travis -- Travis Oliphant Enthought, Inc. (512) 536-1057 (office) (512) 536-1059 (fax) http://www.enthought.com oliphant at enthought.com From josef.pktd at gmail.com Fri Feb 13 01:08:03 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Feb 2009 01:08:03 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <499503FD.3050004@enthought.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> <499503FD.3050004@enthought.com> Message-ID: <1cd32cbb0902122208x5abc7c91n1de14e0ab4a04231@mail.gmail.com> On Fri, Feb 13, 2009 at 12:24 AM, Travis E. Oliphant wrote: > josef.pktd at gmail.com wrote: >> On Thu, Feb 12, 2009 at 11:09 PM, Travis E. Oliphant >> wrote: >> >>> josef.pktd at gmail.com wrote: >>> >>>> I just saw the new optimize.curvefit which provides a wrapper around >>>> optimize.leastsq >>>> >>>> >>>> >> the standard deviation of the error can be calculated and the >> corrected (this is written for the use from outside of curvefit): >> >> yhat = func(x,popt[0], popt[1]) # get predicted observations >> SSE = np.sum((y-yhat)**2) >> sig2 = SSE/(len(y)-len(popt)) >> ecov = sig2*pcov # this is the variance-covariance matrix of >> the parameter estimates >> >> >> inside curvefit, this work (before the return): >> >> err = func(popt, *args) >> SSE = np.sum((err)**2) >> sig2 = SSE / (len(ydata) - len(popt)) >> pcov = sig2 * pcov I think for the weighted least squares problem the weights should go into the SSE calculation. I didn't find directly the reference, but I am somewhat confident that this is correct, from the analogy to the transformed model ynew = y*weight where weight_i = 1/sigma_i in the linear case. But it's too late today to try to check this. SSE = np.sum((err*weight)**2) >> > > Thanks very much for these... I appreciate your help in getting > curve_fit in better shape. These functions were essentially added to > curvefit so now curve_fit can be tested against those NIST pages. > Thank you also for the unit tests. You're welcome > > A good project idea for somebody wanting to get their feet wet with > SciPy would be to write unit-tests of curve_fit against some more of > those NIST data-sets --- those look nice. A combination of loadtxt and > additional parsing fu and you could automate the whole thing --- nice > project for a student out there ;-) > > -Travis > Yes, and they are also useful to test other optimization functions. Unfortunately, I didn't see any cases with several explanatory variables or with weights/heteroscedasticity. There are also certified reference cases for basic statistics and linear regression, that also would be useful. I will try to look during weekend at the weighted case a bit more. Josef From alex.liberzon at gmail.com Fri Feb 13 01:51:09 2009 From: alex.liberzon at gmail.com (Alex Liberzon) Date: Fri, 13 Feb 2009 08:51:09 +0200 Subject: [SciPy-user] odeint for calculating trajectories Message-ID: <775f17a80902122251n3e1854e9ja30a1d4388af60a9@mail.gmail.com> I think that if you have given velocity field you should use streamlines. Link one: http://www.sagenb.org/home/pub/106/ or using MayaVi: http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/_sources/mlab.txt or MayaVi cookbook for graphical example. 
http://www.scipy.org/Cookbook/MayaVi/Examples HIH, Alex -- Alex Liberzon Turbulence Structure Laboratory (http://www.eng.tau.ac.il/efdl) School of Mechanical Engineering Tel Aviv University Tel: +972-3-640-8928 (office) Tel: +972-3-640-6860 (lab) E-mail: alexlib at eng.tau.ac.il -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Feb 13 02:06:13 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Feb 2009 01:06:13 -0600 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <1cd32cbb0902122208x5abc7c91n1de14e0ab4a04231@mail.gmail.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> <499503FD.3050004@enthought.com> <1cd32cbb0902122208x5abc7c91n1de14e0ab4a04231@mail.gmail.com> Message-ID: <3d375d730902122306k6b5a9739i6aa91e21753258f8@mail.gmail.com> On Fri, Feb 13, 2009 at 00:08, wrote: > On Fri, Feb 13, 2009 at 12:24 AM, Travis E. Oliphant > wrote: >> josef.pktd at gmail.com wrote: >>> On Thu, Feb 12, 2009 at 11:09 PM, Travis E. Oliphant >>> wrote: >>> >>>> josef.pktd at gmail.com wrote: >>>> >>>>> I just saw the new optimize.curvefit which provides a wrapper around >>>>> optimize.leastsq >>>>> >>>>> >>>>> >>> the standard deviation of the error can be calculated and the >>> corrected (this is written for the use from outside of curvefit): >>> >>> yhat = func(x,popt[0], popt[1]) # get predicted observations >>> SSE = np.sum((y-yhat)**2) >>> sig2 = SSE/(len(y)-len(popt)) >>> ecov = sig2*pcov # this is the variance-covariance matrix of >>> the parameter estimates >>> >>> >>> inside curvefit, this work (before the return): >>> >>> err = func(popt, *args) >>> SSE = np.sum((err)**2) >>> sig2 = SSE / (len(ydata) - len(popt)) >>> pcov = sig2 * pcov > > I think for the weighted least squares problem the weights should go > into the SSE calculation. I didn't find directly the reference, but I > am somewhat confident that this is correct, from the analogy to the > transformed > model ynew = y*weight where weight_i = 1/sigma_i in the linear case. > But it's too late today to try to check this. > > SSE = np.sum((err*weight)**2) Yes. Basically, you want a Chi-squared statistic for the residuals. scipy.odr does this scaling, for example. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Fri Feb 13 02:12:52 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Feb 2009 01:12:52 -0600 Subject: [SciPy-user] Help on installation of scipy In-Reply-To: References: Message-ID: <3d375d730902122312q7a5c2b1ra2632d0a9fe1a81@mail.gmail.com> On Thu, Feb 12, 2009 at 19:39, yelver huang wrote: > Hi all, > I have encountered some problems on installing scipy on Windows, though it's > quite simple. There is no fault for me to import numpy from Python25, but > after I install the binary file of scipy, I could not import it. 
The error > it shows is: > Warning (from warnings module): > File "G:\Python25\lib\site-packages\scipy\__init__.py", line 30 > UserWarning) > UserWarning: Numpy 1.2.0 or above is recommended for this version of scipy > (detected version 1.0.4) > Traceback (most recent call last): > File "", line 1, in > import scipy > File "G:\Python25\Lib\site-packages\scipy\__init__.py", line 75, in > > from numpy.testing import Tester > ImportError: cannot import name Tester > The version of numpy in my computer is 1.2.1, I could not understand this > message. Hope someone could help me. Well, the version of numpy that is actually installed does appear to be 1.0.4, not 1.2.1. It is possible that you have old files left over from a previous installation or you have two numpy packages installed to different locations, and the old one is the one that is getting picked up. Do the following to double-check the location of the installed numpy: >>> import numpy >>> print numpy.__file__ /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy-1.2.0rc2-py2.5-macosx-10.3-fat.egg/numpy/__init__.pyc You may need to delete this installation of numpy thoroughly and reinstall. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gael.varoquaux at normalesup.org Fri Feb 13 03:44:26 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 13 Feb 2009 09:44:26 +0100 Subject: [SciPy-user] odeint for calculating trajectories In-Reply-To: <775f17a80902122251n3e1854e9ja30a1d4388af60a9@mail.gmail.com> References: <775f17a80902122251n3e1854e9ja30a1d4388af60a9@mail.gmail.com> Message-ID: <20090213084426.GA12393@phare.normalesup.org> On Fri, Feb 13, 2009 at 08:51:09AM +0200, Alex Liberzon wrote: > I think that if you have given velocity field you should use streamlines. > Link one: [1]http://www.sagenb.org/home/pub/106/ > or using MayaVi: > [2]http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/_sources/mlab.txt Maybe the best example of using Mayavi to integrate a vector field with streamlines would be: http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/example_using_with_scipy.html However, this is going to be convienent only for visualization. If the OP's problem is visualization I am more than happy to answer questions on how to do this with Mayavi. Ga?l From simon.palmer at gmail.com Fri Feb 13 09:11:03 2009 From: simon.palmer at gmail.com (SimonPalmer) Date: Fri, 13 Feb 2009 06:11:03 -0800 (PST) Subject: [SciPy-user] Google App Engine Message-ID: <6028559e-55b8-425b-8677-2f94aadb2fe7@l39g2000yqn.googlegroups.com> Is it possible to get scipy/numpy running on the Google App Engine? Has anyone tried? From bernardo.rocha at meduni-graz.at Fri Feb 13 09:21:29 2009 From: bernardo.rocha at meduni-graz.at (Bernardo M. Rocha) Date: Fri, 13 Feb 2009 15:21:29 +0100 Subject: [SciPy-user] intersect (matlab) In-Reply-To: References: Message-ID: <499581E9.7090803@meduni-graz.at> Hi Robert Kern, thanks a lot for your help! Just another question, is there a way to do it in a matrix (that is, the intersection in the rows of 2 matrices)? The matlab version would be like: [c,ia,ib] = intersect(A,B,'rows') where A and B are matrices with the same number of columns. Best regards, Bernardo M. 
Rocha From sturla at molden.no Fri Feb 13 10:00:58 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Feb 2009 16:00:58 +0100 Subject: [SciPy-user] shared memory machines In-Reply-To: <4992D7EA.5070404@molden.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> Message-ID: <49958B2A.7020001@molden.no> Have put up a new version version of the shared memory ndarrays here: http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip usage: import numpy import sharedmem as shm arr = shm.empty(...) arr = shm.ones(...) arr = shm.zeros(...) As for the memory leak reported by Ga?l previously: This is a bug in multiprocessing, not in our code. The offending line is 353 in multiprocessing/forking.py. It shuts down sub-processes abruptly, by using os._exit for suicide, preventing any clean-up code from executing. Change this line to "sys.exit(exitcode)" and it works as expected. The bug has been reported to Jesse Noller. Regards, Sturla Molden From josef.pktd at gmail.com Fri Feb 13 11:21:41 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Feb 2009 11:21:41 -0500 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <1cd32cbb0902122208x5abc7c91n1de14e0ab4a04231@mail.gmail.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> <499503FD.3050004@enthought.com> <1cd32cbb0902122208x5abc7c91n1de14e0ab4a04231@mail.gmail.com> Message-ID: <1cd32cbb0902130821x1ce9dc74l4e191fbf50a179e7@mail.gmail.com> > I think for the weighted least squares problem the weights should go > into the SSE calculation. I didn't find directly the reference, but I > am somewhat confident that this is correct, from the analogy to the > transformed > model ynew = y*weight where weight_i = 1/sigma_i in the linear case. > But it's too late today to try to check this. > > SSE = np.sum((err*weight)**2) > I looked at the weighted function some more: Since the error calculation for the s_sq uses `func = _weighted_general_function` the above weighting for the SSE is automatically done. But then, in this case there is no renormalization necessary when calculation s_sq. The special casing of the weighted function should be dropped (the commented out part below) if (len(ydata) > len(p0)) and pcov is not None: s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0)) ## if sigma is not None: ## s_sq /= (args[-1]**2).sum() pcov = pcov * s_sq else: pcov = inf Below is the test for the weighted case, rewritten from the unweighted NIST example. Note: I multiplied y and and f(x) by the weight, which then is corrected again by the weighted least squares, and so I get the original NIST estimates back. This test passes with the above correction but fails in the current version, r5551. Josef ----------------------- def test_curvefit_weights(): '''test against NIST certified case at http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml add weights to test curvefit with weights''' data = array([[ 10.07, 77.6 ], [ 14.73, 114.9 ], [ 17.94, 141.1 ], [ 23.93, 190.8 ], [ 29.61, 239.9 ], [ 35.18, 289. ], [ 40.02, 332.8 ], [ 44.82, 378.4 ], [ 50.76, 434.8 ], [ 55.05, 477.3 ], [ 61.01, 536.8 ], [ 66.4 , 593.1 ], [ 75.47, 689.1 ], [ 81.78, 760. 
]]) y,x = data[:,0],data[:,1] pstd_c = [3.1643950207E+00, 4.2547321834E-06] popt_c = [3.3799746163E+02, 3.9039091287E-04] SSE_c = 7.5464681533E-02 Rstd_c = 7.9301471998E-02 decimal = 5 #accuracy of parameter estimate and standard deviation w = 1.0 + np.arange(len(x)) # 1/weights = sigma print w def funct(x, b1, b2): #weighted function for y return w * b1 * (1-(1+b2*x/2)**(-2)) y = w * y #weighted y start1 = [500, 0.0001] start2 = [300, 0.0002] for start in [start1, start2]: popt, pcov = curve_fit(funct, x, y, p0=start, sigma=w) pstd = np.sqrt(np.diag(pcov)) assert_almost_equal(popt, popt_c, decimal=decimal) assert_almost_equal(pstd, pstd_c, decimal=decimal) --------------------------- From Scott.Askey at afit.edu Fri Feb 13 11:54:09 2009 From: Scott.Askey at afit.edu (Askey Scott A Capt AFIT/ENY) Date: Fri, 13 Feb 2009 11:54:09 -0500 Subject: [SciPy-user] pythonxy and Numpy 1.2.1 and Scipy 0.7.0; Ubuntu 8.10 packages Message-ID: <792700546363C941B876B9D41AF4475905930E11@MS-AFIT-03.afit.edu> If I want to run pythonxy under linux is reverting form my current ubuntu 8.10 to 8.04 my only option? How does pythonxy linux compare to pythonxy windows in terms of UI/IDE and modernity of the components? V/R Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg.net at gmail.com Fri Feb 13 12:05:51 2009 From: ellisonbg.net at gmail.com (Brian Granger) Date: Fri, 13 Feb 2009 09:05:51 -0800 Subject: [SciPy-user] shared memory machines In-Reply-To: <49958B2A.7020001@molden.no> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> <49958B2A.7020001@molden.no> Message-ID: <6ce0ac130902130905i670c1124l47f00d4b255c0331@mail.gmail.com> Does this work without multiprocessing? Brian On Fri, Feb 13, 2009 at 7:00 AM, Sturla Molden wrote: > > Have put up a new version version of the shared memory ndarrays here: > > http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip > > usage: > > import numpy > import sharedmem as shm > > arr = shm.empty(...) > arr = shm.ones(...) > arr = shm.zeros(...) > > As for the memory leak reported by Ga?l previously: This is a bug in > multiprocessing, not in our code. The offending line is 353 in > multiprocessing/forking.py. It shuts down sub-processes abruptly, by > using os._exit for suicide, preventing any clean-up code from executing. > Change this line to "sys.exit(exitcode)" and it works as expected. The > bug has been reported to Jesse Noller. > > > Regards, > > Sturla Molden > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From warren.weckesser at gmail.com Fri Feb 13 12:29:06 2009 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 13 Feb 2009 11:29:06 -0600 Subject: [SciPy-user] pythonxy and Numpy 1.2.1 and Scipy 0.7.0; Ubuntu 8.10 packages In-Reply-To: <792700546363C941B876B9D41AF4475905930E11@MS-AFIT-03.afit.edu> References: <792700546363C941B876B9D41AF4475905930E11@MS-AFIT-03.afit.edu> Message-ID: <114880320902130929t26c8aeefv4b6e2522f4ca46bf@mail.gmail.com> Hi Scott, This sounds like a question for the Python(x,y) Discussion Group. 
You can get there from here: http://www.pythonxy.com/discussions.php Warren On Fri, Feb 13, 2009 at 10:54 AM, Askey Scott A Capt AFIT/ENY < Scott.Askey at afit.edu> wrote: > If I want to run pythonxy under linux is reverting form my current ubuntu > 8.10 to 8.04 my only option? > > How does pythonxy linux compare to pythonxy windows in terms of UI/IDE and > modernity of the components? > > > > V/R > > > > Scott > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From strawman at astraw.com Fri Feb 13 12:50:39 2009 From: strawman at astraw.com (Andrew Straw) Date: Fri, 13 Feb 2009 09:50:39 -0800 Subject: [SciPy-user] pythonxy and Numpy 1.2.1 and Scipy 0.7.0; Ubuntu 8.10 packages In-Reply-To: <792700546363C941B876B9D41AF4475905930E11@MS-AFIT-03.afit.edu> References: <792700546363C941B876B9D41AF4475905930E11@MS-AFIT-03.afit.edu> Message-ID: <4995B2EF.4020905@astraw.com> Askey Scott A Capt AFIT/ENY wrote: > > If I want to run pythonxy under linux is reverting form my current > ubuntu 8.10 to 8.04 my only option? > > How does pythonxy linux compare to pythonxy windows in terms of UI/IDE > and modernity of the components? > Out of curiosity, what is it from pythonxy that would make you want it rather than the stock packages? Or are you just thinking of it as a quick way to get latest version of the various packages onto your machine? (I do understand the issue that Ubuntu lags numpy/scipy/matplotlib releases quite a bit.) -Andrew From dwf at cs.toronto.edu Fri Feb 13 13:06:45 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 13 Feb 2009 13:06:45 -0500 Subject: [SciPy-user] Google App Engine In-Reply-To: <6028559e-55b8-425b-8677-2f94aadb2fe7@l39g2000yqn.googlegroups.com> References: <6028559e-55b8-425b-8677-2f94aadb2fe7@l39g2000yqn.googlegroups.com> Message-ID: On 13-Feb-09, at 9:11 AM, SimonPalmer wrote: > Is it possible to get scipy/numpy running on the Google App Engine? > Has anyone tried? I looked into it, and at present it doesn't seem to be possible: "Currently, Google App Engine allows you to write your applications with Python 2.5. For security reasons, some Python modules written in C are disabled in our system. Also, since Google App Engine doesn't support writing to disk, some libraries that support this and other functionalities are only partially enabled." http://code.google.com/appengine/kb/general.html#language If you can't get C code running (so no NumPy anyway), good luck with Fortran (which SciPy depends pretty heavily on). David From sturla at molden.no Fri Feb 13 13:51:19 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Feb 2009 19:51:19 +0100 (CET) Subject: [SciPy-user] shared memory machines In-Reply-To: <6ce0ac130902130905i670c1124l47f00d4b255c0331@mail.gmail.com> References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> <49958B2A.7020001@molden.no> <6ce0ac130902130905i670c1124l47f00d4b255c0331@mail.gmail.com> Message-ID: > Does this work without multiprocessing? Yes. Multiprocessing is only used for obtaining the page size. I could move that to Cython as well. S.M. 
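For reference, a couple of standard-library ways to query the page size without going through multiprocessing (a small sketch; os.sysconf is POSIX-only):

import mmap
import os

print(mmap.PAGESIZE)                # module-level constant
print(os.sysconf('SC_PAGE_SIZE'))   # POSIX only, not available on Windows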
From philip at semanchuk.com Fri Feb 13 13:54:22 2009 From: philip at semanchuk.com (Philip Semanchuk) Date: Fri, 13 Feb 2009 13:54:22 -0500 Subject: [SciPy-user] shared memory machines In-Reply-To: References: <51ab5afe7eb746fa60db99ef7635df9b.squirrel@webmail.uio.no> <20090210231356.GC9128@phare.normalesup.org> <20090211114620.GB19956@phare.normalesup.org> <4992C539.7020501@molden.no> <20090211133305.GC19956@phare.normalesup.org> <4992D7EA.5070404@molden.no> <49958B2A.7020001@molden.no> <6ce0ac130902130905i670c1124l47f00d4b255c0331@mail.gmail.com> Message-ID: <4F6B786E-63B2-4170-BBE0-BB163552CDCD@semanchuk.com> On Feb 13, 2009, at 1:51 PM, Sturla Molden wrote: > >> Does this work without multiprocessing? > > Yes. > > Multiprocessing is only used for obtaining the page size. I could move > that to Cython as well. The page size is available in the mmap module. From Scott.Askey at afit.edu Fri Feb 13 15:15:29 2009 From: Scott.Askey at afit.edu (Askey Scott A Capt AFIT/ENY) Date: Fri, 13 Feb 2009 15:15:29 -0500 Subject: [SciPy-user] Ubuntu pythonxy and Numpy 1.2.1 and Scipy 0.7.0; Message-ID: <792700546363C941B876B9D41AF4475905930ED6@MS-AFIT-03.afit.edu> #Out of curiosity, what is it from pythonxy that would make you want it rather than the stock packages? Or are you just thinking #of it as a quick way to get latest version of the various packages onto your machine? (I do understand the issue that Ubuntu #lags numpy/scipy/matplotlib releases quite a bit.) #-Andrew I wanted to install pythonxy for the IDE and to get more up to date packages such as pytable 2.1, sympy 6.3 and scipy .7. Pythonxy-linux only supports one linux ubuntu 8.04 and I am not even sure it provides a version newer than the standard ubuntu distro. Cheers Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon.palmer at gmail.com Fri Feb 13 15:20:29 2009 From: simon.palmer at gmail.com (Simon Palmer) Date: Fri, 13 Feb 2009 20:20:29 +0000 Subject: [SciPy-user] Google App Engine In-Reply-To: References: <6028559e-55b8-425b-8677-2f94aadb2fe7@l39g2000yqn.googlegroups.com> Message-ID: That was pretty much my conclusion too, but I thought I would ask. Do you happen to have a route into google to lobby for it? On Fri, Feb 13, 2009 at 6:06 PM, David Warde-Farley wrote: > On 13-Feb-09, at 9:11 AM, SimonPalmer wrote: > > > Is it possible to get scipy/numpy running on the Google App Engine? > > Has anyone tried? > > I looked into it, and at present it doesn't seem to be possible: > > "Currently, Google App Engine allows you to write your applications > with Python 2.5. For security reasons, some Python modules written in > C are disabled in our system. Also, since Google App Engine doesn't > support writing to disk, some libraries that support this and other > functionalities are only partially enabled." > > http://code.google.com/appengine/kb/general.html#language > > If you can't get C code running (so no NumPy anyway), good luck with > Fortran (which SciPy depends pretty heavily on). > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Fri Feb 13 18:34:52 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Feb 2009 17:34:52 -0600 Subject: [SciPy-user] intersect (matlab) In-Reply-To: <499581E9.7090803@meduni-graz.at> References: <499581E9.7090803@meduni-graz.at> Message-ID: <3d375d730902131534g78a20508g237843023fb6497c@mail.gmail.com> On Fri, Feb 13, 2009 at 08:21, Bernardo M. Rocha wrote: > Hi Robert Kern, > > thanks a lot for your help! Just another question, is there a way to do it in a matrix (that is, the intersection in the rows of 2 matrices)? The matlab version would be like: > > [c,ia,ib] = intersect(A,B,'rows') > > where A and B are matrices with the same number of columns. I don't know exactly what you mean, off-hand. Can you show me an example? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Sat Feb 14 00:28:30 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 14 Feb 2009 14:28:30 +0900 Subject: [SciPy-user] Ubuntu pythonxy and Numpy 1.2.1 and Scipy 0.7.0; In-Reply-To: <792700546363C941B876B9D41AF4475905930ED6@MS-AFIT-03.afit.edu> References: <792700546363C941B876B9D41AF4475905930ED6@MS-AFIT-03.afit.edu> Message-ID: <4996567E.9020000@ar.media.kyoto-u.ac.jp> Askey Scott A Capt AFIT/ENY wrote: > > #Out of curiosity, what is it from pythonxy that would make you want > it rather than the stock packages? Or are you just thinking #of it as > a quick way to get latest version of the various packages onto your > machine? (I do understand the issue that Ubuntu #lags > numpy/scipy/matplotlib releases quite a bit.) > > > > #-Andrew > > > > I wanted to install pythonxy for the IDE and to get more up to date > packages such as pytable 2.1, sympy 6.3 and scipy .7. > > Pythonxy-linux only supports one linux ubuntu 8.04 and I am not even > sure it provides a version newer than the standard ubuntu distro. > Well, at least for numpy and scipy, we have up to date packages now for Intrepid: https://edge.launchpad.net/~scipy/+archive/ppa We will try to keep up to date, at least for intrepid (and later), David From strawman at astraw.com Sat Feb 14 04:49:39 2009 From: strawman at astraw.com (Andrew Straw) Date: Sat, 14 Feb 2009 01:49:39 -0800 Subject: [SciPy-user] Automating Matlab In-Reply-To: <49866BDF.2000809@ar.media.kyoto-u.ac.jp> References: <4984F58C.5070605@gmail.com> <3d375d730901311734o388adf56y9f3241032ed409c2@mail.gmail.com> <49866BDF.2000809@ar.media.kyoto-u.ac.jp> Message-ID: <499693B3.9090609@astraw.com> I just came across this, which looks very relevant. http://frontiersin.org/neuroinformatics/paper/10.3389/neuro.11/005.2009/ David Cournapeau wrote: > Sturla Molden wrote: >> For those who are interested, there are two ways of doing this: >> > > I think Eric talked about source code translation, that is .m to .py. > >> The most portable is to call the 'Matlab engine', which is a C and Fortran >> library for automating Matlab. This can be done using f2py or ctypes (wrap >> libeng.dll and libmx.dll). 
>> > > If you are not aware of it, there is already code for it: > > http://svn.scipy.org/svn/scikits/trunk/mlabwrap/ > > cheers, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From gael.varoquaux at normalesup.org Sat Feb 14 05:20:43 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 14 Feb 2009 11:20:43 +0100 Subject: [SciPy-user] Ubuntu pythonxy and Numpy 1.2.1 and Scipy 0.7.0; In-Reply-To: <4996567E.9020000@ar.media.kyoto-u.ac.jp> References: <792700546363C941B876B9D41AF4475905930ED6@MS-AFIT-03.afit.edu> <4996567E.9020000@ar.media.kyoto-u.ac.jp> Message-ID: <20090214102043.GA1528@phare.normalesup.org> On Sat, Feb 14, 2009 at 02:28:30PM +0900, David Cournapeau wrote: > Well, at least for numpy and scipy, we have up to date packages now for > Intrepid: > https://edge.launchpad.net/~scipy/+archive/ppa For Mayavi2, you can use: https://launchpad.net/~gael-varoquaux/+archive The packages are for hardy, but they support intrepid with no problem. Ga?l From mchandra at iitk.ac.in Sun Feb 15 03:23:55 2009 From: mchandra at iitk.ac.in (Mani chandra) Date: Sun, 15 Feb 2009 00:23:55 -0800 Subject: [SciPy-user] Multicolumn file io Message-ID: <4997D11B.7060504@iitk.ac.in> Hi, How can I can output my data in 1D arrays to a file as seperate columns? For ex, my data is like x = numpy.mgrid[0:10:10j], y = numpy.mgrid[0:10:10j], or x = numpy.arange(0, 10, 1), y = numpy.arange(0, 10, 1), and I'd like the data outputted to a file like : #x y 0 0 1 1 2 2 and so on... Thanks Mani chandra From stefan at sun.ac.za Sat Feb 14 16:44:55 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 14 Feb 2009 23:44:55 +0200 Subject: [SciPy-user] Multicolumn file io In-Reply-To: <4997D11B.7060504@iitk.ac.in> References: <4997D11B.7060504@iitk.ac.in> Message-ID: <9457e7c80902141344j48c99ad5vf72d9939860dd860@mail.gmail.com> 2009/2/15 Mani chandra : > How can I can output my data in 1D arrays to a file as seperate > columns? For ex, my data is like x = numpy.mgrid[0:10:10j], y = > numpy.mgrid[0:10:10j], or x = numpy.arange(0, 10, 1), y = > numpy.arange(0, 10, 1), and I'd like the data outputted to a file like : > #x y > 0 0 > 1 1 > 2 2 > > and so on... np.savetxt('/tmp/data.txt', np.array([x, y]).T) or np.savetxt('/tmp/data.txt', np.array([x, y]).T, fmt='%.0f') St?fan From oliphant at enthought.com Sat Feb 14 17:10:15 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sat, 14 Feb 2009 16:10:15 -0600 Subject: [SciPy-user] incorrect variance in optimize.curvefit and leastsq In-Reply-To: <1cd32cbb0902130821x1ce9dc74l4e191fbf50a179e7@mail.gmail.com> References: <1cd32cbb0902121916q60a08be6ka027b5873a3239da@mail.gmail.com> <4994F26F.4050808@enthought.com> <1cd32cbb0902122057x1fb8ac51l74b6988ec98a8f51@mail.gmail.com> <499503FD.3050004@enthought.com> <1cd32cbb0902122208x5abc7c91n1de14e0ab4a04231@mail.gmail.com> <1cd32cbb0902130821x1ce9dc74l4e191fbf50a179e7@mail.gmail.com> Message-ID: <49974147.10305@enthought.com> josef.pktd at gmail.com wrote: >> I think for the weighted least squares problem the weights should go >> into the SSE calculation. I didn't find directly the reference, but I >> am somewhat confident that this is correct, from the analogy to the >> transformed >> model ynew = y*weight where weight_i = 1/sigma_i in the linear case. >> But it's too late today to try to check this. 
>> >> SSE = np.sum((err*weight)**2) >> >> > > I looked at the weighted function some more: > > Since the error calculation for the s_sq uses `func = > _weighted_general_function` > the above weighting for the SSE is automatically done. But then, in > this case there is no renormalization > necessary when calculation s_sq. The special casing of the weighted > function should be dropped (the commented out part below) > > if (len(ydata) > len(p0)) and pcov is not None: > s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0)) > ## if sigma is not None: > ## s_sq /= (args[-1]**2).sum() > pcov = pcov * s_sq > else: > pcov = inf > Thanks for figuring this out. I've updated the code in the trunk to reflect this change. I will add the test suite as soon as I get a power connection again --- unless someone does it first. Thanks for pointing out the NIST test-cases and sheparding the curve_fit function. Best regards, -Travis -- Travis Oliphant Enthought, Inc. (512) 536-1057 (office) (512) 536-1059 (fax) http://www.enthought.com oliphant at enthought.com From simpson at math.toronto.edu Sat Feb 14 18:58:40 2009 From: simpson at math.toronto.edu (Gideon Simpson) Date: Sat, 14 Feb 2009 18:58:40 -0500 Subject: [SciPy-user] sparse jacobians and nonlinear solvers Message-ID: Just wanted to check before I modified my code. If my jacobian matrices are in fact sparse, will any of the nonlinear solvers be able to take advantage of that structure if I were to build up the jacobians as a sparse data structure? -gideon From pav at iki.fi Sat Feb 14 19:10:29 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 15 Feb 2009 00:10:29 +0000 (UTC) Subject: [SciPy-user] sparse jacobians and nonlinear solvers References: Message-ID: Sat, 14 Feb 2009 18:58:40 -0500, Gideon Simpson wrote: > > Just wanted to check before I modified my code. > > If my jacobian matrices are in fact sparse, will any of the nonlinear > solvers be able to take advantage of that structure if I were to build > up the jacobians as a sparse data structure? Based on the documentation on docs.scipy.org, I don't think any of those currently in Scipy can do this. `fsolve` can handle banded Jacobians, though. -- Pauli Virtanen From wnbell at gmail.com Sat Feb 14 20:06:34 2009 From: wnbell at gmail.com (Nathan Bell) Date: Sat, 14 Feb 2009 20:06:34 -0500 Subject: [SciPy-user] Multicolumn file io In-Reply-To: <9457e7c80902141344j48c99ad5vf72d9939860dd860@mail.gmail.com> References: <4997D11B.7060504@iitk.ac.in> <9457e7c80902141344j48c99ad5vf72d9939860dd860@mail.gmail.com> Message-ID: On Sat, Feb 14, 2009 at 4:44 PM, St?fan van der Walt wrote: > > np.savetxt('/tmp/data.txt', np.array([x, y]).T) > > or > > np.savetxt('/tmp/data.txt', np.array([x, y]).T, fmt='%.0f') > To that I'd add that if some columns are integers and others are floats that fmt='%g' gives you the desired (compact) output even though the combined array has a floating point type. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From eads at soe.ucsc.edu Sun Feb 15 01:18:53 2009 From: eads at soe.ucsc.edu (Damian Eads) Date: Sat, 14 Feb 2009 22:18:53 -0800 Subject: [SciPy-user] "clustergrams"/hierarchical clustering heat maps In-Reply-To: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> References: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> Message-ID: <91b4b1ab0902142218v44ceccd8r10e51eea56f03c57@mail.gmail.com> Hi David, Sorry. I did not see your message until now. Several people have already inquired about heatmaps. 
I've been meaning to eventually implement support for them but since I don't work with microarray data and I'm in the midst of trying to get a paper out, it has fallen onto the back burner. As a first step, I'd need to implement support for missing attributes since this seems to be common with microarray data. As far as I know, a heatmap illustrates clustering along two axes: observation vectors and attributes. For example, suppose we're clustering patients by their genes. There is one observation vector for each patient, and one vector element per gene. Clustering observation vectors is the typical case, which is used to identify groups of similar patients. Clustering attributes (across observation vectors) is less typical but would be used to identifying groups of similar genes. The heatmap just illustrates the vectors, the color is the intensity. When clustering along a single dimension (observation vectors), no sorting is necessary, and a dendrogram is drawn along the vertical axis. The i'th row is just the observation vector corresponding to the i'th leaf node. No sorting along the attribute dimension is needed. Along two dimensions, there is a dendrogram along the horizontal axis. Now the attributes must be reordered so that the j'th column corresponds to the j'th leaf node. This is my first time describing heat maps so I apologize if this description is terse. Does it make some sense? As far as how someone implements this, it seems like it'd be pretty simple. There is a helper function called _plot_dendrogram that takes in a collection of raw dendrogram lines to be rendered on the plot. First, plot the heatmap (sorting the attributes so that the columns correspond to the ids of the leaf nodes); this can be done with imshow. Second, for the first dendrogram, call _plot_dendrogram but provide it with a shifting parameters so that the dendrogram lines are rendered to the left of the image. Third, call _plot_dendrogram again, provide a shifting parameter, but instead shift the lines downward for the attribute clustering dendrogram. I want to get to this soon but no promises. Sorry. Cheers, Damian On Mon, Feb 2, 2009 at 11:12 PM, David Warde-Farley wrote: > Hi all, > > I was recently asked to cluster some data and I know from experience > that people use these heat maps to look for patterns in multivariate > data, often with a dendrogram off to the side. This involves sorting > the rows and columns in a certain fashion, the details of which are > somewhat fuzzy to me (and, truthfully, I'm happy with it staying that > way for now). > > I notice that dendrogram plotting is available in > scipy.cluster.hierarchy, and was wondering if the something for > producing the associated sorted heat maps is available anywhere > (within SciPy or otherwise). > > Many thanks, > > David > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -- ----------------------------------------------------- Damian Eads Ph.D. 
Student Jack Baskin School of Engineering, UCSC E2-489 1156 High Street Machine Learning Lab Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads From contact at pythonxy.com Sun Feb 15 06:52:03 2009 From: contact at pythonxy.com (Pierre Raybaut) Date: Sun, 15 Feb 2009 12:52:03 +0100 Subject: [SciPy-user] [ Python(x,y) ] New release : 2.1.11 Message-ID: <499801E3.3050306@pythonxy.com> Hi all, Release 2.1.11 is now available on http://www.pythonxy.com: - All-in-One Installer ("Full Edition"), - Plugin Installer -- to be downloaded with xyweb, - Update Changes history Version 2.1.11 (02-15-2009) * Added: o numexpr 1.2 - Fast evaluation of array expressions elementwise by using a vector-based virtual machine * Updated: o SciPy 0.7.0 o Console 2.0.141.8 o Enthought Tool Suite 3.1.0.4 o GDAL 1.6.0 o h5py 1.1.0 o IPython 0.9.1.7 o pylint 0.16.0 o reportlab 2.3 o SWIG 1.3.38 o VPython 5.0.3 o winpdb 1.4.4 o xy 1.0.20 o xydoc 1.0.3 * Corrected: o Issue 70: Python installation folder was asked but not changed if the "Default directories" option was not selected o Issue 71: SciTE shortcut was broken in "Python(x,y) Home" application o Issue 74: IPython syntax coloring incompatible with default white background o Issue 75: Console plugin installer: remove 'console.xml' in user home directory o Issue 80: Upgrade to SciPy 0.7.0 Regards, Pierre Raybaut From teddy.kord at googlemail.com Sun Feb 15 11:22:04 2009 From: teddy.kord at googlemail.com (Ted Kord) Date: Sun, 15 Feb 2009 08:22:04 -0800 (PST) Subject: [SciPy-user] Specific Odeint Problem Message-ID: Hi I have a specific problem for which I am unsure how to use odeint. I'll try to outline the problem in steps so please bear with me. 1. I have a set of data in a file (one column only), say resistances. 2. I have a set of 5 time-dependent ODEs to solve. 3. At each time step, a specific resistance value from the aforementioned data set must be used in the solution of the ODE set e.g dydt = 4.56 * exp(resistance)/F 4. I am able to solve the ODEs using odeint using a time range from 'a' to 'b' according to the requirements of odeint, i.e., odeint(f, y0, t) 5. However, it's proving difficult to incorporate 'at each time step', the specific resistances required. I'd appreciate any help on this issue. Thanks. Ted Kord P.S: The resistance values cannot be hard-coded in. There are 5001 values each for a specific time step. From rob.clewley at gmail.com Sun Feb 15 12:36:16 2009 From: rob.clewley at gmail.com (Rob Clewley) Date: Sun, 15 Feb 2009 12:36:16 -0500 Subject: [SciPy-user] Specific Odeint Problem In-Reply-To: References: Message-ID: Ted, > 3. At each time step, a specific resistance value from the > aforementioned data set must be used in the solution of the ODE set > e.g dydt = 4.56 * exp(resistance)/F You are supplying odeint with a python function ("f") that returns the right hand side. You need to define a second, auxiliary function that takes the time t and returns the value for the resistance. Then you can call that function from within your RHS function, so resistance in the above expression for dydt will become resistance(t). t is already an argument to the RHS function f, so you can just use that. As for the auxiliary function, you could use interp1d from scipy.interpolate, which takes the time domain as an array of values defining (potentially non-uniform length) intervals, and a corresponding 1D array of values (your resistances) and will linearly interpolate between the values. 
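A minimal sketch of that approach; the file name, time grid, constant F and the single toy equation below are placeholders, not Ted's actual system:

import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d

# in practice: resist = np.loadtxt('resistances.dat')   # one column, 5001 values
resist = 0.1 + 0.01 * np.random.rand(5001)              # stand-in values
t_grid = np.linspace(0.0, 10.0, len(resist))            # times the values belong to
resistance = interp1d(t_grid, resist)                   # callable: resistance(t)

F = 96485.0                                             # placeholder constant

def f(y, t):
    # the interpolator supplies the tabulated resistance at the current time
    return 4.56 * np.exp(resistance(t)) / F

y0 = 0.0
t = np.linspace(0.0, 9.5, 191)    # kept inside the tabulated time range
sol = odeint(f, y0, t)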
I have code to do this with piecewise constant values if you *really* need them to be constant on each time interval. But I suspect you'd like these values to be smooth if possible... -Rob From teddy.kord at googlemail.com Sun Feb 15 17:30:54 2009 From: teddy.kord at googlemail.com (Ted Kord) Date: Sun, 15 Feb 2009 14:30:54 -0800 (PST) Subject: [SciPy-user] Specific Odeint Problem In-Reply-To: References: Message-ID: Thanks. Worked like a peach. Ted From eads at soe.ucsc.edu Sun Feb 15 19:22:49 2009 From: eads at soe.ucsc.edu (Damian Eads) Date: Sun, 15 Feb 2009 16:22:49 -0800 Subject: [SciPy-user] "clustergrams"/hierarchical clustering heat maps In-Reply-To: <91b4b1ab0902142218v44ceccd8r10e51eea56f03c57@mail.gmail.com> References: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> <91b4b1ab0902142218v44ceccd8r10e51eea56f03c57@mail.gmail.com> Message-ID: <91b4b1ab0902151622y2fb88f9bq653bb8f4fbfc22a0@mail.gmail.com> I would like to propose a design for the heatmap function interface. A heatmap involves two hierarchical clusterings of the same data. Let's start with the dendrogram function, some of argument names are derived from m*tlab so it is already a tested design, in a sense. def dendrogram(Z, p=30, truncate_mode=None, color_threshold=None, get_leaves=True, orientation='top', labels=None, count_sort=False, distance_sort=False, show_leaf_counts=True, no_plot=False, no_labels=False, color_list=None, leaf_font_size=None, leaf_rotation=None, leaf_label_func=None, no_leaves=False, show_contracted=False, link_color_func=None) (I know others have disagreed with the use of capital variable names in the hierarchy module. I carried them over from m*tlab when I first wrote it for backwards compatability purposes. Also, MATLAB often uses them to denote matrices, as opposed to vectors. I think denoting this distinction with capitalization is somewhat helpful, but I'm sure others disagree. I don't want to get into a flame war about this. I want to talk about heat maps.) Since Z is typically used to denote a clustering in m*tlab, and there are two clusterings, we will need two clusterings Z1 and Z2. Z1 will be along the observation dimension and Z2 along the attribute dimension. The function heatmap will take in parameters for both dendrograms, which will be suffixed x and y. It is assumed the first dendrogram will be plotted along the y axis either on the left or the right. The second one, along the x-axis. def heatmap(Zx, Zy, p1=30, p2=30, color_threshold=None, get_leaves=True, orientation='left-down', labels1=None, labels2=None, count_sortx=False, count_sorty=False, distance_sortx=False, distance_sorty=False, no_plot=False, no_labelsx=False, no_labelsy=False, color_listx=None, color_listy=None, leaf_font_sizex=None, leaf_font_sizey=None, leaf_rotationx=None, leaf_rotationy=None, leaf_label_funcx=None, leaf_label_funcy=None, no_leavesx=False, no_leavesy=False, link_color_funcx=None, link_color_funcy=None) The orientation parameter can be any of 'left-down', 'right-down', 'left-up', 'right-up'; the first direction in the string specifies whether to plot the first dendrogram to the left or the right of the heat map, and the second. All contraction-related parameters have been removed since they don't really make sense for heatmaps. All data structures returned are 2 element tuples. For example, instead of returning a single list of leaf node ids as in `dendrogram`, two are returned, one per clustering (Lx, Ly). 
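As a point of reference, a small sketch (with made-up data and an arbitrary linkage method) of how the two clusterings and the leaf orderings such a heatmap function would consume can already be produced with the existing hierarchy API:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

X = np.random.rand(20, 8)                        # 20 observations, 8 attributes
Zy = linkage(pdist(X), method='average')         # clustering over the rows
Zx = linkage(pdist(X.T), method='average')       # clustering over the columns

# leaf orderings that would permute the rows/columns of X under the dendrograms
row_order = dendrogram(Zy, no_plot=True)['leaves']
col_order = dendrogram(Zx, no_plot=True)['leaves']
X_ordered = X[row_order, :][:, col_order]        # what the heat map itself would show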
Since I don't really heatmaps myself, but I'd be willing to write such a function, I'd appreciate if the end users who want this feature can give me some feedback on their needs. Thank you. Cheers, Damian On Sat, Feb 14, 2009 at 10:18 PM, Damian Eads wrote: > Hi David, > > Sorry. I did not see your message until now. Several people have > already inquired about heatmaps. I've been meaning to eventually > implement support for them but since I don't work with microarray data > and I'm in the midst of trying to get a paper out, it has fallen onto > the back burner. As a first step, I'd need to implement support for > missing attributes since this seems to be common with microarray data. > > As far as I know, a heatmap illustrates clustering along two axes: > observation vectors and attributes. For example, suppose we're > clustering patients by their genes. There is one observation vector > for each patient, and one vector element per gene. Clustering > observation vectors is the typical case, which is used to identify > groups of similar patients. Clustering attributes (across observation > vectors) is less typical but would be used to identifying groups of > similar genes. > > The heatmap just illustrates the vectors, the color is the intensity. > When clustering along a single dimension (observation vectors), no > sorting is necessary, and a dendrogram is drawn along the vertical > axis. The i'th row is just the observation vector corresponding to the > i'th leaf node. No sorting along the attribute dimension is needed. > Along two dimensions, there is a dendrogram along the horizontal axis. > Now the attributes must be reordered so that the j'th column > corresponds to the j'th leaf node. > > This is my first time describing heat maps so I apologize if this > description is terse. Does it make some sense? > > As far as how someone implements this, it seems like it'd be pretty > simple. There is a helper function called _plot_dendrogram that takes > in a collection of raw dendrogram lines to be rendered on the plot. > First, plot the heatmap (sorting the attributes so that the columns > correspond to the ids of the leaf nodes); this can be done with > imshow. Second, for the first dendrogram, call _plot_dendrogram but > provide it with a shifting parameters so that the dendrogram lines are > rendered to the left of the image. Third, call _plot_dendrogram again, > provide a shifting parameter, but instead shift the lines downward for > the attribute clustering dendrogram. > > I want to get to this soon but no promises. Sorry. > > Cheers, > > Damian > > > On Mon, Feb 2, 2009 at 11:12 PM, David Warde-Farley wrote: >> Hi all, >> >> I was recently asked to cluster some data and I know from experience >> that people use these heat maps to look for patterns in multivariate >> data, often with a dendrogram off to the side. This involves sorting >> the rows and columns in a certain fashion, the details of which are >> somewhat fuzzy to me (and, truthfully, I'm happy with it staying that >> way for now). >> >> I notice that dendrogram plotting is available in >> scipy.cluster.hierarchy, and was wondering if the something for >> producing the associated sorted heat maps is available anywhere >> (within SciPy or otherwise). 
>> >> Many thanks, >> >> David >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > ----------------------------------------------------- > Damian Eads Ph.D. Student > Jack Baskin School of Engineering, UCSC E2-489 > 1156 High Street Machine Learning Lab > Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads > -- ----------------------------------------------------- Damian Eads Ph.D. Student Jack Baskin School of Engineering, UCSC E2-489 1156 High Street Machine Learning Lab Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads From bernardo.rocha at meduni-graz.at Mon Feb 16 01:53:40 2009 From: bernardo.rocha at meduni-graz.at (Bernardo M. Rocha) Date: Mon, 16 Feb 2009 07:53:40 +0100 Subject: [SciPy-user] intersect (matlab) Message-ID: <49990D74.4080704@meduni-graz.at> Hi Robert Kern, I have 2 matrices of dimensions npts1 x 3 and npts2 x 3, and I would like to figure out the intersection between the rows of these matrices. If you think that the matrices are lists of points with their coordinates, I want to find out the common points to both lists. That's it. In matlab you can simply do: [c,iai,ib] = intersect(A,B,'rows') But the intersect in python only works with 1D arrays. Best regards, Bernardo M. Rocha From stefan at sun.ac.za Mon Feb 16 02:59:46 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 16 Feb 2009 09:59:46 +0200 Subject: [SciPy-user] intersect (matlab) In-Reply-To: <49990D74.4080704@meduni-graz.at> References: <49990D74.4080704@meduni-graz.at> Message-ID: <9457e7c80902152359h2da44483yef32a1048c0dc9a5@mail.gmail.com> 2009/2/16 Bernardo M. Rocha : > I have 2 matrices of dimensions npts1 x 3 and npts2 x 3, and I would > like to figure out the intersection between the rows of these matrices. > If you think that the matrices are lists of points with their > coordinates, I want to find out the common points to both lists. That's > it. In matlab you can simply do: > > [c,iai,ib] = intersect(A,B,'rows') > > But the intersect in python only works with 1D arrays. Here's a place to start: # Setup some dummy data a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]) b = np.array([[10, 11, 12], [1, 4, 5], [1, 2, 3], [13, 14, 15], [1, 2, 4]]) # Calculate the indices of the intersecting rows intersection = np.logical_or.reduce(np.logical_and.reduce(a == b[:, None], axis=2)) print a[intersection] Regards St?fan From cimrman3 at ntc.zcu.cz Mon Feb 16 07:07:53 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 16 Feb 2009 13:07:53 +0100 Subject: [SciPy-user] intersect (matlab) In-Reply-To: <49990D74.4080704@meduni-graz.at> References: <49990D74.4080704@meduni-graz.at> Message-ID: <49995719.5070007@ntc.zcu.cz> Bernardo M. Rocha wrote: > Hi Robert Kern, > > I have 2 matrices of dimensions npts1 x 3 and npts2 x 3, and I would > like to figure out the intersection between the rows of these > matrices. If you think that the matrices are lists of points with > their coordinates, I want to find out the common points to both > lists. That's it. In matlab you can simply do: > > [c,iai,ib] = intersect(A,B,'rows') > > But the intersect in python only works with 1D arrays. > > Best regards, Bernardo M. Rocha These questions appear from time to time - it would be nice to add more kwarg options to all the functions in the arraysetops module, like return_index, rows (or better, axis!), etc. 
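(As a stop-gap, a rough sketch of what a rows-aware intersect returning the
m*tlab-style index vectors might look like; intersect_rows is a made-up name,
it assumes exact matches and no duplicate rows, and it builds a full
len(A) x len(B) comparison table, so it only suits modest array sizes.)

import numpy as np

def intersect_rows(A, B):
    # match[i, j] is True when row i of A equals row j of B
    match = np.logical_and.reduce(A[:, None, :] == B[None, :, :], axis=-1)
    ia, ib = np.nonzero(match)          # ia indexes rows of A, ib rows of B
    return A[ia], ia, ib

A = np.array([[1, 2, 3], [4, 5, 6], [10, 11, 12]])
B = np.array([[10, 11, 12], [1, 2, 3], [7, 8, 9]])
c, ia, ib = intersect_rows(A, B)
print c        # [[ 1  2  3] [10 11 12]]
print ia, ib   # [0 2] [1 0]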
Unfortunately, I (the culprit of arraysetops) am too swamped by work to look at it right now. But it's on my TODO list. Best regards, r. From peter.cimermancic at gmail.com Mon Feb 16 07:16:04 2009 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Mon, 16 Feb 2009 13:16:04 +0100 Subject: [SciPy-user] integrate.odeint problem Message-ID: <18d53ca60902160416y6b6b6527nc6406a1def5e1644@mail.gmail.com> Hi! Using 60 differential equations, I am trying to simulate some biological process. After running the script, I've got next message: lsoda-- at current t (=r1), mxstep (=i1) steps taken on this call before reaching tout In above message, I1 = 500 In above message, R1 = 0.857...E+01 Excess work done on this call (Perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information. After running with full_output = 1, additional information was: KeyError: 0. The same equations and parameters were run in Jarnac (simulation software) and I've got correct results, assuming equations and parameters are right. What else could go wrong to produce above error message? Thank you in advance, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Feb 16 08:47:54 2009 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 16 Feb 2009 07:47:54 -0600 Subject: [SciPy-user] integrate.odeint problem In-Reply-To: <18d53ca60902160416y6b6b6527nc6406a1def5e1644@mail.gmail.com> References: <18d53ca60902160416y6b6b6527nc6406a1def5e1644@mail.gmail.com> Message-ID: <114880320902160547x4fd15483j6585ece2f63c1ab8@mail.gmail.com> Hi Peter, Try setting the option mxstep in your call to odeint, e.g. mxstep=1000. Warren On Mon, Feb 16, 2009 at 6:16 AM, Peter Cimerman?i? < peter.cimermancic at gmail.com> wrote: > Hi! > > Using 60 differential equations, I am trying to simulate some biological > process. After running the script, I've got next message: > > lsoda-- at current t (=r1), mxstep (=i1) steps > taken on this call before reaching tout > In above message, I1 = 500 > In above message, R1 = 0.857...E+01 > Excess work done on this call (Perhaps wrong Dfun type). > Run with full_output = 1 to get quantitative information. > > After running with full_output = 1, additional information was: KeyError: > 0. > > The same equations and parameters were run in Jarnac (simulation software) > and I've got correct results, assuming equations and parameters are right. > What else could go wrong to produce above error message? > > Thank you in advance, > > Peter > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jr at sun.ac.za Mon Feb 16 08:47:52 2009 From: jr at sun.ac.za (Johann Rohwer) Date: Mon, 16 Feb 2009 15:47:52 +0200 Subject: [SciPy-user] integrate.odeint problem In-Reply-To: <18d53ca60902160416y6b6b6527nc6406a1def5e1644@mail.gmail.com> References: <18d53ca60902160416y6b6b6527nc6406a1def5e1644@mail.gmail.com> Message-ID: <200902161547.52998.jr@sun.ac.za> Hi Peter Try increasing the parameter mxstep (default is 500) in the odeint function call to a higher value (such as 1000 or 3000). We are using SciPy's odeint in our Python based systems biology simulation software PySCeS (http://pysces.sf.net) and have found quite regularly for biological models that an mxstep value of 500 is insufficient. 
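(A minimal sketch of where the mxstep keyword goes; the two-variable
right-hand side below is made up and only stands in for the real
60-equation model.)

import numpy as np
from scipy.integrate import odeint

def rhs(y, t):
    # toy system with fast and slow time scales
    return [-1000.0 * y[0] + y[1],
             1000.0 * y[0] - 2.0 * y[1]]

t = np.linspace(0.0, 10.0, 201)
y0 = [1.0, 0.0]
y, info = odeint(rhs, y0, t, mxstep=3000, full_output=True)
print info['message']      # 'Integration successful.' if all went well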
Incidentally, when PySCeS encounters this situation, it automatically increments the mxstep value and re-simulates.... In general, we have been able to get good agreement with PySCeS and Jarnac simulating the same model, so I'd be surprised if this does not fix it. Regards Johann On Monday, 16 February 2009, Peter Cimerman?i? wrote: > Hi! > > Using 60 differential equations, I am trying to simulate some > biological process. After running the script, I've got next > message: > > lsoda-- at current t (=r1), mxstep (=i1) steps > taken on this call before reaching tout > In above message, I1 = 500 > In above message, R1 = 0.857...E+01 > Excess work done on this call (Perhaps wrong Dfun type). > Run with full_output = 1 to get quantitative information. > > After running with full_output = 1, additional information was: > KeyError: 0. > > The same equations and parameters were run in Jarnac (simulation > software) and I've got correct results, assuming equations and > parameters are right. What else could go wrong to produce above > error message? > > Thank you in advance, > > Peter From josef.pktd at gmail.com Mon Feb 16 23:49:14 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 16 Feb 2009 23:49:14 -0500 Subject: [SciPy-user] improving ML estimation of distributions - example Message-ID: <1cd32cbb0902162049y56d93f7ctd2ef2e3ae3bcc894@mail.gmail.com> I was trying out some of the improvements to the maximum likelihood estimation of distribution parameters with fit: * allow some parameters to be fixed, especially location for distribution with finite upper or lower bound * get automatic "meaningful" starting values from the data instead of default (1,1,1,..) * define loglike directly if it is a shortcut, instead of np.log(pdf) I tried the changes for the gamma distribution, which has a lower bound of zero if location is at default zero: Here are some monte carlo results: * about 3 times faster for fixed location, * Mean Squared Error about half of current version for sample sizes around 100 to 200 time stats.gamma 41.375 time new gamma 13.1560001373 sample size = 200, number of iterations = 500 with fixed location with estimated location shape scale shape location scale bias [ 0.0371193 -0.10947458 -0.08416937 0.94610597 0.77749813] errvar [ 0.06138217 4.47976327 0.19587426 6.46472802 8.44292793] mse [ 0.06276001 4.49174795 0.20295874 7.35984453 9.04743127] maxabs [ 1.31184619 7.43273778 2.0578926 12.45965912 8.23351093] mad [ 0.19163682 1.65387385 0.35096994 2.13576654 2.36227371] data, rvsg, was generated with parameters [2.5, 0.0, 20] >>> stats.gamma.fit(rvsg) # unconstrained current stats version array([ 2.41650867, 1.25529111, 20.06830571]) >>> gamma.fit_fr(rvsg, 1) # unconstrained new version array([ 2.41650867, 1.25529111, 20.06830571]) >>> gamma.fit_fr(rvsg) # unconstrained new version with method of moment starting values array([ 2.41650354, 1.25536201, 20.06831229]) >>> gamma.fit_fr(rvsg, frozen=[np.nan,0.0,np.nan]) # fix location at 0 array([ 2.60144762, 19.12416175]) >>> gamma.fit_fr(rvsg, frozen=[2.5,0.0,np.nan]) # fix shape and location, estimate scale array([ 19.90018213]) >>> gamma.fit_fr(rvsg, frozen=[2.5,0.0,20]) # fix all parameters array([], dtype=float64) >>> gamma.fit_fr(rvsg, frozen=[np.nan,0.0,20]) # fix location and scale, estimate shape array([ 2.5077181]) Given the constraint estimates, we can compare the log likelihood values at constrained or unconstrained ML estimate. 
Can be used to construct likelihood ratio tests: >>> gamma.nnlf([2.5,0.0,20], rvsg) # neg. log-likelihood at true values 941.50470025907089 >>> gamma.nnlf([ 2.41650867, 1.25529111, 20.06830571], rvsg) # neg. log-likelihood at unconstrained estimates 941.30423824721584 >>> gamma.nnlf([ 2.60144762, 0.0, 19.12416175], rvsg) # neg. log-likelihood at constrained estimates 941.41052750929691 I picked the format for the frozen mask mostly because it is easy and fast to switch between reduced and full parameter vectors # select parameters that needs to be estimated: x0 = np.array(x0)[np.isnan(frmask)] # insert reduced set of estimation parameters to the full parameter vector theta = frmask.copy() theta[np.isnan(frmask)] = thetashort I appreciate comments, especially if someone has an opinion about the API of the added functionality. These changes to the fit method will be backward compatible (except when different starting value results in different estimates). Adding starting values and log-likelihood function to each individual distribution is work and will take some time, but adding partially frozen parameters to the generic fit method is relatively simple. my working file with the gamma distribution is attached. Josef -------------- next part -------------- import numpy as np from scipy import stats, special, optimize from scipy.stats import distributions class gamma_gen(distributions.rv_continuous): def _rvs(self, a): return mtrand.standard_gamma(a, self._size) def _pdf(self, x, a): return x**(a-1)*np.exp(-x)/special.gamma(a) def _loglike(self, x, a): return (a-1)*np.log(x) - x - special.gammaln(a) def _cdf(self, x, a): return special.gammainc(a, x) def _ppf(self, q, a): return special.gammaincinv(a,q) def _stats(self, a): return a, a, 2.0/np.sqrt(a), 6.0/a def _entropy(self, a): return special.psi(a)*(1-a) + 1 + special.gammaln(a) def _nnlf_(self, x, *args): # inherited version for comparison return -np.sum(np.log(self._pdf(x, *args)),axis=0) def _nnlf(self, x, *args): # overwrite ic subclass return -np.sum(self._loglike(x, *args),axis=0) def _fitstart(self, x): # method of moment estimator as starting values, not verified # with literature loc = np.min([x.min(),0]) a = 4/stats.skew(x)**2 scale = np.std(x) / np.sqrt(a) return (a, loc, scale) def nnlf_fr(self, thetash, x, frmask): # new frozen version # - sum (log pdf(x, theta),axis=0) # where theta are the parameters (including loc and scale) # try: if frmask != None: theta = frmask.copy() theta[np.isnan(frmask)] = thetash else: theta = thetash loc = theta[-2] scale = theta[-1] args = tuple(theta[:-2]) except IndexError: raise ValueError, "Not enough input arguments." if not self._argcheck(*args) or scale <= 0: return np.inf x = np.array((x-loc) / scale) cond0 = (x <= self.a) | (x >= self.b) if (np.any(cond0)): return np.inf else: N = len(x) #raise ValueError return self._nnlf(x, *args) + N*np.log(scale) def fit_fr(self, data, *args, **kwds): loc0, scale0 = map(kwds.get, ['loc', 'scale'],[0.0, 1.0]) Narg = len(args) if Narg == 0 and hasattr(self, '_fitstart'): x0 = self._fitstart(data) elif Narg > self.numargs: raise ValueError, "Too many input arguments." else: args += (1.0,)*(self.numargs-Narg) # location and scale are at the end x0 = args + (loc0, scale0) if 'frozen' in kwds: frmask = np.array(kwds['frozen']) if len(frmask) != self.numargs+2: raise ValueError, "Incorrect number of frozen arguments." 
else: # keep starting values for not frozen parameters x0 = np.array(x0)[np.isnan(frmask)] else: frmask = None #print x0 #print frmask return optimize.fmin(self.nnlf_fr, x0, args=(np.ravel(data), frmask), disp=0) gamma = gamma_gen(a=0.0,name='gamma',longname='A gamma', shapes='a',extradoc=""" Gamma distribution For a = integer, this is the Erlang distribution, and for a=1 it is the exponential distribution. gamma.pdf(x,a) = x**(a-1)*exp(-x)/gamma(a) for x >= 0, a > 0. """ ) rvsg = stats.gamma.rvs(2.5,scale=20,size=1000) print rvsg.min() print gamma.fit(rvsg) print gamma.fit_fr(rvsg, frozen=[np.nan,0.0,np.nan]) import time niter = 500 ssize = 200 t0 = time.time() result1 = [] np.random.seed(64398721) for ii in range(niter): rvsg = stats.gamma.rvs(2.5,scale=20,size=ssize) result1.append(np.hstack(stats.gamma.fit(rvsg))) t1 = time.time() result2 = [] np.random.seed(64398721) for ii in range(niter): rvsg = stats.gamma.rvs(2.5,scale=20,size=ssize) result2.append(gamma.fit_fr(rvsg, frozen=[np.nan,0.0,np.nan])) #result2.append(gamma.fit_fr(rvsg,1)) # this is equivalent to old t2 = time.time() print 'time stats.gamma', t1-t0 print 'time new gamma', t2-t1 resarr1 = np.array(result1) resarr2 = np.array(result2) resarr = np.hstack([resarr2, resarr1]) ptrue = np.array([2.5,20.0,2.5,0.0,20.0])#[:2] #ptrue = np.array([2.5,0.0,20.0,2.5,0.0,20.0]) print ' sample size = %d, number of iterations = %d' % (ssize, niter) print ' with fixed location with estimated location' print ' shape scale shape location scale' bias = np.mean((resarr - ptrue), axis=0) errvar = np.var((resarr - ptrue), axis=0) maxabs = np.max(np.abs(resarr - ptrue), axis=0) mad = np.mean(np.abs(resarr - ptrue), axis=0) mse = np.mean((resarr - ptrue)**2, axis=0) print 'bias ', bias print 'errvar', errvar print 'mse ', mse print 'maxabs', maxabs print 'mad ', mad From sahar at cmt.co.il Tue Feb 17 02:38:21 2009 From: sahar at cmt.co.il (Sahar Vilan) Date: Tue, 17 Feb 2009 09:38:21 +0200 Subject: [SciPy-user] Delete equal elements from array Message-ID: Hi, I have to exclude equal elements from an array. x1 = np.array([1, 4, 5, 5]) Is there any function to get from x1 an array with one element of each kind ( [1, 4, 5, 5] -> [1, 4, 5])? Thanks, Sahar ******************************************************************************************************* This e-mail message may contain confidential,and privileged information or data that constitute proprietary information of CMT Medical Ltd. Any review or distribution by others is strictly prohibited. If you are not the intended recipient you are hereby notified that any use of this information or data by any other person is absolutely prohibited. If you are not the intended recipient, please delete all copies. Thank You. http://www.cmt.co.il ******************************************************************************************************** ************************************************************************************ This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fredmfp at gmail.com Tue Feb 17 02:56:43 2009 From: fredmfp at gmail.com (fred) Date: Tue, 17 Feb 2009 08:56:43 +0100 Subject: [SciPy-user] Delete equal elements from array In-Reply-To: References: Message-ID: <499A6DBB.8080504@gmail.com> Sahar Vilan a ?crit : > Hi, > > I have to exclude equal elements from an array. > x1 = np.array([1, 4, 5, 5]) > > Is there any function to get from x1 an array with one element of each > kind ( [1, 4, 5, 5] -> [1, 4, 5])? np.unique(x1)? Cheers, -- Fred From faltet at pytables.org Tue Feb 17 02:57:44 2009 From: faltet at pytables.org (Francesc Alted) Date: Tue, 17 Feb 2009 08:57:44 +0100 Subject: [SciPy-user] Delete equal elements from array In-Reply-To: References: Message-ID: <200902170857.44881.faltet@pytables.org> A Tuesday 17 February 2009, Sahar Vilan escrigu?: > Hi, > > I have to exclude equal elements from an array. > x1 = np.array([1, 4, 5, 5]) > > Is there any function to get from x1 an array with one element of > each kind ( [1, 4, 5, 5] -> [1, 4, 5])? Yes. Try np.unique (or np.unique1d, depending on your needs): In [2]: x1 = np.array([1, 4, 5, 5]) In [3]: np.unique(x1) Out[3]: array([1, 4, 5]) Cheers, -- Francesc Alted From sahar at cmt.co.il Tue Feb 17 03:32:10 2009 From: sahar at cmt.co.il (Sahar Vilan) Date: Tue, 17 Feb 2009 10:32:10 +0200 Subject: [SciPy-user] Delete equal elements from array In-Reply-To: <499A6DBB.8080504@gmail.com> Message-ID: Thanks -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org]On Behalf Of fred Sent: Tue, February 17, 2009 9:57 AM To: SciPy Users List Subject: Re: [SciPy-user] Delete equal elements from array Sahar Vilan a ?crit : > Hi, > > I have to exclude equal elements from an array. > x1 = np.array([1, 4, 5, 5]) > > Is there any function to get from x1 an array with one element of each > kind ( [1, 4, 5, 5] -> [1, 4, 5])? np.unique(x1)? Cheers, -- Fred _______________________________________________ SciPy-user mailing list SciPy-user at scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user **************************************************************************** ******** This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. **************************************************************************** ******** No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.0.237 / Virus Database: 270.10.25/1956 - Release Date: 02/16/09 18:31:00 ******************************************************************************************************* This e-mail message may contain confidential,and privileged information or data that constitute proprietary information of CMT Medical Ltd. Any review or distribution by others is strictly prohibited. If you are not the intended recipient you are hereby notified that any use of this information or data by any other person is absolutely prohibited. If you are not the intended recipient, please delete all copies. Thank You. http://www.cmt.co.il ******************************************************************************************************** ************************************************************************************ This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. 
************************************************************************************ From sahar at cmt.co.il Tue Feb 17 03:32:22 2009 From: sahar at cmt.co.il (Sahar Vilan) Date: Tue, 17 Feb 2009 10:32:22 +0200 Subject: [SciPy-user] Delete equal elements from array In-Reply-To: <200902170857.44881.faltet@pytables.org> Message-ID: Thanks -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org]On Behalf Of Francesc Alted Sent: Tue, February 17, 2009 9:58 AM To: SciPy Users List Subject: Re: [SciPy-user] Delete equal elements from array A Tuesday 17 February 2009, Sahar Vilan escrigu?: > Hi, > > I have to exclude equal elements from an array. > x1 = np.array([1, 4, 5, 5]) > > Is there any function to get from x1 an array with one element of > each kind ( [1, 4, 5, 5] -> [1, 4, 5])? Yes. Try np.unique (or np.unique1d, depending on your needs): In [2]: x1 = np.array([1, 4, 5, 5]) In [3]: np.unique(x1) Out[3]: array([1, 4, 5]) Cheers, -- Francesc Alted _______________________________________________ SciPy-user mailing list SciPy-user at scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user ************************************************************************************ This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. ************************************************************************************ No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.0.237 / Virus Database: 270.10.25/1956 - Release Date: 02/16/09 18:31:00 ******************************************************************************************************* This e-mail message may contain confidential,and privileged information or data that constitute proprietary information of CMT Medical Ltd. Any review or distribution by others is strictly prohibited. If you are not the intended recipient you are hereby notified that any use of this information or data by any other person is absolutely prohibited. If you are not the intended recipient, please delete all copies. Thank You. http://www.cmt.co.il ******************************************************************************************************** ************************************************************************************ This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. ************************************************************************************ From guilherme at gpfreitas.com Tue Feb 17 04:23:20 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Tue, 17 Feb 2009 01:23:20 -0800 Subject: [SciPy-user] Simple plot problem Message-ID: Hi everybody, I'm running Python 2.5.2 with EPD with Py2.5 4.0.30002 I need to plot three simpe things, in 3d, domain is x, y in [0,3] x [0,3: 1. the graph of the function f(x,y) = x**(0.3) * y**(0.7) 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) - I thought of matplotlib, mlab (from mayavi2), SymPy and PyX. - Apparently matplotlib cannot do 3d plots. Is this correct? My version of matplotlib is 0.98. - I could not plot 2 (the parametric line) with mlab. I am not sure I understood the syntax of mlab.plot3d, and the examples everywhere are too complicated for me to understand how it works. 
I tried "mlab.plot3d([0, 1], [2, 0], [0, 0])" and it gave me a short line, very small, that was in the interior of the positive quadrant of the xy plane. Imagine the line I wanted and shrink it 80% to its midpoint, that's what I got. - I tried SymPy. I could plot everything easily. However, I could not change thickness and color of curves. Didn't find anything in the documentation. Any ideas? By the way, setting colors of objects in both SymPy and Mlab seems to complicated for simple use. How do I make the object red, for example? - PyX apparently has no 3d plotting capabilities. Is this correct? I could not find an example of a simple plot of a function f : R^2 -> R in any python tool, with explanations of how to change thickness, color, style and labels of plots. Ideally, labels should be rendered with TeX. There were all sorts of complicated examples, but nothing simple like what I needed. And no example simple enough that I could really understand what was going on for all the commands I needed. I will gladly write a short tutorial about it if there is a way to do this and I find out the solution. Now I will try GnuPlot, and if that does not work, Maple. But I'd like to do it with Python. Any help is appreciated. Thanks, -- Guilherme P. de Freitas http://www.gpfreitas.com From gael.varoquaux at normalesup.org Tue Feb 17 04:46:11 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 10:46:11 +0100 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: Message-ID: <20090217094611.GC17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 01:23:20AM -0800, Guilherme P. de Freitas wrote: > I need to plot three simpe things, in 3d, domain is x, y in [0,3] x > [0,3: > 1. the graph of the function f(x,y) = x**(0.3) * y**(0.7) > 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] > 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) Here is some code that probably does what you want. Use the keyword arguments of the plotting functions to change the different properties (such as linewidth) of the objects created): ################################################################################ import numpy as np from enthought.mayavi import mlab x, y = np.mgrid[0:3:100j, 0:3:100j] def f(x, y): return x**(0.3) * y**(0.7) mlab.surf(x, y, f) x = np.linspace(0, 1, 100) mlab.plot3d(x, 2-2*x, np.zeros_like(x)) mlab.plot3d(x, 2 - 2*x, f(x, 2 - 2*x)) mlab.show() ################################################################################ I would interested in figuring out what posed problem in the documentation, and how things could be improved. The trick is to create arrays to evaluate the functions on: x and y. For a surface, you need to create a 2D array, in other words a grid of x and y varying in the 2 directions. This is what the mgrid function does. Maybe a note about mgrid in the documentation relative to plotting surface could help? HTH, Ga?l From guilherme at gpfreitas.com Tue Feb 17 05:00:06 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Tue, 17 Feb 2009 02:00:06 -0800 Subject: [SciPy-user] Simple plot problem In-Reply-To: <20090217094611.GC17638@phare.normalesup.org> References: <20090217094611.GC17638@phare.normalesup.org> Message-ID: Hi Gael, Thanks for the quick reply! So, I tried your code, and I got the same problem: the parametric curves are "shrinked". I don't know if it's a bug in my system, but the parametric curves are just not right. 
You can see a picture here: http://archive.gpfreitas.com/misc/snapshot.png I had achieved this before. As I said, the problem is with the parametric curves, not with the f function. Thanks! Guilherme On Tue, Feb 17, 2009 at 1:46 AM, Gael Varoquaux wrote: > On Tue, Feb 17, 2009 at 01:23:20AM -0800, Guilherme P. de Freitas wrote: >> I need to plot three simpe things, in 3d, domain is x, y in [0,3] x >> [0,3: > >> 1. the graph of the function f(x,y) = x**(0.3) * y**(0.7) >> 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] >> 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) > > Here is some code that probably does what you want. Use the keyword > arguments of the plotting functions to change the different properties > (such as linewidth) of the objects created): > > ################################################################################ > import numpy as np > from enthought.mayavi import mlab > > x, y = np.mgrid[0:3:100j, 0:3:100j] > def f(x, y): > return x**(0.3) * y**(0.7) > mlab.surf(x, y, f) > > x = np.linspace(0, 1, 100) > mlab.plot3d(x, 2-2*x, np.zeros_like(x)) > mlab.plot3d(x, 2 - 2*x, f(x, 2 - 2*x)) > mlab.show() > ################################################################################ > > I would interested in figuring out what posed problem in the > documentation, and how things could be improved. The trick is to create > arrays to evaluate the functions on: x and y. For a surface, you need to > create a 2D array, in other words a grid of x and y varying in the 2 > directions. This is what the mgrid function does. Maybe a note about > mgrid in the documentation relative to plotting surface could help? > > HTH, > > Ga?l > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -- Guilherme P. de Freitas http://www.gpfreitas.com From Ralf_Ahlbrink at web.de Tue Feb 17 04:38:58 2009 From: Ralf_Ahlbrink at web.de (Ralf Ahlbrink) Date: Tue, 17 Feb 2009 10:38:58 +0100 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: Message-ID: <200902171039.29112.Ralf_Ahlbrink@web.de> Am Dienstag, 17. Februar 2009 schrieb Guilherme P. de Freitas: > Hi everybody, > > I'm running Python 2.5.2 with EPD with Py2.5 4.0.30002 > > I need to plot three simpe things, in 3d, domain is x, y in [0,3] x > [0,3: > > 1. the graph of the function f(x,y) = x**(0.3) * y**(0.7) > 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] > 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) > > > - I thought of matplotlib, mlab (from mayavi2), SymPy and PyX. > > - Apparently matplotlib cannot do 3d plots. Is this correct? My > version of matplotlib is 0.98. > > - I could not plot 2 (the parametric line) with mlab. I am not sure > I understood the syntax of mlab.plot3d, and the examples > everywhere are too complicated for me to understand how it works. > I tried "mlab.plot3d([0, 1], [2, 0], [0, 0])" and it gave me a > short line, very small, that was in the interior of the positive > quadrant of the xy plane. Imagine the line I wanted and shrink it > 80% to its midpoint, that's what I got. > > - I tried SymPy. I could plot everything easily. However, I could > not change thickness and color of curves. Didn't find anything in > the documentation. Any ideas? By the way, setting colors of > objects in both SymPy and Mlab seems to complicated for simple > use. How do I make the object red, for example? > > - PyX apparently has no 3d plotting capabilities. Is this correct? 
> > > I could not find an example of a simple plot of a function f : R^2 > -> R in any python tool, with explanations of how to change > thickness, color, style and labels of plots. Ideally, labels should > be rendered with TeX. There were all sorts of complicated examples, > but nothing simple like what I needed. And no example simple enough > that I could really understand what was going on for all the > commands I needed. I will gladly write a short tutorial about it if > there is a way to do this and I find out the solution. > > Now I will try GnuPlot, and if that does not work, Maple. But I'd > like to do it with Python. Any help is appreciated. > > Thanks, Hi! You could use Mayavi (see Enthought site) for interactive 3D plotting, e.g. start (recent version of) ipython: $ ipython -pylab -wthread and in this session: In [1]: from enthought.mayavi import mlab See http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html#simple- scripting-with-mlab for examples. Regards, Ralf. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guilherme at gpfreitas.com Tue Feb 17 05:10:29 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Tue, 17 Feb 2009 02:10:29 -0800 Subject: [SciPy-user] Simple plot problem In-Reply-To: <200902171039.29112.Ralf_Ahlbrink@web.de> References: <200902171039.29112.Ralf_Ahlbrink@web.de> Message-ID: Hi, Ralf > You could use Mayavi (see Enthought site) for interactive 3D plotting, e.g. > start (recent version of) ipython: > > $ ipython -pylab -wthread > > and in this session: > > In [1]: from enthought.mayavi import mlab > > See > http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html#simple-scripting-with-mlab > for examples. Yes, I tried this. Unfortunately, this does not solve my problem. The code I had, and the code Gael sent me give the following result: http://archive.gpfreitas.com/misc/snapshot.png See that tiny closed curve? The top part of it should be in the graph of the function (in wireframe style) and the bottom line should touch the x and the y axis. I couldn't understand why this is the case, and the examples did not help in understanding that. -- Guilherme P. de Freitas http://www.gpfreitas.com From guilherme at gpfreitas.com Tue Feb 17 05:56:08 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Tue, 17 Feb 2009 02:56:08 -0800 Subject: [SciPy-user] Simple plot problem In-Reply-To: <20090217094611.GC17638@phare.normalesup.org> References: <20090217094611.GC17638@phare.normalesup.org> Message-ID: Hi again, Gael. I forgot to comment on the documentation. > I would interested in figuring out what posed problem in the > documentation, and how things could be improved. The trick is to create > arrays to evaluate the functions on: x and y. For a surface, you need to > create a 2D array, in other words a grid of x and y varying in the 2 > directions. This is what the mgrid function does. Maybe a note about > mgrid in the documentation relative to plotting surface could help? The use of mgrid was not the puzzle. The puzzle was why I got the shrinked object in the plot3d. As for the documentation, it would be nice in the documentation of the plot3d function to have specified what is a valid input. As far as I know, here's the official documentation of the mlab.plot3d function: http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#plot3d And it just says "Draws lines between points". 
I think something like "x, y and z must be arrays of the same shape" would help a lot. I figured that out by trial and error (the error messages are informative), but it took a while, and given that I had a problem, I thought I was doing something wrong (I'm still not sure, because I still have the "shrinking" problem), and the documentation did not help in figuring out what was wrong. In part the documentation did not help be because a specification of valid input was not available, but also in part because there isn't a single *simple* example of Mlab worked out. I suppose plotting simple (x,y) |-> z function is probably not the intended use of Mlab, but given the lack of other 3d plotting alternatives in Python, it can be very useful. -- Guilherme P. de Freitas http://www.gpfreitas.com From mjakubik at ta3.sk Tue Feb 17 05:28:20 2009 From: mjakubik at ta3.sk (Marian Jakubik) Date: Tue, 17 Feb 2009 11:28:20 +0100 Subject: [SciPy-user] Simple plot problem - extended In-Reply-To: References: Message-ID: <20090217112820.609ff403@jakubik.ta3.sk> Hi Gael, the code you create "for" Ralph gives this error: *** (python:29796): Gtk-CRITICAL **: gtk_widget_set_colormap: assertion `!GTK_WIDGET_REALIZED (widget)' failed Traceback (most recent call last): File "my.py", line 12, in mlab.show() AttributeError: 'module' object has no attribute 'show' Segmentation fault *** I was looking for the solution through Google and I found one thread in [Enthought-dev] mailing list from September 2008 dealing with the problem that mlab (from enthought.mayavi import mlab) object has no attribute 'show'. But there was no solution :( I apologize (especially to Ralph) for mixing the subjects of the discussion... Thanks for your useful comments... Best, Marian From gael.varoquaux at normalesup.org Tue Feb 17 06:33:09 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 12:33:09 +0100 Subject: [SciPy-user] Simple plot problem - extended In-Reply-To: <20090217112820.609ff403@jakubik.ta3.sk> References: <20090217112820.609ff403@jakubik.ta3.sk> Message-ID: <20090217113309.GE17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 11:28:20AM +0100, Marian Jakubik wrote: > Hi Gael, > the code you create "for" Ralph gives this error: > *** > (python:29796): Gtk-CRITICAL **: gtk_widget_set_colormap: assertion > `!GTK_WIDGET_REALIZED (widget)' failed Traceback (most recent call > last): File "my.py", line 12, in > mlab.show() > AttributeError: 'module' object has no attribute 'show' > Segmentation fault > *** What is your version of mayavi2? I believe you have a version older than 3.0. You should really update to 3.0 or later, see http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/installation.html on how to upgrade. In particular, if you are using Ubuntu or Debian, there are specific instructions on how to get Debian packages. Ga?l From jbednar at inf.ed.ac.uk Tue Feb 17 05:51:19 2009 From: jbednar at inf.ed.ac.uk (James A. Bednar) Date: Tue, 17 Feb 2009 10:51:19 +0000 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: Message-ID: <18842.38567.588551.585016@cortex.inf.ed.ac.uk> | Date: Tue, 17 Feb 2009 01:23:20 -0800 | From: "Guilherme P. de Freitas" | Subject: [SciPy-user] Simple plot problem | To: SciPy Users List | Message-ID: | Content-Type: text/plain; charset=ISO-8859-1 | | Hi everybody, | | I'm running Python 2.5.2 with EPD with Py2.5 4.0.30002 | | I need to plot three simpe things, in 3d, domain is x, y in [0,3] x | [0,3: | | 1. 
the graph of the function f(x,y) = x**(0.3) * y**(0.7) | 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] | 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) | | | - I thought of matplotlib, mlab (from mayavi2), SymPy and PyX. | | - Apparently matplotlib cannot do 3d plots. Is this correct? My | version of matplotlib is 0.98. | | - I could not plot 2 (the parametric line) with mlab. I am not sure | I understood the syntax of mlab.plot3d, and the examples | everywhere are too complicated for me to understand how it works. | I tried "mlab.plot3d([0, 1], [2, 0], [0, 0])" and it gave me a | short line, very small, that was in the interior of the positive | quadrant of the xy plane. Imagine the line I wanted and shrink it | 80% to its midpoint, that's what I got. | | - I tried SymPy. I could plot everything easily. However, I could | not change thickness and color of curves. Didn't find anything in | the documentation. Any ideas? By the way, setting colors of | objects in both SymPy and Mlab seems to complicated for simple | use. How do I make the object red, for example? | | - PyX apparently has no 3d plotting capabilities. Is this correct? | | | I could not find an example of a simple plot of a function f : R^2 | -> R in any python tool, with explanations of how to change | thickness, color, style and labels of plots. Ideally, labels should | be rendered with TeX. There were all sorts of complicated examples, | but nothing simple like what I needed. And no example simple enough | that I could really understand what was going on for all the | commands I needed. I will gladly write a short tutorial about it if | there is a way to do this and I find out the solution. | | Now I will try GnuPlot, and if that does not work, Maple. But I'd | like to do it with Python. Any help is appreciated. I use matplotlib-0.91.4, and I can do nice 3D wireframe plots using the code below. However, I don't think that works in newer versions, so I have stopped upgrading. The matplotlib docs tell how to change colors, etc. Jim _______________________________________________________________________________ import pylab from numpy import outer,arange,cos,sin,ones,zeros,array from matplotlib import axes3d def matrixplot3d(mat,title=None): fig = pylab.figure() ax = axes3d.Axes3D(fig) # Construct matrices for r and c values rn,cn = mat.shape c = outer(ones(rn),arange(cn*1.0)) r = outer(arange(rn*1.0),ones(cn)) ax.plot_wireframe(r,c,mat) ax.set_xlabel('R') ax.set_ylabel('C') ax.set_zlabel('Value') if title: windowtitle(title) pylab.show() matrixplot3d(array([[0.1,0.5,0.9],[0.2,0.1,0.0]])) -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From gael.varoquaux at normalesup.org Tue Feb 17 06:38:52 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 12:38:52 +0100 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: <20090217094611.GC17638@phare.normalesup.org> Message-ID: <20090217113852.GF17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 02:56:08AM -0800, Guilherme P. de Freitas wrote: > In part the documentation did not help be because a specification of > valid input was not available, but also in part because there isn't > a single *simple* example of Mlab worked out. I suppose plotting > simple (x,y) |-> z function is probably not the intended use of > Mlab, but given the lack of other 3d plotting alternatives in > Python, it can be very useful. 
The plotting of a simple (x, y) -> function is definitely part of the intended use of mlab. I don't understand why you say that there isn't a single simple example, isn't the following acceptable: http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#enthought.mayavi.mlab.surf ? Did you not find this example. In which case we must make it clearer. Cheers, Ga?l From mjakubik at ta3.sk Tue Feb 17 06:42:59 2009 From: mjakubik at ta3.sk (Marian Jakubik) Date: Tue, 17 Feb 2009 12:42:59 +0100 Subject: [SciPy-user] Simple plot problem - extended In-Reply-To: <20090217113309.GE17638@phare.normalesup.org> References: <20090217112820.609ff403@jakubik.ta3.sk> <20090217113309.GE17638@phare.normalesup.org> Message-ID: <20090217124259.3bc24589@jakubik.ta3.sk> Hi, my version of mayavi2 is 2.2.0 from official Ubuntu repositories. I see I have to update to newer version... Thanks for your reply... Marian D?a Tue, 17 Feb 2009 12:33:09 +0100 Gael Varoquaux nap?sal: > On Tue, Feb 17, 2009 at 11:28:20AM +0100, Marian Jakubik wrote: > > Hi Gael, > > > the code you create "for" Ralph gives this error: > > > *** > > > (python:29796): Gtk-CRITICAL **: gtk_widget_set_colormap: assertion > > `!GTK_WIDGET_REALIZED (widget)' failed Traceback (most recent call > > last): File "my.py", line 12, in > > mlab.show() > > AttributeError: 'module' object has no attribute 'show' > > Segmentation fault > > > *** > > What is your version of mayavi2? I believe you have a version older than > 3.0. You should really update to 3.0 or later, see > http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/installation.html > on how to upgrade. In particular, if you are using Ubuntu or Debian, > there are specific instructions on how to get Debian packages. > > Ga?l > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From gael.varoquaux at normalesup.org Tue Feb 17 06:53:41 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 12:53:41 +0100 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: <20090217094611.GC17638@phare.normalesup.org> Message-ID: <20090217115341.GG17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 02:56:08AM -0800, Guilherme P. de Freitas wrote: > And it just says "Draws lines between points". I think something > like "x, y and z must be arrays of the same shape" would help a lot. > I figured that out by trial and error (the error messages are > informative) Thanks for the feedback. I changed the docs to be a bit more informative. I will indeed need to go over them again to be sure the input arguments are clear. Ga?l From gael.varoquaux at normalesup.org Tue Feb 17 06:55:07 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 12:55:07 +0100 Subject: [SciPy-user] Simple plot problem - extended In-Reply-To: <20090217124259.3bc24589@jakubik.ta3.sk> References: <20090217112820.609ff403@jakubik.ta3.sk> <20090217113309.GE17638@phare.normalesup.org> <20090217124259.3bc24589@jakubik.ta3.sk> Message-ID: <20090217115507.GH17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 12:42:59PM +0100, Marian Jakubik wrote: > my version of mayavi2 is 2.2.0 from official Ubuntu repositories. > I see I have to update to newer version... That's what I guessed. The next version of Ubuntu will have the latest Mayavi2, thanks to the excellent work of the packagers. 
Due to rapid last summer, Ubuntu got a bit out of sync for this release. Cheers, Ga?l From gael.varoquaux at normalesup.org Tue Feb 17 06:57:43 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 12:57:43 +0100 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: <20090217094611.GC17638@phare.normalesup.org> Message-ID: <20090217115743.GI17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 02:00:06AM -0800, Guilherme P. de Freitas wrote: > Hi Gael, > Thanks for the quick reply! > So, I tried your code, and I got the same problem: the parametric > curves are "shrinked". I don't know if it's a bug in my system, but > the parametric curves are just not right. You can see a picture here: > http://archive.gpfreitas.com/misc/snapshot.png I am not sure why you are getting this. This might be because of a slight miss-feature that we corrected in a recent version of Mayavi2. What is your version of Mayavi2. I will try to get my hands on a computer where an older version of Mayavi2 to try things out. Also, did you use the exact same code as the one I sent you? Having the exact code will help me diagnose the problem. Cheers, Ga?l From guilherme at gpfreitas.com Tue Feb 17 06:58:15 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Tue, 17 Feb 2009 03:58:15 -0800 Subject: [SciPy-user] Simple plot problem In-Reply-To: <20090217113852.GF17638@phare.normalesup.org> References: <20090217094611.GC17638@phare.normalesup.org> <20090217113852.GF17638@phare.normalesup.org> Message-ID: > The plotting of a simple (x, y) -> function is definitely part of the > intended use of mlab. I don't understand why you say that there isn't a > single simple example, isn't the following acceptable: > http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#enthought.mayavi.mlab.surf ? > > Did you not find this example. In which case we must make it clearer. Sorry, I didn't express myself correctly. There should be a simple example for each function. That is just the mlab.surf. For example, in my case, I get the shrinked object in the mlab.plot3d function. As there was no explanation of what was a valid input and the example in the section was too complicated for me, I did not know if I had the wrong input or if it is a problem in the software. But, still, back to the original problem. Did you see the problem with that code? I mean, test the code you sent me on the first email. Don't you get a "shrinked" version of the parametric curve like in this link: http://archive.gpfreitas.com/misc/snapshot.png This is what I get with your code. And I got this before too, trying different approaches. I don't know why it is giving me this wrong behavior. Again, what I need is: 1. the graph of the function f(x,y) = x**(0.3) * y**(0.7), for x and y in [0,3] 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) for x in [0,1] The code you sent me in the first email does not correctly plot the last two objects. Objects 2 and 3 (which form a closed curve) are "shrinked" (see link above). Thanks! -- Guilherme P. de Freitas http://www.gpfreitas.com From guilherme at gpfreitas.com Tue Feb 17 07:05:25 2009 From: guilherme at gpfreitas.com (Guilherme P. 
de Freitas) Date: Tue, 17 Feb 2009 04:05:25 -0800 Subject: [SciPy-user] Simple plot problem In-Reply-To: <20090217115743.GI17638@phare.normalesup.org> References: <20090217094611.GC17638@phare.normalesup.org> <20090217115743.GI17638@phare.normalesup.org> Message-ID: >> So, I tried your code, and I got the same problem: the parametric >> curves are "shrinked". I don't know if it's a bug in my system, but >> the parametric curves are just not right. You can see a picture here: > >> http://archive.gpfreitas.com/misc/snapshot.png > > I am not sure why you are getting this. This might be because of a slight > miss-feature that we corrected in a recent version of Mayavi2. What is > your version of Mayavi2. I will try to get my hands on a computer where > an older version of Mayavi2 to try things out. > > Also, did you use the exact same code as the one I sent you? Having the > exact code will help me diagnose the problem. > > Cheers, > > Ga?l Hi again. My version of Mayavi is 3.0.3. I tried your code, exactly how you sent, and I got the same shrinked object. I wanted to add a "representation='wireframe'" to the plot to see more clearly, but for some reason it did not work in your code (I still don't understand this very well). I decided to put your code for the parametric curves in my previous code (that had the function in wireframe style), and rotated it and took a snapshot. That's what you sent. The exact code that generates that picture in the link above is: ###################################################################### #!/usr/env python import numpy as np from enthought.mayavi import mlab x, y = np.mgrid[0.0:3.0:0.1, 0.0:3.0:0.1] def f(x,y): return x**(0.3) * y**(0.7) utility = mlab.surf(x, y, f, representation='wireframe') x = np.linspace(0, 1, 100) mlab.plot3d(x, 2-2*x, np.zeros_like(x)) mlab.plot3d(x, 2 - 2*x, f(x, 2 - 2*x)) mlab.axes(utility) ###################################################################### -- Guilherme P. de Freitas http://www.gpfreitas.com From aisaac at american.edu Tue Feb 17 07:39:57 2009 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 17 Feb 2009 07:39:57 -0500 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: Message-ID: <499AB01D.8080803@american.edu> On 2/17/2009 4:23 AM Guilherme P. de Freitas apparently wrote: > I need to plot three simpe things, in 3d, domain is x, y in [0,3] x > [0,3: > > 1. the graph of the function f(x,y) = x**(0.3) * y**(0.7) > 2. the parametric straight line (x, 2 - 2*x, 0) for x in [0,1] > 3. the parametric curve (x, 2 - 2*x, f(x, 2 - 2*x)) 1. Your parametric function specifications do not enforce your original domain restriction. 2. Gnuplot.py and PyX can also do 3d static graphs; both are light-weight. Alan Isaac From gael.varoquaux at normalesup.org Tue Feb 17 08:02:24 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Feb 2009 14:02:24 +0100 Subject: [SciPy-user] Simple plot problem In-Reply-To: References: <20090217094611.GC17638@phare.normalesup.org> <20090217115743.GI17638@phare.normalesup.org> Message-ID: <20090217130224.GJ17638@phare.normalesup.org> On Tue, Feb 17, 2009 at 04:05:25AM -0800, Guilherme P. de Freitas wrote: > My version of Mayavi is 3.0.3. OK, that might be in the default scaling of surf that has changed between the 3.0.3 and the 3.1.0 (I can't get my hands on an install of 3.0.3, and I don't want to downgrade the boxes at work). 
If you don't want to upgrade the version of mayavi, you can try to force the extent, as in the following code: ################################################################################ import numpy as np from enthought.mayavi import mlab x, y = np.mgrid[0:3:100j, 0:3:100j] def f(x, y): return x**(0.3) * y**(0.7) mlab.surf(x, y, f, extent=[0, 3, 0, 3, 0, f(x, y).max()]) x = np.linspace(0, 1, 100) mlab.plot3d(x, 2-2*x, np.zeros_like(x)) mlab.plot3d(x, 2 - 2*x, f(x, 2 - 2*x)) mlab.show() ################################################################################ Do note that the docs on the web are for the 3.1.0 version. There are some small differences, and you seem to have hit a version where we made a change in auto-scaling based on user experience. Please tell me if the above code works for you. Cheers, Ga?l From guilherme at gpfreitas.com Tue Feb 17 08:31:32 2009 From: guilherme at gpfreitas.com (Guilherme P. de Freitas) Date: Tue, 17 Feb 2009 05:31:32 -0800 Subject: [SciPy-user] Simple plot problem In-Reply-To: <20090217130224.GJ17638@phare.normalesup.org> References: <20090217094611.GC17638@phare.normalesup.org> <20090217115743.GI17638@phare.normalesup.org> <20090217130224.GJ17638@phare.normalesup.org> Message-ID: On Tue, Feb 17, 2009 at 5:02 AM, Gael Varoquaux wrote: > Do note that the docs on the web are for the 3.1.0 version. There are > some small differences, and you seem to have hit a version where we made > a change in auto-scaling based on user experience. I'll try to upgrade as soon as possible. Actually, I don't even know how to do this, I try to just install the newest EPD binaries on a regular basis. > Please tell me if the above code works for you. Works like a charm! Thanks! Guilherme -- Guilherme P. de Freitas http://www.gpfreitas.com From H.Zahiri at curtin.edu.au Tue Feb 17 09:29:19 2009 From: H.Zahiri at curtin.edu.au (Hani Zahiri) Date: Tue, 17 Feb 2009 23:29:19 +0900 Subject: [SciPy-user] What is the equivalent of MATLAB's "interp2(Z, ntimes)" in Scipy's "interp2d"? Message-ID: <82200558F6DE2C479D381D3000D1551C01C79443@EXMSK1.staff.ad.curtin.edu.au> Hi Folks, Would you help me to write this line from MATLAB in Python: interp2(arr,3) where 'arr' is an array with shape (2,270). I tried to use "scipy.interpolate.inetrp2d()" but it doesn't work since there is no option to do interleaving interpolates between every elements same as what MATLAB does using "interp2(Z, ntimes)" I'll be happy if you help me with this Cheers, Hani -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dharhas.Pothina at twdb.state.tx.us Tue Feb 17 10:48:28 2009 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Tue, 17 Feb 2009 09:48:28 -0600 Subject: [SciPy-user] Approximate volume of an irregular triangular mesh. Message-ID: <499A87E8.63BA.009B.0@twdb.state.tx.us> Hi, I have a unstructured mesh in the form of triangular prisms. ie a irregular triangular mesh in 2D extruded in z to form prisms. The mesh is in the format: Node# XCoord YCoord Depth ... ... Element# Node1 Node2 Node3 ... ... I want to calculate the approximate volume of this mesh. The brute force way is to cycle through each triangular element and calculate the area of each triangle and multiply it by the average depth of the three nodes of the element. I was wondering if there was a simpler way maybe by just using the surface defined by the nodes and ignoring the connectivity. 
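(For reference, the brute-force sum described above vectorizes to a few numpy
lines; the sketch assumes nodes is an (nnodes, 3) array of x, y, depth and
elems an (nelems, 3) array of zero-based node indices -- subtract 1 first if
the file numbers nodes from 1.)

import numpy as np

def mesh_volume(nodes, elems):
    # corner coordinates of every triangle
    p1 = nodes[elems[:, 0], :2]
    p2 = nodes[elems[:, 1], :2]
    p3 = nodes[elems[:, 2], :2]
    # triangle areas from the cross product of two edge vectors
    areas = 0.5 * np.abs((p2[:, 0] - p1[:, 0]) * (p3[:, 1] - p1[:, 1])
                         - (p3[:, 0] - p1[:, 0]) * (p2[:, 1] - p1[:, 1]))
    # average depth of the three nodes of each element
    depths = nodes[elems, 2].mean(axis=1)
    return np.sum(areas * depths)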
thanks, - dharhas From rmay31 at gmail.com Tue Feb 17 12:43:04 2009 From: rmay31 at gmail.com (Ryan May) Date: Tue, 17 Feb 2009 11:43:04 -0600 Subject: [SciPy-user] odeint for calculating trajectories In-Reply-To: References: Message-ID: On Thu, Feb 12, 2009 at 12:16 PM, Rob Clewley wrote: > > Is there a good way to use scipy.integrate.odeint to calculate > trajectories > > from an observed velocity field? I know you can do this when you have an > > analytic expression for dx/dt, but in this case I have a spatial grid of > > values for dx/dt. The only way I've come up with is to make the function > > passed to odeint something that will interpolate fromt the grid to the > given > > point. > > > I don't think odeint is the right tool for this job - there is no ODE > integration to do if you do not have an explicit function for the > vector field. You should think of it purely as an interpolation > problem. You have (t,x) values and (t, dx/dt) values, so this defines > a piecewise quadratic function which has continuous *second* > derivative everywhere (i.e. the trajectory smoothly agrees at your > mesh points). I would use the polynomial interpolation classes that > were recently added to scipy by Anne Archibald (search this list for > details about it). You pass it your arrays of values and you get back > a function that smoothly interpolates through your points. This is the > most accurate trajectory that you can derive from this finite mesh > vector-field. > I understand the idea of the curve fitting. But I'm having trouble seeing how to take the krogh_interpolator in scipy and apply it to a 2, or better yet, 3 dimensional problem. Any pointers? Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From wnbell at gmail.com Tue Feb 17 13:59:41 2009 From: wnbell at gmail.com (Nathan Bell) Date: Tue, 17 Feb 2009 13:59:41 -0500 Subject: [SciPy-user] Approximate volume of an irregular triangular mesh. In-Reply-To: <499A87E8.63BA.009B.0@twdb.state.tx.us> References: <499A87E8.63BA.009B.0@twdb.state.tx.us> Message-ID: On Tue, Feb 17, 2009 at 10:48 AM, Dharhas Pothina wrote: > > I want to calculate the approximate volume of this mesh. The brute force way > is to cycle through each triangular element and calculate the area of each > triangle and multiply it by the average depth of the three nodes of the element. > I was wondering if there was a simpler way maybe by just using the surface > defined by the nodes and ignoring the connectivity. > Look at "Subject 2.01: How do I find the area of a polygon?" here: http://www.faqs.org/faqs/graphics/algorithms-faq/ This requires that the edges are consistently oriented (e.g. counter clockwise around the perimeter) Interestingly, you can apply a similar trick in higher dimensions, and even compute things like the center of mass and inertia tensor. http://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf Ultimately, it boils down to the fact that you can replace volume integrals with a surface integrals using the divergence theorem. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From josef.pktd at gmail.com Tue Feb 17 15:07:07 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Feb 2009 15:07:07 -0500 Subject: [SciPy-user] What is the equivalent of MATLAB's "interp2(Z, ntimes)" in Scipy's "interp2d"? 
In-Reply-To: <82200558F6DE2C479D381D3000D1551C01C79443@EXMSK1.staff.ad.curtin.edu.au> References: <82200558F6DE2C479D381D3000D1551C01C79443@EXMSK1.staff.ad.curtin.edu.au> Message-ID: <1cd32cbb0902171207i7bcc7439te7c92a7b29355091@mail.gmail.com> On Tue, Feb 17, 2009 at 9:29 AM, Hani Zahiri wrote: > Hi Folks, > > Would you help me to write this line from MATLAB in Python: > > interp2(arr,3) > > where 'arr' is an array with shape (2,270). > > I tried to use "scipy.interpolate.inetrp2d()" but it doesn't work since > there is no option to do interleaving interpolates between every elements > same as what MATLAB does using "interp2(Z, ntimes)" > > I'll be happy if you help me with this > > Cheers, > > Hani > I'm not completely sure what the matlab function does, but this may be the same for interleaving interpolation: (I didn't see a direct solution with scipy.interpolate) >>> z.T array([[ 1., 11.], [ 2., 12.], [ 3., 13.], [ 4., 14.], [ 5., 15.]]) >>> interp2d_interleave_recursive(z,2).T array([[ 1. , 3.5 , 6. , 8.5 , 11. ], [ 1.25, 3.75, 6.25, 8.75, 11.25], [ 1.5 , 4. , 6.5 , 9. , 11.5 ], [ 1.75, 4.25, 6.75, 9.25, 11.75], [ 2. , 4.5 , 7. , 9.5 , 12. ], [ 2.25, 4.75, 7.25, 9.75, 12.25], [ 2.5 , 5. , 7.5 , 10. , 12.5 ], [ 2.75, 5.25, 7.75, 10.25, 12.75], [ 3. , 5.5 , 8. , 10.5 , 13. ], [ 3.25, 5.75, 8.25, 10.75, 13.25], [ 3.5 , 6. , 8.5 , 11. , 13.5 ], [ 3.75, 6.25, 8.75, 11.25, 13.75], [ 4. , 6.5 , 9. , 11.5 , 14. ], [ 4.25, 6.75, 9.25, 11.75, 14.25], [ 4.5 , 7. , 9.5 , 12. , 14.5 ], [ 4.75, 7.25, 9.75, 12.25, 14.75], [ 5. , 7.5 , 10. , 12.5 , 15. ]]) >>> interp2d_interleave_recursive(z,1).T array([[ 1. , 6. , 11. ], [ 1.5, 6.5, 11.5], [ 2. , 7. , 12. ], [ 2.5, 7.5, 12.5], [ 3. , 8. , 13. ], [ 3.5, 8.5, 13.5], [ 4. , 9. , 14. ], [ 4.5, 9.5, 14.5], [ 5. , 10. , 15. ]]) Josef -------------- next part -------------- import numpy as np def interp2d_interleave(z,n): '''performs linear interpolation on a grid all points are interpolated in one step not recursively Parameters ---------- z : 2d array (M,N) n : int number of points interpolated Returns ------- zi : 2d array ((M-1)*n+M, (N-1)*n+N) original and linear interpolated values ''' frac = np.atleast_2d(np.arange(0,n+1)/(1.0+n)).T zi1 = np.kron(z[:,:-1],np.ones(len(frac))) + np.kron(np.diff(z),frac.T) zi1 = np.hstack((zi1,z[:,-1:])) zi2 = np.kron(zi1.T[:,:-1],np.ones(len(frac))) + np.kron(np.diff(zi1.T),frac.T) zi2 = np.hstack((zi2,zi1.T[:,-1:])) return zi2.T def interp2d_interleave_recursive(z,n): '''interpolates by recursively interleaving n times ''' zi = z.copy() for ii in range(1,n+1): zi = interp2d_interleave(zi,1) return zi x = np.linspace(1,5,5) y = np.linspace(11,15,5) z = np.vstack((x,y)) print z.T n = 1 print interp2d_interleave(z,n).T n = 2 print interp2d_interleave(z,n).T n = 3 zi3a = interp2d_interleave(z,n) print zi3a zi3b = interp2d_interleave_recursive(z,2) print zi3b # for linear function recursive and one-step interpolation are the same # I'm not sure about the general case print np.all(zi3a==zi3b) From williamhpurcell at gmail.com Tue Feb 17 15:25:54 2009 From: williamhpurcell at gmail.com (flyeng4) Date: Tue, 17 Feb 2009 12:25:54 -0800 (PST) Subject: [SciPy-user] Fwd: signal.lti(A,B,C,D) with D=0 In-Reply-To: References: Message-ID: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> I am getting back to using signal.lti for state-space representation. This is a post and a reply that I had received a while back, but there was no follow up. 
Any thoughts on a solution for representing MIMO or SIMO systems with signal.lti, with the issue described below with ss2zpk? -Bill ---------- Forwarded message ---------- From: "Ryan Krauss" Date: Jul 14 2008, 7:41?am Subject: signal.lti(A,B,C,D) with D=0 To: SciPy-user This seems like a bug in ltisys. ?I think this line 149 ? ? type_test = A[:,0] + B[:,0] + C[0,:] + D should be 149 ? ? type_test = A[:,0] + B[:,0] + C[0,:] + D[0,:] for a multipe-input/multiple-output system with i inputs, n states, and m outputs, A should be n by n, B should be n by i, C should be m by n, and D should be m by i. (actually, because of this code a few lines above: ? ? # make MOSI from possibly MOMI system. ? ? if B.shape[-1] != 0: ? ? ? ? B = B[:,input] ? ? B.shape = (B.shape[0],1) ? ? if D.shape[-1] != 0: ? ? ? ? D = D[:,input] I guess line 149 should be 149 ? ? type_test = A[:,0] + B[:,0] + C[0,:] + D[0] ) But making this change exposes another problem: /usr/lib/python2.5/site-packages/scipy/signal/ltisys.py in __init__(self, *args, **kwords) ? ? 224 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? self.__dict__['D'] = abcd_normalize(*args) ? ? 225 ? ? ? ? ? ? self.__dict__['zeros'], self.__dict__['poles'], \ --> 226 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? self.__dict__['gain'] = ss2zpk(*args) ? ? 227 ? ? ? ? ? ? self.__dict__['num'], self.__dict__['den'] = ss2tf (*args) ? ? 228 ? ? ? ? ? ? self.inputs = self.B.shape[-1] /usr/lib/python2.5/site-packages/scipy/signal/ltisys.py in ss2zpk(A, B, C, D, input) ? ? 183 ? ? """ ? ? 184 ? ? Pdb().set_trace() --> 185 ? ? return tf2zpk(*ss2tf(A,B,C,D,input=input)) ? ? 186 ? ? 187 class lti(object): /usr/lib/python2.5/site-packages/scipy/signal/filter_design.py in tf2zpk(b, a) ? ? 128 ? ? k = b[0] ? ? 129 ? ? b /= b[0] --> 130 ? ? z = roots(b) ? ? 131 ? ? p = roots(a) ? ? 132 ? ? return z, p, k /usr/lib/python2.5/site-packages/numpy/lib/polynomial.py in roots(p) ? ? ?98 ? ? p = atleast_1d(p) ? ? ?99 ? ? if len(p.shape) != 1: --> 100 ? ? ? ? raise ValueError,"Input must be a rank-1 array." ? ? 101 ? ? 102 ? ? # find non-zero array entries : Input must be a rank-1 array. WARNING: Failure executing file: In [2]: %debug> /usr/lib/python2.5/site-packages/numpy/lib/ polynomial.py(100)roots() ? ? ?99 ? ? if len(p.shape) != 1: --> 100 ? ? ? ? raise ValueError,"Input must be a rank-1 array." ? ? 101 ipdb> print p [[ ?1.00000000e+00 ? 1.00000000e+00 ? 1.00000000e+00 ? 1.00000000e+00 ? ? 1.00000000e+00 ? 1.00000000e+00] ?[ ?1.81898940e-11 ?-3.06222400e+00 ?-2.79196150e+02 ?-5.87518033e+03 ? ?-1.59843721e+04 ? 1.91386789e-07] ?[ ?1.75910240e+01 ? 1.60479206e+03 ? 3.37698682e+04 ? 9.12618857e+04 ? ?-1.23219975e+04 ?-3.29828641e+04]] ss2tf seems to correctly handle MIMO (or at least SIMO systems) correctly and returns one denominator polynomial with several (m) numerator polynomials. ?But tf2zpk in filter_design.py does not seem able to handle more than siso systems (which makes sense, it is expecting a transfer function which is just a SISO system). How should this be fixed? I understand why ss2tf converts a MIMO system to SIMO - trying to represent a mulitple input, multiple output system with a transfer function has some limitations. ?I think the real offender is in the __init__ method of signal.lti: elif N == 4: ... self.__dict__['zeros'], self.__dict__['poles'], \ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? self.__dict__['gain'] = ss2zpk (*args) For a MIMO system, what should the zpk representation be? ?For a SIMO system, I would expect a vector of poles and a matrix of zeros. 
?This seems to be in line with what ss2tf does. It seems like the init method could find the pole by passing C[0,:] and D[0,:] to ss2zpk. ?The zeros and gains could then be found by calling ss2zpk in some vectorized way or simply in a for loop for each row of C and D. ?But there may well be a more elegant solution. Any thoughts? Ryan On Sun, Jul 13, 2008 at 2:59 PM, William Purcell wrote: > I am trying to set up state space representation of a system using > signal.lti. The system has no feedforward so D = 0. I've tried the three > options listed in the code below to represent D. > The only way I can get it to work is option 2 if C has 1 row. If C has more > than 1 row it won't work. > Any thoughts? > -Bill > Code > --------------------------------------------------------------- > from numpy import ones,matrix > from scipy import signal > r = ones(3) > A = matrix([r,r,r]) > B = matrix([r]).T > C = matrix([r,r]) > #three options of D to make it 0 > #1) D=0 > #2) D = matrix([0,0]).T > #3) D = None > D = 0 > #D = None > #D = matrix([0,0]).T > Gss = signal.lti(A,B,C,D) > ----------------------------------------------------------- > Tracebacks > ----------------------------------------------------------- > Option 1 > /usr/lib/python2.5/site-packages/scipy/signal/ltisys.py in abcd_normalize(A, > B, C, D) > ? ? 101 ? ? ? ? raise ValueError, "A and C must have the same number of > columns." > ? ? 102 ? ? if MD != MC: > --> 103 ? ? ? ? raise ValueError, "C and D must have the same number of > rows." > ? ? 104 ? ? if ND != NB: > ? ? 105 ? ? ? ? raise ValueError, "B and D must have the same number of > columns." > : C and D must have the same number of rows. > WARNING: Failure executing file: > Option 2 (with C as two rows...if C is a single row I do not get this > traceback) > /usr/lib/python2.5/site-packages/scipy/signal/ltisys.py in ss2tf(A, B, C, D, > input) > ? ? 147 > ? ? 148 ? ? num_states = A.shape[0] > --> 149 ? ? type_test = A[:,0] + B[:,0] + C[0,:] + D > ? ? 150 ? ? num = numpy.zeros((nout, num_states+1), type_test.dtype) > ? ? 151 ? ? for k in range(nout): > : shape mismatch: objects cannot be broadcast > to a single shape > WARNING: Failure executing file: > Option 3 > same as 1 > _______________________________________________ > SciPy-user mailing list > SciPy-u... at scipy.org >http://projects.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-user mailing list SciPy-u... at scipy.orghttp://projects.scipy.org/mailman/listinfo/scipy- user From josef.pktd at gmail.com Tue Feb 17 16:09:32 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Feb 2009 16:09:32 -0500 Subject: [SciPy-user] Fwd: signal.lti(A,B,C,D) with D=0 In-Reply-To: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> References: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> Message-ID: <1cd32cbb0902171309n3f6f7cbevb27f56d5356c67c5@mail.gmail.com> On Tue, Feb 17, 2009 at 3:25 PM, flyeng4 wrote: > I am getting back to using signal.lti for state-space representation. > This is a post and a reply that I had received a while back, but there > was no follow up. Any thoughts on a solution for representing MIMO or > SIMO systems with signal.lti, with the issue described below with > ss2zpk? > -Bill I was looking recently into scipy.signal to see whether it can be used for time series analysis. 
After digging around in the code and trying out some examples (I didn't find any examples in the docs), I got the impression that scipy.signal is not designed to handle more than one signal (I wanted recursive filter with interactions between several time series). I think the main problem I found was, that numpy/scipy doesn't have a multivariate/multidimensional polynomial class. Somewhere when converting from a multidimensional transfer function, scipy signal uses numpy.polynomial which only works for one-dimensional polynomials. That's when I gave up. My conclusion was, that for multidimensional signals, scipy signal would need to be rewritten quite a bit. Josef From rob.clewley at gmail.com Tue Feb 17 17:40:04 2009 From: rob.clewley at gmail.com (Rob Clewley) Date: Tue, 17 Feb 2009 17:40:04 -0500 Subject: [SciPy-user] odeint for calculating trajectories In-Reply-To: References: Message-ID: > I understand the idea of the curve fitting. But I'm having trouble seeing > how to take the krogh_interpolator in scipy and apply it to a 2, or better > yet, 3 dimensional problem. Any pointers? Off the top of my head I would think you can interpolate the x(t) and y(t) parts separately, i.e. make it a parametric problem. You have the individual derivatives dx/dt and dy/dt, and when you have the two curve components interpolated you can reconstruct the curve in 2D. Does that help? -Rob From williamhpurcell at gmail.com Wed Feb 18 09:50:47 2009 From: williamhpurcell at gmail.com (William Purcell) Date: Wed, 18 Feb 2009 08:50:47 -0600 Subject: [SciPy-user] Fwd: signal.lti(A,B,C,D) with D=0 In-Reply-To: <1cd32cbb0902171309n3f6f7cbevb27f56d5356c67c5@mail.gmail.com> References: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> <1cd32cbb0902171309n3f6f7cbevb27f56d5356c67c5@mail.gmail.com> Message-ID: I think that scipy.signal is set up to do what I need to do, but I am having trouble with ss2tf. Line 149 of ltisys is type_test = A[:,0] + B[:,0] + C[0,:] + D but I keep getting the error Traceback (most recent call last): File "test_lti.py", line 13, in x = scipy.signal.ss2tf(A,B,C,D) File "/usr/lib/python2.5/site-packages/scipy/signal/ltisys.py", line 153, in ss2tf type_test = A[:,0] + B[:,0] + C[0,:] + D ValueError: shape mismatch: objects cannot be broadcast to a single shape This is because A is nxn, B is nxi, C is mxn, and D is mxi (I hope I got that right). My point is that type_test slices the 'n' dimension of each matrix and D doesn't have an 'n' dimension. I think that the ' + D' needs to be removed from type_test or it needs to be padded with n-m elements for the test. I attached a test that reproduces the error. If I comment out '+ D' in ss2tf, it seems to work just fine and return what I want. One last thing, I think that signal.lti should pass an input kwarg to ss2zpk and ss2tf so that you don't have to always look at the first input (0 index). In other words, ss2zpk and ss2tf both have a input kwarg to tell which input to use and I think that signal.lti should have the same feature. Let me know your thoughts. Thanks, Bill -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_lti.py Type: text/x-python Size: 177 bytes Desc: not available URL: From josef.pktd at gmail.com Wed Feb 18 11:25:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Feb 2009 11:25:31 -0500 Subject: [SciPy-user] Fwd: signal.lti(A,B,C,D) with D=0 In-Reply-To: References: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> <1cd32cbb0902171309n3f6f7cbevb27f56d5356c67c5@mail.gmail.com> Message-ID: <1cd32cbb0902180825x47d70ab2y5a45ea25ff8cdd32@mail.gmail.com> 2009/2/18 William Purcell : > I think that scipy.signal is set up to do what I need to do, but I am > having trouble with ss2tf. Line 149 of ltisys is > > type_test = A[:,0] + B[:,0] + C[0,:] + D > > but I keep getting the error > > Traceback (most recent call last): > File "test_lti.py", line 13, in > x = scipy.signal.ss2tf(A,B,C,D) > File "/usr/lib/python2.5/site-packages/scipy/signal/ltisys.py", line > 153, in ss2tf > type_test = A[:,0] + B[:,0] + C[0,:] + D > ValueError: shape mismatch: objects cannot be broadcast to a single shape > > This is because A is nxn, B is nxi, C is mxn, and D is mxi (I hope I > got that right). My point is that type_test slices the 'n' dimension > of each matrix and D doesn't have an 'n' dimension. I think that the ' > + D' needs to be removed from type_test or it needs to be padded with > n-m elements for the test. > > I attached a test that reproduces the error. If I comment out '+ D' in > ss2tf, it seems to work just fine and return what I want. > > One last thing, I think that signal.lti should pass an input kwarg to > ss2zpk and ss2tf so that you don't have to always look at the first > input (0 index). In other words, ss2zpk and ss2tf both have a input > kwarg to tell which input to use and I think that signal.lti should > have the same feature. > > Let me know your thoughts. > > Thanks, > Bill > I ran your test script with +D commented out as you proposed. x = ss2tf(A,B,C,D) runs without raising an exception, but I didn't check whether the numbers are correct. However, trying to do the reverse operation raises different exceptions, see below. None of the lti functions have any tests in the test suite, so it is difficult for me to figure out what the expected behavior of these functions is, and it makes refactoring or rewriting of the functions a hazardous enterprise. I'm not an expert on continuous time state space modeling, but I tried out different versions, and my conclusion was that only single input, single output work correctly. I attach my trying-out script. As I see it, it is possible to use different parts of scipy.signal for multidimensional input or output, but the conversion code that relies on numpy polynomials cannot handle it. So part of the functionality could be rewritten to allow for different shapes (as you did with commenting out D in the test). Additionally, the filters in scipy signal cannot handle multiple signals, however the filters in nd.image can be used in an indirect way to have multi-dimensional filters. I looked at this more for the usage in Kalman filter, vector arma or vector ar, and without a multidimensional polynomial class and proper multi-dimensional filters, it is not possible to use scipy.signal for this. So, I started to write my own VARMA filter ( for discrete time), but without convenient conversion between the different representation as scipy.signal has for the univariate case For a beginning user and not knowing the jargon of the field, signal.lti is not very approachable. 
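For what it's worth, a minimal single-input, single-output round trip of the kind that does seem to be supported looks like this (a sketch with made-up coefficients, not taken from the attached script):

from scipy import signal

# SISO transfer function 1 / (s**2 + 2*s + 1)
num, den = [1.0], [1.0, 2.0, 1.0]

A, B, C, D = signal.tf2ss(num, den)      # to state-space form
num2, den2 = signal.ss2tf(A, B, C, D)    # and back again

sys = signal.lti(num, den)               # lti object from the same coefficients
t, y = signal.step(sys)                  # step response of the SISO system

Anything beyond this single-input, single-output case runs into the shape problems discussed above.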
A set of examples and tests would be useful to see what the functionality and the limitations is. If you try out different function in lti for multi-dimensional signals, it would be useful to have a list of functions that could be extended to the multi-dimensional case, and of functions for which this is not possible because of the underlying limitations. Since I'm not using scipy signal, I stopped looking into it. But, maybe signal.ltisys needs to be adopted by someone. Josef ----------- >>> x = ss2tf(A,B,C,D) >>> ss=signal.tf2ss(*x) Warning (from warnings module): File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\signal\filter_design.py", line 221 "results may be meaningless", BadCoefficients) BadCoefficients: Badly conditionned filter coefficients (numerator): the results may be meaningless Traceback (most recent call last): File "", line 1, in ss=signal.tf2ss(*x) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\signal\ltisys.py", line 59, in tf2ss C = num[:,1:] - num[:,0] * den[1:] ValueError: shape mismatch: objects cannot be broadcast to a single shape >>> zpk=signal.tf2zpk(*x) Warning (from warnings module): File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\signal\filter_design.py", line 221 "results may be meaningless", BadCoefficients) BadCoefficients: Badly conditionned filter coefficients (numerator): the results may be meaningless Traceback (most recent call last): File "", line 1, in zpk=signal.tf2zpk(*x) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\signal\filter_design.py", line 161, in tf2zpk z = roots(b) File "\Programs\Python25\Lib\site-packages\numpy\lib\polynomial.py", line 133, in roots raise ValueError,"Input must be a rank-1 array." ValueError: Input must be a rank-1 array. -------------- next part -------------- import numpy as np from scipy import signal dvec = np.array([1,2,3,4]) A1 = np.array([-dvec,[1,0,0,0],[0,1,0,0],[0,0,1,0]]) B1 = np.array([[1,0,0,0]]).T # wrong dimension, this requires D has one column B1 = np.eye(4) C1 = np.array([[1,2,3,4]]) D1 = np.zeros((1,4)) print signal.ss2tf(A1,B1,C1,D1) #same as http://en.wikipedia.org/wiki/State_space_(controls)#Canonical_realizations signal.ss2tf(*signal.tf2ss(*signal.ss2tf(A1,B1,C1,D1))) np.testing.assert_almost_equal(signal.ss2tf(*signal.tf2ss(*signal.ss2tf(A1,B1,C1,D1)))[0],signal.ss2tf(A1,B1,C1,D1)[0]) ''' dx_t = A x_t + B u_t y_t = C x_t + D u_t >>> dvec = np.array([1,2,3,4]) >>> A = np.array([-dvec,[1,0,0,0],[0,1,0,0],[0,0,1,0]]) >>> B = np.array([[1,0,0,0]]).T # wrong dimension, this requires D has one column >>> B = np.eye(4) >>> C = np.array([[1,2,3,4]]) >>> D = np.zeros((1,4)) >>> num, den = signal.ss2tf(A,B,C,D) >>> print num [[ 0. 1. 2. 3. 4.]] >>> print den [ 1. 1. 2. 3. 4.] 
>>> A1,B1,C1,D1 = signal.tf2ss(*signal.ss2tf(A,B,C,D)) >>> A1 array([[-1., -2., -3., -4.], [ 1., 0., 0., 0.], [ 0., 1., 0., 0.], [ 0., 0., 1., 0.]]) >>> B1 array([[ 1.], [ 0.], [ 0.], [ 0.]]) >>> C1 array([[ 1., 2., 3., 4.]]) >>> D1 array([ 0.]) ''' # can only have one noise variable u_t # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dvec = np.array([1,2,3,4]) A = np.array([-dvec,[1,0,0,0],[0,1,0,0],[0,0,1,0]]) B = np.array([[1,0,0,0]]).T # wrong dimension, this requires D has one column B = np.eye(4) B[2,1] = 1 C = np.array([[1,2,3,4]]) D = np.zeros((1,4)) print signal.ss2tf(A,B,C,D) # can only have one output variable y_t # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dvec = np.array([1,2,3,4]) A = np.array([-dvec,[1,0,0,0],[0,1,0,0],[0,0,1,0]]) B = np.array([[1,0,0,0]]).T # wrong dimension, this requires D has one column B = np.eye(4) B[2,1] = 1 C = np.array([[1,2,3,4],[1,0,0,0]]) D = np.zeros((2,4)) #print signal.ss2tf(A,B,C,D) #this causes ## type_test = A[:,0] + B[:,0] + C[0,:] + D ##ValueError: shape mismatch # From josef.pktd at gmail.com Wed Feb 18 13:26:44 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Feb 2009 13:26:44 -0500 Subject: [SciPy-user] Fwd: signal.lti(A,B,C,D) with D=0 In-Reply-To: <1cd32cbb0902180825x47d70ab2y5a45ea25ff8cdd32@mail.gmail.com> References: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> <1cd32cbb0902171309n3f6f7cbevb27f56d5356c67c5@mail.gmail.com> <1cd32cbb0902180825x47d70ab2y5a45ea25ff8cdd32@mail.gmail.com> Message-ID: <1cd32cbb0902181026qd409529g7d392c1db82ca9b2@mail.gmail.com> another thought: Obtaining multi-dimensional output can be done by stitching together several lti systems or transfer functions by looping over the output dimension (I did something similar for multi-dimensional filters with ndimage). However, multi-dimensional inputs cannot be handled this way. I didn't see any way to merge 2 independent input signals to one output signal. Josef From trottier+pylist at gmail.com Wed Feb 18 13:42:08 2009 From: trottier+pylist at gmail.com (Leo Trottier) Date: Wed, 18 Feb 2009 10:42:08 -0800 Subject: [SciPy-user] float96 displayed (incorrectly) as float64 In-Reply-To: References: Message-ID: Hi, I'd like to show off how much easier it is to work with multiple data types in numpy (as compared with matlab). It would be especially handy to show off float96 , etc. Unfortunately, this doesn't seem to work under Vista or Windows XP: ########################### Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] Type "copyright", "credits" or "license" for more information. IPython 0.9.1 -- An enhanced Interactive Python. ... IPython profile: xy # NOTE: numpy.__version__ == '1.2.1' # ... and: scipy.__version__ == '0.7.0' In [1]: a = array(1.0,dtype=float96) In [2]: print a # DISPLAYS INCORRECTLY 0.0 In [3]: print a.astype(float64) # DISPLAYS CORRECTLY -- DATA STILL THERE 1.0 In [4]: print (a*-1).astype(float64) # MULTIPLICATION APPEARS TO WORK PROPERLY -1.0 ############################# Does anyone know of a work-around? Leo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From williamhpurcell at gmail.com Wed Feb 18 14:00:43 2009 From: williamhpurcell at gmail.com (William Purcell) Date: Wed, 18 Feb 2009 13:00:43 -0600 Subject: [SciPy-user] Fwd: signal.lti(A,B,C,D) with D=0 In-Reply-To: <1cd32cbb0902181026qd409529g7d392c1db82ca9b2@mail.gmail.com> References: <446ebe82-a298-4ab2-90a4-6289cdaff83e@f24g2000vbf.googlegroups.com> <1cd32cbb0902171309n3f6f7cbevb27f56d5356c67c5@mail.gmail.com> <1cd32cbb0902180825x47d70ab2y5a45ea25ff8cdd32@mail.gmail.com> <1cd32cbb0902181026qd409529g7d392c1db82ca9b2@mail.gmail.com> Message-ID: I was thinking the same thing. I am working on a thin wrapper over ltisys that tests the dimensions of the output and then loops if there is any stitching to be done. I was also thinking that I could loop over the inputs first and then, in a sub-routine, loop over the outputs to come up with a two-dimensional list of whatever I need. For example, if I am passing a MIMO system to signal.lti, each of the representation conversions would be in matrix/list form corresponding to each input/output relationship. Or lti could test whether it is a MIMO or even SIMO system and make a matrix/list of lti instances, each of which would have its own alternative representations through ss2zpk/ss2tf etc. (which might be the cleaner of the two alternatives). Do you think
Do you think > this would be a feature for ltisys (I don't think it would take much > time to implement), or do you think that it is hackish and ltisys > should stick to SISO systems? Are you looking at parallel SISO and SIMO systems, or did you find a way to have one output depend on two inputs? Personally, I don't like to change code without having tests, to verify that what I am doing doesn't break things and delivers what I want. Since, ltisys doesn't have tests, I wouldn't want to do much surgery on it, and I would prefer a separate wrapper, or subclassing or delegation with minimal changes to the existing code. Another idea for true MIMO system would be to extract what works for that case and write a separate MIMO package. I think using only the state space representation, it shouldn't be too difficult to get simulation and similar things working correctly. The definition and representation of transfer functions and zpk would be more difficult, and I don't know much about it in the multi-dimensional input case (My intuition is more in terms of time series analysis and I haven't looked at this very closely.) Later if everything works and is tested, merging of MIMO and SISO could be considered. These are my thoughts as a (half) innocent bystander. Josef > > On Wed, Feb 18, 2009 at 12:26 PM, wrote: >> another thought: >> >> Obtaining multi-dimensional output can be done by stitching together >> several lti systems or transfer functions by looping over the output >> dimension (I did something similar for multi-dimensional filters with >> ndimage). >> >> However, multi-dimensional inputs cannot be handled this way. I didn't >> see any way to merge 2 independent input signals to one output signal. >> >> Josef >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Wed Feb 18 18:37:32 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 19 Feb 2009 00:37:32 +0100 (CET) Subject: [SciPy-user] Parallel processing with Python Message-ID: I know this is not directly related to SciPy, but it may be of interest to some subscribers to this list. About a year ago, I posted a scheme to comp.lang.python describing how to use isolated interpreters and threads to circumvent the GIL on SMPs: http://groups.google.no/group/comp.lang.python/msg/0351c532aad97c5e?hl=no&dmode=source One interpreter per thread is how tcl work. Erlang also uses isolated threads that only communicate through messages (as opposed to shared objects). "Appdomains" are also available in the .NET framework, and in Java as "Java isolates". They are potentially very useful as multicore CPUs become abundant. They allow one process to run one independent Python interpreter on each available CPU core. In Python, "appdomains" can be created by embedding the Python interpreter multiple times in a process, and associating each interpreter with a thread. For this to work, we have to make multiple copies of the Python DLL and rename them (e.g. Python25-0.dll, Python25-1.dll, Python25-2.dll, etc.) Otherwise the dynamic loader will just return a handle to the already imported DLL. As DLLs can be accessed with ctypes, we don't even have to program a line of C to do this. 
We can start up a Python interpreter and use ctypes to embed more interpreters into it, associating each interpreter with its own thread. ctypes takes care of releasing the GIL in the parent interpreter, so calls to these sub-interpreters become asynchronous. I had a mock-up of this scheme working. Martin Löwis replied he doubted this would work, and pointed out that Python extension libraries (.pyd files) are DLLs as well. They would only be imported once, and their global states would thus crash, producing havoc: http://groups.google.no/group/comp.lang.python/msg/0a7a22910c1d5bf5?hl=no&dmode=source He was right, of course, but also wrong. In fact I had already proven him wrong by importing a DLL multiple times. If it can be done for Python25.dll, it can be done for any other DLL as well - including .pyd files - in exactly the same way. Thus what remains is to change Python's dynamic loader to use the same "copy and import" scheme. This can either be done by changing Python's C code, or (at least on Windows) by redirecting the LoadLibrary API call from kernel32.dll to a custom DLL. Both are quite easy and require minimal C coding. Thus it is quite easy to make multiple, independent Python interpreters live isolated lives in the same process. As opposed to multiple processes, they can communicate without involving any IPC. It would also be possible to design proxy objects allowing one interpreter access to an object in another. Immutable objects such as strings would be particularly easy to share. This very simple scheme should allow parallel processing with Python similar to how it's done in Erlang, without the GIL getting in our way. At least on Windows this can be done without touching the CPython source at all. I am not sure about Linux though. It may be necessary to patch the CPython source to make it work there. Sturla Molden From cournape at gmail.com Wed Feb 18 20:31:29 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 19 Feb 2009 10:31:29 +0900 Subject: [SciPy-user] float96 displayed (incorrectly) as float64 In-Reply-To: References: Message-ID: <5b8d13220902181731t20af48c6pb58f10a8802c666b@mail.gmail.com> On Thu, Feb 19, 2009 at 3:42 AM, Leo Trottier wrote: > Hi, > > I'd like to show off how much easier it is to work with multiple data types > in numpy (as compared with matlab). It would be especially handy to show > off float96, etc. > > Unfortunately, this doesn't seem to work under Vista or Windows XP Windows does not support long double - long double is exactly the same as double on this platform. The printing bug has been fixed and will be in the upcoming numpy 1.3. David From dwf at cs.toronto.edu Wed Feb 18 21:06:54 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 18 Feb 2009 21:06:54 -0500 Subject: [SciPy-user] "clustergrams"/hierarchical clustering heat maps In-Reply-To: <91b4b1ab0902142218v44ceccd8r10e51eea56f03c57@mail.gmail.com> References: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> <91b4b1ab0902142218v44ceccd8r10e51eea56f03c57@mail.gmail.com> Message-ID: <244098B9-8442-443F-BFF3-CC0B738402ED@cs.toronto.edu> Hi Damian, On 15-Feb-09, at 1:18 AM, Damian Eads wrote: > Hi David, > > Sorry. I did not see your message until now. Several people have > already inquired about heatmaps. I've been meaning to eventually > implement support for them but since I don't work with microarray data > and I'm in the midst of trying to get a paper out, it has fallen onto > the back burner. Not a problem, I know how it is.
> As a first step, I'd need to implement support for > missing attributes since this seems to be common with microarray data. It can be, though as far as I know, a common strategy with microarrays is to just impute missing values in one way or another. > As far as I know, a heatmap illustrates clustering along two axes: > observation vectors and attributes. For example, suppose we're > clustering patients by their genes. There is one observation vector > for each patient, and one vector element per gene. Clustering > observation vectors is the typical case, which is used to identify > groups of similar patients. Clustering attributes (across observation > vectors) is less typical but would be used to identifying groups of > similar genes. > > The heatmap just illustrates the vectors, the color is the intensity. > When clustering along a single dimension (observation vectors), no > sorting is necessary, and a dendrogram is drawn along the vertical > axis. The i'th row is just the observation vector corresponding to the > i'th leaf node. No sorting along the attribute dimension is needed. > Along two dimensions, there is a dendrogram along the horizontal axis. > Now the attributes must be reordered so that the j'th column > corresponds to the j'th leaf node. > > This is my first time describing heat maps so I apologize if this > description is terse. Does it make some sense? That corresponds with my understanding as well. Though I'm not certain that 'no sorting is needed' if we're just clustering along one dimension. Is what you mean is that the order is completely specified by the dendrogram? Because that would make sense. As far as I know there is also some heuristic for laying out both axes (since there are arbitrary ordering choices to be made, e.g. which branch to put on the left and which on the right) which makes them easier to see patterns in, my advisor name-dropped the name of the algorithm once but I'd have to ask him again. > As far as how someone implements this, it seems like it'd be pretty > simple. There is a helper function called _plot_dendrogram that takes > in a collection of raw dendrogram lines to be rendered on the plot. > First, plot the heatmap (sorting the attributes so that the columns > correspond to the ids of the leaf nodes); this can be done with > imshow. Second, for the first dendrogram, call _plot_dendrogram but > provide it with a shifting parameters so that the dendrogram lines are > rendered to the left of the image. Third, call _plot_dendrogram again, > provide a shifting parameter, but instead shift the lines downward for > the attribute clustering dendrogram. Sounds as though the "completely specified" bit above is what you meant. And it sounds as though the existing interface should be sufficient to get something going. > I want to get to this soon but no promises. Sorry. If I don't beat you to it. :) David From michael.abshoff at googlemail.com Thu Feb 19 00:35:37 2009 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Wed, 18 Feb 2009 21:35:37 -0800 Subject: [SciPy-user] float96 displayed (incorrectly) as float64 In-Reply-To: <5b8d13220902181731t20af48c6pb58f10a8802c666b@mail.gmail.com> References: <5b8d13220902181731t20af48c6pb58f10a8802c666b@mail.gmail.com> Message-ID: <499CEFA9.60608@gmail.com> David Cournapeau wrote: > On Thu, Feb 19, 2009 at 3:42 AM, Leo Trottier wrote: >> Hi, Hi, >> I'd like to show off how much easier it is to work with multiple data types >> in numpy (as compared with matlab). 
It would be especially handy to show >> off float96 , etc. >> >> Unfortunately, this doesn't seem to work under Vista or Windows XP > > Windows does not support long double - long double is exactly the same > as double on this platform. The C99 standard does not guarantee that long double is any "longer" than double. Nearly all systems, but Windows do have a long double that is either 96 or 128 bits. But it is wrong to assume that this is always the case and it is not a violation of the C99 standard. > The printing bug have been fixed and will > be in the upcoming numpy 1.3. > > David Cheers, Michael > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From david at ar.media.kyoto-u.ac.jp Thu Feb 19 00:33:37 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 19 Feb 2009 14:33:37 +0900 Subject: [SciPy-user] float96 displayed (incorrectly) as float64 In-Reply-To: <499CEFA9.60608@gmail.com> References: <5b8d13220902181731t20af48c6pb58f10a8802c666b@mail.gmail.com> <499CEFA9.60608@gmail.com> Message-ID: <499CEF31.2000809@ar.media.kyoto-u.ac.jp> Michael Abshoff wrote: > David Cournapeau wrote: > >> On Thu, Feb 19, 2009 at 3:42 AM, Leo Trottier wrote: >> >>> Hi, >>> > > Hi, > > >>> I'd like to show off how much easier it is to work with multiple data types >>> in numpy (as compared with matlab). It would be especially handy to show >>> off float96 , etc. >>> >>> Unfortunately, this doesn't seem to work under Vista or Windows XP >>> >> Windows does not support long double - long double is exactly the same >> as double on this platform. >> > > The C99 standard does not guarantee that long double is any "longer" > than double. Nearly all systems, but Windows do have a long double that > is either 96 or 128 bits. But it is wrong to assume that this is always > the case and it is not a violation of the C99 standard. > I kept things simple, but you're right that the real problem is more complicated. For once, long double being bigger than double is not a OS problem, but a compiler + CPU problem. On windows, it is made complicated by the fact that that sizeof(long double) > sizeof(double) for gcc, even on windows, but that windows C runtime does not support this You can check that the problem is printing, not computation (example untested): import numpy as np a = np.float96(1.) print a # bogus, 0. b = 2 * a print np.double(b) # print 2. Pauli and me have spent some time to fix various formatting issues, and I also added some support to make sure long double is converted to double before any printing on windows, cheers, David From strawman at astraw.com Thu Feb 19 01:45:10 2009 From: strawman at astraw.com (Andrew Straw) Date: Wed, 18 Feb 2009 22:45:10 -0800 Subject: [SciPy-user] Parallel processing with Python In-Reply-To: References: Message-ID: <499CFFF6.9050901@astraw.com> Hi Sturla, I think this is a very interesting idea. I once ran into weird and mysterious issues with dlopen() when trying to do similar on linux but I hadn't thought of the rename-the-shared-library trick. If we invented some kind of syntax for software transactional memory (STM), we might really be playing with fire. I might give this a try on a couple problems I'm working on (which generally have more to do with making complicated stuff happen quickly -- with low latency -- than crunching tons of data). Anyhow, please keep us informed of further progress! 
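For anyone who just wants to share a numpy array between processes today, the standard-library multiprocessing route (not the multiple-interpreter scheme discussed in this thread) looks roughly like this; the function and variable names are illustrative only:

import multiprocessing as mp
import numpy as np

def fill(shared_arr, start, stop):
    # view the shared buffer as an ndarray -- no copy is made
    a = np.frombuffer(shared_arr.get_obj())
    a[start:stop] = np.arange(start, stop)

if __name__ == '__main__':
    n = 1000
    shared_arr = mp.Array('d', n)   # n doubles in shared memory, zero-initialised
    p1 = mp.Process(target=fill, args=(shared_arr, 0, n // 2))
    p2 = mp.Process(target=fill, args=(shared_arr, n // 2, n))
    p1.start(); p2.start()
    p1.join(); p2.join()
    result = np.frombuffer(shared_arr.get_obj())
    print result.sum()              # 499500.0

This shares the array data but still pays the cost of separate processes for everything else, which is precisely what the isolated-interpreter idea tries to avoid.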
-Andrew Sturla Molden wrote: > I know this is not directly related to SciPy, but it may be of interest to > some subscribers to this list. > > About a year ago, I posted a scheme to comp.lang.python describing how to > use isolated interpreters and threads to circumvent the GIL on SMPs: > > http://groups.google.no/group/comp.lang.python/msg/0351c532aad97c5e?hl=no&dmode=source > > One interpreter per thread is how tcl work. Erlang also uses isolated > threads that only communicate through messages (as opposed to shared > objects). "Appdomains" are also available in the .NET framework, and in > Java as "Java isolates". They are potentially very useful as multicore > CPUs become abundant. They allow one process to run one independent Python > interpreter on each available CPU core. > > In Python, "appdomains" can be created by embedding the Python interpreter > multiple times in a process, and associating each interpreter with a > thread. For this to work, we have to make multiple copies of the Python > DLL and rename them (e.g. Python25-0.dll, Python25-1.dll, > Python25-2.dll, etc.) Otherwise the dynamic loader will just return a > handle to the already imported DLL. As DLLs can be accessed with ctypes, > we don't even have to program a line of C to do this. we can start up a > Python interpreter and use ctypes to embed more interpreters > into it, associating each interpreter with its own thread. ctypes takes > care of releasing the GIL in the parent interpreter, so calls to these > sub-interpreters become asynchronous. I had a mock-up of this scheme > working. Martin L?wis replied he doubted this would work, and pointed out > that Python extension libraries (.pyd files) are DLLs as well. They would > only be imported once, and their global states would thus crash, thus > producing havoc: > > http://groups.google.no/group/comp.lang.python/msg/0a7a22910c1d5bf5?hl=no&dmode=source > > He was right, of course, but also wrong. In fact I had already proven him > wrong by importing a DLL multiple times. If it can be done for > Python25.dll, it can be done for any other DLL as well - including .pyd > files - in exactly the same way. Thus what remains is to change Python's > dynamic loader to use the same "copy and import" scheme. This can either > be done by changing Python's C code, or (at least on Windows) to redirect > the LoadLibrary API call from kernel32.dll to a custom DLL. Both a quite > easy and requires minimal C coding. > > Thus it is quite easy to make multiple, independent Python interpreters > live isolated lives in the same process. As opposed to multiple processes, > they can communicate without involving any IPC. It would also be possible > to design proxy objects allowing one interpreter access to an object in > another. Immutable object such as strings would be particularly easy to > share. > > This very simple scheme should allow parallel processing with Python > similar to how it's done in Erlang, without the GIL getting in our way. At > least on Windows this can be done without touching the CPython source at > all. I am not sure about Linux though. I may be necessary to patch the > CPython source to make it work there. 
> > > Sturla Molden > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From dwf at cs.toronto.edu Thu Feb 19 03:15:55 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 19 Feb 2009 03:15:55 -0500 Subject: [SciPy-user] "clustergrams"/hierarchical clustering heat maps In-Reply-To: References: <8E0AFD62-64B7-435F-B80F-298C702BF771@cs.toronto.edu> Message-ID: <5A54FE63-4ED5-4DC7-8D2A-743B8BDD734D@cs.toronto.edu> On 3-Feb-09, at 9:43 AM, Zachary Pincus wrote: > Cluster 3 is a bit annoying to one used to scripting analyses (lots of > GUI button-pressing), but there's also a python library. Or you could > just scrutinize the output format (it barfs out a few text files) and > use your own clustering tools. TreeView then accepts these text files > and lets you manipulate the heatmap / dendrograms (e.g. flipping nodes > to get visually better results). You can then export to PS or other > formats. (The PS output is pretty clean, so you can edit in > Illustrator or whatnot easily.) Thanks for the tip. I've since downloaded and played with the both of them, I think I will try and grok the output format of Cluster 3 and see if I can't write a function to dump linkage trees from scipy.cluster.hierarchy to this (one of those?) formats. Cheers, David From peter.cimermancic at gmail.com Thu Feb 19 04:25:38 2009 From: peter.cimermancic at gmail.com (=?UTF-8?Q?Peter_Cimerman=C4=8Di=C4=8D?=) Date: Thu, 19 Feb 2009 10:25:38 +0100 Subject: [SciPy-user] integrate.odeint problem In-Reply-To: <200902161547.52998.jr@sun.ac.za> References: <18d53ca60902160416y6b6b6527nc6406a1def5e1644@mail.gmail.com> <200902161547.52998.jr@sun.ac.za> Message-ID: <18d53ca60902190125t1df3b0beya7715adbd8d75be9@mail.gmail.com> Thank you, Johann. Now it works. However, results, compared to those gotten from Jarnac, are quite different (initial slope or steady state value differ two times). Do you know, is this "normal" (since I used modified Jarnac script directly I can exclude transcriptional mistakes)? I will try with PySCes, as well. Regards, Peter 2009/2/16 Johann Rohwer > Hi Peter > > Try increasing the parameter mxstep (default is 500) in the odeint > function call to a higher value (such as 1000 or 3000). We are using > SciPy's odeint in our Python based systems biology simulation > software PySCeS (http://pysces.sf.net) and have found quite regularly > for biological models that an mxstep value of 500 is insufficient. > Incidentally, when PySCeS encounters this situation, it automatically > increments the mxstep value and re-simulates.... > > In general, we have been able to get good agreement with PySCeS and > Jarnac simulating the same model, so I'd be surprised if this does > not fix it. > > Regards > Johann > > On Monday, 16 February 2009, Peter Cimerman?i? wrote: > > Hi! > > > > Using 60 differential equations, I am trying to simulate some > > biological process. After running the script, I've got next > > message: > > > > lsoda-- at current t (=r1), mxstep (=i1) steps > > taken on this call before reaching tout > > In above message, I1 = 500 > > In above message, R1 = 0.857...E+01 > > Excess work done on this call (Perhaps wrong Dfun type). > > Run with full_output = 1 to get quantitative information. > > > > After running with full_output = 1, additional information was: > > KeyError: 0. 
> > > > The same equations and parameters were run in Jarnac (simulation > > software) and I've got correct results, assuming equations and > > parameters are right. What else could go wrong to produce above > > error message? > > > > Thank you in advance, > > > > Peter > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dharhas.Pothina at twdb.state.tx.us Thu Feb 19 12:09:10 2009 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 19 Feb 2009 11:09:10 -0600 Subject: [SciPy-user] Remove duplicate points from xyz data. Message-ID: <499D3DD60200009B0001B007@GWWEB.twdb.state.tx.us> Hi, I've been trying to use the cookbook instructions for plotting irregular data and from what I can tell the delaunay triangulation is failing because my dataset has duplicate points. I tried to follow PierreGM's technique for removing duplicate points from an archived discussion: > For example: > z=np.array([[1,1],[1,2],[1,2],[1,3]]) > zr=z.view([('a',int),('b',int)]) > zs = numpy.unique(zr).view((int,2)) And I can use this to remove duplicate x,y points but I'm unclear how to remove the corresponding z values. To be clear. I have three arrays x,y,z . I need to remove rows that have duplicate (x,y) coordinates. These rows may or may not have the same z values in them. - dharhas ps. Once I get this working is there a way to update the cookbook page to indicate that duplicate points need to be removed before it will work. From cmac at mit.edu Thu Feb 19 16:03:21 2009 From: cmac at mit.edu (Christopher MacMinn) Date: Thu, 19 Feb 2009 16:03:21 -0500 Subject: [SciPy-user] SciPy-user Digest, Vol 65, Issue 50 In-Reply-To: References: Message-ID: <95da30590902191303r6354f758ib8fcee0722aa86be@mail.gmail.com> > > On Wed, Jan 21, 2009 at 1:22 PM, Rob Clewley > wrote: > >> odeint is a wrapper for the LSODA solver in the Fortran ODEPACK > >> library. This library also includes LSODAR, which is LSODA with > >> root-finding (aka event detection). Does anyone want to take a stab at > >> wrapping LSODAR? The wrapping of LSODA with odeint provides a good > >> starting point, and an ODE solver with root-finding would be a great > >> addition to SciPy. > >> > >> Warren > > > > Ryan Gutenkunst already wrapped it while working on the SloppyCell > package. See > > > > http://osdir.com/ml/python.scientific.devel/2005-07/msg00028.html > > > > with a link there to the code. I've never tried it myself or even > > looked at it, FYI :) > > -Rob > > > > PS There's some mention of Ryan's lsodar.pyf in the trunk of scipy SVN, as > per > > > projects.scipy.org/scipy/scipy/browser/trunk/scipy/integrate/setup.py?rev=4763 > > but I don't know if it's still there. If it is, is the associated pyd > now shipped with Scipy? I haven't installed a new version for months. > -Rob Sorry for letting this drop for... err... awhile. I don't speak "developer", but I take this to mean that this functionality is available in a few other python packages (PyDSTool, CVODE via PySundials), and that there may or may not be pieces of lsodar already in scipy. As a MATLAB to python/numpy/scipy convert who really still has one foot on each log, as it were, having this 'root finding' functionality readily available in python/scipy/numpy would be a big plus. Consider this one vote for finishing the wrapping of lsodar. 
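Until LSODAR is wrapped, a crude stopgap is to integrate on a fine grid and locate the event afterwards. A sketch (the oscillator right-hand side and the event condition are made up for illustration):

import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d
from scipy.optimize import brentq

def rhs(y, t):
    return [y[1], -y[0]]             # simple harmonic oscillator

t = np.linspace(0, 10, 1000)
sol = odeint(rhs, [1.0, 0.0], t)

# event: first zero crossing of y[0]; bracket it on the grid, then refine
y0 = sol[:, 0]
i = np.where(np.diff(np.sign(y0)) != 0)[0][0]
g = interp1d(t, y0, kind='cubic')
t_event = brentq(g, t[i], t[i + 1])
print t_event                        # should be close to pi/2

This is no substitute for root finding inside the integrator (the solver can step right over a sharp event), but it covers many everyday cases.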
Of course, I should not say such things without volunteering to help... I would be happy to contribute, but I don't even know where to start. Best, Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Thu Feb 19 17:27:26 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 19 Feb 2009 17:27:26 -0500 Subject: [SciPy-user] SciPy-user Digest, Vol 65, Issue 50 In-Reply-To: <95da30590902191303r6354f758ib8fcee0722aa86be@mail.gmail.com> References: <95da30590902191303r6354f758ib8fcee0722aa86be@mail.gmail.com> Message-ID: On 19-Feb-09, at 4:03 PM, Christopher MacMinn wrote: > > As a MATLAB to python/numpy/scipy convert who really still has one > foot on each log, as it were, having this 'root finding' > functionality readily available in python/scipy/numpy would be a big > plus. Consider this one vote for finishing the wrapping of lsodar. > Of course, I should not say such things without volunteering to > help... I would be happy to contribute, but I don't even know where > to start. Documentation and testing are good places. But that might necessitate the wrapping first being done ;) David From c.j.lee at tnw.utwente.nl Fri Feb 20 01:06:30 2009 From: c.j.lee at tnw.utwente.nl (Chris Lee) Date: Fri, 20 Feb 2009 07:06:30 +0100 Subject: [SciPy-user] Parallel processing with Python In-Reply-To: <499CFFF6.9050901@astraw.com> References: <499CFFF6.9050901@astraw.com> Message-ID: <6A308240-C031-4666-A36F-AEAF49FD8621@tnw.utwente.nl> You might also want to visit the folk at parallelpython who are doing something superficially similar. They pickle up the appropriate functions and data. The pp server then starts up an sh session that calls python with the pickled functions and data. The advantage of this is that it works across multiple platforms and computers so you can really scale your computational resources. The draw back is that the programmer must determine how best to make the code parallel and then call the pp server themselves. It is also limited in that data sharing is nigh on impossible. Cheers Chris On Feb 19, 2009, at 7:45 AM, Andrew Straw wrote: > Hi Sturla, I think this is a very interesting idea. I once ran into > weird and mysterious issues with dlopen() when trying to do similar on > linux but I hadn't thought of the rename-the-shared-library trick. > If we > invented some kind of syntax for software transactional memory > (STM), we > might really be playing with fire. I might give this a try on a couple > problems I'm working on (which generally have more to do with making > complicated stuff happen quickly -- with low latency -- than crunching > tons of data). Anyhow, please keep us informed of further progress! > > -Andrew > > Sturla Molden wrote: >> I know this is not directly related to SciPy, but it may be of >> interest to >> some subscribers to this list. >> >> About a year ago, I posted a scheme to comp.lang.python describing >> how to >> use isolated interpreters and threads to circumvent the GIL on SMPs: >> >> http://groups.google.no/group/comp.lang.python/msg/0351c532aad97c5e?hl=no&dmode=source >> >> One interpreter per thread is how tcl work. Erlang also uses isolated >> threads that only communicate through messages (as opposed to shared >> objects). "Appdomains" are also available in the .NET framework, >> and in >> Java as "Java isolates". They are potentially very useful as >> multicore >> CPUs become abundant. 
They allow one process to run one independent >> Python >> interpreter on each available CPU core. >> >> In Python, "appdomains" can be created by embedding the Python >> interpreter >> multiple times in a process, and associating each interpreter with a >> thread. For this to work, we have to make multiple copies of the >> Python >> DLL and rename them (e.g. Python25-0.dll, Python25-1.dll, >> Python25-2.dll, etc.) Otherwise the dynamic loader will just return a >> handle to the already imported DLL. As DLLs can be accessed with >> ctypes, >> we don't even have to program a line of C to do this. we can start >> up a >> Python interpreter and use ctypes to embed more interpreters >> into it, associating each interpreter with its own thread. ctypes >> takes >> care of releasing the GIL in the parent interpreter, so calls to >> these >> sub-interpreters become asynchronous. I had a mock-up of this scheme >> working. Martin L?wis replied he doubted this would work, and >> pointed out >> that Python extension libraries (.pyd files) are DLLs as well. They >> would >> only be imported once, and their global states would thus crash, thus >> producing havoc: >> >> http://groups.google.no/group/comp.lang.python/msg/0a7a22910c1d5bf5?hl=no&dmode=source >> >> He was right, of course, but also wrong. In fact I had already >> proven him >> wrong by importing a DLL multiple times. If it can be done for >> Python25.dll, it can be done for any other DLL as well - >> including .pyd >> files - in exactly the same way. Thus what remains is to change >> Python's >> dynamic loader to use the same "copy and import" scheme. This can >> either >> be done by changing Python's C code, or (at least on Windows) to >> redirect >> the LoadLibrary API call from kernel32.dll to a custom DLL. Both a >> quite >> easy and requires minimal C coding. >> >> Thus it is quite easy to make multiple, independent Python >> interpreters >> live isolated lives in the same process. As opposed to multiple >> processes, >> they can communicate without involving any IPC. It would also be >> possible >> to design proxy objects allowing one interpreter access to an >> object in >> another. Immutable object such as strings would be particularly >> easy to >> share. >> >> This very simple scheme should allow parallel processing with Python >> similar to how it's done in Erlang, without the GIL getting in our >> way. At >> least on Windows this can be done without touching the CPython >> source at >> all. I am not sure about Linux though. I may be necessary to patch >> the >> CPython source to make it work there. 
>> >> >> Sturla Molden >> >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user *************************************************** Chris Lee Laser Physics and Nonlinear Optics Group MESA+ Research Institute for Nanotechnology University of Twente Phone: ++31 (0)53 489 3968 fax: ++31 (0)53 489 1102 *************************************************** From python-ml at nn7.de Fri Feb 20 04:02:22 2009 From: python-ml at nn7.de (Soeren Sonnenburg) Date: Fri, 20 Feb 2009 10:02:22 +0100 Subject: [SciPy-user] sparse matrices again In-Reply-To: References: <1234356072.5642.16.camel@localhost> Message-ID: <1235120542.4216.15.camel@localhost> On Wed, 2009-02-11 at 09:40 -0500, Nathan Bell wrote: > On Wed, Feb 11, 2009 at 7:41 AM, Soeren Sonnenburg wrote: > > > > is it somehow possible to interface to the C API of scipy's spars > > matrices? I know numpy does not have sparse matrix support but scipy > > does (at least it can be used from the python side). > > > > If it is not too unstable then I would invest some time to get some swig > > typemaps to connect to it. > > > > The interface is not guaranteed to be stable, but you can access the > C++ functions that implement much of scipy.sparse through > scipy.sparse.sparsetools. > > What do you want to do exactly? Just a pointer to the standard ccs format (similar to what I have in octave) to be able to exchange data with shogun (www.shogun-toolbox.org). Soeren From dannoritzer at web.de Fri Feb 20 08:10:01 2009 From: dannoritzer at web.de (=?ISO-8859-1?Q?G=FCnter_Dannoritzer?=) Date: Fri, 20 Feb 2009 14:10:01 +0100 Subject: [SciPy-user] remezord() ticket #475 Message-ID: <499EABA9.9000001@web.de> Hi, I am trying to implement a FIR filter with SciPy and for the design would first like to estimate the order I will need. I noticed that there is no remezord() function like in Matlab, but it has been submitted as patch with ticket #475. However, that function is not scheduled to be included in any of the next releases. The ticket history shows that it was scheduled for release 0.6, then 0.7, and now unspecified. Unfortunately the history does not show why this has been delayed so much. Anyone knows why the function will not be included soon? For the time being I will just use the patch as an extra module, just would be good to have that function finally in the standard package. Thanks for the help. Cheers, Guenter From stefan at sun.ac.za Fri Feb 20 09:05:48 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 20 Feb 2009 16:05:48 +0200 Subject: [SciPy-user] remezord() ticket #475 In-Reply-To: <499EABA9.9000001@web.de> References: <499EABA9.9000001@web.de> Message-ID: <9457e7c80902200605u4ad666bcj5abc6cd0f8334456@mail.gmail.com> Hi G?nter 2009/2/20 G?nter Dannoritzer : > I am trying to implement a FIR filter with SciPy and for the design > would first like to estimate the order I will need. I noticed that there > is no remezord() function like in Matlab, but it has been submitted as > patch with ticket #475. However, that function is not scheduled to be > included in any of the next releases. Would you like to help? 
We need to reformat the docstring according to the NumPy standard, and add tests to make sure that all those new functions do what they're supposed to. If you can provide the tests, that would help a great deal! Kind regards St?fan From scott.sinclair.za at gmail.com Fri Feb 20 09:50:15 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 20 Feb 2009 16:50:15 +0200 Subject: [SciPy-user] remezord() ticket #475 In-Reply-To: <9457e7c80902200605u4ad666bcj5abc6cd0f8334456@mail.gmail.com> References: <499EABA9.9000001@web.de> <9457e7c80902200605u4ad666bcj5abc6cd0f8334456@mail.gmail.com> Message-ID: <6a17e9ee0902200650r1b92b81eu79793f9e4aaae4c3@mail.gmail.com> > 2009/2/20 St?fan van der Walt : > Hi G?nter > > 2009/2/20 G?nter Dannoritzer : >> I am trying to implement a FIR filter with SciPy and for the design >> would first like to estimate the order I will need. I noticed that there >> is no remezord() function like in Matlab, but it has been submitted as >> patch with ticket #475. However, that function is not scheduled to be >> included in any of the next releases. > > Would you like to help? We need to reformat the docstring according > to the NumPy standard, and add tests to make sure that all those new > functions do what they're supposed to. If you can provide the tests, > that would help a great deal! There are some guidelines on writing tests here: http://projects.scipy.org/scipy/numpy/wiki/TestingGuidelines If you don't have nose installed, you could make tests that run with the standard Python module unittest (http://docs.python.org/library/unittest.html). Nose will pick these up automatically and has many extra features that make it more attractive than unittest. Cheers, Scott From wnbell at gmail.com Fri Feb 20 15:05:36 2009 From: wnbell at gmail.com (Nathan Bell) Date: Fri, 20 Feb 2009 15:05:36 -0500 Subject: [SciPy-user] sparse matrices again In-Reply-To: <1235120542.4216.15.camel@localhost> References: <1234356072.5642.16.camel@localhost> <1235120542.4216.15.camel@localhost> Message-ID: On Fri, Feb 20, 2009 at 4:02 AM, Soeren Sonnenburg wrote: >> >> What do you want to do exactly? > > Just a pointer to the standard ccs format (similar to what I have in > octave) to be able to exchange data with shogun > (www.shogun-toolbox.org). > You can access those arrays in Python: A = csc_matrix( ... ) A.indptr # pointer array A.indices # indices array A.data # nonzero values array -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From josef.pktd at gmail.com Fri Feb 20 17:09:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 Feb 2009 17:09:04 -0500 Subject: [SciPy-user] problem with spatial.kdtree.sparse_distance_matrix Message-ID: <1cd32cbb0902201409g4e7c1fe3i51e902af89146718@mail.gmail.com> I would like to get the distance_matrix of all point in a 2d array, but it looks like kdtree cannot create a sparse distance matrix with itself. Is this intentional, a bug, or am I doing something wrong? Using a small distortion in the data works. I followed the example in the testfile (BTW: in class test_sparse_distance_matrix, M is often empty in the examples I tried, with given r=0.3). 
Josef >>> from scipy import spatial as ssp >>> r = 1 >>> xs2 = np.random.randn(4,50) >>> T1 = ssp.KDTree(xs2,leafsize=2) >>> M = T1.sparse_distance_matrix(T1, r) Traceback (most recent call last): File "", line 1, in M = T1.sparse_distance_matrix(T1, r) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 628, in sparse_distance_matrix other.tree, Rectangle(other.maxes, other.mins)) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 611, in traverse result[i,j] = d File "\Programs\Python25\Lib\site-packages\scipy\sparse\dok.py", line 222, in __setitem__ del self[(i,j)] KeyError: (0, 0) >>> T1b = ssp.KDTree(xs2.copy(),leafsize=2) >>> M = T1.sparse_distance_matrix(T1b, r) Traceback (most recent call last): File "", line 1, in M = T1.sparse_distance_matrix(T1b, r) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 628, in sparse_distance_matrix other.tree, Rectangle(other.maxes, other.mins)) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 611, in traverse result[i,j] = d File "\Programs\Python25\Lib\site-packages\scipy\sparse\dok.py", line 222, in __setitem__ del self[(i,j)] KeyError: (0, 0) using a small distortion works: >>> T1b = ssp.KDTree(xs2.copy()+1e-8,leafsize=2) >>> M = T1.sparse_distance_matrix(T1b, r) >>> len(M) 110 From josef.pktd at gmail.com Fri Feb 20 17:26:41 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 Feb 2009 17:26:41 -0500 Subject: [SciPy-user] problem with spatial.kdtree.sparse_distance_matrix In-Reply-To: <1cd32cbb0902201409g4e7c1fe3i51e902af89146718@mail.gmail.com> References: <1cd32cbb0902201409g4e7c1fe3i51e902af89146718@mail.gmail.com> Message-ID: <1cd32cbb0902201426y505121ban44b205af0be14c81@mail.gmail.com> On Fri, Feb 20, 2009 at 5:09 PM, wrote: > I would like to get the distance_matrix of all point in a 2d array, > but it looks like kdtree cannot create a sparse distance matrix with > itself. Is this intentional, a bug, or am I doing something wrong? > Using a small distortion in the data works. > I followed the example in the testfile (BTW: in class > test_sparse_distance_matrix, M is often empty in the examples I tried, > with given r=0.3). 
> > Josef > The problem is more general: KDTree.sparse_distance_matrix fails when there are zero distance points in the two trees, even if they are otherwise different. Example: >>> xs3 = np.random.randint(0,3,200).reshape(50,4) >>> xs4 = np.random.randint(0,3,200).reshape(50,4) >>> ds34 = ssp.distance_matrix(xs3,xs4) >>> np.min(ds34) 0.0 >>> T3 = ssp.KDTree(xs3,leafsize=2) >>> T4 = ssp.KDTree(xs4,leafsize=2) >>> M = T3.sparse_distance_matrix(T4, r) Traceback (most recent call last): File "", line 1, in M = T3.sparse_distance_matrix(T4, r) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 628, in sparse_distance_matrix other.tree, Rectangle(other.maxes, other.mins)) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 623, in traverse traverse(node1.less,less1,node2.less,less2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 618, in traverse traverse(node1.less,less,node2,rect2) File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", line 611, in traverse result[i,j] = d File "\Programs\Python25\Lib\site-packages\scipy\sparse\dok.py", line 222, in __setitem__ del self[(i,j)] KeyError: (7, 19) >>> From python-ml at nn7.de Sat Feb 21 11:23:55 2009 From: python-ml at nn7.de (Soeren Sonnenburg) Date: Sat, 21 Feb 2009 17:23:55 +0100 Subject: [SciPy-user] sparse matrices again In-Reply-To: References: <1234356072.5642.16.camel@localhost> <1235120542.4216.15.camel@localhost> Message-ID: <1235233435.8843.31.camel@localhost> On Fri, 2009-02-20 at 15:05 -0500, Nathan Bell wrote: > On Fri, Feb 20, 2009 at 4:02 AM, Soeren Sonnenburg wrote: > >> > >> What do you want to do exactly? > > > > Just a pointer to the standard ccs format (similar to what I have in > > octave) to be able to exchange data with shogun > > (www.shogun-toolbox.org). > > > > You can access those arrays in Python: > > A = csc_matrix( ... ) > A.indptr # pointer array > A.indices # indices array > A.data # nonzero values array I need to pass A to C++ code, so I need access to sparse arrays from C/C ++. Soeren. 
From schugschug at gmail.com Sat Feb 21 11:38:50 2009 From: schugschug at gmail.com (Eric Schug) Date: Sat, 21 Feb 2009 11:38:50 -0500 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <4984F58C.5070605@gmail.com> References: <4984F58C.5070605@gmail.com> Message-ID: <49A02E1A.5050703@gmail.com> Eric Schug wrote: > Is there strong interest in automating matlab to numpy conversion? > > I have a working version of a matlab to python translator. > It allows translation of matlab scripts into numpy constructs, > supporting most of the matlab language. The parser is nearly > complete. Most of the remaining work involves providing a robust > translation. Such as > * making sure that copies on assign are done when needed. > * correct indexing a(:) becomes a.flatten(1) when on the left hand > side (lhs) of equals > and a[:] when on the right hand side > > > I've seen a few projects attempt to do this, but for one reason or > another have stopped it. > > For those interested, my new project has been uploaded sourceforge at, http://sourceforge.net/projects/libermate/ Latest version now supports simple command expressions (e.g hold on) From dmitrey15 at ukr.net Sat Feb 21 11:49:52 2009 From: dmitrey15 at ukr.net (Dmitrey) Date: Sat, 21 Feb 2009 18:49:52 +0200 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <49A02E1A.5050703@gmail.com> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> Message-ID: <49A030B0.5070202@ukr.net> Thank you Eric, it's very interesting, certainly it should be mentioned in scipy.org topical software section, also, I guess, it's worth mention in such mail lists as http://groups.google.com/group/comp.soft-sys.matlab, http://groups.google.com/group/comp.lang.python.announce, http://groups.google.com/group/sci.op-research, http://groups.google.com/group/sci.math.num-analysis Regards, Dmitrey Eric Schug wrote: > Eric Schug wrote: > > For those interested, my new project has been uploaded sourceforge at, > http://sourceforge.net/projects/libermate/ > > Latest version now supports simple command expressions (e.g hold on) > > From peridot.faceted at gmail.com Sat Feb 21 12:03:49 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 21 Feb 2009 12:03:49 -0500 Subject: [SciPy-user] problem with spatial.kdtree.sparse_distance_matrix In-Reply-To: <1cd32cbb0902201426y505121ban44b205af0be14c81@mail.gmail.com> References: <1cd32cbb0902201409g4e7c1fe3i51e902af89146718@mail.gmail.com> <1cd32cbb0902201426y505121ban44b205af0be14c81@mail.gmail.com> Message-ID: 2009/2/20 : > On Fri, Feb 20, 2009 at 5:09 PM, wrote: >> I would like to get the distance_matrix of all point in a 2d array, >> but it looks like kdtree cannot create a sparse distance matrix with >> itself. Is this intentional, a bug, or am I doing something wrong? >> Using a small distortion in the data works. >> I followed the example in the testfile (BTW: in class >> test_sparse_distance_matrix, M is often empty in the examples I tried, >> with given r=0.3). >> >> Josef >> > > The problem is more general: > KDTree.sparse_distance_matrix fails when there are zero distance > points in the two trees, even if they are otherwise different. This was actually a bug in dok_matrix (setting an already-zero element to zero failed), which is now fixed in SVN. 
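(For the record, a minimal trigger seems to be something along these lines: from scipy.sparse import dok_matrix; d = dok_matrix((3, 3)); d[0, 0] = 0.0 -- assigning zero to an entry that is not stored went down the delete branch of __setitem__ and raised the KeyError you saw.)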
Anne > Example: > >>>> xs3 = np.random.randint(0,3,200).reshape(50,4) >>>> xs4 = np.random.randint(0,3,200).reshape(50,4) >>>> ds34 = ssp.distance_matrix(xs3,xs4) >>>> np.min(ds34) > 0.0 >>>> T3 = ssp.KDTree(xs3,leafsize=2) >>>> T4 = ssp.KDTree(xs4,leafsize=2) >>>> M = T3.sparse_distance_matrix(T4, r) > Traceback (most recent call last): > File "", line 1, in > M = T3.sparse_distance_matrix(T4, r) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 628, in sparse_distance_matrix > other.tree, Rectangle(other.maxes, other.mins)) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 623, in traverse > traverse(node1.less,less1,node2.less,less2) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 623, in traverse > traverse(node1.less,less1,node2.less,less2) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 623, in traverse > traverse(node1.less,less1,node2.less,less2) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 623, in traverse > traverse(node1.less,less1,node2.less,less2) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 623, in traverse > traverse(node1.less,less1,node2.less,less2) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 618, in traverse > traverse(node1.less,less,node2,rect2) > File "C:\Josef\_progs\building\scipy\scipy-trunk-new-r5551\dist\scipy-0.8.0.dev5551.win32\Programs\Python25\Lib\site-packages\scipy\spatial\kdtree.py", > line 611, in traverse > result[i,j] = d > File "\Programs\Python25\Lib\site-packages\scipy\sparse\dok.py", > line 222, in __setitem__ > del self[(i,j)] > KeyError: (7, 19) >>>> > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From stefan at sun.ac.za Sat Feb 21 12:19:28 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 21 Feb 2009 19:19:28 +0200 Subject: [SciPy-user] sparse matrices again In-Reply-To: <1235233435.8843.31.camel@localhost> References: <1234356072.5642.16.camel@localhost> <1235120542.4216.15.camel@localhost> <1235233435.8843.31.camel@localhost> Message-ID: <9457e7c80902210919n229e6ad3i9f0de4bdde822838@mail.gmail.com> 2009/2/21 Soeren Sonnenburg : >> >> What do you want to do exactly? >> > >> > Just a pointer to the standard ccs format (similar to what I have in >> > octave) to be able to exchange data with shogun >> > (www.shogun-toolbox.org). 
The indices, data values and index pointers are stored as ndarrays, so you can access those pointer locations: x.indices.ctypes.data Regards St?fan From josef.pktd at gmail.com Sat Feb 21 12:38:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Feb 2009 12:38:09 -0500 Subject: [SciPy-user] problem with spatial.kdtree.sparse_distance_matrix In-Reply-To: References: <1cd32cbb0902201409g4e7c1fe3i51e902af89146718@mail.gmail.com> <1cd32cbb0902201426y505121ban44b205af0be14c81@mail.gmail.com> Message-ID: <1cd32cbb0902210938w2671699fh50edc89d5f833911@mail.gmail.com> On Sat, Feb 21, 2009 at 12:03 PM, Anne Archibald wrote: > 2009/2/20 : >> On Fri, Feb 20, 2009 at 5:09 PM, wrote: >>> I would like to get the distance_matrix of all point in a 2d array, >>> but it looks like kdtree cannot create a sparse distance matrix with >>> itself. Is this intentional, a bug, or am I doing something wrong? >>> Using a small distortion in the data works. >>> I followed the example in the testfile (BTW: in class >>> test_sparse_distance_matrix, M is often empty in the examples I tried, >>> with given r=0.3). >>> >>> Josef >>> >> >> The problem is more general: >> KDTree.sparse_distance_matrix fails when there are zero distance >> points in the two trees, even if they are otherwise different. > > This was actually a bug in dok_matrix (setting an already-zero element > to zero failed), which is now fixed in SVN. > > Anne > thank you, I will try it out soon. Josef From nwagner at iam.uni-stuttgart.de Sat Feb 21 12:39:22 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Sat, 21 Feb 2009 18:39:22 +0100 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <49A02E1A.5050703@gmail.com> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> Message-ID: On Sat, 21 Feb 2009 11:38:50 -0500 Eric Schug wrote: > Eric Schug wrote: >> Is there strong interest in automating matlab to numpy >>conversion? >> >> I have a working version of a matlab to python >>translator. >> It allows translation of matlab scripts into numpy >>constructs, >> supporting most of the matlab language. The parser is >>nearly >> complete. Most of the remaining work involves providing >>a robust >> translation. Such as >> * making sure that copies on assign are done when >>needed. >> * correct indexing a(:) becomes a.flatten(1) when on >>the left hand >> side (lhs) of equals >> and a[:] when on the right hand side >> >> >> I've seen a few projects attempt to do this, but for one >>reason or >> another have stopped it. >> >> >For those interested, my new project has been uploaded >sourceforge at, > http://sourceforge.net/projects/libermate/ > > Latest version now supports simple command expressions >(e.g hold on) > > > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user Hi Eric, You might be interested in some tests. 
Different Matlab Toolboxes are available through http://www.maths.manchester.ac.uk/~higham/mg/ Cheers, Nils From josef.pktd at gmail.com Sat Feb 21 14:28:51 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Feb 2009 14:28:51 -0500 Subject: [SciPy-user] math operations on sparse matrices Message-ID: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> I have a (dok) sparse distance matrix and I would like to take the exponential of the distances The direct method doesn't work >>> Mexp = np.exp(-M) Traceback (most recent call last): File "", line 1, in Mexp = np.exp(-M) AttributeError: exp The following seems to work. What is the recommended way for doing these transformations? I didn't see anything in the docs. >>> T1 = ssp.KDTree(xs3[::k,:],leafsize=2) >>> M = T1.sparse_distance_matrix(T1, r) >>> Mexp = M.copy() >>> for k,v in M.items(): Mexp[k]=np.exp(-v) thanks, Josef From opossumnano at gmail.com Sat Feb 21 14:33:11 2009 From: opossumnano at gmail.com (Tiziano Zito) Date: Sat, 21 Feb 2009 20:33:11 +0100 Subject: [SciPy-user] Matlab to Python compiler was Re: LiberMate In-Reply-To: References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> Message-ID: <20090221193310.GA2066@localhost> You may be all interested in http://ompc.juricap.com/ it seems very promising too. tiziano On Sat 21 Feb, 18:39, Nils Wagner wrote: > On Sat, 21 Feb 2009 11:38:50 -0500 > Eric Schug wrote: > > Eric Schug wrote: > >> Is there strong interest in automating matlab to numpy > >>conversion? > >> > >> I have a working version of a matlab to python > >>translator. > >> It allows translation of matlab scripts into numpy > >>constructs, > >> supporting most of the matlab language. The parser is > >>nearly > >> complete. Most of the remaining work involves providing > >>a robust > >> translation. Such as > >> * making sure that copies on assign are done when > >>needed. > >> * correct indexing a(:) becomes a.flatten(1) when on > >>the left hand > >> side (lhs) of equals > >> and a[:] when on the right hand side > >> > >> > >> I've seen a few projects attempt to do this, but for one > >>reason or > >> another have stopped it. > >> > >> > >For those interested, my new project has been uploaded > >sourceforge at, > > http://sourceforge.net/projects/libermate/ > > > > Latest version now supports simple command expressions > >(e.g hold on) > > > > > > > > _______________________________________________ > > SciPy-user mailing list > > SciPy-user at scipy.org > > http://projects.scipy.org/mailman/listinfo/scipy-user > > Hi Eric, > > You might be interested in some tests. > > Different Matlab Toolboxes are available through > > http://www.maths.manchester.ac.uk/~higham/mg/ > > Cheers, > > Nils > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user From shchelokovskyy at gmail.com Sat Feb 21 14:40:47 2009 From: shchelokovskyy at gmail.com (Pavlo Shchelokovskyy) Date: Sat, 21 Feb 2009 20:40:47 +0100 Subject: [SciPy-user] error estimate in stats.linregress Message-ID: Hi all, I was working with linear regression in scipy and met some problems with value of standard error of the estimate returned by scipy.stats.linregress() function. I could not compare it to similar outputs of other linear regression routines (for example in Origin), so I took a look in the source (stats.py). 
In the source it is defined as sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) where r is correlation coefficient, df is degrees of freedom (N-2) and ss() is sum of squares of elements. After digging through literature the only formula looking somewhat the same was found to be stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) which gives the same result as a standard definition (in notation of the source of linregress) stderrest = np.sqrt(ss(y-slope*x-intercept)/df) but the output of linregress is different. I humbly suppose this is a bug, but maybe somebody could explain me what is it if I'm wrong... Pavlo. From dannoritzer at web.de Sat Feb 21 14:48:59 2009 From: dannoritzer at web.de (=?ISO-8859-1?Q?G=FCnter_Dannoritzer?=) Date: Sat, 21 Feb 2009 20:48:59 +0100 Subject: [SciPy-user] remezord() ticket #475 In-Reply-To: <6a17e9ee0902200650r1b92b81eu79793f9e4aaae4c3@mail.gmail.com> References: <499EABA9.9000001@web.de> <9457e7c80902200605u4ad666bcj5abc6cd0f8334456@mail.gmail.com> <6a17e9ee0902200650r1b92b81eu79793f9e4aaae4c3@mail.gmail.com> Message-ID: <49A05AAB.6070708@web.de> Scott Sinclair wrote: >> 2009/2/20 St?fan van der Walt : >> Hi G?nter ... >> Would you like to help? We need to reformat the docstring according >> to the NumPy standard, and add tests to make sure that all those new >> functions do what they're supposed to. If you can provide the tests, >> that would help a great deal! > > There are some guidelines on writing tests here: > > http://projects.scipy.org/scipy/numpy/wiki/TestingGuidelines > > If you don't have nose installed, you could make tests that run with > the standard Python module unittest > (http://docs.python.org/library/unittest.html). Nose will pick these > up automatically and has many extra features that make it more > attractive than unittest. St?fan and Scott, thanks for the information. I will give it a try to create some test cases. I wrote Lev Givon, the author of those functions an email to discuss some ideas about testing. I was not sure whether he is still subscribed to this list and wrote him direct. I will read the test guidelines and come back if I have some questions about them. Guenter From fredmfp at gmail.com Sat Feb 21 14:57:19 2009 From: fredmfp at gmail.com (fred) Date: Sat, 21 Feb 2009 20:57:19 +0100 Subject: [SciPy-user] scipy.org issue? Message-ID: <49A05C9F.4000404@gmail.com> Hi all, On http://www.scipy.org/SciPy_packages, one can see a link to Signal packages, but... nothing behind... :-( The page does not exist. What's wrong? Cheers, -- Fred From josef.pktd at gmail.com Sat Feb 21 15:11:32 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Feb 2009 15:11:32 -0500 Subject: [SciPy-user] scipy.org issue? In-Reply-To: <49A05C9F.4000404@gmail.com> References: <49A05C9F.4000404@gmail.com> Message-ID: <1cd32cbb0902211211u19c0efb3h38195b48ee11d0a3@mail.gmail.com> On Sat, Feb 21, 2009 at 2:57 PM, fred wrote: > Hi all, > > On http://www.scipy.org/SciPy_packages, > one can see a link to Signal packages, but... > nothing behind... :-( > > The page does not exist. > > What's wrong? > > > Cheers, > > -- > Fred The new documentation is here: http://docs.scipy.org/doc/scipy/reference/ see tutorial for signal and reference. The documentation editor for writing new docs is here: http://docs.scipy.org/scipy/Front%20Page/ Josef From fredmfp at gmail.com Sat Feb 21 16:16:47 2009 From: fredmfp at gmail.com (fred) Date: Sat, 21 Feb 2009 22:16:47 +0100 Subject: [SciPy-user] scipy.org issue? 
In-Reply-To: <1cd32cbb0902211211u19c0efb3h38195b48ee11d0a3@mail.gmail.com> References: <49A05C9F.4000404@gmail.com> <1cd32cbb0902211211u19c0efb3h38195b48ee11d0a3@mail.gmail.com> Message-ID: <49A06F3F.8030707@gmail.com> josef.pktd at gmail.com a ?crit : > The new documentation is here: > http://docs.scipy.org/doc/scipy/reference/ see tutorial for signal > and reference. > The documentation editor for writing new docs is here: > http://docs.scipy.org/scipy/Front%20Page/ Bookmarked! ;-) Thanks. Cheers, -- Fred From stefan at sun.ac.za Sat Feb 21 18:43:41 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Feb 2009 01:43:41 +0200 Subject: [SciPy-user] math operations on sparse matrices In-Reply-To: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> References: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> Message-ID: <9457e7c80902211543l480bad71y1d5efbdfccbc2200@mail.gmail.com> 2009/2/21 : > I have a (dok) sparse distance matrix and I would like to take the > exponential of the distances I guess you could just as well switch to dense matrices then, since exp(0) is no longer zero. If you just want to change the non-zero values, you can use x.data = np.exp(x.data) Regards St?fan From stefan at sun.ac.za Sat Feb 21 18:47:54 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Feb 2009 01:47:54 +0200 Subject: [SciPy-user] remezord() ticket #475 In-Reply-To: <49A05AAB.6070708@web.de> References: <499EABA9.9000001@web.de> <9457e7c80902200605u4ad666bcj5abc6cd0f8334456@mail.gmail.com> <6a17e9ee0902200650r1b92b81eu79793f9e4aaae4c3@mail.gmail.com> <49A05AAB.6070708@web.de> Message-ID: <9457e7c80902211547k61084e83g347822127bd4027@mail.gmail.com> 2009/2/21 G?nter Dannoritzer : > St?fan and Scott, thanks for the information. I will give it a try to > create some test cases. Thanks for your help, G?nter! I look forward to integrating your tests. Regards St?fan From stefan at sun.ac.za Sat Feb 21 19:03:59 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Feb 2009 02:03:59 +0200 Subject: [SciPy-user] scipy.org issue? In-Reply-To: <49A05C9F.4000404@gmail.com> References: <49A05C9F.4000404@gmail.com> Message-ID: <9457e7c80902211603x4061b20bo21ddbd72ba02d9a9@mail.gmail.com> Hi Fred 2009/2/21 fred : > On http://www.scipy.org/SciPy_packages, > one can see a link to Signal packages, but... > nothing behind... :-( Thanks for this report. I've added links to the docs editor on that page, which should become permanent as soon as the server decides to respond. Cheers St?fan From hetland at tamu.edu Sat Feb 21 19:38:14 2009 From: hetland at tamu.edu (Rob Hetland) Date: Sat, 21 Feb 2009 18:38:14 -0600 Subject: [SciPy-user] Error in scipy.spatial.cKDTree Message-ID: I am getting a strange error in scipy.spatial.cKDtree: # make a sample array. fill_value = 0.5 x = np.random.rand(25, 50) x = x.clip(min=fill_value, max=inf) # Create (i, j) point arrays for good and bad data. # Bad data is marked by the fill_value, good data elsewhere. igood = np.vstack(np.where(x!=fill_value)).astype('d').T ibad = np.vstack(np.where(x==fill_value)).astype('d').T # create a tree for the bad points, the points to be filled tree = scipy.spatial.cKDTree(ibad) # get the four closest points to the bad points dist, iquery = tree.query(igood, k=4, p=2) np.any(dist == 0) I get True for the last command, which should not be. Other implementations of kdtree that I have, including regular KDTree. 
I'm not good enough at C to track the code down. -Rob ---- Rob Hetland, Associate Professor Dept. of Oceanography, Texas A&M University http://pong.tamu.edu/~rob phone: 979-458-0096, fax: 979-845-6331 From wnbell at gmail.com Sat Feb 21 20:20:21 2009 From: wnbell at gmail.com (Nathan Bell) Date: Sat, 21 Feb 2009 20:20:21 -0500 Subject: [SciPy-user] math operations on sparse matrices In-Reply-To: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> References: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> Message-ID: On Sat, Feb 21, 2009 at 2:28 PM, wrote: > I have a (dok) sparse distance matrix and I would like to take the > exponential of the distances > > The direct method doesn't work >>>> Mexp = np.exp(-M) > Traceback (most recent call last): > File "", line 1, in > Mexp = np.exp(-M) > AttributeError: exp > > The following seems to work. What is the recommended way for doing > these transformations? > I didn't see anything in the docs. > >>>> T1 = ssp.KDTree(xs3[::k,:],leafsize=2) >>>> M = T1.sparse_distance_matrix(T1, r) >>>> Mexp = M.copy() >>>> for k,v in M.items(): > Mexp[k]=np.exp(-v) > Does the following solve your problem? >>> M = M.tocsr() >>> M.data = np.exp(M.data) I suppose we could add a transform(A, fn) function to scipy.sparse to codify this sort of thing. Any suggestions? The difference here is that fn() would only be applied to the nonzero entries of A (and to the explicit zeros, if they are also present). I'd rather not encourage people to manipulate the underlying CSR/CSC representations directly, so tranform() would be a nice way to expose this. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From josef.pktd at gmail.com Sat Feb 21 20:33:48 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Feb 2009 20:33:48 -0500 Subject: [SciPy-user] math operations on sparse matrices In-Reply-To: <9457e7c80902211543l480bad71y1d5efbdfccbc2200@mail.gmail.com> References: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> <9457e7c80902211543l480bad71y1d5efbdfccbc2200@mail.gmail.com> Message-ID: <1cd32cbb0902211733qbc53f44va3679667351714f1@mail.gmail.com> On Sat, Feb 21, 2009 at 6:43 PM, St?fan van der Walt wrote: > 2009/2/21 : >> I have a (dok) sparse distance matrix and I would like to take the >> exponential of the distances > > I guess you could just as well switch to dense matrices then, since > exp(0) is no longer zero. > > If you just want to change the non-zero values, you can use > > x.data = np.exp(x.data) > > Regards > St?fan That was also my first guess, however >>> M <50x50 sparse matrix of type '' with 208 stored elements in Dictionary Of Keys format> >>> M.data Traceback (most recent call last): File "", line 1, in M.data File "\Programs\Python25\Lib\site-packages\scipy\sparse\base.py", line 429, in __getattr__ AttributeError: data not found For now this seems to work pretty fast Mexp = M.copy() Mexp.update(((k,exp(-v)) for k,v in M.iteritems())) But I'm not sure I know what I'm doing. What I'm trying to do is something like OLS with a sparse X'X matrix (kernel rigdge regression). The next step are: alpha = sparse.linalg.minres(M,y) yhat = M1.matmat(alpha[0]) >From the graphical results it seems to work, but since this is my first try with scipy.sparse.linalg, I'm not sure what the methods to in detail. 
Josef From wnbell at gmail.com Sat Feb 21 22:50:29 2009 From: wnbell at gmail.com (Nathan Bell) Date: Sat, 21 Feb 2009 22:50:29 -0500 Subject: [SciPy-user] math operations on sparse matrices In-Reply-To: <1cd32cbb0902211733qbc53f44va3679667351714f1@mail.gmail.com> References: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> <9457e7c80902211543l480bad71y1d5efbdfccbc2200@mail.gmail.com> <1cd32cbb0902211733qbc53f44va3679667351714f1@mail.gmail.com> Message-ID: On Sat, Feb 21, 2009 at 8:33 PM, wrote: > > That was also my first guess, however > >>>> M > <50x50 sparse matrix of type '' > with 208 stored elements in Dictionary Of Keys format> >>>> M.data > Traceback (most recent call last): > File "", line 1, in > M.data > File "\Programs\Python25\Lib\site-packages\scipy\sparse\base.py", > line 429, in __getattr__ > AttributeError: data not found > Note the .tocsr() in the first step: >>> M = M.tocsr() >>> M.data = np.exp(M.data) > From the graphical results it seems to work, but since this is my > first try with scipy.sparse.linalg, I'm not sure what the methods to > in detail. You'll want to convert to CSR (or CSC) before calling those solvers anyway. CSR/CSC offer much faster matrix-vector products, the main cost in most iterative methods. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From peridot.faceted at gmail.com Sat Feb 21 23:07:41 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 21 Feb 2009 23:07:41 -0500 Subject: [SciPy-user] Error in scipy.spatial.cKDTree In-Reply-To: References: Message-ID: 2009/2/21 Rob Hetland : > > I am getting a strange error in scipy.spatial.cKDtree: Oops! Fixed in SVN r5585. The error happens when the query array is not "contiguous"; the easiest way to trigger it is to do a query with a transposed array; the query coordinates will be scrambled. As a workaround, just apply np.ascontiguousarray() to any query. Anne > # make a sample array. > fill_value = 0.5 > x = np.random.rand(25, 50) > x = x.clip(min=fill_value, max=inf) > > # Create (i, j) point arrays for good and bad data. > # Bad data is marked by the fill_value, good data elsewhere. > igood = np.vstack(np.where(x!=fill_value)).astype('d').T > ibad = np.vstack(np.where(x==fill_value)).astype('d').T > > # create a tree for the bad points, the points to be filled > tree = scipy.spatial.cKDTree(ibad) > > # get the four closest points to the bad points > dist, iquery = tree.query(igood, k=4, p=2) > > np.any(dist == 0) > > > > > I get True for the last command, which should not be. Other > implementations of kdtree that I have, including regular KDTree. I'm > not good enough at C to track the code down. > > -Rob > > > > ---- > Rob Hetland, Associate Professor > Dept. 
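In your snippet below that would be, roughly:

    dist, iquery = tree.query(np.ascontiguousarray(igood), k=4, p=2)

which should give sensible distances again on 0.7.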
of Oceanography, Texas A&M University > http://pong.tamu.edu/~rob > phone: 979-458-0096, fax: 979-845-6331 > > > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Sat Feb 21 23:12:58 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Feb 2009 23:12:58 -0500 Subject: [SciPy-user] math operations on sparse matrices In-Reply-To: References: <1cd32cbb0902211128x9472a59n53e79ecfda4b9fca@mail.gmail.com> <9457e7c80902211543l480bad71y1d5efbdfccbc2200@mail.gmail.com> <1cd32cbb0902211733qbc53f44va3679667351714f1@mail.gmail.com> Message-ID: <1cd32cbb0902212012t2004592ah977f485e014f7f77@mail.gmail.com> On Sat, Feb 21, 2009 at 10:50 PM, Nathan Bell wrote: > On Sat, Feb 21, 2009 at 8:33 PM, wrote: >> >> That was also my first guess, however >> >>>>> M >> <50x50 sparse matrix of type '' >> with 208 stored elements in Dictionary Of Keys format> >>>>> M.data >> Traceback (most recent call last): >> File "", line 1, in >> M.data >> File "\Programs\Python25\Lib\site-packages\scipy\sparse\base.py", >> line 429, in __getattr__ >> AttributeError: data not found >> > > > Note the .tocsr() in the first step: > >>>> M = M.tocsr() >>>> M.data = np.exp(M.data) > > >> From the graphical results it seems to work, but since this is my >> first try with scipy.sparse.linalg, I'm not sure what the methods to >> in detail. > > You'll want to convert to CSR (or CSC) before calling those solvers > anyway. CSR/CSC offer much faster matrix-vector products, the main > cost in most iterative methods. > Yes, thanks after your message, I started to compare both formats and the is a big time difference csr is much faster. I had used dok, because that's what I got from spatial. I started to increase my test size and I am working now with a distance matrix of 10000 by 10000, and the solver seems to work pretty well. Initially, I was worried about the LinearOperator because I wasn't sure what the solution means when I transform a sparse distance matrix. But the graphs look good, so, I guess, it works. With scipy.spatial and scipy.sparse it's pretty quick to write a large scale regression problem like this. Thanks, Josef From hetland at tamu.edu Sun Feb 22 00:35:12 2009 From: hetland at tamu.edu (Rob Hetland) Date: Sat, 21 Feb 2009 23:35:12 -0600 Subject: [SciPy-user] Error in scipy.spatial.cKDTree In-Reply-To: References: Message-ID: On Feb 21, 2009, at 10:07 PM, Anne Archibald wrote: > > Oops! Fixed in SVN r5585. > > The error happens when the query array is not "contiguous"; the > easiest way to trigger it is to do a query with a transposed array; > the query coordinates will be scrambled. As a workaround, just apply > np.ascontiguousarray() to any query. Thanks. I tried the pure python version in the meantime, and it was surprisingly fast for pure python .. Hopefully the c version will be even faster still (and give the right answer to boot!). For posterity, here is some code that fills in sparse arrays that should not be sparse, below, for when delaunay interpolation is overkill. (what I was working on when I found the bug). -Rob import numpy as np from scipy.spatial import cKDTree def fill_nearest(x, fill_value=0.0): '''Fill missing values in an array with an average of nearest neighbors.''' assert x.ndim == 2, 'x must be a 2D array.' # Create (i, j) point arrays for good and bad data. # Bad data is marked by the fill_value, good data elsewhere. 
igood = np.vstack(np.where(x!=fill_value)).T ibad = np.vstack(np.where(x==fill_value)).T # create a tree for the bad points, the points to be filled # ann_tree = ann.kd_tree(igood) tree = cKDTree(igood) # get the four closest points to the bad points # iquery, dist = ann_tree.search(ibad, k=4) # here, distance is squared dist, iquery = tree.query(ibad, k=4, p=2) # create a weight normalized the nearest points are weighted as 1. # points greater than one are then set to zero. weight = dist/(dist.min(axis=1)[:, newaxis] * ones_like(dist)) weight[weight > 1] = 0 # multiply the queried good points by the weight, selecting only the near # points. Divide by the number of nearest points to get average. xfill = weight * x[igood[:,0][iquery], igood[:,1][iquery]] xfill = (xfill/weight.sum(axis=1)[:, newaxis]).sum(axis=1) # place average of nearest good points, xfill, into bad point locations. x[ibad[:,0], ibad[:,1]] = xfill return x ---- Rob Hetland, Associate Professor Dept. of Oceanography, Texas A&M University http://pong.tamu.edu/~rob phone: 979-458-0096, fax: 979-845-6331 From cmutel at gmail.com Sun Feb 22 02:23:33 2009 From: cmutel at gmail.com (Christopher Mutel) Date: Sun, 22 Feb 2009 08:23:33 +0100 Subject: [SciPy-user] Question about implementation of a directed acyclic graph of formulas and variables Message-ID: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> Hello all- I am working on a model that uses a large set of linear equations. SciPy provides a set of tools that help very much in my case (especially sparse matrix stuff), and I hope it is okay if I ask the general SciPy community for advice on a further development of my model. I am sure that some of you have already dealt with the questions that I am struggling with. I would like to replace some of the numbers used to construct my matrix with a directed acyclic graph of formulas and variables, to represent the fact that many model components are not independent of one another. This is especially useful when doing Monte Carlo analysis, where every element in the set of linear equations has an associated uncertainty distribution. In the model I am working on, the linear equations represent physical processes in the industrial economy, and its makes the model more accurate to say that, for example, the NOx production in a boiler is a function of the temperature of the boiler, or the fuel consumption of a truck is a function of the load. The alternative, which is what I do now, is assume these parameters are independently distributed. My questions are: 1. To store my graph of references, I need to choose an existing python graph implementation. Does anyone have ideas on what would be best in my specific case? I only need a graph implementation to ensure transitive closure (no circular references), and to allow a way to keep track of references so the entire graph can be easily and correctly re-calculated. NetworkX seems like tremendous overkill in this case. 2. Is there a "best" way to write a formula? Perhaps there are libraries for something like this? I was thinking of a class like: class Formula(object): formula = "foo" references = [bar1, bar2] A key point here is that the formula itself must be stored in a SQL database, and human-readable (at least to some extent). I am sure that there is someone out there who has though a lot about these types of issues, and has a decent solution. I don't think something like SymPy would work here, though of course I may be wrong. 
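To make that a bit more concrete, what I have in mind is something along these lines (only a sketch, and the names are made up):

    class Formula(object):
        def __init__(self, expression, references):
            # human-readable text, e.g. "0.9 * boiler_temperature", which is
            # what gets stored in the SQL table
            self.expression = expression
            # the Variable/Formula objects this formula depends on; these are
            # the edges of the directed acyclic graph
            self.references = references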
Respectfully yours, Chris -- ############################ Chris Mutel ?kologisches Systemdesign - Ecological Systems Design Institut f.Umweltingenieurwissenschaften - Institute for Environmental Engineering ETH Z?rich - HIF C 42 - Schafmattstr. 6 8093 Z?rich Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61 ############################ From robert.kern at gmail.com Sun Feb 22 02:46:28 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 22 Feb 2009 01:46:28 -0600 Subject: [SciPy-user] Question about implementation of a directed acyclic graph of formulas and variables In-Reply-To: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> Message-ID: <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> On Sun, Feb 22, 2009 at 01:23, Christopher Mutel wrote: > Hello all- > > I am working on a model that uses a large set of linear equations. > SciPy provides a set of tools that help very much in my case > (especially sparse matrix stuff), and I hope it is okay if I ask the > general SciPy community for advice on a further development of my > model. I am sure that some of you have already dealt with the > questions that I am struggling with. > > I would like to replace some of the numbers used to construct my > matrix with a directed acyclic graph of formulas and variables, to > represent the fact that many model components are not independent of > one another. This is especially useful when doing Monte Carlo > analysis, where every element in the set of linear equations has an > associated uncertainty distribution. In the model I am working on, the > linear equations represent physical processes in the industrial > economy, and its makes the model more accurate to say that, for > example, the NOx production in a boiler is a function of the > temperature of the boiler, or the fuel consumption of a truck is a > function of the load. The alternative, which is what I do now, is > assume these parameters are independently distributed. > > My questions are: > > 1. To store my graph of references, I need to choose an existing > python graph implementation. Does anyone have ideas on what would be > best in my specific case? I only need a graph implementation to ensure > transitive closure (no circular references), and to allow a way to > keep track of references so the entire graph can be easily and > correctly re-calculated. NetworkX seems like tremendous overkill in > this case. You probably just want a simple dict mapping nodes to lists of adjacent nodes (following the direction of the arrows). Then you just need an implementation of the appropriate algorithms on top of this data structure. You might find what you need here (we had a similar problem once): https://svn.enthought.com/svn/enthought/EnthoughtBase/trunk/enthought/util/graph.py > 2. Is there a "best" way to write a formula? Perhaps there are > libraries for something like this? I was thinking of a class like: > > class Formula(object): > formula = "foo" > references = [bar1, bar2] > > A key point here is that the formula itself must be stored in a SQL > database, and human-readable (at least to some extent). I am sure that > there is someone out there who has though a lot about these types of > issues, and has a decent solution. I don't think something like SymPy > would work here, though of course I may be wrong. I think sympy is probably an excellent option for you, but I'm not entirely clear on what your formulae look like. 
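For what it's worth, a cycle check plus evaluation order over such a dict is only a handful of lines. Something like this (untested sketch; the nodes are whatever objects represent your variables and formulas):

    def evaluation_order(graph):
        # graph: dict mapping each node to the list of nodes it depends on
        order, visiting, done = [], set(), set()
        def visit(node):
            if node in done:
                return
            if node in visiting:
                raise ValueError("circular reference involving %r" % (node,))
            visiting.add(node)
            for dep in graph.get(node, []):
                visit(dep)
            visiting.remove(node)
            done.add(node)
            order.append(node)
        for node in graph:
            visit(node)
        # dependencies always come before the nodes that use them
        return order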
Remember that you can always pickle sympy expressions if they need to get stored in a SQL database. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dmitrey15 at ukr.net Sun Feb 22 03:53:20 2009 From: dmitrey15 at ukr.net (Dmitrey) Date: Sun, 22 Feb 2009 10:53:20 +0200 Subject: [SciPy-user] Question about implementation of a directed acyclic graph of formulas and variables In-Reply-To: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> Message-ID: <49A11280.3050607@ukr.net> Hi Christopher, I'm working on OpenOpt oofun concept http://openopt.org/oofun it doesn't have any graphical back-end yet but I hope someday it will be binded to a one. The concept is something like MATLAB's SIMULINK (as I have mentioned, without graphical back-end yet). Also,it is intended first of all to numerical optimization (so oofuns have derivatives subfield, and in future some more will be added, like desired lb-ub bounds for each oofun) rather than SIMULINK that is keen on signal processing. There are other persons in our numerical optimization dept that are working on directed acyclic graph of formulas and variables (including SQL databases), however, unfortunately they work using C++/Rational Rose, no Python. BTW they handle even cyclic graphs (via separating cycles and handling them via non-linear systems solver, like scipy.optimize fsolve). Regards, D. P.S. unfortunately, because of some my changes oolin concept doesn't work for latest svn snapshot. Christopher Mutel wrote: > Hello all- > > I am working on a model that uses a large set of linear equations. > SciPy provides a set of tools that help very much in my case > (especially sparse matrix stuff), and I hope it is okay if I ask the > general SciPy community for advice on a further development of my > model. I am sure that some of you have already dealt with the > questions that I am struggling with. > > I would like to replace some of the numbers used to construct my > matrix with a directed acyclic graph of formulas and variables, to > represent the fact that many model components are not independent of > one another. This is especially useful when doing Monte Carlo > analysis, where every element in the set of linear equations has an > associated uncertainty distribution. In the model I am working on, the > linear equations represent physical processes in the industrial > economy, and its makes the model more accurate to say that, for > example, the NOx production in a boiler is a function of the > temperature of the boiler, or the fuel consumption of a truck is a > function of the load. The alternative, which is what I do now, is > assume these parameters are independently distributed. > > My questions are: > > 1. To store my graph of references, I need to choose an existing > python graph implementation. Does anyone have ideas on what would be > best in my specific case? I only need a graph implementation to ensure > transitive closure (no circular references), and to allow a way to > keep track of references so the entire graph can be easily and > correctly re-calculated. NetworkX seems like tremendous overkill in > this case. > > 2. Is there a "best" way to write a formula? Perhaps there are > libraries for something like this? 
I was thinking of a class like: > > class Formula(object): > formula = "foo" > references = [bar1, bar2] > > A key point here is that the formula itself must be stored in a SQL > database, and human-readable (at least to some extent). I am sure that > there is someone out there who has though a lot about these types of > issues, and has a decent solution. I don't think something like SymPy > would work here, though of course I may be wrong. > > Respectfully yours, > > Chris > > > From tritemio at gmail.com Sun Feb 22 06:48:34 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Sun, 22 Feb 2009 12:48:34 +0100 Subject: [SciPy-user] Very slow loadmat in scipy 0.7 (regression) Message-ID: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> Hi to the list, I'm loading matlab file of a few tents of MB in python with scipy.io.loadmat. With scipy 0.6 (the stock ubuntu 8.10 version) the load takes a few seconds (2-5 sec). Now with scipy 0.7 it takes much longer, around 80 secs. I did a profile and found that the all the time is spent in GzipInputStream.__zfill method. I blindly tried to change the GzipInputStream.blocksize attribute from 16K to 256K and 1M and found that the performances become exponentially better. Here there are the profile resuts loading a 33M matlab file: *Scipy 0.7 default, BUFFER 16K* 12984 function calls (12981 primitive calls) in 140.456 CPU seconds Ordered by: internal time List reduced from 40 to 3 due to restriction <3> ncalls tottime percall cumtime percall filename:lineno(function) 27 139.250 5.157 140.304 5.196 gzipstreams.py:80(__fill) 2119 0.950 0.000 0.950 0.000 {built-in method decompress} 9 0.123 0.014 0.123 0.014 {method 'copy' of 'numpy.ndarray' objects} *BUFFER 256K* 1080 function calls (1077 primitive calls) in 9.988 CPU seconds Ordered by: internal time List reduced from 40 to 3 due to restriction <3> ncalls tottime percall cumtime percall filename:lineno(function) 27 8.870 0.329 9.833 0.364 gzipstreams.py:80(__fill) 135 0.925 0.007 0.925 0.007 {built-in method decompress} 9 0.124 0.014 0.124 0.014 {method 'copy' of 'numpy.ndarray' objects} *BUFFER 1M* 480 function calls (477 primitive calls) in 3.509 CPU seconds Ordered by: internal time List reduced from 40 to 3 due to restriction <3> ncalls tottime percall cumtime percall filename:lineno(function) 27 2.329 0.086 3.302 0.122 gzipstreams.py:80(__fill) 35 0.925 0.026 0.925 0.026 {built-in method decompress} 9 0.124 0.014 0.124 0.014 {method 'copy' of 'numpy.ndarray' objects} As you can see there is a dramatic improvement as the time passes from 140 to around 3 seconds. I think that the default value should be raised a bit (at least 256K), but as the performance hit can be so big is definitely better to have this as keyword argument directly in io.loadmat. Any comment is appreciated. - Antonio PS: the test file used for the profiling is attached. -------------- next part -------------- A non-text attachment was scrubbed... Name: test_setup.py Type: text/x-python Size: 340 bytes Desc: not available URL: From matthieu.brucher at gmail.com Sun Feb 22 06:58:43 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 22 Feb 2009 12:58:43 +0100 Subject: [SciPy-user] Very slow loadmat in scipy 0.7 (regression) In-Reply-To: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> References: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> Message-ID: Hi, This issue popped up in the scipy-dev ML and will be fixed in the future. 
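Until then, a stop-gap along the lines of what Antonino did might be to bump the buffer size before loading (untested, and the exact module path and attribute may differ in your install):

    from scipy.io.matlab import gzipstreams
    gzipstreams.GzipInputStream.blocksize = 256 * 1024  # default is 16K
    # ... then call scipy.io.loadmat as usual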
Matthieu 2009/2/22 Antonino Ingargiola : > Hi to the list, > > I'm loading matlab file of a few tents of MB in python with > scipy.io.loadmat. With scipy 0.6 (the stock ubuntu 8.10 version) the > load takes a few seconds (2-5 sec). Now with scipy 0.7 it takes much > longer, around 80 secs. > > I did a profile and found that the all the time is spent in > GzipInputStream.__zfill method. I blindly tried to change the > GzipInputStream.blocksize attribute from 16K to 256K and 1M and found > that the performances become exponentially better. Here there are the > profile resuts loading a 33M matlab file: > > *Scipy 0.7 default, BUFFER 16K* > > 12984 function calls (12981 primitive calls) in 140.456 CPU seconds > > Ordered by: internal time > List reduced from 40 to 3 due to restriction <3> > > ncalls tottime percall cumtime percall filename:lineno(function) > 27 139.250 5.157 140.304 5.196 gzipstreams.py:80(__fill) > 2119 0.950 0.000 0.950 0.000 {built-in method decompress} > 9 0.123 0.014 0.123 0.014 {method 'copy' of > 'numpy.ndarray' objects} > > > *BUFFER 256K* > > 1080 function calls (1077 primitive calls) in 9.988 CPU seconds > > Ordered by: internal time > List reduced from 40 to 3 due to restriction <3> > > ncalls tottime percall cumtime percall filename:lineno(function) > 27 8.870 0.329 9.833 0.364 gzipstreams.py:80(__fill) > 135 0.925 0.007 0.925 0.007 {built-in method decompress} > 9 0.124 0.014 0.124 0.014 {method 'copy' of > 'numpy.ndarray' objects} > > > *BUFFER 1M* > > 480 function calls (477 primitive calls) in 3.509 CPU seconds > > Ordered by: internal time > List reduced from 40 to 3 due to restriction <3> > > ncalls tottime percall cumtime percall filename:lineno(function) > 27 2.329 0.086 3.302 0.122 gzipstreams.py:80(__fill) > 35 0.925 0.026 0.925 0.026 {built-in method decompress} > 9 0.124 0.014 0.124 0.014 {method 'copy' of > 'numpy.ndarray' objects} > > > > As you can see there is a dramatic improvement as the time passes from > 140 to around 3 seconds. > > I think that the default value should be raised a bit (at least 256K), > but as the performance hit can be so big is definitely better to have > this as keyword argument directly in io.loadmat. > > Any comment is appreciated. > > - Antonio > > PS: the test file used for the profiling is attached. > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From fredmfp at gmail.com Sun Feb 22 06:59:32 2009 From: fredmfp at gmail.com (fred) Date: Sun, 22 Feb 2009 12:59:32 +0100 Subject: [SciPy-user] scipy.org issue? In-Reply-To: <9457e7c80902211603x4061b20bo21ddbd72ba02d9a9@mail.gmail.com> References: <49A05C9F.4000404@gmail.com> <9457e7c80902211603x4061b20bo21ddbd72ba02d9a9@mail.gmail.com> Message-ID: <49A13E24.1060701@gmail.com> St?fan van der Walt a ?crit : > Thanks for this report. I've added links to the docs editor on that > page, which should become permanent as soon as the server decides to > respond. Hmmm... :-( St?fan, I guess I get another one. If you go here, for instance: http://www.scipy.org/SciPyPackages/Ndimage you can click on the ScipyPackages page, at the top, before Ndimage. But this link does not exist, because this is the SciPy_package page that exists. 
Cheers, -- Fred From stefan at sun.ac.za Sun Feb 22 07:30:06 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Feb 2009 14:30:06 +0200 Subject: [SciPy-user] scipy.org issue? In-Reply-To: <49A13E24.1060701@gmail.com> References: <49A05C9F.4000404@gmail.com> <9457e7c80902211603x4061b20bo21ddbd72ba02d9a9@mail.gmail.com> <49A13E24.1060701@gmail.com> Message-ID: <9457e7c80902220430i52edab0dja76ae8f616da5217@mail.gmail.com> 2009/2/22 fred : > St?fan, I guess I get another one. > > If you go here, for instance: > > http://www.scipy.org/SciPyPackages/Ndimage > > you can click on the ScipyPackages page, at the top, before Ndimage. Thanks, Fred, I've reorganised those pages so that they show a deprecation warning and give a link to the old content. Cheers St?fan From fredmfp at gmail.com Sun Feb 22 08:20:41 2009 From: fredmfp at gmail.com (fred) Date: Sun, 22 Feb 2009 14:20:41 +0100 Subject: [SciPy-user] scipy.org issue? In-Reply-To: <9457e7c80902220430i52edab0dja76ae8f616da5217@mail.gmail.com> References: <49A05C9F.4000404@gmail.com> <9457e7c80902211603x4061b20bo21ddbd72ba02d9a9@mail.gmail.com> <49A13E24.1060701@gmail.com> <9457e7c80902220430i52edab0dja76ae8f616da5217@mail.gmail.com> Message-ID: <49A15129.1000806@gmail.com> St?fan van der Walt a ?crit : > Thanks, Fred, I've reorganised those pages so that they show a > deprecation warning and give a link to the old content. Thanks, St?fan. Cheers, -- Fred From dmitrey15 at ukr.net Sun Feb 22 11:19:30 2009 From: dmitrey15 at ukr.net (Dmitrey) Date: Sun, 22 Feb 2009 18:19:30 +0200 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <49A02E1A.5050703@gmail.com> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> Message-ID: <49A17B12.8090006@ukr.net> hi Eric, I'm trying to use your soft on a file from matlab fileexchage area (BTW you could place a link to your soft there), so it prints The following appear to be variables: status, A, b, lb, val, f, M, bound, options, Aeq, val0, x, x0, beq, e, ub The following appear to be functions: rec, inf, IP1, linprog, optimset So I would say "inf" is certainly not a function, it's equivalent to numpy.inf (I guess you know). BTW I think it would be a good idea to mention your software in http://www.scipy.org/NumPy_for_Matlab_Users as well as in Octave, SAGE mail lists. Regards, D. Eric Schug wrote: > For those interested, my new project has been uploaded sourceforge at, > http://sourceforge.net/projects/libermate/ > > Latest version now supports simple command expressions (e.g hold on) > > > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > > > From wnbell at gmail.com Sun Feb 22 16:01:31 2009 From: wnbell at gmail.com (Nathan Bell) Date: Sun, 22 Feb 2009 16:01:31 -0500 Subject: [SciPy-user] Very slow loadmat in scipy 0.7 (regression) In-Reply-To: References: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> Message-ID: On Sun, Feb 22, 2009 at 6:58 AM, Matthieu Brucher wrote: > > This issue popped up in the scipy-dev ML and will be fixed in the future. > http://thread.gmane.org/gmane.comp.python.scientific.devel/9934/focus=9974 -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From pinto at mit.edu Sun Feb 22 22:59:38 2009 From: pinto at mit.edu (Nicolas Pinto) Date: Sun, 22 Feb 2009 22:59:38 -0500 Subject: [SciPy-user] How to generate equivalent "random" numbers in matlab and scipy? 
Message-ID: <954ae5aa0902221959g75625c93s9f6c4352610e5ba1@mail.gmail.com> Dear all, I'd like to generate equivalent sequences of 'random' numbers in matlab and scipy, is there any way I can do that? I tried to fix the seed (see below) but it doesn't work. # scipy In [29]: random.seed(1); random.permutation(5)+1 Out[29]: array([3, 2, 5, 1, 4]) % matlab >> rand('seed', 1); randperm(5) ans = 4 3 5 2 1 Thanks for your time. Best regards, -- Nicolas Pinto Ph.D. Candidate, Brain & Computer Sciences Massachusetts Institute of Technology, USA http://web.mit.edu/pinto -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Feb 22 23:07:54 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 22 Feb 2009 22:07:54 -0600 Subject: [SciPy-user] How to generate equivalent "random" numbers in matlab and scipy? In-Reply-To: <954ae5aa0902221959g75625c93s9f6c4352610e5ba1@mail.gmail.com> References: <954ae5aa0902221959g75625c93s9f6c4352610e5ba1@mail.gmail.com> Message-ID: <3d375d730902222007p43c91b4fhd242b6870fbc5d48@mail.gmail.com> On Sun, Feb 22, 2009 at 21:59, Nicolas Pinto wrote: > Dear all, > > I'd like to generate equivalent sequences of 'random' numbers in matlab and > scipy, is there any way I can do that? I tried to fix the seed (see below) > but it doesn't work. > > # scipy > In [29]: random.seed(1); random.permutation(5)+1 > Out[29]: array([3, 2, 5, 1, 4]) > > % matlab >>> rand('seed', 1); randperm(5) > > ans = > > 4 3 5 2 1 A quick Google tells me that starting with Matlab 7.4, they do use the Mersenne Twister by default, so it might be possible if you can re-implement their seeding algorithm. Use RandomState.set_state() to set the MT state vector directly (this is different from seeding, which generates an MT state vector from a other kinds of inputs). While this would give you identical sequences of samples (e.g. random_sample(), randint(), etc.), there is no guarantee that the algorithms we implement on top of the fundamental PRNG are the same (e.g. permutation()). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tritemio at gmail.com Mon Feb 23 03:19:47 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Mon, 23 Feb 2009 09:19:47 +0100 Subject: [SciPy-user] Very slow loadmat in scipy 0.7 (regression) In-Reply-To: References: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> Message-ID: <5486cca80902230019w5dd1f0b3ycb0fbf6a4f59c79c@mail.gmail.com> 2009/2/22 Nathan Bell : > On Sun, Feb 22, 2009 at 6:58 AM, Matthieu Brucher > wrote: >> >> This issue popped up in the scipy-dev ML and will be fixed in the future. >> > > http://thread.gmane.org/gmane.comp.python.scientific.devel/9934/focus=9974 Thanks for the link. Can you someone forward this thread to scipy-dev just to note that I modified *only* the blocksize in order to recover optimum performances. I can perform some bench if is needed, just put me in CC since I'm not a scipy-dev subscriber. 
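Coming back to the Matlab/NumPy random-number question above, a small sketch of the state copying Robert describes. This only makes two NumPy generators agree with each other; matching Matlab would additionally require re-implementing Matlab's seeding, which is not attempted here:

import numpy as np

rs1 = np.random.RandomState(1)
state = rs1.get_state()        # ('MT19937', 624-word state vector, pos, ...)
rs2 = np.random.RandomState()
rs2.set_state(state)           # rs2 now continues from exactly the same state
print rs1.random_sample(3)
print rs2.random_sample(3)     # identical to the previous line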
- Antonio From wnbell at gmail.com Mon Feb 23 05:07:19 2009 From: wnbell at gmail.com (Nathan Bell) Date: Mon, 23 Feb 2009 05:07:19 -0500 Subject: [SciPy-user] Very slow loadmat in scipy 0.7 (regression) In-Reply-To: <5486cca80902230019w5dd1f0b3ycb0fbf6a4f59c79c@mail.gmail.com> References: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> <5486cca80902230019w5dd1f0b3ycb0fbf6a4f59c79c@mail.gmail.com> Message-ID: On Mon, Feb 23, 2009 at 3:19 AM, Antonino Ingargiola wrote: > > Can you someone forward this thread to scipy-dev just to note that I > modified *only* the blocksize in order to recover optimum > performances. > > I can perform some bench if is needed, just put me in CC since I'm not > a scipy-dev subscriber. > Just FYI, the discussion has moved to this thread: http://thread.gmane.org/gmane.comp.python.scientific.devel/10010 You can reply to the thread using the "followup" option from the menu box in the upper right. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From sturla at molden.no Mon Feb 23 07:49:25 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 Feb 2009 13:49:25 +0100 Subject: [SciPy-user] How to generate equivalent "random" numbers in matlab and scipy? In-Reply-To: <3d375d730902222007p43c91b4fhd242b6870fbc5d48@mail.gmail.com> References: <954ae5aa0902221959g75625c93s9f6c4352610e5ba1@mail.gmail.com> <3d375d730902222007p43c91b4fhd242b6870fbc5d48@mail.gmail.com> Message-ID: <49A29B55.1050009@molden.no> On 2/23/2009 5:07 AM, Robert Kern wrote: > A quick Google tells me that starting with Matlab 7.4, they do use the > Mersenne Twister by default, so it might be possible if you can > re-implement their seeding algorithm. And for previous versions of Matlab, the PRNGs are: http://www.mathworks.com/moler/ncm/randtx.m http://www.mathworks.com/moler/ncm/randntx.m S.M. From bernardo.rocha at meduni-graz.at Mon Feb 23 08:48:32 2009 From: bernardo.rocha at meduni-graz.at (Bernardo M. Rocha) Date: Mon, 23 Feb 2009 14:48:32 +0100 Subject: [SciPy-user] FEM sol interpolation Message-ID: <49A2A930.5000606@meduni-graz.at> Hi Guys, I would like to know how can I interpolate some data (a FEM solution) from a coarse grid to a finer grid.... I would like to do something like (available in FEMLAB/COMSOL): u_int = postinterp(fem_sol, 'u', p_ref); Where fem_sol is my solution at some coarse grid and p_ref is my reference mesh (where I also have a fem_ref solution defined). If this is not clear, I need to compute e = fem_ref - fem_sol. Thanks in advance, Bernardo M. Rocha From josef.pktd at gmail.com Mon Feb 23 11:02:35 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Feb 2009 11:02:35 -0500 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <49A17B12.8090006@ukr.net> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> <49A17B12.8090006@ukr.net> Message-ID: <1cd32cbb0902230802h653db600ic62014d0cb2498a@mail.gmail.com> > Eric Schug wrote: >> For those interested, my new project has been uploaded sourceforge at, >> http://sourceforge.net/projects/libermate/ >> >> Latest version now supports simple command expressions (e.g hold on) >> I tried it out on some simple matlab files. I didn't try to run the translated files yet, but two translations make the code less readable * translation of integers to floats, e.g. for array indices or for arange * multiplication of scalar with array is translated with dot, eg m: randn(n,k-1)*10 -> py: = dot(randn(n, k-1.), 10.) 
string array in matlab came out as empty numpy array m: vnames=['yvar', 'iota', 'x1 ', 'x2 ']; -> py: vnames = array(r_[]) Importing everything is a source of possible errors (e.g. pylab overwrites numpy names) from numpy import * import scipy # if available import pylab (from matlibplot) try: from pylab import * except ImportError: pass I would prefer the current standard import numpy as np import matplotlib.pylab as plt and keep the name space as part of the name, e.g. np.dot, np.exp, ... But it looks like it will save a lot of typing and editing, even though it will need careful proof reading. Thanks, Josef From dmitrey15 at ukr.net Mon Feb 23 11:19:53 2009 From: dmitrey15 at ukr.net (Dmitrey) Date: Mon, 23 Feb 2009 18:19:53 +0200 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <1cd32cbb0902230802h653db600ic62014d0cb2498a@mail.gmail.com> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> <49A17B12.8090006@ukr.net> <1cd32cbb0902230802h653db600ic62014d0cb2498a@mail.gmail.com> Message-ID: <49A2CCA9.90704@ukr.net> josef.pktd at gmail.com wrote: >> Eric Schug wrote: >> >>> For those interested, my new project has been uploaded sourceforge at, >>> http://sourceforge.net/projects/libermate/ >>> >>> Latest version now supports simple command expressions (e.g hold on) >>> >>> > > > from numpy import * > import scipy > I had got the same, while no scipy function has been involved in code > I would prefer the current standard > import numpy as np > and keep the name space as part of the name, e.g. np.dot, np.exp, ... > +1 Let me add some more cents: instead of round(arr) it should be used arr.round() (because Python's round doesn't work for numpy arrays) zeros(m,n) should be zeros((m,n)) ones(m,n) should be ones((m,n)) "end" should go to "-1" (now "xend-1") variables of index type (for example, obtained from ind=find(...) replaced by ind=nonzero() or ind=where()) should not be decreased in the following lines with indexing, eg arr[ind] should be used instead of arr[ind-1] for A=[B C ...] and A=[B;C] using A=hstack((B, C,...)) and A=vstack((B, C,...)) is more readable and natural than current array(r_[c_[B], c_[C]]) Regards, D. From peter.skomoroch at gmail.com Mon Feb 23 14:51:59 2009 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Mon, 23 Feb 2009 14:51:59 -0500 Subject: [SciPy-user] Scientific packages for a distributed computing Amazon EC2 image? Message-ID: I'm collecting a wishlist of scientific and python related packages (numpy, scipy, etc) people would want installed on a Debian based Amazon EC2 machine image (AMI)for distributed computing. I'll make more information available as the machine image develops, some of these will also go into the Machetec2AMI. Several variants of the AMI should become available in the next month. Please feel free to add any packages you would want pre-installed on the following wiki page: http://scipy.org/SciPyAmazonAmi Let me know if you spot any potential license conflicts with listed software. -- Peter N. Skomoroch 617.285.8348 http://www.datawrangling.com http://delicious.com/pskomoroch http://twitter.com/peteskomoroch -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsouthey at gmail.com Mon Feb 23 15:01:45 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 23 Feb 2009 14:01:45 -0600 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> Message-ID: <49A300A9.4020608@gmail.com> Hi, Yes, the formula is incorrect. The reason is that the sum of squares terms are not corrected by the means because the ss function just computes the uncorrected sum of squares. Thus the correct formula should : sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean)))) Alternatively: sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean) / df) Note the formula is derived using the definition of R-squared: The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx) where Syy and Sxx are the corrected sums of squares for Y and X, respectively, Regards Bruce Hi all, I was working with linear regression in scipy and met some problems with value of standard error of the estimate returned by scipy.stats.linregress() function. I could not compare it to similar outputs of other linear regression routines (for example in Origin), so I took a look in the source (stats.py). In the source it is defined as sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) where r is correlation coefficient, df is degrees of freedom (N-2) and ss() is sum of squares of elements. After digging through literature the only formula looking somewhat the same was found to be stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) which gives the same result as a standard definition (in notation of the source of linregress) stderrest = np.sqrt(ss(y-slope*x-intercept)/df) but the output of linregress is different. I humbly suppose this is a bug, but maybe somebody could explain me what is it if I'm wrong... Pavlo. From josef.pktd at gmail.com Mon Feb 23 16:04:00 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Feb 2009 16:04:00 -0500 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <49A300A9.4020608@gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> Message-ID: <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey wrote: > Hi, > Yes, the formula is incorrect. The reason is that the sum of squares > terms are not corrected by the means because the ss function just > computes the uncorrected sum of squares. > > Thus the correct formula should : > sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean)))) > Alternatively: > sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean) > / df) > > Note the formula is derived using the definition of R-squared: > The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx) > where Syy and Sxx are the corrected sums of squares for Y and X, > respectively, > > Regards > Bruce > > Hi all, > I was working with linear regression in scipy and met some problems > with value of standard error of the estimate returned by > scipy.stats.linregress() function. I could not compare it to similar > outputs of other linear regression routines (for example in Origin), > so I took a look in the source (stats.py). 
> > In the source it is defined as > sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) > where r is correlation coefficient, df is degrees of freedom (N-2) and > ss() is sum of squares of elements. > > After digging through literature the only formula looking somewhat the > same was found to be > stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) > which gives the same result as a standard definition (in notation of > the source of linregress) > stderrest = np.sqrt(ss(y-slope*x-intercept)/df) > but the output of linregress is different. > > I humbly suppose this is a bug, but maybe somebody could explain me > what is it if I'm wrong... > Pavlo. > Thank you for reporting and checking this. I fixed it in trunk, but still have to add a test. There are still small (1e-4) numerical differences to the multivariate ols version in the example I tried Josef From bsouthey at gmail.com Mon Feb 23 16:45:02 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 23 Feb 2009 15:45:02 -0600 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> Message-ID: <49A318DE.9030303@gmail.com> josef.pktd at gmail.com wrote: > On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey wrote: > >> Hi, >> Yes, the formula is incorrect. The reason is that the sum of squares >> terms are not corrected by the means because the ss function just >> computes the uncorrected sum of squares. >> >> Thus the correct formula should : >> sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean)))) >> Alternatively: >> sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean) >> / df) >> >> Note the formula is derived using the definition of R-squared: >> The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx) >> where Syy and Sxx are the corrected sums of squares for Y and X, >> respectively, >> >> Regards >> Bruce >> >> Hi all, >> I was working with linear regression in scipy and met some problems >> with value of standard error of the estimate returned by >> scipy.stats.linregress() function. I could not compare it to similar >> outputs of other linear regression routines (for example in Origin), >> so I took a look in the source (stats.py). >> >> In the source it is defined as >> sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) >> where r is correlation coefficient, df is degrees of freedom (N-2) and >> ss() is sum of squares of elements. >> >> After digging through literature the only formula looking somewhat the >> same was found to be >> stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) >> which gives the same result as a standard definition (in notation of >> the source of linregress) >> stderrest = np.sqrt(ss(y-slope*x-intercept)/df) >> but the output of linregress is different. >> >> I humbly suppose this is a bug, but maybe somebody could explain me >> what is it if I'm wrong... >> Pavlo. >> >> > > Thank you for reporting and checking this. > > I fixed it in trunk, but still have to add a test. 
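As a quick numerical sanity check of the corrected formula, a sketch with made-up data (with the fix in trunk the stderr returned by linregress should agree with the other two values; an unpatched 0.7 install probably will not):

import numpy as np
from scipy import stats

np.random.seed(0)
x = np.random.randn(50)
y = 2.0 + 3.0*x + 0.5*np.random.randn(50)
slope, intercept, r, p, stderr = stats.linregress(x, y)

df = len(x) - 2
Sxx = np.sum((x - x.mean())**2)
Syy = np.sum((y - y.mean())**2)
resid = y - (intercept + slope*x)

se_corrected = np.sqrt((1 - r*r)*Syy/(df*Sxx))    # corrected formula
se_residual = np.sqrt(np.sum(resid**2)/(df*Sxx))  # sqrt(MSE/Sxx), textbook definition
print se_corrected, se_residual, stderr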
> There are still small (1e-4) numerical differences to the > multivariate ols version in the example I tried > > Josef > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > Assuming that linregress is different from expected, linregress is just a scalar implementation so it is probably prone to rounding error in numerous places. If you want, I can look at the example and see. Bruce From stefan at sun.ac.za Mon Feb 23 17:08:40 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 24 Feb 2009 00:08:40 +0200 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> Message-ID: <9457e7c80902231408g3fe5421dk7fa577b8a5242a07@mail.gmail.com> 2009/2/23 : > Thank you for reporting and checking this. > > I fixed it in trunk, but still have to add a test. > There are still small (1e-4) numerical differences to the > multivariate ols version in the example I tried Thanks, Josef. I assume you are working on a test, otherwise let me know so I can write one. Cheers St?fan From josef.pktd at gmail.com Mon Feb 23 19:57:13 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Feb 2009 19:57:13 -0500 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <49A318DE.9030303@gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> <49A318DE.9030303@gmail.com> Message-ID: <1cd32cbb0902231657y21aafc81pff468ed5b7bc4421@mail.gmail.com> On Mon, Feb 23, 2009 at 4:45 PM, Bruce Southey wrote: > josef.pktd at gmail.com wrote: >> On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey wrote: >> >>> Hi, >>> Yes, the formula is incorrect. The reason is that the sum of squares >>> terms are not corrected by the means because the ss function just >>> computes the uncorrected sum of squares. >>> >>> Thus the correct formula should : >>> sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean)))) >>> Alternatively: >>> sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean) >>> / df) >>> >>> Note the formula is derived using the definition of R-squared: >>> The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx) >>> where Syy and Sxx are the corrected sums of squares for Y and X, >>> respectively, >>> >>> Regards >>> Bruce >>> >>> Hi all, >>> I was working with linear regression in scipy and met some problems >>> with value of standard error of the estimate returned by >>> scipy.stats.linregress() function. I could not compare it to similar >>> outputs of other linear regression routines (for example in Origin), >>> so I took a look in the source (stats.py). >>> >>> In the source it is defined as >>> sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) >>> where r is correlation coefficient, df is degrees of freedom (N-2) and >>> ss() is sum of squares of elements. 
>>> >>> After digging through literature the only formula looking somewhat the >>> same was found to be >>> stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) >>> which gives the same result as a standard definition (in notation of >>> the source of linregress) >>> stderrest = np.sqrt(ss(y-slope*x-intercept)/df) >>> but the output of linregress is different. >>> >>> I humbly suppose this is a bug, but maybe somebody could explain me >>> what is it if I'm wrong... >>> Pavlo. >>> >>> >> >> Thank you for reporting and checking this. >> >> I fixed it in trunk, but still have to add a test. >> There are still small (1e-4) numerical differences to the >> multivariate ols version in the example I tried >> >> Josef >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > > Assuming that linregress is different from expected, linregress is just > a scalar implementation so it is probably prone to rounding error in > numerous places. > If you want, I can look at the example and see. > > Bruce > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > I just made up an example on the command line and compared it with my version of the cookbook ols class using pinv . Until now I haven't looked carefully at linregress because I think it should be completely replaced with a full ols regression, or at least with a call to it. I added and corrected a test. original tests only test constant and slope estimates. Josef >>> x = np.random.randn(50,2) >>> y = (x*[1,0.5]).sum(axis=1) >>> res = ols(y,x[:,0]) >>> res.b array([-0.04397404, 0.88322506]) >>> res.se array([ 0.07458703, 0.08676139]) >>> res.p array([ 5.58246087e-01, 1.40776280e-13]) >>> np.sqrt(res.R2) 0.82670557087727525 >>> stats.linregress(x[:,0],y) (0.88322505756806569, -0.043974044431550653, 0.8267055708772757, 1.408136357829305e-013, 0.086581413890270145) From simpson at math.toronto.edu Mon Feb 23 21:05:16 2009 From: simpson at math.toronto.edu (Gideon Simpson) Date: Mon, 23 Feb 2009 21:05:16 -0500 Subject: [SciPy-user] struve function Message-ID: Am I correct that the struve function, scipy.special.struve, only accept scalar arguments? -gideon From robert.kern at gmail.com Mon Feb 23 21:09:23 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Feb 2009 20:09:23 -0600 Subject: [SciPy-user] struve function In-Reply-To: References: Message-ID: <3d375d730902231809j387008c5of134e93be7cfac51@mail.gmail.com> On Mon, Feb 23, 2009 at 20:05, Gideon Simpson wrote: > Am I correct that the struve function, scipy.special.struve, only > accept scalar arguments? No. In [3]: struve([-12., +12, -11, +11], [41., -41., 41., -41.]) Out[3]: array([ -1.15736059e-01, -1.12364370e+06, 8.33084926e-02, 6.29536643e+05]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From pav at iki.fi Mon Feb 23 21:18:59 2009 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 24 Feb 2009 02:18:59 +0000 (UTC) Subject: [SciPy-user] struve function References: <3d375d730902231809j387008c5of134e93be7cfac51@mail.gmail.com> Message-ID: Mon, 23 Feb 2009 20:09:23 -0600, Robert Kern wrote: > On Mon, Feb 23, 2009 at 20:05, Gideon Simpson > wrote: >> Am I correct that the struve function, scipy.special.struve, only >> accept scalar arguments? > > No. > > In [3]: struve([-12., +12, -11, +11], [41., -41., 41., -41.]) Out[3]: > array([ -1.15736059e-01, -1.12364370e+06, 8.33084926e-02, > 6.29536643e+05]) If you're using the struve function, be aware of this: http://scipy.org/scipy/scipy/ticket/679 The results are incorrect for orders in range [-12, 0] and x < 20; the bug is now fixed, but it was still present in 0.7.0. -- Pauli Virtanen From schugschug at gmail.com Mon Feb 23 23:15:52 2009 From: schugschug at gmail.com (Eric Schug) Date: Mon, 23 Feb 2009 23:15:52 -0500 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: References: Message-ID: <49A37478.7030205@gmail.com> Thanks everyone for you comments. I will try to address them. > * translation of integers to floats, e.g. for array indices or for arange > Rational for cast if int to float. Matlab always uses float data type, for most expressions e.g. try format long e a=1 This is mostly a problem for division, integer division would be used in some cases yielding the wrong results. e.g. a=1/2 would give 0 and not 0.5 as it does in Matlab. An alternative method would be to use from __future__ import division I think the best would be to have this be a command line option, so that various translation rules could be enabled or disabled. > * multiplication of scalar with array is translated with dot, eg > m: randn(n,k-1)*10 -> py: = dot(randn(n, k-1.), 10.) Matrix multiplication in matlab is * -> dot in Numpy but with scalars should use more readable * > string array in matlab came out as empty numpy array > > m: > vnames=['yvar', > 'iota', > 'x1 ', > 'x2 ']; > -> py: > vnames = array(r_[]) > Importing everything is a source of possible errors (e.g. pylab > overwrites numpy names) > > from numpy import * > import scipy > # if available import pylab (from matlibplot) > try: > from pylab import * > except ImportError: > pass need to cross reference numpy functions. I've added the rest to tracker at SourceForge. Although, still learning to use Source forge tools. Eric. From josef.pktd at gmail.com Tue Feb 24 00:32:07 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 24 Feb 2009 00:32:07 -0500 Subject: [SciPy-user] local nonparametric regression, gauss process Message-ID: <1cd32cbb0902232132m348a7086s1d6491a70e71d6d2@mail.gmail.com> Last weekend, I was trying out spatial and sparse. I wrote some functions for kernel ridge regression, however I still have parameterization problems with the sparse version. But here is the dense version, which I tried out with 1000 training points and 2000 points in total. It's not finished but produces some nice graphs to show the local regression. If there is interest, I can add this to scipy. 
Josef -------------- next part -------------- '''Kernel Ridge Regression for local non-parametric regression''' import numpy as np from scipy import spatial as ssp from numpy.testing import assert_equal import matplotlib.pylab as plt def plt_closeall(n=10): '''close a number of open matplotlib windows''' for i in range(n): plt.close() def kernel_rbf(x,y,scale=1, **kwds): #scale = kwds.get('scale',1) dist = ssp.minkowski_distance_p(x[:,np.newaxis,:],y[np.newaxis,:,:],2) return np.exp(-0.5/scale*(dist)) def kernel_euclid(x,y,p=2, **kwds): return ssp.minkowski_distance(x[:,np.newaxis,:],y[np.newaxis,:,:],p) class GaussProcess(object): '''class to perform kernel ridge regression (gaussian process) Warning: this class is memory intensive, it creates nobs x nobs distance matrix and its inverse, where nobs is the number of rows (observations). See sparse version for larger number of observations Notes ----- Todo: * normalize multidimensional x array on demand, either by var or cov * add confidence band * automatic selection or proposal of smoothing parameters Reference --------- Rasmussen, C.E. and C.K.I. Williams, 2006, Gaussian Processes for Machine Learning, the MIT Press, www.GaussianProcess.org/gpal, chapter 2 ''' def __init__(self, x,y=None, kernel=kernel_rbf, scale=0.5, ridgecoeff = 1e-10, **kwds ): ''' Parameters ---------- x : 2d array (N,K) data array of explanatory variables, columns represent variables rows represent observations y : 2d array (N,1) (optional) endogenous variable that should be fitted or predicted can alternatively be specified as parameter to fit method kernel : function, default: kernel_rbf kernel: (x1,x2)->kernel matrix is a function that takes as parameter two column arrays and return the kernel or distance matrix scale : float (optional) smoothing parameter for the rbf kernel ridgecoeff : float (optional) coefficient that is multiplied with the identity matrix in the ridge regression Notes ----- After initialization, kernel matrix is calculated and if y is given as parameter then also the linear regression parameter and the fitted or estimated y values, yest, are calculated. yest is available as an attribute in this case. Both scale and the ridge coefficient smooth the fitted curve. 
''' self.x = x self.kernel = kernel self.scale = scale self.ridgecoeff = ridgecoeff self.distxsample = kernel(x,x,scale=scale) self.Kinv = np.linalg.inv(self.distxsample + np.eye(*self.distxsample.shape)*ridgecoeff) if not y is None: self.y = y self.yest = self.fit(y) def fit(self,y): '''fit the training explanatory variables to a sample ouput variable''' self.parest = np.dot(self.Kinv,y) yhat = np.dot(self.distxsample,self.parest) return yhat ## print ds33.shape ## ds33_2 = kernel(x,x[::k,:],scale=scale) ## dsinv = np.linalg.inv(ds33+np.eye(*distxsample.shape)*ridgecoeff) ## B = np.dot(dsinv,y[::k,:]) def predict(self,x): '''predict new y values for a given array of explanatory variables''' self.xpredict = x distxpredict = self.kernel(x,self.x,scale=self.scale) self.ypredict = np.dot(distxpredict,self.parest) return self.ypredict def plot(self, y, plt=plt ): '''some basic plots''' #todo return proper graph handles plt.figure(); plt.plot(self.x,self.y,'bo-',self.x,self.yest,'r.-') plt.title('sample (training) points') plt.figure() plt.plot(self.xpredict,y,'bo-',self.xpredict,self.ypredict,'r.-') plt.title('all points') def example1(): m,k = 500,4 upper = 6 scale=10 xs1a = np.linspace(1,upper,m)[:,np.newaxis] xs1 = xs1a*np.ones((1,4)) + 1/(1.0+np.exp(np.random.randn(m,k))) xs1 /= np.std(xs1[::k,:],0) # normalize scale, could use cov to normalize y1true = np.sum(np.sin(xs1)+np.sqrt(xs1),1)[:,np.newaxis] y1 = y1true + 0.250 * np.random.randn(m,1) stride = 2 #use only some points as trainig points e.g 2 means every 2nd gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid, ridgecoeff=1e-10) yhatr1 = gp1.predict(xs1) plt.figure() plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.') plt.title('euclid kernel: true y versus noisy y and estimated y') plt.figure() plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-') plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+ 'observations') gp2 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_rbf, scale=scale, ridgecoeff=1e-1) yhatr2 = gp2.predict(xs1) plt.figure() plt.plot(y1true, y1,'bo',y1true, yhatr2,'r.') plt.title('rbf kernel: true versus noisy (blue) and estimated (red) observations') plt.figure() plt.plot(y1,'bo-',y1true,'go-',yhatr2,'r.-') plt.title('rbf kernel: true (green), noisy (blue) and estimated (red) '+ 'observations') #gp2.plot(y1) def example2(m=100, scale=0.01, stride=2): #m,k = 100,1 upper = 6 xs1 = np.linspace(1,upper,m)[:,np.newaxis] y1true = np.sum(np.sin(xs1**2),1)[:,np.newaxis]/xs1 y1 = y1true + 0.05*np.random.randn(m,1) ridgecoeff = 1e-10 #stride = 2 #use only some points as trainig points e.g 2 means every 2nd gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid, ridgecoeff=1e-10) yhatr1 = gp1.predict(xs1) plt.figure() plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.') plt.title('euclid kernel: true versus noisy (blue) and estimated (red) observations') plt.figure() plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-') plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+ 'observations') gp2 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_rbf, scale=scale, ridgecoeff=1e-2) yhatr2 = gp2.predict(xs1) plt.figure() plt.plot(y1true, y1,'bo',y1true, yhatr2,'r.') plt.title('rbf kernel: true versus noisy (blue) and estimated (red) observations') plt.figure() plt.plot(y1,'bo-',y1true,'go-',yhatr2,'r.-') plt.title('rbf kernel: true (green), noisy (blue) and estimated (red) '+ 'observations') #gp2.plot(y1) example2() #example2(m=1000, scale=0.01) #example2(m=100, 
scale=0.5) # oversmoothing #example2(m=2000, scale=0.005) # this looks good for rbf, zoom in #example2(m=200, scale=0.01,stride=4) example1() plt.show() #plt_closeall() # use this to close the open figure windows From stefan at sun.ac.za Tue Feb 24 01:01:23 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 24 Feb 2009 08:01:23 +0200 Subject: [SciPy-user] local nonparametric regression, gauss process In-Reply-To: <1cd32cbb0902232132m348a7086s1d6491a70e71d6d2@mail.gmail.com> References: <1cd32cbb0902232132m348a7086s1d6491a70e71d6d2@mail.gmail.com> Message-ID: <9457e7c80902232201g44eb537eqdfa069a75e61f6fe@mail.gmail.com> Hey Josef 2009/2/24 : > Last weekend, I was trying out spatial and sparse. I wrote some > functions for kernel ridge regression, however I still have > parameterization problems with the sparse version. But here is the > dense version, which I tried out with 1000 training points and 2000 > points in total. > > It's not finished but produces some nice graphs to show the local > regression. If there is interest, I can add this to scipy. Really nice graphs! This would be very useful for class! Some trivial comments about spacing in the code: - Use spaces: return ssp.minkowski_distance(x[:,np.newaxis,:],y[np.newaxis,:,:],p) becomes return ssp.minkowski_distance(x[:,np.newaxis,:], y[np.newaxis,:,:], p) According to the Python PEP I think there should be spaces inside the indexing brackets too, but that doesn't enhance readability much in this case. - Keywords do not take spaces scale=0.5, ridgecoeff = 1e-10, **kwds ): should be scale=0.5, ridgecoeff=1e-10, **kwds): Thanks again, Cheers St?fan From stefan at sun.ac.za Tue Feb 24 04:32:00 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 24 Feb 2009 11:32:00 +0200 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <1cd32cbb0902231657y21aafc81pff468ed5b7bc4421@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> <49A318DE.9030303@gmail.com> <1cd32cbb0902231657y21aafc81pff468ed5b7bc4421@mail.gmail.com> Message-ID: <9457e7c80902240132v6dd4849ase3ea86acef1fef8f@mail.gmail.com> 2009/2/24 : > I just made up an example on the command line and compared it with my > version of the cookbook ols class using pinv . Until now I haven't > looked carefully at linregress because I think it should be completely > replaced with a full ols regression, or at least with a call to it. I > added and corrected a test. original tests only test constant and > slope estimates. Great! Thanks, Josef. St?fan From tritemio at gmail.com Tue Feb 24 04:56:32 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Tue, 24 Feb 2009 10:56:32 +0100 Subject: [SciPy-user] Very slow loadmat in scipy 0.7 (regression) In-Reply-To: References: <5486cca80902220348p5bc7b15dk9cce43d73caf4961@mail.gmail.com> <5486cca80902230019w5dd1f0b3ycb0fbf6a4f59c79c@mail.gmail.com> Message-ID: <5486cca80902240156j6f138dd2g2027d339077aa30c@mail.gmail.com> 2009/2/23 Nathan Bell : > On Mon, Feb 23, 2009 at 3:19 AM, Antonino Ingargiola wrote: >> >> Can you someone forward this thread to scipy-dev just to note that I >> modified *only* the blocksize in order to recover optimum >> performances. >> >> I can perform some bench if is needed, just put me in CC since I'm not >> a scipy-dev subscriber. 
>> > > Just FYI, the discussion has moved to this thread: > http://thread.gmane.org/gmane.comp.python.scientific.devel/10010 > > You can reply to the thread using the "followup" option from the menu > box in the upper right. Thanks a lot. ~ Antonio From bsouthey at gmail.com Tue Feb 24 09:17:30 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 24 Feb 2009 08:17:30 -0600 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <1cd32cbb0902231657y21aafc81pff468ed5b7bc4421@mail.gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> <49A318DE.9030303@gmail.com> <1cd32cbb0902231657y21aafc81pff468ed5b7bc4421@mail.gmail.com> Message-ID: <49A4017A.1080409@gmail.com> josef.pktd at gmail.com wrote: > On Mon, Feb 23, 2009 at 4:45 PM, Bruce Southey wrote: > >> josef.pktd at gmail.com wrote: >> >>> On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey wrote: >>> >>> >>>> Hi, >>>> Yes, the formula is incorrect. The reason is that the sum of squares >>>> terms are not corrected by the means because the ss function just >>>> computes the uncorrected sum of squares. >>>> >>>> Thus the correct formula should : >>>> sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean)))) >>>> Alternatively: >>>> sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean) >>>> / df) >>>> >>>> Note the formula is derived using the definition of R-squared: >>>> The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx) >>>> where Syy and Sxx are the corrected sums of squares for Y and X, >>>> respectively, >>>> >>>> Regards >>>> Bruce >>>> >>>> Hi all, >>>> I was working with linear regression in scipy and met some problems >>>> with value of standard error of the estimate returned by >>>> scipy.stats.linregress() function. I could not compare it to similar >>>> outputs of other linear regression routines (for example in Origin), >>>> so I took a look in the source (stats.py). >>>> >>>> In the source it is defined as >>>> sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) >>>> where r is correlation coefficient, df is degrees of freedom (N-2) and >>>> ss() is sum of squares of elements. >>>> >>>> After digging through literature the only formula looking somewhat the >>>> same was found to be >>>> stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) >>>> which gives the same result as a standard definition (in notation of >>>> the source of linregress) >>>> stderrest = np.sqrt(ss(y-slope*x-intercept)/df) >>>> but the output of linregress is different. >>>> >>>> I humbly suppose this is a bug, but maybe somebody could explain me >>>> what is it if I'm wrong... >>>> Pavlo. >>>> >>>> >>>> >>> Thank you for reporting and checking this. >>> >>> I fixed it in trunk, but still have to add a test. >>> There are still small (1e-4) numerical differences to the >>> multivariate ols version in the example I tried >>> >>> Josef >>> _______________________________________________ >>> SciPy-user mailing list >>> SciPy-user at scipy.org >>> http://projects.scipy.org/mailman/listinfo/scipy-user >>> >>> >> Assuming that linregress is different from expected, linregress is just >> a scalar implementation so it is probably prone to rounding error in >> numerous places. >> If you want, I can look at the example and see. 
>> >> Bruce >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> >> > > I just made up an example on the command line and compared it with my > version of the cookbook ols class using pinv . Until now I haven't > looked carefully at linregress because I think it should be completely > replaced with a full ols regression, or at least with a call to it. I > added and corrected a test. original tests only test constant and > slope estimates. > > Josef > > I have not tried the ols class but for an example based on your code (plus another I had very handy) I did not see any differences between linregress and SAS glm. It would be inappropriate to replace it or have a call to other module because we would have to provide the same input and output requirements. Rather I think we just have to leave it alone until we get an acceptable regression/general linear model/ANOVA solution. Then we can just depreciate before removing it. Bruce From josef.pktd at gmail.com Tue Feb 24 11:06:15 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 24 Feb 2009 11:06:15 -0500 Subject: [SciPy-user] error estimate in stats.linregress In-Reply-To: <49A4017A.1080409@gmail.com> References: <5e5978e10902212323o1f9dc4dn17b32d8c986f3440@mail.gmail.com> <3d375d730902212346w288ce1b2pb54d0e55db5804ae@mail.gmail.com> <49A300A9.4020608@gmail.com> <1cd32cbb0902231304s79948a1bj89a45d4b394fbf24@mail.gmail.com> <49A318DE.9030303@gmail.com> <1cd32cbb0902231657y21aafc81pff468ed5b7bc4421@mail.gmail.com> <49A4017A.1080409@gmail.com> Message-ID: <1cd32cbb0902240806k74a317bcj4f66be9e1104d985@mail.gmail.com> On Tue, Feb 24, 2009 at 9:17 AM, Bruce Southey wrote: > josef.pktd at gmail.com wrote: >> On Mon, Feb 23, 2009 at 4:45 PM, Bruce Southey wrote: >> >>> josef.pktd at gmail.com wrote: >>> >>>> On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey wrote: >>>> >>>> >>>>> Hi, >>>>> Yes, the formula is incorrect. The reason is that the sum of squares >>>>> terms are not corrected by the means because the ss function just >>>>> computes the uncorrected sum of squares. >>>>> >>>>> Thus the correct formula should : >>>>> sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean)))) >>>>> Alternatively: >>>>> sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean) >>>>> / df) >>>>> >>>>> Note the formula is derived using the definition of R-squared: >>>>> The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx) >>>>> where Syy and Sxx are the corrected sums of squares for Y and X, >>>>> respectively, >>>>> >>>>> Regards >>>>> Bruce >>>>> >>>>> Hi all, >>>>> I was working with linear regression in scipy and met some problems >>>>> with value of standard error of the estimate returned by >>>>> scipy.stats.linregress() function. I could not compare it to similar >>>>> outputs of other linear regression routines (for example in Origin), >>>>> so I took a look in the source (stats.py). >>>>> >>>>> In the source it is defined as >>>>> sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df) >>>>> where r is correlation coefficient, df is degrees of freedom (N-2) and >>>>> ss() is sum of squares of elements. 
>>>>> >>>>> After digging through literature the only formula looking somewhat the >>>>> same was found to be >>>>> stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df) >>>>> which gives the same result as a standard definition (in notation of >>>>> the source of linregress) >>>>> stderrest = np.sqrt(ss(y-slope*x-intercept)/df) >>>>> but the output of linregress is different. >>>>> >>>>> I humbly suppose this is a bug, but maybe somebody could explain me >>>>> what is it if I'm wrong... >>>>> Pavlo. >>>>> >>>>> >>>>> >>>> Thank you for reporting and checking this. >>>> >>>> I fixed it in trunk, but still have to add a test. >>>> There are still small ?(1e-4) numerical differences to the >>>> multivariate ols version in the example I tried >>>> >>>> Josef >>>> _______________________________________________ >>>> SciPy-user mailing list >>>> SciPy-user at scipy.org >>>> http://projects.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> Assuming that linregress is different from expected, linregress is just >>> a scalar implementation so it is probably prone to rounding error in >>> numerous places. >>> If you want, I can look at the example and see. >>> >>> Bruce >>> _______________________________________________ >>> SciPy-user mailing list >>> SciPy-user at scipy.org >>> http://projects.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> I just made up an example on the command line and compared it with my >> version of the cookbook ols class using pinv . Until now I haven't >> looked carefully at linregress because I think it should be completely >> replaced with a full ols regression, or at least with a call to it. I >> added and corrected a test. original tests only test constant and >> slope estimates. >> >> Josef >> >> > I have not tried the ols class but for an example based on your code > (plus another I had very handy) I did not see any differences between > linregress and SAS glm. I guess, I had an incomplete reload somewhere, I ran 1000 simulations of my simple example and the maximum absolute difference is 1e-15 (1e-13 in some other cases) The only difference I found is that the coefficient of determination R is calculated as the correlation coefficient between x and y and has a sign, compared to standard definition of R^squared. > > It would be inappropriate to replace it or have a call to other module > because we would have to provide the same input and output requirements. > Rather I think we just have to leave it alone until we get an acceptable > regression/general linear model/ANOVA solution. Then we can just > depreciate before removing it. > > Bruce Ok with me, I didn't think of the output requirements. And if linregress is properly tested (as it is now), we can leave it alone. The calculation could be cleaned up, e.g. use r = np.corrcoef(x,y) or the use of betai. Josef From simpson at math.toronto.edu Tue Feb 24 16:55:40 2009 From: simpson at math.toronto.edu (Gideon Simpson) Date: Tue, 24 Feb 2009 16:55:40 -0500 Subject: [SciPy-user] hankel transform Message-ID: Some time ago there was discussion of adding a Hankel Transform to SciPy. Was this ever done? -gideon From bjracine at glosten.com Tue Feb 24 19:35:09 2009 From: bjracine at glosten.com (Benjamin J. Racine) Date: Tue, 24 Feb 2009 16:35:09 -0800 Subject: [SciPy-user] Scientific packages for a distributed computing Amazon EC2 image? In-Reply-To: Message-ID: <8C2B20C4348091499673D86BF10AB6763B05797AE6@clipper.glosten.local> I put on there, but perhaps might have missed... 
cython mpi4py ETS (Enthought Tool Suite) Ben R. ________________________________ From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Peter Skomoroch Sent: Monday, February 23, 2009 11:52 AM To: SciPy Developers List; SciPy Users List Subject: [SciPy-user] Scientific packages for a distributed computing Amazon EC2 image? I'm collecting a wishlist of scientific and python related packages (numpy, scipy, etc) people would want installed on a Debian based Amazon EC2 machine image (AMI) for distributed computing. I'll make more information available as the machine image develops, some of these will also go into the Machetec2 AMI. Several variants of the AMI should become available in the next month. Please feel free to add any packages you would want pre-installed on the following wiki page: http://scipy.org/SciPyAmazonAmi Let me know if you spot any potential license conflicts with listed software. -- Peter N. Skomoroch 617.285.8348 http://www.datawrangling.com http://delicious.com/pskomoroch http://twitter.com/peteskomoroch -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritemio at gmail.com Wed Feb 25 03:12:37 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Wed, 25 Feb 2009 09:12:37 +0100 Subject: [SciPy-user] In-place matrix reordering Message-ID: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> Hi to the list, I have to "reorder" the columns of a big 2D array (a matrix) but without doing a temp copy of the whole matrix (since it is 1.5GB). Basically I would need something like this: a = arange(12).reshape(4,3) b = a[:,(2,0,1)] but without the array copy triggered by the advanced indexing. Let the (2,0,1) tuple be an arbitrary sequence previously computed. The .take method seems to do a copy too: b = a.take((2,0,1), axis=1) What I need is a "view" of "a" with an arbitrary column order. Is there a way to accomplish this task? Ciao, ~ Antonio From robert.kern at gmail.com Wed Feb 25 03:14:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Feb 2009 02:14:33 -0600 Subject: [SciPy-user] In-place matrix reordering In-Reply-To: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> References: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> Message-ID: <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> On Wed, Feb 25, 2009 at 02:12, Antonino Ingargiola wrote: > Hi to the list, > > I have to "reorder" the columns of a big 2D array (a matrix) but > without doing a temp copy of the whole matrix (since it is 1.5GB). > > Basically I would need something like this: > > a = arange(12).reshape(4,3) > b = a[:,(2,0,1)] > > but without the array copy triggered by the advanced indexing. Let the > (2,0,1) tuple be an arbitrary sequence previously computed. > > The .take method seems to do a copy too: > > b = a.take((2,0,1), axis=1) > > What I need is a "view" of "a" with an arbitrary column order. > > Is there a way to accomplish this task? No, sorry. numpy's memory model does not allow arbitrary views like this. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From ondrej at certik.cz Wed Feb 25 04:57:40 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Wed, 25 Feb 2009 01:57:40 -0800 Subject: [SciPy-user] FEM sol interpolation In-Reply-To: <49A2A930.5000606@meduni-graz.at> References: <49A2A930.5000606@meduni-graz.at> Message-ID: <85b5c3130902250157l4f0cadb6h6548ab681d00e637@mail.gmail.com> Hi Bernardo! On Mon, Feb 23, 2009 at 5:48 AM, Bernardo M. Rocha wrote: > Hi Guys, > > I would like to know how can I interpolate some data (a FEM solution) > from a coarse grid to a finer grid.... > > I would like to do something like (available in FEMLAB/COMSOL): > > u_int = postinterp(fem_sol, 'u', p_ref); > > Where fem_sol is my solution at some coarse grid and p_ref is my > reference mesh (where I also have a fem_ref solution defined). > > If this is not clear, I need to compute e = fem_ref - fem_sol. Depending on your FEM, you need to implement projections of the solution. Which software are you developing? If you want, feel free to ask and discuss any such questions on our hp-FEM list here (CCing there): http://groups.google.com/group/hpfem or browser what we do in our group: http://hpfem.org/ we are always looking for new collaborators and people to work with. Ondrej From tritemio at gmail.com Wed Feb 25 05:31:25 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Wed, 25 Feb 2009 11:31:25 +0100 Subject: [SciPy-user] In-place matrix reordering In-Reply-To: <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> References: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> Message-ID: <5486cca80902250231k1d4f89edkb6503911ec1fae6d@mail.gmail.com> 2009/2/25 Robert Kern : > On Wed, Feb 25, 2009 at 02:12, Antonino Ingargiola wrote: >> Hi to the list, >> >> I have to "reorder" the columns of a big 2D array (a matrix) but >> without doing a temp copy of the whole matrix (since it is 1.5GB). >> >> Basically I would need something like this: >> >> a = arange(12).reshape(4,3) >> b = a[:,(2,0,1)] >> >> but without the array copy triggered by the advanced indexing. Let the >> (2,0,1) tuple be an arbitrary sequence previously computed. >> >> The .take method seems to do a copy too: >> >> b = a.take((2,0,1), axis=1) >> >> What I need is a "view" of "a" with an arbitrary column order. >> >> Is there a way to accomplish this task? > > No, sorry. numpy's memory model does not allow arbitrary views like this. Is there any workarounds to save ram in this case? Something like using an external package or inline C (I have no idea about that). Thanks, ~ Antonio From gael.varoquaux at normalesup.org Wed Feb 25 05:43:30 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 25 Feb 2009 11:43:30 +0100 Subject: [SciPy-user] In-place matrix reordering In-Reply-To: <5486cca80902250231k1d4f89edkb6503911ec1fae6d@mail.gmail.com> References: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> <5486cca80902250231k1d4f89edkb6503911ec1fae6d@mail.gmail.com> Message-ID: <20090225104330.GA27481@phare.normalesup.org> On Wed, Feb 25, 2009 at 11:31:25AM +0100, Antonino Ingargiola wrote: > Is there any workarounds to save ram in this case? Something like > using an external package or inline C (I have no idea about that). Save to a file. Delete the array, load the file using memmaping, and extract the components you are interested in to a memroy-resident array. 
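Another option, when the array is already sitting in memory (or is opened as a writable memmap), is to apply the column permutation in place, one cycle at a time, so that only a single spare column is ever allocated. A rough sketch, not tested at the 1.5 GB scale, and the helper name is made up:

import numpy as np

def permute_columns_inplace(a, order):
    # Afterwards a[:, j] holds what was originally a[:, order[j]],
    # i.e. the same result as a[:, order], using only one spare column.
    order = np.asarray(order)
    n = a.shape[1]
    done = np.zeros(n, dtype=bool)
    for start in xrange(n):
        if done[start] or order[start] == start:
            done[start] = True
            continue
        buf = a[:, start].copy()      # save the first column of the cycle
        j = start
        while order[j] != start:
            a[:, j] = a[:, order[j]]
            done[j] = True
            j = order[j]
        a[:, j] = buf                 # close the cycle
        done[j] = True

a = np.arange(12.).reshape(4, 3)
permute_columns_inplace(a, [2, 0, 1])   # same content as a[:, (2, 0, 1)]

Since the array is stored row-major, every column assignment is a strided write, so this is mainly interesting once the data is already in RAM; on a disk-backed memmap a block-wise copy to a second file is usually the better trade-off.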
Ga?l From peter.skomoroch at gmail.com Wed Feb 25 06:32:47 2009 From: peter.skomoroch at gmail.com (Peter Skomoroch) Date: Wed, 25 Feb 2009 06:32:47 -0500 Subject: [SciPy-user] scipy comparison to other packages Message-ID: There is a discussion happening here including some folks from Mathworks that people on the list might be interested in. http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/ -- Peter N. Skomoroch 617.285.8348 http://www.datawrangling.com http://delicious.com/pskomoroch http://twitter.com/peteskomoroch -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Wed Feb 25 07:04:51 2009 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Wed, 25 Feb 2009 13:04:51 +0100 Subject: [SciPy-user] incremental sum Message-ID: Hi! Is there in numpy/scipy a function/method which does efficiently: for i in xrange(1, len(arr)): arr[i] += arr[i-1] Greetings, -- Marek Wojciechowski From pav at iki.fi Wed Feb 25 07:17:07 2009 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 25 Feb 2009 12:17:07 +0000 (UTC) Subject: [SciPy-user] incremental sum References: Message-ID: Wed, 25 Feb 2009 13:04:51 +0100, Marek Wojciechowski wrote: > Is there in numpy/scipy a function/method which does efficiently: > > for i in xrange(1, len(arr)): > arr[i] += arr[i-1] cumsum(arr, out=arr) From mwojc at p.lodz.pl Wed Feb 25 08:28:15 2009 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Wed, 25 Feb 2009 14:28:15 +0100 Subject: [SciPy-user] incremental sum References: Message-ID: Pauli Virtanen wrote: > Wed, 25 Feb 2009 13:04:51 +0100, Marek Wojciechowski wrote: >> Is there in numpy/scipy a function/method which does efficiently: >> >> for i in xrange(1, len(arr)): >> arr[i] += arr[i-1] > > cumsum(arr, out=arr) Thanks! -- Marek Wojciechowski From tritemio at gmail.com Wed Feb 25 08:45:07 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Wed, 25 Feb 2009 14:45:07 +0100 Subject: [SciPy-user] In-place matrix reordering In-Reply-To: <20090225104330.GA27481@phare.normalesup.org> References: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> <5486cca80902250231k1d4f89edkb6503911ec1fae6d@mail.gmail.com> <20090225104330.GA27481@phare.normalesup.org> Message-ID: <5486cca80902250545o33e528a4nd6c4674dbfd7db7a@mail.gmail.com> 2009/2/25 Gael Varoquaux : > On Wed, Feb 25, 2009 at 11:31:25AM +0100, Antonino Ingargiola wrote: >> Is there any workarounds to save ram in this case? Something like >> using an external package or inline C (I have no idea about that). > > Save to a file. Delete the array, load the file using memmaping, and > extract the components you are interested in to a memroy-resident array. I saved the matrix with numpy.save, but I have a problem during the assignment between the unordered (co) matrix and the reordered one (cor). u is a sequence of indices with u.sahpe = (500,). 
import numpy as N co = N.memmap('S_co.npy', dtype='float64', shape=(288092, 500)) # Raw data cor = N.memmap('S_co_reord.npy', dtype='float64', shape=(288092, 500), mode='w+') # Reordered data to be saved here cor[:,:] = co[:,u] Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in ignored --------------------------------------------------------------------------- MemoryError Traceback (most recent call last) /home/anto/Simulazioni/matlab/mc3d/ in () MemoryError: >>> I've never used the N.memmap function so maybe I do something wrong here. Any hints? > > Gaël ~ Antonio From gael.varoquaux at normalesup.org Wed Feb 25 09:07:31 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 25 Feb 2009 15:07:31 +0100 Subject: [SciPy-user] In-place matrix reordering In-Reply-To: <5486cca80902250545o33e528a4nd6c4674dbfd7db7a@mail.gmail.com> References: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> <5486cca80902250231k1d4f89edkb6503911ec1fae6d@mail.gmail.com> <20090225104330.GA27481@phare.normalesup.org> <5486cca80902250545o33e528a4nd6c4674dbfd7db7a@mail.gmail.com> Message-ID: <20090225140731.GA7706@phare.normalesup.org> On Wed, Feb 25, 2009 at 02:45:07PM +0100, Antonino Ingargiola wrote: > I've never used the N.memmap function so maybe I do something wrong > here. Any hints? use:: from numpy.lib import format format.open_memmap HTH, Gaël From nwagner at iam.uni-stuttgart.de Wed Feb 25 09:43:34 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 25 Feb 2009 15:43:34 +0100 Subject: [SciPy-user] Importing models from other FEM codes (NASTRAN, ...) Message-ID: Hi all, I am looking for non-commercial tools to import Nastran files - something similar to what is offered by the Structural Dynamics Toolbox http://www.sdtools.com/femlink.html Any pointer would be appreciated. Nils From tritemio at gmail.com Wed Feb 25 10:58:46 2009 From: tritemio at gmail.com (Antonino Ingargiola) Date: Wed, 25 Feb 2009 16:58:46 +0100 Subject: [SciPy-user] In-place matrix reordering In-Reply-To: <20090225140731.GA7706@phare.normalesup.org> References: <5486cca80902250012o3246a175u62aa9494adb0927d@mail.gmail.com> <3d375d730902250014u781b670fs15757a7eb26328a2@mail.gmail.com> <5486cca80902250231k1d4f89edkb6503911ec1fae6d@mail.gmail.com> <20090225104330.GA27481@phare.normalesup.org> <5486cca80902250545o33e528a4nd6c4674dbfd7db7a@mail.gmail.com> <20090225140731.GA7706@phare.normalesup.org> Message-ID: <5486cca80902250758h50ce836h1a78276e872b78ea@mail.gmail.com> 2009/2/25 Gael Varoquaux : > On Wed, Feb 25, 2009 at 02:45:07PM +0100, Antonino Ingargiola wrote: >> I've never used the N.memmap function so maybe I do something wrong >> here. Any hints? > > use:: > > from numpy.lib import format > format.open_memmap Is the difference between numpy.memmap and format.open_memmap only that the latter will guess the correct shape and dtype of the array? I did not find much documentation about memmap and what is the best format for storing an ndarray. I opened co and cor as mmap using format.open_memmap but the command cor[:,:] = co[:,u] gives the same error as before. I suspect that the command creates a temp variable in RAM with the size of co. I can copy the data in blocks, but the operation becomes really slow (> 10 min). Thanks for the suggestions so far.
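(A possible low-memory route, just an untested sketch: instead of building the reordered copy in one shot, apply the permutation in place by following its cycles, so only one spare column is ever held in RAM. Here `a` stands for the big array and `order` plays the role of `u`, i.e. the goal is the in-place equivalent of `a = a[:, order]`; the function name is made up for illustration. Column-by-column writes through a memmap may still be slow, but the memory footprint stays at a single column plus a small bookkeeping array.)

import numpy

def permute_columns_inplace(a, order):
    # In-place equivalent of a[:] = a[:, order], using one temporary column.
    order = numpy.asarray(order)
    done = numpy.zeros(len(order), dtype=bool)
    for start in range(len(order)):
        if done[start] or order[start] == start:
            done[start] = True
            continue
        tmp = a[:, start].copy()      # first column of the cycle gets overwritten
        j = start
        while True:
            k = order[j]              # column that must end up at position j
            done[j] = True
            if k == start:
                a[:, j] = tmp         # cycle closed: drop in the saved column
                break
            a[:, j] = a[:, k]
            j = k

a = numpy.arange(12).reshape(4, 3)
permute_columns_inplace(a, (2, 0, 1))
# a now holds what a[:, (2, 0, 1)] would have returned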
~ Antonio From daniel.wheeler2 at gmail.com Wed Feb 25 11:30:34 2009 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Wed, 25 Feb 2009 11:30:34 -0500 Subject: [SciPy-user] FEM sol interpolation In-Reply-To: <49A2A930.5000606@meduni-graz.at> References: <49A2A930.5000606@meduni-graz.at> Message-ID: <80b160a0902250830y493a4623r7b3049735ace0d80@mail.gmail.com> Hi Bernardo, I have tried doing this with scipy/numpy. If you check previous postings, there has been quite a lot of discussion about this. The real problem is going from an unstructured mesh to another unstructured mesh. If the grid you are interpolating from is regular then it is a much easier problem. When both meshes are irregular then it seems like the functions available in scipy/numpy require a lot of memory (Nele*Mele, where N and M are the number of cells in each domain; I would like to be wrong about this). Anyhow, the way the problem is tackled in fipy is to use the call method to pass the set of points from one grid to a variable which knows its mesh and holds the values. The above basically calls a nearest neighbor search and then projects the value using the gradient. For general meshes we use this method for the nearest cell: The method above is heinous for memory usage and efficiency; this is something we have to work on as we want to have adaptive grids etc in fipy, but at least it works. It is generally used for interpolating to a single point or a small number of points. If the mesh is regular we do much better in terms of efficiency and memory. Cheers On Mon, Feb 23, 2009 at 8:48 AM, Bernardo M. Rocha wrote: > Hi Guys, > > I would like to know how can I interpolate some data (a FEM solution) > from a coarse grid to a finer grid.... > > I would like to do something like (available in FEMLAB/COMSOL): > > u_int = postinterp(fem_sol, 'u', p_ref); > > Where fem_sol is my solution at some coarse grid and p_ref is my > reference mesh (where I also have a fem_ref solution defined). > > If this is not clear, I need to compute e = fem_ref - fem_sol. > > Thanks in advance, > Bernardo M. Rocha > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > -- Daniel Wheeler From stefan at sun.ac.za Wed Feb 25 12:17:09 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 25 Feb 2009 19:17:09 +0200 Subject: [SciPy-user] FEM sol interpolation In-Reply-To: <49A2A930.5000606@meduni-graz.at> References: <49A2A930.5000606@meduni-graz.at> Message-ID: <9457e7c80902250917o442c32bfne0149c4f5d7ac863@mail.gmail.com> 2009/2/23 Bernardo M. Rocha : > I would like to know how can I interpolate some data (a FEM solution) > from a coarse grid to a finer grid.... > > I would like to do something like (available in FEMLAB/COMSOL): > > u_int = postinterp(fem_sol, 'u', p_ref); > > Where fem_sol is my solution at some coarse grid and p_ref is my > reference mesh (where I also have a fem_ref solution defined). This sounds like the perfect application for a GPU!
(I'm not a domain expert, just thinking out loud) Cheers St?fan From jeremy at jeremysanders.net Wed Feb 25 12:34:48 2009 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Wed, 25 Feb 2009 17:34:48 +0000 Subject: [SciPy-user] ANN: Veusz 1.3 - a scientific plotting package and module Message-ID: Veusz 1.3 --------- Velvet Ember Under Sky Zenith ----------------------------- http://home.gna.org/veusz/ Veusz is Copyright (C) 2003-2009 Jeremy Sanders Licenced under the GPL (version 2 or greater). Veusz is a scientific plotting package. It is written in Python, using PyQt4 for display and user-interfaces, and numpy for handling the numeric data. Veusz is designed to produce publication-ready Postscript/PDF output. The user interface aims to be simple, consistent and powerful. Veusz provides a GUI, command line, embedding and scripting interface (based on Python) to its plotting facilities. It also allows for manipulation and editing of datasets. Changes in 1.3: * Add data capture from sockets, files and external programs * Remembers previous entries in dialog boxes * Add shaded regions or lines error bar style * Plot keys can be dragged around with the mouse * New clearer scalable icons * Now requires Python >= 2.4 * minor changes - Add filename completion in several places - Remember import dialog tab selection - Use font drop-down to select font - Add icons for error bar styles - Error bar code rewritten and simplified - Add import dialog to toolbar * bug fixes: - Fix incorrect "security errors" when loading invalid documents - Fix dragging around of shapes and lines problems - Fix address of FSF in license - Fix appearance of dialog box fonts on some systems - Fix recent files menu - Fix hiding of pages and graphs Features of package: * X-Y plots (with errorbars) * Line and function plots * Contour plots * Images (with colour mappings and colorbars) * Stepped plots (for histograms) * Fitting functions to data * Stacked plots and arrays of plots * Plot keys * Plot labels * Shapes and arrows on plots * LaTeX-like formatting for text * EPS/PDF/PNG/SVG export * Scripting interface * Dataset creation/manipulation * Embed Veusz within other programs * Text, CSV and FITS importing Requirements: Python (2.4 or greater required) http://www.python.org/ Qt >= 4.3 (free edition) http://www.trolltech.com/products/qt/ PyQt >= 4.3 (SIP is required to be installed first) http://www.riverbankcomputing.co.uk/pyqt/ http://www.riverbankcomputing.co.uk/sip/ numpy >= 1.0 http://numpy.scipy.org/ Optional: Microsoft Core Fonts (recommended for nice output) http://corefonts.sourceforge.net/ PyFITS >= 1.1 (optional for FITS import) http://www.stsci.edu/resources/software_hardware/pyfits For documentation on using Veusz, see the "Documents" directory. The manual is in pdf, html and text format (generated from docbook). Issues: * Can be very slow to plot large datasets if antialiasing is enabled. Right click on graph and disable antialias to speed up output. If you enjoy using Veusz, I would love to hear from you. Please join the mailing lists at https://gna.org/mail/?group=veusz to discuss new features or if you'd like to contribute code. The latest code can always be found in the SVN repository. 
Jeremy Sanders From karl.young at ucsf.edu Wed Feb 25 13:11:02 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Wed, 25 Feb 2009 10:11:02 -0800 Subject: [SciPy-user] FEM sol interpolation References: <49A2A930.5000606@meduni-graz.at> <9457e7c80902250917o442c32bfne0149c4f5d7ac863@mail.gmail.com> Message-ID: <9D202D4E86A4BF47BA6943ABDF21BE78058FABCF@EXVS06.net.ucsf.edu> >> 2009/2/23 Bernardo M. Rocha : >> I would like to know how can I interpolate some data (a FEM solution) >> from a coarse grid to a finer grid.... >> >> I would like to do something like (available in FEMLAB/COMSOL): >> >> u_int = postinterp(fem_sol, 'u', p_ref); >> >> Where fem_sol is my solution at some coarse grid and p_ref is my >> reference mesh (where I also have a fem_ref solution defined). > This sounds like the perfect application for a GPU! (I'm not a domain > expert, just thinking out loud) Speaking of which... don't know who's seen this (and have no idea what the quality of these is) but it might be of interest to folks interested in GPU computing: https://www.livemeeting.com/lrs/1100002761/Registration.aspx?pageName=nz3jz5c979v7nts0 From dmitrey15 at ukr.net Thu Feb 26 04:25:40 2009 From: dmitrey15 at ukr.net (Dmitrey) Date: Thu, 26 Feb 2009 11:25:40 +0200 Subject: [SciPy-user] [numerical optimization] Poll "what do you miss in OpenOpt" Message-ID: <49A66014.6070506@ukr.net> Hi all, I created a poll "what do you miss in OpenOpt framework", could you take participation? http://www.doodle.com/participation.html?pollId=a78g5mk9sf7dnrbe Let me remember for those ones who is not familiar with openopt - it's a free Python-written numerical optimization framework http://openopt.org Thank you in advance, Dmitrey From ross.williamson at usap.gov Thu Feb 26 05:23:27 2009 From: ross.williamson at usap.gov (Ross Williamson) Date: Thu, 26 Feb 2009 23:23:27 +1300 Subject: [SciPy-user] Convert int 64bits to float Message-ID: <49A66D9F.7050709@usap.gov> Hi everyone I need to convert a set of bits (64) held in an int64 variable to the equivalent (bit) float64. e.g. number = 5975615269021 - float(number) will not work it will just convert it to 5975615269021.00 not whatever it is as a float from the raw bits. Cheers Ross From s.mientki at ru.nl Thu Feb 26 06:33:32 2009 From: s.mientki at ru.nl (Stef Mientki) Date: Thu, 26 Feb 2009 12:33:32 +0100 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <49A02E1A.5050703@gmail.com> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> Message-ID: <49A67E0C.4010002@ru.nl> hi Eric, Eric Schug wrote: > Eric Schug wrote: > >> Is there strong interest in automating matlab to numpy conversion? >> >> I have a working version of a matlab to python translator. >> It allows translation of matlab scripts into numpy constructs, >> supporting most of the matlab language. The parser is nearly >> complete. Most of the remaining work involves providing a robust >> translation. Such as >> * making sure that copies on assign are done when needed. >> * correct indexing a(:) becomes a.flatten(1) when on the left hand >> side (lhs) of equals >> and a[:] when on the right hand side >> >> >> I've seen a few projects attempt to do this, but for one reason or >> another have stopped it. >> >> >> > > Such a translator would be very welcome. We just tried the translator with a simple script, attached the m-file and the hand-corrected py-file (where you can changes the corrections). These are our findings * show() command is not needed in Matlab. 
* Case-Senitive function calls are automatically translated in Matlab * Complex number i is transformed in a function 1j() * product of 2 numbers is changed in a (ugly) dot-product. * can't handle unicode e.g. "re?el" * matdiv is not found * graphs are nicer, but axis are uglier * subplot needs integer values the resulting graphs from both Matlab and Python, can be seen here (we might not have used the latest MatPlotLib). good luck and cheers, Stef -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: HPF_graph.m URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: HPF_graph.py Type: text/x-python Size: 1025 bytes Desc: not available URL: From s.mientki at ru.nl Thu Feb 26 06:47:21 2009 From: s.mientki at ru.nl (Stef Mientki) Date: Thu, 26 Feb 2009 12:47:21 +0100 Subject: [SciPy-user] LiberMate was Re: Automating Matlab In-Reply-To: <49A67E0C.4010002@ru.nl> References: <4984F58C.5070605@gmail.com> <49A02E1A.5050703@gmail.com> <49A67E0C.4010002@ru.nl> Message-ID: <49A68149.5040003@ru.nl> forgot the link > > > the resulting graphs from both Matlab and Python, can be seen here > (we might not have used the latest MatPlotLib). > http://mientki.ruhosting.nl/data_www/pylab_works/matlab.html From sgarcia at olfac.univ-lyon1.fr Thu Feb 26 06:17:20 2009 From: sgarcia at olfac.univ-lyon1.fr (Samuel GARCIA) Date: Thu, 26 Feb 2009 12:17:20 +0100 Subject: [SciPy-user] Circular Buffer Message-ID: <49A67A40.1040205@olfac.univ-lyon1.fr> HI, I am writting a circular buffer for a acuisiqtion program. The idea is simple use and derivate a numpy.array for changing one behaviour : the buffer is circular. So when the slice arrive to the end it start at the begining with the modulo size of the buffer. Here is a draft of implementation of this for 1d. Question : -is there something more clerver in line 27 to avoid concatenate and so a duplication of the array - does someones already implement this for ND ? Thanks Samuel # -*- coding: utf-8 -*- import numpy class CircularBuffer1d(): def __init__(self , shape , dtype = 'f'): self.shape = shape self.array = numpy.zeros(shape , dtype = dtype) def __getitem__(self , sl): if type(sl) == int: return self.array[sl%self.shape[0]] elif type(sl) == slice : if sl.start is None : start = 0 else : start = sl.start % self.shape[0] stop = sl.stop % self.shape[0] if stop>start : return self.array[start:stop] else : return numpy.concatenate( ( self.array[start:], self.array[: stop ]) , axis = 0) def __setitem__(self , sl, a): if type(sl) == int: self.array[sl%self.shape[0]] = a elif type(sl) == slice : if sl.start is None : start = 0 else : start = sl.start % self.shape[0] stop = sl.stop % self.shape[0] if stop>start : self.array[start:stop] = a else : self.array[start:] = a[:self.shape[0] - start] self.array[: a.shape[0]-(self.shape[0] - start) ] = a[self.shape[0] - start:] c = CircularBuffer1d( (30,) , dtype = 'f') print c.array c[14:17] = numpy.ones((3)) print c.array c[58:63] = numpy.arange(5)+1 print c.array c[14] = 14 c[15] = 15 print c.array print c[15:15] c[15:15] = numpy.arange(30) print c[0:30] -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Laboratoire de Neurosciences Sensorielles, Comportement, Cognition. 
CNRS - UMR5020 - Universite Claude Bernard LYON 1 Equipe logistique et technique 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE T?l : 04 37 28 74 64 Fax : 04 37 28 76 01 http://olfac.univ-lyon1.fr/unite/equipe-07/ http://neuralensemble.org/trac/OpenElectrophy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From sturla at molden.no Thu Feb 26 07:43:48 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 26 Feb 2009 13:43:48 +0100 Subject: [SciPy-user] Circular Buffer In-Reply-To: <49A67A40.1040205@olfac.univ-lyon1.fr> References: <49A67A40.1040205@olfac.univ-lyon1.fr> Message-ID: <49A68E84.1010105@molden.no> On 2/26/2009 12:17 PM, Samuel GARCIA wrote: > -is there something more clerver in line 27 to avoid concatenate and so > a duplication of the array You can make the allocated memory twice the size you need. When the buffer is full, you copy the latter half to the start. Let N be the capacity of the ringuffer, you could then have an append method like this (only appending to the tail is how most ringbuffers are used): def append(self, a): try: self.array[self.tail] = a self.tail += 1 except IndexError: N = self.array.shape[0] // 2 self.array[:N] = self.array[N:] self.array[N] = a self.tail = N + 1 The IndexError exception triggers an O(N) operation for every Nth call to append, which makes appends having amortized O(1) complexity. That is, they become O(1) on average. You avoid the concatenation in __getitem__ because the items are always stored contiguously. Sturla Molden > - does someones already implement this for ND ? > > Thanks > > Samuel > > # -*- coding: utf-8 -*- > > > > > import numpy > > > class CircularBuffer1d(): > def __init__(self , shape , dtype = 'f'): > self.shape = shape > self.array = numpy.zeros(shape , dtype = dtype) > > def __getitem__(self , sl): > if type(sl) == int: > return self.array[sl%self.shape[0]] And ndarray already has bounds-checks, to this is > elif type(sl) == slice : > if sl.start is None : > start = 0 > else : > start = sl.start % self.shape[0] > stop = sl.stop % self.shape[0] > if stop>start : > return self.array[start:stop] > else : > return numpy.concatenate( ( self.array[start:], > self.array[: stop ]) , axis = 0) > > def __setitem__(self , sl, a): > if type(sl) == int: > self.array[sl%self.shape[0]] = a > elif type(sl) == slice : > if sl.start is None : > start = 0 > else : > start = sl.start % self.shape[0] > > stop = sl.stop % self.shape[0] > > if stop>start : > self.array[start:stop] = a > else : > self.array[start:] = a[:self.shape[0] - start] > self.array[: a.shape[0]-(self.shape[0] - start) ] = > a[self.shape[0] - start:] > > > c = CircularBuffer1d( (30,) , dtype = 'f') > print c.array > c[14:17] = numpy.ones((3)) > print c.array > c[58:63] = numpy.arange(5)+1 > print c.array > c[14] = 14 > c[15] = 15 > print c.array > print c[15:15] > > c[15:15] = numpy.arange(30) > print c[0:30] > > > > > From ndbecker2 at gmail.com Thu Feb 26 08:23:37 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 26 Feb 2009 08:23:37 -0500 Subject: [SciPy-user] Circular Buffer References: <49A67A40.1040205@olfac.univ-lyon1.fr> <49A68E84.1010105@molden.no> Message-ID: You might like to look at the circular buffer implementation in boost. I also have my own c++ code for circular. It is an iterator adaptor. Basically, it adapts an iterator to a circular iterator using mod function. 
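For reference, a minimal sketch of the double-allocation idea Sturla describes above -- untested, scalar appends only, and the class/method names are made up for illustration, not taken from any of the posted code. Allocating twice the capacity means the newest values always sit in one contiguous block, so reading them back never needs a concatenate, and the occasional half-copy keeps appends amortized O(1):

import numpy

class RingBuffer1d(object):
    def __init__(self, capacity, dtype='f'):
        self.capacity = capacity
        # twice the needed storage, as suggested above
        self.array = numpy.zeros(2 * capacity, dtype=dtype)
        self.tail = 0                    # next write position

    def append(self, value):
        if self.tail == 2 * self.capacity:
            # storage exhausted: move the newest half to the front;
            # this O(capacity) copy happens once per `capacity` appends
            self.array[:self.capacity] = self.array[self.capacity:]
            self.tail = self.capacity
        self.array[self.tail] = value
        self.tail += 1

    def recent(self, n):
        # contiguous view of the last n values, oldest first -- no copy
        if n > min(self.tail, self.capacity):
            raise ValueError('fewer than n values stored')
        return self.array[self.tail - n:self.tail]

buf = RingBuffer1d(30)
for i in range(100):
    buf.append(i)
print buf.recent(5)    # -> [ 95.  96.  97.  98.  99.]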
From robert.kern at gmail.com Thu Feb 26 14:16:31 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 26 Feb 2009 13:16:31 -0600 Subject: [SciPy-user] Convert int 64bits to float In-Reply-To: <49A66D9F.7050709@usap.gov> References: <49A66D9F.7050709@usap.gov> Message-ID: <3d375d730902261116w40ee1fb2v2f75fb9a239b5f20@mail.gmail.com> On Thu, Feb 26, 2009 at 04:23, Ross Williamson wrote: > Hi everyone > > I need to convert a set of bits (64) held in an int64 variable to the > equivalent (bit) float64. > > e.g. number = 5975615269021 - float(number) will not work it will just > convert it to 5975615269021.00 not whatever it is as a float from the > raw bits. In [2]: x = int64(5975615269021) In [3]: x.view(float64) Out[3]: 2.9523462171876746e-311 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From nwagner at iam.uni-stuttgart.de Fri Feb 27 05:09:54 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 27 Feb 2009 11:09:54 +0100 Subject: [SciPy-user] scipy, matlab and NASTRAN Message-ID: Hi all, I found a tool to import NASTRAN op4 files in Matlab. http://danial.org/op4/ Just now I have asked Al Danial to release his op4 tool under the BSD license. What is needed to write a wrapper for loadop4.c ? I tried to use swig. So I have created a file loadop4.i %module example %{ /* Includes the header in the wrapper code */ #include "op4.h" #include "sparse.h" %} /* Parse the header file to generate wrappers */ %include "op4.h" %include "sparse.h" swig -python loadop4.i gcc -c loadop4.c loadop4_wrap.c -I /data/home/nwagner/local/include/python2.5 In Datei, eingef?gt von loadop4.c:42: sparse.h:23:17: mex.h: Datei oder Verzeichnis nicht gefunden In file included from loadop4.c:42: sparse.h:180: Fehler: Syntaxfehler vor "mwIndex" loadop4.c:1138: Fehler: Syntaxfehler vor "mxArray" loadop4.c: In function `mexFunction': loadop4.c:1178: Fehler: ?nrhs? nicht deklariert (erste Benutzung in dieser Funktion) loadop4.c:1178: Fehler: (Jeder nicht deklarierte Bezeichner wird nur einmal aufgef?hrt loadop4.c:1178: Fehler: f?r jede Funktion in der er auftritt.) loadop4.c:1182: Fehler: ?prhs? nicht deklariert (erste Benutzung in dieser Funktion) loadop4.c:1187: Fehler: ?nlhs? nicht deklariert (erste Benutzung in dieser Funktion) loadop4.c:1193: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1319: Fehler: ?plhs? nicht deklariert (erste Benutzung in dieser Funktion) loadop4.c:1320: Fehler: ?mxCOMPLEX? nicht deklariert (erste Benutzung in dieser Funktion) loadop4.c:1321: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1322: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1325: Fehler: ?mxREAL? 
nicht deklariert (erste Benutzung in dieser Funktion) loadop4.c:1326: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1328: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1329: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1339: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1340: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung loadop4.c:1346: Warnung: Zuweisung erzeugt Zeiger von Ganzzahl ohne Typkonvertierung In file included from loadop4_wrap.c:708: op4.h:110: Fehler: Syntaxfehler vor "str_t" op4.h:122: Fehler: Syntaxfehler vor "str_t" op4.h:155: Fehler: Syntaxfehler vor "SparseMatrix" In Datei, eingef?gt von loadop4_wrap.c:709: sparse.h:23:17: mex.h: Datei oder Verzeichnis nicht gefunden In file included from loadop4_wrap.c:709: sparse.h:180: Fehler: Syntaxfehler vor "mwIndex" loadop4_wrap.c: In function `_wrap_strings_in_list': loadop4_wrap.c:2405: Fehler: ?mwIndex? nicht deklariert (erste Benutzung in dieser Funktion) loadop4_wrap.c:2405: Fehler: (Jeder nicht deklarierte Bezeichner wird nur einmal aufgef?hrt loadop4_wrap.c:2405: Fehler: f?r jede Funktion in der er auftritt.) loadop4_wrap.c:2405: Fehler: ?arg2? nicht deklariert (erste Benutzung in dieser Funktion) loadop4_wrap.c:2405: Fehler: Syntaxfehler vor ?)?-Zeichen Is it possible to remove the Matlab dependency (mex.h) ? Any pointer would be appreciated ? Thanks in advance. Nils From jeremy at jeremysanders.net Fri Feb 27 05:45:32 2009 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Fri, 27 Feb 2009 10:45:32 +0000 Subject: [SciPy-user] numpy repr issue Message-ID: Hi - I wonder whether you consider this a bug (numpy 1.2.0 x86_64): In [51]: repr(numpy.arange(10000)) Out[51]: 'array([ 0, 1, 2, ..., 9997, 9998, 9999])' I've found out that you can use numpy.set_printoptions to control this truncation behaviour, but the python documentation says: repr(object)?? Return a string containing a printable representation of an object. This is the same value yielded by conversions (reverse quotes). It is sometimes useful to be able to access this operation as an ordinary function. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method. Numpy doesn't obey this. It returns a string you can't do repr on to get back the original data. It's not a string in angle brackets either. I've just hit a bug in pymc where truncated data is stored in its text database. It uses eval to read back the data. This truncation causes silent data loss in numpy-using applications which assume the standard python behaviour of being able to eval a repr'd object. You won't see the problem unless your data starts being greater than 1000 values. Shouldn't numpy change the default truncation so that no data are lost in repr? Jeremy From nwagner at iam.uni-stuttgart.de Fri Feb 27 07:01:26 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 27 Feb 2009 13:01:26 +0100 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: Message-ID: Hi all, Meanwhile I made some progress. 
>>> import loadop4 Traceback (most recent call last): File "", line 1, in ImportError: ./loadop4.so: undefined symbol: sp_copy_column nm loadop4.so | grep sp_copy_column U sp_copy_column 000000000000e091 t _wrap_sp_copy_column How can I resolve that problem ? Nils From sturla at molden.no Fri Feb 27 11:27:36 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 27 Feb 2009 17:27:36 +0100 Subject: [SciPy-user] Implementation of a parallel cKDTree Message-ID: <49A81478.7050807@molden.no> I have fiddled a bit with scipy.spatial.cKDTree for better performance on multicore CPUs. I have used threading.Thread instead of OpenMP, so no special compilation or compiler is required. The number of threads defaults to the number of processors if it can be determined. The performance is not much different from what I get with OpenMP. It is faster than using cKDTree with multiprocessing and shared memory. Memory handling is also improved. There are checks for NULL pointers returned by malloc or realloc. setjmp/longjmp is used for error handling if malloc or realloc fail. A memory pool is used to make sure all complex data structures are cleaned up properly. I have assumed that crt functions malloc, realloc and free are thread safe. This is usually the case. If they are not, they must be wrapped with calls to PyGILState_Ensure and PyGILState_Release. I have not done this as it could impair scalability. Regards, Sturla Molden -------------- next part -------------- A non-text attachment was scrubbed... Name: ckdtree_mt.pyx Type: / Size: 29331 bytes Desc: not available URL: From wnbell at gmail.com Fri Feb 27 13:58:59 2009 From: wnbell at gmail.com (Nathan Bell) Date: Fri, 27 Feb 2009 13:58:59 -0500 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: Message-ID: On Fri, Feb 27, 2009 at 7:01 AM, Nils Wagner wrote: > Hi all, > > Meanwhile I made some progress. > >>>> import loadop4 > Traceback (most recent call last): > ? File "", line 1, in > ImportError: ./loadop4.so: undefined symbol: > sp_copy_column > > nm loadop4.so | grep sp_copy_column > ? ? ? ? ? ? ? ? ?U sp_copy_column > 000000000000e091 t _wrap_sp_copy_column > > How can I resolve that problem ? > It looks like sparse.h contains function prototypes that are never defined in a .c file. You should be able to just delete them from the header files. Nils, have you considered using NumPy's IO capabilities instead of using this code? I find IO to be a place where implementing from scratch with Python + NumPy is often easier and just as fast. Plus, you end up with a .py that's easy to share with others. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From matthew.brett at gmail.com Fri Feb 27 14:05:10 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 27 Feb 2009 11:05:10 -0800 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: Message-ID: <1e2af89e0902271105t2cfffc4obdcec2ad8b9927eb@mail.gmail.com> Hi, >>>>> import loadop4 >> Traceback (most recent call last): >> ? File "", line 1, in >> ImportError: ./loadop4.so: undefined symbol: >> sp_copy_column >> >> nm loadop4.so | grep sp_copy_column >> ? ? ? ? ? ? ? ? ?U sp_copy_column >> 000000000000e091 t _wrap_sp_copy_column >> >> How can I resolve that problem ? Is the format defined anywhere? Can you let us know either way about the author's response to relicensing? 
Best, Matthew From nwagner at iam.uni-stuttgart.de Fri Feb 27 14:08:36 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 27 Feb 2009 20:08:36 +0100 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: <1e2af89e0902271105t2cfffc4obdcec2ad8b9927eb@mail.gmail.com> References: <1e2af89e0902271105t2cfffc4obdcec2ad8b9927eb@mail.gmail.com> Message-ID: On Fri, 27 Feb 2009 11:05:10 -0800 Matthew Brett wrote: > Hi, > >>>>>> import loadop4 >>> Traceback (most recent call last): >>> ? File "", line 1, in >>> ImportError: ./loadop4.so: undefined symbol: >>> sp_copy_column >>> >>> nm loadop4.so | grep sp_copy_column >>> ? ? ? ? ? ? ? ? ?U sp_copy_column >>> 000000000000e091 t _wrap_sp_copy_column >>> >>> How can I resolve that problem ? > > Is the format defined anywhere? Can you let us know >either way about > the author's response to relicensing? > He is forced to use GPL 2 due to the tops derivation. Cheers, Nils From nwagner at iam.uni-stuttgart.de Fri Feb 27 14:11:49 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 27 Feb 2009 20:11:49 +0100 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: <1e2af89e0902271105t2cfffc4obdcec2ad8b9927eb@mail.gmail.com> Message-ID: On Fri, 27 Feb 2009 20:08:36 +0100 "Nils Wagner" wrote: > On Fri, 27 Feb 2009 11:05:10 -0800 > Matthew Brett wrote: >> Hi, >> >>>>>>> import loadop4 >>>> Traceback (most recent call last): >>>> ? File "", line 1, in >>>> ImportError: ./loadop4.so: undefined symbol: >>>> sp_copy_column >>>> >>>> nm loadop4.so | grep sp_copy_column >>>> ? ? ? ? ? ? ? ? ?U sp_copy_column >>>> 000000000000e091 t _wrap_sp_copy_column >>>> >>>> How can I resolve that problem ? >> >> Is the format defined anywhere? Can you let us know >>either way about >> the author's response to relicensing? >> > > He is forced to use GPL 2 due to the tops derivation. > > Cheers, > > Nils I should add the link to tops for completeness. http://savannah.nongnu.org/projects/tops From matthew.brett at gmail.com Fri Feb 27 14:13:44 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 27 Feb 2009 11:13:44 -0800 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: <1e2af89e0902271105t2cfffc4obdcec2ad8b9927eb@mail.gmail.com> Message-ID: <1e2af89e0902271113q74209010v822931946d69700d@mail.gmail.com> Hi >> Is the format defined anywhere? ?Can you let us know >>either way about >> the author's response to relicensing? >> > > He is forced to use GPL 2 due to the tops derivation. That's a killer; do you know where the format is defined? It might be easy to implement in python. Matthew From robert.kern at gmail.com Fri Feb 27 14:19:22 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Feb 2009 13:19:22 -0600 Subject: [SciPy-user] numpy repr issue In-Reply-To: References: Message-ID: <3d375d730902271119p45c5dfc2q8de35b3baf4e4e48@mail.gmail.com> On Fri, Feb 27, 2009 at 04:45, Jeremy Sanders wrote: > Hi - I wonder whether you consider this a bug (numpy 1.2.0 x86_64): > > In [51]: repr(numpy.arange(10000)) > Out[51]: 'array([ ? 0, ? ?1, ? ?2, ..., 9997, 9998, 9999])' > > I've found out that you can use numpy.set_printoptions to control this > truncation behaviour, but the python documentation says: > > repr(object)? > Return a string containing a printable representation of an object. This is > the same value yielded by conversions (reverse quotes). It is sometimes > useful to be able to access this operation as an ordinary function. 
For many > types, this function makes an attempt to return a string that would yield an > object with the same value when passed to eval(), otherwise the > representation is a string enclosed in angle brackets that contains the name > of the type of the object together with additional information often > including the name and address of the object. A class can control what this > function returns for its instances by defining a __repr__() method. > > Numpy doesn't obey this. It returns a string you can't do repr on to get > back the original data. It's not a string in angle brackets either. It is far from a strict commandment. There is absolutely no guarantee. The reason for the truncation by default is interactive use. Numeric used to display everything by default. If you typed the name of a very large array and pressed enter, the interpreter would try to make a full string repr from it and then display it. Since this is done in C code, you could not cancel it from the keyboard. > I've just hit a bug in pymc where truncated data is stored in its text > database. It uses eval to read back the data. PyMC should not be relying on that behavior. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From nwagner at iam.uni-stuttgart.de Fri Feb 27 14:23:04 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 27 Feb 2009 20:23:04 +0100 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: Message-ID: On Fri, 27 Feb 2009 13:58:59 -0500 Nathan Bell wrote: > On Fri, Feb 27, 2009 at 7:01 AM, Nils Wagner > wrote: >> Hi all, >> >> Meanwhile I made some progress. >> >>>>> import loadop4 >> Traceback (most recent call last): >> ? File "", line 1, in >> ImportError: ./loadop4.so: undefined symbol: >> sp_copy_column >> >> nm loadop4.so | grep sp_copy_column >> ? ? ? ? ? ? ? ? ?U sp_copy_column >> 000000000000e091 t _wrap_sp_copy_column >> >> How can I resolve that problem ? >> > > It looks like sparse.h contains function prototypes that >are never > defined in a .c file. You should be able to just delete >them from the > header files. > > Nils, have you considered using NumPy's IO capabilities >instead of > using this code? Not really. I am not familiar with C. Hence I tried swig. I find IO to be a place where >implementing from > scratch with Python + NumPy is often easier and just as >fast. Plus, > you end up with a .py that's easy to share with others. +1. Anyway the source is available at http://danial.org/op4/load_save_op4.tar.gz Nils From matthew.brett at gmail.com Fri Feb 27 14:33:54 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 27 Feb 2009 11:33:54 -0800 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: Message-ID: <1e2af89e0902271133p5e0f6532ub2484fd58056af9c@mail.gmail.com> Hi, > +1. > > Anyway the source is available at > http://danial.org/op4/load_save_op4.tar.gz Yes, but no use to us, because it's GPL. Best, Matthew From nwagner at iam.uni-stuttgart.de Fri Feb 27 14:44:16 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 27 Feb 2009 20:44:16 +0100 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: <1e2af89e0902271133p5e0f6532ub2484fd58056af9c@mail.gmail.com> References: <1e2af89e0902271133p5e0f6532ub2484fd58056af9c@mail.gmail.com> Message-ID: On Fri, 27 Feb 2009 11:33:54 -0800 Matthew Brett wrote: > Hi, > >> +1. 
>> >> Anyway the source is available at >> http://danial.org/op4/load_save_op4.tar.gz > > Yes, but no use to us, because it's GPL. > > Best, > > Matthew Maybe someone can convince Al Danial to release his code under BSD license. Nils This is what I found on the mailing list: If you are the copyright holder, and if you have not granted exclusive rights to another party, you can release you code under as many different nonexclusive licenses as you please as many times as you please. If you created the code, you are the copyright holder. If others helped create it, you need their "OK", unless they assigned their copyrights to you. So, for example, if you (the copyright holder) released your code under the GPL and now wish to release it under the new BSD license, no problem. (And, bravo!) Many people just hand out their code with a note in the header stating the author and that it is BSD licensed. If you want to be more careful, the note should direct them to an accompanying file (say, LICENSE.txt) in which you copy this template at the *bottom* of http://www.opensource.org/licenses/bsd-license.php and replace and . (And actually, this is short enough---2 paragraphs---that you can put it in every file if you prefer.) Most people nowadays leave out the no endorsement clause, which proved pointless. What does the BSD license do: it roughly says anyone can use or modify the code without conditions except keeping the copyright intact, and you incur no obligation to them or to anyone else they might distribute to. It is a good, simple license. If you leave out the "endorsement clause", as I would, you end up with a license that is essentially the same as the MIT license. MIT is my favorite (for utter simplicity), but for code destined for SciPy, the simplified BSD (i.e., no endorsement clause) is the right choice. From matthew.brett at gmail.com Fri Feb 27 14:47:59 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 27 Feb 2009 11:47:59 -0800 Subject: [SciPy-user] scipy, matlab and NASTRAN In-Reply-To: References: <1e2af89e0902271133p5e0f6532ub2484fd58056af9c@mail.gmail.com> Message-ID: <1e2af89e0902271147y64a9e7e0sa9d0589ba059e871@mail.gmail.com> Hi, > Maybe someone can convince Al Danial to release his code > under BSD license. That person would most likely be the person who most wants to use the code. Matthew From zachary.pincus at yale.edu Fri Feb 27 16:47:55 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 27 Feb 2009 16:47:55 -0500 Subject: [SciPy-user] Some more useful image filters: anisotropic diffusion and Canny edge-finding Message-ID: Hi all, As I've been quite happily using with Nadav's bilateral filtering code, I figured I should reciprocate and post some other useful basic image-processing algorithms that I've recently numpy-ified. Here are first-passes at Canny edge finding and Perona and Malik-style anisotropic diffusion; both are pretty simplistic and not particularly fast, but they work as advertised. Zach import numpy import scipy.ndimage as ndimage # Anisotropic Diffusion, as per Perona and Malik's paper (see section V). def _exp(image_gradient, scale): return numpy.exp(-(numpy.absolute(image_gradient)/scale)**2) def _inv(image_gradient, scale): return 1 / (1 + (numpy.absolute(image_gradient)/scale)**2) def anisotropic_diffusion(image, num_iters=10, scale=10, step_size=0.2, conduction_function=_inv): # 'step_size' is Perona and Malik's lambda parameter; scale is their 'K' parameter. 
# The 'conduction_function' is the function 'g' in the original formulation; # if this function simply returns a constant, the result is Gaussian blurring. if step_size > 0.25: raise ValueError('step_size parameter must be <= 0.25 for numerical stability.') image = image.copy() # simplistic boundary conditions -- no diffusion at the boundary central = image[1:-1, 1:-1] n = image[:-2, 1:-1] s = image[2:, 1:-1] e = image[1:-1, :-2] w = image[1:-1, 2:] directions = [s,e,w] for i in range(num_iters): di = n - central accumulator = conduction_function(di, scale)*di for direction in directions: di = direction - central accumulator += conduction_function(di, scale)*di accumulator *= step_size central += accumulator return image # Canny edge-finding, implemented as per the Wikipedia article # Note that this takes four passes through the image to do the # non-maximal suppression, whereas a c or cython loop could do # it in one. # Filter kernels for calculating the value of neighbors in several directions _N = numpy.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]], dtype=bool) _NE = numpy.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=bool) _W = numpy.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]], dtype=bool) _NW = numpy.array([[1, 0, 0], [0, 0, 0], [0, 0, 1]], dtype=bool) # After quantizing the angles, vertical (north-south) edges get values of 3, # northwest-southeast edges get values of 2, and so on, as below: _NE_d = 0 _W_d = 1 _NW_d = 2 _N_d = 3 def canny(image, high_threshold, low_threshold): grad_x = ndimage.sobel(image, 0) grad_y = ndimage.sobel(image, 1) grad_mag = numpy.sqrt(grad_x**2+grad_y**2) grad_angle = numpy.arctan2(grad_y, grad_x) # next, scale the angles in the range [0, 3] and then round to quantize quantized_angle = numpy.around(3 * (grad_angle + numpy.pi) / (numpy.pi * 2)) # Non-maximal suppression: an edge pixel is only good if its magnitude is # greater than its neighbors normal to the edge direction. We quantize # edge direction into four angles, so we only need to look at four # sets of neighbors NE = ndimage.maximum_filter(grad_mag, footprint=_NE) W = ndimage.maximum_filter(grad_mag, footprint=_W) NW = ndimage.maximum_filter(grad_mag, footprint=_NW) N = ndimage.maximum_filter(grad_mag, footprint=_N) thinned = (((g > W) & (quantized_angle == _N_d )) | ((g > N) & (quantized_angle == _W_d )) | ((g > NW) & (quantized_angle == _NE_d)) | ((g > NE) & (quantized_angle == _NW_d)) ) thinned_grad = thinned * grad_mag # Now, hysteresis thresholding: find seeds above a high threshold, then # expand out until we go below the low threshold high = thinned_grad > high_threshold low = thinned_grad > low_threshold canny_edges = ndimage.binary_dilation(high, iterations=-1, mask=low) return grad_mag, thinned_grad, canny_edges From zachary.pincus at yale.edu Fri Feb 27 16:51:38 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 27 Feb 2009 16:51:38 -0500 Subject: [SciPy-user] Some more useful image filters: anisotropic diffusion and Canny edge-finding In-Reply-To: References: Message-ID: <2BD49B0E-3F29-464D-A3B2-B58F43EB7093@yale.edu> Err... I missed a variable name-change in a few places! 
The canny function should be: def canny(image, high_threshold, low_threshold): grad_x = ndimage.sobel(image, 0) grad_y = ndimage.sobel(image, 1) grad_mag = numpy.sqrt(grad_x**2+grad_y**2) grad_angle = numpy.arctan2(grad_y, grad_x) # next, scale the angles in the range [0, 3] and then round to quantize quantized_angle = numpy.around(3 * (grad_angle + numpy.pi) / (numpy.pi * 2)) # Non-maximal suppression: an edge pixel is only good if its magnitude is # greater than its neighbors normal to the edge direction. We quantize # edge direction into four angles, so we only need to look at four # sets of neighbors NE = ndimage.maximum_filter(grad_mag, footprint=_NE) W = ndimage.maximum_filter(grad_mag, footprint=_W) NW = ndimage.maximum_filter(grad_mag, footprint=_NW) N = ndimage.maximum_filter(grad_mag, footprint=_N) thinned = (((grad_mag > W) & (quantized_angle == _N_d )) | ((grad_mag > N) & (quantized_angle == _W_d )) | ((grad_mag > NW) & (quantized_angle == _NE_d)) | ((grad_mag > NE) & (quantized_angle == _NW_d)) ) thinned_grad = thinned * grad_mag # Now, hysteresis thresholding: find seeds above a high threshold, then # expand out until we go below the low threshold high = thinned_grad > high_threshold low = thinned_grad > low_threshold canny_edges = ndimage.binary_dilation(high, iterations=-1, mask=low) return grad_mag, thinned_grad, canny_edges From zachary.pincus at yale.edu Fri Feb 27 16:57:24 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 27 Feb 2009 16:57:24 -0500 Subject: [SciPy-user] Some more useful image filters: anisotropic diffusion and Canny edge-finding In-Reply-To: <2BD49B0E-3F29-464D-A3B2-B58F43EB7093@yale.edu> References: <2BD49B0E-3F29-464D-A3B2-B58F43EB7093@yale.edu> Message-ID: Blarg, my pasted code text-wrapped at 70 cols, messing everything up! Sorry for the spam, but here are those functions as an attachment. Zach -------------- next part -------------- A non-text attachment was scrubbed... Name: image_filters.py Type: text/x-python-script Size: 3823 bytes Desc: not available URL: -------------- next part -------------- From Ross.Williamson at usap.gov Fri Feb 27 22:02:30 2009 From: Ross.Williamson at usap.gov (Williamson, Ross) Date: Sat, 28 Feb 2009 16:02:30 +1300 Subject: [SciPy-user] reverse an array Message-ID: Hi everyone Is there an easy way to reverse an array without converting it to a list? i.e I'm currently doing: x = array([0,1,2,3,4]) x = x.tolist() x.reverse() x = array(x) print x : [4,3,2,1,0] Thanks Ross From c-b at asu.edu Fri Feb 27 22:25:08 2009 From: c-b at asu.edu (Christopher Brown) Date: Fri, 27 Feb 2009 20:25:08 -0700 Subject: [SciPy-user] reverse an array In-Reply-To: References: Message-ID: <49A8AE94.3010807@asu.edu> Williamson, Ross wrote: > Hi everyone > > Is there an easy way to reverse an array without converting it to a list? > > i.e I'm currently doing: > > x = array([0,1,2,3,4]) > x = x.tolist() > x.reverse() > x = array(x) > > print x : [4,3,2,1,0] I found the following code snippet: >>> x[::-1] array([4, 3, 2, 1, 0]) at the following (very helpful) web address: http://mathesaurus.sourceforge.net/matlab-numpy.html -- Chris From mcohen at caltech.edu Sat Feb 28 23:53:40 2009 From: mcohen at caltech.edu (Michael Cohen) Date: Sat, 28 Feb 2009 20:53:40 -0800 Subject: [SciPy-user] Numpy/Scipy rfft transformations do not match? Message-ID: <49AA14D4.1040100@caltech.edu> Hi all, Relatively new Scipy/numpy user. 
I have been trying to switch from numpy.fft.rfft calls to scipy.fftpack.rfft calls in order to make use of fftw3, but I find that the array sizes are different. With numpy, the size of the resulting array is n/2+1 where n is the size of the original array. With scipy, the array has length n. Additionally, scipy takes twice as long to compute the fft as numpy, presumably because it is computing twice as many values. Is there a reason for this discrepancy? What do I need to call from scipy to make the calls match? Additionally, how do I check in a compiled & installed copy of scipy whether it is using fftw or its own fftpack routines? Regards, Michael
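For anyone who finds this thread later: both routines return the full one-sided spectrum, just in different layouts. numpy.fft.rfft gives n/2+1 complex values, while scipy.fftpack.rfft packs the same information into a single length-n real array -- [y(0), Re(y(1)), Im(y(1)), ..., Re(y(n/2))] for even n, according to its docstring. A rough, untested sketch of a converter between the two layouts (the function name is invented here):

import numpy
from scipy import fftpack

def packed_to_complex(r):
    # scipy.fftpack.rfft output (packed, length n) -> numpy.fft.rfft layout (n//2 + 1 complex values)
    n = len(r)
    out = numpy.empty(n // 2 + 1, dtype=complex)
    out[0] = r[0]                      # DC term is purely real
    if n % 2 == 0:
        out[1:-1] = r[1:-1:2] + 1j * r[2:-1:2]
        out[-1] = r[-1]                # Nyquist term is purely real
    else:
        out[1:] = r[1::2] + 1j * r[2::2]
    return out

x = numpy.random.rand(16)
print numpy.allclose(packed_to_complex(fftpack.rfft(x)), numpy.fft.rfft(x))   # should print True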