From contact at pythonxy.com Fri Jan 1 08:43:17 2010
From: contact at pythonxy.com (Pierre Raybaut)
Date: Fri, 1 Jan 2010 14:43:17 +0100
Subject: [Numpy-discussion] Announcing toydist, improving distribution and packaging situation
Message-ID: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com>

Hi David,

Following your announcement for the 'toydist' module, I think that your project is very promising: this is certainly a great idea and it will be very controversial, but that's because people's expectations are great on this matter (distutils is so disappointing indeed).

Anyway, if I may be useful, I'll gladly contribute to it. In time, I could change the whole Python(x,y) packaging system (which is currently quite ugly... but easy/quick to manage/maintain) to use/promote this new module.

Happy New Year! and Long Live Scientific Python! ;-)

Cheers,
Pierre

From kwgoodman at gmail.com Fri Jan 1 15:23:17 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 1 Jan 2010 12:23:17 -0800
Subject: [Numpy-discussion] arrays and __eq__
Message-ID:

I have a class that stores some of its data in a numpy array. I can check for equality when myclass is on the left and an array is on the right:

>> m = myclass([1,2,3])
>> a = np.asarray([9,2,3])
>>
>> m == a
myclass([False, True, True], dtype=bool)

But I get the wrong answer when an array is on the left and myclass is on the right:

>> a == m
array([ True, True, True], dtype=bool)

import numpy as np

class myclass(object):

    def __init__(self, arr):
        self.arr = np.asarray(arr)

    def __eq__(self, other):
        if np.isscalar(other) or isinstance(other, np.ndarray):
            x = myclass(self.arr.copy())
            x.arr = x.arr == other
        else:
            raise TypeError, 'This example just tests numpy arrays and scalars.'
        return x

    def __repr__(self):
        return 'myclass' + repr(self.arr).split('array')[1]

I've run into a similar problem with __radd__ but the solution to that problem doesn't work for __eq__:
http://www.mail-archive.com/numpy-discussion at scipy.org/msg09476.html

From eadrogue at gmx.net Fri Jan 1 18:42:10 2010
From: eadrogue at gmx.net (Ernest Adrogué)
Date: Sat, 2 Jan 2010 00:42:10 +0100
Subject: [Numpy-discussion] is it safe to change the dtype without rebuilding the array?
Message-ID: <20100101234210.GA11243@doriath.local>

Hi,

I find myself doing this:

In [244]: x
Out[244]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [245]: y=x.copy()

In [251]: y.dtype.char
Out[251]: 'l'

In [252]: dt=np.dtype([('a','l'),('b','l'),('c','l')])

In [254]: y.dtype=dt

Is it okay? The problem is that it's not easy to rebuild the array. I tried with:

y.astype(dt)
np.array(y, dt)
np.array(y.tolist(), dt)

None worked.

Bye.
Ernest

From robert.kern at gmail.com Fri Jan 1 18:49:31 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 1 Jan 2010 17:49:31 -0600
Subject: [Numpy-discussion] is it safe to change the dtype without rebuilding the array?
In-Reply-To: <20100101234210.GA11243@doriath.local>
References: <20100101234210.GA11243@doriath.local>
Message-ID: <3d375d731001011549x350d5ba9tc381413eb81b6fc5@mail.gmail.com>

2010/1/1 Ernest Adrogué :
> Hi,
>
> I find myself doing this:
>
> In [244]: x
> Out[244]:
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
>
> In [245]: y=x.copy()
>
> In [251]: y.dtype.char
> Out[251]: 'l'
>
> In [252]: dt=np.dtype([('a','l'),('b','l'),('c','l')])
>
> In [254]: y.dtype=dt
>
> Is it okay?
> The problem is that it's not easy to rebuild the array.
> I tried with: > > y.astype(dt) > np.array(y, dt) > np.array(y.tolist(), dt) > > None worked. y.view(dt) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Fri Jan 1 21:32:00 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 2 Jan 2010 11:32:00 +0900 Subject: [Numpy-discussion] [SciPy-dev] Announcing toydist, improving distribution and packaging situation In-Reply-To: References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <5b8d13220912300816s12c934adh4abdd6d703f8928f@mail.gmail.com> Message-ID: <5b8d13221001011832p5ca4875dj13f04ba7ee61dd41@mail.gmail.com> On Thu, Dec 31, 2009 at 6:06 AM, Darren Dale wrote: > > I should defer to the description of extras in the setuptools > documentation. It is only a few paragraphs long: > > http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional-features-with-their-own-dependencies Ok, so there are two issues related to this feature: - supporting variant at the build stage - supporting different variants of the same package in the dependency graph at install time The first issue is definitely supported - I fixed a bug in toydist to support this correctly, and this will be used when converting setuptools-based setup.py which use the features argument. The second issue is more challenging. It complicates the dependency handling quite a bit, and may cause difficult situations to happen at dependency resolution time. This becomes particularly messy if you mix packages you build yourself with packages grabbed from a repository. I wonder if there is a simpler solution which would give a similar feature set. cheers, David From cournape at gmail.com Sat Jan 2 02:51:38 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 2 Jan 2010 16:51:38 +0900 Subject: [Numpy-discussion] [SciPy-User] Announcing toydist, improving distribution and packaging situation In-Reply-To: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com> References: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com> Message-ID: <5b8d13221001012351w4feda89bj13e67d102318076d@mail.gmail.com> On Fri, Jan 1, 2010 at 10:43 PM, Pierre Raybaut wrote: > Hi David, > > Following your announcement for the 'toydist' module, I think that > your project is very promising: this is certainly a great idea and it > will be very controversial but that's because people expectactions are > great on this matter (distutils is so disappointing indeed). > > Anyway, if I may be useful, I'll gladly contribute to it. > In time, I could change the whole Python(x,y) packaging system (which > is currently quite ugly... but easy/quick to manage/maintain) to > use/promote this new module. That would be a good way to test toydist on a real, complex package. I am not familiar at all with python(x,y) internals. Do you have some explanation I could look at somewhere ? In the meantime, I will try to clean-up the code to have a first experimental release. 
cheers, David From gael.varoquaux at normalesup.org Sat Jan 2 02:58:48 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 2 Jan 2010 08:58:48 +0100 Subject: [Numpy-discussion] [SciPy-dev] Announcing toydist, improving distribution and packaging situation In-Reply-To: <5b8d13221001011832p5ca4875dj13f04ba7ee61dd41@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <5b8d13220912300816s12c934adh4abdd6d703f8928f@mail.gmail.com> <5b8d13221001011832p5ca4875dj13f04ba7ee61dd41@mail.gmail.com> Message-ID: <20100102075848.GA17293@phare.normalesup.org> On Sat, Jan 02, 2010 at 11:32:00AM +0900, David Cournapeau wrote: > [snip] > - supporting different variants of the same package in the > dependency graph at install time > [snip] > The second issue is more challenging. It complicates the dependency > handling quite a bit, and may cause difficult situations to happen at > dependency resolution time. This becomes particularly messy if you mix > packages you build yourself with packages grabbed from a repository. I > wonder if there is a simpler solution which would give a similar > feature set. AFAICT, in Debian, the same feature is given via virtual packages: you would have: python-matplotlib python-matplotlib-basemap for instance. It is interesting to note that the same source package may be used to generate both binary, end-user, packages. And happy new year! Ga?l From cournape at gmail.com Sat Jan 2 05:18:34 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 2 Jan 2010 19:18:34 +0900 Subject: [Numpy-discussion] [SciPy-dev] Announcing toydist, improving distribution and packaging situation In-Reply-To: <20100102075848.GA17293@phare.normalesup.org> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <5b8d13220912300816s12c934adh4abdd6d703f8928f@mail.gmail.com> <5b8d13221001011832p5ca4875dj13f04ba7ee61dd41@mail.gmail.com> <20100102075848.GA17293@phare.normalesup.org> Message-ID: <5b8d13221001020218j32eaff29v2c777c513373a43c@mail.gmail.com> On Sat, Jan 2, 2010 at 4:58 PM, Gael Varoquaux wrote: > On Sat, Jan 02, 2010 at 11:32:00AM +0900, David Cournapeau wrote: >> [snip] >> ? - supporting different variants of the same package in the >> dependency graph at install time > >> [snip] > >> The second issue is more challenging. It complicates the dependency >> handling quite a bit, and may cause difficult situations to happen at >> dependency resolution time. This becomes particularly messy if you mix >> packages you build yourself with packages grabbed from a repository. I >> wonder if there is a simpler solution which would give a similar >> feature set. > > > AFAICT, in Debian, the same feature is given via virtual packages: you > would have: I don't think virtual-packages entirely fix the issue. AFAIK, virtual packages have two uses: - handle dependencies where several packages may resolve one particular dependency in an equivalent way (one good example is LAPACK: both liblapack and libatlas provides the lapack feature) - closer to this discussion, you can build several variants of the same package, and each variant would resolve the dependency on a virtual package handling the commonalities. For example, say we have two numpy packages, one built with lapack (python-numpy-full), the other without (python-numpy-core). What happens when a package foo depends on numpy-full, but numpy-core is installed ? 
AFAICS, this can only work as long as the set containing every variant can be ordered (in the conventional set ordering sense), and the dependency can be satisfied by the smallest one. cheers, David From manuel.wittchen at gmail.com Sat Jan 2 05:23:13 2010 From: manuel.wittchen at gmail.com (Manuel Wittchen) Date: Sat, 2 Jan 2010 11:23:13 +0100 Subject: [Numpy-discussion] calculating the difference of an array Message-ID: <209cec441001020223v2260d5c6lf92491b89fc543da@mail.gmail.com> Hi, I want to calculate the difference between the values of a numpy-array. The formula is: deltaT = t_(n+1) - t_(n) My approach to calculate the difference looks like this: for i in len(ARRAY): delta_t[i] = ARRAY[(i+1):] - ARRAY[:(len(ARRAY)-1)] print "result:", delta_t But I get a TypeError: File "./test.py", line 19, in for i in len(ARRAY): TypeError: 'int' object is not iterable Where is the mistake in the code? Regards and a happy new year, Manuel Wittchen From cournape at gmail.com Sat Jan 2 05:31:32 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 2 Jan 2010 19:31:32 +0900 Subject: [Numpy-discussion] calculating the difference of an array In-Reply-To: <209cec441001020223v2260d5c6lf92491b89fc543da@mail.gmail.com> References: <209cec441001020223v2260d5c6lf92491b89fc543da@mail.gmail.com> Message-ID: <5b8d13221001020231m10c7b96ch5f62138d49de8953@mail.gmail.com> On Sat, Jan 2, 2010 at 7:23 PM, Manuel Wittchen wrote: > Hi, > > I want to calculate the difference between the values of a > numpy-array. The formula is: > > deltaT = t_(n+1) - t_(n) > > My approach to calculate the difference looks like this: > > for i in len(ARRAY): > ? ? ? ?delta_t[i] = ARRAY[(i+1):] - ARRAY[:(len(ARRAY)-1)] > > print "result:", delta_t > > But I get a TypeError: > File "./test.py", line 19, in > ? ?for i in len(ARRAY): > TypeError: 'int' object is not iterable > > Where is the mistake in the code? There are several mistakes :) Assuming ARRAY is a numpy array, len(ARRAY) will return an int. You would have the same error if ARRAY was any sequence: you should iterate over range(len(ARRAY)). Your formula within the loop is not very clear, and does not seem to match your formula. ARRAY[i+1:] gives you all the items ARRAY[i+1] until the end, ARRAY[:len(ARRAY)-1] gives you every item ARRAY[j] for 0 <= j < len(ARRAY)-1, that is the whole array. I think you want: for i in range(len(ARRAY)-1): delta_i[i] = ARRAY[i+1] - ARRAY[i] Also, using numpy efficiently requires to use vectorization, so actually: delta_t = ARRAY[1:] - ARRAY[:-1] gives you a more efficient version. But really, you should use diff, which implements what you want: import numpy as np delta_t = np.diff(ARRAY) cheers, David From emmanuelle.gouillart at normalesup.org Sat Jan 2 05:34:47 2010 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Sat, 2 Jan 2010 11:34:47 +0100 Subject: [Numpy-discussion] calculating the difference of an array In-Reply-To: <209cec441001020223v2260d5c6lf92491b89fc543da@mail.gmail.com> References: <209cec441001020223v2260d5c6lf92491b89fc543da@mail.gmail.com> Message-ID: <20100102103447.GA30365@phare.normalesup.org> Hello Manuel, the discrete difference of a numpy array can be written in a very natural way, without loops. 
Below are two possible ways to do it: >>> a = np.arange(10)**2 >>> a array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]) >>> a[1:] - a[:-1] array([ 1, 3, 5, 7, 9, 11, 13, 15, 17]) >>> np.diff(a) # another way to calculate the difference array([ 1, 3, 5, 7, 9, 11, 13, 15, 17]) The error in the example you give is due to the fact that you iterate over len(ARRAY), which is an integer, hence not an iterable object. You should write ``for i in range(len(ARRAY))`` instead. Cheers, Emmanuelle On Sat, Jan 02, 2010 at 11:23:13AM +0100, Manuel Wittchen wrote: > Hi, > I want to calculate the difference between the values of a > numpy-array. The formula is: > deltaT = t_(n+1) - t_(n) > My approach to calculate the difference looks like this: > for i in len(ARRAY): > delta_t[i] = ARRAY[(i+1):] - ARRAY[:(len(ARRAY)-1)] > print "result:", delta_t > But I get a TypeError: > File "./test.py", line 19, in > for i in len(ARRAY): > TypeError: 'int' object is not iterable > Where is the mistake in the code? > Regards and a happy new year, > Manuel Wittchen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From contact at pythonxy.com Sat Jan 2 05:40:16 2010 From: contact at pythonxy.com (Pierre Raybaut) Date: Sat, 2 Jan 2010 11:40:16 +0100 Subject: [Numpy-discussion] [SPAM] Re: [SciPy-User] Announcing toydist, improving distribution and packaging situation In-Reply-To: <5b8d13221001012351w4feda89bj13e67d102318076d@mail.gmail.com> References: <629b08a41001010543r193acb2bk3290b6458f97c596@mail.gmail.com> <5b8d13221001012351w4feda89bj13e67d102318076d@mail.gmail.com> Message-ID: <629b08a41001020240y642518f4r68f4a6a3860a3eee@mail.gmail.com> 2010/1/2 David Cournapeau : > On Fri, Jan 1, 2010 at 10:43 PM, Pierre Raybaut wrote: >> Hi David, >> >> Following your announcement for the 'toydist' module, I think that >> your project is very promising: this is certainly a great idea and it >> will be very controversial but that's because people expectactions are >> great on this matter (distutils is so disappointing indeed). >> >> Anyway, if I may be useful, I'll gladly contribute to it. >> In time, I could change the whole Python(x,y) packaging system (which >> is currently quite ugly... but easy/quick to manage/maintain) to >> use/promote this new module. > > That would be a good way to test toydist on a real, complex package. I > am not familiar at all with python(x,y) internals. Do you have some > explanation I could look at somewhere ? Honestly, let's assume that there is currently no packaging system... it would not be very far from the truth. I did it when I was young and naive regarding Python. Actually I almost did it without having writing any code in Python (approx. two months after earing about the Python language for the first time) : it's an ugly collection of AutoIt, NSIS and PHP scripts -- most of the tasks are automated like updating the generated website pages and so on. So I'm not proud at all, but it was easy and very quick to do as it is, and it's still quite easy to maintain. But, it's not satisfying in terms of code "purity" -- I've been wanting to rewrite all this in Python for a year and a half but since the features are there, there is no real motivation to do the work (in other words, Python(x,y) users would not see the difference, at least at the beginning). 
An other thing: Python(x,y) plugins are not built from source but from existing binaries (it's a pity I know, but it was incredibly faster to do this way). For example, eggs or distutils .exe may be converted in Python(x,y) plugins directly (same internal directory structure). So it may be different from the idea you had in mind (it's not like EPD which is entirely generated from source, AFAIK). > In the meantime, I will try to clean-up the code to have a first > experimental release. > Ok, keep up the good work! Cheers, Pierre From manuel.wittchen at gmail.com Sat Jan 2 06:03:23 2010 From: manuel.wittchen at gmail.com (Manuel Wittchen) Date: Sat, 2 Jan 2010 12:03:23 +0100 Subject: [Numpy-discussion] calculating the difference of an array In-Reply-To: <20100102103447.GA30365@phare.normalesup.org> References: <209cec441001020223v2260d5c6lf92491b89fc543da@mail.gmail.com> <20100102103447.GA30365@phare.normalesup.org> Message-ID: <209cec441001020303l45741e9cseebe8bddd6f078e1@mail.gmail.com> Hi, Thanks for your help. I tried np.diff() before, but the result looked like this: RESULT = [1, 1, 1, 1] So I was thinking that np.diff() doesn't iterate over the values of the array. So I gave the for-loop a try. Now, seeing your code below, I realized that my mistake was that I used ARRAY = [0, 1, 2, 3, 4, 5] for the calculations... Stupid me. 2010/1/2 Emmanuelle Gouillart : > Hello Manuel, > > the discrete difference of a numpy array can be written in a very > natural way, without loops. Below are two possible ways to do it: >>>> a = np.arange(10)**2 >>>> a > array([ 0, ?1, ?4, ?9, 16, 25, 36, 49, 64, 81]) >>>> a[1:] - a[:-1] > array([ 1, ?3, ?5, ?7, ?9, 11, 13, 15, 17]) >>>> np.diff(a) # another way to calculate the difference > array([ 1, ?3, ?5, ?7, ?9, 11, 13, 15, 17]) > > The error in the example you give is due to the fact that you iterate > over len(ARRAY), which is an integer, hence not an iterable object. You > should write ``for i in range(len(ARRAY))`` instead. > > Cheers, > > Emmanuelle > > On Sat, Jan 02, 2010 at 11:23:13AM +0100, Manuel Wittchen wrote: >> Hi, > >> I want to calculate the difference between the values of a >> numpy-array. The formula is: > >> deltaT = t_(n+1) - t_(n) > >> My approach to calculate the difference looks like this: > >> for i in len(ARRAY): >> ? ? ? delta_t[i] = ARRAY[(i+1):] - ARRAY[:(len(ARRAY)-1)] > >> print "result:", delta_t > >> But I get a TypeError: >> File "./test.py", line 19, in >> ? ? for i in len(ARRAY): >> TypeError: 'int' object is not iterable > >> Where is the mistake in the code? > >> Regards and a happy new year, >> Manuel Wittchen >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From tpk at kraussfamily.org Sat Jan 2 15:41:29 2010 From: tpk at kraussfamily.org (Tom K.) Date: Sat, 2 Jan 2010 12:41:29 -0800 (PST) Subject: [Numpy-discussion] ANN: upfirdn 0.2.0 Message-ID: <26996309.post@talk.nabble.com> ANNOUNCEMENT I am pleased to announce a new release of "upfirdn" - version 0.2.0. This package provides an efficient polyphase FIR resampler object (SWIG-ed C++) and some python wrappers. This release greatly improves installation with distutils relative to the initial 0.1.0 release. 0.2.0 includes no functional changes relative to 0.1.0. 
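For readers who have not met the term before: "upfirdn" is shorthand for upsample, FIR filter, downsample. A naive, non-polyphase reference version of the operation takes only a few lines of plain numpy. This is just a sketch of the underlying math, not the package's API, and the function name below is made up:

import numpy as np

def naive_upfirdn(h, x, p, q):
    # upsample by p: insert p - 1 zeros between input samples ("zero stuffing")
    up = np.zeros(len(x) * p)
    up[::p] = x
    # FIR filter with coefficients h, then keep every q-th output sample
    return np.convolve(h, up)[::q]

A polyphase implementation such as the one in the package computes the same thing without ever forming the zero-stuffed signal, which is where the efficiency comes from.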
Also, the source code is now browse-able online through a Google Code site with mercurial repository. https://opensource.motorola.com/sf/projects/upfirdn http://code.google.com/p/upfirdn/ Thanks to Google for providing this hosting service! -- View this message in context: http://old.nabble.com/ANN%3A-upfirdn-0.2.0-tp26996309p26996309.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From njs at pobox.com Sun Jan 3 06:05:54 2010 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 3 Jan 2010 03:05:54 -0800 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> Message-ID: <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> On Tue, Dec 29, 2009 at 6:34 AM, David Cournapeau wrote: > Buildout, virtualenv all work by sandboxing from the system python: > each of them do not see each other, which may be useful for > development, but as a deployment solution to the casual user who may > not be familiar with python, it is useless. A scientist who installs > numpy, scipy, etc... to try things out want to have everything > available in one python interpreter, and does not want to jump to > different virtualenvs and whatnot to try different packages. What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc. This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution. On another note, I hope toydist will provide a "source prepare" step, that allows arbitrary code to be run on the source tree. (For, e.g., cython->C conversion, ad-hoc template languages, etc.) IME this is a very common pain point with distutils; there is just no good way to do it, and it has to be supported in the distribution utility in order to get everything right. In particular: -- Generated files should never be written to the source tree itself, but only the build directory -- Building from a source checkout should run the "source prepare" step automatically -- Building a source distribution should also run the "source prepare" step, and stash the results in such a way that when later building the source distribution, this step can be skipped. This is a common requirement for user convenience, and necessary if you want to avoid arbitrary code execution during builds. And if you just set up the distribution util so that the only place you can specify arbitrary code execution is in the "source prepare" step, then even people who know nothing about packaging will automatically get all of the above right. 
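To make that concrete, a "source prepare" step for a Cython-using project could be as small as the sketch below: walk the source tree and write the generated C files into a separate build directory, leaving the checkout untouched. This is purely illustrative -- the function name and layout are invented here, not a proposal for toydist's actual API -- and it only assumes the standard cython command-line tool is on the path.

import os
import subprocess

def prepare_sources(src_dir, build_dir):
    # regenerate C sources from Cython files, writing only under build_dir
    for root, dirs, files in os.walk(src_dir):
        for fname in files:
            if not fname.endswith('.pyx'):
                continue
            pyx = os.path.join(root, fname)
            rel = os.path.relpath(pyx, src_dir)
            target = os.path.join(build_dir, os.path.splitext(rel)[0] + '.c')
            if not os.path.isdir(os.path.dirname(target)):
                os.makedirs(os.path.dirname(target))
            subprocess.check_call(['cython', pyx, '-o', target])

Run at sdist time, the generated .c files can then be shipped with the source distribution, so that later builds from that tarball can skip the step entirely.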
Cheers, -- Nathaniel From gael.varoquaux at normalesup.org Sun Jan 3 06:11:53 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 3 Jan 2010 12:11:53 +0100 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> Message-ID: <20100103111153.GB24770@phare.normalesup.org> On Sun, Jan 03, 2010 at 03:05:54AM -0800, Nathaniel Smith wrote: > What I do -- and documented for people in my lab to do -- is set up > one virtualenv in my user account, and use it as my default python. (I > 'activate' it from my login scripts.) The advantage of this is that > easy_install (or pip) just works, without any hassle about permissions > etc. This should be easier, but I think the basic approach is sound. > "Integration with the package system" is useless; the advantage of > distribution packages is that distributions can provide a single > coherent system with consistent version numbers across all packages, > etc., and the only way to "integrate" with that is to, well, get the > packages into the distribution. That works because either you use packages that don't have much hard-core compiled dependencies, or these are already installed. Think about installing VTK or ITK this way, even something simpler such as umfpack. I think that you would loose most of your users. In my lab, I do lose users on such packages actually. Beside, what you are describing is possible without package isolation, it is simply the use of a per-user local site-packages, which now semi automatic in python2.6 using the '.local' directory. I do agree that, in a research lab, this is a best practice. Ga?l From cournape at gmail.com Sun Jan 3 06:24:59 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 3 Jan 2010 20:24:59 +0900 Subject: [Numpy-discussion] [matplotlib-devel] [SciPy-dev] Announcing toydist, improving distribution and packaging situation In-Reply-To: <4B3F8FF6.7010800@astraw.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <5b8d13220912300816s12c934adh4abdd6d703f8928f@mail.gmail.com> <5b8d13221001011832p5ca4875dj13f04ba7ee61dd41@mail.gmail.com> <20100102075848.GA17293@phare.normalesup.org> <5b8d13221001020218j32eaff29v2c777c513373a43c@mail.gmail.com> <4B3F8FF6.7010800@astraw.com> Message-ID: <5b8d13221001030324i67c3125dh8efa68c04a539659@mail.gmail.com> On Sun, Jan 3, 2010 at 3:27 AM, Andrew Straw wrote: >> > Typically, the dependencies only depend on the smallest subset of what > they require (if they don't need lapack, they'd only depend on > python-numpy-core in your example), but yes, if there's an unsatisfiable > condition, then apt-get will raise an error and abort. In practice, this > system seems to work quite well, IMO. Yes, but: - debian dependency resolution is complex. I think many people don't realize how complex the problem really is (AFAIK, any correct scheme to resolve dependencies in debian requires an algorithm which is NP-complete ) - introducing a lot of variants significantly slow down the whole thing. I think it worths thinking whether our problems warrant such a complexity. 
> > Anyhow, here's the full Debian documentation: > http://www.debian.org/doc/debian-policy/ch-relationships.html This is not the part I am afraid of. This is: http://people.debian.org/~dburrows/model.pdf cheers, David From cournape at gmail.com Sun Jan 3 07:23:14 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 3 Jan 2010 21:23:14 +0900 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> Message-ID: <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith wrote: > On Tue, Dec 29, 2009 at 6:34 AM, David Cournapeau wrote: >> Buildout, virtualenv all work by sandboxing from the system python: >> each of them do not see each other, which may be useful for >> development, but as a deployment solution to the casual user who may >> not be familiar with python, it is useless. A scientist who installs >> numpy, scipy, etc... to try things out want to have everything >> available in one python interpreter, and does not want to jump to >> different virtualenvs and whatnot to try different packages. > > What I do -- and documented for people in my lab to do -- is set up > one virtualenv in my user account, and use it as my default python. (I > 'activate' it from my login scripts.) The advantage of this is that > easy_install (or pip) just works, without any hassle about permissions > etc. It just works if you happen to be able to build everything from sources. That alone means you ignore the majority of users I intend to target. No other community (except maybe Ruby) push those isolated install solutions as a general deployment solutions. If it were such a great idea, other people would have picked up those solutions. > This should be easier, but I think the basic approach is sound. > "Integration with the package system" is useless; the advantage of > distribution packages is that distributions can provide a single > coherent system with consistent version numbers across all packages, > etc., and the only way to "integrate" with that is to, well, get the > packages into the distribution. Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well: http://tirania.org/blog/archive/2007/Jan-26.html I hope we will be able to reuse much of the opensuse build service infrastructure. > > On another note, I hope toydist will provide a "source prepare" step, > that allows arbitrary code to be run on the source tree. (For, e.g., > cython->C conversion, ad-hoc template languages, etc.) IME this is a > very common pain point with distutils; there is just no good way to do > it, and it has to be supported in the distribution utility in order to > get everything right. 
In particular: > ?-- Generated files should never be written to the source tree > itself, but only the build directory > ?-- Building from a source checkout should run the "source prepare" > step automatically > ?-- Building a source distribution should also run the "source > prepare" step, and stash the results in such a way that when later > building the source distribution, this step can be skipped. This is a > common requirement for user convenience, and necessary if you want to > avoid arbitrary code execution during builds. Build directories are hard to implement right. I don't think toydist will support this directly. IMO, those advanced builds warrant a real build tool - one main goal of toydist is to make integration with waf or scons much easier. Both waf and scons have the concept of a build directory, which should do everything you described. David From njs at pobox.com Sun Jan 3 18:42:32 2010 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 3 Jan 2010 15:42:32 -0800 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> Message-ID: <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau wrote: > On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith wrote: >> What I do -- and documented for people in my lab to do -- is set up >> one virtualenv in my user account, and use it as my default python. (I >> 'activate' it from my login scripts.) The advantage of this is that >> easy_install (or pip) just works, without any hassle about permissions >> etc. > > It just works if you happen to be able to build everything from > sources. That alone means you ignore the majority of users I intend to > target. > > No other community (except maybe Ruby) push those isolated install > solutions as a general deployment solutions. If it were such a great > idea, other people would have picked up those solutions. AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies. I'm not advocating the 'every app in its own world' model that virtualenv's designers had min mind, but virtualenv is very useful to give each user their own world. Normally I only use a fraction of virtualenv's power this way, but sometimes it's handy that they've solved the more general problem -- I can easily move my environment out of the way and rebuild if I've done something stupid, or experiment with new python versions in isolation, or whatever. And when you *do* have to reproduce some old environment -- if only to test that the new improved environment gives the same results -- then it's *really* handy. >> This should be easier, but I think the basic approach is sound. 
>> "Integration with the package system" is useless; the advantage of >> distribution packages is that distributions can provide a single >> coherent system with consistent version numbers across all packages, >> etc., and the only way to "integrate" with that is to, well, get the >> packages into the distribution. > > Another way is to provide our own repository for a few major > distributions, with automatically built packages. This is how most > open source providers work. Miguel de Icaza explains this well: > > http://tirania.org/blog/archive/2007/Jan-26.html > > I hope we will be able to reuse much of the opensuse build service > infrastructure. Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster. Second, even if I did I'd be very leery about installing third-party packages because there is no guarantee that the version numbering will be consistent between the third-party repo and the real distro repo -- suppose that the distro packages 0.1, then the third party packages 0.2, then the distro packages 0.3, will upgrades be seamless? What if the third party screws up the version numbering at some point? Debian has "epochs" to deal with this, but third-parties can't use them and maintain compatibility. What if the person making the third party packages is not an expert on these random distros that they don't even use? Will bug reporting tools work properly? Distros are complicated. Third, while we shouldn't advocate that people screw up backwards compatibility, version skew is a real issue. If I need one version of a package and my lab-mate needs another and we have submissions due tomorrow, then filing bugs is a great idea but not a solution. Fourth, even if we had expert maintainers taking care of all these third-party packages and all my concerns were answered, I couldn't convince our sysadmin of that; he's the one who'd have to clean up if something went wrong we don't have a big budget for overtime. Let's be honest -- scientists, on the whole, suck at IT infrastructure, and small individual packages are not going to be very expertly put together. IMHO any real solution should take this into account, keep them sandboxed from the rest of the system, and focus on providing the most friendly and seamless sandbox possible. >> On another note, I hope toydist will provide a "source prepare" step, >> that allows arbitrary code to be run on the source tree. (For, e.g., >> cython->C conversion, ad-hoc template languages, etc.) IME this is a >> very common pain point with distutils; there is just no good way to do >> it, and it has to be supported in the distribution utility in order to >> get everything right. In particular: >> ?-- Generated files should never be written to the source tree >> itself, but only the build directory >> ?-- Building from a source checkout should run the "source prepare" >> step automatically >> ?-- Building a source distribution should also run the "source >> prepare" step, and stash the results in such a way that when later >> building the source distribution, this step can be skipped. This is a >> common requirement for user convenience, and necessary if you want to >> avoid arbitrary code execution during builds. > > Build directories are hard to implement right. I don't think toydist > will support this directly. 
IMO, those advanced builds warrant a real > build tool - one main goal of toydist is to make integration with waf > or scons much easier. Both waf and scons have the concept of a build > directory, which should do everything you described. Maybe I was unclear -- proper build directory handling is nice, Cython/Pyrex's distutils integration get it wrong (not their fault, distutils is just impossible to do anything sensible with, as you've said), and I've never found build directories hard to implement (perhaps I'm missing something). But what I'm really talking about is having a "pre-build" step that integrates properly with the source and binary packaging stages, and that's not something waf or scons have any particular support for, AFAIK. -- Nathaniel From robert.kern at gmail.com Sun Jan 3 18:52:04 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 3 Jan 2010 17:52:04 -0600 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> Message-ID: <3d375d731001031552u6783266t9b035ece83c14927@mail.gmail.com> On Sun, Jan 3, 2010 at 17:42, Nathaniel Smith wrote: > On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau wrote: >> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith wrote: >>> What I do -- and documented for people in my lab to do -- is set up >>> one virtualenv in my user account, and use it as my default python. (I >>> 'activate' it from my login scripts.) The advantage of this is that >>> easy_install (or pip) just works, without any hassle about permissions >>> etc. >> >> It just works if you happen to be able to build everything from >> sources. That alone means you ignore the majority of users I intend to >> target. >> >> No other community (except maybe Ruby) push those isolated install >> solutions as a general deployment solutions. If it were such a great >> idea, other people would have picked up those solutions. > > AFAICT, R works more-or-less identically (once I convinced it to use a > per-user library directory); install.packages() builds from source, > and doesn't automatically pull in and build random C library > dependencies. That's not quite the same. That is the R equivalent of Python's recent per-user site-packages feature (every user get's their own sandbox), not virtualenv (every project gets it's own sandbox). The former feature has a long history in the multiuser UNIX world and is not really controversial. http://www.python.org/dev/peps/pep-0370/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From cournape at gmail.com Mon Jan 4 02:25:44 2010 From: cournape at gmail.com (David Cournapeau) Date: Mon, 4 Jan 2010 16:25:44 +0900 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> Message-ID: <5b8d13221001032325o250e4d5ao5f476b384ad6dd17@mail.gmail.com> On Mon, Jan 4, 2010 at 8:42 AM, Nathaniel Smith wrote: > On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau wrote: >> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith wrote: >>> What I do -- and documented for people in my lab to do -- is set up >>> one virtualenv in my user account, and use it as my default python. (I >>> 'activate' it from my login scripts.) The advantage of this is that >>> easy_install (or pip) just works, without any hassle about permissions >>> etc. >> >> It just works if you happen to be able to build everything from >> sources. That alone means you ignore the majority of users I intend to >> target. >> >> No other community (except maybe Ruby) push those isolated install >> solutions as a general deployment solutions. If it were such a great >> idea, other people would have picked up those solutions. > > AFAICT, R works more-or-less identically (once I convinced it to use a > per-user library directory); install.packages() builds from source, > and doesn't automatically pull in and build random C library > dependencies. As mentioned by Robert, this is different from the usual virtualenv approach. Per-user app installation is certainly a useful (and uncontroversial) feature. And R does support automatically-built binary installers. > > Sure, I'm aware of the opensuse build service, have built third-party > packages for my projects, etc. It's a good attempt, but also has a lot > of problems, and when talking about scientific software it's totally > useless to me :-). First, I don't have root on our compute cluster. True, non-root install is a problem. Nothing *prevents* dpkg to run in non root environment in principle if the packages itself does not require it, but it is not really supported by the tools ATM. > Second, even if I did I'd be very leery about installing third-party > packages because there is no guarantee that the version numbering will > be consistent between the third-party repo and the real distro repo -- > suppose that the distro packages 0.1, then the third party packages > 0.2, then the distro packages 0.3, will upgrades be seamless? What if > the third party screws up the version numbering at some point? Debian > has "epochs" to deal with this, but third-parties can't use them and > maintain compatibility. Actually, at least with .deb-based distributions, this issue has a solution. As packages has their own version in addition to the upstream version, PPA-built packages have their own versions. 
https://help.launchpad.net/Packaging/PPA/BuildingASourcePackage Of course, this assumes a simple versioning scheme in the first place, instead of the cluster-fck that versioning has became within python packages (again, the scheme used in python is much more complicated than everyone else, and it seems that nobody has ever stopped and thought 5 minutes about the consequences, and whether this complexity was a good idea in the first place). > What if the person making the third party > packages is not an expert on these random distros that they don't even > use? I think simple rules/conventions + build farms would solve most issues. The problem is if you allow total flexibility as input, then automatic and simple solutions become impossible. Certainly, PPA and the build service provides for a much better experience than anything pypi has ever given to me. > Third, while we shouldn't advocate that people screw up backwards > compatibility, version skew is a real issue. If I need one version of > a package and my lab-mate needs another and we have submissions due > tomorrow, then filing bugs is a great idea but not a solution. Nothing prevents you from using virtualenv in that case (I may sound dismissive of those tools, but I am really not. I use them myselves. What I strongly react to is when those are pushed as the de-facto, standard method). > Fourth, > even if we had expert maintainers taking care of all these third-party > packages and all my concerns were answered, I couldn't convince our > sysadmin of that; he's the one who'd have to clean up if something > went wrong we don't have a big budget for overtime. I am not advocating using only packaged, binary installers. I am advocating using them as much as possible where it makes sense - on windows and mac os x in particular. Toydist also aims at making it easier to build, customize installs. Although not yet implemented, --user-like scheme would be quite simple to implement, because toydist installer internally uses autoconf-like directories description (of which --user is a special case). If you need sandboxed installs, customized installs, toydist will not prevent it. It is certainly my intention to make it possible to use virtualenv and co (you already can by building eggs, actually). I hope that by having our own "SciPi", we can actually have a more reliable approach. For example, the static dependency description + mandated metadata would make this much easier and more robust, as there would not be a need to run a setup.py to get the dependencies. If you look at hackageDB (http://hackage.haskell.org/packages/hackage.html), they have a very simple index structure, which makes it easy to download it entirely, and reuse this locally to avoid any internet access. > Let's be honest -- scientists, on the whole, suck at IT > infrastructure, and small individual packages are not going to be very > expertly put together. IMHO any real solution should take this into > account, keep them sandboxed from the rest of the system, and focus on > providing the most friendly and seamless sandbox possible. I agree packages will not always be well put together - but I don't see why this would be worse than the current situation. I also strongly disagree about the sandboxing as the solution of choice. For most users, having only one install of most packages is the typical use-case. Once you start sandboxing, you create artificial barriers between the sandboxes, and this becomes too complicated for most users IMHO. 
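To make the earlier point about a static dependency description concrete: the idea is that the metadata lives in a declarative file that any tool (an installer, a repository indexer, a converter to .deb/.rpm) can read without executing a setup.py. A purely hypothetical sketch -- not toydist's actual syntax -- might look like:

Name: foo
Version: 0.2
Summary: example extension built on top of numpy
Requires:
    numpy >= 1.3,
    scipy
Library:
    Extension: foo._core
        Sources: src/core.c

Because a file like this can be parsed in isolation, a SciPi-style index could be downloaded in one go and dependency resolution done entirely offline, which is essentially what the hackageDB index already allows for Cabal packages.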
> > Maybe I was unclear -- proper build directory handling is nice, > Cython/Pyrex's distutils integration get it wrong (not their fault, > distutils is just impossible to do anything sensible with, as you've > said), and I've never found build directories hard to implement It is simple if you have a good infrastructure in place (node abstraction, etc...), but that infrastructure is hard to get right. > But what I'm really talking about is > having a "pre-build" step that integrates properly with the source and > binary packaging stages, and that's not something waf or scons have > any particular support for, AFAIK. Could you explain with a concrete example what a pre-build stage would look like ? I don't think I understand what you want, cheers, David From dagss at student.matnat.uio.no Mon Jan 4 03:48:43 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 4 Jan 2010 09:48:43 +0100 Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation In-Reply-To: <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> Message-ID: <89f11d872447395b3271dbea1fd5cd9d.squirrel@webmail.uio.no> Nathaniel Smith wrote: > On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau > wrote: >> Another way is to provide our own repository for a few major >> distributions, with automatically built packages. This is how most >> open source providers work. Miguel de Icaza explains this well: >> >> http://tirania.org/blog/archive/2007/Jan-26.html >> >> I hope we will be able to reuse much of the opensuse build service >> infrastructure. > > Sure, I'm aware of the opensuse build service, have built third-party > packages for my projects, etc. It's a good attempt, but also has a lot > of problems, and when talking about scientific software it's totally > useless to me :-). First, I don't have root on our compute cluster. I use Sage for this very reason, and others use EPD or FEMHub or Python(x,y) for the same reasons. Rolling this into the Python package distribution scheme seems backwards though, since a lot of binary packages that have nothing to do with Python are used as well -- the Python stuff is simply thin wrappers around what should ideally be located in /usr/lib or similar (but are nowadays compiled into the Python extension .so because of distribution problems). To solve the exact problem you (and me) have I think the best solution is to integrate the tools mentioned above with what David is planning (SciPI etc.). Or if that isn't good enough, find generic "userland package manager" that has nothing to do with Python (I'm sure a dozen half-finished ones must have been written but didn't look), finish it, and connect it to SciPI. What David does (I think) is seperate the concerns. This makes the task feasible, and also has the advantage of convenience for the people that *do* want to use Ubuntu, Red Hat or whatever to roll out scientific software on hundreds of clients. 
Dag Sverre

From cournape at gmail.com Mon Jan 4 04:11:13 2010
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 4 Jan 2010 18:11:13 +0900
Subject: [Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation
In-Reply-To: <89f11d872447395b3271dbea1fd5cd9d.squirrel@webmail.uio.no>
References: <5b8d13220912280603p7221a264o875b0d5e74a5404@mail.gmail.com> <64ddb72c0912290527s1143efc7g3efe93936ca5de5@mail.gmail.com> <5b8d13220912290634u5902a6bag33ddb8a15a93406b@mail.gmail.com> <961fa2b41001030305mddd301fp416a2fe23fc11568@mail.gmail.com> <5b8d13221001030423j96fdb72l832964f6c5df7f97@mail.gmail.com> <961fa2b41001031542t203e8ef6mee8f590095e54d18@mail.gmail.com> <89f11d872447395b3271dbea1fd5cd9d.squirrel@webmail.uio.no>
Message-ID: <5b8d13221001040111x29cd463al7f9559ff93655508@mail.gmail.com>

On Mon, Jan 4, 2010 at 5:48 PM, Dag Sverre Seljebotn wrote:
>
> Rolling this into the Python package distribution scheme seems backwards
> though, since a lot of binary packages that have nothing to do with Python
> are used as well

Yep, exactly.

> To solve the exact problem you (and me) have I think the best solution is
> to integrate the tools mentioned above with what David is planning (SciPI
> etc.). Or if that isn't good enough, find generic "userland package
> manager" that has nothing to do with Python (I'm sure a dozen
> half-finished ones must have been written but didn't look), finish it, and
> connect it to SciPI.

You have 0install, autopackage and klik, to cite the ones I know about. I wish people had looked at those before rolling toy solutions to complex problems.

>
> What David does (I think) is seperate the concerns.

Exactly - you've described this better than I did

David

From Chris.Barker at noaa.gov Mon Jan 4 20:05:30 2010
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Mon, 04 Jan 2010 17:05:30 -0800
Subject: [Numpy-discussion] fromfile() for reading text (one more time!)
Message-ID: <4B42905A.4080105@noaa.gov>

Hi folks,

I'm taking a look once again at fromfile() for reading text files. I often have the need to read a LOT of numbers from a text file, and it can actually be pretty darn slow to do it the normal python way:

for line in file:
    data = map(float, line.strip().split())

or various other versions that are similar. It really does take longer to read the text, split it up, convert to a number, then put that number into a numpy array, than it does to simply read it straight into the array.

However, as it stands, fromfile() turns out to be next to useless for anything but whitespace separated text. Full set of ideas here:

http://projects.scipy.org/numpy/ticket/909

However, for the moment, I'm digging into the code to address a particular problem -- reading files like this:

123, 65.6, 789
23, 3.2, 34
...

That is comma (or whatever) separated text -- pretty common stuff.

The problem with the current code is that you can't read more than one line at a time with fromfile:

a = np.fromfile(infile, sep=",")

will read until it doesn't find a comma, and thus only one line, as there is no comma after each line. As this is a really typical case, I think it should be supported.

Here is the question:

The work of finding the separator is done in:

multiarray/ctors.c: fromfile_skip_separator()

It looks like it wouldn't be too hard to add some code in there to look for a newline, and consider that a valid separator. However, that would break backward compatibility. So maybe a flag could be passed in, saying you wanted to support newlines.
The problem is that flag would have to get passed all the way through to this function (and also for fromstring). I also notice that it supports separators of arbitrary length, which I wonder how useful that is. But it also does odd things with spaces embedded in the separator: ", $ #" matches all of: ",$#" ", $#" ",$ #" Is it worth trying to fix that? In the longer term, it would be really nice to support comments as well, tough that would require more of a re-factoring of the code, I think (though maybe not -- I suppose a call to fromfile_skip_separator() could look for a comment character, then if it found one, skip to where the comment ends -- hmmm. thanks for any feedback, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From alan at ajackson.org Mon Jan 4 22:39:42 2010 From: alan at ajackson.org (alan at ajackson.org) Date: Mon, 4 Jan 2010 21:39:42 -0600 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B42905A.4080105@noaa.gov> References: <4B42905A.4080105@noaa.gov> Message-ID: <20100104213942.588435c2@ajackson.org> >Hi folks, > >I'm taking a look once again at fromfile() for reading text files. I >often have the need to read a LOT of numbers form a text file, and it >can actually be pretty darn slow do i the normal python way: > >for line in file: > data = map(float, line.strip().split()) > > >or various other versions that are similar. It really does take longer >to read the text, split it up, convert to a number, then put that number >into a numpy array, than it does to simply read it straight into the array. > >However, as it stands, fromfile() turn out to be next to useless for >anything but whitespace separated text. Full set of ideas here: > >http://projects.scipy.org/numpy/ticket/909 > >However, for the moment, I'm digging into the code to address a >particular problem -- reading files like this: > >123, 65.6, 789 >23, 3.2, 34 >... > >That is comma (or whatever) separated text -- pretty common stuff. > >The problem with the current code is that you can't read more than one >line at time with fromfile: > >a = np.fromfile(infile, sep=",") > >will read until it doesn't find a comma, and thus only one line, as >there is no comma after each line. As this is a really typical case, I >think it should be supported. > >Here is the question: > >The work of finding the separator is done in: > >multiarray/ctors.c: fromfile_skip_separator() > >It looks like it wouldn't be too hard to add some code in there to look >for a newline, and consider that a valid separator. However, that would >break backward compatibility. So maybe a flag could be passed in, saying >you wanted to support newlines. The problem is that flag would have to >get passed all the way through to this function (and also for fromstring). > >I also notice that it supports separators of arbitrary length, which I >wonder how useful that is. But it also does odd things with spaces >embedded in the separator: > >", $ #" matches all of: ",$#" ", $#" ",$ #" > >Is it worth trying to fix that? > > >In the longer term, it would be really nice to support comments as well, >tough that would require more of a re-factoring of the code, I think >(though maybe not -- I suppose a call to fromfile_skip_separator() could >look for a comment character, then if it found one, skip to where the >comment ends -- hmmm. 
> >thanks for any feedback, > >-Chris > I agree. I've tried using it, and usually find that it doesn't quite get there. I rather like the R command(s) for reading text files - except then I have to use R which is painful after using python and numpy. Although ggplot2 is awfully nice too ... but that is a later post. read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, stringsAsFactors = default.stringsAsFactors(), fileEncoding = "", encoding = "unknown") read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE, comment.char="", ...) read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",", fill = TRUE, comment.char="", ...) read.delim(file, header = TRUE, sep = "\t", quote="\"", dec=".", fill = TRUE, comment.char="", ...) read.delim2(file, header = TRUE, sep = "\t", quote="\"", dec=",", fill = TRUE, comment.char="", ...) There is really only read.table, the others are just aliases with different defaults. But the flexibility is great, as you can see. -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From josef.pktd at gmail.com Mon Jan 4 22:45:04 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 4 Jan 2010 22:45:04 -0500 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <20100104213942.588435c2@ajackson.org> References: <4B42905A.4080105@noaa.gov> <20100104213942.588435c2@ajackson.org> Message-ID: <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> On Mon, Jan 4, 2010 at 10:39 PM, wrote: >>Hi folks, >> >>I'm taking a look once again at fromfile() for reading text files. I >>often have the need to read a LOT of numbers form a text file, and it >>can actually be pretty darn slow do i the normal python way: >> >>for line in file: >> ? ?data = map(float, line.strip().split()) >> >> >>or various other versions that are similar. It really does take longer >>to read the text, split it up, convert to a number, then put that number >>into a numpy array, than it does to simply read it straight into the array. >> >>However, as it stands, fromfile() turn out to be next to useless for >>anything but whitespace separated text. Full set of ideas here: >> >>http://projects.scipy.org/numpy/ticket/909 >> >>However, for the moment, I'm digging into the code to address a >>particular problem -- reading files like this: >> >>123, 65.6, 789 >>23, ?3.2, ?34 >>... >> >>That is comma (or whatever) separated text -- pretty common stuff. >> >>The problem with the current code is that you can't read more than one >>line at time with fromfile: >> >>a = np.fromfile(infile, sep=",") >> >>will read until it doesn't find a comma, and thus only one line, as >>there is no comma after each line. As this is a really typical case, I >>think it should be supported. 
>> >>Here is the question: >> >>The work of finding the separator is done in: >> >>multiarray/ctors.c: ?fromfile_skip_separator() >> >>It looks like it wouldn't be too hard to add some code in there to look >>for a newline, and consider that a valid separator. However, that would >>break backward compatibility. So maybe a flag could be passed in, saying >>you wanted to support newlines. The problem is that flag would have to >>get passed all the way through to this function (and also for fromstring). >> >>I also notice that it supports separators of arbitrary length, which I >>wonder how useful that is. But it also does odd things with spaces >>embedded in the separator: >> >>", $ #" matches all of: ?",$#" ? ", $#" ?",$ #" >> >>Is it worth trying to fix that? >> >> >>In the longer term, it would be really nice to support comments as well, >>tough that would require more of a re-factoring of the code, I think >>(though maybe not -- I suppose a call to fromfile_skip_separator() could >>look for a comment character, then if it found one, skip to where the >>comment ends -- hmmm. >> >>thanks for any feedback, >> >>-Chris >> > > I agree. I've tried using it, and usually find that it doesn't quite get there. > > I rather like the R command(s) for reading text files - except then I have to > use R which is painful after using python and numpy. Although ggplot2 is > awfully nice too ... but that is a later post. > > ? ? read.table(file, header = FALSE, sep = "", quote = "\"'", > ? ? ? ? ? ? ? ?dec = ".", row.names, col.names, > ? ? ? ? ? ? ? ?as.is = !stringsAsFactors, > ? ? ? ? ? ? ? ?na.strings = "NA", colClasses = NA, nrows = -1, > ? ? ? ? ? ? ? ?skip = 0, check.names = TRUE, fill = !blank.lines.skip, > ? ? ? ? ? ? ? ?strip.white = FALSE, blank.lines.skip = TRUE, > ? ? ? ? ? ? ? ?comment.char = "#", > ? ? ? ? ? ? ? ?allowEscapes = FALSE, flush = FALSE, > ? ? ? ? ? ? ? ?stringsAsFactors = default.stringsAsFactors(), > ? ? ? ? ? ? ? ?fileEncoding = "", encoding = "unknown") > > ? ? read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".", > ? ? ? ? ? ? ?fill = TRUE, comment.char="", ...) > > ? ? read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",", > ? ? ? ? ? ? ? fill = TRUE, comment.char="", ...) > > ? ? read.delim(file, header = TRUE, sep = "\t", quote="\"", dec=".", > ? ? ? ? ? ? ? ?fill = TRUE, comment.char="", ...) > > ? ? read.delim2(file, header = TRUE, sep = "\t", quote="\"", dec=",", > ? ? ? ? ? ? ? ? fill = TRUE, comment.char="", ...) > > > There is really only read.table, the others are just aliases with different > defaults. ?But the flexibility is great, as you can see. Aren't the newly improved numpy.genfromtxt(fname, dtype=, comments='#', delimiter=None, skiprows=0, converters=None, missing='', missing_values=None, usecols=None, names=None, excludelist=None, deletechars=None, case_sensitive=True, unpack=None, usemask=False, loose=True) and friends indented to handle all this Josef > > -- > ----------------------------------------------------------------------- > | Alan K. Jackson ? ? ? ? ? ?| To see a World in a Grain of Sand ? ? ?| > | alan at ajackson.org ? ? ? ? ?| And a Heaven in a Wild Flower, ? ? ? ? | > | www.ajackson.org ? ? ? ? ? | Hold Infinity in the palm of your hand | > | Houston, Texas ? ? ? ? ? ? | And Eternity in an hour. - Blake ? ? ? 
| > ----------------------------------------------------------------------- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pivanov314 at gmail.com Tue Jan 5 03:30:17 2010 From: pivanov314 at gmail.com (Paul Ivanov) Date: Tue, 05 Jan 2010 00:30:17 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B42905A.4080105@noaa.gov> References: <4B42905A.4080105@noaa.gov> Message-ID: <4B42F899.90109@gmail.com> Christopher Barker, on 2010-01-04 17:05, wrote: > Hi folks, > > I'm taking a look once again at fromfile() for reading text files. I > often have the need to read a LOT of numbers form a text file, and it > can actually be pretty darn slow do i the normal python way: > > for line in file: > data = map(float, line.strip().split()) > > > or various other versions that are similar. It really does take longer > to read the text, split it up, convert to a number, then put that number > into a numpy array, than it does to simply read it straight into the array. > > However, as it stands, fromfile() turn out to be next to useless for > anything but whitespace separated text. Full set of ideas here: > > http://projects.scipy.org/numpy/ticket/909 > > However, for the moment, I'm digging into the code to address a > particular problem -- reading files like this: > > 123, 65.6, 789 > 23, 3.2, 34 > ... > > That is comma (or whatever) separated text -- pretty common stuff. > > The problem with the current code is that you can't read more than one > line at time with fromfile: > > a = np.fromfile(infile, sep=",") > > will read until it doesn't find a comma, and thus only one line, as > there is no comma after each line. As this is a really typical case, I > think it should be supported. Just a potshot, but have you tried np.loadtxt? I find it pretty fast. > > Here is the question: > > The work of finding the separator is done in: > > multiarray/ctors.c: fromfile_skip_separator() > > It looks like it wouldn't be too hard to add some code in there to look > for a newline, and consider that a valid separator. However, that would > break backward compatibility. So maybe a flag could be passed in, saying > you wanted to support newlines. The problem is that flag would have to > get passed all the way through to this function (and also for fromstring). > > I also notice that it supports separators of arbitrary length, which I > wonder how useful that is. But it also does odd things with spaces > embedded in the separator: > > ", $ #" matches all of: ",$#" ", $#" ",$ #" > > Is it worth trying to fix that? > > > In the longer term, it would be really nice to support comments as well, > tough that would require more of a re-factoring of the code, I think > (though maybe not -- I suppose a call to fromfile_skip_separator() could > look for a comment character, then if it found one, skip to where the > comment ends -- hmmm. > > thanks for any feedback, > > -Chris > > > > > > > From weisen123 at gmail.com Tue Jan 5 11:35:22 2010 From: weisen123 at gmail.com (neil weisenfeld) Date: Tue, 5 Jan 2010 11:35:22 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 Message-ID: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> Hi all, I'm having an odd problem with the package installer for numpy 1.4.0. It complains: numpy 1.4.0 can't be installed on this disk. numpy requires System Python 2.6 to install. 
I'm running a stock system with a stock python, so I'm not sure why the test is failing. Any ideas how to debug this? Some info: weisen at Neil-Weisenfeld-MacBook-Pro:~ [507]$ which python /usr/bin/python weisen at Neil-Weisenfeld-MacBook-Pro:~ [508]$ python Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> weisen at Neil-Weisenfeld-MacBook-Pro:~ [509]$ ls -l /System/Library/Frameworks/Python.framework/Versions/Current lrwxr-xr-x 1 root wheel 3 Sep 22 17:49 /System/Library/Frameworks/Python.framework/Versions/Current@ -> 2.6 Thanks, Neil From charlesr.harris at gmail.com Tue Jan 5 12:20:24 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jan 2010 10:20:24 -0700 Subject: [Numpy-discussion] finishing ticket #1035 Message-ID: Hi All, I'm looking at ticket #1025with an eye to bringing it to completion but there some issues that need discussion. Currently there are three ways in which nans can be compared: maximum/minimum, fmax/fmin, or the new sort order. The maximum/minimum ufuncs propagate nans, i.e., they will always return a nan if one is present. The fmax/fmin ufuncs don't propagate nans, they ignore nans when possible. The new sort order sorts nans to the end, i.e., nans are treated as larger than any non-nan number; at present there are no ufuncs that correspond to the sort order. The issues I think need resolving are: 1) Should there be ufuncs corresponding to the sort order? 2) What should a.max(), a.argmax(), a.min(), and a.argmin() do? I note that a.argmax() is not consistent with a.max() at the moment: In [9]: a Out[9]: array([ 0., 1., 2., 3., NaN, 5., 6., 7., NaN, NaN]) In [10]: a.argmax() Out[10]: 7 In [11]: a.max() Out[11]: nan Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Tue Jan 5 12:32:01 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jan 2010 09:32:01 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> References: <4B42905A.4080105@noaa.gov> <20100104213942.588435c2@ajackson.org> <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> Message-ID: <4B437791.7020808@noaa.gov> josef.pktd at gmail.com wrote: > On Mon, Jan 4, 2010 at 10:39 PM, wrote: >> I rather like the R command(s) for reading text files > Aren't the newly improved > > numpy.genfromtxt() ... > and friends indented to handle all this Yes, they are, and they are great, but not really all that fast. If you've got big complicated tables of data to read, then genfromtxt is the way to go -- it's a great tool. However, for the simple stuff, it's not really optimized. I also find I have to read a lot of text files that aren't tables of data, but rather an odd mix of stuff, but still a lot of reading lots of numbers from a file. As far as I can tell, genfromtxt and loadtxt can only load the entire file as a table (a very common situation, of course). Paul Ivanov wrote: > Just a potshot, but have you tried np.loadtxt? > > I find it pretty fast. 
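(For the record, loadtxt does read this kind of comma-separated file directly -- something along the lines of:

import numpy as np
a = np.loadtxt("data.txt", delimiter=",")   # made-up filename

-- so it's the speed, not the parsing, that's at issue here.)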
I guess I should have posted timings in the first place: In [19]: timeit timing.time_genfromtxt() 10 loops, best of 3: 216 ms per loop In [20]: timeit timing.time_loadtxt() 10 loops, best of 3: 166 ms per loop In [21]: timeit timing.time_fromfile() 10 loops, best of 3: 47.1 ms per loop (40,000 doubles from a space-delimted text file) so fromfile() is 3.5 times as fast as loadtxt and 4.5 times as fast as genfromtxt. That does make a difference for me -- the user waiting 4 seconds, rather than one second to load a file matters. I suppose another option might be to see if I can optimize the inner scanning function of genfromtxt with Cython or C, but I'm not sure that's possible, as it's really very flexible, and re-writing all of that without Python would be really painful! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Jan 5 13:51:18 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 5 Jan 2010 13:51:18 -0500 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B437791.7020808@noaa.gov> References: <4B42905A.4080105@noaa.gov> <20100104213942.588435c2@ajackson.org> <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> <4B437791.7020808@noaa.gov> Message-ID: <07B0901E-5F52-4890-AB1D-F225F0D2E009@gmail.com> On Jan 5, 2010, at 12:32 PM, Christopher Barker wrote: > josef.pktd at gmail.com wrote: >> On Mon, Jan 4, 2010 at 10:39 PM, wrote: >>> I rather like the R command(s) for reading text files > >> Aren't the newly improved >> >> numpy.genfromtxt() > > ... > >> and friends indented to handle all this > > Yes, they are, and they are great, but not really all that fast. If > you've got big complicated tables of data to read, then genfromtxt is > the way to go -- it's a great tool. However, for the simple stuff, it's > not really optimized. genfromtxt is nothing but loadtxt overloaded to deal with undefined dtype and missing entries. It's doomed to be slower, and it shouldn't be used if you know your data is well-defined and well-behaved. Stick to loadtxt > I also find I have to read a lot of text files > that aren't tables of data, but rather an odd mix of stuff, but still a > lot of reading lots of numbers from a file. Well, everything depends on what kind of stuff you have in your mix, I guess... > so fromfile() is 3.5 times as fast as loadtxt and 4.5 times as fast as > genfromtxt. That does make a difference for me -- the user waiting 4 > seconds, rather than one second to load a file matters. Rmmbr that fromfile is C when loadtxt and genfromtxt are Python... > I suppose another option might be to see if I can optimize the inner > scanning function of genfromtxt with Cython or C, but I'm not sure > that's possible, as it's really very flexible, and re-writing all of > that without Python would be really painful! Well, there's room for some optimization for particular cases (dtype!=None), but the generic case will be tricky... From pav at iki.fi Tue Jan 5 15:01:36 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 05 Jan 2010 22:01:36 +0200 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) 
In-Reply-To: <4B42905A.4080105@noaa.gov> References: <4B42905A.4080105@noaa.gov> Message-ID: <1262721695.5107.1.camel@idol> ma, 2010-01-04 kello 17:05 -0800, Christopher Barker kirjoitti: [clip] > I also notice that it supports separators of arbitrary length, which I > wonder how useful that is. But it also does odd things with spaces > embedded in the separator: > > ", $ #" matches all of: ",$#" ", $#" ",$ #" > > Is it worth trying to fix that? That's a documented feature: sep : str Separator between items if file is a text file. Empty ("") separator means the file should be treated as binary. Spaces (" ") in the separator match zero or more whitespace characters. A separator consisting only of spaces must match at least one whitespace. -- Pauli Virtanen From efiring at hawaii.edu Tue Jan 5 15:45:51 2010 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 05 Jan 2010 10:45:51 -1000 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> Message-ID: <4B43A4FF.3010604@hawaii.edu> neil weisenfeld wrote: > Hi all, > > I'm having an odd problem with the package installer for numpy 1.4.0. > It complains: > > numpy 1.4.0 can't be installed on this disk. numpy requires System > Python 2.6 to install. I think the problem is that the message is misleading; it should be saying you need python from python.org, *not* the python that comes with OSX. (The two coexist; installing python from python.org does not interfere with OSX's use of its own python. Caveat: I don't use Mac myself, so I am basing all this on second-hand experience--helping with a numpy installation a few minutes ago--and what I remember seeing. I think there is a mailing list thread about this, but I couldn't find it.) Eric > > > I'm running a stock system with a stock python, so I'm not sure why > the test is failing. Any ideas how to debug this? > > Some info: > > weisen at Neil-Weisenfeld-MacBook-Pro:~ > [507]$ which python > /usr/bin/python > > weisen at Neil-Weisenfeld-MacBook-Pro:~ > [508]$ python > Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51) > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > > weisen at Neil-Weisenfeld-MacBook-Pro:~ > [509]$ ls -l /System/Library/Frameworks/Python.framework/Versions/Current > lrwxr-xr-x 1 root wheel 3 Sep 22 17:49 > /System/Library/Frameworks/Python.framework/Versions/Current@ -> 2.6 > > > > Thanks, > Neil > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Tue Jan 5 16:03:10 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 5 Jan 2010 16:03:10 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43A4FF.3010604@hawaii.edu> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> Message-ID: On Jan 5, 2010, at 3:45 PM, Eric Firing wrote: > neil weisenfeld wrote: >> Hi all, >> >> I'm having an odd problem with the package installer for numpy 1.4.0. >> It complains: >> >> numpy 1.4.0 can't be installed on this disk. numpy requires System >> Python 2.6 to install. > > I think the problem is that the message is misleading; it should be > saying you need python from python.org, *not* the python that comes with > OSX. ??? 
I have several versions of numpy installed on my Macbook (OS X 6), but only one Python (the one that comes with Apple). However, these versions are installed in different virtual environments. Neil, could you give us more info about how you're trying to install it ? Have you tried to use the --user flag (ie, `python setup.py install --user`) ? From manuel.wittchen at gmail.com Tue Jan 5 16:14:54 2010 From: manuel.wittchen at gmail.com (Manuel Wittchen) Date: Tue, 5 Jan 2010 22:14:54 +0100 Subject: [Numpy-discussion] extracting data from ODF files Message-ID: <209cec441001051314k284895bdq6e9a3b8f5fd9197c@mail.gmail.com> Hi, is there a (simple) solution to extract data from OpenDocument files (espacially OpenOffice.org Calc files) into a Numpy Array? At the moment I copy the colums from OO.org Calc manually into a tab-separatet Plaintext file which is quite annoying. Regards, Manuel Wittchen From emmanuelle.gouillart at normalesup.org Tue Jan 5 16:23:08 2010 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Tue, 5 Jan 2010 22:23:08 +0100 Subject: [Numpy-discussion] extracting data from ODF files In-Reply-To: <209cec441001051314k284895bdq6e9a3b8f5fd9197c@mail.gmail.com> References: <209cec441001051314k284895bdq6e9a3b8f5fd9197c@mail.gmail.com> Message-ID: <20100105212307.GA32094@phare.normalesup.org> Hi Manuel, you may save your odf file as a csv (comma separated value) file with OpenOffice, then use np.loadtxt, specifying the 'delimiter' keyword: myarray = np.loadtxt('myfile.csv', delimiter=',') Cheers, Emmanuelle On Tue, Jan 05, 2010 at 10:14:54PM +0100, Manuel Wittchen wrote: > Hi, > is there a (simple) solution to extract data from OpenDocument files > (espacially OpenOffice.org Calc files) into a Numpy Array? At the > moment I copy the colums from OO.org Calc manually into a > tab-separatet Plaintext file which is quite annoying. > Regards, > Manuel Wittchen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From xavier.gnata at gmail.com Tue Jan 5 16:48:43 2010 From: xavier.gnata at gmail.com (Xavier Gnata) Date: Tue, 05 Jan 2010 22:48:43 +0100 Subject: [Numpy-discussion] Lots of 32bits specific errors Message-ID: <4B43B3BB.4060702@gmail.com> Hi, I have compiled numpy 1.5.0.dev8039 both on a 32 and a 64bits ubuntu machine. 
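(For reference, "the tests" here are just the standard suite, run along the lines of:

import numpy
numpy.test()

on each machine.)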
On the 64bits one, everything is fine: numpy.test get a perfect score: On the 32bits ubuntu, the story is not that nice: ERROR: Test filled w/ mvoid ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/ma/tests/test_core.py", line 506, in test_filled_w_mvoid a = mvoid(np.array((1, 2)), mask=[(0, 1)], dtype=ndtype) File "/usr/local/lib/python2.6/dist-packages/numpy/ma/core.py", line 5454, in __new__ _data = ndarray.__new__(self, (), dtype=dtype, buffer=data.data) TypeError: buffer is too small for requested array ====================================================================== FAIL: test_cdouble (test_linalg.TestCond2) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 114, in do old_assert_almost_equal(s[0]/s[-1], linalg.cond(a,2), decimal=5) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 455, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: 9.4348091510413177 DESIRED: 22.757141876814547 ====================================================================== FAIL: test_csingle (test_linalg.TestCond2) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 39, in test_csingle self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 114, in do old_assert_almost_equal(s[0]/s[-1], linalg.cond(a,2), decimal=5) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 455, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: 9.4348097 DESIRED: 22.757143 ====================================================================== FAIL: test_cdouble (test_linalg.TestDet) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 135, in do assert_almost_equal(d, multiply.reduce(ev)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: (8.8817841970012523e-16-4j) DESIRED: (5.2800000000000011-11.040000000000004j) ====================================================================== FAIL: test_csingle (test_linalg.TestDet) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 39, in test_csingle self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 135, in do assert_almost_equal(d, multiply.reduce(ev)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in 
assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: (8.8817841970012523e-16-4j) DESIRED: (5.2800000000000011-11.040000000000004j) ====================================================================== FAIL: test_cdouble (test_linalg.TestEig) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 94, in do assert_almost_equal(dot(a, evectors), multiply(evectors, evalues)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([[ 2.72530404+2.67511327j, 1.92238375+1.30131653j], [ 5.95809316+4.79684551j, 3.41362547+1.42587017j]]) DESIRED: array([[ 2.01388405+1.03693361j, -1.39512658+1.87085135j], [ 1.78601662+0.01838201j, -0.10408816-3.54121552j]]) ====================================================================== FAIL: test_csingle (test_linalg.TestEig) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 39, in test_csingle self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 94, in do assert_almost_equal(dot(a, evectors), multiply(evectors, evalues)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([[ 2.72530413+2.6751132j , 1.92238379+1.3013165j ], [ 5.95809317+4.79684544j, 3.41362548+1.42587018j]], dtype=complex64) DESIRED: array([[ 2.01388407+1.03693354j, -1.39512670+1.8708514j ], [ 1.78601658+0.01838197j, -0.10408816-3.54121566j]], dtype=complex64) ====================================================================== FAIL: test_cdouble (test_linalg.TestEigh) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 221, in test_cdouble self.do(a) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 259, in do assert_almost_equal(ev, evalues) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([-2.60555128, 4.60555128]) DESIRED: array([-1.71080202-1.00413682j, 3.01849433+1.46567528j]) ====================================================================== FAIL: test_csingle (test_linalg.TestEigh) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 217, in test_csingle self.do(a) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 259, in do assert_almost_equal(ev, evalues) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([-2.60555124, 4.60555124], dtype=float32) DESIRED: array([-1.71080208-1.0041368j , 3.01849437+1.46567523j], dtype=complex64) ====================================================================== FAIL: test_cdouble (test_linalg.TestEigvalsh) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 221, in test_cdouble self.do(a) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 249, in do assert_almost_equal(ev, evalues) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([-2.60555128+0.j, 4.60555128+0.j]) DESIRED: array([-1.71080202-1.00413682j, 3.01849433+1.46567528j]) ====================================================================== FAIL: test_csingle (test_linalg.TestEigvalsh) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 217, in test_csingle self.do(a) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 249, in do assert_almost_equal(ev, evalues) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([-2.60555124+0.j, 4.60555124+0.j], dtype=complex64) DESIRED: array([-1.71080208-1.0041368j , 3.01849437+1.46567523j], dtype=complex64) ====================================================================== FAIL: test_cdouble (test_linalg.TestLstsq) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 141, in do assert_almost_equal(b, dot(a, x)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([ 2.+1.j, 
1.+2.j]) DESIRED: array([ 0.95920929+0.98311952j, 1.23494444+0.67346351j]) ====================================================================== FAIL: test_csingle (test_linalg.TestLstsq) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 39, in test_csingle self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 141, in do assert_almost_equal(b, dot(a, x)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([ 2.+1.j, 1.+2.j], dtype=complex64) DESIRED: array([ 0.95920926+0.98311943j, 1.23494434+0.67346334j], dtype=complex64) ====================================================================== FAIL: test_cdouble (test_linalg.TestPinv) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 124, in do assert_almost_equal(dot(a, a_ginv), identity(asarray(a).shape[0])) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([[ 0.29169056-0.07799046j, 0.17767375-0.01332484j], [ 0.04125021-0.38255608j, 0.73402869+0.62377356j]]) DESIRED: array([[ 1., 0.], [ 0., 1.]]) ====================================================================== FAIL: test_csingle (test_linalg.TestPinv) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 39, in test_csingle self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 124, in do assert_almost_equal(dot(a, a_ginv), identity(asarray(a).shape[0])) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([[ 0.29169053-0.07799049j, 0.17767370-0.0133248j ], [ 0.04125014-0.38255614j, 0.73402858+0.62377363j]], dtype=complex64) DESIRED: array([[ 1., 0.], [ 0., 1.]]) ====================================================================== FAIL: test_cdouble (test_linalg.TestSVD) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 100, in do assert_almost_equal(a, dot(multiply(u, s), vt)) File 
"/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([[ 1.+2.j, 2.+3.j], [ 3.+4.j, 4.+5.j]]) DESIRED: array([[ 1.00000000+2.j , 2.36670415+2.98574489j], [ 3.00000000+4.j , 2.80882652+6.25521741j]]) ====================================================================== FAIL: test_csingle (test_linalg.TestSVD) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 39, in test_csingle self.do(a, b) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 100, in do assert_almost_equal(a, dot(multiply(u, s), vt)) File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/tests/test_linalg.py", line 23, in assert_almost_equal old_assert_almost_equal(a, b, decimal=decimal, **kw) File "/usr/local/lib/python2.6/dist-packages/numpy/testing/utils.py", line 435, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: array([[ 1.+2.j, 2.+3.j], [ 3.+4.j, 4.+5.j]], dtype=complex64) DESIRED: array([[ 0.99999994+2.j , 2.36670423+2.98574495j], [ 3.00000000+4.j , 2.80882668+6.25521755j]], dtype=complex64) Xavier From efiring at hawaii.edu Tue Jan 5 16:53:17 2010 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 05 Jan 2010 11:53:17 -1000 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> Message-ID: <4B43B4CD.3030701@hawaii.edu> Pierre GM wrote: > On Jan 5, 2010, at 3:45 PM, Eric Firing wrote: >> neil weisenfeld wrote: >>> Hi all, >>> >>> I'm having an odd problem with the package installer for numpy 1.4.0. >>> It complains: >>> >>> numpy 1.4.0 can't be installed on this disk. numpy requires System >>> Python 2.6 to install. >> I think the problem is that the message is misleading; it should be >> saying you need python from python.org, *not* the python that comes with >> OSX. > > ??? > I have several versions of numpy installed on my Macbook (OS X 6), but only one Python (the one that comes with Apple). However, these versions are installed in different virtual environments. > Neil, could you give us more info about how you're trying to install it ? Have you tried to use the --user flag (ie, `python setup.py install --user`) ? > Pierre, He is installing using the binary package installer, not installing from source. Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From Chris.Barker at noaa.gov Tue Jan 5 17:09:14 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jan 2010 14:09:14 -0800 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43A4FF.3010604@hawaii.edu> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> Message-ID: <4B43B88A.6020008@noaa.gov> Eric Firing wrote: >> I'm having an odd problem with the package installer for numpy 1.4.0. >> It complains: >> >> numpy 1.4.0 can't be installed on this disk. numpy requires System >> Python 2.6 to install. 
> > I think the problem is that the message is misleading; it should be > saying you need python from python.org, *not* the python that comes with > OSX. yes, that is it the "system python" referred to in the error message is actually the "python.org Framework build". I recommend it anyway, it's newer, and you won't accidentally mess with anything Apple is doing. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cournape at gmail.com Tue Jan 5 17:31:55 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 Jan 2010 07:31:55 +0900 Subject: [Numpy-discussion] Lots of 32bits specific errors In-Reply-To: <4B43B3BB.4060702@gmail.com> References: <4B43B3BB.4060702@gmail.com> Message-ID: <5b8d13221001051431q418df142x5df762c0d065c992@mail.gmail.com> On Wed, Jan 6, 2010 at 6:48 AM, Xavier Gnata wrote: > Hi, > > I have compiled numpy 1.5.0.dev8039 both on a 32 and a 64bits ubuntu > machine. > > On the 64bits one, everything is fine: > numpy.test get a perfect score: > > > On the 32bits ubuntu, the story is not that nice: > Almost all your errors are in linalg - most likely an atlas problem. Atlas with sse2 is buggy on some Ubuntu versions. You should check Ubuntu bug tracker to see if it affects the version you are using. David From pgmdevlist at gmail.com Tue Jan 5 17:38:44 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 5 Jan 2010 17:38:44 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43B4CD.3030701@hawaii.edu> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> Message-ID: <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> On Jan 5, 2010, at 4:53 PM, Eric Firing wrote: > Pierre GM wrote: > > Pierre, > > He is installing using the binary package installer, not installing from > source. Ah OK, my bad. Now, why should it be that different ? Why rely on a second Python to install numpy from a dmg ? If it's a matter of framework, couldn't we used the Python in /System and create a framework in ~/Library ? (I'm just asking out of curiosity...)
From Chris.Barker at noaa.gov Tue Jan 5 18:01:38 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jan 2010 15:01:38 -0800 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> Message-ID: <4B43C4D2.5030701@noaa.gov> Pierre GM wrote: > Ah OK, my bad. Now, why should it be that different ? Why rely on a > second Python to install numpy from a dmg? OS-X has a way of hard coding paths, so a given installer is designed to go in one place, and one place only. The python.org python is the best one to support -- Apple has never upgraded a python, has often shipped a broken version, and has provided different versions with each OS-X version. If we support the python.org python for OS-X 10.4, it can work for everyone with 10.4 - 10.6. It's changing a bit with OS-X 10.6 -- for the first time, Apple at least provided an up-to-date python that isn't broken. But it's not really up to date anymore, anyway (2.6.1 when 2.6.4 is out. I know I've been bitten by at least one bug that was fixed between 2.6.1 and 2.6.3). This is a policy followed by other projects as well. > If it's a matter of > framework, couldn't we used the Python in /System and create a > framework in ~/Library ? (I'm just asking out of curiosity...) The Apple python is fine -- it just isn't the same one, installed in the same place. If you want to build a binary installer for it, it's easy -- but it will only work on 10.6, and not with an updated Python. As the 2.6 series is binary compatible, you can build a single installer that will work with both -- Robin Dunn has done this for wxPython. The way he's done it is to put wxPython itself into /usr/local, and then put some *.pth trickery into both of the pythons: /System/... and /Library/... It works fine, and I've suggested it on this list before, but I guess folks think it's too much of a hack -- or just no one has taken the time to do it. If I get a vote of approval for the approach, I suppose I could do it, I'm sure I could find Robin's scripts and hack them for numpy. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From dwf at cs.toronto.edu Tue Jan 5 18:22:23 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 5 Jan 2010 18:22:23 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43C4D2.5030701@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> Message-ID: On 5-Jan-10, at 6:01 PM, Christopher Barker wrote: > The python.org python is the best one to support -- Apple has never > upgraded a python, has often shipped a broken version, and has > provided > different versions with each OS-X version. If we support the > python.org > python for OS-X 10.4, it can work for everyone with 10.4 - 10.6. > > It's changing a bit with OS-X 10.6 -- for the first time, Apple at > least > provided an up-to-date python that isn't broken. But it's not really > up > to date anymore, anyway (2.6.1 when 2.6.4 is out. 
I know I've been > bitten by at least one bug that was fixed between 2.6.1 and 2.6.3). AFAIK, the System Python in 10.6 is 64-bit capable (but not in the same way as Ron Oussoren's 4-way universal build script does it). Pretty sure the python.org binaries are 32-bit only. I still think it's sensible to prefer the > As the 2.6 series is binary compatible, you can build a single > installer > that will work with both -- Robin Dunn has done this for wxPython. The > way he's done it is to put wxPython itself into /usr/local, and then > put > some *.pth trickery into both of the pythons: /System/... and / > Library/... > > It works fine, and I've suggested it on this list before, but I guess > folks think it's too much of a hack -- or just no one has taken the > time > to do it. +1 on the general approach though it might get a bit more complicated if the two Pythons support different sets of architectures (e.g. i386 and x86_64 in System Python 10.6, i386 and ppc in Python.org Python, or some home-rolled weirdness). With wxPython this doesn't so much matter since wxMac depends on Carbon anyway (I think it still does, at least, unless the Cocoa port's suddenly sped up an incredible amount), which is a 64-bit no-no. I'm not really a fan of packages polluting /usr/local, I'd rather the tree appear /opt/packagename or /usr/local/packagename instead, for ease of removal, but the general approach of "stash somewhere and put a .pth in both site-packages" seems fine to me. David From xavier.gnata at gmail.com Tue Jan 5 18:17:12 2010 From: xavier.gnata at gmail.com (Xavier Gnata) Date: Wed, 06 Jan 2010 00:17:12 +0100 Subject: [Numpy-discussion] Lots of 32bits specific errors In-Reply-To: <5b8d13221001051431q418df142x5df762c0d065c992@mail.gmail.com> References: <4B43B3BB.4060702@gmail.com> <5b8d13221001051431q418df142x5df762c0d065c992@mail.gmail.com> Message-ID: <4B43C878.3060007@gmail.com> > On Wed, Jan 6, 2010 at 6:48 AM, Xavier Gnata wrote: > >> Hi, >> >> I have compiled numpy 1.5.0.dev8039 both on a 32 and a 64bits ubuntu >> machine. >> >> On the 64bits one, everything is fine: >> numpy.test get a perfect score: >> >> >> On the 32bits ubuntu, the story is not that nice: >> >> > Almost all your errors are in linalg - most likely an atlas problem. > Atlas with sse2 is buggy on some Ubuntu versions. You should check > Ubuntu bug tracker to see if it affects the version you are using. > > Thanks, you got it! atlas sse2 in fully buggy in karmic (at least on 32bits machines). without sse. I'm going to try with the sse version of atlas (sse not sse2...) to see if it is as buggy as the sse? version. Xavier From pgmdevlist at gmail.com Tue Jan 5 18:21:24 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 5 Jan 2010 18:21:24 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> Message-ID: <01D3DA28-6462-4481-97AE-E4ECEB5DD181@gmail.com> On Jan 5, 2010, at 6:22 PM, David Warde-Farley wrote: > > On 5-Jan-10, at 6:01 PM, Christopher Barker wrote: > >> The python.org python is the best one to support -- Apple has never >> upgraded a python, has often shipped a broken version, and has >> provided >> different versions with each OS-X version. If we support the >> python.org >> python for OS-X 10.4, it can work for everyone with 10.4 - 10.6. 
>> >> It's changing a bit with OS-X 10.6 -- for the first time, Apple at >> least >> provided an up-to-date python that isn't broken. But it's not really >> up >> to date anymore, anyway (2.6.1 when 2.6.4 is out. I know I've been >> bitten by at least one bug that was fixed between 2.6.1 and 2.6.3). > > AFAIK, the System Python in 10.6 is 64-bit capable (but not in the > same way as Ron Oussoren's 4-way universal build script does it). > Pretty sure the python.org binaries are 32-bit only. I still think > it's sensible to prefer the OK, so there's still no 64b Python on python.org ? Gonna stick with Apple's then (I remember using a lot of sleep hours when I upgraded to 10.6...) >> As the 2.6 series is binary compatible, you can build a single >> installer >> that will work with both -- Robin Dunn has done this for wxPython. The >> way he's done it is to put wxPython itself into /usr/local, and then >> put >> some *.pth trickery into both of the pythons: /System/... and / >> Library/... >> >> It works fine, and I've suggested it on this list before, but I guess >> folks think it's too much of a hack -- or just no one has taken the >> time >> to do it. > > +1 on the general approach though it might get a bit more complicated > if the two Pythons support different sets of architectures (e.g. i386 > and x86_64 in System Python 10.6, i386 and ppc in Python.org Python, > or some home-rolled weirdness). With wxPython this doesn't so much > matter since wxMac depends on Carbon anyway (I think it still does, at > least, unless the Cocoa port's suddenly sped up an incredible amount), > which is a 64-bit no-no. > > I'm not really a fan of packages polluting /usr/local, I'd rather the > tree appear /opt/packagename or /usr/local/packagename instead, for > ease of removal, but the general approach of "stash somewhere and put > a .pth in both site-packages" seems fine to me. +1 w/ David as well. Christopher, thanks a lot for the info. I'm glad I don't have to deal w/ packaging issues... From Chris.Barker at noaa.gov Tue Jan 5 19:02:56 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jan 2010 16:02:56 -0800 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> Message-ID: <4B43D330.7090608@noaa.gov> David Warde-Farley wrote: > AFAIK, the System Python in 10.6 is 64-bit capable (but not in the > same way as Ron Oussoren's 4-way universal build script does it). right -- I'm not sure if it's useful, though, I don't' think there is a 64 bit interpreter, for instance. But maybe that was the one delivered with 10.5. But I'm not the one to ask -- I don't have 10.6, I'm still on an old PPC running 10.4. > Pretty sure the python.org binaries are 32-bit only. I still think > it's sensible to prefer the waiting the rest of this sentence.. ;-) >> As the 2.6 series is binary compatible, you can build a single >> installer that will work with both > +1 on the general approach though it might get a bit more complicated > if the two Pythons support different sets of architectures (e.g. i386 > and x86_64 in System Python 10.6, i386 and ppc in Python.org Python, > or some home-rolled weirdness). Yes, the whole thing is a nightmare, really. 32bit ppc+i386 was bad enough -- with four now, it's really a mess. 
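(Aside: a quick way to check what a given interpreter is actually running as -- nothing OS-X specific assumed, just the stdlib:

import sys, platform
print platform.machine(), platform.architecture()[0]   # e.g. "i386 32bit"
print "64-bit" if sys.maxint > 2**32 else "32-bit"

which at least helps sort out which of those architectures you've really got.)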
> With wxPython this doesn't so much > matter since wxMac depends on Carbon anyway (I think it still does, at > least, unless the Cocoa port's suddenly sped up an incredible amount), > which is a 64-bit no-no. You're right -- still strictly Carbon, and therefor strictly 32 bit. > I'm not really a fan of packages polluting /usr/local, I'd rather the > tree appear /opt/packagename well, /opt has kind of been co-opted by macports. > or /usr/local/packagename instead, for > ease of removal wxPython gets put entirely into: /usr/local/lib/wxPython-unicode-2.10.8 which isn't bad. > but the general approach of "stash somewhere and put > a .pth in both site-packages" seems fine to me. OK -- what about simply punting and doing two builds: one 32 bit, and one 64 bit. I wonder if we need 64bit PPC at all? I know I'm running 64 bit hardware, but never ran a 64 bit OS on it -- I wonder if anyone is? What machines/OS versions are available for building Mac installer with? I could do 10.4, 32 bit, ppc+intel. I'll post on the pythonmac list to see what folks there think. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Tue Jan 5 19:18:46 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jan 2010 16:18:46 -0800 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43D330.7090608@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> Message-ID: <4B43D6E6.80305@noaa.gov> Christopher Barker wrote: > OK -- what about simply punting and doing two builds: one 32 bit, and > one 64 bit. I wonder if we need 64bit PPC at all? I know I'm running 64 > bit hardware, but never ran a 64 bit OS on it -- I wonder if anyone is? Oh, I think this approach may be completely egg-incompatible.... Maybe we just need ten builds -- arrgg! If distutils/setuptools could identify the python version properly, then binary eggs and easy-install could be a solution -- but that's a mess, too. oh well, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cournape at gmail.com Tue Jan 5 19:42:31 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 Jan 2010 09:42:31 +0900 Subject: [Numpy-discussion] Lots of 32bits specific errors In-Reply-To: <4B43C878.3060007@gmail.com> References: <4B43B3BB.4060702@gmail.com> <5b8d13221001051431q418df142x5df762c0d065c992@mail.gmail.com> <4B43C878.3060007@gmail.com> Message-ID: <5b8d13221001051642i24d0f0f5m82437235f5b1ae21@mail.gmail.com> On Wed, Jan 6, 2010 at 8:17 AM, Xavier Gnata wrote: > >> On Wed, Jan 6, 2010 at 6:48 AM, Xavier Gnata wrote: >> >>> Hi, >>> >>> I have compiled numpy 1.5.0.dev8039 both on a 32 and a 64bits ubuntu >>> machine. >>> >>> On the 64bits one, everything is fine: >>> numpy.test get a perfect score: >>> >>> >>> On the 32bits ubuntu, the story is not that nice: >>> >>> >> Almost all your errors are in linalg - most likely an atlas problem. >> Atlas with sse2 is buggy on some Ubuntu versions. 
You should check >> Ubuntu bug tracker to see if it affects the version you are using. >> >> > Thanks, you got it! atlas sse2 in fully buggy in karmic (at least on > 32bits machines). That's just incompetence at that point. This bug is known for > one year, and they still have not fixed it. The least they could do is removing the package so that people has slower, but accurate version. David From cournape at gmail.com Tue Jan 5 20:07:26 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 Jan 2010 10:07:26 +0900 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> Message-ID: <5b8d13221001051707g28d317f6t8ea07a138c457c38@mail.gmail.com> On Wed, Jan 6, 2010 at 8:22 AM, David Warde-Farley wrote: > > On 5-Jan-10, at 6:01 PM, Christopher Barker wrote: > > >> As the 2.6 series is binary compatible, you can build a single >> installer >> that will work with both I don't think that's true. 2.6.x are compatible with each other iif they are built with the same compiler options. There are too many differences between Apple python and the python.org python (dtrace, 64 bits support, compiler options, etc...) IMHO to make a compatible installer for both versions worthwhile. >> way he's done it is to put wxPython itself into /usr/local, and then >> put >> some *.pth trickery into both of the pythons: /System/... and / >> Library/... >> >> It works fine, and I've suggested it on this list before, but I guess >> folks think it's too much of a hack -- or just no one has taken the >> time >> to do it. I don't think it worths it. .pth files will involve even more point of failures, and it has the potential of breaking things in non obvious ways. I agree that the lack of 64 bits installer is an issue, but building numpy on mac os x is not that difficult, and I think people who need 64 bits often are more knowledgeable. There are also solutions like EPD and the likes, which support 64 bits. David From cournape at gmail.com Tue Jan 5 20:09:19 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 Jan 2010 10:09:19 +0900 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43D6E6.80305@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> Message-ID: <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> On Wed, Jan 6, 2010 at 9:18 AM, Christopher Barker wrote: > If distutils/setuptools could identify the python version properly, then > ?binary eggs and easy-install could be a solution -- but that's a mess, > too. It would not solve the problem, really. Two same versions of python does not imply compatible python when C extensions are involved. In current state of affairs, where python does not have a stable ABI, the only workable solution is to target one specific python (or to build your own as in EPD). cheers, David From x.yang at physics.usyd.edu.au Tue Jan 5 22:38:29 2010 From: x.yang at physics.usyd.edu.au (Xue (Sue) Yang) Date: Wed, 6 Jan 2010 14:38:29 +1100 Subject: [Numpy-discussion] performance matrix multiplication vs. 
matlab Message-ID: <001a01ca8e81$b626ec50$2274c4f0$@yang@physics.usyd.edu.au> Hi, I followed what I collected about installation of numpy with lapack and atlas and installed numpy on our desktop with RHEL4 and 4 cores. >uname -a Linux curie.physics.usyd.edu.au 2.6.9-89.0.15.ELsmp #1 SMP Sat Oct 10 05:59:16 EDT 2009 i686 i686 i386 GNU/Linux I successfully installed lapack-3.1.1, atlas3.8.0 with fortran comfiler: gfortran, and numpy-1.3.0 with enthought-python distribution (python2.5). > python >>import numpy >>a = numpy.random.randn(6000, 6000) >>numpy.dot(a, a) Surprisingly, it only uses 2 cores instead of 4 cores. Where and how should I set up the number of threads for numpy? Thanks! Dr. Xue (Sue) Yang School of Physics, University of Sydney Ph: 02 9351 6081 Email: x.yang at physics.usyd.edu.au From david at silveregg.co.jp Tue Jan 5 22:49:28 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Wed, 06 Jan 2010 12:49:28 +0900 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <001a01ca8e81$b626ec50$2274c4f0$@yang@physics.usyd.edu.au> References: <001a01ca8e81$b626ec50$2274c4f0$@yang@physics.usyd.edu.au> Message-ID: <4B440848.2060106@silveregg.co.jp> Xue (Sue) Yang wrote: > Hi, > > I followed what I collected about installation of numpy with lapack and > atlas and installed numpy on our desktop with RHEL4 and 4 cores. > >> uname -a > > Linux curie.physics.usyd.edu.au 2.6.9-89.0.15.ELsmp #1 SMP Sat Oct 10 > 05:59:16 EDT 2009 i686 i686 i386 GNU/Linux > > I successfully installed lapack-3.1.1, atlas3.8.0 with fortran comfiler: > gfortran, and numpy-1.3.0 with enthought-python distribution (python2.5). > >> python >>> import numpy >>> a = numpy.random.randn(6000, 6000) >>> numpy.dot(a, a) > > Surprisingly, it only uses 2 cores instead of 4 cores. Where and how should > I set up the number of threads for numpy? Atlas (at least your version, I don't know about 3.9.* series) does not support setting the number of threads dynamically - it is a compile time option. If the compile time option is indeed 4 threads, it may be that ATLAS decided that using 2 threads instead of 4 was more efficient. You can find this info in atlas_buildinfo.h file (the ATL_NCPU CPP define). Note that you should not use atlas 3.8.0, as it has a number of serious bugs - you should use 3.8.3. David From nadavh at visionsense.com Wed Jan 6 01:13:33 2010 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 6 Jan 2010 08:13:33 +0200 Subject: [Numpy-discussion] extracting data from ODF files References: <209cec441001051314k284895bdq6e9a3b8f5fd9197c@mail.gmail.com> Message-ID: <710F2847B0018641891D9A21602763605AD297@ex3.envision.co.il> There is a possibility to export the data to excel format and use xlrd or similar package to read it. Nadav -----Original Message----- From: numpy-discussion-bounces at scipy.org on behalf of Manuel Wittchen Sent: Tue 05-Jan-10 23:14 To: Discussion of Numerical Python Subject: [Numpy-discussion] extracting data from ODF files Hi, is there a (simple) solution to extract data from OpenDocument files (espacially OpenOffice.org Calc files) into a Numpy Array? At the moment I copy the colums from OO.org Calc manually into a tab-separatet Plaintext file which is quite annoying. Regards, Manuel Wittchen _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 2927 bytes Desc: not available URL: From Chris.Barker at noaa.gov Wed Jan 6 01:34:45 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jan 2010 22:34:45 -0800 Subject: [Numpy-discussion] extracting data from ODF files In-Reply-To: <710F2847B0018641891D9A21602763605AD297@ex3.envision.co.il> References: <209cec441001051314k284895bdq6e9a3b8f5fd9197c@mail.gmail.com> <710F2847B0018641891D9A21602763605AD297@ex3.envision.co.il> Message-ID: <4B442F05.7070400@noaa.gov> Nadav Horesh wrote: > is there a (simple) solution to extract data from OpenDocument files > (espacially OpenOffice.org Calc files) into a Numpy Array? Aren't they XML? you may be able to use an XML parser. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From silva at lma.cnrs-mrs.fr Wed Jan 6 04:50:39 2010 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Wed, 06 Jan 2010 10:50:39 +0100 Subject: [Numpy-discussion] extracting data from ODF files In-Reply-To: <4B442F05.7070400@noaa.gov> References: <209cec441001051314k284895bdq6e9a3b8f5fd9197c@mail.gmail.com> <710F2847B0018641891D9A21602763605AD297@ex3.envision.co.il> <4B442F05.7070400@noaa.gov> Message-ID: <1262771439.3294.2.camel@PCTerrusse> Le mardi 05 janvier 2010 ? 22:34 -0800, Christopher Barker a ?crit : > Nadav Horesh wrote: > > is there a (simple) solution to extract data from OpenDocument files > > (espacially OpenOffice.org Calc files) into a Numpy Array? > > Aren't they XML? you may be able to use an XML parser. See, e.g., http://wiki.services.openoffice.org/wiki/CalcParser Maybe another solution is to use the python open office interface : python-uno ? -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051) From Chris.Barker at noaa.gov Wed Jan 6 11:22:24 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 Jan 2010 08:22:24 -0800 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> Message-ID: <4B44B8C0.5010203@noaa.gov> NOTE: cc-d to the pythonmac list from the numpy list -- this is really a Mac issue. It's a discussion of what/how to produce binaries of numpy for OS-X David Cournapeau wrote: > On Wed, Jan 6, 2010 at 9:18 AM, Christopher Barker > wrote: > >> If distutils/setuptools could identify the python version properly, then >> binary eggs and easy-install could be a solution -- but that's a mess, >> too. > > It would not solve the problem, really. Two same versions of python > does not imply compatible python when C extensions are involved. In > current state of affairs, where python does not have a stable ABI, the > only workable solution is to target one specific python So you are saying that binary eggs are simply impossible altogether. Maybe true, I suppose, but... >>> As the 2.6 series is binary compatible, you can build a single >>> installer >>> that will work with both > > I don't think that's true. 
2.6.x are compatible with each other iif > they are built with the same compiler options. There are too many > differences between Apple python and the python.org python (dtrace, 64 > bits support, compiler options, etc...) IMHO to make a compatible > installer for both versions worthwhile. Well, it was possible once, and it's been working just fine for wxPython for a good while. Things may have changed with OS-X 10.6, tough I think the wxPython binary still works (32 bit only, of course). > I agree that the lack of 64 bits installer is an issue, but building > numpy on mac os x is not that difficult, and I think people who need > 64 bits often are more knowledgeable. I agree -- but what do you get if you install OS-X 10.6, and then type "python" at the prompt -- is that a 32 bit or 64 bit python? > There are also solutions like > EPD and the likes, which support 64 bits. but not PPC anymore, sigh. There is a key problem here -- folks running OS-X expecting it to be another Unix are fine -- they install the compiler, build their own extensions, probably use some combination of fink and macports, etc. This may well apply to many Scientific programmers, and web developers (though it maybe not). However, there is a different type of Mac user -- the type that has traditionally used Macs. Some of these folks are giving a bit of programming a try, and have heard that python is an easy to learn language -- and, cool, OS-X even comes with it installed! But then they soon enough discover that they need additional packages of some sort --- and numpy is a very, very useful package, and not just for the experienced programmer (think Matlab users, for instance). These folks haven't installed the compiler, don't know 64 from 32 bit, and heaven forbid, have no idea how the heck to compile a dependency with the "./configure && make && make install" dance. Some years ago, the community on the pythonmac list made significant efforts to try to support these folks. Primarily what they need are binary installers. We also more or less declared the python.org python as the official python to support, and even had a repository of pre-built packages (http://pythonmac.org/packages/py25-fat/index.html). It was pretty handy -- you could get python itself and all the major packages there, all working together. That repository is not longer maintained, for a couple reasons: 1) Bob Ippolito is doing other things 2) A lot of package developers are providing binaries themselves 3) It's gotten even messier! But there is still a community out there that it would be nice to support. So the question is: how do we do it? That repository appears to be dead, though it could be revived if there is a bit of interest. But even without it, it would be great if there was some sort of consensus among the pythonmac crowd and the major package developers as to what to provide binaries for. We're really in a mess if you can only get a binary for PIL for the Apple Python, and only get a binary for numpy for the python.org python. Personally, I still think the Apple python is dead-end -- Apple has never supported it properly. And, if you go that route you need a different build for people running 10.4 and 10.5, and 10.6, and ... I'm not sure what the 64 bit story is -- I suspect that David is right -- folks running 64 bits are the ones that know what they are doing, so they have less need for binaries. 
So maybe for now this is a good goal: python 2.5: python.org build (32 bit PPC and intel) python 2.6: 32 bit python.org 2.6.4 64 bit python.org build? python 2.7: python.org 3-way build, if that happens. or separate 32 and 64 bit builds python 3.1: python.org build (whatever it ends up being) Darn, that's quite a few to support! NOTE: Ned Deily just posted a good summary of what's out there on teh pythonmac list: http://mail.python.org/pipermail/pythonmac-sig/2010-January/022031.html -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Jan 6 11:35:47 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 Jan 2010 08:35:47 -0800 Subject: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B44B8C0.5010203@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> <4B44B8C0.5010203@noaa.gov> Message-ID: <4B44BBE3.9090601@noaa.gov> One more note: An easy improvement to the current situation with binaries is to LABEL THEM WELL: It's worse to have a binary you expect to work fail for you than to not have one available. IN the past, I think folks' have used the default name provided by bdist_mpkg, and those are not always clear. Something like: numpy1.4-osx10.4-python.org2.6-32bit.dmg or something -- even better, with a a bit more text -- would help a lot. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cournape at gmail.com Wed Jan 6 20:09:42 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 Jan 2010 10:09:42 +0900 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B44B8C0.5010203@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> <4B44B8C0.5010203@noaa.gov> Message-ID: <5b8d13221001061709u56d72a80m73bacb69d123cd04@mail.gmail.com> On Thu, Jan 7, 2010 at 1:22 AM, Christopher Barker wrote: > NOTE: cc-d to the pythonmac list from the numpy list -- this is really a > Mac issue. It's a discussion of what/how to produce binaries of numpy > for OS-X > > > David Cournapeau wrote: >> On Wed, Jan 6, 2010 at 9:18 AM, Christopher Barker >> wrote: >> >>> If distutils/setuptools could identify the python version properly, then >>> ?binary eggs and easy-install could be a solution -- but that's a mess, >>> too. >> >> It would not solve the problem, really. Two same versions of python >> does not imply compatible python when C extensions are involved. In >> current state of affairs, where python does not have a stable ABI, the >> only workable solution is to target one specific python > > So you are saying that binary eggs are simply impossible altogether. 
More simply, you can't offer a single binary installer which works on binary-incompatible python versions. > I agree -- but what do you get if you install OS-X 10.6, and then type > "python" at the prompt -- is that a 32 bit or 64 bit python? 64 bits, at least by default. All the userland provided by OS-X is 64 bits AFAIK (the only apps still 32 bits on my macbook are vmware and the kernel). There is also the problem that controlling the minimal supported version of OS X is hard to control (another distutils insanity). > However, there is a different type of Mac user -- the type that has > traditionally used Macs. Some of these folks are giving a bit of > programming a try, and have heard that python is an easy to learn > language -- and, cool, OS-X even comes with it installed! > > But then they soon enough discover that they need additional packages of > some sort --- and numpy is a very, very useful package, and not just for > the experienced programmer (think Matlab users, for instance). These > folks haven't installed the compiler, don't know 64 from 32 bit, and > heaven forbid, have no idea how the heck to compile a dependency with > the "./configure && make && make install" dance. Those people already have numpy installed, though. The only solution I can see for a one-click install is to control the whole stack, e.g. like EPD eos. > Some years ago, the community on the pythonmac list made significant > efforts to try to support these folks. Primarily what they need are > binary installers. We also more or less declared the python.org python > as the official python to support, and even had a repository of > pre-built packages (http://pythonmac.org/packages/py25-fat/index.html). > It was pretty handy -- you could get python itself and all the major > packages there, all working together. I hope that our own scientific repository will be able to do this - at least that's one of the stated goal (see the toydist discussion). The only scalable solution I can see is if the packages are automatically built for every version of Mac OS X we wish to support. > > Personally, I still think the Apple python is dead-end -- Apple has > never supported it properly. And, if you go that route you need a > different build for people running 10.4 and 10.5, and 10.6, and ... I am afraid that this is needed anyway once you start depending on "high-level" stuff from Mac OS X API. > Darn, that's quite a few to support! I would say that's insane :) That's hopeless intractable. If the numpy stats are any indication, only supporting the last released python version is enough for most users. IMHO, it is much better to support only one binary installer which works well rather than a myriad which work half the time, and only confuse people anyway. 
David From cournape at gmail.com Wed Jan 6 20:12:08 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 Jan 2010 10:12:08 +0900 Subject: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B44BBE3.9090601@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> <4B44B8C0.5010203@noaa.gov> <4B44BBE3.9090601@noaa.gov> Message-ID: <5b8d13221001061712m3498390cse4b2ae6b37fc19e2@mail.gmail.com> On Thu, Jan 7, 2010 at 1:35 AM, Christopher Barker wrote: > One more note: > > An easy improvement to the current situation with binaries is to LABEL > THEM WELL: > > It's worse to have a binary you expect to work fail for you than to not > have one available. IN the past, I think folks' have used the default > name provided by bdist_mpkg, and those are not always clear. Something like: > > > numpy1.4-osx10.4-python.org2.6-32bit.dmg The 32 bits is redundant - we support all archs supported by the official python binary, so python.org is enough. About osx10.4, I still don't know how to make sure we do work there with distutils. The whole MACOSX_DEPLOYMENT_TARGET confuses me quite a lot. Other than that, the numpy 1.4.0 follows your advice, and contains the python.org part. David From x.yang at physics.usyd.edu.au Wed Jan 6 21:20:33 2010 From: x.yang at physics.usyd.edu.au (Xue (Sue) Yang) Date: Thu, 7 Jan 2010 13:20:33 +1100 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab Message-ID: <002401ca8f3f$fdae2340$f90a69c0$@yang@physics.usyd.edu.au> Hi David, Thank you for the reply which is useful. I also tried to Install numpy with intel mkl 9.1 I still used gfortran for numpy installation as intel mkl 9.1 supports gnu compiler. I only uncomment these lines for site.cfg in site.cfg.example [mkl] library_dirs = /usr/physics/intel/mkl/lib/32 include_dirs = /usr/physics/intel/mkl/include lapack_libs = mkl_lapack then I tested the numpy with > python >>import numpy >>a = numpy.random.randn(6000, 6000) >>numpy.dot(a, a) This time, only one cpu was used. Does it mean that our installed intel mkl 9.1 is not threaded? I don't think so. We have used it for openMP parallelization for quite a while. Thanks! Sue From cournape at gmail.com Wed Jan 6 23:21:07 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 Jan 2010 13:21:07 +0900 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <-7769099014455669290@unknownmsgid> References: <-7769099014455669290@unknownmsgid> Message-ID: <5b8d13221001062021k7d6dd475r7e895bed70b712f7@mail.gmail.com> On Thu, Jan 7, 2010 at 11:20 AM, Xue (Sue) Yang wrote: > This time, only one cpu was used. ?Does it mean that our installed intel mkl > 9.1 is not threaded? You would have to consult the MKL documentation - I believe you can control how many threads are used from an environment variable. Also, the exact build commands depend on the version of the MKL, as its libraries often change between versions. David From sturla at molden.no Thu Jan 7 06:31:38 2010 From: sturla at molden.no (Sturla Molden) Date: Thu, 7 Jan 2010 12:31:38 +0100 Subject: [Numpy-discussion] performance matrix multiplication vs. 
matlab In-Reply-To: <002401ca8f3f$fdae2340$f90a69c0$@yang@physics.usyd.edu.au> References: <002401ca8f3f$fdae2340$f90a69c0$@yang@physics.usyd.edu.au> Message-ID: <1d4f24dc6f376254aa63963eb6cd5916.squirrel@webmail.uio.no> > I also tried to Install numpy with intel mkl 9.1 > I still used gfortran for numpy installation as intel mkl 9.1 supports gnu > compiler. I would suggest using GotoBLAS instead of ATLAS. It is easier to build then ATLAS (basically no configuration), and has even better performance than MKL. http://www.tacc.utexas.edu/tacc-projects/ S.M. From denis-bz-py at t-online.de Thu Jan 7 09:05:19 2010 From: denis-bz-py at t-online.de (denis) Date: Thu, 07 Jan 2010 15:05:19 +0100 Subject: [Numpy-discussion] Repeated dot products In-Reply-To: References: Message-ID: On 12/12/2009 22:55, T J wrote: > Hi, > > Suppose I have an array of shape: (n, k, k). In this case, I have n > k-by-k matrices. My goal is to compute the product of a (potentially > large) user-specified selection (with replacement) of these matrices. > For example, > > x = [0,1,2,1,3,3,2,1,3,2,1,5,3,2,3,5,2,5,3,2,1,3,5,6] TJ, what are your n, k, len(x) ? _dotblas.dot is fast: dot( 10x10 matrices ) takes ~ 22 usec on my g4 ppc, which is ~ 15 clock cycles (700 MHz) per mem access * +. A hack to find repeated pairs (or triples ...) follows. Your sequence above has only (3,2) 4 times, no win. (Can someone give a probabilistic estimate of the number of non-overlapping pairs in N letters from an alphabet of size A ?) #!/usr/bin/env python # numpy-discuss 2009 12dec TJ repeated dot products from __future__ import division from collections import defaultdict import numpy as np __version__ = "2010 7jan denis" def pairs( s, Len=2 ): """ repeated non-overlapping pairs (substrings, subwords) "abracadabra" -> ab ra [[0 7] [2 9]], not br Len=3: triples, 4 ... """ # bruteforce # grow repeated 2 3 ... ? pairs = defaultdict(list) for j in range(len(s)-Len+1): pairs[ s[j:j+Len] ].append(j) min2 = filter( lambda x: len(x) > 1, pairs.values() ) min2.sort( key = lambda x: len(x), reverse=True ) # remove overlaps -- # (if many, during init scan would be faster) runs = np.zeros( len(s), np.uint8 ) run = np.ones( Len, np.uint8 ) run[0] = Len chains = [] for ovchain in min2: chain = [] for c in ovchain: if not runs[c:c+Len].any(): runs[c:c+Len] = run chain.append(c) if len(chain) > 1: chains.append(chain) return (chains, runs) #............................................................................... if __name__ == "__main__": import sys abra = "abracadabra" alph = 5 randlen = 100 randseed = 1 exec( "\n".join( sys.argv[1:] )) # Test= ... print "pairs( %s ) --" % abra print pairs( abra ) # ab [0, 7], br [2, 9]] print pairs( abra, 3 ) # abr [0, 7] np.random.seed( randseed ) r = np.random.random_integers( 1, alph, randlen ) chains, runs = pairs( tuple(r) ) npair = sum([ len(c) for c in chains ]) print "%d repeated pairs in %d random %d" % (npair, randlen, alph) # 35 repeated pairs in 100 random 5 (prob estimate this ?) # 25 repeated pairs in 100 random 10 From Chris.Barker at noaa.gov Thu Jan 7 12:36:31 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 07 Jan 2010 09:36:31 -0800 Subject: [Numpy-discussion] performance matrix multiplication vs. 
matlab In-Reply-To: <1d4f24dc6f376254aa63963eb6cd5916.squirrel@webmail.uio.no> References: <002401ca8f3f$fdae2340$f90a69c0$%yang@physics.usyd.edu.au> <1d4f24dc6f376254aa63963eb6cd5916.squirrel@webmail.uio.no> Message-ID: <4B461B9F.5060800@noaa.gov> Sturla Molden wrote: > I would suggest using GotoBLAS instead of ATLAS. > http://www.tacc.utexas.edu/tacc-projects/ That does look promising -- nay idea what the license is? They don't make it clear on the site (maybe it it is you set up a user account and download, but I'd rather know up front). The only reference I could find is from 2006: http://www.utexas.edu/news/2006/04/12/tacc/ and in that, they refer to one of those annoying "free for academic and scientific use" clauses. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sturla at molden.no Thu Jan 7 12:47:25 2010 From: sturla at molden.no (Sturla Molden) Date: Thu, 7 Jan 2010 18:47:25 +0100 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <4B461B9F.5060800@noaa.gov> References: <002401ca8f3f$fdae2340$f90a69c0$%yang@physics.usyd.edu.au> <1d4f24dc6f376254aa63963eb6cd5916.squirrel@webmail.uio.no> <4B461B9F.5060800@noaa.gov> Message-ID: > Sturla Molden wrote: >> I would suggest using GotoBLAS instead of ATLAS. > >> http://www.tacc.utexas.edu/tacc-projects/ > > That does look promising -- nay idea what the license is? They don't > make it clear on the site UT TACC Research License (Source Code) The Texas Advanced Computing Center of The University of Texas at Austin has developed certain software and documentation that it desires to make available without charge to anyone for academic, research, experimental or personal use. This license is designed to guarantee freedom to use the software for these purposes. If you wish to distribute or make other use of the software, you may purchase a license to do so from the University of Texas. The accompanying source code is made available to you under the terms of this UT TACC Research License (this "UTTRL"). By clicking the "ACCEPT" button, or by installing or using the code, you are consenting to be bound by this UTTRL. If you do not agree to the terms and conditions of this license, do not click the "ACCEPT" button, and do not install or use any part of the code. The terms and conditions in this UTTRL not only apply to the source code made available by UT TACC, but also to any improvements to, or derivative works of, that source code made by you and to any object code compiled from such source code, improvements or derivative works. 1. DEFINITIONS. 1.1 "Commercial Use" shall mean use of Software or Documentation by Licensee for direct or indirect financial, commercial or strategic gain or advantage, including without limitation: (a) bundling or integrating the Software with any hardware product or another software product for transfer, sale or license to a third party (even if distributing the Software on separate media and not charging for the Software); (b) providing customers with a link to the Software or a copy of the Software for use with hardware or another software product purchased by that customer; or (c) use in connection with the performance of services for which Licensee is compensated. 1.2 "Derivative Products" means any improvements to, or other derivative works of, the Software made by Licensee. 
1.3 "Documentation" shall mean all manuals, user documentation, and other related materials pertaining to the Software that are made available to Licensee in connection with the Software. 1.4 "Licensor" shall mean The University of Texas. 1.5 "Licensee" shall mean the person or entity that has agreed to the terms hereof and is exercising rights granted hereunder. 1.6 "Software" shall mean the computer program(s) referred to as GotoBLAS2 made available under this UTTRL in source code form, including any error corrections, bug fixes, patches, updates or other modifications that Licensor may in its sole discretion make available to Licensee from time to time, and any object code compiled from such source code. 2. GRANT OF RIGHTS. Subject to the terms and conditions hereunder, Licensor hereby grants to Licensee a worldwide, non-transferable, non-exclusive license to (a) install, use and reproduce the Software for academic, research, experimental and personal use (but specifically excluding Commercial Use); (b) use and modify the Software to create Derivative Products, subject to Section 3.2; and (c) use the Documentation, if any, solely in connection with Licensee's authorized use of the Software. 3. RESTRICTIONS; COVENANTS. 3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer copies or rights to the Software (or any portion thereof) or the Documentation; (b) use the Software (or any portion thereof) or Documentation for Commercial Use, or for any other use except as described in Section 2; (c) copy the Software or Documentation other than for archival and backup purposes; or (d) remove any product identification, copyright, proprietary notices or labels from the Software and Documentation. This UTTRL confers no rights upon Licensee except those expressly granted herein. 3.2 Licensee hereby agrees that it will provide a copy of all Derivative Products to Licensor and that its use of the Derivative Products will be subject to all of the same terms, conditions, restrictions and limitations on use imposed on the Software under this UTTRL. Licensee hereby grants Licensor a worldwide, non-exclusive, royalty-free license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute Derivative Products. Licensee also hereby grants Licensor a worldwide, non-exclusive, royalty-free patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Derivative Products under those patent claims licensable by Licensee that are necessarily infringed by the Derivative Products. 4. PROTECTION OF SOFTWARE. 4.1 Confidentiality. The Software and Documentation are the confidential and proprietary information of Licensor. Licensee agrees to take adequate steps to protect the Software and Documentation from unauthorized disclosure or use. Licensee agrees that it will not disclose the Software or Documentation to any third party. 4.2 Proprietary Notices. Licensee shall maintain and place on any copy of Software or Documentation that it reproduces for internal use all notices as are authorized and/or required hereunder. Licensee shall include a copy of this UTTRL and the following notice, on each copy of the Software and Documentation. Such license and notice shall be embedded in each copy of the Software, in the video screen display, on the physical medium embodying the Software copy and on any Documentation: Copyright ?? The University of Texas, 2009. All right reserved. 
UNIVERSITY EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES CONCERNING THIS SOFTWARE AND DOCUMENTATION, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, NON-INFRINGEMENT AND WARRANTIES OF PERFORMANCE, AND ANY WARRANTY THAT MIGHT OTHERWISE ARISE FROM COURSE OF DEALING OR USAGE OF TRADE. NO WARRANTY IS EITHER EXPRESS OR IMPLIED WITH RESPECT TO THE USE OF THE SOFTWARE OR DOCUMENTATION. Under no circumstances shall University be liable for incidental, special, indirect, direct or consequential damages or loss of profits, interruption of business, or related expenses which may arise from use of Software or Documentation, including but not limited to those resulting from defects in Software and/or Documentation, or loss or inaccuracy of data of any kind. 5. WARRANTIES. 5.1 Disclaimer of Warranties. TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE SOFTWARE AND DOCUMENTATION ARE BEING PROVIDED ON AN "AS IS" BASIS WITHOUT ANY WARRANTIES OF ANY KIND RESPECTING THE SOFTWARE OR DOCUMENTATION, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF DESIGN, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. 5.2 Limitation of Liability. UNDER NO CIRCUMSTANCES UNLESS REQUIRED BY APPLICABLE LAW SHALL LICENSOR BE LIABLE FOR INCIDENTAL, SPECIAL, INDIRECT, DIRECT OR CONSEQUENTIAL DAMAGES OR LOSS OF PROFITS, INTERRUPTION OF BUSINESS, OR RELATED EXPENSES WHICH MAY ARISE AS A RESULT OF THIS LICENSE OR OUT OF THE USE OR ATTEMPT OF USE OF SOFTWARE OR DOCUMENTATION INCLUDING BUT NOT LIMITED TO THOSE RESULTING FROM DEFECTS IN SOFTWARE AND/OR DOCUMENTATION, OR LOSS OR INACCURACY OF DATA OF ANY KIND. THE FOREGOING EXCLUSIONS AND LIMITATIONS WILL APPLY TO ALL CLAIMS AND ACTIONS OF ANY KIND, WHETHER BASED ON CONTRACT, TORT (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE), OR ANY OTHER GROUNDS. 6. INDEMNIFICATION. Licensee shall indemnify, defend and hold harmless Licensor, the University of Texas System, their Regents, and their officers, agents and employees from and against any claims, demands, or causes of action whatsoever caused by, or arising out of, or resulting from, the exercise or practice of the license granted hereunder by Licensee, its officers, employees, agents or representatives. 7. TERMINATION. If Licensee breaches this UTTRL, Licensee\'s right to use the Software and Documentation will terminate immediately without notice, but all provisions of this UTTRL except Section 2 will survive termination and continue in effect. Upon termination, Licensee must destroy all copies of the Software and Documentation. 8. GOVERNING LAW; JURISDICTION AND VENUE. The validity, interpretation, construction and performance of this UTTRL shall be governed by the laws of the State of Texas. The Texas state courts of Travis County, Texas (or, if there is exclusive federal jurisdiction, the United States District Court for the Central District of Texas) shall have exclusive jurisdiction and venue over any dispute arising out of this UTTRL, and Licensee consents to the jurisdiction of such courts. Application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. 9. EXPORT CONTROLS. This license is subject to all applicable export restrictions. Licensee must comply with all export and import laws and restrictions and regulations of any United States or foreign agency or authority relating to the Software and its use. 10. U.S. GOVERNMENT END-USERS. The Software is a "commercial item," as that term is defined in 48 C.F.R. 
2.101, consisting of "commercial computer software" and "commercial computer software documentation," as such terms are used in 48 C.F.R. 12.212 (Sept. 1995) and 48 C.F.R. 227.7202 (June 1995). Consistent with 48 C.F.R. 12.212, 48 C.F.R. 27.405(b)(2) (June 1998) and 48 C.F.R. 227.7202, all U.S. Government End Users acquire the Software with only those rights as set forth herein. 11. MISCELLANEOUS. If any provision hereof shall be held illegal, invalid or unenforceable, in whole or in part, such provision shall be modified to the minimum extent necessary to make it legal, valid and enforceable, and the legality, validity and enforceability of all other provisions of this UTTRL shall not be affected thereby. Licensee may not assign this UTTRL in whole or in part, without Licensor's prior written consent. Any attempt to assign this UTTRL without such consent will be null and void. This UTTRL is the complete and exclusive statement between Licensee and Licensor relating to the subject matter hereof and supersedes all prior oral and written and all contemporaneous oral negotiations, commitments and understandings of the parties, if any. Any waiver by either party of any default or breach hereunder shall not constitute a waiver of any provision of this UTTRL or of any subsequent default or breach of the same or a different kind. END OF LICENSE From Nikolas.Tezak at gmx.de Thu Jan 7 12:51:44 2010 From: Nikolas.Tezak at gmx.de (Nikolas Tezak) Date: Thu, 7 Jan 2010 18:51:44 +0100 Subject: [Numpy-discussion] Behaviour of vdot(array2d, array1d) Message-ID: <90626729-8117-4C72-9509-E1BE7D7F7933@gmx.de> Hi, I am new to this list, but I have been using scipy for a couple of months now with great satisfaction. Currently I have a problem: I diagonalize a hermitian complex matrix using the eigh routine from scipy.linalg (this is still a numpy question, see below) This returns the eigenvectors as columns of a 2d array. Now I would like to project a vector onto this new basis. I could either do: inital_state = array(...) #dtype=complex, shape=(dim,) coefficients = zeros( shape=(dim,), dtype=complex) matrix = array(...) #dtype=complex, shape=(dim, dim) eigenvalues, eigenvectors = eigh(matrix) for i in xrange(dim): coefficients[i] = vdot(eigenvalues[:, i], initial_state) But it seems to me after reading the documentation for vdot, that it should also be possible to do this without a loop: initial_state = array(...) #dtype=complex, shape=(dim,) matrix = array(...) #dtype=complex, shape=(dim, dim) eigenvalues, eigenvectors = eigh(matrix) coefficients = vdot( eigenvalues.transpose(), initial_state) However when I do this, vdot raises a ValueError complaining that the "vectors have different lengths". It seems that vdot (as opposed to dot) cannot handle arguments with different shape although the documentation suggests otherwise. I am using numpy version 1.3.0. Is this a bug or am I missing something? 
Regards, Nikolas From Chris.Barker at noaa.gov Thu Jan 7 14:16:40 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 07 Jan 2010 11:16:40 -0800 Subject: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <5b8d13221001061712m3498390cse4b2ae6b37fc19e2@mail.gmail.com> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> <4B44B8C0.5010203@noaa.gov> <4B44BBE3.9090601@noaa.gov> <5b8d13221001061712m3498390cse4b2ae6b37fc19e2@mail.gmail.com> Message-ID: <4B463318.5060902@noaa.gov> David Cournapeau wrote: > On Thu, Jan 7, 2010 at 1:35 AM, Christopher Barker >> In the past, I think folks' have used the default >> name provided by bdist_mpkg, and those are not always clear. Something like: >> >> >> numpy1.4-osx10.4-python.org2.6-32bit.dmg > > The 32 bits is redundant - we support all archs supported by the > official python binary, so python.org is enough. True, though I was anticipating that there may be 32 and 64 bit builds some day. > About osx10.4, As for that -- I put that in 'cause I remembered that in the past it has said "10.5", when, in fact 10.4 was supported. Thinking more, I think it's like 32 bit -- the python.org build supports 10.4, so that's all the information folks need. > still don't know how to make sure we do work there with distutils. The > whole MACOSX_DEPLOYMENT_TARGET confuses me quite a lot. distutils should do it right, and indeed, I just tested the py2.5 and py2.6 binaries on my 10.4 PPC machine ,and most of the tests all pass on both. (though see the note below) I think distutils does do it right, at least if you use the latest version of 2.6 -- a bug was fixed there. What OS/architecture were those built with? > Other than > that, the numpy 1.4.0 follows your advice, and contains the python.org > part. I should have looked first -- thanks, I think that will be helpful. NOTE: When I first installed the binary, I got a whole bunch of errors because "matrix' wasn't found. I recalled this issue from testing, and cleared out the install, then re-installed, and all was fine. I wonder if it's possible to have a mpkg remove anything? Other failed tests: ====================================================================== FAIL: test_umath.test_nextafterl ... return _test_nextafter(np.longdouble) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 852, in _test_nextafter assert np.nextafter(one, two) - one == eps AssertionError ====================================================================== FAIL: test_umath.test_spacingl ---------------------------------------------------------------------- ... Traceback (most recent call last): line 887, in test_spacingl return _test_spacing(np.longdouble) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 873, in _test_spacing assert np.spacing(one) == eps AssertionError I think both of those are known issues, and not a big deal. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Jan 7 15:08:23 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 07 Jan 2010 12:08:23 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1262721695.5107.1.camel@idol> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> Message-ID: <4B463F37.4010108@noaa.gov> Pauli Virtanen wrote: > ma, 2010-01-04 kello 17:05 -0800, Christopher Barker kirjoitti: > it also does odd things with spaces >> embedded in the separator: >> >> ", $ #" matches all of: ",$#" ", $#" ",$ #" > That's a documented feature: Fair enough. OK, I've written a patch that allows newlines to be interpreted as separators in addition to whatever is specified in sep. In the process of testing, I found again these issues, which are still marked as "needs decision". http://projects.scipy.org/numpy/ticket/883 In short: what to do with missing values? I'd like to address this bug, but I need a decision to do so. My proposal: Raise an ValueError with missing values. Justification: No function should EVER return data that is not there. Period. It is simply asking for hard to find bugs. Therefore: fromstring("3, 4,,5", sep=",") Should never, ever, return: array([ 3., 4., 0., 5.]) Which is what it does now. bad. bad. bad. Alternatives: A) Raising a ValueError is the easiest way to get "proper" behavior. Folks can use a more sophisticated file reader if they want missing values handled. I'm willing to contribute this patch. B) If the dtype is a floating point type, NaN could fill in the missing values -- a fine idea, but you can't use it for integers, and zero is a really bad replacement! C) The user could specify what they want filled in for missing values. This is a fine idea, though I'm not sure I want to take the time to impliment it. Oh, and this is a bug too, with probably the same solution: In [20]: np.fromstring("hjba", sep=',') Out[20]: array([ 0.]) In [26]: np.fromstring("34gytf39", sep=',') Out[26]: array([ 34.]) One more unresolved question: what should: np.fromstring("3, 4, 5,", sep=",") return? it currently returns: array([ 3., 4., 5.]) which seems a bit inconsitent with missing value handling. I also found a bug: In [6]: np.fromstring("3, 4, 5 , ", sep=",") Out[6]: array([ 3., 4., 5., 0.]) so if there is some extra whitespace in there, it does return a missing value. With my proposal, that wouldn't happen, but you might get an exception. I think you should, but it'll be easier to implement my "allow newlines" code if not. so, should I do (A) ? Another question: I've got a patch mostly working (except for the above issues) that will allow fromfile/string to read multiline non-whitespace separated data in one shot: In [15]: str Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' In [16]: np.fromstring(str, sep=',', allow_newlines=True) Out[16]: array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.]) I think this is a very helpful enhancement, and, as it is a new kwarg, backward compatible: 1) Might it be accepted for inclusion? 2) Is the name for the flag OK: "allow_newlines"? It's pretty explicit, but also long -- I used it for the flag name in the C code, too. 3) What C datatype should I use for a boolean flag? I used a char, but I don't know what the numpy standard is. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Thu Jan 7 15:32:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 Jan 2010 15:32:55 -0500 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B463F37.4010108@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> Message-ID: <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> On Thu, Jan 7, 2010 at 3:08 PM, Christopher Barker wrote: > Pauli Virtanen wrote: >> ma, 2010-01-04 kello 17:05 -0800, Christopher Barker kirjoitti: >> it also does odd things with spaces >>> embedded in the separator: >>> >>> ", $ #" matches all of: ?",$#" ? ", $#" ?",$ #" > >> That's a documented feature: > > Fair enough. > > OK, I've written a patch that allows newlines to be interpreted as > separators in addition to whatever is specified in sep. > > In the process of testing, I found again these issues, which are still > marked as "needs decision". > > http://projects.scipy.org/numpy/ticket/883 > > In short: what to do with missing values? > > I'd like to address this bug, but I need a decision to do so. > > > My proposal: > > Raise an ValueError with missing values. > > > Justification: > > No function should EVER return data that is not there. Period. It is > simply asking for hard to find bugs. Therefore: > > fromstring("3, 4,,5", sep=",") > > Should never, ever, return: > > array([ 3., ?4., ?0., ?5.]) > > Which is what it does now. bad. bad. bad. > > > > > Alternatives: > > ? A) Raising a ValueError is the easiest way to get "proper" behavior. > Folks can use a more sophisticated file reader if they want missing > values handled. I'm willing to contribute this patch. > > ? B) If the dtype is a floating point type, NaN could fill in the > missing values -- a fine idea, but you can't use it for integers, and > zero is a really bad replacement! > > ? C) The user could specify what they want filled in for missing > values. This is a fine idea, though I'm not sure I want to take the time > to impliment it. > > Oh, and this is a bug too, with probably the same solution: > > In [20]: np.fromstring("hjba", sep=',') > Out[20]: array([ 0.]) > > In [26]: np.fromstring("34gytf39", sep=',') > Out[26]: array([ 34.]) > > > One more unresolved question: > > what should: > > np.fromstring("3, 4, 5,", sep=",") > > return? > > it currently returns: > > array([ 3., ?4., ?5.]) > > which seems a bit inconsitent with missing value handling. I also found > a bug: > > In [6]: np.fromstring("3, 4, 5 , ", sep=",") > Out[6]: array([ 3., ?4., ?5., ?0.]) > > so if there is some extra whitespace in there, it does return a missing > value. With my proposal, that wouldn't happen, but you might get an > exception. I think you should, but it'll be easier to implement my > "allow newlines" code if not. > > > so, should I do (A) ? > > > Another question: > > I've got a patch mostly working (except for the above issues) that will > allow fromfile/string to read multiline non-whitespace separated data in > one shot: > > > In [15]: str > Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' > > In [16]: np.fromstring(str, sep=',', allow_newlines=True) > Out[16]: > array([ ?1., ? 2., ? 3., ? 4., ? 5., ? 6., ? 7., ? 8., ? 9., ?10., ?11., > ? ? ? ? 
12.]) > > > I think this is a very helpful enhancement, and, as it is a new kwarg, > backward compatible: > > 1) Might it be accepted for inclusion? > > 2) Is the name for the flag OK: "allow_newlines"? It's pretty explicit, > but also long -- I used it for the flag name in the C code, too. > > 3) What C datatype should I use for a boolean flag? I used a char, but I > don't know what the numpy standard is. > > > -Chris > > I don't know much about this, just a few more test cases comma and newline str = '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12' extra comma at end of file str = '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12,' extra newlines at end of file str = '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' It would be nice if these cases would go through without missing values or exception, but I don't often have files that are clean enough for fromfile(). I'm in favor of nan for missing values with floating point numbers. It would make it easy to read correctly formatted csv files, even if the data is not complete. Josef > > > > > > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From bsouthey at gmail.com Thu Jan 7 16:11:01 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 7 Jan 2010 15:11:01 -0600 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> Message-ID: On Thu, Jan 7, 2010 at 2:32 PM, wrote: > On Thu, Jan 7, 2010 at 3:08 PM, Christopher Barker > wrote: >> Pauli Virtanen wrote: >>> ma, 2010-01-04 kello 17:05 -0800, Christopher Barker kirjoitti: >>> it also does odd things with spaces >>>> embedded in the separator: >>>> >>>> ", $ #" matches all of: ?",$#" ? ", $#" ?",$ #" >> >>> That's a documented feature: >> >> Fair enough. >> >> OK, I've written a patch that allows newlines to be interpreted as >> separators in addition to whatever is specified in sep. >> >> In the process of testing, I found again these issues, which are still >> marked as "needs decision". >> >> http://projects.scipy.org/numpy/ticket/883 >> >> In short: what to do with missing values? >> >> I'd like to address this bug, but I need a decision to do so. >> >> >> My proposal: >> >> Raise an ValueError with missing values. >> >> >> Justification: >> >> No function should EVER return data that is not there. Period. It is >> simply asking for hard to find bugs. Therefore: >> >> fromstring("3, 4,,5", sep=",") >> >> Should never, ever, return: >> >> array([ 3., ?4., ?0., ?5.]) >> >> Which is what it does now. bad. bad. bad. >> >> >> >> >> Alternatives: >> >> ? A) Raising a ValueError is the easiest way to get "proper" behavior. >> Folks can use a more sophisticated file reader if they want missing >> values handled. I'm willing to contribute this patch. >> >> ? B) If the dtype is a floating point type, NaN could fill in the >> missing values -- a fine idea, but you can't use it for integers, and >> zero is a really bad replacement! >> >> ? 
C) The user could specify what they want filled in for missing >> values. This is a fine idea, though I'm not sure I want to take the time >> to impliment it. >> >> Oh, and this is a bug too, with probably the same solution: >> >> In [20]: np.fromstring("hjba", sep=',') >> Out[20]: array([ 0.]) >> >> In [26]: np.fromstring("34gytf39", sep=',') >> Out[26]: array([ 34.]) >> >> >> One more unresolved question: >> >> what should: >> >> np.fromstring("3, 4, 5,", sep=",") >> >> return? >> >> it currently returns: >> >> array([ 3., ?4., ?5.]) >> >> which seems a bit inconsitent with missing value handling. I also found >> a bug: >> >> In [6]: np.fromstring("3, 4, 5 , ", sep=",") >> Out[6]: array([ 3., ?4., ?5., ?0.]) >> >> so if there is some extra whitespace in there, it does return a missing >> value. With my proposal, that wouldn't happen, but you might get an >> exception. I think you should, but it'll be easier to implement my >> "allow newlines" code if not. >> >> >> so, should I do (A) ? >> >> >> Another question: >> >> I've got a patch mostly working (except for the above issues) that will >> allow fromfile/string to read multiline non-whitespace separated data in >> one shot: >> >> >> In [15]: str >> Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' >> >> In [16]: np.fromstring(str, sep=',', allow_newlines=True) >> Out[16]: >> array([ ?1., ? 2., ? 3., ? 4., ? 5., ? 6., ? 7., ? 8., ? 9., ?10., ?11., >> ? ? ? ? 12.]) >> >> >> I think this is a very helpful enhancement, and, as it is a new kwarg, >> backward compatible: >> >> 1) Might it be accepted for inclusion? >> >> 2) Is the name for the flag OK: "allow_newlines"? It's pretty explicit, >> but also long -- I used it for the flag name in the C code, too. >> >> 3) What C datatype should I use for a boolean flag? I used a char, but I >> don't know what the numpy standard is. >> >> >> -Chris >> >> > > I don't know much about this, just a few more test cases > > comma and newline > str = ?'1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12' > > extra comma at end of file > str = ?'1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12,' > > extra newlines at end of file > str = ?'1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' > > It would be nice if these cases would go through without missing > values or exception, but I don't often have files that are clean > enough for fromfile(). > > I'm in favor of nan for missing values with floating point numbers. It > would make it easy to read correctly formatted csv files, even if the > data is not complete. > Using the numpy NaN or similar (noting R's approach to missing values which in turn allows it to have the above functionality) is just a very bad idea for missing values because you always have to check that which NaN is a missing value and which was due to some numerical calculation. It is a very bad idea because we have masked arrays that nicely but slowly handle this situation. >From what I can see is that you expect that fromfile() should only split at the supplied delimiters, optionally(?) strip any whitespace and force a specific dtype. I would agree that the failure of any of one these should create an exception by default rather than making the best guess. So 'missing data' would potentially fail with forcing the specified dtype. Thus, you should either create an exception for invalid data (with appropriate location) or use masked arrays. Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' actually assumes multiple delimiters because there is no comma between 4 and 5 and 8 and 9. 
So I think it would be better if fromfile accepted multiple delimiters. In Josef's last case how many 'missing values should there be? Bruce From oliphant at enthought.com Thu Jan 7 16:11:12 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 7 Jan 2010 15:11:12 -0600 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> Message-ID: <1822EAEB-7243-4C83-8FB0-B55DE40503DF@enthought.com> On Jan 7, 2010, at 2:32 PM, josef.pktd at gmail.com wrote: > On Thu, Jan 7, 2010 at 3:08 PM, Christopher Barker > wrote: >> Pauli Virtanen wrote: >>> ma, 2010-01-04 kello 17:05 -0800, Christopher Barker kirjoitti: >>> it also does odd things with spaces >>>> embedded in the separator: >>>> >>>> ", $ #" matches all of: ",$#" ", $#" ",$ #" >> >>> That's a documented feature: >> >> Fair enough. >> >> OK, I've written a patch that allows newlines to be interpreted as >> separators in addition to whatever is specified in sep. >> >> In the process of testing, I found again these issues, which are >> still >> marked as "needs decision". >> >> http://projects.scipy.org/numpy/ticket/883 >> >> In short: what to do with missing values? >> >> I'd like to address this bug, but I need a decision to do so. >> >> >> My proposal: >> >> Raise an ValueError with missing values. >> >> >> Justification: >> >> No function should EVER return data that is not there. Period. It is >> simply asking for hard to find bugs. Therefore: >> >> fromstring("3, 4,,5", sep=",") >> >> Should never, ever, return: >> >> array([ 3., 4., 0., 5.]) >> >> Which is what it does now. bad. bad. bad. >> >> >> >> >> Alternatives: >> >> A) Raising a ValueError is the easiest way to get "proper" >> behavior. >> Folks can use a more sophisticated file reader if they want missing >> values handled. I'm willing to contribute this patch. >> >> B) If the dtype is a floating point type, NaN could fill in the >> missing values -- a fine idea, but you can't use it for integers, and >> zero is a really bad replacement! >> >> C) The user could specify what they want filled in for missing >> values. This is a fine idea, though I'm not sure I want to take the >> time >> to impliment it. >> >> Oh, and this is a bug too, with probably the same solution: >> >> In [20]: np.fromstring("hjba", sep=',') >> Out[20]: array([ 0.]) >> >> In [26]: np.fromstring("34gytf39", sep=',') >> Out[26]: array([ 34.]) >> >> >> One more unresolved question: >> >> what should: >> >> np.fromstring("3, 4, 5,", sep=",") >> >> return? >> >> it currently returns: >> >> array([ 3., 4., 5.]) >> >> which seems a bit inconsitent with missing value handling. I also >> found >> a bug: >> >> In [6]: np.fromstring("3, 4, 5 , ", sep=",") >> Out[6]: array([ 3., 4., 5., 0.]) >> >> so if there is some extra whitespace in there, it does return a >> missing >> value. With my proposal, that wouldn't happen, but you might get an >> exception. I think you should, but it'll be easier to implement my >> "allow newlines" code if not. >> >> >> so, should I do (A) ? 
>> >> >> Another question: >> >> I've got a patch mostly working (except for the above issues) that >> will >> allow fromfile/string to read multiline non-whitespace separated >> data in >> one shot: >> >> >> In [15]: str >> Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' >> >> In [16]: np.fromstring(str, sep=',', allow_newlines=True) >> Out[16]: >> array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., >> 11., >> 12.]) >> >> >> I think this is a very helpful enhancement, and, as it is a new >> kwarg, >> backward compatible: >> >> 1) Might it be accepted for inclusion? >> >> 2) Is the name for the flag OK: "allow_newlines"? It's pretty >> explicit, >> but also long -- I used it for the flag name in the C code, too. >> >> 3) What C datatype should I use for a boolean flag? I used a char, >> but I >> don't know what the numpy standard is. >> >> >> -Chris >> >> > > I don't know much about this, just a few more test cases > > comma and newline > str = '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12' > > extra comma at end of file > str = '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12,' > > extra newlines at end of file > str = '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' > > It would be nice if these cases would go through without missing > values or exception, but I don't often have files that are clean > enough for fromfile(). +1 (ignoring new-lines transparently is a nice feature). You can also use sscanf with weave to read most files. > > I'm in favor of nan for missing values with floating point numbers. It > would make it easy to read correctly formatted csv files, even if the > data is not complete. +1 (much preferrable to insert NaN or other user value than raise ValueError in my opinion) -Travis From Chris.Barker at noaa.gov Thu Jan 7 16:45:41 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 07 Jan 2010 13:45:41 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> Message-ID: <4B465605.3010406@noaa.gov> Bruce Southey wrote: >> wrote: > Using the numpy NaN or similar (noting R's approach to missing values > which in turn allows it to have the above functionality) is just a > very bad idea for missing values because you always have to check that > which NaN is a missing value and which was due to some numerical > calculation. well, this is specific to reading files, so you know where it came from. And the principle of fromfile() is that it is fast and simple, if you want masked arrays, use slower, but more full-featured methods. However, in this case: In [9]: np.fromstring("3, 4, NaN, 5", sep=",") Out[9]: array([ 3., 4., NaN, 5.]) An actual NaN is read from the file, rather than a missing value. Perhaps the user does want the distinction, so maybe it should really only fil it in if the users asks for it, but specifying "missing_value=np.nan" or something. >>From what I can see is that you expect that fromfile() should only > split at the supplied delimiters, optionally(?) strip any whitespace whitespace stripping is not optional. > Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' > actually assumes multiple delimiters because there is no comma between > 4 and 5 and 8 and 9. Yes, that's the point. 
I thought about allowing arbitrary multiple delimiters, but I think '\n' is a special case - for instance, a comma at the end of some numbers might mean missing data, but a '\n' would not. And I couldn't really think of a useful use-case for arbitrary multiple delimiters. > In Josef's last case how many 'missing values' should there be? >> extra newlines at end of file >> str = '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' none -- exactly why I think \n is a special case. What about: >> extra newlines in the middle of the file >> str = '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n' I think they should be ignored, but I hope I'm not making something that is too specific to my personal needs. Travis Oliphant wrote: > +1 (ignoring new-lines transparently is a nice feature). You can also > use sscanf with weave to read most files. right -- but that requires weave. In fact, MATLAB has a fscanf function that allows you to pass in a C format string and it vectorizes it to use the same one over and over again until it's done. It's actually quite powerful and flexible. I once started with that in mind, but didn't have the C chops to do it. I ended up with a tool that only did doubles (come to think of it, MATLAB only does doubles, anyway...) I may some day write a whole new C (or, more likely, Cython) function that does something like that, but for now, I'm just trying to get fromfile to be useful for me. > +1 (much preferable to insert NaN or other user value than raise > ValueError in my opinion) But raise an error for integer types? I guess this is still up in the air -- no consensus yet. Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From james.mazer at yale.edu Thu Jan 7 16:54:44 2010 From: james.mazer at yale.edu (James Mazer) Date: Thu, 07 Jan 2010 16:54:44 -0500 Subject: [Numpy-discussion] cPickle/unPickle across archs Message-ID: <4B465824.70009@yale.edu> Hi, I've got some Numeric arrays that were created without an explicit byte size in the initial declaration and pickled. Something like this: >>> cPickle.dump(array(ones((3,3,)), 'f'), open('foo.pic', 'w')) as opposed to: >>> cPickle.dump(array(ones((3,3,)), Float32), open('foo.pic', 'w')) This works as long as the word size doesn't change between the reading and writing machines. The data were generated under a 32bit linux kernel and now I'm trying to read them under a 64bit kernel, so the word size has changed and Numeric assumes that the 'f' type is the NATIVE float (and the 'l' type is the NATIVE long) and dies miserably when the native types don't match the actual types (which defeats the whole point of pickling, to some extent -- I thought that cPickle.dump/load were "ensured" to be invertible...) I've got terabytes of data that need to be read by both 32bit and 64bit machines (and it's not really feasible to scan all the files into new structures with explicit types on a 32bit machine). Anybody have hints for addressing this problem? I found similar questions, but no answers, so I'm not completely alone with this problem.
Thanks, /jamie -- James Mazer Department of Neurobiology Yale School of Medicine phone: 203-737-5853 fax: 203-785-5263 From robert.kern at gmail.com Thu Jan 7 17:30:24 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 7 Jan 2010 16:30:24 -0600 Subject: [Numpy-discussion] cPickle/unPickle across archs In-Reply-To: <4B465824.70009@yale.edu> References: <4B465824.70009@yale.edu> Message-ID: <3d375d731001071430m8f7dbdg56008e5f53606079@mail.gmail.com> On Thu, Jan 7, 2010 at 15:54, James Mazer wrote: > Hi, > > I've got a some Numeric arrays that were created without > an explicit byte size in the initial declaration and pickled. > Something like this: > > ? >>> cPickle.write(array(ones((3,3,)), 'f'), open('foo.pic', 'w')) > > as opposed to: > > ? >>> cPickle.write(array(ones((3,3,)), Float32), open('foo.pic', 'w')) > > This works as long as the word size doesn't change between the > reading and writing machines. > > The data were generated under a 32bit linux kernel and now I'm trying > to read them under a 64bit kernel, so the word size has changed and > Numeric assumes that the 'f' type is the NATIVE float Please note that 'f' is always a 32-bit float on any machine. Only integers may change size. > and 'l' type is > the NATIVE long) and dies miserable when the native types don't match > the actual types (which defeats the whole point of pickling, to some > extent -- I thought that cPickle.save/load were "ensured" to be > invertable...) I don't think cPickle ensures much at all. It's actually rather fragile for persisting data over long times and between different environments. It works better as a wire format for communication between similar codebases when thoroughly tested on both ends. Using a standard scientific file format for storing your important data has always been de rigeur. That said, it is a deficiency in Numeric that it records the native typecode instead of a platform-neutral, explicitly sized typecode. Unfortunately, Numeric has been deprecated for many years now, and is not maintained. Numeric's replacement, numpy, does not have this problem. > I've got terrabytes of data that need to be read by both 32bit and > 64bit machines (and it's not really feasible to scan all the files > into new structures with explict types on a 32bit machine). Anybody > have hints for addressing this problem? ?I found similar questions, > but no answers, so I'm not completely alone iwth this problem. What you can do is monkeypatch the function Numeric.array_constructor() to do "the right thing" for your case when it sees a platform-specific integer typecode. Something like the following (untested; you may need to generalize it to handle the unsigned integer typecodes, too, if you have that kind of data): import Numeric i_size = Numeric.empty(0, 'i').itemsize() def patched_array_constructor(shape, typecode, thestr, Endian=Numeric.LittleEndian): if typecode == "l": # Ensure that the length of the data matches our expectations. size = Numeric.product(shape) itemsize = len(thestr) // size if itemsize == i_size: typecode = 'i' if typecode == "O": x = Numeric.array(thestr,"O") else: x = Numeric.fromstring(thestr, typecode) x.shape = shape if LittleEndian != Endian: return x.byteswapped() else: return x Numeric.array_constructor = patched_array_constructor After you have done that, cPickle.load() will use that patched function to reconstruct the arrays and make sure that the appropriate typecode is used to interpret the data. 
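For example (an untested sketch, to go with the equally untested patch above; 'foo.pic' is just the example filename from your mail), loading one of the old pickles would then just be:

import cPickle
# apply the array_constructor patch above *before* unpickling
f = open('foo.pic', 'rb')
a = cPickle.load(f)    # old 32-bit 'l' data comes back with an 'i' typecode
f.close()
print a.typecode(), a.shape

(One nit: since the patched function lives in your own script rather than inside the Numeric module, you will want to spell the byte-order constant as Numeric.LittleEndian instead of the bare LittleEndian name.)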
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From weisen123 at gmail.com Thu Jan 7 17:58:30 2010 From: weisen123 at gmail.com (neil weisenfeld) Date: Thu, 7 Jan 2010 17:58:30 -0500 Subject: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B44BBE3.9090601@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> <4B44B8C0.5010203@noaa.gov> <4B44BBE3.9090601@noaa.gov> Message-ID: <9d5ec4221001071458u4255a0d6ne966be82272ea4fd@mail.gmail.com> On Wed, Jan 6, 2010 at 11:35 AM, Christopher Barker wrote: > > It's worse to have a binary you expect to work fail for you than to not > have one available. IN the past, I think folks' have used the default > name provided by bdist_mpkg, and those are not always clear. Something like: > > > numpy1.4-osx10.4-python.org2.6-32bit.dmg > > or something -- even better, with a a bit more text -- would help a lot. > I agree here. Better labeling of the .dmg would indeed help, I think. And thanks to everyone for all of the responses. I joined the mailing list, posted my question, and then went back to dissertation writing for a few days. When I looked up, there were 18 answers. I'll try getting python from python.org and/or building it all from scratch. Thanks again, Neil From josef.pktd at gmail.com Thu Jan 7 18:15:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 Jan 2010 18:15:46 -0500 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B465605.3010406@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> Message-ID: <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> On Thu, Jan 7, 2010 at 4:45 PM, Christopher Barker wrote: > Bruce Southey wrote: >>> wrote: > >> Using the numpy NaN or similar (noting R's approach to missing values >> which in turn allows it to have the above functionality) is just a >> very bad idea for missing values because you always have to check that >> which NaN is a missing value and which was due to some numerical >> calculation. > > well, this is specific to reading files, so you know where it came from. > And the principle of fromfile() is that it is fast and simple, if you > want masked arrays, use slower, but more full-featured methods. > > However, in this case: > > In [9]: np.fromstring("3, 4, NaN, 5", sep=",") > Out[9]: array([ ?3., ? 4., ?NaN, ? 5.]) > > > An actual NaN is read from the file, rather than a missing value. > Perhaps the user does want the distinction, so maybe it should really > only fil it in if the users asks for it, but specifying > "missing_value=np.nan" or something. > >>>From what I can see is that you expect that fromfile() should only >> split at the supplied delimiters, optionally(?) strip any whitespace > > whitespace stripping is not optional. > >> Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' >> actually assumes multiple delimiters because there is no comma between >> 4 and 5 and 8 and 9. > > Yes, that's the point. 
I thought about allowing arbitrary multiple > delimiters, but I think '/n' is a special case - for instance, a comma > at the end of some numbers might mean missing data, but a '\n' would not. > > And I couldn't really think of a useful use-case for arbitrary multiple > delimiters. > >> In Josef's last case how many 'missing values should there be? > > ?>> extra newlines at end of file > ?>> str = ?'1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' > > none -- exactly why I think \n is a special case. > > What about: > ?>> extra newlines in the middle of the file > ?>> str = ?'1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n' > > I think they should be ignored, but I hope I'm not making something that > is too specific to my personal needs. > > Travis Oliphant wrote: >> +1 (ignoring new-lines transparently is a nice feature). ?You can also >> use sscanf with weave to read most files. > > right -- but that requires weave. In fact, MATLAB has a fscanf function > that allows you to pass in a C format string and it vectorizes it to use > the same one over an over again until it's done. It's actually quite > powerful and flexible. I once started with that in mind, but didn't have > the C chops to do it. I ended up with a tool that only did doubles (come > to think of it, MATLAB only does doubles, anyway...) > > I may some day write a whole new C (or, more likely, Cython) function > that does something like that, but for now, I'm jsut trying to get > fromfile to be useful for me. > > >> +1 ? (much preferrable to insert NaN or other user value than raise >> ValueError in my opinion) > > But raise an error for integer types? > > I guess this is still up the air -- no consensus yet. raise an exception, I hate the silent cast of nan to integer zero, too much debugging and useless if there are real zeros. (or use some -999 kind of thing if user defined nan codes are allowed, but I just work with float if I expect nans/missing values.) Josef > > Thanks, > > -Chris > > > > > > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Thu Jan 7 18:29:16 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 07 Jan 2010 15:29:16 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> Message-ID: <4B466E4C.10504@noaa.gov> josef.pktd at gmail.com wrote: >>> +1 (much preferrable to insert NaN or other user value than raise >>> ValueError in my opinion) >> But raise an error for integer types? >> >> I guess this is still up the air -- no consensus yet. > > raise an exception, I hate the silent cast of nan to integer zero, me too -- I'm sorry, I wasn't clear -- I'm not going to write any code that returns a zero for a missing value. 
These are the options I'd consider: 1) Have the user specify what to use for missing values, otherwise, raise an exception 2) Insert a NaN for floating points types, and raise an exception for integer types. what's not clear is whether (2) is a good idea. As for (1), I just don't know if I'm going to get around to writing the code, and I maybe more kwargs is a bad idea -- though maybe not. Enough talk: I've got ugly C code to wade through... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From x.yang at physics.usyd.edu.au Thu Jan 7 18:58:03 2010 From: x.yang at physics.usyd.edu.au (Xue (Sue) Yang) Date: Fri, 8 Jan 2010 10:58:03 +1100 Subject: [Numpy-discussion] Numpy & MKL Message-ID: <003301ca8ff5$40534940$c0f9dbc0$@yang@physics.usyd.edu.au> I understand that intel mkl uses openMP parallel model. Therefore I set environment variable >>os.environ['OMP_NUM_THREADS'] = '4' With same test example, however, still one cpu is used. Do I need any specifications when I run numpy with intel MKL (MKL9.1)? numpy developers would be able to answer this question? I changed the name of numpy-discussion thread to "Numpy & MKL" attempting to draw attentions from wide range of readers. Thanks! Sue On Thu, Jan 7, 2010 at 11:20 AM, Xue (Sue) Yang wrote: > This time, only one cpu was used. Does it mean that our installed intel mkl > 9.1 is not threaded? You would have to consult the MKL documentation - I believe you can control how many threads are used from an environment variable. Also, the exact build commands depend on the version of the MKL, as its libraries often change between versions. David > Thank you for the reply which is useful. > > I also tried to Install numpy with intel mkl 9.1 > I still used gfortran for numpy installation as intel mkl 9.1 supports > gnu compiler. > > I only uncomment these lines for site.cfg in site.cfg.example > > [mkl] > library_dirs = /usr/physics/intel/mkl/lib/32 > include_dirs = /usr/physics/intel/mkl/include > lapack_libs = mkl_lapack > > then I tested the numpy with > > > python > >>import numpy > >>a = numpy.random.randn(6000, 6000) > >>numpy.dot(a, a) > > This time, only one cpu was used. Does it mean that our installed > intel mkl 9.1 is not threaded? > I don't think so. We have used it for openMP parallelization for quite > a while. > > Thanks! > > Sue From dwf at cs.toronto.edu Thu Jan 7 19:48:57 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 7 Jan 2010 19:48:57 -0500 Subject: [Numpy-discussion] Numpy & MKL In-Reply-To: <003301ca8ff5$40534940$c0f9dbc0$@yang@physics.usyd.edu.au> References: <003301ca8ff5$40534940$c0f9dbc0$@yang@physics.usyd.edu.au> Message-ID: On 7-Jan-10, at 6:58 PM, Xue (Sue) Yang wrote: > Do I need any specifications when I run numpy with intel MKL (MKL9.1)? > numpy developers would be able to answer this question? Are you sure you've compiled against MKL properly? What is printed by numpy.show_config()? 
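Also, just to rule out the obvious first (a rough sketch of a sanity check, not MKL-specific advice): set the variable in the shell *before* starting Python, e.g. "export OMP_NUM_THREADS=4", so the OpenMP runtime sees it at startup, and then check from inside the interpreter:

import os
import numpy
print os.environ.get('OMP_NUM_THREADS')   # should print '4'
numpy.show_config()                       # make sure the mkl libraries show up
a = numpy.random.randn(6000, 6000)
b = numpy.dot(a, a)                       # watch the CPUs while this runs

If I remember right, MKL also looks at MKL_NUM_THREADS, which you could set the same way.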
David From x.yang at physics.usyd.edu.au Thu Jan 7 20:13:22 2010 From: x.yang at physics.usyd.edu.au (Xue (Sue) Yang) Date: Fri, 8 Jan 2010 12:13:22 +1100 Subject: [Numpy-discussion] Numpy & MKL Message-ID: <003a01ca8fff$c62582e0$527088a0$@yang@physics.usyd.edu.au> This is what I had (when I built numpy, I chose gnu compilers instead of intel compilers), >>> numpy.show_config() lapack_opt_info: libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread'] library_dirs = ['/usr/physics/intel/mkl/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/usr/physics/intel/mkl/include'] blas_opt_info: libraries = ['mkl', 'vml', 'guide', 'pthread'] library_dirs = ['/usr/physics/intel/mkl/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/usr/physics/intel/mkl/include'] lapack_mkl_info: libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread'] library_dirs = ['/usr/physics/intel/mkl/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/usr/physics/intel/mkl/include'] blas_mkl_info: libraries = ['mkl', 'vml', 'guide', 'pthread'] library_dirs = ['/usr/physics/intel/mkl/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/usr/physics/intel/mkl/include'] mkl_info: libraries = ['mkl', 'vml', 'guide', 'pthread'] library_dirs = ['/usr/physics/intel/mkl/lib/32'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/usr/physics/intel/mkl/include'] Thanks! Sue >> Do I need any specifications when I run numpy with intel MKL (MKL9.1)? >> numpy developers would be able to answer this question? >Are you sure you've compiled against MKL properly? What is printed by >numpy.show_config()? >David From Chris.Barker at noaa.gov Thu Jan 7 20:21:34 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 07 Jan 2010 17:21:34 -0800 Subject: [Numpy-discussion] fromfile() -- help! In-Reply-To: <4B466E4C.10504@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> Message-ID: <4B46889E.3020008@noaa.gov> OK, I'm trying to dig into the code and figure out how to get it to stop putting in zeros for missing data with fromfile()/fromstring() text reading. It looks like the culprit is this, in arraytypes.c.src: @fname at _scan(FILE *fp, @type@ *ip, void *NPY_UNUSED(ignore), PyArray_Descr *NPY_UNUSED(ignored)) { double result; int ret; ret = NumPyOS_ascii_ftolf(fp, &result); *ip = (@type@) result; return ret; } If I'm reading this right, this gets called for the datatype of interest, and it is passed in a pointer to the file that is being read. if I have NumPyOS_ascii_ftolf right, it should return 0 if it doesn't succesfully read a number. However, this looks like it sets the data in *ip, even if the return value is zero. It does pass on that return value, but, from ctors.c: fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype, void *NPY_UNUSED(stream_data)) { /* the NULL argument is for backwards-compatibility */ return dtype->f->scanfunc(*fp, dptr, NULL, dtype); } just moves it on through. This is called from here: if (next(&stream, dptr, dtype, stream_data) < 0) { break; } which is checking for < 0 , so if a zero is returned, it will just go in its merry way... So, have I got that right? Should this get fixed at that last point? 
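For reference, the symptom I'm chasing is the one from my earlier mail -- a missing field silently turns into a zero:

In [1]: import numpy as np

In [2]: np.fromstring("3, 4,,5", sep=",")
Out[2]: array([ 3.,  4.,  0.,  5.])

That 0. was never in the data, and as far as I can tell it's this scanfunc/next chain that lets it through.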
One more point, this is a bit different for fromfile and fromstring, so I'm getting really confused! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From david at silveregg.co.jp Thu Jan 7 21:07:03 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 08 Jan 2010 11:07:03 +0900 Subject: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B463318.5060902@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <5b8d13221001051709g58103d1ewf4915d8be38a3277@mail.gmail.com> <4B44B8C0.5010203@noaa.gov> <4B44BBE3.9090601@noaa.gov> <5b8d13221001061712m3498390cse4b2ae6b37fc19e2@mail.gmail.com> <4B463318.5060902@noaa.gov> Message-ID: <4B469347.6000000@silveregg.co.jp> Christopher Barker wrote: > David Cournapeau wrote: >> On Thu, Jan 7, 2010 at 1:35 AM, Christopher Barker >>> In the past, I think folks' have used the default >>> name provided by bdist_mpkg, and those are not always clear. Something like: >>> >>> >>> numpy1.4-osx10.4-python.org2.6-32bit.dmg >> The 32 bits is redundant - we support all archs supported by the >> official python binary, so python.org is enough. > > True, though I was anticipating that there may be 32 and 64 bit builds > some day. I suspect it will be exactly as today, i.e. a universal build with 64 bits. I have not followed closely the discussion on python-dev on that topic, but I believe python 2.7 sill contain 64 bits as an arch. > What OS/architecture were those built with? Snow Leopard. > When I first installed the binary, I got a whole bunch of errors because > "matrix' wasn't found. I recalled this issue from testing, and cleared > out the install, then re-installed, and all was fine. I wonder if it's > possible to have a mpkg remove anything? pkg does not have a uninstaller - I don't think Apple provides one, that's a known limitation of Mac OS X installers (although I believe there are 3rd party ones) > > > I think both of those are known issues, and not a big deal. Maybe the spacing function is wrong on PPC. The underlying is highly architecture dependent. 
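If someone with PPC hardware wants to poke at it, a minimal check would be something like this (just a guess at the relevant comparison, not a real test case):

>>> import numpy as np
>>> np.spacing(1.0)            # distance from 1.0 to the next representable double
>>> np.finfo(np.float64).eps   # should match the line above on a sane build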
David From dwf at cs.toronto.edu Thu Jan 7 21:13:45 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 7 Jan 2010 21:13:45 -0500 Subject: [Numpy-discussion] Numpy & MKL In-Reply-To: <003a01ca8fff$c62582e0$527088a0$@yang@physics.usyd.edu.au> References: <003a01ca8fff$c62582e0$527088a0$@yang@physics.usyd.edu.au> Message-ID: <717E3E58-3B41-46B3-B432-21B937C8E9CA@cs.toronto.edu> On 7-Jan-10, at 8:13 PM, Xue (Sue) Yang wrote: > This is what I had (when I built numpy, I chose gnu compilers > instead of > intel compilers), > >>>> numpy.show_config() > lapack_opt_info: > libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread'] > library_dirs = ['/usr/physics/intel/mkl/lib/32'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/usr/physics/intel/mkl/include'] > > blas_opt_info: > libraries = ['mkl', 'vml', 'guide', 'pthread'] > library_dirs = ['/usr/physics/intel/mkl/lib/32'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/usr/physics/intel/mkl/include'] > > lapack_mkl_info: > libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread'] > library_dirs = ['/usr/physics/intel/mkl/lib/32'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/usr/physics/intel/mkl/include'] > > blas_mkl_info: > libraries = ['mkl', 'vml', 'guide', 'pthread'] > library_dirs = ['/usr/physics/intel/mkl/lib/32'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/usr/physics/intel/mkl/include'] > > mkl_info: > libraries = ['mkl', 'vml', 'guide', 'pthread'] > library_dirs = ['/usr/physics/intel/mkl/lib/32'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/usr/physics/intel/mkl/include'] That looks right to me... And you're sure you've set the environment variable before Python is run and NumPy is loaded? Try running: import os; print os.environ['OMP_NUM_THREADS'] and verify it's the right number. David From dwf at cs.toronto.edu Thu Jan 7 21:24:40 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 7 Jan 2010 21:24:40 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43D6E6.80305@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> Message-ID: <876E9DCB-4641-42F4-AD8C-385192F84582@cs.toronto.edu> On 5-Jan-10, at 7:18 PM, Christopher Barker wrote: > If distutils/setuptools could identify the python version properly, > then > binary eggs and easy-install could be a solution -- but that's a > mess, > too. Long live toydist! :) David From dwf at cs.toronto.edu Thu Jan 7 21:29:22 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 7 Jan 2010 21:29:22 -0500 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <4B43D330.7090608@noaa.gov> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> Message-ID: On 5-Jan-10, at 7:02 PM, Christopher Barker wrote: >> Pretty sure the python.org binaries are 32-bit only. I still think >> it's sensible to prefer the > > waiting the rest of this sentence.. 
;-) I had meant to say 'sensible to prefer the Python.org version' though in reality I'm a little miffed that Python.org isn't providing Ron's 4- way binaries, since he went to the trouble of adding support for building them. Grumble grumble. >> I'm not really a fan of packages polluting /usr/local, I'd rather the >> tree appear /opt/packagename > > well, /opt has kind of been co-opted by macports. I'd forgotten about that. >> or /usr/local/packagename instead, for >> ease of removal > > wxPython gets put entirely into: > > /usr/local/lib/wxPython-unicode-2.10.8 > > which isn't bad. Ah, yeah, that isn't bad either. >> but the general approach of "stash somewhere and put >> a .pth in both site-packages" seems fine to me. > > OK -- what about simply punting and doing two builds: one 32 bit, and > one 64 bit. I wonder if we need 64bit PPC at all? I know I'm running > 64 > bit hardware, but never ran a 64 bit OS on it -- I wonder if anyone > is? I've built for ppc64 before, and in fact discovered a long-standing bug in the way ppc64 was detected. The fact that nobody found it before me is probably evidence that it is nearly never used. It could be useful in a minority of situations but I don't think it's going to be worth it for most people. David From robert.kern at gmail.com Thu Jan 7 21:34:09 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 7 Jan 2010 20:34:09 -0600 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> Message-ID: <3d375d731001071834p69de1764m5f3b8b0cf9eeee53@mail.gmail.com> On 2010-01-07, David Warde-Farley wrote: > On 5-Jan-10, at 7:02 PM, Christopher Barker wrote: >>> I'm not really a fan of packages polluting /usr/local, I'd rather the >>> tree appear /opt/packagename >> >> well, /opt has kind of been co-opted by macports. > > I'd forgotten about that. It's not really true, though. MacPorts took /opt/local/, but /opt// probably hasn't been. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Thu Jan 7 21:47:47 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 8 Jan 2010 11:47:47 +0900 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: <876E9DCB-4641-42F4-AD8C-385192F84582@cs.toronto.edu> References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> <4B43D6E6.80305@noaa.gov> <876E9DCB-4641-42F4-AD8C-385192F84582@cs.toronto.edu> Message-ID: <5b8d13221001071847u48cad23bx2c64a2cbf4df4584@mail.gmail.com> On Fri, Jan 8, 2010 at 11:24 AM, David Warde-Farley wrote: > On 5-Jan-10, at 7:18 PM, Christopher Barker wrote: > >> If distutils/setuptools could identify the python version properly, >> then >> ?binary eggs and easy-install could be a solution -- but that's a >> mess, >> too. > > > Long live toydist! :) Toydist will not solve anything here. Versioning info is useless here if it does not translate to compatible ABI. 
What is required is to be able to identify a precise python ABI: python makes that hard, mac os x harder, and universal builds ever harder. Things like PEP 384 may help in the future - As it is written by someone who actually knows about this stuff, it will hopefully be useful. David From cournape at gmail.com Thu Jan 7 22:12:20 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 8 Jan 2010 12:12:20 +0900 Subject: [Numpy-discussion] FIY: a (new ?) practical profiling tool on linux Message-ID: <5b8d13221001071912n54ff968dte7244ea56dc3a930@mail.gmail.com> Hi, I don't know if many people are aware of it, but I have recently discovered perf, a tool available from the kernel sources. It is extremely simple to use, and very useful when looking at numpy/scipy perf issues in compiled code. For example, I can get this kind of results for looking at the numpy neighborhood iterator performance in one simple command, without special compilation flags: 44.69% python /home/david/local/stow/scipy.git/lib/python2.6/site-packages/scipy/signal/sigtools.so [.] _imp_correlate_nd_double 39.47% python /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so [.] get_ptr_constant 9.98% python /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so [.] get_ptr_simple 0.65% python /usr/bin/python2.6 [.] 0x0000000012b8a0 0.40% python /usr/bin/python2.6 [.] 0x000000000a6662 0.37% python /usr/bin/python2.6 [.] 0x0000000004c10d 0.32% python /usr/bin/python2.6 [.] PyEval_EvalFrameEx 0.15% python [kernel] [k] __d_lookup 0.14% python /lib/libc-2.10.1.so [.] _int_malloc 0.12% python /usr/bin/python2.6 [.] 0x0000000004f90e 0.10% python [kernel] [k] __link_path_walk 0.09% python /usr/bin/python2.6 [.] PyObject_Malloc 0.09% python /lib/ld-2.10.1.so [.] do_lookup_x 0.09% python /lib/libc-2.10.1.so [.] __GI_memcpy 0.08% python [kernel] [k] __ticket_spin_lock 0.07% python /usr/bin/python2.6 [.] 
PyParser_AddToken And even cooler, annotated sources: ------------------------------------------------ Percent | Source code & Disassembly of multiarray.so ------------------------------------------------ : : : : Disassembly of section .text: : : 000000000001d8a0 : : _coordinates[c] = bd; : : /* set the dataptr from its current coordinates */ : static char* : get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates) : { 15.69 : 1d8a0: 48 81 ec 08 01 00 00 sub $0x108,%rsp : int i; : npy_intp bd, _coordinates[NPY_MAXDIMS]; : PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter; : PyArrayIterObject *p = niter->_internal_iter; : : for(i = 0; i < niter->nd; ++i) { 0.02 : 1d8a7: 48 83 bf 48 0a 00 00 cmpq $0x0,0xa48(%rdi) 0.00 : 1d8ae: 00 : get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates) : { : int i; : npy_intp bd, _coordinates[NPY_MAXDIMS]; : PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter; : PyArrayIterObject *p = niter->_internal_iter; 0.01 : 1d8af: 48 8b 87 50 0b 00 00 mov 0xb50(%rdi),%rax : : for(i = 0; i < niter->nd; ++i) { 7.92 : 1d8b6: 7e 64 jle 1d91c : _INF_SET_PTR(i) 0.01 : 1d8b8: 48 8b 0e mov (%rsi),%rcx 0.00 : 1d8bb: 48 03 48 28 add 0x28(%rax),%rcx 0.03 : 1d8bf: 48 3b 88 40 07 00 00 cmp 0x740(%rax),%rcx 7.97 : 1d8c6: 7c 68 jl 1d930 0.02 : 1d8c8: 45 31 c9 xor %r9d,%r9d 0.00 : 1d8cb: 31 d2 xor %edx,%edx 0.00 : 1d8cd: 48 3b 88 48 07 00 00 cmp 0x748(%rax),%rcx 7.75 : 1d8d4: 7e 32 jle 1d908 0.00 : 1d8d6: eb 58 jmp 1d930 0.00 : 1d8d8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 0.00 : 1d8df: 00 7.68 : 1d8e0: 4c 8d 42 74 lea 0x74(%rdx),%r8 0.00 : 1d8e4: 48 8b 0c d6 mov (%rsi,%rdx,8),%rcx 0.00 : 1d8e8: 48 03 4c d0 28 add 0x28(%rax,%rdx,8),%rcx 0.00 : 1d8ed: 49 c1 e0 04 shl $0x4,%r8 7.89 : 1d8f1: 49 3b 0c 00 cmp (%r8,%rax,1),%rcx 0.00 : 1d8f5: 7c 39 jl 1d930 0.01 : 1d8f7: 49 89 d0 mov %rdx,%r8 0.11 : 1d8fa: 49 c1 e0 04 shl $0x4,%r8 7.18 : 1d8fe: 4a 3b 8c 00 48 07 00 cmp 0x748(%rax,%r8,1),%rcx 0.00 : 1d905: 00 0.09 : 1d906: 7f 28 jg 1d930 : int i; : npy_intp bd, _coordinates[NPY_MAXDIMS]; : PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter; : PyArrayIterObject *p = niter->_internal_iter; : It works for C and Fortran, BTW, cheers, David From bsouthey at gmail.com Thu Jan 7 23:10:39 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 7 Jan 2010 22:10:39 -0600 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B465605.3010406@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> Message-ID: On Thu, Jan 7, 2010 at 3:45 PM, Christopher Barker wrote: > Bruce Southey wrote: >>> wrote: > >> Using the numpy NaN or similar (noting R's approach to missing values >> which in turn allows it to have the above functionality) is just a >> very bad idea for missing values because you always have to check that >> which NaN is a missing value and which was due to some numerical >> calculation. > > well, this is specific to reading files, so you know where it came from. You can only know where it came from when you compare the original array to the transformed one. Also a user has to check for missing values or numpy has to warn a user that missing values are present immediately after reading the data so the appropriate action can be taken (like using functions that handle missing values appropriately). 
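For example, every user ends up writing boilerplate along these lines (a sketch that assumes the proposed behaviour where a blank field comes back as NaN):

import numpy as np
# suppose fromstring() filled the blank field with NaN, as proposed
x = np.fromstring("3, 4,,5", sep=",")
n_missing = np.isnan(x).sum()
if n_missing:
    print '%d NaNs -- but are they missing values or computed NaNs?' % n_missing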
That is my second problem with using codes (NaN, -99999 etc) for missing values. > And the principle of fromfile() is that it is fast and simple, if you > want masked arrays, use slower, but more full-featured methods. So in that case it should fail with missing data. > > However, in this case: > > In [9]: np.fromstring("3, 4, NaN, 5", sep=",") > Out[9]: array([ ?3., ? 4., ?NaN, ? 5.]) > > > An actual NaN is read from the file, rather than a missing value. > Perhaps the user does want the distinction, so maybe it should really > only fil it in if the users asks for it, but specifying > "missing_value=np.nan" or something. Yes, that is my first problem of using predefined codes for missing values as you do not always know what is going to occur in the data. > >>>From what I can see is that you expect that fromfile() should only >> split at the supplied delimiters, optionally(?) strip any whitespace > > whitespace stripping is not optional. > >> Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' >> actually assumes multiple delimiters because there is no comma between >> 4 and 5 and 8 and 9. > > Yes, that's the point. I thought about allowing arbitrary multiple > delimiters, but I think '/n' is a special case - for instance, a comma > at the end of some numbers might mean missing data, but a '\n' would not. > > And I couldn't really think of a useful use-case for arbitrary multiple > delimiters. > >> In Josef's last case how many 'missing values should there be? > > ?>> extra newlines at end of file > ?>> str = ?'1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' > > none -- exactly why I think \n is a special case. What about '\r' and '\n\r'? > > What about: > ?>> extra newlines in the middle of the file > ?>> str = ?'1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n' > > I think they should be ignored, but I hope I'm not making something that > is too specific to my personal needs. Not really, it is more that I am being somewhat difficult to ensure I understand what you actually need. My problem with this is that you are reading one huge 1-D array (that you can resize later) rather than a 2-D array with rows and columns (which is what I deal with). But I agree that you can have an option to say treat '\n' or '\r' as a delimiter but I think it should be turned off by default. > > Travis Oliphant wrote: >> +1 (ignoring new-lines transparently is a nice feature). ?You can also >> use sscanf with weave to read most files. > > right -- but that requires weave. In fact, MATLAB has a fscanf function > that allows you to pass in a C format string and it vectorizes it to use > the same one over an over again until it's done. It's actually quite > powerful and flexible. I once started with that in mind, but didn't have > the C chops to do it. I ended up with a tool that only did doubles (come > to think of it, MATLAB only does doubles, anyway...) > > I may some day write a whole new C (or, more likely, Cython) function > that does something like that, but for now, I'm jsut trying to get > fromfile to be useful for me. > > >> +1 ? (much preferrable to insert NaN or other user value than raise >> ValueError in my opinion) > > But raise an error for integer types? > > I guess this is still up the air -- no consensus yet. > > Thanks, > > -Chris > You should have a corresponding value for ints because raising an exceptionwould be inconsistent with allowing floats to have a value. 
If you must keep the user defined dtype then, as Josef suggests, just use some code be it -999 or most negative number supported by the OS for the defined dtype or, just convert the ints into floats if the user does not define a missing value code. It would be nice to either return the number of missing values or display a warning indicating how many occurred. Bruce From josef.pktd at gmail.com Fri Jan 8 00:26:41 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 8 Jan 2010 00:26:41 -0500 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> Message-ID: <1cd32cbb1001072126x1e68f10dv940078d0cf81f35@mail.gmail.com> On Thu, Jan 7, 2010 at 11:10 PM, Bruce Southey wrote: > On Thu, Jan 7, 2010 at 3:45 PM, Christopher Barker > wrote: >> Bruce Southey wrote: >>>> wrote: >> >>> Using the numpy NaN or similar (noting R's approach to missing values >>> which in turn allows it to have the above functionality) is just a >>> very bad idea for missing values because you always have to check that >>> which NaN is a missing value and which was due to some numerical >>> calculation. >> >> well, this is specific to reading files, so you know where it came from. > > You can only know where it came from when you compare the original > array to the transformed one. Also a user has to check for missing > values or numpy has to warn a user that missing values are present > immediately after reading the data so the appropriate action can be > taken (like using functions that handle missing values appropriately). > That is my second problem with using codes (NaN, -99999 etc) ?for > missing values. > > > >> And the principle of fromfile() is that it is fast and simple, if you >> want masked arrays, use slower, but more full-featured methods. > > So in that case it should fail with missing data. > >> >> However, in this case: >> >> In [9]: np.fromstring("3, 4, NaN, 5", sep=",") >> Out[9]: array([ ?3., ? 4., ?NaN, ? 5.]) >> >> >> An actual NaN is read from the file, rather than a missing value. >> Perhaps the user does want the distinction, so maybe it should really >> only fil it in if the users asks for it, but specifying >> "missing_value=np.nan" or something. > > Yes, that is my first problem of using predefined codes for missing > values as you do not always know what is going to occur in the data. > > >> >>>>From what I can see is that you expect that fromfile() should only >>> split at the supplied delimiters, optionally(?) strip any whitespace >> >> whitespace stripping is not optional. >> >>> Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12' >>> actually assumes multiple delimiters because there is no comma between >>> 4 and 5 and 8 and 9. >> >> Yes, that's the point. I thought about allowing arbitrary multiple >> delimiters, but I think '/n' is a special case - for instance, a comma >> at the end of some numbers might mean missing data, but a '\n' would not. >> >> And I couldn't really think of a useful use-case for arbitrary multiple >> delimiters. >> >>> In Josef's last case how many 'missing values should there be? >> >> ?>> extra newlines at end of file >> ?>> str = ?'1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n' >> >> none -- exactly why I think \n is a special case. > > What about '\r' and '\n\r'? 
Yes, I forgot about this, and it will be the most common case for Windows users like myself. I think \r should be stripped automatically, like in non-binary reading of files in python. > >> >> What about: >> ?>> extra newlines in the middle of the file >> ?>> str = ?'1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n' >> >> I think they should be ignored, but I hope I'm not making something that >> is too specific to my personal needs. > > Not really, it is more that I am being somewhat difficult to ensure I > understand what you actually need. > > My problem with this is that you are reading one huge 1-D array ?(that > you can resize later) rather than a 2-D array with rows and columns > (which is what I deal with). But I agree that you can have an option > to say treat '\n' or '\r' as a delimiter but I think it should be > turned off by default. > > >> >> Travis Oliphant wrote: >>> +1 (ignoring new-lines transparently is a nice feature). ?You can also >>> use sscanf with weave to read most files. >> >> right -- but that requires weave. In fact, MATLAB has a fscanf function >> that allows you to pass in a C format string and it vectorizes it to use >> the same one over an over again until it's done. It's actually quite >> powerful and flexible. I once started with that in mind, but didn't have >> the C chops to do it. I ended up with a tool that only did doubles (come >> to think of it, MATLAB only does doubles, anyway...) >> >> I may some day write a whole new C (or, more likely, Cython) function >> that does something like that, but for now, I'm jsut trying to get >> fromfile to be useful for me. >> >> >>> +1 ? (much preferrable to insert NaN or other user value than raise >>> ValueError in my opinion) >> >> But raise an error for integer types? >> >> I guess this is still up the air -- no consensus yet. >> >> Thanks, >> >> -Chris >> > > You should have a corresponding value for ints because raising an > exceptionwould be inconsistent with allowing floats to have a value. No, I think different nan/missing value handling between integers and float is a natural distinction. There is no default nan code for integers, but nan (and inf) are valid floating point numbers (even if nan is not a number). And the default treatment of nans in numpy is getting pretty good (e.g. I like the new (nan)sort). > If you must keep the user defined dtype then, as Josef suggests, just > use some code be it -999 or most negative number supported by the OS > for the defined dtype or, just convert the ints into floats if the > user does not define a missing value code. ?It would be nice to either > return the number of missing values or display a warning indicating > how many occurred. A warning would be good, but doing np.any(np.isnan(x)) or np.isnan(x).sum() on the result is always a good idea for a user when missing values are possibility. Josef > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Fri Jan 8 04:22:53 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 08 Jan 2010 11:22:53 +0200 Subject: [Numpy-discussion] fromfile() -- help! 
In-Reply-To: <4B46889E.3020008@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> Message-ID: <1262942573.2580.149.camel@talisman> to, 2010-01-07 kello 17:21 -0800, Christopher Barker kirjoitti: [clip] > if I have NumPyOS_ascii_ftolf right, it should return 0 if it doesn't > succesfully read a number. However, this looks like it sets the data in > *ip, even if the return value is zero. It may also return EOF (== -1) when encountering end-of-stream. Of course, I don't think any code should not rely on EOF being -1, and I doubt that relying on it is intended here. > It does pass on that return value, but, from ctors.c: > > fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype, > void *NPY_UNUSED(stream_data)) > { > /* the NULL argument is for backwards-compatibility */ > return dtype->f->scanfunc(*fp, dptr, NULL, dtype); > } > > just moves it on through. This is called from here: > > if (next(&stream, dptr, dtype, stream_data) < 0) { > break; > } > > which is checking for < 0 , so if a zero is returned, it will just go in > its merry way... Yeah, this is of course wrong; for example a file containing "1,2," results to np.fromfile("filename.txt", sep=",") == [1, 2, -1] where the last value is effectively undefined. Another point to note is that `next` may also be the fromstr_next_element function; when fixing things also its semantics should be corrected. Pauli From pav+sp at iki.fi Fri Jan 8 04:28:33 2010 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 8 Jan 2010 09:28:33 +0000 (UTC) Subject: [Numpy-discussion] fromfile() -- help! References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> Message-ID: Thu, 07 Jan 2010 17:21:34 -0800, Christopher Barker wrote: [clip] > It does pass on that return value, but, from ctors.c: > > fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype, > void *NPY_UNUSED(stream_data)) > { > /* the NULL argument is for backwards-compatibility */ return > dtype->f->scanfunc(*fp, dptr, NULL, dtype); > } This functions is IMHO where the fix should go; I believe it should do something like return (ret == 0 || ret == EOF) ? -1 : ret; -- Pauli Virtanen From robince at gmail.com Fri Jan 8 07:03:27 2010 From: robince at gmail.com (Robin) Date: Fri, 8 Jan 2010 12:03:27 +0000 Subject: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2 In-Reply-To: References: <9d5ec4221001050835w5dd4cf97hffd4f2480bf19c3e@mail.gmail.com> <4B43A4FF.3010604@hawaii.edu> <4B43B4CD.3030701@hawaii.edu> <523E341E-A076-465C-938C-EC902F5F0D16@gmail.com> <4B43C4D2.5030701@noaa.gov> <4B43D330.7090608@noaa.gov> Message-ID: <2d5132a51001080403r62828c3bsae9baf12c6ed5e92@mail.gmail.com> On Fri, Jan 8, 2010 at 2:29 AM, David Warde-Farley wrote: > On 5-Jan-10, at 7:02 PM, Christopher Barker wrote: > >>> Pretty sure the python.org binaries are 32-bit only. I still think >>> it's sensible to prefer the >> >> waiting the rest of this sentence.. 
;-) > > I had meant to say 'sensible to prefer the Python.org version' though > in reality I'm a little miffed that Python.org isn't providing Ron's 4- > way binaries, since he went to the trouble of adding support for > building them. Grumble grumble. My understanding was that 2.6/3.1 will never be buildable as an arch selectable universal binary interpreter (like the apple system python) due to this issue: http://bugs.python.org/issue6834 I think this is only being fixed in 2.7/3.2 so perhaps from then Python will distribute selectable universal builds. (Just mention it in case folks aren't aware of that issue). Cheers Robin From nouiz at nouiz.org Fri Jan 8 09:18:28 2010 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 8 Jan 2010 09:18:28 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <002401ca8f3f$fdae2340$f90a69c0$%yang@physics.usyd.edu.au> <1d4f24dc6f376254aa63963eb6cd5916.squirrel@webmail.uio.no> <4B461B9F.5060800@noaa.gov> Message-ID: <2d1d7fe71001080618od97adf1nbd936f5f22963912@mail.gmail.com> Hi, I while back, someone talked about aigen2(http://eigen.tuxfamily.org/). In their benchmark they give info that they are competitive again mkl and goto on matrix matrix product. They are not better, but that could make a good default implementation for numpy when their is no blas installed. I think the license would allow to include it in numpy directly. I don't have time to do it, and my numpy is linked with goto. So it would be useless for me. But if someone want to make the default version better again other tools, that could be a good approach. Fr?d?ric Bastien On Thu, Jan 7, 2010 at 12:47 PM, Sturla Molden wrote: > > Sturla Molden wrote: > >> I would suggest using GotoBLAS instead of ATLAS. > > > >> http://www.tacc.utexas.edu/tacc-projects/ > > > > That does look promising -- nay idea what the license is? They don't > > make it clear on the site > > > > UT TACC Research License (Source Code) > > > > The Texas Advanced Computing Center of The University of Texas at Austin > has developed certain software and documentation that it desires to make > available without charge to anyone for academic, research, experimental or > personal use. This license is designed to guarantee freedom to use the > software for these purposes. If you wish to distribute or make other use > of the software, you may purchase a license to do so from the University > of Texas. > > The accompanying source code is made available to you under the terms of > this UT TACC Research License (this "UTTRL"). By clicking the "ACCEPT" > button, or by installing or using the code, you are consenting to be bound > by this UTTRL. If you do not agree to the terms and conditions of this > license, do not click the "ACCEPT" button, and do not install or use any > part of the code. > > The terms and conditions in this UTTRL not only apply to the source code > made available by UT TACC, but also to any improvements to, or derivative > works of, that source code made by you and to any object code compiled > from such source code, improvements or derivative works. > > 1. DEFINITIONS. 
> > 1.1 "Commercial Use" shall mean use of Software or Documentation by > Licensee for direct or indirect financial, commercial or strategic gain or > advantage, including without limitation: (a) bundling or integrating the > Software with any hardware product or another software product for > transfer, sale or license to a third party (even if distributing the > Software on separate media and not charging for the Software); (b) > providing customers with a link to the Software or a copy of the Software > for use with hardware or another software product purchased by that > customer; or (c) use in connection with the performance of services for > which Licensee is compensated. > > 1.2 "Derivative Products" means any improvements to, or other derivative > works of, the Software made by Licensee. > > 1.3 "Documentation" shall mean all manuals, user documentation, and other > related materials pertaining to the Software that are made available to > Licensee in connection with the Software. > > 1.4 "Licensor" shall mean The University of Texas. > > 1.5 "Licensee" shall mean the person or entity that has agreed to the > terms hereof and is exercising rights granted hereunder. > > 1.6 "Software" shall mean the computer program(s) referred to as GotoBLAS2 > made available under this UTTRL in source code form, including any error > corrections, bug fixes, patches, updates or other modifications that > Licensor may in its sole discretion make available to Licensee from time > to time, and any object code compiled from such source code. > > 2. GRANT OF RIGHTS. > > Subject to the terms and conditions hereunder, Licensor hereby grants to > Licensee a worldwide, non-transferable, non-exclusive license to (a) > install, use and reproduce the Software for academic, research, > experimental and personal use (but specifically excluding Commercial Use); > (b) use and modify the Software to create Derivative Products, subject to > Section 3.2; and (c) use the Documentation, if any, solely in connection > with Licensee's authorized use of the Software. > > 3. RESTRICTIONS; COVENANTS. > > 3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer > copies or rights to the Software (or any portion thereof) or the > Documentation; (b) use the Software (or any portion thereof) or > Documentation for Commercial Use, or for any other use except as described > in Section 2; (c) copy the Software or Documentation other than for > archival and backup purposes; or (d) remove any product identification, > copyright, proprietary notices or labels from the Software and > Documentation. This UTTRL confers no rights upon Licensee except those > expressly granted herein. > > 3.2 Licensee hereby agrees that it will provide a copy of all Derivative > Products to Licensor and that its use of the Derivative Products will be > subject to all of the same terms, conditions, restrictions and limitations > on use imposed on the Software under this UTTRL. Licensee hereby grants > Licensor a worldwide, non-exclusive, royalty-free license to reproduce, > prepare derivative works of, publicly display, publicly perform, > sublicense and distribute Derivative Products. Licensee also hereby grants > Licensor a worldwide, non-exclusive, royalty-free patent license to make, > have made, use, offer to sell, sell, import and otherwise transfer the > Derivative Products under those patent claims licensable by Licensee that > are necessarily infringed by the Derivative Products. > > 4. PROTECTION OF SOFTWARE. > > 4.1 Confidentiality. 
The Software and Documentation are the confidential > and proprietary information of Licensor. Licensee agrees to take adequate > steps to protect the Software and Documentation from unauthorized > disclosure or use. Licensee agrees that it will not disclose the Software > or Documentation to any third party. > > 4.2 Proprietary Notices. Licensee shall maintain and place on any copy of > Software or Documentation that it reproduces for internal use all notices > as are authorized and/or required hereunder. Licensee shall include a copy > of this UTTRL and the following notice, on each copy of the Software and > Documentation. Such license and notice shall be embedded in each copy of > the Software, in the video screen display, on the physical medium > embodying the Software copy and on any Documentation: > > Copyright ?? The University of Texas, 2009. All right reserved. > UNIVERSITY EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES CONCERNING THIS > SOFTWARE AND DOCUMENTATION, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, > FITNESS FOR ANY PARTICULAR PURPOSE, NON-INFRINGEMENT AND WARRANTIES OF > PERFORMANCE, AND ANY WARRANTY THAT MIGHT OTHERWISE ARISE FROM COURSE OF > DEALING OR USAGE OF TRADE. NO WARRANTY IS EITHER EXPRESS OR IMPLIED WITH > RESPECT TO THE USE OF THE SOFTWARE OR DOCUMENTATION. Under no > circumstances shall University be liable for incidental, special, > indirect, direct or consequential damages or loss of profits, interruption > of business, or related expenses which may arise from use of Software or > Documentation, including but not limited to those resulting from defects > in Software and/or Documentation, or loss or inaccuracy of data of any > kind. > > 5. WARRANTIES. > > 5.1 Disclaimer of Warranties. TO THE EXTENT PERMITTED BY APPLICABLE LAW, > THE SOFTWARE AND DOCUMENTATION ARE BEING PROVIDED ON AN "AS IS" BASIS > WITHOUT ANY WARRANTIES OF ANY KIND RESPECTING THE SOFTWARE OR > DOCUMENTATION, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY > WARRANTY OF DESIGN, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR > NON-INFRINGEMENT. > > 5.2 Limitation of Liability. UNDER NO CIRCUMSTANCES UNLESS REQUIRED BY > APPLICABLE LAW SHALL LICENSOR BE LIABLE FOR INCIDENTAL, SPECIAL, INDIRECT, > DIRECT OR CONSEQUENTIAL DAMAGES OR LOSS OF PROFITS, INTERRUPTION OF > BUSINESS, OR RELATED EXPENSES WHICH MAY ARISE AS A RESULT OF THIS LICENSE > OR OUT OF THE USE OR ATTEMPT OF USE OF SOFTWARE OR DOCUMENTATION INCLUDING > BUT NOT LIMITED TO THOSE RESULTING FROM DEFECTS IN SOFTWARE AND/OR > DOCUMENTATION, OR LOSS OR INACCURACY OF DATA OF ANY KIND. THE FOREGOING > EXCLUSIONS AND LIMITATIONS WILL APPLY TO ALL CLAIMS AND ACTIONS OF ANY > KIND, WHETHER BASED ON CONTRACT, TORT (INCLUDING, WITHOUT LIMITATION, > NEGLIGENCE), OR ANY OTHER GROUNDS. > > 6. INDEMNIFICATION. > > Licensee shall indemnify, defend and hold harmless Licensor, the > University of Texas System, their Regents, and their officers, agents and > employees from and against any claims, demands, or causes of action > whatsoever caused by, or arising out of, or resulting from, the exercise > or practice of the license granted hereunder by Licensee, its officers, > employees, agents or representatives. > > 7. TERMINATION. > > If Licensee breaches this UTTRL, Licensee\'s right to use the Software and > Documentation will terminate immediately without notice, but all > provisions of this UTTRL except Section 2 will survive termination and > continue in effect. 
Upon termination, Licensee must destroy all copies of > the Software and Documentation. > > 8. GOVERNING LAW; JURISDICTION AND VENUE. > > The validity, interpretation, construction and performance of this UTTRL > shall be governed by the laws of the State of Texas. The Texas state > courts of Travis County, Texas (or, if there is exclusive federal > jurisdiction, the United States District Court for the Central District of > Texas) shall have exclusive jurisdiction and venue over any dispute > arising out of this UTTRL, and Licensee consents to the jurisdiction of > such courts. Application of the United Nations Convention on Contracts for > the International Sale of Goods is expressly excluded. > > 9. EXPORT CONTROLS. > > This license is subject to all applicable export restrictions. Licensee > must comply with all export and import laws and restrictions and > regulations of any United States or foreign agency or authority relating > to the Software and its use. > > 10. U.S. GOVERNMENT END-USERS. > > The Software is a "commercial item," as that term is defined in 48 C.F.R. > 2.101, consisting of "commercial computer software" and "commercial > computer software documentation," as such terms are used in 48 C.F.R. > 12.212 (Sept. 1995) and 48 C.F.R. 227.7202 (June 1995). Consistent with 48 > C.F.R. 12.212, 48 C.F.R. 27.405(b)(2) (June 1998) and 48 C.F.R. 227.7202, > all U.S. Government End Users acquire the Software with only those rights > as set forth herein. > > 11. MISCELLANEOUS. > > If any provision hereof shall be held illegal, invalid or unenforceable, > in whole or in part, such provision shall be modified to the minimum > extent necessary to make it legal, valid and enforceable, and the > legality, validity and enforceability of all other provisions of this > UTTRL shall not be affected thereby. Licensee may not assign this UTTRL in > whole or in part, without Licensor's prior written consent. Any attempt to > assign this UTTRL without such consent will be null and void. This UTTRL > is the complete and exclusive statement between Licensee and Licensor > relating to the subject matter hereof and supersedes all prior oral and > written and all contemporaneous oral negotiations, commitments and > understandings of the parties, if any. Any waiver by either party of any > default or breach hereunder shall not constitute a waiver of any provision > of this UTTRL or of any subsequent default or breach of the same or a > different kind. > > > > END OF LICENSE > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Jan 8 10:17:43 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 8 Jan 2010 09:17:43 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <2d1d7fe71001080618od97adf1nbd936f5f22963912@mail.gmail.com> References: <002401ca8f3f$fdae2340$f90a69c0$%yang@physics.usyd.edu.au> <1d4f24dc6f376254aa63963eb6cd5916.squirrel@webmail.uio.no> <4B461B9F.5060800@noaa.gov> <2d1d7fe71001080618od97adf1nbd936f5f22963912@mail.gmail.com> Message-ID: <3d375d731001080717t7185e9c5w9b64a5d49b9740bc@mail.gmail.com> 2010/1/8 Fr?d?ric Bastien : > Hi, > > I while back, someone talked about aigen2(http://eigen.tuxfamily.org/). 
In > their benchmark they give info that they are competitive again mkl and goto > on matrix matrix product. They are not better, but that could make a good > default implementation for numpy when their is no blas installed. I think > the license would allow to include it in numpy directly. It is licensed under the LGPLv3, so it is not compatible with the numpy license. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From d.l.goldsmith at gmail.com Fri Jan 8 16:13:14 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 8 Jan 2010 13:13:14 -0800 Subject: [Numpy-discussion] Stupid question (at least coming from me it is) Message-ID: <45d1ab481001081313w5a9feb62xf1355ed92e43189b@mail.gmail.com> So, to get the new numpy.polynomial "sub-package," one has to update to 1.4 (or is there a 1.3.x that has it)? Thanks! DG PS: my pressing need (another stupid question, at least coming from me): chebyshev.chebdomain = [0,1] or [-1,1]? Thanks again! -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 8 16:40:34 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Jan 2010 14:40:34 -0700 Subject: [Numpy-discussion] Stupid question (at least coming from me it is) In-Reply-To: <45d1ab481001081313w5a9feb62xf1355ed92e43189b@mail.gmail.com> References: <45d1ab481001081313w5a9feb62xf1355ed92e43189b@mail.gmail.com> Message-ID: On Fri, Jan 8, 2010 at 2:13 PM, David Goldsmith wrote: > So, to get the new numpy.polynomial "sub-package," one has to update to 1.4 > (or is there a 1.3.x that has it)? Thanks! > > Yes. > DG > > PS: my pressing need (another stupid question, at least coming from me): > chebyshev.chebdomain = [0,1] or [-1,1]? Thanks again! > > chebyshev.chebdomain is the default chebyshev domain and is [-1,1]. Maybe it needs a bettter name? Note that it is integer; that isn't required, but it makes it compatible with other types like Decimal that don't mix with floats. Another possibility is to make it a function so I can document it, at present it is an ndarray. For normal work you should use the chebyshev.Chebyshev class. Let me know what particular problem you are looking at as it will be useful to start putting some examples together. And I want to see what needs improvement. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Fri Jan 8 18:12:24 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 08 Jan 2010 15:12:24 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> Message-ID: <4B47BBD8.5010906@noaa.gov> Bruce Southey wrote: > Also a user has to check for missing > values or numpy has to warn a user I think warnings are next to useless for all but interactive work -- so I don't want to rely on them > that missing values are present > immediately after reading the data so the appropriate action can be > taken (like using functions that handle missing values appropriately). > That is my second problem with using codes (NaN, -99999 etc) for > missing values. 
But I think you're right -- if someone write code, tests with good input, then later runs it with missing valued import, they are likely to have not ever bothered to test for missing values. So I think missing values should only be replaced by something if the user specifically asks for it. >> And the principle of fromfile() is that it is fast and simple, if you >> want masked arrays, use slower, but more full-featured methods. > > So in that case it should fail with missing data. Well, I'm not so sure -- the point is performance, no reason not to have high performing code that handles missing data. > What about '\r' and '\n\r'? I have thought about that -- I'm hoping that python's text file reading will just take care of it, but as we're working with C file handles here (I think), I guess not. '/n/r' is easy -- the '/r' is just extra whitespace. 'r' is another case to handle. > My problem with this is that you are reading one huge 1-D array (that > you can resize later) rather than a 2-D array with rows and columns > (which is what I deal with). That's because fromfile()) is not designed to be row-oriented at all, and the binary read certainly isn't. I'm just trying to make this easy -- though it's not turning out that way! > But I agree that you can have an option > to say treat '\n' or '\r' as a delimiter but I think it should be > turned off by default. that's what I've done. > You should have a corresponding value for ints because raising an > exceptionwould be inconsistent with allowing floats to have a value. I'm not sure I care, really -- but I think having the user specify the fill value is the best option, anyway. josef.pktd at gmail.com wrote: >>> none -- exactly why I think \n is a special case. >> What about '\r' and '\n\r'? > > Yes, I forgot about this, and it will be the most common case for > Windows users like myself. > > I think \r should be stripped automatically, like in non-binary > reading of files in python. except for folks like me that have old mac files laying around...so I want this like "Universal newlines" support. > A warning would be good, but doing np.any(np.isnan(x)) or > np.isnan(x).sum() on the result is always a good idea for a user when > missing values are possibility. right, but the issue is the user has to know that they are possible, and we all know how carefully we all read docs! Thanks for your input -- I think I know what I'd like to do, but it's proving less than trivial to do it, so we'll see. In short: 1) optionally allow newlines to serve as a delimiter, so large tables can be read. 2) raise an exception for missing values, unless: 3) the user specifies a fill value of their choice (compatible with the chosen data type. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Fri Jan 8 18:16:31 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 08 Jan 2010 15:16:31 -0800 Subject: [Numpy-discussion] fromfile() -- help! 
In-Reply-To: <1262942573.2580.149.camel@talisman> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> <1262942573.2580.149.camel@talisman> Message-ID: <4B47BCCF.6060104@noaa.gov> Pauli Virtanen wrote: >> if I have NumPyOS_ascii_ftolf right, it should return 0 if it doesn't >> succesfully read a number. However, this looks like it sets the data in >> *ip, even if the return value is zero. > > It may also return EOF (== -1) when encountering end-of-stream. Of > course, I don't think any code should not rely on EOF being -1, and I > doubt that relying on it is intended here. OK, so it should explicitly check for EOF? >> It does pass on that return value, but, from ctors.c: >> >> fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype, >> void *NPY_UNUSED(stream_data)) >> { >> /* the NULL argument is for backwards-compatibility */ >> return dtype->f->scanfunc(*fp, dptr, NULL, dtype); >> } >> >> just moves it on through. This is called from here: >> >> if (next(&stream, dptr, dtype, stream_data) < 0) { >> break; >> } >> >> which is checking for < 0 , so if a zero is returned, it will just go in >> its merry way... > > Yeah, this is of course wrong; for example a file containing "1,2," > results to np.fromfile("filename.txt", sep=",") == [1, 2, -1] where the > last value is effectively undefined. I get a zero, but yes, that's what I'm trying to fix > Another point to note is that `next` may also be the > fromstr_next_element function; when fixing things also its semantics > should be corrected. yup -- I know -- great fun! But I;'m writing unit test that ensure that fromstring and fromfile do the same thing, so I should catch it if I miss anything. >> It does pass on that return value, but, from ctors.c: >> >> fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype, >> void *NPY_UNUSED(stream_data)) >> { >> /* the NULL argument is for backwards-compatibility */ return >> dtype->f->scanfunc(*fp, dptr, NULL, dtype); >> } > > This functions is IMHO where the fix should go; I believe it should do > something like > > return (ret == 0 || ret == EOF) ? -1 : ret; > Thanks -- I think that makes sense -- if nothing else, a change here will only effect fromfile(), so I won't accidentally break anything else. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From d.l.goldsmith at gmail.com Fri Jan 8 19:19:21 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 8 Jan 2010 16:19:21 -0800 Subject: [Numpy-discussion] Stupid question (at least coming from me it is) In-Reply-To: References: <45d1ab481001081313w5a9feb62xf1355ed92e43189b@mail.gmail.com> Message-ID: <45d1ab481001081619g3afb1af4t275631e293795e55@mail.gmail.com> On Fri, Jan 8, 2010 at 1:40 PM, Charles R Harris wrote: > > chebyshev.chebdomain is the default chebyshev domain and is [-1,1]. Maybe it > needs a bettter name? Note that it is integer; that isn't required, but it > makes it compatible with other types like Decimal that don't mix with > floats. Another possibility is to make it a function so I can document it, That's "the problem" I'm working on. :-) DG > at present it is an ndarray. 
For normal work you should use the > chebyshev.Chebyshev class. > > Let me know what particular problem you are looking at as it will be useful > to start putting some examples together. And I want to see what needs > improvement. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From bsouthey at gmail.com Fri Jan 8 20:15:43 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 8 Jan 2010 19:15:43 -0600 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B47BBD8.5010906@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <4B47BBD8.5010906@noaa.gov> Message-ID: On Fri, Jan 8, 2010 at 5:12 PM, Christopher Barker wrote: > Bruce Southey wrote: >> Also a user has to check for missing >> values or numpy has to warn a user > > I think warnings are next to useless for all but interactive work -- so > I don't want to rely on them > >> that missing values are present >> immediately after reading the data so the appropriate action can be >> taken (like using functions that handle missing values appropriately). >> That is my second problem with using codes (NaN, -99999 etc) ?for >> missing values. > > But I think you're right -- if someone write code, tests with good > input, then later runs it with missing valued import, they are likely to > have not ever bothered to test for missing values. > > So I think missing values should only be replaced by something if the > user specifically asks for it. > >>> And the principle of fromfile() is that it is fast and simple, if you >>> want masked arrays, use slower, but more full-featured methods. >> >> So in that case it should fail with missing data. > > Well, I'm not so sure -- the point is performance, no reason not to have > high performing code that handles missing data. > >> What about '\r' and '\n\r'? > > I have thought about that -- I'm hoping that python's text file reading > will just take care of it, but as we're working with C file handles here > (I think), I guess not. '/n/r' is easy -- the '/r' is just extra > whitespace. 'r' is another case to handle. > > >> My problem with this is that you are reading one huge 1-D array ?(that >> you can resize later) rather than a 2-D array with rows and columns >> (which is what I deal with). > > That's because fromfile()) is not designed to be row-oriented at all, > and the binary read certainly isn't. I'm just trying to make this easy > -- though it's not turning out that way! > > ?> But I agree that you can have an option >> to say treat '\n' or '\r' as a delimiter but I think it should be >> turned off by default. > > that's what I've done. > >> You should have a corresponding value for ints because raising an >> exceptionwould be inconsistent with allowing floats to have a value. > > I'm not sure I care, really -- but I think having the user specify the > fill value is the best option, anyway. > > josef.pktd at gmail.com wrote: >>>> none -- exactly why I think \n is a special case. >>> What about '\r' and '\n\r'? >> >> Yes, I forgot about this, and it will be the most common case for >> Windows users like myself. >> >> I think \r should be stripped automatically, like in non-binary >> reading of files in python. 
> > except for folks like me that have old mac files laying around...so I > want this like "Universal newlines" support. > >> A warning would be good, but doing np.any(np.isnan(x)) or >> np.isnan(x).sum() on the result is always a good idea for a user when >> missing values are possibility. > > right, but the issue is the user has to know that they are possible, and > we all know how carefully we all read docs! > > Thanks for your input -- I think I know what I'd like to do, but it's > proving less than trivial to do it, so we'll see. > > In short: > > 1) optionally allow newlines to serve as a delimiter, so large tables > can be read. > > 2) raise an exception for missing values, unless: > ? 3) the user specifies a fill value of their choice (compatible with > the chosen data type. > > > -Chris > > I fully agree with your approach! Thanks for considering my thoughts! Bruce From charlesr.harris at gmail.com Fri Jan 8 20:29:46 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Jan 2010 18:29:46 -0700 Subject: [Numpy-discussion] Stupid question (at least coming from me it is) In-Reply-To: <45d1ab481001081619g3afb1af4t275631e293795e55@mail.gmail.com> References: <45d1ab481001081313w5a9feb62xf1355ed92e43189b@mail.gmail.com> <45d1ab481001081619g3afb1af4t275631e293795e55@mail.gmail.com> Message-ID: On Fri, Jan 8, 2010 at 5:19 PM, David Goldsmith wrote: > On Fri, Jan 8, 2010 at 1:40 PM, Charles R Harris > wrote: > > > > chebyshev.chebdomain is the default chebyshev domain and is [-1,1]. Maybe > it > > needs a bettter name? Note that it is integer; that isn't required, but > it > > makes it compatible with other types like Decimal that don't mix with > > floats. Another possibility is to make it a function so I can document > it, > > That's "the problem" I'm working on. :-) > > There are four variables defined # Chebyshev default domain. chebdomain = np.array([-1,1]) # Chebyshev coefficients representing zero. chebzero = np.array([0]) # Chebyshev coefficients representing one. chebone = np.array([1]) # Chebyshev coefficients representing the identity x. chebx = np.array([0,1]) And corresponding ones in the polynomial module. I can make them all functions if that would help, I thought of doing that in the first place... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Fri Jan 8 23:37:37 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 8 Jan 2010 20:37:37 -0800 Subject: [Numpy-discussion] Stupid question (at least coming from me it is) In-Reply-To: References: <45d1ab481001081313w5a9feb62xf1355ed92e43189b@mail.gmail.com> <45d1ab481001081619g3afb1af4t275631e293795e55@mail.gmail.com> Message-ID: <45d1ab481001082037o4f8bb7dfnf43676c98c4056a3@mail.gmail.com> On Fri, Jan 8, 2010 at 5:29 PM, Charles R Harris wrote: > > On Fri, Jan 8, 2010 at 5:19 PM, David Goldsmith > wrote: >> >> On Fri, Jan 8, 2010 at 1:40 PM, Charles R Harris >> wrote: >> > >> > chebyshev.chebdomain is the default chebyshev domain and is [-1,1]. >> > Maybe it >> > needs a bettter name? Note that it is integer; that isn't required, but >> > it >> > makes it compatible with other types like Decimal that don't mix with >> > floats. Another possibility is to make it a function so I can document >> > it, >> >> That's "the problem" I'm working on. :-) >> > > There are four variables defined > > # Chebyshev default domain. > chebdomain = np.array([-1,1]) > > # Chebyshev coefficients representing zero. 
> chebzero = np.array([0]) > > # Chebyshev coefficients representing one. > chebone = np.array([1]) > > # Chebyshev coefficients representing the identity x. > chebx = np.array([0,1]) > > And corresponding ones in the polynomial module. I can make them all > functions if that would help, I thought of doing that in the first place... Well, I'm documenting them at the module level, which is where some doc on them already exists (I'm just embellishing a little for increased clarity) and what I _think_ "we" agreed on as "what to do" to document constants, so I don't need/want you to ("promote" them, that is), but if you decide that you want to do it, I'm neutral (unless it would hurt performance of course). Thanks for the additional info, DG From pav at iki.fi Sat Jan 9 07:44:03 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 09 Jan 2010 14:44:03 +0200 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <4B47BBD8.5010906@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <4B47BBD8.5010906@noaa.gov> Message-ID: <1263041042.4144.3.camel@idol> pe, 2010-01-08 kello 15:12 -0800, Christopher Barker kirjoitti: > 1) optionally allow newlines to serve as a delimiter, so large tables > can be read. I don't really like handling newlines specially. For instance, I could have data like 1, 2, 3; 4, 5, 6; 7, 8, 9; Allowing an "alternative separator" would sound better to me. The above data could then be read like fromfile('foo.txt', sep=' , ', sep2=' ; ') or perhaps fromfile('foo.txt', sep=[' , ', ' ; ']) Since whitespace matches also newlines, this would work. Pauli From efiring at hawaii.edu Sat Jan 9 14:30:48 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 09 Jan 2010 09:30:48 -1000 Subject: [Numpy-discussion] mvoid test error with svn Message-ID: <4B48D968.50104@hawaii.edu> Building numpy from svn and then running numpy.test(), I get the following error: ERROR: Test filled w/ mvoid ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/numpy/ma/tests/test_core.py", line 506, in test_filled_w_mvoid a = mvoid(np.array((1, 2)), mask=[(0, 1)], dtype=ndtype) File "/usr/local/lib/python2.6/dist-packages/numpy/ma/core.py", line 5453, in __new__ _data = ndarray.__new__(self, (), dtype=dtype, buffer=data.data) TypeError: buffer is too small for requested array ---------------------------------------------------------------------- Ran 2505 tests in 10.478s FAILED (KNOWNFAIL=5, SKIP=4, errors=1) In [6]:numpy.version.version Out[6]:'1.5.0.dev8040' In [7]:!uname -a Linux manini 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 16:20:31 UTC 2009 i686 GNU/Linux Eric From charlesr.harris at gmail.com Sat Jan 9 14:57:19 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Jan 2010 12:57:19 -0700 Subject: [Numpy-discussion] mvoid test error with svn In-Reply-To: <4B48D968.50104@hawaii.edu> References: <4B48D968.50104@hawaii.edu> Message-ID: On Sat, Jan 9, 2010 at 12:30 PM, Eric Firing wrote: > Building numpy from svn and then running numpy.test(), I get the > following error: > > ERROR: Test filled w/ mvoid > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/usr/local/lib/python2.6/dist-packages/numpy/ma/tests/test_core.py", > line 506, in 
test_filled_w_mvoid > a = mvoid(np.array((1, 2)), mask=[(0, 1)], dtype=ndtype) > File "/usr/local/lib/python2.6/dist-packages/numpy/ma/core.py", line > 5453, in __new__ > _data = ndarray.__new__(self, (), dtype=dtype, buffer=data.data) > TypeError: buffer is too small for requested array > > ---------------------------------------------------------------------- > Ran 2505 tests in 10.478s > > FAILED (KNOWNFAIL=5, SKIP=4, errors=1) > > > In [6]:numpy.version.version > Out[6]:'1.5.0.dev8040' > > In [7]:!uname -a > Linux manini 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 16:20:31 UTC > 2009 i686 GNU/Linux > > > There is already a ticket for this: http://projects.scipy.org/numpy/ticket/1346 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Sat Jan 9 20:32:48 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 09 Jan 2010 17:32:48 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1263041042.4144.3.camel@idol> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <4B47BBD8.5010906@noaa.gov> <1263041042.4144.3.camel@idol> Message-ID: <4B492E40.6010002@noaa.gov> Pauli Virtanen wrote: > I don't really like handling newlines specially. For instance, I could > have data like > > 1, 2, 3; > 4, 5, 6; > 7, 8, 9; > > Allowing an "alternative separator" would sound better to me. The above > data could then be read like > > fromfile('foo.txt', sep=' , ', sep2=' ; ') > > or perhaps > > fromfile('foo.txt', sep=[' , ', ' ; ']) I like this syntax better, but: 1) Yes you "could" have data like that, but do you? I've never seen it. Maybe other have. 2) if you did, it would probably indicate something the user would want reserved, like the shape of the array. And newlines really are a special case -- they have a special meaning, and they are very, very common (universal, even)! So, it's just more code than I'm probably going to write. If someone does want to write more code than I do, it would probably make sense to do what someone suggested in the ticket: write a optimized version of loadtxt in C. Anyway. I'll think about it when I poke at the code more. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From denis-bz-py at t-online.de Sun Jan 10 11:34:27 2010 From: denis-bz-py at t-online.de (denis) Date: Sun, 10 Jan 2010 17:34:27 +0100 Subject: [Numpy-discussion] Behaviour of vdot(array2d, array1d) In-Reply-To: <90626729-8117-4C72-9509-E1BE7D7F7933@gmx.de> References: <90626729-8117-4C72-9509-E1BE7D7F7933@gmx.de> Message-ID: On 07/01/2010 18:51, Nikolas Tezak wrote: > However when I do this, vdot raises a ValueError complaining that the > "vectors have different lengths". Nikolas, looks like a bug, in numpy 1.4 on mac ppc too. Use dot instead -- import numpy as np x = 1j * np.ones(( 2, 3 )) y = np.ones( 3 ) try: print "vdot:", np.vdot( x, y ) except ValueError, e: print "ValueError:", e print "dot:", np.dot( x.conj(), y ) numpy core/numeric.py has from _dotblas import dot, vdot but I don't know how to use testing/... for blas, nor how to log a ticket -- experts please advise ? 
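If it helps for a ticket, a self-contained check could be something like this (just a sketch, expected value worked out by hand):

import numpy as np
from numpy.testing import assert_array_almost_equal

x = 1j * np.ones(( 2, 3 ))
y = np.ones( 3 )
# dot with an explicit conjugate is the workaround above;
# conj(x) is all -1j, so each row dotted with ones(3) gives -3j
assert_array_almost_equal( np.dot( x.conj(), y ), np.array([ -3j, -3j ]) )
# np.vdot( x, y ) currently raises ValueError: vectors have different lengths
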
cheers -- denis From tjhnson at gmail.com Sun Jan 10 17:11:06 2010 From: tjhnson at gmail.com (T J) Date: Sun, 10 Jan 2010 14:11:06 -0800 Subject: [Numpy-discussion] Uninformative Error Message-ID: When passing in a list of longs and asking that the dtype be a float (yes, losing precision), the error message is uninformative whenever the long is larger than the largest float. >>> x = 181626642333486640664316511479918087634811756599984861278481913634852446858952226941059178462566942027148832976486383692715763966132465634039844094073670028044755150133224694791817752891901042496950233943249209777416692569138779593594686170807571874640682826295728116325492852625325418526603207268018328608840 >>> array(x, dtype=float) OverflowError: long int too large to convert to float >>> array([x], dtype=float) ValueError: setting an array element with a sequence. The first error is informative, but the second is not and will occur anytime one tries to convert a python list containing longs which are too long. Is there a way this error message could be made more helpful? From timmichelsen at gmx-topmail.de Mon Jan 11 05:10:33 2010 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Mon, 11 Jan 2010 10:10:33 +0000 (UTC) Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables Message-ID: Hello, I experienced the following issue with numpy 1.4: scipy.stats: Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32 import scipy.stats as st Traceback (most recent call last): File "", line 1, in File "C:\Python26\lib\site-packages\scipy\stats\__init__.py", line 7, in from stats import * File "C:\Python26\lib\site-packages\scipy\stats\stats.py", line 203, in from morestats import find_repeats #is only reference to scipy.stats File "C:\Python26\lib\site-packages\scipy\stats\morestats.py", line 7, in import distributions File "C:\Python26\lib\site-packages\scipy\stats\distributions.py", line 27, in import vonmises_cython File "numpy.pxd", line 30, in scipy.stats.vonmises_cython (scipy\stats\vonmises_cython.c:2939) ValueError: numpy.dtype does not appear to be the correct type object pytables: import tables Traceback (most recent call last): File "", line 1, in File "C:\Python26\lib\site-packages\tables\__init__.py", line 56, in from tables.utilsExtension import getPyTablesVersion, getHDF5Version File "definitions.pxd", line 138, in tables.utilsExtension ValueError: numpy.dtype does not appear to be the correct type object Is this an error in numpy or no the other packages require update in the code? Thanks, Timmie From pgmdevlist at gmail.com Mon Jan 11 05:54:23 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 11 Jan 2010 05:54:23 -0500 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: References: Message-ID: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> On Jan 11, 2010, at 5:10 AM, Tim Michelsen wrote: > Hello, > I experienced the following issue with numpy 1.4: > ... > > Is this an error in numpy or no the other packages require update in the code? Let me guess, you just recently updated numpy ? I'd bet ybut forgot to recompile scipy and pytables... From ndbecker2 at gmail.com Mon Jan 11 09:35:51 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 11 Jan 2010 09:35:51 -0500 Subject: [Numpy-discussion] savetxt only saves real part of complex Message-ID: Is this a bug? I think silently discarding the imaginary part is a bug. 
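A small example of what I mean, plus one possible workaround (write real/imag as two float columns and recombine on load) -- untested sketch:

import numpy as np

z = np.array([ 1+2j, 3+4j, 5+6j ])
np.savetxt( 'z.txt', z )   # the case I mean: only the real parts land in the file

# workaround: two float columns, recombined after loadtxt
np.savetxt( 'z.txt', np.column_stack(( z.real, z.imag )) )
zr, zi = np.loadtxt( 'z.txt', unpack=True )
z2 = zr + 1j*zi
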
From denis-bz-py at t-online.de Mon Jan 11 11:56:55 2010 From: denis-bz-py at t-online.de (denis) Date: Mon, 11 Jan 2010 17:56:55 +0100 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> Message-ID: Only 2 of the 21 top-level subpackages draw that warning with numpy-1.4.0-py2.6-python.org.dmg scipy-0.7.1-py2.6-python.org.dmg on my mac 10.4 ppc, python 2.6.4: try: import scipy.cluster except ValueError, e: print "scipy.cluster error", e try: import scipy.constants except ValueError, e: print "scipy.constants error", e ... scipy.cluster error numpy.dtype does not appear to be the correct type object .../linsolve/__init__.py:4: DeprecationWarning: scipy.linsolve has moved to scipy.sparse.linalg.dsolve warn('scipy.linsolve has moved to scipy.sparse.linalg.dsolve', DeprecationWarning) scipy.stats error numpy.dtype does not appear to be the correct type object cheers -- denis From josef.pktd at gmail.com Mon Jan 11 12:10:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jan 2010 12:10:24 -0500 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> Message-ID: <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> On Mon, Jan 11, 2010 at 11:56 AM, denis wrote: > Only 2 of the 21 top-level subpackages draw that warning > with numpy-1.4.0-py2.6-python.org.dmg > scipy-0.7.1-py2.6-python.org.dmg > on my mac 10.4 ppc, python 2.6.4: > > try: > ? ? import scipy.cluster > except ValueError, e: > ? ? print "scipy.cluster error", e > try: > ? ? import scipy.constants > except ValueError, e: > ? ? print "scipy.constants error", e > ... > > scipy.cluster error numpy.dtype does not appear to be the correct type object > .../linsolve/__init__.py:4: DeprecationWarning: scipy.linsolve has moved to scipy.sparse.linalg.dsolve > ? warn('scipy.linsolve has moved to scipy.sparse.linalg.dsolve', DeprecationWarning) > scipy.stats error numpy.dtype does not appear to be the correct type object For this problem, it's supposed to be only those packages that have or import cython generated code. Josef > > cheers > ? -- denis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andyjian430074 at gmail.com Mon Jan 11 18:44:59 2010 From: andyjian430074 at gmail.com (Jankins) Date: Mon, 11 Jan 2010 17:44:59 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable Message-ID: <4B4BB7FB.6060004@gmail.com> Hello, I want to use scipy.sparse.linalg.eigen function, but it keeps popping out error message: TypeError: 'module' object is not callable "eigen" is a module, but it has "__call__" method. Why couldn't I call scipy.sparse.linalg.eigen(...)? Thanks. 
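For reference, a stripped-down sketch of the call (hypothetical matrix, no networkx involved) looks like:

import scipy.sparse as sp
import scipy.sparse.linalg as linalg

M = sp.eye( 9, 9 )
print linalg.eigen( M )   # this is the call that gives the TypeError
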
Jankins From robert.kern at gmail.com Mon Jan 11 18:49:11 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Jan 2010 17:49:11 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BB7FB.6060004@gmail.com> References: <4B4BB7FB.6060004@gmail.com> Message-ID: <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> On Mon, Jan 11, 2010 at 17:44, Jankins wrote: > Hello, > > I want to use scipy.sparse.linalg.eigen function, but it keeps popping > out error message: > TypeError: 'module' object is not callable > > "eigen" is a module, but it has "__call__" method. Why couldn't I call > scipy.sparse.linalg.eigen(...)? Please show the complete code and the complete traceback. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From andyjian430074 at gmail.com Mon Jan 11 19:03:46 2010 From: andyjian430074 at gmail.com (Jankins) Date: Mon, 11 Jan 2010 18:03:46 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> Message-ID: <4B4BBC62.6080002@gmail.com> It is very simple code: import networkx as nx import scipy.sparse.linalg as linalg G = nx.Graph() G.add_star(range(9)) M= nx.to_scipy_sparse_matrix(G) print linalg.eigen(M) Thanks. Jankins On 1/11/2010 5:49 PM, Robert Kern wrote: > On Mon, Jan 11, 2010 at 17:44, Jankins wrote: > >> Hello, >> >> I want to use scipy.sparse.linalg.eigen function, but it keeps popping >> out error message: >> TypeError: 'module' object is not callable >> >> "eigen" is a module, but it has "__call__" method. Why couldn't I call >> scipy.sparse.linalg.eigen(...)? >> > Please show the complete code and the complete traceback. > > From Chris.Barker at noaa.gov Mon Jan 11 19:11:26 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 11 Jan 2010 16:11:26 -0800 Subject: [Numpy-discussion] fromfile() -- aarrgg! In-Reply-To: References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> Message-ID: <4B4BBE2E.9090706@noaa.gov> Pauli Virtanen wrote: > Thu, 07 Jan 2010 17:21:34 -0800, Christopher Barker wrote: > [clip] >> It does pass on that return value, but, from ctors.c: >> >> fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype, >> void *NPY_UNUSED(stream_data)) >> { >> /* the NULL argument is for backwards-compatibility */ return >> dtype->f->scanfunc(*fp, dptr, NULL, dtype); >> } > > This functions is IMHO where the fix should go; I believe it should do > something like > > return (ret == 0 || ret == EOF) ? -1 : ret; OK, more digging (and printf debugging -- I really need to learn to debug C extensions properly). I've found the deeper issue: NumPyOS_ascii_strtod returns a 0.0 when given invalid input, such as " ,". That's why fromstring is putting in a 0.0 for empty (and invalid) fields. Diggin into NumPyOS_ascii_strtod(), it looks like it is simply a wrapper around PyOS_ascii_strtod(), that checks for NaN and Inf first (and somethign with teh decimal point, I dont' quite get). 
But anyway, if it is a regular old number, it gets passed of to PyOS_ascii_strtod(), which isn't outragiously well documented (no, I havne't gone to the source, yet), but is similar to the C stdlib srtod(), which says: "If no conversion is performed, zero is returned and the value of nptr is stored in the location referenced by endptr." off do do some more testing, but I guess that means that those pointers need to be checked after the call, to see if a conversion was generated. Am I right? -Chris PS: Boy, this is a pain! -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From robert.kern at gmail.com Mon Jan 11 19:12:30 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Jan 2010 18:12:30 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BBC62.6080002@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> Message-ID: <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> On Mon, Jan 11, 2010 at 18:03, Jankins wrote: > It is very simple code: > > import networkx as nx > import scipy.sparse.linalg as linalg > > G = nx.Graph() > G.add_star(range(9)) > M= nx.to_scipy_sparse_matrix(G) > print linalg.eigen(M) > > Thanks. Please post the complete traceback. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From andyjian430074 at gmail.com Mon Jan 11 19:16:12 2010 From: andyjian430074 at gmail.com (Jankins) Date: Mon, 11 Jan 2010 18:16:12 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> Message-ID: <4B4BBF4C.5050105@gmail.com> I am sorry. My bad. File "C:\test.py", line 7, in print linalg.eigen(M) TypeError: 'module' object is not callable I installed "pythonxy". "pythonxy" has already included the scipy package. On 1/11/2010 6:12 PM, Robert Kern wrote: > On Mon, Jan 11, 2010 at 18:03, Jankins wrote: > >> It is very simple code: >> >> import networkx as nx >> import scipy.sparse.linalg as linalg >> >> G = nx.Graph() >> G.add_star(range(9)) >> M= nx.to_scipy_sparse_matrix(G) >> print linalg.eigen(M) >> >> Thanks. >> > Please post the complete traceback. > > From josef.pktd at gmail.com Mon Jan 11 20:53:30 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jan 2010 20:53:30 -0500 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BBF4C.5050105@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> Message-ID: <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> On Mon, Jan 11, 2010 at 7:16 PM, Jankins wrote: > I am sorry. My bad. > > ? File "C:\test.py", line 7, in > ? ? print linalg.eigen(M) > TypeError: 'module' object is not callable > > I installed "pythonxy". 
"pythonxy" has already included the scipy package. > > On 1/11/2010 6:12 PM, Robert Kern wrote: >> On Mon, Jan 11, 2010 at 18:03, Jankins ?wrote: >> >>> It is very simple code: >>> >>> import networkx as nx >>> import scipy.sparse.linalg as linalg >>> >>> G = nx.Graph() >>> G.add_star(range(9)) >>> M= nx.to_scipy_sparse_matrix(G) >>> print linalg.eigen(M) >>> >>> Thanks. >>> >> Please post the complete traceback. eigen is both a function and a module. Normally the function shadows the module >python Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy.sparse.linalg.eigen >>> scipy.sparse.linalg.eigen I'm not able to import the eigen module, so there is either something different with python 2.6 or networkx is doing some magic ? Can you try without networkx, try linalg.eigen.eigen ? Does >>> scipy.sparse.linalg.eigen show the module or the function? Josef PS: I don't like functions shadowing a module >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andyjian430074 at gmail.com Mon Jan 11 21:03:35 2010 From: andyjian430074 at gmail.com (Jankins) Date: Mon, 11 Jan 2010 20:03:35 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> Message-ID: <4B4BD877.3090509@gmail.com> Here is the command line python: >>> import scipy.sparse.linalg as linalg >>> >>> linalg.eigen() Traceback (most recent call last): File "", line 1, in TypeError: 'module' object is not callable >>> It's really wired. Jankins On 1/11/2010 7:53 PM, josef.pktd at gmail.com wrote: > On Mon, Jan 11, 2010 at 7:16 PM, Jankins wrote: > >> I am sorry. My bad. >> >> File "C:\test.py", line 7, in >> print linalg.eigen(M) >> TypeError: 'module' object is not callable >> >> I installed "pythonxy". "pythonxy" has already included the scipy package. >> >> On 1/11/2010 6:12 PM, Robert Kern wrote: >> >>> On Mon, Jan 11, 2010 at 18:03, Jankins wrote: >>> >>> >>>> It is very simple code: >>>> >>>> import networkx as nx >>>> import scipy.sparse.linalg as linalg >>>> >>>> G = nx.Graph() >>>> G.add_star(range(9)) >>>> M= nx.to_scipy_sparse_matrix(G) >>>> print linalg.eigen(M) >>>> >>>> Thanks. >>>> >>>> >>> Please post the complete traceback. >>> > eigen is both a function and a module. Normally the function shadows the module > > >> python >> > Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on > win32 > Type "help", "copyright", "credits" or "license" for more information. > >>>> import scipy.sparse.linalg.eigen >>>> scipy.sparse.linalg.eigen >>>> > > > I'm not able to import the eigen module, so there is either something > different with python 2.6 or networkx is doing some magic ? > > Can you try without networkx, try linalg.eigen.eigen ? > > Does > >>>> scipy.sparse.linalg.eigen >>>> > show the module or the function? 
> > Josef > > PS: I don't like functions shadowing a module > >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jan 11 21:55:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jan 2010 21:55:29 -0500 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BD877.3090509@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> Message-ID: <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> On Mon, Jan 11, 2010 at 9:03 PM, Jankins wrote: > Here is the command line python: > > ?>>> import scipy.sparse.linalg as linalg > ?>>> > ?>>> linalg.eigen() > Traceback (most recent call last): > ? File "", line 1, in > TypeError: 'module' object is not callable > ?>>> linalg.eigen.eigen ? Is your working directory inside scipy ? I have no idea, since I'm not able not to get the function, and your information is a bit minimal. Josef > > It's really wired. > > Jankins > > On 1/11/2010 7:53 PM, josef.pktd at gmail.com wrote: >> On Mon, Jan 11, 2010 at 7:16 PM, Jankins ?wrote: >> >>> I am sorry. My bad. >>> >>> ? ?File "C:\test.py", line 7, in >>> ? ? ?print linalg.eigen(M) >>> TypeError: 'module' object is not callable >>> >>> I installed "pythonxy". "pythonxy" has already included the scipy package. >>> >>> On 1/11/2010 6:12 PM, Robert Kern wrote: >>> >>>> On Mon, Jan 11, 2010 at 18:03, Jankins ? ?wrote: >>>> >>>> >>>>> It is very simple code: >>>>> >>>>> import networkx as nx >>>>> import scipy.sparse.linalg as linalg >>>>> >>>>> G = nx.Graph() >>>>> G.add_star(range(9)) >>>>> M= nx.to_scipy_sparse_matrix(G) >>>>> print linalg.eigen(M) >>>>> >>>>> Thanks. >>>>> >>>>> >>>> Please post the complete traceback. >>>> >> eigen is both a function and a module. Normally the function shadows the module >> >> >>> python >>> >> Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on >> win32 >> Type "help", "copyright", "credits" or "license" for more information. >> >>>>> import scipy.sparse.linalg.eigen >>>>> scipy.sparse.linalg.eigen >>>>> >> >> >> I'm not able to import the eigen module, so there is either something >> different with python 2.6 or networkx is doing some magic ? >> >> Can you try without networkx, try linalg.eigen.eigen ? >> >> Does >> >>>>> scipy.sparse.linalg.eigen >>>>> >> show the module or the function? 
>> >> Josef >> >> PS: I don't like functions shadowing a module >> >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andyjian430074 at gmail.com Mon Jan 11 22:31:47 2010 From: andyjian430074 at gmail.com (Jankins) Date: Mon, 11 Jan 2010 21:31:47 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> Message-ID: <4B4BED23.4080109@gmail.com> linalg has no attribute "eigen". Are you able to use scipy.sparse.linalg.eigen? My working dir is not inside scipy. It is 'C:\\Users\\jankins'. I am using Python 2.6.2 and the latest version of scipy. What should I do? And I couldn't even successfully install scipy in Ubuntu 9.10 neither by "easy_install" or "source compilation". I am so desperate. I planed to use the function to calculate the eigenvalue of a graph.The graph has about 265,214 nodes and 420,045 edges. So it's better to use sparse matrix. Jankins On 1/11/2010 8:55 PM, josef.pktd at gmail.com wrote: > On Mon, Jan 11, 2010 at 9:03 PM, Jankins wrote: > >> Here is the command line python: >> >> >>> import scipy.sparse.linalg as linalg >> >>> >> >>> linalg.eigen() >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: 'module' object is not callable >> >>> >> > linalg.eigen.eigen ? > > Is your working directory inside scipy ? > > I have no idea, since I'm not able not to get the function, and your > information is a bit minimal. > > Josef > > >> It's really wired. >> >> Jankins >> >> On 1/11/2010 7:53 PM, josef.pktd at gmail.com wrote: >> >>> On Mon, Jan 11, 2010 at 7:16 PM, Jankins wrote: >>> >>> >>>> I am sorry. My bad. >>>> >>>> File "C:\test.py", line 7, in >>>> print linalg.eigen(M) >>>> TypeError: 'module' object is not callable >>>> >>>> I installed "pythonxy". "pythonxy" has already included the scipy package. >>>> >>>> On 1/11/2010 6:12 PM, Robert Kern wrote: >>>> >>>> >>>>> On Mon, Jan 11, 2010 at 18:03, Jankins wrote: >>>>> >>>>> >>>>> >>>>>> It is very simple code: >>>>>> >>>>>> import networkx as nx >>>>>> import scipy.sparse.linalg as linalg >>>>>> >>>>>> G = nx.Graph() >>>>>> G.add_star(range(9)) >>>>>> M= nx.to_scipy_sparse_matrix(G) >>>>>> print linalg.eigen(M) >>>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>> >>>>> Please post the complete traceback. >>>>> >>>>> >>> eigen is both a function and a module. Normally the function shadows the module >>> >>> >>> >>>> python >>>> >>>> >>> Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on >>> win32 >>> Type "help", "copyright", "credits" or "license" for more information. 
>>> >>> >>>>>> import scipy.sparse.linalg.eigen >>>>>> scipy.sparse.linalg.eigen >>>>>> >>>>>> >>> >>> >>> I'm not able to import the eigen module, so there is either something >>> different with python 2.6 or networkx is doing some magic ? >>> >>> Can you try without networkx, try linalg.eigen.eigen ? >>> >>> Does >>> >>> >>>>>> scipy.sparse.linalg.eigen >>>>>> >>>>>> >>> show the module or the function? >>> >>> Josef >>> >>> PS: I don't like functions shadowing a module >>> >>> >>>>> >>>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jan 11 22:45:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 11 Jan 2010 22:45:29 -0500 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BED23.4080109@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> Message-ID: <1cd32cbb1001111945t7f185967p2ccdef6c2cb22b62@mail.gmail.com> On Mon, Jan 11, 2010 at 10:31 PM, Jankins wrote: > linalg has no attribute "eigen". You should post full tracebacks. I don't understand this error, because before eigen seemed to exist. You could run the test suite to see if the installation is ok and sparse is working correctly. >>> import scipy.sparse >>> scipy.sparse.test() which is for me: Ran 442 tests in 139.500s OK (KNOWNFAIL=4, SKIP=11) If there are installation problems, then I have no idea since I'm a (happy) Windows user. Josef > > Are you able to use scipy.sparse.linalg.eigen? > > My working dir is not inside scipy. ?It is 'C:\\Users\\jankins'. > > I am using Python 2.6.2 and the latest version of scipy. > > What should I do? And I couldn't even successfully install scipy in > Ubuntu 9.10 neither by "easy_install" or "source compilation". I am so > desperate. > > I planed to use the function to calculate the eigenvalue of a graph.The > graph has about 265,214 nodes and 420,045 edges. So it's better to use > sparse matrix. > > Jankins > > On 1/11/2010 8:55 PM, josef.pktd at gmail.com wrote: >> On Mon, Jan 11, 2010 at 9:03 PM, Jankins ?wrote: >> >>> Here is the command line python: >>> >>> ? >>> ?import scipy.sparse.linalg as linalg >>> ? >>> >>> ? >>> ?linalg.eigen() >>> Traceback (most recent call last): >>> ? ?File "", line 1, in >>> TypeError: 'module' object is not callable >>> ? >>> >>> >> linalg.eigen.eigen ? ? >> >> Is your working directory inside scipy ? >> >> I have no idea, since I'm not able not to get the function, and your >> information is a bit minimal. 
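A sketch of the kind of self-contained report that keeps these threads short: interpreter, numpy/scipy versions and install locations, plus a run of the sparse test suite (it assumes the nose package is installed, as the test runs quoted above require):

import sys
import numpy
import scipy
import scipy.sparse

print('Python: %s' % sys.version.replace('\n', ' '))
print('numpy %s from %s' % (numpy.__version__, numpy.__file__))
print('scipy %s from %s' % (scipy.__version__, scipy.__file__))

# A clean run here rules out a broken build before chasing API questions.
scipy.sparse.test()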
>> >> Josef >> >> >>> It's really wired. >>> >>> Jankins >>> >>> On 1/11/2010 7:53 PM, josef.pktd at gmail.com wrote: >>> >>>> On Mon, Jan 11, 2010 at 7:16 PM, Jankins ? ?wrote: >>>> >>>> >>>>> I am sorry. My bad. >>>>> >>>>> ? ? File "C:\test.py", line 7, in >>>>> ? ? ? print linalg.eigen(M) >>>>> TypeError: 'module' object is not callable >>>>> >>>>> I installed "pythonxy". "pythonxy" has already included the scipy package. >>>>> >>>>> On 1/11/2010 6:12 PM, Robert Kern wrote: >>>>> >>>>> >>>>>> On Mon, Jan 11, 2010 at 18:03, Jankins ? ? ?wrote: >>>>>> >>>>>> >>>>>> >>>>>>> It is very simple code: >>>>>>> >>>>>>> import networkx as nx >>>>>>> import scipy.sparse.linalg as linalg >>>>>>> >>>>>>> G = nx.Graph() >>>>>>> G.add_star(range(9)) >>>>>>> M= nx.to_scipy_sparse_matrix(G) >>>>>>> print linalg.eigen(M) >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>> Please post the complete traceback. >>>>>> >>>>>> >>>> eigen is both a function and a module. Normally the function shadows the module >>>> >>>> >>>> >>>>> python >>>>> >>>>> >>>> Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on >>>> win32 >>>> Type "help", "copyright", "credits" or "license" for more information. >>>> >>>> >>>>>>> import scipy.sparse.linalg.eigen >>>>>>> scipy.sparse.linalg.eigen >>>>>>> >>>>>>> >>>> >>>> >>>> I'm not able to import the eigen module, so there is either something >>>> different with python 2.6 or networkx is doing some magic ? >>>> >>>> Can you try without networkx, try linalg.eigen.eigen ? >>>> >>>> Does >>>> >>>> >>>>>>> scipy.sparse.linalg.eigen >>>>>>> >>>>>>> >>>> show the module or the function? >>>> >>>> Josef >>>> >>>> PS: I don't like functions shadowing a module >>>> >>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david at silveregg.co.jp Mon Jan 11 23:33:33 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Tue, 12 Jan 2010 13:33:33 +0900 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BED23.4080109@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> Message-ID: <4B4BFB9D.4010403@silveregg.co.jp> Jankins wrote: > > What should I do? And I couldn't even successfully install scipy in > Ubuntu 9.10 neither by "easy_install" or "source compilation". 
I am so > desperate. Don't use easy_install, and install from sources with python setup.py install, both numpy and scipy, after having installed the following packages: sudo apt-get install gfortran python-dev libatlas-base-dev python-nose Before doing so, you should remove both the build directories (rm -rf build in your source tree) and the previously installed numpy/scipy if any (in /usr/local/lib/python2.6/site-packages/ on Ubuntu 9.10). You should then be able to test your installations doing something like: python -c "import numpy; numpy.test(); import scipy; scipy.test()" cheers, David From andyjian430074 at gmail.com Mon Jan 11 23:53:46 2010 From: andyjian430074 at gmail.com (Jankins) Date: Mon, 11 Jan 2010 22:53:46 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BFB9D.4010403@silveregg.co.jp> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> Message-ID: <4B4C005A.6030809@gmail.com> Thanks so much. I have successfully installed scipy in Ubuntu 9.10. But I still couldn't use scipy.sparse.linalg.eigen function. The test result is : Ran 3490 tests in 40.268s FAILED (KNOWNFAIL=4, SKIP=28, failures=1) Thanks again. Jankins On 1/11/2010 10:33 PM, David Cournapeau wrote: > Jankins wrote: > > >> What should I do? And I couldn't even successfully install scipy in >> Ubuntu 9.10 neither by "easy_install" or "source compilation". I am so >> desperate. >> > Don't use easy_install, and install from sources with python setup.py > install, both numpy and scipy, after having installed the following > packages: > > sudo apt-get install gfortran python-dev libatlas-base-dev python-nose > > Before doing so, you should remove both the build directories (rm -rf > build in your source tree) and the previously installed numpy/scipy if > any (in /usr/local/lib/python2.6/site-packages/ on Ubuntu 9.10). > > You should then be able to test your installations doing something like: > > python -c "import numpy; numpy.test(); import scipy; scipy.test()" > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david at silveregg.co.jp Tue Jan 12 00:46:38 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Tue, 12 Jan 2010 14:46:38 +0900 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4C005A.6030809@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> <4B4C005A.6030809@gmail.com> Message-ID: <4B4C0CBE.7080306@silveregg.co.jp> Jankins wrote: > Thanks so much. I have successfully installed scipy in Ubuntu 9.10. But > I still couldn't use scipy.sparse.linalg.eigen function. 
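For the use case quoted above -- a few eigenvalues of a sparse matrix built from a graph with roughly 265,000 nodes -- the symmetric ARPACK wrapper is the natural route. A sketch on a small diagonal stand-in with a known spectrum; the import location moved between SciPy releases (eigen_symmetric in 0.7.x, eigsh in later versions), hence the guarded import, and whether ARPACK copes with the full-size matrix is a separate question raised further down:

import numpy as np
import scipy.sparse as sp

try:
    # location used later in this thread (SciPy 0.7.x)
    from scipy.sparse.linalg.eigen.arpack import eigen_symmetric as symeig
except ImportError:
    # newer SciPy exposes the same ARPACK solver as eigsh
    from scipy.sparse.linalg import eigsh as symeig

n = 500
A = sp.csr_matrix(np.diag(np.arange(1.0, n + 1.0)))   # eigenvalues 1..500 by construction

vals, vecs = symeig(A, k=6)      # six eigenvalues of largest magnitude
print(sorted(vals))              # expect roughly [495, 496, 497, 498, 499, 500]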
Please report *exactly* the suite of commands which is failing. For example, the following works for me: import numpy as np from scipy.sparse import csr_matrix from scipy.sparse.linalg.eigen import eigen m = np.random.randn(10, 10) sm = csr_matrix(m) print eigen(sm) # Give the 6 first (largest) eigen values of sm Note that I am not sure eigen will be able to cope with your problem's size. I already had trouble with problems 1 to 2 order of magnitude smaller than that (~ 5e4 x 5e4) cheers, David From andyjian430074 at gmail.com Tue Jan 12 01:35:55 2010 From: andyjian430074 at gmail.com (Jankins) Date: Tue, 12 Jan 2010 00:35:55 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4C0CBE.7080306@silveregg.co.jp> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> <4B4C005A.6030809@gmail.com> <4B4C0CBE.7080306@silveregg.co.jp> Message-ID: <4B4C184B.7010607@gmail.com> Here is the complete command lines in Windows 7: C:\Users\jankins>python Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from scipy.sparse.linalg.eigen import eigen Traceback (most recent call last): File "", line 1, in ImportError: cannot import name eigen >>> Here is the complete command lines in Ubuntu 9.10: Python 2.6.4 (r264:75706, Nov 2 2009, 14:38:03) [GCC 4.4.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from scipy.sparse.linalg.eigen import eigen Traceback (most recent call last): File "", line 1, in ImportError: cannot import name eigen >>> It's so wired. Thanks. On 1/11/2010 11:46 PM, David Cournapeau wrote: > Jankins wrote: > >> Thanks so much. I have successfully installed scipy in Ubuntu 9.10. But >> I still couldn't use scipy.sparse.linalg.eigen function. >> > Please report *exactly* the suite of commands which is failing. For > example, the following works for me: > > import numpy as np > from scipy.sparse import csr_matrix > from scipy.sparse.linalg.eigen import eigen > > m = np.random.randn(10, 10) > sm = csr_matrix(m) > > print eigen(sm) # Give the 6 first (largest) eigen values of sm > > > Note that I am not sure eigen will be able to cope with your problem's > size. I already had trouble with problems 1 to 2 order of magnitude > smaller than that (~ 5e4 x 5e4) > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Tue Jan 12 03:37:57 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 12 Jan 2010 10:37:57 +0200 Subject: [Numpy-discussion] fromfile() -- aarrgg! 
In-Reply-To: <4B4BBE2E.9090706@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> <4B4BBE2E.9090706@noaa.gov> Message-ID: <1263285477.7976.10.camel@talisman> ma, 2010-01-11 kello 16:11 -0800, Christopher Barker kirjoitti: [clip] > "If no conversion is performed, zero is returned and the value of nptr > is stored in the location referenced by endptr." > > off do do some more testing, but I guess that means that those pointers > need to be checked after the call, to see if a conversion was generated. > > Am I right? Yes, that's how strtod() is typically used. NumPyOS_ascii_ftolf already checks that, but it seems to me that fromstr_next_element or possibly fromstr does not. > PS: Boy, this is a pain! Welcome to the wonderful world of C ;) Pauli From jonboym2 at yahoo.co.uk Tue Jan 12 08:11:02 2010 From: jonboym2 at yahoo.co.uk (Jon Moore) Date: Tue, 12 Jan 2010 13:11:02 +0000 (GMT) Subject: [Numpy-discussion] Getting Callbacks with arrays to work Message-ID: <832556.93262.qm@web24504.mail.ird.yahoo.com> Hi, I'm trying to build a differential equation integrator and later a stochastic differential equation integrator. I'm having trouble getting f2py to work where the callback itself receives an array from the Fortran routine does some work on it and then passes an array back. ? For the stoachastic integrator I'll need 2 callbacks both dealing with arrays. The idea is the code that never changes (ie the integrator) will be in Fortran and the code that changes (ie the callbacks defining differential equations) will be different for each problem. To test the idea I've written basic code which should pass an array back and forth between Python and Fortran if it works right. Here is some code which doesn't work properly:- SUBROUTINE CallbackTest(dv,v0,Vout,N) ??? !IMPLICIT NONE ?????? ? cF2PY???? intent( hide ):: N ??? INTEGER:: N, ic ?????????????? ? ??? EXTERNAL:: dv?????? ? ??? DOUBLE PRECISION, DIMENSION( N ), INTENT(IN):: v0?????? ? ??? DOUBLE PRECISION, DIMENSION( N ), INTENT(OUT):: Vout ?????????? ? ??? DOUBLE PRECISION, DIMENSION( N ):: Vnow ??? DOUBLE PRECISION, DIMENSION( N )::? temp ?????? ? ??? Vnow = v0 ?????? ? ??? temp = dv(Vnow, N) ??? DO ic = 1, N ??????? Vout( ic ) = temp(ic) ??? END DO?? ? ?????? ? END SUBROUTINE CallbackTest When I test it with this python code I find the code just replicates the first term of the array! from numpy import * import callback as c def dV(v): ??? print 'in Python dV: V is: ',v ??? return v.copy()?? ? arr = array([2.0, 4.0, 6.0, 8.0]) print 'Arr is: ', arr output = c.CallbackTest(dV, arr) print 'Out is: ', output Arr is:? [ 2.? 4.? 6.? 8.] in Python dV: V is:? [ 2.? 4.? 6.? 8.] Out is:? [ 2.? 2.? 2.? 2.] Any ideas how I should do this, and also how do I get the code to work with implicit none not commented out? Thanks Jon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pearu.peterson at gmail.com Tue Jan 12 08:44:33 2010 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Tue, 12 Jan 2010 15:44:33 +0200 Subject: [Numpy-discussion] Getting Callbacks with arrays to work In-Reply-To: <832556.93262.qm@web24504.mail.ird.yahoo.com> References: <832556.93262.qm@web24504.mail.ird.yahoo.com> Message-ID: <4B4C7CC1.3010304@cens.ioc.ee> Hi, The problem is that f2py does not support callbacks that return arrays. There is easy workaround to that: provide returnable arrays as arguments to callback functions. Using your example: SUBROUTINE CallbackTest(dv,v0,Vout,N) IMPLICIT NONE !F2PY intent( hide ):: N INTEGER:: N, ic EXTERNAL:: dv DOUBLE PRECISION, DIMENSION( N ), INTENT(IN):: v0 DOUBLE PRECISION, DIMENSION( N ), INTENT(OUT):: Vout DOUBLE PRECISION, DIMENSION( N ):: Vnow DOUBLE PRECISION, DIMENSION( N ):: temp Vnow = v0 !f2py intent (out) temp call dv(temp, Vnow, N) DO ic = 1, N Vout( ic ) = temp(ic) END DO END SUBROUTINE CallbackTest $ f2py -c test.f90 -m t --fcompiler=gnu95 >>> from numpy import * >>> from t import * >>> arr = array([2.0, 4.0, 6.0, 8.0]) >>> def dV(v): print 'in Python dV: V is: ',v ret = v.copy() ret[1] = 100.0 return ret ... >>> output = callbacktest(dV, arr) in Python dV: V is: [ 2. 4. 6. 8.] >>> output array([ 2., 100., 6., 8.]) What problems do you have with implicit none? It works fine here. Check the format of your source code, if it is free then use `.f90` extension, not `.f`. HTH, Pearu Jon Moore wrote: > Hi, > > I'm trying to build a differential equation integrator and later a > stochastic differential equation integrator. > > I'm having trouble getting f2py to work where the callback itself > receives an array from the Fortran routine does some work on it and then > passes an array back. > > For the stoachastic integrator I'll need 2 callbacks both dealing with > arrays. > > The idea is the code that never changes (ie the integrator) will be in > Fortran and the code that changes (ie the callbacks defining > differential equations) will be different for each problem. > > To test the idea I've written basic code which should pass an array back > and forth between Python and Fortran if it works right. > > Here is some code which doesn't work properly:- > > SUBROUTINE CallbackTest(dv,v0,Vout,N) > !IMPLICIT NONE > > cF2PY intent( hide ):: N > INTEGER:: N, ic > > EXTERNAL:: dv > > DOUBLE PRECISION, DIMENSION( N ), INTENT(IN):: v0 > DOUBLE PRECISION, DIMENSION( N ), INTENT(OUT):: Vout > > DOUBLE PRECISION, DIMENSION( N ):: Vnow > DOUBLE PRECISION, DIMENSION( N ):: temp > > Vnow = v0 > > > temp = dv(Vnow, N) > > DO ic = 1, N > Vout( ic ) = temp(ic) > END DO > > END SUBROUTINE CallbackTest > > > > When I test it with this python code I find the code just replicates the > first term of the array! > > > > > from numpy import * > import callback as c > > def dV(v): > print 'in Python dV: V is: ',v > return v.copy() > > arr = array([2.0, 4.0, 6.0, 8.0]) > > print 'Arr is: ', arr > > output = c.CallbackTest(dV, arr) > > print 'Out is: ', output > > > > > Arr is: [ 2. 4. 6. 8.] > > in Python dV: V is: [ 2. 4. 6. 8.] > > Out is: [ 2. 2. 2. 2.] > > > > Any ideas how I should do this, and also how do I get the code to work > with implicit none not commented out? 
> > Thanks > > Jon > > > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aisaac at american.edu Tue Jan 12 09:06:33 2010 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 12 Jan 2010 09:06:33 -0500 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4BED23.4080109@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> Message-ID: <4B4C81E9.9050504@american.edu> >>> filter(lambda x: x.startswith('eig'),dir(np.linalg)) ['eig', 'eigh', 'eigvals', 'eigvalsh'] >>> import scipy.linalg as spla >>> filter(lambda x: x.startswith('eig'),dir(spla)) ['eig', 'eig_banded', 'eigh', 'eigvals', 'eigvals_banded', 'eigvalsh'] hth, Alan Isaac From aisaac at american.edu Tue Jan 12 09:15:37 2010 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 12 Jan 2010 09:15:37 -0500 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4C184B.7010607@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> <4B4C005A.6030809@gmail.com> <4B4C0CBE.7080306@silveregg.co.jp> <4B4C184B.7010607@gmail.com> Message-ID: <4B4C8409.7060400@american.edu> On 1/12/2010 1:35 AM, Jankins wrote: >>>> from scipy.sparse.linalg.eigen import eigen > Traceback (most recent call last): > File "", line 1, in > ImportError: cannot import name eigen Look at David's example: from scipy.sparse.linalg import eigen hth, Alan Isaac From andyjian430074 at gmail.com Tue Jan 12 10:11:57 2010 From: andyjian430074 at gmail.com (Jankins) Date: Tue, 12 Jan 2010 09:11:57 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4C8409.7060400@american.edu> References: <4B4BB7FB.6060004@gmail.com> <3d375d731001111549w6f63a4d9h23b4d7ca993fae4f@mail.gmail.com> <4B4BBC62.6080002@gmail.com> <3d375d731001111612g71158707sb68a3b133a67473c@mail.gmail.com> <4B4BBF4C.5050105@gmail.com> <1cd32cbb1001111753n701e063et3bb85d401babc199@mail.gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> <4B4C005A.6030809@gmail.com> <4B4C0CBE.7080306@silveregg.co.jp> <4B4C184B.7010607@gmail.com> <4B4C8409.7060400@american.edu> Message-ID: <4B4C913D.2020109@gmail.com> >>> import scipy.sparse.linalg as linalg >>> dir(linalg) ['LinearOperator', 'Tester', '__all__', '__builtins__', '__doc__', '__file__', ' __name__', '__package__', '__path__', 'aslinearoperator', 'bench', 'bicg', 'bicg stab', 'cg', 'cgs', 'dsolve', 'eigen', 'factorized', 'gmres', 'interface', 'isol ve', 'iterative', 'linsolve', 'lobpcg', 'minres', 'qmr', 'splu', 
'spsolve', 'tes t', 'umfpack', 'use_solver', 'utils'] >>> dir(linalg.eigen) ['Tester', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__pack age__', '__path__', 'bench', 'lobpcg', 'test'] >>> linalg.eigen.test() Running unit tests for scipy.sparse.linalg.eigen NumPy version 1.3.0 NumPy is installed in C:\Python26\lib\site-packages\numpy SciPy version 0.7.1 SciPy is installed in C:\Python26\lib\site-packages\scipy Python version 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Int el)] nose version 0.11.1 .......... ---------------------------------------------------------------------- Ran 10 tests in 2.240s OK >>> On 1/12/2010 8:15 AM, Alan G Isaac wrote: > On 1/12/2010 1:35 AM, Jankins wrote: > >>>>> from scipy.sparse.linalg.eigen import eigen >>>>> >> Traceback (most recent call last): >> File "", line 1, in >> ImportError: cannot import name eigen >> > > Look at David's example: > from scipy.sparse.linalg import eigen > > hth, > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From arnar.flatberg at gmail.com Tue Jan 12 10:19:38 2010 From: arnar.flatberg at gmail.com (Arnar Flatberg) Date: Tue, 12 Jan 2010 16:19:38 +0100 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <4B4C913D.2020109@gmail.com> References: <4B4BB7FB.6060004@gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> <4B4C005A.6030809@gmail.com> <4B4C0CBE.7080306@silveregg.co.jp> <4B4C184B.7010607@gmail.com> <4B4C8409.7060400@american.edu> <4B4C913D.2020109@gmail.com> Message-ID: <5d3194021001120719y61b80e2en668e9dac8afa922b@mail.gmail.com> On Tue, Jan 12, 2010 at 4:11 PM, Jankins wrote: Hi On my Ubuntu, I would reach the arpack wrapper as follows: from scipy.sparse.linalg.eigen.arpack import eigen However, I'd guess that you deal with a symmetric matrix (Laplacian or adjacency matrix), so the symmetric solver might be the best choice. This might be reached by: In [29]: from scipy.sparse.linalg.eigen.arpack import eigen_symmetric In [30]: scipy.__version__ Out[30]: '0.7.0' Arnar -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen.pascoe at stfc.ac.uk Tue Jan 12 10:52:22 2010 From: stephen.pascoe at stfc.ac.uk (stephen.pascoe at stfc.ac.uk) Date: Tue, 12 Jan 2010 15:52:22 -0000 Subject: [Numpy-discussion] Numpy 1.4 MaskedArray bug? Message-ID: We have noticed the MaskedArray implementation in numpy-1.4.0 breaks some of our code. For instance we see the following: in 1.3.0: >>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) >>> numpy.ma.sum(a, 1) masked_array(data = [ 6 15], mask = False, fill_value = 999999) in 1.4.0 >>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) >>> numpy.ma.sum(a, 1) Traceback (most recent call last): File "", line 1, in File "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n umpy/ma/core.py", line 5682, in __call__ return method(*args, **params) File "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n umpy/ma/core.py", line 4357, in sum newmask = _mask.all(axis=axis) ValueError: axis(=1) out of bounds Also note the "Report Bugs" link on http://numpy.scipy.org is broken (http://numpy.scipy.org/bug-report.html) Thanks, Stephen. 
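A self-contained reproduction of the report, together with the two workarounds suggested further down the thread (a negative axis, or forcing an explicit mask at construction); on 1.3.0 all three sums should simply agree:

import numpy
import numpy.ma as ma

print(numpy.__version__)
a = ma.MaskedArray([[1, 2, 3], [4, 5, 6]])

try:
    print(ma.sum(a, 1))                  # raises ValueError on 1.4.0 as reported
except ValueError as exc:
    print('ma.sum(a, 1) failed: %s' % exc)

print(ma.sum(a, -1))                     # workaround 1: negative axis

b = ma.MaskedArray([[1, 2, 3], [4, 5, 6]], mask=False)
print(ma.sum(b, 1))                      # workaround 2: explicit mask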
--- Stephen Pascoe +44 (0)1235 445980 British Atmospheric Data Centre Rutherford Appleton Laboratory -- Scanned by iCritical. From andyjian430074 at gmail.com Tue Jan 12 11:28:25 2010 From: andyjian430074 at gmail.com (Jankins) Date: Tue, 12 Jan 2010 10:28:25 -0600 Subject: [Numpy-discussion] TypeError: 'module' object is not callable In-Reply-To: <5d3194021001120719y61b80e2en668e9dac8afa922b@mail.gmail.com> References: <4B4BB7FB.6060004@gmail.com> <4B4BD877.3090509@gmail.com> <1cd32cbb1001111855r2b45ccden3d43030e08fb0d4d@mail.gmail.com> <4B4BED23.4080109@gmail.com> <4B4BFB9D.4010403@silveregg.co.jp> <4B4C005A.6030809@gmail.com> <4B4C0CBE.7080306@silveregg.co.jp> <4B4C184B.7010607@gmail.com> <4B4C8409.7060400@american.edu> <4B4C913D.2020109@gmail.com> <5d3194021001120719y61b80e2en668e9dac8afa922b@mail.gmail.com> Message-ID: <4B4CA329.4040900@gmail.com> Thanks so so much. Finally, it works. >>> import scipy.sparse.linalg.eigen.arpack as arpack >>> dir(arpack) ['__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', ' _arpack', 'arpack', 'aslinearoperator', 'eigen', 'eigen_symmetric', 'np', 'speig s', 'warnings'] >>> But I still didn't get it. Why some of you can directly use scipy.sparse.linalg.eigen as a function, while some of you couldn't use it that way? Anyway, your solution works for me. On 1/12/2010 9:19 AM, Arnar Flatberg wrote: > > > On Tue, Jan 12, 2010 at 4:11 PM, Jankins > wrote: > > Hi > > On my Ubuntu, I would reach the arpack wrapper as follows: > > from scipy.sparse.linalg.eigen.arpack import eigen > > However, I'd guess that you deal with a symmetric matrix (Laplacian or > adjacency matrix), so the symmetric solver might be the best choice. > > This might be reached by: > > In [29]: from scipy.sparse.linalg.eigen.arpack import eigen_symmetric > In [30]: scipy.__version__ > Out[30]: '0.7.0' > > > Arnar > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-py at t-online.de Tue Jan 12 11:33:21 2010 From: denis-bz-py at t-online.de (denis) Date: Tue, 12 Jan 2010 17:33:21 +0100 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> Message-ID: On 11/01/2010 18:10, josef.pktd at gmail.com wrote: > For this problem, it's supposed to be only those packages that have or > import cython generated code. Right; is this a known bug, is there a known fix for mac dmgs ? (Whisper, how'd it get past testing ?) scipy/stats/__init__.py has an apparent patch which doesn't work #remove vonmises_cython from __all__, I don't know why it is included __all__ = filter(lambda s:not (s.startswith('_') or s.endswith('cython')),dir()) but just removing vonmises_cython in distributions.py => import scipy.stats then works. Similarly import scipy.cluster => trace File "numpy.pxd", line 30, in scipy.spatial.ckdtree (scipy/spatial/ckdtree.c:6087) ValueError: numpy.dtype does not appear to be the correct type object I like the naming convention xx_cython.so. 
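A small probe makes the scope of the breakage concrete: report the running numpy, then try importing the compiled scipy subpackages mentioned here. The module list is only a guess based on this thread, and the real fix, as the replies note, is rebuilding scipy against the numpy actually in use:

import numpy
print('numpy %s' % numpy.__version__)

for name in ['scipy.stats', 'scipy.cluster', 'scipy.spatial', 'scipy.io']:
    try:
        __import__(name)
        print('%-16s imports cleanly' % name)
    except Exception as exc:
        print('%-16s FAILED: %s' % (name, exc))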
cheers -- denis From robert.kern at gmail.com Tue Jan 12 11:41:00 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Jan 2010 10:41:00 -0600 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> Message-ID: <3d375d731001120841y7bc9bb9cla1b49c6deb2b5232@mail.gmail.com> On Tue, Jan 12, 2010 at 10:33, denis wrote: > On 11/01/2010 18:10, josef.pktd at gmail.com wrote: > >> For this problem, it's supposed to be only those packages that have or >> import cython generated code. > > Right; is this a known bug, is there a known fix ?for mac dmgs ? > (Whisper, how'd it get past testing ?) It's not a bug, but it is a known issue. We tried very hard to keep numpy 1.4 binary compatible; however, Pyrex and Cython impose additional runtime checks above and beyond binary compatibility. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Tue Jan 12 11:42:00 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Jan 2010 11:42:00 -0500 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> Message-ID: <1cd32cbb1001120842l17571babx8a31ca86c7b95d8b@mail.gmail.com> On Tue, Jan 12, 2010 at 11:33 AM, denis wrote: > On 11/01/2010 18:10, josef.pktd at gmail.com wrote: > >> For this problem, it's supposed to be only those packages that have or >> import cython generated code. > > Right; is this a known bug, is there a known fix ?for mac dmgs ? > (Whisper, how'd it get past testing ?) Switching to numpy 1.4 requires recompiling cython code (i.e. scipy), there's a lot of information on the details in the mailing lists. > > scipy/stats/__init__.py has an apparent patch which doesn't work > ? ? #remove vonmises_cython from __all__, I don't know why it is included > ? ? __all__ = filter(lambda s:not (s.startswith('_') or s.endswith('cython')),dir()) No this is unrelated, this is just to reduce namespace pollution in __all__ vonmises_cython is still imported as an internal module and functions in distributions. Josef > > but just removing vonmises_cython in distributions.py > => import scipy.stats then works. Then, I expect you will get an import error or some other exception when you try to use stats.vonmises. > > Similarly import scipy.cluster => trace > ? File "numpy.pxd", line 30, in scipy.spatial.ckdtree (scipy/spatial/ckdtree.c:6087) > ValueError: numpy.dtype does not appear to be the correct type object > > I like the naming convention xx_cython.so. > > cheers > ? -- denis > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Tue Jan 12 12:51:08 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 12 Jan 2010 12:51:08 -0500 Subject: [Numpy-discussion] Numpy 1.4 MaskedArray bug? In-Reply-To: References: Message-ID: On Jan 12, 2010, at 10:52 AM, wrote: > We have noticed the MaskedArray implementation in numpy-1.4.0 breaks > some of our code. For instance we see the following: My, that's embarrassing. Sorry for the inconvenience. 
> > in 1.3.0: > >>>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) >>>> numpy.ma.sum(a, 1) > masked_array(data = [ 6 15], > mask = False, > fill_value = 999999) > > in 1.4.0 > >>>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) >>>> numpy.ma.sum(a, 1) > Traceback (most recent call last): > File "", line 1, in > File > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > umpy/ma/core.py", line 5682, in __call__ > return method(*args, **params) > File > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > umpy/ma/core.py", line 4357, in sum > newmask = _mask.all(axis=axis) > ValueError: axis(=1) out of bounds Confirmed. Before I take full blame for it, can you try the following on both 1.3 and 1.4 ? >>> np.array(False).all().sum(1) Back to your problem: I'll fix that ASAIC, but it'll be on the SVN. Meanwhile, you can: * Use -1 instead of 1 for your axis. * Force the definition of a mask when you define your array with masked_array(...,mask=False) From sebastian.walter at gmail.com Tue Jan 12 13:05:01 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 12 Jan 2010 19:05:01 +0100 Subject: [Numpy-discussion] wrong casting of augmented assignment statements Message-ID: Hello, I have a question about the augmented assignment statements *=, +=, etc. Apparently, the casting of types is not working correctly. Is this known resp. intended behavior of numpy? (I'm using numpy.__version__ = '1.4.0.dev7039' on this machine but I remember a recent checkout of numpy yielded the same result). The problem is best explained at some examples: wrong casting from float to int:: In [1]: import numpy In [2]: x = numpy.ones(2,dtype=int) In [3]: y = 1.3 * numpy.ones(2,dtype=float) In [4]: z = x * y In [5]: z Out[5]: array([ 1.3, 1.3]) In [6]: x *= y In [7]: x Out[7]: array([1, 1]) In [8]: x.dtype Out[8]: dtype('int32') wrong casting from float to object:: In [1]: import numpy In [2]: import adolc In [3]: x = adolc.adouble(numpy.array([1,2,3],dtype=float)) In [4]: y = numpy.array([4,5,6],dtype=float) In [5]: x Out[5]: array([1(a), 2(a), 3(a)], dtype=object) In [6]: y Out[6]: array([ 4., 5., 6.]) In [7]: x * y Out[7]: array([4(a), 10(a), 18(a)], dtype=object) In [8]: y *= x In [9]: y Out[9]: array([ 4., 5., 6.]) It is inconsistent to the Python behavior:: In [9]: a = 1 In [10]: b = 1.3 In [11]: c = a * b In [12]: c Out[12]: 1.3 In [13]: a *= b In [14]: a Out[14]: 1.3 I would expect that numpy should at least raise an exception in the case of casting object to float. Any thoughts? regards, Sebastian From robert.kern at gmail.com Tue Jan 12 13:09:05 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Jan 2010 12:09:05 -0600 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: References: Message-ID: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> On Tue, Jan 12, 2010 at 12:05, Sebastian Walter wrote: > Hello, > I have a question about the augmented assignment statements *=, +=, etc. > Apparently, the casting of types is not working correctly. Is this > known resp. intended behavior of numpy? Augmented assignment modifies numpy arrays in-place, so the usual casting rules for assignment into an array apply. Namely, the array being assigned into keeps its dtype. If you do not want in-place modification, do not use augmented assignment. 
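The distinction in two lines, for anyone skimming: augmented assignment writes back into the existing integer array, so the float result is cast back down, while the plain expression allocates a new, upcast array. A minimal illustration:

import numpy as np

x = np.ones(2, dtype=int)
y = 1.3 * np.ones(2)

x_inplace = x.copy()
x_inplace *= y           # in-place: dtype stays int, values truncated back to 1
x_new = x * y            # new array: upcast to float, values are 1.3

print('%s %s' % (x_inplace, x_inplace.dtype))
print('%s %s' % (x_new, x_new.dtype))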
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From caldwell19 at llnl.gov Tue Jan 12 13:01:29 2010 From: caldwell19 at llnl.gov (Peter Caldwell) Date: Tue, 12 Jan 2010 10:01:29 -0800 Subject: [Numpy-discussion] sphinx numpydoc fails due to no __init__ for class SignedType Message-ID: <4B4CB8F9.4030208@llnl.gov> I'm trying to use sphinx to build documentation for our project (CDAT) that uses numpy. I'm running into an exception due to numpy.numarray.numerictypes.SignedType not having an __init__ attribute, which causes problems with numpydoc. I'm sure there must be a workaround or I'm doing something wrong since the basic numpy documentation is created with sphinx! Suggestions? I'm using sphinx v1.0, numpy v1.3.0, and numpydoc v0.3.1on Redhat Enterprise 5.x. Big thanks, Peter ps - I'm sending this question to both Numpy-discussion and sphinx-dev at googlegroups because the issue lies at the intersection of these groups. Here's the error: ========================================================= Running Sphinx v1.0 loading pickled environment... not found building [html]: targets for 6835 source files that are out of date updating environment: 6835 added, 0 changed, 0 removed /usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/docscrape.py:117: UserWarning: Unknown section Unary Ufuncs: warn("Unknown section %s" % key) /usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/docscrape.py:117: UserWarning: Unknown section Binary Ufuncs: warn("Unknown section %s" % key) /usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/docscrape.py:117: UserWarning: Unknown section Seealso warn("Unknown section %s" % key) reading sources... [ 3%] output/lev0/numpy.numarray Exception occurred: File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/numpydoc.py", line 76, in mangle_signature 'initializes x; see ' in pydoc.getdoc(obj.__init__)): AttributeError: class SignedType has no attribute '__init__' The full traceback has been saved in /tmp/sphinx-err-fprbpu.log, if you want to report the issue to the author. Please also report this if it was a user error, so that a better error message can be provided next time. Send reports to sphinx-dev at googlegroups.com. Thanks! 
make: *** [html] Error 1 ===================================================== Here's the full traceback: ------------------------------------------------------------------------------------------------ # Sphinx version: 1.0 # Docutils version: 0.6 release Traceback (most recent call last): File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/cmdline.py", line 172, in main app.build(all_files, filenames) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/application.py", line 130, in build self.builder.build_update() File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/builders/__init__.py", line 265, in build_update 'out of date' % len(to_build)) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/builders/__init__.py", line 285, in build purple, length): File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/builders/__init__.py", line 131, in status_iterator for item in iterable: File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/environment.py", line 513, in update_generator self.read_doc(docname, app=app) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/environment.py", line 604, in read_doc pub.publish() File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/core.py", line 203, in publish self.settings) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/readers/__init__.py", line 69, in read self.parse() File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/readers/__init__.py", line 75, in parse self.parser.parse(self.input, document) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/__init__.py", line 157, in parse self.statemachine.run(inputlines, document, inliner=self.inliner) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 170, in run input_source=document['source']) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 233, in run context, state, transitions) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 421, in check_line return method(match, context, next_state) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2678, in underline self.section(title, source, style, lineno - 1, messages) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 323, in section self.new_subsection(title, lineno, messages) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 391, in new_subsection node=section_node, match_titles=1) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 278, in nested_parse node=node, match_titles=match_titles) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 195, in run results = StateMachineWS.run(self, input_lines, input_offset) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 233, in run context, state, transitions) File 
"/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 421, in check_line return method(match, context, next_state) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2258, in explicit_markup nodelist, blank_finish = self.explicit_construct(match) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2270, in explicit_construct return method(self, expmatch) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2013, in directive directive_class, match, type_name, option_presets) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2062, in run_directive result = directive_instance.run() File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/autodoc.py", line 1106, in run nested_parse_with_titles(self.state, self.result, node) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/util/__init__.py", line 298, in nested_parse_with_titles return state.nested_parse(content, 0, node, match_titles=1) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 278, in nested_parse node=node, match_titles=match_titles) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 195, in run results = StateMachineWS.run(self, input_lines, input_offset) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 233, in run context, state, transitions) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 421, in check_line return method(match, context, next_state) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2260, in explicit_markup self.explicit_list(blank_finish) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2289, in explicit_list match_titles=self.state_machine.match_titles) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 315, in nested_list_parse node=node, match_titles=match_titles) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 195, in run results = StateMachineWS.run(self, input_lines, input_offset) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 233, in run context, state, transitions) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/statemachine.py", line 421, in check_line return method(match, context, next_state) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2562, in explicit_markup nodelist, blank_finish = self.explicit_construct(match) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2270, in explicit_construct return method(self, expmatch) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2013, in directive directive_class, match, type_name, option_presets) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/docutils/parsers/rst/states.py", line 2062, in run_directive result = directive_instance.run() File 
"/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/autosummary/__init__.py", line 192, in run items = self.get_items(names) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/autosummary/__init__.py", line 265, in get_items sig = documenter.format_signature() File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/autodoc.py", line 879, in format_signature return ModuleLevelDocumenter.format_signature(self) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/autodoc.py", line 384, in format_signature self.object, self.options, args, retann) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/application.py", line 226, in emit_firstresult for result in self.emit(event, *args): File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/application.py", line 222, in emit result.append(callback(self, *args)) File "/usr/local/cdat/release/5.2d/lib/python2.5/site-packages/Sphinx-1.0dev_20091202-py2.5.egg/sphinx/ext/numpydoc.py", line 76, in mangle_signature 'initializes x; see ' in pydoc.getdoc(obj.__init__)): AttributeError: class SignedType has no attribute '__init__' -- Peter Caldwell Program for Climate Model Diagnosis and Intercomparison Lawrence Livermore National Lab PO Box 808, L-103 Livermore, CA 94551-0808 925-422-4197 From josef.pktd at gmail.com Tue Jan 12 13:11:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Jan 2010 13:11:29 -0500 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: References: Message-ID: <1cd32cbb1001121011g429e56bdh7ca2e99a4ca16820@mail.gmail.com> On Tue, Jan 12, 2010 at 1:05 PM, Sebastian Walter wrote: > Hello, > I have a question about the augmented assignment statements *=, +=, etc. > Apparently, the casting of types is not working correctly. Is this > known resp. intended behavior of numpy? > (I'm using numpy.__version__ = '1.4.0.dev7039' on this machine but I > remember a recent checkout of numpy yielded the same result). > > The problem is best explained at some examples: > > wrong casting from float to int:: > > ? ? ? ? ? ?In [1]: import numpy > > ? ? ? ? ? ?In [2]: x = numpy.ones(2,dtype=int) > > ? ? ? ? ? ?In [3]: y = 1.3 * numpy.ones(2,dtype=float) > > ? ? ? ? ? ?In [4]: z = x * y > > ? ? ? ? ? ?In [5]: z > ? ? ? ? ? ?Out[5]: array([ 1.3, ?1.3]) > > ? ? ? ? ? ?In [6]: x *= y > > ? ? ? ? ? ?In [7]: x > ? ? ? ? ? ?Out[7]: array([1, 1]) > > ? ? ? ? ? ?In [8]: x.dtype > ? ? ? ? ? ?Out[8]: dtype('int32') > > ?wrong casting from float to object:: > > ? ? ? ? ? ?In [1]: import numpy > > ? ? ? ? ? ?In [2]: import adolc > > ? ? ? ? ? ?In [3]: x = adolc.adouble(numpy.array([1,2,3],dtype=float)) > > ? ? ? ? ? ?In [4]: y = numpy.array([4,5,6],dtype=float) > > ? ? ? ? ? ?In [5]: x > ? ? ? ? ? ?Out[5]: array([1(a), 2(a), 3(a)], dtype=object) > > ? ? ? ? ? ?In [6]: y > ? ? ? ? ? ?Out[6]: array([ 4., ?5., ?6.]) > > ? ? ? ? ? ?In [7]: x * y > ? ? ? ? ? ?Out[7]: array([4(a), 10(a), 18(a)], dtype=object) > > ? ? ? ? ? ?In [8]: y *= x > > ? ? ? ? ? ?In [9]: y > > ? ? ? ? ? ?Out[9]: array([ 4., ?5., ?6.]) > > > ? ? ? ?It is inconsistent to the Python behavior:: > > ? ? ? ? ? ?In [9]: a = 1 > > ? ? ? ? ? ?In [10]: b = 1.3 > > ? ? ? ? ? ?In [11]: c = a * b > > ? ? ? ? ? ?In [12]: c > ? ? ? ? ? 
?Out[12]: 1.3 > > ? ? ? ? ? ?In [13]: a *= b > > ? ? ? ? ? ?In [14]: a > ? ? ? ? ? ?Out[14]: 1.3 > > > I would expect that numpy should at least raise an exception in the > case of casting object to float. > Any thoughts? You are assigning to an existing array, which implies casting to the dtype of that array. It's the behavior that I would expect. If you want upcasting then don't use inplace *= , ... Josef > > regards, > Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Tue Jan 12 13:32:10 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 12 Jan 2010 10:32:10 -0800 Subject: [Numpy-discussion] fromfile() -- aarrgg! In-Reply-To: <1263285477.7976.10.camel@talisman> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> <4B4BBE2E.9090706@noaa.gov> <1263285477.7976.10.camel@talisman> Message-ID: <4B4CC02A.9020807@noaa.gov> Pauli Virtanen wrote: > ma, 2010-01-11 kello 16:11 -0800, Christopher Barker kirjoitti: > [clip] >> "If no conversion is performed, zero is returned and the value of nptr >> is stored in the location referenced by endptr." >> >> off do do some more testing, but I guess that means that those pointers >> need to be checked after the call, to see if a conversion was generated. >> >> Am I right? > > Yes, that's how strtod() is typically used. > > NumPyOS_ascii_ftolf already checks that, no, I don't' think it does, but it does pass the nifo through, so its API should be the same as PyOS_ascii_ftolf which is the same as strftolf(), which makes sense. > but it seems to me that > fromstr_next_element or possibly fromstr does not. The problem is fromstr -- it changes the symantics, assigning the value to a pointer passed in, and returning an error code -- except it doesn't actually check for an error -- it always returns 0: static int @fname at _fromstr(char *str, @type@ *ip, char **endptr, PyArray_Descr *NPY_UNUSED(ignore)) { double result; result = NumPyOS_ascii_strtod(str, endptr); *ip = (@type@) result; return 0; } so the errors are getting lost in the shuffle: This implies that fromstring/fromfile are the only things using it -- unless someone has seen similar bad behaviour anywhere else. > Welcome to the wonderful world of C ;) yup -- which is why I haven't worked out a fix yet... Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sebastian.walter at gmail.com Tue Jan 12 13:31:42 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 12 Jan 2010 19:31:42 +0100 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> References: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> Message-ID: On Tue, Jan 12, 2010 at 7:09 PM, Robert Kern wrote: > On Tue, Jan 12, 2010 at 12:05, Sebastian Walter > wrote: >> Hello, >> I have a question about the augmented assignment statements *=, +=, etc. >> Apparently, the casting of types is not working correctly. 
Is this >> known resp. intended behavior of numpy? > > Augmented assignment modifies numpy arrays in-place, so the usual > casting rules for assignment into an array apply. Namely, the array > being assigned into keeps its dtype. what are the usual casting rules? How does numpy know how to cast an object to a float? > > If you do not want in-place modification, do not use augmented assignment. Normally, I'd be perfectly fine with that. However, this particular problem occurs when you try to automatically differentiate an algorithm by using an Algorithmic Differentiation (AD) tool. E.g. given a function x = numpy.ones(2) def f(x): a = numpy.ones(2) a *= x return numpy.sum(a) one would use an AD tool as follows: x = numpy.array([adouble(1.), adouble(1.)]) y = f(x) but since the casting from object to float is not possible the computed gradient \nabla_x f(x) will be wrong. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Tue Jan 12 13:32:09 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 12 Jan 2010 20:32:09 +0200 Subject: [Numpy-discussion] Numpy 1.4 MaskedArray bug? In-Reply-To: References: Message-ID: <1263321129.7167.12.camel@idol> ti, 2010-01-12 kello 12:51 -0500, Pierre GM kirjoitti: [clip] > >>>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) > >>>> numpy.ma.sum(a, 1) > > Traceback (most recent call last): > > File "", line 1, in > > File > > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > > umpy/ma/core.py", line 5682, in __call__ > > return method(*args, **params) > > File > > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > > umpy/ma/core.py", line 4357, in sum > > newmask = _mask.all(axis=axis) > > ValueError: axis(=1) out of bounds > > Confirmed. > Before I take full blame for it, can you try the following on both 1.3 and 1.4 ? > >>> np.array(False).all().sum(1) Oh crap, it's mostly my fault: http://projects.scipy.org/numpy/ticket/1286 http://projects.scipy.org/numpy/changeset/7697 http://projects.scipy.org/numpy/browser/trunk/doc/release/1.4.0-notes.rst#deprecations Pretty embarassing, as very simple things break, although the test suite miraculously passes... > Back to your problem: I'll fix that ASAIC, but it'll be on the SVN. Meanwhile, you can: > * Use -1 instead of 1 for your axis. > * Force the definition of a mask when you define your array with masked_array(...,mask=False) Sounds like we need a 1.4.1 out at some point not too far in the future, then. Pauli From robert.kern at gmail.com Tue Jan 12 13:38:51 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Jan 2010 12:38:51 -0600 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: References: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> Message-ID: <3d375d731001121038t226ed0c6md986851be317402d@mail.gmail.com> On Tue, Jan 12, 2010 at 12:31, Sebastian Walter wrote: > On Tue, Jan 12, 2010 at 7:09 PM, Robert Kern wrote: >> On Tue, Jan 12, 2010 at 12:05, Sebastian Walter >> wrote: >>> Hello, >>> I have a question about the augmented assignment statements *=, +=, etc. >>> Apparently, the casting of types is not working correctly. 
Is this >>> known resp. intended behavior of numpy? >> >> Augmented assignment modifies numpy arrays in-place, so the usual >> casting rules for assignment into an array apply. Namely, the array >> being assigned into keeps its dtype. > > what are the usual casting rules? For assignment into an array, the array keeps its dtype and the data being assigned into it will be cast to that dtype. > How does numpy know how to cast an object to a float? For a general object, numpy will call its __float__ method. >> If you do not want in-place modification, do not use augmented assignment. > > Normally, I'd be perfectly fine with that. > However, this particular problem occurs when you try to automatically > differentiate an algorithm by using an Algorithmic Differentiation > (AD) tool. > E.g. given a function > > x = numpy.ones(2) > def f(x): > ? a = numpy.ones(2) > ? a *= x > ? return numpy.sum(a) > > one would use an AD tool as follows: > x = numpy.array([adouble(1.), adouble(1.)]) > y = f(x) > > but since the casting from object to float is not possible the > computed gradient \nabla_x f(x) will be wrong. Sorry, but that's just a limitation of the AD approach. There are all kinds of numpy constructions that AD can't handle. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue Jan 12 13:52:17 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Jan 2010 11:52:17 -0700 Subject: [Numpy-discussion] Numpy 1.4 MaskedArray bug? In-Reply-To: <1263321129.7167.12.camel@idol> References: <1263321129.7167.12.camel@idol> Message-ID: On Tue, Jan 12, 2010 at 11:32 AM, Pauli Virtanen wrote: > ti, 2010-01-12 kello 12:51 -0500, Pierre GM kirjoitti: > [clip] > > >>>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) > > >>>> numpy.ma.sum(a, 1) > > > Traceback (most recent call last): > > > File "", line 1, in > > > File > > > > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > > > umpy/ma/core.py", line 5682, in __call__ > > > return method(*args, **params) > > > File > > > > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > > > umpy/ma/core.py", line 4357, in sum > > > newmask = _mask.all(axis=axis) > > > ValueError: axis(=1) out of bounds > > > > Confirmed. > > Before I take full blame for it, can you try the following on both 1.3 > and 1.4 ? > > >>> np.array(False).all().sum(1) > > Oh crap, it's mostly my fault: > > http://projects.scipy.org/numpy/ticket/1286 > http://projects.scipy.org/numpy/changeset/7697 > > http://projects.scipy.org/numpy/browser/trunk/doc/release/1.4.0-notes.rst#deprecations > > Pretty embarassing, as very simple things break, although the test suite > miraculously passes... > > > Back to your problem: I'll fix that ASAIC, but it'll be on the SVN. > Meanwhile, you can: > > * Use -1 instead of 1 for your axis. > > * Force the definition of a mask when you define your array with > masked_array(...,mask=False) > > Sounds like we need a 1.4.1 out at some point not too far in the future, > then. > > If so, then it should be sooner rather than later in order to sync with the releases of ubuntu and fedora. Both of the upcoming releases still use 1.3.0, but that could change... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian.walter at gmail.com Tue Jan 12 14:16:43 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 12 Jan 2010 20:16:43 +0100 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: <3d375d731001121038t226ed0c6md986851be317402d@mail.gmail.com> References: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> <3d375d731001121038t226ed0c6md986851be317402d@mail.gmail.com> Message-ID: On Tue, Jan 12, 2010 at 7:38 PM, Robert Kern wrote: > On Tue, Jan 12, 2010 at 12:31, Sebastian Walter > wrote: >> On Tue, Jan 12, 2010 at 7:09 PM, Robert Kern wrote: >>> On Tue, Jan 12, 2010 at 12:05, Sebastian Walter >>> wrote: >>>> Hello, >>>> I have a question about the augmented assignment statements *=, +=, etc. >>>> Apparently, the casting of types is not working correctly. Is this >>>> known resp. intended behavior of numpy? >>> >>> Augmented assignment modifies numpy arrays in-place, so the usual >>> casting rules for assignment into an array apply. Namely, the array >>> being assigned into keeps its dtype. >> >> what are the usual casting rules? > > For assignment into an array, the array keeps its dtype and the data > being assigned into it will be cast to that dtype. > >> How does numpy know how to cast an object to a float? > > For a general object, numpy will call its __float__ method. 1) the object does not have a __float__ method. 2) I've now implemented the __float__ method (to raise an error). However, it doesn't get called. All objects are casted to 1. > >>> If you do not want in-place modification, do not use augmented assignment. >> >> Normally, I'd be perfectly fine with that. >> However, this particular problem occurs when you try to automatically >> differentiate an algorithm by using an Algorithmic Differentiation >> (AD) tool. >> E.g. given a function >> >> x = numpy.ones(2) >> def f(x): >> ? a = numpy.ones(2) >> ? a *= x >> ? return numpy.sum(a) >> >> one would use an AD tool as follows: >> x = numpy.array([adouble(1.), adouble(1.)]) >> y = f(x) >> >> but since the casting from object to float is not possible the >> computed gradient \nabla_x f(x) will be wrong. > > Sorry, but that's just a limitation of the AD approach. There are all > kinds of numpy constructions that AD can't handle. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Tue Jan 12 14:34:11 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 12 Jan 2010 11:34:11 -0800 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: References: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> <3d375d731001121038t226ed0c6md986851be317402d@mail.gmail.com> Message-ID: <4B4CCEB3.5030307@noaa.gov> Sebastian Walter wrote: >>> However, this particular problem occurs when you try to automatically >>> differentiate an algorithm by using an Algorithmic Differentiation >>> (AD) tool. >>> E.g. 
given a function >>> >>> x = numpy.ones(2) >>> def f(x): >>> a = numpy.ones(2) >>> a *= x >>> return numpy.sum(a) I don't know anything about AD, but in general, when I write a function that requires a given numpy array type as input, I'll do something like: def f(x): x = np.asarray(a, dtype=np.float) a = np.ones(2) a *= x return np.sum(a) That makes the casting explicit, and forces it to happen at the top of the function, where the error will be more obvious. asarray will just pass through a conforming array, so little performance penalty when you do give it the right type. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Jan 12 14:50:43 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 12 Jan 2010 14:50:43 -0500 Subject: [Numpy-discussion] Numpy 1.4 MaskedArray bug? In-Reply-To: References: <1263321129.7167.12.camel@idol> Message-ID: <910F1C67-06BB-47C1-8FCA-057131728D47@gmail.com> On Jan 12, 2010, at 1:52 PM, Charles R Harris wrote: > > > > On Tue, Jan 12, 2010 at 11:32 AM, Pauli Virtanen wrote: > ti, 2010-01-12 kello 12:51 -0500, Pierre GM kirjoitti: > [clip] > > >>>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) > > >>>> numpy.ma.sum(a, 1) > > > Traceback (most recent call last): > > > File "", line 1, in > > > File > > > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > > > umpy/ma/core.py", line 5682, in __call__ > > > return method(*args, **params) > > > File > > > "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux-x86_64.egg/n > > > umpy/ma/core.py", line 4357, in sum > > > newmask = _mask.all(axis=axis) > > > ValueError: axis(=1) out of bounds > > > > Confirmed. > > Before I take full blame for it, can you try the following on both 1.3 and 1.4 ? > > >>> np.array(False).all().sum(1) > > Oh crap, it's mostly my fault: > > http://projects.scipy.org/numpy/ticket/1286 > http://projects.scipy.org/numpy/changeset/7697 > http://projects.scipy.org/numpy/browser/trunk/doc/release/1.4.0-notes.rst#deprecations > > Pretty embarassing, as very simple things break, although the test suite > miraculously passes... > > > Back to your problem: I'll fix that ASAIC, but it'll be on the SVN. Meanwhile, you can: > > * Use -1 instead of 1 for your axis. > > * Force the definition of a mask when you define your array with masked_array(...,mask=False) > > Sounds like we need a 1.4.1 out at some point not too far in the future, > then. > > > If so, then it should be sooner rather than later in order to sync with the releases of ubuntu and fedora. Both of the upcoming releases still use 1.3.0, but that could change... I guess that the easiest would be for me to provide a workaround for the bug (Pauli's modifications make sense, I was relying on a *feature* that wasn't very robust). I'll update both the trunk and the 1.4.x branch From ms at TheBrookhavenGroup.com Tue Jan 12 15:33:02 2010 From: ms at TheBrookhavenGroup.com (Marc Schwarzschild) Date: Tue, 12 Jan 2010 15:33:02 -0500 Subject: [Numpy-discussion] numpy sum table by category Message-ID: <19276.56446.757824.576522@ny.koplon.com> I have a csv file like this: Account, Symbol, Quantity, Price One,SPY,5,119.00 One,SPY,3,120.00 One,SPY,-2,125.00 One,GE,... One,GE,... Two,SPY, ... Three,GE, ... ... The data is much larger, could be 10,000 records. 
I can load it into a numpy array using matplotlib.mlab.csv2rec(). I learned several useful numpy functions and have been reading lots of documentation. However, I have not found a way to create a unique list of symbols and the Sum of their respective Quantity values. I want do various calculations on the data like pull out all the records for a given Account. The actual data has lots more columns and sometimes I may want to sum(Quantity*Price) by Account and Symbol. I'm attracted to numpy for speed but would welcome alternative suggestions. I tried unsuccessfully to install PyTables on my Ubuntu system and abandoned that avenue. Can anyone provide some examples on how to do this or point me to documentation? Much appreciated. _________________________________________________________ Marc Schwarzschild The Brookhaven Group, LLC 1-212-580-1175 Analytics for Hedge Fund Investors Risk it, carefully! From josef.pktd at gmail.com Tue Jan 12 16:08:44 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Jan 2010 16:08:44 -0500 Subject: [Numpy-discussion] numpy sum table by category In-Reply-To: <19276.56446.757824.576522@ny.koplon.com> References: <19276.56446.757824.576522@ny.koplon.com> Message-ID: <1cd32cbb1001121308k6e026be5k5ac79d66038268ed@mail.gmail.com> On Tue, Jan 12, 2010 at 3:33 PM, Marc Schwarzschild wrote: > > > I have a csv file like this: > > ? ?Account, Symbol, Quantity, Price > ? ?One,SPY,5,119.00 > ? ?One,SPY,3,120.00 > ? ?One,SPY,-2,125.00 > ? ?One,GE,... > ? ?One,GE,... > ? ?Two,SPY, ... > ? ?Three,GE, ... > ? ? ... > > The data is much larger, could be 10,000 records. ?I can load it > into a numpy array using matplotlib.mlab.csv2rec(). ?I learned > several useful numpy functions and have been reading lots of > documentation. ?However, I have not found a way to create a > unique list of symbols and the Sum of their respective Quantity > values. ?I want do various calculations on the data like pull out > all the records for a given Account. ?The actual data has lots > more columns and sometimes I may want to sum(Quantity*Price) by > Account and Symbol. > > I'm attracted to numpy for speed but would welcome alternative > suggestions. > > I tried unsuccessfully to install PyTables on my Ubuntu system > and abandoned that avenue. > > Can anyone provide some examples on how to do this or point me to > documentation? If you don't want to do a lot of programming yourself, then I recommend tabular for this, which looks good for this kind of spreadsheet like operations, alternatively pandas. Josef > > Much appreciated. > > _________________________________________________________ > Marc Schwarzschild ? ? ? ? ? ? ?The Brookhaven Group, LLC > 1-212-580-1175 ? ? ? ? Analytics for Hedge Fund Investors > ? ? ? ? ? ? ? ? Risk it, carefully! > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Tue Jan 12 20:19:35 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 12 Jan 2010 17:19:35 -0800 Subject: [Numpy-discussion] fromfile() -- aarrgg! 
In-Reply-To: <4B4CC02A.9020807@noaa.gov> References: <4B42905A.4080105@noaa.gov> <1262721695.5107.1.camel@idol> <4B463F37.4010108@noaa.gov> <1cd32cbb1001071232l6c3d3525g4ec4747d62d998ed@mail.gmail.com> <4B465605.3010406@noaa.gov> <1cd32cbb1001071515p498c8746u5dce34453346c97f@mail.gmail.com> <4B466E4C.10504@noaa.gov> <4B46889E.3020008@noaa.gov> <4B4BBE2E.9090706@noaa.gov> <1263285477.7976.10.camel@talisman> <4B4CC02A.9020807@noaa.gov> Message-ID: <4B4D1FA7.9070205@noaa.gov> Christopher Barker wrote: > static int > @fname at _fromstr(char *str, @type@ *ip, char **endptr, PyArray_Descr > *NPY_UNUSED(ignore)) > { > double result; > result = NumPyOS_ascii_strtod(str, endptr); > *ip = (@type@) result; > return 0; > } OK, I've done the diagnostics, but not all of the fix. Here's the issue: numpyos.c: NumPyOS_ascii_strtod() Was incrementing the input pointer to strip out whitespace before passing it on to PyOS_ascii_strtod(). So the **endptr getting passed back to @fname at _fromstr didn't match. I've fixed that -- so now it should be possible to check if str and *endptr are the same after the call, to see if a double was actually read -- I"m not suite sure what to do in that case, but a return code is a good start. However, I also took a look at integers. For example: In [39]: np.fromstring("4.5, 3", sep=',', dtype=np.int) Out[39]: array([4]) clearly wrong -- it may be OK to read "4.5" as 4, but then it stops, I guess because there is a ".5" before the next sep. Anyway, not the best solution. However, in this case, the function is here: @fname at _fromstr(char *str, @type@ *ip, char **endptr, PyArray_Descr *NPY_UNUSED(ignore)) { @btype@ result; result = PyOS_strto at func@(str, endptr, 10); *ip = (@type@) result; printf("In int fromstr - result: %i\n", result ); printf("In int fromstr - str: '%s', %p %p\n", str, str, *endptr); return 0; } so it's calling PyOS_strtol(), which when called on "4.5" returns 4 -- which explains the abive behaviou -- but how to know that that wasn't a proper reading? This really is a mess! Since there was just some talk about a 1.4.1 -- I'd like to get some of this fixed before then -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From denis-bz-py at t-online.de Wed Jan 13 05:47:22 2010 From: denis-bz-py at t-online.de (denis) Date: Wed, 13 Jan 2010 11:47:22 +0100 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: <3d375d731001120841y7bc9bb9cla1b49c6deb2b5232@mail.gmail.com> References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> <3d375d731001120841y7bc9bb9cla1b49c6deb2b5232@mail.gmail.com> Message-ID: On 12/01/2010 17:41, Robert Kern wrote: > It's not a bug, but it is a known issue. We tried very hard to keep > numpy 1.4 binary compatible; however, Pyrex and Cython impose > additional runtime checks above and beyond binary compatibility. Robert, Josef, are you saying that mac users shouldn't expect numpy-1.4.0-py2.6-python.org.dmg scipy-0.7.1-py2.6-python.org.dmg to "just work" together, download and go ? If not, then the download pages should clearly say "... may not work with ..." (If they weren't tested together, that's imho a problem in the process; I realize that testing is hard work, no glory.) 
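(A smoke test for that would not have to be elaborate -- something along the lines of the one-liner below, run once per pair of installers; this is only an illustration of the idea, not an existing script:

    python -c "import numpy, scipy.stats; print numpy.__version__, scipy.__version__"

an incompatibility between the two binaries would show up right at the import of the compiled scipy subpackage.)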
cheers -- denis From eadrogue at gmx.net Wed Jan 13 06:57:03 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Wed, 13 Jan 2010 12:57:03 +0100 Subject: [Numpy-discussion] numpy sum table by category In-Reply-To: <19276.56446.757824.576522@ny.koplon.com> References: <19276.56446.757824.576522@ny.koplon.com> Message-ID: <20100113115702.GA6116@doriath.local> 12/01/10 @ 15:33 (-0500), thus spake Marc Schwarzschild: > > > I have a csv file like this: > > Account, Symbol, Quantity, Price > One,SPY,5,119.00 > One,SPY,3,120.00 > One,SPY,-2,125.00 > One,GE,... > One,GE,... > Two,SPY, ... > Three,GE, ... > ... > > The data is much larger, could be 10,000 records. I can load it > into a numpy array using matplotlib.mlab.csv2rec(). I learned > several useful numpy functions and have been reading lots of > documentation. However, I have not found a way to create a > unique list of symbols and the Sum of their respective Quantity > values. If x is your record array: for sym in set(x['Symbol']): mask = x['Symbol'] == sym print sym, x[mask]['Quantity'].sum() > I want do various calculations on the data like pull out > all the records for a given Account. The actual data has lots > more columns and sometimes I may want to sum(Quantity*Price) by > Account and Symbol. To get a subset of records matching an arbitrary criteria, you use boolean arrays. For example, x['Account'] == 'name' generates a boolean array of the same length as x, with each element being True or False depending on whether in that record the Account field was equal to 'name'. Then such arrays can be used as an index on the original x array, to get the subset of records. This is what the example above does. Cheers. From david at silveregg.co.jp Wed Jan 13 23:41:13 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 14 Jan 2010 13:41:13 +0900 Subject: [Numpy-discussion] numpy1.4 dtype issues: scipy.stats & pytables In-Reply-To: References: <3D845F44-C5F5-4F56-8233-E97C2D5A6CE2@gmail.com> <1cd32cbb1001110910r2cde5bdgac72e7d176ead46a@mail.gmail.com> <3d375d731001120841y7bc9bb9cla1b49c6deb2b5232@mail.gmail.com> Message-ID: <4B4EA069.8040103@silveregg.co.jp> denis wrote: > On 12/01/2010 17:41, Robert Kern wrote: > >> It's not a bug, but it is a known issue. We tried very hard to keep >> numpy 1.4 binary compatible; however, Pyrex and Cython impose >> additional runtime checks above and beyond binary compatibility. > > Robert, Josef, > are you saying that mac users shouldn't expect > numpy-1.4.0-py2.6-python.org.dmg > scipy-0.7.1-py2.6-python.org.dmg > to "just work" together, download and go ? It would not work for the concerned subpackages, no. > If not, then the download pages should clearly say "... may not work with ..." > (If they weren't tested together, that's imho a problem in the process; > I realize that testing is hard work, no glory.) It is not so much hard-work than time consuming, at least as long as we don't have automated testing of binaries. Unfortunately, the problem was not caught properly during the beta phase, David From cournape at gmail.com Thu Jan 14 00:02:47 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 14 Jan 2010 14:02:47 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above Message-ID: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> Hi, as I already hinted to some people in person, I won't make new releases of numpy (and scipy) as I used to. 
To ease the transition, I think it would be good to have new people with who we could make say the 1.4.1 release together. Making installers is relatively streamlined now, it can be done almost 100 % automatically - the testing is still manual. I still think it would be a good idea to have a different release manager for each release - it may be easier to find someone to do it if it is only for one release cycle. cheers, David From charlesr.harris at gmail.com Thu Jan 14 01:07:16 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Jan 2010 23:07:16 -0700 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> Message-ID: On Wed, Jan 13, 2010 at 10:02 PM, David Cournapeau wrote: > Hi, > > as I already hinted to some people in person, I won't make new > releases of numpy (and scipy) as I used to. To ease the transition, I > think it would be good to have new people with who we could make say > the 1.4.1 release together. Making installers is relatively > streamlined now, it can be done almost 100 % automatically - the > testing is still manual. > > I still think it would be a good idea to have a different release > manager for each release - it may be easier to find someone to do it > if it is only for one release cycle. > > What is the setup one needs to build the installers? It might be well to document that, the dependencies, and the process. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Thu Jan 14 01:34:10 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 14 Jan 2010 15:34:10 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> Message-ID: <4B4EBAE2.7040703@silveregg.co.jp> Charles R Harris wrote: > > > What is the setup one needs to build the installers? It might be well to > document that, the dependencies, and the process. Right. The top script is: http://projects.scipy.org/numpy/browser/trunk/release.sh the bulk of the work is in : http://projects.scipy.org/numpy/browser/trunk/pavement.py which describes what is needed to build installers. On mac os x, the release script may be used as is to build every installer + the release notes. David From jonboym2 at yahoo.co.uk Thu Jan 14 03:55:46 2010 From: jonboym2 at yahoo.co.uk (Jon Moore) Date: Thu, 14 Jan 2010 08:55:46 +0000 Subject: [Numpy-discussion] Getting Callbacks with arrays to work In-Reply-To: <4B4C7CC1.3010304@cens.ioc.ee> References: <832556.93262.qm@web24504.mail.ird.yahoo.com> <4B4C7CC1.3010304@cens.ioc.ee> Message-ID: <4B4EDC12.3050409@yahoo.co.uk> Hi, Thanks all works now! The implicit none only didn't work when defining dv as a function now its a subroutine it seems to work. Regards Jon On 12/01/2010 13:44, Pearu Peterson wrote: > Hi, > > The problem is that f2py does not support callbacks that > return arrays. There is easy workaround to that: provide > returnable arrays as arguments to callback functions. 
> Using your example: > > SUBROUTINE CallbackTest(dv,v0,Vout,N) > IMPLICIT NONE > > !F2PY intent( hide ):: N > INTEGER:: N, ic > EXTERNAL:: dv > > DOUBLE PRECISION, DIMENSION( N ), INTENT(IN):: v0 > DOUBLE PRECISION, DIMENSION( N ), INTENT(OUT):: Vout > > DOUBLE PRECISION, DIMENSION( N ):: Vnow > DOUBLE PRECISION, DIMENSION( N ):: temp > > Vnow = v0 > !f2py intent (out) temp > call dv(temp, Vnow, N) > > DO ic = 1, N > Vout( ic ) = temp(ic) > END DO > > END SUBROUTINE CallbackTest > > $ f2py -c test.f90 -m t --fcompiler=gnu95 > >>>> from numpy import * >>>> from t import * >>>> arr = array([2.0, 4.0, 6.0, 8.0]) >>>> def dV(v): > print 'in Python dV: V is: ',v > ret = v.copy() > ret[1] = 100.0 > return ret > ... >>>> output = callbacktest(dV, arr) > in Python dV: V is: [ 2. 4. 6. 8.] >>>> output > array([ 2., 100., 6., 8.]) > > What problems do you have with implicit none? It works > fine here. Check the format of your source code, > if it is free then use `.f90` extension, not `.f`. > > HTH, > Pearu > > Jon Moore wrote: >> Hi, >> >> I'm trying to build a differential equation integrator and later a >> stochastic differential equation integrator. >> >> I'm having trouble getting f2py to work where the callback itself >> receives an array from the Fortran routine does some work on it and then >> passes an array back. >> >> For the stoachastic integrator I'll need 2 callbacks both dealing with >> arrays. >> >> The idea is the code that never changes (ie the integrator) will be in >> Fortran and the code that changes (ie the callbacks defining >> differential equations) will be different for each problem. >> >> To test the idea I've written basic code which should pass an array back >> and forth between Python and Fortran if it works right. >> >> Here is some code which doesn't work properly:- >> >> SUBROUTINE CallbackTest(dv,v0,Vout,N) >> !IMPLICIT NONE >> >> cF2PY intent( hide ):: N >> INTEGER:: N, ic >> >> EXTERNAL:: dv >> >> DOUBLE PRECISION, DIMENSION( N ), INTENT(IN):: v0 >> DOUBLE PRECISION, DIMENSION( N ), INTENT(OUT):: Vout >> >> DOUBLE PRECISION, DIMENSION( N ):: Vnow >> DOUBLE PRECISION, DIMENSION( N ):: temp >> >> Vnow = v0 >> >> >> temp = dv(Vnow, N) >> >> DO ic = 1, N >> Vout( ic ) = temp(ic) >> END DO >> >> END SUBROUTINE CallbackTest >> >> >> >> When I test it with this python code I find the code just replicates the >> first term of the array! >> >> >> >> >> from numpy import * >> import callback as c >> >> def dV(v): >> print 'in Python dV: V is: ',v >> return v.copy() >> >> arr = array([2.0, 4.0, 6.0, 8.0]) >> >> print 'Arr is: ', arr >> >> output = c.CallbackTest(dV, arr) >> >> print 'Out is: ', output >> >> >> >> >> Arr is: [ 2. 4. 6. 8.] >> >> in Python dV: V is: [ 2. 4. 6. 8.] >> >> Out is: [ 2. 2. 2. 2.] >> >> >> >> Any ideas how I should do this, and also how do I get the code to work >> with implicit none not commented out? 
>> >> Thanks >> >> Jon >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian.walter at gmail.com Thu Jan 14 04:11:29 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Thu, 14 Jan 2010 10:11:29 +0100 Subject: [Numpy-discussion] wrong casting of augmented assignment statements In-Reply-To: <3d375d731001121038t226ed0c6md986851be317402d@mail.gmail.com> References: <3d375d731001121009p18e01199y2506b8a205d135fa@mail.gmail.com> <3d375d731001121038t226ed0c6md986851be317402d@mail.gmail.com> Message-ID: I've written a self-contained example that shows that numpy indeed tries to call the __float__ method. What is buggy is what happens if calling the __float__ method raises an Exception. Then numpy assumes (in this case wrongly) that the object should be casted to the neutral element. I'd guess that the __float__ method is called somewhere in a try: statement and if an exception is raised it is casted to the neutral element. I've tried to locate the corresponding code in the numpy sources but I got lost. Could someone be so kind and point me to it? -------------------- start code ---------------------- import numpy print 'numpy.__version__ = ',numpy.__version__ class ad1: def __init__(self,x): self.x = x def __mul__(self,other): if not isinstance(other, self.__class__): return self.__class__(self.x * other) return self.__class__(self.x * other.x) def __rmul__(self,other): return self * other def __float__(self): raise Exception('this is not possible') def __str__(self): return str(self.x) print '\nThis example yields buggy behavior:' x1 = numpy.array([ad1(1.), ad1(2.), ad1(3.)]) y1 = numpy.random.rand(3) print 'y1= ',y1 print 'x1= ',x1 z1 = x1 * y1 y1 *= x1 # this should call the __float__ method of ad1 which would raise an Exception print 'z1=x1*y1',z1 print 'y1*=x1 ',y1 class ad2: def __init__(self,x): self.x = x def __mul__(self,other): if not isinstance(other, self.__class__): return self.__class__(self.x * other) return self.__class__(self.x * other.x) def __rmul__(self,other): return self * other def __float__(self): return float(self.x) def __str__(self): return str(self.x) print '\nThis example works fine:' x2 = numpy.array([ad2(1.), ad2(2.), ad2(3.)]) y2 = numpy.random.rand(3) print 'y2= ',y2 print 'x2= ',x2 z2 = x2 * y2 y2 *= x2 # this should call the __float__ method of ad1 which would raise an Exception print 'z2=x2*y2',z2 print 'y2*=x2 ',y2 -------------------- end code ---------------------- -------- output --------- walter at wronski$ python wrong_casting_object_to_float_of_augmented_assignment_statements.py numpy.__version__ = 1.3.0 This example yields buggy behavior: y1= [ 0.15322371 0.47915903 0.81153995] x1= [1.0 2.0 3.0] z1=x1*y1 [0.153223711127 0.958318053803 2.43461983729] y1*=x1 [ 0.15322371 0.47915903 0.81153995] This example works fine: y2= [ 0.49377037 0.60908423 0.79772095] x2= [1.0 2.0 3.0] z2=x2*y2 [0.493770370747 1.21816846399 2.39316283707] y2*=x2 [ 0.49377037 1.21816846 2.39316284] -------- end output --------- On Tue, Jan 12, 2010 at 7:38 PM, Robert Kern wrote: > On Tue, Jan 12, 2010 at 12:31, Sebastian Walter > wrote: >> On Tue, Jan 12, 2010 at 7:09 PM, Robert 
Kern wrote: >>> On Tue, Jan 12, 2010 at 12:05, Sebastian Walter >>> wrote: >>>> Hello, >>>> I have a question about the augmented assignment statements *=, +=, etc. >>>> Apparently, the casting of types is not working correctly. Is this >>>> known resp. intended behavior of numpy? >>> >>> Augmented assignment modifies numpy arrays in-place, so the usual >>> casting rules for assignment into an array apply. Namely, the array >>> being assigned into keeps its dtype. >> >> what are the usual casting rules? > > For assignment into an array, the array keeps its dtype and the data > being assigned into it will be cast to that dtype. > >> How does numpy know how to cast an object to a float? > > For a general object, numpy will call its __float__ method. > >>> If you do not want in-place modification, do not use augmented assignment. >> >> Normally, I'd be perfectly fine with that. >> However, this particular problem occurs when you try to automatically >> differentiate an algorithm by using an Algorithmic Differentiation >> (AD) tool. >> E.g. given a function >> >> x = numpy.ones(2) >> def f(x): >> ? a = numpy.ones(2) >> ? a *= x >> ? return numpy.sum(a) >> >> one would use an AD tool as follows: >> x = numpy.array([adouble(1.), adouble(1.)]) >> y = f(x) >> >> but since the casting from object to float is not possible the >> computed gradient \nabla_x f(x) will be wrong. > > Sorry, but that's just a limitation of the AD approach. There are all > kinds of numpy constructions that AD can't handle. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Thu Jan 14 04:53:16 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 14 Jan 2010 18:53:16 +0900 Subject: [Numpy-discussion] Matrix vs array in ma.minimum Message-ID: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> Hi, I encountered a problem in matlab which boils down to a surprising behavior of np.ma.minimum: x = np.random.randn(2, 3) mx = np.matrix(x) np.ma.minimum(x) # smallest item of x ret = np.ma.minimum(mx) # flattened version of mx, i.e. ret == mx.flatten() Is this expected ? cheers, David From pgmdevlist at gmail.com Thu Jan 14 07:22:02 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 14 Jan 2010 07:22:02 -0500 Subject: [Numpy-discussion] Matrix vs array in ma.minimum In-Reply-To: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> References: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> Message-ID: <6C95E82E-B37F-47CC-A80D-995C779331BF@gmail.com> On Jan 14, 2010, at 4:53 AM, David Cournapeau wrote: > Hi, > > I encountered a problem in matlab which boils down to a surprising > behavior of np.ma.minimum: > > x = np.random.randn(2, 3) > mx = np.matrix(x) > > np.ma.minimum(x) # smallest item of x > ret = np.ma.minimum(mx) # flattened version of mx, i.e. ret == mx.flatten() > > Is this expected ? Er, no. np.ma.minimum(a, b) returns the lowest value of a and b element-wsie, or the the lowest element of a is b is None. The behavior is inherited from the very first implementation of maskedarray in numeric. This itself is unexpected, since np.minimum requires at least 2 input arguments. 
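To spell out the two calling conventions on a plain float array (a made-up 2x3 example, not taken from the thread):

    import numpy as np
    a = np.array([[3., 1., 2.],
                  [0., 5., 4.]])
    np.minimum(a, 2.)      # the ufunc: two arguments required, element-wise minimum
    np.ma.minimum(a, 2.)   # two arguments: same element-wise behavior
    np.ma.minimum(a)       # one argument, axis=None: the smallest element of a, here 0.0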
As you observed, the current function breaks down w/ np.matrix objects when only one argument is given (and when the axis is None): we call umath.minimum.reduce on the ravelled matirx, which returns the ravelled matrix. One would expect a scalar, so yes, this behavior is also unexpected. Now, which way should we go ? Keep np.ma.minimum as it is (fixing the bug so that a scalar is returned if the function is called with only 1 argument and an axis None) ? Adapt it to match np.minimum ? From matthew.brett at gmail.com Thu Jan 14 08:01:48 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 14 Jan 2010 13:01:48 +0000 Subject: [Numpy-discussion] dtype.isbuiltin changed by .newbyteorder Message-ID: <1e2af89e1001140501m188703d6k46cf68c0ec05cacf@mail.gmail.com> Hi, Over on the scipy list, someone pointed out an oddness in the output of the matlab reader, which revealed this - to me - unexpected behavior in numpy: In [20]: dt = np.dtype('f8') In [21]: dt.isbuiltin Out[21]: 1 In [22]: ndt = dt.newbyteorder('<') In [23]: ndt.isbuiltin Out[23]: 0 I was expecting the 'isbuiltin' attribute to be the same (1) after byte swapping. Does that seem reasonable to y'all? Then, is this a bug? Thanks a lot, Matthew From robert.kern at gmail.com Thu Jan 14 10:53:14 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Jan 2010 09:53:14 -0600 Subject: [Numpy-discussion] dtype.isbuiltin changed by .newbyteorder In-Reply-To: <1e2af89e1001140501m188703d6k46cf68c0ec05cacf@mail.gmail.com> References: <1e2af89e1001140501m188703d6k46cf68c0ec05cacf@mail.gmail.com> Message-ID: <3d375d731001140753o371228a2v160c97af9a06e97b@mail.gmail.com> On Thu, Jan 14, 2010 at 07:01, Matthew Brett wrote: > Hi, > > Over on the scipy list, someone pointed out an oddness in the output > of the matlab reader, which revealed this - to me - unexpected > behavior in numpy: > > In [20]: dt = np.dtype('f8') > > In [21]: dt.isbuiltin > Out[21]: 1 > > In [22]: ndt = dt.newbyteorder('<') > > In [23]: ndt.isbuiltin > Out[23]: 0 > > I was expecting the 'isbuiltin' attribute to be the same (1) after > byte swapping. ? ?Does that seem reasonable to y'all? ?Then, is this a > bug? It is at least undesirable. It may not be a bug per se as I don't think that we guarantee that .isbuiltin is free from false negatives (though we do guarantee that it is free from false positives). The reason is that we would have to search the builtin dtypes for a match every time we create a new dtype object, and that could be more expensive than we care to do for *every* creation of a dtype object. It is possible that we can have a cheaper heuristic (native byte order and the standard typecodes) and that transformations like .newbyteorder() can have just a teeny bit more intelligent logic about how it transforms the .isbuiltin flag. Just for clarity and future googling, the issue is that when a native dtype has .newbyteorder() called on it to make a new dtype that has the *same* native byte order, the .isbuiltin flag incorrectly states that it is not builtin. Using .newbyteorder() to swap the byte order to the non-native byte order should and does cause the resulting dtype to not be considered builtin. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From matthew.brett at gmail.com Thu Jan 14 12:02:43 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 14 Jan 2010 17:02:43 +0000 Subject: [Numpy-discussion] dtype.isbuiltin changed by .newbyteorder In-Reply-To: <3d375d731001140753o371228a2v160c97af9a06e97b@mail.gmail.com> References: <1e2af89e1001140501m188703d6k46cf68c0ec05cacf@mail.gmail.com> <3d375d731001140753o371228a2v160c97af9a06e97b@mail.gmail.com> Message-ID: <1e2af89e1001140902p65e2bf72u8657353e654baa8@mail.gmail.com> Hi, > It is at least undesirable. It may not be a bug per se as I don't > think that we guarantee that .isbuiltin is free from false negatives > (though we do guarantee that it is free from false positives). The > reason is that we would have to search the builtin dtypes for a match > every time we create a new dtype object, and that could be more > expensive than we care to do for *every* creation of a dtype object. > It is possible that we can have a cheaper heuristic (native byte order > and the standard typecodes) and that transformations like > .newbyteorder() can have just a teeny bit more intelligent logic about > how it transforms the .isbuiltin flag. Thanks - that's very clear and helpful, and made me realize I didn't understand the builtin attribute. I suppose something like the following: output_dtype.isbuiltin = input_dtype.isbuiltin and new_byteorder == native would at least reduce the false negatives at little cost. Cheers, Matthew From lists at onerussian.com Thu Jan 14 16:50:57 2010 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 14 Jan 2010 16:50:57 -0500 Subject: [Numpy-discussion] comparison operators (e.g. ==) on array with dtype object do not work In-Reply-To: <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> References: <022.87efc5288ac90c7afc65516c6af78b0a@scipy.org> <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> Message-ID: <20100114215057.GX18213@onerussian.com> Dear NumPy People, First I want to apologize if I misbehaved on NumPy Trac by reopening the closed ticket http://projects.scipy.org/numpy/ticket/1362 but I still feel strongly that there is misunderstanding and the bug/defect is valid. I would appreciate if someone would waste more of his time to persuade me that I am wrong but please first read till the end: The issue, as originally reported, is demonstrated with: ,--- | > python -c 'import numpy as N; print N.__version__; a=N.array([1, (0,1)],dtype=object); print a==1; print a == (0,1), a[1] == (0,1)' | 1.5.0.dev | [ True False] | [False False] True `--- whenever I expected the last line to be [False True] True charris (thanks for all the efforts to enlighten me) summarized it as """the result was correct given that the tuple (0,1) was converted to an object array with elements 0 and 1. It is *not* converted to an array containing a tuple. """ and I was trying to argue that it is not the case in my example. 
It is the case in charris's example though whenever both elements are of the same length, or there is just a single tuple, i.e. ,--- | In [1]: array((0,1), dtype=object) | Out[1]: array([0, 1], dtype=object) | | In [2]: array((0,1), dtype=object).shape | Out[2]: (2,) `--- There I would not expect my comparison to be valid indeed. But lets see what happens in my case: ,--- | In [2]: array([1, (0,1)],dtype=object) | Out[2]: array([1, (0, 1)], dtype=object) | | *In [3]: array([1, (0,1)],dtype=object).shape | Out[3]: (2,) | | *In [4]: array([1, (0,1)],dtype=object)[1].shape | --------------------------------------------------------------------------- | AttributeError Traceback (most recent call | last) | | /home/yoh/proj/ in () | | AttributeError: 'tuple' object has no attribute 'shape' `--- So, as far as I see it, the array does contain an object of type tuple, which does not get correctly compared upon __eq__ operation. Am I wrong? Or does numpy internally somehow does convert 1st item (ie tuple) into an array, but casts it back to tuple upon __repr__ or __getitem__? Thanks in advance for feedback On Thu, 14 Jan 2010, NumPy Trac wrote: > #1362: comparison operators (e.g. ==) on array with dtype object do not work > -------------------------+-------------------------------------------------- > Reporter: yarikoptic | Owner: somebody > Type: defect | Status: closed > Priority: normal | Milestone: > Component: Other | Version: > Resolution: invalid | Keywords: > -------------------------+-------------------------------------------------- > Changes (by charris): > * status: reopened => closed > * resolution: => invalid > Old description: > > You can see this better with the '*' operator: > > {{{ > > In [8]: a * (0,2) > > Out[8]: array([0, (0, 1, 0, 1)], dtype=object) > > }}} > > Note how the tuple is concatenated with itself. The reason the original > > instance of a worked was that 1 and (0,1) are of different lengths, so > > the decent into the nested sequence types stopped at one level and a > > tuple is one of the elements. When you do something like ((0,1),(0,1)) > > the decent goes down two levels and you end up with a 2x2 array of > > integer objects. The rule of thumb for object arrays is that you get an > > array with as many indices as possible. Which is why object arrays are > > hard to create. Another example: > > {{{ > > In [10]: array([(1,2,3),(1,2)], dtype=object) > > Out[10]: array([(1, 2, 3), (1, 2)], dtype=object) > > In [11]: array([(1,2),(1,2)], dtype=object) > > Out[11]: > > array([[1, 2], > > [1, 2]], dtype=object) > > }}} > New description: > {{{ > python -c 'import numpy as N; print N.__version__; a=N.array([1, > (0,1)],dtype=object); print a==1; print a == (0,1), a[1] == (0,1)' > }}} > results in > {{{ > 1.5.0.dev > [ True False] > [False False] True > }}} > I expected last line to be > {{{ > [False True] True > }}} > So, it works for int but doesn't work for tuple... I guess it doesn't try > to compare element by element but does smth else. -- Yaroslav O. Halchenko Postdoctoral Fellow, Department of Psychological and Brain Sciences Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From warren.weckesser at enthought.com Thu Jan 14 17:49:09 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 14 Jan 2010 16:49:09 -0600 Subject: [Numpy-discussion] comparison operators (e.g. 
==) on array with dtype object do not work In-Reply-To: <20100114215057.GX18213@onerussian.com> References: <022.87efc5288ac90c7afc65516c6af78b0a@scipy.org> <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> <20100114215057.GX18213@onerussian.com> Message-ID: <4B4F9F65.2070308@enthought.com> Yaroslav Halchenko wrote: > Dear NumPy People, > > First I want to apologize if I misbehaved on NumPy Trac by reopening the > closed ticket > http://projects.scipy.org/numpy/ticket/1362 > but I still feel strongly that there is misunderstanding > and the bug/defect is valid. I would appreciate if someone would waste > more of his time to persuade me that I am wrong but please first read > till the end: > > The issue, as originally reported, is demonstrated with: > > ,--- > | > python -c 'import numpy as N; print N.__version__; a=N.array([1, (0,1)],dtype=object); print a==1; print a == (0,1), a[1] == (0,1)' > | 1.5.0.dev > | [ True False] > | [False False] True > `--- > > whenever I expected the last line to be > > [False True] True > > charris (thanks for all the efforts to enlighten me) summarized it as > > """the result was correct given that the tuple (0,1) was converted to an > object array with elements 0 and 1. It is *not* converted to an array > containing a tuple. """ > > and I was trying to argue that it is not the case in my example. It is > the case in charris's example though whenever both elements are of > the same length, or there is just a single tuple, i.e. > > The "problem" is that the tuple is converted to an array in the statement that does the comparison, not in the construction of the array. Numpy attempts to convert the right hand side of the == operator into an array. It then does the comparison using the two arrays. One way to get what you want is to create your own array and then do the comparison: In [1]: import numpy as np In [2]: a = np.array([1, (0,1)], dtype='O') In [3]: t = np.empty(1, dtype='O') In [4]: t[0] = (0,1) In [5]: a == t Out[5]: array([False, True], dtype=bool) In the above code, a numpy array 't' of objects with shape (1,) is created, and the single element is assigned the value (0,1). Then the comparison works as expected. More food for thought: In [6]: b = np.array([1, (0,1), "foo"], dtype='O') In [7]: b == 1 Out[7]: array([ True, False, False], dtype=bool) In [8]: b == (0,1) Out[8]: False In [9]: b == "foo" Out[9]: array([False, False, True], dtype=bool) Warren > ,--- > | In [1]: array((0,1), dtype=object) > | Out[1]: array([0, 1], dtype=object) > | > | In [2]: array((0,1), dtype=object).shape > | Out[2]: (2,) > `--- > > There I would not expect my comparison to be valid indeed. But lets see what > happens in my case: > > ,--- > | In [2]: array([1, (0,1)],dtype=object) > | Out[2]: array([1, (0, 1)], dtype=object) > | > | *In [3]: array([1, (0,1)],dtype=object).shape > | Out[3]: (2,) > | > | *In [4]: array([1, (0,1)],dtype=object)[1].shape > | --------------------------------------------------------------------------- > | AttributeError Traceback (most recent call > | last) > | > | /home/yoh/proj/ in () > | > | AttributeError: 'tuple' object has no attribute 'shape' > `--- > > So, as far as I see it, the array does contain an object of type tuple, > which does not get correctly compared upon __eq__ operation. Am I > wrong? Or does numpy internally somehow does convert 1st item (ie > tuple) into an array, but casts it back to tuple upon __repr__ or > __getitem__? 
> > Thanks in advance for feedback > > On Thu, 14 Jan 2010, NumPy Trac wrote: > > >> #1362: comparison operators (e.g. ==) on array with dtype object do not work >> -------------------------+-------------------------------------------------- >> Reporter: yarikoptic | Owner: somebody >> Type: defect | Status: closed >> Priority: normal | Milestone: >> Component: Other | Version: >> Resolution: invalid | Keywords: >> -------------------------+-------------------------------------------------- >> Changes (by charris): >> > > >> * status: reopened => closed >> * resolution: => invalid >> > > > >> Old description: >> > > >>> You can see this better with the '*' operator: >>> > > > >>> {{{ >>> In [8]: a * (0,2) >>> Out[8]: array([0, (0, 1, 0, 1)], dtype=object) >>> }}} >>> > > > >>> Note how the tuple is concatenated with itself. The reason the original >>> instance of a worked was that 1 and (0,1) are of different lengths, so >>> the decent into the nested sequence types stopped at one level and a >>> tuple is one of the elements. When you do something like ((0,1),(0,1)) >>> the decent goes down two levels and you end up with a 2x2 array of >>> integer objects. The rule of thumb for object arrays is that you get an >>> array with as many indices as possible. Which is why object arrays are >>> hard to create. Another example: >>> > > > >>> {{{ >>> In [10]: array([(1,2,3),(1,2)], dtype=object) >>> Out[10]: array([(1, 2, 3), (1, 2)], dtype=object) >>> > > >>> In [11]: array([(1,2),(1,2)], dtype=object) >>> Out[11]: >>> array([[1, 2], >>> [1, 2]], dtype=object) >>> }}} >>> > > >> New description: >> > > >> {{{ >> python -c 'import numpy as N; print N.__version__; a=N.array([1, >> (0,1)],dtype=object); print a==1; print a == (0,1), a[1] == (0,1)' >> }}} >> results in >> {{{ >> 1.5.0.dev >> [ True False] >> [False False] True >> }}} >> I expected last line to be >> {{{ >> [False True] True >> }}} >> So, it works for int but doesn't work for tuple... I guess it doesn't try >> to compare element by element but does smth else. >> From josef.pktd at gmail.com Thu Jan 14 18:40:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 14 Jan 2010 18:40:20 -0500 Subject: [Numpy-discussion] comparison operators (e.g. ==) on array with dtype object do not work In-Reply-To: <4B4F9F65.2070308@enthought.com> References: <022.87efc5288ac90c7afc65516c6af78b0a@scipy.org> <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> <20100114215057.GX18213@onerussian.com> <4B4F9F65.2070308@enthought.com> Message-ID: <1cd32cbb1001141540i398089f3xc4cecb684ca3ddfb@mail.gmail.com> On Thu, Jan 14, 2010 at 5:49 PM, Warren Weckesser wrote: > Yaroslav Halchenko wrote: >> Dear NumPy People, >> >> First I want to apologize if I misbehaved on NumPy Trac by reopening the >> closed ticket >> http://projects.scipy.org/numpy/ticket/1362 >> but I still feel strongly that there is misunderstanding >> and the bug/defect is valid. ? 
I would appreciate if someone would waste >> more of his time to persuade me that I am wrong but please first read >> till the end: >> >> The issue, as originally reported, is demonstrated with: >> >> ,--- >> | > python -c 'import numpy as N; print N.__version__; a=N.array([1, (0,1)],dtype=object); print a==1; print a == (0,1), ?a[1] == (0,1)' >> | 1.5.0.dev >> | [ True False] >> | [False False] True >> `--- >> >> whenever I expected the last line to be >> >> [False True] True >> >> charris (thanks for all the efforts to enlighten me) summarized it as >> >> """the result was correct given that the tuple (0,1) was converted to an >> object array with elements 0 and 1. It is *not* converted to an array >> containing a tuple. """ >> >> and I was trying to argue that it is not the case in my example. ?It is >> the case in charris's example though whenever both elements are of >> the same length, or there is just a single tuple, i.e. >> >> > > The "problem" is that the tuple is converted to an array in the > statement that > does the comparison, not in the construction of the array. ?Numpy attempts > to convert the right hand side of the == operator into an array. ?It > then does > the comparison using the two arrays. > > One way to get what you want is to create your own array and then do > the comparison: > > In [1]: import numpy as np > > In [2]: a = np.array([1, (0,1)], dtype='O') > > In [3]: t = np.empty(1, dtype='O') > > In [4]: t[0] = (0,1) > > In [5]: a == t > Out[5]: array([False, ?True], dtype=bool) > > > In the above code, a numpy array 't' of objects with shape (1,) is created, > and the single element is assigned the value (0,1). ?Then the comparison > works as expected. > > More food for thought: > > In [6]: b = np.array([1, (0,1), "foo"], dtype='O') > > In [7]: b == 1 > Out[7]: array([ True, False, False], dtype=bool) > > In [8]: b == (0,1) > Out[8]: False > > In [9]: b == "foo" > Out[9]: array([False, False, ?True], dtype=bool) > It looks difficult to construct an object array with only 1 element, since a tuple is interpreted as different array elements. >>> N.array([(0,1)],dtype=object).shape (1, 2) >>> N.array([(0,1),()],dtype=object).shape (2,) >>> c = N.array([(0,1),()],dtype=object)[:1] >>> c.shape1,) >>> a == c array([False, True], dtype=bool) It looks like some convention is necessary for interpreting a tuple in the array construction, but it doesn't look like a problem with the comparison operator just a consequence. Josef > Warren > >> ,--- >> | In [1]: array((0,1), dtype=object) >> | Out[1]: array([0, 1], dtype=object) >> | >> | In [2]: array((0,1), dtype=object).shape >> | Out[2]: (2,) >> `--- >> >> There I would not expect my comparison to be valid indeed. ?But lets see what >> happens in my case: >> >> ,--- >> | In [2]: array([1, (0,1)],dtype=object) >> | Out[2]: array([1, (0, 1)], dtype=object) >> | >> | *In [3]: array([1, (0,1)],dtype=object).shape >> | Out[3]: (2,) >> | >> | *In [4]: array([1, (0,1)],dtype=object)[1].shape >> | --------------------------------------------------------------------------- >> | AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call >> | last) >> | >> | /home/yoh/proj/ in () >> | >> | AttributeError: 'tuple' object has no attribute 'shape' >> `--- >> >> So, as far as I see it, the array does contain an object of type tuple, >> which does not get correctly compared upon __eq__ operation. ?Am I >> wrong? 
?Or does numpy internally somehow does convert 1st item (ie >> tuple) into an array, but casts it back to tuple upon __repr__ or >> __getitem__? >> >> Thanks in advance for feedback >> >> On Thu, 14 Jan 2010, NumPy Trac wrote: >> >> >>> #1362: comparison operators (e.g. ==) on array with dtype object do not work >>> -------------------------+-------------------------------------------------- >>> ? Reporter: ?yarikoptic ?| ? ? ? Owner: ?somebody >>> ? ? ? Type: ?defect ? ? ?| ? ? ?Status: ?closed >>> ? Priority: ?normal ? ? ?| ? Milestone: >>> ?Component: ?Other ? ? ? | ? ? Version: >>> Resolution: ?invalid ? ? | ? ?Keywords: >>> -------------------------+-------------------------------------------------- >>> Changes (by charris): >>> >> >> >>> ? * status: ?reopened => closed >>> ? * resolution: ?=> invalid >>> >> >> >> >>> Old description: >>> >> >> >>>> You can see this better with the '*' operator: >>>> >> >> >> >>>> {{{ >>>> In [8]: a * (0,2) >>>> Out[8]: array([0, (0, 1, 0, 1)], dtype=object) >>>> }}} >>>> >> >> >> >>>> Note how the tuple is concatenated with itself. The reason the original >>>> instance of a worked was that 1 and (0,1) are of different lengths, so >>>> the decent into the nested sequence types stopped at one level and a >>>> tuple is one of the elements. When you do something like ((0,1),(0,1)) >>>> the decent goes down two levels and you end up with a 2x2 array of >>>> integer objects. The rule of thumb for object arrays is that you get an >>>> array with as many indices as possible. Which is why object arrays are >>>> hard to create. Another example: >>>> >> >> >> >>>> {{{ >>>> In [10]: array([(1,2,3),(1,2)], dtype=object) >>>> Out[10]: array([(1, 2, 3), (1, 2)], dtype=object) >>>> >> >> >>>> In [11]: array([(1,2),(1,2)], dtype=object) >>>> Out[11]: >>>> array([[1, 2], >>>> ? ? ? ?[1, 2]], dtype=object) >>>> }}} >>>> >> >> >>> New description: >>> >> >> >>> ?{{{ >>> ?python -c 'import numpy as N; print N.__version__; a=N.array([1, >>> ?(0,1)],dtype=object); print a==1; print a == (0,1), ?a[1] == (0,1)' >>> ?}}} >>> ?results in >>> ?{{{ >>> ?1.5.0.dev >>> ?[ True False] >>> ?[False False] True >>> ?}}} >>> ?I expected last line to be >>> ?{{{ >>> ?[False True] True >>> ?}}} >>> ?So, it works for int but doesn't work for tuple... I guess it doesn't try >>> ?to compare element by element but does smth else. >>> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From lists at onerussian.com Thu Jan 14 19:05:16 2010 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 14 Jan 2010 19:05:16 -0500 Subject: [Numpy-discussion] comparison operators (e.g. ==) on array with dtype object do not work In-Reply-To: <1cd32cbb1001141540i398089f3xc4cecb684ca3ddfb@mail.gmail.com> References: <022.87efc5288ac90c7afc65516c6af78b0a@scipy.org> <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> <20100114215057.GX18213@onerussian.com> <4B4F9F65.2070308@enthought.com> <1cd32cbb1001141540i398089f3xc4cecb684ca3ddfb@mail.gmail.com> Message-ID: <20100115000515.GB19319@onerussian.com> On Thu, 14 Jan 2010, josef.pktd at gmail.com wrote: > It looks difficult to construct an object array with only 1 element, > since a tuple is interpreted as different array elements. yeap > It looks like some convention is necessary for interpreting a tuple in > the array construction, but it doesn't look like a problem with the > comparison operator just a consequence. 
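(a quick recap of that construction convention, only collecting the shapes already shown in this thread -- numpy imported as N, as in the snippets above:

    N.array((0,1), dtype=object).shape             # the tuple is split into elements -> (2,)
    N.array([(1,2,3),(1,2)], dtype=object).shape   # ragged lengths: descent stops, the tuples survive -> (2,)
    N.array([(1,2),(1,2)], dtype=object).shape     # equal lengths: descent continues -> (2, 2)
    t = N.empty(1, dtype=object)
    t[0] = (0,1)                                   # the explicit route to a 1-element object array holding a tuple
    t.shape                                        # -> (1,)
)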
Well -- there is a reason why we use tuples -- they are immutable ... as well as strings actually ;) Thus, imho, it would be a logical API if immutable datatypes are not coerced magically into mutable arrays at least whenever I am already requesting dtype='O'. Such generic treatment of immutable dtypes would address special treatment of strings but it is too much of a change and debatable anyways ;-) -- Yaroslav O. Halchenko Postdoctoral Fellow, Department of Psychological and Brain Sciences Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From lists at onerussian.com Thu Jan 14 19:05:41 2010 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 14 Jan 2010 19:05:41 -0500 Subject: [Numpy-discussion] comparison operators (e.g. ==) on array with dtype object do not work In-Reply-To: <4B4F9F65.2070308@enthought.com> References: <022.87efc5288ac90c7afc65516c6af78b0a@scipy.org> <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> <20100114215057.GX18213@onerussian.com> <4B4F9F65.2070308@enthought.com> Message-ID: <20100115000540.GF18218@onerussian.com> Hi Warren, > The "problem" is that the tuple is converted to an array in the > statement that does the comparison, not in the construction of the > array. Numpy attempts > to convert the right hand side of the == operator into an array. > It then does the comparison using the two arrays. Thanks for the description! It kinda makes sense now, although, in general, I am not pleased with the API, I would take it as a documented feature from now on ;) > One way to get what you want is to create your own array and then do > the comparison: yeah... I might like to check if lhs has dtype==dtype('object') and then convert that rhs item into object array before comparison (for now I just did list comprehension ;)) > In [8]: b == (0,1) > Out[8]: False yeah -- lengths are different now ;) > In [9]: b == "foo" > Out[9]: array([False, False, True], dtype=bool) yeah -- strings are getting special treatment despite being iterables ;) but that is ok I guess anyways The main confusion seems to come from the feature of numpy in doing smart things -- like deciding either it thinks it needs to do element-wise comparison across lhs and rhs (if lengths match) or mapping comparison across all items. That behavior is quite different from basic Python iterable containers suchas tuples and lists, where it does just global comparison: ,--- | *In [33]: [1,2] == [1,3] | Out[33]: False | | *In [34]: array([1,2]) == array([1,3]) | Out[34]: array([ True, False], dtype=bool) `--- I guess I just need to remember that and what you have described thanks again -- Yaroslav O. Halchenko Postdoctoral Fellow, Department of Psychological and Brain Sciences Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From david at silveregg.co.jp Thu Jan 14 20:52:02 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 15 Jan 2010 10:52:02 +0900 Subject: [Numpy-discussion] Matrix vs array in ma.minimum In-Reply-To: <6C95E82E-B37F-47CC-A80D-995C779331BF@gmail.com> References: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> <6C95E82E-B37F-47CC-A80D-995C779331BF@gmail.com> Message-ID: <4B4FCA42.1030205@silveregg.co.jp> Pierre GM wrote: > > Er, no. > np.ma.minimum(a, b) returns the lowest value of a and b element-wsie, or the the lowest element of a is b is None. 
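A short sketch of the two calling conventions just described, assuming the numpy.ma API of this era (the input values are made up for illustration):

import numpy as np
from numpy import ma

a = ma.array([[3., -1.], [7., 2.]])
b = ma.array([[2.,  5.], [0., 4.]])

# Two arguments: the lowest value of a and b element-wise, like np.minimum.
print ma.minimum(a, b)

# One argument: the lowest element of a, returned as a scalar --
# something np.minimum itself cannot do, since it needs two arguments.
print ma.minimum(a)

# With a np.matrix input the reduction works on the ravelled matrix and,
# as reported in this thread, a matrix comes back instead of a scalar.
print ma.minimum(np.matrix([[3., -1.], [7., 2.]]))
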
The behavior is inherited from the very first implementation of maskedarray in numeric. This itself is unexpected, since np.minimum requires at least 2 input arguments. > > As you observed, the current function breaks down w/ np.matrix objects when only one argument is given (and when the axis is None): we call umath.minimum.reduce on the ravelled matirx, which returns the ravelled matrix. One would expect a scalar, so yes, this behavior is also unexpected. > > Now, which way should we go ? Keep np.ma.minimum as it is (fixing the bug so that a scalar is returned if the function is called with only 1 argument and an axis None) ? Adapt it to match np.minimum ? I am not a user of Masked Array, so I don't know what is the most desirable behavior. The problem appears when using pylab.imshow on matrices, because matplotlib (and not matlab :) ) uses masked arrays when normalizing the values. cheers, David From pgmdevlist at gmail.com Thu Jan 14 21:59:53 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 14 Jan 2010 21:59:53 -0500 Subject: [Numpy-discussion] Matrix vs array in ma.minimum In-Reply-To: <4B4FCA42.1030205@silveregg.co.jp> References: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> <6C95E82E-B37F-47CC-A80D-995C779331BF@gmail.com> <4B4FCA42.1030205@silveregg.co.jp> Message-ID: <5CEE736D-90A8-4B59-A5DD-BDDEC71FA6C9@gmail.com> On Jan 14, 2010, at 8:52 PM, David Cournapeau wrote: > Pierre GM wrote: > >> >> Er, no. >> np.ma.minimum(a, b) returns the lowest value of a and b element-wsie, or the the lowest element of a is b is None. The behavior is inherited from the very first implementation of maskedarray in numeric. This itself is unexpected, since np.minimum requires at least 2 input arguments. >> >> As you observed, the current function breaks down w/ np.matrix objects when only one argument is given (and when the axis is None): we call umath.minimum.reduce on the ravelled matirx, which returns the ravelled matrix. One would expect a scalar, so yes, this behavior is also unexpected. >> >> Now, which way should we go ? Keep np.ma.minimum as it is (fixing the bug so that a scalar is returned if the function is called with only 1 argument and an axis None) ? Adapt it to match np.minimum ? > > I am not a user of Masked Array, so I don't know what is the most > desirable behavior. I'm not a regular user of np.minimum. > The problem appears when using pylab.imshow on > matrices, because matplotlib (and not matlab :) ) uses masked arrays > when normalizing the values. David, you mind pointing me to the relevan part of the code and/or give me an example ? In any case, I'd appreciate more feedback on the behavior of np.ma.minimum. From charlesr.harris at gmail.com Thu Jan 14 22:47:54 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Jan 2010 20:47:54 -0700 Subject: [Numpy-discussion] comparison operators (e.g. 
==) on array with dtype object do not work In-Reply-To: <4B4F9F65.2070308@enthought.com> References: <022.87efc5288ac90c7afc65516c6af78b0a@scipy.org> <031.76ccb962b442426fadc5c22295c7d17f@scipy.org> <20100114215057.GX18213@onerussian.com> <4B4F9F65.2070308@enthought.com> Message-ID: On Thu, Jan 14, 2010 at 3:49 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > Yaroslav Halchenko wrote: > > Dear NumPy People, > > > > First I want to apologize if I misbehaved on NumPy Trac by reopening the > > closed ticket > > http://projects.scipy.org/numpy/ticket/1362 > > but I still feel strongly that there is misunderstanding > > and the bug/defect is valid. I would appreciate if someone would waste > > more of his time to persuade me that I am wrong but please first read > > till the end: > > > > The issue, as originally reported, is demonstrated with: > > > > ,--- > > | > python -c 'import numpy as N; print N.__version__; a=N.array([1, > (0,1)],dtype=object); print a==1; print a == (0,1), a[1] == (0,1)' > > | 1.5.0.dev > > | [ True False] > > | [False False] True > > `--- > > > > whenever I expected the last line to be > > > > [False True] True > > > > charris (thanks for all the efforts to enlighten me) summarized it as > > > > """the result was correct given that the tuple (0,1) was converted to an > > object array with elements 0 and 1. It is *not* converted to an array > > containing a tuple. """ > > > > and I was trying to argue that it is not the case in my example. It is > > the case in charris's example though whenever both elements are of > > the same length, or there is just a single tuple, i.e. > > > > > > The "problem" is that the tuple is converted to an array in the > statement that > does the comparison, not in the construction of the array. Numpy attempts > to convert the right hand side of the == operator into an array. It > then does > the comparison using the two arrays. > > One way to get what you want is to create your own array and then do > the comparison: > > In [1]: import numpy as np > > In [2]: a = np.array([1, (0,1)], dtype='O') > > In [3]: t = np.empty(1, dtype='O') > > In [4]: t[0] = (0,1) > > In [5]: a == t > Out[5]: array([False, True], dtype=bool) > > > In the above code, a numpy array 't' of objects with shape (1,) is created, > and the single element is assigned the value (0,1). Then the comparison > works as expected. > > More food for thought: > > In [6]: b = np.array([1, (0,1), "foo"], dtype='O') > > In [7]: b == 1 > Out[7]: array([ True, False, False], dtype=bool) > > In [8]: b == (0,1) > Out[8]: False > > Oooh, that last one is strange. Also In [6]: arange(2) == arange(3) Out[6]: False So the comparison isn't element-wise. I rather think a shape mismatch error should be raised in this case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Jan 14 23:06:36 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 15 Jan 2010 13:06:36 +0900 Subject: [Numpy-discussion] Matrix vs array in ma.minimum In-Reply-To: <5CEE736D-90A8-4B59-A5DD-BDDEC71FA6C9@gmail.com> References: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> <6C95E82E-B37F-47CC-A80D-995C779331BF@gmail.com> <4B4FCA42.1030205@silveregg.co.jp> <5CEE736D-90A8-4B59-A5DD-BDDEC71FA6C9@gmail.com> Message-ID: <5b8d13221001142006u35c6e810gaf0af4806981a821@mail.gmail.com> On Fri, Jan 15, 2010 at 11:59 AM, Pierre GM wrote: > On Jan 14, 2010, at 8:52 PM, David Cournapeau wrote: >> Pierre GM wrote: >> >>> >>> Er, no. 
>>> np.ma.minimum(a, b) returns the lowest value of a and b element-wsie, or the the lowest element of a is b is None. The behavior is inherited from the very first implementation of maskedarray in numeric. This itself is unexpected, since np.minimum requires at least 2 input arguments. >>> >>> As you observed, the current function breaks down w/ np.matrix objects when only one argument is given (and when the axis is None): we call umath.minimum.reduce on the ravelled matirx, which returns the ravelled matrix. One would expect a scalar, so yes, this behavior is also unexpected. >>> >>> Now, which way should we go ? Keep np.ma.minimum as it is (fixing the bug so that a scalar is returned if the function is called with only 1 argument and an axis ?None) ? Adapt it to match np.minimum ? >> >> I am not a user of Masked Array, so I don't know what is the most >> desirable behavior. > > I'm not a regular user of np.minimum. Damn, I thought I coul > >> The problem appears when using pylab.imshow on >> matrices, because matplotlib (and not matlab :) ) uses masked arrays >> when normalizing the values. > > > David, you mind pointing me to the relevan part of the code and/or give me an example ? Here is a self-contained example reproducing the matplotlib pb: import numpy as np from numpy import ma import matplotlib.cbook as cbook class Normalize: """ Normalize a given value to the 0-1 range """ def __init__(self, vmin=None, vmax=None, clip=False): """ If *vmin* or *vmax* is not given, they are taken from the input's minimum and maximum value respectively. If *clip* is *True* and the given value falls outside the range, the returned value will be 0 or 1, whichever is closer. Returns 0 if:: vmin==vmax Works with scalars or arrays, including masked arrays. If *clip* is *True*, masked values are set to 1; otherwise they remain masked. Clipping silently defeats the purpose of setting the over, under, and masked colors in the colormap, so it is likely to lead to surprises; therefore the default is *clip* = *False*. """ self.vmin = vmin self.vmax = vmax self.clip = clip def __call__(self, value, clip=None): if clip is None: clip = self.clip if cbook.iterable(value): vtype = 'array' val = ma.asarray(value).astype(np.float) else: vtype = 'scalar' val = ma.array([value]).astype(np.float) self.autoscale_None(val) vmin, vmax = self.vmin, self.vmax if vmin > vmax: raise ValueError("minvalue must be less than or equal to maxvalue") elif vmin==vmax: return 0.0 * val else: if clip: mask = ma.getmask(val) val = ma.array(np.clip(val.filled(vmax), vmin, vmax), mask=mask) result = (val-vmin) * (1.0/(vmax-vmin)) if vtype == 'scalar': result = result[0] return result def inverse(self, value): if not self.scaled(): raise ValueError("Not invertible until scaled") vmin, vmax = self.vmin, self.vmax if cbook.iterable(value): val = ma.asarray(value) return vmin + val * (vmax - vmin) else: return vmin + value * (vmax - vmin) def autoscale(self, A): ''' Set *vmin*, *vmax* to min, max of *A*. 
''' self.vmin = ma.minimum(A) self.vmax = ma.maximum(A) def autoscale_None(self, A): ' autoscale only None-valued vmin or vmax' if self.vmin is None: self.vmin = ma.minimum(A) if self.vmax is None: self.vmax = ma.maximum(A) def scaled(self): 'return true if vmin and vmax set' return (self.vmin is not None and self.vmax is not None) if __name__ == "__main__": x = np.random.randn(10, 10) mx = np.matrix(x) print Normalize()(x) print Normalize()(mx) From charlesr.harris at gmail.com Fri Jan 15 00:36:31 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Jan 2010 22:36:31 -0700 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <4B4EBAE2.7040703@silveregg.co.jp> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> Message-ID: On Wed, Jan 13, 2010 at 11:34 PM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > > > What is the setup one needs to build the installers? It might be well to > > document that, the dependencies, and the process. > > Right. The top script is: > http://projects.scipy.org/numpy/browser/trunk/release.sh > > the bulk of the work is in : > http://projects.scipy.org/numpy/browser/trunk/pavement.py > > which describes what is needed to build installers. On mac os x, the > release script may be used as is to build every installer + the release > notes. > > Umm, I think it needs some more explanation. There are virtual environments, c compilers, wine, paver, etc. All/ of these might require some installation, version numbers, and setup. This might all seem clear to you, but a newbie coming on to build the packages probably needs more instruction. What sort of setup do you run, what hardware, etc. If code needs to be compiled for the PCC I assume the compiler needs to be to do that. What about c libraries (for numpy) and c++ libraries (for scipy)? Does one need a MAC? etc. I'm probably just ignorant, but I think a careful step by step procedure would be helpful. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Fri Jan 15 00:44:26 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 15 Jan 2010 14:44:26 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> Message-ID: <4B5000BA.1070701@silveregg.co.jp> Charles R Harris wrote: > > > On Wed, Jan 13, 2010 at 11:34 PM, David Cournapeau > > wrote: > > Charles R Harris wrote: > > > > > > > What is the setup one needs to build the installers? It might be > well to > > document that, the dependencies, and the process. > > Right. The top script is: > http://projects.scipy.org/numpy/browser/trunk/release.sh > > the bulk of the work is in : > http://projects.scipy.org/numpy/browser/trunk/pavement.py > > which describes what is needed to build installers. On mac os x, the > release script may be used as is to build every installer + the release > notes. > > > Umm, I think it needs some more explanation. There are virtual > environments, c compilers, wine, paver, etc. All/ of these might require > some installation, version numbers, and setup. This might all seem clear > to you, but a newbie coming on to build the packages probably needs more > instruction. I think it is a waste of time to document all this very precisely, because it is continuously changing. 
Documenting everything would boil down to rewrite the paver script in English (and most likely would be much more verbose). That's exactly why I was suggesting to have some volunteers to do 1.4.1 to do the release together, as a way to "pass the knowledge around". cheers, David From pgmdevlist at gmail.com Fri Jan 15 04:10:51 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 15 Jan 2010 04:10:51 -0500 Subject: [Numpy-discussion] Matrix vs array in ma.minimum In-Reply-To: <5b8d13221001142006u35c6e810gaf0af4806981a821@mail.gmail.com> References: <5b8d13221001140153ocd65f2fwc64c4820aa9c7a@mail.gmail.com> <6C95E82E-B37F-47CC-A80D-995C779331BF@gmail.com> <4B4FCA42.1030205@silveregg.co.jp> <5CEE736D-90A8-4B59-A5DD-BDDEC71FA6C9@gmail.com> <5b8d13221001142006u35c6e810gaf0af4806981a821@mail.gmail.com> Message-ID: <6C270179-7B32-4907-96A7-0F1477A8044E@gmail.com> On Jan 14, 2010, at 11:06 PM, David Cournapeau wrote: > On Fri, Jan 15, 2010 at 11:59 AM, Pierre GM wrote: >> On Jan 14, 2010, at 8:52 PM, David Cournapeau wrote: >>> Pierre GM wrote: >>> >>>> >>>> Er, no. >>>> np.ma.minimum(a, b) returns the lowest value of a and b element-wsie, or the the lowest element of a is b is None. The behavior is inherited from the very first implementation of maskedarray in numeric. This itself is unexpected, since np.minimum requires at least 2 input arguments. >>>> >>>> As you observed, the current function breaks down w/ np.matrix objects when only one argument is given (and when the axis is None): we call umath.minimum.reduce on the ravelled matirx, which returns the ravelled matrix. One would expect a scalar, so yes, this behavior is also unexpected. >>>> >>>> Now, which way should we go ? Keep np.ma.minimum as it is (fixing the bug so that a scalar is returned if the function is called with only 1 argument and an axis None) ? Adapt it to match np.minimum ? >>> >>> I am not a user of Masked Array, so I don't know what is the most >>> desirable behavior. >> >> I'm not a regular user of np.minimum. > > Damn, I thought I coul >> >>> The problem appears when using pylab.imshow on >>> matrices, because matplotlib (and not matlab :) ) uses masked arrays >>> when normalizing the values. >> >> >> David, you mind pointing me to the relevan part of the code and/or give me an example ? > > Here is a self-contained example reproducing the matplotlib pb: > OK, thx a lot. I'll work on it as soon as I can. From seb.haase at gmail.com Fri Jan 15 04:38:14 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 15 Jan 2010 10:38:14 +0100 Subject: [Numpy-discussion] broken links on http://numpy.scipy.org/ Message-ID: Hi, Apparently this very nice looking icons (4 of the 5 icons or so) at http://numpy.scipy.org/ are broken links. Regards, Sebastian Haase From ralf.gommers at googlemail.com Fri Jan 15 09:46:18 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 15 Jan 2010 22:46:18 +0800 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <4B4EBAE2.7040703@silveregg.co.jp> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> Message-ID: Hi David, Here are some questions to get a clearer idea of exactly what's involved in / required for making a release. On Thu, Jan 14, 2010 at 2:34 PM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > > > What is the setup one needs to build the installers? It might be well to > > document that, the dependencies, and the process. > > Right. 
The top script is: > http://projects.scipy.org/numpy/browser/trunk/release.sh > > the bulk of the work is in : > http://projects.scipy.org/numpy/browser/trunk/pavement.py > > which describes what is needed to build installers. On mac os x, the > release script may be used as is to build every installer + the release > notes. > > Is it necessary to have OS X to build the dmg installer, or could you build that from linux with some modifications to the build script? How many combinations do you test manually? All supported Python versions on all platforms? Several Linux flavors? For someone new to packaging, how much time would you estimate it takes to do a single release? Is most of this time spent testing, or fixing the problems you find during testing? Do you have an idea about when to start preparing for the release of 1.4.1? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Fri Jan 15 09:50:14 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 15 Jan 2010 15:50:14 +0100 Subject: [Numpy-discussion] svn log and blank entries Message-ID: Hi all, An svn log > CHANGELOG in svn/numpy yields some blank entries Is that intended ? ------------------------------------------------------------------------ r8055 | ariver | 2010-01-15 03:02:30 +0100 (Fr, 15 Jan 2010) | 1 line _ ------------------------------------------------------------------------ r8054 | ariver | 2010-01-15 02:57:56 +0100 (Fr, 15 Jan 2010) | 1 line _ ------------------------------------------------------------------------ r8053 | ariver | 2010-01-15 02:51:02 +0100 (Fr, 15 Jan 2010) | 1 line _ Nils From josef.pktd at gmail.com Fri Jan 15 09:55:15 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jan 2010 09:55:15 -0500 Subject: [Numpy-discussion] svn log and blank entries In-Reply-To: References: Message-ID: <1cd32cbb1001150655haa6b7aaie3273b4d95a53402@mail.gmail.com> On Fri, Jan 15, 2010 at 9:50 AM, Nils Wagner wrote: > Hi all, > > An svn log > CHANGELOG in svn/numpy yields some blank > entries > Is that intended ? > > > ------------------------------------------------------------------------ > r8055 | ariver | 2010-01-15 03:02:30 +0100 (Fr, 15 Jan > 2010) | 1 line > > _ > ------------------------------------------------------------------------ > r8054 | ariver | 2010-01-15 02:57:56 +0100 (Fr, 15 Jan > 2010) | 1 line > > _ > ------------------------------------------------------------------------ > r8053 | ariver | 2010-01-15 02:51:02 +0100 (Fr, 15 Jan > 2010) | 1 line > according to Robert this was some checking/testing by the sysadmin Josef > _ > > Nils > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Fri Jan 15 10:28:10 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 15 Jan 2010 09:28:10 -0600 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> Message-ID: <3d375d731001150728t7194ecb6uff28e0a116a36be4@mail.gmail.com> On Fri, Jan 15, 2010 at 08:46, Ralf Gommers wrote: > Is it necessary to have OS X to build the dmg installer, or could you build > that from linux with some modifications to the build script? No, you need OS X to build and package the OS X binaries. 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Fri Jan 15 10:31:44 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jan 2010 10:31:44 -0500 Subject: [Numpy-discussion] linalg.eig getting the original matrix back ? Message-ID: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> I had a problem because linal.eig doesn't rebuild the original matrix, linalg.eigh does, see script below Whats the trick with linalg.eig to get the original (or the inverse) back ? None of my variations on the formulas worked. Thanks, Josef import numpy as np import scipy as sp import scipy.linalg omega = np.array([[ 6., 2., 2., 0., 0., 3., 0., 0.], [ 2., 6., 2., 3., 0., 0., 3., 0.], [ 2., 2., 6., 0., 3., 0., 0., 3.], [ 0., 3., 0., 6., 2., 0., 3., 0.], [ 0., 0., 3., 2., 6., 0., 0., 3.], [ 3., 0., 0., 0., 0., 6., 2., 2.], [ 0., 3., 0., 3., 0., 2., 6., 2.], [ 0., 0., 3., 0., 3., 2., 2., 6.]]) for fun in [np.linalg.eig, np.linalg.eigh, sp.linalg.eig, sp.linalg.eigh]: print fun.__module__, fun ev, evec = fun(omega) omegainv = np.dot(evec, (1/ev * evec).T) omegainv2 = np.linalg.inv(omega) omegacomp = np.dot(evec, (ev * evec).T) print 'composition', print np.max(np.abs(omegacomp - omega)) print 'inverse', print np.max(np.abs(omegainv - omegainv2)) this prints: numpy.linalg.linalg composition 0.405241032278 inverse 0.405241032278 numpy.linalg.linalg composition 3.5527136788e-015 inverse 7.21644966006e-016 scipy.linalg.decomp composition 0.238386662463 inverse 0.238386662463 scipy.linalg.decomp composition 3.99680288865e-015 inverse 4.99600361081e-016 From cournape at gmail.com Fri Jan 15 10:56:08 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 16 Jan 2010 00:56:08 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> Message-ID: <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> On Fri, Jan 15, 2010 at 11:46 PM, Ralf Gommers wrote: > Hi David, > > Here are some questions to get a clearer idea of exactly what's involved in > / required for making a release. > > On Thu, Jan 14, 2010 at 2:34 PM, David Cournapeau > wrote: >> >> Charles R Harris wrote: >> >> > >> > >> > What is the setup one needs to build the installers? It might be well to >> > document that, the dependencies, and the process. >> >> Right. The top script is: >> http://projects.scipy.org/numpy/browser/trunk/release.sh >> >> the bulk of the work is in : >> http://projects.scipy.org/numpy/browser/trunk/pavement.py >> >> which describes what is needed to build installers. On mac os x, the >> release script may be used as is to build every installer + the release >> notes. >> > > Is it necessary to have OS X to build the dmg installer, or could you build > that from linux with some modifications to the build script? You cannot cross compile python extensions, so you have to build installer on each platform. Mac Os X is the most practical because you can build windows installers under wine, so you can build both mac and windows installers from the same machine. The paver script + the shell script can build everything in one step thanks to this. > > How many combinations do you test manually? All supported Python versions on > all platforms? Several Linux flavors? 
I basically assume that linux works once the branch is stabilized, if only because that's what most developers use. It is important to test on the oldest supported python (2.4) and both 32 and 64 bits, though (especially python 2.4 on 64 bits). I never test the installers - this is too much work manually. Ideally, this should be done on a build/test farm. > For someone new to packaging, how much time would you estimate it takes to > do a single release? Is most of this time spent testing, or fixing the > problems you find during testing? Most of the time is spent on fixing build issues which crop up during the beta phase. I found difficult to enforce a strict policy on not changing anything unless critical once in the beta phase. I think we should improve things in that aspect, and go away from the "but this is a small fix" mentality - maybe using something like for the linux kernel, with merge windows, etc... I secretly hope that if we can regularly change release managers, it will give a sense of why this is good policy :) I feel that we have improved things quite a bit since I have started doing releases: the binary installers are more stable, and build are mostly automated now. The next step would be automated testing of the binary installers (in particular testing new numpy against scipy, etc...), but this is quite a bit of work. Having a stricter time-based policy would be good as well. > Do you have an idea about when to start preparing for the release of 1.4.1? The cython thing is the most problematic bug, and should be fixed ASAP. I am still not sure whether that would require both a numpy 1.4.1 and a scipy 0.7.1.1 (built against numpy 1.3.0). cheers, David From sebastian.walter at gmail.com Fri Jan 15 11:32:04 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Fri, 15 Jan 2010 17:32:04 +0100 Subject: [Numpy-discussion] linalg.eig getting the original matrix back ? In-Reply-To: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> References: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> Message-ID: numpy.linalg.eig guarantees to return right eigenvectors. evec is not necessarily an orthonormal matrix when there are eigenvalues with multiplicity >1. For symmetrical matrices you'll have mutually orthogonal eigenspaces but each eigenspace might be spanned by vectors that are not orthogonal to each other. Your omega has eigenvalue 1 with multiplicity 3. On Fri, Jan 15, 2010 at 4:31 PM, wrote: > I had a problem because linal.eig doesn't rebuild the original matrix, > linalg.eigh does, see script below > > Whats the trick with linalg.eig to get the original (or the inverse) > back ? None of my variations on the formulas worked. > > Thanks, > Josef > > > import numpy as np > import scipy as sp > import scipy.linalg > > omega = ?np.array([[ 6., ?2., ?2., ?0., ?0., ?3., ?0., ?0.], > ? ? ? ? ? ? ? ? ? [ 2., ?6., ?2., ?3., ?0., ?0., ?3., ?0.], > ? ? ? ? ? ? ? ? ? [ 2., ?2., ?6., ?0., ?3., ?0., ?0., ?3.], > ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?6., ?2., ?0., ?3., ?0.], > ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?2., ?6., ?0., ?0., ?3.], > ? ? ? ? ? ? ? ? ? [ 3., ?0., ?0., ?0., ?0., ?6., ?2., ?2.], > ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?3., ?0., ?2., ?6., ?2.], > ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?0., ?3., ?2., ?2., ?6.]]) > > for fun in [np.linalg.eig, np.linalg.eigh, sp.linalg.eig, sp.linalg.eigh]: > ? ?print fun.__module__, fun > ? ?ev, evec = fun(omega) > ? ?omegainv = np.dot(evec, (1/ev * evec).T) > ? ?omegainv2 = np.linalg.inv(omega) > ? 
?omegacomp = np.dot(evec, (ev * evec).T) > ? ?print 'composition', > ? ?print np.max(np.abs(omegacomp - omega)) > ? ?print 'inverse', > ? ?print np.max(np.abs(omegainv - omegainv2)) > > this prints: > > numpy.linalg.linalg > composition 0.405241032278 > inverse 0.405241032278 > > numpy.linalg.linalg > composition 3.5527136788e-015 > inverse 7.21644966006e-016 > > scipy.linalg.decomp > composition 0.238386662463 > inverse 0.238386662463 > > scipy.linalg.decomp > composition 3.99680288865e-015 > inverse 4.99600361081e-016 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 15 12:07:04 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jan 2010 12:07:04 -0500 Subject: [Numpy-discussion] linalg.eig getting the original matrix back ? In-Reply-To: References: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> Message-ID: <1cd32cbb1001150907g7ef4e8e1xd8be844a241b748e@mail.gmail.com> On Fri, Jan 15, 2010 at 11:32 AM, Sebastian Walter wrote: > numpy.linalg.eig guarantees to return right eigenvectors. > evec is not necessarily an orthonormal matrix when there are > eigenvalues with multiplicity >1. > For symmetrical matrices you'll have mutually orthogonal eigenspaces > but each eigenspace might be spanned by > vectors that are not orthogonal to each other. > > Your omega has eigenvalue 1 with multiplicity 3. Yes, I thought about the multiplicity. However, even for random symmetric matrices, I don't get the result I change the example matrix to omega0 = np.random.randn(20,8) omega = np.dot(omega0.T, omega0) print np.max(np.abs(omega == omega.T)) I have been playing with left and right eigenvectors, but I cannot figure out how I could compose my original matrix with them either. I checked with wikipedia, to make sure I remember my (basic) linear algebra http://en.wikipedia.org/wiki/Eigendecomposition_(matrix)#Symmetric_matrices The left and right eigenvectors are almost orthogonal ev, evecl, evecr = sp.linalg.eig(omega, left=1, right=1) >>> np.abs(np.dot(evecl.T, evecl) - np.eye(8))>1e-10 >>> np.abs(np.dot(evecr.T, evecr) - np.eye(8))>1e-10 shows three non-orthogonal pairs >>> ev array([ 6.27688862, 8.45055356, 15.03789945, 19.55477818, 20.33315408, 24.58589363, 28.71796764, 42.88603728]) I always thought eigenvectors are always orthogonal, at least in the case without multiple roots I had assumed that eig will treat symmetric matrices in the same way as eigh. Since I'm mostly or always working with symmetric matrices, I will stick to eigh which does what I expect. Still, I'm currently not able to reproduce any of the composition result on the wikipedia page with linalg.eig which is puzzling. Josef > > > > > On Fri, Jan 15, 2010 at 4:31 PM, ? wrote: >> I had a problem because linal.eig doesn't rebuild the original matrix, >> linalg.eigh does, see script below >> >> Whats the trick with linalg.eig to get the original (or the inverse) >> back ? None of my variations on the formulas worked. >> >> Thanks, >> Josef >> >> >> import numpy as np >> import scipy as sp >> import scipy.linalg >> >> omega = ?np.array([[ 6., ?2., ?2., ?0., ?0., ?3., ?0., ?0.], >> ? ? ? ? ? ? ? ? ? [ 2., ?6., ?2., ?3., ?0., ?0., ?3., ?0.], >> ? ? ? ? ? ? ? ? ? [ 2., ?2., ?6., ?0., ?3., ?0., ?0., ?3.], >> ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?6., ?2., ?0., ?3., ?0.], >> ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?2., ?6., ?0., ?0., ?3.], >> ? ? ? ? ? ? 
? ? ? [ 3., ?0., ?0., ?0., ?0., ?6., ?2., ?2.], >> ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?3., ?0., ?2., ?6., ?2.], >> ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?0., ?3., ?2., ?2., ?6.]]) >> >> for fun in [np.linalg.eig, np.linalg.eigh, sp.linalg.eig, sp.linalg.eigh]: >> ? ?print fun.__module__, fun >> ? ?ev, evec = fun(omega) >> ? ?omegainv = np.dot(evec, (1/ev * evec).T) >> ? ?omegainv2 = np.linalg.inv(omega) >> ? ?omegacomp = np.dot(evec, (ev * evec).T) >> ? ?print 'composition', >> ? ?print np.max(np.abs(omegacomp - omega)) >> ? ?print 'inverse', >> ? ?print np.max(np.abs(omegainv - omegainv2)) >> >> this prints: >> >> numpy.linalg.linalg >> composition 0.405241032278 >> inverse 0.405241032278 >> >> numpy.linalg.linalg >> composition 3.5527136788e-015 >> inverse 7.21644966006e-016 >> >> scipy.linalg.decomp >> composition 0.238386662463 >> inverse 0.238386662463 >> >> scipy.linalg.decomp >> composition 3.99680288865e-015 >> inverse 4.99600361081e-016 >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From patrickmarshwx at gmail.com Fri Jan 15 12:11:24 2010 From: patrickmarshwx at gmail.com (Patrick Marsh) Date: Fri, 15 Jan 2010 11:11:24 -0600 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: I'm willing to get on board and learn/help with the releases. At present, my main machine is a Windows 7 box running EPD 5.1.1 and EPD 6.0. I also have a relatively old, but still functional, macbook pro that I can tinker with as well. I will state upfront that while I'm not a complete newbie when it comes to this, I'll certainly need a lot of help initially. (But hey, this is how you learn, right?) No offense if my offer isn't accepted. I just thought I'd thought I'd offer to try and give something back. Patrick -- Patrick Marsh Ph.D. Student / Graduate Research Assistant School of Meteorology / University of Oklahoma Cooperative Institute for Mesoscale Meteorological Studies National Severe Storms Laboratory http://www.patricktmarsh.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jan 15 12:17:58 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jan 2010 12:17:58 -0500 Subject: [Numpy-discussion] linalg.eig getting the original matrix back ? In-Reply-To: <1cd32cbb1001150907g7ef4e8e1xd8be844a241b748e@mail.gmail.com> References: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> <1cd32cbb1001150907g7ef4e8e1xd8be844a241b748e@mail.gmail.com> Message-ID: <1cd32cbb1001150917y618db92cvaf5b58be6c683d56@mail.gmail.com> On Fri, Jan 15, 2010 at 12:07 PM, wrote: > On Fri, Jan 15, 2010 at 11:32 AM, Sebastian Walter > wrote: >> numpy.linalg.eig guarantees to return right eigenvectors. >> evec is not necessarily an orthonormal matrix when there are >> eigenvalues with multiplicity >1. >> For symmetrical matrices you'll have mutually orthogonal eigenspaces >> but each eigenspace might be spanned by >> vectors that are not orthogonal to each other. 
>> >> Your omega has eigenvalue 1 with multiplicity 3. > > Yes, I thought about the multiplicity. However, even for random > symmetric matrices, I don't get the result > I change the example matrix to > omega0 = np.random.randn(20,8) > omega = np.dot(omega0.T, omega0) > print np.max(np.abs(omega == omega.T)) > > I have been playing with left and right eigenvectors, but I cannot > figure out how I could compose my original matrix with them either. > > I checked with wikipedia, to make sure I remember my (basic) linear algebra > http://en.wikipedia.org/wiki/Eigendecomposition_(matrix)#Symmetric_matrices > > The left and right eigenvectors are almost orthogonal > ev, evecl, evecr = sp.linalg.eig(omega, left=1, right=1) >>>> np.abs(np.dot(evecl.T, evecl) - np.eye(8))>1e-10 >>>> np.abs(np.dot(evecr.T, evecr) - np.eye(8))>1e-10 > > shows three non-orthogonal pairs This doesn't seem to be correct. I think, I had an old omega with multiplicity of eigenvalues in the interpreter. Writing it as a clean script, I get orthogonal left and right eigenvectors. Thanks for the reply, Josef > >>>> ev > array([ ?6.27688862, ? 8.45055356, ?15.03789945, ?19.55477818, > ? ? ? ?20.33315408, ?24.58589363, ?28.71796764, ?42.88603728]) > > > I always thought eigenvectors are always orthogonal, at least in the > case without multiple roots > > I had assumed that eig will treat symmetric matrices in the same way as eigh. > Since I'm mostly or always working with symmetric matrices, I will > stick to eigh which does what I expect. > > Still, I'm currently not able to reproduce any of the composition > result on the wikipedia page with linalg.eig which is puzzling. > > Josef > >> >> >> >> >> On Fri, Jan 15, 2010 at 4:31 PM, ? wrote: >>> I had a problem because linal.eig doesn't rebuild the original matrix, >>> linalg.eigh does, see script below >>> >>> Whats the trick with linalg.eig to get the original (or the inverse) >>> back ? None of my variations on the formulas worked. >>> >>> Thanks, >>> Josef >>> >>> >>> import numpy as np >>> import scipy as sp >>> import scipy.linalg >>> >>> omega = ?np.array([[ 6., ?2., ?2., ?0., ?0., ?3., ?0., ?0.], >>> ? ? ? ? ? ? ? ? ? [ 2., ?6., ?2., ?3., ?0., ?0., ?3., ?0.], >>> ? ? ? ? ? ? ? ? ? [ 2., ?2., ?6., ?0., ?3., ?0., ?0., ?3.], >>> ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?6., ?2., ?0., ?3., ?0.], >>> ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?2., ?6., ?0., ?0., ?3.], >>> ? ? ? ? ? ? ? ? ? [ 3., ?0., ?0., ?0., ?0., ?6., ?2., ?2.], >>> ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?3., ?0., ?2., ?6., ?2.], >>> ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?0., ?3., ?2., ?2., ?6.]]) >>> >>> for fun in [np.linalg.eig, np.linalg.eigh, sp.linalg.eig, sp.linalg.eigh]: >>> ? ?print fun.__module__, fun >>> ? ?ev, evec = fun(omega) >>> ? ?omegainv = np.dot(evec, (1/ev * evec).T) >>> ? ?omegainv2 = np.linalg.inv(omega) >>> ? ?omegacomp = np.dot(evec, (ev * evec).T) >>> ? ?print 'composition', >>> ? ?print np.max(np.abs(omegacomp - omega)) >>> ? ?print 'inverse', >>> ? 
?print np.max(np.abs(omegainv - omegainv2)) >>> >>> this prints: >>> >>> numpy.linalg.linalg >>> composition 0.405241032278 >>> inverse 0.405241032278 >>> >>> numpy.linalg.linalg >>> composition 3.5527136788e-015 >>> inverse 7.21644966006e-016 >>> >>> scipy.linalg.decomp >>> composition 0.238386662463 >>> inverse 0.238386662463 >>> >>> scipy.linalg.decomp >>> composition 3.99680288865e-015 >>> inverse 4.99600361081e-016 >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From warren.weckesser at enthought.com Fri Jan 15 12:24:10 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 15 Jan 2010 11:24:10 -0600 Subject: [Numpy-discussion] linalg.eig getting the original matrix back ? In-Reply-To: <1cd32cbb1001150907g7ef4e8e1xd8be844a241b748e@mail.gmail.com> References: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> <1cd32cbb1001150907g7ef4e8e1xd8be844a241b748e@mail.gmail.com> Message-ID: <4B50A4BA.6050709@enthought.com> For the case where all the eigenvalues are simple, this works for me: In [1]: import numpy as np In [2]: a = np.array([[1.0, 2.0, 3.0],[2.0, 3.0, 0.0], [3.0, 0.0, 4.0]]) In [3]: eval, evec = np.linalg.eig(a) In [4]: eval Out[4]: array([-1.51690942, 6.24391817, 3.27299125]) In [5]: a2 = np.dot(evec, eval[:,np.newaxis] * evec.T) In [6]: np.allclose(a, a2) Out[6]: True In [7]: Warren josef.pktd at gmail.com wrote: > On Fri, Jan 15, 2010 at 11:32 AM, Sebastian Walter > wrote: > >> numpy.linalg.eig guarantees to return right eigenvectors. >> evec is not necessarily an orthonormal matrix when there are >> eigenvalues with multiplicity >1. >> For symmetrical matrices you'll have mutually orthogonal eigenspaces >> but each eigenspace might be spanned by >> vectors that are not orthogonal to each other. >> >> Your omega has eigenvalue 1 with multiplicity 3. >> > > Yes, I thought about the multiplicity. However, even for random > symmetric matrices, I don't get the result > I change the example matrix to > omega0 = np.random.randn(20,8) > omega = np.dot(omega0.T, omega0) > print np.max(np.abs(omega == omega.T)) > > I have been playing with left and right eigenvectors, but I cannot > figure out how I could compose my original matrix with them either. > > I checked with wikipedia, to make sure I remember my (basic) linear algebra > http://en.wikipedia.org/wiki/Eigendecomposition_(matrix)#Symmetric_matrices > > The left and right eigenvectors are almost orthogonal > ev, evecl, evecr = sp.linalg.eig(omega, left=1, right=1) > >>>> np.abs(np.dot(evecl.T, evecl) - np.eye(8))>1e-10 >>>> np.abs(np.dot(evecr.T, evecr) - np.eye(8))>1e-10 >>>> > > shows three non-orthogonal pairs > > >>>> ev >>>> > array([ 6.27688862, 8.45055356, 15.03789945, 19.55477818, > 20.33315408, 24.58589363, 28.71796764, 42.88603728]) > > > I always thought eigenvectors are always orthogonal, at least in the > case without multiple roots > > I had assumed that eig will treat symmetric matrices in the same way as eigh. > Since I'm mostly or always working with symmetric matrices, I will > stick to eigh which does what I expect. > > Still, I'm currently not able to reproduce any of the composition > result on the wikipedia page with linalg.eig which is puzzling. 
> > Josef > > >> >> >> On Fri, Jan 15, 2010 at 4:31 PM, wrote: >> >>> I had a problem because linal.eig doesn't rebuild the original matrix, >>> linalg.eigh does, see script below >>> >>> Whats the trick with linalg.eig to get the original (or the inverse) >>> back ? None of my variations on the formulas worked. >>> >>> Thanks, >>> Josef >>> >>> >>> import numpy as np >>> import scipy as sp >>> import scipy.linalg >>> >>> omega = np.array([[ 6., 2., 2., 0., 0., 3., 0., 0.], >>> [ 2., 6., 2., 3., 0., 0., 3., 0.], >>> [ 2., 2., 6., 0., 3., 0., 0., 3.], >>> [ 0., 3., 0., 6., 2., 0., 3., 0.], >>> [ 0., 0., 3., 2., 6., 0., 0., 3.], >>> [ 3., 0., 0., 0., 0., 6., 2., 2.], >>> [ 0., 3., 0., 3., 0., 2., 6., 2.], >>> [ 0., 0., 3., 0., 3., 2., 2., 6.]]) >>> >>> for fun in [np.linalg.eig, np.linalg.eigh, sp.linalg.eig, sp.linalg.eigh]: >>> print fun.__module__, fun >>> ev, evec = fun(omega) >>> omegainv = np.dot(evec, (1/ev * evec).T) >>> omegainv2 = np.linalg.inv(omega) >>> omegacomp = np.dot(evec, (ev * evec).T) >>> print 'composition', >>> print np.max(np.abs(omegacomp - omega)) >>> print 'inverse', >>> print np.max(np.abs(omegainv - omegainv2)) >>> >>> this prints: >>> >>> numpy.linalg.linalg >>> composition 0.405241032278 >>> inverse 0.405241032278 >>> >>> numpy.linalg.linalg >>> composition 3.5527136788e-015 >>> inverse 7.21644966006e-016 >>> >>> scipy.linalg.decomp >>> composition 0.238386662463 >>> inverse 0.238386662463 >>> >>> scipy.linalg.decomp >>> composition 3.99680288865e-015 >>> inverse 4.99600361081e-016 >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 15 12:45:18 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 Jan 2010 12:45:18 -0500 Subject: [Numpy-discussion] linalg.eig getting the original matrix back ? In-Reply-To: <4B50A4BA.6050709@enthought.com> References: <1cd32cbb1001150731i48768f87uf89dd17721a9e980@mail.gmail.com> <1cd32cbb1001150907g7ef4e8e1xd8be844a241b748e@mail.gmail.com> <4B50A4BA.6050709@enthought.com> Message-ID: <1cd32cbb1001150945q787b68cayb3b2f126aae3d394@mail.gmail.com> On Fri, Jan 15, 2010 at 12:24 PM, Warren Weckesser wrote: > For the case where all the eigenvalues are simple, this works for me: > > In [1]: import numpy as np > > In [2]: a = np.array([[1.0, 2.0, 3.0],[2.0, 3.0, 0.0], [3.0, 0.0, 4.0]]) > > In [3]: eval, evec = np.linalg.eig(a) > > In [4]: eval > Out[4]: array([-1.51690942, ?6.24391817, ?3.27299125]) > > In [5]: a2 = np.dot(evec, eval[:,np.newaxis] * evec.T) > > In [6]: np.allclose(a, a2) > Out[6]: True > Thanks, I thought I had tried similar versions, but I guess not with the matrix without multiplicity of eigenvalues >>> np.max(np.abs(np.dot(evecl, (ev * evecl).T)-omega)) 3.6415315207705135e-014 >>> np.max(np.abs(np.dot(evecr, (ev * evecr).T)-omega)) 2.6256774532384952e-014 >>> np.max(np.abs(np.dot(evecl, np.dot(np.diag(ev), evecl.T))-omega)) 3.6415315207705135e-014 So, the my confusion was just because eig doesn't treat multiple eigenvalues in the same way as eigh. 
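For completeness, a sketch of the reconstruction that also covers the repeated-eigenvalue case: with eig the inverse of the eigenvector matrix has to be used instead of its transpose, because the returned vectors are only guaranteed to be right eigenvectors, not an orthonormal set (omega is the matrix from the first post; the other names are illustrative):

import numpy as np

omega = np.array([[ 6.,  2.,  2.,  0.,  0.,  3.,  0.,  0.],
                  [ 2.,  6.,  2.,  3.,  0.,  0.,  3.,  0.],
                  [ 2.,  2.,  6.,  0.,  3.,  0.,  0.,  3.],
                  [ 0.,  3.,  0.,  6.,  2.,  0.,  3.,  0.],
                  [ 0.,  0.,  3.,  2.,  6.,  0.,  0.,  3.],
                  [ 3.,  0.,  0.,  0.,  0.,  6.,  2.,  2.],
                  [ 0.,  3.,  0.,  3.,  0.,  2.,  6.,  2.],
                  [ 0.,  0.,  3.,  0.,  3.,  2.,  2.,  6.]])

ev, evec = np.linalg.eig(omega)
eveci = np.linalg.inv(evec)

# For any diagonalizable matrix: omega = evec * diag(ev) * inv(evec).
# evec.T is only a substitute for inv(evec) when the eigenvectors are
# orthonormal, which eig does not guarantee for repeated eigenvalues
# (eigh does, for symmetric/Hermitian input).
omegacomp = np.dot(evec, ev[:, np.newaxis] * eveci)
omegainv = np.dot(evec, (1.0 / ev)[:, np.newaxis] * eveci)

print np.max(np.abs(omegacomp - omega))                  # round-off level
print np.max(np.abs(omegainv - np.linalg.inv(omega)))    # round-off level
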
Josef > > > Warren > > > > josef.pktd at gmail.com wrote: >> On Fri, Jan 15, 2010 at 11:32 AM, Sebastian Walter >> wrote: >> >>> numpy.linalg.eig guarantees to return right eigenvectors. >>> evec is not necessarily an orthonormal matrix when there are >>> eigenvalues with multiplicity >1. >>> For symmetrical matrices you'll have mutually orthogonal eigenspaces >>> but each eigenspace might be spanned by >>> vectors that are not orthogonal to each other. >>> >>> Your omega has eigenvalue 1 with multiplicity 3. >>> >> >> Yes, I thought about the multiplicity. However, even for random >> symmetric matrices, I don't get the result >> I change the example matrix to >> omega0 = np.random.randn(20,8) >> omega = np.dot(omega0.T, omega0) >> print np.max(np.abs(omega == omega.T)) >> >> I have been playing with left and right eigenvectors, but I cannot >> figure out how I could compose my original matrix with them either. >> >> I checked with wikipedia, to make sure I remember my (basic) linear algebra >> http://en.wikipedia.org/wiki/Eigendecomposition_(matrix)#Symmetric_matrices >> >> The left and right eigenvectors are almost orthogonal >> ev, evecl, evecr = sp.linalg.eig(omega, left=1, right=1) >> >>>>> np.abs(np.dot(evecl.T, evecl) - np.eye(8))>1e-10 >>>>> np.abs(np.dot(evecr.T, evecr) - np.eye(8))>1e-10 >>>>> >> >> shows three non-orthogonal pairs >> >> >>>>> ev >>>>> >> array([ ?6.27688862, ? 8.45055356, ?15.03789945, ?19.55477818, >> ? ? ? ? 20.33315408, ?24.58589363, ?28.71796764, ?42.88603728]) >> >> >> I always thought eigenvectors are always orthogonal, at least in the >> case without multiple roots >> >> I had assumed that eig will treat symmetric matrices in the same way as eigh. >> Since I'm mostly or always working with symmetric matrices, I will >> stick to eigh which does what I expect. >> >> Still, I'm currently not able to reproduce any of the composition >> result on the wikipedia page with linalg.eig which is puzzling. >> >> Josef >> >> >>> >>> >>> On Fri, Jan 15, 2010 at 4:31 PM, ? wrote: >>> >>>> I had a problem because linal.eig doesn't rebuild the original matrix, >>>> linalg.eigh does, see script below >>>> >>>> Whats the trick with linalg.eig to get the original (or the inverse) >>>> back ? None of my variations on the formulas worked. >>>> >>>> Thanks, >>>> Josef >>>> >>>> >>>> import numpy as np >>>> import scipy as sp >>>> import scipy.linalg >>>> >>>> omega = ?np.array([[ 6., ?2., ?2., ?0., ?0., ?3., ?0., ?0.], >>>> ? ? ? ? ? ? ? ? ? [ 2., ?6., ?2., ?3., ?0., ?0., ?3., ?0.], >>>> ? ? ? ? ? ? ? ? ? [ 2., ?2., ?6., ?0., ?3., ?0., ?0., ?3.], >>>> ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?6., ?2., ?0., ?3., ?0.], >>>> ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?2., ?6., ?0., ?0., ?3.], >>>> ? ? ? ? ? ? ? ? ? [ 3., ?0., ?0., ?0., ?0., ?6., ?2., ?2.], >>>> ? ? ? ? ? ? ? ? ? [ 0., ?3., ?0., ?3., ?0., ?2., ?6., ?2.], >>>> ? ? ? ? ? ? ? ? ? [ 0., ?0., ?3., ?0., ?3., ?2., ?2., ?6.]]) >>>> >>>> for fun in [np.linalg.eig, np.linalg.eigh, sp.linalg.eig, sp.linalg.eigh]: >>>> ? ?print fun.__module__, fun >>>> ? ?ev, evec = fun(omega) >>>> ? ?omegainv = np.dot(evec, (1/ev * evec).T) >>>> ? ?omegainv2 = np.linalg.inv(omega) >>>> ? ?omegacomp = np.dot(evec, (ev * evec).T) >>>> ? ?print 'composition', >>>> ? ?print np.max(np.abs(omegacomp - omega)) >>>> ? ?print 'inverse', >>>> ? 
?print np.max(np.abs(omegainv - omegainv2)) >>>> >>>> this prints: >>>> >>>> numpy.linalg.linalg >>>> composition 0.405241032278 >>>> inverse 0.405241032278 >>>> >>>> numpy.linalg.linalg >>>> composition 3.5527136788e-015 >>>> inverse 7.21644966006e-016 >>>> >>>> scipy.linalg.decomp >>>> composition 0.238386662463 >>>> inverse 0.238386662463 >>>> >>>> scipy.linalg.decomp >>>> composition 3.99680288865e-015 >>>> inverse 4.99600361081e-016 >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From nwagner at iam.uni-stuttgart.de Fri Jan 15 13:19:41 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 15 Jan 2010 19:19:41 +0100 Subject: [Numpy-discussion] ipython Message-ID: Hi all, I tried to install ipython via bzr If I run iypthon I get ipython Traceback (most recent call last): File "/home/nwagner/local/bin/ipython", line 4, in from IPython.core.ipapp import launch_new_instance ImportError: No module named ipapp Any idea ? Nils From pav at iki.fi Fri Jan 15 14:08:28 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 15 Jan 2010 21:08:28 +0200 Subject: [Numpy-discussion] ipython In-Reply-To: References: Message-ID: <1263582507.5902.1.camel@idol> pe, 2010-01-15 kello 19:19 +0100, Nils Wagner kirjoitti: [clip: issue with ipython] > Any idea ? Maybe people on the Ipython mailing list would know best? http://mail.scipy.org/mailman/listinfo/ipython-user http://mail.scipy.org/mailman/listinfo/ipython-dev -- Pauli Virtanen From millman at berkeley.edu Fri Jan 15 14:12:56 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 15 Jan 2010 11:12:56 -0800 Subject: [Numpy-discussion] broken links on http://numpy.scipy.org/ In-Reply-To: References: Message-ID: On Fri, Jan 15, 2010 at 1:38 AM, Sebastian Haase wrote: > Apparently this very nice looking icons (4 of the 5 icons or so) > at http://numpy.scipy.org/ are broken links. Fixed. Thanks, -- Jarrod Millman Helen Wills Neuroscience Institute 10 Giannini Hall, UC Berkeley http://cirl.berkeley.edu/ From sierra_mtnview at sbcglobal.net Fri Jan 15 18:06:35 2010 From: sierra_mtnview at sbcglobal.net (Wayne Watson) Date: Fri, 15 Jan 2010 15:06:35 -0800 Subject: [Numpy-discussion] Percentiles and Box Plots Message-ID: <4B50F4FB.7000904@sbcglobal.net> I have from about 90 to 600 points of different data sets that I would like to find the 10th and 90th percentile for. Does numpy have a function for that, or any other percentile points? Is there a method for getting at the Box Plot quartiles, and ranges. I think that's the simplest set for Box plots. I don't need to draw anything. -- Wayne Watson (Watson Adventures, Prop., Nevada City, CA) (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) Obz Site: 39? 15' 7" N, 121? 2' 32" W, 2700 feet "I was thinking about how people seem to read the Bible a whole lot more as they get older; then it dawned on me . . 
they're cramming for their final exam." -- George Carlin Web Page: From robert.kern at gmail.com Fri Jan 15 18:10:21 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 15 Jan 2010 17:10:21 -0600 Subject: [Numpy-discussion] Percentiles and Box Plots In-Reply-To: <4B50F4FB.7000904@sbcglobal.net> References: <4B50F4FB.7000904@sbcglobal.net> Message-ID: <3d375d731001151510v5ff1a3edrbaff28984afcfa0b@mail.gmail.com> On Fri, Jan 15, 2010 at 17:06, Wayne Watson wrote: > I have from about 90 to 600 points of different data sets that I would > like to find the 10th and 90th percentile for. Does numpy have a > function for that, or any other percentile points? ?Is there a method > for getting at the Box Plot quartiles, and ranges. I think that's the > simplest set for Box plots. I don't need to draw anything. Use scipy.stats.scoreatpercentile() -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sierra_mtnview at sbcglobal.net Fri Jan 15 21:48:16 2010 From: sierra_mtnview at sbcglobal.net (Wayne Watson) Date: Fri, 15 Jan 2010 18:48:16 -0800 Subject: [Numpy-discussion] Percentiles and Box Plots In-Reply-To: <3d375d731001151510v5ff1a3edrbaff28984afcfa0b@mail.gmail.com> References: <4B50F4FB.7000904@sbcglobal.net> <3d375d731001151510v5ff1a3edrbaff28984afcfa0b@mail.gmail.com> Message-ID: <4B5128F0.8060203@sbcglobal.net> An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jan 15 21:56:32 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 15 Jan 2010 18:56:32 -0800 Subject: [Numpy-discussion] Percentiles and Box Plots In-Reply-To: <4B5128F0.8060203@sbcglobal.net> References: <4B50F4FB.7000904@sbcglobal.net> <3d375d731001151510v5ff1a3edrbaff28984afcfa0b@mail.gmail.com> <4B5128F0.8060203@sbcglobal.net> Message-ID: On Fri, Jan 15, 2010 at 6:48 PM, Wayne Watson wrote: > Thanks. I'll give it a try. Is this something fairly new? From http://projects.scipy.org/scipy/search?q=scoreatpercentile it looks like it has been there a few years. But what percentile is a few years? From charlesr.harris at gmail.com Fri Jan 15 23:41:11 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 15 Jan 2010 21:41:11 -0700 Subject: [Numpy-discussion] Buildbot down Message-ID: Hi All, The numpy buildbot has been down for a while now, or maybe I just missed some time when it was up. Is this a known problem? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 16 00:12:21 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 15 Jan 2010 22:12:21 -0700 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: On Fri, Jan 15, 2010 at 8:56 AM, David Cournapeau wrote: > On Fri, Jan 15, 2010 at 11:46 PM, Ralf Gommers > wrote: > > Hi David, > > > > Here are some questions to get a clearer idea of exactly what's involved > in > > / required for making a release. > > > > On Thu, Jan 14, 2010 at 2:34 PM, David Cournapeau > > > wrote: > >> > >> Charles R Harris wrote: > >> > >> > > >> > > >> > What is the setup one needs to build the installers? 
It might be well > to > >> > document that, the dependencies, and the process. > >> > >> Right. The top script is: > >> http://projects.scipy.org/numpy/browser/trunk/release.sh > >> > >> the bulk of the work is in : > >> http://projects.scipy.org/numpy/browser/trunk/pavement.py > >> > >> which describes what is needed to build installers. On mac os x, the > >> release script may be used as is to build every installer + the release > >> notes. > >> > > > > Is it necessary to have OS X to build the dmg installer, or could you > build > > that from linux with some modifications to the build script? > > You cannot cross compile python extensions, so you have to build > installer on each platform. Mac Os X is the most practical because you > can build windows installers under wine, so you can build both mac and > windows installers from the same machine. > > The paver script + the shell script can build everything in one step > thanks to this. > > > > > How many combinations do you test manually? All supported Python versions > on > > all platforms? Several Linux flavors? > > I basically assume that linux works once the branch is stabilized, if > only because that's what most developers use. It is important to test > on the oldest supported python (2.4) and both 32 and 64 bits, though > (especially python 2.4 on 64 bits). > > I never test the installers - this is too much work manually. Ideally, > this should be done on a build/test farm. > > > For someone new to packaging, how much time would you estimate it takes > to > > do a single release? Is most of this time spent testing, or fixing the > > problems you find during testing? > > Most of the time is spent on fixing build issues which crop up during > the beta phase. I found difficult to enforce a strict policy on not > changing anything unless critical once in the beta phase. I think we > should improve things in that aspect, and go away from the "but this > is a small fix" mentality - maybe using something like for the linux > kernel, with merge windows, etc... I secretly hope that if we can > regularly change release managers, it will give a sense of why this is > good policy :) > > I feel that we have improved things quite a bit since I have started > doing releases: the binary installers are more stable, and build are > mostly automated now. The next step would be automated testing of the > binary installers (in particular testing new numpy against scipy, > etc...), but this is quite a bit of work. > > Having a stricter time-based policy would be good as well. > > > Do you have an idea about when to start preparing for the release of > 1.4.1? > > The cython thing is the most problematic bug, and should be fixed > ASAP. I am still not sure whether that would require both a numpy > 1.4.1 and a scipy 0.7.1.1 (built against numpy 1.3.0). > > Speaking of the cython thing, do you know if the last release of cython (0.12) fixes that problem? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dagss at student.matnat.uio.no Sat Jan 16 02:30:26 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 16 Jan 2010 08:30:26 +0100 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: <4B516B12.7070803@student.matnat.uio.no> Charles R Harris wrote: > > > On Fri, Jan 15, 2010 at 8:56 AM, David Cournapeau > wrote: > > On Fri, Jan 15, 2010 at 11:46 PM, Ralf Gommers > > > wrote: > > Hi David, > > > > Here are some questions to get a clearer idea of exactly what's > involved in > > / required for making a release. > > > > On Thu, Jan 14, 2010 at 2:34 PM, David Cournapeau > > > > wrote: > >> > >> Charles R Harris wrote: > >> > >> > > >> > > >> > What is the setup one needs to build the installers? It might > be well to > >> > document that, the dependencies, and the process. > >> > >> Right. The top script is: > >> http://projects.scipy.org/numpy/browser/trunk/release.sh > >> > >> the bulk of the work is in : > >> http://projects.scipy.org/numpy/browser/trunk/pavement.py > >> > >> which describes what is needed to build installers. On mac os x, the > >> release script may be used as is to build every installer + the > release > >> notes. > >> > > > > Is it necessary to have OS X to build the dmg installer, or could > you build > > that from linux with some modifications to the build script? > > You cannot cross compile python extensions, so you have to build > installer on each platform. Mac Os X is the most practical because you > can build windows installers under wine, so you can build both mac and > windows installers from the same machine. > > The paver script + the shell script can build everything in one step > thanks to this. > > > > > How many combinations do you test manually? All supported Python > versions on > > all platforms? Several Linux flavors? > > I basically assume that linux works once the branch is stabilized, if > only because that's what most developers use. It is important to test > on the oldest supported python (2.4) and both 32 and 64 bits, though > (especially python 2.4 on 64 bits). > > I never test the installers - this is too much work manually. Ideally, > this should be done on a build/test farm. > > > For someone new to packaging, how much time would you estimate it > takes to > > do a single release? Is most of this time spent testing, or > fixing the > > problems you find during testing? > > Most of the time is spent on fixing build issues which crop up during > the beta phase. I found difficult to enforce a strict policy on not > changing anything unless critical once in the beta phase. I think we > should improve things in that aspect, and go away from the "but this > is a small fix" mentality - maybe using something like for the linux > kernel, with merge windows, etc... I secretly hope that if we can > regularly change release managers, it will give a sense of why this is > good policy :) > > I feel that we have improved things quite a bit since I have started > doing releases: the binary installers are more stable, and build are > mostly automated now. The next step would be automated testing of the > binary installers (in particular testing new numpy against scipy, > etc...), but this is quite a bit of work. > > Having a stricter time-based policy would be good as well. 
> > > Do you have an idea about when to start preparing for the release > of 1.4.1? > > The cython thing is the most problematic bug, and should be fixed > ASAP. I am still not sure whether that would require both a numpy > 1.4.1 and a scipy 0.7.1.1 (built against numpy 1.3.0). > > > Speaking of the cython thing, do you know if the last release of cython > (0.12) fixes that problem? > If you people referred to this "thing" in slightly more detail, I could probably answer that :-) (If you mean the problems with building a binary for more than one version of NumPy at once, then no, I don't believe that is even fixed in trunk yet, though it is trivial to do so its just to remember to do it before 0.12.1.) -- Dag Sverre From ralf.gommers at googlemail.com Sat Jan 16 02:57:10 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 16 Jan 2010 15:57:10 +0800 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: On Fri, Jan 15, 2010 at 11:56 PM, David Cournapeau wrote: > > > How many combinations do you test manually? All supported Python versions > on > > all platforms? Several Linux flavors? > > > I basically assume that linux works once the branch is stabilized, if > only because that's what most developers use. It is important to test > on the oldest supported python (2.4) and both 32 and 64 bits, though > (especially python 2.4 on 64 bits). > > I never test the installers - this is too much work manually. Ideally, > this should be done on a build/test farm. > > > For someone new to packaging, how much time would you estimate it takes > to > > do a single release? Is most of this time spent testing, or fixing the > > problems you find during testing? > > Most of the time is spent on fixing build issues which crop up during > the beta phase. I found difficult to enforce a strict policy on not > changing anything unless critical once in the beta phase. I think we > should improve things in that aspect, and go away from the "but this > is a small fix" mentality - maybe using something like for the linux > kernel, with merge windows, etc... I secretly hope that if we can > regularly change release managers, it will give a sense of why this is > good policy :) > > I feel that we have improved things quite a bit since I have started > doing releases: the binary installers are more stable, and build are > mostly automated now. The next step would be automated testing of the > binary installers (in particular testing new numpy against scipy, > etc...), but this is quite a bit of work. > > Having a stricter time-based policy would be good as well. > > Thanks for the explanations. I volunteer to help as well. From working on the docs and scikits.image I am familiar with most of NumPy/SciPy, but not with the C internals. Please just tell me if you think more experience is needed for this role, or if there are better candidates. Then I'll happily work on other things. I'm using OS X, so no problem there. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Sat Jan 16 04:01:25 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 16 Jan 2010 18:01:25 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <4B516B12.7070803@student.matnat.uio.no> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> <4B516B12.7070803@student.matnat.uio.no> Message-ID: <5b8d13221001160101l78fd1dc7o488a318d6c53340d@mail.gmail.com> On Sat, Jan 16, 2010 at 4:30 PM, Dag Sverre Seljebotn wrote: > Charles R Harris wrote: >> >> >> On Fri, Jan 15, 2010 at 8:56 AM, David Cournapeau > > wrote: >> >> ? ? On Fri, Jan 15, 2010 at 11:46 PM, Ralf Gommers >> ? ? > >> ? ? wrote: >> ? ? ?> Hi David, >> ? ? ?> >> ? ? ?> Here are some questions to get a clearer idea of exactly what's >> ? ? involved in >> ? ? ?> / required for making a release. >> ? ? ?> >> ? ? ?> On Thu, Jan 14, 2010 at 2:34 PM, David Cournapeau >> ? ? > >> ? ? ?> wrote: >> ? ? ?>> >> ? ? ?>> Charles R Harris wrote: >> ? ? ?>> >> ? ? ?>> > >> ? ? ?>> > >> ? ? ?>> > What is the setup one needs to build the installers? It might >> ? ? be well to >> ? ? ?>> > document that, the dependencies, and the process. >> ? ? ?>> >> ? ? ?>> Right. The top script is: >> ? ? ?>> http://projects.scipy.org/numpy/browser/trunk/release.sh >> ? ? ?>> >> ? ? ?>> the bulk of the work is in : >> ? ? ?>> http://projects.scipy.org/numpy/browser/trunk/pavement.py >> ? ? ?>> >> ? ? ?>> which describes what is needed to build installers. On mac os x, the >> ? ? ?>> release script may be used as is to build every installer + the >> ? ? release >> ? ? ?>> notes. >> ? ? ?>> >> ? ? ?> >> ? ? ?> Is it necessary to have OS X to build the dmg installer, or could >> ? ? you build >> ? ? ?> that from linux with some modifications to the build script? >> >> ? ? You cannot cross compile python extensions, so you have to build >> ? ? installer on each platform. Mac Os X is the most practical because you >> ? ? can build windows installers under wine, so you can build both mac and >> ? ? windows installers from the same machine. >> >> ? ? The paver script + the shell script can build everything in one step >> ? ? thanks to this. >> >> ? ? ?> >> ? ? ?> How many combinations do you test manually? All supported Python >> ? ? versions on >> ? ? ?> all platforms? Several Linux flavors? >> >> ? ? I basically assume that linux works once the branch is stabilized, if >> ? ? only because that's what most developers use. It is important to test >> ? ? on the oldest supported python (2.4) and both 32 and 64 bits, though >> ? ? (especially python 2.4 on 64 bits). >> >> ? ? I never test the installers - this is too much work manually. Ideally, >> ? ? this should be done on a build/test farm. >> >> ? ? ?> For someone new to packaging, how much time would you estimate it >> ? ? takes to >> ? ? ?> do a single release? Is most of this time spent testing, or >> ? ? fixing the >> ? ? ?> problems you find during testing? >> >> ? ? Most of the time is spent on fixing build issues which crop up during >> ? ? the beta phase. I found difficult to enforce a strict policy on not >> ? ? changing anything unless critical once in the beta phase. I think we >> ? ? should improve things in that aspect, and go away from the "but this >> ? ? is a small fix" mentality - maybe using something like for the linux >> ? ? kernel, with merge windows, etc... I secretly hope that if we can >> ? ? 
regularly change release managers, it will give a sense of why this is >> ? ? good policy :) >> >> ? ? I feel that we have improved things quite a bit since I have started >> ? ? doing releases: the binary installers are more stable, and build are >> ? ? mostly automated now. The next step would be automated testing of the >> ? ? binary installers (in particular testing new numpy against scipy, >> ? ? etc...), but this is quite a bit of work. >> >> ? ? Having a stricter time-based policy would be good as well. >> >> ? ? ?> Do you have an idea about when to start preparing for the release >> ? ? of 1.4.1? >> >> ? ? The cython thing is the most problematic bug, and should be fixed >> ? ? ASAP. I am still not sure whether that would require both a numpy >> ? ? 1.4.1 and a scipy 0.7.1.1 (built against numpy 1.3.0). >> >> >> Speaking of the cython thing, do you know if the last release of cython >> (0.12) fixes that problem? >> > > If you people referred to this "thing" in slightly more detail, I could > probably answer that :-) > > (If you mean the problems with building a binary for more than one > version of NumPy at once, then no, I don't believe that is even fixed in > trunk yet, though it is trivial to do so its just to remember to do it > before 0.12.1.) I thought the fix 9d8b2ecef24a (on cython-devel branch) by was supposed to fix it ? I try to apply this to cython-stable, but there were some issues, and did not want to waste time on it since it may be trivial to do for someone familiar with cython internals. David From cournape at gmail.com Sat Jan 16 04:19:13 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 16 Jan 2010 18:19:13 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: <5b8d13221001160119j3bde1c2fr78aad9c1b3670799@mail.gmail.com> On Sat, Jan 16, 2010 at 4:57 PM, Ralf Gommers wrote: > > > On Fri, Jan 15, 2010 at 11:56 PM, David Cournapeau > wrote: >> >> > How many combinations do you test manually? All supported Python >> > versions on >> > all platforms? Several Linux flavors? >> >> >> I basically assume that linux works once the branch is stabilized, if >> only because that's what most developers use. It is important to test >> on the oldest supported python (2.4) and both 32 and 64 bits, though >> (especially python 2.4 on 64 bits). >> >> I never test the installers - this is too much work manually. Ideally, >> this should be done on a build/test farm. >> >> > For someone new to packaging, how much time would you estimate it takes >> > to >> > do a single release? Is most of this time spent testing, or fixing the >> > problems you find during testing? >> >> Most of the time is spent on fixing build issues which crop up during >> the beta phase. I found difficult to enforce a strict policy on not >> changing anything unless critical once in the beta phase. I think we >> should improve things in that aspect, and go away from the "but this >> is a small fix" mentality - maybe using something like for the linux >> kernel, with merge windows, etc... I secretly hope that if we can >> regularly change release managers, it will give a sense of why this is >> good policy :) >> >> I feel that we have improved things quite a bit since I have started >> doing releases: the binary installers are more stable, and build are >> mostly automated now. 
The next step would be automated testing of the >> binary installers (in particular testing new numpy against scipy, >> etc...), but this is quite a bit of work. >> >> Having a stricter time-based policy would be good as well. >> > > Thanks for the explanations. I volunteer to help as well. great. > From working on > the docs and scikits.image I am familiar with most of NumPy/SciPy, but not > with the C internals. That's not a problem - I was not either when I started doing it. And there are still a lot of areas I am not familiar with. There is no need to know everything about numpy to do a good job. One thing you could start doing is trying to make the mac os x dmg from the paver script, and familiarize yourself with virtualenv if you don't know it (I use virtualenv to install numpy in a temporary directory before building the doc - this guarantees that the doc matches the exact same version of numpy as the one you are packaging). I spent some time cleaning the paver script before the 1.4.0 release, so it should hopefully be readable. cheers, David From dagss at student.matnat.uio.no Sat Jan 16 04:22:49 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 16 Jan 2010 10:22:49 +0100 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001160101l78fd1dc7o488a318d6c53340d@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> <4B516B12.7070803@student.matnat.uio.no> <5b8d13221001160101l78fd1dc7o488a318d6c53340d@mail.gmail.com> Message-ID: <4B518569.1060301@student.matnat.uio.no> David Cournapeau wrote: > On Sat, Jan 16, 2010 at 4:30 PM, Dag Sverre Seljebotn > wrote: >> Charles R Harris wrote: >>> >>> On Fri, Jan 15, 2010 at 8:56 AM, David Cournapeau >> > wrote: >>> >>> On Fri, Jan 15, 2010 at 11:46 PM, Ralf Gommers >>> > >>> wrote: >>> > Hi David, >>> > >>> > Here are some questions to get a clearer idea of exactly what's >>> involved in >>> > / required for making a release. >>> > >>> > On Thu, Jan 14, 2010 at 2:34 PM, David Cournapeau >>> > >>> > wrote: >>> >> >>> >> Charles R Harris wrote: >>> >> >>> >> > >>> >> > >>> >> > What is the setup one needs to build the installers? It might >>> be well to >>> >> > document that, the dependencies, and the process. >>> >> >>> >> Right. The top script is: >>> >> http://projects.scipy.org/numpy/browser/trunk/release.sh >>> >> >>> >> the bulk of the work is in : >>> >> http://projects.scipy.org/numpy/browser/trunk/pavement.py >>> >> >>> >> which describes what is needed to build installers. On mac os x, the >>> >> release script may be used as is to build every installer + the >>> release >>> >> notes. >>> >> >>> > >>> > Is it necessary to have OS X to build the dmg installer, or could >>> you build >>> > that from linux with some modifications to the build script? >>> >>> You cannot cross compile python extensions, so you have to build >>> installer on each platform. Mac Os X is the most practical because you >>> can build windows installers under wine, so you can build both mac and >>> windows installers from the same machine. >>> >>> The paver script + the shell script can build everything in one step >>> thanks to this. >>> >>> > >>> > How many combinations do you test manually? All supported Python >>> versions on >>> > all platforms? Several Linux flavors? 
>>> >>> I basically assume that linux works once the branch is stabilized, if >>> only because that's what most developers use. It is important to test >>> on the oldest supported python (2.4) and both 32 and 64 bits, though >>> (especially python 2.4 on 64 bits). >>> >>> I never test the installers - this is too much work manually. Ideally, >>> this should be done on a build/test farm. >>> >>> > For someone new to packaging, how much time would you estimate it >>> takes to >>> > do a single release? Is most of this time spent testing, or >>> fixing the >>> > problems you find during testing? >>> >>> Most of the time is spent on fixing build issues which crop up during >>> the beta phase. I found difficult to enforce a strict policy on not >>> changing anything unless critical once in the beta phase. I think we >>> should improve things in that aspect, and go away from the "but this >>> is a small fix" mentality - maybe using something like for the linux >>> kernel, with merge windows, etc... I secretly hope that if we can >>> regularly change release managers, it will give a sense of why this is >>> good policy :) >>> >>> I feel that we have improved things quite a bit since I have started >>> doing releases: the binary installers are more stable, and build are >>> mostly automated now. The next step would be automated testing of the >>> binary installers (in particular testing new numpy against scipy, >>> etc...), but this is quite a bit of work. >>> >>> Having a stricter time-based policy would be good as well. >>> >>> > Do you have an idea about when to start preparing for the release >>> of 1.4.1? >>> >>> The cython thing is the most problematic bug, and should be fixed >>> ASAP. I am still not sure whether that would require both a numpy >>> 1.4.1 and a scipy 0.7.1.1 (built against numpy 1.3.0). >>> >>> >>> Speaking of the cython thing, do you know if the last release of cython >>> (0.12) fixes that problem? >>> >> If you people referred to this "thing" in slightly more detail, I could >> probably answer that :-) >> >> (If you mean the problems with building a binary for more than one >> version of NumPy at once, then no, I don't believe that is even fixed in >> trunk yet, though it is trivial to do so its just to remember to do it >> before 0.12.1.) > > I thought the fix 9d8b2ecef24a (on cython-devel branch) by was > supposed to fix it ? I try to apply this to cython-stable, but there > were some issues, and did not want to waste time on it since it may be > trivial to do for someone familiar with cython internals. Right, missed that one. Looks like it should fix it, yes. I'm guessing that cython-devel could pretty much be rolled straight into 0.12.1 now if you need it (however every release means spending time testing etc.., and I'm not volunteering at this point, so I'm not raising the point on cython-dev..). -- Dag Sverre From millman at berkeley.edu Sat Jan 16 04:40:15 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Sat, 16 Jan 2010 01:40:15 -0800 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: First I want to give David Cournapeau a big thank you for all his hard work as release manager for the last few years. 
It is a lot of work and he has done a great job managing the releases (not to mention all the work he has done as one of the primary developers). I also want to thank Patrick Marsh and Ralf Gommers for stepping up to the plate and volunteering to help with the next release. I would like to ask you both to consider committing to managing the next few releases. I believe managing the releases takes some skills, which you will develop over a few releases. It will be much better for the community and for the project if we can have some consistency over the release process. I know David is willing to help you with at least the first release and I am happy to help as well. One of the things that both David and I would really like to see is finally moving to a time-based release. Both of us had moved in the direction of a time-based release and I think David had more success than I did. Ideally I would like to see the two of you commit to two years as release managers. So if we move to a time-based release every 6 months, then you would be responsible for 4 releases. As release managers, you will be responsible for keeping an eye of the trunk and making sure that it stays in a healthy releasable state. It would be great if you could work on improving the testing infrastructure and coverage. You will need to keep a good line of communication with the developers and keep everyone focused on the release date. You will also need to help write the release notes and build the binaries. Thanks, Jarrod From ralf.gommers at googlemail.com Sat Jan 16 07:59:06 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 16 Jan 2010 20:59:06 +0800 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001160119j3bde1c2fr78aad9c1b3670799@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> <5b8d13221001160119j3bde1c2fr78aad9c1b3670799@mail.gmail.com> Message-ID: On Sat, Jan 16, 2010 at 5:19 PM, David Cournapeau wrote: > On Sat, Jan 16, 2010 at 4:57 PM, Ralf Gommers > > From working on > > the docs and scikits.image I am familiar with most of NumPy/SciPy, but > not > > with the C internals. > > That's not a problem - I was not either when I started doing it. And > there are still a lot of areas I am not familiar with. There is no > need to know everything about numpy to do a good job. > Good, then I'll start learning. > > One thing you could start doing is trying to make the mac os x dmg > from the paver script, and familiarize yourself with virtualenv if you > don't know it (I use virtualenv to install numpy in a temporary > directory before building the doc - this guarantees that the doc > matches the exact same version of numpy as the one you are packaging). > I spent some time cleaning the paver script before the 1.4.0 release, > so it should hopefully be readable. > > The paver script is indeed very readable. I'm familiar with virtualenv, and just tried building a dmg. So here's my first question: You use virtualenv with the --no-site-packages option. This means a whole bunch of stuff already present on my machine has to be downloaded again when building the docs (sphinx, numpydoc, pygments, jinja, etc). You get Sphinx 0.6.4 when you need 1.0 (I think), and doc generation fails because MPL can not be found. What am I missing here? And what is the problem with using site-packages? 
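The workflow under discussion is roughly the following (only a sketch: the
real steps live in the paver script, and the exact install commands and
package versions here are assumptions, not what the task literally runs):

  # create the isolated 'bootstrap' environment used for the doc build
  virtualenv --no-site-packages bootstrap
  source bootstrap/bin/activate
  # doc-build dependencies are fetched into the environment each time
  easy_install sphinx numpydoc
  # matplotlib is also needed, but --no-site-packages hides a system-wide
  # install, which is the failure described above
  python setup.py install        # the numpy being packaged
  (cd doc && make html)          # build the docs against exactly that numpy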
Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Jan 16 08:39:45 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 16 Jan 2010 21:39:45 +0800 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: On Sat, Jan 16, 2010 at 5:40 PM, Jarrod Millman wrote: > First I want to give David Cournapeau a big thank you for all his hard > work as release manager for the last few years. It is a lot of work > and he has done a great job managing the releases (not to mention all > the work he has done as one of the primary developers). > > I also want to thank Patrick Marsh and Ralf Gommers for stepping up to > the plate and volunteering to help with the next release. I would > like to ask you both to consider committing to managing the next few > releases. I believe managing the releases takes some skills, which > you will develop over a few releases. It will be much better for the > community and for the project if we can have some consistency over the > release process. That makes sense. I expect there's a lot to learn, so I intend to do this for a longer time. I know David is willing to help you with at least > the first release and I am happy to help as well. > That's good to hear. > One of the things that both David and I would really like to see is > finally moving to a time-based release. Both of us had moved in the > direction of a time-based release and I think David had more success > than I did. > > Ideally I would like to see the two of you commit to two years as > release managers. So if we move to a time-based release every 6 > months, then you would be responsible for 4 releases. > I can commit to that. This is 4 increments of X in the 1.X.Y version scheme right? So with bug fix releases there should be more than 4 in two years. Then there's also SciPy to think about of course. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Jan 16 09:17:38 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 16 Jan 2010 23:17:38 +0900 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> <5b8d13221001160119j3bde1c2fr78aad9c1b3670799@mail.gmail.com> Message-ID: <5b8d13221001160617o6c8bb2b4ye8d382d8408bb6cf@mail.gmail.com> On Sat, Jan 16, 2010 at 9:59 PM, Ralf Gommers wrote: > > > On Sat, Jan 16, 2010 at 5:19 PM, David Cournapeau > wrote: >> >> On Sat, Jan 16, 2010 at 4:57 PM, Ralf Gommers >> > From working on >> > the docs and scikits.image I am familiar with most of NumPy/SciPy, but >> > not >> > with the C internals. >> >> That's not a problem - I was not either when I started doing it. And >> there are still a lot of areas I am not familiar with. There is no >> need to know everything about numpy to do a good job. > > Good, then I'll start learning. 
>> >> One thing you could start doing is trying to make the mac os x dmg >> from the paver script, and familiarize yourself with virtualenv if you >> don't know it (I use virtualenv to install numpy in a temporary >> directory before building the doc - this guarantees that the doc >> matches the exact same version of numpy as the one you are packaging). >> I spent some time cleaning the paver script before the 1.4.0 release, >> so it should hopefully be readable. >> > > The paver script is indeed very readable. I'm familiar with virtualenv, and > just tried building a dmg. So here's my first question: > You use virtualenv with the --no-site-packages option. This means a whole > bunch of stuff already present on my machine has to be downloaded again when > building the docs (sphinx, numpydoc, pygments, jinja, etc). Yes, that's because I want to be sure to get the wanted version for sphinx. It may not matter much anymore, but before, there were a lot of instabilities between sphinx/numpydoc/matplotlib extensions changes. > You get Sphinx > 0.6.4 when you need 1.0 (I think), and doc generation fails because MPL can > not be found. You need matplotlib to build numpy doc I think - at least it used to be the case. Maybe it is not needed anymore. As I have matplotlib installed on my machine anyway, this has never been an issue for me. David From ralf.gommers at googlemail.com Sat Jan 16 09:35:07 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 16 Jan 2010 22:35:07 +0800 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: <5b8d13221001160617o6c8bb2b4ye8d382d8408bb6cf@mail.gmail.com> References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> <5b8d13221001160119j3bde1c2fr78aad9c1b3670799@mail.gmail.com> <5b8d13221001160617o6c8bb2b4ye8d382d8408bb6cf@mail.gmail.com> Message-ID: On Sat, Jan 16, 2010 at 10:17 PM, David Cournapeau wrote: > On Sat, Jan 16, 2010 at 9:59 PM, Ralf Gommers > > You get Sphinx > > 0.6.4 when you need 1.0 (I think), and doc generation fails because MPL > can > > not be found. > > You need matplotlib to build numpy doc I think - at least it used to > be the case. Maybe it is not needed anymore. As I have matplotlib > installed on my machine anyway, this has never been an issue for me. > > Yes MPL is needed. I have it installed and can build the docs manually. It is just that inside the 'bootstrap' virtualenv MPL import fails because of the --no-site-packages. Maybe it works for you because your MPL install dir is on your PYTHONPATH? Anyway, I think doc building got more robust, so I'll try to modify the paver script to use my site-packages. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From resurgo at gmail.com Sat Jan 16 09:37:13 2010 From: resurgo at gmail.com (Peter Clarke) Date: Sat, 16 Jan 2010 14:37:13 +0000 Subject: [Numpy-discussion] Python coders for Haiti disaster relief Message-ID: Apologies for off topic posting but I think this in an important project. Python programmers are required immediately for assistance in coding a disaster management framework for the Earthquake in Haiti. 
>From http://wiki.python.org/moin/VolunteerOpportunities: ----------------- URGENT REQUEST, Sahana Disaster Management System, Haiti Earthquake *Job Description*:This is an urgent call for experienced Python programmers to help in the Sahana Disaster Management System immediately - knowledge of Web2Py platform would be best. The Sahana Disaster Management System is used to coordinate relief efforts. Please recruit any available programmers for the Haiti effort as quickly as possible and have them contact me immediately so that I can put them in touch with the correct people. Thank you kindly and I do hope that we can quickly identify some contributors for this monumental effort - they are needed ASAP. http://sahanapy.org/ is the developer site and the demo is http://demo.sahanapy.org/ - *Contact*: Connie White, PhD, Institute for Emergency Preparedness, Jacksonville State University - *E-mail contact*: connie.m.white at gmail.com - *Web*: http://sahanapy.org/ ----------------------------- Please help if you can. -Peter Clarke -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 16 10:49:06 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 16 Jan 2010 08:49:06 -0700 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> <5b8d13221001160119j3bde1c2fr78aad9c1b3670799@mail.gmail.com> <5b8d13221001160617o6c8bb2b4ye8d382d8408bb6cf@mail.gmail.com> Message-ID: On Sat, Jan 16, 2010 at 7:35 AM, Ralf Gommers wrote: > > > On Sat, Jan 16, 2010 at 10:17 PM, David Cournapeau wrote: > >> On Sat, Jan 16, 2010 at 9:59 PM, Ralf Gommers >> > You get Sphinx >> > 0.6.4 when you need 1.0 (I think), and doc generation fails because MPL >> can >> > not be found. >> >> You need matplotlib to build numpy doc I think - at least it used to >> be the case. Maybe it is not needed anymore. As I have matplotlib >> installed on my machine anyway, this has never been an issue for me. >> >> Yes MPL is needed. I have it installed and can build the docs manually. It > is just that inside the 'bootstrap' virtualenv MPL import fails because of > the --no-site-packages. Maybe it works for you because your MPL install dir > is on your PYTHONPATH? > > Anyway, I think doc building got more robust, so I'll try to modify the > paver script to use my site-packages. > > It would be useful to document the build environment and how you set it up, along with any version dependencies you uncover. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmarshwx at gmail.com Sat Jan 16 11:45:21 2010 From: patrickmarshwx at gmail.com (Patrick Marsh) Date: Sat, 16 Jan 2010 10:45:21 -0600 Subject: [Numpy-discussion] Wanted: new release manager for 1.5 and above In-Reply-To: References: <5b8d13221001132102k4f19ee45td6b54bd5df3c8578@mail.gmail.com> <4B4EBAE2.7040703@silveregg.co.jp> <5b8d13221001150756g32436cf4i49084207d041125a@mail.gmail.com> Message-ID: I think all of this makes perfect sense, and I'm willing to commit to at least the next 2 years. I'm glad to hear there will be some help initially, as I know I will need some spinup time/help. I'm currently out of town and away from my macbook pro, but when I get back I'll try to set up a build environment on it. 
In the mean time I'll work on setting up a build environment on my windows machine as practice. I look forward to working more closely with everyone. It's about time I started giving back! Thanks, Patrick On Sat, Jan 16, 2010 at 3:40 AM, Jarrod Millman wrote: > First I want to give David Cournapeau a big thank you for all his hard > work as release manager for the last few years. It is a lot of work > and he has done a great job managing the releases (not to mention all > the work he has done as one of the primary developers). > > I also want to thank Patrick Marsh and Ralf Gommers for stepping up to > the plate and volunteering to help with the next release. I would > like to ask you both to consider committing to managing the next few > releases. I believe managing the releases takes some skills, which > you will develop over a few releases. It will be much better for the > community and for the project if we can have some consistency over the > release process. I know David is willing to help you with at least > the first release and I am happy to help as well. > > One of the things that both David and I would really like to see is > finally moving to a time-based release. Both of us had moved in the > direction of a time-based release and I think David had more success > than I did. > > Ideally I would like to see the two of you commit to two years as > release managers. So if we move to a time-based release every 6 > months, then you would be responsible for 4 releases. > > As release managers, you will be responsible for keeping an eye of the > trunk and making sure that it stays in a healthy releasable state. It > would be great if you could work on improving the testing > infrastructure and coverage. You will need to keep a good line of > communication with the developers and keep everyone focused on the > release date. You will also need to help write the release notes and > build the binaries. > > Thanks, > Jarrod > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Patrick Marsh Ph.D. Student / Graduate Research Assistant School of Meteorology / University of Oklahoma Cooperative Institute for Mesoscale Meteorological Studies National Severe Storms Laboratory http://www.patricktmarsh.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Sat Jan 16 14:12:04 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Sat, 16 Jan 2010 13:12:04 -0600 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? Message-ID: My questions here concern those familiar with configure/build/install systems such as distutils, setuptools, scons/numscons or waf (particularly David Cournapeau). I'm creating a tool known as 'fwrap' that has a component that needs to do essentially what f2py does now -- take fortran source code and compile it into a python extension module. It uses Cython to create the extension module, and the current configure/build/install system is a very kludgy monkeypatched Cython.distutils and numpy.distutils setup.py script. The setup.py script works for testing on my system here, but for going prime time, I dread using it. David has made his critiques of distutils known for scientific software, and I agree. What's the best alternative? 
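For concreteness, the kind of numpy.distutils script involved looks roughly
like the following (module and file names are placeholders, not fwrap's
actual generated code, and the C file is assumed to have been produced from
the .pyx by Cython beforehand):

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('fwproj', parent_package, top_path)
    # numpy.distutils knows how to compile and link the Fortran sources
    # together with the Cython-generated C wrapper in a single extension
    config.add_extension('fwproj',
                         sources=['fwproj.c', 'fwproj_fortran.f90'])
    return config

if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(configuration=configuration)

Running "python setup.py build_ext --inplace" on such a script drops the
extension module next to the sources, which is the usual way to test it.
The kludge mentioned above is in making Cython.distutils and numpy.distutils
cooperate on top of this.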
More specifically: what are the pros/cons between waf and scons/numscons for configure/build/install of a Fortran-C-Cython-Python project? Is scons capable of handling the configure and install stages, or is it only a build system? As I understand it, numscons is called from distutils; distutils handles the configure/install stages. Scons/numscons have more fortran support that waf, from what I can see. The main downside of using scons is that I'd still have to mess around with distutils. It looks like waf has explicit support for all three stages, and could be just what I'm looking for. David has a few threads on the waf-users list about getting fortran working with waf. Has that progressed much? I want to contribute to this, for the benefit of scipy and my project, and to limit duplicated work. From what I gather, the fortran configuration stuff in numscons is separated nicely from the scon-specific stuff :-) Would it be a matter of porting the numscons fortran stuff into waf? Any comments you have on using waf/scons for numerical projects would be welcome! Kurt From matthieu.brucher at gmail.com Sat Jan 16 14:57:57 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 Jan 2010 20:57:57 +0100 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: References: Message-ID: Hi, SCons can also do configuration and installation steps. David made it possible to use SCons capabilities from distutils, but you can still make a C/Fortran/Cython/Python project with SCons. Matthieu 2010/1/16 Kurt Smith : > My questions here concern those familiar with configure/build/install > systems such as distutils, setuptools, scons/numscons or waf > (particularly David Cournapeau). > > I'm creating a tool known as 'fwrap' that has a component that needs > to do essentially what f2py does now -- take fortran source code and > compile it into a python extension module. ?It uses Cython to create > the extension module, and the current configure/build/install system > is a very kludgy monkeypatched Cython.distutils and numpy.distutils > setup.py script. ?The setup.py script works for testing on my system > here, but for going prime time, I dread using it. ?David has made his > critiques of distutils known for scientific software, and I agree. > What's the best alternative? > > More specifically: what are the pros/cons between waf and > scons/numscons for configure/build/install of a > Fortran-C-Cython-Python project? > > Is scons capable of handling the configure and install stages, or is > it only a build system? ?As I understand it, numscons is called from > distutils; distutils handles the configure/install stages. > Scons/numscons have more fortran support that waf, from what I can > see. ?The main downside of using scons is that I'd still have to mess > around with distutils. > > It looks like waf has explicit support for all three stages, and could > be just what I'm looking for. ?David has a few threads on the > waf-users list about getting fortran working with waf. ?Has that > progressed much? ?I want to contribute to this, for the benefit of > scipy and my project, and to limit duplicated work. ?From what I > gather, the fortran configuration stuff in numscons is separated > nicely from the scon-specific stuff :-) ?Would it be a matter of > porting the numscons fortran stuff into waf? > > Any comments you have on using waf/scons for numerical projects would > be welcome! 
> > Kurt > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From dagss at student.matnat.uio.no Sat Jan 16 15:38:22 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 16 Jan 2010 21:38:22 +0100 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: References: Message-ID: <4B5223BE.7070101@student.matnat.uio.no> Kurt Smith wrote: > My questions here concern those familiar with configure/build/install > systems such as distutils, setuptools, scons/numscons or waf > (particularly David Cournapeau). > > I'm creating a tool known as 'fwrap' that has a component that needs > to do essentially what f2py does now -- take fortran source code and > compile it into a python extension module. It uses Cython to create > the extension module, and the current configure/build/install system > is a very kludgy monkeypatched Cython.distutils and numpy.distutils > setup.py script. The setup.py script works for testing on my system > here, but for going prime time, I dread using it. David has made his > critiques of distutils known for scientific software, and I agree. > What's the best alternative? > > More specifically: what are the pros/cons between waf and > scons/numscons for configure/build/install of a > Fortran-C-Cython-Python project? > > Is scons capable of handling the configure and install stages, or is > it only a build system? As I understand it, numscons is called from > distutils; distutils handles the configure/install stages. > Scons/numscons have more fortran support that waf, from what I can > see. The main downside of using scons is that I'd still have to mess > around with distutils. > Not that I really know anything about it, but note that one of the purposes of David's toydist is to handle the install stage independently of the build system used. That is, it is able to create e.g. Python eggs without using setuptools. The thing is, installing Python software is something of a mess, and every system would want this done differently (making an Ubuntu package, creating a DMG, or creating a Python egg are all different things). So I think it makes sense to decouple this from the build in the tools that are used. Of course, toydist is beta, and I dare say you have enough beta dependencies for fwrap already :-) Dag Sverre > It looks like waf has explicit support for all three stages, and could > be just what I'm looking for. David has a few threads on the > waf-users list about getting fortran working with waf. Has that > progressed much? I want to contribute to this, for the benefit of > scipy and my project, and to limit duplicated work. From what I > gather, the fortran configuration stuff in numscons is separated > nicely from the scon-specific stuff :-) Would it be a matter of > porting the numscons fortran stuff into waf? > > Any comments you have on using waf/scons for numerical projects would > be welcome! 
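For contrast, the coupled status quo drives the build and most packaging
formats through the same setup.py; an illustrative (not exhaustive) list of
the standard distutils/setuptools commands:

  python setup.py build          # compile the package
  python setup.py install        # install directly
  python setup.py bdist_egg      # Python egg (requires setuptools)
  python setup.py bdist_wininst  # Windows installer
  python setup.py bdist_rpm      # RPM package

Each of those targets has to be wired into the build machinery itself, which
is the coupling being criticized here.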
> > Kurt > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwmsmith at gmail.com Sat Jan 16 16:28:32 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Sat, 16 Jan 2010 15:28:32 -0600 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: <4B5223BE.7070101@student.matnat.uio.no> References: <4B5223BE.7070101@student.matnat.uio.no> Message-ID: On Sat, Jan 16, 2010 at 2:38 PM, Dag Sverre Seljebotn wrote: > Not that I really know anything about it, but note that one of the > purposes of David's toydist is to handle the install stage independently > of the build system used. That is, it is able to create e.g. Python eggs > without using setuptools. > > The thing is, installing Python software is something of a mess, and > every system would want this done differently (making an Ubuntu package, > creating a DMG, or creating a Python egg are all different things). So I > think it makes sense to decouple this from the build in the tools that > are used. Yep. Good points. I expect once I get the configure/build stages in a working state, I'll have most of what people need. The install stage is less crucial, at least for the first version. Seems like people would like the system to just create a .so file in the current directory, and leave it at that. If I can get that working on all platforms I'll be very happy :-) > > Of course, toydist is beta, and I dare say you have enough beta > dependencies for fwrap already :-) :-) Hopefully that can be remedied that in the coming months, at least from the fparser and memoryview-support-in-Cython side of things. > > Dag Sverre From dagss at student.matnat.uio.no Sat Jan 16 16:43:53 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 16 Jan 2010 22:43:53 +0100 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: References: <4B5223BE.7070101@student.matnat.uio.no> Message-ID: <4B523319.8020603@student.matnat.uio.no> Kurt Smith wrote: > On Sat, Jan 16, 2010 at 2:38 PM, Dag Sverre Seljebotn > wrote: > > >> Not that I really know anything about it, but note that one of the >> purposes of David's toydist is to handle the install stage independently >> of the build system used. That is, it is able to create e.g. Python eggs >> without using setuptools. >> >> The thing is, installing Python software is something of a mess, and >> every system would want this done differently (making an Ubuntu package, >> creating a DMG, or creating a Python egg are all different things). So I >> think it makes sense to decouple this from the build in the tools that >> are used. >> > > Yep. Good points. I expect once I get the configure/build stages in > a working state, I'll have most of what people need. The install > stage is less crucial, at least for the first version. Seems like > people would like the system to just create a .so file in the current > directory, and leave it at that. If I can get that working on all > platforms I'll be very happy :-) > > >> Of course, toydist is beta, and I dare say you have enough beta >> dependencies for fwrap already :-) >> > > :-) > > Hopefully that can be remedied that in the coming months, at least > from the fparser and memoryview-support-in-Cython side of things. > Obviously I didn't get around to that yet... 
As for the build systems, some things to consider (I have no clue myself as to waf vs. scons): - There's already primitive scons support for Cython, but I'm sure it wouldn't be hard to add to waf - Whatever you pick is likely to become the best supported build system for Cython code in the future, I think, due to our interest in working on it - Does waf have infrastructure for parsing files and finding dependencies? I know that in Scons one can plug in a "Cython parser", which checks the dependencies (which pxds are used, basically), so that pyx files are rebuilt automatically when pxds they depend on change. I'm sure waf supports something similar, if not I'd say it disqualifies it. My own hunch is that waf looks better, but scons has a larger mind share and Cython support right now in scientific Python, and that both must be supported eventually, so why not do scons first... *shrug* But like you I'm anxious to hear from more non-Cython devs as well on this matter. Dag Sverre From ndbecker2 at gmail.com Sat Jan 16 18:26:26 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 16 Jan 2010 18:26:26 -0500 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? References: Message-ID: Matthieu Brucher wrote: > Hi, > > SCons can also do configuration and installation steps. David made it > possible to use SCons capabilities from distutils, but you can still > make a C/Fortran/Cython/Python project with SCons. > Also, while I think waf looks interesting, I've seen almost 0 projects actually using it. From josef.pktd at gmail.com Sat Jan 16 18:36:02 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 16 Jan 2010 18:36:02 -0500 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: <4B523319.8020603@student.matnat.uio.no> References: <4B5223BE.7070101@student.matnat.uio.no> <4B523319.8020603@student.matnat.uio.no> Message-ID: <1cd32cbb1001161536i2696e043j4a8fd06cca8be416@mail.gmail.com> On Sat, Jan 16, 2010 at 4:43 PM, Dag Sverre Seljebotn wrote: > Kurt Smith wrote: >> On Sat, Jan 16, 2010 at 2:38 PM, Dag Sverre Seljebotn >> wrote: >> >> >>> Not that I really know anything about it, but note that one of the >>> purposes of David's toydist is to handle the install stage independently >>> of the build system used. That is, it is able to create e.g. Python eggs >>> without using setuptools. >>> >>> The thing is, installing Python software is something of a mess, and >>> every system would want this done differently (making an Ubuntu package, >>> creating a DMG, or creating a Python egg are all different things). So I >>> think it makes sense to decouple this from the build in the tools that >>> are used. >>> >> >> Yep. ?Good points. ?I expect once I get the configure/build stages in >> a working state, I'll have most of what people need. ?The install >> stage is less crucial, at least for the first version. ?Seems like >> people would like the system to just create a .so file in the current >> directory, and leave it at that. ?If I can get that working on all >> platforms I'll be very happy :-) >> >> >>> Of course, toydist is beta, and I dare say you have enough beta >>> dependencies for fwrap already :-) >>> >> >> :-) >> >> Hopefully that can be remedied that in the coming months, at least >> from the fparser and memoryview-support-in-Cython side of things. >> > > Obviously I didn't get around to that yet... 
> > As for the build systems, some things to consider (I have no clue myself > as to waf vs. scons): > ?- There's already primitive scons support for Cython, but I'm sure it > wouldn't be hard to add to waf > ?- Whatever you pick is likely to become the best supported build system > for Cython code in the future, I think, due to our interest in working on it > ?- Does waf have infrastructure for parsing files and finding > dependencies? I know that in Scons one can plug in a "Cython parser", > which checks the dependencies (which pxds are used, basically), so that > pyx files are rebuilt automatically when pxds they depend on change. I'm > sure waf supports something similar, if not I'd say it disqualifies it. > > My own hunch is that waf looks better, but scons has a larger mind share > and Cython support right now in scientific Python, and that both must be > supported eventually, so why not do scons first... *shrug* > > But like you I'm anxious to hear from more non-Cython devs as well on > this matter. > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >From a very brief look at the waf book, I don't really understand what the cross-platform capabilities of waf are http://freehackers.org/~tnagy/wafbook/single.html : "Installing Waf on a system is unnecessary and discouraged: " "Operating systems: Waf cannot be installed on Windows (yet)" Josef From cournape at gmail.com Sat Jan 16 23:04:24 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 17 Jan 2010 13:04:24 +0900 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: References: Message-ID: <5b8d13221001162004w640e4854obe27a78942bb551b@mail.gmail.com> On Sun, Jan 17, 2010 at 4:12 AM, Kurt Smith wrote: > My questions here concern those familiar with configure/build/install > systems such as distutils, setuptools, scons/numscons or waf > (particularly David Cournapeau). > > I'm creating a tool known as 'fwrap' that has a component that needs > to do essentially what f2py does now -- take fortran source code and > compile it into a python extension module. ?It uses Cython to create > the extension module, and the current configure/build/install system > is a very kludgy monkeypatched Cython.distutils and numpy.distutils > setup.py script. ?The setup.py script works for testing on my system > here, but for going prime time, I dread using it. ?David has made his > critiques of distutils known for scientific software, and I agree. > What's the best alternative? The best alternative in the short term is no alternative: making sure everything you need is incorporated in numpy.distutils. Otherwise, you will have to recreate everything that distutils is doing: you will have people who will demand egg, mac os x .mpkg, windows installers, etc.. Basically what I am trying to do with toydist now - I don't mind getting help there, though :) I promised to add decent cython support in numpy.distutils for 1.5.0, maybe we should see what we can do for fwrap at the same time. I am also a bit unclear about what is needed exactly, and what would be the workflow: I don't understand why fwrap should care about packaging/deployment at all, for example. > > More specifically: what are the pros/cons between waf and > scons/numscons for configure/build/install of a > Fortran-C-Cython-Python project? 
waf has no fortran support whatsoever, so you would need to add it. Waf codebase is much better than scons, but there lacks some internal documentation. There were some things that I did not manage to do in waf, because the internal API for scanning/dependency injection was not very clear to me (by scanning I mean the ability to scan the source code to look for dependency, e.g. fortran modules, and by dependency injection, I mean adding new targets to the DAG of dependencies at runtime - again needed for fortran modules). Basic handling of fortran compilation and fortran detection was relatively easy in comparison. The biggest drawback I see with waf is the lack of users: the only significant project I know which uses waf is Ardour. OTOH, I believe scons has deep structural problems, and only a few people can change some significant parts of the code. > > Is scons capable of handling the configure and install stages, or is > it only a build system? ?As I understand it, numscons is called from > distutils; distutils handles the configure/install stages. Distutils only handles the installation - everything done within the build_* command is done by the scons distutils command, and configuration is done by scons as well. Scons configure mechanism is very primitive - it uses a separate framework than the rest of the tool, which means in particular that scons top notch dependency handling does not work well for the configuration stage. waf is much better in that aspect. > Scons/numscons have more fortran support that waf, from what I can > see. ?The main downside of using scons is that I'd still have to mess > around with distutils. My main point should be this: whatever you do, you will end up messing with distutils, unless you reimplement everything that distutils does, be it waf, scons, etc... In the short term, adding things to numpy.distutils is the easiest path. Long term, I hope toydist will be a tool which will enable exactly what you want: using a build system of your choice, and being able to reuse existing code for installation/packaging to avoid recreating it yourself. You will be able to create an exe/egg/pkg from a simple package representation, every package will have a common interface for the user independently of the internal build tool, etc... David From cournape at gmail.com Sat Jan 16 23:08:30 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 17 Jan 2010 13:08:30 +0900 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: <1cd32cbb1001161536i2696e043j4a8fd06cca8be416@mail.gmail.com> References: <4B5223BE.7070101@student.matnat.uio.no> <4B523319.8020603@student.matnat.uio.no> <1cd32cbb1001161536i2696e043j4a8fd06cca8be416@mail.gmail.com> Message-ID: <5b8d13221001162008w60af8028s526b87f058b43b2f@mail.gmail.com> On Sun, Jan 17, 2010 at 8:36 AM, wrote: > On Sat, Jan 16, 2010 at 4:43 PM, Dag Sverre Seljebotn > wrote: >> Kurt Smith wrote: >>> On Sat, Jan 16, 2010 at 2:38 PM, Dag Sverre Seljebotn >>> wrote: >>> >>> >>>> Not that I really know anything about it, but note that one of the >>>> purposes of David's toydist is to handle the install stage independently >>>> of the build system used. That is, it is able to create e.g. Python eggs >>>> without using setuptools. 
>>>> >>>> The thing is, installing Python software is something of a mess, and >>>> every system would want this done differently (making an Ubuntu package, >>>> creating a DMG, or creating a Python egg are all different things). So I >>>> think it makes sense to decouple this from the build in the tools that >>>> are used. >>>> >>> >>> Yep. ?Good points. ?I expect once I get the configure/build stages in >>> a working state, I'll have most of what people need. ?The install >>> stage is less crucial, at least for the first version. ?Seems like >>> people would like the system to just create a .so file in the current >>> directory, and leave it at that. ?If I can get that working on all >>> platforms I'll be very happy :-) >>> >>> >>>> Of course, toydist is beta, and I dare say you have enough beta >>>> dependencies for fwrap already :-) >>>> >>> >>> :-) >>> >>> Hopefully that can be remedied that in the coming months, at least >>> from the fparser and memoryview-support-in-Cython side of things. >>> >> >> Obviously I didn't get around to that yet... >> >> As for the build systems, some things to consider (I have no clue myself >> as to waf vs. scons): >> ?- There's already primitive scons support for Cython, but I'm sure it >> wouldn't be hard to add to waf >> ?- Whatever you pick is likely to become the best supported build system >> for Cython code in the future, I think, due to our interest in working on it >> ?- Does waf have infrastructure for parsing files and finding >> dependencies? I know that in Scons one can plug in a "Cython parser", >> which checks the dependencies (which pxds are used, basically), so that >> pyx files are rebuilt automatically when pxds they depend on change. I'm >> sure waf supports something similar, if not I'd say it disqualifies it. >> >> My own hunch is that waf looks better, but scons has a larger mind share >> and Cython support right now in scientific Python, and that both must be >> supported eventually, so why not do scons first... *shrug* >> >> But like you I'm anxious to hear from more non-Cython devs as well on >> this matter. >> >> Dag Sverre >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > >From a very brief look at the waf book, I don't really understand what > the cross-platform capabilities of waf are > > http://freehackers.org/~tnagy/wafbook/single.html : > "Installing Waf on a system is unnecessary and discouraged: " The main waf author claims that waf should never be installed, and always included with your package. I think it makes sense for a lot of practical cases (and that's how autotools work, mostly: autoconf/automake are not needed when building something from sources, because you have a gianting shell script called configure). I know there has been some effort toward better windows support for waf - given that waf is written in python, it is hard to see a architectural reason why waf could not work well on windows. But build tools depend a lot spawning processes and the likes both efficiently and reliably, and that's one of the area where windows and unix-like systems are fundamentally different. David From jacob.benoit.1 at gmail.com Sun Jan 17 00:20:39 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Sun, 17 Jan 2010 00:20:39 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. 
matlab Message-ID: >> Hi, >> >> I while back, someone talked about aigen2(http://eigen.tuxfamily.org/). In >> their benchmark they give info that they are competitive again mkl and goto >> on matrix matrix product. They are not better, but that could make a good >> default implementation for numpy when their is no blas installed. I think >> the license would allow to include it in numpy directly. > >It is licensed under the LGPLv3, so it is not compatible with the numpy license. Hi, I'm one of Eigen's authors. Eigen is indeed LGPL3 licensed. Our intent and understanding is that this makes Eigen usable by virtually any software, whence my disappointment to learn that LGPL3 software can't be used by NumPy. Just for my information, could you tell my why NumPy can't use LGPL3-licensed libraries? I found this page: http://www.scipy.org/License_Compatibility It does say that LGPL-licensed code can't be added to NumPy, but there's a big difference between adding LGPL code directly into NumPy, and just letting NumPy _use_ LGPL code. Couldn't you simply: - either add LGPL-licensed code to a third_party subdirectory not subject to the NumPy license, and just use it? This is common practice, see e.g. how Qt puts a copy of WebKit in a third_party subdirectory. - or use LGPL-licensed code as an external dependency? FYI, several BSD-licensed projects are using Eigen ;) Thanks for your consideration Benoit From cournape at gmail.com Sun Jan 17 00:58:34 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 17 Jan 2010 14:58:34 +0900 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: Message-ID: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> On Sun, Jan 17, 2010 at 2:20 PM, Benoit Jacob wrote: > Couldn't you simply: > ?- either add LGPL-licensed code to a third_party subdirectory not > subject to the NumPy license, and just use it? This is common > practice, see e.g. how Qt puts a copy of WebKit in a third_party > subdirectory. > ?- or use LGPL-licensed code as an external dependency? There are several issues with eigen2 for NumPy usage: - using it as a default implementation does not make much sense IMHO, as it would make distributed binaries non 100 % BSD. - to my knowledge, eigen2 does not have a BLAS API, so we would have to write specific wrappers for eigen2, which is undesirable. - eigen2 is C++, and it is a stated goal to make numpy depend only on a C compiler (it may optionally uses fortran to link against blas/lapack, though). As I see it, people would be able to easily use eigen2 if there was a BLAS API for it. We still would not distribute binaries built with eigen2, but it means people who don't care about using GPL code could use it. Independently of NumPy, I think a BLAS API for eigen2 would be very beneficial for eigen2 if you care about the numerical scientific community. David From jacob.benoit.1 at gmail.com Sun Jan 17 09:52:31 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Sun, 17 Jan 2010 09:52:31 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> Message-ID: 2010/1/17 David Cournapeau : > On Sun, Jan 17, 2010 at 2:20 PM, Benoit Jacob wrote: > >> Couldn't you simply: >> ?- either add LGPL-licensed code to a third_party subdirectory not >> subject to the NumPy license, and just use it? This is common >> practice, see e.g. 
how Qt puts a copy of WebKit in a third_party >> subdirectory. >> ?- or use LGPL-licensed code as an external dependency? > Thanks for the reply! First of all I should say that I was only talking about the raised licensing issue, I'm not saying that you _should_ use eigen from a technical point of view. > There are several issues with eigen2 for NumPy usage: > ?- using it as a default implementation does not make much sense IMHO, > as it would make distributed binaries non 100 % BSD. But the LGPL doesn't impose restrictions on the usage of binaries, so how does it matter? The LGPL and the BSD licenses are similar as far as the binaries are concerned (unless perhaps one starts disassembling them). The big difference between LGPL and BSD is at the level of source code, not binary code: one modifies LGPL-based source code and distributes a binary form of it, then one has to release the modified source code as well. Since NumPy's users are presumably not interested in modifying _Eigen_ itself, I don't think that matters. I understand that they may want to modify NumPy's source code without releasing their modified source code, so the BSD license is important for NumPy, but having Eigen in a third_party directory wouldn't affect that. > ?- to my knowledge, eigen2 does not have a BLAS API, so we would have > to write specific wrappers for eigen2, which is undesirable. That's true. FYI, a BLAS API is coming in Eigen 3, https://bitbucket.org/eigen/eigen/src/tip/blas/ > ?- eigen2 is C++, and it is a stated goal to make numpy depend only on > a C compiler (it may optionally uses fortran to link against > blas/lapack, though). Ah OK. Well, once the Eigen BLAS is implemented, it will be usable by a C compiler. > As I see it, people would be able to easily use eigen2 if there was a > BLAS API for it. We still would not distribute binaries built with > eigen2, but it means people who don't care about using GPL code could > use it. I see. I'd quite like to see this happening! Maybe, just give a look at where Eigen is in 1 year from now, the BLAS should be ready for that. > > Independently of NumPy, I think a BLAS API for eigen2 would be very > beneficial for eigen2 if you care about the numerical scientific > community. So do we, that's why we're doing it ;) see above. Benoit From thomas.robitaille at gmail.com Sun Jan 17 12:14:33 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Sun, 17 Jan 2010 12:14:33 -0500 Subject: [Numpy-discussion] Structured array sorting Message-ID: I am having trouble sorting a structured array - in the example below, sorting by the first column (col1) seems to work, but not sorting by the second column (col2). Is this a bug? I am using numpy svn r8071 on MacOS 10.6. Thanks for any help, Thomas Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> data = np.array([('a ', 2.), ('b', 4.), ('d', 3.), ('c', 1.)], ... 
dtype=[('col1', '|S5'), ('col2', '>f8')]) >>> >>> data array([('a ', 2.0), ('b', 4.0), ('d', 3.0), ('c', 1.0)], dtype=[('col1', '|S5'), ('col2', '>f8')]) >>> data.sort(order=['col1']) >>> data array([('a ', 2.0), ('b', 4.0), ('c', 1.0), ('d', 3.0)], dtype=[('col1', '|S5'), ('col2', '>f8')]) >>> data.sort(order=['col2']) >>> data array([('a ', 2.0), ('d', 3.0), ('b', 4.0), ('c', 1.0)], dtype=[('col1', '|S5'), ('col2', '>f8')]) From robert.kern at gmail.com Sun Jan 17 12:36:53 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 17 Jan 2010 11:36:53 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> Message-ID: <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: > 2010/1/17 David Cournapeau : >> There are several issues with eigen2 for NumPy usage: >> ?- using it as a default implementation does not make much sense IMHO, >> as it would make distributed binaries non 100 % BSD. > > But the LGPL doesn't impose restrictions on the usage of binaries, so > how does it matter? The LGPL and the BSD licenses are similar as far > as the binaries are concerned (unless perhaps one starts disassembling > them). > > The big difference between LGPL and BSD is at the level of source > code, not binary code: one modifies LGPL-based source code and > distributes a binary form of it, then one has to release the modified > source code as well. This is not true. Binaries that contain LGPLed code must be able to be relinked with a modified version of the LGPLed component. This is technically non-trivial. In addition, binaries containing an LGPLed component must still come with the source of the LGPLed component (or come with a written offer to distribute via the same mechanism ... yada yada yada). These are non-trivial restrictions above and beyond the BSD license that we, as a matter of policy, do not wish to impose on numpy users. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sierra_mtnview at sbcglobal.net Sun Jan 17 12:57:51 2010 From: sierra_mtnview at sbcglobal.net (Wayne Watson) Date: Sun, 17 Jan 2010 09:57:51 -0800 Subject: [Numpy-discussion] Module Index for numpy? Message-ID: <4B534F9F.4020901@sbcglobal.net> I was just looking at the (Win) Python documentation via the Help on IDLE, and a Global Module Index. Does anything like that exist for numpy, matplotlib, scipy? -- Wayne Watson (Watson Adventures, Prop., Nevada City, CA) (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) Obz Site: 39? 15' 7" N, 121? 2' 32" W, 2700 feet "I was thinking about how people seem to read the Bible a whole lot more as they get older; then it dawned on me . . they're cramming for their final exam." -- George Carlin Web Page: From jacob.benoit.1 at gmail.com Sun Jan 17 13:11:23 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Sun, 17 Jan 2010 13:11:23 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. 
matlab In-Reply-To: <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> Message-ID: 2010/1/17 Robert Kern : > On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: >> 2010/1/17 David Cournapeau : > >>> There are several issues with eigen2 for NumPy usage: >>> ?- using it as a default implementation does not make much sense IMHO, >>> as it would make distributed binaries non 100 % BSD. >> >> But the LGPL doesn't impose restrictions on the usage of binaries, so >> how does it matter? The LGPL and the BSD licenses are similar as far >> as the binaries are concerned (unless perhaps one starts disassembling >> them). >> >> The big difference between LGPL and BSD is at the level of source >> code, not binary code: one modifies LGPL-based source code and >> distributes a binary form of it, then one has to release the modified >> source code as well. > > This is not true. Binaries that contain LGPLed code must be able to be > relinked with a modified version of the LGPLed component. This doesn't apply to Eigen which is a header-only pure template library, hence can't be 'linked' to. Actually you seem to be referring to Section 4 of the LGPL3, we have already asked the FSF about this and their reply was that it just doesn't apply in the case of Eigen: http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html In your case, what matters is Section 5. >In addition, binaries containing an LGPLed > component must still come with the source of the LGPLed component (or > come with a written offer to distribute via the same mechanism ... > yada yada yada). Since you would presumably be using vanilla Eigen without changes of your own, it would be enough to just give the link to the Eigen website, that's all. Just one line, and it doesn't have to be in a very prominent place, it just has to be reasonably easy to find for someone looking for it. > These are non-trivial restrictions above and beyond > the BSD license that we, as a matter of policy, do not wish to impose > on numpy users. The only thing you'd be imposing on NumPy users would be that somewhere at the bottom of, say, your README file, there would be a link to Eigen's website. Then who am I to discuss your policies ;) Finally let me just give an example why this is moot. You are using GCC, right? So you use the GNU libc (their standard C library)? It is LGPL ;) It's just that nobody cares to put a link to the GNU libc homepage, which is understandable ;) Cheers, Benoit From warren.weckesser at enthought.com Sun Jan 17 13:18:40 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 17 Jan 2010 12:18:40 -0600 Subject: [Numpy-discussion] Structured array sorting In-Reply-To: References: Message-ID: <4B535480.3010406@enthought.com> Looks like 'sort' is not handling endianess of the column data correctly. If you change the type of the floating point data to '<f8', the sort works. Here's a simpler demonstration of the problem with a single big-endian column: ----- In [140]: z = np.array([...], dtype=[('num', '>i2')]) In [141]: z.sort(order='num') In [142]: z Out[142]: array([(255,), (0,), (256,), (1,), (258,)], dtype=[('num', '>i2')]) In [143]: np.__version__ Out[143]: '1.3.0' ----- Sorting works as expected with a simple array of short ints: ----- In [152]: w = np.array([0,258, 3, 255], dtype='<i2') In [153]: w.sort() In [154]: w Out[154]: array([  0,   3, 255, 258], dtype=int16) ----- Warren Thomas Robitaille wrote: > I am having trouble sorting a structured array - in the example below, sorting by the first column (col1) seems to work, but not sorting by the second column (col2). Is this a bug? > > I am using numpy svn r8071 on MacOS 10.6. 
> > Thanks for any help, > > Thomas > > Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51) > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>>> import numpy as np >>>> data = np.array([('a ', 2.), ('b', 4.), ('d', 3.), ('c', 1.)], >>>> > ... dtype=[('col1', '|S5'), ('col2', '>f8')]) > >>>> data >>>> > array([('a ', 2.0), ('b', 4.0), ('d', 3.0), ('c', 1.0)], > dtype=[('col1', '|S5'), ('col2', '>f8')]) > >>>> data.sort(order=['col1']) >>>> data >>>> > array([('a ', 2.0), ('b', 4.0), ('c', 1.0), ('d', 3.0)], > dtype=[('col1', '|S5'), ('col2', '>f8')]) > >>>> data.sort(order=['col2']) >>>> data >>>> > array([('a ', 2.0), ('d', 3.0), ('b', 4.0), ('c', 1.0)], > dtype=[('col1', '|S5'), ('col2', '>f8')]) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Sun Jan 17 13:43:15 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 17 Jan 2010 12:43:15 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> Message-ID: <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob wrote: > 2010/1/17 Robert Kern : >> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: >>> 2010/1/17 David Cournapeau : >> >>>> There are several issues with eigen2 for NumPy usage: >>>> ?- using it as a default implementation does not make much sense IMHO, >>>> as it would make distributed binaries non 100 % BSD. >>> >>> But the LGPL doesn't impose restrictions on the usage of binaries, so >>> how does it matter? The LGPL and the BSD licenses are similar as far >>> as the binaries are concerned (unless perhaps one starts disassembling >>> them). >>> >>> The big difference between LGPL and BSD is at the level of source >>> code, not binary code: one modifies LGPL-based source code and >>> distributes a binary form of it, then one has to release the modified >>> source code as well. >> >> This is not true. Binaries that contain LGPLed code must be able to be >> relinked with a modified version of the LGPLed component. > > This doesn't apply to Eigen which is a header-only pure template > library, hence can't be 'linked' to. > > Actually you seem to be referring to Section 4 of the LGPL3, we have > already asked the FSF about this and their reply was that it just > doesn't apply in the case of Eigen: > > http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html > > In your case, what matters is Section 5. You mean Section 3. Good. I admit to being less up on the details of LGPLv3 than I was of LGPLv2 which had a problem with C++ header templates. That said, we will not be using the C++ templates directly in numpy for technical reasons (not least that we do not want to require a C++ compiler for the default build). At best, we would be using a BLAS interface which requires linking of objects, not just header templates. That *would* impose the Section 4 requirements. Furthermore, we would still prefer not to have any LGPL code in the official numpy sources or binaries, regardless of how minimal the real requirements are. Licensing is confusing enough that being able to say "numpy is BSD licensed" without qualification is quite important. 
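Coming back to the structured array sorting thread above: while that byte-order bug is open, one user-side workaround is to sort through a native-byte-order copy of the key column instead of relying on sort(order=...). The sketch below reuses the layout from Thomas's example; it is a workaround illustration only, not the fix that belongs in numpy itself.

import numpy as np

# Same layout as in the report: a string column plus a big-endian float column.
data = np.array([('a ', 2.), ('b', 4.), ('d', 3.), ('c', 1.)],
                dtype=[('col1', '|S5'), ('col2', '>f8')])

# Convert the key column to the native float type (this byte-swaps the
# values correctly), argsort it, and reorder the whole array with the index.
order = np.argsort(data['col2'].astype(np.float64))
data_sorted = data[order]
# data_sorted is now ordered by col2, i.e. rows c, a, d, b.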
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jacob.benoit.1 at gmail.com Sun Jan 17 14:18:42 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Sun, 17 Jan 2010 14:18:42 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> Message-ID: 2010/1/17 Robert Kern : > On Sun, Jan 17, 2010 at 12:11, Benoit Jacob wrote: >> 2010/1/17 Robert Kern : >>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: >>>> 2010/1/17 David Cournapeau : >>> >>>>> There are several issues with eigen2 for NumPy usage: >>>>> ?- using it as a default implementation does not make much sense IMHO, >>>>> as it would make distributed binaries non 100 % BSD. >>>> >>>> But the LGPL doesn't impose restrictions on the usage of binaries, so >>>> how does it matter? The LGPL and the BSD licenses are similar as far >>>> as the binaries are concerned (unless perhaps one starts disassembling >>>> them). >>>> >>>> The big difference between LGPL and BSD is at the level of source >>>> code, not binary code: one modifies LGPL-based source code and >>>> distributes a binary form of it, then one has to release the modified >>>> source code as well. >>> >>> This is not true. Binaries that contain LGPLed code must be able to be >>> relinked with a modified version of the LGPLed component. >> >> This doesn't apply to Eigen which is a header-only pure template >> library, hence can't be 'linked' to. >> >> Actually you seem to be referring to Section 4 of the LGPL3, we have >> already asked the FSF about this and their reply was that it just >> doesn't apply in the case of Eigen: >> >> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html >> >> In your case, what matters is Section 5. > > You mean Section 3. Good. Section 3 is for using Eigen directly in a C++ program, yes, but I got a bit ahead of myself there: see below > I admit to being less up on the details of > LGPLv3 than I was of LGPLv2 which had a problem with C++ header > templates. Indeed, it did, that's why we don't use it. > > That said, we will not be using the C++ templates directly in numpy > for technical reasons (not least that we do not want to require a C++ > compiler for the default build). At best, we would be using a BLAS > interface which requires linking of objects, not just header > templates. That *would* impose the Section 4 requirements. ... or rather Section 5: that is what I was having in mind: " 5. Combined Libraries. " I have to admit that I don't understand what 5.a) means. > Furthermore, we would still prefer not to have any LGPL code in the > official numpy sources or binaries, regardless of how minimal the real > requirements are. Licensing is confusing enough that being able to say > "numpy is BSD licensed" without qualification is quite important. I hear you, in the same way we definitely care about being able to say "Eigen is LGPL licensed". So it's a hard problem. I think that this is the only real issue here, but I definitely agree that it is a real one. 
Large projects (such as Qt) that have a third_party subdirectory have to find a wording to explain that their license doesn't cover it. Benoit > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.l.goldsmith at gmail.com Sun Jan 17 14:19:38 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 17 Jan 2010 11:19:38 -0800 Subject: [Numpy-discussion] Module Index for numpy? In-Reply-To: <4B534F9F.4020901@sbcglobal.net> References: <4B534F9F.4020901@sbcglobal.net> Message-ID: <45d1ab481001171119u244352efgfa65c2e72965aff1@mail.gmail.com> Hi, Wayne. They're not nearly as structured, but for the time being (indefinitely? unless a volunteer steps forward to build something for us more closely resembling the GMI), you could use the numpy and scipy doc Wiki Milestones pages: http://docs.scipy.org/numpy/Milestones/ http://docs.scipy.org/scipy/Milestones/ in this fashion. DG On Sun, Jan 17, 2010 at 9:57 AM, Wayne Watson wrote: > I was just looking at the (Win) Python documentation via the Help on > IDLE, and a Global Module Index. Does anything like that exist for > numpy, matplotlib, scipy? > > -- > ? ? ? ? ? Wayne Watson (Watson Adventures, Prop., Nevada City, CA) > > ? ? ? ? ? ? (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) > ? ? ? ? ? ? ?Obz Site: ?39? 15' 7" N, 121? 2' 32" W, 2700 feet > > ? ? ? ? ? "I was thinking about how people seem to read the Bible > ? ? ? ? ? ?a whole lot more as they get older; then it dawned on > ? ? ? ? ? ?me . . they're cramming for their final exam." > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-- George Carlin > > ? ? ? ? ? ? ? ? ? ?Web Page: > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Sun Jan 17 14:34:00 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 17 Jan 2010 14:34:00 -0500 Subject: [Numpy-discussion] Module Index for numpy? In-Reply-To: <45d1ab481001171119u244352efgfa65c2e72965aff1@mail.gmail.com> References: <4B534F9F.4020901@sbcglobal.net> <45d1ab481001171119u244352efgfa65c2e72965aff1@mail.gmail.com> Message-ID: <1cd32cbb1001171134n269aa238p7d4ac4030be024e7@mail.gmail.com> On Sun, Jan 17, 2010 at 2:19 PM, David Goldsmith wrote: > Hi, Wayne. > > They're not nearly as structured, but for the time being > (indefinitely? unless a volunteer steps forward to build something for > us more closely resembling the GMI), you could use the numpy and scipy > doc Wiki Milestones pages: > > http://docs.scipy.org/numpy/Milestones/ > > http://docs.scipy.org/scipy/Milestones/ > > in this fashion. > > DG > > On Sun, Jan 17, 2010 at 9:57 AM, Wayne Watson > wrote: >> I was just looking at the (Win) Python documentation via the Help on >> IDLE, and a Global Module Index. Does anything like that exist for >> numpy, matplotlib, scipy? >> >> -- >> ? ? ? ? ? Wayne Watson (Watson Adventures, Prop., Nevada City, CA) >> >> ? ? ? ? ? ? (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) >> ? ? ? ? ? ? ?Obz Site: ?39? 15' 7" N, 121? 2' 32" W, 2700 feet >> >> ? ? ? ? ? "I was thinking about how people seem to read the Bible >> ? ? ? ? ? 
?a whole lot more as they get older; then it dawned on >> ? ? ? ? ? ?me . . they're cramming for their final exam." >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-- George Carlin >> >> ? ? ? ? ? ? ? ? ? ?Web Page: >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > the autogenerated indices are here and also in the htmlhelp http://docs.scipy.org/doc/numpy/modindex.html http://docs.scipy.org/doc/numpy/genindex.html However, because of the package structure of numpy the modindex is not as useful as the one for python. I find the structure of routines, and the index search in the htmlhelp more useful. Josef From robert.kern at gmail.com Sun Jan 17 14:57:20 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 17 Jan 2010 13:57:20 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> Message-ID: <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> On Sun, Jan 17, 2010 at 13:18, Benoit Jacob wrote: > 2010/1/17 Robert Kern : >> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob wrote: >>> 2010/1/17 Robert Kern : >>>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: >>>>> 2010/1/17 David Cournapeau : >>>> >>>>>> There are several issues with eigen2 for NumPy usage: >>>>>> ?- using it as a default implementation does not make much sense IMHO, >>>>>> as it would make distributed binaries non 100 % BSD. >>>>> >>>>> But the LGPL doesn't impose restrictions on the usage of binaries, so >>>>> how does it matter? The LGPL and the BSD licenses are similar as far >>>>> as the binaries are concerned (unless perhaps one starts disassembling >>>>> them). >>>>> >>>>> The big difference between LGPL and BSD is at the level of source >>>>> code, not binary code: one modifies LGPL-based source code and >>>>> distributes a binary form of it, then one has to release the modified >>>>> source code as well. >>>> >>>> This is not true. Binaries that contain LGPLed code must be able to be >>>> relinked with a modified version of the LGPLed component. >>> >>> This doesn't apply to Eigen which is a header-only pure template >>> library, hence can't be 'linked' to. >>> >>> Actually you seem to be referring to Section 4 of the LGPL3, we have >>> already asked the FSF about this and their reply was that it just >>> doesn't apply in the case of Eigen: >>> >>> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html >>> >>> In your case, what matters is Section 5. >> >> You mean Section 3. Good. > > Section 3 is for using Eigen directly in a C++ program, yes, but I got > a bit ahead of myself there: see below > >> I admit to being less up on the details of >> LGPLv3 than I was of LGPLv2 which had a problem with C++ header >> templates. > > Indeed, it did, that's why we don't use it. > >> >> That said, we will not be using the C++ templates directly in numpy >> for technical reasons (not least that we do not want to require a C++ >> compiler for the default build). 
At best, we would be using a BLAS >> interface which requires linking of objects, not just header >> templates. That *would* impose the Section 4 requirements. > > ... or rather Section 5: that is what I was having in mind: > ?" 5. Combined Libraries. " > > I have to admit that I don't understand what 5.a) means. I don't think it applies. Let's say I write some routines that use an LGPLed Library (let's call them Routines A). I can include those routines in a larger library with routines that do not use the LGPLed library (Routines B). The Routines B can be under whatever license you like. However, one must make a library containing only Routines A and the LGPLed Library and release that under the LGPLv3, distribute it along with the combined work, and give notice about how to obtain Routines A+Library separate from Routines B. Basically, it's another exception for needing to be able to relink object code in a particular technical use case. This cannot apply to numpy because we cannot break out numpy.linalg from the rest of numpy. Even if we could, we do not wish to make numpy.linalg itself LGPLed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jacob.benoit.1 at gmail.com Sun Jan 17 15:40:02 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Sun, 17 Jan 2010 15:40:02 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> Message-ID: 2010/1/17 Robert Kern : > On Sun, Jan 17, 2010 at 13:18, Benoit Jacob wrote: >> 2010/1/17 Robert Kern : >>> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob wrote: >>>> 2010/1/17 Robert Kern : >>>>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: >>>>>> 2010/1/17 David Cournapeau : >>>>> >>>>>>> There are several issues with eigen2 for NumPy usage: >>>>>>> ?- using it as a default implementation does not make much sense IMHO, >>>>>>> as it would make distributed binaries non 100 % BSD. >>>>>> >>>>>> But the LGPL doesn't impose restrictions on the usage of binaries, so >>>>>> how does it matter? The LGPL and the BSD licenses are similar as far >>>>>> as the binaries are concerned (unless perhaps one starts disassembling >>>>>> them). >>>>>> >>>>>> The big difference between LGPL and BSD is at the level of source >>>>>> code, not binary code: one modifies LGPL-based source code and >>>>>> distributes a binary form of it, then one has to release the modified >>>>>> source code as well. >>>>> >>>>> This is not true. Binaries that contain LGPLed code must be able to be >>>>> relinked with a modified version of the LGPLed component. >>>> >>>> This doesn't apply to Eigen which is a header-only pure template >>>> library, hence can't be 'linked' to. >>>> >>>> Actually you seem to be referring to Section 4 of the LGPL3, we have >>>> already asked the FSF about this and their reply was that it just >>>> doesn't apply in the case of Eigen: >>>> >>>> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html >>>> >>>> In your case, what matters is Section 5. >>> >>> You mean Section 3. Good. 
>> >> Section 3 is for using Eigen directly in a C++ program, yes, but I got >> a bit ahead of myself there: see below >> >>> I admit to being less up on the details of >>> LGPLv3 than I was of LGPLv2 which had a problem with C++ header >>> templates. >> >> Indeed, it did, that's why we don't use it. >> >>> >>> That said, we will not be using the C++ templates directly in numpy >>> for technical reasons (not least that we do not want to require a C++ >>> compiler for the default build). At best, we would be using a BLAS >>> interface which requires linking of objects, not just header >>> templates. That *would* impose the Section 4 requirements. >> >> ... or rather Section 5: that is what I was having in mind: >> ?" 5. Combined Libraries. " >> >> I have to admit that I don't understand what 5.a) means. > > I don't think it applies. Let's say I write some routines that use an > LGPLed Library (let's call them Routines A). I can include those > routines in a larger library with routines that do not use the LGPLed > library (Routines B). The Routines B can be under whatever license you > like. However, one must make a library containing only Routines A and > the LGPLed Library and release that under the LGPLv3, distribute it > along with the combined work, and give notice about how to obtain > Routines A+Library separate from Routines B. Basically, it's another > exception for needing to be able to relink object code in a particular > technical use case. > > This cannot apply to numpy because we cannot break out numpy.linalg > from the rest of numpy. Even if we could, we do not wish to make > numpy.linalg itself LGPLed. Indeed, that seems very cumbersome. I will ask the FSF about this, as this is definitely not something that we want to impose on Eigen users. Benoit > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwmsmith at gmail.com Sun Jan 17 15:42:15 2010 From: kwmsmith at gmail.com (Kurt Smith) Date: Sun, 17 Jan 2010 14:42:15 -0600 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: <5b8d13221001162004w640e4854obe27a78942bb551b@mail.gmail.com> References: <5b8d13221001162004w640e4854obe27a78942bb551b@mail.gmail.com> Message-ID: On Sat, Jan 16, 2010 at 10:04 PM, David Cournapeau wrote: > On Sun, Jan 17, 2010 at 4:12 AM, Kurt Smith wrote: >> My questions here concern those familiar with configure/build/install >> systems such as distutils, setuptools, scons/numscons or waf >> (particularly David Cournapeau). >> >> I'm creating a tool known as 'fwrap' that has a component that needs >> to do essentially what f2py does now -- take fortran source code and >> compile it into a python extension module. ?It uses Cython to create >> the extension module, and the current configure/build/install system >> is a very kludgy monkeypatched Cython.distutils and numpy.distutils >> setup.py script. ?The setup.py script works for testing on my system >> here, but for going prime time, I dread using it. ?David has made his >> critiques of distutils known for scientific software, and I agree. >> What's the best alternative? 
> > The best alternative in the short term is no alternative: making sure > everything you need is incorporated in numpy.distutils. Otherwise, you > will have to recreate everything that distutils is doing: you will > have people who will demand egg, mac os x .mpkg, windows installers, > etc.. Basically what I am trying to do with toydist now - I don't mind > getting help there, though :) If you ignore the installation phase and focus on just the configure/build phases (see below), what would you say then? How much work needs to go into the install step as compared to the configure/build steps for waf or scons? > > I promised to add decent cython support in numpy.distutils for 1.5.0, > maybe we should see what we can do for fwrap at the same time. > > I am also a bit unclear about what is needed exactly, and what would > be the workflow: I don't understand why fwrap should care about > packaging/deployment at all, for example. I should have emphasized that fwrap primarily needs a good configure/build system **for the projects that fwrap wraps.** The original post probably didn't make this clear. The workflow is this, assuming fwrap is installed appropriately on the platform: * fwrap is called on a bunch of fortran source files. * fwrap generates fortran wrappers, consisting of fortran source files and Cython source files. * If the user wants fwrap to create an extension module then and there: * The configure/build system (waf, scons, toydist, or a setup.py script) kicks in. It... * Configures the build, getting appropriate compilers, making sure Cython is installed, sorting out fortran <-> C type sizes, etc. * Builds the code appropriately, in the right order, etc. (requires fortran, Cython & C compilation and linking, with scanning and dependency injection). * Puts the extension *.so file in the current working directory by default. So the build tool's installation stage is much less important. It seems from these discussions that its the installation step that is particularly nasty. Fwrap would leave the installation of the extension module to the user. >> >> More specifically: what are the pros/cons between waf and >> scons/numscons for configure/build/install of a >> Fortran-C-Cython-Python project? > > waf has no fortran support whatsoever, so you would need to add it. > Waf codebase is much better than scons, but there lacks some internal > documentation. There were some things that I did not manage to do in > waf, because the internal API for scanning/dependency injection was > not very clear to me (by scanning I mean the ability to scan the > source code to look for dependency, e.g. fortran modules, and by > dependency injection, I mean adding new targets to the DAG of > dependencies at runtime - again needed for fortran modules). I'll likely need both scanning and Dependency Injection in fwrap. The complete lack of fortran support is an issue, although not a deal breaker. I like that waf is designed to be small, self-contained and not installed system-wide. I could distribute waf with fwrap and users wouldn't have to have yet another external dependency for fwrap to work. > > Basic handling of fortran compilation and fortran detection was > relatively easy in comparison. > > The biggest drawback I see with waf is the lack of users: the only > significant project I know which uses waf is Ardour. OTOH, I believe > scons has deep structural problems, and only a few people can change > some significant parts of the code. Yes, the lack of adoption is an issue. 
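For readers wondering what the "scanning" being discussed amounts to: the build tool has to discover, from the Fortran source itself, which modules each file uses, so that the file is recompiled whenever the files providing those modules change. The snippet below is a stripped-down sketch of that discovery step in plain Python; it is a hypothetical helper, not fwrap, waf or scons code, and a real scanner also has to cope with continuation lines, INCLUDE files and intrinsic modules.

import re

USE_RE = re.compile(r'^\s*use\s+(\w+)', re.IGNORECASE | re.MULTILINE)
MOD_RE = re.compile(r'^\s*module\s+(?!procedure\b)(\w+)', re.IGNORECASE | re.MULTILINE)

def fortran_deps(source_text):
    # Return (modules this file provides, modules it needs from other files).
    provided = set(name.lower() for name in MOD_RE.findall(source_text))
    used = set(name.lower() for name in USE_RE.findall(source_text))
    return provided, used - provided

The "dependency injection" David mentions is then the build tool taking that second set and adding the corresponding targets to its dependency graph at run time, for instance so that a generated .mod file becomes a prerequisite of every file that uses it.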
Would scons' structural problems affect a project like fwrap, in your view? You have a fair amount of experience dealing with Fortran/C hybrid programming -- is scons still flexible enough to handle it? > >> >> Is scons capable of handling the configure and install stages, or is >> it only a build system? ?As I understand it, numscons is called from >> distutils; distutils handles the configure/install stages. > > Distutils only handles the installation - everything done within the > build_* command is done by the scons distutils command, and > configuration is done by scons as well. Scons configure mechanism is > very primitive - it uses a separate framework than the rest of the > tool, which means in particular that scons top notch dependency > handling does not work well for the configuration stage. waf is much > better in that aspect. > Good to know. Perhaps scons would require a fair amount of work to get configuration working well, while waf would require fortran work in the configure/build stages? >> Scons/numscons have more fortran support that waf, from what I can >> see. ?The main downside of using scons is that I'd still have to mess >> around with distutils. > > My main point should be this: whatever you do, you will end up messing > with distutils, unless you reimplement everything that distutils does, > be it waf, scons, etc... In the short term, adding things to > numpy.distutils is the easiest path. > > Long term, I hope toydist will be a tool which will enable exactly > what you want: using a build system of your choice, and being able to > reuse existing code for installation/packaging to avoid recreating it > yourself. You will be able to create an exe/egg/pkg from a simple > package representation, every package will have a common interface for > the user independently of the internal build tool, etc... So is it correct to say that toydist hands off the build to an external tool, and takes care of the installation/packaging itself? Since the installation is the least essential part of fwrap and I need a robust build tool, waf/scons appear to be the best options over numpy.distutils, even though they will require a good chunk of work. Its clear I need to test-drive waf and scons to see how they compare with each other and my existing numpy.distutils and cython.distutils conglomeration. Thank you very much for the input, very informative. Kurt From totalbull at mac.com Sun Jan 17 18:43:42 2010 From: totalbull at mac.com (totalbull at mac.com) Date: Sun, 17 Jan 2010 23:43:42 +0000 Subject: [Numpy-discussion] Pandas LongPanel/WidePanel for 3d timeseries? In-Reply-To: <6c476c8a1001041921l3b5e1c7uf5b42901c789cfb0@mail.gmail.com> References: <15DD68D1-B0E3-43BF-BC8A-784070842049@mac.com> <3E60CDFB-F0C3-4E80-8A61-C0C0E70F3C04@mac.com> <4B29327A.9020201@enthought.com> <2689639429985454905@unknownmsgid> <1215039009384389849@unknownmsgid> <0254E4B5-4740-4ECC-86FA-A3E1358E15FD@mac.com> <6c476c8a1001041921l3b5e1c7uf5b42901c789cfb0@mail.gmail.com> Message-ID: Hello, I am successfully using the new Pandas library for series and matrix analysis using 2 dimensional arrays. I am using the "fromDict" method which works well, and I am creating 2-dimensional arrays where each axis is indexed by FX currency names. 
So for example pandas DataFrame called aa: >>> aa.columns Index([AUDUSD, EURCHF, EURCZK, EURHUF, EURPLN, EURSEK, EURUSD, GBPUSD, NZDUSD, USDBRL, USDCAD, USDCLP, USDILS, USDJPY, USDKRW, USDMXN, USDRUB, USDSGD, USDTRY, USDTWD, USDZAR], dtype=object) >>> aa.rows Index: 21 entries, AUDUSD to USDZAR Data columns: AUDUSD 20 non-null values EURCHF 20 non-null values EURCZK 20 non-null values EURHUF 20 non-null values EURPLN 20 non-null values EURSEK 20 non-null values EURUSD 20 non-null values GBPUSD 20 non-null values NZDUSD 20 non-null values USDBRL 20 non-null values USDCAD 20 non-null values USDCLP 20 non-null values USDILS 20 non-null values USDJPY 20 non-null values USDKRW 20 non-null values USDMXN 20 non-null values USDRUB 20 non-null values USDSGD 20 non-null values USDTRY 20 non-null values USDTWD 20 non-null values USDZAR 20 non-null values > >>> aa['USDZAR']['USDTWD'] 1.2711725043942563 >>> (each cell contains the number of standard errors of today's prices in the linear regression of the two currency pairs). Now I want to create a 3 dimensional stack of these aa-style matrices, where the z axis is indexed by historical dates. IE one matrix for each date, from today, going back 2 years. What is the best pandas function for doing this? Is it pandas.WidePanel or pandas.LongPanel, and how do I use these functions to construct this 3d stack? (I would ideally like to append each 2d matrix to the 3d stack as I create each one). Thanks for the help...... unfortunately can't find this in the online docs. Tom From totalbull at mac.com Sun Jan 17 18:49:17 2010 From: totalbull at mac.com (totalbull at mac.com) Date: Sun, 17 Jan 2010 23:49:17 +0000 Subject: [Numpy-discussion] Pandas LongPanel/WidePanel for 3d timeseries? In-Reply-To: References: <15DD68D1-B0E3-43BF-BC8A-784070842049@mac.com> <3E60CDFB-F0C3-4E80-8A61-C0C0E70F3C04@mac.com> <4B29327A.9020201@enthought.com> <2689639429985454905@unknownmsgid> <1215039009384389849@unknownmsgid> <0254E4B5-4740-4ECC-86FA-A3E1358E15FD@mac.com> <6c476c8a1001041921l3b5e1c7uf5b42901c789cfb0@mail.gmail.com> Message-ID: Apologies - too quick to ask the list without thoroughly checking the online docs. I have found the answer (fromDict method takes DataFrame objects): http://pandas.sourceforge.net/generated/pandas.WidePanel.html#pandas.WidePanel Still would like to know how to append 2d matrices one-by-one though. On 17 Jan 2010, at 23:43, totalbull at mac.com wrote: > Hello, > > I am successfully using the new Pandas library for series and matrix analysis using 2 dimensional arrays. I am using the "fromDict" method which works well, and I am creating 2-dimensional arrays where each axis is indexed by FX currency names. 
So for example pandas DataFrame called aa: > >>>> aa.columns > Index([AUDUSD, EURCHF, EURCZK, EURHUF, EURPLN, EURSEK, EURUSD, GBPUSD, > NZDUSD, USDBRL, USDCAD, USDCLP, USDILS, USDJPY, USDKRW, USDMXN, > USDRUB, USDSGD, USDTRY, USDTWD, USDZAR], dtype=object) >>>> aa.rows > > Index: 21 entries, AUDUSD to USDZAR > Data columns: > AUDUSD 20 non-null values > EURCHF 20 non-null values > EURCZK 20 non-null values > EURHUF 20 non-null values > EURPLN 20 non-null values > EURSEK 20 non-null values > EURUSD 20 non-null values > GBPUSD 20 non-null values > NZDUSD 20 non-null values > USDBRL 20 non-null values > USDCAD 20 non-null values > USDCLP 20 non-null values > USDILS 20 non-null values > USDJPY 20 non-null values > USDKRW 20 non-null values > USDMXN 20 non-null values > USDRUB 20 non-null values > USDSGD 20 non-null values > USDTRY 20 non-null values > USDTWD 20 non-null values > USDZAR 20 non-null values >> >>>> aa['USDZAR']['USDTWD'] > 1.2711725043942563 >>>> > > (each cell contains the number of standard errors of today's prices in the linear regression of the two currency pairs). Now I want to create a 3 dimensional stack of these aa-style matrices, where the z axis is indexed by historical dates. IE one matrix for each date, from today, going back 2 years. What is the best pandas function for doing this? Is it pandas.WidePanel or pandas.LongPanel, and how do I use these functions to construct this 3d stack? (I would ideally like to append each 2d matrix to the 3d stack as I create each one). > > Thanks for the help...... unfortunately can't find this in the online docs. > > Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kwgoodman at gmail.com Sun Jan 17 21:21:43 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 17 Jan 2010 18:21:43 -0800 Subject: [Numpy-discussion] Pandas LongPanel/WidePanel for 3d timeseries? In-Reply-To: References: <15DD68D1-B0E3-43BF-BC8A-784070842049@mac.com> <2689639429985454905@unknownmsgid> <1215039009384389849@unknownmsgid> <0254E4B5-4740-4ECC-86FA-A3E1358E15FD@mac.com> <6c476c8a1001041921l3b5e1c7uf5b42901c789cfb0@mail.gmail.com> Message-ID: On Sun, Jan 17, 2010 at 3:49 PM, wrote: > Apologies - too quick to ask the list without thoroughly checking the online docs. I have found the answer (fromDict method takes DataFrame objects): > > http://pandas.sourceforge.net/generated/pandas.WidePanel.html#pandas.WidePanel > > Still would like to know how to append 2d matrices one-by-one though. Here's the best place to ask questions about pandas: http://groups.google.ca/group/pystatsmodels From wesmckinn at gmail.com Sun Jan 17 21:45:50 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 17 Jan 2010 21:45:50 -0500 Subject: [Numpy-discussion] Pandas LongPanel/WidePanel for 3d timeseries? In-Reply-To: References: <15DD68D1-B0E3-43BF-BC8A-784070842049@mac.com> <2689639429985454905@unknownmsgid> <1215039009384389849@unknownmsgid> <0254E4B5-4740-4ECC-86FA-A3E1358E15FD@mac.com> <6c476c8a1001041921l3b5e1c7uf5b42901c789cfb0@mail.gmail.com> Message-ID: <-5358252572861183867@unknownmsgid> On Jan 17, 2010, at 9:21 PM, Keith Goodman wrote: > On Sun, Jan 17, 2010 at 3:49 PM, wrote: >> Apologies - too quick to ask the list without thoroughly checking >> the online docs. 
I have found the answer (fromDict method takes >> DataFrame objects): >> >> http://pandas.sourceforge.net/generated/pandas.WidePanel.html#pandas.WidePanel >> >> Still would like to know how to append 2d matrices one-by-one though. > > Here's the best place to ask questions about pandas: > > http://groups.google.ca/group/pystatsmodels > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion pystatsmodels is the best place to ask questions. You will want to use WidePanel.fromDict. Additional items can be added easily like so: wp[x] = frame I apologize for the lack of documentation-- that should change in the next month or so. From lescroar at usc.edu Sun Jan 17 23:33:21 2010 From: lescroar at usc.edu (Mark Lescroart) Date: Sun, 17 Jan 2010 20:33:21 -0800 Subject: [Numpy-discussion] Numpy.dot segmentation fault Message-ID: Hello, I've encountered a segfault in numpy when trying to compute a dot product for two arrays - see code below. The problem only seems to occur when the arrays reach a certain size. I'm using Numpy version 1.3.0, installed via macports, on a 2.33 GHz Intel Core2 Duo Macbook Pro running Leopard (OSX 10.5.8). I've posted this as a bug on the numpy page, where I was told it might have to do with my ATLAS installation (version 3.8.3_1, also installed via macports). Has anyone run into anything like this before? Cheers, Mark Example code: import numpy as N print 'Demonstration of Numpy Bug:' print 'loading X (random numbers)' SzList = [10,20,30,40,50,60,70,80,90,100] for Sz in SzList: print 'X size = %d,%d'%(300,Sz) X = N.random.rand(300,Sz) Y = N.random.rand(300,3) print 'Attempting dot product of X and Y' N.dot(X.T,Y) print 'Finished without bug.' Result (run through gdb): (There were a number of warnings like this - so many that they went off the top of the screen and I couldn't copy them all. This was typical of the warnings.) Reading symbols for shared libraries warning: Could not find object file "/opt/local/var/macports/ build_opt_local_var_macports_sources_rsync .macports.org_release_ports_lang_python26/work/Python-2.6.4/build/ temp.macosx-10.5-i386-2.6/opt/local/var/macports/ build_opt_local_var_macports_sources_rsync.macports.org_release _ports_lang_python26/work/Python-2.6.4/Modules/_collectionsmodule.o" - no debug information available for "/opt/local/var/macports/build/ _opt_local_var_macports_sources_rsync .macports.org_release_ports_lang_python26/work/Python-2.6.4/Modules/ _collectionsmodule.c". . done Demonstration of Numpy Bug: loading X (random numbers) X size = 300,10 Attempting dot product of X and Y Finished without bug. X size = 300,20 Attempting dot product of X and Y Finished without bug. X size = 300,30 Attempting dot product of X and Y Finished without bug. X size = 300,40 Attempting dot product of X and Y Finished without bug. X size = 300,50 Attempting dot product of X and Y Finished without bug. X size = 300,60 Attempting dot product of X and Y Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: 13 at address: 0x00000000 [Switching to process 56005 thread 0x117] 0x01038884 in ATL_dupMBmm0_2_0_b0 () -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david at silveregg.co.jp Sun Jan 17 23:38:25 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Mon, 18 Jan 2010 13:38:25 +0900 Subject: [Numpy-discussion] Numpy.dot segmentation fault In-Reply-To: References: Message-ID: <4B53E5C1.4070801@silveregg.co.jp> Mark Lescroart wrote: > Hello, > > I've encountered a segfault in numpy when trying to compute a dot > product for two arrays - see code below. The problem only seems to occur > when the arrays reach a certain size. Your atlas is most likely broken. You will have to double-check how you built it, and maybe run the whole test suite (as indicated in the ATLAS installation notes). Note that you can use the Accelerate framework on mac os x, this is much easier to get numpy working on mac, cheers, David From thomas.robitaille at gmail.com Mon Jan 18 08:39:46 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Mon, 18 Jan 2010 05:39:46 -0800 (PST) Subject: [Numpy-discussion] Structured array sorting In-Reply-To: <4B535480.3010406@enthought.com> References: <4B535480.3010406@enthought.com> Message-ID: <27210615.post@talk.nabble.com> Warren Weckesser-3 wrote: > > Looks like 'sort' is not handling endianess of the column data > correctly. If you change the type of the floating point data to ' the sort works. > Thanks for identifying the issue - should I submit a bug report? Thomas -- View this message in context: http://old.nabble.com/Structured-array-sorting-tp27200785p27210615.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From charlesr.harris at gmail.com Mon Jan 18 09:27:16 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 Jan 2010 07:27:16 -0700 Subject: [Numpy-discussion] Structured array sorting In-Reply-To: <27210615.post@talk.nabble.com> References: <4B535480.3010406@enthought.com> <27210615.post@talk.nabble.com> Message-ID: On Mon, Jan 18, 2010 at 6:39 AM, Thomas Robitaille < thomas.robitaille at gmail.com> wrote: > > > Warren Weckesser-3 wrote: > > > > Looks like 'sort' is not handling endianess of the column data > > correctly. If you change the type of the floating point data to ' > the sort works. > > > > Thanks for identifying the issue - should I submit a bug report? > > Yes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.benoit.1 at gmail.com Mon Jan 18 10:35:51 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Mon, 18 Jan 2010 10:35:51 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> Message-ID: 2010/1/17 Benoit Jacob : > 2010/1/17 Robert Kern : >> On Sun, Jan 17, 2010 at 13:18, Benoit Jacob wrote: >>> 2010/1/17 Robert Kern : >>>> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob wrote: >>>>> 2010/1/17 Robert Kern : >>>>>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob wrote: >>>>>>> 2010/1/17 David Cournapeau : >>>>>> >>>>>>>> There are several issues with eigen2 for NumPy usage: >>>>>>>> ?- using it as a default implementation does not make much sense IMHO, >>>>>>>> as it would make distributed binaries non 100 % BSD. >>>>>>> >>>>>>> But the LGPL doesn't impose restrictions on the usage of binaries, so >>>>>>> how does it matter? 
The LGPL and the BSD licenses are similar as far >>>>>>> as the binaries are concerned (unless perhaps one starts disassembling >>>>>>> them). >>>>>>> >>>>>>> The big difference between LGPL and BSD is at the level of source >>>>>>> code, not binary code: one modifies LGPL-based source code and >>>>>>> distributes a binary form of it, then one has to release the modified >>>>>>> source code as well. >>>>>> >>>>>> This is not true. Binaries that contain LGPLed code must be able to be >>>>>> relinked with a modified version of the LGPLed component. >>>>> >>>>> This doesn't apply to Eigen which is a header-only pure template >>>>> library, hence can't be 'linked' to. >>>>> >>>>> Actually you seem to be referring to Section 4 of the LGPL3, we have >>>>> already asked the FSF about this and their reply was that it just >>>>> doesn't apply in the case of Eigen: >>>>> >>>>> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html >>>>> >>>>> In your case, what matters is Section 5. >>>> >>>> You mean Section 3. Good. >>> >>> Section 3 is for using Eigen directly in a C++ program, yes, but I got >>> a bit ahead of myself there: see below >>> >>>> I admit to being less up on the details of >>>> LGPLv3 than I was of LGPLv2 which had a problem with C++ header >>>> templates. >>> >>> Indeed, it did, that's why we don't use it. >>> >>>> >>>> That said, we will not be using the C++ templates directly in numpy >>>> for technical reasons (not least that we do not want to require a C++ >>>> compiler for the default build). At best, we would be using a BLAS >>>> interface which requires linking of objects, not just header >>>> templates. That *would* impose the Section 4 requirements. >>> >>> ... or rather Section 5: that is what I was having in mind: >>> ?" 5. Combined Libraries. " >>> >>> I have to admit that I don't understand what 5.a) means. >> >> I don't think it applies. Let's say I write some routines that use an >> LGPLed Library (let's call them Routines A). I can include those >> routines in a larger library with routines that do not use the LGPLed >> library (Routines B). The Routines B can be under whatever license you >> like. However, one must make a library containing only Routines A and >> the LGPLed Library and release that under the LGPLv3, distribute it >> along with the combined work, and give notice about how to obtain >> Routines A+Library separate from Routines B. Basically, it's another >> exception for needing to be able to relink object code in a particular >> technical use case. >> >> This cannot apply to numpy because we cannot break out numpy.linalg >> from the rest of numpy. Even if we could, we do not wish to make >> numpy.linalg itself LGPLed. > > Indeed, that seems very cumbersome. I will ask the FSF about this, as > this is definitely not something that we want to impose on Eigen > users. > Sorry for continuing the licensing noise on your list --- I though that now that I've started, I should let you know that I think I understand things more clearly now ;) First, Section 5 of the LGPL is horrible indeed, so let's forget about that. If you were using a LGPL-licensed binary library, Section 4 would rather be what you want. It would require you to: 4a) say somewhere ("prominently" is vague, the bottom of a README is OK) that you use the library 4b) distribute copies of the GPL and LGPL licenses text. Pointless, but not a big issue. 
the rest doesn't matter: 4c) not applicable to you 4d1) this is what you would be doing anyway 4e) not applicable to you Finally and this is the important point: you would not be passing any requirement to your own users. Indeed, the LGPL license, contrary to the GPL license, does not propagate through dependency chains. So if NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be met when distributing NumPy, but NumPy itself isn't LGPL at all and an application using NumPy does not have to care at all about the LGPL. So there should be no concern at all of "passing on LGPL requirements to users" Again, IANAL. Benoit From robert.kern at gmail.com Mon Jan 18 11:04:15 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jan 2010 10:04:15 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <5b8d13221001162158l471dbd4k89230c9209f49b6e@mail.gmail.com> <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> Message-ID: <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob wrote: > Sorry for continuing the licensing noise on your list --- I though > that now that I've started, I should let you know that I think I > understand things more clearly now ;) No worries. > First, Section 5 of the LGPL is horrible indeed, so let's forget about that. I don't think it's that horrible, honestly. It just applies to a different deployment use case and a different set of technologies. > If you were using a LGPL-licensed binary library, Section 4 would > rather be what you want. It would require you to: > ?4a) say somewhere ("prominently" is vague, the bottom of a README is > OK) that you use the library > ?4b) distribute copies of the GPL and LGPL licenses text. Pointless, > but not a big issue. > > the rest doesn't matter: > ?4c) not applicable to you > ?4d1) this is what you would be doing anyway Possibly, but shared libraries are not easy for a variety of boring, Python-specific, technical reasons. 4d0 would be easier for the official binaries (because we provide official source). But that would still force people building a proprietary application using numpy to rebuild a binary without Eigen2 or else make sure that they allow users to rebuild numpy. For a number of deployment options (py2app, py2exe, bbfreeze, etc.), this is annoying, particularly when combined with the 4e requirement, as I explain below. > ?4e) not applicable to you Yes, it is. The exception where Installation Information is not required is only when installation is impossible, such as embedded devices where the code is in a ROM chip. > Finally and this is the important point: you would not be passing any > requirement to your own users. Indeed, the LGPL license, contrary to > the GPL license, does not propagate through dependency chains. So if > NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be > met when distributing NumPy, but NumPy itself isn't LGPL at all and an > application using NumPy does not have to care at all about the LGPL. > So there should be no concern at all of "passing on LGPL requirements > to users" No, not at all. The GPL "propagates" by requiring that the rest of the code be licensed compatibly with the GPL. This is an unusual and particular feature of the GPL. 
The LGPL does not require that rest of the code be licensed in a particular way. However, that doesn't mean that the license of the "outer layer" insulates the downstream user from the LGPL license of the wrapped component. It just means that there is BSD code and LGPL code in the total product. The downstream user must accept and deal with the licenses of *all* of the components simultaneously. This is how most licenses work. I think that the fact that the GPL is particularly "viral" may be obscuring the normal way that licenses work when combined with other licenses. If I had a proprietary application that used an LGPL library, and I gave my customers some limited rights to modify and resell my application, they would still be bound by the LGPL with respect to the library. They could not modify the LGPLed library and sell it under a proprietary license even if I allow them to do that with the application as a whole. For us to use Eigen2 in numpy such that our users could use, modify and redistribute numpy+Eigen2, in its entirety, under the terms of the BSD license, we would have to get permission from you to distribute Eigen2 under the BSD license. It's only polite. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jacob.benoit.1 at gmail.com Mon Jan 18 11:26:01 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Mon, 18 Jan 2010 11:26:01 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> References: <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> Message-ID: 2010/1/18 Robert Kern : > On Mon, Jan 18, 2010 at 09:35, Benoit Jacob wrote: > >> Sorry for continuing the licensing noise on your list --- I though >> that now that I've started, I should let you know that I think I >> understand things more clearly now ;) > > No worries. > >> First, Section 5 of the LGPL is horrible indeed, so let's forget about that. > > I don't think it's that horrible, honestly. It just applies to a > different deployment use case and a different set of technologies. > >> If you were using a LGPL-licensed binary library, Section 4 would >> rather be what you want. It would require you to: >> ?4a) say somewhere ("prominently" is vague, the bottom of a README is >> OK) that you use the library >> ?4b) distribute copies of the GPL and LGPL licenses text. Pointless, >> but not a big issue. >> >> the rest doesn't matter: >> ?4c) not applicable to you >> ?4d1) this is what you would be doing anyway > > Possibly, but shared libraries are not easy for a variety of boring, > Python-specific, technical reasons. Ah, that I didn't know. >> ?4e) not applicable to you > > Yes, it is. The exception where Installation Information is not > required is only when installation is impossible, such as embedded > devices where the code is in a ROM chip. OK, I didn't understand that. > >> Finally and this is the important point: you would not be passing any >> requirement to your own users. Indeed, the LGPL license, contrary to >> the GPL license, does not propagate through dependency chains. 
So if >> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be >> met when distributing NumPy, but NumPy itself isn't LGPL at all and an >> application using NumPy does not have to care at all about the LGPL. >> So there should be no concern at all of "passing on LGPL requirements >> to users" > > No, not at all. The GPL "propagates" by requiring that the rest of the > code be licensed compatibly with the GPL. This is an unusual and > particular feature of the GPL. The LGPL does not require that rest of > the code be licensed in a particular way. However, that doesn't mean > that the license of the "outer layer" insulates the downstream user > from the LGPL license of the wrapped component. It just means that > there is BSD code and LGPL code in the total product. The downstream > user must accept and deal with the licenses of *all* of the components > simultaneously. This is how most licenses work. I think that the fact > that the GPL is particularly "viral" may be obscuring the normal way > that licenses work when combined with other licenses. > > If I had a proprietary application that used an LGPL library, and I > gave my customers some limited rights to modify and resell my > application, they would still be bound by the LGPL with respect to the > library. They could not modify the LGPLed library and sell it under a > proprietary license even if I allow them to do that with the > application as a whole. For us to use Eigen2 in numpy such that our > users could use, modify and redistribute numpy+Eigen2, in its > entirety, under the terms of the BSD license, we would have to get > permission from you to distribute Eigen2 under the BSD license. It's > only polite. OK, so the Eigen code inside of NumPy would still be protected by the LGPL. But what I meant when I said that the LGPL requirements don't propagate to your users, was that, for example, they don't have to distribute copies of the LGPL text, installation information for Eigen, or links to Eigen's website. The only requirement, if I understand well, is that _if_ a NumPy user wanted to make modifications to Eigen itself, he would have to conform to the LGPL requirements about sharing the modified source code. But is it really a requirement of NumPy that all its dependencies must be free to modify without redistributing the modified source code? Don't you use MKL, for which the source code is not available at all? I am not sure that I understand how that is better than having source code subject to LGPL requirements. Benoit From robert.kern at gmail.com Mon Jan 18 11:37:23 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jan 2010 10:37:23 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <3d375d731001170936y38dd7538l3cb1ccece61e0ae2@mail.gmail.com> <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> Message-ID: <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> On Mon, Jan 18, 2010 at 10:26, Benoit Jacob wrote: > 2010/1/18 Robert Kern : >> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob wrote: >> >>> Sorry for continuing the licensing noise on your list --- I though >>> that now that I've started, I should let you know that I think I >>> understand things more clearly now ;) >> >> No worries. >> >>> First, Section 5 of the LGPL is horrible indeed, so let's forget about that. 
>> >> I don't think it's that horrible, honestly. It just applies to a >> different deployment use case and a different set of technologies. >> >>> If you were using a LGPL-licensed binary library, Section 4 would >>> rather be what you want. It would require you to: >>> ?4a) say somewhere ("prominently" is vague, the bottom of a README is >>> OK) that you use the library >>> ?4b) distribute copies of the GPL and LGPL licenses text. Pointless, >>> but not a big issue. >>> >>> the rest doesn't matter: >>> ?4c) not applicable to you >>> ?4d1) this is what you would be doing anyway >> >> Possibly, but shared libraries are not easy for a variety of boring, >> Python-specific, technical reasons. > > Ah, that I didn't know. > >>> ?4e) not applicable to you >> >> Yes, it is. The exception where Installation Information is not >> required is only when installation is impossible, such as embedded >> devices where the code is in a ROM chip. > > OK, I didn't understand that. > >> >>> Finally and this is the important point: you would not be passing any >>> requirement to your own users. Indeed, the LGPL license, contrary to >>> the GPL license, does not propagate through dependency chains. So if >>> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be >>> met when distributing NumPy, but NumPy itself isn't LGPL at all and an >>> application using NumPy does not have to care at all about the LGPL. >>> So there should be no concern at all of "passing on LGPL requirements >>> to users" >> >> No, not at all. The GPL "propagates" by requiring that the rest of the >> code be licensed compatibly with the GPL. This is an unusual and >> particular feature of the GPL. The LGPL does not require that rest of >> the code be licensed in a particular way. However, that doesn't mean >> that the license of the "outer layer" insulates the downstream user >> from the LGPL license of the wrapped component. It just means that >> there is BSD code and LGPL code in the total product. The downstream >> user must accept and deal with the licenses of *all* of the components >> simultaneously. This is how most licenses work. I think that the fact >> that the GPL is particularly "viral" may be obscuring the normal way >> that licenses work when combined with other licenses. >> >> If I had a proprietary application that used an LGPL library, and I >> gave my customers some limited rights to modify and resell my >> application, they would still be bound by the LGPL with respect to the >> library. They could not modify the LGPLed library and sell it under a >> proprietary license even if I allow them to do that with the >> application as a whole. For us to use Eigen2 in numpy such that our >> users could use, modify and redistribute numpy+Eigen2, in its >> entirety, under the terms of the BSD license, we would have to get >> permission from you to distribute Eigen2 under the BSD license. It's >> only polite. > > OK, so the Eigen code inside of NumPy would still be protected by the > LGPL. But what I meant when I said that the LGPL requirements don't > propagate to your users, was that, for example, they don't have to > distribute copies of the LGPL text, installation information for > Eigen, or links to Eigen's website. Yes, they do. They are redistributing Eigen; they must abide by its license in all respects. It doesn't matter how much it is wrapped. 
> The only requirement, if I understand well, is that _if_ a NumPy user > wanted to make modifications to ?Eigen itself, he would have to > conform to the LGPL requirements about sharing the modified source > code. > > But is it really a requirement of NumPy that all its dependencies must > be free to modify without redistributing the modified source code? For the default build and the official binaries, yes. > Don't you use MKL, for which the source code is not available at all? No, we don't. It is a build option. If you were to provide a BLAS interface to Eigen, Eigen would be another option. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jacob.benoit.1 at gmail.com Mon Jan 18 11:46:02 2010 From: jacob.benoit.1 at gmail.com (Benoit Jacob) Date: Mon, 18 Jan 2010 11:46:02 -0500 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> References: <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> Message-ID: 2010/1/18 Robert Kern : > On Mon, Jan 18, 2010 at 10:26, Benoit Jacob wrote: >> 2010/1/18 Robert Kern : >>> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob wrote: >>> >>>> Sorry for continuing the licensing noise on your list --- I though >>>> that now that I've started, I should let you know that I think I >>>> understand things more clearly now ;) >>> >>> No worries. >>> >>>> First, Section 5 of the LGPL is horrible indeed, so let's forget about that. >>> >>> I don't think it's that horrible, honestly. It just applies to a >>> different deployment use case and a different set of technologies. >>> >>>> If you were using a LGPL-licensed binary library, Section 4 would >>>> rather be what you want. It would require you to: >>>> ?4a) say somewhere ("prominently" is vague, the bottom of a README is >>>> OK) that you use the library >>>> ?4b) distribute copies of the GPL and LGPL licenses text. Pointless, >>>> but not a big issue. >>>> >>>> the rest doesn't matter: >>>> ?4c) not applicable to you >>>> ?4d1) this is what you would be doing anyway >>> >>> Possibly, but shared libraries are not easy for a variety of boring, >>> Python-specific, technical reasons. >> >> Ah, that I didn't know. >> >>>> ?4e) not applicable to you >>> >>> Yes, it is. The exception where Installation Information is not >>> required is only when installation is impossible, such as embedded >>> devices where the code is in a ROM chip. >> >> OK, I didn't understand that. >> >>> >>>> Finally and this is the important point: you would not be passing any >>>> requirement to your own users. Indeed, the LGPL license, contrary to >>>> the GPL license, does not propagate through dependency chains. So if >>>> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be >>>> met when distributing NumPy, but NumPy itself isn't LGPL at all and an >>>> application using NumPy does not have to care at all about the LGPL. >>>> So there should be no concern at all of "passing on LGPL requirements >>>> to users" >>> >>> No, not at all. The GPL "propagates" by requiring that the rest of the >>> code be licensed compatibly with the GPL. 
This is an unusual and >>> particular feature of the GPL. The LGPL does not require that rest of >>> the code be licensed in a particular way. However, that doesn't mean >>> that the license of the "outer layer" insulates the downstream user >>> from the LGPL license of the wrapped component. It just means that >>> there is BSD code and LGPL code in the total product. The downstream >>> user must accept and deal with the licenses of *all* of the components >>> simultaneously. This is how most licenses work. I think that the fact >>> that the GPL is particularly "viral" may be obscuring the normal way >>> that licenses work when combined with other licenses. >>> >>> If I had a proprietary application that used an LGPL library, and I >>> gave my customers some limited rights to modify and resell my >>> application, they would still be bound by the LGPL with respect to the >>> library. They could not modify the LGPLed library and sell it under a >>> proprietary license even if I allow them to do that with the >>> application as a whole. For us to use Eigen2 in numpy such that our >>> users could use, modify and redistribute numpy+Eigen2, in its >>> entirety, under the terms of the BSD license, we would have to get >>> permission from you to distribute Eigen2 under the BSD license. It's >>> only polite. >> >> OK, so the Eigen code inside of NumPy would still be protected by the >> LGPL. But what I meant when I said that the LGPL requirements don't >> propagate to your users, was that, for example, they don't have to >> distribute copies of the LGPL text, installation information for >> Eigen, or links to Eigen's website. > > Yes, they do. They are redistributing Eigen; they must abide by its > license in all respects. It doesn't matter how much it is wrapped. Well this is where I'm not sure if I agree, I am asking the FSF right now as, if this were the case, I too would find such a clause very inconvenient for users. > >> The only requirement, if I understand well, is that _if_ a NumPy user >> wanted to make modifications to ?Eigen itself, he would have to >> conform to the LGPL requirements about sharing the modified source >> code. >> >> But is it really a requirement of NumPy that all its dependencies must >> be free to modify without redistributing the modified source code? > > For the default build and the official binaries, yes. OK. > >> Don't you use MKL, for which the source code is not available at all? > > No, we don't. It is a build option. If you were to provide a BLAS > interface to Eigen, Eigen would be another option. OK, then I guess that this is what will happen once we release the BLAS library. Thanks for your patience Benoit From amenity at enthought.com Mon Jan 18 11:55:20 2010 From: amenity at enthought.com (Amenity Applewhite) Date: Mon, 18 Jan 2010 10:55:20 -0600 Subject: [Numpy-discussion] EPD 6.0 and IPython Webinar Friday References: <0AE0D056-D7BB-498B-A14D-AAF9A90ED8F2@enthought.com> Message-ID: <94400778-AE3F-46A4-8B92-C86CB6DAD95A@enthought.com> Email not displaying correctly? View it in your browser. Happy 2010! To start the year off, we've released a new version of EPD and lined up a solid set of training options. Scientific Computing with Python Webinar This Friday, Travis Oliphant will then provide an introduction to multiprocessing and iPython.kernal. 
Scientific Computing with Python Webinar Multiprocessing and iPython.kernal Friday, January 22: 1pm CST/7pm UTC Register Enthought Live Training Enthought's intensive training courses are offered in 3-5 day sessions. The Python skills you'll acquire will save you and your organization time and money in 2010. Enthought Open Course February 22-26, Austin, TX ? Python for Scientists and Engineers ? Interfacing with C / C++ and Fortran ? Introduction to UIs and Visualization Enjoy! The Enthought Team EPD 6.0 Released Now available in our repository, EPD 6.0 includes Python 2.6, PiCloud's cloud library, and NumPy 1.4... Not to mention 64-bit support for Windows, OSX, and Linux. Details. Download now. New: Enthought channel on YouTube Short instructional videos straight from the desktops of our developers. Get started with a 4-part series on interpolation with SciPy. Our mailing address is: Enthought, Inc. 515 Congress Ave. Austin, TX 78701 Copyright (C) 2009 Enthought, Inc. All rights reserved. Forward this email to a friend -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Mon Jan 18 12:12:03 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 18 Jan 2010 11:12:03 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: References: <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> Message-ID: <4B549663.5020404@gmail.com> On 01/18/2010 10:46 AM, Benoit Jacob wrote: > 2010/1/18 Robert Kern: > >> On Mon, Jan 18, 2010 at 10:26, Benoit Jacob wrote: >> >>> 2010/1/18 Robert Kern: >>> >>>> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob wrote: >>>> >>>> >>>>> Sorry for continuing the licensing noise on your list --- I though >>>>> that now that I've started, I should let you know that I think I >>>>> understand things more clearly now ;) >>>>> >>>> No worries. >>>> >>>> >>>>> First, Section 5 of the LGPL is horrible indeed, so let's forget about that. >>>>> >>>> I don't think it's that horrible, honestly. It just applies to a >>>> different deployment use case and a different set of technologies. >>>> >>>> >>>>> If you were using a LGPL-licensed binary library, Section 4 would >>>>> rather be what you want. It would require you to: >>>>> 4a) say somewhere ("prominently" is vague, the bottom of a README is >>>>> OK) that you use the library >>>>> 4b) distribute copies of the GPL and LGPL licenses text. Pointless, >>>>> but not a big issue. >>>>> >>>>> the rest doesn't matter: >>>>> 4c) not applicable to you >>>>> 4d1) this is what you would be doing anyway >>>>> >>>> Possibly, but shared libraries are not easy for a variety of boring, >>>> Python-specific, technical reasons. >>>> >>> Ah, that I didn't know. >>> >>> >>>>> 4e) not applicable to you >>>>> >>>> Yes, it is. The exception where Installation Information is not >>>> required is only when installation is impossible, such as embedded >>>> devices where the code is in a ROM chip. >>>> >>> OK, I didn't understand that. >>> >>> >>>> >>>>> Finally and this is the important point: you would not be passing any >>>>> requirement to your own users. Indeed, the LGPL license, contrary to >>>>> the GPL license, does not propagate through dependency chains. 
So if >>>>> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be >>>>> met when distributing NumPy, but NumPy itself isn't LGPL at all and an >>>>> application using NumPy does not have to care at all about the LGPL. >>>>> So there should be no concern at all of "passing on LGPL requirements >>>>> to users" >>>>> >>>> No, not at all. The GPL "propagates" by requiring that the rest of the >>>> code be licensed compatibly with the GPL. This is an unusual and >>>> particular feature of the GPL. The LGPL does not require that rest of >>>> the code be licensed in a particular way. However, that doesn't mean >>>> that the license of the "outer layer" insulates the downstream user >>>> from the LGPL license of the wrapped component. It just means that >>>> there is BSD code and LGPL code in the total product. The downstream >>>> user must accept and deal with the licenses of *all* of the components >>>> simultaneously. This is how most licenses work. I think that the fact >>>> that the GPL is particularly "viral" may be obscuring the normal way >>>> that licenses work when combined with other licenses. >>>> >>>> If I had a proprietary application that used an LGPL library, and I >>>> gave my customers some limited rights to modify and resell my >>>> application, they would still be bound by the LGPL with respect to the >>>> library. They could not modify the LGPLed library and sell it under a >>>> proprietary license even if I allow them to do that with the >>>> application as a whole. For us to use Eigen2 in numpy such that our >>>> users could use, modify and redistribute numpy+Eigen2, in its >>>> entirety, under the terms of the BSD license, we would have to get >>>> permission from you to distribute Eigen2 under the BSD license. It's >>>> only polite. >>>> >>> OK, so the Eigen code inside of NumPy would still be protected by the >>> LGPL. But what I meant when I said that the LGPL requirements don't >>> propagate to your users, was that, for example, they don't have to >>> distribute copies of the LGPL text, installation information for >>> Eigen, or links to Eigen's website. >>> >> Yes, they do. They are redistributing Eigen; they must abide by its >> license in all respects. It doesn't matter how much it is wrapped. >> > Well this is where I'm not sure if I agree, I am asking the FSF right > now as, if this were the case, I too would find such a clause very > inconvenient for users. > > If you obtain the code from any package then you are bound by the terms of that code. So while a user might not be 'inconvenienced' by the LGPL, they are required to meet the terms as required. For some licenses (like the LGPL) these terms do not really apply until you distribute the code but that does not mean that the user is exempt from the licensing terms of that code because they have not distributed their code (yet). Furthermore there are a number of numpy users that download the numpy project for further distribution such as Enthought, packagers for Linux distributions and developers of projects like Python(x,y). Some of these users would be inconvenienced because binary-only distributions would not be permitted in any form. Bruce From eadrogue at gmx.net Mon Jan 18 13:43:39 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 18 Jan 2010 19:43:39 +0100 Subject: [Numpy-discussion] logic problem Message-ID: <20100118184339.GA19364@doriath.local> Hi, This is hard to explain. 
In this code: reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) where m1, m2 and m3 are boolean arrays, I'm trying to figure out an expression that works with an arbitrary number of arrays, not just 3. Any idea?? Bye. From bsouthey at gmail.com Mon Jan 18 14:10:01 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 18 Jan 2010 13:10:01 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <20100118194720.cx3nrid4aogk4k48@160.103.2.152> References: <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> <4B549663.5020404@gmail.com> <20100118194720.cx3nrid4aogk4k48@160.103.2.152> Message-ID: <4B54B209.6090604@gmail.com> On 01/18/2010 12:47 PM, Vicente Sole wrote: > Quoting Bruce Southey : > >> >> If you obtain the code from any package then you are bound by the terms >> of that code. So while a user might not be 'inconvenienced' by the LGPL, >> they are required to meet the terms as required. For some licenses (like >> the LGPL) these terms do not really apply until you distribute the code >> but that does not mean that the user is exempt from the licensing terms >> of that code because they have not distributed their code (yet). >> >> Furthermore there are a number of numpy users that download the numpy >> project for further distribution such as Enthought, packagers for Linux >> distributions and developers of projects like Python(x,y). Some of these >> users would be inconvenienced because binary-only distributions would >> not be permitted in any form. >> > > I think people are confusing LGPL and GPL... Not at all. > > I can distribute my code in binary form without any restriction when > using an LGPL library UNLESS I have modified the library itself. I do not interpret the LGPL version 3 in this way: A "Combined Work" is a work produced by combining or linking an Application with the Library. So you must apply section 4, in particular, provide the "Minimal Corresponding Source": The "Minimal Corresponding Source" for a Combined Work means the Corresponding Source for the Combined Work, excluding any source code for portions of the Combined Work that, considered in isolation, are based on the Application, and not on the Linked Version. So a binary-only is usually not appropriate. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 18 14:17:07 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 Jan 2010 14:17:07 -0500 Subject: [Numpy-discussion] logic problem In-Reply-To: <20100118184339.GA19364@doriath.local> References: <20100118184339.GA19364@doriath.local> Message-ID: <1cd32cbb1001181117v78f20b32n2ad121edffa8ffaa@mail.gmail.com> 2010/1/18 Ernest Adrogu? : > Hi, > > This is hard to explain. In this code: > > reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) > > where m1, m2 and m3 are boolean arrays, I'm trying to figure > out an expression that works with an arbitrary number of > arrays, not just 3. Any idea?? What's the shape of mi (dimension)? fixed or arbitrary number of dimension? a loop is the most memory efficient array broadcasting builds large arrays (and maybe has redundant calculations), but might be a one-liner or something like list comprehension m = [m1, m2, ... 
mn] reduce(np.logical_or, [mi & mj for (i, mi) in enumerate(m) for (j, mj) in enumerate(m) if i>> m = [np.arange(10)<5, np.arange(10)>3, np.arange(10)>8] >>> m [array([ True, True, True, True, True, False, False, False, False, False], dtype=bool), array([False, False, False, False, True, True, True, True, True, True], dtype=bool), array([False, False, False, False, False, False, False, False, False, True], dtype=bool)] >>> reduce(np.logical_or, [mi & mj for (i, mi) in enumerate(m) for (j, mj) in enumerate(m) if i > Bye. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From warren.weckesser at enthought.com Mon Jan 18 14:18:11 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 18 Jan 2010 13:18:11 -0600 Subject: [Numpy-discussion] logic problem In-Reply-To: <20100118184339.GA19364@doriath.local> References: <20100118184339.GA19364@doriath.local> Message-ID: <4B54B3F3.6060306@enthought.com> Ernest Adrogu? wrote: > Hi, > > This is hard to explain. In this code: > > reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) > > where m1, m2 and m3 are boolean arrays, I'm trying to figure > out an expression that works with an arbitrary number of > arrays, not just 3. Any idea?? > > If I understand the problem correctly, you want the result to be True whenever any pair of the corresponding elements of the arrays are True. This could work: reduce(np.add, [m.astype(int) for m in mlist]) > 1 where mlist is a list of the boolean array (e.g. mlist = [m1, m2, m3] in your example). Warren > Bye. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jan 18 14:23:56 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 Jan 2010 14:23:56 -0500 Subject: [Numpy-discussion] logic problem In-Reply-To: <4B54B3F3.6060306@enthought.com> References: <20100118184339.GA19364@doriath.local> <4B54B3F3.6060306@enthought.com> Message-ID: <1cd32cbb1001181123mc8de8b5t40c42417df2e0e69@mail.gmail.com> On Mon, Jan 18, 2010 at 2:18 PM, Warren Weckesser wrote: > Ernest Adrogu? wrote: >> Hi, >> >> This is hard to explain. In this code: >> >> reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) >> >> where m1, m2 and m3 are boolean arrays, I'm trying to figure >> out an expression that works with an arbitrary number of >> arrays, not just 3. Any idea?? >> >> > > If I understand the problem correctly, you want the result to be True > whenever any pair of the corresponding elements of the arrays are True. > > This could work: > > reduce(np.add, [m.astype(int) for m in mlist]) > 1 > > where mlist is a list of the boolean array (e.g. mlist = [m1, m2, m3] in > your example). much nicer than what I came up with. Does iterator instead of intermediate list work (same for my list comprehension)? reduce(np.add, (m.astype(int) for m in mlist)) > 1 Josef > > Warren >> Bye. 
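A self-contained sketch of the counting idea discussed above, with made-up masks (either form below is equivalent to the reduce(np.add, ...) > 1 idiom):

import numpy as np

m1 = np.array([True, False, True, False])
m2 = np.array([True, True, False, False])
m3 = np.array([False, True, True, False])
mlist = [m1, m2, m3]

# True wherever at least two of the masks are True
atleast_two = sum(m.astype(int) for m in mlist) > 1

# same result: stack the masks and count True values along axis 0
atleast_two_alt = np.sum(mlist, axis=0) > 1

# both give array([ True,  True,  True, False], dtype=bool)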
>> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sole at esrf.fr Mon Jan 18 14:34:28 2010 From: sole at esrf.fr (Vicente Sole) Date: Mon, 18 Jan 2010 20:34:28 +0100 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <4B54B209.6090604@gmail.com> References: <3d375d731001171043s2e1f90e1t77d6923eb3a6f4cd@mail.gmail.com> <3d375d731001171157n1ee58b36ld8c5bca3fa2088f7@mail.gmail.com> <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> <4B549663.5020404@gmail.com> <20100118194720.cx3nrid4aogk4k48@160.103.2.152> <4B54B209.6090604@gmail.com> Message-ID: <20100118203428.o83tfn6680kc8sc4@160.103.2.152> Quoting Bruce Southey : > On 01/18/2010 12:47 PM, Vicente Sole wrote: >> Quoting Bruce Southey : >> >>> >>> If you obtain the code from any package then you are bound by the terms >>> of that code. So while a user might not be 'inconvenienced' by the LGPL, >>> they are required to meet the terms as required. For some licenses (like >>> the LGPL) these terms do not really apply until you distribute the code >>> but that does not mean that the user is exempt from the licensing terms >>> of that code because they have not distributed their code (yet). >>> >>> Furthermore there are a number of numpy users that download the numpy >>> project for further distribution such as Enthought, packagers for Linux >>> distributions and developers of projects like Python(x,y). Some of these >>> users would be inconvenienced because binary-only distributions would >>> not be permitted in any form. >>> >> >> I think people are confusing LGPL and GPL... > Not at all. > >> >> I can distribute my code in binary form without any restriction >> when using an LGPL library UNLESS I have modified the library itself. > > I do not interpret the LGPL version 3 in this way: > A "Combined Work" is a work produced by combining or linking an > Application with the Library. > So you must apply section 4, in particular, provide the "Minimal > Corresponding Source": > The "Minimal Corresponding Source" for a Combined Work means the > Corresponding Source for the Combined Work, excluding any source code > for portions of the Combined Work that, considered in isolation, are > based on the Application, and not on the Linked Version. > > So a binary-only is usually not appropriate. > You are taking point 4.d)0 while I am taking 4.d)1: """ 1) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (a) uses at run time a copy of the Library already present on the user's computer system, and (b) will operate properly with a modified version of the Library that is interface-compatible with the Linked Version. """ If you are using the library as a shared library (what you do most of the times in Python), you are quite free. In any case, it seems I am not the only one seeing it like that: http://qt.nokia.com/downloads The key point is if you use the library "as is" or you have modified it. 
Armando From warren.weckesser at enthought.com Mon Jan 18 14:39:30 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 18 Jan 2010 13:39:30 -0600 Subject: [Numpy-discussion] logic problem In-Reply-To: <1cd32cbb1001181123mc8de8b5t40c42417df2e0e69@mail.gmail.com> References: <20100118184339.GA19364@doriath.local> <4B54B3F3.6060306@enthought.com> <1cd32cbb1001181123mc8de8b5t40c42417df2e0e69@mail.gmail.com> Message-ID: <4B54B8F2.9080903@enthought.com> josef.pktd at gmail.com wrote: > On Mon, Jan 18, 2010 at 2:18 PM, Warren Weckesser > wrote: > >> Ernest Adrogu? wrote: >> >>> Hi, >>> >>> This is hard to explain. In this code: >>> >>> reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) >>> >>> where m1, m2 and m3 are boolean arrays, I'm trying to figure >>> out an expression that works with an arbitrary number of >>> arrays, not just 3. Any idea?? >>> >>> >>> >> If I understand the problem correctly, you want the result to be True >> whenever any pair of the corresponding elements of the arrays are True. >> >> This could work: >> >> reduce(np.add, [m.astype(int) for m in mlist]) > 1 >> >> where mlist is a list of the boolean array (e.g. mlist = [m1, m2, m3] in >> your example). >> > > much nicer than what I came up with. Does iterator instead of > intermediate list work (same for my list comprehension)? > > reduce(np.add, (m.astype(int) for m in mlist)) > 1 > > Yes, that works and is preferable, especially if the arrays are large or the list is long. Warren > Josef > > > From robert.kern at gmail.com Mon Jan 18 14:39:26 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jan 2010 13:39:26 -0600 Subject: [Numpy-discussion] performance matrix multiplication vs. matlab In-Reply-To: <20100118203428.o83tfn6680kc8sc4@160.103.2.152> References: <3d375d731001180804w7843c749ucf2b1f3fd46b6fba@mail.gmail.com> <3d375d731001180837m7ad230aie127acab4abfe184@mail.gmail.com> <4B549663.5020404@gmail.com> <20100118194720.cx3nrid4aogk4k48@160.103.2.152> <4B54B209.6090604@gmail.com> <20100118203428.o83tfn6680kc8sc4@160.103.2.152> Message-ID: <3d375d731001181139i22d40823x7cead4fd1c3cf2a6@mail.gmail.com> On Mon, Jan 18, 2010 at 13:34, Vicente Sole wrote: > You are taking point 4.d)0 while I am taking 4.d)1: > > """ > 1) Use a suitable shared library mechanism for linking with the > Library. A suitable mechanism is one that (a) uses at run time a copy > of the Library already present on the user's computer system, and (b) > will operate properly with a modified version of the Library that is > interface-compatible with the Linked Version. > """ > > If you are using the library as a shared library (what you do most of > the times in Python), you are quite free. numpy would not be using Eigen2 as a shared library. It is true that numpy would act as a shared library with respect to some downstream application, but incorporating Eigen2 into numpy would make those numpy binaries be effectively under the LGPL license with respect to the downstream application. > In any case, it seems I am not the only one seeing it like that: > > http://qt.nokia.com/downloads > > The key point is if you use the library "as is" or you have modified it. With respect to numpy and the way that Eigen2 was proposed as being used, no, it is not the key point. We will not incorporate Eigen2 code into numpy, particularly not as the default linear algebra implementation, because we wish to keep numpy's source as being only BSD. This is a policy decision of the numpy team, not a legal incompatibility. 
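A quick way to see which backend a given numpy binary was actually built against is np.show_config(); a minimal sketch (the exact sections printed depend on how that particular binary was built):

import numpy as np

# prints the BLAS/LAPACK build information (ATLAS, MKL, ...) for this install
np.show_config()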
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From russel at appliedminds.com Mon Jan 18 14:41:47 2010 From: russel at appliedminds.com (Russel Howe) Date: Mon, 18 Jan 2010 11:41:47 -0800 Subject: [Numpy-discussion] Is this expected behavior? Message-ID: <4B54B97B.1080000@appliedminds.com> This looks like the difference between memmove and memcpy to me, but I am not sure what the expected behavior of numpy should be. The first shift behaves the way I expect, the second is surprising. I know about numpy.roll. I was hoping for something faster, which this would be if it worked. In [1]: a = (np.random.random((10,10))*10).astype('u1') In [2]: a Out[2]: array([[8, 0, 5, 4, 8, 2, 7, 8, 7, 6], [6, 6, 3, 3, 9, 8, 0, 8, 9, 5], [5, 0, 1, 1, 2, 5, 8, 2, 5, 3], [9, 0, 0, 2, 8, 2, 0, 7, 7, 0], [9, 8, 6, 9, 6, 3, 9, 4, 4, 5], [2, 7, 6, 9, 3, 8, 9, 9, 6, 9], [2, 8, 8, 4, 0, 3, 7, 6, 7, 6], [2, 4, 9, 2, 4, 7, 3, 6, 7, 4], [3, 2, 0, 7, 0, 7, 6, 6, 1, 6], [2, 3, 8, 8, 9, 6, 7, 2, 5, 0]], dtype=uint8) In [3]: a[:, :-1] = a[:, 1:] In [4]: a Out[4]: array([[0, 5, 4, 8, 2, 7, 8, 7, 6, 6], [6, 3, 3, 9, 8, 0, 8, 9, 5, 5], [0, 1, 1, 2, 5, 8, 2, 5, 3, 3], [0, 0, 2, 8, 2, 0, 7, 7, 0, 0], [8, 6, 9, 6, 3, 9, 4, 4, 5, 5], [7, 6, 9, 3, 8, 9, 9, 6, 9, 9], [8, 8, 4, 0, 3, 7, 6, 7, 6, 6], [4, 9, 2, 4, 7, 3, 6, 7, 4, 4], [2, 0, 7, 0, 7, 6, 6, 1, 6, 6], [3, 8, 8, 9, 6, 7, 2, 5, 0, 0]], dtype=uint8) In [5]: a[:, 1:] = a[:, :-1] In [6]: a Out[6]: array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8) In [7]: np.__version__ Out[7]: '1.3.0' From eadrogue at gmx.net Mon Jan 18 14:48:13 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 18 Jan 2010 20:48:13 +0100 Subject: [Numpy-discussion] logic problem In-Reply-To: <1cd32cbb1001181117v78f20b32n2ad121edffa8ffaa@mail.gmail.com> References: <20100118184339.GA19364@doriath.local> <1cd32cbb1001181117v78f20b32n2ad121edffa8ffaa@mail.gmail.com> Message-ID: <20100118194813.GA19571@doriath.local> 18/01/10 @ 14:17 (-0500), thus spake josef.pktd at gmail.com: > 2010/1/18 Ernest Adrogu? : > > Hi, > > > > This is hard to explain. In this code: > > > > reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) > > > > where m1, m2 and m3 are boolean arrays, I'm trying to figure > > out an expression that works with an arbitrary number of > > arrays, not just 3. Any idea?? > > What's the shape of mi (dimension)? fixed or arbitrary number of dimension? > a loop is the most memory efficient I forgot to mention, mi are 1-dimensional, all the same length of course. > array broadcasting builds large arrays (and maybe has redundant > calculations), but might be a one-liner > > or something like list comprehension > > m = [m1, m2, ... 
mn] > reduce(np.logical_or, [mi & mj for (i, mi) in enumerate(m) for (j, mj) > in enumerate(m) if i > >>> m = [np.arange(10)<5, np.arange(10)>3, np.arange(10)>8] > >>> m > [array([ True, True, True, True, True, False, False, False, False, > False], dtype=bool), array([False, False, False, False, True, True, > True, True, True, True], dtype=bool), array([False, False, False, > False, False, False, False, False, False, True], dtype=bool)] > > >>> reduce(np.logical_or, [mi & mj for (i, mi) in enumerate(m) for (j, mj) in enumerate(m) if i array([False, False, False, False, True, False, False, False, False, > True], dtype=bool) > > Josef > > > > > > Bye. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Mon Jan 18 14:47:48 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jan 2010 13:47:48 -0600 Subject: [Numpy-discussion] Is this expected behavior? In-Reply-To: <4B54B97B.1080000@appliedminds.com> References: <4B54B97B.1080000@appliedminds.com> Message-ID: <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> On Mon, Jan 18, 2010 at 13:41, Russel Howe wrote: > This looks like the difference between memmove and memcpy to me, but I > am not sure what the expected behavior of numpy should be. ?The first > shift behaves the way I expect, the second is surprising. memmove() and memcpy() are not used for these operations (and in general, they can't be). Rather, iterators are created and looped over to do the assignments. Because you are not making copies on the right-hand-side, you are modifying the RHS as the iterators assign to the LHS. > In [3]: a[:, :-1] = a[:, 1:] > > In [4]: a > Out[4]: > array([[0, 5, 4, 8, 2, 7, 8, 7, 6, 6], > ? ? ? ?[6, 3, 3, 9, 8, 0, 8, 9, 5, 5], > ? ? ? ?[0, 1, 1, 2, 5, 8, 2, 5, 3, 3], > ? ? ? ?[0, 0, 2, 8, 2, 0, 7, 7, 0, 0], > ? ? ? ?[8, 6, 9, 6, 3, 9, 4, 4, 5, 5], > ? ? ? ?[7, 6, 9, 3, 8, 9, 9, 6, 9, 9], > ? ? ? ?[8, 8, 4, 0, 3, 7, 6, 7, 6, 6], > ? ? ? ?[4, 9, 2, 4, 7, 3, 6, 7, 4, 4], > ? ? ? ?[2, 0, 7, 0, 7, 6, 6, 1, 6, 6], > ? ? ? ?[3, 8, 8, 9, 6, 7, 2, 5, 0, 0]], dtype=uint8) The first one works because the RHS pointer is always one step ahead of the LHS pointer, thus it always reads pristine data. > In [5]: a[:, 1:] = a[:, :-1] > > In [6]: a > Out[6]: > array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], > ? ? ? ?[6, 6, 6, 6, 6, 6, 6, 6, 6, 6], > ? ? ? ?[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], > ? ? ? ?[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], > ? ? ? ?[8, 8, 8, 8, 8, 8, 8, 8, 8, 8], > ? ? ? ?[7, 7, 7, 7, 7, 7, 7, 7, 7, 7], > ? ? ? ?[8, 8, 8, 8, 8, 8, 8, 8, 8, 8], > ? ? ? ?[4, 4, 4, 4, 4, 4, 4, 4, 4, 4], > ? ? ? ?[2, 2, 2, 2, 2, 2, 2, 2, 2, 2], > ? ? ? ?[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8) The second one fails to work as you expect because the RHS pointer is always one step behind the LHS pointer, thus it always reads the data that just got modified in the previous step. The data you expected it to read has already been wiped out. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From eadrogue at gmx.net Mon Jan 18 14:52:30 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 18 Jan 2010 20:52:30 +0100 Subject: [Numpy-discussion] logic problem In-Reply-To: <4B54B3F3.6060306@enthought.com> References: <20100118184339.GA19364@doriath.local> <4B54B3F3.6060306@enthought.com> Message-ID: <20100118195230.GB19571@doriath.local> 18/01/10 @ 13:18 (-0600), thus spake Warren Weckesser: > Ernest Adrogu? wrote: > > Hi, > > > > This is hard to explain. In this code: > > > > reduce(np.logical_or, [m1 & m2, m1 & m3, m2 & m3]) > > > > where m1, m2 and m3 are boolean arrays, I'm trying to figure > > out an expression that works with an arbitrary number of > > arrays, not just 3. Any idea?? > > > > > > If I understand the problem correctly, you want the result to be True > whenever any pair of the corresponding elements of the arrays are True. Exactly. > This could work: > > reduce(np.add, [m.astype(int) for m in mlist]) > 1 > > where mlist is a list of the boolean array (e.g. mlist = [m1, m2, m3] in > your example). Very clever. Thanks a lot! > Warren > > Bye. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From russel at appliedminds.com Mon Jan 18 14:51:58 2010 From: russel at appliedminds.com (Russel Howe) Date: Mon, 18 Jan 2010 11:51:58 -0800 Subject: [Numpy-discussion] Is this expected behavior? In-Reply-To: <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> References: <4B54B97B.1080000@appliedminds.com> <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> Message-ID: <4B54BBDE.7080200@appliedminds.com> Since they are iterators, is it possible to check for the second condition and reverse both of them so the behavior I expect happens or does this break something else? Russel Robert Kern wrote: > On Mon, Jan 18, 2010 at 13:41, Russel Howe wrote: >> This looks like the difference between memmove and memcpy to me, but I >> am not sure what the expected behavior of numpy should be. The first >> shift behaves the way I expect, the second is surprising. > > memmove() and memcpy() are not used for these operations (and in > general, they can't be). Rather, iterators are created and looped over > to do the assignments. Because you are not making copies on the > right-hand-side, you are modifying the RHS as the iterators assign to > the LHS. > >> In [3]: a[:, :-1] = a[:, 1:] >> >> In [4]: a >> Out[4]: >> array([[0, 5, 4, 8, 2, 7, 8, 7, 6, 6], >> [6, 3, 3, 9, 8, 0, 8, 9, 5, 5], >> [0, 1, 1, 2, 5, 8, 2, 5, 3, 3], >> [0, 0, 2, 8, 2, 0, 7, 7, 0, 0], >> [8, 6, 9, 6, 3, 9, 4, 4, 5, 5], >> [7, 6, 9, 3, 8, 9, 9, 6, 9, 9], >> [8, 8, 4, 0, 3, 7, 6, 7, 6, 6], >> [4, 9, 2, 4, 7, 3, 6, 7, 4, 4], >> [2, 0, 7, 0, 7, 6, 6, 1, 6, 6], >> [3, 8, 8, 9, 6, 7, 2, 5, 0, 0]], dtype=uint8) > > The first one works because the RHS pointer is always one step ahead > of the LHS pointer, thus it always reads pristine data. 
> >> In [5]: a[:, 1:] = a[:, :-1] >> >> In [6]: a >> Out[6]: >> array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >> [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], >> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >> [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], >> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >> [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], >> [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], >> [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8) > > The second one fails to work as you expect because the RHS pointer is > always one step behind the LHS pointer, thus it always reads the data > that just got modified in the previous step. The data you expected it > to read has already been wiped out. > From robert.kern at gmail.com Mon Jan 18 15:03:21 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jan 2010 14:03:21 -0600 Subject: [Numpy-discussion] Is this expected behavior? In-Reply-To: <4B54BBDE.7080200@appliedminds.com> References: <4B54B97B.1080000@appliedminds.com> <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> <4B54BBDE.7080200@appliedminds.com> Message-ID: <3d375d731001181203i44da4d04yc6497cf093a913c8@mail.gmail.com> On Mon, Jan 18, 2010 at 13:51, Russel Howe wrote: > Since they are iterators, is it possible to check for the second > condition and reverse both of them so the behavior I expect happens or > does this ?break something else? In general, no I don't think it would be possible. It would create a weird special case in the semantics, and slow down common assignments that don't have the issue. It would be nice to have a fast in-place roll(), though this is not how one should implement it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From warren.weckesser at enthought.com Mon Jan 18 15:18:30 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 18 Jan 2010 14:18:30 -0600 Subject: [Numpy-discussion] Is this expected behavior? In-Reply-To: <4B54BBDE.7080200@appliedminds.com> References: <4B54B97B.1080000@appliedminds.com> <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> <4B54BBDE.7080200@appliedminds.com> Message-ID: <4B54C216.40105@enthought.com> Russel Howe wrote: > Since they are iterators, is it possible to check for the second > condition and reverse both of them so the behavior I expect happens or > does this break something else? > You may already know this, but just in case... In the second case, you can accomplish the shift by using reversed slices: a[:, -1:0:-1] = a[:, -2::-1] Warren > Russel > Robert Kern wrote: > >> On Mon, Jan 18, 2010 at 13:41, Russel Howe wrote: >> >>> This looks like the difference between memmove and memcpy to me, but I >>> am not sure what the expected behavior of numpy should be. The first >>> shift behaves the way I expect, the second is surprising. >>> >> memmove() and memcpy() are not used for these operations (and in >> general, they can't be). Rather, iterators are created and looped over >> to do the assignments. Because you are not making copies on the >> right-hand-side, you are modifying the RHS as the iterators assign to >> the LHS. 
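A minimal sketch of the two safe spellings that come out of this thread, on a small array named a rather than the random one from the original post. Recent NumPy releases detect many overlapping assignments and copy internally, so the explicit .copy() is mainly about keeping the intent unambiguous on any version.

import numpy as np

a = np.arange(100, dtype=np.uint8).reshape(10, 10)

# Right-shift the columns portably: copy the right-hand side so the
# reads can never see the writes, whatever order the iterators run in.
b = a.copy()
b[:, 1:] = b[:, :-1].copy()

# Warren's reversed-slice variant: same result without the temporary,
# because the assignment then walks from the last column backwards.
c = a.copy()
c[:, -1:0:-1] = c[:, -2::-1]

print(np.array_equal(b, c))   # True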
>> >> >>> In [3]: a[:, :-1] = a[:, 1:] >>> >>> In [4]: a >>> Out[4]: >>> array([[0, 5, 4, 8, 2, 7, 8, 7, 6, 6], >>> [6, 3, 3, 9, 8, 0, 8, 9, 5, 5], >>> [0, 1, 1, 2, 5, 8, 2, 5, 3, 3], >>> [0, 0, 2, 8, 2, 0, 7, 7, 0, 0], >>> [8, 6, 9, 6, 3, 9, 4, 4, 5, 5], >>> [7, 6, 9, 3, 8, 9, 9, 6, 9, 9], >>> [8, 8, 4, 0, 3, 7, 6, 7, 6, 6], >>> [4, 9, 2, 4, 7, 3, 6, 7, 4, 4], >>> [2, 0, 7, 0, 7, 6, 6, 1, 6, 6], >>> [3, 8, 8, 9, 6, 7, 2, 5, 0, 0]], dtype=uint8) >>> >> The first one works because the RHS pointer is always one step ahead >> of the LHS pointer, thus it always reads pristine data. >> >> >>> In [5]: a[:, 1:] = a[:, :-1] >>> >>> In [6]: a >>> Out[6]: >>> array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>> [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], >>> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >>> [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], >>> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >>> [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], >>> [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], >>> [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8) >>> >> The second one fails to work as you expect because the RHS pointer is >> always one step behind the LHS pointer, thus it always reads the data >> that just got modified in the previous step. The data you expected it >> to read has already been wiped out. >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From russel at appliedminds.com Mon Jan 18 15:43:23 2010 From: russel at appliedminds.com (Russel Howe) Date: Mon, 18 Jan 2010 12:43:23 -0800 Subject: [Numpy-discussion] Is this expected behavior? In-Reply-To: <4B54C216.40105@enthought.com> References: <4B54B97B.1080000@appliedminds.com> <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> <4B54BBDE.7080200@appliedminds.com> <4B54C216.40105@enthought.com> Message-ID: <4B54C7EB.6080906@appliedminds.com> Oh, of course. I can reverse it myself. Thanks, I did not think of that. Russel Warren Weckesser wrote: > Russel Howe wrote: >> Since they are iterators, is it possible to check for the second >> condition and reverse both of them so the behavior I expect happens or >> does this break something else? >> > > You may already know this, but just in case... > > In the second case, you can accomplish the shift by using reversed slices: > > a[:, -1:0:-1] = a[:, -2::-1] > > > Warren > >> Russel >> Robert Kern wrote: >> >>> On Mon, Jan 18, 2010 at 13:41, Russel Howe wrote: >>> >>>> This looks like the difference between memmove and memcpy to me, but I >>>> am not sure what the expected behavior of numpy should be. The first >>>> shift behaves the way I expect, the second is surprising. >>>> >>> memmove() and memcpy() are not used for these operations (and in >>> general, they can't be). Rather, iterators are created and looped over >>> to do the assignments. Because you are not making copies on the >>> right-hand-side, you are modifying the RHS as the iterators assign to >>> the LHS. 
>>> >>> >>>> In [3]: a[:, :-1] = a[:, 1:] >>>> >>>> In [4]: a >>>> Out[4]: >>>> array([[0, 5, 4, 8, 2, 7, 8, 7, 6, 6], >>>> [6, 3, 3, 9, 8, 0, 8, 9, 5, 5], >>>> [0, 1, 1, 2, 5, 8, 2, 5, 3, 3], >>>> [0, 0, 2, 8, 2, 0, 7, 7, 0, 0], >>>> [8, 6, 9, 6, 3, 9, 4, 4, 5, 5], >>>> [7, 6, 9, 3, 8, 9, 9, 6, 9, 9], >>>> [8, 8, 4, 0, 3, 7, 6, 7, 6, 6], >>>> [4, 9, 2, 4, 7, 3, 6, 7, 4, 4], >>>> [2, 0, 7, 0, 7, 6, 6, 1, 6, 6], >>>> [3, 8, 8, 9, 6, 7, 2, 5, 0, 0]], dtype=uint8) >>>> >>> The first one works because the RHS pointer is always one step ahead >>> of the LHS pointer, thus it always reads pristine data. >>> >>> >>>> In [5]: a[:, 1:] = a[:, :-1] >>>> >>>> In [6]: a >>>> Out[6]: >>>> array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>>> [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], >>>> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>>> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>>> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >>>> [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], >>>> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >>>> [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], >>>> [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], >>>> [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8) >>>> >>> The second one fails to work as you expect because the RHS pointer is >>> always one step behind the LHS pointer, thus it always reads the data >>> that just got modified in the previous step. The data you expected it >>> to read has already been wiped out. >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From lescroar at usc.edu Mon Jan 18 20:15:00 2010 From: lescroar at usc.edu (Mark Lescroart) Date: Mon, 18 Jan 2010 17:15:00 -0800 Subject: [Numpy-discussion] Numpy.dot segmentation fault In-Reply-To: <4B53E5C1.4070801@silveregg.co.jp> References: <4B53E5C1.4070801@silveregg.co.jp> Message-ID: <9073C6F4-C9F7-4A84-A0BE-668A65447090@usc.edu> Hi David (et al), Thanks for the reply. The version of ATLAS I was using was v3.8.3_1, installed via MacPorts, compiled and built on my machine with a gcc4.3 compiler. I uninstalled numpy 1.4 and ATLAS, re-installed (i.e., re- compiled) the same version (3.8.3_1), re-installed numpy (for python2.6, version 1.4), and got the same bug. I don't know if this means that there's something fundamentally wrong with the version of ATLAS on MacPorts (probably less likely) or something wrong with the way my system is configured (probably more likely). If anyone can give me any more insight into how to test my installation of ATLAS, I would be much obliged (I read through a fair bit of the ATLAS installation notes on the ATLAS sourceforge page, and could not figure out how to "run the whole test suite" ). If possible, I would like to solve this problem within Macports (and thus not with the Accelerate framework). I am using numpy mostly through the pyMVPA package for fMRI multi-voxel analysis, and the pyMVPA package depends on a number of other libraries, and the mess of dependencies is most easily managed within the framework of Macports. Cheers, Mark On Jan 17, 2010, at 8:38 PM, David Cournapeau wrote: > Mark Lescroart wrote: >> Hello, >> >> I've encountered a segfault in numpy when trying to compute a dot >> product for two arrays - see code below. The problem only seems to >> occur >> when the arrays reach a certain size. > > Your atlas is most likely broken. 
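For anyone hitting the same thing, one way to see which BLAS a NumPy build is linked against and to reproduce the failing call in isolation; the shapes below are only illustrative, not the ones from the original report.

import numpy as np

# Which BLAS/LAPACK is this build linked against (ATLAS, Accelerate, ...)?
np.show_config()

# A matrix product big enough to hit the optimized GEMM path; this is
# the kind of call that segfaulted in the report above.
a = np.random.rand(3000, 1000)
b = np.random.rand(1000, 500)
print(np.dot(a, b).shape)

# NumPy's own test suite exercises the linear algebra bindings as well
# (requires the external test runner to be installed).
np.test()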
You will have to double-check how > you > built it, and maybe run the whole test suite (as indicated in the > ATLAS > installation notes). > > Note that you can use the Accelerate framework on mac os x, this is > much > easier to get numpy working on mac, > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robince at gmail.com Mon Jan 18 20:31:00 2010 From: robince at gmail.com (Robin) Date: Mon, 18 Jan 2010 20:31:00 -0500 Subject: [Numpy-discussion] Numpy.dot segmentation fault In-Reply-To: <9073C6F4-C9F7-4A84-A0BE-668A65447090@usc.edu> References: <4B53E5C1.4070801@silveregg.co.jp> <9073C6F4-C9F7-4A84-A0BE-668A65447090@usc.edu> Message-ID: <2d5132a51001181731y6bcd03d8m429a0426236984d8@mail.gmail.com> You can build numpy against Accelerate through macports by specifying the +no_atlas variant. Last time I tried I ran into this issue: http://trac.macports.org/ticket/22201 but it looks like it should be fixed now. Cheers Robin On Mon, Jan 18, 2010 at 8:15 PM, Mark Lescroart wrote: > Hi David (et al), > > Thanks for the reply. The version of ATLAS I was using was v3.8.3_1, > installed via MacPorts, compiled and built on my machine with a gcc4.3 > compiler. I uninstalled numpy 1.4 and ATLAS, re-installed (i.e., re- > compiled) the same version (3.8.3_1), re-installed numpy (for > python2.6, version 1.4), and got the same bug. > > I don't know if this means that there's something fundamentally wrong > with the version of ATLAS on MacPorts (probably less likely) or > something wrong with the way my system is configured (probably more > likely). If anyone can give me any more insight into how to test my > installation of ATLAS, I would be much obliged (I read through a fair > bit of the ATLAS installation notes on the ATLAS sourceforge page, and > could not figure out how to "run the whole test suite" ). > > If possible, I would like to solve this problem within Macports (and > thus not with the Accelerate framework). I am using numpy mostly > through the pyMVPA package for fMRI multi-voxel analysis, and the > pyMVPA package depends on a number of other libraries, and the mess of > dependencies is most easily managed within the framework of Macports. > > Cheers, > > Mark > > On Jan 17, 2010, at 8:38 PM, David Cournapeau wrote: > >> Mark Lescroart wrote: >>> Hello, >>> >>> I've encountered a segfault in numpy when trying to compute a dot >>> product for two arrays - see code below. The problem only seems to >>> occur >>> when the arrays reach a certain size. >> >> Your atlas is most likely broken. You will have to double-check how >> you >> built it, and maybe run the whole test suite (as indicated in the >> ATLAS >> installation notes). >> >> Note that you can use the Accelerate framework on mac os x, this is >> much >> easier to get numpy working on mac, >> >> cheers, >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From somervi8 at telus.net Tue Jan 19 12:53:36 2010 From: somervi8 at telus.net (robert somerville) Date: Tue, 19 Jan 2010 09:53:36 -0800 Subject: [Numpy-discussion] how to do a proper 2 column sort on a 2 dimensional array ?? 
Message-ID: <2fb4a5011001190953j3fba1134gac050ccf186dea0d@mail.gmail.com> Hi; i am having trouble trying to sort the rows of a 2 dimensional array by the values in the first column .. does anybody know how or have an example of how to do this ??? while leaving the remain columns remain relative to the leading column from numpy import * a=array( [ [4, 4, 3], [4, 5, 2], [3, 1, 1] ] ) i would like to generate the output (or get the output ...) b = [ [3,1,1], [4,4,3], [4,5,2] ] to be specific the primary sort is on the the first column, and within the primary key, i would like to do a seconday sort of the matrix based on 2nd column .. does Numpy have this finctionality, or do part have to be programmed in Python ?? thanks; bob -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jan 19 13:09:09 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jan 2010 12:09:09 -0600 Subject: [Numpy-discussion] how to do a proper 2 column sort on a 2 dimensional array ?? In-Reply-To: <2fb4a5011001190953j3fba1134gac050ccf186dea0d@mail.gmail.com> References: <2fb4a5011001190953j3fba1134gac050ccf186dea0d@mail.gmail.com> Message-ID: <3d375d731001191009w42a8459ax9f55f069fa252abc@mail.gmail.com> On Tue, Jan 19, 2010 at 11:53, robert somerville wrote: > Hi; > ?i am having trouble trying to sort the rows of a 2 dimensional array by the > values in the first column .. does anybody know how or have an example of > how to do this ??? while leaving the remain columns remain relative to the > leading column > > from numpy import * > > a=array( [ [4, 4, 3], [4, 5, 2],? [3, 1, 1] ] ) > > i would like to generate the output (or get the output ...) > > b = [ [3,1,1], [4,4,3], [4,5,2] ] > > to be specific the primary sort is on the the first column, and within the > primary key, i would like to do a seconday sort of the matrix based on 2nd > column .. Let's modify your example slightly so I don't make the same mistake I did on comp.lang.python. Let's make sure that the input data is not already partially ordered by the second column. All we need to do is swap the first two rows. In [9]: a = np.array( [ [4, 5, 2], [4, 4, 3], [3, 1, 1] ] ) In [10]: i = np.lexsort((a[:,1], a[:,0])) In [11]: b = a[i] In [12]: b Out[12]: array([[3, 1, 1], [4, 4, 3], [4, 5, 2]]) Note that in the lexsort() call, the second column comes first. You can think of the procedure as "sort by the second column, now sort by the first column; where there are ties in the first column, the order is left alone from the previous sort on the second column". -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Tue Jan 19 13:15:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jan 2010 13:15:16 -0500 Subject: [Numpy-discussion] how to do a proper 2 column sort on a 2 dimensional array ?? 
In-Reply-To: <3d375d731001191009w42a8459ax9f55f069fa252abc@mail.gmail.com> References: <2fb4a5011001190953j3fba1134gac050ccf186dea0d@mail.gmail.com> <3d375d731001191009w42a8459ax9f55f069fa252abc@mail.gmail.com> Message-ID: <1cd32cbb1001191015u4bf0eb04ga238b91fe87dfbde@mail.gmail.com> On Tue, Jan 19, 2010 at 1:09 PM, Robert Kern wrote: > On Tue, Jan 19, 2010 at 11:53, robert somerville wrote: >> Hi; >> ?i am having trouble trying to sort the rows of a 2 dimensional array by the >> values in the first column .. does anybody know how or have an example of >> how to do this ??? while leaving the remain columns remain relative to the >> leading column >> >> from numpy import * >> >> a=array( [ [4, 4, 3], [4, 5, 2],? [3, 1, 1] ] ) >> >> i would like to generate the output (or get the output ...) >> >> b = [ [3,1,1], [4,4,3], [4,5,2] ] >> >> to be specific the primary sort is on the the first column, and within the >> primary key, i would like to do a seconday sort of the matrix based on 2nd >> column .. > > Let's modify your example slightly so I don't make the same mistake I > did on comp.lang.python. Let's make sure that the input data is not > already partially ordered by the second column. All we need to do is > swap the first two rows. > > In [9]: a = np.array( [ [4, 5, 2], [4, 4, 3], [3, 1, 1] ] ) > > In [10]: i = np.lexsort((a[:,1], a[:,0])) > > In [11]: b = a[i] > > In [12]: b > Out[12]: > array([[3, 1, 1], > ? ? ? [4, 4, 3], > ? ? ? [4, 5, 2]]) > > > Note that in the lexsort() call, the second column comes first. You > can think of the procedure as "sort by the second column, now sort by > the first column; where there are ties in the first column, the order > is left alone from the previous sort on the second column". > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > see also numpy discussion 12/21/2008 "is there a sortrows" when I struggled with it Josef From Chris.Barker at noaa.gov Tue Jan 19 13:27:20 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 19 Jan 2010 10:27:20 -0800 Subject: [Numpy-discussion] Is this expected behavior? In-Reply-To: <4B54C7EB.6080906@appliedminds.com> References: <4B54B97B.1080000@appliedminds.com> <3d375d731001181147x3c2ecab6ua82c89ace04ddb83@mail.gmail.com> <4B54BBDE.7080200@appliedminds.com> <4B54C216.40105@enthought.com> <4B54C7EB.6080906@appliedminds.com> Message-ID: <4B55F988.5030509@noaa.gov> Russel Howe wrote: > Oh, of course. I can reverse it myself. Thanks, I did not think of that. note that you may need to make sure that your arrays are in C-order. -Chris > Russel > > Warren Weckesser wrote: >> Russel Howe wrote: >>> Since they are iterators, is it possible to check for the second >>> condition and reverse both of them so the behavior I expect happens or >>> does this break something else? >>> >> You may already know this, but just in case... 
>> >> In the second case, you can accomplish the shift by using reversed slices: >> >> a[:, -1:0:-1] = a[:, -2::-1] >> >> >> Warren >> >>> Russel >>> Robert Kern wrote: >>> >>>> On Mon, Jan 18, 2010 at 13:41, Russel Howe wrote: >>>> >>>>> This looks like the difference between memmove and memcpy to me, but I >>>>> am not sure what the expected behavior of numpy should be. The first >>>>> shift behaves the way I expect, the second is surprising. >>>>> >>>> memmove() and memcpy() are not used for these operations (and in >>>> general, they can't be). Rather, iterators are created and looped over >>>> to do the assignments. Because you are not making copies on the >>>> right-hand-side, you are modifying the RHS as the iterators assign to >>>> the LHS. >>>> >>>> >>>>> In [3]: a[:, :-1] = a[:, 1:] >>>>> >>>>> In [4]: a >>>>> Out[4]: >>>>> array([[0, 5, 4, 8, 2, 7, 8, 7, 6, 6], >>>>> [6, 3, 3, 9, 8, 0, 8, 9, 5, 5], >>>>> [0, 1, 1, 2, 5, 8, 2, 5, 3, 3], >>>>> [0, 0, 2, 8, 2, 0, 7, 7, 0, 0], >>>>> [8, 6, 9, 6, 3, 9, 4, 4, 5, 5], >>>>> [7, 6, 9, 3, 8, 9, 9, 6, 9, 9], >>>>> [8, 8, 4, 0, 3, 7, 6, 7, 6, 6], >>>>> [4, 9, 2, 4, 7, 3, 6, 7, 4, 4], >>>>> [2, 0, 7, 0, 7, 6, 6, 1, 6, 6], >>>>> [3, 8, 8, 9, 6, 7, 2, 5, 0, 0]], dtype=uint8) >>>>> >>>> The first one works because the RHS pointer is always one step ahead >>>> of the LHS pointer, thus it always reads pristine data. >>>> >>>> >>>>> In [5]: a[:, 1:] = a[:, :-1] >>>>> >>>>> In [6]: a >>>>> Out[6]: >>>>> array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>>>> [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], >>>>> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>>>> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], >>>>> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >>>>> [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], >>>>> [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], >>>>> [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], >>>>> [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], >>>>> [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8) >>>>> >>>> The second one fails to work as you expect because the RHS pointer is >>>> always one step behind the LHS pointer, thus it always reads the data >>>> that just got modified in the previous step. The data you expected it >>>> to read has already been wiped out. >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From doutriaux1 at llnl.gov Tue Jan 19 13:32:33 2010 From: doutriaux1 at llnl.gov (=?UTF-8?Q?Charles_=D8=B3=D9=85=D9=8A=D8=B1_Doutriaux?=) Date: Tue, 19 Jan 2010 10:32:33 -0800 Subject: [Numpy-discussion] Numpy 1.4 MaskedArray bug? In-Reply-To: <910F1C67-06BB-47C1-8FCA-057131728D47@gmail.com> References: <1263321129.7167.12.camel@idol> <910F1C67-06BB-47C1-8FCA-057131728D47@gmail.com> Message-ID: Hi Pierre, We didn't move to 1.4 yet. Should we wait for 1.4.1? It seems there's some issues with numpy.ma in 1.4 and we rely heavily on it. C. 
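For reference, the numpy.ma regression being discussed is the ma.sum(..., axis=1) failure quoted in the exchange below; a small sketch of the two stop-gap workarounds suggested there (the real fix went into the trunk and the 1.4.x branch).

import numpy as np

a = np.ma.MaskedArray([[1, 2, 3], [4, 5, 6]])

# Workaround 1: a negative axis avoids the failing code path in 1.4.0.
print(np.ma.sum(a, -1))

# Workaround 2: force a mask to be defined at construction time.
b = np.ma.masked_array([[1, 2, 3], [4, 5, 6]], mask=False)
print(b.sum(axis=1))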
On Jan 12, 2010, at 11:50 AM, Pierre GM wrote: > On Jan 12, 2010, at 1:52 PM, Charles R Harris wrote: >> >> >> >> On Tue, Jan 12, 2010 at 11:32 AM, Pauli Virtanen wrote: >> ti, 2010-01-12 kello 12:51 -0500, Pierre GM kirjoitti: >> [clip] >>>>>>> a = numpy.ma.MaskedArray([[1,2,3],[4,5,6]]) >>>>>>> numpy.ma.sum(a, 1) >>>> Traceback (most recent call last): >>>> File "", line 1, in >>>> File >>>> "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux- >>>> x86_64.egg/n >>>> umpy/ma/core.py", line 5682, in __call__ >>>> return method(*args, **params) >>>> File >>>> "/usr/lib64/python2.5/site-packages/numpy-1.4.0-py2.5-linux- >>>> x86_64.egg/n >>>> umpy/ma/core.py", line 4357, in sum >>>> newmask = _mask.all(axis=axis) >>>> ValueError: axis(=1) out of bounds >>> >>> Confirmed. >>> Before I take full blame for it, can you try the following on both >>> 1.3 and 1.4 ? >>>>>> np.array(False).all().sum(1) >> >> Oh crap, it's mostly my fault: >> >> http://*projects.scipy.org/numpy/ticket/1286 >> http://*projects.scipy.org/numpy/changeset/7697 >> http://*projects.scipy.org/numpy/browser/trunk/doc/release/1.4.0- >> notes.rst#deprecations >> >> Pretty embarassing, as very simple things break, although the test >> suite >> miraculously passes... >> >>> Back to your problem: I'll fix that ASAIC, but it'll be on the >>> SVN. Meanwhile, you can: >>> * Use -1 instead of 1 for your axis. >>> * Force the definition of a mask when you define your array with >>> masked_array(...,mask=False) >> >> Sounds like we need a 1.4.1 out at some point not too far in the >> future, >> then. >> >> >> If so, then it should be sooner rather than later in order to sync >> with the releases of ubuntu and fedora. Both of the upcoming >> releases still use 1.3.0, but that could change... > > I guess that the easiest would be for me to provide a workaround for > the bug (Pauli's modifications make sense, I was relying on a > *feature* that wasn't very robust). > I'll update both the trunk and the 1.4.x branch > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://*mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Tue Jan 19 15:07:46 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 19 Jan 2010 12:07:46 -0800 Subject: [Numpy-discussion] Waf or scons/numscons for a C/Fortran/Cython/Python project -- what's your recommendation? In-Reply-To: <5b8d13221001162004w640e4854obe27a78942bb551b@mail.gmail.com> References: <5b8d13221001162004w640e4854obe27a78942bb551b@mail.gmail.com> Message-ID: <4B561112.8030600@noaa.gov> David Cournapeau wrote: > Waf codebase is much better than scons, I don't know about waf, but I do know that I tried to add OS-X application bundle support to scons, and it was really, really painful. It sure seemed like it should have been easy to do -- it's just a well-defined directory structure. > The biggest drawback I see with waf is the lack of users: the only > significant project I know which uses waf is Ardour. There was some work done to use it for wxWebKit, though I don't know what's come of that: https://bugs.webkit.org/show_bug.cgi?id=27619 -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gael.varoquaux at normalesup.org Tue Jan 19 15:12:53 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 21:12:53 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy Message-ID: <20100119201253.GA2353@phare.normalesup.org> Hi there, Forgive me for turning to the mailing list to do my homework. I am currently optimizing a code, and it turns out that the main bottleneck is the orthogonalisation of a vector 'y' to a set of vectors 'confounds', that I am currently doing with the following code: y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. Most of the time is spent in linalg.lstsq. The length of the vectors is 810, and there are about 10 confounds. Is there a better way of doing this? Cheers, Ga?l From robert.kern at gmail.com Tue Jan 19 15:22:30 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jan 2010 14:22:30 -0600 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119201253.GA2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> Message-ID: <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> On Tue, Jan 19, 2010 at 14:12, Gael Varoquaux wrote: > Hi there, > > Forgive me for turning to the mailing list to do my homework. I am > currently optimizing a code, and it turns out that the main bottleneck is > the orthogonalisation of a vector 'y' to a set of vectors 'confounds', > that I am currently doing with the following code: > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. For clarification, are you trying to find the components of the y vectors that are perpendicular to the space spanned by the 10 orthonormal vectors in confounds? > Most of the time is spent in linalg.lstsq. The length of the vectors is > 810, and there are about 10 confounds. Exactly what are the shapes? y.shape = (810, N); confounds.shape = (810, 10)? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gael.varoquaux at normalesup.org Tue Jan 19 15:47:08 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 21:47:08 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> Message-ID: <20100119204708.GB2353@phare.normalesup.org> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. > For clarification, are you trying to find the components of the y > vectors that are perpendicular to the space spanned by the 10 > orthonormal vectors in confounds? Yes. 
Actually, what I am doing is calculating partial correlation between x and y conditionally to confounds, with the following code: def cond_partial_cor(y, x, confounds=[]): """ Returns the partial correlation of y and x, conditionning on confounds. """ # First orthogonalise y and x relative to confounds if len(confounds): y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) I am not sure that what I am doing is optimal. > > Most of the time is spent in linalg.lstsq. The length of the vectors is > > 810, and there are about 10 confounds. > Exactly what are the shapes? y.shape = (810, N); confounds.shape = (810, 10)? Sorry, I should have been more precise: y.shape = (810, ) confounds.shape = (10, 810) Thanks, Ga?l From josef.pktd at gmail.com Tue Jan 19 15:52:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jan 2010 15:52:54 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119204708.GB2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> Message-ID: <1cd32cbb1001191252h18650e9ch3af8e8e2ce152d8@mail.gmail.com> On Tue, Jan 19, 2010 at 3:47 PM, Gael Varoquaux wrote: > On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: >> > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. > >> For clarification, are you trying to find the components of the y >> vectors that are perpendicular to the space spanned by the 10 >> orthonormal vectors in confounds? > > Yes. Actually, what I am doing is calculating partial correlation between > x and y conditionally to confounds, with the following code: > > def cond_partial_cor(y, x, confounds=[]): > ? ?""" Returns the partial correlation of y and x, conditionning on > ? ? ? ?confounds. > ? ?""" > ? ?# First orthogonalise y and x relative to confounds > ? ?if len(confounds): > ? ? ? ?y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > ? ? ? ?x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) you can combine x, and y for one call to leastsq, if it makes a difference linalg.lstsq(confounds.T, [x,y]) #format? columnstack? I don't see anything else yet Josef > ? ?return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) > > I am not sure that what I am doing is optimal. > >> > Most of the time is spent in linalg.lstsq. The length of the vectors is >> > 810, and there are about 10 confounds. > >> Exactly what are the shapes? y.shape = (810, N); confounds.shape = (810, 10)? 
> > Sorry, I should have been more precise: > > y.shape = (810, ) > confounds.shape = (10, 810) > > Thanks, > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Tue Jan 19 15:58:32 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jan 2010 14:58:32 -0600 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119204708.GB2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> Message-ID: <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> On Tue, Jan 19, 2010 at 14:47, Gael Varoquaux wrote: > On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: >> > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. > >> For clarification, are you trying to find the components of the y >> vectors that are perpendicular to the space spanned by the 10 >> orthonormal vectors in confounds? > > Yes. Actually, what I am doing is calculating partial correlation between > x and y conditionally to confounds, with the following code: > > def cond_partial_cor(y, x, confounds=[]): > ? ?""" Returns the partial correlation of y and x, conditionning on > ? ? ? ?confounds. > ? ?""" > ? ?# First orthogonalise y and x relative to confounds > ? ?if len(confounds): > ? ? ? ?y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > ? ? ? ?x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) > ? ?return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) > > I am not sure that what I am doing is optimal. If confounds is orthonormal, then there is no need to use lstsq(). y = y - np.dot(np.dot(confounds, y), confounds) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gael.varoquaux at normalesup.org Tue Jan 19 16:11:40 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 22:11:40 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <1cd32cbb1001191252h18650e9ch3af8e8e2ce152d8@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191252h18650e9ch3af8e8e2ce152d8@mail.gmail.com> Message-ID: <20100119211140.GC2353@phare.normalesup.org> > you can combine x, and y for one call to leastsq, if it makes a difference > linalg.lstsq(confounds.T, [x,y]) #format? columnstack? Indeed! Thank you Joseph. That's a gain of 10% in the total computation time of my algorithm (and 20% on the partial correlation calculation). 
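For completeness, a sketch of what the combined call can look like. The names x, y and confounds follow the shapes given earlier in the thread, the data here is random stand-in data, and column_stack is one way (an assumption, not from the original mail) to get both right-hand sides into the (810, 2) array that lstsq expects.

import numpy as np
from scipy import linalg

n = 810
x, y = np.random.randn(2, n)
confounds = np.random.randn(10, n)

# One lstsq call for both right-hand sides instead of two.
rhs = np.column_stack([x, y])                     # shape (810, 2)
coefs = linalg.lstsq(confounds.T, rhs)[0]         # shape (10, 2)
resid = rhs - np.dot(confounds.T, coefs)          # both orthogonalised at once
x_orth, y_orth = resid.T

corr = np.dot(x_orth, y_orth) / np.sqrt(
    np.dot(x_orth, x_orth) * np.dot(y_orth, y_orth))
print(corr)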
Ga?l From gael.varoquaux at normalesup.org Tue Jan 19 16:12:36 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 22:12:36 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> Message-ID: <20100119211236.GD2353@phare.normalesup.org> On Tue, Jan 19, 2010 at 02:58:32PM -0600, Robert Kern wrote: > > I am not sure that what I am doing is optimal. > If confounds is orthonormal, then there is no need to use lstsq(). > y = y - np.dot(np.dot(confounds, y), confounds) Unfortunately, confounds is not orthonormal, and as it is different at each call, I cannot orthogonalise it as a preprocessing. Thanks, Ga?l From robert.kern at gmail.com Tue Jan 19 16:16:59 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jan 2010 15:16:59 -0600 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119211236.GD2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> <20100119211236.GD2353@phare.normalesup.org> Message-ID: <3d375d731001191316n3039bbd1k308a02e48505f9f9@mail.gmail.com> On Tue, Jan 19, 2010 at 15:12, Gael Varoquaux wrote: > On Tue, Jan 19, 2010 at 02:58:32PM -0600, Robert Kern wrote: >> > I am not sure that what I am doing is optimal. > >> If confounds is orthonormal, then there is no need to use lstsq(). > >> ? y = y - np.dot(np.dot(confounds, y), confounds) > > Unfortunately, confounds is not orthonormal, and as it is different at > each call, I cannot orthogonalise it as a preprocessing. Ah, then you shouldn't have said "Yes" when I asked if they were orthonormal. :-) However, you can orthonormalize inside the function and reuse that for both x and y. Using the QR decomposition is likely cheaper than the SVD that lstsq() does. ortho_confounds = linalg.qr(confounds.T)[0].T -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pav at iki.fi Tue Jan 19 16:17:12 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 19 Jan 2010 23:17:12 +0200 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119211236.GD2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> <20100119211236.GD2353@phare.normalesup.org> Message-ID: <1263935831.7146.23.camel@idol> ti, 2010-01-19 kello 22:12 +0100, Gael Varoquaux kirjoitti: > On Tue, Jan 19, 2010 at 02:58:32PM -0600, Robert Kern wrote: > > > I am not sure that what I am doing is optimal. > > > If confounds is orthonormal, then there is no need to use lstsq(). > > > y = y - np.dot(np.dot(confounds, y), confounds) > > Unfortunately, confounds is not orthonormal, and as it is different at > each call, I cannot orthogonalise it as a preprocessing. You orthonormalize it on each call, then. 
It's quite likely cheaper to do than the SVD that lstsq does. {{{ import numpy as np def gram_schmid(V): """ Gram-Schmid orthonormalization of a set of `M` vectors, in-place. Parameters ---------- V : array, shape (N, M) """ # XXX: speed can be improved by using routines from scipy.lib.blas # XXX: maybe there's an orthonormalization routine in LAPACK, too, # apart from QR. too lazy to check... n = V.shape[1] for k in xrange(n): V[:,k] /= np.linalg.norm(V[:,k]) for j in xrange(k+1, n): V[:,j] -= np.vdot(V[:,j], V[:,k]) * V[:,k] return V def relative_ortho(x, V, V_is_orthonormal=False): """ Relative orthogonalization of vector `x` versus vectors in `V`. """ if not V_is_orthonormal: gram_schmid(V) for k in xrange(V.shape[1]): x -= np.vdot(x, V[:,k])*V[:,k] return x V = np.array([[1,0,1], [1,1,0]], dtype=float).T x = np.array([1,1,1], dtype=float) relative_ortho(x, V) print x print np.dot(x, V[:,0]) print np.dot(x, V[:,1]) }}} -- Pauli Virtanen From josef.pktd at gmail.com Tue Jan 19 16:19:19 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jan 2010 16:19:19 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119211140.GC2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191252h18650e9ch3af8e8e2ce152d8@mail.gmail.com> <20100119211140.GC2353@phare.normalesup.org> Message-ID: <1cd32cbb1001191319m4aac7e8ar4ed57489bb654c7d@mail.gmail.com> On Tue, Jan 19, 2010 at 4:11 PM, Gael Varoquaux wrote: >> you can ?combine x, and y for one call to leastsq, if it makes a difference >> linalg.lstsq(confounds.T, [x,y]) ?#format? columnstack? > > Indeed! Thank you Joseph. That's a gain of 10% in the total computation > time of my algorithm (and 20% on the partial correlation calculation). if you have z=[x,y] stacked, just one call to dot might also help for the correlation zz = dot(z.T, z) zz/sqrt(zz[0,0]*zz[1,1]) You might be able to do everything in a stacked version/ there is no ph in Josef (unless you talk about my french father-in-law) Josef > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Tue Jan 19 16:29:06 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jan 2010 14:29:06 -0700 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119204708.GB2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> Message-ID: On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: > > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > > > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. > > > For clarification, are you trying to find the components of the y > > vectors that are perpendicular to the space spanned by the 10 > > orthonormal vectors in confounds? > > Yes. 
Actually, what I am doing is calculating partial correlation between > x and y conditionally to confounds, with the following code: > > def cond_partial_cor(y, x, confounds=[]): > """ Returns the partial correlation of y and x, conditionning on > confounds. > """ > # First orthogonalise y and x relative to confounds > if len(confounds): > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) > return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) > > I am not sure that what I am doing is optimal. > > > > Most of the time is spent in linalg.lstsq. The length of the vectors is > > > 810, and there are about 10 confounds. > > > Exactly what are the shapes? y.shape = (810, N); confounds.shape = (810, > 10)? > > Sorry, I should have been more precise: > > y.shape = (810, ) > confounds.shape = (10, 810) > > Column stack the bunch so that the last column is y, then do a qr decomposition. The last column of q is the (normalized) orthogonal vector and its amplitude is the last (bottom right) component of r. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 19 16:34:09 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jan 2010 16:34:09 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> Message-ID: <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris wrote: > > > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux > wrote: >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. >> >> > For clarification, are you trying to find the components of the y >> > vectors that are perpendicular to the space spanned by the 10 >> > orthonormal vectors in confounds? >> >> Yes. Actually, what I am doing is calculating partial correlation between >> x and y conditionally to confounds, with the following code: >> >> def cond_partial_cor(y, x, confounds=[]): >> ? ?""" Returns the partial correlation of y and x, conditionning on >> ? ? ? ?confounds. >> ? ?""" >> ? ?# First orthogonalise y and x relative to confounds >> ? ?if len(confounds): >> ? ? ? ?y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> ? ? ? ?x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) >> ? ?return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) >> >> I am not sure that what I am doing is optimal. >> >> > > Most of the time is spent in linalg.lstsq. The length of the vectors >> > > is >> > > 810, and there are about 10 confounds. >> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape = (810, >> > 10)? >> >> Sorry, I should have been more precise: >> >> y.shape = (810, ) >> confounds.shape = (10, 810) >> > > Column stack the bunch so that the last column is y, then do a qr > decomposition. The last column of q is the (normalized) orthogonal vector > and its amplitude is the last (bottom right) component of r. do you have to do qr twice, once with x and once with y in the last column or can this be combined? 
I was trying to do something similar for partial autocorrelation for timeseries but didn't manage or try anything better than repeated leastsq or a variant. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From gael.varoquaux at normalesup.org Tue Jan 19 16:37:54 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 22:37:54 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <3d375d731001191316n3039bbd1k308a02e48505f9f9@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> <20100119211236.GD2353@phare.normalesup.org> <3d375d731001191316n3039bbd1k308a02e48505f9f9@mail.gmail.com> Message-ID: <20100119213754.GE2353@phare.normalesup.org> On Tue, Jan 19, 2010 at 03:16:59PM -0600, Robert Kern wrote: > On Tue, Jan 19, 2010 at 15:12, Gael Varoquaux > wrote: > > On Tue, Jan 19, 2010 at 02:58:32PM -0600, Robert Kern wrote: > >> > I am not sure that what I am doing is optimal. > >> If confounds is orthonormal, then there is no need to use lstsq(). > >> ? y = y - np.dot(np.dot(confounds, y), confounds) > > Unfortunately, confounds is not orthonormal, and as it is different at > > each call, I cannot orthogonalise it as a preprocessing. > Ah, then you shouldn't have said "Yes" when I asked if they were > orthonormal. :-) > However, you can orthonormalize inside the function and reuse that for > both x and y. Using the QR decomposition is likely cheaper than the > SVD that lstsq() does. > ortho_confounds = linalg.qr(confounds.T)[0].T Indeed! I wasn't aware that lstsq did an SVD. I thought it did the QR. Though calculating the QR once for both vector is anyhow a gain. I got another 20% speed gain in my total run time. Thanks! For the google-completness of this thread, to get a speed gain, one needs to use the 'econ=True' flag to qr. Ga?l From robert.kern at gmail.com Tue Jan 19 16:43:25 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jan 2010 15:43:25 -0600 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> Message-ID: <3d375d731001191343x382f9a3en4a791d04aaef6a67@mail.gmail.com> On Tue, Jan 19, 2010 at 15:29, Charles R Harris wrote: > Column stack the bunch so that the last column is y, then do a qr > decomposition. The last column of q is the (normalized) orthogonal vector > and its amplitude is the last (bottom right) component of r. Is the order actually guaranteed? In a quick test, it seems to work. In any case, I suspect that needing to do both x and y will make doing the QR once and some two pairs of dot products a better proposition than two QR decompositons. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From gael.varoquaux at normalesup.org Tue Jan 19 16:45:34 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 22:45:34 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <3d375d731001191343x382f9a3en4a791d04aaef6a67@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191343x382f9a3en4a791d04aaef6a67@mail.gmail.com> Message-ID: <20100119214534.GF2353@phare.normalesup.org> On Tue, Jan 19, 2010 at 03:43:25PM -0600, Robert Kern wrote: > On Tue, Jan 19, 2010 at 15:29, Charles R Harris > wrote: > > Column stack the bunch so that the last column is y, then do a qr > > decomposition. The last column of q is the (normalized) orthogonal vector > > and its amplitude is the last (bottom right) component of r. > Is the order actually guaranteed? In a quick test, it seems to work. > In any case, I suspect that needing to do both x and y will make doing > the QR once and some two pairs of dot products a better proposition > than two QR decompositons. Yes, that's correct, but my initial question wasn't clear enough, and Chuck was answering my initial question. Thanks a lot Chuck, Ga?l From peridot.faceted at gmail.com Tue Jan 19 17:02:20 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 19 Jan 2010 17:02:20 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <20100119213754.GE2353@phare.normalesup.org> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> <20100119211236.GD2353@phare.normalesup.org> <3d375d731001191316n3039bbd1k308a02e48505f9f9@mail.gmail.com> <20100119213754.GE2353@phare.normalesup.org> Message-ID: 2010/1/19 Gael Varoquaux : > For the google-completness of this thread, to get a speed gain, one needs > to use the 'econ=True' flag to qr. Be warned that in some installations (in particular some using ATLAS), supplying econ=True can cause a segfault under certain conditions (I think only when the arrays are misaligned, e.g. coming from unpickling, and even then only with certain data); the bugfix for this may or may not be complete, and involves copying misaligned arrays. Anne From gael.varoquaux at normalesup.org Tue Jan 19 17:05:01 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 19 Jan 2010 23:05:01 +0100 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <3d375d731001191258m14c75c22o9afa7ec8543d907f@mail.gmail.com> <20100119211236.GD2353@phare.normalesup.org> <3d375d731001191316n3039bbd1k308a02e48505f9f9@mail.gmail.com> <20100119213754.GE2353@phare.normalesup.org> Message-ID: <20100119220501.GG2353@phare.normalesup.org> On Tue, Jan 19, 2010 at 05:02:20PM -0500, Anne Archibald wrote: > Be warned that in some installations (in particular some using ATLAS), > supplying econ=True can cause a segfault under certain conditions (I > think only when the arrays are misaligned, e.g. 
coming from > unpickling, and even then only with certain data); the bugfix for this > may or may not be complete, and involves copying misaligned arrays. Thanks for the warning. That's nasty. I'll test this around in my lab. I don't think that arrays can be misaligned in my case, but I find it always hard to be certain. Ga?l From charlesr.harris at gmail.com Tue Jan 19 18:48:44 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jan 2010 16:48:44 -0700 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> Message-ID: On Tue, Jan 19, 2010 at 2:34 PM, wrote: > On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris > wrote: > > > > > > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux > > wrote: > >> > >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: > >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> > >> > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. > >> > >> > For clarification, are you trying to find the components of the y > >> > vectors that are perpendicular to the space spanned by the 10 > >> > orthonormal vectors in confounds? > >> > >> Yes. Actually, what I am doing is calculating partial correlation > between > >> x and y conditionally to confounds, with the following code: > >> > >> def cond_partial_cor(y, x, confounds=[]): > >> """ Returns the partial correlation of y and x, conditionning on > >> confounds. > >> """ > >> # First orthogonalise y and x relative to confounds > >> if len(confounds): > >> y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) > >> return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) > >> > >> I am not sure that what I am doing is optimal. > >> > >> > > Most of the time is spent in linalg.lstsq. The length of the vectors > >> > > is > >> > > 810, and there are about 10 confounds. > >> > >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape = > (810, > >> > 10)? > >> > >> Sorry, I should have been more precise: > >> > >> y.shape = (810, ) > >> confounds.shape = (10, 810) > >> > > > > Column stack the bunch so that the last column is y, then do a qr > > decomposition. The last column of q is the (normalized) orthogonal vector > > and its amplitude is the last (bottom right) component of r. > > do you have to do qr twice, once with x and once with y in the last > column or can this be combined? > > I was trying to do something similar for partial autocorrelation for > timeseries but didn't manage or try anything better than repeated > leastsq or a variant. > > Depends on what you want to do. The QR decomposition is essentially Gram-Schmidt on the columns. So if you just want an orthonormal basis for the subspace spanned by a bunch of columns, the columns of Q are they. To get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, which is probably what you want if the x's are fixed and the y's vary. If there is just one y, then putting it as the last column lets the QR algorithm do that last bit of projection. 
Note that if you apply the QR algorithm to a Vandermonde matrix with the columns properly ordered you can get a collection of graded orthogonal polynomials over a given set of points. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Tue Jan 19 19:12:38 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 19 Jan 2010 19:12:38 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> Message-ID: 2010/1/19 Charles R Harris : > > Note that if you apply the QR algorithm to a Vandermonde matrix with the > columns properly ordered you can get a collection of graded orthogonal > polynomials over a given set of points. Or, if you want the polynomials in some other representation - by values, or in terms of some basis of orthogonal polynomials - you can construct an appropriate Vandermonde-style matrix and use QR. (When I tried this, switching from the power basis to the Chebyshev basis let me go from tens to hundreds of polynomials, and now Chebyshev polynomials are first-class objects.) Anne From josef.pktd at gmail.com Tue Jan 19 20:08:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jan 2010 20:08:46 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> Message-ID: <1cd32cbb1001191708n29438ce0q62e9b091981f5bd2@mail.gmail.com> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris wrote: > > > On Tue, Jan 19, 2010 at 2:34 PM, wrote: >> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux >> > wrote: >> >> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> >> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls ATLAS. >> >> >> >> > For clarification, are you trying to find the components of the y >> >> > vectors that are perpendicular to the space spanned by the 10 >> >> > orthonormal vectors in confounds? >> >> >> >> Yes. Actually, what I am doing is calculating partial correlation >> >> between >> >> x and y conditionally to confounds, with the following code: >> >> >> >> def cond_partial_cor(y, x, confounds=[]): >> >> ? ?""" Returns the partial correlation of y and x, conditionning on >> >> ? ? ? ?confounds. >> >> ? ?""" >> >> ? ?# First orthogonalise y and x relative to confounds >> >> ? ?if len(confounds): >> >> ? ? ? ?y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> >> ? ? ? ?x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) >> >> ? ?return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) >> >> >> >> I am not sure that what I am doing is optimal. >> >> >> >> > > Most of the time is spent in linalg.lstsq. The length of the >> >> > > vectors >> >> > > is >> >> > > 810, and there are about 10 confounds. >> >> >> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape = >> >> > (810, >> >> > 10)? 
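To make the Vandermonde remark above concrete, a small sketch; the sample points and the degree-5 cut-off are arbitrary choices, not from the thread.

import numpy as np

pts = np.linspace(-1, 1, 50)

# Vandermonde matrix with the columns ordered by degree: 1, x, x**2, ...
V = np.vander(pts, 6)[:, ::-1]

# QR is Gram-Schmidt on the columns, so the columns of Q evaluate a
# graded family of polynomials, orthonormal over these sample points.
Q, R = np.linalg.qr(V)

print(np.allclose(np.dot(Q.T, Q), np.eye(6)))     # True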
>> >> >> >> Sorry, I should have been more precise: >> >> >> >> y.shape = (810, ) >> >> confounds.shape = (10, 810) >> >> >> > >> > Column stack the bunch so that the last column is y, then do a qr >> > decomposition. The last column of q is the (normalized) orthogonal >> > vector >> > and its amplitude is the last (bottom right) component of r. >> >> do you have to do qr twice, once with x and once with y in the last >> column or can this be combined? >> >> I was trying to do something similar for partial autocorrelation for >> timeseries but didn't manage or try anything better than repeated >> leastsq or a variant. >> > > Depends on what you want to do. The QR decomposition is essentially > Gram-Schmidt on the columns. So if you just want an orthonormal basis for > the subspace spanned by a bunch of columns, the columns of Q are they. To > get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, which > is probably what you want if the x's are fixed and the y's vary. If there is > just one y, then putting it as the last column lets the QR algorithm do that > last bit of projection. Gram-Schmidt (looking at it for the first time) looks a lot like sequential least squares projection. So, I'm trying to figure out if I can use the partial results up to a specific column as partial least squares and then work my way to the end by including/looking at more columns. But unfortunately I don't have time to play with it long enough to figure out whether and how it works, but I keep this in mind for the future. Thanks, Josef > > Note that if you apply the QR algorithm to a Vandermonde matrix with the > columns properly ordered you can get a collection of graded orthogonal > polynomials over a given set of points. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Tue Jan 19 21:47:04 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jan 2010 19:47:04 -0700 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <1cd32cbb1001191708n29438ce0q62e9b091981f5bd2@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> <1cd32cbb1001191708n29438ce0q62e9b091981f5bd2@mail.gmail.com> Message-ID: On Tue, Jan 19, 2010 at 6:08 PM, wrote: > On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris > wrote: > > > > > > On Tue, Jan 19, 2010 at 2:34 PM, wrote: > >> > >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux > >> > wrote: > >> >> > >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: > >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> >> > >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls > ATLAS. > >> >> > >> >> > For clarification, are you trying to find the components of the y > >> >> > vectors that are perpendicular to the space spanned by the 10 > >> >> > orthonormal vectors in confounds? > >> >> > >> >> Yes. 
Actually, what I am doing is calculating partial correlation > >> >> between > >> >> x and y conditionally to confounds, with the following code: > >> >> > >> >> def cond_partial_cor(y, x, confounds=[]): > >> >> """ Returns the partial correlation of y and x, conditionning on > >> >> confounds. > >> >> """ > >> >> # First orthogonalise y and x relative to confounds > >> >> if len(confounds): > >> >> y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> >> x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) > >> >> return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) > >> >> > >> >> I am not sure that what I am doing is optimal. > >> >> > >> >> > > Most of the time is spent in linalg.lstsq. The length of the > >> >> > > vectors > >> >> > > is > >> >> > > 810, and there are about 10 confounds. > >> >> > >> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape = > >> >> > (810, > >> >> > 10)? > >> >> > >> >> Sorry, I should have been more precise: > >> >> > >> >> y.shape = (810, ) > >> >> confounds.shape = (10, 810) > >> >> > >> > > >> > Column stack the bunch so that the last column is y, then do a qr > >> > decomposition. The last column of q is the (normalized) orthogonal > >> > vector > >> > and its amplitude is the last (bottom right) component of r. > >> > >> do you have to do qr twice, once with x and once with y in the last > >> column or can this be combined? > >> > >> I was trying to do something similar for partial autocorrelation for > >> timeseries but didn't manage or try anything better than repeated > >> leastsq or a variant. > >> > > > > Depends on what you want to do. The QR decomposition is essentially > > Gram-Schmidt on the columns. So if you just want an orthonormal basis for > > the subspace spanned by a bunch of columns, the columns of Q are they. To > > get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, > which > > is probably what you want if the x's are fixed and the y's vary. If there > is > > just one y, then putting it as the last column lets the QR algorithm do > that > > last bit of projection. > > Gram-Schmidt (looking at it for the first time) looks a lot like > sequential least squares projection. So, I'm trying to figure out if I > can use the partial results up to a specific column as partial least > squares and then work my way to the end by including/looking at more > columns. > > I don't the QR factorization would work for normal PLS. IIRC, one of the algorithms does a svd of the cross correlation matrix. The difference is that in some sense the svd picks out the best linear combination of columns, while the qr factorization without column pivoting just takes them in order. The QR factorization used to be the method of choice for least squares because it is straight forward to compute, no iterating needed as in svd, but these days that advantage is pretty much gone. It is still a common first step in the svd, however. The matrix is factored to Q*R, then the svd of R is computed. > But unfortunately I don't have time to play with it long enough to > figure out whether and how it works, but I keep this in mind for the > future. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
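As a side note to that last remark (QR as a first step of the SVD), a small sketch of how the two factorizations combine; the matrix here is random and purely illustrative:

import numpy as np

rng = np.random.RandomState(1)
a = rng.randn(810, 10)

q, r = np.linalg.qr(a)            # q is (810, 10), r is (10, 10)
u_r, s, vt = np.linalg.svd(r)     # svd of the small triangular factor
u = np.dot(q, u_r)                # u, s, vt is now an svd of a itself

assert np.allclose(np.dot(u * s, vt), a)
assert np.allclose(s, np.linalg.svd(a, compute_uv=False))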
URL: From josef.pktd at gmail.com Tue Jan 19 22:02:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 19 Jan 2010 22:02:37 -0500 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> <1cd32cbb1001191708n29438ce0q62e9b091981f5bd2@mail.gmail.com> Message-ID: <1cd32cbb1001191902q5ccd01ahb6020aa959a3fddf@mail.gmail.com> On Tue, Jan 19, 2010 at 9:47 PM, Charles R Harris wrote: > > > On Tue, Jan 19, 2010 at 6:08 PM, wrote: >> >> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Jan 19, 2010 at 2:34 PM, wrote: >> >> >> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux >> >> > wrote: >> >> >> >> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: >> >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> >> >> >> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls >> >> >> > > ATLAS. >> >> >> >> >> >> > For clarification, are you trying to find the components of the y >> >> >> > vectors that are perpendicular to the space spanned by the 10 >> >> >> > orthonormal vectors in confounds? >> >> >> >> >> >> Yes. Actually, what I am doing is calculating partial correlation >> >> >> between >> >> >> x and y conditionally to confounds, with the following code: >> >> >> >> >> >> def cond_partial_cor(y, x, confounds=[]): >> >> >> ? ?""" Returns the partial correlation of y and x, conditionning on >> >> >> ? ? ? ?confounds. >> >> >> ? ?""" >> >> >> ? ?# First orthogonalise y and x relative to confounds >> >> >> ? ?if len(confounds): >> >> >> ? ? ? ?y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> >> >> ? ? ? ?x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0]) >> >> >> ? ?return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) >> >> >> >> >> >> I am not sure that what I am doing is optimal. >> >> >> >> >> >> > > Most of the time is spent in linalg.lstsq. The length of the >> >> >> > > vectors >> >> >> > > is >> >> >> > > 810, and there are about 10 confounds. >> >> >> >> >> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape = >> >> >> > (810, >> >> >> > 10)? >> >> >> >> >> >> Sorry, I should have been more precise: >> >> >> >> >> >> y.shape = (810, ) >> >> >> confounds.shape = (10, 810) >> >> >> >> >> > >> >> > Column stack the bunch so that the last column is y, then do a qr >> >> > decomposition. The last column of q is the (normalized) orthogonal >> >> > vector >> >> > and its amplitude is the last (bottom right) component of r. >> >> >> >> do you have to do qr twice, once with x and once with y in the last >> >> column or can this be combined? >> >> >> >> I was trying to do something similar for partial autocorrelation for >> >> timeseries but didn't manage or try anything better than repeated >> >> leastsq or a variant. >> >> >> > >> > Depends on what you want to do. The QR decomposition is essentially >> > Gram-Schmidt on the columns. So if you just want an orthonormal basis >> > for >> > the subspace spanned by a bunch of columns, the columns of Q are they. >> > To >> > get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, >> > which >> > is probably what you want if the x's are fixed and the y's vary. 
If >> > there is >> > just one y, then putting it as the last column lets the QR algorithm do >> > that >> > last bit of projection. >> >> Gram-Schmidt (looking at it for the first time) looks a lot like >> sequential least squares projection. So, I'm trying to figure out if I >> can use the partial results up to a specific column as partial least >> squares and then work my way to the end by including/looking at more >> columns. >> > > I don't the QR factorization would work for normal PLS. IIRC, one of the > algorithms does a svd of the cross correlation matrix. The difference is > that in some sense the svd picks out the best linear combination of columns, > while the qr factorization without column pivoting just takes them in order. > The QR factorization used to be the method of choice for least squares > because it is straight forward to compute, no iterating needed as in svd, > but these days that advantage is pretty much gone. It is still a common > first step in the svd, however. The matrix is factored to Q*R, then the svd > of R is computed. I (finally) figured out svd and eigenvalue decomposition for this purpose. But from your description of QR, I thought specifically of the case where we have a "natural" ordering of the regressors, similar to the polynomial case of you and Anne. In the timeseries case it would be by increasing lags yt on y_{t-1} yt on y_{t-1}, y_{t-2} ... ... yt on y_{t-k} for k= 1,...,K or yt on xt and the lags of xt This is really sequential LS with a predefined sequence, not PLS or PCA/PCR or similar orthogonalization by "importance". The usual procedure for deciding on the appropriate number of lags usually loops over OLS with increasing number of regressors. >From the discussion, I thought there might be a way to "cheat" in this using QR and Gram-Schmidt Thanks, Josef > >> >> But unfortunately I don't have time to play with it long enough to >> figure out whether and how it works, but I keep this in mind for the >> future. 
>> > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Tue Jan 19 22:13:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jan 2010 20:13:37 -0700 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: <1cd32cbb1001191902q5ccd01ahb6020aa959a3fddf@mail.gmail.com> References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> <1cd32cbb1001191708n29438ce0q62e9b091981f5bd2@mail.gmail.com> <1cd32cbb1001191902q5ccd01ahb6020aa959a3fddf@mail.gmail.com> Message-ID: On Tue, Jan 19, 2010 at 8:02 PM, wrote: > On Tue, Jan 19, 2010 at 9:47 PM, Charles R Harris > wrote: > > > > > > On Tue, Jan 19, 2010 at 6:08 PM, wrote: > >> > >> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Tue, Jan 19, 2010 at 2:34 PM, wrote: > >> >> > >> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > > >> >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux > >> >> > wrote: > >> >> >> > >> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: > >> >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) > >> >> >> > >> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls > >> >> >> > > ATLAS. > >> >> >> > >> >> >> > For clarification, are you trying to find the components of the > y > >> >> >> > vectors that are perpendicular to the space spanned by the 10 > >> >> >> > orthonormal vectors in confounds? > >> >> >> > >> >> >> Yes. Actually, what I am doing is calculating partial correlation > >> >> >> between > >> >> >> x and y conditionally to confounds, with the following code: > >> >> >> > >> >> >> def cond_partial_cor(y, x, confounds=[]): > >> >> >> """ Returns the partial correlation of y and x, conditionning > on > >> >> >> confounds. > >> >> >> """ > >> >> >> # First orthogonalise y and x relative to confounds > >> >> >> if len(confounds): > >> >> >> y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, > y)[0]) > >> >> >> x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, > x)[0]) > >> >> >> return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) > >> >> >> > >> >> >> I am not sure that what I am doing is optimal. > >> >> >> > >> >> >> > > Most of the time is spent in linalg.lstsq. The length of the > >> >> >> > > vectors > >> >> >> > > is > >> >> >> > > 810, and there are about 10 confounds. > >> >> >> > >> >> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape > = > >> >> >> > (810, > >> >> >> > 10)? > >> >> >> > >> >> >> Sorry, I should have been more precise: > >> >> >> > >> >> >> y.shape = (810, ) > >> >> >> confounds.shape = (10, 810) > >> >> >> > >> >> > > >> >> > Column stack the bunch so that the last column is y, then do a qr > >> >> > decomposition. The last column of q is the (normalized) orthogonal > >> >> > vector > >> >> > and its amplitude is the last (bottom right) component of r. > >> >> > >> >> do you have to do qr twice, once with x and once with y in the last > >> >> column or can this be combined? > >> >> > >> >> I was trying to do something similar for partial autocorrelation for > >> >> timeseries but didn't manage or try anything better than repeated > >> >> leastsq or a variant. 
> >> >> > >> > > >> > Depends on what you want to do. The QR decomposition is essentially > >> > Gram-Schmidt on the columns. So if you just want an orthonormal basis > >> > for > >> > the subspace spanned by a bunch of columns, the columns of Q are they. > >> > To > >> > get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, > >> > which > >> > is probably what you want if the x's are fixed and the y's vary. If > >> > there is > >> > just one y, then putting it as the last column lets the QR algorithm > do > >> > that > >> > last bit of projection. > >> > >> Gram-Schmidt (looking at it for the first time) looks a lot like > >> sequential least squares projection. So, I'm trying to figure out if I > >> can use the partial results up to a specific column as partial least > >> squares and then work my way to the end by including/looking at more > >> columns. > >> > > > > I don't the QR factorization would work for normal PLS. IIRC, one of the > > algorithms does a svd of the cross correlation matrix. The difference is > > that in some sense the svd picks out the best linear combination of > columns, > > while the qr factorization without column pivoting just takes them in > order. > > The QR factorization used to be the method of choice for least squares > > because it is straight forward to compute, no iterating needed as in svd, > > but these days that advantage is pretty much gone. It is still a common > > first step in the svd, however. The matrix is factored to Q*R, then the > svd > > of R is computed. > > I (finally) figured out svd and eigenvalue decomposition for this purpose. > > But from your description of QR, I thought specifically of the case > where we have a "natural" ordering of the regressors, similar to the > polynomial case of you and Anne. In the timeseries case it would be by > increasing lags > > yt on y_{t-1} > yt on y_{t-1}, y_{t-2} > ... > ... > yt on y_{t-k} for k= 1,...,K > > or yt on xt and the lags of xt > > This is really sequential LS with a predefined sequence, not PLS or > PCA/PCR or similar orthogonalization by "importance". > The usual procedure for deciding on the appropriate number of lags > usually loops over OLS with increasing number of regressors. > >From the discussion, I thought there might be a way to "cheat" in this > using QR and Gram-Schmidt > > Ah, then I think your idea would work. The norms of the residuals at each step would be along the diagonal of the R matrix. They won't necessarily decrease monotonically, however. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
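To make the lag case concrete, a rough sketch of the check described above, with a made-up series; the only point is that |R[-1, -1]| equals the residual norm of the least-squares fit when the current value is stacked as the last column:

import numpy as np

rng = np.random.RandomState(2)
y = rng.randn(203)
nlags = 3

cur = y[nlags:]                                   # y_t
lags = np.column_stack([y[nlags - k:len(y) - k]   # y_{t-1}, ..., y_{t-nlags}
                        for k in range(1, nlags + 1)])

a = np.column_stack([lags, cur])                  # current value last
q, r = np.linalg.qr(a)

# residual of the least-squares fit of cur on the lags
resid = cur - np.dot(lags, np.linalg.lstsq(lags, cur)[0])

# the bottom-right entry of R is (up to sign) the residual norm
assert np.allclose(abs(r[-1, -1]), np.sqrt(np.sum(resid ** 2)))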
URL: From charlesr.harris at gmail.com Tue Jan 19 22:18:46 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jan 2010 20:18:46 -0700 Subject: [Numpy-discussion] Efficient orthogonalisation with scipy/numpy In-Reply-To: References: <20100119201253.GA2353@phare.normalesup.org> <3d375d731001191222g3659c2d5yc35f655fbaa31ceb@mail.gmail.com> <20100119204708.GB2353@phare.normalesup.org> <1cd32cbb1001191334o438f0506hf75e3345f3596d7b@mail.gmail.com> <1cd32cbb1001191708n29438ce0q62e9b091981f5bd2@mail.gmail.com> <1cd32cbb1001191902q5ccd01ahb6020aa959a3fddf@mail.gmail.com> Message-ID: On Tue, Jan 19, 2010 at 8:13 PM, Charles R Harris wrote: > > > On Tue, Jan 19, 2010 at 8:02 PM, wrote: > >> On Tue, Jan 19, 2010 at 9:47 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Jan 19, 2010 at 6:08 PM, wrote: >> >> >> >> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Tue, Jan 19, 2010 at 2:34 PM, wrote: >> >> >> >> >> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris >> >> >> wrote: >> >> >> > >> >> >> > >> >> >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux >> >> >> > wrote: >> >> >> >> >> >> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote: >> >> >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0]) >> >> >> >> >> >> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls >> >> >> >> > > ATLAS. >> >> >> >> >> >> >> >> > For clarification, are you trying to find the components of the >> y >> >> >> >> > vectors that are perpendicular to the space spanned by the 10 >> >> >> >> > orthonormal vectors in confounds? >> >> >> >> >> >> >> >> Yes. Actually, what I am doing is calculating partial correlation >> >> >> >> between >> >> >> >> x and y conditionally to confounds, with the following code: >> >> >> >> >> >> >> >> def cond_partial_cor(y, x, confounds=[]): >> >> >> >> """ Returns the partial correlation of y and x, conditionning >> on >> >> >> >> confounds. >> >> >> >> """ >> >> >> >> # First orthogonalise y and x relative to confounds >> >> >> >> if len(confounds): >> >> >> >> y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, >> y)[0]) >> >> >> >> x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, >> x)[0]) >> >> >> >> return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x)) >> >> >> >> >> >> >> >> I am not sure that what I am doing is optimal. >> >> >> >> >> >> >> >> > > Most of the time is spent in linalg.lstsq. The length of the >> >> >> >> > > vectors >> >> >> >> > > is >> >> >> >> > > 810, and there are about 10 confounds. >> >> >> >> >> >> >> >> > Exactly what are the shapes? y.shape = (810, N); >> confounds.shape = >> >> >> >> > (810, >> >> >> >> > 10)? >> >> >> >> >> >> >> >> Sorry, I should have been more precise: >> >> >> >> >> >> >> >> y.shape = (810, ) >> >> >> >> confounds.shape = (10, 810) >> >> >> >> >> >> >> > >> >> >> > Column stack the bunch so that the last column is y, then do a qr >> >> >> > decomposition. The last column of q is the (normalized) orthogonal >> >> >> > vector >> >> >> > and its amplitude is the last (bottom right) component of r. >> >> >> >> >> >> do you have to do qr twice, once with x and once with y in the last >> >> >> column or can this be combined? >> >> >> >> >> >> I was trying to do something similar for partial autocorrelation for >> >> >> timeseries but didn't manage or try anything better than repeated >> >> >> leastsq or a variant. >> >> >> >> >> > >> >> > Depends on what you want to do. 
The QR decomposition is essentially >> >> > Gram-Schmidt on the columns. So if you just want an orthonormal basis >> >> > for >> >> > the subspace spanned by a bunch of columns, the columns of Q are >> they. >> >> > To >> >> > get the part of y orthogonal to that subspace you can do y - Q*Q.T*y, >> >> > which >> >> > is probably what you want if the x's are fixed and the y's vary. If >> >> > there is >> >> > just one y, then putting it as the last column lets the QR algorithm >> do >> >> > that >> >> > last bit of projection. >> >> >> >> Gram-Schmidt (looking at it for the first time) looks a lot like >> >> sequential least squares projection. So, I'm trying to figure out if I >> >> can use the partial results up to a specific column as partial least >> >> squares and then work my way to the end by including/looking at more >> >> columns. >> >> >> > >> > I don't the QR factorization would work for normal PLS. IIRC, one of the >> > algorithms does a svd of the cross correlation matrix. The difference is >> > that in some sense the svd picks out the best linear combination of >> columns, >> > while the qr factorization without column pivoting just takes them in >> order. >> > The QR factorization used to be the method of choice for least squares >> > because it is straight forward to compute, no iterating needed as in >> svd, >> > but these days that advantage is pretty much gone. It is still a common >> > first step in the svd, however. The matrix is factored to Q*R, then the >> svd >> > of R is computed. >> >> I (finally) figured out svd and eigenvalue decomposition for this purpose. >> >> But from your description of QR, I thought specifically of the case >> where we have a "natural" ordering of the regressors, similar to the >> polynomial case of you and Anne. In the timeseries case it would be by >> increasing lags >> >> yt on y_{t-1} >> yt on y_{t-1}, y_{t-2} >> ... >> ... >> yt on y_{t-k} for k= 1,...,K >> >> or yt on xt and the lags of xt >> >> This is really sequential LS with a predefined sequence, not PLS or >> PCA/PCR or similar orthogonalization by "importance". >> The usual procedure for deciding on the appropriate number of lags >> usually loops over OLS with increasing number of regressors. >> >From the discussion, I thought there might be a way to "cheat" in this >> using QR and Gram-Schmidt >> >> > Ah, then I think your idea would work. The norms of the residuals at each > step would be along the diagonal of the R matrix. They won't necessarily > decrease monotonically, however. > > Or if you are fitting a quantity with the lags, then Q.T*y gives the component of each orthogonalized lag. The running sum of the squares of the components should approach the variance of y, so I expect the ratio would give the percentage of the variance accounted for by the lags up to that point. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From washakie at gmail.com Wed Jan 20 09:01:21 2010 From: washakie at gmail.com (John [H2O]) Date: Wed, 20 Jan 2010 06:01:21 -0800 (PST) Subject: [Numpy-discussion] dates, np.where finding months Message-ID: <27242195.post@talk.nabble.com> I have an array with the leading column a series of datetime objects. It covers several years. What is the most efficient way to pull out all the 'January' dates? Right now I do this: A = array with column 0 datetime objects January = [i for i in A if i[0].month ==1 ] It works, but I would rather use np.where and get indices and not have to convert my list back into an array. 
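For concreteness, the np.where version being asked about might look like this, assuming the first column holds datetime.date objects:

import datetime
import numpy as np

A = np.array([[datetime.date(2010, 1, 1), 1],
              [datetime.date(2010, 1, 2), 2],
              [datetime.date(2010, 2, 1), 3],
              [datetime.date(2010, 1, 5), 4]], dtype=object)

months = np.array([d.month for d in A[:, 0]])
idx, = np.where(months == 1)    # indices of the January rows
january = A[idx]                # still a 2-d array, no list round-trip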
-- View this message in context: http://old.nabble.com/dates%2C-np.where-finding-months-tp27242195p27242195.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From kwgoodman at gmail.com Wed Jan 20 10:11:30 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 20 Jan 2010 07:11:30 -0800 Subject: [Numpy-discussion] dates, np.where finding months In-Reply-To: <27242195.post@talk.nabble.com> References: <27242195.post@talk.nabble.com> Message-ID: On Wed, Jan 20, 2010 at 6:01 AM, John [H2O] wrote: > > I have an array with the leading column a series of datetime objects. It > covers several years. What is the most efficient way to pull out all the > 'January' dates? > > Right now I do this: > > A = array with column 0 datetime objects > > January = [i for i in A if i[0].month ==1 ] > > It works, but I would rather use np.where and get indices and not have to > convert my list back into an array. Instead of doing this: >> A array([[2010-01-01, 1], [2010-01-02, 2], [2010-02-01, 3], [2010-01-05, 4]], dtype=object) >> [i for i in A if i[0].month==1] [array([2010-01-01, 1], dtype=object), array([2010-01-02, 2], dtype=object), array([2010-01-05, 4], dtype=object)] You could do this: >> [i for i, date in enumerate(A[:,0]) if date.month==1] [0, 1, 3] From kwgoodman at gmail.com Wed Jan 20 10:21:53 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 20 Jan 2010 07:21:53 -0800 Subject: [Numpy-discussion] dates, np.where finding months In-Reply-To: References: <27242195.post@talk.nabble.com> Message-ID: On Wed, Jan 20, 2010 at 7:11 AM, Keith Goodman wrote: > On Wed, Jan 20, 2010 at 6:01 AM, John [H2O] wrote: >> >> I have an array with the leading column a series of datetime objects. It >> covers several years. What is the most efficient way to pull out all the >> 'January' dates? >> >> Right now I do this: >> >> A = array with column 0 datetime objects >> >> January = [i for i in A if i[0].month ==1 ] >> >> It works, but I would rather use np.where and get indices and not have to >> convert my list back into an array. > > Instead of doing this: > >>> A > > array([[2010-01-01, 1], > ? ? ? [2010-01-02, 2], > ? ? ? [2010-02-01, 3], > ? ? ? [2010-01-05, 4]], dtype=object) > >>> [i for i in A if i[0].month==1] > > [array([2010-01-01, 1], dtype=object), > ?array([2010-01-02, 2], dtype=object), > ?array([2010-01-05, 4], dtype=object)] > > You could do this: > >>> [i for i, date in enumerate(A[:,0]) if date.month==1] > ? [0, 1, 3] Or maybe this is cleaner: >> [date.month==1 for date in A[:,0]] [True, True, False, True] which can be used like this: >> idx = np.array([date.month==1 for date in A[:,0]]) >> A[idx,:] array([[2010-01-01, 1], [2010-01-02, 2], [2010-01-05, 4]], dtype=object) From kwgoodman at gmail.com Wed Jan 20 10:25:52 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 20 Jan 2010 07:25:52 -0800 Subject: [Numpy-discussion] dates, np.where finding months In-Reply-To: References: <27242195.post@talk.nabble.com> Message-ID: On Wed, Jan 20, 2010 at 7:21 AM, Keith Goodman wrote: > On Wed, Jan 20, 2010 at 7:11 AM, Keith Goodman wrote: >> On Wed, Jan 20, 2010 at 6:01 AM, John [H2O] wrote: >>> >>> I have an array with the leading column a series of datetime objects. It >>> covers several years. What is the most efficient way to pull out all the >>> 'January' dates? 
>>> >>> Right now I do this: >>> >>> A = array with column 0 datetime objects >>> >>> January = [i for i in A if i[0].month ==1 ] >>> >>> It works, but I would rather use np.where and get indices and not have to >>> convert my list back into an array. >> >> Instead of doing this: >> >>>> A >> >> array([[2010-01-01, 1], >> ? ? ? [2010-01-02, 2], >> ? ? ? [2010-02-01, 3], >> ? ? ? [2010-01-05, 4]], dtype=object) >> >>>> [i for i in A if i[0].month==1] >> >> [array([2010-01-01, 1], dtype=object), >> ?array([2010-01-02, 2], dtype=object), >> ?array([2010-01-05, 4], dtype=object)] >> >> You could do this: >> >>>> [i for i, date in enumerate(A[:,0]) if date.month==1] >> ? [0, 1, 3] > > Or maybe this is cleaner: > >>> [date.month==1 for date in A[:,0]] > ? [True, True, False, True] > > which can be used like this: > >>> idx = np.array([date.month==1 for date in A[:,0]]) >>> A[idx,:] > > array([[2010-01-01, 1], > ? ? ? [2010-01-02, 2], > ? ? ? [2010-01-05, 4]], dtype=object) Last one (I promise). If you don't need to keep the dates: >> A array([[2010-01-01, 1], [2010-01-02, 2], [2010-02-01, 3], [2010-01-05, 4]], dtype=object) >> A[:,0] = [date.month for date in A[:,0]] >> >> A array([[1, 1], [1, 2], [2, 3], [1, 4]], dtype=object) >> >> A[A[:,0]==1,:] array([[1, 1], [1, 2], [1, 4]], dtype=object) From nwagner at iam.uni-stuttgart.de Wed Jan 20 12:26:06 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 20 Jan 2010 18:26:06 +0100 Subject: [Numpy-discussion] Floating exception Message-ID: Hi all, I found a strange problem when I try to import numpy python -v >>> import numpy ... dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", 2); Floating exception Any idea ? Nils From washakie at gmail.com Wed Jan 20 16:12:25 2010 From: washakie at gmail.com (John [H2O]) Date: Wed, 20 Jan 2010 13:12:25 -0800 (PST) Subject: [Numpy-discussion] dates, np.where finding months In-Reply-To: References: <27242195.post@talk.nabble.com> Message-ID: <27248729.post@talk.nabble.com> Keith Goodman wrote: > > > > Or maybe this is cleaner: > >>> [date.month==1 for date in A[:,0]] > [True, True, False, True] > > which can be used like this: > >>> idx = np.array([date.month==1 for date in A[:,0]]) >>> A[idx,:] > > array([[2010-01-01, 1], > [2010-01-02, 2], > [2010-01-05, 4]], dtype=object) > _______________________________________________ > NumPy-Discussion mailing list > That's the keeper! Thanks for the responses. -- View this message in context: http://old.nabble.com/dates%2C-np.where-finding-months-tp27242195p27248729.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From eadrogue at gmx.net Wed Jan 20 16:56:40 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Wed, 20 Jan 2010 22:56:40 +0100 Subject: [Numpy-discussion] strange divergence in performance Message-ID: <20100120215640.GA10411@doriath.local> Hi, I have a function where an array of integers (1-d) is compared element-wise to an integer using the greater-than operator. I noticed that when the integer is 0 it takes about 75% more time than when it's 1 or 2. Is there an explanation? 
Here is a stripped-down version which does (sort of) show what I say:

def filter_array(array, f1, f2, flag=False):

    if flag:
        k = 1
    else:
        k = 0

    m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0
    m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0

    mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k
    return array[mask]

Now let's create an array with two fields:

a = np.array(zip( np.random.random_integers(0,10,size=5000), np.random.random_integers(0,10,size=5000)), dtype=[('f1',int),('f2',int)])

Now call the function with flag=True and flag=False, and see what happens:

In [29]: %timeit filter_array(a, (6,), (0,), flag=False)
1000 loops, best of 3: 536 us per loop

In [30]: %timeit filter_array(a, (6,), (0,), flag=True)
1000 loops, best of 3: 245 us per loop

In this example the difference seems to be 1:2. In my program it is 1:4. I am at a loss about what causes this. Bye.

From dsdale24 at gmail.com Wed Jan 20 16:57:01 2010
From: dsdale24 at gmail.com (Darren Dale)
Date: Wed, 20 Jan 2010 16:57:01 -0500
Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...}
Message-ID:

I haven't been following development on the trunk closely, so I apologize if this is a known issue. I didn't see anything relevant when I searched the list.

I just updated my checkout of the trunk, cleaned out the old installation and build/, and reinstalled. When I run the test suite (without specifying the verbosity), I get a slew of warnings like:

Warning: invalid value encountered in isinf
Warning: invalid value encountered in isfinite

I checked on both OS X 10.6 and gentoo linux, with similar results. The test suite reports "ok" at the end with 5 known failures and 4 skipped tests.

Darren

From robert.kern at gmail.com Wed Jan 20 17:17:26 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 20 Jan 2010 16:17:26 -0600
Subject: [Numpy-discussion] strange divergence in performance
In-Reply-To: <20100120215640.GA10411@doriath.local>
References: <20100120215640.GA10411@doriath.local>
Message-ID: <3d375d731001201417u4e57172en97f995ff3f475cca@mail.gmail.com>

2010/1/20 Ernest Adrogué :
> Hi,
>
> I have a function where an array of integers (1-d) is compared
> element-wise to an integer using the greater-than operator.
> I noticed that when the integer is 0 it takes about 75% more time
> than when it's 1 or 2. Is there an explanation?
>
> Here is a stripped-down version which does (sort of) show what I say:
>
> def filter_array(array, f1, f2, flag=False):
>
>     if flag:
>         k = 1
>     else:
>         k = 0
>
>     m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0
>     m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0
>
>     mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k
>     return array[mask]
>
> Now let's create an array with two fields:
>
> a = np.array(zip( np.random.random_integers(0,10,size=5000), np.random.random_integers(0,10,size=5000)), dtype=[('f1',int),('f2',int)])
>
> Now call the function with flag=True and flag=False, and see what happens:
>
> In [29]: %timeit filter_array(a, (6,), (0,), flag=False)
> 1000 loops, best of 3: 536 us per loop
>
> In [30]: %timeit filter_array(a, (6,), (0,), flag=True)
> 1000 loops, best of 3: 245 us per loop
>
> In this example the difference seems to be 1:2. In my program
> it is 1:4. I am at a loss about what causes this.

It is not the > operator that exhibits the difference.
In [28]: x = np.random.random_integers(0,10,size=5000) In [29]: %timeit m = x > 0 100000 loops, best of 3: 19.1 us per loop In [30]: %timeit m = x > 1 100000 loops, best of 3: 19.3 us per loop The difference is in the array[mask]. There are necessarily fewer True elements in the mask for >1 than >0. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Wed Jan 20 17:23:01 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 20 Jan 2010 17:23:01 -0500 Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} In-Reply-To: References: Message-ID: <1B8F39BB-3FB7-420E-8C46-754E29FECDFC@gmail.com> On Jan 20, 2010, at 4:57 PM, Darren Dale wrote: > I haven't been following development on the trunk closely, so I > apologize if this is a known issue. I didn't see anything relevant > when I searched the list. > > I just updated my checkout of the trunk, cleaned out the old > installation and build/, and reinstalled. When I run the test suite > (without specifying the verbosity), I get a slew of warnings like: > > Warning: invalid value encountered in isinf > Warning: invalid value encountered in isfinite > > I checked on both OS X 10.6 and gentoo linux, with similar results. > The test suite reports "ok" at the end with 5 known failures and 4 > skipped tests. That comes from numpy.ma. On the SVN, we got rid of a line in ma.core that forced the warnings to be silent module-wise. Instead, silencing the warnings is done inside each ma function. The issue shows up when you apply a np function on a masked_array: the warnings pop up because there's nothing to silence them, but everything should run smoothly. Should, because I'm pretty sure there's a catch somewhere. I'll have to go and check an alternative approach using your __array_prepare__. So yes, everything's fine (albeit noisy) From eadrogue at gmx.net Wed Jan 20 17:26:01 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Wed, 20 Jan 2010 23:26:01 +0100 Subject: [Numpy-discussion] strange divergence in performance In-Reply-To: <3d375d731001201417u4e57172en97f995ff3f475cca@mail.gmail.com> References: <20100120215640.GA10411@doriath.local> <3d375d731001201417u4e57172en97f995ff3f475cca@mail.gmail.com> Message-ID: <20100120222601.GA10517@doriath.local> 20/01/10 @ 16:17 (-0600), thus spake Robert Kern: > 2010/1/20 Ernest Adrogu? : > > Hi, > > > > I have a function where an array of integers (1-d) is compared > > element-wise to an integer using the greater-than operator. > > I noticed that when the integer is 0 it takes about 75% more time > > than when it's 1 or 2. Is there an explanation? > > > > Here is a stripped-down version which does (sort of)show what I say: > > > > def filter_array(array, f1, f2, flag=False): > > > > ? ?if flag: > > ? ? ? ?k = 1 > > ? ?else: > > ? ? ? ?k = 0 > > > > ? ?m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0 > > ? ?m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0 > > > > ? ?mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k > > ? 
?return array[mask] > > > > Now let's create an array with two fields: > > > > a = np.array(zip( np.random.random_integers(0,10,size=5000), np.random.random_integers(0,10,size=5000)), dtype=[('f1',int),('f2',int)]) > > > > Now call the function with flag=True and flag=False, and see what happens: > > > > In [29]: %timeit filter_array(a, (6,), (0,), flag=False) > > 1000 loops, best of 3: 536 us per loop > > > > In [30]: %timeit filter_array(a, (6,), (0,), flag=True) > > 1000 loops, best of 3: 245 us per loop > > > > In this example the difference seems to be 1:2. In my program > > is 1:4. I am at a loss about what causes this. > > It is not the > operator that exhibits the difference. > > In [28]: x = np.random.random_integers(0,10,size=5000) > > In [29]: %timeit m = x > 0 > 100000 loops, best of 3: 19.1 us per loop > > In [30]: %timeit m = x > 1 > 100000 loops, best of 3: 19.3 us per loop > > > The difference is in the array[mask]. There are necessarily fewer True > elements in the mask for >1 than >0. Ahh, I see... seems obvious now. Thanks! From david at silveregg.co.jp Wed Jan 20 20:04:56 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 21 Jan 2010 10:04:56 +0900 Subject: [Numpy-discussion] Floating exception In-Reply-To: References: Message-ID: <4B57A838.7020607@silveregg.co.jp> Nils Wagner wrote: > Hi all, > > I found a strange problem when I try to import numpy > > python -v >>>> import numpy > ... > dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", > 2); > Floating exception > > Any idea ? Could you get a traceback (ideally making sure numpy is built with debug symbols - having -g in both CFLAGS and LDFLAGS) ? Having it happening inside the dlopen call is a bit weird, I can't see what could cause it, cheers, David From pav+sp at iki.fi Thu Jan 21 04:23:25 2010 From: pav+sp at iki.fi (Pauli Virtanen) Date: Thu, 21 Jan 2010 09:23:25 +0000 (UTC) Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} References: Message-ID: Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: [clip] > Warning: invalid value encountered in isinf Warning: invalid value > encountered in isfinite [clip] This is because of changed seterr() default values. IMHO, the 'print' default is slightly worse than the previous 'ignore'. Personally, I don't see great value in the "invalid value encountered" reports that are appear every time a nan is generated... -- Pauli Virtanen From cournape at gmail.com Thu Jan 21 05:03:06 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 21 Jan 2010 19:03:06 +0900 Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} In-Reply-To: References: Message-ID: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> On Thu, Jan 21, 2010 at 6:23 PM, Pauli Virtanen wrote: > Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: > [clip] >> Warning: invalid value encountered in isinf Warning: invalid value >> encountered in isfinite > [clip] > > This is because of changed seterr() default values. > > IMHO, the 'print' default is slightly worse than the previous 'ignore'. > Personally, I don't see great value in the "invalid value encountered" > reports that are appear every time a nan is generated... I thought it was agreed that the default would be changed to warnings for 1.5.0 ? 
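For reference, whatever the default ends up being, it can always be overridden with np.seterr; a small sketch:

import numpy as np

old = np.seterr(invalid='ignore', divide='ignore')  # or 'warn', 'print', 'raise'
np.log(np.array([-1.0]))     # nan, silent under 'ignore'
np.array([1.0]) / 0.0        # inf, likewise silent
np.seterr(**old)             # restore the previous settings
# np.errstate(...) does the same thing as a context manager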
cheers, David From pav+sp at iki.fi Thu Jan 21 05:17:17 2010 From: pav+sp at iki.fi (Pauli Virtanen) Date: Thu, 21 Jan 2010 10:17:17 +0000 (UTC) Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} References: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> Message-ID: Thu, 21 Jan 2010 19:03:06 +0900, David Cournapeau wrote: > On Thu, Jan 21, 2010 at 6:23 PM, Pauli Virtanen wrote: >> Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: [clip] >>> Warning: invalid value encountered in isinf Warning: invalid value >>> encountered in isfinite >> [clip] >> >> This is because of changed seterr() default values. >> >> IMHO, the 'print' default is slightly worse than the previous 'ignore'. >> Personally, I don't see great value in the "invalid value encountered" >> reports that are appear every time a nan is generated... > > I thought it was agreed that the default would be changed to warnings > for 1.5.0? I'm not so sure whether that's the best choice either, although it's better than stderr, but maybe I'm in a minority. OTOH, for instance Matlab and Fortran do not warn about division by zero or invalid values. Pauli From nwagner at iam.uni-stuttgart.de Thu Jan 21 06:36:25 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 21 Jan 2010 12:36:25 +0100 Subject: [Numpy-discussion] Floating exception In-Reply-To: <4B57A838.7020607@silveregg.co.jp> References: <4B57A838.7020607@silveregg.co.jp> Message-ID: On Thu, 21 Jan 2010 10:04:56 +0900 David Cournapeau wrote: > Nils Wagner wrote: >> Hi all, >> >> I found a strange problem when I try to import numpy >> >> python -v >>>>> import numpy >> ... >> dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", >> 2); >> Floating exception >> >> Any idea ? > > Could you get a traceback (ideally making sure numpy is >built with debug > symbols - having -g in both CFLAGS and LDFLAGS) ? > > Having it happening inside the dlopen call is a bit >weird, I can't see > what could cause it, > > cheers, > > David Hi David, Thank you for your response. I switched from CentOS 4.2 to CentOS 5.2 Here is the output of gdb python run -v # /data/home/nwagner/local/lib/python2.5/site-packages/site.pyc has bad magic ... What is the meaning of 'bad magic' ? Should I start with a clean /data/home/nwagner/local/lib/python2.5/site-packages ? Cheers, Nils From dagss at student.matnat.uio.no Thu Jan 21 06:57:09 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jan 2010 12:57:09 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading Message-ID: <4B584115.9090500@student.matnat.uio.no> (Apologies if this has been fixed in trunk; I base this on 1.4.0 and no related comments of MKL on the mailing list) I finally got the latest version of MKL working. What appears to have changed is that the MKL shared libraries will themselves dynamically load different other libraries, depending on the detected CPU. This is in some ways great news for me, because it means I can avoid worrying about miscompiles when compiling one single version of NumPy/SciPy to use for our heterogenous cluster. So I'd rather *not* link statically [1]. Anyway, after modifying site.cfg [2], things almost work, but not quite. The problem is that Python by default imports shared libs using RTLD_LOCAL. 
With this patch to NumPy it does: Change in numpy/linalg/linalg.py: from numpy.linalg import lapack_lite to: try: import sys import ctypes _old_rtld = sys.getdlopenflags() sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL) from numpy.linalg import lapack_lite finally: sys.setdlopenflags(_old_rtld) del sys; del ctypes; del _old_rtld Questions: a) Should I submit a patch? b) Negative consequences? Perhaps another Python module can now not load a different BLAS implementation? (That still seems better than not being able to use MKL IMO). c) Should this only be enabled by a flag somewhere? Where? Or can one just do it regardless of BLAS? d) Do I need a "if hasattr" for Windows, or will Windows just ignore it, or does this apply to Windows too? [1] BTW, I could not figure out how to link statically if I wanted -- is "search_static_first = 1" supposed to work? Perhaps MKL will insist on loading some parts dynamically even then *shrug*. Dag Sverre From dagss at student.matnat.uio.no Thu Jan 21 06:59:15 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jan 2010 12:59:15 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: <4B584115.9090500@student.matnat.uio.no> References: <4B584115.9090500@student.matnat.uio.no> Message-ID: <4B584193.1080309@student.matnat.uio.no> Dag Sverre Seljebotn wrote: > (Apologies if this has been fixed in trunk; I base this on 1.4.0 and no > related comments of MKL on the mailing list) > > I finally got the latest version of MKL working. What appears to have > changed is that the MKL shared libraries will themselves dynamically > load different other libraries, depending on the detected CPU. > > This is in some ways great news for me, because it means I can avoid > worrying about miscompiles when compiling one single version of > NumPy/SciPy to use for our heterogenous cluster. So I'd rather *not* > link statically [1]. > > Anyway, after modifying site.cfg [2], things almost work, but not quite. > The problem is that Python by default imports shared libs using > RTLD_LOCAL. With this patch to NumPy it does: > > Change in numpy/linalg/linalg.py: > > from numpy.linalg import lapack_lite > > to: > > try: > import sys > import ctypes > _old_rtld = sys.getdlopenflags() > sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL) > from numpy.linalg import lapack_lite > finally: > sys.setdlopenflags(_old_rtld) > del sys; del ctypes; del _old_rtld > > Questions: > > a) Should I submit a patch? > b) Negative consequences? Perhaps another Python module can now not load > a different BLAS implementation? (That still seems better than not being > able to use MKL IMO). > c) Should this only be enabled by a flag somewhere? Where? Or can one > just do it regardless of BLAS? > d) Do I need a "if hasattr" for Windows, or will Windows just ignore it, > or does this apply to Windows too? > > [1] BTW, I could not figure out how to link statically if I wanted -- is > "search_static_first = 1" supposed to work? Perhaps MKL will insist on > loading some parts dynamically even then *shrug*. > Forgot this: [2] Here's my site.cfg: [mkl] library_dirs=/mn/corcaroli/d1/dagss/intel/mkl/10.2.3.029/lib/em64t include_dirs = /mn/corcaroli/d1/dagss/intel/mkl/10.2.3.029/include lapack_libs = mkl_lapack mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5 Then I need to set LD_LIBRARY_PATH as well prior to running (which I'm quite OK with). 
Dag Sverre From dagss at student.matnat.uio.no Thu Jan 21 07:01:30 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jan 2010 13:01:30 +0100 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> Message-ID: <4B58421A.3020402@student.matnat.uio.no> Kashyap Ashwin wrote: > Matthieu, > I am not sure what exactly you mean. I did pass in "static" to the > link-adviser and this is the new setup.cfg > mkl_libs = mkl_solver_ilp64, mkl_intel_ilp64, mkl_gnu_thread, mkl_core. > > On import, Numpy complains as usual about the mkl_def and mkl_mc. If I > append these libs, then the crashes happen on test() (complains first > about the DGES* functions). > Also, I have made sure that g77 is not installed and only gfortran is > available. > > I also put in the LD_LIBRARY_PATH=/opt/intel/mkl/10.2.2.025/lib/em64t. > > Thanks, > Ashwin > This was an old post, but for Googlability of this thread: I think the major problem here is that one uses "ilp64" rather than "lp64". Even on a 64-bit system, it is common to assume integers are 32-bit for BLAS, and it seems NumPy makes this assumption as well. Just appending mkl_def or mkl_mc seems dangerous as I have a lurking feeling the purpose of those is to load different libraries for different CPUs, determined runtime. (This is not validated by anyone, it is just a guess). There's more problems, but I've detailed those in my recent post on "Proposed fix for MKL and dynamic loading". Dag Sverre > Your message: > Hi, > > You need to use the static libraries, are you sure you currently do? > > Matthieu > > 2009/10/15 Kashyap Ashwin : > >> I followed the advice given by the Intel MKL link adviser >> >> > (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) > >> This is my new site.cfg: >> mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core >> >> I also exported CFLAGS="-fopenmp" and built with the >> > --fcompiler=gnu95. > >> Now I get these errors on import: >> Running unit tests for numpy >> NumPy version 1.3.0 >> NumPy is installed in >> /opt/Personalization/lib/python2.5/site-packages/numpy >> Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 >> (Ubuntu 4.2.4-1ubuntu3)] >> nose version 0.11.0 >> >> *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined >> > symbol: > >> mkl_dft_commit_descriptor_s_c2c_md_omp >> *** libmkl_def.so *** failed with error : libmkl_def.so: undefined >> symbol: mkl_dft_commit_descriptor_s_c2c_md_omp >> MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so >> >> >> Any hints? >> > > >> Thanks, >> Ashwin >> >> >> >> Your message: >> >> On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin >> wrote: >> >>> Hello, >>> I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) >>> >> with >> >>> MKL. 
>>> This is my site.cfg: >>> [mkl] >>> # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ >>> library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t >>> include_dirs = /opt/intel/mkl/10.2.2.025/include >>> lapack_libs = mkl_lapack >>> #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, >>> iomp5, mkl_vml_mc3 >>> mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, >>> mkl_mc3, mkl_def >>> >> The order does not look right - I don't know the exact order (each >> version of the MKL changes the libraries), but you should respect the >> order as given in the MKL manual. >> >> >>> MKL ERROR: Parameter 4 was incorrect on entry to DGESV >>> >> This suggests an error when passing argument to MKL - I believe your >> version of MKL uses the gfortran ABI by default, and hardy uses g77 as >> the default fortran compiler. You should either recompile everything >> with gfortran, or regenerate the MKL interface libraries with g77 (as >> indicated in the manual). >> >> cheers, >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > >> -----Original Message----- >> From: Kashyap Ashwin >> Sent: Thursday, October 15, 2009 11:01 AM >> To: 'numpy-discussion at scipy.org' >> Subject: RE: MKL with 64bit crashes >> >> I followed the advice given by the Intel MKL link adviser >> > (http://software.intel.com/en- > >> us/articles/intel-mkl-link-line-advisor/) >> >> This is my new site.cfg: >> mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core >> >> I also exported CFLAGS="-fopenmp" and built with the >> > --fcompiler=gnu95. Now I get these errors on > >> import: >> Running unit tests for numpy >> NumPy version 1.3.0 >> NumPy is installed in >> > /opt/Personalization/lib/python2.5/site-packages/numpy > >> Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 >> > (Ubuntu 4.2.4-1ubuntu3)] > >> nose version 0.11.0 >> >> *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined >> > symbol: > >> mkl_dft_commit_descriptor_s_c2c_md_omp >> *** libmkl_def.so *** failed with error : libmkl_def.so: undefined >> > symbol: > >> mkl_dft_commit_descriptor_s_c2c_md_omp >> MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so >> >> >> Any hints? >> >> Thanks, >> Ashwin >> >> >> >> Your message: >> >> On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin >> wrote: >> >>> Hello, >>> I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) >>> > with > >>> MKL. >>> This is my site.cfg: >>> [mkl] >>> # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ >>> library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t >>> include_dirs = /opt/intel/mkl/10.2.2.025/include >>> lapack_libs = mkl_lapack >>> #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, >>> iomp5, mkl_vml_mc3 >>> mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, >>> mkl_mc3, mkl_def >>> >> The order does not look right - I don't know the exact order (each >> version of the MKL changes the libraries), but you should respect the >> order as given in the MKL manual. >> >> >>> MKL ERROR: Parameter 4 was incorrect on entry to DGESV >>> >> This suggests an error when passing argument to MKL - I believe your >> version of MKL uses the gfortran ABI by default, and hardy uses g77 as >> the default fortran compiler. You should either recompile everything >> with gfortran, or regenerate the MKL interface libraries with g77 (as >> indicated in the manual). 
>> >> cheers, >> >> David >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthieu.brucher at gmail.com Thu Jan 21 07:17:58 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 21 Jan 2010 13:17:58 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: <4B584115.9090500@student.matnat.uio.no> References: <4B584115.9090500@student.matnat.uio.no> Message-ID: > try: > ? ?import sys > ? ?import ctypes > ? ?_old_rtld = sys.getdlopenflags() > ? ?sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL) > ? ?from numpy.linalg import lapack_lite > finally: > ? ?sys.setdlopenflags(_old_rtld) > ? ?del sys; del ctypes; del _old_rtld This also applies to scipy code that relies on BLAS as well. Lisandra Dalcin gave me a tip that is close to this one some months ago (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). The best official solution is to statically link against the MKL with Python. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From dagss at student.matnat.uio.no Thu Jan 21 07:29:50 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jan 2010 13:29:50 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: References: <4B584115.9090500@student.matnat.uio.no> Message-ID: <4B5848BE.7020104@student.matnat.uio.no> Matthieu Brucher wrote: >> try: >> import sys >> import ctypes >> _old_rtld = sys.getdlopenflags() >> sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL) >> from numpy.linalg import lapack_lite >> finally: >> sys.setdlopenflags(_old_rtld) >> del sys; del ctypes; del _old_rtld >> > > This also applies to scipy code that relies on BLAS as well. Lisandra > Dalcin gave me a tip that is close to this one some months ago > (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). > The best official solution is to statically link against the MKL with > Python. > > IIUC, it should be enough to load the .so-s in GLOBAL mode once. So it is probably enough to ensure NumPy is patched in a way so that SciPy loads NumPy which loads the .so-s in GLOBAL mode, so that a seperate patch for SciPy is not necesarry. (Remains to be tried, I'm moving on to building SciPy now.) As for static linking, do you mean linking MKL into the Python interpreter itself? Or statically linking with NumPy? In the former case....well, even if the above solution is a not-officially-supported hack, I'd prefer that to messing with the Python build as long as it actually works, which it seems to...requiring custom Python builds for MKL support is not something one should do if one could avoid it. (I build my own Python anyway, but I suppose many potential NumPy/MKL users don't.) Dag Sverre From matthieu.brucher at gmail.com Thu Jan 21 07:40:15 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 21 Jan 2010 13:40:15 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: <4B5848BE.7020104@student.matnat.uio.no> References: <4B584115.9090500@student.matnat.uio.no> <4B5848BE.7020104@student.matnat.uio.no> Message-ID: 2010/1/21 Dag Sverre Seljebotn : > Matthieu Brucher wrote: >>> try: >>> ? ?import sys >>> ? ?import ctypes >>> ? ?_old_rtld = sys.getdlopenflags() >>> ? ?sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL) >>> ? 
?from numpy.linalg import lapack_lite >>> finally: >>> ? ?sys.setdlopenflags(_old_rtld) >>> ? ?del sys; del ctypes; del _old_rtld >>> >> >> This also applies to scipy code that relies on BLAS as well. Lisandra >> Dalcin gave me a tip that is close to this one some months ago >> (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). >> The best official solution is to statically link against the MKL with >> Python. >> >> > IIUC, it should be enough to load the .so-s in GLOBAL mode once. So it > is probably enough to ensure NumPy is patched in a way so that SciPy > loads NumPy which loads the .so-s in GLOBAL mode, so that a seperate > patch for SciPy is not necesarry. (Remains to be tried, I'm moving on to > building SciPy now.) Indeed, it should be enough. > As for static linking, do you mean linking MKL into the Python > interpreter itself? Or statically linking with NumPy? statically linking with numpy. This is what was advised to me by Intel. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From dagss at student.matnat.uio.no Thu Jan 21 07:44:39 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jan 2010 13:44:39 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: References: <4B584115.9090500@student.matnat.uio.no> <4B5848BE.7020104@student.matnat.uio.no> Message-ID: <4B584C37.5000400@student.matnat.uio.no> Matthieu Brucher wrote: > 2010/1/21 Dag Sverre Seljebotn : > >> Matthieu Brucher wrote: >> >>>> try: >>>> import sys >>>> import ctypes >>>> _old_rtld = sys.getdlopenflags() >>>> sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL) >>>> from numpy.linalg import lapack_lite >>>> finally: >>>> sys.setdlopenflags(_old_rtld) >>>> del sys; del ctypes; del _old_rtld >>>> >>>> >>> This also applies to scipy code that relies on BLAS as well. Lisandra >>> Dalcin gave me a tip that is close to this one some months ago >>> (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). >>> The best official solution is to statically link against the MKL with >>> Python. >>> >>> >>> >> IIUC, it should be enough to load the .so-s in GLOBAL mode once. So it >> is probably enough to ensure NumPy is patched in a way so that SciPy >> loads NumPy which loads the .so-s in GLOBAL mode, so that a seperate >> patch for SciPy is not necesarry. (Remains to be tried, I'm moving on to >> building SciPy now.) >> > > Indeed, it should be enough. > > >> As for static linking, do you mean linking MKL into the Python >> interpreter itself? Or statically linking with NumPy? >> > > statically linking with numpy. This is what was advised to me by Intel. > Somehow I didn't manage to do that. a) search_static_first does not seem to work for me b) moving the .so's out of the way does manage something, but mkl_lapack only exists in .so form. Moving only that back in still didn't work. In the end I stopped playing, even more as RTLD_GLOBAL seems a superior solution, even if Intel isn't willing to directly support it... 
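For reference, the same dlopen-flags trick can be written without the ctypes dependency by using the platform DLFCN constants where they exist. This is only a sketch: the numeric fallback for RTLD_GLOBAL is an assumed glibc value and should be checked against dlfcn.h on the target system.

import sys

try:
    import DLFCN                        # platform dlopen() constants (Linux)
    RTLD_GLOBAL = DLFCN.RTLD_GLOBAL
except ImportError:
    RTLD_GLOBAL = 0x100                 # assumed glibc value; verify locally

_old_flags = sys.getdlopenflags()
try:
    # Import the extension with RTLD_GLOBAL so the MKL libraries it pulls
    # in can see each other's symbols.
    sys.setdlopenflags(_old_flags | RTLD_GLOBAL)
    from numpy.linalg import lapack_lite
finally:
    sys.setdlopenflags(_old_flags)      # always restore the original flags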
Dag Sverre From cournape at gmail.com Thu Jan 21 09:35:29 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 21 Jan 2010 23:35:29 +0900 Subject: [Numpy-discussion] Floating exception In-Reply-To: References: <4B57A838.7020607@silveregg.co.jp> Message-ID: <5b8d13221001210635h51fa8762s8d24f36d10a77013@mail.gmail.com> On Thu, Jan 21, 2010 at 8:36 PM, Nils Wagner wrote: > On Thu, 21 Jan 2010 10:04:56 +0900 > ?David Cournapeau wrote: >> Nils Wagner wrote: >>> Hi all, >>> >>> I found a strange problem when I try to import numpy >>> >>> python -v >>>>>> import numpy >>> ... >>> dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", >>> 2); >>> Floating exception >>> >>> Any idea ? >> >> Could you get a traceback (ideally making sure numpy is >>built with debug >> symbols - having -g in both CFLAGS and LDFLAGS) ? >> >> Having it happening inside the dlopen call is a bit >>weird, I can't see >> what could cause it, >> >> cheers, >> >> David > > Hi David, > > Thank you for your response. > I switched from CentOS 4.2 to CentOS 5.2 > Here is the output of > > gdb python > run -v > # > /data/home/nwagner/local/lib/python2.5/site-packages/site.pyc > has bad magic > ... > > What is the meaning of 'bad magic' ? It seems that the bad magic is coming from python, which would most likely mean the site.pyc bytecode is not compatible with the run python. This is independent of your problem I think, David From charlesr.harris at gmail.com Thu Jan 21 10:06:24 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jan 2010 08:06:24 -0700 Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} In-Reply-To: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> References: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> Message-ID: On Thu, Jan 21, 2010 at 3:03 AM, David Cournapeau wrote: > On Thu, Jan 21, 2010 at 6:23 PM, Pauli Virtanen > > wrote: > > Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: > > [clip] > >> Warning: invalid value encountered in isinf Warning: invalid value > >> encountered in isfinite > > [clip] > > > > This is because of changed seterr() default values. > > > > IMHO, the 'print' default is slightly worse than the previous 'ignore'. > > Personally, I don't see great value in the "invalid value encountered" > > reports that are appear every time a nan is generated... > > I thought it was agreed that the default would be changed to warnings > for 1.5.0 ? > > It was. Well, it was agreed that the change shouldn't be made in 1.4 anyway. I'm starting to have second thoughts also. Not so much because of the messages emitted by the tests, but because I suspect a lot of users will be shocked, shocked, when their old applications fill the screen with messages. Easily fixed, but still... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Thu Jan 21 11:37:09 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Thu, 21 Jan 2010 11:37:09 -0500 Subject: [Numpy-discussion] Broadcasting and indexing Message-ID: <2A7AEDE8-82DE-4067-8A81-068FBB9E5B37@gmail.com> Hello, I'm trying to understand how array broadcasting can be used for indexing. In the following, I use the term 'row' to refer to the first dimension of a 2D array, and 'column' to the second, just because that's how numpy prints them out. 
If I consider the following example: >>> a = np.random.random((4,5)) >>> b = np.random.random((5,)) >>> a + b array([[ 1.45499556, 0.60633959, 0.48236157, 1.55357393, 1.4339261 ], [ 1.28614593, 1.11265001, 0.63308615, 1.28904227, 1.34070499], [ 1.26988279, 0.84683018, 0.98959466, 0.76388223, 0.79273084], [ 1.27859505, 0.9721984 , 1.02725009, 1.38852061, 1.56065028]]) I understand how this works, because it works as expected as described in http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting So b gets broadcast to shape (1,5), then because the first dimension is 1, the operation is applied to all rows. Now I am trying to apply this to array indexing. So for example, I want to set specific columns, indicated by a boolean array, to zero, but the following fails: >>> c = np.array([1,0,1,0,1], dtype=bool) >>> a[c] = 0 Traceback (most recent call last): File "", line 1, in IndexError: index (4) out of range (0<=index<3) in dimension 0 However, if I try reducing the size of c to 4, then it works, and sets rows, not columns, equal to zero >>> c = np.array([1,0,1,0], dtype=bool) >>> a[c] = 0 >>> a array([[ 0. , 0. , 0. , 0. , 0. ], [ 0.41526315, 0.7425491 , 0.39872546, 0.56141914, 0.69795153], [ 0. , 0. , 0. , 0. , 0. ], [ 0.40771227, 0.60209749, 0.7928894 , 0.66089748, 0.91789682]]) But I would have thought that the indexing array would have been broadcast in the same way as for a sum, i.e. c would be broadcast to have dimensions (1,5) and then would have been able to set certain columns in all rows to zero. Why is it that for indexing, the broadcasting seems to happen in a different way than when performing operations like additions or multiplications? For background info, I'm trying to write a routine which performs a set of operations on an n-d array, where n is not known in advance, with a 1D array, so I can use broadcasting rules for most operations without knowing the dimensionality of the n-d array, but now that I need to perform indexing, and the convention seems to change, this is a real issue. Thanks in advance for any advice, Thomas From nwagner at iam.uni-stuttgart.de Thu Jan 21 11:37:47 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 21 Jan 2010 17:37:47 +0100 Subject: [Numpy-discussion] Floating exception In-Reply-To: <5b8d13221001210635h51fa8762s8d24f36d10a77013@mail.gmail.com> References: <4B57A838.7020607@silveregg.co.jp> <5b8d13221001210635h51fa8762s8d24f36d10a77013@mail.gmail.com> Message-ID: On Thu, 21 Jan 2010 23:35:29 +0900 David Cournapeau wrote: > On Thu, Jan 21, 2010 at 8:36 PM, Nils Wagner > wrote: >> On Thu, 21 Jan 2010 10:04:56 +0900 >> ?David Cournapeau wrote: >>> Nils Wagner wrote: >>>> Hi all, >>>> >>>> I found a strange problem when I try to import numpy >>>> >>>> python -v >>>>>>> import numpy >>>> ... >>>> dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", >>>> 2); >>>> Floating exception >>>> >>>> Any idea ? >>> >>> Could you get a traceback (ideally making sure numpy is >>>built with debug >>> symbols - having -g in both CFLAGS and LDFLAGS) ? >>> >>> Having it happening inside the dlopen call is a bit >>>weird, I can't see >>> what could cause it, >>> >>> cheers, >>> >>> David >> >> Hi David, >> >> Thank you for your response. >> I switched from CentOS 4.2 to CentOS 5.2 >> Here is the output of >> >> gdb python >> run -v >> # >> /data/home/nwagner/local/lib/python2.5/site-packages/site.pyc >> has bad magic >> ... >> >> What is the meaning of 'bad magic' ? 
> > It seems that the bad magic is coming from python, which >would most > likely mean the site.pyc bytecode is not compatible with >the run > python. This is independent of your problem I think, > > David O.k. here is some more information ... # can't create /data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/info.pyc dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", 2); Program received signal SIGFPE, Arithmetic exception. [Switching to Thread 182894183648 (LWP 22301)] 0x000000350e8074d7 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2 (gdb) bt #0 0x000000350e8074d7 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2 #1 0x000000350e80789e in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2 #2 0x000000350e808c70 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2 #3 0x000000350f7f7ac8 in dl_open_worker () from /lib64/tls/libc.so.6 #4 0x000000350e80aab0 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2 #5 0x000000350f7f845a in _dl_open () from /lib64/tls/libc.so.6 #6 0x000000350fc01054 in dlopen_doit () from /lib64/libdl.so.2 Any idea ? Nils From emmanuelle.gouillart at normalesup.org Thu Jan 21 13:03:48 2010 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Thu, 21 Jan 2010 19:03:48 +0100 Subject: [Numpy-discussion] Broadcasting and indexing In-Reply-To: <2A7AEDE8-82DE-4067-8A81-068FBB9E5B37@gmail.com> References: <2A7AEDE8-82DE-4067-8A81-068FBB9E5B37@gmail.com> Message-ID: <20100121180348.GC14058@phare.normalesup.org> Hi Thomas, broadcasting rules are only for ufuncs (and by extension, some numpy functions using ufuncs). Indexing obeys different rules and always starts by the first dimension. However, you don't have to use broadcasting for such indexing operations: >>> a[:, c] = 0 zeroes columns indexed by c. If you want to index along the 3rd dimension, you can use a[:, :, c], etc. If the dimension along which you index is a variable, you can also use the function np.rollaxis that allows to change the order of the dimensions of an array. You may then index along the first dimension (a[c]), then change back the order of the dimensions. Here is an example: >>> a = np.ones((3,4,5,6)) >>> c = np.array([1,0,1,0,1], dtype=bool) >>> tmp_a = np.rollaxis(a, 2, 0) >>> tmp_a.shape (5, 3, 4, 6) >>> tmp_a[c] = 0 >>> a = np.rollaxis(tmp_a, 0, 3) >>> a.shape (3, 4, 5, 6) Hope this helps. Cheers, Emmanuelle On Thu, Jan 21, 2010 at 11:37:09AM -0500, Thomas Robitaille wrote: > Hello, > I'm trying to understand how array broadcasting can be used for indexing. In the following, I use the term 'row' to refer to the first dimension of a 2D array, and 'column' to the second, just because that's how numpy prints them out. > If I consider the following example: > >>> a = np.random.random((4,5)) > >>> b = np.random.random((5,)) > >>> a + b > array([[ 1.45499556, 0.60633959, 0.48236157, 1.55357393, 1.4339261 ], > [ 1.28614593, 1.11265001, 0.63308615, 1.28904227, 1.34070499], > [ 1.26988279, 0.84683018, 0.98959466, 0.76388223, 0.79273084], > [ 1.27859505, 0.9721984 , 1.02725009, 1.38852061, 1.56065028]]) > I understand how this works, because it works as expected as described in > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting > So b gets broadcast to shape (1,5), then because the first dimension is 1, the operation is applied to all rows. > Now I am trying to apply this to array indexing. 
So for example, I want to set specific columns, indicated by a boolean array, to zero, but the following fails: > >>> c = np.array([1,0,1,0,1], dtype=bool) > >>> a[c] = 0 > Traceback (most recent call last): > File "", line 1, in > IndexError: index (4) out of range (0<=index<3) in dimension 0 > However, if I try reducing the size of c to 4, then it works, and sets rows, not columns, equal to zero > >>> c = np.array([1,0,1,0], dtype=bool) > >>> a[c] = 0 > >>> a > array([[ 0. , 0. , 0. , 0. , 0. ], > [ 0.41526315, 0.7425491 , 0.39872546, 0.56141914, 0.69795153], > [ 0. , 0. , 0. , 0. , 0. ], > [ 0.40771227, 0.60209749, 0.7928894 , 0.66089748, 0.91789682]]) > But I would have thought that the indexing array would have been broadcast in the same way as for a sum, i.e. c would be broadcast to have dimensions (1,5) and then would have been able to set certain columns in all rows to zero. > Why is it that for indexing, the broadcasting seems to happen in a different way than when performing operations like additions or multiplications? For background info, I'm trying to write a routine which performs a set of operations on an n-d array, where n is not known in advance, with a 1D array, so I can use broadcasting rules for most operations without knowing the dimensionality of the n-d array, but now that I need to perform indexing, and the convention seems to change, this is a real issue. > Thanks in advance for any advice, > Thomas > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Thu Jan 21 17:14:28 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 22 Jan 2010 07:14:28 +0900 Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} In-Reply-To: References: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> Message-ID: <5b8d13221001211414g7e2a9756w2fdd6837be2e4dd6@mail.gmail.com> On Fri, Jan 22, 2010 at 12:06 AM, Charles R Harris wrote: > > > On Thu, Jan 21, 2010 at 3:03 AM, David Cournapeau > wrote: >> >> On Thu, Jan 21, 2010 at 6:23 PM, Pauli Virtanen wrote: >> > Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: >> > [clip] >> >> Warning: invalid value encountered in isinf Warning: invalid value >> >> encountered in isfinite >> > [clip] >> > >> > This is because of changed seterr() default values. >> > >> > IMHO, the 'print' default is slightly worse than the previous 'ignore'. >> > Personally, I don't see great value in the "invalid value encountered" >> > reports that are appear every time a nan is generated... >> >> I thought it was agreed that the default would be changed to warnings >> for 1.5.0 ? >> > > It was. Well, it was agreed that the change shouldn't be made in 1.4 anyway. > I'm starting to have second thoughts also. Not so much because of the > messages emitted by the tests, but because I suspect a lot of users will be > shocked, shocked, when their old applications fill the screen with messages. Hence changing to warning: it would appear only once per location, and can be filtered. And filling was what happened before it was changed by accident so that long ago anyway. 
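To make the proposal concrete, here is a rough sketch of what a 'warn' default would mean for users, and how the old silence could be recovered with the standard warnings machinery (illustrative only, not an agreed-upon change):

import warnings
import numpy as np

np.seterr(invalid='warn', divide='warn')   # the behaviour being discussed
np.log(np.zeros(3))    # emits a RuntimeWarning, shown once per location
                       # by the default warnings filter

# Anyone who prefers the old silence can opt out in one line:
warnings.simplefilter('ignore', RuntimeWarning)
np.log(np.zeros(3))    # now silent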
David From cournape at gmail.com Thu Jan 21 17:27:49 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 22 Jan 2010 07:27:49 +0900 Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} In-Reply-To: <5b8d13221001211414g7e2a9756w2fdd6837be2e4dd6@mail.gmail.com> References: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> <5b8d13221001211414g7e2a9756w2fdd6837be2e4dd6@mail.gmail.com> Message-ID: <5b8d13221001211427h182007b3m259f5b894a737164@mail.gmail.com> On Fri, Jan 22, 2010 at 7:14 AM, David Cournapeau wrote: > On Fri, Jan 22, 2010 at 12:06 AM, Charles R Harris > wrote: >> >> >> On Thu, Jan 21, 2010 at 3:03 AM, David Cournapeau >> wrote: >>> >>> On Thu, Jan 21, 2010 at 6:23 PM, Pauli Virtanen wrote: >>> > Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: >>> > [clip] >>> >> Warning: invalid value encountered in isinf Warning: invalid value >>> >> encountered in isfinite >>> > [clip] >>> > >>> > This is because of changed seterr() default values. >>> > >>> > IMHO, the 'print' default is slightly worse than the previous 'ignore'. >>> > Personally, I don't see great value in the "invalid value encountered" >>> > reports that are appear every time a nan is generated... >>> >>> I thought it was agreed that the default would be changed to warnings >>> for 1.5.0 ? >>> >> >> It was. Well, it was agreed that the change shouldn't be made in 1.4 anyway. >> I'm starting to have second thoughts also. Not so much because of the >> messages emitted by the tests, but because I suspect a lot of users will be >> shocked, shocked, when their old applications fill the screen with messages. > > Hence changing to warning: it would appear only once per location, and > can be filtered. And filling was what happened before it was changed > by accident so that long ago anyway. sorry, I mean it was the default not that long ago before it was changed by accident, David From pgmdevlist at gmail.com Thu Jan 21 18:07:35 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 21 Jan 2010 18:07:35 -0500 Subject: [Numpy-discussion] numpy.test(): invalid value encountered in {isinf, divide, power, ...} In-Reply-To: <5b8d13221001211427h182007b3m259f5b894a737164@mail.gmail.com> References: <5b8d13221001210203h69a3a436id6e24a03117a538f@mail.gmail.com> <5b8d13221001211414g7e2a9756w2fdd6837be2e4dd6@mail.gmail.com> <5b8d13221001211427h182007b3m259f5b894a737164@mail.gmail.com> Message-ID: <9ACA278C-46C6-4F29-91AF-EF2DCC0A804C@gmail.com> On Jan 21, 2010, at 5:27 PM, David Cournapeau wrote: > On Fri, Jan 22, 2010 at 7:14 AM, David Cournapeau wrote: >> On Fri, Jan 22, 2010 at 12:06 AM, Charles R Harris >> wrote: >>> >>> >>> On Thu, Jan 21, 2010 at 3:03 AM, David Cournapeau >>> wrote: >>>> >>>> On Thu, Jan 21, 2010 at 6:23 PM, Pauli Virtanen wrote: >>>>> Wed, 20 Jan 2010 16:57:01 -0500, Darren Dale wrote: >>>>> [clip] >>>>>> Warning: invalid value encountered in isinf Warning: invalid value >>>>>> encountered in isfinite >>>>> [clip] >>>>> >>>>> This is because of changed seterr() default values. >>>>> >>>>> IMHO, the 'print' default is slightly worse than the previous 'ignore'. >>>>> Personally, I don't see great value in the "invalid value encountered" >>>>> reports that are appear every time a nan is generated... >>>> >>>> I thought it was agreed that the default would be changed to warnings >>>> for 1.5.0 ? >>>> >>> >>> It was. Well, it was agreed that the change shouldn't be made in 1.4 anyway. >>> I'm starting to have second thoughts also. 
Not so much because of the >>> messages emitted by the tests, but because I suspect a lot of users will be >>> shocked, shocked, when their old applications fill the screen with messages. >> >> Hence changing to warning: it would appear only once per location, and >> can be filtered. And filling was what happened before it was changed >> by accident so that long ago anyway. > > sorry, I mean it was the default not that long ago before it was > changed by accident, At the same time, having a seterr module-wise in numpy.ma wasn't the best idea either. So, we could put the seterr(invalid:ignore) back in numpy.ma.core, that should take care of the warning when using np functions on masked arrays. Nevertheless, I'll still keep the current mechanism in the ma functions (viz, catch the settings, change them locally before an operation, reset them afterwards), that's cleaner. Unless you guys come with a better idea. From dwf at cs.toronto.edu Thu Jan 21 19:13:17 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 21 Jan 2010 19:13:17 -0500 Subject: [Numpy-discussion] generalized ufunc problem Message-ID: <8A5E87C4-5ACC-47B1-9628-97F9FAB12559@cs.toronto.edu> I decided to take a crack at adding a generalized ufunc for logsumexp, i.e. collapsed an array along the last dimension by subtracting the maximum element E along that dimension, taking the exponential, adding, and then adding back E. Functionally the same logaddexp.reduce() but presumably faster and less prone to error accumulation. I added the following to umath_tests.c.src and got everything to compile, but for some reason it doesn't give me the behaviour I'm looking for. I was expecting a (500, 50) array to be collapsed down to a (500,) array. Is that not what the signature calls for? Thanks, David char *logsumexp_signature = "(i)->()"; /**begin repeat #TYPE=LONG,DOUBLE# #typ=npy_long, npy_double# #EXPFUN=expl, exp# #LOGFUN=logl, log# */ /* * This implements the function * out[n] = sum_i { in1[n, i] * in2[n, i] }. */ static void @TYPE at _logsumexp(char **args, intp *dimensions, intp *steps, void *NPY_UNUSED(func)) { INIT_OUTER_LOOP_3 intp di = dimensions[0]; intp i; intp is1=steps[0]; BEGIN_OUTER_LOOP_3 char *ip1=args[0], *op=args[1]; @typ@ max = (*(@typ@ *)ip1); @typ@ sum = 0; for (i = 0; i < di; i++) { max = max < (*(@typ@ *)ip1) ? (*(@typ@ *)ip1) : max; ip1 += is1; } ip1 = args[0]; for (i = 0; i < di; i++) { sum += @EXPFUN@((*(@typ@ *)ip1) - max); ip1 += is1; } *(@typ@ *)op = @LOGFUN@(sum + max); END_OUTER_LOOP } /**end repeat**/ static PyUFuncGenericFunction logsumexp_functions[] = { LONG_logsumexp, DOUBLE_logsumexp }; static void * logsumexp_data[] = { (void *)NULL, (void *)NULL }; static char logsumexp_signatures[] = { PyArray_LONG, PyArray_LONG, PyArray_DOUBLE, PyArray_DOUBLE }; /* and inside addUfuncs() */ ... f = PyUFunc_FromFuncAndData(logsumexp_functions, logsumexp_data, logsumexp_signatures, 2, 1, 1, PyUFunc_None, "logsumexp", "inner1d with a weight argument \\n""\" \"\n"" \\ \"(i)->()\\\" \\n""", 0); PyDict_SetItemString(dictionary, "logsumexp", f); Py_DECREF(f); .... 
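For comparison, a rough pure-NumPy reference for the reduction the "(i)->()" signature is meant to express (logsumexp_ref is a made-up name used only for illustration, not part of numpy):

import numpy as np

def logsumexp_ref(a):
    # log(sum(exp(a))) over the last axis, shifted by the max for
    # numerical stability -- the same trick as the C kernel above.
    a = np.asarray(a, dtype=float)
    amax = a.max(axis=-1)
    return amax + np.log(np.exp(a - amax[..., np.newaxis]).sum(axis=-1))

print(logsumexp_ref(np.random.rand(500, 50)).shape)   # expected: (500,)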
From warren.weckesser at enthought.com Thu Jan 21 19:30:59 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 21 Jan 2010 18:30:59 -0600 Subject: [Numpy-discussion] generalized ufunc problem In-Reply-To: <8A5E87C4-5ACC-47B1-9628-97F9FAB12559@cs.toronto.edu> References: <8A5E87C4-5ACC-47B1-9628-97F9FAB12559@cs.toronto.edu> Message-ID: <4B58F1C3.1000609@enthought.com> David, I haven't tried creating a ufunc before, so I can't help you with that, but since you are working on logsumexp, you might be interested in the version I posted here in October: http://mail.scipy.org/pipermail/scipy-user/2009-October/022931.html and the attached tests. Warren David Warde-Farley wrote: > I decided to take a crack at adding a generalized ufunc for logsumexp, > i.e. collapsed an array along the last dimension by subtracting the > maximum element E along that dimension, taking the exponential, > adding, and then adding back E. Functionally the same > logaddexp.reduce() but presumably faster and less prone to error > accumulation. > > I added the following to umath_tests.c.src and got everything to > compile, but for some reason it doesn't give me the behaviour I'm > looking for. I was expecting a (500, 50) array to be collapsed down to > a (500,) array. Is that not what the signature calls for? > > Thanks, > > David > > char *logsumexp_signature = "(i)->()"; > > /**begin repeat > > #TYPE=LONG,DOUBLE# > #typ=npy_long, npy_double# > #EXPFUN=expl, exp# > #LOGFUN=logl, log# > */ > > /* > * This implements the function > * out[n] = sum_i { in1[n, i] * in2[n, i] }. > */ > > static void > @TYPE at _logsumexp(char **args, intp *dimensions, intp *steps, void > *NPY_UNUSED(func)) > { > INIT_OUTER_LOOP_3 > intp di = dimensions[0]; > intp i; > intp is1=steps[0]; > BEGIN_OUTER_LOOP_3 > char *ip1=args[0], *op=args[1]; > @typ@ max = (*(@typ@ *)ip1); > @typ@ sum = 0; > > for (i = 0; i < di; i++) { > max = max < (*(@typ@ *)ip1) ? (*(@typ@ *)ip1) : max; > ip1 += is1; > } > ip1 = args[0]; > for (i = 0; i < di; i++) { > sum += @EXPFUN@((*(@typ@ *)ip1) - max); > ip1 += is1; > } > *(@typ@ *)op = @LOGFUN@(sum + max); > END_OUTER_LOOP > } > > /**end repeat**/ > > > > static PyUFuncGenericFunction logsumexp_functions[] = > { LONG_logsumexp, DOUBLE_logsumexp }; > static void * logsumexp_data[] = { (void *)NULL, (void *)NULL }; > static char logsumexp_signatures[] = { PyArray_LONG, PyArray_LONG, > PyArray_DOUBLE, PyArray_DOUBLE }; > > > /* and inside addUfuncs() */ > ... > > f = PyUFunc_FromFuncAndData(logsumexp_functions, logsumexp_data, > logsumexp_signatures, 2, 1, 1, > PyUFunc_None, "logsumexp", "inner1d > with a weight argument \\n""\" \"\n"" \\ > \"(i)->()\\\" \\n""", 0); PyDict_SetItemString(dictionary, > "logsumexp", f); Py_DECREF(f); > > .... > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: my_logsumexp_test.py URL: From david at silveregg.co.jp Thu Jan 21 20:02:57 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 22 Jan 2010 10:02:57 +0900 Subject: [Numpy-discussion] Floating exception In-Reply-To: References: <4B57A838.7020607@silveregg.co.jp> <5b8d13221001210635h51fa8762s8d24f36d10a77013@mail.gmail.com> Message-ID: <4B58F941.3040108@silveregg.co.jp> Nils Wagner wrote: > # can't create > /data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/info.pyc > dlopen("/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so", > 2); > > Program received signal SIGFPE, Arithmetic exception. > [Switching to Thread 182894183648 (LWP 22301)] > 0x000000350e8074d7 in do_lookup_x () from > /lib64/ld-linux-x86-64.so.2 > (gdb) bt > #0 0x000000350e8074d7 in do_lookup_x () from > /lib64/ld-linux-x86-64.so.2 > #1 0x000000350e80789e in _dl_lookup_symbol_x () from > /lib64/ld-linux-x86-64.so.2 > #2 0x000000350e808c70 in _dl_relocate_object () from > /lib64/ld-linux-x86-64.so.2 > #3 0x000000350f7f7ac8 in dl_open_worker () from > /lib64/tls/libc.so.6 > #4 0x000000350e80aab0 in _dl_catch_error () from > /lib64/ld-linux-x86-64.so.2 > #5 0x000000350f7f845a in _dl_open () from > /lib64/tls/libc.so.6 > #6 0x000000350fc01054 in dlopen_doit () from > /lib64/libdl.so.2 > > Any idea ? Are you using the python provided by your distribution ? The only reasonable reason I can think of is that some modules change the FPU exception handling without rolling it back before importing multiarray, and dlopen expects the FPU state to be in a certain state. It would be good to make sure you are only loading numpy (if you use easy_install/setuptools, it may cause other packages to import "automatically"). Or maybe multiarray causes a FPU exception in dlopen because multiarray is badly built or some other corner cases in do_lookup_x, although this sounds much less likely - one good way would be load numpy with the debug version of glibc to get the exact line failing inside do_lookup_x. I see only one integer division in the do_lookup_x code, but it would be good to confirm it actually happens there. I don't know how it works on Centos to get your program loaded with the debug version of glibc, cheers, David From david at silveregg.co.jp Thu Jan 21 20:09:14 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 22 Jan 2010 10:09:14 +0900 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: <4B584115.9090500@student.matnat.uio.no> References: <4B584115.9090500@student.matnat.uio.no> Message-ID: <4B58FABA.3030207@silveregg.co.jp> Dag Sverre Seljebotn wrote: > Questions: > > a) Should I submit a patch? > b) Negative consequences? Perhaps another Python module can now not load > a different BLAS implementation? (That still seems better than not being > able to use MKL IMO). Besides the problem of ctypes not always being available, I am very wary of those library-specific hacks. Worse, it is version dependent, because it depends on the MKL. > d) Do I need a "if hasattr" for Windows, or will Windows just ignore it, > or does this apply to Windows too? Windows does not have dlopen, and has totally different semantics for dynamic loading. Besides, this is not needed on windows. So it should not be executed at all. > [1] BTW, I could not figure out how to link statically if I wanted -- is > "search_static_first = 1" supposed to work? Perhaps MKL will insist on > loading some parts dynamically even then *shrug*. 
search_static_first is inherently fragile - using the linker to do this is much better (with -WL,-Bshared/-Wl,-Bstatic flags). cheers, David From fperez.net at gmail.com Thu Jan 21 21:07:30 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 21 Jan 2010 18:07:30 -0800 Subject: [Numpy-discussion] numpy broken by r8077? Message-ID: Hi all, This simple-looking change: http://projects.scipy.org/numpy/changeset/8077 is giving me a wholly broken (unimportable) numpy on ubuntu 9.10, 64-bit: uqbar[junk]> python -c 'import numpy' Traceback (most recent call last): File "", line 1, in File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/__init__.py", line 136, in import add_newdocs File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/lib/__init__.py", line 4, in from type_check import * File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/lib/type_check.py", line 8, in import numpy.core.numeric as _nx File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/core/__init__.py", line 5, in import multiarray ImportError: /home/fperez/usr/opt/lib/python2.6/site-packages/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString I reverted to 8076 and it's ok now: uqbar[junk]> python -c 'import numpy;numpy.test()' Running unit tests for numpy NumPy version 1.5.0.dev8076 NumPy is installed in /home/fperez/usr/opt/lib/python2.6/site-packages/numpy Python version 2.6.4 (r264:75706, Dec 7 2009, 18:43:55) [GCC 4.4.1] nose version 0.11.1 .......................................................................... ---------------------------------------------------------------------- Ran 2506 tests in 7.083s OK (KNOWNFAIL=5, SKIP=4) The change is very small and was logged as fixing another bug, but it seems something is amiss now. If it was committed like that I imagine it works on some systems, so is it a problem on my end? I'm using the system python (2.6) and it's an otherwise fairly standard ubuntu box, were I only run ipython, numpy, scipy and matplotlib from dev trees and everything else is stock from the distro. Any help much appreciated, f From david at silveregg.co.jp Thu Jan 21 21:12:52 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 22 Jan 2010 11:12:52 +0900 Subject: [Numpy-discussion] numpy broken by r8077? 
In-Reply-To: References: Message-ID: <4B5909A4.8090905@silveregg.co.jp> Fernando Perez wrote: > Hi all, > > This simple-looking change: > > http://projects.scipy.org/numpy/changeset/8077 > > is giving me a wholly broken (unimportable) numpy on ubuntu 9.10, 64-bit: > > uqbar[junk]> python -c 'import numpy' > Traceback (most recent call last): > File "", line 1, in > File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/__init__.py", > line 136, in > import add_newdocs > File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/add_newdocs.py", > line 9, in > from numpy.lib import add_newdoc > File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/lib/__init__.py", > line 4, in > from type_check import * > File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/lib/type_check.py", > line 8, in > import numpy.core.numeric as _nx > File "/home/fperez/usr/opt/lib/python2.6/site-packages/numpy/core/__init__.py", > line 5, in > import multiarray > ImportError: /home/fperez/usr/opt/lib/python2.6/site-packages/numpy/core/multiarray.so: > undefined symbol: PyUnicodeUCS2_AsASCIIString This is typically caused by incompatible python (e.g. you build with one python using 4-bytes/char unicode, and numpy is built against a python using 2-bytes/char). Anyway, very unlikely to be caused by r8077. I suspect some mixedup in your build - can you build numpy r8077 from scratch, after having removed installed and build directories ? David From fperez.net at gmail.com Thu Jan 21 21:39:29 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 21 Jan 2010 18:39:29 -0800 Subject: [Numpy-discussion] numpy broken by r8077? In-Reply-To: <4B5909A4.8090905@silveregg.co.jp> References: <4B5909A4.8090905@silveregg.co.jp> Message-ID: On Thu, Jan 21, 2010 at 6:12 PM, David Cournapeau wrote: > This is typically caused by incompatible python (e.g. you build with one > python using 4-bytes/char unicode, and numpy is built against a python > using 2-bytes/char). Anyway, very unlikely to be caused by r8077. > > I suspect some mixedup in your build - can you build numpy r8077 from > scratch, after having removed installed and build directories ? > Dead-on, thanks! For the record, in case anyone encounters a similar situation: this box is now running ubuntu 9.10, recently upgraded from 8.10. In 8.10, the system python was 2.5, so I had built python2.6 and put it in /usr/local/, that was 2.6.2. I now updated the machine, and it's python2.6 is 2.6.4. But my build script ended up building numpy against the 2.6.2 (likely a UCS2 build), giving me a numpy build that couldn't be imported by the system python. Subtle, but perfectly reasonable in retrospect. Thanks a lot for your hint, David, it was clear enough to let me understand the problem. Sorry for the noise, f From charlesr.harris at gmail.com Thu Jan 21 21:43:29 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jan 2010 19:43:29 -0700 Subject: [Numpy-discussion] numpy broken by r8077? In-Reply-To: References: Message-ID: On Thu, Jan 21, 2010 at 7:07 PM, Fernando Perez wrote: > Hi all, > > This simple-looking change: > > http://projects.scipy.org/numpy/changeset/8077 > > is giving me a wholly broken (unimportable) numpy on ubuntu 9.10, 64-bit: > > Works fine here on the same distro. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fperez.net at gmail.com Thu Jan 21 21:56:50 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 21 Jan 2010 18:56:50 -0800 Subject: [Numpy-discussion] numpy broken by r8077? In-Reply-To: References: Message-ID: On Thu, Jan 21, 2010 at 6:43 PM, Charles R Harris wrote: > > Works fine here on the same distro. > Yup, as David suggested, I was managing to pick up a narrow build of python 2.6 leftover in /usr/local/ from before upgrading to 9.10, which I had needed back when I was running 8.10 here. Sorry for the noise, f From dagss at student.matnat.uio.no Fri Jan 22 04:34:07 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 22 Jan 2010 10:34:07 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: <4B58FABA.3030207@silveregg.co.jp> References: <4B584115.9090500@student.matnat.uio.no> <4B58FABA.3030207@silveregg.co.jp> Message-ID: <4B59710F.2040507@student.matnat.uio.no> David Cournapeau wrote: > Dag Sverre Seljebotn wrote: > > >> Questions: >> >> a) Should I submit a patch? >> b) Negative consequences? Perhaps another Python module can now not load >> a different BLAS implementation? (That still seems better than not being >> able to use MKL IMO). >> > > Besides the problem of ctypes not always being available, I am very wary > of those library-specific hacks. Worse, it is version dependent, because > it depends on the MKL. > I was thinking that this was perhaps a general problem -- that *if* ATLAS started implementing support for dynamically switchable kernels at load time (which is a feature I certainly wish for), it would suffer the same problems. But I don't really know that. DLFCN can be used instead of ctypes. Which I think is not always available either, but "except ImportError: pass" should be fine in this kind of situation -- if you need the workaround you'd typically have it. The only real issue I can see is if it has a significant impact on import times for non-MKL users. But I won't put up a big fight for this kind patch -- I can work around it for my own purposes. I just though it might be nice to make things easier/more transparent for NumPy/MKL users. >> [1] BTW, I could not figure out how to link statically if I wanted -- is >> "search_static_first = 1" supposed to work? Perhaps MKL will insist on >> loading some parts dynamically even then *shrug*. >> > > search_static_first is inherently fragile - using the linker to do this > is much better (with -WL,-Bshared/-Wl,-Bstatic flags). > Thanks! (I'll do that if I get any problems, but I have 3-4 other libs depending on BLAS as well loaded, so shared is better in principle.) Dag Sverre From matthieu.brucher at gmail.com Fri Jan 22 05:21:17 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 22 Jan 2010 11:21:17 +0100 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: <4B58FABA.3030207@silveregg.co.jp> References: <4B584115.9090500@student.matnat.uio.no> <4B58FABA.3030207@silveregg.co.jp> Message-ID: >> [1] BTW, I could not figure out how to link statically if I wanted -- is >> "search_static_first = 1" supposed to work? Perhaps MKL will insist on >> loading some parts dynamically even then *shrug*. > > search_static_first is inherently fragile - using the linker to do this > is much better (with -WL,-Bshared/-Wl,-Bstatic flags). How do you write the site.cfg accordingly? Matthieu -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From josef.pktd at gmail.com Fri Jan 22 09:51:13 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 Jan 2010 09:51:13 -0500 Subject: [Numpy-discussion] Broadcasting and indexing In-Reply-To: <20100121180348.GC14058@phare.normalesup.org> References: <2A7AEDE8-82DE-4067-8A81-068FBB9E5B37@gmail.com> <20100121180348.GC14058@phare.normalesup.org> Message-ID: <1cd32cbb1001220651n19d21057ib89126b06aed8264@mail.gmail.com> On Thu, Jan 21, 2010 at 1:03 PM, Emmanuelle Gouillart wrote: > > Hi Thomas, > > broadcasting rules are only for ufuncs (and by extension, some numpy > functions using ufuncs). Indexing obeys different rules and always starts > by the first dimension. Just a clarification: If there are several index arrays, then standard broadcasting rules apply for them. It's a bit messier when arrays and slice objects are mixed. An informative explanation was in the thread March 2009 about "Is this a bug?" and lots of examples are on the mailing list Josef > > However, you don't have to use broadcasting for such indexing operations: >>>> a[:, c] = 0 > zeroes columns indexed by c. > > If you want to index along the 3rd dimension, you can use a[:, :, c], > etc. If the dimension along which you index is a variable, you can also > use the function np.rollaxis that allows to change the order of the > dimensions of an array. You may then index along the first dimension > (a[c]), then change back the order of the dimensions. Here is an example: >>>> a = np.ones((3,4,5,6)) >>>> c = np.array([1,0,1,0,1], dtype=bool) >>>> tmp_a = np.rollaxis(a, 2, 0) >>>> tmp_a.shape > (5, 3, 4, 6) >>>> tmp_a[c] = 0 >>>> a = np.rollaxis(tmp_a, 0, 3) >>>> a.shape > (3, 4, 5, 6) > > Hope this helps. > > Cheers, > > Emmanuelle > > On Thu, Jan 21, 2010 at 11:37:09AM -0500, Thomas Robitaille wrote: >> Hello, > >> I'm trying to understand how array broadcasting can be used for indexing. In the following, I use the term 'row' to refer to the first dimension of a 2D array, and 'column' to the second, just because that's how numpy prints them out. > >> If I consider the following example: > >> >>> a = np.random.random((4,5)) >> >>> b = np.random.random((5,)) >> >>> a + b >> array([[ 1.45499556, ?0.60633959, ?0.48236157, ?1.55357393, ?1.4339261 ], >> ? ? ? ?[ 1.28614593, ?1.11265001, ?0.63308615, ?1.28904227, ?1.34070499], >> ? ? ? ?[ 1.26988279, ?0.84683018, ?0.98959466, ?0.76388223, ?0.79273084], >> ? ? ? ?[ 1.27859505, ?0.9721984 , ?1.02725009, ?1.38852061, ?1.56065028]]) > >> I understand how this works, because it works as expected as described in > >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting > >> So b gets broadcast to shape (1,5), then because the first dimension is 1, the operation is applied to all rows. > >> Now I am trying to apply this to array indexing. So for example, I want to set specific columns, indicated by a boolean array, to zero, but the following fails: > >> >>> c = np.array([1,0,1,0,1], dtype=bool) >> >>> a[c] = 0 >> Traceback (most recent call last): >> ? File "", line 1, in >> IndexError: index (4) out of range (0<=index<3) in dimension 0 > >> However, if I try reducing the size of c to 4, then it works, and sets rows, not columns, equal to zero > >> >>> c = np.array([1,0,1,0], dtype=bool) >> >>> a[c] = 0 >> >>> a >> array([[ 0. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?], >> ? ? ? ?[ 0.41526315, ?0.7425491 , ?0.39872546, ?0.56141914, ?0.69795153], >> ? ? ? 
?[ 0. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?, ?0. ? ? ? ?], >> ? ? ? ?[ 0.40771227, ?0.60209749, ?0.7928894 , ?0.66089748, ?0.91789682]]) > >> But I would have thought that the indexing array would have been broadcast in the same way as for a sum, i.e. c would be broadcast to have dimensions (1,5) and then would have been able to set certain columns in all rows to zero. > >> Why is it that for indexing, the broadcasting seems to happen in a different way than when performing operations like additions or multiplications? For background info, I'm trying to write a routine which performs a set of operations on an n-d array, where n is not known in advance, with a 1D array, so I can use broadcasting rules for most operations without knowing the dimensionality of the n-d array, but now that I need to perform indexing, and the convention seems to change, this is a real issue. > >> Thanks in advance for any advice, > >> Thomas >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Fri Jan 22 12:11:14 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 22 Jan 2010 09:11:14 -0800 Subject: [Numpy-discussion] np.testing.assert_equal Message-ID: Should np.testing.assert_equal(np.array(1), 1) raise an AssertionError? (It currently doesn't.) The use case I have in mind is that scipy.stats.nanamedian incorrectly returns np.array(1.0) for the median of a 1d array while np.median correctly returns 1.0. It would be handy if the assert statement caught the difference. From charlesr.harris at gmail.com Fri Jan 22 12:35:10 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 Jan 2010 10:35:10 -0700 Subject: [Numpy-discussion] np.testing.assert_equal In-Reply-To: References: Message-ID: On Fri, Jan 22, 2010 at 10:11 AM, Keith Goodman wrote: > Should > > np.testing.assert_equal(np.array(1), 1) > > raise an AssertionError? (It currently doesn't.) > > The use case I have in mind is that scipy.stats.nanamedian incorrectly > returns np.array(1.0) for the median of a 1d array while np.median > correctly returns 1.0. It would be handy if the assert statement > caught the difference. > _ Such a change would break most of the current tests. The current form was chosen to be convenient for most use cases. If you need to check the return type, then you can write another helper function or use assert_ to combine several tests. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 22 16:02:39 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 Jan 2010 14:02:39 -0700 Subject: [Numpy-discussion] void * abuse Message-ID: Example: typedef void (PyArray_VectorUnaryFunc)(void *, void *, npy_intp, void *, void *) The actual functions that implement this type don't match the prototype, an unfortunate fact that is papered over by casting said functions to the type to avoid warnings about initialization from incompatible pointer types. Theoretically real problems could result from this, in practice the main effect is that type checking by the compiler doesn't take place leading to undisciplined implementions. 
The current prototype isn't much better than a simple typedef void (PyArray_VectorUnaryFunc)() So... this can be fixed but the fix could cause some current code in the wild to result in some warnings next time it is compiled. I doubt there will be any binary incompatibility with existing compiled code. Should I fix it? It is really ugly as it stands. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Fri Jan 22 22:28:12 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 22 Jan 2010 22:28:12 -0500 Subject: [Numpy-discussion] generalized ufunc problem In-Reply-To: <4B58F1C3.1000609@enthought.com> References: <8A5E87C4-5ACC-47B1-9628-97F9FAB12559@cs.toronto.edu> <4B58F1C3.1000609@enthought.com> Message-ID: <2C906C96-707B-478B-B7B5-A646E6966910@cs.toronto.edu> Hi Warren, Thanks for the reply. I actually have a very similar version in my own code; I was just hoping to figure out the generalized ufunc architecture. There aren't many examples of actual uses of this capability in NumPy, so I wanted to try and exercise it a bit. logsumexp is kind of a perfect scenario for generalized ufuncs as it requires operations over an entire dimension (the max operation). Cheers, David On 21-Jan-10, at 7:30 PM, Warren Weckesser wrote: > David, > > I haven't tried creating a ufunc before, so I can't help you with > that, but since you are working on logsumexp, you might be > interested in the version I posted here in October: > > http://mail.scipy.org/pipermail/scipy-user/2009-October/022931.html > > and the attached tests. > > > Warren > > > David Warde-Farley wrote: >> I decided to take a crack at adding a generalized ufunc for >> logsumexp, i.e. collapsed an array along the last dimension by >> subtracting the maximum element E along that dimension, taking the >> exponential, adding, and then adding back E. Functionally the >> same logaddexp.reduce() but presumably faster and less prone to >> error accumulation. >> >> I added the following to umath_tests.c.src and got everything to >> compile, but for some reason it doesn't give me the behaviour I'm >> looking for. I was expecting a (500, 50) array to be collapsed down >> to a (500,) array. Is that not what the signature calls for? >> >> Thanks, >> >> David >> >> char *logsumexp_signature = "(i)->()"; >> >> /**begin repeat >> >> #TYPE=LONG,DOUBLE# >> #typ=npy_long, npy_double# >> #EXPFUN=expl, exp# >> #LOGFUN=logl, log# >> */ >> >> /* >> * This implements the function >> * out[n] = sum_i { in1[n, i] * in2[n, i] }. >> */ >> >> static void >> @TYPE at _logsumexp(char **args, intp *dimensions, intp *steps, void >> *NPY_UNUSED(func)) >> { >> INIT_OUTER_LOOP_3 >> intp di = dimensions[0]; >> intp i; >> intp is1=steps[0]; >> BEGIN_OUTER_LOOP_3 >> char *ip1=args[0], *op=args[1]; >> @typ@ max = (*(@typ@ *)ip1); >> @typ@ sum = 0; >> >> for (i = 0; i < di; i++) { >> max = max < (*(@typ@ *)ip1) ? (*(@typ@ *)ip1) : max; >> ip1 += is1; >> } >> ip1 = args[0]; >> for (i = 0; i < di; i++) { >> sum += @EXPFUN@((*(@typ@ *)ip1) - max); >> ip1 += is1; >> } >> *(@typ@ *)op = @LOGFUN@(sum + max); >> END_OUTER_LOOP >> } >> >> /**end repeat**/ >> >> >> >> static PyUFuncGenericFunction logsumexp_functions[] = >> { LONG_logsumexp, DOUBLE_logsumexp }; >> static void * logsumexp_data[] = { (void *)NULL, (void *)NULL }; >> static char logsumexp_signatures[] = { PyArray_LONG, PyArray_LONG, >> PyArray_DOUBLE, PyArray_DOUBLE }; >> >> >> /* and inside addUfuncs() */ >> ... 
>> >> f = PyUFunc_FromFuncAndData(logsumexp_functions, >> logsumexp_data, logsumexp_signatures, >> 2, 1, 1, PyUFunc_None, >> "logsumexp", "inner1d with a >> weight argument \\n""\" \"\n"" \\ \"(i)- >> >()\\\" \\n""", 0); PyDict_SetItemString(dictionary, >> "logsumexp", f); Py_DECREF(f); >> >> .... >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > from numpy import * > from scipy.maxentropy import logsumexp > > from my_logsumexp import my_logsumexp > > > if __name__ == "__main__": > > # > #-------------------------------------------- > # 1D tests. > #-------------------------------------------- > x = array([1.0,2.0,3.0]) > p = logsumexp(x) > q = my_logsumexp(x) > assert p == q > > x = array([1.0,2.0,3.0]) > p = logsumexp(x) > q = my_logsumexp(x, axis=0) > assert p == q > > #-------------------------------------------- > # A 2D test. > #-------------------------------------------- > > a = array([[1.0,2.0,3.0],[0.1,0.2,0.3]]) > q = my_logsumexp(a, axis=0) > assert allclose(q[0], logsumexp(a[:,0])) > assert allclose(q[1], logsumexp(a[:,1])) > assert allclose(q[2], logsumexp(a[:,2])) > q = my_logsumexp(a, axis=1) > assert allclose(q[0], logsumexp(a[0])) > assert allclose(q[1], logsumexp(a[1])) > > #-------------------------------------------- > # A 3D test. > #-------------------------------------------- > L = 3 > M = 4 > N = 5 > w = random.random((L, M, N)) > q0 = empty((M,N)) > for j in range(M): > for k in range(N): > q0[j,k] = logsumexp(w[:,j,k]) > q1 = empty((L,N)) > for i in range(L): > for k in range(N): > q1[i,k] = logsumexp(w[i,:,k]) > q2 = empty((L,M)) > for i in range(L): > for j in range(M): > q2[i,j] = logsumexp(w[i,j,:]) > > assert allclose(q0, my_logsumexp(w, axis=0)) > assert allclose(q1, my_logsumexp(w, axis=1)) > assert allclose(q2, my_logsumexp(w, axis=2)) > #-------------------------------------------- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan at ajackson.org Sat Jan 23 13:50:19 2010 From: alan at ajackson.org (alan at ajackson.org) Date: Sat, 23 Jan 2010 12:50:19 -0600 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> References: <4B42905A.4080105@noaa.gov> <20100104213942.588435c2@ajackson.org> <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> Message-ID: <20100123125019.74e289cc@ajackson.org> >On Mon, Jan 4, 2010 at 10:39 PM, wrote: >>>Hi folks, >>> >>>I'm taking a look once again at fromfile() for reading text files. I >>>often have the need to read a LOT of numbers form a text file, and it >>>can actually be pretty darn slow do i the normal python way: >>> .....................big snip >> >> I agree. I've tried using it, and usually find that it doesn't quite get there. >> >> I rather like the R command(s) for reading text files - except then I have to >> use R which is painful after using python and numpy. Although ggplot2 is >> awfully nice too ... but that is a later post. >> >> ? ? read.table(file, header = FALSE, sep = "", quote = "\"'", >> ? ? ? ? ? ? ? ?dec = ".", row.names, col.names, >> ? ? ? ? ? ? ? ?as.is = !stringsAsFactors, >> ? ? ? ? ? ? ? ?na.strings = "NA", colClasses = NA, nrows = -1, >> ? ? ? ? ? ? ? ?skip = 0, check.names = TRUE, fill = !blank.lines.skip, >> ? ? ? ? ? 
? ? ?strip.white = FALSE, blank.lines.skip = TRUE, >> ? ? ? ? ? ? ? ?comment.char = "#", >> ? ? ? ? ? ? ? ?allowEscapes = FALSE, flush = FALSE, >> ? ? ? ? ? ? ? ?stringsAsFactors = default.stringsAsFactors(), >> ? ? ? ? ? ? ? ?fileEncoding = "", encoding = "unknown") ....................... big snip > > >Aren't the newly improved > >numpy.genfromtxt(fname, dtype=, comments='#', >delimiter=None, skiprows=0, converters=None, missing='', >missing_values=None, usecols=None, names=None, excludelist=None, >deletechars=None, case_sensitive=True, unpack=None, usemask=False, >loose=True) > >and friends indented to handle all this > >Josef > Reopening an old thread... genfromtxt is a big step forward. Something I'm fiddling with is trying to work through the book "Using R for Data Analysis and Graphics, Introduction, Code, and Commentary" by J H Maindonald (available online), in python. So I am trying to see what it takes in python/numpy to work his examples and problems, sort of a learning exercise for me. So anyway, with that introduction, here is a case that I believe genfromtxt fails on, because it doesn't support the reasonable (IMHO) behavior of treating quote delimited strings in the input file as a single field. Below is the example from the book... So we have 2 issues. The header for the first field is quote-blank-quote, and various values for field one have 1 to 3 blank delimited strings, but encapsulated in quotes. I'm putting something together to read it using shlex.split, since it honors strings protected by quote pairs. I'm not an excel person, but I think it might export data like this in a format similar to what is shown below. " " "distance" "climb" "time" "Greenmantle" 2.5 650 16.083 "Carnethy" 6 2500 48.35 "Craig Dunain" 6 900 33.65 "Ben Rha" 7.5 800 45.6 "Ben Lomond" 8 3070 62.267 "Goatfell" 8 2866 73.217 "Bens of Jura" 16 7500 204.617 "Cairnpapple" 6 800 36.367 "Scolty" 5 800 29.75 "Traprain" 6 650 39.75 "Lairig Ghru" 28 2100 192.667 -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From Chris.Barker at noaa.gov Sat Jan 23 13:53:55 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 23 Jan 2010 10:53:55 -0800 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) In-Reply-To: <20100123125019.74e289cc@ajackson.org> References: <4B42905A.4080105@noaa.gov> <20100104213942.588435c2@ajackson.org> <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> <20100123125019.74e289cc@ajackson.org> Message-ID: <4B5B45C3.2030302@noaa.gov> alan at ajackson.org wrote: > it doesn't support the reasonable > (IMHO) behavior of treating quote delimited strings in the input file as a > single field. I'd use the csv module for that. Which makes me wonder if it might make sense to build some of the numpy table-reading stuff on top of it... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From josef.pktd at gmail.com Sat Jan 23 14:04:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 23 Jan 2010 14:04:24 -0500 Subject: [Numpy-discussion] fromfile() for reading text (one more time!) 
In-Reply-To: <4B5B45C3.2030302@noaa.gov> References: <4B42905A.4080105@noaa.gov> <20100104213942.588435c2@ajackson.org> <1cd32cbb1001041945n24a5c49qcf80ef58430596f2@mail.gmail.com> <20100123125019.74e289cc@ajackson.org> <4B5B45C3.2030302@noaa.gov> Message-ID: <1cd32cbb1001231104j59d0f8d1n8b8e9ec9ed9cb374@mail.gmail.com> On Sat, Jan 23, 2010 at 1:53 PM, Christopher Barker wrote: > alan at ajackson.org wrote: >> it doesn't support the reasonable >> (IMHO) behavior of treating quote delimited strings in the input file as a >> single field. > > I'd use the csv module for that. > > Which makes me wonder if it might make sense to build some of the numpy > table-reading stuff on top of it... > > -Chris csv was also my standard module for this, it handles csv dialects and unicode (with some detour), but having automatic conversion in genfromtext is nicer. >>> reader = csv.reader(open(r'C:\Josef\work-oth\testdata.csv','rb'), delimiter=' ') >>> for line in reader: ... print line ... ['Greenmantle', '2.5', '650', '16.083'] ['Carnethy', '6', '2500', '48.35'] ['Craig Dunain', '6', '900', '33.65'] ['Ben Rha', '7.5', '800', '45.6'] ['Ben Lomond', '8', '3070', '62.267'] ['Goatfell', '8', '2866', '73.217'] ['Bens of Jura', '16', '7500', '204.617'] ['Cairnpapple', '6', '800', '36.367'] ['Scolty', '5', '800', '29.75'] ['Traprain', '6', '650', '39.75'] ['Lairig Ghru', '28', '2100', '192.667'] Josef > > > -- > Christopher Barker, Ph.D. > Oceanographer > > NOAA/OR&R/HAZMAT ? ? ? ? (206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From aisaac at american.edu Sat Jan 23 14:20:32 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 23 Jan 2010 14:20:32 -0500 Subject: [Numpy-discussion] fast duplicate of array Message-ID: <4B5B4C00.8040505@american.edu> Suppose x and y are conformable 2d arrays. I now want x to become a duplicate of y. I could create a new array: x = y.copy() or I could assign the values of y to x: x[:,:] = y As expected the latter is faster (no array creation). Are there better ways? Thanks, Alan Isaac From aisaac at american.edu Sat Jan 23 14:36:35 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 23 Jan 2010 14:36:35 -0500 Subject: [Numpy-discussion] random v. random_state in RandomState Message-ID: <4B5B4FC3.2010309@american.edu> As I understand it, numpy.random provides the function ``random`` as an alias for ``random_state``. Might this be moved into numpy.random.mtrand.RandomState, for interface consistency? Right now if I start with ``prng = np.random.RandomState(seed=myseed)`` I cannot use ``prng.random`` as it does not exist. Note that I can use the convenience function ``prng.rand``, and its documentation makes reference to ``random``! Alan Isaac From peridot.faceted at gmail.com Sat Jan 23 17:01:11 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 23 Jan 2010 17:01:11 -0500 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: <4B5B4C00.8040505@american.edu> References: <4B5B4C00.8040505@american.edu> Message-ID: 2010/1/23 Alan G Isaac : > Suppose x and y are conformable 2d arrays. > I now want x to become a duplicate of y. > I could create a new array: > x = y.copy() > or I could assign the values of y to x: > x[:,:] = y > > As expected the latter is faster (no array creation). 
> Are there better ways? If both arrays are "C contiguous", or more generally contiguous blocks of memory with the same strided structure, you might get faster copying by flattening them first, so that it can go in a single memcpy(). For really large arrays that use complete pages, some low-level hackery involving memmap() might be able to make a shared copy-on-write copy at almost no cost until you start modifying one array or the other. But both of these tricks are intended for the regime where copying the data is the expensive part, not fabricating the array object; for that, I'm not sure you can accelerate things much. Anne From aisaac at american.edu Sat Jan 23 17:31:25 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 23 Jan 2010 17:31:25 -0500 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: References: <4B5B4C00.8040505@american.edu> Message-ID: <4B5B78BD.80900@american.edu> On 1/23/2010 5:01 PM, Anne Archibald wrote: > If both arrays are "C contiguous", or more generally contiguous blocks > of memory with the same strided structure, you might get faster > copying by flattening them first, so that it can go in a single > memcpy(). I may misuderstand this. Did you just mean x.flat = y.flat ? If so, I find that to be *much* slower. Thanks, Alan x = np.random.random((1000,1000)) y = x.copy() t0 = time.clock() for t in range(1000): x = y.copy() print(time.clock() - t0) t0 = time.clock() for t in range(1000): x[:,:] = y print(time.clock() - t0) t0 = time.clock() for t in range(1000): x.flat = y.flat print(time.clock() - t0) From kwgoodman at gmail.com Sat Jan 23 18:00:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 23 Jan 2010 15:00:24 -0800 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: <4B5B78BD.80900@american.edu> References: <4B5B4C00.8040505@american.edu> <4B5B78BD.80900@american.edu> Message-ID: On Sat, Jan 23, 2010 at 2:31 PM, Alan G Isaac wrote: > On 1/23/2010 5:01 PM, Anne Archibald wrote: >> If both arrays are "C contiguous", or more generally contiguous blocks >> of memory with the same strided structure, you might get faster >> copying by flattening them first, so that it can go in a single >> memcpy(). > > I may misuderstand this. ?Did you just mean > x.flat = y.flat > ? > > If so, I find that to be *much* slower. 
> > Thanks, > Alan > > > x = np.random.random((1000,1000)) > y = x.copy() > t0 = time.clock() > for t in range(1000): x = y.copy() > print(time.clock() - t0) > t0 = time.clock() > for t in range(1000): x[:,:] = y > print(time.clock() - t0) > t0 = time.clock() > for t in range(1000): x.flat = y.flat > print(time.clock() - t0) I don't know what a view is, but it is fast: x = y.view() def speed(): import numpy as np import time x = np.random.random((1000,1000)) y = x.copy() t0 = time.clock() for t in range(1000): x = y.copy() print(time.clock() - t0) t0 = time.clock() for t in range(1000): x[:,:] = y print(time.clock() - t0) t0 = time.clock() for t in range(1000): x.flat = y.flat print(time.clock() - t0) t0 = time.clock() for t in range(1000): x = y.view() print(time.clock() - t0) >> speed() 1.3 2.07 15.0 0.01 From charlesr.harris at gmail.com Sat Jan 23 19:08:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 Jan 2010 17:08:37 -0700 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: References: <4B5B4C00.8040505@american.edu> <4B5B78BD.80900@american.edu> Message-ID: On Sat, Jan 23, 2010 at 4:00 PM, Keith Goodman wrote: > On Sat, Jan 23, 2010 at 2:31 PM, Alan G Isaac wrote: > > On 1/23/2010 5:01 PM, Anne Archibald wrote: > >> If both arrays are "C contiguous", or more generally contiguous blocks > >> of memory with the same strided structure, you might get faster > >> copying by flattening them first, so that it can go in a single > >> memcpy(). > > > > I may misuderstand this. Did you just mean > > x.flat = y.flat > > ? > > > > If so, I find that to be *much* slower. > > > > Thanks, > > Alan > > > > > > x = np.random.random((1000,1000)) > > y = x.copy() > > t0 = time.clock() > > for t in range(1000): x = y.copy() > > print(time.clock() - t0) > > t0 = time.clock() > > for t in range(1000): x[:,:] = y > > print(time.clock() - t0) > > t0 = time.clock() > > for t in range(1000): x.flat = y.flat > > print(time.clock() - t0) > > I don't know what a view is, but it is fast: > > x = y.view() > > In this case x isn't a copy of y, it is a reference to the same data in memory. It is fast because no copying is done. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Sat Jan 23 19:29:42 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 23 Jan 2010 19:29:42 -0500 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: <4B5B78BD.80900@american.edu> References: <4B5B4C00.8040505@american.edu> <4B5B78BD.80900@american.edu> Message-ID: 2010/1/23 Alan G Isaac : > On 1/23/2010 5:01 PM, Anne Archibald wrote: >> If both arrays are "C contiguous", or more generally contiguous blocks >> of memory with the same strided structure, you might get faster >> copying by flattening them first, so that it can go in a single >> memcpy(). > > I may misuderstand this. ?Did you just mean > x.flat = y.flat > ? No, .flat constructs an iterator that traverses the object as if it were flat. 
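A quick way to see the distinction, as a small sketch (the array here is just a made-up example):

>>> import numpy as np
>>> x = np.arange(6).reshape(2, 3)
>>> type(x.flat).__name__      # .flat is an iterator, walked element by element
'flatiter'
>>> xv = x.view()
>>> xv.shape = (-1,)
>>> xv.base is x               # a flat view onto the very same memory
True

Assigning through the iterator presumably goes element by element, which would explain why the x.flat = y.flat version times so much slower above, while assigning through a flat view can sweep the buffer in one contiguous pass.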
I had in mind accessing the underlying data through views that were flat: In [3]: x = np.random.random((1000,1000)) In [4]: y = np.random.random((1000,1000)) In [5]: xf = x.view() In [6]: xf.shape = (-1,) In [7]: yf = y.view() In [8]: yf.shape = (-1,) In [9]: yf[:] = xf[:] This may still use a loop instead of a memcpy(), in which case you'd want to look for an explicit memcpy()-based implementation, but when manipulating multidimensional arrays you have (in principle, anyway) nested loops which may not be executed in the cache-optimal order. Ideally numpy would automatically notice when operations can be done on flattened versions of arrays and get rid of some of the looping and indexing, but I wouldn't count on it. At one point I remember finding that the loops were reordered not for cache coherence but to make the inner loop over the biggest dimension (to minimize looping overhead). Anne > If so, I find that to be *much* slower. > > Thanks, > Alan > > > x = np.random.random((1000,1000)) > y = x.copy() > t0 = time.clock() > for t in range(1000): x = y.copy() > print(time.clock() - t0) > t0 = time.clock() > for t in range(1000): x[:,:] = y > print(time.clock() - t0) > t0 = time.clock() > for t in range(1000): x.flat = y.flat > print(time.clock() - t0) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From aisaac at american.edu Sat Jan 23 19:30:38 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 23 Jan 2010 19:30:38 -0500 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: References: <4B5B4C00.8040505@american.edu> <4B5B78BD.80900@american.edu> Message-ID: <4B5B94AE.7020805@american.edu> On 1/23/2010 6:00 PM, Keith Goodman wrote: > x = y.view() Thanks, but I'm not looking for a view. And I need x to own its data. Alan From aisaac at american.edu Sat Jan 23 19:39:42 2010 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 23 Jan 2010 19:39:42 -0500 Subject: [Numpy-discussion] fast duplicate of array In-Reply-To: References: <4B5B4C00.8040505@american.edu> <4B5B78BD.80900@american.edu> Message-ID: <4B5B96CE.80009@american.edu> On 1/23/2010 7:29 PM, Anne Archibald wrote: > I had in mind accessing the underlying data through views > that were flat: > > In [3]: x = np.random.random((1000,1000)) > > In [4]: y = np.random.random((1000,1000)) > > In [5]: xf = x.view() > > In [6]: xf.shape = (-1,) > > In [7]: yf = y.view() > > In [8]: yf.shape = (-1,) > > In [9]: yf[:] = xf[:] Yup, that's a bit faster. Thanks, Alan From gokhansever at gmail.com Sun Jan 24 12:54:37 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sun, 24 Jan 2010 11:54:37 -0600 Subject: [Numpy-discussion] Interact with matplotlib in Sage Message-ID: <49d6b3501001240954l42188cb9m6af15d9e11bf1af3@mail.gmail.com> Hello, I have thought of this might interesting to share. Register at www.sagenb.org or try on your local Sage-notebook and using the following code: # Simple example demonstrating how to interact with matplotlib directly. # Comment plt.clf() to get the plots overlay in each update. 
# Gokhan Sever & Harald Schilly (2010-01-24) from scipy import stats import numpy as np import matplotlib.pyplot as plt @interact def plot_norm(loc=(0,(0,10)), scale=(1,(1,10))): rv = stats.norm(loc, scale) x = np.linspace(-10,10,1000) plt.plot(x,rv.pdf(x)) plt.grid(True) plt.savefig('plt.png') plt.clf() A very easy to use example, also well-suited for learning and demonstration purposes. Posted at: http://wiki.sagemath.org/interact/graphics#Interactwithmatplotlib Have fun ;) -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Sun Jan 24 19:32:30 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Mon, 25 Jan 2010 09:32:30 +0900 Subject: [Numpy-discussion] Proposed fix for MKL and dynamic loading In-Reply-To: References: <4B584115.9090500@student.matnat.uio.no> <4B58FABA.3030207@silveregg.co.jp> Message-ID: <4B5CE69E.2070604@silveregg.co.jp> Matthieu Brucher wrote: > > How do you write the site.cfg accordingly? I don't think you can do that through site.cfg, David From seb.haase at gmail.com Mon Jan 25 03:55:09 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 25 Jan 2010 09:55:09 +0100 Subject: [Numpy-discussion] scipy-tickets restarted emailing on jan17 - how about numpy-tickets ? Message-ID: Hi, long time ago I had subscript to get both scipy-tickets and numpy-tickets emailed. Now scipy-tickets apparently started emailing again on 17th of Januar. Will numpy-tickets also come back "by itself" - or should I resubscribe? Regards, Sebastian Haase From Nicolas.Rougier at loria.fr Mon Jan 25 05:46:05 2010 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Mon, 25 Jan 2010 11:46:05 +0100 Subject: [Numpy-discussion] glumpy, fast opengl visualization Message-ID: <1264416365.376.19.camel@sulfur> Hello, This is an update about glumpy, a fast-OpenGL based numpy visualization. I modified the code such that the only dependencies are PyOpenGL and IPython (for interactive sessions). You will also need matplotlib and scipy for some demos. Sources: hg clone http://glumpy.googlecode.com/hg/ glumpy No installation required, you can run all demos inplace. Homepage: http://code.google.com/p/glumpy/ Nicolas From rmay31 at gmail.com Mon Jan 25 09:54:16 2010 From: rmay31 at gmail.com (Ryan May) Date: Mon, 25 Jan 2010 08:54:16 -0600 Subject: [Numpy-discussion] scipy-tickets restarted emailing on jan17 - how about numpy-tickets ? In-Reply-To: References: Message-ID: On Mon, Jan 25, 2010 at 2:55 AM, Sebastian Haase wrote: > Hi, > long time ago I had subscript to get both scipy-tickets and > numpy-tickets emailed. > Now scipy-tickets apparently started emailing again on 17th of Januar. > Will numpy-tickets also come back "by itself" - or should I resubscribe? I'm seeing traffic on numpy-tickets since about the time scipy-tickets came back. I'd try resubscribing. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States From denis-bz-py at t-online.de Mon Jan 25 11:28:17 2010 From: denis-bz-py at t-online.de (denis) Date: Mon, 25 Jan 2010 17:28:17 +0100 Subject: [Numpy-discussion] Module Index for numpy? In-Reply-To: <4B534F9F.4020901@sbcglobal.net> References: <4B534F9F.4020901@sbcglobal.net> Message-ID: On 17/01/2010 18:57, Wayne Watson wrote: > I was just looking at the (Win) Python documentation via the Help on > IDLE, and a Global Module Index. Does anything like that exist for > numpy, matplotlib, scipy? 
Wayne, folks, may I second the wish / the need for searching thousands of functions. Fwiw, grep makes a crude but very fast source tree browser: 1) grep class + def + first docstring lines in numpy/...py (re, no import); this looks like -- numpy/compat/setupscons.py def configuration(parent_package='',top_path=None): -- numpy/core/arrayprint.py def product(x, y): return x*y def set_printoptions(precision=None, threshold=None, edgeitems=None, def get_printoptions(): """ Return the current print options. def array2string(a, max_line_width = None, precision = None, ... 2) grep2 that, i.e. grep + previous ^-- line. numpy.defs is 6k lines, 230k, grep time ~ .25 sec. What do we really want -- - a source browser GUI, pyqt or webbrowser - or a better text pydoc - or full-text search -- Robert Kern has suggested his Whoosh ? We could get together a table of existing GUIs and desiderata, sort by sum(features) / time-to-write-a-manual (not time-to-hack). Or, o'er forms of doc let fools contest, what's best written is the best. cheers -- denis From josef.pktd at gmail.com Mon Jan 25 11:59:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 Jan 2010 11:59:14 -0500 Subject: [Numpy-discussion] Module Index for numpy? In-Reply-To: References: <4B534F9F.4020901@sbcglobal.net> Message-ID: <1cd32cbb1001250859s4867a63ak2bedc8ea6788336c@mail.gmail.com> On Mon, Jan 25, 2010 at 11:28 AM, denis wrote: > On 17/01/2010 18:57, Wayne Watson wrote: >> I was just looking at the (Win) Python documentation via the Help on >> IDLE, and a Global Module Index. Does anything like that exist for >> numpy, matplotlib, scipy? > > Wayne, folks, > > ? may I second the wish / the need for searching thousands of functions. > > Fwiw, grep makes a crude but very fast source tree browser: > 1) grep class + def + first docstring lines in numpy/...py (re, no import); > this looks like > > ? ? ? ?-- numpy/compat/setupscons.py > ? ? ? ?def configuration(parent_package='',top_path=None): > > ? ? ? ?-- numpy/core/arrayprint.py > ? ? ? ?def product(x, y): return x*y > ? ? ? ?def set_printoptions(precision=None, threshold=None, edgeitems=None, > ? ? ? ?def get_printoptions(): > ? ? ? ? ? ?""" Return the current print options. > ? ? ? ?def array2string(a, max_line_width = None, precision = None, > ? ? ? ?... > > 2) grep2 that, i.e. grep + previous ^-- line. > numpy.defs is 6k lines, 230k, grep time ~ .25 sec. > > > What do we really want -- > ?- a source browser GUI, pyqt or webbrowser > ?- or a better text pydoc > ?- or full-text search -- Robert Kern has suggested his Whoosh > ? > We could get together a table of existing GUIs and desiderata, > sort by sum(features) / time-to-write-a-manual (not time-to-hack). > Or, > ? ? ? ?o'er forms of doc let fools contest, > ? ? ? ?what's best written is the best. > > cheers > ? -- denis htmlhelp (of the docs) has all of the above at least on Windows, except for source browsing (I use spyder for functions that are source accessible) Isn't there a Linux equivalent? I haven't use np.lookfor or np.source in a long time. Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Mon Jan 25 12:56:51 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 25 Jan 2010 19:56:51 +0200 Subject: [Numpy-discussion] Module Index for numpy? 
In-Reply-To: <1cd32cbb1001250859s4867a63ak2bedc8ea6788336c@mail.gmail.com> References: <4B534F9F.4020901@sbcglobal.net> <1cd32cbb1001250859s4867a63ak2bedc8ea6788336c@mail.gmail.com> Message-ID: <1264442210.6685.9.camel@idol> ma, 2010-01-25 kello 11:59 -0500, josef.pktd at gmail.com kirjoitti: [clip] > htmlhelp (of the docs) has all of the above at least on Windows, > except for source browsing (I use spyder for functions that are source > accessible) > > Isn't there a Linux equivalent? There's devhelp, qthelp, and also CHM viewers for Linux. Numpy docs can be built for all of them [1], provided a new enough version of Sphinx. No need to invent a new wheel, methinks. Moreover, there is already a listing of functions by feature in the refguide... .. [1] http://sphinx.pocoo.org/latest/builders.html?highlight=devhelp#sphinx.builders.qthelp.QtHelpBuilder -- Pauli Virtanen From curiousjan at gmail.com Mon Jan 25 16:38:48 2010 From: curiousjan at gmail.com (Jan Strube) Date: Mon, 25 Jan 2010 21:38:48 +0000 Subject: [Numpy-discussion] indexing, searchsorting, ... Message-ID: Dear List, I'm trying to speed up a piece of code that selects a subsample based on some criteria: Setup: I have two samples, raw and cut. Cut is a pure subset of raw, all elements in cut are also in raw, and cut is derived from raw by applying some cuts. Now I would like to select a random subsample of raw and find out how many are also in cut. In other words, some of those random events pass the cuts, others don't. So in principle I have randomSample = np.random.random_integers(0, len(raw)-1, size=sampleSize) random_that_pass1 = [r for r in raw[randomSample] if r in cut] This is fine (I hope), but slow. I have seen searchsorted mentioned as a possible way to speed this up. Now it gets complicated. I'm creating a boolean array that contains True, wherever a raw event is in cut. raw_sorted = np.sort(raw) cut_sorted = np.sort(cut) passed = np.searchsorted(raw_sorted, cut_sorted) raw_bool = np.zeros(len(raw), dtype='bool') raw_bool[passed] = True Now I create a second boolean array that is set to True at the random values. The events I care about are the ones that pass the cuts and are selected by the random selection: sample_bool = np.zeros(len(raw), dtype='bool') sample_bool[randomSample] = True random_that_pass2 = raw[np.logical_and(raw_bool, sample_bool)] The problem comes in now: random_that_pass2 and random_that_pass1 have different lengths!!! Sometimes one is longer, sometimes the other. I am completely at a loss to explain this. I tend to believe the slow selection leading to random_that_pass1, because it's only two lines, but I don't understand where the other selection could fail. Unfortunately, the samples that give me trouble are 2.2 MB, so maybe a bit large to mail around, but I can put it somewhere if needed. Thank you for your help, Cheers, Jan From kwgoodman at gmail.com Mon Jan 25 17:16:19 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 25 Jan 2010 14:16:19 -0800 Subject: [Numpy-discussion] indexing, searchsorting, ... In-Reply-To: References: Message-ID: On Mon, Jan 25, 2010 at 1:38 PM, Jan Strube wrote: > Dear List, > > I'm trying to speed up a piece of code that selects a subsample based on some criteria: > Setup: > I have two samples, raw and cut. Cut is a pure subset of raw, all elements in cut are also in raw, and cut is derived from raw by applying some cuts. > Now I would like to select a random subsample of raw and find out how many are also in cut. 
In other words, some of those random events pass the cuts, others don't. > So in principle I have > > randomSample = np.random.random_integers(0, len(raw)-1, size=sampleSize) > random_that_pass1 = [r for r in raw[randomSample] if r in cut] > > This is fine (I hope), but slow. You could construct raw2 and cut2 where each element placed in cut2 is removed from raw2: idx = np.random.rand(n_in_cut2) > 0.5 # for example raw2 = raw[~idx] cut2 = raw[idx] If you concatenate raw2 and cut2 you get raw (but reordered): raw3 = np.concatenate((raw2, cut2), axis=0) Any element in the subsample with an index of len(raw2) or greater is in cut. That makes counting fast. There is a setup cost. So I guess it all depends on how many subsamples you need from one cut. Not sure any of this works, just an idea. From josef.pktd at gmail.com Mon Jan 25 17:47:47 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 Jan 2010 17:47:47 -0500 Subject: [Numpy-discussion] indexing, searchsorting, ... In-Reply-To: References: Message-ID: <1cd32cbb1001251447o253d1325nab7bced6e17dd8f1@mail.gmail.com> On Mon, Jan 25, 2010 at 5:16 PM, Keith Goodman wrote: > On Mon, Jan 25, 2010 at 1:38 PM, Jan Strube wrote: >> Dear List, >> >> I'm trying to speed up a piece of code that selects a subsample based on some criteria: >> Setup: >> I have two samples, raw and cut. Cut is a pure subset of raw, all elements in cut are also in raw, and cut is derived from raw by applying some cuts. >> Now I would like to select a random subsample of raw and find out how many are also in cut. In other words, some of those random events pass the cuts, others don't. >> So in principle I have >> >> randomSample = np.random.random_integers(0, len(raw)-1, size=sampleSize) >> random_that_pass1 = [r for r in raw[randomSample] if r in cut] >> >> This is fine (I hope), but slow. > > You could construct raw2 and cut2 where each element placed in cut2 is > removed from raw2: > > idx = np.random.rand(n_in_cut2) > 0.5 ?# for example > raw2 = raw[~idx] > cut2 = raw[idx] > > If you concatenate raw2 and cut2 you get raw (but reordered): > > raw3 = np.concatenate((raw2, cut2), axis=0) > > Any element in the subsample with an index of len(raw2) or greater is > in cut. That makes counting fast. > > There is a setup cost. So I guess it all depends on how many > subsamples you need from one cut. > > Not sure any of this works, just an idea. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > in1d or intersect in arraysetops should also work, pure python but well constructed and tested for performance. Josef From fperez.net at gmail.com Mon Jan 25 19:11:29 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 25 Jan 2010 16:11:29 -0800 Subject: [Numpy-discussion] [IPython-dev] glumpy, fast opengl visualization In-Reply-To: <1264416365.376.19.camel@sulfur> References: <1264416365.376.19.camel@sulfur> Message-ID: Hi Nicolas, On Mon, Jan 25, 2010 at 2:46 AM, Nicolas Rougier wrote: > > > Hello, > > This is an update about glumpy, a fast-OpenGL based numpy visualization. > I modified the code such that the only dependencies are PyOpenGL and > IPython (for interactive sessions). You will also need matplotlib and > scipy for some demos. > > Sources: hg clone http://glumpy.googlecode.com/hg/ glumpy > No installation required, you can run all demos inplace. 
> > Homepage: http://code.google.com/p/glumpy/ This is great, and it would be very cool to have it updated to the new code we're now landing in ipython with a much cleaner internal API (finally :) Have you had a chance to look at the code in my trunk-dev branch? https://code.launchpad.net/~fdo.perez/ipython/trunk-dev Brian finished a large review of it and we just had a chance to go over his feedback directly, so there's now one more round of reviews to do (once he applies the changes from our discussion) and this should become trunk very soon. The apis are much cleaner, this is the big cleanup I told you about last year, and now we're getting to the point where having multiple ipython frontends is a very realistic prospect. Unfortunately we won't be able to use your code directly in IPython as it stands, since the GPL provisions in it would require us to GPL all of IPython to make use of any of it directly in IPython. Your code uses iptyhon, numpy, matplotlib and scipy (in some demos), which amounts to hundreds of thousands of lines of code; here are the sloccount outputs from their respective trunks: IPython Totals grouped by language (dominant language first): python: 47357 (99.24%) lisp: 262 (0.55%) sh: 62 (0.13%) objc: 37 (0.08%) Numpy Totals grouped by language (dominant language first): ansic: 152950 (67.19%) python: 73188 (32.15%) cpp: 828 (0.36%) fortran: 298 (0.13%) sh: 156 (0.07%) pascal: 120 (0.05%) f90: 97 (0.04%) Matplotlib Totals grouped by language (dominant language first): python: 83290 (52.64%) cpp: 68212 (43.11%) objc: 4517 (2.85%) ansic: 2149 (1.36%) sh: 69 (0.04%) Scipy Totals grouped by language (dominant language first): cpp: 220149 (48.35%) fortran: 87240 (19.16%) python: 79164 (17.38%) ansic: 68746 (15.10%) sh: 61 (0.01%) Glumpy: Totals grouped by language (dominant language first): python: 3751 (100.00%) We're looking at ~300.000 lines of python alone in these tools. It's unfortunately not realistic for us to consider GPL-ing them in order to incorporate glumpy into the core set; it would be fantastic if you were willing to consider licensing your code under a license that is compatible with the body of work you are building on top of. You are obviously free to choose your license as you see fit, and end users (myself included) will be always able to use glumpy along with ipython, numpy, matplotlib and scipy. So *users* get all of the benefit of your contribution, and for that I am the first to be delighted and grateful that you've put your code out there. But as it stands, your code builds on close to half a million lines of other code which can not benefit back from your contributions. If you consider licensing glumpy to be compatible with ipython, numpy and matplotlib, it would be possible to incorporate your ideas back into those projects: perhaps in some places the right solution would be to fix our own designs to better provide what glumpy needs, in other cases we may find fixes you've made fit better upstream, etc. But this kind of collaboration will not be possible as long as glumpy can benefit from our tools but our codes are not allowed to benefit from glumpy (without changing licenses, which isn't going to happen). 
I hope you consider this from our perspective and in the most friendly and open manner: I completely respect your right to license your own code as you see fit (I've seen people put out GPL 'projects' that effectively consist of 3 lines that import IPython and make a function call, and that's OK too, and allowed by the license I chose to use). The only reason I ask you is because I think your tool is very interesting, and it would ultimately lead to a much more productive relationship with ipython, numpy and matplotlib if it could be a collaboration instead of a one-way benefit. Best regards, Fernando. From rharderlists at gmail.com Tue Jan 26 01:42:38 2010 From: rharderlists at gmail.com (Ross Harder) Date: Tue, 26 Jan 2010 00:42:38 -0600 Subject: [Numpy-discussion] numpy.i Message-ID: <38aeac3c1001252242u4b16a8e9i19ef94eb1552f65c@mail.gmail.com> I'm struggling with using some of the macros in numpy.i for my own typemap. The problem is that the arrayobject.h include does not end up in the c wrapper code after swig runs. numpy.i has at the beginning: %{ #ifndef SWIG_FILE_WITH_INIT # define NO_IMPORT_ARRAY #endif #include "stdio.h" #include %} I tried explicitly putting %{#include "arrayobject.h"%} into the interface file. It ends up after the fragment code in the wrapper so the compiler is complaining. thanks, ross From aurelien.marsan at turbomeca.fr Tue Jan 26 06:21:44 2010 From: aurelien.marsan at turbomeca.fr (A.MARSAN) Date: Tue, 26 Jan 2010 12:21:44 +0100 Subject: [Numpy-discussion] pgcc-Error-Unknown switch Message-ID: <7B4A9ED217FE41FDAA5FA3DC2F7C4FB2@tmfr1.tm.corp> Dear All, I'm trying to use f2py in order to convert a fortan-file fonctions_f90.f90 I apply the following command line f2py -m fonctions_f90 --fcompiler=pg -c fonctions_f90.f90 Everything seems to be well, until this error appears : Pgcc: /tmp/tmp7RvKeA/src.linux-x86_64-2.5/fortranobject.c Pgcc-Error-Unknown switch: -fno-strict-aliasing Pgcc-Error-Unkown switch: -fwrapv Pgcc-Error-Unkown switch: -Wall Pgcc-Error-Unkown switch: -Wstrict-prototypes Does anyone could explain me what append and how to fix this problem ? Thanks for help. -- Aurelien From eadrogue at gmx.net Tue Jan 26 08:28:15 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 26 Jan 2010 14:28:15 +0100 Subject: [Numpy-discussion] should numpy.equal raise an exception instead of returning NotImplemented? Message-ID: <20100126132815.GA7938@doriath.local> Hi, Do you think it is sensible for np.equal to return a NotImplemented object when is given an array of variable length dtype? Consider this code: x = np.array(['xyz','zyx']) np.where(np.equal(x, 'zyx'), [0,0], [1,1]) the last line returns array([0, 0]) which is wrong. Compare with np.where(x == 'zyx', [0,0], [1,1]) I think in this case raising an exception would be better, because of the way np.equal is often used as an argument to another function. It's not easy to see where the bug is or even that there is a bug in your code. Cheers. Ernest From josef.pktd at gmail.com Tue Jan 26 08:59:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 26 Jan 2010 08:59:05 -0500 Subject: [Numpy-discussion] should numpy.equal raise an exception instead of returning NotImplemented? In-Reply-To: <20100126132815.GA7938@doriath.local> References: <20100126132815.GA7938@doriath.local> Message-ID: <1cd32cbb1001260559g1be33a50vdd86ab672bfc2e58@mail.gmail.com> 2010/1/26 Ernest Adrogu? 
: > Hi, > > Do you think it is sensible for np.equal to return a NotImplemented > object when is given an array of variable length dtype? > Consider this code: > > x = np.array(['xyz','zyx']) > np.where(np.equal(x, 'zyx'), [0,0], [1,1]) > > the last line returns array([0, 0]) which is wrong. Compare with > > np.where(x == 'zyx', [0,0], [1,1]) > > I think in this case raising an exception would be better, because > of the way np.equal is often used as an argument to another function. > It's not easy to see where the bug is or even that there is a bug > in your code. there was a thread on this on december 7, but maybe nobody filed a ticket. Josef > Cheers. > > Ernest > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From eadrogue at gmx.net Tue Jan 26 09:17:48 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 26 Jan 2010 15:17:48 +0100 Subject: [Numpy-discussion] should numpy.equal raise an exception instead of returning NotImplemented? In-Reply-To: <1cd32cbb1001260559g1be33a50vdd86ab672bfc2e58@mail.gmail.com> References: <20100126132815.GA7938@doriath.local> <1cd32cbb1001260559g1be33a50vdd86ab672bfc2e58@mail.gmail.com> Message-ID: <20100126141748.GA8878@doriath.local> 26/01/10 @ 08:59 (-0500), thus spake josef.pktd at gmail.com: > 2010/1/26 Ernest Adrogu? : > > Hi, > > > > Do you think it is sensible for np.equal to return a NotImplemented > > object when is given an array of variable length dtype? > > Consider this code: > > > > x = np.array(['xyz','zyx']) > > np.where(np.equal(x, 'zyx'), [0,0], [1,1]) > > > > the last line returns array([0, 0]) which is wrong. Compare with > > > > np.where(x == 'zyx', [0,0], [1,1]) > > > > I think in this case raising an exception would be better, because > > of the way np.equal is often used as an argument to another function. > > It's not easy to see where the bug is or even that there is a bug > > in your code. > > there was a thread on this on december 7, but maybe nobody filed a ticket. This ticket http://projects.scipy.org/numpy/changeset/7878 seems to imply that now ufuncs raise an exception, so maybe it's already been taken care of. Thanks. Ernest From curiousjan at gmail.com Tue Jan 26 11:22:24 2010 From: curiousjan at gmail.com (Jan Strube) Date: Tue, 26 Jan 2010 16:22:24 +0000 Subject: [Numpy-discussion] indexing, searchsorting, ... Message-ID: Dear Josef and Keith, thank you both for your suggestions. I think intersect would be what I want for it makes clean code. I have, however, spotted the problem: I was mistakenly under the assumption that random_integers returns unique entries, which is of course not guaranteed, so that the random sample contained duplicate entries. That's why the numpy methods returned results inconsistent with python 'in'. I'll have to be a bit smarter in the generation of the random sample. Good thing I try to do things in two different ways. (Sometimes it is, anyway...) Thanks again for your quick help. Cheers, Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: From nmb at wartburg.edu Tue Jan 26 11:46:00 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Tue, 26 Jan 2010 10:46:00 -0600 Subject: [Numpy-discussion] indexing, searchsorting, ... In-Reply-To: References: Message-ID: <4B5F1C48.40602@wartburg.edu> On 2010-01-26 10:22 , Jan Strube wrote: > Dear Josef and Keith, > > thank you both for your suggestions. 
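A minimal sketch of the in1d/intersect1d idea, with toy arrays invented here just for illustration (not the real data):

import numpy as np

raw = np.array([11, 22, 33, 44, 55])               # full sample
cut = np.array([22, 44])                           # subset of raw that passed the cuts
sample = raw[np.random.randint(0, len(raw), 3)]    # random subsample, may repeat entries
n_pass = np.in1d(sample, cut).sum()                # count how many sampled events also pass
common = np.intersect1d(sample, cut)               # the passing values themselves (unique)

in1d gives a boolean mask to count with, while intersect1d returns the common values.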
I think intersect would be what I > want for it makes clean code. > I have, however, spotted the problem: > I was mistakenly under the assumption that random_integers returns > unique entries, which is of course not guaranteed, so that the random > sample contained duplicate entries. > That's why the numpy methods returned results inconsistent with python 'in'. > I'll have to be a bit smarter in the generation of the random sample. > Good thing I try to do things in two different ways. (Sometimes it is, > anyway...) You probably know this, but the function sample in Python's random module does sample without replacement. In [1]: import random In [2]: random.sample([1,2,3],2) Out[2]: [2, 3] In [5]: random.sample([1,2,3],3) Out[5]: [1, 2, 3] In [7]: random.sample([1,2,3],4) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/nmb/ in () /Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/random.pyc in sample(self, population, k) 314 n = len(population) 315 if not 0 <= k <= n: --> 316 raise ValueError, "sample larger than population" 317 random = self.random 318 _int = int ValueError: sample larger than population In [9]: import numpy as np A = In [10]: A = np.arange(1000)**(3/2.) In [11]: A[random.sample(range(A.shape[0]),25)] Out[11]: array([ 12618.24425188, 30538.0882342 , 18361.74109392, 925.94546276, 2935.15331797, 4000.37598233, 21826.1206127 , 2618.9692629 , 868.08467329, 52.38320341, 12063.64687812, 29930.60881439, 12236.06517635, 10221.89370909, 2414.9534157 , 13039.6113439 , 22967.67537214, 15140.04385727, 2639.67251757, 26461.80402013, 3218.73142713, 15963.71209963, 11755.35677893, 11551.31295568, 29142.37675619]) -Neil From curiousjan at gmail.com Tue Jan 26 12:04:03 2010 From: curiousjan at gmail.com (Jan Strube) Date: Tue, 26 Jan 2010 17:04:03 +0000 Subject: [Numpy-discussion] indexing, searchsorting, ... In-Reply-To: References: Message-ID: Hi Neil, sure...I aeh, knew this...of course...[?] I'm using shuffle with a list of indices now... Thanks, Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 330.gif Type: image/gif Size: 96 bytes Desc: not available URL: From aisaac at american.edu Tue Jan 26 14:00:36 2010 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 26 Jan 2010 14:00:36 -0500 Subject: [Numpy-discussion] is shuffle needlessly slow? Message-ID: <4B5F3BD4.4070109@american.edu> Is this a fair test? I expected shuffle to be much faster (no array creation). Alan Isaac >>> import timeit >>> >>> setup = """ ... import numpy as np ... prng = np.random.RandomState() ... N = 10**5 ... indexes = np.arange(N) ... """ >>> >>> print timeit.timeit('prng.shuffle(indexes)',setup, number=100) 5.69172311006 >>> print timeit.timeit('indexes = prng.random_sample(N).argsort()',setup, number=100) 1.54648202495 From josef.pktd at gmail.com Tue Jan 26 14:20:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 26 Jan 2010 14:20:59 -0500 Subject: [Numpy-discussion] is shuffle needlessly slow? In-Reply-To: <4B5F3BD4.4070109@american.edu> References: <4B5F3BD4.4070109@american.edu> Message-ID: <1cd32cbb1001261120w682bbe93ne5e42fb5bbd192dd@mail.gmail.com> On Tue, Jan 26, 2010 at 2:00 PM, Alan G Isaac wrote: > Is this a fair test? > I expected shuffle to be much faster > (no array creation). > Alan Isaac > >>>> import timeit >>>> >>>> setup = """ > ... 
import numpy as np > ... prng = np.random.RandomState() > ... N = 10**5 > ... indexes = np.arange(N) > ... """ >>>> >>>> print timeit.timeit('prng.shuffle(indexes)',setup, number=100) > 5.69172311006 >>>> print timeit.timeit('indexes = prng.random_sample(N).argsort()',setup, number=100) > 1.54648202495 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > maybe because shuffle works with python objects and random_sample with floats >>> a=['a','bb','cc','c','a'];np.random.shuffle(a);a ['c', 'a', 'bb', 'a', 'cc'] >>> np.random.random_sample(5) array([ 0.02791159, 0.8451104 , 0.51629232, 0.15428393, 0.39491844]) Josef >>> From aisaac at american.edu Tue Jan 26 14:23:24 2010 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 26 Jan 2010 14:23:24 -0500 Subject: [Numpy-discussion] is shuffle needlessly slow? In-Reply-To: <4B5F3BD4.4070109@american.edu> References: <4B5F3BD4.4070109@american.edu> Message-ID: <4B5F412C.6060402@american.edu> On 1/26/2010 2:00 PM, Alan G Isaac wrote: > Is this a fair test? > I expected shuffle to be much faster > (no array creation). > Alan Isaac > >>>> import timeit >>>> >>>> setup = """ > ... import numpy as np > ... prng = np.random.RandomState() > ... N = 10**5 > ... indexes = np.arange(N) > ... """ >>>> >>>> print timeit.timeit('prng.shuffle(indexes)',setup, number=100) > 5.69172311006 >>>> print timeit.timeit('indexes = prng.random_sample(N).argsort()',setup, number=100) > 1.54648202495 I suppose that is not fair. But how about this? >>> print timeit.timeit('indexes[prng.random_sample(N).argsort()]=indexes',setup, number=100) 1.76073257914 Alan Isaac From wfspotz at sandia.gov Tue Jan 26 16:46:16 2010 From: wfspotz at sandia.gov (Bill Spotz) Date: Tue, 26 Jan 2010 16:46:16 -0500 Subject: [Numpy-discussion] numpy.i In-Reply-To: <38aeac3c1001252242u4b16a8e9i19ef94eb1552f65c@mail.gmail.com> References: <38aeac3c1001252242u4b16a8e9i19ef94eb1552f65c@mail.gmail.com> Message-ID: Can you post a simple example of this not working? On Jan 26, 2010, at 1:42 AM, Ross Harder wrote: > I'm struggling with using some of the macros in numpy.i for my own > typemap. > The problem is that the arrayobject.h include does not end up in the c > wrapper code after swig runs. > numpy.i has at the beginning: > %{ > #ifndef SWIG_FILE_WITH_INIT > # define NO_IMPORT_ARRAY > #endif > #include "stdio.h" > #include > %} > > I tried explicitly putting %{#include "arrayobject.h"%} into the > interface file. It ends up after the fragment code in the wrapper so > the compiler is complaining. > > thanks, > ross > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From david at silveregg.co.jp Tue Jan 26 23:24:29 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Wed, 27 Jan 2010 13:24:29 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits Message-ID: <4B5FBFFD.3000305@silveregg.co.jp> Hi, I have investigated further the ABI issues reported for numpy 1.4.0. I can confirm that we have broken the ABI for 1.4.0 compared to 1.3.0 (besides the "trivial" cython issue). 
The good news is that I have found the issue, the bad news is that I don't know how to (cleanly) fix it. The problem was caused by the new datetime support, in particular the structure PyArray_ArrFuncs, which has been modified in an ABI-incompatible way (the first member, cast, is bigger because NTYPES is bigger). I don't know how to fix this cleanly - the only solution I can see is to split cast into two parts, the first part the same size as before and the second part at the end of the structure, with a double-case test every time the cast member is accessed inside the relevant functions.... As an aside, I would like to reiterate my advice for *small* commits. It took me nearly 2 hours to find this because the offending commit was > 4000 LOC, and it would have been very easy to find had the code been committed as a set of small self-contained commits. thanks, David From charlesr.harris at gmail.com Tue Jan 26 23:50:46 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jan 2010 21:50:46 -0700 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B5FBFFD.3000305@silveregg.co.jp> References: <4B5FBFFD.3000305@silveregg.co.jp> Message-ID: On Tue, Jan 26, 2010 at 9:24 PM, David Cournapeau wrote: > Hi, > > I have investigated further the ABI issues reported for numpy 1.4.0. I > can confirm that we have broken the ABI for 1.4.0 compared to 1.3.0 > (besides the "trivial" cython issue). The good news is that I have found > the issue, the bad news is that I don't know how to (cleanly) fix it. > >
>> >> The problem was caused by the new datetime support, in particular the >> structure PyArray_ArrFuncs which has been modified in a ABI-incompatible >> way (the first member cast is bigger because NTYPES is bigger). >> >> > Hmm, nasty. I don't like that structure anyway, it should be a pointer to a > structure, or somehow not there in the first place. Yeah, it's a > catastrophic "solution". Probably the only compatible fixes are: 1) remove > the new function, 2) put it at the end of the enclosing structure, 3) live > with the ABI breakage. The last is the easiest way to go for us, if not for > others. The first solves the problem, but pretty much vitiates the datetime > work. And moving the function leads to all sorts of nasty work arounds and > code fixes. > > Whatever we do, it would be good to figure out some way to avoid this > problem in the future. We could hide access to the array, for instance. But > again, that would require a lot of other code mods. Hmm... > > Thinking a bit more, for 1.4.1 I think we should just remove the function. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Wed Jan 27 00:01:42 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Wed, 27 Jan 2010 14:01:42 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: References: <4B5FBFFD.3000305@silveregg.co.jp> Message-ID: <4B5FC8B6.6080405@silveregg.co.jp> Charles R Harris wrote: > > > Thinking a bit more, for 1.4.1 I think we should just remove the function. This was rejected last time I suggested it, though :) David From david at silveregg.co.jp Wed Jan 27 00:02:49 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Wed, 27 Jan 2010 14:02:49 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: References: <4B5FBFFD.3000305@silveregg.co.jp> Message-ID: <4B5FC8F9.6000900@silveregg.co.jp> Charles R Harris wrote: > > Whatever we do, it would be good to figure out some way to avoid this > problem in the future. We could hide access to the array, for instance. > But again, that would require a lot of other code mods. Hmm... That's something that we have to do at some point if we care about ABI (I think we should care - expecting people to recompile all the extensions for a new version of numpy is a big hindrance). Assuming python 1.5 will have py3k support, I was wondering about starting working on NumPy 2.0, with massive changes to the C API so that we can avoid this problem in the future: no more "naked" structures, much cleaner/leaner headers to avoid accidental reliance on specific private binary layouts, etc... David From charlesr.harris at gmail.com Wed Jan 27 00:15:51 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jan 2010 22:15:51 -0700 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B5FC8F9.6000900@silveregg.co.jp> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> Message-ID: On Tue, Jan 26, 2010 at 10:02 PM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > Whatever we do, it would be good to figure out some way to avoid this > > problem in the future. We could hide access to the array, for instance. > > But again, that would require a lot of other code mods. Hmm... 
> > That's something that we have to do at some point if we care about ABI > (I think we should care - expecting people to recompile all the > extensions for a new version of numpy is a big hindrance). > > Assuming python 1.5 will have py3k support, I was wondering about > starting working on NumPy 2.0, with massive changes to the C API so that > we can avoid this problem in the future: no more "naked" structures, > much cleaner/leaner headers to avoid accidental reliance on specific > private binary layouts, etc... > > NumPy 2.0 is going to be a *lot* of work. And I've been thinking about it lately, mostly because I was looking over the same code where you found this problem. What I didn't know was how public the code was. Good find, BTW. One thought was to start by separating out the ufuncs and their dependency on ndarrays. But then I looked at the new buffer interface and it just won't do as a replacement, no complex numbers, etc. Maybe it can be extended. Anyway, if we make a move it needs to be well planned. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Wed Jan 27 03:48:37 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 27 Jan 2010 09:48:37 +0100 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> Message-ID: <4B5FFDE5.2090502@student.matnat.uio.no> Charles R Harris wrote: > > > On Tue, Jan 26, 2010 at 10:02 PM, David Cournapeau > > wrote: > > Charles R Harris wrote: > > > > > Whatever we do, it would be good to figure out some way to avoid > this > > problem in the future. We could hide access to the array, for > instance. > > But again, that would require a lot of other code mods. Hmm... > > That's something that we have to do at some point if we care about ABI > (I think we should care - expecting people to recompile all the > extensions for a new version of numpy is a big hindrance). > > Assuming python 1.5 will have py3k support, I was wondering about > starting working on NumPy 2.0, with massive changes to the C API > so that > we can avoid this problem in the future: no more "naked" structures, > much cleaner/leaner headers to avoid accidental reliance on specific > private binary layouts, etc... > > > NumPy 2.0 is going to be a *lot* of work. And I've been thinking about > it lately, mostly because I was looking over the same code where you > found this problem. What I didn't know was how public the code was. > Good find, BTW. > > One thought was to start by separating out the ufuncs and their > dependency on ndarrays. But then I looked at the new buffer interface > and it just won't do as a replacement, no complex numbers, etc. Maybe > it can be extended. Anyway, if we make a move it needs to be well planned. Huh? The PEP 3118 buffer format strings "Zf", "Zd", "Zg" are respectively complex float, double, long double. Any other reasons PEP 3118 can't be used? Not saying I believe there isn't, I'm just curious... 
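For what it's worth, once ndarray exports the new buffer interface this is easy to poke at from Python itself. A small sketch (it assumes an interpreter and a NumPy build that already implement PEP 3118):

import numpy as np

z = np.zeros(3, dtype=np.complex128)
mv = memoryview(z)        # only works where ndarray speaks the new buffer protocol
print(mv.format)          # expected 'Zd': 'Z' flags a complex pair, 'd' the double base type
print(mv.itemsize)        # 16, i.e. two C doubles per element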
Dag Sverre From gregor.thalhammer at gmail.com Wed Jan 27 08:57:57 2010 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Wed, 27 Jan 2010 14:57:57 +0100 Subject: [Numpy-discussion] glumpy, fast opengl visualization In-Reply-To: <1264416365.376.19.camel@sulfur> References: <1264416365.376.19.camel@sulfur> Message-ID: <42de02941001270557y3e107debp415a787dcf84e6fb@mail.gmail.com> 2010/1/25 Nicolas Rougier : > > > Hello, > > This is an update about glumpy, a fast-OpenGL based numpy visualization. > I modified the code such that the only dependencies are PyOpenGL and > IPython (for interactive sessions). You will also need matplotlib and > scipy for some demos. > > Sources: hg clone http://glumpy.googlecode.com/hg/ glumpy > No installation required, you can run all demos inplace. > > Homepage: http://code.google.com/p/glumpy/ > Hi Nicolas, thank you for providing glumpy. I started using it for my own project. The previous, pyglet based version of glumpy worked flawlessly on my system (WinXP). I want to report about problems with the latest version. 1.) On Windows glumpy fails on importing the Python termios module, since it is for Unix platforms only. 2.) Resolving this, the demos start, but are mostly not usable, since mouse scroll events and passive mouse motions events are not created. This might be a problem of the glut implementation on Windows (I am using PyOpenGL 3.0.1b2). Who knows? 3.) On OpenSuse11.1 the demos fail at 'glCreateProgram'. The demos used to work with the previous version. I use the Intel Q45 graphics chipset. In the future I might spend some time on this problems and I would like to contribute to glumpy, even if it's only testing on platforms available to me. Gregor From josef.pktd at gmail.com Wed Jan 27 10:39:50 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 27 Jan 2010 10:39:50 -0500 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? Message-ID: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> Can we/someone add a warning on the front page http://scipy.org/ (maybe under news for numpy download) about incompatibility of the binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? It would avoid that users have to find it out for themselves and reduce questions on the mailing list. An thanks a lot to David for hunting this down. Especially for users like me who have to rely on (some) precompiled binary distributions. Josef From charlesr.harris at gmail.com Wed Jan 27 11:04:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jan 2010 09:04:32 -0700 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B5FFDE5.2090502@student.matnat.uio.no> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B5FFDE5.2090502@student.matnat.uio.no> Message-ID: On Wed, Jan 27, 2010 at 1:48 AM, Dag Sverre Seljebotn < dagss at student.matnat.uio.no> wrote: > Charles R Harris wrote: > > > > > > On Tue, Jan 26, 2010 at 10:02 PM, David Cournapeau > > > wrote: > > > > Charles R Harris wrote: > > > > > > > > Whatever we do, it would be good to figure out some way to avoid > > this > > > problem in the future. We could hide access to the array, for > > instance. > > > But again, that would require a lot of other code mods. Hmm... 
> > > > That's something that we have to do at some point if we care about > > ABI > > (I think we should care - expecting people to recompile all the > > extensions for a new version of numpy is a big hindrance). > > > > Assuming python 1.5 will have py3k support, I was wondering about > > starting working on NumPy 2.0, with massive changes to the C API > > so that > > we can avoid this problem in the future: no more "naked" structures, > > much cleaner/leaner headers to avoid accidental reliance on specific > > private binary layouts, etc... > > > > > > NumPy 2.0 is going to be a *lot* of work. And I've been thinking about > > it lately, mostly because I was looking over the same code where you > > found this problem. What I didn't know was how public the code was. > > Good find, BTW. > > > > One thought was to start by separating out the ufuncs and their > > dependency on ndarrays. But then I looked at the new buffer interface > > and it just won't do as a replacement, no complex numbers, etc. Maybe > > it can be extended. Anyway, if we make a move it needs to be well > planned. > Huh? The PEP 3118 buffer format strings "Zf", "Zd", "Zg" are > respectively complex float, double, long double. > > Any other reasons PEP 3118 can't be used? Not saying I believe there > isn't, I'm just curious... > > I wasn't looking at the PEP, I was looking at the python 3.x documentation which claims that the type strings used the same notation as the struct module:

    const char *format
        A *NULL* terminated string in struct module style syntax giving the contents of the elements available through the buffer. If this is *NULL*, "B" (unsigned bytes) is assumed.

I assumed that the PEP would be more compatible since Travis put it together and that it was changed on the journey to python inclusion. It could also be the case that the python documentation isn't correct ;) But if we go over to a buffer interface we need to use what was in the PEP. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Wed Jan 27 11:05:05 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 27 Jan 2010 10:05:05 -0600 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B5FC8F9.6000900@silveregg.co.jp> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> Message-ID: On Tue, Jan 26, 2010 at 11:02 PM, David Cournapeau wrote: > Charles R Harris wrote: > >> >> Whatever we do, it would be good to figure out some way to avoid this >> problem in the future. We could hide access to the array, for instance. >> But again, that would require a lot of other code mods. Hmm... > > That's something that we have to do at some point if we care about ABI > (I think we should care - expecting people to recompile all the > extensions for a new version of numpy is a big hindrance). > > Assuming python 1.5 will have py3k support, I was wondering about > starting working on NumPy 2.0, with massive changes to the C API so that > we can avoid this problem in the future: no more "naked" structures, > much cleaner/leaner headers to avoid accidental reliance on specific > private binary layouts, etc... > > David Numpy 1.5? :-) That was an incredible effort!
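To make the layout problem concrete, here is a toy ctypes sketch. The struct, field names and element counts are all invented stand-ins, only the effect matters: growing an array member at the front of a struct shifts every member behind it, so binaries built against the old header read the later slots at the wrong offsets.

import ctypes

class FuncsOld(ctypes.Structure):
    # stand-in for the old layout: one 'cast' slot per dtype
    _fields_ = [("cast", ctypes.c_void_p * 21),
                ("getitem", ctypes.c_void_p)]

class FuncsNew(ctypes.Structure):
    # the type count grew, so the array at the front grew with it
    _fields_ = [("cast", ctypes.c_void_p * 23),
                ("getitem", ctypes.c_void_p)]

print(FuncsOld.getitem.offset, FuncsNew.getitem.offset)  # the offsets no longer agree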
My understanding is that a minor numpy release should not break the ABI and a major release is required when there is an ABI breakage.Thus, this ABI change must be in numpy 2.0 and not allowed in the numpy 1.x series unless the changes can not be 'easily' made in a way that does not break the 1.x series ABI. Alternatively, just acknowledge the fact as a unintended consequence and move on - which has happened before in numpy for a similar situation (see links below). Recent comments appeared in David's thread 'Going toward time-based release ?' http://article.gmane.org/gmane.comp.python.numeric.general/21368 Especially Robert's and Jarrod's responses in the sub-thread: http://article.gmane.org/gmane.comp.python.numeric.general/21378 Hopefully some users of the numpy ABI can provide some feedback on their needs. Just my 2 cents, Bruce From amcmorl at gmail.com Wed Jan 27 11:21:02 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Wed, 27 Jan 2010 11:21:02 -0500 Subject: [Numpy-discussion] np.ma.apply_along_axis mask propagation Message-ID: Hi all, I'm having an issue with np.ma.apply_along_axis not propagating a mask correctly. This code-snippet should hopefully show what's happening: -------------------- import numpy as np xy = np.random.random(size=(5,2)) mask = np.tile(np.array([True] * 3 + [False] * 2)[:,None], (1,2)) xyma = np.ma.array(xy, mask=mask) def myfunc(vec): x,y = vec return x,y xyma2 = np.ma.apply_along_axis(myfunc, 1, xyma) tst = "np.all(asarray(myfunc(xyma[1])).mask == xyma2[1].mask)" print tst, ":", eval(tst) -> np.all(asarray(myfunc(xyma[1])).mask == xyma2[1].mask) : False -------------------- The point here is not that xyma.mask != xyma2.mask, but that xyma2.mask != the output of myfunc run on the individual rows, which it seems like it should. The following simple change seems to fix this. --- /usr/lib/python2.6/dist-packages/numpy/ma/extras.py 2009-04-05 04:09:20.000000000 -0400 +++ tmp/extras.py 2010-01-27 10:45:10.000000000 -0500 @@ -322,7 +322,7 @@ n -= 1 i.put(indlist, ind) j.put(indlist, ind) - res = func1d(arr[tuple(i.tolist())], *args, **kwargs) + res = asarray(func1d(arr[tuple(i.tolist())], *args, **kwargs)) outarr[tuple(flatten_inplace(j.tolist()))] = res dtypes.append(asarray(res).dtype) k += 1 Does this seem like an improvement? I haven't explored performance issues yet, pending sanity check. Thanks, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From rpg.314 at gmail.com Wed Jan 27 14:50:12 2010 From: rpg.314 at gmail.com (Rohit Garg) Date: Thu, 28 Jan 2010 01:20:12 +0530 Subject: [Numpy-discussion] [IPython-dev] glumpy, fast opengl visualization In-Reply-To: References: <1264416365.376.19.camel@sulfur> Message-ID: <4d5dd8c21001271150g70ea7a10jb46f6786e2715ad8@mail.gmail.com> Looks like the license change is done. http://code.google.com/p/glumpy/source/detail?r=79a5429ef1d5c57c5f97276bb39340ed1b808f9e On Tue, Jan 26, 2010 at 5:41 AM, Fernando Perez wrote: > Hi Nicolas, > > On Mon, Jan 25, 2010 at 2:46 AM, Nicolas Rougier > wrote: >> >> >> Hello, >> >> This is an update about glumpy, a fast-OpenGL based numpy visualization. >> I modified the code such that the only dependencies are PyOpenGL and >> IPython (for interactive sessions). You will also need matplotlib and >> scipy for some demos. >> >> Sources: hg clone http://glumpy.googlecode.com/hg/ glumpy >> No installation required, you can run all demos inplace. 
>> >> Homepage: http://code.google.com/p/glumpy/ > > This is great, and it would be very cool to have it updated to the new > code we're now landing in ipython with a much cleaner internal API > (finally :) ?Have you had a chance to look at the code in my trunk-dev > branch? > > https://code.launchpad.net/~fdo.perez/ipython/trunk-dev > > Brian finished a large review of it and we just had a chance to go > over his feedback directly, so there's now one more round of reviews > to do (once he applies the changes from our discussion) and this > should become trunk very soon. ?The apis are much cleaner, this is the > big cleanup I told you about last year, and now we're getting to the > point where having multiple ipython frontends is a very realistic > prospect. > > Unfortunately we won't be able to use your code directly in IPython as > it stands, since the GPL provisions in it would require us to GPL all > of IPython to make use of any of it directly in IPython. ?Your code > uses iptyhon, numpy, matplotlib and scipy (in some demos), which > amounts to hundreds of thousands of lines of code; here are the > sloccount outputs from their respective trunks: > > IPython > Totals grouped by language (dominant language first): > python: ? ? ? 47357 (99.24%) > lisp: ? ? ? ? ? 262 (0.55%) > sh: ? ? ? ? ? ? ?62 (0.13%) > objc: ? ? ? ? ? ?37 (0.08%) > > > Numpy > Totals grouped by language (dominant language first): > ansic: ? ? ? 152950 (67.19%) > python: ? ? ? 73188 (32.15%) > cpp: ? ? ? ? ? ?828 (0.36%) > fortran: ? ? ? ?298 (0.13%) > sh: ? ? ? ? ? ? 156 (0.07%) > pascal: ? ? ? ? 120 (0.05%) > f90: ? ? ? ? ? ? 97 (0.04%) > > Matplotlib > Totals grouped by language (dominant language first): > python: ? ? ? 83290 (52.64%) > cpp: ? ? ? ? ?68212 (43.11%) > objc: ? ? ? ? ?4517 (2.85%) > ansic: ? ? ? ? 2149 (1.36%) > sh: ? ? ? ? ? ? ?69 (0.04%) > > Scipy > Totals grouped by language (dominant language first): > cpp: ? ? ? ? 220149 (48.35%) > fortran: ? ? ?87240 (19.16%) > python: ? ? ? 79164 (17.38%) > ansic: ? ? ? ?68746 (15.10%) > sh: ? ? ? ? ? ? ?61 (0.01%) > > Glumpy: > Totals grouped by language (dominant language first): > python: ? ? ? ?3751 (100.00%) > > We're looking at ~300.000 lines of python alone in these tools. ?It's > unfortunately not realistic for us to consider GPL-ing them in order > to incorporate glumpy into the core set; it would be fantastic if you > were willing to consider licensing your code under a license that is > compatible with the body of work you are building on top of. > > You are obviously free to choose your license as you see fit, and end > users (myself included) will be always able to use glumpy along with > ipython, numpy, matplotlib and scipy. ?So *users* get all of the > benefit of your contribution, and for that I am the first to be > delighted and grateful that you've put your code out there. > > But as it stands, your code builds on close to half a million lines of > other code which can not benefit back from your contributions. ?If you > consider licensing glumpy to be compatible with ipython, numpy and > matplotlib, it would be possible to incorporate your ideas back into > those projects: perhaps in some places the right solution would be to > fix our own designs to better provide what glumpy needs, in other > cases we may find fixes you've made fit better upstream, etc. 
> > But this kind of collaboration will not be possible as long as glumpy > can benefit from our tools but our codes are not allowed to benefit > from glumpy (without changing licenses, which isn't going to happen). > > I hope you consider this from our perspective and in the most friendly > and open manner: I completely respect your right to license your own > code as you see fit (I've seen people put out GPL 'projects' that > effectively consist of 3 lines that import IPython and make a function > call, and that's OK too, and allowed by the license I chose to use). > The only reason I ask you is because I think your tool is very > interesting, and it would ultimately lead to a much more productive > relationship with ipython, numpy and matplotlib if it could be a > collaboration instead of a one-way benefit. > > Best regards, > > Fernando. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From david at silveregg.co.jp Wed Jan 27 19:57:37 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 09:57:37 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> Message-ID: <4B60E101.2000402@silveregg.co.jp> Bruce Southey wrote: > On Tue, Jan 26, 2010 at 11:02 PM, David Cournapeau > wrote: >> Charles R Harris wrote: >> >>> Whatever we do, it would be good to figure out some way to avoid this >>> problem in the future. We could hide access to the array, for instance. >>> But again, that would require a lot of other code mods. Hmm... >> That's something that we have to do at some point if we care about ABI >> (I think we should care - expecting people to recompile all the >> extensions for a new version of numpy is a big hindrance). >> >> Assuming python 1.5 will have py3k support, I was wondering about >> starting working on NumPy 2.0, with massive changes to the C API so that >> we can avoid this problem in the future: no more "naked" structures, >> much cleaner/leaner headers to avoid accidental reliance on specific >> private binary layouts, etc... >> >> David > > Numpy 1.5? :-) > > That was an incredible effort! > > My understanding is that a minor numpy release should not break the > ABI and a major release is required when there is an ABI > breakage. That's the usual practice, but not in numpy (neither in python BTW). The ABI is regularly broken in minor releases. > Hopefully some users of the numpy ABI can provide some feedback on their needs. Breaking the ABI simply means that every single package using the C API will have to be recompiled. This means for example that every windows/mac os x binary out there is broken (including scipy binaries). It is particularly painful if you need to have some packages with numpy 1.3.0 and some with 1.4.0. The idea is that in the current state, keeping the ABI is incredibly difficult, so we would need to severly change the C code (in backward incompatible ways, i.e. it would require to break the A*P*I as well) to control those issues later. Guido explicitly asked not to break compatibility while staying under py3k, so we should try to do it once numpy has been ported to py3k (e.g. 
if numpy 1.5 still is not py3k compatible, do a 1.6 before a 2.0 - iterate if necessary :) ). David From david at silveregg.co.jp Wed Jan 27 20:20:29 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 10:20:29 +0900 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> Message-ID: <4B60E65D.8050804@silveregg.co.jp> josef.pktd at gmail.com wrote: > Can we/someone add a warning on the front page http://scipy.org/ > (maybe under news for numpy download) about incompatibility of the > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? It seems that it will be quite difficult to fix the issue without removing something (I tried to use datetime as user types, but this opened a can of worms), so I am (quite reluctantly ) coming to the conclusion we should just bite the bullet and change the ABI number (so that importing anything will fail instead of crashing randomly). Something like numpy 1.4.0.1, which would just have a different ABI number than 1.4.0, without anything else. cheers, David From aisaac at american.edu Wed Jan 27 20:24:32 2010 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 27 Jan 2010 20:24:32 -0500 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B60E101.2000402@silveregg.co.jp> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B60E101.2000402@silveregg.co.jp> Message-ID: <4B60E750.4020708@american.edu> On 1/27/2010 7:57 PM, David Cournapeau wrote: > Guido explicitly asked not to break compatibility while staying under > py3k, so we should try to do it once numpy has been ported to py3k (e.g. > if numpy 1.5 still is not py3k compatible, do a 1.6 before a 2.0 - > iterate if necessary:) ). This sounds very different than http://www.artima.com/weblogs/viewpost.jsp?thread=227041 Can you provide a link? Thanks, Alan Isaac From david at silveregg.co.jp Wed Jan 27 20:28:03 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 10:28:03 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B60E750.4020708@american.edu> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B60E101.2000402@silveregg.co.jp> <4B60E750.4020708@american.edu> Message-ID: <4B60E823.9050800@silveregg.co.jp> Alan G Isaac wrote: > On 1/27/2010 7:57 PM, David Cournapeau wrote: >> Guido explicitly asked not to break compatibility while staying under >> py3k, so we should try to do it once numpy has been ported to py3k (e.g. >> if numpy 1.5 still is not py3k compatible, do a 1.6 before a 2.0 - >> iterate if necessary:) ). > > > This sounds very different than > http://www.artima.com/weblogs/viewpost.jsp?thread=227041 Maybe my English is broken, as I meant exactly the same as in Guido's post: do not break API (C API here) while porting to py3k. Making the NumPy C API robust to changes wo constantly breaking the ABI will require heavy changes to C structures and how they are exposed to 3rd parties. It is impossible to do without breaking the C API. 
cheers, David From josef.pktd at gmail.com Wed Jan 27 20:35:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 27 Jan 2010 20:35:05 -0500 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <4B60E65D.8050804@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> Message-ID: <1cd32cbb1001271735n5f0c41e8s673782bf8f0c407f@mail.gmail.com> On Wed, Jan 27, 2010 at 8:20 PM, David Cournapeau wrote: > josef.pktd at gmail.com wrote: >> Can we/someone add a warning on the front page http://scipy.org/ >> (maybe under news for numpy download) about incompatibility of the >> binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? > > It seems that it will be quite difficult to fix the issue without > removing something (I tried to use datetime as user types, but this > opened a can of worms), so I am (quite reluctantly ) coming to the > conclusion we should just bite the bullet and change the ABI number (so > that importing anything will fail instead of crashing randomly). > > Something like numpy 1.4.0.1, which would just have a different ABI > number than 1.4.0, without anything else. If you are also able to provide new scipy binaries, then at least the combination would be usable without intermittent import errors and crashes. Would the change in the ABI numer prevent some other programs that use numpy and are compiled against an older numpy, for me mainly matplotlib, from running? Josef > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david at silveregg.co.jp Wed Jan 27 20:38:06 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 10:38:06 +0900 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <1cd32cbb1001271735n5f0c41e8s673782bf8f0c407f@mail.gmail.com> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <1cd32cbb1001271735n5f0c41e8s673782bf8f0c407f@mail.gmail.com> Message-ID: <4B60EA7E.3090208@silveregg.co.jp> josef.pktd at gmail.com wrote: > > If you are also able to provide new scipy binaries, then at least the > combination would be usable without intermittent import errors and > crashes. The problem is that the new scipy would not be usable with older numpy. That's why breaking the ABI is so painful. > > Would the change in the ABI numer prevent some other programs that use > numpy and are compiled against an older numpy, for me mainly > matplotlib, from running? Not some, *all* of them (as long as they use the numpy C extension, that is - pure python are obviously unaffected). David From andyjian430074 at gmail.com Wed Jan 27 20:45:06 2010 From: andyjian430074 at gmail.com (Jankins) Date: Wed, 27 Jan 2010 19:45:06 -0600 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault Message-ID: <4B60EC22.5070001@gmail.com> Dear all, I am using scipy '0.8.0.dev6120'. And the scipy.sparse.eigen function always produces error message. 
_Description:_ linalg.eigen(A, k=6, M=None, sigma=None, which='LM', v0=None, ncv=None, maxiter=None, tol=0, return_eigenvectors=True)_ Error messages:_ When I use this function in the way : linalg.eigen(A, k=2, return_eigenvectors=False) it produces error : *** glibc detected *** python: double free or corruption (!prev) when I use : linalg.eigen(A, k=4, return_eigenvectors=False) or linalg.eigen(A, k=8, return_eigenvectors=False) it produces error : Segmentation fault _My goal:_ "A" is an _unsymmetrical CSR sparse matrix_. What I am trying to do is : 1. find a node 's' to delete. For edge (u, v) in all the out-edges and in-edges of node 's', I set A[u, v] = 0.0. 2. calculate the largest eigenvalue using linalg.eigen(A, return_eigenvectors=False) 3. repeat 1-2 steps many times. I had used eigen_symmetric function to compute symmetric CSR sparse matrix and it works very well. But for the 'eigen' function, it is not working very well. Could you please help me about it? If it does not work, I have to rewrite my code in MATLAB, which is what I am trying to avoid. Thanks so much. Yours sincerely, Jankins -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Wed Jan 27 20:47:33 2010 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 27 Jan 2010 20:47:33 -0500 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B60E823.9050800@silveregg.co.jp> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B60E101.2000402@silveregg.co.jp> <4B60E750.4020708@american.edu> <4B60E823.9050800@silveregg.co.jp> Message-ID: <4B60ECB5.9060502@american.edu> >> On 1/27/2010 7:57 PM, David Cournapeau wrote: >>> Guido explicitly asked not to break compatibility while staying under >>> py3k, so we should try to do it once numpy has been ported to py3k (e.g. >>> if numpy 1.5 still is not py3k compatible, do a 1.6 before a 2.0 - >>> iterate if necessary:) ). > Alan G Isaac wrote: >> This sounds very different than >> http://www.artima.com/weblogs/viewpost.jsp?thread=227041 On 1/27/2010 8:28 PM, David Cournapeau wrote: > Maybe my English is broken, as I meant exactly the same as in Guido's > post: do not break API (C API here) while porting to py3k. Making the > NumPy C API robust to changes wo constantly breaking the ABI will > require heavy changes to C structures and how they are exposed to 3rd > parties. It is impossible to do without breaking the C API. My reading is: do not see py3k as an opportunity for API breakage. So if breakage is know to be necessary, do it now in a forward looking way, so that it will not be necessary after moving to py3k. Quoting from http://www.artima.com/weblogs/viewpost.jsp?thread=227041 : "If you have make API changes, do them before you port to 3.0" I thought you were saying the opposite of that ... ? 
fwiw, Alan From david at silveregg.co.jp Wed Jan 27 20:56:25 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 10:56:25 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B60ECB5.9060502@american.edu> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B60E101.2000402@silveregg.co.jp> <4B60E750.4020708@american.edu> <4B60E823.9050800@silveregg.co.jp> <4B60ECB5.9060502@american.edu> Message-ID: <4B60EEC9.2050605@silveregg.co.jp> Alan G Isaac wrote: >>> On 1/27/2010 7:57 PM, David Cournapeau wrote: >>>> Guido explicitly asked not to break compatibility while staying under >>>> py3k, so we should try to do it once numpy has been ported to py3k (e.g. >>>> if numpy 1.5 still is not py3k compatible, do a 1.6 before a 2.0 - >>>> iterate if necessary:) ). > > >> Alan G Isaac wrote: >>> This sounds very different than >>> http://www.artima.com/weblogs/viewpost.jsp?thread=227041 > > > On 1/27/2010 8:28 PM, David Cournapeau wrote: >> Maybe my English is broken, as I meant exactly the same as in Guido's >> post: do not break API (C API here) while porting to py3k. Making the >> NumPy C API robust to changes wo constantly breaking the ABI will >> require heavy changes to C structures and how they are exposed to 3rd >> parties. It is impossible to do without breaking the C API. > > > > My reading is: do not see py3k as > an opportunity for API breakage. Yup. > > So if breakage is know to be necessary, > do it now in a forward looking way, > so that it will not be necessary > after moving to py3k. > > Quoting from http://www.artima.com/weblogs/viewpost.jsp?thread=227041 : > "If you have make API changes, do them before you port to 3.0" Ah, that's the misunderstanding: I think you focus on before vs after, but that's not the most important point. The full quote is " If you have make API changes, do them before you port to 3.0 -- release a version with the new API for Python 2.5, or 2.6 if you must. (Or do it later, after you've released a port to 3.0 without adding new features.)" What matters is not to do it at the same time, so that porting 3rd party code with 2to3 is possible (that's the bolded text). Since the py3k port is already underway, it seems natural to me to first release a py3k compatible release, and then a new numpy with incompatible API. OTOH, one could make the argument that releasing the API would avoid having to port numpy "twice" (first to py3k with say numpy 1.5.0, then to the new API for numpy 2.0). But I am not sure it is a big change in practice ? David From kwgoodman at gmail.com Wed Jan 27 21:10:43 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jan 2010 18:10:43 -0800 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays Message-ID: I recently opened sourced one of my packages. It is a labeled array that I call larry. A two-dimensional larry, for example, contains a 2d NumPy array with labels on each row and column. A larry can have any dimension. Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys. larry has built-in methods such as movingsum, ranking, merge, shuffle, zscore, demean, lag as well as typical NumPy methods like sum, max, std, sign, clip. NaNs are treated as missing data. You can archive larrys in HDF5 format using save and load or using a dictionary-like interface. I'm working towards a 0.1 release. 
In the meantime, comments, suggestions, critiques are all appreciated. To use larry you need Python and NumPy 1.4 or newer. To save and load larrys in HDF5 format, you need h5py with HDF5 1.8. larry currently contains no extensions, just Python code, so there is nothing to compile. Just save the la package and make sure Python can find it. docs http://larry.sourceforge.net code https://launchpad.net/larry From wesmckinn at gmail.com Wed Jan 27 21:33:32 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jan 2010 21:33:32 -0500 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: <6c476c8a1001271833m331828a1sfde1c8fe27a67ea6@mail.gmail.com> On Wed, Jan 27, 2010 at 9:10 PM, Keith Goodman wrote: > I recently opened sourced one of my packages. It is a labeled array > that I call larry. > > A two-dimensional larry, for example, contains a 2d NumPy array with > labels on each row and column. A larry can have any dimension. > > Alignment by label is automatic when you add (or subtract, multiply, > divide) two larrys. > > larry has built-in methods such as movingsum, ranking, merge, shuffle, > zscore, demean, lag as well as typical NumPy methods like sum, max, > std, sign, clip. NaNs are treated as missing data. > > You can archive larrys in HDF5 format using save and load or using a > dictionary-like interface. > > I'm working towards a 0.1 release. In the meantime, comments, > suggestions, critiques are all appreciated. > > To use larry you need Python and NumPy 1.4 or newer. To save and load > larrys in HDF5 format, you need h5py with HDF5 1.8. > > larry currently contains no extensions, just Python code, so there is > nothing to compile. Just save the la package and make sure Python can > find it. > > docs ?http://larry.sourceforge.net > code ?https://launchpad.net/larry > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Cool! Thanks for releasing. Looks like you're solving some similar problems to the ones I built pandas for (http://pandas.sourceforge.net). I'll have to have a closer look at the implementation to see if there are some design commonalities we can benefit from. - Wes From kwgoodman at gmail.com Wed Jan 27 21:57:41 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jan 2010 18:57:41 -0800 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: <6c476c8a1001271833m331828a1sfde1c8fe27a67ea6@mail.gmail.com> References: <6c476c8a1001271833m331828a1sfde1c8fe27a67ea6@mail.gmail.com> Message-ID: On Wed, Jan 27, 2010 at 6:33 PM, Wes McKinney wrote: > On Wed, Jan 27, 2010 at 9:10 PM, Keith Goodman wrote: >> I recently opened sourced one of my packages. It is a labeled array >> that I call larry. >> >> A two-dimensional larry, for example, contains a 2d NumPy array with >> labels on each row and column. A larry can have any dimension. >> >> Alignment by label is automatic when you add (or subtract, multiply, >> divide) two larrys. >> >> larry has built-in methods such as movingsum, ranking, merge, shuffle, >> zscore, demean, lag as well as typical NumPy methods like sum, max, >> std, sign, clip. NaNs are treated as missing data. >> >> You can archive larrys in HDF5 format using save and load or using a >> dictionary-like interface. >> >> I'm working towards a 0.1 release. In the meantime, comments, >> suggestions, critiques are all appreciated. 
>> >> To use larry you need Python and NumPy 1.4 or newer. To save and load >> larrys in HDF5 format, you need h5py with HDF5 1.8. >> >> larry currently contains no extensions, just Python code, so there is >> nothing to compile. Just save the la package and make sure Python can >> find it. >> >> docs ?http://larry.sourceforge.net >> code ?https://launchpad.net/larry >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Cool! Thanks for releasing. > > Looks like you're solving some similar problems to the ones I built > pandas for (http://pandas.sourceforge.net). I'll have to have a closer > look at the implementation to see if there are some design > commonalities we can benefit from. Yes, I hope we have some overlap so that we can share code. As far as design goes, larry contains a Numpy array for the data and a list of lists (one list for each dimension) for the labels. Most of the larry methods have underlying Numpy array functions that could easily be used by other projects. There are also functions for repacking HDF5 archives and for creating intermediate HDF5 Groups when saving a Dataset inside nested Groups. All this is transparent to the user but hopefully useful for other projects. From david at silveregg.co.jp Wed Jan 27 22:06:12 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 12:06:12 +0900 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault In-Reply-To: <4B60EC22.5070001@gmail.com> References: <4B60EC22.5070001@gmail.com> Message-ID: <4B60FF24.7040504@silveregg.co.jp> Jankins wrote: > Dear all, > > I am using scipy '0.8.0.dev6120'. And the scipy.sparse.eigen function > always produces error message. > > _Description:_ > linalg.eigen(A, k=6, M=None, sigma=None, which='LM', v0=None, > ncv=None, maxiter=None, tol=0, return_eigenvectors=True)_ Could you provide your platform details (i.e. OS, compiler, 32 vs 64 bits, the output of scipy.show_config()). This is needed to isolate the problem, cheers. David From pgmdevlist at gmail.com Wed Jan 27 22:13:10 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 Jan 2010 22:13:10 -0500 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote: > I recently opened sourced one of my packages. It is a labeled array > that I call larry. > > A two-dimensional larry, for example, contains a 2d NumPy array with > labels on each row and column. A larry can have any dimension. > > Alignment by label is automatic when you add (or subtract, multiply, > divide) two larrys. > > larry has built-in methods such as movingsum, ranking, merge, shuffle, > zscore, demean, lag as well as typical NumPy methods like sum, max, > std, sign, clip. NaNs are treated as missing data. So you can't have an integer larry with missing data ? > You can archive larrys in HDF5 format using save and load or using a > dictionary-like interface. > > I'm working towards a 0.1 release. In the meantime, comments, > suggestions, critiques are all appreciated. > I'll have to check it (hopefully I'll have a bit more time in the next couple of weeks), but what are the main differences/advantages of using your approach compared to pandas or tabular ? 
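(To expand on my first question above: NaN only exists for floating point types, so with NaN as the missing-value marker any integer data has to be upcast as soon as something is marked missing. A tiny sketch of what I mean:

import numpy as np

a = np.arange(5)              # integer array
a_missing = a.astype(float)   # must become float before NaN can be stored
a_missing[2] = np.nan
print(a_missing.dtype)        # float64 -- the original integer dtype is gone

That is exactly the kind of case masked arrays were designed for.)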
From andyjian430074 at gmail.com Wed Jan 27 22:21:26 2010 From: andyjian430074 at gmail.com (Jankins) Date: Wed, 27 Jan 2010 21:21:26 -0600 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault In-Reply-To: <4B60FF24.7040504@silveregg.co.jp> References: <4B60EC22.5070001@gmail.com> <4B60FF24.7040504@silveregg.co.jp> Message-ID: <4B6102B6.400@gmail.com> I tried on Ubuntu 9.10-32bit, gcc version 4.4.1, . Here is the information of show_config(): In [2]: scipy.show_config() umfpack_info: NOT AVAILABLE atlas_threads_info: NOT AVAILABLE blas_opt_info: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib'] define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] language = c include_dirs = ['/usr/include'] atlas_blas_threads_info: NOT AVAILABLE lapack_opt_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/atlas', '/usr/lib'] define_macros = [('ATLAS_INFO', '"\\"3.6.0\\""')] language = f77 include_dirs = ['/usr/include'] atlas_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/atlas', '/usr/lib'] language = f77 include_dirs = ['/usr/include'] lapack_mkl_info: NOT AVAILABLE blas_mkl_info: NOT AVAILABLE atlas_blas_info: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib'] language = c include_dirs = ['/usr/include'] mkl_info: NOT AVAILABLE Thanks so much. On 1/27/2010 9:06 PM, David Cournapeau wrote: > Jankins wrote: > >> Dear all, >> >> I am using scipy '0.8.0.dev6120'. And the scipy.sparse.eigen function >> always produces error message. >> >> _Description:_ >> linalg.eigen(A, k=6, M=None, sigma=None, which='LM', v0=None, >> ncv=None, maxiter=None, tol=0, return_eigenvectors=True)_ >> > Could you provide your platform details (i.e. OS, compiler, 32 vs 64 > bits, the output of scipy.show_config()). This is needed to isolate the > problem, > > cheers. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Wed Jan 27 22:24:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 27 Jan 2010 19:24:24 -0800 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: On Wed, Jan 27, 2010 at 7:13 PM, Pierre GM wrote: > On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote: >> I recently opened sourced one of my packages. It is a labeled array >> that I call larry. >> >> A two-dimensional larry, for example, contains a 2d NumPy array with >> labels on each row and column. A larry can have any dimension. >> >> Alignment by label is automatic when you add (or subtract, multiply, >> divide) two larrys. >> >> larry has built-in methods such as movingsum, ranking, merge, shuffle, >> zscore, demean, lag as well as typical NumPy methods like sum, max, >> std, sign, clip. NaNs are treated as missing data. > > So you can't have an integer larry with missing data ? No. >> You can archive larrys in HDF5 format using save and load or using a >> dictionary-like interface. >> >> I'm working towards a 0.1 release. In the meantime, comments, >> suggestions, critiques are all appreciated. >> > > I'll have to check it (hopefully I'll have a bit more time in the next couple of weeks), but what are the main differences/advantages of using your approach compared to pandas or tabular ? > I've tried to make larry behave as a numpy array user would expect. 
If, for example, you have a function, myfunc, that works on Numpy arrays and doesn't change the shape or ordering of the array, then you can use it on a larry, y, like this: y.x = myfunc(y.x). The main use case for a larry is when you want to work on the entire array, or a subset of it, all at once. Not so much if you only want to grab one row, for example, at a time. The internal structure of a larry (Numpy array + list) is easy to understand so it is easy to get going and to extend. From david at silveregg.co.jp Wed Jan 27 22:36:05 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 12:36:05 +0900 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault In-Reply-To: <4B6102B6.400@gmail.com> References: <4B60EC22.5070001@gmail.com> <4B60FF24.7040504@silveregg.co.jp> <4B6102B6.400@gmail.com> Message-ID: <4B610625.4060303@silveregg.co.jp> Jankins wrote: > I tried on Ubuntu 9.10-32bit, gcc version 4.4.1, . Here is the > information of show_config(): Sorry, I forgot an additional information, the exact atlas you are using. For example, assuming scipy is installed in /usr/local, I would need the output of /usr/local/lib/python2.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.so (you are using linalg.eigen from scipy.sparse, right ?). Ideally, if the matrix is not too big, having the matrix which crashes scipy is most helpful, thanks, David From josef.pktd at gmail.com Wed Jan 27 22:44:45 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 27 Jan 2010 22:44:45 -0500 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: <1cd32cbb1001271944u553e0136x8e1d1abbab4f89ac@mail.gmail.com> On Wed, Jan 27, 2010 at 10:13 PM, Pierre GM wrote: > On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote: >> I recently opened sourced one of my packages. It is a labeled array >> that I call larry. >> >> A two-dimensional larry, for example, contains a 2d NumPy array with >> labels on each row and column. A larry can have any dimension. >> >> Alignment by label is automatic when you add (or subtract, multiply, >> divide) two larrys. >> >> larry has built-in methods such as movingsum, ranking, merge, shuffle, >> zscore, demean, lag as well as typical NumPy methods like sum, max, >> std, sign, clip. NaNs are treated as missing data. > > So you can't have an integer larry with missing data ? > >> You can archive larrys in HDF5 format using save and load or using a >> dictionary-like interface. >> >> I'm working towards a 0.1 release. In the meantime, comments, >> suggestions, critiques are all appreciated. >> > > I'll have to check it (hopefully I'll have a bit more time in the next couple of weeks), but what are the main differences/advantages of using your approach compared to pandas or tabular ? 
In a very simplified characterization, my impression is they all try to do the same thing in different ways and with different emphasis pandas is a dictionary (not only), tabular are structured arrays, larry is an nd array by delegation, all of them for generic axis labels as far as I understand, all based on nan for missing values scikits.timeseries has the more elaborate time support and is based on masked arrays getitem and slicing work differently, another version of a labeled array is http://github.com/fperez/datarray/blob/master/datarray.py I'm trying to work on an example how we can move our data between the different implementations, because depending on the task one implementation or another might be more convenient. And there seems to be enough compatibility to do it without loss of information. Josef " http://esciencenews.com/articles/2010/01/19/too.many.choices.new.study.says.more.usually.better " > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From aisaac at american.edu Wed Jan 27 22:46:47 2010 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 27 Jan 2010 22:46:47 -0500 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B60EEC9.2050605@silveregg.co.jp> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B60E101.2000402@silveregg.co.jp> <4B60E750.4020708@american.edu> <4B60E823.9050800@silveregg.co.jp> <4B60ECB5.9060502@american.edu> <4B60EEC9.2050605@silveregg.co.jp> Message-ID: <4B6108A7.8090707@american.edu> On 1/27/2010 8:56 PM, David Cournapeau wrote: > one could make the argument that releasing the API would avoid > having to port numpy "twice" (first to py3k with say numpy 1.5.0, then > to the new API for numpy 2.0). But I am not sure it is a big change in > practice ? OK, I misunderstood: I thought you were proposing to change the API *only* for the py3k NumPy, effectively leaving the earlier Pythons orphaned. Sorry for the mistake. Alan From david at silveregg.co.jp Thu Jan 28 00:03:04 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 14:03:04 +0900 Subject: [Numpy-discussion] Fixing numpy 1.4.0 ABI breakage, and a plea for self-contained, small commits In-Reply-To: <4B6108A7.8090707@american.edu> References: <4B5FBFFD.3000305@silveregg.co.jp> <4B5FC8F9.6000900@silveregg.co.jp> <4B60E101.2000402@silveregg.co.jp> <4B60E750.4020708@american.edu> <4B60E823.9050800@silveregg.co.jp> <4B60ECB5.9060502@american.edu> <4B60EEC9.2050605@silveregg.co.jp> <4B6108A7.8090707@american.edu> Message-ID: <4B611A88.4000604@silveregg.co.jp> Alan G Isaac wrote: > On 1/27/2010 8:56 PM, David Cournapeau wrote: >> one could make the argument that releasing the API would avoid >> having to port numpy "twice" (first to py3k with say numpy 1.5.0, then >> to the new API for numpy 2.0). But I am not sure it is a big change in >> practice ? > > OK, I misunderstood: I thought you were proposing to change > the API *only* for the py3k NumPy, effectively leaving the > earlier Pythons orphaned. Ah, inddeed. Given the current adoption of py3k for libraries that matter to us, that would be insane :) David From josef.pktd at gmail.com Thu Jan 28 00:18:49 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 00:18:49 -0500 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? 
In-Reply-To: <4B60EA7E.3090208@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <1cd32cbb1001271735n5f0c41e8s673782bf8f0c407f@mail.gmail.com> <4B60EA7E.3090208@silveregg.co.jp> Message-ID: <1cd32cbb1001272118o71db6386w377fcb2e47b31a5e@mail.gmail.com> On Wed, Jan 27, 2010 at 8:38 PM, David Cournapeau wrote: > josef.pktd at gmail.com wrote: > >> >> If you are also able to provide new scipy binaries, then at least the >> combination would be usable without intermittent import errors and >> crashes. > > The problem is that the new scipy would not be usable with older numpy. > That's why breaking the ABI is so painful. > >> >> Would the change in the ABI numer prevent some other programs that use >> numpy and are compiled against an older numpy, for me mainly >> matplotlib, from running? > > Not some, *all* of them (as long as they use the numpy C extension, that > is - pure python are obviously unaffected). > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andyjian430074 at gmail.com Thu Jan 28 00:21:50 2010 From: andyjian430074 at gmail.com (Jankins) Date: Wed, 27 Jan 2010 23:21:50 -0600 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault In-Reply-To: <4B610625.4060303@silveregg.co.jp> References: <4B60EC22.5070001@gmail.com> <4B60FF24.7040504@silveregg.co.jp> <4B6102B6.400@gmail.com> <4B610625.4060303@silveregg.co.jp> Message-ID: <4B611EEE.8040500@gmail.com> Yes. I am using scipy.sparse.linalg.eigen.arpack. The exact output is: /usr/local/lib/python2.6/dist-packages/scipy/sparse/linalg/eigen/arpack/_arpack.so In fact, the matrix is from a directed graph with about 18,000 nodes and 41,000 edges. Actually, this matrix is the smallest one I used. Now I switch to use numpy.linalg.eigvals, but it is slower than scipy.sparse.linalg.eigen.arpack module. Thanks. Jankins On 1/27/2010 9:36 PM, David Cournapeau wrote: > the exact atlas you are > using. For example, assuming scipy is insta From david at silveregg.co.jp Thu Jan 28 01:11:25 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 15:11:25 +0900 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault In-Reply-To: <4B611EEE.8040500@gmail.com> References: <4B60EC22.5070001@gmail.com> <4B60FF24.7040504@silveregg.co.jp> <4B6102B6.400@gmail.com> <4B610625.4060303@silveregg.co.jp> <4B611EEE.8040500@gmail.com> Message-ID: <4B612A8D.3060401@silveregg.co.jp> Jankins wrote: > Yes. I am using scipy.sparse.linalg.eigen.arpack. > > The exact output is: > > /usr/local/lib/python2.6/dist-packages/scipy/sparse/linalg/eigen/arpack/_arpack.so I need the output of ldd on this file, actually, i.e the output of "ldd /usr/local/lib/python2.6/dist-packages/scipy/sparse/linalg/eigen/arpack/_arpack.so". It should output the libraries actually loaded by the OS. > In fact, the matrix is from a directed graph with about 18,000 nodes and > 41,000 edges. Actually, this matrix is the smallest one I used. Is it available somewhere ? 41000 edges should make the matrix very sparse. I first thought that your problem may be some buggy ATLAS, but the current arpack interface (the one used by sparse.linalg.eigen) is also quite buggy in my experience, though I could not reproduce it. Having a matrix which consistently reproduce the bug would be very useful. 
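If it is small enough to attach or put somewhere online, dumping it in Matrix Market format would be the easiest way to share it. A rough sketch, assuming A is the CSR matrix that triggers the crash:

from scipy import io, sparse

# save the offending matrix in a portable text format
io.mmwrite('crash_matrix.mtx', A)

# it can then be reloaded on the other end with
A2 = sparse.csr_matrix(io.mmread('crash_matrix.mtx'))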
In the short term, you may want to do without arpack support in scipy. In the longer term, I intend to improve support for sparse matrices linear algebra, as it is needed for my new job. > Now I switch to use numpy.linalg.eigvals, but it is slower than > scipy.sparse.linalg.eigen.arpack module. If you have a reasonable ATLAS install, scipy.linalg.eigvals should actually be quite fast. Sparse eigenvalues solver are much slower than full ones in general as long as: - your matrices are tiny (with tiny defined here as the plain matrix requiring one order of magnitude less memory than the total available memory, so something like matrices with ~ 1e7/1e8 entries on current desktop computers) - you need more than a few eigenvalues, or not just the biggest/smallest ones cheers, David From charlesr.harris at gmail.com Thu Jan 28 01:24:30 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jan 2010 23:24:30 -0700 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <4B60E65D.8050804@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> Message-ID: On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau wrote: > josef.pktd at gmail.com wrote: > > Can we/someone add a warning on the front page http://scipy.org/ > > (maybe under news for numpy download) about incompatibility of the > > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? > > It seems that it will be quite difficult to fix the issue without > removing something (I tried to use datetime as user types, but this > opened a can of worms), so I am (quite reluctantly ) coming to the > conclusion we should just bite the bullet and change the ABI number (so > that importing anything will fail instead of crashing randomly). > > Something like numpy 1.4.0.1, which would just have a different ABI > number than 1.4.0, without anything else. > > Why do you think it would be better to make this change in 1.4 rather than 1.5? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Thu Jan 28 01:39:57 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 15:39:57 +0900 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> Message-ID: <4B61313D.7080904@silveregg.co.jp> Charles R Harris wrote: > > > On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau > wrote: > > josef.pktd at gmail.com wrote: > > Can we/someone add a warning on the front page http://scipy.org/ > > (maybe under news for numpy download) about incompatibility of the > > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? > > It seems that it will be quite difficult to fix the issue without > removing something (I tried to use datetime as user types, but this > opened a can of worms), so I am (quite reluctantly ) coming to the > conclusion we should just bite the bullet and change the ABI number (so > that importing anything will fail instead of crashing randomly). > > Something like numpy 1.4.0.1, which would just have a different ABI > number than 1.4.0, without anything else. > > > Why do you think it would be better to make this change in 1.4 rather > than 1.5? Because then any extension fails to import with a clear message instead of crashing as it does now. 
It does not matter much if you know the crash is coming from an incompatible ABI, but it does if you don't :) cheers, David From charlesr.harris at gmail.com Thu Jan 28 02:26:44 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 28 Jan 2010 00:26:44 -0700 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <4B61313D.7080904@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> Message-ID: On Wed, Jan 27, 2010 at 11:39 PM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > > On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau > > wrote: > > > > josef.pktd at gmail.com wrote: > > > Can we/someone add a warning on the front page http://scipy.org/ > > > (maybe under news for numpy download) about incompatibility of the > > > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? > > > > It seems that it will be quite difficult to fix the issue without > > removing something (I tried to use datetime as user types, but this > > opened a can of worms), so I am (quite reluctantly ) coming to the > > conclusion we should just bite the bullet and change the ABI number > (so > > that importing anything will fail instead of crashing randomly). > > > > Something like numpy 1.4.0.1, which would just have a different ABI > > number than 1.4.0, without anything else. > > > > > > Why do you think it would be better to make this change in 1.4 rather > > than 1.5? > > Because then any extension fails to import with a clear message instead > of crashing as it does now. It does not matter much if you know the > crash is coming from an incompatible ABI, but it does if you don't :) > > But why not remove the change? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Thu Jan 28 02:33:18 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 28 Jan 2010 16:33:18 +0900 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> Message-ID: <4B613DBE.9040509@silveregg.co.jp> Charles R Harris wrote: > > > On Wed, Jan 27, 2010 at 11:39 PM, David Cournapeau > > wrote: > > Charles R Harris wrote: > > > > > > On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau > > > >> wrote: > > > > josef.pktd at gmail.com > > wrote: > > > Can we/someone add a warning on the front page > http://scipy.org/ > > > (maybe under news for numpy download) about > incompatibility of the > > > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? > > > > It seems that it will be quite difficult to fix the issue without > > removing something (I tried to use datetime as user types, > but this > > opened a can of worms), so I am (quite reluctantly ) coming > to the > > conclusion we should just bite the bullet and change the ABI > number (so > > that importing anything will fail instead of crashing randomly). > > > > Something like numpy 1.4.0.1, which would just have a > different ABI > > number than 1.4.0, without anything else. > > > > > > Why do you think it would be better to make this change in 1.4 rather > > than 1.5? > > Because then any extension fails to import with a clear message instead > of crashing as it does now. 
It does not matter much if you know the > crash is coming from an incompatible ABI, but it does if you don't :) > > > But why not remove the change? Because Travis was against it when it was suggested last september or so. And removing in 1.4.x a feature introduced in 1.4.0 is weird. David From robert.kiwanuka at gmail.com Thu Jan 28 07:50:03 2010 From: robert.kiwanuka at gmail.com (Robert Kiwanuka) Date: Thu, 28 Jan 2010 12:50:03 +0000 Subject: [Numpy-discussion] is there any alternative to savefig? In-Reply-To: References: Message-ID: Hi all, I wonder if anyone knows any alternative function in pylab (or otherwise) that could be used to save an image. My problem is as follows: --------------- from pylab import * ... figure(1) fig1 = gca() figure(2) fig2 = gca() figure(3) fig3 = gca() for i,data_file in enumerate(data_file_list): time,x, y,x2, y2 = read_csv_file_4(open (data_file),elements=num_of_ elements) fig1.plot(-x,-y,color=colours[i],label=labellist[i]) fig2.plot(time,-y,color=colours[i],label=labellist[i]) fig3.plot(time,-x,color=colours[i],label=labellist[i]) fig1.legend(loc='best') fig1.set_title("y1 - x1") fig1.set_ylabel("y1") fig1.set_xlabel("x1") #savefig("y1-x1.png") fig2.legend(loc='best') fig2.set_title("y1 - time") fig2.set_ylabel("y1") fig2.set_xlabel("time[s]") #savefig("y1-time.png") fig3.legend(loc='best') fig3.set_title("x1 - time") fig3.set_ylabel("x1") fig3.set_xlabel("time[s]") #savefig("x1-time.png") show() --------------------------- In the above code, I read multiple data files and plot three separate figures. Now I would like to save each of the figures to a file as the commented savefig satements suggest. The trouble is that if I uncomment all those savefig statements, I get three saved images all containing the plot belonging to figure(3), which was the last figure declared. I understand this to be happening because savefig will save the "current" figure, which in this case happens to be the last one declared. If I could do something like fig1.savefig("y1-x1.png") or savefig("y1- x1.png").fig1, this would solve the problem but I'm not aware of any such methods or modules to enable this. This is thus a flaw in the general design/implementation of the savefig function, but is there an alternative function to enable me achieve what I need? Is there perhaps a possible tweak to savefig to make it do the same? Thanks in advance, Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jan 28 07:57:08 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 28 Jan 2010 14:57:08 +0200 Subject: [Numpy-discussion] is there any alternative to savefig? In-Reply-To: References: Message-ID: <1264683428.16723.18.camel@talisman> to, 2010-01-28 kello 12:50 +0000, Robert Kiwanuka kirjoitti: [clip] > If I could do something like fig1.savefig("y1-x1.png") or savefig("y1- > x1.png").fig1, this would solve the problem but I'm not aware of any > such methods or modules to enable this. This is thus a flaw in the > general design/implementation of the savefig function, but is there an > alternative function to enable me achieve what I need? Is there > perhaps a possible tweak to savefig to make it do the same? Well, you almost had the answer there: use fig1fig = figure(1) ... fig1fig.savefig("y1-x1.png") fig2fig.savefig("y1-time.png") fig3fig.savefig("x1-time.png") to save the respective figures. 
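Spelled out with the loop from your script it would look roughly like this (an untested sketch, reusing your read_csv_file_4, data_file_list, num_of_elements, colours and labellist as they are):

from pylab import figure, show

fig1fig = figure(1)
fig1 = fig1fig.gca()
fig2fig = figure(2)
fig2 = fig2fig.gca()
fig3fig = figure(3)
fig3 = fig3fig.gca()

for i, data_file in enumerate(data_file_list):
    time, x, y, x2, y2 = read_csv_file_4(open(data_file), elements=num_of_elements)
    fig1.plot(-x, -y, color=colours[i], label=labellist[i])
    fig2.plot(time, -y, color=colours[i], label=labellist[i])
    fig3.plot(time, -x, color=colours[i], label=labellist[i])

fig1.set_title("y1 - x1")
fig1fig.savefig("y1-x1.png")   # Figure.savefig saves *this* figure, whatever is "current"
fig2.set_title("y1 - time")
fig2fig.savefig("y1-time.png")
fig3.set_title("x1 - time")
fig3fig.savefig("x1-time.png")
show()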
There is a separate mailing list for Matplotlib-specific questions: http://sourceforge.net/mail/?group_id=80706 -- Pauli Virtanen From dagss at student.matnat.uio.no Thu Jan 28 08:00:25 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 28 Jan 2010 14:00:25 +0100 Subject: [Numpy-discussion] is there any alternative to savefig? In-Reply-To: References: Message-ID: <4B618A69.5030701@student.matnat.uio.no> Robert Kiwanuka wrote: > Hi all, > > I wonder if anyone knows any alternative function in pylab (or > otherwise) that could be used to save an image. My problem is as > follows: > > --------------- > from pylab import * > ... > > figure(1) > fig1 = gca() > figure(2) > fig2 = gca() > figure(3) > fig3 = gca() You should not use the pylab interface for stuff like this. This is much easier if you get rid of the notion of "current plot". from matplotlib import pyplot as plt fig1 = plt.figure() ax1 = fig1.add_subplot(111) ax1.plot_something... fig1.savefig(...) Etc., see matplotlib docs. Dag Sverre From josef.pktd at gmail.com Thu Jan 28 09:01:13 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 09:01:13 -0500 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <4B61313D.7080904@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> Message-ID: <1cd32cbb1001280601g54957f1aq4cbce3fc6b5c25ec@mail.gmail.com> On Thu, Jan 28, 2010 at 1:39 AM, David Cournapeau wrote: > Charles R Harris wrote: >> >> >> On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau > > wrote: >> >> ? ? josef.pktd at gmail.com wrote: >> ? ? ?> Can we/someone add a warning on the front page http://scipy.org/ >> ? ? ?> (maybe under news for numpy download) about incompatibility of the >> ? ? ?> binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? >> >> ? ? It seems that it will be quite difficult to fix the issue without >> ? ? removing something (I tried to use datetime as user types, but this >> ? ? opened a can of worms), so I am (quite reluctantly ) coming to the >> ? ? conclusion we should just bite the bullet and change the ABI number (so >> ? ? that importing anything will fail instead of crashing randomly). >> >> ? ? Something like numpy 1.4.0.1, which would just have a different ABI >> ? ? number than 1.4.0, without anything else. >> >> >> Why do you think it would be better to make this change in 1.4 rather >> than 1.5? > > Because then any extension fails to import with a clear message instead > of crashing as it does now. It does not matter much if you know the > crash is coming from an incompatible ABI, but it does if you don't :) I thought we could get away with a small binary incompatibility, without rebuilding everything. I'm using matplotlib although not extensively and it didn't crash in a while. (I don't remember which version of scipy I used for the last time when I had a crashing script.) I just tried to build h5py which does not import at all with 1.4.0, but I only get compiler errors about using headers only with Visual C++ and conflicting types for 'ssize_t' . Is there a way to find out which extensions use the binary incompatible part. Since it took a long time to confirm the ABI breakage, I would think it's not in a heavily used part. Although, I'm not sure since I had to rebuild for example most scikits with 1.4 either because of the cython issue or because of this. 
Personally, I would remove the ABI breakage for 1.4.1, I rather have a working SciPy than a new "experimental" feature. That's my opinion as a consumer of binary distributions. Josef > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From dagss at student.matnat.uio.no Thu Jan 28 09:09:41 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 28 Jan 2010 15:09:41 +0100 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <1cd32cbb1001280601g54957f1aq4cbce3fc6b5c25ec@mail.gmail.com> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> <1cd32cbb1001280601g54957f1aq4cbce3fc6b5c25ec@mail.gmail.com> Message-ID: <4B619AA5.2090601@student.matnat.uio.no> josef.pktd at gmail.com wrote: > On Thu, Jan 28, 2010 at 1:39 AM, David Cournapeau wrote: > >> Charles R Harris wrote: >> >>> On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau >> > wrote: >>> >>> josef.pktd at gmail.com wrote: >>> > Can we/someone add a warning on the front page http://scipy.org/ >>> > (maybe under news for numpy download) about incompatibility of the >>> > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? >>> >>> It seems that it will be quite difficult to fix the issue without >>> removing something (I tried to use datetime as user types, but this >>> opened a can of worms), so I am (quite reluctantly ) coming to the >>> conclusion we should just bite the bullet and change the ABI number (so >>> that importing anything will fail instead of crashing randomly). >>> >>> Something like numpy 1.4.0.1, which would just have a different ABI >>> number than 1.4.0, without anything else. >>> >>> >>> Why do you think it would be better to make this change in 1.4 rather >>> than 1.5? >>> >> Because then any extension fails to import with a clear message instead >> of crashing as it does now. It does not matter much if you know the >> crash is coming from an incompatible ABI, but it does if you don't :) >> > > I thought we could get away with a small binary incompatibility, > without rebuilding everything. I'm using matplotlib although not > extensively and it didn't crash in a while. (I don't remember which > version of scipy I used for the last time when I had a crashing > script.) > This made my hairs stand up on my back... Even if you check the "widely used" extensions for usecases which are affected by the breakage, you'll never get to check all custom propriotary C code around using NumPy, and their authors might easily miss this thread. In face of this, I actually think the current behaviour of Cython is a lucky accident, as all Cython code refuse to run with upgraded NumPy without being recompiled :-) (Only joking though; the next version of Cython will work across ABI versions). 
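The behaviour being praised here, refusing to run with a clear message instead of crashing later, can be approximated even from pure Python. A hedged sketch (the version string is illustrative only, and a real check would have to compare the C ABI number, which pure Python cannot see):

import numpy

BUILT_AGAINST = "1.3"   # major.minor the binary package was compiled against (example value)

installed = ".".join(numpy.__version__.split(".")[:2])
if installed != BUILT_AGAINST:
    # Fail loudly at import time with an explanation, instead of segfaulting later.
    raise ImportError("built against NumPy %s.x but NumPy %s is installed; "
                      "please rebuild against the installed version"
                      % (BUILT_AGAINST, numpy.__version__))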
Dag Sverre From bsouthey at gmail.com Thu Jan 28 09:46:43 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 28 Jan 2010 08:46:43 -0600 Subject: [Numpy-discussion] wired error message in scipy.sparse.eigen function: Segmentation fault In-Reply-To: <4B612A8D.3060401@silveregg.co.jp> References: <4B60EC22.5070001@gmail.com> <4B60FF24.7040504@silveregg.co.jp> <4B6102B6.400@gmail.com> <4B610625.4060303@silveregg.co.jp> <4B611EEE.8040500@gmail.com> <4B612A8D.3060401@silveregg.co.jp> Message-ID: On Thu, Jan 28, 2010 at 12:11 AM, David Cournapeau wrote: > Jankins wrote: >> Yes. I am using scipy.sparse.linalg.eigen.arpack. >> >> The exact output is: >> >> /usr/local/lib/python2.6/dist-packages/scipy/sparse/linalg/eigen/arpack/_arpack.so > > I need the output of ldd on this file, actually, i.e the output of "ldd > /usr/local/lib/python2.6/dist-packages/scipy/sparse/linalg/eigen/arpack/_arpack.so". > It should output the libraries actually loaded by the OS. > >> In fact, the matrix is from a directed graph with about 18,000 nodes and >> 41,000 edges. Actually, this matrix is the smallest one I used. > > Is it available somewhere ? 41000 edges should make the matrix very > sparse. I first thought that your problem may be some buggy ATLAS, but > the current arpack interface (the one used by sparse.linalg.eigen) is > also quite buggy in my experience, though I could not reproduce it. > Having a matrix which consistently reproduce the bug would be very useful. > > In the short term, you may want to do without arpack support in scipy. > In the longer term, I intend to improve support for sparse matrices > linear algebra, as it is needed for my new job. > >> Now I switch to use numpy.linalg.eigvals, but it is slower than >> scipy.sparse.linalg.eigen.arpack module. > > If you have a reasonable ATLAS install, scipy.linalg.eigvals should > actually be quite fast. Sparse eigenvalues solver are much slower than > full ones in general as long as: > ? ? ? ?- your matrices are tiny (with tiny defined here as the plain matrix > requiring one order of magnitude less memory than the total available > memory, so something like matrices with ~ 1e7/1e8 entries on current > desktop computers) > ? ? ? ?- you need more than a few eigenvalues, or not just the > biggest/smallest ones > > cheers, > > David You are using Atlas version 3.6, perhaps you should upgrade to a more recent version (3.8.x)? What version of numpy are you using? Where did Atlas etc come from? Did you install both numpy and scipy from scratch (preferably built at the same time against the same library versions)? Sometimes removing everything and then rebuilding or reinstalling everything from scratch can help Perhaps less of a concern, but since your OS is 32-bit, is everything 32-bit and do you have sufficient memory for the system to run your code? After that, the array and code in question is need. Bruce From bsouthey at gmail.com Thu Jan 28 09:53:44 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 28 Jan 2010 08:53:44 -0600 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: On Wed, Jan 27, 2010 at 9:24 PM, Keith Goodman wrote: > On Wed, Jan 27, 2010 at 7:13 PM, Pierre GM wrote: >> On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote: >>> I recently opened sourced one of my packages. It is a labeled array >>> that I call larry. >>> >>> A two-dimensional larry, for example, contains a 2d NumPy array with >>> labels on each row and column. A larry can have any dimension. 
>>> >>> Alignment by label is automatic when you add (or subtract, multiply, >>> divide) two larrys. >>> >>> larry has built-in methods such as movingsum, ranking, merge, shuffle, >>> zscore, demean, lag as well as typical NumPy methods like sum, max, >>> std, sign, clip. NaNs are treated as missing data. >> >> So you can't have an integer larry with missing data ? > > No. > (No means yes??? :-) ) So how do you distinguish between a real NaN and a missing value? (Having to check array before and after an operation is not fun.) This is one of the reasons why masked arrays are superior for missing values. Bruce From kwgoodman at gmail.com Thu Jan 28 10:07:46 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 28 Jan 2010 07:07:46 -0800 Subject: [Numpy-discussion] [ANN] New open source project for labeled arrays In-Reply-To: References: Message-ID: On Thu, Jan 28, 2010 at 6:53 AM, Bruce Southey wrote: > On Wed, Jan 27, 2010 at 9:24 PM, Keith Goodman wrote: >> On Wed, Jan 27, 2010 at 7:13 PM, Pierre GM wrote: >>> On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote: >>>> I recently opened sourced one of my packages. It is a labeled array >>>> that I call larry. >>>> >>>> A two-dimensional larry, for example, contains a 2d NumPy array with >>>> labels on each row and column. A larry can have any dimension. >>>> >>>> Alignment by label is automatic when you add (or subtract, multiply, >>>> divide) two larrys. >>>> >>>> larry has built-in methods such as movingsum, ranking, merge, shuffle, >>>> zscore, demean, lag as well as typical NumPy methods like sum, max, >>>> std, sign, clip. NaNs are treated as missing data. >>> >>> So you can't have an integer larry with missing data ? >> >> No. >> > > (No means yes??? :-) ) No. > So how do you distinguish between a real NaN and a missing value? > (Having to check array before and after an operation is not fun.) > This is one of the reasons why masked arrays are superior for missing values. Unit test coverage of larry is pretty good, so at some point I could begin porting, function by function, to ma while keeping NaN --> missing. After the porting was complete I could remove NaN --> missing and add the ability to pass in a mask or missing value marker. I don't have any experience with ma. And I have a long todo list. So ma support is not currently planned. From josef.pktd at gmail.com Thu Jan 28 10:08:23 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 28 Jan 2010 10:08:23 -0500 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <4B619AA5.2090601@student.matnat.uio.no> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> <1cd32cbb1001280601g54957f1aq4cbce3fc6b5c25ec@mail.gmail.com> <4B619AA5.2090601@student.matnat.uio.no> Message-ID: <1cd32cbb1001280708vc38c800i309f7730b717a4cb@mail.gmail.com> On Thu, Jan 28, 2010 at 9:09 AM, Dag Sverre Seljebotn wrote: > josef.pktd at gmail.com wrote: >> On Thu, Jan 28, 2010 at 1:39 AM, David Cournapeau wrote: >> >>> Charles R Harris wrote: >>> >>>> On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau >>> > wrote: >>>> >>>> ? ? josef.pktd at gmail.com wrote: >>>> ? ? ?> Can we/someone add a warning on the front page http://scipy.org/ >>>> ? ? ?> (maybe under news for numpy download) about incompatibility of the >>>> ? ? ?> binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? >>>> >>>> ? ? 
It seems that it will be quite difficult to fix the issue without >>>> ? ? removing something (I tried to use datetime as user types, but this >>>> ? ? opened a can of worms), so I am (quite reluctantly ) coming to the >>>> ? ? conclusion we should just bite the bullet and change the ABI number (so >>>> ? ? that importing anything will fail instead of crashing randomly). >>>> >>>> ? ? Something like numpy 1.4.0.1, which would just have a different ABI >>>> ? ? number than 1.4.0, without anything else. >>>> >>>> >>>> Why do you think it would be better to make this change in 1.4 rather >>>> than 1.5? >>>> >>> Because then any extension fails to import with a clear message instead >>> of crashing as it does now. It does not matter much if you know the >>> crash is coming from an incompatible ABI, but it does if you don't :) >>> >> >> I thought we could get away with a small binary incompatibility, >> without rebuilding everything. I'm using matplotlib although not >> extensively and it didn't crash in a while. (I don't remember which >> version of scipy I used for the last time when I had a crashing >> script.) >> > This made my hairs stand up on my back... Maybe this wasn't well phrased, with "we" I meant users like myself, and for example not the numpy developers. I'm strongly in favor of warnings, but not of enforced not running at all. For example I don't know whether h5py would work with numpy 1.4 if cython (I think) wouldn't prevent me from importing it. If I had the option I would stick with numpy 1.3 until the mess is cleared up and other packages are available in xxx-numpy1.4.x versions. Cheers, Josef > > Even if you check the "widely used" extensions for usecases which are > affected by the breakage, you'll never get to check all custom > propriotary C code around using NumPy, and their authors might easily > miss this thread. > > In face of this, I actually think the current behaviour of Cython is a > lucky accident, as all Cython code refuse to run with upgraded NumPy > without being recompiled :-) > > (Only joking though; the next version of Cython will work across ABI > versions). > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kiwanuka at gmail.com Thu Jan 28 10:43:29 2010 From: robert.kiwanuka at gmail.com (Robert Kiwanuka) Date: Thu, 28 Jan 2010 15:43:29 +0000 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 40, Issue 70 In-Reply-To: References: Message-ID: [snip] > > Message: 2 > Date: Thu, 28 Jan 2010 12:50:03 +0000 > From: Robert Kiwanuka > Subject: [Numpy-discussion] is there any alternative to savefig? > To: numpy-discussion at scipy.org > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi all, > > I wonder if anyone knows any alternative function in pylab (or > otherwise) that could be used to save an image. My problem is as > follows: > > --------------- > from pylab import * > ... 
> > figure(1) > fig1 = gca() > figure(2) > fig2 = gca() > figure(3) > fig3 = gca() > > for i,data_file in enumerate(data_file_list): > time,x, y,x2, y2 = read_csv_file_4(open > (data_file),elements=num_of_ > elements) > fig1.plot(-x,-y,color=colours[i],label=labellist[i]) > fig2.plot(time,-y,color=colours[i],label=labellist[i]) > fig3.plot(time,-x,color=colours[i],label=labellist[i]) > > fig1.legend(loc='best') > fig1.set_title("y1 - x1") > fig1.set_ylabel("y1") > fig1.set_xlabel("x1") > #savefig("y1-x1.png") > > fig2.legend(loc='best') > fig2.set_title("y1 - time") > fig2.set_ylabel("y1") > fig2.set_xlabel("time[s]") > #savefig("y1-time.png") > > fig3.legend(loc='best') > fig3.set_title("x1 - time") > fig3.set_ylabel("x1") > fig3.set_xlabel("time[s]") > #savefig("x1-time.png") > show() > --------------------------- > > In the above code, I read multiple data files and plot three separate > figures. Now I would like to save each of the figures to a file as the > commented savefig satements suggest. The trouble is that if I > uncomment all those savefig statements, I get three saved images all > containing the plot belonging to figure(3), which was the last figure > declared. > > I understand this to be happening because savefig will save the > "current" figure, which in this case happens to be the last one > declared. > > If I could do something like fig1.savefig("y1-x1.png") or savefig("y1- > x1.png").fig1, this would solve the problem but I'm not aware of any > such methods or modules to enable this. This is thus a flaw in the > general design/implementation of the savefig function, but is there an > alternative function to enable me achieve what I need? Is there > perhaps a possible tweak to savefig to make it do the same? > > Thanks in advance, > > Robert > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20100128/057ec695/attachment-0001.html > > ------------------------------ > > Message: 3 > Date: Thu, 28 Jan 2010 14:57:08 +0200 > From: Pauli Virtanen > Subject: Re: [Numpy-discussion] is there any alternative to savefig? > To: Discussion of Numerical Python > Message-ID: <1264683428.16723.18.camel at talisman> > Content-Type: text/plain; charset="UTF-8" > > to, 2010-01-28 kello 12:50 +0000, Robert Kiwanuka kirjoitti: > [clip] > > If I could do something like fig1.savefig("y1-x1.png") or savefig("y1- > > x1.png").fig1, this would solve the problem but I'm not aware of any > > such methods or modules to enable this. This is thus a flaw in the > > general design/implementation of the savefig function, but is there an > > alternative function to enable me achieve what I need? Is there > > perhaps a possible tweak to savefig to make it do the same? > > Well, you almost had the answer there: use > > fig1fig = figure(1) > ... > > fig1fig.savefig("y1-x1.png") > fig2fig.savefig("y1-time.png") > fig3fig.savefig("x1-time.png") > > to save the respective figures. > > There is a separate mailing list for Matplotlib-specific questions: > http://sourceforge.net/mail/?group_id=80706 > > -- > Pauli Virtanen > > > > ------------------------------ > > Message: 4 > Date: Thu, 28 Jan 2010 14:00:25 +0100 > From: Dag Sverre Seljebotn > Subject: Re: [Numpy-discussion] is there any alternative to savefig? 
> To: Discussion of Numerical Python > Message-ID: <4B618A69.5030701 at student.matnat.uio.no> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Robert Kiwanuka wrote: > > Hi all, > > > > I wonder if anyone knows any alternative function in pylab (or > > otherwise) that could be used to save an image. My problem is as > > follows: > > > > --------------- > > from pylab import * > > ... > > > > figure(1) > > fig1 = gca() > > figure(2) > > fig2 = gca() > > figure(3) > > fig3 = gca() > You should not use the pylab interface for stuff like this. This is much > easier if you get rid of the notion of "current plot". > > from matplotlib import pyplot as plt > > fig1 = plt.figure() > ax1 = fig1.add_subplot(111) > > ax1.plot_something... > > fig1.savefig(...) > > Etc., see matplotlib docs. > > Dag Sverre > > [snip] Many thanks to both of you! It is all fine now: the real problem was that, having defined e.g. fig1 = plt.figure(1).gca(), "fig1.savefig" would not work! Instead, I should have used "plt.figure(1).savefig", where the "gca()" part is trimmed off! Regards, Robert From kwgoodman at gmail.com Thu Jan 28 11:31:44 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 28 Jan 2010 08:31:44 -0800 Subject: [Numpy-discussion] dtype and logical_and Message-ID: I noticed that & (logical and) does not support float dtype:
>> b = np.array([True, False])
>> f = np.array([1.0, 2.0])
>> i = np.array([1, 2])
>> b & b
array([ True, False], dtype=bool)
>> i & i
array([1, 2])
>> i & f
TypeError: unsupported operand type(s) for &: 'int' and 'float'
But this works:
>> np.logical_and(b, f)
array([ True, False], dtype=bool)
and this gives a different result from i & i above:
>> np.logical_and(i, i)
array([ True, True], dtype=bool)
Why are & and np.logical_and different? If I have a class (a labeled array, larry) any suggestions on whether I should use & or np.logical_and on the underlying arrays for the __and__ method? From robert.kern at gmail.com Thu Jan 28 11:39:16 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Jan 2010 10:39:16 -0600 Subject: [Numpy-discussion] dtype and logical_and In-Reply-To: References: Message-ID: <3d375d731001280839i6f7791f7hce4c5a6d4e85b5b2@mail.gmail.com> On Thu, Jan 28, 2010 at 10:31, Keith Goodman wrote: > I noticed that & (logical and) does not support float dtype: & is not logical_and(). It is bitwise_and(). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From kwgoodman at gmail.com Thu Jan 28 11:42:37 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 28 Jan 2010 08:42:37 -0800 Subject: [Numpy-discussion] dtype and logical_and In-Reply-To: <3d375d731001280839i6f7791f7hce4c5a6d4e85b5b2@mail.gmail.com> References: <3d375d731001280839i6f7791f7hce4c5a6d4e85b5b2@mail.gmail.com> Message-ID: On Thu, Jan 28, 2010 at 8:39 AM, Robert Kern wrote: > On Thu, Jan 28, 2010 at 10:31, Keith Goodman wrote: >> I noticed that & (logical and) does not support float dtype: > > & is not logical_and(). It is bitwise_and(). That explains it. Thank you. From charlesr.harris at gmail.com Thu Jan 28 16:17:29 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 28 Jan 2010 14:17:29 -0700 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ?
In-Reply-To: <4B613DBE.9040509@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> <4B613DBE.9040509@silveregg.co.jp> Message-ID: On Thu, Jan 28, 2010 at 12:33 AM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > > On Wed, Jan 27, 2010 at 11:39 PM, David Cournapeau > > > wrote: > > > > Charles R Harris wrote: > > > > > > > > > On Wed, Jan 27, 2010 at 6:20 PM, David Cournapeau > > > > > >> > wrote: > > > > > > josef.pktd at gmail.com > > > wrote: > > > > Can we/someone add a warning on the front page > > http://scipy.org/ > > > > (maybe under news for numpy download) about > > incompatibility of the > > > > binaries on sourceforge of scipy <=0.7.1 with numpy 1.4.0 ? > > > > > > It seems that it will be quite difficult to fix the issue > without > > > removing something (I tried to use datetime as user types, > > but this > > > opened a can of worms), so I am (quite reluctantly ) coming > > to the > > > conclusion we should just bite the bullet and change the ABI > > number (so > > > that importing anything will fail instead of crashing > randomly). > > > > > > Something like numpy 1.4.0.1, which would just have a > > different ABI > > > number than 1.4.0, without anything else. > > > > > > > > > Why do you think it would be better to make this change in 1.4 > rather > > > than 1.5? > > > > Because then any extension fails to import with a clear message > instead > > of crashing as it does now. It does not matter much if you know the > > crash is coming from an incompatible ABI, but it does if you don't :) > > > > > > But why not remove the change? > > Because Travis was against it when it was suggested last september or > so. And removing in 1.4.x a feature introduced in 1.4.0 is weird. > > But wasn't that decision based on the premiss that the datetime work wouldn't break the ABI? I don't see anything weird about making 1.4 work with existing binaries. If we are going to break the ABI, and it looks like we will, then it would be better if the word went out early so that projects that depend on numpy can be prepared for the change. So my preference would be to remove the incompatibility in 1.4 and introduce it in 1.5. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Thu Jan 28 19:58:31 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 29 Jan 2010 09:58:31 +0900 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> <4B613DBE.9040509@silveregg.co.jp> Message-ID: <4B6232B7.9060409@silveregg.co.jp> Charles R Harris wrote: > > > On Thu, Jan 28, 2010 at 12:33 AM, David Cournapeau > > wrote: > > Because Travis was against it when it was suggested last september or > so. And removing in 1.4.x a feature introduced in 1.4.0 is weird. > > > But wasn't that decision based on the premiss that the datetime work > wouldn't break the ABI? Well, this and because Travis is the BFDL of NumPy as far as I am concerned :) So I think it should be his decision whether to remove it or not. > I don't see anything weird about making 1.4 work > with existing binaries. 
Indeed, but that's not really what I am saying :) I am saying there is a tradeoff between breaking people's code (for people using datetime) and keeping a compatible ABI. So the decision depends quite a bit on how many people use the datetime code. > If we are going to break the ABI, and it looks > like we will, then it would be better if the word went out early so that > projects that depend on numpy can be prepared for the change. So my > preference would be to remove the incompatibility in 1.4 and introduce > it in 1.5. Assuming not many people depend on datetime in 1.4.0, that would be my preference as well. cheers, David From david.huard at gmail.com Fri Jan 29 11:49:20 2010 From: david.huard at gmail.com (David Huard) Date: Fri, 29 Jan 2010 11:49:20 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it Message-ID: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> Hi, I have a 4D "array" with a given shape, but the array is never actually created since it is large and distributed over multiple binary files. Typical usage would be to take slices across the 4D array. I'd like to know what the shape of the resulting array would be if I took a slice out of it. That is, let's say my 4D array is A, I'd like to know A[ndindex].shape without actually creating A. ndindex should support all numpy constructions (integer, boolean, array, slice, ...). I am guessing something already exists to do this, but I just can't put my finger on it. Thanks. David From josef.pktd at gmail.com Fri Jan 29 12:10:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 12:10:42 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> Message-ID: <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> On Fri, Jan 29, 2010 at 11:49 AM, David Huard wrote: > Hi, > > I have a 4D "array" with a given shape, but the array is never > actually created since it is large and distributed over multiple > binary files. Typical usage would be to take slices across the 4D > array. > > I'd like to know what the shape of the resulting array would be if I > took a slice out of it. > That is, let's say my 4D array is A, I'd like to know > > A[ndindex].shape > > without actually creating A. > > ndindex should support all numpy constructions (integer, boolean, > array, slice, ...). I am guessing something already exists to do this, > but I just can't put my finger on it. trying out some things, just because it's a puzzling question >>> indi= (slice(2,5), np.arange(2), np.arange(3)[:,None]) >>> np.broadcast(*indi).shape (3, 2) I don't know if this is ok for all possible cases, (and there are some confusing things with reordering axis, when slices and fancy indexing is mixed) Josef > Thanks. 
> > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david.huard at gmail.com Fri Jan 29 12:32:44 2010 From: david.huard at gmail.com (David Huard) Date: Fri, 29 Jan 2010 12:32:44 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> Message-ID: <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> On Fri, Jan 29, 2010 at 12:10 PM, wrote: > On Fri, Jan 29, 2010 at 11:49 AM, David Huard wrote: >> Hi, >> >> I have a 4D "array" with a given shape, but the array is never >> actually created since it is large and distributed over multiple >> binary files. Typical usage would be to take slices across the 4D >> array. >> >> I'd like to know what the shape of the resulting array would be if I >> took a slice out of it. >> That is, let's say my 4D array is A, I'd like to know >> >> A[ndindex].shape >> >> without actually creating A. >> >> ndindex should support all numpy constructions (integer, boolean, >> array, slice, ...). I am guessing something already exists to do this, >> but I just can't put my finger on it. > > trying out some things, just because it's a puzzling question > >>>> indi= (slice(2,5), np.arange(2), np.arange(3)[:,None]) >>>> np.broadcast(*indi).shape > (3, 2) > > I don't know if this is ok for all possible cases, (and there are some > confusing things with reordering axis, when slices and fancy indexing > is mixed) > Hi josef, Where then do you specify the shape of the A array ? Maybe an example would be clearer: Let's say A's shape is (10, 1, 5, 20) and the index is [::2, ..., 0] A[::2, ..., 0] shape would be (5, 1, 5) The broadcast idea has potential, I'll toy with it. David > Josef > > > >> Thanks. >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 29 12:48:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 12:48:22 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> Message-ID: <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> On Fri, Jan 29, 2010 at 12:32 PM, David Huard wrote: > On Fri, Jan 29, 2010 at 12:10 PM, ? wrote: >> On Fri, Jan 29, 2010 at 11:49 AM, David Huard wrote: >>> Hi, >>> >>> I have a 4D "array" with a given shape, but the array is never >>> actually created since it is large and distributed over multiple >>> binary files. Typical usage would be to take slices across the 4D >>> array. >>> >>> I'd like to know what the shape of the resulting array would be if I >>> took a slice out of it. 
>>> That is, let's say my 4D array is A, I'd like to know >>> >>> A[ndindex].shape >>> >>> without actually creating A. >>> >>> ndindex should support all numpy constructions (integer, boolean, >>> array, slice, ...). I am guessing something already exists to do this, >>> but I just can't put my finger on it. >> >> trying out some things, just because it's a puzzling question >> >>>>> indi= (slice(2,5), np.arange(2), np.arange(3)[:,None]) >>>>> np.broadcast(*indi).shape >> (3, 2) >> >> I don't know if this is ok for all possible cases, (and there are some >> confusing things with reordering axis, when slices and fancy indexing >> is mixed) >> > > Hi josef, > > Where then do you specify the shape of the A array ? ?Maybe an example > would be clearer: > > Let's say A's shape is (10, 1, 5, 20) > and the index is [::2, ..., 0] > > A[::2, ..., 0] shape would be (5, 1, 5) > > The broadcast idea has potential, I'll toy with it. > > David maybe this helps: >>> np.broadcast(*indi).shape (3, 2) >>> indi= (slice(slice(8,None,None).indices(10)),np.arange(2), np.arange(3)[:,None]) >>> np.broadcast(*indi).shape (3, 2) >>> slice(8,None,None).indices(10) (8, 10, 1) I'm just doing dir(slice) and look up the docs. I never used any of this. Josef > > > >> Josef >> >> >> >>> Thanks. >>> >>> David >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 29 12:53:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 12:53:46 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> Message-ID: <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> On Fri, Jan 29, 2010 at 12:48 PM, wrote: > On Fri, Jan 29, 2010 at 12:32 PM, David Huard wrote: >> On Fri, Jan 29, 2010 at 12:10 PM, ? wrote: >>> On Fri, Jan 29, 2010 at 11:49 AM, David Huard wrote: >>>> Hi, >>>> >>>> I have a 4D "array" with a given shape, but the array is never >>>> actually created since it is large and distributed over multiple >>>> binary files. Typical usage would be to take slices across the 4D >>>> array. >>>> >>>> I'd like to know what the shape of the resulting array would be if I >>>> took a slice out of it. >>>> That is, let's say my 4D array is A, I'd like to know >>>> >>>> A[ndindex].shape >>>> >>>> without actually creating A. >>>> >>>> ndindex should support all numpy constructions (integer, boolean, >>>> array, slice, ...). I am guessing something already exists to do this, >>>> but I just can't put my finger on it. 
>>> >>> trying out some things, just because it's a puzzling question >>> >>>>>> indi= (slice(2,5), np.arange(2), np.arange(3)[:,None]) >>>>>> np.broadcast(*indi).shape >>> (3, 2) >>> >>> I don't know if this is ok for all possible cases, (and there are some >>> confusing things with reordering axis, when slices and fancy indexing >>> is mixed) >>> >> >> Hi josef, >> >> Where then do you specify the shape of the A array ? ?Maybe an example >> would be clearer: >> >> Let's say A's shape is (10, 1, 5, 20) >> and the index is [::2, ..., 0] >> >> A[::2, ..., 0] shape would be (5, 1, 5) >> >> The broadcast idea has potential, I'll toy with it. >> >> David > > maybe this helps: > >>>> np.broadcast(*indi).shape > (3, 2) >>>> indi= (slice(slice(8,None,None).indices(10)),np.arange(2), np.arange(3)[:,None]) >>>> np.broadcast(*indi).shape > (3, 2) >>>> slice(8,None,None).indices(10) > (8, 10, 1) > > I'm just doing dir(slice) and look up the docs. I never used any of this. > > Josef I forgot about ellipsis, since I never use them, replace ellipsis by [slice(None)]*ndim or something like this I don't know how to access an ellipsis directly, is it even possible to construct an index list that contains an ellipsis? There is an object for it but I never looked at it. Josef > > >> >> >> >>> Josef >>> >>> >>> >>>> Thanks. >>>> >>>> David >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From kwgoodman at gmail.com Fri Jan 29 13:03:19 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 29 Jan 2010 10:03:19 -0800 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> Message-ID: On Fri, Jan 29, 2010 at 9:53 AM, wrote: > I forgot about ellipsis, since I never use them, > replace ellipsis by [slice(None)]*ndim or something like this > > I don't know how to access an ellipsis directly, is it even possible > to construct an index list that contains an ellipsis? > There is an object for it but I never looked at it. I haven't been following the discussion and I don't understand your question and in a moment I will accidentally hit send... >> class eli(object): ...: ...: def __init__(self): ...: pass ...: ...: def __getitem__(self, index): ...: print index ...: >> x[...] Ellipsis >> x[...,1] (Ellipsis, 1) Ellipsis is a python class. Built in, no need to import. 
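As a small follow-up sketch (not from the original thread): an index expression containing an Ellipsis can be built by hand or captured with np.index_exp, and spotted with an identity test. Strictly speaking, Ellipsis is the built-in singleton instance of the ellipsis type rather than a class, which is why the identity check works:

import numpy as np

ind = (slice(2, 5), Ellipsis, 0)      # a hand-built index tuple containing an Ellipsis
same = np.index_exp[2:5, ..., 0]      # the same tuple, captured via np.index_exp

print(ind == same)                          # True
print([item is Ellipsis for item in ind])   # [False, True, False]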
From josef.pktd at gmail.com Fri Jan 29 13:16:41 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 13:16:41 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> Message-ID: <1cd32cbb1001291016i28cf7eeem91aaacdeb42e8536@mail.gmail.com> On Fri, Jan 29, 2010 at 12:32 PM, David Huard wrote: > On Fri, Jan 29, 2010 at 12:10 PM, ? wrote: >> On Fri, Jan 29, 2010 at 11:49 AM, David Huard wrote: >>> Hi, >>> >>> I have a 4D "array" with a given shape, but the array is never >>> actually created since it is large and distributed over multiple >>> binary files. Typical usage would be to take slices across the 4D >>> array. >>> >>> I'd like to know what the shape of the resulting array would be if I >>> took a slice out of it. >>> That is, let's say my 4D array is A, I'd like to know >>> >>> A[ndindex].shape >>> >>> without actually creating A. >>> >>> ndindex should support all numpy constructions (integer, boolean, >>> array, slice, ...). I am guessing something already exists to do this, >>> but I just can't put my finger on it. >> >> trying out some things, just because it's a puzzling question >> >>>>> indi= (slice(2,5), np.arange(2), np.arange(3)[:,None]) >>>>> np.broadcast(*indi).shape >> (3, 2) >> >> I don't know if this is ok for all possible cases, (and there are some >> confusing things with reordering axis, when slices and fancy indexing >> is mixed) >> > > Hi josef, > > Where then do you specify the shape of the A array ? ?Maybe an example > would be clearer: > > Let's say A's shape is (10, 1, 5, 20) > and the index is [::2, ..., 0] > > A[::2, ..., 0] shape would be (5, 1, 5) > > The broadcast idea has potential, I'll toy with it. > > David broadcast doesn't work as easily as I thought, the dimension to broadcast to, would have to be the same as the original array. I just struggled with something similar for my attempted rewrite for stats.nanmedian, and I think it took me a day to get the axis right. Maybe someone has a better idea. Josef > > >> Josef >> >> >> >>> Thanks. 
>>> >>> David >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 29 13:27:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 13:27:24 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> Message-ID: <1cd32cbb1001291027o6cb60e37j33942753a229196e@mail.gmail.com> On Fri, Jan 29, 2010 at 1:03 PM, Keith Goodman wrote: > On Fri, Jan 29, 2010 at 9:53 AM, ? wrote: >> I forgot about ellipsis, since I never use them, >> replace ellipsis by [slice(None)]*ndim or something like this >> >> I don't know how to access an ellipsis directly, is it even possible >> to construct an index list that contains an ellipsis? >> There is an object for it but I never looked at it. > > I haven't been following the discussion and I don't understand your > question and in a moment I will accidentally hit send... > >>> class eli(object): > ? ...: > ? ...: ? ? ? ? def __init__(self): > ? ...: ? ? ? ? ? ? pass > ? ...: > ? ...: ? ? def __getitem__(self, index): > ? ...: ? ? ? ? ? ? print index > ? ...: > >>> x[...] > Ellipsis >>> x[...,1] > (Ellipsis, 1) > > Ellipsis is a python class. Built in, no need to import. thanks, this makes it possible to construct index lists with Ellipsis, but it showed that my broadcast idea doesn't work this way Travis explained last year how slices and broadcasting are used for indexing, and it's quite a bit more complicated than this. Sorry for jumping in too fast. Josef >>> indi= (slice(2,5),Ellipsis, np.arange(3)[:,None]) >>> ind2 = [] >>> for i in indi: if not i is Ellipsis: ind2.append(i) else: ind2.extend([slice(None)]*2) >>> ind2 [slice(2, 5, None), slice(None, None, None), slice(None, None, None), array([[0], [1], [2]])] >>> np.broadcast(*ind2).shape (3, 1) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david.huard at gmail.com Fri Jan 29 14:42:47 2010 From: david.huard at gmail.com (David Huard) Date: Fri, 29 Jan 2010 14:42:47 -0500 Subject: [Numpy-discussion] Warning on http://scipy.org/ about binary incompatibility ? In-Reply-To: <4B6232B7.9060409@silveregg.co.jp> References: <1cd32cbb1001270739h4dc55ec0vd88575e6a9151e06@mail.gmail.com> <4B60E65D.8050804@silveregg.co.jp> <4B61313D.7080904@silveregg.co.jp> <4B613DBE.9040509@silveregg.co.jp> <4B6232B7.9060409@silveregg.co.jp> Message-ID: <91cf711d1001291142v2b964063m38fa60904aa661d@mail.gmail.com> I'm a heavy user of scikits.timeseries so I am very interested in having native datetime objects in Numpy. However, when I did play with it about a week ago. 
I found inconsistencies between the actual code and the NEP. The "Example of use" section mostly doesn't work. I understand the need to put it out there so it gets used, but for the moment I think potential users are still those who compile from the dev. tree anyway. Thanks for all the hard work that has been put into this, David On Thu, Jan 28, 2010 at 7:58 PM, David Cournapeau wrote: > Charles R Harris wrote: >> >> >> On Thu, Jan 28, 2010 at 12:33 AM, David Cournapeau >> > wrote: > >> >> ? ? Because Travis was against it when it was suggested last september or >> ? ? so. And removing in 1.4.x a feature introduced in 1.4.0 is weird. >> >> >> But wasn't that decision based on the premiss that the datetime work >> wouldn't break the ABI? > > Well, this and because Travis is the BFDL of NumPy as far as I am > concerned :) So I think it should be his decision whether to remove it > or not. > >> I don't see anything weird about making 1.4 work >> with existing binaries. > > Indeed, but that's not really what I am saying :) I am saying there is a > tradeoff between breaking people's code (for people using datetime) and > keeping a compatible ABI. > > So the decision depends quite a bit on how many people use the datetime > code. > >> If we are going to break the ABI, and it looks >> like we will, then it would be better if the word went out early so that >> projects that depend on numpy can be prepared for the change. So my >> preference would be to remove the incompatibility in 1.4 and introduce >> it in 1.5. > > Assuming not many people depend on datetime in 1.4.0, that would be my > preference as well. > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david.huard at gmail.com Fri Jan 29 14:58:52 2010 From: david.huard at gmail.com (David Huard) Date: Fri, 29 Jan 2010 14:58:52 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <1cd32cbb1001291027o6cb60e37j33942753a229196e@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> <1cd32cbb1001291027o6cb60e37j33942753a229196e@mail.gmail.com> Message-ID: <91cf711d1001291158k3f5f989ak4ab00229869f2b40@mail.gmail.com> For the record, here is what I came up with. import numpy as np def expand_ellipsis(index, ndim): """Replace the ellipsis, real or implied, of an index expression by slices. Parameters ---------- index : tuple Indexing expression. ndim : int Number of dimensions of the array the index applies to. Return ------ out : tuple An indexing expression of length `ndim` where the Elipsis are replaced by slices. """ n = len(index) index = index + ndim * (slice(None),) newindex = [] for i in index: try: if i == Ellipsis: newindex.extend((ndim - n + 1)*(slice(None),)) else: newindex.append(i) except: newindex.append(i) return newindex[:ndim] def indexedshape(shape, index): """Return the shape of an array sliced by index. Parameters ---------- shape : tuple Shape of the original array. index : tuple Indexing sequence. Return ------ out : tuple If array A has shape `shape`, then out = A[index].shape. 
Example ------- >>> indexedshape((5,4,3,2), (Ellipsis, 0)) (5,4,3) >>> indexedshape((5,4,3,2), (slice(None, None, 2), 2, [1,2], [True, False])) """ index = expand_ellipsis(index, len(shape)) out = [] for s, i in zip(shape,index): if type(i) == slice: start, stop, stride = i.indices(s) out.append(int(np.ceil((stop-start)*1./stride))) elif np.isscalar(i): pass elif getattr(i, 'dtype', None) == np.bool: out.append(i.sum()) else: out.append(len(i)) return tuple(out) def test_indexedshape(): from numpy.testing import assert_equal as eq s = (6,5,4,3) a = np.empty(s) i = np.index_exp[::4, 3:, 0, np.array([True, False, True])] eq(a[i].shape, indexedshape(s, i)) i = np.index_exp[1::4, 3:, np.array([0,1,2]), ::-1] eq(a[i].shape, indexedshape(s, i)) i = (0,) eq(a[i].shape, indexedshape(s, i)) i = (3, Ellipsis, 0) eq(a[i].shape, indexedshape(s, i)) On Fri, Jan 29, 2010 at 1:27 PM, wrote: > On Fri, Jan 29, 2010 at 1:03 PM, Keith Goodman wrote: >> On Fri, Jan 29, 2010 at 9:53 AM, ? wrote: >>> I forgot about ellipsis, since I never use them, >>> replace ellipsis by [slice(None)]*ndim or something like this >>> >>> I don't know how to access an ellipsis directly, is it even possible >>> to construct an index list that contains an ellipsis? >>> There is an object for it but I never looked at it. >> >> I haven't been following the discussion and I don't understand your >> question and in a moment I will accidentally hit send... >> >>>> class eli(object): >> ? ...: >> ? ...: ? ? ? ? def __init__(self): >> ? ...: ? ? ? ? ? ? pass >> ? ...: >> ? ...: ? ? def __getitem__(self, index): >> ? ...: ? ? ? ? ? ? print index >> ? ...: >> >>>> x[...] >> Ellipsis >>>> x[...,1] >> (Ellipsis, 1) >> >> Ellipsis is a python class. Built in, no need to import. > > thanks, this makes it possible to construct index lists with Ellipsis, > but it showed that my broadcast idea doesn't work this way > > Travis explained last year how slices and broadcasting are used for > indexing, and it's quite a bit more complicated than this. > > Sorry for jumping in too fast. > > Josef > >>>> indi= (slice(2,5),Ellipsis, np.arange(3)[:,None]) >>>> ind2 = [] >>>> for i in indi: > ? ? ? ?if not i is Ellipsis: ind2.append(i) > ? ? ? ?else: ind2.extend([slice(None)]*2) > > >>>> ind2 > [slice(2, 5, None), slice(None, None, None), slice(None, None, None), > array([[0], > ? ? ? [1], > ? ? ? 
[2]])] >>>> np.broadcast(*ind2).shape > (3, 1) > > > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 29 15:08:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 15:08:14 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <91cf711d1001291158k3f5f989ak4ab00229869f2b40@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> <1cd32cbb1001291027o6cb60e37j33942753a229196e@mail.gmail.com> <91cf711d1001291158k3f5f989ak4ab00229869f2b40@mail.gmail.com> Message-ID: <1cd32cbb1001291208s72fa6daase2b8e6d1e804dd43@mail.gmail.com> On Fri, Jan 29, 2010 at 2:58 PM, David Huard wrote: > For the record, here is what I came up with. > > import numpy as np > > def expand_ellipsis(index, ndim): > ? ?"""Replace the ellipsis, real or implied, of an index expression by slices. > > ? ?Parameters > ? ?---------- > ? ?index : tuple > ? ? ?Indexing expression. > ? ?ndim : int > ? ? ?Number of dimensions of the array the index applies to. > > ? ?Return > ? ?------ > ? ?out : tuple > ? ? ?An indexing expression of length `ndim` where the Elipsis are replaced > ? ? ?by slices. > ? ?""" > ? ?n = len(index) > ? ?index = index + ndim * (slice(None),) > > ? ?newindex = [] > ? ?for i in index: > ? ? ? ?try: > ? ? ? ? ? ?if i == Ellipsis: > ? ? ? ? ? ? ? ?newindex.extend((ndim - n + 1)*(slice(None),)) > ? ? ? ? ? ?else: > ? ? ? ? ? ? ? ?newindex.append(i) > ? ? ? ?except: > ? ? ? ? ? ?newindex.append(i) > > ? ?return newindex[:ndim] > > def indexedshape(shape, index): > ? ?"""Return the shape of an array sliced by index. > > ? ?Parameters > ? ?---------- > ? ?shape : tuple > ? ? ?Shape of the original array. > ? ?index : tuple > ? ? ?Indexing sequence. > > ? ?Return > ? ?------ > ? ?out : tuple > ? ? ?If array A has shape `shape`, then out = A[index].shape. > > ? ?Example > ? ?------- > ? ?>>> indexedshape((5,4,3,2), (Ellipsis, 0)) > ? ?(5,4,3) > ? ?>>> indexedshape((5,4,3,2), (slice(None, None, 2), 2, [1,2], > [True, False])) > ? ?""" > ? ?index = expand_ellipsis(index, len(shape)) > ? ?out = [] > ? ?for s, i in zip(shape,index): > ? ? ? ?if type(i) == slice: > ? ? ? ? ? ?start, stop, stride = i.indices(s) > ? ? ? ? ? ?out.append(int(np.ceil((stop-start)*1./stride))) > ? ? ? ?elif np.isscalar(i): > ? ? ? ? ? ?pass > ? ? ? ?elif getattr(i, 'dtype', None) == np.bool: > ? ? ? ? ? ?out.append(i.sum()) > ? ? ? ?else: > ? ? ? ? ? ?out.append(len(i)) > > ? ?return tuple(out) > > > def test_indexedshape(): > ? ?from numpy.testing import assert_equal as eq > ? ?s = (6,5,4,3) > ? ?a = np.empty(s) > ? ?i = np.index_exp[::4, 3:, 0, np.array([True, False, True])] > ? ?eq(a[i].shape, indexedshape(s, i)) > > ? ?i = np.index_exp[1::4, 3:, np.array([0,1,2]), ::-1] > ? ?eq(a[i].shape, indexedshape(s, i)) > > ? ?i = (0,) > ? ?eq(a[i].shape, indexedshape(s, i)) > > ? ?i = (3, Ellipsis, 0) > ? 
?eq(a[i].shape, indexedshape(s, i)) You did the slice part that I didn't manage, but broadcasting doesn't work correctly ? >>> i = np.index_exp[1::4, 3:, np.array([0,1,2])[:,None], ::-1] >>> i (slice(1, None, 4), slice(3, None, None), array([[0], [1], [2]]), slice(None, None, -1)) >>> a[i].shape, indexedshape(s, i) ((2, 2, 3, 1, 3), (2, 2, 3, 3)) >>> i = np.index_exp[1::4, np.array([0,1,2])[None,None,:], np.array([0,1,2])[:,None], ::-1] >>> i (slice(1, None, 4), array([[[0, 1, 2]]]), array([[0], [1], [2]]), slice(None, None, -1)) >>> a[i].shape, indexedshape(s, i) ((2, 1, 3, 3, 3), (2, 1, 3, 3)) Josef > > On Fri, Jan 29, 2010 at 1:27 PM, ? wrote: >> On Fri, Jan 29, 2010 at 1:03 PM, Keith Goodman wrote: >>> On Fri, Jan 29, 2010 at 9:53 AM, ? wrote: >>>> I forgot about ellipsis, since I never use them, >>>> replace ellipsis by [slice(None)]*ndim or something like this >>>> >>>> I don't know how to access an ellipsis directly, is it even possible >>>> to construct an index list that contains an ellipsis? >>>> There is an object for it but I never looked at it. >>> >>> I haven't been following the discussion and I don't understand your >>> question and in a moment I will accidentally hit send... >>> >>>>> class eli(object): >>> ? ...: >>> ? ...: ? ? ? ? def __init__(self): >>> ? ...: ? ? ? ? ? ? pass >>> ? ...: >>> ? ...: ? ? def __getitem__(self, index): >>> ? ...: ? ? ? ? ? ? print index >>> ? ...: >>> >>>>> x[...] >>> Ellipsis >>>>> x[...,1] >>> (Ellipsis, 1) >>> >>> Ellipsis is a python class. Built in, no need to import. >> >> thanks, this makes it possible to construct index lists with Ellipsis, >> but it showed that my broadcast idea doesn't work this way >> >> Travis explained last year how slices and broadcasting are used for >> indexing, and it's quite a bit more complicated than this. >> >> Sorry for jumping in too fast. >> >> Josef >> >>>>> indi= (slice(2,5),Ellipsis, np.arange(3)[:,None]) >>>>> ind2 = [] >>>>> for i in indi: >> ? ? ? ?if not i is Ellipsis: ind2.append(i) >> ? ? ? ?else: ind2.extend([slice(None)]*2) >> >> >>>>> ind2 >> [slice(2, 5, None), slice(None, None, None), slice(None, None, None), >> array([[0], >> ? ? ? [1], >> ? ? ? 
[2]])] >>>>> np.broadcast(*ind2).shape >> (3, 1) >> >> >> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Jan 29 16:01:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 29 Jan 2010 16:01:57 -0500 Subject: [Numpy-discussion] How to get the shape of an array slice without doing it In-Reply-To: <91cf711d1001291158k3f5f989ak4ab00229869f2b40@mail.gmail.com> References: <91cf711d1001290849r5e2fa56dlcde9f4fea28cc23d@mail.gmail.com> <1cd32cbb1001290910t351572a5h47a54b8d800a780a@mail.gmail.com> <91cf711d1001290932g22e23727ycf8b71eac4a590bf@mail.gmail.com> <1cd32cbb1001290948o7e67f011h792088611ad966f4@mail.gmail.com> <1cd32cbb1001290953t364c9c35u6f9a9ad9af5505db@mail.gmail.com> <1cd32cbb1001291027o6cb60e37j33942753a229196e@mail.gmail.com> <91cf711d1001291158k3f5f989ak4ab00229869f2b40@mail.gmail.com> Message-ID: <1cd32cbb1001291301s5a0b603euccdff64ba8df42a9@mail.gmail.com> On Fri, Jan 29, 2010 at 2:58 PM, David Huard wrote: > For the record, here is what I came up with. > > import numpy as np > > def expand_ellipsis(index, ndim): > ? ?"""Replace the ellipsis, real or implied, of an index expression by slices. > > ? ?Parameters > ? ?---------- > ? ?index : tuple > ? ? ?Indexing expression. > ? ?ndim : int > ? ? ?Number of dimensions of the array the index applies to. > > ? ?Return > ? ?------ > ? ?out : tuple > ? ? ?An indexing expression of length `ndim` where the Elipsis are replaced > ? ? ?by slices. > ? ?""" > ? ?n = len(index) > ? ?index = index + ndim * (slice(None),) > > ? ?newindex = [] > ? ?for i in index: > ? ? ? ?try: > ? ? ? ? ? ?if i == Ellipsis: > ? ? ? ? ? ? ? ?newindex.extend((ndim - n + 1)*(slice(None),)) > ? ? ? ? ? ?else: > ? ? ? ? ? ? ? ?newindex.append(i) > ? ? ? ?except: > ? ? ? ? ? ?newindex.append(i) > > ? ?return newindex[:ndim] > > def indexedshape(shape, index): > ? ?"""Return the shape of an array sliced by index. > > ? ?Parameters > ? ?---------- > ? ?shape : tuple > ? ? ?Shape of the original array. > ? ?index : tuple > ? ? ?Indexing sequence. > > ? ?Return > ? ?------ > ? ?out : tuple > ? ? ?If array A has shape `shape`, then out = A[index].shape. > > ? ?Example > ? ?------- > ? ?>>> indexedshape((5,4,3,2), (Ellipsis, 0)) > ? ?(5,4,3) > ? ?>>> indexedshape((5,4,3,2), (slice(None, None, 2), 2, [1,2], > [True, False])) > ? ?""" > ? ?index = expand_ellipsis(index, len(shape)) > ? ?out = [] > ? ?for s, i in zip(shape,index): > ? ? ? ?if type(i) == slice: > ? ? ? ? ? ?start, stop, stride = i.indices(s) > ? ? ? ? ? ?out.append(int(np.ceil((stop-start)*1./stride))) > ? ? ? ?elif np.isscalar(i): > ? ? ? ? ? ?pass > ? ? ? ?elif getattr(i, 'dtype', None) == np.bool: > ? ? ? ? ? ?out.append(i.sum()) > ? ? ? ?else: > ? ? ? ? ? ?out.append(len(i)) > > ? ?return tuple(out) > > > def test_indexedshape(): > ? ?from numpy.testing import assert_equal as eq > ? ?s = (6,5,4,3) > ? ?a = np.empty(s) > ? ?i = np.index_exp[::4, 3:, 0, np.array([True, False, True])] > ? ?eq(a[i].shape, indexedshape(s, i)) > > ? 
BTW: this will be very useful for understanding slicing and indexing,
which is (for me) hidden in the C code.

A few more test cases. This one is good:

>>> i = np.index_exp[np.array([0,1,2])[:,None], np.array([0,1]),:,:]
>>> a[i].shape, indexedshape(s, i)
((3, 2, 4, 3), (3, 2, 4, 3))

These are the tricky ones, the ones that show the interaction between
broadcasting and slices in more than 2 dimensions:

>>> i = np.index_exp[np.array([0,1,2])[:,None], :, np.array([0,1])]
>>> a[i].shape, indexedshape(s, i)
((3, 2, 5, 3), (3, 5, 2, 3))

>>> i = np.index_exp[np.array([0,1,2])[:,None], ..., np.array([0,1])]
>>> a[i].shape, indexedshape(s, i)
((3, 2, 5, 4), (3, 5, 4, 2))

Thanks,

Josef

>
> On Fri, Jan 29, 2010 at 1:27 PM, ? wrote:
>> On Fri, Jan 29, 2010 at 1:03 PM, Keith Goodman wrote:
>>> On Fri, Jan 29, 2010 at 9:53 AM, ? wrote:
>>>> I forgot about ellipsis, since I never use them,
>>>> replace ellipsis by [slice(None)]*ndim or something like this
>>>>
>>>> I don't know how to access an ellipsis directly, is it even possible
>>>> to construct an index list that contains an ellipsis?
>>>> There is an object for it but I never looked at it.
>>>
>>> I haven't been following the discussion and I don't understand your
>>> question and in a moment I will accidentally hit send...
>>>
>>>>> class eli(object):
>>>   ...:
>>>   ...:         def __init__(self):
>>>   ...:             pass
>>>   ...:
>>>   ...:     def __getitem__(self, index):
>>>   ...:             print index
>>>   ...:
>>>
>>>>> x[...]
>>> Ellipsis
>>>>> x[...,1]
>>> (Ellipsis, 1)
>>>
>>> Ellipsis is a python class. Built in, no need to import.
>>
>> thanks, this makes it possible to construct index lists with Ellipsis,
>> but it showed that my broadcast idea doesn't work this way
>>
>> Travis explained last year how slices and broadcasting are used for
>> indexing, and it's quite a bit more complicated than this.
>>
>> Sorry for jumping in too fast.
>>
>> Josef
>>
>>>>> indi= (slice(2,5),Ellipsis, np.arange(3)[:,None])
>>>>> ind2 = []
>>>>> for i in indi:
>>        if not i is Ellipsis: ind2.append(i)
>>        else: ind2.extend([slice(None)]*2)
>>
>>
>>>>> ind2
>> [slice(2, 5, None), slice(None, None, None), slice(None, None, None),
>> array([[0],
>>        [1],
>>        [2]])]
>>>>> np.broadcast(*ind2).shape
>> (3, 1)
>>

From robert.kern at gmail.com  Fri Jan 29 19:25:09 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 29 Jan 2010 18:25:09 -0600
Subject: [Numpy-discussion] Distutils issue?
	[migrated]
In-Reply-To: 
References: 
Message-ID: <3d375d731001291625k38d80213r6dbfd5c2f60d0153@mail.gmail.com>

On Fri, Jan 29, 2010 at 18:18, Tom Davis wrote:
> [This thread has been migrated from distutils-sig until such a time that it
> can be determined if this is a distutils or numpy issue]
>
> A quick recap: Basically, trying to
> fix http://projects.scipy.org/numpy/ticket/999 along with some duplicates
> thereof. Robert made some changes to get us past the initial missing lists
> in 8080 and 8081; now things have gotten crazy.
>
> ``export CFLAGS="-I/usr/include/python2.5"`` fixed the missing header issue.
> I've attached the resulting dump with the patched debug output; I wasn't
> able to make a lot of sense out of it. At over 2400 lines, it's a bit
> lengthy.

Don't use CFLAGS. Please show the dump without that flag.

Also, please show me the command line(s) that you are using to try to
install numpy.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From ralf.gommers at googlemail.com  Sun Jan 31 09:43:24 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 31 Jan 2010 22:43:24 +0800
Subject: [Numpy-discussion] which Python for OS X to build installers?
Message-ID: 

Hi,

With only a few changes (see diff below) to pavement.py I managed to
build a dmg installer. For this I used the Python in the bootstrap
virtualenv however, instead of the one in
/Library/Frameworks/Python.framework/. Does this matter?

I don't have a framework build installed, the 4-way universal build did
not work for me out of the box a while ago, and installer downloads from
python.org are blocked here (thanks Chinese Govt., the Great Firewall is
an impressive productivity killer). So I stuck with the default Apple
Python till now.

For making releases, would I need the framework build? Do I need 32- and
64-bit versions of Python 2.4, 2.5 and 2.6?
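In case it helps to narrow that down, this is how I have been checking
whether a given interpreter is a framework build at all (assuming
PYTHONFRAMEWORK is the right config variable to look at; it prints
'Python' for a framework build and nothing for a non-framework one,
though it does not tell python.org and Apple builds apart):

$ python -c "from distutils import sysconfig; print sysconfig.get_config_var('PYTHONFRAMEWORK')"
Python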
Cheers,
Ralf


diff --git a/pavement.py b/pavement.py
index f6c1433..bc931ec 100644
--- a/pavement.py
+++ b/pavement.py
@@ -88,12 +88,13 @@ SUPERPACK_BUILD = 'build-superpack'
 SUPERPACK_BINDIR = os.path.join(SUPERPACK_BUILD, 'binaries')
 options(bootstrap=Bunch(bootstrap_dir="bootstrap"),
-        virtualenv=Bunch(packages_to_install=["sphinx", "numpydoc"], no_site_packages=True),
+        virtualenv=Bunch(packages_to_install=["sphinx", "numpydoc"],
+                         no_site_packages=False),
         sphinx=Bunch(builddir="build", sourcedir="source", docroot='doc'),
         superpack=Bunch(builddir="build-superpack"),
         installers=Bunch(releasedir="release",
                          installersdir=os.path.join("release", "installers")),
-        doc=Bunch(doc_root="doc", 
+        doc=Bunch(doc_root="doc",
             sdir=os.path.join("doc", "source"),
             bdir=os.path.join("doc", "build"),
             bdir_latex=os.path.join("doc", "build", "latex"),
@@ -106,7 +107,7 @@ options(bootstrap=Bunch(bootstrap_dir="bootstrap"),
 MPKG_PYTHON = {
     "2.5": ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"],
-    "2.6": ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"]
+    "2.6": ["python"]
 }
 SSE3_CFG = {'ATLAS': r'C:\local\lib\yop\sse3'}
@@ -206,7 +207,7 @@ def bdist_superpack(options):
     copy_bdist("sse2")
     bdist_wininst_arch(pyver, 'sse3')
     copy_bdist("sse3")
-    
+
     idirs = options.installers.installersdir
     pyver = options.python_version
     prepare_nsis_script(pyver, FULLVERSION)
@@ -273,8 +274,8 @@ def bootstrap(options):
     options.virtualenv.script_name = os.path.join(options.bootstrap_dir,
                                                   bscript)
-    options.virtualenv.no_site_packages = True
-    options.bootstrap.no_site_packages = True
+    options.virtualenv.no_site_packages = False
+    options.bootstrap.no_site_packages = False
     call_task('paver.virtual.bootstrap')
     sh('cd %s; %s %s' % (bdir, sys.executable, bscript))

From cournape at gmail.com  Sun Jan 31 23:33:26 2010
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 1 Feb 2010 13:33:26 +0900
Subject: [Numpy-discussion] which Python for OS X to build installers?
In-Reply-To: 
References: 
Message-ID: <5b8d13221001312033j1d03642aq8a60b5a9654b9bd5@mail.gmail.com>

On Sun, Jan 31, 2010 at 11:43 PM, Ralf Gommers wrote:
> Hi,
>
> With only a few changes (see diff below) to pavement.py I managed to build a
> dmg installer. For this I used the Python in the bootstrap virtualenv
> however, instead of the one in /Library/Frameworks/Python.framework/. Does
> this matter?

Yes it does. The binary installers should target the python from
python.org, nothing else.

> For making releases, would I need the framework build? Do I need 32- and
> 64-bit versions of Python 2.4, 2.5 and 2.6?

The Python from python.org does not support 64 bits (yet), so just
build for ppc/x86. I never bothered with ppc64, and I think we can
actually give up on ppc soon.

cheers,

David
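For reference, targeting the python.org Python amounts to leaving the
MPKG_PYTHON entries in pavement.py pointed at the python.org framework
interpreters (roughly the values the diff above swapped out) rather than
whatever "python" happens to be on the path:

MPKG_PYTHON = {
    "2.5": ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"],
    "2.6": ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"]
}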