From pierre.raybaut at gmail.com Sun Apr 1 03:36:35 2012
From: pierre.raybaut at gmail.com (Pierre Raybaut)
Date: Sun, 1 Apr 2012 09:36:35 +0200
Subject: [Numpy-discussion] ANN: Spyder v2.1.9
Message-ID:

Hi all,

On behalf of Spyder's development team
(http://code.google.com/p/spyderlib/people/list), I'm pleased to announce
that Spyder v2.1.9 has been released and is available for Windows
XP/Vista/7, GNU/Linux and MacOS X: http://code.google.com/p/spyderlib/

This is a pure maintenance release -- a lot of bugs were fixed since
v2.1.8: http://code.google.com/p/spyderlib/wiki/ChangeLog

Spyder is a free, open-source (MIT license) interactive development
environment for the Python language with advanced editing, interactive
testing, debugging and introspection features. Originally designed to
provide MATLAB-like features (integrated help, interactive console,
variable explorer with GUI-based editors for dictionaries, NumPy arrays,
...), it is strongly oriented towards scientific computing and software
development.

Thanks to the `spyderlib` library, Spyder also provides powerful
ready-to-use widgets: embedded Python console (example:
http://packages.python.org/guiqwt/_images/sift3.png), NumPy array editor
(example: http://packages.python.org/guiqwt/_images/sift2.png), dictionary
editor, source code editor, etc.

Description of key features with tasty screenshots can be found at:
http://code.google.com/p/spyderlib/wiki/Features

On Windows platforms, Spyder is also available as a stand-alone executable
(don't forget to disable UAC on Vista/7). This all-in-one portable version
is still experimental (for example, it does not embed sphinx -- meaning no
rich text mode for the object inspector) but it should provide a working
version of Spyder for Windows platforms without having to install anything
else (except Python 2.x itself, of course).

Don't forget to follow Spyder updates/news:
* on the project website: http://code.google.com/p/spyderlib/
* and on our official blog: http://spyder-ide.blogspot.com/

Last, but not least, we welcome any contribution that helps make Spyder an
efficient scientific development/computing environment. Join us to help
create your favourite environment!
(http://code.google.com/p/spyderlib/wiki/NoteForContributors)

Enjoy!
-Pierre

From ralf.gommers at googlemail.com Sun Apr 1 05:32:59 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 1 Apr 2012 11:32:59 +0200
Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian)
In-Reply-To:
References:
Message-ID:

On Sat, Mar 31, 2012 at 12:39 PM, Sandro Tosi wrote:

> Hi Ralf
> sorry for the late reply.
>
> On Tue, Mar 27, 2012 at 22:29, Ralf Gommers wrote:
> >
> > On Wed, Mar 21, 2012 at 12:28 AM, Sandro Tosi wrote:
> >>
> >> Hello,
> >> I've reported http://projects.scipy.org/numpy/ticket/2085 and Ralf
> >> asked for bringing that up here: is anyone able to replicate the
> >> problem described in that ticket?
> >>
> >> The debian bug tracking the problem is:
> >> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=664672
> >
> > We do have an Ubuntu buildbot that runs fine with 2.7.2 (see
> > buildbot.scipy.org).
>
> The Ubuntu Python and build stack tend to be different from the Debian
> ones, so they are not exactly comparable.
>
> > Is that failure seen on unusual hardware or with a
> > specific compiler only?
>
> Well, I don't think so.
> You can check all our archs' build logs at
> https://buildd.debian.org/status/package.php?p=python-numpy&suite=experimental
> but I saw that on my laptop (amd64) and on those logs too. There you
> can find all the references to the versions of the tools used for the
> build.

Thanks. Can you explain what happens when running the tests? I don't
understand why the log says "Fatal Python error...Aborted" and then it
happily continues (or restarts) and returns "OK (KNOWNFAIL=3, SKIP=4)" even
for the 2.7.3rc1 and 3.2.3rc2 release candidates.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matrixhasu at gmail.com Sun Apr 1 06:08:04 2012
From: matrixhasu at gmail.com (Sandro Tosi)
Date: Sun, 1 Apr 2012 12:08:04 +0200
Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian)
In-Reply-To:
References:
Message-ID:

On Sun, Apr 1, 2012 at 11:32, Ralf Gommers wrote:
> Thanks. Can you explain what happens when running the tests? I don't
> understand why the log says "Fatal Python error...Aborted" and then it
> happily continues (or restarts) and returns "OK (KNOWNFAIL=3, SKIP=4)" even
> for the 2.7.3rc1 and 3.2.3rc2 release candidates.

I think it's some sort of stdout/stderr mixup, where

.....
-----
Ran 3541 tests in 26.049s

OK (KNOWNFAIL=3, SKIP=4)

is printed before the information about the tests, such as:

Running unit tests for numpy
NumPy version 1.6.1
NumPy is installed in
/build/buildd-python-numpy_1.6.1-6-i386-lYkcLV/python-numpy-1.6.1/debian/tmp/usr/lib/python2.7/dist-packages/numpy
Python version 2.7.3rc1 (default, Mar 10 2012, 00:01:06) [GCC 4.6.3]
nose version 1.1.2

so it might be that only the debug flavors are affected by this problem.

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From ralf.gommers at googlemail.com Sun Apr 1 06:25:49 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 1 Apr 2012 12:25:49 +0200
Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian)
In-Reply-To:
References:
Message-ID:

On Sun, Apr 1, 2012 at 12:08 PM, Sandro Tosi wrote:

> On Sun, Apr 1, 2012 at 11:32, Ralf Gommers wrote:
> > Thanks. Can you explain what happens when running the tests? I don't
> > understand why the log says "Fatal Python error...Aborted" and then it
> > happily continues (or restarts) and returns "OK (KNOWNFAIL=3, SKIP=4)"
> > even for the 2.7.3rc1 and 3.2.3rc2 release candidates.
>
> I think it's some sort of stdout/stderr mixup, where
>
> .....
> -----
> Ran 3541 tests in 26.049s
>
> OK (KNOWNFAIL=3, SKIP=4)
>
> is printed before the information about the tests, such as:
>
> Running unit tests for numpy
> NumPy version 1.6.1
> NumPy is installed in
> /build/buildd-python-numpy_1.6.1-6-i386-lYkcLV/python-numpy-1.6.1/debian/tmp/usr/lib/python2.7/dist-packages/numpy
> Python version 2.7.3rc1 (default, Mar 10 2012, 00:01:06) [GCC 4.6.3]
> nose version 1.1.2
>
> so it might be that only the debug flavors are affected by this problem.

OK, that makes sense. So there are six test runs; for normal and debug
builds of 2.6.7, 2.7.3rc1 and 3.2.3rc2. Only the debug builds of the RCs
have a problem; the debug build of 2.6.7 is fine. So I'd think that most
likely there is a problem with how the debug versions of the RCs were
built.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at googlemail.com Sun Apr 1 07:02:15 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 1 Apr 2012 13:02:15 +0200
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To: <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io>
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io>
Message-ID:

On Sat, Mar 31, 2012 at 9:45 PM, Travis Oliphant wrote:

> The idea is to allow people to test out YouTrack for a few weeks and get
> to know it while we migrate bugs to it. It looks like it is
> straightforward to export the data out of YouTrack should we eventually
> decide to use something else.

The interface looks good, but to get a feeling for how this would really
work out I think admin rights are necessary. Then we can try out the
command window (mass editing of issues), the REST API, etc. Could you send
those out off-list?

> The idea is to host it on an external server (Rackspace or AWS) that
> multiple people are able to admin. So far, I like the keyboard interface
> and the searchable widget on top. We will continue to work on moving
> tickets into the system.

I assume you're doing that based on
http://confluence.jetbrains.net/display/YTD3/Python+Client+Library? We
discussed before keeping the conversion code somewhere public; is that
possible already? Then we can also see the Trac --> YouTrack mapping. For
example, it was unclear to me if "fix versions" equal Trac Milestones or
not.

Ralf

On Mar 30, 2012, at 7:33 PM, Charles R Harris wrote:

On Fri, Mar 30, 2012 at 4:08 PM, Maggie Mari wrote:

> Hello, everyone.
>
> I work with Travis at Continuum, and he asked me to set up a YouTrack
> server that everyone is welcome to play around with. There is a test
> project currently set up, with some fake tickets.
>
> Here is the address:
>
> http://ec2-107-21-65-210.compute-1.amazonaws.com:8011/issues
>
> It's running on an AWS micro instance, so it might be slow at the moment.
>
> Any feedback or comments would be welcome.

Looks nice, although it will take a little getting used to. It's hard to
tell with these things until you have actually made some use of them. Is
it configurable? I was wondering what sort of feedback you were looking
for. Who will have access to these issues? Is this going to be hosted at
Continuum?

Chuck

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cournape at gmail.com Sun Apr 1 09:16:51 2012
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 1 Apr 2012 14:16:51 +0100
Subject: [Numpy-discussion] [ANN] Bento 0.0.8.1
Message-ID:

Hi,

I am pleased to announce a new release of bento, a packaging solution for
python which aims at reproducibility, extensibility and simplicity.
The main features of this 0.0.8.1 release are:

- Path sections can now use conditionals
- More reliable convert command to migrate
  distutils/setuptools/distribute/distutils2 setup.py to bento
- Single-file distribution can now include waf itself
- Nose is not necessary to run the test suite anymore
- Significant improvements to the distutils compatibility layer
- LibraryDir support for backward compatibility with distutils packages
  relying on the package_dir feature

Bento source code can be found on github: https://github.com/cournape/Bento
Bento documentation is there as well: https://cournape.github.com/Bento

regards,
David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kameshkk at gmail.com Sun Apr 1 09:28:41 2012
From: kameshkk at gmail.com (Kamesh Krishnamurthy)
Date: Sun, 1 Apr 2012 16:28:41 +0300
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
Message-ID:

Hello all,

I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
I've posted details on Stackoverflow: http://stackoverflow.com/q/9955021/974568

Can someone please let me know the reason for the performance gap?

Thanks,
Kamesh
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aldcroft at head.cfa.harvard.edu Sun Apr 1 11:19:17 2012
From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft)
Date: Sun, 1 Apr 2012 11:19:17 -0400
Subject: [Numpy-discussion] ndarray sub-classing and append function
In-Reply-To: <1333175146.87934.YahooMailNeo@web193205.mail.sg3.yahoo.com>
References: <1333175146.87934.YahooMailNeo@web193205.mail.sg3.yahoo.com>
Message-ID:

On Sat, Mar 31, 2012 at 2:25 AM, Prashant Saxena wrote:
> Hi,
>
> I am sub-classing numpy.ndarray for vector array representation. The append
> function is like this:
>
>     def append(self, other):
>         self = numpy.append(self, [other], axis=0)
>
> Example:
> vary = VectorArray([v1, v2])
> #vary = numpy.append(vary, [v1], axis=0)
> vary.append(v1)
>
> The commented syntax (numpy syntax) is working but "vary.append(v1)" is not
> working.
>
> Any help?

You might try something like below (untested code, just meant as
pointing in the right direction):

self.resize(len(self) + len(v1), refcheck=False)
self[len(self):] = v1

Setting refcheck=False is potentially dangerous since it means other
references to the object might get corrupted. You should play with
this option, but the alternative is that if there are *any* references
to the object then the append will fail.

- Tom

> Cheers
>
> Prashant
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
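For concreteness, a fuller version of Tom's sketch might look like the
following. This is an untested illustration, not code from the thread; it
assumes the array is C-contiguous and owns its data, and that `other` is a
sequence of rows with the same width as the existing array:

    import numpy

    class VectorArray(numpy.ndarray):
        # minimal sketch only: construct with, e.g.,
        #   vary = numpy.array([v1, v2]).view(VectorArray)
        def append(self, other):
            n = len(self)
            # grow the first axis in place; refcheck=False skips the
            # check for other references (see the caveats above)
            self.resize((n + len(other),) + self.shape[1:], refcheck=False)
            self[n:] = other
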
From chris.barker at noaa.gov Sun Apr 1 23:36:32 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Sun, 1 Apr 2012 20:36:32 -0700
Subject: [Numpy-discussion] ndarray sub-classing and append function
In-Reply-To:
References: <1333175146.87934.YahooMailNeo@web193205.mail.sg3.yahoo.com>
Message-ID:

On Sun, Apr 1, 2012 at 8:19 AM, Tom Aldcroft wrote:
> You might try something like below (untested code, just meant as
> pointing in the right direction):
>
> self.resize(len(self) + len(v1), refcheck=False)
> self[len(self):] = v1
>
> Setting refcheck=False is potentially dangerous since it means other
> references to the object might get corrupted. You should play with
> this option, but the alternative is that if there are *any* references
> to the object then the append will fail.

exactly -- numpy arrays are not designed to be re-sizable, and there are
good reasons for that.

I'd suggest that you either:

1) don't have an append method at all (though maybe provide a method or
function that makes a copy, like numpy.append)

2) use a "has a" relationship, rather than subclassing -- i.e. have your
class use a numpy array as a container internally, even though it isn't a
numpy array.
 - you could still delegate most operations to the numpy array

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker at noaa.gov

From njs at pobox.com Mon Apr 2 05:25:48 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 2 Apr 2012 10:25:48 +0100
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

To see if this is an effect of numpy using C-order by default instead of
Fortran-order, try measuring eig(x.T) instead of eig(x)?

-n

On Apr 1, 2012 2:28 PM, "Kamesh Krishnamurthy" wrote:

> Hello all,
>
> I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
> linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
> I've posted details on Stackoverflow:
> http://stackoverflow.com/q/9955021/974568
>
> Can someone please let me know the reason for the performance gap?
>
> Thanks,
> Kamesh
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kameshkk at gmail.com Mon Apr 2 05:57:48 2012
From: kameshkk at gmail.com (Kamesh Krishnamurthy)
Date: Mon, 2 Apr 2012 12:57:48 +0300
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

Changing the array to Fortran order using numpy.ndarray.T does not help
much on my machine. But this may be important, since the LAPACK routines
are written in Fortran 90.

On 2 April 2012 12:25, Nathaniel Smith wrote:

> To see if this is an effect of numpy using C-order by default instead of
> Fortran-order, try measuring eig(x.T) instead of eig(x)?
>
> -n
> On Apr 1, 2012 2:28 PM, "Kamesh Krishnamurthy" wrote:
>
>> Hello all,
>>
>> I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both
>> were linking to the Accelerate framework BLAS. NumPy turns out to be ~4x
>> slower. I've posted details on Stackoverflow:
>> http://stackoverflow.com/q/9955021/974568
>>
>> Can someone please let me know the reason for the performance gap?
>>
>> Thanks,
>> Kamesh
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
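Returning to Chris's second suggestion above, a minimal sketch of the
"has a" approach might be the following (an illustration only; VectorArray
is the name from Prashant's original post):

    import numpy

    class VectorArray(object):
        """Holds an ndarray instead of subclassing it ("has a")."""
        def __init__(self, rows):
            self._data = numpy.atleast_2d(numpy.asarray(rows))

        def append(self, row):
            # copies, like numpy.append, but the wrapper stays the same
            self._data = numpy.append(self._data, [row], axis=0)

        def __getattr__(self, name):
            # delegate everything else (mean, shape, dtype, ...) to the array
            return getattr(self._data, name)

        def __len__(self):
            return len(self._data)
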
From cournape at gmail.com Mon Apr 2 09:47:25 2012
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 2 Apr 2012 14:47:25 +0100
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

On Sun, Apr 1, 2012 at 2:28 PM, Kamesh Krishnamurthy wrote:

> Hello all,
>
> I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
> linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
> I've posted details on Stackoverflow:
> http://stackoverflow.com/q/9955021/974568
>
> Can someone please let me know the reason for the performance gap?

I would look at two things:

- first, are you sure MATLAB is not using the MKL instead of the
Accelerate framework? I have not used MATLAB in ages, but you should be
able to check by running otool -L on some of the core libraries of MATLAB,
to find out which libraries are linked to it

- second, it could be that MATLAB eig and numpy eig don't use the same
underlying LAPACK API (do they give you the same result?). This would
already be a bit harder to check, unless it is documented explicitly in
MATLAB.

regards,
David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From williamj at tenbase2.com Mon Apr 2 11:46:17 2012
From: williamj at tenbase2.com (William Johnston)
Date: Mon, 2 Apr 2012 11:46:17 -0400
Subject: [Numpy-discussion] \*\*\*\*\*SPAM\*\*\*\*\* Re: \*\*\*\*\*SPAM\*\*\*\*\* Re: Numpy for IronPython 2.7 DLR app?
In-Reply-To:
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io><8A50E415A49F4E208EBE78C74915ADA8@leviathan>
Message-ID: <85857A6D366B4CAE90AC6014A36FBB40@leviathan>

Hello,

My email server went down.

Did anyone respond to this post?

Thanks,
William Johnston

-----Original Message-----
From: William Johnston
Sent: Thursday, March 29, 2012 5:59 PM
To: Discussion of Numerical Python
Subject: \*\*\*\*\*SPAM\*\*\*\*\* Re: [Numpy-discussion] \*\*\*\*\*SPAM\*\*\*\*\* Re: Numpy for IronPython 2.7 DLR app?

Ilan:

Thanks for your post.

I can import from the command-line and IronPython console, but not from a
C# DLR app (using embedded python scripts.)

Any suggestions?

Regards,
William Johnston

-----Original Message-----
From: Ilan Schnell
Sent: Thursday, March 29, 2012 3:11 PM
To: Discussion of Numerical Python
Subject: \*\*\*\*\*SPAM\*\*\*\*\* Re: [Numpy-discussion] Numpy for IronPython 2.7 DLR app?

Hello William,

It's just a matter of importing numpy into IronPython. See also,
http://enthought.com/repo/.iron/NumPySciPyforDotNet.pdf

- Ilan

On Thu, Mar 29, 2012 at 12:47 PM, William Johnston wrote:
>
> Hello,
>
> Can numpy for .NET be used in a DLR C# application?
>
> Regards,
> William Johnston
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From chris.barker at noaa.gov Mon Apr 2 11:45:23 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 2 Apr 2012 08:45:23 -0700
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

On Mon, Apr 2, 2012 at 2:25 AM, Nathaniel Smith wrote:
> To see if this is an effect of numpy using C-order by default instead of
> Fortran-order, try measuring eig(x.T) instead of eig(x)?

Just to be clear, .T re-arranges the strides (making it Fortran order),
but you'll have to make sure your original data is the transpose of what
you want.

I posted this on slashdot, but for completeness:

the code posted on slashdot is also profiling the random number
generation -- I have no idea how numpy and MATLAB's random number
generation compare, nor how random number generation compares to eig(),
but you should profile them independently to make sure.

-Chris

> -n
>
> On Apr 1, 2012 2:28 PM, "Kamesh Krishnamurthy" wrote:
>>
>> Hello all,
>>
>> I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
>> linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
>> I've posted details on Stackoverflow:
>> http://stackoverflow.com/q/9955021/974568
>>
>> Can someone please let me know the reason for the performance gap?
>>
>> Thanks,
>> Kamesh
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker at noaa.gov
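One minimal way to follow Chris's advice (an illustrative sketch, not the
code from the Stack Overflow post) is to time the two stages separately:

    import time
    import numpy as np

    t0 = time.time()
    x = np.random.randn(1000, 1000)
    t1 = time.time()
    w, v = np.linalg.eig(x)
    t2 = time.time()

    # separates the cost of random number generation from eig itself
    print "randn: %.3f s   eig: %.3f s" % (t1 - t0, t2 - t1)
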
From francesc at continuum.io Mon Apr 2 11:55:10 2012
From: francesc at continuum.io (Francesc Alted)
Date: Mon, 02 Apr 2012 10:55:10 -0500
Subject: [Numpy-discussion] \*\*\*\*\*SPAM\*\*\*\*\* Re: \*\*\*\*\*SPAM\*\*\*\*\* Re: Numpy for IronPython 2.7 DLR app?
In-Reply-To: <85857A6D366B4CAE90AC6014A36FBB40@leviathan>
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io><8A50E415A49F4E208EBE78C74915ADA8@leviathan> <85857A6D366B4CAE90AC6014A36FBB40@leviathan>
Message-ID: <4F79CBDE.9040600@continuum.io>

On 4/2/12 10:46 AM, William Johnston wrote:
> Hello,
>
> My email server went down.
>
> Did anyone respond to this post?

You can check the mail archive here:

http://mail.scipy.org/pipermail/numpy-discussion

--
Francesc Alted

From cournape at gmail.com Mon Apr 2 12:04:36 2012
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 2 Apr 2012 17:04:36 +0100
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

On Mon, Apr 2, 2012 at 4:45 PM, Chris Barker wrote:

> On Mon, Apr 2, 2012 at 2:25 AM, Nathaniel Smith wrote:
> > To see if this is an effect of numpy using C-order by default instead of
> > Fortran-order, try measuring eig(x.T) instead of eig(x)?
>
> Just to be clear, .T re-arranges the strides (making it Fortran
> order), but you'll have to make sure your original data is the
> transpose of what you want.
>
> I posted this on slashdot, but for completeness:
>
> the code posted on slashdot is also profiling the random number
> generation -- I have no idea how numpy and MATLAB's random number
> generation compare, nor how random number generation compares to
> eig(), but you should profile them independently to make sure.

While this is true, the cost is most likely negligible compared to the
cost of eig (unless something weird is going on in random as well).

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From travis at continuum.io Mon Apr 2 12:09:06 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 2 Apr 2012 11:09:06 -0500
Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
In-Reply-To:
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io>
Message-ID: <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>

The idea of using constants instead of strings throughout NumPy is an
interesting one, but should be pushed to another thread and not hold up
this particular PR.

I like the suggestion of Nathaniel. Let's get the PR committed with a
single-function interface. I like having the array as the first argument
to that function (it is more consistent). The keyword can be called mode
or method.

Tim, what do you think of that? Further developments can happen in a
separate PR.

-Travis

On Mar 31, 2012, at 3:07 PM, Richard Hattersley wrote:

>>> 1) The use of string constants to identify NumPy processes. It would
>>> seem better to use library defined constants (ufuncs?) for better
>>> future-proofing, maintenance, etc.
>>
>> I don't see how this would help with future-proofing or maintenance --
>> can you elaborate?
>>
>> If this were C, I'd agree; using an enum would have a number of benefits:
>> -- easier to work with than strings (== and switch work, no memory
>> management hassles)
>> -- compiler will notice if you accidentally misspell the enum name
>> -- since you always in effect 'import *', getting access to
>> additional constants doesn't require any extra effort
>> But in Python none of these advantages apply, so I find it more
>> convenient to just use strings.
>
> Using constants provides for tab-completion and associated help text.
> The help text can be particularly useful if the choice of constant
> affects which extra keyword arguments can be specified.
>
> And on a minor note, and far more subjectively (time for another
> bike-shedding reference!), there's the "cleanliness" of API. (e.g.
> Strings don't "feel" like a good match. There are an infinite number of
> strings, but only a small number are valid. There's nothing
> machine-readable you can interrogate to find valid values.)
> Under the
> hood you'll have to use the string to do a lookup, but the constant
> can *be* the result of the lookup. Why re-invent the wheel when the
> language gives it to you for free?
>
>> Note also that we couldn't use ufuncs here, because we're specifying a
>> rather unusual sort of operation -- there is no ufunc for padding with
>> a linear ramp etc. Using "mean" as the example is misleading in this
>> respect -- it's not really the same as np.mean.
>>
>>> 2) Why does only "pad" use this style of interface? If it's a good
>>> idea for "pad", perhaps it should be applied more generally?
>>> numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone?
>>
>> The mode="foo" interface style is actually used in other places, e.g.,
>> np.linalg.qr.
>
> My mistake - I misinterpreted the API earlier, so we're talking at
> cross-purposes. My comment/question isn't really about pad & mode, but
> about numpy more generally. But it still stands - albeit somewhat
> hypothetically, since it's hard to imagine such a change taking place.
>
> Richard
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From tim at cerazone.net Mon Apr 2 12:36:21 2012
From: tim at cerazone.net (Tim Cera)
Date: Mon, 2 Apr 2012 12:36:21 -0400
Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
In-Reply-To: <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io> <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>
Message-ID:

On Mon, Apr 2, 2012 at 12:09 PM, Travis Oliphant wrote:

> The idea of using constants instead of strings throughout NumPy is an
> interesting one, but should be pushed to another thread and not hold up
> this particular PR.
>
> I like the suggestion of Nathaniel. Let's get the PR committed with a
> single-function interface. I like having the array as the first argument
> to that function (it is more consistent). The keyword can be called mode
> or method.
>
> Tim, what do you think of that? Further developments can happen in a
> separate PR.

Current pull request has a single pad function with the
mode/method/whatever you call it as a string OR function as the first
argument.

pad('mean', a, 5)
pad('median', a, 7)
pad(paddingfunction, a, 2)
...etc.

I like the strings, maybe that is not the best, but yes I would like to
defer that discussion. Having the string representation does allow 'pad()'
to make some checks on inputs to the built-in functions.

About whether to have "pad('mean', a, 5)" or "pad(a, 'mean', 5)" - I don't
care. It seems like we have two votes for the latter form (Travis and
Nathaniel) and unless others weigh in, I will make the change soonish.

Kindest regards,
Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nouiz at nouiz.org Mon Apr 2 12:36:20 2012
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Mon, 2 Apr 2012 12:36:20 -0400
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

numpy.random is not optimized. If MATLAB uses the random numbers from
MKL, they will be much faster.
Fred

On Mon, Apr 2, 2012 at 12:04 PM, David Cournapeau wrote:
>
> On Mon, Apr 2, 2012 at 4:45 PM, Chris Barker wrote:
>>
>> On Mon, Apr 2, 2012 at 2:25 AM, Nathaniel Smith wrote:
>> > To see if this is an effect of numpy using C-order by default instead of
>> > Fortran-order, try measuring eig(x.T) instead of eig(x)?
>>
>> Just to be clear, .T re-arranges the strides (making it Fortran
>> order), but you'll have to make sure your original data is the
>> transpose of what you want.
>>
>> I posted this on slashdot, but for completeness:
>>
>> the code posted on slashdot is also profiling the random number
>> generation -- I have no idea how numpy and MATLAB's random number
>> generation compare, nor how random number generation compares to
>> eig(), but you should profile them independently to make sure.
>
> While this is true, the cost is most likely negligible compared to the cost
> of eig (unless something weird is going on in random as well).
>
> David
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From olivier.grisel at ensta.org Mon Apr 2 12:41:24 2012
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Mon, 2 Apr 2012 18:41:24 +0200
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

On 2 April 2012 18:36, Frédéric Bastien wrote:
> numpy.random is not optimized. If MATLAB uses the random numbers from
> MKL, they will be much faster.

In that case this is indeed negligible:

In [1]: %timeit np.random.randn(2000, 2000)
1 loops, best of 3: 306 ms per loop

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

From aronne.merrelli at gmail.com Mon Apr 2 13:18:20 2012
From: aronne.merrelli at gmail.com (Aronne Merrelli)
Date: Mon, 2 Apr 2012 12:18:20 -0500
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

On Sun, Apr 1, 2012 at 8:28 AM, Kamesh Krishnamurthy wrote:
> Hello all,
>
> I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
> linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
> I've posted details on Stackoverflow:
> http://stackoverflow.com/q/9955021/974568

If you just call eig() in MATLAB it only returns eigenvalues (not
vectors). I think there might be a "shortcut" algorithm if you only want
the eigenvalues - or maybe it is faster just due to the smaller memory
requirement. NumPy's eig always computes both. On my Mac OS X machine I
get this result, showing the two are basically equivalent (this is EPD
NumPy, so show_config() shows it is built on MKL):

MATLAB:
>> tic; eig(r); toc
Elapsed time is 10.594226 seconds.
>> tic; [V,D] = eig(r); toc
Elapsed time is 23.767467 seconds.

NumPy:
In [4]: t0=datetime.now(); numpy.linalg.eig(r); print datetime.now()-t0
0:00:25.594435
In [6]: t0=datetime.now(); v,V = numpy.linalg.eig(r); print datetime.now()-t0
0:00:25.485411

If you change the MATLAB call, how does it compare?
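To make the NumPy side of that comparison symmetric, the eigenvalues-only
and full decompositions can be timed separately (a sketch; `r` is the
random test matrix from the quoted posts, sized as in Olivier's timing):

    import numpy as np
    from datetime import datetime

    r = np.random.randn(2000, 2000)

    t0 = datetime.now()
    w = np.linalg.eigvals(r)   # eigenvalues only, like MATLAB's one-output eig
    print datetime.now() - t0

    t0 = datetime.now()
    w, v = np.linalg.eig(r)    # eigenvalues and eigenvectors
    print datetime.now() - t0
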
From charlesr.harris at gmail.com Mon Apr 2 13:38:49 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 2 Apr 2012 11:38:49 -0600
Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
In-Reply-To:
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io> <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>
Message-ID:

On Mon, Apr 2, 2012 at 10:36 AM, Tim Cera wrote:
>
> On Mon, Apr 2, 2012 at 12:09 PM, Travis Oliphant wrote:
>
>> The idea of using constants instead of strings throughout NumPy is an
>> interesting one, but should be pushed to another thread and not hold up
>> this particular PR.
>>
>> I like the suggestion of Nathaniel. Let's get the PR committed with a
>> single-function interface. I like having the array as the first argument
>> to that function (it is more consistent). The keyword can be called mode
>> or method.
>>
>> Tim, what do you think of that? Further developments can happen in a
>> separate PR.
>
> Current pull request has a single pad function with the
> mode/method/whatever you call it as a string OR function as the first
> argument.
>
> pad('mean', a, 5)
> pad('median', a, 7)
> pad(paddingfunction, a, 2)
> ...etc.
>
> I like the strings, maybe that is not the best, but yes I would like to
> defer that discussion. Having the string representation does allow 'pad()'
> to make some checks on inputs to the built-in functions.
>
> About whether to have "pad('mean', a, 5)" or "pad(a, 'mean', 5)" - I don't
> care. It seems like we have two votes for the latter form (Travis and
> Nathaniel) and unless others weigh in, I will make the change soonish.

I think the suggestion is pad(a, 5, mode='mean'), which would be
consistent with common numpy signatures. The mode keyword should probably
have a default, something commonly used. I'd suggest 'mean', Nathaniel
suggests 'zero', I think either would be fine.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From travis at continuum.io Mon Apr 2 13:56:29 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 2 Apr 2012 12:56:29 -0500
Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
In-Reply-To:
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io> <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>
Message-ID: <1214690B-ED84-43CA-8490-A9FC636E58FA@continuum.io>

> I like the strings, maybe that is not the best, but yes I would like to
> defer that discussion. Having the string representation does allow 'pad()'
> to make some checks on inputs to the built-in functions.
>
> About whether to have "pad('mean', a, 5)" or "pad(a, 'mean', 5)" - I don't
> care. It seems like we have two votes for the latter form (Travis and
> Nathaniel) and unless others weigh in, I will make the change soonish.
>
> I think the suggestion is pad(a, 5, mode='mean'), which would be
> consistent with common numpy signatures. The mode keyword should probably
> have a default, something commonly used. I'd suggest 'mean', Nathaniel
> suggests 'zero', I think either would be fine.

It looks like most agree that pad(a, 5, mode='mean') is the preferred API.
In terms of the default, it really depends on the problem as to which one
is most sensible. I notice that mode='edge' does not require additional
keyword arguments and so perhaps this is the best default.
But, I think Tim should make the call on which default he prefers.

-Travis

> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com Mon Apr 2 14:05:40 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 2 Apr 2012 19:05:40 +0100
Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
In-Reply-To:
References:
Message-ID:

On Mon, Apr 2, 2012 at 6:18 PM, Aronne Merrelli wrote:
> On Sun, Apr 1, 2012 at 8:28 AM, Kamesh Krishnamurthy wrote:
>> Hello all,
>>
>> I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
>> linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
>> I've posted details on Stackoverflow:
>> http://stackoverflow.com/q/9955021/974568
>
> If you just call eig() in MATLAB it only returns eigenvalues (not
> vectors). I think there might be a "shortcut" algorithm if you only
> want the eigenvalues - or maybe it is faster just due to the smaller
> memory requirement. NumPy's eig always computes both. On my Mac OS X
> machine I get this result, showing the two are basically equivalent
> (this is EPD NumPy, so show_config() shows it is built on MKL):
>
> MATLAB:
> >> tic; eig(r); toc
> Elapsed time is 10.594226 seconds.
> >> tic; [V,D] = eig(r); toc
> Elapsed time is 23.767467 seconds.
>
> NumPy
> In [4]: t0=datetime.now(); numpy.linalg.eig(r); print datetime.now()-t0
> 0:00:25.594435
> In [6]: t0=datetime.now(); v,V = numpy.linalg.eig(r); print datetime.now()-t0
> 0:00:25.485411
>
> If you change the MATLAB call, how does it compare?

Or you could alternatively change the numpy call to np.linalg.eigvals(r),
if you're only interested in the eigenvalues.

- N

From tim at cerazone.net Mon Apr 2 14:14:59 2012
From: tim at cerazone.net (Tim Cera)
Date: Mon, 2 Apr 2012 14:14:59 -0400
Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
In-Reply-To:
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io> <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>
Message-ID:

> I think the suggestion is pad(a, 5, mode='mean'), which would be
> consistent with common numpy signatures. The mode keyword should probably
> have a default, something commonly used. I'd suggest 'mean', Nathaniel
> suggests 'zero', I think either would be fine.

I can't type fast enough. :-) I should say that I can't type faster than
Travis since he has already responded....

Currently that '5' in the example above is the keyword argument 'pad_width'
which defaults to 1. So really the only argument then is 'a'? Everything
else is keywords? I missed that in the discussion and I am not sure that it
is a good idea. In fact as I am typing this I am thinking that we should
have pad_width as an argument. I hate to rely on this, because it tends to
get overused, but 'Explicit is better than implicit.'

'pad(a)' would carry a lot of implicit baggage that would mean it would be
very difficult to figure out what was going on if reading someone else's
code. Someone unfamiliar with the pad routine must consult the
documentation to figure out what 'pad(a)' meant whereas "pad(a, 'mean', 1)",
regardless of the order of the arguments, would actually read pretty well.
I defer to a 'consensus' - whatever that might mean, but I am actually
thinking that the input array, mode/method, and the pad_width should be
arguments. The order of the arguments - I don't care.

I realize that this thread is around 26 messages long now, but if everyone
who is interested in this could weigh in one more time about this one
issue. To minimize discussion on the list, you can add a comment to the
pull request at https://github.com/numpy/numpy/pull/242

Kindest regards,
Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From travis at continuum.io Mon Apr 2 15:49:09 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 2 Apr 2012 14:49:09 -0500
Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
In-Reply-To:
References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io> <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io>
Message-ID:

On the one hand it is nice to be explicit. On the other hand it is nice to
have keyword arguments. In this case it is very true that pad(a) would not
be very clear.

Most clear, though, would be: pad(a, width=5, mode='mean'). You could use
keyword arguments with None as the default and raise an error if a correct
value is not passed in.

-Travis

On Apr 2, 2012, at 1:14 PM, Tim Cera wrote:

> > I think the suggestion is pad(a, 5, mode='mean'), which would be
> > consistent with common numpy signatures. The mode keyword should probably
> > have a default, something commonly used. I'd suggest 'mean', Nathaniel
> > suggests 'zero', I think either would be fine.
>
> I can't type fast enough. :-) I should say that I can't type faster than
> Travis since he has already responded....
>
> Currently that '5' in the example above is the keyword argument 'pad_width'
> which defaults to 1. So really the only argument then is 'a'? Everything
> else is keywords? I missed that in the discussion and I am not sure that it
> is a good idea. In fact as I am typing this I am thinking that we should
> have pad_width as an argument. I hate to rely on this, because it tends to
> get overused, but 'Explicit is better than implicit.'
>
> 'pad(a)' would carry a lot of implicit baggage that would mean it would be
> very difficult to figure out what was going on if reading someone else's
> code. Someone unfamiliar with the pad routine must consult the
> documentation to figure out what 'pad(a)' meant whereas "pad(a, 'mean', 1)",
> regardless of the order of the arguments, would actually read pretty well.
>
> I defer to a 'consensus' - whatever that might mean, but I am actually
> thinking that the input array, mode/method, and the pad_width should be
> arguments. The order of the arguments - I don't care.
>
> I realize that this thread is around 26 messages long now, but if everyone
> who is interested in this could weigh in one more time about this one
> issue. To minimize discussion on the list, you can add a comment to the
> pull request at https://github.com/numpy/numpy/pull/242
>
> Kindest regards,
> Tim
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
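URL:

A minimal sketch of the signature Travis describes (an illustration of the
idea, not the implementation from the pull request):

    def pad(a, width=None, mode=None, **kwargs):
        # explicit keywords with None "defaults", validated up front,
        # so that a bare pad(a) fails loudly instead of guessing
        if width is None or mode is None:
            raise ValueError("usage, e.g.: pad(a, width=5, mode='mean')")
        # ... dispatch on mode (string or callable) would go here ...
        return a
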
From pav at iki.fi Mon Apr 2 16:16:08 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 02 Apr 2012 22:16:08 +0200
Subject: [Numpy-discussion] Trac configuration tweak
In-Reply-To:
References:
Message-ID:

On 31.03.2012 18:19, Pauli Virtanen wrote:
> I moved projects.scipy.org Tracs to run on mod_python (instead of CGI),
> in order to try to combat the present performance issues. Let's see if
> this helps with the "database is locked" problem.
>
> Please drop me a message if something stops working.

OK, this seemed at first sight to help somewhat. However, rolling them
back to CGI became necessary, due to uncontrollable mod_python memory
usage. Some other solution should be taken.

Pauli

From travis at continuum.io Mon Apr 2 16:47:38 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 2 Apr 2012 15:47:38 -0500
Subject: [Numpy-discussion] Trac configuration tweak
In-Reply-To:
References:
Message-ID:

The plan is to use a different issue tracker. We are trying out YouTrack
right now and hope to export the Trac database into YouTrack.

-Travis

On Apr 2, 2012, at 3:16 PM, Pauli Virtanen wrote:

> On 31.03.2012 18:19, Pauli Virtanen wrote:
>> I moved projects.scipy.org Tracs to run on mod_python (instead of CGI),
>> in order to try to combat the present performance issues. Let's see if
>> this helps with the "database is locked" problem.
>>
>> Please drop me a message if something stops working.
>
> OK, this seemed at first sight to help somewhat. However, rolling them
> back to CGI became necessary, due to uncontrollable mod_python memory
> usage. Some other solution should be taken.
>
> Pauli
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From pav at iki.fi Mon Apr 2 16:58:32 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 02 Apr 2012 22:58:32 +0200
Subject: [Numpy-discussion] Trac configuration tweak
In-Reply-To:
References:
Message-ID:

Hi,

On 02.04.2012 22:47, Travis Oliphant wrote:
> The plan is to use a different issue tracker.
> We are trying out YouTrack right now and hope to
> export the Trac database into YouTrack.

Certainly, I'm aware :)

However, was the plan to also migrate the Scipy Trac? I understood the
answer to this was "no". If the aim is to migrate also that, then the
Trac can stay as it is.

Pauli

From hongbin_zhang82 at hotmail.com Mon Apr 2 17:35:06 2012
From: hongbin_zhang82 at hotmail.com (Hongbin Zhang)
Date: Tue, 3 Apr 2012 05:35:06 +0800
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
Message-ID:

Dear Python-users,

I am currently very confused about the Scipy routine to obtain the
eigenvectors of a complex matrix. Attached you find two files to
diagonalize a 2X2 complex Hermitian matrix; however, on my computer,

If I run python, I got:

[[ 0.80322132+0.j          0.59500941+0.02827207j]
 [-0.59500941+0.02827207j  0.80322132+0.j        ]]

If I compile the fortran code, I got:

 ( -0.595009410289, -0.028272068905) (  0.802316135182,  0.038122316497)
 ( -0.803221321796,  0.000000000000) ( -0.595680709955,  0.000000000000)

From the scipy webpage, it is said that numpy.linalg.eig() provides
nothing but an interface to the LAPACK zheevd subroutine, which is used
in my fortran code.

Would somebody be kind enough to tell me how to get consistent results?

Many thanks in advance.

Best wishes,

Hongbin

Ad hoc, ad loc and quid pro quo --- Jeremy Hilary Boob

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2X2.f90
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2X2.py
Type: text/x-script.phyton
Size: 144 bytes
Desc: not available
URL:
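The scrubbed 2X2.py is not preserved in the archive; a guess at its
contents, reconstructed from the matrix quoted later in this thread (not
the original file):

    import numpy
    H = [[0.6 + 0.0j, -1.97537668 - 0.09386068j],
         [-1.97537668 + 0.09386068j, -0.6 + 0.0j]]
    print numpy.linalg.eig(numpy.array(H))[1]
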
From travis at continuum.io Mon Apr 2 17:35:44 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 2 Apr 2012 16:35:44 -0500
Subject: [Numpy-discussion] Trac configuration tweak
In-Reply-To:
References:
Message-ID:

Sorry, I saw the cross-posting to the NumPy list and wondered if we were
on the same page. I don't know of any plans to migrate SciPy Trac at this
time: perhaps later.

Thanks for the clarification.

Best,

-Travis

On Apr 2, 2012, at 3:58 PM, Pauli Virtanen wrote:

> Hi,
>
> On 02.04.2012 22:47, Travis Oliphant wrote:
>> The plan is to use a different issue tracker.
>> We are trying out YouTrack right now and hope to
>> export the Trac database into YouTrack.
>
> Certainly, I'm aware :)
>
> However, was the plan to also migrate the Scipy Trac? I understood the
> answer to this was "no". If the aim is to migrate also that, then the
> Trac can stay as it is.
>
> Pauli
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From josef.pktd at gmail.com Mon Apr 2 20:36:35 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 2 Apr 2012 20:36:35 -0400
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

2012/4/2 Hongbin Zhang:
> Dear Python-users,
>
> I am currently very confused about the Scipy routine to obtain the
> eigenvectors of a complex matrix.
> Attached you find two files to diagonalize a 2X2 complex Hermitian
> matrix; however, on my computer,
>
> If I run python, I got:
>
> [[ 0.80322132+0.j          0.59500941+0.02827207j]
>  [-0.59500941+0.02827207j  0.80322132+0.j        ]]
>
> If I compile the fortran code, I got:
>
>  ( -0.595009410289, -0.028272068905) (  0.802316135182,  0.038122316497)
>  ( -0.803221321796,  0.000000000000) ( -0.595680709955,  0.000000000000)

these results look more like eigh (except flipped)

>>> numpy.linalg.eigh(numpy.array(H))[1]
array([[ 0.59568071+0.j        , -0.80322132+0.j        ],
       [ 0.80231613-0.03812232j,  0.59500941-0.02827207j]])

Josef

> From the scipy webpage, it is said that numpy.linalg.eig() provides
> nothing but an interface to the LAPACK zheevd subroutine, which is used
> in my fortran code.
>
> Would somebody be kind enough to tell me how to get consistent results?
>
> Many thanks in advance.
>
> Best wishes,
>
> Hongbin
>
> Ad hoc, ad loc and quid pro quo --- Jeremy Hilary Boob
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From kalatsky at gmail.com Mon Apr 2 20:38:54 2012
From: kalatsky at gmail.com (Val Kalatsky)
Date: Mon, 2 Apr 2012 19:38:54 -0500
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

Both results are correct.
There are 2 factors that make the results look different:

1) The order: the 2nd eigenvector of the numpy solution corresponds to
the 1st eigenvector of your solution; note that the vectors are written
in columns.
2) The phase: an eigenvector can be multiplied by an arbitrary phase
factor with absolute value = 1. As you can see, this factor is -1 for the
2nd eigenvector and -0.99887305445887753-0.047461785427773337j for the
other one.

Val

2012/4/2 Hongbin Zhang:

> Dear Python-users,
>
> I am currently very confused about the Scipy routine to obtain the
> eigenvectors of a complex matrix.
> Attached you find two files to diagonalize a 2X2 complex Hermitian
> matrix; however, on my computer,
>
> If I run python, I got:
>
> [[ 0.80322132+0.j          0.59500941+0.02827207j]
>  [-0.59500941+0.02827207j  0.80322132+0.j        ]]
>
> If I compile the fortran code, I got:
>
>  ( -0.595009410289, -0.028272068905) (  0.802316135182,  0.038122316497)
>  ( -0.803221321796,  0.000000000000) ( -0.595680709955,  0.000000000000)
>
> From the scipy webpage, it is said that numpy.linalg.eig() provides
> nothing but an interface to the LAPACK zheevd subroutine, which is used
> in my fortran code.
>
> Would somebody be kind enough to tell me how to get consistent results?
>
> Many thanks in advance.
>
> Best wishes,
>
> Hongbin
>
> Ad hoc, ad loc and quid pro quo --- Jeremy Hilary Boob
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com Mon Apr 2 21:15:56 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 2 Apr 2012 18:15:56 -0700
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

Hi,

2012/4/2 Hongbin Zhang:
> Dear Python-users,
>
> I am currently very confused about the Scipy routine to obtain the
> eigenvectors of a complex matrix.
> Attached you find two files to diagonalize a 2X2 complex Hermitian
> matrix; however, on my computer,
>
> If I run python, I got:
>
> [[ 0.80322132+0.j          0.59500941+0.02827207j]
>  [-0.59500941+0.02827207j  0.80322132+0.j        ]]
>
> If I compile the fortran code, I got:
>
>  ( -0.595009410289, -0.028272068905) (  0.802316135182,  0.038122316497)
>  ( -0.803221321796,  0.000000000000) ( -0.595680709955,  0.000000000000)
>
> From the scipy webpage, it is said that numpy.linalg.eig() provides
> nothing but an interface to the LAPACK zheevd subroutine, which is used
> in my fortran code.
>
> Would somebody be kind enough to tell me how to get consistent results?

I should also point out that matlab and octave give the same answer as
your Fortran routine:

octave:15> H=[0.6+0.0j, -1.97537668-0.09386068j; -1.97537668+0.09386068j, -0.6+0.0j]
H =

   0.60000 + 0.00000i  -1.97538 - 0.09386i
  -1.97538 + 0.09386i  -0.60000 + 0.00000i

Best,

Matthew

From matthew.brett at gmail.com Mon Apr 2 21:53:57 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 2 Apr 2012 18:53:57 -0700
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

Hi,

On Mon, Apr 2, 2012 at 5:38 PM, Val Kalatsky wrote:
> Both results are correct.
> There are 2 factors that make the results look different:
> 1) The order: the 2nd eigenvector of the numpy solution corresponds to
> the 1st eigenvector of your solution;
> note that the vectors are written in columns.
> 2) The phase: an eigenvector can be multiplied by an arbitrary phase
> factor with absolute value = 1.
> As you can see, this factor is -1 for the 2nd eigenvector
> and -0.99887305445887753-0.047461785427773337j for the other one.

Thanks for this answer; for my own benefit:

Definition: A . v = L . v where A is the input matrix, L is an
eigenvalue of A and v is an eigenvector of A.

http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix

In [63]: A = [[0.6+0.0j, -1.97537668-0.09386068j], [-1.97537668+0.09386068j, -0.6+0.0j]]

In [64]: L, v = np.linalg.eig(A)

In [66]: np.allclose(np.dot(A, v), L * v)
Out[66]: True

Best,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From kalatsky at gmail.com Mon Apr 2 23:19:55 2012
From: kalatsky at gmail.com (Val Kalatsky)
Date: Mon, 2 Apr 2012 22:19:55 -0500
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

BTW this extra degree of freedom can be used to "rotate" the eigenvectors
along the unit circle (multiplication by exp(j*phi)). To those of physical
inclinations it should remind one of gauge fixing (vector potential in
EM/QM). These "rotations" can be used to make one (any) non-zero component
of each eigenvector a positive real number.

Finally to the point: it seems that numpy.linalg.eig uses these
"rotations" to turn the diagonal elements in the eigenvector matrix into
real positive numbers; that's why the numpy solution looks neat.

Val

PS Probably nobody cares to know, but the phase factor I gave in my 1st
email should be negated: 0.99887305445887753+0.047461785427773337j

On Mon, Apr 2, 2012 at 8:53 PM, Matthew Brett wrote:

> Hi,
>
> On Mon, Apr 2, 2012 at 5:38 PM, Val Kalatsky wrote:
> > Both results are correct.
> > There are 2 factors that make the results look different:
> > 1) The order: the 2nd eigenvector of the numpy solution corresponds to
> > the 1st eigenvector of your solution;
> > note that the vectors are written in columns.
> > 2) The phase: an eigenvector can be multiplied by an arbitrary phase
> > factor with absolute value = 1.
> > As you can see, this factor is -1 for the 2nd eigenvector
> > and -0.99887305445887753-0.047461785427773337j for the other one.
>
> Thanks for this answer; for my own benefit:
>
> Definition: A . v = L . v where A is the input matrix, L is an
> eigenvalue of A and v is an eigenvector of A.
>
> http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix
>
> In [63]: A = [[0.6+0.0j, -1.97537668-0.09386068j], [-1.97537668+0.09386068j, -0.6+0.0j]]
>
> In [64]: L, v = np.linalg.eig(A)
>
> In [66]: np.allclose(np.dot(A, v), L * v)
> Out[66]: True
>
> Best,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hongbin_zhang82 at hotmail.com Tue Apr 3 03:02:18 2012
From: hongbin_zhang82 at hotmail.com (Hongbin Zhang)
Date: Tue, 3 Apr 2012 15:02:18 +0800
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

Hej Val,

Thank you very much for your replies.

Yes, I know that both eigenvectors are correct, as they are indeed related
to each other by unitary transformations (unitary matrices). Actually,
what I am trying to do is to evaluate the Berry phase, which is closely
related to the gauge chosen. It is okay to apply an arbitrary phase to the
eigenvectors, while to get the (meaningful) physical quantity the phase
should be consistent for all the other eigenvectors.
To my understanding, if I run both Fortran and python on the same computer, they should have the same phase (that is the arbitrary phase is computer-dependent). Maybe some additional "rotations" have been performed in python, but should this be written/commented somewhere in the man page? I will try to fix this by performing additional rotation to make the diagonal elements real and check whether this is the solution or not. Thank you all again, and of course more insightful suggestions are welcome. Regards, Hongbin Ad hoc, ad loc and quid pro quo --- Jeremy Hilary Boob Date: Mon, 2 Apr 2012 22:19:55 -0500 From: kalatsky at gmail.com To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] One question about the numpy.linalg.eig() routine BTW this extra degree of freedom can be used to "rotate" the eigenvectors along the unit circle (multiplication by exp(j*phi)). To those of physical inclinations it should remind of gauge fixing (vector potential in EM/QM). These "rotations" can be used to make one (any) non-zero component of each eigenvector be positive real number. Finally to the point: it seems that numpy.linalg.eig uses these "rotations" to turn the diagonal elements in the eigenvector matrix to real positive numbers, that's why the numpy solutions looks neat. Val PS Probably nobody cares to know, but the phase factor I gave in my 1st email should be negated: 0.99887305445887753+0.047461785427773337j On Mon, Apr 2, 2012 at 8:53 PM, Matthew Brett wrote: Hi, On Mon, Apr 2, 2012 at 5:38 PM, Val Kalatsky wrote: > Both results are correct. > There are 2 factors that make the results look different: > 1) The order: the 2nd eigenvector of the numpy solution corresponds to the > 1st eigenvector of your solution, > note that the vectors are written in columns. > 2) The phase: an eigenvector can be multiplied by an arbitrary phase factor > with absolute value = 1. > As you can see this factor is -1 for the 2nd eigenvector > and -0.99887305445887753-0.047461785427773337j for the other one. Thanks for this answer; for my own benefit: Definition: A . v = L . v where A is the input matrix, L is an eigenvalue of A and v is an eigenvector of A. http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix In [63]: A = [[0.6+0.0j, -1.97537668-0.09386068j],[-1.97537668+0.09386068j, -0.6+0.0j]] In [64]: L, v = np.linalg.eig(A) In [66]: np.allclose(np.dot(A, v), L * v) Out[66]: True Best, Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Apr 3 03:08:48 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 3 Apr 2012 08:08:48 +0100 Subject: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib In-Reply-To: References: <38410503-5EB8-484E-B778-4BFFF558615D@continuum.io> <35298DC7-58D4-4505-81E8-2EAAB7FD12AA@continuum.io> <665E0D44-9323-4F79-832D-D0C3C2B2B73F@continuum.io> Message-ID: On Mon, Apr 2, 2012 at 7:14 PM, Tim Cera wrote: >> >> I think the suggestion is pad(a, 5, mode='mean'), which would be >> consistent with common numpy signatures. The mode keyword should probably >> have a default, something commonly used. 
>> I'd suggest 'mean', Nathaniel suggests 'zero', I think either would be fine.
>
> I can't type fast enough. :-) I should say that I can't type faster than
> Travis since he has already responded....
>
> Currently that '5' in the example above is the keyword argument 'pad_width'
> which defaults to 1. So really the only argument then is 'a'? Everything
> else is keywords? I missed that in the discussion and I am not sure that it
> is a good idea. In fact as I am typing this I am thinking that we should
> have pad_width as an argument. I hate to rely on this, because it tends to
> get overused, but 'Explicit is better than implicit.'
>
> 'pad(a)' would carry a lot of implicit baggage that would mean it would be
> very difficult to figure out what was going on if reading someone else's
> code. Someone unfamiliar with the pad routine must consult the
> documentation to figure out what 'pad(a)' meant whereas "pad(a, 'mean', 1)",
> regardless of the order of the arguments, would actually read pretty well.
>
> I defer to a 'consensus' - whatever that might mean, but I am actually
> thinking that the input array, mode/method, and the pad_width should be
> arguments. The order of the arguments - I don't care.
>
> I realize that this thread is around 26 messages long now, but if everyone
> who is interested in this could weigh in one more time about this one issue.
> To minimize discussion on the list, you can add a comment to the pull
> request at https://github.com/numpy/numpy/pull/242

I guess I'll say

  def pad(arr, width, mode="constant", **kwargs):

Or, if we don't want to have a default argument for mode (and maybe we
don't -- my suggestion of giving it a default was partly based on the
assumption that it was pretty obvious what the default should be!), then
I'm indifferent between

  def pad(arr, width, mode, **kwargs):
  def pad(arr, mode, width, **kwargs):

I definitely don't think width should have a default.

-- Nathaniel

From hongbin_zhang82 at hotmail.com  Tue Apr  3 03:55:06 2012
From: hongbin_zhang82 at hotmail.com (Hongbin Zhang)
Date: Tue, 3 Apr 2012 15:55:06 +0800
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References: , , , , , , ,
Message-ID:

Dears,

Though it might sound strange, the eigenvectors of my 2X2 matrix are
rather different if I get them calculated in a loop over many other
similar matrices; for instance:

matrix:
[[ 0.60000000+0.j         -1.97537668-0.09386068j]
 [-1.97537668+0.09386068j -0.60000000+0.j        ]]
eigenvals:
[-2.06662112  2.06662112]
eigenvects:
[[ 0.59568071+0.j          0.80231613-0.03812232j]
 [-0.80322132+0.j          0.59500941-0.02827207j]]

In this case, the elements in the first column of the eigenvectors are
real. In the fortran code, such a transformation can easily be done by
dividing all the elements in the i-th row by EV_{i1}/abs(EV_{i1}), where
EV_{i1} denotes the first element in the i-th row. The same can be
performed column-wise if that is intended.

In this way, at least for the moment, I could get the same eigenvectors
for the same complex matrix from Python and Fortran.

I do not know whether this is the solution, but I hope this would work.

Cheers,

Hongbin

Ad hoc, ad loc
and quid pro quo

               --- Jeremy Hilary Boob

From: hongbin_zhang82 at hotmail.com
To: numpy-discussion at scipy.org
Date: Tue, 3 Apr 2012 15:02:18 +0800
Subject: Re: [Numpy-discussion] One question about the numpy.linalg.eig() routine

Hej Val,

Thank you very much for your replies.
Yes, I know that both eigenvectors are correct while they are indeed
related to each other by unitary transformations (unitary matrices).
Actually, what I am trying to do is to evaluate the Berry phase which is
closely related to the gauge chosen. It is okay to apply an arbitrary
phase to the eigenvectors, while to get the (meaningful) physical quantity
the phase should be consistent for all the other eigenvectors.
To my understanding, if I run both Fortran and python on the same computer,
they should have the same phase (that is the arbitrary phase is
computer-dependent). Maybe some additional "rotations" have been performed
in python, but should this be written/commented somewhere in the man page?
I will try to fix this by performing additional rotation to make the
diagonal elements real and check whether this is the solution or not.
Thank you all again, and of course more insightful suggestions are welcome.

Regards,

Hongbin

Ad hoc, ad loc
and quid pro quo

               --- Jeremy Hilary Boob

Date: Mon, 2 Apr 2012 22:19:55 -0500
From: kalatsky at gmail.com
To: numpy-discussion at scipy.org
Subject: Re: [Numpy-discussion] One question about the numpy.linalg.eig() routine

BTW this extra degree of freedom can be used to "rotate" the eigenvectors
along the unit circle (multiplication by exp(j*phi)). To those of physical
inclinations it should remind of gauge fixing (vector potential in EM/QM).
These "rotations" can be used to make one (any) non-zero component of each
eigenvector be positive real number.
Finally to the point: it seems that numpy.linalg.eig uses these "rotations"
to turn the diagonal elements in the eigenvector matrix to real positive
numbers, that's why the numpy solutions looks neat.
Val

PS Probably nobody cares to know, but the phase factor I gave in my 1st
email should be negated: 0.99887305445887753+0.047461785427773337j

On Mon, Apr 2, 2012 at 8:53 PM, Matthew Brett wrote:

Hi,

On Mon, Apr 2, 2012 at 5:38 PM, Val Kalatsky wrote:
> Both results are correct.
> There are 2 factors that make the results look different:
> 1) The order: the 2nd eigenvector of the numpy solution corresponds to the
> 1st eigenvector of your solution,
> note that the vectors are written in columns.
> 2) The phase: an eigenvector can be multiplied by an arbitrary phase factor
> with absolute value = 1.
> As you can see this factor is -1 for the 2nd eigenvector
> and -0.99887305445887753-0.047461785427773337j for the other one.

Thanks for this answer; for my own benefit:

Definition: A . v = L . v where A is the input matrix, L is an
eigenvalue of A and v is an eigenvector of A.

http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix

In [63]: A = [[0.6+0.0j,
-1.97537668-0.09386068j],[-1.97537668+0.09386068j, -0.6+0.0j]]

In [64]: L, v = np.linalg.eig(A)

In [66]: np.allclose(np.dot(A, v), L * v)
Out[66]: True

Best,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From holgerherrlich05 at arcor.de  Tue Apr  3 09:06:09 2012
From: holgerherrlich05 at arcor.de (Holger Herrlich)
Date: Tue, 03 Apr 2012 15:06:09 +0200
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
Message-ID: <4F7AF5C1.5090604@arcor.de>

Hi, I plan to migrate core classes of an application from Python to C++
using SWIG, while the user interface remains Python. I also plan to
further use NumPy's ndarrays.

The application's core classes will create the ndarrays and make
calculations. The user interface (Python) finally receives it. C++ OOP
features will be deployed.

What general ways are there to work with NumPy ndarrays in C++? I know of
boost.python so far.

Regards Holger

From maggie.mari at continuum.io  Tue Apr  3 10:32:53 2012
From: maggie.mari at continuum.io (Maggie Mari)
Date: Tue, 03 Apr 2012 09:32:53 -0500
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To:
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io>
Message-ID: <4F7B0A15.30509@continuum.io>

On 4/1/12 6:02 AM, Ralf Gommers wrote:
> The interface looks good, but to get a feeling for how this would
> really work out I think admin rights are necessary. Then we can try
> out the command window (mass editing of issues), the rest API, etc.
> Could you send those out off-list?

Hi Ralf,

I have added you to the admin group. Let me know if you have any
trouble. Who else should be added?

Thanks,

Maggie

From chris.barker at noaa.gov  Tue Apr  3 12:48:06 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 3 Apr 2012 09:48:06 -0700
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F7AF5C1.5090604@arcor.de>
References: <4F7AF5C1.5090604@arcor.de>
Message-ID:

On Tue, Apr 3, 2012 at 6:06 AM, Holger Herrlich
> Hi, I plan to migrate core classes of an application from Python to C++
> using SWIG,

if you're using SWIG, you may want the numpy.i SWIG interface files,
they can be handy.

but I probably wouldn't use SWIG, unless:
 - you are already a SWIG master
 - you are wrapping a substantial library that will use a lot of the
   same constructs (i.e. can re-use the same *.i files a lot)
 - you want to use SWIG to wrap the same library for multiple languages.

> The application's core classes will create the ndarrays and make
> calculations. The user interface (Python) finally receives it. C++ OOP
> features will be deployed.
>
> What general ways are there to work with NumPy ndarrays in C++?

I'd take a good look at Cython -- while not all that mature for C++, it
does support the basics, and makes the transition between C/C++ and
python very smooth -- and handles ndarrays out of the box.

If your code only needs to be driven by Python (and not used as a C++
lib on its own), I'd tend to:

- create your ndarrays in Python or Cython.
- write your C++ to work with "bare" pointers -- i.e. C arrays.

(also take a look at the new Cython memory views -- they may be your
best bet.)

It would be nice to have a clean C++ wrapper around ndarrays, but that
doesn't exist yet (is there a good reason for that?)
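To make the "bare pointer" route concrete, here is a minimal (untested)
sketch using plain ctypes; the library name and the process() function
are made up for illustration -- you'd substitute your own compiled code:

import ctypes
import numpy as np

# hypothetical shared library exporting: void process(double *data, int n)
lib = ctypes.CDLL("./libcore.so")
lib.process.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
lib.process.restype = None

# allocate with numpy, making sure the buffer is contiguous
a = np.ascontiguousarray(np.arange(10, dtype=np.float64))

# hand the raw buffer to the C/C++ side; numpy keeps ownership
lib.process(a.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), len(a))

Cython gives you the same pattern with less ceremony (and compile-time
type checking), but the idea is identical: allocate with numpy, compute
through a plain pointer.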
you could also probably get one of the C++ array libs to work well, if
it shares a memory model with ndarray (which is likely, at least in
special cases):

- Blitz++
- ublas
- others???

> I know of
> boost.python so far.

I've never used boost.python, but it's always seemed to me to be kind
of heavy weight and not all that well maintained [1]

-- but don't take my word for it!

(there are boost arrays that may be useful)

-Chris

[1] from what seems to be the most recent docs:

http://www.boost.org/doc/libs/1_49_0/libs/python/doc/v2/numeric.html

"""
Provides access to the array types of Numerical Python's Numeric and
NumArray modules.
"""

The days of Numeric and Numarray are long gone! It may only be the docs
that are that out of date, but....

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From talljimbo at gmail.com  Tue Apr  3 13:33:13 2012
From: talljimbo at gmail.com (Jim Bosch)
Date: Tue, 03 Apr 2012 13:33:13 -0400
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To:
References: <4F7AF5C1.5090604@arcor.de>
Message-ID: <4F7B3459.7060409@gmail.com>

On 04/03/2012 12:48 PM, Chris Barker wrote:
> On Tue, Apr 3, 2012 at 6:06 AM, Holger Herrlich
>
>> I know of
>> boost.python so far.
>
> I've never used boost.python, but it's always seemed to me to be kind
> of heavy weight and not all that well maintained [1]
>
> -- but don't take my word for it!
>
> (there are boost arrays that may be useful)
>
> -Chris
>
> [1] from what seems to be the most recent docs:
>
> http://www.boost.org/doc/libs/1_49_0/libs/python/doc/v2/numeric.html
>
> """
> Provides access to the array types of Numerical Python's Numeric and
> NumArray modules.
> """
>
> The days of Numeric and Numarray are long gone! It may only be the docs
> that are that out of date, but....

I'm a big fan of Boost.Python, and I'd strongly recommend it over SWIG
if you have anything remotely complex in your C++ (though I don't know
much about Cython, and it may well be better in this case). That said,
it's also very true that Boost.Python hasn't seen much in the way of
active development aside from bug and compatibility fixes for years.

So the Numeric interface in the Boost.Python main library is indeed way
out of date, and not very useful. But there is a very nice extension
library in the Boost Sandbox:

https://svn.boost.org/svn/boost/sandbox/numpy/

or (equivalently) on GitHub:

https://github.com/ndarray/Boost.NumPy

Disclosure: I'm the main author. And while we've put a lot of effort
into making this work well and I use it quite a bit myself, it's not
nearly as battle-tested (especially on non-Unix platforms) as many of
the alternatives.

Good luck!

Jim

From erin.sheldon at gmail.com  Tue Apr  3 13:42:50 2012
From: erin.sheldon at gmail.com (Erin Sheldon)
Date: Tue, 03 Apr 2012 13:42:50 -0400
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F7AF5C1.5090604@arcor.de>
References: <4F7AF5C1.5090604@arcor.de>
Message-ID: <1333474499-sup-7712@rohan>

Excerpts from Holger Herrlich's message of Tue Apr 03 09:06:09 -0400 2012:
>
> Hi, I plan to migrate core classes of an application from Python to C++
> using SWIG, while the user interface remains Python. I also plan to
> further use NumPy's ndarrays.
>
> The application's core classes will create the ndarrays and make
> calculations.
> The user interface (Python) finally receives it. C++ OOP
> features will be deployed.
>
> What general ways are there to work with NumPy ndarrays in C++? I know of
> boost.python so far.

Hi Holger -

I put together some header-only classes for this back when I used to do
a lot of C++ and numpy. They are part of the "esutil" package but you
could actually just pull them out and use them

http://code.google.com/p/esutil/

The first is a template class for numpy arrays which can create and
import arrays and keeps track of the reference counts

http://code.google.com/p/esutil/source/browse/trunk/esutil/include/NumpyVector.h

The second is similar but for void* vectors so the type can be
determined at runtime

http://code.google.com/p/esutil/source/browse/trunk/esutil/include/NumpyVoidVector.h

There is also one for record arrays

http://code.google.com/p/esutil/source/browse/trunk/esutil/include/NumpyRecords.h

Hope these are useful or can give you some ideas.
-e
--
Erin Scott Sheldon
Brookhaven National Laboratory

From kalatsky at gmail.com  Tue Apr  3 14:09:59 2012
From: kalatsky at gmail.com (Val Kalatsky)
Date: Tue, 3 Apr 2012 13:09:59 -0500
Subject: [Numpy-discussion] One question about the numpy.linalg.eig() routine
In-Reply-To:
References:
Message-ID:

Interesting.
I happen to know a little bit about Berry's phase
http://keck.ucsf.edu/~kalatsky/publications/PRL1998_BerryPhaseForLargeSpins.pdf
http://keck.ucsf.edu/~kalatsky/publications/PRA1999_SpectraOfLargeSpins-General.pdf
The latter one knocks out all point groups. Probably you want to do
something different, I cared about eigenvalues only (BTW my Hamiltonians
were carefully crafted).
Cheers
Val

PS I doubt anybody on this list cares to hear more about Berry's phase,
should take this discussion off-line

2012/4/3 Hongbin Zhang

> Hej Val,
>
> Thank you very much for your replies.
>
> Yes, I know that both eigenvectors are correct while they are indeed
> related to each other by unitary transformations (unitary matrices).
>
> Actually, what I am trying to do is to evaluate the Berry phase which is
> closely related to the gauge chosen. It is okay to apply an arbitrary
> phase to the eigenvectors, while to get the (meaningful) physical quantity
> the phase should be consistent for all the other eigenvectors.
>
> To my understanding, if I run both Fortran and python on the same computer,
> they should have the same phase (that is the arbitrary phase is
> computer-dependent). Maybe some additional "rotations" have been performed
> in python, but should this be written/commented somewhere in the man page?
>
> I will try to fix this by performing additional rotation to make the
> diagonal elements real and check whether this is the solution or not.
>
> Thank you all again, and of course more insightful suggestions are
> welcome.
>
> Regards,
>
>
> Hongbin
>
>
>
> Ad hoc, ad loc
> and quid pro quo
>
>                --- Jeremy Hilary Boob
>
>
> ------------------------------
> Date: Mon, 2 Apr 2012 22:19:55 -0500
> From: kalatsky at gmail.com
> To: numpy-discussion at scipy.org
> Subject: Re: [Numpy-discussion] One question about the numpy.linalg.eig()
> routine
>
>
> BTW this extra degree of freedom can be used to "rotate" the eigenvectors
> along the unit circle (multiplication by exp(j*phi)). To those of physical
> inclinations
> it should remind of gauge fixing (vector potential in EM/QM).
> These "rotations" can be used to make one (any) non-zero component of each
> eigenvector be positive real number.
> Finally to the point: it seems that numpy.linalg.eig uses these > "rotations" to turn the > diagonal elements in the eigenvector matrix to real positive numbers, > that's why the numpy solutions looks neat. > Val > > PS Probably nobody cares to know, but the phase factor I gave in my 1st > email should be negated: > 0.99887305445887753+0.047461785427773337j > > On Mon, Apr 2, 2012 at 8:53 PM, Matthew Brett wrote: > > Hi, > > On Mon, Apr 2, 2012 at 5:38 PM, Val Kalatsky wrote: > > Both results are correct. > > There are 2 factors that make the results look different: > > 1) The order: the 2nd eigenvector of the numpy solution corresponds to > the > > 1st eigenvector of your solution, > > note that the vectors are written in columns. > > 2) The phase: an eigenvector can be multiplied by an arbitrary phase > factor > > with absolute value = 1. > > As you can see this factor is -1 for the 2nd eigenvector > > and -0.99887305445887753-0.047461785427773337j for the other one. > > Thanks for this answer; for my own benefit: > > Definition: A . v = L . v where A is the input matrix, L is an > eigenvalue of A and v is an eigenvector of A. > > http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix > > In [63]: A = [[0.6+0.0j, > -1.97537668-0.09386068j],[-1.97537668+0.09386068j, -0.6+0.0j]] > > In [64]: L, v = np.linalg.eig(A) > > In [66]: np.allclose(np.dot(A, v), L * v) > Out[66]: True > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ NumPy-Discussion mailing > list NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Apr 3 15:21:10 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 3 Apr 2012 21:21:10 +0200 Subject: [Numpy-discussion] Trac configuration tweak In-Reply-To: References: Message-ID: On Mon, Apr 2, 2012 at 11:35 PM, Travis Oliphant wrote: > Sorry, I saw the cross-posting to the NumPy list and wondered if we were > on the same page. > > I don't know of any plans to migrate SciPy Trac at this time: perhaps > later. > > At this time maybe, but I was assuming that if the Numpy migration works out well, Scipy would follow. Same for other supporting tools like a CI server. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From wfspotz at sandia.gov Tue Apr 3 15:55:13 2012 From: wfspotz at sandia.gov (Bill Spotz) Date: Tue, 3 Apr 2012 13:55:13 -0600 Subject: [Numpy-discussion] [EXTERNAL] creating/working NumPy-ndarrays in C++ In-Reply-To: <4F7AF5C1.5090604@arcor.de> References: <4F7AF5C1.5090604@arcor.de> Message-ID: Holger, SWIG can read C or C++ header files and use them to generate wrapper interfaces for a long list of scripting languages. It sounds to me like you want to go the other direction -- i.e. you have a code prototyped in python and you want to convert core kernels to C++, perhaps to improve efficiency? Do I have that right? If so, then SWIG is not your tool. If efficiency is what you are after, then Cython could work really well. 
You would start with your existing python code and rename appropriate
files to become first draft Cython code -- it should compile right out
of the box. You could then start adding efficiencies (typed method
arguments, for example). The end result would be Cython, though, not C++.

If C++ is a requirement, it sounds like Jim's numpy extension to
boost.python might be your best bet. My biggest issue with boost is the
heavy templating resulting in nearly indecipherable compiler error
messages.

-Bill

On Apr 3, 2012, at 7:06 AM, Holger Herrlich wrote:

>
> Hi, I plan to migrate core classes of an application from Python to C++
> using SWIG, while the user interface remains Python. I also plan to
> further use NumPy's ndarrays.
>
> The application's core classes will create the ndarrays and make
> calculations. The user interface (Python) finally receives it. C++ OOP
> features will be deployed.
>
> What general ways are there to work with NumPy ndarrays in C++? I know of
> boost.python so far.
>
> Regards Holger
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

** Bill Spotz                                              **
** Sandia National Laboratories  Voice: (505)845-0170      **
** P.O. Box 5800                 Fax:   (505)284-0154      **
** Albuquerque, NM 87185-0370    Email: wfspotz at sandia.gov **

From nouiz at nouiz.org  Tue Apr  3 16:10:28 2012
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Tue, 3 Apr 2012 16:10:28 -0400
Subject: [Numpy-discussion] numpy.sum(..., keepdims=False)
Message-ID:

Hi,

Someone told me that on this page, there was a new parameter to
numpy.sum: keepdims=False

http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html

Currently the docs for this page don't build correctly. Can someone fix
this? He gave the link to the google cache that showed it, but the
google cache was just replaced by a newer version.

I would like to add this parameter to Theano. So my question is, will
the interface change or is it stable?

The new parameter will make the output have the same number of
dimensions as the input. The shape will be one on the summed dimensions
given by the axis parameter.

thanks

Frédéric

From ralf.gommers at googlemail.com  Tue Apr  3 17:18:48 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 3 Apr 2012 23:18:48 +0200
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To: <4F7B0A15.30509@continuum.io>
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io>
Message-ID:

On Tue, Apr 3, 2012 at 4:32 PM, Maggie Mari wrote:

> On 4/1/12 6:02 AM, Ralf Gommers wrote:
> > The interface looks good, but to get a feeling for how this would
> > really work out I think admin rights are necessary. Then we can try
> > out the command window (mass editing of issues), the rest API, etc.
> > Could you send those out off-list?
>
> Hi Ralf,
>
> I have added you to the admin group. Let me know if you have any
> trouble. Who else should be added?
>

Thanks Maggie. Here some first impressions.

The good:
- It's responsive!
- It remembers my preferences (view type, # of issues per page, etc.)
- Editing multiple issues with the command window is easy.
- Search and filter functionality is powerful

The bad:
- Multiple projects are supported, but issues are then really mixed. The
way this works doesn't look very useful for combined admin of numpy/scipy
trackers.
- I haven't found a way yet to make versions and subsystems appear in the
one-line issue overview.
- Fixed issues are still shown by default. There are several open issues
filed against youtrack about this, with no reasonable answers.
- Plain text attachments (.txt, .diff, .patch) can't be viewed, only
downloaded.
- No direct VCS integration, only via Teamcity (not set up, so can't
evaluate).
- No useful default views as in Trac (http://projects.scipy.org/scipy/report).

Overall, I have to say that I'm not convinced yet.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mdroe at stsci.edu  Tue Apr  3 17:56:28 2012
From: mdroe at stsci.edu (Michael Droettboom)
Date: Tue, 3 Apr 2012 17:56:28 -0400
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To:
References: <4F7AF5C1.5090604@arcor.de>
Message-ID: <4F7B720C.8010002@stsci.edu>

On 04/03/2012 12:48 PM, Chris Barker wrote:
> It would be nice to have a clean C++ wrapper around ndarrays, but that
> doesn't exist yet (is there a good reason for that?)

Check out:

http://code.google.com/p/numpy-boost/

Mike
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From srean.list at gmail.com  Tue Apr  3 19:45:50 2012
From: srean.list at gmail.com (srean)
Date: Tue, 3 Apr 2012 18:45:50 -0500
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F7B720C.8010002@stsci.edu>
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu>
Message-ID:

This makes me ask something that I always wanted to know: why is weave
not the preferred or encouraged way?

Is it because no developer has interest in maintaining it or is it too
onerous to maintain? I do not know enough of its internals to guess an
answer. I think it would be fair to say that weave has languished a bit
over the years.

What I like about weave is that even when I drop into the C++ mode I
can pretty much use the same numpy'ish syntax and with no overhead of
calling back into the numpy c functions. From the sourceforge forum it
seems the new Blitz++ is quite competitive with intel fortran in SIMD
vectorization as well, which does sound attractive. I would be delighted
if development on weave catches up again.

From pierre.haessig at crans.org  Wed Apr  4 05:37:19 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Wed, 04 Apr 2012 11:37:19 +0200
Subject: [Numpy-discussion] numpy.sum(..., keepdims=False)
In-Reply-To:
References:
Message-ID: <4F7C164F.3060506@crans.org>

Hi,

On 03/04/2012 22:10, Frédéric Bastien wrote:
> I would like to add this parameter to Theano. So my question is, will
> the interface change or is it stable?

I don't know about its stability, but as for the existence of this new
parameter:

https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py

looking at "def sum(...)" it seems the keepdims=False parameter is here
and was introduced 7 months ago by Mark Wiebe and Charles Harris. The
docstring indeed says:

    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left in
        the result as dimensions with size one. With this option, the
        result will broadcast correctly against the original `arr`.

The commit message also mentions the skipna parameter, which is part of
the overall NA implementation which is indeed tagged as somehow
experimental (if I'm correct!), but I would assume that the
keepdims=False parameter is an orthogonal issue.

Hopefully somebody can give you a more precise answer!
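In the meantime, here is a quick illustration of what the docstring
describes (untested on my side -- it assumes a development build that
already has the keepdims parameter):

import numpy as np

a = np.arange(6).reshape(2, 3)

print np.sum(a, axis=1).shape                  # -> (2,)
print np.sum(a, axis=1, keepdims=True).shape   # -> (2, 1)

# The kept dimension makes the result broadcast against `a` directly,
# e.g. subtracting row sums without a manual reshape:
print a - np.sum(a, axis=1, keepdims=True)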
Best,
Pierre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL: 

From pierre.haessig at crans.org  Wed Apr  4 06:10:35 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Wed, 04 Apr 2012 12:10:35 +0200
Subject: [Numpy-discussion] numpy doc for percentile function
Message-ID: <4F7C1E1B.3020403@crans.org>

Hi,

I'm looking for the entry point in the Numpy doc for the percentile
function. I'm assuming it should sit in routines.statistics but do not
see it:

http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

Am I missing something? If indeed the percentile entry should be added,
do you agree it could be added to the "Histogram" section? (and
"Histogram" would become "Histograms and percentiles")

Also, as Frédéric Bastien pointed out, I feel that the current doc build
is broken (especially the links :-( )

Best,
Pierre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL: 

From jsseabold at gmail.com  Wed Apr  4 09:53:07 2012
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 4 Apr 2012 09:53:07 -0400
Subject: [Numpy-discussion] numpy doc for percentile function
In-Reply-To: <4F7C1E1B.3020403@crans.org>
References: <4F7C1E1B.3020403@crans.org>
Message-ID:

On Wed, Apr 4, 2012 at 6:10 AM, Pierre Haessig wrote:

> Hi,
>
> I'm looking for the entry point in the Numpy doc for the percentile
> function. I'm assuming it should sit in routines.statistics but do not
> see it:
> http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
>

I don't see it in the docs either.

> Am I missing something? If indeed the percentile entry should be added,
> do you agree it could be added to the "Histogram" section? (and
> "Histogram" would become "Histograms and percentiles")
>

IMO it should go under the extremal values header and this should be
changed to order statistics.

> Also, as Frédéric Bastien pointed out, I feel that the current doc build
> is broken (especially the links :-( )
>

Indeed. It does look that way. It is the same on my local build. Perhaps
this deserves a separate message. They show up here.

http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.statistics.rst/#routines-statistics

Skipper
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From d.s.seljebotn at astro.uio.no  Wed Apr  4 14:34:13 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 04 Apr 2012 11:34:13 -0700
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To:
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu>
Message-ID: <4F7C9425.4080807@astro.uio.no>

On 04/03/2012 04:45 PM, srean wrote:
> This makes me ask something that I always wanted to know: why is weave
> not the preferred or encouraged way?
>
> Is it because no developer has interest in maintaining it or is it too
> onerous to maintain? I do not know enough of its internals to guess
> an answer. I think it would be fair to say that weave has languished a
> bit over the years.

I think the story is that Cython overlaps enough with Weave that Weave
doesn't get any new users or developers.
Which isn't to say that Cython is always superior to the Weave approach
(for one thing, embedding Cython code in Python source code files could
have been a better experience), just that it overlaps enough, and since
it has, I honestly don't believe Weave has a chance of getting
resurrected from the dead -- my bets for the future are on Cython,
Travis' numba, and perhaps some combination or amalgamation of the two
(note that I'm a Cython dev and so rather biased).

> What I like about weave is that even when I drop into the C++ mode I
> can pretty much use the same numpy'ish syntax and with no overhead of
> calling back into the numpy c functions. From the sourceforge forum it
> seems the new Blitz++ is quite competitive with intel fortran in SIMD
> vectorization as well, which does sound attractive.

Cython seems likely to be pushed further in this area over the next
half year so that it can grow up to become more of a Fortran competitor.

Dag

From chris.barker at noaa.gov  Wed Apr  4 15:18:22 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 4 Apr 2012 12:18:22 -0700
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To:
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu>
Message-ID:

On Tue, Apr 3, 2012 at 4:45 PM, srean wrote:
> From the sourceforge forum it
> seems the new Blitz++ is quite competitive with intel fortran in SIMD
> vectorization as well, which does sound attractive.

you could write Blitz++ code, and call it from Cython. That may be a
bit klunky at this point, but I'm sure it could be streamlined (at
least for a subset of Blitz++ arrays).

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From gael.varoquaux at normalesup.org  Wed Apr  4 15:34:46 2012
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 4 Apr 2012 21:34:46 +0200
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F7C9425.4080807@astro.uio.no>
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <4F7C9425.4080807@astro.uio.no>
Message-ID: <20120404193446.GB15811@phare.normalesup.org>

On Wed, Apr 04, 2012 at 11:34:13AM -0700, Dag Sverre Seljebotn wrote:
> On 04/03/2012 04:45 PM, srean wrote:
> > This makes me ask something that I always wanted to know: why is weave
> > not the preferred or encouraged way?

> > Is it because no developer has interest in maintaining it or is it too
> > onerous to maintain? I do not know enough of its internals to guess
> > an answer. I think it would be fair to say that weave has languished a
> > bit over the years.

> I think the story is that Cython overlaps enough with Weave that Weave
> doesn't get any new users or developers.

One big issue that I had with weave is that it compiles on the fly. As a
result, it makes for very non-distributable software (requires a compiler
and the development headers installed), and leads to problems in the long
run.
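To make that concrete, a typical weave call looks something like the
(untested) sketch below; the C++ string is compiled with the system
compiler on the end user's machine the first time the line runs, which
is exactly where the deployment pain comes from:

import numpy as np
from scipy import weave

a = np.arange(10.)
code = """
       double s = 0.0;
       for (int i = 0; i < Na[0]; ++i)
           s += a[i];
       return_val = s;
       """
# compiled (and cached) on first call, executed directly afterwards
total = weave.inline(code, ['a'])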
Gael

From bryanv at continuum.io  Wed Apr  4 15:41:47 2012
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Wed, 04 Apr 2012 14:41:47 -0500
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To:
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io>
Message-ID: <4F7CA3FB.40803@continuum.io>

On 4/3/12 4:18 PM, Ralf Gommers wrote:
> The bad:
> - Multiple projects are supported, but issues are then really mixed.
> The way this works doesn't look very useful for combined admin of
> numpy/scipy trackers.
> - I haven't found a way yet to make versions and subsystems appear in
> the one-line issue overview.
> - Fixed issues are still shown by default. There are several open
> issues filed against youtrack about this, with no reasonable answers.
> - Plain text attachments (.txt, .diff, .patch) can't be viewed, only
> downloaded.
> - No direct VCS integration, only via Teamcity (not set up, so can't
> evaluate).
> - No useful default views as in Trac
> (http://projects.scipy.org/scipy/report).

Ralf, I don't know about most of these issues offhand, but it does seem
like youtrack offers github integration, in the form of being able to
issue commands to youtrack through git commits (is that the kind of
integration you are looking for?)

http://confluence.jetbrains.net/display/YTD3/GitHub+Integration
http://blogs.jetbrains.com/youtrack/tag/github-integration/

Bryan

From srean.list at gmail.com  Wed Apr  4 15:55:25 2012
From: srean.list at gmail.com (srean)
Date: Wed, 4 Apr 2012 14:55:25 -0500
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <20120404193446.GB15811@phare.normalesup.org>
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <4F7C9425.4080807@astro.uio.no> <20120404193446.GB15811@phare.normalesup.org>
Message-ID:

>> I think the story is that Cython overlaps enough with Weave that Weave
>> doesn't get any new users or developers.
>
> One big issue that I had with weave is that it compiles on the fly. As a
> result, it makes for very non-distributable software (requires a compiler
> and the development headers installed), and leads to problems in the long
> run.
>
> Gael

I do not know much Cython, except for the fact that it is out there and
what it is supposed to do, but wouldn't Cython need a compiler too? I
imagine distributing Cython based code would incur similar amounts of
schlep. But yes, you raise a valid point. It does cause annoyances. One
that I have faced is with running the same code simultaneously over a
mix of 32 bit and 64 bit machines. But this is because the source code
hashing function does not take the architecture into account. Shouldn't
be hard to fix.

From chris.barker at noaa.gov  Wed Apr  4 16:05:27 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 4 Apr 2012 13:05:27 -0700
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To:
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <4F7C9425.4080807@astro.uio.no> <20120404193446.GB15811@phare.normalesup.org>
Message-ID:

On Wed, Apr 4, 2012 at 12:55 PM, srean wrote:
>> One big issue that I had with weave is that it compiles on the fly. As a
>> result, it makes for very non-distributable software (requires a compiler
>> and the development headers installed), and leads to problems in the long

> I do not know much Cython, except for the fact that it is out there
> and what it is supposed to do, but wouldn't Cython need a compiler too?
Yes, but at build-time, not run time.

> I imagine distributing Cython based code would incur similar amounts
> of schlep.

if you distribute source, yes, but you at least have the option of
distributing binaries. (and distutils does make that fairly easy, for
some value of fairly)

And many folks distribute the Cython-built C code with a source distro,
so the end user only needs to compile -- same as any other compiled
Python extension.

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From srean.list at gmail.com  Wed Apr  4 16:21:12 2012
From: srean.list at gmail.com (srean)
Date: Wed, 4 Apr 2012 15:21:12 -0500
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To:
References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <4F7C9425.4080807@astro.uio.no> <20120404193446.GB15811@phare.normalesup.org>
Message-ID:

>> I do not know much Cython, except for the fact that it is out there
>> and what it is supposed to do, but wouldn't Cython need a compiler too?
>
> Yes, but at build-time, not run time.

Ah! I see what you mean, or so I think. So the first time a weave based
code runs, it builds, stores the code on disk and then executes. Whereas
in Cython there is a clear separation of build vs execute. In fairness,
though, it shouldn't be difficult to pre-empt a build with weave. But I
imagine Cython has other advantages (and in my mind so does weave in
certain restricted areas).

Now I feel it will be great to marry the two, so that for the most part
Cython does not need to call into the numpy api for array based
operations but fall back on something weave like. Maybe sometime in the
future ....

>> I imagine distributing Cython based code would incur similar amounts
>> of schlep.
>
> if you distribute source, yes, but you at least have the option of
> distributing binaries. (and distutils does make that fairly easy, for
> some value of fairly)

Indeed.

From d.warde.farley at gmail.com  Wed Apr  4 16:59:22 2012
From: d.warde.farley at gmail.com (David Warde-Farley)
Date: Wed, 4 Apr 2012 16:59:22 -0400
Subject: [Numpy-discussion] numpy.sum(..., keepdims=False)
In-Reply-To:
References:
Message-ID: <1AF23DD0-0581-455E-A880-B8235EB1B4A6@gmail.com>

On 2012-04-03, at 4:10 PM, Frédéric Bastien wrote:

> I would like to add this parameter to Theano. So my question is, will
> the interface change or is it stable?

To elaborate on what Fred said, in Theano we try to offer the same
functions/methods as NumPy does with the same arguments and same
behaviour, except operating on our symbolic proxies instead of actual
NumPy arrays; we try to break compatibility only when absolutely
necessary.

It would be great if someone (probably Mark?) could chime in as to
whether this is here to stay, regardless of the NA business. This also
seems like a good candidate for a backport to subsequent NumPy 1.x
releases rather than reserving it for 2.x.

David

From warren.weckesser at enthought.com  Wed Apr  4 17:30:35 2012
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Wed, 4 Apr 2012 16:30:35 -0500
Subject: [Numpy-discussion] SciPy 2012 - The Eleventh Annual Conference on Scientific Computing with Python
Message-ID:

SciPy 2012, the eleventh annual Conference on Scientific Computing with
Python, will be held July 16-21, 2012, in Austin, Texas.
At this conference, novel scientific applications and libraries related
to data acquisition, analysis, dissemination and visualization using
Python are presented. Attended by leading figures from both academia and
industry, it is an excellent opportunity to experience the cutting edge
of scientific software development.

The conference is preceded by two days of tutorials, during which
community experts provide training on several scientific Python
packages. Following the main conference will be two days of coding
sprints.

We invite you to give a talk or present a poster at SciPy 2012.

The list of topics that are appropriate for the conference includes (but
is not limited to):

 - new Python libraries for science and engineering;
 - applications of Python in solving scientific or computational problems;
 - high performance, parallel and GPU computing with Python;
 - use of Python in science education.

Specialized Tracks

Two specialized tracks run in parallel to the main conference:

 - High Performance Computing with Python
   Whether your algorithm is distributed, threaded, memory intensive or
   latency bound, Python is making headway into the problem. We are
   looking for performance driven designs and applications in Python.
   Candidates include the use of Python within a parallel application,
   new architectures, and ways of making traditional applications
   execute more efficiently.

 - Visualization
   They say a picture is worth a thousand words -- we're interested in
   both! Python provides numerous visualization tools that allow
   scientists to show off their work, and we want to know about any new
   tools and techniques out there. Come show off your latest graphics,
   whether it's an old library with a slick new feature, a new library
   out to challenge the status quo, or simply a beautiful result.

Domain-specific Mini-symposia

Mini-symposia on the following topics are also being organized:

 - Computational bioinformatics
 - Meteorology and climatology
 - Astronomy and astrophysics
 - Geophysics

Talks, papers and posters

We invite you to take part by submitting a talk or poster abstract.
Instructions are on the conference website:

http://conference.scipy.org/scipy2012/talks.php

Selected talks are included as papers in the peer-reviewed conference
proceedings, to be published online.

Tutorials

Tutorials will be given July 16-17. We invite instructors to submit
proposals for half-day tutorials on topics relevant to scientific
computing with Python. See

http://conference.scipy.org/scipy2012/tutorials.php

for information about submitting a tutorial proposal. To encourage
tutorials of the highest quality, the instructor (or team of
instructors) is given a $1,000 stipend for each half-day tutorial.

Student/Community Scholarships

We anticipate providing funding for students and for active members of
the SciPy community who otherwise might not be able to attend the
conference. See

http://conference.scipy.org/scipy2012/student.php

for scholarship application guidelines.

Be a Sponsor

The SciPy conference could not run without the generous support of the
institutions and corporations who share our enthusiasm for Python as a
tool for science. Please consider sponsoring SciPy 2012. For more
information, see

http://conference.scipy.org/scipy2012/sponsor/index.php

Important dates:

Monday, April 30: Talk abstracts and tutorial proposals due.
Monday, May 7: Accepted tutorials announced.
Monday, May 13: Accepted talks announced.
Monday, June 18: Early registration ends. (Price increases after this
date.)
Sunday, July 8: Online registration ends.
Monday-Tuesday, July 16 - 17: Tutorials
Wednesday-Thursday, July 18 - July 19: Conference
Friday-Saturday, July 20 - July 21: Sprints

We look forward to seeing you all in Austin this year!

The SciPy 2012 Team
http://conference.scipy.org/scipy2012/organizers.php
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From apratap at lbl.gov  Wed Apr  4 19:17:40 2012
From: apratap at lbl.gov (Abhishek Pratap)
Date: Wed, 4 Apr 2012 16:17:40 -0700
Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance
Message-ID:

Hey Guys

I am new to both python and more so to numpy. I am trying to cluster
close to 900K points using the DBSCAN algo. My input is a list of ~900k
tuples each having two points (x,y) coordinates. I am converting them
to a numpy array and passing them to the pdist method of
scipy.spatial.distance for calculating the distance between each point.

Here is some size info on my numpy array

shape of input array : (828575, 2)
Size : 6872000 bytes

I think the error has something to do with the default double dtype of
the numpy array used by the pdist function. I would appreciate if you
could help me debug this. I am sure I am overlooking some naive thing
here.

See the traceback below.

MemoryError                               Traceback (most recent call last)
/house/homedirs/a/apratap/Dropbox/dev/ipython/ in ()
     36
     37 print cleaned_senseBam
---> 38 cluster_pet_points_per_chromosome(sense_bamFile)

/house/homedirs/a/apratap/Dropbox/dev/ipython/ in cluster_pet_points_per_chromosome(bamFile)
     30     print 'Size of list points is %d' % sys.getsizeof(points)
     31     print 'Size of numpy array is %d' % sys.getsizeof(points_array)
---> 32     cluster_points_DBSCAN(points_array)
     33     #print points_array
     34

/house/homedirs/a/apratap/Dropbox/dev/ipython/ in cluster_points_DBSCAN(data_numpy_array)
      9 def cluster_points_DBSCAN(data_numpy_array):
     10     #eucledian distance calculation
---> 11     D = distance.pdist(data_numpy_array)
     12     S = distance.squareform(D)
     13     H = 1 - S/np.max(S)

/house/homedirs/a/apratap/playground/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/scipy/spatial/distance.pyc in pdist(X, metric, p, w, V, VI)
   1155
   1156     m, n = s
-> 1157     dm = np.zeros((m * (m - 1) / 2,), dtype=np.double)
   1158
   1159     wmink_names = ['wminkowski', 'wmi', 'wm', 'wpnorm']

From chris.barker at noaa.gov  Wed Apr  4 19:35:55 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 4 Apr 2012 16:35:55 -0700
Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance
In-Reply-To:
References:
Message-ID:

On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap
> close to 900K points using the DBSCAN algo. My input is a list of ~900k
> tuples each having two points (x,y) coordinates. I am converting them
> to a numpy array and passing them to the pdist method of
> scipy.spatial.distance for calculating the distance between each point.

I think pdist creates an array that is:

sum(range(num_points)) in size.

That's going to be pretty darn big:

404999550000 elements

I think that's about 3 terabytes:

In [41]: sum(range(900000)) / 1024.
/ 1024 / 1024 / 1024 * 8
Out[41]: 2.946759559563361

(for 64 bit floats)

> I think the error has something to do with the default double dtype
> of the numpy array used by the pdist function.

you *may* be able to get it to use float32 -- but as you can see, that
probably won't help enough!

You'll need a different approach!

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From apratap at lbl.gov  Wed Apr  4 19:41:51 2012
From: apratap at lbl.gov (Abhishek Pratap)
Date: Wed, 4 Apr 2012 16:41:51 -0700
Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance
In-Reply-To:
References:
Message-ID:

Thanks Chris. So I guess the question becomes: how can I efficiently
cluster 1 million x,y coordinates?

-Abhi

On Wed, Apr 4, 2012 at 4:35 PM, Chris Barker wrote:
> On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap
>> close to 900K points using the DBSCAN algo. My input is a list of ~900k
>> tuples each having two points (x,y) coordinates. I am converting them
>> to a numpy array and passing them to the pdist method of
>> scipy.spatial.distance for calculating the distance between each point.
>
> I think pdist creates an array that is:
>
> sum(range(num_points)) in size.
>
> That's going to be pretty darn big:
>
> 404999550000 elements
>
> I think that's about 3 terabytes:
>
> In [41]: sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
> Out[41]: 2.946759559563361
>
> (for 64 bit floats)
>
>> I think the error has something to do with the default double dtype
>> of the numpy array used by the pdist function.
>
> you *may* be able to get it to use float32 -- but as you can see, that
> probably won't help enough!
>
> You'll need a different approach!
>
> -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From gael.varoquaux at normalesup.org  Thu Apr  5 01:33:51 2012
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 5 Apr 2012 07:33:51 +0200
Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance
In-Reply-To:
References:
Message-ID: <20120405053351.GA1271@phare.normalesup.org>

On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote:
> Thanks Chris. So I guess the question becomes: how can I efficiently
> cluster 1 million x,y coordinates?

Did you try the scikit-learn's implementation of DBSCAN:
http://scikit-learn.org/stable/modules/clustering.html#dbscan ? I am
not sure that it scales, but it's worth trying.

Alternatively, the best way to cluster massive datasets is to use the
mini-batch implementation of KMeans:
http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means

Hope this helps,

Gael

From chaoyuejoy at gmail.com  Thu Apr  5 04:45:58 2012
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 5 Apr 2012 10:45:58 +0200
Subject: [Numpy-discussion] small bug in ndarray.flatten()?
Message-ID:

Dear all,

Is there a small bug in the following?
In [2]: b
Out[2]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In [3]: b.flatten(order='C')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/mnt/f/ in ()

TypeError: flatten() takes no keyword arguments

order='F' gave the same.

cheers,

chao

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shish at keba.be  Thu Apr  5 06:41:34 2012
From: shish at keba.be (Olivier Delalleau)
Date: Thu, 5 Apr 2012 06:41:34 -0400
Subject: [Numpy-discussion] small bug in ndarray.flatten()?
In-Reply-To:
References:
Message-ID:

It works for me, which version of numpy are you using?
What do you get when you type help(b.flatten)?

-=- Olivier

On 5 April 2012 04:45, Chao YUE wrote:

> Dear all,
>
> Is there a small bug in the following?
>
> In [2]: b
> Out[2]:
> array([[ 0,  1,  2,  3,  4,  5],
>        [ 6,  7,  8,  9, 10, 11],
>        [12, 13, 14, 15, 16, 17],
>        [18, 19, 20, 21, 22, 23]])
>
> In [3]: b.flatten(order='C')
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
>
> /mnt/f/ in ()
>
> TypeError: flatten() takes no keyword arguments
>
> order='F' gave the same.
>
> cheers,
>
> chao
>
> --
> ***********************************************************************************
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
> ************************************************************************************
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chaoyuejoy at gmail.com  Thu Apr  5 08:12:59 2012
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 5 Apr 2012 14:12:59 +0200
Subject: [Numpy-discussion] small bug in ndarray.flatten()?
In-Reply-To:
References:
Message-ID:

Hi,

I use 1.5.1.

In [69]: np.__version__
Out[69]: '1.5.1'

The help information seems OK.

In [70]: b.flatten?
Type:           builtin_function_or_method
Base Class:
String Form:
Namespace:      Interactive
Docstring:
    a.flatten(order='C')

    Return a copy of the array collapsed into one dimension.

    Parameters
    ----------
    order : {'C', 'F'}, optional
        Whether to flatten in C (row-major) or Fortran (column-major)
        order. The default is 'C'.

    Returns
    -------
    y : ndarray
        A copy of the input array, flattened to one dimension.

    See Also
    --------
    ravel : Return a flattened array.
    flat : A 1-D flat iterator over the array.

    Examples
    --------
    >>> a = np.array([[1,2], [3,4]])
    >>> a.flatten()
    array([1, 2, 3, 4])
    >>> a.flatten('F')
    array([1, 3, 2, 4])

cheers,

Chao

2012/4/5 Olivier Delalleau

> It works for me, which version of numpy are you using?
> What do you get when you type help(b.flatten)?
>
> -=- Olivier
>
> On 5 April 2012 04:45, Chao YUE wrote:
>
>> Dear all,
>>
>> Is there a small bug in the following?
>> >> In [2]: b >> Out[2]: >> array([[ 0, 1, 2, 3, 4, 5], >> [ 6, 7, 8, 9, 10, 11], >> [12, 13, 14, 15, 16, 17], >> [18, 19, 20, 21, 22, 23]]) >> >> >> >> In [3]: b.flatten(order='C') >> >> --------------------------------------------------------------------------- >> TypeError Traceback (most recent call >> last) >> >> /mnt/f/ in () >> >> TypeError: flatten() takes no keyword arguments >> >> order='F' gave tha same. >> >> cheers, >> >> chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Thu Apr 5 09:00:34 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 5 Apr 2012 09:00:34 -0400 Subject: [Numpy-discussion] small bug in ndarray.flatten()? In-Reply-To: References: Message-ID: Ok, it looks weird indeed. I was using numpy 1.6.1 myself, not sure if it's a bug that's been fixed in 1.6. Try without the keyword argument (b.flatten('C')), see if at least that works. -=- Olivier Le 5 avril 2012 08:12, Chao YUE a ?crit : > Hi, > > I use 1.51. > In [69]: np.__version__ > Out[69]: '1.5.1' > > the help information seems OK. > > In [70]: b.flatten? > Type: builtin_function_or_method > Base Class: > String Form: 0xb5d4a58> > Namespace: Interactive > Docstring: > a.flatten(order='C') > > Return a copy of the array collapsed into one dimension. > > Parameters > ---------- > order : {'C', 'F'}, optional > Whether to flatten in C (row-major) or Fortran (column-major) > order. > The default is 'C'. > > Returns > ------- > y : ndarray > A copy of the input array, flattened to one dimension. > > See Also > -------- > ravel : Return a flattened array. > flat : A 1-D flat iterator over the array. > > Examples > -------- > >>> a = np.array([[1,2], [3,4]]) > >>> a.flatten() > array([1, 2, 3, 4]) > >>> a.flatten('F') > array([1, 3, 2, 4]) > > cheers, > > Chao > > > 2012/4/5 Olivier Delalleau > >> It works for me, which version of numpy are you using? >> What do you get when you type help(b.flatten)? >> >> -=- Olivier >> >> Le 5 avril 2012 04:45, Chao YUE a ?crit : >> >>> Dear all, >>> >>> Is there a small bug in following? 
>>> >>> In [2]: b >>> Out[2]: >>> array([[ 0, 1, 2, 3, 4, 5], >>> [ 6, 7, 8, 9, 10, 11], >>> [12, 13, 14, 15, 16, 17], >>> [18, 19, 20, 21, 22, 23]]) >>> >>> >>> >>> In [3]: b.flatten(order='C') >>> >>> --------------------------------------------------------------------------- >>> TypeError Traceback (most recent call >>> last) >>> >>> /mnt/f/ in () >>> >>> TypeError: flatten() takes no keyword arguments >>> >>> order='F' gave tha same. >>> >>> cheers, >>> >>> chao >>> >>> -- >>> >>> *********************************************************************************** >>> Chao YUE >>> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >>> UMR 1572 CEA-CNRS-UVSQ >>> Batiment 712 - Pe 119 >>> 91191 GIF Sur YVETTE Cedex >>> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >>> >>> ************************************************************************************ >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Thu Apr 5 10:39:08 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 05 Apr 2012 16:39:08 +0200 Subject: [Numpy-discussion] small bug in ndarray.flatten()? In-Reply-To: References: Message-ID: <4F7DAE8C.3030505@crans.org> Hi, Le 05/04/2012 15:00, Olivier Delalleau a ?crit : > Ok, it looks weird indeed. I was using numpy 1.6.1 myself, not sure if > it's a bug that's been fixed in 1.6. > Try without the keyword argument (b.flatten('C')), see if at least > that works. I can reproduce Chao's bug with my numpy 1.5. As you've just suggested, it runs fine when there is no keyword. In [1]: a= np.arange(10) In [2]: a.flatten('C') # Works well Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [3]: a.flatten(order='C') # Chao's bug --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/pierre/ in () ----> 1 a.flatten(order='C') TypeError: flatten() takes no keyword arguments In [4]: a.flatten? # indeed says there is a keyword order='C' In [5]: np.__version__ Out[5]: '1.5.1' Now if the bug is fixed in 1.6, there's nothing more to do than just wait for the update ! (Debian testing in my case) Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From apratap at lbl.gov Thu Apr 5 11:06:45 2012 From: apratap at lbl.gov (Abhishek Pratap) Date: Thu, 5 Apr 2012 08:06:45 -0700 Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance In-Reply-To: <20120405053351.GA1271@phare.normalesup.org> References: <20120405053351.GA1271@phare.normalesup.org> Message-ID: Hi Gael The MemoryError exception I am getting is from using scikit's DBSCAN implementation. I can check mini-batch implementation of Kmeans. Best, -Abhi On Wed, Apr 4, 2012 at 10:33 PM, Gael Varoquaux wrote: > On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote: >> Thanks Chris. So I guess the question becomes how can I efficiently >> cluster 1 million x,y coordinates. > > Did you try the scikit-learn's implementation of DBSCAN: > http://scikit-learn.org/stable/modules/clustering.html#dbscan > ? I am not sure that it scales, but it's worth trying. > > Alternatively, the best way to cluster massive datasets is to use the > mini-batch implementation of KMeans: > http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means > > Hope this helps, > > Gael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chaoyuejoy at gmail.com Thu Apr 5 11:17:15 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 5 Apr 2012 17:17:15 +0200 Subject: [Numpy-discussion] small bug in ndarray.flatten()? In-Reply-To: <4F7DAE8C.3030505@crans.org> References: <4F7DAE8C.3030505@crans.org> Message-ID: nice to know this. can also use b.transpose().flatten() to circumvent it. thanks, Chao 2012/4/5 Pierre Haessig > Hi, > > Le 05/04/2012 15:00, Olivier Delalleau a ?crit : > > Ok, it looks weird indeed. I was using numpy 1.6.1 myself, not sure if > > it's a bug that's been fixed in 1.6. > > Try without the keyword argument (b.flatten('C')), see if at least > > that works. > > I can reproduce Chao's bug with my numpy 1.5. > > As you've just suggested, it runs fine when there is no keyword. > > In [1]: a= np.arange(10) > > In [2]: a.flatten('C') # Works well > Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [3]: a.flatten(order='C') # Chao's bug > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > /home/pierre/ in () > ----> 1 a.flatten(order='C') > > TypeError: flatten() takes no keyword arguments > > In [4]: a.flatten? # indeed says there is a keyword order='C' > > In [5]: np.__version__ > Out[5]: '1.5.1' > > Now if the bug is fixed in 1.6, there's nothing more to do than just > wait for the update ! > (Debian testing in my case) > > Best, > Pierre > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ndbecker2 at gmail.com Thu Apr 5 11:31:36 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 05 Apr 2012 11:31:36 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? Message-ID: I have an array of object. How can I apply attribute access to each element? I want to do, for example, np.all (u.some_attribute == 0) for all elements in u? From hugadams at gwmail.gwu.edu Thu Apr 5 11:41:23 2012 From: hugadams at gwmail.gwu.edu (Adam Hughes) Date: Thu, 5 Apr 2012 11:41:23 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? In-Reply-To: References: Message-ID: If you are storing objects, then can't you store them in a list and just do: for obj in objectlist: obj.attribute = value Or am I misunderstanding? On Thu, Apr 5, 2012 at 11:31 AM, Neal Becker wrote: > I have an array of object. > > How can I apply attribute access to each element? > > I want to do, for example, > np.all (u.some_attribute == 0) for all elements in u? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Thu Apr 5 11:45:58 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 05 Apr 2012 11:45:58 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? References: Message-ID: Adam Hughes wrote: > If you are storing objects, then can't you store them in a list and just do: > > for obj in objectlist: > obj.attribute = value > > Or am I misunderstanding? > It's multi-dimensional, and I wanted to avoid writing explicit loops. From shish at keba.be Thu Apr 5 11:57:56 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 5 Apr 2012 11:57:56 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? In-Reply-To: References: Message-ID: Le 5 avril 2012 11:45, Neal Becker a ?crit : > Adam Hughes wrote: > > > If you are storing objects, then can't you store them in a list and just > do: > > > > for obj in objectlist: > > obj.attribute = value > > > > Or am I misunderstanding? > > > > It's multi-dimensional, and I wanted to avoid writing explicit loops. > You can do: f = numpy.frompyfunc(lambda x: x.some_attribute == 0, 1, 1) Then f(array_of_objects_x) -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwatford at gmail.com Thu Apr 5 12:10:15 2012 From: kwatford at gmail.com (Ken Watford) Date: Thu, 5 Apr 2012 12:10:15 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? In-Reply-To: References: Message-ID: On Thu, Apr 5, 2012 at 11:57 AM, Olivier Delalleau wrote: > Le 5 avril 2012 11:45, Neal Becker a ?crit : > > You can do: > > f = numpy.frompyfunc(lambda x: x.some_attribute == 0, 1, 1) > > Then > f(array_of_objects_x) This is handy too: agetattr = numpy.frompyfunc(getattr, 2, 1) array_of_values = agetattr(array_of_objects, 'some_attribute') From ndbecker2 at gmail.com Thu Apr 5 12:50:17 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 05 Apr 2012 12:50:17 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? 
References: Message-ID: Ken Watford wrote: > On Thu, Apr 5, 2012 at 11:57 AM, Olivier Delalleau wrote: >> Le 5 avril 2012 11:45, Neal Becker a ?crit : >> >> You can do: >> >> f = numpy.frompyfunc(lambda x: x.some_attribute == 0, 1, 1) >> >> Then >> f(array_of_objects_x) > > This is handy too: > > agetattr = numpy.frompyfunc(getattr, 2, 1) > > array_of_values = agetattr(array_of_objects, 'some_attribute') I suppose for setitem something similar, except I don't think you can do that with lambda since lambda doesn't allow an assignment. From pierre.haessig at crans.org Thu Apr 5 12:53:39 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 05 Apr 2012 18:53:39 +0200 Subject: [Numpy-discussion] small bug in ndarray.flatten()? In-Reply-To: References: <4F7DAE8C.3030505@crans.org> Message-ID: <4F7DCE13.1020508@crans.org> Hi Chao, Le 05/04/2012 17:17, Chao YUE a ?crit : > nice to know this. can also use b.transpose().flatten() to circumvent it. Just a short remark : b.T is a shorcut for b.transpose() ;-) Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From pierre.haessig at crans.org Thu Apr 5 12:56:45 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 05 Apr 2012 18:56:45 +0200 Subject: [Numpy-discussion] small bug in ndarray.flatten()? In-Reply-To: <4F7DCE13.1020508@crans.org> References: <4F7DAE8C.3030505@crans.org> <4F7DCE13.1020508@crans.org> Message-ID: <4F7DCECD.5040601@crans.org> Sorry for the noise on the ML, I thougt I had made a private reply... -- Pierre Le 05/04/2012 18:53, Pierre Haessig a ?crit : > Hi Chao, > > Le 05/04/2012 17:17, Chao YUE a ?crit : >> nice to know this. can also use b.transpose().flatten() to circumvent it. > Just a short remark : b.T is a shorcut for b.transpose() ;-) > > Best, > Pierre > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From shish at keba.be Thu Apr 5 12:57:32 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 5 Apr 2012 12:57:32 -0400 Subject: [Numpy-discussion] apply 'getitem to each element of obj array? In-Reply-To: References: Message-ID: Le 5 avril 2012 12:50, Neal Becker a ?crit : > Ken Watford wrote: > > > On Thu, Apr 5, 2012 at 11:57 AM, Olivier Delalleau > wrote: > >> Le 5 avril 2012 11:45, Neal Becker a ?crit : > >> > >> You can do: > >> > >> f = numpy.frompyfunc(lambda x: x.some_attribute == 0, 1, 1) > >> > >> Then > >> f(array_of_objects_x) > > > > This is handy too: > > > > agetattr = numpy.frompyfunc(getattr, 2, 1) > > > > array_of_values = agetattr(array_of_objects, 'some_attribute') > > I suppose for setitem something similar, except I don't think you can do > that > with lambda since lambda doesn't allow an assignment. > You can call setattr in a lambda though, to bypass the assignment limitation. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ndbecker2 at gmail.com Thu Apr 5 13:20:12 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 05 Apr 2012 13:20:12 -0400 Subject: [Numpy-discussion] don't understand nditer Message-ID: Along the lines of my question about apply getitem to each element... If I try to use nditer, I seem to run into trouble: for d in np.nditer (y, ['refs_ok'],['readwrite']): ....: y[...].w = 2 ....: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/nbecker/hn-8psk/ in () 1 for d in np.nditer (y, ['refs_ok'],['readwrite']): ----> 2 y[...].w = 2 3 AttributeError: 'numpy.ndarray' object has no attribute 'w' y is a 2D array of 'noop' class instances: class noop: pass y Out[75]: array([[<__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>], [<__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>, <__main__.noop instance at 0x4241098>]], dtype=object) Any idea how to do this setattr per element using nditer? From tsyu80 at gmail.com Thu Apr 5 15:52:03 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Thu, 5 Apr 2012 15:52:03 -0400 Subject: [Numpy-discussion] Slice specified axis Message-ID: Is there a way to slice an nd-array along a specified axis? It's easy to slice along a fixed axis, e.g.: axis = 0: >>> array[start:end] axis = 1: >>> array[:, start:end] ... But I need to do this inside of a function that accepts arrays of any dimension, and the user can operate on any axis of the array. My current solution looks like the following: >>> aslice = lambda axis, s, e: (slice(None),) * axis + (slice(s, e),) >>> array[aslice(axis, start, end)] which works, but I'm guessing that numpy has a more readable way of doing this that I've overlooked. Thanks, -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From apratap at lbl.gov Thu Apr 5 16:05:01 2012 From: apratap at lbl.gov (Abhishek Pratap) Date: Thu, 5 Apr 2012 13:05:01 -0700 Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance In-Reply-To: References: <20120405053351.GA1271@phare.normalesup.org> Message-ID: Also in my case I dont really have a good approximate on value of K in K-means. -A On Thu, Apr 5, 2012 at 8:06 AM, Abhishek Pratap wrote: > Hi Gael > > The MemoryError exception I am getting is from using scikit's DBSCAN > implementation. I can check mini-batch implementation of Kmeans. > > Best, > -Abhi > > On Wed, Apr 4, 2012 at 10:33 PM, Gael Varoquaux > wrote: >> On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote: >>> Thanks Chris. So I guess the question becomes how can I efficiently >>> cluster 1 million x,y coordinates. >> >> Did you try the scikit-learn's implementation of DBSCAN: >> http://scikit-learn.org/stable/modules/clustering.html#dbscan >> ? I am not sure that it scales, but it's worth trying. 
>> >> Alternatively, the best way to cluster massive datasets is to use the >> mini-batch implementation of KMeans: >> http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means >> >> Hope this helps, >> >> Gael >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From gael.varoquaux at normalesup.org Thu Apr 5 16:31:30 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 5 Apr 2012 22:31:30 +0200 Subject: [Numpy-discussion] MemoryError : with scipy.spatial.distance In-Reply-To: References: <20120405053351.GA1271@phare.normalesup.org> Message-ID: <20120405203130.GE26015@phare.normalesup.org> On Thu, Apr 05, 2012 at 01:05:01PM -0700, Abhishek Pratap wrote: > Also in my case I dont really have a good approximate on value of K in K-means. That's a hard problem, for which I have no answer, sorry :$ G From Tim.Whitcomb at nrlmry.navy.mil Thu Apr 5 18:57:49 2012 From: Tim.Whitcomb at nrlmry.navy.mil (Whitcomb, Mr. Tim) Date: Thu, 5 Apr 2012 15:57:49 -0700 Subject: [Numpy-discussion] Can't access NumPy documentation elements Message-ID: Good afternoon - we're having some issues here accessing the online documentation for the latest NumPy version: 1. Search for "numpy asarray" on Google 2. Top result is "numpy.asarray - NumPy v1.7.dev-72185d3 Manual (DRAFT)" (or just go directly to link) 3. Click link - arrive at http://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html 4. Page has a title, links to previous/next topics, but no content. If I manually go to the NumPy docs page and follow the links down to "array manipulation routines", I see that there are a bunch of entries in the table that I remember as being links, but are just plain text currently. Did something go wrong with a build? Thanks, Tim [w] From pav at iki.fi Thu Apr 5 19:59:46 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 06 Apr 2012 01:59:46 +0200 Subject: [Numpy-discussion] Can't access NumPy documentation elements In-Reply-To: References: Message-ID: 06.04.2012 00:57, Whitcomb, Mr. Tim kirjoitti: [clip] > Did something go wrong with a build? Seems so. As a workaround, you can read the documentation of the released versions. 
--
Pauli Virtanen

From wesmckinn at gmail.com  Thu Apr  5 23:04:35 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Thu, 5 Apr 2012 23:04:35 -0400
Subject: [Numpy-discussion] Improving NumPy's indexing / subsetting /
	fancy indexing implementation
Message-ID: 

dear all,

I've routinely found that:

1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
2) Hand-coded Cython boolean indexing is many times faster than ndarray indexing
3) putmask is significantly faster than ndarray indexing

For example, I stumbled on this tonight:

straightforward cython code:

import numpy as np
from numpy cimport ndarray, float64_t, uint8_t

def row_bool_subset(ndarray[float64_t, ndim=2] values,
                    ndarray[uint8_t, cast=True] mask):
    cdef:
        Py_ssize_t i, j, n, k, pos = 0
        ndarray[float64_t, ndim=2] out

    n, k = (<object> values).shape
    assert(n == len(mask))

    out = np.empty((mask.sum(), k), dtype=np.float64)

    for i in range(n):
        if mask[i]:
            for j in range(k):
                out[pos, j] = values[i, j]
            pos += 1

    return out

In [1]: values = randn(1000000, 4)

In [2]: mask = np.ones(1000000, dtype=bool)

In [3]: import pandas._sandbox as sbx

In [4]: result = sbx.row_bool_subset(values, mask)

In [5]: timeit result = sbx.row_bool_subset(values, mask)
100 loops, best of 3: 9.58 ms per loop

pure NumPy:

In [6]: timeit values[mask]
10 loops, best of 3: 81.6 ms per loop

Here's the kind of take performance problems that I routinely experience:

In [12]: values = randn(1000000, 4)

In [13]: values.shape
Out[13]: (1000000, 4)

In [14]: indexer = np.random.permutation(1000000)[:500000]

In [15]: timeit values[indexer]
10 loops, best of 3: 70.7 ms per loop

In [16]: timeit values.take(indexer, axis=0)
100 loops, best of 3: 13.3 ms per loop

When I can spare the time in the future I will personally work on
these indexing routines in the C codebase, but I suspect that I'm not
the only one adversely affected by these kinds of performance
problems, and it might be worth thinking about a community effort to
split up the work of retooling these methods to be more performant. We
could use a tool like my vbench project (github.com/wesm/vbench) to
make a list of the performance benchmarks of interest (like the ones
above).

Unfortunately I am too time constrained at least for the next 6 months
to devote to a complete rewrite of the code in question. Possibly
sometime in 2013 if no one has gotten to it yet, but this seems like
someplace that we should be concerned about as the language
performance wars continue to rage.

- Wes

From claumann at physics.harvard.edu  Thu Apr  5 23:54:18 2012
From: claumann at physics.harvard.edu (Chris Laumann)
Date: Thu, 5 Apr 2012 23:54:18 -0400
Subject: [Numpy-discussion] Bitwise operations and unsigned types
Message-ID: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu>

Hi all-

I've been trying to use numpy arrays of ints as arrays of bit fields
and mostly this works fine. However, it seems that the bitwise_* ufuncs
do not support unsigned integer dtypes:

In [142]: np.uint64(5)&3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/claumann/ in ()
----> 1 np.uint64(5)&3

TypeError: ufunc 'bitwise_and' not supported for the input types, and
the inputs could not be safely coerced to any supported types according
to the casting rule ''safe''

This seems odd as unsigned ints are the most natural bitfields I can
think of -- the sign bit is just confusing when doing bit manipulation.
Python itself of course doesn't make much of a distinction between ints,
longs, unsigned etc.

Is this a bug?
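
For what it's worth, the only workaround I've found so far is to cast the
Python int by hand -- a minimal sketch (and, oddly, arrays seem unaffected,
so it looks like a scalar-only problem):

>>> np.uint64(5) & np.uint64(3)      # both operands explicitly uint64
1
>>> np.ones(3, dtype=np.uint64) & 3
array([1, 1, 1], dtype=uint64)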
Thanks, Chris -- Chris Laumann Sent with Sparrow (http://www.sparrowmailapp.com/?sig) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kalatsky at gmail.com Fri Apr 6 00:58:20 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Thu, 5 Apr 2012 23:58:20 -0500 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: Message-ID: The only slicing short-cut I can think of is the Ellipsis object, but it's not going to help you much here. The alternatives that come to my mind are (1) manipulation of shape directly and (2) building a string and running eval on it. Your solution is better than (1), and (2) is a horrible hack, so your solution wins again. Cheers Val On Thu, Apr 5, 2012 at 2:52 PM, Tony Yu wrote: > Is there a way to slice an nd-array along a specified axis? It's easy to > slice along a fixed axis, e.g.: > > axis = 0: > >>> array[start:end] > > axis = 1: > >>> array[:, start:end] > ... > > But I need to do this inside of a function that accepts arrays of any > dimension, and the user can operate on any axis of the array. My current > solution looks like the following: > > >>> aslice = lambda axis, s, e: (slice(None),) * axis + (slice(s, e),) > >>> array[aslice(axis, start, end)] > > which works, but I'm guessing that numpy has a more readable way of doing > this that I've overlooked. > > Thanks, > -Tony > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Apr 6 01:16:34 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 6 Apr 2012 00:16:34 -0500 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> Message-ID: <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> Which version of NumPy are you using. This could be an artefact of the new casting rules. This used to work. So, yes, this is definitely a bug. -Travis On Apr 5, 2012, at 10:54 PM, Chris Laumann wrote: > Hi all- > > I've been trying to use numpy arrays of ints as arrays of bit fields and mostly this works fine. However, it seems that the bitwise_* ufuncs do not support unsigned integer dtypes: > > In [142]: np.uint64(5)&3 > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > /Users/claumann/ in () > ----> 1 np.uint64(5)&3 > > TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' > > This seems odd as unsigned ints are the most natural bitfields I can think of -- the sign bit is just confusing when doing bit manipulation. Python itself of course doesn't make much a distinction between ints, longs, unsigned etc. > > Is this a bug? > > Thanks, Chris > > -- > Chris Laumann > Sent with Sparrow > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Apr 6 01:39:14 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 5 Apr 2012 23:39:14 -0600 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> Message-ID: On Thu, Apr 5, 2012 at 11:16 PM, Travis Oliphant wrote: > Which version of NumPy are you using. This could be an artefact of the > new casting rules. > > This used to work. So, yes, this is definitely a bug. > > It's because the '3' is treated as signed, so the uint64 needs to be cast to something of higher precision, of which there is none. You can either use uint64(3) or just stick to int64. I don't know if this used to work or not, mixing signed and unsigned has always led to higher precision in arithmetic operations, even (mistakenly in my opinion) promoting uint64(5) + 3 to lower precision float64. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 6 01:45:34 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 5 Apr 2012 23:45:34 -0600 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> Message-ID: On Thu, Apr 5, 2012 at 11:39 PM, Charles R Harris wrote: > > > On Thu, Apr 5, 2012 at 11:16 PM, Travis Oliphant wrote: > >> Which version of NumPy are you using. This could be an artefact of the >> new casting rules. >> >> This used to work. So, yes, this is definitely a bug. >> >> > It's because the '3' is treated as signed, so the uint64 needs to be cast > to something of higher precision, of which there is none. You can either > use uint64(3) or just stick to int64. I don't know if this used to work or > not, mixing signed and unsigned has always led to higher precision in > arithmetic operations, even (mistakenly in my opinion) promoting uint64(5) > + 3 to lower precision float64. > > In particular, in this case it is because two scalars are used. It works fine for arrays In [11]: ones(3, uint64) & 3 Out[11]: array([1, 1, 1], dtype=uint64) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Apr 6 01:57:26 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 6 Apr 2012 00:57:26 -0500 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> Message-ID: <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> As of 1.5.1 this worked: >>> numpy.__version__ 1.5.1 >>> numpy.uint64(5) & 3 1L So, this is a regression and a bug. It should be fixed so that it doesn't raise an error. I believe the scalars were special cased so that a raw 3 would not be interpreted as a signed int when it is clearly unsigned. The casting rules were well established over a long period. They had issues, but they should not have been changed like this in a 1.X release. Fortunately there are work-arounds and these issues arise only in corner cases, but we should strive to do better. 
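
To make the corner case concrete, a minimal sketch of the behaviors under
discussion (results as I understand them; they depend on the version):

>>> import numpy as np
>>> np.uint64(5) & np.uint64(3)   # explicitly matched unsigned scalars: fine
1
>>> np.ones(3, np.uint64) & 3     # array & Python int: fine
array([1, 1, 1], dtype=uint64)
>>> np.uint64(5) & 3              # scalar & Python int:
...                               # 1L on 1.5.1, TypeError on current master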
-Travis On Apr 6, 2012, at 12:45 AM, Charles R Harris wrote: > > > On Thu, Apr 5, 2012 at 11:39 PM, Charles R Harris wrote: > > > On Thu, Apr 5, 2012 at 11:16 PM, Travis Oliphant wrote: > Which version of NumPy are you using. This could be an artefact of the new casting rules. > > This used to work. So, yes, this is definitely a bug. > > > It's because the '3' is treated as signed, so the uint64 needs to be cast to something of higher precision, of which there is none. You can either use uint64(3) or just stick to int64. I don't know if this used to work or not, mixing signed and unsigned has always led to higher precision in arithmetic operations, even (mistakenly in my opinion) promoting uint64(5) + 3 to lower precision float64. > > > In particular, in this case it is because two scalars are used. It works fine for arrays > > In [11]: ones(3, uint64) & 3 > Out[11]: array([1, 1, 1], dtype=uint64) > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 6 02:01:34 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Apr 2012 00:01:34 -0600 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> Message-ID: On Thu, Apr 5, 2012 at 11:57 PM, Travis Oliphant wrote: > As of 1.5.1 this worked: > > >>> numpy.__version__ > 1.5.1 > >>> numpy.uint64(5) & 3 > 1L > > > So, this is a regression and a bug. It should be fixed so that it > doesn't raise an error. I believe the scalars were special cased so that a > raw 3 would not be interpreted as a signed int when it is clearly unsigned. > The casting rules were well established over a long period. They had > issues, but they should not have been changed like this in a 1.X release. > > Fortunately there are work-arounds and these issues arise only in corner > cases, but we should strive to do better. > > I disagree, promoting to object kind of destroys the whole idea of bitwise operations. I think we *fixed* a bug. Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Apr 6 02:19:06 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 6 Apr 2012 01:19:06 -0500 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> Message-ID: <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> On Apr 6, 2012, at 1:01 AM, Charles R Harris wrote: > > > On Thu, Apr 5, 2012 at 11:57 PM, Travis Oliphant wrote: > As of 1.5.1 this worked: > > >>> numpy.__version__ > 1.5.1 > >>> numpy.uint64(5) & 3 > 1L > > > So, this is a regression and a bug. It should be fixed so that it doesn't raise an error. I believe the scalars were special cased so that a raw 3 would not be interpreted as a signed int when it is clearly unsigned. The casting rules were well established over a long period. They had issues, but they should not have been changed like this in a 1.X release. 
> > Fortunately there are work-arounds and these issues arise only in corner cases, but we should strive to do better. > > > I disagree, promoting to object kind of destroys the whole idea of bitwise operations. I think we *fixed* a bug. That is an interesting point of view. I could see that point of view. But, was this discussed as a bug prior to this change occurring? I just heard from a very heavy user of NumPy that they are nervous about upgrading because of little changes like this one. I don't know if this particular issue would affect them or not, but I will re-iterate my view that we should be very careful of these kinds of changes. In this particular case, what should the behavior be? It would be ideal if the scalar math did not just re-use the array-math machinery. Let's say that scalars had their own bit-wise operator. What should the output of numpy.uint64(5) & 3 actually be? I think it should interpret the 3 as unsigned and perform the operation (but not promote to an object). -Travis > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 6 02:19:29 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Apr 2012 00:19:29 -0600 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> Message-ID: On Fri, Apr 6, 2012 at 12:01 AM, Charles R Harris wrote: > > > On Thu, Apr 5, 2012 at 11:57 PM, Travis Oliphant wrote: > >> As of 1.5.1 this worked: >> >> >>> numpy.__version__ >> 1.5.1 >> >>> numpy.uint64(5) & 3 >> 1L >> >> >> So, this is a regression and a bug. It should be fixed so that it >> doesn't raise an error. I believe the scalars were special cased so that a >> raw 3 would not be interpreted as a signed int when it is clearly unsigned. >> The casting rules were well established over a long period. They had >> issues, but they should not have been changed like this in a 1.X release. >> >> Fortunately there are work-arounds and these issues arise only in corner >> cases, but we should strive to do better. >> >> > I disagree, promoting to object kind of destroys the whole idea of bitwise > operations. I think we *fixed* a bug. > Although 1.5.1 also gives >>> np.uint(3) + 4 7.0 i.e., a float, which certainly doesn't look right either. Whereas >>> np.int(3) + 4 7 The float promotion is still there in 1.6.1 In [4]: uint64(1) + 2 Out[4]: 3.0 So I suppose there is the larger question is how combining numpy scalars with python scalars should be done in general. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Apr 6 02:22:55 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 6 Apr 2012 01:22:55 -0500 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> Message-ID: > > Although 1.5.1 also gives > > >>> np.uint(3) + 4 > 7.0 > > i.e., a float, which certainly doesn't look right either. 
Whereas > > >>> np.int(3) + 4 > 7 > > The float promotion is still there in 1.6.1 > > In [4]: uint64(1) + 2 > Out[4]: 3.0 > > So I suppose there is the larger question is how combining numpy scalars with python scalars should be done in general. > Yes, exactly! This is a good discussion to have. As mentioned before, I would also like to see the NumPy scalars get their own math instead of re-using the umath machinery. In the case of scalars, it would seem to make sense to interpret integers as signed or unsigned based on their value. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 6 02:22:55 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Apr 2012 00:22:55 -0600 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> Message-ID: On Fri, Apr 6, 2012 at 12:19 AM, Travis Oliphant wrote: > > On Apr 6, 2012, at 1:01 AM, Charles R Harris wrote: > > > > On Thu, Apr 5, 2012 at 11:57 PM, Travis Oliphant wrote: > >> As of 1.5.1 this worked: >> >> >>> numpy.__version__ >> 1.5.1 >> >>> numpy.uint64(5) & 3 >> 1L >> >> >> So, this is a regression and a bug. It should be fixed so that it >> doesn't raise an error. I believe the scalars were special cased so that a >> raw 3 would not be interpreted as a signed int when it is clearly unsigned. >> The casting rules were well established over a long period. They had >> issues, but they should not have been changed like this in a 1.X release. >> >> Fortunately there are work-arounds and these issues arise only in corner >> cases, but we should strive to do better. >> >> > I disagree, promoting to object kind of destroys the whole idea of bitwise > operations. I think we *fixed* a bug. > > > That is an interesting point of view. I could see that point of view. > But, was this discussed as a bug prior to this change occurring? > > I just heard from a very heavy user of NumPy that they are nervous about > upgrading because of little changes like this one. I don't know if this > particular issue would affect them or not, but I will re-iterate my view > that we should be very careful of these kinds of changes. > > In this particular case, what should the behavior be? It would be ideal > if the scalar math did not just re-use the array-math machinery. Let's say > that scalars had their own bit-wise operator. What should the output of > > numpy.uint64(5) & 3 actually be? I think it should interpret the 3 as > unsigned and perform the operation (but not promote to an object). > > > Yes, I agree with that. We should think about how the scalar types combine in the common operations. It looks like it could be made more consistent. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cwg at falma.de Fri Apr 6 03:56:27 2012 From: cwg at falma.de (Christoph Groth) Date: Fri, 06 Apr 2012 09:56:27 +0200 Subject: [Numpy-discussion] why does eigvalsh return a complex array? Message-ID: <877gxte1jo.fsf@falma.de> I noticed that numpy.linalg.eigvalsh returns a complex array, even though mathematically the resulting eigenvalues are guaranteed to be real. 
Looking at the source code, the underlying zheevd routine of LAPACK
indeed returns an array of real numbers which is then converted to
complex in the numpy wrapper.

Does numpy policy require the type of the result to be the same as the
type of input?  Copying an array twice to arrive at the original result
seems pointless to me.

From njs at pobox.com  Fri Apr  6 05:57:44 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 6 Apr 2012 10:57:44 +0100
Subject: [Numpy-discussion] Bitwise operations and unsigned types
In-Reply-To: <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io>
References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu>
	<53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io>
	<1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io>
	<851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io>
Message-ID: 

On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant  wrote:
> That is an interesting point of view. I could see that point of view.
> But, was this discussed as a bug prior to this change occurring?
>
> I just heard from a very heavy user of NumPy that they are nervous about
> upgrading because of little changes like this one. I don't know if this
> particular issue would affect them or not, but I will re-iterate my view
> that we should be very careful of these kinds of changes.

I agree -- these changes make me very nervous as well, especially
since I haven't seen any short, simple description of what changed or
what the rules actually are now (comparable to the old "scalars do not
affect the type of arrays").

But, I also want to speak up in favor in one respect, since real world
data points are always good. I had some code that did
  def do_something(a):
      a = np.asarray(a)
      a -= np.mean(a)
      ...
If someone happens to pass in an integer array, then this is totally
broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently
discards the fractional part and performs the subtraction anyway,
e.g.:

In [4]: a
Out[4]: array([0, 1, 2, 3])

In [5]: a -= 1.5

In [6]: a
Out[6]: array([-1,  0,  0,  1])

The bug was discovered when Skipper tried running my code against
numpy master, and it errored out on the -=. So Mark's changes did
catch one real bug that would have silently caused completely wrong
numerical results!

https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512

- Nathaniel

From njs at pobox.com  Fri Apr  6 06:00:11 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 6 Apr 2012 11:00:11 +0100
Subject: [Numpy-discussion] Improving NumPy's indexing / subsetting /
	fancy indexing implementation
In-Reply-To: 
References: 
Message-ID: 

Hi Wes,

I believe that Mark rewrote a bunch of the fancy-indexing-related code
from scratch in the masked-NA branch. I don't know if it affects
anything you're talking about here, but just as a heads up, you might
want to benchmark master, since it may have a different performance
profile.

-- Nathaniel

On Fri, Apr 6, 2012 at 4:04 AM, Wes McKinney  wrote:
> dear all,
>
> I've routinely found that:
>
> 1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
> 2) Hand-coded Cython boolean indexing is many times faster than ndarray indexing
> 3) putmask is significantly faster than ndarray indexing
>
> For example, I stumbled on this tonight:
>
> straightforward cython code:
>
> def row_bool_subset(ndarray[float64_t, ndim=2] values,
>                     ndarray[uint8_t, cast=True] mask):
>     cdef:
>         Py_ssize_t i, j, n, k, pos = 0
>         ndarray[float64_t, ndim=2] out
>
>     n, k = (<object> values).shape
>     assert(n == len(mask))
>
>     out = np.empty((mask.sum(), k), dtype=np.float64)
>
>     for i in range(n):
>         if mask[i]:
>             for j in range(k):
>                 out[pos, j] = values[i, j]
>             pos += 1
>
>     return out
>
> In [1]: values = randn(1000000, 4)
>
> In [2]: mask = np.ones(1000000, dtype=bool)
>
> In [3]: import pandas._sandbox as sbx
>
> In [4]: result = sbx.row_bool_subset(values, mask)
>
> In [5]: timeit result = sbx.row_bool_subset(values, mask)
> 100 loops, best of 3: 9.58 ms per loop
>
> pure NumPy:
>
> In [6]: timeit values[mask]
> 10 loops, best of 3: 81.6 ms per loop
>
> Here's the kind of take performance problems that I routinely experience:
>
> In [12]: values = randn(1000000, 4)
>
> In [13]: values.shape
> Out[13]: (1000000, 4)
>
> In [14]: indexer = np.random.permutation(1000000)[:500000]
>
> In [15]: timeit values[indexer]
> 10 loops, best of 3: 70.7 ms per loop
>
> In [16]: timeit values.take(indexer, axis=0)
> 100 loops, best of 3: 13.3 ms per loop
>
> When I can spare the time in the future I will personally work on
> these indexing routines in the C codebase, but I suspect that I'm not
> the only one adversely affected by these kinds of performance
> problems, and it might be worth thinking about a community effort to
> split up the work of retooling these methods to be more performant. We
> could use a tool like my vbench project (github.com/wesm/vbench) to
> make a list of the performance benchmarks of interest (like the ones
> above).
>
> Unfortunately I am too time constrained at least for the next 6 months
> to devote to a complete rewrite of the code in question. Possibly
> sometime in 2013 if no one has gotten to it yet, but this seems like
> someplace that we should be concerned about as the language
> performance wars continue to rage.
>
> - Wes
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From boogaloojb at yahoo.fr  Fri Apr  6 07:41:25 2012
From: boogaloojb at yahoo.fr (Jean-Baptiste Rudant)
Date: Fri, 6 Apr 2012 12:41:25 +0100 (BST)
Subject: [Numpy-discussion] (no subject)
Message-ID: <1333712485.41005.YahooMailMobile@web28516.mail.ukl.yahoo.com>

http://alumnos.digicap.cl/images/rmngl.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From markflorisson88 at gmail.com  Fri Apr  6 08:06:35 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 6 Apr 2012 13:06:35 +0100
Subject: [Numpy-discussion] (no subject)
In-Reply-To: <1333712485.41005.YahooMailMobile@web28516.mail.ukl.yahoo.com>
References: <1333712485.41005.YahooMailMobile@web28516.mail.ukl.yahoo.com>
Message-ID: 

Could someone please ban this person from the mailing list, he keeps
sending spam.

On 6 April 2012 12:41, Jean-Baptiste Rudant  wrote:

> http://alumnos.digicap.cl/images/rmngl.html
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From pierre.haessig at crans.org Fri Apr 6 08:15:29 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 06 Apr 2012 14:15:29 +0200 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: <1333712485.41005.YahooMailMobile@web28516.mail.ukl.yahoo.com> Message-ID: <4F7EDE61.4040005@crans.org> Le 06/04/2012 14:06, mark florisson a ?crit : > Could someone please ban this person from the mailing list, he keeps > sending spam. I was about to ask the same thing. In the mean time, I googled the name of this gentleman and found a possible match with a person working for the French national institute for statistics (INSEE). I've tried to forge a valid email address from his name and the INSEE domain name to warn him about his spamming Yahoo address. I'll see if the email comes back as "Undelivered" or not... -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From ben.root at ou.edu Fri Apr 6 08:54:35 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 6 Apr 2012 08:54:35 -0400 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: Message-ID: On Friday, April 6, 2012, Val Kalatsky wrote: > > The only slicing short-cut I can think of is the Ellipsis object, but it's > not going to help you much here. > The alternatives that come to my mind are (1) manipulation of shape > directly and (2) building a string and running eval on it. > Your solution is better than (1), and (2) is a horrible hack, so your > solution wins again. > Cheers > Val > Take a peek at how np.gradient() does it. It creates a list of None with a length equal to the number of dimensions, and then inserts a slice object in the appropriate spot in the list. Cheers! Ben Root > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ognen at enthought.com Fri Apr 6 09:10:39 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Fri, 6 Apr 2012 08:10:39 -0500 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: <1333712485.41005.YahooMailMobile@web28516.mail.ukl.yahoo.com> Message-ID: On Fri, Apr 6, 2012 at 7:06 AM, mark florisson wrote: > Could someone please ban this person from the mailing list, he keeps > sending spam. > > On 6 April 2012 12:41, Jean-Baptiste Rudant wrote: > >> http://alumnos.digicap.cl/images/rmngl.html >> >> They should be gone now. Ognen -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 6 09:50:05 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Apr 2012 07:50:05 -0600 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> Message-ID: On Fri, Apr 6, 2012 at 3:57 AM, Nathaniel Smith wrote: > On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant > wrote: > > That is an interesting point of view. I could see that point of view. > > But, was this discussed as a bug prior to this change occurring? > > > > I just heard from a very heavy user of NumPy that they are nervous about > > upgrading because of little changes like this one. 
I don't know if this > > particular issue would affect them or not, but I will re-iterate my view > > that we should be very careful of these kinds of changes. > > I agree -- these changes make me very nervous as well, especially > since I haven't seen any short, simple description of what changed or > what the rules actually are now (comparable to the old "scalars do not > affect the type of arrays"). > > But, I also want to speak up in favor in one respect, since real world > data points are always good. I had some code that did > def do_something(a): > a = np.asarray(a) > a -= np.mean(a) > ... > If someone happens to pass in an integer array, then this is totally > broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently > discards the fractional part and performs the subtraction anyway, > e.g.: > > In [4]: a > Out[4]: array([0, 1, 2, 3]) > > In [5]: a -= 1.5 > > In [6]: a > Out[6]: array([-1, 0, 0, 1]) > > The bug was discovered when Skipper tried running my code against > numpy master, and it errored out on the -=. So Mark's changes did > catch one real bug that would have silently caused completely wrong > numerical results! > > > https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512 > > Yes, these things are trade offs between correctness and convenience. I don't mind new warnings/errors so much, they may break old code but they don't lead to wrong results. It's the unexpected and unnoticed successes that are scary. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From claumann at physics.harvard.edu Fri Apr 6 10:01:57 2012 From: claumann at physics.harvard.edu (Chris Laumann) Date: Fri, 6 Apr 2012 10:01:57 -0400 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> Message-ID: <2006E783-4190-4D8E-9F3B-8236093952BA@physics.harvard.edu> Good morning all-- didn't realize this would generate quite such a buzz. To answer a direct question, I'm using the github master. A few thoughts (from a fairly heavy numpy user for numerical simulations and analysis): The current behavior is confusing and (as far as i can tell) undocumented. Scalars act up only if they are big: In [152]: np.uint32(1) & 1 Out[152]: 1 In [153]: np.uint64(1) & 1 --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/claumann/ in () ----> 1 np.uint64(1) & 1 TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' But arrays don't seem to mind: In [154]: ones(3, dtype=np.uint32) & 1 Out[154]: array([1, 1, 1], dtype=uint32) In [155]: ones(3, dtype=np.uint64) & 1 Out[155]: array([1, 1, 1], dtype=uint64) As you mentioned, explicitly casting 1 to np.uint makes the above scalar case work, but I don't understand why this is unnecessary for the arrays. I could understand a general argument that type casting rules should always be the same independent of the underlying ufunc, but I'm not sure if that is sufficiently smart. Bitwise ops probably really ought to treat nonnegative python integers as unsigned. >> I disagree, promoting to object kind of destroys the whole idea of bitwise operations. I think we *fixed* a bug. 
> > That is an interesting point of view. I could see that point of view. But, was this discussed as a bug prior to this change occurring? I'm not sure what 'promoting to object' constitutes in the new numpy, but just a small thought. I can think of two reasons to go to the trouble of using bitfields over more pythonic (higher level) representations: speed/memory overhead and interfacing with external hardware/software. For me, it's mostly the former -- I've already implemented this program once using a much more pythonic approach but it just has too much memory overhead to scale to where I want it. If a coder goes to the trouble of using bitfields, there's probably a good reason they wanted a lower level representation in which bitfield ops happen in parallel as integer operations. But, what do you mean that bitwise operations are destroyed by promotion to objects? Best, Chris On Apr 6, 2012, at 5:57 AM, Nathaniel Smith wrote: > On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant wrote: >> That is an interesting point of view. I could see that point of view. >> But, was this discussed as a bug prior to this change occurring? >> >> I just heard from a very heavy user of NumPy that they are nervous about >> upgrading because of little changes like this one. I don't know if this >> particular issue would affect them or not, but I will re-iterate my view >> that we should be very careful of these kinds of changes. > > I agree -- these changes make me very nervous as well, especially > since I haven't seen any short, simple description of what changed or > what the rules actually are now (comparable to the old "scalars do not > affect the type of arrays"). > > But, I also want to speak up in favor in one respect, since real world > data points are always good. I had some code that did > def do_something(a): > a = np.asarray(a) > a -= np.mean(a) > ... > If someone happens to pass in an integer array, then this is totally > broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently > discards the fractional part and performs the subtraction anyway, > e.g.: > > In [4]: a > Out[4]: array([0, 1, 2, 3]) > > In [5]: a -= 1.5 > > In [6]: a > Out[6]: array([-1, 0, 0, 1]) > > The bug was discovered when Skipper tried running my code against > numpy master, and it errored out on the -=. So Mark's changes did > catch one real bug that would have silently caused completely wrong > numerical results! > > https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512 > > - Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From francesco.barale at gmail.com Fri Apr 6 14:44:41 2012 From: francesco.barale at gmail.com (francesco82) Date: Fri, 6 Apr 2012 11:44:41 -0700 (PDT) Subject: [Numpy-discussion] problem with vectorized difference equation Message-ID: <33641688.post@talk.nabble.com> Hello everyone, After reading the very good post http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html and the book by H. P. Langtangen 'Python scripting for computational science' I was trying to speed up the execution of a loop on numpy arrays being used to describe a simple difference equation. The actual code I am working on involves some more complicated equations, but I am having the same exact behavior as described below. 
To test the improvement in speed I wrote the following in vect_try.py: #!/usr/bin/python import numpy as np import matplotlib.pyplot as plt dt = 0.02 #time step time = np.arange(0,2,dt) #time array u = np.sin(2*np.pi*time) #input signal array def vect_int(u,y): #vectorized function n = u.size y[1:n] = y[0:n-1] + u[1:n] return y def sc_int(u,y): #scalar function y = y + u return y def calc_vect(u, func=vect_int): out = np.zeros(u.size) for i in xrange(u.size): out = func(u,out) return out def calc_sc(u, func=sc_int): out = np.zeros(u.size) for i in xrange(u.size-1): out[i+1] = sc_int(u[i+1],out[i]) return out To verify the execution time I've used the timeit function in Ipython: import vect_try as vt timeit vt.calc_vect(vt.u) --> 1000 loops, best of 3: 494 us per loop timeit vt.calc_sc(vt.u) -->10000 loops, best of 3: 92.8 us per loop As you can see the scalar implementation looping one item at the time (calc_sc) is 494/92.8~5.3 times faster than the vectorized one (calc_vect). My problem consists in the fact that I need to iterate the execution of calc_vect in order for it to operate on all the elements of the input array. If I execute calc_vect only once, it will only operate on the first slice of the vectors leaving the remaining untouched. My understanding was that the vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate over all the array, but that's not happening for me. Can anyone tell me what I am doing wrong? Thanks! Francesco -- View this message in context: http://old.nabble.com/problem-with-vectorized-difference-equation-tp33641688p33641688.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From tsyu80 at gmail.com Fri Apr 6 16:12:27 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Fri, 6 Apr 2012 16:12:27 -0400 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: Message-ID: On Fri, Apr 6, 2012 at 8:54 AM, Benjamin Root wrote: > > > On Friday, April 6, 2012, Val Kalatsky wrote: > >> >> The only slicing short-cut I can think of is the Ellipsis object, but >> it's not going to help you much here. >> The alternatives that come to my mind are (1) manipulation of shape >> directly and (2) building a string and running eval on it. >> Your solution is better than (1), and (2) is a horrible hack, so your >> solution wins again. >> Cheers >> Val >> > > Take a peek at how np.gradient() does it. It creates a list of None with > a length equal to the number of dimensions, and then inserts a slice object > in the appropriate spot in the list. > > Cheers! > Ben Root > >> Hmm, it looks like my original implementation wasn't too far off. Thanks for the tip! -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Apr 6 16:25:03 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 6 Apr 2012 13:25:03 -0700 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: Message-ID: Hi, On Fri, Apr 6, 2012 at 1:12 PM, Tony Yu wrote: > > > On Fri, Apr 6, 2012 at 8:54 AM, Benjamin Root wrote: >> >> >> >> On Friday, April 6, 2012, Val Kalatsky wrote: >>> >>> >>> The only slicing short-cut I can think of is the Ellipsis object, but >>> it's not going to help you much here. >>> The alternatives that come to my mind are (1) manipulation of shape >>> directly and (2) building a string and running eval on it. >>> Your solution is better than (1), and (2) is a horrible hack, so your >>> solution wins again. 
>>> Cheers >>> Val >> >> >> Take a peek at how np.gradient() does it. ?It creates a list of None with >> a length equal to the number of dimensions, and then inserts a slice object >> in the appropriate spot in the list. >> >> Cheers! >> Ben Root > > > Hmm, it looks like my original implementation wasn't too far off. Thanks for > the tip! Another option: me_first = np.rollaxis(arr, axis) slice = me_first[start:end] slice = np.rollaxis(slice, 0, axis+1) Best, Matthew From sameer.grover.1 at gmail.com Fri Apr 6 17:06:30 2012 From: sameer.grover.1 at gmail.com (Sameer Grover) Date: Sat, 07 Apr 2012 02:36:30 +0530 Subject: [Numpy-discussion] problem with vectorized difference equation In-Reply-To: <33641688.post@talk.nabble.com> References: <33641688.post@talk.nabble.com> Message-ID: <4F7F5AD6.3070609@gmail.com> On Saturday 07 April 2012 12:14 AM, francesco82 wrote: > Hello everyone, > > After reading the very good post > http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html > and the book by H. P. Langtangen 'Python scripting for computational > science' I was trying to speed up the execution of a loop on numpy arrays > being used to describe a simple difference equation. > > The actual code I am working on involves some more complicated equations, > but I am having the same exact behavior as described below. To test the > improvement in speed I wrote the following in vect_try.py: > > #!/usr/bin/python > import numpy as np > import matplotlib.pyplot as plt > > dt = 0.02 #time step > time = np.arange(0,2,dt) #time array > u = np.sin(2*np.pi*time) #input signal array > > def vect_int(u,y): #vectorized function > n = u.size > y[1:n] = y[0:n-1] + u[1:n] > return y > > def sc_int(u,y): #scalar function > y = y + u > return y > > def calc_vect(u, func=vect_int): > out = np.zeros(u.size) > for i in xrange(u.size): > out = func(u,out) > return out > > def calc_sc(u, func=sc_int): > out = np.zeros(u.size) > for i in xrange(u.size-1): > out[i+1] = sc_int(u[i+1],out[i]) > return out > > To verify the execution time I've used the timeit function in Ipython: > > import vect_try as vt > timeit vt.calc_vect(vt.u) --> 1000 loops, best of 3: 494 us per loop > timeit vt.calc_sc(vt.u) -->10000 loops, best of 3: 92.8 us per loop > > As you can see the scalar implementation looping one item at the time > (calc_sc) is 494/92.8~5.3 times faster than the vectorized one (calc_vect). > > My problem consists in the fact that I need to iterate the execution of > calc_vect in order for it to operate on all the elements of the input array. > If I execute calc_vect only once, it will only operate on the first slice of > the vectors leaving the remaining untouched. My understanding was that the > vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate over > all the array, but that's not happening for me. Can anyone tell me what I am > doing wrong? > > Thanks! > Francesco > 1. By vectorizing, we mean replacing a loop with a single expression. In your program, both the scalar and vector implementations (calc_vect and calc_sc) have a loop each. This isn't going to make anything faster. The vectorized implementation is just a convoluted way of achieving the same result and is slower. 2. 
The expression y[1:n] = y[0:n-1] + u[1:n] is /not/ equivalent to the
following loop

for i in range(0,n-1):
    y[i+1] = y[i] + u[i+1]

It is equivalent to something like

z = np.zeros(n-1)
for i in range(0,n-1):
    z[i] = y[i] + u[i+1]
y[1:n] = z

i.e., the RHS is computed in totality and then assigned to the LHS. This
is how array operations work even in other languages such as Matlab.

3. I personally don't think there is a simple/obvious way to vectorize
what you're trying to achieve.

Sameer

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francesco.barale at gmail.com  Fri Apr 6 17:21:48 2012
From: francesco.barale at gmail.com (Francesco Barale)
Date: Fri, 6 Apr 2012 14:21:48 -0700 (PDT)
Subject: [Numpy-discussion] problem with vectorized difference equation
In-Reply-To: <4F7F5AD6.3070609@gmail.com>
References: <33641688.post@talk.nabble.com> <4F7F5AD6.3070609@gmail.com>
Message-ID: <33645699.post@talk.nabble.com>

Hello Sameer,

Thank you very much for your reply. My goal was to try to speed up the loop
describing the accumulator. In the (excellent) book I was mentioning in my
initial post I could find one example that seemed to match what I was trying
to do. Basically, it is said that a loop of the following kind:

n = size(u)-1
for i in xrange(1,n,1):
    u_new[i] = u[i-1] + u[i] + u[i+1]

should be equivalent to:

u_new[1:n] = u[0:n-1] + u[1:n] + u[2:n+1]

Am I missing something?

Regards,
Francesco

Sameer Grover wrote:
>
> On Saturday 07 April 2012 12:14 AM, francesco82 wrote:
>> Hello everyone,
>>
>> After reading the very good post
>> http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html
>> and the book by H. P. Langtangen 'Python scripting for computational
>> science' I was trying to speed up the execution of a loop on numpy arrays
>> being used to describe a simple difference equation. To test the
>> improvement in speed I wrote the following in vect_try.py:
>>
>> #!/usr/bin/python
>> import numpy as np
>> import matplotlib.pyplot as plt
>>
>> dt = 0.02 #time step
>> time = np.arange(0,2,dt) #time array
>> u = np.sin(2*np.pi*time) #input signal array
>>
>> def vect_int(u,y): #vectorized function
>>     n = u.size
>>     y[1:n] = y[0:n-1] + u[1:n]
>>     return y
>>
>> def sc_int(u,y): #scalar function
>>     y = y + u
>>     return y
>>
>> def calc_vect(u, func=vect_int):
>>     out = np.zeros(u.size)
>>     for i in xrange(u.size):
>>         out = func(u,out)
>>     return out
>>
>> def calc_sc(u, func=sc_int):
>>     out = np.zeros(u.size)
>>     for i in xrange(u.size-1):
>>         out[i+1] = sc_int(u[i+1],out[i])
>>     return out
>>
>> To verify the execution time I've used the timeit function in Ipython:
>>
>> import vect_try as vt
>> timeit vt.calc_vect(vt.u) --> 1000 loops, best of 3: 494 us per loop
>> timeit vt.calc_sc(vt.u) -->10000 loops, best of 3: 92.8 us per loop
>>
>> As you can see the scalar implementation looping one item at the time
>> (calc_sc) is 494/92.8~5.3 times faster than the vectorized one
>> (calc_vect).
>>
>> My problem consists in the fact that I need to iterate the execution of
>> calc_vect in order for it to operate on all the elements of the input
>> array.
>> If I execute calc_vect only once, it will only operate on the first slice
>> of the vectors leaving the remaining untouched.
>> My understanding was that the
>> vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate over
>> all the array, but that's not happening for me. Can anyone tell me what I
>> am doing wrong?
>>
>> Thanks!
>> Francesco
>>
> 1. By vectorizing, we mean replacing a loop with a single expression. In
> your program, both the scalar and vector implementations (calc_vect and
> calc_sc) have a loop each. This isn't going to make anything faster. The
> vectorized implementation is just a convoluted way of achieving the same
> result and is slower.
>
> 2. The expression y[1:n] = y[0:n-1] + u[1:n] is /not/ equivalent to the
> following loop
>
> for i in range(0,n-1):
>     y[i+1] = y[i] + u[i+1]
>
> It is equivalent to something like
>
> z = np.zeros(n-1)
> for i in range(0,n-1):
>     z[i] = y[i] + u[i+1]
> y[1:n] = z
>
> i.e., the RHS is computed in totality and then assigned to the LHS. This
> is how array operations work even in other languages such as Matlab.
>
> 3. I personally don't think there is a simple/obvious way to vectorize
> what you're trying to achieve.
>
> Sameer
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

--
View this message in context: http://old.nabble.com/problem-with-vectorized-difference-equation-tp33641688p33645699.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

From sameer.grover.1 at gmail.com  Fri Apr 6 18:27:46 2012
From: sameer.grover.1 at gmail.com (Sameer Grover)
Date: Sat, 07 Apr 2012 03:57:46 +0530
Subject: [Numpy-discussion] problem with vectorized difference equation
In-Reply-To: <33645699.post@talk.nabble.com>
References: <33641688.post@talk.nabble.com> <4F7F5AD6.3070609@gmail.com> <33645699.post@talk.nabble.com>
Message-ID: <4F7F6DE2.4060006@gmail.com>

On Saturday 07 April 2012 02:51 AM, Francesco Barale wrote:
> Hello Sameer,
>
> Thank you very much for your reply. My goal was to try to speed up the loop
> describing the accumulator. In the (excellent) book I was mentioning in my
> initial post I could find one example that seemed to match what I was trying
> to do. Basically, it is said that a loop of the following kind:
>
> n = size(u)-1
> for i in xrange(1,n,1):
>     u_new[i] = u[i-1] + u[i] + u[i+1]
>
> should be equivalent to:
>
> u_new[1:n] = u[0:n-1] + u[1:n] + u[2:n+1]
This example is correct.

What I was trying to point out was that the single expression
y[1:n] = y[0:n-1] + u[1:n] will iterate over the array but will not
accumulate. It will add y[0:n-1] to u[1:n] and assign the result to y[1:n].

For example,
y = [1,2,3,4]
u = [5,6,7,8]
Then y[0:n-1] = [1,2,3] and u[1:n] = [6,7,8]

The statement y[1:n] = y[0:n-1] + u[1:n] implies
y[1:n] = [6+1,7+2,8+3] = [7,9,11]
yielding y = [1,7,9,11]

whereas the code:

for i in range(0,n-1):
    y[i+1] = y[i] + u[i+1]

will accumulate and give y = [1,7,14,22]

Sameer

> Am I missing something?
>
> Regards,
> Francesco
>
>
> Sameer Grover wrote:
>> On Saturday 07 April 2012 12:14 AM, francesco82 wrote:
>>> Hello everyone,
>>>
>>> After reading the very good post
>>> http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html
>>> and the book by H. P. Langtangen 'Python scripting for computational
>>> science' I was trying to speed up the execution of a loop on numpy arrays
>>> being used to describe a simple difference equation.
>>>
>>> The actual code I am working on involves some more complicated equations,
>>> but I am having the same exact behavior as described below. To test the
>>> improvement in speed I wrote the following in vect_try.py:
>>>
>>> #!/usr/bin/python
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>> dt = 0.02 #time step
>>> time = np.arange(0,2,dt) #time array
>>> u = np.sin(2*np.pi*time) #input signal array
>>>
>>> def vect_int(u,y): #vectorized function
>>>     n = u.size
>>>     y[1:n] = y[0:n-1] + u[1:n]
>>>     return y
>>>
>>> def sc_int(u,y): #scalar function
>>>     y = y + u
>>>     return y
>>>
>>> def calc_vect(u, func=vect_int):
>>>     out = np.zeros(u.size)
>>>     for i in xrange(u.size):
>>>         out = func(u,out)
>>>     return out
>>>
>>> def calc_sc(u, func=sc_int):
>>>     out = np.zeros(u.size)
>>>     for i in xrange(u.size-1):
>>>         out[i+1] = sc_int(u[i+1],out[i])
>>>     return out
>>>
>>> To verify the execution time I've used the timeit function in Ipython:
>>>
>>> import vect_try as vt
>>> timeit vt.calc_vect(vt.u) --> 1000 loops, best of 3: 494 us per loop
>>> timeit vt.calc_sc(vt.u) -->10000 loops, best of 3: 92.8 us per loop
>>>
>>> As you can see the scalar implementation looping one item at the time
>>> (calc_sc) is 494/92.8~5.3 times faster than the vectorized one
>>> (calc_vect).
>>>
>>> My problem consists in the fact that I need to iterate the execution of
>>> calc_vect in order for it to operate on all the elements of the input
>>> array.
>>> If I execute calc_vect only once, it will only operate on the first slice
>>> of the vectors leaving the remaining untouched. My understanding was that
>>> the vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate
>>> over all the array, but that's not happening for me. Can anyone tell me
>>> what I am doing wrong?
>>>
>>> Thanks!
>>> Francesco
>>>
>> 1. By vectorizing, we mean replacing a loop with a single expression. In
>> your program, both the scalar and vector implementations (calc_vect and
>> calc_sc) have a loop each. This isn't going to make anything faster. The
>> vectorized implementation is just a convoluted way of achieving the same
>> result and is slower.
>>
>> 2. The expression y[1:n] = y[0:n-1] + u[1:n] is /not/ equivalent to the
>> following loop
>>
>> for i in range(0,n-1):
>>     y[i+1] = y[i] + u[i+1]
>>
>> It is equivalent to something like
>>
>> z = np.zeros(n-1)
>> for i in range(0,n-1):
>>     z[i] = y[i] + u[i+1]
>> y[1:n] = z
>>
>> i.e., the RHS is computed in totality and then assigned to the LHS. This
>> is how array operations work even in other languages such as Matlab.
>>
>> 3. I personally don't think there is a simple/obvious way to vectorize
>> what you're trying to achieve.
>>
>> Sameer
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>

From michael.forbes at gmail.com  Fri Apr 6 18:18:45 2012
From: michael.forbes at gmail.com (Michael McNeil Forbes)
Date: Fri, 6 Apr 2012 15:18:45 -0700
Subject: [Numpy-discussion] Keyword argument support for vectorize.
Message-ID:

Hi,

I added a simple enhancement patch to provide vectorize with simple
keyword argument support. (I added a new kwvectorize decorator, but
suspect this could/should easily be rolled into the existing vectorize.)

http://projects.scipy.org/numpy/ticket/2100

This just reorders any kwargs into the correct position (filling in
defaults as needed) and then calls the "vectorize"d function.
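Since the patch itself is only linked, a rough sketch of the reordering idea -- a hypothetical reconstruction for illustration, not the code attached to ticket 2100:

import inspect
import numpy as np

def kwvectorize(func):
    # Map argument names to their defaults from the wrapped signature.
    spec = inspect.getargspec(func)
    ndef = len(spec.defaults or ())
    defaults = dict(zip(spec.args[len(spec.args) - ndef:],
                        spec.defaults or ()))
    vfunc = np.vectorize(func)

    def wrapper(*args, **kwargs):
        # Reorder kwargs into positional order, filling in defaults
        # for anything not supplied.
        full = list(args)
        for name in spec.args[len(args):]:
            full.append(kwargs[name] if name in kwargs else defaults[name])
        return vfunc(*full)
    return wrapper

@kwvectorize
def f(x, y=2.0):
    return x + y

print(f([1.0, 2.0], y=10.0))    # -> [ 11.  12.]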
If people think this is reasonable, I can improve the patch with more comprehensive testing and error messages. Michael. From christoph.graves at gmail.com Fri Apr 6 18:50:17 2012 From: christoph.graves at gmail.com (cgraves) Date: Fri, 6 Apr 2012 15:50:17 -0700 (PDT) Subject: [Numpy-discussion] speed of append_fields() in numpy.lib.recfunctions vs matplotlib.mlab Message-ID: <33646038.post@talk.nabble.com> It seems that the speed of append_fields() in numpy.lib.recfunctions is much slower than rec_append_fields() in matplotlib.mlab. See the following code: import numpy as np import matplotlib.mlab as mlab import numpy.lib.recfunctions as nprf import time # Set up recarray nr_pts = 1E6 dt = np.dtype([('x', float), ('y', float)]) data = np.zeros(nr_pts, dtype=dt) data = data.view(np.recarray) data.x = np.linspace(0,5,nr_pts) data.y = np.linspace(5,10,nr_pts) z = np.linspace(20,15,nr_pts) # Test mlab last_time_clock = time.clock() data_mlab = mlab.rec_append_fields(data, ['z'], [z]) time_taken = time.clock() - last_time_clock print 'mlab took %i milliseconds.' % (time_taken*1000) # Test nprf last_time_clock = time.clock() data_nprf = nprf.append_fields(data, ['z'], [z], usemask=False, asrecarray=True) time_taken = time.clock() - last_time_clock print 'nprf took %i milliseconds.' % (time_taken*1000) On this computer, the output is (+/- 10 ms): mlab took 49 milliseconds. nprf took 440 milliseconds. Does anyone know why the numpy.lib.recfunctions version is so much slower? I thought these were a port from matplotlib.mlab. Changing to usemask=True has an effect on the time of the nprf way, making it vary 330-480 ms (still much slower). I'm using numpy 1.5.1. Best, Chris -- View this message in context: http://old.nabble.com/speed-of-append_fields%28%29-in-numpy.lib.recfunctions-vs-matplotlib.mlab-tp33646038p33646038.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From francesco.barale at gmail.com Fri Apr 6 19:27:53 2012 From: francesco.barale at gmail.com (Francesco Barale) Date: Fri, 6 Apr 2012 16:27:53 -0700 (PDT) Subject: [Numpy-discussion] problem with vectorized difference equation In-Reply-To: <33641688.post@talk.nabble.com> References: <33641688.post@talk.nabble.com> Message-ID: <33646154.post@talk.nabble.com> Now I am clear. I guess the vectorized notation speeds up difference equations describing FIR structures, whereas IIR ones won't benefit. Francesco Barale wrote: > > Hello everyone, > > After reading the very good post > http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html > and the book by H. P. Langtangen 'Python scripting for computational > science' I was trying to speed up the execution of a loop on numpy arrays > being used to describe a simple difference equation. > > The actual code I am working on involves some more complicated equations, > but I am having the same exact behavior as described below. 
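On the FIR/IIR distinction just drawn: the recursive (IIR) case can still be pushed into compiled code; a sketch using scipy.signal.lfilter, assuming SciPy is available (lfilter is not part of numpy itself):

import numpy as np
from scipy.signal import lfilter

dt = 0.02
u = np.sin(2 * np.pi * np.arange(0, 2, dt))

# The accumulator y[i] = y[i-1] + u[i] is the IIR filter with transfer
# function 1/(1 - z**-1), i.e. b = [1.0] and a = [1.0, -1.0]:
y = lfilter([1.0], [1.0, -1.0], u)

# For this particular recursion the result is just a cumulative sum:
assert np.allclose(y, np.cumsum(u))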
To test the > improvement in speed I wrote the following in vect_try.py: > > #!/usr/bin/python > import numpy as np > import matplotlib.pyplot as plt > > dt = 0.02 #time step > time = np.arange(0,2,dt) #time array > u = np.sin(2*np.pi*time) #input signal array > > def vect_int(u,y): #vectorized function > n = u.size > y[1:n] = y[0:n-1] + u[1:n] > return y > > def sc_int(u,y): #scalar function > y = y + u > return y > > def calc_vect(u, func=vect_int): > out = np.zeros(u.size) > for i in xrange(u.size): > out = func(u,out) > return out > > def calc_sc(u, func=sc_int): > out = np.zeros(u.size) > for i in xrange(u.size-1): > out[i+1] = func(u[i+1],out[i]) > return out > > To verify the execution time I've used the timeit function in Ipython: > > import vect_try as vt > timeit vt.calc_vect(vt.u) --> 1000 loops, best of 3: 494 us per loop > timeit vt.calc_sc(vt.u) -->10000 loops, best of 3: 92.8 us per loop > > As you can see the scalar implementation looping one item at the time > (calc_sc) is 494/92.8~5.3 times faster than the vectorized one > (calc_vect). > > My problem consists in the fact that I need to iterate the execution of > vect_int in order for it to operate on all the elements of the input > array. If I execute vect_int only once, it will only operate on the first > slice of the vectors leaving the remaining untouched. My understanding was > that the vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to > iterate over all the array, but that's not happening for me. Can anyone > tell me what I am doing wrong? > > Thanks! > Francesco > > -- View this message in context: http://old.nabble.com/problem-with-vectorized-difference-equation-tp33641688p33646154.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From matthew.brett at gmail.com Fri Apr 6 19:39:32 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 6 Apr 2012 16:39:32 -0700 Subject: [Numpy-discussion] speed of append_fields() in numpy.lib.recfunctions vs matplotlib.mlab In-Reply-To: <33646038.post@talk.nabble.com> References: <33646038.post@talk.nabble.com> Message-ID: Hi, On Fri, Apr 6, 2012 at 3:50 PM, cgraves wrote: > > It seems that the speed of append_fields() in numpy.lib.recfunctions is much > slower than rec_append_fields() in matplotlib.mlab. See the following code: As I remember it (Pierre M can probably correct me) the recfunctions are not ports of the mlab functions, but are considerably extended in order to deal with masking, and do not have exactly the same API. When I noticed this I wondered if there would be some sensible way of making the mlab routines available in a separate namespace, but I did not pursue it. Cheers, matthew From charlesr.harris at gmail.com Fri Apr 6 22:04:02 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Apr 2012 20:04:02 -0600 Subject: [Numpy-discussion] why does eigvalsh return a complex array? In-Reply-To: <877gxte1jo.fsf@falma.de> References: <877gxte1jo.fsf@falma.de> Message-ID: On Fri, Apr 6, 2012 at 1:56 AM, Christoph Groth wrote: > I noticed that numpy.linalg.eigvalsh returns a complex array, even > though mathematically the resulting eigenvalues are guaranteed to be > real. > > Looking at the source code, the underlying zheevd routine of LAPACK > indeed returns an array of real numbers which is than converted to > complex in the numpy wrapper. > > Does numpy policy require the type of the result to be the same as the > type of input? Copying an array twice to arrive at the original result > seems pointless to me. 
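Until the wrapper is changed, a caller-side workaround for the issue Christoph reports is to drop the imaginary part, which is identically zero for a Hermitian input; a small sketch:

import numpy as np

a = np.array([[2.0, 1.0j], [-1.0j, 3.0]])   # Hermitian test matrix
w = np.linalg.eigvalsh(a)

# Eigenvalues of a Hermitian matrix are real, so if the wrapper hands
# back a complex array the imaginary part carries no information:
w = w.real
print(w.dtype)                              # float64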
> > I think this should be fixed, the problem is the wrapper from a complex array. Not sure what the easiest fix is. I expect eigh is similar in behavior. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Apr 7 08:43:21 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 7 Apr 2012 14:43:21 +0200 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> Message-ID: On Fri, Apr 6, 2012 at 3:50 PM, Charles R Harris wrote: > > > On Fri, Apr 6, 2012 at 3:57 AM, Nathaniel Smith wrote: > >> On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant >> wrote: >> > That is an interesting point of view. I could see that point of >> view. >> > But, was this discussed as a bug prior to this change occurring? >> > >> > I just heard from a very heavy user of NumPy that they are nervous about >> > upgrading because of little changes like this one. I don't know if >> this >> > particular issue would affect them or not, but I will re-iterate my view >> > that we should be very careful of these kinds of changes. >> >> I agree -- these changes make me very nervous as well, especially >> since I haven't seen any short, simple description of what changed or >> what the rules actually are now (comparable to the old "scalars do not >> affect the type of arrays"). >> >> But, I also want to speak up in favor in one respect, since real world >> data points are always good. I had some code that did >> def do_something(a): >> a = np.asarray(a) >> a -= np.mean(a) >> ... >> If someone happens to pass in an integer array, then this is totally >> broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently >> discards the fractional part and performs the subtraction anyway, >> e.g.: >> >> In [4]: a >> Out[4]: array([0, 1, 2, 3]) >> >> In [5]: a -= 1.5 >> >> In [6]: a >> Out[6]: array([-1, 0, 0, 1]) >> >> The bug was discovered when Skipper tried running my code against >> numpy master, and it errored out on the -=. So Mark's changes did >> catch one real bug that would have silently caused completely wrong >> numerical results! >> > As a second datapoint, it did catch real bugs in scikit-learn too. On the other hand, it required a workaround in ndimage. http://thread.gmane.org/gmane.comp.python.numeric.general/44206/focus=44208 > >> >> https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512 >> >> Yes, these things are trade offs between correctness and convenience. I > don't mind new warnings/errors so much, they may break old code but they > don't lead to wrong results. It's the unexpected and unnoticed successes > that are scary. > We discussed reverting the unsafe casting behavior for 1.7 in the thread I linked to above. Do we still want to do this? As far as I can tell it didn't really cause problems so far. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sat Apr 7 09:19:41 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 7 Apr 2012 15:19:41 +0200 Subject: [Numpy-discussion] empty chararrays (ticket 1948) In-Reply-To: References: Message-ID: On Sun, Mar 25, 2012 at 7:09 PM, Ralf Gommers wrote: > > > On Sun, Mar 25, 2012 at 7:03 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, Mar 25, 2012 at 10:12 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi, >>> >>> In ticket 1948 a backwards compatibility issue with chararray is >>> reported. Indexing a chararray with [] or a bool array of False used to >>> return [] in numpy 1.2.1 (consistent with ndarray behavior), but now >>> returns an empty string. Unfortunately this changed behavior has been >>> present for the 1.5.x and 1.6.x releases. >>> >>> So the question is if this should be changed back or not? The change was >>> likely unintentional; there's no test for it. >>> >>> >> I believe the stsci folks were maintaining chararray, although I don't >> see anyone from there with commit permissions. Hmm... I'd be inclined to >> reinstate the old behavior, but the stsci folks may have deliberately made >> the change, I'd like to hear from them first. >> > > The change was made by Michael Droettboom (CC'd), who did have commit > permissions for this. They got lost with the Github move it seems. > I've sent a PR for this, https://github.com/numpy/numpy/pull/247, which changes chararray to return an empty chararray for indexing with an empty array/list. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 7 09:58:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Apr 2012 07:58:11 -0600 Subject: [Numpy-discussion] Improving NumPy's indexing / subsetting / fancy indexing implementation In-Reply-To: References: Message-ID: On Fri, Apr 6, 2012 at 4:00 AM, Nathaniel Smith wrote: > Hi Wes, > > I believe that Mark rewrote a bunch of the fancy-indexing-related code > from scratch in the masked-NA branch. I don't know if it affects > anything you're talking about here, but just as a heads up, you might > want to benchmark master, since it may have a different performance > profile. 
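Returning to the chararray ticket for a moment, the behavior the PR restores is easy to state as a quick check; a sketch of the intended, ndarray-consistent semantics (expected output shown in comments, not captured from a particular build):

import numpy as np

c = np.char.array(['a', 'bb', 'ccc'])

# Indexing with an empty list or an all-False mask should yield an
# empty chararray -- matching plain ndarray -- not an empty string:
print(c[[]])                          # chararray([], dtype='|S3')
print(c[np.zeros(3, dtype=bool)])     # chararray([], dtype='|S3')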
> > -- Nathaniel > > On Fri, Apr 6, 2012 at 4:04 AM, Wes McKinney wrote: > > dear all, > > > > I've routinely found that: > > > > 1) ndarray.take is up to 1 order of magnitude faster than fancy indexing > > 2) Hand-coded Cython boolean indexing is many times faster than ndarray > indexing > > 3) putmask is significantly faster than ndarray indexing > > > > For example, I stumbled on this tonight: > > > > straightforward cython code: > > > > def row_bool_subset(ndarray[float64_t, ndim=2] values, > > ndarray[uint8_t, cast=True] mask): > > cdef: > > Py_ssize_t i, j, n, k, pos = 0 > > ndarray[float64_t, ndim=2] out > > > > n, k = ( values).shape > > assert(n == len(mask)) > > > > out = np.empty((mask.sum(), k), dtype=np.float64) > > > > for i in range(n): > > if mask[i]: > > for j in range(k): > > out[pos, j] = values[i, j] > > pos += 1 > > > > return out > > > > In [1]: values = randn(1000000, 4) > > > > In [2]: mask = np.ones(1000000, dtype=bool) > > > > In [3]: import pandas._sandbox as sbx > > > > In [4]: result = sbx.row_bool_subset(values, mask) > > > > In [5]: timeit result = sbx.row_bool_subset(values, mask) > > 100 loops, best of 3: 9.58 ms per loop > > > > pure NumPy: > > > > In [6]: timeit values[mask] > > 10 loops, best of 3: 81.6 ms per loop > > > > Here's the kind of take performance problems that I routinely experience: > > > > In [12]: values = randn(1000000, 4) > > v > > In [13]: values.shape > > Out[13]: (1000000, 4) > > > > In [14]: indexer = np.random.permutation(1000000)[:500000] > > > > In [15]: timeit values[indexer] > > 10 loops, best of 3: 70.7 ms per loop > > > > In [16]: timeit values.take(indexer, axis=0) > > 100 loops, best of 3: 13.3 ms per loop > > > > When I can spare the time in the future I will personally work on > > these indexing routines in the C codebase, but I suspect that I'm not > > the only one adversely affected by these kinds of performance > > problems, and it might be worth thinking about a community effort to > > split up the work of retooling these methods to be more performant. We > > could use a tool like my vbench project (github.com/wesm/vbench) to > > make a list of the performance benchmarks of interest (like the ones > > above). > > > > Unfortunately I am too time constrained at least for the next 6 months > > to devote to a complete rewrite of the code in question. Possibly > > sometime in 2013 if no one has gotten to it yet, but this seems like > > someplace that we should be concerned about as the language > > performance wars continue to rage. > > > New workers in the C vineyards are always welcome. I believe Mark is unhappy with fancy indexing, both in design and implementation. Unfortunately it is pretty entrenched in existing code, so making fundamental changes could be difficult. But it looks like a good topic for discussion in depth. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Sat Apr 7 14:07:59 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 7 Apr 2012 13:07:59 -0500 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> Message-ID: <257A925A-F6A8-4636-9A29-13ECDFCA23DD@continuum.io> If we just announce that there has been some code changes that alter corner-case casting rules, I think we can move forward. We could use a script to document the changes and create a test case which would help people figure out their code. Please speak up if you have another point of view? Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 7, 2012, at 7:43 AM, Ralf Gommers wrote: > > > On Fri, Apr 6, 2012 at 3:50 PM, Charles R Harris wrote: > > > On Fri, Apr 6, 2012 at 3:57 AM, Nathaniel Smith wrote: > On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant wrote: > > That is an interesting point of view. I could see that point of view. > > But, was this discussed as a bug prior to this change occurring? > > > > I just heard from a very heavy user of NumPy that they are nervous about > > upgrading because of little changes like this one. I don't know if this > > particular issue would affect them or not, but I will re-iterate my view > > that we should be very careful of these kinds of changes. > > I agree -- these changes make me very nervous as well, especially > since I haven't seen any short, simple description of what changed or > what the rules actually are now (comparable to the old "scalars do not > affect the type of arrays"). > > But, I also want to speak up in favor in one respect, since real world > data points are always good. I had some code that did > def do_something(a): > a = np.asarray(a) > a -= np.mean(a) > ... > If someone happens to pass in an integer array, then this is totally > broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently > discards the fractional part and performs the subtraction anyway, > e.g.: > > In [4]: a > Out[4]: array([0, 1, 2, 3]) > > In [5]: a -= 1.5 > > In [6]: a > Out[6]: array([-1, 0, 0, 1]) > > The bug was discovered when Skipper tried running my code against > numpy master, and it errored out on the -=. So Mark's changes did > catch one real bug that would have silently caused completely wrong > numerical results! > > As a second datapoint, it did catch real bugs in scikit-learn too. On the other hand, it required a workaround in ndimage. http://thread.gmane.org/gmane.comp.python.numeric.general/44206/focus=44208 > > > https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512 > > Yes, these things are trade offs between correctness and convenience. I don't mind new warnings/errors so much, they may break old code but they don't lead to wrong results. It's the unexpected and unnoticed successes that are scary. > > We discussed reverting the unsafe casting behavior for 1.7 in the thread I linked to above. Do we still want to do this? As far as I can tell it didn't really cause problems so far. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From wesmckinn at gmail.com  Sat Apr 7 15:16:45 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sat, 7 Apr 2012 15:16:45 -0400
Subject: [Numpy-discussion] Improving NumPy's indexing / subsetting / fancy indexing implementation
In-Reply-To:
References:
Message-ID:

On Sat, Apr 7, 2012 at 9:58 AM, Charles R Harris wrote:
>
>
> On Fri, Apr 6, 2012 at 4:00 AM, Nathaniel Smith wrote:
>>
>> Hi Wes,
>>
>> I believe that Mark rewrote a bunch of the fancy-indexing-related code
>> from scratch in the masked-NA branch. I don't know if it affects
>> anything you're talking about here, but just as a heads up, you might
>> want to benchmark master, since it may have a different performance
>> profile.
>>
>> -- Nathaniel
>>
>> On Fri, Apr 6, 2012 at 4:04 AM, Wes McKinney wrote:
>> > dear all,
>> >
>> > I've routinely found that:
>> >
>> > 1) ndarray.take is up to 1 order of magnitude faster than fancy indexing
>> > 2) Hand-coded Cython boolean indexing is many times faster than ndarray
>> > indexing
>> > 3) putmask is significantly faster than ndarray indexing
>> >
>> > For example, I stumbled on this tonight:
>> >
>> > straightforward cython code:
>> >
>> > def row_bool_subset(ndarray[float64_t, ndim=2] values,
>> >                     ndarray[uint8_t, cast=True] mask):
>> >     cdef:
>> >         Py_ssize_t i, j, n, k, pos = 0
>> >         ndarray[float64_t, ndim=2] out
>> >
>> >     n, k = (<object> values).shape
>> >     assert(n == len(mask))
>> >
>> >     out = np.empty((mask.sum(), k), dtype=np.float64)
>> >
>> >     for i in range(n):
>> >         if mask[i]:
>> >             for j in range(k):
>> >                 out[pos, j] = values[i, j]
>> >             pos += 1
>> >
>> >     return out
>> >
>> > In [1]: values = randn(1000000, 4)
>> >
>> > In [2]: mask = np.ones(1000000, dtype=bool)
>> >
>> > In [3]: import pandas._sandbox as sbx
>> >
>> > In [4]: result = sbx.row_bool_subset(values, mask)
>> >
>> > In [5]: timeit result = sbx.row_bool_subset(values, mask)
>> > 100 loops, best of 3: 9.58 ms per loop
>> >
>> > pure NumPy:
>> >
>> > In [6]: timeit values[mask]
>> > 10 loops, best of 3: 81.6 ms per loop
>> >
>> > Here's the kind of take performance problems that I routinely
>> > experience:
>> >
>> > In [12]: values = randn(1000000, 4)
>> >
>> > In [13]: values.shape
>> > Out[13]: (1000000, 4)
>> >
>> > In [14]: indexer = np.random.permutation(1000000)[:500000]
>> >
>> > In [15]: timeit values[indexer]
>> > 10 loops, best of 3: 70.7 ms per loop
>> >
>> > In [16]: timeit values.take(indexer, axis=0)
>> > 100 loops, best of 3: 13.3 ms per loop
>> >
>> > When I can spare the time in the future I will personally work on
>> > these indexing routines in the C codebase, but I suspect that I'm not
>> > the only one adversely affected by these kinds of performance
>> > problems, and it might be worth thinking about a community effort to
>> > split up the work of retooling these methods to be more performant. We
>> > could use a tool like my vbench project (github.com/wesm/vbench) to
>> > make a list of the performance benchmarks of interest (like the ones
>> > above).
>> >
>> > Unfortunately I am too time constrained at least for the next 6 months
>> > to devote to a complete rewrite of the code in question. Possibly
>> > sometime in 2013 if no one has gotten to it yet, but this seems like
>> > someplace that we should be concerned about as the language
>> > performance wars continue to rage.
>> >
>
>
> New workers in the C vineyards are always welcome.
I believe Mark is unhappy > with fancy indexing, both in design and implementation. Unfortunately it is > pretty entrenched in existing code, so making fundamental changes could be > difficult. But it looks like a good topic for discussion in depth. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I plan to get involved in the C codebase as soon as I can spare the time from pandas development. Perhaps it would be worth setting up some vbenchmarks like pandas has to give some transparency ([1], [2]) into the performance and how it's been improved over time. I may be able to set up my build machine to publish benchmarks like these for NumPy sometime in the near future. - Wes [1] http://wesmckinney.com/blog/?p=373 [2] http://pandas.pydata.org/pandas-docs/vbench/vb_indexing.html From e.antero.tammi at gmail.com Sat Apr 7 16:50:48 2012 From: e.antero.tammi at gmail.com (eat) Date: Sat, 7 Apr 2012 23:50:48 +0300 Subject: [Numpy-discussion] problem with vectorized difference equation In-Reply-To: <4F7F6DE2.4060006@gmail.com> References: <33641688.post@talk.nabble.com> <4F7F5AD6.3070609@gmail.com> <33645699.post@talk.nabble.com> <4F7F6DE2.4060006@gmail.com> Message-ID: Hi, On Sat, Apr 7, 2012 at 1:27 AM, Sameer Grover wrote: > On Saturday 07 April 2012 02:51 AM, Francesco Barale wrote: > > Hello Sameer, > > > > Thank you very much for your reply. My goal was to try to speed up the > loop > > describing the accumulator. In the (excellent) book I was mentioning in > my > > initial post I could find one example that seemed to match what I was > trying > > to do. Basically, it is said that a loop of the following kind: > > > > n = size(u)-1 > > for i in xrange(1,n,1): > > u_new[i] = u[i-1] + u[i] + u[i+1] > > > > should be equivalent to: > > > > u[1:n] = u[0:n-1] + u[1:n] + u[i+1] > This example is correct. > > What I was trying to point out was that the single expression y[1:n] = > y[0:n-1] + u[1:n] will iterate over the array but will not accumulate. > It will add y[0:n-1] to u[1:n] and assign the result to y[1:n]. > > For example, > y = [1,2,3,4] > u = [5,6,7,8] > Then y[0:n-1] = [1,2,3] and u[1:n]=[6,7,8] > > The statement y[1:n] = y[0:n-1] + u[1:n] implies > y[1:n] = [6+1,7+2,8+3] = [7,9,11] > yielding y = [1,7,9,11] > FWIIFO, if assignment in loop like this ever makes any sense (which I doubt), > > whereas the code: > > for i in range(0,n-1): > y[i+1] = y[i] + u[i+1] > then it can be captured in a function, like In []: def s0(y, u): ..: yn= y.copy() ..: for k in xrange(y.size- 1): ..: yn[k+ 1]= yn[k]+ u[k+ 1] ..: return yn ..: and now this function can be easily transformed to utilize cumsum, like In []: def s1(y, u): ..: un= u.copy() ..: un[0]= y[0] ..: return cumsum(un) ..: thus In []: y, u= rand(1e5), rand(1e5) In []: allclose(s0(y, u), s1(y, u)) Out[]: True and definitely this transformation will outperform a plain python loop In []: timeit s0(y, u) 10 loops, best of 3: 122 ms per loop In []: timeit s1(y, u) 100 loops, best of 3: 2.16 ms per loop In []: 122/ 2.16 Out[]: 56.48148148148148 My 2 cents, -eat > > will accumulate and give y = [1,7,14,22] > > Sameer > > Am I missing something? 
> > > > Regards, > > Francesco > > > > > > Sameer Grover wrote: > >> On Saturday 07 April 2012 12:14 AM, francesco82 wrote: > >>> Hello everyone, > >>> > >>> After reading the very good post > >>> > http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html > >>> and the book by H. P. Langtangen 'Python scripting for computational > >>> science' I was trying to speed up the execution of a loop on numpy > arrays > >>> being used to describe a simple difference equation. > >>> > >>> The actual code I am working on involves some more complicated > equations, > >>> but I am having the same exact behavior as described below. To test the > >>> improvement in speed I wrote the following in vect_try.py: > >>> > >>> #!/usr/bin/python > >>> import numpy as np > >>> import matplotlib.pyplot as plt > >>> > >>> dt = 0.02 #time step > >>> time = np.arange(0,2,dt) #time array > >>> u = np.sin(2*np.pi*time) #input signal array > >>> > >>> def vect_int(u,y): #vectorized function > >>> n = u.size > >>> y[1:n] = y[0:n-1] + u[1:n] > >>> return y > >>> > >>> def sc_int(u,y): #scalar function > >>> y = y + u > >>> return y > >>> > >>> def calc_vect(u, func=vect_int): > >>> out = np.zeros(u.size) > >>> for i in xrange(u.size): > >>> out = func(u,out) > >>> return out > >>> > >>> def calc_sc(u, func=sc_int): > >>> out = np.zeros(u.size) > >>> for i in xrange(u.size-1): > >>> out[i+1] = sc_int(u[i+1],out[i]) > >>> return out > >>> > >>> To verify the execution time I've used the timeit function in Ipython: > >>> > >>> import vect_try as vt > >>> timeit vt.calc_vect(vt.u) --> 1000 loops, best of 3: 494 us per loop > >>> timeit vt.calc_sc(vt.u) -->10000 loops, best of 3: 92.8 us per loop > >>> > >>> As you can see the scalar implementation looping one item at the time > >>> (calc_sc) is 494/92.8~5.3 times faster than the vectorized one > >>> (calc_vect). > >>> > >>> My problem consists in the fact that I need to iterate the execution of > >>> calc_vect in order for it to operate on all the elements of the input > >>> array. > >>> If I execute calc_vect only once, it will only operate on the first > slice > >>> of > >>> the vectors leaving the remaining untouched. My understanding was that > >>> the > >>> vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate > over > >>> all the array, but that's not happening for me. Can anyone tell me > what I > >>> am > >>> doing wrong? > >>> > >>> Thanks! > >>> Francesco > >>> > >> 1. By vectorizing, we mean replacing a loop with a single expression. In > >> your program, both the scalar and vector implementations (calc_vect and > >> calc_sc) have a loop each. This isn't going to make anything faster. The > >> vectorized implementation is just a convoluted way of achieving the same > >> result and is slower. > >> > >> 2. The expression y[1:n] = y[0:n-1] + u[1:n] is /not/ equivalent to the > >> following loop > >> > >> for i in range(0,n-1): > >> y[i+1] = y[i] + u[i+1] > >> > >> It is equivalent to something like > >> > >> z = np.zeros(n-1) > >> for i in range(0,n-1): > >> z[i] = y[i] + u[i+1] > >> y[1:n] = z > >> > >> i.e., the RHS is computed in totality and then assigned to the LHS. This > >> is how array operations work even in other languages such as Matlab. > >> > >> 3. I personally don't think there is a simple/obvious way to vectorize > >> what you're trying to achieve. 
> >> > >> Sameer > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Sat Apr 7 19:32:15 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Sun, 8 Apr 2012 01:32:15 +0200 Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian) In-Reply-To: References: Message-ID: On Sun, Apr 1, 2012 at 12:25, Ralf Gommers wrote: > OK, that makes sense. So there are six test runs; for normal and debug > builds of 2.6.7, 2.7.3rc1 and 3.2.3rc2. Only the debug builds of the RCs > have a problem, debug build of 2.6.7 is fine. exactly. > So I'd think that most likely there is a problem with how the debug versions > of the RCs were built. it sounds possible: is there a way to isolate the failing test, so that I can provide a minimal test code for further investigation to python maintainer? Regards, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From charlesr.harris at gmail.com Sat Apr 7 20:41:45 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Apr 2012 18:41:45 -0600 Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian) In-Reply-To: References: Message-ID: On Sat, Apr 7, 2012 at 5:32 PM, Sandro Tosi wrote: > On Sun, Apr 1, 2012 at 12:25, Ralf Gommers > wrote: > > OK, that makes sense. So there are six test runs; for normal and debug > > builds of 2.6.7, 2.7.3rc1 and 3.2.3rc2. Only the debug builds of the RCs > > have a problem, debug build of 2.6.7 is fine. > > exactly. > > > So I'd think that most likely there is a problem with how the debug > versions > > of the RCs were built. > > it sounds possible: is there a way to isolate the failing test, so > that I can provide a minimal test code for further investigation to > python maintainer? > > Possibly related to ticket #1578. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 7 21:04:27 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Apr 2012 19:04:27 -0600 Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian) In-Reply-To: References: Message-ID: On Sat, Apr 7, 2012 at 6:41 PM, Charles R Harris wrote: > > > On Sat, Apr 7, 2012 at 5:32 PM, Sandro Tosi wrote: > >> On Sun, Apr 1, 2012 at 12:25, Ralf Gommers >> wrote: >> > OK, that makes sense. So there are six test runs; for normal and debug >> > builds of 2.6.7, 2.7.3rc1 and 3.2.3rc2. Only the debug builds of the RCs >> > have a problem, debug build of 2.6.7 is fine. >> >> exactly. >> >> > So I'd think that most likely there is a problem with how the debug >> versions >> > of the RCs were built. >> >> it sounds possible: is there a way to isolate the failing test, so >> that I can provide a minimal test code for further investigation to >> python maintainer? >> >> > Possibly related to ticket #1578. > > I can reproduce at least one crash with python2.7 debug at arrayobject.c Program received signal SIGSEGV, Segmentation fault. 
0x00000036a5882b94 in free () from /lib64/libc.so.6 (gdb) bt #0 0x00000036a5882b94 in free () from /lib64/libc.so.6 #1 0x00007ffff12ce399 in array_dealloc (self=0x1bc50d8) at numpy/core/src/multiarray/arrayobject.c:408 #2 0x00007ffff12b58dd in PyArray_Return (mp=0x1bc50d8) at numpy/core/src/multiarray/scalarapi.c:830 #3 PyArray_Return (mp=0x1bc50d8) at numpy/core/src/multiarray/scalarapi.c:803 #4 0x00007ffff12b5d68 in array_any (array=0x1b69de8, args=, kwds=) at numpy/core/src/multiarray/methods.c I don't know if it is the same. Chuck > > Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 7 22:01:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Apr 2012 20:01:22 -0600 Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian) In-Reply-To: References: Message-ID: On Sat, Apr 7, 2012 at 7:04 PM, Charles R Harris wrote: > > > On Sat, Apr 7, 2012 at 6:41 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Apr 7, 2012 at 5:32 PM, Sandro Tosi wrote: >> >>> On Sun, Apr 1, 2012 at 12:25, Ralf Gommers >>> wrote: >>> > OK, that makes sense. So there are six test runs; for normal and debug >>> > builds of 2.6.7, 2.7.3rc1 and 3.2.3rc2. Only the debug builds of the >>> RCs >>> > have a problem, debug build of 2.6.7 is fine. >>> >>> exactly. >>> >>> > So I'd think that most likely there is a problem with how the debug >>> versions >>> > of the RCs were built. >>> >>> it sounds possible: is there a way to isolate the failing test, so >>> that I can provide a minimal test code for further investigation to >>> python maintainer? >>> >>> >> Possibly related to ticket #1578. >> >> > > I can reproduce at least one crash with python2.7 debug at arrayobject.c > > Program received signal SIGSEGV, Segmentation fault. > 0x00000036a5882b94 in free () from /lib64/libc.so.6 > (gdb) bt > #0 0x00000036a5882b94 in free () from /lib64/libc.so.6 > #1 0x00007ffff12ce399 in array_dealloc (self=0x1bc50d8) > at numpy/core/src/multiarray/arrayobject.c:408 > #2 0x00007ffff12b58dd in PyArray_Return (mp=0x1bc50d8) > at numpy/core/src/multiarray/scalarapi.c:830 > #3 PyArray_Return (mp=0x1bc50d8) at > numpy/core/src/multiarray/scalarapi.c:803 > #4 0x00007ffff12b5d68 in array_any (array=0x1b69de8, args= out>, > kwds=) at numpy/core/src/multiarray/methods.c > > I don't know if it is the same. > > This one occurs in test_api.test_copyto_fromscalar. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Sun Apr 8 04:28:18 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Sun, 8 Apr 2012 10:28:18 +0200 Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian) In-Reply-To: References: Message-ID: On Sun, Apr 8, 2012 at 02:41, Charles R Harris wrote: > Possibly related to ticket #1578. 
yes, that's exactly it:

>>>
$ python2.7-dbg -c "import sys ; sys.path.insert(0, '/home/morph/deb/build-area/python-numpy-1.6.1/debian/tmp/usr/lib/python$v/dist-packages/') ; import numpy; numpy.test(verbose=10)"
Running unit tests for numpy
/usr/lib/pymodules/python2.7/nose/plugins/manager.py:405: UserWarning: Module paste was already imported from None, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources
/usr/lib/pymodules/python2.7/nose/plugins/manager.py:405: UserWarning: Module dap was already imported from None, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources
NumPy version 1.5.1
NumPy is installed in /usr/lib/pymodules/python2.7/numpy
Python version 2.7.3rc2 (default, Apr 5 2012, 13:54:40) [GCC 4.6.3]
nose version 1.0.0
nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext']
numpy.core.tests.test_arrayprint.TestArrayRepr.test_nan_inf ... ok
Ticket 844. ... ok
numpy.core.tests.test_blasdot.test_blasdot_used ... ok
test_from_object_array (numpy.core.tests.test_defchararray.TestBasic) ... ok
test_from_object_array_unicode (numpy.core.tests.test_defchararray.TestBasic) ... ok
test_from_string (numpy.core.tests.test_defchararray.TestBasic) ... ok
test_from_string_array (numpy.core.tests.test_defchararray.TestBasic) ... ok
test_from_unicode (numpy.core.tests.test_defchararray.TestBasic) ...
Debug memory block at address p=0x2ec3cc0: API 'm'
    8 bytes originally requested
    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0x2ec3cc8 are FORBIDDENBYTE, as expected.
    The block was made by call #7954800 to debug malloc/realloc.
    Data at p: a3 03 00 00 00 00 00 00
Fatal Python error: bad ID: Allocated using API 'm', verified using API 'o'
Aborted
<<<

I've replied to the Trac issue attaching 2 gdb
output files for 2.7.3rc2 and 3.2.3rc2 in debug flavor. If you want me
to test anything, I'd be happy to.

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From holgerherrlich05 at arcor.de  Sun Apr 8 14:25:33 2012
From: holgerherrlich05 at arcor.de (Holger Herrlich)
Date: Sun, 08 Apr 2012 20:25:33 +0200
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F7AF5C1.5090604@arcor.de>
References: <4F7AF5C1.5090604@arcor.de>
Message-ID: <4F81D81D.3030609@arcor.de>

That all sounds like no option -- sad. Cython is no solution because all
I want is to leave Python syntax in favor of strong OOP design patterns.

Anyway, thanks
Holger

On 04/03/2012 03:06 PM, Holger Herrlich wrote:
>
> Hi, I plan to migrate core classes of an application from Python to C++
> using SWIG, while still the user interface being Python. I also plan to
> further use NumPy's ndarrays.
>
> The application's core classes will create the ndarrays and make
> calculations. The user interface (Python) finally receives it. C++ OOP
> features will be deployed.
>
> What general ways to work with NumPy ndarrays in C++ are here? I know of
> boost.python so far.
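Whatever wrapper generator is chosen here, the Python-facing half of the handoff tends to look the same; as a rough sketch, numpy.ctypeslib can drive a compiled core directly (the library name libcore and the symbol compute below are placeholders, not an existing project):

import ctypes
import numpy as np
import numpy.ctypeslib as npct

# Hypothetical compiled core exporting: void compute(double *data, int n)
lib = npct.load_library('libcore', '.')
lib.compute.restype = None
lib.compute.argtypes = [npct.ndpointer(dtype=np.float64,
                                       flags='C_CONTIGUOUS'),
                        ctypes.c_int]

a = np.zeros(10)
lib.compute(a, a.size)   # the core fills the ndarray in place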
> > Regards Holger > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sun Apr 8 14:34:32 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 8 Apr 2012 12:34:32 -0600 Subject: [Numpy-discussion] Testsuite fails with Python 2.7.3rc1 and 3.2.3rc1 (Debian) In-Reply-To: References: Message-ID: On Sun, Apr 8, 2012 at 2:28 AM, Sandro Tosi wrote: > On Sun, Apr 8, 2012 at 02:41, Charles R Harris > wrote: > > Possibly related to ticket #1578. > > yes, that's exactly it: > > >>> > $ python2.7-dbg -c "import sys ; sys.path.insert(0, > > '/home/morph/deb/build-area/python-numpy-1.6.1/debian/tmp/usr/lib/python$v/dist-packages/') > ; import numpy; numpy.test(verbose=10)" > Running unit tests for numpy > /usr/lib/pymodules/python2.7/nose/plugins/manager.py:405: UserWarning: > Module paste was already imported from None, but > /usr/lib/python2.7/dist-packages is being added to sys.path > import pkg_resources > /usr/lib/pymodules/python2.7/nose/plugins/manager.py:405: UserWarning: > Module dap was already imported from None, but > /usr/lib/python2.7/dist-packages is being added to sys.path > import pkg_resources > NumPy version 1.5.1 > NumPy is installed in /usr/lib/pymodules/python2.7/numpy > Python version 2.7.3rc2 (default, Apr 5 2012, 13:54:40) [GCC 4.6.3] > nose version 1.0.0 > nose.config: INFO: Excluding tests matching ['f2py_ext', > 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] > numpy.core.tests.test_arrayprint.TestArrayRepr.test_nan_inf ... ok > Ticket 844. ... ok > numpy.core.tests.test_blasdot.test_blasdot_used ... ok > test_from_object_array (numpy.core.tests.test_defchararray.TestBasic) ... > ok > test_from_object_array_unicode > (numpy.core.tests.test_defchararray.TestBasic) ... ok > test_from_string (numpy.core.tests.test_defchararray.TestBasic) ... ok > test_from_string_array (numpy.core.tests.test_defchararray.TestBasic) ... > ok > test_from_unicode (numpy.core.tests.test_defchararray.TestBasic) ... > Debug memory block at address p=0x2ec3cc0: API 'm' > 8 bytes originally requested > The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected. > The 8 pad bytes at tail=0x2ec3cc8 are FORBIDDENBYTE, as expected. > The block was made by call #7954800 to debug malloc/realloc. > Data at p: a3 03 00 00 00 00 00 00 > Fatal Python error: bad ID: Allocated using API 'm', verified using API 'o' > Aborted > <<< > > I've replied to the Trac issue attaching 2 gdb > output files for 2.7.3rc2 and 3.2.3rc2 in debug flavor. If you want me > to test anything, I'd be happy to. > > I've closed #2085 as a duplicate of #1578. Trying to track this down is a hike through the swamp without waders... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Apr 8 15:09:42 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 8 Apr 2012 21:09:42 +0200 Subject: [Numpy-discussion] Keyword argument support for vectorize. In-Reply-To: References: Message-ID: On Sat, Apr 7, 2012 at 12:18 AM, Michael McNeil Forbes < michael.forbes at gmail.com> wrote: > Hi, > > I added a simple enhancement patch to provide vectorize with simple > keyword argument support. (I added a new kwvectorize decorator, but > suspect this could/should easily be rolled into the existing vectorize.) > > http://projects.scipy.org/numpy/ticket/2100 That looks like a useful enhancement. 
Integrating in the existing vectorize class should be the way to go. > > This just reorders any kwargs into the correct position (filling in > defaults as needed) and then class the "vectorize"d function. > > If people think this is reasonable, I can improve the patch with more > comprehensive testing and error messages. > > The vectorize tests look reasonable, although adding to verify that things work with class methods looks necessary. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From valentin.haenel at epfl.ch Sun Apr 8 17:39:18 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Sun, 8 Apr 2012 23:39:18 +0200 Subject: [Numpy-discussion] [OFFTOPIC] creating/working NumPy-ndarrays in C++ In-Reply-To: <4F7B720C.8010002@stsci.edu> References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> Message-ID: <20120408213918.GA798@kudu.in-berlin.de> * Michael Droettboom [2012-04-03]: > On 04/03/2012 12:48 PM, Chris Barker wrote: > >It would be nice to have a clean C++ wrapper around ndarrays, but > >that doesn't exist yet (is there a good reason for that?) > Check out: > > http://code.google.com/p/numpy-boost/ Just out of curiosity, any idea how this compares to: http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html ? It's just that with: http://code.google.com/p/numpy-boost/ which was also mentioned on this list, I now know of four alternatives for boost+NumPy (including the built-in support). Would anyone like to share perhaps a sentence or two about their experience in using either? (comparisons, of course, are highly appreciated :D ) Thanks! V- From paul.anton.letnes at gmail.com Mon Apr 9 02:36:08 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 9 Apr 2012 08:36:08 +0200 Subject: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG In-Reply-To: References: Message-ID: <5D1F584C-095E-41A7-AB61-A0ABAC08702E@gmail.com> On 2. apr. 2012, at 15:47, David Cournapeau wrote: > > > On Sun, Apr 1, 2012 at 2:28 PM, Kamesh Krishnamurthy wrote: > Hello all, > > I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower. I've posted details on Stackoverflow: > http://stackoverflow.com/q/9955021/974568 > > Can someone please let me know the reason for the performance gap? > > I would look at two things: > > - first, are you sure matlab is not using the MKL instead of accelerate framework ? I have not used matlab in ages, but you should be able to check by using otool -L to some of the core dll of matlab, to find out which libraries are linked to it > - second, it could be that matlab eig and numpy eig don't use the same underlying lapack API (do they give you the same result ?). This would already be a bit harder to check, unless it is documented explicitly in matlab. > > regards, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Without commenting on the timing issues, I see that in the folder /Applications/MATLAB_R2011b.app/bin/maci64 there are files with mkl in their names: mkl.dylib mklcompat.dylib I'm guessing that Matlab always uses MKL. 
Paul From michael.forbes at gmail.com Mon Apr 9 05:53:46 2012 From: michael.forbes at gmail.com (Michael McNeil Forbes) Date: Mon, 9 Apr 2012 02:53:46 -0700 Subject: [Numpy-discussion] Keyword argument support for vectorize. In-Reply-To: References: Message-ID: On 8 Apr 2012, at 12:09 PM, Ralf Gommers wrote: > That looks like a useful enhancement. Integrating in the existing > vectorize class should be the way to go. Okay. I will push forward. I would also like to add support for "freezing" (or "excluding") certain arguments from the vectorization. Any ideas for a good argument name? (I am just using "exclude=['p']" for now). The use case I have is vectorizing polynomial evaluation `polyval(p, x)`. The coefficient array `p` should not be vectorized over, only the variable `x`, so something like: @vectorize(exclude=set(['p'])) def mypolyval(p, x): return np.polyval(p, x) would work like np.polyval currently behaves: >>> mypolyval([1.0,2.0],[0.0,3.0]) array([ 2., 5.]) (Of course, numpy already has polyval: I am actually trying to wrap similar functions that use Theano for automatic differentiation, but the idea is the same). It seems like functools.partial is the appropriate tool to use here which means I will have to deal with the This will require overcoming the issues with how vectorize deduces the number of parameters, but if I integrate this with the vectorize class, then this should be easy to patch as well. http://mail.scipy.org/pipermail/numpy-discussion/2010-September/052642.html Michael. > On Sat, Apr 7, 2012 at 12:18 AM, Michael McNeil Forbes > wrote: > Hi, > >> I added a simple enhancement patch to provide vectorize with simple >> keyword argument support. (I added a new kwvectorize decorator, but >> suspect this could/should easily be rolled into the existing >> vectorize.) >> >> http://projects.scipy.org/numpy/ticket/2100 ... From njs at pobox.com Mon Apr 9 06:02:50 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 9 Apr 2012 11:02:50 +0100 Subject: [Numpy-discussion] Keyword argument support for vectorize. In-Reply-To: References: Message-ID: On Mon, Apr 9, 2012 at 10:53 AM, Michael McNeil Forbes wrote: > It seems like functools.partial is the appropriate tool to use here > which means I will have to deal with the functools was added in Python 2.5, and so far numpy is still trying to maintain 2.4 compatibility. (Not that this is particularly complicated code to reimplement.) - N From williamj at tenbase2.com Mon Apr 9 10:37:06 2012 From: williamj at tenbase2.com (William Johnston) Date: Mon, 9 Apr 2012 10:37:06 -0400 Subject: [Numpy-discussion] NpyAccessLib method documentation? Message-ID: <89DE819561494065B7EBFA1713FD6362@leviathan> Hello, Is there NpyAccessLib documentation available? I need to use DLLImport for a C# IronPython DLR app and am not sure which methods to include. Thank you. Regards, William Johnston -------------- next part -------------- An HTML attachment was scrubbed... URL: From jniehof at lanl.gov Mon Apr 9 12:14:44 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Mon, 09 Apr 2012 10:14:44 -0600 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: Message-ID: <4F830AF4.9080608@lanl.gov> On 04/06/2012 06:54 AM, Benjamin Root wrote: > Take a peek at how np.gradient() does it. It creates a list of None with > a length equal to the number of dimensions, and then inserts a slice > object in the appropriate spot in the list. List of slice(None), correct? 
At least that's what I see in the source, and: >>> a = numpy.array([[1,2],[3,4]]) >>> operator.getitem(a, (None, slice(1, 2))) array([[[3, 4]]]) >>> operator.getitem(a, (slice(None), slice(1, 2))) array([[2], [4]]) -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From ben.root at ou.edu Mon Apr 9 12:22:32 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 9 Apr 2012 12:22:32 -0400 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: <4F830AF4.9080608@lanl.gov> References: <4F830AF4.9080608@lanl.gov> Message-ID: On Mon, Apr 9, 2012 at 12:14 PM, Jonathan T. Niehof wrote: > On 04/06/2012 06:54 AM, Benjamin Root wrote: > > > Take a peek at how np.gradient() does it. It creates a list of None with > > a length equal to the number of dimensions, and then inserts a slice > > object in the appropriate spot in the list. > > List of slice(None), correct? At least that's what I see in the source, > and: > > >>> a = numpy.array([[1,2],[3,4]]) > >>> operator.getitem(a, (None, slice(1, 2))) > array([[[3, 4]]]) > >>> operator.getitem(a, (slice(None), slice(1, 2))) > array([[2], > [4]]) > > Correct, sorry, I was working from memory. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Apr 9 12:24:16 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 9 Apr 2012 09:24:16 -0700 Subject: [Numpy-discussion] [OFFTOPIC] creating/working NumPy-ndarrays in C++ In-Reply-To: <20120408213918.GA798@kudu.in-berlin.de> References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <20120408213918.GA798@kudu.in-berlin.de> Message-ID: 2012/4/8 H?nel Nikolaus Valentin : http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html that looks like it hasn't been updated since 2006 -- I"d say that makes it a non-starter The new numpy-boost project looks promising, though. > which was also mentioned on this list, I now know of four alternatives > for boost+NumPy (including the built-in support). 4? Given the pace of change in numpy, it looks to me like the new numpy-boost is the only viable (Boost) option, other than rolling your own. For the record, Cython (with or without another lib like Blitz++) doesn't preclude "strong OOP design patterns". -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From valentin.haenel at epfl.ch Mon Apr 9 12:58:38 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Mon, 9 Apr 2012 18:58:38 +0200 Subject: [Numpy-discussion] [OFFTOPIC] creating/working NumPy-ndarrays in C++ In-Reply-To: References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <20120408213918.GA798@kudu.in-berlin.de> Message-ID: <20120409165838.GA439@kudu.in-berlin.de> Hi Chris, thanks for your answer. * Chris Barker [2012-04-09]: > 2012/4/8 H?nel Nikolaus Valentin >: > http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html > > that looks like it hasn't been updated since 2006 -- I"d say that > makes it a non-starter Yeah, thats what I thought... Until I found it in several production codes... 
> The new numpy-boost project looks promising, though. > > > which was also mentioned on this list, I now know of four alternatives > > for boost+NumPy (including the built-in support). > > 4? http://www.boost.org/doc/libs/1_49_0/libs/python/doc/v2/numeric.html (old) https://github.com/ndarray/Boost.NumPy (new) http://code.google.com/p/numpy-boost/ http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html Or am I missing something? V- From chris.barker at noaa.gov Mon Apr 9 14:27:55 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 9 Apr 2012 11:27:55 -0700 Subject: [Numpy-discussion] [OFFTOPIC] creating/working NumPy-ndarrays in C++ In-Reply-To: <20120409165838.GA439@kudu.in-berlin.de> References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <20120408213918.GA798@kudu.in-berlin.de> <20120409165838.GA439@kudu.in-berlin.de> Message-ID: 2012/4/9 H?nel Nikolaus Valentin : http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html >> >> that looks like it hasn't been updated since 2006 -- I"d say that >> makes it a non-starter > > Yeah, thats what I thought... Until I found it in several production > codes... are they maintaining it? >> 4? > > http://www.boost.org/doc/libs/1_49_0/libs/python/doc/v2/numeric.html > (old) > > https://github.com/ndarray/Boost.NumPy > (new) > > http://code.google.com/p/numpy-boost/ (also pretty old -- I see this:) - Numpy (numpy.scipy.org) (Tested versions: 1.1.1, though >= 1.0 should work) - Python (www.python.org) (Tested versions: 2.5.2, though >= 2.3 should work) both pretty old versions. http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html also pretty old. So I'd go with the actively maintained on -- or Cython -- what I can tell you is that Cython is being very widely used in the numerical/scientific computing community -- but I haven't seen a lot of Boost users. Maybe they use different mailing lists, and dont go to SciPy or Pycon... I'm not sure you made your use case clear -- are you writing C++ specifically for calling form Python? or are you working on a C++ lib that will be used in C++ apps as well as Python apps? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From zachary.pincus at yale.edu Mon Apr 9 14:38:10 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 9 Apr 2012 14:38:10 -0400 Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++ In-Reply-To: <4F81D81D.3030609@arcor.de> References: <4F7AF5C1.5090604@arcor.de> <4F81D81D.3030609@arcor.de> Message-ID: <5A886E87-76B1-4222-9E62-C9A9FCD68F05@yale.edu> > That all sounds like no option -- sad. > Cython is no solution cause, all I want is to leave Python Syntax in > favor for strong OOP design patterns. What about ctypes? For straight numerical work where sometimes all one needs to hand across the python-to-C/C++/Fortran boundary is a pointer to some memory and the size of the memory region. So I will often just write a very thin C interface to whatever I'm using and then call into it with ctypes. So you could just design your algorithm in C++ with all the "strong OOP design patterns" you want, and then just write a minimal C interface on top with one or two functions that receive pointers to memory. 
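To make this concrete, here is a small sketch (an illustration under assumed names, not from the original post) of the Python side of the thin-C-interface pattern being described. Suppose the C++ core exposes `extern "C" void scale(double *data, size_t n, double factor);` in a hypothetical libmycore.so:

    import ctypes
    import numpy as np

    # hypothetical library and function names, for illustration only
    lib = ctypes.CDLL("./libmycore.so")
    lib.scale.restype = None
    lib.scale.argtypes = [ctypes.POINTER(ctypes.c_double),
                          ctypes.c_size_t, ctypes.c_double]

    a = np.ascontiguousarray(np.arange(5.0))  # control the memory layout
    # hand the raw pointer and the length across the boundary
    lib.scale(a.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), a.size, 2.0)
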
Then allocate numpy arrays in python with whatever memory layout you need, and use the a "ctypes" attribute to find the pointer data etc. that you need to pass over. Zach From d.s.seljebotn at astro.uio.no Mon Apr 9 15:19:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 09 Apr 2012 21:19:35 +0200 Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++ In-Reply-To: <4F81D81D.3030609@arcor.de> References: <4F7AF5C1.5090604@arcor.de> <4F81D81D.3030609@arcor.de> Message-ID: <4F833647.1070804@astro.uio.no> On 04/08/2012 08:25 PM, Holger Herrlich wrote: > > That all sounds like no option -- sad. > Cython is no solution cause, all I want is to leave Python Syntax in > favor for strong OOP design patterns. I'm sorry, I'm trying and trying to make heads and tails of this paragraph, but I don't manage to. If you could rephrase it that would be very helpful. (But I'm afraid that if you believe that C++ is more object-oriented than Python, you'll find most people disagree. Perhaps you meant that you want static typing?) Any wrapping tool (Cython, ctypes, probably SWIG but don't know) will allow you to pass the pointer to the data array, the npy_intp* shape array, and the npy_intp* strides array. Really, that's all you need. And given those three, writing a simple C++ class wrapping the arrays and allowing you to conveniently index into the array is done in 15 minutes. If you need more than that -- that is, what you want is essentially to "use NumPy from C++", which slicing and ufuncs and reductions and so on -- then you should probably look into a C++ array library (such as Eigen or Blitz++ or the array stuff in boost). Then, pass the aforementioned data and shape/stride arrays to the array library. Dag From paustin at eos.ubc.ca Mon Apr 9 16:28:42 2012 From: paustin at eos.ubc.ca (Phil Austin) Date: Mon, 09 Apr 2012 13:28:42 -0700 Subject: [Numpy-discussion] [OFFTOPIC] creating/working NumPy-ndarrays in C++ In-Reply-To: References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <20120408213918.GA798@kudu.in-berlin.de> <20120409165838.GA439@kudu.in-berlin.de> Message-ID: <4F83467A.7020302@eos.ubc.ca> On 12-04-09 11:27 AM, Chris Barker wrote: > > http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html > > > also pretty old. > > So I'd go with the actively maintained on -- or Cython -- what I can > tell you is that Cython is being very widely used in the > numerical/scientific computing community -- but I haven't seen a lot > of Boost users. Maybe they use different mailing lists, and dont go to > SciPy or Pycon... FWIW -- we moved all of our extensions from num_util/boost to cython when it started supporting numpy, and have been happy with the results. I'll put something to that effect on our pythonlibs page -- best, Phil From bryanv at continuum.io Mon Apr 9 16:32:22 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Mon, 09 Apr 2012 15:32:22 -0500 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> Message-ID: <4F834756.2040008@continuum.io> On 4/3/12 4:18 PM, Ralf Gommers wrote: > Here some first impressions. > > The good: > - It's responsive! > - It remembers my preferences (view type, # of issues per page, etc.) > - Editing multiple issues with the command window is easy. 
> - Search and filter functionality is powerful > > The bad: > - Multiple projects are supported, but issues are then really mixed. > The way this works doesn't look very useful for combined admin of > numpy/scipy trackers. > - I haven't found a way yet to make versions and subsystems appear in > the one-line issue overview. > - Fixed issues are still shown by default. There are several open > issues filed against youtrack about this, with no reasonable answers. > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only > downloaded. > - No direct VCS integration, only via Teamcity (not set up, so can't > evaluate). > - No useful default views as in Trac > (http://projects.scipy.org/scipy/report). Ralf, regarding some of the issues: I think for numpy/scipy trackers, we could simply run separate instances of YouTrack for each. Also we can certainly create some standard queries. It's a small pain not to have useful defaults, but it's only a one-time pain. :) Also, what kind of integration are you looking for with github? There does appear to be the ability to issue commands to youtrack through git commits, which does not depend on TeamCity, as best I can tell: http://confluence.jetbrains.net/display/YTD3/GitHub+Integration http://blogs.jetbrains.com/youtrack/tag/github-integration/ I'm not sure this is what you were thinking about though. For the other issues, Maggie or I can try and see what we can find out about implementing them, or working around them, this week. Of course, we'd like to evaluate any other viable issue trackers as well. Do you have any suggestions for other systems besides YouTrack? Bryan From teoliphant at gmail.com Mon Apr 9 20:11:55 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Mon, 9 Apr 2012 19:11:55 -0500 Subject: [Numpy-discussion] Getting C-function pointers from Python to C Message-ID: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> Hi all, Some of you are aware of Numba. Numba allows you to create the equivalent of C-function's dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. This is essentially what ctypes does when creating a ctypes function pointer out of: func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function" We could also piggy-back on-top of Ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. 
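For concreteness, here is a minimal sketch (an illustration, not from the original mail) of the round trip being discussed, using libm's fabs as a stand-in for a dynamically generated function; the library lookup assumes a Unix-like system:

    import ctypes
    from ctypes.util import find_library

    libm = ctypes.CDLL(find_library("m"))  # platform-dependent lookup
    proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

    fptr = proto(("fabs", libm))  # a ctypes function-pointer object
    addr = ctypes.cast(fptr, ctypes.c_void_p).value  # the raw integer address

    # Rebuilding a callable from the bare integer is the unchecked path:
    # nothing stops you from handing over a nonsense integer and segfaulting.
    f = proto(addr)
    print(f(-3.5))  # 3.5
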
Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: 1) Create an API for such Ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change it's object structure we would have to adapt this API. Something like this is what is envisioned here: typedef struct { PyObject_HEAD char *b_ptr; } _cfuncptr_object; then the function pointer is: (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) which could be wrapped-up into a nice little NumPy C-API call like void * Npy_ctypes_funcptr(obj) 2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of ctypes.cast(obj, ctypes.c_void_p).value There is working code for this in the ctypes_callback branch of my scipy fork on github. I would like to propose two things: * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy and * implement it with the simple pointer dereference above (option #1) Thoughts? -Travis From njs at pobox.com Mon Apr 9 20:21:55 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 10 Apr 2012 01:21:55 +0100 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> Message-ID: ...isn't this an operation that will be performed once per compiled function? Is the overhead of the easy, robust method (calling ctypes.cast) actually measurable as compared to, you know, running an optimizing compiler? I mean, I doubt there'd be any real problem with adding this extra API to numpy, but it does seem like there might be higher priority targets :-) On Apr 10, 2012 1:12 AM, "Travis Oliphant" wrote: > Hi all, > > Some of you are aware of Numba. Numba allows you to create the > equivalent of C-function's dynamically from Python. One purpose of this > system is to allow NumPy to take these functions and use them in operations > like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so > forth. There are actually many use-cases that one can imagine for such > things. > > One question is how do you pass this function pointer to the C-side. On > the Python side, Numba allows you to get the raw integer address of the > equivalent C-function pointer that it just created out of the Python code. > One can think of this as a 32- or 64-bit integer that you can cast to a > C-function pointer. > > Now, how should this C-function pointer be passed from Python to NumPy? > One approach is just to pass it as an integer --- in other words have an > API in C that accepts an integer as the first argument that the internal > function interprets as a C-function pointer. > > This is essentially what ctypes does when creating a ctypes function > pointer out of: > > func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) > > Of course the problem with this is that you can easily hand it integers > which don't make sense and which will cause a segfault when control is > passed to this "function" > > We could also piggy-back on-top of Ctypes and assume that a ctypes > function-pointer object is passed in. 
This allows some error-checking at > least and also has the benefit that one could use ctypes to access a > c-function library where these functions were defined. I'm leaning towards > this approach. > > Now, the issue is how to get the C-function pointer (that npy_intp > integer) back and hand it off internally. Unfortunately, ctypes does not > make it very easy to get this address (that I can see). There is no > ctypes C-API, for example. There are two potential options: > > 1) Create an API for such Ctypes function pointers in NumPy and use > the ctypes object structure. If ctypes were to ever change it's object > structure we would have to adapt this API. > > Something like this is what is envisioned here: > > typedef struct { > PyObject_HEAD > char *b_ptr; > } _cfuncptr_object; > > then the function pointer is: > > (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) > > which could be wrapped-up into a nice little NumPy C-API call like > > void * Npy_ctypes_funcptr(obj) > > > 2) Use the Python API of ctypes to do the same thing. This has > the advantage of not needing to mirror the simple _cfuncptr_object > structure in NumPy but it is *much* slower to get the address. It > basically does the equivalent of > > ctypes.cast(obj, ctypes.c_void_p).value > > > There is working code for this in the ctypes_callback branch of my > scipy fork on github. > > > I would like to propose two things: > > * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy > and > * implement it with the simple pointer dereference above (option #1) > > > Thoughts? > > -Travis > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Apr 9 20:57:42 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 9 Apr 2012 19:57:42 -0500 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> Message-ID: <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: > ...isn't this an operation that will be performed once per compiled function? Is the overhead of the easy, robust method (calling ctypes.cast) actually measurable as compared to, you know, running an optimizing compiler? > > Yes, there can be significant overhead. The compiler is run once and creates the function. This function is then potentially used many, many times. Also, it is entirely conceivable that the "build" step happens at a separate "compilation" time, and Numba actually loads a pre-compiled version of the function from disk which it then uses at run-time. I have been playing with a version of this using scipy.integrate and unfortunately the overhead of ctypes.cast is rather significant --- to the point of making the code-path using these function pointers to be useless when without the ctypes.cast overhed the speed up is 3-5x. In general, I think NumPy will need its own simple function-pointer object to use when handing over raw-function pointers between Python and C. SciPy can then re-use this object which also has a useful C-API for things like signature checking. I have seen that ctypes is nice but very slow and without a compelling C-API. 
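As a rough way to see this overhead (an illustrative sketch, not from the original mail; absolute numbers vary by machine), one can time the cast alone, never actually calling through the pointer:

    import ctypes
    from timeit import Timer

    setup = ("import ctypes\n"
             "proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)\n"
             "fptr = proto(0)  # NULL function pointer; we only time the cast\n")
    t = Timer("ctypes.cast(fptr, ctypes.c_void_p).value", setup=setup)
    print(t.timeit(100000))  # total seconds for 1e5 casts -- it is pure
                             # Python-level overhead paid on every cast
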
The kind of new C-level cfuncptr object I imagine has attributes: void *func_ptr; char *signature string /* something like 'dd->d' to indicate a function that takes two doubles and returns a double */ methods would be: from_ctypes (classmethod) to_ctypes and simple inline functions to get the function pointer and the signature. > I mean, I doubt there'd be any real problem with adding this extra API to numpy, but it does seem like there might be higher priority targets :-) > > Not if you envision doing a lot of code-development using Numba ;-) -Travis > On Apr 10, 2012 1:12 AM, "Travis Oliphant" wrote: > Hi all, > > Some of you are aware of Numba. Numba allows you to create the equivalent of C-function's dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. > > One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. > > Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. > > This is essentially what ctypes does when creating a ctypes function pointer out of: > > func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) > > Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function" > > We could also piggy-back on-top of Ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. > > Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: > > 1) Create an API for such Ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change it's object structure we would have to adapt this API. > > Something like this is what is envisioned here: > > typedef struct { > PyObject_HEAD > char *b_ptr; > } _cfuncptr_object; > > then the function pointer is: > > (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) > > which could be wrapped-up into a nice little NumPy C-API call like > > void * Npy_ctypes_funcptr(obj) > > > 2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of > > ctypes.cast(obj, ctypes.c_void_p).value > > > There is working code for this in the ctypes_callback branch of my scipy fork on github. > > > I would like to propose two things: > > * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy and > * implement it with the simple pointer dereference above (option #1) > > > Thoughts? 
> > -Travis > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Mon Apr 9 23:11:49 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Mon, 9 Apr 2012 23:11:49 -0400 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: <4F830AF4.9080608@lanl.gov> Message-ID: On Mon, Apr 9, 2012 at 12:22 PM, Benjamin Root wrote: > > > On Mon, Apr 9, 2012 at 12:14 PM, Jonathan T. Niehof wrote: > >> On 04/06/2012 06:54 AM, Benjamin Root wrote: >> >> > Take a peek at how np.gradient() does it. It creates a list of None with >> > a length equal to the number of dimensions, and then inserts a slice >> > object in the appropriate spot in the list. >> >> List of slice(None), correct? At least that's what I see in the source, >> and: >> >> >>> a = numpy.array([[1,2],[3,4]]) >> >>> operator.getitem(a, (None, slice(1, 2))) >> array([[[3, 4]]]) >> >>> operator.getitem(a, (slice(None), slice(1, 2))) >> array([[2], >> [4]]) >> >> > Correct, sorry, I was working from memory. > > Ben Root > > I guess I wasn't reading very carefully and assumed that you meant a list of `slice(None)` instead of a list of `None`. In any case, both your solution and Matthew's solution work (and both are more readable than my original implementation). After I got everything cleaned up (and wrote documentation and tests), I found out that numpy already has a function to do *exactly* what I wanted in the first place: `np.split` (the slicing was just one component of this). I was initially misled by the docstring, but with a list of indices, you can split an array into subarrays of variable length (I wanted to use this to save and load ragged arrays). Well, I guess it was a learning experience, at least. In case anyone is wondering about the original question, `np.split` (and `np.array_split`) uses `np.swapaxes` to specify the slicing axis. Thanks for all your help. -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Apr 10 00:52:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 9 Apr 2012 23:52:53 -0500 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x Message-ID: Hey all, I've been waiting for Mark Wiebe to arrive in Austin where he will spend several weeks, but I also know that masked arrays will be only one of the things he and I are hoping to make head-way on while he is in Austin. Nevertheless, we need to make progress on the masked array discussion and if we want to finalize the masked array implementation we will need to finish the design. I've caught up on most of the discussion including Mark's NEP, Nathaniel's NEP and other writings and the very-nice mailing list discussion that included a somewhat detailed discussion on the algebra of IGNORED. I think there are some things still to be decided. However, I think some things are pretty clear: 1) Masked arrays are going to be fundamental in NumPy and these should replace most people's use of numpy.ma. 
The numpy.ma code will remain as a compatibility layer 2) The reality of #1 and NumPy's general philosophy to date means that masked arrays in NumPy should support the common use-cases of masked arrays (including getting and setting of the mask from the Python and C-layers). However, the semantic of what the mask implies may change from what numpy.ma uses to having a True value meaning selected. 3) There will be missing-data dtypes in NumPy. Likely only a limited sub-set (string, bytes, int64, int32, float32, float64, complex64, complex32, and object) with an API that allows more to be defined if desired. These will most likely use Mark's nice machinery for managing the calculation structure without requiring new C-level loops to be defined. 4) I'm still not sure about whether the IGNORED concept is necessary or not. I really like the separation that was emphasized between implementation (masks versus bit-patterns) and operations (propagating versus non-propagating). Pauli even created another dimension which I don't totally grok and therefore can't remember. Pauli? Do you still feel that is a necessary construction? But, do we need the IGNORED concept to indicate what amounts to different default key-word arguments to functions that operate on NumPy arrays containing missing data (however that is represented)? My current weak view is that it is not really necessary. But, I could be convinced otherwise. I think the good news is that given Mark's hard-work and Nathaniel's follow-up we are really quite far along. I would love to get Nathaniel's opinion about what remains un-done in the current NumPy code-base. I would also appreciate knowing (from anyone with an interest) opinions of items 1-4 above and anything else I've left out. Thanks, -Travis From charlesr.harris at gmail.com Tue Apr 10 01:52:45 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 9 Apr 2012 23:52:45 -0600 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x In-Reply-To: References: Message-ID: On Mon, Apr 9, 2012 at 10:52 PM, Travis Oliphant wrote: > Hey all, > > I've been waiting for Mark Wiebe to arrive in Austin where he will spend > several weeks, but I also know that masked arrays will be only one of the > things he and I are hoping to make head-way on while he is in Austin. > Nevertheless, we need to make progress on the masked array discussion and > if we want to finalize the masked array implementation we will need to > finish the design. > > I've caught up on most of the discussion including Mark's NEP, Nathaniel's > NEP and other writings and the very-nice mailing list discussion that > included a somewhat detailed discussion on the algebra of IGNORED. I > think there are some things still to be decided. However, I think some > things are pretty clear: > > 1) Masked arrays are going to be fundamental in NumPy and these > should replace most people's use of numpy.ma. The numpy.ma code will > remain as a compatibility layer > > 2) The reality of #1 and NumPy's general philosophy to date means > that masked arrays in NumPy should support the common use-cases of masked > arrays (including getting and setting of the mask from the Python and > C-layers). However, the semantic of what the mask implies may change from > what numpy.ma uses to having a True value meaning selected. > > 3) There will be missing-data dtypes in NumPy. Likely only a > limited sub-set (string, bytes, int64, int32, float32, float64, complex64, > complex32, and object) with an API that allows more to be defined if > desired. 
These will most likely use Mark's nice machinery for managing > the calculation structure without requiring new C-level loops to be defined. > > 4) I'm still not sure about whether the IGNORED concept is > necessary or not. I really like the separation that was emphasized > between implementation (masks versus bit-patterns) and operations > (propagating versus non-propagating). Pauli even created another > dimension which I don't totally grok and therefore can't remember. Pauli? > Do you still feel that is a necessary construction? But, do we need the > IGNORED concept to indicate what amounts to different default key-word > arguments to functions that operate on NumPy arrays containing missing data > (however that is represented)? My current weak view is that it is not > really necessary. But, I could be convinced otherwise. > > I think the good news is that given Mark's hard-work and Nathaniel's > follow-up we are really quite far along. I would love to get Nathaniel's > opinion about what remains un-done in the current NumPy code-base. I > would also appreciate knowing (from anyone with an interest) opinions of > items 1-4 above and anything else I've left out. > > Somewhat off topic, but as I wander around inside Numpy these days I'm impressed as to how much code Mark touched in implementing masked arrays, the traces are everywhere. It's quite remarkable, especially given the short amount of time he had to work on it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Tue Apr 10 03:16:01 2012 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 09 Apr 2012 21:16:01 -1000 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x In-Reply-To: References: Message-ID: <4F83DE31.2010700@hawaii.edu> On 04/09/2012 06:52 PM, Travis Oliphant wrote: > Hey all, > > I've been waiting for Mark Wiebe to arrive in Austin where he will > spend several weeks, but I also know that masked arrays will be only > one of the things he and I are hoping to make head-way on while he is > in Austin. Nevertheless, we need to make progress on the masked > array discussion and if we want to finalize the masked array > implementation we will need to finish the design. > > I've caught up on most of the discussion including Mark's NEP, > Nathaniel's NEP and other writings and the very-nice mailing list > discussion that included a somewhat detailed discussion on the > algebra of IGNORED. I think there are some things still to be > decided. However, I think some things are pretty clear: > > 1) Masked arrays are going to be fundamental in NumPy and these > should replace most people's use of numpy.ma. The numpy.ma code > will remain as a compatibility layer Excellent! In mpl and other heavy users of numpy.ma there will still be work to do to handle all varieties of input, but it should be manageable. > > 2) The reality of #1 and NumPy's general philosophy to date means > that masked arrays in NumPy should support the common use-cases of > masked arrays (including getting and setting of the mask from the > Python and C-layers). However, the semantic of what the mask implies > may change from what numpy.ma uses to having a True value meaning > selected. I never understood a strong argument for that change from numpy.ma. When editing data, it is natural to use flag bits to indicate various rejection criteria; no bit set means it's all good, so a False is naturally "good" and True is naturally "mask it out". 
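numpy.ma follows exactly this convention today -- True means "masked out". A small illustration (my example, using the existing numpy.ma API) of the flag-bit editing style Eric describes:

    import numpy as np
    import numpy.ma as ma

    data = np.array([1.0, 2.0, -999.0, 4.0])
    flags = np.array([0, 0, 3, 0])  # nonzero flag bits mean "reject"

    # True masks a value out; False means the value is good
    m = ma.masked_array(data, mask=(flags != 0))
    print(m.mean())  # 2.333..., the flagged value is ignored
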
But I can live with the change if you and Mark see a good reason for it. > 3) There will be missing-data dtypes in NumPy. Likely > only a limited sub-set (string, bytes, int64, int32, float32, > float64, complex64, complex32, and object) with an API that allows > more to be defined if desired. These will most likely use Mark's > nice machinery for managing the calculation structure without > requiring new C-level loops to be defined. So, these will be the bit-pattern versions of NA, correct? With the bit pattern specified as an attribute of the dtype? Good, but... Are we getting into trouble here, figuring out how to handle all combinations of numpy.ma, masked dtypes, and Mark's masked NA? > > 4) I'm still not sure about whether the IGNORED concept is necessary > or not. I really like the separation that was emphasized between > implementation (masks versus bit-patterns) and operations > (propagating versus non-propagating). Pauli even created another > dimension which I don't totally grok and therefore can't remember. > Pauli? Do you still feel that is a necessary construction? But, do > we need the IGNORED concept to indicate what amounts to different > default key-word arguments to functions that operate on NumPy arrays > containing missing data (however that is represented)? My current > weak view is that it is not really necessary. But, I could be > convinced otherwise. I agree (if I understand you correctly); the goal is an expressive, explicit language that lets people accomplish what they want, clearly and quickly, and I think this is more a matter of practicality than purity of theory. Nevertheless, achieving that is easier said than done, and figuring out how to handle corner cases is better done sooner than later. Numpy.ma has never been perfect, but it has proven a good tool for practical work in my experience. (Many thanks to Pierre GM for all his work on it.) One of the nice things it does is to automatically mask out invalid results. This saves quit a bit of explicit checking that would otherwise be required. Eric > > I think the good news is that given Mark's hard-work and Nathaniel's > follow-up we are really quite far along. I would love to get > Nathaniel's opinion about what remains un-done in the current NumPy > code-base. I would also appreciate knowing (from anyone with an > interest) opinions of items 1-4 above and anything else I've left > out. > > Thanks, > > -Travis From njs at pobox.com Tue Apr 10 06:37:34 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 10 Apr 2012 11:37:34 +0100 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> Message-ID: On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote: > On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: > > ...isn't this an operation that will be performed once per compiled > function? Is the overhead of the easy, robust method (calling ctypes.cast) > actually measurable as compared to, you know, running an optimizing > compiler? > > Yes, there can be significant overhead. ? The compiler is run once and > creates the function. ? This function is then potentially used many, many > times. ? ?Also, it is entirely conceivable that the "build" step happens at > a separate "compilation" time, and Numba actually loads a pre-compiled > version of the function from disk which it then uses at run-time. 
> I have been playing with a version of this using scipy.integrate and
> unfortunately the overhead of ctypes.cast is rather significant --- to the
> point of making the code-path using these function pointers to be useless
> when without the ctypes.cast overhead the speed up is 3-5x.

Ah, I was assuming that you'd do the cast once outside of the inner
loop (at the same time you did type compatibility checking and so
forth).

> In general, I think NumPy will need its own simple function-pointer object
> to use when handing over raw-function pointers between Python and C. SciPy
> can then re-use this object which also has a useful C-API for things like
> signature checking. I have seen that ctypes is nice but very slow and
> without a compelling C-API.

Sounds reasonable to me. Probably nicer than violating ctypes's
abstraction boundary, and with no real downsides.

> The kind of new C-level cfuncptr object I imagine has attributes:
>
> void *func_ptr;
> char *signature string /* something like 'dd->d' to indicate a function
> that takes two doubles and returns a double */

This looks like it's setting us up for trouble later. We already have
a robust mechanism for describing types -- dtypes. We should use that
instead of inventing Yet Another baby type system. We'll need to
convert between this representation and dtypes anyway if you want to
use these pointers for ufunc loops... and if we just use dtypes from
the start, we'll avoid having to break the API the first time someone
wants to pass a struct or array or something.

> methods would be:
>
> from_ctypes (classmethod)
> to_ctypes
> and simple inline functions to get the function pointer and the signature.

The other approach would be to define an interface, something like:

    class MyFuncWrapper:
        def func_pointer(self, requested_rettype, requested_argtypes):
            return an_integer

    fp = wrapper.func_pointer(float, (float, float))

This would be trivial to implement for ctypes functions, cython
functions, and numba. For ctypes or cython you'd probably just check
that the requested prototype matched the prototype for the wrapped
function and otherwise raise an error. For numba you'd check if you've
already compiled the function for the given type signature, and if not
then you could compile it on the fly.

It'd also let you wrap an entire family of ufunc loop functions at once
(maybe np.add.c_func is an object that implements the above interface
to return any registered add loop).

OTOH maybe there are places where the code that *calls* the "c function
object" should be adapting to its signature, rather than the other way
around -- in that case you'd want some way for the "c function object"
to advertise what signature(s) it supports. I'm not sure which way the
flexibility goes for the cases you're thinking of.

I feel like I may not be putting my finger on what you're asking,
though, so hopefully these random thoughts are helpful.

-- Nathaniel

From robert.kern at gmail.com Tue Apr 10 06:53:56 2012
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 10 Apr 2012 11:53:56 +0100
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
Message-ID: 

On Tue, Apr 10, 2012 at 01:11, Travis Oliphant wrote:
> 1) Create an API for such Ctypes function pointers in NumPy and use the
> ctypes object structure. If ctypes were to ever change its object
> structure we would have to adapt this API.
>
> Something like this is what is envisioned here:
>
> typedef struct {
>     PyObject_HEAD
>     char *b_ptr;
> } _cfuncptr_object;

Why not just use PyCapsules?

http://docs.python.org/release/2.7/c-api/capsule.html

-- Robert Kern

From d.s.seljebotn at astro.uio.no Tue Apr 10 08:36:05 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 10 Apr 2012 14:36:05 +0200
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
<9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io>
Message-ID: <4F842935.2010004@astro.uio.no>

Hi Travis,

we've been discussing almost the exact same thing in Cython (at a
workshop, not on the mailing list, I'm afraid). Our specific example
use-case was passing a Cython function to scipy.integrate.

On 04/10/2012 02:57 AM, Travis Oliphant wrote:
>
> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>
>> ...isn't this an operation that will be performed once per compiled
>> function? Is the overhead of the easy, robust method (calling
>> ctypes.cast) actually measurable as compared to, you know, running an
>> optimizing compiler?
>
> Yes, there can be significant overhead. The compiler is run once and
> creates the function. This function is then potentially used many, many
> times. Also, it is entirely conceivable that the "build" step happens at
> a separate "compilation" time, and Numba actually loads a pre-compiled
> version of the function from disk which it then uses at run-time.
>
> I have been playing with a version of this using scipy.integrate and
> unfortunately the overhead of ctypes.cast is rather significant --- to
> the point of making the code-path using these function pointers to be
> useless when without the ctypes.cast overhead the speed up is 3-5x.

There's an N where the cost of the ctypes.cast is properly amortized
though, right? The ctypes.cast should only be called once at the
beginning of scipy.integrate?

> In general, I think NumPy will need its own simple function-pointer
> object to use when handing over raw-function pointers between Python and
> C. SciPy can then re-use this object which also has a useful C-API for
> things like signature checking. I have seen that ctypes is nice but very
> slow and without a compelling C-API.
>
> The kind of new C-level cfuncptr object I imagine has attributes:
>
> void *func_ptr;
> char *signature string /* something like 'dd->d' to indicate a function
> that takes two doubles and returns a double */
>
> methods would be:
>
> from_ctypes (classmethod)
> to_ctypes
> and simple inline functions to get the function pointer and the signature.

This is more or less the same format we discussed for Cython functions.
What we wanted to do is to write Cython code like this:

cpdef double f(double x, double y):
    ...

and when passing f to scipy.integrate, let it call the inner C function
directly. We even worked with the exact same format string in our
discussions :-)

Long term, in Cython we could use the type information together with
LLVM to generate adapted code wherever Cython calls objects (in
call-sites). So ideally we would want to agree on an API, so that Cython
functions can be passed to scipy.integrate, and so that numba functions
can be jumped to directly from Cython code.
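As a concrete reading of that 'dd->d' convention, a hypothetical helper (mine, not from either proposal) could map such a string onto a ctypes prototype:

    import ctypes

    _CODES = {'d': ctypes.c_double, 'f': ctypes.c_float,
              'i': ctypes.c_int, 'l': ctypes.c_long}

    def prototype_from_signature(sig):
        # 'dd->d' means (double, double) -> double
        args, ret = sig.split('->')
        return ctypes.CFUNCTYPE(_CODES[ret], *[_CODES[c] for c in args])

    proto = prototype_from_signature('dd->d')  # CFUNCTYPE(c_double, c_double, c_double)
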
Comments:

- PEP3118-augmented format strings should work well, and we may want to
enforce a canonicalized subset (i.e. whitespace is not allowed, do not
use repeat specifiers, ...anything else?)

- What you propose above already does two pointer jumps (with possibly
associated cache misses and stalls) if you want to validate the
signature, which can be eliminated (at least from Cython's perspective).
But I'll let this thread go on a little longer, to figure out the "is
this needed for NumPy" question, before continuing my bikeshedding on
performance issues.

Dag

From d.s.seljebotn at astro.uio.no Tue Apr 10 08:39:17 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 10 Apr 2012 14:39:17 +0200
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
<9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io>
Message-ID: <4F8429F5.20601@astro.uio.no>

On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote:
>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>
>> ...isn't this an operation that will be performed once per compiled
>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>> actually measurable as compared to, you know, running an optimizing
>> compiler?
>>
>> Yes, there can be significant overhead. The compiler is run once and
>> creates the function. This function is then potentially used many, many
>> times. Also, it is entirely conceivable that the "build" step happens at
>> a separate "compilation" time, and Numba actually loads a pre-compiled
>> version of the function from disk which it then uses at run-time.
>>
>> I have been playing with a version of this using scipy.integrate and
>> unfortunately the overhead of ctypes.cast is rather significant --- to the
>> point of making the code-path using these function pointers to be useless
>> when without the ctypes.cast overhead the speed up is 3-5x.
>
> Ah, I was assuming that you'd do the cast once outside of the inner
> loop (at the same time you did type compatibility checking and so
> forth).
>
>> In general, I think NumPy will need its own simple function-pointer object
>> to use when handing over raw-function pointers between Python and C. SciPy
>> can then re-use this object which also has a useful C-API for things like
>> signature checking. I have seen that ctypes is nice but very slow and
>> without a compelling C-API.
>
> Sounds reasonable to me. Probably nicer than violating ctypes's
> abstraction boundary, and with no real downsides.
>
>> The kind of new C-level cfuncptr object I imagine has attributes:
>>
>> void *func_ptr;
>> char *signature string /* something like 'dd->d' to indicate a function
>> that takes two doubles and returns a double */
>
> This looks like it's setting us up for trouble later. We already have
> a robust mechanism for describing types -- dtypes. We should use that
> instead of inventing Yet Another baby type system. We'll need to
> convert between this representation and dtypes anyway if you want to
> use these pointers for ufunc loops... and if we just use dtypes from
> the start, we'll avoid having to break the API the first time someone
> wants to pass a struct or array or something.
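Nathaniel's alternative is easy to picture, since dtypes already carry one-character codes; a sketch (my own, not from the thread) of the two representations interconverting:

    import numpy as np

    def signature_to_dtypes(sig):
        args, ret = sig.split('->')
        return [np.dtype(c) for c in args], np.dtype(ret)

    def dtypes_to_signature(argtypes, restype):
        args = ''.join(np.dtype(t).char for t in argtypes)
        return args + '->' + np.dtype(restype).char

    print(signature_to_dtypes('dd->d'))
    print(dtypes_to_signature([np.float64, np.float64], np.float64))  # 'dd->d'
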
For some of the things we'd like to do with Cython down the line,
something very fast like what Travis describes is exactly what we need;
specifically, if you have Cython code like

cdef double f(func):
    return func(3.4)

that may NOT be called in a loop.

But I do agree that this sounds overkill for NumPy+numba at the moment;
certainly for scipy.integrate where you can amortize over N function
samples. But Travis perhaps has a usecase I didn't think of.

Dag

From heng at cantab.net Tue Apr 10 07:44:43 2012
From: heng at cantab.net (Henry Gomersall)
Date: Tue, 10 Apr 2012 12:44:43 +0100
Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?
Message-ID: <4F841D2B.8060406@cantab.net>

Here is the body of a post I made on stackoverflow, but it seems to be a
non-obvious issue. I was hoping someone here might be able to shed light
on it...

On my 32-bit Windows Vista machine I notice a significant (5x) slowdown
when taking the absolute values of a fairly large numpy.complex64 array
when compared to a numpy.complex128 array.

>>> import numpy
>>> a = numpy.random.randn(256, 2048) + 1j*numpy.random.randn(256, 2048)
>>> b = numpy.complex64(a)
>>> timeit c = numpy.float32(numpy.abs(a))
10 loops, best of 3: 27.5 ms per loop
>>> timeit c = numpy.abs(b)
1 loops, best of 3: 143 ms per loop

Obviously, the outputs in both cases are the same (to operating
precision). I do not notice the same effect on my Ubuntu 64-bit machine
(indeed, as one might expect, the double precision array operation is a
bit slower).

Is there a rational explanation for this? Is this something that is
common to all Windows machines?

In a related note of confusion, the times above are notably (and
consistently) shorter than those I get doing a naive
`st = time.time(); numpy.abs(a); print time.time()-st`. Is this to be
expected?

Cheers,

Henry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com Tue Apr 10 09:00:38 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 10 Apr 2012 14:00:38 +0100
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <4F8429F5.20601@astro.uio.no>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
<9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io>
<4F8429F5.20601@astro.uio.no>
Message-ID: 

On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn wrote:
> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote:
>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>
>>> ...isn't this an operation that will be performed once per compiled
>>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>>> actually measurable as compared to, you know, running an optimizing
>>> compiler?
>>>
>>> Yes, there can be significant overhead. The compiler is run once and
>>> creates the function. This function is then potentially used many, many
>>> times. Also, it is entirely conceivable that the "build" step happens at
>>> a separate "compilation" time, and Numba actually loads a pre-compiled
>>> version of the function from disk which it then uses at run-time.
>>>
>>> I have been playing with a version of this using scipy.integrate and
>>> unfortunately the overhead of ctypes.cast is rather significant --- to the
>>> point of making the code-path using these function pointers to be useless
>>> when without the ctypes.cast overhead the speed up is 3-5x.
>> Ah, I was assuming that you'd do the cast once outside of the inner >> loop (at the same time you did type compatibility checking and so >> forth). >> >>> In general, I think NumPy will need its own simple function-pointer object >>> to use when handing over raw-function pointers between Python and C. SciPy >>> can then re-use this object which also has a useful C-API for things like >>> signature checking. I have seen that ctypes is nice but very slow and >>> without a compelling C-API. >> >> Sounds reasonable to me. Probably nicer than violating ctypes's >> abstraction boundary, and with no real downsides. >> >>> The kind of new C-level cfuncptr object I imagine has attributes: >>> >>> void *func_ptr; >>> char *signature string /* something like 'dd->d' to indicate a function >>> that takes two doubles and returns a double */ >> >> This looks like it's setting us up for trouble later. We already have >> a robust mechanism for describing types -- dtypes. We should use that >> instead of inventing Yet Another baby type system. We'll need to >> convert between this representation and dtypes anyway if you want to >> use these pointers for ufunc loops... and if we just use dtypes from >> the start, we'll avoid having to break the API the first time someone >> wants to pass a struct or array or something. > > For some of the things we'd like to do with Cython down the line, > something very fast like what Travis describes is exactly what we need; > specifically, if you have Cython code like > > cdef double f(func): > return func(3.4) > > that may NOT be called in a loop. > > But I do agree that this sounds overkill for NumPy+numba at the moment; > certainly for scipy.integrate where you can amortize over N function > samples. But Travis perhaps has a use case I didn't think of. It sounds sort of like you're disagreeing with me but I can't tell about what, so maybe I was unclear :-). All I was saying was that a list-of-dtype-objects was probably a better way to write down a function signature than some ad-hoc string language. In both cases you'd do some type-compatibility-checking up front and then use C calling afterwards, and I don't see why type-checking would be faster or slower for one representation than the other. (Certainly one wouldn't have to support all possible dtypes up front, the point is just that they give us more room to grow later.) -- Nathaniel From valentin.haenel at epfl.ch Tue Apr 10 09:08:25 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Tue, 10 Apr 2012 15:08:25 +0200 Subject: [Numpy-discussion] [OFFTOPIC] creating/working NumPy-ndarrays in C++ In-Reply-To: References: <4F7AF5C1.5090604@arcor.de> <4F7B720C.8010002@stsci.edu> <20120408213918.GA798@kudu.in-berlin.de> <20120409165838.GA439@kudu.in-berlin.de> Message-ID: <20120410130825.GC798@kudu.in-berlin.de> * Chris Barker [2012-04-09]: > 2012/4/9 Hänel Nikolaus Valentin : > > http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html > >> > >> that looks like it hasn't been updated since 2006 -- I'd say that > >> makes it a non-starter > > > > Yeah, that's what I thought... Until I found it in several production > > codes... > > are they maintaining it? Well, no... that's "legacy code" that was handed down to me, more or less. > >> 4?
> > > http://www.boost.org/doc/libs/1_49_0/libs/python/doc/v2/numeric.html > > (old) > > > > https://github.com/ndarray/Boost.NumPy > > (new) > > > > http://code.google.com/p/numpy-boost/ > (also pretty old -- I see this:) > > - Numpy (numpy.scipy.org) (Tested versions: 1.1.1, though >= 1.0 should work) > - Python (www.python.org) (Tested versions: 2.5.2, though >= 2.3 should work) > > both pretty old versions. > > http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html > > > also pretty old. > > So I'd go with the actively maintained one -- or Cython -- what I can > tell you is that Cython is being very widely used in the > numerical/scientific computing community -- but I haven't seen a lot > of Boost users. Maybe they use different mailing lists, and don't go to > SciPy or PyCon... Yeah, I would choose Cython... if I had a choice... I have had boost.python mentioned a single time throughout the last four editions of EuroScipy > I'm not sure you made your use case clear -- are you writing C++ > specifically for calling from Python? or are you working on a C++ lib > that will be used in C++ apps as well as Python apps? Currently just curious about the different tools available to facilitate interoperability between numpy and boost.python. V- From d.s.seljebotn at astro.uio.no Tue Apr 10 09:10:04 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 15:10:04 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> <4F8429F5.20601@astro.uio.no> Message-ID: <4F84312C.3040309@astro.uio.no> On 04/10/2012 03:00 PM, Nathaniel Smith wrote: > On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn > wrote: >> On 04/10/2012 12:37 PM, Nathaniel Smith wrote: >>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote: >>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: >>>> >>>> ...isn't this an operation that will be performed once per compiled >>>> function? Is the overhead of the easy, robust method (calling ctypes.cast) >>>> actually measurable as compared to, you know, running an optimizing >>>> compiler? >>>> >>>> Yes, there can be significant overhead. The compiler is run once and >>>> creates the function. This function is then potentially used many, many >>>> times. Also, it is entirely conceivable that the "build" step happens at >>>> a separate "compilation" time, and Numba actually loads a pre-compiled >>>> version of the function from disk which it then uses at run-time. >>>> >>>> I have been playing with a version of this using scipy.integrate and >>>> unfortunately the overhead of ctypes.cast is rather significant --- to the >>>> point of making the code-path using these function pointers to be useless >>>> when without the ctypes.cast overhead the speed up is 3-5x. >>> >>> Ah, I was assuming that you'd do the cast once outside of the inner >>> loop (at the same time you did type compatibility checking and so >>> forth). >>> >>>> In general, I think NumPy will need its own simple function-pointer object >>>> to use when handing over raw-function pointers between Python and C. SciPy >>>> can then re-use this object which also has a useful C-API for things like >>>> signature checking. I have seen that ctypes is nice but very slow and >>>> without a compelling C-API. >>> >>> Sounds reasonable to me.
Probably nicer than violating ctypes's >>> abstraction boundary, and with no real downsides. >>> >>>> The kind of new C-level cfuncptr object I imagine has attributes: >>>> >>>> void *func_ptr; >>>> char *signature string /* something like 'dd->d' to indicate a function >>>> that takes two doubles and returns a double */ >>> >>> This looks like it's setting us up for trouble later. We already have >>> a robust mechanism for describing types -- dtypes. We should use that >>> instead of inventing Yet Another baby type system. We'll need to >>> convert between this representation and dtypes anyway if you want to >>> use these pointers for ufunc loops... and if we just use dtypes from >>> the start, we'll avoid having to break the API the first time someone >>> wants to pass a struct or array or something. >> >> For some of the things we'd like to do with Cython down the line, >> something very fast like what Travis describes is exactly what we need; >> specifically, if you have Cython code like >> >> cdef double f(func): >> return func(3.4) >> >> that may NOT be called in a loop. >> >> But I do agree that this sounds overkill for NumPy+numba at the moment; >> certainly for scipy.integrate where you can amortize over N function >> samples. But Travis perhaps has a usecase I didn't think of. > > It sounds sort of like you're disagreeing with me but I can't tell > about what, so maybe I was unclear :-). > > All I was saying was that a list-of-dtype-objects was probably a > better way to write down a function signature than some ad-hoc string > language. In both cases you'd do some type-compatibility-checking up > front and then use C calling afterwards, and I don't see why > type-checking would be faster or slower for one representation than > the other. (Certainly one wouldn't have to support all possible dtypes > up front, the point is just that they give us more room to grow > later.) My point was that with Cython you'd get cases where there is no "up-front", you have to check-and-call as essentially one operation. The Cython code above would result in something like this: if (strcmp("dd->d", signature) == 0) { /* guess on signature and have fast C dispatch for exact match */ } else { /* fall back to calling as Python object */ } The strcmp would probably be inlined and unrolled, but you get the idea. With LLVM available, and if Cython started to use it, we could generate more such branches on the fly, making it more attractive. Dag From d.s.seljebotn at astro.uio.no Tue Apr 10 09:15:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 15:15:51 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F84312C.3040309@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> <4F8429F5.20601@astro.uio.no> <4F84312C.3040309@astro.uio.no> Message-ID: <4F843287.4020702@astro.uio.no> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote: > On 04/10/2012 03:00 PM, Nathaniel Smith wrote: >> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn >> wrote: >>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote: >>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote: >>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: >>>>> >>>>> ...isn't this an operation that will be performed once per compiled >>>>> function? 
Is the overhead of the easy, robust method (calling ctypes.cast) >>>>> actually measurable as compared to, you know, running an optimizing >>>>> compiler? >>>>> >>>>> Yes, there can be significant overhead. The compiler is run once and >>>>> creates the function. This function is then potentially used many, many >>>>> times. Also, it is entirely conceivable that the "build" step happens at >>>>> a separate "compilation" time, and Numba actually loads a pre-compiled >>>>> version of the function from disk which it then uses at run-time. >>>>> >>>>> I have been playing with a version of this using scipy.integrate and >>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the >>>>> point of making the code-path using these function pointers to be useless >>>>> when without the ctypes.cast overhed the speed up is 3-5x. >>>> >>>> Ah, I was assuming that you'd do the cast once outside of the inner >>>> loop (at the same time you did type compatibility checking and so >>>> forth). >>>> >>>>> In general, I think NumPy will need its own simple function-pointer object >>>>> to use when handing over raw-function pointers between Python and C. SciPy >>>>> can then re-use this object which also has a useful C-API for things like >>>>> signature checking. I have seen that ctypes is nice but very slow and >>>>> without a compelling C-API. >>>> >>>> Sounds reasonable to me. Probably nicer than violating ctypes's >>>> abstraction boundary, and with no real downsides. >>>> >>>>> The kind of new C-level cfuncptr object I imagine has attributes: >>>>> >>>>> void *func_ptr; >>>>> char *signature string /* something like 'dd->d' to indicate a function >>>>> that takes two doubles and returns a double */ >>>> >>>> This looks like it's setting us up for trouble later. We already have >>>> a robust mechanism for describing types -- dtypes. We should use that >>>> instead of inventing Yet Another baby type system. We'll need to >>>> convert between this representation and dtypes anyway if you want to >>>> use these pointers for ufunc loops... and if we just use dtypes from >>>> the start, we'll avoid having to break the API the first time someone >>>> wants to pass a struct or array or something. >>> >>> For some of the things we'd like to do with Cython down the line, >>> something very fast like what Travis describes is exactly what we need; >>> specifically, if you have Cython code like >>> >>> cdef double f(func): >>> return func(3.4) >>> >>> that may NOT be called in a loop. >>> >>> But I do agree that this sounds overkill for NumPy+numba at the moment; >>> certainly for scipy.integrate where you can amortize over N function >>> samples. But Travis perhaps has a usecase I didn't think of. >> >> It sounds sort of like you're disagreeing with me but I can't tell >> about what, so maybe I was unclear :-). >> >> All I was saying was that a list-of-dtype-objects was probably a >> better way to write down a function signature than some ad-hoc string >> language. In both cases you'd do some type-compatibility-checking up >> front and then use C calling afterwards, and I don't see why >> type-checking would be faster or slower for one representation than >> the other. (Certainly one wouldn't have to support all possible dtypes Rereading this, perhaps this is the statement you seek: Yes, doing a simple strcmp is much, much faster than jumping all around in memory to check the equality of two lists of dtypes. 
If it is a string less than 8 bytes in length with the comparison string known at compile-time (the Cython case) then the comparison is only a couple of CPU instructions, as you can check 64 bits at a time. Dag >> up front, the point is just that they give us more room to grow >> later.) > > My point was that with Cython you'd get cases where there is no > "up-front", you have to check-and-call as essentially one operation. The > Cython code above would result in something like this: > > if (strcmp("dd->d", signature) == 0) { > /* guess on signature and have fast C dispatch for exact match */ > } > else { > /* fall back to calling as Python object */ > } > > The strcmp would probably be inlined and unrolled, but you get the idea. > > With LLVM available, and if Cython started to use it, we could generate > more such branches on the fly, making it more attractive. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue Apr 10 09:29:32 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 10 Apr 2012 14:29:32 +0100 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F843287.4020702@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> <4F8429F5.20601@astro.uio.no> <4F84312C.3040309@astro.uio.no> <4F843287.4020702@astro.uio.no> Message-ID: On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn wrote: > On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote: >> On 04/10/2012 03:00 PM, Nathaniel Smith wrote: >>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn >>> wrote: >>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote: >>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote: >>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: >>>>>> >>>>>> ...isn't this an operation that will be performed once per compiled >>>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast) >>>>>> actually measurable as compared to, you know, running an optimizing >>>>>> compiler? >>>>>> >>>>>> Yes, there can be significant overhead. The compiler is run once and >>>>>> creates the function. This function is then potentially used many, many >>>>>> times. Also, it is entirely conceivable that the "build" step happens at >>>>>> a separate "compilation" time, and Numba actually loads a pre-compiled >>>>>> version of the function from disk which it then uses at run-time. >>>>>> >>>>>> I have been playing with a version of this using scipy.integrate and >>>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the >>>>>> point of making the code-path using these function pointers to be useless >>>>>> when without the ctypes.cast overhead the speed up is 3-5x. >>>>> >>>>> Ah, I was assuming that you'd do the cast once outside of the inner >>>>> loop (at the same time you did type compatibility checking and so >>>>> forth). >>>>> >>>>>> In general, I think NumPy will need its own simple function-pointer object >>>>>> to use when handing over raw-function pointers between Python and C. SciPy >>>>>> can then re-use this object which also has a useful C-API for things like >>>>>> signature checking. I have seen that ctypes is nice but very slow and >>>>>> without a compelling C-API. >>>>> >>>>> Sounds reasonable to me.
Probably nicer than violating ctypes's >>>>> abstraction boundary, and with no real downsides. >>>>> >>>>>> The kind of new C-level cfuncptr object I imagine has attributes: >>>>>> >>>>>> void *func_ptr; >>>>>> char *signature string ?/* something like 'dd->d' to indicate a function >>>>>> that takes two doubles and returns a double */ >>>>> >>>>> This looks like it's setting us up for trouble later. We already have >>>>> a robust mechanism for describing types -- dtypes. We should use that >>>>> instead of inventing Yet Another baby type system. We'll need to >>>>> convert between this representation and dtypes anyway if you want to >>>>> use these pointers for ufunc loops... and if we just use dtypes from >>>>> the start, we'll avoid having to break the API the first time someone >>>>> wants to pass a struct or array or something. >>>> >>>> For some of the things we'd like to do with Cython down the line, >>>> something very fast like what Travis describes is exactly what we need; >>>> specifically, if you have Cython code like >>>> >>>> cdef double f(func): >>>> ? ? ? return func(3.4) >>>> >>>> that may NOT be called in a loop. >>>> >>>> But I do agree that this sounds overkill for NumPy+numba at the moment; >>>> certainly for scipy.integrate where you can amortize over N function >>>> samples. But Travis perhaps has a usecase I didn't think of. >>> >>> It sounds sort of like you're disagreeing with me but I can't tell >>> about what, so maybe I was unclear :-). >>> >>> All I was saying was that a list-of-dtype-objects was probably a >>> better way to write down a function signature than some ad-hoc string >>> language. In both cases you'd do some type-compatibility-checking up >>> front and then use C calling afterwards, and I don't see why >>> type-checking would be faster or slower for one representation than >>> the other. (Certainly one wouldn't have to support all possible dtypes > > Rereading this, perhaps this is the statement you seek: Yes, doing a > simple strcmp is much, much faster than jumping all around in memory to > check the equality of two lists of dtypes. If it is a string less than 8 > bytes in length with the comparison string known at compile-time (the > Cython case) then the comparison is only a couple of CPU instructions, > as you can check 64 bits at the time. Right, that's what I wasn't getting until you mentioned strcmp :-). That said, the core numpy dtypes are singletons. For this purpose, the signature could be stored as C array of PyArray_Descr*, but even if we store it in a Python tuple/list, we'd still end up with a contiguous array of PyArray_Descr*'s. (I'm assuming that we would guarantee that it was always-and-only a real PyTupleObject* here.) So for the function we're talking about, the check would compile down to doing the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte strcmp. That's admittedly worse, but I think the difference between these two comparisons is unlikely to be measurable, considering that they're followed immediately by a cache miss when we actually jump to the function pointer. 
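To make that concrete, here is roughly what the two checks boil down to -- a minimal sketch with made-up names, using void* to stand in for PyArray_Descr* so it compiles without the numpy headers:

#include <string.h>

/* string signature: 'dd->d' is 5 chars plus NUL, and with the literal
   known at compile time this typically reduces to a word-sized compare */
static int match_string_sig(const char *sig)
{
    return strcmp(sig, "dd->d") == 0;
}

/* dtype signature: three pointer compares against the float64 singleton
   (an opaque void* here standing in for PyArray_Descr*) */
static int match_dtype_sig(void *const *sig, const void *float64_descr)
{
    return sig[0] == float64_descr
        && sig[1] == float64_descr
        && sig[2] == float64_descr;
}

Either way it is a handful of instructions; the difference is in how many bytes get compared, not in the shape of the code.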
-- Nathaniel From d.s.seljebotn at astro.uio.no Tue Apr 10 09:38:39 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 15:38:39 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> <4F8429F5.20601@astro.uio.no> <4F84312C.3040309@astro.uio.no> <4F843287.4020702@astro.uio.no> Message-ID: <4F8437DF.3040203@astro.uio.no> On 04/10/2012 03:29 PM, Nathaniel Smith wrote: > On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn > wrote: >> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote: >>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote: >>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote: >>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote: >>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: >>>>>>> >>>>>>> ...isn't this an operation that will be performed once per compiled >>>>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast) >>>>>>> actually measurable as compared to, you know, running an optimizing >>>>>>> compiler? >>>>>>> >>>>>>> Yes, there can be significant overhead. The compiler is run once and >>>>>>> creates the function. This function is then potentially used many, many >>>>>>> times. Also, it is entirely conceivable that the "build" step happens at >>>>>>> a separate "compilation" time, and Numba actually loads a pre-compiled >>>>>>> version of the function from disk which it then uses at run-time. >>>>>>> >>>>>>> I have been playing with a version of this using scipy.integrate and >>>>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the >>>>>>> point of making the code-path using these function pointers to be useless >>>>>>> when without the ctypes.cast overhed the speed up is 3-5x. >>>>>> >>>>>> Ah, I was assuming that you'd do the cast once outside of the inner >>>>>> loop (at the same time you did type compatibility checking and so >>>>>> forth). >>>>>> >>>>>>> In general, I think NumPy will need its own simple function-pointer object >>>>>>> to use when handing over raw-function pointers between Python and C. SciPy >>>>>>> can then re-use this object which also has a useful C-API for things like >>>>>>> signature checking. I have seen that ctypes is nice but very slow and >>>>>>> without a compelling C-API. >>>>>> >>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's >>>>>> abstraction boundary, and with no real downsides. >>>>>> >>>>>>> The kind of new C-level cfuncptr object I imagine has attributes: >>>>>>> >>>>>>> void *func_ptr; >>>>>>> char *signature string /* something like 'dd->d' to indicate a function >>>>>>> that takes two doubles and returns a double */ >>>>>> >>>>>> This looks like it's setting us up for trouble later. We already have >>>>>> a robust mechanism for describing types -- dtypes. We should use that >>>>>> instead of inventing Yet Another baby type system. We'll need to >>>>>> convert between this representation and dtypes anyway if you want to >>>>>> use these pointers for ufunc loops... and if we just use dtypes from >>>>>> the start, we'll avoid having to break the API the first time someone >>>>>> wants to pass a struct or array or something. 
>>>>> >>>>> For some of the things we'd like to do with Cython down the line, >>>>> something very fast like what Travis describes is exactly what we need; >>>>> specifically, if you have Cython code like >>>>> >>>>> cdef double f(func): >>>>> return func(3.4) >>>>> >>>>> that may NOT be called in a loop. >>>>> >>>>> But I do agree that this sounds overkill for NumPy+numba at the moment; >>>>> certainly for scipy.integrate where you can amortize over N function >>>>> samples. But Travis perhaps has a use case I didn't think of. >>>> >>>> It sounds sort of like you're disagreeing with me but I can't tell >>>> about what, so maybe I was unclear :-). >>>> >>>> All I was saying was that a list-of-dtype-objects was probably a >>>> better way to write down a function signature than some ad-hoc string >>>> language. In both cases you'd do some type-compatibility-checking up >>>> front and then use C calling afterwards, and I don't see why >>>> type-checking would be faster or slower for one representation than >>>> the other. (Certainly one wouldn't have to support all possible dtypes >>> >>> Rereading this, perhaps this is the statement you seek: Yes, doing a >>> simple strcmp is much, much faster than jumping all around in memory to >>> check the equality of two lists of dtypes. If it is a string less than 8 >>> bytes in length with the comparison string known at compile-time (the >>> Cython case) then the comparison is only a couple of CPU instructions, >>> as you can check 64 bits at a time. >> >> Right, that's what I wasn't getting until you mentioned strcmp :-). >> >> That said, the core numpy dtypes are singletons. For this purpose, the >> signature could be stored as C array of PyArray_Descr*, but even if we >> store it in a Python tuple/list, we'd still end up with a contiguous >> array of PyArray_Descr*'s. (I'm assuming that we would guarantee that >> it was always-and-only a real PyTupleObject* here.) So for the >> function we're talking about, the check would compile down to doing >> the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte >> strcmp. That's admittedly worse, but I think the difference between >> these two comparisons is unlikely to be measurable, considering that >> they're followed immediately by a cache miss when we actually jump to >> the function pointer. Yes, for singletons you're almost as well off. But if you have a struct argument, say void f(double x, struct {double a, float b} y); then PEP 3118 gives you the string "dT{df}", whereas with NumPy dtypes you won't have a singleton? I can agree that that is a minor issue though (you could always *make* NumPy dtypes always be singletons). I think the real argument is that for Cython, it just wouldn't do to rely on NumPy dtypes (or NumPy being installed at all) for something as basic as calling to a C-level function; and strings are a simple substitute. And since it is a format defined in PEP 3118, NumPy should already support these kinds of strings internally (i.e. conversion to/from dtype).
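A quick sketch of that round trip as things stand (the exact characters NumPy emits may differ in details such as alignment flags, so take the expected outputs as approximate):

import struct
import numpy as np

# NumPy already exports PEP 3118 format strings via the buffer protocol:
print memoryview(np.zeros(3, dtype=np.float64)).format   # 'd'

# a struct-like dtype comes out as something like 'T{d:a:f:b:}'
rec = np.zeros(3, dtype=[('a', np.float64), ('b', np.float32)])
print memoryview(rec).format

# the stdlib struct module does not implement the 'T{}' extension:
try:
    struct.calcsize('T{d:a:f:b:}')
except struct.error, e:
    print e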
Dag From d.s.seljebotn at astro.uio.no Tue Apr 10 09:49:40 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 15:49:40 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F8437DF.3040203@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> <4F8429F5.20601@astro.uio.no> <4F84312C.3040309@astro.uio.no> <4F843287.4020702@astro.uio.no> <4F8437DF.3040203@astro.uio.no> Message-ID: <4F843A74.1060402@astro.uio.no> On 04/10/2012 03:38 PM, Dag Sverre Seljebotn wrote: > On 04/10/2012 03:29 PM, Nathaniel Smith wrote: >> On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn >> wrote: >>> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote: >>>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote: >>>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote: >>>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant wrote: >>>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote: >>>>>>>> >>>>>>>> ...isn't this an operation that will be performed once per compiled >>>>>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast) >>>>>>>> actually measurable as compared to, you know, running an optimizing >>>>>>>> compiler? >>>>>>>> >>>>>>>> Yes, there can be significant overhead. The compiler is run once and >>>>>>>> creates the function. This function is then potentially used many, many >>>>>>>> times. Also, it is entirely conceivable that the "build" step happens at >>>>>>>> a separate "compilation" time, and Numba actually loads a pre-compiled >>>>>>>> version of the function from disk which it then uses at run-time. >>>>>>>> >>>>>>>> I have been playing with a version of this using scipy.integrate and >>>>>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the >>>>>>>> point of making the code-path using these function pointers to be useless >>>>>>>> when without the ctypes.cast overhed the speed up is 3-5x. >>>>>>> >>>>>>> Ah, I was assuming that you'd do the cast once outside of the inner >>>>>>> loop (at the same time you did type compatibility checking and so >>>>>>> forth). >>>>>>> >>>>>>>> In general, I think NumPy will need its own simple function-pointer object >>>>>>>> to use when handing over raw-function pointers between Python and C. SciPy >>>>>>>> can then re-use this object which also has a useful C-API for things like >>>>>>>> signature checking. I have seen that ctypes is nice but very slow and >>>>>>>> without a compelling C-API. >>>>>>> >>>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's >>>>>>> abstraction boundary, and with no real downsides. >>>>>>> >>>>>>>> The kind of new C-level cfuncptr object I imagine has attributes: >>>>>>>> >>>>>>>> void *func_ptr; >>>>>>>> char *signature string /* something like 'dd->d' to indicate a function >>>>>>>> that takes two doubles and returns a double */ >>>>>>> >>>>>>> This looks like it's setting us up for trouble later. We already have >>>>>>> a robust mechanism for describing types -- dtypes. We should use that >>>>>>> instead of inventing Yet Another baby type system. We'll need to >>>>>>> convert between this representation and dtypes anyway if you want to >>>>>>> use these pointers for ufunc loops... and if we just use dtypes from >>>>>>> the start, we'll avoid having to break the API the first time someone >>>>>>> wants to pass a struct or array or something. 
>>>>>> >>>>>> For some of the things we'd like to do with Cython down the line, >>>>>> something very fast like what Travis describes is exactly what we need; >>>>>> specifically, if you have Cython code like >>>>>> >>>>>> cdef double f(func): >>>>>> return func(3.4) >>>>>> >>>>>> that may NOT be called in a loop. >>>>>> >>>>>> But I do agree that this sounds overkill for NumPy+numba at the moment; >>>>>> certainly for scipy.integrate where you can amortize over N function >>>>>> samples. But Travis perhaps has a usecase I didn't think of. >>>>> >>>>> It sounds sort of like you're disagreeing with me but I can't tell >>>>> about what, so maybe I was unclear :-). >>>>> >>>>> All I was saying was that a list-of-dtype-objects was probably a >>>>> better way to write down a function signature than some ad-hoc string >>>>> language. In both cases you'd do some type-compatibility-checking up >>>>> front and then use C calling afterwards, and I don't see why >>>>> type-checking would be faster or slower for one representation than >>>>> the other. (Certainly one wouldn't have to support all possible dtypes >>> >>> Rereading this, perhaps this is the statement you seek: Yes, doing a >>> simple strcmp is much, much faster than jumping all around in memory to >>> check the equality of two lists of dtypes. If it is a string less than 8 >>> bytes in length with the comparison string known at compile-time (the >>> Cython case) then the comparison is only a couple of CPU instructions, >>> as you can check 64 bits at the time. >> >> Right, that's what I wasn't getting until you mentioned strcmp :-). >> >> That said, the core numpy dtypes are singletons. For this purpose, the >> signature could be stored as C array of PyArray_Descr*, but even if we >> store it in a Python tuple/list, we'd still end up with a contiguous >> array of PyArray_Descr*'s. (I'm assuming that we would guarantee that >> it was always-and-only a real PyTupleObject* here.) So for the >> function we're talking about, the check would compile down to doing >> the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte >> strcmp. That's admittedly worse, but I think the difference between >> these two comparisons is unlikely to be measurable, considering that >> they're followed immediately by a cache miss when we actually jump to >> the function pointer. Actually, I think the performance hit is a problem in the Cython case. While there's no place to explicitly pre-check the signature, it will very often be the case that everything is in L1 cache already. Consider f being called in a loop. (And the whole point of the exercise is to avoid for the user having to type the "func" argument) Dag From njs at pobox.com Tue Apr 10 11:04:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 10 Apr 2012 16:04:38 +0100 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F8437DF.3040203@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <9A9D82F1-42A2-470E-AA2C-2CA904698B23@continuum.io> <4F8429F5.20601@astro.uio.no> <4F84312C.3040309@astro.uio.no> <4F843287.4020702@astro.uio.no> <4F8437DF.3040203@astro.uio.no> Message-ID: On Tue, Apr 10, 2012 at 2:38 PM, Dag Sverre Seljebotn wrote: > On 04/10/2012 03:29 PM, Nathaniel Smith wrote: >> Right, that's what I wasn't getting until you mentioned strcmp :-). >> >> That said, the core numpy dtypes are singletons. 
For this purpose, the >> signature could be stored as C array of PyArray_Descr*, but even if we >> store it in a Python tuple/list, we'd still end up with a contiguous >> array of PyArray_Descr*'s. (I'm assuming that we would guarantee that >> it was always-and-only a real PyTupleObject* here.) So for the >> function we're talking about, the check would compile down to doing >> the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte >> strcmp. That's admittedly worse, but I think the difference between >> these two comparisons is unlikely to be measurable, considering that >> they're followed immediately by a cache miss when we actually jump to >> the function pointer. > > Yes, for singletons you're almost as well off. But if you have a struct > argument, say > > void f(double x, struct {double a, float b} y); > > then PEP 3118 gives you the string "dT{df}", whereas with NumPy dtypes > you won't have a singleton? > > I can agree that that is a minor issue though (you could always *make* > NumPy dtypes always be singletons). > > I think the real argument is that for Cython, it just wouldn't do to > rely on NumPy dtypes (or NumPy being installed at all) for something as > basic as calling to a C-level function; and strings are a simple substitute. > > And since it is a format defined in PEP 3118, NumPy should already > support these kinds of strings internally (i.e. conversion to/from dtype). Good points. PEP 3118 is more thorough than I realized. Is it actually canonical/implemented? The PEP says that all the added type syntax will be added to struct, but that doesn't seem to have happened (except for the "?" character, I guess). -- Nathaniel From nadavh at visionsense.com Tue Apr 10 11:25:32 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 10 Apr 2012 08:25:32 -0700 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> Message-ID: <26FC23E7C398A64083C980D16001012D2E4ABCD510@VA3DIAXVS361.RED001.local> Sorry for being slow. There is (I think) a related question I raised on the skimage list: I have a cython function that calls a C callback function in a loop (one call for each pixel in an image). The C function is compiled in a different shared library (a simple C library, not a python module). I would like a python script to get the address of the C function and pass it on to the cython function as the pointer for the callback function. As I understand, Travis' issue starts once the callback address is obtained, but is there a direct method to retrieve the address from the shared library? Nadav. ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Travis Oliphant [teoliphant at gmail.com] Sent: 10 April 2012 03:11 To: Discussion of Numerical Python Subject: [Numpy-discussion] Getting C-function pointers from Python to C Hi all, Some of you are aware of Numba. Numba allows you to create the equivalent of C-functions dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. One question is how do you pass this function pointer to the C-side.
On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. This is essentially what ctypes does when creating a ctypes function pointer out of: func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function". We could also piggy-back on top of ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: 1) Create an API for such ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change its object structure we would have to adapt this API. Something like this is what is envisioned here:

typedef struct {
    PyObject_HEAD
    char *b_ptr;
} _cfuncptr_object;

then the function pointer is:

(*((void **)(((_cfuncptr_object *)(obj))->b_ptr)))

which could be wrapped-up into a nice little NumPy C-API call like

void * Npy_ctypes_funcptr(obj)

2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of ctypes.cast(obj, ctypes.c_void_p).value There is working code for this in the ctypes_callback branch of my scipy fork on github. I would like to propose two things: * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy, and * implementing it with the simple pointer dereference above (option #1) Thoughts? -Travis _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From francesc at continuum.io Tue Apr 10 11:36:56 2012 From: francesc at continuum.io (Francesc Alted) Date: Tue, 10 Apr 2012 10:36:56 -0500 Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit? In-Reply-To: <4F841D2B.8060406@cantab.net> References: <4F841D2B.8060406@cantab.net> Message-ID: <4F845398.2070005@continuum.io> On 4/10/12 6:44 AM, Henry Gomersall wrote: > Here is the body of a post I made on stackoverflow, but it seems to be > a non-obvious issue. I was hoping someone here might be able to shed > light on it... > > On my 32-bit Windows Vista machine I notice a significant (5x) > slowdown when taking the absolute values of a fairly large > numpy.complex64 array when compared to a numpy.complex128 array.
> > >>> import numpy > >>> a = numpy.random.randn(256,2048) + 1j*numpy.random.randn(256,2048) > >>> b = numpy.complex64(a) > >>> timeit c = numpy.float32(numpy.abs(a)) > 10 loops, best of 3: 27.5 ms per loop > >>> timeit c = numpy.abs(b) > 1 loops, best of 3: 143 ms per loop > > Obviously, the outputs in both cases are the same (to operating > precision). > > I do not notice the same effect on my Ubuntu 64-bit machine (indeed, > as one might expect, the double precision array operation is a bit > slower). > > Is there a rational explanation for this? > > Is this something that is common to all Windows? > I cannot tell for sure, but it looks like the windows version of NumPy is casting complex64 to complex128 internally. I'm guessing here, but numexpr lacks the complex64 type, so it has to internally do the upcast, and I'm seeing kind of the same slowdown:

In [6]: timeit numpy.abs(a)
100 loops, best of 3: 10.7 ms per loop

In [7]: timeit numpy.abs(b)
100 loops, best of 3: 8.51 ms per loop

In [8]: timeit numexpr.evaluate("abs(a)")
100 loops, best of 3: 1.67 ms per loop

In [9]: timeit numexpr.evaluate("abs(b)")
100 loops, best of 3: 4.96 ms per loop

In my case I'm seeing only a 3x slowdown, but this is because numexpr is not re-casting the outcome to complex64, while windows might be doing this. Just to make sure, can you run this:

In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
100 loops, best of 3: 12.3 ms per loop

In [11]: timeit c = numpy.abs(b)
100 loops, best of 3: 8.45 ms per loop

in your windows box and see if they give similar results? > In a related note of confusion, the times above are notably (and > consistently) different (shorter) from those I get doing a naive `st = > time.time(); numpy.abs(a); print time.time()-st`. Is this to be expected? > This happens a lot, yes, especially when your code is memory-bottlenecked (a very common situation). The explanation is simple: when your datasets are small enough to fit in CPU cache, the first time the timing loop runs, it brings all your working set to cache, so the second time the computation is evaluated, it does not have to fetch data from memory, and by the time you run the loop 10 times or more, you are discarding any memory effect. However, when you run the loop only once, you are considering the memory fetch time too (which is often much more realistic). -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Tue Apr 10 11:57:54 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 17:57:54 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <26FC23E7C398A64083C980D16001012D2E4ABCD510@VA3DIAXVS361.RED001.local> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <26FC23E7C398A64083C980D16001012D2E4ABCD510@VA3DIAXVS361.RED001.local> Message-ID: <439be604-b0ae-4974-b94c-63d07bb76e62@email.android.com> That is rather unrelated, you better ask this again on the cython-users list (be warned that top-posting is strongly discouraged in that place). Dag -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Nadav Horesh wrote: Sorry for being slow. There is (I think) a related question I raised on the skimage list: I have a cython function that calls a C callback function in a loop (one call for each pixel in an image). The C function is compiled in a different shared library (a simple C library, not a python module).
I would like a python script to get the address of the C function and pass it on to the cython function as the pointer for the callback function. As I understand, Travis' issue starts once the callback address is obtained, but is there a direct method to retrieve the address from the shared library? Nadav. _____________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Travis Oliphant [teoliphant at gmail.com] Sent: 10 April 2012 03:11 To: Discussion of Numerical Python Subject: [Numpy-discussion] Getting C-function pointers from Python to C Hi all, Some of you are aware of Numba. Numba allows you to create the equivalent of C-functions dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. This is essentially what ctypes does when creating a ctypes function pointer out of: func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function". We could also piggy-back on top of ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: 1) Create an API for such ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change its object structure we would have to adapt this API. Something like this is what is envisioned here:

typedef struct {
    PyObject_HEAD
    char *b_ptr;
} _cfuncptr_object;

then the function pointer is:

(*((void **)(((_cfuncptr_object *)(obj))->b_ptr)))

which could be wrapped-up into a nice little NumPy C-API call like

void * Npy_ctypes_funcptr(obj)

2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of ctypes.cast(obj, ctypes.c_void_p).value There is working code for this in the ctypes_callback branch of my scipy fork on github. I would like to propose two things: * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy, and * implementing it with the simple pointer dereference above (option #1) Thoughts?
-Travis _____________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _____________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Tue Apr 10 10:55:53 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 10 Apr 2012 15:55:53 +0100 Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit? In-Reply-To: <4F845398.2070005@continuum.io> References: <4F841D2B.8060406@cantab.net> <4F845398.2070005@continuum.io> Message-ID: <4F8449F9.7060507@cantab.net> On 10/04/2012 16:36, Francesc Alted wrote: > In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b))) > 100 loops, best of 3: 12.3 ms per loop > > In [11]: timeit c = numpy.abs(b) > 100 loops, best of 3: 8.45 ms per loop > > in your windows box and see if they give similar results? > No, the results are somewhat the same as before - ~40ms for the first (upcast/downcast) case and ~150ms for the direct case (both *much* slower than yours!). This is versus ~28ms for operating directly on double precision. I'm using numexpr in the end, but this is slower than numpy.abs under linux. >> In a related note of confusion, the times above are notably (and >> consistently) different (shorter) from those I get doing a naive `st = >> time.time(); numpy.abs(a); print time.time()-st`. Is this to be expected? >> > > This happens a lot, yes, especially when your code is > memory-bottlenecked (a very common situation). The explanation is > simple: when your datasets are small enough to fit in CPU cache, the > first time the timing loop runs, it brings all your working set to > cache, so the second time the computation is evaluated, it does > not have to fetch data from memory, and by the time you run the loop > 10 times or more, you are discarding any memory effect. However, when > you run the loop only once, you are considering the memory fetch time > too (which is often much more realistic). Ah, that makes sense. Thanks! Cheers, Henry From jniehof at lanl.gov Tue Apr 10 12:52:46 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Tue, 10 Apr 2012 10:52:46 -0600 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: References: <4F830AF4.9080608@lanl.gov> Message-ID: <4F84655E.1030501@lanl.gov> On 04/09/2012 09:11 PM, Tony Yu wrote: > I guess I wasn't reading very carefully and assumed that you meant a > list of `slice(None)` instead of a list of `None`. My apologies to Ben...I wasn't being pedantic to be a jerk, I was being pedantic because I read Ben's message and thought "oooh, that works?" and ran off to try it, since I'd just been writing some very similar code. And sadly, it doesn't. -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From francesc at continuum.io Tue Apr 10 12:57:04 2012 From: francesc at continuum.io (Francesc Alted) Date: Tue, 10 Apr 2012 11:57:04 -0500 Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?
In-Reply-To: <4F8449F9.7060507@cantab.net> References: <4F841D2B.8060406@cantab.net> <4F845398.2070005@continuum.io> <4F8449F9.7060507@cantab.net> Message-ID: <4F846660.1060906@continuum.io> On 4/10/12 9:55 AM, Henry Gomersall wrote: > On 10/04/2012 16:36, Francesc Alted wrote: >> In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b))) >> 100 loops, best of 3: 12.3 ms per loop >> >> In [11]: timeit c = numpy.abs(b) >> 100 loops, best of 3: 8.45 ms per loop >> >> in your windows box and see if they give similar results? >> > No, the results are somewhat the same as before - ~40ms for the first > (upcast/downcast) case and ~150ms for the direct case (both *much* > slower than yours!). This is versus ~28ms for operating directly on > double precision. Okay, so it seems that something is going wrong with the performance of pure complex64 abs() for Windows. > > I'm using numexpr in the end, but this is slower than numpy.abs under linux. Oh, you mean the windows version of abs(complex64) in numexpr is slower than a pure numpy.abs(complex64) under linux? That's weird, because numexpr has an independent implementation of the complex operations from NumPy machinery. Here is how abs() is implemented in numexpr:

static void
nc_abs(cdouble *x, cdouble *r)
{
    r->real = sqrt(x->real*x->real + x->imag*x->imag);
    r->imag = 0;
}

[as I said, only the double precision version is implemented, so you have to add here the cost of the cast too] Hmm, considering all of these facts, it might be that sqrtf() on windows is under-performing? Can you try this:

In [68]: a = numpy.linspace(0, 1, 1e6)

In [69]: b = numpy.float32(a)

In [70]: timeit c = numpy.sqrt(a)
100 loops, best of 3: 5.64 ms per loop

In [71]: timeit c = numpy.sqrt(b)
100 loops, best of 3: 3.77 ms per loop

and tell us the results for windows? PD: if you are using numexpr on windows, you may want to use the MKL linked version, which uses the abs of MKL, that should have considerably better performance. -- Francesc Alted From ben.root at ou.edu Tue Apr 10 13:01:45 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 10 Apr 2012 13:01:45 -0400 Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit? In-Reply-To: <4F846660.1060906@continuum.io> References: <4F841D2B.8060406@cantab.net> <4F845398.2070005@continuum.io> <4F8449F9.7060507@cantab.net> <4F846660.1060906@continuum.io> Message-ID: On Tue, Apr 10, 2012 at 12:57 PM, Francesc Alted wrote: > On 4/10/12 9:55 AM, Henry Gomersall wrote: > > On 10/04/2012 16:36, Francesc Alted wrote: > >> In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b))) > >> 100 loops, best of 3: 12.3 ms per loop > >> > >> In [11]: timeit c = numpy.abs(b) > >> 100 loops, best of 3: 8.45 ms per loop > >> > >> in your windows box and see if they give similar results? > >> > > No, the results are somewhat the same as before - ~40ms for the first > > (upcast/downcast) case and ~150ms for the direct case (both *much* > > slower than yours!). This is versus ~28ms for operating directly on > > double precision. > > Okay, so it seems that something is going wrong with the performance > of pure complex64 abs() for Windows. > > > > > I'm using numexpr in the end, but this is slower than numpy.abs under > linux. > > Oh, you mean the windows version of abs(complex64) in numexpr is slower > than a pure numpy.abs(complex64) under linux? That's weird, because > numexpr has an independent implementation of the complex operations from > NumPy machinery.
Here it is how abs() is implemented in numexpr: > > static void > nc_abs(cdouble *x, cdouble *r) > { > r->real = sqrt(x->real*x->real + x->imag*x->imag); > r->imag = 0; > } > > [as I said, only the double precision version is implemented, so you > have to add here the cost of the cast too] > > Hmm, considering all of these facts, it might be that sqrtf() on windows > is under-performing? Can you try this: > > In [68]: a = numpy.linspace(0, 1, 1e6) > > In [69]: b = numpy.float32(a) > > In [70]: timeit c = numpy.sqrt(a) > 100 loops, best of 3: 5.64 ms per loop > > In [71]: timeit c = numpy.sqrt(b) > 100 loops, best of 3: 3.77 ms per loop > > and tell us the results for windows? > > PD: if you are using numexpr on windows, you may want to use the MKL > linked version, which uses the abs of MKL, that should have considerably > better performance. > > -- > Francesc Alted > > Just a quick aside, wouldn't the above have overflow issues? Isn't this why hypot() is available? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Apr 10 13:13:12 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 10 Apr 2012 13:13:12 -0400 Subject: [Numpy-discussion] Slice specified axis In-Reply-To: <4F84655E.1030501@lanl.gov> References: <4F830AF4.9080608@lanl.gov> <4F84655E.1030501@lanl.gov> Message-ID: On Tue, Apr 10, 2012 at 12:52 PM, Jonathan T. Niehof wrote: > On 04/09/2012 09:11 PM, Tony Yu wrote: > > > I guess I wasn't reading very carefully and assumed that you meant a > > list of `slice(None)` instead of a list of `None`. > > My apologies to Ben...I wasn't being pedantic to be a jerk, I was being > pedantic because I read Ben's message and thought "oooh, that works?" > and ran off to try it, since I'd just been writing some very similar > code. And sadly, it doesn't. > > No offense taken. Such mistakes should be pointed out so that future readers of the mailing list archives will have the correct information available to them. Bad mailing list comments are just as bad as outdated source code comments. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Tue Apr 10 12:43:14 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 10 Apr 2012 17:43:14 +0100 Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit? In-Reply-To: <4F846660.1060906@continuum.io> References: <4F841D2B.8060406@cantab.net> <4F845398.2070005@continuum.io> <4F8449F9.7060507@cantab.net> <4F846660.1060906@continuum.io> Message-ID: <4F846322.5000009@cantab.net> On 10/04/2012 17:57, Francesc Alted wrote: >> I'm using numexpr in the end, but this is slower than numpy.abs under linux. > Oh, you mean the windows version of abs(complex64) in numexpr is slower > than a pure numpy.abs(complex64) under linux? That's weird, because > numexpr has an independent implementation of the complex operations from > NumPy machinery. Here it is how abs() is implemented in numexpr: > > static void > nc_abs(cdouble *x, cdouble *r) > { > r->real = sqrt(x->real*x->real + x->imag*x->imag); > r->imag = 0; > } > > [as I said, only the double precision version is implemented, so you > have to add here the cost of the cast too] hmmm, I can't seem to reproduce that assertion, so ignore it. > Hmm, considering all of these facts, it might be that sqrtf() on windows > is under-performing? 
Can you try this: > > In [68]: a = numpy.linspace(0, 1, 1e6) > > In [69]: b = numpy.float32(a) > > In [70]: timeit c = numpy.sqrt(a) > 100 loops, best of 3: 5.64 ms per loop > > In [71]: timeit c = numpy.sqrt(b) > 100 loops, best of 3: 3.77 ms per loop > > and tell us the results for windows? In [18]: timeit c = numpy.sqrt(a) 100 loops, best of 3: 21.4 ms per loop In [19]: timeit c = numpy.sqrt(b) 100 loops, best of 3: 12.5 ms per loop So, all sensible there it seems. Taking this to the next stage... In [95]: a = numpy.random.randn(256,2048) + 1j*numpy.random.randn(256,2048) In [96]: b = numpy.complex64(a) In [97]: timeit numpy.sqrt(a*numpy.conj(a)) 10 loops, best of 3: 61.9 ms per loop In [98]: timeit numpy.sqrt(b*numpy.conj(b)) 10 loops, best of 3: 27.2 ms per loop In [99]: timeit numpy.abs(a) # for comparison 10 loops, best of 3: 30 ms per loop In [100]: timeit numpy.abs(b) # and again (slow slow slow) 1 loops, best of 3: 153 ms per loop I can confirm the results are correct. So, it really is in numpy.abs. > PD: if you are using numexpr on windows, you may want to use the MKL > linked version, which uses the abs of MKL, that should have considerably > better performance. I'd love to - I presume this would mean me buying an MKL license? If not, where do I find the MKL linked version? Cheers, Henry From francesc at continuum.io Tue Apr 10 14:13:01 2012 From: francesc at continuum.io (Francesc Alted) Date: Tue, 10 Apr 2012 13:13:01 -0500 Subject: [Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit? In-Reply-To: <4F846322.5000009@cantab.net> References: <4F841D2B.8060406@cantab.net> <4F845398.2070005@continuum.io> <4F8449F9.7060507@cantab.net> <4F846660.1060906@continuum.io> <4F846322.5000009@cantab.net> Message-ID: <4F84782D.2080103@continuum.io> On 4/10/12 11:43 AM, Henry Gomersall wrote: > On 10/04/2012 17:57, Francesc Alted wrote: >>> I'm using numexpr in the end, but this is slower than numpy.abs under linux. >> Oh, you mean the windows version of abs(complex64) in numexpr is slower >> than a pure numpy.abs(complex64) under linux? That's weird, because >> numexpr has an independent implementation of the complex operations from >> NumPy machinery. Here it is how abs() is implemented in numexpr: >> >> static void >> nc_abs(cdouble *x, cdouble *r) >> { >> r->real = sqrt(x->real*x->real + x->imag*x->imag); >> r->imag = 0; >> } >> >> [as I said, only the double precision version is implemented, so you >> have to add here the cost of the cast too] > hmmm, I can't seem to reproduce that assertion, so ignore it. > >> Hmm, considering all of these facts, it might be that sqrtf() on windows >> is under-performing? Can you try this: >> >> In [68]: a = numpy.linspace(0, 1, 1e6) >> >> In [69]: b = numpy.float32(a) >> >> In [70]: timeit c = numpy.sqrt(a) >> 100 loops, best of 3: 5.64 ms per loop >> >> In [71]: timeit c = numpy.sqrt(b) >> 100 loops, best of 3: 3.77 ms per loop >> >> and tell us the results for windows? > In [18]: timeit c = numpy.sqrt(a) > 100 loops, best of 3: 21.4 ms per loop > > In [19]: timeit c = numpy.sqrt(b) > 100 loops, best of 3: 12.5 ms per loop > > So, all sensible there it seems. > > Taking this to the next stage... 
> In [95]: a = numpy.random.randn(256,2048) + 1j*numpy.random.randn(256,2048)
>
> In [96]: b = numpy.complex64(a)
>
> In [97]: timeit numpy.sqrt(a*numpy.conj(a))
> 10 loops, best of 3: 61.9 ms per loop
>
> In [98]: timeit numpy.sqrt(b*numpy.conj(b))
> 10 loops, best of 3: 27.2 ms per loop
>
> In [99]: timeit numpy.abs(a) # for comparison
> 10 loops, best of 3: 30 ms per loop
>
> In [100]: timeit numpy.abs(b) # and again (slow slow slow)
> 1 loops, best of 3: 153 ms per loop
>
> I can confirm the results are correct. So, it really is in numpy.abs.

Yup, definitely seems an issue with numpy.abs for complex64 on Windows. Could you file a ticket on this please?

>> PD: if you are using numexpr on windows, you may want to use the MKL
>> linked version, which uses the abs of MKL, that should have considerably
>> better performance.
> I'd love to - I presume this would mean me buying an MKL license? If
> not, where do I find the MKL linked version?

Well, depending on what you do, you may want to use Gohlke's version:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

where some of the packages come with MKL included (in particular NumPy/numexpr). However, after having a look at the numexpr sources, I found that the abs() version is not using MKL (apparently due to some malfunction of the latter; maybe this has been solved already). So, don't expect a speedup from MKL in this case.

-- Francesc Alted

From pav at iki.fi Tue Apr 10 15:18:23 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 10 Apr 2012 21:18:23 +0200
Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x
In-Reply-To: References: Message-ID:

10.04.2012 06:52, Travis Oliphant wrote:
[clip]
> 4) I'm still not sure about whether the IGNORED
> concept is necessary or not. I really like the separation
> that was emphasized between implementation (masks versus
> bit-patterns) and operations (propagating versus non-propagating).
> Pauli even created another dimension which I don't totally grok
> and therefore can't remember. Pauli? Do you still feel that
> is a necessary construction?

I think the conclusion from that discussion subthread was only that retaining commutativity of binary operations is probably impossible if unmasking of values is allowed. (I think in that discussion the big difference between IGNORED and MISSING was that ignored values could be unmasked while missing values could not.)

If I recall correctly, my suggestion was that you might be able to rescue the situation by changing what assignment means, e.g. in `x[:5] = y` what gets written to `x` at the points where values in `x` and/or `y` are masked/ignored. But I think some counterexamples why this will not work as intended came up.

`numpy.ma` operations are not commutative, which can sometimes be surprising, but apparently one just has to be pragmatic and live with this as there's no real way around it.

I don't have very good suggestions on how these features should be designed --- I use them too seldom.

Pauli

From travis at continuum.io Tue Apr 10 15:28:51 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 10 Apr 2012 14:28:51 -0500
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <26FC23E7C398A64083C980D16001012D2E4ABCD510@VA3DIAXVS361.RED001.local>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <26FC23E7C398A64083C980D16001012D2E4ABCD510@VA3DIAXVS361.RED001.local>
Message-ID:

On Apr 10, 2012, at 10:25 AM, Nadav Horesh wrote:

> Sorry for being slow.
> There is (I think) a related question I raised on the skimage list: > I have a cython function that calls a C callback function in a loop (one call for each pixel in an image). The C function in compiled in a different shared library (a simple C library, not a python module). I would like a python script to get the address of the C function and pass it on to the cython function as the pointer for the callback function. > > As I understand Travis' isue starts ones the callback address is obtained, but, is there a direct method to retrieve the address from the shared library? There are several ways to do this. But, ctypes makes it fairly straightforward: Example: lib = ctypes.CDLL('libm.dylib') address_as_integer = ctypes.cast(lib.sin, ctypes.c_void_p).value Basically, what we are talking about is a lighter weight way to do hand this address around instead of using ctypes objects including it's heavy-weight method of creating signatures. During the lengthy PEP 3118 discussions, this question of whether to use NumPy dtypes or ctypes classes was debated in terms of how to represent "data-types" in the buffer protocol. Guido wisely decided to use the struct-module method of "strings" duly extended to cover more cases. I think this is definitely the way to go. I also noticed that the dyncall library (http://dyncall.org/) also uses strings to represent signatures (althought it uses a ")" to indicate the boundary between inputs and outputs). -Travis > > Nadav. > ________________________________________ > From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Travis Oliphant [teoliphant at gmail.com] > Sent: 10 April 2012 03:11 > To: Discussion of Numerical Python > Subject: [Numpy-discussion] Getting C-function pointers from Python to C > > Hi all, > > Some of you are aware of Numba. Numba allows you to create the equivalent of C-function's dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. > > One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. > > Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. > > This is essentially what ctypes does when creating a ctypes function pointer out of: > > func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) > > Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function" > > We could also piggy-back on-top of Ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. > > Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. 
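For anyone who wants to try the full round trip, here is a small self-contained sketch of both directions described in this message (using ctypes.util.find_library rather than the hard-coded 'libm.dylib' above; works on typical Unix systems, no error handling):

import ctypes
import ctypes.util

# shared library -> ctypes function object
libm = ctypes.CDLL(ctypes.util.find_library('m'))

# function object -> raw address (a plain Python integer)
addr = ctypes.cast(libm.cos, ctypes.c_void_p).value

# raw address -> typed C function pointer, callable again from Python
proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)
cos = proto(addr)
assert cos(0.0) == 1.0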
Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: > > 1) Create an API for such Ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change it's object structure we would have to adapt this API. > > Something like this is what is envisioned here: > > typedef struct { > PyObject_HEAD > char *b_ptr; > } _cfuncptr_object; > > then the function pointer is: > > (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) > > which could be wrapped-up into a nice little NumPy C-API call like > > void * Npy_ctypes_funcptr(obj) > > > 2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of > > ctypes.cast(obj, ctypes.c_void_p).value > > > There is working code for this in the ctypes_callback branch of my scipy fork on github. > > > I would like to propose two things: > > * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy and > * implement it with the simple pointer dereference above (option #1) > > > Thoughts? > > -Travis > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Apr 10 15:40:46 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 10 Apr 2012 21:40:46 +0200 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: <4F834756.2040008@continuum.io> References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> Message-ID: On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven wrote: > On 4/3/12 4:18 PM, Ralf Gommers wrote: > > Here some first impressions. > > > > The good: > > - It's responsive! > > - It remembers my preferences (view type, # of issues per page, etc.) > > - Editing multiple issues with the command window is easy. > > - Search and filter functionality is powerful > > > > The bad: > > - Multiple projects are supported, but issues are then really mixed. > > The way this works doesn't look very useful for combined admin of > > numpy/scipy trackers. > > - I haven't found a way yet to make versions and subsystems appear in > > the one-line issue overview. > > - Fixed issues are still shown by default. There are several open > > issues filed against youtrack about this, with no reasonable answers. > > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only > > downloaded. > > - No direct VCS integration, only via Teamcity (not set up, so can't > > evaluate). > > - No useful default views as in Trac > > (http://projects.scipy.org/scipy/report). > > Ralf, regarding some of the issues: > Hi Bryan, thanks for looking into this. > > I think for numpy/scipy trackers, we could simply run separate instances > of YouTrack for each. That would work. It does mean that there's no maintenance advantage over using Trac here. Also we can certainly create some standard > queries. 
It's a small pain not to have useful defaults, but it's only a > one-time pain. :) > That should help. > Also, what kind of integration are you looking for with github? There > does appear to be the ability to issue commands to youtrack through git > commits, which does not depend on TeamCity, as best I can tell: > > http://confluence.jetbrains.net/display/YTD3/GitHub+Integration > http://blogs.jetbrains.com/youtrack/tag/github-integration/ > > I'm not sure this is what you were thinking about though. > That does help. The other thing that's useful is to reference commits (like commit:abcd123 in current Trac) and have them turned into links to commits on Github. This is not a showstopper for me though. > > For the other issues, Maggie or I can try and see what we can find out > about implementing them, or working around them, this week. > I'd say that from the issues I mentioned, the biggest one is the one-line view. So these two: - I haven't found a way yet to make versions and subsystems appear in the one-line issue overview. - Fixed issues are still shown by default. There are several open issues filed against youtrack about this, with no reasonable answers. > Of course, we'd like to evaluate any other viable issue trackers as > well. Do you have any suggestions for other systems besides YouTrack? > David wrote up some issues (some of which I didn't check) with current Trac and looked at Redmine before. He also mentioned Roundup. See http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow Redmine does look good from a quick browse (better view, does display diffs). It would be good to get the opinions of a few more people on this topic. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Apr 10 15:53:04 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 10 Apr 2012 20:53:04 +0100 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> Message-ID: On Tue, Apr 10, 2012 at 8:40 PM, Ralf Gommers wrote: > > > On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven wrote: > >> On 4/3/12 4:18 PM, Ralf Gommers wrote: >> > Here some first impressions. >> > >> > The good: >> > - It's responsive! >> > - It remembers my preferences (view type, # of issues per page, etc.) >> > - Editing multiple issues with the command window is easy. >> > - Search and filter functionality is powerful >> > >> > The bad: >> > - Multiple projects are supported, but issues are then really mixed. >> > The way this works doesn't look very useful for combined admin of >> > numpy/scipy trackers. >> > - I haven't found a way yet to make versions and subsystems appear in >> > the one-line issue overview. >> > - Fixed issues are still shown by default. There are several open >> > issues filed against youtrack about this, with no reasonable answers. >> > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only >> > downloaded. >> > - No direct VCS integration, only via Teamcity (not set up, so can't >> > evaluate). >> > - No useful default views as in Trac >> > (http://projects.scipy.org/scipy/report). >> >> Ralf, regarding some of the issues: >> > > Hi Bryan, thanks for looking into this. > >> >> I think for numpy/scipy trackers, we could simply run separate instances >> of YouTrack for each. > > > That would work. 
It does mean that there's no maintenance advantage over > using Trac here. > > Also we can certainly create some standard >> queries. It's a small pain not to have useful defaults, but it's only a >> one-time pain. :) >> > > That should help. > > >> Also, what kind of integration are you looking for with github? There >> does appear to be the ability to issue commands to youtrack through git >> commits, which does not depend on TeamCity, as best I can tell: >> >> http://confluence.jetbrains.net/display/YTD3/GitHub+Integration >> http://blogs.jetbrains.com/youtrack/tag/github-integration/ >> >> I'm not sure this is what you were thinking about though. >> > > That does help. The other thing that's useful is to reference commits > (like commit:abcd123 in current Trac) and have them turned into links to > commits on Github. This is not a showstopper for me though. > >> >> For the other issues, Maggie or I can try and see what we can find out >> about implementing them, or working around them, this week. >> > > I'd say that from the issues I mentioned, the biggest one is the one-line > view. So these two: > > - I haven't found a way yet to make versions and subsystems appear in > the one-line issue overview. > - Fixed issues are still shown by default. There are several open > issues filed against youtrack about this, with no reasonable answers. > > >> Of course, we'd like to evaluate any other viable issue trackers as >> >> well. Do you have any suggestions for other systems besides YouTrack? >> > > David wrote up some issues (some of which I didn't check) with current > Trac and looked at Redmine before. He also mentioned Roundup. See > http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow > > Redmine does look good from a quick browse (better view, does display > diffs). It would be good to get the opinions of a few more people on this > topic. > Redmine is "trac on RoR", but it solves two significant issues over trac: - mass edit (e.g. moving things to a new mileston is simple and doable from the UI) - REST API by default, so that we can build simple command line tools on top of it (this changed since I made the wiki page) It is a PITA to install, though, at least if you are not familiar with ruby, and I heard it is hard to manage as well. IIRC, roundup was suggested by Robert, but it is more of a custom solution I believe. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From williamj at tenbase2.com Tue Apr 10 16:13:32 2012 From: williamj at tenbase2.com (William Johnston) Date: Tue, 10 Apr 2012 16:13:32 -0400 Subject: [Numpy-discussion] NpyAccessLib method documentation? Message-ID: <8A823A8A0E2D4423A080E6745B3D8BC3@leviathan> Hello, Anyone there? williamj -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Tue Apr 10 16:24:17 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 22:24:17 +0200 Subject: [Numpy-discussion] NpyAccessLib method documentation? In-Reply-To: <8A823A8A0E2D4423A080E6745B3D8BC3@leviathan> References: <8A823A8A0E2D4423A080E6745B3D8BC3@leviathan> Message-ID: <4F8496F1.2050308@astro.uio.no> On 04/10/2012 10:13 PM, William Johnston wrote: > Hello, > Anyone there? > williamj > The likely reason nobody answers your question is that this is the list for NumPy for CPython, and the .NET port of NumPy is something 99.9% of the readers know nothing about. 
I'm not sure if there's even a list for NumPy on .NET, but posting here doesn't help that. The best thing Numpy-for-.NET users can do is to try to form a community around it (create a mailing list if one doesn't exist, and so on). Dag From ralf.gommers at googlemail.com Tue Apr 10 16:46:29 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 10 Apr 2012 22:46:29 +0200 Subject: [Numpy-discussion] Bitwise operations and unsigned types In-Reply-To: <257A925A-F6A8-4636-9A29-13ECDFCA23DD@continuum.io> References: <98F70436E20441DFBABC66A5580779D8@physics.harvard.edu> <53C37A1C-BAB1-4D76-8C41-C3D6EFEAF88E@continuum.io> <1DA366B7-7409-4F04-B4FB-CB08F12EEB87@continuum.io> <851D2B1E-280C-4DDF-B0A9-A8FDEBA0D6A1@continuum.io> <257A925A-F6A8-4636-9A29-13ECDFCA23DD@continuum.io> Message-ID: On Sat, Apr 7, 2012 at 8:07 PM, Travis Oliphant wrote: > If we just announce that there has been some code changes that alter > corner-case casting rules, I think we can move forward. > Sounds good to me. > We could use a script to document the changes and create a test case which > would help people figure out their code. > > Please speak up if you have another point of view? > I've opened http://projects.scipy.org/numpy/ticket/2101 so we remember to do this before the 1.7 release. Ralf > > On Apr 7, 2012, at 7:43 AM, Ralf Gommers > wrote: > > > > On Fri, Apr 6, 2012 at 3:50 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Apr 6, 2012 at 3:57 AM, Nathaniel Smith wrote: >> >>> On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant >>> wrote: >>> > That is an interesting point of view. I could see that point of >>> view. >>> > But, was this discussed as a bug prior to this change occurring? >>> > >>> > I just heard from a very heavy user of NumPy that they are nervous >>> about >>> > upgrading because of little changes like this one. I don't know if >>> this >>> > particular issue would affect them or not, but I will re-iterate my >>> view >>> > that we should be very careful of these kinds of changes. >>> >>> I agree -- these changes make me very nervous as well, especially >>> since I haven't seen any short, simple description of what changed or >>> what the rules actually are now (comparable to the old "scalars do not >>> affect the type of arrays"). >>> >>> But, I also want to speak up in favor in one respect, since real world >>> data points are always good. I had some code that did >>> def do_something(a): >>> a = np.asarray(a) >>> a -= np.mean(a) >>> ... >>> If someone happens to pass in an integer array, then this is totally >>> broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently >>> discards the fractional part and performs the subtraction anyway, >>> e.g.: >>> >>> In [4]: a >>> Out[4]: array([0, 1, 2, 3]) >>> >>> In [5]: a -= 1.5 >>> >>> In [6]: a >>> Out[6]: array([-1, 0, 0, 1]) >>> >>> The bug was discovered when Skipper tried running my code against >>> numpy master, and it errored out on the -=. So Mark's changes did >>> catch one real bug that would have silently caused completely wrong >>> numerical results! >>> >> > As a second datapoint, it did catch real bugs in scikit-learn too. On the > other hand, it required a workaround in ndimage. > http://thread.gmane.org/gmane.comp.python.numeric.general/44206/focus=44208 > > >> >>> >>> https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512 >>> >>> Yes, these things are trade offs between correctness and convenience. 
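For concreteness, a sketch of the behaviour in question (whether the in-place subtraction is rejected depends on the NumPy version and the casting rules in effect; the explicit spelling below works either way):

import numpy as np

a = np.array([0, 1, 2, 3])
try:
    a -= 1.5    # rejected under the stricter 'same_kind' rules
except TypeError:
    # opt back in to the old truncating behaviour explicitly
    np.subtract(a, 1.5, out=a, casting='unsafe')

# either way a is now array([-1, 0, 0, 1]); the fractional part is discarded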
I >> don't mind new warnings/errors so much, they may break old code but they >> don't lead to wrong results. It's the unexpected and unnoticed successes >> that are scary. >> > > We discussed reverting the unsafe casting behavior for 1.7 in the thread I > linked to above. Do we still want to do this? As far as I can tell it > didn't really cause problems so far. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From madsipsen at gmail.com Wed Apr 11 05:28:57 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Wed, 11 Apr 2012 11:28:57 +0200 Subject: [Numpy-discussion] Slices from an index list Message-ID: <4F854ED9.2090108@gmail.com> Hi, Suppose a have an array of indices, say indices = [0,1,2,3,5,7,8,9,10,12,13,14] Then the following slices a = slice(0,4) b = slice(4,5) c = slice(5,9) d = slice(9,12) provide information about all the consecutive parts of the index list. Given the list of indices, is there some nifty numpy function that can generate the above slices for me (or their start and stop values)? Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Wed Apr 11 07:42:44 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 11 Apr 2012 06:42:44 -0500 Subject: [Numpy-discussion] Slices from an index list In-Reply-To: <4F854ED9.2090108@gmail.com> References: <4F854ED9.2090108@gmail.com> Message-ID: On Wed, Apr 11, 2012 at 4:28 AM, Mads Ipsen wrote: > Hi, > > Suppose a have an array of indices, say > > indices = [0,1,2,3,5,7,8,9,10,12,13,14] > > Then the following slices > > a = slice(0,4) > b = slice(4,5) > c = slice(5,9) > d = slice(9,12) > > provide information about all the consecutive parts of the index list. > Given the list of indices, is there some nifty numpy function that can > generate the above slices for me (or their start and stop values)? > > Here's one way you could do it: In [43]: indices = [0,1,2,3,5,7,8,9,10,12,13,14] In [44]: jumps = where(diff(indices) != 1)[0] + 1 In [45]: starts = hstack((0, jumps)) In [46]: ends = hstack((jumps, len(indices))) In [47]: slices = [slice(start, end) for start, end in zip(starts, ends)] In [48]: slices Out[48]: [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), slice(9, 12, None)] Warren -------------- next part -------------- An HTML attachment was scrubbed... 
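A paste-ready version of Warren's recipe, wrapped in a function (the function name is mine):

import numpy as np

def consecutive_slices(indices):
    indices = np.asarray(indices)
    jumps = np.where(np.diff(indices) != 1)[0] + 1
    starts = np.hstack((0, jumps))
    ends = np.hstack((jumps, len(indices)))
    return [slice(int(s), int(e)) for s, e in zip(starts, ends)]

consecutive_slices([0, 1, 2, 3, 5, 7, 8, 9, 10, 12, 13, 14])
# [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), slice(9, 12, None)]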
URL: From zachary.pincus at yale.edu Wed Apr 11 08:19:37 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 11 Apr 2012 08:19:37 -0400 Subject: [Numpy-discussion] Slices from an index list In-Reply-To: References: <4F854ED9.2090108@gmail.com> Message-ID: <35727F57-0549-4810-A192-92DBF0ACFD7D@yale.edu> > Here's one way you could do it: > > In [43]: indices = [0,1,2,3,5,7,8,9,10,12,13,14] > > In [44]: jumps = where(diff(indices) != 1)[0] + 1 > > In [45]: starts = hstack((0, jumps)) > > In [46]: ends = hstack((jumps, len(indices))) > > In [47]: slices = [slice(start, end) for start, end in zip(starts, ends)] > > In [48]: slices > Out[48]: [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), slice(9, 12, None)] If you're only going to use the slices to divide up the list, you could use numpy.split and skip creating the slice objects: indices = [0,1,2,3,5,7,8,9,10,12,13,14] jumps = numpy.where(numpy.diff(indices) != 1)[0] + 1 numpy.split(indices, jumps) giving: [array([0, 1, 2, 3]), array([5]), array([ 7, 8, 9, 10]), array([12, 13, 14])] Zach (btw, Warren, the method to calculate the jumps is cute. I'll have to remember that.) From madsipsen at gmail.com Wed Apr 11 09:15:04 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Wed, 11 Apr 2012 15:15:04 +0200 Subject: [Numpy-discussion] Slices from an index list In-Reply-To: References: <4F854ED9.2090108@gmail.com> Message-ID: <4F8583D8.6020107@gmail.com> On 11/04/2012 13:42, Warren Weckesser wrote: > > > On Wed, Apr 11, 2012 at 4:28 AM, Mads Ipsen > wrote: > > Hi, > > Suppose a have an array of indices, say > > indices = [0,1,2,3,5,7,8,9,10,12,13,14] > > Then the following slices > > a = slice(0,4) > b = slice(4,5) > c = slice(5,9) > d = slice(9,12) > > provide information about all the consecutive parts of the index > list. Given the list of indices, is there some nifty numpy > function that can generate the above slices for me (or their start > and stop values)? > > > Here's one way you could do it: > > In [43]: indices = [0,1,2,3,5,7,8,9,10,12,13,14] > > In [44]: jumps = where(diff(indices) != 1)[0] + 1 > > In [45]: starts = hstack((0, jumps)) > > In [46]: ends = hstack((jumps, len(indices))) > > In [47]: slices = [slice(start, end) for start, end in zip(starts, ends)] > > In [48]: slices > Out[48]: [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), > slice(9, 12, None)] > > > Warren > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks - very helpful! -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From madsipsen at gmail.com Wed Apr 11 09:15:29 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Wed, 11 Apr 2012 15:15:29 +0200 Subject: [Numpy-discussion] Slices from an index list In-Reply-To: <35727F57-0549-4810-A192-92DBF0ACFD7D@yale.edu> References: <4F854ED9.2090108@gmail.com> <35727F57-0549-4810-A192-92DBF0ACFD7D@yale.edu> Message-ID: <4F8583F1.2040600@gmail.com> On 11/04/2012 14:19, Zachary Pincus wrote: >> Here's one way you could do it: >> >> In [43]: indices = [0,1,2,3,5,7,8,9,10,12,13,14] >> >> In [44]: jumps = where(diff(indices) != 1)[0] + 1 >> >> In [45]: starts = hstack((0, jumps)) >> >> In [46]: ends = hstack((jumps, len(indices))) >> >> In [47]: slices = [slice(start, end) for start, end in zip(starts, ends)] >> >> In [48]: slices >> Out[48]: [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), slice(9, 12, None)] > If you're only going to use the slices to divide up the list, you could use numpy.split and skip creating the slice objects: > > indices = [0,1,2,3,5,7,8,9,10,12,13,14] > jumps = numpy.where(numpy.diff(indices) != 1)[0] + 1 > numpy.split(indices, jumps) > > giving: > [array([0, 1, 2, 3]), array([5]), array([ 7, 8, 9, 10]), array([12, 13, 14])] > > Zach > > (btw, Warren, the method to calculate the jumps is cute. I'll have to remember that.) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks - very helpful! -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Wed Apr 11 14:45:43 2012 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 11 Apr 2012 20:45:43 +0200 Subject: [Numpy-discussion] RuntimeWarning: numpy.ndarray size changed Message-ID: Hi all, Can someone reproduce the following message ? Python 2.7.2 (default, Aug 19 2011, 20:41:43) [GCC] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy /home/nwagner/local/lib64/python2.7/site-packages/numpy/random/__init__.py:91: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility from mtrand import * >> numpy.__version__ '1.7.0.dev-9aac543' >>> Nils From charlesr.harris at gmail.com Wed Apr 11 15:11:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 11 Apr 2012 13:11:53 -0600 Subject: [Numpy-discussion] RuntimeWarning: numpy.ndarray size changed In-Reply-To: References: Message-ID: On Wed, Apr 11, 2012 at 12:45 PM, Nils Wagner wrote: > Hi all, > > Can someone reproduce the following message ? > > Python 2.7.2 (default, Aug 19 2011, 20:41:43) [GCC] on > linux2 > Type "help", "copyright", "credits" or "license" for more > information. > >>> import numpy > > /home/nwagner/local/lib64/python2.7/site-packages/numpy/random/__init__.py:91: > RuntimeWarning: numpy.ndarray size changed, may indicate > binary incompatibility > from mtrand import * > >> numpy.__version__ > '1.7.0.dev-9aac543' > >>> > > It's a harmless artifact. When NO_DEPRECATED_API is defined, the ndarray structure is hidden so that direct member access raises a compile error. This confuses the heck out of Cython. 
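Since the warning is harmless, one way to quiet it in the meantime is a warnings filter; a sketch (the filter has to be installed before the import that triggers the warning):

import warnings
warnings.filterwarnings('ignore',
                        message='numpy.ndarray size changed',
                        category=RuntimeWarning)

import numpy  # no size-changed warning on import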
There is a fairly recent post on the Cython mailing list about it. For release we will disable the warning so that it won't bother anyone. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Apr 11 15:22:17 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 11 Apr 2012 13:22:17 -0600 Subject: [Numpy-discussion] RuntimeWarning: numpy.ndarray size changed In-Reply-To: References: Message-ID: On Wed, Apr 11, 2012 at 1:11 PM, Charles R Harris wrote: > > > On Wed, Apr 11, 2012 at 12:45 PM, Nils Wagner < > nwagner at iam.uni-stuttgart.de> wrote: > >> Hi all, >> >> Can someone reproduce the following message ? >> >> Python 2.7.2 (default, Aug 19 2011, 20:41:43) [GCC] on >> linux2 >> Type "help", "copyright", "credits" or "license" for more >> information. >> >>> import numpy >> >> /home/nwagner/local/lib64/python2.7/site-packages/numpy/random/__init__.py:91: >> RuntimeWarning: numpy.ndarray size changed, may indicate >> binary incompatibility >> from mtrand import * >> >> numpy.__version__ >> '1.7.0.dev-9aac543' >> >>> >> >> > It's a harmless artifact. When NO_DEPRECATED_API is defined, the ndarray > structure is hidden so that direct member access raises a compile error. > This confuses the heck out of Cython. There is a fairly recent post on the > Cython mailing list about it. For release we will disable the warning so > that it won't bother anyone. > > Here is the Cython discussion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Wed Apr 11 15:57:38 2012 From: pmhobson at gmail.com (Paul Hobson) Date: Wed, 11 Apr 2012 12:57:38 -0700 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x In-Reply-To: References: Message-ID: Travis et al, This isn't a reply to anything specific in your email and I apologize if there is a better thread or place to share this information. I've been meaning to participate in the discussion for a long time and never got around to it. The main thing I'd like to is convey my typical use of the numpy.ma module as an environmental engineer analyzing censored datasets --contaminant concentrations that are either at well understood values (not masked) or some unknown value below an upper bound (masked). My basic understanding is that this discussion revolved around how to treat masked data (ignored vs missing) and how to implement one, both, or some middle ground between those two concepts. If I'm off-base, just ignore all of the following. For my purposes, numpy.ma is implemented in a way very well suited to my needs. Here's a gist of a something that was *really* hard for me before I discovered numpy.ma and numpy in general. (this is a bit much, see below for the highlights) https://gist.github.com/2361814 The main message here is that I include the upper bounds of the unknown values (detection limits) in my array and use that to statistically estimate their values. I must be able to retrieve the masked detection limits throughout this process. Additionally the masks as currently implemented allow me sort first the undetected values, then the detected values (see __rosRanks in the gist). As boots-on-the-ground user of numpy, I'm ecstatic that this tool exists. I'm also pretty flexible and don't anticipated any major snags in my work if things change dramatically as the masked/missing/ignored functionality evolves. 
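To make the use-case concrete, a toy version of the pattern described above (values invented; masked entries carry the detection limit rather than a measured concentration):

import numpy as np

conc = np.ma.masked_array([3.2, 0.5, 7.1, 0.5, 2.4],
                          mask=[False, True, False, True, False])

conc.mean()           # statistics over the detected values only
conc.data[conc.mask]  # the detection limits stay retrievable: [0.5, 0.5]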
Thanks to everyone for the hard work and great tools,
-Paul Hobson

On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant wrote:
> Hey all,
>
> I've been waiting for Mark Wiebe to arrive in Austin where he will spend several weeks, but I also know that masked arrays will be only one of the things he and I are hoping to make head-way on while he is in Austin. Nevertheless, we need to make progress on the masked array discussion and if we want to finalize the masked array implementation we will need to finish the design.
>
> I've caught up on most of the discussion including Mark's NEP, Nathaniel's NEP and other writings and the very-nice mailing list discussion that included a somewhat detailed discussion on the algebra of IGNORED. I think there are some things still to be decided. However, I think some things are pretty clear:
>
> 1) Masked arrays are going to be fundamental in NumPy and these should replace most people's use of numpy.ma. The numpy.ma code will remain as a compatibility layer.
>
> 2) The reality of #1 and NumPy's general philosophy to date means that masked arrays in NumPy should support the common use-cases of masked arrays (including getting and setting of the mask from the Python and C-layers). However, the semantic of what the mask implies may change from what numpy.ma uses to having a True value meaning selected.
>
> 3) There will be missing-data dtypes in NumPy. Likely only a limited sub-set (string, bytes, int64, int32, float32, float64, complex64, complex32, and object) with an API that allows more to be defined if desired. These will most likely use Mark's nice machinery for managing the calculation structure without requiring new C-level loops to be defined.
>
> 4) I'm still not sure about whether the IGNORED concept is necessary or not. I really like the separation that was emphasized between implementation (masks versus bit-patterns) and operations (propagating versus non-propagating). Pauli even created another dimension which I don't totally grok and therefore can't remember. Pauli? Do you still feel that is a necessary construction? But, do we need the IGNORED concept to indicate what amounts to different default key-word arguments to functions that operate on NumPy arrays containing missing data (however that is represented)? My current weak view is that it is not really necessary. But, I could be convinced otherwise.
>
> I think the good news is that given Mark's hard-work and Nathaniel's follow-up we are really quite far along. I would love to get Nathaniel's opinion about what remains un-done in the current NumPy code-base. I would also appreciate knowing (from anyone with an interest) opinions of items 1-4 above and anything else I've left out.
>
> Thanks,
>
> -Travis
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From d.s.seljebotn at astro.uio.no Wed Apr 11 16:54:10 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 11 Apr 2012 22:54:10 +0200
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com>
Message-ID: <4F85EF72.6000500@astro.uio.no>

On 04/10/2012 02:11 AM, Travis Oliphant wrote:
> Hi all,
>
> Some of you are aware of Numba.
Numba allows you to create the equivalent of C-function's dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. > > One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. > > Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. > > This is essentially what ctypes does when creating a ctypes function pointer out of: > > func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) > > Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function" > > We could also piggy-back on-top of Ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. > > Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: > > 1) Create an API for such Ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change it's object structure we would have to adapt this API. > > Something like this is what is envisioned here: > > typedef struct { > PyObject_HEAD > char *b_ptr; > } _cfuncptr_object; > > then the function pointer is: > > (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) > > which could be wrapped-up into a nice little NumPy C-API call like > > void * Npy_ctypes_funcptr(obj) > > > 2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of > > ctypes.cast(obj, ctypes.c_void_p).value > > > There is working code for this in the ctypes_callback branch of my scipy fork on github. > > > I would like to propose two things: > > * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy and > * implement it with the simple pointer dereference above (option #1) > > > Thoughts? I really hope we can find some project-neutral common ground, so that lots of tools (Cython, f2py, numba, C extensions in NumPy and SciPy) can agree on how to "unbox callables". A new extension type in NumPy would not fit this bill I feel. I've created a specification for this; if a number of projects (the ones mentioned above) agree on this or something similar and implement support, we could propose a PEP and do it properly once it has proven itself. 
http://wiki.cython.org/enhancements/cep1000 In Cython, this may take the form def call_callback(object func): cdef double (*typed_func)(int) typed_func = func return typed_func(4) ...it would be awesome if passing a Numba-compiled function just worked in this example. Dag From teoliphant at gmail.com Wed Apr 11 17:00:28 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 11 Apr 2012 16:00:28 -0500 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F85EF72.6000500@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> Message-ID: <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> > On 04/10/2012 02:11 AM, Travis Oliphant wrote: >> Hi all, >> >> Some of you are aware of Numba. Numba allows you to create the equivalent of C-function's dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. >> >> One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. >> >> Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. >> >> This is essentially what ctypes does when creating a ctypes function pointer out of: >> >> func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) >> >> Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function" >> >> We could also piggy-back on-top of Ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. >> >> Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: >> >> 1) Create an API for such Ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change it's object structure we would have to adapt this API. >> >> Something like this is what is envisioned here: >> >> typedef struct { >> PyObject_HEAD >> char *b_ptr; >> } _cfuncptr_object; >> >> then the function pointer is: >> >> (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) >> >> which could be wrapped-up into a nice little NumPy C-API call like >> >> void * Npy_ctypes_funcptr(obj) >> >> >> 2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of >> >> ctypes.cast(obj, ctypes.c_void_p).value >> >> >> There is working code for this in the ctypes_callback branch of my scipy fork on github. 
>> >> >> I would like to propose two things: >> >> * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy and >> * implement it with the simple pointer dereference above (option #1) >> >> >> Thoughts? > > I really hope we can find some project-neutral common ground, so that lots of tools (Cython, f2py, numba, C extensions in NumPy and SciPy) can agree on how to "unbox callables". > > A new extension type in NumPy would not fit this bill I feel. I've created a specification for this; if a number of projects (the ones mentioned above) agree on this or something similar and implement support, we could propose a PEP and do it properly once it has proven itself. > > http://wiki.cython.org/enhancements/cep1000 > > In Cython, this may take the form > > def call_callback(object func): > cdef double (*typed_func)(int) > typed_func = func > return typed_func(4) > > ...it would be awesome if passing a Numba-compiled function just worked in this example. Yes, I think we should go the Python PEP route. However, it will take some time to see that to completion (especially with ctypes already in existence). Dag, this would be a very good thing for you to champion however ;-) In the mean-time, I think we could do as Robert essentially suggested and just use Capsule Objects around an agreed-upon simple C-structure: int id /* Some number that can be used as a "type-check" */ void *func; char *string; We can then just create some nice functions to go to and from this form in NumPy ctypeslib and then use this while the Python PEP gets written and adopted. -Travis > > Dag From d.s.seljebotn at astro.uio.no Wed Apr 11 17:08:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 11 Apr 2012 23:08:36 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> Message-ID: <4F85F2D4.8070807@astro.uio.no> On 04/11/2012 11:00 PM, Travis Oliphant wrote: > >> On 04/10/2012 02:11 AM, Travis Oliphant wrote: >>> Hi all, >>> >>> Some of you are aware of Numba. Numba allows you to create the equivalent of C-function's dynamically from Python. One purpose of this system is to allow NumPy to take these functions and use them in operations like ufuncs, generalized ufuncs, file-reading, fancy-indexing, and so forth. There are actually many use-cases that one can imagine for such things. >>> >>> One question is how do you pass this function pointer to the C-side. On the Python side, Numba allows you to get the raw integer address of the equivalent C-function pointer that it just created out of the Python code. One can think of this as a 32- or 64-bit integer that you can cast to a C-function pointer. >>> >>> Now, how should this C-function pointer be passed from Python to NumPy? One approach is just to pass it as an integer --- in other words have an API in C that accepts an integer as the first argument that the internal function interprets as a C-function pointer. 
>>> >>> This is essentially what ctypes does when creating a ctypes function pointer out of: >>> >>> func = ctypes.CFUNCTYPE(restype, *argtypes)(integer) >>> >>> Of course the problem with this is that you can easily hand it integers which don't make sense and which will cause a segfault when control is passed to this "function" >>> >>> We could also piggy-back on-top of Ctypes and assume that a ctypes function-pointer object is passed in. This allows some error-checking at least and also has the benefit that one could use ctypes to access a c-function library where these functions were defined. I'm leaning towards this approach. >>> >>> Now, the issue is how to get the C-function pointer (that npy_intp integer) back and hand it off internally. Unfortunately, ctypes does not make it very easy to get this address (that I can see). There is no ctypes C-API, for example. There are two potential options: >>> >>> 1) Create an API for such Ctypes function pointers in NumPy and use the ctypes object structure. If ctypes were to ever change it's object structure we would have to adapt this API. >>> >>> Something like this is what is envisioned here: >>> >>> typedef struct { >>> PyObject_HEAD >>> char *b_ptr; >>> } _cfuncptr_object; >>> >>> then the function pointer is: >>> >>> (*((void **)(((_sp_cfuncptr_object *)(obj))->b_ptr))) >>> >>> which could be wrapped-up into a nice little NumPy C-API call like >>> >>> void * Npy_ctypes_funcptr(obj) >>> >>> >>> 2) Use the Python API of ctypes to do the same thing. This has the advantage of not needing to mirror the simple _cfuncptr_object structure in NumPy but it is *much* slower to get the address. It basically does the equivalent of >>> >>> ctypes.cast(obj, ctypes.c_void_p).value >>> >>> >>> There is working code for this in the ctypes_callback branch of my scipy fork on github. >>> >>> >>> I would like to propose two things: >>> >>> * creating a Npy_ctypes_funcptr(obj) function in the C-API of NumPy and >>> * implement it with the simple pointer dereference above (option #1) >>> >>> >>> Thoughts? >> >> I really hope we can find some project-neutral common ground, so that lots of tools (Cython, f2py, numba, C extensions in NumPy and SciPy) can agree on how to "unbox callables". >> >> A new extension type in NumPy would not fit this bill I feel. I've created a specification for this; if a number of projects (the ones mentioned above) agree on this or something similar and implement support, we could propose a PEP and do it properly once it has proven itself. >> >> http://wiki.cython.org/enhancements/cep1000 >> >> In Cython, this may take the form >> >> def call_callback(object func): >> cdef double (*typed_func)(int) >> typed_func = func >> return typed_func(4) >> >> ...it would be awesome if passing a Numba-compiled function just worked in this example. > > Yes, I think we should go the Python PEP route. However, it will take some time to see that to completion (especially with ctypes already in existence). Dag, this would be a very good thing for you to champion however ;-) I was NOT proposing a PEP. The spec is created so that it can be implemented *now*, by the tools "we" control (and still be very efficient). A "sci-PEP", if you will; a mutual understanding between Cython, NumPy, numba (and ideally f2py, which already has something similar, if anyone bothers). When this is implemented in all the tools we care about, we can propose something even nicer as a PEP, but that's far down the road; it'll be another couple of years before I'm on Python 3. 
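The capsule idea quoted just below can already be prototyped from pure Python with ctypes; a sketch (the 'd->d' signature string, and the use of the capsule name to carry it, are illustrative conventions of this sketch only, not an agreed spec):

import ctypes
import ctypes.util

new_capsule = ctypes.pythonapi.PyCapsule_New
new_capsule.restype = ctypes.py_object
new_capsule.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
get_pointer.restype = ctypes.c_void_p
get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# producer side: wrap a raw C function address in a capsule
libm = ctypes.CDLL(ctypes.util.find_library('m'))
addr = ctypes.cast(libm.sin, ctypes.c_void_p).value
cap = new_capsule(addr, b'd->d', None)

# consumer side: the capsule name doubles as a signature check
ptr = get_pointer(cap, b'd->d')
sin = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)(ptr)
assert sin(0.0) == 0.0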
> > In the mean-time, I think we could do as Robert essentially suggested and just use Capsule Objects around an agreed-upon simple C-structure: > > int id /* Some number that can be used as a "type-check" */ > void *func; > char *string; > > We can then just create some nice functions to go to and from this form in NumPy ctypeslib and then use this while the Python PEP gets written and adopted. What is not clear to me is how one get from the Python callable to the capsule. Or do you simply intend to pass a non-callable capsule as an argument in place of the callback? Dag From teoliphant at gmail.com Wed Apr 11 17:23:22 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 11 Apr 2012 16:23:22 -0500 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F85F2D4.8070807@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> Message-ID: >>>> Thoughts? >>> >>> I really hope we can find some project-neutral common ground, so that lots of tools (Cython, f2py, numba, C extensions in NumPy and SciPy) can agree on how to "unbox callables". >>> >>> A new extension type in NumPy would not fit this bill I feel. I've created a specification for this; if a number of projects (the ones mentioned above) agree on this or something similar and implement support, we could propose a PEP and do it properly once it has proven itself. >>> >>> http://wiki.cython.org/enhancements/cep1000 >>> >>> In Cython, this may take the form >>> >>> def call_callback(object func): >>> cdef double (*typed_func)(int) >>> typed_func = func >>> return typed_func(4) >>> >>> ...it would be awesome if passing a Numba-compiled function just worked in this example. >> >> Yes, I think we should go the Python PEP route. However, it will take some time to see that to completion (especially with ctypes already in existence). Dag, this would be a very good thing for you to champion however ;-) > > I was NOT proposing a PEP. > > The spec is created so that it can be implemented *now*, by the tools > "we" control (and still be very efficient). A "sci-PEP", if you will; a > mutual understanding between Cython, NumPy, numba (and ideally f2py, > which already has something similar, if anyone bothers). > > When this is implemented in all the tools we care about, we can propose > something even nicer as a PEP, but that's far down the road; it'll be > another couple of years before I'm on Python 3. Perfect :-) We are on the same page.... > >> >> In the mean-time, I think we could do as Robert essentially suggested and just use Capsule Objects around an agreed-upon simple C-structure: >> >> int id /* Some number that can be used as a "type-check" */ >> void *func; >> char *string; >> >> We can then just create some nice functions to go to and from this form in NumPy ctypeslib and then use this while the Python PEP gets written and adopted. > > What is not clear to me is how one get from the Python callable to the > capsule. This varies substantially based on the tool. Numba would do it's work and create the capsule object using it's approach. Cython would use a different approach. I would also propose to have in NumPy some basic functions that go back-and forth between this representation, ctypes, and any other useful representations that might emerge. > > Or do you simply intend to pass a non-callable capsule as an argument in > place of the callback? 
I had simply intended to allow a non-callable capsule argument to be passed in instead of another call-back to any SciPy or NumPy function that can take a raw C-function pointer.

Thanks,

-Travis

From questions.anon at gmail.com Wed Apr 11 21:15:18 2012
From: questions.anon at gmail.com (questions anon)
Date: Thu, 12 Apr 2012 11:15:18 +1000
Subject: [Numpy-discussion] mask array and add to list
Message-ID: 

I am trying to mask an array and then add the array to a list, so I can then go on and calculate the max, min and mean of that list. The mask seems to work when I check each array. I check each array by finding the max, min and mean and comparing with the unmasked array (they are different). However, when I add the array to the list and then check the max, min and mean and compare to the list of unmasked arrays, I find that they are the same (which they shouldn't be). The problem seems to be occurring when I add the masked array to the list. Is there something I am doing wrong or not doing? Any feedback will be greatly appreciated.

import numpy as np
import matplotlib.pyplot as plt
from numpy import ma as MA
from mpl_toolkits.basemap import Basemap
from datetime import datetime
import os
from StringIO import StringIO
from osgeo import gdal, gdalnumeric, ogr, osr
import glob
from datetime import date, timedelta
import matplotlib.dates as mdates
import time

shapefile=r"d:/Vic/Vic_dissolve.shp"

## Create masked array from shapefile
xmin,ymin,xmax,ymax=[111.975,-9.975, 156.275,-44.525]
ncols,nrows=[886, 691]
maskvalue = 1

xres=(xmax-xmin)/float(ncols)
yres=(ymax-ymin)/float(nrows)
geotransform=(xmin,xres,0,ymax,0, -yres)

src_ds = ogr.Open(shapefile)
src_lyr=src_ds.GetLayer()

dst_ds = gdal.GetDriverByName('MEM').Create('',ncols, nrows, 1 ,gdal.GDT_Byte)
dst_rb = dst_ds.GetRasterBand(1)
dst_rb.Fill(0) #initialise raster with zeros
dst_rb.SetNoDataValue(0)
dst_ds.SetGeoTransform(geotransform)

err = gdal.RasterizeLayer(dst_ds, [maskvalue], src_lyr)
dst_ds.FlushCache()

mask_arr=dst_ds.GetRasterBand(1).ReadAsArray()
np.set_printoptions(threshold='nan')
mask_arr[mask_arr == 255] = 1
newmask=MA.masked_equal(mask_arr,0)

### calculate monthly summary stats for VIC Only
rainmax=[]
rainmin=[]
rainmean=[]
rainmaxaust=[]
rainminaust=[]
rainmeanaust=[]
yearmonthlist=[]
yearmonth_int=[]

GLOBTEMPLATE = r"e:/Rainfall/rainfall-{year}/r{year}{month:02}??.txt"

def accumulate_month(year, month):
    files = glob.glob(GLOBTEMPLATE.format(year=year, month=month))
    monthlyrain=[]
    monthlyrainaust=[]
    for ifile in files:
        f=np.genfromtxt(ifile,skip_header=6)
        viconly_f=np.ma.masked_array(f, mask=newmask.mask)
        #print "f:", f.max(), f.mean()
        #print "viconly_f:", viconly_f.max(), viconly_f.mean()
        monthlyrain.append(viconly_f)
        monthlyrainaust.append(f)
    yearmonth=str(year)+str(month)
    yearmonthlist.append(yearmonth)
    r_max, r_mean, r_min=np.max(monthlyrain), np.mean(monthlyrain), np.min(monthlyrain)
    ra_max, ra_mean, ra_min=np.max(monthlyrainaust), np.mean(monthlyrainaust), np.min(monthlyrainaust)
    rainmax.append(r_max)
    rainmean.append(r_mean)
    rainmin.append(r_min)
    rainmaxaust.append(ra_max)
    rainminaust.append(ra_min)
    rainmeanaust.append(ra_mean)
    print "viconly:", yearmonth,r_max, r_mean, r_min
    print "aust:", yearmonth,ra_max, ra_mean, ra_min

###loop through months and years
stop_month = datetime(2011, 10, 01)
month = datetime(2011, 01, 01)
while month < stop_month:
    accumulate_month(month.year, month.month)
    month += timedelta(days=32)
    month = month.replace(day=01)

-------------- next part --------------
An HTML attachment was
scrubbed... URL: From mesanthu at gmail.com Wed Apr 11 23:38:20 2012 From: mesanthu at gmail.com (santhu kumar) Date: Wed, 11 Apr 2012 22:38:20 -0500 Subject: [Numpy-discussion] partial computations Message-ID: Hello all, I am trying to optimise a code and want your suggestions. A : - NX3 matrix (coordinates of N points) After performing pairwise distance computations(called pdist) between these points, depending upon a condition that the distance is in, I would perform further computations. Most of the computations require schur products (element by element) of NXN matrices with each other and then computing either the coloumn sum or row sum. As N goes to be large, these computations are taking some time (0.7 secs) which is not much generally but since this is being called many times, it acts as a bottleneck. I want to leverage on the fact that many of the NXN computations are not going to be used, or would be set to zero (if the pdist is greater than some minimum distance). How do i achieve it ?? Is masked array the elegant solution? Would it save me time? Thanks Santhosh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ceball at gmail.com Thu Apr 12 04:05:25 2012 From: ceball at gmail.com (Chris Ball) Date: Thu, 12 Apr 2012 08:05:25 +0000 (UTC) Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? Message-ID: Hi, I'm trying out various continuous integration options, so I happen to be testing NumPy on several platforms that I don't normally use. Recently, I've been getting a segmentation fault on Debian 6 (with Python 2.7.2): Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64 GNU/Linux (Debian GNU/Linux 6.0 \n \l) nosetests --verbose /home/slave/tmp/numpy/numpy/random/__init__.py:91: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility from mtrand import * test_api.test_fastCopyAndTranspose ... ok test_api.test_array_astype ... ok test_api.test_copyto_fromscalar ... ok test_api.test_copyto ... ok test_api.test_copyto_maskna ... ok test_api.test_copy_order ... ok Basic test of array2string. ... ok Test custom format function for each element in array. ... ok This should only apply to 0-D arrays. See #1218. ... ok test_arrayprint.TestArrayRepr.test_nan_inf ... ok test_str (test_arrayprint.TestComplexArray) ... ok test_arrayprint.TestPrintOptions.test_basic ... ok test_arrayprint.TestPrintOptions.test_formatter ... ok test_arrayprint.TestPrintOptions.test_formatter_reset ... ok Ticket 844. ... ok test_blasdot.test_blasdot_used ... SKIP: Skipping test: test_blasdot_used Numpy is not compiled with _dotblas test_blasdot.test_dot_2args ... ok test_blasdot.test_dot_3args ... ok test_blasdot.test_dot_3args_errors ... ok test_creation_overflow (test_datetime.TestDateTime) ... ok test_datetime_add (test_datetime.TestDateTime) ... ok test_datetime_arange (test_datetime.TestDateTime) ... ok test_datetime_array_find_type (test_datetime.TestDateTime) ... ok test_datetime_array_str (test_datetime.TestDateTime) ... ok test_datetime_as_string (test_datetime.TestDateTime) ... ok test_datetime_as_string_timezone (test_datetime.TestDateTime) ... /home/slave/ tmp/numpy/numpy/core/tests/test_datetime.py:1319: UserWarning: pytz not found, pytz compatibility tests skipped warnings.warn("pytz not found, pytz compatibility tests skipped") ok test_datetime_busday_holidays_count (test_datetime.TestDateTime) ... ok test_datetime_busday_holidays_offset (test_datetime.TestDateTime) ... 
ok test_datetime_busday_offset (test_datetime.TestDateTime) ... ok test_datetime_busdaycalendar (test_datetime.TestDateTime) ... ok test_datetime_casting_rules (test_datetime.TestDateTime) ... ok test_datetime_divide (test_datetime.TestDateTime) ... ok test_datetime_dtype_creation (test_datetime.TestDateTime) ... ok test_datetime_is_busday (test_datetime.TestDateTime) ... ok test_datetime_like (test_datetime.TestDateTime) ... ok test_datetime_maximum_reduce (test_datetime.TestDateTime) ... ok test_datetime_minmax (test_datetime.TestDateTime) ... ok test_datetime_multiply (test_datetime.TestDateTime) ... ok test_datetime_nat_casting (test_datetime.TestDateTime) ... ok test_datetime_scalar_construction (test_datetime.TestDateTime) ... ok test_datetime_string_conversion (test_datetime.TestDateTime) ... ERROR test_datetime_subtract (test_datetime.TestDateTime) ... Segmentation fault With Python 2.6 there doesn't seem to be a problem on the same machine. Unfortunately, I haven't had time to investigate (I don't have Debian 6 to use myself, and I just started a job that doesn't involve any Python...). However, according to the Jenkins instance on ShiningPanda.com, the problem began with these changes: BUG: ticket #1578, Fix python-debug warning for python >= 2.7. STY: Small style fixes. For now, that's all I can say; I haven't manually verified the problem myself (that it exists, or that it truly started after the changes above). I hope to be able to investigate further at the weekend, but I thought I'd post to the list now in case someone else can verify the problem. Chris Segmentation fault is buried in console output of Jenkins: https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/console The previous build was ok: https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/5/console Changes that Jenkins claims are responsible: https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/ changes#detail0 From holgerherrlich05 at arcor.de Thu Apr 12 07:02:31 2012 From: holgerherrlich05 at arcor.de (Holger Herrlich) Date: Thu, 12 Apr 2012 13:02:31 +0200 Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++ In-Reply-To: <4F833647.1070804@astro.uio.no> References: <4F7AF5C1.5090604@arcor.de> <4F81D81D.3030609@arcor.de> <4F833647.1070804@astro.uio.no> Message-ID: <4F86B647.8020801@arcor.de> On 04/09/2012 09:19 PM, Dag Sverre Seljebotn wrote: > On 04/08/2012 08:25 PM, Holger Herrlich wrote: >> >> That all sounds like no option -- sad. Cython is no solution cause, >> all I want is to leave Python Syntax in favor for strong OOP design >> patterns. > > I'm sorry, I'm trying and trying to make heads and tails of this > paragraph, but I don't manage to. If you could rephrase it that would > be very helpful. (But I'm afraid that if you believe that C++ is more > object-oriented than Python, you'll find most people disagree. > Perhaps you meant that you want static typing?) Yes and further, design patterns known to me are very java (static) specific. In Python you can often work around the "rules" (or intension). > Any wrapping tool (Cython, ctypes, probably SWIG but don't know) will > allow you to pass the pointer to the data array, the npy_intp* shape > array, and the npy_intp* strides array. > > Really, that's all you need. And given those three, writing a simple > C++ class wrapping the arrays and allowing you to conveniently index > into the array is done in 15 minutes. On the scipy-pages I miss C/C++ samples. 
I got some idea of what to include and what to link to by running distutils, but wonder why there is not a single gcc command line. (Anyway, read further.)

> If you need more than that -- that is, what you want is essentially
> to "use NumPy from C++", with slicing and ufuncs and reductions and
> so on -- then you should probably look into a C++ array library (such
> as Eigen or Blitz++ or the array stuff in boost). Then, pass the
> aforementioned data and shape/stride arrays to the array library.

I see a bit clearer now that the task can be split up. First, a shared library will be used by Python (embedding C in Python). Ctypes might do that. SWIG also. Second, to benefit from: slicing, in situ calculation (ufunc?) and using the histogram[2D]() functions as I do in Python. It's not necessary, for the second approach, to use Python at all. It should be possible to compile a shared library.

Both will become mixed up if the ndarray crosses the library boundary, maybe towards Python. More complicated SWIG. Taking care of deallocation.

So I will have a look at Blitz++, Boost, or native C++ arrays (here performance will come into play), but I also consider the concept of using NumPy-arrays. Or the concept of converting the array at an output layer.

I'm not that familiar yet with the C-API of NumPy, but do I see it right that in using NumPy a whole Python is embedded into the program? Including the garbage collector. Is this why distutils always links with pthread (Linux)?

still reading, Holger
(And hopefully this will be better to read.)

> Dag _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From d.s.seljebotn at astro.uio.no Thu Apr 12 07:37:49 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 12 Apr 2012 13:37:49 +0200
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F86B647.8020801@arcor.de>
References: <4F7AF5C1.5090604@arcor.de> <4F81D81D.3030609@arcor.de> <4F833647.1070804@astro.uio.no> <4F86B647.8020801@arcor.de>
Message-ID: <4F86BE8D.1040007@astro.uio.no>

On 04/12/2012 01:02 PM, Holger Herrlich wrote:
>
> On 04/09/2012 09:19 PM, Dag Sverre Seljebotn wrote:
>> On 04/08/2012 08:25 PM, Holger Herrlich wrote:
>>>
>>> That all sounds like no option -- sad. Cython is no solution cause,
>>> all I want is to leave Python Syntax in favor for strong OOP design
>>> patterns.
>>
>> I'm sorry, I'm trying and trying to make heads and tails of this
>> paragraph, but I don't manage to. If you could rephrase it that would
>> be very helpful. (But I'm afraid that if you believe that C++ is more
>> object-oriented than Python, you'll find most people disagree.
>> Perhaps you meant that you want static typing?)
>
> Yes and further, design patterns known to me are very java (static)
> specific. In Python you can often work around the "rules" (or intension).

Most design patterns I know work well in Python as well, but may sometimes be overkill because Python has simpler mechanisms to do the same compared to Java. E.g., I often use the visitor pattern in Python.

Yes, the Python philosophy is that we don't waste time on assuming that the user of our code doesn't abuse it. "If you break it, you get to keep the pieces."

>
>> Any wrapping tool (Cython, ctypes, probably SWIG but don't know) will
>> allow you to pass the pointer to the data array, the npy_intp* shape
>> array, and the npy_intp* strides array.
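[As a concrete illustration of the quoted point, here is a minimal sketch (not from the thread) of such a thin C++ wrapper over a data pointer plus shape/strides arrays. "intp" stands in for npy_intp -- real code would include the NumPy headers instead -- and NumPy strides are byte offsets, hence the char* arithmetic:

#include <cstddef>

// Non-owning view over a NumPy-owned 2D double buffer; it only stores
// the data pointer and the shape/strides arrays passed in from Python.
typedef std::ptrdiff_t intp;  // assumption: matches npy_intp's width

class ArrayView2D {
public:
    ArrayView2D(double* data, const intp* shape, const intp* strides)
        : data_(data), shape_(shape), strides_(strides) {}

    // Element access: strides are byte offsets into the buffer.
    double& operator()(intp i, intp j) const {
        char* p = reinterpret_cast<char*>(data_);
        return *reinterpret_cast<double*>(p + i * strides_[0] + j * strides_[1]);
    }

    intp rows() const { return shape_[0]; }
    intp cols() const { return shape_[1]; }

private:
    double* data_;
    const intp* shape_;
    const intp* strides_;
};
]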
>>
>> Really, that's all you need. And given those three, writing a simple
>> C++ class wrapping the arrays and allowing you to conveniently index
>> into the array is done in 15 minutes.
>
> On the scipy-pages I miss C/C++ samples. I got some idea of what to
> include and what to link to by running distutils, but wonder why there is
> not a single gcc command line. (Anyway, read further.)

It's difficult to provide such examples because either you know enough technical details that what to do is "obvious", or the amount of education needed (about how C works and how CPython works) is too much to have in such a place, and it is not specific to the SciPy project. There is documentation on docs.scipy.org about the C interface to NumPy.

>
>> If you need more than that -- that is, what you want is essentially
>> to "use NumPy from C++", with slicing and ufuncs and reductions and
>> so on -- then you should probably look into a C++ array library (such
>> as Eigen or Blitz++ or the array stuff in boost). Then, pass the
>> aforementioned data and shape/stride arrays to the array library.
>
> I see a bit clearer now that the task can be split up. First, a
> shared library will be used by Python (embedding C in Python). Ctypes
> might do that. SWIG also. Second, to benefit from: slicing, in situ
> calculation (ufunc?) and using the histogram[2D]() functions as I do in
> Python. It's not necessary, for the second approach, to use Python at
> all. It should be possible to compile a shared library.
>
> Both will become mixed up if the ndarray crosses the library boundary,
> maybe towards Python. More complicated SWIG. Taking care of deallocation.
>
> So I will have a look at Blitz++, Boost, or native C++ arrays (here
> performance will come into play), but I also consider the concept of using
> NumPy-arrays. Or the concept of converting the array at an output layer.

Look at Eigen too. And be advised that "native C++ arrays" (if there is such a thing) should only be used in the 1D form; i.e. where, to access element (i, j) in a 2D array, you do "arr[n * j + i]" on a "double*", NOT where you do arr[i][j] on a "double**" (the latter simply doesn't work well with any scientific codes).

> I'm not that familiar yet with the C-API of NumPy, but do I see it right
> that in using NumPy a whole Python is embedded into the program?

Yes (of course). Remember that large parts of the NumPy package are written in Python code too; embedding NumPy means embedding both NumPy and Python.

"Using NumPy (the library) from C++" doesn't really make sense and no tools will let you do that easily; the idea is to transfer the *data* = what the NumPy array wraps. If you want to use C++, it's because you want to use C++ tools anyway, right? If you just want to use "Python with types" then Cython is really the only option.

> Including the garbage collector. Is this why distutils always links
> with pthread (Linux)?

I believe distutils isn't too smart; it always uses the flags that were used to compile Python, and Python was compiled with pthread on your machine.

Dag

From ben.root at ou.edu Thu Apr 12 08:51:17 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 12 Apr 2012 08:51:17 -0400
Subject: [Numpy-discussion] partial computations
In-Reply-To: 
References: 
Message-ID: 

On Wed, Apr 11, 2012 at 11:38 PM, santhu kumar wrote:

> Hello all,
>
> I am trying to optimise a code and want your suggestions.
> A : - NX3 matrix (coordinates of N points)
>
> After performing pairwise distance computations(called pdist) between
> these points, depending upon a condition that the distance is in, I would
> perform further computations.
> Most of the computations require schur products (element by element) of
> NXN matrices with each other and then computing either the coloumn sum or
> row sum.
>
> As N goes to be large, these computations are taking some time (0.7 secs)
> which is not much generally but since this is being called many times, it
> acts as a bottleneck.
> I want to leverage on the fact that many of the NXN computations are not
> going to be used, or would be set to zero (if the pdist is greater than
> some minimum distance).
>
> How do i achieve it ?? Is masked array the elegant solution? Would it save
> me time?
>
> Thanks
> Santhosh
>

You might want to consider using scipy.spatial's KDTree as a way to efficiently find all points that are within a specified distance from each other. Then, using those pairs, load up a sparse array with only the relevant pairs. It should save in computation and memory as well.

Cheers!
Ben Root

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chaoyuejoy at gmail.com Thu Apr 12 09:07:09 2012
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 12 Apr 2012 15:07:09 +0200
Subject: [Numpy-discussion] change masked array member values with conditional selection
Message-ID: 

Dear all numpy users,

I am using numpy 1.6.1

I find that if you want to change some member values in a masked array according to some conditional selection. suppose a is a masked array, you want to change all value below zero to zero. you must always use

a[np.nonzero(a<0)]=0

rather than a[a<0]=0.

the latter will lose all masked elements.

an example:

In [24]: a=np.arange(10.)

In [25]: a=np.ma.masked_array(a,mask=a<3)

In [28]: a[a<5]=2.

In [29]: a
Out[29]:
masked_array(data = [2.0 2.0 2.0 2.0 2.0 5.0 6.0 7.0 8.0 9.0],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)

In [30]: b=np.arange(10.)

In [31]: b=np.ma.masked_array(b,mask=b<3)

In [34]: b[np.nonzero(b<5)]=2.

In [35]: b
Out[35]:
masked_array(data = [-- -- -- 2.0 2.0 5.0 6.0 7.0 8.0 9.0],
             mask = [ True True True False False False False False False False],
       fill_value = 1e+20)

cheers,

Chao
--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim at cerazone.net Thu Apr 12 09:41:56 2012
From: tim at cerazone.net (Tim Cera)
Date: Thu, 12 Apr 2012 09:41:56 -0400
Subject: [Numpy-discussion] mask array and add to list
In-Reply-To: 
References: 
Message-ID: 

Use 'ma.max' instead of 'np.max'. This might be a bug OR an undocumented feature. :-)

import numpy.ma as ma
marr = ma.array(range(10), mask=[0,0,0,0,0,1,1,1,1,1])

np.max(marr)
4      # mask is used

a = []
a.append(marr)
a.append(marr)

np.max(a)
9      # mask is not used

ma.max(a)
4      # mask is used

Kindest regards,
Tim

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Thu Apr 12 10:16:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Apr 2012 08:16:22 -0600 Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 2:05 AM, Chris Ball wrote: > Hi, > > I'm trying out various continuous integration options, so I happen to be > testing NumPy on several platforms that I don't normally use. > > Recently, I've been getting a segmentation fault on Debian 6 (with Python > 2.7.2): > > Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 > x86_64 > GNU/Linux (Debian GNU/Linux 6.0 \n \l) > > nosetests --verbose > /home/slave/tmp/numpy/numpy/random/__init__.py:91: RuntimeWarning: > numpy.ndarray size changed, may indicate binary incompatibility > from mtrand import * > test_api.test_fastCopyAndTranspose ... ok > test_api.test_array_astype ... ok > test_api.test_copyto_fromscalar ... ok > test_api.test_copyto ... ok > test_api.test_copyto_maskna ... ok > test_api.test_copy_order ... ok > Basic test of array2string. ... ok > Test custom format function for each element in array. ... ok > This should only apply to 0-D arrays. See #1218. ... ok > test_arrayprint.TestArrayRepr.test_nan_inf ... ok > test_str (test_arrayprint.TestComplexArray) ... ok > test_arrayprint.TestPrintOptions.test_basic ... ok > test_arrayprint.TestPrintOptions.test_formatter ... ok > test_arrayprint.TestPrintOptions.test_formatter_reset ... ok > Ticket 844. ... ok > test_blasdot.test_blasdot_used ... SKIP: Skipping test: test_blasdot_used > Numpy is not compiled with _dotblas > test_blasdot.test_dot_2args ... ok > test_blasdot.test_dot_3args ... ok > test_blasdot.test_dot_3args_errors ... ok > test_creation_overflow (test_datetime.TestDateTime) ... ok > test_datetime_add (test_datetime.TestDateTime) ... ok > test_datetime_arange (test_datetime.TestDateTime) ... ok > test_datetime_array_find_type (test_datetime.TestDateTime) ... ok > test_datetime_array_str (test_datetime.TestDateTime) ... ok > test_datetime_as_string (test_datetime.TestDateTime) ... ok > test_datetime_as_string_timezone (test_datetime.TestDateTime) ... > /home/slave/ > tmp/numpy/numpy/core/tests/test_datetime.py:1319: UserWarning: pytz not > found, > pytz compatibility tests skipped > warnings.warn("pytz not found, pytz compatibility tests skipped") > ok > test_datetime_busday_holidays_count (test_datetime.TestDateTime) ... ok > test_datetime_busday_holidays_offset (test_datetime.TestDateTime) ... ok > test_datetime_busday_offset (test_datetime.TestDateTime) ... ok > test_datetime_busdaycalendar (test_datetime.TestDateTime) ... ok > test_datetime_casting_rules (test_datetime.TestDateTime) ... ok > test_datetime_divide (test_datetime.TestDateTime) ... ok > test_datetime_dtype_creation (test_datetime.TestDateTime) ... ok > test_datetime_is_busday (test_datetime.TestDateTime) ... ok > test_datetime_like (test_datetime.TestDateTime) ... ok > test_datetime_maximum_reduce (test_datetime.TestDateTime) ... ok > test_datetime_minmax (test_datetime.TestDateTime) ... ok > test_datetime_multiply (test_datetime.TestDateTime) ... ok > test_datetime_nat_casting (test_datetime.TestDateTime) ... ok > test_datetime_scalar_construction (test_datetime.TestDateTime) ... ok > test_datetime_string_conversion (test_datetime.TestDateTime) ... ERROR > test_datetime_subtract (test_datetime.TestDateTime) ... 
Segmentation fault > > I don't see this here, also on AMD x64_86 and Python 2.7. I suspect different unicode sizes leading to different code paths, or possibly the need for a clean install. Grr, now I've got to install debian somewhere... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Thu Apr 12 11:17:25 2012 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 12 Apr 2012 17:17:25 +0200 Subject: [Numpy-discussion] change masked array member values with conditional selection In-Reply-To: References: Message-ID: Ciao Chao, That known quirk deserves to be better documented, I agree. There's a simple explanation for this behavior: Because `a` is a masked array, `(a < 5)` is also a masked array with dtype=np.bool, and whose mask is the same as `a`'s. In your example, that's: masked_array(data = [-- -- -- True True False False False False False], mask = [ True True True False False False False False False False], fill_value = True) Now, what should we do with the masked entries ? Should we consider them as False? As True? That's up to you, actually... Because it's never a good idea to use masked arrays as condition (as you just experienced), I advise you to be explicit. In your case, that'd be >>> a[(a<5).filled(False)] = 2 If you go in the source code of numpy.ma.core, in the __getitem__/__setitem__ methods, you'll find a little warning that I commented (because numpy.ma is already slow enough that I didn't want to make it even slower)... On 4/12/12, Chao YUE wrote: > Dear all numpy users, > > I am using numpy 1.6.1 > > I find that if you want to change some member values in a masked array > according to some conditional selection. > suppose a is a masked array, you want to change all value below zero to > zero. > you must always use > > a[np.nonzero(a<0)]=0 > > rather than a[a<0]=0. > > the latter will lose all masked elements. > > > an example: > In [24]: a=np.arange(10.) > > In [25]: a=np.ma.masked_array(a,mask=a<3) > > In [28]: a[a<5]=2. > > In [29]: a > Out[29]: > masked_array(data = [2.0 2.0 2.0 2.0 2.0 5.0 6.0 7.0 8.0 9.0], > mask = [False False False False False False False False False > False], > fill_value = 1e+20) > > > > In [30]: b=np.arange(10.) > > In [31]: b=np.ma.masked_array(b,mask=b<3) > > In [34]: b[np.nonzero(b<5)]=2. > > In [35]: b > Out[35]: > masked_array(data = [-- -- -- 2.0 2.0 5.0 6.0 7.0 8.0 9.0], > mask = [ True True True False False False False False False > False], > fill_value = 1e+20) > > cheers, > > Chao > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > ************************************************************************************ > From bryanv at continuum.io Thu Apr 12 11:37:32 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Thu, 12 Apr 2012 10:37:32 -0500 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> Message-ID: <4F86F6BC.3060503@continuum.io> On 4/10/12 2:40 PM, Ralf Gommers wrote: > > > On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven > wrote: > > On 4/3/12 4:18 PM, Ralf Gommers wrote: > > Here some first impressions. > > > > The good: > > - It's responsive! 
> > - It remembers my preferences (view type, # of issues per page, > etc.) > > - Editing multiple issues with the command window is easy. > > - Search and filter functionality is powerful > > > > The bad: > > - Multiple projects are supported, but issues are then really mixed. > > The way this works doesn't look very useful for combined admin of > > numpy/scipy trackers. > > - I haven't found a way yet to make versions and subsystems > appear in > > the one-line issue overview. > > - Fixed issues are still shown by default. There are several open > > issues filed against youtrack about this, with no reasonable > answers. > > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only > > downloaded. > > - No direct VCS integration, only via Teamcity (not set up, so can't > > evaluate). > > - No useful default views as in Trac > > (http://projects.scipy.org/scipy/report). > > Ralf, regarding some of the issues: > > > Hi Bryan, thanks for looking into this. > > > I think for numpy/scipy trackers, we could simply run separate > instances > of YouTrack for each. > > > That would work. It does mean that there's no maintenance advantage > over using Trac here. > > Also we can certainly create some standard > queries. It's a small pain not to have useful defaults, but it's > only a > one-time pain. :) > > > That should help. > > Also, what kind of integration are you looking for with github? There > does appear to be the ability to issue commands to youtrack > through git > commits, which does not depend on TeamCity, as best I can tell: > > http://confluence.jetbrains.net/display/YTD3/GitHub+Integration > http://blogs.jetbrains.com/youtrack/tag/github-integration/ > > I'm not sure this is what you were thinking about though. > > > That does help. The other thing that's useful is to reference commits > (like commit:abcd123 in current Trac) and have them turned into links > to commits on Github. This is not a showstopper for me though. > > > For the other issues, Maggie or I can try and see what we can find out > about implementing them, or working around them, this week. > > > I'd say that from the issues I mentioned, the biggest one is the > one-line view. So these two: > - I haven't found a way yet to make versions and subsystems appear in > the one-line issue overview. > - Fixed issues are still shown by default. There are several open > issues filed against youtrack about this, with no reasonable answers. Ralf, I don't believe there is a solution for the first issue. There are tickets on YouTrack filed specifically asking for this feature, but it does not seem clear they want to implement it. For the second, I created a saved search called "open" that I think should show up for all users (let me know if it does not). The nice thing is, this save search can be referenced in other searches, so you can do: saved search: open Subsystem: test3 sort by: updated and get all the open tickets for that subsystem. I think basically it's just a different type of workflow, fixed tickets show up "by default" because everything shows up by default, there is no search criteria to exclude them. But it seems easy enough to combine searches to get what you want. Another trac feature that I did realize is missing is a "group by" functionality. So you can sort by subsytem, but there is no notion of nicely grouping by them as in trac. There's a feature request for this, too, but who knows if or when it will get put in. 
Travis mentioned he had created a code.google.com site for numpy a long time ago that never got used. I think Maggie is going to create a few dozen test tickets on its issue tracker today and then we can also have that to evaluate as well. Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Thu Apr 12 11:59:10 2012 From: lists at hilboll.de (Andreas H.) Date: Thu, 12 Apr 2012 17:59:10 +0200 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> Message-ID: <59f45acc84318c05f53067acc40fc905.squirrel@srv2.s4y.tournesol-consulting.eu> Have you guys actually thought about JIRA? Atlassian offers free licences for open source projects ... Cheers, Andreas. From chaoyuejoy at gmail.com Thu Apr 12 12:01:43 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 12 Apr 2012 18:01:43 +0200 Subject: [Numpy-discussion] change masked array member values with conditional selection In-Reply-To: References: Message-ID: Thanks Pierre. It's a good idea to always use a[(a<5).filled(False)] = 2 I don't understand very well the underlying structure but it's good to know some. Chao 2012/4/12 Pierre GM > Ciao Chao, > > That known quirk deserves to be better documented, I agree. > > There's a simple explanation for this behavior: > Because `a` is a masked array, `(a < 5)` is also a masked array with > dtype=np.bool, and whose mask is the same as `a`'s. In your example, > that's: > masked_array(data = [-- -- -- True True False False False False False], > mask = [ True True True False False False False False > False False], > fill_value = True) > Now, what should we do with the masked entries ? Should we consider > them as False? As True? That's up to you, actually... > Because it's never a good idea to use masked arrays as condition (as > you just experienced), I advise you to be explicit. In your case, > that'd be > >>> a[(a<5).filled(False)] = 2 > > If you go in the source code of numpy.ma.core, in the > __getitem__/__setitem__ methods, you'll find a little warning that I > commented (because numpy.ma is already slow enough that I didn't want > to make it even slower)... > > On 4/12/12, Chao YUE wrote: > > Dear all numpy users, > > > > I am using numpy 1.6.1 > > > > I find that if you want to change some member values in a masked array > > according to some conditional selection. > > suppose a is a masked array, you want to change all value below zero to > > zero. > > you must always use > > > > a[np.nonzero(a<0)]=0 > > > > rather than a[a<0]=0. > > > > the latter will lose all masked elements. > > > > > > an example: > > In [24]: a=np.arange(10.) > > > > In [25]: a=np.ma.masked_array(a,mask=a<3) > > > > In [28]: a[a<5]=2. > > > > In [29]: a > > Out[29]: > > masked_array(data = [2.0 2.0 2.0 2.0 2.0 5.0 6.0 7.0 8.0 9.0], > > mask = [False False False False False False False False > False > > False], > > fill_value = 1e+20) > > > > > > > > In [30]: b=np.arange(10.) > > > > In [31]: b=np.ma.masked_array(b,mask=b<3) > > > > In [34]: b[np.nonzero(b<5)]=2. 
> > > > In [35]: b > > Out[35]: > > masked_array(data = [-- -- -- 2.0 2.0 5.0 6.0 7.0 8.0 9.0], > > mask = [ True True True False False False False False > False > > False], > > fill_value = 1e+20) > > > > cheers, > > > > Chao > > -- > > > *********************************************************************************** > > Chao YUE > > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > > UMR 1572 CEA-CNRS-UVSQ > > Batiment 712 - Pe 119 > > 91191 GIF Sur YVETTE Cedex > > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Apr 12 12:04:26 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 12 Apr 2012 18:04:26 +0200 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: <4F86F6BC.3060503@continuum.io> References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> <4F86F6BC.3060503@continuum.io> Message-ID: On Thu, Apr 12, 2012 at 5:37 PM, Bryan Van de Ven wrote: > On 4/10/12 2:40 PM, Ralf Gommers wrote: > > > > On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven wrote: > >> On 4/3/12 4:18 PM, Ralf Gommers wrote: >> > Here some first impressions. >> > >> > The good: >> > - It's responsive! >> > - It remembers my preferences (view type, # of issues per page, etc.) >> > - Editing multiple issues with the command window is easy. >> > - Search and filter functionality is powerful >> > >> > The bad: >> > - Multiple projects are supported, but issues are then really mixed. >> > The way this works doesn't look very useful for combined admin of >> > numpy/scipy trackers. >> > - I haven't found a way yet to make versions and subsystems appear in >> > the one-line issue overview. >> > - Fixed issues are still shown by default. There are several open >> > issues filed against youtrack about this, with no reasonable answers. >> > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only >> > downloaded. >> > - No direct VCS integration, only via Teamcity (not set up, so can't >> > evaluate). >> > - No useful default views as in Trac >> > (http://projects.scipy.org/scipy/report). >> >> Ralf, regarding some of the issues: >> > > Hi Bryan, thanks for looking into this. > >> >> I think for numpy/scipy trackers, we could simply run separate instances >> of YouTrack for each. > > > That would work. It does mean that there's no maintenance advantage over > using Trac here. > > Also we can certainly create some standard >> queries. It's a small pain not to have useful defaults, but it's only a >> one-time pain. :) >> > > That should help. > > >> Also, what kind of integration are you looking for with github? 
There >> does appear to be the ability to issue commands to youtrack through git >> commits, which does not depend on TeamCity, as best I can tell: >> >> http://confluence.jetbrains.net/display/YTD3/GitHub+Integration >> http://blogs.jetbrains.com/youtrack/tag/github-integration/ >> >> I'm not sure this is what you were thinking about though. >> > > That does help. The other thing that's useful is to reference commits > (like commit:abcd123 in current Trac) and have them turned into links to > commits on Github. This is not a showstopper for me though. > >> >> For the other issues, Maggie or I can try and see what we can find out >> about implementing them, or working around them, this week. >> > > I'd say that from the issues I mentioned, the biggest one is the one-line > view. So these two: > - I haven't found a way yet to make versions and subsystems appear in > the one-line issue overview. > - Fixed issues are still shown by default. There are several open > issues filed against youtrack about this, with no reasonable answers. > > > Ralf, > > I don't believe there is a solution for the first issue. There are tickets > on YouTrack filed specifically asking for this feature, but it does not > seem clear they want to implement it. > > For the second, I created a saved search called "open" that I think should > show up for all users (let me know if it does not). The nice thing is, this > save search can be referenced in other searches, so you can do: > > saved search: open Subsystem: test3 sort by: updated > > and get all the open tickets for that subsystem. I think basically it's > just a different type of workflow, fixed tickets show up "by default" > because everything shows up by default, there is no search criteria to > exclude them. But it seems easy enough to combine searches to get what you > want. > Well, not that easy. For example, if I want to go through all open tickets and get an overview of how many open tickets there are for each scipy module. In Trac I can just sort by "component" and see the (approximate) answer. In Youtrack I'd have to execute "saved search: open Subsystem:xxx" once for each module. Of course a tracker with a useful REST API where you could get the exact answer with a few lines of Python code would be even better.... Another trac feature that I did realize is missing is a "group by" > functionality. So you can sort by subsytem, but there is no notion of > nicely grouping by them as in trac. There's a feature request for this, > too, but who knows if or when it will get put in. > Agreed. > > Travis mentioned he had created a code.google.com site for numpy a long > time ago that never got used. I think Maggie is going to create a few dozen > test tickets on its issue tracker today and then we can also have that to > evaluate as well. > Is that necessary? I think there's a reason no one has suggested it so far - it has very few features. Plus it's blocked in China (and probably other countries too), which is enough of a problem in my opinion to immediately dismiss it. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Thu Apr 12 12:06:11 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 12 Apr 2012 18:06:11 +0200 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: <59f45acc84318c05f53067acc40fc905.squirrel@srv2.s4y.tournesol-consulting.eu> References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> <59f45acc84318c05f53067acc40fc905.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Thu, Apr 12, 2012 at 5:59 PM, Andreas H. wrote: > Have you guys actually thought about JIRA? Atlassian offers free licences > for open source projects ... > Yes, http://article.gmane.org/gmane.comp.python.numeric.general/48224/match=jira Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Thu Apr 12 12:23:25 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 12 Apr 2012 09:23:25 -0700 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <26FC23E7C398A64083C980D16001012D2E4ABCD510@VA3DIAXVS361.RED001.local>, Message-ID: <26FC23E7C398A64083C980D16001012D2E4ABCD513@VA3DIAXVS361.RED001.local> ________________________________ >> Example: >> lib = ctypes.CDLL('libm.dylib') >> address_as_integer = ctypes.cast(lib.sin, ctypes.c_void_p).value Excellent! Sorry for the hijack, thanks for rhe ride, Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Apr 12 12:43:00 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 12 Apr 2012 18:43:00 +0200 Subject: [Numpy-discussion] YouTrack testbed In-Reply-To: References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> Message-ID: On Tue, Apr 10, 2012 at 9:53 PM, David Cournapeau wrote: > > > On Tue, Apr 10, 2012 at 8:40 PM, Ralf Gommers > wrote: > >> >> >> On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven wrote: >> >>> On 4/3/12 4:18 PM, Ralf Gommers wrote: >>> > Here some first impressions. >>> > >>> > The good: >>> > - It's responsive! >>> > - It remembers my preferences (view type, # of issues per page, etc.) >>> > - Editing multiple issues with the command window is easy. >>> > - Search and filter functionality is powerful >>> > >>> > The bad: >>> > - Multiple projects are supported, but issues are then really mixed. >>> > The way this works doesn't look very useful for combined admin of >>> > numpy/scipy trackers. >>> > - I haven't found a way yet to make versions and subsystems appear in >>> > the one-line issue overview. >>> > - Fixed issues are still shown by default. There are several open >>> > issues filed against youtrack about this, with no reasonable answers. >>> > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only >>> > downloaded. >>> > - No direct VCS integration, only via Teamcity (not set up, so can't >>> > evaluate). >>> > - No useful default views as in Trac >>> > (http://projects.scipy.org/scipy/report). >>> >>> Ralf, regarding some of the issues: >>> >> >> Hi Bryan, thanks for looking into this. >> >>> >>> I think for numpy/scipy trackers, we could simply run separate instances >>> of YouTrack for each. >> >> >> That would work. It does mean that there's no maintenance advantage over >> using Trac here. 
>> >> Also we can certainly create some standard
>>> queries. It's a small pain not to have useful defaults, but it's only a
>>> one-time pain. :)
>>>
>>
>> That should help.
>>
>>
>>> Also, what kind of integration are you looking for with github? There
>>> does appear to be the ability to issue commands to youtrack through git
>>> commits, which does not depend on TeamCity, as best I can tell:
>>>
>>> http://confluence.jetbrains.net/display/YTD3/GitHub+Integration
>>> http://blogs.jetbrains.com/youtrack/tag/github-integration/
>>>
>>> I'm not sure this is what you were thinking about though.
>>>
>>
>> That does help. The other thing that's useful is to reference commits
>> (like commit:abcd123 in current Trac) and have them turned into links to
>> commits on Github. This is not a showstopper for me though.
>>
>>>
>>> For the other issues, Maggie or I can try and see what we can find out
>>> about implementing them, or working around them, this week.
>>>
>>
>> I'd say that from the issues I mentioned, the biggest one is the one-line
>> view. So these two:
>>
>> - I haven't found a way yet to make versions and subsystems appear in
>> the one-line issue overview.
>> - Fixed issues are still shown by default. There are several open
>> issues filed against youtrack about this, with no reasonable answers.
>>
>>
>>> Of course, we'd like to evaluate any other viable issue trackers as
>>>
>>> well. Do you have any suggestions for other systems besides YouTrack?
>>>
>>
>> David wrote up some issues (some of which I didn't check) with current
>> Trac and looked at Redmine before. He also mentioned Roundup. See
>> http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow
>>
>> Redmine does look good from a quick browse (better view, does display
>> diffs). It would be good to get the opinions of a few more people on this
>> topic.
>>
>
> Redmine is "trac on RoR", but it solves two significant issues over trac:
> - mass edit (e.g. moving things to a new mileston is simple and doable
> from the UI)
> - REST API by default, so that we can build simple command line tools on
> top of it (this changed since I made the wiki page)
>
> It is a PITA to install, though, at least if you are not familiar with
> ruby, and I heard it is hard to manage as well.
>

Thanks, that's a clear description of pros and cons. It's also easy to play with Redmine at demo.redmine.org. That site allows you to set up a new project and try the admin interface.

My current list of preferences is:

1. Redmine (if admin overhead is not unreasonable)
2. Trac with performance issues solved
3. Github
4. YouTrack
5. Trac with current performance

Ralf

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com Thu Apr 12 13:24:39 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 12 Apr 2012 18:24:39 +0100
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: 
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no>
Message-ID: 

On Wed, Apr 11, 2012 at 10:23 PM, Travis Oliphant wrote:
>>> In the mean-time, I think we could do as Robert essentially suggested and just use Capsule Objects around an agreed-upon simple C-structure:
>>>
>>>      int   id   /* Some number that can be used as a "type-check" */
>>>      void *func;
>>>      char *string;
>>>
>>> We can then just create some nice functions to go to and from this form in NumPy ctypeslib and then use this while the Python PEP gets written and adopted.
>>
>> What is not clear to me is how one get from the Python callable to the
>> capsule.
>
> This varies substantially based on the tool. Numba would do it's work and create the capsule object using it's approach. Cython would use a different approach.
>
> I would also propose to have in NumPy some basic functions that go back-and forth between this representation, ctypes, and any other useful representations that might emerge.
>
>>
>> Or do you simply intend to pass a non-callable capsule as an argument in
>> place of the callback?
>
> I had simply intended to allow a non-callable capsule argument to be passed in instead of another call-back to any SciPy or NumPy function that can take a raw C-function pointer.

If the cython folks are worried about type-checking overhead, then PyCapsule seems sub-optimal, because it's unnecessarily complicated to determine what sort of PyCapsule you have, and then extract the actual C struct. (At a minimum, it requires two calls to non-inlineable functions, plus an unnecessary pointer indirection.)

A tiny little custom class in a tiny little library that everyone can share might be better? (Bonus: a custom class could define a __call__ method that used ctypes to call the function directly, for interactive convenience/testing/etc.)

-- Nathaniel

From holgerherrlich05 at arcor.de Thu Apr 12 13:51:48 2012
From: holgerherrlich05 at arcor.de (Holger Herrlich)
Date: Thu, 12 Apr 2012 19:51:48 +0200
Subject: [Numpy-discussion] creating/working NumPy-ndarrays in C++
In-Reply-To: <4F86BE8D.1040007@astro.uio.no>
References: <4F7AF5C1.5090604@arcor.de> <4F81D81D.3030609@arcor.de> <4F833647.1070804@astro.uio.no> <4F86B647.8020801@arcor.de> <4F86BE8D.1040007@astro.uio.no>
Message-ID: <4F871634.9040002@arcor.de>

> If you want to use C++, it's because you want to use C++ tools anyway,
> right?

Some tools like autodia for class diagrams etc. Main reason to use C++ is complexity. Then "..., you get to keep the pieces." becomes too likely.

I see, world's upside down, my GUI runs wxPython. ;)

Thanks to all, Holger

From hmgaudecker at gmail.com Thu Apr 12 14:21:22 2012
From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker)
Date: Thu, 12 Apr 2012 20:21:22 +0200
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To: 
References: 
Message-ID: 

On 12.04.2012, at 18:38, numpy-discussion-request at scipy.org wrote:

>>> Redmine does look good from a quick browse (better view, does display
>>> diffs). It would be good to get the opinions of a few more people on this
>>> topic.
>>>
>>
>> Redmine is "trac on RoR", but it solves two significant issues over trac:
>> - mass edit (e.g. moving things to a new mileston is simple and doable
>> from the UI)
>> - REST API by default, so that we can build simple command line tools on
>> top of it (this changed since I made the wiki page)
>>
>> It is a PITA to install, though, at least if you are not familiar with
>> ruby, and I heard it is hard to manage as well.
>>
>
> Thanks, that's a clear description of pros and cons. It's also easy to play
> with Redmine at demo.redmine.org. That site allows you to set up a new
> project and try the admin interface.
>
> My current list of preferences is:
>
> 1. Redmine (if admin overhead is not unreasonable)
> 2. Trac with performance issues solved
> 3. Github
> 4. YouTrack
> 5.
Trac with current performance I've been running a Redmine server for a couple of years now, managing lots of small scientific projects. Setup was not trivial back then but if you can manage to run it under Ubuntu it should install smoothly, there seems to be a package for it nowadays. Maintenance has been practically zero over the period. Except for one issue during setup that doesn't apply to NumPy (automatic creation of a Subversion repository when a project is created), I never noticed the fact that it is written in Ruby. The overall experience has been much nicer than with Trac, but I can't comment much on the advanced issue tracker functionality, which seems to be the crucial point here. Best, Hans-Martin From d.s.seljebotn at astro.uio.no Thu Apr 12 15:08:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 12 Apr 2012 21:08:35 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> Message-ID: <4F872833.4030309@astro.uio.no> On 04/12/2012 07:24 PM, Nathaniel Smith wrote: > On Wed, Apr 11, 2012 at 10:23 PM, Travis Oliphant wrote: >>>> In the mean-time, I think we could do as Robert essentially suggested and just use Capsule Objects around an agreed-upon simple C-structure: >>>> >>>> int id /* Some number that can be used as a "type-check" */ >>>> void *func; >>>> char *string; >>>> >>>> We can then just create some nice functions to go to and from this form in NumPy ctypeslib and then use this while the Python PEP gets written and adopted. >>> >>> What is not clear to me is how one get from the Python callable to the >>> capsule. >> >> This varies substantially based on the tool. Numba would do it's work and create the capsule object using it's approach. Cython would use a different approach. >> >> I would also propose to have in NumPy some basic functions that go back-and forth between this representation, ctypes, and any other useful representations that might emerge. >> >>> >>> Or do you simply intend to pass a non-callable capsule as an argument in >>> place of the callback? >> >> I had simply intended to allow a non-callable capsule argument to be passed in instead of another call-back to any SciPy or NumPy function that can take a raw C-function pointer. > > If the cython folks are worried about type-checking overhead, then > PyCapsule seems sub-optimal, because it's unnecessarily complicated to > determine what sort of PyCapsule you have, and then extract the actual > C struct. (At a minimum, it requires two calls to non-inlineable > functions, plus an unnecessary pointer indirection.) I think this discussion is moot -- the way I reverse-engineer Travis is that there's no time for a cross-project discussion about this now. That's not too bad, Cython will go its own way (eventually), and perhaps we can merge in the future... But for the entertainment value: In my CEP [1] I descripe two access mechanisms, one slow (for which I think capsules is fine), and a faster one. Obviously, only the slow mechanism will be implemented first. So the only things I'd like changed in how Travis' want to do this is a) Storing the signature string data in the struct, rather than as a char*; void *func char string[1]; // variable-size-allocated and null-terminated b) Allow for multiple signatures in the same capsule, i.e. "dd->d", "ff->f", in the same capsule. 
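[For concreteness, a minimal sketch (not from the thread) of the consumer side being criticized here -- unpacking the proposed struct from a capsule with the PyCapsule C-API. The struct layout follows Travis's proposal quoted above; the capsule name, struct name, and function name are made up for illustration. Note the two non-inlineable calls Nathaniel mentions:

#include <Python.h>

typedef struct {
    int   id;      /* the agreed-upon "type-check" number */
    void *func;    /* the raw C function pointer */
    char *string;  /* signature, e.g. "dd->d" */
} cfunc_info;      /* hypothetical name */

/* Hypothetical capsule name acting as a cheap first-level check. */
static const char *capsule_name = "numpy.cfunc";

static void *
get_funcptr(PyObject *obj, int expected_id)
{
    cfunc_info *info;

    /* Call 1: validate that this is the kind of capsule we expect. */
    if (!PyCapsule_IsValid(obj, capsule_name)) {
        return NULL;
    }
    /* Call 2: extract the struct, then dereference it. */
    info = (cfunc_info *) PyCapsule_GetPointer(obj, capsule_name);
    if (info == NULL || info->id != expected_id) {
        return NULL;
    }
    return info->func;
}
]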
> A tiny little custom class in a tiny little library that everyone can
> share might be better? (Bonus: a custom class could define a __call__
> method that used ctypes to call the function directly, for interactive
> convenience/testing/etc.)

I think this discussion is moot -- the way I reverse-engineer Travis is that there's no time for a cross-project discussion about this now. That's not too bad, Cython will go its own way (eventually), and perhaps we can merge in the future...

But for the entertainment value:

In my CEP [1] I describe two access mechanisms, one slow (for which I think capsules are fine), and a faster one. Obviously, only the slow mechanism will be implemented first. So the only things I'd like changed in how Travis wants to do this are

a) Storing the signature string data in the struct, rather than as a char*;

   void *func
   char string[1]; // variable-size-allocated and null-terminated

b) Allowing for multiple signatures, i.e. "dd->d", "ff->f", in the same capsule.

> A tiny little custom class in a tiny little library that everyone can
> share might be better? (Bonus: a custom class could define a __call__
> method that used ctypes to call the function directly, for interactive
> convenience/testing/etc.)

Having NumPy and Cython depend on a common library, and getting that to work for users, seems rather utopic to me. And if I propose that Cython have a hard dependency on NumPy for a feature as basic as calling a callback object then certain people will be very angry. Anyway, in my CEP I went to great pains to avoid having to do this, with a global registration mechanism for multiple such types.

Regarding your idea for the __call__, that's the exact opposite of what I'm doing in the CEP. I'm pretty sure that what I described is what we want for Cython; we will never tell our users to pass capsules around. What I want is this:

@numba
def f(x): return 2 * x

@cython.inline
def g(x): return 3 * x

print f(3)
print g(3)
print scipy.integrate.quad(f, 0.2, 3) # fast!
print scipy.integrate.quad(g, 0.2, 3) # fast!

# If you really want a capsule:
print f.__nativecall__

Dag

[1] http://wiki.cython.org/enhancements/cep1000

From cournape at gmail.com Thu Apr 12 15:22:02 2012
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 12 Apr 2012 20:22:02 +0100
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To: 
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io>
Message-ID: 

On Thu, Apr 12, 2012 at 5:43 PM, Ralf Gommers wrote:

>
>
> On Tue, Apr 10, 2012 at 9:53 PM, David Cournapeau wrote:
>
>>
>>
>> On Tue, Apr 10, 2012 at 8:40 PM, Ralf Gommers <
>> ralf.gommers at googlemail.com> wrote:
>>
>>>
>>>
>>> On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven wrote:
>>>
>>>> On 4/3/12 4:18 PM, Ralf Gommers wrote:
>>>> > Here some first impressions.
>>>> >
>>>> > The good:
>>>> > - It's responsive!
>>>> > - It remembers my preferences (view type, # of issues per page, etc.)
>>>> > - Editing multiple issues with the command window is easy.
>>>> > - Search and filter functionality is powerful
>>>> >
>>>> > The bad:
>>>> > - Multiple projects are supported, but issues are then really mixed.
>>>> > The way this works doesn't look very useful for combined admin of
>>>> > numpy/scipy trackers.
>>>> > - I haven't found a way yet to make versions and subsystems appear in
>>>> > the one-line issue overview.
>>>> > - Fixed issues are still shown by default. There are several open
>>>> > issues filed against youtrack about this, with no reasonable answers.
>>>> > - Plain text attachments (.txt, .diff, .patch) can't be viewed, only
>>>> > downloaded.
>>>> > - No direct VCS integration, only via Teamcity (not set up, so can't
>>>> > evaluate).
>>>> > - No useful default views as in Trac
>>>> > (http://projects.scipy.org/scipy/report).
>>>>
>>>> Ralf, regarding some of the issues:
>>>>
>>>
>>> Hi Bryan, thanks for looking into this.
>>>
>>>>
>>>> I think for numpy/scipy trackers, we could simply run separate instances
>>>> of YouTrack for each.
>>>
>>>
>>> That would work. It does mean that there's no maintenance advantage over
>>> using Trac here.
>>>
>>> Also we can certainly create some standard
>>>> queries. It's a small pain not to have useful defaults, but it's only a
>>>> one-time pain. :)
>>>>
>>>
>>> That should help.
>>>
>>>
>>>> Also, what kind of integration are you looking for with github?
There >>>> does appear to be the ability to issue commands to youtrack through git >>>> commits, which does not depend on TeamCity, as best I can tell: >>>> >>>> http://confluence.jetbrains.net/display/YTD3/GitHub+Integration >>>> http://blogs.jetbrains.com/youtrack/tag/github-integration/ >>>> >>>> I'm not sure this is what you were thinking about though. >>>> >>> >>> That does help. The other thing that's useful is to reference commits >>> (like commit:abcd123 in current Trac) and have them turned into links to >>> commits on Github. This is not a showstopper for me though. >>> >>>> >>>> For the other issues, Maggie or I can try and see what we can find out >>>> about implementing them, or working around them, this week. >>>> >>> >>> I'd say that from the issues I mentioned, the biggest one is the >>> one-line view. So these two: >>> >>> - I haven't found a way yet to make versions and subsystems appear in >>> the one-line issue overview. >>> - Fixed issues are still shown by default. There are several open >>> issues filed against youtrack about this, with no reasonable >>> answers. >>> >>> >>>> Of course, we'd like to evaluate any other viable issue trackers as >>>> >>>> well. Do you have any suggestions for other systems besides YouTrack? >>>> >>> >>> David wrote up some issues (some of which I didn't check) with current >>> Trac and looked at Redmine before. He also mentioned Roundup. See >>> http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow >>> >>> Redmine does look good from a quick browse (better view, does display >>> diffs). It would be good to get the opinions of a few more people on this >>> topic. >>> >> >> Redmine is "trac on RoR", but it solves two significant issues over trac: >> - mass edit (e.g. moving things to a new mileston is simple and doable >> from the UI) >> - REST API by default, so that we can build simple command line tools >> on top of it (this changed since I made the wiki page) >> >> It is a PITA to install, though, at least if you are not familiar with >> ruby, and I heard it is hard to manage as well. >> > > Thanks, that's a clear description of pros and cons. It's also easy to > play with Redmine at demo.redmine.org. That site allows you to set up a > new project and try the admin interface. > And I just discovered this (and in python !) https://github.com/coiled-coil/git-redmine David -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwp2 at students.calvin.edu Thu Apr 12 15:24:43 2012 From: pwp2 at students.calvin.edu (Peter Plantinga) Date: Thu, 12 Apr 2012 15:24:43 -0400 Subject: [Numpy-discussion] physics simulation Message-ID: I'm trying to run a physics simulation on a cluster. The original program is written in fortran, and highly parallelizable. Unfortunately, I've had a bit of trouble getting f2py to work with the compiler I'm using (absoft v.10.1). The cluster is running Linux v. 12. 
When I type just "f2py" I get the following error:

> f2py
Traceback (most recent call last):
  File "/opt/absoft10.1/bin/f2py", line 20, in <module>
    from numpy.f2py import main
  File "/usr/local/lib/python2.6/dist-packages/numpy/__init__.py", line 137, in <module>
    import add_newdocs
  File "/usr/local/lib/python2.6/dist-packages/numpy/add_newdocs.py", line 9, in <module>
    from numpy.lib import add_newdoc
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/__init__.py", line 13, in <module>
    from polynomial import *
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/polynomial.py", line 17, in <module>
    from numpy.linalg import eigvals, lstsq
  File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/__init__.py", line 48, in <module>
    from linalg import *
  File "/usr/local/lib/python2.6/dist-packages/numpy/linalg/linalg.py", line 23, in <module>
    from numpy.linalg import lapack_lite
ImportError: libaf90math.so: cannot open shared object file: No such file or directory

It looks like f2py cannot find libaf90math.so, located in /opt/absoft10.1/shlib. How can I tell f2py where af90math is?

Thanks for the help!
Peter Plantinga
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim at cerazone.net Thu Apr 12 15:54:50 2012
From: tim at cerazone.net (Tim Cera)
Date: Thu, 12 Apr 2012 15:54:50 -0400
Subject: Re: [Numpy-discussion] physics simulation
In-Reply-To: 
References: 
Message-ID: 

> It looks like f2py cannot find libaf90math.so, located in
> /opt/absoft10.1/shlib. How can I tell f2py where af90math is?

Really you have to have this set up in order to run a Fortran executable, but the only thing that comes to mind is the LD_LIBRARY_PATH environment variable. LD_LIBRARY_PATH is a colon-separated list of paths that is searched for dynamic libraries by both regular programs and by Python.

Use...

env | grep LD_

to show you the existing LD_LIBRARY_PATH. How to change or append to it depends on your shell (with bash, for example, something like export LD_LIBRARY_PATH=/opt/absoft10.1/shlib:$LD_LIBRARY_PATH should do it).

Kindest regards,
Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From travis at continuum.io Thu Apr 12 16:05:12 2012
From: travis at continuum.io (Travis Oliphant)
Date: Thu, 12 Apr 2012 15:05:12 -0500
Subject: Re: [Numpy-discussion] YouTrack testbed
In-Reply-To: 
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io>
Message-ID: <3517E0D6-B8CB-4A3A-A37C-F78860F2D0AA@continuum.io>

This looks good. Maggie and Bryan are now setting up a Redmine instance to try out how hard that is to administer. I have some experience with Redmine and have liked what I've seen in the past. I think the user experience that Ralf is providing feedback on is much more important than how hard it is to administer.

NumFocus will dedicate resources to administer the system.

-Travis

On Apr 12, 2012, at 11:43 AM, Ralf Gommers wrote:

> [...]
>
> My current list of preferences is:
>
> 1. Redmine (if admin overhead is not unreasonable)
> 2. Trac with performance issues solved
> 3. Github
> 4. YouTrack
> 5. Trac with current performance
>
> Ralf
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From william.ratcliff at gmail.com Thu Apr 12 16:29:23 2012
From: william.ratcliff at gmail.com (william ratcliff)
Date: Thu, 12 Apr 2012 16:29:23 -0400
Subject: Re: [Numpy-discussion] YouTrack testbed
In-Reply-To: <3517E0D6-B8CB-4A3A-A37C-F78860F2D0AA@continuum.io>
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> <3517E0D6-B8CB-4A3A-A37C-F78860F2D0AA@continuum.io>
Message-ID: 

Has anyone tried Rietveld, Gerrit, or Phabricator?

On Thu, Apr 12, 2012 at 4:05 PM, Travis Oliphant <travis at continuum.io> wrote:
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cournape at gmail.com Thu Apr 12 17:03:58 2012
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 12 Apr 2012 22:03:58 +0100
Subject: Re: [Numpy-discussion] YouTrack testbed
In-Reply-To: 
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io> <3517E0D6-B8CB-4A3A-A37C-F78860F2D0AA@continuum.io>
Message-ID: 

On Thu, Apr 12, 2012 at 9:29 PM, william ratcliff <william.ratcliff at gmail.com> wrote:
> Has anyone tried Rietveld, Gerrit, or Phabricator?

rietveld and gerrit are code review tools.
I have not heard of phabricator, but this article certainly makes it sound interesting: http://www.readwriteweb.com/hack/2011/09/a-look-at-phabricator-facebook.php

There is a quite complete command line interface, arcanist, and if done right, having code review and bug tracking integrated together sounds exciting. Thanks for mentioning it, I will definitely check it out.

regards,

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From travis at continuum.io Thu Apr 12 17:13:51 2012
From: travis at continuum.io (Travis Oliphant)
Date: Thu, 12 Apr 2012 16:13:51 -0500
Subject: Re: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <4F872833.4030309@astro.uio.no>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no>
Message-ID: <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io>

Dag,

Thanks for the link to your CEP. This is the first time I've seen it. You probably referenced it before, but I hadn't seen it.

That CEP seems along the lines of what I was thinking of. We can make scipy follow that CEP and NumPy as well in places that it needs function pointers.

I can certainly get behind it with Numba and recommend it to SciPy (and write the scipy.integrate.quad function to support it).

Thanks for the CEP.

-Travis

On Apr 12, 2012, at 2:08 PM, Dag Sverre Seljebotn wrote:

> [...]

From maggie.mari at continuum.io Thu Apr 12 17:16:02 2012
From: maggie.mari at continuum.io (Maggie Mari)
Date: Thu, 12 Apr 2012 16:16:02 -0500
Subject: Re: [Numpy-discussion] YouTrack testbed
In-Reply-To: 
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io>
Message-ID: <4F874612.30601@continuum.io>

On 4/12/12 11:43 AM, Ralf Gommers wrote:
> Thanks, that's a clear description of pros and cons. It's also easy to
> play with Redmine at demo.redmine.org. That site allows you to set up a
> new project and try the admin interface.
>
> My current list of preferences is:
>
> 1. Redmine (if admin overhead is not unreasonable)
> 2. Trac with performance issues solved
> 3. Github
> 4. YouTrack
> 5. Trac with current performance
>
> Ralf

Hi Ralf,

I have a redmine server up now at http://ec2-107-21-65-210.compute-1.amazonaws.com/redmine/projects/numpy with mostly default settings. If you would like to play around with it, I will give you admin status after you sign up.

Maggie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From d.s.seljebotn at astro.uio.no Thu Apr 12 17:51:38 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 12 Apr 2012 23:51:38 +0200
Subject: Re: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no> <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io>
Message-ID: <4F874E6A.5040701@astro.uio.no>

On 04/12/2012 11:13 PM, Travis Oliphant wrote:
> Dag,
>
> Thanks for the link to your CEP. This is the first time I've seen it. You probably referenced it before, but I hadn't seen it.
>
> That CEP seems along the lines of what I was thinking of. We can make scipy follow that CEP and NumPy as well in places that it needs function pointers.
>
> I can certainly get behind it with Numba and recommend it to SciPy (and write the scipy.integrate.quad function to support it).
>
> Thanks for the CEP.

Great. I'll pass this message on to the Cython list and see if anybody wants to provide input (but given the scope, it should be minor tweaks and easy to accommodate in whatever code you write).

You will fill in more of the holes as you implement this in Numba and SciPy of course (my feeling is they will support it before Cython; let's say I hope this happens within the next year).

Dag

> -Travis
>
> [...]

From travis at continuum.io Thu Apr 12 17:55:52 2012
From: travis at continuum.io (Travis Oliphant)
Date: Thu, 12 Apr 2012 16:55:52 -0500
Subject: Re: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <4F874E6A.5040701@astro.uio.no>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no> <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io> <4F874E6A.5040701@astro.uio.no>
Message-ID: <2A99F25F-EEFD-4F49-823D-4B4BCC55BA8D@continuum.io>

On Apr 12, 2012, at 4:51 PM, Dag Sverre Seljebotn wrote:

> On 04/12/2012 11:13 PM, Travis Oliphant wrote:
>> [...]
>> Thanks for the CEP.
>
> Great.
> I'll pass this message on to the Cython list and see if anybody
> wants to provide input (but given the scope, it should be minor tweaks
> and easy to accommodate in whatever code you write).
>
> You will fill in more of the holes as you implement this in Numba and
> SciPy of course (my feeling is they will support it before Cython; let's
> say I hope this happens within the next year).

Very nice. This will help immensely I think. It's actually just what I was looking for. Just to be clear, by "pad to sizeof(void*) alignment", you mean that after the first 2 bytes there are (sizeof(void*) - 2) bytes before the first function pointer in the memory block pointed to by the PyCObject / Capsule?

Thanks,

-Travis

> Dag
>
> [...]

From d.s.seljebotn at astro.uio.no Thu Apr 12 18:08:12 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Fri, 13 Apr 2012 00:08:12 +0200
Subject: Re: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: <2A99F25F-EEFD-4F49-823D-4B4BCC55BA8D@continuum.io>
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no> <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io> <4F874E6A.5040701@astro.uio.no> <2A99F25F-EEFD-4F49-823D-4B4BCC55BA8D@continuum.io>
Message-ID: <4F87524C.6090302@astro.uio.no>

On 04/12/2012 11:55 PM, Travis Oliphant wrote:
> Very nice. This will help immensely I think. It's actually just what I was looking for.
> Just to be clear, by "pad to sizeof(void*) alignment", you mean that after the first 2 bytes there are (sizeof(void*) - 2) bytes before the first function pointer in the memory block pointed to by the PyCObject / Capsule?

You are only right if the starting address of the data is divisible by sizeof(void*). On 64-bit you would do something like

func_ptr = (func_ptr_t*)(((uintptr_t)descriptor & 0xfffffffffffffff8) + 8);

Hmm, not sure if I like it any longer. I don't know a priori how much alignment really matters either on modern CPUs (but in the Cython case, we would like this to be somewhat competitive with compile-time binding, so it does merit checking I think).
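Spelled out for any pointer size, it would be something like the sketch below -- untested, names made up, and assuming only the 2-byte header you describe above:

#include <stdint.h>

/* Sketch: given a descriptor blob that begins with two header bytes,
   return the address of the first function-pointer slot, padded up
   to sizeof(void*) alignment. */
static void *first_func_slot(void *descriptor)
{
    uintptr_t p = (uintptr_t)descriptor + 2;  /* skip the 2 header bytes */
    uintptr_t mask = sizeof(void *) - 1;
    return (void *)((p + mask) & ~mask);      /* round up to alignment */
}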
Dag

> Thanks,
>
> -Travis
>
> [...]

From pwp2 at students.calvin.edu Thu Apr 12 20:26:54 2012
From: pwp2 at students.calvin.edu (Peter Plantinga)
Date: Thu, 12 Apr 2012 20:26:54 -0400
Subject: Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 67, Issue 43
In-Reply-To: 
References: 
Message-ID: 

> Really you have to have this set up in order to run a Fortran executable,
> but the only thing that comes to mind is the LD_LIBRARY_PATH environment
> variable. LD_LIBRARY_PATH is a colon-separated list of paths that is
> searched for dynamic libraries by both regular programs and by Python.

Thanks for the suggestion! I added the library to LD_LIBRARY_PATH and that fixed it.

On Thu, Apr 12, 2012 at 4:25 PM, wrote:
> [...]
Trac with current performance > > > > Ralf > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120412/11c9aea6/attachment.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 67, Issue 43 > ************************************************ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Apr 12 21:41:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Apr 2012 19:41:09 -0600 Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 2:05 AM, Chris Ball wrote: > Hi, > > I'm trying out various continuous integration options, so I happen to be > testing NumPy on several platforms that I don't normally use. > > Recently, I've been getting a segmentation fault on Debian 6 (with Python > 2.7.2): > > Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 > x86_64 > GNU/Linux (Debian GNU/Linux 6.0 \n \l) > > nosetests --verbose > /home/slave/tmp/numpy/numpy/random/__init__.py:91: RuntimeWarning: > numpy.ndarray size changed, may indicate binary incompatibility > from mtrand import * > test_api.test_fastCopyAndTranspose ... ok > test_api.test_array_astype ... ok > test_api.test_copyto_fromscalar ... ok > test_api.test_copyto ... ok > test_api.test_copyto_maskna ... ok > test_api.test_copy_order ... ok > Basic test of array2string. ... ok > Test custom format function for each element in array. ... ok > This should only apply to 0-D arrays. See #1218. ... ok > test_arrayprint.TestArrayRepr.test_nan_inf ... ok > test_str (test_arrayprint.TestComplexArray) ... ok > test_arrayprint.TestPrintOptions.test_basic ... ok > test_arrayprint.TestPrintOptions.test_formatter ... ok > test_arrayprint.TestPrintOptions.test_formatter_reset ... ok > Ticket 844. ... ok > test_blasdot.test_blasdot_used ... SKIP: Skipping test: test_blasdot_used > Numpy is not compiled with _dotblas > test_blasdot.test_dot_2args ... ok > test_blasdot.test_dot_3args ... ok > test_blasdot.test_dot_3args_errors ... ok > test_creation_overflow (test_datetime.TestDateTime) ... ok > test_datetime_add (test_datetime.TestDateTime) ... ok > test_datetime_arange (test_datetime.TestDateTime) ... ok > test_datetime_array_find_type (test_datetime.TestDateTime) ... ok > test_datetime_array_str (test_datetime.TestDateTime) ... ok > test_datetime_as_string (test_datetime.TestDateTime) ... ok > test_datetime_as_string_timezone (test_datetime.TestDateTime) ... > /home/slave/ > tmp/numpy/numpy/core/tests/test_datetime.py:1319: UserWarning: pytz not > found, > pytz compatibility tests skipped > warnings.warn("pytz not found, pytz compatibility tests skipped") > ok > test_datetime_busday_holidays_count (test_datetime.TestDateTime) ... 
ok > test_datetime_busday_holidays_offset (test_datetime.TestDateTime) ... ok > test_datetime_busday_offset (test_datetime.TestDateTime) ... ok > test_datetime_busdaycalendar (test_datetime.TestDateTime) ... ok > test_datetime_casting_rules (test_datetime.TestDateTime) ... ok > test_datetime_divide (test_datetime.TestDateTime) ... ok > test_datetime_dtype_creation (test_datetime.TestDateTime) ... ok > test_datetime_is_busday (test_datetime.TestDateTime) ... ok > test_datetime_like (test_datetime.TestDateTime) ... ok > test_datetime_maximum_reduce (test_datetime.TestDateTime) ... ok > test_datetime_minmax (test_datetime.TestDateTime) ... ok > test_datetime_multiply (test_datetime.TestDateTime) ... ok > test_datetime_nat_casting (test_datetime.TestDateTime) ... ok > test_datetime_scalar_construction (test_datetime.TestDateTime) ... ok > test_datetime_string_conversion (test_datetime.TestDateTime) ... ERROR > test_datetime_subtract (test_datetime.TestDateTime) ... Segmentation fault > > With Python 2.6 there doesn't seem to be a problem on the same machine. > > Unfortunately, I haven't had time to investigate (I don't have Debian 6 to > use > myself, and I just started a job that doesn't involve any Python...). > However, > according to the Jenkins instance on ShiningPanda.com, the problem began > with > these changes: > > BUG: ticket #1578, Fix python-debug warning for python >= 2.7. > STY: Small style fixes. > > For now, that's all I can say; I haven't manually verified the problem > myself > (that it exists, or that it truly started after the changes above). I hope > to > be able to investigate further at the weekend, but I thought I'd post to > the > list now in case someone else can verify the problem. > > Chris > > > Segmentation fault is buried in console output of Jenkins: > > https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/console > > The previous build was ok: > > https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/5/console > > Changes that Jenkins claims are responsible: > https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/ > changes#detail0 > > > It seems that python2.7 is far, far, too recent to be part of Debian 6. I mean, finding python 2.7 in recent Debian stable would be like finding an atomic cannon in a 1'st dynasty Egyptian tomb. So it is in testing, but for replication I like to know where you got it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Apr 12 22:13:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Apr 2012 20:13:11 -0600 Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 7:41 PM, Charles R Harris wrote: > > > On Thu, Apr 12, 2012 at 2:05 AM, Chris Ball wrote: > >> Hi, >> >> I'm trying out various continuous integration options, so I happen to be >> testing NumPy on several platforms that I don't normally use. >> >> Recently, I've been getting a segmentation fault on Debian 6 (with Python >> 2.7.2): >> >> Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 >> x86_64 >> GNU/Linux (Debian GNU/Linux 6.0 \n \l) >> >> nosetests --verbose >> /home/slave/tmp/numpy/numpy/random/__init__.py:91: RuntimeWarning: >> numpy.ndarray size changed, may indicate binary incompatibility >> from mtrand import * >> test_api.test_fastCopyAndTranspose ... ok >> test_api.test_array_astype ... 
ok >> test_api.test_copyto_fromscalar ... ok >> test_api.test_copyto ... ok >> test_api.test_copyto_maskna ... ok >> test_api.test_copy_order ... ok >> Basic test of array2string. ... ok >> Test custom format function for each element in array. ... ok >> This should only apply to 0-D arrays. See #1218. ... ok >> test_arrayprint.TestArrayRepr.test_nan_inf ... ok >> test_str (test_arrayprint.TestComplexArray) ... ok >> test_arrayprint.TestPrintOptions.test_basic ... ok >> test_arrayprint.TestPrintOptions.test_formatter ... ok >> test_arrayprint.TestPrintOptions.test_formatter_reset ... ok >> Ticket 844. ... ok >> test_blasdot.test_blasdot_used ... SKIP: Skipping test: test_blasdot_used >> Numpy is not compiled with _dotblas >> test_blasdot.test_dot_2args ... ok >> test_blasdot.test_dot_3args ... ok >> test_blasdot.test_dot_3args_errors ... ok >> test_creation_overflow (test_datetime.TestDateTime) ... ok >> test_datetime_add (test_datetime.TestDateTime) ... ok >> test_datetime_arange (test_datetime.TestDateTime) ... ok >> test_datetime_array_find_type (test_datetime.TestDateTime) ... ok >> test_datetime_array_str (test_datetime.TestDateTime) ... ok >> test_datetime_as_string (test_datetime.TestDateTime) ... ok >> test_datetime_as_string_timezone (test_datetime.TestDateTime) ... >> /home/slave/ >> tmp/numpy/numpy/core/tests/test_datetime.py:1319: UserWarning: pytz not >> found, >> pytz compatibility tests skipped >> warnings.warn("pytz not found, pytz compatibility tests skipped") >> ok >> test_datetime_busday_holidays_count (test_datetime.TestDateTime) ... ok >> test_datetime_busday_holidays_offset (test_datetime.TestDateTime) ... ok >> test_datetime_busday_offset (test_datetime.TestDateTime) ... ok >> test_datetime_busdaycalendar (test_datetime.TestDateTime) ... ok >> test_datetime_casting_rules (test_datetime.TestDateTime) ... ok >> test_datetime_divide (test_datetime.TestDateTime) ... ok >> test_datetime_dtype_creation (test_datetime.TestDateTime) ... ok >> test_datetime_is_busday (test_datetime.TestDateTime) ... ok >> test_datetime_like (test_datetime.TestDateTime) ... ok >> test_datetime_maximum_reduce (test_datetime.TestDateTime) ... ok >> test_datetime_minmax (test_datetime.TestDateTime) ... ok >> test_datetime_multiply (test_datetime.TestDateTime) ... ok >> test_datetime_nat_casting (test_datetime.TestDateTime) ... ok >> test_datetime_scalar_construction (test_datetime.TestDateTime) ... ok >> test_datetime_string_conversion (test_datetime.TestDateTime) ... ERROR >> test_datetime_subtract (test_datetime.TestDateTime) ... Segmentation fault >> >> With Python 2.6 there doesn't seem to be a problem on the same machine. >> >> Unfortunately, I haven't had time to investigate (I don't have Debian 6 >> to use >> myself, and I just started a job that doesn't involve any Python...). >> However, >> according to the Jenkins instance on ShiningPanda.com, the problem began >> with >> these changes: >> >> BUG: ticket #1578, Fix python-debug warning for python >= 2.7. >> STY: Small style fixes. >> >> For now, that's all I can say; I haven't manually verified the problem >> myself >> (that it exists, or that it truly started after the changes above). I >> hope to >> be able to investigate further at the weekend, but I thought I'd post to >> the >> list now in case someone else can verify the problem. 
>> >> Chris >> >> >> Segmentation fault is buried in console output of Jenkins: >> >> https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/console >> >> The previous build was ok: >> >> https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/5/console >> >> Changes that Jenkins claims are responsible: >> https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/ >> changes#detail0 >> >> >> > It seems that python2.7 is far, far, too recent to be part of Debian 6. I > mean, finding python 2.7 in recent Debian stable would be like finding an > atomic cannon in a 1'st dynasty Egyptian tomb. So it is in testing, but for > replication I like to know where you got it. > > Python 2.7 from Debian testing works fine here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 13 00:41:36 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Apr 2012 22:41:36 -0600 Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 8:13 PM, Charles R Harris wrote: > > > On Thu, Apr 12, 2012 at 7:41 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Thu, Apr 12, 2012 at 2:05 AM, Chris Ball wrote: >> >>> Hi, >>> >>> I'm trying out various continuous integration options, so I happen to be >>> testing NumPy on several platforms that I don't normally use. >>> >>> Recently, I've been getting a segmentation fault on Debian 6 (with Python >>> 2.7.2): >>> >>> Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 >>> x86_64 >>> GNU/Linux (Debian GNU/Linux 6.0 \n \l) >>> >>> nosetests --verbose >>> /home/slave/tmp/numpy/numpy/random/__init__.py:91: RuntimeWarning: >>> numpy.ndarray size changed, may indicate binary incompatibility >>> from mtrand import * >>> test_api.test_fastCopyAndTranspose ... ok >>> test_api.test_array_astype ... ok >>> test_api.test_copyto_fromscalar ... ok >>> test_api.test_copyto ... ok >>> test_api.test_copyto_maskna ... ok >>> test_api.test_copy_order ... ok >>> Basic test of array2string. ... ok >>> Test custom format function for each element in array. ... ok >>> This should only apply to 0-D arrays. See #1218. ... ok >>> test_arrayprint.TestArrayRepr.test_nan_inf ... ok >>> test_str (test_arrayprint.TestComplexArray) ... ok >>> test_arrayprint.TestPrintOptions.test_basic ... ok >>> test_arrayprint.TestPrintOptions.test_formatter ... ok >>> test_arrayprint.TestPrintOptions.test_formatter_reset ... ok >>> Ticket 844. ... ok >>> test_blasdot.test_blasdot_used ... SKIP: Skipping test: test_blasdot_used >>> Numpy is not compiled with _dotblas >>> test_blasdot.test_dot_2args ... ok >>> test_blasdot.test_dot_3args ... ok >>> test_blasdot.test_dot_3args_errors ... ok >>> test_creation_overflow (test_datetime.TestDateTime) ... ok >>> test_datetime_add (test_datetime.TestDateTime) ... ok >>> test_datetime_arange (test_datetime.TestDateTime) ... ok >>> test_datetime_array_find_type (test_datetime.TestDateTime) ... ok >>> test_datetime_array_str (test_datetime.TestDateTime) ... ok >>> test_datetime_as_string (test_datetime.TestDateTime) ... ok >>> test_datetime_as_string_timezone (test_datetime.TestDateTime) ... 
>>> /home/slave/ >>> tmp/numpy/numpy/core/tests/test_datetime.py:1319: UserWarning: pytz not >>> found, >>> pytz compatibility tests skipped >>> warnings.warn("pytz not found, pytz compatibility tests skipped") >>> ok >>> test_datetime_busday_holidays_count (test_datetime.TestDateTime) ... ok >>> test_datetime_busday_holidays_offset (test_datetime.TestDateTime) ... ok >>> test_datetime_busday_offset (test_datetime.TestDateTime) ... ok >>> test_datetime_busdaycalendar (test_datetime.TestDateTime) ... ok >>> test_datetime_casting_rules (test_datetime.TestDateTime) ... ok >>> test_datetime_divide (test_datetime.TestDateTime) ... ok >>> test_datetime_dtype_creation (test_datetime.TestDateTime) ... ok >>> test_datetime_is_busday (test_datetime.TestDateTime) ... ok >>> test_datetime_like (test_datetime.TestDateTime) ... ok >>> test_datetime_maximum_reduce (test_datetime.TestDateTime) ... ok >>> test_datetime_minmax (test_datetime.TestDateTime) ... ok >>> test_datetime_multiply (test_datetime.TestDateTime) ... ok >>> test_datetime_nat_casting (test_datetime.TestDateTime) ... ok >>> test_datetime_scalar_construction (test_datetime.TestDateTime) ... ok >>> test_datetime_string_conversion (test_datetime.TestDateTime) ... ERROR >>> test_datetime_subtract (test_datetime.TestDateTime) ... Segmentation >>> fault >>> >>> With Python 2.6 there doesn't seem to be a problem on the same machine. >>> >>> Unfortunately, I haven't had time to investigate (I don't have Debian 6 >>> to use >>> myself, and I just started a job that doesn't involve any Python...). >>> However, >>> according to the Jenkins instance on ShiningPanda.com, the problem began >>> with >>> these changes: >>> >>> BUG: ticket #1578, Fix python-debug warning for python >= 2.7. >>> STY: Small style fixes. >>> >>> For now, that's all I can say; I haven't manually verified the problem >>> myself >>> (that it exists, or that it truly started after the changes above). I >>> hope to >>> be able to investigate further at the weekend, but I thought I'd post to >>> the >>> list now in case someone else can verify the problem. >>> >>> Chris >>> >>> >>> Segmentation fault is buried in console output of Jenkins: >>> >>> https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/console >>> >>> The previous build was ok: >>> >>> https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/5/console >>> >>> Changes that Jenkins claims are responsible: >>> https://jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/ >>> changes#detail0 >>> >>> >>> >> It seems that python2.7 is far, far, too recent to be part of Debian 6. I >> mean, finding python 2.7 in recent Debian stable would be like finding an >> atomic cannon in a 1'st dynasty Egyptian tomb. So it is in testing, but for >> replication I like to know where you got it. >> >> > Python 2.7 from Debian testing works fine here. > > But ActiveState python (ucs2) segfaults with >>> a = np.array(['0123456789'], 'U') >>> a Segmentation fault The string needs to be long for this to show. Chuck > -------------- next part -------------- An HTML attachment was scrubbed... 
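Since the difference between the working and crashing interpreters above is the unicode build width, a quick check is useful before trying the reproduction. A minimal sketch (assuming only the standard library and NumPy; the last two lines restate Chuck's example and will crash only on an affected UCS2 build):

import sys
import numpy as np

# Narrow (UCS2) interpreter builds report sys.maxunicode == 0xFFFF;
# wide (UCS4) builds report 0x10FFFF. The segfault above was seen on
# a narrow build.
print("UCS2 (narrow) build" if sys.maxunicode == 0xFFFF else "UCS4 (wide) build")

# Chuck's reproduction: on an affected narrow build this segfaults
# once the string is long enough.
a = np.array(['0123456789'], 'U')
print(a)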
URL: 

From opossumnano at gmail.com Fri Apr 13 03:03:32 2012
From: opossumnano at gmail.com (Tiziano Zito)
Date: Fri, 13 Apr 2012 09:03:32 +0200 (CEST)
Subject: [Numpy-discussion] [Reminder] Summer School "Advanced Scientific Programming in Python" in Kiel, Germany
Message-ID: <20120413070332.C1602249AC6@mail.bccn-berlin>

Advanced Scientific Programming in Python
=========================================
a Summer School by the G-Node and the Institute of Experimental and
Applied Physics, Christian-Albrechts-Universität zu Kiel

Scientists spend more and more time writing, maintaining, and debugging
software. While techniques for doing this efficiently have evolved, only
a few scientists actually use them. As a result, instead of doing their
research, they spend far too much time writing deficient code and
reinventing the wheel. In this course we will present a selection of
advanced programming techniques, incorporating theoretical lectures and
practical exercises tailored to the needs of a programming scientist.
New skills will be tested in a real programming project: we will team up
to develop an entertaining scientific computer game.

We use the Python programming language for the entire course. Python
works as a simple programming language for beginners, but more
importantly, it also works great in scientific simulations and data
analysis. We show how clean language design, ease of extensibility, and
the great wealth of open source libraries for scientific computing and
data visualization are driving Python to become a standard tool for the
programming scientist.

This school is targeted at Master or PhD students and Post-docs from all
areas of science. Competence in Python or in another language such as
Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic
knowledge of Python is assumed. Participants without any prior
experience with Python should work through the proposed introductory
materials before the course.

Date and Location
=================
September 2-7, 2012. Kiel, Germany.

Preliminary Program
===================
Day 0 (Sun Sept 2) - Best Programming Practices
- Best Practices, Development Methodologies and the Zen of Python
- Version control with git
- Object-oriented programming & design patterns
Day 1 (Mon Sept 3) - Software Carpentry
- Test-driven development, unit testing & quality assurance
- Debugging, profiling and benchmarking techniques
- Best practices in data visualization
- Programming in teams
Day 2 (Tue Sept 4) - Scientific Tools for Python
- Advanced NumPy
- The Quest for Speed (intro): Interfacing to C with Cython
- Advanced Python I: idioms, useful built-in data structures, generators
Day 3 (Wed Sept 5) - The Quest for Speed
- Writing parallel applications in Python
- Programming project
Day 4 (Thu Sept 6) - Efficient Memory Management
- When parallelization does not help: the starving CPUs problem
- Advanced Python II: decorators and context managers
- Programming project
Day 5 (Fri Sept 7) - Practical Software Development
- Programming project
- The Pelita Tournament

Every evening we will have the tutors' consultation hour: Tutors will
answer your questions and give suggestions for your own projects.

Applications
============
You can apply on-line at http://python.g-node.org

Applications must be submitted before 23:59 UTC, May 1, 2012.
Notifications of acceptance will be sent by June 1, 2012.

No fee is charged but participants should take care of travel, living,
and accommodation expenses.
Candidates will be selected on the basis of their profile. Places are
limited: acceptance rate last time was around 20%.

Prerequisites: You are supposed to know the basics of Python to
participate in the lectures. You are encouraged to go through the
introductory material available on the website.

Faculty
=======
- Francesc Alted, Continuum Analytics Inc., USA
- Pietro Berkes, Enthought Inc., UK
- Valentin Haenel, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Zbigniew Jędrzejewski-Szmek, Faculty of Physics, University of Warsaw, Poland
- Eilif Muller, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
- Rike-Benjamin Schuppner, Technologit GbR, Germany
- Bartosz Teleńczuk, Unité de Neurosciences Information et Complexité, Centre National de la Recherche Scientifique, France
- Stéfan van der Walt, Helen Wills Neuroscience Institute, University of California Berkeley, USA
- Bastian Venthur, Berlin Institute of Technology and Bernstein Focus Neurotechnology, Germany
- Niko Wilbert, TNG Technology Consulting GmbH, Germany
- Tiziano Zito, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany

Organized by Christian T. Steigies and Christian Drews of the Institute
of Experimental and Applied Physics, Christian-Albrechts-Universität zu
Kiel, and by Zbigniew Jędrzejewski-Szmek and Tiziano Zito for the German
Neuroinformatics Node of the INCF.

Website: http://python.g-node.org
Contact: python-info at g-node.org

From bryanv at continuum.io Fri Apr 13 11:08:17 2012
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Fri, 13 Apr 2012 10:08:17 -0500
Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6?
In-Reply-To:
References:
Message-ID: <4F884161.3020606@continuum.io>

On 4/12/12 8:41 PM, Charles R Harris wrote:
> It seems that python2.7 is far, far, too recent to be part of Debian
> 6. I mean, finding python 2.7 in recent Debian stable would be like
> finding an atomic cannon in a 1'st dynasty Egyptian tomb. So it is in
> testing, but for replication I like to know where you got it.
>
> Chuck

Just to add a data point, Maggie built python 2.7.2 on Debian 32 and 64
bit VMs yesterday and saw the same error. Interestingly, it segfaults in
different places:

32 bit
====
Program received signal SIGSEGV, Segmentation fault.
PyObject_Malloc (nbytes=48) at Objects/obmalloc.c:788
788         if ((pool->freeblock = *(block **)bp) != NULL) {
#0  PyObject_Malloc (nbytes=48) at Objects/obmalloc.c:788
#1  0x08095ad5 in _PyObject_New (tp=0xb7b80180) at Objects/object.c:244
#2  0xb7af433f in PyArray_DescrNew (type_num=18) at numpy/core/src/multiarray/descriptor.c:1457
#3  PyArray_DescrNewFromType (type_num=18) at numpy/core/src/multiarray/descriptor.c:1106
#4  0xb7b1631c in PyArray_DTypeFromObjectHelper (obj=, maxdims=, out_contains_na=0xbfffbae0, out_dtype=0xbfffbae8, string_type=18) at numpy/core/src/multiarray/common.c:259

64 bit
====
Program received signal SIGSEGV, Segmentation fault.
PyType_IsSubtype (a=0x2d00330030002d, b=0x7ffff6624b20) at Objects/typeobject.c:1132
1132        if (!(a->tp_flags & Py_TPFLAGS_HAVE_CLASS))
(gdb) print a
$1 = (PyTypeObject *) 0x2d00330030002d
#0  PyType_IsSubtype (a=0x2d00330030002d, b=0x7ffff6624b20) at Objects/typeobject.c:1132
#1  0x00007ffff639b8ad in STRING_setitem (op=0x134c930, ov=0x154f630 "", ap=0x144f1d0) at numpy/core/src/multiarray/arraytypes.c.src:476
#2  0x00007ffff639d466 in UNICODE_to_STRING (ip=0x1111f00 "2", op=0x154f630 "", n=, aip=0x14fb5c0, aop=0x144f1d0) at numpy/core/src/multiarray/arraytypes.c.src:1378
#3  0x00007ffff62fa27f in _strided_to_strided_contig_align_wrap (dst=0x1, dst_stride=, src=0x7fffffff9c40 "\003", src_stride=, N=140737488329280, src_itemsize=40, data=0x154f5c0) at numpy/core/src/multiarray/dtype_transfer.c:354

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From talljimbo at gmail.com Fri Apr 13 16:21:54 2012
From: talljimbo at gmail.com (Jim Bosch)
Date: Fri, 13 Apr 2012 16:21:54 -0400
Subject: [Numpy-discussion] problem importing NumPy from C/C++
Message-ID: <4F888AE2.8060308@gmail.com>

One of my colleagues is having some difficulty with some of my code that
uses the NumPy C-API; I think we're seeing the C import_array() function
claim that it can't find NumPy, even though "import numpy" works just
fine in Python. Here's a minimal embedded-Python program that is enough
to cause problems:

---------
#include "Python.h"
#include "numpy/arrayobject.h"

void doImport() {
    import_array();
}

int main() {
    int result = 0;
    Py_Initialize();
    doImport();
    if (PyErr_Occurred()) {
        result = 1;
    } else {
        npy_intp dims = 2;
        PyObject * a = PyArray_SimpleNew(1, &dims, NPY_INT);
        if (!a) result = 1;
        Py_DECREF(a);
    }
    Py_Finalize();
    return result;
}
---------

That compiles and links fine, but when running it we see:

---------
ImportError: numpy.core.multiarray failed to import
---------

and the exit code is 1. Note that we can run python and successfully
import numpy there, in what's otherwise the same environment, so it
doesn't appear to be a PYTHONPATH issue (or, I think, an environment
issue, though there might be a difference between embedding and
extending). All of the above does work on a number of other systems, so
I suspect something went wrong with this particular NumPy install, but
it claims to have been successful. This is with gcc 4.1.2, RHEL 5.

Any ideas?

Thanks!

Jim Bosch

From michael.forbes at gmail.com Fri Apr 13 20:24:49 2012
From: michael.forbes at gmail.com (Michael McNeil Forbes)
Date: Fri, 13 Apr 2012 17:24:49 -0700
Subject: [Numpy-discussion] Keyword argument support for vectorize.
In-Reply-To:
References:
Message-ID: <31645A34-99B5-4CE3-ACE5-A3EFD15B5F41@gmail.com>

On 9 Apr 2012, at 3:02 AM, Nathaniel Smith wrote:
> functools was added in Python 2.5, and so far numpy is still trying to
> maintain 2.4 compatibility.

Thanks: I forgot about that. I have attached a 2.4 compatible patch,
updated docs, and tests for review to ticket #2100. This also includes a
patch and regression test for ticket #1156 so it can be closed out too
after review.

http://projects.scipy.org/numpy/ticket/1156
http://projects.scipy.org/numpy/ticket/2100

Michael.

From irwin.zaid at physics.ox.ac.uk Mon Apr 16 06:22:20 2012
From: irwin.zaid at physics.ox.ac.uk (Irwin Zaid)
Date: Mon, 16 Apr 2012 11:22:20 +0100
Subject: [Numpy-discussion] [f2py] f2py ignores 'fortranname' inside F90 modules?
Message-ID: <4F8BF2DC.2020401@physics.ox.ac.uk>

Hi all,

I've been having an issue with f2py simply ignoring the fortranname
option if the Fortran subroutine is inside an F90 module. That option is
useful for renaming Fortran subroutines. I don't know if this behaviour
is to be expected, or if I am doing something wrong. I would definitely
appreciate any help!

As an example, here is code that correctly produces a Python module
'test' with a single Fortran subroutine 'my_wrapped_subroutine'.

TEST_SUBROUTINE.F90
-------------------
subroutine my_subroutine()
    write (*,*) 'Hello, world!'
end subroutine my_subroutine

TEST_SUBROUTINE.PYF
-------------------
python module test
    interface
        subroutine my_wrapped_subroutine()
            fortranname my_subroutine
        end subroutine my_wrapped_subroutine
    end interface
end python module test

But, when the Fortran subroutine 'my_subroutine' is placed inside a
module, the fortranname option seems to be entirely ignored. The
following example fails to compile. The error is "Error: Symbol
'my_wrapped_subroutine' referenced at (1) not found in module
'my_module'".

TEST_MODULE.F90
---------------
module my_module
contains
    subroutine my_subroutine()
        write (*,*) 'Hello, world!'
    end subroutine my_subroutine
end module my_module

TEST_MODULE.PYF
---------------
python module test
    interface
        module my_module
        contains
            subroutine my_wrapped_subroutine()
                fortranname my_subroutine
            end subroutine my_wrapped_subroutine
        end module my_module
    end interface
end python module test

F2py is a great tool aside from this and a few other minor quibbles. So
thanks a lot!

Cheers,

Irwin

From tanner at gmx.de Mon Apr 16 11:50:24 2012
From: tanner at gmx.de (Thomas Tanner)
Date: Mon, 16 Apr 2012 17:50:24 +0200
Subject: [Numpy-discussion] setting the same object value with a mask?
Message-ID: <4F8C3FC0.3090807@gmx.de>

Hi,

is there an elegant method for assigning the same value to several
indices in a ndarray? (in this case with dtype=object)

example:

a = empty(4,'O') # object ndarray
x = [1,2,'f'] # the value to be set for some indices - the value is not scalar
a[array((True,False,True))] = x # works like put -> not what I want
a[array((0,2))] = x # same effect
print a # -> [1 None 2 None]
a[0],a[2] = x,x # set explicitly - works
print a # -> [[1, 2, 'f'] None [1, 2, 'f'] None]

thanks for your help!

cheers,
--
Thomas Tanner ------
email: tanner at gmx.de
GnuPG: 1024/5924D4DD

From ralf.gommers at googlemail.com Mon Apr 16 17:09:31 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Mon, 16 Apr 2012 23:09:31 +0200
Subject: [Numpy-discussion] 1.7 blockers
In-Reply-To:
References:
Message-ID:

On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> Hi All,
>
> There several problems with numpy master that need to be fixed before a
> release can be considered.
>
> 1. Datetime on windows with mingw.

Opened http://projects.scipy.org/numpy/ticket/2108 for the last datetime
failures.

> 2. Bus error on SPARC, ticket #2076.
> 3. NA and real/complex views of complex arrays.
>
> Number 1 has been proved to be particularly difficult, any help or
> suggestions for that would be much appreciated. The current work has been
> going in pull request 214.
>
> This isn't to say that there aren't a ton of other things that need
> fixing or that we can skip out on the current stack of pull requests, but I
> think it is impossible to consider a release while those three problems are
> outstanding.

We've closed a number of open issues and merged some PRs, but haven't
made much progress on the issues above. Especially for the NA issues I'm
not sure what's going on. Is anyone working on this at the moment? If so,
can he/she give an update of things to change/fix and an estimate of how
long that will take?

Thanks,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
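For Thomas's object-array question a few messages up, one possible workaround is a plain loop over the selected indices, since fancy indexing distributes the elements of a sequence across the selected slots rather than storing the sequence itself. A minimal sketch (plain NumPy only; whether it counts as elegant is debatable):

import numpy as np

a = np.empty(4, dtype=object)
x = [1, 2, 'f']  # non-scalar value to store at several indices
mask = np.array([True, False, True, False])

# a[mask] = x would unpack x across the selected slots; assigning in
# a loop stores a reference to the same list object at each index,
# just like a[0], a[2] = x, x does.
for i in np.flatnonzero(mask):
    a[i] = x

print(a)  # -> [[1, 2, 'f'] None [1, 2, 'f'] None]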
URL: From charlesr.harris at gmail.com Mon Apr 16 17:26:10 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Apr 2012 15:26:10 -0600 Subject: [Numpy-discussion] 1.7 blockers In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 3:09 PM, Ralf Gommers wrote: > > > On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> There several problems with numpy master that need to be fixed before a >> release can be considered. >> >> 1. Datetime on windows with mingw. >> >> Opened http://projects.scipy.org/numpy/ticket/2108 for the last datetime > failures. > >> >> 1. Bus error on SPARC, ticket #2076. >> 2. NA and real/complex views of complex arrays. >> >> Number 1 has been proved to be particularly difficult, any help or >> suggestions for that would be much appreciated. The current work has been >> going in pull request 214 . >> >> This isn't to say that there aren't a ton of other things that need >> fixing or that we can skip out on the current stack of pull requests, but I >> think it is impossible to consider a release while those three problems are >> outstanding. >> > > We've closed a number of open issues and merged some PRs, but haven't made > much progress on the issues above. Especially for the NA issues I'm not > sure what's going on. Is anyone working on this at the moment? If so, can > he/she give an update of things to change/fix and an estimate of how long > that will take? > > I think I can deal with the NA issues, just haven't got around to it. I'll try to get to it sometime in the next week. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Apr 16 17:27:27 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 16 Apr 2012 17:27:27 -0400 Subject: [Numpy-discussion] adding a cut function to numpy Message-ID: Hi, I have a pull request here [1] to add a cut function similar to R's [2]. It seems there are often requests for similar functionality. It's something I'm making use of for my own work and would like to use in statstmodels and in generating instances of pandas' Factor class, but is this generally something people would find useful to warrant its inclusion in numpy? It will be even more useful I think with an enum dtype in numpy. If you aren't familiar with cut, here's a potential use case. Going from a continuous to a categorical variable. 
Given a continuous variable [~/] [8]: age = np.random.randint(15,70, size=100) [~/] [9]: age [9]: array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27, 17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69, 50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40, 27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25, 36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27, 23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26]) Give me a variable where people are in age groups (lower bound is not inclusive) [~/] [10]: groups = [14, 25, 35, 45, 55, 70] [~/] [11]: age_cat = np.cut(age, groups) [~/] [12]: age_cat [12]: array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3, 1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4, 3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3, 3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5, 3, 2, 3, 2, 1, 3, 2, 2]) Skipper [1] https://github.com/numpy/numpy/pull/248 [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html From njs at pobox.com Mon Apr 16 17:29:02 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 16 Apr 2012 22:29:02 +0100 Subject: [Numpy-discussion] 1.7 blockers In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 10:09 PM, Ralf Gommers wrote: > > > On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris > wrote: >> >> Hi All, >> >> There several problems with numpy master that need to be fixed before a >> release can be considered. >> >> Datetime on windows with mingw. > > Opened http://projects.scipy.org/numpy/ticket/2108 for the last datetime > failures. >> >> Bus error on SPARC, ticket #2076. >> NA and real/complex views of complex arrays. >> >> Number 1 has been proved to be particularly difficult, any help or >> suggestions for that would be much appreciated. The current work has been >> going in pull request 214. >> >> This isn't to say that there aren't a ton of other things that need fixing >> or that we can skip out on the current stack of pull requests, but I think >> it is impossible to consider a release while those three problems are >> outstanding. > > We've closed a number of open issues and merged some PRs, but haven't made > much progress on the issues above. Especially for the NA issues I'm not sure > what's going on. Is anyone working on this at the moment? If so, can he/she > give an update of things to change/fix and an estimate of how long that will > take? There's been some ongoing behind-the-scenes discussion of the overall NA problem, but I wouldn't try to give an estimate on the outcome. My personal opinion is that given you already added the note to the docs that masked arrays are in a kind of experimental prototype state for this release, some small inconsistencies in their behaviour shouldn't be a release blocker. The release notes already have a whole list of stuff that's unsupported in the presence of masks ("Fancy indexing...UFunc.accumulate, UFunc.reduceat...where=...ndarray.argmax, ndarray.argmin..."), I'm not sure why .real and .imag are blockers and they aren't :-). Maybe just make a note of them on that list? (Unless of course Chuck fixes them before the other blockers are finished, as per his email that just arrived.) 
-- Nathaniel From tsyu80 at gmail.com Mon Apr 16 17:51:24 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Mon, 16 Apr 2012 17:51:24 -0400 Subject: [Numpy-discussion] adding a cut function to numpy In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold wrote: > Hi, > > I have a pull request here [1] to add a cut function similar to R's > [2]. It seems there are often requests for similar functionality. It's > something I'm making use of for my own work and would like to use in > statstmodels and in generating instances of pandas' Factor class, but > is this generally something people would find useful to warrant its > inclusion in numpy? It will be even more useful I think with an enum > dtype in numpy. > > If you aren't familiar with cut, here's a potential use case. Going > from a continuous to a categorical variable. > > Given a continuous variable > > [~/] > [8]: age = np.random.randint(15,70, size=100) > > [~/] > [9]: age > [9]: > array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27, > 17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69, > 50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40, > 27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25, > 36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27, > 23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26]) > > Give me a variable where people are in age groups (lower bound is not > inclusive) > > [~/] > [10]: groups = [14, 25, 35, 45, 55, 70] > > [~/] > [11]: age_cat = np.cut(age, groups) > > [~/] > [12]: age_cat > [12]: > array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3, > 1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4, > 3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3, > 3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5, > 3, 2, 3, 2, 1, 3, 2, 2]) > > Skipper > > [1] https://github.com/numpy/numpy/pull/248 > [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html > Is this the same as `np.searchsorted` (with reversed arguments)? In [292]: np.searchsorted(groups, age) Out[292]: array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3, 1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4, 3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3, 3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5, 3, 2, 3, 2, 1, 3, 2, 2]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Apr 16 17:55:31 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 16 Apr 2012 23:55:31 +0200 Subject: [Numpy-discussion] 1.7 blockers In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 11:29 PM, Nathaniel Smith wrote: > On Mon, Apr 16, 2012 at 10:09 PM, Ralf Gommers > wrote: > > > > > > On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris > > wrote: > >> > >> Hi All, > >> > >> There several problems with numpy master that need to be fixed before a > >> release can be considered. > >> > >> Datetime on windows with mingw. > > > > Opened http://projects.scipy.org/numpy/ticket/2108 for the last datetime > > failures. > >> > >> Bus error on SPARC, ticket #2076. > >> NA and real/complex views of complex arrays. > >> > >> Number 1 has been proved to be particularly difficult, any help or > >> suggestions for that would be much appreciated. The current work has > been > >> going in pull request 214. 
> >> > >> This isn't to say that there aren't a ton of other things that need > fixing > >> or that we can skip out on the current stack of pull requests, but I > think > >> it is impossible to consider a release while those three problems are > >> outstanding. > > > > We've closed a number of open issues and merged some PRs, but haven't > made > > much progress on the issues above. Especially for the NA issues I'm not > sure > > what's going on. Is anyone working on this at the moment? If so, can > he/she > > give an update of things to change/fix and an estimate of how long that > will > > take? > > There's been some ongoing behind-the-scenes discussion of the overall > NA problem, but I wouldn't try to give an estimate on the outcome. My > personal opinion is that given you already added the note to the docs > that masked arrays are in a kind of experimental prototype state for > this release, some small inconsistencies in their behaviour shouldn't > be a release blocker. > The release notes already have a whole list of stuff that's > unsupported in the presence of masks ("Fancy > indexing...UFunc.accumulate, UFunc.reduceat...where=...ndarray.argmax, > ndarray.argmin..."), I'm not sure why .real and .imag are blockers and > they aren't :-). Maybe just make a note of them on that list? > > (Unless of course Chuck fixes them before the other blockers are > finished, as per his email that just arrived.) > Good point. If you look at the open tickets for 1.7.0 ( http://projects.scipy.org/numpy/report/3) with a view on getting a release out soon, you could do the following: #2066 : close as fixed. #2078 : regression, should fix. #1578 : important to fix, but not a regression. Include only if fixed on time. #1755 : mark as knownfail. #2025 : document as not working as expected yet. #2047 : fix or postpone. Pearu indicated this will take him a few hours. #2076 : one of many. not a blocker, postpone. #2101 : need to do this. shouldn't cost much time. #2108 : status unclear. likely a blocker. Can someone who knows about datetime give some feedback on #2108? If that's not a blocker, a release within a couple of weeks can be considered. Although not fixing #1578 is questionable, and we need to revisit the LTS release debate... Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Apr 16 18:01:29 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 16 Apr 2012 18:01:29 -0400 Subject: [Numpy-discussion] adding a cut function to numpy In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 5:51 PM, Tony Yu wrote: > > > On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold > wrote: >> >> Hi, >> >> I have a pull request here [1] to add a cut function similar to R's >> [2]. It seems there are often requests for similar functionality. It's >> something I'm making use of for my own work and would like to use in >> statstmodels and in generating instances of pandas' Factor class, but >> is this generally something people would find useful to warrant its >> inclusion in numpy? It will be even more useful I think with an enum >> dtype in numpy. >> >> If you aren't familiar with cut, here's a potential use case. Going >> from a continuous to a categorical variable. >> >> Given a continuous variable >> >> [~/] >> [8]: age = np.random.randint(15,70, size=100) >> >> [~/] >> [9]: age >> [9]: >> array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27, >> ? ? ? 
17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69, >> ? ? ? 50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40, >> ? ? ? 27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25, >> ? ? ? 36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27, >> ? ? ? 23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26]) >> >> Give me a variable where people are in age groups (lower bound is not >> inclusive) >> >> [~/] >> [10]: groups = [14, 25, 35, 45, 55, 70] >> >> [~/] >> [11]: age_cat = np.cut(age, groups) >> >> [~/] >> [12]: age_cat >> [12]: >> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, >> 3, >> ? ? ? 1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4, >> ? ? ? 3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3, >> ? ? ? 3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5, >> ? ? ? 3, 2, 3, 2, 1, 3, 2, 2]) >> >> Skipper >> >> [1] https://github.com/numpy/numpy/pull/248 >> [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html > > > Is this the same as `np.searchsorted` (with reversed arguments)? > > In [292]:?np.searchsorted(groups, age) > Out[292]: > array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3, > ? ? ? ?1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4, > ? ? ? ?3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3, > ? ? ? ?3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5, > ? ? ? ?3, 2, 3, 2, 1, 3, 2, 2]) > That's news to me, and I don't know how I missed it. It looks like there is overlap, but cut will also do binning for equal width categorization [~/] [21]: np.cut(age, 6) [21]: array([5, 2, 1, 2, 3, 6, 5, 2, 1, 1, 4, 6, 3, 5, 3, 4, 2, 1, 2, 1, 6, 2, 4, 1, 5, 2, 4, 5, 2, 3, 6, 2, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4, 4, 2, 5, 5, 3, 2, 5, 4, 4, 2, 5, 1, 3, 2, 5, 1, 5, 1, 1, 2, 1, 2, 3, 4, 2, 5, 3, 2, 4, 3, 1, 6, 2, 2, 1, 1, 3, 4, 2, 1, 6, 6, 6, 3, 3, 6, 3, 3, 3, 3, 1, 3, 2, 2]) and explicitly handles the case with constant x [~/] [26]: x = np.ones(100)*6 [~/] [27]: np.cut(x, 5) [27]: array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]) I guess I could patch searchsorted. Thoughts? Skipper From cmutel at gmail.com Mon Apr 16 18:06:11 2012 From: cmutel at gmail.com (Christopher Mutel) Date: Tue, 17 Apr 2012 00:06:11 +0200 Subject: [Numpy-discussion] Different behaviour of python built sum and addition on ndarrays Message-ID: So, for both 1.5 and 1.6 (at least), it appears that the builtin sum does not add ndarrays the way "+" (and operator.add) do: a = np.arange(10).reshape((2,5)) b = np.arange(10, 20).reshape((2,5)) sum(a,b) Out[5]: array([[15, 18, 21, 24, 27], [20, 23, 26, 29, 32]]) a + b Out[6]: array([[10, 12, 14, 16, 18], [20, 22, 24, 26, 28]]) Is this expected? I couldn't find a description of why this would occur in the mailing list or in the documentation. I can't figure out what sum does at all, actually, as it doesn't seem to be a case of strange broadcasting or any other tricks I tried. 
Yours, Chris

--
############################
Chris Mutel
Ökologisches Systemdesign - Ecological Systems Design
Institut f.Umweltingenieurwissenschaften - Institute for Environmental Engineering
ETH Zürich - HIF C 44 - Schafmattstr. 6
8093 Zürich
Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61
############################

From travis at continuum.io Mon Apr 16 18:06:52 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 16 Apr 2012 17:06:52 -0500
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To:
References:
Message-ID: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io>

There is an issue with the NumPy 1.7 release that we all need to
understand. Doesn't including the missing-data attributes in the NumPy
structure in a released version of NumPy basically commit to including
those attributes in NumPy 1.8? I'm not comfortable with that, is
everyone else? One possibility is to move those attributes to a C-level
sub-class of NumPy.

I have heard from a few people that they are not excited by the growth
of the NumPy data-structure by the 3 pointers needed to hold the
masked-array storage. This is especially true when there is talk to
potentially add additional attributes to the NumPy array (for labels and
other meta-information). If you are willing to let us know how you feel
about this, please speak up.

Mark Wiebe will be in Austin for about 3 months. He and I will be
hashing some of this out in the first week or two. We will present any
proposal and ask questions to this list before acting. We will be using
some phone calls and face-to-face communications to increase the
bandwidth and speed of the conversations (not to exclude anyone). If you
would like to be part of the in-person discussions let me know -- or
just make your views known here --- they will be taken seriously.

The goal is consensus for any major change in NumPy. If we can't get
consensus, then we vote on this list and use a super-majority. If we
can't get a super-majority, then except in rare circumstances we can't
move forward. Heavy users of NumPy get higher voting privileges.

My perspective is that we don't have consensus on the current additions
to the NumPy data-structure to have the current additional attributes on
the NumPy data-structure be included for long-term release.

Best,

-Travis

On Mar 25, 2012, at 6:27 PM, Charles R Harris wrote:

> On Sun, Mar 25, 2012 at 3:14 PM, Ralf Gommers wrote:
> On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris wrote:
> Hi All,
>
> There several problems with numpy master that need to be fixed before a release can be considered.
> Datetime on windows with mingw.
> Bus error on SPARC, ticket #2076.
> NA and real/complex views of complex arrays.
> Number 1 has been proved to be particularly difficult, any help or suggestions for that would be much appreciated. The current work has been going in pull request 214.
>
> This isn't to say that there aren't a ton of other things that need fixing or that we can skip out on the current stack of pull requests, but I think it is impossible to consider a release while those three problems are outstanding.
> Why do you consider (2) a blocker? Not saying it's not important, but there are eight other open tickets with segfaults. Some are more esoteric than other, but I don't see why for example #1713 and #1808 are less important than this one.
>
> #1522 provides a patch that fixes a segfault by the way, could use a review.
> > > I wasn't aware of the other segfaults, I'd like to get them all fixed... The list was meant to elicit additions. > > I don't know where the missed floating point errors come from, but they are somewhat dependent on the compiler doing the right thing and hardware support. I'd welcome any insight into why we get them on SPARC (underflow) and Windows (overflow). The windows buildbot doesn't seem to be updating correctly since it is still missing the combinations method that is now part of the test module. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 16 18:14:36 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 16 Apr 2012 23:14:36 +0100 Subject: [Numpy-discussion] Different behaviour of python built sum and addition on ndarrays In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 11:06 PM, Christopher Mutel wrote: > So, for both 1.5 and 1.6 (at least), it appears that the builtin sum > does not add ndarrays the way "+" (and operator.add) do: > > a = np.arange(10).reshape((2,5)) > b = np.arange(10, 20).reshape((2,5)) > sum(a,b) > Out[5]: > array([[15, 18, 21, 24, 27], > ? ? ? [20, 23, 26, 29, 32]]) > > a + b > Out[6]: > array([[10, 12, 14, 16, 18], > ? ? ? [20, 22, 24, 26, 28]]) > > Is this expected? I couldn't find a description of why this would > occur in the mailing list or in the documentation. I can't figure out > what sum does at all, actually, as it doesn't seem to be a case of > strange broadcasting or any other tricks I tried. The 'sum' function that comes builtin to the python language does this: def sum(iterable, start=0): value = start for item in iterable: value = value + item return value So your 'b' is acting as an initializer for this sum, which may not be what you expect. 'sum' is almost always called with only one argument.[1] Next, note that if you try to iterate over a Numpy 2-d array, it gives you each row: In [15]: for row in a: print "row is:", row row is: [0 1 2 3 4] row is: [5 6 7 8 9] So sum(a, b) is in fact computing this: In [16]: b + a[0, :] + a[1, :] Out[16]: array([[15, 18, 21, 24, 27], [20, 23, 26, 29, 32]]) Moral of the story: use np.sum, it's less confusing :-) HTH, -- Nathaniel [1] The only exception I've ever run into is that if you want to concatenate a list-of-lists, then this is a cute and useful trick: In [13]: sum([["a", "b"], ["c", "d"]], []) Out[13]: ['a', 'b', 'c', 'd'] From ralf.gommers at googlemail.com Mon Apr 16 18:21:43 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 17 Apr 2012 00:21:43 +0200 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 12:06 AM, Travis Oliphant wrote: > There is an issue with the NumPy 1.7 release that we all need to > understand. Doesn't including the missing-data attributes in the NumPy > structure in a released version of NumPy basically commit to including > those attributes in NumPy 1.8? > We clearly labeled NA as experimental, so some changes are to be expected. But not complete removal - so yes, if we release them they should stay in some form. > I'm not comfortable with that, is everyone else? 
One possibility is to > move those attributes to a C-level sub-class of NumPy. > That's the first time I've heard this. Until now, we have talked a lot about adding bitmasks and API changes, not about complete removal. My assumption was that the experimental label was enough. From Nathaniel's reaction I gathered the same. It looks like too many conversations on this topic are happening off-list. Ralf > I have heard from a few people that they are not excited by the growth of > the NumPy data-structure by the 3 pointers needed to hold the masked-array > storage. This is especially true when there is talk to potentially add > additional attributes to the NumPy array (for labels and other > meta-information). If you are willing to let us know how you feel > about this, please speak up. > > Mark Wiebe will be in Austin for about 3 months. He and I will be hashing > some of this out in the first week or two. We will present any proposal > and ask questions to this list before acting. We will be using some > phone calls and face-to-face communications to increase the bandwidth and > speed of the conversations (not to exclude anyone). If you would like to > be part of the in-person discussions let me know -- or just make your views > known here --- they will be taken seriously. > > The goal is consensus for any major change in NumPy. If we can't get > consensus, then we vote on this list and use a super-majority. If we > can't get a super-majority, then except in rare circumstances we can't move > forward. Heavy users of NumPy get higher voting privileges. > > My perspective is that we don't have consensus on the current additions to > the NumPy data-structure to have the current additional attributes on the > NumPy data-structure be included for long-term release. > > Best, > > -Travis > > > > > > On Mar 25, 2012, at 6:27 PM, Charles R Harris wrote: > > > > On Sun, Mar 25, 2012 at 3:14 PM, Ralf Gommers > wrote: > >> >> >> On Sat, Mar 24, 2012 at 10:13 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> There several problems with numpy master that need to be fixed before a >>> release can be considered. >>> >>> 1. Datetime on windows with mingw. >>> 2. Bus error on SPARC, ticket #2076. >>> 3. NA and real/complex views of complex arrays. >>> >>> Number 1 has been proved to be particularly difficult, any help or >>> suggestions for that would be much appreciated. The current work has been >>> going in pull request 214 . >>> >>> This isn't to say that there aren't a ton of other things that need >>> fixing or that we can skip out on the current stack of pull requests, but I >>> think it is impossible to consider a release while those three problems are >>> outstanding. >>> >> Why do you consider (2) a blocker? Not saying it's not important, but >> there are eight other open tickets with segfaults. Some are more esoteric >> than other, but I don't see why for example #1713 and #1808 are less >> important than this one. >> >> #1522 provides a patch that fixes a segfault by the way, could use a >> review. >> >> > I wasn't aware of the other segfaults, I'd like to get them all fixed... > The list was meant to elicit additions. > > I don't know where the missed floating point errors come from, but they > are somewhat dependent on the compiler doing the right thing and hardware > support. I'd welcome any insight into why we get them on SPARC (underflow) > and Windows (overflow). 
The windows buildbot doesn't seem to be updating > correctly since it is still missing the combinations method that is now > part of the test module. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Mon Apr 16 18:27:51 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 16 Apr 2012 15:27:51 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers wrote: > That's the first time I've heard this. Until now, we have talked a lot about > adding bitmasks and API changes, not about complete removal. My assumption > was that the experimental label was enough. From Nathaniel's reaction I > gathered the same. It looks like too many conversations on this topic are > happening off-list. My impression was that Travis was just suggesting that as an option here for discussion, not presenting it as something discussed elsewhere. I read Travis' email precisely as restarting the discussion for consideration of the issues in full public view (+ calls/skype open to anyone interested for bandwidth purposes), so in this case I don't think there's any background off-list to worry about. At least that's how I read it... Cheers, f From travis at continuum.io Mon Apr 16 18:33:11 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 16 Apr 2012 17:33:11 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: <82302F4C-1B2A-4CC4-9BCD-472E324CAA07@continuum.io> No off list discussions have been happening material to this point. I am basically stating my view for the first time. I have delayed because I realize it is not a pleasant view and I was hoping I could end up resolving it favorably. But, it needs to be discussed before 1.7 is released. -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 16, 2012, at 5:27 PM, Fernando Perez wrote: > On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers > wrote: >> That's the first time I've heard this. Until now, we have talked a lot about >> adding bitmasks and API changes, not about complete removal. My assumption >> was that the experimental label was enough. From Nathaniel's reaction I >> gathered the same. It looks like too many conversations on this topic are >> happening off-list. > > My impression was that Travis was just suggesting that as an option > here for discussion, not presenting it as something discussed > elsewhere. I read Travis' email precisely as restarting the > discussion for consideration of the issues in full public view (+ > calls/skype open to anyone interested for bandwidth purposes), so in > this case I don't think there's any background off-list to worry > about. At least that's how I read it... 
> > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Mon Apr 16 18:40:50 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Apr 2012 16:40:50 -0600 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <82302F4C-1B2A-4CC4-9BCD-472E324CAA07@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <82302F4C-1B2A-4CC4-9BCD-472E324CAA07@continuum.io> Message-ID: On Mon, Apr 16, 2012 at 4:33 PM, Travis Oliphant wrote: > No off list discussions have been happening material to this point. I am > basically stating my view for the first time. I have delayed because I > realize it is not a pleasant view and I was hoping I could end up resolving > it favorably. > > But, it needs to be discussed before 1.7 is released. > > What is the problem with three extra pointers? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Apr 16 19:01:25 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 17 Apr 2012 01:01:25 +0200 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez wrote: > On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers > wrote: > > That's the first time I've heard this. Until now, we have talked a lot > about > > adding bitmasks and API changes, not about complete removal. My > assumption > > was that the experimental label was enough. From Nathaniel's reaction I > > gathered the same. It looks like too many conversations on this topic are > > happening off-list. > > My impression was that Travis was just suggesting that as an option > here for discussion, not presenting it as something discussed > elsewhere. >From "I have heard from a few people that they are not excited ...." I deduce it was discussed to some extent. I read Travis' email precisely as restarting the > discussion for consideration of the issues in full public view It wasn't restating anything, it's completely opposite to the part that I thought we did reach consensus on (*not* backing out changes). I stated as much when first discussing a 1.7.0 in December, http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, with no one disagreeing. It's perfectly fine to reconsider any previous decisions/discussions of course. However, I do now draw the conclusion that it's best to wait for this issue to be resolved before considering a new release. I had been working on closing tickets and cleaning up loose ends for 1.7.0, and pinging others to do the same. I guess I'll stop doing that for now, until the renewed NA debate has been settled. If there are bug fixes that are important (like the Debian segfaults with Python debug builds), we can do a 1.6.2 release. Ralf (+ > calls/skype open to anyone interested for bandwidth purposes), so in > this case I don't think there's any background off-list to worry > about. At least that's how I read it... > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ceball at gmail.com Mon Apr 16 19:11:16 2012 From: ceball at gmail.com (Chris Ball) Date: Mon, 16 Apr 2012 23:11:16 +0000 (UTC) Subject: [Numpy-discussion] Test failures - which dependencies am I missing? Message-ID: Hi, When I build NumPy and then run the tests on Ubuntu (10.04 LTS) and Debian (6.1), I always seem to get several failures. I guess most of these failures come from not having some dependencies installed, but I can't figure out which ones by reading e.g. http://docs.scipy.org/doc/numpy/user/install.html. It would be great if someone could tell me what I've likely missed! I remember Gael Varoquaux posted a few weeks back with some of the same errors (http://thread.gmane.org/gmane.comp.python.numeric.general/49032/). He was also using Ubuntu (though a newer version). Anyway, on Ubuntu here are the errors - other than known failures - after "python setup.py build_ext -i" (or python setup.py build_ext -i -- fcompiler=gnu") followed by nosetests: ====================================================================== ERROR: Failure: ImportError (cannot import name fib2) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/scratch/ceball/numpy/numpy/distutils/tests/f2py_ext/tests/ test_fib2.py", line 3, in from f2py_ext import fib2 ImportError: cannot import name fib2 ====================================================================== ERROR: Failure: ImportError (cannot import name foo) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/scratch/ceball/numpy/numpy/distutils/tests/f2py_f90_ext/tests/ test_foo.py", line 3, in from f2py_f90_ext import foo ImportError: cannot import name foo ====================================================================== ERROR: Failure: ImportError (cannot import name fib3) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/scratch/ceball/numpy/numpy/distutils/tests/gen_ext/tests/ test_fib3.py", line 3, in from gen_ext import fib3 ImportError: cannot import name fib3 ====================================================================== ERROR: Failure: ImportError (No module named primes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName 
addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/scratch/ceball/numpy/numpy/distutils/tests/pyrex_ext/tests/ test_primes.py", line 3, in from pyrex_ext.primes import primes ImportError: No module named primes ====================================================================== ERROR: Failure: ImportError (cannot import name example) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/scratch/ceball/numpy/numpy/distutils/tests/swig_ext/tests/ test_example.py", line 3, in from swig_ext import example ImportError: cannot import name example ====================================================================== ERROR: Failure: ImportError (cannot import name example2) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/scratch/ceball/numpy/numpy/distutils/tests/swig_ext/tests/ test_example2.py", line 3, in from swig_ext import example2 ImportError: cannot import name example2 ---------------------------------------------------------------------- I see something similar on Debian 6 (x86_64). You can see the full output for that at https://jenkins.shiningpanda.com/scipy/job/NumPy/ PLATFORM=debian6,PYTHON=CPython-2.7/19/consoleFull, in case it helps. Thanks, Chris From ceball at gmail.com Mon Apr 16 19:16:43 2012 From: ceball at gmail.com (Chris Ball) Date: Mon, 16 Apr 2012 23:16:43 +0000 (UTC) Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? References: Message-ID: Charles R Harris gmail.com> writes: > > > On Thu, Apr 12, 2012 at 8:13 PM, Charles R Harris gmail.com> wrote: > > On Thu, Apr 12, 2012 at 7:41 PM, Charles R Harris gmail.com> wrote: > > On Thu, Apr 12, 2012 at 2:05 AM, Chris Ball gmail.com> wrote: > > > > Hi, > I'm trying out various continuous integration options, so I happen to be > testing NumPy on several platforms that I don't normally use. > Recently, I've been getting a segmentation fault on Debian 6 (with Python > 2.7.2): > Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64 > GNU/Linux (Debian GNU/Linux 6.0 \n \l) ... 
> Segmentation fault is buried in console output of Jenkins:https:// jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/console > The previous build was ok:https://jenkins.shiningpanda.com/scipy/job/NumPy/ PYTHON=CPython-2.7/5/console > Changes that Jenkins claims are responsible:https://jenkins.shiningpanda.com/ scipy/job/NumPy/PYTHON=CPython-2.7/6/ > changes#detail0 > > > > It seems that python2.7 is far, far, too recent to be part of Debian 6. I mean, finding python 2.7 in recent Debian stable would be like finding an atomic cannon in a 1st dynasty Egyptian tomb. So it is in testing, but for replication I like to know where you got it. > > > > > > Python 2.7 from Debian testing works fine here. > > > > > But ActiveState python (ucs2) segfaults with >>> a = np.array(['0123456789'], 'U') > >>> a Segmentation fault The string needs to be long for this to show. Chuck Sorry for the delay. I'll let you know about that as soon as I can (I didn't set up the machine, and although I can get ssh access, it's not straightforward). Chris
From travis at continuum.io Mon Apr 16 19:17:08 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 16 Apr 2012 18:17:08 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: The comments I have heard have been from people who haven't wanted to make them on this list. I wish they would, but I understand that not everyone wants to be drawn into a long discussion. They have not been discussions. My bias is to just move forward with what is there. After a week or two of discussion, I expect that we will resolve this one way or another. The result may be to just move forward as previously planned. However, that might not be the best move forward either. These are significant changes and they do impact users. We need to understand those implications and take very seriously any concerns from users. There is time to look at this carefully. We need to take the time. I am really posting so that the discussions Mark and I have this week (I haven't seen Mark since PyCon) can be productive with as many other people participating as possible. -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 16, 2012, at 6:01 PM, Ralf Gommers wrote: > > > On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez wrote: > On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers > wrote: > > That's the first time I've heard this. Until now, we have talked a lot about > > adding bitmasks and API changes, not about complete removal. My assumption > > was that the experimental label was enough. From Nathaniel's reaction I > > gathered the same. It looks like too many conversations on this topic are > > happening off-list. > > My impression was that Travis was just suggesting that as an option > here for discussion, not presenting it as something discussed > elsewhere. > > From "I have heard from a few people that they are not excited ...." I deduce it was discussed to some extent. > > I read Travis' email precisely as restarting the > discussion for consideration of the issues in full public view > > It wasn't restating anything, it's completely opposite to the part that I thought we did reach consensus on (*not* backing out changes). I stated as much when first discussing a 1.7.0 in December, http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, with no one disagreeing. > > It's perfectly fine to reconsider any previous decisions/discussions of course.
> > However, I do now draw the conclusion that it's best to wait for this issue to be resolved before considering a new release. I had been working on closing tickets and cleaning up loose ends for 1.7.0, and pinging others to do the same. I guess I'll stop doing that for now, until the renewed NA debate has been settled. > > If there are bug fixes that are important (like the Debian segfaults with Python debug builds), we can do a 1.6.2 release. > > Ralf > > (+ > calls/skype open to anyone interested for bandwidth purposes), so in > this case I don't think there's any background off-list to worry > about. At least that's how I read it... > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Mon Apr 16 20:08:46 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Mon, 16 Apr 2012 20:08:46 -0400 Subject: [Numpy-discussion] adding a cut function to numpy In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 6:01 PM, Skipper Seabold wrote: > On Mon, Apr 16, 2012 at 5:51 PM, Tony Yu wrote: > > > > > > On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold > > wrote: > >> > >> Hi, > >> > >> I have a pull request here [1] to add a cut function similar to R's > >> [2]. It seems there are often requests for similar functionality. It's > >> something I'm making use of for my own work and would like to use in > >> statstmodels and in generating instances of pandas' Factor class, but > >> is this generally something people would find useful to warrant its > >> inclusion in numpy? It will be even more useful I think with an enum > >> dtype in numpy. > >> > >> If you aren't familiar with cut, here's a potential use case. Going > >> from a continuous to a categorical variable. > >> > >> Given a continuous variable > >> > >> [~/] > >> [8]: age = np.random.randint(15,70, size=100) > >> > >> [~/] > >> [9]: age > >> [9]: > >> array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, > 27, > >> 17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, > 69, > >> 50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, > 40, > >> 27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, > 25, > >> 36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, > 27, > >> 23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26]) > >> > >> Give me a variable where people are in age groups (lower bound is not > >> inclusive) > >> > >> [~/] > >> [10]: groups = [14, 25, 35, 45, 55, 70] > >> > >> [~/] > >> [11]: age_cat = np.cut(age, groups) > >> > >> [~/] > >> [12]: age_cat > >> [12]: > >> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, > >> 3, > >> 1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, > 4, > >> 3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, > 3, > >> 3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, > 5, > >> 3, 2, 3, 2, 1, 3, 2, 2]) > >> > >> Skipper > >> > >> [1] https://github.com/numpy/numpy/pull/248 > >> [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html > > > > > > Is this the same as `np.searchsorted` (with reversed arguments)? 
> > > > In [292]: np.searchsorted(groups, age) > > Out[292]: > > array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3, > >        1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4, > >        3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3, > >        3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5, > >        3, 2, 3, 2, 1, 3, 2, 2]) > > That's news to me, and I don't know how I missed it. Actually, the only reason I remember searchsorted is because I also implemented a variant of it before finding that it existed. > It looks like > there is overlap, but cut will also do binning for equal width > categorization > > [~/] > [21]: np.cut(age, 6) > [21]: > array([5, 2, 1, 2, 3, 6, 5, 2, 1, 1, 4, 6, 3, 5, 3, 4, 2, 1, 2, 1, 6, 2, 4, >        1, 5, 2, 4, 5, 2, 3, 6, 2, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4, >        4, 2, 5, 5, 3, 2, 5, 4, 4, 2, 5, 1, 3, 2, 5, 1, 5, 1, 1, 2, 1, 2, 3, >        4, 2, 5, 3, 2, 4, 3, 1, 6, 2, 2, 1, 1, 3, 4, 2, 1, 6, 6, 6, 3, 3, 6, >        3, 3, 3, 3, 1, 3, 2, 2]) > > and explicitly handles the case with constant x > > [~/] > [26]: x = np.ones(100)*6 > > [~/] > [27]: np.cut(x, 5) > [27]: > array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >        3, 3, 3, 3, 3, 3, 3, 3]) > > I guess I could patch searchsorted. Thoughts? > > Skipper > Hmm, ... I'm not sure if these other call signatures map as well to the name "searchsorted"; i.e. "cut" makes more sense in these cases. On the other hand, it seems these cases could be handled by `np.digitize` (although they aren't currently). Hmm,... why doesn't the above call to `cut` match (what I assume to be) the equivalent call to `np.digitize`: In [302]: np.digitize(age, np.linspace(age.min(), age.max(), 6)) Out[302]: array([4, 2, 1, 1, 2, 6, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,        1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4,        3, 2, 4, 4, 3, 2, 4, 3, 3, 2, 4, 1, 2, 2, 4, 1, 4, 1, 1, 2, 1, 1, 2, 3,        2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 6, 5, 2, 3, 5,        3, 2, 3, 2, 1, 2, 2, 2]) It's unfortunate that `digitize` and `histogram` have one call signature, but `searchsorted` has the reverse; in that sense, I like `cut` better. Cheers -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL:
From charlesr.harris at gmail.com Mon Apr 16 20:46:55 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Apr 2012 18:46:55 -0600 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: On Mon, Apr 16, 2012 at 5:17 PM, Travis Oliphant wrote: > The comments I have heard have been from people who haven't wanted to make > them on this list. I wish they would, but I understand that not everyone > wants to be drawn into a long discussion. They have not been discussions. > > My bias is to just move forward with what is there. After a week or two > of discussion, I expect that we will resolve this one way or another. The > result may be to just move forward as previously planned. However, that might > not be the best move forward either. These are significant changes and > they do impact users. We need to understand those implications and take > very seriously any concerns from users.
> > There is time to look at this carefully. We need to take the time. I > am really posting so that the discussions Mark and I have this week (I > haven't seen Mark since PyCon) can be productive with as many other people > participating as possible. > > I would suggest that you and Mark have a good talk first, then report here with some specifics that you think need discussion, along with specifics from the unnamed sources. The somewhat vague "some say" doesn't help much and in the absence of specifics the discussion is likely to proceed along the same old lines if it happens at all. Meanwhile there is a disturbance in the force that makes us all uneasy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
From matthew.brett at gmail.com Mon Apr 16 21:03:50 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 16 Apr 2012 18:03:50 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: Hi, On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant wrote: >> I have heard from a few people that they are not excited by the growth of >> the NumPy data-structure by the 3 pointers needed to hold the masked-array >> storage. This is especially true when there is talk to potentially add >> additional attributes to the NumPy array (for labels and other >> meta-information). If you are willing to let us know how you feel about >> this, please speak up. I guess there are two questions here 1) Will something like the current version of masked arrays have a long term future in numpy, regardless of eventual API? Most likely answer - yes? 2) Will likely changes to the masked array API make any difference to the number of extra pointers? Likely answer no? Is that right? I have the impression that the masked array API discussion still has not come out fully into the unforgiving light of discussion day, but if the answer to 2) is No, then I suppose the API discussion is not relevant to the 3 pointers change. See y'all, Matthew
From charlesr.harris at gmail.com Mon Apr 16 21:21:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Apr 2012 19:21:04 -0600 Subject: [Numpy-discussion] Segmentation fault during tests with Python 2.7.2 on Debian 6? In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 5:16 PM, Chris Ball wrote: > Charles R Harris gmail.com> writes: > > > > > > On Thu, Apr 12, 2012 at 8:13 PM, Charles R Harris gmail.com> wrote: > > On Thu, Apr 12, 2012 at 7:41 PM, Charles R Harris gmail.com> wrote: > > On Thu, Apr 12, 2012 at 2:05 AM, Chris Ball gmail.com> wrote: > > > > Hi, > I'm trying out various continuous integration options, so I happen to be > testing NumPy on several platforms that I don't normally use. > Recently, I've been getting a segmentation fault on Debian 6 (with Python > 2.7.2): > Linux debian6-amd64 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64 > GNU/Linux (Debian GNU/Linux 6.0 \n \l) ...
> > Segmentation fault is buried in console output of Jenkins:https:// > jenkins.shiningpanda.com/scipy/job/NumPy/PYTHON=CPython-2.7/6/console > > The previous build was ok: > https://jenkins.shiningpanda.com/scipy/job/NumPy/ > PYTHON=CPython-2.7/5/console > > Changes that Jenkins claims are responsible: > https://jenkins.shiningpanda.com/ > scipy/job/NumPy/PYTHON=CPython-2.7/6/ > > changes#detail0 > > > > > > > > It seems that python2.7 is far, far, too recent to be part of Debian 6. I > mean, finding python 2.7 in recent Debian stable would be like finding an > atomic cannon in a 1'st dynasty Egyptian tomb. So it is in testing, but for > replication I like to know where you got it. > > > > > > > > > > > > Python 2.7 from Debian testing works fine here. > > > > > > > > > > But ActiveState python (ucs2) segfaults with>>> a = > np.array(['0123456789'], > 'U') > > >>> aSegmentation faultThe string needs to be long for this to show.Chuck > > Sorry for the delay. I'll let you know about that as soon as I can (I > didn't > set up the machine, and although I can get ssh access, it's not > straightforward). > > Don't worry about it yet, I'm working on a fix. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Apr 16 22:38:00 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 16 Apr 2012 19:38:00 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: Hi, On Mon, Apr 16, 2012 at 6:03 PM, Matthew Brett wrote: > Hi, > > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant wrote: > >> I have heard from a few people that they are not excited by the growth of >> the NumPy data-structure by the 3 pointers needed to hold the masked-array >> storage. ? This is especially true when there is talk to potentially add >> additional attributes to the NumPy array (for labels and other >> meta-information). ? ? ?If you are willing to let us know how you feel about >> this, please speak up. > > I guess there are two questions here > > 1) Will something like the current version of masked arrays have a > long term future in numpy, regardless of eventual API? Most likely > answer - yes? > 2) Will likely changes to the masked array API make any difference to > the number of extra pointers? ?Likely answer no? > > Is that right? > > I have the impression that the masked array API discussion still has > not come out fully into the unforgiving light of discussion day, but > if the answer to 2) is No, then I suppose the API discussion is not > relevant to the 3 pointers change. Sorry, if the answers to 1 and 2 are Yes and No then the API discussion may not be relevant. Cheers, Matthew From travis at continuum.io Mon Apr 16 22:46:27 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 16 Apr 2012 21:46:27 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote: > Hi, > > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant wrote: > >> I have heard from a few people that they are not excited by the growth of >> the NumPy data-structure by the 3 pointers needed to hold the masked-array >> storage. 
This is especially true when there is talk to potentially add >> additional attributes to the NumPy array (for labels and other >> meta-information). If you are willing to let us know how you feel about >> this, please speak up. >> >> I guess there are two questions here >> >> 1) Will something like the current version of masked arrays have a >> long term future in numpy, regardless of eventual API? Most likely >> answer - yes? I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C). >> 2) Will likely changes to the masked array API make any difference to >> the number of extra pointers? Likely answer no? >> >> Is that right? The answer to this is very likely no on the Python side. But, on the C-side, there could be some differences (i.e. are masked arrays a sub-class of the ndarray or not). >> >> I have the impression that the masked array API discussion still has >> not come out fully into the unforgiving light of discussion day, but >> if the answer to 2) is No, then I suppose the API discussion is not >> relevant to the 3 pointers change. You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. But, that is the current climate. In that climate, my concern is that we haven't finalized the API but are rapidly cementing the *structure* of NumPy arrays into a modified form that has real downstream implications. Two other people I have talked to share this concern (nobody who has posted on this list before but who are heavy users of NumPy). I may have missed the threads where it was discussed, but have these structure changes and their implications been fully discussed? Is there anyone else who is concerned about adding 3 more pointers (12 bytes or 24 bytes) to the NumPy structure? As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter). But, I personally know of half-written NEPs that propose to add more pointers to the NumPy array: * to allow meta-information to be attached to a NumPy array * to allow labels to be attached to a NumPy array (ala data-array) * to allow multiple chunks for an array. Are people O.K. with 5 or 6 more pointers on every NumPy array? We could also think about adding just one more pointer to a new "enhanced" structure that contains multiple enhancements to the NumPy array. But, this whole line of discussion sounds a lot like a true sub-class of the NumPy array at the C-level. It has the benefit that only people that use the features of the sub-class have to worry about using the extra space. Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
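To put rough numbers on the size question (illustrative arithmetic only, assuming a 64-bit build where a pointer is 8 bytes; halve everything for 32-bit):

>>> ptr = 8                          # bytes per pointer on a 64-bit build
>>> 3 * ptr                          # the three masked-array slots
24
>>> 6 * ptr                          # with the other proposed additions on top
48
>>> n = 10**7                        # ten million small arrays
>>> (3 * ptr * n) / (1024.0 * 1024)  # extra memory in MiB, before any data
228.8818359375

For a handful of large arrays this is noise; for workloads that create millions of small arrays it is not.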
Best regards, -Travis > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
From matthew.brett at gmail.com Mon Apr 16 23:08:03 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 16 Apr 2012 20:08:03 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> Message-ID: Hi, On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant wrote: > > On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote: > >> Hi, >> >> On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant wrote: >> >>> I have heard from a few people that they are not excited by the growth of >>> the NumPy data-structure by the 3 pointers needed to hold the masked-array >>> storage. This is especially true when there is talk to potentially add >>> additional attributes to the NumPy array (for labels and other >>> meta-information). If you are willing to let us know how you feel about >>> this, please speak up. >> >> I guess there are two questions here >> >> 1) Will something like the current version of masked arrays have a >> long term future in numpy, regardless of eventual API? Most likely >> answer - yes? > > I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C). I'd love to hear that argument fleshed out in more detail - do you have time? >> 2) Will likely changes to the masked array API make any difference to >> the number of extra pointers? Likely answer no? >> >> Is that right? > > The answer to this is very likely no on the Python side. But, on the C-side, there could be some differences (i.e. are masked arrays a sub-class of the ndarray or not). > >> >> I have the impression that the masked array API discussion still has >> not come out fully into the unforgiving light of discussion day, but >> if the answer to 2) is No, then I suppose the API discussion is not >> relevant to the 3 pointers change. > > You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. But, that is the current climate. The objectors object to any binary ABI change, but not specifically three pointers rather than two or one? Is their point then about ABI breakage? Because that seems like a different point again. Or is it possible that they are in fact worried about the masked array API? > Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
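To be concrete about what I mean in question 1) - the two user-visible flavors I have in mind are numpy.ma as it stands and the NA support now in master. A toy comparison (the maskna / NA / skipna spellings are from my reading of the current development docs, so please correct me if I have them wrong):

>>> import numpy as np
>>> import numpy.ma as ma
>>> x = ma.masked_array([1, 2, 3], mask=[False, True, False])
>>> x.sum()                   # the Python sub-class re-implements sum()
4
>>> y = np.array([1, 2, 3], maskna=True)   # master's masked ndarray
>>> y[1] = np.NA
>>> s = np.sum(y)             # s is NA - missing values propagate
>>> np.sum(y, skipna=True)    # skip the masked element instead
4

I'd like to know which of these two behaviours a C-level sub-class would keep.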
I started writing something about this but I guess you'd know what I'd write, so I only humbly ask that you consider whether it might be doing real damage to allow substantial discussion that is not documented or argued out in public. See you, Matthew From travis at continuum.io Mon Apr 16 23:16:17 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 16 Apr 2012 22:16:17 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: Ralf, I wouldn't change your plans just yet for NumPy 1.7. With Mark available full time for the next few weeks, I think we will be able to make rapid progress on whatever is decided -- in fact if people are available to help but just need resources let me know off list. I just want to make sure that the process for making significant changes to NumPy does not dis-enfranchise any voice. Like bug-reports, and feature-requests, complaints are food to a project, just like usage is oxygen. In my view, we should take any concern that is raised from the perspective of NumPy is "guilty until proven innocent." This takes some intentional effort. I have found that because of how much work it takes to design and implement software, my natural perspective is to be defensive, but I have always appreciated the outcome when all view-points are considered seriously and addressed respectfully. Best regards, -Travis On Apr 16, 2012, at 6:01 PM, Ralf Gommers wrote: > > > On Tue, Apr 17, 2012 at 12:27 AM, Fernando Perez wrote: > On Mon, Apr 16, 2012 at 3:21 PM, Ralf Gommers > wrote: > > That's the first time I've heard this. Until now, we have talked a lot about > > adding bitmasks and API changes, not about complete removal. My assumption > > was that the experimental label was enough. From Nathaniel's reaction I > > gathered the same. It looks like too many conversations on this topic are > > happening off-list. > > My impression was that Travis was just suggesting that as an option > here for discussion, not presenting it as something discussed > elsewhere. > > From "I have heard from a few people that they are not excited ...." I deduce it was discussed to some extent. > > I read Travis' email precisely as restarting the > discussion for consideration of the issues in full public view > > It wasn't restating anything, it's completely opposite to the part that I thought we did reach consensus on (*not* backing out changes). I stated as much when first discussing a 1.7.0 in December, http://thread.gmane.org/gmane.comp.python.numeric.general/47022/focus=47027, with no one disagreeing. > > It's perfectly fine to reconsider any previous decisions/discussions of course. > > However, I do now draw the conclusion that it's best to wait for this issue to be resolved before considering a new release. I had been working on closing tickets and cleaning up loose ends for 1.7.0, and pinging others to do the same. I guess I'll stop doing that for now, until the renewed NA debate has been settled. > > If there are bug fixes that are important (like the Debian segfaults with Python debug builds), we can do a 1.6.2 release. > > Ralf > > (+ > calls/skype open to anyone interested for bandwidth purposes), so in > this case I don't think there's any background off-list to worry > about. At least that's how I read it... 
> > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Apr 16 23:24:50 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 16 Apr 2012 23:24:50 -0400 Subject: [Numpy-discussion] adding a cut function to numpy In-Reply-To: References: Message-ID: On Mon, Apr 16, 2012 at 8:08 PM, Tony Yu wrote: > > > On Mon, Apr 16, 2012 at 6:01 PM, Skipper Seabold > wrote: >> >> On Mon, Apr 16, 2012 at 5:51 PM, Tony Yu wrote: >> > >> > >> > On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold >> > wrote: >> >> >> >> Hi, >> >> >> >> I have a pull request here [1] to add a cut function similar to R's >> >> [2]. It seems there are often requests for similar functionality. It's >> >> something I'm making use of for my own work and would like to use in >> >> statstmodels and in generating instances of pandas' Factor class, but >> >> is this generally something people would find useful to warrant its >> >> inclusion in numpy? It will be even more useful I think with an enum >> >> dtype in numpy. >> >> >> >> If you aren't familiar with cut, here's a potential use case. Going >> >> from a continuous to a categorical variable. >> >> >> >> Given a continuous variable >> >> >> >> [~/] >> >> [8]: age = np.random.randint(15,70, size=100) >> >> >> >> [~/] >> >> [9]: age >> >> [9]: >> >> array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, >> >> 27, >> >> ? ? ? 17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, >> >> 69, >> >> ? ? ? 50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, >> >> 40, >> >> ? ? ? 27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, >> >> 25, >> >> ? ? ? 36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, >> >> 27, >> >> ? ? ? 23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26]) >> >> >> >> Give me a variable where people are in age groups (lower bound is not >> >> inclusive) >> >> >> >> [~/] >> >> [10]: groups = [14, 25, 35, 45, 55, 70] >> >> >> >> [~/] >> >> [11]: age_cat = np.cut(age, groups) >> >> >> >> [~/] >> >> [12]: age_cat >> >> [12]: >> >> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, >> >> 1, >> >> 3, >> >> ? ? ? 1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, >> >> 4, >> >> ? ? ? 3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, >> >> 3, >> >> ? ? ? 3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, >> >> 5, >> >> ? ? ? 3, 2, 3, 2, 1, 3, 2, 2]) >> >> >> >> Skipper >> >> >> >> [1] https://github.com/numpy/numpy/pull/248 >> >> [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html >> > >> > >> > Is this the same as `np.searchsorted` (with reversed arguments)? >> > >> > In [292]:?np.searchsorted(groups, age) >> > Out[292]: >> > array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, >> > 3, >> > ? ? ? ?1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, >> > 4, >> > ? ? ? ?3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, >> > 3, >> > ? ? ? ?3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, >> > 5, >> > ? ? ? 
?3, 2, 3, 2, 1, 3, 2, 2]) >> > >> >> That's news to me, and I don't know how I missed it. > > > Actually, the only reason I remember searchsorted is because I also > implemented a variant of it before finding that it existed. > It's certainly not an obvious name for the behavior I wanted at least with my background. Ie., I want something that works on the data not the bins/groups. And it's not referenced in histogram or digitize, though now that I wade back through some threads I see people pointing to it. It also appears to be faster than my implementation with digitize with a quick look. >> >> It looks like >> there is overlap, but cut will also do binning for equal width >> categorization >> >> [~/] >> [21]: np.cut(age, 6) >> [21]: >> array([5, 2, 1, 2, 3, 6, 5, 2, 1, 1, 4, 6, 3, 5, 3, 4, 2, 1, 2, 1, 6, 2, >> 4, >> ? ? ? 1, 5, 2, 4, 5, 2, 3, 6, 2, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4, >> ? ? ? 4, 2, 5, 5, 3, 2, 5, 4, 4, 2, 5, 1, 3, 2, 5, 1, 5, 1, 1, 2, 1, 2, 3, >> ? ? ? 4, 2, 5, 3, 2, 4, 3, 1, 6, 2, 2, 1, 1, 3, 4, 2, 1, 6, 6, 6, 3, 3, 6, >> ? ? ? 3, 3, 3, 3, 1, 3, 2, 2]) >> >> and explicitly handles the case with constant x >> >> [~/] >> [26]: x = np.ones(100)*6 >> >> [~/] >> [27]: np.cut(x, 5) >> [27]: >> array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >> 3, >> ? ? ? 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >> ? ? ? 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >> ? ? ? 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, >> ? ? ? 3, 3, 3, 3, 3, 3, 3, 3]) >> >> I guess I could patch searchsorted. Thoughts? >> >> Skipper > > > Hmm, ... I'm not sure if these?other call signatures?map as well to the name > "searchsorted"; i.e. "cut" makes more sense in these cases. > > On the other hand, it seems these cases could be handled by `np.digitize` > (although they aren't currently). Hmm,... why doesn't the above call to > `cut` match (what I assume to be) the equivalent call to `np.digitize`: > > In [302]:?np.digitize(age, np.linspace(age.min(), age.max(), 6)) > Out[302]: > array([4, 2, 1, 1, 2, 6, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3, > ? ? ? ?1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4, > ? ? ? ?3, 2, 4, 4, 3, 2, 4, 3, 3, 2, 4, 1, 2, 2, 4, 1, 4, 1, 1, 2, 1, 1, 2, > ? ? ? ?3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 6, 5, 2, 3, 5, > ? ? ? ?3, 2, 3, 2, 1, 2, 2, 2]) > > It's unfortunate that `digitize` and `histogram` have one call signature, > but `searchsorted` has the reverse; in that sense, I like `cut` better. > I actually extended digitize to work the way I wanted with the sole intention to implement cut. https://github.com/numpy/numpy/pull/245 I agree about the call signature. As I mentioned, the way my work flow goes, I have the data first then think about the groups rather than thinking about doing an action on the groups themselves. In this way, I still think having cut is beneficial. Skipper From travis at continuum.io Mon Apr 16 23:40:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 16 Apr 2012 22:40:53 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> Message-ID: <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> >> >> I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C). 
> > I'd love to hear that argument fleshed out in more detail - do you have time? My proposal here is to basically take the current github NumPy data-structure and make this a sub-type (in C) of the NumPy 1.6 data-structure which is unchanged in NumPy 1.7. This would not require removing code but would require another PyTypeObject and associated structures. I expect Mark could do this work in 2-4 weeks. We also have other developers who could help in order to get the sub-type in NumPy 1.7. What kind of details would you like to see? In this way, the masked-array approach to missing data could be pursued by those who prefer that approach without affecting any other users of numpy arrays (and the numpy.ma sub-class could be deprecated). I would also like to add missing-data dtypes (ideally before NumPy 1.7, but it is not a requirement of release). I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another. > >>> 2) Will likely changes to the masked array API make any difference to >>> the number of extra pointers? Likely answer no? >>> >>> Is that right? >> >> The answer to this is very likely no on the Python side. But, on the C-side, their could be some differences (i.e. are masked arrays a sub-class of the ndarray or not). >> >>> >>> I have the impression that the masked array API discussion still has >>> not come out fully into the unforgiving light of discussion day, but >>> if the answer to 2) is No, then I suppose the API discussion is not >>> relevant to the 3 pointers change. >> >> You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. But, that is the current climate. > > The objectors object to any binary ABI change, but not specifically > three pointers rather than two or one? Adding pointers is not really an ABI change (but removing them after they were there would be...) It's really just the addition of data to the NumPy array structure that they aren't going to use. Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying. > > Is their point then about ABI breakage? Because that seems like a > different point again. Yes, it's not that. > > Or is it possible that they are in fact worried about the masked array API? I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point. I think they just want us to come up with an answer and then move forward. But, they will judge us based on the solution we come up with. > >> Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for. 
> > I started writing something about this but I guess you'd know what I'd > write, so I only humbly ask that you consider whether it might be > doing real damage to allow substantial discussion that is not > documented or argued out in public. It will be documented and argued in public. We are just going to have one off-list conversation to try and speed up the process. You make a valid point, and I appreciate the perspective. Please speak up again after hearing the report if something is not clear. I don't want this to even have the appearance of a "back-room" deal. Mark and I will have conversations about NumPy while he is in Austin. There are many other active stake-holders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do? Thanks, -Travis > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Apr 17 00:01:46 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Apr 2012 22:01:46 -0600 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> Message-ID: On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant wrote: > > On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote: > > > Hi, > > > > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant > wrote: > > > >> I have heard from a few people that they are not excited by the growth > of > >> the NumPy data-structure by the 3 pointers needed to hold the > masked-array > >> storage. This is especially true when there is talk to potentially add > >> additional attributes to the NumPy array (for labels and other > >> meta-information). If you are willing to let us know how you feel > about > >> this, please speak up. > > > > I guess there are two questions here > > > > 1) Will something like the current version of masked arrays have a > > long term future in numpy, regardless of eventual API? Most likely > > answer - yes? > > I think the answer to this is yes, but it could be as a feature-filled > sub-class (like the current numpy.ma, except in C). > I think making numpy.ma a subclass of ndarray has caused all sorts of trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from ndarray for implementation of various parts. The upshot is that almost everything has to be overridden, so it didn't buy much. > > > 2) Will likely changes to the masked array API make any difference to > > the number of extra pointers? Likely answer no? > > > > Is that right? > > The answer to this is very likely no on the Python side. But, on the > C-side, their could be some differences (i.e. are masked arrays a sub-class > of the ndarray or not). > > > > > I have the impression that the masked array API discussion still has > > not come out fully into the unforgiving light of discussion day, but > > if the answer to 2) is No, then I suppose the API discussion is not > > relevant to the 3 pointers change. > > You are correct that the API discussion is separate from this one. 
> Overall, I was surprised at how fervently people would oppose ABI changes. > As has been pointed out, NumPy and Numeric before it were not really > designed to prevent having to recompile when changes were made. I'm still > not sure that a better overall solution is not to promote better > availability of downstream binary packages than excessively worry about ABI > changes in NumPy. But, that is the current climate. > > In that climate, my concern is that we haven't finalized the API but are > rapidly cementing the *structure* of NumPy arrays into a modified form that > has real downstream implications. Two other people I have talked to share > this concern (nobody who has posted on this list before but who are heavy > users of NumPy). I may have missed the threads where it was discussed, > but have these structure changes and their implications been fully > discussed? Is there anyone else who is concerned about adding 3 more > pointers (12 bytes or 24 bytes) to the NumPy structure? > > As Chuck points out, 3 more pointers is not necessarily that big of a deal > if you are talking about a large array (though for small arrays it could > matter). But, I personally know of half-written NEPs that propose to add > more pointers to the NumPy array: > > * to allow meta-information to be attached to a NumPy array > * to allow labels to be attached to a NumPy array (ala data-array) > * to allow multiple chunks for an array. > > Are people O.K. with 5 or 6 more pointers on every NumPy array? We > could also think about adding just one more pointer to a new "enhanced" > structure that contains multiple enhancements to the NumPy array. > > Yes, this whole thing could get out of hand with too many extras. One of the things you could discuss with Mark is how to deal with this, or limit the modifications. At some point the ndarray class could become cumbersome, complicated, and difficult to maintain. We need to be careful that it doesn't go that way. I'd like to keep it as simple as possible, the question is what is fundamental. The main long term advantage of having masks part of the base is the possibility of adapted loops in ufuncs, which would give the advantage of speed. But that is just how it looks from where I stand, no doubt others have different priorities. But, this whole line of discussion sounds a lot like a true sub-class of > the NumPy array at the C-level. It has the benefit that only people that > use the features of the sub-class have to worry about using the extra space. > > Mark and I will talk about this long and hard. Mark has ideas about where > he wants to see NumPy go, but I don't think we have fully accounted for > where NumPy and its user base *is* and there may be better ways to approach > this evolution. If others are interested in the outcome of the > discussion please speak up (either on the list or privately) and we will > make sure your views get heard and accounted for. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Apr 17 00:38:44 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 16 Apr 2012 23:38:44 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? 
(Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> Message-ID: <24DD18E0-BF7E-466E-9C37-7621397CDCA3@continuum.io> On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote: > > > On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant wrote: > > On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote: > > > Hi, > > > > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant wrote: > > > >> I have heard from a few people that they are not excited by the growth of > >> the NumPy data-structure by the 3 pointers needed to hold the masked-array > >> storage. This is especially true when there is talk to potentially add > >> additional attributes to the NumPy array (for labels and other > >> meta-information). If you are willing to let us know how you feel about > >> this, please speak up. > > > > I guess there are two questions here > > > > 1) Will something like the current version of masked arrays have a > > long term future in numpy, regardless of eventual API? Most likely > > answer - yes? > > I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C). > > I think making numpy.ma a subclass of ndarray has caused all sorts of trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from ndarray for implementation of various parts. The upshot is that almost everything has to be overridden, so it didn't buy much. This is a valid point. One could create a new object that is binary compatible with the NumPy Array but not really a sub-class but provides the array interface. We could keep Mark's modifications to the array interface as well so that it can communicate a mask. -Travis > > > > 2) Will likely changes to the masked array API make any difference to > > the number of extra pointers? Likely answer no? > > > > Is that right? > > The answer to this is very likely no on the Python side. But, on the C-side, their could be some differences (i.e. are masked arrays a sub-class of the ndarray or not). > > > > > I have the impression that the masked array API discussion still has > > not come out fully into the unforgiving light of discussion day, but > > if the answer to 2) is No, then I suppose the API discussion is not > > relevant to the 3 pointers change. > > You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. But, that is the current climate. > > In that climate, my concern is that we haven't finalized the API but are rapidly cementing the *structure* of NumPy arrays into a modified form that has real downstream implications. Two other people I have talked to share this concern (nobody who has posted on this list before but who are heavy users of NumPy). I may have missed the threads where it was discussed, but have these structure changes and their implications been fully discussed? Is there anyone else who is concerned about adding 3 more pointers (12 bytes or 24 bytes) to the NumPy structure? > > As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter). 
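To put rough numbers on the small-array case, a back-of-envelope sketch
(the 80-byte header is an assumed size for a 64-bit NumPy 1.6
PyArrayObject, not a measured one):

header = 80    # assumed sizeof(PyArrayObject), 64-bit build
extra = 3 * 8  # the three mask-related pointers
for n in (1, 10, 100, 1000000):
    data = 8 * n  # float64 payload
    print("%8d elements: +%.1f%%" % (n, 100.0 * extra / (header + data)))
# roughly +27% for 1 element, +15% for 10, +2.7% for 100,
# and noise for anything large

A few percent at a hundred elements is invisible; for code that churns
through millions of tiny arrays it is not.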
But, I personally know of half-written NEPs that propose to add more pointers to the NumPy array: > > * to allow meta-information to be attached to a NumPy array > * to allow labels to be attached to a NumPy array (ala data-array) > * to allow multiple chunks for an array. > > Are people O.K. with 5 or 6 more pointers on every NumPy array? We could also think about adding just one more pointer to a new "enhanced" structure that contains multiple enhancements to the NumPy array. > > > Yes, this whole thing could get out of hand with too many extras. One of the things you could discuss with Mark is how to deal with this, or limit the modifications. At some point the ndarray class could become cumbersome, complicated, and difficult to maintain. We need to be careful that it doesn't go that way. I'd like to keep it as simple as possible, the question is what is fundamental. The main long term advantage of having masks part of the base is the possibility of adapted loops in ufuncs, which would give the advantage of speed. But that is just how it looks from where I stand, no doubt others have different priorities. > > But, this whole line of discussion sounds a lot like a true sub-class of the NumPy array at the C-level. It has the benefit that only people that use the features of the sub-class have to worry about using the extra space. > > Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for. > > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Apr 17 00:59:26 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 16 Apr 2012 21:59:26 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: Hi, On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant wrote: >>> >>> I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C). >> >> I'd love to hear that argument fleshed out in more detail - do you have time? > > > My proposal here is to basically take the current github NumPy data-structure and make this a sub-type (in C) of the NumPy 1.6 data-structure which is unchanged in NumPy 1.7. > > This would not require removing code but would require another PyTypeObject and associated structures. ?I expect Mark could do this work in 2-4 weeks. ? We also have other developers who could help in order to get the sub-type in NumPy 1.7. ? ? What kind of details would you like to see? I was dimly thinking of the same questions that Chuck had - about how subclassing would relate to the ufunc changes. > I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another. 
Is the proposal that this would be an alternative API to numpy.ma? Is numpy.ma not itself satisfactory as a test of these uses, because of performance or some other reason? >>>> 2) Will likely changes to the masked array API make any difference to >>>> the number of extra pointers? ?Likely answer no? >>>> >>>> Is that right? >>> >>> The answer to this is very likely no on the Python side. ?But, on the C-side, their could be some differences (i.e. are masked arrays a sub-class of the ndarray or not). >>> >>>> >>>> I have the impression that the masked array API discussion still has >>>> not come out fully into the unforgiving light of discussion day, but >>>> if the answer to 2) is No, then I suppose the API discussion is not >>>> relevant to the 3 pointers change. >>> >>> You are correct that the API discussion is separate from this one. ? ? Overall, ?I was surprised at how fervently people would oppose ABI changes. ? As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. ? I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. ? ?But, that is the current climate. >> >> The objectors object to any binary ABI change, but not specifically >> three pointers rather than two or one? > > Adding pointers is not really an ABI change (but removing them after they were there would be...) ?It's really just the addition of data to the NumPy array structure that they aren't going to use. ?Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying. > >> >> Is their point then about ABI breakage? ?Because that seems like a >> different point again. > > Yes, it's not that. > >> >> Or is it possible that they are in fact worried about the masked array API? > > I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point. ?I think they just want us to come up with an answer and then move forward. ? ?But, they will judge us based on the solution we come up with. > >> >>> Mark and I will talk about this long and hard. ?Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. ? ?If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for. >> >> I started writing something about this but I guess you'd know what I'd >> write, so I only humbly ask that you consider whether it might be >> doing real damage to allow substantial discussion that is not >> documented or argued out in public. > > It will be documented and argued in public. ? ? We are just going to have one off-list conversation to try and speed up the process. ? ?You make a valid point, and I appreciate the perspective. ? ? Please speak up again after hearing the report if something is not clear. ? I don't want this to even have the appearance of a "back-room" deal. > > Mark and I will have conversations about NumPy while he is in Austin. ? There are many other active stake-holders whose opinions and views are essential for major changes. ? 
Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do?

As you might have heard me say before, my concern is that it has not been easy to have good discussions on this list. I think the problem has been that it has not been clear what the culture was, and how decisions got made, and that had led to some uncomfortable and unhelpful discussions. My plea would be for you as BDF$N to strongly encourage on-list discussions and discourage off-list discussions as far as possible, and to help us make the difficult public effort to bash out the arguments to clarity and consensus. I know that's a big ask.

See you,

Matthew

From workalof at gmail.com Tue Apr 17 01:32:06 2012
From: workalof at gmail.com (John Mitchell)
Date: Mon, 16 Apr 2012 23:32:06 -0600
Subject: [Numpy-discussion] f2py with int8
Message-ID:

Hi,

I am using f2py to pass a numpy array of type numpy.int8 to fortran. It seems like I am misunderstanding something because I just can't make it work.

Here is what I am doing.

PYTHON
b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F')
b[0]=1
b[2]=1
b[3]=1
b
array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8)

FORTRAN
subroutine print_bit_array(bits,n)
use iso_fortran_env
integer,intent(in)::n
integer(kind=int8),intent(in),dimension(n)::bits
print*,'bits = ',bits
end subroutine print_bit_array

RESULT when calling fortran from python
bits = 1 0 0 0 0 0 0 0 1 0

Any Ideas?
thanks,
John
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Tue Apr 17 01:44:08 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 16 Apr 2012 23:44:08 -0600
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To: <24DD18E0-BF7E-466E-9C37-7621397CDCA3@continuum.io>
References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io>
	<24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io>
	<24DD18E0-BF7E-466E-9C37-7621397CDCA3@continuum.io>
Message-ID:

On Mon, Apr 16, 2012 at 10:38 PM, Travis Oliphant wrote:

> On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:
>
> On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant wrote:
>
>> On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
>>
>> > Hi,
>> >
>> > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant wrote:
>> >
>> >> I have heard from a few people that they are not excited by the growth
>> >> of the NumPy data-structure by the 3 pointers needed to hold the
>> >> masked-array storage. This is especially true when there is talk to
>> >> potentially add additional attributes to the NumPy array (for labels
>> >> and other meta-information). If you are willing to let us know how
>> >> you feel about this, please speak up.
>> >
>> > I guess there are two questions here
>> >
>> > 1) Will something like the current version of masked arrays have a
>> > long term future in numpy, regardless of eventual API? Most likely
>> > answer - yes?
>>
>> I think the answer to this is yes, but it could be as a feature-filled
>> sub-class (like the current numpy.ma, except in C).
>>
>
> I think making numpy.ma a subclass of ndarray has caused all sorts of
> trouble. It doesn't satisfy 'is a', rather it tries to use inheritance
> from ndarray for implementation of various parts. The upshot is that
> almost everything has to be overridden, so it didn't buy much.
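A minimal sketch of that effect (the Masked class below is hypothetical,
and nothing beyond default ndarray subclass behavior is assumed):

import numpy as np

class Masked(np.ndarray):
    # Naive subclass: inherit everything, bolt a mask on top.
    def __new__(cls, data, mask):
        obj = np.asarray(data).view(cls)
        obj.mask = np.asarray(mask, dtype=bool)
        return obj

m = Masked([1, 2, 3], [True, False, False])
s = m + 1
print(type(s).__name__)    # Masked -- the type survives the ufunc
print(hasattr(s, 'mask'))  # False -- the mask does not, until
                           # __array_finalize__ and a long list of
                           # methods are overridden, as numpy.ma does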
> > > This is a valid point. One could create a new object that is binary > compatible with the NumPy Array but not really a sub-class but provides the > array interface. We could keep Mark's modifications to the array > interface as well so that it can communicate a mask. > > Another place inheritance causes problems is PyUnicodeArrType inheriting from PyUnicodeType. There the difficulty is that the unicode itemsize/encoding may not match between the types. IIRC, it isn't recommended that derived classes change the itemsize. Numpy also has the different byte orderings... The Python types are sort of like virtual classes, so in some sense they are designed for inheritance. We could maybe set up some sort of parallel numpy type system with empty slots and such but we would need to decide what those slots are ahead of time. And if we got really serious, ABI backwards compatibility would break big time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Apr 17 01:44:26 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 17 Apr 2012 00:44:26 -0500 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: <5EE0D067-6271-4BBD-BA93-E643600D7D8E@continuum.io> On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote: > Hi, > > On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant wrote: >>>> >>>> I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C). >>> >>> I'd love to hear that argument fleshed out in more detail - do you have time? >> >> >> My proposal here is to basically take the current github NumPy data-structure and make this a sub-type (in C) of the NumPy 1.6 data-structure which is unchanged in NumPy 1.7. >> >> This would not require removing code but would require another PyTypeObject and associated structures. I expect Mark could do this work in 2-4 weeks. We also have other developers who could help in order to get the sub-type in NumPy 1.7. What kind of details would you like to see? > > I was dimly thinking of the same questions that Chuck had - about how > subclassing would relate to the ufunc changes. Basically, there are two sets of changes as far as I understand right now: 1) ufunc infrastructure understands masked arrays 2) ndarray grew attributes to represent masked arrays I am proposing that we keep 1) but change 2) so that only certain kinds of NumPy arrays actually have the extra function pointers (effectively a sub-type). In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject become a base-object, but the other members of the C-structure are not even present unless the Masked flag is set. Such changes would not require ripping code out --- just altering the presentation a bit. Yet, they could have large long-term implications, that we should explore before they get fixed. Whether masked arrays should be a formal sub-class is actually an un-related question and I generally lean in the direction of not encouraging sub-classes of the ndarray. The big questions are does this object work in the calculation infrastructure. Can I add an array to a masked array. Does it have a sum method? I think it could be argued that a masked array does have a "is a" relationship with an array. 
It can also be argued that it is better to have a "has a" relationship with an array and be-it's own-object. Either way, this object could still have it's first-part be binary compatible with a NumPy Array, and that is what I'm really suggesting. -Travis > >> I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another. > > Is the proposal that this would be an alternative API to numpy.ma? > Is numpy.ma not itself satisfactory as a test of these uses, because > of performance or some other reason? > >>>>> 2) Will likely changes to the masked array API make any difference to >>>>> the number of extra pointers? Likely answer no? >>>>> >>>>> Is that right? >>>> >>>> The answer to this is very likely no on the Python side. But, on the C-side, their could be some differences (i.e. are masked arrays a sub-class of the ndarray or not). >>>> >>>>> >>>>> I have the impression that the masked array API discussion still has >>>>> not come out fully into the unforgiving light of discussion day, but >>>>> if the answer to 2) is No, then I suppose the API discussion is not >>>>> relevant to the 3 pointers change. >>>> >>>> You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that a better overall solution is not to promote better availability of downstream binary packages than excessively worry about ABI changes in NumPy. But, that is the current climate. >>> >>> The objectors object to any binary ABI change, but not specifically >>> three pointers rather than two or one? >> >> Adding pointers is not really an ABI change (but removing them after they were there would be...) It's really just the addition of data to the NumPy array structure that they aren't going to use. Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying. >> >>> >>> Is their point then about ABI breakage? Because that seems like a >>> different point again. >> >> Yes, it's not that. >> >>> >>> Or is it possible that they are in fact worried about the masked array API? >> >> I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point. I think they just want us to come up with an answer and then move forward. But, they will judge us based on the solution we come up with. >> >>> >>>> Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for. >>> >>> I started writing something about this but I guess you'd know what I'd >>> write, so I only humbly ask that you consider whether it might be >>> doing real damage to allow substantial discussion that is not >>> documented or argued out in public. >> >> It will be documented and argued in public. We are just going to have one off-list conversation to try and speed up the process. You make a valid point, and I appreciate the perspective. 
Please speak up again after hearing the report if something is not clear. I don't want this to even have the appearance of a "back-room" deal. >> >> Mark and I will have conversations about NumPy while he is in Austin. There are many other active stake-holders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do? > > As you might have heard me say before, my concern is that it has not > been easy to have good discussions on this list. I think the problem > has been that is has not been clear what the culture was, and how > decisions got made, and that had led to some uncomfortable and > unhelpful discussions. My plea would be for you as BDF$N to strongly > encourage on-list discussions and discourage off-list discussions as > far as possible, and to help us make the difficult public effort to > bash out the arguments to clarity and consensus. I know that's a big > ask. > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From paul.anton.letnes at gmail.com Tue Apr 17 01:53:03 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Tue, 17 Apr 2012 07:53:03 +0200 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: References: Message-ID: Hi, this probably does not help with your problem. However, I would recommend changing your fortran code to: subroutine print_bit_array(bits) use iso_fortran_env integer(kind=int8),intent(in),dimension(:)::bits print*,'bits = ',bits end subroutine print_bit_array In that way you could print shape(bits) to verify that you are getting an array of the size you are expecting. Also, you could compile with -fbounds-check (gfortran) or a similar flag for some extra debugging facilities. To get better help with your issues, I would recommend also posting your call to the fortran routine, and the compilation command used (f2py -m myfile.f90 -flags....). Cheers Paul On 17. apr. 2012, at 07:32, John Mitchell wrote: > Hi, > > I am using f2py to pass a numpy array of type numpy.int8 to fortran. It seems like I am misunderstanding something because I just can't make it work. > > Here is what I am doing. > > PYTHON > b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') > b[0]=1 > b[2]=1 > b[3]=1 > b > array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8) > > > > FORTRAN > subroutine print_bit_array(bits,n) > use iso_fortran_env > integer,intent(in)::n > integer(kind=int8),intent(in),dimension(n)::bits > print*,'bits = ',bits > end subroutine print_bit_array > > > RESULT when calling fortran from python > bits = 1 0 0 0 0 0 0 0 1 0 > > Any Ideas? > thanks, > John > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sameer.grover.1 at gmail.com Tue Apr 17 01:58:20 2012 From: sameer.grover.1 at gmail.com (Sameer Grover) Date: Tue, 17 Apr 2012 11:28:20 +0530 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: References: Message-ID: <4F8D067C.9040203@gmail.com> On Tuesday 17 April 2012 11:02 AM, John Mitchell wrote: > Hi, > > I am using f2py to pass a numpy array of type numpy.int8 to fortran. 
> It seems like I am misunderstanding something because I just can't > make it work. > > Here is what I am doing. > > PYTHON > b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') > b[0]=1 > b[2]=1 > b[3]=1 > b > array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8) > > > > FORTRAN > subroutine print_bit_array(bits,n) > use iso_fortran_env > integer,intent(in)::n > integer(kind=int8),intent(in),dimension(n)::bits > print*,'bits = ',bits > end subroutine print_bit_array > > > RESULT when calling fortran from python > bits = 1 0 0 0 0 0 0 0 1 0 > > Any Ideas? > thanks, > John > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion It seems to work if "integer(kind=int8)" is replaced with "integer(8)" or "integer(1)". Don't know why, though. Sameer -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Tue Apr 17 02:59:09 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 17 Apr 2012 08:59:09 +0200 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: <20120417065909.GB12437@phare.normalesup.org> On Mon, Apr 16, 2012 at 10:40:53PM -0500, Travis Oliphant wrote: > > The objectors object to any binary ABI change, but not specifically > > three pointers rather than two or one? > Adding pointers is not really an ABI change (but removing them after > they were there would be...) It's really just the addition of data to > the NumPy array structure that they aren't going to use. Most of the > time it would not be a real problem (the number of use-cases where you > have a lot of small NumPy arrays is small), but when it is a problem it > is very annoying. I think that something that the numpy community must be very careful about is ABI breakage. At the scale of a large and heavy institution, it is very costly. In my mind, this is the argument that should guide the discussion: does going one way of the other (removing NA or not) will lead us likely into ABI breakage ? My 2 cents, Ga?l From njs at pobox.com Tue Apr 17 08:52:50 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 13:52:50 +0100 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <5EE0D067-6271-4BBD-BA93-E643600D7D8E@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> <5EE0D067-6271-4BBD-BA93-E643600D7D8E@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 6:44 AM, Travis Oliphant wrote: > Basically, there are two sets of changes as far as I understand right now: > > ? ? ? ?1) ufunc infrastructure understands masked arrays > ? ? ? ?2) ndarray grew attributes to represent masked arrays > > I am proposing that we keep 1) but change 2) so that only certain kinds of NumPy arrays actually have the extra function pointers (effectively a sub-type). ? In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject become a base-object, but the other members of the C-structure are not even present unless the Masked flag is set. ? 
Such changes would not require ripping code out --- just altering the presentation a bit. ? Yet, they could have large long-term implications, that we should explore before they get fixed. > > Whether masked arrays should be a formal sub-class is actually an un-related question and I generally lean in the direction of not encouraging sub-classes of the ndarray. ? The big questions are does this object work in the calculation infrastructure. ? Can I add an array to a masked array. ? Does it have a sum method? ? I think it could be argued that a masked array does have a "is a" relationship with an array. ? It can also be argued that it is better to have a "has a" relationship with an array and be-it's own-object. ? Either way, this object could still have it's first-part be binary compatible with a NumPy Array, and that is what I'm really suggesting. It sounds like the main implementation issue here is that this masked array class needs some way to coordinate with the ufunc infrastructure to efficiently and reliably handle the mask in calculations. The core ufunc code now knows how to handle masks, and this functionality is needed for where= and NA-dtypes, so obviously it's staying, independent of what we decide to do with masked arrays. So the question is just, how do we get the masked array and the ufuncs talking to each other so they can do the right thing. Perhaps we should focus, then, on how to create a better hooking mechanism for ufuncs? Something along these lines? http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html If done in a solid enough way, this would also solve other problems, e.g. we could make ufuncs work reliably on sparse matrices, which seems to trip people up on scipy-user every month or two. Of course, it's very tricky to get right :-( As far the masked array API: I'm still not convinced we know how we want these things to behave. The masked arrays in master currently implement MISSING semantics, but AFAICT everyone who wants MISSING semantics prefers NA-dtypes or even plain old NaN's over a masked implementation. And some of the current implementation's biggest backers, like Chuck, have argued that they should switch to skipNA=True, which is more of an IGNORED-style semantic. OTOH, there's still disagreement over how IGNORED-style semantics should even work (I'm thinking of that discussion about commutivity). The best existing model is numpy.ma -- but the numpy.ma API is quite different from the NEP, in more ways than just the default setting for skipNA. numpy.ma uses the opposite convention for mask values, it has additional concepts like the fillvalue, hardmask versus softmask, and then there's the whole way the NEP uses views to manage the mask. And I don't know which of these numpy.ma features are useful, which are extraneous, and which are currently useful but will become extraneous once the users who really wanted something more like NA-dtypes switch to those. So we all agree that masked arrays can be useful, and that numpy.ma has problems. But straightforwardly porting numpy.ma to C doesn't seem like it would help much, and neither does simply declaring that numpy.ma has been deprecated in favour of a new NEP-like API. So, I dunno. 
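To make the hooking idea above concrete, a toy sketch of the dispatch such
a mechanism would centralize (all names hypothetical; this is neither the
NEP API nor numpy.ma):

import numpy as np

class MaskedLite(object):
    # Toy "has a" masked container: wraps a payload plus a boolean mask.
    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

def apply_ufunc(ufunc, *args):
    # Unwrap the payloads, apply the ufunc, OR together the input
    # masks -- the bookkeeping a real ufunc hook would centralize.
    data = [a.data if isinstance(a, MaskedLite) else a for a in args]
    out = ufunc(*data)
    mask = np.zeros(out.shape, dtype=bool)
    for a in args:
        if isinstance(a, MaskedLite):
            mask |= a.mask
    return MaskedLite(out, mask)

a = MaskedLite([1.0, 2.0, 3.0], [False, True, False])
b = MaskedLite([10.0, 20.0, 30.0], [False, False, True])
s = apply_ufunc(np.add, a, b)
print(s.data)  # [ 11.  22.  33.]
print(s.mask)  # [False  True  True]

The same dispatch point is where a sparse matrix or an NA-dtype array
would have to slot in, which is what makes getting the hook design right
so tricky.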
It seems like it might make the most sense to: 1) take the mask fields out of the core ndarray (while leaving the rest of Mark's infrastructure, as per above) 2) make sure we have the hooks needed so that numpy.ma, and NEP-like APIs, and whatever other experiments people want to try, can all integrate well with ufuncs, and make any other extensions that are generally useful and required so that they can work well 3) once we've experimented, move the winner into the core. Or whatever else makes sense to do once we understand what we're trying to accomplish. -- Nathaniel From workalof at gmail.com Tue Apr 17 09:38:38 2012 From: workalof at gmail.com (John Mitchell) Date: Tue, 17 Apr 2012 07:38:38 -0600 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: <4F8D067C.9040203@gmail.com> References: <4F8D067C.9040203@gmail.com> Message-ID: Thanks Sameer. I confirmed on my side as well. I will try to understand the why part now. Much appreciated. On Mon, Apr 16, 2012 at 11:58 PM, Sameer Grover wrote: > On Tuesday 17 April 2012 11:02 AM, John Mitchell wrote: > > Hi, > > I am using f2py to pass a numpy array of type numpy.int8 to fortran. It > seems like I am misunderstanding something because I just can't make it > work. > > Here is what I am doing. > > PYTHON > b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') > b[0]=1 > b[2]=1 > b[3]=1 > b > array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8) > > > > FORTRAN > subroutine print_bit_array(bits,n) > use iso_fortran_env > integer,intent(in)::n > integer(kind=int8),intent(in),dimension(n)::bits > print*,'bits = ',bits > end subroutine print_bit_array > > > RESULT when calling fortran from python > bits = 1 0 0 0 0 0 0 0 1 0 > > Any Ideas? > thanks, > John > > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > It seems to work if "integer(kind=int8)" is replaced with "integer(8)" or > "integer(1)". Don't know why, though. > > Sameer > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Apr 17 10:24:20 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 15:24:20 +0100 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett wrote: > Hi, > > On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant wrote: >> Mark and I will have conversations about NumPy while he is in Austin. ? There are many other active stake-holders whose opinions and views are essential for major changes. ? ?Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. ? ? I'm not sure if that helps. ? Is there more we can do? > > As you might have heard me say before, my concern is that it has not > been easy to have good discussions on this list. ? I think the problem > has been that is has not been clear what the culture was, and how > decisions got made, and that had led to some uncomfortable and > unhelpful discussions. 
?My plea would be for you as BDF$N to strongly > encourage on-list discussions and discourage off-list discussions as > far as possible, and to help us make the difficult public effort to > bash out the arguments to clarity and consensus. ?I know that's a big > ask. Hi Matthew, As you know, I agree with everything you just said :-). So in interest of transparency, I should add: I have been in touch with Travis some off-list, and the main topic has been how to proceed in a way that let's us achieve public consensus. -- Nathaniel From paul.anton.letnes at gmail.com Tue Apr 17 12:12:34 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Tue, 17 Apr 2012 18:12:34 +0200 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: <4F8D067C.9040203@gmail.com> References: <4F8D067C.9040203@gmail.com> Message-ID: <79E537C9-2FBA-4A09-ABED-19A2DE5F8645@gmail.com> Ah, come to think of it, I think that f2py only supports literal kind values. Maybe that's your problem. Paul On 17. apr. 2012, at 07:58, Sameer Grover wrote: > On Tuesday 17 April 2012 11:02 AM, John Mitchell wrote: >> Hi, >> >> I am using f2py to pass a numpy array of type numpy.int8 to fortran. It seems like I am misunderstanding something because I just can't make it work. >> >> Here is what I am doing. >> >> PYTHON >> b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') >> b[0]=1 >> b[2]=1 >> b[3]=1 >> b >> array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8) >> >> >> >> FORTRAN >> subroutine print_bit_array(bits,n) >> use iso_fortran_env >> integer,intent(in)::n >> integer(kind=int8),intent(in),dimension(n)::bits >> print*,'bits = ',bits >> end subroutine print_bit_array >> >> >> RESULT when calling fortran from python >> bits = 1 0 0 0 0 0 0 0 1 0 >> >> Any Ideas? >> thanks, >> John >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > It seems to work if "integer(kind=int8)" is replaced with "integer(8)" or "integer(1)". Don't know why, though. > > Sameer > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From workalof at gmail.com Tue Apr 17 13:47:06 2012 From: workalof at gmail.com (John Mitchell) Date: Tue, 17 Apr 2012 11:47:06 -0600 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: <79E537C9-2FBA-4A09-ABED-19A2DE5F8645@gmail.com> References: <4F8D067C.9040203@gmail.com> <79E537C9-2FBA-4A09-ABED-19A2DE5F8645@gmail.com> Message-ID: Thanks Paul. I suppose this is now going slightly out of bounds for f2py. What I am looking for is the fortran kind type for a byte. I thought that this was int8. I guess the question is how to identify the kind type. Although I have verified that integer(1) seems to work for me, I would really like to know why and your answer alludes to that. Please excuse my ignorance on this topic. Can you perhaps educate me a little on 'literal kind values'? I take you to mean that 'int8' is not a literal kind value while 1 and 8 are examples of literal kind values. Thanks, John On Tue, Apr 17, 2012 at 10:12 AM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > Ah, come to think of it, I think that f2py only supports literal kind > values. Maybe that's your problem. > > Paul > > On 17. apr. 
2012, at 07:58, Sameer Grover wrote: > > > On Tuesday 17 April 2012 11:02 AM, John Mitchell wrote: > >> Hi, > >> > >> I am using f2py to pass a numpy array of type numpy.int8 to fortran. > It seems like I am misunderstanding something because I just can't make it > work. > >> > >> Here is what I am doing. > >> > >> PYTHON > >> b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') > >> b[0]=1 > >> b[2]=1 > >> b[3]=1 > >> b > >> array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8) > >> > >> > >> > >> FORTRAN > >> subroutine print_bit_array(bits,n) > >> use iso_fortran_env > >> integer,intent(in)::n > >> integer(kind=int8),intent(in),dimension(n)::bits > >> print*,'bits = ',bits > >> end subroutine print_bit_array > >> > >> > >> RESULT when calling fortran from python > >> bits = 1 0 0 0 0 0 0 0 1 0 > >> > >> Any Ideas? > >> thanks, > >> John > >> > >> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > It seems to work if "integer(kind=int8)" is replaced with "integer(8)" > or "integer(1)". Don't know why, though. > > > > Sameer > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Apr 17 14:40:40 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Apr 2012 11:40:40 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: Hi, On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smith wrote: > On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett wrote: >> Hi, >> >> On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant wrote: >>> Mark and I will have conversations about NumPy while he is in Austin. ? There are many other active stake-holders whose opinions and views are essential for major changes. ? ?Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. ? ? I'm not sure if that helps. ? Is there more we can do? >> >> As you might have heard me say before, my concern is that it has not >> been easy to have good discussions on this list. ? I think the problem >> has been that is has not been clear what the culture was, and how >> decisions got made, and that had led to some uncomfortable and >> unhelpful discussions. ?My plea would be for you as BDF$N to strongly >> encourage on-list discussions and discourage off-list discussions as >> far as possible, and to help us make the difficult public effort to >> bash out the arguments to clarity and consensus. ?I know that's a big >> ask. > > Hi Matthew, > > As you know, I agree with everything you just said :-). So in interest > of transparency, I should add: I have been in touch with Travis some > off-list, and the main topic has been how to proceed in a way that > let's us achieve public consensus. 
I'm glad to hear that discussion is happening, but please do have it on list. If it's off list it easy for people to feel they are being bypassed, and that the public discussion is not important. So, yes, you might get a better outcome for this specific case, but a worse outcome in the long term, because the list will start to feel that it's for signing off or voting rather than discussion, and that - I feel sure - would lead to worse decisions. The other issue is that there's a reason you are having the discussion off-list - which is that it was getting difficult on-list. But - again - a personal view - that really has to be addressed directly by setting out the rules of engagement and modeling the kind of discussion we want to have. Cheers, Matthew From efiring at hawaii.edu Tue Apr 17 15:00:54 2012 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 17 Apr 2012 09:00:54 -1000 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: <4F8DBDE6.9030704@hawaii.edu> On 04/17/2012 08:40 AM, Matthew Brett wrote: > Hi, > > On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smith wrote: >> On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant wrote: >>>> Mark and I will have conversations about NumPy while he is in Austin. There are many other active stake-holders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do? >>> >>> As you might have heard me say before, my concern is that it has not >>> been easy to have good discussions on this list. I think the problem >>> has been that is has not been clear what the culture was, and how >>> decisions got made, and that had led to some uncomfortable and >>> unhelpful discussions. My plea would be for you as BDF$N to strongly >>> encourage on-list discussions and discourage off-list discussions as >>> far as possible, and to help us make the difficult public effort to >>> bash out the arguments to clarity and consensus. I know that's a big >>> ask. >> >> Hi Matthew, >> >> As you know, I agree with everything you just said :-). So in interest >> of transparency, I should add: I have been in touch with Travis some >> off-list, and the main topic has been how to proceed in a way that >> let's us achieve public consensus. ...when possible without paralysis. > > I'm glad to hear that discussion is happening, but please do have it > on list. If it's off list it easy for people to feel they are being > bypassed, and that the public discussion is not important. So, yes, > you might get a better outcome for this specific case, but a worse > outcome in the long term, because the list will start to feel that > it's for signing off or voting rather than discussion, and that - I > feel sure - would lead to worse decisions. I think you are over-stating the case a bit. Taking what you say literally, one might conclude that numpy people should never meet and chat, or phone each other up and chat. But such small conversations are an important extension and facilitator of individual thinking. 
Major decisions do need to get hashed out publicly, but mailing list discussions are only one part of the thinking and decision process. Eric > > The other issue is that there's a reason you are having the discussion > off-list - which is that it was getting difficult on-list. But - > again - a personal view - that really has to be addressed directly by > setting out the rules of engagement and modeling the kind of > discussion we want to have. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Tue Apr 17 15:04:29 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 17 Apr 2012 12:04:29 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 11:40 AM, Matthew Brett wrote: > I'm glad to hear that discussion is happening, but please do have it > on list. ? If it's off list it easy for people to feel they are being > bypassed, and that the public discussion is not important. I'm afraid I have to disagree: you seem to be proposing an absolute, 'zero-tolerance'-style policy against any off-list discussion. The only thing ZT policies achieve is to remove common sense and human judgement from a process, invariably causing more harm than they do good, no matter how well intentioned. There are perfectly reasonable cases where a quick phone call may be a more effective and sensible way to work than an on-list discussion. The question isn't whether someone, somewhere, had an off-list discussion or not; it's whether *the main decision making process* is being handled transparently or not. I trust that Nathaniel and Travis had a sensible reason to speak off-list; as long as it appears clear that the *decisions about numpy* are being made via public discussion with room for all necessary input and confrontation of disparate viewpoints, I don't care what they talk about in private. In IPython, I am constantly fielding private emails that I very often refer to the list because they make more sense there, but I also have off-list discussions when I consider that to be the right thing to do. And I certainly hope nobody ever asks me to *never* have an off-list discussion. I try very hard to ensure that the direction of the project is very transparent, with redundant points (people) of access to critical resources and a good vetting of key decisions with public input (e.g. our first IPEP at https://github.com/ipython/ipython/issues/1611). If I am failing at that, I hope people will call me out *on that point*, but not on whether I ever pick up the phone or email to talk about IPython off-list. Let's try to trust for one minute that the actual decisions will be made here with solid debate and project-wide input, and seek change only if we have evidence that this isn't happening (not evidence of a meta-problem that isn't a problem here). Best, f From matthew.brett at gmail.com Tue Apr 17 15:10:55 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Apr 2012 12:10:55 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? 
(Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: Hi, On Tue, Apr 17, 2012 at 12:04 PM, Fernando Perez wrote: > On Tue, Apr 17, 2012 at 11:40 AM, Matthew Brett wrote: >> I'm glad to hear that discussion is happening, but please do have it >> on list. ? If it's off list it easy for people to feel they are being >> bypassed, and that the public discussion is not important. > > I'm afraid I have to disagree: you seem to be proposing an absolute, > 'zero-tolerance'-style policy against any off-list discussion. ?The > only thing ZT policies achieve is to remove common sense and human > judgement from a process, invariably causing more harm than they do > good, no matter how well intentioned. Right - but that would be an absurd overstatement of what I said. There's no point in addressing something I didn't say and no sensible person would think. Indeed, it makes the discussion harder. It's just exhausting to have to keep stating the obvious. Of course discussions happen off-list. Of course sometimes that has to happen. Of course that can be a better and quicker way of having discussions. However, in this case the > Let's try to trust for one minute that the actual decisions will be > made here with solid debate and project-wide input, and seek change > only if we have evidence that this isn't happening (not evidence of a > meta-problem that isn't a problem here). meta-problem that is a real problem is that we've shown ourselves that we are not currently good at having discussions on list. There are clearly reasons for that, and also clearly, they can be addressed. The particular point I am making is neither silly nor extreme nor vapid. It is simply that, in order to make discussion work better on the list, it is in my view better to make an explicit effort to make the discussions - explicit. Yours in Bay Area opposition, Matthew From fperez.net at gmail.com Tue Apr 17 15:32:29 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 17 Apr 2012 12:32:29 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 12:10 PM, Matthew Brett wrote: > Right - but that would be an absurd overstatement of what I said. > There's no point in addressing something I didn't say and no sensible > person would think. ? Indeed, it makes the discussion harder. Well, in that case neither Eric Firing nor I are 'sensible persons', since that's how we both understood what you said (Eric's email appeared to me as a more concise/better phrased version of the same points I was making). You said: """ I'm glad to hear that discussion is happening, but please do have it on list. If it's off list it easy for people to feel they are being bypassed, and that the public discussion is not important. """ I don't think it's an 'absurd overstatement' to interpret that as "don't have discussions off-list", but hey, it may just be me :) > meta-problem that is a real problem is that we've shown ourselves that > we are not currently good at having discussions on list. 
Oh, I know that did happen in the past regarding this very topic (the big NA mess last summer); what I meant was to try and trust that *this time around* things might be already moving in a better direction, which it seems to me they are. It seems to me that everyone is genuinely trying to tackle the discussion/consensus questions head-on right on the list, and that's why I proposed waiting to see if there were really any problems before asking Nathaniel not to have any discussion off-list (esp. since we have no evidence that what they talked about had any impact on any decisions bypassing the open forum). Best, f From matthew.brett at gmail.com Tue Apr 17 15:43:16 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Apr 2012 12:43:16 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <84C50C8D-979C-4599-A7B1-BF0A89E3AF0F@continuum.io> Message-ID: On Tue, Apr 17, 2012 at 12:32 PM, Fernando Perez wrote: > On Tue, Apr 17, 2012 at 12:10 PM, Matthew Brett wrote: >> Right - but that would be an absurd overstatement of what I said. >> There's no point in addressing something I didn't say and no sensible >> person would think. ? Indeed, it makes the discussion harder. > > Well, in that case neither Eric Firing nor I are 'sensible persons', > since that's how we both understood what you said (Eric's email > appeared to me as a more concise/better phrased version of the same > points I was making). You said: > > """ > I'm glad to hear that discussion is happening, but please do have it > on list. ? If it's off list it easy for people to feel they are being > bypassed, and that the public discussion is not important. > """ > > I don't think it's an 'absurd overstatement' to interpret that as > "don't have discussions off-list", but hey, it may just be me :) The absurd over-statement is the following: " I'm afraid I have to disagree: you seem to be proposing an absolute, 'zero-tolerance'-style policy against any off-list discussion. " >> meta-problem that is a real problem is that we've shown ourselves that >> we are not currently good at having discussions on list. > > Oh, I know that did happen in the past regarding this very topic (the > big NA mess last summer); what I meant was to try and trust that *this > time around* things might be already moving in a better direction, > which it seems to me they are. ?It seems to me that everyone is > genuinely trying to tackle the discussion/consensus questions head-on > right on the list, and that's why I proposed waiting to see if there > were really any problems before asking Nathaniel not to have any > discussion off-list (esp. since we have no evidence that what they > talked about had any impact on any decisions bypassing the open > forum). The question - which seems to me to be sensible rational and important - is how to get better at on-list discussion, and whether taking this particular discussion mainly off-list is good or bad in that respect. See you, Matthew From tim at cerazone.net Tue Apr 17 15:57:04 2012 From: tim at cerazone.net (Tim Cera) Date: Tue, 17 Apr 2012 15:57:04 -0400 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> Message-ID: I have never found mailing lists good places for discussion and consensus. 
I think the format itself does not lend itself to involvement, carefully
considered (or the ability to change) positions, or voting, since all of
it can be so easily lost within all of the quoting, the back and forth,
people walking away... etc. And you also want involvement from people who
don't have x hours to craft a finely worded, politically correct, and
detailed response. I am not advocating this particular system, but
something like http://meta.programmers.stackexchange.com/ would be a
better platform for building to a decision when there are many choices to
be made.

Now about ma, NA, missing...

I am just an engineer working in water resources and I had lots of
difficulty reading the NEP (so sleeeeepy) so I will be the first to admit
that I probably have something wrong.

Just for reference (since I missed it the first time around) Nathaniel
posted this page at
https://github.com/njsmith/numpy/wiki/NA-discussion-status

I think that I could adapt to everything that is discussed in the NEP,
but I do have some comments about things that puzzled me. I don't need
answers, but if I am puzzled maybe others are also.

First - 'maskna=True'? Tested on the development version of numpy...

>>> a = np.arange(10, maskna = True)
>>> a[:2] = np.NA
>>> a
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])

Why do I have to specify 'maskna = True'? If NA and ndarray are intended
to be combined in some way, then I don't think that I need this. During
development, I understand, but the NEP shouldn't have it. Heck, even if
you keep NA and ndarrays separate, when someone tries to set an ndarray
element with np.NA, instead of a ValueError convert to an NA array. I say
that very casually as if I know how to do it. I do have a proof, but the
margin is too small to include it. :-)

I am torn about 'skipna=True'. I think I understand the desire for
explicit behavior, but I suspect that every operation that I would use an
NA array for would require 'skipna=True'. Actually, I don't use that many
reducing functions, so maybe not a big deal. Regardless of the skipna
setting, a related idea that could be useful for reducing functions is to
set an 'includesna' attribute on the returned scalar value.

The view() didn't work as described in the NEP, where np.NA isn't
supposed to propagate back to the original array. This could be because
the NEP references a 'missingdata' work-in-progress branch and I don't
know what has been merged. I can force the NEP-described behavior if I
set 'd.flags.ownmaskna=True'.

>>> d = a.view()
>>> d
array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[0] = 4
>>> a
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d
array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d[6] = np.NA
>>> d
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])
>>> a
array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])

In the NEP 'Accessing a Boolean Mask' section there is a comment about...
actually I don't understand this section at all. Especially about a
boolean byte-level mask? Why would it need to be a byte-level mask in
order to be viewed? The logic of mask = True or False can also be easily
handled by using a better name for the flag: 'mask = True' means that the
value is masked (missing), whereas if 'exposed = True' is used, that
means the value is not masked (not missing).

The biggest question mark to me is that 'a[0] = np.NA' is destructive and
(using numpy.ma) 'a.mask[0] = True' is not. Is that a big deal? I am
trying to think back on all of my 'ma' code and try to remember if I
masked, then unmasked values, and I don't recall any time that I did
that.
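For reference, the non-destructive round trip that numpy.ma already
supports looks like this; unmasking recovers the value because ma never
overwrites the payload:

import numpy.ma as ma

a = ma.array([1.0, 2.0, 3.0])
a.mask = [True, False, False]  # mask a value...
print(a)                       # [-- 2.0 3.0]
a.mask = ma.nomask             # ...then change your mind
print(a)                       # [1.0 2.0 3.0]  (the data survived)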
Of course my use cases are constrained to what I have done in the past. It feels like a bad idea, for the sake of saving the memory for the mask bits. Now, the amazing thing is that understanding so little, doing even less of the work, I get to vote. Sounds like America! I would really like to see NA in the wild, and I think that I can adapt my ma code to it, so +1. If it has to wait until 1.8, +1. If it has to wait until 1.9, +1. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue Apr 17 16:01:59 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 17 Apr 2012 13:01:59 -0700 Subject: [Numpy-discussion] [Meta] Was: Re: Removing masked arrays for 1.7? (Was 1.7 blockers) Message-ID: [ Making a separate thread so the NA one can stay on topic, since I haven't actually followed the discussion well enough to contribute on the technical points ] On Tue, Apr 17, 2012 at 12:43 PM, Matthew Brett wrote: > The absurd over-statement is the following: > > " I'm afraid I have to disagree: you seem to be proposing an absolute, > 'zero-tolerance'-style policy against any off-list discussion. " Well, that's how I understood what you said; sorry if I mis-interpreted it, but that is indeed how I read your post. > The question - which seems to me to be sensible rational and > important - is how to get better at on-list discussion, and whether > taking this particular discussion mainly off-list is good or bad in > that respect. And on that we completely agree. I just think that in this case, we can try to give people a chance and see if things go better: things have changed since last summer and there's encouraging evidence to suggest that a solid, open discussion will take place this time around. I just propose we give that process a chance first... Best, f From fperez.net at gmail.com Tue Apr 17 16:20:02 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 17 Apr 2012 13:20:02 -0700 Subject: [Numpy-discussion] All of the PyData videos are now up at the Marakana site Message-ID: Hi folks, A number of you expressed interest in attending the PyData workshop last month and unfortunately we had very tight space restrictions. But thanks to the team at Marakana, who pitched in and were willing to film, edit and post videos for many of the talks, you can access them all here: http://marakana.com/s/2012_pydata_workshop,1090/index.html They are in 720p so you can actually read the terminals, though I think you have to click the YouTube link to be able to change the resolution. Enjoy! f From paul.anton.letnes at gmail.com Wed Apr 18 06:56:53 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Wed, 18 Apr 2012 12:56:53 +0200 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: References: <4F8D067C.9040203@gmail.com> <79E537C9-2FBA-4A09-ABED-19A2DE5F8645@gmail.com> Message-ID: Hi, 1, 2, 3 are integer literals. 1.0, 3.0e2, -42.0 are real (float) literals 'hello world' is a string literal. As far as I remember, f2py requires a literal variable for the kind. The solution I have landed on is to write a pure fortran module (using int8, or whatever), and then wrap this module either with an f2py compatible fortran module or an interface file. 
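(On the NumPy side of the boundary, by the way, the matching dtype is easy to confirm; a short interactive check, for reference:)

>>> import numpy as np
>>> np.dtype(np.int8).itemsize   # size in bytes
1
>>> np.dtype(np.int8).kind       # 'i' means signed integer
'i'
>>> np.iinfo(np.int8).min, np.iinfo(np.int8).max
(-128, 127)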
If you want to know what int8 corresponds to, run the following (pure fortran) program through your compiler of choice: program kind_values use iso_fortran_env implicit none print *, 'int8 kind value:', int8 print *, 'int16 kind value:', int16 end program kind_values Paul On 17. apr. 2012, at 19:47, John Mitchell wrote: > Thanks Paul. > > I suppose this is now going slightly out of bounds for f2py. What I am looking for is the fortran kind type for a byte. I thought that this was int8. I guess the question is how to identify the kind type. Although I have verified that integer(1) seems to work for me, I would really like to know why and your answer alludes to that. > > Please excuse my ignorance on this topic. Can you perhaps educate me a little on 'literal kind values'? I take you to mean that > 'int8' is not a literal kind value while 1 and 8 are examples of literal kind values. > > Thanks, > John > > > > On Tue, Apr 17, 2012 at 10:12 AM, Paul Anton Letnes wrote: > Ah, come to think of it, I think that f2py only supports literal kind values. Maybe that's your problem. > > Paul > > On 17. apr. 2012, at 07:58, Sameer Grover wrote: > > > On Tuesday 17 April 2012 11:02 AM, John Mitchell wrote: > >> Hi, > >> > >> I am using f2py to pass a numpy array of type numpy.int8 to fortran. It seems like I am misunderstanding something because I just can't make it work. > >> > >> Here is what I am doing. > >> > >> PYTHON > >> b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') > >> b[0]=1 > >> b[2]=1 > >> b[3]=1 > >> b > >> array([1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int8) > >> > >> > >> > >> FORTRAN > >> subroutine print_bit_array(bits,n) > >> use iso_fortran_env > >> integer,intent(in)::n > >> integer(kind=int8),intent(in),dimension(n)::bits > >> print*,'bits = ',bits > >> end subroutine print_bit_array > >> > >> > >> RESULT when calling fortran from python > >> bits = 1 0 0 0 0 0 0 0 1 0 > >> > >> Any Ideas? > >> thanks, > >> John > >> > >> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > It seems to work if "integer(kind=int8)" is replaced with "integer(8)" or "integer(1)". Don't know why, though. > > > > Sameer > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Wed Apr 18 13:57:51 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 18 Apr 2012 13:57:51 -0400 Subject: [Numpy-discussion] documentation bug: Matrix library page not populated Message-ID: <4F8F009F.6050006@gmail.com> http://docs.scipy.org/doc/numpy/reference/routines.matlib.html#module-numpy.matlib promises a list of functions that does not appear (at the moment, anyway). 
Alan Isaac

From matthew.brett at gmail.com Wed Apr 18 14:03:33 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 18 Apr 2012 11:03:33 -0700
Subject: [Numpy-discussion] Casting rules - an awkward case
Message-ID: 

Hi,

I just wanted to point out a situation where the scalar casting rules can be a little confusing:

In [113]: a - np.int16(128)
Out[113]: array([-256,   -1], dtype=int16)

In [114]: a + np.int16(-128)
Out[114]: array([ 0, -1], dtype=int8)

This is predictable from the nice docs here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html

but I offer it only as a speedbump I hit.

On the other hand I didn't find it easy to predict what numpy 1.5.1 was going to do:

In [31]: a - np.int16(1)
Out[31]: array([127, 126], dtype=int8)

In [32]: a + np.int16(-1)
Out[32]: array([-129,  126], dtype=int16)

As a matter of interest, what was the rule for 1.5.1?

See you,

Matthew

From matthew.brett at gmail.com Wed Apr 18 15:05:36 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 18 Apr 2012 12:05:36 -0700
Subject: [Numpy-discussion] Casting rules - an awkward case
In-Reply-To: 
References: 
Message-ID: 

Oops, sorry, Keith Goodman kindly pointed out that I had missed out:

On Wed, Apr 18, 2012 at 11:03 AM, Matthew Brett wrote:
> Hi,
>
> I just wanted to point out a situation where the scalar casting rules
> can be a little confusing:

In [110]: a = np.array([-128, 127], dtype=np.int8)

> In [113]: a - np.int16(128)
> Out[113]: array([-256,   -1], dtype=int16)
>
> In [114]: a + np.int16(-128)
> Out[114]: array([ 0, -1], dtype=int8)
>
> This is predictable from the nice docs here:
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html
>
> but I offer it only as a speedbump I hit.
>
> On the other hand I didn't find it easy to predict what numpy 1.5.1
> was going to do:
>
> In [31]: a - np.int16(1)
> Out[31]: array([127, 126], dtype=int8)
>
> In [32]: a + np.int16(-1)
> Out[32]: array([-129,  126], dtype=int16)
>
> As a matter of interest, what was the rule for 1.5.1?

Matthew

From pav at iki.fi Wed Apr 18 16:14:17 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Wed, 18 Apr 2012 22:14:17 +0200
Subject: [Numpy-discussion] documentation bug: Matrix library page not populated
In-Reply-To: <4F8F009F.6050006@gmail.com>
References: <4F8F009F.6050006@gmail.com>
Message-ID: 

Hi,

18.04.2012 19:57, Alan G Isaac kirjoitti:
> http://docs.scipy.org/doc/numpy/reference/routines.matlib.html#module-numpy.matlib
> promises a list of functions that does not appear (at the moment, anyway).

This doesn't seem to be due to a technical reason, but rather because nobody has written a list of the functions in the docstring of the module.

        Pauli

From pav at iki.fi Wed Apr 18 18:30:57 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Thu, 19 Apr 2012 00:30:57 +0200
Subject: [Numpy-discussion] YouTrack testbed
In-Reply-To: 
References: <4F762ED3.9060402@continuum.io> <2F839812-5320-4390-A51F-53DEF8F47AEF@continuum.io> <4F7B0A15.30509@continuum.io> <4F834756.2040008@continuum.io>
Message-ID: 

12.04.2012 18:43, Ralf Gommers kirjoitti:
[clip]
> My current list of preferences is:
>
> 1. Redmine (if admin overhead is not unreasonable)
> 2. Trac with performance issues solved
> 3. Github
> 4. YouTrack
> 5. Trac with current performance

Redmine seems pretty nice, apparently has all the features Trac has, and more. It's actually *easier* to administer than Trac, because you apparently can do most configuration via the web interface.
With Trac, you have to drop down to command line and use trac-admin. Just don't deploy it on CGI like the Tracs we currently have :) Pauli From questions.anon at gmail.com Wed Apr 18 18:34:56 2012 From: questions.anon at gmail.com (questions anon) Date: Thu, 19 Apr 2012 08:34:56 +1000 Subject: [Numpy-discussion] mask array and add to list In-Reply-To: References: Message-ID: excellent thank you, that worked perfectly. I just need to remember this feature next time I need it. Thanks again On Thu, Apr 12, 2012 at 11:41 PM, Tim Cera wrote: > Use 'ma.max' instead of 'np.max'. This might be a bug OR an undocumented > feature. :-) > > import numpy.ma as ma > marr = ma.array(range(10), mask=[0,0,0,0,0,1,1,1,1,1]) > np.max(marr) > 4 # mask is used > > a = [] > a.append(marr) > a.append(marr) > np.max(a) > 9 # mask is not used > > ma.max(a) > 4 # mask is used > > Kindest regards, > Tim > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Apr 18 21:12:39 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Apr 2012 21:12:39 -0400 Subject: [Numpy-discussion] documentation bug: Matrix library page not populated In-Reply-To: References: <4F8F009F.6050006@gmail.com> Message-ID: On Wed, Apr 18, 2012 at 4:14 PM, Pauli Virtanen wrote: > Hi, > > 18.04.2012 19:57, Alan G Isaac kirjoitti: >> http://docs.scipy.org/doc/numpy/reference/routines.matlib.html#module-numpy.matlib >> promises a list of functions that does not appear (at the moment, anyway). > > This doesn't seem to be due to a technical reason, but rather than > because nobody has written a list of the functions in the docstring of > the module. Is it a good idea to use this? Mixing namespaces would completely confuse me. >>> for f in dir(numpy.matlib): ... try: ... if getattr(numpy.matlib, f).__module__ in ['numpy.matlib', 'numpy.matrixlib.defmatrix']: print f ... except: pass ... asmatrix bmat empty eye identity mat matrix ones rand randn repmat zeros Josef > > ? ? ? ?Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Thu Apr 19 07:00:57 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 13:00:57 +0200 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F874E6A.5040701@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no> <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io> <4F874E6A.5040701@astro.uio.no> Message-ID: <4F8FF069.4050802@astro.uio.no> On 04/12/2012 11:51 PM, Dag Sverre Seljebotn wrote: > On 04/12/2012 11:13 PM, Travis Oliphant wrote: >> Dag, >> >> Thanks for the link to your CEP. This is the first time I've seen it. You probably referenced it before, but I hadn't seen it. >> >> That CEP seems along the lines of what I was thinking of. We can make scipy follow that CEP and NumPy as well in places that it needs function pointers. >> >> I can certainly get behind it with Numba and recommend it to SciPy (and write the scipy.integrate.quad function to support it). >> >> Thanks for the CEP. 
> > Great. I'll pass this message on to the Cython list and see if anybody > wants to provide input (but given the scope, it should be minor tweaks > and easy to accommodate in whatever code you write). Getting back with a status update on this, the thread is still rolling and benchmarks getting taken on the Cython list. I think it will take some more time. This CEP will be incredibly important for Cython, e.g. if NumPy starts supporting it then from numpy import sin cdef double f(double x): return sin(x*x) won't be that much slower than early-binding directly with sin. It could take another couple of weeks. So for Numba I think just starting with whatever is fastest is the way to go now; and then hopefully one can have the CEP done and things ported over before a Numba or SciPy release gets into the wild without conforming to it. Dag From sturla at molden.no Thu Apr 19 08:17:09 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 19 Apr 2012 14:17:09 +0200 Subject: [Numpy-discussion] f2py with int8 In-Reply-To: References: Message-ID: <4F900245.4040301@molden.no> On 17.04.2012 07:32, John Mitchell wrote: > Hi, > > I am using f2py to pass a numpy array of type numpy.int8 to fortran. It > seems like I am misunderstanding something because I just can't make it > work. > > Here is what I am doing. > > PYTHON > b=numpy.array(numpy.zeros(shape=(10,),dtype=numpy.int8),order='F') This returns an array with dtype int. You want: b = numpy.zeros(shape=(10,),dtype=numpy.int8,order='F') Sturla From travis at continuum.io Thu Apr 19 10:17:56 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 19 Apr 2012 09:17:56 -0500 Subject: [Numpy-discussion] Getting C-function pointers from Python to C In-Reply-To: <4F8FF069.4050802@astro.uio.no> References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no> <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io> <4F874E6A.5040701@astro.uio.no> <4F8FF069.4050802@astro.uio.no> Message-ID: Thanks for the status update. A couple of weeks is a fine timeline to wait. Are you envisioning that the ufuncs in NumPy would have the nativecall attribute? -- Travis Oliphant (on a mobile) 512-826-7480 On Apr 19, 2012, at 6:00 AM, Dag Sverre Seljebotn wrote: > On 04/12/2012 11:51 PM, Dag Sverre Seljebotn wrote: >> On 04/12/2012 11:13 PM, Travis Oliphant wrote: >>> Dag, >>> >>> Thanks for the link to your CEP. This is the first time I've seen it. You probably referenced it before, but I hadn't seen it. >>> >>> That CEP seems along the lines of what I was thinking of. We can make scipy follow that CEP and NumPy as well in places that it needs function pointers. >>> >>> I can certainly get behind it with Numba and recommend it to SciPy (and write the scipy.integrate.quad function to support it). >>> >>> Thanks for the CEP. >> >> Great. I'll pass this message on to the Cython list and see if anybody >> wants to provide input (but given the scope, it should be minor tweaks >> and easy to accommodate in whatever code you write). > > Getting back with a status update on this, the thread is still rolling > and benchmarks getting taken on the Cython list. > > I think it will take some more time. This CEP will be incredibly > important for Cython, e.g. if NumPy starts supporting it then > > from numpy import sin > cdef double f(double x): > return sin(x*x) > > won't be that much slower than early-binding directly with sin. 
> > It could take another couple of weeks. So for Numba I think just
> > starting with whatever is fastest is the way to go now; and then
> > hopefully one can have the CEP done and things ported over before a
> > Numba or SciPy release gets into the wild without conforming to it.
> >
> > Dag
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From d.s.seljebotn at astro.uio.no Thu Apr 19 11:28:36 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 19 Apr 2012 17:28:36 +0200
Subject: [Numpy-discussion] Getting C-function pointers from Python to C
In-Reply-To: 
References: <5B2B4C3B-120B-4551-8E64-4F2783FD447C@gmail.com> <4F85EF72.6000500@astro.uio.no> <212A6354-018D-4684-BC65-CDDB581F1F38@gmail.com> <4F85F2D4.8070807@astro.uio.no> <4F872833.4030309@astro.uio.no> <9091F889-6C4A-4F48-91C6-D8B0704C8C7F@continuum.io> <4F874E6A.5040701@astro.uio.no> <4F8FF069.4050802@astro.uio.no>
Message-ID: <4F902F24.6050401@astro.uio.no>

On 04/19/2012 04:17 PM, Travis Oliphant wrote:
> Thanks for the status update. A couple of weeks is a fine timeline to wait.
>
> Are you envisioning that the ufuncs in NumPy would have the nativecall attribute?

I'm envisioning that they would be able to support CEP 1000, yes, but I don't think they would necessarily use the ufunc machinery or existing implementation -- just a namespace mechanism, a way of getting the "np.sin" to map to the libc (or npymath?) "sin" function.

Currently, when translating in Numba, if somebody writes:

from numpy import sin

@numba
def f(x):
    return sin(x * x)

Then, at numbafication-time, you can presumably lift the "sin" object out of the module scope, look at it, and through CEP 1000 figure out that there's a "d->d" function pointer inside the sin object, and use that a) for type inference, b) to embed a jump to the function in the generated code. (If somebody rebinds "sin" after the numba decorator has run, they get to keep the pieces.)

I don't know how you planned on doing this now, perhaps special-casing a few NumPy functions? The nice thing is that through CEP 1000, numba can transparently support whatever special function I write in Cython, and dispatch to it *fast*.

Dag

From nouiz at nouiz.org Fri Apr 20 10:55:48 2012
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Fri, 20 Apr 2012 10:55:48 -0400
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To: 
References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io>
Message-ID: 

Hi,

I just discovered that the NA mask will modify the base ndarray object. So I read about it to find the consequences for our C code. Up to now I have fully read:

http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

and partially read:

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
https://github.com/njsmith/numpy/wiki/NA-discussion-status

In those documents, I see a problem with legacy code that will receive an NA-masked array as input. If I missed something, tell me.

All our C functions check their input arrays with PyArray_Check and PyArray_ISALIGNED. If the NA mask is set inside the ndarray C object, our C functions, which don't know about it, will treat those inputs as not masked. So the user will get unexpected results: the output will be an ndarray without a mask, but the code will have used the masked values. This will also happen with all other C code that uses ndarrays.
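(At the Python level, the guard being proposed would look roughly like the sketch below. It assumes the flags.maskna attribute described in the maskna documentation linked above, so it only runs on the missingdata development branch, not on released NumPy, and check_input is a made-up name:)

import numpy as np

def check_input(a):
    # Mirror of the proposed C-level guard: refuse NA-masked arrays so
    # that legacy code never silently reads masked payload values.
    if not isinstance(a, np.ndarray):
        raise TypeError("expected an ndarray")
    if a.flags.maskna:  # dev-branch flag; see the maskna docs above
        raise TypeError("NA-masked arrays are not supported here")
    return a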
In our case, all the input checking is done in the same place, so adding a check with "PyArray_HasNASupport(PyArrayObject* obj)" to raise an error will be easy for us. But I don't think this is the case for most other C code. So I would prefer a separate object, to protect users from code that has not been updated to reject NA-masked inputs.

An alternative would be to have PyArray_Check return False for NA-masked arrays, but I don't like that, as it breaks the semantics of checking for the class. A last option I see would be to make the NPY_ARRAY_BEHAVED flag also check that the array is not an NA-masked array. I suppose much C code does this check. But it is not a bulletproof check, as not all code (ours, for example) uses it.

Also, I don't mind the added pointers to the structure, as we use big arrays.

thanks

Frédéric

From chris.barker at noaa.gov Fri Apr 20 12:49:55 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 20 Apr 2012 09:49:55 -0700
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To: <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io>
References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io>
Message-ID: 

On Mon, Apr 16, 2012 at 7:46 PM, Travis Oliphant wrote:
> As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter).

yup -- for the most part, numpy arrays are best for working with large data sets, in which case a little bit bigger core object doesn't matter.

But there are many times that we do want to work with small arrays (particularly ones that are pulled out of a larger array -- iterating over an array of (x,y) points or the like). However, numpy overhead is already pretty heavy for such use, so it may not matter.

I recall discussion a couple times in the past of having some special-case numpy arrays for the simple, small cases -- perhaps 1-d or 2-d C-contiguous only, for instance. That might be a better way to address the small-array performance issue, and free us of concerns about minor growth to the core ndarray object.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From fperez.net at gmail.com Fri Apr 20 13:54:22 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 20 Apr 2012 10:54:22 -0700
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To: 
References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io>
Message-ID: 

On Fri, Apr 20, 2012 at 9:49 AM, Chris Barker wrote:
>
> I recall discussion a couple times in the past of having some
> special-case numpy arrays for the simple, small cases -- perhaps 1-d
> or 2-d C-contiguous only, for instance. That might be a better way to
> address the small-array performance issue, and free us of concerns
> about minor growth to the core ndarray object.

+1 on that: I once wrote such code in pyrex (years ago) and it worked extremely well for me. No fancy features, very small footprint and highly optimized codepaths that gave me excellent performance.
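(As a rough illustration of the kind of minimal fixed-size type being described: the original was Pyrex, but the flavour survives in pure Python. Vec2 is a made-up name and this is only a sketch, not the original code:)

class Vec2:
    """A tiny static 2-vector: no dtype, shape or stride machinery,
    just two floats and a couple of short arithmetic code paths."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vec2(self.x * scalar, self.y * scalar)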
Cheers,

f

From charlesr.harris at gmail.com Fri Apr 20 14:04:31 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 20 Apr 2012 12:04:31 -0600
Subject: [Numpy-discussion] A 1.6.2 release?
Message-ID: 

Hi All,

Given the amount of new stuff coming in 1.7 and the slip in its schedule, I wonder if it would be worth putting out a 1.6.2 release with fixes for einsum, ticket 1578, perhaps some others. My reasoning is that the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they might as well use a somewhat fixed up version. The downside is that locating and backporting fixes is likely to be a fair amount of work. A 1.7 release would be preferable, but I'm not sure when we can make that happen.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dfugate at microsoft.com Fri Apr 20 14:05:04 2012
From: dfugate at microsoft.com (Dave Fugate)
Date: Fri, 20 Apr 2012 18:05:04 +0000
Subject: [Numpy-discussion] Command-line options for (Windows) NumPy Installer?
In-Reply-To: <47FF78CF835BC64E99CC0C2C559478C30B8D4914@BY2PRD0310MB388.namprd03.prod.outlook.com>
References: <47FF78CF835BC64E99CC0C2C559478C30B8D4914@BY2PRD0310MB388.namprd03.prod.outlook.com>
Message-ID: <47FF78CF835BC64E99CC0C2C559478C30B8D4957@BY2PRD0310MB388.namprd03.prod.outlook.com>

Hi, is there any documentation available on exactly which command line options are available from NumPy's 'superpack' installers on Windows? E.g., http://docs.scipy.org/doc/numpy/user/install.html mentions an "/arch" flag, but I'm not seeing anything else called out.

Thanks!

Dave
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthew.brett at gmail.com Fri Apr 20 14:07:52 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 20 Apr 2012 11:07:52 -0700
Subject: [Numpy-discussion] A 1.6.2 release?
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Fri, Apr 20, 2012 at 11:04 AM, Charles R Harris wrote:
> Hi All,
>
> Given the amount of new stuff coming in 1.7 and the slip in its schedule, I
> wonder if it would be worth putting out a 1.6.2 release with fixes for
> einsum, ticket 1578, perhaps some others. My reasoning is that the fall
> releases of Fedora, Ubuntu are likely to still use 1.6 and they might as
> well use a somewhat fixed up version. The downside is that locating and
> backporting fixes is likely to be a fair amount of work. A 1.7 release would
> be preferable, but I'm not sure when we can make that happen.

Also, I believe Debian will very soon freeze "testing" in order to prepare to release the next stable.

See you,

Matthew

From soucoupevolante at yahoo.com Fri Apr 20 14:15:45 2012
From: soucoupevolante at yahoo.com (Andre Martel)
Date: Fri, 20 Apr 2012 11:15:45 -0700 (PDT)
Subject: [Numpy-discussion] (no subject)
Message-ID: <1334945745.69846.YahooMailNeo@web163104.mail.bf1.yahoo.com>

What would be the best way to remove the maximum from a cube and "collapse" the remaining elements along the z-axis? For example, I want to reduce Cube to NewCube:

>>> Cube
array([[[ 13,   2,   3,  42],
        [  5, 100,   7,   8],
        [  9,   1,  11,  12]],

       [[ 25,   4,  15,   1],
        [ 17,  30,   9,  20],
        [ 21,   2,  23,  24]],

       [[  1,   2,  27,  28],
        [ 29,  18,  31,  32],
        [ -1,   3,  35,   4]]])

NewCube

array([[[ 13,   2,   3,   1],
        [  5,  30,   7,   8],
        [  9,   1,  11,  12]],

       [[  1,   2,  15,  28],
        [ 17,  18,   9,  20],
        [ -1,   2,  23,   4]]])

I tried with argmax() and then roll() and delete() but these all work on 1-D arrays only. Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From d.s.seljebotn at astro.uio.no Fri Apr 20 14:27:48 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Fri, 20 Apr 2012 20:27:48 +0200
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To: 
References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io>
Message-ID: <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com>

Fernando Perez wrote:
>On Fri, Apr 20, 2012 at 9:49 AM, Chris Barker
>wrote:
>>
>> I recall discussion a couple times in the past of having some
>> special-case numpy arrays for the simple, small cases -- perhaps 1-d
>> or 2-d C-contiguous only, for instance. That might be a better way to
>> address the small-array performance issue, and free us of concerns
>> about minor growth to the core ndarray object.
>
>+1 on that: I once wrote such code in pyrex (years ago) and it worked
>extremely well for me. No fancy features, very small footprint and
>highly optimized codepaths that gave me excellent performance.

I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well. I always assumed that it would be possible to optimize NumPy, just that nobody invested time in it.

Starting from scratch you gain that you don't have to work with and understand NumPy's codebase, but I honestly think that's a small price to pay for compatibility.

Dag

>
>Cheers,
>
>f
>_______________________________________________
>NumPy-Discussion mailing list
>NumPy-Discussion at scipy.org
>http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From tsyu80 at gmail.com Fri Apr 20 14:34:05 2012
From: tsyu80 at gmail.com (Tony Yu)
Date: Fri, 20 Apr 2012 14:34:05 -0400
Subject: [Numpy-discussion] (no subject)
In-Reply-To: <1334945745.69846.YahooMailNeo@web163104.mail.bf1.yahoo.com>
References: <1334945745.69846.YahooMailNeo@web163104.mail.bf1.yahoo.com>
Message-ID: 

On Fri, Apr 20, 2012 at 2:15 PM, Andre Martel wrote:

> What would be the best way to remove the maximum from a cube and
> "collapse" the remaining elements along the z-axis? For example, I want
> to reduce Cube to NewCube:
>
> >>> Cube
> array([[[ 13,   2,   3,  42],
>         [  5, 100,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[ 25,   4,  15,   1],
>         [ 17,  30,   9,  20],
>         [ 21,   2,  23,  24]],
>
>        [[  1,   2,  27,  28],
>         [ 29,  18,  31,  32],
>         [ -1,   3,  35,   4]]])
>
> NewCube
>
> array([[[ 13,   2,   3,   1],
>         [  5,  30,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[  1,   2,  15,  28],
>         [ 17,  18,   9,  20],
>         [ -1,   2,  23,   4]]])
>
> I tried with argmax() and then roll() and delete() but these
> all work on 1-D arrays only. Thanks.
>

Actually, those commands do work with n-dimensional arrays, but you'll have to specify the axis (the default for all these functions is `axis=None`, which tells the functions to operate on the flattened arrays).

If you don't care about the order of the "collapse", you can just do a simple sort (and drop the last---i.e. max---sub-array):

>>> np.sort(cube, axis=0)[:2]

If you need to keep the order, you can probably use some combination of `np.argsort` and `np.choose`.

Cheers,
-Tony
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From fperez.net at gmail.com Fri Apr 20 14:35:29 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 20 Apr 2012 11:35:29 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com> Message-ID: On Fri, Apr 20, 2012 at 11:27 AM, Dag Sverre Seljebotn wrote: > > I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well. Could be: this was years ago, and the bottleneck for me was in the constructor and in basic arithmetic. I had to make millions of these vectors and I needed to do basic arithmetic, but they were always 1-d and had one to 6 entries only. So writing a very static constructor with very low overhead did make a huge difference in that project. Also, when I wrote this code numpy didn't exist, I was using Numeric. Perhaps the same results could be obtained in numpy itself with judicious coding, I don't know. But in that project, ~600 lines of really easy pyrex code (it would be cython today) made a *huge* performance difference for me. Cheers, f From d.s.seljebotn at astro.uio.no Fri Apr 20 14:39:57 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 20 Apr 2012 20:39:57 +0200 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com> Message-ID: <4F91AD7D.9030403@astro.uio.no> On 04/20/2012 08:35 PM, Fernando Perez wrote: > On Fri, Apr 20, 2012 at 11:27 AM, Dag Sverre Seljebotn > wrote: >> >> I don't think you gain that much by using a different type though? Those optimized code paths could be plugged into NumPy as well. > > Could be: this was years ago, and the bottleneck for me was in the > constructor and in basic arithmetic. I had to make millions of these > vectors and I needed to do basic arithmetic, but they were always 1-d > and had one to 6 entries only. So writing a very static constructor > with very low overhead did make a huge difference in that project. Oh, right. I was thinking "small" as in "fits in L2 cache", not small as in a few dozen entries. You definitely still need a Cython class then. Dag > > Also, when I wrote this code numpy didn't exist, I was using Numeric. > > Perhaps the same results could be obtained in numpy itself with > judicious coding, I don't know. But in that project, ~600 lines of > really easy pyrex code (it would be cython today) made a *huge* > performance difference for me. From chris.barker at noaa.gov Fri Apr 20 14:45:57 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 20 Apr 2012 11:45:57 -0700 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: <4F91AD7D.9030403@astro.uio.no> References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com> <4F91AD7D.9030403@astro.uio.no> Message-ID: On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn wrote: > Oh, right. I was thinking "small" as in "fits in L2 cache", not small as > in a few dozen entries. or even two or three entries. 
I often use a (2,) or (3,) numpy array to represent an (x,y) point (usually pulled out of an Nx2 array). I like it 'cause I can do array math, etc.; it makes the code cleaner, but it's actually faster to use tuples and do the indexing by hand :-(

but yes, having something built-in, or at least very compatible with numpy would be best.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From matrixhasu at gmail.com Fri Apr 20 16:09:53 2012
From: matrixhasu at gmail.com (Sandro Tosi)
Date: Fri, 20 Apr 2012 22:09:53 +0200
Subject: [Numpy-discussion] A 1.6.2 release?
In-Reply-To: 
References: 
Message-ID: 

> Also, I believe Debian will very soon freeze "testing" in order to
> prepare to release the next stable.

yes, the estimates are around June (not clear if the beginning or the end).

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From ajfrank at ics.uci.edu Fri Apr 20 18:16:02 2012
From: ajfrank at ics.uci.edu (Drew Frank)
Date: Fri, 20 Apr 2012 15:16:02 -0700
Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
In-Reply-To: 
References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com> <4F91AD7D.9030403@astro.uio.no>
Message-ID: 

On Fri, Apr 20, 2012 at 11:45 AM, Chris Barker wrote:
>
> On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn
> wrote:
> > Oh, right. I was thinking "small" as in "fits in L2 cache", not small as
> > in a few dozen entries.

Another example of a small array use-case: I've been using numpy for my research in multi-target tracking, which involves something like a bunch of entangled hidden Markov models. I represent target states with small 2d arrays (e.g. 2x2, 4x4, ..) and observations with small 1d arrays (1 or 2 elements). It may be possible to combine a bunch of these small arrays into a single larger array and use indexing to extract views, but it is much cleaner and more intuitive to use separate, small arrays. It's also convenient to use numpy arrays rather than a custom class because I use the linear algebra functionality as well as integration with other libraries (e.g. matplotlib).

I also work with approximate probabilistic inference in graphical models (belief propagation, etc), which is another area where it can be nice to work with many small arrays.

In any case, I just wanted to chime in with my small bit of evidence for people wanting to use numpy for work with small arrays, even if they are currently pretty slow. If there were a special version of a numpy array that would be faster for cases like this, I would definitely make use of it.

Drew

From ralf.gommers at googlemail.com Sat Apr 21 04:46:03 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 21 Apr 2012 10:46:03 +0200
Subject: [Numpy-discussion] A 1.6.2 release?
In-Reply-To: 
References: 
Message-ID: 

On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris wrote:

> Hi All,
>
> Given the amount of new stuff coming in 1.7 and the slip in its schedule,
> I wonder if it would be worth putting out a 1.6.2 release with
> fixes for einsum, ticket 1578, perhaps some others.
My reasoning is that the fall > releases of Fedora, Ubuntu are likely to still use 1.6 and they might as > well use a somewhat fixed up version. The downside is located and > backporting fixes is likely to be a fair amount of work. A 1.7 release > would be preferable, but I'm not sure when we can make that happen. > Travis still sounded hopeful of being able to resolve the 1.7 issues relatively soon. On the other hand, even if that's done in one month we'll still miss Debian stable and a 1.6.2 release won't be *that* much work. Let's go for it I would say. Aiming for a RC on May 2nd and final release on May 16th would work for me. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Apr 21 04:48:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 21 Apr 2012 10:48:36 +0200 Subject: [Numpy-discussion] Command-line options for (Windows) NumPy Installer? In-Reply-To: <47FF78CF835BC64E99CC0C2C559478C30B8D4957@BY2PRD0310MB388.namprd03.prod.outlook.com> References: <47FF78CF835BC64E99CC0C2C559478C30B8D4914@BY2PRD0310MB388.namprd03.prod.outlook.com> <47FF78CF835BC64E99CC0C2C559478C30B8D4957@BY2PRD0310MB388.namprd03.prod.outlook.com> Message-ID: On Fri, Apr 20, 2012 at 8:05 PM, Dave Fugate wrote: > Hi, is there any documentation available on exactly which command line > options are available from NumPy?s ?superpack? installers on Windows? > E.g., http://docs.scipy.org/doc/numpy/user/install.html mentions an > ?/arch? flag, but I?m not seeing anything else called out. > Other than arch selection I think it's a fairly standard NSIS installer. No idea what else you can do with it though from the command line. Are you looking to accomplish some specific task? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 21 11:16:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 21 Apr 2012 09:16:22 -0600 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 2:46 AM, Ralf Gommers wrote: > > > On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Given the amount of new stuff coming in 1.7 and the slip in it's >> schedule, I wonder if it would be worth putting out a 1.6.2 release with >> fixes for einsum, ticket 1578, perhaps some others. My reasoning is that >> the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they >> might as well use a somewhat fixed up version. The downside is located and >> backporting fixes is likely to be a fair amount of work. A 1.7 release >> would be preferable, but I'm not sure when we can make that happen. >> > > Travis still sounded hopeful of being able to resolve the 1.7 issues > relatively soon. On the other hand, even if that's done in one month we'll > still miss Debian stable and a 1.6.2 release won't be *that* much work. > > Let's go for it I would say. > > Aiming for a RC on May 2nd and final release on May 16th would work for me. > > I count 280 BUG commits since 1.6.1, so we are going to need to thin those out. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 21 12:24:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 21 Apr 2012 10:24:22 -0600 Subject: [Numpy-discussion] A 1.6.2 release? 
In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 9:16 AM, Charles R Harris wrote: > > > On Sat, Apr 21, 2012 at 2:46 AM, Ralf Gommers > wrote: > >> >> >> On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Given the amount of new stuff coming in 1.7 and the slip in it's >>> schedule, I wonder if it would be worth putting out a 1.6.2 release with >>> fixes for einsum, ticket 1578, perhaps some others. My reasoning is that >>> the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they >>> might as well use a somewhat fixed up version. The downside is located and >>> backporting fixes is likely to be a fair amount of work. A 1.7 release >>> would be preferable, but I'm not sure when we can make that happen. >>> >> >> Travis still sounded hopeful of being able to resolve the 1.7 issues >> relatively soon. On the other hand, even if that's done in one month we'll >> still miss Debian stable and a 1.6.2 release won't be *that* much work. >> >> Let's go for it I would say. >> >> Aiming for a RC on May 2nd and final release on May 16th would work for >> me. >> >> > I count 280 BUG commits since 1.6.1, so we are going to need to thin those > out. > > Backported einsum fixes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kishor.iitr at gmail.com Sat Apr 21 13:25:02 2012 From: kishor.iitr at gmail.com (kishor.iitr) Date: Sat, 21 Apr 2012 10:25:02 -0700 (PDT) Subject: [Numpy-discussion] Problem in Generalized Ufunc with mixed data type signatures Message-ID: <33724952.post@talk.nabble.com> Hi, I am trying to create Generalized ufunc with following signature: static char func_signatures[] = { NPY_LONG, NPY_DOUBLE, PyArray_INTP }; gufunc_signature = "(m), (n), -> (n)"; I am getting following warning while calling the function: TypeError: function not supported for these types, and can't coerce safely to supported types If I change m => NPY_DOUBLE and n -> NPY_DOBULE function works fine but somehow when I mix the data types it gives me warning mentioned above? I didn't find any example in the Numpy Library where we mix the data types. Can you please help me? Thanks, -Kishor -- View this message in context: http://old.nabble.com/Problem-in-Generalized-Ufunc-with-mixed-data-type-signatures-tp33724952p33724952.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From charlesr.harris at gmail.com Sat Apr 21 13:59:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 21 Apr 2012 11:59:53 -0600 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 10:24 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, Apr 21, 2012 at 9:16 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Apr 21, 2012 at 2:46 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> >>> >>> On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> Hi All, >>>> >>>> Given the amount of new stuff coming in 1.7 and the slip in it's >>>> schedule, I wonder if it would be worth putting out a 1.6.2 release with >>>> fixes for einsum, ticket 1578, perhaps some others. My reasoning is that >>>> the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they >>>> might as well use a somewhat fixed up version. The downside is located and >>>> backporting fixes is likely to be a fair amount of work. 
A 1.7 release
>>>> would be preferable, but I'm not sure when we can make that happen.
>>>>
>>>
>>> Travis still sounded hopeful of being able to resolve the 1.7 issues
>>> relatively soon. On the other hand, even if that's done in one month we'll
>>> still miss Debian stable and a 1.6.2 release won't be *that* much work.
>>>
>>> Let's go for it I would say.
>>>
>>> Aiming for a RC on May 2nd and final release on May 16th would work for
>>> me.
>>>
>> I count 280 BUG commits since 1.6.1, so we are going to need to thin
>> those out.
>>
> Backported einsum fixes.
>

Backported ticket #1578 fixes.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From d.s.seljebotn at astro.uio.no Sat Apr 21 14:49:52 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 21 Apr 2012 20:49:52 +0200
Subject: [Numpy-discussion] ANN: Cython 0.16
In-Reply-To: 
References: 
Message-ID: <4F930150.3080308@astro.uio.no>

[From Mark Florisson, release manager for 0.16. PS: Note that the git branch is "release", not "master": https://github.com/cython/cython/tree/release

Dag]

We are pleased to announce a new version of Cython, 0.16 (http://cython.org/release/Cython-0.16.tar.gz). It comes with new features, improvements and bug fixes, including:

- super() without arguments
- fused types
- memoryviews
- more Python-like functions

Many thanks to the many contributors of this release and to all bug reporters and supporting users! A more comprehensive list of features and contributors can be found here: http://wiki.cython.org/ReleaseNotes-0.16, and an overview of bug fixes can be found here:

http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=component&milestone=0.16&desc=1

Enjoy!

From e.antero.tammi at gmail.com Sun Apr 22 05:54:12 2012
From: e.antero.tammi at gmail.com (eat)
Date: Sun, 22 Apr 2012 12:54:12 +0300
Subject: [Numpy-discussion] (no subject)
In-Reply-To: <1334945745.69846.YahooMailNeo@web163104.mail.bf1.yahoo.com>
References: <1334945745.69846.YahooMailNeo@web163104.mail.bf1.yahoo.com>
Message-ID: 

Hi,

On Fri, Apr 20, 2012 at 9:15 PM, Andre Martel wrote:

> What would be the best way to remove the maximum from a cube and
> "collapse" the remaining elements along the z-axis?
> For example, I want to reduce Cube to NewCube:
>
> >>> Cube
> array([[[ 13,   2,   3,  42],
>         [  5, 100,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[ 25,   4,  15,   1],
>         [ 17,  30,   9,  20],
>         [ 21,   2,  23,  24]],
>
>        [[  1,   2,  27,  28],
>         [ 29,  18,  31,  32],
>         [ -1,   3,  35,   4]]])
>
> NewCube
>
> array([[[ 13,   2,   3,   1],
>         [  5,  30,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[  1,   2,  15,  28],
>         [ 17,  18,   9,  20],
>         [ -1,   2,  23,   4]]])
>
> I tried with argmax() and then roll() and delete() but these
> all work on 1-D arrays only. Thanks.
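(Tony's argsort/choose suggestion from earlier in the thread can be made concrete before the 2-D reshaping approach below; a minimal sketch, where 'mergesort' is chosen because it is stable, and the index array is re-sorted so the surviving layers keep their original z-order:)

import numpy as np

cube = np.array([[[13,   2,   3,  42],
                  [ 5, 100,   7,   8],
                  [ 9,   1,  11,  12]],
                 [[25,   4,  15,   1],
                  [17,  30,   9,  20],
                  [21,   2,  23,  24]],
                 [[ 1,   2,  27,  28],
                  [29,  18,  31,  32],
                  [-1,   3,  35,   4]]])

# Indices that would sort each z-column; the last entry is the max.
order = np.argsort(cube, axis=0, kind='mergesort')[:-1]
order.sort(axis=0)                # restore the original stacking order
newcube = np.choose(order, cube)  # gather the two surviving layers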
> Perhaps it would be more straightforward to process via 2D-arrays, like: In []: C Out[]: array([[[ 13, 2, 3, 42], [ 5, 100, 7, 8], [ 9, 1, 11, 12]], [[ 25, 4, 15, 1], [ 17, 30, 9, 20], [ 21, 2, 23, 24]], [[ 1, 2, 27, 28], [ 29, 18, 31, 32], [ -1, 3, 35, 4]]]) In []: C_in= C.reshape(3, -1).copy() In []: ndx= C_in.argmax(0) In []: C_out= C_in[:2, :] In []: C_out[:, ndx== 0]= C_in[1:, ndx== 0] In []: C_out[1, ndx== 1]= C_in[2, ndx== 1] In []: C_out.reshape(2, 3, 4) Out[]: array([[[13, 2, 3, 1], [ 5, 30, 7, 8], [ 9, 1, 11, 12]], [[ 1, 2, 15, 28], [17, 18, 9, 20], [-1, 2, 23, 4]]]) My 2 cents, -eat > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Sun Apr 22 06:32:42 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 22 Apr 2012 12:32:42 +0200 Subject: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers) In-Reply-To: References: <601932FA-9F0A-489B-B3D7-BFBD2756559C@continuum.io> <24AB12B7-548F-4B17-9CB9-7673353C8ADB@continuum.io> <2888f0c8-7d92-44a6-8544-2a6e39ebdfc6@email.android.com> <4F91AD7D.9030403@astro.uio.no> Message-ID: On 21. apr. 2012, at 00:16, Drew Frank wrote: > On Fri, Apr 20, 2012 at 11:45 AM, Chris Barker wrote: >> >> On Fri, Apr 20, 2012 at 11:39 AM, Dag Sverre Seljebotn >> wrote: >>> Oh, right. I was thinking "small" as in "fits in L2 cache", not small as >>> in a few dozen entries. > > Another example of a small array use-case: I've been using numpy for > my research in multi-target tracking, which involves something like a > bunch of entangled hidden markov models. I represent target states > with small 2d arrays (e.g. 2x2, 4x4, ..) and observations with small > 1d arrays (1 or 2 elements). It may be possible to combine a bunch of > these small arrays into a single larger array and use indexing to > extract views, but it is much cleaner and more intuitive to use > separate, small arrays. It's also convenient to use numpy arrays > rather than a custom class because I use the linear algebra > functionality as well as integration with other libraries (e.g. > matplotlib). > > I also work with approximate probabilistic inference in graphical > models (belief propagation, etc), which is another area where it can > be nice to work with many small arrays. > > In any case, I just wanted to chime in with my small bit of evidence > for people wanting to use numpy for work with small arrays, even if > they are currently pretty slow. If there were a special version of a > numpy array that would be faster for cases like this, I would > definitely make use of it. > > Drew Although performance hasn't been a killer for me, I've been using numpy arrays (or matrices) for Mueller matrices [0] and Stokes vectors [1]. These describe the polarization of light and are always 4x1 vectors or 4x4 matrices. It would be nice if my code ran in 1 night instead of one week, although this is still tolerable in my case. Again, just an example of how small-vector/matrix performance can be important in certain use cases. Paul [0] https://en.wikipedia.org/wiki/Mueller_calculus [1] https://en.wikipedia.org/wiki/Stokes_vector From ralf.gommers at googlemail.com Sun Apr 22 07:25:43 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 22 Apr 2012 13:25:43 +0200 Subject: [Numpy-discussion] A 1.6.2 release? 
In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 5:16 PM, Charles R Harris wrote: > > > On Sat, Apr 21, 2012 at 2:46 AM, Ralf Gommers > wrote: > >> >> >> On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Given the amount of new stuff coming in 1.7 and the slip in it's >>> schedule, I wonder if it would be worth putting out a 1.6.2 release with >>> fixes for einsum, ticket 1578, perhaps some others. My reasoning is that >>> the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they >>> might as well use a somewhat fixed up version. The downside is located and >>> backporting fixes is likely to be a fair amount of work. A 1.7 release >>> would be preferable, but I'm not sure when we can make that happen. >>> >> >> Travis still sounded hopeful of being able to resolve the 1.7 issues >> relatively soon. On the other hand, even if that's done in one month we'll >> still miss Debian stable and a 1.6.2 release won't be *that* much work. >> >> Let's go for it I would say. >> >> Aiming for a RC on May 2nd and final release on May 16th would work for >> me. >> >> > I count 280 BUG commits since 1.6.1, so we are going to need to thin those > out. > Indeed. We can discard all commits related to NA and datetime, and then we should find some balance between how important the fixes are and how much risk there is that they break something. I agree with the couple of backports you've done so far, but I propose to do the rest via PRs. There's also build issues. I checked all of those and sent a PR with backports of all the relevant ones: https://github.com/numpy/numpy/pull/258 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Apr 22 09:44:31 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 22 Apr 2012 07:44:31 -0600 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 5:25 AM, Ralf Gommers wrote: > > > On Sat, Apr 21, 2012 at 5:16 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Apr 21, 2012 at 2:46 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> >>> >>> On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> Hi All, >>>> >>>> Given the amount of new stuff coming in 1.7 and the slip in it's >>>> schedule, I wonder if it would be worth putting out a 1.6.2 release with >>>> fixes for einsum, ticket 1578, perhaps some others. My reasoning is that >>>> the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they >>>> might as well use a somewhat fixed up version. The downside is located and >>>> backporting fixes is likely to be a fair amount of work. A 1.7 release >>>> would be preferable, but I'm not sure when we can make that happen. >>>> >>> >>> Travis still sounded hopeful of being able to resolve the 1.7 issues >>> relatively soon. On the other hand, even if that's done in one month we'll >>> still miss Debian stable and a 1.6.2 release won't be *that* much work. >>> >>> Let's go for it I would say. >>> >>> Aiming for a RC on May 2nd and final release on May 16th would work for >>> me. >>> >>> >> I count 280 BUG commits since 1.6.1, so we are going to need to thin >> those out. >> > > Indeed. 
We can discard all commits related to NA and datetime, and then we > should find some balance between how important the fixes are and how much > risk there is that they break something. I agree with the couple of > backports you've done so far, but I propose to do the rest via PRs. > > There's also build issues. I checked all of those and sent a PR with > backports of all the relevant ones: > https://github.com/numpy/numpy/pull/258 > > Hi Ralf, I went ahead and merged those. What's the easiest way to make things merge into the maintenance/1.6.x branch in the pull requests? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Apr 22 09:49:38 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 22 Apr 2012 15:49:38 +0200 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 3:44 PM, Charles R Harris wrote: > > > On Sun, Apr 22, 2012 at 5:25 AM, Ralf Gommers > wrote: > >> >> >> On Sat, Apr 21, 2012 at 5:16 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Apr 21, 2012 at 2:46 AM, Ralf Gommers < >>> ralf.gommers at googlemail.com> wrote: >>> >>>> >>>> >>>> On Fri, Apr 20, 2012 at 8:04 PM, Charles R Harris < >>>> charlesr.harris at gmail.com> wrote: >>>> >>>>> Hi All, >>>>> >>>>> Given the amount of new stuff coming in 1.7 and the slip in it's >>>>> schedule, I wonder if it would be worth putting out a 1.6.2 release with >>>>> fixes for einsum, ticket 1578, perhaps some others. My reasoning is that >>>>> the fall releases of Fedora, Ubuntu are likely to still use 1.6 and they >>>>> might as well use a somewhat fixed up version. The downside is located and >>>>> backporting fixes is likely to be a fair amount of work. A 1.7 release >>>>> would be preferable, but I'm not sure when we can make that happen. >>>>> >>>> >>>> Travis still sounded hopeful of being able to resolve the 1.7 issues >>>> relatively soon. On the other hand, even if that's done in one month we'll >>>> still miss Debian stable and a 1.6.2 release won't be *that* much work. >>>> >>>> Let's go for it I would say. >>>> >>>> Aiming for a RC on May 2nd and final release on May 16th would work for >>>> me. >>>> >>>> >>> I count 280 BUG commits since 1.6.1, so we are going to need to thin >>> those out. >>> >> >> Indeed. We can discard all commits related to NA and datetime, and then >> we should find some balance between how important the fixes are and how >> much risk there is that they break something. I agree with the couple of >> backports you've done so far, but I propose to do the rest via PRs. >> >> There's also build issues. I checked all of those and sent a PR with >> backports of all the relevant ones: >> https://github.com/numpy/numpy/pull/258 >> >> > Hi Ralf, I went ahead and merged those. What's the easiest way to make > things merge into the maintenance/1.6.x branch in the pull requests? > When sending a PR: in your own Github repo you press "Pull request", then in the next screen under "Base branch - tag - commit" you change the branch from master to maintenance/1.6.x. Then press "Update commit range". Then merge like normal. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From emmanuelle.gouillart at nsup.org Sun Apr 22 09:59:02 2012 From: emmanuelle.gouillart at nsup.org (Emmanuelle Gouillart) Date: Sun, 22 Apr 2012 15:59:02 +0200 Subject: [Numpy-discussion] Euroscipy 2012 - abstract deadline soon (April 30) + sprints Message-ID: <20120422135902.GA8752@phare.normalesup.org> Hello, this is a reminder of the approaching deadline for abstract submission at the Euroscipy 2012 conference: the deadline is April 30, in one week. Euroscipy 2012 will be held in **Brussels**, **August 23-27**, at the Université Libre de Bruxelles (ULB, Solbosch Campus). The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research and industry. This event strives to bring together both users and developers of scientific tools, as well as academic research and state-of-the-art industry. More information about the conference, including practical details, can be found on the conference website http://www.euroscipy.org/conference/euroscipy2012 We are soliciting talks and posters that discuss topics related to scientific computing using Python. These include applications, teaching, future development directions, and research. We welcome contributions from industry as well as the academic world. Submission guidelines can be found at http://www.euroscipy.org/card/euroscipy2012_call_for_contributions Also, rooms are available at the ULB for sprints on Tuesday August 28th and Wednesday August 29th. If you wish to organize a sprint at Euroscipy, please get in touch with Berkin Malkoc (malkocb at itu.edu.tr). Any other questions should be addressed exclusively to org-team at lists.euroscipy.org We apologize for the inconvenience if you received this e-mail through several mailing lists. -- Emmanuelle, for the organizing team From njs at pobox.com Sun Apr 22 18:14:56 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 22 Apr 2012 23:14:56 +0100 Subject: [Numpy-discussion] the state of NA/masking Message-ID: Hi all, Travis, Mark, and I talked on Skype this week about how to productively move forward with the NA debate, and I got picked to summarize for the list :-). There are three main things we discussed: 1) About process: We seem to agree that this discussion has been ineffective for a variety of reasons, and that it would be best to back up and try the consensus-based approach. Maybe not everyone agrees... I'm not sure how we go about building consensus on whether we need consensus? And we noted that we may not actually all mean the same thing by that. To start a discussion, I'll write up separately what I understand by that term. 2) If we require consensus on our NA implementation, then we have a problem for the 1.7.0 release. The problem is this: -- We have some kind of commitment to keeping compatibility between releases -- Therefore, if we release with NA masks, then we have some kind of commitment to continuing to support these in some form going forward -- But as per above, we can't make such a commitment until we have consensus, and we don't have consensus. Even if we end up deciding that the current code is the best thing ever, we haven't done that yet. Therefore, we have a kind of constrained optimization problem: we need to find the best way to adjust our "some kind of commitment", or the current code, or both, so that we can release 1.7.
Alternatively we could delay the release until we have reached and implemented consensus, but I have an allergy to putting such amorphous things on our critical path, and I suspect I'm not the only one. (If it turns out that consensus is quick and the release is slow for other reasons, then that'd be great, of course, but why depend on it if we don't have to?) I'll also send a separate email to try and lay out the main options here, as a basis for discussion. 3) And, in the long run, there's the actual question of what kind of NA support we actually want in numpy. A major problem here is that it's very difficult for anyone who hasn't spent huge amounts of time wading through the mailing list to actually understand what the points of contention are. So, Mark and I are going to *co*-write a document explaining what we see as the main problems, and trying to clarify our disagreements. Of course, this still won't include everyone's point of view, but hopefully it will serve as a good starting point for... you guessed it... discussion. Cheers, -- Nathaniel From njs at pobox.com Sun Apr 22 18:15:01 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 22 Apr 2012 23:15:01 +0100 Subject: [Numpy-discussion] NEP mask code and the 1.7 release Message-ID: We need to decide what to do with the NA masking code currently in master, vis-a-vis the 1.7 release. While this code is great at what it is, we don't actually have consensus yet that it's the best way to give our users what they want/need -- or even an appropriate way. So we need to figure out how to release 1.7 without committing ourselves to supporting this design in the future. Background: what does the code currently in master do? -------------------------------------------- It adds 3 pointers at the end of the PyArrayObject struct (which is better known as the numpy.ndarray object). These new struct members, and some accessors for them, are exposed as part of the public API. There are also a few additions to the Python-level API (mask= argument to np.array, skipna= argument to ufuncs, etc.) What does this mean for compatibility? ------------------------------------------------ The change in the ndarray struct is not as problematic as it might seem, compatibility-wise, since Python objects are almost always referred to by pointers. Since the initial part of the struct will continue to have the same memory layout, existing source and binary code that works with PyArrayObject *pointers* will continue to work unchanged. One place where the actual struct size matters is for any C-level ndarray subclasses, which will have their memory layout change, and thus will need to be recompiled. (Python-level ndarray subclasses will have their memory layout change as well -- e.g., they will have different __dictoffset__ values -- but it's unlikely that any existing Python code depends on such details.) What if we want to change our minds later? ------------------------------------------------------- For the same reasons as given above, any new code which avoids referencing the new struct fields referring to masks, or using the new masking APIs, will continue to work even if the masking is later removed. Any new code which *does* refer to the new masking APIs, or references the fields directly, will break if masking is later removed. Specifically, source will fail to compile, and existing binaries will silently access memory that is past the end of the PyArrayObject struct, which will have unpredictable consequences. (Most likely segfaults, but no guarantees.) This applies even to code which simply tries to check whether a mask is present. So I think the preconditions for leaving this code as-is for 1.7 are that we must agree: * We are willing to require a recompile of any C-level ndarray subclasses (do any exist?) * We are willing to make absolutely no guarantees about future compatibility for code which uses APIs marked "experimental" * We are willing for this breakage to occur in the form of random segfaults * We are okay with the extra 3 pointers' worth of memory overhead on each ndarray Personally I can live with all of these if everyone else can, but I'm nervous about reducing our compatibility guarantees like that, and we'd probably need, at a minimum, a flashier EXPERIMENTAL sign than we currently have. (Maybe we should resurrect the weasels ;-) [1]) [1] http://mail.scipy.org/pipermail/numpy-discussion/2012-March/061204.html Any other options? ------------------------ Alternative 1: The obvious other option is to go through and move all the strictly mask-related code out of master and into a branch. Presumably this wouldn't include all the infrastructure that Mark added, since a lot of it is e.g. shared with where=, and that would stay. Even so, this would be a big and possibly time-consuming change. Alternative 2: After auditing the code a bit, the cleanest third option I can think of is: 1. Go through and make sure that all numpy-internal access to the new maskna fields happens via the accessor functions. (This patch would produce no functionality change.) 2. Move the accessors into some numpy-internal header file, so that user code can't call them. 3. Remove the mask= argument to Python-level ndarray constructors, remove the new maskna_ fields from PyArrayObject, and modify the accessors so that they always return NULL, 0, etc., as if the array does not have a mask. This would make 1.7 completely compatible with 1.6, API- and ABI-wise. But it would also be a minimal code change, leaving the mask-related code paths in place but inaccessible. If we decided to re-enable them, it would just be a matter of reverting steps (3) and (2). The main downside I see with this approach is that leaving a bunch of inaccessible code paths lying around might make it harder to maintain 1.7 as a "long term support" release. I'm personally willing to implement either of these changes. Or perhaps there's another option that I'm not thinking of!
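For downstream Python code, the safest way to write against these additions while they remain experimental is to probe for them at runtime rather than assume they exist. A minimal sketch -- np.NA, maskna= and skipna= are the spellings used by the current development branch and, being experimental, may change or disappear entirely:

    import numpy as np

    # Probe for the experimental masked-NA support instead of assuming it;
    # on a numpy build without the feature this reports False rather than
    # failing at import time.
    HAS_MASKNA = hasattr(np, 'NA')

    if HAS_MASKNA:
        a = np.array([1.0, 2.0, 3.0], maskna=True)  # experimental constructor arg
        a[1] = np.NA                                # mask out the second element
        print(np.sum(a, skipna=True))               # 4.0, the NA is skipped
    else:
        print("this numpy build has no masked-NA support")

(A check like this is only possible at the Python level; as described above, C code that touches the new struct fields has no comparably safe probe.)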
-- Nathaniel From njs at pobox.com Sun Apr 22 18:15:04 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 22 Apr 2012 23:15:04 +0100 Subject: [Numpy-discussion] What is consensus anyway Message-ID: If you hang around big FOSS projects, you'll see the word "consensus" come up a lot. For example, the glibc steering committee recently dissolved itself in favor of governance "directly by the consensus of the people active in glibc development"[1]. It's the governing rule of the IETF, which defines many of the most important internet standards[2]. It is the "primary way decisions are made on Wikipedia"[3]. It's "one of the fundamental aspects of accomplishing things within the Apache framework"[4]. [1] https://lwn.net/Articles/488778/ [2] https://www.ietf.org/tao.html#getting.things.done [3] https://en.wikipedia.org/wiki/Wikipedia:Consensus [4] https://www.apache.org/foundation/voting.html But it turns out that this "consensus" thing is actually somewhat mysterious, and one that most programmers immersed in this culture pick up by osmosis. And numpy in particular has a lot of developers who are not coming from a classic FOSS programmer background! So this is my personal attempt to articulate what it is, and why requiring consensus is probably the best possible approach to project decision making. So what is "consensus"? Like, voting or something? ----------------------------------------------------- This is surprisingly subtle and specific. "Consensus" means something like, "everyone who cares is satisfied with the result". It does *not* mean * Every opinion counts equally * We vote on anything * Every solution must be perfect and flawless * Every solution must leave everyone overjoyed * Everyone must sign off on every solution. It *does* mean * We invite people to speak up * We generally trust individuals to decide how important their opinion is * We generally trust individuals to decide whether or not they can live with some outcome * If they can't, then we take the time to find something better. One simple way of stating this is, everyone has a veto. In practice, such vetoes are almost never used, so this rule is not particularly illuminating on its own. Hence, the rest of this document. What a waste of time! That all sounds very pretty on paper, but we have stuff to get done. ----------------------------------------------------------------------------------- First, I'll note that this seemingly utopian scheme has a track record of producing such impractical systems as TCP/IP, SMTP, DNS, Apache, GCC, Linux, Samba, Python, ... But mere empirical results are often less convincing than a good story, so I will give you two. Why does a requirement for consensus work? Reason 1 (for optimists): *All of us are smarter than any of us.* For a complex project with many users, it's extraordinarily difficult for any one person to understand the full ramifications of any decision, particularly the sort of far-reaching architectural decisions that are most important. It's even more difficult to understand all the possibilities of all the different possible solutions. In fact, it's *extremely* common that the correct solution to a problem is the one that no-one thinks of until after a month of annoying debate. Spending a month to avoid an architectural problem that will haunt us for years is an *excellent* trade-off, even if it feels interminable at the time. Even two months. Usually disagreements are an indication that a better solution is possible, even when it's not clear what that would be. Reason 2 (for pessimists): *You **will** reach consensus sooner or later; it's less painful to do up front.* Example: NA handling. There are two schemes that people use for this right now -- numpy.ma and ugly NaN kluges (see e.g. nanmean). These are generally agreed to be suboptimal. Recently, two new contenders have shown up: the NEP masked-NA support currently in master, and the unrelated NA support in pandas (which as a library is attracting a *lot* of the statistical analysis folk who care about missing data, kudos to Wes). I think that right now, the most likely future is that a few years from now, many people will still be using the old solutions, and others will have switched to the new (incompatible) solutions, and we will have *4* suboptimal schemes in concurrent use. If (when) this happens, we will have to re-open this discussion yet again, but now with a heck of a mess to clean up. This is FOSS -- if people aren't convinced by your solution, they will just ignore it and do their own thing.
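To make the two existing schemes concrete (the array values are invented for the example; both idioms run on any recent numpy):

    import numpy as np
    import numpy.ma as ma

    data = np.array([1.0, np.nan, 3.0])     # nan standing in for "missing"

    # Scheme 1, the NaN kludge: filter the nans out by hand, everywhere.
    print(data[~np.isnan(data)].mean())     # 2.0

    # Scheme 2, numpy.ma: carry an explicit mask alongside the data.
    print(ma.masked_invalid(data).mean())   # 2.0
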
So a policy that allows changes to be made without consensus is a recipe for entrenching disagreements and splitting the community. Okay, great, but even if it's the best thing ever, we *can't* hold a vote on every change! What are you actually suggesting we do? ---------------------------------------------------------------------------- Right, that's not the idea. Most changes are pretty obviously uncontroversial, and in fact we usually have the opposite problem -- it's hard to get people to do code review! So having consensus on every change is an ideal, and in practice, just following the reasonable person principle lets us get pretty close to that ideal. If no-one objects to a change, then it's probably fine. (And it's not like anyone *wants* that segfault to remain unfixed! That's obvious.) OTOH, this isn't an excuse to try gaming the system -- if you have a change that might affect people adversely, then it can be worthwhile to send them a ping, even if they didn't object. They might have just missed seeing it go by, and if it's going to be a problem, better to find out now! In fact, one of the nice things about having a consistent culture of consensus-building is that people learn to trust that if they do have a problem, it will be taken seriously. And that, in turn, makes it okay to make judgement calls about whether the participants in some discussion basically agree, or whether the apparent disagreement is just bikeshedding[1], or whatever. If you make the wrong judgement call, then someone will tell you, and no harm is done. If you do not have such a culture, then people may (will) despair of being taken seriously, and go do something more pleasant and productive, like fire-walking or coding in Matlab. [1] http://producingoss.com/en/common-pitfalls.html#bikeshed So mainly what I'm saying we should do is: 1. Make it as easy as possible for people to see what's going on and join the discussion. All decisions and reasoning behind decisions take place in public. (On this note, it would be *really* good if pull request notifications went to the list.) 2. If someone raises a substantive objection, take that seriously. 3. If someone says "no, this is just not going to work for me, because... ", then it can't go in. It turns out that since we are all human, it's much easier to take people's concerns seriously when you know that they can veto your code! The result is that in practice, disagreements get resolved at point (2) there, and no-one feels the need to take such extreme measures as vetoing anything. I'm as lazy as anyone else; I produce better solutions when I'm forced to keep looking for them. So if we just follow this consensus stuff, everything will be perfect? --------------------------------------------------- Ha ha ha no. It can and does work better than any other options I'm aware of, but it takes practice and there are certainly still failure modes. Here's some that come to mind: * Bikeshedding: Mentioned above. There's nothing *wrong* with everyone speaking up to voice their opinion, but it shouldn't be an obstacle to getting work done when in fact everyone will be satisfied regardless. * False compromise: if all you're trying to do is make everybody happy, then it's easy to end up with a "compromise" that takes one bit from each proposal (or simply takes the union of them all). This is usually worse than any of the original proposals. Good designs take work; skipping that work is tempting; resist it. 
You also can end up with some surprising solutions, e.g.: * The group reaches consensus that the problem is well understood and there simply is no perfect solution, but something is better than nothing. So everyone agrees to, in effect, flip a coin. (Example: the Python ternary operator) * The group reaches consensus that while other points of view may be valid, this project only has room for one of them; sorry, anyone who disagrees will have to start their own project. (Example: GPL versus BSD debates) * The group reaches consensus that at least one piece of a larger proposal is okay to start, which puts off the show-down over the rest of it until another day. (At which point more data may be available, opinions may have changed, or the whole issue may have become irrelevant.) Remember, the goal is always to find some way forward that we can collectively live with. Sometimes you can successfully convince everyone of the intrinsic awesomeness of your original idea through argument alone... but clever outside-the-box proposals often pay off too. But what about obstructive people abusing their veto power? ---------------------------------------------------------------- This concern makes perfect sense, but it turns out to just not be as much of a problem as you'd think. Most people have more interesting things to do with their lives than to gum up the mailing list for some random software library. It's a fair assumption that if someone cares enough to speak up, it's because they have some legitimate interest in numpy's future. And again, energy spent on trying to sniff out obstructionists can usually be more profitably spent on finding better solutions. That said, yes, sometimes people may be obstructive or act in bad faith. Here's some good experience-based advice on this: http://producingoss.com/en/difficult-people.html Notice that everything they say is still oriented around consensus -- "You may not persuade the person in question, but that's okay as long as you persuade everyone else", "a perfect example of how to build a strong case on neutral, quantitative data", the "masterful strategy" is to build consensus before acting. The consensus ideal is perfectly compatible with dealing with difficult people. Other documents --------------------- Like I said at the start, this is just my attempt to distill some abstract principles from my own experience. I can't take credit for most of these insights, and no doubt I've articulated some of them poorly. Fortunately, we don't all have to agree on every detail to get things done :-). But if you want to read more about this topic, and from other perspectives, here are some decent documents: http://producingoss.com/en/consensus-democracy.html https://www.ietf.org/tao.html#getting.things.done https://en.wikipedia.org/wiki/Consensus_decision-making And, of course, I would love to hear feedback on this document! From charlesr.harris at gmail.com Sun Apr 22 20:04:34 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 22 Apr 2012 18:04:34 -0600 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 4:15 PM, Nathaniel Smith wrote: > If you hang around big FOSS projects, you'll see the word "consensus" > come up a lot. For example, the glibc steering committee recently > dissolved itself in favor of governance "directly by the consensus of > the people active in glibc development"[1].
It's the governing rule of > the IETF, which defines many of the most important internet > standards[2]. It is the "primary way decisions are made on > Wikipedia"[3]. It's "one of the fundamental aspects of accomplishing > things within the Apache framework"[4]. > > [1] https://lwn.net/Articles/488778/ > [2] https://www.ietf.org/tao.html#getting.things.done > [3] https://en.wikipedia.org/wiki/Wikipedia:Consensus > [4] https://www.apache.org/foundation/voting.html > > [...] > > First, I'll note that this seemingly utopian scheme has a track record > of producing such impractical systems as TCP/IP, SMTP, DNS, Apache, > GCC, Linux, Samba, Python, ... > > Linux is Linus' private tree. Everything that goes in is his decision, everything that stays out is his decision. Of course, he delegates much of the work to people he trusts, but it doesn't even reach the level of a BDFL; it's DFL. As for consensus, it basically comes down to convincing the gatekeepers one level below Linus that your code might be useful. So bad example. Same with TCP/IP, which was basically Kahn and Cerf consulting with a few others and working by request of DARPA. GCC was Richard Stallman (I got one of the first tapes for a $30 donation), Python was Guido. Some of the projects later developed some form of governance, but Guido, for instance, can veto anything he dislikes even if he is disinclined to do so. I'm not saying you're wrong about open source, I'm just saying that each project differs and it is wrong to imply that they follow some common form of governance under the rubric FOSS and that they all seek consensus. And they certainly don't *start* that way. And there are also plenty of projects that fail when the prime mover loses interest or folks get tired of the politics. But mere empirical results are often less convincing than a good > story, so I will give you two. Why does a requirement for consensus > work? > > [...] > > So mainly what I'm saying we should do is: > 1. Make it as easy as possible for people to see what's going on and > join the discussion. All decisions and reasoning behind decisions take > place in public. (On this note, it would be *really* good if pull > request notifications went to the list.) > 2. If someone raises a substantive objection, take that seriously. > 3. If someone says "no, this is just not going to work for me, > because... ", then it can't go in. > > What happens when someone wants to spend all their time talking about process? It can get kind of old. > > [...] > > And, of course, I would love to hear feedback on this document! > > It seems top heavy for an organization that has maybe three people working on Numpy C code in their spare time. I think the ideal here would be for someone to produce their own version as a working example; then we could discuss merging code, and we would also have something to play with. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Apr 22 20:26:36 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 22 Apr 2012 18:26:36 -0600 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 4:15 PM, Nathaniel Smith wrote: > We need to decide what to do with the NA masking code currently in > master, vis-a-vis the 1.7 release. > > [...] > > I'm personally willing to implement either of these changes. Or > perhaps there's another option that I'm not thinking of! > I'm not deeply invested in the current version of masked NA. OTOH, code development usually goes through several cycles of implementation and trial. My own rule of thumb is that everything needs to be rewritten three times, which in fact has happened with Numpy, with Numeric and Numarray as precursors. I think a fourth rewrite of much of the code is going to happen in the future. What I do disagree with is the idea that everything has to be planned and designed up front based on consensus. I prefer a certain amount of trial and error leading to evolution. Numpy does need some way to experiment, and unless someone is willing to develop and maintain separate trees (which happened in Linux), there needs to be some wiggle room. Which is why I proposed an LTS release. In any case, I think a good topic for discussion is what we have learned from the current prototype, exclusive of politics. Hopefully you have used it and can give us some feedback based on your own experience. I'd also like to hear from anyone else who is using it at the moment. Then we can discuss at a technical level what should be changed, alternative APIs, and what works/sucks about what we have. I thought there were some good points along those lines made in the thread following Travis' first post. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Apr 22 20:49:44 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 22 Apr 2012 18:49:44 -0600 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 6:26 PM, Charles R Harris wrote: > > [...] > > In any case, I think a good topic for discussion is what we have learned > from the current prototype, exclusive of politics. [...] > To expand on the last, I think that if we are going to have masked arrays, all arrays should be masked, but that an actual mask doesn't get allocated until it is used, so the masked keyword would go away. Ignored values also need better support, i.e., erasure. Re the ndarray structure, it needs to be hidden at some point; that has been under discussion since 1.3, but doing it will take time and needs planning and a preliminary timeline, since it will affect a lot of people. I'm thinking several years at least. OT, it also seems that several folks would like more efficient small arrays. It might be worth devoting some time to profiling the current code and removing bottlenecks.
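A rough sketch of that lazy-allocation idea in pure Python -- purely illustrative of the semantics, not a proposal for the actual ndarray internals:

    import numpy as np

    class LazyMaskedArray(object):
        """Toy model of 'every array is masked, but the mask is only
        allocated on first use'."""

        def __init__(self, data):
            self.data = np.asarray(data)
            self._mask = None              # no memory spent until needed

        @property
        def mask(self):
            if self._mask is None:         # allocate lazily, on first access
                self._mask = np.zeros(self.data.shape, dtype=bool)
            return self._mask

        def mask_value(self, index):
            self.mask[index] = True        # first call triggers the allocation

        def mean(self):
            if self._mask is None:         # fast path: behave like a plain array
                return self.data.mean()
            return self.data[~self._mask].mean()

    a = LazyMaskedArray([1.0, 2.0, 3.0])
    print(a.mean())      # 2.0, computed without ever allocating a mask
    a.mask_value(1)
    print(a.mean())      # 2.0 = (1.0 + 3.0) / 2, mask now exists
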
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun Apr 22 21:39:50 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 22 Apr 2012 18:39:50 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: Hi Nathaniel, thanks for a solid writeup of this topic. I just want to add a note from personal experience, regarding this specific point: On Sun, Apr 22, 2012 at 3:15 PM, Nathaniel Smith wrote: > Usually disagreements are an indication that a > better solution is possible, even when it's not clear what that would > be. I think this is *extremely* important, so I want to highlight it from the rest of your post. Regarding how IPython operates, I think we have good evidence to illustrate the value of this... One of the members of the IPython team who joined earliest is Brian Granger: he started working on IPython around 2004 after a conversation we had in the context of a SciPy conference. Some of you may know that Brian and I went to graduate school together, which means we've known each other for much longer than IPython, and we've been good friends since. But that alone doesn't ensure a smooth collaboration; in fact Brian and I extremely often disagree *deeply* on design decisions about IPython. And yet, I think the project makes solid progress, not despite this but in an important way *thanks* to this divergence. Whenever we disagree, it typically means that each of us is seeing a partial solution to a problem, but not a really solid and complete one. I don't recall ever using my 'BDFL vote' in one of these discussions; instead we just keep going around the problem. Typically what happens is that after much discussion, we settle on a new solution that neither of us had quite seen at the start. I mention Brian specifically because he and I seem to be at opposite ends of some weird spectrum; disagreement among the other parties appears to fall somewhere in between. Here's an example that is currently in open discussion, and despite the fact that I'm completely convinced that something like this should go into IPython, I'm waiting. We'll continue the discussion to either find arguments that convince me otherwise, or to convince Brian of the value of the PR: https://github.com/ipython/ipython/pull/1343 It takes both patience and trust for this to work: we have to be willing to wait out the long discussion, and we have to trust that despite how much we may disagree on something, we both play fair and ultimately only want what's best for the project. That means giving the other party the benefit of the doubt at every turn, and having a willingness to let the discussion happen openly as long as is necessary for the project to remain healthy. For example in this case, I'm *really* convinced of my point, and I think blocking this PR actively hurts users. Is it worth saying "OK, I'm overriding your concerns here and pushing this forward"? Absolutely NOT! I'd only: - alienate Brian, a key member of the project without whom IPython would be nowhere near where it is today, and decrease his motivation to continue working - kill the opportunity for a discussion to produce an even cleaner solution than what we've seen so far - piss off a good friend. I put this last because while that's actually a very important reason for me, the fact that Brian and I are good personal friends is secondary here: this is about discussion between contributors independent of their personal relationships. I hope this perspective is useful... > 1. Make it as easy as possible for people to see what's going on and > join the discussion. All decisions and reasoning behind decisions take > place in public. (On this note, it would be *really* good if pull > request notifications went to the list.) If anyone knows how to do this, let me know; I'd like to do the same for IPython and our -dev list. Cheers, f From ralf.gommers at googlemail.com Mon Apr 23 02:22:02 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 08:22:02 +0200 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 3:44 PM, Charles R Harris wrote: > [...]
>> Indeed. We can discard all commits related to NA and datetime, and then >> we should find some balance between how important the fixes are and how >> much risk there is that they break something. I agree with the couple of >> backports you've done so far, but I propose to do the rest via PRs. >> > Charles, did you have some practical way in mind to select these commits? We could split it up by time range or by submodules for example. I'd prefer the latter. You would be able to do a better job of the commits touching numpy/core than I. How about you do that one and the polynomial module, and I do the rest? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Apr 23 02:47:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 23 Apr 2012 00:47:24 -0600 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 12:22 AM, Ralf Gommers < ralf.gommers at googlemail.com> wrote: > [...] >
> Charles, did you have some practical way in mind to select these commits? > We could split it up by time range or by submodules for example. I'd prefer > the latter. You would be able to do a better job of the commits touching > numpy/core than I. How about you do that one and the polynomial module, and > I do the rest? > > I'll give it a shot. I thought the first thing I would try is a search on tickets. We'll also need to track things, and I haven't thought of a good way to do that apart from making a list and checking things off. I don't think there was much polynomial fixing -- mostly new stuff -- but I'd like to use the current documentation. I don't know how you manage that for releases. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dfugate at microsoft.com Mon Apr 23 11:55:32 2012 From: dfugate at microsoft.com (Dave Fugate) Date: Mon, 23 Apr 2012 15:55:32 +0000 Subject: [Numpy-discussion] Command-line options for (Windows) NumPy Installer? Message-ID: <47FF78CF835BC64E99CC0C2C559478C30B8D5643@BY2PRD0310MB388.namprd03.prod.outlook.com> Thanks Ralf! I'm interested in unattended/silent installations. My best, Dave --------------------------------------------------- Date: Sat, 21 Apr 2012 10:48:36 +0200 From: Ralf Gommers Subject: Re: [Numpy-discussion] Command-line options for (Windows) NumPy Installer? To: Discussion of Numerical Python Message-ID: Content-Type: text/plain; charset="windows-1252" On Fri, Apr 20, 2012 at 8:05 PM, Dave Fugate wrote: > Hi, is there any documentation available on exactly which command line > options are available from NumPy's "superpack" installers on Windows? > E.g., http://docs.scipy.org/doc/numpy/user/install.html mentions an > "/arch" flag, but I'm not seeing anything else called out. > Other than arch selection I think it's a fairly standard NSIS installer. No idea what else you can do with it though from the command line. Are you looking to accomplish some specific task? Ralf From ralf.gommers at googlemail.com Mon Apr 23 13:05:38 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 19:05:38 +0200 Subject: [Numpy-discussion] A 1.6.2 release? In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 8:47 AM, Charles R Harris wrote: > [...] > >> Charles, did you have some practical way in mind to select these commits?
>> We could split it up by time range or by submodules for example. I'd prefer >> the latter. You would be able to do a better job of the commits touching >> numpy/core than I. How about you do that one and the polynomial module, and >> I do the rest? >> >> > I'll give it a shot. I thought the first thing I would try is a search on > tickets. We'll also need to track things and I haven't thought of a good > way to do that apart from making a list and checking things off. I don't > think there was too much polynomial fixing, mostly new stuff, but I'd like > to use the current documentation. I don't know how you manage that for > releases. > Nothing too fancy - I use the open tickets for the milestone at http://projects.scipy.org/numpy/report/3, plus the checklist at https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt and perhaps a small todo list in my inbox. Normally we only do bugfix releases for specific reasons, so besides those I just scan through the list of commits and pick only some relevant ones of which I'm sure that they won't give any problems. The fixed items under http://projects.scipy.org/numpy/query?status=closed&group=resolution&milestone=1.7.0 http://projects.scipy.org/numpy/query?status=closed&group=resolution&milestone=2.0.0 probably give the best overview. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Apr 23 13:18:24 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 19:18:24 +0200 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 12:15 AM, Nathaniel Smith wrote: > We need to decide what to do with the NA masking code currently in > master, vis-a-vis the 1.7 release. While this code is great at what it > is, we don't actually have consensus yet that it's the best way to > give our users what they want/need -- or even an appropriate way. So > we need to figure out how to release 1.7 without committing ourselves > to supporting this design in the future. > > Background: what does the code currently in master do? > -------------------------------------------- > > It adds 3 pointers at the end of the PyArrayObject struct (which is > better known as the numpy.ndarray object). These new struct members, > and some accessors for them, are exposed as part of the public API. > There are also a few additions to the Python-level API (mask= argument > to np.array, skipna= argument to ufuncs, etc.) > > What does this mean for compatibility? > ------------------------------------------------ > > The change in the ndarray struct is not as problematic as it might > seem, compatibility-wise, since Python objects are almost always > referred to by pointers. Since the initial part of the struct will > continue to have the same memory layout, existing source and binary > code that works with PyArrayObject *pointers* will continue to work > unchanged. > > One place where the actual struct size matters is for any C-level > ndarray subclasses, which will have their memory layout change, and > thus will need to be recompiled. (Python-level ndarray subclasses will > have their memory layout change as well -- e.g., they will have > different __dictoffset__ values -- but it's unlikely that any existing > Python code depends on such details.) > > What if we want to change our minds later? 
> ------------------------------------------------------- > > For the same reasons as given above, any new code which avoids > referencing the new struct fields referring to masks, or using the new > masking APIs, will continue to work even if the masking is later > removed. > > Any new code which *does* refer to the new masking APIs, or references > the fields directly, will break if masking is later removed. > Specifically, source will fail to compile, and existing binaries will > silently access memory that is past the end of the PyArrayObject > struct, which will have unpredictable consequences. (Most likely > segfaults, but no guarantees.) This applies even to code which simply > tries to check whether a mask is present. > > So I think the preconditions for leaving this code as-is for 1.7 are > that we must agree: > * We are willing to require a recompile of any C-level ndarray > subclasses (do any exist?) As long as it's only subclasses I think this may be OK. Not 100% sure on this one though. > * We are willing to make absolutely no guarantees about future > compatibility for code which uses APIs marked "experimental" That is what I understand "experimental" to mean. Could stay, could change - no guarantees. > * We are willing for this breakage to occur in the form of random > segfaults This is not OK of course. But it shouldn't apply to the Python API, which I think is the most important one here. > * We are okay with the extra 3 pointers worth of memory overhead on > each ndarray > > Personally I can live with all of these if everyone else can, but I'm > nervous about reducing our compatibility guarantees like that, and > we'd probably need, at a minimum, a flashier EXPERIMENTAL sign than we > currently have. (Maybe we should resurrect the weasels ;-) [1]) > > [1] > http://mail.scipy.org/pipermail/numpy-discussion/2012-March/061204.html > > I'm personally willing to implement either of these changes. > Thank you Nathaniel, that is a very important and helpful statement. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Apr 23 13:59:22 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 19:59:22 +0200 Subject: [Numpy-discussion] Command-line options for (Windows) NumPy Installer? In-Reply-To: <47FF78CF835BC64E99CC0C2C559478C30B8D5643@BY2PRD0310MB388.namprd03.prod.outlook.com> References: <47FF78CF835BC64E99CC0C2C559478C30B8D5643@BY2PRD0310MB388.namprd03.prod.outlook.com> Message-ID: On Mon, Apr 23, 2012 at 5:55 PM, Dave Fugate wrote: > Thanks Ralf! I'm interested in unattended/silent installations. > > I'm afraid that that doesn't work. NSIS installers provide the /S option for silent installs, but it requires some changes to the install script that we apparently didn't make. I opened http://projects.scipy.org/numpy/ticket/2112 for this. Ralf > > --------------------------------------------------- > Date: Sat, 21 Apr 2012 10:48:36 +0200 > From: Ralf Gommers > Subject: Re: [Numpy-discussion] Command-line options for (Windows) > NumPy Installer? > To: Discussion of Numerical Python > Message-ID: > > > Content-Type: text/plain; charset="windows-1252" > > On Fri, Apr 20, 2012 at 8:05 PM, Dave Fugate > wrote: > > > Hi, is there any documentation available on exactly which command line > > options are available from NumPy's 'superpack' installers on Windows? > > E.g., http://docs.scipy.org/doc/numpy/user/install.html mentions an > > '/arch'
flag, but I'm not seeing anything else called out. > > Other than arch selection I think it's a fairly standard NSIS installer. No idea what else you can do with it though from the command line. Are you looking to accomplish some specific task? > > Ralf > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Apr 23 14:05:09 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 23 Apr 2012 11:05:09 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: Hi, On Sun, Apr 22, 2012 at 3:15 PM, Nathaniel Smith wrote: > If you hang around big FOSS projects, you'll see the word "consensus" > come up a lot. For example, the glibc steering committee recently > dissolved itself in favor of governance "directly by the consensus of > the people active in glibc development"[1]. It's the governing rule of > the IETF, which defines many of the most important internet > standards[2]. It is the "primary way decisions are made on > Wikipedia"[3]. It's "one of the fundamental aspects of accomplishing > things within the Apache framework"[4]. > > [1] https://lwn.net/Articles/488778/ > [2] https://www.ietf.org/tao.html#getting.things.done > [3] https://en.wikipedia.org/wiki/Wikipedia:Consensus > [4] https://www.apache.org/foundation/voting.html I think the big problem here is that Chuck (I hope I'm not misrepresenting him) is not interested in discussion of process, and the last time we had a specific thread on governance, Travis strongly implied he was not very interested either, at least at the time. In that situation, there's rather a high threshold to pass before getting involved in the discussion, and I think you're seeing some evidence for that. So, as before, and as we discussed on gchat :) - whether this discussion can go anywhere depends on Travis. Travis - what do you think? See you, Matthew From ralf.gommers at googlemail.com Mon Apr 23 14:05:13 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 20:05:13 +0200 Subject: [Numpy-discussion] documentation bug: Matrix library page not populated In-Reply-To: References: <4F8F009F.6050006@gmail.com> Message-ID: On Thu, Apr 19, 2012 at 3:12 AM, wrote: > On Wed, Apr 18, 2012 at 4:14 PM, Pauli Virtanen wrote: > > Hi, > > > > 18.04.2012 19:57, Alan G Isaac wrote: > >> http://docs.scipy.org/doc/numpy/reference/routines.matlib.html#module-numpy.matlib > >> promises a list of functions that does not appear (at the moment, anyway). > > > > This doesn't seem to be due to a technical reason, but rather > > because nobody has written a list of the functions in the docstring of > > the module. > > Is it a good idea to use this? Mixing namespaces would completely confuse > me. > > >>> for f in dir(numpy.matlib): > ... try: > ... if getattr(numpy.matlib, f).__module__ in ['numpy.matlib', 'numpy.matrixlib.defmatrix']: print f > ... except: pass > ... > asmatrix > bmat > empty > eye > identity > mat > matrix > ones > rand > randn > repmat > zeros Looks good to me. Did you plan to put this somewhere (PR, doc wiki)? Ralf -------------- next part -------------- An HTML attachment was scrubbed...
URL: From josef.pktd at gmail.com Mon Apr 23 14:42:58 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Apr 2012 14:42:58 -0400 Subject: [Numpy-discussion] documentation bug: Matrix library page not populated In-Reply-To: References: <4F8F009F.6050006@gmail.com> Message-ID: On Mon, Apr 23, 2012 at 2:05 PM, Ralf Gommers wrote: > > > On Thu, Apr 19, 2012 at 3:12 AM, wrote: >> >> On Wed, Apr 18, 2012 at 4:14 PM, Pauli Virtanen wrote: >> > Hi, >> > >> > 18.04.2012 19:57, Alan G Isaac wrote: >> >> >> >> http://docs.scipy.org/doc/numpy/reference/routines.matlib.html#module-numpy.matlib >> >> promises a list of functions that does not appear (at the moment, >> >> anyway). >> > >> > This doesn't seem to be due to a technical reason, but rather >> > because nobody has written a list of the functions in the docstring of >> > the module. >> >> Is it a good idea to use this? Mixing namespaces would completely confuse >> me. >> >> >>> for f in dir(numpy.matlib): >> ...     try: >> ...         if getattr(numpy.matlib, f).__module__ in ['numpy.matlib', >> 'numpy.matrixlib.defmatrix']: print f >> ...     except: pass >> ... >> asmatrix >> bmat >> empty >> eye >> identity >> mat >> matrix >> ones >> rand >> randn >> repmat >> zeros > > > Looks good to me. Did you plan to put this somewhere (PR, doc wiki)? I was hoping it isn't me that struggles with rst http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.matlib.rst/ (Since we are not voting based on number of PRs, I prefer the doc wiki. Instant feedback. :) Josef > > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Mon Apr 23 15:33:27 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 23 Apr 2012 20:33:27 +0100 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 1:04 AM, Charles R Harris wrote: > > > On Sun, Apr 22, 2012 at 4:15 PM, Nathaniel Smith wrote: >> >> If you hang around big FOSS projects, you'll see the word "consensus" >> come up a lot. For example, the glibc steering committee recently >> dissolved itself in favor of governance "directly by the consensus of >> the people active in glibc development"[1]. It's the governing rule of >> the IETF, which defines many of the most important internet >> standards[2]. It is the "primary way decisions are made on >> Wikipedia"[3]. It's "one of the fundamental aspects of accomplishing >> things within the Apache framework"[4]. >> >> [1] https://lwn.net/Articles/488778/ >> [2] https://www.ietf.org/tao.html#getting.things.done >> [3] https://en.wikipedia.org/wiki/Wikipedia:Consensus >> [4] https://www.apache.org/foundation/voting.html >> >> But it turns out that this "consensus" thing is actually somewhat >> mysterious, and one that most programmers immersed in this culture >> pick up by osmosis. And numpy in particular has a lot of developers >> who are not coming from a classic FOSS programmer background! So this >> is my personal attempt to articulate what it is, and why requiring >> consensus is probably the best possible approach to project decision >> making. >> >> So what is "consensus"? Like, voting or something? >> ----------------------------------------------------- >> >> This is surprisingly subtle and specific. >> >> "Consensus" means something like, "everyone who cares is satisfied >> with the result".
>> >> It does *not* mean >> * Every opinion counts equally >> * We vote on anything >> * Every solution must be perfect and flawless >> * Every solution must leave everyone overjoyed >> * Everyone must sign off on every solution. >> >> It *does* mean >> * We invite people to speak up >> * We generally trust individuals to decide how important their opinion is >> * We generally trust individuals to decide whether or not they can >> live with some outcome >> * If they can't, then we take the time to find something better. >> >> One simple way of stating this is, everyone has a veto. In practice, >> such vetoes are almost never used, so this rule is not particularly >> illuminating on its own. Hence, the rest of this document. >> >> What a waste of time! That all sounds very pretty on paper, but we >> have stuff to get done. >> >> ----------------------------------------------------------------------------------- >> >> First, I'll note that this seemingly utopian scheme has a track record >> of producing such impractical systems as TCP/IP, SMTP, DNS, Apache, >> GCC, Linux, Samba, Python, ... >> > > Linux is Linus' private tree. Everything that goes in is his decision, > everything that stays out is his decision. Of course, he delegates much of > the work to people he trusts, but it doesn't even reach the level of a BDFL, > it's DFL. As for consensus, it basically comes down to convincing the > gatekeepers one level below Linus that your code might be useful. So bad > example. Same with TCP/IP, which was basically Kahn and Cerf consulting with > a few others and working by request of DARPA. GCC was Richard Stallman (I > got one of the first tapes for a $30 donation), Python was Guido. Some of > the projects later developed some form of governance but Guido, for > instance, can veto anything he dislikes even if he is disinclined to do so. > I'm not saying you're wrong about open source, I'm just saying that that > each project differs and it is wrong to imply that they follow some common > form of governance under the rubric FOSS and that they all seek consensus. > And they certainly don't *start* that way. And there are also plenty of > projects that fail when the prime mover loses interest or folks get tired of > the politics. So a few points here: Consensus-based decision-making is an ideal and a guide, not an algorithm. There's nothing at all inconsistent between having a BDFL and using consensus as the primary guide for decision making -- it just means that the BDFL chooses to exercise their power in that way, and is generally trusted to make judgement calls about specific cases. See Fernando's reply down-thread for an example of this. And I'm not saying that all FOSS projects follow some common form of governance. But I am saying that there's a substantial amount of shared development culture across most successful FOSS projects, and a ton of experience on how to run a project successfully. Project management is a difficult and arcane skill set, and one that's hard to learn except through apprenticeship and osmosis. And it's definitely not included in most courses on programming for scientists! So it'd be nice if numpy could avoid having to re-make some of these mistakes... But the other effect of this being cultural values rather than something explicit and articulated is that sometimes you can't see it from the outside. For example: Linux: Technically, everything you say is true. 
In practice, good luck convincing Linus or a subsystem maintainer to accept your patch when other people are raising substantive complaints. Here's an email I googled up in a few moments, in which Linus yells at people for trying to submit a patch to him without making sure that all interested parties have agreed: https://lkml.org/lkml/2009/9/14/481 Stuff regularly sits outside the kernel tree in limbo for *years* while people debate different approaches back and forth. Of course the kernel development process is far more complicated than I can capture with a bit of amateur anthropology here, but think about this: why do all these multinational companies *care* what some guy named Linus puts in his tree? I don't think they're impressed by how his name sounds similar to "Linux". But, his trees consistently do well enough at the things they care about that they stick around, i.e., empirically, he's achieving reasonable consensus. And when that fails, like, say, with the Android fork, then you can see what a mess results. (This is the "you *will* achieve consensus sooner or later" part.) GCC: I just asked my friend Zack Weinberg about this via IM -- among other things, he wrote the current C preprocessor, and used to be one of the dozen people who had blanket write access to the GCC repo. His response was that yes, GCC was originally run by RMS as dictator, and then Richard Kenner as dictator, and then, "you remember that EGCS fork that happened back in the nineties? That was because Kenner didn't scale, and people wanted a more consensus-based process. And that was so successful that it became the official branch". He also pointed out that "the way things actually get done on a day-to-day basis in GCC can look an awful lot like "committers do what they want" if you only read the mailing list casually, but that's because everyone with blanket commit rights is trusted to not fuck up". TCP/IP: I'm not exactly privy to how Kahn and Cerf worked, but (1) they didn't exactly design it in a vacuum and impose it by fiat -- in fact, it was originally like a series of academic articles developed over 5+ years, wasn't it? (2) we're not using their TCP/IP, either. That stopped working in the mid-80s: https://en.wikipedia.org/wiki/Network_congestion#History TCP/IP has been under IETF stewardship for a *long* time. And in any case, nothing about consensus-based decision making rules out having a few geniuses produce some beautiful design. It's about how you recognize when that has happened. I think you get the point. I don't think there are any examples of Guido saying "hey, I like this feature, so I'm going to put it into the next Python release, and then see whether people like it or not and decide what to do with it then". That would be really shocking, actually. [...] >> So mainly what I'm saying we should do is: >> 1. Make it as easy as possible for people to see what's going on and >> join the discussion. All decisions and reasoning behind decisions take >> place in public. (On this note, it would be *really* good if pull >> request notifications went to the list.) >> 2. If someone raises a substantive objection, take that seriously. >> 3. If someone says "no, this is just not going to work for me, >> because... ", then it can't go in. >> > > What happens when someone wants to spend all their time talking about > process? It can get kind of old. Yeah, such discussions can certainly be exhausting. 
Anyway, the answer is in the bit about obstructive people down below -- if you think someone's behaving in a way that's destructive to the project, then I'd suggest gathering evidence, making a case, etc.: http://producingoss.com/en/difficult-people.html "You may not persuade the person in question, but that's okay as long as you persuade everyone else." [...] > It seems top heavy for an organization that has maybe three people working > on Numpy C code in their spare time. I think the ideal here would be for > someone to produce their own version as a working example and then we could > discuss merging code, and also have something to play with. I'm not sure what in the above list is "top heavy" -- can you elaborate? If three people are all we have to support many thousands of users, then to me those rules seem like a good way for them to get feedback and avoid wasting limited resources. And perhaps they'll help get more people involved. (This is a point Zack made to me too: "I wanted to get involved with GCC for some time before I actually could, and EGCS was what made it possible".) -- Nathaniel From ralf.gommers at googlemail.com Mon Apr 23 15:42:52 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 21:42:52 +0200 Subject: [Numpy-discussion] documentation bug: Matrix library page not populated In-Reply-To: References: <4F8F009F.6050006@gmail.com> Message-ID: On Mon, Apr 23, 2012 at 8:42 PM, wrote: > On Mon, Apr 23, 2012 at 2:05 PM, Ralf Gommers > wrote: > > > > > > On Thu, Apr 19, 2012 at 3:12 AM, wrote: > >> > >> On Wed, Apr 18, 2012 at 4:14 PM, Pauli Virtanen wrote: > >> > Hi, > >> > > >> > 18.04.2012 19:57, Alan G Isaac kirjoitti: > >> >> > >> >> > http://docs.scipy.org/doc/numpy/reference/routines.matlib.html#module-numpy.matlib > >> >> promises a list of functions that does not appear (at the moment, > >> >> anyway). > >> > > >> > This doesn't seem to be due to a technical reason, but rather than > >> > because nobody has written a list of the functions in the docstring of > >> > the module. > >> > >> Is it a good idea to use this? Mixing namespaces would completely > confuse > >> me. > >> > >> >>> for f in dir(numpy.matlib): > >> ... try: > >> ... if getattr(numpy.matlib, f).__module__ in ['numpy.matlib', > >> 'numpy.matrixlib.defmatrix']: print f > >> ... except: pass > >> ... > >> asmatrix > >> bmat > >> empty > >> eye > >> identity > >> mat > >> matrix > >> ones > >> rand > >> randn > >> repmat > >> zeros > > > > > > Looks good to me. Did you plan to put this somewhere (PR, doc wiki)? > > I was hoping it isn't me that struggles with rst > > http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.matlib.rst/ > > (Since we are not voting based on number of PRs, I prefer the doc > wiki. Instant feedback. :) > Great, thanks. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Apr 23 15:48:55 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 23 Apr 2012 12:48:55 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: Hi, On Mon, Apr 23, 2012 at 12:33 PM, Nathaniel Smith wrote: > On Mon, Apr 23, 2012 at 1:04 AM, Charles R Harris > wrote: >> Linux is Linus' private tree. Everything that goes in is his decision, >> everything that stays out is his decision. Of course, he delegates much of >> the work to people he trusts, but it doesn't even reach the level of a BDFL, >> it's DFL. 
As for consensus, it basically comes down to convincing the >> gatekeepers one level below Linus that your code might be useful. So bad >> example. Same with TCP/IP, which was basically Kahn and Cerf consulting with >> a few others and working by request of DARPA. GCC was Richard Stallman (I >> got one of the first tapes for a $30 donation), Python was Guido. Some of >> the projects later developed some form of governance but Guido, for >> instance, can veto anything he dislikes even if he is disinclined to do so. >> I'm not saying you're wrong about open source, I'm just saying that that >> each project differs and it is wrong to imply that they follow some common >> form of governance under the rubric FOSS and that they all seek consensus. >> And they certainly don't *start* that way. And there are also plenty of >> projects that fail when the prime mover loses interest or folks get tired of >> the politics. [snip] > Linux: Technically, everything you say is true. In practice, good luck > convincing Linus or a subsystem maintainer to accept your patch when > other people are raising substantive complaints. Here's an email I > googled up in a few moments, in which Linus yells at people for trying > to submit a patch to him without making sure that all interested > parties have agreed: > https://lkml.org/lkml/2009/9/14/481 > Stuff regularly sits outside the kernel tree in limbo for *years* > while people debate different approaches back and forth. To which I'd add: "In fact, for [Linus'] decisions to be received as legitimate, they have to be consistent with the consensus of the opinions of participating developers as manifest on Linux mailing lists. It is not unusual for him to back down from a decision under the pressure of criticism from other developers. His position is based on the recognition of his fitness by the community of Linux developers and this type of authority is, therefore, constantly subject to withdrawal. His role is not that of a boss or a manager in the usual sense. In the final analysis, the direction of the project springs from the cumulative synthesis of modifications contributed by individual developers." http://shareable.net/blog/governance-of-open-source-george-dafermos-interview See you, Matthew From njs at pobox.com Mon Apr 23 15:57:10 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 23 Apr 2012 20:57:10 +0100 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 6:18 PM, Ralf Gommers wrote: > > > On Mon, Apr 23, 2012 at 12:15 AM, Nathaniel Smith wrote: >> >> We need to decide what to do with the NA masking code currently in >> master, vis-a-vis the 1.7 release. While this code is great at what it >> is, we don't actually have consensus yet that it's the best way to >> give our users what they want/need -- or even an appropriate way. So >> we need to figure out how to release 1.7 without committing ourselves >> to supporting this design in the future. >> >> Background: what does the code currently in master do? >> -------------------------------------------- >> >> It adds 3 pointers at the end of the PyArrayObject struct (which is >> better known as the numpy.ndarray object). These new struct members, >> and some accessors for them, are exposed as part of the public API. >> There are also a few additions to the Python-level API (mask= argument >> to np.array, skipna= argument to ufuncs, etc.) >> >> What does this mean for compatibility?
>> ------------------------------------------------ >> >> The change in the ndarray struct is not as problematic as it might >> seem, compatibility-wise, since Python objects are almost always >> referred to by pointers. Since the initial part of the struct will >> continue to have the same memory layout, existing source and binary >> code that works with PyArrayObject *pointers* will continue to work >> unchanged. >> >> One place where the actual struct size matters is for any C-level >> ndarray subclasses, which will have their memory layout change, and >> thus will need to be recompiled. (Python-level ndarray subclasses will >> have their memory layout change as well -- e.g., they will have >> different __dictoffset__ values -- but it's unlikely that any existing >> Python code depends on such details.) >> >> What if we want to change our minds later? >> ------------------------------------------------------- >> >> For the same reasons as given above, any new code which avoids >> referencing the new struct fields referring to masks, or using the new >> masking APIs, will continue to work even if the masking is later >> removed. >> >> Any new code which *does* refer to the new masking APIs, or references >> the fields directly, will break if masking is later removed. >> Specifically, source will fail to compile, and existing binaries will >> silently access memory that is past the end of the PyArrayObject >> struct, which will have unpredictable consequences. (Most likely >> segfaults, but no guarantees.) This applies even to code which simply >> tries to check whether a mask is present. >> >> So I think the preconditions for leaving this code as-is for 1.7 are >> that we must agree: >> * We are willing to require a recompile of any C-level ndarray >> subclasses (do any exist?) > > As long as it's only subclasses I think this may be OK. Not 100% sure on > this one though. > >> >> * We are willing to make absolutely no guarantees about future >> compatibility for code which uses APIs marked "experimental" > > That is what I understand "experimental" to mean. Could stay, could change - > no guarantees. Earlier you said it meant "some changes are to be expected, but not complete removal", which seems different from "absolutely no guarantees": http://www.mail-archive.com/numpy-discussion at scipy.org/msg36833.html So I just wanted to double-check whether you're revising that earlier opinion, or...? >> * We are willing for this breakage to occur in the form of random >> segfaults > > This is not OK of course. But it shouldn't apply to the Python API, which I > think is the most important one here. Right, this part is specifically about ABI compatibility, not API compatibility -- segfaults would only occur for extension libraries that were compiled against one version of numpy and then used with a different version. - N From chris.barker at noaa.gov Mon Apr 23 16:16:08 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 23 Apr 2012 13:16:08 -0700 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 12:57 PM, Nathaniel Smith wrote: > Right, this part is specifically about ABI compatibility, not API > compatibility -- segfaults would only occur for extension libraries > that were compiled against one version of numpy and then used with a > different version. Which makes me think -- the ABI will have changed by adding three new pointers to the end of the main struct, yes?
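To make that concrete for myself, here's a toy sketch (pure ctypes, and the field names are made up for illustration -- they are *not* the real PyArrayObject members) of what appending members to a struct does and doesn't change:

import ctypes

class OldStyleArray(ctypes.Structure):
    # stand-in for the existing struct layout
    _fields_ = [("data", ctypes.c_void_p),
                ("nd", ctypes.c_int)]

class NewStyleArray(ctypes.Structure):
    # the same layout with three pointers appended at the end
    _fields_ = OldStyleArray._fields_ + [("extra_ptr1", ctypes.c_void_p),
                                         ("extra_ptr2", ctypes.c_void_p),
                                         ("extra_ptr3", ctypes.c_void_p)]

# offsets of the pre-existing members don't move, so code that only
# touches them through a pointer keeps working:
assert NewStyleArray.data.offset == OldStyleArray.data.offset
assert NewStyleArray.nd.offset == OldStyleArray.nd.offset

# ...but the total size grows, which is what bites anything that
# statically embeds or subclasses the struct itself:
print ctypes.sizeof(OldStyleArray), ctypes.sizeof(NewStyleArray)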
Of the options on the table, do any of the others involve adding three new pointers? What I'm getting at is that while the API and semantics may change with a different NA system -- maybe the ABI won't change as much (even if those pointers mean something different, but the size of the struct could be constant). Or is this just a fantasy? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov From njs at pobox.com Mon Apr 23 16:24:16 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 23 Apr 2012 21:24:16 +0100 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 9:16 PM, Chris Barker wrote: > On Mon, Apr 23, 2012 at 12:57 PM, Nathaniel Smith wrote: >> Right, this part is specifically about ABI compatibility, not API >> compatibility -- segfaults would only occur for extension libraries >> that were compiled against one version of numpy and then used with a >> different version. > > Which makes me think -- the ABI will have changed by adding three new > pointers to the end of the main struct, yes? No, re-read the original message :-). AFAICT the only place that *adding* the pointers will break backwards ABI compatibility is for C subclasses of ndarray, and it's not clear if any exist. > Of the options on the table, do any of the others involve adding three > new pointers? What I'm getting at is that while the API and semantics > may change with a different NA system -- maybe the ABI won't change as > much (even if those pointers mean something different, but the size of > the struct could be constant). If the size of the struct stays the same but the meaning of the pointers changes, then that's probably not going to lead to any good results for code which tries to manipulate those pointers using the wrong semantics. Usually ABI changes are strictly greater than API changes... though of course it depends on how big exactly the change is. -- Nathaniel From ralf.gommers at googlemail.com Mon Apr 23 16:31:38 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 23 Apr 2012 22:31:38 +0200 Subject: [Numpy-discussion] NEP mask code and the 1.7 release In-Reply-To: References: Message-ID: On Mon, Apr 23, 2012 at 9:57 PM, Nathaniel Smith wrote: > On Mon, Apr 23, 2012 at 6:18 PM, Ralf Gommers > wrote: > > > > > > On Mon, Apr 23, 2012 at 12:15 AM, Nathaniel Smith wrote: > >> > >> We need to decide what to do with the NA masking code currently in > >> master, vis-a-vis the 1.7 release. While this code is great at what it > >> is, we don't actually have consensus yet that it's the best way to > >> give our users what they want/need -- or even an appropriate way. So > >> we need to figure out how to release 1.7 without committing ourselves > >> to supporting this design in the future. > >> > >> Background: what does the code currently in master do? > >> -------------------------------------------- > >> > >> It adds 3 pointers at the end of the PyArrayObject struct (which is > >> better known as the numpy.ndarray object). These new struct members, > >> and some accessors for them, are exposed as part of the public API. > >> There are also a few additions to the Python-level API (mask= argument > >> to np.array, skipna= argument to ufuncs, etc.) > >> > >> What does this mean for compatibility?
> >> ------------------------------------------------ > >> > >> The change in the ndarray struct is not as problematic as it might > >> seem, compatibility-wise, since Python objects are almost always > >> referred to by pointers. Since the initial part of the struct will > >> continue to have the same memory layout, existing source and binary > >> code that works with PyArrayObject *pointers* will continue to work > >> unchanged. > >> > >> One place where the actual struct size matters is for any C-level > >> ndarray subclasses, which will have their memory layout change, and > >> thus will need to be recompiled. (Python-level ndarray subclasses will > >> have their memory layout change as well -- e.g., they will have > >> different __dictoffset__ values -- but it's unlikely that any existing > >> Python code depends on such details.) > >> > >> What if we want to change our minds later? > >> ------------------------------------------------------- > >> > >> For the same reasons as given above, any new code which avoids > >> referencing the new struct fields referring to masks, or using the new > >> masking APIs, will continue to work even if the masking is later > >> removed. > >> > >> Any new code which *does* refer to the new masking APIs, or references > >> the fields directly, will break if masking is later removed. > >> Specifically, source will fail to compile, and existing binaries will > >> silently access memory that is past the end of the PyArrayObject > >> struct, which will have unpredictable consequences. (Most likely > >> segfaults, but no guarantees.) This applies even to code which simply > >> tries to check whether a mask is present. > >> > >> So I think the preconditions for leaving this code as-is for 1.7 are > >> that we must agree: > >> * We are willing to require a recompile of any C-level ndarray > >> subclasses (do any exist?) > > > > > > As long as it's only subclasses I think this may be OK. Not 100% sure on > > this one though. > > > >> > >> * We are willing to make absolutely no guarantees about future > >> compatibility for code which uses APIs marked "experimental" > > > > > > That is what I understand "experimental" to mean. Could stay, could > change - > > no guarantees. > > Earlier you said it meant "some changes are to be expected, but not > complete removal", which seems different from "absolutely no > guarantees": > http://www.mail-archive.com/numpy-discussion at scipy.org/msg36833.html > So I just wanted to double-check whether you're revising that earlier > opinion, or...? > Stay and change are both not the same as complete removal. But to spell it out: if we release a feature, I expect it to stay in some form. That still means we can change APIs (i.e. no compatibility for code written against the old API), but not removing the concept itself. If we're not even sure that the concept should stay, why bother releasing it as experimental? Experimental is for finding out what works well, not for whether or not we need some concept at all. > >> * We are willing for this breakage to occur in the form of random > >> segfaults > > > > > > This is not OK of course. But it shouldn't apply to the Python API, > which I > > think is the most important one here. > > Right, this part is specifically about ABI compatibility, not API > compatibility -- segfaults would only occur for extension libraries > that were compiled against one version of numpy and then used with a > different version. That's what I suspected, but not what your earlier email said. 
I understood your email as talking only about segfaults for code using the new NA C API. Breaking ABI compatibility is a no-go. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 23 16:40:04 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 23 Apr 2012 21:40:04 +0100 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x In-Reply-To: References: Message-ID: Hi Paul, On Wed, Apr 11, 2012 at 8:57 PM, Paul Hobson wrote: > Travis et al, > > This isn't a reply to anything specific in your email and I apologize > if there is a better thread or place to share this information. I've > been meaning to participate in the discussion for a long time and > never got around to it. The main thing I'd like to do is convey my > typical use of the numpy.ma module as an environmental engineer > analyzing censored datasets --contaminant concentrations that are > either at well understood values (not masked) or some unknown value > below an upper bound (masked). > > My basic understanding is that this discussion revolved around how to > treat masked data (ignored vs missing) and how to implement one, both, > or some middle ground between those two concepts. If I'm off-base, > just ignore all of the following. > > For my purposes, numpy.ma is implemented in a way very well suited to > my needs. Here's a gist of something that was *really* hard for me > before I discovered numpy.ma and numpy in general. (this is a bit > much, see below for the highlights) > https://gist.github.com/2361814 > > The main message here is that I include the upper bounds of the > unknown values (detection limits) in my array and use that to > statistically estimate their values. I must be able to retrieve the > masked detection limits throughout this process. Additionally the > masks as currently implemented allow me to sort first the undetected > values, then the detected values (see __rosRanks in the gist). > > As a boots-on-the-ground user of numpy, I'm ecstatic that this tool > exists. I'm also pretty flexible and don't anticipate any major snags > in my work if things change dramatically as the masked/missing/ignored > functionality evolves. > > Thanks to everyone for the hard work and great tools, > -Paul Hobson Thanks for this note -- it's getting feedback from people on how they're actually using numpy.ma is *very* helpful, because there's a lot more data out there on the "missing data" use case. But, I couldn't quite figure out what you're actually doing in this code. It looks like the measurements that you're masking out have some values "hidden behind" the mask, which you then make use of? Unfortunately, I don't know anything about environmental engineering or the method of Hirsch and Stedinger (1987). Could you elaborate a bit on what these masked values mean and how you process them? -- Nathaniel From pmhobson at gmail.com Mon Apr 23 17:57:56 2012 From: pmhobson at gmail.com (Paul Hobson) Date: Mon, 23 Apr 2012 14:57:56 -0700 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x In-Reply-To: References: Message-ID: Nathan, Apologies for not being clear. My interaction with these lists is unfortunately constrained to lunch breaks and times when my code is running. :-/ So the masked values are the detection limits. In other words, the lab said that they can't see the chemical in the sample, but it may be present at a concentration below what their machines can see, i.e. the detection limit.
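In numpy.ma terms, the core of what I do boils down to something like this (a stripped-down sketch, not the actual code from the gist -- the real version goes on to estimate the non-detects with the Hirsch and Stedinger regression-on-order-statistics step):

import numpy as np
import numpy.ma as ma

# reported concentrations; non-detects carry their detection limit
res = ma.masked_array([5.1, 6.3, 4.5, 3.0, 10.2, 1.0],
                      mask=[False, False, True, True, False, True])

detected = res.compressed()       # the well-understood values
det_limits = res.data[res.mask]   # retrieve the masked detection limits

# group the non-detects ahead of the detected values for ranking
order = np.lexsort((res.data, ~res.mask))
res_sorted = res[order]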
Simply put, I receive data that looks like this:

Sample # - Result (mg/kg)
1 - 5.1
2 - 6.3
3 - <4.5
4 - <3.0
5 - 10.2
6 - <1.0
etc...

So when I parse the data, I mask the results that are non-detect (less than the reported value) so that I can set them aside and sort the data my way -- detected values descending stacked on top of non-detect values descending. I then use the statistical distribution of the detected values to model the non-detect values. Again, since we don't know what they are, this is just a best guess. Some people just use the detection limit, others half of the detection limit. A picture is worth 1000 words, right? The attached plot shows what my code does to my data. Hopefully this demonstrates the value of my method over using the detection limits as received from the lab. The best way I can explain it is that I don't use the mask to say, "I don't know this" or "I don't want to see this". Instead I use it to classify data within a single set into two types of data. For simplicity's sake, we can call them "known" data and "upper bounded" data. In order to do this with numpy.ma, I must be able to retrieve the masked values (x.data[x.mask]). I'm sure that if numpy.ma went away forever, I could work around this. However the current implementation of numpy.ma works very well for me as-is. Please don't hesitate to ask for further clarification if I glossed over any details. I've been working in this field since I was a 19-yo intern, so I'm undoubtedly taking things for granted. -paul On Mon, Apr 23, 2012 at 1:40 PM, Nathaniel Smith wrote: > Hi Paul, > > On Wed, Apr 11, 2012 at 8:57 PM, Paul Hobson wrote: >> Travis et al, >> >> This isn't a reply to anything specific in your email and I apologize >> if there is a better thread or place to share this information. I've >> been meaning to participate in the discussion for a long time and >> never got around to it. The main thing I'd like to do is convey my >> typical use of the numpy.ma module as an environmental engineer >> analyzing censored datasets --contaminant concentrations that are >> either at well understood values (not masked) or some unknown value >> below an upper bound (masked). >> >> My basic understanding is that this discussion revolved around how to >> treat masked data (ignored vs missing) and how to implement one, both, >> or some middle ground between those two concepts. If I'm off-base, >> just ignore all of the following. >> >> For my purposes, numpy.ma is implemented in a way very well suited to >> my needs. Here's a gist of something that was *really* hard for me >> before I discovered numpy.ma and numpy in general. (this is a bit >> much, see below for the highlights) >> https://gist.github.com/2361814 >> >> The main message here is that I include the upper bounds of the >> unknown values (detection limits) in my array and use that to >> statistically estimate their values. I must be able to retrieve the >> masked detection limits throughout this process. Additionally the >> masks as currently implemented allow me to sort first the undetected >> values, then the detected values (see __rosRanks in the gist). >> >> As a boots-on-the-ground user of numpy, I'm ecstatic that this tool >> exists. I'm also pretty flexible and don't anticipate any major snags >> in my work if things change dramatically as the masked/missing/ignored >> functionality evolves.
>> >> Thanks to everyone for the hard work and great tools, >> -Paul Hobson > > Thanks for this note -- it's getting feedback from people on how > they're actually using numpy.ma is *very* helpful, because there's a > lot more data out there on the "missing data" use case. > > But, I couldn't quite figure out what you're actually doing in this > code. It looks like the measurements that you're masking out have some > values "hidden behind" the mask, which you then make use of? > Unfortunately, I don't know anything about environmental engineering > or the method of Hirsch and Stedinger (1987). Could you elaborate a > bit on what these masked values mean and how you process them? > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: test_plot.png Type: image/png Size: 61872 bytes Desc: not available URL: From travis at continuum.io Mon Apr 23 18:08:38 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 23 Apr 2012 17:08:38 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> > >> Linux: Technically, everything you say is true. In practice, good luck >> convincing Linus or a subsystem maintainer to accept your patch when >> other people are raising substantive complaints. Here's an email I >> googled up in a few moments, in which Linus yells at people for trying >> to submit a patch to him without making sure that all interested >> parties have agreed: >> https://lkml.org/lkml/2009/9/14/481 >> Stuff regularly sits outside the kernel tree in limbo for *years* >> while people debate different approaches back and forth. > > To which I'd add: > > "In fact, for [Linus'] decisions to be received as legitimate, they > have to be consistent with the consensus of the opinions of > participating developers as manifest on Linux mailing lists. It is not > unusual for him to back down from a decision under the pressure of > criticism from other developers. His position is based on the > recognition of his fitness by the community of Linux developers and > this type of authority is, therefore, constantly subject to > withdrawal. His role is not that of a boss or a manager in the usual > sense. In the final analysis, the direction of the project springs > from the cumulative synthesis of modifications contributed by > individual developers." > http://shareable.net/blog/governance-of-open-source-george-dafermos-interview > This is the model that I have for NumPy development. It is my view of how NumPy has evolved already and how Numarray, and Numeric evolved before it as well. I also feel like these things are fundamentally determined by the people involved and by the personalities and styles of those who participate. There certainly are globally applicable principles (like code review, building consensus, and mutual respect) that are worth emphasizing over and over again. If it helps let's write those down and say "these are the principles we live by". I am suspicious that you can go beyond this in formalizing the process as you ultimately are at the mercy of the people involved and their judgment, anyway. 
I can also see that for the benefit of newcomers and occasional contributors it can be beneficial to have some documentation of the natural, emergent methods and interactions that apply to cooperative software development. But, I would hesitate to put some kind of aura of authority around such a document that implies the processes cannot be violated if good judgment demands that they should be. That is the basis of my hesitation to spend much time on "officially documenting our process". Right now we are trying to balance difficult things: stable releases with experimental development. The fact that we had such differences of opinion last year on masked arrays / missing values and how to incorporate them into a common object model means that we should not have committed the code to master until we figured out a way to reconcile Nathaniel's concerns. That is my current view. I was very enthused that we had someone contributing large scale changes that clearly showed an ability to understand the code and contribute to it --- that hadn't happened in a while. I wanted to encourage that. I still do. I think the process itself has shown that you can have an impact on NumPy just by voicing your opinion. Clearly, you have more of an effect on NumPy by submitting pull requests, but NumPy development does listen carefully to the voices of users. Best, -Travis > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Mon Apr 23 18:11:36 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 23 Apr 2012 17:11:36 -0500 Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x In-Reply-To: References: Message-ID: Thank you very much for contributing this description. It is very helpful to see how people use numpy.ma in the wild. -Travis On Apr 11, 2012, at 2:57 PM, Paul Hobson wrote: > Travis et al, > > This isn't a reply to anything specific in your email and I apologize > if there is a better thread or place to share this information. I've > been meaning to participate in the discussion for a long time and > never got around to it. The main thing I'd like to do is convey my > typical use of the numpy.ma module as an environmental engineer > analyzing censored datasets --contaminant concentrations that are > either at well understood values (not masked) or some unknown value > below an upper bound (masked). > > My basic understanding is that this discussion revolved around how to > treat masked data (ignored vs missing) and how to implement one, both, > or some middle ground between those two concepts. If I'm off-base, > just ignore all of the following. > > For my purposes, numpy.ma is implemented in a way very well suited to > my needs. Here's a gist of something that was *really* hard for me > before I discovered numpy.ma and numpy in general. (this is a bit > much, see below for the highlights) > https://gist.github.com/2361814 > > The main message here is that I include the upper bounds of the > unknown values (detection limits) in my array and use that to > statistically estimate their values. I must be able to retrieve the > masked detection limits throughout this process. Additionally the > masks as currently implemented allow me to sort first the undetected > values, then the detected values (see __rosRanks in the gist). > > As a boots-on-the-ground user of numpy, I'm ecstatic that this tool > exists.
I'm also pretty flexible and don't anticipate any major snags > in my work if things change dramatically as the masked/missing/ignored > functionality evolves. > > Thanks to everyone for the hard work and great tools, > -Paul Hobson > > On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant wrote: >> Hey all, >> >> I've been waiting for Mark Wiebe to arrive in Austin where he will spend several weeks, but I also know that masked arrays will be only one of the things he and I are hoping to make head-way on while he is in Austin. Nevertheless, we need to make progress on the masked array discussion and if we want to finalize the masked array implementation we will need to finish the design. >> >> I've caught up on most of the discussion including Mark's NEP, Nathaniel's NEP and other writings and the very-nice mailing list discussion that included a somewhat detailed discussion on the algebra of IGNORED. I think there are some things still to be decided. However, I think some things are pretty clear: >> >> 1) Masked arrays are going to be fundamental in NumPy and these should replace most people's use of numpy.ma. The numpy.ma code will remain as a compatibility layer >> >> 2) The reality of #1 and NumPy's general philosophy to date means that masked arrays in NumPy should support the common use-cases of masked arrays (including getting and setting of the mask from the Python and C-layers). However, the semantic of what the mask implies may change from what numpy.ma uses to having a True value meaning selected. >> >> 3) There will be missing-data dtypes in NumPy. Likely only a limited sub-set (string, bytes, int64, int32, float32, float64, complex64, complex32, and object) with an API that allows more to be defined if desired. These will most likely use Mark's nice machinery for managing the calculation structure without requiring new C-level loops to be defined. >> >> 4) I'm still not sure about whether the IGNORED concept is necessary or not. I really like the separation that was emphasized between implementation (masks versus bit-patterns) and operations (propagating versus non-propagating). Pauli even created another dimension which I don't totally grok and therefore can't remember. Pauli? Do you still feel that is a necessary construction? But, do we need the IGNORED concept to indicate what amounts to different default key-word arguments to functions that operate on NumPy arrays containing missing data (however that is represented)? My current weak view is that it is not really necessary. But, I could be convinced otherwise. >> >> I think the good news is that given Mark's hard-work and Nathaniel's follow-up we are really quite far along. I would love to get Nathaniel's opinion about what remains un-done in the current NumPy code-base. I would also appreciate knowing (from anyone with an interest) opinions of items 1-4 above and anything else I've left out.
>> >> Thanks, >> >> -Travis >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Mon Apr 23 18:17:19 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 23 Apr 2012 15:17:19 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: Hi, On Mon, Apr 23, 2012 at 3:08 PM, Travis Oliphant wrote: >> >>> Linux: Technically, everything you say is true. In practice, good luck >>> convincing Linus or a subsystem maintainer to accept your patch when >>> other people are raising substantive complaints. Here's an email I >>> googled up in a few moments, in which Linus yells at people for trying >>> to submit a patch to him without making sure that all interested >>> parties have agreed: >>> https://lkml.org/lkml/2009/9/14/481 >>> Stuff regularly sits outside the kernel tree in limbo for *years* >>> while people debate different approaches back and forth. >> >> To which I'd add: >> >> "In fact, for [Linus'] decisions to be received as legitimate, they >> have to be consistent with the consensus of the opinions of >> participating developers as manifest on Linux mailing lists. It is not >> unusual for him to back down from a decision under the pressure of >> criticism from other developers. His position is based on the >> recognition of his fitness by the community of Linux developers and >> this type of authority is, therefore, constantly subject to >> withdrawal. His role is not that of a boss or a manager in the usual >> sense. In the final analysis, the direction of the project springs >> from the cumulative synthesis of modifications contributed by >> individual developers." >> http://shareable.net/blog/governance-of-open-source-george-dafermos-interview >> > > This is the model that I have for NumPy development. It is my view of how NumPy has evolved already and how Numarray, and Numeric evolved before it as well. I also feel like these things are fundamentally determined by the people involved and by the personalities and styles of those who participate. There certainly are globally applicable principles (like code review, building consensus, and mutual respect) that are worth emphasizing over and over again. If it helps let's write those down and say "these are the principles we live by". I am suspicious that you can go beyond this in formalizing the process as you ultimately are at the mercy of the people involved and their judgment, anyway. I think writing it down would help enormously. For example, if you do agree to Nathaniel's view of consensus - *in principle* - and we write that down and agree, we have a document to appeal to when we next run into trouble. Maybe the document could say something like: """ We strive for consensus [some refs here]. Any substantial new feature is subject to consensus. Only if all avenues for consensus have been documented, and exhausted, will we [vote, defer to Travis, or some other tie-breaking thing].
""" Best, Matthew From chris.barker at noaa.gov Mon Apr 23 18:46:18 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 23 Apr 2012 15:46:18 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Mon, Apr 23, 2012 at 3:08 PM, Travis Oliphant wrote: > Right now we are trying to balance difficult things: ?stable releases with experimental development. Perhaps a more formal "development release" system could help here. IIUC, numpy pretty much has two things: the latest release (and past ones) and master (and assorted experimentla branches). If someone develops a new feature, we can either: have them submit a pull request, and people with the where-with-all can pull it, compile, it, and start tesing it on their own -- hsitory shows that this is a small group. merge it with master -- and hope it gets the testing is should before it becomes part of a release, but: we are rightly heistant to put experimental stuff in master, and it really dont' get that much testing -- again only folks that are building master will even see it. Some projects have a more format "development release" system. wxPython, for instance has had for years development releases with odd numbers -- right now, the official release is 2.8.*, but there is a 2.9.* out there that is getting some use and testing. A couple of things help make this work: 1) Robin makes the effort to put out binaries for development releases -- it's easy to go get and give it a try. 2) there is the wxversion system that makes it easy to install a new versin of wx, and easily switch between them (it's actually broken on OS-X right now --- :-) ) -- this pre-dated virtualenv and friends, maybe virtualenv is enough for this now. Anyway, it's a thought -- I think some more rea-world use of new features before a real commitment to adopting them would be great. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From travis at continuum.io Mon Apr 23 19:02:29 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 23 Apr 2012 18:02:29 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: That is an excellent thought. We could make the odd numbered releases "experimental" and the even-numbered as stable. That makes some sense. What do others think? -Travis On Apr 23, 2012, at 5:46 PM, Chris Barker wrote: > On Mon, Apr 23, 2012 at 3:08 PM, Travis Oliphant wrote: >> Right now we are trying to balance difficult things: stable releases with experimental development. > > Perhaps a more formal "development release" system could help here. > IIUC, numpy pretty much has two things: the latest release (and past > ones) and master (and assorted experimentla branches). If someone > develops a new feature, we can either: > > have them submit a pull request, and people with the where-with-all > can pull it, compile, it, and start tesing it on their own -- hsitory > shows that this is a small group. 
> merge it with master -- and hope it gets the testing it should before it becomes part of a release, but: we are rightly hesitant to put experimental stuff in master, and it really doesn't get that much testing -- again, only folks that are building master will even see it.
>
> Some projects have a more formal "development release" system. wxPython, for instance, has had for years development releases with odd numbers -- right now, the official release is 2.8.*, but there is a 2.9.* out there that is getting some use and testing. A couple of things help make this work:
>
> 1) Robin makes the effort to put out binaries for development releases -- it's easy to go get and give it a try.
>
> 2) there is the wxversion system that makes it easy to install a new version of wx, and easily switch between them (it's actually broken on OS-X right now --- :-) ) -- this pre-dated virtualenv and friends, maybe virtualenv is enough for this now.
>
> Anyway, it's a thought -- I think some more real-world use of new features before a real commitment to adopting them would be great.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov

From charlesr.harris at gmail.com Mon Apr 23 19:39:43 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 23 Apr 2012 17:39:43 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

On Mon, Apr 23, 2012 at 5:02 PM, Travis Oliphant wrote:

> That is an excellent thought.
>
> We could make the odd-numbered releases "experimental" and the even-numbered ones stable.
>
> That makes some sense. What do others think?

I'm starting to think that a fork might be the best solution to the present problem. There is plenty of precedent for forks in FOSS, for example GCC, EGCS, Redhat 1.97, LLVM and emacs, xemacs. There are several semi-official forks of linux (Android, the real time kernel, etc.) Zeromq just forked, OpenOffice forked, there was XFree86 forked to Xorg, etc. Linus encourages forks, so there is even authority for that ;) Of course, the further the fork diverges from the original, the harder reintegration becomes, witness Android and wake-locks. But a fork would cure a lot of contention.

Chuck

From charlesr.harris at gmail.com Mon Apr 23 20:24:29 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 23 Apr 2012 18:24:29 -0600
Subject: [Numpy-discussion] A 1.6.2 release?
In-Reply-To:
References:
Message-ID:

On Mon, Apr 23, 2012 at 11:05 AM, Ralf Gommers wrote:

> On Mon, Apr 23, 2012 at 8:47 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:
>> On Mon, Apr 23, 2012 at 12:22 AM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>>> On Sun, Apr 22, 2012 at 3:44 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
>>>> On Sun, Apr 22, 2012 at 5:25 AM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>>>>>
>>>>>>> Aiming for an RC on May 2nd and final release on May 16th would work for me.
>>>>>> I count 280 BUG commits since 1.6.1, so we are going to need to thin those out.
>>>>>
>>>>> Indeed. We can discard all commits related to NA and datetime, and then we should find some balance between how important the fixes are and how much risk there is that they break something. I agree with the couple of backports you've done so far, but I propose to do the rest via PRs.
>>>
>>> Charles, did you have some practical way in mind to select these commits? We could split it up by time range or by submodules for example. I'd prefer the latter. You would be able to do a better job of the commits touching numpy/core than I. How about you do that one and the polynomial module, and I do the rest?
>>
>> I'll give it a shot. I thought the first thing I would try is a search on tickets. We'll also need to track things and I haven't thought of a good way to do that apart from making a list and checking things off. I don't think there was too much polynomial fixing, mostly new stuff, but I'd like to use the current documentation. I don't know how you manage that for releases.
>
> Nothing too fancy - I use the open tickets for the milestone at http://projects.scipy.org/numpy/report/3, plus the checklist at https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt and perhaps a small todo list in my inbox. Normally we only do bugfix releases for specific reasons, so besides those I just scan through the list of commits and pick only some relevant ones of which I'm sure that they won't give any problems.
>
> The fixed items under
> http://projects.scipy.org/numpy/query?status=closed&group=resolution&milestone=1.7.0
> http://projects.scipy.org/numpy/query?status=closed&group=resolution&milestone=2.0.0
> probably give the best overview.

Argghhh... work ;) But thanks, that's a good starting point...

Chuck

From stefan at sun.ac.za Mon Apr 23 23:49:22 2012
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Mon, 23 Apr 2012 20:49:22 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

On Mon, Apr 23, 2012 at 4:39 PM, Charles R Harris wrote:
> I'm starting to think that a fork might be the best solution to the present problem.

If you are referring to the traditional concept of a fork, and not to the type we frequently make on GitHub, then I'm surprised that no one has objected already. What would a fork solve? To paraphrase the regexp saying: after forking, we'll simply have two problems. It's really not that hard to focus our attention on technical issues and to reach consensus.

Stéfan

From fperez.net at gmail.com Tue Apr 24 01:29:53 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Mon, 23 Apr 2012 22:29:53 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

On Mon, Apr 23, 2012 at 4:02 PM, Travis Oliphant wrote:
> That is an excellent thought.
>
> We could make the odd-numbered releases "experimental" and the even-numbered ones stable.
>
> That makes some sense. What do others think?

I think the concern with that is manpower: it effectively requires maintaining two complete projects alive in parallel.
As far as I know, a number of projects that used to have that model have backed off (the linux kernel included) to better enable a limited team to focus on development. I'm skeptical that numpy has the manpower to sustain that approach.

Cheers,
f

From fperez.net at gmail.com Tue Apr 24 01:35:41 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Mon, 23 Apr 2012 22:35:41 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt wrote:
> If you are referring to the traditional concept of a fork, and not to the type we frequently make on GitHub, then I'm surprised that no one has objected already. What would a fork solve? To paraphrase the regexp saying: after forking, we'll simply have two problems.

I concur with you here: github 'forks', yes, as many as possible! Hopefully every one of those will produce one or more PRs :) But a fork in the sense of a divergent parallel project? I think that would only be indicative of a complete failure to find a way to make progress here, and I doubt we're anywhere near that state.

That forks are *possible* is indeed a valuable and important option in open source software, because it means that a truly dysfunctional original project team/direction can't hold a community hostage forever. But that doesn't mean that full-blown forks should be considered lightly, as they also carry enormous costs.

I see absolutely nothing in the current scenario to even remotely consider that a full-blown fork would be a good idea, and I hope I'm right. It seems to me we're making progress on problems that led to real difficulties last year, and from multiple parties I see signs that give me reason to be optimistic that the project is getting better, not worse.

Cheers,
f

From ralf.gommers at googlemail.com Tue Apr 24 02:18:34 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 24 Apr 2012 08:18:34 +0200
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

On Tue, Apr 24, 2012 at 12:46 AM, Chris Barker wrote:

> On Mon, Apr 23, 2012 at 3:08 PM, Travis Oliphant wrote:
>> Right now we are trying to balance difficult things: stable releases with experimental development.
>
> Perhaps a more formal "development release" system could help here. IIUC, numpy pretty much has two things: the latest release (and past ones) and master (and assorted experimental branches). If someone develops a new feature, we can either:
>
> have them submit a pull request, and people with the wherewithal can pull it, compile it, and start testing it on their own -- history shows that this is a small group.
>
> merge it with master -- and hope it gets the testing it should before it becomes part of a release, but: we are rightly hesitant to put experimental stuff in master, and it really doesn't get that much testing -- again, only folks that are building master will even see it.
>
> Some projects have a more formal "development release" system. wxPython, for instance, has had for years development releases with odd numbers -- right now, the official release is 2.8.*, but there is a 2.9.* out there that is getting some use and testing. A couple of things help make this work:
>
> 1) Robin makes the effort to put out binaries for development releases -- it's easy to go get and give it a try.
This is a good idea - not for development releases but for master. Building nightly/weekly binaries would help more people try out new features.

> 2) there is the wxversion system that makes it easy to install a new version of wx, and easily switch between them (it's actually broken on OS-X right now --- :-) ) -- this pre-dated virtualenv and friends, maybe virtualenv is enough for this now.

wxversion was broken for a long time on Ubuntu too (~5 yrs ago). I don't exactly remember it as a good idea. virtualenv also doesn't help, because if you can use that, you know how to build from source anyway.

Ralf

> Anyway, it's a thought -- I think some more real-world use of new features before a real commitment to adopting them would be great.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov

From tmp50 at ukr.net Tue Apr 24 09:04:41 2012
From: tmp50 at ukr.net (Dmitrey)
Date: Tue, 24 Apr 2012 16:04:41 +0300
Subject: [Numpy-discussion] [ANN] Optimization with categorical variables, disjunctive (and other logical) constraints
Message-ID: <10202.1335272681.1300527922027298816@ffe17.ukr.net>

hi all,

The free solver interalg for global nonlinear optimization with specifiable accuracy can now handle categorical variables and disjunctive (and other logical) constraints, thus making it able to solve GDP, possibly in multiobjective form. There are ~2 months till the next OpenOpt release, but I guess someone may find it useful for his purposes right now. See here for more details.

Regards, D.

From charlesr.harris at gmail.com Tue Apr 24 09:14:20 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 07:14:20 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez wrote:

> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt wrote:
>> If you are referring to the traditional concept of a fork, and not to the type we frequently make on GitHub, then I'm surprised that no one has objected already. What would a fork solve? To paraphrase the regexp saying: after forking, we'll simply have two problems.
>
> I concur with you here: github 'forks', yes, as many as possible! Hopefully every one of those will produce one or more PRs :) But a fork in the sense of a divergent parallel project? I think that would only be indicative of a complete failure to find a way to make progress here, and I doubt we're anywhere near that state.
>
> That forks are *possible* is indeed a valuable and important option in open source software, because it means that a truly dysfunctional original project team/direction can't hold a community hostage forever. But that doesn't mean that full-blown forks should be considered lightly, as they also carry enormous costs.
> I see absolutely nothing in the current scenario to even remotely consider that a full-blown fork would be a good idea, and I hope I'm right. It seems to me we're making progress on problems that led to real difficulties last year, and from multiple parties I see signs that give me reason to be optimistic that the project is getting better, not worse.

We certainly aren't there at the moment, but I can see us heading that way. But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since then datetime, NA, polynomial work, and various other enhancements have gone in along with some 280 bug fixes. The major technical problem blocking a 1.7 release is getting datetime working reliably on Windows. So I think that is where the short-term effort needs to be. Meanwhile, we are spending effort to get out a 1.6.2 just so people can work with a stable version with some of the bug fixes, and potentially we will spend more time and effort to pull out the NA code. In the future there may be a transition to C++ and eventually a break with the current ABI. Or not.

There are at least two motivations that get folks to write code for open source projects, scratching an itch and money. Money hasn't been a big part of the Numpy picture so far, so that leaves scratching an itch. One of the attractions of Numpy is that it is a small project, BSD licensed, and not overburdened with governance and process. This makes scratching an itch not as difficult as it would be in a large project. If Numpy remains a small project but acquires the encumbrances of a big project, much of that attraction will be lost. Momentum and direction also attract people, but numpy is stalled at the moment as the whole NA thing circles around once again.

What would I suggest as a way forward with the NA option? Let's take the issues.

1) Adding slots to PyArrayObject_fields. I don't think this is likely to be a problem unless someone's code passes the struct by value or uses assignment to initialize a statically allocated instance. I'm not saying no one does that, low-level scientific code can contain all sorts of bizarre and astonishing constructs, and it is also possible that these sorts of things might turn up in an old FORTRAN program. The question here is whether to allow any changes at all, and I think we will have to in the future. Given that, consistent use of accessors will make later changes to the organization or implementation of the base structure transparent. Numpy itself now uses accessors for the heritage slots, but not for the new NA slots. So I suggest at a minimum adding accessors for the maskna_dtype, maskna_data, and maskna_strides. Of course, later removing these slots will still remain a problem.

2) NA. This breaks down into API and implementation issues. Personally, I think marking the NA stuff experimental leaves room to modify both, and I would prefer to go with what we have and change it into whatever looks best by modification through pull requests. This kicks the can down the road, but not so far that people sufficiently interested in working on the topic can't get modifications in. My own preferences for future API modifications are as follows.

a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.

b) There needs to be a distinction between missing and ignore.
The mechanism for this is already in place in the payload type, although it isn't clear to me that that is uniformly used in all the NA code. There is also a place for missing *and* ignored. Which leads to

c) Sums, etc. should always skip ignored data. If missing data is present, but not ignored, then a sum should return NA. The main danger I see here is that the behavior of arrays becomes state dependent, something that can lead to subtle problems. An explicit request for a particular behavior, as is done now, might be preferable for its clarity.

d) I think views are a good way to add another mask layer to existing arrays.

And for implementation:

a) Ufunc loop support. This is most easily done with explicit masks.

b) Apropos a), I'm coming (again) to the opinion that byte masks are the simplest and most general implementation.

Chuck

From pierre.haessig at crans.org Tue Apr 24 09:43:48 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Tue, 24 Apr 2012 15:43:48 +0200
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID: <4F96AE14.20302@crans.org>

Hi,

On 24/04/2012 15:14, Charles R Harris wrote:
> a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.

From my lazy user perspective, having masked and non-masked arrays share the same "look and feel" would be a number one advantage over the existing numpy.ma arrays. I would like masked arrays to be as transparent as possible.

> b) There needs to be a distinction between missing and ignore. The mechanism for this is already in place in the payload type, although it isn't clear to me that that is uniformly used in all the NA code. There is also a place for missing *and* ignored. Which leads to

If the idea of having two payloads is to avoid as many "skipna & friends" extra keywords as possible, I would like it very much. My feeling from my small experience with R is that I end up calling every function with a different magical set of keywords (na.rm, na.action, ... and I forgot).

My 2 lazy user cents...

Best,
Pierre

From nouiz at nouiz.org Tue Apr 24 11:03:26 2012
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Tue, 24 Apr 2012 11:03:26 -0400
Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x
In-Reply-To:
References:
Message-ID:

Hi,

I finished reading the doc I listed in the other thread. As the NA stuff will be marked as experimental in numpy 1.7, why not define a new macro like NPY_NA_VERSION that gives the version of the NA C-API? That way, people will be able to detect whether the NA C-API has changed since they wrote their code, and this would let us break that interface more easily. We would just need a big warning telling people to do this check. The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow removing features. Probably a function like PyArray_GetNACVersion would be useful too.[1]

Continuing on my previous post, old code needs to be changed so that it doesn't accept NA inputs.
With the current trunk, this can be done like this:

PyObject* an_input = ....;

if (!PyArray_Check(an_input)) {
    PyErr_SetString(PyExc_ValueError, "expected an ndarray");
    %(fail)s
}

#if NPY_FEATURE_VERSION >= 0x00000008
if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
    PyErr_SetString(PyExc_ValueError, "masked arrays are not supported by this function");
    %(fail)s
}
#endif

In the 1.6.1 release, NPY_FEATURE_VERSION had the value 0x00000007. This value wasn't changed in the trunk. I suppose it will be raised to 0x00000008 for numpy 1.7.

Can we suppose that old code checks input with PyArray_Check()? I think so, but it would be really helpful if people who have been here longer than me could confirm/deny this.

Frédéric

[1] http://docs.scipy.org/doc/numpy/reference/c-api.array.html#checking-the-api-version

From josef.pktd at gmail.com Tue Apr 24 11:25:27 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 24 Apr 2012 11:25:27 -0400
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: <4F96AE14.20302@crans.org>
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig wrote:

> Hi,
>
> On 24/04/2012 15:14, Charles R Harris wrote:
>> a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.
>
> From my lazy user perspective, having masked and non-masked arrays share the same "look and feel" would be a number one advantage over the existing numpy.ma arrays. I would like masked arrays to be as transparent as possible.

I don't have any opinion about internal implementation.

But users need to be aware of whether they have masked arrays or not, since many functions (most of scipy) wouldn't know how to handle NA and don't do any checks (and shouldn't, in my opinion, if the NA check is costly). The result might be silently wrong numbers depending on the implementation.

>> b) There needs to be a distinction between missing and ignore. The mechanism for this is already in place in the payload type, although it isn't clear to me that that is uniformly used in all the NA code. There is also a place for missing *and* ignored. Which leads to
>
> If the idea of having two payloads is to avoid as many "skipna & friends" extra keywords as possible, I would like it very much. My feeling from my small experience with R is that I end up calling every function with a different magical set of keywords (na.rm, na.action, ... and I forgot).

There is a reason for requiring the user to decide what to do about NA's. Either we have utility functions/methods to help the user change the arrays and treat NA's before calling a function, or the function needs to ask the user what should be done about possible NAs. Doing it automatically might only be useful for specialised packages.

My 2c

Josef

> My 2 lazy user cents...
> Best,
> Pierre

From matthew.brett at gmail.com Tue Apr 24 12:38:08 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 24 Apr 2012 09:38:08 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io>
Message-ID:

Hi,

On Tue, Apr 24, 2012 at 6:14 AM, Charles R Harris wrote:

> On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez wrote:
>> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt wrote:
>>> If you are referring to the traditional concept of a fork, and not to the type we frequently make on GitHub, then I'm surprised that no one has objected already. What would a fork solve? To paraphrase the regexp saying: after forking, we'll simply have two problems.
>>
>> I concur with you here: github 'forks', yes, as many as possible! Hopefully every one of those will produce one or more PRs :) But a fork in the sense of a divergent parallel project? I think that would only be indicative of a complete failure to find a way to make progress here, and I doubt we're anywhere near that state.
>>
>> That forks are *possible* is indeed a valuable and important option in open source software, because it means that a truly dysfunctional original project team/direction can't hold a community hostage forever. But that doesn't mean that full-blown forks should be considered lightly, as they also carry enormous costs.
>>
>> I see absolutely nothing in the current scenario to even remotely consider that a full-blown fork would be a good idea, and I hope I'm right. It seems to me we're making progress on problems that led to real difficulties last year, and from multiple parties I see signs that give me reason to be optimistic that the project is getting better, not worse.
>
> We certainly aren't there at the moment, but I can see us heading that way. But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since then datetime, NA, polynomial work, and various other enhancements have gone in along with some 280 bug fixes. The major technical problem blocking a 1.7 release is getting datetime working reliably on Windows. So I think that is where the short-term effort needs to be. Meanwhile, we are spending effort to get out a 1.6.2 just so people can work with a stable version with some of the bug fixes, and potentially we will spend more time and effort to pull out the NA code. In the future there may be a transition to C++ and eventually a break with the current ABI. Or not.
>
> There are at least two motivations that get folks to write code for open source projects, scratching an itch and money. Money hasn't been a big part of the Numpy picture so far, so that leaves scratching an itch. One of the attractions of Numpy is that it is a small project, BSD licensed, and not overburdened with governance and process. This makes scratching an itch not as difficult as it would be in a large project. If Numpy remains a small project but acquires the encumbrances of a big project, much of that attraction will be lost. Momentum and direction also attract people, but numpy is stalled at the moment as the whole NA thing circles around once again.

I think your assumptions are incorrect, although I have seen them before.
No stated process leads to less encumbrance if and only if the implicit process works. It clearly doesn't work, precisely because the NA thing is circling round and round again. And the governance discussion. And previously the ABI breakage discussion. If you are on other mailing lists, as I'm sure you are, you'll see that this does not happen to - say - Cython, or Sympy. In particular, I have not seen, on those lists, the current numpy way of simply blocking or avoiding discussion. Everything is discussed out to agreement, or at least until all parties accept the way forward.

At the moment, the only hope I could imagine for the 'no governance is good governance' method is that all those who don't agree would just shut up. It would be more peaceful, but for the reasons stated by Nathaniel, I think that would be a very bad outcome.

Best,

Matthew

From charlesr.harris at gmail.com Tue Apr 24 14:12:55 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 12:12:55 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 9:25 AM, <josef.pktd at gmail.com> wrote:

> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig wrote:
>> Hi,
>>
>> On 24/04/2012 15:14, Charles R Harris wrote:
>>> a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.
>>
>> From my lazy user perspective, having masked and non-masked arrays share the same "look and feel" would be a number one advantage over the existing numpy.ma arrays. I would like masked arrays to be as transparent as possible.
>
> I don't have any opinion about internal implementation.
>
> But users need to be aware of whether they have masked arrays or not, since many functions (most of scipy) wouldn't know how to handle NA and don't do any checks (and shouldn't, in my opinion, if the NA check is costly). The result might be silently wrong numbers depending on the implementation.
> > That's what the different payloads would do. I think the common use case would always have the ignore bit set. What are the other sorts of actions you are interested in, and should they be part of the functions in Numpy, such as mean and std, or should they rather implemented in stats packages that may be more specialized? I see numpy.ma currently used in the following spots in scipy: scipy/stats/mstats_extras.py scipy/stats/tests/test_mstats_extras.py scipy/stats/tests/test_mstats_basic.py scipy/stats/mstats_basic.py scipy/signal/filter_design.py scipy/optimize/optimize.py The advantage of nans, I suppose, is that they are in the hardware and so already universally part of Numpy. NA would be introduced, so would require a bit more work. I expect it will be several (many) years before they are dealt with as a matter of course. At minimum, one would need to check if the masked flag is set. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Apr 24 14:32:12 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Apr 2012 12:32:12 -0600 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID: On Tue, Apr 24, 2012 at 12:12 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, Apr 24, 2012 at 9:25 AM, wrote: > >> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig >> wrote: >> > Hi, >> > >> > Le 24/04/2012 15:14, Charles R Harris a ?crit : >> >> >> >> a) All arrays should be implicitly masked, even if the mask isn't >> >> initially allocated. The maskna keyword can then be removed, taking >> >> with it the sense that there are two kinds of arrays. >> >> >> > >> > From my lazy user perspective, having masked and non-masked arrays share >> > the same "look and feel" would be a number one advantage over the >> > existing numpy.ma arrays. I would like masked array to be as >> transparent >> > as possible. >> >> I don't have any opinion about internal implementation. >> >> But users needs to be aware of whether they have masked arrays or not. >> Since many functions (most of scipy) wouldn't know how to handle NA >> and don't do any checks, (and shouldn't in my opinion if the NA check >> is costly). The result might be silently wrong numbers depending on >> the implementation. >> > > There should be a flag saying whether or not NA has been allocated and > allocation happens when NA is assigned to an array item, so that should be > fast. I don't think scipy currently deals with masked arrays in all areas,, > so I believe that the same problem exists there and would also exist for > missing data types. I think this sort of compatibility problem is worth a > whole discussion by itself. > To clarify a bit, a item could be marked as both missing and ignore. An item that is marked missing will propagate as missing, but if it is also ignored then things like mean and std will skip it. There would also be a clear operation that would clear the ignore bit but keep the missing bit. Now I can see the advantage of explicitly specifying behavior in functions as one is knows right at the spot what is intended whereas with the other alternative one needs to know the history of the array and whether ignore was ever set, but in that sense it is just like having default keyword values and could be implemented as such. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
From ben.root at ou.edu Tue Apr 24 14:35:17 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 24 Apr 2012 14:35:17 -0400
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 2:12 PM, Charles R Harris wrote:

> On Tue, Apr 24, 2012 at 9:25 AM, <josef.pktd at gmail.com> wrote:
>> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig wrote:
>>> Hi,
>>>
>>> On 24/04/2012 15:14, Charles R Harris wrote:
>>>> a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.
>>>
>>> From my lazy user perspective, having masked and non-masked arrays share the same "look and feel" would be a number one advantage over the existing numpy.ma arrays. I would like masked arrays to be as transparent as possible.
>>
>> I don't have any opinion about internal implementation.
>>
>> But users need to be aware of whether they have masked arrays or not, since many functions (most of scipy) wouldn't know how to handle NA and don't do any checks (and shouldn't, in my opinion, if the NA check is costly). The result might be silently wrong numbers depending on the implementation.
>
> There should be a flag saying whether or not NA has been allocated, and allocation happens when NA is assigned to an array item, so that should be fast. I don't think scipy currently deals with masked arrays in all areas, so I believe that the same problem exists there and would also exist for missing data types. I think this sort of compatibility problem is worth a whole discussion by itself.
>
> That's what the different payloads would do. I think the common use case would always have the ignore bit set. What are the other sorts of actions you are interested in, and should they be part of the functions in Numpy, such as mean and std, or should they rather be implemented in stats packages that may be more specialized? I see numpy.ma currently used in the following spots in scipy:

Like you said, this whole issue probably should be in a separate discussion, but I would like to give my thoughts here on the default payload. If we don't have some sort of mechanism for flagging which functions are NA-friendly or not, then it would be wise to have NA default to NaN behavior, if only to prevent bugs that mess up data from going undetected.
That being said, the determination of the NA payload is tricky. Some functions may need to react differently to an NA. One that comes to mind is np.gradient(). However, other functions may not need to do anything because they depend entirely upon other functions that have already been updated to support NA.

Cheers!
Ben Root

From njs at pobox.com Tue Apr 24 15:02:46 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 24 Apr 2012 20:02:46 +0100
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: <4F96AE14.20302@crans.org>
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 2:43 PM, Pierre Haessig wrote:
> If the idea of having two payloads is to avoid as many "skipna & friends" extra keywords as possible, I would like it very much. My feeling from my small experience with R is that I end up calling every function with a different magical set of keywords (na.rm, na.action, ... and I forgot).

While I can't in general defend R on consistency grounds, there is a logic to this particular case. Most basic R functions like 'sum' take the na.rm= argument, which can be True or False and is equivalent to the skipna argument we've talked about for ufuncs. The functions that take other arguments (like na.action= for model fitting functions, or use= for their equivalent of np.corrcoef) are the ones that have *more* than 2 ways to handle NAs. E.g. model fitting functions given NAs can raise an error, skip the NA cases, or pass the NA cases through, and the correlation matrix function has different options for what to do with cases where one column has an NA but there are two others that don't.

Having a distinction between missing and ignored values doesn't really affect whether you need such options. (If anything I guess it could make such options even more complicated -- what if I want my regression function to error out on missing but skip over ignored values, etc.)

- N

From josef.pktd at gmail.com Tue Apr 24 15:19:47 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 24 Apr 2012 15:19:47 -0400
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 2:35 PM, Benjamin Root wrote:

> On Tue, Apr 24, 2012 at 2:12 PM, Charles R Harris wrote:
>> On Tue, Apr 24, 2012 at 9:25 AM, <josef.pktd at gmail.com> wrote:
>>> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig wrote:
>>>> Hi,
>>>>
>>>> On 24/04/2012 15:14, Charles R Harris wrote:
>>>>> a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.
>>>>
>>>> From my lazy user perspective, having masked and non-masked arrays share the same "look and feel" would be a number one advantage over the existing numpy.ma arrays. I would like masked arrays to be as transparent as possible.
>>>
>>> I don't have any opinion about internal implementation.
>>>
>>> But users need to be aware of whether they have masked arrays or not, since many functions (most of scipy) wouldn't know how to handle NA and don't do any checks (and shouldn't, in my opinion, if the NA check is costly). The result might be silently wrong numbers depending on the implementation.
>> There should be a flag saying whether or not NA has been allocated, and allocation happens when NA is assigned to an array item, so that should be fast. I don't think scipy currently deals with masked arrays in all areas, so I believe that the same problem exists there and would also exist for missing data types. I think this sort of compatibility problem is worth a whole discussion by itself.
>>
>>>> If the idea of having two payloads is to avoid as many "skipna & friends" extra keywords as possible, I would like it very much. My feeling from my small experience with R is that I end up calling every function with a different magical set of keywords (na.rm, na.action, ... and I forgot).
>>>
>>> There is a reason for requiring the user to decide what to do about NA's. Either we have utility functions/methods to help the user change the arrays and treat NA's before calling a function, or the function needs to ask the user what should be done about possible NAs. Doing it automatically might only be useful for specialised packages.
>>
>> That's what the different payloads would do. I think the common use case would always have the ignore bit set. What are the other sorts of actions you are interested in, and should they be part of the functions in Numpy, such as mean and std, or should they rather be implemented in stats packages that may be more specialized? I see numpy.ma currently used in the following spots in scipy:

I think for most functions that operate on an axis, ignore is unambiguous; std, mean, var, and histogram should stay in numpy. np.cov might have a pairwise or row/column-wise deletion option (but I don't know what other packages are doing). (While I had to run off, Nathaniel explained this.)

The main cases in stats (or statsmodels) for handling NaNs or NAs would be rowwise ignore, or pretending temporarily that they are zero or some other neutral value.

> Like you said, this whole issue probably should be in a separate discussion, but I would like to give my thoughts here on the default payload. If we don't have some sort of mechanism for flagging which functions are NA-friendly or not, then it would be wise to have NA default to NaN behavior, if only to prevent bugs that mess up data from going undetected.

In scipy.stats it's currently the responsibility of the user: unless it is explicitly mentioned that a function knows how to handle nans or masked arrays, the default is "we don't check" and what you get returned might be anything. If there is a flag (and a cheap way to verify whether there are NaNs or NAs), then we could just add a check in every function.

Josef

> That being said, the determination of the NA payload is tricky. Some functions may need to react differently to an NA. One that comes to mind is np.gradient(). However, other functions may not need to do anything because they depend entirely upon other functions that have already been updated to support NA.
>
> Cheers!
> Ben Root

From stefan at sun.ac.za Tue Apr 24 15:23:47 2012
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 24 Apr 2012 12:23:47 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote:
> The advantage of nans, I suppose, is that they are in the hardware and so

Why are we having a discussion on NaNs in a thread on consensus? This is a strong indicator of the problem we're facing.

Stéfan

From ben.root at ou.edu Tue Apr 24 15:31:39 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 24 Apr 2012 15:31:39 -0400
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 3:23 PM, Stéfan van der Walt wrote:

> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote:
>> The advantage of nans, I suppose, is that they are in the hardware and so
>
> Why are we having a discussion on NaNs in a thread on consensus? This is a strong indicator of the problem we're facing.
>
> Stéfan

Good catch! Looks like we got off-track when the discussion talked about forks.

Ben Root

From charlesr.harris at gmail.com Tue Apr 24 16:03:19 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 14:03:19 -0600
Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x
In-Reply-To:
References:
Message-ID:

2012/4/24 Frédéric Bastien

> Hi,
>
> I finished reading the doc I listed in the other thread. As the NA stuff will be marked as experimental in numpy 1.7, why not define a new macro like NPY_NA_VERSION that gives the version of the NA C-API? That way, people will be able to detect whether the NA C-API has changed since they wrote their code, and this would let us break that interface more easily. We would just need a big warning telling people to do this check.

This sounds like a good thing to do.

> The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow removing features. Probably a function like PyArray_GetNACVersion would be useful too.[1]
>
> Continuing on my previous post, old code needs to be changed so that it doesn't accept NA inputs. With the current trunk, this can be done like this:
>
> PyObject* an_input = ....;
>
> if (!PyArray_Check(an_input)) {
>     PyErr_SetString(PyExc_ValueError, "expected an ndarray");
>     %(fail)s
> }
>
> #if NPY_FEATURE_VERSION >= 0x00000008
> if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
>     PyErr_SetString(PyExc_ValueError, "masked arrays are not supported by this function");
>     %(fail)s
> }
> #endif
>
> In the 1.6.1 release, NPY_FEATURE_VERSION had the value 0x00000007. This value wasn't changed in the trunk. I suppose it will be raised to 0x00000008 for numpy 1.7.
>
> Can we suppose that old code checks input with PyArray_Check()? I think so, but it would be really helpful if people who have been here longer than me could confirm/deny this.

Should be 6 in 1.6

# Binary compatibility version number. This number is increased whenever the
# C-API is changed such that binary compatibility is broken, i.e.
# whenever a recompile of extension modules is needed.
C_ABI_VERSION = 0x01000009

# Minor API version. This number is increased whenever a change is made to the
# C-API -- whether it breaks binary compatibility or not. Some changes, such
# as adding a function pointer to the end of the function table, can be made
# without breaking binary compatibility. In this case, only the C_API_VERSION
# (*not* C_ABI_VERSION) would be increased. Whenever binary compatibility is
# broken, both C_API_VERSION and C_ABI_VERSION should be increased.
C_API_VERSION = 0x00000006

It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing 7 for 1.6?

Chuck

From nouiz at nouiz.org Tue Apr 24 16:26:42 2012
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Tue, 24 Apr 2012 16:26:42 -0400
Subject: [Numpy-discussion] Masked Arrays in NumPy 1.x
In-Reply-To:
References:
Message-ID:

On Tue, Apr 24, 2012 at 4:03 PM, Charles R Harris wrote:
> Should be 6 in 1.6
>
> [quoted version numbers from setup_common.py]
>
> It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing 7 for 1.6?

My bad, when I grepped, I found this line:

./build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h:#define NPY_API_VERSION 0x00000007

That tells me the version is 0x00000007. But this is in a file in the build directory, and as my last build was with a later version, it isn't the right number!

Fred

From charlesr.harris at gmail.com Tue Apr 24 17:25:29 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 15:25:29 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

2012/4/24 Stéfan van der Walt

> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote:
>> The advantage of nans, I suppose, is that they are in the hardware and so
>
> Why are we having a discussion on NaNs in a thread on consensus? This is a strong indicator of the problem we're facing.

We seem to have a consensus regarding interest in the topic.

Chuck
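Following up on the version-number exchange above, a defensive option from the Python side is plain feature detection rather than comparing version constants. This is only a sketch of the idea, not code from the thread; it assumes the 1.7 development branch exposes the experimental np.NA object and a maskna flag, as the NA NEP describes (released 1.6.x has neither, so both probes are guarded):

import numpy as np

# Probe for the experimental NA machinery directly instead of
# decoding NPY_API_VERSION-style constants.
HAS_NA = hasattr(np, 'NA')

def reject_na(arr):
    # Hypothetical guard mirroring the C-level check quoted earlier:
    # refuse arrays that carry an NA mask.  The 'maskna' flag name
    # follows the NEP and may differ in the actual branch; getattr
    # with a default keeps this safe on releases without the flag.
    if HAS_NA and getattr(arr.flags, 'maskna', False):
        raise ValueError("masked arrays are not supported by this function")
    return arr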
From matthew.brett at gmail.com Tue Apr 24 18:52:09 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 24 Apr 2012 15:52:09 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

Hi,

On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote:

> 2012/4/24 Stéfan van der Walt
>> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote:
>>> The advantage of nans, I suppose, is that they are in the hardware and so
>>
>> Why are we having a discussion on NaNs in a thread on consensus? This is a strong indicator of the problem we're facing.
>
> We seem to have a consensus regarding interest in the topic.

This email is mainly to Travis.

This thread seems to be dying, condemning us to keep repeating the same conversation with no result.

Chuck has made it clear he is not interested in this conversation. Until it is clear you are interested in this conversation, it will keep dying. As you know, I think that will be very bad for numpy, and, as you know, I care a great deal about that. So, please, if you care about this and agree that something should be done, please say so, and if you don't agree that something should be done, say so. It can't get better without your help,

See you,

Matthew

From stefan at sun.ac.za Tue Apr 24 19:01:06 2012
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 24 Apr 2012 16:01:06 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote:

>> Why are we having a discussion on NaNs in a thread on consensus? This is a strong indicator of the problem we're facing.
>
> We seem to have a consensus regarding interest in the topic.

For the benefit of those of us interested in both discussions, would you kindly start a new thread on the MA topic?

In response to Travis's suggestion of writing up a short summary of community principles, as well as Matthew's initial formulation, I agree that this would be helpful in enshrining the values we cherish here, as well as in communicating those values to the next generation of developers.

From observing the community, I would guess that these values include:

- That any party with an interest in NumPy is given the opportunity to speak and to be heard on the list.
- That discussions that influence the course of the project take place openly, for anyone to observe.
- That decisions are made once consensus is reached, i.e., if everyone agrees that they can live with the outcome.

To summarize: NumPy development that is free & fair, open and unified. We'll sometimes mess up and not follow our own guidelines, but with them in place at least we'll have something to refer back to as a reminder.

Regards
Stéfan

From travis at continuum.io Tue Apr 24 19:02:57 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 24 Apr 2012 18:02:57 -0500
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To:
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org>
Message-ID:

Thanks for the reminder, Stefan, and for keeping us on track. It is very helpful to those trying to sort through the messages to keep the discussions to one subject per thread.
-Travis

On Apr 24, 2012, at 2:23 PM, Stéfan van der Walt wrote:

> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote:
>> The advantage of nans, I suppose, is that they are in the hardware and so
>
> Why are we having a discussion on NAN's in a thread on consensus?
> This is a strong indicator of the problem we're facing.
>
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From travis at continuum.io Tue Apr 24 19:24:14 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 24 Apr 2012 18:24:14 -0500
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID: <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io>

On Apr 24, 2012, at 6:01 PM, Stéfan van der Walt wrote:

> On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote:
>>> Why are we having a discussion on NAN's in a thread on consensus?
>>> This is a strong indicator of the problem we're facing.
>>
>> We seem to have a consensus regarding interest in the topic.
>
> For the benefit of those of us interested in both discussions, would
> you kindly start a new thread on the MA topic?
>
> In response to Travis's suggestion of writing up a short summary of
> community principles, as well as Matthew's initial formulation, I
> agree that this would be helpful in enshrining the values we cherish
> here, as well as in communicating those values to the next generation
> of developers.
>
> From observing the community, I would guess that these values include:
>
> - That any party with an interest in NumPy is given the opportunity to
> speak and to be heard on the list.
> - That discussions that influence the course of the project take place
> openly, for anyone to observe.
> - That decisions are made once consensus is reached, i.e., if everyone
> agrees that they can live with the outcome.

This is well stated. Thank you Stefan.

Some will argue about what "consensus" means or who "everyone" is. But, if we are really worrying about that, then we have stopped listening to each other, which is the number one community value that we should be promoting, demonstrating, and living by.

Consensus to me means that anyone who can produce a well-reasoned argument and demonstrates by their persistence that they are actually using the code and are aware of the issues has veto power on pull requests. At times people with commit rights to NumPy might perform a pull request anyway, but they should acknowledge at least in the comment (but for major changes --- on this list) that they are doing so and provide their reasons.

If I decide later that I think the pull request was made inappropriately in the face of objections and the reasons were not justified, then I will reserve the right to revert the pull request. I would like core developers of NumPy to have the same ability to check me as well. But, if there is a disagreement at that level, then I will reserve the right to decide.

Basically, what we have in this situation is that the masked arrays were added to NumPy master with serious objections to the API. What I'm trying to decide right now is can we move forward and satisfy the objections without removing the ndarrayobject changes entirely (I do think the concerns warrant removal of the changes). The discussion around that is the most helpful right now, but should take place on another thread.
Thanks,

-Travis

From travis at continuum.io Tue Apr 24 19:28:50 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 24 Apr 2012 18:28:50 -0500
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID: <4ED1A8D0-40A2-457C-BBF4-D15D17A28489@continuum.io>

On Apr 24, 2012, at 5:52 PM, Matthew Brett wrote:

> Hi,
>
> On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote:
>>
>> 2012/4/24 Stéfan van der Walt
>>>
>>> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote:
>>>> The advantage of nans, I suppose, is that they are in the hardware and
>>>> so
>>>
>>> Why are we having a discussion on NAN's in a thread on consensus?
>>> This is a strong indicator of the problem we're facing.
>>
>> We seem to have a consensus regarding interest in the topic.
>
> This email is mainly to Travis.
>
> This thread seems to be dying, condemning us to keep repeating the
> same conversation with no result.
>
> Chuck has made it clear he is not interested in this conversation.
> Until it is clear you are interested in this conversation, it will
> keep dying. As you know, I think that will be very bad for numpy,
> and, as you know, I care a great deal about that.

I am interested in the conversation, but I think I've already stated my views as well as I know how. I'm not sure what else I should do at this point. We do need consensus (defined as the absence of serious objectors) for me to agree to a NumPy 1.X release. I don't think it helps us get to a consensus to further discuss non-technical issues at this point. There is much interest in ideas for finding common ground in the masked array situation, but that should happen on another thread.

-Travis

From charlesr.harris at gmail.com Tue Apr 24 19:49:53 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 17:49:53 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID:

2012/4/24 Stéfan van der Walt

> On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote:
> >> Why are we having a discussion on NAN's in a thread on consensus?
> >> This is a strong indicator of the problem we're facing.
> >
> > We seem to have a consensus regarding interest in the topic.
>
> For the benefit of those of us interested in both discussions, would
> you kindly start a new thread on the MA topic?
>
> In response to Travis's suggestion of writing up a short summary of
> community principles, as well as Matthew's initial formulation, I
> agree that this would be helpful in enshrining the values we cherish
> here, as well as in communicating those values to the next generation
> of developers.

I think we adhere to these pretty well already; the problem is with the word 'everyone'. I grew up in Massachusetts, where town meetings were a tradition. At those meetings the townsfolk voted on the budget, zoning, construction of public buildings, use of public spaces and other such topics. A quorum of voters was needed to make the votes binding, and apart from that the meeting was limited to people who lived in the town; they, after all, paid the taxes and had to live with the decisions. Outsiders could sit in by invitation, but had to sit in a special area and were not expected to speak unless called upon, and certainly couldn't vote. So that is one tradition, a democratic tradition with a history of success.
We are a much smaller community, physically separated, and don't need that sort of exclusivity, but even so we have our version of residents and taxes, which consists of hanging out on the list and contributing work. I think everyone is welcome to express an opinion and make an argument, but not everyone has a veto. I think a veto is a privilege, not a right, and to have that privilege I think one needs to demonstrate an investment in the project, consisting in this case of code contributions, code review, and other such mundane tasks that demonstrate a larger interest and a willingness to work. Anyone can do this; it doesn't require permission or special dispensation; Numpy is very open in that regard. Folks working in related projects, such as ipython and pandas, are also going to be listened to because they have made that investment in time and work, and the popularity of Numpy depends on keeping them happy. But a right to veto doesn't automatically extend to everyone who happens to have an interest in a topic.

Chuck

From charlesr.harris at gmail.com Tue Apr 24 19:59:09 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 17:59:09 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io>
References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 5:24 PM, Travis Oliphant wrote:

> This is well stated. Thank you Stefan.
>
> Some will argue about what "consensus" means or who "everyone" is. But,
> if we are really worrying about that, then we have stopped listening to
> each other, which is the number one community value that we should be
> promoting, demonstrating, and living by.
>
> Consensus to me means that anyone who can produce a well-reasoned argument
> and demonstrates by their persistence that they are actually using the code
> and are aware of the issues has veto power on pull requests. At times
> people with commit rights to NumPy might perform a pull request anyway, but
> they should acknowledge at least in the comment (but for major changes ---
> on this list) that they are doing so and provide their reasons.
>
> If I decide later that I think the pull request was made inappropriately
> in the face of objections and the reasons were not justified, then I will
> reserve the right to revert the pull request. I would like core
> developers of NumPy to have the same ability to check me as well. But,
> if there is a disagreement at that level, then I will reserve the right to
> decide.
>
> Basically, what we have in this situation is that the masked arrays were
> added to NumPy master with serious objections to the API. What I'm trying
> to decide right now is can we move forward and satisfy the objections
> without removing the ndarrayobject changes entirely (I do think the
> concerns warrant removal of the changes). The discussion around that is
> the most helpful right now, but should take place on another thread.

Travis, if you are playing the BDFL role, then just make the darn decision and remove the code so we can get on with life. As it is you go back and forth, and that does none of us any good; you're a big guy and you're rocking the boat. I don't agree with that decision, I'd rather evolve the code we have, but I'm willing to compromise with your decision in this matter. I'm not willing to compromise with Nathaniel's, nor, it seems, vice versa. Nathaniel has volunteered to do the work, just ask him to submit a patch.

Chuck

From stefan at sun.ac.za Tue Apr 24 20:16:04 2012
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 24 Apr 2012 17:16:04 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID:

On Tue, Apr 24, 2012 at 4:49 PM, Charles R Harris wrote:
> But a right to veto doesn't automatically extend to everyone who happens to have
> an interest in a topic.

The time has long gone when we simply hacked on NumPy for our own benefit; if you will, NumPy users are our customers, and they have a stake in its development (or, to phrase it differently, I think we have a commitment to them).

If we strongly encourage people to discuss, but still give them an avenue to object, we keep ourselves honest (both w.r.t. expectations on numpy and our own insight into problems and their solutions).

Stéfan

From ben.root at ou.edu Tue Apr 24 20:45:34 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 24 Apr 2012 20:45:34 -0400
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID:

On Tuesday, April 24, 2012, Matthew Brett wrote:

> This email is mainly to Travis.
>
> This thread seems to be dying, condemning us to keep repeating the
> same conversation with no result.
>
> Chuck has made it clear he is not interested in this conversation.
> Until it is clear you are interested in this conversation, it will
> keep dying.
> As you know, I think that will be very bad for numpy,
> and, as you know, I care a great deal about that.
>
> So, please, if you care about this, and agree that something should be
> done, please, say so, and if you don't agree something should be done,
> say so. It can't get better without your help,
>
> See you,
>
> Matthew

Matthew,

I agree with the general idea of consensus, and I think many of us here agree with the ideal in principle. Quite frankly, I am not sure what more you want from us. You are only going to get so much leeway on a philosophical discussion of governance on a numerical computation mailing list. The thread keeps "dying" (I say it is getting distracted) because coders are champing at the bit to get stuff done.

In a sense, I think there is a consensus, if you will, to move on. All in favor, say "Aye!"

Cheers!
Ben Root

From njs at pobox.com Tue Apr 24 20:45:46 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 25 Apr 2012 01:45:46 +0100
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID:

On Wed, Apr 25, 2012 at 12:49 AM, Charles R Harris wrote:
> I think we adhere to these pretty well already, the problem is with the word
> 'everyone'. [...] I think everyone is welcome to express an opinion and make
> an argument, but not everyone has a veto. [...] Folks working in related
> projects, such as ipython and pandas, are also going to be listened to
> because they have made that investment in time and work and the popularity
> of Numpy depends on keeping them happy. But a right to veto doesn't
> automatically extend to everyone who happens to have an interest in a topic.

Consensus-seeking isn't about privilege or moral rights. It's about ruthless pragmatism.
The end of your message actually gets very close to the position I'm advocating -- except that I'm saying, instead of trying to judge which people are worth keeping happy by looking up their commit record on projects you've heard of, you're safer erring on the side of assuming that anyone taking the time to show up probably has some good reason for doing so, and that their concerns are probably shared by a larger group. You wouldn't refuse to try a chef's cooking until she's proven herself by washing dishes -- why the heck would you demand that people perform "mundane tasks" before you're willing to trust they have some insight?

Acting as maintainer isn't a privilege -- it's a gift you give. So is feedback. Ignoring it is just a way of shooting your own project in the foot.

- N

From travis at continuum.io Tue Apr 24 20:46:31 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 24 Apr 2012 19:46:31 -0500
Subject: [Numpy-discussion] Removal of mask arrays? [was consensus]
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io> Message-ID:

On Apr 24, 2012, at 6:59 PM, Charles R Harris wrote:
> Travis, if you are playing the BDFL role, then just make the darn decision
> and remove the code so we can get on with life. As it is you go back and
> forth, and that does none of us any good; you're a big guy and you're
> rocking the boat. I don't agree with that decision, I'd rather evolve the
> code we have, but I'm willing to compromise with your decision in this
> matter. I'm not willing to compromise with Nathaniel's, nor, it seems,
> vice versa. Nathaniel has volunteered to do the work, just ask him to
> submit a patch.

I would like to see Nathaniel and Mark work out a document that they can both agree to and co-author, presented to this list, before doing something like that. At the very least this should summarize the feature from both perspectives. I have been encouraged by Nathaniel's willingness to contribute code, and I know Mark is looking for acceptable solutions that are still consistent with his view of things. These are all positive signs to me. We need to give this another week or two.

I would prefer a solution that evolves the code as well. But, I also don't want yet another masked array implementation that gets little use but has real and long-lasting implications on the ndarray structure. There is both the effect of the growth of the ndarray structure (most uses should not worry about this at all), but also the growth of the *concept* of an ndarray --- this is a little more subtle but also has real downstream implications. Some of these implications have been pointed out already by consumers of the C-API who are unsure about how code that was not built with masks in mind will respond (I believe it will raise an error if they are using the standard APIs -- it probably should if it doesn't).

Long term, I agree that the NumPy array should not be so tied to a particular *implementation* as it is now. I also don't think it should be tied so deeply to ABI compatibility. I think it was a mistake to be so devoted to this concept that we created a lot of work for ourselves --- work that is easily eliminated by distributions that re-compile down-stream dependencies after an ABI-breaking release of NumPy. I realize I didn't disagree very strongly before -- I disagree with my earlier view. That's not to say future releases of NumPy 1.X will break ABI compatibility --- I just think the price is not worth the value of the thing we have set as the standard (it's just a simple re-compile of downstream dependencies).

We are quite delayed in getting things out, as you have noted. If the desire is to get a long-term release schedule for Debian and/or Ubuntu, then I think the 1.6.2 release is a good idea. It also makes more sense to me as a Long-Term Release candidate.

-Travis
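(Since much of this thread turns on which masked/missing semantics people actually expect, a concrete reference point may help. The long-standing numpy.ma already gives "ignored" semantics for reductions, while bare NaNs propagate unless explicitly skipped. A minimal sketch -- plain numpy.ma and the nan-functions only; the experimental NA machinery in master is deliberately not assumed here:

    import numpy as np
    import numpy.ma as ma

    # numpy.ma: masked entries are ignored by reductions.
    a = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
    print(a.sum())    # 4.0 -- the masked value is skipped
    print(a.mean())   # 2.0

    # NaN: propagates by default, skipped only on explicit request.
    x = np.array([1.0, np.nan, 3.0])
    print(x.sum())        # nan
    print(np.nansum(x))   # 4.0

The open design question in this thread is which of these behaviors the new NA work should provide by default, and under what spelling.)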
From travis at continuum.io Tue Apr 24 20:53:59 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 24 Apr 2012 19:53:59 -0500
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> Message-ID:

On Apr 24, 2012, at 7:16 PM, Stéfan van der Walt wrote:

> On Tue, Apr 24, 2012 at 4:49 PM, Charles R Harris wrote:
>> But a right to veto doesn't automatically extend to everyone who happens to have
>> an interest in a topic.

This is not my view, but it is Charles's view, and as he is an active developer in the NumPy community it carries weight. I hope he can be convinced that active users are an important part of the community. Charles has made tremendous contributions to this community, starting with significant code in Numarray that now lives in NumPy, significant commitment to code quality, significant effort on responding to pull requests, diligence in triaging and applying bug-fixes in tickets, and even responding to people who disagree with him.

> The time has long gone when we simply hacked on NumPy for our own
> benefit; if you will, NumPy users are our customers, and they have a
> stake in its development (or, to phrase it differently, I think we
> have a commitment to them).
>
> If we strongly encourage people to discuss, but still give them an
> avenue to object, we keep ourselves honest (both w.r.t. expectations
> on numpy and our own insight into problems and their solutions).

+1

> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From njs at pobox.com Tue Apr 24 20:56:27 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 25 Apr 2012 01:56:27 +0100
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 2:14 PM, Charles R Harris wrote:
>
> On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez wrote:
>>
>> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt wrote:
>> > If you are referring to the traditional concept of a fork, and not to
>> > the type we frequently make on GitHub, then I'm surprised that no one
>> > has objected already. What would a fork solve? To paraphrase the
>> > regexp saying: after forking, we'll simply have two problems.
>>
>> I concur with you here: github 'forks', yes, as many as possible!
>> Hopefully every one of those will produce one or more PRs :) But a
>> fork in the sense of a divergent parallel project? I think that would
>> only be indicative of a complete failure to find a way to make
>> progress here, and I doubt we're anywhere near that state.
>>
>> That forks are *possible* is indeed a valuable and important option in
>> open source software, because it means that a truly dysfunctional
>> original project team/direction can't hold a community hostage
>> forever. But that doesn't mean that full-blown forks should be
>> considered lightly, as they also carry enormous costs.
>>
>> I see absolutely nothing in the current scenario to even remotely
>> consider that a full-blown fork would be a good idea, and I hope I'm
>> right. It seems to me we're making progress on problems that led to
>> real difficulties last year, but from multiple parties I see signs
>> that give me reason to be optimistic that the project is getting
>> better, not worse.
>>
>
> We certainly aren't there at the moment, but I can see us heading that way.
> But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since
> then datetime, NA, polynomial work, and various other enhancements have gone
> in along with some 280 bug fixes. The major technical problem blocking a 1.7
> release is getting datetime working reliably on windows. So I think that is
> where the short term effort needs to be. Meanwhile, we are spending effort
> to get out a 1.6.2 just so people can work with a stable version with some
> of the bug fixes, and potentially we will spend more time and effort to pull
> out the NA code. In the future there may be a transition to C++ and
> eventually a break with the current ABI. Or not.
>
> There are at least two motivations that get folks to write code for open
> source projects, scratching an itch and money. Money hasn't been a big part
> of the Numpy picture so far, so that leaves scratching an itch. One of the
> attractions of Numpy is that it is a small project, BSD licensed, and not
> overburdened with governance and process. This makes scratching an itch not
> as difficult as it would be in a large project. If Numpy remains a small
> project but acquires the encumbrances of a big project much of that
> attraction will be lost. Momentum and direction also attracts people, but
> numpy is stalled at the moment as the whole NA thing circles around once
> again.

I don't think we need a fork, or to start maintaining separate stable and unstable trees, or any of the other complicated process changes that have been suggested. There are tons of projects that routinely make much bigger changes than we're talking about, and they do it without needing that kind of overhead. I know that these suggestions are all made in good faith, but they remind me of a line from that Apache page I linked earlier: "People tend to avoid conflict and thrash around looking for something to substitute - somebody in charge, a rule, a process, stagnation. None of these tend to be very good substitutes for doing the hard work of resolving the conflict."

I also think if you talk to potential contributors, you'll find that clear, simple processes and a history of respecting everyone's input are much more attractive than a no-rules free-for-all. Good engineering practices are not an "encumbrance". Resolving conflicts before merging is a good engineering practice.

What happened with the NA discussion is this:
- There was substantial disagreement about whether NEP-style masks, or indeed, focusing on a mask-based implementation *at all*, was the best way forward.
- There was also a perceived time constraint, that we had to either implement something immediately while Mark was there, or have nothing.

So in the end, the latter concern outweighed the former, the discussion was cut off, and Mark's best guess at an API was merged into master. I totally understand how this decision made sense at the time, but the result is what we see now: it's left numpy stalled, rifts on the mailing list, boring discussions about process, and still no agreement about whether NEP-style masks will actually solve our users' problems.

Getting past this isn't *complicated* -- it's just "hard work".

> What would I suggest as a way forward with the NA option? Let's take the
> issues.
>
> 1) Adding slots to PyArrayObject_fields. I don't think this is likely to be
> a problem unless someone's code passes the struct by value or uses
> assignment to initialize a statically allocated instance.
> I'm not saying no one does that; low level scientific code can contain
> all sorts of bizarre and astonishing constructs, and it is also possible
> that these sorts of things might turn up in an old FORTRAN program. The
> question here is whether to allow any changes at all, and I think we will
> have to in the future. Given that, consistent use of accessors will make
> later changes to the organization or implementation of the base structure
> transparent. Numpy itself now uses accessors for the heritage slots, but
> not for the new NA slots. So I suggest at a minimum adding accessors for
> the maskna_dtype, maskna_data, and maskna_strides. Of course, later
> removing these slots will still remain a problem.
>
> 2) NA. This breaks down into API and implementation issues. Personally, I
> think marking the NA stuff experimental leaves room to modify both and would
> prefer to go with what we have and change it into whatever looks best by
> modification through pull requests. This kicks the can down the road, but
> not so far that people sufficiently interested in working on the topic can't
> get modifications in. My own preferences for future API modifications are as
> follows.
>
> a) All arrays should be implicitly masked, even if the mask isn't initially
> allocated. The maskna keyword can then be removed, taking with it the sense
> that there are two kinds of arrays.
>
> b) There needs to be a distinction between missing and ignore. The mechanism
> for this is already in place in the payload type, although it isn't clear to
> me that that is uniformly used in all the NA code. There is also a place for
> missing *and* ignored. Which leads to
>
> c) Sums, etc. should always skip ignored data. If missing data is present,
> but not ignored, then a sum should return NA. The main danger I see here is
> that the behavior of arrays becomes state dependent, something that can lead
> to subtle problems. Explicit request for a particular behavior, as is done
> now, might be preferable for its clarity.
>
> d) I think views are a good way to add another mask layer to existing arrays.
>
> And for implementation:
>
> a) Ufunc loop support. This is most easily done with explicit masks.
>
> b) Apropos a), I'm coming (again) to the opinion that byte masks are the
> simplest and most general implementation.

Unfortunately, I think that there are more fundamental disagreements to address before we worry about these questions. Even more unfortunately, I've just spent a bunch of time trying to articulate what those are, but it's in a draft of this summary Mark and I are working on, which I can't really share until he's looked at it... so, I don't want to ignore your attempts to move forward, but can I ask you to look for my response in a day or two and in another thread? :-)

-- Nathaniel

From charlesr.harris at gmail.com Tue Apr 24 21:12:53 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 19:12:53 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 6:56 PM, Nathaniel Smith wrote:

> What happened with the NA discussion is this:
> - There was substantial disagreement about whether NEP-style masks,
> or indeed, focusing on a mask-based implementation *at all*, was the
> best way forward.
> - There was also a perceived time constraint, that we had to either
> implement something immediately while Mark was there, or have nothing.
>
> So in the end, the latter concern outweighed the former, the
> discussion was cut off, and Mark's best guess at an API was merged
> into master. I totally understand how this decision made sense at the
> time, but the result is what we see now: it's left numpy stalled,
> rifts on the mailing list, boring discussions about process, and still
> no agreement about whether NEP-style masks will actually solve our
> users' problems.
>
> Getting past this isn't *complicated* -- it's just "hard work".

I admit to a certain curiosity about your own involvement in FOSS projects, and I know I'm not alone in this. Google shows several years of discussion on Monotone, but I have no idea what your contributions were outside of documentation. Monotone has sort of died, but that is the luck of the draw. Would you care to comment on the success of the process in that project and what went right or wrong? The other thing I see your name attached to is xpra, which gets good reviews, but that was a personal project and forked after you found more interesting things to work on.

Chuck
From matthew.brett at gmail.com Tue Apr 24 22:41:50 2012
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 24 Apr 2012 19:41:50 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

Hi,

On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris wrote:
>
> I admit to a certain curiosity about your own involvement in FOSS projects,
> and I know I'm not alone in this. Google shows several years of discussion
> on Monotone, but I have no idea what your contributions were outside of
> documentation. Monotone has sort of died, but that is the luck of the draw.
> Would you care to comment on the success of the process in that project and
> what went right or wrong? The other thing I see your name attached to is
> xpra, which gets good reviews, but that was a personal project and forked
> after you found more interesting things to work on.

I'm sorry, but I think this is really not OK. This is an ad-hominem attack [1].

I'm not going to join in by justifying Nathaniel's expertise, because I do not think he should have to do that. I just think it is wrong to ask people to justify their expertise when we should be discussing the ideas.

If I was a new subscriber to this list, and saw that my right to speak was likely to be audited in public if I disagree, I would be very reluctant to differ from the majority. I do not think that is what we want.

I think Nathaniel's summary of how the masked array dispute arose and continued is entirely reasonable. I think his analysis of the issues in the masked array API is also reasonable, well thought out, and well argued. I think we should address his arguments. I think we're getting stuck precisely because you will not do that.

Best,

Matthew

[1] http://en.wikipedia.org/wiki/Ad_hominem

From fperez.net at gmail.com Tue Apr 24 22:56:41 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Tue, 24 Apr 2012 19:56:41 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris wrote:
> I admit to a certain curiosity about your own involvement in FOSS projects,
> and I know I'm not alone in this. Google shows several years of discussion
> on Monotone, but I have no idea what your contributions were

Seriously???

Please, let's rise above this. We discuss people's opinions *on their technical merit alone*, regardless of the background of the person presenting them. I don't care if Linus himself shows up on the list with a bad idea, it should be shot down; and if someone we'd never heard of brings up a valid point, we should respect it.

The day we start "checking credentials at the door" is the day this project will die as an open source effort. Or at least I think so, but perhaps I don't have enough 'commit credits' in my account for my opinion to matter...

Cheers,
f

From charlesr.harris at gmail.com Tue Apr 24 23:02:27 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 21:02:27 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 8:56 PM, Fernando Perez wrote:
> On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris wrote:
> > I admit to a certain curiosity about your own involvement in FOSS projects,
> > and I know I'm not alone in this. Google shows several years of discussion
> > on Monotone, but I have no idea what your contributions were
>
> Seriously???
>
> Please, let's rise above this. We discuss people's opinions *on their
> technical merit alone*, regardless of the background of the person
> presenting them. I don't care if Linus himself shows up on the list
> with a bad idea, it should be shot down; and if someone we'd never
> heard of brings up a valid point, we should respect it.
>
> The day we start "checking credentials at the door" is the day this
> project will die as an open source effort. Or at least I think so,
> but perhaps I don't have enough 'commit credits' in my account for my
> opinion to matter...

Fernando, I'm not checking credentials, I'm curious. Nathaniel has experience with FOSS projects, unlike us first-timers, and I'd like to know what that experience was and what he learned from it. He has also mentioned Graydon Hoare in connection with Rust, and since Graydon was the prime mover in Monotone I'd like to know the story of the project.

Chuck

From fperez.net at gmail.com Tue Apr 24 23:28:58 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Tue, 24 Apr 2012 20:28:58 -0700
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris wrote:
> Fernando, I'm not checking credentials, I'm curious.

Well, at least I think that an inquisitive query about someone's background, phrased like that, can be very easily misread. I can only speak for myself, but I immediately had the impression that you were indeed trying to validate his background as a proxy for the discussion, and suggesting that others had the same curiosity...

Had the question been something more like "Hey Nathaniel, what other projects do you think could inform our current view, maybe from stuff you've done in the past or lists you've lurked on?", I would have a very different reaction. But this sentence:

"""
I admit to a certain curiosity about your own involvement in FOSS projects, and I know I'm not alone in this.
"""

definitely reads to me with a rather dark and unpleasant angle. Upon rereading it again now, I still don't like the tone. I trust you when you indicate that your intent was different; perhaps it's a matter of phrasing, or the fact that English is not my native language and I may miss subtleties of native speakers.

Cheers,
f

From josef.pktd at gmail.com Tue Apr 24 23:40:17 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 24 Apr 2012 23:40:17 -0400
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 11:28 PM, Fernando Perez wrote:
> Well, at least I think that an inquisitive query about someone's
> background, phrased like that, can be very easily misread. I can only
> speak for myself, but I immediately had the impression that you were
> indeed trying to validate his background as a proxy for the
> discussion, and suggesting that others had the same curiosity...

I agree with the interpretation; however, whenever I look at this thread with Google Gmail, I see the first line "If you hang around big FOSS projects, you'll see the word "consensus"".

I'm only hanging around in this neighborhood (9 mailing lists), so I have no idea about big FOSS projects.

Josef

From charlesr.harris at gmail.com Tue Apr 24 23:50:59 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 24 Apr 2012 21:50:59 -0600
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote:
> Well, at least I think that an inquisitive query about someone's
> background, phrased like that, can be very easily misread.

Perhaps it was a bit colored, but even so, I'd like to know some specifics of his experience. Monotone was one of the projects that sprang up as an open alternative after Linus started using Bitkeeper, but that is actually fairly recent (2003 or so), and much of the discussion seems to have been carried on over IRC, rather than a mailing list. I'm guessing that some other projects could have taken place in the 90's, but things have changed so much since then that it is hard to know what was going on in that decade. There was certainly work on the C++ Template library, Linux, Python, and various utilities. But it is hard to know. In any case, I'd guess that Monotone was a fairly tight-knit community, and about 2007 most of the developers left.
I'd guess it was mostly a case of git and mercurial becoming dominant, and possibly they also lost interest in DVCS and moved on to other things. Numpy itself has gone through several of those transitions, and looking back, I think one of the problems was that when Travis left for Enthought he didn't officially hand off maintenance. The whole transition was a bit lucky, with David, Pauli, and myself unofficially continuing the work for the 1.3 and 1.4 releases. At that point I was hoping David could more or less take over, but he graduated, and Pauli would have been an excellent choice, but he took up his graduate studies. Turnover is a problem with open source, and no matter how much discussion there is, if people aren't doing the work the whole thing sort of peters out.

Chuck

From travis at continuum.io Wed Apr 25 00:01:45 2012
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 24 Apr 2012 23:01:45 -0500
Subject: [Numpy-discussion] What is consensus anyway
In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID:

On Apr 24, 2012, at 9:41 PM, Matthew Brett wrote:

> Hi,
>
> On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris wrote:
>>
>> I admit to a certain curiosity about your own involvement in FOSS projects,
>> and I know I'm not alone in this. Google shows several years of discussion
>> on Monotone, but I have no idea what your contributions were outside of
>> documentation. Monotone has sort of died, but that is the luck of the draw.
>> Would you care to comment on the success of the process in that project and
>> what went right or wrong? The other thing I see your name attached to is
>> xpra, which gets good reviews, but that was a personal project and forked
>> after you found more interesting things to work on.
The other thing I see your name attached to is >> xpra, which gets good reviews, but that was a personal project and forked >> after you found more interesting things to work on. > > I'm sorry, but I think this is really not OK. This is an ad-hominem attack [1]. > > I'm not going to join in by justifying Nathaniel's expertise, because > I do not think he should have to do that. I just think it is wrong to > ask people to justify their expertise when we should be discussing the > ideas. > > If I was a new subscriber to this list, and saw that my right to speak > was likely to be audited in public if I disagree, I would be very > reluctant to differ from the majority. I do not think that is what we > want.

I agree. I really hope that was not the intent. I'm willing to believe that Chuck had only curiosity as his motivator, or perhaps is really trying to understand Nathaniel's experience so that he can contrast it with his own and make his own mind up about what the best ways to participate in open source development are. However, when I read this I saw how this could be taken in the wrong way and thought "This is not the right line of questioning." I would make a petition that we try and please just focus on the technical discussion.

> > I think his analysis of the issues in the masked array API is also > reasonable, well-thought out, and well-argued. I think we should > address his arguments. I think we're getting stuck precisely because > you will not do that.

While I agree with your general assessment of this particular email, I don't think that is entirely fair. I'm sure you recognize that Chuck is addressing technical issues and providing possible solutions in many of his emails. It's just that this particular line of questioning seems out of place and not relevant. I do hope this does not continue.

-Travis

From fperez.net at gmail.com Wed Apr 25 00:12:38 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 24 Apr 2012 21:12:38 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Tue, Apr 24, 2012 at 8:50 PM, Charles R Harris wrote: > Turnover is a problem with open source, and no matter how much discussion > there is, if people aren't doing the work the whole thing sort of peters > out. That's very true, and I hope that by building a friendly and welcoming environment, we'll raise the chances of getting sufficient new contributors to help with this issue. For my talk at Euroscipy last year [1] I made some plots collecting git statistics that show how badly loaded most scientific python projects are on the shoulders of very, very few. I really hope we can find ways of spreading the load a bit wider, and everything we can do to make projects more appealing to new contributors is an effort worth making. Cheers, f http://fperez.org/talks/1108_euroscipy_keynote.pdf

From travis at continuum.io Wed Apr 25 00:25:19 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 24 Apr 2012 23:25:19 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: > On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris > wrote: > > Fernando, I'm not checking credentials, I'm curious.
> > Well, at least I think that an inquisitive query about someone's > background, phrased like that, can be very easily misread. I can only > speak for myself, but I immediately had the impression that you were > indeed trying to validate his background as a proxy for the > discussion, and suggesting that others had the same curiosity... > > Had the question been something more like "Hey Nathaniel, what other > projects do you think could inform our current view, maybe from stuff > you've done in the past or lists you've lurked on?", I would have a > very different reaction. But this sentence: > > """ > I admit to a certain curiosity about your own involvement in FOSS > projects, and I know I'm not alone in this. > """ > > definitely reads to me with a rather dark and unpleasant angle. Upon > rereading it again now, I still don't like the tone. I trust you when > you indicate that your intent was different; perhaps it's a matter of > phrasing, or the fact that English is not my native language and I may > miss subtleties of native speakers. > > > Perhaps it was a bit colored, but even so, I'd like to know some specifics of his experience. Monotone was one of the projects that sprang up after Linus started using Bitkeeper as an open alternative, but that is actually fairly recent (2003 or so) and much of the discussion seems to have been carried on over IRC, rather than a mailing list. I'm guessing that some other projects could have taken place in the 90's, but things have changed so much since then that it is hard to know what was going on in that decade. There was certainly work on the C++ Template library, Linux, Python, and various utilities. But it is hard to know. In any case, I'd guess that Monotone was a fairly tight knit community, and about 2007 most of the developers left. I'd guess it was mostly a case of git and mercurial becoming dominant, and possibly they also lost interest in DVCS and moved on to other things. > > Numpy itself has gone through several of those transitions, and looking back, I think one of the problems was that when Travis left for Enthought he didn't officially hand off maintenance. The whole transition was a bit lucky, with David, Pauli, and myself unofficially continuing the work for the 1.3 and 1.4 releases. At that point I was hoping David could more or less take over, but he graduated, and Pauli would have been an excellent choice, but he took up his graduate studies. Turnover is a problem with open source, and no matter how much discussion there is, if people aren't doing the work the whole thing sort of peters out. Thanks for explaining yourself. The tone you used could earlier have been mis-interpreted (though I would hope that people would look at your record of contribution and give you the benefit of the doubt). Your last sentence is very true. In this particular case, however, there is enough interest that the whole thing will not peter out, but there is a strong chance that there will be competing groups with divergent needs and interests vying for how the project should develop. There are many people who rely on NumPy and are concerned about its progress. NumFocus was created to fight for resources to further the whole ecosystem and not just rely on volunteers that are available. I fundamentally do not believe that model can scale. There are, however, ways to keep things open source and allow people to work on NumPy as their day-job. Several companies now exist that benefit from the NumPy code base and will be interested in seeing it grow. 
It is a mis-characterization to imply that I "left the project" without a "hand-off". I never handed off the project because I never left it. I was very busy at Enthought. I will still be busy now. But, NumPy is very important to me and has remained so. I have spent a great deal of mental effort trying to figure out how to contribute to its growth. Yes, I allowed other people to contribute significantly to the project and was very receptive to their pull requests (even when I didn't think it was the most urgent thing or something I actually disagreed with). That should not be interpreted as having "left". NumPy grew because it solved a useful problem and people were willing to tolerate its problems to make a difference by contributing. None of us matter as much to NumPy as the problems it helps people solve. To the degree it does that we are "lucky" to be able to contribute to the project. I hope all NumPy developers continue to be "lucky" enough to have people actually care about the problems NumPy solves now and can solve in the future.

-Travis

> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion

From josef.pktd at gmail.com Wed Apr 25 01:02:10 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Apr 2012 01:02:10 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 12:25 AM, Travis Oliphant wrote: > > On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > > > > On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez > wrote: >> >> On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris >> wrote: >> > Fernando, I'm not checking credentials, I'm curious. >> >> Well, at least I think that an inquisitive query about someone's >> background, phrased like that, can be very easily misread. I can only >> speak for myself, but I immediately had the impression that you were >> indeed trying to validate his background as a proxy for the >> discussion, and suggesting that others had the same curiosity... >> >> Had the question been something more like "Hey Nathaniel, what other >> projects do you think could inform our current view, maybe from stuff >> you've done in the past or lists you've lurked on?", I would have a >> very different reaction. But this sentence: >> >> """ >> I admit to a certain curiosity about your own involvement in FOSS >> projects, and I know I'm not alone in this. >> """ >> >> definitely reads to me with a rather dark and unpleasant angle. Upon >> rereading it again now, I still don't like the tone. I trust you when >> you indicate that your intent was different; perhaps it's a matter of >> phrasing, or the fact that English is not my native language and I may >> miss subtleties of native speakers. >> > > Perhaps it was a bit colored, but even so, I'd like to know some specifics > of his experience. Monotone was one of the projects that sprang up after > Linus started using Bitkeeper as an open alternative, but that is actually > fairly recent (2003 or so) and much of the discussion seems to have been > carried on over IRC, rather than a mailing list.
I'm guessing that some > other projects could have taken place in the 90's, but things have changed > so much since then that it is hard to know what was going on in that decade. > There was certainly work on the C++ Template library, Linux, Python, and > various utilities. But it is hard to know. In any case, I'd guess that > Monotone was a fairly tight knit community, and about 2007 most of the > developers left. I'd guess it was mostly a case of git and mercurial > becoming dominant, and possibly they also lost interest in DVCS and moved on > to other things. > > Numpy itself has gone through several of those transitions, and looking > back, I think one of the problems was that when Travis left for Enthought he > didn't officially hand off maintenance. The whole transition was a bit > lucky, with David, Pauli, and myself unofficially continuing the work for > the 1.3 and 1.4 releases. At that point I was hoping David could more or > less take over, but he graduated, and Pauli would have been an excellent > choice, but he took up his graduate studies. Turnover is a problem with open > source, and no matter how much discussion there is, if people aren't doing > the work the whole thing sort of peters out. > > > Thanks for explaining yourself. The tone you used could earlier have been > mis-interpreted (though I would hope that people would look at your record > of contribution and give you the benefit of the doubt). Your last sentence > is very true. In this particular case, however, there is enough interest > that the whole thing will not peter out, but there is a strong chance that > there will be competing groups with divergent needs and interests vying for > how the project should develop. > > There are many people who rely on NumPy and are concerned about its > progress. NumFocus was created to fight for resources to further the whole > ecosystem and not just rely on volunteers that are available. I > fundamentally do not believe that model can scale. There are, however, > ways to keep things open source and allow people to work on NumPy as their > day-job. Several companies now exist that benefit from the NumPy code base > and will be interested in seeing it grow. > > It is a mis-characterization to imply that I "left the project" without a > "hand-off". I never handed off the project because I never left it. I > was very busy at Enthought. I will still be busy now. But, NumPy is very > important to me and has remained so. I have spent a great deal of mental > effort trying to figure out how to contribute to its growth. Yes, I > allowed other people to contribute significantly to the project and was very > receptive to their pull requests (even when I didn't think it was the most > urgent thing or something I actually disagreed with).

Sorry that I missed this part of numpy history, I always had the impression that numpy is run by a community led by Chuck and the young guys, David, Pauli, Stefan, Pierre; and Robert on the mailing list . (But I came late, and am just a balcony muppet.)

Josef

> > That should not be interpreted as having "left". NumPy grew because it > solved a useful problem and people were willing to tolerate its problems to > make a difference by contributing. None of us matter as much to NumPy as > the problems it helps people solve. To the degree it does that we are > "lucky" to be able to contribute to the project.
I hope all NumPy > developers continue to be "lucky" enough to have people actually care about > the problems NumPy solves now and can solve in the future. > > -Travis > > > > > > > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Wed Apr 25 01:02:55 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Apr 2012 23:02:55 -0600 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> Message-ID: On Tue, Apr 24, 2012 at 10:25 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > > > > On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: > >> On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris >> wrote: >> > Fernando, I'm not checking credentials, I'm curious. >> >> Well, at least I think that an inquisitive query about someone's >> background, phrased like that, can be very easily misread. I can only >> speak for myself, but I immediately had the impression that you were >> indeed trying to validate his background as a proxy for the >> discussion, and suggesting that others had the same curiosity... >> >> Had the question been something more like "Hey Nathaniel, what other >> projects do you think could inform our current view, maybe from stuff >> you've done in the past or lists you've lurked on?", I would have a >> very different reaction. But this sentence: >> >> """ >> I admit to a certain curiosity about your own involvement in FOSS >> projects, and I know I'm not alone in this. >> """ >> >> definitely reads to me with a rather dark and unpleasant angle. Upon >> rereading it again now, I still don't like the tone. I trust you when >> you indicate that your intent was different; perhaps it's a matter of >> phrasing, or the fact that English is not my native language and I may >> miss subtleties of native speakers. >> >> > Perhaps it was a bit colored, but even so, I'd like to know some specifics > of his experience. Monotone was one of the projects that sprang up after > Linus started using Bitkeeper as an open alternative, but that is actually > fairly recent (2003 or so) and much of the discussion seems to have been > carried on over IRC, rather than a mailing list. I'm guessing that some > other projects could have taken place in the 90's, but things have changed > so much since then that it is hard to know what was going on in that > decade. There was certainly work on the C++ Template library, Linux, > Python, and various utilities. But it is hard to know. In any case, I'd > guess that Monotone was a fairly tight knit community, and about 2007 most > of the developers left. I'd guess it was mostly a case of git and mercurial > becoming dominant, and possibly they also lost interest in DVCS and moved > on to other things. > > Numpy itself has gone through several of those transitions, and looking > back, I think one of the problems was that when Travis left for Enthought > he didn't officially hand off maintenance. 
The whole transition was a bit > lucky, with David, Pauli, and myself unofficially continuing the work for > the 1.3 and 1.4 releases. At that point I was hoping David could more or > less take over, but he graduated, and Pauli would have been an excellent > choice, but he took up his graduate studies. Turnover is a problem with > open source, and no matter how much discussion there is, if people aren't > doing the work the whole thing sort of peters out. > > > Thanks for explaining yourself. The tone you used could earlier have > been mis-interpreted (though I would hope that people would look at your > record of contribution and give you the benefit of the doubt). Your last > sentence is very true. In this particular case, however, there is enough > interest that the whole thing will not peter out, but there is a strong > chance that there will be competing groups with divergent needs and > interests vying for how the project should develop. > > There are many people who rely on NumPy and are concerned about its > progress. NumFocus was created to fight for resources to further the > whole ecosystem and not just rely on volunteers that are available. I > fundamentally do not believe that model can scale. There are, however, > ways to keep things open source and allow people to work on NumPy as their > day-job. Several companies now exist that benefit from the NumPy code base > and will be interested in seeing it grow. > > It is a mis-characterization to imply that I "left the project" without a > "hand-off". I never handed off the project because I never left it. I > was very busy at Enthought. I will still be busy now. But, NumPy is very > important to me and has remained so. I have spent a great deal of mental > effort trying to figure out how to contribute to its growth. Yes, I > allowed other people to contribute significantly to the project and was > very receptive to their pull requests (even when I didn't think it was the > most urgent thing or something I actually disagreed with). >

Well then, let's say you should have handed off, because you no longer had the time to devote to it. You made the 1.2.1 release, and after that you weren't really involved until recently. Now I'm sure that you didn't lose interest, but you did lose the time, and I think it would have been better if you had realized that fact up front. As it was, I suggested to David that it was time for a 1.3 release, and we proceeded without permission from the usual suspects, yourself and Jarrod. I think it was pretty fortunate that David was already producing the releases, and I'm very glad that when he later went to work he made sure to hand off that role. Ralf has been a life saver. The timeline for people's involvement is pretty clear if you look at the Ohloh graphs of developer commits. Numpy 1.2.1 came out at the end of Oct, 2008 and you can trace the number of commits thereafter. Yours are pretty thin through 2009 and pretty much peter out completely in 2010. That isn't an insult, it's just the facts, and those are what we need to deal with. Now I'm glad that you are back, and it would be nice if we could get Pauli and David back, but at the moment Mark is the best new developer, and not only that, he attracted other new developers when he was working on numpy. I hope he hangs around for a while. It's pretty interesting how personal events are suggested by the statistics. If I were a writer they would all suggest stories.

> > That should not be interpreted as having "left".
NumPy grew because it > solved a useful problem and people were willing to tolerate its problems to > make a difference by contributing. None of us matter as much to NumPy > as the problems it helps people solve. To the degree it does that we are > "lucky" to be able to contribute to the project. I hope all NumPy > developers continue to be "lucky" enough to have people actually care about > the problems NumPy solves now and can solve in the future. > > All true, but as Fernando confirms, actual people have to do the work. After all, people can solve their problems using Matlab, and I suspect many do. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Apr 25 01:17:12 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 25 Apr 2012 00:17:12 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> Message-ID: <60A68FB0-D439-40C4-9F3D-0B51239E679D@continuum.io> On Apr 25, 2012, at 12:02 AM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 10:25 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > >> >> >> On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: >> On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris >> wrote: >> > Fernando, I'm not checking credentials, I'm curious. >> >> Well, at least I think that an inquisitive query about someone's >> background, phrased like that, can be very easily misread. I can only >> speak for myself, but I immediately had the impression that you were >> indeed trying to validate his background as a proxy for the >> discussion, and suggesting that others had the same curiosity... >> >> Had the question been something more like "Hey Nathaniel, what other >> projects do you think could inform our current view, maybe from stuff >> you've done in the past or lists you've lurked on?", I would have a >> very different reaction. But this sentence: >> >> """ >> I admit to a certain curiosity about your own involvement in FOSS >> projects, and I know I'm not alone in this. >> """ >> >> definitely reads to me with a rather dark and unpleasant angle. Upon >> rereading it again now, I still don't like the tone. I trust you when >> you indicate that your intent was different; perhaps it's a matter of >> phrasing, or the fact that English is not my native language and I may >> miss subtleties of native speakers. >> >> >> Perhaps it was a bit colored, but even so, I'd like to know some specifics of his experience. Monotone was one of the projects that sprang up after Linus started using Bitkeeper as an open alternative, but that is actually fairly recent (2003 or so) and much of the discussion seems to have been carried on over IRC, rather than a mailing list. I'm guessing that some other projects could have taken place in the 90's, but things have changed so much since then that it is hard to know what was going on in that decade. There was certainly work on the C++ Template library, Linux, Python, and various utilities. But it is hard to know. In any case, I'd guess that Monotone was a fairly tight knit community, and about 2007 most of the developers left. I'd guess it was mostly a case of git and mercurial becoming dominant, and possibly they also lost interest in DVCS and moved on to other things. 
>> >> Numpy itself has gone through several of those transitions, and looking back, I think one of the problems was that when Travis left for Enthought he didn't officially hand off maintenance. The whole transition was a bit lucky, with David, Pauli, and myself unofficially continuing the work for the 1.3 and 1.4 releases. At that point I was hoping David could more or less take over, but he graduated, and Pauli would have been an excellent choice, but he took up his graduate studies. Turnover is a problem with open source, and no matter how much discussion there is, if people aren't doing the work the whole thing sort of peters out. > > Thanks for explaining yourself. The tone you used could earlier have been mis-interpreted (though I would hope that people would look at your record of contribution and give you the benefit of the doubt). Your last sentence is very true. In this particular case, however, there is enough interest that the whole thing will not peter out, but there is a strong chance that there will be competing groups with divergent needs and interests vying for how the project should develop. > > There are many people who rely on NumPy and are concerned about its progress. NumFocus was created to fight for resources to further the whole ecosystem and not just rely on volunteers that are available. I fundamentally do not believe that model can scale. There are, however, ways to keep things open source and allow people to work on NumPy as their day-job. Several companies now exist that benefit from the NumPy code base and will be interested in seeing it grow. > > It is a mis-characterization to imply that I "left the project" without a "hand-off". I never handed off the project because I never left it. I was very busy at Enthought. I will still be busy now. But, NumPy is very important to me and has remained so. I have spent a great deal of mental effort trying to figure out how to contribute to its growth. Yes, I allowed other people to contribute significantly to the project and was very receptive to their pull requests (even when I didn't think it was the most urgent thing or something I actually disagreed with). > > Well then, let's say you should have handed off, because you no longer had the time to devote to it. You made the 1.2.1 release, and after that you weren't really involved until recently. Now I'm sure that you didn't lose interest, but you did lose the time, and I think it would have been better if you had realized that fact up front. I will grant you that. Hindsight is 20/20. And that certainly wasn't my intention and so I'm not sure I could have known enough to have that insight. Disappointment in that direction is another story for another time --- it has led directly to my current involvement and interest. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Apr 25 01:18:27 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 24 Apr 2012 22:18:27 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> Message-ID: On Tue, Apr 24, 2012 at 10:02 PM, wrote: > Sorry that I missed this part of numpy history, I always had the > impression that numpy is run by a community led by Chuck and the young > guys, David, Pauli, Stefan, Pierre; and Robert on the mailing list . > (But I came late, and am just a balcony muppet.) 
Travis, when you have a free minute (ha :) it would be very nice if you wrote up a blog post with some of the history from say the 2000s with Numeric, through Numarray and into Numpy. Some of us saw all that happen first hand and know it well, but since most of it simply happened on mailing lists, conferences and assorted meetings, it's actually quite hard to understand that history if you arrive now. It's not really written up anywhere, and nobody is going to read 10 years' worth of email archives :)

Guido a while back wrote a fantastic set of posts on the history of python itself that I've greatly enjoyed: http://python-history.blogspot.com/ something similar for numpy would be nice to have...

Though thinking more about it, perhaps a better alternative could be a 'history of the scipy world' where multiple people could write guest posts about each project they've had a part of. I think something like that could be a lot of fun, and also useful :)

Cheers, f

From gael.varoquaux at normalesup.org Wed Apr 25 01:53:50 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 25 Apr 2012 07:53:50 +0200 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <4F96AE14.20302@crans.org> <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io> Message-ID: <20120425055350.GB3267@phare.normalesup.org> On Tue, Apr 24, 2012 at 05:59:09PM -0600, Charles R Harris wrote: > Travis, if you are playing the BDFL role, then just make the darn decision > and remove the code so we can get on with life. As it is you go back and > forth and that does none of us any good, you're a big guy and you're > rocking the boat. I don't agree with that decision, I'd rather evolve the > code we have, but I'm willing to compromise with your decision in this > matter.

I think that Chuck's point here, in a thread on consensus, is very important: sometimes design discussions stall. If, in such a situation, a BDFL makes a decision, acknowledging that he has no divine power to see the best of all options but needs to move on, it can help the project go forward. As long as nobody's feelings are hurt, a bit of dictatorship well used moves a project forward. Of course, as with any leadership, it only works because we as a community trust the leader.

My 2 cents, Gael

From travis at continuum.io Wed Apr 25 01:56:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 25 Apr 2012 00:56:30 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <50817935-9145-436E-A6F6-B13E31CDC331@continuum.io> Message-ID: <243FE429-E4AF-4B9C-B45B-450D7D465925@continuum.io> I've given several talks on the subject, but I don't think I've ever written a blog-post about it. A reasonable history does exist in the beginning of the "Guide to NumPy" which is still available for free at http://www.tramy.us/numpybook.pdf

-Travis

On Apr 25, 2012, at 12:18 AM, Fernando Perez wrote: > On Tue, Apr 24, 2012 at 10:02 PM, wrote: >> Sorry that I missed this part of numpy history, I always had the >> impression that numpy is run by a community led by Chuck and the young >> guys, David, Pauli, Stefan, Pierre; and Robert on the mailing list . >> (But I came late, and am just a balcony muppet.) > > Travis, when you have a free minute (ha :) it would be very nice if > you wrote up a blog post with some of the history from say the 2000s > with Numeric, through Numarray and into Numpy.
Some of us saw all > that happen first hand and know it well, but since most of it simply > happened on mailing lists, conferences and assorted meetings, it's > actually quite hard to understand that history if you arrive now. > It's not really written up anywhere, and nobody is going to read 10 > years' worth of email archives :) > > Guido a while back wrote a fantastic set of posts on the history of > python itself that I've greatly enjoyed: > > http://python-history.blogspot.com/ > > something similar for numpy would be nice to have... > > Though thinking more about it, perhaps a better alternative could be a > 'history of the scipy world' where multiple people could write guest > posts about each project they've had a part of. I think something > like that could be a lot of fun, and also useful :) > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion

From njs at pobox.com Wed Apr 25 06:07:29 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 25 Apr 2012 11:07:29 +0100 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 4:02 AM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 8:56 PM, Fernando Perez > wrote: >> >> On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris >> wrote: >> > I admit to a certain curiosity about your own involvement in FOSS >> > projects, >> > and I know I'm not alone in this. Google shows several years of >> > discussion >> > on Monotone, but I have no idea what your contributions were >> >> Seriously??? >> >> Please, let's rise above this. We discuss people's opinions *on their >> technical merit alone*, regardless of the background of the person >> presenting them. I don't care if Linus himself shows up on the list >> with a bad idea, it should be shot down; and if someone we'd never >> heard of brings up a valid point, we should respect it. >> >> The day we start "checking credentials at the door" is the day this >> project will die as an open source effort. Or at least I think so, >> but perhaps I don't have enough 'commit credits' in my account for my >> opinion to matter... >> > > Fernando, I'm not checking credentials, I'm curious. Nathaniel has > experience with FOSS projects, unlike us first timers, and I'd like to know > what that experience was and what he learned from it. He has also mentioned > Graydon Hoare in connection with RUST, and since Graydon was the prime mover > in Monotone I'd like to know the story of the project.

Yeah, I don't want to get into resumes and such here, since it'd be hard to avoid turning it into one of those "who has a bigger FOSS" pecking-order contests, which I find both unpleasant and counter-productive. If I've learned anything useful from experience, then I've already tried to summarize it here (and really, experience may or may not guarantee any kind of wisdom). If you want to swap war stories, ask me some day over a $BEVERAGE :-).

After sleeping on it, I was wondering if part of your objection to the consensus stuff is just to the word "veto"?
Would you feel more comfortable if it was phrased like, "the maintainers have noticed that trying to pick and choose on contentious issues tends to come back and bite them, so they've decided that they will not accept changes unless they have reasonable certainty that all substantive objections from the userbase have been worked through and resolved"? It means the same thing in the end, but perhaps makes clearer how the "power" actually works. -- Nathaniel From charlesr.harris at gmail.com Wed Apr 25 08:03:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 25 Apr 2012 06:03:25 -0600 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 4:07 AM, Nathaniel Smith wrote: > On Wed, Apr 25, 2012 at 4:02 AM, Charles R Harris > wrote: > > > > > > On Tue, Apr 24, 2012 at 8:56 PM, Fernando Perez > > wrote: > >> > >> On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris > >> wrote: > >> > I admit to a certain curiosity about your own involvement in FOSS > >> > projects, > >> > and I know I'm not alone in this. Google shows several years of > >> > discussion > >> > on Monotone, but I have no idea what your contributions were > >> > >> Seriously??? > >> > >> Please, let's rise above this. We discuss people's opinions *on their > >> technical merit alone*, regardless of the background of the person > >> presenting them. I don't care if Linus himself shows up on the list > >> with a bad idea, it should be shot down; and if someone we'd never > >> heard of brings up a valid point, we should respect it. > >> > >> The day we start "checking credentials at the door" is the day this > >> project will die as an open source effort. Or at least I think so, > >> but perhaps I don't have enough 'commit credits' in my account for my > >> opinion to matter... > >> > > > > Fernando, I'm not checking credentials, I'm curious. Nathaniel has > > experience with FOSS projects, unlike us first timers, and I'd like to > know > > what that experience was and what he learned from it. He has also > mentioned > > Graydon Hoare in connection with RUST, and since Graydon was the prime > mover > > in Monotone I'd like to know the story of the project. > > Yeah, I don't want to get into resumes and such here, since it'd be > hard to avoid turning it into one of those "whose has a bigger FOSS" > pecking-order contests, which I find both unpleasant and > counter-productive. If I've learned anything useful from experience, > then I've already tried to summarize it here (and really, experience > may or may not guarantee any kind of wisdom). If you want to swap war > stories, ask me some day over a $BEVERAGE :-). > > Well, you have already appealed to the authority of greater experience, so it's a bit late to declare disinterest in the subject ;) I mean, at this point I really would like to see how big your FOSS is. > After sleeping on it, I was wondering if part of your objection to the > consensus stuff is just to the word "veto"? Would you feel more > comfortable if it was phrased like, "the maintainers have noticed that > trying to pick and choose on contentious issues tends to come back and > bite them, so they've decided that they will not accept changes unless > they have reasonable certainty that all substantive objections from > the userbase have been worked through and resolved"? It means the same > thing in the end, but perhaps makes clearer how the "power" actually > works. I don't agree here. 
People work on open source to scratch an itch, so the process of making a contribution needs to be easy. Widespread veto makes it more difficult and instead of opening up the process, closes it down. There is less freedom, not more. That is one of the reasons that the smaller scikits attract people, they have more freedom to do what they want and fewer people to answer to. Scipy also has some of that advantage because there are a number of packages to choose from. The more strict the process and the more people to please, the less appealing the environment becomes. This can be observed in practice and the voluntary nature of FOSS amplifies the effect.

But in the end, someone has to write the code. Steve McConnell (Code Complete) estimates that even in carefully planned projects code construction will take up 60-80 percent of the time and effort. And if the code isn't written, nothing else matters much. That is why people who write code are essential to a project, no amount of structure will take their place. And here again the voluntary nature of FOSS comes into play, folks can't be ordered to do the work. It can be suggested that certain things be done, and the desire to work with the group will lead people to do work they wouldn't consider doing for themselves, but unless they are interested in a particular feature they won't generally be motivated to sit down and devote the effort needed to get it done just because someone else wants it. And they will rightly be offended if anyone demands that they volunteer their work to implement some feature in a particular way. They have to be led there, not pushed.

Chuck

From gael.varoquaux at normalesup.org Wed Apr 25 08:08:23 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 25 Apr 2012 14:08:23 +0200 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: <20120425120823.GC508@phare.normalesup.org> On Wed, Apr 25, 2012 at 06:03:25AM -0600, Charles R Harris wrote: > Well, you have already appealed to the authority of greater experience, so > it's a bit late to declare disinterest in the subject ;) I mean, at this > point I really would like to see how big your FOSS is.

Chuck, I am not sure that this is helpful for the discussion. I think that it is a great discussion to have in real life, as it is one of those in which all participants can learn a lot, but on a mailing list with a wider diffusion, it can very easily drift into a pissing contest.

Gaël

From njs at pobox.com Wed Apr 25 10:01:30 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 25 Apr 2012 15:01:30 +0100 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 1:03 PM, Charles R Harris wrote: > That is one of the reasons that the smaller > scikits attract people, they have more freedom to do what they want and > fewer people to answer to. Scipy also has some of that advantage because > there are a number of packages to choose from. The more strict the process > and the more people to please, the less appealing the environment becomes.

A quick look shows ~100,000 downloads of 1.6.1 via PyPI. SF.net shows >600,000 numpy downloads in the last 12 months. I'm afraid the numpy developers have a lot of people to please, whether they like it or not :-). OTOH I'm still confused at what kind of strictness you're worried about in practice.
Not too many of those people actually show up on the mailing list, and usually the problem is convincing those that *do* show up into actually expressing their needs rather than just assuming that "real developers" must know better. Fernando spoke eloquently in this thread in support of consensus, and IPython doesn't seem to be laboring under a strict process that's driving away developers. AFAICT whole-heartedly adopting the consensus idea would only have actually altered one (!) decision in the project to date, which is not exactly jack-booted as these things go.

- N

From rhattersley at gmail.com Wed Apr 25 11:58:50 2012 From: rhattersley at gmail.com (Richard Hattersley) Date: Wed, 25 Apr 2012 16:58:50 +0100 Subject: [Numpy-discussion] A crazy masked-array thought Message-ID: The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ... Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)? In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass. Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in their own time to accept MAs.

Richard Hattersley
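For concreteness, the relationship today runs the other way round: numpy.ma.MaskedArray is a subclass of ndarray, so isinstance checks written for plain arrays already accept masked ones, but plain-array code can still silently discard the mask, which is exactly the kind of behaviour that forces the sweep Richard describes. A minimal sketch of that behaviour (the variable names are just for the example):

    import numpy as np
    import numpy.ma as ma

    plain = np.arange(4.0)
    masked = ma.masked_array(plain, mask=[False, True, False, False])

    # A MaskedArray passes plain-array type checks...
    print(isinstance(masked, np.ndarray))   # True

    # ...but plain-array code quietly drops the mask:
    print(masked.sum())                     # 5.0, the masked 1.0 is skipped
    print(np.asarray(masked).sum())         # 6.0, the masked value is back

Under Richard's inverted hierarchy, a plain array would instead be a trivially unmasked masked array, so mask-aware code would accept both; the open question is whether the base class could afford to carry the mask machinery.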
From mail.till at gmx.de Wed Apr 25 12:13:56 2012 From: mail.till at gmx.de (Till Stensitzki) Date: Wed, 25 Apr 2012 16:13:56 +0000 (UTC) Subject: [Numpy-discussion] linalg.lstsq Message-ID: Hello, is there a weighted version of linalg.lstsq available? In my case, b is a (N,K) matrix, so I can't use manual scaling of x and b. greetings Till

From travis at continuum.io Wed Apr 25 12:39:32 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 25 Apr 2012 11:39:32 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: > > I don't agree here. People work on open source to scratch an itch, so the process of making a contribution needs to be easy. Widespread veto makes it more difficult and instead of opening up the process, closes it down. There is less freedom, not more. That is one of the reasons that the smaller scikits attract people, they have more freedom to do what they want and fewer people to answer to. Scipy also has some of that advantage because there are a number of packages to choose from. The more strict the process and the more people to please, the less appealing the environment becomes. This can be observed in practice and the voluntary nature of FOSS amplifies the effect.

It is true that it is easier to get developers to contribute to small projects where they can control exactly what happens and not have to appeal to a wider audience to get code changed and committed. This effect is well-illustrated by the emergence of scikits in the presence of SciPy.

However, the idea that "people work on open source to scratch an itch" is incomplete. This is certainly one of the reasons volunteers work on open source. There are many people, however, that work on open source as part of their job. In the particular instance of the missing data support, Mark did much of the work as part of his job. It wasn't just to scratch an itch. So, we should not make assumptions on the basis of this incomplete model. NumPy is far-beyond the mode of a few people scratching an itch. It is in wide-spread use. It is a large project with a great deal of history and a diverse user-community. It needs people full-time to help maintain it. It needs maintainers who listen actively to anyone who will express their concerns cogently. It needs maintainers who recognize that any concern that somebody expresses is typically not a unique view. We cannot expect to find people like that who are just interested in "scratching an itch" and always working for free.

Most projects suffer from lack of feedback. We should be worried about how to get more feedback and input from *just users* and be very sensitive to anyone feeling like their legitimate concerns are not being heard. Most people, rather than express their concerns, will just work-around the problem, write their own stuff, or move on to other languages and approaches.

Your point about somebody writing the code is absolutely true, I would just suggest that the view that FOSS is always just volunteer labor needs to expand. People do work full time on FOSS as part of their job. We need to bring that to NumPy. I know of at least 2 other people besides me who are actively trying to make this possible. At Continuum we offer the opportunity to work on NumPy. We plan to continue this. We are hiring.

In this context, I'm especially interested in making sure that it's not just the developers who get to decide what happens to NumPy. Nathaniel has clarified very well what "veto-power" really means. It's not absolute, it just means that users who write clear arguments get "listened to" actively. It doesn't replace the need for developers with wisdom and understanding of user-experiences, but "active listening" is a useful skill that we could all improve on: http://en.wikipedia.org/wiki/Active_listening A list full of bright, interested, active listeners is the kind of culture we need on this list. It's the kind of attitude we need from maintainers of NumPy.

-Travis

> > But in the end, someone has to write the code. Steve McConnell (Code Complete) estimates that even in carefully planned projects code construction will take up 60-80 percent of the time and effort. And if the code isn't written, nothing else matters much. That is why people who write code are essential to a project, no amount of structure will take their place. And here again the voluntary nature of FOSS comes into play, folks can't be ordered to do the work. It can be suggested that certain things be done, and the desire to work with the group will lead people to do work they wouldn't consider doing for themselves, but unless they are interested in a particular feature they won't generally be motivated to sit down and devote the effort needed to get it done just because someone else wants it. And they will rightly be offended if anyone demands that they volunteer their work to implement some feature in a particular way. They have to be led there, not pushed. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
From jsseabold at gmail.com Wed Apr 25 12:58:02 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 25 Apr 2012 12:58:02 -0400 Subject: [Numpy-discussion] linalg.lstsq In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 12:13 PM, Till Stensitzki wrote: > Hello, > is there a weighted version of linalg.lstsq available? > In my case, b is a (N,K) matrix, so I can't use manual scaling of x and b. > What shape are the weights in this case? I'm not that familiar with problems with an N x K b matrix.

Skipper
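For reference, in the common case Skipper is asking about, one weight per observation (per row), the weighted problem reduces to the ordinary one by scaling the rows of both a and b by the square root of the weights. This works unchanged for a (N, K) right-hand side, since every column of b shares the same row scaling. A minimal sketch (the name wlstsq is only for illustration):

    import numpy as np

    def wlstsq(a, b, w):
        # Minimize sum_i w[i] * ||dot(a[i], x) - b[i]||**2 by scaling
        # each row of a and b with sqrt(w[i]) and calling plain lstsq.
        sw = np.sqrt(np.asarray(w, dtype=float))
        return np.linalg.lstsq(a * sw[:, np.newaxis], b * sw[:, np.newaxis])

    a = np.random.randn(5, 2)
    b = np.random.randn(5, 3)          # K = 3 right-hand sides
    w = np.array([1.0, 1.0, 0.5, 2.0, 1.0])
    x, res, rank, sv = wlstsq(a, b, w)

If the weights instead differ per element of b (varying across the K columns as well), this reduction no longer applies and each column would have to be solved as a separate weighted problem.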
This is different from other, higher-level FOSS projects, which are closer to end user final requirements, where end users might be more compelled to contribute because it's closer to what they're actually doing. For example, I just wrote two enhancements to scipy.interpolate, which were / will be merged recently / soon. Plus, numpy is a lot of C code, and to me (again, as a user) it seems more complicated to contribute because things are not as isolated. Just my 2 ct. Andreas. From alan.isaac at gmail.com Wed Apr 25 17:04:22 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 25 Apr 2012 17:04:22 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: <4F9863EC.8030707@hilboll.de> References: <4F9863EC.8030707@hilboll.de> Message-ID: <4F9866D6.5000003@gmail.com> On 4/25/2012 4:51 PM, Andreas H. wrote: > I would assume that most users see numpy > as infrastructure, they write their own code on top of it. As a normal > user of numpy, I wouldn't know where it would need improvement to suit > my needs because it already does all I need. (Okay, masked arrays are > something which could definitely improve, but that's another story.) > > This is different from other, higher-level FOSS projects, Thank you Andreas. I was debating whether to explain exactly this, to point out that I found Matthew's question inappropriately aggressive, or both. Now I can do both in a flash. But I find I would also like to once again say thank you to the developers, who have given us an amazing piece of software. I would add that I am impressed by the deep respect they show each other even when dealing with hard issues. Alan Isaac Just another grateful user for many years. From hugadams at gwmail.gwu.edu Wed Apr 25 17:16:15 2012 From: hugadams at gwmail.gwu.edu (Adam Hughes) Date: Wed, 25 Apr 2012 17:16:15 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: <4F9866D6.5000003@gmail.com> References: <4F9863EC.8030707@hilboll.de> <4F9866D6.5000003@gmail.com> Message-ID: I too have to agree with Andreas. I have been using Numpy for years in my work, but am not versed in C so I don't even understand what numpy is doing under the hood. I too would only be able to contribute to the code at the python level, or as Andreas said, at improving SciPy packages and other Numpy-based projects. One area that you may be able to get more help from the general user base is with publicity, tutorials and word-of-mouth. I had recently shown Numpy to a friend who was versed in matlab, and he was really impressed because Numpy is easily incorporated into more general Python scripts. I've worked a lot with the Enthought Tool Suite and shown off some of that to my colleagues. They are impressed at the streamlined code-to-visuals process although I don't think they even realize that Numpy is responsible for all the numerics in the program. To this end, I think outreach would be helpful in recruiting new programmers. Once they understand that Numpy does a lot at the C-level and that it is not strictly a Python feature, they may realize its something that they can contribute to. On Wed, Apr 25, 2012 at 5:04 PM, Alan G Isaac wrote: > On 4/25/2012 4:51 PM, Andreas H. wrote: > > I would assume that most users see numpy > > as infrastructure, they write their own code on top of it. As a normal > > user of numpy, I wouldn't know where it would need improvement to suit > > my needs because it already does all I need. 
(Okay, masked arrays are > > something which could definitely improve, but that's another story.) > > > > This is different from other, higher-level FOSS projects, > > > Thank you Andreas. I was debating whether to explain exactly this, > to point out that I found Matthew's question inappropriately aggressive, > or both. Now I can do both in a flash. > > But I find I would also like to once again say thank you to the > developers, who have given us an amazing piece of software. > I would add that I am impressed by the deep respect they show > each other even when dealing with hard issues. > > Alan Isaac > Just another grateful user for many years. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Apr 25 17:35:16 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 25 Apr 2012 16:35:16 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: > > Do you agree that Numpy has not been very successful in recruiting and > maintaining new developers compared to its large user-base? > > Compared to - say - Sympy? > > Why do you think this is? I think it's mostly because it's infrastructure that is a means to an end. I certainly wasn't excited to have to work on NumPy originally, when my main interest was SciPy. I've come to love the interesting plateau that NumPy lives on. But, I think it mostly does the job it is supposed to do. The fact that it is in C is also not very sexy. It is also rather complicated with a lot of inter-related parts. I think NumPy could do much, much more --- but getting there is going to be a challenge of execution and education. You can get to know the code base. It just takes some time and patience. You also have to be comfortable with compilers and building software just to tweak the code. > > Would you consider asking that question directly on list and asking > for the most honest possible answers? I'm always interested in honest answers and welcome any sincere perspective. -Travis From matthew.brett at gmail.com Wed Apr 25 17:54:42 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 25 Apr 2012 14:54:42 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: Hi, On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant wrote: >> >> Do you agree that Numpy has not been very successful in recruiting and >> maintaining new developers compared to its large user-base? >> >> Compared to - say - Sympy? >> >> Why do you think this is? > > I think it's mostly because it's infrastructure that is a means to an end. ? I certainly wasn't excited to have to work on NumPy originally, when my main interest was SciPy. ? ?I've come to love the interesting plateau that NumPy lives on. ? ?But, I think it mostly does the job it is supposed to do. ? ? The fact that it is in C is also not very sexy. ? It is also rather complicated with a lot of inter-related parts. > > I think NumPy could do much, much more --- but getting there is going to be a challenge of execution and education. > > You can get to know the code base. ?It just takes some time and patience. ? You also have to be comfortable with compilers and building software just to tweak the code. 
> > >> >> Would you consider asking that question directly on list and asking >> for the most honest possible answers? > > I'm always interested in honest answers and welcome any sincere perspective. Of course, there are potential explanations: 1) Numpy is too low-level for most people 2) The C code is too complicated 3) It's fine already, more or less are some obvious ones. I would say there are the easy answers. But of course, the easy answer may not be the right answer. It may not be easy to get right answer [1]. As you can see from Alan Isaac's reply on this thread, even asking the question can be taken as being in bad faith. In that situation, I think you'll find it hard to get sincere replies. Best, Matthew [1] http://en.wikipedia.org/wiki/Good_to_Great From josef.pktd at gmail.com Wed Apr 25 18:24:07 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Apr 2012 18:24:07 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 5:54 PM, Matthew Brett wrote: > Hi, > > On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant wrote: >>> >>> Do you agree that Numpy has not been very successful in recruiting and >>> maintaining new developers compared to its large user-base? >>> >>> Compared to - say - Sympy? >>> >>> Why do you think this is? >> >> I think it's mostly because it's infrastructure that is a means to an end. ? I certainly wasn't excited to have to work on NumPy originally, when my main interest was SciPy. ? ?I've come to love the interesting plateau that NumPy lives on. ? ?But, I think it mostly does the job it is supposed to do. ? ? The fact that it is in C is also not very sexy. ? It is also rather complicated with a lot of inter-related parts. >> >> I think NumPy could do much, much more --- but getting there is going to be a challenge of execution and education. >> >> You can get to know the code base. ?It just takes some time and patience. ? You also have to be comfortable with compilers and building software just to tweak the code. >> >> >>> >>> Would you consider asking that question directly on list and asking >>> for the most honest possible answers? >> >> I'm always interested in honest answers and welcome any sincere perspective. > > Of course, there are potential explanations: > > 1) Numpy is too low-level for most people > 2) The C code is too complicated > 3) It's fine already, more or less > > are some obvious ones. I would say there are the easy answers. But of > course, the easy answer may not be the right answer. It may not be > easy to get right answer [1]. ? As you can see from Alan Isaac's reply > on this thread, even asking the question can be taken as being in bad > faith. ?In that situation, I think you'll find it hard to get sincere > replies. I don't see why this shouldn't be the sincere replies, I think these easy answers are also the right answer for most people. maybe I would add 4) writing code for a few hundred thousand users is a big responsibility and a bit scary Except for a few "core" c developers, most contributors contribute to parts of numpy, best example Pierre and masked arrays, or specific functions. Life goes on for most developers in the application areas, I guess. For example I'm very glad about the time that Pauli is spending on scipy. 
numpy is "great" [1] Josef > > Best, > > Matthew > > [1] http://en.wikipedia.org/wiki/Good_to_Great [1] http://sourceforge.net/projects/numpy/files/stats/timeline?dates=2000-01-11+to+2012-04-25 http://qa.debian.org/popcon.php?package=python-numpy > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Wed Apr 25 18:29:49 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 25 Apr 2012 18:29:49 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wednesday, April 25, 2012, Matthew Brett wrote: > Hi, > > On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant > > wrote: > >> > >> Do you agree that Numpy has not been very successful in recruiting and > >> maintaining new developers compared to its large user-base? > >> > >> Compared to - say - Sympy? > >> > >> Why do you think this is? > > > > I think it's mostly because it's infrastructure that is a means to an > end. I certainly wasn't excited to have to work on NumPy originally, when > my main interest was SciPy. I've come to love the interesting plateau > that NumPy lives on. But, I think it mostly does the job it is supposed > to do. The fact that it is in C is also not very sexy. It is also > rather complicated with a lot of inter-related parts. > > > > I think NumPy could do much, much more --- but getting there is going to > be a challenge of execution and education. > > > > You can get to know the code base. It just takes some time and > patience. You also have to be comfortable with compilers and building > software just to tweak the code. > > > > > >> > >> Would you consider asking that question directly on list and asking > >> for the most honest possible answers? > > > > I'm always interested in honest answers and welcome any sincere > perspective. > > Of course, there are potential explanations: > > 1) Numpy is too low-level for most people > 2) The C code is too complicated > 3) It's fine already, more or less > > are some obvious ones. I would say there are the easy answers. But of > course, the easy answer may not be the right answer. It may not be > easy to get right answer [1]. As you can see from Alan Isaac's reply > on this thread, even asking the question can be taken as being in bad > faith. In that situation, I think you'll find it hard to get sincere > replies. As with anything, the phrasing of a question makes a world of a difference with regards to replies. Ask any pollster. When phrased correctly, I would not have any doubt about the sincerely of replies, and I would not worry about previewed hostility -- when phrased correctly. As the questioner, the onus is upon you to gauge the community and adjust the question appropriately. I think the fact that we engage in these discussions show that we value and care about each others perceptions and opinions with regards to numpy. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Wed Apr 25 18:49:29 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 25 Apr 2012 15:49:29 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: Hi, On Wed, Apr 25, 2012 at 1:35 PM, Matthew Brett wrote: > Hi, > > On Wed, Apr 25, 2012 at 9:39 AM, Travis Oliphant wrote: >> >> I don't agree here. People work on open source to scratch an itch, so the >> process of making a contribution needs to be easy. Widespread veto makes it >> more difficult and instead of opening up the process, closes it down. There >> is less freedom, not more. That is one of the reasons that the smaller >> scikits attract people, they have more freedom to do what they want and >> fewer people to answer to. Scipy also has some of that advantage because >> there are a number of packages to choose from. The more strict the process >> and the more people to please, the less appealing the environment becomes. >> This can be observed in practice and the voluntary nature of FOSS amplifies >> the effect. >> >> >> It is true that it is easier to get developers to contribute to small >> projects where they can control exactly what happens and not have to appeal >> to a wider audience to get code changed and committed. ? This effect is >> well-illustrated by the emergence of scikits in the presence of SciPy. >> >> However, the idea that "people work on open source to scratch an itch" is >> incomplete. > > Do you agree that Numpy has not been very successful in recruiting and > maintaining new developers compared to its large user-base? > > Compared to - say - Sympy? > > Why do you think this is? > > Would you consider asking that question directly on list and asking > for the most honest possible answers? Aha - I now realize that I was reading too quickly under the influence (again) of too much caffeine, and missed this part of Travis' email: > In this context, I'm especially interested > in making sure that it's not just the developers who get to decide what > happens to NumPy. Nathaniel has clarified very well what "veto-power" > really means. It's not absolute, it just means that users who write clear > arguments get "listened to" actively. It doesn't replace the need for > developers with wisdom and understanding of user-experiences, but "active > listening" is a useful skill that we could all improve on: > http://en.wikipedia.org/wiki/Active_listening A list full of bright, > interested, active listeners is the kind of culture we need on this list. > It's the kind of attitude we need from maintainers of NumPy. which mostly answers my worry, and I apologize for pushing on an open door. See you, Matthew From cournape at gmail.com Wed Apr 25 19:06:04 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 26 Apr 2012 00:06:04 +0100 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 10:54 PM, Matthew Brett wrote: > Hi, > > On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant > wrote: > >> > >> Do you agree that Numpy has not been very successful in recruiting and > >> maintaining new developers compared to its large user-base? > >> > >> Compared to - say - Sympy? > >> > >> Why do you think this is? > > > > I think it's mostly because it's infrastructure that is a means to an > end. 
I certainly wasn't excited to have to work on NumPy originally, when > my main interest was SciPy. I've come to love the interesting plateau > that NumPy lives on. But, I think it mostly does the job it is supposed > to do. The fact that it is in C is also not very sexy. It is also > rather complicated with a lot of inter-related parts. > > I think NumPy could do much, much more --- but getting there is going to > be a challenge of execution and education. > > You can get to know the code base. It just takes some time and > patience. You also have to be comfortable with compilers and building > software just to tweak the code. > > > >> > >> Would you consider asking that question directly on list and asking > >> for the most honest possible answers? > > > > I'm always interested in honest answers and welcome any sincere > perspective. > > Of course, there are potential explanations: > > 1) Numpy is too low-level for most people > 2) The C code is too complicated > 3) It's fine already, more or less > > are some obvious ones. I would say there are the easy answers. But of > course, the easy answer may not be the right answer. It may not be > easy to get right answer [1]. As you can see from Alan Isaac's reply > on this thread, even asking the question can be taken as being in bad > faith. In that situation, I think you'll find it hard to get sincere > replies. > While I don't think jumping into NumPy C code is as difficult as some people make it out to be, I think numpy reaped most of the low-hanging fruit, and is now at a stage where it requires massive investment to get significantly better. I would suggest a different question, whose answer may serve as a proxy to uncover the lack of contributions: what needs to be done in NumPy, and how can we make it simpler for newcomers? Here is an incomplete, unashamedly biased list:
- Fewer dependencies on CPython internals
- Allow for 3rd parties to extend numpy at the C level in more fundamental ways (e.g. I wish something like a half-float dtype could be more easily developed out of tree)
- Separate memory representation from higher level representation (slicing, broadcasting, etc.), to allow arrays to "sit" on non-contiguous memory areas, etc. (see the sketch below)
- Test and performance infrastructure so we can track our evolution, get coverage of our C code, etc.
- Fix bugs
- Better integration with 3rd party on-disk storage (database, etc.)
None of that is particularly simple nor has a fast learning curve, except for fixing bugs and maybe some of the infrastructure. I think most of this is necessary for the things Travis talked about a few weeks ago. What could make contributions easier:
- different levels of C API documentation (still lacking anything besides reference)
- ways to detect early when we break ABI, slightly more obscure platforms (we need good CI, ways to publish binaries that people can easily test, etc...)
- improve infrastructure so that we can focus on the things we want to work on (improve the dire situation with bug tracking, etc.)
Also, lots of people just don't know/want to know C. But people with say web skills would be welcome: we have a website that could use some help. So -------------- next part -------------- An HTML attachment was scrubbed...
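A toy illustration of the "separate memory representation from higher level representation" item in the list above (this shows existing NumPy behavior only, not a proposal): shape and strides are already pure bookkeeping over a flat buffer, which is why slicing never copies data.

import numpy as np
from numpy.lib.stride_tricks import as_strided

buf = np.arange(12)
# View the flat buffer as 3x4 purely through shape/strides bookkeeping;
# no element is copied or moved.
view = as_strided(buf, shape=(3, 4),
                  strides=(4 * buf.itemsize, buf.itemsize))
# Slicing only rewrites that bookkeeping as well: taking every other
# column doubles the column stride but still aliases the same memory.
sub = view[:, ::2]
assert sub.strides == (4 * buf.itemsize, 2 * buf.itemsize)

Decoupling the rest of the high-level machinery from the assumption of a single contiguous buffer is the hard part being pointed at here.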
URL: From matthew.brett at gmail.com Wed Apr 25 19:08:51 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 25 Apr 2012 16:08:51 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: Hi, On Wed, Apr 25, 2012 at 3:24 PM, wrote: > On Wed, Apr 25, 2012 at 5:54 PM, Matthew Brett wrote: >> Hi, >> >> On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant wrote: >>>> >>>> Do you agree that Numpy has not been very successful in recruiting and >>>> maintaining new developers compared to its large user-base? >>>> >>>> Compared to - say - Sympy? >>>> >>>> Why do you think this is? >>> >>> I think it's mostly because it's infrastructure that is a means to an end. ? I certainly wasn't excited to have to work on NumPy originally, when my main interest was SciPy. ? ?I've come to love the interesting plateau that NumPy lives on. ? ?But, I think it mostly does the job it is supposed to do. ? ? The fact that it is in C is also not very sexy. ? It is also rather complicated with a lot of inter-related parts. >>> >>> I think NumPy could do much, much more --- but getting there is going to be a challenge of execution and education. >>> >>> You can get to know the code base. ?It just takes some time and patience. ? You also have to be comfortable with compilers and building software just to tweak the code. >>> >>> >>>> >>>> Would you consider asking that question directly on list and asking >>>> for the most honest possible answers? >>> >>> I'm always interested in honest answers and welcome any sincere perspective. >> >> Of course, there are potential explanations: >> >> 1) Numpy is too low-level for most people >> 2) The C code is too complicated >> 3) It's fine already, more or less >> >> are some obvious ones. I would say there are the easy answers. But of >> course, the easy answer may not be the right answer. It may not be >> easy to get right answer [1]. ? As you can see from Alan Isaac's reply >> on this thread, even asking the question can be taken as being in bad >> faith. ?In that situation, I think you'll find it hard to get sincere >> replies. > > I don't see why this shouldn't be the sincere replies, I think these > easy answers are also the right answer for most people. I wasn't saying these replies are not sincere, of course they are factors. I have heard other people give reasons why they didn't enjoy numpy development much, but I can't speak for them, only for me. I have done some numpy development, but very little. I've done a moderate amount of scipy development. I have considered doing more numpy development, in particular, I did want to do some work on the longdouble parts of numpy. Part of the reason I didn't do this was because, when I raised the question on the list, it did not seem there was much interest in a change, or even a real discussion. Partly from the masked array discussions, but not only, it seemed that the process of making decisions was not clear, and there seemed to be as many views about how this was done as there were developers. I suppose I'd summarize the atmosphere, as I have have felt it, as being that numpy was owned by someone else, and I wasn't quite sure who that was, but I was fairly sure it wasn't me. On the other hand, in some projects at least - of which Sympy is the most obvious example, I think it's easy to feel that all of us own Sympy (and I've only made one commit to Sympy, and that of someone else's idea). 
Adding to that, it does seem to me that the atmosphere on this list gets ugly sometimes. In particular it seems to me that there's a sort of conformity that starts to emerge in which people feel it is necessary to praise or criticize people, but not the arguments. I suppose that is because there was a long time during which Travis was not on the list to model what kind of discussion he wanted. I'm glad that has changed now. The reason I keep returning to process, even though it is 'non-technical', is because it seems to me that the atmosphere that I'm describing will have the strong effect of discouraging enthusiastic developers. It certainly discourages me. I don't think open-source software is just developers scratching an itch, I think it's about community, and the pleasure of working with people you like and trust, to do something you think is important. If I've made that harder, then I am sorry, and I'm very happy to hear why that is, and how I can help. Best, Matthew From josef.pktd at gmail.com Wed Apr 25 20:18:09 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Apr 2012 20:18:09 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 7:08 PM, Matthew Brett wrote: > Hi, > > On Wed, Apr 25, 2012 at 3:24 PM, josef.pktd at gmail.com wrote: >> On Wed, Apr 25, 2012 at 5:54 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant wrote: >>>>> >>>>> Do you agree that Numpy has not been very successful in recruiting and >>>>> maintaining new developers compared to its large user-base? >>>>> >>>>> Compared to - say - Sympy? >>>>> >>>>> Why do you think this is? >>>> >>>> I think it's mostly because it's infrastructure that is a means to an end. I certainly wasn't excited to have to work on NumPy originally, when my main interest was SciPy. I've come to love the interesting plateau that NumPy lives on. But, I think it mostly does the job it is supposed to do. The fact that it is in C is also not very sexy. It is also rather complicated with a lot of inter-related parts. >>>> >>>> I think NumPy could do much, much more --- but getting there is going to be a challenge of execution and education. >>>> >>>> You can get to know the code base. It just takes some time and patience. You also have to be comfortable with compilers and building software just to tweak the code.
> > I have heard other people give reasons why they didn't enjoy numpy > development much, but I can't speak for them, only for me. > > I have done some numpy development, but very little. > > I've done a moderate amount of scipy development. > > I have considered doing more numpy development, in particular, I did > want to do some work on the longdouble parts of numpy. > > Part of the reason I didn't do this was because, when I raised the > question on the list, it did not seem there was much interest in a > change, or even a real discussion. > > Partly from the masked array discussions, but not only, it seemed that > the process of making decisions was not clear, and there seemed to be > as many views about how this was done as there were developers. > > I suppose I'd summarize the atmosphere, as I have have felt it, as > being that numpy was owned by someone else, and I wasn't quite sure > who that was, but I was fairly sure it wasn't me. ? On the other hand, > in some projects at least - of which Sympy is the most obvious > example, I think it's easy to feel that all of us own Sympy (and I've > only made one commit to Sympy, and that of someone else's idea). > > Adding to that, it does seem to me that the atmosphere on this list > get ugly sometimes. ?In particular it seems to me that there's a sort > of conformity that starts to emerge in which people feel it is > necessary to praise or criticize people, but not the arguments. ? I > suppose that is because there was a long time during which Travis was > not on the list to model what kind of discussion he wanted. ?I'm glad > that has changed now. > > The reason I keep returning to process, even though it is > 'non-technical' - is because it seems to me that the atmosphere that > I'm describing will have the strong effect of discouraging > enthusiastic developers. ?It certainly discourages me. ?I don't think > open-source software is just developers scratching an itch, I think > it's about community, and the pleasure of working with people you like > and trust, to do something you think is important. Except for the big changes like NA and datetime, I think the debate is pretty boring. The main problem that I see for discussing technical issues is whether there are many developers really interested in commenting on code and coding. I think it mostly comes down to the discussion on tickets or pull requests. First my own experience with scipy.stats. Most of the time when I was cleaning up scipy.stats, I was alone, except for some helpful comments by Robert. My "itch" was that the bugs in scipy.stats were bugging me, and I just kept working and committing without code review until the bugs that I thought urgent were gone. Now, with Warren and Ralf also working on scipy.stats it is a lot more fun, since there is actually a "regular" community of 3 developers. My impression (since I only pay partially attention to this) is that Pierre's work on np.ma masked arrays and David's first cleanup of the c source and build issues were pretty lonely jobs. Just get the work done with whatever motivation. Chuck is writing np.polynomial, but since Anne left there is very little discussion on the details. My impression is also that the numpy community is much more a user than a developer community. The big successful community project (with some financial support) was the documentation improvement which made it possible for many users to contribute. (The user community is moving to stackoverflow.) 
So, I don't think it will be easy or even possible to get the same enthusiasm for numpy or scipy as for building a great new package such as sympy or scikit-learn. But scipy still gets one module after another cleaned up, and new enhancements; often, though, it's a contribution to make a function available to other users, not because we are part of the scipy community but because we are part of the SciPy community. slightly skewed view from a numpy user Josef > > If I've made that harder, then I am sorry, and I'm very happy to hear > why that is, and how I can help. > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Wed Apr 25 21:11:15 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 25 Apr 2012 20:11:15 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Apr 25, 2012, at 7:18 PM, josef.pktd at gmail.com wrote: > > Except for the big changes like NA and datetime, I think the debate is > pretty boring. > The main problem that I see for discussing technical issues is whether > there are many > developers really interested in commenting on code and coding. > I think it mostly comes down to the discussion on tickets or pull requests. This is a very insightful comment. Github has been a great thing for both NumPy and SciPy. However, it has changed the community feel for many because these pull request discussions don't happen on this list. You have to comment on a pull request to get notified of future comments or changes. The process is actually pretty nice, but it does mean you can't just hang out watching this list. You have to look at the pull requests and get involved there. It would be nice if every pull request created a message to this list. Is that even possible? -Travis From ben.root at ou.edu Wed Apr 25 21:28:57 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 25 Apr 2012 21:28:57 -0400 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wednesday, April 25, 2012, Travis Oliphant wrote: > > On Apr 25, 2012, at 7:18 PM, josef.pktd at gmail.com wrote: > > > > > Except for the big changes like NA and datetime, I think the debate is > > pretty boring. > > The main problem that I see for discussing technical issues is whether > > there are many > > developers really interested in commenting on code and coding. > > I think it mostly comes down to the discussion on tickets or pull > requests. > > This is a very insightful comment. Github has been a great thing for > both NumPy and SciPy. However, it has changed the community feel for many > because these pull request discussions don't happen on this list. > > You have to comment on a pull request to get notified of future comments > or changes. The process is actually pretty nice, but it does mean you > can't just hang out watching this list. You have to look at the pull > requests and get involved there. > > It would be nice if every pull request created a message to this list. > Is that even possible? > > -Travis > > This has been a concern of mine for matplotlib as well. The closest I can come is to set up an RSS feed, but all the titles are PR # and an action, so I lose track of which ones I want to view.
All devs get an initial email for each PR, but I can't figure out how to get that down to the public list and it is hard to know if another dev took care of the PR or if it is just waiting. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Wed Apr 25 22:10:21 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Wed, 25 Apr 2012 21:10:21 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: <4F98AE8D.6000201@creativetrax.com> On 4/25/12 8:11 PM, Travis Oliphant wrote: > > On Apr 25, 2012, at 7:18 PM, josef.pktd at gmail.com wrote: > >> >> Except for the big changes like NA and datetime, I think the debate is >> pretty boring. >> The main problem that I see for discussing technical issues is whether >> there are many >> developers really interested in commenting on code and coding. >> I think it mostly comes down to the discussion on tickets or pull requests. > > This is a very insightful comment. Github has been a great thing for both NumPy and SciPy. However, it has changed the community feel for many because these pull request discussions don't happen on this list. > > You have to comment on a pull request to get notified of future comments or changes. The process is actually pretty nice, but it does mean you can't just hang out watching this list. You have to look at the pull requests and get involved there. > > It would be nice if every pull request created a message to this list. Is that even possible? Sure. Github has a pretty extensive hook system that can notify (via hitting a URL) about lots of events. https://github.com/blog/964-all-of-the-hooks http://developer.github.com/v3/repos/hooks/ I haven't actually used it (just read the docs), so I may be mistaken... Thanks, Jason From punchagan at gmail.com Thu Apr 26 00:08:07 2012 From: punchagan at gmail.com (Puneeth Chaganti) Date: Thu, 26 Apr 2012 09:38:07 +0530 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Thu, Apr 26, 2012 at 6:41 AM, Travis Oliphant wrote: [snip] > > It would be nice if every pull request created a message to this list. Is that even possible? That is definitely possible and shouldn't be too hard to do, like Jason said. But that can potentially cause some confusion, with some of the discussion starting off in the mailing list, and some of the discussion happening on the pull-request itself. Are my concerns justified? -- Puneeth From jason-sage at creativetrax.com Thu Apr 26 00:31:38 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Wed, 25 Apr 2012 23:31:38 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: <4F98CFAA.6020205@creativetrax.com> On 4/25/12 11:08 PM, Puneeth Chaganti wrote: > On Thu, Apr 26, 2012 at 6:41 AM, Travis Oliphant wrote: > [snip] >> >> It would be nice if every pull request created a message to this list. Is that even possible? > > That is definitely possible and shouldn't be too hard to do, like > Jason said. But that can potentially cause some confusion, with some > of the discussion starting off in the mailing list, and some of the > discussion happening on the pull-request itself. Are my concerns > justified? It wouldn't be too hard to have mailing list replies sent back to the pull request as comments (again, using the github API).
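To make the hook discussion concrete, here is a minimal sketch of the read-only half, using just the public GitHub v3 pulls endpoint documented at the links Jason gives above (Python 2 idiom, as used on the list at the time; the digest formatting and the idea of cron-mailing it to the list are our own assumption, not an existing tool):

import json
import urllib2

# Fetch the open pull requests for numpy from the public GitHub v3 API.
url = 'https://api.github.com/repos/numpy/numpy/pulls'
pulls = json.load(urllib2.urlopen(url))

# Build a plain-text digest that a cron job could mail to the list.
lines = ['#%(number)d %(title)s <%(html_url)s>' % pr for pr in pulls]
print '\n'.join(lines)

The event-driven direction (a pull-request hook hitting a small web service that forwards to the list) would avoid polling, but needs a host somewhere to receive the callbacks.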
Already, if you're on a ticket, you can just reply to a comment email and the reply is put as a comment in the pull request. Jason From srean.list at gmail.com Thu Apr 26 00:37:53 2012 From: srean.list at gmail.com (srean) Date: Wed, 25 Apr 2012 23:37:53 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 11:08 PM, Puneeth Chaganti wrote: > On Thu, Apr 26, 2012 at 6:41 AM, Travis Oliphant wrote: > [snip] >> >> It would be nice if every pull request created a message to this list. Is that even possible? > > That is definitely possible and shouldn't be too hard to do, like > Jason said. But that can potentially cause some confusion, with some > of the discussion starting off in the mailing list, and some of the > discussion happening on the pull-request itself. Are my concerns > justified? Related issue: some projects have a user's list and a devel list. It might be worth (re?)considering that option. They have their pros and cons but I think I like the idea of a devel list and a separate "help wanted" list. Something else that might be helpful for contentious threads is a stack-overflowesque system where readers can vote up responses of others. Sometimes just an "i agree" or "i disagree" goes a long way, especially when you have many lurkers. On something else that was brought up: I do not consider myself competent/prepared enough to take on development, but it is not the case that I have _never_ felt the temptation. What I have found intimidating and stymieing is the perceived politics over development issues. The two places where I have felt this are a) on contentious threads on the list and b) what seems like legitimate patch tickets on trac that seem to be languishing for no compelling technical reason. I would be hardpressed to quote specifics, but I have encountered this feeling a few times. For my case it would not have mattered, because I doubt I would have contributed anything useful. However, it might be the case that more competent lurkers might have felt the same way. The possibility of a patch relegated semipermanently to trac, or the possibility of getting caught up in the politics is a bit of a disincentive. This is just an honest perception/observation. I am more of a get on with it, get the code out and the rest will resolve itself eventually kind of a guy, thus long political/philosophical/epistemic threads distance me. I know there are legitimate reasons to have these discussions. But it seems to me that they get a bit too wordy here sometimes. My 10E-2. -- srean From fperez.net at gmail.com Thu Apr 26 00:48:14 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 25 Apr 2012 21:48:14 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Wed, Apr 25, 2012 at 6:28 PM, Benjamin Root wrote: >> It would be nice if every pull request created a message to this list. >> Is that even possible? >> >> -Travis >> > > This has been a concern of mine for matplotlib as well. The closest I can > come is to set up an RSS feed, but all the titles are PR # and an action, so > I lose track of which ones I want to view. Same here for IPython. If anybody figures out a clean solution, please advertise it! I think a bunch of us want the same thing...
Cheers, f From soucoupevolante at yahoo.com Thu Apr 26 07:37:53 2012 From: soucoupevolante at yahoo.com (Andre Martel) Date: Thu, 26 Apr 2012 04:37:53 -0700 (PDT) Subject: [Numpy-discussion] (cube max and reduction) In-Reply-To: References: <1334945745.69846.YahooMailNeo@web163104.mail.bf1.yahoo.com> Message-ID: <1335440273.41723.YahooMailNeo@web163102.mail.bf1.yahoo.com> ________________________________ From: eat To: Discussion of Numerical Python Sent: Sunday, April 22, 2012 5:54 AM Subject: Re: [Numpy-discussion] (no subject) Thanks, this was useful. For a large cube, though, I had to loop through the indices of the maxima:

for i in np.arange(0, ndx.shape[0]):
    C_out[i:, ndx == i] = C_in[(i+1):, ndx == i]

Would there be a way to speed this up (no loop)? On Fri, Apr 20, 2012 at 9:15 PM, Andre Martel wrote: What would be the best way to remove the maximum from a cube and "collapse" the remaining elements along the z-axis?
>For example, I want to reduce Cube to NewCube:
>
>>>> Cube
>array([[[ 13,   2,   3,  42],
>        [  5, 100,   7,   8],
>        [  9,   1,  11,  12]],
>
>       [[ 25,   4,  15,   1],
>        [ 17,  30,   9,  20],
>        [ 21,   2,  23,  24]],
>
>       [[  1,   2,  27,  28],
>        [ 29,  18,  31,  32],
>        [ -1,   3,  35,   4]]])
>
>NewCube
>
>array([[[ 13,   2,   3,   1],
>        [  5,  30,   7,   8],
>        [  9,   1,  11,  12]],
>
>       [[  1,   2,  15,  28],
>        [ 17,  18,   9,  20],
>        [ -1,   2,  23,   4]]])
>
>I tried with argmax() and then roll() and delete() but these
>all work on 1-D arrays only. Thanks.
Perhaps it would be more straightforward to process via 2D-arrays, like:

In []: C
Out[]:
array([[[ 13,   2,   3,  42],
        [  5, 100,   7,   8],
        [  9,   1,  11,  12]],

       [[ 25,   4,  15,   1],
        [ 17,  30,   9,  20],
        [ 21,   2,  23,  24]],

       [[  1,   2,  27,  28],
        [ 29,  18,  31,  32],
        [ -1,   3,  35,   4]]])

In []: C_in = C.reshape(3, -1).copy()
In []: ndx = C_in.argmax(0)
In []: C_out = C_in[:2, :]
In []: C_out[:, ndx == 0] = C_in[1:, ndx == 0]
In []: C_out[1, ndx == 1] = C_in[2, ndx == 1]
In []: C_out.reshape(2, 3, 4)
Out[]:
array([[[13,  2,  3,  1],
        [ 5, 30,  7,  8],
        [ 9,  1, 11, 12]],

       [[ 1,  2, 15, 28],
        [17, 18,  9, 20],
        [-1,  2, 23,  4]]])

My 2 cents, -eat >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Thu Apr 26 12:31:29 2012 From: pmhobson at gmail.com (Paul Hobson) Date: Thu, 26 Apr 2012 09:31:29 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: We're kind of drifting again here, but... Remember when all this discussion happened on usenet? Perhaps we're in yet another awkward transition period and soon all email list-type discussions will be on Github, Bitbucket, StackOverflow (e.g. pandas), etc. There's advantages and disadvantages to any sort of discussion paradigm, but I can imagine a future version of Github where each project has a tab for a StackOverflow-esque forum. As a user, that all sounds pretty appealing to me.
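On the cube-reduction question above: the per-layer loop can be replaced by one fancy-indexing step that builds, for every column of the flattened cube, the list of layer indices with the argmax layer skipped. A sketch (the helper name drop_max is ours, for illustration only):

import numpy as np

def drop_max(C):
    # Remove the per-position maximum along axis 0 of an (n, ...) cube,
    # keeping the remaining elements in their original layer order.
    n = C.shape[0]
    C2 = C.reshape(n, -1)
    ndx = C2.argmax(0)                      # which layer holds each maximum
    rows = np.arange(n - 1)[:, np.newaxis]  # candidate output layers
    rows = rows + (rows >= ndx)             # step over the argmax layer per column
    out = C2[rows, np.arange(C2.shape[1])]
    return out.reshape((n - 1,) + C.shape[1:])

Applied to the Cube above this reproduces NewCube, with no Python-level loop over the layers.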
But this is all just speculation and conjecture... -paul On Wed, Apr 25, 2012 at 9:48 PM, Fernando Perez wrote: > On Wed, Apr 25, 2012 at 6:28 PM, Benjamin Root wrote: >>> It would be nice if every pull request created a message to this list. >>> Is that even possible? >>> >>> -Travis >>> >> >> This has been a concern of mine for matplotlib as well. The closest I can >> come is to set up an RSS feed, but all the titles are PR # and an action, so >> I lose track of which ones I want to view. > > Same here for IPython. If anybody figures out a clean solution, > please advertise it! I think a bunch of us want the same thing... > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Thu Apr 26 13:01:02 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 26 Apr 2012 19:01:02 +0200 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Thu, Apr 26, 2012 at 6:37 AM, srean wrote: > > On something else that was brought up: I do not consider myself > competent/prepared enough to take on development, but it is not the > case that I have _never_ felt the temptation. What I have found > intimidating and stymieing is the perceived politics over development > issues. The two places where I have felt this are a) on contentious > threads on the list and b) what seems like legitimate patch tickets > on trac that seem to be languishing for no compelling technical > reason. I would be hardpressed to quote specifics, but I have > encountered this feeling a few times. > Patches languishing on Trac is a real problem. The issue here is not at all about not wanting those patches, but just about the overhead of getting them reviewed/fixed/committed. This problem has more or less disappeared with Github; there are very few PRs that are just sitting there. As for existing patches on Trac, if you or anyone else has an interest in one of them, checking that patch for test coverage / documentation and resubmitting it as a PR would be a massive help. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Apr 26 13:02:13 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 26 Apr 2012 10:02:13 -0700 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Mon, Apr 23, 2012 at 11:18 PM, Ralf Gommers >> Perhaps a more formal "development release" system could help here. >> IIUC, numpy pretty much has two things: > This is a good idea - not for development releases but for master. Building > nightly/weekly binaries would help more people try out new features. good start, but I think master may fluctuate too quickly (and how often is it broken?) but better than nothing, yes? >> 2) there is the wxversion system > wxversion was broken for a long time on Ubuntu too (~5 yrs ago). I don't > exactly remember it as a good idea. well, it was a good idea, maybe not a good implementation -- and it was very helpful a few years back when wx was in major flux. What we really need is python itself providing a package version selection mechanism, but Guido&c never saw the need (the existence of virtualenv proves the need if you ask me....)
> virtualenv also doesn't help, because if you can use that you know how to build from source anyway. not true -- lots of folks use easy_install and/or pip with virtualenv. and the git barrier to entry is not trivial -- granted just getting master is not hard, but I know I've been using git for a couple months on a core project of mine, and I still find it's giving me far more pain than help. (I know I still haven't wrapped my brain around what DVCS really is...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ralf.gommers at googlemail.com Thu Apr 26 13:19:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 26 Apr 2012 19:19:36 +0200 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: On Thu, Apr 26, 2012 at 7:02 PM, Chris Barker wrote: > On Mon, Apr 23, 2012 at 11:18 PM, Ralf Gommers > > >> Perhaps a more formal "development release" system could help here. > >> IIUC, numpy pretty much has two things: > > > This is a good idea - not for development releases but for master. > Building > > nightly/weekly binaries would help more people try out new features. > > good start, but I think master may fluctuate too quickly (and how > often is it broken?) but better than nothing, yes? > How often is it broken? A couple of failing tests yes, but hardly ever seriously broken. > >> 2) there is the wxversion system > > > wxversion was broken for a long time on Ubuntu too (~5 yrs ago). I don't > > exactly remember it as a good idea. > > well, it was a good idea, maybe not a good implementation -- and it > was very helpful a few years back when wx was in major flux. What we > really need is python itself providing a package version selection > mechanism, but Guido&c never saw the need (the existence of > virtualenv proves the need if you ask me....) > > agreed > > virtualenv also doesn't help, because if you can use that you know how > to build from source anyway. > > not true -- lots of folks use easy_install and/or pip with virtualenv. > Pip only installs from source, so if you haven't got the right compilers, development headers etc. it will fail for numpy. easy_install is also a lottery, and only works for numpy on Windows unless you are set up to build from source. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Thu Apr 26 14:53:50 2012 From: srean.list at gmail.com (srean) Date: Thu, 26 Apr 2012 13:53:50 -0500 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> Message-ID: > Patches languishing on Trac is a real problem. The issue here is not at all > about not wanting those patches, Oh yes I am sure of that, in the past it had not been clear what more is necessary to get them pulled in, or how to go about satisfying the requirements. The document you mailed on the scipy list goes a long way in addressing those issues. So thanks a lot. In fact it might be a good idea to add the link to it in the signature of the mail that trac replies with. but just about the overhead of getting them > reviewed/fixed/committed. This problem has more or less disappeared with > Github; there are very few PRs that are just sitting there.
> > As for existing patches on Trac, if you or anyone else has an interest in > one of them, checking that patch for test coverage / documentation and > resubmitting it as a PR would be a massive help. > > Ralf From charlesr.harris at gmail.com Thu Apr 26 23:29:19 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 26 Apr 2012 21:29:19 -0600 Subject: [Numpy-discussion] Removal of mask arrays? [was consensus] In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io> Message-ID: On Tue, Apr 24, 2012 at 6:46 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 6:59 PM, Charles R Harris wrote: > > > > On Tue, Apr 24, 2012 at 5:24 PM, Travis Oliphant wrote: > >> >> On Apr 24, 2012, at 6:01 PM, St?fan van der Walt wrote: >> >> > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris >> > wrote: >> >>> Why are we having a discussion on NAN's in a thread on consensus? >> >>> This is a strong indicator of the problem we're facing. >> >> >> >> We seem to have a consensus regarding interest in the topic. >> > >> > For the benefit of those of us interested in both discussions, would >> > you kindly start a new thread on the MA topic? >> > >> > In response to Travis's suggestion of writing up a short summary of >> > community principles, as well as Matthew's initial formulation, I >> > agree that this would be helpful in enshrining the values we cherish >> > here, as well as in communicating those values to the next generation >> > of developers. >> > >> >> From observing the community, I would guess that these values include: >> > >> > - That any party with an interest in NumPy is given the opportunity to >> > speak and to be heard on the list. >> > - That discussions that influence the course of the project take place >> > openly, for anyone to observe. >> > - That decisions are made once consensus is reached, i.e., if everyone >> > agrees that they can live with the outcome. >> >> This is well stated. Thank you Stefan. >> >> Some will argue about what "consensus" means or who "everyone" is. >> But, if we are really worrying about that, then we have stopped listening >> to each other which is the number one community value that we should be >> promoting, demonstrating, and living by. >> >> Consensus to me means that anyone who can produce a well-reasoned >> argument and demonstrates by their persistence that they are actually using >> the code and are aware of the issues has veto power on pull requests. >> At times people with commit rights to NumPy might perform a pull request >> anyway, but they should acknowledge at least in the comment (but for major >> changes --- on this list) that they are doing so and provide their reasons. >> >> If I decide later that I think the pull request was made inappropriately >> in the face of objections and the reasons were not justified, then I will >> reserve the right to revert the pull request. I would like core >> developers of NumPy to have the same ability to check me as well. But, >> if there is a disagreement at that level, then I will reserve the right to >> decide. >> >> Basically, what we have in this situation is that the masked arrays were >> added to NumPy master with serious objections to the API. What I'm trying >> to decide right now is can we move forward and satisfy the objections >> without removing the ndarrayobject changes entirely (I do think the >> concerns warrant removal of the changes). 
The discussion around that is >> the most helpful right now, but should take place on another thread. >> >> > Travis, if you are playing the BDFL role, then just make the darn decision > and remove the code so we can get on with life. As it is you go back and > forth and that does none of us any good, you're a big guy and you're > rocking the boat. I don't agree with that decision, I'd rather evolve the > code we have, but I'm willing to compromise with your decision in this > matter. I'm not willing to compromise with Nathaniel's, nor it seems > vice-versa. Nathaniel has volunteered to do the work, just ask him to > submit a patch. > > > I would like to see Nathaniel and Mark work out a document that they can > both agree to and co-author that is presented to this list before doing > something like that. At the very least this should summarize the feature > from both perspectives. > > I have been encouraged by Nathaniel's willingness to contribute code, and > I know Mark is looking for acceptable solutions that are still consistent > with his view of things. These are all positive signs to me. We need > to give this another week or two. I would prefer a solution that evolves > the code as well. But, I also don't want yet another masked array > implementation that gets little use but has real and long-lasting > implications on the ndarray structure. There is both the effect of the > growth of the ndarray structure (most uses should not worry about this at > all), but also the growth of the *concept* of an ndarray --- this is a > little more subtle but also has real downstream implications. > > Some of these implications have been pointed out already by consumers of > the C-API who are unsure about how code that was not built with masks in > mind will respond (I believe it will raise an error if they are using the > standard APIs -- It probably should if it doesn't). Long term, I agree > that the NumPy array should not be so tied to a particular *implementation* > as it is now. I also don't think it should be tied so deeply to ABI > compatibility. I think that was a mistake to be so devoted to this > concept that we created a lot of work for ourselves --- work that is easily > eliminated by distributions that re-compile down-stream dependencies after > an ABI breaking release of NumPy. I realize, I didn't disagree very > strongly before -- I disagree with my earlier view. That's not to say > future releases of NumPy 1.X will break ABI compatibility --- I just think > the price is not worth the value of the thing we have set as the standard > (it's just a simple re-compile of downstream dependencies). > > We are quite delayed to get things out as you have noted. If the desire > is to get a long-term release schedule for Debian and/or Ubuntu, then I > think the 1.6.2 release is a good idea. It also makes more sense to me as > a Long-Term Release candidate. > > Now that I've been working on it, I can assure you that 1.6.2 is a terrible candidate for a long term release. The changes between 1.6 and 1.7 are already enormous and I'm not even going to try to backport some of the fixes. In addition, the *big* changes to datetime that made it workable are not in 1.6. The reason the difference is so big is not only the big delay in the release schedule, but the fact that 1.7 was being prepared for the takeoff to 2.0. I think 1.7 or later is the only reasonable candidate for the LTS. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Thu Apr 26 23:53:11 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 26 Apr 2012 22:53:11 -0500 Subject: [Numpy-discussion] Removal of mask arrays? [was consensus] In-Reply-To: References: <54E3B8E1-6F17-460D-87CA-ED88DE7CD425@continuum.io> <4F96AE14.20302@crans.org> <6391FBFE-3859-4047-AA37-9688C96653CD@continuum.io> Message-ID: On Apr 26, 2012, at 10:29 PM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 6:46 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 6:59 PM, Charles R Harris wrote: > >> >> >> On Tue, Apr 24, 2012 at 5:24 PM, Travis Oliphant wrote: >> >> On Apr 24, 2012, at 6:01 PM, St?fan van der Walt wrote: >> >> > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris >> > wrote: >> >>> Why are we having a discussion on NAN's in a thread on consensus? >> >>> This is a strong indicator of the problem we're facing. >> >> >> >> We seem to have a consensus regarding interest in the topic. >> > >> > For the benefit of those of us interested in both discussions, would >> > you kindly start a new thread on the MA topic? >> > >> > In response to Travis's suggestion of writing up a short summary of >> > community principles, as well as Matthew's initial formulation, I >> > agree that this would be helpful in enshrining the values we cherish >> > here, as well as in communicating those values to the next generation >> > of developers. >> > >> >> From observing the community, I would guess that these values include: >> > >> > - That any party with an interest in NumPy is given the opportunity to >> > speak and to be heard on the list. >> > - That discussions that influence the course of the project take place >> > openly, for anyone to observe. >> > - That decisions are made once consensus is reached, i.e., if everyone >> > agrees that they can live with the outcome. >> >> This is well stated. Thank you Stefan. >> >> Some will argue about what "consensus" means or who "everyone" is. But, if we are really worrying about that, then we have stopped listening to each other which is the number one community value that we should be promoting, demonstrating, and living by. >> >> Consensus to me means that anyone who can produce a well-reasoned argument and demonstrates by their persistence that they are actually using the code and are aware of the issues has veto power on pull requests. At times people with commit rights to NumPy might perform a pull request anyway, but they should acknowledge at least in the comment (but for major changes --- on this list) that they are doing so and provide their reasons. >> >> If I decide later that I think the pull request was made inappropriately in the face of objections and the reasons were not justified, then I will reserve the right to revert the pull request. I would like core developers of NumPy to have the same ability to check me as well. But, if there is a disagreement at that level, then I will reserve the right to decide. >> >> Basically, what we have in this situation is that the masked arrays were added to NumPy master with serious objections to the API. What I'm trying to decide right now is can we move forward and satisfy the objections without removing the ndarrayobject changes entirely (I do think the concerns warrant removal of the changes). The discussion around that is the most helpful right now, but should take place on another thread. >> >> >> Travis, if you are playing the BDFL role, then just make the darn decision and remove the code so we can get on with life. 
>> As it is you go back and forth and that does none of us any good, you're a big guy and you're rocking the boat. [...]
>
> [...]
>
> Now that I've been working on it, I can assure you that 1.6.2 is a terrible candidate for a long term release. The changes between 1.6 and 1.7 are already enormous and I'm not even going to try to backport some of the fixes. [...] I think 1.7 or later is the only reasonable candidate for the LTS.

Thanks for that analysis. That is very helpful to know.

-Travis

> Chuck
From rhattersley at gmail.com Fri Apr 27 06:32:40 2012
From: rhattersley at gmail.com (Richard Hattersley)
Date: Fri, 27 Apr 2012 11:32:40 +0100
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?

Sorry if I'm being too impatient!

On 25 April 2012 16:58, Richard Hattersley wrote:
> The masked array discussions have brought up all sorts of interesting topics - too many to usefully list here - but there's one aspect I haven't spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just too awkward to be helpful. But ...
>
> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>
> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass.
>
> Putting aside the ABI issue, would it help downstream API compatibility if the POA was a subclass of the MA? Code that's expecting/casting-to a POA might continue to work and, where appropriate, could be upgraded in its own time to accept MAs.
>
> Richard Hattersley

From ben.root at ou.edu Fri Apr 27 09:05:53 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 27 Apr 2012 09:05:53 -0400
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 6:32 AM, Richard Hattersley wrote:
> I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?
> [...]

Richard,

Actually, I am rather surprised by the lack of response as well. This is quite unusual and I hope it doesn't sour you on further contributions. We do need more "crazy ideas" like yours, if only just to help break out of an infinite loop in a discussion.

Your idea is interesting, but doesn't it require C++? Or maybe you are thinking of creating a new C type object that would contain all the new features and hold a pointer and function interface to the original POA. Essentially, the new type would act as a wrapper around the original ndarray?

Cheers!
Ben Root

From njs at pobox.com Fri Apr 27 09:55:16 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 27 Apr 2012 14:55:16 +0100
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley wrote:
> I know I used a somewhat jokey tone in my original posting, but fundamentally it was a serious question concerning a live topic. So I'm curious about the lack of response. Has this all been covered before?
>
> Sorry if I'm being too impatient!

That's fine, I know I did read it, but I wasn't sure what to make of it to respond :-)

> On 25 April 2012 16:58, Richard Hattersley wrote:
>> The masked array discussions have brought up all sorts of interesting topics [...]
>>
>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>> [...]
This makes a certain amount of sense from a traditional OO modeling perspective, where classes are supposed to refer to sets of objects, subclasses are subsets, and superclasses are supersets. This is the property that's needed to guarantee that if A is a subclass of B, then any code that expects a B can also handle an A, since all A's are B's, which is what you need if you're doing type-checking or type-based dispatch. And indeed, from this perspective, MAs are a superclass of POAs, because for every POA there's an equivalent MA (the one with the mask set to all-true), but not vice-versa.

But, that model of OO doesn't have much connection to Python. In Python's semantics, classes are almost irrelevant; they're mostly just some convenience tools for putting together the objects you want, and what really matters is the behavior of each object (the famous "duck typing"). You can call isinstance() if you want, but it's just an ordinary function that looks at some attributes on an object; the only magic involved is that some of those attributes have underscores in their name. In Python, subclassing mostly does two things: (1) it's a quick way to set up a class that's similar to another class (though this is a worse idea than it looks -- you're basically doing 'from other_class import *' with all the usual tight-coupling problems that 'import *' brings). (2) When writing Python objects at the C level, subclassing lets you achieve memory layout compatibility (which is important because C does *not* do duck typing), and it lets you add new fields to a C struct.

So at this level, MAs are a subclass of POAs, because MAs have an extra field that POAs don't...

So I don't know what to think about subclasses/superclasses here, because they're such confusing and contradictory concepts that it's hard to tell what the actual resulting API semantics would be.

- N

From charlesr.harris at gmail.com Fri Apr 27 10:15:33 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 27 Apr 2012 08:15:33 -0600
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley wrote:
> The masked array discussions have brought up all sorts of interesting topics [...]
>
> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
> [...]
That's a version of the idea that all arrays have masks, just some of them have "missing" masks. That construction was mentioned in the thread but I can see how one might have missed it. I think it is the right way to do things. However, current libraries and such will still need to do some work in order not to do the wrong thing when a "real" mask is present. For instance, check and raise an error if they can't deal with it.

Chuck

From charlesr.harris at gmail.com Fri Apr 27 10:33:20 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 27 Apr 2012 08:33:20 -0600
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris wrote:
> That's a version of the idea that all arrays have masks, just some of them have "missing" masks. [...] For instance, check and raise an error if they can't deal with it.

To expand a bit more, this is precisely why the current work on making masks part of ndarray rather than a subclass was undertaken. There is a flag that says whether or not the array is masked, but you will still need to check that flag to see if you are working with an unmasked instance of ndarray. At the moment the masked version isn't quite completely fused with ndarrays-classic, since the maskedness needs to be specified in the constructors and such, but what you suggest is actually what we are working towards.

No matter what is done, current functions and libraries that want to use masks are going to have to deal with the existence of both masked and unmasked arrays, since the existence of a mask can't be ignored without risking wrong results.

Chuck
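[Editorial note: to make the "check and raise an error" suggestion above concrete, here is a minimal sketch of the kind of guard a downstream library might add. It assumes the 1.7-dev masked-array build, where the flags object grows a MASKNA entry (see the flags printout later in this thread); the helper name and the getattr fallback are illustrative assumptions, not an established NumPy API.]

    import numpy as np

    def require_unmasked(obj):
        """A hypothetical input-sanitiser for code that cannot handle masks."""
        arr = np.asanyarray(obj)
        # On builds without the NA-mask work, flags has no 'maskna'
        # attribute, so fall back to False and accept the array as-is.
        if getattr(arr.flags, 'maskna', False):
            raise ValueError("masked (NA) arrays are not supported here")
        return arr

    print(require_unmasked([1.0, 2.0, 3.0]))  # plain input passes through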
From warren.weckesser at enthought.com Thu Apr 26 12:20:45 2012
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Thu, 26 Apr 2012 11:20:45 -0500
Subject: [Numpy-discussion] SciPy 2012 - The Eleventh Annual Conference on Scientific Computing with Python
In-Reply-To:
References:
Message-ID:

Dear all,

(Sorry if you receive this announcement multiple times.)

Registration for SciPy 2012, the eleventh annual Conference on Scientific Computing with Python, is open! Go to
https://conference.scipy.org/scipy2012/register/index.php

We would like to remind you that the submissions for talks, posters and tutorials are open *until April 30th*, which is just around the corner. For more information see:
http://conference.scipy.org/scipy2012/tutorials.php
http://conference.scipy.org/scipy2012/talks/index.php

For talks or posters, all we need is an abstract. Tutorials require more significant preparation. If you are preparing a tutorial, please send a brief note to Jonathan Rocher (jrocher at enthought.com) to indicate your intent.

We look forward to seeing many of you this summer.

Kind regards,
The SciPy 2012 organizers
scipy2012 at scipy.org

On Wed, Apr 4, 2012 at 4:30 PM, Warren Weckesser <warren.weckesser at enthought.com> wrote:

> SciPy 2012, the eleventh annual Conference on Scientific Computing with Python, will be held July 16-21, 2012, in Austin, Texas.
>
> At this conference, novel scientific applications and libraries related to data acquisition, analysis, dissemination and visualization using Python are presented. Attended by leading figures from both academia and industry, it is an excellent opportunity to experience the cutting edge of scientific software development.
>
> The conference is preceded by two days of tutorials, during which community experts provide training on several scientific Python packages. Following the main conference will be two days of coding sprints.
>
> We invite you to give a talk or present a poster at SciPy 2012.
>
> The list of topics that are appropriate for the conference includes (but is not limited to):
>
> - new Python libraries for science and engineering;
> - applications of Python in solving scientific or computational problems;
> - high performance, parallel and GPU computing with Python;
> - use of Python in science education.
>
> Specialized Tracks
>
> Two specialized tracks run in parallel to the main conference:
>
> - High Performance Computing with Python
>   Whether your algorithm is distributed, threaded, memory intensive or latency bound, Python is making headway into the problem. We are looking for performance-driven designs and applications in Python. Candidates include the use of Python within a parallel application, new architectures, and ways of making traditional applications execute more efficiently.
>
> - Visualization
>   They say a picture is worth a thousand words--we're interested in both! Python provides numerous visualization tools that allow scientists to show off their work, and we want to know about any new tools and techniques out there. Come show off your latest graphics, whether it's an old library with a slick new feature, a new library out to challenge the status quo, or simply a beautiful result.
> Domain-specific Mini-symposia
>
> Mini-symposia on the following topics are also being organized:
>
> - Computational bioinformatics
> - Meteorology and climatology
> - Astronomy and astrophysics
> - Geophysics
>
> Talks, papers and posters
>
> We invite you to take part by submitting a talk or poster abstract. Instructions are on the conference website:
>
> http://conference.scipy.org/scipy2012/talks.php
>
> Selected talks are included as papers in the peer-reviewed conference proceedings, to be published online.
>
> Tutorials
>
> Tutorials will be given July 16-17. We invite instructors to submit proposals for half-day tutorials on topics relevant to scientific computing with Python. See
>
> http://conference.scipy.org/scipy2012/tutorials.php
>
> for information about submitting a tutorial proposal. To encourage tutorials of the highest quality, the instructor (or team of instructors) is given a $1,000 stipend for each half-day tutorial.
>
> Student/Community Scholarships
>
> We anticipate providing funding for students and for active members of the SciPy community who otherwise might not be able to attend the conference. See
>
> http://conference.scipy.org/scipy2012/student.php
>
> for scholarship application guidelines.
>
> Be a Sponsor
>
> The SciPy conference could not run without the generous support of the institutions and corporations who share our enthusiasm for Python as a tool for science. Please consider sponsoring SciPy 2012. For more information, see
>
> http://conference.scipy.org/scipy2012/sponsor/index.php
>
> Important dates:
>
> Monday, April 30: Talk abstracts and tutorial proposals due.
> Monday, May 7: Accepted tutorials announced.
> Monday, May 13: Accepted talks announced.
>
> Monday, June 18: Early registration ends. (Price increases after this date.)
> Sunday, July 8: Online registration ends.
>
> Monday-Tuesday, July 16 - 17: Tutorials
> Wednesday-Thursday, July 18 - July 19: Conference
> Friday-Saturday, July 20 - July 21: Sprints
>
> We look forward to seeing you all in Austin this year!
>
> The SciPy 2012 Team
> http://conference.scipy.org/scipy2012/organizers.php

From josef.pktd at gmail.com Fri Apr 27 11:16:24 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 27 Apr 2012 11:16:24 -0400
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris wrote:
> On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris wrote:
>> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley wrote:
>>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>> [...]
>> That's a version of the idea that all arrays have masks, just some of them have "missing" masks. [...]
>
> To expand a bit more, this is precisely why the current work on making masks part of ndarray rather than a subclass was undertaken. [...] the existence of a mask can't be ignored without risking wrong results.

(In case it's not the wrong thread)

If every ndarray has this maskflag, then it is easy to adjust other library code:

if myarr.maskflag is not None: raise SorryException

What is expensive is having to do np.isnan(myarr) or np.isfinite(myarr) everywhere.
https://github.com/scipy/scipy/pull/48

As a concept I like the idea: masked arrays are the general class with generic defaults, and "clean" arrays are a subclass where some methods are overwritten with faster implementations.

Josef

> Chuck

From rhattersley at gmail.com Fri Apr 27 11:28:13 2012
From: rhattersley at gmail.com (Richard Hattersley)
Date: Fri, 27 Apr 2012 16:28:13 +0100
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

Hi all,

Thanks for all your responses and for your patience with a newcomer. Don't worry - I'm not going to give up yet. It's all just part of my learning the ropes.

On 27 April 2012 14:05, Benjamin Root wrote:
> Your idea is interesting, but doesn't it require C++? Or maybe you are thinking of creating a new C type object that would contain all the new features and hold a pointer and function interface to the original POA. Essentially, the new type would act as a wrapper around the original ndarray?

When talking about subclasses I'm just talking about the end-user experience within Python. In other words, I'm starting from issubclass(POA, MA) == True, and trying to figure out what the Python API implications would be.

On 27 April 2012 14:55, Nathaniel Smith wrote:
> [...]
> So at this level, MAs are a subclass of POAs, because MAs have an extra field that POAs don't...

It doesn't seem essential that MAs have an extra field that POAs don't. If POA was a subclass of MA, instances of POA could have the extra field set to an "all-valid"/"nothing-is-masked" value. Granted, you'd want that to be a special value so you're not lugging around a load of redundant data (and you can optimise your processing for that), but I'm guessing you'd probably want that kind of capability within MA anyway.
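[Editorial note: a toy illustration of the "nothing-is-masked" special value described above. This is a sketch of the idea only, not NumPy's implementation; the names NOMASK and masked_mean are made up.]

    import numpy as np

    NOMASK = None  # sentinel meaning "every element is valid"

    def masked_mean(data, mask=NOMASK):
        """Mean that skips masked elements; plain arrays pay no mask cost."""
        data = np.asarray(data, dtype=float)
        if mask is NOMASK:             # optimised path for plain arrays
            return data.mean()
        mask = np.asarray(mask, dtype=bool)
        return data[~mask].mean()      # True in the mask means "invalid"

    print(masked_mean([1.0, 2.0, 3.0]))                         # 2.0
    print(masked_mean([1.0, 99.0, 3.0], [False, True, False]))  # 2.0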
On 27 April 2012 15:33, Charles R Harris wrote:
> To expand a bit more, this is precisely why the current work on making masks part of ndarray rather than a subclass was undertaken. [...] current functions and libraries that want to use masks are going to have to deal with the existence of both masked and unmasked arrays since the existence of a mask can't be ignored without risking wrong results.

Having the class hierarchy would allow isinstance() to help. And there are some substantial API implications for this, but... if numpy.mean(...) etc. refused to work with MAs then that might also help. (Clearly myarray.mean() would still work if myarray was actually a MA, but then it would also give a correct answer.)

What other kinds of checks (implicit or explicit) are already out there? I'm *very* aware that there are other aspects of the API where the desired behaviour is even less clear!

Thanks for indulging.

Richard Hattersley
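[Editorial note: for what it's worth, the implicit checks asked about above can be seen with the existing numpy.ma (1.6-era behaviour): asarray-style sanitisation silently strips the mask, while asanyarray preserves the subclass. A small demonstration:]

    import numpy as np
    import numpy.ma as ma

    m = ma.array([1.0, 99.0, 3.0], mask=[False, True, False])

    print(ma.isMaskedArray(m))     # True: an explicit check libraries can use
    print(m.mean())                # 2.0: the masked value is skipped
    print(np.asarray(m).mean())    # 34.33...: asarray drops the mask silently
    print(type(np.asanyarray(m)))  # MaskedArray: asanyarray keeps the subclass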
From charlesr.harris at gmail.com Fri Apr 27 11:54:44 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 27 Apr 2012 09:54:44 -0600
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 9:16 AM, josef.pktd at gmail.com wrote:
> [...]
> If every ndarray has this maskflag, then it is easy to adjust other
> library code.

That is the case.

In [1]: ones(1).flags
Out[1]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  MASKNA : False
  OWNMASKNA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

What I'd like to add is that the mask is only allocated when NA (or equivalent) is assigned. That way the flag also signals the actual presence of a masked value.

> if myarr.maskflag is not None: raise SorryException
>
> What is expensive is having to do np.isnan(myarr) or
> np.isfinite(myarr) everywhere.
> https://github.com/scipy/scipy/pull/48

Chuck

From travis at continuum.io Fri Apr 27 12:42:55 2012
From: travis at continuum.io (Travis Oliphant)
Date: Fri, 27 Apr 2012 11:42:55 -0500
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References:
Message-ID: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>

On Apr 25, 2012, at 10:58 AM, Richard Hattersley wrote:
> The masked array discussions have brought up all sorts of interesting topics [...]
>
> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

Ultimately, this is what Chuck and Mark are advocating, I believe. It's not a crazy idea. In fact, it's probably more correct, in that masked arrays *are* more general than POAs. If we were starting from scratch in 1994 (Numeric days), I could see taking this route and setting expectations correctly for downstream libraries.

There are three problems I see with jamming this concept into NumPy 1.X, however, by modifying all POA data-structures to now *be* masked arrays.

1) There is a lot of code out there that does not know anything about masks and is not used to checking for masks. It enlarges the basic abstraction in a way that is not backwards compatible *conceptually*. This smells fishy to me and I could see a lot of downstream problems from libraries that rely on NumPy.

2) We cannot agree on how masks should be handled and consequently don't have a real plan for migrating numpy.ma to use these masks. So, we are just growing the API and introducing uncertainty for unclear benefit --- especially for the person that does not want to use masks.

3) Subclassing in C in Python requires that C-structures are *binary* compatible. This implies that all subclasses have *more* attributes than the superclass. The way it is currently implemented, that means that POAs would have these extra pointers they don't need sitting there to satisfy that requirement. From a C-struct perspective it therefore makes more sense for MAs to inherit from POAs. Ideally, that shouldn't drive the design, but it's part of the landscape in NumPy 1.X.

I have some ideas about how to move forward, but I'm anxiously awaiting the write-up that Mark and Nathaniel are working on to inform and enhance those ideas.

Masked arrays do have a long history in the Numeric and NumPy code base. Paul Dubois originally created the first masked arrays in Numeric and helped move them to numpy.ma. Pierre GM took that code and worked very hard to add a lot of features. I'm very concerned about adding a new masked array abstraction into the *core* of all NumPy arrays, especially one that is not well informed by this history nor its user base. I was just visiting LLNL a couple of weeks ago and realized that they are using masked arrays very heavily in UV-CDAT and elsewhere. I've also seen many other people in industry, academia, and government use masked arrays. I've typically squirmed at that because I know that masked arrays have performance issues because they are in Python.
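[Editorial note: for readers following along, the numpy.ma flavour of masked arrays referred to above looks like this; it is a pure-Python layer over ndarray, which is where the performance concerns come from. The toy data is made up for illustration.]

    import numpy.ma as ma

    temps = ma.array([21.5, -999.0, 23.1, 22.8],
                     mask=[False, True, False, False])  # -999.0 is a bad reading
    print(temps.mean())     # 22.466...: the masked element is excluded
    print(temps.filled(0))  # [21.5  0.  23.1  22.8]: replace masked values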
I've also wondered about masked arrays as *subclasses* of POAs because of how much code has to be rewritten in the sub-class for it to work correctly.

So, in summary: my view is that NumPy has masked arrays already (and has had them for a long, long time). Missing data is only one of the use-cases for masked arrays (though it is probably the dominant use case for numpy.ma). Independent of the "missing-data" story, any plan to add masks directly to a base-object in NumPy needs to take into account the numpy.ma user-base and the POA user-base that does not expect to be dealing with masks. That doesn't mean it needs to follow numpy.ma design choices and API. It does, however, need to think about how a typical numpy.ma user could instead use the new masked array concept, and how numpy.ma itself could be revised to use the new masked array concept.

I think Mark has done some amazing coding and I would like to keep as much of it as possible available to people. We may need to adjust *how* it is presented downstream, but I'm hopeful that we can do that.

Thanks for your ideas and your comments.

-Travis

> In the library I'm working on, the introduction of MAs (via numpy.ma) required us to sweep through the library and make a fair few changes. That's not the sort of thing one would normally expect from the introduction of a subclass.
> [...]

From josef.pktd at gmail.com Fri Apr 27 14:52:00 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 27 Apr 2012 14:52:00 -0400
Subject: [Numpy-discussion] ANN: statsmodels 0.4.0
Message-ID:

We are pleased to announce the release of statsmodels 0.4.0.

The big changes in this release are that most models can now be used with Pandas dataframes, and that we dropped the scikits namespace. Importing scikits.statsmodels is still possible but will be removed in the future. Pandas is now a required dependency. For more changes, including some breaks in backwards compatibility, see below.

Josef and Skipper

What it is
==========

Statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.

Documentation for the 0.4 version is currently at
http://statsmodels.sourceforge.net/devel/

Main Changes and Additions in 0.4.0
-----------------------------------

* Added pandas dependency.
* Cython source is built automatically if cython and a compiler are present
* Support use of dates in timeseries models
* Improved plots
  - Violin plots
  - Bean Plots
  - QQ Plots
* Added lowess function
* Support for pandas Series and DataFrame objects. Results instances return pandas objects if the models are fit using pandas objects.
* Full Python 3 compatibility
* Fix bugs in genfromdta. Convert Stata .dta format to structured array preserving all types. Conversion is much faster now.
* Improved documentation
* Models and results are pickleable via save/load, optionally saving the model data.
* Kernel Density Estimation now uses Cython and is considerably faster.
* Diagnostics for outlier and influence statistics in OLS
* Added El Nino Sea Surface Temperatures dataset
* Numerous bug fixes
* Internal code refactoring
* Improved documentation including examples as part of HTML
* ...

*Changes that break backwards compatibility*

* Deprecated scikits namespace. The recommended import is now::

    import statsmodels.api as sm

* model.predict methods signature is now (params, exog, ...) where before it assumed that the model had been fit and omitted the params argument. (This removed circularity between models and results instances.)
* For consistency with other multi-equation models, the parameters of MNLogit are now transposed.
* tools.tools.ECDF -> distributions.ECDF
* tools.tools.monotone_fn_inverter -> distributions.monotone_fn_inverter
* tools.tools.StepFunction -> distributions.StepFunction

Main Features
=============

* linear regression models: Generalized least squares (including weighted least squares and least squares with autoregressive errors), ordinary least squares.
* glm: Generalized linear models with support for all of the one-parameter exponential family distributions.
* discrete: regression with discrete dependent variables, including Logit, Probit, MNLogit, Poisson, based on maximum likelihood estimators
* rlm: Robust linear models with support for several M-estimators.
* tsa: models for time series analysis
  - univariate time series analysis: AR, ARIMA
  - vector autoregressive models, VAR and structural VAR
  - descriptive statistics and process models for time series analysis
* nonparametric: (Univariate) kernel density estimators
* datasets: Datasets to be distributed and used for examples and in testing.
* stats: a wide range of statistical tests
  - diagnostics and specification tests
  - goodness-of-fit and normality tests
  - functions for multiple testing
  - various additional statistical tests
* iolib
  - Tools for reading Stata .dta files into numpy arrays.
  - printing table output to ascii, latex, and html
* miscellaneous models
* sandbox: statsmodels contains a sandbox folder with code in various stages of development and testing which is not considered "production ready". This covers among others Mixed (repeated measures) Models, GARCH models, general method of moments (GMM) estimators, kernel regression, various extensions to scipy.stats.distributions, panel data models, generalized additive models and information theoretic measures.
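[Editorial note: a minimal usage sketch against the new statsmodels.api namespace and the (params, exog, ...) predict signature described in the changes list above; the data here is made up for illustration.]

    import numpy as np
    import statsmodels.api as sm  # the new import; scikits.statsmodels is deprecated

    x = np.linspace(0, 10, 50)
    X = sm.add_constant(x)                 # design matrix with intercept column
    y = 3.0 + 2.0 * x + np.random.normal(size=50)

    results = sm.OLS(y, X).fit()
    print(results.params)                  # slope and intercept estimates
    # predict now takes params explicitly: model.predict(params, exog, ...)
    print(results.model.predict(results.params, X)[:3])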
Where to get it
===============

The master branch on GitHub is the most up to date code
https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub
https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPi
http://pypi.python.org/pypi/statsmodels/

From travis at vaught.net Fri Apr 27 16:52:03 2012
From: travis at vaught.net (Travis Vaught)
Date: Fri, 27 Apr 2012 15:52:03 -0500
Subject: [Numpy-discussion] datetime dtype possible regression
Message-ID:

With NumPy 1.6.1 (from EPD 7.2-2) I get this behavior:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [1]: import numpy as np

In [2]: schema = np.dtype({'names':['symbol', 'date', 'open', 'high', 'low',
   ...:                             'close', 'volume', 'adjclose'],
   ...:                    'formats':['S8', 'M8', float, float, float, float,
   ...:                               float, float]})

In [3]: data = [("AAPL", "2012-04-12", 600.0, 605.0, 598.0, 602.0, 50000000, 602.0),]

In [4]: recdata = np.array(data, dtype=schema)

In [5]: recdata
Out[5]:
array([ ('AAPL', datetime.datetime(2012, 4, 12, 0, 0), 600.0, 605.0, 598.0, 602.0, 50000000.0, 602.0)],
      dtype=[('symbol', '|S8'), ('date', ('<M8[us]', {})), ('open', '<f8'), ('high', '<f8'), ('low', '<f8'), ('close', '<f8'), ('volume', '<f8'), ('adjclose', '<f8')])

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With numpy-1.7.0.dev_3cb783e I get this:

>>> import numpy as np
>>> schema = np.dtype({'names':['symbol','date','open','high','low','close','volume','adjclose'], 'formats':['S8','M8',float,float,float,float,float,float]})
>>> data = [("AAPL", "2012-04-12", 600.0, 605.0, 598.0, 602.0, 50000000, 602.0),]
>>> recdata = np.array(data, dtype=schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Cannot create a NumPy datetime other than NaT with generic units
>>> np.version.version
'1.7.0.dev-3cb783e'

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any hints about a regression I can check for? Or perhaps I missed an api change for specifying datetime dtypes?

Best,

Travis

From robert.kern at gmail.com Fri Apr 27 16:57:06 2012
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 27 Apr 2012 21:57:06 +0100
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 21:52, Travis Vaught wrote:
> With NumPy 1.6.1 (from EPD 7.2-2) I get this behavior:
> [...]
> With numpy-1.7.0.dev_3cb783e I get this:
> [...]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: Cannot create a NumPy datetime other than NaT with generic units
> [...]
> Any hints about a regression I can check for? Or perhaps I missed an api change for specifying datetime dtypes?

Judging from the error message, it looks like an intentional API change.

--
Robert Kern

From antony.lee at berkeley.edu Fri Apr 27 22:17:39 2012
From: antony.lee at berkeley.edu (Antony Lee)
Date: Fri, 27 Apr 2012 19:17:39 -0700
Subject: [Numpy-discussion] Python3, genfromtxt and unicode
Message-ID:

With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to the largest number of chars (npyio.py line 1596), but it doesn't do the same for unicode fields, which is a pity. See example below. I tried to change npyio.py around line 1600 to add that but it didn't work; from my limited understanding the problem comes earlier, in the way StringBuilder is defined(?).

Antony Lee

import io, numpy as np
s = io.BytesIO()
s.write(b"abc 1\ndef 2")
s.seek(0)
t = np.genfromtxt(s, dtype=None)  # (or converters={0: bytes})
print(t, t.dtype)
# -> [(b'abc', 1) (b'def', 2)] [('f0', '|S3'), ('f1', '<i8')]
s.seek(0)
t = np.genfromtxt(s, dtype=None, converters={0: lambda b: b.decode()})
print(t, t.dtype)
# -> [('', 1) ('', 2)] [('f0', '<U0'), ('f1', '<i8')]

From rhattersley at gmail.com Sat Apr 28 02:38:27 2012
From: rhattersley at gmail.com (Richard Hattersley)
Date: Sat, 28 Apr 2012 07:38:27 +0100
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
References: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
Message-ID:

On 27 April 2012 17:42, Travis Oliphant wrote:
> 1) There is a lot of code out there that does not know anything about masks and is not used to checking for masks. It enlarges the basic abstraction in a way that is not backwards compatible *conceptually*. This smells fishy to me and I could see a lot of downstream problems from libraries that rely on NumPy.

That's exactly why I'd love to see plain arrays remain functionally unchanged. It's just a small, random sample, but here's how a few routines from NumPy and SciPy sanitise their inputs...

numpy.trapz (aka scipy.integrate.trapz) - numpy.asanyarray
scipy.spatial.KDTree - numpy.asarray
scipy.spatial.cKDTree - numpy.ascontiguousarray
scipy.integrate.odeint - PyArray_ContiguousFromObject
scipy.interpolate.interp1d - numpy.array
scipy.interpolate.griddata - numpy.asanyarray & numpy.ascontiguousarray

So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.

> 2) We cannot agree on how masks should be handled and consequently don't have a real plan for migrating numpy.ma to use these masks. So, we are just growing the API and introducing uncertainty for unclear benefit --- especially for the person that does not want to use masks.

I've not yet looked at how numpy.ma users could be migrated. But if we make masked arrays a strict superclass and leave the numpy/ndarray interface and behaviour unchanged, API growth shouldn't be an issue.
End-users will be able to completely ignore the existence of masked arrays (except for the minority(?) for whom the ABI/re-compile issue would be relevant).

> 3) Subclassing in C in Python requires that C-structures are *binary* compatible. [...] From a C-struct perspective it therefore makes more sense for MAs to inherit from POAs. Ideally, that shouldn't drive the design, but it's part of the landscape in NumPy 1.X.

I'd hate to see the logical class hierarchy inverted (or collapsed to a single class) just to save a pointer or two from the struct. Now seems like a golden opportunity to fix the relationship between masked and plain arrays. I'm assuming (and implicitly checking that assumption with this statement!) that there's far more code using the Python interface to NumPy than there is code using the C interface. So I'm urging that the logical consistency of the Python interface (and even the C and Cython interfaces) takes precedence over the C-struct memory saving.

I'm not sure I agree with "extra pointers they don't need". If we make plain arrays a subclass of masked arrays, aren't these pointers essential to ensure masked array methods can continue to work on plain arrays without requiring special code paths?

> I have some ideas about how to move forward, but I'm anxiously awaiting the write-up that Mark and Nathaniel are working on to inform and enhance those ideas.

+1

As an aside, the implication of preserving the behaviour of the numpy/ndarray interface is that masked arrays will need a *new* interface. For example:

>>> import mumpy # Yes - I know it's a terrible name! But I had to write *something* ... sorry! ;-)
>>> import numpy
>>> a = mumpy.array(...) # makes a masked array
>>> b = numpy.array(...) # makes a plain array
>>> isinstance(a, mumpy.ndarray)
True
>>> isinstance(b, mumpy.ndarray)
True
>>> isinstance(a, numpy.ndarray)
False
>>> isinstance(b, numpy.ndarray)
True

Richard Hattersley

From wesmckinn at gmail.com Sat Apr 28 11:13:19 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sat, 28 Apr 2012 11:13:19 -0400
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To:
References:
Message-ID:

On Fri, Apr 27, 2012 at 4:57 PM, Robert Kern wrote:
> On Fri, Apr 27, 2012 at 21:52, Travis Vaught wrote:
>> With NumPy 1.6.1 (from EPD 7.2-2) I get this behavior:
>> [...]
>> Any hints about a regression I can check for? Or perhaps I missed an api change for specifying datetime dtypes?
>
> Judging from the error message, it looks like an intentional API change.
>
> --
> Robert Kern

Maybe this should be raised as a bug (where do we report NumPy bugs these days, still Trac?). As I'm moving to datetime64 in pandas, if NumPy 1.6.1 data has unpickling issues on NumPy 1.7+ it's going to be very problematic.

From charlesr.harris at gmail.com Sat Apr 28 11:18:19 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 28 Apr 2012 09:18:19 -0600
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To:
References:
Message-ID:

On Sat, Apr 28, 2012 at 9:13 AM, Wes McKinney wrote:
> Maybe this should be raised as a bug (where do we report NumPy bugs these days, still Trac?). As I'm moving to datetime64 in pandas, if NumPy 1.6.1 data has unpickling issues on NumPy 1.7+ it's going to be very problematic.

I was wondering what datetime you were using since the version in 1.6 had issues. Have you tested with both?

Chuck

From njs at pobox.com Sat Apr 28 12:34:04 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 28 Apr 2012 17:34:04 +0100
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To:
References: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
Message-ID:

On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley wrote:
> So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.

I think the main point I was trying to make is that it's the existence and content of these checks that matters. They don't necessarily have any relation at all to which thing Python calls a "superclass" or a "subclass".

-- Nathaniel

From ndbecker2 at gmail.com Sat Apr 28 12:58:36 2012
From: ndbecker2 at gmail.com (Neal Becker)
Date: Sat, 28 Apr 2012 12:58:36 -0400
Subject: [Numpy-discussion] A crazy masked-array thought
References: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
Message-ID:

Nathaniel Smith wrote:
> I think the main point I was trying to make is that it's the existence and content of these checks that matters. They don't necessarily have any relation at all to which thing Python calls a "superclass" or a "subclass".

I don't agree with the argument that ma should be a superclass of ndarray. It is ma that is adding features. That makes it a subclass. We're not talking mathematics here.

There is a well-known disease of OOP where everything seems to bubble up to the top of the class hierarchy - so that the base class becomes bloated to support every feature needed by subclasses. I believe that's considered poor design.

Is there a way to support ma as a subclass of ndarray, without introducing overhead into ndarray? Without having given this much real thought, I do have some idea. What are the operations that we need on arrays? The most basic are:

1. element access
2. get size (shape)

In an OO design, these would be virtual functions (or in C, pointers to functions). But this would introduce unacceptable overhead.

In a generic programming design (C++ templates), we would essentially generate 2 copies of every function: one that operates on plain arrays, and one that operates on masked arrays, each using the appropriate function for element access, shape, etc. This way, no unneeded overhead is introduced (although the code size is increased - but this is probably of little consequence on a modern demand-paged OS).
From njs at pobox.com  Sat Apr 28 12:34:04 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 28 Apr 2012 17:34:04 +0100
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To: References: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
Message-ID:

On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley wrote:
> So, assuming numpy.ndarray became a strict subclass of some new masked
> array, it looks plausible that adding just a few checks to numpy.ndarray to
> exclude the masked superclass would prevent much downstream code from
> accidentally operating on masked arrays.

I think the main point I was trying to make is that it's the existence
and content of these checks that matters. They don't necessarily have
any relation at all to which thing Python calls a "superclass" or a
"subclass".

-- Nathaniel

From ndbecker2 at gmail.com  Sat Apr 28 12:58:36 2012
From: ndbecker2 at gmail.com (Neal Becker)
Date: Sat, 28 Apr 2012 12:58:36 -0400
Subject: [Numpy-discussion] A crazy masked-array thought
References: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
Message-ID:

Nathaniel Smith wrote:

> [...]
> I think the main point I was trying to make is that it's the existence
> and content of these checks that matters. They don't necessarily have
> any relation at all to which thing Python calls a "superclass" or a
> "subclass".
>
> -- Nathaniel

I don't agree with the argument that ma should be a superclass of ndarray.
It is ma that is adding features. That makes it a subclass. We're not
talking mathematics here.

There is a well-known disease of OOP where everything seems to bubble up
to the top of the class hierarchy - so that the base class becomes bloated
to support every feature needed by subclasses. I believe that's considered
poor design.

Is there a way to support ma as a subclass of ndarray, without introducing
overhead into ndarray? Without having given this much real thought, I do
have some idea. What are the operations that we need on arrays? The most
basic are:

1. element access
2. get size (shape)

In an OO design, these would be virtual functions (or in C, pointers to
functions). But this would introduce unacceptable overhead.

In a generic programming design (c++ templates), we would essentially
generate 2 copies of every function, one that operates on plain arrays,
and one that operates on masked arrays, each using the appropriate
function for element access, shape, etc. This way, no unneeded overhead is
introduced (although the code size is increased - but this is probably of
little consequence on a modern demand-paged OS).

Following this approach, ma and ndarray don't have to have any inheritance
relation. OTOH, inheritance is probably useful since there are many common
features to ma and ndarray, and a lot of code could be shared.

From charlesr.harris at gmail.com  Sat Apr 28 23:18:55 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 28 Apr 2012 21:18:55 -0600
Subject: [Numpy-discussion] A crazy masked-array thought
In-Reply-To: References: <0B94F197-6A56-4F24-AA30-232973EE16E3@continuum.io>
Message-ID:

On Sat, Apr 28, 2012 at 10:58 AM, Neal Becker wrote:

> I don't agree with the argument that ma should be a superclass of ndarray.
> It is ma that is adding features. That makes it a subclass. We're not
> talking mathematics here.

It isn't a subclass either. In a true subclass, anything that worked on the
base class would work equally well on a subclass *without modification*.
Basically, it's an independent class with special functions that can handle
combinations and ufuncs. Look at all the functions exported in
numpy/ma/core.py. Inheritance really isn't a concept appropriate to this
case. Pretty much all the functions are rewritten for masked arrays, which
is one reason maintenance is a hassle: lots of things have to be maintained
in two places.

> There is a well-known disease of OOP where everything seems to bubble up
> to the top of the class hierarchy - so that the base class becomes
> bloated to support every feature needed by subclasses. I believe that's
> considered poor design.
>
> Is there a way to support ma as a subclass of ndarray, without
> introducing overhead into ndarray? [...] In an OO design, these would be
> virtual functions (or in C, pointers to functions). But this would
> introduce unacceptable overhead.

Sure, and you would still have two different functions of almost everything.

> In a generic programming design (c++ templates), we would essentially
> generate 2 copies of every function [...]
>
> Following this approach, ma and ndarray don't have to have any
> inheritance relation. OTOH, inheritance is probably useful since there
> are many common features to ma and ndarray, and a lot of code could be
> shared.

Not many common behaviours. Analogous behaviours, perhaps. And since
everything ends up written twice the best way to share code is to do it in
the base class.

Chuck
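Concretely, the duplication being discussed looks like this (np.ma.mean is
one of the reimplementations exported from numpy/ma/core.py):

    import numpy as np

    a = np.ma.masked_array([1.0, 10.0, 3.0], mask=[False, True, False])

    # The plain ndarray machinery knows nothing about the mask:
    print(np.asarray(a).mean())   # 4.666..., masked value included

    # np.ma carries its own implementation of the same reduction:
    print(np.ma.mean(a))          # 2.0, masked value excluded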
From wesmckinn at gmail.com  Sun Apr 29 17:45:52 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sun, 29 Apr 2012 17:45:52 -0400
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To: References: Message-ID:

On Sat, Apr 28, 2012 at 11:18 AM, Charles R Harris wrote:
> [...]
> I was wondering what datetime you were using since the version in 1.6 had
> issues. Have you tested with both?
>
> Chuck

Could you define issues? I haven't had a chance to make the library
compatible with both 1.6.1 and 1.7.0 yet (like Travis I'm using NumPy
1.6.1 from EPD); it's important though as pandas will be the first
widely used library I know of that will make heavy use of datetime64.

- Wes

From charlesr.harris at gmail.com  Sun Apr 29 19:12:44 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 29 Apr 2012 17:12:44 -0600
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To: References: Message-ID:

On Sun, Apr 29, 2012 at 3:45 PM, Wes McKinney wrote:
> [...]
> Could you define issues? I haven't had a chance to make the library
> compatible with both 1.6.1 and 1.7.0 yet (like Travis I'm using NumPy
> 1.6.1 from EPD); it's important though as pandas will be the first
> widely used library I know of that will make heavy use of datetime64.
I'm not sure myself, but Travis asked Mark to get datetime fixed up. Mark
would probably be the best to answer the question. You might ping him
offline if he isn't watching the list at the moment.

Chuck

From bacmsantos at gmail.com  Mon Apr 30 07:09:20 2012
From: bacmsantos at gmail.com (Bruno Santos)
Date: Mon, 30 Apr 2012 12:09:20 +0100
Subject: [Numpy-discussion] Alternative to R phyper
Message-ID:

Hello everyone,

I have a bit of code where I am using rpy2 to import R phyper so I can
perform a hypergeometric test. Unfortunately our cluster does not have a
working installation of rpy2, so I am wondering if I could translate the
code to scipy, which would make it completely independent of R. The python
code I have is as follows:

def lphyper(self,q,m,n,k):
    """
    self.phyper(self,q,m,n,k)
    Calculate p-value using R function phyper from rpy2 low-level
    interface.
    "R Documentation
    phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
    q: vector of quantiles representing the number of white balls
        drawn without replacement from an urn which contains both
        black and white balls.
    m: the number of white balls in the urn.
    n: the number of black balls in the urn.
    k: the number of balls drawn from the urn.
    log.p: logical; if TRUE, probabilities p are given as log(p).
    lower.tail: logical; if TRUE (default), probabilities are P[X <= x],
        otherwise, P[X > x].
    """
    phyper_q = SexpVector([q,], rinterface.INTSXP)
    phyper_m = SexpVector([m,], rinterface.INTSXP)
    phyper_n = SexpVector([n,], rinterface.INTSXP)
    phyper_k = SexpVector([k,], rinterface.INTSXP)
    return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0]

I have looked at scipy.stats.hypergeom but it is giving me a different
result, which is also negative.

> 1-phyper(45, 92, 7518, 1329)
[1] 6.92113e-13

In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
Out[24]: -8.4343643180773142e-12

This was supposed to be an error with an older version of scipy but I am
using more recent versions of it which should not contain the error
anymore:

In [26]: numpy.__version__
Out[26]: '1.5.1'

In [27]: scipy.__version__
Out[27]: '0.9.0'

Thank you very much in advance for any help.

Best,
Bruno

From robert.kern at gmail.com  Mon Apr 30 07:27:26 2012
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 30 Apr 2012 12:27:26 +0100
Subject: [Numpy-discussion] Alternative to R phyper
In-Reply-To: References: Message-ID:

On Mon, Apr 30, 2012 at 12:09, Bruno Santos wrote:
> [...]
> I have looked at scipy.stats.hypergeom but it is giving me a different
> result, which is also negative.
>
>> 1-phyper(45, 92, 7518, 1329)
> [1] 6.92113e-13
>
> In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
> Out[24]: -8.4343643180773142e-12

The CDF (CMF? whatever) for stats.hypergeom is not implemented
explicitly, so it falls back to the default implementation of just
summing up the PMF. You are falling victim to floating point error
accumulation such that the sum exceeds 1.0, so the survival function
1-CDF ends up negative. I don't think we have a better implementation
of the CDF anywhere in scipy, sorry.

--
Robert Kern
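Robert's explanation is easy to check numerically. The sketch below mimics
the generic fallback by summing the PMF term by term, with the same counts
as above (M = 92 + 7518 in total, 92 white, 1329 drawn); depending on the
scipy version, the accumulated sum can creep just past 1.0, which is
exactly what turns 1 - CDF negative:

    import numpy as np
    from scipy import stats

    k = np.arange(46)  # 0, 1, ..., 45
    cdf_45 = stats.hypergeom.pmf(k, 92 + 7518, 92, 1329).sum()

    # The exact CDF is <= 1, but the floating-point sum may exceed it,
    # making the naive survival function negative:
    print(1.0 - cdf_45)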
From josef.pktd at gmail.com  Mon Apr 30 09:03:41 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 30 Apr 2012 09:03:41 -0400
Subject: [Numpy-discussion] Alternative to R phyper
In-Reply-To: References: Message-ID:

On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern wrote:
> On Mon, Apr 30, 2012 at 12:09, Bruno Santos wrote:
>> [...]
>> I have looked at scipy.stats.hypergeom but it is giving me a different
>> result, which is also negative.
>>
>>> 1-phyper(45, 92, 7518, 1329)
>> [1] 6.92113e-13
>>
>> In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
>> Out[24]: -8.4343643180773142e-12

the corrected version with explicit sf was added in 0.10

>>> from scipy import stats
>>> stats.hypergeom.sf(45,(92+7518),92,1329)
6.9212632647221852e-13
>>> import scipy
>>> scipy.__version__
'0.10.0b2'

Josef

> The CDF (CMF? whatever) for stats.hypergeom is not implemented
> explicitly, so it falls back to the default implementation of just
> summing up the PMF. [...]
>
> --
> Robert Kern
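For R users making the same translation, the parameter mapping is the main
stumbling block: R's phyper(q, m, n, k) counts m white balls, n black
balls and k draws, while scipy's hypergeom is parameterized as (M, n, N) =
(population size, number of white balls, number drawn). A sketch of an
upper-tail replacement, assuming scipy >= 0.10 so that hypergeom.sf is the
explicit implementation:

    from scipy import stats

    def phyper_upper_tail(q, m, n, k):
        # R equivalent: phyper(q, m, n, k, lower.tail=FALSE) = P[X > q]
        return stats.hypergeom.sf(q, m + n, m, k)

    print(phyper_upper_tail(45, 92, 7518, 1329))  # ~6.92113e-13, as in R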
From mwwiebe at gmail.com  Mon Apr 30 15:19:14 2012
From: mwwiebe at gmail.com (Mark Wiebe)
Date: Mon, 30 Apr 2012 14:19:14 -0500
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To: References: Message-ID:

I've done some comparisons of 1.6.1 and 1.7 (master), and written up some
key differences in a pull request here:

https://github.com/numpy/numpy/pull/264/files#diff-0

What you've discovered here looks like an interaction between the
automatic unit detection and struct dtypes, it's a bug to do with how
flexible types work. To match how the struct dtypes work with flexible
string dtypes, it should raise an error when trying to create a dtype with
just 'M8' and no unit specified, at In [4].

-Mark

On Fri, Apr 27, 2012 at 3:52 PM, Travis Vaught wrote:

> With NumPy 1.6.1 (from EPD 7.2-2) I get this behavior:
> [...]
> In [4]: recdata = np.array(data, dtype=schema)
> [...]
> Any hints about a regression I can check for? Or perhaps I missed an api
> change for specifying datetime dtypes?
>
> Best,
>
> Travis

From ralf.gommers at googlemail.com  Mon Apr 30 16:16:04 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Mon, 30 Apr 2012 22:16:04 +0200
Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help
Message-ID:

Hi all,

Charles has done a great job of backporting a lot of bug fixes to 1.6.2,
see PRs 260, 261, 262 and 263. For those who are interested, please have a
look at those PRs to see and comment on what's proposed to go into 1.6.2.

I also have a request for help with testing: can someone who uses MSVC
test (preferably with a 2.x and a 3.x version)? I have a branch with all
four PRs merged at https://github.com/rgommers/numpy/tree/bports

Thanks,
Ralf

From charlesr.harris at gmail.com  Mon Apr 30 17:11:18 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 30 Apr 2012 15:11:18 -0600
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To: References: Message-ID:

On Mon, Apr 30, 2012 at 1:19 PM, Mark Wiebe wrote:

> I've done some comparisons of 1.6.1 and 1.7 (master), and written up some
> key differences in a pull request here:
>
> https://github.com/numpy/numpy/pull/264/files#diff-0
> [...]

Hi Mark, is there anything relevant that should go in the 1.6.2 release?

Chuck

From emmanuelle.gouillart at nsup.org  Mon Apr 30 17:41:03 2012
From: emmanuelle.gouillart at nsup.org (Emmanuelle Gouillart)
Date: Mon, 30 Apr 2012 23:41:03 +0200
Subject: [Numpy-discussion] Euroscipy 2012 deadline extension: May 7th
Message-ID: <20120430214103.GA16609@phare.normalesup.org>

The committee of the Euroscipy 2012 conference has extended the deadline
for abstract submission to **Monday May 7th, midnight** (Brussels time).
Up to then, new abstracts may be submitted on
http://www.euroscipy.org/conference/euroscipy2012, and already-submitted
abstracts can be modified. We are very much looking forward to your
submissions to the conference.

Euroscipy 2012 is the annual European conference for scientists using
Python. It will be held August 23-27 2012 in Brussels, Belgium.

It is also still possible to propose sprints that will take place after
the conference; please write to Berkin Malkoc (malkocb at itu.edu.tr) for
practical organization (rooms, ...). Any other questions should be
addressed exclusively to org-team at lists.euroscipy.org

-- Emmanuelle
From mwwiebe at gmail.com  Mon Apr 30 18:20:46 2012
From: mwwiebe at gmail.com (Mark Wiebe)
Date: Mon, 30 Apr 2012 17:20:46 -0500
Subject: [Numpy-discussion] datetime dtype possible regression
In-Reply-To: References: Message-ID:

On Mon, Apr 30, 2012 at 4:11 PM, Charles R Harris wrote:

> On Mon, Apr 30, 2012 at 1:19 PM, Mark Wiebe wrote:
>> I've done some comparisons of 1.6.1 and 1.7 (master), and written up
>> some key differences in a pull request here:
>>
>> https://github.com/numpy/numpy/pull/264/files#diff-0
>>
>> What you've discovered here looks like an interaction between the
>> automatic unit detection and struct dtypes, it's a bug to do with how
>> flexible types work. To match how the struct dtypes work with flexible
>> string dtypes, it should raise an error when trying to create a dtype
>> with just 'M8' and no unit specified, at In [4].
>
> Hi Mark, is there anything relevant that should go in the 1.6.2 release?

I don't think so, I can't think of anything in the datetime fixes which is
small and isolated enough to be worth integrating in the 1.6 series.

-Mark

> Chuck

From cgohlke at uci.edu  Mon Apr 30 19:13:58 2012
From: cgohlke at uci.edu (Christoph Gohlke)
Date: Mon, 30 Apr 2012 16:13:58 -0700
Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help
In-Reply-To: References: Message-ID: <4F9F1CB6.5050804@uci.edu>

On 4/30/2012 1:16 PM, Ralf Gommers wrote:
> Hi all,
>
> Charles has done a great job of backporting a lot of bug fixes to 1.6.2,
> see PRs 260, 261, 262 and 263. [...]
>
> I also have a request for help with testing: can someone who uses MSVC
> test (preferably with a 2.x and a 3.x version)? I have a branch with all
> four PRs merged at https://github.com/rgommers/numpy/tree/bports
>
> Thanks,
> Ralf

Hi Ralf,

that branch builds and tests OK with msvc9/MKL on win-amd64-py2.7 and
win-amd64-py3.2. No apparent incompatibilities with scipy or matplotlib
either.

Christoph

From travis at continuum.io  Mon Apr 30 19:31:50 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 30 Apr 2012 18:31:50 -0500
Subject: [Numpy-discussion] Issue Tracking
Message-ID: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>

Hey all,

We have been doing some investigation of various approaches to issue
tracking. The last time the conversation left this list was with Ralf's
current list of preferences as:

1) Redmine
2) Trac
3) Github

Since that time, Maggie, who has been doing a lot of work setting up
various issue tracking tools over the past couple of months, has set up a
redmine instance and played with it. This is a possibility as a future
issue tracker.

However, today I took a hard look at what the IPython folks are doing with
their issue tracker and was very impressed by the level of community
integration that having issues tracked by Github provides. Right now, we
have a major community problem in that there are 3 conversations taking
place (well at least 2 1/2). One on Github, one on this list, and one on
the Trac and its accompanying wiki.

I would like to propose just using Github's issue tracker. This just seems
like the best move overall for us at this point. I like how the Pull
Request mechanism integrates with the issue tracking. We could set up a
Redmine instance but this would just re-create the same separation of
communities that currently exists with the pull-requests, the mailing
list, and the Trac pages. Redmine is nicer than Trac, but it's still a
separate space. We need to make Github the NumPy developer hub and not
have it spread throughout several sites.

The same is true of SciPy. I think if SciPy also migrates to use Github
issues, then together with IPython we can really be a voice that helps
Github. I will propose to NumFOCUS that the Foundation sponsor migration
of the Trac to Github for NumPy and SciPy.
If anyone would like to be involved in this migration project, please let
me know.

Comments, concerns?

-Travis

From travis at continuum.io  Mon Apr 30 19:39:26 2012
From: travis at continuum.io (Travis Oliphant)
Date: Mon, 30 Apr 2012 18:39:26 -0500
Subject: [Numpy-discussion] Continuous Integration
Message-ID:

Hello all,

NumFOCUS has been working with Continuum Analytics and multiple people in
the community on Continuous Integration services for NumPy. Right now the
tools we are using are:

TeamCity
ShiningPandas

One great thing about Continuous Integration is that you don't have to
make a single decision as long as you have a place to report everything.
Continuum Analytics is hosting a TeamCity site and I believe some
ShiningPandas sites have been set up. We have some work to do to get the
build agents up and running and a single point that communicates the
status of all the agents.

This email is just to let people know that we are still working on this.
Maggie Mari and Bryan van de Ven have been doing the work on the TeamCity
side. There are several others who have been involved worldwide in setting
up build agents and investigating ShiningPandas.

Continuum Analytics is still looking for someone who can devote at least
50% of their time to working on these issues as part of their full-time
job. You can be located anywhere in the world for this. If you are
interested, send a note to jobs at continuum.io

If you have particular reasons why we should choose a particular CI
service, please speak up and let your voice be heard. There is still time
to make a difference in what we are setting up.

Best regards,

-Travis

From warren.weckesser at enthought.com  Mon Apr 30 21:12:52 2012
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Mon, 30 Apr 2012 20:12:52 -0500
Subject: [Numpy-discussion] SciPy 2012 Abstract and Tutorial Deadlines Extended
Message-ID:

SciPy 2012 Conference Deadlines Extended

Didn't quite finish your abstract or tutorial yet? Good news: the SciPy
2012 organizers have extended the deadline until Friday, May 4. Proposals
for tutorials and abstracts for talks and posters are now due by midnight
(Austin time, CDT), May 4.

For the many of you who have already submitted an abstract or tutorial:
thanks! If you need to make corrections to an abstract or tutorial that
you have already submitted, you may resubmit it by the same deadline.

The SciPy 2012 Organizers

From jason-sage at creativetrax.com  Mon Apr 30 23:14:55 2012
From: jason-sage at creativetrax.com (Jason Grout)
Date: Mon, 30 Apr 2012 22:14:55 -0500
Subject: [Numpy-discussion] Issue Tracking
In-Reply-To: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
Message-ID: <4F9F552F.4060605@creativetrax.com>

On 4/30/12 6:31 PM, Travis Oliphant wrote:
> Hey all,
>
> We have been doing some investigation of various approaches to issue
> tracking. [...]
> Comments, concerns?

I've been pretty impressed with the lemonade that the IPython folks have
made out of what I see as pretty limiting shortcomings of the github issue
tracker. I've been trying to use it for a much smaller project
(https://github.com/sagemath/sagecell/), and it is a lot harder, in my
(somewhat limited) experience, than using trac or the google issue
tracker. None of these issues seems like it would be too hard to solve,
but since we don't even have the source to the tracker, we're somewhat at
github's mercy for any improvements. Github does have a very nice API for
interacting with the data, which somewhat makes up for some of the severe
shortcomings of the web interface (a short sketch of that access follows
at the end of this message).

In no particular order, here are a few that come to mind immediately:

1. No key:value pairs for labels (Fernando brought this up a long time
ago, I think). This is brilliant in Google code's tracker, and allows for
custom fields that help in tracking workflow (like status, priority,
etc.). Sure, you can do what the IPython folks are doing and just create
labels for every possible status, but that's unwieldy and takes a lot of
discipline to maintain, which means it takes a lot of developer time or it
becomes inconsistent and not very useful.

2. The disjointed relationship between pull requests and issues. They
share numberings, for example, and both support discussions, etc. If you
use the API, you can submit code to an issue, but then the issue becomes a
pull request, which means that all labels on the issue disappear from the
web interface (but you can still manage to set labels using the list view
of the issue tracker, if I recall correctly). If you don't attach code to
issues, it means that every issue is duplicated in a pull request, which
splits the conversation up between an issue ticket and a pull request
ticket.

3. No attachments for issues (screenshots, supporting documents, etc.).
Having API access to data won't help you here.

4. No custom queries. We love these in the Sage trac instance; since we
have full access to the database, we can run any sort of query we want.
With API data access, you can build your own queries, so maybe this isn't
insurmountable.

5. Stylistically, the webpage is not very dense on information.
I get frustrated when trying to see the issues because they only come 25
at a time, are never grouped in any way, and there are only 3 options for
sorting issues. Compare the very nice, dense layout of Google Code issues
or bitbucket. Google Code issues also lets you cross-tabulate the issues
so you can quickly triage them. Compare also the pretty comprehensive
options for sorting and grouping things in trac.

6. Side-by-side diffs are nice to have, and I believe bitbucket and google
code both have them. Of course, this isn't a deal-breaker because you can
always pull the branch down, but it would be nice to have, and there's not
really a way we can put it into the github tracker ourselves.

How does, for example, the JIRA github connector work? Does it pull in
code comments, etc.?

Anyways, I'm not a regular contributor to numpy, but I have been trying to
get used to the github tracker for about a year now, and I just keep
getting more frustrated with it. I suppose the most frustrating part is
that it is closed source, so even if I did want to scratch an itch, I
can't. That said, it is nice to have code and dev conversations happening
in one place. There are great things about github issues, of course. But
I'm not so sure, for me, that they outweigh some of the administrative
issues listed above.

Thanks,

Jason
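As a sketch of that API access: the repo path and label below are just
placeholder values, and the v3 API returns plain JSON, so nothing beyond
the Python 2 standard library should be needed:

    import json
    import urllib2

    # List open issues carrying a given label for a repository.
    url = ("https://api.github.com/repos/numpy/numpy/issues"
           "?labels=bug&state=open")
    for issue in json.load(urllib2.urlopen(url)):
        print("#%d %s" % (issue["number"], issue["title"]))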
From ben.root at ou.edu  Mon Apr 30 23:27:13 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Mon, 30 Apr 2012 23:27:13 -0400
Subject: [Numpy-discussion] Issue Tracking
In-Reply-To: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
Message-ID:

On Monday, April 30, 2012, Travis Oliphant wrote:
> Hey all,
>
> We have been doing some investigation of various approaches to issue
> tracking. [...]
>
> Comments, concerns?
>
> -Travis

Would it be possible to use the combined clout of the scipy packages as a
way to put some weight behind feature requests to github?

Ben Root

From charlesr.harris at gmail.com  Mon Apr 30 23:28:01 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 30 Apr 2012 21:28:01 -0600
Subject: [Numpy-discussion] Issue Tracking
In-Reply-To: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
Message-ID:

On Mon, Apr 30, 2012 at 5:31 PM, Travis Oliphant wrote:
> [...]
> The same is true of SciPy. I think if SciPy also migrates to use Github
> issues, then together with IPython we can really be a voice that helps
> Github. I will propose to NumFOCUS that the Foundation sponsor migration
> of the Trac to Github for NumPy and SciPy. If anyone would like to be
> involved in this migration project, please let me know.

There is a group where I work that purchased the enterprise version of
github. But they still use trac. I think Ralf's opinion should count for a
fair amount here, since the tracker is important for releases and
backports. Having a good connection between commits and tickets is also
very helpful, although sticking with github might be better there. The
issue tracker isn't really intended as social media and I find the
notifications from trac sufficient.

Chuck