From jtaylor.debian at googlemail.com Mon Aug 4 18:05:43 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 00:05:43 +0200 Subject: [SciPy-Dev] last call for numpy 1.8.2 bugfixes Message-ID: <53E003B7.3090100@googlemail.com> hi, as numpy 1.9 is going to be a relative hard upgrade as indexing changes expose a couple bugs in third party packages and the large amount of small little incompatibilities I will create a numpy 1.8.2 release tomorrow with a couple of important or hard to work around bugfixes. The most important bugfix is fixing the wrong result partition with multiple selections could produce if selections ended up in an equal range, see https://github.com/numpy/numpy/issues/4836 (if the crash is still unreproducable, help appreciated). the rest of the fixes are small ones listed below. If I have missed one or you consider one of the fixes to invasive for a bugfix release please speak up now. As the number of fixes is small I will skip a release candidate. Make fftpack._raw_fft threadsafe https://github.com/numpy/numpy/issues/4656 Prevent division by zero https://github.com/numpy/numpy/issues/650 Fix lack of NULL check in array_richcompare https://github.com/numpy/numpy/issues/4613 incorrect argument order to _copyto in in np.nanmax, np.nanmin https://github.com/numpy/numpy/issues/4628 Hold GIL for types with fields, fixes https://github.com/numpy/numpy/issues/4642 svd ufunc typo https://github.com/numpy/numpy/issues/4733 check alignment of strides for byteswap https://github.com/numpy/numpy/issues/4774 add missing elementsize alignment check for simd reductions https://github.com/numpy/numpy/issues/4853 ifort has issues with optimization flag /O2 https://github.com/numpy/numpy/issues/4602 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Mon Aug 4 18:09:39 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 4 Aug 2014 15:09:39 -0700 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: <53E003B7.3090100@googlemail.com> References: <53E003B7.3090100@googlemail.com> Message-ID: Hi, On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor wrote: > hi, > as numpy 1.9 is going to be a relative hard upgrade as indexing changes > expose a couple bugs in third party packages and the large amount of > small little incompatibilities I will create a numpy 1.8.2 release > tomorrow with a couple of important or hard to work around bugfixes. > > The most important bugfix is fixing the wrong result partition with > multiple selections could produce if selections ended up in an equal > range, see https://github.com/numpy/numpy/issues/4836 (if the crash is > still unreproducable, help appreciated). > > the rest of the fixes are small ones listed below. > If I have missed one or you consider one of the fixes to invasive for a > bugfix release please speak up now. > As the number of fixes is small I will skip a release candidate. 
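For reference, the partition problem mentioned here involves calls of the following shape: np.partition (or np.argpartition) with several kth values in a single call. This is only a sketch of the usage pattern so readers can tell whether their code is affected; the inputs that actually trigger the wrong results are documented in gh-4836.

    import numpy as np

    a = np.random.rand(100)
    kth = [10, 50, 90]                 # multiple selections in one call
    p = np.partition(a, kth)
    # after a correct partition, each selected element is the value a full
    # sort would place at that index
    assert all(p[k] == np.sort(a)[k] for k in kth)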
> > > Make fftpack._raw_fft threadsafe > https://github.com/numpy/numpy/issues/4656 > > Prevent division by zero > https://github.com/numpy/numpy/issues/650 > > Fix lack of NULL check in array_richcompare > https://github.com/numpy/numpy/issues/4613 > > incorrect argument order to _copyto in in np.nanmax, np.nanmin > https://github.com/numpy/numpy/issues/4628 > > Hold GIL for types with fields, fixes > https://github.com/numpy/numpy/issues/4642 > > svd ufunc typo > https://github.com/numpy/numpy/issues/4733 > > check alignment of strides for byteswap > https://github.com/numpy/numpy/issues/4774 > > add missing elementsize alignment check for simd reductions > https://github.com/numpy/numpy/issues/4853 > > ifort has issues with optimization flag /O2 > https://github.com/numpy/numpy/issues/4602 Any chance of a RC to give us some time to test? Cheers, Matthew From jtaylor.debian at googlemail.com Mon Aug 4 18:12:50 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 00:12:50 +0200 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: References: <53E003B7.3090100@googlemail.com> Message-ID: <53E00562.5070602@googlemail.com> On 05.08.2014 00:09, Matthew Brett wrote: > Hi, > > On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor > wrote: >> hi, >> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >> expose a couple bugs in third party packages and the large amount of >> small little incompatibilities I will create a numpy 1.8.2 release >> tomorrow with a couple of important or hard to work around bugfixes. >>... > > Any chance of a RC to give us some time to test? > I hope I have only selected fixes that are safe and do not require a RC. sure we could do one, but if there are issues we can also just make a quick 1.8.3 release follow up. the main backport PR is: https://github.com/numpy/numpy/pull/4949 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From njs at pobox.com Mon Aug 4 18:25:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 4 Aug 2014 23:25:04 +0100 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: <53E00562.5070602@googlemail.com> References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com> Message-ID: On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor wrote: > On 05.08.2014 00:09, Matthew Brett wrote: >> Hi, >> >> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor >> wrote: >>> hi, >>> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >>> expose a couple bugs in third party packages and the large amount of >>> small little incompatibilities I will create a numpy 1.8.2 release >>> tomorrow with a couple of important or hard to work around bugfixes. >>>... >> >> Any chance of a RC to give us some time to test? >> > > I hope I have only selected fixes that are safe and do not require a RC. > sure we could do one, but if there are issues we can also just make a > quick 1.8.3 release follow up. > > the main backport PR is: https://github.com/numpy/numpy/pull/4949 It's probably better to just make an RC if it's not too much trouble... 
it's always possible to misjudge what issues arise, if there's a real-but-non-catastrophic issue then people 1.8.2 will remain in use even if 1.8.3 is released afterwards and force downstream libraries to work around the issues, and just in general it's good to have and follow standard processes because special cases lead to errors. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From matthew.brett at gmail.com Mon Aug 4 18:27:38 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 4 Aug 2014 15:27:38 -0700 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com> Message-ID: On Mon, Aug 4, 2014 at 3:25 PM, Nathaniel Smith wrote: > On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor > wrote: >> On 05.08.2014 00:09, Matthew Brett wrote: >>> Hi, >>> >>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor >>> wrote: >>>> hi, >>>> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >>>> expose a couple bugs in third party packages and the large amount of >>>> small little incompatibilities I will create a numpy 1.8.2 release >>>> tomorrow with a couple of important or hard to work around bugfixes. >>>>... >>> >>> Any chance of a RC to give us some time to test? >>> >> >> I hope I have only selected fixes that are safe and do not require a RC. >> sure we could do one, but if there are issues we can also just make a >> quick 1.8.3 release follow up. A few days to test would be fine, I'd prefer an RC too, Cheers, Matthew From jtaylor.debian at googlemail.com Mon Aug 4 18:46:14 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 00:46:14 +0200 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com> Message-ID: <53E00D36.3080500@googlemail.com> On 05.08.2014 00:27, Matthew Brett wrote: > On Mon, Aug 4, 2014 at 3:25 PM, Nathaniel Smith wrote: >> On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor >> wrote: >>> On 05.08.2014 00:09, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor >>>> wrote: >>>>> hi, >>>>> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >>>>> expose a couple bugs in third party packages and the large amount of >>>>> small little incompatibilities I will create a numpy 1.8.2 release >>>>> tomorrow with a couple of important or hard to work around bugfixes. >>>>> ... >>>> >>>> Any chance of a RC to give us some time to test? >>>> >>> >>> I hope I have only selected fixes that are safe and do not require a RC. >>> sure we could do one, but if there are issues we can also just make a >>> quick 1.8.3 release follow up. > > A few days to test would be fine, I'd prefer an RC too, > alright I'll make an RC tomorrow and planning for release this weekend then. From jtaylor.debian at googlemail.com Tue Aug 5 15:45:02 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 21:45:02 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.8.2 release candidate Message-ID: <53E1343E.7020805@googlemail.com> Hello, I am pleased to announce the first release candidate for numpy 1.8.2, a pure bugfix release for the 1.8.x series. https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ If no regressions show up the final release is planned this weekend. 
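For anyone who wants to help check the candidate, the quickest smoke test is to run the full test suite against the installed release candidate (this assumes the nose package is available, which the numpy test runner of this series requires):

    import numpy
    print(numpy.__version__)   # should report 1.8.2rc1
    numpy.test('full')         # or numpy.test() for the smaller default subset

Downstream projects can likewise run their own test suites against the candidate, which is the point of having an RC at all.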
The upgrade is recommended for all users of the 1.8.x series. Following issues have been fixed: * gh-4836: partition produces wrong results for multiple selections in equal ranges * gh-4656: Make fftpack._raw_fft threadsafe * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin * gh-4613: Fix lack of NULL check in array_richcompare * gh-4642: Hold GIL for converting dtypes types with fields * gh-4733: fix np.linalg.svd(b, compute_uv=False) * gh-4853: avoid unaligned simd load on reductions on i386 * gh-4774: avoid unaligned access for strided byteswap * gh-650: Prevent division by zero when creating arrays from some buffers * gh-4602: ifort has issues with optimization flag O2, use O1 Source tarballs, windows installers and release notes can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From jtaylor.debian at googlemail.com Sat Aug 9 08:38:02 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 09 Aug 2014 14:38:02 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.8.2 bugfix release Message-ID: <53E6162A.8050809@googlemail.com> Hello, I am pleased to announce the release of NumPy 1.8.2, a pure bugfix release for the 1.8.x series. https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ The upgrade is recommended for all users of the 1.8.x series. Following issues have been fixed: * gh-4836: partition produces wrong results for multiple selections in equal ranges * gh-4656: Make fftpack._raw_fft threadsafe * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin * gh-4642: Hold GIL for converting dtypes types with fields * gh-4733: fix np.linalg.svd(b, compute_uv=False) * gh-4853: avoid unaligned simd load on reductions on i386 * gh-4722: Fix seg fault converting empty string to object * gh-4613: Fix lack of NULL check in array_richcompare * gh-4774: avoid unaligned access for strided byteswap * gh-650: Prevent division by zero when creating arrays from some buffers * gh-4602: ifort has issues with optimization flag O2, use O1 The source distributions have been uploaded to PyPI. The Windows installers, documentation and release notes can be found at: https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Sat Aug 9 20:23:54 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 9 Aug 2014 17:23:54 -0700 Subject: [SciPy-Dev] [Numpy-discussion] ANN: NumPy 1.8.2 bugfix release In-Reply-To: <53E6162A.8050809@googlemail.com> References: <53E6162A.8050809@googlemail.com> Message-ID: On Sat, Aug 9, 2014 at 5:38 AM, Julian Taylor wrote: > Hello, > > I am pleased to announce the release of NumPy 1.8.2, a > pure bugfix release for the 1.8.x series. > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ > The upgrade is recommended for all users of the 1.8.x series. 
> > Following issues have been fixed: > * gh-4836: partition produces wrong results for multiple selections in > equal ranges > * gh-4656: Make fftpack._raw_fft threadsafe > * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin > * gh-4642: Hold GIL for converting dtypes types with fields > * gh-4733: fix np.linalg.svd(b, compute_uv=False) > * gh-4853: avoid unaligned simd load on reductions on i386 > * gh-4722: Fix seg fault converting empty string to object > * gh-4613: Fix lack of NULL check in array_richcompare > * gh-4774: avoid unaligned access for strided byteswap > * gh-650: Prevent division by zero when creating arrays from some buffers > * gh-4602: ifort has issues with optimization flag O2, use O1 > > > The source distributions have been uploaded to PyPI. The Windows > installers, documentation and release notes can be found at: > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ OSX wheels now also up on pypi, please let us know of any problems, Cheers, Matthew From manojkumarsivaraj334 at gmail.com Mon Aug 11 11:04:27 2014 From: manojkumarsivaraj334 at gmail.com (Manoj Kumar) Date: Mon, 11 Aug 2014 17:04:27 +0200 Subject: [SciPy-Dev] Fastest way to multiply a sparse matrix with another numpy array Message-ID: Hello, I was wondering what is the fastest way (format) to multiply a sparse matrix with a numpy array. Intuitively, a csr format multiplied with a numpy array which is fortran contiguous seems to be the fastest, but I have ran a few benchmarks and it seems otherwise. It is also mentioned here http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.html that using csr matrices "may" be faster. In [5]: X Out[5]: <11314x130107 sparse matrix of type '' with 1787565 stored elements in Compressed Sparse Row format> In [6]: _, n_features = X.shape In [9]: w_c = np.random.rand(n_features, 10) In [10]: w_f = np.asarray(w_c, order='f') In [13]: csc = sparse.csc_matrix(X) In [30]: %timeit X * w_f 10 loops, best of 3: 40.5 ms per loop In [31]: %timeit X * w_c 10 loops, best of 3: 37.3 ms per loop In [32]: %timeit csc * w_c 10 loops, best of 3: 24.3 ms per loop In [33]: %timeit csc * w_f 10 loops, best of 3: 27.3 ms per loop It seems here, using a csc matrix is faster with a C-contiguous numpy array which is completely non-intuitive to me. Are there any hard rules for this? or is it data dependent? Sorry for my noobish questions! -- Regards, Manoj Kumar, GSoC 2014, Scikit-learn Mech Undergrad http://manojbits.wordpress.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From manojkumarsivaraj334 at gmail.com Mon Aug 11 11:08:44 2014 From: manojkumarsivaraj334 at gmail.com (Manoj Kumar) Date: Mon, 11 Aug 2014 17:08:44 +0200 Subject: [SciPy-Dev] Fastest way to multiply a sparse matrix with another numpy array In-Reply-To: References: Message-ID: I'm sorry that I posted this to the developers mailing list. I was meaning to post this to the users list. On Mon, Aug 11, 2014 at 5:04 PM, Manoj Kumar wrote: > Hello, > > I was wondering what is the fastest way (format) to multiply a sparse > matrix with a numpy array. Intuitively, a csr format multiplied with a > numpy array which is fortran contiguous seems to be the fastest, but I have > ran a few benchmarks and it seems otherwise. It is also mentioned here > > http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.html > that using csr matrices "may" be faster. 
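A self-contained version of the comparison quoted below, using synthetic data rather than the matrix in the session, in case anyone wants to reproduce the timings; the sizes and density here are arbitrary:

    import timeit
    import numpy as np
    from scipy import sparse

    X_csr = sparse.rand(10000, 50000, density=0.001, format='csr')
    X_csc = X_csr.tocsc()
    w_c = np.random.rand(50000, 10)    # C-contiguous operand
    w_f = np.asfortranarray(w_c)       # Fortran-contiguous copy

    cases = [('csr * C-order', X_csr, w_c), ('csr * F-order', X_csr, w_f),
             ('csc * C-order', X_csc, w_c), ('csc * F-order', X_csc, w_f)]
    for name, A, w in cases:
        t = min(timeit.repeat(lambda: A * w, number=10, repeat=3)) / 10
        print('%-15s %.1f ms' % (name, 1e3 * t))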
> > > In [5]: X > Out[5]: > <11314x130107 sparse matrix of type '' > with 1787565 stored elements in Compressed Sparse Row format> > In [6]: _, n_features = X.shape > In [9]: w_c = np.random.rand(n_features, 10) > In [10]: w_f = np.asarray(w_c, order='f') > In [13]: csc = sparse.csc_matrix(X) > In [30]: %timeit X * w_f > 10 loops, best of 3: 40.5 ms per loop > > In [31]: %timeit X * w_c > 10 loops, best of 3: 37.3 ms per loop > > In [32]: %timeit csc * w_c > 10 loops, best of 3: 24.3 ms per loop > > In [33]: %timeit csc * w_f > 10 loops, best of 3: 27.3 ms per loop > > > It seems here, using a csc matrix is faster with a C-contiguous numpy > array which is completely non-intuitive to me. Are there any hard rules for > this? or is it data dependent? > > Sorry for my noobish questions! > -- > Regards, > Manoj Kumar, > GSoC 2014, Scikit-learn > Mech Undergrad > http://manojbits.wordpress.com > -- Regards, Manoj Kumar, GSoC 2014, Scikit-learn Mech Undergrad http://manojbits.wordpress.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Tue Aug 12 08:33:11 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Tue, 12 Aug 2014 14:33:11 +0200 Subject: [SciPy-Dev] computing pairwise distance of vectors with missing (nan) values In-Reply-To: References: <53CCCA9E.1060103@gmail.com> Message-ID: So I've made significant headway on cythonizing a pdist function that ignores NaNs. You can see the results here: http://nbviewer.ipython.org/gist/Midnighter/b81d5732a0ef88f2e185 Two questions remain: 1) Can I somehow make use of the distance measures defined in scipy/spatial/src/distance.c? 2) Does anyone know if numexpr could be used to compute the above pairwise distances in parallel? Thank you again. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Wed Aug 13 11:08:35 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Wed, 13 Aug 2014 17:08:35 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values Message-ID: Dear all, As suggested in this github issue ( https://github.com/scipy/scipy/issues/3870), I would like to discuss the merit of introducing a new function nanpdist into scipy.spatial. I have also brought up the problem in the following previous e-mail ( http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and on SO ( http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values ). Warren suggested three ways to tackle this problem: 1. Don't change anything--the users should clean up their data! 2. nanpdist 3. Add a keyword argument to pdist that determines how nan should be treated. Clearly, I don't favor the first option since I believe missing values can be important pieces of information, too. I slightly tend towards option two because adding a keyword will further complicate an already very long pdist function. I'm happy to submit a pull request if there is a consensus that something should be done. Best, Moritz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at gmail.com Wed Aug 13 12:15:15 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 13 Aug 2014 12:15:15 -0400 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 11:08 AM, Moritz Beber wrote: > Dear all, > > As suggested in this github issue ( > https://github.com/scipy/scipy/issues/3870), I would like to discuss the > merit of introducing a new function nanpdist into scipy.spatial. I have > also brought up the problem in the following previous e-mail ( > http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and > on SO ( > http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values > ). > > Warren suggested three ways to tackle this problem: > > 1. Don't change anything--the users should clean up their data! > 2. nanpdist > 3. Add a keyword argument to pdist that determines how nan should be > treated. > > Clearly, I don't favor the first option since I believe missing values can > be important pieces of information, too. I slightly tend towards option two > because adding a keyword will further complicate an already very long pdist > function. > > I'm happy to submit a pull request if there is a consensus that something > should be done. > > Best, > > Moritz > There are two parts to this: (1) What is the new calculation for handling nan's? (2) What is the API for accessing the new calculation? Before getting into the API (i.e. nanpdist vs. keyword vs. whatever), I'd like better understand (1). Here's a normal use of pdist (no nans): In [158]: set_printoptions(precision=2) In [159]: x = np.arange(1., 11).reshape(-1,2) In [160]: x Out[160]: array([[ 1., 2.], [ 3., 4.], [ 5., 6.], [ 7., 8.], [ 9., 10.]]) In [161]: pdist(x) Out[161]: array([ 2.83, 5.66, 8.49, 11.31, 2.83, 5.66, 8.49, 2.83, 5.66, 2.83]) And here's how pdist currently handles nans: In [162]: y = x.copy() In [163]: y[0,1] = nan In [164]: y[1,0] = nan In [165]: y Out[165]: array([[ 1., nan], [ nan, 4.], [ 5., 6.], [ 7., 8.], [ 9., 10.]]) In [166]: pdist(y) Out[166]: array([ nan, nan, nan, nan, nan, nan, nan, 2.83, 5.66, 2.83]) That is, *any* distance involving a point that has a nan is nan. This seems like a reasonable default behavior. What should nanpdist(y) be? Based on your code snippet on StackOverflow and your comment in the github issue, my understanding is this: for any pair, you ignore the coordinates where either has a nan (i.e. compute the distance in a lower dimension). In this case, pdist(y) would be [nan, 4, 6, 8, 2, 4, 6, 2.83, 5.66, 2.83] (I'm not sure if you would put nan or something else in that first position.) Or, if we use the scaling of `n/(n - p)` that you suggested in the github issue, where n is the dimension of the observations and p is the number of "missing" coordinates, [nan, 8, 12, 16, 4, 8, 12, 2.83, 5.66, 2.83] Is that correct? What's the use-case for this behavior? How widely used is it? Warren > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaime.frio at gmail.com Wed Aug 13 12:29:22 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 13 Aug 2014 09:29:22 -0700 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 8:08 AM, Moritz Beber wrote: > Dear all, > > As suggested in this github issue ( > https://github.com/scipy/scipy/issues/3870), I would like to discuss the > merit of introducing a new function nanpdist into scipy.spatial. I have > also brought up the problem in the following previous e-mail ( > http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and > on SO ( > http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values > ). > > Warren suggested three ways to tackle this problem: > > 1. Don't change anything--the users should clean up their data! > 2. nanpdist > 3. Add a keyword argument to pdist that determines how nan should be > treated. > > Warren has already pointed this out, but let me insist: what is nanpdist, or the nan keyword expected to do? Treat pairs of vectors with NaNs as lower dimensional, removing pairs of entries where either is NaN? Do those results make any real sense? Thinking of euclidean distance for points in 3D space, I have trouble thinking of a practical situation where "if any Z coordinate is missing, just give me the distance of the projections onto the XY plane" would be anything but a misleading result. I presume the case is different for all those other distances I have never needed to use, so I am just curious of the use case. Looking at your linked post, from an implementation point of view, at the low level function that is actually going to do the heavy lifting, it is probable better to, rather than hardcode a check for NaN-ness, take a 'where' kwarg, as numpy ufuncs already do ( http://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments), and build the masking array in a higher level wrapper. This would make it easier to eventually make this functionality work with masked arrays or the like. As a separate but related issue, I have had this PR open for almost a year now, https://github.com/scipy/scipy/pull/3163, and although me saying I want to complete it is getting old, hopefully whatever you have in mind can fit with the general structure of that. Lastly, whatever you go for, I don't think you should do anything to pdist that you don't also do for cdist and the individual distance functions. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Thu Aug 14 05:24:09 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Thu, 14 Aug 2014 11:24:09 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: Answers to Warren's post: > > That is, *any* distance involving a point that has a nan is nan. > This seems like a reasonable default behavior. > I agree, it is the way that most functions in other lanaguages/packages handle it. > > What should nanpdist(y) be? > > Based on your code snippet on StackOverflow and your comment in the github > issue, my understanding is this: for any pair, you ignore the coordinates > where either has a nan (i.e. compute the distance in a lower dimension). 
> In this case, pdist(y) would be > > [nan, 4, 6, 8, 2, 4, 6, 2.83, 5.66, 2.83] > > (I'm not sure if you would put nan or something else in that first > position.) > > Or, if we use the scaling of `n/(n - p)` that you suggested in the github > issue, > where n is the dimension of the observations and p is the number of > "missing" > coordinates, > > [nan, 8, 12, 16, 4, 8, 12, 2.83, 5.66, 2.83] > > Is that correct? > That is what I suggest. The appropriate scaling would have to be checked/discussed in detail as it may differ between distance and similarity measures. > > What's the use-case for this behavior? How widely used is it? > > I work in bioinformatics and my data set consists of thousands of vectors corresponding to different treatment parameters. Each vector consists of basically the changes in expression levels of a number of genes. I am interested in clustering the treatments, i.e., determine which treatments introduce similar gene expression patterns. Not every treatment leads to significant expression changes, of course, which is why there are missing values. So the vectors have roughly 3000 elements and most of them have about 200 missing values. The data are scaled to follow a normal distribution so I could just replace the missing values with the mean and be done with it but I don't think that's the correct approach. I also don't want the current pdist behavior as it would disregard the majority of my otherwise perfectly valid data. As to the popularity of this use-case: Clustering of gene expression data is very wide-spread, however, usually all gene expression data are considered and thus every treatment consists of a completely filled vector. I can't claim that my current use-case is very popular, it's a slightly new approach. If you think that this behavior has no place in scipy, no problem at all. Best, Moritz -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Thu Aug 14 05:33:52 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Thu, 14 Aug 2014 11:33:52 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: Answers to Jaime's post: Warren has already pointed this out, but let me insist: what is nanpdist, > or the nan keyword expected to do? Treat pairs of vectors with NaNs as > lower dimensional, removing pairs of entries where either is NaN? Do those > results make any real sense? Thinking of euclidean distance for points in > 3D space, I have trouble thinking of a practical situation where "if any Z > coordinate is missing, just give me the distance of the projections onto > the XY plane" would be anything but a misleading result. I presume the case > is different for all those other distances I have never needed to use, so I > am just curious of the use case. > Please see my answer to Warren about the use-case. In three dimensions this would certainly not make sense but my use-case has over three thousand dimensions. What I have in mind is a scaling factor for distance metrics, as suggested before, and an appropriate consideration of dissimilarity of the missing coordinate in similarity measures. 
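To make that concrete for the Euclidean case, here is a minimal pure-NumPy sketch of the behaviour Warren worked through above. The function name is made up and this is not an API proposal; it rescales the distance by n / n_present, which reproduces the scaled numbers in Warren's example (rescaling the squared distance instead would be another defensible convention, which is part of what still needs to be settled):

    import numpy as np

    def nan_euclidean_pdist(X, rescale=True):
        # Condensed pairwise Euclidean distances that drop coordinates
        # where either observation is NaN.  Pure-Python double loop,
        # written for clarity rather than speed.
        X = np.asarray(X, dtype=float)
        m, n = X.shape
        out = np.empty(m * (m - 1) // 2)
        k = 0
        for i in range(m):
            for j in range(i + 1, m):
                mask = ~(np.isnan(X[i]) | np.isnan(X[j]))
                n_present = mask.sum()
                if n_present == 0:
                    out[k] = np.nan            # no shared coordinates at all
                else:
                    d = np.sqrt(((X[i, mask] - X[j, mask]) ** 2).sum())
                    if rescale:
                        d *= float(n) / n_present   # the n / (n - p) factor
                    out[k] = d
                k += 1
        return out

With Warren's example array y this returns [nan, 8, 12, 16, 4, 8, 12, 2.83, 5.66, 2.83], and with rescale=False the unscaled [nan, 4, 6, 8, 2, 4, 6, 2.83, 5.66, 2.83].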
> > Looking at your linked post, from an implementation point of view, at the > low level function that is actually going to do the heavy lifting, it is > probable better to, rather than hardcode a check for NaN-ness, take a > 'where' kwarg, as numpy ufuncs already do ( > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments), > and build the masking array in a higher level wrapper. This would make it > easier to eventually make this functionality work with masked arrays or the > like. > I'd be perfectly happy to do so. The hard-coded check is inspired by bottleneck which does exactly that for all its nan* functions. But I agree that a mask is preferable. > > As a separate but related issue, I have had this PR open for almost a year > now, https://github.com/scipy/scipy/pull/3163, and although me saying I > want to complete it is getting old, hopefully whatever you have in mind can > fit with the general structure of that. > I haven't fully grasped your code in umath_distance.c.src but that's probably a separate discussion. I also couldn't tell if some of that code is automatically generated or all written by hand. > > Lastly, whatever you go for, I don't think you should do anything to pdist > that you don't also do for cdist and the individual distance functions. > > Noted and agreed. Best, Moritz -------------- next part -------------- An HTML attachment was scrubbed... URL: From theodore.goetz at gmail.com Fri Aug 15 10:59:47 2014 From: theodore.goetz at gmail.com (Johann Goetz) Date: Fri, 15 Aug 2014 10:59:47 -0400 Subject: [SciPy-Dev] Histogram as its own class Message-ID: Hello, I'm a long-time user of scipy doing mostly multivariate big-data (several terabytes) analysis in the high-energy physics realm. One thing I've found useful was to promote the histogram to it's own class. Instead of creating yet another package, I have a mind to include it into the scipy.stats module and I would like some feed-back. I.e. is this the right place for such an object? I have some documentation, but not enough I would say, and the classes are currently buried in my "pyhep" project, but they are easily extracted out. https://bitbucket.org/theodoregoetz/pyhep/wiki/Home Here are some details: The histograms I am addressing are N-dimensional over a continuous-domain (floating-point data, no gaps - though bins can have value inf or nan if need-be) along each axis. The axes need not be uniform. There are two classes: HistogramAxis and Histogram. The Axes are always floating point, but the histogram's data can be any dtype (default: np.int, a "cast" to float is done when dividing two histograms). I make use of np.histogramdd() and store the data along with the uncertainty. Many operations are supported including adding, subtracting, multiplying, dividing, bin-merging, cutting/clipping along one or more axes, projecting along an axis, iterating over an axis, filling from a sample with or without weights. Most of power in this package is in the fitting method of the histogram which makes use of scipy.curve_fit(). It handles missing data (when a bin is inf or nan), can include the uncertainty in the fit, and calculates a goodness of fit. On top of this, I have free functions to plot 1D and 2D histograms using matplotlib, as well as functions to handle reading in large HDF5 files. These are auxiliary and may not fit into scipy directly. Thank you all, Johann. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Mon Aug 18 18:20:22 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 19 Aug 2014 00:20:22 +0200 Subject: [SciPy-Dev] sprint @ EuroSciPy, Aug 31 Message-ID: Hi all, Here is a reminder that on Sunday 31 August, there will be a Scipy sprint at EuroSciPy (in Cambridge, UK). Details can be found at https://www.euroscipy.org/2014/program/sprints/ Newcomers to Scipy development are very welcome; actually one of the main goals of the sprint is to help new people to get started. Last year's sprint was excellent - 20 people joined and we still have all-time highs in the commits per month and contributors per month graph to show for it: https://www.openhub.net/p/scipy If you have time and will be at EuroSciPy: please join! Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Fri Aug 22 10:19:03 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 22 Aug 2014 16:19:03 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: So there's quite obviously not a lot of interest in this. I will simply write my own little package in that case. I guess after the weekend I'll close the issue on github unless anyone wants to keep it open. @Jaime: I've read up on ufuncs and they definitely seem like the way to go. Can you say a bit more on how you generated scipy/spatial/src/umath_distance.c.src? I assume it was generated and not written by hand, so did you do that with Cython or something else not included in the pull request? -------------- next part -------------- An HTML attachment was scrubbed... URL: From evgeny.burovskiy at gmail.com Fri Aug 22 10:31:45 2014 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Fri, 22 Aug 2014 15:31:45 +0100 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: > So there's quite obviously not a lot of interest in this. I will simply You might be rushing to a conclusion a bit. The mailing list was down for a good part of the week. And in general, you might want to let people a bit more time to respond --- response times vary a lot, for better or worse. Evgeni > write my own little package in that case. I guess after the weekend I'll > close the issue on github unless anyone wants to keep it open. > > @Jaime: I've read up on ufuncs and they definitely seem like the way to go. > Can you say a bit more on how you generated > scipy/spatial/src/umath_distance.c.src? I assume it was generated and not > written by hand, so did you do that with Cython or something else not > included in the pull request? > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From njs at pobox.com Fri Aug 22 10:33:16 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 22 Aug 2014 15:33:16 +0100 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: On Thu, Aug 14, 2014 at 10:24 AM, Moritz Beber wrote: > I work in bioinformatics and my data set consists of thousands of vectors > corresponding to different treatment parameters. Each vector consists of > basically the changes in expression levels of a number of genes. 
I am > interested in clustering the treatments, i.e., determine which treatments > introduce similar gene expression patterns. Not every treatment leads to > significant expression changes, of course, which is why there are missing > values. So the vectors have roughly 3000 elements and most of them have > about 200 missing values. Just as a scientific issue this seems very odd to me and not at all what statisticians usually mean by missing data. Surely if you want to determine "which treatments introduce similar gene expression patterns" then two treatments that both produce no effect on the expression of the same gene should be counted as more similar to each other? If you've measured an expression change to be near 0 then that's a known measured value that happens to be near 0 -- not an unknown value that could be arbitrarily large or small and you have no idea which. (Obviously I don't know any of the details about your setting, but in particular I worry that your reasoning sounds similar to common misconceptions about what "significant" actually means. "Not significantly different from zero" might well be "significantly different from 1000".) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From moritz.beber at gmail.com Fri Aug 22 12:13:52 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 22 Aug 2014 18:13:52 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: > > You might be rushing to a conclusion a bit. > The mailing list was down for a good part of the week. And in general, > you might want to let people a bit more time to respond --- response > times vary a lot, for better or worse. > I didn't realize that the mailing list had an outage. Thanks for mentioning it! Also, I'm not terribly in a rush but @argriffing was asking for a PR about a week ago ( https://github.com/scipy/scipy/issues/3870#issuecomment-52348019) so it seemed as if he wanted to move things along. I'm reluctant, obviously, to start a pull request when there's no real interest in it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Fri Aug 22 13:06:06 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 22 Aug 2014 19:06:06 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: Thank you for your response Nathaniel. I was a bit concerned that by going into the application this would turn into a discussion about the method rather than whether this is a desirable concept for scipy. I suppose it's not possible to fully separate the two issues so I will indulge you. On Fri, Aug 22, 2014 at 4:33 PM, Nathaniel Smith wrote: > > Just as a scientific issue this seems very odd to me and not at all > what statisticians usually mean by missing data. Surely if you want to > determine "which treatments introduce similar gene expression > patterns" then two treatments that both produce no effect on the > expression of the same gene should be counted as more similar to each > other? If you've measured an expression change to be near 0 then > that's a known measured value that happens to be near 0 -- not an > unknown value that could be arbitrarily large or small and you have no > idea which. 
(Obviously I don't know any of the details about your > setting, but in particular I worry that your reasoning sounds similar > to common misconceptions about what "significant" actually means. "Not > significantly different from zero" might well be "significantly > different from 1000".) > Since I didn't want the discussion to be about the method I tried to describe the situation briefly and did not give you the whole story. My apologies. The real situation is the following: The gene expression data are mapped onto pathways using information on links between proteins and coding genes. The pathway definitions come from a multitude of source databases and were collected in a single database (http://consensuspathdb.org/). Only pathways that have five or more available scores are considered (this is somewhat arbitrary, I suppose). Each pathway is then assigned a mean score. Pathways that have too few scores are not considered. You can read up on more specifics in [1]. So I consider those pathways that did not make the cut-off of 5 scores as "missing values". If all the treatments had missing values at the same pathways, I'd be tempted to just throw those out. We are considering treatments from different studies, however, and the studies report gene expression changes for different genes and consequently different pathways end up having no scores. I still want to be able to compare treatments between different studies. One approach could be to rethink the scoring of pathways and introduce an uncertainty that is larger for pathways with missing scores but since I'm sitting at the end of a pipeline that lands the treatments and pathway response scores in my lap, my preferred way of dealing with this is to simply scale up the distance between treatments where one has a pathway score and it's missing for the other. If this seems unreasonable to you, I'm all ears. It does make sense in my mind. Cheers, Moritz [1] http://toxsci.oxfordjournals.org/content/124/2/278.full in particular in the subsection "pathway response analysis" -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.grasty at gmail.com Mon Aug 25 18:55:42 2014 From: jeff.grasty at gmail.com (Jeff Grasty) Date: Mon, 25 Aug 2014 23:55:42 +0100 Subject: [SciPy-Dev] Nyquist Filters Message-ID: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Hi, One of the features that I have found missing in SciPy are functions to design nyquist and root-nyquist filters, such as raised cosine and root-raised cosine filters. I have written several functions for this purpose and was curious if anyone thought was a greater need for this. Thanks, Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From warren.weckesser at gmail.com Mon Aug 25 20:07:06 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 25 Aug 2014 20:07:06 -0400 Subject: [SciPy-Dev] Nyquist Filters In-Reply-To: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: On Mon, Aug 25, 2014 at 6:55 PM, Jeff Grasty wrote: > Hi, > > One of the features that I have found missing in SciPy are functions to > design nyquist and root-nyquist filters, such as raised cosine and > root-raised cosine filters. I have written several functions for this > purpose and was curious if anyone thought was a greater need for this. 
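For readers who have not met these pulses, the plain (non-root) raised cosine has a simple closed form, which is handy as a reference point even though a practical design is usually optimized numerically. A minimal NumPy sketch follows; the function name is made up, this is not an existing scipy.signal routine, and it is not the remez-based design discussed later in the thread:

    import numpy as np

    def raised_cosine_taps(num_taps, beta, sps):
        # Truncated closed-form raised-cosine pulse.  beta is the roll-off
        # factor in (0, 1], sps the number of samples per symbol; time is
        # measured in symbol periods.
        t = (np.arange(num_taps) - (num_taps - 1) / 2.0) / float(sps)
        h = np.empty_like(t)
        singular = np.isclose(np.abs(t), 1.0 / (2.0 * beta))   # 0/0 points
        ok = ~singular
        h[ok] = (np.sinc(t[ok]) * np.cos(np.pi * beta * t[ok])
                 / (1.0 - (2.0 * beta * t[ok]) ** 2))
        h[singular] = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
        return h / h.sum()                                     # unit DC gain

    # The zero-ISI (Nyquist) property makes a natural unit test: samples
    # spaced one symbol apart vanish everywhere except at the pulse centre.
    sps, taps = 8, 8 * 12 + 1
    h = raised_cosine_taps(taps, beta=0.35, sps=sps)
    idx = np.arange(taps) - (taps - 1) // 2
    assert np.allclose(h[(idx % sps == 0) & (idx != 0)], 0.0)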
> > Yes, that would be great! I have some scratch work for the raised cosine and root-raised cosine FIR filters, but they're not ready for contributing to scipy. If you have code in pretty good shape, these would be nice additions to scipy.signal. The first thing to think about is the API. What is the API of your code? A possible design is similar to the Savitzy-Golay filter implementation. It's a very basic, function-oriented API. One function, savgol_coeffs, provides the FIR filter coefficients, given the number of taps and the parameters of the filter. Another function, savgol_filter, takes an input array along with the filter parameters. It computes the coefficients and applies the filter. It is really just a convenience function: it calls savgol_coeffs to compute the filter coefficients, and applies the filter using a convolution (the only complication is that it provides several options for handling the edges of the input). Even more basic are the functions for FIR filter design using the window method. The functions firwin and firwin2 compute the filter coefficients, and leave it up to the user to convolve them with their signal. Looking forward to hearing more. Warren Thanks, > Jeff > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Aug 26 07:39:13 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 26 Aug 2014 07:39:13 -0400 Subject: [SciPy-Dev] Nyquist Filters References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: Here's code for nyquist filter coeffs. I apologize that it is quite old, and maybe could be a little more pretty. It is, however, quite well tested. -------------- next part -------------- A non-text attachment was scrubbed... Name: nyquist.py Type: text/x-python Size: 2754 bytes Desc: not available URL: From kitchi.srikrishna at gmail.com Tue Aug 26 08:36:35 2014 From: kitchi.srikrishna at gmail.com (Sri Krishna) Date: Tue, 26 Aug 2014 18:06:35 +0530 Subject: [SciPy-Dev] To use C code or Cython code? Message-ID: Hi, I'm new to the Scipy-Dev mailing list, looking to contribute wherever I can. I was looking through the open issues and saw this issue , regarding a speed-up for the convolve2d function. My confusion arises from the SciPy coding guidelines which states that using Cython is much preferable to using plain C/C++/Fortran. Would it be desirable then to change the C code of signal/firfilter.c to a Cythonized code? Thanks, Krishna -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Aug 26 13:58:38 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 26 Aug 2014 19:58:38 +0200 Subject: [SciPy-Dev] To use C code or Cython code? In-Reply-To: References: Message-ID: <53FCCACE.8040608@googlemail.com> On 26.08.2014 14:36, Sri Krishna wrote: > Hi, > > I'm new to the Scipy-Dev mailing list, looking to contribute wherever I > can. I was looking through the open issues and saw this issue > , regarding a speed-up for > the convolve2d function. > > My confusion arises from the SciPy coding guidelines > which > states that using Cython is much preferable to using plain C/C++/Fortran. > > Would it be desirable then to change the C code of signal/firfilter.c to > a Cythonized code? 
> hi, I think it would be better to keep the core of the function in plain C/C++ or Fortran. As this is a function that can profit greatly from lowlevel use of the hardware we retain more flexibility for optimization by staying with a lowlevel language. Cython does not offer any advantage at that level of the code and would make it impossible(?) to use of assembler or intrinsics. The wrapping to python on the other hand is probably preferable in in Cython as it simplifies a lot of mundane and error prone issues. Cheers, Julian From kitchi.srikrishna at gmail.com Tue Aug 26 16:13:03 2014 From: kitchi.srikrishna at gmail.com (Sri Krishna) Date: Wed, 27 Aug 2014 01:43:03 +0530 Subject: [SciPy-Dev] To use C code or Cython code? In-Reply-To: <53FCCACE.8040608@googlemail.com> References: <53FCCACE.8040608@googlemail.com> Message-ID: > > The wrapping to python on the other hand is probably preferable in in > Cython as it simplifies a lot of mundane and error prone issues. > So if I understand correctly - Most, if not all core functionality of Scipy will be in C/C++/Fortran, and the glue code between C and the python interface will run on Cython? Thanks, Krishna On 26 August 2014 23:28, Julian Taylor wrote: > On 26.08.2014 14:36, Sri Krishna wrote: > > Hi, > > > > I'm new to the Scipy-Dev mailing list, looking to contribute wherever I > > can. I was looking through the open issues and saw this issue > > , regarding a speed-up for > > the convolve2d function. > > > > My confusion arises from the SciPy coding guidelines > > which > > states that using Cython is much preferable to using plain C/C++/Fortran. > > > > Would it be desirable then to change the C code of signal/firfilter.c to > > a Cythonized code? > > > > hi, > I think it would be better to keep the core of the function in plain > C/C++ or Fortran. > As this is a function that can profit greatly from lowlevel use of the > hardware we retain more flexibility for optimization by staying with a > lowlevel language. Cython does not offer any advantage at that level of > the code and would make it impossible(?) to use of assembler or intrinsics. > > The wrapping to python on the other hand is probably preferable in in > Cython as it simplifies a lot of mundane and error prone issues. > > Cheers, > Julian > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Tue Aug 26 17:15:36 2014 From: ewm at redtetrahedron.org (Eric Moore) Date: Tue, 26 Aug 2014 17:15:36 -0400 Subject: [SciPy-Dev] To use C code or Cython code? In-Reply-To: References: Message-ID: Krishna, A good place to start before making any changes to firfilter.c would be to evaluate the various convolution routines that already exist. Depending on the inputs, their speed varies quite a bit. We currently have a mix of 1d, 2d, and nd convolution routines in signal, ndimage and numpy (also possibly elsewhere). It would be good to move all of these to a single routine (at least where practical). A related piece of particularly low hanging fruit in signal is to teach lfilter to be smarter when it is passed a FIR filter. There ought to be an immediate speed win here. Eric On Tue, Aug 26, 2014 at 8:36 AM, Sri Krishna wrote: > Hi, > > I'm new to the Scipy-Dev mailing list, looking to contribute wherever I > can. 
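A possible starting point for the evaluation suggested here is simply to time the existing 1-D routines on identical inputs. The sizes below are arbitrary, and lfilter and ndimage.convolve1d differ from 'same'-mode convolution in alignment and boundary handling, so this compares speed only, not results:

    import timeit
    import numpy as np
    from scipy import ndimage, signal

    x = np.random.rand(100000)
    h = np.random.rand(101)

    routines = [
        ('np.convolve',        lambda: np.convolve(x, h, mode='same')),
        ('signal.convolve',    lambda: signal.convolve(x, h, mode='same')),
        ('signal.fftconvolve', lambda: signal.fftconvolve(x, h, mode='same')),
        ('signal.lfilter',     lambda: signal.lfilter(h, [1.0], x)),
        ('ndimage.convolve1d', lambda: ndimage.convolve1d(x, h)),
    ]
    for name, func in routines:
        t = min(timeit.repeat(func, number=5, repeat=3)) / 5
        print('%-20s %.2f ms' % (name, 1e3 * t))

A real evaluation would sweep signal and kernel lengths, dtypes and dimensionality before deciding what a unified routine should dispatch to.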
I was looking through the open issues and saw this issue > , regarding a speed-up for > the convolve2d function. > > My confusion arises from the SciPy coding guidelines > which states > that using Cython is much preferable to using plain C/C++/Fortran. > > Would it be desirable then to change the C code of signal/firfilter.c to a > Cythonized code? > > Thanks, > Krishna > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Aug 27 04:36:42 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 27 Aug 2014 08:36:42 +0000 (UTC) Subject: [SciPy-Dev] To use C code or Cython code? References: <53FCCACE.8040608@googlemail.com> Message-ID: <245045135430800225.357621sturla.molden-gmail.com@news.gmane.org> Sri Krishna wrote: > So if I understand correctly - Most, if not all core functionality of Scipy > will be in C/C++/Fortran, and the glue code between C and the python > interface will run on Cython? The glue for Fortran would normally be f2py. Sturla From jtaylor.debian at googlemail.com Wed Aug 27 13:07:24 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 27 Aug 2014 19:07:24 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.9.0 release candidate 1 available Message-ID: <53FE104C.2020006@googlemail.com> Hello, Almost punctually for EuroScipy we have finally managed to release the first release candidate of NumPy 1.9. We intend to only fix bugs until the final release which we plan to do in the next 1-2 weeks. In this release numerous performance improvements have been added, most significantly the indexing code has been rewritten be several times faster for most cases and performance of using small arrays and scalars has almost doubled. Plenty of other functions have been improved too, nonzero, where, count_nonzero, floating point min/max, boolean argmin/argmax, searchsorted, triu/tril, masked sorting can be expected to perform significantly better in many cases. Also NumPy now releases the GIL for more functions, most notably the indexing now releases it and the random modules state object has a private lock instead of using the GIL. This allows leveraging pure python threads more efficiently. In order to make working with arrays containing NaN values easier nanmedian and nanpercentile have been added which ignore these values. These functions and the regular median and percentile now also support generalized axis arguments that ufuncs already have, these allow reducing along multiple axis in one call. Please see the release notes for all the details. Please also take not of the many small compatibility notes and deprecation in the notes. https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst The source tarballs and win32 binaries can be downloaded here: https://sourceforge.net/projects/numpy/files/NumPy/1.9.0rc1 Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From jeff.grasty at gmail.com Thu Aug 28 14:43:44 2014 From: jeff.grasty at gmail.com (Jeff Grasty) Date: Thu, 28 Aug 2014 18:43:44 +0000 (UTC) Subject: [SciPy-Dev] Nyquist Filters References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: Warren, I currently have two functions, Nyquist and rootNyquist, that return the coefficients of a nyquist or root-nyquist filter with a specified alpha and length. The algorithm that the functions implement is one proposed by Fred Harris in hist multi-rate signal processing book. It uses the remez algorithm to start as an initial guess and uses a gradient descent method to adjust the cutoff frequency of the passband until the filter's 3 dB (or 6 dB) point is at half the baud rate. I think the API that you mentioned for the Savitzky-Golay filter uses sounds simple and effective. What are ideas of how to test this? I can think of writing some simple unit tests that check filter length, gain, etc. Would that be sufficient. Here is a link to my github project for the code so far: https://github.com/fstop22/nyquist_filters Thanks, Jeff From moritz.beber at gmail.com Fri Aug 29 06:13:08 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 29 Aug 2014 12:13:08 +0200 Subject: [SciPy-Dev] nested setup.py scripts Message-ID: Dear all, I want to generate a package with a submodule structure similar to what numpy and scipy use. (Or do you recommend not doing that?) I have read the following pieces of documentation but I'm still unclear about how the main setup.py script discovers the nested scripts and gets the configuration values from those. Is this documented somewhere or can anyone point me to how this is done? Thank you in advance, Moritz P.S.: What I've read: https://github.com/numpy/numpy/blob/master/doc/DISTUTILS.rst.txt http://docs.scipy.org/doc/scipy-dev/reference/hacking.html http://docs.scipy.org/doc/scipy/reference/api.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri Aug 29 08:50:12 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 29 Aug 2014 08:50:12 -0400 Subject: [SciPy-Dev] ANN: NumPy 1.9.0 release candidate 1 available References: <53FE104C.2020006@googlemail.com> Message-ID: OK, it's fixed by doing: rm -rf ~/.local/lib/python2.7/site-packages/numpy* python setup.py install --user I guess something was not cleaned out from previous packages From ndbecker2 at gmail.com Fri Aug 29 09:25:49 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 29 Aug 2014 09:25:49 -0400 Subject: [SciPy-Dev] Nyquist Filters References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: Interesting, but I am maybe missing something. This optimization only enforces flatness in passband and stopband, and 3dB pt. But nyquist filter is defined as having nyquist symmetry, which is what leads to zero ISI (the main reason for using a nyquist filter). There doesn't appear to be anything enforcing this symmmetry. Jeff Grasty wrote: > Warren, > > I currently have two functions, Nyquist and rootNyquist, that return the > coefficients of a nyquist or root-nyquist filter with a specified alpha > and length. The algorithm that the functions implement is one proposed > by Fred Harris in hist multi-rate signal processing book. 
It uses the > remez algorithm to start as an initial guess and uses a gradient > descent method to adjust the cutoff frequency of the passband until > the filter's 3 dB (or 6 dB) point is at half the baud rate. > > I think the API that you mentioned for the Savitzky-Golay filter uses > sounds simple and effective. > > What are ideas of how to test this? I can think of writing some simple > unit tests that check filter length, gain, etc. Would that be sufficient. > > Here is a link to my github project for the code so far: > https://github.com/fstop22/nyquist_filters > > Thanks, > Jeff -- -- Those who don't understand recursion are doomed to repeat it From ben.root at ou.edu Fri Aug 29 09:26:47 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 29 Aug 2014 09:26:47 -0400 Subject: [SciPy-Dev] [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available In-Reply-To: References: <53FE104C.2020006@googlemail.com> Message-ID: It is generally a good idea when switching between releases to execute "git clean -fxd" prior to rebuilding. Admittedly, I don't know how cleaning out that directory in .local could have impacted things. Go figure. Cheers! Ben Root On Fri, Aug 29, 2014 at 8:50 AM, Neal Becker wrote: > OK, it's fixed by doing: > > rm -rf ~/.local/lib/python2.7/site-packages/numpy* > python setup.py install --user > > I guess something was not cleaned out from previous packages > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Aug 29 16:06:35 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Aug 2014 21:06:35 +0100 Subject: [SciPy-Dev] nested setup.py scripts In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 11:13 AM, Moritz Beber wrote: > Dear all, > > I want to generate a package with a submodule structure similar to what > numpy and scipy use. (Or do you recommend not doing that?) I have read the > following pieces of documentation but I'm still unclear about how the main > setup.py script discovers the nested scripts and gets the configuration > values from those. Is this documented somewhere or can anyone point me to > how this is done? Getting clever with setup.py leads to suffering. Suffering leads to hate. Hate leads to the Dark Side. (I have no idea how numpy and scipy's setup.py work, but any time I've tried doing anything 1/10th that clever with setup.py I've regretted it.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ralf.gommers at gmail.com Fri Aug 29 16:09:17 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 29 Aug 2014 22:09:17 +0200 Subject: [SciPy-Dev] nested setup.py scripts In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 12:13 PM, Moritz Beber wrote: > Dear all, > > I want to generate a package with a submodule structure similar to what > numpy and scipy use. (Or do you recommend not doing that?) > It's a pretty standard layout for a Python package (assuming it's large in size and has some compiled code in it that actually needs multiple setup.py's), it's fine to copy this structure. > I have read the following pieces of documentation but I'm still unclear > about how the main setup.py script discovers the nested scripts and gets > the configuration values from those. 
Is this documented somewhere or can > anyone point me to how this is done? > In the main setup.py you'll see: config.add_subpackage('scipy') And in scipy/setup.py config.add_subpackage('cluster') config.add_subpackage('constants') config.add_subpackage('fftpack') ... Cheers, Ralf > Thank you in advance, > Moritz > > P.S.: What I've read: > https://github.com/numpy/numpy/blob/master/doc/DISTUTILS.rst.txt > http://docs.scipy.org/doc/scipy-dev/reference/hacking.html > http://docs.scipy.org/doc/scipy/reference/api.html > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Aug 29 16:10:10 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 29 Aug 2014 22:10:10 +0200 Subject: [SciPy-Dev] nested setup.py scripts In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 10:06 PM, Nathaniel Smith wrote: > On Fri, Aug 29, 2014 at 11:13 AM, Moritz Beber > wrote: > > Dear all, > > > > I want to generate a package with a submodule structure similar to what > > numpy and scipy use. (Or do you recommend not doing that?) I have read > the > > following pieces of documentation but I'm still unclear about how the > main > > setup.py script discovers the nested scripts and gets the configuration > > values from those. Is this documented somewhere or can anyone point me to > > how this is done? > > Getting clever with setup.py leads to suffering. Suffering leads to > hate. Hate leads to the Dark Side. > > (I have no idea how numpy and scipy's setup.py work, but any time I've > tried doing anything 1/10th that clever with setup.py I've regretted > it.) > :) very true - keep the complexity as low as you possibly can Ralf > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL:
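To illustrate the pattern Ralf describes, here is a minimal sketch of a nested numpy.distutils layout; 'mypackage' and 'submodule' are placeholder names. The top-level setup.py hands a configuration function to numpy.distutils, and each add_subpackage call makes the build descend into that directory and execute the setup.py found there:

    # setup.py at the repository root
    def configuration(parent_package='', top_path=None):
        from numpy.distutils.misc_util import Configuration
        config = Configuration(None, parent_package, top_path)
        config.add_subpackage('mypackage')    # descends into mypackage/setup.py
        return config

    if __name__ == '__main__':
        from numpy.distutils.core import setup
        setup(configuration=configuration)

    # mypackage/setup.py
    def configuration(parent_package='', top_path=None):
        from numpy.distutils.misc_util import Configuration
        config = Configuration('mypackage', parent_package, top_path)
        config.add_subpackage('submodule')    # runs mypackage/submodule/setup.py
        return config

    if __name__ == '__main__':
        from numpy.distutils.core import setup
        setup(**configuration(top_path='').todict())

With that in place the usual python setup.py build / python setup.py install from the root collects all nested configurations into a single distribution, which is essentially how the scipy tree is wired together.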