From jtaylor.debian at googlemail.com Mon Aug 4 18:05:43 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 00:05:43 +0200 Subject: [SciPy-Dev] last call for numpy 1.8.2 bugfixes Message-ID: <53E003B7.3090100@googlemail.com> hi, as numpy 1.9 is going to be a relative hard upgrade as indexing changes expose a couple bugs in third party packages and the large amount of small little incompatibilities I will create a numpy 1.8.2 release tomorrow with a couple of important or hard to work around bugfixes. The most important bugfix is fixing the wrong result partition with multiple selections could produce if selections ended up in an equal range, see https://github.com/numpy/numpy/issues/4836 (if the crash is still unreproducable, help appreciated). the rest of the fixes are small ones listed below. If I have missed one or you consider one of the fixes to invasive for a bugfix release please speak up now. As the number of fixes is small I will skip a release candidate. Make fftpack._raw_fft threadsafe https://github.com/numpy/numpy/issues/4656 Prevent division by zero https://github.com/numpy/numpy/issues/650 Fix lack of NULL check in array_richcompare https://github.com/numpy/numpy/issues/4613 incorrect argument order to _copyto in in np.nanmax, np.nanmin https://github.com/numpy/numpy/issues/4628 Hold GIL for types with fields, fixes https://github.com/numpy/numpy/issues/4642 svd ufunc typo https://github.com/numpy/numpy/issues/4733 check alignment of strides for byteswap https://github.com/numpy/numpy/issues/4774 add missing elementsize alignment check for simd reductions https://github.com/numpy/numpy/issues/4853 ifort has issues with optimization flag /O2 https://github.com/numpy/numpy/issues/4602 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Mon Aug 4 18:09:39 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 4 Aug 2014 15:09:39 -0700 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: <53E003B7.3090100@googlemail.com> References: <53E003B7.3090100@googlemail.com> Message-ID: Hi, On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor wrote: > hi, > as numpy 1.9 is going to be a relative hard upgrade as indexing changes > expose a couple bugs in third party packages and the large amount of > small little incompatibilities I will create a numpy 1.8.2 release > tomorrow with a couple of important or hard to work around bugfixes. > > The most important bugfix is fixing the wrong result partition with > multiple selections could produce if selections ended up in an equal > range, see https://github.com/numpy/numpy/issues/4836 (if the crash is > still unreproducable, help appreciated). > > the rest of the fixes are small ones listed below. > If I have missed one or you consider one of the fixes to invasive for a > bugfix release please speak up now. > As the number of fixes is small I will skip a release candidate. 
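For reference, the partition problem mentioned here involves calls of the following shape: np.partition (or np.argpartition) with several kth values in a single call. This is only a sketch of the usage pattern so readers can tell whether their code is affected; the inputs that actually trigger the wrong results are documented in gh-4836.

    import numpy as np

    a = np.random.rand(100)
    kth = [10, 50, 90]                 # multiple selections in one call
    p = np.partition(a, kth)
    # after a correct partition, each selected element is the value a full
    # sort would place at that index
    assert all(p[k] == np.sort(a)[k] for k in kth)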
> > > Make fftpack._raw_fft threadsafe > https://github.com/numpy/numpy/issues/4656 > > Prevent division by zero > https://github.com/numpy/numpy/issues/650 > > Fix lack of NULL check in array_richcompare > https://github.com/numpy/numpy/issues/4613 > > incorrect argument order to _copyto in in np.nanmax, np.nanmin > https://github.com/numpy/numpy/issues/4628 > > Hold GIL for types with fields, fixes > https://github.com/numpy/numpy/issues/4642 > > svd ufunc typo > https://github.com/numpy/numpy/issues/4733 > > check alignment of strides for byteswap > https://github.com/numpy/numpy/issues/4774 > > add missing elementsize alignment check for simd reductions > https://github.com/numpy/numpy/issues/4853 > > ifort has issues with optimization flag /O2 > https://github.com/numpy/numpy/issues/4602 Any chance of a RC to give us some time to test? Cheers, Matthew From jtaylor.debian at googlemail.com Mon Aug 4 18:12:50 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 00:12:50 +0200 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: References: <53E003B7.3090100@googlemail.com> Message-ID: <53E00562.5070602@googlemail.com> On 05.08.2014 00:09, Matthew Brett wrote: > Hi, > > On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor > wrote: >> hi, >> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >> expose a couple bugs in third party packages and the large amount of >> small little incompatibilities I will create a numpy 1.8.2 release >> tomorrow with a couple of important or hard to work around bugfixes. >>... > > Any chance of a RC to give us some time to test? > I hope I have only selected fixes that are safe and do not require a RC. sure we could do one, but if there are issues we can also just make a quick 1.8.3 release follow up. the main backport PR is: https://github.com/numpy/numpy/pull/4949 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From njs at pobox.com Mon Aug 4 18:25:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 4 Aug 2014 23:25:04 +0100 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: <53E00562.5070602@googlemail.com> References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com> Message-ID: On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor wrote: > On 05.08.2014 00:09, Matthew Brett wrote: >> Hi, >> >> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor >> wrote: >>> hi, >>> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >>> expose a couple bugs in third party packages and the large amount of >>> small little incompatibilities I will create a numpy 1.8.2 release >>> tomorrow with a couple of important or hard to work around bugfixes. >>>... >> >> Any chance of a RC to give us some time to test? >> > > I hope I have only selected fixes that are safe and do not require a RC. > sure we could do one, but if there are issues we can also just make a > quick 1.8.3 release follow up. > > the main backport PR is: https://github.com/numpy/numpy/pull/4949 It's probably better to just make an RC if it's not too much trouble... 
it's always possible to misjudge what issues arise, if there's a real-but-non-catastrophic issue then people 1.8.2 will remain in use even if 1.8.3 is released afterwards and force downstream libraries to work around the issues, and just in general it's good to have and follow standard processes because special cases lead to errors. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From matthew.brett at gmail.com Mon Aug 4 18:27:38 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 4 Aug 2014 15:27:38 -0700 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com> Message-ID: On Mon, Aug 4, 2014 at 3:25 PM, Nathaniel Smith wrote: > On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor > wrote: >> On 05.08.2014 00:09, Matthew Brett wrote: >>> Hi, >>> >>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor >>> wrote: >>>> hi, >>>> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >>>> expose a couple bugs in third party packages and the large amount of >>>> small little incompatibilities I will create a numpy 1.8.2 release >>>> tomorrow with a couple of important or hard to work around bugfixes. >>>>... >>> >>> Any chance of a RC to give us some time to test? >>> >> >> I hope I have only selected fixes that are safe and do not require a RC. >> sure we could do one, but if there are issues we can also just make a >> quick 1.8.3 release follow up. A few days to test would be fine, I'd prefer an RC too, Cheers, Matthew From jtaylor.debian at googlemail.com Mon Aug 4 18:46:14 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 00:46:14 +0200 Subject: [SciPy-Dev] [Numpy-discussion] last call for numpy 1.8.2 bugfixes In-Reply-To: References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com> Message-ID: <53E00D36.3080500@googlemail.com> On 05.08.2014 00:27, Matthew Brett wrote: > On Mon, Aug 4, 2014 at 3:25 PM, Nathaniel Smith wrote: >> On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor >> wrote: >>> On 05.08.2014 00:09, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor >>>> wrote: >>>>> hi, >>>>> as numpy 1.9 is going to be a relative hard upgrade as indexing changes >>>>> expose a couple bugs in third party packages and the large amount of >>>>> small little incompatibilities I will create a numpy 1.8.2 release >>>>> tomorrow with a couple of important or hard to work around bugfixes. >>>>> ... >>>> >>>> Any chance of a RC to give us some time to test? >>>> >>> >>> I hope I have only selected fixes that are safe and do not require a RC. >>> sure we could do one, but if there are issues we can also just make a >>> quick 1.8.3 release follow up. > > A few days to test would be fine, I'd prefer an RC too, > alright I'll make an RC tomorrow and planning for release this weekend then. From jtaylor.debian at googlemail.com Tue Aug 5 15:45:02 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 21:45:02 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.8.2 release candidate Message-ID: <53E1343E.7020805@googlemail.com> Hello, I am pleased to announce the first release candidate for numpy 1.8.2, a pure bugfix release for the 1.8.x series. https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ If no regressions show up the final release is planned this weekend. 
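For anyone who wants to help check the candidate, the quickest smoke test is to run the full test suite against the installed release candidate (this assumes the nose package is available, which the numpy test runner of this series requires):

    import numpy
    print(numpy.__version__)   # should report 1.8.2rc1
    numpy.test('full')         # or numpy.test() for the smaller default subset

Downstream projects can likewise run their own test suites against the candidate, which is the point of having an RC at all.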
The upgrade is recommended for all users of the 1.8.x series. Following issues have been fixed: * gh-4836: partition produces wrong results for multiple selections in equal ranges * gh-4656: Make fftpack._raw_fft threadsafe * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin * gh-4613: Fix lack of NULL check in array_richcompare * gh-4642: Hold GIL for converting dtypes types with fields * gh-4733: fix np.linalg.svd(b, compute_uv=False) * gh-4853: avoid unaligned simd load on reductions on i386 * gh-4774: avoid unaligned access for strided byteswap * gh-650: Prevent division by zero when creating arrays from some buffers * gh-4602: ifort has issues with optimization flag O2, use O1 Source tarballs, windows installers and release notes can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From jtaylor.debian at googlemail.com Sat Aug 9 08:38:02 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 09 Aug 2014 14:38:02 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.8.2 bugfix release Message-ID: <53E6162A.8050809@googlemail.com> Hello, I am pleased to announce the release of NumPy 1.8.2, a pure bugfix release for the 1.8.x series. https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ The upgrade is recommended for all users of the 1.8.x series. Following issues have been fixed: * gh-4836: partition produces wrong results for multiple selections in equal ranges * gh-4656: Make fftpack._raw_fft threadsafe * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin * gh-4642: Hold GIL for converting dtypes types with fields * gh-4733: fix np.linalg.svd(b, compute_uv=False) * gh-4853: avoid unaligned simd load on reductions on i386 * gh-4722: Fix seg fault converting empty string to object * gh-4613: Fix lack of NULL check in array_richcompare * gh-4774: avoid unaligned access for strided byteswap * gh-650: Prevent division by zero when creating arrays from some buffers * gh-4602: ifort has issues with optimization flag O2, use O1 The source distributions have been uploaded to PyPI. The Windows installers, documentation and release notes can be found at: https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Sat Aug 9 20:23:54 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 9 Aug 2014 17:23:54 -0700 Subject: [SciPy-Dev] [Numpy-discussion] ANN: NumPy 1.8.2 bugfix release In-Reply-To: <53E6162A.8050809@googlemail.com> References: <53E6162A.8050809@googlemail.com> Message-ID: On Sat, Aug 9, 2014 at 5:38 AM, Julian Taylor wrote: > Hello, > > I am pleased to announce the release of NumPy 1.8.2, a > pure bugfix release for the 1.8.x series. > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ > The upgrade is recommended for all users of the 1.8.x series. 
> > Following issues have been fixed: > * gh-4836: partition produces wrong results for multiple selections in > equal ranges > * gh-4656: Make fftpack._raw_fft threadsafe > * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin > * gh-4642: Hold GIL for converting dtypes types with fields > * gh-4733: fix np.linalg.svd(b, compute_uv=False) > * gh-4853: avoid unaligned simd load on reductions on i386 > * gh-4722: Fix seg fault converting empty string to object > * gh-4613: Fix lack of NULL check in array_richcompare > * gh-4774: avoid unaligned access for strided byteswap > * gh-650: Prevent division by zero when creating arrays from some buffers > * gh-4602: ifort has issues with optimization flag O2, use O1 > > > The source distributions have been uploaded to PyPI. The Windows > installers, documentation and release notes can be found at: > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ OSX wheels now also up on pypi, please let us know of any problems, Cheers, Matthew From manojkumarsivaraj334 at gmail.com Mon Aug 11 11:04:27 2014 From: manojkumarsivaraj334 at gmail.com (Manoj Kumar) Date: Mon, 11 Aug 2014 17:04:27 +0200 Subject: [SciPy-Dev] Fastest way to multiply a sparse matrix with another numpy array Message-ID: Hello, I was wondering what is the fastest way (format) to multiply a sparse matrix with a numpy array. Intuitively, a csr format multiplied with a numpy array which is fortran contiguous seems to be the fastest, but I have ran a few benchmarks and it seems otherwise. It is also mentioned here http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.html that using csr matrices "may" be faster. In [5]: X Out[5]: <11314x130107 sparse matrix of type '' with 1787565 stored elements in Compressed Sparse Row format> In [6]: _, n_features = X.shape In [9]: w_c = np.random.rand(n_features, 10) In [10]: w_f = np.asarray(w_c, order='f') In [13]: csc = sparse.csc_matrix(X) In [30]: %timeit X * w_f 10 loops, best of 3: 40.5 ms per loop In [31]: %timeit X * w_c 10 loops, best of 3: 37.3 ms per loop In [32]: %timeit csc * w_c 10 loops, best of 3: 24.3 ms per loop In [33]: %timeit csc * w_f 10 loops, best of 3: 27.3 ms per loop It seems here, using a csc matrix is faster with a C-contiguous numpy array which is completely non-intuitive to me. Are there any hard rules for this? or is it data dependent? Sorry for my noobish questions! -- Regards, Manoj Kumar, GSoC 2014, Scikit-learn Mech Undergrad http://manojbits.wordpress.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From manojkumarsivaraj334 at gmail.com Mon Aug 11 11:08:44 2014 From: manojkumarsivaraj334 at gmail.com (Manoj Kumar) Date: Mon, 11 Aug 2014 17:08:44 +0200 Subject: [SciPy-Dev] Fastest way to multiply a sparse matrix with another numpy array In-Reply-To: References: Message-ID: I'm sorry that I posted this to the developers mailing list. I was meaning to post this to the users list. On Mon, Aug 11, 2014 at 5:04 PM, Manoj Kumar wrote: > Hello, > > I was wondering what is the fastest way (format) to multiply a sparse > matrix with a numpy array. Intuitively, a csr format multiplied with a > numpy array which is fortran contiguous seems to be the fastest, but I have > ran a few benchmarks and it seems otherwise. It is also mentioned here > > http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.html > that using csr matrices "may" be faster. 
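A self-contained version of the comparison quoted below, using synthetic data rather than the matrix in the session, in case anyone wants to reproduce the timings; the sizes and density here are arbitrary:

    import timeit
    import numpy as np
    from scipy import sparse

    X_csr = sparse.rand(10000, 50000, density=0.001, format='csr')
    X_csc = X_csr.tocsc()
    w_c = np.random.rand(50000, 10)    # C-contiguous operand
    w_f = np.asfortranarray(w_c)       # Fortran-contiguous copy

    cases = [('csr * C-order', X_csr, w_c), ('csr * F-order', X_csr, w_f),
             ('csc * C-order', X_csc, w_c), ('csc * F-order', X_csc, w_f)]
    for name, A, w in cases:
        t = min(timeit.repeat(lambda: A * w, number=10, repeat=3)) / 10
        print('%-15s %.1f ms' % (name, 1e3 * t))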
> > > In [5]: X > Out[5]: > <11314x130107 sparse matrix of type '' > with 1787565 stored elements in Compressed Sparse Row format> > In [6]: _, n_features = X.shape > In [9]: w_c = np.random.rand(n_features, 10) > In [10]: w_f = np.asarray(w_c, order='f') > In [13]: csc = sparse.csc_matrix(X) > In [30]: %timeit X * w_f > 10 loops, best of 3: 40.5 ms per loop > > In [31]: %timeit X * w_c > 10 loops, best of 3: 37.3 ms per loop > > In [32]: %timeit csc * w_c > 10 loops, best of 3: 24.3 ms per loop > > In [33]: %timeit csc * w_f > 10 loops, best of 3: 27.3 ms per loop > > > It seems here, using a csc matrix is faster with a C-contiguous numpy > array which is completely non-intuitive to me. Are there any hard rules for > this? or is it data dependent? > > Sorry for my noobish questions! > -- > Regards, > Manoj Kumar, > GSoC 2014, Scikit-learn > Mech Undergrad > http://manojbits.wordpress.com > -- Regards, Manoj Kumar, GSoC 2014, Scikit-learn Mech Undergrad http://manojbits.wordpress.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Tue Aug 12 08:33:11 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Tue, 12 Aug 2014 14:33:11 +0200 Subject: [SciPy-Dev] computing pairwise distance of vectors with missing (nan) values In-Reply-To: References: <53CCCA9E.1060103@gmail.com> Message-ID: So I've made significant headway on cythonizing a pdist function that ignores NaNs. You can see the results here: http://nbviewer.ipython.org/gist/Midnighter/b81d5732a0ef88f2e185 Two questions remain: 1) Can I somehow make use of the distance measures defined in scipy/spatial/src/distance.c? 2) Does anyone know if numexpr could be used to compute the above pairwise distances in parallel? Thank you again. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Wed Aug 13 11:08:35 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Wed, 13 Aug 2014 17:08:35 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values Message-ID: Dear all, As suggested in this github issue ( https://github.com/scipy/scipy/issues/3870), I would like to discuss the merit of introducing a new function nanpdist into scipy.spatial. I have also brought up the problem in the following previous e-mail ( http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and on SO ( http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values ). Warren suggested three ways to tackle this problem: 1. Don't change anything--the users should clean up their data! 2. nanpdist 3. Add a keyword argument to pdist that determines how nan should be treated. Clearly, I don't favor the first option since I believe missing values can be important pieces of information, too. I slightly tend towards option two because adding a keyword will further complicate an already very long pdist function. I'm happy to submit a pull request if there is a consensus that something should be done. Best, Moritz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at gmail.com Wed Aug 13 12:15:15 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 13 Aug 2014 12:15:15 -0400 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 11:08 AM, Moritz Beber wrote: > Dear all, > > As suggested in this github issue ( > https://github.com/scipy/scipy/issues/3870), I would like to discuss the > merit of introducing a new function nanpdist into scipy.spatial. I have > also brought up the problem in the following previous e-mail ( > http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and > on SO ( > http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values > ). > > Warren suggested three ways to tackle this problem: > > 1. Don't change anything--the users should clean up their data! > 2. nanpdist > 3. Add a keyword argument to pdist that determines how nan should be > treated. > > Clearly, I don't favor the first option since I believe missing values can > be important pieces of information, too. I slightly tend towards option two > because adding a keyword will further complicate an already very long pdist > function. > > I'm happy to submit a pull request if there is a consensus that something > should be done. > > Best, > > Moritz > There are two parts to this: (1) What is the new calculation for handling nan's? (2) What is the API for accessing the new calculation? Before getting into the API (i.e. nanpdist vs. keyword vs. whatever), I'd like better understand (1). Here's a normal use of pdist (no nans): In [158]: set_printoptions(precision=2) In [159]: x = np.arange(1., 11).reshape(-1,2) In [160]: x Out[160]: array([[ 1., 2.], [ 3., 4.], [ 5., 6.], [ 7., 8.], [ 9., 10.]]) In [161]: pdist(x) Out[161]: array([ 2.83, 5.66, 8.49, 11.31, 2.83, 5.66, 8.49, 2.83, 5.66, 2.83]) And here's how pdist currently handles nans: In [162]: y = x.copy() In [163]: y[0,1] = nan In [164]: y[1,0] = nan In [165]: y Out[165]: array([[ 1., nan], [ nan, 4.], [ 5., 6.], [ 7., 8.], [ 9., 10.]]) In [166]: pdist(y) Out[166]: array([ nan, nan, nan, nan, nan, nan, nan, 2.83, 5.66, 2.83]) That is, *any* distance involving a point that has a nan is nan. This seems like a reasonable default behavior. What should nanpdist(y) be? Based on your code snippet on StackOverflow and your comment in the github issue, my understanding is this: for any pair, you ignore the coordinates where either has a nan (i.e. compute the distance in a lower dimension). In this case, pdist(y) would be [nan, 4, 6, 8, 2, 4, 6, 2.83, 5.66, 2.83] (I'm not sure if you would put nan or something else in that first position.) Or, if we use the scaling of `n/(n - p)` that you suggested in the github issue, where n is the dimension of the observations and p is the number of "missing" coordinates, [nan, 8, 12, 16, 4, 8, 12, 2.83, 5.66, 2.83] Is that correct? What's the use-case for this behavior? How widely used is it? Warren > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaime.frio at gmail.com Wed Aug 13 12:29:22 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 13 Aug 2014 09:29:22 -0700 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 8:08 AM, Moritz Beber wrote: > Dear all, > > As suggested in this github issue ( > https://github.com/scipy/scipy/issues/3870), I would like to discuss the > merit of introducing a new function nanpdist into scipy.spatial. I have > also brought up the problem in the following previous e-mail ( > http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and > on SO ( > http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values > ). > > Warren suggested three ways to tackle this problem: > > 1. Don't change anything--the users should clean up their data! > 2. nanpdist > 3. Add a keyword argument to pdist that determines how nan should be > treated. > > Warren has already pointed this out, but let me insist: what is nanpdist, or the nan keyword expected to do? Treat pairs of vectors with NaNs as lower dimensional, removing pairs of entries where either is NaN? Do those results make any real sense? Thinking of euclidean distance for points in 3D space, I have trouble thinking of a practical situation where "if any Z coordinate is missing, just give me the distance of the projections onto the XY plane" would be anything but a misleading result. I presume the case is different for all those other distances I have never needed to use, so I am just curious of the use case. Looking at your linked post, from an implementation point of view, at the low level function that is actually going to do the heavy lifting, it is probable better to, rather than hardcode a check for NaN-ness, take a 'where' kwarg, as numpy ufuncs already do ( http://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments), and build the masking array in a higher level wrapper. This would make it easier to eventually make this functionality work with masked arrays or the like. As a separate but related issue, I have had this PR open for almost a year now, https://github.com/scipy/scipy/pull/3163, and although me saying I want to complete it is getting old, hopefully whatever you have in mind can fit with the general structure of that. Lastly, whatever you go for, I don't think you should do anything to pdist that you don't also do for cdist and the individual distance functions. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Thu Aug 14 05:24:09 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Thu, 14 Aug 2014 11:24:09 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: Answers to Warren's post: > > That is, *any* distance involving a point that has a nan is nan. > This seems like a reasonable default behavior. > I agree, it is the way that most functions in other lanaguages/packages handle it. > > What should nanpdist(y) be? > > Based on your code snippet on StackOverflow and your comment in the github > issue, my understanding is this: for any pair, you ignore the coordinates > where either has a nan (i.e. compute the distance in a lower dimension). 
> In this case, pdist(y) would be > > [nan, 4, 6, 8, 2, 4, 6, 2.83, 5.66, 2.83] > > (I'm not sure if you would put nan or something else in that first > position.) > > Or, if we use the scaling of `n/(n - p)` that you suggested in the github > issue, > where n is the dimension of the observations and p is the number of > "missing" > coordinates, > > [nan, 8, 12, 16, 4, 8, 12, 2.83, 5.66, 2.83] > > Is that correct? > That is what I suggest. The appropriate scaling would have to be checked/discussed in detail as it may differ between distance and similarity measures. > > What's the use-case for this behavior? How widely used is it? > > I work in bioinformatics and my data set consists of thousands of vectors corresponding to different treatment parameters. Each vector consists of basically the changes in expression levels of a number of genes. I am interested in clustering the treatments, i.e., determine which treatments introduce similar gene expression patterns. Not every treatment leads to significant expression changes, of course, which is why there are missing values. So the vectors have roughly 3000 elements and most of them have about 200 missing values. The data are scaled to follow a normal distribution so I could just replace the missing values with the mean and be done with it but I don't think that's the correct approach. I also don't want the current pdist behavior as it would disregard the majority of my otherwise perfectly valid data. As to the popularity of this use-case: Clustering of gene expression data is very wide-spread, however, usually all gene expression data are considered and thus every treatment consists of a completely filled vector. I can't claim that my current use-case is very popular, it's a slightly new approach. If you think that this behavior has no place in scipy, no problem at all. Best, Moritz -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Thu Aug 14 05:33:52 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Thu, 14 Aug 2014 11:33:52 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: Answers to Jaime's post: Warren has already pointed this out, but let me insist: what is nanpdist, > or the nan keyword expected to do? Treat pairs of vectors with NaNs as > lower dimensional, removing pairs of entries where either is NaN? Do those > results make any real sense? Thinking of euclidean distance for points in > 3D space, I have trouble thinking of a practical situation where "if any Z > coordinate is missing, just give me the distance of the projections onto > the XY plane" would be anything but a misleading result. I presume the case > is different for all those other distances I have never needed to use, so I > am just curious of the use case. > Please see my answer to Warren about the use-case. In three dimensions this would certainly not make sense but my use-case has over three thousand dimensions. What I have in mind is a scaling factor for distance metrics, as suggested before, and an appropriate consideration of dissimilarity of the missing coordinate in similarity measures. 
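To make that concrete for the Euclidean case, here is a minimal pure-NumPy sketch of the behaviour Warren worked through above. The function name is made up and this is not an API proposal; it rescales the distance by n / n_present, which reproduces the scaled numbers in Warren's example (rescaling the squared distance instead would be another defensible convention, which is part of what still needs to be settled):

    import numpy as np

    def nan_euclidean_pdist(X, rescale=True):
        # Condensed pairwise Euclidean distances that drop coordinates
        # where either observation is NaN.  Pure-Python double loop,
        # written for clarity rather than speed.
        X = np.asarray(X, dtype=float)
        m, n = X.shape
        out = np.empty(m * (m - 1) // 2)
        k = 0
        for i in range(m):
            for j in range(i + 1, m):
                mask = ~(np.isnan(X[i]) | np.isnan(X[j]))
                n_present = mask.sum()
                if n_present == 0:
                    out[k] = np.nan            # no shared coordinates at all
                else:
                    d = np.sqrt(((X[i, mask] - X[j, mask]) ** 2).sum())
                    if rescale:
                        d *= float(n) / n_present   # the n / (n - p) factor
                    out[k] = d
                k += 1
        return out

With Warren's example array y this returns [nan, 8, 12, 16, 4, 8, 12, 2.83, 5.66, 2.83], and with rescale=False the unscaled [nan, 4, 6, 8, 2, 4, 6, 2.83, 5.66, 2.83].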
> > Looking at your linked post, from an implementation point of view, at the > low level function that is actually going to do the heavy lifting, it is > probable better to, rather than hardcode a check for NaN-ness, take a > 'where' kwarg, as numpy ufuncs already do ( > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments), > and build the masking array in a higher level wrapper. This would make it > easier to eventually make this functionality work with masked arrays or the > like. > I'd be perfectly happy to do so. The hard-coded check is inspired by bottleneck which does exactly that for all its nan* functions. But I agree that a mask is preferable. > > As a separate but related issue, I have had this PR open for almost a year > now, https://github.com/scipy/scipy/pull/3163, and although me saying I > want to complete it is getting old, hopefully whatever you have in mind can > fit with the general structure of that. > I haven't fully grasped your code in umath_distance.c.src but that's probably a separate discussion. I also couldn't tell if some of that code is automatically generated or all written by hand. > > Lastly, whatever you go for, I don't think you should do anything to pdist > that you don't also do for cdist and the individual distance functions. > > Noted and agreed. Best, Moritz -------------- next part -------------- An HTML attachment was scrubbed... URL: From theodore.goetz at gmail.com Fri Aug 15 10:59:47 2014 From: theodore.goetz at gmail.com (Johann Goetz) Date: Fri, 15 Aug 2014 10:59:47 -0400 Subject: [SciPy-Dev] Histogram as its own class Message-ID: Hello, I'm a long-time user of scipy doing mostly multivariate big-data (several terabytes) analysis in the high-energy physics realm. One thing I've found useful was to promote the histogram to it's own class. Instead of creating yet another package, I have a mind to include it into the scipy.stats module and I would like some feed-back. I.e. is this the right place for such an object? I have some documentation, but not enough I would say, and the classes are currently buried in my "pyhep" project, but they are easily extracted out. https://bitbucket.org/theodoregoetz/pyhep/wiki/Home Here are some details: The histograms I am addressing are N-dimensional over a continuous-domain (floating-point data, no gaps - though bins can have value inf or nan if need-be) along each axis. The axes need not be uniform. There are two classes: HistogramAxis and Histogram. The Axes are always floating point, but the histogram's data can be any dtype (default: np.int, a "cast" to float is done when dividing two histograms). I make use of np.histogramdd() and store the data along with the uncertainty. Many operations are supported including adding, subtracting, multiplying, dividing, bin-merging, cutting/clipping along one or more axes, projecting along an axis, iterating over an axis, filling from a sample with or without weights. Most of power in this package is in the fitting method of the histogram which makes use of scipy.curve_fit(). It handles missing data (when a bin is inf or nan), can include the uncertainty in the fit, and calculates a goodness of fit. On top of this, I have free functions to plot 1D and 2D histograms using matplotlib, as well as functions to handle reading in large HDF5 files. These are auxiliary and may not fit into scipy directly. Thank you all, Johann. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Mon Aug 18 18:20:22 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 19 Aug 2014 00:20:22 +0200 Subject: [SciPy-Dev] sprint @ EuroSciPy, Aug 31 Message-ID: Hi all, Here is a reminder that on Sunday 31 August, there will be a Scipy sprint at EuroSciPy (in Cambridge, UK). Details can be found at https://www.euroscipy.org/2014/program/sprints/ Newcomers to Scipy development are very welcome; actually one of the main goals of the sprint is to help new people to get started. Last year's sprint was excellent - 20 people joined and we still have all-time highs in the commits per month and contributors per month graph to show for it: https://www.openhub.net/p/scipy If you have time and will be at EuroSciPy: please join! Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Fri Aug 22 10:19:03 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 22 Aug 2014 16:19:03 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: So there's quite obviously not a lot of interest in this. I will simply write my own little package in that case. I guess after the weekend I'll close the issue on github unless anyone wants to keep it open. @Jaime: I've read up on ufuncs and they definitely seem like the way to go. Can you say a bit more on how you generated scipy/spatial/src/umath_distance.c.src? I assume it was generated and not written by hand, so did you do that with Cython or something else not included in the pull request? -------------- next part -------------- An HTML attachment was scrubbed... URL: From evgeny.burovskiy at gmail.com Fri Aug 22 10:31:45 2014 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Fri, 22 Aug 2014 15:31:45 +0100 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: > So there's quite obviously not a lot of interest in this. I will simply You might be rushing to a conclusion a bit. The mailing list was down for a good part of the week. And in general, you might want to let people a bit more time to respond --- response times vary a lot, for better or worse. Evgeni > write my own little package in that case. I guess after the weekend I'll > close the issue on github unless anyone wants to keep it open. > > @Jaime: I've read up on ufuncs and they definitely seem like the way to go. > Can you say a bit more on how you generated > scipy/spatial/src/umath_distance.c.src? I assume it was generated and not > written by hand, so did you do that with Cython or something else not > included in the pull request? > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From njs at pobox.com Fri Aug 22 10:33:16 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 22 Aug 2014 15:33:16 +0100 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: On Thu, Aug 14, 2014 at 10:24 AM, Moritz Beber wrote: > I work in bioinformatics and my data set consists of thousands of vectors > corresponding to different treatment parameters. Each vector consists of > basically the changes in expression levels of a number of genes. 
I am > interested in clustering the treatments, i.e., determine which treatments > introduce similar gene expression patterns. Not every treatment leads to > significant expression changes, of course, which is why there are missing > values. So the vectors have roughly 3000 elements and most of them have > about 200 missing values. Just as a scientific issue this seems very odd to me and not at all what statisticians usually mean by missing data. Surely if you want to determine "which treatments introduce similar gene expression patterns" then two treatments that both produce no effect on the expression of the same gene should be counted as more similar to each other? If you've measured an expression change to be near 0 then that's a known measured value that happens to be near 0 -- not an unknown value that could be arbitrarily large or small and you have no idea which. (Obviously I don't know any of the details about your setting, but in particular I worry that your reasoning sounds similar to common misconceptions about what "significant" actually means. "Not significantly different from zero" might well be "significantly different from 1000".) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From moritz.beber at gmail.com Fri Aug 22 12:13:52 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 22 Aug 2014 18:13:52 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: > > You might be rushing to a conclusion a bit. > The mailing list was down for a good part of the week. And in general, > you might want to let people a bit more time to respond --- response > times vary a lot, for better or worse. > I didn't realize that the mailing list had an outage. Thanks for mentioning it! Also, I'm not terribly in a rush but @argriffing was asking for a PR about a week ago ( https://github.com/scipy/scipy/issues/3870#issuecomment-52348019) so it seemed as if he wanted to move things along. I'm reluctant, obviously, to start a pull request when there's no real interest in it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Fri Aug 22 13:06:06 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 22 Aug 2014 19:06:06 +0200 Subject: [SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values In-Reply-To: References: Message-ID: Thank you for your response Nathaniel. I was a bit concerned that by going into the application this would turn into a discussion about the method rather than whether this is a desirable concept for scipy. I suppose it's not possible to fully separate the two issues so I will indulge you. On Fri, Aug 22, 2014 at 4:33 PM, Nathaniel Smith wrote: > > Just as a scientific issue this seems very odd to me and not at all > what statisticians usually mean by missing data. Surely if you want to > determine "which treatments introduce similar gene expression > patterns" then two treatments that both produce no effect on the > expression of the same gene should be counted as more similar to each > other? If you've measured an expression change to be near 0 then > that's a known measured value that happens to be near 0 -- not an > unknown value that could be arbitrarily large or small and you have no > idea which. 
(Obviously I don't know any of the details about your > setting, but in particular I worry that your reasoning sounds similar > to common misconceptions about what "significant" actually means. "Not > significantly different from zero" might well be "significantly > different from 1000".) > Since I didn't want the discussion to be about the method I tried to describe the situation briefly and did not give you the whole story. My apologies. The real situation is the following: The gene expression data are mapped onto pathways using information on links between proteins and coding genes. The pathway definitions come from a multitude of source databases and were collected in a single database (http://consensuspathdb.org/). Only pathways that have five or more available scores are considered (this is somewhat arbitrary, I suppose). Each pathway is then assigned a mean score. Pathways that have too few scores are not considered. You can read up on more specifics in [1]. So I consider those pathways that did not make the cut-off of 5 scores as "missing values". If all the treatments had missing values at the same pathways, I'd be tempted to just throw those out. We are considering treatments from different studies, however, and the studies report gene expression changes for different genes and consequently different pathways end up having no scores. I still want to be able to compare treatments between different studies. One approach could be to rethink the scoring of pathways and introduce an uncertainty that is larger for pathways with missing scores but since I'm sitting at the end of a pipeline that lands the treatments and pathway response scores in my lap, my preferred way of dealing with this is to simply scale up the distance between treatments where one has a pathway score and it's missing for the other. If this seems unreasonable to you, I'm all ears. It does make sense in my mind. Cheers, Moritz [1] http://toxsci.oxfordjournals.org/content/124/2/278.full in particular in the subsection "pathway response analysis" -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.grasty at gmail.com Mon Aug 25 18:55:42 2014 From: jeff.grasty at gmail.com (Jeff Grasty) Date: Mon, 25 Aug 2014 23:55:42 +0100 Subject: [SciPy-Dev] Nyquist Filters Message-ID: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Hi, One of the features that I have found missing in SciPy are functions to design nyquist and root-nyquist filters, such as raised cosine and root-raised cosine filters. I have written several functions for this purpose and was curious if anyone thought was a greater need for this. Thanks, Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From warren.weckesser at gmail.com Mon Aug 25 20:07:06 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 25 Aug 2014 20:07:06 -0400 Subject: [SciPy-Dev] Nyquist Filters In-Reply-To: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: On Mon, Aug 25, 2014 at 6:55 PM, Jeff Grasty wrote: > Hi, > > One of the features that I have found missing in SciPy are functions to > design nyquist and root-nyquist filters, such as raised cosine and > root-raised cosine filters. I have written several functions for this > purpose and was curious if anyone thought was a greater need for this. 
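For readers who have not met these pulses, the plain (non-root) raised cosine has a simple closed form, which is handy as a reference point even though a practical design is usually optimized numerically. A minimal NumPy sketch follows; the function name is made up, this is not an existing scipy.signal routine, and it is not the remez-based design discussed later in the thread:

    import numpy as np

    def raised_cosine_taps(num_taps, beta, sps):
        # Truncated closed-form raised-cosine pulse.  beta is the roll-off
        # factor in (0, 1], sps the number of samples per symbol; time is
        # measured in symbol periods.
        t = (np.arange(num_taps) - (num_taps - 1) / 2.0) / float(sps)
        h = np.empty_like(t)
        singular = np.isclose(np.abs(t), 1.0 / (2.0 * beta))   # 0/0 points
        ok = ~singular
        h[ok] = (np.sinc(t[ok]) * np.cos(np.pi * beta * t[ok])
                 / (1.0 - (2.0 * beta * t[ok]) ** 2))
        h[singular] = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
        return h / h.sum()                                     # unit DC gain

    # The zero-ISI (Nyquist) property makes a natural unit test: samples
    # spaced one symbol apart vanish everywhere except at the pulse centre.
    sps, taps = 8, 8 * 12 + 1
    h = raised_cosine_taps(taps, beta=0.35, sps=sps)
    idx = np.arange(taps) - (taps - 1) // 2
    assert np.allclose(h[(idx % sps == 0) & (idx != 0)], 0.0)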
> > Yes, that would be great! I have some scratch work for the raised cosine and root-raised cosine FIR filters, but they're not ready for contributing to scipy. If you have code in pretty good shape, these would be nice additions to scipy.signal. The first thing to think about is the API. What is the API of your code? A possible design is similar to the Savitzy-Golay filter implementation. It's a very basic, function-oriented API. One function, savgol_coeffs, provides the FIR filter coefficients, given the number of taps and the parameters of the filter. Another function, savgol_filter, takes an input array along with the filter parameters. It computes the coefficients and applies the filter. It is really just a convenience function: it calls savgol_coeffs to compute the filter coefficients, and applies the filter using a convolution (the only complication is that it provides several options for handling the edges of the input). Even more basic are the functions for FIR filter design using the window method. The functions firwin and firwin2 compute the filter coefficients, and leave it up to the user to convolve them with their signal. Looking forward to hearing more. Warren Thanks, > Jeff > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Aug 26 07:39:13 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 26 Aug 2014 07:39:13 -0400 Subject: [SciPy-Dev] Nyquist Filters References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: Here's code for nyquist filter coeffs. I apologize that it is quite old, and maybe could be a little more pretty. It is, however, quite well tested. -------------- next part -------------- A non-text attachment was scrubbed... Name: nyquist.py Type: text/x-python Size: 2754 bytes Desc: not available URL: From kitchi.srikrishna at gmail.com Tue Aug 26 08:36:35 2014 From: kitchi.srikrishna at gmail.com (Sri Krishna) Date: Tue, 26 Aug 2014 18:06:35 +0530 Subject: [SciPy-Dev] To use C code or Cython code? Message-ID: Hi, I'm new to the Scipy-Dev mailing list, looking to contribute wherever I can. I was looking through the open issues and saw this issue , regarding a speed-up for the convolve2d function. My confusion arises from the SciPy coding guidelines which states that using Cython is much preferable to using plain C/C++/Fortran. Would it be desirable then to change the C code of signal/firfilter.c to a Cythonized code? Thanks, Krishna -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Aug 26 13:58:38 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 26 Aug 2014 19:58:38 +0200 Subject: [SciPy-Dev] To use C code or Cython code? In-Reply-To: References: Message-ID: <53FCCACE.8040608@googlemail.com> On 26.08.2014 14:36, Sri Krishna wrote: > Hi, > > I'm new to the Scipy-Dev mailing list, looking to contribute wherever I > can. I was looking through the open issues and saw this issue > , regarding a speed-up for > the convolve2d function. > > My confusion arises from the SciPy coding guidelines > which > states that using Cython is much preferable to using plain C/C++/Fortran. > > Would it be desirable then to change the C code of signal/firfilter.c to > a Cythonized code? 
> hi, I think it would be better to keep the core of the function in plain C/C++ or Fortran. As this is a function that can profit greatly from lowlevel use of the hardware we retain more flexibility for optimization by staying with a lowlevel language. Cython does not offer any advantage at that level of the code and would make it impossible(?) to use of assembler or intrinsics. The wrapping to python on the other hand is probably preferable in in Cython as it simplifies a lot of mundane and error prone issues. Cheers, Julian From kitchi.srikrishna at gmail.com Tue Aug 26 16:13:03 2014 From: kitchi.srikrishna at gmail.com (Sri Krishna) Date: Wed, 27 Aug 2014 01:43:03 +0530 Subject: [SciPy-Dev] To use C code or Cython code? In-Reply-To: <53FCCACE.8040608@googlemail.com> References: <53FCCACE.8040608@googlemail.com> Message-ID: > > The wrapping to python on the other hand is probably preferable in in > Cython as it simplifies a lot of mundane and error prone issues. > So if I understand correctly - Most, if not all core functionality of Scipy will be in C/C++/Fortran, and the glue code between C and the python interface will run on Cython? Thanks, Krishna On 26 August 2014 23:28, Julian Taylor wrote: > On 26.08.2014 14:36, Sri Krishna wrote: > > Hi, > > > > I'm new to the Scipy-Dev mailing list, looking to contribute wherever I > > can. I was looking through the open issues and saw this issue > > , regarding a speed-up for > > the convolve2d function. > > > > My confusion arises from the SciPy coding guidelines > > which > > states that using Cython is much preferable to using plain C/C++/Fortran. > > > > Would it be desirable then to change the C code of signal/firfilter.c to > > a Cythonized code? > > > > hi, > I think it would be better to keep the core of the function in plain > C/C++ or Fortran. > As this is a function that can profit greatly from lowlevel use of the > hardware we retain more flexibility for optimization by staying with a > lowlevel language. Cython does not offer any advantage at that level of > the code and would make it impossible(?) to use of assembler or intrinsics. > > The wrapping to python on the other hand is probably preferable in in > Cython as it simplifies a lot of mundane and error prone issues. > > Cheers, > Julian > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Tue Aug 26 17:15:36 2014 From: ewm at redtetrahedron.org (Eric Moore) Date: Tue, 26 Aug 2014 17:15:36 -0400 Subject: [SciPy-Dev] To use C code or Cython code? In-Reply-To: References: Message-ID: Krishna, A good place to start before making any changes to firfilter.c would be to evaluate the various convolution routines that already exist. Depending on the inputs, their speed varies quite a bit. We currently have a mix of 1d, 2d, and nd convolution routines in signal, ndimage and numpy (also possibly elsewhere). It would be good to move all of these to a single routine (at least where practical). A related piece of particularly low hanging fruit in signal is to teach lfilter to be smarter when it is passed a FIR filter. There ought to be an immediate speed win here. Eric On Tue, Aug 26, 2014 at 8:36 AM, Sri Krishna wrote: > Hi, > > I'm new to the Scipy-Dev mailing list, looking to contribute wherever I > can. 
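A possible starting point for the evaluation suggested here is simply to time the existing 1-D routines on identical inputs. The sizes below are arbitrary, and lfilter and ndimage.convolve1d differ from 'same'-mode convolution in alignment and boundary handling, so this compares speed only, not results:

    import timeit
    import numpy as np
    from scipy import ndimage, signal

    x = np.random.rand(100000)
    h = np.random.rand(101)

    routines = [
        ('np.convolve',        lambda: np.convolve(x, h, mode='same')),
        ('signal.convolve',    lambda: signal.convolve(x, h, mode='same')),
        ('signal.fftconvolve', lambda: signal.fftconvolve(x, h, mode='same')),
        ('signal.lfilter',     lambda: signal.lfilter(h, [1.0], x)),
        ('ndimage.convolve1d', lambda: ndimage.convolve1d(x, h)),
    ]
    for name, func in routines:
        t = min(timeit.repeat(func, number=5, repeat=3)) / 5
        print('%-20s %.2f ms' % (name, 1e3 * t))

A real evaluation would sweep signal and kernel lengths, dtypes and dimensionality before deciding what a unified routine should dispatch to.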
I was looking through the open issues and saw this issue > , regarding a speed-up for > the convolve2d function. > > My confusion arises from the SciPy coding guidelines > which states > that using Cython is much preferable to using plain C/C++/Fortran. > > Would it be desirable then to change the C code of signal/firfilter.c to a > Cythonized code? > > Thanks, > Krishna > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Aug 27 04:36:42 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 27 Aug 2014 08:36:42 +0000 (UTC) Subject: [SciPy-Dev] To use C code or Cython code? References: <53FCCACE.8040608@googlemail.com> Message-ID: <245045135430800225.357621sturla.molden-gmail.com@news.gmane.org> Sri Krishna wrote: > So if I understand correctly - Most, if not all core functionality of Scipy > will be in C/C++/Fortran, and the glue code between C and the python > interface will run on Cython? The glue for Fortran would normally be f2py. Sturla From jtaylor.debian at googlemail.com Wed Aug 27 13:07:24 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 27 Aug 2014 19:07:24 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.9.0 release candidate 1 available Message-ID: <53FE104C.2020006@googlemail.com> Hello, Almost punctually for EuroScipy we have finally managed to release the first release candidate of NumPy 1.9. We intend to only fix bugs until the final release which we plan to do in the next 1-2 weeks. In this release numerous performance improvements have been added, most significantly the indexing code has been rewritten be several times faster for most cases and performance of using small arrays and scalars has almost doubled. Plenty of other functions have been improved too, nonzero, where, count_nonzero, floating point min/max, boolean argmin/argmax, searchsorted, triu/tril, masked sorting can be expected to perform significantly better in many cases. Also NumPy now releases the GIL for more functions, most notably the indexing now releases it and the random modules state object has a private lock instead of using the GIL. This allows leveraging pure python threads more efficiently. In order to make working with arrays containing NaN values easier nanmedian and nanpercentile have been added which ignore these values. These functions and the regular median and percentile now also support generalized axis arguments that ufuncs already have, these allow reducing along multiple axis in one call. Please see the release notes for all the details. Please also take not of the many small compatibility notes and deprecation in the notes. https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst The source tarballs and win32 binaries can be downloaded here: https://sourceforge.net/projects/numpy/files/NumPy/1.9.0rc1 Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From jeff.grasty at gmail.com Thu Aug 28 14:43:44 2014 From: jeff.grasty at gmail.com (Jeff Grasty) Date: Thu, 28 Aug 2014 18:43:44 +0000 (UTC) Subject: [SciPy-Dev] Nyquist Filters References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: Warren, I currently have two functions, Nyquist and rootNyquist, that return the coefficients of a nyquist or root-nyquist filter with a specified alpha and length. The algorithm that the functions implement is one proposed by Fred Harris in hist multi-rate signal processing book. It uses the remez algorithm to start as an initial guess and uses a gradient descent method to adjust the cutoff frequency of the passband until the filter's 3 dB (or 6 dB) point is at half the baud rate. I think the API that you mentioned for the Savitzky-Golay filter uses sounds simple and effective. What are ideas of how to test this? I can think of writing some simple unit tests that check filter length, gain, etc. Would that be sufficient. Here is a link to my github project for the code so far: https://github.com/fstop22/nyquist_filters Thanks, Jeff From moritz.beber at gmail.com Fri Aug 29 06:13:08 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Fri, 29 Aug 2014 12:13:08 +0200 Subject: [SciPy-Dev] nested setup.py scripts Message-ID: Dear all, I want to generate a package with a submodule structure similar to what numpy and scipy use. (Or do you recommend not doing that?) I have read the following pieces of documentation but I'm still unclear about how the main setup.py script discovers the nested scripts and gets the configuration values from those. Is this documented somewhere or can anyone point me to how this is done? Thank you in advance, Moritz P.S.: What I've read: https://github.com/numpy/numpy/blob/master/doc/DISTUTILS.rst.txt http://docs.scipy.org/doc/scipy-dev/reference/hacking.html http://docs.scipy.org/doc/scipy/reference/api.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri Aug 29 08:50:12 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 29 Aug 2014 08:50:12 -0400 Subject: [SciPy-Dev] ANN: NumPy 1.9.0 release candidate 1 available References: <53FE104C.2020006@googlemail.com> Message-ID: OK, it's fixed by doing: rm -rf ~/.local/lib/python2.7/site-packages/numpy* python setup.py install --user I guess something was not cleaned out from previous packages From ndbecker2 at gmail.com Fri Aug 29 09:25:49 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 29 Aug 2014 09:25:49 -0400 Subject: [SciPy-Dev] Nyquist Filters References: <96645C60-DC10-4B80-A521-7733C4D108C8@gmail.com> Message-ID: Interesting, but I am maybe missing something. This optimization only enforces flatness in passband and stopband, and 3dB pt. But nyquist filter is defined as having nyquist symmetry, which is what leads to zero ISI (the main reason for using a nyquist filter). There doesn't appear to be anything enforcing this symmmetry. Jeff Grasty wrote: > Warren, > > I currently have two functions, Nyquist and rootNyquist, that return the > coefficients of a nyquist or root-nyquist filter with a specified alpha > and length. The algorithm that the functions implement is one proposed > by Fred Harris in hist multi-rate signal processing book. 
It uses the > remez algorithm to start as an initial guess and uses a gradient > descent method to adjust the cutoff frequency of the passband until > the filter's 3 dB (or 6 dB) point is at half the baud rate. > > I think the API that you mentioned for the Savitzky-Golay filter uses > sounds simple and effective. > > What are ideas of how to test this? I can think of writing some simple > unit tests that check filter length, gain, etc. Would that be sufficient. > > Here is a link to my github project for the code so far: > https://github.com/fstop22/nyquist_filters > > Thanks, > Jeff -- -- Those who don't understand recursion are doomed to repeat it From ben.root at ou.edu Fri Aug 29 09:26:47 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 29 Aug 2014 09:26:47 -0400 Subject: [SciPy-Dev] [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available In-Reply-To: References: <53FE104C.2020006@googlemail.com> Message-ID: It is generally a good idea when switching between releases to execute "git clean -fxd" prior to rebuilding. Admittedly, I don't know how cleaning out that directory in .local could have impacted things. Go figure. Cheers! Ben Root On Fri, Aug 29, 2014 at 8:50 AM, Neal Becker wrote: > OK, it's fixed by doing: > > rm -rf ~/.local/lib/python2.7/site-packages/numpy* > python setup.py install --user > > I guess something was not cleaned out from previous packages > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Aug 29 16:06:35 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Aug 2014 21:06:35 +0100 Subject: [SciPy-Dev] nested setup.py scripts In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 11:13 AM, Moritz Beber wrote: > Dear all, > > I want to generate a package with a submodule structure similar to what > numpy and scipy use. (Or do you recommend not doing that?) I have read the > following pieces of documentation but I'm still unclear about how the main > setup.py script discovers the nested scripts and gets the configuration > values from those. Is this documented somewhere or can anyone point me to > how this is done? Getting clever with setup.py leads to suffering. Suffering leads to hate. Hate leads to the Dark Side. (I have no idea how numpy and scipy's setup.py work, but any time I've tried doing anything 1/10th that clever with setup.py I've regretted it.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ralf.gommers at gmail.com Fri Aug 29 16:09:17 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 29 Aug 2014 22:09:17 +0200 Subject: [SciPy-Dev] nested setup.py scripts In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 12:13 PM, Moritz Beber wrote: > Dear all, > > I want to generate a package with a submodule structure similar to what > numpy and scipy use. (Or do you recommend not doing that?) > It's a pretty standard layout for a Python package (assuming it's large in size and has some compiled code in it that actually needs multiple setup.py's), it's fine to copy this structure. > I have read the following pieces of documentation but I'm still unclear > about how the main setup.py script discovers the nested scripts and gets > the configuration values from those. 
Is this documented somewhere or can > anyone point me to how this is done? > In the main setup.py you'll see: config.add_subpackage('scipy') And in scipy/setup.py config.add_subpackage('cluster') config.add_subpackage('constants') config.add_subpackage('fftpack') ... Cheers, Ralf > Thank you in advance, > Moritz > > P.S.: What I've read: > https://github.com/numpy/numpy/blob/master/doc/DISTUTILS.rst.txt > http://docs.scipy.org/doc/scipy-dev/reference/hacking.html > http://docs.scipy.org/doc/scipy/reference/api.html > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Aug 29 16:10:10 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 29 Aug 2014 22:10:10 +0200 Subject: [SciPy-Dev] nested setup.py scripts In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 10:06 PM, Nathaniel Smith wrote: > On Fri, Aug 29, 2014 at 11:13 AM, Moritz Beber > wrote: > > Dear all, > > > > I want to generate a package with a submodule structure similar to what > > numpy and scipy use. (Or do you recommend not doing that?) I have read > the > > following pieces of documentation but I'm still unclear about how the > main > > setup.py script discovers the nested scripts and gets the configuration > > values from those. Is this documented somewhere or can anyone point me to > > how this is done? > > Getting clever with setup.py leads to suffering. Suffering leads to > hate. Hate leads to the Dark Side. > > (I have no idea how numpy and scipy's setup.py work, but any time I've > tried doing anything 1/10th that clever with setup.py I've regretted > it.) > :) very true - keep the complexity as low as you possibly can Ralf > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL:
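To illustrate the pattern Ralf describes, here is a minimal sketch of a nested numpy.distutils layout; 'mypackage' and 'submodule' are placeholder names. The top-level setup.py hands a configuration function to numpy.distutils, and each add_subpackage call makes the build descend into that directory and execute the setup.py found there:

    # setup.py at the repository root
    def configuration(parent_package='', top_path=None):
        from numpy.distutils.misc_util import Configuration
        config = Configuration(None, parent_package, top_path)
        config.add_subpackage('mypackage')    # descends into mypackage/setup.py
        return config

    if __name__ == '__main__':
        from numpy.distutils.core import setup
        setup(configuration=configuration)

    # mypackage/setup.py
    def configuration(parent_package='', top_path=None):
        from numpy.distutils.misc_util import Configuration
        config = Configuration('mypackage', parent_package, top_path)
        config.add_subpackage('submodule')    # runs mypackage/submodule/setup.py
        return config

    if __name__ == '__main__':
        from numpy.distutils.core import setup
        setup(**configuration(top_path='').todict())

With that in place the usual python setup.py build / python setup.py install from the root collects all nested configurations into a single distribution, which is essentially how the scipy tree is wired together.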