From d.l.goldsmith at gmail.com  Sun Aug  1 00:18:30 2010
From: d.l.goldsmith at gmail.com (David Goldsmith)
Date: Sat, 31 Jul 2010 21:18:30 -0700
Subject: [SciPy-Dev] Status of scipy.* docstrings
Message-ID:

Hi, folks!  Except for scipy.stats, the docstrings of all sub-packages
immediately "below" scipy have now had autosummary directives for all (with
one exception) their objects added to them, with either the existing
description if there already was one, or the description pulled in by the
autosummary (perhaps paraphrased if necessary to satisfy the 75-character
restriction) if said pulling worked, or "TODO" if neither of those
conditions was met; scipy.stats was already partially under autosummary
"control" when I checked it, so I've left it alone pending further info as
to whether this incompleteness is intentional or not.  I will continue to
work my way down the namespace tree - at a reduced pace - but I just wanted
to announce that this *ad hoc* (i.e., "unofficial") milestone has been met,
and point out that some of the holes alluded to above can serve as pointers
to places that need work, e.g., places where the autosummary directive
either pulled nothing, "failed to parse" the summary, or pulled an
excessively long description (indicating, IIUC, a docstring with an
excessively long Brief Summary).

Thanks for all you do (and thanks for all the kudos that have come in since
my resignation announcement).

DG

--
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

From josef.pktd at gmail.com  Sun Aug  1 05:48:08 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 1 Aug 2010 05:48:08 -0400
Subject: [SciPy-Dev] Status of scipy.* docstrings
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2010 at 12:18 AM, David Goldsmith wrote:
> Hi, folks!  Except for scipy.stats, the docstrings of all sub-packages
> immediately "below" scipy have now had autosummary directives for all
> (with one exception) their objects added to them, with either the existing
> description if there already was one, or the description pulled in by the
> autosummary (perhaps paraphrased if necessary to satisfy the 75-character
> restriction) if said pulling worked, or "TODO" if neither of those
> conditions was met; scipy.stats was already partially under autosummary
> "control" when I checked it, so I've left it alone pending further info
> as to whether this incompleteness is intentional or not.  I will continue
> to work my way down the namespace tree - at a reduced pace - but I just
> wanted to announce that this ad hoc (i.e., "unofficial") milestone has
> been met, and point out that some of the holes alluded to above can serve
> as pointers to places that need work, e.g., places where the autosummary
> directive either pulled nothing, "failed to parse" the summary, or pulled
> an excessively long description (indicating, IIUC, a docstring with an
> excessively long Brief Summary).

Is there now a way to handle autosummary and similar directives in python
modules, e.g. info.py?

What's the pattern/recommendation now for content in the module docstring,
in __init__.py or info.py, versus the subpackage rst file?
(I'm still trying to catch up with recent changes.)

> Thanks for all you do (and thanks for all the kudos that have come in
> since my resignation announcement).
Also a big thank you from me, especially for getting a more consistent structure into the docs. Josef > > DG > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From d.l.goldsmith at gmail.com Sun Aug 1 17:21:12 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 1 Aug 2010 14:21:12 -0700 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: Hi, josef, and thanks! On Sun, Aug 1, 2010 at 2:48 AM, wrote: > On Sun, Aug 1, 2010 at 12:18 AM, David Goldsmith > wrote: > > Hi, folks! Except for scipy.stats, the docstrings of all sub-packages > > immediately "below" scipy have now had autosummary directives for all > (with > > one exception) their objects added to them, with either the existing > > description if there already was one, or the description pulled in by the > > autosummary (perhaps paraphrased if necessary to satisfy the 75 character > > restriction) if said pulling worked, or "TODO" if neither of those > > conditions were met; scipy.stats was already partially under autosummary > > "control" when I checked it, so I've left it alone pending further info > as > > to whether this incompleteness is intentional or not. I will continue to > > work my way down the namespace tree - at a reduced pace - but I just > wanted > > to announce that this ad hoc (i.e., "unofficial") milestone has been met, > > and point out that some of the holes alluded to above can serve as > pointers > > to places that need work, e.g., places where the autosummary directive > > either pulled nothing, "failed to parse" the summary, or pulled an > > excessively long description (indicating, IIUC, a docstring w/ an > > excessively long Brief Summary). > > Is there a way to handle now autosummary and similar directives in > python modules, e.g. info.py. > Please clarify precisely what you mean: exactly what problem(s) are you seeing/having? > What's the pattern/recommendation now for content in the module > docstring, in __init__.py or info.py, versus subpackage rst file ? > I don't think a formal "policy" was ever formally adopted. I, rather unilaterally, took it upon myself to "standardize" to using the autosummary directive in sub-package and module docstrings on the grounds that it assures consistency across at least two presentations: the target object docstring and its one line summary in the auto-rendering of the docstring of its parent namespace (unfortunately, it doesn't assure consistency in the "terminal" presentation of the latter docstring--presently, that has to be done manually--but maybe automation of that too can be added down the road). I had no qualms about making the unilateral decision to do this because: a) I felt my reasoning for doing so was consistent w/ our general philosophy of fighting docstring divergence, and b) my change could always be reverted. Anyway, there it is: if people feel that this is how we should continue, then there are now a bunch of examples to follow; if they don't, the changes can be reverted and a different approach to consistency and minimal maintenance can be proposed. As far as narrative content in these top level docstrings is concerned, I did not provide any where it did not already exist - that open issue is still open AFAIC. 
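For readers who have not used the directive, a minimal sketch of the kind of
sub-package docstring being described; the package and object names below are
placeholders for illustration only, not actual SciPy content:

    """
    Widget tools (scipy.widget)
    ===========================

    .. autosummary::

       frobnicate
       Gadget
    """

When Sphinx renders such a docstring, autosummary places each object's
one-line summary next to its name, which is the consistency between the
parent listing and the target docstring referred to above.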
DG > > (I'm still trying to catch up with recent changes.) > > > > > Thanks for all you do (and thanks for all the kudos that have come in > post > > my resignation announcement). > > Also a big thank you from me, especially for getting a more consistent > structure into the docs. > > Josef > > > > > > DG > > > > -- > > Mathematician: noun, someone who disavows certainty when their > uncertainty > > set is non-empty, even if that set has measure zero. > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Aug 1 18:34:34 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 1 Aug 2010 18:34:34 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Sun, Aug 1, 2010 at 5:21 PM, David Goldsmith wrote: > Hi, josef, and thanks! > > On Sun, Aug 1, 2010 at 2:48 AM, wrote: >> >> On Sun, Aug 1, 2010 at 12:18 AM, David Goldsmith >> wrote: >> > Hi, folks!? Except for scipy.stats, the docstrings of all sub-packages >> > immediately "below" scipy have now had autosummary directives for all >> > (with >> > one exception) their objects added to them, with either the existing >> > description if there already was one, or the description pulled in by >> > the >> > autosummary (perhaps paraphrased if necessary to satisfy the 75 >> > character >> > restriction) if said pulling worked, or "TODO" if neither of those >> > conditions were met; scipy.stats was already partially under autosummary >> > "control" when I checked it, so I've left it alone pending further info >> > as >> > to whether this incompleteness is intentional or not.? I will continue >> > to >> > work my way down the namespace tree - at a reduced pace - but I just >> > wanted >> > to announce that this ad hoc (i.e., "unofficial") milestone has been >> > met, >> > and point out that some of the holes alluded to above can serve as >> > pointers >> > to places that need work, e.g., places where the autosummary directive >> > either pulled nothing, "failed to parse" the summary, or pulled an >> > excessively long description (indicating, IIUC, a docstring w/ an >> > excessively long Brief Summary). >> >> Is there a way to handle now autosummary and similar directives in >> python modules, e.g. info.py. > > Please clarify precisely what you mean: exactly what problem(s) are you > seeing/having? > >> >> What's the pattern/recommendation now for content in the module >> docstring, in __init__.py or info.py, versus subpackage rst file ? > > I don't think a formal "policy" was ever formally adopted.? 
I, rather > unilaterally, took it upon myself to "standardize" to using the autosummary > directive in sub-package and module docstrings on the grounds that it > assures consistency across at least two presentations: the target object > docstring and its one line summary in the auto-rendering of the docstring of > its parent namespace (unfortunately, it doesn't assure consistency in the > "terminal" presentation of the latter docstring--presently, that has to be > done manually--but maybe automation of that too can be added down the > road).? I had no qualms about making the unilateral decision to do this > because: a) I felt my reasoning for doing so was consistent w/ our general > philosophy of fighting docstring divergence, and b) my change could always > be reverted.? Anyway, there it is: if people feel that this is how we should > continue, then there are now a bunch of examples to follow; if they don't, > the changes can be reverted and a different approach to consistency and > minimal maintenance can be proposed.? As far as narrative content in these > top level docstrings is concerned, I did not provide any where it did not > already exist - that open issue is still open AFAIC. My impression was that module docstrings, in __init__.py and info.py are mainly for the commandline/interpreter, and I thought for most subpackages they are or were not included in the sphinx rendered docs. In this case, autosummary would be noise in the interpreter and not picked up by sphinx, which uses the corresponding rst files. (But I lost a bit the overview how and which parts interpreter, doceditor and sphinx render.) Josef > > DG > >> >> (I'm still trying to catch up with recent changes.) >> >> > >> > Thanks for all you do (and thanks for all the kudos that have come in >> > post >> > my resignation announcement). >> >> Also a big thank you from me, especially for getting a more consistent >> structure into the docs. >> >> Josef >> >> >> > >> > DG >> > >> > -- >> > Mathematician: noun, someone who disavows certainty when their >> > uncertainty >> > set is non-empty, even if that set has measure zero. >> > >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide.? (As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From scott.sinclair.za at gmail.com Mon Aug 2 02:51:01 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 2 Aug 2010 08:51:01 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: >On 2 August 2010 00:34, wrote: > > My impression was that module docstrings, in __init__.py and info.py > are mainly for the commandline/interpreter, and I thought for most > subpackages they are or were not included in the sphinx rendered docs. This is correct, __init__.py and info.py are not included in the Sphinx rendered docs. 
I think Pauli hand copied their contents into the .rst files when he created those. > In this case, autosummary would be noise in the interpreter and not > picked up by sphinx, which uses the corresponding rst files. The autosummary directives have only been added in the .rst files, not the __init__.py and info.py files, so the Sphinx markup won't appear in the interpreter. The .rst files and corresponding __init__/info.py files currently have no link and will need to be separately maintained. It shouldn't be hard to strip out the Sphinx directives from the .rst files and periodically copy the updated content into the __init__.py and info.py, so it probably makes sense to work on the .rst files for now. Cheers, Scott From josef.pktd at gmail.com Mon Aug 2 05:32:26 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Aug 2010 05:32:26 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Mon, Aug 2, 2010 at 2:51 AM, Scott Sinclair wrote: >>On 2 August 2010 00:34, ? wrote: >> >> My impression was that module docstrings, in __init__.py and info.py >> are mainly for the commandline/interpreter, and I thought for most >> subpackages they are or were not included in the sphinx rendered docs. > > This is correct, __init__.py and info.py are not included in the > Sphinx rendered docs. I think Pauli hand copied their contents into > the .rst files when he created those. > >> In this case, autosummary would be noise in the interpreter and not >> picked up by sphinx, which uses the corresponding rst files. > > The autosummary directives have only been added in the > .rst files, not the __init__.py and info.py files, so the > Sphinx markup won't appear in the interpreter. The .rst > files and corresponding __init__/info.py files currently have no link > and will need to be separately maintained. > > It shouldn't be hard to strip out the Sphinx directives from the > .rst files and periodically copy the updated content into > the __init__.py and info.py, so it probably makes sense to work on the > .rst files for now. my understanding: >From the source tab in http://docs.scipy.org/scipy/docs/scipy.fftpack/ , it looks like this is destined for info.py which is pulled in by __init__.py. The rst docs are at http://docs.scipy.org/scipy/docs/scipy-docs/fftpack.rst/ Cheers, Josef > Cheers, > Scott > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From scott.sinclair.za at gmail.com Mon Aug 2 06:34:24 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 2 Aug 2010 12:34:24 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: >On 2 August 2010 11:32, wrote: > On Mon, Aug 2, 2010 at 2:51 AM, Scott Sinclair > wrote: >>>On 2 August 2010 00:34, ? wrote: >>> >>> My impression was that module docstrings, in __init__.py and info.py >>> are mainly for the commandline/interpreter, and I thought for most >>> subpackages they are or were not included in the sphinx rendered docs. >> >> This is correct, __init__.py and info.py are not included in the >> Sphinx rendered docs. I think Pauli hand copied their contents into >> the .rst files when he created those. >> >>> In this case, autosummary would be noise in the interpreter and not >>> picked up by sphinx, which uses the corresponding rst files. 
>> >> The autosummary directives have only been added in the >> .rst files, not the __init__.py and info.py files, so the >> Sphinx markup won't appear in the interpreter. The .rst >> files and corresponding __init__/info.py files currently have no link >> and will need to be separately maintained. >> >> It shouldn't be hard to strip out the Sphinx directives from the >> .rst files and periodically copy the updated content into >> the __init__.py and info.py, so it probably makes sense to work on the >> .rst files for now. > > my understanding: > > >From the source tab in http://docs.scipy.org/scipy/docs/scipy.fftpack/ > , it looks like this is destined for info.py which is pulled in by > __init__.py. > The rst docs are at http://docs.scipy.org/scipy/docs/scipy-docs/fftpack.rst/ Hmm. Good point. I was looking at the way the docs are built from the source tree, not at what was edited in the doc-editor. When the documentation is generated from the source tree, the Sphinx master document doc/source/index.rst pulls in doc/source/fftpack.rst, not __init.__.py. In the doc-editor doc/source/index.rst is at http://docs.scipy.org/scipy/docs/scipy-docs/index.rst/. The solution would be to use the recent edits at http://docs.scipy.org/scipy/docs/scipy./ to update what's at http://docs.scipy.org/scipy/docs/scipy-docs/.rst/ then remove the Sphinx directives from http://docs.scipy.org/scipy/docs/scipy./ or just revert them to what's in trunk for now. Maybe the http://docs.scipy.org/scipy/docs/scipy./ docstrings should also be marked as unimportant to warn people that the situation is a little tricky to unravel.. Cheers, Scott From pav at iki.fi Mon Aug 2 07:24:29 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 2 Aug 2010 11:24:29 +0000 (UTC) Subject: [SciPy-Dev] Status of scipy.* docstrings References: Message-ID: Mon, 02 Aug 2010 12:34:24 +0200, Scott Sinclair wrote: [clip] > Maybe the http://docs.scipy.org/scipy/docs/scipy./ > docstrings should also be marked as unimportant to warn people that the > situation is a little tricky to unravel.. A valid alternative is just to put all of the documentation to the info.py, and just put .. automodule:: scipy.optimize to the optimize.rst. Autosummary directives work correctly in submodule docstrings. For instance, this page: http://docs.scipy.org/doc/numpy/reference/routines.fft.html comes solely from ``numpy/fft/info.py``: http://docs.scipy.org/doc/numpy/_sources/reference/routines.fft.txt http://docs.scipy.org/numpy/source/numpy/dist/lib64/python2.4/site-packages/numpy/fft/info.py You can also write something like .. autosummary:: :toctree: some_function Short blurb describing what it does and the "Short blurb ..." will be ignored in the HTML output, but it is useful for the people looking at the text via help(). -- Pauli Virtanen From scott.sinclair.za at gmail.com Mon Aug 2 09:11:42 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 2 Aug 2010 15:11:42 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 2 August 2010 13:24, Pauli Virtanen wrote: > Mon, 02 Aug 2010 12:34:24 +0200, Scott Sinclair wrote: > [clip] >> Maybe the http://docs.scipy.org/scipy/docs/scipy./ >> docstrings should also be marked as unimportant to warn people that the >> situation is a little tricky to unravel.. > > A valid alternative is just to put all of the documentation to the > info.py, and just put > > ? ? ? ?.. automodule:: scipy.optimize > > to the optimize.rst. This is the approach I prefer as well. 
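Concretely, the layout being endorsed here looks something like the
following sketch (scipy.optimize is taken from the example above; the file
contents and the fmin entry are illustrative, not verbatim SciPy source):

    doc/source/optimize.rst (the entire file):

        .. automodule:: scipy.optimize

    scipy/optimize/info.py (the module docstring carries the content):

        """
        Optimization and root finding (scipy.optimize)
        ==============================================

        .. autosummary::
           :toctree:

           fmin
               Minimize a function using the downhill simplex algorithm.
        """

    scipy/optimize/__init__.py (re-exports the docstring):

        from info import __doc__

The indented blurb under ``fmin`` is the "short blurb" mentioned above:
ignored in the HTML output, but visible via help().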
I tried to suggest it the last time we had this discussion (http://mail.scipy.org/pipermail/scipy-dev/2010-June/015075.html). Then there is only one place to keep the docs up to date, the downside being that a bit of Sphinx markup will be seen in the terminal help for sub-packages. The question is whether there actually is a strong aversion to seeing Sphinx markup in the terminal help at the top-level of the sub-packages. If it doesn't bother too many people, then your suggestion is the right way to go for all of the Scipy sub-packages. Cheers, Scott From d.l.goldsmith at gmail.com Thu Aug 5 04:17:16 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 5 Aug 2010 01:17:16 -0700 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: OK, so, should I stop adding autosummaries to module docstrings and revert the ones I did? DG On Mon, Aug 2, 2010 at 6:11 AM, Scott Sinclair wrote: > On 2 August 2010 13:24, Pauli Virtanen wrote: > > Mon, 02 Aug 2010 12:34:24 +0200, Scott Sinclair wrote: > > [clip] > >> Maybe the http://docs.scipy.org/scipy/docs/scipy./ > >> docstrings should also be marked as unimportant to warn people that the > >> situation is a little tricky to unravel.. > > > > A valid alternative is just to put all of the documentation to the > > info.py, and just put > > > > .. automodule:: scipy.optimize > > > > to the optimize.rst. > > This is the approach I prefer as well. I tried to suggest it the last > time we had this discussion > (http://mail.scipy.org/pipermail/scipy-dev/2010-June/015075.html). > Then there is only one place to keep the docs up to date, the downside > being that a bit of Sphinx markup will be seen in the terminal help > for sub-packages. > > The question is whether there actually is a strong aversion to seeing > Sphinx markup in the terminal help at the top-level of the > sub-packages. If it doesn't bother too many people, then your > suggestion is the right way to go for all of the Scipy sub-packages. > > Cheers, > Scott > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Aug 5 05:01:25 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 5 Aug 2010 09:01:25 +0000 (UTC) Subject: [SciPy-Dev] Status of scipy.* docstrings References: Message-ID: Thu, 05 Aug 2010 01:17:16 -0700, David Goldsmith wrote: > OK, so, should I stop adding autosummaries to module docstrings and > revert the ones I did? I think the Sphinx markup involved is not heavy, and having to maintain two nearly identical documents is not something we really want to do. It might be possible to autogenerate the info.py's, but frankly, I think setting that up is not a very useful use of time, just to avoid a few RST directives. We can think about it later, but for now the priority should be to get some useful information both to the HTML docs and to the command-line help, and putting everything to info.py seems the way to go for me. I'd at least be OK with moving everything from the *.rst files to info.py. 
In general, I'd like to structure `info.py` in a similar way as it's in `numpy.fft`: - module name title etc. on top - function/class listing first - followed by background information (if any) needed to understand what the module is intended to do - the corresponding .rst file contains only the line .. automodule:: scipy.interpolate The only exception is probably extensive examples, or extensive background information, which should probably be retained in the *.rst part, and maybe be split into several pages. -- Pauli Virtanen From josef.pktd at gmail.com Thu Aug 5 06:55:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Aug 2010 06:55:37 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Thu, Aug 5, 2010 at 5:01 AM, Pauli Virtanen wrote: > Thu, 05 Aug 2010 01:17:16 -0700, David Goldsmith wrote: >> OK, so, should I stop adding autosummaries to module docstrings and >> revert the ones I did? > > I think the Sphinx markup involved is not heavy, and having to maintain > two nearly identical documents is not something we really want to do. > > It might be possible to autogenerate the info.py's, but frankly, I think > setting that up is not a very useful use of time, just to avoid a few RST > directives. We can think about it later, but for now the priority should > be to get some useful information both to the HTML docs and to the > command-line help, and putting everything to info.py seems the way to go > for me. > > I'd at least be OK with moving everything from the *.rst files to > info.py. In general, I'd like to structure `info.py` in a similar way as > it's in `numpy.fft`: > > - module name title etc. on top > > - function/class listing first > > - followed by background information (if any) needed to understand > ?what the module is intended to do > > - the corresponding .rst file contains only the line > > ?.. automodule:: scipy.interpolate > > The only exception is probably extensive examples, or extensive > background information, which should probably be retained in the *.rst > part, and maybe be split into several pages. One issue is the amount of math/latex, given the discussion we had on fftpack. Do we restrict latex in the module docstring as in function or class docstrings, or is it allowed to be used not only very sparingly? Since info.py files can also be edited in the module editor, I also think removing the duplication is a good idea. A related question that is not urgent: Are we keeping the split of information between tutorial and sub-package rst pages, or should some background information be moved to the package rst files ? Josef > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From scott.sinclair.za at gmail.com Thu Aug 5 07:03:30 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 5 Aug 2010 13:03:30 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 5 August 2010 11:01, Pauli Virtanen wrote: > Thu, 05 Aug 2010 01:17:16 -0700, David Goldsmith wrote: >> OK, so, should I stop adding autosummaries to module docstrings and >> revert the ones I did? > > I think the Sphinx markup involved is not heavy, and having to maintain > two nearly identical documents is not something we really want to do. > > I'd at least be OK with moving everything from the *.rst files to > info.py. 
In general, I'd like to structure `info.py` in a similar way as > it's in `numpy.fft`: > > - module name title etc. on top > > - function/class listing first > > - followed by background information (if any) needed to understand > ?what the module is intended to do > > - the corresponding .rst file contains only the line > > ?.. automodule:: scipy.interpolate This sounds like a good plan. Just a note that all the edits made at http://docs.scipy.org/scipy/docs/scipy. result in patches from the doc-editor that target scipy//__init__.py in the source tree. If the patch is applied as is, the work from the doc-editor won't appear in the terminal because the .__doc__ is overwritten with the content of scipy//info.py on import of the sub-package. I expect that Sphinx will also end up with the docstring from info.py for the same reason, but don't have time to check right now. Cheers, Scott From scott.sinclair.za at gmail.com Thu Aug 5 07:10:47 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 5 Aug 2010 13:10:47 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 5 August 2010 12:55, wrote: > A related question that is not urgent: Are we keeping the split of > information between tutorial and sub-package rst pages, or should some > background information be moved to the package rst files ? I think it makes sense to keep a split between the two. Tutorials should describe how to use a selection of the sub-package tools by example, while the sub-package rst pages should be the top level of reference documentation for the sub-package. Cheers, Scott From josef.pktd at gmail.com Thu Aug 5 07:23:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Aug 2010 07:23:14 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Thu, Aug 5, 2010 at 7:10 AM, Scott Sinclair wrote: > On 5 August 2010 12:55, ? wrote: >> A related question that is not urgent: Are we keeping the split of >> information between tutorial and sub-package rst pages, or should some >> background information be moved to the package rst files ? > > I think it makes sense to keep a split between the two. Tutorials > should describe how to use a selection of the sub-package tools by > example, while the sub-package rst pages should be the top level of > reference documentation for the sub-package. Some tutorial pages also contain the mathematical definitions and background explanations of the algorithms, e.g. signal.lfilter and fft, just two examples that I remember. Also having the definitions/formulas of the stats.distributions in the tutorials hides them a bit. Those parts might better belong in a package documentation. Usage examples of course should remain in the tutorials. Josef Josef > > Cheers, > Scott > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pav at iki.fi Thu Aug 5 08:20:41 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 5 Aug 2010 12:20:41 +0000 (UTC) Subject: [SciPy-Dev] Status of scipy.* docstrings References: Message-ID: Thu, 05 Aug 2010 13:03:30 +0200, Scott Sinclair wrote: [clip] > Just a note that all the edits made at > http://docs.scipy.org/scipy/docs/scipy. result in patches > from the doc-editor that target scipy//__init__.py in the > source tree. 
If the patch is applied as is, the work from the doc-editor > won't appear in the terminal because the .__doc__ is > overwritten with the content of scipy//info.py on import of > the sub-package. I expect that Sphinx will also end up with the > docstring from info.py for the same reason, but don't have time to check > right now. Correct. Since the __init__.py do from info import __doc__ it's not possible to find out by introspection where the __doc__ actually came from. To get the patches to the correct place would need some extra smartness and special casing in pydoc-tool.py -- Pauli Virtanen From scott.sinclair.za at gmail.com Thu Aug 5 09:47:57 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 5 Aug 2010 15:47:57 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 5 August 2010 14:20, Pauli Virtanen wrote: > Thu, 05 Aug 2010 13:03:30 +0200, Scott Sinclair wrote: > [clip] >> Just a note that all the edits made at >> http://docs.scipy.org/scipy/docs/scipy. result in patches >> from the doc-editor that target scipy//__init__.py in the >> source tree. If the patch is applied as is, the work from the doc-editor >> won't appear in the terminal because the .__doc__ is >> overwritten with the content of scipy//info.py on import of >> the sub-package. I expect that Sphinx will also end up with the >> docstring from info.py for the same reason, but don't have time to check >> right now. > > Correct. Since the __init__.py do > > ? ? ? ?from info import __doc__ > > it's not possible to find out by introspection where the __doc__ actually > came from. To get the patches to the correct place would need some extra > smartness and special casing in pydoc-tool.py Or some extra cut-and-paste work from whoever applies the patches to trunk, which is why I brought it up. Cheers, Scott From d.l.goldsmith at gmail.com Thu Aug 5 16:52:28 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 5 Aug 2010 13:52:28 -0700 Subject: [SciPy-Dev] scipy.org down? Message-ID: I'm getting "Network Timeout" failures trying to visit www.scipy.org and mail.scipy.org... DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Thu Aug 5 19:07:16 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 5 Aug 2010 19:07:16 -0400 Subject: [SciPy-Dev] scipy.org down? In-Reply-To: References: Message-ID: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> On 2010-08-05, at 4:52 PM, David Goldsmith wrote: > I'm getting "Network Timeout" failures trying to visit www.scipy.org and mail.scipy.org... "easy_install ipython" is also failing with timeouts. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwang at enthought.com Thu Aug 5 22:34:39 2010 From: pwang at enthought.com (Peter Wang) Date: Thu, 5 Aug 2010 21:34:39 -0500 Subject: [SciPy-Dev] scipy.org down? In-Reply-To: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> References: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> Message-ID: On Thu, Aug 5, 2010 at 6:07 PM, David Warde-Farley wrote: > On 2010-08-05, at 4:52 PM, David Goldsmith wrote: > > I'm getting "Network Timeout" failures trying to visit www.scipy.org and > mail.scipy.org... > > "easy_install ipython" is also failing with timeouts. > David The scipy.org web page seems to be back up. What URL is easy_install timing out on? 
-Peter From dwf at cs.toronto.edu Fri Aug 6 01:11:37 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 6 Aug 2010 01:11:37 -0400 Subject: [SciPy-Dev] scipy.org down? In-Reply-To: References: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> Message-ID: <8FBF440D-83C1-4020-9F63-71DBFC70AC66@cs.toronto.edu> It's also fixed. Basically the package tarballs are hosted on ipython.scipy.org (which is maybe not a good idea given scipy.org's yoyo behaviour lately -- even numpy and scipy host on sourceforge) David On 2010-08-05, at 10:34 PM, Peter Wang wrote: > On Thu, Aug 5, 2010 at 6:07 PM, David Warde-Farley wrote: >> On 2010-08-05, at 4:52 PM, David Goldsmith wrote: >> >> I'm getting "Network Timeout" failures trying to visit www.scipy.org and >> mail.scipy.org... >> >> "easy_install ipython" is also failing with timeouts. >> David > > The scipy.org web page seems to be back up. What URL is easy_install > timing out on? > > > -Peter > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From cimrman3 at ntc.zcu.cz Fri Aug 6 12:52:31 2010 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 06 Aug 2010 18:52:31 +0200 Subject: [SciPy-Dev] ANN: SfePy 2010.3 Message-ID: <4C5C3DCF.5010100@ntc.zcu.cz> I am pleased to announce release 2010.3 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. Development, mailing lists, issue tracking: http://sfepy.org Documentation: http://docs.sfepy.org/doc Git repository: http://github.com/sfepy Project page: http://sfepy.kme.zcu.cz Highlights of this release -------------------------- - significantly rewritten code for better interactive use - cleaner and simpler high level interface - new tutorial section: - Interactive Example: Linear Elasticity [1] [1] http://docs.sfepy.org/doc/tutorial.html#interactive-example-linear-elasticity Major improvements ------------------ Apart from many bug-fixes, let us mention: - new examples: - demonstration of the high level interface - new tests: - tests of the new high level interface - simplified but more powerful homogenization engine For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2010.3_RELEASE_NOTES.txt (full release notes, rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladim?r Luke?, Osman, Andre Smit, Logan Sorenson From scopatz at gmail.com Mon Aug 9 15:31:14 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Mon, 9 Aug 2010 14:31:14 -0500 Subject: [SciPy-Dev] Contingency Table Model Message-ID: Hello All, I have just opened a ticket (http://projects.scipy.org/scipy/ticket/1258) that adds a general contingency table class to the the stats package. This class includes methods to slice and collapse the table as well a calculate metrics such as chi-squared and entropy. This implementation came out of Warren Weckesser and me working on this over the SciPy 2010 statistics sprint. Please take a look! Comments and suggestions are always welcome. Be Well, Anthony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Mon Aug 9 16:11:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Aug 2010 16:11:10 -0400 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: Message-ID: On Mon, Aug 9, 2010 at 3:31 PM, Anthony Scopatz wrote: > Hello All, > I have just opened a ticket > (http://projects.scipy.org/scipy/ticket/1258)?that adds a general > contingency table class to the the stats package. ?This class includes > methods to slice and?collapse?the table as well a calculate metrics such as > chi-squared and entropy. > This implementation came out of Warren?Weckesser and me working on this over > the SciPy 2010 statistics sprint. > Please take a look! ?Comments and suggestions are always welcome. just a quick question that I don't understand from a brief look at the source Isn't the core of "from_columns" doing the same quantization as np.histogramdd? ( I haven't looked closely enough yet) If x in from_columns is a tuple, then an array_like could also contain strings, e.g. names/levels of a categorical variable. I'm not sure how far this should go. other ideas methods or functions "from_flat" and "to_flat" would be useful. chi2 could be renamed to chi2_indep, or take an optional expected keyword, where the user could specify other distribution hypotheses. Josef > Be Well, > Anthony > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From bsouthey at gmail.com Mon Aug 9 16:35:34 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Aug 2010 15:35:34 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: Message-ID: <4C606696.6070707@gmail.com> On 08/09/2010 02:31 PM, Anthony Scopatz wrote: > Hello All, > > I have just opened a ticket > (http://projects.scipy.org/scipy/ticket/1258) that adds a general > contingency table class to the the stats package. This class includes > methods to slice and collapse the table as well a calculate metrics > such as chi-squared and entropy. > > This implementation came out of Warren Weckesser and me working on > this over the SciPy 2010 statistics sprint. > > Please take a look! Comments and suggestions are always welcome. > Be Well, > Anthony > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev Some points: 1) You can not use numpy's asarray function without checking the input type. You must be aware of at least masked arrays and Matrix inputs as well as new data types. 2) You can not force a dtype on the user - on line 54 when you can provide optional precision. 3) Can you please clarify lines 112-113? " scipy.stats.chisquare -- one-way chi-square test (which is not the same as the n-way test with n=1)." This needs to be a little more clear because the exact same test statistic is being used. In fact the function must give the correct answer with 1d array. 4) Related to point 3, lines 72-74 are not correct, see http://en.wikipedia.org/wiki/Pearson's_chi-square_test 5) You must allow the user to provide their own expected values 6) Users need to be able to control the output - really I don't want to see the table of expected values unless requested. Also a user might just want the table of expected values and nothing else. 7) You should not need the chi2 function. 8) More generally, what is the need for having an ContingencyTable object? 
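As background for points 3) and 4), a small self-contained sketch of the
computation being debated -- this is not the code from the ticket, just an
illustration of how the expected table and the Pearson chi-square statistic
for a 2-D table are obtained (the counts are made up):

    import numpy as np
    from scipy import stats

    # Observed 2x3 table of counts (hypothetical data).
    obs = np.array([[10., 20., 30.],
                    [15., 25., 35.]])

    # Expected counts under independence: outer product of the marginal
    # totals divided by the grand total.
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

    # Pearson chi-square statistic, degrees of freedom and p-value.
    chi2_stat = ((obs - expected)**2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    p_value = stats.chi2.sf(chi2_stat, dof)

The statistic itself, the sum of (observed - expected)**2 / expected over
all cells, is the same one used by scipy.stats.chisquare for the 1-D case;
only the expected table and the degrees of freedom differ.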
Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Aug 9 16:47:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Aug 2010 16:47:12 -0400 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: <4C606696.6070707@gmail.com> References: <4C606696.6070707@gmail.com> Message-ID: On Mon, Aug 9, 2010 at 4:35 PM, Bruce Southey wrote: > > On 08/09/2010 02:31 PM, Anthony Scopatz wrote: > > Hello All, > I have just opened a ticket > (http://projects.scipy.org/scipy/ticket/1258)?that adds a general > contingency table class to the the stats package. ?This class includes > methods to slice and?collapse?the table as well a calculate metrics such as > chi-squared and entropy. > This implementation came out of Warren?Weckesser and me working on this over > the SciPy 2010 statistics sprint. > Please take a look! ?Comments and suggestions are always welcome. > Be Well, > Anthony > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > Some points: > > 1) You can not use numpy's asarray function without checking the input type. > You must be aware of at least masked arrays and Matrix inputs as well as new > data types. > > 2) You can not force a dtype on the user -? on line 54 when you can provide > optional precision. > > 3) Can you please clarify lines 112-113? > "? scipy.stats.chisquare -- one-way chi-square test (which is not the same > as the n-way test with n=1)." > This needs to be a little more clear because the exact same test statistic > is being used. In fact the function must give the correct answer with 1d > array. > > 4) Related to point 3, lines 72-74 are not correct, see > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > 5) You must allow the user to provide their own expected values > > 6) Users need to be able to control the output - really I don't want to see > the table of expected values unless requested. Also a user might just want > the table of expected values and nothing else. > > 7) You should not need the chi2 function. > > 8) More generally, what is the need for having an ContingencyTable object? maybe some usage examples will be nice. I like the collapse methods, since, I think, it makes it easy to test (for marginal ?) independence along different variables. Similar for slicing to test conditional independence, but I haven't read through the slicing method yet. In the long term it might also be useful to attach other tests for contingency tables for convenience, fisher- exact, kendall tau and other tests that apply. And when numpy gets the labeled array, we can attach labels for the categories. Josef Josef > > > Bruce > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From scopatz at gmail.com Mon Aug 9 17:46:50 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Mon, 9 Aug 2010 16:46:50 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: Message-ID: On Mon, Aug 9, 2010 at 3:11 PM, wrote: > On Mon, Aug 9, 2010 at 3:31 PM, Anthony Scopatz wrote: > > Hello All, > > I have just opened a ticket > > (http://projects.scipy.org/scipy/ticket/1258) that adds a general > > contingency table class to the the stats package. 
This class includes > > methods to slice and collapse the table as well a calculate metrics such > as > > chi-squared and entropy. > > This implementation came out of Warren Weckesser and me working on this > over > > the SciPy 2010 statistics sprint. > > Please take a look! Comments and suggestions are always welcome. > > just a quick question that I don't understand from a brief look at the > source > > Isn't the core of "from_columns" doing the same quantization as > np.histogramdd? ( I haven't looked closely enough yet) > > If x in from_columns is a tuple, then an array_like could also contain > strings, e.g. names/levels of a categorical variable. I'm not sure how > far this should go. > > To kill two birds with one stone, from_columns() and np.histogramdd() do effectively the same thing for continuous variables but specifying bounds and distributions rather than bins. However, from_columns() allows for discrete variables, which as you pointed out can handle categorical, string-based data. See the attached file for an example. (Maybe this method of making histograms should be in numpy?) The reason I with the bounds/dist rather than bin implementation is that bounds/dists are more often what you play around with when exploring the data. other ideas > methods or functions "from_flat" and "to_flat" would be useful. > chi2 could be renamed to chi2_indep, or take an optional expected > keyword, where the user could specify other distribution hypotheses. > > An expected keyword would work well here. It might be a better idea to include such a keyword in __init__() and from_columns(). I'd just need to make sure that the collapse and slice methods propagate this properly. I can also see how "from_flat" and "to_flat" methods would be nice. Be Well Anthony > Josef > > > > Be Well, > > Anthony > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ct_cat.py Type: text/x-python Size: 451 bytes Desc: not available URL: From roberto.bucher at supsi.ch Wed Aug 11 04:40:42 2010 From: roberto.bucher at supsi.ch (Roberto Bucher) Date: Wed, 11 Aug 2010 10:40:42 +0200 Subject: [SciPy-Dev] First contact Message-ID: <201008111040.42635.roberto.bucher@supsi.ch> Hi all I'm new in the mailing list and I'd like to contact people responsible of the signal/ltisys module. I'm working on a porting of all my control functions from Scicoslab under Python and I found some errors(?) in the signal/ltisys module. In addition I added some code to the module for handling state space MIMO systems. The first problem was related with the function "ss2tf" which gives as return numerator a 2 dimensional array, not more usable by the print function of the class. I added these lines at the end of the function: # Avoid leading zeros in num num=num[0] while num[0]==0: num=num[1:] but I don't want to get problems with other modules... Best regards Roberto -- ----------------------------------------------------------------------------- Coltivate Linux! Tanto Windows si pianta da solo... 
-----------------------------------------------------------------------------
University of Applied Sciences of Southern Switzerland
Dept. Innovative Technologies
CH-6928 Lugano-Manno
http://web.dti.supsi.ch/~bucher

From derek at astro.physik.uni-goettingen.de  Wed Aug 11 11:02:59 2010
From: derek at astro.physik.uni-goettingen.de (Derek Homeier)
Date: Wed, 11 Aug 2010 17:02:59 +0200
Subject: [SciPy-Dev] First contact
In-Reply-To: <201008111040.42635.roberto.bucher@supsi.ch>
References: <201008111040.42635.roberto.bucher@supsi.ch>
Message-ID:

Hi Roberto,

welcome to the list!

> The first problem was related with the function "ss2tf" which gives as
> return numerator a 2 dimensional array, not more usable by the print
> function of the class. I added these lines at the end of the function:
>
>     # Avoid leading zeros in num
>     num=num[0]
>     while num[0]==0:
>         num=num[1:]
>
> but I don't want to get problems with other modules...

I must admit I have no idea what this output is typically supposed to look
like, but I note that in the first step you are already replacing a 2D-array
with its first 1D-element, which is probably not desired if you have input
with D.shape[0] > 1.
The print function of which class are you referring to BTW?

For the second part, you could replace the "while" loop with a numpy
operation:

    num = num[num.nonzero()[0][0]:]

HTH,
Derek

From roberto.bucher at supsi.ch  Wed Aug 11 11:55:10 2010
From: roberto.bucher at supsi.ch (Roberto Bucher)
Date: Wed, 11 Aug 2010 17:55:10 +0200
Subject: [SciPy-Dev] First contact
In-Reply-To:
References: <201008111040.42635.roberto.bucher@supsi.ch>
Message-ID: <201008111755.10096.roberto.bucher@supsi.ch>

Thanks Derek

I solved the problem by changing the lines

    # Avoid leading zeros in num
    num=num[0]
    while num[0]==0:
        num=num[1:]

with these lines

    # Avoid leading zeros in num
    [num,den]=normalize(num,den)

Best regards

Roberto

On Wednesday 11 August 2010 17:02:59 Derek Homeier wrote:
> Hi Roberto,
>
> welcome to the list!
>
> > The first problem was related with the function "ss2tf" which gives as
> > return numerator a 2 dimensional array, not more usable by the print
> > function of the class. I added these lines at the end of the function:
> >
> >     # Avoid leading zeros in num
> >     num=num[0]
> >     while num[0]==0:
> >         num=num[1:]
> >
> > but I don't want to get problems with other modules...
>
> I must admit I have no idea what this output is typically supposed to
> look like, but I note that in the first step you are already replacing a
> 2D-array with its first 1D-element, which is probably not desired if you
> have input with D.shape[0] > 1.
> The print function of which class are you referring to BTW?
>
> For the second part, you could replace the "while" loop with a numpy
> operation:
>
>     num = num[num.nonzero()[0][0]:]
>
> HTH,
> Derek

--
-----------------------------------------------------------------------------
Coltivate Linux! Tanto Windows si pianta da solo...
-----------------------------------------------------------------------------
University of Applied Sciences of Southern Switzerland Dept.
Innovative Technologies CH-6928 Lugano-Manno http://web.dti.supsi.ch/~bucher From scopatz at gmail.com Wed Aug 11 15:10:07 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 11 Aug 2010 14:10:07 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: <4C606696.6070707@gmail.com> References: <4C606696.6070707@gmail.com> Message-ID: On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey wrote: > > On 08/09/2010 02:31 PM, Anthony Scopatz wrote: > > Hello All, > > I have just opened a ticket (http://projects.scipy.org/scipy/ticket/1258) that > adds a general contingency table class to the the stats package. This class > includes methods to slice and collapse the table as well a calculate metrics > such as chi-squared and entropy. > > This implementation came out of Warren Weckesser and me working on this > over the SciPy 2010 statistics sprint. > > Please take a look! Comments and suggestions are always welcome. > Be Well, > Anthony > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-dev > > Hello All, I have updated the ticket with new versions of the contingency_table.py and test_contingency_table.py. I also have a github clone of scipy now, if you just want to grab the changes, http://github.com/scopatz/scipy Issues addressed in the new version: 1. Expected tables may now be user-specified, 2. added from_flat() and to_flat() methods, 3. Retooled the chi_square() method and removed the chisquare_nway() function. 4. All table metric methods (entropy) now add the calculated value to the contingency table's attributes as well as returning the value. Bruce, Thank you for your concerns. I'd like to address your points below. > 1) You can not use numpy's asarray function without checking the input > type. You must be aware of at least masked arrays and Matrix inputs as well > as new data types. > > 2) You can not force a dtype on the user - on line 54 when you can provide > optional precision. > These are handled by now allowing the user to specify their own expected table. The expected_nway() function that these to points relate to can now be avoided completely, if desired. > > 3) Can you please clarify lines 112-113? > " scipy.stats.chisquare -- one-way chi-square test (which is not the same > as the n-way test with n=1)." > This needs to be a little more clear because the exact same test statistic > is being used. In fact the function must give the correct answer with 1d > array. > > 4) Related to point 3, lines 72-74 are not correct, see > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > The chisquared_nway() function has been removed, so 3) and 4) no longer apply. > 5) You must allow the user to provide their own expected values > done. > 6) Users need to be able to control the output - really I don't want to > see the table of expected values unless requested. Also a user might just > want the table of expected values and nothing else. > The expected table, much like the probability table or the number of degrees of freedom or the number of dimensions, is not really an output. Rather it is more of an attribute that helps calculate outputs, like the entropy, mutual information, etc. Therefore it should always be included in an instance of ContingencyTable. A user could simply have an array of values that they call a contingency table, but this class provides a tool for easily calculating related metrics (outputs). 7) You should not need the chi2 function. 
> Now required since chisquared_nway() was removed. > 8) More generally, what is the need for having an ContingencyTable object? > Basically, my argument for the need is that contingency tables (or cross tabulations) are expected as standard in any statistics package. R has them, Matlab has them, SPSS has them, Stata has them, and so on. I know that when I came to scipy.stats and found that they weren't here already, I was disappointed. I hope this helps! Be Well Anthony > > > Bruce > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Aug 11 15:37:38 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Aug 2010 15:37:38 -0400 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: <4C606696.6070707@gmail.com> Message-ID: On Wed, Aug 11, 2010 at 3:10 PM, Anthony Scopatz wrote: > > > On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey wrote: >> >> On 08/09/2010 02:31 PM, Anthony Scopatz wrote: >> >> Hello All, >> I have just opened a ticket >> (http://projects.scipy.org/scipy/ticket/1258)?that adds a general >> contingency table class to the the stats package. ?This class includes >> methods to slice and?collapse?the table as well a calculate metrics such as >> chi-squared and entropy. >> This implementation came out of Warren?Weckesser and me working on this >> over the SciPy 2010 statistics sprint. >> Please take a look! ?Comments and suggestions are always welcome. >> Be Well, >> Anthony >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > Hello All, > I have updated the ticket with new versions of the contingency_table.py and > test_contingency_table.py. ?I also have a github clone of scipy now, if you > just want to grab the changes,?http://github.com/scopatz/scipy > Issues addressed in the new version: > > Expected tables may now be user-specified, > added from_flat() and to_flat() methods, a clarification: for from_flat I was thinking about non-rectangular data when a simple reshape doesn't work. something like an nd version of http://mail.scipy.org/pipermail/scipy-dev/2009-March/011592.html for example when the count data are given in a structured array with the corresponding group labels where zero count entries might be missing and which is not necessarily sorted/ordered in the right way for a reshape. but now I think this is also handled by from_columns, where the user specifies the "distribution" as list of (unique) values. (?) (I haven't looked at the other changes) Josef > Retooled the chi_square() method and removed the chisquare_nway() function. > All table metric methods (entropy) now add the calculated value to the > contingency table's attributes as well as returning the value. > > Bruce, Thank you for your concerns. ?I'd like to address your points below. > >> >> 1) You can not use numpy's asarray function without checking the input >> type. You must be aware of at least masked arrays and Matrix inputs as well >> as new data types. >> >> 2) You can not force a dtype on the user -? on line 54 when you can >> provide optional precision. > > These are handled by now allowing the user to specify their own expected > table. 
?The expected_nway() function that these to points relate to can now > be avoided?completely, if desired. > >> >> 3) Can you please clarify lines 112-113? >> "? scipy.stats.chisquare -- one-way chi-square test (which is not the same >> as the n-way test with n=1)." >> This needs to be a little more clear because the exact same test statistic >> is being used. In fact the function must give the correct answer with 1d >> array. >> >> 4) Related to point 3, lines 72-74 are not correct, see >> http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > The chisquared_nway() function has been removed, so 3) and 4) no longer > apply. > >> >> 5) You must allow the user to provide their own expected values > > > done. > >> >> 6) Users need to be able to control the output - really I don't want to >> see the table of expected values unless requested. Also a user might just >> want the table of expected values and nothing else. > > The expected table, much like the probability table or the number of degrees > of freedom or the number of dimensions, is not really an output. ?Rather it > is more of an attribute that helps calculate outputs, like the entropy, > mutual information, etc. ?Therefore it should always be included in an > instance of ContingencyTable. ?A user could simply have an array of values > that they call a contingency table, but this class provides a tool for > easily calculating related metrics (outputs). >> >> 7) You should not need the chi2 function. > > Now required since?chisquared_nway() was removed. > >> >> 8) More generally, what is the need for having an ContingencyTable object? > > Basically, my argument for the need is that contingency tables (or cross > tabulations) are expected as standard in any statistics package. ?R has > them, Matlab has them, SPSS has them, Stata has them, and so on. ?I know > that when I came to scipy.stats and found that they weren't here already, I > was disappointed. > I hope this helps! > Be Well > Anthony > >> >> Bruce >> >> >> >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From bsouthey at gmail.com Wed Aug 11 16:04:36 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Aug 2010 15:04:36 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: <4C606696.6070707@gmail.com> Message-ID: <4C630254.2040805@gmail.com> On 08/11/2010 02:10 PM, Anthony Scopatz wrote: > > > On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey > wrote: > > > On 08/09/2010 02:31 PM, Anthony Scopatz wrote: >> Hello All, >> >> I have just opened a ticket >> (http://projects.scipy.org/scipy/ticket/1258) that adds a general >> contingency table class to the the stats package. This class >> includes methods to slice and collapse the table as well a >> calculate metrics such as chi-squared and entropy. >> >> This implementation came out of Warren Weckesser and me working >> on this over the SciPy 2010 statistics sprint. >> >> Please take a look! Comments and suggestions are always welcome. 
>> Be Well, >> Anthony >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > Hello All, > > I have updated the ticket with new versions of the > contingency_table.py and test_contingency_table.py. I also have a > github clone of scipy now, if you just want to grab the changes, > http://github.com/scopatz/scipy > > Issues addressed in the new version: > > 1. Expected tables may now be user-specified, > 2. added from_flat() and to_flat() methods, > 3. Retooled the chi_square() method and removed the > chisquare_nway() function. > 4. All table metric methods (entropy) now add the calculated value > to the contingency table's attributes as well as returning the > value. > > Bruce, Thank you for your concerns. I'd like to address your points > below. > > 1) You can not use numpy's asarray function without checking the > input type. You must be aware of at least masked arrays and Matrix > inputs as well as new data types. > > 2) You can not force a dtype on the user - on line 54 when you > can provide optional precision. > > > These are handled by now allowing the user to specify their own > expected table. The expected_nway() function that these to points > relate to can now be avoided completely, if desired. > > > 3) Can you please clarify lines 112-113? > " scipy.stats.chisquare -- one-way chi-square test (which is not > the same > as the n-way test with n=1)." > This needs to be a little more clear because the exact same test > statistic is being used. In fact the function must give the > correct answer with 1d array. > > 4) Related to point 3, lines 72-74 are not correct, see > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > > > The chisquared_nway() function has been removed, so 3) and 4) no > longer apply. > > 5) You must allow the user to provide their own expected values > > done. > > 6) Users need to be able to control the output - really I don't > want to see the table of expected values unless requested. Also a > user might just want the table of expected values and nothing else. > > > The expected table, much like the probability table or the number of > degrees of freedom or the number of dimensions, is not really an > output. Rather it is more of an attribute that helps calculate > outputs, like the entropy, mutual information, etc. Therefore it > should always be included in an instance of ContingencyTable. A user > could simply have an array of values that they call a contingency > table, but this class provides a tool for easily calculating related > metrics (outputs). > > 7) You should not need the chi2 function. > > > Now required since chisquared_nway() was removed. > > 8) More generally, what is the need for having an ContingencyTable > object? > > > Basically, my argument for the need is that contingency tables (or > cross tabulations) are expected as standard in any statistics package. > R has them, Matlab has them, SPSS has them, Stata has them, and so > on. I know that when I came to scipy.stats and found that they > weren't here already, I was disappointed. > > I hope this helps! 
> > Be Well > Anthony > > > > Bruce > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev I am very aware that this type of functionality is available in multiple applications so that was never my concern. But you have failed to address my concerns nor addressed the the questions about why it is needed in this form. An important issue is why we need this code when it was pointed out the similarity to numpy's histogram functions. At some stage we have to say no to code bloat. Note, as a class then everything must be self-contained - both _margins and expected_nway have little point outside your class. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Tue Aug 17 19:08:35 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 17 Aug 2010 16:08:35 -0700 Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/ Message-ID: Hello, This email was prompted by a blog post from William Stein: http://sagemath.blogspot.com/2010/08/overflow.html Is http://ask.scipy.org the official site at this point? What is the plan for http://advice.mechanicalkern.com/? Thanks, Jarrod From robert.kern at gmail.com Tue Aug 17 19:22:23 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Aug 2010 18:22:23 -0500 Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/ In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 18:08, Jarrod Millman wrote: > Hello, > > This email was prompted by a blog post from William Stein: > ?http://sagemath.blogspot.com/2010/08/overflow.html > > Is http://ask.scipy.org the official site at this point? Might as well be. >?What is the > plan for http://advice.mechanicalkern.com/? David Warde-Farley has manually moved over most, if not all of the questions and answers. I have made a note on the front page there to point to ask.scipy.org. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From dwf at cs.toronto.edu Wed Aug 18 17:33:19 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 18 Aug 2010 17:33:19 -0400 Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/ In-Reply-To: References: Message-ID: <9F509279-95DF-443E-973C-549BDE3D128E@cs.toronto.edu> On 2010-08-17, at 7:22 PM, Robert Kern wrote: > David Warde-Farley has manually moved over most, if not all of the > questions and answers. I have made a note on the front page there to > point to ask.scipy.org. I think it's all, at this point, at least the chosen answers. I'll do a quick check later tonight for anything I missed. Robert, would it be hard to disable logins/signups as well? Do you think this is a good idea? David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From robert.kern at gmail.com Wed Aug 18 17:35:17 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 18 Aug 2010 16:35:17 -0500
Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/
In-Reply-To: <9F509279-95DF-443E-973C-549BDE3D128E@cs.toronto.edu>
References: <9F509279-95DF-443E-973C-549BDE3D128E@cs.toronto.edu>
Message-ID: 

On Wed, Aug 18, 2010 at 16:33, David Warde-Farley wrote:
> Robert, would it be hard to disable logins/signups as well?

Oh, probably. I might be able to just remove the links though...

> Do you think
> this is a good idea?

Of course.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From tmp50 at ukr.net Fri Aug 20 14:52:09 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Fri, 20 Aug 2010 21:52:09 +0300
Subject: [SciPy-Dev] question to scikits.appspot.com editors
Message-ID: 

hi all,
who is responsible for scikits.appspot.com editing? I see some scikits
(e.g. learn) has mentioned their personal website
(http://scikit-learn.sourceforge.net), while openopt entry still points
to deprecated location that is out of maintenance for several years;
same for svn root mentioned there. Could you either fix it to modern
locations (http://openopt.org, svn://openopt.org/PythonPackages/OpenOpt)
or, at least, remove it at all to prevent misleading of users?
Regards, D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tmp50 at ukr.net Fri Aug 20 14:54:56 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Fri, 20 Aug 2010 21:54:56 +0300
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
Message-ID: 

Hi all,
could you consider creating special mail list "build/install numpy/scipy
issues" or something like that? I guess lots of people (as well as I) are
not interested in reading all those huge amounts of messages related to
these issues.
Regards, D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Fri Aug 20 15:01:26 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 20 Aug 2010 14:01:26 -0500
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
References: 
Message-ID: 

2010/8/20 Dmitrey :
> Hi all,
> could you consider creating special mail list "build/install numpy/scipy
> issues" or something like that?

Yes, I will consider it. Having considered it, no I do not think it
would be a good idea to split up the mailing lists even more.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From tmp50 at ukr.net Fri Aug 20 18:30:57 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Sat, 21 Aug 2010 01:30:57 +0300
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
Message-ID: 

Could also the following idea be considered: create numpy/scipy announce
mail list with soft releases and other important info?
D.

--- Original message ---
From: "Robert Kern" 
To: "SciPy Developers List" 
Date: 20 August, 22:01:26
Subject: Re: [SciPy-Dev] could you consider creating "build/install issues" mail list?

2010/8/20 Dmitrey :
> Hi all,
> could you consider creating special mail list "build/install numpy/scipy
> issues" or something like that?

Yes, I will consider it. Having considered it, no I do not think it
would be a good idea to split up the mailing lists even more.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Fri Aug 20 18:37:37 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 20 Aug 2010 17:37:37 -0500
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
References: 
Message-ID: 

2010/8/20 Dmitrey :
> Could also the following idea be considered: create numpy/scipy announce
> mail list with soft releases and other important info?

Release announcements get sent to python-announce:

http://mail.python.org/mailman/listinfo/python-announce-list

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From ralf.gommers at googlemail.com Fri Aug 20 22:03:01 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 21 Aug 2010 10:03:01 +0800
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 21, 2010 at 6:37 AM, Robert Kern wrote:

> 2010/8/20 Dmitrey :
> > Could also the following idea be considered: create numpy/scipy announce
> > mail list with soft releases and other important info?
>
> Release announcements get sent to python-announce:
>
> http://mail.python.org/mailman/listinfo/python-announce-list
>
Actually, the last releases weren't announced there since I wasn't aware of
this. Will do from now on.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tmp50 at ukr.net Sat Aug 21 04:04:54 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Sat, 21 Aug 2010 11:04:54 +0300
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
Message-ID: 

Well, thus I guess scipy-user rss subscription is beyond my needs hence I
cease it as I had done with numpy-user.
Regards, D.

--- Original message ---
From: "Robert Kern" 
To: "SciPy Developers List" 
Date: 21 August, 01:37:37
Subject: Re: [SciPy-Dev] could you consider creating "build/install issues" mail list?

2010/8/20 Dmitrey :
> Could also the following idea be considered: create numpy/scipy announce
> mail list with soft releases and other important info?

Release announcements get sent to python-announce:

http://mail.python.org/mailman/listinfo/python-announce-list

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From roberto.bucher at supsi.ch Sat Aug 21 14:00:01 2010 From: roberto.bucher at supsi.ch (Roberto Bucher) Date: Sat, 21 Aug 2010 20:00:01 +0200 Subject: [SciPy-Dev] ltisys.py Message-ID: <201008212000.01702.roberto.bucher@supsi.ch> I'm working on a control system toolbox for python. In particular I've modified some functions of the control system toolbox developed by Richard Murray and I added some new functions. The first problem is related to the modul ltisys.py that I modified for handling: - MIMO systems (state-space only) - Sampling Time for discrete time system I've done other modifications to different modul of Richard, who is in CC: - matlab.py - statesp.py - xferfcn.py in order to implement the following new functions in my yottalab.py modul: - c2d (zoh+bilinear) - d2c (zoh+bilinear) - dare - discrete riccati solution - care - continous riccati solution - dlqr - discrete linear quadratic regulator - ctrb - controllability matrix - acker - ackerman pole placement - minreal - minimal state space representation - dcgain steady state gain for both continous and discrete time systems - tf (casting function) - ss (casting function) The modified files can be downloaded from my homepage in the python section, but I want to see how I can contribute to put the modifications in the main Scipy distribution. Best regards Roberto -- ----------------------------------------------------------------------------- Coltivate Linux! Tanto Windows si pianta da solo... ----------------------------------------------------------------------------- University of Applied Sciences of Southern Switzerland Dept. Innovative Technologies CH-6928 Lugano-Manno http://web.dti.supsi.ch/~bucher From pav at iki.fi Sat Aug 21 15:00:40 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 21 Aug 2010 19:00:40 +0000 (UTC) Subject: [SciPy-Dev] question to scikits.appspot.com editors References: Message-ID: Fri, 20 Aug 2010 21:52:09 +0300, Dmitrey wrote: > who is responsible for scikits.appspot.com editing? I see some scikits > (e.g. learn) has mentioned their personal website ( > http://scikit-learn.sourceforge.net), while openopt entry still points > to deprecated location that is out of maintanance for several years; > same for svn root mentioned there. Could you either fix it to modern > locations (http://openopt.org, svn://openopt.org/PythonPackages/OpenOpt) > or, at least, remove it at all to prevent misleading of users? It's semi-automatic -- as far as I understands it looks what's in scikits SVN and PyPi, and works from that. Anyway, the people who know how it works are probably Stefan van der Walt, and somebody else. BTW, does the openopt stuff in scikits SVN still serve a purpose? I think we should just remove it if it's not used -- it can be still recovered from the history even after that, if there was something useful there. -- Pauli Virtanen From josef.pktd at gmail.com Sat Aug 21 15:09:36 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Aug 2010 15:09:36 -0400 Subject: [SciPy-Dev] ltisys.py In-Reply-To: <201008212000.01702.roberto.bucher@supsi.ch> References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: On Sat, Aug 21, 2010 at 2:00 PM, Roberto Bucher wrote: > I'm working on a control system toolbox for python. In particular I've modified > some functions of the control system toolbox developed by Richard Murray and I > added some new functions. 
?The first problem is related to the modul ltisys.py > that I modified for handling: > - MIMO systems (state-space only) > - Sampling Time for discrete time system > > I've done other modifications to different modul of Richard, who is in CC: > - matlab.py > - statesp.py > - xferfcn.py > ?in order to implement the following new functions in my yottalab.py modul: > - c2d (zoh+bilinear) > - d2c (zoh+bilinear) > - dare - discrete riccati solution > - care - continous riccati solution > - dlqr - discrete linear quadratic regulator > - ctrb - controllability matrix > - acker - ackerman pole placement > - minreal - minimal state space representation > - dcgain steady state gain for both continous and discrete time systems > - tf (casting function) > - ss (casting function) > > The modified files can be downloaded from my homepage > in the python section, but I want to see how I can contribute to put the > modifications in the main Scipy distribution. I would find it very good if such enhancements go into scipy ltisys. It was often requested, and I think we will be able to use them also for time series analysis. I stopped looking at it after I figured out ltisys only handled single input, mainly continuous time processes. And I hope someone finds the time soon to review this (but unfortunately I don't have any time at all right now). A few questions: Do you have your changes under version control? It would make it easier to produce a diff and look at the changes. Do you have examples that could be converted into tests? Licensing: scipy and python-control are BSD your yottalab.py doesn't have a license statement, but it has to be GPL by infection, because of "from slycot import sb02od, tb03ad" since slycot is GPL Would it be useful to license your parts of yottalab that don't rely on slycot as BSD? e.g. if a replacement for slycot could be found. or maybe python control also moves to GPL with slycot integration ? http://sourceforge.net/apps/mediawiki/python-control/index.php?title=Developer_assignments Thanks, Josef > > Best regards > > Roberto > > -- > ----------------------------------------------------------------------------- > Coltivate Linux! Tanto Windows si pianta da solo... > ----------------------------------------------------------------------------- > University of Applied Sciences of Southern Switzerland > Dept. Innovative Technologies > CH-6928 Lugano-Manno > http://web.dti.supsi.ch/~bucher > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From roberto.bucher at supsi.ch Sun Aug 22 02:22:57 2010 From: roberto.bucher at supsi.ch (Roberto Bucher) Date: Sun, 22 Aug 2010 08:22:57 +0200 Subject: [SciPy-Dev] ltisys.py In-Reply-To: References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: <201008220822.57787.roberto.bucher@supsi.ch> Thanks Josef for your quick answer. It's not a problem for me to work under version control. I'm already one of the developper of Linux RTAI and in particular I've modified the Scicoslab code generator in order to create code for RT systems (RTAI+dsPIC). I'm used to work with CVS, GIT and SVN tools. Of course my code will be released completely under the BSD licence. It is still in development, because I'm still looking for Matlab FOSS replacements. Till now, I've worked with Scicoslab, but Python seems to be a good alternative too. 
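To make the zero-order-hold conversion in the function list above (c2d) concrete, here is a minimal sketch using the standard matrix-exponential trick. It is not the implementation from yottalab.py or python-control; the name c2d_zoh and the double-integrator example are only for illustration.

import numpy as np
from scipy.linalg import expm

def c2d_zoh(A, B, Ts):
    # Zero-order-hold discretization of x' = A x + B u with sample time Ts.
    # expm([[A, B], [0, 0]] * Ts) contains Ad (upper left) and Bd (upper right).
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    n, m = A.shape[0], B.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm(M * Ts)
    return E[:n, :n], E[:n, n:]   # Ad, Bd

# double integrator sampled at 0.1 s
Ad, Bd = c2d_zoh([[0., 1.], [0., 0.]], [[0.], [1.]], 0.1)

A bilinear (Tustin) variant would instead be built from (I - A*Ts/2) and (I + A*Ts/2) factors; the MIMO and sampling-time bookkeeping around such conversions is what the ltisys changes discussed above are about.
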
At present, the main problem with the control system toolbox under Python is that it is not ready for practical applications: in particular it is not able to handle with discrete time systems... I'll put ASAP two examples on my homepage: - DC motor with state feedback controller, integrator + reduced order observer - Inverted pendulum, with LQR state feedback + reduced order observer My first goal was to demonstrate the possibility to reproduce all my Scicoslab systems under Python, and I reached it. Best regards Roberto On Saturday 21 August 2010 21:09:36 josef.pktd at gmail.com wrote: > On Sat, Aug 21, 2010 at 2:00 PM, Roberto Bucher wrote: > > I'm working on a control system toolbox for python. In particular I've > > modified some functions of the control system toolbox developed by > > Richard Murray and I added some new functions. The first problem is > > related to the modul ltisys.py that I modified for handling: > > - MIMO systems (state-space only) > > - Sampling Time for discrete time system > > > > I've done other modifications to different modul of Richard, who is in > > CC: - matlab.py > > - statesp.py > > - xferfcn.py > > in order to implement the following new functions in my yottalab.py > > modul: - c2d (zoh+bilinear) > > - d2c (zoh+bilinear) > > - dare - discrete riccati solution > > - care - continous riccati solution > > - dlqr - discrete linear quadratic regulator > > - ctrb - controllability matrix > > - acker - ackerman pole placement > > - minreal - minimal state space representation > > - dcgain steady state gain for both continous and discrete time systems > > - tf (casting function) > > - ss (casting function) > > > > The modified files can be downloaded from my homepage > > in the python section, but I want to see how I can contribute to put the > > modifications in the main Scipy distribution. > > I would find it very good if such enhancements go into scipy ltisys. > It was often requested, and I think we will be able to use them also > for time series analysis. I stopped looking at it after I figured out > ltisys only handled single input, mainly continuous time processes. > And I hope someone finds the time soon to review this (but > unfortunately I don't have any time at all right now). > > > A few questions: > > Do you have your changes under version control? It would make it > easier to produce a diff and look at the changes. > > Do you have examples that could be converted into tests? > > Licensing: > scipy and python-control are BSD > your yottalab.py doesn't have a license statement, but it has to be > GPL by infection, because of > "from slycot import sb02od, tb03ad" since slycot is GPL > > Would it be useful to license your parts of yottalab that don't rely > on slycot as BSD? > e.g. if a replacement for slycot could be found. > > or maybe python control also moves to GPL with slycot integration ? > http://sourceforge.net/apps/mediawiki/python-control/index.php?title=Develo > per_assignments > > Thanks, > > Josef > > > Best regards > > > > Roberto > > > > -- > > ------------------------------------------------------------------------- > > ---- Coltivate Linux! Tanto Windows si pianta da solo... > > ------------------------------------------------------------------------- > > ---- University of Applied Sciences of Southern Switzerland > > Dept. 
Innovative Technologies > > CH-6928 Lugano-Manno > > http://web.dti.supsi.ch/~bucher > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev -- ----------------------------------------------------------------------------- Coltivate Linux! Tanto Windows si pianta da solo... ----------------------------------------------------------------------------- University of Applied Sciences of Southern Switzerland Dept. Innovative Technologies CH-6928 Lugano-Manno http://web.dti.supsi.ch/~bucher From strawman at astraw.com Sun Aug 22 12:57:25 2010 From: strawman at astraw.com (Andrew Straw) Date: Sun, 22 Aug 2010 09:57:25 -0700 Subject: [SciPy-Dev] ltisys.py In-Reply-To: References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: <4C7156F5.7020305@astraw.com> josef.pktd at gmail.com wrote: > Licensing: > scipy and python-control are BSD > your yottalab.py doesn't have a license statement, but it has to be > GPL by infection, because of > "from slycot import sb02od, tb03ad" since slycot is GPL > > Would it be useful to license your parts of yottalab that don't rely > on slycot as BSD? > e.g. if a replacement for slycot could be found. > > or maybe python control also moves to GPL with slycot integration ? > http://sourceforge.net/apps/mediawiki/python-control/index.php?title=Developer_assignments > Alternatively, one could ask the slycot/SLICOT authors if they would relicense their work as BSD so that it can be included in scipy. Historically, several authors have agreed to change GPL code to BSD when asked nicely with this given as the reason. -Andrew From stefan at sun.ac.za Mon Aug 23 04:37:01 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 23 Aug 2010 10:37:01 +0200 Subject: [SciPy-Dev] question to scikits.appspot.com editors In-Reply-To: References: Message-ID: On 21 August 2010 21:00, Pauli Virtanen wrote: > It's semi-automatic -- as far as I understands it looks what's in scikits > SVN and PyPi, and works from that. That's correct; but we could probably switch off the SVN scanning. Since "scikits.openopt" is not found on PyPi, it assumes that the SVN version provides the latest info. Regards St?fan From oliphant at enthought.com Mon Aug 23 15:33:54 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 23 Aug 2010 14:33:54 -0500 Subject: [SciPy-Dev] ltisys.py In-Reply-To: <201008212000.01702.roberto.bucher@supsi.ch> References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: On Aug 21, 2010, at 1:00 PM, Roberto Bucher wrote: > I'm working on a control system toolbox for python. In particular I've modified > some functions of the control system toolbox developed by Richard Murray and I > added some new functions. 
The first problem is related to the modul ltisys.py > that I modified for handling: > - MIMO systems (state-space only) > - Sampling Time for discrete time system > > I've done other modifications to different modul of Richard, who is in CC: > - matlab.py > - statesp.py > - xferfcn.py > in order to implement the following new functions in my yottalab.py modul: > - c2d (zoh+bilinear) > - d2c (zoh+bilinear) > - dare - discrete riccati solution > - care - continous riccati solution > - dlqr - discrete linear quadratic regulator > - ctrb - controllability matrix > - acker - ackerman pole placement > - minreal - minimal state space representation > - dcgain steady state gain for both continous and discrete time systems > - tf (casting function) > - ss (casting function) > > The modified files can be downloaded from my homepage > in the python section, but I want to see how I can contribute to put the > modifications in the main Scipy distribution. These would make great additions to SciPy. We are moving to a more distributed development model which should help you be able to make version-controlled changes to these files as part of SciPy. As soon as we make progress in that direction, I can review your changes and get them into SciPy. -Travis From hardbyte at gmail.com Tue Aug 24 20:35:56 2010 From: hardbyte at gmail.com (Brian Thorne) Date: Wed, 25 Aug 2010 12:35:56 +1200 Subject: [SciPy-Dev] Getting Scipy's weave to work reliably on Windows In-Reply-To: References: Message-ID: Hi Chris, Have you tried patching python's distutils with the patch at ( http://bugs.python.org/issue4508)? If so, does the patch fix both cases of spaces in the path? It appears to me that one of the compiler files in the python source is not correctly quoting of spaces in intermediate files. Cheers, Brian On 18 July 2010 03:49, Chris Ball wrote: > Hi, > > While testing Scipy's weave on several different Windows installations, I > came > across some problems with spaces in paths that often prevent weave from > working. > I can see a change that could probably get weave working on most Windows > installations, but it is a quick hack. Someone knowledgeable about > distutils > (and numpy.distutils?) might be able to help me fix this properly. Below I > describe three common problems with weave on Windows, in the hope that this > information helps others, or allows someone to suggest how to fix the > spaces-in- > paths problem properly. > > I think there are three common problems that stop weave from working on > Windows. > The first is not having a C compiler. Both Python(x,y) and EPD provide a C > compiler that seems to work fine, which is great! > > The second problem is that if weave is installed to a location with a space > in > the path, linking fails. There is already a scipy bug report about this > (http://projects.scipy.org/scipy/ticket/809). I've just commented on that > report, saying the problem appears to be with distutils, and there is > already a > Python bug report about it (http://bugs.python.org/issue4508). Maybe > someone > could close this scipy bug, or link it to the Python one somehow? In any > case, > when using Python(x,y) or EPD, this bug will not show up if the default > installation locations are accepted. So, that's also good news! > > The third problem is that if the Windows user name has a space in it (which > in > my experience is quite common), compilation fails. Weave uses the user name > to > create a path for its "intermediate" and "compiled" files. 
When the > compilation > command is issued, the path with the space in it is also not quoted. > Presumably > that is another error in distutils (or numpy.distutils)? Unfortunately I > wasn't > able to pinpoint what function is failing to quote strings properly, > because I > couldn't figure out the chain that leads to the compiler being called. > However, > I can avoid the problem by removing spaces from the user name in weave > itself > (catalog.py): > > def whoami(): > """return a string identifying the user.""" > return (os.environ.get("USER") or os.environ.get("USERNAME") or > "unknown").replace(" ","") > > (where I have added .replace(" ","") to the existing code). > > I realize this isn't the right solution, so if someone could help to guide > me to > the point where quoting should occur, that would be very helpful. > Otherwise, is > there any chance of applying a hack like this so weave can work reliably on > Windows? > > Thanks, > Chris > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdh2358 at gmail.com Wed Aug 25 10:00:00 2010 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 25 Aug 2010 09:00:00 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe Message-ID: Suppose I have an ordered list/array of numbers, and I want to split them into N chunks, such that the intersection of any chunk with each other is empty and the data is split as evenly as possible (eg the std dev of the lengths of the chunks is minimized or some other such criterion). Context: I am trying to do a quintile analysis on some data, and np.percentile doesn't behave like I want because more than 20% of my data equals 1, so 1 is in the first and second quintiles. I want to avoid this -- I'd rather have uneven counts in my quintiles than have the same value show up in multiple quintiles, but I'd like the counts to be as even as possible.. 
Here is some sample code that illustrates my problem: In [178]: run ~/test tile i=1 range=[1.00, 1.00), count=0 tile i=2 range=[1.00, 3.00), count=79 tile i=3 range=[3.00, 4.60), count=42 tile i=4 range=[4.60, 11.00), count=39 tile i=5 range=[11.00, 43.00), count=41 import numpy as np x = np.array([ 2., 3., 4., 5., 1., 2., 1., 1., 1., 2., 3., 1., 2., 3., 1., 2., 3., 1., 2., 3., 4., 1., 1., 2., 3., 2., 2., 3., 4., 5., 1., 2., 3., 4., 5., 6., 7., 1., 1., 2., 3., 4., 5., 6., 7., 1., 2., 3., 1., 2., 1., 2., 3., 1., 2., 4., 1., 2., 1., 2., 3., 4., 5., 6., 1., 2., 3., 1., 1., 1., 1., 1., 1., 2., 1., 2., 3., 1., 2., 3., 1., 1., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 1., 1., 2., 3., 1., 2., 3., 4., 5., 6., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 1., 2., 3., 4., 5., 6., 7., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 1., 2., 1., 2., 3., 4., 5., 6., 1., 2., 3., 4., 5., 6., 1., 2., 3., 4., 1., 2., 3., 4., 5., 6., 1., 2., 3., 1., 2., 1., 2.]) tiles = np.percentile(x, (0, 20, 40, 60, 80, 100)) print for i in range(1, len(tiles)): xmin, xmax = tiles[i-1], tiles[i] print 'tile i=%d range=[%.2f, %.2f), count=%d'%(i, xmin, xmax, ((x>=xmin) & (x References: Message-ID: On Wed, Aug 25, 2010 at 7:00 AM, John Hunter wrote: > Suppose I have an ordered list/array of numbers, and I want to split > them into N chunks, such that the intersection of any chunk with each > other is empty and the data is split as evenly as possible (eg the std > dev of the lengths of the chunks is minimized or some other such > criterion). How about using the percentiles of np.unique(x)? That takes care of the first constraint (no overlap) but ignores the second constraint (min std of cluster size). From jdh2358 at gmail.com Wed Aug 25 10:19:05 2010 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 25 Aug 2010 09:19:05 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: > How about using the percentiles of np.unique(x)? That takes care of > the first constraint (no overlap) but ignores the second constraint > (min std of cluster size). Well, I need the 2nd constraint.... JDH From kwgoodman at gmail.com Wed Aug 25 10:32:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 25 Aug 2010 07:32:24 -0700 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 7:19 AM, John Hunter wrote: > On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: > >> How about using the percentiles of np.unique(x)? That takes care of >> the first constraint (no overlap) but ignores the second constraint >> (min std of cluster size). > > Well, I need the 2nd constraint.... Both can't be hard constraints, so I guess the first step is to define a utility function that quantifies the trade off between the two. Would it make sense to then start from the percentile(unique(x), ...) solution and come up with a heuristic that moves an item with lots of repeats in a large length quintile to a short lenght quintile and then accept the moves if it improves the utility? Or try moving each item to each of the other 4 quintiles and do the move the improves the utility the most. Then repeat until the utility doesn't improve. 
But I guess I'm just stating the obvious and you are looking for something less obvious and more clever. From josef.pktd at gmail.com Wed Aug 25 10:44:49 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Aug 2010 10:44:49 -0400 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 10:32 AM, Keith Goodman wrote: > On Wed, Aug 25, 2010 at 7:19 AM, John Hunter wrote: >> On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: >> >>> How about using the percentiles of np.unique(x)? That takes care of >>> the first constraint (no overlap) but ignores the second constraint >>> (min std of cluster size). >> >> Well, I need the 2nd constraint.... > > Both can't be hard constraints, so I guess the first step is to define > a utility function that quantifies the trade off between the two. > Would it make sense to then start from the percentile(unique(x), ...) > solution and come up with a heuristic that moves an item with lots of > repeats in a large length quintile to a short lenght quintile and then > accept the moves if it improves the utility? Or try moving each item > to each of the other 4 quintiles and do the move the improves the > utility the most. Then repeat until the utility doesn't improve. But I > guess I'm just stating the obvious and you are looking for something > less obvious and more clever. > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > What I'm doing for some statistical analysis, e.g. chisquare test with integer data (discrete random variable)? np.bincount to get the full count, or use theoretical pdf, then loop over the integers (raw bins) and merge them to satisfy the constraints. constraints that I'm using are equal binsizes in one version and minimum binsizes in the second version. I haven't found anything else than the loop over the uniques, but I think there was some discussion on this some time ago on a mailing list. Josef From jswhit at fastmail.fm Wed Aug 25 10:44:51 2010 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Wed, 25 Aug 2010 08:44:51 -0600 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: <4C752C63.2070102@fastmail.fm> On 8/25/10 8:00 AM, John Hunter wrote: > Suppose I have an ordered list/array of numbers, and I want to split > them into N chunks, such that the intersection of any chunk with each > other is empty and the data is split as evenly as possible (eg the std > dev of the lengths of the chunks is minimized or some other such > criterion). Context: I am trying to do a quintile analysis on some > data, and np.percentile doesn't behave like I want because more than > 20% of my data equals 1, so 1 is in the first and second quintiles. > I want to avoid this -- I'd rather have uneven counts in my quintiles > than have the same value show up in multiple quintiles, but I'd like > the counts to be as even as possible.. > > Here is some sample code that illustrates my problem: > > .... John: This is a problem we have quite often analyzing precip data in arid regions - most of the time it just doesn't rain so the distribution has a delta function peak at zero. There is no good way around it. Sometimes people split up the sample into rain and no-rain, and treat the two distributions separately. -Jeff -- Jeffrey S. 
Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/PSD R/PSD1 Email : Jeffrey.S.Whitaker at noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-113 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg From ben.root at ou.edu Wed Aug 25 11:24:06 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 25 Aug 2010 10:24:06 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 9:00 AM, John Hunter wrote: > Suppose I have an ordered list/array of numbers, and I want to split > them into N chunks, such that the intersection of any chunk with each > other is empty and the data is split as evenly as possible (eg the std > dev of the lengths of the chunks is minimized or some other such > criterion). Context: I am trying to do a quintile analysis on some > data, and np.percentile doesn't behave like I want because more than > 20% of my data equals 1, so 1 is in the first and second quintiles. > I want to avoid this -- I'd rather have uneven counts in my quintiles > than have the same value show up in multiple quintiles, but I'd like > the counts to be as even as possible.. > > Here is some sample code that illustrates my problem: > > In [178]: run ~/test > > tile i=1 range=[1.00, 1.00), count=0 > tile i=2 range=[1.00, 3.00), count=79 > tile i=3 range=[3.00, 4.60), count=42 > tile i=4 range=[4.60, 11.00), count=39 > tile i=5 range=[11.00, 43.00), count=41 > > > import numpy as np > > x = np.array([ 2., 3., 4., 5., 1., 2., 1., 1., 1., 2., > 3., > 1., 2., 3., 1., 2., 3., 1., 2., 3., 4., 1., > 1., 2., 3., 2., 2., 3., 4., 5., 1., 2., 3., > 4., 5., 6., 7., 1., 1., 2., 3., 4., 5., 6., > 7., 1., 2., 3., 1., 2., 1., 2., 3., 1., 2., > 4., 1., 2., 1., 2., 3., 4., 5., 6., 1., 2., > 3., 1., 1., 1., 1., 1., 1., 2., 1., 2., 3., > 1., 2., 3., 1., 1., 1., 2., 3., 4., 5., 6., > 7., 8., 9., 10., 1., 1., 2., 3., 1., 2., 3., > 4., 5., 6., 1., 2., 3., 4., 5., 6., 7., 8., > 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., > 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., > 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., > 42., 43., 1., 2., 3., 4., 5., 6., 7., 1., 2., > 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., > 14., 15., 16., 17., 18., 19., 1., 2., 1., 2., 3., > 4., 5., 6., 1., 2., 3., 4., 5., 6., 1., 2., > 3., 4., 1., 2., 3., 4., 5., 6., 1., 2., 3., > 1., 2., 1., 2.]) > > > tiles = np.percentile(x, (0, 20, 40, 60, 80, 100)) > > print > for i in range(1, len(tiles)): > xmin, xmax = tiles[i-1], tiles[i] > print 'tile i=%d range=[%.2f, %.2f), count=%d'%(i, xmin, xmax, > ((x>=xmin) & (x Just a crazy thought, but maybe kmeans clustering might be what you are looking for? If you know ahead of time the number of bins you want, you can let kmeans try and group things automatically. The ones will all fall into the same membership (and any other duplicated values will, too). If you sort the data first, then the behavior should be consistent. I once used kmeans to help "snap" height data from multiple observations together onto a common set of heights. The obs would have many zero height values, but then the rest of the values would not have many repeated values. This approach worked great in our particular application. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
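A rough sketch of the kmeans idea suggested above, using scipy.cluster.vq (this is not Ben Root's code; the toy data, the choice of kmeans2 and the 'points' initialization are only for illustration). Because identical observations are always assigned to the same centroid, no value ends up split across two groups; the group sizes are then only as even as the ties allow.

import numpy as np
from scipy.cluster.vq import kmeans2

# toy data standing in for the array above: many repeated small values
x = np.array([1., 1., 1., 1., 2., 2., 2., 3., 3., 4., 5., 7., 11., 20., 43.])

np.random.seed(0)                      # kmeans2 starts from randomly chosen points
centroids, labels = kmeans2(x[:, np.newaxis], 5, minit='points')

# kmeans2 may warn about empty clusters when duplicate seed points collide;
# the size check below simply skips them
for k in np.argsort(centroids.ravel()):
    members = x[labels == k]
    if members.size:
        print 'group: range=[%.1f, %.1f], count=%d' % (
            members.min(), members.max(), members.size)
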
URL: From bsouthey at gmail.com Wed Aug 25 11:44:22 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 25 Aug 2010 10:44:22 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: <4C753A56.304@gmail.com> On 08/25/2010 09:44 AM, josef.pktd at gmail.com wrote: > On Wed, Aug 25, 2010 at 10:32 AM, Keith Goodman wrote: >> On Wed, Aug 25, 2010 at 7:19 AM, John Hunter wrote: >>> On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: >>> >>>> How about using the percentiles of np.unique(x)? That takes care of >>>> the first constraint (no overlap) but ignores the second constraint >>>> (min std of cluster size). >>> Well, I need the 2nd constraint.... >> Both can't be hard constraints, so I guess the first step is to define >> a utility function that quantifies the trade off between the two. >> Would it make sense to then start from the percentile(unique(x), ...) >> solution and come up with a heuristic that moves an item with lots of >> repeats in a large length quintile to a short lenght quintile and then >> accept the moves if it improves the utility? Or try moving each item >> to each of the other 4 quintiles and do the move the improves the >> utility the most. Then repeat until the utility doesn't improve. But I >> guess I'm just stating the obvious and you are looking for something >> less obvious and more clever. >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > What I'm doing for some statistical analysis, e.g. chisquare test with > integer data (discrete random variable)? > > np.bincount to get the full count, or use theoretical pdf, > then loop over the integers (raw bins) and merge them to satisfy the > constraints. > > constraints that I'm using are equal binsizes in one version and > minimum binsizes in the second version. > > I haven't found anything else than the loop over the uniques, but I > think there was some discussion on this some time ago on a mailing > list. > > Josef > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev As others have indicated you have to work with the unique values as well as the frequencies. Hopefully you can determine what I mean from the code below and modify it as needed. It is brute force but provides a couple of options as the following output indicates. 3 [(2, 44), (3, 35), (5, 42), (13, 43), (43, 38)] 4 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] 5 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] 6 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] 7 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] 8 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] 9 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] Some notes: 1) For this example, you need an average of 41 per group (202 elements divided by 5). But that will be impossible because the value '1' has a frequency of 44, the sum of frequencies of '2' and '3' is 61. This means we need some way to allow slight increases in sizes - I use the variable eval which is the expected count plus some threshold (berror). If you have floats then you can not use np.bincount directly. So if these are integers use them directly or use some function to create these in the desirable range (such as np.ceil or work with 10*x etc.) 
Bruce binx=np.bincount(x.astype(int)) for berror in range(10): # loop over a range of possible variations in the counts eval=berror+np.ceil(binx.sum()/5.0) # find a count threshold count=0 quintile=[] for i in range(binx.shape[0]): #loop over the frequencies to determine which bin if count+binx[i] > (eval): # If the bin overflows then start a new one quintile.append((i, count)) count=binx[i] else: #other keep adding into current bin count +=binx[i] quintile.append((i, count)) #add the last bin if len(quintile)==5: # we must have five bins otherwise that loop is useless. You can also apply other criteria here as well. print berror, quintile From josef.pktd at gmail.com Wed Aug 25 12:08:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Aug 2010 12:08:54 -0400 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: <4C753A56.304@gmail.com> References: <4C753A56.304@gmail.com> Message-ID: On Wed, Aug 25, 2010 at 11:44 AM, Bruce Southey wrote: > ?On 08/25/2010 09:44 AM, josef.pktd at gmail.com wrote: >> On Wed, Aug 25, 2010 at 10:32 AM, Keith Goodman ?wrote: >>> On Wed, Aug 25, 2010 at 7:19 AM, John Hunter ?wrote: >>>> On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman ?wrote: >>>> >>>>> How about using the percentiles of np.unique(x)? That takes care of >>>>> the first constraint (no overlap) but ignores the second constraint >>>>> (min std of cluster size). >>>> Well, I need the 2nd constraint.... >>> Both can't be hard constraints, so I guess the first step is to define >>> a utility function that quantifies the trade off between the two. >>> Would it make sense to then start from the percentile(unique(x), ...) >>> solution and come up with a heuristic that moves an item with lots of >>> repeats in a large length quintile to a short lenght quintile and then >>> accept the moves if it improves the utility? Or try moving each item >>> to each of the other 4 quintiles and do the move the improves the >>> utility the most. Then repeat until the utility doesn't improve. But I >>> guess I'm just stating the obvious and you are looking for something >>> less obvious and more clever. >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> What I'm doing for some statistical analysis, e.g. chisquare test with >> integer data (discrete random variable)? >> >> np.bincount to get the full count, or use theoretical pdf, >> then loop over the integers (raw bins) and merge them to satisfy the >> constraints. >> >> constraints that I'm using are equal binsizes in one version and >> minimum binsizes in the second version. >> >> I haven't found anything else than the loop over the uniques, but I >> think there was some discussion on this some time ago on a mailing >> list. >> >> Josef >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > As others have indicated you have to work with the unique values as well > as the frequencies. > > Hopefully you can determine what I mean from the code below and modify > it as needed. It is brute force but provides a couple of options as the > following output indicates. 
> > 3 [(2, 44), (3, 35), (5, 42), (13, 43), (43, 38)] > 4 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] > 5 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] > 6 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] > 7 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] > 8 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] > 9 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] > > > Some notes: > 1) For this example, you need an average of 41 per group (202 elements > divided by 5). But that will be impossible because the value '1' has a > frequency of 44, the sum of frequencies of '2' and '3' is 61. This means > we need some way to allow slight increases in sizes - I use the variable > eval which is the expected count plus some threshold (berror). > > If you have floats then you can not use np.bincount directly. So if > these are integers use them directly or use some function to create > these in the desirable range (such as np.ceil or work with 10*x etc.) > > Bruce > > binx=np.bincount(x.astype(int)) > for berror in range(10): # loop over a range of possible variations in > the counts > ? ? eval=berror+np.ceil(binx.sum()/5.0) # find a count threshold > ? ? count=0 > ? ? quintile=[] > ? ? for i in range(binx.shape[0]): #loop over the frequencies to > determine which bin > ? ? ? ? if count+binx[i] > (eval): # If the bin overflows then start a > new one > ? ? ? ? ? ? quintile.append((i, count)) > ? ? ? ? ? ? count=binx[i] > ? ? ? ? else: #other keep adding into current bin > ? ? ? ? ? ? count +=binx[i] > ? ? quintile.append((i, count)) #add the last bin > ? ? if len(quintile)==5: # we must have five bins otherwise that loop > is useless. You can also apply other criteria here as well. > ? ? ? ? print berror, quintile I don't think you can assume anything about the minimum number of bins in general. For example, my similar code needed to work also for binary distributions with at most two unique values and bins. A degenerate distribution would have only a single value and bin. 
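One way to make the "work with the unique values" idea concrete (a sketch, not Josef's or Bruce's actual code): assign every occurrence of a value to the group in which its cumulative count starts. A value then never straddles two groups, the group sizes are only as even as the ties allow, and degenerate data with fewer unique values than groups simply produces fewer groups.

import numpy as np

def value_groups(x, nbins=5):
    # group label for every element of x; tied values never straddle groups
    x = np.asarray(x)
    vals = np.unique(x)
    counts = np.array([(x == v).sum() for v in vals], dtype=float)
    before = np.cumsum(counts) - counts        # count strictly below each value
    grp = np.minimum((nbins * before / counts.sum()).astype(int), nbins - 1)
    lookup = dict(zip(vals, grp))
    return np.array([lookup[v] for v in x])

# np.bincount(value_groups(x)) then gives the resulting group sizes
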
eval is a built-in function Josef > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From gokhansever at gmail.com Fri Aug 27 11:55:32 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 10:55:32 -0500 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure Message-ID: Hello, On a Fedora 13 VirtualBox setup Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 i686 i386 GNU/Linux python -c 'import numpy; numpy.test()' Running unit tests for numpy NumPy version 2.0.0.dev8671 NumPy is installed in /usr/lib/python2.6/site-packages/numpy Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] nose version 0.11.3 ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K..................................................................................................................................................................................................................................K............................................................................................K......................K.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.........................................................................................................................................................................................................................................................................................................................................................................................................................
............Warning: divide by zero encountered in log ....................................................................................................................................................................................................................................................................................... ====================================================================== FAIL: test_lapack (test_build.TestF77Mismatch) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/numpy/testing/decorators.py", line 146, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/numpy/linalg/tests/test_build.py", line 50, in test_lapack information.""") AssertionError: Both g77 and gfortran runtimes linked in lapack_lite ! This is likely to cause random crashes and wrong results. See numpy INSTALL.txt for more information. "Fail the test if the expression is true." >> if True: raise self.failureException, 'Both g77 and gfortran runtimes linked in lapack_lite ! This is likely to\ncause random crashes and wrong results. See numpy INSTALL.txt for more\ninformation.' ---------------------------------------------------------------------- Ran 3024 tests in 21.928s FAILED (KNOWNFAIL=4, failures=1) Any idea how to resolve this one? I use package manager to install requirements. It seems g77 and gfortran are mixed for lapack, but not sure how to fix it. When I try to uninstall gfortran it tries to remove lapack/blas/atlas all. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 11:58:17 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 10:58:17 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) Message-ID: Hello, Again on Fedora 13 Virtualbox setup: Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 i686 i386 GNU/Linux Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test('full') Running unit tests for scipy NumPy version 2.0.0.dev8671 NumPy is installed in /usr/lib/python2.6/site-packages/numpy SciPy version 0.9.0.dev6651 SciPy is installed in /usr/lib/python2.6/site-packages/scipy Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] nose version 0.11.3 ................................................................................................................................................................................................................................................................................/usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:670: UserWarning: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=7). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) ....../usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:601: UserWarning: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots. 
warnings.warn(message) .............................................K..K............................................................Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .........Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply ...................................................................................................................................................................................................................................................................................................................................................../usr/lib/python2.6/site-packages/scipy/io/recaster.py:328: ComplexWarning: Casting complex values to real discards the imaginary part test_arr = arr.astype(T) ../usr/lib/python2.6/site-packages/scipy/io/recaster.py:375: ComplexWarning: Casting complex values to real discards the imaginary part return arr.astype(idt) ../usr/lib/python2.6/site-packages/scipy/io/wavfile.py:30: WavFileWarning: Unfamiliar format bytes warnings.warn("Unfamiliar format bytes", WavFileWarning) /usr/lib/python2.6/site-packages/scipy/io/wavfile.py:120: WavFileWarning: chunk not understood warnings.warn("chunk not understood", WavFileWarning) .......................................................................................F............................................/usr/lib/python2.6/site-packages/scipy/lib/blas/tests/test_fblas.py:86: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ....../usr/lib/python2.6/site-packages/scipy/lib/blas/tests/test_fblas.py:196: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) .................../usr/lib/python2.6/site-packages/scipy/lib/blas/tests/test_fblas.py:279: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ..................................................................SSSSSS......SSSSSS......SSSS.....................................................FF........F..Warning: invalid value encountered in divide .....Warning: invalid value encountered in divide Warning: invalid value encountered in divide .......................................................................................................................................................................................................K............................................./usr/lib/python2.6/site-packages/scipy/linalg/tests/test_fblas.py:89: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ....../usr/lib/python2.6/site-packages/scipy/linalg/tests/test_fblas.py:199: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) 
.................../usr/lib/python2.6/site-packages/scipy/linalg/tests/test_fblas.py:282: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ....................................................................../usr/lib/python2.6/site-packages/scipy/linalg/matfuncs.py:94: ComplexWarning: Casting complex values to real discards the imaginary part return dot(dot(vr,diag(exp(s))),vri).astype(t) .................................................................................................................................................................................................................................................../usr/lib/python2.6/site-packages/scipy/ndimage/tests/test_ndimage.py:56: ComplexWarning: Casting complex values to real discards the imaginary part a = a.astype(numpy.float64) /usr/lib/python2.6/site-packages/scipy/ndimage/tests/test_ndimage.py:58: ComplexWarning: Casting complex values to real discards the imaginary part b = b.astype(numpy.float64) ......................................................................................................Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide .....................................................................Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide ........................Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide ..............................................9 7 ....F...............Warning: invalid value encountered in divide ................./usr/lib/python2.6/site-packages/scipy/signal/filter_design.py:247: BadCoefficients: Badly conditioned filter 
coefficients (numerator): the results may be meaningless "results may be meaningless", BadCoefficients) ..............................................................................................................................................................................................................................................................................................SSSSSSSSSSS..........Segmentation fault (core dumped) -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Fri Aug 27 14:14:32 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 27 Aug 2010 13:14:32 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: References: Message-ID: <4C780088.4090602@gmail.com> On 08/27/2010 10:58 AM, G?khan Sever wrote: > Hello, > > Again on Fedora 13 Virtualbox setup: > Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 > i686 i686 i386 GNU/Linux > > > Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) > [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import scipy > >>> scipy.test('full') > Running unit tests for scipy > NumPy version 2.0.0.dev8671 > NumPy is installed in /usr/lib/python2.6/site-packages/numpy > SciPy version 0.9.0.dev6651 > SciPy is installed in /usr/lib/python2.6/site-packages/scipy > Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 > 20100503 (Red Hat 4.4.4-2)] > nose version 0.11.3 > ................................................................................................................................................................................................................................................................................/usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:670: > UserWarning: > The coefficients of the spline returned have been computed as the > minimal norm least-squares solution of a (numerically) rank deficient > system (deficiency=7). If deficiency is large, the results may be > inaccurate. Deficiency may strongly depend on the value of eps. > warnings.warn(message) > ....../usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:601: > UserWarning: > The required storage space exceeds the available storage space: nxest > or nyest too small, or s too small. > The weighted least-squares spline corresponds to the current set of > knots. 
> <snip: remainder of quoted test log trimmed; it is identical to the output in the previous message>
....F...............Warning: invalid value encountered in divide > ................./usr/lib/python2.6/site-packages/scipy/signal/filter_design.py:247: > BadCoefficients: Badly conditioned filter coefficients (numerator): > the results may be meaningless > "results may be meaningless", BadCoefficients) > ..............................................................................................................................................................................................................................................................................................SSSSSSSSSSS..........Segmentation > fault (core dumped) > > > -- > G?khan > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev Hi, It would be useful to know which test is involved regardless of the issue. Can you please run the tests with the verbose option such as: 'scipy.test(verbose=10)'? This may have something to do with your numpy problem so please fix that up once you have identified the test (failure in linalg would confirm that). As a reference, my Fedora 13 x64 bit system uses gfortran - 'gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)'. Did you completely wipe the previous numpy installation especially the installed numpy files in $PATH2PYTHON/site-packages/ and remove any prior build directories? If so, then you need to create suitable site.cfg file to ensure the correct compiler is being used because something has changed in either your distro or numpy install. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.m.birch at gmail.com Fri Aug 27 14:17:25 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 14:17:25 -0400 Subject: [SciPy-Dev] scipy.stats.kde Message-ID: Hi all, I was thinking of renovating the kernel density estimation package (although no promises; I'm leaving for college tomorrow morning!). I was wondering: a) whether anyone had started code in that direction b) what people want in it I was thinking (as an ideal, not necessarily goal): - Support for more than Gaussian kernels (e.g. custom, uniform, Epanechnikov, triangular, quartic, cosine, etc.) - More options for bandwidth selection (custom bandwidth matrices, AMISE optimization, cross-validation, etc.) - Assorted conveniences: automatically generate the mesh, limit the kernel's support for speed So, thoughts anyone? I figure it's better to over-specify and then under-produce, so don't hold back. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Aug 27 14:38:45 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Aug 2010 14:38:45 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch wrote: > Hi all, > I was thinking of?renovating?the kernel density estimation package (although > no promises; I'm leaving for college tomorrow morning!). I was wondering: > a) whether anyone had started code in that direction Mike Crowe wrote code for kernel regression and Skipper started a 1D kernel density estimator in scikits.statsmodels, which cover a larger number of kernels I don't think I have seen any higher dimensional kernel density estimation in python besides scipy.stats.kde. The Gaussian kde in scipy.stats is targeted to the underlying Fortran code for multivariate normal cdf. 
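(For concreteness, the current interface looks roughly like this -- gaussian_kde, its evaluation call, and integrate_box are the existing scipy.stats calls, while the dataset and grid below are made up purely for illustration:)

    import numpy as np
    from scipy import stats

    data = np.random.randn(2, 500)             # 2-D sample, shape (ndim, npoints)
    kde = stats.gaussian_kde(data)              # Gaussian kernel, automatic bandwidth

    grid = np.mgrid[-2:2:50j, -2:2:50j]         # evaluate the density on a mesh
    density = kde(grid.reshape(2, -1)).reshape(50, 50)
    mass = kde.integrate_box([-2, -2], [0, 0])  # box integration (uses the mvn Fortran code)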
It's not clear to me what other n-dimensional kdes would require or whether they would fit well with the current code. One extension that Robert also mentioned in the past that it would be nice to have adaptive kernels, which I also haven't seen in python yet. > b) what people want in it > I was thinking (as an ideal, not?necessarily?goal): > - Support for more than Gaussian kernels (e.g. custom, > uniform,?Epanechnikov, triangular, quartic, cosine, etc.) > - More options for bandwidth selection (custom bandwidth?matrices, AMISE > optimization, cross-validation, etc.) definitely yes, I don't think they are even available for 1D yet. > - Assorted conveniences: automatically generate the mesh, limit the kernel's > support for speed Using scipy.spatial to limit the number of neighbors in a bounded support kernel might be a good idea. (just some thought on the topic) Josef > So, thoughts anyone? I figure it's better to over-specify and then > under-produce, so don't hold back. > Thanks, > Sam > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From sam.m.birch at gmail.com Fri Aug 27 14:47:37 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 14:47:37 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: Well perhaps I should start with a module that does other kernels & bandwidth estimation then? Then everybody who uses them can use a standard implementation. Is that an appropriate addition to SciPy core? On Fri, Aug 27, 2010 at 2:38 PM, wrote: > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch wrote: > > Hi all, > > I was thinking of renovating the kernel density estimation package > (although > > no promises; I'm leaving for college tomorrow morning!). I was wondering: > > a) whether anyone had started code in that direction > > Mike Crowe wrote code for kernel regression and Skipper started a 1D > kernel density estimator in scikits.statsmodels, which cover a larger > number of kernels > > I don't think I have seen any higher dimensional kernel density > estimation in python besides scipy.stats.kde. The Gaussian kde in > scipy.stats is targeted to the underlying Fortran code for > multivariate normal cdf. > It's not clear to me what other n-dimensional kdes would require or > whether they would fit well with the current code. > > One extension that Robert also mentioned in the past that it would be > nice to have adaptive kernels, which I also haven't seen in python > yet. > > > b) what people want in it > > I was thinking (as an ideal, not necessarily goal): > > - Support for more than Gaussian kernels (e.g. custom, > > uniform, Epanechnikov, triangular, quartic, cosine, etc.) > > - More options for bandwidth selection (custom bandwidth matrices, AMISE > > optimization, cross-validation, etc.) > > definitely yes, I don't think they are even available for 1D yet. > > > - Assorted conveniences: automatically generate the mesh, limit the > kernel's > > support for speed > > Using scipy.spatial to limit the number of neighbors in a bounded > support kernel might be a good idea. > > (just some thought on the topic) > > Josef > > > So, thoughts anyone? I figure it's better to over-specify and then > > under-produce, so don't hold back. 
> > Thanks, > > Sam > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Fri Aug 27 14:48:29 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 27 Aug 2010 14:48:29 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: My only experience with KDEs has been on the circle, where there seems to be little or no literature and the constraints are rather different. On 27 August 2010 14:38, wrote: > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch wrote: >> Hi all, >> I was thinking of renovating the kernel density estimation package (although >> no promises; I'm leaving for college tomorrow morning!). I was wondering: >> a) whether anyone had started code in that direction > > Mike Crowe wrote code for kernel regression and Skipper started a 1D > kernel density estimator in scikits.statsmodels, which cover a larger > number of kernels > > I don't think I have seen any higher dimensional kernel density > estimation in python besides scipy.stats.kde. The Gaussian kde in > scipy.stats is targeted to the underlying Fortran code for > multivariate normal cdf. > It's not clear to me what other n-dimensional kdes would require or > whether they would fit well with the current code. > > One extension that Robert also mentioned in the past that it would be > nice to have adaptive kernels, which I also haven't seen in python > yet. > >> b) what people want in it >> I was thinking (as an ideal, not necessarily goal): >> - Support for more than Gaussian kernels (e.g. custom, >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) >> - More options for bandwidth selection (custom bandwidth matrices, AMISE >> optimization, cross-validation, etc.) > > definitely yes, I don't think they are even available for 1D yet. Bandwidth selection is a hotly debated topic, at least in one dimension, so perhaps not just different methods but tools for diagnosing bandwidth selection problems would be nice - at the least, it should be made straightforward to vary the bandwidth (e.g. to plot the KDE with a range of different bandwidth values). >> - Assorted conveniences: automatically generate the mesh, limit the kernel's >> support for speed > > Using scipy.spatial to limit the number of neighbors in a bounded > support kernel might be a good idea. Simply using it to find the neighbors that need to be used should speed things up. There may also be some shortcuts for unbounded-support kernels (no point adding a Gaussian a hundred sigma away if there's any points nearby). At the other end of the spectrum, for very dense KDEs, on the circle I found it extremely convenient to use Fourier transforms to carry out the convolution of kernel with points. In particular, I represented the KDE in terms of its Fourier coefficients, so that an inverse FFT immediately gave me the KDE evaluated on a grid (or, with some fiddling, integrated over the bins of a histogram). I don't know whether this is a useful optimization for KDEs on the line or in higher dimensions, since there's the problem of wrapping. Anne > (just some thought on the topic) > > Josef > >> So, thoughts anyone? 
I figure it's better to over-specify and then >> under-produce, so don't hold back. >> Thanks, >> Sam >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From robert.kern at gmail.com Fri Aug 27 14:56:27 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Aug 2010 13:56:27 -0500 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 13:38, wrote: > I don't think I have seen any higher dimensional kernel density > estimation in python besides scipy.stats.kde. The Gaussian kde in > scipy.stats is targeted to the underlying Fortran code for > multivariate normal cdf. Only for the "integrate over a box" functionality, which was what I needed at the time but is pretty rarely required otherwise. The rest is pure numpy. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From aarchiba at physics.mcgill.ca Fri Aug 27 15:05:17 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 27 Aug 2010 15:05:17 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On 27 August 2010 14:56, Robert Kern wrote: > On Fri, Aug 27, 2010 at 13:38, wrote: > >> I don't think I have seen any higher dimensional kernel density >> estimation in python besides scipy.stats.kde. The Gaussian kde in >> scipy.stats is targeted to the underlying Fortran code for >> multivariate normal cdf. > > Only for the "integrate over a box" functionality, which was what I > needed at the time but is pretty rarely required otherwise. The rest > is pure numpy. I should say, integrating over a box is something I do all the time, though that is partly because it is cheap in my setting. For example, for plotting on a grid, what you really want to do is not sample on the grid but produce average values over the grid cells - this way you never miss or exaggerate a peak. So having efficient methods to integrate over one box or all grid cells can be really handy. Unfortunately I think it is often expensive even when approximations are made that allow discarding sufficiently distant points. Anne > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From robert.kern at gmail.com Fri Aug 27 15:09:22 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Aug 2010 14:09:22 -0500 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 14:05, Anne Archibald wrote: > On 27 August 2010 14:56, Robert Kern wrote: >> On Fri, Aug 27, 2010 at 13:38, ? wrote: >> >>> I don't think I have seen any higher dimensional kernel density >>> estimation in python besides scipy.stats.kde. The Gaussian kde in >>> scipy.stats is targeted to the underlying Fortran code for >>> multivariate normal cdf. 
>> >> Only for the "integrate over a box" functionality, which was what I >> needed at the time but is pretty rarely required otherwise. The rest >> is pure numpy. > > I should say, integrating over a box is something I do all the time, > though that is partly because it is cheap in my setting. For example, > for plotting on a grid, what you really want to do is not sample on > the grid but produce average values over the grid cells - this way you > never miss or exaggerate a peak. So having efficient methods to > integrate over one box or all grid cells can be really handy. > Unfortunately I think it is often expensive even when approximations > are made that allow discarding sufficiently distant points. Well okay then. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sam.m.birch at gmail.com Fri Aug 27 15:27:45 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 15:27:45 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: > > Bandwidth selection is a hotly debated topic, at least in one dimension, so perhaps not just different methods but tools for diagnosing bandwidth selection problems would be nice - at the least, it should be made straightforward to vary the bandwidth (e.g. to plot the KDE with a range of different bandwidth values). Well by allowing them to use a custom bandwidth matrix they can vary it themselves, no? At the other end of the spectrum, for very dense KDEs, on the circle I found it extremely convenient to use Fourier transforms to carry out the convolution of kernel with points. In particular, I represented the KDE in terms of its Fourier coefficients, so that an inverse FFT immediately gave me the KDE evaluated on a grid (or, with some fiddling, integrated over the bins of a histogram). I don't know whether this is a useful optimization for KDEs on the line or in higher dimensions, since there's the problem of wrapping. That sounds very interesting. Sorry if I'm being dense (or just wrong, or both), but do you convolve post-FFT or before? If before why does it make it easier? -Sam On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald wrote: > My only experience with KDEs has been on the circle, where there seems > to be little or no literature and the constraints are rather > different. > > On 27 August 2010 14:38, wrote: > > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch > wrote: > >> Hi all, > >> I was thinking of renovating the kernel density estimation package > (although > >> no promises; I'm leaving for college tomorrow morning!). I was > wondering: > >> a) whether anyone had started code in that direction > > > > Mike Crowe wrote code for kernel regression and Skipper started a 1D > > kernel density estimator in scikits.statsmodels, which cover a larger > > number of kernels > > > > I don't think I have seen any higher dimensional kernel density > > estimation in python besides scipy.stats.kde. The Gaussian kde in > > scipy.stats is targeted to the underlying Fortran code for > > multivariate normal cdf. > > It's not clear to me what other n-dimensional kdes would require or > > whether they would fit well with the current code. > > > > One extension that Robert also mentioned in the past that it would be > > nice to have adaptive kernels, which I also haven't seen in python > > yet. 
> > > >> b) what people want in it > >> I was thinking (as an ideal, not necessarily goal): > >> - Support for more than Gaussian kernels (e.g. custom, > >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) > >> - More options for bandwidth selection (custom bandwidth matrices, AMISE > >> optimization, cross-validation, etc.) > > > > definitely yes, I don't think they are even available for 1D yet. > > Bandwidth selection is a hotly debated topic, at least in one > dimension, so perhaps not just different methods but tools for > diagnosing bandwidth selection problems would be nice - at the least, > it should be made straightforward to vary the bandwidth (e.g. to plot > the KDE with a range of different bandwidth values). > > >> - Assorted conveniences: automatically generate the mesh, limit the > kernel's > >> support for speed > > > > Using scipy.spatial to limit the number of neighbors in a bounded > > support kernel might be a good idea. > > Simply using it to find the neighbors that need to be used should > speed things up. There may also be some shortcuts for > unbounded-support kernels (no point adding a Gaussian a hundred sigma > away if there's any points nearby). > > At the other end of the spectrum, for very dense KDEs, on the circle I > found it extremely convenient to use Fourier transforms to carry out > the convolution of kernel with points. In particular, I represented > the KDE in terms of its Fourier coefficients, so that an inverse FFT > immediately gave me the KDE evaluated on a grid (or, with some > fiddling, integrated over the bins of a histogram). I don't know > whether this is a useful optimization for KDEs on the line or in > higher dimensions, since there's the problem of wrapping. > > Anne > > > (just some thought on the topic) > > > > Josef > > > >> So, thoughts anyone? I figure it's better to over-specify and then > >> under-produce, so don't hold back. > >> Thanks, > >> Sam > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-dev > >> > >> > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Aug 27 15:39:27 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Aug 2010 15:39:27 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 3:27 PM, Sam Birch wrote: >> Bandwidth selection is a hotly debated topic, at least in one > > dimension, so perhaps not just different methods but tools for > > diagnosing bandwidth selection problems would be nice - at the least, > > it should be made straightforward to vary the bandwidth (e.g. to plot > > the KDE with a range of different bandwidth values). > > Well by allowing them to use a custom bandwidth matrix they can vary it > themselves, no? > >> At the other end of the spectrum, for very dense KDEs, on the circle I > > found it extremely convenient to use Fourier transforms to carry out > > the convolution of kernel with points. 
In particular, I represented > > the KDE in terms of its Fourier coefficients, so that an inverse FFT > > immediately gave me the KDE evaluated on a grid (or, with some > > fiddling, integrated over the bins of a histogram). I don't know > > whether this is a useful optimization for KDEs on the line or in > > higher dimensions, since there's the problem of wrapping. > > That sounds very interesting. Sorry if I'm being dense (or just wrong, or > both), but do you convolve post-FFT or before? If before why does it make it > easier? and also: Do you grid the initial points first ? I think it sounds similar to what Skipper was trying at some point. >From the paper it sounded like it's expensive to construct the initial points, but then much cheaper to evaluate the kde at many points because of the use of the fft for the actual convolution. Josef > -Sam > On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald > wrote: >> >> My only experience with KDEs has been on the circle, where there seems >> to be little or no literature and the constraints are rather >> different. >> >> On 27 August 2010 14:38, ? wrote: >> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch >> > wrote: >> >> Hi all, >> >> I was thinking of renovating the kernel density estimation package >> >> (although >> >> no promises; I'm leaving for college tomorrow morning!). I was >> >> wondering: >> >> a) whether anyone had started code in that direction >> > >> > Mike Crowe wrote code for kernel regression ?and Skipper started a 1D >> > kernel density estimator in scikits.statsmodels, which cover a larger >> > number of kernels >> > >> > I don't think I have seen any higher dimensional kernel density >> > estimation in python besides scipy.stats.kde. The Gaussian kde in >> > scipy.stats is targeted to the underlying Fortran code for >> > multivariate normal cdf. >> > It's not clear to me what other n-dimensional kdes would require or >> > whether they would fit well with the current code. >> > >> > One extension that Robert also mentioned in the past that it would be >> > nice to have adaptive kernels, which I also haven't seen in python >> > yet. >> > >> >> b) what people want in it >> >> I was thinking (as an ideal, not necessarily goal): >> >> - Support for more than Gaussian kernels (e.g. custom, >> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) >> >> - More options for bandwidth selection (custom bandwidth matrices, >> >> AMISE >> >> optimization, cross-validation, etc.) >> > >> > definitely yes, I don't think they are even available for 1D yet. >> >> Bandwidth selection is a hotly debated topic, at least in one >> dimension, so perhaps not just different methods but tools for >> diagnosing bandwidth selection problems would be nice - at the least, >> it should be made straightforward to vary the bandwidth (e.g. to plot >> the KDE with a range of different bandwidth values). >> >> >> - Assorted conveniences: automatically generate the mesh, limit the >> >> kernel's >> >> support for speed >> > >> > Using scipy.spatial to limit the number of neighbors in a bounded >> > support kernel might be a good idea. >> >> Simply using it to find the neighbors that need to be used should >> speed things up. There may also be some shortcuts for >> unbounded-support kernels (no point adding a Gaussian a hundred sigma >> away if there's any points nearby). 
>> >> At the other end of the spectrum, for very dense KDEs, on the circle I >> found it extremely convenient to use Fourier transforms to carry out >> the convolution of kernel with points. In particular, I represented >> the KDE in terms of its Fourier coefficients, so that an inverse FFT >> immediately gave me the KDE evaluated on a grid (or, with some >> fiddling, integrated over the bins of a histogram). I don't know >> whether this is a useful optimization for KDEs on the line or in >> higher dimensions, since there's the problem of wrapping. >> >> Anne >> >> > (just some thought on the topic) >> > >> > Josef >> > >> >> So, thoughts anyone? I figure it's better to over-specify and then >> >> under-produce, so don't hold back. >> >> Thanks, >> >> Sam >> >> _______________________________________________ >> >> SciPy-Dev mailing list >> >> SciPy-Dev at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From aarchiba at physics.mcgill.ca Fri Aug 27 15:51:50 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 27 Aug 2010 15:51:50 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On 27 August 2010 15:27, Sam Birch wrote: >> Bandwidth selection is a hotly debated topic, at least in one > > dimension, so perhaps not just different methods but tools for > > diagnosing bandwidth selection problems would be nice - at the least, > > it should be made straightforward to vary the bandwidth (e.g. to plot > > the KDE with a range of different bandwidth values). > > Well by allowing them to use a custom bandwidth matrix they can vary it > themselves, no? Well, in principle, yes. But if the API forces them to construct an entirely new KDE object to change the bandwidth matrix, and if this object involves substantial additional data structures (e.g. a kd-tree holding the data points) this could be cumbersome. >> At the other end of the spectrum, for very dense KDEs, on the circle I > > found it extremely convenient to use Fourier transforms to carry out > > the convolution of kernel with points. In particular, I represented > > the KDE in terms of its Fourier coefficients, so that an inverse FFT > > immediately gave me the KDE evaluated on a grid (or, with some > > fiddling, integrated over the bins of a histogram). I don't know > > whether this is a useful optimization for KDEs on the line or in > > higher dimensions, since there's the problem of wrapping. > > That sounds very interesting. Sorry if I'm being dense (or just wrong, or > both), but do you convolve post-FFT or before? If before why does it make it > easier? Again, this is for work on the circle and for fairly dense data sets. But in principle, the KDE as a function is the convolution of a forest of delta functions, one per point, with the kernel. The conventional way to evaluate this function at a point is simply to evaluate the kernel once per data point and add them up. To evaluate this on a grid, you evaluate the kernel once per grid point per data point and add. 
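(In code, that brute-force grid evaluation is just one kernel evaluation per grid point per data point; a minimal 1-D NumPy sketch with a Gaussian kernel, where the function name and arguments are only illustrative:)

    import numpy as np

    def kde_on_grid(data, grid, bandwidth):
        # one kernel evaluation per (grid point, data point) pair:
        # O(len(grid) * len(data)) work
        u = (grid[:, None] - data[None, :]) / bandwidth
        kernel_vals = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        return kernel_vals.mean(axis=1) / bandwidth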
Naturally this can be expensive. My idea is that convolution of functions is simply multiplication of the Fourier transforms of those functions. So instead of storing a list of data points in my KDE object, I store a representation of the forest of delta functions in terms of their Fourier coefficients (the nth Fourier coefficient of a delta function at phase p is exp(2 pi i n p)). This is necessarily approximate, since I store finitely many Fourier coefficients, but it's not hard to store "enough". Now when I want to convolve this forest by a kernel, I simply multiply these Fourier coefficients by those of the kernel. The easiest "kernel" is the sinc function, for which I simply truncate the Fourier coefficients (which is why it's easy to have enough). We actually use this "kernel" a lot, even though it's not positive everywhere. A mathematically-better choice is the von Mises distribution, whose PDF is proportional to exp(k cos(x)) and whose Fourier coefficients can be written in terms of Bessel functions. Once you have the Fourier coefficients of the KDE, you can evaluate it at a point by taking a sum of sinusoids, but the key idea is that you can evaluate it on a grid by taking an inverse FFT. If you want integrals over intervals, well, that you just get by integrating sinusoids over intervals, so there's a messy but easily-derived way to work out the area in terms of the Fourier coefficients. This too can be nicely worked out on a grid, by fiddling the Fourier coefficients and taking an inverse FFT. To construct the FCs of the forest of delta functions, if I have photon arrival phases I just take a sum (which can be slow, but this isn't really time-critical). But it's also perfectly reasonable to start from a histogram and take an FFT. Crucially, the histogram need not be a reasonable-looking histogram - you can never have too many bins, since it's not a problem if all the bin counts are either zero or one. The only drawback here is that you introduce an error averaging to half a bin width on each photon arrival phase. But the KDE provides a check on this too - if your kernel width is much larger than the width of the input bins, then the errors you introduced probably don't matter much (leaving aside nasty issues with Moire patterns in the likely case that your input times were already binned). One thing to note here is that once you have the FCs, you can try various kernels and bandwidths without going back to your original data. (You can also get uncertainties on all the various computed quantities, and in fact you can usually turn around and not only start from a histogram but start from an array of values with uncertainties. All this stuff is in a paper that's on my back burner right now.) The thing is, I don't really know how useful all this is for KDEs on a line or in R^n. The problem is that working with discrete Fourier coefficients implicitly wraps the KDE around at the ends of the interval, and it's not clear that this is still worth doing if you're going to "pad" your region enough that this isn't a problem: the padding forces you to evaluate at lots of points you don't care about and use lots more Fourier coefficients than you would otherwise have to. Anne > -Sam > On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald > wrote: >> >> My only experience with KDEs has been on the circle, where there seems >> to be little or no literature and the constraints are rather >> different. 
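(A minimal sketch of the Fourier-coefficient construction described above, assuming phases on [0, 1) and a von Mises kernel; the function name, defaults and grid size are illustrative, not an existing scipy API:)

    import numpy as np
    from scipy.special import ive   # exponentially scaled Bessel I_n

    def circular_kde(phases, kappa=50.0, n_coef=64, n_grid=512):
        phases = np.asarray(phases, dtype=float) % 1.0
        n = np.arange(n_coef + 1)
        # Fourier coefficients of the "forest of delta functions":
        # c_n = (1/N) * sum_j exp(-2*pi*i*n*p_j)
        c = np.exp(-2j * np.pi * np.outer(n, phases)).mean(axis=1)
        # von Mises kernel coefficients I_n(kappa)/I_0(kappa);
        # convolution on the circle = coefficient-wise multiplication
        k = ive(n, kappa) / ive(0, kappa)
        # the inverse real FFT evaluates the smoothed density on a regular grid
        density = n_grid * np.fft.irfft(c * k, n_grid)
        return np.arange(n_grid) / n_grid, density

Since only the factor k depends on the kernel and bandwidth, different kernels or bandwidths can be tried by recomputing k alone, without going back to the original data.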
>> >> On 27 August 2010 14:38, wrote: >> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch >> > wrote: >> >> Hi all, >> >> I was thinking of renovating the kernel density estimation package >> >> (although >> >> no promises; I'm leaving for college tomorrow morning!). I was >> >> wondering: >> >> a) whether anyone had started code in that direction >> > >> > Mike Crowe wrote code for kernel regression and Skipper started a 1D >> > kernel density estimator in scikits.statsmodels, which cover a larger >> > number of kernels >> > >> > I don't think I have seen any higher dimensional kernel density >> > estimation in python besides scipy.stats.kde. The Gaussian kde in >> > scipy.stats is targeted to the underlying Fortran code for >> > multivariate normal cdf. >> > It's not clear to me what other n-dimensional kdes would require or >> > whether they would fit well with the current code. >> > >> > One extension that Robert also mentioned in the past that it would be >> > nice to have adaptive kernels, which I also haven't seen in python >> > yet. >> > >> >> b) what people want in it >> >> I was thinking (as an ideal, not necessarily goal): >> >> - Support for more than Gaussian kernels (e.g. custom, >> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) >> >> - More options for bandwidth selection (custom bandwidth matrices, >> >> AMISE >> >> optimization, cross-validation, etc.) >> > >> > definitely yes, I don't think they are even available for 1D yet. >> >> Bandwidth selection is a hotly debated topic, at least in one >> dimension, so perhaps not just different methods but tools for >> diagnosing bandwidth selection problems would be nice - at the least, >> it should be made straightforward to vary the bandwidth (e.g. to plot >> the KDE with a range of different bandwidth values). >> >> >> - Assorted conveniences: automatically generate the mesh, limit the >> >> kernel's >> >> support for speed >> > >> > Using scipy.spatial to limit the number of neighbors in a bounded >> > support kernel might be a good idea. >> >> Simply using it to find the neighbors that need to be used should >> speed things up. There may also be some shortcuts for >> unbounded-support kernels (no point adding a Gaussian a hundred sigma >> away if there's any points nearby). >> >> At the other end of the spectrum, for very dense KDEs, on the circle I >> found it extremely convenient to use Fourier transforms to carry out >> the convolution of kernel with points. In particular, I represented >> the KDE in terms of its Fourier coefficients, so that an inverse FFT >> immediately gave me the KDE evaluated on a grid (or, with some >> fiddling, integrated over the bins of a histogram). I don't know >> whether this is a useful optimization for KDEs on the line or in >> higher dimensions, since there's the problem of wrapping. >> >> Anne >> >> > (just some thought on the topic) >> > >> > Josef >> > >> >> So, thoughts anyone? I figure it's better to over-specify and then >> >> under-produce, so don't hold back. 
>> >> Thanks, >> >> Sam >> >> _______________________________________________ >> >> SciPy-Dev mailing list >> >> SciPy-Dev at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From sam.m.birch at gmail.com Fri Aug 27 16:37:03 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 16:37:03 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: Quick question: norm <-> delta function? Again, this is for work on the circle and for fairly dense data sets. But in principle, the KDE as a function is the convolution of a forest of delta functions, one per point, with the kernel. The conventional way to evaluate this function at a point is simply to evaluate the kernel once per data point and add them up. To evaluate this on a grid, you evaluate the kernel once per grid point per data point and add. Naturally this can be expensive. Huh, usually I do it the other way around (changing the grid as needed per-datapoint) because it saves lots of time for kernels with finite support. Is truncating kernels often applied? It depends on your data obviously but in many cases the Gaussian drops very rapidly (relative to the size of the mesh). To the rest: that's a very clever idea. I see your point about varying the bandwidths etc. without recomputing the KDE. Maybe then there should be a separation between "plotting" the KDE with a given kernel & bandwidth and creating a KDE from a dataset (which would just determine the FCs of the forest of delta functions). W.r.t. padding, perhaps doing it the other way around (as I mentioned above) would negate the consequences of an expanded mesh (I have no idea what I'm talking about really--shot in the dark)? -Sam On Fri, Aug 27, 2010 at 3:51 PM, Anne Archibald wrote: > On 27 August 2010 15:27, Sam Birch wrote: > >> Bandwidth selection is a hotly debated topic, at least in one > > > > dimension, so perhaps not just different methods but tools for > > > > diagnosing bandwidth selection problems would be nice - at the least, > > > > it should be made straightforward to vary the bandwidth (e.g. to plot > > > > the KDE with a range of different bandwidth values). > > > > Well by allowing them to use a custom bandwidth matrix they can vary it > > themselves, no? > > Well, in principle, yes. But if the API forces them to construct an > entirely new KDE object to change the bandwidth matrix, and if this > object involves substantial additional data structures (e.g. a kd-tree > holding the data points) this could be cumbersome. > > >> At the other end of the spectrum, for very dense KDEs, on the circle I > > > > found it extremely convenient to use Fourier transforms to carry out > > > > the convolution of kernel with points. In particular, I represented > > > > the KDE in terms of its Fourier coefficients, so that an inverse FFT > > > > immediately gave me the KDE evaluated on a grid (or, with some > > > > fiddling, integrated over the bins of a histogram). 
I don't know > > > > whether this is a useful optimization for KDEs on the line or in > > > > higher dimensions, since there's the problem of wrapping. > > > > That sounds very interesting. Sorry if I'm being dense (or just wrong, or > > both), but do you convolve post-FFT or before? If before why does it make > it > > easier? > > Again, this is for work on the circle and for fairly dense data sets. > But in principle, the KDE as a function is the convolution of a forest > of delta functions, one per point, with the kernel. The conventional > way to evaluate this function at a point is simply to evaluate the > kernel once per data point and add them up. To evaluate this on a > grid, you evaluate the kernel once per grid point per data point and > add. Naturally this can be expensive. > > My idea is that convolution of functions is simply multiplication of > the Fourier transforms of those functions. So instead of storing a > list of data points in my KDE object, I store a representation of the > forest of delta functions in terms of their Fourier coefficients (the > nth Fourier coefficient of a delta function at phase p is exp(2 pi i n > p)). This is necessarily approximate, since I store finitely many > Fourier coefficients, but it's not hard to store "enough". Now when I > want to convolve this forest by a kernel, I simply multiply these > Fourier coefficients by those of the kernel. The easiest "kernel" is > the sinc function, for which I simply truncate the Fourier > coefficients (which is why it's easy to have enough). We actually use > this "kernel" a lot, even though it's not positive everywhere. A > mathematically-better choice is the von Mises distribution, whose PDF > is proportional to exp(k cos(x)) and whose Fourier coefficients can be > written in terms of Bessel functions. > > Once you have the Fourier coefficients of the KDE, you can evaluate it > at a point by taking a sum of sinusoids, but the key idea is that you > can evaluate it on a grid by taking an inverse FFT. If you want > integrals over intervals, well, that you just get by integrating > sinusoids over intervals, so there's a messy but easily-derived way to > work out the area in terms of the Fourier coefficients. This too can > be nicely worked out on a grid, by fiddling the Fourier coefficients > and taking an inverse FFT. > > > To construct the FCs of the forest of delta functions, if I have > photon arrival phases I just take a sum (which can be slow, but this > isn't really time-critical). But it's also perfectly reasonable to > start from a histogram and take an FFT. Crucially, the histogram need > not be a reasonable-looking histogram - you can never have too many > bins, since it's not a problem if all the bin counts are either zero > or one. The only drawback here is that you introduce an error > averaging to half a bin width on each photon arrival phase. But the > KDE provides a check on this too - if your kernel width is much larger > than the width of the input bins, then the errors you introduced > probably don't matter much (leaving aside nasty issues with Moire > patterns in the likely case that your input times were already > binned). > > One thing to note here is that once you have the FCs, you can try > various kernels and bandwidths without going back to your original > data. (You can also get uncertainties on all the various computed > quantities, and in fact you can usually turn around and not only start > from a histogram but start from an array of values with uncertainties. 
> All this stuff is in a paper that's on my back burner right now.) > > > The thing is, I don't really know how useful all this is for KDEs on a > line or in R^n. The problem is that working with discrete Fourier > coefficients implicitly wraps the KDE around at the ends of the > interval, and it's not clear that this is still worth doing if you're > going to "pad" your region enough that this isn't a problem: the > padding forces you to evaluate at lots of points you don't care about > and use lots more Fourier coefficients than you would otherwise have > to. > > > Anne > > > -Sam > > On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald < > aarchiba at physics.mcgill.ca> > > wrote: > >> > >> My only experience with KDEs has been on the circle, where there seems > >> to be little or no literature and the constraints are rather > >> different. > >> > >> On 27 August 2010 14:38, wrote: > >> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch > >> > wrote: > >> >> Hi all, > >> >> I was thinking of renovating the kernel density estimation package > >> >> (although > >> >> no promises; I'm leaving for college tomorrow morning!). I was > >> >> wondering: > >> >> a) whether anyone had started code in that direction > >> > > >> > Mike Crowe wrote code for kernel regression and Skipper started a 1D > >> > kernel density estimator in scikits.statsmodels, which cover a larger > >> > number of kernels > >> > > >> > I don't think I have seen any higher dimensional kernel density > >> > estimation in python besides scipy.stats.kde. The Gaussian kde in > >> > scipy.stats is targeted to the underlying Fortran code for > >> > multivariate normal cdf. > >> > It's not clear to me what other n-dimensional kdes would require or > >> > whether they would fit well with the current code. > >> > > >> > One extension that Robert also mentioned in the past that it would be > >> > nice to have adaptive kernels, which I also haven't seen in python > >> > yet. > >> > > >> >> b) what people want in it > >> >> I was thinking (as an ideal, not necessarily goal): > >> >> - Support for more than Gaussian kernels (e.g. custom, > >> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) > >> >> - More options for bandwidth selection (custom bandwidth matrices, > >> >> AMISE > >> >> optimization, cross-validation, etc.) > >> > > >> > definitely yes, I don't think they are even available for 1D yet. > >> > >> Bandwidth selection is a hotly debated topic, at least in one > >> dimension, so perhaps not just different methods but tools for > >> diagnosing bandwidth selection problems would be nice - at the least, > >> it should be made straightforward to vary the bandwidth (e.g. to plot > >> the KDE with a range of different bandwidth values). > >> > >> >> - Assorted conveniences: automatically generate the mesh, limit the > >> >> kernel's > >> >> support for speed > >> > > >> > Using scipy.spatial to limit the number of neighbors in a bounded > >> > support kernel might be a good idea. > >> > >> Simply using it to find the neighbors that need to be used should > >> speed things up. There may also be some shortcuts for > >> unbounded-support kernels (no point adding a Gaussian a hundred sigma > >> away if there's any points nearby). > >> > >> At the other end of the spectrum, for very dense KDEs, on the circle I > >> found it extremely convenient to use Fourier transforms to carry out > >> the convolution of kernel with points. 
In particular, I represented > >> the KDE in terms of its Fourier coefficients, so that an inverse FFT > >> immediately gave me the KDE evaluated on a grid (or, with some > >> fiddling, integrated over the bins of a histogram). I don't know > >> whether this is a useful optimization for KDEs on the line or in > >> higher dimensions, since there's the problem of wrapping. > >> > >> Anne > >> > >> > (just some thought on the topic) > >> > > >> > Josef > >> > > >> >> So, thoughts anyone? I figure it's better to over-specify and then > >> >> under-produce, so don't hold back. > >> >> Thanks, > >> >> Sam > >> >> _______________________________________________ > >> >> SciPy-Dev mailing list > >> >> SciPy-Dev at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-dev > >> >> > >> >> > >> > _______________________________________________ > >> > SciPy-Dev mailing list > >> > SciPy-Dev at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-dev > >> > > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Aug 27 20:09:17 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 28 Aug 2010 09:09:17 +0900 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure In-Reply-To: References: Message-ID: On Sat, Aug 28, 2010 at 12:55 AM, Gökhan Sever wrote: > Hello, > On a Fedora 13 VirtualBox setup > Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 > i686 i386 GNU/Linux > python -c 'import numpy; numpy.test()' > Running unit tests for numpy > NumPy version 2.0.0.dev8671 > NumPy is installed in /usr/lib/python2.6/site-packages/numpy > Python version 2.6.4 (r264:75706, Jun  4 2010, 18:20:16) [GCC 4.4.4 20100503 > (Red Hat 4.4.4-2)] > nose version 0.11.3 >
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K..................................................................................................................................................................................................................................K............................................................................................K......................K.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.....................................................................................................................................................................................................................................................................................................................................................................................................................................Warning: > divide by zero encountered in log > ....................................................................................................................................................................................................................................................................................... > ====================================================================== > FAIL: test_lapack (test_build.TestF77Mismatch) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ??File "/usr/lib/python2.6/site-packages/numpy/testing/decorators.py", line > 146, in skipper_func > ?? ?return f(*args, **kwargs) > ??File "/usr/lib/python2.6/site-packages/numpy/linalg/tests/test_build.py", > line 50, in test_lapack > ?? 
information.""") > AssertionError: Both g77 and gfortran runtimes linked in lapack_lite ! This > is likely to > cause random crashes and wrong results. See numpy INSTALL.txt for more > information. > "Fail the test if the expression is true." >>> if True: raise self.failureException, 'Both g77 and gfortran runtimes >>> linked in lapack_lite ! This is likely to\ncause random crashes and wrong >>> results. See numpy INSTALL.txt for more\ninformation.' > > ---------------------------------------------------------------------- > Ran 3024 tests in 21.928s > FAILED (KNOWNFAIL=4, failures=1) > > > Any idea how to resolve this one? I use package manager to install > requirements. It seems g77 and gfortran are mixed for lapack, but not sure > how to fix it. When I try to uninstall gfortran it tries to remove > lapack/blas/atlas all. Use python setup.py build_ext --fcompiler=gnu95 Maybe removing g77 works as well, cheers, David From gokhansever at gmail.com Fri Aug 27 20:30:41 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 19:30:41 -0500 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 7:09 PM, David Cournapeau wrote: > Use python setup.py build_ext --fcompiler=gnu95 > > Maybe removing g77 works as well, > > cheers, > > David > Both results with the same test failure. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 20:38:33 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 19:38:33 -0500 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 7:30 PM, Gökhan Sever wrote: > > > On Fri, Aug 27, 2010 at 7:09 PM, David Cournapeau wrote: > >> Use python setup.py build_ext --fcompiler=gnu95 >> >> Maybe removing g77 works as well, >> >> cheers, >> >> David >> > > Both results with the same test failure. > Sorry for the noise. With a clean install and using only gfortran all tests pass. -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 22:00:26 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 21:00:26 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: <4C780088.4090602@gmail.com> References: <4C780088.4090602@gmail.com> Message-ID: On Fri, Aug 27, 2010 at 1:14 PM, Bruce Southey wrote: > Hi, > It would be useful to know which test is involved regardless of the issue. > Can you please run the tests with the verbose option such as: > 'scipy.test(verbose=10)'? > > This may have something to do with your numpy problem so please fix that up > once you have identified the test (failure in linalg would confirm that). > > As a reference, my Fedora 13 x64 bit system uses gfortran - 'gcc version > 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)'. > > Did you completely wipe the previous numpy installation especially the > installed numpy files in $PATH2PYTHON/site-packages/ and remove any prior > build directories? > If so, then you need to create suitable site.cfg file to ensure the correct > compiler is being used because something has changed in either your distro > or numpy install. > > Bruce > Hello, Following David and your suggestions I resolved NumPy and SciPy install problems from the source repo. SciPy tests don't give any segfaults anymore.
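As an aside on the g77/gfortran mix-up above: which Fortran runtimes a built extension actually links can be checked by hand. The following is only a rough sketch of such a check (it assumes a Linux box with ldd on the path, and it is not the actual code of the test_lapack test):

import subprocess
import numpy.linalg.lapack_lite as lapack_lite

# Ask the dynamic linker which shared libraries the extension pulls in.
p = subprocess.Popen(["ldd", lapack_lite.__file__], stdout=subprocess.PIPE)
libs = p.communicate()[0].decode()

# g77 code links against libg2c, gfortran code against libgfortran;
# seeing both in one extension is the mismatch the test complains about.
linked = [name for name in ("libg2c", "libgfortran") if name in libs]
print("Fortran runtimes linked: %s" % ", ".join(linked))

If only libgfortran shows up after rebuilding with --fcompiler=gnu95, the mismatch is gone.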
From the verbose=10 run I get a long list of test execution giving: Ran 4668 tests in 105.707s FAILED (KNOWNFAIL=12, SKIP=34, failures=2) I couldn't get it to write into a file with piping -> python -c 'import scipy; scipy.test(verbose=10)' >> scipy_test How can I get it logging? I might upload the file somewhere for further investigation. -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 27 22:10:52 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 27 Aug 2010 20:10:52 -0600 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: References: <4C780088.4090602@gmail.com> Message-ID: On Fri, Aug 27, 2010 at 8:00 PM, Gökhan Sever wrote: > > > On Fri, Aug 27, 2010 at 1:14 PM, Bruce Southey wrote: > >> Hi, >> It would be useful to know which test is involved regardless of the issue. >> Can you please run the tests with the verbose option such as: >> 'scipy.test(verbose=10)'? >> >> This may have something to do with your numpy problem so please fix that >> up once you have identified the test (failure in linalg would confirm that). >> >> >> As a reference, my Fedora 13 x64 bit system uses gfortran - 'gcc version >> 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)'. >> >> Did you completely wipe the previous numpy installation especially the >> installed numpy files in $PATH2PYTHON/site-packages/ and remove any prior >> build directories? >> If so, then you need to create suitable site.cfg file to ensure the >> correct compiler is being used because something has changed in either your >> distro or numpy install. >> >> Bruce >> > > Hello, > > Following David and your suggestions I resolved NumPy and SciPy install > problems from the source repo. SciPy tests don't give any segfaults anymore. > From the verbose=10 run I get a long list of test execution giving: > > Ran 4668 tests in 105.707s > > FAILED (KNOWNFAIL=12, SKIP=34, failures=2) > > I couldn't get it to write into a file with piping -> python -c 'import scipy; > scipy.test(verbose=10)' >> scipy_test > > In bash: python -c 'import scipy; scipy.test(verbose=10)' &> scipy_test Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 22:30:34 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 21:30:34 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: References: <4C780088.4090602@gmail.com> Message-ID: On Fri, Aug 27, 2010 at 9:10 PM, Charles R Harris wrote: > > In bash: python -c 'import scipy; scipy.test(verbose=10)' &> scipy_test > > Chuck > Thanks Chuck, & does the trick. See the test results at http://pastebin.com/qg2x2vdV -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Aug 30 09:59:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 30 Aug 2010 09:59:01 -0400 Subject: [SciPy-Dev] Travis: test_continuous_basic.py Message-ID: Travis, Is there a reason why you disabled the Kolmogorov-Smirnov tests?
http://projects.scipy.org/scipy/changeset/6472/trunk/scipy/stats/tests/test_continuous_basic.py

180    yield check_distribution_rvs, dist, args, alpha, rvs
181
184    # yield check_distribution_rvs, dist, args, alpha, rvs

Josef

From pav at iki.fi  Tue Aug 31 18:01:44 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 31 Aug 2010 22:01:44 +0000 (UTC)
Subject: [SciPy-Dev] RFR: N-dimensional interpolation
References: 
Message-ID: 

Sun, 25 Jul 2010 15:35:11 +0000, Pauli Virtanen wrote:
> I took the Qhull by the horns, and wrote a straightforward `griddata`
> implementation for working in N-D:
[clip]

It's now committed to SVN (as it works-well-for-me). Post-mortem review
and testing is welcome.

http://projects.scipy.org/scipy/changeset/6653
http://projects.scipy.org/scipy/changeset/6655
http://projects.scipy.org/scipy/changeset/6657

What's in there is:

1) scipy.spatial.qhull

   Delaunay decomposition and some associated low-level N-d geometry
   routines.

2) scipy.interpolate.interpnd

   N-dimensional interpolation:

   1) Linear barycentric interpolation
   2) Cubic spline interpolation (2D-only, C1 continuous,
      approximately minimum-curvature).

3) scipy.interpolate.griddatand

   Convenience interface to the N-d interpolation classes.

What could be added:

- More comprehensive interface to other features of Qhull
- Using qhull_restore, qhull_save to store Qhull contexts instead of
  copying the relevant data?
- Optimizing the cubic interpolant
- Monotonic cubic interpolation
- Cubic interpolation in 3-d
- Natural neighbour interpolation
- etc.

***

Example:

import numpy as np

def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2

grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

points = np.random.rand(1000, 2)
values = func(points[:,0], points[:,1])

from scipy.interpolate import griddata
grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest')
grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')
grid_z2 = griddata(points, values, (grid_x, grid_y), method='cubic')

import matplotlib.pyplot as plt
plt.subplot(221)
plt.imshow(func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower')
plt.plot(points[:,0], points[:,1], 'k.', ms=1)
plt.title('Original')
plt.subplot(222)
plt.imshow(grid_z0.T, extent=(0,1,0,1), origin='lower')
plt.title('Nearest')
plt.subplot(223)
plt.imshow(grid_z1.T, extent=(0,1,0,1), origin='lower')
plt.title('Linear')
plt.subplot(224)
plt.imshow(grid_z2.T, extent=(0,1,0,1), origin='lower')
plt.title('Cubic')
plt.gcf().set_size_inches(6, 6)
plt.show()

-- 
Pauli Virtanen

From dwf at cs.toronto.edu  Tue Aug 31 18:24:48 2010
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Tue, 31 Aug 2010 18:24:48 -0400
Subject: [SciPy-Dev] RFR: N-dimensional interpolation
In-Reply-To: 
References: 
Message-ID: <17D79004-0D89-41E4-92FA-19819025A342@cs.toronto.edu>

On 2010-08-31, at 6:01 PM, Pauli Virtanen wrote:

> What's in there is:
>
> 1) scipy.spatial.qhull
>
>    Delaunay decomposition and some associated low-level N-d geometry
>    routines.
>
> 2) scipy.interpolate.interpnd
>
>    N-dimensional interpolation:
>
>    1) Linear barycentric interpolation
>    2) Cubic spline interpolation (2D-only, C1 continuous,
>       approximately minimum-curvature).
>
> 3) scipy.interpolate.griddatand
>
>    Convenience interface to the N-d interpolation classes.

I don't know if and when I'll have occasion to use this stuff, but I'm glad
it's there. Nice work, Pauli!

One comment: the name "griddatand" looks odd to my eyes, my mind wants to
group it as "datand".
I only mention this because it might slip past someone who's looking for "griddata" (also, does np.lookfor match partial words?). I don't really have a suggestion of another name though. "ndgriddata"? Then griddata looks a bit more separate, and it'd match scipy.ndimage. David From pav at iki.fi Tue Aug 31 18:28:03 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 31 Aug 2010 22:28:03 +0000 (UTC) Subject: [SciPy-Dev] RFR: N-dimensional interpolation References: <17D79004-0D89-41E4-92FA-19819025A342@cs.toronto.edu> Message-ID: Tue, 31 Aug 2010 18:24:48 -0400, David Warde-Farley wrote: [clip] > One comment: the name "griddatand" looks odd to my eyes, my mind wants > to group it as "datand". I only mention this because it might slip past > someone who's looking for "griddata" "griddatand" is the name of the module, and probably nobody will need to use it since the stuff is imported to top-level scipy.interpolate. (I can't use "griddata" there since it would shadow the function name.) > (also, does np.lookfor match partial words?). Yep. > I don't really have a suggestion of another name > though. "ndgriddata"? Then griddata looks a bit more separate, and it'd > match scipy.ndimage. That might be a slightly better name, yes. -- Pauli Virtanen From fperez.net at gmail.com Tue Aug 31 20:31:03 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 31 Aug 2010 17:31:03 -0700 Subject: [SciPy-Dev] Scipy.org down (again) Message-ID: Howdy, the usual... Cheers, f
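Following up on the N-d interpolation example above: the griddata convenience function sits on top of a Delaunay triangulation plus the interpolator classes, and those pieces can also be used directly. The sketch below uses the names these objects ended up with in released SciPy (scipy.spatial.Delaunay, scipy.interpolate.LinearNDInterpolator, scipy.interpolate.CloughTocher2DInterpolator); the exact names and signatures in the changesets quoted above may differ, so treat this as an illustration rather than a description of the committed code.

import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator, CloughTocher2DInterpolator

points = np.random.rand(1000, 2)
values = points[:, 0] * (1 - points[:, 0]) + points[:, 1]   # any scalar samples

# The Delaunay triangulation used by the linear/cubic methods can be
# built and inspected on its own:
tri = Delaunay(points)
print(tri.points.shape)                  # (1000, 2)
print(tri.find_simplex([[0.5, 0.5]]))    # index of the simplex containing the point

# The interpolators are callables: the (relatively expensive) setup happens
# once, and the same object can then be evaluated on any number of grids.
lin = LinearNDInterpolator(points, values)        # linear barycentric
cub = CloughTocher2DInterpolator(points, values)  # C1 cubic, 2-D only

grid_x, grid_y = np.mgrid[0:1:50j, 0:1:50j]
z_lin = lin(grid_x, grid_y)   # NaN outside the convex hull of the points
z_cub = cub(grid_x, grid_y)

Evaluating on a second, finer grid then only costs the evaluation itself, which is the main practical difference from calling griddata repeatedly on the same scattered points.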