From d.l.goldsmith at gmail.com  Sun Aug  1 00:18:30 2010
From: d.l.goldsmith at gmail.com (David Goldsmith)
Date: Sat, 31 Jul 2010 21:18:30 -0700
Subject: [SciPy-Dev] Status of scipy.* docstrings
Message-ID:

Hi, folks!  Except for scipy.stats, the docstrings of all sub-packages
immediately "below" scipy have now had autosummary directives for all (with
one exception) their objects added to them, with either the existing
description if there already was one, or the description pulled in by the
autosummary (perhaps paraphrased if necessary to satisfy the 75-character
restriction) if said pulling worked, or "TODO" if neither of those
conditions was met; scipy.stats was already partially under autosummary
"control" when I checked it, so I've left it alone pending further info as
to whether this incompleteness is intentional or not.  I will continue to
work my way down the namespace tree - at a reduced pace - but I just wanted
to announce that this *ad hoc* (i.e., "unofficial") milestone has been met,
and point out that some of the holes alluded to above can serve as pointers
to places that need work, e.g., places where the autosummary directive
either pulled nothing, "failed to parse" the summary, or pulled an
excessively long description (indicating, IIUC, a docstring with an
excessively long Brief Summary).

Thanks for all you do (and thanks for all the kudos that have come in since
my resignation announcement).

DG

--
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

From josef.pktd at gmail.com  Sun Aug  1 05:48:08 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 1 Aug 2010 05:48:08 -0400
Subject: [SciPy-Dev] Status of scipy.* docstrings
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2010 at 12:18 AM, David Goldsmith wrote:
> Hi, folks!  Except for scipy.stats, the docstrings of all sub-packages
> immediately "below" scipy have now had autosummary directives for all
> (with one exception) their objects added to them, with either the existing
> description if there already was one, or the description pulled in by the
> autosummary (perhaps paraphrased if necessary to satisfy the 75-character
> restriction) if said pulling worked, or "TODO" if neither of those
> conditions was met; scipy.stats was already partially under autosummary
> "control" when I checked it, so I've left it alone pending further info
> as to whether this incompleteness is intentional or not.  I will continue
> to work my way down the namespace tree - at a reduced pace - but I just
> wanted to announce that this ad hoc (i.e., "unofficial") milestone has
> been met, and point out that some of the holes alluded to above can serve
> as pointers to places that need work, e.g., places where the autosummary
> directive either pulled nothing, "failed to parse" the summary, or pulled
> an excessively long description (indicating, IIUC, a docstring with an
> excessively long Brief Summary).

Is there now a way to handle autosummary and similar directives in python
modules, e.g. info.py?

What's the pattern/recommendation now for content in the module docstring,
in __init__.py or info.py, versus the subpackage rst file?
(I'm still trying to catch up with recent changes.)

> Thanks for all you do (and thanks for all the kudos that have come in
> since my resignation announcement).
Also a big thank you from me, especially for getting a more consistent structure into the docs. Josef > > DG > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From d.l.goldsmith at gmail.com Sun Aug 1 17:21:12 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 1 Aug 2010 14:21:12 -0700 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: Hi, josef, and thanks! On Sun, Aug 1, 2010 at 2:48 AM, wrote: > On Sun, Aug 1, 2010 at 12:18 AM, David Goldsmith > wrote: > > Hi, folks! Except for scipy.stats, the docstrings of all sub-packages > > immediately "below" scipy have now had autosummary directives for all > (with > > one exception) their objects added to them, with either the existing > > description if there already was one, or the description pulled in by the > > autosummary (perhaps paraphrased if necessary to satisfy the 75 character > > restriction) if said pulling worked, or "TODO" if neither of those > > conditions were met; scipy.stats was already partially under autosummary > > "control" when I checked it, so I've left it alone pending further info > as > > to whether this incompleteness is intentional or not. I will continue to > > work my way down the namespace tree - at a reduced pace - but I just > wanted > > to announce that this ad hoc (i.e., "unofficial") milestone has been met, > > and point out that some of the holes alluded to above can serve as > pointers > > to places that need work, e.g., places where the autosummary directive > > either pulled nothing, "failed to parse" the summary, or pulled an > > excessively long description (indicating, IIUC, a docstring w/ an > > excessively long Brief Summary). > > Is there a way to handle now autosummary and similar directives in > python modules, e.g. info.py. > Please clarify precisely what you mean: exactly what problem(s) are you seeing/having? > What's the pattern/recommendation now for content in the module > docstring, in __init__.py or info.py, versus subpackage rst file ? > I don't think a formal "policy" was ever formally adopted. I, rather unilaterally, took it upon myself to "standardize" to using the autosummary directive in sub-package and module docstrings on the grounds that it assures consistency across at least two presentations: the target object docstring and its one line summary in the auto-rendering of the docstring of its parent namespace (unfortunately, it doesn't assure consistency in the "terminal" presentation of the latter docstring--presently, that has to be done manually--but maybe automation of that too can be added down the road). I had no qualms about making the unilateral decision to do this because: a) I felt my reasoning for doing so was consistent w/ our general philosophy of fighting docstring divergence, and b) my change could always be reverted. Anyway, there it is: if people feel that this is how we should continue, then there are now a bunch of examples to follow; if they don't, the changes can be reverted and a different approach to consistency and minimal maintenance can be proposed. As far as narrative content in these top level docstrings is concerned, I did not provide any where it did not already exist - that open issue is still open AFAIC. 
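For readers who have not used the directive, a minimal sketch of the kind of
sub-package docstring being described; the package and object names below are
placeholders for illustration only, not actual SciPy content:

    """
    Widget tools (scipy.widget)
    ===========================

    .. autosummary::

       frobnicate
       Gadget
    """

When Sphinx renders such a docstring, autosummary places each object's
one-line summary next to its name, which is the consistency between the
parent listing and the target docstring referred to above.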
DG > > (I'm still trying to catch up with recent changes.) > > > > > Thanks for all you do (and thanks for all the kudos that have come in > post > > my resignation announcement). > > Also a big thank you from me, especially for getting a more consistent > structure into the docs. > > Josef > > > > > > DG > > > > -- > > Mathematician: noun, someone who disavows certainty when their > uncertainty > > set is non-empty, even if that set has measure zero. > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Aug 1 18:34:34 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 1 Aug 2010 18:34:34 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Sun, Aug 1, 2010 at 5:21 PM, David Goldsmith wrote: > Hi, josef, and thanks! > > On Sun, Aug 1, 2010 at 2:48 AM, wrote: >> >> On Sun, Aug 1, 2010 at 12:18 AM, David Goldsmith >> wrote: >> > Hi, folks!? Except for scipy.stats, the docstrings of all sub-packages >> > immediately "below" scipy have now had autosummary directives for all >> > (with >> > one exception) their objects added to them, with either the existing >> > description if there already was one, or the description pulled in by >> > the >> > autosummary (perhaps paraphrased if necessary to satisfy the 75 >> > character >> > restriction) if said pulling worked, or "TODO" if neither of those >> > conditions were met; scipy.stats was already partially under autosummary >> > "control" when I checked it, so I've left it alone pending further info >> > as >> > to whether this incompleteness is intentional or not.? I will continue >> > to >> > work my way down the namespace tree - at a reduced pace - but I just >> > wanted >> > to announce that this ad hoc (i.e., "unofficial") milestone has been >> > met, >> > and point out that some of the holes alluded to above can serve as >> > pointers >> > to places that need work, e.g., places where the autosummary directive >> > either pulled nothing, "failed to parse" the summary, or pulled an >> > excessively long description (indicating, IIUC, a docstring w/ an >> > excessively long Brief Summary). >> >> Is there a way to handle now autosummary and similar directives in >> python modules, e.g. info.py. > > Please clarify precisely what you mean: exactly what problem(s) are you > seeing/having? > >> >> What's the pattern/recommendation now for content in the module >> docstring, in __init__.py or info.py, versus subpackage rst file ? > > I don't think a formal "policy" was ever formally adopted.? 
I, rather > unilaterally, took it upon myself to "standardize" to using the autosummary > directive in sub-package and module docstrings on the grounds that it > assures consistency across at least two presentations: the target object > docstring and its one line summary in the auto-rendering of the docstring of > its parent namespace (unfortunately, it doesn't assure consistency in the > "terminal" presentation of the latter docstring--presently, that has to be > done manually--but maybe automation of that too can be added down the > road).? I had no qualms about making the unilateral decision to do this > because: a) I felt my reasoning for doing so was consistent w/ our general > philosophy of fighting docstring divergence, and b) my change could always > be reverted.? Anyway, there it is: if people feel that this is how we should > continue, then there are now a bunch of examples to follow; if they don't, > the changes can be reverted and a different approach to consistency and > minimal maintenance can be proposed.? As far as narrative content in these > top level docstrings is concerned, I did not provide any where it did not > already exist - that open issue is still open AFAIC. My impression was that module docstrings, in __init__.py and info.py are mainly for the commandline/interpreter, and I thought for most subpackages they are or were not included in the sphinx rendered docs. In this case, autosummary would be noise in the interpreter and not picked up by sphinx, which uses the corresponding rst files. (But I lost a bit the overview how and which parts interpreter, doceditor and sphinx render.) Josef > > DG > >> >> (I'm still trying to catch up with recent changes.) >> >> > >> > Thanks for all you do (and thanks for all the kudos that have come in >> > post >> > my resignation announcement). >> >> Also a big thank you from me, especially for getting a more consistent >> structure into the docs. >> >> Josef >> >> >> > >> > DG >> > >> > -- >> > Mathematician: noun, someone who disavows certainty when their >> > uncertainty >> > set is non-empty, even if that set has measure zero. >> > >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide.? (As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From scott.sinclair.za at gmail.com Mon Aug 2 02:51:01 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 2 Aug 2010 08:51:01 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: >On 2 August 2010 00:34, wrote: > > My impression was that module docstrings, in __init__.py and info.py > are mainly for the commandline/interpreter, and I thought for most > subpackages they are or were not included in the sphinx rendered docs. This is correct, __init__.py and info.py are not included in the Sphinx rendered docs. 
I think Pauli hand copied their contents into the .rst files when he created those. > In this case, autosummary would be noise in the interpreter and not > picked up by sphinx, which uses the corresponding rst files. The autosummary directives have only been added in the .rst files, not the __init__.py and info.py files, so the Sphinx markup won't appear in the interpreter. The .rst files and corresponding __init__/info.py files currently have no link and will need to be separately maintained. It shouldn't be hard to strip out the Sphinx directives from the .rst files and periodically copy the updated content into the __init__.py and info.py, so it probably makes sense to work on the .rst files for now. Cheers, Scott From josef.pktd at gmail.com Mon Aug 2 05:32:26 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Aug 2010 05:32:26 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Mon, Aug 2, 2010 at 2:51 AM, Scott Sinclair wrote: >>On 2 August 2010 00:34, ? wrote: >> >> My impression was that module docstrings, in __init__.py and info.py >> are mainly for the commandline/interpreter, and I thought for most >> subpackages they are or were not included in the sphinx rendered docs. > > This is correct, __init__.py and info.py are not included in the > Sphinx rendered docs. I think Pauli hand copied their contents into > the .rst files when he created those. > >> In this case, autosummary would be noise in the interpreter and not >> picked up by sphinx, which uses the corresponding rst files. > > The autosummary directives have only been added in the > .rst files, not the __init__.py and info.py files, so the > Sphinx markup won't appear in the interpreter. The .rst > files and corresponding __init__/info.py files currently have no link > and will need to be separately maintained. > > It shouldn't be hard to strip out the Sphinx directives from the > .rst files and periodically copy the updated content into > the __init__.py and info.py, so it probably makes sense to work on the > .rst files for now. my understanding: >From the source tab in http://docs.scipy.org/scipy/docs/scipy.fftpack/ , it looks like this is destined for info.py which is pulled in by __init__.py. The rst docs are at http://docs.scipy.org/scipy/docs/scipy-docs/fftpack.rst/ Cheers, Josef > Cheers, > Scott > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From scott.sinclair.za at gmail.com Mon Aug 2 06:34:24 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 2 Aug 2010 12:34:24 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: >On 2 August 2010 11:32, wrote: > On Mon, Aug 2, 2010 at 2:51 AM, Scott Sinclair > wrote: >>>On 2 August 2010 00:34, ? wrote: >>> >>> My impression was that module docstrings, in __init__.py and info.py >>> are mainly for the commandline/interpreter, and I thought for most >>> subpackages they are or were not included in the sphinx rendered docs. >> >> This is correct, __init__.py and info.py are not included in the >> Sphinx rendered docs. I think Pauli hand copied their contents into >> the .rst files when he created those. >> >>> In this case, autosummary would be noise in the interpreter and not >>> picked up by sphinx, which uses the corresponding rst files. 
>> >> The autosummary directives have only been added in the >> .rst files, not the __init__.py and info.py files, so the >> Sphinx markup won't appear in the interpreter. The .rst >> files and corresponding __init__/info.py files currently have no link >> and will need to be separately maintained. >> >> It shouldn't be hard to strip out the Sphinx directives from the >> .rst files and periodically copy the updated content into >> the __init__.py and info.py, so it probably makes sense to work on the >> .rst files for now. > > my understanding: > > >From the source tab in http://docs.scipy.org/scipy/docs/scipy.fftpack/ > , it looks like this is destined for info.py which is pulled in by > __init__.py. > The rst docs are at http://docs.scipy.org/scipy/docs/scipy-docs/fftpack.rst/ Hmm. Good point. I was looking at the way the docs are built from the source tree, not at what was edited in the doc-editor. When the documentation is generated from the source tree, the Sphinx master document doc/source/index.rst pulls in doc/source/fftpack.rst, not __init.__.py. In the doc-editor doc/source/index.rst is at http://docs.scipy.org/scipy/docs/scipy-docs/index.rst/. The solution would be to use the recent edits at http://docs.scipy.org/scipy/docs/scipy./ to update what's at http://docs.scipy.org/scipy/docs/scipy-docs/.rst/ then remove the Sphinx directives from http://docs.scipy.org/scipy/docs/scipy./ or just revert them to what's in trunk for now. Maybe the http://docs.scipy.org/scipy/docs/scipy./ docstrings should also be marked as unimportant to warn people that the situation is a little tricky to unravel.. Cheers, Scott From pav at iki.fi Mon Aug 2 07:24:29 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 2 Aug 2010 11:24:29 +0000 (UTC) Subject: [SciPy-Dev] Status of scipy.* docstrings References: Message-ID: Mon, 02 Aug 2010 12:34:24 +0200, Scott Sinclair wrote: [clip] > Maybe the http://docs.scipy.org/scipy/docs/scipy./ > docstrings should also be marked as unimportant to warn people that the > situation is a little tricky to unravel.. A valid alternative is just to put all of the documentation to the info.py, and just put .. automodule:: scipy.optimize to the optimize.rst. Autosummary directives work correctly in submodule docstrings. For instance, this page: http://docs.scipy.org/doc/numpy/reference/routines.fft.html comes solely from ``numpy/fft/info.py``: http://docs.scipy.org/doc/numpy/_sources/reference/routines.fft.txt http://docs.scipy.org/numpy/source/numpy/dist/lib64/python2.4/site-packages/numpy/fft/info.py You can also write something like .. autosummary:: :toctree: some_function Short blurb describing what it does and the "Short blurb ..." will be ignored in the HTML output, but it is useful for the people looking at the text via help(). -- Pauli Virtanen From scott.sinclair.za at gmail.com Mon Aug 2 09:11:42 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 2 Aug 2010 15:11:42 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 2 August 2010 13:24, Pauli Virtanen wrote: > Mon, 02 Aug 2010 12:34:24 +0200, Scott Sinclair wrote: > [clip] >> Maybe the http://docs.scipy.org/scipy/docs/scipy./ >> docstrings should also be marked as unimportant to warn people that the >> situation is a little tricky to unravel.. > > A valid alternative is just to put all of the documentation to the > info.py, and just put > > ? ? ? ?.. automodule:: scipy.optimize > > to the optimize.rst. This is the approach I prefer as well. 
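Concretely, the layout being endorsed here looks something like the
following sketch (scipy.optimize is taken from the example above; the file
contents and the fmin entry are illustrative, not verbatim SciPy source):

    doc/source/optimize.rst (the entire file):

        .. automodule:: scipy.optimize

    scipy/optimize/info.py (the module docstring carries the content):

        """
        Optimization and root finding (scipy.optimize)
        ==============================================

        .. autosummary::
           :toctree:

           fmin
               Minimize a function using the downhill simplex algorithm.
        """

    scipy/optimize/__init__.py (re-exports the docstring):

        from info import __doc__

The indented blurb under ``fmin`` is the "short blurb" mentioned above:
ignored in the HTML output, but visible via help().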
I tried to suggest it the last time we had this discussion (http://mail.scipy.org/pipermail/scipy-dev/2010-June/015075.html). Then there is only one place to keep the docs up to date, the downside being that a bit of Sphinx markup will be seen in the terminal help for sub-packages. The question is whether there actually is a strong aversion to seeing Sphinx markup in the terminal help at the top-level of the sub-packages. If it doesn't bother too many people, then your suggestion is the right way to go for all of the Scipy sub-packages. Cheers, Scott From d.l.goldsmith at gmail.com Thu Aug 5 04:17:16 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 5 Aug 2010 01:17:16 -0700 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: OK, so, should I stop adding autosummaries to module docstrings and revert the ones I did? DG On Mon, Aug 2, 2010 at 6:11 AM, Scott Sinclair wrote: > On 2 August 2010 13:24, Pauli Virtanen wrote: > > Mon, 02 Aug 2010 12:34:24 +0200, Scott Sinclair wrote: > > [clip] > >> Maybe the http://docs.scipy.org/scipy/docs/scipy./ > >> docstrings should also be marked as unimportant to warn people that the > >> situation is a little tricky to unravel.. > > > > A valid alternative is just to put all of the documentation to the > > info.py, and just put > > > > .. automodule:: scipy.optimize > > > > to the optimize.rst. > > This is the approach I prefer as well. I tried to suggest it the last > time we had this discussion > (http://mail.scipy.org/pipermail/scipy-dev/2010-June/015075.html). > Then there is only one place to keep the docs up to date, the downside > being that a bit of Sphinx markup will be seen in the terminal help > for sub-packages. > > The question is whether there actually is a strong aversion to seeing > Sphinx markup in the terminal help at the top-level of the > sub-packages. If it doesn't bother too many people, then your > suggestion is the right way to go for all of the Scipy sub-packages. > > Cheers, > Scott > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Aug 5 05:01:25 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 5 Aug 2010 09:01:25 +0000 (UTC) Subject: [SciPy-Dev] Status of scipy.* docstrings References: Message-ID: Thu, 05 Aug 2010 01:17:16 -0700, David Goldsmith wrote: > OK, so, should I stop adding autosummaries to module docstrings and > revert the ones I did? I think the Sphinx markup involved is not heavy, and having to maintain two nearly identical documents is not something we really want to do. It might be possible to autogenerate the info.py's, but frankly, I think setting that up is not a very useful use of time, just to avoid a few RST directives. We can think about it later, but for now the priority should be to get some useful information both to the HTML docs and to the command-line help, and putting everything to info.py seems the way to go for me. I'd at least be OK with moving everything from the *.rst files to info.py. 
In general, I'd like to structure `info.py` in a similar way as it's in `numpy.fft`: - module name title etc. on top - function/class listing first - followed by background information (if any) needed to understand what the module is intended to do - the corresponding .rst file contains only the line .. automodule:: scipy.interpolate The only exception is probably extensive examples, or extensive background information, which should probably be retained in the *.rst part, and maybe be split into several pages. -- Pauli Virtanen From josef.pktd at gmail.com Thu Aug 5 06:55:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Aug 2010 06:55:37 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Thu, Aug 5, 2010 at 5:01 AM, Pauli Virtanen wrote: > Thu, 05 Aug 2010 01:17:16 -0700, David Goldsmith wrote: >> OK, so, should I stop adding autosummaries to module docstrings and >> revert the ones I did? > > I think the Sphinx markup involved is not heavy, and having to maintain > two nearly identical documents is not something we really want to do. > > It might be possible to autogenerate the info.py's, but frankly, I think > setting that up is not a very useful use of time, just to avoid a few RST > directives. We can think about it later, but for now the priority should > be to get some useful information both to the HTML docs and to the > command-line help, and putting everything to info.py seems the way to go > for me. > > I'd at least be OK with moving everything from the *.rst files to > info.py. In general, I'd like to structure `info.py` in a similar way as > it's in `numpy.fft`: > > - module name title etc. on top > > - function/class listing first > > - followed by background information (if any) needed to understand > ?what the module is intended to do > > - the corresponding .rst file contains only the line > > ?.. automodule:: scipy.interpolate > > The only exception is probably extensive examples, or extensive > background information, which should probably be retained in the *.rst > part, and maybe be split into several pages. One issue is the amount of math/latex, given the discussion we had on fftpack. Do we restrict latex in the module docstring as in function or class docstrings, or is it allowed to be used not only very sparingly? Since info.py files can also be edited in the module editor, I also think removing the duplication is a good idea. A related question that is not urgent: Are we keeping the split of information between tutorial and sub-package rst pages, or should some background information be moved to the package rst files ? Josef > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From scott.sinclair.za at gmail.com Thu Aug 5 07:03:30 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 5 Aug 2010 13:03:30 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 5 August 2010 11:01, Pauli Virtanen wrote: > Thu, 05 Aug 2010 01:17:16 -0700, David Goldsmith wrote: >> OK, so, should I stop adding autosummaries to module docstrings and >> revert the ones I did? > > I think the Sphinx markup involved is not heavy, and having to maintain > two nearly identical documents is not something we really want to do. > > I'd at least be OK with moving everything from the *.rst files to > info.py. 
In general, I'd like to structure `info.py` in a similar way as > it's in `numpy.fft`: > > - module name title etc. on top > > - function/class listing first > > - followed by background information (if any) needed to understand > ?what the module is intended to do > > - the corresponding .rst file contains only the line > > ?.. automodule:: scipy.interpolate This sounds like a good plan. Just a note that all the edits made at http://docs.scipy.org/scipy/docs/scipy. result in patches from the doc-editor that target scipy//__init__.py in the source tree. If the patch is applied as is, the work from the doc-editor won't appear in the terminal because the .__doc__ is overwritten with the content of scipy//info.py on import of the sub-package. I expect that Sphinx will also end up with the docstring from info.py for the same reason, but don't have time to check right now. Cheers, Scott From scott.sinclair.za at gmail.com Thu Aug 5 07:10:47 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 5 Aug 2010 13:10:47 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 5 August 2010 12:55, wrote: > A related question that is not urgent: Are we keeping the split of > information between tutorial and sub-package rst pages, or should some > background information be moved to the package rst files ? I think it makes sense to keep a split between the two. Tutorials should describe how to use a selection of the sub-package tools by example, while the sub-package rst pages should be the top level of reference documentation for the sub-package. Cheers, Scott From josef.pktd at gmail.com Thu Aug 5 07:23:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 5 Aug 2010 07:23:14 -0400 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On Thu, Aug 5, 2010 at 7:10 AM, Scott Sinclair wrote: > On 5 August 2010 12:55, ? wrote: >> A related question that is not urgent: Are we keeping the split of >> information between tutorial and sub-package rst pages, or should some >> background information be moved to the package rst files ? > > I think it makes sense to keep a split between the two. Tutorials > should describe how to use a selection of the sub-package tools by > example, while the sub-package rst pages should be the top level of > reference documentation for the sub-package. Some tutorial pages also contain the mathematical definitions and background explanations of the algorithms, e.g. signal.lfilter and fft, just two examples that I remember. Also having the definitions/formulas of the stats.distributions in the tutorials hides them a bit. Those parts might better belong in a package documentation. Usage examples of course should remain in the tutorials. Josef Josef > > Cheers, > Scott > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pav at iki.fi Thu Aug 5 08:20:41 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 5 Aug 2010 12:20:41 +0000 (UTC) Subject: [SciPy-Dev] Status of scipy.* docstrings References: Message-ID: Thu, 05 Aug 2010 13:03:30 +0200, Scott Sinclair wrote: [clip] > Just a note that all the edits made at > http://docs.scipy.org/scipy/docs/scipy. result in patches > from the doc-editor that target scipy//__init__.py in the > source tree. 
If the patch is applied as is, the work from the doc-editor > won't appear in the terminal because the .__doc__ is > overwritten with the content of scipy//info.py on import of > the sub-package. I expect that Sphinx will also end up with the > docstring from info.py for the same reason, but don't have time to check > right now. Correct. Since the __init__.py do from info import __doc__ it's not possible to find out by introspection where the __doc__ actually came from. To get the patches to the correct place would need some extra smartness and special casing in pydoc-tool.py -- Pauli Virtanen From scott.sinclair.za at gmail.com Thu Aug 5 09:47:57 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 5 Aug 2010 15:47:57 +0200 Subject: [SciPy-Dev] Status of scipy.* docstrings In-Reply-To: References: Message-ID: On 5 August 2010 14:20, Pauli Virtanen wrote: > Thu, 05 Aug 2010 13:03:30 +0200, Scott Sinclair wrote: > [clip] >> Just a note that all the edits made at >> http://docs.scipy.org/scipy/docs/scipy. result in patches >> from the doc-editor that target scipy//__init__.py in the >> source tree. If the patch is applied as is, the work from the doc-editor >> won't appear in the terminal because the .__doc__ is >> overwritten with the content of scipy//info.py on import of >> the sub-package. I expect that Sphinx will also end up with the >> docstring from info.py for the same reason, but don't have time to check >> right now. > > Correct. Since the __init__.py do > > ? ? ? ?from info import __doc__ > > it's not possible to find out by introspection where the __doc__ actually > came from. To get the patches to the correct place would need some extra > smartness and special casing in pydoc-tool.py Or some extra cut-and-paste work from whoever applies the patches to trunk, which is why I brought it up. Cheers, Scott From d.l.goldsmith at gmail.com Thu Aug 5 16:52:28 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 5 Aug 2010 13:52:28 -0700 Subject: [SciPy-Dev] scipy.org down? Message-ID: I'm getting "Network Timeout" failures trying to visit www.scipy.org and mail.scipy.org... DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Thu Aug 5 19:07:16 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 5 Aug 2010 19:07:16 -0400 Subject: [SciPy-Dev] scipy.org down? In-Reply-To: References: Message-ID: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> On 2010-08-05, at 4:52 PM, David Goldsmith wrote: > I'm getting "Network Timeout" failures trying to visit www.scipy.org and mail.scipy.org... "easy_install ipython" is also failing with timeouts. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwang at enthought.com Thu Aug 5 22:34:39 2010 From: pwang at enthought.com (Peter Wang) Date: Thu, 5 Aug 2010 21:34:39 -0500 Subject: [SciPy-Dev] scipy.org down? In-Reply-To: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> References: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> Message-ID: On Thu, Aug 5, 2010 at 6:07 PM, David Warde-Farley wrote: > On 2010-08-05, at 4:52 PM, David Goldsmith wrote: > > I'm getting "Network Timeout" failures trying to visit www.scipy.org and > mail.scipy.org... > > "easy_install ipython" is also failing with timeouts. > David The scipy.org web page seems to be back up. What URL is easy_install timing out on? 
-Peter From dwf at cs.toronto.edu Fri Aug 6 01:11:37 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 6 Aug 2010 01:11:37 -0400 Subject: [SciPy-Dev] scipy.org down? In-Reply-To: References: <4C7EF0B7-4F53-4D6C-8CBF-1A07ABABA604@cs.toronto.edu> Message-ID: <8FBF440D-83C1-4020-9F63-71DBFC70AC66@cs.toronto.edu> It's also fixed. Basically the package tarballs are hosted on ipython.scipy.org (which is maybe not a good idea given scipy.org's yoyo behaviour lately -- even numpy and scipy host on sourceforge) David On 2010-08-05, at 10:34 PM, Peter Wang wrote: > On Thu, Aug 5, 2010 at 6:07 PM, David Warde-Farley wrote: >> On 2010-08-05, at 4:52 PM, David Goldsmith wrote: >> >> I'm getting "Network Timeout" failures trying to visit www.scipy.org and >> mail.scipy.org... >> >> "easy_install ipython" is also failing with timeouts. >> David > > The scipy.org web page seems to be back up. What URL is easy_install > timing out on? > > > -Peter > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From cimrman3 at ntc.zcu.cz Fri Aug 6 12:52:31 2010 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 06 Aug 2010 18:52:31 +0200 Subject: [SciPy-Dev] ANN: SfePy 2010.3 Message-ID: <4C5C3DCF.5010100@ntc.zcu.cz> I am pleased to announce release 2010.3 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. Development, mailing lists, issue tracking: http://sfepy.org Documentation: http://docs.sfepy.org/doc Git repository: http://github.com/sfepy Project page: http://sfepy.kme.zcu.cz Highlights of this release -------------------------- - significantly rewritten code for better interactive use - cleaner and simpler high level interface - new tutorial section: - Interactive Example: Linear Elasticity [1] [1] http://docs.sfepy.org/doc/tutorial.html#interactive-example-linear-elasticity Major improvements ------------------ Apart from many bug-fixes, let us mention: - new examples: - demonstration of the high level interface - new tests: - tests of the new high level interface - simplified but more powerful homogenization engine For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2010.3_RELEASE_NOTES.txt (full release notes, rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladim?r Luke?, Osman, Andre Smit, Logan Sorenson From scopatz at gmail.com Mon Aug 9 15:31:14 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Mon, 9 Aug 2010 14:31:14 -0500 Subject: [SciPy-Dev] Contingency Table Model Message-ID: Hello All, I have just opened a ticket (http://projects.scipy.org/scipy/ticket/1258) that adds a general contingency table class to the the stats package. This class includes methods to slice and collapse the table as well a calculate metrics such as chi-squared and entropy. This implementation came out of Warren Weckesser and me working on this over the SciPy 2010 statistics sprint. Please take a look! Comments and suggestions are always welcome. Be Well, Anthony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Mon Aug 9 16:11:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Aug 2010 16:11:10 -0400 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: Message-ID: On Mon, Aug 9, 2010 at 3:31 PM, Anthony Scopatz wrote: > Hello All, > I have just opened a ticket > (http://projects.scipy.org/scipy/ticket/1258)?that adds a general > contingency table class to the the stats package. ?This class includes > methods to slice and?collapse?the table as well a calculate metrics such as > chi-squared and entropy. > This implementation came out of Warren?Weckesser and me working on this over > the SciPy 2010 statistics sprint. > Please take a look! ?Comments and suggestions are always welcome. just a quick question that I don't understand from a brief look at the source Isn't the core of "from_columns" doing the same quantization as np.histogramdd? ( I haven't looked closely enough yet) If x in from_columns is a tuple, then an array_like could also contain strings, e.g. names/levels of a categorical variable. I'm not sure how far this should go. other ideas methods or functions "from_flat" and "to_flat" would be useful. chi2 could be renamed to chi2_indep, or take an optional expected keyword, where the user could specify other distribution hypotheses. Josef > Be Well, > Anthony > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From bsouthey at gmail.com Mon Aug 9 16:35:34 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Aug 2010 15:35:34 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: Message-ID: <4C606696.6070707@gmail.com> On 08/09/2010 02:31 PM, Anthony Scopatz wrote: > Hello All, > > I have just opened a ticket > (http://projects.scipy.org/scipy/ticket/1258) that adds a general > contingency table class to the the stats package. This class includes > methods to slice and collapse the table as well a calculate metrics > such as chi-squared and entropy. > > This implementation came out of Warren Weckesser and me working on > this over the SciPy 2010 statistics sprint. > > Please take a look! Comments and suggestions are always welcome. > Be Well, > Anthony > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev Some points: 1) You can not use numpy's asarray function without checking the input type. You must be aware of at least masked arrays and Matrix inputs as well as new data types. 2) You can not force a dtype on the user - on line 54 when you can provide optional precision. 3) Can you please clarify lines 112-113? " scipy.stats.chisquare -- one-way chi-square test (which is not the same as the n-way test with n=1)." This needs to be a little more clear because the exact same test statistic is being used. In fact the function must give the correct answer with 1d array. 4) Related to point 3, lines 72-74 are not correct, see http://en.wikipedia.org/wiki/Pearson's_chi-square_test 5) You must allow the user to provide their own expected values 6) Users need to be able to control the output - really I don't want to see the table of expected values unless requested. Also a user might just want the table of expected values and nothing else. 7) You should not need the chi2 function. 8) More generally, what is the need for having an ContingencyTable object? 
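As background for points 3) and 4), a small self-contained sketch of the
computation being debated -- this is not the code from the ticket, just an
illustration of how the expected table and the Pearson chi-square statistic
for a 2-D table are obtained (the counts are made up):

    import numpy as np
    from scipy import stats

    # Observed 2x3 table of counts (hypothetical data).
    obs = np.array([[10., 20., 30.],
                    [15., 25., 35.]])

    # Expected counts under independence: outer product of the marginal
    # totals divided by the grand total.
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

    # Pearson chi-square statistic, degrees of freedom and p-value.
    chi2_stat = ((obs - expected)**2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    p_value = stats.chi2.sf(chi2_stat, dof)

The statistic itself, the sum of (observed - expected)**2 / expected over
all cells, is the same one used by scipy.stats.chisquare for the 1-D case;
only the expected table and the degrees of freedom differ.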
Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Aug 9 16:47:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Aug 2010 16:47:12 -0400 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: <4C606696.6070707@gmail.com> References: <4C606696.6070707@gmail.com> Message-ID: On Mon, Aug 9, 2010 at 4:35 PM, Bruce Southey wrote: > > On 08/09/2010 02:31 PM, Anthony Scopatz wrote: > > Hello All, > I have just opened a ticket > (http://projects.scipy.org/scipy/ticket/1258)?that adds a general > contingency table class to the the stats package. ?This class includes > methods to slice and?collapse?the table as well a calculate metrics such as > chi-squared and entropy. > This implementation came out of Warren?Weckesser and me working on this over > the SciPy 2010 statistics sprint. > Please take a look! ?Comments and suggestions are always welcome. > Be Well, > Anthony > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > Some points: > > 1) You can not use numpy's asarray function without checking the input type. > You must be aware of at least masked arrays and Matrix inputs as well as new > data types. > > 2) You can not force a dtype on the user -? on line 54 when you can provide > optional precision. > > 3) Can you please clarify lines 112-113? > "? scipy.stats.chisquare -- one-way chi-square test (which is not the same > as the n-way test with n=1)." > This needs to be a little more clear because the exact same test statistic > is being used. In fact the function must give the correct answer with 1d > array. > > 4) Related to point 3, lines 72-74 are not correct, see > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > 5) You must allow the user to provide their own expected values > > 6) Users need to be able to control the output - really I don't want to see > the table of expected values unless requested. Also a user might just want > the table of expected values and nothing else. > > 7) You should not need the chi2 function. > > 8) More generally, what is the need for having an ContingencyTable object? maybe some usage examples will be nice. I like the collapse methods, since, I think, it makes it easy to test (for marginal ?) independence along different variables. Similar for slicing to test conditional independence, but I haven't read through the slicing method yet. In the long term it might also be useful to attach other tests for contingency tables for convenience, fisher- exact, kendall tau and other tests that apply. And when numpy gets the labeled array, we can attach labels for the categories. Josef Josef > > > Bruce > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From scopatz at gmail.com Mon Aug 9 17:46:50 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Mon, 9 Aug 2010 16:46:50 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: Message-ID: On Mon, Aug 9, 2010 at 3:11 PM, wrote: > On Mon, Aug 9, 2010 at 3:31 PM, Anthony Scopatz wrote: > > Hello All, > > I have just opened a ticket > > (http://projects.scipy.org/scipy/ticket/1258) that adds a general > > contingency table class to the the stats package. 
This class includes > > methods to slice and collapse the table as well a calculate metrics such > as > > chi-squared and entropy. > > This implementation came out of Warren Weckesser and me working on this > over > > the SciPy 2010 statistics sprint. > > Please take a look! Comments and suggestions are always welcome. > > just a quick question that I don't understand from a brief look at the > source > > Isn't the core of "from_columns" doing the same quantization as > np.histogramdd? ( I haven't looked closely enough yet) > > If x in from_columns is a tuple, then an array_like could also contain > strings, e.g. names/levels of a categorical variable. I'm not sure how > far this should go. > > To kill two birds with one stone, from_columns() and np.histogramdd() do effectively the same thing for continuous variables but specifying bounds and distributions rather than bins. However, from_columns() allows for discrete variables, which as you pointed out can handle categorical, string-based data. See the attached file for an example. (Maybe this method of making histograms should be in numpy?) The reason I with the bounds/dist rather than bin implementation is that bounds/dists are more often what you play around with when exploring the data. other ideas > methods or functions "from_flat" and "to_flat" would be useful. > chi2 could be renamed to chi2_indep, or take an optional expected > keyword, where the user could specify other distribution hypotheses. > > An expected keyword would work well here. It might be a better idea to include such a keyword in __init__() and from_columns(). I'd just need to make sure that the collapse and slice methods propagate this properly. I can also see how "from_flat" and "to_flat" methods would be nice. Be Well Anthony > Josef > > > > Be Well, > > Anthony > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ct_cat.py Type: text/x-python Size: 451 bytes Desc: not available URL: From roberto.bucher at supsi.ch Wed Aug 11 04:40:42 2010 From: roberto.bucher at supsi.ch (Roberto Bucher) Date: Wed, 11 Aug 2010 10:40:42 +0200 Subject: [SciPy-Dev] First contact Message-ID: <201008111040.42635.roberto.bucher@supsi.ch> Hi all I'm new in the mailing list and I'd like to contact people responsible of the signal/ltisys module. I'm working on a porting of all my control functions from Scicoslab under Python and I found some errors(?) in the signal/ltisys module. In addition I added some code to the module for handling state space MIMO systems. The first problem was related with the function "ss2tf" which gives as return numerator a 2 dimensional array, not more usable by the print function of the class. I added these lines at the end of the function: # Avoid leading zeros in num num=num[0] while num[0]==0: num=num[1:] but I don't want to get problems with other modules... Best regards Roberto -- ----------------------------------------------------------------------------- Coltivate Linux! Tanto Windows si pianta da solo... 
-----------------------------------------------------------------------------
University of Applied Sciences of Southern Switzerland
Dept. Innovative Technologies
CH-6928 Lugano-Manno
http://web.dti.supsi.ch/~bucher

From derek at astro.physik.uni-goettingen.de  Wed Aug 11 11:02:59 2010
From: derek at astro.physik.uni-goettingen.de (Derek Homeier)
Date: Wed, 11 Aug 2010 17:02:59 +0200
Subject: [SciPy-Dev] First contact
In-Reply-To: <201008111040.42635.roberto.bucher@supsi.ch>
References: <201008111040.42635.roberto.bucher@supsi.ch>
Message-ID:

Hi Roberto,

welcome to the list!

> The first problem was related with the function "ss2tf" which gives as
> return numerator a 2 dimensional array, not more usable by the print
> function of the class. I added these lines at the end of the function:
>
>     # Avoid leading zeros in num
>     num=num[0]
>     while num[0]==0:
>         num=num[1:]
>
> but I don't want to get problems with other modules...

I must admit I have no idea what this output is typically supposed to look
like, but I note that in the first step you are already replacing a 2D-array
with its first 1D-element, which is probably not desired if you have input
with D.shape[0] > 1.
The print function of which class are you referring to BTW?

For the second part, you could replace the "while" loop with a numpy
operation:

    num = num[num.nonzero()[0][0]:]

HTH,
Derek

From roberto.bucher at supsi.ch  Wed Aug 11 11:55:10 2010
From: roberto.bucher at supsi.ch (Roberto Bucher)
Date: Wed, 11 Aug 2010 17:55:10 +0200
Subject: [SciPy-Dev] First contact
In-Reply-To:
References: <201008111040.42635.roberto.bucher@supsi.ch>
Message-ID: <201008111755.10096.roberto.bucher@supsi.ch>

Thanks Derek

I solved the problem by changing the lines

    # Avoid leading zeros in num
    num=num[0]
    while num[0]==0:
        num=num[1:]

with these lines

    # Avoid leading zeros in num
    [num,den]=normalize(num,den)

Best regards

Roberto

On Wednesday 11 August 2010 17:02:59 Derek Homeier wrote:
> Hi Roberto,
>
> welcome to the list!
>
> > The first problem was related with the function "ss2tf" which gives as
> > return numerator a 2 dimensional array, not more usable by the print
> > function of the class. I added these lines at the end of the function:
> >
> >     # Avoid leading zeros in num
> >     num=num[0]
> >     while num[0]==0:
> >         num=num[1:]
> >
> > but I don't want to get problems with other modules...
>
> I must admit I have no idea what this output is typically supposed to
> look like, but I note that in the first step you are already replacing a
> 2D-array with its first 1D-element, which is probably not desired if you
> have input with D.shape[0] > 1.
> The print function of which class are you referring to BTW?
>
> For the second part, you could replace the "while" loop with a numpy
> operation:
>
>     num = num[num.nonzero()[0][0]:]
>
> HTH,
> Derek

--
-----------------------------------------------------------------------------
Coltivate Linux! Tanto Windows si pianta da solo...
-----------------------------------------------------------------------------
University of Applied Sciences of Southern Switzerland Dept.
Innovative Technologies CH-6928 Lugano-Manno http://web.dti.supsi.ch/~bucher From scopatz at gmail.com Wed Aug 11 15:10:07 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 11 Aug 2010 14:10:07 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: <4C606696.6070707@gmail.com> References: <4C606696.6070707@gmail.com> Message-ID: On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey wrote: > > On 08/09/2010 02:31 PM, Anthony Scopatz wrote: > > Hello All, > > I have just opened a ticket (http://projects.scipy.org/scipy/ticket/1258) that > adds a general contingency table class to the the stats package. This class > includes methods to slice and collapse the table as well a calculate metrics > such as chi-squared and entropy. > > This implementation came out of Warren Weckesser and me working on this > over the SciPy 2010 statistics sprint. > > Please take a look! Comments and suggestions are always welcome. > Be Well, > Anthony > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-dev > > Hello All, I have updated the ticket with new versions of the contingency_table.py and test_contingency_table.py. I also have a github clone of scipy now, if you just want to grab the changes, http://github.com/scopatz/scipy Issues addressed in the new version: 1. Expected tables may now be user-specified, 2. added from_flat() and to_flat() methods, 3. Retooled the chi_square() method and removed the chisquare_nway() function. 4. All table metric methods (entropy) now add the calculated value to the contingency table's attributes as well as returning the value. Bruce, Thank you for your concerns. I'd like to address your points below. > 1) You can not use numpy's asarray function without checking the input > type. You must be aware of at least masked arrays and Matrix inputs as well > as new data types. > > 2) You can not force a dtype on the user - on line 54 when you can provide > optional precision. > These are handled by now allowing the user to specify their own expected table. The expected_nway() function that these to points relate to can now be avoided completely, if desired. > > 3) Can you please clarify lines 112-113? > " scipy.stats.chisquare -- one-way chi-square test (which is not the same > as the n-way test with n=1)." > This needs to be a little more clear because the exact same test statistic > is being used. In fact the function must give the correct answer with 1d > array. > > 4) Related to point 3, lines 72-74 are not correct, see > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > The chisquared_nway() function has been removed, so 3) and 4) no longer apply. > 5) You must allow the user to provide their own expected values > done. > 6) Users need to be able to control the output - really I don't want to > see the table of expected values unless requested. Also a user might just > want the table of expected values and nothing else. > The expected table, much like the probability table or the number of degrees of freedom or the number of dimensions, is not really an output. Rather it is more of an attribute that helps calculate outputs, like the entropy, mutual information, etc. Therefore it should always be included in an instance of ContingencyTable. A user could simply have an array of values that they call a contingency table, but this class provides a tool for easily calculating related metrics (outputs). 7) You should not need the chi2 function. 
> Now required since chisquared_nway() was removed. > 8) More generally, what is the need for having an ContingencyTable object? > Basically, my argument for the need is that contingency tables (or cross tabulations) are expected as standard in any statistics package. R has them, Matlab has them, SPSS has them, Stata has them, and so on. I know that when I came to scipy.stats and found that they weren't here already, I was disappointed. I hope this helps! Be Well Anthony > > > Bruce > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Aug 11 15:37:38 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Aug 2010 15:37:38 -0400 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: <4C606696.6070707@gmail.com> Message-ID: On Wed, Aug 11, 2010 at 3:10 PM, Anthony Scopatz wrote: > > > On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey wrote: >> >> On 08/09/2010 02:31 PM, Anthony Scopatz wrote: >> >> Hello All, >> I have just opened a ticket >> (http://projects.scipy.org/scipy/ticket/1258)?that adds a general >> contingency table class to the the stats package. ?This class includes >> methods to slice and?collapse?the table as well a calculate metrics such as >> chi-squared and entropy. >> This implementation came out of Warren?Weckesser and me working on this >> over the SciPy 2010 statistics sprint. >> Please take a look! ?Comments and suggestions are always welcome. >> Be Well, >> Anthony >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > Hello All, > I have updated the ticket with new versions of the contingency_table.py and > test_contingency_table.py. ?I also have a github clone of scipy now, if you > just want to grab the changes,?http://github.com/scopatz/scipy > Issues addressed in the new version: > > Expected tables may now be user-specified, > added from_flat() and to_flat() methods, a clarification: for from_flat I was thinking about non-rectangular data when a simple reshape doesn't work. something like an nd version of http://mail.scipy.org/pipermail/scipy-dev/2009-March/011592.html for example when the count data are given in a structured array with the corresponding group labels where zero count entries might be missing and which is not necessarily sorted/ordered in the right way for a reshape. but now I think this is also handled by from_columns, where the user specifies the "distribution" as list of (unique) values. (?) (I haven't looked at the other changes) Josef > Retooled the chi_square() method and removed the chisquare_nway() function. > All table metric methods (entropy) now add the calculated value to the > contingency table's attributes as well as returning the value. > > Bruce, Thank you for your concerns. ?I'd like to address your points below. > >> >> 1) You can not use numpy's asarray function without checking the input >> type. You must be aware of at least masked arrays and Matrix inputs as well >> as new data types. >> >> 2) You can not force a dtype on the user -? on line 54 when you can >> provide optional precision. > > These are handled by now allowing the user to specify their own expected > table. 
?The expected_nway() function that these to points relate to can now > be avoided?completely, if desired. > >> >> 3) Can you please clarify lines 112-113? >> "? scipy.stats.chisquare -- one-way chi-square test (which is not the same >> as the n-way test with n=1)." >> This needs to be a little more clear because the exact same test statistic >> is being used. In fact the function must give the correct answer with 1d >> array. >> >> 4) Related to point 3, lines 72-74 are not correct, see >> http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > The chisquared_nway() function has been removed, so 3) and 4) no longer > apply. > >> >> 5) You must allow the user to provide their own expected values > > > done. > >> >> 6) Users need to be able to control the output - really I don't want to >> see the table of expected values unless requested. Also a user might just >> want the table of expected values and nothing else. > > The expected table, much like the probability table or the number of degrees > of freedom or the number of dimensions, is not really an output. ?Rather it > is more of an attribute that helps calculate outputs, like the entropy, > mutual information, etc. ?Therefore it should always be included in an > instance of ContingencyTable. ?A user could simply have an array of values > that they call a contingency table, but this class provides a tool for > easily calculating related metrics (outputs). >> >> 7) You should not need the chi2 function. > > Now required since?chisquared_nway() was removed. > >> >> 8) More generally, what is the need for having an ContingencyTable object? > > Basically, my argument for the need is that contingency tables (or cross > tabulations) are expected as standard in any statistics package. ?R has > them, Matlab has them, SPSS has them, Stata has them, and so on. ?I know > that when I came to scipy.stats and found that they weren't here already, I > was disappointed. > I hope this helps! > Be Well > Anthony > >> >> Bruce >> >> >> >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From bsouthey at gmail.com Wed Aug 11 16:04:36 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Aug 2010 15:04:36 -0500 Subject: [SciPy-Dev] Contingency Table Model In-Reply-To: References: <4C606696.6070707@gmail.com> Message-ID: <4C630254.2040805@gmail.com> On 08/11/2010 02:10 PM, Anthony Scopatz wrote: > > > On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey > wrote: > > > On 08/09/2010 02:31 PM, Anthony Scopatz wrote: >> Hello All, >> >> I have just opened a ticket >> (http://projects.scipy.org/scipy/ticket/1258) that adds a general >> contingency table class to the the stats package. This class >> includes methods to slice and collapse the table as well a >> calculate metrics such as chi-squared and entropy. >> >> This implementation came out of Warren Weckesser and me working >> on this over the SciPy 2010 statistics sprint. >> >> Please take a look! Comments and suggestions are always welcome. 
>> Be Well, >> Anthony >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > Hello All, > > I have updated the ticket with new versions of the > contingency_table.py and test_contingency_table.py. I also have a > github clone of scipy now, if you just want to grab the changes, > http://github.com/scopatz/scipy > > Issues addressed in the new version: > > 1. Expected tables may now be user-specified, > 2. added from_flat() and to_flat() methods, > 3. Retooled the chi_square() method and removed the > chisquare_nway() function. > 4. All table metric methods (entropy) now add the calculated value > to the contingency table's attributes as well as returning the > value. > > Bruce, Thank you for your concerns. I'd like to address your points > below. > > 1) You can not use numpy's asarray function without checking the > input type. You must be aware of at least masked arrays and Matrix > inputs as well as new data types. > > 2) You can not force a dtype on the user - on line 54 when you > can provide optional precision. > > > These are handled by now allowing the user to specify their own > expected table. The expected_nway() function that these to points > relate to can now be avoided completely, if desired. > > > 3) Can you please clarify lines 112-113? > " scipy.stats.chisquare -- one-way chi-square test (which is not > the same > as the n-way test with n=1)." > This needs to be a little more clear because the exact same test > statistic is being used. In fact the function must give the > correct answer with 1d array. > > 4) Related to point 3, lines 72-74 are not correct, see > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > > > The chisquared_nway() function has been removed, so 3) and 4) no > longer apply. > > 5) You must allow the user to provide their own expected values > > done. > > 6) Users need to be able to control the output - really I don't > want to see the table of expected values unless requested. Also a > user might just want the table of expected values and nothing else. > > > The expected table, much like the probability table or the number of > degrees of freedom or the number of dimensions, is not really an > output. Rather it is more of an attribute that helps calculate > outputs, like the entropy, mutual information, etc. Therefore it > should always be included in an instance of ContingencyTable. A user > could simply have an array of values that they call a contingency > table, but this class provides a tool for easily calculating related > metrics (outputs). > > 7) You should not need the chi2 function. > > > Now required since chisquared_nway() was removed. > > 8) More generally, what is the need for having an ContingencyTable > object? > > > Basically, my argument for the need is that contingency tables (or > cross tabulations) are expected as standard in any statistics package. > R has them, Matlab has them, SPSS has them, Stata has them, and so > on. I know that when I came to scipy.stats and found that they > weren't here already, I was disappointed. > > I hope this helps! 
> > Be Well > Anthony > > > > Bruce > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev I am very aware that this type of functionality is available in multiple applications so that was never my concern. But you have failed to address my concerns nor addressed the the questions about why it is needed in this form. An important issue is why we need this code when it was pointed out the similarity to numpy's histogram functions. At some stage we have to say no to code bloat. Note, as a class then everything must be self-contained - both _margins and expected_nway have little point outside your class. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Tue Aug 17 19:08:35 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 17 Aug 2010 16:08:35 -0700 Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/ Message-ID: Hello, This email was prompted by a blog post from William Stein: http://sagemath.blogspot.com/2010/08/overflow.html Is http://ask.scipy.org the official site at this point? What is the plan for http://advice.mechanicalkern.com/? Thanks, Jarrod From robert.kern at gmail.com Tue Aug 17 19:22:23 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Aug 2010 18:22:23 -0500 Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/ In-Reply-To: References: Message-ID: On Tue, Aug 17, 2010 at 18:08, Jarrod Millman wrote: > Hello, > > This email was prompted by a blog post from William Stein: > ?http://sagemath.blogspot.com/2010/08/overflow.html > > Is http://ask.scipy.org the official site at this point? Might as well be. >?What is the > plan for http://advice.mechanicalkern.com/? David Warde-Farley has manually moved over most, if not all of the questions and answers. I have made a note on the front page there to point to ask.scipy.org. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From dwf at cs.toronto.edu Wed Aug 18 17:33:19 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 18 Aug 2010 17:33:19 -0400 Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/ In-Reply-To: References: Message-ID: <9F509279-95DF-443E-973C-549BDE3D128E@cs.toronto.edu> On 2010-08-17, at 7:22 PM, Robert Kern wrote: > David Warde-Farley has manually moved over most, if not all of the > questions and answers. I have made a note on the front page there to > point to ask.scipy.org. I think it's all, at this point, at least the chosen answers. I'll do a quick check later tonight for anything I missed. Robert, would it be hard to disable logins/signups as well? Do you think this is a good idea? David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From robert.kern at gmail.com Wed Aug 18 17:35:17 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 18 Aug 2010 16:35:17 -0500
Subject: [SciPy-Dev] status of http://ask.scipy.org and http://advice.mechanicalkern.com/
In-Reply-To: <9F509279-95DF-443E-973C-549BDE3D128E@cs.toronto.edu>
References: <9F509279-95DF-443E-973C-549BDE3D128E@cs.toronto.edu>
Message-ID: 

On Wed, Aug 18, 2010 at 16:33, David Warde-Farley wrote:
> Robert, would it be hard to disable logins/signups as well?

Oh, probably. I might be able to just remove the links though...

> Do you think
> this is a good idea?

Of course.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From tmp50 at ukr.net Fri Aug 20 14:52:09 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Fri, 20 Aug 2010 21:52:09 +0300
Subject: [SciPy-Dev] question to scikits.appspot.com editors
Message-ID: 

hi all,
who is responsible for scikits.appspot.com editing? I see some scikits
(e.g. learn) has mentioned their personal website
(http://scikit-learn.sourceforge.net), while openopt entry still points
to deprecated location that is out of maintenance for several years;
same for svn root mentioned there. Could you either fix it to modern
locations (http://openopt.org, svn://openopt.org/PythonPackages/OpenOpt)
or, at least, remove it at all to prevent misleading of users?
Regards, D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tmp50 at ukr.net Fri Aug 20 14:54:56 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Fri, 20 Aug 2010 21:54:56 +0300
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
Message-ID: 

Hi all,
could you consider creating special mail list "build/install numpy/scipy
issues" or something like that? I guess lots of people (as well as I) are
not interested in reading all those huge amounts of messages related to
these issues.
Regards, D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Fri Aug 20 15:01:26 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 20 Aug 2010 14:01:26 -0500
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
References: 
Message-ID: 

2010/8/20 Dmitrey :
> Hi all,
> could you consider creating special mail list "build/install numpy/scipy
> issues" or something like that?

Yes, I will consider it. Having considered it, no I do not think it
would be a good idea to split up the mailing lists even more.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From tmp50 at ukr.net Fri Aug 20 18:30:57 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Sat, 21 Aug 2010 01:30:57 +0300
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
Message-ID: 

Could also the following idea be considered: create numpy/scipy announce
mail list with soft releases and other important info?
D.

--- Original message ---
From: "Robert Kern" 
To: "SciPy Developers List" 
Date: 20 August, 22:01:26
Subject: Re: [SciPy-Dev] could you consider creating "build/install issues" mail list?

2010/8/20 Dmitrey :
> Hi all,
> could you consider creating special mail list "build/install numpy/scipy
> issues" or something like that?

Yes, I will consider it. Having considered it, no I do not think it
would be a good idea to split up the mailing lists even more.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Fri Aug 20 18:37:37 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 20 Aug 2010 17:37:37 -0500
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
References: 
Message-ID: 

2010/8/20 Dmitrey :
> Could also the following idea be considered: create numpy/scipy announce
> mail list with soft releases and other important info?

Release announcements get sent to python-announce:

http://mail.python.org/mailman/listinfo/python-announce-list

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From ralf.gommers at googlemail.com Fri Aug 20 22:03:01 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 21 Aug 2010 10:03:01 +0800
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 21, 2010 at 6:37 AM, Robert Kern wrote:

> 2010/8/20 Dmitrey :
> > Could also the following idea be considered: create numpy/scipy announce
> > mail list with soft releases and other important info?
>
> Release announcements get sent to python-announce:
>
> http://mail.python.org/mailman/listinfo/python-announce-list
>
Actually, the last releases weren't announced there since I wasn't aware of
this. Will do from now on.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tmp50 at ukr.net Sat Aug 21 04:04:54 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Sat, 21 Aug 2010 11:04:54 +0300
Subject: [SciPy-Dev] could you consider creating "build/install issues" mail list?
In-Reply-To: 
Message-ID: 

Well, thus I guess scipy-user rss subscription is beyond my needs hence I
cease it as I had done with numpy-user.
Regards, D.

--- Original message ---
From: "Robert Kern" 
To: "SciPy Developers List" 
Date: 21 August, 01:37:37
Subject: Re: [SciPy-Dev] could you consider creating "build/install issues" mail list?

2010/8/20 Dmitrey :
> Could also the following idea be considered: create numpy/scipy announce
> mail list with soft releases and other important info?

Release announcements get sent to python-announce:

http://mail.python.org/mailman/listinfo/python-announce-list

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From roberto.bucher at supsi.ch Sat Aug 21 14:00:01 2010 From: roberto.bucher at supsi.ch (Roberto Bucher) Date: Sat, 21 Aug 2010 20:00:01 +0200 Subject: [SciPy-Dev] ltisys.py Message-ID: <201008212000.01702.roberto.bucher@supsi.ch> I'm working on a control system toolbox for python. In particular I've modified some functions of the control system toolbox developed by Richard Murray and I added some new functions. The first problem is related to the modul ltisys.py that I modified for handling: - MIMO systems (state-space only) - Sampling Time for discrete time system I've done other modifications to different modul of Richard, who is in CC: - matlab.py - statesp.py - xferfcn.py in order to implement the following new functions in my yottalab.py modul: - c2d (zoh+bilinear) - d2c (zoh+bilinear) - dare - discrete riccati solution - care - continous riccati solution - dlqr - discrete linear quadratic regulator - ctrb - controllability matrix - acker - ackerman pole placement - minreal - minimal state space representation - dcgain steady state gain for both continous and discrete time systems - tf (casting function) - ss (casting function) The modified files can be downloaded from my homepage in the python section, but I want to see how I can contribute to put the modifications in the main Scipy distribution. Best regards Roberto -- ----------------------------------------------------------------------------- Coltivate Linux! Tanto Windows si pianta da solo... ----------------------------------------------------------------------------- University of Applied Sciences of Southern Switzerland Dept. Innovative Technologies CH-6928 Lugano-Manno http://web.dti.supsi.ch/~bucher From pav at iki.fi Sat Aug 21 15:00:40 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 21 Aug 2010 19:00:40 +0000 (UTC) Subject: [SciPy-Dev] question to scikits.appspot.com editors References: Message-ID: Fri, 20 Aug 2010 21:52:09 +0300, Dmitrey wrote: > who is responsible for scikits.appspot.com editing? I see some scikits > (e.g. learn) has mentioned their personal website ( > http://scikit-learn.sourceforge.net), while openopt entry still points > to deprecated location that is out of maintanance for several years; > same for svn root mentioned there. Could you either fix it to modern > locations (http://openopt.org, svn://openopt.org/PythonPackages/OpenOpt) > or, at least, remove it at all to prevent misleading of users? It's semi-automatic -- as far as I understands it looks what's in scikits SVN and PyPi, and works from that. Anyway, the people who know how it works are probably Stefan van der Walt, and somebody else. BTW, does the openopt stuff in scikits SVN still serve a purpose? I think we should just remove it if it's not used -- it can be still recovered from the history even after that, if there was something useful there. -- Pauli Virtanen From josef.pktd at gmail.com Sat Aug 21 15:09:36 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Aug 2010 15:09:36 -0400 Subject: [SciPy-Dev] ltisys.py In-Reply-To: <201008212000.01702.roberto.bucher@supsi.ch> References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: On Sat, Aug 21, 2010 at 2:00 PM, Roberto Bucher wrote: > I'm working on a control system toolbox for python. In particular I've modified > some functions of the control system toolbox developed by Richard Murray and I > added some new functions. 
?The first problem is related to the modul ltisys.py > that I modified for handling: > - MIMO systems (state-space only) > - Sampling Time for discrete time system > > I've done other modifications to different modul of Richard, who is in CC: > - matlab.py > - statesp.py > - xferfcn.py > ?in order to implement the following new functions in my yottalab.py modul: > - c2d (zoh+bilinear) > - d2c (zoh+bilinear) > - dare - discrete riccati solution > - care - continous riccati solution > - dlqr - discrete linear quadratic regulator > - ctrb - controllability matrix > - acker - ackerman pole placement > - minreal - minimal state space representation > - dcgain steady state gain for both continous and discrete time systems > - tf (casting function) > - ss (casting function) > > The modified files can be downloaded from my homepage > in the python section, but I want to see how I can contribute to put the > modifications in the main Scipy distribution. I would find it very good if such enhancements go into scipy ltisys. It was often requested, and I think we will be able to use them also for time series analysis. I stopped looking at it after I figured out ltisys only handled single input, mainly continuous time processes. And I hope someone finds the time soon to review this (but unfortunately I don't have any time at all right now). A few questions: Do you have your changes under version control? It would make it easier to produce a diff and look at the changes. Do you have examples that could be converted into tests? Licensing: scipy and python-control are BSD your yottalab.py doesn't have a license statement, but it has to be GPL by infection, because of "from slycot import sb02od, tb03ad" since slycot is GPL Would it be useful to license your parts of yottalab that don't rely on slycot as BSD? e.g. if a replacement for slycot could be found. or maybe python control also moves to GPL with slycot integration ? http://sourceforge.net/apps/mediawiki/python-control/index.php?title=Developer_assignments Thanks, Josef > > Best regards > > Roberto > > -- > ----------------------------------------------------------------------------- > Coltivate Linux! Tanto Windows si pianta da solo... > ----------------------------------------------------------------------------- > University of Applied Sciences of Southern Switzerland > Dept. Innovative Technologies > CH-6928 Lugano-Manno > http://web.dti.supsi.ch/~bucher > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From roberto.bucher at supsi.ch Sun Aug 22 02:22:57 2010 From: roberto.bucher at supsi.ch (Roberto Bucher) Date: Sun, 22 Aug 2010 08:22:57 +0200 Subject: [SciPy-Dev] ltisys.py In-Reply-To: References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: <201008220822.57787.roberto.bucher@supsi.ch> Thanks Josef for your quick answer. It's not a problem for me to work under version control. I'm already one of the developper of Linux RTAI and in particular I've modified the Scicoslab code generator in order to create code for RT systems (RTAI+dsPIC). I'm used to work with CVS, GIT and SVN tools. Of course my code will be released completely under the BSD licence. It is still in development, because I'm still looking for Matlab FOSS replacements. Till now, I've worked with Scicoslab, but Python seems to be a good alternative too. 
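To make the zero-order-hold conversion in the function list above (c2d) concrete, here is a minimal sketch using the standard matrix-exponential trick. It is not the implementation from yottalab.py or python-control; the name c2d_zoh and the double-integrator example are only for illustration.

import numpy as np
from scipy.linalg import expm

def c2d_zoh(A, B, Ts):
    # Zero-order-hold discretization of x' = A x + B u with sample time Ts.
    # expm([[A, B], [0, 0]] * Ts) contains Ad (upper left) and Bd (upper right).
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    n, m = A.shape[0], B.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm(M * Ts)
    return E[:n, :n], E[:n, n:]   # Ad, Bd

# double integrator sampled at 0.1 s
Ad, Bd = c2d_zoh([[0., 1.], [0., 0.]], [[0.], [1.]], 0.1)

A bilinear (Tustin) variant would instead be built from (I - A*Ts/2) and (I + A*Ts/2) factors; the MIMO and sampling-time bookkeeping around such conversions is what the ltisys changes discussed above are about.
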
At present, the main problem with the control system toolbox under Python is that it is not ready for practical applications: in particular it is not able to handle with discrete time systems... I'll put ASAP two examples on my homepage: - DC motor with state feedback controller, integrator + reduced order observer - Inverted pendulum, with LQR state feedback + reduced order observer My first goal was to demonstrate the possibility to reproduce all my Scicoslab systems under Python, and I reached it. Best regards Roberto On Saturday 21 August 2010 21:09:36 josef.pktd at gmail.com wrote: > On Sat, Aug 21, 2010 at 2:00 PM, Roberto Bucher wrote: > > I'm working on a control system toolbox for python. In particular I've > > modified some functions of the control system toolbox developed by > > Richard Murray and I added some new functions. The first problem is > > related to the modul ltisys.py that I modified for handling: > > - MIMO systems (state-space only) > > - Sampling Time for discrete time system > > > > I've done other modifications to different modul of Richard, who is in > > CC: - matlab.py > > - statesp.py > > - xferfcn.py > > in order to implement the following new functions in my yottalab.py > > modul: - c2d (zoh+bilinear) > > - d2c (zoh+bilinear) > > - dare - discrete riccati solution > > - care - continous riccati solution > > - dlqr - discrete linear quadratic regulator > > - ctrb - controllability matrix > > - acker - ackerman pole placement > > - minreal - minimal state space representation > > - dcgain steady state gain for both continous and discrete time systems > > - tf (casting function) > > - ss (casting function) > > > > The modified files can be downloaded from my homepage > > in the python section, but I want to see how I can contribute to put the > > modifications in the main Scipy distribution. > > I would find it very good if such enhancements go into scipy ltisys. > It was often requested, and I think we will be able to use them also > for time series analysis. I stopped looking at it after I figured out > ltisys only handled single input, mainly continuous time processes. > And I hope someone finds the time soon to review this (but > unfortunately I don't have any time at all right now). > > > A few questions: > > Do you have your changes under version control? It would make it > easier to produce a diff and look at the changes. > > Do you have examples that could be converted into tests? > > Licensing: > scipy and python-control are BSD > your yottalab.py doesn't have a license statement, but it has to be > GPL by infection, because of > "from slycot import sb02od, tb03ad" since slycot is GPL > > Would it be useful to license your parts of yottalab that don't rely > on slycot as BSD? > e.g. if a replacement for slycot could be found. > > or maybe python control also moves to GPL with slycot integration ? > http://sourceforge.net/apps/mediawiki/python-control/index.php?title=Develo > per_assignments > > Thanks, > > Josef > > > Best regards > > > > Roberto > > > > -- > > ------------------------------------------------------------------------- > > ---- Coltivate Linux! Tanto Windows si pianta da solo... > > ------------------------------------------------------------------------- > > ---- University of Applied Sciences of Southern Switzerland > > Dept. 
Innovative Technologies > > CH-6928 Lugano-Manno > > http://web.dti.supsi.ch/~bucher > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev -- ----------------------------------------------------------------------------- Coltivate Linux! Tanto Windows si pianta da solo... ----------------------------------------------------------------------------- University of Applied Sciences of Southern Switzerland Dept. Innovative Technologies CH-6928 Lugano-Manno http://web.dti.supsi.ch/~bucher From strawman at astraw.com Sun Aug 22 12:57:25 2010 From: strawman at astraw.com (Andrew Straw) Date: Sun, 22 Aug 2010 09:57:25 -0700 Subject: [SciPy-Dev] ltisys.py In-Reply-To: References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: <4C7156F5.7020305@astraw.com> josef.pktd at gmail.com wrote: > Licensing: > scipy and python-control are BSD > your yottalab.py doesn't have a license statement, but it has to be > GPL by infection, because of > "from slycot import sb02od, tb03ad" since slycot is GPL > > Would it be useful to license your parts of yottalab that don't rely > on slycot as BSD? > e.g. if a replacement for slycot could be found. > > or maybe python control also moves to GPL with slycot integration ? > http://sourceforge.net/apps/mediawiki/python-control/index.php?title=Developer_assignments > Alternatively, one could ask the slycot/SLICOT authors if they would relicense their work as BSD so that it can be included in scipy. Historically, several authors have agreed to change GPL code to BSD when asked nicely with this given as the reason. -Andrew From stefan at sun.ac.za Mon Aug 23 04:37:01 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 23 Aug 2010 10:37:01 +0200 Subject: [SciPy-Dev] question to scikits.appspot.com editors In-Reply-To: References: Message-ID: On 21 August 2010 21:00, Pauli Virtanen wrote: > It's semi-automatic -- as far as I understands it looks what's in scikits > SVN and PyPi, and works from that. That's correct; but we could probably switch off the SVN scanning. Since "scikits.openopt" is not found on PyPi, it assumes that the SVN version provides the latest info. Regards St?fan From oliphant at enthought.com Mon Aug 23 15:33:54 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 23 Aug 2010 14:33:54 -0500 Subject: [SciPy-Dev] ltisys.py In-Reply-To: <201008212000.01702.roberto.bucher@supsi.ch> References: <201008212000.01702.roberto.bucher@supsi.ch> Message-ID: On Aug 21, 2010, at 1:00 PM, Roberto Bucher wrote: > I'm working on a control system toolbox for python. In particular I've modified > some functions of the control system toolbox developed by Richard Murray and I > added some new functions. 
The first problem is related to the modul ltisys.py > that I modified for handling: > - MIMO systems (state-space only) > - Sampling Time for discrete time system > > I've done other modifications to different modul of Richard, who is in CC: > - matlab.py > - statesp.py > - xferfcn.py > in order to implement the following new functions in my yottalab.py modul: > - c2d (zoh+bilinear) > - d2c (zoh+bilinear) > - dare - discrete riccati solution > - care - continous riccati solution > - dlqr - discrete linear quadratic regulator > - ctrb - controllability matrix > - acker - ackerman pole placement > - minreal - minimal state space representation > - dcgain steady state gain for both continous and discrete time systems > - tf (casting function) > - ss (casting function) > > The modified files can be downloaded from my homepage > in the python section, but I want to see how I can contribute to put the > modifications in the main Scipy distribution. These would make great additions to SciPy. We are moving to a more distributed development model which should help you be able to make version-controlled changes to these files as part of SciPy. As soon as we make progress in that direction, I can review your changes and get them into SciPy. -Travis From hardbyte at gmail.com Tue Aug 24 20:35:56 2010 From: hardbyte at gmail.com (Brian Thorne) Date: Wed, 25 Aug 2010 12:35:56 +1200 Subject: [SciPy-Dev] Getting Scipy's weave to work reliably on Windows In-Reply-To: References: Message-ID: Hi Chris, Have you tried patching python's distutils with the patch at ( http://bugs.python.org/issue4508)? If so, does the patch fix both cases of spaces in the path? It appears to me that one of the compiler files in the python source is not correctly quoting of spaces in intermediate files. Cheers, Brian On 18 July 2010 03:49, Chris Ball wrote: > Hi, > > While testing Scipy's weave on several different Windows installations, I > came > across some problems with spaces in paths that often prevent weave from > working. > I can see a change that could probably get weave working on most Windows > installations, but it is a quick hack. Someone knowledgeable about > distutils > (and numpy.distutils?) might be able to help me fix this properly. Below I > describe three common problems with weave on Windows, in the hope that this > information helps others, or allows someone to suggest how to fix the > spaces-in- > paths problem properly. > > I think there are three common problems that stop weave from working on > Windows. > The first is not having a C compiler. Both Python(x,y) and EPD provide a C > compiler that seems to work fine, which is great! > > The second problem is that if weave is installed to a location with a space > in > the path, linking fails. There is already a scipy bug report about this > (http://projects.scipy.org/scipy/ticket/809). I've just commented on that > report, saying the problem appears to be with distutils, and there is > already a > Python bug report about it (http://bugs.python.org/issue4508). Maybe > someone > could close this scipy bug, or link it to the Python one somehow? In any > case, > when using Python(x,y) or EPD, this bug will not show up if the default > installation locations are accepted. So, that's also good news! > > The third problem is that if the Windows user name has a space in it (which > in > my experience is quite common), compilation fails. Weave uses the user name > to > create a path for its "intermediate" and "compiled" files. 
When the > compilation > command is issued, the path with the space in it is also not quoted. > Presumably > that is another error in distutils (or numpy.distutils)? Unfortunately I > wasn't > able to pinpoint what function is failing to quote strings properly, > because I > couldn't figure out the chain that leads to the compiler being called. > However, > I can avoid the problem by removing spaces from the user name in weave > itself > (catalog.py): > > def whoami(): > """return a string identifying the user.""" > return (os.environ.get("USER") or os.environ.get("USERNAME") or > "unknown").replace(" ","") > > (where I have added .replace(" ","") to the existing code). > > I realize this isn't the right solution, so if someone could help to guide > me to > the point where quoting should occur, that would be very helpful. > Otherwise, is > there any chance of applying a hack like this so weave can work reliably on > Windows? > > Thanks, > Chris > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdh2358 at gmail.com Wed Aug 25 10:00:00 2010 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 25 Aug 2010 09:00:00 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe Message-ID: Suppose I have an ordered list/array of numbers, and I want to split them into N chunks, such that the intersection of any chunk with each other is empty and the data is split as evenly as possible (eg the std dev of the lengths of the chunks is minimized or some other such criterion). Context: I am trying to do a quintile analysis on some data, and np.percentile doesn't behave like I want because more than 20% of my data equals 1, so 1 is in the first and second quintiles. I want to avoid this -- I'd rather have uneven counts in my quintiles than have the same value show up in multiple quintiles, but I'd like the counts to be as even as possible.. 
Here is some sample code that illustrates my problem: In [178]: run ~/test tile i=1 range=[1.00, 1.00), count=0 tile i=2 range=[1.00, 3.00), count=79 tile i=3 range=[3.00, 4.60), count=42 tile i=4 range=[4.60, 11.00), count=39 tile i=5 range=[11.00, 43.00), count=41 import numpy as np x = np.array([ 2., 3., 4., 5., 1., 2., 1., 1., 1., 2., 3., 1., 2., 3., 1., 2., 3., 1., 2., 3., 4., 1., 1., 2., 3., 2., 2., 3., 4., 5., 1., 2., 3., 4., 5., 6., 7., 1., 1., 2., 3., 4., 5., 6., 7., 1., 2., 3., 1., 2., 1., 2., 3., 1., 2., 4., 1., 2., 1., 2., 3., 4., 5., 6., 1., 2., 3., 1., 1., 1., 1., 1., 1., 2., 1., 2., 3., 1., 2., 3., 1., 1., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 1., 1., 2., 3., 1., 2., 3., 4., 5., 6., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 1., 2., 3., 4., 5., 6., 7., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 1., 2., 1., 2., 3., 4., 5., 6., 1., 2., 3., 4., 5., 6., 1., 2., 3., 4., 1., 2., 3., 4., 5., 6., 1., 2., 3., 1., 2., 1., 2.]) tiles = np.percentile(x, (0, 20, 40, 60, 80, 100)) print for i in range(1, len(tiles)): xmin, xmax = tiles[i-1], tiles[i] print 'tile i=%d range=[%.2f, %.2f), count=%d'%(i, xmin, xmax, ((x>=xmin) & (x References: Message-ID: On Wed, Aug 25, 2010 at 7:00 AM, John Hunter wrote: > Suppose I have an ordered list/array of numbers, and I want to split > them into N chunks, such that the intersection of any chunk with each > other is empty and the data is split as evenly as possible (eg the std > dev of the lengths of the chunks is minimized or some other such > criterion). How about using the percentiles of np.unique(x)? That takes care of the first constraint (no overlap) but ignores the second constraint (min std of cluster size). From jdh2358 at gmail.com Wed Aug 25 10:19:05 2010 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 25 Aug 2010 09:19:05 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: > How about using the percentiles of np.unique(x)? That takes care of > the first constraint (no overlap) but ignores the second constraint > (min std of cluster size). Well, I need the 2nd constraint.... JDH From kwgoodman at gmail.com Wed Aug 25 10:32:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 25 Aug 2010 07:32:24 -0700 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 7:19 AM, John Hunter wrote: > On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: > >> How about using the percentiles of np.unique(x)? That takes care of >> the first constraint (no overlap) but ignores the second constraint >> (min std of cluster size). > > Well, I need the 2nd constraint.... Both can't be hard constraints, so I guess the first step is to define a utility function that quantifies the trade off between the two. Would it make sense to then start from the percentile(unique(x), ...) solution and come up with a heuristic that moves an item with lots of repeats in a large length quintile to a short lenght quintile and then accept the moves if it improves the utility? Or try moving each item to each of the other 4 quintiles and do the move the improves the utility the most. Then repeat until the utility doesn't improve. 
But I guess I'm just stating the obvious and you are looking for something less obvious and more clever. From josef.pktd at gmail.com Wed Aug 25 10:44:49 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Aug 2010 10:44:49 -0400 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 10:32 AM, Keith Goodman wrote: > On Wed, Aug 25, 2010 at 7:19 AM, John Hunter wrote: >> On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: >> >>> How about using the percentiles of np.unique(x)? That takes care of >>> the first constraint (no overlap) but ignores the second constraint >>> (min std of cluster size). >> >> Well, I need the 2nd constraint.... > > Both can't be hard constraints, so I guess the first step is to define > a utility function that quantifies the trade off between the two. > Would it make sense to then start from the percentile(unique(x), ...) > solution and come up with a heuristic that moves an item with lots of > repeats in a large length quintile to a short lenght quintile and then > accept the moves if it improves the utility? Or try moving each item > to each of the other 4 quintiles and do the move the improves the > utility the most. Then repeat until the utility doesn't improve. But I > guess I'm just stating the obvious and you are looking for something > less obvious and more clever. > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > What I'm doing for some statistical analysis, e.g. chisquare test with integer data (discrete random variable)? np.bincount to get the full count, or use theoretical pdf, then loop over the integers (raw bins) and merge them to satisfy the constraints. constraints that I'm using are equal binsizes in one version and minimum binsizes in the second version. I haven't found anything else than the loop over the uniques, but I think there was some discussion on this some time ago on a mailing list. Josef From jswhit at fastmail.fm Wed Aug 25 10:44:51 2010 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Wed, 25 Aug 2010 08:44:51 -0600 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: <4C752C63.2070102@fastmail.fm> On 8/25/10 8:00 AM, John Hunter wrote: > Suppose I have an ordered list/array of numbers, and I want to split > them into N chunks, such that the intersection of any chunk with each > other is empty and the data is split as evenly as possible (eg the std > dev of the lengths of the chunks is minimized or some other such > criterion). Context: I am trying to do a quintile analysis on some > data, and np.percentile doesn't behave like I want because more than > 20% of my data equals 1, so 1 is in the first and second quintiles. > I want to avoid this -- I'd rather have uneven counts in my quintiles > than have the same value show up in multiple quintiles, but I'd like > the counts to be as even as possible.. > > Here is some sample code that illustrates my problem: > > .... John: This is a problem we have quite often analyzing precip data in arid regions - most of the time it just doesn't rain so the distribution has a delta function peak at zero. There is no good way around it. Sometimes people split up the sample into rain and no-rain, and treat the two distributions separately. -Jeff -- Jeffrey S. 
Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/PSD R/PSD1 Email : Jeffrey.S.Whitaker at noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-113 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg From ben.root at ou.edu Wed Aug 25 11:24:06 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 25 Aug 2010 10:24:06 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: On Wed, Aug 25, 2010 at 9:00 AM, John Hunter wrote: > Suppose I have an ordered list/array of numbers, and I want to split > them into N chunks, such that the intersection of any chunk with each > other is empty and the data is split as evenly as possible (eg the std > dev of the lengths of the chunks is minimized or some other such > criterion). Context: I am trying to do a quintile analysis on some > data, and np.percentile doesn't behave like I want because more than > 20% of my data equals 1, so 1 is in the first and second quintiles. > I want to avoid this -- I'd rather have uneven counts in my quintiles > than have the same value show up in multiple quintiles, but I'd like > the counts to be as even as possible.. > > Here is some sample code that illustrates my problem: > > In [178]: run ~/test > > tile i=1 range=[1.00, 1.00), count=0 > tile i=2 range=[1.00, 3.00), count=79 > tile i=3 range=[3.00, 4.60), count=42 > tile i=4 range=[4.60, 11.00), count=39 > tile i=5 range=[11.00, 43.00), count=41 > > > import numpy as np > > x = np.array([ 2., 3., 4., 5., 1., 2., 1., 1., 1., 2., > 3., > 1., 2., 3., 1., 2., 3., 1., 2., 3., 4., 1., > 1., 2., 3., 2., 2., 3., 4., 5., 1., 2., 3., > 4., 5., 6., 7., 1., 1., 2., 3., 4., 5., 6., > 7., 1., 2., 3., 1., 2., 1., 2., 3., 1., 2., > 4., 1., 2., 1., 2., 3., 4., 5., 6., 1., 2., > 3., 1., 1., 1., 1., 1., 1., 2., 1., 2., 3., > 1., 2., 3., 1., 1., 1., 2., 3., 4., 5., 6., > 7., 8., 9., 10., 1., 1., 2., 3., 1., 2., 3., > 4., 5., 6., 1., 2., 3., 4., 5., 6., 7., 8., > 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., > 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., > 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., > 42., 43., 1., 2., 3., 4., 5., 6., 7., 1., 2., > 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., > 14., 15., 16., 17., 18., 19., 1., 2., 1., 2., 3., > 4., 5., 6., 1., 2., 3., 4., 5., 6., 1., 2., > 3., 4., 1., 2., 3., 4., 5., 6., 1., 2., 3., > 1., 2., 1., 2.]) > > > tiles = np.percentile(x, (0, 20, 40, 60, 80, 100)) > > print > for i in range(1, len(tiles)): > xmin, xmax = tiles[i-1], tiles[i] > print 'tile i=%d range=[%.2f, %.2f), count=%d'%(i, xmin, xmax, > ((x>=xmin) & (x Just a crazy thought, but maybe kmeans clustering might be what you are looking for? If you know ahead of time the number of bins you want, you can let kmeans try and group things automatically. The ones will all fall into the same membership (and any other duplicated values will, too). If you sort the data first, then the behavior should be consistent. I once used kmeans to help "snap" height data from multiple observations together onto a common set of heights. The obs would have many zero height values, but then the rest of the values would not have many repeated values. This approach worked great in our particular application. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
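A rough sketch of the kmeans idea suggested above, using scipy.cluster.vq (this is not Ben Root's code; the toy data, the choice of kmeans2 and the 'points' initialization are only for illustration). Because identical observations are always assigned to the same centroid, no value ends up split across two groups; the group sizes are then only as even as the ties allow.

import numpy as np
from scipy.cluster.vq import kmeans2

# toy data standing in for the array above: many repeated small values
x = np.array([1., 1., 1., 1., 2., 2., 2., 3., 3., 4., 5., 7., 11., 20., 43.])

np.random.seed(0)                      # kmeans2 starts from randomly chosen points
centroids, labels = kmeans2(x[:, np.newaxis], 5, minit='points')

# kmeans2 may warn about empty clusters when duplicate seed points collide;
# the size check below simply skips them
for k in np.argsort(centroids.ravel()):
    members = x[labels == k]
    if members.size:
        print 'group: range=[%.1f, %.1f], count=%d' % (
            members.min(), members.max(), members.size)
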
URL: From bsouthey at gmail.com Wed Aug 25 11:44:22 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 25 Aug 2010 10:44:22 -0500 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: References: Message-ID: <4C753A56.304@gmail.com> On 08/25/2010 09:44 AM, josef.pktd at gmail.com wrote: > On Wed, Aug 25, 2010 at 10:32 AM, Keith Goodman wrote: >> On Wed, Aug 25, 2010 at 7:19 AM, John Hunter wrote: >>> On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman wrote: >>> >>>> How about using the percentiles of np.unique(x)? That takes care of >>>> the first constraint (no overlap) but ignores the second constraint >>>> (min std of cluster size). >>> Well, I need the 2nd constraint.... >> Both can't be hard constraints, so I guess the first step is to define >> a utility function that quantifies the trade off between the two. >> Would it make sense to then start from the percentile(unique(x), ...) >> solution and come up with a heuristic that moves an item with lots of >> repeats in a large length quintile to a short lenght quintile and then >> accept the moves if it improves the utility? Or try moving each item >> to each of the other 4 quintiles and do the move the improves the >> utility the most. Then repeat until the utility doesn't improve. But I >> guess I'm just stating the obvious and you are looking for something >> less obvious and more clever. >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > What I'm doing for some statistical analysis, e.g. chisquare test with > integer data (discrete random variable)? > > np.bincount to get the full count, or use theoretical pdf, > then loop over the integers (raw bins) and merge them to satisfy the > constraints. > > constraints that I'm using are equal binsizes in one version and > minimum binsizes in the second version. > > I haven't found anything else than the loop over the uniques, but I > think there was some discussion on this some time ago on a mailing > list. > > Josef > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev As others have indicated you have to work with the unique values as well as the frequencies. Hopefully you can determine what I mean from the code below and modify it as needed. It is brute force but provides a couple of options as the following output indicates. 3 [(2, 44), (3, 35), (5, 42), (13, 43), (43, 38)] 4 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] 5 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] 6 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] 7 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] 8 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] 9 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] Some notes: 1) For this example, you need an average of 41 per group (202 elements divided by 5). But that will be impossible because the value '1' has a frequency of 44, the sum of frequencies of '2' and '3' is 61. This means we need some way to allow slight increases in sizes - I use the variable eval which is the expected count plus some threshold (berror). If you have floats then you can not use np.bincount directly. So if these are integers use them directly or use some function to create these in the desirable range (such as np.ceil or work with 10*x etc.) 
Bruce binx=np.bincount(x.astype(int)) for berror in range(10): # loop over a range of possible variations in the counts eval=berror+np.ceil(binx.sum()/5.0) # find a count threshold count=0 quintile=[] for i in range(binx.shape[0]): #loop over the frequencies to determine which bin if count+binx[i] > (eval): # If the bin overflows then start a new one quintile.append((i, count)) count=binx[i] else: #other keep adding into current bin count +=binx[i] quintile.append((i, count)) #add the last bin if len(quintile)==5: # we must have five bins otherwise that loop is useless. You can also apply other criteria here as well. print berror, quintile From josef.pktd at gmail.com Wed Aug 25 12:08:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Aug 2010 12:08:54 -0400 Subject: [SciPy-Dev] splitting an ordered list as evenly as possilbe In-Reply-To: <4C753A56.304@gmail.com> References: <4C753A56.304@gmail.com> Message-ID: On Wed, Aug 25, 2010 at 11:44 AM, Bruce Southey wrote: > ?On 08/25/2010 09:44 AM, josef.pktd at gmail.com wrote: >> On Wed, Aug 25, 2010 at 10:32 AM, Keith Goodman ?wrote: >>> On Wed, Aug 25, 2010 at 7:19 AM, John Hunter ?wrote: >>>> On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman ?wrote: >>>> >>>>> How about using the percentiles of np.unique(x)? That takes care of >>>>> the first constraint (no overlap) but ignores the second constraint >>>>> (min std of cluster size). >>>> Well, I need the 2nd constraint.... >>> Both can't be hard constraints, so I guess the first step is to define >>> a utility function that quantifies the trade off between the two. >>> Would it make sense to then start from the percentile(unique(x), ...) >>> solution and come up with a heuristic that moves an item with lots of >>> repeats in a large length quintile to a short lenght quintile and then >>> accept the moves if it improves the utility? Or try moving each item >>> to each of the other 4 quintiles and do the move the improves the >>> utility the most. Then repeat until the utility doesn't improve. But I >>> guess I'm just stating the obvious and you are looking for something >>> less obvious and more clever. >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> What I'm doing for some statistical analysis, e.g. chisquare test with >> integer data (discrete random variable)? >> >> np.bincount to get the full count, or use theoretical pdf, >> then loop over the integers (raw bins) and merge them to satisfy the >> constraints. >> >> constraints that I'm using are equal binsizes in one version and >> minimum binsizes in the second version. >> >> I haven't found anything else than the loop over the uniques, but I >> think there was some discussion on this some time ago on a mailing >> list. >> >> Josef >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > As others have indicated you have to work with the unique values as well > as the frequencies. > > Hopefully you can determine what I mean from the code below and modify > it as needed. It is brute force but provides a couple of options as the > following output indicates. 
> > 3 [(2, 44), (3, 35), (5, 42), (13, 43), (43, 38)] > 4 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] > 5 [(2, 44), (3, 35), (5, 42), (14, 45), (43, 36)] > 6 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] > 7 [(2, 44), (3, 35), (5, 42), (15, 47), (43, 34)] > 8 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] > 9 [(2, 44), (3, 35), (5, 42), (16, 49), (43, 32)] > > > Some notes: > 1) For this example, you need an average of 41 per group (202 elements > divided by 5). But that will be impossible because the value '1' has a > frequency of 44, the sum of frequencies of '2' and '3' is 61. This means > we need some way to allow slight increases in sizes - I use the variable > eval which is the expected count plus some threshold (berror). > > If you have floats then you can not use np.bincount directly. So if > these are integers use them directly or use some function to create > these in the desirable range (such as np.ceil or work with 10*x etc.) > > Bruce > > binx=np.bincount(x.astype(int)) > for berror in range(10): # loop over a range of possible variations in > the counts > ? ? eval=berror+np.ceil(binx.sum()/5.0) # find a count threshold > ? ? count=0 > ? ? quintile=[] > ? ? for i in range(binx.shape[0]): #loop over the frequencies to > determine which bin > ? ? ? ? if count+binx[i] > (eval): # If the bin overflows then start a > new one > ? ? ? ? ? ? quintile.append((i, count)) > ? ? ? ? ? ? count=binx[i] > ? ? ? ? else: #other keep adding into current bin > ? ? ? ? ? ? count +=binx[i] > ? ? quintile.append((i, count)) #add the last bin > ? ? if len(quintile)==5: # we must have five bins otherwise that loop > is useless. You can also apply other criteria here as well. > ? ? ? ? print berror, quintile I don't think you can assume anything about the minimum number of bins in general. For example, my similar code needed to work also for binary distributions with at most two unique values and bins. A degenerate distribution would have only a single value and bin. 
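One way to make the "work with the unique values" idea concrete (a sketch, not Josef's or Bruce's actual code): assign every occurrence of a value to the group in which its cumulative count starts. A value then never straddles two groups, the group sizes are only as even as the ties allow, and degenerate data with fewer unique values than groups simply produces fewer groups.

import numpy as np

def value_groups(x, nbins=5):
    # group label for every element of x; tied values never straddle groups
    x = np.asarray(x)
    vals = np.unique(x)
    counts = np.array([(x == v).sum() for v in vals], dtype=float)
    before = np.cumsum(counts) - counts        # count strictly below each value
    grp = np.minimum((nbins * before / counts.sum()).astype(int), nbins - 1)
    lookup = dict(zip(vals, grp))
    return np.array([lookup[v] for v in x])

# np.bincount(value_groups(x)) then gives the resulting group sizes
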
eval is a built-in function Josef > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From gokhansever at gmail.com Fri Aug 27 11:55:32 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 10:55:32 -0500 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure Message-ID: Hello, On a Fedora 13 VirtualBox setup Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 i686 i386 GNU/Linux python -c 'import numpy; numpy.test()' Running unit tests for numpy NumPy version 2.0.0.dev8671 NumPy is installed in /usr/lib/python2.6/site-packages/numpy Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] nose version 0.11.3 ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K..................................................................................................................................................................................................................................K............................................................................................K......................K.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.........................................................................................................................................................................................................................................................................................................................................................................................................................
............Warning: divide by zero encountered in log ....................................................................................................................................................................................................................................................................................... ====================================================================== FAIL: test_lapack (test_build.TestF77Mismatch) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/numpy/testing/decorators.py", line 146, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/numpy/linalg/tests/test_build.py", line 50, in test_lapack information.""") AssertionError: Both g77 and gfortran runtimes linked in lapack_lite ! This is likely to cause random crashes and wrong results. See numpy INSTALL.txt for more information. "Fail the test if the expression is true." >> if True: raise self.failureException, 'Both g77 and gfortran runtimes linked in lapack_lite ! This is likely to\ncause random crashes and wrong results. See numpy INSTALL.txt for more\ninformation.' ---------------------------------------------------------------------- Ran 3024 tests in 21.928s FAILED (KNOWNFAIL=4, failures=1) Any idea how to resolve this one? I use package manager to install requirements. It seems g77 and gfortran are mixed for lapack, but not sure how to fix it. When I try to uninstall gfortran it tries to remove lapack/blas/atlas all. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 11:58:17 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 10:58:17 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) Message-ID: Hello, Again on Fedora 13 Virtualbox setup: Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 i686 i386 GNU/Linux Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test('full') Running unit tests for scipy NumPy version 2.0.0.dev8671 NumPy is installed in /usr/lib/python2.6/site-packages/numpy SciPy version 0.9.0.dev6651 SciPy is installed in /usr/lib/python2.6/site-packages/scipy Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] nose version 0.11.3 ................................................................................................................................................................................................................................................................................/usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:670: UserWarning: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=7). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) ....../usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:601: UserWarning: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots. 
warnings.warn(message) .............................................K..K............................................................Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .........Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply ...................................................................................................................................................................................................................................................................................................................................................../usr/lib/python2.6/site-packages/scipy/io/recaster.py:328: ComplexWarning: Casting complex values to real discards the imaginary part test_arr = arr.astype(T) ../usr/lib/python2.6/site-packages/scipy/io/recaster.py:375: ComplexWarning: Casting complex values to real discards the imaginary part return arr.astype(idt) ../usr/lib/python2.6/site-packages/scipy/io/wavfile.py:30: WavFileWarning: Unfamiliar format bytes warnings.warn("Unfamiliar format bytes", WavFileWarning) /usr/lib/python2.6/site-packages/scipy/io/wavfile.py:120: WavFileWarning: chunk not understood warnings.warn("chunk not understood", WavFileWarning) .......................................................................................F............................................/usr/lib/python2.6/site-packages/scipy/lib/blas/tests/test_fblas.py:86: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ....../usr/lib/python2.6/site-packages/scipy/lib/blas/tests/test_fblas.py:196: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) .................../usr/lib/python2.6/site-packages/scipy/lib/blas/tests/test_fblas.py:279: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ..................................................................SSSSSS......SSSSSS......SSSS.....................................................FF........F..Warning: invalid value encountered in divide .....Warning: invalid value encountered in divide Warning: invalid value encountered in divide .......................................................................................................................................................................................................K............................................./usr/lib/python2.6/site-packages/scipy/linalg/tests/test_fblas.py:89: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ....../usr/lib/python2.6/site-packages/scipy/linalg/tests/test_fblas.py:199: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) 
.................../usr/lib/python2.6/site-packages/scipy/linalg/tests/test_fblas.py:282: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ....................................................................../usr/lib/python2.6/site-packages/scipy/linalg/matfuncs.py:94: ComplexWarning: Casting complex values to real discards the imaginary part return dot(dot(vr,diag(exp(s))),vri).astype(t) .................................................................................................................................................................................................................................................../usr/lib/python2.6/site-packages/scipy/ndimage/tests/test_ndimage.py:56: ComplexWarning: Casting complex values to real discards the imaginary part a = a.astype(numpy.float64) /usr/lib/python2.6/site-packages/scipy/ndimage/tests/test_ndimage.py:58: ComplexWarning: Casting complex values to real discards the imaginary part b = b.astype(numpy.float64) ......................................................................................................Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide .....................................................................Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide ........................Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide ..............................................9 7 ....F...............Warning: invalid value encountered in divide ................./usr/lib/python2.6/site-packages/scipy/signal/filter_design.py:247: BadCoefficients: Badly conditioned filter 
coefficients (numerator): the results may be meaningless "results may be meaningless", BadCoefficients) ..............................................................................................................................................................................................................................................................................................SSSSSSSSSSS..........Segmentation fault (core dumped) -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Fri Aug 27 14:14:32 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 27 Aug 2010 13:14:32 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: References: Message-ID: <4C780088.4090602@gmail.com> On 08/27/2010 10:58 AM, G?khan Sever wrote: > Hello, > > Again on Fedora 13 Virtualbox setup: > Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 > i686 i686 i386 GNU/Linux > > > Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) > [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import scipy > >>> scipy.test('full') > Running unit tests for scipy > NumPy version 2.0.0.dev8671 > NumPy is installed in /usr/lib/python2.6/site-packages/numpy > SciPy version 0.9.0.dev6651 > SciPy is installed in /usr/lib/python2.6/site-packages/scipy > Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 > 20100503 (Red Hat 4.4.4-2)] > nose version 0.11.3 > ................................................................................................................................................................................................................................................................................/usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:670: > UserWarning: > The coefficients of the spline returned have been computed as the > minimal norm least-squares solution of a (numerically) rank deficient > system (deficiency=7). If deficiency is large, the results may be > inaccurate. Deficiency may strongly depend on the value of eps. > warnings.warn(message) > ....../usr/lib/python2.6/site-packages/scipy/interpolate/fitpack2.py:601: > UserWarning: > The required storage space exceeds the available storage space: nxest > or nyest too small, or s too small. > The weighted least-squares spline corresponds to the current set of > knots. 
> <snip: remainder of quoted test log trimmed; it is identical to the output in the previous message>
....F...............Warning: invalid value encountered in divide > ................./usr/lib/python2.6/site-packages/scipy/signal/filter_design.py:247: > BadCoefficients: Badly conditioned filter coefficients (numerator): > the results may be meaningless > "results may be meaningless", BadCoefficients) > ..............................................................................................................................................................................................................................................................................................SSSSSSSSSSS..........Segmentation > fault (core dumped) > > > -- > G?khan > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev Hi, It would be useful to know which test is involved regardless of the issue. Can you please run the tests with the verbose option such as: 'scipy.test(verbose=10)'? This may have something to do with your numpy problem so please fix that up once you have identified the test (failure in linalg would confirm that). As a reference, my Fedora 13 x64 bit system uses gfortran - 'gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)'. Did you completely wipe the previous numpy installation especially the installed numpy files in $PATH2PYTHON/site-packages/ and remove any prior build directories? If so, then you need to create suitable site.cfg file to ensure the correct compiler is being used because something has changed in either your distro or numpy install. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.m.birch at gmail.com Fri Aug 27 14:17:25 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 14:17:25 -0400 Subject: [SciPy-Dev] scipy.stats.kde Message-ID: Hi all, I was thinking of renovating the kernel density estimation package (although no promises; I'm leaving for college tomorrow morning!). I was wondering: a) whether anyone had started code in that direction b) what people want in it I was thinking (as an ideal, not necessarily goal): - Support for more than Gaussian kernels (e.g. custom, uniform, Epanechnikov, triangular, quartic, cosine, etc.) - More options for bandwidth selection (custom bandwidth matrices, AMISE optimization, cross-validation, etc.) - Assorted conveniences: automatically generate the mesh, limit the kernel's support for speed So, thoughts anyone? I figure it's better to over-specify and then under-produce, so don't hold back. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Aug 27 14:38:45 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Aug 2010 14:38:45 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch wrote: > Hi all, > I was thinking of?renovating?the kernel density estimation package (although > no promises; I'm leaving for college tomorrow morning!). I was wondering: > a) whether anyone had started code in that direction Mike Crowe wrote code for kernel regression and Skipper started a 1D kernel density estimator in scikits.statsmodels, which cover a larger number of kernels I don't think I have seen any higher dimensional kernel density estimation in python besides scipy.stats.kde. The Gaussian kde in scipy.stats is targeted to the underlying Fortran code for multivariate normal cdf. 
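(For concreteness, the current interface looks roughly like this -- gaussian_kde, its evaluation call, and integrate_box are the existing scipy.stats calls, while the dataset and grid below are made up purely for illustration:)

    import numpy as np
    from scipy import stats

    data = np.random.randn(2, 500)             # 2-D sample, shape (ndim, npoints)
    kde = stats.gaussian_kde(data)              # Gaussian kernel, automatic bandwidth

    grid = np.mgrid[-2:2:50j, -2:2:50j]         # evaluate the density on a mesh
    density = kde(grid.reshape(2, -1)).reshape(50, 50)
    mass = kde.integrate_box([-2, -2], [0, 0])  # box integration (uses the mvn Fortran code)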
It's not clear to me what other n-dimensional kdes would require or whether they would fit well with the current code. One extension that Robert also mentioned in the past that it would be nice to have adaptive kernels, which I also haven't seen in python yet. > b) what people want in it > I was thinking (as an ideal, not?necessarily?goal): > - Support for more than Gaussian kernels (e.g. custom, > uniform,?Epanechnikov, triangular, quartic, cosine, etc.) > - More options for bandwidth selection (custom bandwidth?matrices, AMISE > optimization, cross-validation, etc.) definitely yes, I don't think they are even available for 1D yet. > - Assorted conveniences: automatically generate the mesh, limit the kernel's > support for speed Using scipy.spatial to limit the number of neighbors in a bounded support kernel might be a good idea. (just some thought on the topic) Josef > So, thoughts anyone? I figure it's better to over-specify and then > under-produce, so don't hold back. > Thanks, > Sam > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From sam.m.birch at gmail.com Fri Aug 27 14:47:37 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 14:47:37 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: Well perhaps I should start with a module that does other kernels & bandwidth estimation then? Then everybody who uses them can use a standard implementation. Is that an appropriate addition to SciPy core? On Fri, Aug 27, 2010 at 2:38 PM, wrote: > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch wrote: > > Hi all, > > I was thinking of renovating the kernel density estimation package > (although > > no promises; I'm leaving for college tomorrow morning!). I was wondering: > > a) whether anyone had started code in that direction > > Mike Crowe wrote code for kernel regression and Skipper started a 1D > kernel density estimator in scikits.statsmodels, which cover a larger > number of kernels > > I don't think I have seen any higher dimensional kernel density > estimation in python besides scipy.stats.kde. The Gaussian kde in > scipy.stats is targeted to the underlying Fortran code for > multivariate normal cdf. > It's not clear to me what other n-dimensional kdes would require or > whether they would fit well with the current code. > > One extension that Robert also mentioned in the past that it would be > nice to have adaptive kernels, which I also haven't seen in python > yet. > > > b) what people want in it > > I was thinking (as an ideal, not necessarily goal): > > - Support for more than Gaussian kernels (e.g. custom, > > uniform, Epanechnikov, triangular, quartic, cosine, etc.) > > - More options for bandwidth selection (custom bandwidth matrices, AMISE > > optimization, cross-validation, etc.) > > definitely yes, I don't think they are even available for 1D yet. > > > - Assorted conveniences: automatically generate the mesh, limit the > kernel's > > support for speed > > Using scipy.spatial to limit the number of neighbors in a bounded > support kernel might be a good idea. > > (just some thought on the topic) > > Josef > > > So, thoughts anyone? I figure it's better to over-specify and then > > under-produce, so don't hold back. 
> > Thanks, > > Sam > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Fri Aug 27 14:48:29 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 27 Aug 2010 14:48:29 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: My only experience with KDEs has been on the circle, where there seems to be little or no literature and the constraints are rather different. On 27 August 2010 14:38, wrote: > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch wrote: >> Hi all, >> I was thinking of renovating the kernel density estimation package (although >> no promises; I'm leaving for college tomorrow morning!). I was wondering: >> a) whether anyone had started code in that direction > > Mike Crowe wrote code for kernel regression and Skipper started a 1D > kernel density estimator in scikits.statsmodels, which cover a larger > number of kernels > > I don't think I have seen any higher dimensional kernel density > estimation in python besides scipy.stats.kde. The Gaussian kde in > scipy.stats is targeted to the underlying Fortran code for > multivariate normal cdf. > It's not clear to me what other n-dimensional kdes would require or > whether they would fit well with the current code. > > One extension that Robert also mentioned in the past that it would be > nice to have adaptive kernels, which I also haven't seen in python > yet. > >> b) what people want in it >> I was thinking (as an ideal, not necessarily goal): >> - Support for more than Gaussian kernels (e.g. custom, >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) >> - More options for bandwidth selection (custom bandwidth matrices, AMISE >> optimization, cross-validation, etc.) > > definitely yes, I don't think they are even available for 1D yet. Bandwidth selection is a hotly debated topic, at least in one dimension, so perhaps not just different methods but tools for diagnosing bandwidth selection problems would be nice - at the least, it should be made straightforward to vary the bandwidth (e.g. to plot the KDE with a range of different bandwidth values). >> - Assorted conveniences: automatically generate the mesh, limit the kernel's >> support for speed > > Using scipy.spatial to limit the number of neighbors in a bounded > support kernel might be a good idea. Simply using it to find the neighbors that need to be used should speed things up. There may also be some shortcuts for unbounded-support kernels (no point adding a Gaussian a hundred sigma away if there's any points nearby). At the other end of the spectrum, for very dense KDEs, on the circle I found it extremely convenient to use Fourier transforms to carry out the convolution of kernel with points. In particular, I represented the KDE in terms of its Fourier coefficients, so that an inverse FFT immediately gave me the KDE evaluated on a grid (or, with some fiddling, integrated over the bins of a histogram). I don't know whether this is a useful optimization for KDEs on the line or in higher dimensions, since there's the problem of wrapping. Anne > (just some thought on the topic) > > Josef > >> So, thoughts anyone? 
I figure it's better to over-specify and then >> under-produce, so don't hold back. >> Thanks, >> Sam >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From robert.kern at gmail.com Fri Aug 27 14:56:27 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Aug 2010 13:56:27 -0500 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 13:38, wrote: > I don't think I have seen any higher dimensional kernel density > estimation in python besides scipy.stats.kde. The Gaussian kde in > scipy.stats is targeted to the underlying Fortran code for > multivariate normal cdf. Only for the "integrate over a box" functionality, which was what I needed at the time but is pretty rarely required otherwise. The rest is pure numpy. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From aarchiba at physics.mcgill.ca Fri Aug 27 15:05:17 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 27 Aug 2010 15:05:17 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On 27 August 2010 14:56, Robert Kern wrote: > On Fri, Aug 27, 2010 at 13:38, wrote: > >> I don't think I have seen any higher dimensional kernel density >> estimation in python besides scipy.stats.kde. The Gaussian kde in >> scipy.stats is targeted to the underlying Fortran code for >> multivariate normal cdf. > > Only for the "integrate over a box" functionality, which was what I > needed at the time but is pretty rarely required otherwise. The rest > is pure numpy. I should say, integrating over a box is something I do all the time, though that is partly because it is cheap in my setting. For example, for plotting on a grid, what you really want to do is not sample on the grid but produce average values over the grid cells - this way you never miss or exaggerate a peak. So having efficient methods to integrate over one box or all grid cells can be really handy. Unfortunately I think it is often expensive even when approximations are made that allow discarding sufficiently distant points. Anne > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From robert.kern at gmail.com Fri Aug 27 15:09:22 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Aug 2010 14:09:22 -0500 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 14:05, Anne Archibald wrote: > On 27 August 2010 14:56, Robert Kern wrote: >> On Fri, Aug 27, 2010 at 13:38, ? wrote: >> >>> I don't think I have seen any higher dimensional kernel density >>> estimation in python besides scipy.stats.kde. The Gaussian kde in >>> scipy.stats is targeted to the underlying Fortran code for >>> multivariate normal cdf. 
>> >> Only for the "integrate over a box" functionality, which was what I >> needed at the time but is pretty rarely required otherwise. The rest >> is pure numpy. > > I should say, integrating over a box is something I do all the time, > though that is partly because it is cheap in my setting. For example, > for plotting on a grid, what you really want to do is not sample on > the grid but produce average values over the grid cells - this way you > never miss or exaggerate a peak. So having efficient methods to > integrate over one box or all grid cells can be really handy. > Unfortunately I think it is often expensive even when approximations > are made that allow discarding sufficiently distant points. Well okay then. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sam.m.birch at gmail.com Fri Aug 27 15:27:45 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 15:27:45 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: > > Bandwidth selection is a hotly debated topic, at least in one dimension, so perhaps not just different methods but tools for diagnosing bandwidth selection problems would be nice - at the least, it should be made straightforward to vary the bandwidth (e.g. to plot the KDE with a range of different bandwidth values). Well by allowing them to use a custom bandwidth matrix they can vary it themselves, no? At the other end of the spectrum, for very dense KDEs, on the circle I found it extremely convenient to use Fourier transforms to carry out the convolution of kernel with points. In particular, I represented the KDE in terms of its Fourier coefficients, so that an inverse FFT immediately gave me the KDE evaluated on a grid (or, with some fiddling, integrated over the bins of a histogram). I don't know whether this is a useful optimization for KDEs on the line or in higher dimensions, since there's the problem of wrapping. That sounds very interesting. Sorry if I'm being dense (or just wrong, or both), but do you convolve post-FFT or before? If before why does it make it easier? -Sam On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald wrote: > My only experience with KDEs has been on the circle, where there seems > to be little or no literature and the constraints are rather > different. > > On 27 August 2010 14:38, wrote: > > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch > wrote: > >> Hi all, > >> I was thinking of renovating the kernel density estimation package > (although > >> no promises; I'm leaving for college tomorrow morning!). I was > wondering: > >> a) whether anyone had started code in that direction > > > > Mike Crowe wrote code for kernel regression and Skipper started a 1D > > kernel density estimator in scikits.statsmodels, which cover a larger > > number of kernels > > > > I don't think I have seen any higher dimensional kernel density > > estimation in python besides scipy.stats.kde. The Gaussian kde in > > scipy.stats is targeted to the underlying Fortran code for > > multivariate normal cdf. > > It's not clear to me what other n-dimensional kdes would require or > > whether they would fit well with the current code. > > > > One extension that Robert also mentioned in the past that it would be > > nice to have adaptive kernels, which I also haven't seen in python > > yet. 
> > > >> b) what people want in it > >> I was thinking (as an ideal, not necessarily goal): > >> - Support for more than Gaussian kernels (e.g. custom, > >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) > >> - More options for bandwidth selection (custom bandwidth matrices, AMISE > >> optimization, cross-validation, etc.) > > > > definitely yes, I don't think they are even available for 1D yet. > > Bandwidth selection is a hotly debated topic, at least in one > dimension, so perhaps not just different methods but tools for > diagnosing bandwidth selection problems would be nice - at the least, > it should be made straightforward to vary the bandwidth (e.g. to plot > the KDE with a range of different bandwidth values). > > >> - Assorted conveniences: automatically generate the mesh, limit the > kernel's > >> support for speed > > > > Using scipy.spatial to limit the number of neighbors in a bounded > > support kernel might be a good idea. > > Simply using it to find the neighbors that need to be used should > speed things up. There may also be some shortcuts for > unbounded-support kernels (no point adding a Gaussian a hundred sigma > away if there's any points nearby). > > At the other end of the spectrum, for very dense KDEs, on the circle I > found it extremely convenient to use Fourier transforms to carry out > the convolution of kernel with points. In particular, I represented > the KDE in terms of its Fourier coefficients, so that an inverse FFT > immediately gave me the KDE evaluated on a grid (or, with some > fiddling, integrated over the bins of a histogram). I don't know > whether this is a useful optimization for KDEs on the line or in > higher dimensions, since there's the problem of wrapping. > > Anne > > > (just some thought on the topic) > > > > Josef > > > >> So, thoughts anyone? I figure it's better to over-specify and then > >> under-produce, so don't hold back. > >> Thanks, > >> Sam > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-dev > >> > >> > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Aug 27 15:39:27 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Aug 2010 15:39:27 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 3:27 PM, Sam Birch wrote: >> Bandwidth selection is a hotly debated topic, at least in one > > dimension, so perhaps not just different methods but tools for > > diagnosing bandwidth selection problems would be nice - at the least, > > it should be made straightforward to vary the bandwidth (e.g. to plot > > the KDE with a range of different bandwidth values). > > Well by allowing them to use a custom bandwidth matrix they can vary it > themselves, no? > >> At the other end of the spectrum, for very dense KDEs, on the circle I > > found it extremely convenient to use Fourier transforms to carry out > > the convolution of kernel with points. 
In particular, I represented > > the KDE in terms of its Fourier coefficients, so that an inverse FFT > > immediately gave me the KDE evaluated on a grid (or, with some > > fiddling, integrated over the bins of a histogram). I don't know > > whether this is a useful optimization for KDEs on the line or in > > higher dimensions, since there's the problem of wrapping. > > That sounds very interesting. Sorry if I'm being dense (or just wrong, or > both), but do you convolve post-FFT or before? If before why does it make it > easier? and also: Do you grid the initial points first ? I think it sounds similar to what Skipper was trying at some point. >From the paper it sounded like it's expensive to construct the initial points, but then much cheaper to evaluate the kde at many points because of the use of the fft for the actual convolution. Josef > -Sam > On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald > wrote: >> >> My only experience with KDEs has been on the circle, where there seems >> to be little or no literature and the constraints are rather >> different. >> >> On 27 August 2010 14:38, ? wrote: >> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch >> > wrote: >> >> Hi all, >> >> I was thinking of renovating the kernel density estimation package >> >> (although >> >> no promises; I'm leaving for college tomorrow morning!). I was >> >> wondering: >> >> a) whether anyone had started code in that direction >> > >> > Mike Crowe wrote code for kernel regression ?and Skipper started a 1D >> > kernel density estimator in scikits.statsmodels, which cover a larger >> > number of kernels >> > >> > I don't think I have seen any higher dimensional kernel density >> > estimation in python besides scipy.stats.kde. The Gaussian kde in >> > scipy.stats is targeted to the underlying Fortran code for >> > multivariate normal cdf. >> > It's not clear to me what other n-dimensional kdes would require or >> > whether they would fit well with the current code. >> > >> > One extension that Robert also mentioned in the past that it would be >> > nice to have adaptive kernels, which I also haven't seen in python >> > yet. >> > >> >> b) what people want in it >> >> I was thinking (as an ideal, not necessarily goal): >> >> - Support for more than Gaussian kernels (e.g. custom, >> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) >> >> - More options for bandwidth selection (custom bandwidth matrices, >> >> AMISE >> >> optimization, cross-validation, etc.) >> > >> > definitely yes, I don't think they are even available for 1D yet. >> >> Bandwidth selection is a hotly debated topic, at least in one >> dimension, so perhaps not just different methods but tools for >> diagnosing bandwidth selection problems would be nice - at the least, >> it should be made straightforward to vary the bandwidth (e.g. to plot >> the KDE with a range of different bandwidth values). >> >> >> - Assorted conveniences: automatically generate the mesh, limit the >> >> kernel's >> >> support for speed >> > >> > Using scipy.spatial to limit the number of neighbors in a bounded >> > support kernel might be a good idea. >> >> Simply using it to find the neighbors that need to be used should >> speed things up. There may also be some shortcuts for >> unbounded-support kernels (no point adding a Gaussian a hundred sigma >> away if there's any points nearby). 
>> >> At the other end of the spectrum, for very dense KDEs, on the circle I >> found it extremely convenient to use Fourier transforms to carry out >> the convolution of kernel with points. In particular, I represented >> the KDE in terms of its Fourier coefficients, so that an inverse FFT >> immediately gave me the KDE evaluated on a grid (or, with some >> fiddling, integrated over the bins of a histogram). I don't know >> whether this is a useful optimization for KDEs on the line or in >> higher dimensions, since there's the problem of wrapping. >> >> Anne >> >> > (just some thought on the topic) >> > >> > Josef >> > >> >> So, thoughts anyone? I figure it's better to over-specify and then >> >> under-produce, so don't hold back. >> >> Thanks, >> >> Sam >> >> _______________________________________________ >> >> SciPy-Dev mailing list >> >> SciPy-Dev at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From aarchiba at physics.mcgill.ca Fri Aug 27 15:51:50 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 27 Aug 2010 15:51:50 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: On 27 August 2010 15:27, Sam Birch wrote: >> Bandwidth selection is a hotly debated topic, at least in one > > dimension, so perhaps not just different methods but tools for > > diagnosing bandwidth selection problems would be nice - at the least, > > it should be made straightforward to vary the bandwidth (e.g. to plot > > the KDE with a range of different bandwidth values). > > Well by allowing them to use a custom bandwidth matrix they can vary it > themselves, no? Well, in principle, yes. But if the API forces them to construct an entirely new KDE object to change the bandwidth matrix, and if this object involves substantial additional data structures (e.g. a kd-tree holding the data points) this could be cumbersome. >> At the other end of the spectrum, for very dense KDEs, on the circle I > > found it extremely convenient to use Fourier transforms to carry out > > the convolution of kernel with points. In particular, I represented > > the KDE in terms of its Fourier coefficients, so that an inverse FFT > > immediately gave me the KDE evaluated on a grid (or, with some > > fiddling, integrated over the bins of a histogram). I don't know > > whether this is a useful optimization for KDEs on the line or in > > higher dimensions, since there's the problem of wrapping. > > That sounds very interesting. Sorry if I'm being dense (or just wrong, or > both), but do you convolve post-FFT or before? If before why does it make it > easier? Again, this is for work on the circle and for fairly dense data sets. But in principle, the KDE as a function is the convolution of a forest of delta functions, one per point, with the kernel. The conventional way to evaluate this function at a point is simply to evaluate the kernel once per data point and add them up. To evaluate this on a grid, you evaluate the kernel once per grid point per data point and add. 
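(In code, that brute-force grid evaluation is just one kernel evaluation per grid point per data point; a minimal 1-D NumPy sketch with a Gaussian kernel, where the function name and arguments are only illustrative:)

    import numpy as np

    def kde_on_grid(data, grid, bandwidth):
        # one kernel evaluation per (grid point, data point) pair:
        # O(len(grid) * len(data)) work
        u = (grid[:, None] - data[None, :]) / bandwidth
        kernel_vals = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        return kernel_vals.mean(axis=1) / bandwidth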
Naturally this can be expensive. My idea is that convolution of functions is simply multiplication of the Fourier transforms of those functions. So instead of storing a list of data points in my KDE object, I store a representation of the forest of delta functions in terms of their Fourier coefficients (the nth Fourier coefficient of a delta function at phase p is exp(2 pi i n p)). This is necessarily approximate, since I store finitely many Fourier coefficients, but it's not hard to store "enough". Now when I want to convolve this forest by a kernel, I simply multiply these Fourier coefficients by those of the kernel. The easiest "kernel" is the sinc function, for which I simply truncate the Fourier coefficients (which is why it's easy to have enough). We actually use this "kernel" a lot, even though it's not positive everywhere. A mathematically-better choice is the von Mises distribution, whose PDF is proportional to exp(k cos(x)) and whose Fourier coefficients can be written in terms of Bessel functions. Once you have the Fourier coefficients of the KDE, you can evaluate it at a point by taking a sum of sinusoids, but the key idea is that you can evaluate it on a grid by taking an inverse FFT. If you want integrals over intervals, well, that you just get by integrating sinusoids over intervals, so there's a messy but easily-derived way to work out the area in terms of the Fourier coefficients. This too can be nicely worked out on a grid, by fiddling the Fourier coefficients and taking an inverse FFT. To construct the FCs of the forest of delta functions, if I have photon arrival phases I just take a sum (which can be slow, but this isn't really time-critical). But it's also perfectly reasonable to start from a histogram and take an FFT. Crucially, the histogram need not be a reasonable-looking histogram - you can never have too many bins, since it's not a problem if all the bin counts are either zero or one. The only drawback here is that you introduce an error averaging to half a bin width on each photon arrival phase. But the KDE provides a check on this too - if your kernel width is much larger than the width of the input bins, then the errors you introduced probably don't matter much (leaving aside nasty issues with Moire patterns in the likely case that your input times were already binned). One thing to note here is that once you have the FCs, you can try various kernels and bandwidths without going back to your original data. (You can also get uncertainties on all the various computed quantities, and in fact you can usually turn around and not only start from a histogram but start from an array of values with uncertainties. All this stuff is in a paper that's on my back burner right now.) The thing is, I don't really know how useful all this is for KDEs on a line or in R^n. The problem is that working with discrete Fourier coefficients implicitly wraps the KDE around at the ends of the interval, and it's not clear that this is still worth doing if you're going to "pad" your region enough that this isn't a problem: the padding forces you to evaluate at lots of points you don't care about and use lots more Fourier coefficients than you would otherwise have to. Anne > -Sam > On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald > wrote: >> >> My only experience with KDEs has been on the circle, where there seems >> to be little or no literature and the constraints are rather >> different. 
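(A minimal sketch of the Fourier-coefficient construction described above, assuming phases on [0, 1) and a von Mises kernel; the function name, defaults and grid size are illustrative, not an existing scipy API:)

    import numpy as np
    from scipy.special import ive   # exponentially scaled Bessel I_n

    def circular_kde(phases, kappa=50.0, n_coef=64, n_grid=512):
        phases = np.asarray(phases, dtype=float) % 1.0
        n = np.arange(n_coef + 1)
        # Fourier coefficients of the "forest of delta functions":
        # c_n = (1/N) * sum_j exp(-2*pi*i*n*p_j)
        c = np.exp(-2j * np.pi * np.outer(n, phases)).mean(axis=1)
        # von Mises kernel coefficients I_n(kappa)/I_0(kappa);
        # convolution on the circle = coefficient-wise multiplication
        k = ive(n, kappa) / ive(0, kappa)
        # the inverse real FFT evaluates the smoothed density on a regular grid
        density = n_grid * np.fft.irfft(c * k, n_grid)
        return np.arange(n_grid) / n_grid, density

Since only the factor k depends on the kernel and bandwidth, different kernels or bandwidths can be tried by recomputing k alone, without going back to the original data.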
>> >> On 27 August 2010 14:38, wrote: >> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch >> > wrote: >> >> Hi all, >> >> I was thinking of renovating the kernel density estimation package >> >> (although >> >> no promises; I'm leaving for college tomorrow morning!). I was >> >> wondering: >> >> a) whether anyone had started code in that direction >> > >> > Mike Crowe wrote code for kernel regression and Skipper started a 1D >> > kernel density estimator in scikits.statsmodels, which cover a larger >> > number of kernels >> > >> > I don't think I have seen any higher dimensional kernel density >> > estimation in python besides scipy.stats.kde. The Gaussian kde in >> > scipy.stats is targeted to the underlying Fortran code for >> > multivariate normal cdf. >> > It's not clear to me what other n-dimensional kdes would require or >> > whether they would fit well with the current code. >> > >> > One extension that Robert also mentioned in the past that it would be >> > nice to have adaptive kernels, which I also haven't seen in python >> > yet. >> > >> >> b) what people want in it >> >> I was thinking (as an ideal, not necessarily goal): >> >> - Support for more than Gaussian kernels (e.g. custom, >> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) >> >> - More options for bandwidth selection (custom bandwidth matrices, >> >> AMISE >> >> optimization, cross-validation, etc.) >> > >> > definitely yes, I don't think they are even available for 1D yet. >> >> Bandwidth selection is a hotly debated topic, at least in one >> dimension, so perhaps not just different methods but tools for >> diagnosing bandwidth selection problems would be nice - at the least, >> it should be made straightforward to vary the bandwidth (e.g. to plot >> the KDE with a range of different bandwidth values). >> >> >> - Assorted conveniences: automatically generate the mesh, limit the >> >> kernel's >> >> support for speed >> > >> > Using scipy.spatial to limit the number of neighbors in a bounded >> > support kernel might be a good idea. >> >> Simply using it to find the neighbors that need to be used should >> speed things up. There may also be some shortcuts for >> unbounded-support kernels (no point adding a Gaussian a hundred sigma >> away if there's any points nearby). >> >> At the other end of the spectrum, for very dense KDEs, on the circle I >> found it extremely convenient to use Fourier transforms to carry out >> the convolution of kernel with points. In particular, I represented >> the KDE in terms of its Fourier coefficients, so that an inverse FFT >> immediately gave me the KDE evaluated on a grid (or, with some >> fiddling, integrated over the bins of a histogram). I don't know >> whether this is a useful optimization for KDEs on the line or in >> higher dimensions, since there's the problem of wrapping. >> >> Anne >> >> > (just some thought on the topic) >> > >> > Josef >> > >> >> So, thoughts anyone? I figure it's better to over-specify and then >> >> under-produce, so don't hold back. 
>> >> Thanks, >> >> Sam >> >> _______________________________________________ >> >> SciPy-Dev mailing list >> >> SciPy-Dev at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From sam.m.birch at gmail.com Fri Aug 27 16:37:03 2010 From: sam.m.birch at gmail.com (Sam Birch) Date: Fri, 27 Aug 2010 16:37:03 -0400 Subject: [SciPy-Dev] scipy.stats.kde In-Reply-To: References: Message-ID: Quick question: norm <-> delta function? Again, this is for work on the circle and for fairly dense data sets. But in principle, the KDE as a function is the convolution of a forest of delta functions, one per point, with the kernel. The conventional way to evaluate this function at a point is simply to evaluate the kernel once per data point and add them up. To evaluate this on a grid, you evaluate the kernel once per grid point per data point and add. Naturally this can be expensive. Huh, usually I do it the other way around (changing the grid as needed per-datapoint) because it saves lots of time for kernels with finite support. Is truncating kernels often applied? It depends on your data obviously but in many cases the Gaussian drops very rapidly (relative to the size of the mesh). To the rest: that's a very clever idea. I see your point about varying the bandwidths etc. without recomputing the KDE. Maybe then there should be a separation between "plotting" the KDE with a given kernel & bandwidth and creating a KDE from a dataset (which would just determine the FCs of the forest of delta functions). W.r.t. padding, perhaps doing it the other way around (as I mentioned above) would negate the consequences of an expanded mesh (I have no idea what I'm talking about really--shot in the dark)? -Sam On Fri, Aug 27, 2010 at 3:51 PM, Anne Archibald wrote: > On 27 August 2010 15:27, Sam Birch wrote: > >> Bandwidth selection is a hotly debated topic, at least in one > > > > dimension, so perhaps not just different methods but tools for > > > > diagnosing bandwidth selection problems would be nice - at the least, > > > > it should be made straightforward to vary the bandwidth (e.g. to plot > > > > the KDE with a range of different bandwidth values). > > > > Well by allowing them to use a custom bandwidth matrix they can vary it > > themselves, no? > > Well, in principle, yes. But if the API forces them to construct an > entirely new KDE object to change the bandwidth matrix, and if this > object involves substantial additional data structures (e.g. a kd-tree > holding the data points) this could be cumbersome. > > >> At the other end of the spectrum, for very dense KDEs, on the circle I > > > > found it extremely convenient to use Fourier transforms to carry out > > > > the convolution of kernel with points. In particular, I represented > > > > the KDE in terms of its Fourier coefficients, so that an inverse FFT > > > > immediately gave me the KDE evaluated on a grid (or, with some > > > > fiddling, integrated over the bins of a histogram). 
I don't know > > > > whether this is a useful optimization for KDEs on the line or in > > > > higher dimensions, since there's the problem of wrapping. > > > > That sounds very interesting. Sorry if I'm being dense (or just wrong, or > > both), but do you convolve post-FFT or before? If before why does it make > it > > easier? > > Again, this is for work on the circle and for fairly dense data sets. > But in principle, the KDE as a function is the convolution of a forest > of delta functions, one per point, with the kernel. The conventional > way to evaluate this function at a point is simply to evaluate the > kernel once per data point and add them up. To evaluate this on a > grid, you evaluate the kernel once per grid point per data point and > add. Naturally this can be expensive. > > My idea is that convolution of functions is simply multiplication of > the Fourier transforms of those functions. So instead of storing a > list of data points in my KDE object, I store a representation of the > forest of delta functions in terms of their Fourier coefficients (the > nth Fourier coefficient of a delta function at phase p is exp(2 pi i n > p)). This is necessarily approximate, since I store finitely many > Fourier coefficients, but it's not hard to store "enough". Now when I > want to convolve this forest by a kernel, I simply multiply these > Fourier coefficients by those of the kernel. The easiest "kernel" is > the sinc function, for which I simply truncate the Fourier > coefficients (which is why it's easy to have enough). We actually use > this "kernel" a lot, even though it's not positive everywhere. A > mathematically-better choice is the von Mises distribution, whose PDF > is proportional to exp(k cos(x)) and whose Fourier coefficients can be > written in terms of Bessel functions. > > Once you have the Fourier coefficients of the KDE, you can evaluate it > at a point by taking a sum of sinusoids, but the key idea is that you > can evaluate it on a grid by taking an inverse FFT. If you want > integrals over intervals, well, that you just get by integrating > sinusoids over intervals, so there's a messy but easily-derived way to > work out the area in terms of the Fourier coefficients. This too can > be nicely worked out on a grid, by fiddling the Fourier coefficients > and taking an inverse FFT. > > > To construct the FCs of the forest of delta functions, if I have > photon arrival phases I just take a sum (which can be slow, but this > isn't really time-critical). But it's also perfectly reasonable to > start from a histogram and take an FFT. Crucially, the histogram need > not be a reasonable-looking histogram - you can never have too many > bins, since it's not a problem if all the bin counts are either zero > or one. The only drawback here is that you introduce an error > averaging to half a bin width on each photon arrival phase. But the > KDE provides a check on this too - if your kernel width is much larger > than the width of the input bins, then the errors you introduced > probably don't matter much (leaving aside nasty issues with Moire > patterns in the likely case that your input times were already > binned). > > One thing to note here is that once you have the FCs, you can try > various kernels and bandwidths without going back to your original > data. (You can also get uncertainties on all the various computed > quantities, and in fact you can usually turn around and not only start > from a histogram but start from an array of values with uncertainties. 
> All this stuff is in a paper that's on my back burner right now.) > > > The thing is, I don't really know how useful all this is for KDEs on a > line or in R^n. The problem is that working with discrete Fourier > coefficients implicitly wraps the KDE around at the ends of the > interval, and it's not clear that this is still worth doing if you're > going to "pad" your region enough that this isn't a problem: the > padding forces you to evaluate at lots of points you don't care about > and use lots more Fourier coefficients than you would otherwise have > to. > > > Anne > > > -Sam > > On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald < > aarchiba at physics.mcgill.ca> > > wrote: > >> > >> My only experience with KDEs has been on the circle, where there seems > >> to be little or no literature and the constraints are rather > >> different. > >> > >> On 27 August 2010 14:38, wrote: > >> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch > >> > wrote: > >> >> Hi all, > >> >> I was thinking of renovating the kernel density estimation package > >> >> (although > >> >> no promises; I'm leaving for college tomorrow morning!). I was > >> >> wondering: > >> >> a) whether anyone had started code in that direction > >> > > >> > Mike Crowe wrote code for kernel regression and Skipper started a 1D > >> > kernel density estimator in scikits.statsmodels, which cover a larger > >> > number of kernels > >> > > >> > I don't think I have seen any higher dimensional kernel density > >> > estimation in python besides scipy.stats.kde. The Gaussian kde in > >> > scipy.stats is targeted to the underlying Fortran code for > >> > multivariate normal cdf. > >> > It's not clear to me what other n-dimensional kdes would require or > >> > whether they would fit well with the current code. > >> > > >> > One extension that Robert also mentioned in the past that it would be > >> > nice to have adaptive kernels, which I also haven't seen in python > >> > yet. > >> > > >> >> b) what people want in it > >> >> I was thinking (as an ideal, not necessarily goal): > >> >> - Support for more than Gaussian kernels (e.g. custom, > >> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.) > >> >> - More options for bandwidth selection (custom bandwidth matrices, > >> >> AMISE > >> >> optimization, cross-validation, etc.) > >> > > >> > definitely yes, I don't think they are even available for 1D yet. > >> > >> Bandwidth selection is a hotly debated topic, at least in one > >> dimension, so perhaps not just different methods but tools for > >> diagnosing bandwidth selection problems would be nice - at the least, > >> it should be made straightforward to vary the bandwidth (e.g. to plot > >> the KDE with a range of different bandwidth values). > >> > >> >> - Assorted conveniences: automatically generate the mesh, limit the > >> >> kernel's > >> >> support for speed > >> > > >> > Using scipy.spatial to limit the number of neighbors in a bounded > >> > support kernel might be a good idea. > >> > >> Simply using it to find the neighbors that need to be used should > >> speed things up. There may also be some shortcuts for > >> unbounded-support kernels (no point adding a Gaussian a hundred sigma > >> away if there's any points nearby). > >> > >> At the other end of the spectrum, for very dense KDEs, on the circle I > >> found it extremely convenient to use Fourier transforms to carry out > >> the convolution of kernel with points. 
In particular, I represented > >> the KDE in terms of its Fourier coefficients, so that an inverse FFT > >> immediately gave me the KDE evaluated on a grid (or, with some > >> fiddling, integrated over the bins of a histogram). I don't know > >> whether this is a useful optimization for KDEs on the line or in > >> higher dimensions, since there's the problem of wrapping. > >> > >> Anne > >> > >> > (just some thought on the topic) > >> > > >> > Josef > >> > > >> >> So, thoughts anyone? I figure it's better to over-specify and then > >> >> under-produce, so don't hold back. > >> >> Thanks, > >> >> Sam > >> >> _______________________________________________ > >> >> SciPy-Dev mailing list > >> >> SciPy-Dev at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-dev > >> >> > >> >> > >> > _______________________________________________ > >> > SciPy-Dev mailing list > >> > SciPy-Dev at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-dev > >> > > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Aug 27 20:09:17 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 28 Aug 2010 09:09:17 +0900 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure In-Reply-To: References: Message-ID: On Sat, Aug 28, 2010 at 12:55 AM, Gökhan Sever wrote: > Hello, > On a Fedora 13 VirtualBox setup > Linux a 2.6.33.6-147.2.4.fc13.i686 #1 SMP Fri Jul 23 17:27:40 UTC 2010 i686 > i686 i386 GNU/Linux > python -c 'import numpy; numpy.test()' > Running unit tests for numpy > NumPy version 2.0.0.dev8671 > NumPy is installed in /usr/lib/python2.6/site-packages/numpy > Python version 2.6.4 (r264:75706, Jun  4 2010, 18:20:16) [GCC 4.4.4 20100503 > (Red Hat 4.4.4-2)] > nose version 0.11.3 >
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K..................................................................................................................................................................................................................................K............................................................................................K......................K.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.....................................................................................................................................................................................................................................................................................................................................................................................................................................Warning: > divide by zero encountered in log > ....................................................................................................................................................................................................................................................................................... > ====================================================================== > FAIL: test_lapack (test_build.TestF77Mismatch) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ??File "/usr/lib/python2.6/site-packages/numpy/testing/decorators.py", line > 146, in skipper_func > ?? ?return f(*args, **kwargs) > ??File "/usr/lib/python2.6/site-packages/numpy/linalg/tests/test_build.py", > line 50, in test_lapack > ?? 
information.""") > AssertionError: Both g77 and gfortran runtimes linked in lapack_lite ! This > is likely to > cause random crashes and wrong results. See numpy INSTALL.txt for more > information. > "Fail the test if the expression is true." >>> if True: raise self.failureException, 'Both g77 and gfortran runtimes >>> linked in lapack_lite ! This is likely to\ncause random crashes and wrong >>> results. See numpy INSTALL.txt for more\ninformation.' > > ---------------------------------------------------------------------- > Ran 3024 tests in 21.928s > FAILED (KNOWNFAIL=4, failures=1) > > > Any idea how to resolve this one? I use package manager to install > requirements. It seems g77 and gfortran are mixed for lapack, but not sure > how to fix it. When I try to uninstall gfortran it tries to remove > lapack/blas/atlas all. Use python setup.py build_ext --fcompiler=gnu95 Maybe removing g77 works as well, cheers, David From gokhansever at gmail.com Fri Aug 27 20:30:41 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 19:30:41 -0500 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 7:09 PM, David Cournapeau wrote: > Use python setup.py build_ext --fcompiler=gnu95 > > Maybe removing g77 works as well, > > cheers, > > David > Both results with the same test failure. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 20:38:33 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 19:38:33 -0500 Subject: [SciPy-Dev] NumPy 2.0.0.dev8671 test failure In-Reply-To: References: Message-ID: On Fri, Aug 27, 2010 at 7:30 PM, Gökhan Sever wrote: > > > On Fri, Aug 27, 2010 at 7:09 PM, David Cournapeau wrote: > >> Use python setup.py build_ext --fcompiler=gnu95 >> >> Maybe removing g77 works as well, >> >> cheers, >> >> David >> > > Both results with the same test failure. > Sorry for the noise. With a clean install and using only gfortran all tests pass. -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 22:00:26 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 21:00:26 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: <4C780088.4090602@gmail.com> References: <4C780088.4090602@gmail.com> Message-ID: On Fri, Aug 27, 2010 at 1:14 PM, Bruce Southey wrote: > Hi, > It would be useful to know which test is involved regardless of the issue. > Can you please run the tests with the verbose option such as: > 'scipy.test(verbose=10)'? > > This may have something to do with your numpy problem so please fix that up > once you have identified the test (failure in linalg would confirm that). > > As a reference, my Fedora 13 x64 bit system uses gfortran - 'gcc version > 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)'. > > Did you completely wipe the previous numpy installation especially the > installed numpy files in $PATH2PYTHON/site-packages/ and remove any prior > build directories? > If so, then you need to create suitable site.cfg file to ensure the correct > compiler is being used because something has changed in either your distro > or numpy install. > > Bruce > Hello, Following David and your suggestions I resolved NumPy and SciPy install problems from the source repo. SciPy tests don't give any segfaults anymore.
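As an aside on the g77/gfortran mix-up above: which Fortran runtimes a built extension actually links can be checked by hand. The following is only a rough sketch of such a check (it assumes a Linux box with ldd on the path, and it is not the actual code of the test_lapack test):

import subprocess
import numpy.linalg.lapack_lite as lapack_lite

# Ask the dynamic linker which shared libraries the extension pulls in.
p = subprocess.Popen(["ldd", lapack_lite.__file__], stdout=subprocess.PIPE)
libs = p.communicate()[0].decode()

# g77 code links against libg2c, gfortran code against libgfortran;
# seeing both in one extension is the mismatch the test complains about.
linked = [name for name in ("libg2c", "libgfortran") if name in libs]
print("Fortran runtimes linked: %s" % ", ".join(linked))

If only libgfortran shows up after rebuilding with --fcompiler=gnu95, the mismatch is gone.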
From the verbose=10 run I get a long list of test execution giving: Ran 4668 tests in 105.707s FAILED (KNOWNFAIL=12, SKIP=34, failures=2) I couldn't get it to write into a file with piping -> python -c 'import scipy; scipy.test(verbose=10)' >> scipy_test How can I get it logging? I might upload the file somewhere for further investigation. -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 27 22:10:52 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 27 Aug 2010 20:10:52 -0600 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: References: <4C780088.4090602@gmail.com> Message-ID: On Fri, Aug 27, 2010 at 8:00 PM, Gökhan Sever wrote: > > > On Fri, Aug 27, 2010 at 1:14 PM, Bruce Southey wrote: > >> Hi, >> It would be useful to know which test is involved regardless of the issue. >> Can you please run the tests with the verbose option such as: >> 'scipy.test(verbose=10)'? >> >> This may have something to do with your numpy problem so please fix that >> up once you have identified the test (failure in linalg would confirm that). >> >> >> As a reference, my Fedora 13 x64 bit system uses gfortran - 'gcc version >> 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)'. >> >> Did you completely wipe the previous numpy installation especially the >> installed numpy files in $PATH2PYTHON/site-packages/ and remove any prior >> build directories? >> If so, then you need to create suitable site.cfg file to ensure the >> correct compiler is being used because something has changed in either your >> distro or numpy install. >> >> Bruce >> > > Hello, > > Following David and your suggestions I resolved NumPy and SciPy install > problems from the source repo. SciPy tests don't give any segfaults anymore. > From the verbose=10 run I get a long list of test execution giving: > > Ran 4668 tests in 105.707s > > FAILED (KNOWNFAIL=12, SKIP=34, failures=2) > > I couldn't get it to write into a file with piping -> python -c 'import scipy; > scipy.test(verbose=10)' >> scipy_test > > In bash: python -c 'import scipy; scipy.test(verbose=10)' &> scipy_test Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Fri Aug 27 22:30:34 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 27 Aug 2010 21:30:34 -0500 Subject: [SciPy-Dev] SciPy 0.9.0.dev6651 test failures (segfault) In-Reply-To: References: <4C780088.4090602@gmail.com> Message-ID: On Fri, Aug 27, 2010 at 9:10 PM, Charles R Harris wrote: > > In bash: python -c 'import scipy; scipy.test(verbose=10)' &> scipy_test > > Chuck > Thanks Chuck, & does the trick. See the test results at http://pastebin.com/qg2x2vdV -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Aug 30 09:59:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 30 Aug 2010 09:59:01 -0400 Subject: [SciPy-Dev] Travis: test_continuous_basic.py Message-ID: Travis, Is there a reason why you disabled the Kolmogorov-Smirnov tests?
http://projects.scipy.org/scipy/changeset/6472/trunk/scipy/stats/tests/test_continuous_basic.py

180    yield check_distribution_rvs, dist, args, alpha, rvs
181
184    # yield check_distribution_rvs, dist, args, alpha, rvs

Josef

From pav at iki.fi  Tue Aug 31 18:01:44 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 31 Aug 2010 22:01:44 +0000 (UTC)
Subject: [SciPy-Dev] RFR: N-dimensional interpolation
References: 
Message-ID: 

Sun, 25 Jul 2010 15:35:11 +0000, Pauli Virtanen wrote:
> I took the Qhull by the horns, and wrote a straightforward `griddata`
> implementation for working in N-D:
[clip]

It's now committed to SVN (as it works-well-for-me). Post-mortem review
and testing is welcome.

http://projects.scipy.org/scipy/changeset/6653
http://projects.scipy.org/scipy/changeset/6655
http://projects.scipy.org/scipy/changeset/6657

What's in there is:

1) scipy.spatial.qhull

   Delaunay decomposition and some associated low-level N-d geometry
   routines.

2) scipy.interpolate.interpnd

   N-dimensional interpolation:

   1) Linear barycentric interpolation
   2) Cubic spline interpolation (2D-only, C1 continuous,
      approximately minimum-curvature).

3) scipy.interpolate.griddatand

   Convenience interface to the N-d interpolation classes.

What could be added:

- More comprehensive interface to other features of Qhull
- Using qhull_restore, qhull_save to store Qhull contexts instead of
  copying the relevant data?
- Optimizing the cubic interpolant
- Monotonic cubic interpolation
- Cubic interpolation in 3-d
- Natural neighbour interpolation
- etc.

***

Example:

import numpy as np

def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2

grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

points = np.random.rand(1000, 2)
values = func(points[:,0], points[:,1])

from scipy.interpolate import griddata
grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest')
grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')
grid_z2 = griddata(points, values, (grid_x, grid_y), method='cubic')

import matplotlib.pyplot as plt
plt.subplot(221)
plt.imshow(func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower')
plt.plot(points[:,0], points[:,1], 'k.', ms=1)
plt.title('Original')
plt.subplot(222)
plt.imshow(grid_z0.T, extent=(0,1,0,1), origin='lower')
plt.title('Nearest')
plt.subplot(223)
plt.imshow(grid_z1.T, extent=(0,1,0,1), origin='lower')
plt.title('Linear')
plt.subplot(224)
plt.imshow(grid_z2.T, extent=(0,1,0,1), origin='lower')
plt.title('Cubic')
plt.gcf().set_size_inches(6, 6)
plt.show()

-- 
Pauli Virtanen

From dwf at cs.toronto.edu  Tue Aug 31 18:24:48 2010
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Tue, 31 Aug 2010 18:24:48 -0400
Subject: [SciPy-Dev] RFR: N-dimensional interpolation
In-Reply-To: 
References: 
Message-ID: <17D79004-0D89-41E4-92FA-19819025A342@cs.toronto.edu>

On 2010-08-31, at 6:01 PM, Pauli Virtanen wrote:

> What's in there is:
>
> 1) scipy.spatial.qhull
>
>    Delaunay decomposition and some associated low-level N-d geometry
>    routines.
>
> 2) scipy.interpolate.interpnd
>
>    N-dimensional interpolation:
>
>    1) Linear barycentric interpolation
>    2) Cubic spline interpolation (2D-only, C1 continuous,
>       approximately minimum-curvature).
>
> 3) scipy.interpolate.griddatand
>
>    Convenience interface to the N-d interpolation classes.

I don't know if and when I'll have occasion to use this stuff, but I'm glad
it's there. Nice work, Pauli!

One comment: the name "griddatand" looks odd to my eyes, my mind wants to
group it as "datand".
I only mention this because it might slip past someone who's looking for "griddata" (also, does np.lookfor match partial words?). I don't really have a suggestion of another name though. "ndgriddata"? Then griddata looks a bit more separate, and it'd match scipy.ndimage. David From pav at iki.fi Tue Aug 31 18:28:03 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 31 Aug 2010 22:28:03 +0000 (UTC) Subject: [SciPy-Dev] RFR: N-dimensional interpolation References: <17D79004-0D89-41E4-92FA-19819025A342@cs.toronto.edu> Message-ID: Tue, 31 Aug 2010 18:24:48 -0400, David Warde-Farley wrote: [clip] > One comment: the name "griddatand" looks odd to my eyes, my mind wants > to group it as "datand". I only mention this because it might slip past > someone who's looking for "griddata" "griddatand" is the name of the module, and probably nobody will need to use it since the stuff is imported to top-level scipy.interpolate. (I can't use "griddata" there since it would shadow the function name.) > (also, does np.lookfor match partial words?). Yep. > I don't really have a suggestion of another name > though. "ndgriddata"? Then griddata looks a bit more separate, and it'd > match scipy.ndimage. That might be a slightly better name, yes. -- Pauli Virtanen From fperez.net at gmail.com Tue Aug 31 20:31:03 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 31 Aug 2010 17:31:03 -0700 Subject: [SciPy-Dev] Scipy.org down (again) Message-ID: Howdy, the usual... Cheers, f
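Following up on the N-d interpolation example above: the griddata convenience function sits on top of a Delaunay triangulation plus the interpolator classes, and those pieces can also be used directly. The sketch below uses the names these objects ended up with in released SciPy (scipy.spatial.Delaunay, scipy.interpolate.LinearNDInterpolator, scipy.interpolate.CloughTocher2DInterpolator); the exact names and signatures in the changesets quoted above may differ, so treat this as an illustration rather than a description of the committed code.

import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator, CloughTocher2DInterpolator

points = np.random.rand(1000, 2)
values = points[:, 0] * (1 - points[:, 0]) + points[:, 1]   # any scalar samples

# The Delaunay triangulation used by the linear/cubic methods can be
# built and inspected on its own:
tri = Delaunay(points)
print(tri.points.shape)                  # (1000, 2)
print(tri.find_simplex([[0.5, 0.5]]))    # index of the simplex containing the point

# The interpolators are callables: the (relatively expensive) setup happens
# once, and the same object can then be evaluated on any number of grids.
lin = LinearNDInterpolator(points, values)        # linear barycentric
cub = CloughTocher2DInterpolator(points, values)  # C1 cubic, 2-D only

grid_x, grid_y = np.mgrid[0:1:50j, 0:1:50j]
z_lin = lin(grid_x, grid_y)   # NaN outside the convex hull of the points
z_cub = cub(grid_x, grid_y)

Evaluating on a second, finer grid then only costs the evaluation itself, which is the main practical difference from calling griddata repeatedly on the same scattered points.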