From mluessi at gmail.com Wed Jan 2 16:09:51 2013 From: mluessi at gmail.com (Martin Luessi) Date: Wed, 2 Jan 2013 16:09:51 -0500 Subject: [SciPy-Dev] Scipy.org is down + question about "topical software" In-Reply-To: References: <50211CF8.9020108@michaelclerx.com> <50222CBA.5070202@hilboll.de> Message-ID: On Wed, Aug 8, 2012 at 11:33 AM, Ognen Duzlevski wrote: > On Wed, Aug 8, 2012 at 10:21 AM, Ognen Duzlevski wrote: > >> I am just tarring everything up now and will proceed to delete >> everything that is older than August. >> Ognen > > OK - I have deleted the obvious suspects but there are many more. I > will try and figure out a way to do this better. > Ognen Hi, I just tried to edit http://www.scipy.org/Topical_Software and encountered the "Too many links" error. Is it possible for someone to delete old entries again? Maybe a solution could be to have a cronjob that deletes "MoinEditorBackup" entries that are older than a few days. Best, Martin From pav at iki.fi Sat Jan 5 14:15:16 2013 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 05 Jan 2013 21:15:16 +0200 Subject: [SciPy-Dev] PR 397: Getting rid of 2to3 (single codebase for Python 2 & 3) Message-ID: Hi, Prompted by this: http://jakevdp.github.com/blog/2013/01/03/will-scientists-ever-move-to-python-3/#comment-755694121 here's a conversion of the Scipy code base runnable on Python 2.6 and 3.x without 2to3: https://github.com/scipy/scipy/pull/397 That was fairly easy to do, and I suspect the case is the same for Numpy. But do we want to go this way? On the one hand, this is a cleaner way to go than relying on 2to3 --- which does not convert all semantic differences and can lead to some subtle bugs... On the other hand, well, you have to add list() around map() et al. to make them lists, and have to import xrange, izip et al. from a compatibility module. To me, overall, this doesn't look like a bad route to go. Thoughts? Pauli From josef.pktd at gmail.com Sat Jan 5 15:54:17 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 5 Jan 2013 15:54:17 -0500 Subject: [SciPy-Dev] PR 397: Getting rid of 2to3 (single codebase for Python 2 & 3) In-Reply-To: References: Message-ID: On Sat, Jan 5, 2013 at 2:15 PM, Pauli Virtanen wrote: > Hi, > > Prompted by this: > > > http://jakevdp.github.com/blog/2013/01/03/will-scientists-ever-move-to-python-3/#comment-755694121 > > here's a conversion of the Scipy code base runnable on Python 2.6 and > 3.x without 2to3: > > https://github.com/scipy/scipy/pull/397 > > That was fairly easy to do, and I suspect the case is the same for Numpy. > > But do we want to go this way? On the one hand, this is a cleaner way to > go than relying on 2to3 --- which does not convert all semantic > differences and can lead to some subtle bugs... > > On the other hand, well, you have to add list() around map() et al. to > make them lists, and have to import xrange, izip et al. from a > compatibility module. > > To me, overall, this doesn't look like a bad route to go. Thoughts? I looked through your changes in scipy stats. They don't look too difficult and I don't see a reason not to switch to this. Some things might be difficult to remember and might slip through pull requests. For example for statsmodels I need to do compatibility fixes (with python and older numpy) at irregular intervals, which got easier however with having python 3 tested by TravisCI. 
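To make it concrete, the kind of compatibility module Pauli describes could look roughly like this (a sketch of the idea only, not the actual code in the PR; say in a file compat.py):

import sys

if sys.version_info[0] >= 3:
    xrange = range                  # range is already lazy on python 3
    izip = zip                      # zip returns an iterator on python 3
    from functools import reduce
else:
    xrange = xrange                 # re-export the python 2 builtins
    reduce = reduce
    from itertools import izip

and then the rest of the code does "from compat import xrange, izip" instead of using the builtins directly, plus the explicit list() around map() wherever a real list is needed.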
Josef > > Pauli > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From cournape at gmail.com Sat Jan 5 16:02:34 2013 From: cournape at gmail.com (David Cournapeau) Date: Sat, 5 Jan 2013 15:02:34 -0600 Subject: [SciPy-Dev] PR 397: Getting rid of 2to3 (single codebase for Python 2 & 3) In-Reply-To: References: Message-ID: On Sat, Jan 5, 2013 at 1:15 PM, Pauli Virtanen wrote: > Hi, > > Prompted by this: > > > http://jakevdp.github.com/blog/2013/01/03/will-scientists-ever-move-to-python-3/#comment-755694121 > > here's a conversion of the Scipy code base runnable on Python 2.6 and > 3.x without 2to3: > > https://github.com/scipy/scipy/pull/397 > > That was fairly easy to do, and I suspect the case is the same for Numpy. > > But do we want to go this way? On the one hand, this is a cleaner way to > go than relying on 2to3 --- which does not convert all semantic > differences and can lead to some subtle bugs... > > On the other hand, well, you have to add list() around map() et al. to > make them lists, and have to import xrange, izip et al. from a > compatibility module. > > To me, overall, this doesn't look like a bad route to go. Thoughts? +1 on this. For numpy/scipy code, supporting both 2 and 3 is not too difficult, and should not cause too many performance issues. David From arsenovic at virginia.edu Sun Jan 6 10:24:30 2013 From: arsenovic at virginia.edu (Alexander Arsenovic) Date: Sun, 6 Jan 2013 10:24:30 -0500 Subject: [SciPy-Dev] ipython sphinx directive in docstrings Message-ID: I recently started using the ipython sphinx directive in my docs. It seems to me that using this in the `Examples` section of the docstrings would be good practice, as it prevents me from making errors in the examples, and could be used to insert plots. From what I see, the scipy/numpy docstrings don't make use of this. Is there a reason why *not* to use the ipython directive in scipy/numpy-convention docstrings? alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jan 9 12:32:49 2013 From: toddrjen at gmail.com (Todd) Date: Wed, 9 Jan 2013 18:32:49 +0100 Subject: [SciPy-Dev] Vector Strength function Message-ID: I am interested in implementing a function for scipy. The function is called "vector strength". It is basically a measure of how reliably a set of events occur at a particular phase. It was originally developed for neuroscience research, to determine how well a set of neural events sync up with a periodic stimulus like a sound waveform. However, it is useful for determining how periodic a supposedly periodic set of events really are, for example: 1. Determining whether crime is really more common during a full moon and by how much 2. Determining how concentrated visitors to a coffee shop are during rush hour 3. Determining exactly how concentrated hurricanes are during hurricane season My thinking is that this could be implemented in stages: First would be a Numpy function that would add a set of vectors in polar coordinates. Given a number of magnitude/angle pairs it would provide a summed magnitude/angle pair. This would probably be combined with cartesian<->polar conversion functions. Making use of this function would be a scipy function that would actually implement the vector strength calculation. This is done by treating each event as a unit vector with a phase, then taking the average of the vectors.
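In code terms, the core of the calculation would be something like this (just a sketch; `phases` here is a stand-in array of event phases in radians):

import numpy as np

phases = 2 * np.pi * np.random.rand(100)         # stand-in event phases
strength = np.abs(np.mean(np.exp(1j * phases)))  # unit vectors, averaged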
If all events have the same phase, the result will have an amplitude of 1. If they all have different phases, the result will have an amplitude of 0. It may even be worth having a dedicated polar dtype, although that may be too much. What does everyone think of this proposal? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jan 9 14:35:12 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 9 Jan 2013 20:35:12 +0100 Subject: [SciPy-Dev] ipython sphinx directive in docstrings In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 4:24 PM, Alexander Arsenovic wrote: > I recently started using the ipython sphinx directive in my docs. It > seems to me that using this in the `Examples` section of the docstrings > would be good practice, as it prevents me from making errors in the > examples, and could be used to insert plots. > > From what I see, the scipy/numpy docstrings don't make use of > this. Is there a reason why *not* to use the ipython directive in > scipy/numpy-convention docstrings? > We already can and do include plots with the plot directive. Changing ">>>" to "In [1]" is also not really needed. And the standard way to check for errors in examples is with doctest, although we don't really care much about that at the moment in numpy/scipy. So there's not a large incentive to start using the ipython directive AFAICT. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jan 9 14:44:44 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 9 Jan 2013 14:44:44 -0500 Subject: [SciPy-Dev] Vector Strength function In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 12:32 PM, Todd wrote: > I am interested in implementing a function for scipy. The function is > called "vector strength". It is basically a measure of how reliably a set > of events occur at a particular phase. > > It was originally developed for neuroscience research, to determine how well > a set of neural events sync up with a periodic stimulus like a sound > waveform. > > However, it is useful for determining how periodic a supposedly periodic set > of events really are, for example: > > 1. Determining whether crime is really more common during a full moon and by > how much > 2. Determining how concentrated visitors to a coffee shop are during rush > hour > 3. Determining exactly how concentrated hurricanes are during hurricane > season > > > My thinking is that this could be implemented in stages: > > First would be a Numpy function that would add a set of vectors in polar > coordinates. Given a number of magnitude/angle pairs it would provide a > summed magnitude/angle pair. This would probably be combined with > cartesian<->polar conversion functions. > > Making use of this function would be a scipy function that would actually > implement the vector strength calculation. This is done by treating each > event as a unit vector with a phase, then taking the average of the vectors. > If all events have the same phase, the result will have an amplitude of 1. > If they all have different phases, the result will have an amplitude of 0. > > It may even be worth having a dedicated polar dtype, although that may be > too much. > > What does everyone think of this proposal? Is this the same as a mean resultant in circular statistics?
def circular_resultant(rads, axis=0):
    mp = np.sum(np.exp(1j*rads), axis=axis)
    rho = np.abs(mp)
    mu = np.angle(mp)
    return mp, rho, mu
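For example (made-up numbers; dividing rho by the number of events gives the mean resultant length, i.e. the amplitude between 0 and 1 in your description):

import numpy as np

rads = np.array([0.10, 0.15, 0.12])    # tightly clustered phases
mp, rho, mu = circular_resultant(rads)
strength = rho / len(rads)             # close to 1 here, -> 0 for uniform phases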
Josef > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pierre at barbierdereuille.net Tue Jan 15 05:44:28 2013 From: pierre at barbierdereuille.net (Pierre Barbier de Reuille) Date: Tue, 15 Jan 2013 11:44:28 +0100 Subject: [SciPy-Dev] Bug in the Delaunay triangulation with QHull Message-ID: Hey, I documented a bug in the Delaunay triangulation code here: http://projects.scipy.org/scipy/ticket/1810 The bug can be seen either as a bug in the compilation or in the code and is triggered if there is already a qhull lib compiled for the system, but configured differently. I am not sure if SciPy should use the system lib or the internal one, but it should use one or the other consistently, not compile with the internal one and link to the system one. I believe this should be easy to fix once a decision has been reached. -- Barbier de Reuille Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Jan 15 06:06:35 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 15 Jan 2013 11:06:35 +0000 (UTC) Subject: [SciPy-Dev] Bug in the Delaunay triangulation with QHull References: Message-ID: Pierre Barbier de Reuille barbierdereuille.net> writes: > Hey, > I documented a bug in the Delaunay triangulation code here: http://projects.scipy.org/scipy/ticket/1810 > > > The bug can be seen either as a bug in the compilation or > in the code and is triggered if there is already a qhull lib > compiled for the system, but configured differently. I am not > sure if SciPy should use the system lib or the internal one, but > it should use one or the other consistently, not compile with the > internal one and link to the system one. > I believe this should be easy to fix once a decision has been reached. We want to link with the bundled library, as there is no guarantee whether the system library is compiled with qh_POINTER or not. The fix is easy. -- Pauli Virtanen From suryak at ieee.org Mon Jan 21 13:05:25 2013 From: suryak at ieee.org (Surya Kasturi) Date: Mon, 21 Jan 2013 23:35:25 +0530 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! Message-ID: Hi, I am Surya, studying Junior Year - Electronics & Communication Engineering with Computer Science/ Programming background. I have looked into SciPy and it's really amazing! In this regard, I would like to explore the possibility of contributing to this project by writing code and simultaneously learn the real engineering stuff. My skills lie in Python, Django, C - and little Facebook API, Cloud platforms (Openshift), Git. Also, I wrote some fun-stuff projects during weekends which you might like to take a look at. 1. https://apps.facebook.com/pingmee -- Lets people ping their friends using cartoons (Python, Django -- PIL) 2. https://apps.facebook.com/suryaphotography -- social reader framework for my photography blog; Not yet finished (Python, Django -- Google Feed API) - Got to finish if time permits 3. https://github.com/ksurya -- Github handle So, I am ready to take up any work and get along with it that involves Python! Regarding my scientific skills, I studied Engineering Mathematics, Digital Signal Processing (now studying), Signals & Systems etc. [ More on signals ] Thanks for reading! waiting for your reply -- Surya -------------- next part -------------- An HTML attachment was scrubbed... URL: From jarausch at igpm.rwth-aachen.de Tue Jan 22 07:12:50 2013 From: jarausch at igpm.rwth-aachen.de (Helmut Jarausch) Date: Tue, 22 Jan 2013 12:12:50 +0000 (UTC) Subject: [SciPy-Dev] weave for Python3.3 ? Message-ID: Hi, has anybody started to update weave for Python3.3? It's not a simple 2to3 call. Helmut. From pav at iki.fi Tue Jan 22 12:25:02 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 22 Jan 2013 19:25:02 +0200 Subject: [SciPy-Dev] weave for Python3.3 ? In-Reply-To: References: Message-ID: 22.01.2013 14:12, Helmut Jarausch kirjoitti: > has anybody started to update weave for Python3.3? > It's not a simple 2to3 call. Nobody, as far as I know. -- Pauli Virtanen From ralf.gommers at gmail.com Tue Jan 22 13:59:33 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 22 Jan 2013 19:59:33 +0100 Subject: [SciPy-Dev] weave for Python3.3 ? In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 6:25 PM, Pauli Virtanen wrote: > 22.01.2013 14:12, Helmut Jarausch kirjoitti: > > has anybody started to update weave for Python3.3? > > It's not a simple 2to3 call. > > Nobody, as far as I know. And if you're considering working on that: think really hard first about why you wouldn't move to Cython instead. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From smith.daniel.br at gmail.com Tue Jan 22 14:26:24 2013 From: smith.daniel.br at gmail.com (Daniel Smith) Date: Tue, 22 Jan 2013 14:26:24 -0500 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! Message-ID: Hello, I am also looking for ways to contribute to Scipy. I have experience with Python, C/C++ and limited experience with the Numpy C API. In particular, I have some code implementing the kernel density estimator bandwidth selection algorithm from the following paper: Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010. That method is more resilient to multi-modal data than the standard plug-in estimators. I would love to add that method to the current SciPy stats package if there is interest. Thanks, Daniel > Hi, > > I am Surya, studying Junior Year - Electronics & Communication Engineering > with Computer Science/ Programming background. I have looked into SciPy and > it's really amazing! > > In this regard, I would like to explore the possibility of contributing to > this project by writing code and simultaneously learn the real engineering > stuff. My skills lie in Python, Django, C - and little Facebook API, Cloud > platforms (Openshift), Git. > > Also, I wrote some fun-stuff projects during weekends which you might like > to take a look at. > > 1. https://apps.facebook.com/pingmee -- Lets people ping their friends > using cartoons (Python, Django -- PIL) > 2. https://apps.facebook.com/suryaphotography -- social reader framework > for my photography blog; Not yet finished (Python, Django -- Google Feed > API) - Got to finish if time permits > 3. https://github.com/ksurya -- Github handle > > So, I am ready to take up any work and get along with it that involves > Python! > > Regarding my scientific skills, I studied Engineering Mathematics, Digital > Signal Processing (now studying), Signals & Systems etc. [ More on signals ] > > > Thanks for reading!
waiting for your reply > > -- Surya From josef.pktd at gmail.com Tue Jan 22 15:18:09 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 22 Jan 2013 15:18:09 -0500 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 2:26 PM, Daniel Smith wrote: > Hello, > > I am also looking for ways to contribute to Scipy. I have experience > with Python, C/C++ and limited experience with the Numpy C API. In > particular, I have some code implementing the kernel density estimator > bandwidth selection algorithm from the following paper: > > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density > estimation via diffusion. The Annals of Statistics, 38(5):2916?2957, > 2010. > > That method is more resilient to multi-modal data than the standard > plug-in estimators. I would love to add that method to the current > SciPy stats package if there is interest. Looks interesting, either for scipy.stats or statsmodels. statsmodels has now kde with least-squares cross-validation among other bandwidth choices. However, there is nothing to improve boundary effects or that has adaptive bandwidth choice. Which programming language did you write it in? and out of curiosity: Do you know how well the estimator behaves in smaller samples, 200 or 500. The paper seems to consider sample size of 1000 as small. (very fast skimming of article) Josef > > Thanks, > Daniel > >> Hi, >> >> I am Surya, studying Junior Year - Electronics & Communication Engineering >> with Computer Science/ Programming background. I have looked into SciPy and >> its really amazing! >> >> In this regard, I would like to explore the possibility of contributing to >> this project by writing code and simultaneously learn the real engineering >> stuff. My skills lie in Python, Django, C - and little Facebook API, Cloud >> platforms (Openshift), Git. >> >> Also, I wrote some fun-stuff projects during week ends which you might like >> to take a look. >> >> 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends >> using cartoons (Python, Django -- PIL) >> 2. Https://apps.facebook.com/suryaphotography -- social reader framework >> for my photography blog; Not yet finished (Python, Django -- Google Feed >> API) - Got to finish if time permits >> 3. Https://github.com/ksurya -- Github handle >> >> So, I am ready to take up any work and get along with it that involves >> Python! >> >> Regarding my scientific skills, I studied Engineering Mathematics, Digital >> Signal Processing (now studying), Signals & Systems etc. [ More on signals ] >> >> >> Thanks for reading! waiting for your reply >> >> -- Surya > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From smith.daniel.br at gmail.com Tue Jan 22 16:16:36 2013 From: smith.daniel.br at gmail.com (Daniel Smith) Date: Tue, 22 Jan 2013 16:16:36 -0500 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! In-Reply-To: References: Message-ID: gmail.com> writes: > > On Tue, Jan 22, 2013 at 2:26 PM, Daniel Smith gmail.com> wrote: > > Hello, > > > > I am also looking for ways to contribute to Scipy. I have experience > > with Python, C/C++ and limited experience with the Numpy C API. 
In > > particular, I have some code implementing the kernel density estimator > > bandwidth selection algorithm from the following paper: > > > > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density > > estimation via diffusion. The Annals of Statistics, 38(5):2916?2957, > > 2010. > > > > That method is more resilient to multi-modal data than the standard > > plug-in estimators. I would love to add that method to the current > > SciPy stats package if there is interest. > > Looks interesting, either for scipy.stats or statsmodels. > statsmodels has now kde with least-squares cross-validation among > other bandwidth choices. > > However, there is nothing to improve boundary effects or that has > adaptive bandwidth choice. Boundary effects are another issue. I don't have working code, but I have seen a few algorithms and could certainly add those corrections to existing code. > > Which programming language did you write it in? Everything is in Python/SciPy/Numpy. The most computationally expensive parts are the FFT, iFFT and a fixed point calculation, which are all implemented in SciPy/Numpy. The code is reasonably fast as it stands. I could make it faster by using Cython or C for calculating the derivatives of the estimated probability distribution function (pdf). > > and out of curiosity: Do you know how well the estimator behaves in > smaller samples, 200 or 500. The paper seems to consider sample size > of 1000 as small. (very fast skimming of article) Personally, I've had pretty good luck going down to 50-100 samples. The exact sample size needed largely depends on how ragged the pdf you are estimating is. > > Josef > > > > > Thanks, > > Daniel > > > >> Hi, > >> > >> I am Surya, studying Junior Year - Electronics & Communication Engineering > >> with Computer Science/ Programming background. I have looked into SciPy and > >> its really amazing! > >> > >> In this regard, I would like to explore the possibility of contributing to > >> this project by writing code and simultaneously learn the real engineering > >> stuff. My skills lie in Python, Django, C - and little Facebook API, Cloud > >> platforms (Openshift), Git. > >> > >> Also, I wrote some fun-stuff projects during week ends which you might like > >> to take a look. > >> > >> 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends > >> using cartoons (Python, Django -- PIL) > >> 2. Https://apps.facebook.com/suryaphotography -- social reader framework > >> for my photography blog; Not yet finished (Python, Django -- Google Feed > >> API) - Got to finish if time permits > >> 3. Https://github.com/ksurya -- Github handle > >> > >> So, I am ready to take up any work and get along with it that involves > >> Python! > >> > >> Regarding my scientific skills, I studied Engineering Mathematics, Digital > >> Signal Processing (now studying), Signals & Systems etc. [ More on signals ] > >> > >> > >> Thanks for reading! waiting for your reply > >> > >> -- Surya > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > From suryak at ieee.org Tue Jan 22 19:56:49 2013 From: suryak at ieee.org (Surya Kasturi) Date: Wed, 23 Jan 2013 06:26:49 +0530 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! 
In-Reply-To: References: Message-ID: On Mon, Jan 21, 2013 at 11:35 PM, Surya Kasturi wrote: > Hi, > > I am Surya, studying Junior Year - Electronics & Communication Engineering > with Computer Science/ Programming background. I have looked into SciPy and > its really amazing! > > In this regard, I would like to explore the possibility of contributing to > this project by writing code and simultaneously learn the real engineering > stuff. My skills lie in Python, Django, C - and little Facebook API, Cloud > platforms (Openshift), Git. > > Also, I wrote some fun-stuff projects during week ends which you might > like to take a look. > > 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends > using cartoons (Python, Django -- PIL) > 2. Https://apps.facebook.com/suryaphotography -- social reader framework > for my photography blog; Not yet finished (Python, Django -- Google Feed > API) - Got to finish if time permits > 3. Https://github.com/ksurya -- Github handle > > So, I am ready to take up any work and get along with it that involves > Python! > > Regarding my scientific skills, I studied Engineering Mathematics, Digital > Signal Processing (now studying), Signals & Systems etc. [ More on signals ] > > > Thanks for reading! waiting for your reply > > -- Surya > > Guys can anyone let me know about this above message.. please -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Tue Jan 22 20:16:10 2013 From: pmhobson at gmail.com (Paul Hobson) Date: Tue, 22 Jan 2013 17:16:10 -0800 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 4:56 PM, Surya Kasturi wrote: > On Mon, Jan 21, 2013 at 11:35 PM, Surya Kasturi wrote: > >> Hi, >> >> I am Surya, studying Junior Year - Electronics & Communication >> Engineering with Computer Science/ Programming background. I have looked >> into SciPy and its really amazing! >> >> In this regard, I would like to explore the possibility of contributing >> to this project by writing code and simultaneously learn the real >> engineering stuff. My skills lie in Python, Django, C - and little Facebook >> API, Cloud platforms (Openshift), Git. >> >> Also, I wrote some fun-stuff projects during week ends which you might >> like to take a look. >> >> 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends >> using cartoons (Python, Django -- PIL) >> 2. Https://apps.facebook.com/suryaphotography -- social reader framework >> for my photography blog; Not yet finished (Python, Django -- Google Feed >> API) - Got to finish if time permits >> 3. Https://github.com/ksurya -- Github handle >> >> So, I am ready to take up any work and get along with it that involves >> Python! >> >> Regarding my scientific skills, I studied Engineering Mathematics, >> Digital Signal Processing (now studying), Signals & Systems etc. [ More on >> signals ] >> >> >> Thanks for reading! waiting for your reply >> >> -- Surya >> >> > Guys can anyone let me know about this above message.. please > The source repository for scipy is hosted on github, https://github.com/scipy/scipy Most contributions come in the form of Pull Requests (PR) on github. So all you need to do is fork the repository, write some code, write some tests that verify your code behaves as desired, and submit a PR. Hope that helps, -p -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Tue Jan 22 20:17:10 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 22 Jan 2013 18:17:10 -0700 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 5:56 PM, Surya Kasturi wrote: > On Mon, Jan 21, 2013 at 11:35 PM, Surya Kasturi wrote: > >> Hi, >> >> I am Surya, studying Junior Year - Electronics & Communication >> Engineering with Computer Science/ Programming background. I have looked >> into SciPy and its really amazing! >> >> In this regard, I would like to explore the possibility of contributing >> to this project by writing code and simultaneously learn the real >> engineering stuff. My skills lie in Python, Django, C - and little Facebook >> API, Cloud platforms (Openshift), Git. >> >> Also, I wrote some fun-stuff projects during week ends which you might >> like to take a look. >> >> 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends >> using cartoons (Python, Django -- PIL) >> 2. Https://apps.facebook.com/suryaphotography -- social reader framework >> for my photography blog; Not yet finished (Python, Django -- Google Feed >> API) - Got to finish if time permits >> 3. Https://github.com/ksurya -- Github handle >> >> So, I am ready to take up any work and get along with it that involves >> Python! >> >> Regarding my scientific skills, I studied Engineering Mathematics, >> Digital Signal Processing (now studying), Signals & Systems etc. [ More on >> signals ] >> >> >> Thanks for reading! waiting for your reply >> >> -- Surya >> >> > Guys can anyone let me know about this above message.. please > > You don't need permission to work on open source projects. You can make pull requests, review others' pull requests, post to the list and participate in discussions, or look over the bug reports on trac and see if there are any you can fix. There is always a lack of developers so anyone who does work is welcome. But the work is voluntary and a person's interest is measured by their participation. You have made the first step by posting, so don't be shy, but don't expect a formal induction into the 'club', just start doing things. If you need more specific pointers to things to do, perhaps someone here can offer some suggestions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 22 21:26:30 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 22 Jan 2013 21:26:30 -0500 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 8:17 PM, Charles R Harris wrote: > > > On Tue, Jan 22, 2013 at 5:56 PM, Surya Kasturi wrote: >> >> On Mon, Jan 21, 2013 at 11:35 PM, Surya Kasturi wrote: >>> >>> Hi, >>> >>> I am Surya, studying Junior Year - Electronics & Communication >>> Engineering with Computer Science/ Programming background. I have looked >>> into SciPy and its really amazing! >>> >>> In this regard, I would like to explore the possibility of contributing >>> to this project by writing code and simultaneously learn the real >>> engineering stuff. My skills lie in Python, Django, C - and little Facebook >>> API, Cloud platforms (Openshift), Git. >>> >>> Also, I wrote some fun-stuff projects during week ends which you might >>> like to take a look. >>> >>> 1. 
Https://apps.facebook.com/pingmee -- Lets people ping their friends >>> using cartoons (Python, Django -- PIL) >>> 2. Https://apps.facebook.com/suryaphotography -- social reader framework >>> for my photography blog; Not yet finished (Python, Django -- Google Feed >>> API) - Got to finish if time permits >>> 3. Https://github.com/ksurya -- Github handle >>> >>> So, I am ready to take up any work and get along with it that involves >>> Python! >>> >>> Regarding my scientific skills, I studied Engineering Mathematics, >>> Digital Signal Processing (now studying), Signals & Systems etc. [ More on >>> signals ] >>> >>> >>> Thanks for reading! waiting for your reply >>> >>> -- Surya >>> >> >> Guys can anyone let me know about this above message.. please >> > > You don't need permission to work on open source projects. You can make > pull requests, review others' pull requests, post to the list and > participate in discussions, or look over the bug reports on trac and see if > there are any you can fix. There is always a lack of developers so anyone > who does work is welcome. But the work is voluntary and a person's interest > is measured by their participation. You have made the first step by posting, > so don't be shy, but don't expect a formal induction into the 'club', just > start doing things. If you need more specific pointers to things to do, > perhaps someone here can offer some suggestions. some more details https://github.com/scipy/scipy/blob/master/HACKING.rst.txt I think, a good way to get started, if you don't have any specific code submission in mind, is to go through the tickets of a subpackage like scipy.signal and see if you can fix or improve something. Some less commonly used functions might also have insufficient test coverage, and it's very useful to go through the code and unit tests, for new developers to learn how the code works, and for users to get better tested code. scikits.signal doesn't seem to have taken off, but scipy.signal is pretty active, although I think it doesn't have a dedicated "maintainer". With your web programming background there might also be interesting contributions to http://scikits.appspot.com/ http://scipy-central.org/ or ... Josef > > Chuck > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Tue Jan 22 21:48:29 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 22 Jan 2013 21:48:29 -0500 Subject: [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn! In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 4:16 PM, Daniel Smith wrote: > gmail.com> writes: > >> >> On Tue, Jan 22, 2013 at 2:26 PM, Daniel Smith gmail.com> wrote: >> > Hello, >> > >> > I am also looking for ways to contribute to Scipy. I have experience >> > with Python, C/C++ and limited experience with the Numpy C API. In >> > particular, I have some code implementing the kernel density estimator >> > bandwidth selection algorithm from the following paper: >> > >> > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density >> > estimation via diffusion. The Annals of Statistics, 38(5):2916?2957, >> > 2010. >> > >> > That method is more resilient to multi-modal data than the standard >> > plug-in estimators. I would love to add that method to the current >> > SciPy stats package if there is interest. 
>> >> Looks interesting, either for scipy.stats or statsmodels. >> statsmodels has now kde with least-squares cross-validation among >> other bandwidth choices. >> >> However, there is nothing to improve boundary effects or that has >> adaptive bandwidth choice. > > Boundary effects are another issue. I don't have working code, but I have seen > a few algorithms and could certainly add those corrections to existing code. > >> >> Which programming language did you write it in? > > Everything is in Python/SciPy/Numpy. The most computationally expensive parts > are the FFT, iFFT and a fixed point calculation, which are all implemented in > SciPy/Numpy. The code is reasonably fast as it stands. I could make it faster > by using Cython or C for calculating the derivatives of the estimated > probability distribution function (pdf). sounds good about the implementation. I didn't see that it uses fft (when I looked at the 42 page paper for 5 to 10 minutes :) It could also be interesting to tie it in with the fft based kde in statsmodels https://github.com/statsmodels/statsmodels/blob/master/statsmodels/nonparametric/kde.py#L377 Ralph also worked on kde in scipy.stats and in statsmodels and will also have an idea which might be a better fit. Do you have the code somewhere publicly available? Thanks, Josef > >> >> and out of curiosity: Do you know how well the estimator behaves in >> smaller samples, 200 or 500. The paper seems to consider sample size >> of 1000 as small. (very fast skimming of article) > > Personally, I've had pretty good luck going down to 50-100 samples. The exact > sample size needed largely depends on how ragged the pdf you are estimating is. > >> >> Josef >> >> > >> > Thanks, >> > Daniel >> > >> >> Hi, >> >> >> >> I am Surya, studying Junior Year - Electronics & Communication Engineering >> >> with Computer Science/ Programming background. I have looked into SciPy and >> >> its really amazing! >> >> >> >> In this regard, I would like to explore the possibility of contributing to >> >> this project by writing code and simultaneously learn the real engineering >> >> stuff. My skills lie in Python, Django, C - and little Facebook API, Cloud >> >> platforms (Openshift), Git. >> >> >> >> Also, I wrote some fun-stuff projects during week ends which you might like >> >> to take a look. >> >> >> >> 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends >> >> using cartoons (Python, Django -- PIL) >> >> 2. Https://apps.facebook.com/suryaphotography -- social reader framework >> >> for my photography blog; Not yet finished (Python, Django -- Google Feed >> >> API) - Got to finish if time permits >> >> 3. Https://github.com/ksurya -- Github handle >> >> >> >> So, I am ready to take up any work and get along with it that involves >> >> Python! >> >> >> >> Regarding my scientific skills, I studied Engineering Mathematics, Digital >> >> Signal Processing (now studying), Signals & Systems etc. [ More on signals ] >> >> >> >> >> >> Thanks for reading! 
waiting for your reply >> >> >> >> -- Surya >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From jarausch at igpm.rwth-aachen.de Wed Jan 23 05:54:25 2013 From: jarausch at igpm.rwth-aachen.de (Helmut Jarausch) Date: Wed, 23 Jan 2013 10:54:25 +0000 (UTC) Subject: [SciPy-Dev] weave for Python3.3 ? References: Message-ID: On Tue, 22 Jan 2013 19:59:33 +0100, Ralf Gommers wrote: > On Tue, Jan 22, 2013 at 6:25 PM, Pauli Virtanen wrote: > >> 22.01.2013 14:12, Helmut Jarausch kirjoitti: >> > has anybody started to update weave for Python3.3? >> > It's not a simple 2to3 call. >> >> Nobody, as far as I know. > > > And if you're considering working on that: think really hard first about > why you wouldn't move to Cython instead. > Yes, I prefer Cython myself. But weave is still part of the GIT version of SciPy. And that's the reason why the installation of SciPy fails for Python3.3 since the weave parts cannot be (pre-)compiled during install. What about removing weave altogether from SciPy? Thanks, Helmut From arnd.baecker at web.de Wed Jan 23 06:46:14 2013 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed, 23 Jan 2013 12:46:14 +0100 (CET) Subject: [SciPy-Dev] weave for Python3.3 ? In-Reply-To: References: Message-ID: On Wed, 23 Jan 2013, Helmut Jarausch wrote: > On Tue, 22 Jan 2013 19:59:33 +0100, Ralf Gommers wrote: > >> On Tue, Jan 22, 2013 at 6:25 PM, Pauli Virtanen wrote: >> >>> 22.01.2013 14:12, Helmut Jarausch kirjoitti: >>>> has anybody started to update weave for Python3.3? >>>> It's not a simple 2to3 call. >>> >>> Nobody, as far as I know. >> >> >> And if you're considering working on that: think really hard first about >> why you wouldn't move to Cython instead. >> > > Yes, I prefer Cython myself. But weave is still part of the GIT version of > SciPy. And that's the reason why the installation of SciPy fails for > Python3.3 since the weave parts cannot be (pre-)compiled during install. > > What about removing weave altogether from SciPy? In complete ignorance of how much work it would be to get weave to python3.3, I can only add an end-users perspective: we have a lot of code for our research projects which uses weave. Only few people in our group started using cython recently, so I don't see that we will move the well-used and tested code over to cython in the near future.... Best, Arnd From suryak at ieee.org Wed Jan 23 10:50:58 2013 From: suryak at ieee.org (Surya Kasturi) Date: Wed, 23 Jan 2013 21:20:58 +0530 Subject: [SciPy-Dev] Like to participate in scipy.signals Message-ID: Hi David, I am Surya, Electronics Engineering Student with CS background. I see that you are one of the maintainers of Scipy.Signals for which I am thinking to contribute some code (in Python)! As I am a newcomer to Opensource development and writing production level code, I am wondering if you could help me a bit in starting things out initially. Currently I am not really having any new ideas to pick up and deliver modules.. So, I am very open to pick up any assigned tasks or start with small stuff! Also, I have gone through some bugs in Signals module (#980 #928). However, as I am fairly new to the whole system, it would be great if you can provide some tips/ pointers etc for starting off. 
(Please let me know if there is any other interesting stuff I can do) Regarding my skills and interests: I know Python, Django, C. Interested in Signal Processing - Image Processing , Web dev. -- I was asked by couple of guys to check up respective leads for help Regards Surya -------------- next part -------------- An HTML attachment was scrubbed... URL: From smith.daniel.br at gmail.com Wed Jan 23 15:11:51 2013 From: smith.daniel.br at gmail.com (Daniel Smith) Date: Wed, 23 Jan 2013 15:11:51 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality Message-ID: Hello, This was started on a different thread, but I thought I would post a new thread focused on this. Currently, I have some existing code that implements the bandwidth selection algorithm from: Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010. Zdravko Botev implemented the code in MatLab which can be found here: http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator My code for that is here: https://github.com/Daniel-B-Smith/KDE-for-SciPy I assume I probably need to find a workaround to avoid the float128 in the function fixed_point before I can add it to SciPy. I wrote the code a couple of years ago, so it will take me a moment to map out the best workaround (there is a very large number being multiplied by a very small number). I can also add the 2d-version once I start integrating with SciPy. I have a couple of questions remaining. First, should I implement this in SciPy? StatsModels? Both? Secondly, can I use Cython to generate C code for the function fixed_point? Or do I need to write it up in the Numpy C API? If there is somewhere else I should post this and/or someone I should directly contact, I would greatly appreciate it. Thanks, Daniel From vanderplas at astro.washington.edu Wed Jan 23 15:30:19 2013 From: vanderplas at astro.washington.edu (Jake Vanderplas) Date: Wed, 23 Jan 2013 12:30:19 -0800 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: <5100485B.4000307@astro.washington.edu> Hi Daniel, That looks like a nice implementation. My concern about adding it to scipy is twofold: 1) Is this a well-known and well-proven technique, or is it more cutting-edge? My view is that scipy should not seek to implement every cutting-edge algorithm: in the long-run this will lead to code bloat and difficulty of maintenance. If that's the case, your code might be a better fit for statsmodels or another more specialized package. 2) The algorithm seems limited to one or maybe two dimensions. scipy.stats.gaussian_kde is designed for N dimensions, so it might be difficult to find a fit for this bandwidth selection method. One option might be to allow this bandwidth selection method via a flag in scipy.stats.gaussian_kde, and raise an error if the dimensionality is too high. To do that, your code would need to be reworked fairly extensively to fit in the gaussian_kde class. I'd like other devs to weigh-in about the algorithm, especially my concern #1, before any work starts on a scipy PR. Thanks, Jake On 01/23/2013 12:11 PM, Daniel Smith wrote: > Hello, > > This was started on a different thread, but I thought I would post a > new thread focused on this. Currently, I have some existing code that > implements the bandwidth selection algorithm from: > > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density > estimation via diffusion. 
The Annals of Statistics, 38(5):2916-2957, > 2010. > > Zdravko Botev implemented the code in MatLab which can be found here: > > http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator > > My code for that is here: > > https://github.com/Daniel-B-Smith/KDE-for-SciPy > > I assume I probably need to find a workaround to avoid the float128 in > the function fixed_point before I can add it to SciPy. I wrote the > code a couple of years ago, so it will take me a moment to map out the > best workaround (there is a very large number being multiplied by a > very small number). I can also add the 2d-version once I start > integrating with SciPy. I have a couple of questions remaining. First, > should I implement this in SciPy? StatsModels? Both? Secondly, can I > use Cython to generate C code for the function fixed_point? Or do I > need to write it up in the Numpy C API? > > If there is somewhere else I should post this and/or someone I should > directly contact, I would greatly appreciate it. > > Thanks, > Daniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From ralf.gommers at gmail.com Wed Jan 23 17:21:11 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 23 Jan 2013 23:21:11 +0100 Subject: [SciPy-Dev] Like to participate in scipy.signals In-Reply-To: References: Message-ID: On Wed, Jan 23, 2013 at 4:50 PM, Surya Kasturi wrote: > Hi David, > > I am Surya, Electronics Engineering Student with CS background. I see that > you are one of the maintainers of > Scipy.Signals for which I am thinking to contribute some code (in Python)! > > As I am a newcomer to Opensource development and writing production level > code, I am wondering if you could help me a bit in starting things out > initially. Currently I am not really having any new ideas to pick up and > deliver modules.. So, I am very open to pick up any assigned tasks or start > with small stuff! > > Also, I have gone through some bugs in Signals module (#980 #928). > However, as I am fairly new to the whole system, it would be great if you > can provide some tips/ pointers etc for starting off. (Please let me know > if there is any other interesting stuff I can do) > Hi Surya, I had a look through the current scipy.signal tickets for ones that would be not too complex to get started with: #1621: a suggestion for performance improvement of fftconvolve(), with a patch that can be adapted. Would need some benchmarking for different size input arrays. #1637: extra normalization option for lombscargle(). Would be useful and not too hard probably. #1448: bug that should be straightforward to fix. https://github.com/scipy/scipy/pull/337: some pending improvements that could use a check, especially of change 4 (use of rfftn). Testing with various shapes/values of input would be good to do there. That hopefully gives you somewhere to start. I recommend to just dive in. Once you make some pull requests on Github you'll quickly get a feel for what's required of code added to scipy in terms of code style, tests, documentation, etc. Cheers, Ralf > Regarding my skills and interests: I know Python, Django, C. Interested in > Signal Processing - Image Processing , Web dev. > > -- I was asked by couple of guys to check up respective leads for help > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Wed Jan 23 17:40:09 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 23 Jan 2013 17:40:09 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: <5100485B.4000307@astro.washington.edu> References: <5100485B.4000307@astro.washington.edu> Message-ID: On Wed, Jan 23, 2013 at 3:30 PM, Jake Vanderplas wrote: > Hi Daniel, > That looks like a nice implementation. My concern about adding it to > scipy is twofold: > > 1) Is this a well-known and well-proven technique, or is it more > cutting-edge? My view is that scipy should not seek to implement every > cutting-edge algorithm: in the long-run this will lead to code bloat and > difficulty of maintenance. If that's the case, your code might be a > better fit for statsmodels or another more specialized package. 146 citations in google scholar for the paper since 2010 across many fields 169 downloads in the last month for the matlab version The availability of the matlab code is increasing the number of citations, from what I can see in a few examples. So, it looks popular and it works, even if it's new. Using fft for kde is old, but I didn't look yet at the details. > > 2) The algorithm seems limited to one or maybe two dimensions. > scipy.stats.gaussian_kde is designed for N dimensions, so it might be > difficult to find a fit for this bandwidth selection method. One option > might be to allow this bandwidth selection method via a flag in > scipy.stats.gaussian_kde, and raise an error if the dimensionality is > too high. To do that, your code would need to be reworked fairly > extensively to fit in the gaussian_kde class. My guess is that it doesn't make much sense to merge it into gaussian_kde. I doubt there will be much direct code sharing, and the implementation differs quite a bit. In statsmodels we have separate classes for univariate and multivariate kde (although most of the kernel density estimation and kernel regression in statsmodels is new and not settled yet). Josef > > I'd like other devs to weigh-in about the algorithm, especially my > concern #1, before any work starts on a scipy PR. Thanks, > Jake > > On 01/23/2013 12:11 PM, Daniel Smith wrote: >> Hello, >> >> This was started on a different thread, but I thought I would post a >> new thread focused on this. Currently, I have some existing code that >> implements the bandwidth selection algorithm from: >> >> Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density >> estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, >> 2010. >> >> Zdravko Botev implemented the code in MatLab which can be found here: >> >> http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator >> >> My code for that is here: >> >> https://github.com/Daniel-B-Smith/KDE-for-SciPy >> >> I assume I probably need to find a workaround to avoid the float128 in >> the function fixed_point before I can add it to SciPy. I wrote the >> code a couple of years ago, so it will take me a moment to map out the >> best workaround (there is a very large number being multiplied by a >> very small number). I can also add the 2d-version once I start >> integrating with SciPy. I have a couple of questions remaining. First, >> should I implement this in SciPy? StatsModels? Both? Secondly, can I >> use Cython to generate C code for the function fixed_point? Or do I >> need to write it up in the Numpy C API? 
>> >> If there is somewhere else I should post this and/or someone I should >> directly contact, I would greatly appreciate it. >> >> Thanks, >> Daniel >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From ralf.gommers at gmail.com Wed Jan 23 17:41:27 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 23 Jan 2013 23:41:27 +0100 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: <5100485B.4000307@astro.washington.edu> References: <5100485B.4000307@astro.washington.edu> Message-ID: On Wed, Jan 23, 2013 at 9:30 PM, Jake Vanderplas < vanderplas at astro.washington.edu> wrote: > Hi Daniel, > That looks like a nice implementation. My concern about adding it to > scipy is twofold: > > 1) Is this a well-known and well-proven technique, or is it more > cutting-edge? My view is that scipy should not seek to implement every > cutting-edge algorithm: in the long-run this will lead to code bloat and > difficulty of maintenance. If that's the case, your code might be a > better fit for statsmodels or another more specialized package. > It seems to be a new technique, however the paper has already 150 citations. The algorithm also seems to be straightforward to implement, so I think it could be put into scipy.stats. statsmodels.nonparametric would also be a good place for it though. > > 2) The algorithm seems limited to one or maybe two dimensions. > scipy.stats.gaussian_kde is designed for N dimensions, so it might be > difficult to find a fit for this bandwidth selection method. One option > might be to allow this bandwidth selection method via a flag in > scipy.stats.gaussian_kde, and raise an error if the dimensionality is > too high. To do that, your code would need to be reworked fairly > extensively to fit in the gaussian_kde class. > > I'd like other devs to weigh-in about the algorithm, especially my > concern #1, before any work starts on a scipy PR. > I quickly browsed the paper and original (BSD-licensed) code. My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation method but an adaptive density estimator. The method is only 1-D, but will handle especially multimodal distributions much better than gaussian_kde. My suggestion would be to implement the density estimator and do a good amount of performance testing, at least show that the performance is as good as described in table 1 of the paper. Then we can still decide where to put it. Ralf > On 01/23/2013 12:11 PM, Daniel Smith wrote: > > Hello, > > > > This was started on a different thread, but I thought I would post a > > new thread focused on this. Currently, I have some existing code that > > implements the bandwidth selection algorithm from: > > > > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density > > estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, > > 2010. > > > > Zdravko Botev implemented the code in MatLab which can be found here: > > > > > http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator > > > > My code for that is here: > > > > https://github.com/Daniel-B-Smith/KDE-for-SciPy > > > > I assume I probably need to find a workaround to avoid the float128 in > > the function fixed_point before I can add it to SciPy. 
I wrote the > > code a couple of years ago, so it will take me a moment to map out the > > best workaround (there is a very large number being multiplied by a > > very small number). I can also add the 2d-version once I start > > integrating with SciPy. I have a couple of questions remaining. First, > > should I implement this in SciPy? StatsModels? Both? Secondly, can I > > use Cython to generate C code for the function fixed_point? Or do I > > need to write it up in the Numpy C API? > > > > If there is somewhere else I should post this and/or someone I should > > directly contact, I would greatly appreciate it. > > > > Thanks, > > Daniel > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jan 23 18:29:32 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 24 Jan 2013 00:29:32 +0100 Subject: [SciPy-Dev] weave for Python3.3 ? In-Reply-To: References: Message-ID: On Wed, Jan 23, 2013 at 11:54 AM, Helmut Jarausch < jarausch at igpm.rwth-aachen.de> wrote: > On Tue, 22 Jan 2013 19:59:33 +0100, Ralf Gommers wrote: > > > On Tue, Jan 22, 2013 at 6:25 PM, Pauli Virtanen wrote: > > > >> 22.01.2013 14:12, Helmut Jarausch kirjoitti: > >> > has anybody started to update weave for Python3.3? > >> > It's not a simple 2to3 call. > >> > >> Nobody, as far as I know. > > > > > > And if you're considering working on that: think really hard first about > > why you wouldn't move to Cython instead. > > > > Yes, I prefer Cython myself. But weave is still part of the GIT version of > SciPy. And that's the reason why the installation of SciPy fails for > Python3.3 since the weave parts cannot be (pre-)compiled during install. > Hmm, haven't heard that before. Scipy installs fine on 3.1 and 3.2 at least. Weave preventing an install is quite different from it not working under 3.3, and should be much easier to fix. Can you give some details on how you're installing (build command, OS) and how it's failing for you? > What about removing weave altogether from SciPy? > That's not an option, for backward compatibility reasons. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at barbierdereuille.net Thu Jan 24 07:52:27 2013 From: pierre at barbierdereuille.net (Pierre Barbier de Reuille) Date: Thu, 24 Jan 2013 13:52:27 +0100 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: <5100485B.4000307@astro.washington.edu> Message-ID: Hi, I am not a developer of SciPy per se, but I am currently looking into these KDE methods with bounded domains. First, it looks like there are two parts with the algorithm: 1 - the estimation of the bandwidth 2 - the estimation of the density It should be easy to separate them and use the estimation of the bandwidth without the density estimation. This would be interesting as it seems that the density needs to be estimated on a regular mesh. It also allows comparison of the method with other estimators. For example, as stated in the paper, the method is equivalent to a reflexion method with a gaussian kernel. But the renormalisation method gives very similar results too, without enforcing that f'(0) = 0 (i.e. 
first derivative is always 0 on the boundaries). I have a different concern though: is it normal that the density returned by the method is not normalized? (i.e. the integral of the density is far from being one). Also, can you generalise the bandwidth calculation to unbounded domains? or at least half-domains (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. -- Barbier de Reuille Pierre On 23 January 2013 23:41, Ralf Gommers wrote: > > > > On Wed, Jan 23, 2013 at 9:30 PM, Jake Vanderplas < > vanderplas at astro.washington.edu> wrote: > >> Hi Daniel, >> That looks like a nice implementation. My concern about adding it to >> scipy is twofold: >> >> 1) Is this a well-known and well-proven technique, or is it more >> cutting-edge? My view is that scipy should not seek to implement every >> cutting-edge algorithm: in the long-run this will lead to code bloat and >> difficulty of maintenance. If that's the case, your code might be a >> better fit for statsmodels or another more specialized package. >> > > It seems to be a new technique, however the paper has already 150 > citations. The algorithm also seems to be straightforward to implement, so > I think it could be put into scipy.stats. statsmodels.nonparametric would > also be a good place for it though. > > >> >> 2) The algorithm seems limited to one or maybe two dimensions. >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be >> difficult to find a fit for this bandwidth selection method. One option >> might be to allow this bandwidth selection method via a flag in >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is >> too high. To do that, your code would need to be reworked fairly >> extensively to fit in the gaussian_kde class. >> >> I'd like other devs to weigh-in about the algorithm, especially my >> concern #1, before any work starts on a scipy PR. >> > > I quickly browsed the paper and original (BSD-licensed) code. My > impression is that this can't be integrated with gaussian_kde - it's not a > bandwidth estimation method but an adaptive density estimator. The method > is only 1-D, but will handle especially multimodal distributions much > better than gaussian_kde. > > My suggestion would be to implement the density estimator and do a good > amount of performance testing, at least show that the performance is as > good as described in table 1 of the paper. Then we can still decide where > to put it. > > Ralf > > > >> On 01/23/2013 12:11 PM, Daniel Smith wrote: >> > Hello, >> > >> > This was started on a different thread, but I thought I would post a >> > new thread focused on this. Currently, I have some existing code that >> > implements the bandwidth selection algorithm from: >> > >> > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density >> > estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, >> > 2010. >> > >> > Zdravko Botev implemented the code in MatLab which can be found here: >> > >> > >> http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator >> > >> > My code for that is here: >> > >> > https://github.com/Daniel-B-Smith/KDE-for-SciPy >> > >> > I assume I probably need to find a workaround to avoid the float128 in >> > the function fixed_point before I can add it to SciPy. I wrote the >> > code a couple of years ago, so it will take me a moment to map out the >> > best workaround (there is a very large number being multiplied by a >> > very small number). 
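(As an aside on the large-number-times-small-number problem just quoted: the usual workaround, instead of reaching for float128, is to do the arithmetic in log space. A minimal sketch with made-up magnitudes -- nothing here is taken from the actual fixed_point code:)

    import numpy as np

    big, small = 1e300, 1e-300
    # Naive float64 evaluation loses the answer in the intermediates:
    # big * big overflows to inf, small * small underflows to 0.0,
    # and inf * 0.0 is nan.
    naive = (big * big) * (small * small)

    # In log space the same product is tame:
    log_result = 2 * np.log(big) + 2 * np.log(small)
    result = np.exp(log_result)    # 1.0

    # Sums of such terms can be combined without leaving log space:
    log_sum = np.logaddexp(log_result, np.log(2.0))    # log(1 + 2)

(Whether this carries over cleanly to the fixed-point iteration depends on the signs of the terms involved, which the sketch above does not address.)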
I can also add the 2d-version once I start >> > integrating with SciPy. I have a couple of questions remaining. First, >> > should I implement this in SciPy? StatsModels? Both? Secondly, can I >> > use Cython to generate C code for the function fixed_point? Or do I >> > need to write it up in the Numpy C API? >> > >> > If there is somewhere else I should post this and/or someone I should >> > directly contact, I would greatly appreciate it. >> > >> > Thanks, >> > Daniel >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smith.daniel.br at gmail.com Thu Jan 24 09:49:39 2013 From: smith.daniel.br at gmail.com (Daniel Smith) Date: Thu, 24 Jan 2013 09:49:39 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality Message-ID: Ok, let's see if I can respond to everyone's comments. >From Jake: > 2) The algorithm seems limited to one or maybe two dimensions. > scipy.stats.gaussian_kde is designed for N dimensions, so it might be > difficult to find a fit for this bandwidth selection method. One option > might be to allow this bandwidth selection method via a flag in > scipy.stats.gaussian_kde, and raise an error if the dimensionality is > too high. To do that, your code would need to be reworked fairly > extensively to fit in the gaussian_kde class. In principal, this method can be applied in N dimensions. However, I think it would be unwise to do so. The method requires that you simultaneously estimate the density and the bandwidth. Because of that, you have to implement the method on some mesh, and mesh size grows exponentially with the number of dimensions. The code certainly could be reworked to work in N dimensions, but I don't think it would be effective enough to be worth the effort. The results are also primarily used for visualization, which is useless beyond 2-d. >From Ralf: > My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation method but an adaptive density estimator. It's both. The bandwidth estimate falls out of the density estimate. That bandwidth estimate could be easily used to generate an estimate on a different mesh. > My suggestion would be to implement the density estimator and do a good amount of performance testing, at least show that the performance is as good as described in table 1 of the paper. I can certainly do that. I will post here when the tests are up and running. >From Barbier de Reuille Pierre: > It should be easy to separate them and use the estimation of the bandwidth without the density estimation. Unfortunately, that is not the case. The bandwidth estimate is generated from a fixed point calculation based on the norm of a derivative of the estimated density. Unless I am missing something, it would not be possible to estimate that derivative without an explicit density estimate. Fourier coordinates are used because the derivative estimate is simpler in those coordinates. > For example, as stated in the paper, the method is equivalent to a reflexion method with a gaussian kernel. 
But the renormalisation method gives very similar results too, without enforcing > that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). I have not currently implemented any boundary corrections, but it would not be difficult to implement the renormalization method using the bandwidth estimate from this method. It would require a second density estimate, but the estimate would be much, much better than the current code. > Also, can you generalise the bandwidth calculation to unbounded domains? or at least half-domains (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. In fact, the method currently only works on unbounded domains. The exact domain you calculate the density on is an optional parameter to the density estimator function. The actual domain you calculate on has to be finite because a finite mesh is needed. > I have a different concern though: is it normal that the density returned by the method is not normalized? (i.e. the integral of the density is far from being one). That's a bug. I can fix that with one line of code. I have always just used the density estimate without units, because they aren't particularly informative. However, the output should be normalized, or at least a flag included to make it so. It seems like the next step is to set up a testing regime for comparison to the two existing methods to compare speed and reproduce the data from Table 1 in the paper. Also, it seems likely that statsmodels is the more appropriate setting for this project. In particular, I want to generalize the method to periodic domains, which appears to be a novel implementation so more intensive testing will likely be needed. Thanks, Daniel From pav at iki.fi Thu Jan 24 09:51:00 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jan 2013 14:51:00 +0000 (UTC) Subject: [SciPy-Dev] Like to participate in scipy.signals References: Message-ID: Ralf Gommers gmail.com> writes: [clip] > Hi Surya, I had a look through the current scipy.signal > tickets for ones that would be not too complex to get > started with: [clip] BTW, the full ticket list is here: http://projects.scipy.org/scipy/query? status=apply&status=needs_decision&status=needs_info&status=needs_review&status= needs_work&status=new&status=reopened&component=scipy.signal&order=priority&col= id&col=summary&col=status&col=type&col=priority&col=milestone&col=component The typical work cycle that we use is, - start a new branch for Git for the feature / bug fix - work on it - submit a pull request for code review Useful links: [1] http://docs.scipy.org/doc/numpy/dev/gitwash/development_setup.html [2] https://github.com/scipy/scipy/blob/master/HACKING.rst.txt [3] https://help.github.com/ To get your hands on the source code, the first step you probably want to do is to create an account on github.com, and press the "Fork" button on https://github.com/scipy/scipy After that, you can start working with the version controlled source code as explained in [1] If you run into trouble, feel free to ask here. I'm not very familiar with DSP, so I can't give specific hints on what would be nice projects in scipy.signal (but Ralf gave some pointers). *** There's one large longer-term project I know touching scipy.signal, though (but probably not good for a first project) --- scipy.signal has some functions dealing with B-splines on a uniform grid. Similar B-spline stuff appears in scipy.ndimage and scipy.interpolate (non-uniform grid splines). 
However, since these sub-packages have been written at different times by different people, the spline representations they use are not compatible or the inner workings are not exposed. Some house cleaning should be done --- we'd probably need a single way to represent and evaluate B-splines either on uniform or irregular grid, for 1-D and tensor product splines in N-dim (which is what scipy.ndimage AFAIK uses). Then the different packages in Scipy could work with these common spline objects --- the B-spline is a commonly appearing object deserving its own abstraction. *** On the web development side one project would be improving the http://scipy-central.org code sharing site. You can find the source code for it here (uses Django): https://github.com/kgdunn/SciPyCentral/ The main things to polish would probably be trying to make the interface more attractive and easier to use, i.e., some web design work, and some development. -- Pauli Virtanen From pav at iki.fi Thu Jan 24 09:57:39 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jan 2013 14:57:39 +0000 (UTC) Subject: [SciPy-Dev] weave for Python3.3 ? References: Message-ID: Arnd Baecker web.de> writes: [clip] > In complete ignorance of how much work it would be to get > weave to python3.3, I can only add an end-users perspective: > we have a lot of code for our research projects which uses weave. > Only few people in our group started using cython recently, > so I don't see that we will move the well-used and tested > code over to cython in the near future.... The main stumbling block in porting weave is the way Python I/O was changed in Python 3, i.e., no FILE* pointers any more. Also, unicode conversion maybe causes some issues. So, the data type conversions are not completely straightforward to port. Porting it is certainly possible, but requires more work than other parts of Numpy and Scipy, so I dropped it at the time. -- Pauli Virtanen From suryak at ieee.org Thu Jan 24 10:03:59 2013 From: suryak at ieee.org (Surya Kasturi) Date: Thu, 24 Jan 2013 20:33:59 +0530 Subject: [SciPy-Dev] Like to participate in scipy.signals In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 8:21 PM, Pauli Virtanen wrote: > Ralf Gommers gmail.com> writes: > [clip] > > Hi Surya, I had a look through the current scipy.signal > > tickets for ones that would be not too complex to get > > started with: > [clip] > > BTW, the full ticket list is here: > > http://projects.scipy.org/scipy/query? > > status=apply&status=needs_decision&status=needs_info&status=needs_review&status= > > needs_work&status=new&status=reopened&component=scipy.signal&order=priority&col= > id&col=summary&col=status&col=type&col=priority&col=milestone&col=component > > The typical work cycle that we use is, > > - start a new branch for Git for the feature / bug fix > - work on it > - submit a pull request for code review > > Useful links: > > [1] http://docs.scipy.org/doc/numpy/dev/gitwash/development_setup.html > > [2] https://github.com/scipy/scipy/blob/master/HACKING.rst.txt > > [3] https://help.github.com/ > > To get your hands on the source code, the first step you probably want > to do is to create an account on github.com, and press the "Fork" button > on https://github.com/scipy/scipy After that, you can start working with > the version controlled source code as explained in [1] > > If you run into trouble, feel free to ask here. 
> > I'm not very familiar with DSP, so I can't give specific hints on what > would be nice projects in scipy.signal (but Ralf gave some pointers). > > I am currently looking into what Ralf has said! Now, I am going through the documentation. So, how far should I be reading the source code of the project before proceeding with the bugs? I seriously feel reading code is a very tough task! Actually, I am going for the signals module because I study electronics, and DSP is the area where I would like to specialize. However, I am very open to other projects too. I just want to get involved in a beautiful scientific project and improve my coding skills. *** > > There's one large longer-term project I know touching scipy.signal, > though (but probably not good for a first project) --- scipy.signal > has some functions dealing with B-splines on a uniform grid. Similar > B-spline stuff appears in scipy.ndimage and scipy.interpolate > (non-uniform grid splines). However, since these sub-packages have > been written at different times by different people, the spline > representations they use are not compatible or the inner workings > are not exposed. > > Some house cleaning should be done --- we'd probably need > a single way to represent and evaluate B-splines either on uniform > or irregular grid, for 1-D and tensor product splines in N-dim (which > is what scipy.ndimage AFAIK uses). Then the different packages in > Scipy could work with these common spline objects --- the B-spline > is a commonly appearing object deserving its own abstraction. > > *** > > On the web development side one project would be improving > the http://scipy-central.org code sharing site. You can find > the source code for it here (uses Django): > https://github.com/kgdunn/SciPyCentral/ > > The main things to polish would probably be trying to make the > interface more attractive and easier to use, i.e., some web design > work, and some development. > > Looking into it now! This website looks very simple. Maybe we can try to use Twitter Bootstrap... Its UI is quite attractive and even simple to implement. What do you say? Probably I might propose new ideas if I look into the Django stuff (the development part). -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jan 24 10:20:56 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jan 2013 15:20:56 +0000 (UTC) Subject: [SciPy-Dev] Like to participate in scipy.signals References: Message-ID: Surya Kasturi ieee.org> writes: [clip] > I am currently looking into what Ralf has said! Now, I am going > through the documentation. > So, how far should I be reading the source code of the project > before proceeding with the bugs? I seriously feel reading code > is a very tough task! With the bugs, I would only read the part of the source code that you need to, i.e., (i) try to first have a mathematical background understanding of what the computation is supposed to do, (ii) then try to locate the piece that is failing, (iii) look at the source code of the failing part, and then repeat from (i) on that. Scipy is quite wide topic-wise, so understanding what all the code e.g. in scipy.signal does takes more effort than the typical non-scientific code base.
Luckily, different topics tend to be separated, so you can conserve your energy by digging only into a single one at a time. [clip: scipy-central] > Looking into it now! This website looks very simple. > Maybe we can try to use Twitter Bootstrap... Its UI is > quite attractive and even simple to implement. > > What do you say? > > Probably I might propose new ideas if I look into > the Django stuff (the development part). Yes, certainly. With the design one might want to make it somehow look like it's to some degree affiliated with scipy.org (although the design for the latter looks like it's from the last century :) Using Bootstrap as the base layout did come up some time ago in discussions, and I think it was regarded as a good idea. The scipy-central project has been "sleeping" for a year now, but it would be nice to have more activity with it again. -- Pauli Virtanen From suryak at ieee.org Thu Jan 24 10:31:10 2013 From: suryak at ieee.org (Surya Kasturi) Date: Thu, 24 Jan 2013 21:01:10 +0530 Subject: [SciPy-Dev] Like to participate in scipy.signals In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 8:50 PM, Pauli Virtanen wrote: > Surya Kasturi ieee.org> writes: > [clip] > > I am currently looking into what Ralf has said! Now, I am going > > through the documentation. > > So, how far should I be reading the source code of the project > > before proceeding with the bugs? I seriously feel reading code > > is a very tough task! > > With the bugs, I would only read the part of the source code > that you need to, i.e., (i) try to first have a mathematical > background understanding of what the computation is supposed > to do, (ii) then try to locate the piece that is failing, > (iii) look at the source code of the failing part, and > then repeat from (i) on that. > > Scipy is quite wide topic-wise, so understanding what all the > code e.g. in scipy.signal does takes more effort than the typical > non-scientific code base. Luckily, different topics tend to be > separated, so you can conserve your energy by digging only into > a single one at a time. > > [clip: scipy-central] > > Looking into it now! This website looks very simple. > > Maybe we can try to use Twitter Bootstrap... Its UI is > > quite attractive and even simple to implement. > > > > What do you say? > > > > Probably I might propose new ideas if I look into > > the Django stuff (the development part). > > Yes, certainly. With the design one might want to make it somehow > look like it's to some degree affiliated with scipy.org (although > the design for the latter looks like it's from the last century :) > Using Bootstrap as the base layout did come up some time ago in > discussions, and I think it was regarded as a good idea. > > The scipy-central project has been "sleeping" for a year now, but > it would be nice to have more activity with it again. > Yeah! I just looked into the repo... the last commit was about a year ago! As you said, it might be mandatory to create a design that looks like scipy.org; otherwise, people might feel they have landed on Mars. Also, I see that SciPy Central doesn't have support for OpenID. So, considering my Django experience, with some help/guidance I can do that! We can possibly write it from scratch or use some open-source projects which have already done some of the homework. http://stackoverflow.com/q/2123369/1162468 --> we can see lots of libs.
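(For what it's worth, most Django auth libraries on that list hook in through two standard settings; a rough sketch of the wiring, where 'openid_app' and its backend path are placeholders rather than any specific library's API:)

    # settings.py -- sketch only; substitute the app chosen from the list above.
    INSTALLED_APPS += (
        'openid_app',                                   # placeholder app name
    )

    AUTHENTICATION_BACKENDS = (
        'openid_app.auth.OpenIDBackend',                # placeholder dotted path
        'django.contrib.auth.backends.ModelBackend',    # keep password logins working
    )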
> > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From suryak at ieee.org Thu Jan 24 10:49:10 2013 From: suryak at ieee.org (Surya Kasturi) Date: Thu, 24 Jan 2013 21:19:10 +0530 Subject: [SciPy-Dev] Like to participate in scipy.signals In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 9:01 PM, Surya Kasturi wrote: > > > On Thu, Jan 24, 2013 at 8:50 PM, Pauli Virtanen wrote: > >> Surya Kasturi ieee.org> writes: >> [clip] >> > I am currently looking into what Ralf has said! Now, I am going >> > through the documentation. >> > So, how far should I be reading the source code of the project >> > before proceeding with the bugs? I seriously feel reading code >> > is a very tough task! >> >> With the bugs, I would only read the part of the source code >> that you need to, i.e., (i) try to first have a mathematical >> background understanding of what the computation is supposed >> to do, (ii) then try to locate the piece that is failing, >> (iii) look at the source code of the failing part, and >> then repeat from (i) on that. >> >> Scipy is quite wide topic-wise, so understanding what all the >> code e.g. in scipy.signal does takes more effort than the typical >> non-scientific code base. Luckily, different topics tend to be >> separated, so you can conserve your energy by digging only into >> a single one at a time. >> >> [clip: scipy-central] >> > Looking into it now! This website looks very simple. >> > Maybe we can try to use Twitter Bootstrap... Its UI is >> > quite attractive and even simple to implement. >> > >> > What do you say? >> > >> > Probably I might propose new ideas if I look into >> > the Django stuff (the development part). >> >> Yes, certainly. With the design one might want to make it somehow >> look like it's to some degree affiliated with scipy.org (although >> the design for the latter looks like it's from the last century :) >> Using Bootstrap as the base layout did come up some time ago in >> discussions, and I think it was regarded as a good idea. >> >> The scipy-central project has been "sleeping" for a year now, but >> it would be nice to have more activity with it again. >> > > Yeah! I just looked into the repo... the last commit was about a year ago! > > As you said, it might be mandatory to create a design that looks like > scipy.org; otherwise, people might feel they have landed on Mars. > > Also, I see that SciPy Central doesn't have support for OpenID. So, > considering my Django experience, with some help/guidance I can do that! > We can possibly write it from scratch or use some open-source projects which > have already done some of the homework. > > http://stackoverflow.com/q/2123369/1162468 --> we can see lots of libs. > > > >> >> -- >> Pauli Virtanen >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > > Maybe I have landed on a few ideas to discuss for SciPy Central... we might need a new thread to discuss them. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From suryak at ieee.org Thu Jan 24 10:54:07 2013 From: suryak at ieee.org (Surya Kasturi) Date: Thu, 24 Jan 2013 21:24:07 +0530 Subject: [SciPy-Dev] Updating SciPy-Central Website Message-ID: After discussing some ideas, I consolidated a few ideas on SciPy Central: 1. We could improve the design of the site 1.a) Using Twitter Bootstrap 1.b) OR, we can take up the scipy.org design template and implement it 2. We could add an OpenID feature to the site. So, what do you say? -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Thu Jan 24 11:10:47 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Thu, 24 Jan 2013 17:10:47 +0100 Subject: [SciPy-Dev] Updating SciPy-Central Website In-Reply-To: References: Message-ID: <51015D07.4030201@hilboll.de> > After discussing some ideas, I consolidated a few ideas on SciPy Central: > > 1. We could improve the design of the site > > 1.a) Using Twitter Bootstrap > 1.b) OR, we can take up the scipy.org design template > and implement it > > 2. We could add an OpenID feature to the site. > > So, what do you say? How about using this new impetus to redesign the SciPy website as well? There's a 'new' website at http://scipy.github.com/, the sources are at https://github.com/scipy/scipy.org-new. The SciPy website itself is created using Sphinx (http://sphinx-doc.org), and I guess this should not change (correct me if I'm wrong). Over the past months, I've seen some quite modern Sphinx themes (apart from sphinx-doc.org, see e.g. http://packages.python.org/cloud_sptheme/), and I'm sure for someone with modern CSS/webdesign skills, it should be easy to create a new Sphinx theme based on bootstrap, which could then at the same time be the basis for a new scipy-central.org theme. Also, I recommend searching this mailing list for threads about scipy.github.org / scipy.org-new and scipy-central, I remember some good ideas / opinions about general design wishes. Just my 2 cts. Cheers, Andreas. From josef.pktd at gmail.com Thu Jan 24 11:19:28 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2013 11:19:28 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 9:49 AM, Daniel Smith wrote: > Ok, let's see if I can respond to everyone's comments. > > >From Jake: > >> 2) The algorithm seems limited to one or maybe two dimensions. > >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be > >> difficult to find a fit for this bandwidth selection method. One option > >> might be to allow this bandwidth selection method via a flag in > >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is > >> too high. To do that, your code would need to be reworked fairly > >> extensively to fit in the gaussian_kde class. > > In principal, this method can be applied in N dimensions. However, I > think it would be unwise to do so. The method requires that you > simultaneously estimate the density and the bandwidth. Because of > that, you have to implement the method on some mesh, and mesh size > grows exponentially with the number of dimensions. The code certainly > could be reworked to work in N dimensions, but I don't think it would > be effective enough to be worth the effort. The results are also > primarily used for visualization, which is useless beyond 2-d. > > >From Ralf: > >> My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation method but an adaptive density estimator. > > It's both.
The bandwidth estimate falls out of the density estimate. > That bandwidth estimate could be easily used to generate an estimate > on a different mesh. > >> My suggestion would be to implement the density estimator and do a good amount of performance testing, at least show that the performance is as good as described in table 1 of the paper. > > I can certainly do that. I will post here when the tests are up and running. > > >From Barbier de Reuille Pierre: > >> It should be easy to separate them and use the estimation of the bandwidth without the density estimation. > > Unfortunately, that is not the case. The bandwidth estimate is > generated from a fixed point calculation based on the norm of a > derivative of the estimated density. Unless I am missing something, it > would not be possible to estimate that derivative without an explicit > density estimate. Fourier coordinates are used because the derivative > estimate is simpler in those coordinates. > >> For example, as stated in the paper, the method is equivalent to a reflexion method with a gaussian kernel. But the renormalisation method gives very similar results too, without enforcing >> that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). > > I have not currently implemented any boundary corrections, but it > would not be difficult to implement the renormalization method using > the bandwidth estimate from this method. It would require a second > density estimate, but the estimate would be much, much better than the > current code. > >> Also, can you generalise the bandwidth calculation to unbounded domains? or at least half-domains (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. > > In fact, the method currently only works on unbounded domains. The > exact domain you calculate the density on is an optional parameter to > the density estimator function. The actual domain you calculate on has > to be finite because a finite mesh is needed. > >> I have a different concern though: is it normal that the density returned by the method is not normalized? (i.e. the integral of the density is far from being one). > > That's a bug. I can fix that with one line of code. I have always just > used the density estimate without units, because they aren't > particularly informative. However, the output should be normalized, or > at least a flag included to make it so. > > > > It seems like the next step is to set up a testing regime for > comparison to the two existing methods to compare speed and reproduce > the data from Table 1 in the paper. Also, it seems likely that > statsmodels is the more appropriate setting for this project. In > particular, I want to generalize the method to periodic domains, which > appears to be a novel implementation so more intensive testing will > likely be needed. Related to this part: I would like to have in statsmodels a collection of commonly used examples processes, and I'd like to add the Marron, Wand examples there. For kernel regression, I started with this during the nonparametric merge review https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/nonparametric/dgp_examples.py https://groups.google.com/d/topic/pystatsmodels/itS_DyPHLA8/discussion (some of those examples show that also for kernel regression we need adaptive bandwidth in cases with uneven "smoothness") I had looked at the Marron Wand examples before, but IIRC, it was for either orthogonal series or spline estimation. 
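(To make the idea concrete, a minimal sketch of what one such reusable example process could look like -- the class name and API below are made up, not existing statsmodels code; the parameter values are the Marron-Wand bimodal density:)

    import numpy as np

    class NormalMixture:
        """Mixture of normals with a known pdf, for benchmarking estimators."""
        def __init__(self, weights, means, stds):
            self.weights = np.asarray(weights, dtype=float)
            self.means = np.asarray(means, dtype=float)
            self.stds = np.asarray(stds, dtype=float)

        def rvs(self, size, seed=None):
            # Draw a component index for each observation, then sample from it.
            rng = np.random.RandomState(seed)
            idx = np.searchsorted(np.cumsum(self.weights), rng.uniform(size=size))
            return rng.normal(self.means[idx], self.stds[idx])

        def pdf(self, x):
            # Weighted sum of the component normal densities.
            x = np.asarray(x, dtype=float)[..., None]
            z = (x - self.means) / self.stds
            comp = np.exp(-0.5 * z**2) / (self.stds * np.sqrt(2 * np.pi))
            return (self.weights * comp).sum(axis=-1)

    # Marron-Wand "bimodal": 0.5 N(-1, (2/3)**2) + 0.5 N(1, (2/3)**2)
    bimodal = NormalMixture([0.5, 0.5], [-1.0, 1.0], [2.0 / 3, 2.0 / 3])
    sample = bimodal.rvs(1000, seed=0)

(With the known pdf available, reproducing something like Table 1 of the Botev paper is then a matter of comparing an estimator's output against bimodal.pdf on a grid.)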
Statsmodels is using cython to do the binning for the fft based kde, but I never checked how much the speed gain is compared to np.histogram for example. (Skipper's work) Josef > > Thanks, > Daniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From smith.daniel.br at gmail.com Thu Jan 24 11:41:49 2013 From: smith.daniel.br at gmail.com (Daniel Smith) Date: Thu, 24 Jan 2013 11:41:49 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: gmail.com> writes: > > On Thu, Jan 24, 2013 at 9:49 AM, Daniel Smith gmail.com> wrote: > > Ok, let's see if I can respond to everyone's comments. > > > > >From Jake: > > > >> 2) The algorithm seems limited to one or maybe two dimensions. > >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be > >> difficult to find a fit for this bandwidth selection method. One option > >> might be to allow this bandwidth selection method via a flag in > >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is > >> too high. To do that, your code would need to be reworked fairly > >> extensively to fit in the gaussian_kde class. > > > > In principal, this method can be applied in N dimensions. However, I > > think it would be unwise to do so. The method requires that you > > simultaneously estimate the density and the bandwidth. Because of > > that, you have to implement the method on some mesh, and mesh size > > grows exponentially with the number of dimensions. The code certainly > > could be reworked to work in N dimensions, but I don't think it would > > be effective enough to be worth the effort. The results are also > > primarily used for visualization, which is useless beyond 2-d. > > > > >From Ralf: > > > >> My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation > method but an adaptive density estimator. > > > > It's both. The bandwidth estimate falls out of the density estimate. > > That bandwidth estimate could be easily used to generate an estimate > > on a different mesh. > > > >> My suggestion would be to implement the density estimator and do a good amount of performance testing, at > least show that the performance is as good as described in table 1 of the paper. > > > > I can certainly do that. I will post here when the tests are up and running. > > > > >From Barbier de Reuille Pierre: > > > >> It should be easy to separate them and use the estimation of the bandwidth without the density estimation. > > > > Unfortunately, that is not the case. The bandwidth estimate is > > generated from a fixed point calculation based on the norm of a > > derivative of the estimated density. Unless I am missing something, it > > would not be possible to estimate that derivative without an explicit > > density estimate. Fourier coordinates are used because the derivative > > estimate is simpler in those coordinates. > > > >> For example, as stated in the paper, the method is equivalent to a reflexion method with a gaussian > kernel. But the renormalisation method gives very similar results too, without enforcing > >> that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). > > > > I have not currently implemented any boundary corrections, but it > > would not be difficult to implement the renormalization method using > > the bandwidth estimate from this method. 
It would require a second > > density estimate, but the estimate would be much, much better than the > > current code. > > > >> Also, can you generalise the bandwidth calculation to unbounded domains? or at least half- domains > (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. > > > > In fact, the method currently only works on unbounded domains. The > > exact domain you calculate the density on is an optional parameter to > > the density estimator function. The actual domain you calculate on has > > to be finite because a finite mesh is needed. > > > >> I have a different concern though: is it normal that the density returned by the method is not normalized? > (i.e. the integral of the density is far from being one). > > > > That's a bug. I can fix that with one line of code. I have always just > > used the density estimate without units, because they aren't > > particularly informative. However, the output should be normalized, or > > at least a flag included to make it so. > > > > > > > > It seems like the next step is to set up a testing regime for > > comparison to the two existing methods to compare speed and reproduce > > the data from Table 1 in the paper. Also, it seems likely that > > statsmodels is the more appropriate setting for this project. In > > particular, I want to generalize the method to periodic domains, which > > appears to be a novel implementation so more intensive testing will > > likely be needed. > > Related to this part: > > I would like to have in statsmodels a collection of commonly used > examples processes, and I'd like to add the Marron, Wand examples > there. Do you have existing code in statsmodels for this? If I'm already writing up such a thing for testing, it's worth my time to integrate it into statsmodels. A lot of things are already in np.random, but I could extend that in statsmodels with the examples from the Botev paper and those from the Marron and Wand paper. > > For kernel regression, I started with this during the nonparametric merge review > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/nonparametric/dgp_e xamples.py > https://groups.google.com/d/topic/pystatsmodels/itS_DyPHLA8/discussion > > (some of those examples show that also for kernel regression we need > adaptive bandwidth in cases with uneven "smoothness") > > I had looked at the Marron Wand examples before, but IIRC, it was for > either orthogonal series or spline estimation. > > Statsmodels is using cython to do the binning for the fft based kde, > but I never checked how much the speed gain is compared to > np.histogram for example. (Skipper's work) > > Josef > > > > > Thanks, > > Daniel > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Thu Jan 24 11:49:21 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2013 11:49:21 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 11:41 AM, Daniel Smith wrote: > gmail.com> writes: > >> >> On Thu, Jan 24, 2013 at 9:49 AM, Daniel Smith gmail.com> wrote: >> > Ok, let's see if I can respond to everyone's comments. >> > >> > >From Jake: >> > >> >> 2) The algorithm seems limited to one or maybe two dimensions. 
>> >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be >> >> difficult to find a fit for this bandwidth selection method. One option >> >> might be to allow this bandwidth selection method via a flag in >> >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is >> >> too high. To do that, your code would need to be reworked fairly >> >> extensively to fit in the gaussian_kde class. >> > >> > In principal, this method can be applied in N dimensions. However, I >> > think it would be unwise to do so. The method requires that you >> > simultaneously estimate the density and the bandwidth. Because of >> > that, you have to implement the method on some mesh, and mesh size >> > grows exponentially with the number of dimensions. The code certainly >> > could be reworked to work in N dimensions, but I don't think it would >> > be effective enough to be worth the effort. The results are also >> > primarily used for visualization, which is useless beyond 2-d. >> > >> > >From Ralf: >> > >> >> My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation >> method but an adaptive density estimator. >> > >> > It's both. The bandwidth estimate falls out of the density estimate. >> > That bandwidth estimate could be easily used to generate an estimate >> > on a different mesh. >> > >> >> My suggestion would be to implement the density estimator and do a good amount of > performance testing, at >> least show that the performance is as good as described in table 1 of the paper. >> > >> > I can certainly do that. I will post here when the tests are up and running. >> > >> > >From Barbier de Reuille Pierre: >> > >> >> It should be easy to separate them and use the estimation of the bandwidth without the density > estimation. >> > >> > Unfortunately, that is not the case. The bandwidth estimate is >> > generated from a fixed point calculation based on the norm of a >> > derivative of the estimated density. Unless I am missing something, it >> > would not be possible to estimate that derivative without an explicit >> > density estimate. Fourier coordinates are used because the derivative >> > estimate is simpler in those coordinates. >> > >> >> For example, as stated in the paper, the method is equivalent to a reflexion method with a > gaussian >> kernel. But the renormalisation method gives very similar results too, without enforcing >> >> that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). >> > >> > I have not currently implemented any boundary corrections, but it >> > would not be difficult to implement the renormalization method using >> > the bandwidth estimate from this method. It would require a second >> > density estimate, but the estimate would be much, much better than the >> > current code. >> > >> >> Also, can you generalise the bandwidth calculation to unbounded domains? or at least half- > domains >> (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. >> > >> > In fact, the method currently only works on unbounded domains. The >> > exact domain you calculate the density on is an optional parameter to >> > the density estimator function. The actual domain you calculate on has >> > to be finite because a finite mesh is needed. >> > >> >> I have a different concern though: is it normal that the density returned by the method is not > normalized? >> (i.e. the integral of the density is far from being one). >> > >> > That's a bug. I can fix that with one line of code. 
I have always just >> > used the density estimate without units, because they aren't >> > particularly informative. However, the output should be normalized, or >> > at least a flag included to make it so. >> > >> > >> > >> > It seems like the next step is to set up a testing regime for >> > comparison to the two existing methods to compare speed and reproduce >> > the data from Table 1 in the paper. Also, it seems likely that >> > statsmodels is the more appropriate setting for this project. In >> > particular, I want to generalize the method to periodic domains, which >> > appears to be a novel implementation so more intensive testing will >> > likely be needed. >> >> Related to this part: >> >> I would like to have in statsmodels a collection of commonly used >> examples processes, and I'd like to add the Marron, Wand examples >> there. > > Do you have existing code in statsmodels for this? If I'm already > writing up such a thing for testing, it's worth > my time to integrate it into statsmodels. A lot of things are already > in np.random, but I could > extend that in statsmodels with the examples from the Botev paper > and those from the Marron > and Wand paper. Nothing yet, I have used a few examples before to try out things, but they are spread over some uncommitted scripts. The idea to add them more systematically for reuse is only recent. Thanks, Josef > >> >> For kernel regression, I started with this during the nonparametric merge review >> > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/nonparametric/dgp_e > xamples.py >> https://groups.google.com/d/topic/pystatsmodels/itS_DyPHLA8/discussion >> >> (some of those examples show that also for kernel regression we need >> adaptive bandwidth in cases with uneven "smoothness") >> >> I had looked at the Marron Wand examples before, but IIRC, it was for >> either orthogonal series or spline estimation. >> >> Statsmodels is using cython to do the binning for the fft based kde, >> but I never checked how much the speed gain is compared to >> np.histogram for example. (Skipper's work) >> >> Josef >> >> > >> > Thanks, >> > Daniel >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From smith.daniel.br at gmail.com Thu Jan 24 11:55:15 2013 From: smith.daniel.br at gmail.com (Daniel Smith) Date: Thu, 24 Jan 2013 11:55:15 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: gmail.com> writes: > > On Thu, Jan 24, 2013 at 11:41 AM, Daniel Smith > gmail.com> wrote: > > gmail.com> writes: > > > >> > >> On Thu, Jan 24, 2013 at 9:49 AM, Daniel Smith gmail.com> wrote: > >> > Ok, let's see if I can respond to everyone's comments. > >> > > >> > >From Jake: > >> > > >> >> 2) The algorithm seems limited to one or maybe two dimensions. > >> >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be > >> >> difficult to find a fit for this bandwidth selection method. One option > >> >> might be to allow this bandwidth selection method via a flag in > >> >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is > >> >> too high. To do that, your code would need to be reworked fairly > >> >> extensively to fit in the gaussian_kde class. 
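(A small aside on the flag idea just quoted: gaussian_kde has accepted a bw_method argument since scipy 0.11, including a callable, so a selector-only variant of the algorithm could at least be tried without reworking the class. In this sketch, botev_factor is a hypothetical stand-in for such a selector, not existing code:)

    import numpy as np
    from scipy import stats

    def botev_factor(kde):
        # Hypothetical plug-in selector: it would run the fixed-point
        # iteration on kde.dataset and return the scalar factor
        # (bandwidth / data standard deviation) that gaussian_kde expects.
        raise NotImplementedError

    data = np.random.randn(500)
    kde = stats.gaussian_kde(data, bw_method='silverman')       # built-in rule
    # kde = stats.gaussian_kde(data, bw_method=botev_factor)    # plug-in rule
    pdf = kde(np.linspace(-4, 4, 256))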
> >> > > >> > In principal, this method can be applied in N dimensions. However, I > >> > think it would be unwise to do so. The method requires that you > >> > simultaneously estimate the density and the bandwidth. Because of > >> > that, you have to implement the method on some mesh, and mesh size > >> > grows exponentially with the number of dimensions. The code certainly > >> > could be reworked to work in N dimensions, but I don't think it would > >> > be effective enough to be worth the effort. The results are also > >> > primarily used for visualization, which is useless beyond 2-d. > >> > > >> > >From Ralf: > >> > > >> >> My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation > >> method but an adaptive density estimator. > >> > > >> > It's both. The bandwidth estimate falls out of the density estimate. > >> > That bandwidth estimate could be easily used to generate an estimate > >> > on a different mesh. > >> > > >> >> My suggestion would be to implement the density estimator and do a good amount of > > performance testing, at > >> least show that the performance is as good as described in table 1 of the paper. > >> > > >> > I can certainly do that. I will post here when the tests are up and running. > >> > > >> > >From Barbier de Reuille Pierre: > >> > > >> >> It should be easy to separate them and use the estimation of the bandwidth without the density > > estimation. > >> > > >> > Unfortunately, that is not the case. The bandwidth estimate is > >> > generated from a fixed point calculation based on the norm of a > >> > derivative of the estimated density. Unless I am missing something, it > >> > would not be possible to estimate that derivative without an explicit > >> > density estimate. Fourier coordinates are used because the derivative > >> > estimate is simpler in those coordinates. > >> > > >> >> For example, as stated in the paper, the method is equivalent to a reflexion method with a > > gaussian > >> kernel. But the renormalisation method gives very similar results too, without enforcing > >> >> that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). > >> > > >> > I have not currently implemented any boundary corrections, but it > >> > would not be difficult to implement the renormalization method using > >> > the bandwidth estimate from this method. It would require a second > >> > density estimate, but the estimate would be much, much better than the > >> > current code. > >> > > >> >> Also, can you generalise the bandwidth calculation to unbounded domains? or at least half- > > domains > >> (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. > >> > > >> > In fact, the method currently only works on unbounded domains. The > >> > exact domain you calculate the density on is an optional parameter to > >> > the density estimator function. The actual domain you calculate on has > >> > to be finite because a finite mesh is needed. > >> > > >> >> I have a different concern though: is it normal that the density returned by the method is not > > normalized? > >> (i.e. the integral of the density is far from being one). > >> > > >> > That's a bug. I can fix that with one line of code. I have always just > >> > used the density estimate without units, because they aren't > >> > particularly informative. However, the output should be normalized, or > >> > at least a flag included to make it so. 
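(For reference, the one-line normalization fix mentioned above presumably amounts to dividing the raw estimate by its integral over the mesh; `mesh` and `density` below are stand-ins for the estimator's grid and raw output:)

    import numpy as np

    mesh = np.linspace(-5, 5, 2**10)
    density = np.exp(-0.5 * mesh**2)      # stand-in for a raw, unnormalized estimate
    density /= np.trapz(density, mesh)    # now integrates to one on the mesh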
> >> > > >> > > >> > > >> > It seems like the next step is to set up a testing regime for > >> > comparison to the two existing methods to compare speed and reproduce > >> > the data from Table 1 in the paper. Also, it seems likely that > >> > statsmodels is the more appropriate setting for this project. In > >> > particular, I want to generalize the method to periodic domains, which > >> > appears to be a novel implementation so more intensive testing will > >> > likely be needed. > >> > >> Related to this part: > >> > >> I would like to have in statsmodels a collection of commonly used > >> examples processes, and I'd like to add the Marron, Wand examples > >> there. > > > > Do you have existing code in statsmodels for this? If I'm already > > writing up such a thing for testing, it's worth > > my time to integrate it into statsmodels. A lot of things are already > > in np.random, but I could > > extend that in statsmodels with the examples from the Botev paper > > and those from the Marron > > and Wand paper. > > Nothing yet, I have used a few examples before to try out things, but > they are spread over some uncommitted scripts. The idea to add them > more systematically for reuse is only recent. > > Thanks, > Josef Ok. If you suggest a module name, I can start to write such a thing. I assume it is ok to base it on numpy.random? > > > > >> > >> For kernel regression, I started with this during the nonparametric merge review > >> > > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/nonparametric/dgp_e > > xamples.py > >> https://groups.google.com/d/topic/pystatsmodels/itS_DyPHLA8/discussion > >> > >> (some of those examples show that also for kernel regression we need > >> adaptive bandwidth in cases with uneven "smoothness") > >> > >> I had looked at the Marron Wand examples before, but IIRC, it was for > >> either orthogonal series or spline estimation. > >> > >> Statsmodels is using cython to do the binning for the fft based kde, > >> but I never checked how much the speed gain is compared to > >> np.histogram for example. (Skipper's work) > >> > >> Josef > >> > >> > > >> > Thanks, > >> > Daniel > >> > _______________________________________________ > >> > SciPy-Dev mailing list > >> > SciPy-Dev scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-dev > >> > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Thu Jan 24 12:09:46 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2013 12:09:46 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 11:55 AM, Daniel Smith wrote: > gmail.com> writes: > >> >> On Thu, Jan 24, 2013 at 11:41 AM, Daniel Smith >> gmail.com> wrote: >> > gmail.com> writes: >> > >> >> >> >> On Thu, Jan 24, 2013 at 9:49 AM, Daniel Smith gmail.com> wrote: >> >> > Ok, let's see if I can respond to everyone's comments. >> >> > >> >> > >From Jake: >> >> > >> >> >> 2) The algorithm seems limited to one or maybe two dimensions. >> >> >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be >> >> >> difficult to find a fit for this bandwidth selection method. One option >> >> >> might be to allow this bandwidth selection method via a flag in >> >> >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is >> >> >> too high. 
To do that, your code would need to be reworked fairly >> >> >> extensively to fit in the gaussian_kde class. >> >> > >> >> > In principal, this method can be applied in N dimensions. However, I >> >> > think it would be unwise to do so. The method requires that you >> >> > simultaneously estimate the density and the bandwidth. Because of >> >> > that, you have to implement the method on some mesh, and mesh size >> >> > grows exponentially with the number of dimensions. The code certainly >> >> > could be reworked to work in N dimensions, but I don't think it would >> >> > be effective enough to be worth the effort. The results are also >> >> > primarily used for visualization, which is useless beyond 2-d. >> >> > >> >> > >From Ralf: >> >> > >> >> >> My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation >> >> method but an adaptive density estimator. >> >> > >> >> > It's both. The bandwidth estimate falls out of the density estimate. >> >> > That bandwidth estimate could be easily used to generate an estimate >> >> > on a different mesh. >> >> > >> >> >> My suggestion would be to implement the density estimator and do a good amount of >> > performance testing, at >> >> least show that the performance is as good as described in table 1 of the paper. >> >> > >> >> > I can certainly do that. I will post here when the tests are up and running. >> >> > >> >> > >From Barbier de Reuille Pierre: >> >> > >> >> >> It should be easy to separate them and use the estimation of the bandwidth without the density >> > estimation. >> >> > >> >> > Unfortunately, that is not the case. The bandwidth estimate is >> >> > generated from a fixed point calculation based on the norm of a >> >> > derivative of the estimated density. Unless I am missing something, it >> >> > would not be possible to estimate that derivative without an explicit >> >> > density estimate. Fourier coordinates are used because the derivative >> >> > estimate is simpler in those coordinates. >> >> > >> >> >> For example, as stated in the paper, the method is equivalent to a reflexion method with a >> > gaussian >> >> kernel. But the renormalisation method gives very similar results too, without enforcing >> >> >> that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). >> >> > >> >> > I have not currently implemented any boundary corrections, but it >> >> > would not be difficult to implement the renormalization method using >> >> > the bandwidth estimate from this method. It would require a second >> >> > density estimate, but the estimate would be much, much better than the >> >> > current code. >> >> > >> >> >> Also, can you generalise the bandwidth calculation to unbounded domains? or at least half- >> > domains >> >> (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. >> >> > >> >> > In fact, the method currently only works on unbounded domains. The >> >> > exact domain you calculate the density on is an optional parameter to >> >> > the density estimator function. The actual domain you calculate on has >> >> > to be finite because a finite mesh is needed. >> >> > >> >> >> I have a different concern though: is it normal that the density returned by the method is not >> > normalized? >> >> (i.e. the integral of the density is far from being one). >> >> > >> >> > That's a bug. I can fix that with one line of code. I have always just >> >> > used the density estimate without units, because they aren't >> >> > particularly informative. 
However, the output should be normalized, or >> >> > at least a flag included to make it so. >> >> > >> >> > >> >> > >> >> > It seems like the next step is to set up a testing regime for >> >> > comparison to the two existing methods to compare speed and reproduce >> >> > the data from Table 1 in the paper. Also, it seems likely that >> >> > statsmodels is the more appropriate setting for this project. In >> >> > particular, I want to generalize the method to periodic domains, which >> >> > appears to be a novel implementation so more intensive testing will >> >> > likely be needed. >> >> >> >> Related to this part: >> >> >> >> I would like to have in statsmodels a collection of commonly used >> >> examples processes, and I'd like to add the Marron, Wand examples >> >> there. >> > >> > Do you have existing code in statsmodels for this? If I'm already >> > writing up such a thing for testing, it's worth >> > my time to integrate it into statsmodels. A lot of things are already >> > in np.random, but I could >> > extend that in statsmodels with the examples from the Botev paper >> > and those from the Marron >> > and Wand paper. >> >> Nothing yet, I have used a few examples before to try out things, but >> they are spread over some uncommitted scripts. The idea to add them >> more systematically for reuse is only recent. >> >> Thanks, >> Josef > > Ok. If you suggest a module name, I can start to write such a thing. I > assume it is ok to base it on numpy.random? Here is an example how I used it https://github.com/statsmodels/statsmodels/blob/master/statsmodels/examples/ex_kernel_regression_dgp.py There are more xxx_dgp.py examples, where I had to work around some limits in the current design. ``dgp`` for data generating process how about ``dgp_density.py``? We put the current module in the sandbox during the merge, because we still need to adjust it as we get new use cases. numpy.random is fine in your case, since it's normal mixtures. For the regression case, I wanted to have a more flexible option for choosing the distribution of x and the noise. If part of the discussion gets more statsmodels specific, then I think it would be more appropriate to take those to the statsmodels list. We could still keep general kde improvements here. Josef > >> >> > >> >> >> >> For kernel regression, I started with this during the nonparametric merge review >> >> >> > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/nonparametric/dgp_e >> > xamples.py >> >> https://groups.google.com/d/topic/pystatsmodels/itS_DyPHLA8/discussion >> >> >> >> (some of those examples show that also for kernel regression we need >> >> adaptive bandwidth in cases with uneven "smoothness") >> >> >> >> I had looked at the Marron Wand examples before, but IIRC, it was for >> >> either orthogonal series or spline estimation. >> >> >> >> Statsmodels is using cython to do the binning for the fft based kde, >> >> but I never checked how much the speed gain is compared to >> >> np.histogram for example. 
(Skipper's work) >> >> >> >> Josef >> >> >> >> > >> >> > Thanks, >> >> > Daniel >> >> > _______________________________________________ >> >> > SciPy-Dev mailing list >> >> > SciPy-Dev scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From pav at iki.fi Thu Jan 24 13:52:10 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jan 2013 20:52:10 +0200 Subject: [SciPy-Dev] Updating SciPy-Central Website In-Reply-To: <51015D07.4030201@hilboll.de> References: <51015D07.4030201@hilboll.de> Message-ID: Hi, 24.01.2013 18:10, Andreas Hilboll kirjoitti: [clip] > How about using this new impetus to redesign the SciPy website as well? > There's a 'new' website at http://scipy.github.com/, the sources are at > https://github.com/scipy/scipy.org-new. The SciPy website itself is > created using Sphinx (http://sphinx-doc.org), and I guess this should > not change (correct me if I'm wrong). > > Over the past months, I've seen some quite modern Sphinx themes (apart > from sphinx-doc.org, see e.g. > http://packages.python.org/cloud_sptheme/), and I'm sure for someone > with modern CSS/webdesign skills, it should be easy to create a new > Sphinx theme based on boostrap, which could then at the same time be the > basis for a new scipy-central.org theme. > > Also, I recomment to search this mailing list for threads about > scipy.github.org / scipy.org-new and scipy-central, I remember some good > ideas / opinions about general design wishes. I think the important thing here would be to not lose too much time in discussion, and just get a first version done, and then we can go on from that. Using Bootstrap is likely a good starting point. If some room is reserved for navigation tools between scipy-central.org, scipy.org, numpy.org, and docs.scipy.org, and whatnot, that will probably be enough. On the technical side, Sphinx themes are not too difficult to do, in general it's possible to do anything. Pauli From suryak at ieee.org Thu Jan 24 13:52:51 2013 From: suryak at ieee.org (Surya Kasturi) Date: Fri, 25 Jan 2013 00:22:51 +0530 Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server Message-ID: I successfully installed the scipy central site on my computer. I just tried to submit a code snipped for testing.. and ran into the below error. IOError at /item/new/ [Errno 2] No such file or directory: 'compile\\conf.py' Please look into the error: http://dpaste.com/890230/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 24 14:13:16 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2013 14:13:16 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 9:49 AM, Daniel Smith wrote: > Ok, let's see if I can respond to everyone's comments. > > >From Jake: > >> 2) The algorithm seems limited to one or maybe two dimensions. >> scipy.stats.gaussian_kde is designed for N dimensions, so it might be >> difficult to find a fit for this bandwidth selection method. 
One option >> might be to allow this bandwidth selection method via a flag in >> scipy.stats.gaussian_kde, and raise an error if the dimensionality is >> too high. To do that, your code would need to be reworked fairly >> extensively to fit in the gaussian_kde class. > > In principal, this method can be applied in N dimensions. However, I > think it would be unwise to do so. The method requires that you > simultaneously estimate the density and the bandwidth. Because of > that, you have to implement the method on some mesh, and mesh size > grows exponentially with the number of dimensions. The code certainly > could be reworked to work in N dimensions, but I don't think it would > be effective enough to be worth the effort. The results are also > primarily used for visualization, which is useless beyond 2-d. > > >From Ralf: > >> My impression is that this can't be integrated with gaussian_kde - it's not a bandwidth estimation method but an adaptive density estimator. > > It's both. The bandwidth estimate falls out of the density estimate. > That bandwidth estimate could be easily used to generate an estimate > on a different mesh. > >> My suggestion would be to implement the density estimator and do a good amount of performance testing, at least show that the performance is as good as described in table 1 of the paper. > > I can certainly do that. I will post here when the tests are up and running. > > >From Barbier de Reuille Pierre: > >> It should be easy to separate them and use the estimation of the bandwidth without the density estimation. > > Unfortunately, that is not the case. The bandwidth estimate is > generated from a fixed point calculation based on the norm of a > derivative of the estimated density. Unless I am missing something, it > would not be possible to estimate that derivative without an explicit > density estimate. Fourier coordinates are used because the derivative > estimate is simpler in those coordinates. > >> For example, as stated in the paper, the method is equivalent to a reflexion method with a gaussian kernel. But the renormalisation method gives very similar results too, without enforcing >> that f'(0) = 0 (i.e. first derivative is always 0 on the boundaries). > > I have not currently implemented any boundary corrections, but it > would not be difficult to implement the renormalization method using > the bandwidth estimate from this method. It would require a second > density estimate, but the estimate would be much, much better than the > current code. > >> Also, can you generalise the bandwidth calculation to unbounded domains? or at least half-domains (i.e. [a; oo[ or ]-oo; a])? It seems that it all depends on the domain being of finite size. > > In fact, the method currently only works on unbounded domains. The > exact domain you calculate the density on is an optional parameter to > the density estimator function. The actual domain you calculate on has > to be finite because a finite mesh is needed. To the domain question: Besides the boundary problem in bounded domains, there is also the problem with unbounded domains, that the tails might not be well captured by a kde, especially with heavier tails. One idea I would have liked to borrow from matlab since I saw it the first time a few years ago, is http://www.mathworks.com/help/stats/paretotails.html kde (or other nonparametric density) in the middle, paretotails at the ends. but I never got around to coding this. 
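Something like this rough, untested sketch is what I have in mind (the
function name, the `tail` fraction, and the use of scipy.stats.genpareto
are just illustrative choices; the result is only approximately
normalized and each tail needs enough exceedances for the fit to work):

    import numpy as np
    from scipy import stats

    def paretotails_pdf(data, x, tail=0.1):
        # kde body between the empirical tail quantiles,
        # generalized Pareto densities beyond them
        data = np.asarray(data, dtype=float)
        lo, hi = np.percentile(data, [100 * tail, 100 * (1 - tail)])
        kde = stats.gaussian_kde(data)

        # fit GPDs to the exceedances, location fixed at 0
        cl, _, sl = stats.genpareto.fit(lo - data[data < lo], floc=0)
        cu, _, su = stats.genpareto.fit(data[data > hi] - hi, floc=0)

        x = np.asarray(x, dtype=float)
        pdf = kde(x)
        left, right = x < lo, x > hi
        # each tail carries probability mass `tail`
        pdf[left] = tail * stats.genpareto.pdf(lo - x[left], cl, scale=sl)
        pdf[right] = tail * stats.genpareto.pdf(x[right] - hi, cu, scale=su)
        return pdf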
Josef > >> I have a different concern though: is it normal that the density returned by the method is not normalized? (i.e. the integral of the density is far from being one). > > That's a bug. I can fix that with one line of code. I have always just > used the density estimate without units, because they aren't > particularly informative. However, the output should be normalized, or > at least a flag included to make it so. > > > > It seems like the next step is to set up a testing regime for > comparison to the two existing methods to compare speed and reproduce > the data from Table 1 in the paper. Also, it seems likely that > statsmodels is the more appropriate setting for this project. In > particular, I want to generalize the method to periodic domains, which > appears to be a novel implementation so more intensive testing will > likely be needed. > > Thanks, > Daniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From njs at pobox.com Thu Jan 24 14:16:14 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 Jan 2013 11:16:14 -0800 Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 10:52 AM, Surya Kasturi wrote: > I successfully installed the scipy central site on my computer. I just tried > to submit a code snipped for testing.. and ran into the below error. > > IOError at /item/new/ [Errno 2] No such file or directory: > 'compile\\conf.py' > > Please look into the error: http://dpaste.com/890230/ I'm not an expert on this code (it's not clear that anyone who's still active is; you may be stuck debugging things yourself in general), but it looks like there's some code there using windows-style directory separators (\ instead of /) and you're not running on windows? -n From suryak at ieee.org Thu Jan 24 14:18:23 2013 From: suryak at ieee.org (Surya Kasturi) Date: Fri, 25 Jan 2013 00:48:23 +0530 Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server In-Reply-To: References: Message-ID: On Fri, Jan 25, 2013 at 12:46 AM, Nathaniel Smith wrote: > On Thu, Jan 24, 2013 at 10:52 AM, Surya Kasturi wrote: > > I successfully installed the scipy central site on my computer. I just > tried > > to submit a code snipped for testing.. and ran into the below error. > > > > IOError at /item/new/ [Errno 2] No such file or directory: > > 'compile\\conf.py' > > > > Please look into the error: http://dpaste.com/890230/ > > I'm not an expert on this code (it's not clear that anyone who's still > active is; you may be stuck debugging things yourself in general), but > it looks like there's some code there using windows-style directory > separators (\ instead of /) and you're not running on windows? > I am on windows. > > -n > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 24 14:20:36 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 Jan 2013 11:20:36 -0800 Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server In-Reply-To: References: Message-ID: Oh, I see, the / separators I was seeing are URL separators, not file separators. 
Well, maybe someone else will speak up, and in any case good luck
figuring it out :-)

-n

On Thu, Jan 24, 2013 at 11:18 AM, Surya Kasturi wrote:
>
>
>
> On Fri, Jan 25, 2013 at 12:46 AM, Nathaniel Smith wrote:
>>
>> On Thu, Jan 24, 2013 at 10:52 AM, Surya Kasturi wrote:
>> > I successfully installed the scipy central site on my computer. I just
>> > tried
>> > to submit a code snipped for testing.. and ran into the below error.
>> >
>> > IOError at /item/new/ [Errno 2] No such file or directory:
>> > 'compile\\conf.py'
>> >
>> > Please look into the error: http://dpaste.com/890230/
>>
>> I'm not an expert on this code (it's not clear that anyone who's still
>> active is; you may be stuck debugging things yourself in general), but
>> it looks like there's some code there using windows-style directory
>> separators (\ instead of /) and you're not running on windows?
>
>
> I am on windows.
>
>>
>>
>> -n
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

From pav at iki.fi  Thu Jan 24 14:21:07 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Thu, 24 Jan 2013 21:21:07 +0200
Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server
In-Reply-To: 
References: 
Message-ID: 

Hi,

24.01.2013 20:52, Surya Kasturi kirjoitti:
> I successfully installed the scipy central site on my computer. I just
> tried to submit a code snipped for testing.. and ran into the below error.
>
> IOError at /item/new/ [Errno 2] No such file or directory:
> 'compile\\conf.py'
>
> Please look into the error: http://dpaste.com/890230/

There's a patch needed (my pull request wasn't merged yet)

https://github.com/kgdunn/SciPyCentral/pull/115

see commit 22232f8

Pauli

From suryak at ieee.org  Thu Jan 24 15:01:32 2013
From: suryak at ieee.org (Surya Kasturi)
Date: Fri, 25 Jan 2013 01:31:32 +0530
Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server
In-Reply-To: 
References: 
Message-ID: 

On Fri, Jan 25, 2013 at 12:51 AM, Pauli Virtanen wrote:

> Hi,
>
> 24.01.2013 20:52, Surya Kasturi kirjoitti:
> > I successfully installed the scipy central site on my computer. I just
> > tried to submit a code snipped for testing.. and ran into the below
> error.
> >
> > IOError at /item/new/ [Errno 2] No such file or directory:
> > 'compile\\conf.py'
> >
> > Please look into the error: http://dpaste.com/890230/
>
> There's a patch needed (my pull request wasn't merged yet)
>
> https://github.com/kgdunn/SciPyCentral/pull/115
>
> see commit 22232f8
>

tried the patch..

here is the latest error: http://dpaste.com/890286/

> Pauli
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Thu Jan 24 15:04:53 2013
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 24 Jan 2013 13:04:53 -0700
Subject: [SciPy-Dev] Getting IO Error while running Scipy Central on local server
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jan 24, 2013 at 1:01 PM, Surya Kasturi wrote:

>
>
>
> On Fri, Jan 25, 2013 at 12:51 AM, Pauli Virtanen wrote:
>
>> Hi,
>>
>> 24.01.2013 20:52, Surya Kasturi kirjoitti:
>> > I successfully installed the scipy central site on my computer. I just
>> > tried to submit a code snipped for testing.. and ran into the below
>> error.
>> >
>> > IOError at /item/new/ [Errno 2] No such file or directory:
>> > 'compile\\conf.py'
>> >
>> > Please look into the error: http://dpaste.com/890230/
>>
>> There's a patch needed (my pull request wasn't merged yet)
>>
>> https://github.com/kgdunn/SciPyCentral/pull/115
>>
>> see commit 22232f8
>>
>
>
> tried the patch..
>
> here is the latest error: http://dpaste.com/890286/
>

Somewhere about here you will need to exercise your debugging powerz ;)

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pierre at barbierdereuille.net  Thu Jan 24 16:10:54 2013
From: pierre at barbierdereuille.net (Pierre Barbier de Reuille)
Date: Thu, 24 Jan 2013 22:10:54 +0100
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: 
Message-ID: 

On 24 January 2013 15:49, Daniel Smith wrote:

> In fact, the method currently only works on unbounded domains. The
> exact domain you calculate the density on is an optional parameter to
> the density estimator function. The actual domain you calculate on has
> to be finite because a finite mesh is needed.
>

About this: this is incorrect. As you work with a DCT, it is equivalent to
repeating the data on both sides by reflection, which means your method is
equivalent to the reflection method. Also note this is pointed out in the
paper itself. That being said, if there is enough "padding" on both sides
(i.e. such that the tail of the kernel is almost 0) there is no effect to
this. Also, you can replace the DCT with an FFT to get a cyclic density. I
adapted your code for this and it works great!

Back on the computation of the bandwidth, I would argue that you can
compute it without computing the density itself. It's true that it makes
sense to combine the binning as it is useful for both, but I don't agree
that it's necessary.

-- 
Barbier de Reuille Pierre
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sturla at molden.no  Fri Jan 25 08:30:45 2013
From: sturla at molden.no (Sturla Molden)
Date: Fri, 25 Jan 2013 14:30:45 +0100
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: 
Message-ID: <51028905.5000609@molden.no>

On 24.01.2013 20:13, josef.pktd at gmail.com wrote:

> Besides the boundary problem in bounded domains, there is also the
> problem with unbounded domains, that the tails might not be well
> captured by a kde, especially with heavier tails.

We can easily correct the boundary effect by calculating the gain of
the KDE. Just use the kernel to low-pass filter a signal (or image)
that is 1 within the boundaries and 0 outside. Then divide the KDE by
this gain estimator. It usually needs some regularization to avoid
instability very close to the edges.
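For a 1-D Gaussian kernel the gain even has a closed form. A rough,
untested sketch of the idea (the `eps` floor is an arbitrary
regularization choice, and a final renormalization may still be wanted):

    import numpy as np
    from scipy import stats

    def boundary_corrected_kde(data, grid, lo, hi, eps=1e-3):
        # raw KDE, which leaks probability mass outside [lo, hi]
        kde = stats.gaussian_kde(data)
        raw = kde(grid)

        # gain = kernel-smoothed indicator of [lo, hi]; for a Gaussian
        # kernel with bandwidth h this is a difference of normal CDFs
        h = np.sqrt(kde.covariance[0, 0])
        gain = (stats.norm.cdf((hi - grid) / h)
                - stats.norm.cdf((lo - grid) / h))

        # regularize to avoid blow-up very close to the edges
        return raw / np.maximum(gain, eps)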
Sturla

From sturla at molden.no  Fri Jan 25 08:45:46 2013
From: sturla at molden.no (Sturla Molden)
Date: Fri, 25 Jan 2013 14:45:46 +0100
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: 
Message-ID: <51028C8A.8050507@molden.no>

On 24.01.2013 20:13, josef.pktd at gmail.com wrote:

> Besides the boundary problem in bounded domains, there is also the
> problem with unbounded domains, that the tails might not be well
> captured by a kde, especially with heavier tails.

Information is always lost by smoothing.

One can always use a delta function as kernel though. It retains all
the information we have about the sampled distribution.

For unbounded domains the KDE can be seen as a censored data problem.
One can estimate the invisible boundaries by computing BIC or MML (or
by cross-validation).

Sturla

From sturla at molden.no  Fri Jan 25 08:53:09 2013
From: sturla at molden.no (Sturla Molden)
Date: Fri, 25 Jan 2013 14:53:09 +0100
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: <51028C8A.8050507@molden.no>
References: <51028C8A.8050507@molden.no>
Message-ID: <51028E45.3090900@molden.no>

On 25.01.2013 14:45, Sturla Molden wrote:

> One can always use a delta function as kernel though. It retains all the
> information we have about the sampled distribution.

This is not as crazy as it might sound. It is the basis for bootstrap
and jack-knife procedures, PRESS in regression analysis, etc. Also, by
viewing a data sample as an "analog signal" consisting of a sum of
delta functions (or equivalently: a KDE using a delta kernel), all the
methods of DSP become available to statistical data analysis. The
first step in which case is to digitize the signal by anti-alias
filtering and regular sampling. And as it turns out, the anti-alias
filtering is just another case of KDE.

Sturla

From josef.pktd at gmail.com  Fri Jan 25 10:28:06 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 25 Jan 2013 10:28:06 -0500
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: <51028E45.3090900@molden.no>
References: <51028C8A.8050507@molden.no> <51028E45.3090900@molden.no>
Message-ID: 

On Fri, Jan 25, 2013 at 8:53 AM, Sturla Molden wrote:
> On 25.01.2013 14:45, Sturla Molden wrote:
>
>> One can always use a delta function as kernel though. It retains all the
>> information we have about the sampled distribution.
>
> This is not as crazy as it might sound. It is the basis for bootstrap and
> jack-knife procedures, PRESS in regression analysis, etc. Also, by
> viewing a data sample as an "analog signal" consisting of a sum of delta
> functions (or equivalently: a KDE using a delta kernel), all the methods
> of DSP become available to statistical data analysis. The first step in
> which case is to digitize the signal by anti-alias filtering and regular
> sampling. And as it turns out, the anti-alias filtering is just another
> case of KDE.

I'm not sure what you mean.

If you just use a delta function, then you get the original data back,
and we get the empirical distribution function, don't we? I don't
understand how this relates to digitizing the data. It's useful for
many applications, but not the point for kde.

IIRC, the empirical distribution has a large variance, and the point
of kde is to "lose" information and remove the noise. The pointwise
variance of the density estimate is much smaller with a smooth, large
bandwidth kernel, and the main task is to find the right bias-variance
trade-off.

Or do I misinterpret what you have in mind?
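To make the delta-kernel versus smoothing point concrete, a tiny toy
example (my own illustration, nothing more):

    import numpy as np
    from scipy import stats

    x = np.sort(np.random.randn(200))
    grid = np.linspace(-3, 3, 61)

    # delta kernel: the "KDE" integrates to the empirical distribution
    ecdf = np.searchsorted(x, grid, side='right') / float(x.size)

    # smooth kernel: integrate a gaussian_kde up to each grid point
    kde = stats.gaussian_kde(x)
    kde_cdf = np.array([kde.integrate_box_1d(-np.inf, g) for g in grid])

    # the ecdf is unbiased but steppy/noisy; the kde trades a little
    # bias for a much smoother, lower-variance estimate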
Josef

>
> Sturla
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From sturla at molden.no  Fri Jan 25 11:01:02 2013
From: sturla at molden.no (Sturla Molden)
Date: Fri, 25 Jan 2013 17:01:02 +0100
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: <51028C8A.8050507@molden.no> <51028E45.3090900@molden.no>
Message-ID: <5102AC3E.8040508@molden.no>

On 25.01.2013 16:28, josef.pktd at gmail.com wrote:

> If you just use a delta function, you get the original data back, and
> we get the empirical distribution function, don't we? I don't
> understand how this relates to digitizing the data.

What you get is a mathematical function that describes the data as an
analog signal. The KDE can be seen as an anti-alias filter and an ADC.

> The pointwise variance of the density estimate is much smaller with a
> smooth, large bandwidth kernel, and the main task is to find the right
> bias-variance trade-off.

Bias-variance trade-off is the statistical perspective. The DSP
perspective is selecting the appropriate low-pass filtering frequency.
But numerically it is the same.

But what if the distribution has a sharp edge?

In DSP one often finds that wavelet shrinkage is better than low-pass
filters at suppressing white noise from an arbitrary waveform. That for
example applies to density estimation too. Wavelets can do better than
KDE.

Sturla

From smith.daniel.br at gmail.com  Fri Jan 25 11:36:56 2013
From: smith.daniel.br at gmail.com (Daniel Smith)
Date: Fri, 25 Jan 2013 11:36:56 -0500
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: 
Message-ID: 

Barbier de Reuille Pierre

> About this: this is incorrect. As you work with a DCT, it is equivalent to repeating the data
> on both sides by reflection, which means your method is equivalent to the reflection
> method. Also note this is pointed out in the paper itself. That being said, if there is
> enough "padding" on both sides (i.e. such that the tail of the kernel is almost 0) there is
> no effect to this. Also, you can replace the DCT with an FFT to get a cyclic density. I
> adapted your code for this and it works great!

You are correct. I had always ended up having padding on each side and
gotten nonsense near the boundary. When I fixed the boundary
correctly, it gave me nice answers. Could you send me your code for
the cyclic density? I do some molecular dynamics work, and it would be
really useful for making angular density plots.

> Back on the computation of the bandwidth, I would argue that you can compute it
> without computing the density itself. It's true that it makes sense to combine the
> binning as it is useful for both, but I don't agree that it's necessary.

Let me rephrase my sentiment. I think we can't calculate the bandwidth
without the moral equivalent of calculating the density. Basically, we
need a mapping from our set of samples plus our bandwidth to the
square norm of the n'th derivative. Last night, I came up with a far
more efficient method that I think demonstrates the moral equivalence.
With some clever calculus, we can write down the mapping from the
samples plus bandwidth to the j'th DCT (or Fourier) component. We can
simply iterate over the DCT components until the change in the
derivative estimate falls below some threshold. That saves us the
histogramming step (not that important), but it also means we almost
assuredly don't need 2**14 DCT components.
For all intents and purposes, we have also constructed an estimate of
the density in our DCT components. Without working through the math
exactly, I think every representation of our data which allows us to
estimate the density derivative is going to be equivalent, up to
isometry, to the density itself.

All that is neither here nor there, but certainly let me know if you
have an idea how we could do such a calculation. I would be very
interested in finding out that I'm wrong on this point.

Josef:

> Besides the boundary problem in bounded domains, there is also the
> problem with unbounded domains, that the tails might not be well
> captured by a kde, especially with heavier tails.

You are absolutely correct, but that is another problem completely.
Let me know if you implement the Pareto tails idea.

> how about ``dgp_density.py``?
> We put the current module in the sandbox during the merge, because we
> still need to adjust it as we get new use cases.

Cool. I'll get started on this over the weekend. Also, numpy.random
has a whole bunch of distributions. We'll just need to combine them in
clever ways to get our example distributions.

Thanks,
Daniel

From josef.pktd at gmail.com  Fri Jan 25 13:03:58 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 25 Jan 2013 13:03:58 -0500
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: <5102AC3E.8040508@molden.no>
References: <51028C8A.8050507@molden.no> <51028E45.3090900@molden.no>
	<5102AC3E.8040508@molden.no>
Message-ID: 

On Fri, Jan 25, 2013 at 11:01 AM, Sturla Molden wrote:
> On 25.01.2013 16:28, josef.pktd at gmail.com wrote:
>
>> If you just use a delta function, you get the original data back, and
>> we get the empirical distribution function, don't we? I don't
>> understand how this relates to digitizing the data.
>
> What you get is a mathematical function that describes the data as an
> analog signal. The KDE can be seen as an anti-alias filter and an ADC.
>
>> The pointwise variance of the density estimate is much smaller with a
>> smooth, large bandwidth kernel, and the main task is to find the right
>> bias-variance trade-off.
>
> Bias-variance trade-off is the statistical perspective. The DSP
> perspective is selecting the appropriate low-pass filtering frequency.
> But numerically it is the same.

First, I still have problems understanding some of the DSP terminology
and have to translate it into statistics or time series analysis.

The two main differences I see from the "typical" assumptions: DSP
works with equally spaced time intervals, KDE works with unequal
(random) locations of the points (binned fft and similar are an
exception). KDE imposes properties of a density function, non-negative
and integrating to one, while an arbitrary bandpass filter doesn't
impose either.

However, as in the current case, binning and fft convolution work
faster for KDE in large samples (and it might still be an
under-exploited property).

> But what if the distribution has a sharp edge?
>
> In DSP one often finds that wavelet shrinkage is better than low-pass
> filters at suppressing white noise from an arbitrary waveform. That for
> example applies to density estimation too. Wavelets can do better than KDE.

That's a different kind of fish. I haven't seen any ready-made recipes
for density estimation with wavelets. Ralph and I briefly discussed
this a while ago; it would be nice to have.

In my first tries to understand wavelets using pywavelets, I didn't
manage to get anything that looked smooth.
My impression was that we might get sharp edges that are there, but we also get sharp bouncing around where it should be smooth. (After that, I decided I'm not interested enough to figure out the details on my own.) Josef > > > Sturla > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Fri Jan 25 13:13:53 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 25 Jan 2013 13:13:53 -0500 Subject: [SciPy-Dev] Expanding Scipy's KDE functionality In-Reply-To: References: Message-ID: On Fri, Jan 25, 2013 at 11:36 AM, Daniel Smith wrote: > Barbier de Reuille Pierre > >> About this: this is incorrect, as you work with a DCT, it is equivalent to repeat the data >> on both sides by reflexion. Which means your method is equivalent to the reflexion >> method. Also note this is pointed out in the paper itself. That being said, if there is >> enough "padding" on both sides (i.e. such that the tail of the kernel is almost 0) there is >> no effect to this. Also, you can replace the CDT with a FFT to get a cyclic density. I >> adapted your code for this and it works great! > > You are correct. I had always ended up having padding on each side and > gotten nonsense near the boundary. When I fixed the boundary > correctly, it gave me nice answers. Could you send me your code for > the cyclic density? I do some molecular dynamics work, and it would be > really useful for making angular density plots. > >> Back on the computation of the bandwidth, I would argue that you can compute it >> without computing the density itself. It's true that it makes sense to combine the >> binning as it useful for both, but I don't agree that it's necessary.j > > Let me rephrase my sentiment. I think we can't calculate the bandwidth > without the moral equivalent of calculating the density. Basically, we > need a mapping from our set of samples plus our bandwidth to the > square norm of the n'th derivative. Last night, I came up with a far > more efficient method that I think demonstrates the moral equivalence. > With some clever calculus, we can write down the mapping from the > samples plus bandwidth to the j'th DCT (or Fourier) component. We can > simply iterate over the DCT components until the change in the > derivative estimate falls below some threshold. That saves us the > histogramming step (not that important), but it also means we almost > assuredly don't need 2**14 DCT components. For all intents in > purposes, we have also constructed an estimate of the density in our > DCT components. Without working through the math exactly, I think > every representation of our data which allows us to estimate the > density derivative is going to be equivalent, up to isometry, to the > density itself. > > All that is neither here nor there, but certainly let me know if you > have an idea how we could do such a calculation. I would be very > interested in finding out that I'm wrong on this point. It would be useful, for me and maybe to others, if you could use github to keep track of the different versions (your repo or gists). I would like to see how the boundary and periodicity are affected by the different fft and dct, since I bump into this also in other areas. Thanks, Josef > > Josef: > >> Besides the boundary problem in bounded domains, there is also the >> problem with unbounded domains, that the tails might not be well >> captured by a kde, especially with heavier tails. 
>
> You are absolutely correct, but that is another problem completely.
> Let me know if you implement the Pareto tails idea.
>
>> how about ``dgp_density.py``?
>> We put the current module in the sandbox during the merge, because we
>> still need to adjust it as we get new use cases.
>
> Cool. I'll get started on this over the weekend. Also, numpy.random
> has a whole bunch of distributions. We'll just need to combine them in
> clever ways to get our example distributions.
>
> Thanks,
> Daniel
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From pierre at barbierdereuille.net  Fri Jan 25 13:41:56 2013
From: pierre at barbierdereuille.net (Pierre Barbier de Reuille)
Date: Fri, 25 Jan 2013 19:41:56 +0100
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: 
Message-ID: 

On 25 January 2013 17:36, Daniel Smith wrote:

> You are correct. I had always ended up having padding on each side and
> gotten nonsense near the boundary. When I fixed the boundary
> correctly, it gave me nice answers. Could you send me your code for
> the cyclic density? I do some molecular dynamics work, and it would be
> really useful for making angular density plots.
>

Hey,

I attach the code here.

As for your other part, I will have to think about it, but essentially I
came up with the conclusion that the bandwidth estimation would require a
sparser grid than the density estimation. Making some tests, a grid of
2^10 elements seems plenty (i.e. I get 4 significant digits compared to
2^14) and computation time falls from ~250ms to ~15ms using a dataset with
1000 samples. And 15 ms to compute the bandwidth is perfectly acceptable
for me. Now, if you have an adaptive method that can perform similarly,
that would be awesome. The bandwidth can then be used in any context in
which it makes sense.

-- 
Barbier de Reuille Pierre
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kde_fft.py
Type: application/octet-stream
Size: 1035 bytes
Desc: not available
URL: 

From smith.daniel.br at gmail.com  Fri Jan 25 15:32:35 2013
From: smith.daniel.br at gmail.com (Daniel Smith)
Date: Fri, 25 Jan 2013 15:32:35 -0500
Subject: [SciPy-Dev] Expanding Scipy's KDE functionality
In-Reply-To: 
References: 
Message-ID: 

> As for your other part, I will have to think about it, but essentially I came up with the conclusion
> that the bandwidth estimation would require a sparser grid than the density estimation. Making
> some tests, a grid of 2^10 elements seems plenty (i.e. I get 4 significant digits compared to 2^14)
> and computation time falls from ~250ms to ~15ms using a dataset with 1000 samples. And 15
> ms to compute the bandwidth is perfectly acceptable for me. Now, if you have an adaptive
> method that can perform similarly, that would be awesome. The bandwidth can then be used in
> any context in which it makes sense.

I obviously did not write clearly enough. My apologies. You're
absolutely right that the density estimation can use a sparser grid.
The algorithm I was describing was designed to calculate the bandwidth
estimate with the minimal grid necessary. You still need a density
estimate, but not necessarily the 2^14 estimate I have defaulted to. I
will actually code that idea up to make it more clear.

Thank you very much for your code.
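For reference, here is the kind of condensed calculation I mean -- a
rough, untested sketch of the DCT fixed-point bandwidth on a coarse
2**10 grid, following the published description of the Botev et al.
(2010) method (the brentq bracket, the padding, and the DCT scaling
convention are my assumptions and would need checking against the
paper's code):

    import numpy as np
    from scipy.fftpack import dct
    from scipy.optimize import brentq

    def botev_bandwidth(data, n=2**10):
        data = np.asarray(data, dtype=float)
        N = data.size
        lo, hi = data.min(), data.max()
        span = hi - lo
        lo, hi = lo - span / 10.0, hi + span / 10.0
        R = hi - lo

        # bin the data on a coarse grid and take the type-II DCT
        hist, _ = np.histogram(data, bins=n, range=(lo, hi))
        a = dct(hist / float(N), norm=None)

        I = np.arange(1, n, dtype=float) ** 2
        a2 = (a[1:] / 2.0) ** 2

        def fixed_point(t):
            # t - xi * gamma^[l](t), with l = 7 as in the paper
            l = 7
            f = 2 * np.pi ** (2 * l) * np.sum(
                I ** l * a2 * np.exp(-I * np.pi ** 2 * t))
            for s in range(l - 1, 1, -1):
                K0 = np.prod(np.arange(1.0, 2 * s, 2)) / np.sqrt(2 * np.pi)
                const = (1 + 0.5 ** (s + 0.5)) / 3.0
                time = (2 * const * K0 / (N * f)) ** (2.0 / (3 + 2 * s))
                f = 2 * np.pi ** (2 * s) * np.sum(
                    I ** s * a2 * np.exp(-I * np.pi ** 2 * time))
            return t - (2 * N * np.sqrt(np.pi) * f) ** (-2.0 / 5)

        # the bracket may need widening for very smooth data
        t_star = brentq(fixed_point, 0, 0.1)
        return np.sqrt(t_star) * R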
> It would be useful, for me and maybe to others, if you could use
> github to keep track of the different versions (your repo or gists).
> I would like to see how the boundary and periodicity are affected by
> the different fft and dct, since I bump into this also in other areas.

Future updates should go to that same github link I sent earlier. I
haven't written any new code. I simply generated some exponentially
distributed data and set MIN to be 0. My periodic idea involves a
different kernel, so it will take a moment to map out. I haven't
played with the code Barbier de Reuille Pierre posted yet. I will add
the distribution generating code to that same git repository.

Daniel

From ralf.gommers at gmail.com  Fri Jan 25 17:14:12 2013
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 25 Jan 2013 23:14:12 +0100
Subject: [SciPy-Dev] 0.12.0 release schedule
Message-ID: 

Hi all,

It's time to start thinking about the 0.12.0 release. There are a number
of cool new features, and a huge amount of maintenance work (thanks in
large part to Pauli). I don't think there are any real blockers at the
moment. We should merge as many open PRs as possible though. Does anyone
still have important fixes or other things that should go in?

I propose the following release schedule:

Feb 9 : beta 1
Feb 23 : rc 1
Mar 9 : rc 2 (if needed)
Mar 16 : final release

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From josef.pktd at gmail.com Sat Jan 26 12:52:40 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 26 Jan 2013 12:52:40 -0500 Subject: [SciPy-Dev] segfault on python 3 - Gohlke binaries In-Reply-To: References: Message-ID: On Sat, Jan 26, 2013 at 12:10 PM, Ralf Gommers wrote: > > > > On Sat, Jan 26, 2013 at 5:59 PM, wrote: >> >> I just tried to install a new python 3.3 on Windows 7 >> >> python-3.3.0.amd64.msi >> numpy-MKL-1.7.0rc1.win-amd64-py3.3.exe >> scipy-0.12.0.dev.win-amd64-py3.3.exe >> >> installers ran without problems >> (aside: I also have the 32bit version of python installed in a >> different directory from the 64 bit version) >> >> numpy.test() ends without errors and failures >> >> scipy.test(verbose=3) ends with a segfault at >> >> Compare dsbevx eigenvalues and eigenvectors ... >> >> >> I'm just a "consumer" of binaries, and have no idea. > > > Fixed by https://github.com/scipy/scipy/pull/404 (merged yesterday Very good, and thanks to Pauli and you for the maintenance work. Josef > > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From cgohlke at uci.edu Sat Jan 26 12:55:40 2013 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sat, 26 Jan 2013 09:55:40 -0800 Subject: [SciPy-Dev] segfault on python 3 - Gohlke binaries In-Reply-To: References: Message-ID: <5104189C.5080107@uci.edu> On 1/26/2013 9:52 AM, josef.pktd at gmail.com wrote: > On Sat, Jan 26, 2013 at 12:10 PM, Ralf Gommers wrote: >> >> >> >> On Sat, Jan 26, 2013 at 5:59 PM, wrote: >>> >>> I just tried to install a new python 3.3 on Windows 7 >>> >>> python-3.3.0.amd64.msi >>> numpy-MKL-1.7.0rc1.win-amd64-py3.3.exe >>> scipy-0.12.0.dev.win-amd64-py3.3.exe >>> >>> installers ran without problems >>> (aside: I also have the 32bit version of python installed in a >>> different directory from the 64 bit version) >>> >>> numpy.test() ends without errors and failures >>> >>> scipy.test(verbose=3) ends with a segfault at >>> >>> Compare dsbevx eigenvalues and eigenvectors ... >>> >>> >>> I'm just a "consumer" of binaries, and have no idea. >> >> >> Fixed by https://github.com/scipy/scipy/pull/404 (merged yesterday > > Very good, and thanks to Pauli and you for the maintenance work. > > Josef > I just uploaded new binaries. 
Christoph

>>
>> Ralf
>>
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From josef.pktd at gmail.com  Sat Jan 26 13:20:20 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 26 Jan 2013 13:20:20 -0500
Subject: [SciPy-Dev] segfault on python 3 - Gohlke binaries
In-Reply-To: <5104189C.5080107@uci.edu>
References: <5104189C.5080107@uci.edu>
Message-ID: 

On Sat, Jan 26, 2013 at 12:55 PM, Christoph Gohlke wrote:
> On 1/26/2013 9:52 AM, josef.pktd at gmail.com wrote:
>> On Sat, Jan 26, 2013 at 12:10 PM, Ralf Gommers wrote:
>>>
>>>
>>>
>>> On Sat, Jan 26, 2013 at 5:59 PM, wrote:
>>>>
>>>> I just tried to install a new python 3.3 on Windows 7
>>>>
>>>> python-3.3.0.amd64.msi
>>>> numpy-MKL-1.7.0rc1.win-amd64-py3.3.exe
>>>> scipy-0.12.0.dev.win-amd64-py3.3.exe
>>>>
>>>> installers ran without problems
>>>> (aside: I also have the 32bit version of python installed in a
>>>> different directory from the 64 bit version)
>>>>
>>>> numpy.test() ends without errors and failures
>>>>
>>>> scipy.test(verbose=3) ends with a segfault at
>>>>
>>>> Compare dsbevx eigenvalues and eigenvectors ...
>>>>
>>>>
>>>> I'm just a "consumer" of binaries, and have no idea.
>>>
>>>
>>> Fixed by https://github.com/scipy/scipy/pull/404 (merged yesterday
>>
>> Very good, and thanks to Pauli and you for the maintenance work.
>>
>> Josef
>>
>
> I just uploaded new binaries.

Thank you for the fast response and all your support of Windows users.

scipy test suite runs without failures or errors (both short
scipy.test() and long nosetests scipy)

However, there are again many warnings printed out during the test run.

For example, several
"ComplexWarning: Casting complex values to real discards the imaginary part"
in scipy.sparse, besides the zero division and similar floating point
errors.
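For anyone hunting these down: the warning comes from complex-to-real
casts, and it can be promoted to an error to get a traceback right at
the offending line. A small illustration (my own, not from the test
suite):

    import warnings
    import numpy as np

    a = np.zeros(3)  # real (float) array
    with warnings.catch_warnings():
        # turn the cast warning into an error to locate its source
        warnings.simplefilter('error', np.ComplexWarning)
        try:
            a[0] = 1 + 2j  # imaginary part would be discarded
        except np.ComplexWarning as w:
            print('caught:', w)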
Josef > > Christoph > > >>> >>> Ralf >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Sun Jan 27 10:02:45 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 27 Jan 2013 10:02:45 -0500 Subject: [SciPy-Dev] segfault on python 3 - Gohlke binaries In-Reply-To: References: <5104189C.5080107@uci.edu> Message-ID: On Sat, Jan 26, 2013 at 1:20 PM, wrote: > On Sat, Jan 26, 2013 at 12:55 PM, Christoph Gohlke wrote: >> On 1/26/2013 9:52 AM, josef.pktd at gmail.com wrote: >>> On Sat, Jan 26, 2013 at 12:10 PM, Ralf Gommers wrote: >>>> >>>> >>>> >>>> On Sat, Jan 26, 2013 at 5:59 PM, wrote: >>>>> >>>>> I just tried to install a new python 3.3 on Windows 7 >>>>> >>>>> python-3.3.0.amd64.msi >>>>> numpy-MKL-1.7.0rc1.win-amd64-py3.3.exe >>>>> scipy-0.12.0.dev.win-amd64-py3.3.exe >>>>> >>>>> installers ran without problems >>>>> (aside: I also have the 32bit version of python installed in a >>>>> different directory from the 64 bit version) >>>>> >>>>> numpy.test() ends without errors and failures >>>>> >>>>> scipy.test(verbose=3) ends with a segfault at >>>>> >>>>> Compare dsbevx eigenvalues and eigenvectors ... >>>>> >>>>> >>>>> I'm just a "consumer" of binaries, and have no idea. >>>> >>>> >>>> Fixed by https://github.com/scipy/scipy/pull/404 (merged yesterday >>> >>> Very good, and thanks to Pauli and you for the maintenance work. >>> >>> Josef >>> >> >> I just uploaded new binaries. > > Thanks you for the fast response and all your support of Windows users. > > scipy test suite runs without failures or errors (both short > scipy.test() and long nosetests scipy) > > However there are again many warnings printed out during the test run > > For example several > "ComplexWarning: Casting complex values to real discards the imaginary part" > in scipy.sparse, besides the zero division and similar floating point errors. Just to finish up the testing I also installed the 32bit version of python, numpy and scipy (on 64bit Windows 7) (Gohlke binaries of yesterday.) 
running the full scipy tests, I get one failure (the 17 errors are just known fails, reported because I ran nosetests on the command line) ====================================================================== FAIL: test_decomp.TestEig.test_singular ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python33\lib\site-packages\nose\case.py", line 198, in runTest self.test(*self.arg) File "C:\Programs\Python33\lib\site-packages\scipy\linalg\tests\test_decomp.py", line 210, in test_singular self._check_gen_eig(A, B) File "C:\Programs\Python33\lib\site-packages\scipy\linalg\tests\test_decomp.py", line 197, in _check_gen_eig err_msg=msg) File "C:\Programs\Python33\lib\site-packages\numpy\testing\utils.py", line 812, in assert_array_almost_equal header=('Arrays are not almost equal to %d decimals' % decimal)) File "C:\Programs\Python33\lib\site-packages\numpy\testing\utils.py", line 645, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal to 6 decimals array([[22, 34, 31, 31, 17], [45, 45, 42, 19, 29], [39, 47, 49, 26, 34], [27, 31, 26, 21, 15], [38, 44, 44, 24, 30]]) array([[13, 26, 25, 17, 24], [31, 46, 40, 26, 37], [26, 40, 19, 25, 25], [16, 25, 27, 14, 23], [24, 35, 18, 21, 22]]) (mismatch 50.0%) x: array([ -1.32014829e-08+0.j, 1.32014809e-08+0.j, 1.33224503e+00+0.j, 2.00000000e+00+0.j]) y: array([ -5.90370943e-01+0.j, -1.54128768e-07+0.j, 1.54128748e-07+0.j, 2.00000000e+00+0.j]) ---------------------------------------------------------------------- Ran 6063 tests in 383.527s FAILED (SKIP=34, errors=17, failures=1) > > Josef > >> >> Christoph >> >> >>>> >>>> Ralf >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev From ralf.gommers at gmail.com Sun Jan 27 14:51:22 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 27 Jan 2013 20:51:22 +0100 Subject: [SciPy-Dev] docstring standard: parameter shape description Message-ID: Hi, When merging the doc wiki edits there were a large number of changes to the shape description of parameters/returns. This is not yet described in the docstring standard ( https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), and currently is done in various ways: param1 : ndarray, shape (N,) param1 : ndarray of shape (N,) param1 : (N,) ndarray param1 : 1-D ndarray param1 : ndarray A 1-D array .... To keep this consistent I'd like to add it to the standard. My proposal would be: 1. If the actual shape is used in the description (for example to say "return size is N+1), then use: param1 : ndarray, shape (N,) 2. If it's not used but has to be 1-D (or 2-D, ...), then use: param1 : 1-D ndarray This post was triggered by doc wiki edits and a review comment on those at https://github.com/scipy/scipy/pull/405 by the way. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sun Jan 27 14:57:23 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 27 Jan 2013 14:57:23 -0500 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: References: Message-ID: On Sun, Jan 27, 2013 at 2:51 PM, Ralf Gommers wrote: > Hi, > > When merging the doc wiki edits there were a large number of changes to the > shape description of parameters/returns. This is not yet described in the > docstring standard > (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), and > currently is done in various ways: > > param1 : ndarray, shape (N,) I think the word "shape" is not really necessary. Josef > > param1 : ndarray of shape (N,) > > param1 : (N,) ndarray > > param1 : 1-D ndarray > > param1 : ndarray > A 1-D array .... > > To keep this consistent I'd like to add it to the standard. My proposal > would be: > 1. If the actual shape is used in the description (for example to say > "return size is N+1), then use: > param1 : ndarray, shape (N,) > 2. If it's not used but has to be 1-D (or 2-D, ...), then use: > param1 : 1-D ndarray > > This post was triggered by doc wiki edits and a review comment on those at > https://github.com/scipy/scipy/pull/405 by the way. > > Ralf > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From jh at physics.ucf.edu Mon Jan 28 16:21:56 2013 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 28 Jan 2013 22:21:56 +0100 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: (scipy-dev-request@scipy.org) Message-ID: On Sun, Jan 27, 2013 at 2:51 PM, Ralf Gommers wrote: > Hi, > > When merging the doc wiki edits there were a large number of changes to the > shape description of parameters/returns. This is not yet described in the > docstring standard > (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), and > currently is done in various ways: > > param1 : ndarray, shape (N,) I think it should be consistent between all cases, start with the class and then the shape, and solve the general problem. Initially, I agreed with Josef about being terse, but it reads hard that way and if you're a newbie you might wonder what the numbers in parens are. The word "shape" does not add an extra line, and the comma makes sense as an appositive in English. So, I prefer: param1 : ndarray, shape XXXXX For XXXXX, we need to specify: ranges of allowed numbers of dimensions ranges of allowed sizes within each dimension low- or high-side unconstrained sizes in either case We should accept the output of .shape, and define some range conventions. Of course, there will be pathological cases, particularly in specialist packages that adopt the numpy doc standard, where nothing but text will adequately describe the allowed dimensions ("If there are three dimensions, then the second dimension must..."). A "(see text)" should be allowed after the shape spec. 
So, this is my counterproposal for inclusion in the standard: ------------------------------------------------------------------------------- param1 : ndarray, shape [(see text)] as in param1 : ndarray, shape (2, 2+, dim(any), 4-, 4-6, any) (see text) in : the spec reads from the slowest-varying to the fastest-varying dimension a number means exactly that number of items on that axis a number followed by a "+" ("-") means that number or more (fewer) items a-b means between a and b items, INCLUSIVE "any" means any number of items on that axis dim(dimspec) means the conventions above apply for dimensions instead of items The example would mean an array with dimensions, from slowest to fastest-varying, of size: 2 2 or more (0 or more axes can be inserted here) 0 to 4 4 to 6 any size, including absent (use 1+ to require a dimension) ------------------------------------------------------------------------------- I thought of basing the ranges off the Python indexing spec, but I find it potentially confusing. Is there a reason to propagate the Python weirdness that the ending index is 1 more than the final item? The latter behavior of Python is useful in programming (you don't have to write "-1" all the time), but it is error-inducing to many, even to non-beginners. However, if we did this, the example would look like: (2, :3, dim(:), 4:, 4:7, :) I don't think it wise to use the indexing spec with a changed meaning for the ending index. Either we should adopt the indexing spec or we should adopt another spec that looks different enough from the indexing spec not to be confusing. Remember that the docs need to be clear to beginners and help them not make errors. Thoughts? --jh-- From njs at pobox.com Mon Jan 28 16:47:26 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 28 Jan 2013 13:47:26 -0800 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: References: Message-ID: On Mon, Jan 28, 2013 at 1:21 PM, Joe Harrington wrote: > On Sun, Jan 27, 2013 at 2:51 PM, Ralf Gommers wrote: >> Hi, >> >> When merging the doc wiki edits there were a large number of changes to the >> shape description of parameters/returns. This is not yet described in the >> docstring standard >> (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), and >> currently is done in various ways: >> >> param1 : ndarray, shape (N,) > > I think it should be consistent between all cases, start with the class > and then the shape, and solve the general problem. > > Initially, I agreed with Josef about being terse, but it reads hard that > way and if you're a newbie you might wonder what the numbers in parens > are. The word "shape" does not add an extra line, and the comma makes > sense as an appositive in English. +1, the word 'shape' is a pretty critical clue the first time you see this. > So, I prefer: > > param1 : ndarray, shape XXXXX > > For XXXXX, we need to specify: > > ranges of allowed numbers of dimensions > ranges of allowed sizes within each dimension > low- or high-side unconstrained sizes in either case > > We should accept the output of .shape, and define some range > conventions. Of course, there will be pathological cases, particularly > in specialist packages that adopt the numpy doc standard, where nothing > but text will adequately describe the allowed dimensions ("If there are > three dimensions, then the second dimension must..."). A "(see text)" > should be allowed after the shape spec. 
> > So, this is my counterproposal for inclusion in the standard: > > ------------------------------------------------------------------------------- > param1 : ndarray, shape [(see text)] > as in > param1 : ndarray, shape (2, 2+, dim(any), 4-, 4-6, any) (see text) > > in : > the spec reads from the slowest-varying to the fastest-varying dimension > a number means exactly that number of items on that axis > a number followed by a "+" ("-") means that number or more (fewer) items > a-b means between a and b items, INCLUSIVE > "any" means any number of items on that axis > dim(dimspec) means the conventions above apply for dimensions instead of items > > The example would mean an array with dimensions, from slowest to > fastest-varying, of size: > 2 > 2 or more > (0 or more axes can be inserted here) > 0 to 4 > 4 to 6 > any size, including absent (use 1+ to require a dimension) "any size" should mean 0+. "absent" is not a size. If a function does accept an optional final dimension, can we write that like 'shape (N, D) or shape (N,)'? For inserting axes, "..." is clearer than the rather opaque "any(dim)", and matches existing Python convention. Generally, though, for input parameters it's usually best to specify the size as a variable rather than a numeric range so it can be referred back to later, right? And for output parameters there's no need to specify ranges, since the shape should be determined by the input? 'in1 : ndarray, shape (N, M), in2 : ndarray, shape (M, K), out : ndarray, shape (N, K)'. The spec in this complexity seems to be in peril of overengineering. Do we have examples of when these more elaborate specifiers would be useful? -n From jh at physics.ucf.edu Mon Jan 28 19:35:28 2013 From: jh at physics.ucf.edu (Joe Harrington) Date: Tue, 29 Jan 2013 01:35:28 +0100 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: (message from Nathaniel Smith on Mon, 28 Jan 2013 13:47:26 -0800) Message-ID: On Mon, 28 Jan 2013 13:47:26 -0800 Nathaniel Smith wrote: > On Mon, Jan 28, 2013 at 1:21 PM, Joe Harrington wrote: > > On Sun, Jan 27, 2013 at 2:51 PM, Ralf Gommers > wrote> : > >> Hi, > >> > >> When merging the doc wiki edits there were a large number of changes to the > >> shape description of parameters/returns. This is not yet described in the > >> docstring standard > >> (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), > >> an> d > >> currently is done in various ways: > >> > >> param1 : ndarray, shape (N,) > > > > I think it should be consistent between all cases, start with the class > > and then the shape, and solve the general problem. > > > > Initially, I agreed with Josef about being terse, but it reads hard that > > way and if you're a newbie you might wonder what the numbers in parens > > are. The word "shape" does not add an extra line, and the comma makes > > sense as an appositive in English. > > +1, the word 'shape' is a pretty critical clue the first time you see this. > > > So, I prefer: > > > > param1 : ndarray, shape XXXXX > > > > For XXXXX, we need to specify: > > > > ranges of allowed numbers of dimensions > > ranges of allowed sizes within each dimension > > low- or high-side unconstrained sizes in either case > > > > We should accept the output of .shape, and define some range > > conventions. 
> > in specialist packages that adopt the numpy doc standard, where nothing
> > but text will adequately describe the allowed dimensions ("If there are
> > three dimensions, then the second dimension must...").  A "(see text)"
> > should be allowed after the shape spec.
> >
> > So, this is my counterproposal for inclusion in the standard:
> >
> > -------------------------------------------------------------------------------
> > param1 : ndarray, shape <shapespec> [(see text)]
> > as in
> > param1 : ndarray, shape (2, 2+, dim(any), 4-, 4-6, any) (see text)
> >
> > in <shapespec>:
> > the spec reads from the slowest-varying to the fastest-varying dimension
> > a number means exactly that number of items on that axis
> > a number followed by a "+" ("-") means that number or more (fewer) items
> > a-b means between a and b items, INCLUSIVE
> > "any" means any number of items on that axis
> > dim(dimspec) means the conventions above apply for dimensions instead of items
> >
> > The example would mean an array with dimensions, from slowest to
> > fastest-varying, of size:
> > 2
> > 2 or more
> > (0 or more axes can be inserted here)
> > 0 to 4
> > 4 to 6
> > any size, including absent (use 1+ to require a dimension)
>
> "any size" should mean 0+. "absent" is not a size. If a function does
> accept an optional final dimension, can we write that like 'shape (N,
> D) or shape (N,)'?

An array dimension cannot have 0 elements (the total size is the product
of the shape tuple's elements).  "1+" means the dimension has to be
there.  "any" means it could be there or not.  There are many cases in
image processing where an optional initial or final dimension appears,
so I felt this would cover most of our cases and avoid most uses of
dim().  But you could get rid of "any" and use "dim(1-)" instead.  I'm
not sure which is clearer to a beginner.

(N,) is too obscure to a beginner and might be missed by anyone reading
fast.  Also, (2,) is a valid 1D shape (i.e., it's valid tuple notation).
Having a different meaning from the rules governing the normal output of
.shape is not a good idea.

> For inserting axes, "..." is clearer than the rather opaque
> "any(dim)", and matches existing Python convention.

It's dim(any), not any(dim), so it's clear enough.  "dim()" just means
the contents applies to dimensions, not items.  The reason to use dim()
is the generality.  What if you can only insert 1 axis, or up to two?
Then you can say "dim(1)" or "dim(2-)".  "..." doesn't capture this at
all.  However, I don't mind allowing "..." as a shorthand for
"dim(any)".  How about adding

  "..." is an alias for dim(any)

to the spec list.

> Generally, though, for input parameters it's usually best to specify
> the size as a variable rather than a numeric range so it can be
> referred back to later, right? And for output parameters there's no
> need to specify ranges, since the shape should be determined by the
> input? 'in1 : ndarray, shape (N, M), in2 : ndarray, shape (M, K), out
> : ndarray, shape (N, K)'.

I agree completely.  How about adding

  a capital letter (variable) means any number of items, for later reference

to the spec list.

> The spec in this complexity seems to be in
> peril of overengineering.

Rather than overengineering, I'm trying to prevent underengineering
ourselves into a corner from which it is difficult to recover.  The spec
as proposed will not look strange in any normal case (now that the
omission of variables is fixed).
For those objects that are weird, it will look as good as it can while
still delivering the desired information.  The danger of not thinking it
through now is that we underspecify, document things a certain way for a
while, then discover that we need more, and further that what we have
specified is not compatible with the best way to do it generally.  Then
we need to comb numpy's 2000+ functions, rewriting all the shapespecs,
not to mention all the other packages that now use the numpy doc spec or
derivatives of it.

> Do we have examples of when these more
> elaborate specifiers would be useful?

Sure, mainly in science image processing, such as in astronomy,
especially where true-color, tagged, and mosaicked images are involved.
I've written many routines that handle both the case of a single image
and stacks/arrays of images in arbitrary or semi-arbitrary
configurations, which would be

(dim(any), N, M) or
(dim(1-2), N, M)

The latter case is common as the input to image mosaicking software
handling either strips or 2D mosaics.

I've also seen cases where there is an array of information ancillary to
each pixel in an image.  For example, the per-pixel status bits from the
Spitzer Space Telescope's calibration pipeline could be carried along
this way, or the temperature, pressure, and humidity in each grid cell
of a general circulation model could be stored this way, or
uncertainties could be kept this way.  The spec would be

(N, M, dim(any)) or
(N, M, dim(0-1))

A true-color image is (N, M, 3) except when it's (N, M, 4) or (3, N, M)
if stored as 3 separate images, or (4, N, M) if it's got transparency.
So, that's

(3-4, N, M) or
(N, M, 3-4)

So, there's an argument to allowing a list of shapespecs.  The input for
mosaicking color, tagged images would be:

(dim(1-2), 3-4, N, M, dim(0-1))

If the latter arrays were restricted to being image stacks as opposed to
2D mosaics, and the routine were smart enough to know that if there's
only one image in the stack then just return it, then:

(dim(1-), 3-4, N, M, dim(0-1))

Also, the shapespec is a little underspecified, still.  In the latter
case, what if you wanted to handle both monochrome and color images?
Then the 3-4 dimension is optional.  I suppose we should add to the
spec:

  opt() means this part of the specification is optional

as in

(dim(1-), opt(3-4), N, M, dim(0-1))

So, my proposal is now:

------------------------------------------------------------------------------
param1 : ndarray, shape <shapespec> [(see text)]
as in
param1 : ndarray, shape (2, 2+, dim(0+), 4-, 4-6, opt(N)) (see text)

in <shapespec>:
the spec reads from the slowest-varying to the fastest-varying dimension
a number means exactly that number of items on that axis
a capital letter (variable) means 1 or more items, for later reference
a number followed by a "+" ("-") means that number or more (fewer) items
a-b, where a and b are numerical sizes, means between a and b items,
  INCLUSIVE
dim(dimspec) means the conventions above apply for dimensions instead of
  items
opt() means this part of the specification is optional
shorthands:
  0 starting a range means the dimension (or list of dimensions) is
    optional
  "..." is an alias for dim(0+) (any number of dimensions, or none)

The example would mean an array with dimensions, from slowest to
fastest-varying, of size:
2
2 or more
(0 or more axes can appear here)
0 to 4
4 to 6
1 or more items, or the axis absent entirely

While the shapespec allows for complex shape options to be specified,
always use the simplest shapespec possible for the object.
-------------------------------------------------------------------------------
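(To make the notation above concrete, here is a sketch of a docstring
written against this proposal.  The routine and its parameters are
invented for illustration; they are not part of the proposal itself.)

def mosaic_color_stack(images):
    """Mosaic a stack of (optionally color) science images.

    Parameters
    ----------
    images : ndarray, shape (dim(1-), opt(3-4), N, M)
        One or more monochrome or color (3- or 4-channel) images to
        mosaic.  N and M are the image height and width.

    Returns
    -------
    out : ndarray, shape (opt(3-4), N, M)
        The combined image; the color axis is present only if the
        input had one.
    """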
The variables let us get rid of "any".  Use either "N" or "opt(N)",
depending on what you mean.  The important thing to remember is that the
vast majority of routines will have nice, simple shapespecs.  We're just
ensuring that the complex cases can be handled in the same documentation
standard.

--jh--

From tim at cerazone.net  Mon Jan 28 23:17:56 2013
From: tim at cerazone.net (Cera, Tim)
Date: Mon, 28 Jan 2013 23:17:56 -0500
Subject: [SciPy-Dev] docstring standard: parameter shape description
In-Reply-To: 
References: 
Message-ID: 

> So, my proposal is now:
>
> ------------------------------------------------------------------------------
> param1 : ndarray, shape <shapespec> [(see text)]
> as in
> param1 : ndarray, shape (2, 2+, dim(0+), 4-, 4-6, opt(N)) (see text)
>
> in <shapespec>:
> the spec reads from the slowest-varying to the fastest-varying dimension
> a number means exactly that number of items on that axis
> a capital letter (variable) means 1 or more items, for later reference
> a number followed by a "+" ("-") means that number or more (fewer) items
> a-b, where a and b are numerical sizes, means between a and b items,
>   INCLUSIVE
> dim(dimspec) means the conventions above apply for dimensions instead
>   of items
> opt() means this part of the specification is optional
> shorthands:
>   0 starting a range means the dimension (or list of dimensions) is
>     optional
>   "..." is an alias for dim(0+) (any number of dimensions, or none)

shape(-1)  (couldn't resist)

But really I don't understand.  This might be useful as a programming
specification to be parsed to validate parameters, but for documentation?

I remember this discussion in the numpy or scipy doc editor.  My memory is
that the decision was...

param1 : (N,) array_like
    This is a 1D array.

That is why I made the changes that I did.  The problem is that those
discussions are not searchable and I couldn't find the thread (quickly
anyway).  Could someone who has some Django experience add the capability
to search discussions?  Helpful link -> https://github.com/pv/pydocweb

Ralf - I appreciate the tedium of going through hundreds of changes and
deciding should it stay or should it go.  Sorry.  Now, if they were only my
edits, you could have just accepted them because they were perfect. :-)

Another nice thing would be to have the docstring editor have stronger
'linting' capabilities.  It catches some things like long lines, but it
would be nice to test for more of the docstring standard AND to be able to
search for failed 'linting' tests - the same way you can search for
docstrings without examples.

I don't understand the workflow concerning the interaction between the
code and pydocweb, so this is just talking out my hat, but can't the
docstrings be edited in a batch (for example using 'sed') and the edited
docstrings brought into pydocweb?

I am neutral on Ralf's initial recommendation.  I suggest we should pick
one way to do it though, so if a change is to be made I think having a
uniform approach is more readable.

param1 : array_like
    Can be any shape...
param2 : array_like, shape(N,)
    A 1D array, specified whether ``N`` is used or not.
param3 : array_like, shape(K,)
    Another 1D array

As I was typing out the examples above I realized the `param3` proves my
point.  If ``K`` is not mentioned then Ralf's initial suggested docstring
standard would have described the shape of `param3` in the text.  This
would make the 'param3 : ...' line and 'param1 : ...' line seem to
indicate the same thing.
You would have to read the text to see the difference. Now, how often does this happen where you have parameters of different shapes that would confuse under the proposed standard? I have no idea. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Mon Jan 28 23:27:04 2013 From: tim at cerazone.net (Cera, Tim) Date: Mon, 28 Jan 2013 23:27:04 -0500 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: References: Message-ID: > > As I was typing out the examples above I realized the `param3` proves my > point. If ``K`` is not mentioned then Ralf's initial suggested docstring > standard would have described the shape of `param3` in the text. This > would make the 'param3 : ...' line and 'param1 : ...' line seem to > indicate the same thing. You would have to read the text to see the > difference. Now, how often does this happen where you have parameters of > different shapes that would confuse under the proposed standard? I have no > idea. > I want to take back this last paragraph. Looked again at Ralf's suggestion and there would be a difference. The examples from the previous message would be... param1 : array_like Can be any shape... param2 : array_like, shape(N,) A 1D array, specified because ``N`` is used. param3 : 1D array_like Another 1D array I tried to find a grammar site to determine what the standard was for dimensionality, but didn't find anything. Is it 1-D, 1D, or 1-d? Seems that most people are going with 1D though, but that determination should also be part of the standard if we accept Ralf's initial proposal. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From guyer at nist.gov Tue Jan 29 10:09:54 2013 From: guyer at nist.gov (Jonathan Guyer) Date: Tue, 29 Jan 2013 10:09:54 -0500 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: References: Message-ID: On Jan 28, 2013, at 7:35 PM, Joe Harrington wrote: > On Mon, 28 Jan 2013 13:47:26 -0800 Nathaniel Smith wrote: > >> "any size" should mean 0+. "absent" is not a size. If a function does >> accept an optional final dimension, can we write that like 'shape (N, >> D) or shape (N,)'? > > An array dimension cannot have 0 elements (the total size is the product > of the shape tuple's elements). That's not true. An array dimension certainly can have 0 elements: >>> import numpy as np >>> a = np.zeros((0,)) >>> b = np.zeros((1, 0)) >>> c = np.zeros((1, 0, 3)) >>> a.shape, b.shape, c.shape ((0,), (1, 0), (1, 0, 3)) For most applications, there is little utility in being able to declare an "empty" array, but you can certainly do it (and for us, it's handy for FiPy to partition PDE solutions across multiple processors without having to special-case any processors that don't get any solution nodes). As we reported in http://projects.scipy.org/numpy/ticket/1171, the treatment of 0-length dimensions is buggy, but NumPy nonetheless allows for it. 
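(A quick sketch of the partitioning case Jonathan describes, showing
that a zero-length chunk flows through ordinary NumPy arithmetic with no
special-casing.  The data and the four-way split are made up for
illustration.)

import numpy as np

# Split 10 solution nodes across 4 workers; np.array_split hands the
# trailing workers smaller chunks.
values = np.arange(10.0)
chunks = np.array_split(values, 4)
chunks.append(values[10:])     # an explicitly empty chunk, shape (0,)

for c in chunks:
    # Elementwise arithmetic and sum() are well defined for shape (0,)
    # input, so an empty worker needs no special case.
    print("%-6s %s" % (c.shape, (2.0 * c + 1.0).sum()))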
From ralf.gommers at gmail.com  Tue Jan 29 15:37:18 2013
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 29 Jan 2013 21:37:18 +0100
Subject: [SciPy-Dev] docstring standard: parameter shape description
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jan 29, 2013 at 5:17 AM, Cera, Tim wrote:
>
>
>> So, my proposal is now:
>>
>> ------------------------------------------------------------------------------
>> param1 : ndarray, shape <shapespec> [(see text)]
>> as in
>> param1 : ndarray, shape (2, 2+, dim(0+), 4-, 4-6, opt(N)) (see text)
>>
>> in <shapespec>:
>> the spec reads from the slowest-varying to the fastest-varying dimension
>> a number means exactly that number of items on that axis
>> a capital letter (variable) means 1 or more items, for later reference
>> a number followed by a "+" ("-") means that number or more (fewer) items
>> a-b, where a and b are numerical sizes, means between a and b items,
>>   INCLUSIVE
>> dim(dimspec) means the conventions above apply for dimensions instead
>>   of items
>> opt() means this part of the specification is optional
>> shorthands:
>>   0 starting a range means the dimension (or list of dimensions) is
>>     optional
>>   "..." is an alias for dim(0+) (any number of dimensions, or none)
>
> shape(-1)  (couldn't resist)
>
> But really I don't understand.  This might be useful as a programming
> specification to be parsed to validate parameters, but for documentation?

Same here. The last thing I want is to have a new incomprehensible
mini-DSL for shape specification. If you explain it anyway in text, then
there's no point in using this kind of specifier with "dim", "opt" and
ranges. If not, then the reader will have to go look it up in the
docstandard, which is also no good.

> I remember this discussion in the numpy or scipy doc editor.  My memory is
> that the decision was...
>
> param1 : (N,) array_like
>     This is a 1D array.
>
> That is why I made the changes that I did.  The problem is that those
> discussions are not searchable and I couldn't find the thread (quickly
> anyway).  Could someone who has some Django experience add the capability
> to search discussions?  Helpful link -> https://github.com/pv/pydocweb
>
> Ralf - I appreciate the tedium of going through hundreds of changes and
> deciding should it stay or should it go.  Sorry.  Now, if they were only my
> edits, you could have just accepted them because they were perfect. :-)

No problem. Keep on doing what you do:)

> Another nice thing would be to have the docstring editor have stronger
> 'linting' capabilities.  It catches some things like long lines, but it
> would be nice to test for more of the docstring standard AND to be able to
> search for failed 'linting' tests - the same way you can search for
> docstrings without examples.
>
> I don't understand the workflow concerning the interaction between the
> code and pydocweb, so this is just talking out my hat, but can't the
> docstrings be edited in a batch (for example using 'sed') and the edited
> docstrings brought into pydocweb?

Any batch editing you can just commit and send a PR for. Then they'll
show up in the doc wiki after merging. Way less overhead.

Ralf

> I am neutral on Ralf's initial recommendation.  I suggest we should pick
> one way to do it though, so if a change is to be made I think having a
> uniform approach is more readable.
>
> param1 : array_like
>     Can be any shape...
> param2 : array_like, shape(N,)
>     A 1D array, specified whether ``N`` is used or not.
> param3 : array_like, shape(K,) > Another 1D array > > As I was typing out the examples above I realized the `param3` proves my > point. If ``K`` is not mentioned then Ralf's initial suggested docstring > standard would have described the shape of `param3` in the text. This > would make the 'param3 : ...' line and 'param1 : ...' line seem to > indicate the same thing. You would have to read the text to see the > difference. Now, how often does this happen where you have parameters of > different shapes that would confuse under the proposed standard? I have no > idea. > > Kindest regards, > Tim > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Jan 29 15:38:55 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 29 Jan 2013 21:38:55 +0100 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: References: Message-ID: On Tue, Jan 29, 2013 at 5:27 AM, Cera, Tim wrote: > As I was typing out the examples above I realized the `param3` proves my >> point. If ``K`` is not mentioned then Ralf's initial suggested docstring >> standard would have described the shape of `param3` in the text. This >> would make the 'param3 : ...' line and 'param1 : ...' line seem to >> indicate the same thing. You would have to read the text to see the >> difference. Now, how often does this happen where you have parameters of >> different shapes that would confuse under the proposed standard? I have no >> idea. >> > > I want to take back this last paragraph. Looked again at Ralf's > suggestion and there would be a difference. The examples from the previous > message would be... > > param1 : array_like > Can be any shape... > param2 : array_like, shape(N,) > A 1D array, specified because ``N`` is used. > param3 : 1D array_like > Another 1D array > > I tried to find a grammar site to determine what the standard was for > dimensionality, but didn't find anything. Is it 1-D, 1D, or 1-d? Seems > that most people are going with 1D though, but that determination should > also be part of the standard if we accept Ralf's initial proposal. > I'm pretty sure we decided on 1-D, but it's not written down either. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jh at physics.ucf.edu Tue Jan 29 18:06:31 2013 From: jh at physics.ucf.edu (Joe Harrington) Date: Wed, 30 Jan 2013 00:06:31 +0100 Subject: [SciPy-Dev] docstring standard: parameter shape description In-Reply-To: (scipy-dev-request@scipy.org) Message-ID: On Tue, 29 Jan 2013 10:09:54 -0500, Jonathan Guyer wrote: > On Jan 28, 2013, at 7:35 PM, Joe Harrington wrote: > > > On Mon, 28 Jan 2013 13:47:26 -0800 Nathaniel Smith wrote: > > > >> "any size" should mean 0+. "absent" is not a size. If a function does > >> accept an optional final dimension, can we write that like 'shape (N, > >> D) or shape (N,)'? > > > > An array dimension cannot have 0 elements (the total size is the product > > of the shape tuple's elements). > > That's not true. 
An array dimension certainly can have 0 elements:
>
> >>> import numpy as np
> >>> a = np.zeros((0,))
> >>> b = np.zeros((1, 0))
> >>> c = np.zeros((1, 0, 3))
> >>> a.shape, b.shape, c.shape
> ((0,), (1, 0), (1, 0, 3))
>
> For most applications, there is little utility in being able to
> declare an "empty" array, but you can certainly do it (and for us,
> it's handy for FiPy to partition PDE solutions across multiple
> processors without having to special-case any processors that don't
> get any solution nodes).
>
> As we reported in http://projects.scipy.org/numpy/ticket/1171, the
> treatment of 0-length dimensions is buggy, but NumPy nonetheless
> allows for it.

I did the following test before I posted:

>>> import numpy as np
>>> a=np.array(((2,3),(3,4)))
>>> a
array([[2, 3],
       [3, 4]])
>>> a.shape
(2, 2)
>>> a.shape=(2,0,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged

So, you can't have a 0-element dimension in an array that is not already
0-size.  I didn't think about 0-size arrays having some non-zero
dimensions.  I guess that makes mathematical sense, but then what is the
point?

Between this and your bug report, I wonder whether a 0-element array
dimension is a feature you can rely on.  Can a developer let us know if
it's an accident that it works at all?  I'll leave it to you to start a
new thread on this and get the inconsistencies resolved, since you're
the one using the feature.  The report was last touched 3 years ago...

--jh--

From njs at pobox.com  Tue Jan 29 19:21:48 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 29 Jan 2013 16:21:48 -0800
Subject: [SciPy-Dev] docstring standard: parameter shape description
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jan 29, 2013 at 3:06 PM, Joe Harrington wrote:
> I did the following test before I posted:
>
>>>> import numpy as np
>>>> a=np.array(((2,3),(3,4)))
>>>> a
> array([[2, 3],
>        [3, 4]])
>>>> a.shape
> (2, 2)
>>>> a.shape=(2,0,2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: total size of new array must be unchanged
>
> So, you can't have a 0-element dimension in an array that is not already
> 0-size.

Yes, but you can't create a 7-element dimension in an array that is
already not a multiple-of-7 in size, without this ruling out the idea
of 7-element dimensions as a thing :-).

> I didn't think about 0-size arrays having some non-zero
> dimensions.  I guess that makes mathematical sense, but then what is the
> point?
>
> Between this and your bug report, I wonder whether a 0-element array
> dimension is a feature you can rely on.  Can a developer let us know if
> it's an accident that it works at all?  I'll leave it to you to start a
> new thread on this and get the inconsistencies resolved, since you're
> the one using the feature.  The report was last touched 3 years ago...

It's certainly intended, and very useful! Most code that can handle
an arbitrary size for a dimension can automatically handle a 0 size,
and often this lets you avoid having to write in special cases. This
is also why e.g. people have spent a lot of effort thinking about
what it means to sum 0 elements:
  https://en.wikipedia.org/wiki/Empty_sum
(And numpy does the Right Thing for expressions like np.sum([]),
np.prod([]).)

I don't know about that particular bug report's current status, but
just because there's some obscure corner of the fancy indexing system
that has a bug it doesn't mean that we don't support 0 sized arrays
in general... that would mean a lot of numpy was unsupported! ;-)

-n
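(For reference, the empty-reduction identities Nathaniel mentions, which
any NumPy of this era will reproduce:)

import numpy as np

print(np.sum([]))           # 0.0, the additive identity
print(np.prod([]))          # 1.0, the multiplicative identity

# Reductions over a zero-length axis follow the same convention:
a = np.zeros((3, 0))
print(np.sum(a, axis=1))    # [ 0.  0.  0.]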
From guyer at nist.gov  Wed Jan 30 09:21:52 2013
From: guyer at nist.gov (Jonathan Guyer)
Date: Wed, 30 Jan 2013 09:21:52 -0500
Subject: [SciPy-Dev] docstring standard: parameter shape description
In-Reply-To: 
References: 
Message-ID: <59C24495-2A14-4AEC-BFAD-ADF5727C6BD3@nist.gov>

On Jan 29, 2013, at 6:06 PM, Joe Harrington wrote:

>> As we reported in http://projects.scipy.org/numpy/ticket/1171, the
>> treatment of 0-length dimensions is buggy, but NumPy nonetheless
>> allows for it.
>
> I did the following test before I posted:
>
>>>> import numpy as np
>>>> a=np.array(((2,3),(3,4)))
>>>> a
> array([[2, 3],
>        [3, 4]])
>>>> a.shape
> (2, 2)
>>>> a.shape=(2,0,2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: total size of new array must be unchanged
>
> So, you can't have a 0-element dimension in an array that is not already
> 0-size.

I guess that's to be expected. That use case doesn't arise for us, so I
can't say whether it would be expected or desirable to be able to do such
a reshaping.

> I didn't think about 0-size arrays having some non-zero
> dimensions.  I guess that makes mathematical sense, but then what is the
> point?

Like I said, one point is to be able to deal with an array partitioned
across multiple processors in a consistent way. If the number of
processors doesn't evenly divide into the array size, then it's possible
to end up with "empty" arrays that are semantically consistent with all
of the other "full" arrays.

> Between this and your bug report, I wonder whether a 0-element array
> dimension is a feature you can rely on.  Can a developer let us know if
> it's an accident that it works at all?

Well, I don't know if it was originally an accident, but Travis agreed
with my original bug report.

> I'll leave it to you to start a
> new thread on this and get the inconsistencies resolved, since you're
> the one using the feature.  The report was last touched 3 years ago...

I discovered after I posted that this ticket is very much alive and well
at https://github.com/numpy/numpy/issues/1769 with an active pull request
to fix it at https://github.com/numpy/numpy/pull/2701

From suryak at ieee.org  Wed Jan 30 09:48:13 2013
From: suryak at ieee.org (Surya Kasturi)
Date: Wed, 30 Jan 2013 20:18:13 +0530
Subject: [SciPy-Dev] The New Scipy Central Home page
Message-ID: 

Hey,

As discussed before, I want to start my first contribution to SciPy by
make Scipy Central new design.

So, my idea is to make the site look "clean and simple".

However, lots of things needed to be done, I just want to share this
small piece.

http://ksurya.github.com/scipycentral/

The whole point of sharing at this early is that I need some suggestions
regarding some major design-changes in the code.

1. Remove "Tag cloud" from its current position and place on the
background of "Heading: Scipy Central"

2. Submit Code, Submit Lib files, Submit Link.. part, I don't want to
comment on it now. May be we don't need to change but can change some
design! However, we can as well put them on on nav-bar

Fundamentally, I want to hear some suggestions regarding these things..

Hope you like the page. Also, What about the "color". Is black okay? How
about regular "scipy blue"? Not sure? Need some suggestion on it?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From helmrp at yahoo.com  Wed Jan 30 09:49:50 2013
From: helmrp at yahoo.com (The Helmbolds)
Date: Wed, 30 Jan 2013 06:49:50 -0800 (PST)
Subject: [SciPy-Dev] SciPy-Dev Digest, Vol 111, Issue 22
In-Reply-To: 
References: 
Message-ID: <1359557390.89643.YahooMailNeo@web31805.mail.mud.yahoo.com>

Re: use of 1-D to mean "one-dimensional"

See ---> D:\SciPy Editing--Master Copy\How to Document ScipY\Questions+Answers.mht

Bob H

>> param1 : array_like
>>     Can be any shape...
>> param2 : array_like, shape(N,)
>>     A 1D array, specified because ``N`` is used.

>> I tried to find a grammar site to determine what the standard was for
>> dimensionality, but didn't find anything.  Is it 1-D, 1D, or 1-d?  Seems
>> that most people are going with 1D though, but that determination should
>> also be part of the standard if we accept Ralf's initial proposal.
>
> I'm pretty sure we decided on 1-D, but it's not written down either.
>
> Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at hilboll.de  Wed Jan 30 10:39:49 2013
From: lists at hilboll.de (Andreas Hilboll)
Date: Wed, 30 Jan 2013 16:39:49 +0100
Subject: [SciPy-Dev] The New Scipy Central Home page
In-Reply-To: 
References: 
Message-ID: <51093EC5.6030809@hilboll.de>

On 30.01.2013 15:48, Surya Kasturi wrote:
> Hey,
>
> As discussed before, I want to start my first contribution to SciPy by
> make Scipy Central new design.
>
> So, my idea is to make the site look "clean and simple".
>
> However, lots of things needed to be done, I just want to share this
> small piece.
>
> http://ksurya.github.com/scipycentral/
>
> The whole point of sharing at this early is that I need some suggestions
> regarding some major design-changes in the code.
>
> 1. Remove "Tag cloud" from its current position and place on the
> background of "Heading: Scipy Central"
>
> 2. Submit Code, Submit Lib files, Submit Link.. part, I don't want to
> comment on it now. May be we don't need to change but can change some
> design! However, we can as well put them on on nav-bar
>
> Fundamentally, I want to hear some suggestions regarding these things..
>
> Hope you like the page. Also, What about the "color". Is black okay? How
> about regular "scipy blue"? Not sure? Need some suggestion on it?
>

Hi Surya,

there was a logo designed and discussed at some point, it's here:

https://github.com/tonysyu/SciPy-Central-Logo

I personally like the "scipy blue". It's more friendly than black.

Cheers, Andreas.

From suryak at ieee.org  Wed Jan 30 11:09:41 2013
From: suryak at ieee.org (Surya Kasturi)
Date: Wed, 30 Jan 2013 21:39:41 +0530
Subject: [SciPy-Dev] The New Scipy Central Home page
In-Reply-To: <51093EC5.6030809@hilboll.de>
References: <51093EC5.6030809@hilboll.de>
Message-ID: 

That's really awesome. I updated the link. Please check it out!

On Wed, Jan 30, 2013 at 9:09 PM, Andreas Hilboll wrote:

> On 30.01.2013 15:48, Surya Kasturi wrote:
> > Hey,
> >
> > As discussed before, I want to start my first contribution to SciPy by
> > make Scipy Central new design.
> >
> > So, my idea is to make the site look "clean and simple".
> >
> > However, lots of things needed to be done, I just want to share this
> > small piece.
> >
> > http://ksurya.github.com/scipycentral/
> >
> > The whole point of sharing at this early is that I need some suggestions
> > regarding some major design-changes in the code.
> >
> > 1. Remove "Tag cloud" from its current position and place on the
> > background of "Heading: Scipy Central"
> >
> > 2. Submit Code, Submit Lib files, Submit Link.. part, I don't want to
> > comment on it now. May be we don't need to change but can change some
> > design! However, we can as well put them on on nav-bar
> >
> > Fundamentally, I want to hear some suggestions regarding these things..
> >
> > Hope you like the page. Also, What about the "color". Is black okay? How
> > about regular "scipy blue"? Not sure? Need some suggestion on it?
>
> Hi Surya,
>
> there was a logo designed and discussed at some point, it's here:
>
> https://github.com/tonysyu/SciPy-Central-Logo
>
> I personally like the "scipy blue". It's more friendly than black.
>
> Cheers, Andreas.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pierre.haessig at crans.org  Thu Jan 31 08:47:23 2013
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Thu, 31 Jan 2013 14:47:23 +0100
Subject: [SciPy-Dev] docstring standard: parameter shape description
In-Reply-To: 
References: 
Message-ID: <510A75EB.8010707@crans.org>

Hi Joe,

On 30/01/2013 00:06, Joe Harrington wrote:
> So, you can't have a 0-element dimension in an array that is not already
> 0-size.  I didn't think about 0-size arrays having some non-zero
> dimensions.  I guess that makes mathematical sense, but then what is the
> point?
>
> Between this and your bug report, I wonder whether a 0-element array
> dimension is a feature you can rely on.  Can a developer let us know if
> it's an accident that it works at all?  I'll leave it to you to start a
> new thread on this and get the inconsistencies resolved, since you're
> the one using the feature.  The report was last touched 3 years ago.

Just to say that back in 2009 I was programming a modeling tool to get
the state space representation of any (linear) electrical circuit and
empty arrays were useful in avoiding painful special casing.

Within the algorithm there is a block-partitioning of the incidence
matrix of the circuit graph. Partitioning is dictated by the different
kinds of circuit elements. Thus, some of these blocks could be empty and
it was very useful that such (0,N) or (M,0) arrays could propagate
smoothly in the rest of the algorithm.

best,
Pierre

(This being said, I cannot tell how well numpy dealt with the empty
array arithmetic... because it was in Matlab! Mea culpa ;-) )
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL: 

From fbreitling at aip.de  Thu Jan 31 09:01:49 2013
From: fbreitling at aip.de (Frank Breitling)
Date: Thu, 31 Jan 2013 15:01:49 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
Message-ID: <510A794D.7050102@aip.de>

Hello,

http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html

explains the usage of 2D histograms.
Although it is possible to construct 2D histograms with a variable bin
size, there is no simple way to plot them.
Therefore I would like to request an implementation to imshow that
allows the visualization of 2D histograms with variable bin size.  I
imagine a syntax like

imshow(histogram2d, xedges=xedges, yedges=yedges)

where xedges and yedges are the bin edges along the x- and y-axis.

I have attached a short program and its output which constructs a
numpy.histogram2d with variable bin width (xedges=[0,1,3]) and shows
its bin content with imshow.
I would highly appreciate if imshow would be extended to represent the
histogram correctly.
If there is an alternative solution, I would be interested to learn about it as well. Kind regards Frank -------------- next part -------------- A non-text attachment was scrubbed... Name: hist2d.py Type: text/x-python Size: 329 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hist2d.png Type: image/png Size: 17161 bytes Desc: not available URL: From suryak at ieee.org Thu Jan 31 09:10:32 2013 From: suryak at ieee.org (Surya Kasturi) Date: Thu, 31 Jan 2013 19:40:32 +0530 Subject: [SciPy-Dev] The New Scipy Central Home page In-Reply-To: References: <51093EC5.6030809@hilboll.de> Message-ID: Any updates? Please let me know how it is? what could be added? (Firstly whether you like it to put as scipy-central home page) On Wed, Jan 30, 2013 at 9:39 PM, Surya Kasturi wrote: > That's really awesome. I updated the link. Please check out! > > > On Wed, Jan 30, 2013 at 9:09 PM, Andreas Hilboll wrote: > >> Am 30.01.2013 15:48, schrieb Surya Kasturi: >> > Hey, >> > >> > As discussed before, I want to start my first contribution to SciPy by >> > make Scipy Central new design. >> > >> > So, my idea is to make the site look "clean and simple". >> > >> > However, lots of things needed to be done, I just want to share this >> > small piece. >> > >> > http://ksurya.github.com/scipycentral/ >> > >> > The whole point of sharing at this early is that I need some suggestions >> > regarding some major design-changes in the code. >> > >> > 1. Remove "Tag cloud" from its current position and place on the >> > background of "Heading: Scipy Central" >> > >> > 2. Submit Code, Submit Lib files, Submit Link.. part, I don't want to >> > comment on it now. May be we don't need to change but can change some >> > design! However, we can as well put them on on nav-bar >> > >> > Fundamentally, I want to hear some suggestions regarding these things.. >> > >> > Hope you like the page. Also, What about the "color". Is black okay? How >> > about regular "scipy blue"? Not sure? Need some suggestion on it? >> > >> >> Hi Surya, >> >> there was a logo designed and discussed at some point, it's here: >> >> https://github.com/tonysyu/SciPy-Central-Logo >> >> I personally like the "scipy blue". It's more friendly than black. >> >> Cheers, Andreas. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Jan 31 09:17:13 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 31 Jan 2013 09:17:13 -0500 Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size In-Reply-To: <510A794D.7050102@aip.de> References: <510A794D.7050102@aip.de> Message-ID: On Thu, Jan 31, 2013 at 9:01 AM, Frank Breitling wrote: > Hello, > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html > > explains the usage of 2D histograms. > Although it is possible to construct 2D histograms with a variable bin > size, there is no simple way to plot them. > Therefore I would like to request an implementation to imshow that > allows the visualization of 2D histograms with variable bin size. I > imagine a syntax like > > imshow(historgram2d, xedges=xedges, yedges=yedges) > > where xedges and yedges are the bin edges along the x- and y-axis. > > I have attached a short program and its output which constructs a > numpy.historgram2d with variable bin width (xedges=[0,1,3]) and shows > its bin content with imshow. 
> I would highly appreciate if imshow would be extended to represent the
> histogram correctly.
> If there is an alternative solution, I would be interested to learn
> about it as well.
>

This may be a discussion for matplotlib-user, but have you had a look
at trying to use broken_barh for your needs?

http://matplotlib.org/examples/pylab_examples/broken_barh.html

Skipper
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at hilboll.de  Thu Jan 31 09:27:42 2013
From: lists at hilboll.de (Andreas Hilboll)
Date: Thu, 31 Jan 2013 15:27:42 +0100
Subject: [SciPy-Dev] The New Scipy Central Home page
In-Reply-To: 
References: <51093EC5.6030809@hilboll.de>
Message-ID: <510A7F5E.7090706@hilboll.de>

> Any updates?
>
> Please let me know how it is? what could be added? (Firstly whether you
> like it to put as scipy-central home page)
>

I think the convention on this list is to bottom-post.

Well, so far it doesn't look finished. I guess everyone is waiting for
you to just continue, and basically show a more final result.

If you ask me, I would want categories and/or tags on the front page,
and the recently updated / most viewed / top contributors could go into
a side panel. Also there should be some slogan / text telling the
visitor what kind of website she is on. As it is now, I wouldn't know
where I am and would leave again.

Also, I personally don't like the black in combination with the quite
light blue of the logo.

This is just my 2 cents. As someone said earlier, I think the best way
for you is to just proceed and ask the list about your results, not
about any single step you want to take. Just go ahead :)

Cheers, Andreas.

From sturla at molden.no  Thu Jan 31 09:49:02 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 31 Jan 2013 15:49:02 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A794D.7050102@aip.de>
References: <510A794D.7050102@aip.de>
Message-ID: <510A845E.8030008@molden.no>

On 31.01.2013 15:01, Frank Breitling wrote:
> Hello,
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html
>
> explains the usage of 2D histograms.
> Although it is possible to construct 2D histograms with a variable bin
> size, there is no simple way to plot them.
> Therefore I would like to request an implementation to imshow that
> allows the visualization of 2D histograms with variable bin size. I
> imagine a syntax like
>
> imshow(histogram2d, xedges=xedges, yedges=yedges)

This is a question about matplotlib, not SciPy.

However, matplotlib.pyplot.pcolor will do what you want.

Sturla

From sturla at molden.no  Thu Jan 31 09:55:11 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 31 Jan 2013 15:55:11 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A845E.8030008@molden.no>
References: <510A794D.7050102@aip.de> <510A845E.8030008@molden.no>
Message-ID: <510A85CF.2050609@molden.no>

On 31.01.2013 15:49, Sturla Molden wrote:

>> imshow(histogram2d, xedges=xedges, yedges=yedges)
>
> This is a question about matplotlib, not SciPy.
>
> However, matplotlib.pyplot.pcolor will do what you want.

And if the 2D histogram is very large, matplotlib.pyplot.pcolormesh will
be faster.

The purpose of imshow is to display images.

Sturla
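(A minimal sketch of this suggestion applied to Frank's case: a 2D
histogram with unequal x bins, drawn with pcolormesh so each cell is
rendered at its true width.  The data here are synthetic, standing in
for the scrubbed hist2d.py attachment.)

import numpy as np
import matplotlib.pyplot as plt

# 2D histogram with a variable bin width along x (xedges=[0, 1, 3])
x = np.random.uniform(0.0, 3.0, 1000)
y = np.random.uniform(0.0, 2.0, 1000)
H, xedges, yedges = np.histogram2d(x, y, bins=([0, 1, 3], [0, 1, 2]))

# histogram2d returns H indexed as (x, y); pcolormesh expects
# (row, col) = (y, x), hence the transpose.
plt.pcolormesh(xedges, yedges, H.T)
plt.colorbar()
plt.show()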
From fbreitling at aip.de  Thu Jan 31 10:00:12 2013
From: fbreitling at aip.de (Frank Breitling)
Date: Thu, 31 Jan 2013 16:00:12 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: 
References: <510A794D.7050102@aip.de>
Message-ID: <510A86FC.1070308@aip.de>

Hi Skipper,

I would like to use Python for plotting dynamic radio spectra e.g. as
found at

http://science.nasa.gov/media/medialibrary/1998/12/20/ast22dec98_1_resources/leonid_dynspectrum.jpg

(Image is also attached).
Given the high resolution in time and frequency I believe broken_barh is
not suited.

But thanks for your suggestion.

Frank

On 31.01.2013 15:17, Skipper Seabold wrote:
> On Thu, Jan 31, 2013 at 9:01 AM, Frank Breitling wrote:
>
>     Hello,
>
>     http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html
>
>     explains the usage of 2D histograms.
>     Although it is possible to construct 2D histograms with a variable bin
>     size, there is no simple way to plot them.
>     Therefore I would like to request an implementation to imshow that
>     allows the visualization of 2D histograms with variable bin size. I
>     imagine a syntax like
>
>     imshow(histogram2d, xedges=xedges, yedges=yedges)
>
>     where xedges and yedges are the bin edges along the x- and y-axis.
>
>     I have attached a short program and its output which constructs a
>     numpy.histogram2d with variable bin width (xedges=[0,1,3]) and shows
>     its bin content with imshow.
>     I would highly appreciate if imshow would be extended to represent the
>     histogram correctly.
>     If there is an alternative solution, I would be interested to learn
>     about it as well.
>
>
> This may be a discussion for matplotlib-user, but have you had a look
> at trying to use broken_barh for your needs?
>
> http://matplotlib.org/examples/pylab_examples/broken_barh.html
>
> Skipper
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leonid_dynspectrum.jpg (JPEG-Grafik, 493×176 Pixel)
Type: image/jpeg
Size: 45609 bytes
Desc: not available
URL: 

From jsseabold at gmail.com  Thu Jan 31 10:03:38 2013
From: jsseabold at gmail.com (Skipper Seabold)
Date: Thu, 31 Jan 2013 10:03:38 -0500
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A86FC.1070308@aip.de>
References: <510A794D.7050102@aip.de> <510A86FC.1070308@aip.de>
Message-ID: 

On Thu, Jan 31, 2013 at 10:00 AM, Frank Breitling wrote:

> Hi Skipper,
>
> I would like to use Python for plotting dynamic radio spectra e.g. as
> found at
>
> http://science.nasa.gov/media/medialibrary/1998/12/20/ast22dec98_1_resources/leonid_dynspectrum.jpg
>
> (Image is also attached).
> Given the high resolution in time and frequency I believe broken_barh is
> not suited.
>
> But thanks for your suggestion.
>

Sure. I wasn't entirely clear on what you were after. Sturla's answer is
much better suited.

Skipper
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fbreitling at aip.de  Thu Jan 31 10:32:21 2013
From: fbreitling at aip.de (Frank Breitling)
Date: Thu, 31 Jan 2013 16:32:21 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A85CF.2050609@molden.no>
References: <510A794D.7050102@aip.de> <510A845E.8030008@molden.no>
	<510A85CF.2050609@molden.no>
Message-ID: <510A8E85.7030300@aip.de>

Hi Sturla,

My experience with pcolor was that it was very slow.  Pcolormesh was
significantly faster, but compared to imshow it is still very slow.
However, while imshow produces smooth interpolations between the
discrete frequencies, I didn't find a similar option with pcolormesh,
and so the results were not useful.  You can see it in the imshow and
pcolormesh images attached.

Frank

On 31.01.2013 15:55, Sturla Molden wrote:
> On 31.01.2013 15:49, Sturla Molden wrote:
>
>>> imshow(histogram2d, xedges=xedges, yedges=yedges)
>> This is a question about matplotlib, not SciPy.
>>
>> However, matplotlib.pyplot.pcolor will do what you want.
> And if the 2D histogram is very large, matplotlib.pyplot.pcolormesh will
> be faster.
>
> The purpose of imshow is to display images.
>
>
> Sturla
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20120619T140000.00-imshow.jpg
Type: image/jpeg
Size: 62067 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20120619T140000.00-pcolormesh.jpg
Type: image/jpeg
Size: 68068 bytes
Desc: not available
URL: 

From sturla at molden.no  Thu Jan 31 10:40:03 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 31 Jan 2013 16:40:03 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A86FC.1070308@aip.de>
References: <510A794D.7050102@aip.de> <510A86FC.1070308@aip.de>
Message-ID: <510A9053.9050906@molden.no>

Hi Frank,

I use Python to plot wavelet spectrograms with logarithmically scaled
frequencies.  I e.g. interpolate the spectrum to the frequencies I like
to plot, and pass that to pcolor or contourf.  Usually I plot with a
linear y-axis first, and then use set_yscale('log').  Usually the y-axis
must be flipped too.  Then I e.g.
end up with something like this, plotting with 250 logarithmically
scaled frequencies from 10 Hz to 10 kHz:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# wavelet(), wavelet_to_fourier(), signal, sampling_rate and origin are
# defined elsewhere in the script
cwt, scale = wavelet(signal, sampling_rate)
time = np.arange(signal.shape[0])/sampling_rate
power = np.abs(cwt)**2/scale       # bias corrected power
periods = wavelet_to_fourier(scale)
spectrum_freqs = np.logspace(np.log10(10), np.log10(10000), 250)
spectrum_periods = spectrum_freqs**-1
spectrum = interpolate_spectrum(spectrum_periods, power, periods)

X, Y = np.meshgrid(time, spectrum_freqs)
cs = plt.contourf(X, Y, spectrum, cmap=cm.jet, origin=origin)
ax = plt.gca()
ax.set_yscale('log')
ylim = ax.get_ylim()
ax.set_ylim(ylim[::-1])


import numpy as np
from bisect import bisect_left

def interpolate_spectrum(spectrum_periods, wavelet_power, wavelet_periods):
    """
    linear interpolation for plotting spectrum at given frequencies
    """
    n0 = spectrum_periods.shape[0]
    n1 = wavelet_power.shape[1]
    m = wavelet_power.shape[0]
    spectrum_power = np.zeros((n0, n1))
    for k, sp in enumerate(spectrum_periods):
        # bracket sp between the two nearest wavelet periods
        i = bisect_left(wavelet_periods, sp)
        j = i - 1
        i = 0 if i < 0 else i
        i = m-1 if i > m-1 else i
        j = 0 if j < 0 else j
        j = m-1 if j > m-1 else j
        eps = 1E-5   # avoid division by zero
        di = 1./(np.abs(sp - wavelet_periods[i]) + eps)
        dj = 1./(np.abs(sp - wavelet_periods[j]) + eps)
        # inverse-distance weighted average of the two bracketing rows
        spectrum_power[k,:] = (di*wavelet_power[i,:] +
                               dj*wavelet_power[j,:])/(di + dj)
    return spectrum_power

Not sure if this helps or not, but it might.

Sturla

On 31.01.2013 16:00, Frank Breitling wrote:
> Hi Skipper,
>
> I would like to use Python for plotting dynamic radio spectra e.g. as
> found at
>
> http://science.nasa.gov/media/medialibrary/1998/12/20/ast22dec98_1_resources/leonid_dynspectrum.jpg
>
> (Image is also attached).
> Given the high resolution in time and frequency I believe broken_barh is
> not suited.
>
> But thanks for your suggestion.
>
> Frank
>
>
> On 31.01.2013 15:17, Skipper Seabold wrote:
>> On Thu, Jan 31, 2013 at 9:01 AM, Frank Breitling wrote:
>>
>> Hello,
>>
>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html
>>
>> explains the usage of 2D histograms.
>> Although it is possible to construct 2D histograms with a variable bin
>> size, there is no simple way to plot them.
>> Therefore I would like to request an implementation to imshow that
>> allows the visualization of 2D histograms with variable bin size. I
>> imagine a syntax like
>>
>> imshow(histogram2d, xedges=xedges, yedges=yedges)
>>
>> where xedges and yedges are the bin edges along the x- and y-axis.
>>
>> I have attached a short program and its output which constructs a
>> numpy.histogram2d with variable bin width (xedges=[0,1,3]) and shows
>> its bin content with imshow.
>> I would highly appreciate if imshow would be extended to represent the
>> histogram correctly.
>> If there is an alternative solution, I would be interested to learn
>> about it as well.
>>
>>
>> This may be a discussion for matplotlib-user, but have you had a look
>> at trying to use broken_barh for your needs?
>>
>> http://matplotlib.org/examples/pylab_examples/broken_barh.html
>>
>> Skipper
>>
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
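(A shape-level smoke test of the interpolate_spectrum() posted above;
random numbers stand in for real wavelet power, and no wavelet transform
is performed.)

import numpy as np

m, n1 = 40, 100                           # 40 wavelet periods, 100 time samples
wavelet_periods = np.logspace(-4, -1, m)  # ascending, as bisect requires
wavelet_power = np.random.rand(m, n1)

spectrum_freqs = np.logspace(np.log10(10), np.log10(10000), 250)
spectrum_periods = spectrum_freqs ** -1

spectrum = interpolate_spectrum(spectrum_periods, wavelet_power,
                                wavelet_periods)
print(spectrum.shape)                     # (250, 100)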
From sturla at molden.no  Thu Jan 31 10:56:22 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 31 Jan 2013 16:56:22 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A8E85.7030300@aip.de>
References: <510A794D.7050102@aip.de> <510A845E.8030008@molden.no>
	<510A85CF.2050609@molden.no> <510A8E85.7030300@aip.de>
Message-ID: <510A9426.9080401@molden.no>

On 31.01.2013 16:32, Frank Breitling wrote:

> My experience with pcolor was that it was very slow.

imshow is fast because it basically just bitblits the image.  You can
always use Python (or Cython, Fortran or C) to prepare an image from
your 2D data and pass that to imshow.

I showed you an example where I used contourf.  It is kind of slow too,
but I often just let the computer run for a while and prepare a bunch of
PDF files with the spectrums.  Then I can take those into e.g. Adobe
Illustrator later on, and I will also store the wavelet data in PyTables
(HDF5) files.

If I needed to render a spectrum in "real-time", I would use imshow or
OpenGL, and e.g. have my own Fortran 90 code prepare the displayed
image.  I might even consider using PyOpenGL to write vertex shaders and
have the graphics hardware do all the rendering.

So it really depends on how fast you need it to be.  Python code can
draw your data with OpenGL at the full speed that your graphics hardware
allows.  Or you can settle for slower but more convenient data
visualization methods like pcolor and contourf.  Or you can do something
in between.

But as for the question you asked (how to draw a 2D histogram), pcolor
does exactly that.  I did not say it will be fast for huge data sets.  I
thought you were used to e.g. Matlab and wanted something like image or
imagesc, and then found imshow but overlooked pcolor.

:-)

Sturla

From pierre.haessig at crans.org  Thu Jan 31 11:01:41 2013
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Thu, 31 Jan 2013 17:01:41 +0100
Subject: [SciPy-Dev] Scipy Central logo
In-Reply-To: <51093EC5.6030809@hilboll.de>
References: <51093EC5.6030809@hilboll.de>
Message-ID: <510A9565.4010606@crans.org>

Hi,

On 30/01/2013 16:39, Andreas Hilboll wrote:
> there was a logo designed and discussed at some point, it's here:
>
> https://github.com/tonysyu/SciPy-Central-Logo
>
> I personally like the "scipy blue". It's more friendly than black.

About feedback on the logo (I didn't find an existing thread on the ML):
I suspect there may be an issue of perceived geographical bias by
centering the map projection on the North Atlantic Ocean.

I'm by no means a specialist of this topic, but I know it may exist.
(see http://en.wikipedia.org/wiki/Reversed_map for example). As possible
workarounds, I see either:
* no continents display (don't know how good or how boring such an earth
would look)
* or, a bit funnier: random map orientation! (each page refresh could
point to another random map from a prerendered pool)

my 0.5 cent,
Pierre
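(Pierre's prerendered-pool idea in miniature.  The filenames are
hypothetical, and hooking the choice into SciPy Central's Django
templates is left out here.)

import random

LOGO_POOL = ["scipycentral_globe_americas.png",
             "scipycentral_globe_africa_europe.png",
             "scipycentral_globe_asia_pacific.png"]

def pick_logo():
    """Return a randomly chosen prerendered logo for this page load."""
    return random.choice(LOGO_POOL)

print(pick_logo())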
From sturla at molden.no  Thu Jan 31 11:04:58 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 31 Jan 2013 17:04:58 +0100
Subject: [SciPy-Dev] 2D histogram: request for plotting variable bin size
In-Reply-To: <510A9426.9080401@molden.no>
References: <510A794D.7050102@aip.de> <510A845E.8030008@molden.no>
	<510A85CF.2050609@molden.no> <510A8E85.7030300@aip.de>
	<510A9426.9080401@molden.no>
Message-ID: <510A962A.2080608@molden.no>

And I must apologize to the rest of you for posting this OT material on
the SciPy dev list. I know what this list is for. Sorry for the spam :)

Sturla

On 31.01.2013 16:56, Sturla Molden wrote:
> On 31.01.2013 16:32, Frank Breitling wrote:
>
>> My experience with pcolor was that it was very slow.
>
> imshow is fast because it basically just bitblits the image.  You can
> always use Python (or Cython, Fortran or C) to prepare an image from
> your 2D data and pass that to imshow.
>
> I showed you an example where I used contourf.  It is kind of slow too,
> but I often just let the computer run for a while and prepare a bunch of
> PDF files with the spectrums.  Then I can take those into e.g. Adobe
> Illustrator later on, and I will also store the wavelet data in PyTables
> (HDF5) files.
>
> If I needed to render a spectrum in "real-time", I would use imshow or
> OpenGL, and e.g. have my own Fortran 90 code prepare the displayed
> image.  I might even consider using PyOpenGL to write vertex shaders and
> have the graphics hardware do all the rendering.
>
> So it really depends on how fast you need it to be.  Python code can
> draw your data with OpenGL at the full speed that your graphics hardware
> allows.  Or you can settle for slower but more convenient data
> visualization methods like pcolor and contourf.  Or you can do something
> in between.
>
> But as for the question you asked (how to draw a 2D histogram), pcolor
> does exactly that.  I did not say it will be fast for huge data sets.  I
> thought you were used to e.g. Matlab and wanted something like image or
> imagesc, and then found imshow but overlooked pcolor.
>
>
> :-)
>
>
> Sturla
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

From bussonniermatthias at gmail.com  Thu Jan 31 12:15:28 2013
From: bussonniermatthias at gmail.com (Matthias BUSSONNIER)
Date: Thu, 31 Jan 2013 18:15:28 +0100
Subject: [SciPy-Dev] Scipy Central logo
In-Reply-To: <510A9565.4010606@crans.org>
References: <51093EC5.6030809@hilboll.de> <510A9565.4010606@crans.org>
Message-ID: 

Dear Pierre,

On 31 Jan 2013, at 17:01, Pierre Haessig wrote:

> * or, a bit funnier: random map orientation! (each page refresh could
> point to another random map from a prerendered pool)

I think you have a pretty good idea; it makes me think of the MIT Media
Lab random logo [1]. But I think we can do better.

Like this nice page does with an animated SVG:
http://www.xn--ole-9la.net/eolienne-anim.html
(éole.net with unicode characters for browsers that support it)

and inspired by http://bl.ocks.org/4282586 (d3 js : Three axis rotation,
orthographic view)

I'm sure one could make the Scipy Logo animated. It would have the
advantage of also making the resources cacheable, but we should, of
course, have a fallback on a rendered image.

I'll let you imagine what can be done with geolocalisation and time of
the day.
Lots of good examples here: http://bl.ocks.org/mbostock

Cheers,
-- 
Matthias

[1] http://www.fastcodesign.com/1663378/mit-media-labs-brilliant-new-logo-has-40000-permutations-video
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pav at iki.fi  Thu Jan 31 12:42:00 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Thu, 31 Jan 2013 19:42:00 +0200
Subject: [SciPy-Dev] Scipy Central logo
In-Reply-To: <510A9565.4010606@crans.org>
References: <51093EC5.6030809@hilboll.de> <510A9565.4010606@crans.org>
Message-ID: 

31.01.2013 18:01, Pierre Haessig wrote:
[clip]
> About feedback on the logo (I didn't find an existing thread on the ML):
> I suspect there may be an issue of perceived geographical bias by
> centering the map projection on the North Atlantic Ocean.
>
> I'm by no means a specialist of this topic, but I know it may exist.
> (see http://en.wikipedia.org/wiki/Reversed_map for example). As possible
> workarounds, I see either:
> * no continents display (don't know how good or how boring such an earth
> would look)
> * or, a bit funnier: random map orientation! (each page refresh could
> point to another random map from a prerendered pool)

I think a reasonable workaround here would be to have two or so versions
of the logo, and just choose between them randomly.

-- 
Pauli Virtanen