From vincent at vincentdavis.net Thu Jul 1 00:08:29 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 30 Jun 2010 22:08:29 -0600 Subject: [SciPy-User] looking for a python computational library... In-Reply-To: References: Message-ID: On Wed, Jun 30, 2010 at 8:17 PM, Anthony Palomba wrote: > Hey scipy-ers, > > I was wondering if there is some python module out there > that does computational geometry that I could use in > conjunction with scipy. I have python bindings for gcal suggested when I ask a similar question http://cgal-python.gforge.inria.fr/ Vincent > > > > Thanks, > Anthony > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From renato.fabbri at gmail.com Thu Jul 1 05:18:40 2010 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Thu, 1 Jul 2010 06:18:40 -0300 Subject: [SciPy-User] sum up to a specific value In-Reply-To: References: Message-ID: hi, i need to find which elements of an array sums up to an specific value any idea of how to do this? best, rf -- GNU/Linux User #479299 skype: fabbri.renato From david.mrva at isamfunds.com Thu Jul 1 06:12:56 2010 From: david.mrva at isamfunds.com (David Mrva) Date: Thu, 1 Jul 2010 05:12:56 -0500 Subject: [SciPy-User] Error calling mov_max() on scikits.timeseries object In-Reply-To: References: Message-ID: Hi Pierre, Many thanks for your reply. It clarified the observed behaviour considerably. I use: >>> scikits.timeseries.__version__ '0.91.3' >>> numpy.__version__ '1.3.0' With Python 2.6. Both libraries came with PythonXY. One suggestion: The doc for the 'tsfromtxt' function talks about structured dtype. A link to numpy's documentation describing the syntax and purpose would have been very useful. Similarly, a mention of the fact that the moving window functions do not accept masked arrays with structured dtype would have set the expectations right. Many thanks, David -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Pierre GM Sent: 30 June 2010 18:29 To: SciPy Users List Subject: Re: [SciPy-User] Error calling mov_max() on scikits.timeseries object On Jun 30, 2010, at 8:02 AM, David Mrva wrote: > Hello All, > > As a new user to scikits.timeseries, I started with a simple piece of code: read a one column timeseries data from a CSV file and find moving maxima. > > How should I correctly use the mov_max() function with a timeseries object? 
> > When I call the mov_max() function, I keep getting an exception: > > >>> import numpy as np > >>> import scikits.timeseries as ts > >>> import scikits.timeseries.lib.moving_funcs as mf > >>> b=ts.tsfromtxt("test4.csv", delimiter=',', names='price', datecols=(0), dtype='float') > >>> b > timeseries([(5277.0,) (5214.0,) (5180.0,) (5092.5,)], > dtype = [('price', ' dates = [737791 738156 738521 738886], > freq = U) > > >>> c=mf.mov_max(b, 2) > Traceback (most recent call last): > File "C:\Python26\lib\site-packages\scikits\timeseries\lib\moving_funcs.py", line 228, in mov_max > return _moving_func(data, MA_mov_max, kwargs) > File "C:\Python26\lib\site-packages\scikits\timeseries\lib\moving_funcs.py", line 121, in _moving_func > data = ma.fix_invalid(data) > File "C:\Python26\lib\site-packages\numpy\ma\core.py", line 516, in fix_invalid > invalid = np.logical_not(np.isfinite(a._data)) > AttributeError: logical_not > >>> > > Where the contents of the test4.csv file is: > 24/06/2010 09:10,5092.5 > 23/06/2010 09:10,5180 > 22/06/2010 09:10,5214 > 21/06/2010 09:10,5277 > > Calling mov_max() on a list of numbers works fine. The moving functions don't require that the input is a time_series (a standard ndarray or MaskedArray works frine), but you can't use a series w/ a structured dtype (that is, w/ named fields, like the one you have). Instead, you should use >>> c=mf.mov_max(b['price'], 2) I'm a tad surprised by the exception you're getting. Which version of timeseries/numpy are you using ? Mine give a NotImplementedError: Not implemented for this type which is far more explanatory. _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Thu Jul 1 12:12:58 2010 From: sturla at molden.no (Sturla Molden) Date: Thu, 01 Jul 2010 18:12:58 +0200 Subject: [SciPy-User] sum up to a specific value In-Reply-To: References: Message-ID: <4C2CBE8A.2030408@molden.no> Renato Fabbri skrev: > hi, > > i need to find which elements of an array sums up to an specific value > > any idea of how to do this? > http://mathworld.wolfram.com/SubsetSumProblem.html Sturla From vincent at vincentdavis.net Thu Jul 1 15:53:01 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 1 Jul 2010 13:53:01 -0600 Subject: [SciPy-User] solvers for discrete problems Message-ID: On the numpy list Renato Fabbri ask for a solver for the subset sum problem (special case of the knapsack problem) Referances by Pauli Virtanen http://en.wikipedia.org/wiki/Knapsack_problem http://en.wikipedia.org/wiki/Subset_sum_problem Is there a place in scipy for a set of solvers for discrete solvers like this? Would others find them useful? Maybe there is another python package that has many of these, i don't know. I ask because I would like to (be willing to) spend form time writing solvers for these types of problems. (maybe at the end of the year) Vincent From pav at iki.fi Thu Jul 1 16:20:07 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 1 Jul 2010 20:20:07 +0000 (UTC) Subject: [SciPy-User] solvers for discrete problems References: Message-ID: Thu, 01 Jul 2010 13:53:01 -0600, Vincent Davis wrote: [clip: dynamic programming, discrete problems] > Is there a place in scipy for a set of solvers for discrete solvers like > this? Would others find them useful? Maybe there is another python > package that has many of these, i don't know. scipy.optimize would probably be the best place for them, I believe. 
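For reference, a minimal brute-force sketch of the subset-sum search Renato asked about above (the helper name and the example values are illustrative, and nothing like it currently ships in scipy.optimize):

import numpy as np
from itertools import combinations

def subset_summing_to(arr, target):
    # Return the indices of one subset of `arr` whose elements sum to
    # `target`, or None if no such subset exists.  Exhaustive search,
    # so exponential in len(arr) -- only practical for small arrays.
    arr = np.asarray(arr)
    for size in range(1, len(arr) + 1):
        for idx in combinations(range(len(arr)), size):
            if arr[list(idx)].sum() == target:
                return idx
    return None

print(subset_summing_to([3, 34, 4, 12, 5, 2], 9))   # (2, 4): 4 + 5 == 9

For larger arrays, the pseudo-polynomial dynamic-programming approach described on the Wikipedia pages linked above scales far better than exhaustive search.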
If you want to work on this, great! -- Pauli Virtanen From bsouthey at gmail.com Thu Jul 1 16:28:40 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 01 Jul 2010 15:28:40 -0500 Subject: [SciPy-User] solvers for discrete problems In-Reply-To: References: Message-ID: <4C2CFA78.4070901@gmail.com> On 07/01/2010 03:20 PM, Pauli Virtanen wrote: > Thu, 01 Jul 2010 13:53:01 -0600, Vincent Davis wrote: > [clip: dynamic programming, discrete problems] > >> Is there a place in scipy for a set of solvers for discrete solvers like >> this? Would others find them useful? Maybe there is another python >> package that has many of these, i don't know. >> > scipy.optimize would probably be the best place for them, I believe. If > you want to work on this, great! > > There are many examples of Python code for the knapsack problem. For example, Stephen Marsland uses numpy in his book "Machine Learning: An Algorithmic Perspective" http://www-ist.massey.ac.nz/smarsland/MLbook.html (chapter 12) However, the code is non-commercial licensed. Also I do not know how general the code is. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From jh at physics.ucf.edu Fri Jul 2 14:31:08 2010 From: jh at physics.ucf.edu (Joe Harrington) Date: Fri, 02 Jul 2010 14:31:08 -0400 Subject: [SciPy-User] SciPy docs: volunteers needed now! Message-ID: Dear SciPy users and developers, For the past two summers, the SciPy Documentation Project has run a concerted effort to write docstrings for all the NumPy objects. This has been very successful, with over 75 writers contributing. Nearly every "important" object now has a reviewable docstring, and we finally have someone working on doc wiki programming again! We accomplished this by writing 3000-5000 words a week, as tabulated on the NumPy doc wiki's stats page. Sometimes we wrote much more. This worked because 1) many volunteered and 2) there was a paid coordinator who managed the workflow, motivated the community, and chased down loose ends. This year we have taken on the SciPy docs, which are bigger and more technical. A handful of very dedicated people has taken up the challenge, and David Goldsmith is once again our able coordinator. However, at the rate we are going it will take something like a decade to finish. We're writing just 700-1200 words a week. I have had to question whether it is worthwhile paying a coordinator for the effort of just a few volunteers, as these people are fairly focused already and there just are not very many of them. I would hate to pull support for the coordinator now, but in the end this project lives or dies by the willingness of its users - US - to contribute our time. Each of us should consider the cost - thousands of dollars a year for many of us with big projects - of commercial numerical software. Extracting (or even requesting) payment in exchange for the right to use SciPy isn't our model, but we still need the labor. In the end, all labor is either volunteer or paid. So, this is an appeal for all knowledgeable SciPy users, including current code developers, to consider writing a little bit of documentation over this summer. For those in a position to assign people to work on docs or to pay people who do, now is the time to step forward as well. Specifically, we need to get the number of words per week up into the 3000 range to make a paid coordinator worthwhile (and indeed, to get anywhere in a reasonable time). 
Short of a significant pickup in participation and words produced, documenting SciPy will once again be a 100% volunteer effort. Please visit http://docs.scipy.org/numpy/ to learn about editing docs, then sign up, request edit rights, and visit http://docs.scipy.org/scipy/Milestones/ to get connected to the work going on. We meet on Skype every Friday at noon EDT. Contact David Goldsmith (d.l.goldsmith at gmail.com, d.l.goldsmith on Skype) to join those conversations. Email discussion of doc issues happens on scipy-dev. Thanks, --jh-- From matthew.brett at gmail.com Fri Jul 2 14:40:01 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 2 Jul 2010 14:40:01 -0400 Subject: [SciPy-User] [SciPy-Dev] SciPy docs: volunteers needed now! In-Reply-To: References: Message-ID: Hi, > This year we have taken on the SciPy docs, which are bigger and more > technical. ?A handful of very dedicated people has taken up the > challenge, and David Goldsmith is once again our able coordinator. > However, at the rate we are going it will take something like a decade > to finish. ?We're writing just 700-1200 words a week. ?I have had to > question whether it is worthwhile paying a coordinator for the effort > of just a few volunteers, as these people are fairly focused already > and there just are not very many of them. I wonder whether there is any other approach that we can explore to help generate more volunteer work? Do you think it is mainly the difference between scipy and numpy that explains the drop-off? Or something else? To the extent that it is the technical differences - do you think there would be any point in trying to establish something like nominated experts or want-to-find-out type experts who will offer to advise on particular parts of scipy - even if they don't themselves write the docstrings? Or anything else that might help? Best, Matthew From elmar at net4werling.de Fri Jul 2 15:18:11 2010 From: elmar at net4werling.de (elmar) Date: Fri, 02 Jul 2010 21:18:11 +0200 Subject: [SciPy-User] KDTree count_neighbors Message-ID: hi all, just trying to wright a small scipts to get the distribution pattern of points in xy space using count_neighbors from scipy.spatial. Here is a shortend version of the skipt: from numpy import array from numpy.random import uniform from scipy.spatial import KDTree # just to get a pattern x = uniform(-2.0, 2.0, 200) y = uniform(-2.0, 2.0, 200) pattern = zip(x.ravel(), y.ravel()) tree = KDTree(pattern) point = array([1, 0]) # just a point r = 0.1 # and just a distance num = tree.count_neighbors(point, r) And here is the error messages: num = tree.count_neighbors(point, r) File "C:\Python26\lib\site-packages\scipy\spatial\kdtree.py", line 566, in count_neighbors R2 = Rectangle(other.maxes, other.mins) AttributeError: 'numpy.ndarray' object has no attribute 'maxes' Can anyone give me some support ? Any help is wellcome ! elmar From sturla at molden.no Fri Jul 2 15:40:03 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 02 Jul 2010 21:40:03 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: Message-ID: <4C2E4093.7060809@molden.no> elmar skrev: > Can anyone give me some support ? Any help is wellcome ! > It seems we have a bug. The code threat the argument 'r' as a KDTree instead of array. 
Sturla From sturla at molden.no Fri Jul 2 15:42:13 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 02 Jul 2010 21:42:13 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: <4C2E4093.7060809@molden.no> References: <4C2E4093.7060809@molden.no> Message-ID: <4C2E4115.3010502@molden.no> Sturla Molden skrev: > elmar skrev: > >> Can anyone give me some support ? Any help is wellcome ! >> >> > It seems we have a bug. The code threat the argument 'r' as a KDTree > instead of array. > > Please ignore this :( I was not thinking stright. From sturla at molden.no Fri Jul 2 15:44:35 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 02 Jul 2010 21:44:35 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: Message-ID: <4C2E41A3.7050105@molden.no> elmar skrev: > Can anyone give me some support ? Any help is wellcome ! > You are passing an array instead of a KDTree. def count_neighbors(self, other, r, p=2.): """Count how many nearby pairs can be formed. Count the number of pairs (x1,x2) can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1,x2,p)<=r. This is the "two-point correlation" described in Gray and Moore 2000, "N-body problems in statistical learning", and the code here is based on their algorithm. Parameters ========== other : KDTree From kwgoodman at gmail.com Fri Jul 2 15:47:51 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 2 Jul 2010 12:47:51 -0700 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: <4C2E4115.3010502@molden.no> References: <4C2E4093.7060809@molden.no> <4C2E4115.3010502@molden.no> Message-ID: On Fri, Jul 2, 2010 at 12:42 PM, Sturla Molden wrote: > Sturla Molden skrev: >> elmar skrev: >> >>> Can anyone give me some support ? Any help is wellcome ! >>> >>> >> It seems we have a bug. The code threat the argument 'r' as a KDTree >> instead of array. >> >> > Please ignore this :( > I was not thinking stright. (I thought the same thing.) Maybe you want this function: Definition: tree.query_ball_point(self, x, r, p=2.0, eps=0) Docstring: Find all points within r of x From elmar at net4werling.de Fri Jul 2 16:15:59 2010 From: elmar at net4werling.de (elmar) Date: Fri, 02 Jul 2010 22:15:59 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: <4C2E41A3.7050105@molden.no> References: <4C2E41A3.7050105@molden.no> Message-ID: Am 02.07.2010 21:44, schrieb Sturla Molden: > elmar skrev: >> Can anyone give me some support ? Any help is wellcome ! >> > > You are passing an array instead of a KDTree. > > def count_neighbors(self, other, r, p=2.): > """Count how many nearby pairs can be formed. > > Count the number of pairs (x1,x2) can be formed, with x1 drawn > from self and x2 drawn from other, and where distance(x1,x2,p)<=r. > This is the "two-point correlation" described in Gray and Moore 2000, > "N-body problems in statistical learning", and the code here is based > on their algorithm. > > Parameters > ========== > > other : KDTree hi sturla, it's late, it's hot, football is on TV and num = KDTree.count_neighbors(point, tree, r) aren't working neither- Can you give a newby a more detailed help ? elmar From elmar at net4werling.de Fri Jul 2 16:19:00 2010 From: elmar at net4werling.de (elmar) Date: Fri, 02 Jul 2010 22:19:00 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: <4C2E4093.7060809@molden.no> <4C2E4115.3010502@molden.no> Message-ID: Am 02.07.2010 21:47, schrieb Keith Goodman: ....... > > (I thought the same thing.) 
> > Maybe you want this function: > > Definition: tree.query_ball_point(self, x, r, p=2.0, eps=0) > Docstring: > Find all points within r of x hi Keith, thank you for reply, but i'm looking for the number of neighbors and not for the neighbors themself. elmar From aarchiba at physics.mcgill.ca Fri Jul 2 16:25:35 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 2 Jul 2010 16:25:35 -0400 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: <4C2E41A3.7050105@molden.no> Message-ID: The count_neighbors function is designed to compare close neighbors between two trees, not between one tree and one point or one tree and one array of points. I think the fastest way to get what you want is to just query for the neighbors and take the length of the resulting list. You could also try putting your query point (or points) into a KDtreel; if you have many query points this will be much faster. Anne On 2 July 2010 16:15, elmar wrote: > Am 02.07.2010 21:44, schrieb Sturla Molden: >> elmar skrev: >>> Can anyone give me some support ? Any help is wellcome ! >>> >> >> You are passing an array instead of a KDTree. >> >> def count_neighbors(self, other, r, p=2.): >> """Count how many nearby pairs can be formed. >> >> Count the number of pairs (x1,x2) can be formed, with x1 drawn >> from self and x2 drawn from other, and where distance(x1,x2,p)<=r. >> This is the "two-point correlation" described in Gray and Moore 2000, >> "N-body problems in statistical learning", and the code here is based >> on their algorithm. >> >> Parameters >> ========== >> >> other : KDTree > > hi sturla, > > it's late, it's hot, football is on TV and > > num = KDTree.count_neighbors(point, tree, r) > > aren't working neither- > > Can you give a newby a more detailed help ? > > elmar > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From elmar at net4werling.de Fri Jul 2 16:33:23 2010 From: elmar at net4werling.de (elmar) Date: Fri, 02 Jul 2010 22:33:23 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: <4C2E41A3.7050105@molden.no> Message-ID: hi Anne, will try length - but tomorrow elmar Am 02.07.2010 22:25, schrieb Anne Archibald: > The count_neighbors function is designed to compare close neighbors > between two trees, not between one tree and one point or one tree and > one array of points. I think the fastest way to get what you want is > to just query for the neighbors and take the length of the resulting > list. You could also try putting your query point (or points) into a > KDtreel; if you have many query points this will be much faster. > > Anne ...................... From kwgoodman at gmail.com Fri Jul 2 16:35:31 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 2 Jul 2010 13:35:31 -0700 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: <4C2E41A3.7050105@molden.no> Message-ID: On Fri, Jul 2, 2010 at 1:25 PM, Anne Archibald wrote: > The count_neighbors function is designed to compare close neighbors > between two trees, not between one tree and one point or one tree and > one array of points. I think the fastest way to get what you want is > to just query for the neighbors and take the length of the resulting > list. You could also try putting your query point (or points) into a > KDtreel; if you have many query points this will be much faster. Oh, that sounds interesting. 
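A minimal sketch pulling together the two approaches suggested in this thread -- Keith's query_ball_point plus len(), and Anne's idea of putting the query points into a second KDTree and using count_neighbors (the array shapes and the radius below are only illustrative):

import numpy as np
from scipy.spatial import KDTree

data = np.random.uniform(-2.0, 2.0, size=(200, 2))
tree = KDTree(data)
r = 0.1

# query_ball_point returns the indices of all data points within r of the
# query point, so len() of that list is the neighbour count.
point = [1.0, 0.0]
print(len(tree.query_ball_point(point, r)))

# For many query points, build a second tree and use count_neighbors,
# which counts the pairs (query point, data point) within distance r.
queries = np.random.uniform(-2.0, 2.0, size=(50, 2))
query_tree = KDTree(queries)
print(query_tree.count_neighbors(tree, r))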
So I build one tree for the data and another tree for the query points? (For each query point I want to find the k nearest neighbors in the data tree). What's the next step? From aarchiba at physics.mcgill.ca Fri Jul 2 20:07:06 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 2 Jul 2010 20:07:06 -0400 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: <4C2E41A3.7050105@molden.no> Message-ID: On 2 July 2010 16:35, Keith Goodman wrote: > On Fri, Jul 2, 2010 at 1:25 PM, Anne Archibald > wrote: >> The count_neighbors function is designed to compare close neighbors >> between two trees, not between one tree and one point or one tree and >> one array of points. I think the fastest way to get what you want is >> to just query for the neighbors and take the length of the resulting >> list. You could also try putting your query point (or points) into a >> KDtreel; if you have many query points this will be much faster. > > Oh, that sounds interesting. So I build one tree for the data and > another tree for the query points? (For each query point I want to > find the k nearest neighbors in the data tree). What's the next step? Looking at the code, I see that much of this is not implemented, unfortunately. The compiled version (much faster and more space-efficient) supports only querying of arrays of points. The python version (more flexible) has several additional algorithms implemented; in particular it has query_ball_point, which finds all neighbours within a radius no matter how many there are, and query_ball_tree, which does the same thing but accepts a tree as the second argument and takes advantage of it to accelerate the algorithm (from something like m log(n) to something like log(m)log(n)). Because the python tree supports annotations more naturally, I also implemented a two-tree neighbour-counting algorithm. What isn't there is what I think you were hoping for: an algorithm that takes two trees and finds the k nearest neighbours in one of each point in the other. The reason it's not there is not just because I didn't bother to implement it; it's there because you don't gain the same benefits from such an algorithm - since each point always has k neighbours, you're stuck with at least m log(n) behaviour. There is some tree traversal that can sometimes be avoided, since points near to each other may have similar sets of neighbours (or not), but it would take quite a sophisticated and clever algorithm to take full advantage of it. Anne From eavventi at yahoo.it Sat Jul 3 05:32:38 2010 From: eavventi at yahoo.it (enrico avventi) Date: Sat, 3 Jul 2010 02:32:38 -0700 (PDT) Subject: [SciPy-User] R: [ANN] NLopt, a nonlinear optimization library, now with Python interface In-Reply-To: <76d97234-00ad-4516-a786-e71ee9a866f4@g19g2000yqc.googlegroups.com> Message-ID: <484818.50069.qm@web26702.mail.ukl.yahoo.com> hello, i wanted to try out your library but the configure process won't find the header arrayobject.h and thus won't compile the bindings. where am i supposed to put that header? as my distro (Archlinux) doesn't seem to install the numpy headers i tried to copy them manually. so far i tried: /usr/include/numpy/arrayobject.h /usr/include/python26/numpy/arrayobject.h adding CPPFLAGS=-I/path/to/include/numpy where the headers for python are in /usr/include/python26 thanks in advance, /Enrico --- Gio 17/6/10, Steven G. Johnson ha scritto: Da: Steven G. 
Johnson Oggetto: [SciPy-User] [ANN] NLopt, a nonlinear optimization library, now with Python interface A: scipy-user at scipy.org Data: Gioved? 17 giugno 2010, 18:11 The NLopt library, available from ? ???http://ab-initio.mit.edu/nlopt provides a common interface for a large number of algorithms for both global and local nonlinear optimizations, both with and without gradient information, and including both bound constraints and nonlinear equality/inequality constraints. NLopt is written in C, but now includes a Python interface (as well as interfaces for C++, Fortran, Matlab, Octave, and Guile). It is free software under the GNU LGPL. Regards, Steven G. Johnson _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Jul 3 10:24:50 2010 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 3 Jul 2010 09:24:50 -0500 Subject: [SciPy-User] [SciPy-Dev] SciPy docs: volunteers needed now! In-Reply-To: References: Message-ID: Joshua, In addition to the very technical writing for individual functions, we also need documentation that is accessible to newcomers. Many modules do not implement any functions themselves, but act as a grouping module (for example, scipy.io). These modules could definitely use good, up-to-date, summary narratives. Even some modules further down the stack can still benefit from good summaries. To everyone, if you do join the documentation efforts to contribute little bits of writing, it is a common courtesy to notify any others who might also be working on a particular document. The current system does not automatically notify authors of any changes, so it is hard to know if any changes have been made. General rule of thumb is to notify authors who have made changes to the doc within the last 3 months (I believe). I really hope to see you all soon in the marathon! Ben Root On Sat, Jul 3, 2010 at 3:08 AM, Joshua Holbrook wrote: > My own reasons for hesitating have more to do with knowing that any > documentation I write will likely have poor style. I tend to write in > a very informal, conversational manner. > > That said, I'll try to do my part as I use parts of scipy, since > having unprofessional documentation is probably better than having no > documentation. > > --Josh > > 2010/7/3 St?fan van der Walt : > > On 2 July 2010 14:14, Joe Harrington wrote: > >>> I wonder whether there is any other approach that we can explore to > >>> help generate more volunteer work? Do you think it is mainly the > >>> difference between scipy and numpy that explains the drop-off? Or > >>> something else? To the extent that it is the technical differences > >>> - do you think there would be any point in trying to establish > >>> something like nominated experts or want-to-find-out type experts who > >>> will offer to advise on particular parts of scipy - even if they don't > >>> themselves write the docstrings? Or anything else that might help? > >> > >> We already looked for topical experts. We have a few; David can > >> comment more. In the end what we need are rank-and-file writers, > >> people who will take something on, learn about it, and write about it. > >> Yes, SciPy is more technical, but we've all dealt with harder tasks > >> than documenting SciPy. > > > > All the posts I have seen talk about achieving higher word counts, > > covering more functions, going bigger and better. 
While that's > > certainly what we want, such requests may be intimidating to new > > contributors. > > > > My feeling is that we should identify a small handful of functions to > > focus on. That way, we may only document 10 functions a week, but at > > least those will get done. Emanuelle's suggestion to target specific > > writers also seems sensible. > > > > Regards > > St?fan > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.tollerud at gmail.com Sat Jul 3 19:14:01 2010 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Sat, 3 Jul 2010 16:14:01 -0700 Subject: [SciPy-User] Co-ordinating Python astronomy libraries? In-Reply-To: <4C2BBBA7.5060006@gemini.edu> References: <4C2BBBA7.5060006@gemini.edu> Message-ID: Dear AstroPy list and James, Thanks for your efforts to do this coordination - clearly there's quite a bit out there doing similar work. For the last couple years I (and a few others) have been working on Astropysics (http://packages.python.org/Astropysics/ and source code at https://launchpad.net/astropysics). This initially spun out of my personal need for some of the utilities these other projects discussed (e.g. IDL astron-like functions), but I've been turning it in a slightly different direction. Astropysics now is being written as a more "pythonic" way of doing the same tasks instead of starting from cloning IDL procedural techniques. I'm trying to leverage all the existing tools that have been written in python for other domains (e.g. Enthought Traits to easily make good GUIs, sphinx to make useful documentation, distribute to make packaging easier). Further, where useful, I use object-oriented techniques instead of the procedural approach people familiar with IDL and IRAF are used to. The idea is to start doing things on "Spectrum" and "CCDImage" objects instead of passing arrays into functions and so on, and each astronomer can then sub-class these classes to do whatever they want. The aim here is to eliminate the tendancy for people to take public codes and change them internally, leading to some very painful efforts in disentangling versions. So far, Astropysics includes the following modules: * constants ? physical constants and cosmological calculations * coords ? coordinate classes and coordinate conversions * models ? model and data-fitting classes and tools * objcat ? flexible, dynamically updated object catalog * ccd ? image processing tools * spec ? spectrum and SED classes and tools * phot ? photometry/flux measurement classes and tools * obstools ? miscellaneous tools for observation * io ? data input/output classes and tools * plotting ? astronomy-oriented matplotlib and mayavi plotting tools * pipeline ? classes for data reduction and analysis pipelines * utils ? utility classes and functions And these GUI tools, which require Enthought Traits (and Enthought chaco for plots in the GUI): * Spylot ? Spectrum Plotter * Fitgui ? Interactive Curve Fitting * Spectarget ? MOS Targeting I'm making a concerted effort to document everything in a consistent manner using sphinx (http://sphinx.pocoo.org) - the resulting documents (http://packages.python.org/Astropysics/) end up being much more useful. 
I also try to bind in python wrappers around external tools like sextractor, Mike Blanton's kcorrect, MOS mask design tools, and galfit (planned) ... this happens mostly as I need them for my own science. But this cuts down on the time re-implementing standard programs that aren't necessarily worth re-working. There are two main things left before I consider it "releasable" in terms of the baseline functionality: First, it needs WCS support to be integrated in the coords framework, and second, the objcat package should have a web server module that lets you show/edit the catalog over the internet instead of within python. Another major goal I'd like to do when I or someone else gets a chance is something like ATV (http://www.physics.uci.edu/~barth/atv/) utilizing the same TraitsGUI framework that the other gui tools are written in. I hope this explains the direction I have in mind for astropysics. As far as I can tell, I have a slightly different philosophy - I'm trying to set up something of a framework of my design, rather than a function library. This is why I have not been working on adding it to astropy, because astropy seems much more like the traditional library... That and I'm not a fan of the Trac project management system. At any rate, I think there's definitely room for all in the community. And if anyone likes what they see, feel free to drop me a line if you want to contribute. -- Erik Tollerud Graduate Student Center For Cosmology University of California, Irvine http://ps.uci.edu/~etolleru On Wed, Jun 30, 2010 at 2:48 PM, James Turner wrote: > Dear Python users in astronomy, > > At SciPy 2009, I arranged an astronomy BoF where we discussed the > fact that there are now a number of astronomy libraries for Python > floating around and maybe it would be good to collect more code into > a single place. People seemed receptive to this idea and weren't sure > why it hasn't already happened, given that there has been an Astrolib > page at SciPy for some years now, with an associated SVN repository: > > ? http://scipy.org/AstroLib > > After the meeting last August, I was supposed to contact the mailing > list and some library authors I had talked to previously, to discuss > this further. My apologies for taking 10 months to do that! I did > draft an email the day after the BoF, but then we ran into a hurdle > with setting up new committers to the AstroLib repository (which has > taken a lot longer than expected to resolve), so it seemed a bad > time to suggest that new people start using it. > > To discuss these issues further, we'd like to encourage everyone to > sign up for the AstroPy mailing list if you are not already on it. > The traffic is just a few messages per month. > > ? http://lists.astropy.scipy.org/mailman/listinfo/astropy > > We (the 2009 BoF group) would also like to hear on the list about > why people have decided to host their own astronomy library (eg. not > being aware of the one at SciPy). Are you interested in contributing > to Astrolib? Do you have any other comments or concerns about > co-ordinating tools? Our motivation is to make libraries easy to > find and install, allow sharing code easily, help rationalize > available functionality and fill in what's missing. A standard > astronomy library with a single set of documentation should be more > coherent and easier to maintain. The idea is not to limit authors' > flexibility of take ownership of their code -- the sub-packages > can still be maintained by different people. 
> > If you're at SciPy this week, Perry Greenfield and I would be happy > to talk to you. If you would like to add your existing library to > Astrolib, please contact Perry Greenfield or Mark Sienkiewicz at > STScI for access (contact details at http://scipy.org/AstroLib). > Note that the repository is being moved to a new server this week, > after which the URLs will be updated at scipy.org. > > Thanks! > > James Turner (Gemini). > > Bcc: various library authors > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cohen at lpta.in2p3.fr Sun Jul 4 03:30:14 2010 From: cohen at lpta.in2p3.fr (Johann Cohen-Tanugi) Date: Sun, 04 Jul 2010 09:30:14 +0200 Subject: [SciPy-User] [AstroPy] Co-ordinating Python astronomy libraries? In-Reply-To: References: <4C2BBBA7.5060006@gemini.edu> Message-ID: <4C303886.6040206@lpta.in2p3.fr> Hi Erik, this is completely independent from the STS software, that shares some modules' name with you : https://www.stsci.edu/trac/ssb/astrolib ? Johann On 07/04/2010 01:14 AM, Erik Tollerud wrote: > Dear AstroPy list and James, > > Thanks for your efforts to do this coordination - clearly there's > quite a bit out there doing similar work. > > For the last couple years I (and a few others) have been working on > Astropysics (http://packages.python.org/Astropysics/ and source code > at https://launchpad.net/astropysics). This initially spun out of my > personal need for some of the utilities these other projects discussed > (e.g. IDL astron-like functions), but I've been turning it in a > slightly different direction. Astropysics now is being written as a > more "pythonic" way of doing the same tasks instead of starting from > cloning IDL procedural techniques. I'm trying to leverage all the > existing tools that have been written in python for other domains > (e.g. Enthought Traits to easily make good GUIs, sphinx to make useful > documentation, distribute to make packaging easier). Further, where > useful, I use object-oriented techniques instead of the procedural > approach people familiar with IDL and IRAF are used to. The idea is > to start doing things on "Spectrum" and "CCDImage" objects instead of > passing arrays into functions and so on, and each astronomer can then > sub-class these classes to do whatever they want. The aim here is to > eliminate the tendancy for people to take public codes and change them > internally, leading to some very painful efforts in disentangling > versions. > > So far, Astropysics includes the following modules: > * constants ? physical constants and cosmological calculations > * coords ? coordinate classes and coordinate conversions > * models ? model and data-fitting classes and tools > * objcat ? flexible, dynamically updated object catalog > * ccd ? image processing tools > * spec ? spectrum and SED classes and tools > * phot ? photometry/flux measurement classes and tools > * obstools ? miscellaneous tools for observation > * io ? data input/output classes and tools > * plotting ? astronomy-oriented matplotlib and mayavi plotting tools > * pipeline ? classes for data reduction and analysis pipelines > * utils ? utility classes and functions > > And these GUI tools, which require Enthought Traits (and Enthought > chaco for plots in the GUI): > * Spylot ? Spectrum Plotter > * Fitgui ? Interactive Curve Fitting > * Spectarget ? 
MOS Targeting > > I'm making a concerted effort to document everything in a consistent > manner using sphinx (http://sphinx.pocoo.org) - the resulting > documents (http://packages.python.org/Astropysics/) end up being much > more useful. > > I also try to bind in python wrappers around external tools like > sextractor, Mike Blanton's kcorrect, MOS mask design tools, and > galfit (planned) ... this happens mostly as I need them for my own > science. But this cuts down on the time re-implementing standard > programs that aren't necessarily worth re-working. > > There are two main things left before I consider it "releasable" in > terms of the baseline functionality: First, it needs WCS support to be > integrated in the coords framework, and second, the objcat package > should have a web server module that lets you show/edit the catalog > over the internet instead of within python. Another major goal I'd > like to do when I or someone else gets a chance is something like ATV > (http://www.physics.uci.edu/~barth/atv/) utilizing the same TraitsGUI > framework that the other gui tools are written in. > > I hope this explains the direction I have in mind for astropysics. As > far as I can tell, I have a slightly different philosophy - I'm trying > to set up something of a framework of my design, rather than a > function library. This is why I have not been working on adding it to > astropy, because astropy seems much more like the traditional > library... That and I'm not a fan of the Trac project management > system. At any rate, I think there's definitely room for all in the > community. And if anyone likes what they see, feel free to drop me a > line if you want to contribute. > > From elmar at net4werling.de Sun Jul 4 13:59:27 2010 From: elmar at net4werling.de (elmar) Date: Sun, 04 Jul 2010 19:59:27 +0200 Subject: [SciPy-User] KDTree count_neighbors In-Reply-To: References: <4C2E41A3.7050105@molden.no> Message-ID: Am 02.07.2010 22:35, schrieb Keith Goodman: ... > Oh, that sounds interesting. So I build one tree for the data and > another tree for the query points? (For each query point I want to > find the k nearest neighbors in the data tree). What's the next step? hi all, thanks for help. Scipts is now working, but still needs some improvement. cheers elmar from numpy import array, arange, zeros from numpy.random import uniform from scipy.spatial import KDTree from matplotlib.pylab import scatter, contourf # just to get a pattern x = uniform(0, 360, 100) y = uniform(0, 90, 100) pattern = zip(x.ravel(), y.ravel()) tree_pattern = KDTree(pattern) # background grid or reference points neighbors_grid = zeros((360, 90)) # get number of neighbors within radius of r r = 20 for i in range(360): for j in range(90): ref=array([[float(i), float(j)]]) tree_ref=KDTree(ref) neighbors_grid[i][j] = KDTree.count_neighbors(tree_ref,tree_pattern, r) contourf(neighbors_grid) scatter(y,x) From ralf.gommers at googlemail.com Sun Jul 4 19:34:01 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 5 Jul 2010 07:34:01 +0800 Subject: [SciPy-User] ANN: scipy 0.8.0 release candidate 1 Message-ID: I'm pleased to announce the availability of the first release candidate of SciPy 0.8.0. Please try it out and report any problems on the scipy-dev mailing list. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. 
This release candidate release comes almost one and a half year after the 0.7.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.8.0rc1 requires Python 2.4-2.6 and NumPy 1.4.1 or greater. For more information, please see the release notes: http://sourceforge.net/projects/scipy/files/scipy/0.8.0rc1/NOTES.txt/view You can download the release from here: https://sourceforge.net/projects/scipy/ Python 2.5/2.6 binaries for Windows and OS X are available, as well as source tarballs for other platforms and the documentation in pdf form. Thank you to everybody who contributed to this release. Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Mon Jul 5 06:40:58 2010 From: robince at gmail.com (Robin) Date: Mon, 5 Jul 2010 11:40:58 +0100 Subject: [SciPy-User] many test failures on windows 64 Message-ID: Hi, I am using Python.org amd64 python build on windows 7 64 bit. I am using numpy and scipy builds from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/ I get many errors in scipy test (none for numpy). Particularly in scipy.sparse.linalg which I need to use (and in my code it appears spsolve is giving incorrect results). Is there a better 64 bit windows build to use? >>> scipy.test() Running unit tests for scipy NumPy version 1.4.1 NumPy is installed in C:\Python26\lib\site-packages\numpy SciPy version 0.8.0b1 SciPy is installed in C:\Python26\lib\site-packages\scipy Python version 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)] nose version 0.11.3 ........................................................................................................................ ............................E....................................F...................................................... ..............................C:\Python26\lib\site-packages\scipy\interpolate\fitpack2.py:639: UserWarning: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=7). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) .....C:\Python26\lib\site-packages\scipy\interpolate\fitpack2.py:580: UserWarning: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots. warnings.warn(message) ...........................................K..K......................................................................... ........................................................................................................................ ........................................................................................................................ EC:\Python26\lib\site-packages\numpy\lib\utils.py:140: DeprecationWarning: `write_array` is deprecated! This function is replaced by numpy.savetxt which allows the same functionality through a different syntax. warnings.warn(depdoc, DeprecationWarning) C:\Python26\lib\site-packages\numpy\lib\utils.py:140: DeprecationWarning: `read_array` is deprecated! The functionality of read_array is in numpy.loadtxt which allows the same functionality using different syntax. 
warnings.warn(depdoc, DeprecationWarning) ...........................................Exception AttributeError: "'netcdf_file' object has no attribute 'mode'" in < bound method netcdf_file.close of > ignored ............C:\Python26\lib\site-packages\numpy\lib\utils.py:140: DeprecationWarning: `npfile` is deprecated! You can achieve the same effect as using npfile using numpy.save and numpy.load. You can use memory-mapped arrays and data-types to map out a file format for direct manipulation in NumPy. warnings.warn(depdoc, DeprecationWarning) .........C:\Python26\lib\site-packages\scipy\io\wavfile.py:30: WavFileWarning: Unfamiliar format bytes warnings.warn("Unfamiliar format bytes", WavFileWarning) C:\Python26\lib\site-packages\scipy\io\wavfile.py:120: WavFileWarning: chunk not understood warnings.warn("chunk not understood", WavFileWarning) ........................................................................................................................ .......................................................................................................SSSSSS......SSSSS S......SSSS...............................................................S............................................. ........................................................................................................................ ..............................................E......................................................................... ........................................................................................................................ ....SSS.........S....................................................................................................... .............................................................F.......................................................... ........................................................................................................................ .....................................................FFF.....................................................C:\Python26 \lib\site-packages\scipy\signal\filter_design.py:247: BadCoefficients: Badly conditioned filter coefficients (numerator) : the results may be meaningless "results may be meaningless", BadCoefficients) ........................................................................................................................ ................................................................................................E....................... ..........................SSSSSSSSSSS.FE.EE.EE......K.........E.E...................................E................... ....................................................................K..................E................................ ............K..................E.................................................E...................................... ..............................................KK........................E............................................... ........................................................................................................................ ..................................................................................................................F..... ...............................................................................K.K...................................... ........................................................................................................................ 
........................................................................................................................ ..........................F...F..............................................................K........K.........SSSSS... ........................................................................................................................ ........................................................................................................................ ........................................................................................................................ ........................................................................................................................ .............................S.......................................................................................... ...........................................................................................C:\Python26\lib\site-packages \scipy\stats\morestats.py:702: UserWarning: Ties preclude use of exact statistic. warnings.warn("Ties preclude use of exact statistic.") ........................................................................................................................ ........................................................................................................................ ....................................................error removing c:\users\robince\appdata\local\temp\tmpr3s_aecat_test : c:\users\robince\appdata\local\temp\tmpr3s_aecat_test: The directory is not empty .................................................................................................. ====================================================================== ERROR: Testing that kmeans2 init methods work. ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\cluster\tests\test_vq.py", line 166, in test_kmeans2_init kmeans2(data, 3, minit = 'points') File "C:\Python26\lib\site-packages\scipy\cluster\vq.py", line 671, in kmeans2 clusters = init(data, k) File "C:\Python26\lib\site-packages\scipy\cluster\vq.py", line 523, in _kpoints p = np.random.permutation(n) File "mtrand.pyx", line 4231, in mtrand.RandomState.permutation (build\scons\numpy\random\mtrand\mtrand.c:18669) File "mtrand.pyx", line 4174, in mtrand.RandomState.shuffle (build\scons\numpy\random\mtrand\mtrand.c:18261) TypeError: len() of unsized object ====================================================================== ERROR: test_basic (test_array_import.TestNumpyio) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\io\tests\test_array_import.py", line 29, in test_basic b = numpyio.fread(fid,1000000,N.Int16,N.Int) MemoryError ====================================================================== ERROR: test_decomp.test_lapack_misaligned(, (array([[ 1.734e-255, 8.189e-217, 4.025e-178, 1.903e-139, 9.344e-101, ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\linalg\tests\test_decomp.py", line 1074, in check_lapack_misaligned func(*a,**kwargs) File "C:\Python26\lib\site-packages\scipy\linalg\basic.py", line 49, in solve a1, b1 = map(asarray_chkfinite,(a,b)) File 
"C:\Python26\lib\site-packages\numpy\lib\function_base.py", line 586, in asarray_chkfinite raise ValueError, "array must not contain infs or NaNs" ValueError: array must not contain infs or NaNs ====================================================================== ERROR: Regression test for #880: empty array for zi crashes. ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\signal\tests\test_signaltools.py", line 422, in test_empty_zi y, zf = lfilter(b, a, x, zi=zi) File "C:\Python26\lib\site-packages\scipy\signal\signaltools.py", line 610, in lfilter return sigtools._linear_filter(b, a, x, axis, zi) TypeError: array cannot be safely cast to required type ====================================================================== ERROR: test_linsolve.TestSplu.test_lu_refcount ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", line 122, in test_lu_refcount lu = splu(a_) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 173, in splu ilu=False, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: test_linsolve.TestSplu.test_spilu_smoketest ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", line 60, in test_spilu_smokete st lu = spilu(self.A, drop_tol=1e-2, fill_factor=5) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 245, in spilu ilu=True, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: test_linsolve.TestSplu.test_splu_basic ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", line 87, in test_splu_basic lu = splu(a_) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 173, in splu ilu=False, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: test_linsolve.TestSplu.test_splu_perm ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", line 100, in test_splu_perm lu = splu(a_) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 173, in splu ilu=False, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: test_linsolve.TestSplu.test_splu_smoketest ---------------------------------------------------------------------- Traceback (most 
recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", line 53, in test_splu_smoketes t lu = splu(self.A) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 173, in splu ilu=False, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: Check that QMR works with left and right preconditioners ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\linalg\isolve\tests\test_iterative.py", line 161, in test_leftright_p recond L_solver = splu(L) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 173, in splu ilu=False, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: test_preconditioner (test_lgmres.TestLGMRES) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\linalg\isolve\tests\test_lgmres.py", line 38, in test_preconditioner pc = splu(Am.tocsc()) File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 173, in splu ilu=False, options=_options) RuntimeError: Factor is exactly singular ====================================================================== ERROR: test_mu (test_base.TestBSR) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", line 966, in test_mu D1 = A * B.T File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", line 319, in __mul__ return N.dot(self, asmatrix(other)) TypeError: array cannot be safely cast to required type ====================================================================== ERROR: test_mu (test_base.TestCSC) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", line 966, in test_mu D1 = A * B.T File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", line 319, in __mul__ return N.dot(self, asmatrix(other)) TypeError: array cannot be safely cast to required type ====================================================================== ERROR: test_mu (test_base.TestCSR) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", line 966, in test_mu D1 = A * B.T File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", line 319, in __mul__ return N.dot(self, asmatrix(other)) TypeError: array cannot be safely cast to required type ====================================================================== ERROR: test_mu (test_base.TestDIA) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", line 966, in test_mu D1 = A * B.T File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", line 319, in __mul__ return N.dot(self, asmatrix(other)) TypeError: array cannot be safely cast to required type 
====================================================================== ERROR: test_mu (test_base.TestLIL) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", line 966, in test_mu D1 = A * B.T File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", line 319, in __mul__ return N.dot(self, asmatrix(other)) TypeError: array cannot be safely cast to required type ====================================================================== FAIL: test_complex (test_basic.TestLongDoubleFailure) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\fftpack\tests\test_basic.py", line 527, in test_complex np.longcomplex) AssertionError: Type not supported but does not fail ====================================================================== FAIL: extrema 3 ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\ndimage\tests\test_ndimage.py", line 3149, in test_extrema03 self.failUnless(numpy.all(output1[2] == output4)) AssertionError ====================================================================== FAIL: test_lorentz (test_odr.TestODR) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\odr\tests\test_odr.py", line 292, in test_lorentz 3.7798193600109009e+00]), File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line 765, in assert_array_almost_equal header='Arrays are not almost equal') File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line 609, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal (mismatch 100.0%) x: array([ 1.00000000e+03, 1.00000000e-01, 3.80000000e+00]) y: array([ 1.43067808e+03, 1.33905090e-01, 3.77981936e+00]) ====================================================================== FAIL: test_multi (test_odr.TestODR) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\odr\tests\test_odr.py", line 188, in test_multi 0.5101147161764654, 0.5173902330489161]), File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line 765, in assert_array_almost_equal header='Arrays are not almost equal') File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line 609, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal (mismatch 100.0%) x: array([ 4. , 2. , 7. 
, 0.4, 0.5]) y: array([ 4.37998803, 2.43330576, 8.00288459, 0.51011472, 0.51739023]) ====================================================================== FAIL: test_pearson (test_odr.TestODR) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\odr\tests\test_odr.py", line 235, in test_pearson np.array([ 5.4767400299231674, -0.4796082367610305]), File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line 765, in assert_array_almost_equal header='Arrays are not almost equal') File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line 609, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal (mismatch 100.0%) x: array([ 1., 1.]) y: array([ 5.47674003, -0.47960824]) ====================================================================== FAIL: test_twodiags (test_linsolve.TestLinsolve) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", line 39, in test_twodiags assert( norm(b - Asp*x) < 10 * cond_A * eps ) AssertionError ====================================================================== FAIL: test_kdtree.test_vectorization.test_single_query ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\spatial\tests\test_kdtree.py", line 154, in test_single_query assert isinstance(i,int) AssertionError ====================================================================== FAIL: test_data.test_boost(,) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\special\tests\test_data.py", line 205, in _test_factory test.check(dtype=dtype) File "C:\Python26\lib\site-packages\scipy\special\tests\testutils.py", line 187, in check assert False, "\n".join(msg) AssertionError: Max |adiff|: 1.77636e-15 Max |rdiff|: 1.09352e-13 Bad results for the following points (in output 0): 1.0000014305114746 => 0.0016914556651294794 != 0.0016914556651292944 (rdiff 1.093 5249113484058e-13) 1.000007152557373 => 0.0037822080446614169 != 0.0037822080446612951 (rdiff 3.222 0418006721235e-14) 1.0000138282775879 => 0.005258943946801071 != 0.0052589439468011014 (rdiff 5.772 5773723182603e-15) 1.0000171661376953 => 0.0058593666181291238 != 0.0058593666181292027 (rdiff 1.347 0725302071254e-14) 1.0000600814819336 => 0.01096183199218881 != 0.010961831992188852 (rdiff 3.798 0296955025714e-15) 1.0001168251037598 => 0.015285472131830317 != 0.015285472131830425 (rdiff 7.036 2795851489781e-15) 1.0001487731933594 => 0.017249319093529933 != 0.017249319093529877 (rdiff 3.218 1647826365358e-15) 1.0003981590270996 => 0.028218171738655599 != 0.028218171738655373 (rdiff 7.991 8023735059643e-15) 1.000638484954834 => 0.035732814682314498 != 0.035732814682314568 (rdiff 1.941 8828227213605e-15) 1.0010714530944824 => 0.046287402472878984 != 0.046287402472878776 (rdiff 4.497 2672043800306e-15) 1.0049939155578613 => 0.099897593086028066 != 0.099897593086027803 (rdiff 2.639 4826962588157e-15) 1.024169921875 => 0.21942279004958387 != 0.21942279004958354 
(rdiff 1.517 9230348510424e-15) ====================================================================== FAIL: test_data.test_boost(,) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", line 186, in runTest self.test(*self.arg) File "C:\Python26\lib\site-packages\scipy\special\tests\test_data.py", line 205, in _test_factory test.check(dtype=dtype) File "C:\Python26\lib\site-packages\scipy\special\tests\testutils.py", line 187, in check assert False, "\n".join(msg) AssertionError: Max |adiff|: 6.39488e-12 Max |rdiff|: 1.01982e-12 Bad results for the following points (in output 0): -0.99999284744262695 => -6.2705920974721474 != -6.2705920974657525 (rdiff 1.019 8214973073088e-12) -0.99998283386230469 => -5.832855225376532 != -5.832855225378502 (rdiff 3.377 3849320373679e-13) ---------------------------------------------------------------------- Ran 4410 tests in 29.842s FAILED (KNOWNFAIL=11, SKIP=38, errors=16, failures=9) >>> From bsouthey at gmail.com Mon Jul 5 22:36:52 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 5 Jul 2010 21:36:52 -0500 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: Message-ID: On Mon, Jul 5, 2010 at 5:40 AM, Robin wrote: > Hi, > > I am using Python.org amd64 python build on windows 7 64 bit. > > I am using numpy and scipy builds from here: > http://www.lfd.uci.edu/~gohlke/pythonlibs/ > > I get many errors in scipy test (none for numpy). Particularly in > scipy.sparse.linalg which I need to use (and in my code it appears > spsolve is giving incorrect results). > > Is there a better 64 bit windows build to use? > >>>> scipy.test() > Running unit tests for scipy > NumPy version 1.4.1 > NumPy is installed in C:\Python26\lib\site-packages\numpy > SciPy version 0.8.0b1 > SciPy is installed in C:\Python26\lib\site-packages\scipy > Python version 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 > 64 bit (AMD64)] > nose version 0.11.3 > ........................................................................................................................ > ............................E....................................F...................................................... > ..............................C:\Python26\lib\site-packages\scipy\interpolate\fitpack2.py:639: > UserWarning: > The coefficients of the spline returned have been computed as the > minimal norm least-squares solution of a (numerically) rank deficient > system (deficiency=7). If deficiency is large, the results may be > inaccurate. Deficiency may strongly depend on the value of eps. > ?warnings.warn(message) > .....C:\Python26\lib\site-packages\scipy\interpolate\fitpack2.py:580: > UserWarning: > The required storage space exceeds the available storage space: nxest > or nyest too small, or s too small. > The weighted least-squares spline corresponds to the current set of > knots. > ?warnings.warn(message) > ...........................................K..K......................................................................... > ........................................................................................................................ > ........................................................................................................................ > EC:\Python26\lib\site-packages\numpy\lib\utils.py:140: > DeprecationWarning: `write_array` is deprecated! 
> > This function is replaced by numpy.savetxt which allows the same functionality > through a different syntax. > > ?warnings.warn(depdoc, DeprecationWarning) > C:\Python26\lib\site-packages\numpy\lib\utils.py:140: > DeprecationWarning: `read_array` is deprecated! > > The functionality of read_array is in numpy.loadtxt which allows the same > functionality using different syntax. > > ?warnings.warn(depdoc, DeprecationWarning) > ...........................................Exception AttributeError: > "'netcdf_file' object has no attribute 'mode'" in < > bound method netcdf_file.close of at 0x000000000C64D6D8>> ignored > ............C:\Python26\lib\site-packages\numpy\lib\utils.py:140: > DeprecationWarning: `npfile` is deprecated! > > You can achieve the same effect as using npfile using numpy.save and > numpy.load. > > You can use memory-mapped arrays and data-types to map out a > file format for direct manipulation in NumPy. > > ?warnings.warn(depdoc, DeprecationWarning) > .........C:\Python26\lib\site-packages\scipy\io\wavfile.py:30: > WavFileWarning: Unfamiliar format bytes > ?warnings.warn("Unfamiliar format bytes", WavFileWarning) > C:\Python26\lib\site-packages\scipy\io\wavfile.py:120: WavFileWarning: > chunk not understood > ?warnings.warn("chunk not understood", WavFileWarning) > ........................................................................................................................ > .......................................................................................................SSSSSS......SSSSS > S......SSSS...............................................................S............................................. > ........................................................................................................................ > ..............................................E......................................................................... > ........................................................................................................................ > ....SSS.........S....................................................................................................... > .............................................................F.......................................................... > ........................................................................................................................ > .....................................................FFF.....................................................C:\Python26 > \lib\site-packages\scipy\signal\filter_design.py:247: BadCoefficients: > Badly conditioned filter coefficients (numerator) > : the results may be meaningless > ?"results may be meaningless", BadCoefficients) > ........................................................................................................................ > ................................................................................................E....................... > ..........................SSSSSSSSSSS.FE.EE.EE......K.........E.E...................................E................... > ....................................................................K..................E................................ > ............K..................E.................................................E...................................... > ..............................................KK........................E............................................... 
> ........................................................................................................................ > ..................................................................................................................F..... > ...............................................................................K.K...................................... > ........................................................................................................................ > ........................................................................................................................ > ..........................F...F..............................................................K........K.........SSSSS... > ........................................................................................................................ > ........................................................................................................................ > ........................................................................................................................ > ........................................................................................................................ > .............................S.......................................................................................... > ...........................................................................................C:\Python26\lib\site-packages > \scipy\stats\morestats.py:702: UserWarning: Ties preclude use of exact > statistic. > ?warnings.warn("Ties preclude use of exact statistic.") > ........................................................................................................................ > ........................................................................................................................ > ....................................................error removing > c:\users\robince\appdata\local\temp\tmpr3s_aecat_test > : c:\users\robince\appdata\local\temp\tmpr3s_aecat_test: The directory > is not empty > .................................................................................................. > ====================================================================== > ERROR: Testing that kmeans2 init methods work. > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\cluster\tests\test_vq.py", > line 166, in test_kmeans2_init > ? ?kmeans2(data, 3, minit = 'points') > ?File "C:\Python26\lib\site-packages\scipy\cluster\vq.py", line 671, in kmeans2 > ? ?clusters = init(data, k) > ?File "C:\Python26\lib\site-packages\scipy\cluster\vq.py", line 523, > in _kpoints > ? ?p = np.random.permutation(n) > ?File "mtrand.pyx", line 4231, in mtrand.RandomState.permutation > (build\scons\numpy\random\mtrand\mtrand.c:18669) > ?File "mtrand.pyx", line 4174, in mtrand.RandomState.shuffle > (build\scons\numpy\random\mtrand\mtrand.c:18261) > TypeError: len() of unsized object > > ====================================================================== > ERROR: test_basic (test_array_import.TestNumpyio) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\io\tests\test_array_import.py", > line 29, in test_basic > ? 
?b = numpyio.fread(fid,1000000,N.Int16,N.Int) > MemoryError > > ====================================================================== > ERROR: test_decomp.test_lapack_misaligned( 0x0000000006366438>, (array([[ ?1.734e-255, ? 8.189e-217, > ?4.025e-178, ? 1.903e-139, ? 9.344e-101, > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\linalg\tests\test_decomp.py", > line 1074, in check_lapack_misaligned > ? ?func(*a,**kwargs) > ?File "C:\Python26\lib\site-packages\scipy\linalg\basic.py", line 49, in solve > ? ?a1, b1 = map(asarray_chkfinite,(a,b)) > ?File "C:\Python26\lib\site-packages\numpy\lib\function_base.py", > line 586, in asarray_chkfinite > ? ?raise ValueError, "array must not contain infs or NaNs" > ValueError: array must not contain infs or NaNs > > ====================================================================== > ERROR: Regression test for #880: empty array for zi crashes. > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\signal\tests\test_signaltools.py", > line 422, in test_empty_zi > ? ?y, zf = lfilter(b, a, x, zi=zi) > ?File "C:\Python26\lib\site-packages\scipy\signal\signaltools.py", > line 610, in lfilter > ? ?return sigtools._linear_filter(b, a, x, axis, zi) > TypeError: array cannot be safely cast to required type > > ====================================================================== > ERROR: test_linsolve.TestSplu.test_lu_refcount > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", > line 122, in test_lu_refcount > ? ?lu = splu(a_) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 173, in splu > ? ?ilu=False, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: test_linsolve.TestSplu.test_spilu_smoketest > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", > line 60, in test_spilu_smokete > st > ? ?lu = spilu(self.A, drop_tol=1e-2, fill_factor=5) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 245, in spilu > ? ?ilu=True, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: test_linsolve.TestSplu.test_splu_basic > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", > line 87, in test_splu_basic > ? 
?lu = splu(a_) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 173, in splu > ? ?ilu=False, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: test_linsolve.TestSplu.test_splu_perm > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", > line 100, in test_splu_perm > ? ?lu = splu(a_) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 173, in splu > ? ?ilu=False, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: test_linsolve.TestSplu.test_splu_smoketest > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", > line 53, in test_splu_smoketes > t > ? ?lu = splu(self.A) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 173, in splu > ? ?ilu=False, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: Check that QMR works with left and right preconditioners > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\isolve\tests\test_iterative.py", > line 161, in test_leftright_p > recond > ? ?L_solver = splu(L) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 173, in splu > ? ?ilu=False, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: test_preconditioner (test_lgmres.TestLGMRES) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\isolve\tests\test_lgmres.py", > line 38, in test_preconditioner > ? ?pc = splu(Am.tocsc()) > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", > line 173, in splu > ? ?ilu=False, options=_options) > RuntimeError: Factor is exactly singular > > ====================================================================== > ERROR: test_mu (test_base.TestBSR) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", > line 966, in test_mu > ? ?D1 = A * B.T > ?File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", > line 319, in __mul__ > ? 
?return N.dot(self, asmatrix(other)) > TypeError: array cannot be safely cast to required type > > ====================================================================== > ERROR: test_mu (test_base.TestCSC) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", > line 966, in test_mu > ? ?D1 = A * B.T > ?File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", > line 319, in __mul__ > ? ?return N.dot(self, asmatrix(other)) > TypeError: array cannot be safely cast to required type > > ====================================================================== > ERROR: test_mu (test_base.TestCSR) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", > line 966, in test_mu > ? ?D1 = A * B.T > ?File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", > line 319, in __mul__ > ? ?return N.dot(self, asmatrix(other)) > TypeError: array cannot be safely cast to required type > > ====================================================================== > ERROR: test_mu (test_base.TestDIA) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", > line 966, in test_mu > ? ?D1 = A * B.T > ?File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", > line 319, in __mul__ > ? ?return N.dot(self, asmatrix(other)) > TypeError: array cannot be safely cast to required type > > ====================================================================== > ERROR: test_mu (test_base.TestLIL) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\tests\test_base.py", > line 966, in test_mu > ? ?D1 = A * B.T > ?File "C:\Python26\lib\site-packages\numpy\matrixlib\defmatrix.py", > line 319, in __mul__ > ? ?return N.dot(self, asmatrix(other)) > TypeError: array cannot be safely cast to required type > > ====================================================================== > FAIL: test_complex (test_basic.TestLongDoubleFailure) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\fftpack\tests\test_basic.py", > line 527, in test_complex > ? ?np.longcomplex) > AssertionError: Type not supported but does not fail > > ====================================================================== > FAIL: extrema 3 > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\ndimage\tests\test_ndimage.py", > line 3149, in test_extrema03 > ? ?self.failUnless(numpy.all(output1[2] ?== output4)) > AssertionError > > ====================================================================== > FAIL: test_lorentz (test_odr.TestODR) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\odr\tests\test_odr.py", > line 292, in test_lorentz > ? ?3.7798193600109009e+00]), > ?File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line > 765, in assert_array_almost_equal > ? 
?header='Arrays are not almost equal') > ?File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line > 609, in assert_array_compare > ? ?raise AssertionError(msg) > AssertionError: > Arrays are not almost equal > > (mismatch 100.0%) > ?x: array([ ?1.00000000e+03, ? 1.00000000e-01, ? 3.80000000e+00]) > ?y: array([ ?1.43067808e+03, ? 1.33905090e-01, ? 3.77981936e+00]) > > ====================================================================== > FAIL: test_multi (test_odr.TestODR) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\odr\tests\test_odr.py", > line 188, in test_multi > ? ?0.5101147161764654, ?0.5173902330489161]), > ?File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line > 765, in assert_array_almost_equal > ? ?header='Arrays are not almost equal') > ?File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line > 609, in assert_array_compare > ? ?raise AssertionError(msg) > AssertionError: > Arrays are not almost equal > > (mismatch 100.0%) > ?x: array([ 4. , ?2. , ?7. , ?0.4, ?0.5]) > ?y: array([ 4.37998803, ?2.43330576, ?8.00288459, ?0.51011472, ?0.51739023]) > > ====================================================================== > FAIL: test_pearson (test_odr.TestODR) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\odr\tests\test_odr.py", > line 235, in test_pearson > ? ?np.array([ 5.4767400299231674, -0.4796082367610305]), > ?File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line > 765, in assert_array_almost_equal > ? ?header='Arrays are not almost equal') > ?File "C:\Python26\lib\site-packages\numpy\testing\utils.py", line > 609, in assert_array_compare > ? ?raise AssertionError(msg) > AssertionError: > Arrays are not almost equal > > (mismatch 100.0%) > ?x: array([ 1., ?1.]) > ?y: array([ 5.47674003, -0.47960824]) > > ====================================================================== > FAIL: test_twodiags (test_linsolve.TestLinsolve) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\scipy\sparse\linalg\dsolve\tests\test_linsolve.py", > line 39, in test_twodiags > ? ?assert( norm(b - Asp*x) < 10 * cond_A * eps ) > AssertionError > > ====================================================================== > FAIL: test_kdtree.test_vectorization.test_single_query > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\spatial\tests\test_kdtree.py", > line 154, in test_single_query > ? ?assert isinstance(i,int) > AssertionError > > ====================================================================== > FAIL: test_data.test_boost(,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\special\tests\test_data.py", > line 205, in _test_factory > ? ?test.check(dtype=dtype) > ?File "C:\Python26\lib\site-packages\scipy\special\tests\testutils.py", > line 187, in check > ? 
?assert False, "\n".join(msg) > AssertionError: > Max |adiff|: 1.77636e-15 > Max |rdiff|: 1.09352e-13 > Bad results for the following points (in output 0): > ? ? ? ? ? ?1.0000014305114746 => ? ? ? ? ?0.0016914556651294794 != > ? ? ?0.0016914556651292944 ?(rdiff ? ? ? ? 1.093 > 5249113484058e-13) > ? ? ? ? ? ? 1.000007152557373 => ? ? ? ? ?0.0037822080446614169 != > ? ? ?0.0037822080446612951 ?(rdiff ? ? ? ? 3.222 > 0418006721235e-14) > ? ? ? ? ? ?1.0000138282775879 => ? ? ? ? ? 0.005258943946801071 != > ? ? ?0.0052589439468011014 ?(rdiff ? ? ? ? 5.772 > 5773723182603e-15) > ? ? ? ? ? ?1.0000171661376953 => ? ? ? ? ?0.0058593666181291238 != > ? ? ?0.0058593666181292027 ?(rdiff ? ? ? ? 1.347 > 0725302071254e-14) > ? ? ? ? ? ?1.0000600814819336 => ? ? ? ? ? ?0.01096183199218881 != > ? ? ? 0.010961831992188852 ?(rdiff ? ? ? ? 3.798 > 0296955025714e-15) > ? ? ? ? ? ?1.0001168251037598 => ? ? ? ? ? 0.015285472131830317 != > ? ? ? 0.015285472131830425 ?(rdiff ? ? ? ? 7.036 > 2795851489781e-15) > ? ? ? ? ? ?1.0001487731933594 => ? ? ? ? ? 0.017249319093529933 != > ? ? ? 0.017249319093529877 ?(rdiff ? ? ? ? 3.218 > 1647826365358e-15) > ? ? ? ? ? ?1.0003981590270996 => ? ? ? ? ? 0.028218171738655599 != > ? ? ? 0.028218171738655373 ?(rdiff ? ? ? ? 7.991 > 8023735059643e-15) > ? ? ? ? ? ? 1.000638484954834 => ? ? ? ? ? 0.035732814682314498 != > ? ? ? 0.035732814682314568 ?(rdiff ? ? ? ? 1.941 > 8828227213605e-15) > ? ? ? ? ? ?1.0010714530944824 => ? ? ? ? ? 0.046287402472878984 != > ? ? ? 0.046287402472878776 ?(rdiff ? ? ? ? 4.497 > 2672043800306e-15) > ? ? ? ? ? ?1.0049939155578613 => ? ? ? ? ? 0.099897593086028066 != > ? ? ? 0.099897593086027803 ?(rdiff ? ? ? ? 2.639 > 4826962588157e-15) > ? ? ? ? ? ? ? ?1.024169921875 => ? ? ? ? ? ?0.21942279004958387 != > ? ? ? ?0.21942279004958354 ?(rdiff ? ? ? ? 1.517 > 9230348510424e-15) > > ====================================================================== > FAIL: test_data.test_boost(,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg\nose\case.py", > line 186, in runTest > ? ?self.test(*self.arg) > ?File "C:\Python26\lib\site-packages\scipy\special\tests\test_data.py", > line 205, in _test_factory > ? ?test.check(dtype=dtype) > ?File "C:\Python26\lib\site-packages\scipy\special\tests\testutils.py", > line 187, in check > ? ?assert False, "\n".join(msg) > AssertionError: > Max |adiff|: 6.39488e-12 > Max |rdiff|: 1.01982e-12 > Bad results for the following points (in output 0): > ? ? ? ? ?-0.99999284744262695 => ? ? ? ? ? ?-6.2705920974721474 != > ? ? ? ?-6.2705920974657525 ?(rdiff ? ? ? ? 1.019 > 8214973073088e-12) > ? ? ? ? ?-0.99998283386230469 => ? ? ? ? ? ? -5.832855225376532 != > ? ? ? ? -5.832855225378502 ?(rdiff ? ? ? ? 3.377 > 3849320373679e-13) > > ---------------------------------------------------------------------- > Ran 4410 tests in 29.842s > > FAILED (KNOWNFAIL=11, SKIP=38, errors=16, failures=9) > >>>> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Under 32-bit Python and the scipy 0.8 rc1 under Windows 7 64bit, I only get the test_boost error the directory removal error (from this test: "test_create_catalog (test_catalog.TestGetCatalog) ..."). Some of the errors could be due to Window's lack of support for 64-bit like the "test_complex (test_basic.TestLongDoubleFailure)". 
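For example (a quick sketch, not something from the test suite): on an MSVC-built numpy the long double type is normally just an alias for the ordinary 64-bit double, which you can check directly and which is part of why that test behaves oddly on Windows:

import numpy as np

# On MSVC builds, C long double is the same as double, so numpy's
# longdouble should look identical to float64 here; with gcc on Linux
# it is usually 80-bit extended precision instead.
print np.finfo(np.longdouble).eps, np.finfo(np.float64).eps
print np.dtype(np.longdouble).itemsize   # expect 8 on MSVC, 12 or 16 with gcc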
However, you probably would have to build your own find out those if no one else has them. Given all the issues with 64-bit windows, do you really need 64-bit numpy/scipy? Bruce >>> scipy.test() Running unit tests for scipy NumPy version 1.4.1 NumPy is installed in E:\Python26\lib\site-packages\numpy SciPy version 0.8.0rc1 SciPy is installed in E:\Python26\lib\site-packages\scipy Python version 2.6.3 (r263rc1:75186, Oct 2 2009, 20:40:30) [MSC v.1500 32 bit (Intel)] nose version 0.11.1 [snip] ====================================================================== FAIL: test_data.test_boost(,) ---------------------------------------------------------------------- Traceback (most recent call last): File "E:\Python26\lib\site-packages\nose-0.11.1-py2.6.egg\nose\case.py", line 183, in runTest self.test(*self.arg) File "E:\Python26\lib\site-packages\scipy\special\tests\test_data.py", line 205, in _test_factory test.check(dtype=dtype) File "E:\Python26\lib\site-packages\scipy\special\tests\testutils.py", line 223, in check assert False, "\n".join(msg) AssertionError: Max |adiff|: 1.77636e-15 Max |rdiff|: 2.44233e-14 Bad results for the following points (in output 0): 1.0000014305114746 => 0.0016914556651292853 != 0.0016914556651292944 (rdiff 5.3842961637318929e-15) 1.000007152557373 => 0.0037822080446613874 != 0.0037822080446612951 (rdiff 2.4423306175913249e-14) 1.0000138282775879 => 0.0052589439468011612 != 0.0052589439468011014 (rdiff 1.1380223962570286e-14) 1.0000600814819336 => 0.010961831992188913 != 0.010961831992188852 (rdiff 5.5387933059412495e-15) 1.0001168251037598 => 0.015285472131830449 != 0.015285472131830425 (rdiff 1.5888373256788015e-15) 1.0003981590270996 => 0.028218171738655283 != 0.028218171738655373 (rdiff 3.1967209494023856e-15) ---------------------------------------------------------------------- From almar.klein at gmail.com Tue Jul 6 04:57:05 2010 From: almar.klein at gmail.com (Almar Klein) Date: Tue, 6 Jul 2010 10:57:05 +0200 Subject: [SciPy-User] ANN: Visvis 1.3.1 (bugfix release) Message-ID: Hi all, I recently announced Visvis version 1.3. However, there was a bug in the setup script that caused the processing subpackage to not being loaded. Visvis 1.3.1 fixes this issue along with another issue with regard to volume rendering. ===== Below is the announcement of visvis 1.3: ===== I am exited to announce version 1.3 of Visvis, the object oriented approach to visualization. Website: http://code.google.com/p/visvis/ Discussion group: http://groups.google.com/group/visvis/ Documentation: http://code.google.com/p/visvis/wiki/Visvis_basics The largest improvement is the Mesh class to represent triangular and quad meshes and surface data. The Axes class got a property to access 8 different light sources. These improvements enable numerous new possibilities to visualize data using Visvis. Further changes include the introduction of polar plotting and 3D bar charts. For a (more) complete list of changes see the release notes . === Description === Visvis is a pure Python library for visualization of 1D to 4D data in an object oriented way. Essentially, visvis is an object oriented layer of Python on top of OpenGl? , thereby combining the power of OpenGl?with the usability of Python. A Matlab-like interface in the form of a set of functions allows easy creation of objects (e.g. plot(), imshow(), volshow(), surf()). Regards, Almar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robince at gmail.com Tue Jul 6 09:03:29 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 14:03:29 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: Message-ID: On Tue, Jul 6, 2010 at 3:36 AM, Bruce Southey wrote: > On Mon, Jul 5, 2010 at 5:40 AM, Robin wrote: >> Hi, >> >> I am using Python.org amd64 python build on windows 7 64 bit. >> >> I am using numpy and scipy builds from here: >> http://www.lfd.uci.edu/~gohlke/pythonlibs/ >> >> I get many errors in scipy test (none for numpy). Particularly in >> scipy.sparse.linalg which I need to use (and in my code it appears >> spsolve is giving incorrect results). >> >> Is there a better 64 bit windows build to use? > > Under 32-bit Python and the scipy 0.8 rc1 under Windows 7 64bit, I > only get the test_boost error the directory removal error (from this > test: "test_create_catalog (test_catalog.TestGetCatalog) ..."). > > Some of the errors could be due to Window's lack of support for 64-bit > like the "test_complex (test_basic.TestLongDoubleFailure)". However, > you probably would have to build your own find out those if no one > else has them. I suspect there are more errors because of indices being longs instead of ints on Windows. > Given all the issues with 64-bit windows, do you really need 64-bit numpy/scipy? Unfortunately I do... it looks like I will now have to port a lot of Python code to Matlab. I know Windows isn't very popular in the Scipy community, and I try to avoid using it when I can, but it seems Windows 7 is a lot better than previous versions. Also >4GB RAM is now more or less standard for numerical work so I think 64 bit windows really should be supported. In my group a large factor in the decision to use windows was remote desktop and terminal services... For non-command line users there is nothing equivalent that I know of. (There is NX for linux but only 2 users is free - with a small tweak to windows 7 it is possible to have full terminal server behaviour). I wonder how enthought get around this problem with 64 bit EPD on windows? Cheers Robin > > Bruce > > >>>> scipy.test() > Running unit tests for scipy > NumPy version 1.4.1 > NumPy is installed in E:\Python26\lib\site-packages\numpy > SciPy version 0.8.0rc1 > SciPy is installed in E:\Python26\lib\site-packages\scipy > Python version 2.6.3 (r263rc1:75186, Oct ?2 2009, 20:40:30) [MSC > v.1500 32 bit (Intel)] > nose version 0.11.1 > [snip] > ====================================================================== > FAIL: test_data.test_boost(,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "E:\Python26\lib\site-packages\nose-0.11.1-py2.6.egg\nose\case.py", > line 183, in runTest > ? ?self.test(*self.arg) > ?File "E:\Python26\lib\site-packages\scipy\special\tests\test_data.py", > line 205, in _test_factory > ? ?test.check(dtype=dtype) > ?File "E:\Python26\lib\site-packages\scipy\special\tests\testutils.py", > line 223, in check > ? ?assert False, "\n".join(msg) > AssertionError: > Max |adiff|: 1.77636e-15 > Max |rdiff|: 2.44233e-14 > Bad results for the following points (in output 0): > ? ? ? ? ? ?1.0000014305114746 => ? ? ? ? ?0.0016914556651292853 != > ? ? ?0.0016914556651292944 ?(rdiff ? ? ? ? 5.3842961637318929e-15) > ? ? ? ? ? ? 1.000007152557373 => ? ? ? ? ?0.0037822080446613874 != > ? ? ?0.0037822080446612951 ?(rdiff ? ? ? ? 2.4423306175913249e-14) > ? ? ? ? ? ?1.0000138282775879 => ? ? ? ? ?0.0052589439468011612 != > ? ? 
?0.0052589439468011014 ?(rdiff ? ? ? ? 1.1380223962570286e-14) > ? ? ? ? ? ?1.0000600814819336 => ? ? ? ? ? 0.010961831992188913 != > ? ? ? 0.010961831992188852 ?(rdiff ? ? ? ? 5.5387933059412495e-15) > ? ? ? ? ? ?1.0001168251037598 => ? ? ? ? ? 0.015285472131830449 != > ? ? ? 0.015285472131830425 ?(rdiff ? ? ? ? 1.5888373256788015e-15) > ? ? ? ? ? ?1.0003981590270996 => ? ? ? ? ? 0.028218171738655283 != > ? ? ? 0.028218171738655373 ?(rdiff ? ? ? ? 3.1967209494023856e-15) > > ---------------------------------------------------------------------- > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cycomanic at gmail.com Tue Jul 6 09:43:43 2010 From: cycomanic at gmail.com (=?ISO-8859-1?Q?Jochen_Schr=F6der?=) Date: Tue, 06 Jul 2010 23:43:43 +1000 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: Message-ID: <4C33330F.1000807@gmail.com> On 06/07/10 23:03, Robin wrote: > On Tue, Jul 6, 2010 at 3:36 AM, Bruce Southey wrote: >> On Mon, Jul 5, 2010 at 5:40 AM, Robin wrote: >>> Hi, >>> >>> I am using Python.org amd64 python build on windows 7 64 bit. >>> >>> I am using numpy and scipy builds from here: >>> http://www.lfd.uci.edu/~gohlke/pythonlibs/ >>> >>> I get many errors in scipy test (none for numpy). Particularly in >>> scipy.sparse.linalg which I need to use (and in my code it appears >>> spsolve is giving incorrect results). >>> >>> Is there a better 64 bit windows build to use? >> >> Under 32-bit Python and the scipy 0.8 rc1 under Windows 7 64bit, I >> only get the test_boost error the directory removal error (from this >> test: "test_create_catalog (test_catalog.TestGetCatalog) ..."). >> >> Some of the errors could be due to Window's lack of support for 64-bit >> like the "test_complex (test_basic.TestLongDoubleFailure)". However, >> you probably would have to build your own find out those if no one >> else has them. > > I suspect there are more errors because of indices being longs instead > of ints on Windows. > >> Given all the issues with 64-bit windows, do you really need 64-bit numpy/scipy? > > Unfortunately I do... it looks like I will now have to port a lot of > Python code to Matlab. I know Windows isn't very popular in the Scipy > community, and I try to avoid using it when I can, but it seems > Windows 7 is a lot better than previous versions. Also>4GB RAM is now > more or less standard for numerical work so I think 64 bit windows > really should be supported. In my group a large factor in the decision > to use windows was remote desktop and terminal services... For > non-command line users there is nothing equivalent that I know of. > (There is NX for linux but only 2 users is free - with a small tweak > to windows 7 it is possible to have full terminal server behaviour). Just for the record, there's a large number of remote desktop solutions for Linux: remote X, VNC, NX and there's NeatX which is an open source NX server writtten by Google. Sorry doesn't help with your problems though. > > I wonder how enthought get around this problem with 64 bit EPD on windows? 
> > Cheers > > Robin > > >> >> Bruce >> >> >>>>> scipy.test() >> Running unit tests for scipy >> NumPy version 1.4.1 >> NumPy is installed in E:\Python26\lib\site-packages\numpy >> SciPy version 0.8.0rc1 >> SciPy is installed in E:\Python26\lib\site-packages\scipy >> Python version 2.6.3 (r263rc1:75186, Oct 2 2009, 20:40:30) [MSC >> v.1500 32 bit (Intel)] >> nose version 0.11.1 >> [snip] >> ====================================================================== >> FAIL: test_data.test_boost(,) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "E:\Python26\lib\site-packages\nose-0.11.1-py2.6.egg\nose\case.py", >> line 183, in runTest >> self.test(*self.arg) >> File "E:\Python26\lib\site-packages\scipy\special\tests\test_data.py", >> line 205, in _test_factory >> test.check(dtype=dtype) >> File "E:\Python26\lib\site-packages\scipy\special\tests\testutils.py", >> line 223, in check >> assert False, "\n".join(msg) >> AssertionError: >> Max |adiff|: 1.77636e-15 >> Max |rdiff|: 2.44233e-14 >> Bad results for the following points (in output 0): >> 1.0000014305114746 => 0.0016914556651292853 != >> 0.0016914556651292944 (rdiff 5.3842961637318929e-15) >> 1.000007152557373 => 0.0037822080446613874 != >> 0.0037822080446612951 (rdiff 2.4423306175913249e-14) >> 1.0000138282775879 => 0.0052589439468011612 != >> 0.0052589439468011014 (rdiff 1.1380223962570286e-14) >> 1.0000600814819336 => 0.010961831992188913 != >> 0.010961831992188852 (rdiff 5.5387933059412495e-15) >> 1.0001168251037598 => 0.015285472131830449 != >> 0.015285472131830425 (rdiff 1.5888373256788015e-15) >> 1.0003981590270996 => 0.028218171738655283 != >> 0.028218171738655373 (rdiff 3.1967209494023856e-15) >> >> ---------------------------------------------------------------------- >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at googlemail.com Tue Jul 6 09:46:20 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 6 Jul 2010 21:46:20 +0800 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: Message-ID: On Tue, Jul 6, 2010 at 9:03 PM, Robin wrote: > On Tue, Jul 6, 2010 at 3:36 AM, Bruce Southey wrote: > > On Mon, Jul 5, 2010 at 5:40 AM, Robin wrote: > >> Hi, > >> > >> I am using Python.org amd64 python build on windows 7 64 bit. > >> > >> I am using numpy and scipy builds from here: > >> http://www.lfd.uci.edu/~gohlke/pythonlibs/ > >> > >> I get many errors in scipy test (none for numpy). Particularly in > >> scipy.sparse.linalg which I need to use (and in my code it appears > >> spsolve is giving incorrect results). > >> > >> Is there a better 64 bit windows build to use? > > > > Under 32-bit Python and the scipy 0.8 rc1 under Windows 7 64bit, I > > only get the test_boost error the directory removal error (from this > > test: "test_create_catalog (test_catalog.TestGetCatalog) ..."). > > > > Some of the errors could be due to Window's lack of support for 64-bit > > like the "test_complex (test_basic.TestLongDoubleFailure)". However, > > you probably would have to build your own find out those if no one > > else has them. > > I suspect there are more errors because of indices being longs instead > of ints on Windows. 
> > > Given all the issues with 64-bit windows, do you really need 64-bit > numpy/scipy? > > Unfortunately I do... it looks like I will now have to port a lot of > Python code to Matlab. First about the test output: in 0.8.0rc1 all printed warnings, the lapack_misaligned and the npyio errors are gone. The boost errors will be gone in the final release as well. So you have about 20 errors/failures left, mostly located in the sparse and odr modules. Unless you're a heavy user of those, no need to move to matlab. You could also decide to look into the errors instead of rewriting your code. > I know Windows isn't very popular in the Scipy > community, and I try to avoid using it when I can, but it seems > Windows 7 is a lot better than previous versions. Also >4GB RAM is now > more or less standard for numerical work so I think 64 bit windows > really should be supported. In my group a large factor in the decision > to use windows was remote desktop and terminal services... For > non-command line users there is nothing equivalent that I know of. > (There is NX for linux but only 2 users is free - with a small tweak > to windows 7 it is possible to have full terminal server behaviour). > > I wonder how enthought get around this problem with 64 bit EPD on windows? > So why not use EPD? Still many times cheaper than Matlab.... Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Tue Jul 6 09:52:16 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 14:52:16 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: <4C33330F.1000807@gmail.com> References: <4C33330F.1000807@gmail.com> Message-ID: 2010/7/6 Jochen Schr?der : > Just for the record, there's a large number of remote desktop solutions > for Linux: remote X, VNC, NX and there's NeatX which is an open source > NX server writtten by Google. Sorry doesn't help with your problems though. Thanks... I find X and VNC are not really comparable to RDP in terms of usability (eg performance, connecting from different resolutions, keeping sessions, suitability for windows users etc.) NX is great but quite expensive. I didn't know about NeatX so I will have a look. But it doesn't have a release yet so I imagine its a little experimental. I stil think that for sharing a well equipped workstation among a small laboratory group (with no command line experience) win 7 and remote desktop is the best solution. Cheers Robin From robince at gmail.com Tue Jul 6 10:01:14 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 15:01:14 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: Message-ID: On Tue, Jul 6, 2010 at 2:46 PM, Ralf Gommers wrote: > First about the test output: in 0.8.0rc1 all printed warnings, the > lapack_misaligned and the npyio errors are gone. The boost errors will be > gone in the final release as well. So you have about 20 errors/failures > left, mostly located in the sparse and odr modules. Unless you're a heavy > user of those, no need to move to matlab. You could also decide to look into > the errors instead of rewriting your code. Thanks... I depend on the sparse module quite heavily which is why it's a problem. (my code that uses spsolve is giving incorrect results, although no errors). 
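For reference, a minimal way to check a result like that is to look at the residual of a small solve. This is only a sketch (a made-up 2x2 system, not the real code), and the int32 cast on the index arrays is just a guess at a workaround for the long-vs-int index issue suspected earlier in the thread:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Toy system standing in for the real one.
A = sparse.csc_matrix(np.array([[4.0, 1.0],
                                [1.0, 3.0]]))
b = np.array([1.0, 2.0])

# Guessed workaround: force the CSC index arrays down to 32-bit ints
# before handing the matrix to the solver.
A.indices = A.indices.astype(np.int32)
A.indptr = A.indptr.astype(np.int32)

x = spsolve(A, b)

# The residual should be near machine precision for a small,
# well-conditioned system like this one.
print np.linalg.norm(A * x - b)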
It's probably not as bad as I made out though - I'm sure I can do a fair bit with 32 bit Python, it's just more difficult to make it accessible to my colleagues (will have to install 32 bit MATLAB and they will have to pay attention to which one they are using). >> I know Windows isn't very popular in the Scipy >> community, and I try to avoid using it when I can, but it seems >> Windows 7 is a lot better than previous versions. Also >4GB RAM is now >> more or less standard for numerical work so I think 64 bit windows >> really should be supported. In my group a large factor in the decision >> to use windows was remote desktop and terminal services... For >> non-command line users there is nothing equivalent that I know of. >> (There is NX for linux but only 2 users is free - with a small tweak >> to windows 7 it is possible to have full terminal server behaviour). >> >> I wonder how enthought get around this problem with 64 bit EPD on windows? > > So why not use EPD? Still many times cheaper than Matlab.... I would suggest that if I had any influence at all on purchasing decisions, but as a PhD student I don't. Also MATLAB license is paid for by central IT, whereas any extra software would have to come out of group grants. If I wasn't the only person using it there might be a case, but unless I can get it working on Windows I'll continue to be the only person using it! (bit of a chicken and egg). Cheers Robin From bsouthey at gmail.com Tue Jul 6 10:04:02 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 06 Jul 2010 09:04:02 -0500 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: Message-ID: <4C3337D2.3070208@gmail.com> On 07/06/2010 08:03 AM, Robin wrote: > On Tue, Jul 6, 2010 at 3:36 AM, Bruce Southey wrote: > >> On Mon, Jul 5, 2010 at 5:40 AM, Robin wrote: >> >>> Hi, >>> >>> I am using Python.org amd64 python build on windows 7 64 bit. >>> >>> I am using numpy and scipy builds from here: >>> http://www.lfd.uci.edu/~gohlke/pythonlibs/ >>> >>> I get many errors in scipy test (none for numpy). Particularly in >>> scipy.sparse.linalg which I need to use (and in my code it appears >>> spsolve is giving incorrect results). >>> >>> Is there a better 64 bit windows build to use? >>> >> Under 32-bit Python and the scipy 0.8 rc1 under Windows 7 64bit, I >> only get the test_boost error the directory removal error (from this >> test: "test_create_catalog (test_catalog.TestGetCatalog) ..."). >> >> Some of the errors could be due to Window's lack of support for 64-bit >> like the "test_complex (test_basic.TestLongDoubleFailure)". However, >> you probably would have to build your own find out those if no one >> else has them. >> > I suspect there are more errors because of indices being longs instead > of ints on Windows. > It would be great to track some of these down. Basically scipy has not had the attention that numpy has in this matter eventhough David Cournapeau done really incredible work in getting numpy/scipy to work under 64-bit Windows. >> Given all the issues with 64-bit windows, do you really need 64-bit numpy/scipy? >> > Unfortunately I do... it looks like I will now have to port a lot of > Python code to Matlab. I know Windows isn't very popular in the Scipy > community, and I try to avoid using it when I can, but it seems > Windows 7 is a lot better than previous versions. Windows 7 is a big improvement over Vista but both suffer the transisition from 32-bit to x64 64-bit architecture (similar to Linux when these x64 cpu's came out). 
Sure most people do not develop with Windows but do not equate that with a lack of interest. The problem is that Windows and how the Windows binaries are build just makes it very extremely hard to develop for. > Also>4GB RAM is now > more or less standard for numerical work so I think 64 bit windows > really should be supported. Yes, there are many people who want it but the tools are too complex to use by casual people. > In my group a large factor in the decision > to use windows was remote desktop and terminal services... For > non-command line users there is nothing equivalent that I know of. > (There is NX for linux but only 2 users is free - with a small tweak > to windows 7 it is possible to have full terminal server behaviour). > You tried FreeNx? http://freenx.berlios.de/ While this is really old (and has some big issues including not being maintained) but I occasionally use xrdp as you can connect to Linux with Windows remote desktop. "RDP Server - An open source RDP server and X server capable of accepting connections from rdesktop and ms terminal server clients." http://xrdp.sourceforge.net/ > I wonder how enthought get around this problem with 64 bit EPD on windows? > > Cheers > > Robin > > Can't comment on those. Bruce > >> Bruce >> >> >> >>>>> scipy.test() >>>>> >> Running unit tests for scipy >> NumPy version 1.4.1 >> NumPy is installed in E:\Python26\lib\site-packages\numpy >> SciPy version 0.8.0rc1 >> SciPy is installed in E:\Python26\lib\site-packages\scipy >> Python version 2.6.3 (r263rc1:75186, Oct 2 2009, 20:40:30) [MSC >> v.1500 32 bit (Intel)] >> nose version 0.11.1 >> [snip] >> ====================================================================== >> FAIL: test_data.test_boost(,) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "E:\Python26\lib\site-packages\nose-0.11.1-py2.6.egg\nose\case.py", >> line 183, in runTest >> self.test(*self.arg) >> File "E:\Python26\lib\site-packages\scipy\special\tests\test_data.py", >> line 205, in _test_factory >> test.check(dtype=dtype) >> File "E:\Python26\lib\site-packages\scipy\special\tests\testutils.py", >> line 223, in check >> assert False, "\n".join(msg) >> AssertionError: >> Max |adiff|: 1.77636e-15 >> Max |rdiff|: 2.44233e-14 >> Bad results for the following points (in output 0): >> 1.0000014305114746 => 0.0016914556651292853 != >> 0.0016914556651292944 (rdiff 5.3842961637318929e-15) >> 1.000007152557373 => 0.0037822080446613874 != >> 0.0037822080446612951 (rdiff 2.4423306175913249e-14) >> 1.0000138282775879 => 0.0052589439468011612 != >> 0.0052589439468011014 (rdiff 1.1380223962570286e-14) >> 1.0000600814819336 => 0.010961831992188913 != >> 0.010961831992188852 (rdiff 5.5387933059412495e-15) >> 1.0001168251037598 => 0.015285472131830449 != >> 0.015285472131830425 (rdiff 1.5888373256788015e-15) >> 1.0003981590270996 => 0.028218171738655283 != >> 0.028218171738655373 (rdiff 3.1967209494023856e-15) >> >> ---------------------------------------------------------------------- >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robince at gmail.com Tue Jul 6 10:13:17 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 15:13:17 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: <4C3337D2.3070208@gmail.com> References: <4C3337D2.3070208@gmail.com> Message-ID: On Tue, Jul 6, 2010 at 3:04 PM, Bruce Southey wrote: > Windows 7 is a big improvement over Vista but both suffer the transisition > from 32-bit to x64 64-bit architecture (similar to Linux when these x64 > cpu's came out). Sure most people do not develop with Windows but do not > equate that with a lack of interest. The problem is that Windows and how the > Windows binaries are build just makes it very extremely hard to develop for. Yes, I was really surprised at this. I don't know very much about the workings of Python, but presumably there's a reason the Python people couldn't have made ints on win64 proper 64 bit ints using whatever type microsoft requires instead of just sticking with 32bit C longs. I tried not to have a gripey negative tone in the original email but perhaps I failed. It is always frustrating when you spend a lot of time on something (I spent quite a long time getting MATLAB-Python integration working on 64 bit windows... of course I should have checked numpy+scipy first!). Anyway, I really appreciate all the work that's gone into making numpy and scipy available... I just wanted to make the point that with windows 7, 64 bit windows isn't such a joke and there are people who would use a win64 scipy stack. > > You tried FreeNx? > http://freenx.berlios.de/ When I tried it, it was very hard to get working and a bit temperamental... it was a while ago though. Also I'm the only linux user in the lab and I'm leaving soon so windows really was the only option. Cheers Robin From kwgoodman at gmail.com Tue Jul 6 11:40:56 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 6 Jul 2010 08:40:56 -0700 Subject: [SciPy-User] [ANN] la 0.4, the labeled array Message-ID: The main class of the la package is a labeled array, larry. A larry consists of data and labels. The data is stored as a NumPy array and the labels as a list of lists (one list per dimension). Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys. The focus of this release was binary operations between unaligned larrys with user control of the join method (five available) and the fill method. A general binary function, la.binaryop(), was added as were the convenience functions add, subtract, multiply, divide. Supporting functions such as la.align(), which aligns two larrys, were also added.
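A minimal sketch of the alignment behaviour described above (the exact larry constructor arguments and the join keyword name here are assumptions based on this announcement and the release notes below, not checked against the package itself):

import la

# Two 1d larrys whose labels only partially overlap.
lar1 = la.larry([1.0, 2.0, 3.0], [['a', 'b', 'c']])
lar2 = la.larry([10.0, 20.0, 30.0], [['b', 'c', 'd']])

# '+' aligns by label automatically before adding (only 'b' and 'c' overlap).
print lar1 + lar2

# The new module-level add() is assumed to take the join method by name,
# e.g. an outer join that keeps the union of the labels and fills the rest.
print la.add(lar1, lar2, join='outer')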
download http://pypi.python.org/pypi/la doc http://larry.sourceforge.net code http://github.com/kwgoodman/la list1 http://groups.google.ca/group/pystatsmodels list2 http://groups.google.com/group/labeled-array RELEASE NOTES New larry methods - ismissing: A bool larry with element-wise marking of missing values - take: A copy of the specified elements of a larry along an axis New functions - rand: Random samples from a uniform distribution - randn: Random samples from a Gaussian distribution - missing_marker: Return missing value marker for the given larry - ismissing: A bool Numpy array with element-wise marking of missing values - correlation: Correlation of two Numpy arrays along the specified axis - split: Split into train and test data along given axis - listmap_fill: Index map a list onto another and index of unmappable elements - listmap_fill: Cython version of listmap_fill - align: Align two larrys using one of five join methods - info: la package information such as version number and HDF5 availability - binaryop: Binary operation on two larrys with given function and join method - add: Sum of two larrys using given join and fill methods - subtract: Difference of two larrys using given join and fill methods - multiply: Multiply two larrys element-wise using given join and fill methods - divide: Divide two larrys element-wise using given join and fill methods Enhancements - listmap now has option to ignore unmappable elements instead of KeyError - listmap.pyx now has option to ignore unmappable elements instead of KeyError - larry.morph() is much faster as are methods, such as merge, that use it Breakage from la 0.3 - Development moved from launchpad to github - func.py and afunc.py renamed flarry.py and farray.py to match new flabel.py. Broke: "from la.func import stack"; Did not break: "from la import stack" - Default binary operators (+, -, ...) no longer raise an error when no labels overlap Bug fixes - #590270 Index with 1d array bug: lar[1darray,:] worked; lar[1darray] crashed From Chris.Barker at noaa.gov Tue Jul 6 13:17:35 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 06 Jul 2010 10:17:35 -0700 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> Message-ID: <4C33652F.4070002@noaa.gov> Robin wrote: > NX is great but quite expensive. yes, though it looks cheaper than MATLAB. But if you've got people that want to use Windows, you've got people that want to use Windows. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Tue Jul 6 13:53:22 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Jul 2010 11:53:22 -0600 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: <4C33652F.4070002@noaa.gov> References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: On Tue, Jul 6, 2010 at 11:17 AM, Christopher Barker wrote: > Robin wrote: > > NX is great but quite expensive. > > yes, though it looks cheaper than MATLAB. But if you've got people that > want to use Windows, you've got people that want to use Windows. > > Let's get this thread back to the errors. The problems seem specific to the python.org amd64 python, is that correct? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From devicerandom at gmail.com Tue Jul 6 14:02:22 2010 From: devicerandom at gmail.com (ms) Date: Tue, 06 Jul 2010 19:02:22 +0100 Subject: [SciPy-User] scipy.optimize.leastsq question In-Reply-To: References: <4C289756.4070406@googlemail.com> <4C29A09F.5010504@googlemail.com> Message-ID: <4C336FAE.3090005@gmail.com> On 29/06/10 09:59, Sebastian Walter wrote: >>> Only use derivative free optimization methods if your problem is not continuous. >>> If your problem is differentiable, you should compute the Jacobian >>> yourself, e.g. with >>> >>> def myJacobian(x): >>> h = 10**-3 >>> # do finite differences approximation >>> return .... >>> >>> and provide the Jacobian to >>> scipy.optimize.leastsq(..., Dfun = myJacobian) Uh, I am real newbie in this field, but I expected that the Jacobian was needed if there was an analytical expression for the derivatives; I thought the leastsq routine calculated the finite difference approximation by itself otherwise. So I never bothered providing an "approximate" Jacobian. Or maybe I do not get what do you mean by finite difference. Could someone provide some insight on this? thanks! m. >>> This should work much better/reliable/faster than any of the alternatives. >> >> Maybe increasing the step length in the options to leastsq also works: >> >> epsfcn ? A suitable step length for the forward-difference >> approximation of the Jacobian (for Dfun=None). >> >> I don't think I have tried for leastsq, but for some fmin it works >> much better with larger step length for the finite difference >> approximation. > > choosing the right "step length" h is an art that I don't know much about. > But apparently one rule of thumb is to use > > h = abs(x)* sqrt(numpy.finfo(float).eps) > to compute > f'(x) = (f(x+h) - f(x))/h > > i.e. if one has x = [1,10**-3, 10**4] one would have to scale h with > 1, 10**-3 and 10**4. > > Regarding epsfcn: I find the documentation of leastsq a "little" confusing. > > epsfcn -- A suitable step length for the forward-difference > approximation of the Jacobian (for Dfun=None). If > epsfcn is less than the machine precision, it is assumed > that the relative errors in the functions are of > the order of the machine precision. > > In particular I don't quite get what is meant by "relative errors in > the functions". Which "functions" does it refer to? > > > Sebastian > >> >> Josef >> >> >> >>> >>> Also, using Algorithmic Differentiation to compute the Jacobian would >>> probably help in terms of robustness and convergence speed of leastsq. >>> >>> Sebastian >>> >>> >>> >>> >>> >>>> >>>> Cheers, Ralph >>>> >>>> Den 28.06.10 17.13, skrev Sebastian Walter: >>>>> there may be others who have more experience with scipy.optimize.leastsq. >>>>> >>>>>> From the mathematical point of view you should be certain that your >>>>> function is continuously differentiable or at least >>>>> (Lipschitz-)continuous. >>>>> This is because scipy.optimize.leastsq uses the Levenberg-Marquardt >>>>> algorithm which requires the Jacobian J(x) = dF/dx. >>>>> >>>>> You do not provide an analytic Jacobian for scipy.optimize.leastsq. >>>>> That means that scipy.optimize.leastsq uses some finite differences >>>>> approximation to approximate the Jacobian J(x). >>>>> It can happen that this finite differences approximation is so poor >>>>> that no descent direction for the residual can be found. >>>>> >>>>> So the first thing I would check is if the Jacobian J(x) makes sense. 
>>>>> You should be able to extract it from >>>>> scipy.optimize.leastsq's output infodict['fjac']. >>>>> >>>>> Then I'd check if >>>>> F(x + h*v) - F(x)/h, for h \approx 10**-8 >>>>> >>>>> gives the same vector as dot(J(x),v) >>>>> if this doesn't match at all, then your Jacobian is wrong resp. your >>>>> function is not continuously differentiable. >>>>> >>>>> Hope this helps a little, >>>>> Sebastian >>>>> >>>>> >>>>> >>>>> On Mon, Jun 28, 2010 at 2:36 PM, Ralph Kube wrote: >>>>>> Hello people, >>>>>> I am having a problem using the leastsq routine. My goal is to >>>>>> determine three parameters r_i, r_s and ppw so that the residuals >>>>>> to a model function a(r_i, r_s, ppw) to a measurement are minimal. >>>>>> When I call the leastsq routine with a good guess of starting values, it >>>>>> iterates 6 times without changing the vales of the initial parameters >>>>>> and then exits without an error. >>>>>> The function a is very complicated and expensive to evaluate. Some >>>>>> evaluation is done by using the subprocess module of python. Can this >>>>>> pose a problem for the leastsq routine? >>>>>> >>>>>> >>>>>> This is in the main routine: >>>>>> >>>>>> import numpy as N >>>>>> >>>>>> for t_idx, t in enumerate(time_var): >>>>>> >>>>>> r_i = 300. >>>>>> r_s = 1.0 >>>>>> ppw=1e-6 >>>>>> sza = 70. >>>>>> wl = N.arange(300., 3001., 1.) >>>>>> >>>>>> albedo_true = compute_albedo(r_i, r_s, ppw, sza, wl) >>>>>> # This emulates the measurement data >>>>>> albedo_meas = albedo_true + 0.01*N.random.randn(len(wl)) >>>>>> >>>>>> print 'Optimizing albedo' >>>>>> p0 = [2.*r_i, 1.4*r_s, 4.*ppw] >>>>>> plsq2 = leastsq(albedo_residual, p0, args=(albedo_meas, sza, >>>>>> wl)) >>>>>> print '... done: ', plsq2[0][0], plsq2[0][1], plsq2[0][2] >>>>>> albedo_model = compute_albedo(plsq2[0][0], plsq2[0][1], plsq2[0][2], >>>>>> sza, wl) >>>>>> >>>>>> The residual function: >>>>>> def albedo_residual(p, y, sza, wvl): >>>>>> r_i, r_s, ppw = p >>>>>> albedo = compute_albedo(r_i, r_s, ppw, sza, wvl) >>>>>> err = albedo - y >>>>>> print 'Albedo for r_i = %4.0f, r_s = %4.2f, ppw = %3.2e \ >>>>>> Residual squared: %5f' % (r_i, r_s, ppw, N.sum(err**2)) >>>>>> >>>>>> return err >>>>>> >>>>>> The definition of the function a(r_i, r_s, ppw) >>>>>> def compute_albedo(radius_ice, radius_soot, ppw, sza, wvl): >>>>>> >>>>>> The output is: >>>>>> Optimizing albedo >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 Residual squared: >>>>>> 0.973819 >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 Residual squared: >>>>>> 0.973819 >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 Residual squared: >>>>>> 0.973819 >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 Residual squared: >>>>>> 0.973819 >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 Residual squared: >>>>>> 0.973819 >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 Residual squared: >>>>>> 0.973819 >>>>>> ... done: 600.0 1.4 4e-06 >>>>>> >>>>>> To check for errors, I implemented the example code from >>>>>> http://www.tau.ac.il/~kineret/amit/scipy_tutorial/ in my code and it >>>>>> runs successfully. >>>>>> >>>>>> I would be glad for any suggestion. 
>>>>>> >>>>>> >>>>>> Cheers, Ralph >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> -- >>>> >>>> Cheers, Ralph >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josh.holbrook at gmail.com Tue Jul 6 14:19:38 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 10:19:38 -0800 Subject: [SciPy-User] scipy.optimize.leastsq question In-Reply-To: <4C336FAE.3090005@gmail.com> References: <4C289756.4070406@googlemail.com> <4C29A09F.5010504@googlemail.com> <4C336FAE.3090005@gmail.com> Message-ID: On Tue, Jul 6, 2010 at 10:02 AM, ms wrote: > On 29/06/10 09:59, Sebastian Walter wrote: > >>> Only use derivative free optimization methods if your problem is not > continuous. > >>> If your problem is differentiable, you should compute the Jacobian > >>> yourself, e.g. with > >>> > >>> def myJacobian(x): > >>> h = 10**-3 > >>> # do finite differences approximation > >>> return .... > >>> > >>> and provide the Jacobian to > >>> scipy.optimize.leastsq(..., Dfun = myJacobian) > > Uh, I am real newbie in this field, but I expected that the Jacobian was > needed if there was an analytical expression for the derivatives; I > thought the leastsq routine calculated the finite difference > approximation by itself otherwise. So I never bothered providing an > "approximate" Jacobian. Or maybe I do not get what do you mean by finite > difference. > > Could someone provide some insight on this? > > thanks! > m. > > >>> This should work much better/reliable/faster than any of the > alternatives. > >> > >> Maybe increasing the step length in the options to leastsq also works: > >> > >> epsfcn ? A suitable step length for the forward-difference > >> approximation of the Jacobian (for Dfun=None). > >> > >> I don't think I have tried for leastsq, but for some fmin it works > >> much better with larger step length for the finite difference > >> approximation. > > > > choosing the right "step length" h is an art that I don't know much > about. > > But apparently one rule of thumb is to use > > > > h = abs(x)* sqrt(numpy.finfo(float).eps) > > to compute > > f'(x) = (f(x+h) - f(x))/h > > > > i.e. if one has x = [1,10**-3, 10**4] one would have to scale h with > > 1, 10**-3 and 10**4. > > > > Regarding epsfcn: I find the documentation of leastsq a "little" > confusing. > > > > epsfcn -- A suitable step length for the forward-difference > > approximation of the Jacobian (for Dfun=None). If > > epsfcn is less than the machine precision, it is assumed > > that the relative errors in the functions are of > > the order of the machine precision. 
> > > > In particular I don't quite get what is meant by "relative errors in > > the functions". Which "functions" does it refer to? > > > > > > Sebastian > > > >> > >> Josef > >> > >> > >> > >>> > >>> Also, using Algorithmic Differentiation to compute the Jacobian would > >>> probably help in terms of robustness and convergence speed of leastsq. > >>> > >>> Sebastian > >>> > >>> > >>> > >>> > >>> > >>>> > >>>> Cheers, Ralph > >>>> > >>>> Den 28.06.10 17.13, skrev Sebastian Walter: > >>>>> there may be others who have more experience with > scipy.optimize.leastsq. > >>>>> > >>>>>> From the mathematical point of view you should be certain that your > >>>>> function is continuously differentiable or at least > >>>>> (Lipschitz-)continuous. > >>>>> This is because scipy.optimize.leastsq uses the Levenberg-Marquardt > >>>>> algorithm which requires the Jacobian J(x) = dF/dx. > >>>>> > >>>>> You do not provide an analytic Jacobian for scipy.optimize.leastsq. > >>>>> That means that scipy.optimize.leastsq uses some finite differences > >>>>> approximation to approximate the Jacobian J(x). > >>>>> It can happen that this finite differences approximation is so poor > >>>>> that no descent direction for the residual can be found. > >>>>> > >>>>> So the first thing I would check is if the Jacobian J(x) makes sense. > >>>>> You should be able to extract it from > >>>>> scipy.optimize.leastsq's output infodict['fjac']. > >>>>> > >>>>> Then I'd check if > >>>>> F(x + h*v) - F(x)/h, for h \approx 10**-8 > >>>>> > >>>>> gives the same vector as dot(J(x),v) > >>>>> if this doesn't match at all, then your Jacobian is wrong resp. your > >>>>> function is not continuously differentiable. > >>>>> > >>>>> Hope this helps a little, > >>>>> Sebastian > >>>>> > >>>>> > >>>>> > >>>>> On Mon, Jun 28, 2010 at 2:36 PM, Ralph Kube > wrote: > >>>>>> Hello people, > >>>>>> I am having a problem using the leastsq routine. My goal is to > >>>>>> determine three parameters r_i, r_s and ppw so that the residuals > >>>>>> to a model function a(r_i, r_s, ppw) to a measurement are minimal. > >>>>>> When I call the leastsq routine with a good guess of starting > values, it > >>>>>> iterates 6 times without changing the vales of the initial > parameters > >>>>>> and then exits without an error. > >>>>>> The function a is very complicated and expensive to evaluate. Some > >>>>>> evaluation is done by using the subprocess module of python. Can > this > >>>>>> pose a problem for the leastsq routine? > >>>>>> > >>>>>> > >>>>>> This is in the main routine: > >>>>>> > >>>>>> import numpy as N > >>>>>> > >>>>>> for t_idx, t in enumerate(time_var): > >>>>>> > >>>>>> r_i = 300. > >>>>>> r_s = 1.0 > >>>>>> ppw=1e-6 > >>>>>> sza = 70. > >>>>>> wl = N.arange(300., 3001., 1.) > >>>>>> > >>>>>> albedo_true = compute_albedo(r_i, r_s, ppw, sza, wl) > >>>>>> # This emulates the measurement data > >>>>>> albedo_meas = albedo_true + 0.01*N.random.randn(len(wl)) > >>>>>> > >>>>>> print 'Optimizing albedo' > >>>>>> p0 = [2.*r_i, 1.4*r_s, 4.*ppw] > >>>>>> plsq2 = leastsq(albedo_residual, p0, args=(albedo_meas, > sza, > >>>>>> wl)) > >>>>>> print '... 
done: ', plsq2[0][0], plsq2[0][1], plsq2[0][2] > >>>>>> albedo_model = compute_albedo(plsq2[0][0], plsq2[0][1], > plsq2[0][2], > >>>>>> sza, wl) > >>>>>> > >>>>>> The residual function: > >>>>>> def albedo_residual(p, y, sza, wvl): > >>>>>> r_i, r_s, ppw = p > >>>>>> albedo = compute_albedo(r_i, r_s, ppw, sza, wvl) > >>>>>> err = albedo - y > >>>>>> print 'Albedo for r_i = %4.0f, r_s = %4.2f, ppw = %3.2e \ > >>>>>> Residual squared: %5f' % (r_i, r_s, ppw, > N.sum(err**2)) > >>>>>> > >>>>>> return err > >>>>>> > >>>>>> The definition of the function a(r_i, r_s, ppw) > >>>>>> def compute_albedo(radius_ice, radius_soot, ppw, sza, wvl): > >>>>>> > >>>>>> The output is: > >>>>>> Optimizing albedo > >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 > Residual squared: > >>>>>> 0.973819 > >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 > Residual squared: > >>>>>> 0.973819 > >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 > Residual squared: > >>>>>> 0.973819 > >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 > Residual squared: > >>>>>> 0.973819 > >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 > Residual squared: > >>>>>> 0.973819 > >>>>>> Albedo for r_i = 600, r_s = 1.40, ppw = 4.00e-06 > Residual squared: > >>>>>> 0.973819 > >>>>>> ... done: 600.0 1.4 4e-06 > >>>>>> > >>>>>> To check for errors, I implemented the example code from > >>>>>> http://www.tau.ac.il/~kineret/amit/scipy_tutorial/in my code and it > >>>>>> runs successfully. > >>>>>> > >>>>>> I would be glad for any suggestion. > >>>>>> > >>>>>> > >>>>>> Cheers, Ralph > >>>>>> _______________________________________________ > >>>>>> SciPy-User mailing list > >>>>>> SciPy-User at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> > >>>>> _______________________________________________ > >>>>> SciPy-User mailing list > >>>>> SciPy-User at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>>> -- > >>>> > >>>> Cheers, Ralph > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > m. said: > Uh, I am real newbie in this field, but I expected that the Jacobian was > needed if there was an analytical expression for the derivatives; I > thought the leastsq routine calculated the finite difference > approximation by itself otherwise. So I never bothered providing an > "approximate" Jacobian. Or maybe I do not get what do you mean by finite > difference. > > Could someone provide some insight on this? > > thanks! > m. I say this not as someone intimately familiar with scipy.optimize, but as someone who has implemented a least squares-ish algorithm himself. 
You are almost certainly correct in that leastsq calculates an approximate Jacobian using a finite difference method on its own. However, if you can symbolically differentiate your problem without too much heartache, then supplying an exact Jacobian is probably preferable due to higher precision and fewer function evaluations (f(x) and f(x+h), differenced and normalized, vs. simply f'(x)). On the other hand: When I implemented my algorithm (nearly two years ago), my equations were pretty nasty. My derivatives just happened to be much much worse (as can be seen at http://modzer0.cs.uaf.edu/~jesusabdullah/gradients.html, at least for a little while), and at the time sympy honestly wasn't production-ready. So, I ended up using a finite difference method to calculate them (I believe I used scipy's derivative function), with which I did have to tweak step sizes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Tue Jul 6 14:36:22 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Tue, 6 Jul 2010 14:36:22 -0400 Subject: [SciPy-User] [AstroPy] Co-ordinating Python astronomy libraries? In-Reply-To: <4C2BBBA7.5060006@gemini.edu> References: <4C2BBBA7.5060006@gemini.edu> Message-ID: <735692AF-E60A-439E-9E9C-C858EA28DB57@gmail.com> Hi all, Unlike several people who have replied so far, I have not been involved in general purpose astronomy libraries, but rather a few small packages, including: - APLpy (FITS image plotting built on matplotlib) at http://aplpy.sourceforge.net (co-developed with Eli Bressert) - ATpy (Seamless multi-format table handler) at http://atpy.sourceforge.net (also co-developed with Eli Bressert) - IDLSave (To read IDL save files into python) at http://idlsave.sourceforge.net - python-montage (Montage wrapper) at http://python-montage.sourceforge.net The main reason for keeping these separate rather than grouping them into a single package was that each of these packages accomplishes a well-defined task, and we were not prepared to develop all the other tools needed in a general-purpose library. However, I think that from a user point of view, it would be nice to have something more unified. The main point I want to make is that we need to distinguish between merging all development into a single repository, and bundling packages. There are cases where merging the development of packages does not make sense. For example, in the case of IDLSave, the module was originally developed for the astronomy community, but in fact can be used by any scientist that uses IDL. Developing it as part of a general astronomy library makes it less likely that non-astronomers will find it and use it. Another example is Tom Aldcroft's asciitable module (http://cxc.harvard.edu/contrib/asciitable/). This was developed to read in ASCII tables, but was not actually designed in an astronomy-specific way, since ASCII tables are of course not limited to astronomy. In the case of such a package, developing it as part of a general astronomy library would be detrimental, but bundling it as part of a general package might be desirable. So here is what I personally see as the ideal (hierarchical) setup: 1. A core library of essential routines, which handle for example FITS I/O, VO tools, WCS, coordinate transformations, etc. These could be held on a common development server. This would be a kind of 'numpy' or 'scipy' for astronomy, e.g. 'astropy' or 'astrocore'. Documentation for these would be merged and unified. 2.
An astronomy 'bundle' which would include these essential routines, as well as extra astronomy packages (e.g. asciitable, IDLSave, ATpy, etc.). This bundle could be installable on top of the system python, or the enthought python distribution, e.g. 'astrolab'. Documentation for the extra packages would be left to the developers, but could be required to be of a consistent style (e.g. Sphinx) for inclusion in the bundle. 3. An 'all-in-one' package that would also include a full Python distribution, with numpy, scipy, matplotlib, etc - essentially an alternative to EPD for astronomy, e.g. 'APD (Astronomy Python Distribution)'. It could even come with a custom ipython prompt with all packges pre-loaded. All dependencies would be included. This would then cater to the different levels of python users in Astronomy. One quick note is that if we follow this or a similar model of bundling some packages without merging development repositories, is that developers of small packages will need to be more careful what license they release their code under, to ensure they can be bundled and re-released (but this should be trivial). Cheers, Thomas Robitaille On Jun 30, 2010, at 5:48 PM, James Turner wrote: > Dear Python users in astronomy, > > At SciPy 2009, I arranged an astronomy BoF where we discussed the > fact that there are now a number of astronomy libraries for Python > floating around and maybe it would be good to collect more code into > a single place. People seemed receptive to this idea and weren't sure > why it hasn't already happened, given that there has been an Astrolib > page at SciPy for some years now, with an associated SVN repository: > > http://scipy.org/AstroLib > > After the meeting last August, I was supposed to contact the mailing > list and some library authors I had talked to previously, to discuss > this further. My apologies for taking 10 months to do that! I did > draft an email the day after the BoF, but then we ran into a hurdle > with setting up new committers to the AstroLib repository (which has > taken a lot longer than expected to resolve), so it seemed a bad > time to suggest that new people start using it. > > To discuss these issues further, we'd like to encourage everyone to > sign up for the AstroPy mailing list if you are not already on it. > The traffic is just a few messages per month. > > http://lists.astropy.scipy.org/mailman/listinfo/astropy > > We (the 2009 BoF group) would also like to hear on the list about > why people have decided to host their own astronomy library (eg. not > being aware of the one at SciPy). Are you interested in contributing > to Astrolib? Do you have any other comments or concerns about > co-ordinating tools? Our motivation is to make libraries easy to > find and install, allow sharing code easily, help rationalize > available functionality and fill in what's missing. A standard > astronomy library with a single set of documentation should be more > coherent and easier to maintain. The idea is not to limit authors' > flexibility of take ownership of their code -- the sub-packages > can still be maintained by different people. > > If you're at SciPy this week, Perry Greenfield and I would be happy > to talk to you. If you would like to add your existing library to > Astrolib, please contact Perry Greenfield or Mark Sienkiewicz at > STScI for access (contact details at http://scipy.org/AstroLib). > Note that the repository is being moved to a new server this week, > after which the URLs will be updated at scipy.org. > > Thanks! 
> > James Turner (Gemini). > > Bcc: various library authors > > _______________________________________________ > AstroPy mailing list > AstroPy at scipy.org > http://mail.scipy.org/mailman/listinfo/astropy -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Jul 6 14:43:24 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 6 Jul 2010 18:43:24 +0000 (UTC) Subject: [SciPy-User] many test failures on windows 64 References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: Tue, 06 Jul 2010 11:53:22 -0600, Charles R Harris wrote: [clip] > Let's get this thread back to the errors. The problems seem specific to > the python.org amd64 python, is that correct? The SuperLU failures puzzle me. It should be "straightforward" C code, and I don't understand what can go wrong there. The "Factor is exactly singular" error indicates essentially means that SuperLU thinks it detects a zero pivot or something, so something seems to fail at a fairly low level. This seems quite difficult to debug without a Win-64 at hand. Another thing is that Gohlke's binaries are also built against MKL, and SuperLU does call BLAS routines. I wonder if something can break because of that... @Robin: Also, for the cases where a wrong result is produced with no error: is it easy to write a small test program demonstrating this? If yes, could you write one? -- Pauli Virtanen From devicerandom at gmail.com Tue Jul 6 14:46:29 2010 From: devicerandom at gmail.com (ms) Date: Tue, 06 Jul 2010 19:46:29 +0100 Subject: [SciPy-User] scipy.optimize.leastsq question In-Reply-To: References: <4C289756.4070406@googlemail.com> <4C29A09F.5010504@googlemail.com> <4C336FAE.3090005@gmail.com> Message-ID: <4C337A05.8070304@gmail.com> On 06/07/10 19:19, Joshua Holbrook wrote: > On Tue, Jul 6, 2010 at 10:02 AM, ms wrote: > >> On 29/06/10 09:59, Sebastian Walter wrote: >>>>> Only use derivative free optimization methods if your problem is not >> continuous. >>>>> If your problem is differentiable, you should compute the Jacobian >>>>> yourself, e.g. with >>>>> >>>>> def myJacobian(x): >>>>> h = 10**-3 >>>>> # do finite differences approximation >>>>> return .... >> >>> >> >>> and provide the Jacobian to >> >>> scipy.optimize.leastsq(..., Dfun = myJacobian) >> >> Uh, I am real newbie in this field, but I expected that the Jacobian was >> needed if there was an analytical expression for the derivatives; I >> thought the leastsq routine calculated the finite difference >> approximation by itself otherwise. So I never bothered providing an >> "approximate" Jacobian. Or maybe I do not get what do you mean by finite >> difference. > I say this not as someone intimately familiar with scipy.optimize, but as > someone who has implemented a least squares-ish algorithm himself. > > You are almost certainly correct in that leastsq calculates an approximate > Jacobian using a finite difference method on its own. However, if you can > symbolically differentiate your promblem without too much heartache, then > supplying an exact Jacobian is probably preferrable due to higher precision > and less function evaluation (f(x) and f(x+h), differenced and normalized, > vs. simply f'(x)). > > On the other hand: When I implemented my algorithm (nearly two years ago), > my equations were pretty nasty. 
My derivatives just happened to be much much > worse (as can be seen at > http://modzer0.cs.uaf.edu/~jesusabdullah/gradients.html, at least for a > little while), and at the time sympy honestly wasn't production-ready. So, I > ended up using a finite difference method to calculate them (I believe I > used scipy's derivative function), with which I did have to tweak step > sizes. Thank you. What you tell me is very similar to what I have always understood. But I was confused because of the pseudocode that Sebastian Walter provided: >> On 29/06/10 09:59, Sebastian Walter wrote: >>>>> If your problem is differentiable, you should compute the Jacobian >>>>> yourself, e.g. with >>>>> >>>>> def myJacobian(x): >>>>> h = 10**-3 >>>>> # do finite differences approximation >>>>> return .... >> >>> >> >>> and provide the Jacobian to >> >>> scipy.optimize.leastsq(..., Dfun = myJacobian) that explicitly says you can provide one with finite differences approximation, so I am unsure. thanks, M. From robince at gmail.com Tue Jul 6 14:46:57 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 19:46:57 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: On Tue, Jul 6, 2010 at 6:53 PM, Charles R Harris > > Let's get this thread back to the errors. The problems seem specific to the > python.org amd64 python, is that correct? Yes, that is the build I am using. The mtrand problems come from shape of arrays being longs instead of ints (since ints are 32 bit). I'm not sure if this could be having similar problems elsewhere. As Pauli noted the numpy build is with MKL - but I chose that one because it is required by the scipy amd64 build on that page... I'm not aware of any other scipy win64 builds. Cheers Robin From cgohlke at uci.edu Tue Jul 6 14:52:41 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Tue, 06 Jul 2010 11:52:41 -0700 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: <4C337B79.5080608@uci.edu> On 7/6/2010 11:43 AM, Pauli Virtanen wrote: > Tue, 06 Jul 2010 11:53:22 -0600, Charles R Harris wrote: > [clip] >> Let's get this thread back to the errors. The problems seem specific to >> the python.org amd64 python, is that correct? No, the unofficial scipy-0.8.0rc1.win32-py2.6 build fails with (mostly) the same errors. Most of the reported errors are apparently specific to my build environment: msvc9, ifort 11.1, mkl 10.2, numscons 0.12.0dev. > The SuperLU failures puzzle me. It should be "straightforward" C code, > and I don't understand what can go wrong there. The "Factor is exactly > singular" error indicates essentially means that SuperLU thinks it > detects a zero pivot or something, so something seems to fail at a fairly > low level. > > This seems quite difficult to debug without a Win-64 at hand. > > Another thing is that Gohlke's binaries are also built against MKL, and > SuperLU does call BLAS routines. I wonder if something can break because > of that... > > @Robin: Also, for the cases where a wrong result is produced with no > error: is it easy to write a small test program demonstrating this? If > yes, could you write one? 
> -- Christoph From josh.holbrook at gmail.com Tue Jul 6 14:54:12 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 10:54:12 -0800 Subject: [SciPy-User] scipy.optimize.leastsq question In-Reply-To: <4C337A05.8070304@gmail.com> References: <4C289756.4070406@googlemail.com> <4C29A09F.5010504@googlemail.com> <4C336FAE.3090005@gmail.com> <4C337A05.8070304@gmail.com> Message-ID: On Tue, Jul 6, 2010 at 10:46 AM, ms wrote: > > On 06/07/10 19:19, Joshua Holbrook wrote: > > On Tue, Jul 6, 2010 at 10:02 AM, ms ?wrote: > > > >> On 29/06/10 09:59, Sebastian Walter wrote: > >>>>> Only use derivative free optimization methods if your problem is not > >> continuous. > >>>>> If your problem is differentiable, you should compute the Jacobian > >>>>> yourself, e.g. with > >>>>> > >>>>> def myJacobian(x): > >>>>> ? ? ? h = 10**-3 > >>>>> ? ? ? # do finite differences approximation > >>>>> ? ? ? return .... > >> ? >>> > >> ? >>> ?and provide the Jacobian to > >> ? >>> ?scipy.optimize.leastsq(..., Dfun = myJacobian) > >> > >> Uh, I am real newbie in this field, but I expected that the Jacobian was > >> needed if there was an analytical expression for the derivatives; I > >> thought the leastsq routine calculated the finite difference > >> approximation by itself otherwise. So I never bothered providing an > >> "approximate" Jacobian. Or maybe I do not get what do you mean by finite > >> difference. > > > I say this not as someone intimately familiar with scipy.optimize, but as > > someone who has implemented a least squares-ish algorithm himself. > > > > You are almost certainly correct in that leastsq calculates an approximate > > Jacobian using a finite difference method on its own. However, if you can > > symbolically differentiate your promblem without too much heartache, then > > supplying an exact Jacobian is probably preferrable due to higher precision > > and less function evaluation (f(x) and f(x+h), differenced and normalized, > > vs. simply f'(x)). > > > > On the other hand: When I implemented my algorithm (nearly two years ago), > > my equations were pretty nasty. My derivatives just happened to be much much > > worse (as can be seen at > > http://modzer0.cs.uaf.edu/~jesusabdullah/gradients.html, at least for a > > little while), and at the time sympy honestly wasn't production-ready. So, I > > ended up using a finite difference method to calculate them (I believe I > > used scipy's derivative function), with which I did have to tweak step > > sizes. > > Thank you. What you tell me is very similar to what I have always > understood. But I was confused because of the pseudocode that Sebastian > Walter provided: > > ?>> On 29/06/10 09:59, Sebastian Walter wrote: > ?>>>>> If your problem is differentiable, you should compute the Jacobian > ?>>>>> yourself, e.g. with > ?>>>>> > ?>>>>> def myJacobian(x): > ?>>>>> ? ? ? h = 10**-3 > ?>>>>> ? ? ? # do finite differences approximation > ?>>>>> ? ? ? return .... > ?>> ? >>> > ?>> ? >>> ?and provide the Jacobian to > ?>> ? >>> ?scipy.optimize.leastsq(..., Dfun = myJacobian) > > that explicitly says you can provide one with finite differences > approximation, so I am unsure. > > thanks, > M. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user Huh! I didn't notice that interesting comment there :) It could be that there's something I don't know. 
On the other hand, it could also just be that computing your own Jacobian allows for fine-tuning--for example, different dx's depending on the function, higher-precision FDMs for more complex equations, and maybe even mixings-in of simple, known derivatives. For example, while most of my Jacobian's equations were pretty obnoxious, I did have a few simple ones (some 0s, and possibly some basic trig functions here n' there--I don't remember exactly), and while some of my functions used degrees/radians as inputs, others used distance, and the shape of the function depended heavily on a relative scale, not an absolute one, and as such had a parameterized dx. --Josh From cournape at gmail.com Tue Jul 6 15:06:54 2010 From: cournape at gmail.com (David Cournapeau) Date: Tue, 6 Jul 2010 21:06:54 +0200 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C3337D2.3070208@gmail.com> Message-ID: On Tue, Jul 6, 2010 at 4:13 PM, Robin wrote: > On Tue, Jul 6, 2010 at 3:04 PM, Bruce Southey wrote: >> Windows 7 is a big improvement over Vista but both suffer the transisition >> from 32-bit to x64 64-bit architecture (similar to Linux when these x64 >> cpu's came out). Sure most people do not develop with Windows but do not >> equate that with a lack of interest. The problem is that Windows and how the >> Windows binaries are build just makes it very extremely hard to develop for. > > Yes, I was really surprised at this. I don't know very much about the > workings of Python, but presumably theres a reason the Python people > couldn't have made ints on win64 proper 64 bit ints using whatever > type microsoft requires instead of just sticking with 32bit C longs. I am not sure why you think that's the problem to the issues you are describing. Numpy does use a type which is 64 bits for indexing on windows as everywhere else, and that's not the cause of the issues you have described so far. The random shuffle limitation for example is not windows specific, for example. Concerning sparse matrices, the index is currently limited to 32 bits: you can change this by hand if you need 64 bits indexing (in sparsetools.i, add DECLARE_INDEX_TYPE(npy_intp)). David From robince at gmail.com Tue Jul 6 15:20:36 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 20:20:36 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: On Tue, Jul 6, 2010 at 7:43 PM, Pauli Virtanen wrote: > Tue, 06 Jul 2010 11:53:22 -0600, Charles R Harris wrote: > [clip] >> Let's get this thread back to the errors. The problems seem specific to >> the python.org amd64 python, is that correct? > > The SuperLU failures puzzle me. It should be "straightforward" C code, > and I don't understand what can go wrong there. The "Factor is exactly > singular" error indicates essentially means that SuperLU thinks it > detects a zero pivot or something, so something seems to fail at a fairly > low level. > > This seems quite difficult to debug without a Win-64 at hand. > > Another thing is that Gohlke's binaries are also built against MKL, and > SuperLU does call BLAS routines. I wonder if something can break because > of that... > > @Robin: Also, for the cases where a wrong result is produced with no > error: is it easy to write a small test program demonstrating this? If > yes, could you write one? Yes, below is a simple example. 
On my mac it works: In [10]: run -i sptest.py [-0.34841705 -0.23272338 0.27248558] [-0.34841705 -0.23272338 0.27248558] On the 64bit windows installation: In [7]: run -i spsolve_test.py [ 0.08507826 1.04401349 -1.56609783] [-0.52208434 -1.48101957 -1.56609783] In [8]: run -i spsolve_test.py [ 0.71923676 -0.12209489 -0.16069061] [-0.2827855 0.55854616 -0.16069061] import numpy as np import scipy as sp import scipy.sparse.linalg Adense = np.matrix([[ 0., 1., 1.], [ 1., 0., 1.], [ 0., 0., 1.]]) As = sp.sparse.csc_matrix(Adense) x = np.random.randn(3) b = As.matvec(x) print x print sp.sparse.linalg.spsolve(As, b) From charlesr.harris at gmail.com Tue Jul 6 15:26:49 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Jul 2010 13:26:49 -0600 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: On Tue, Jul 6, 2010 at 1:20 PM, Robin wrote: > On Tue, Jul 6, 2010 at 7:43 PM, Pauli Virtanen wrote: > > Tue, 06 Jul 2010 11:53:22 -0600, Charles R Harris wrote: > > [clip] > >> Let's get this thread back to the errors. The problems seem specific to > >> the python.org amd64 python, is that correct? > > > > The SuperLU failures puzzle me. It should be "straightforward" C code, > > and I don't understand what can go wrong there. The "Factor is exactly > > singular" error indicates essentially means that SuperLU thinks it > > detects a zero pivot or something, so something seems to fail at a fairly > > low level. > > > > This seems quite difficult to debug without a Win-64 at hand. > > > > Another thing is that Gohlke's binaries are also built against MKL, and > > SuperLU does call BLAS routines. I wonder if something can break because > > of that... > > > > @Robin: Also, for the cases where a wrong result is produced with no > > error: is it easy to write a small test program demonstrating this? If > > yes, could you write one? > > Yes, below is a simple example. On my mac it works: > In [10]: run -i sptest.py > [-0.34841705 -0.23272338 0.27248558] > [-0.34841705 -0.23272338 0.27248558] > > On the 64bit windows installation: > In [7]: run -i spsolve_test.py > [ 0.08507826 1.04401349 -1.56609783] > [-0.52208434 -1.48101957 -1.56609783] > In [8]: run -i spsolve_test.py > [ 0.71923676 -0.12209489 -0.16069061] > [-0.2827855 0.55854616 -0.16069061] > > import numpy as np > import scipy as sp > import scipy.sparse.linalg > > Adense = np.matrix([[ 0., 1., 1.], > [ 1., 0., 1.], > [ 0., 0., 1.]]) > As = sp.sparse.csc_matrix(Adense) > x = np.random.randn(3) > b = As.matvec(x) > > print x > print sp.sparse.linalg.spsolve(As, b) > Is x the same on both machines? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Tue Jul 6 15:31:42 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 20:31:42 +0100 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: On Tue, Jul 6, 2010 at 8:26 PM, Charles R Harris wrote: > Is x the same on both machines? No I was randomly generating each time... 
With the same x on both machines: mac In [23]: run -i sptest.py [-0.52196943 0.04636895 -0.39616894] [-0.52196943 0.04636895 -0.39616894] Windows 64: In [23]: run -i spsolve_test.py [-0.52196943 0.04636895 -0.39616894] [-0.34979999 -0.91813837 -0.39616894] From perry at stsci.edu Tue Jul 6 16:32:25 2010 From: perry at stsci.edu (Perry Greenfield) Date: Tue, 6 Jul 2010 16:32:25 -0400 Subject: [SciPy-User] [AstroPy] Co-ordinating Python astronomy libraries? In-Reply-To: <735692AF-E60A-439E-9E9C-C858EA28DB57@gmail.com> References: <4C2BBBA7.5060006@gemini.edu> <735692AF-E60A-439E-9E9C-C858EA28DB57@gmail.com> Message-ID: <9D3556DE-43C3-4047-B25F-CA6F67F07EC7@stsci.edu> On Jul 6, 2010, at 2:36 PM, Thomas Robitaille wrote: > Hi all, > > Unlike several people who have replied so far, I have not been > involved in general purpose astronomy libraries, but rather a few > small packages, including: > > - APLpy (FITS image plotting built on matplotlib) at http://aplpy.sourceforge.net > (co-developed with Eli Bressert) > - ATpy (Seamless multi-format table handler) at http://atpy.sourceforge.net > (also co-developed with Eli Bressert) > - IDLSave (To read IDL save files into python) at http://idlsave.sourceforge.net > - python-montage (Montage wrapper) at http://python-montage.sourceforgenet > > The main reason for keeping these separate rather than group them > into a single package was that each of these packages accomplishes a > well-defined task, and we were not prepared to develop all the other > tools needed in a general-purpose library. However, I think that > from a user point of view, it would be nice to have something more > unified. > > The main point I want to make is that we need to distinguish between > merging all development into a single repository, and bundling > packages. There are cases where merging the development of packages > does not make sense. For example, in the case of IDLSave, the module > was originally developed for the astronomy community, but in fact > can be used by any scientist that uses IDL. By developing it as part > of a general astronomy libraries makes it less likely that non- > astronomers will find it and use it. Another example is Tom > Aldcroft's asciitable module (http://cxc.harvard.edu/contrib/asciitable/ > ). This was developed to read in ASCII tables, but was not actually > designed in an astronomy specific ways, since ASCII tables are of > course not limited to astronomy. In the case of such a package, > developing it as part of a general astronomy library would be > detrimental, but bundling it as part of a general package might be > desirable. > In principle yes, some of these things are generic. But having libraries in a common repository doesn't preclude distributing them separately from astronomy. I also tend to thing that premature fragmentation is as bad a problem as being way too monolithic. I'd tend to wait until it was clear we had too much stuff in one repository before worrying about how to split things up. Sometimes it never becomes an issue. If some items grow non-astronomical contributors, they can go to scipy (or something similar). There are some advantages to a common repository: 1) We become more aware of areas of commonality and that foster greater sense of making things work better together. 2) Eventually it would help make for more consistent documentation and style, and ways of integrating the documentation. 
3) Making the process of integrating common or redundant functionality easier later (I'll say a little about that in a separate email) And there are downsides of course: 1) Not everyone wants to use the same version control software (svn, git, hg...) or wiki, or issue tracker. 2) Coordination with other developers and projects is more work than doing things on your own. [...] > 3. An 'all-in-one' package that would also include a full Python > distribution, with numpy, scipy, matplotlib, etc - essentially an > alternative to EPD for astronomy, e.g. 'APD (Astronomy Python > Distribution)'. It could even come with a custom ipython prompt with > all packges pre-loaded. All dependencies would be included. > In this area STScI and Gemini are attempting to do something like this now as a joint project. We are just starting on this effort, but we hope to make a fairly easy to install distribution that includes most core tools astronomers would need (or something that would be easy to install optional items on). But we don't want to talk too much about this until we have something to have people try (I think Gemini would like to have something basic by the end of the year). We intend to include IRAF and the common external IRAF packages as well. > This would then cater to the different levels of python users in > Astronomy. One quick note is that if we follow this or a similar > model of bundling some packages without merging development > repositories, is that developers of small packages will need to be > more careful what license they release their code under, to ensure > they can be bundled and re-released (but this should be trivial). > Yes, licensing is one of the important issues to deal with... Perry From stevenj at alum.mit.edu Wed Jul 7 00:24:54 2010 From: stevenj at alum.mit.edu (Steven G. Johnson) Date: Wed, 07 Jul 2010 00:24:54 -0400 Subject: [SciPy-User] R: [ANN] NLopt, a nonlinear optimization library, now with Python interface In-Reply-To: <484818.50069.qm@web26702.mail.ukl.yahoo.com> References: <76d97234-00ad-4516-a786-e71ee9a866f4@g19g2000yqc.googlegroups.com> <484818.50069.qm@web26702.mail.ukl.yahoo.com> Message-ID: enrico avventi wrote: > i wanted to try out your library but the configure process won't find > the header arrayobject.h and thus won't compile the bindings. > where am i supposed to put that header? > as my distro (Archlinux) doesn't seem to install the numpy headers i > tried to copy them manually. so far i tried: I don't know anything about Archlinux. On my (Debian) system, the numpy headers are installed under /usr/include/python2.5/numpy/arrayobject.h It sounds like the directory you put it under should be okay, but perhaps you screwed up something else in your NumPy installation (e.g. you are missing other header files, or....). Whenever you find yourself copying header files around manually, you are usually making a mistake...it would really be much better if you figure out what package in your distro is supposed to include the header files. You can look in the config.log file generated by the configure script in order to find the exact compiler error message that caused #include to fail. Regards, Steven G. 
Johnson From cournape at gmail.com Wed Jul 7 02:12:00 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Jul 2010 08:12:00 +0200 Subject: [SciPy-User] R: [ANN] NLopt, a nonlinear optimization library, now with Python interface In-Reply-To: References: <76d97234-00ad-4516-a786-e71ee9a866f4@g19g2000yqc.googlegroups.com> <484818.50069.qm@web26702.mail.ukl.yahoo.com> Message-ID: On Wed, Jul 7, 2010 at 6:24 AM, Steven G. Johnson wrote: > enrico avventi wrote: >> i wanted to try out your library but the configure process won't find >> the header arrayobject.h and thus won't compile the bindings. >> where am i supposed to put that header? >> as my distro (Archlinux) doesn't seem to install the numpy headers i >> tried to copy them manually. so far i tried: > > I don't know anything about Archlinux. ?On my (Debian) system, the numpy > headers are installed under > > ? ? ? ?/usr/include/python2.5/numpy/arrayobject.h That's actually debian specific. I have not looked at the nlopt code, but the recommended way to use numpy headers is the get_numpy_includes from numpy.distutils.misc_util.get_numpy_include_dirs. Doing so guarantees it will work everywhere, David From njs at pobox.com Wed Jul 7 09:47:50 2010 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 7 Jul 2010 06:47:50 -0700 Subject: [SciPy-User] R: [ANN] NLopt, a nonlinear optimization library, now with Python interface In-Reply-To: References: <76d97234-00ad-4516-a786-e71ee9a866f4@g19g2000yqc.googlegroups.com> <484818.50069.qm@web26702.mail.ukl.yahoo.com> Message-ID: On Tue, Jul 6, 2010 at 11:12 PM, David Cournapeau wrote: > That's actually debian specific. I have not looked at the nlopt code, > but the recommended way to use numpy headers is the get_numpy_includes > from numpy.distutils.misc_util.get_numpy_include_dirs. Doing so > guarantees it will work everywhere, I'm told the even *more* recommended way is to use 'numpy.get_include()': http://www.mail-archive.com/cython-dev at codespeak.net/msg08368.html -- Nathaniel From stevenj at alum.mit.edu Wed Jul 7 11:34:34 2010 From: stevenj at alum.mit.edu (Steven G. Johnson) Date: Wed, 07 Jul 2010 11:34:34 -0400 Subject: [SciPy-User] R: [ANN] NLopt, a nonlinear optimization library, now with Python interface In-Reply-To: References: <76d97234-00ad-4516-a786-e71ee9a866f4@g19g2000yqc.googlegroups.com> <484818.50069.qm@web26702.mail.ukl.yahoo.com> Message-ID: Nathaniel Smith wrote: > I'm told the even *more* recommended way is to use 'numpy.get_include()': > http://www.mail-archive.com/cython-dev at codespeak.net/msg08368.html Thanks for the tip. I've just released version 2.1.1 of NLopt, which corrects the configure script's checks for Python and Numpy include directories. Regards, Steven G. Johnson From peterhoward42 at gmail.com Wed Jul 7 14:40:31 2010 From: peterhoward42 at gmail.com (Peter Howard) Date: Wed, 7 Jul 2010 19:40:31 +0100 Subject: [SciPy-User] butterworth filter on .WAV file Message-ID: I'm trying to write a very simple example of applying a band pass filter to a .WAV music file. I'm distinctly rusty on DSP and inexperienced with SciPy/NumPy so apologies if I've made a dumb mistake. It executes without errors or warnings. It produces the output file, but this is twice the size of the input file, which is clearly wrong. I'm most uncertain about casting the filtered data back to integers and thus being suitable for writing back out to .WAV. I'm also bit uncertain about my interpretation / understanding of the frequency and gain specifications. 
Any help and advice very much appreciated. Pete from scipy.io.wavfile import read, write from scipy.signal.filter_design import butter, buttord from scipy.signal import lfilter from numpy import asarray def convert_hertz(freq): # convert frequency in hz to units of pi rad/sample # (our .WAV is sampled at 44.1KHz) return freq * 2.0 / 44100.0 rate, sound_samples = read('monty.wav') pass_freq = convert_hertz(440.0) # pass up to 'middle C' stop_freq = convert_hertz(440.0 * 4) # max attenuation from 3 octaves higher pass_gain = 3.0 # tolerable loss in passband (dB) stop_gain = 60.0 # required attenuation in stopband (dB) ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) b, a = butter(ord, wn, btype = 'low') filtered = lfilter(b, a, sound_samples) integerised_filtered = asarray(filtered, int) write('monty-filtered.wav', rate, integerised_filtered) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josh.holbrook at gmail.com Wed Jul 7 14:45:25 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 7 Jul 2010 10:45:25 -0800 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: Message-ID: On Wed, Jul 7, 2010 at 10:40 AM, Peter Howard wrote: > I'm trying to write a very simple example of applying a band pass filter to > a .WAV music file. > I'm distinctly rusty on DSP and inexperienced with SciPy/NumPy so apologies > if I've made a dumb mistake. > > It executes without errors or warnings. > It produces the output file, but this is twice the size of the input file, > which is clearly wrong. > I'm most uncertain about casting the filtered data back to integers and thus > being suitable for writing back out to .WAV. > I'm also bit uncertain about my interpretation / understanding of the > frequency and gain specifications. > > Any help and advice very much appreciated. > > Pete > > > > > from scipy.io.wavfile import read, write > from scipy.signal.filter_design import butter, buttord > from scipy.signal import lfilter > from numpy import asarray > > def convert_hertz(freq): > ??? # convert frequency in hz to units of pi rad/sample > ??? # (our .WAV is sampled at 44.1KHz) > ??? return freq * 2.0 / 44100.0 > > rate, sound_samples = read('monty.wav') > pass_freq = convert_hertz(440.0) # pass up to 'middle C' > stop_freq = convert_hertz(440.0 * 4) # max attenuation from 3 octaves higher > pass_gain = 3.0 # tolerable loss in passband (dB) > stop_gain = 60.0 # required attenuation in stopband (dB) > ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) > b, a = butter(ord, wn, btype = 'low') > filtered = lfilter(b, a, sound_samples) > integerised_filtered = asarray(filtered, int) > write('monty-filtered.wav', rate, integerised_filtered) > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > What does the output file sound like? --Josh From peterhoward42 at gmail.com Wed Jul 7 17:34:51 2010 From: peterhoward42 at gmail.com (Peter Howard) Date: Wed, 7 Jul 2010 22:34:51 +0100 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: Message-ID: I forgot to mention that part - it plays without complaint in Window's Media Player - but is completely silent. Would it help if I provide the input .WAV file? Can you attach things to posts via email? (It is 13MB) -------------- next part -------------- An HTML attachment was scrubbed... 
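One way to sanity-check the frequency and gain interpretation in the script above, separately from the silent-output question, is to look at the response of the designed filter with scipy.signal.freqz. A sketch only, reusing the same normalisation and the 3 dB / 60 dB figures from the original script:

import numpy as np
from scipy.signal import butter, buttord, freqz

fs = 44100.0
pass_freq = 440.0 * 2.0 / fs       # Hz -> fraction of the Nyquist frequency
stop_freq = 440.0 * 4 * 2.0 / fs
ord, wn = buttord(pass_freq, stop_freq, 3.0, 60.0)
b, a = butter(ord, wn, btype='low')

# Evaluate the response on a dense grid and convert back to Hz and dB.
w, h = freqz(b, a, worN=4096)
freqs_hz = w * fs / (2 * np.pi)
gain_db = 20 * np.log10(np.abs(h))
print 'gain at 440 Hz (dB): ', gain_db[np.searchsorted(freqs_hz, 440.0)]
print 'gain at 1760 Hz (dB):', gain_db[np.searchsorted(freqs_hz, 1760.0)]

The passband edge should come out near -3 dB and the stopband edge at or below -60 dB; if they don't, the units passed to buttord are the first thing to suspect.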
URL: From josh.holbrook at gmail.com Wed Jul 7 17:55:47 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 7 Jul 2010 13:55:47 -0800 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: Message-ID: On Wed, Jul 7, 2010 at 1:34 PM, Peter Howard wrote: > I forgot to mention that part - it plays without complaint in Window's Media > Player - but is completely silent. > Would it help if I provide the input .WAV file? > Can you attach things to posts via email? > (It is 13MB) > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > That's huge! Nevermind that. Interestingly enough, I tried messing with the problem using a random wav file I found from the internet (that plays in vlc just fine) and got an ugly error. So, I guess I can't help you in this way :S That said, here's what I would do: First, plot the array that results from reading the wav file to get an idea of what it looks like. Then, after you butterworth it up, plot the result of THAT to see what you have. That could give you an idea as to whether the problem is in the reading, the butterworthing, the output, or whatever. Just my $0.02. Good luck! --Josh p.s: If anyone cares about the error I got, here you go: In [12]: a,b=wavfile.read('viacom2.wav') Reading fmt chunk Reading data chunk Warning: %s chunk not understood --------------------------------------------------------------------------- error Traceback (most recent call last) /home/josh/ in () /usr/lib/python2.6/site-packages/scipy/io/wavfile.pyc in read(file) 65 else: 66 print "Warning: %s chunk not understood" ---> 67 size = struct.unpack('I',fid.read(4))[0] 68 bytes = fid.read(size) 69 fid.close() error: unpack requires a string argument of length 4 The file could be borked for all I know, and maybe this was fixed a long time ago and my scipy package (probably stock fedora) could be a bit stale, but the %s bit makes me think that maybe that part wasn't completely finished? Anyways. From ben.root at ou.edu Wed Jul 7 17:59:42 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 7 Jul 2010 16:59:42 -0500 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: Message-ID: Well, to give you hope, it is feasible as some friends of mine did a notch filter on some .wav files at the conference last week. They had a bunch of issues as well. I will see if he is actively reading this list. Ben Root On Wed, Jul 7, 2010 at 4:55 PM, Joshua Holbrook wrote: > On Wed, Jul 7, 2010 at 1:34 PM, Peter Howard > wrote: > > I forgot to mention that part - it plays without complaint in Window's > Media > > Player - but is completely silent. > > Would it help if I provide the input .WAV file? > > Can you attach things to posts via email? > > (It is 13MB) > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > That's huge! Nevermind that. > > Interestingly enough, I tried messing with the problem using a random > wav file I found from the internet (that plays in vlc just fine) and > got an ugly error. So, I guess I can't help you in this way :S > > That said, here's what I would do: First, plot the array that results > from reading the wav file to get an idea of what it looks like. Then, > after you butterworth it up, plot the result of THAT to see what you > have. 
That could give you an idea as to whether the problem is in the > reading, the butterworthing, the output, or whatever. > > Just my $0.02. Good luck! > > --Josh > > p.s: > > If anyone cares about the error I got, here you go: > > In [12]: a,b=wavfile.read('viacom2.wav') > Reading fmt chunk > Reading data chunk > Warning: %s chunk not understood > --------------------------------------------------------------------------- > error Traceback (most recent call last) > > /home/josh/ in () > > /usr/lib/python2.6/site-packages/scipy/io/wavfile.pyc in read(file) > 65 else: > 66 print "Warning: %s chunk not understood" > ---> 67 size = struct.unpack('I',fid.read(4))[0] > 68 bytes = fid.read(size) > 69 fid.close() > > error: unpack requires a string argument of length 4 > > The file could be borked for all I know, and maybe this was fixed a > long time ago and my scipy package (probably stock fedora) could be a > bit stale, but the %s bit makes me think that maybe that part wasn't > completely finished? Anyways. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_baddeley at yahoo.com.au Wed Jul 7 18:11:12 2010 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Wed, 7 Jul 2010 15:11:12 -0700 (PDT) Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: Message-ID: <912430.75679.qm@web33005.mail.mud.yahoo.com> In addition to the comments that have already been made (suggesting you plot the result etc ...), I think you might have problems with your final cast (the asarray(filtered, int) bit). On my computer that results in a 64 bit integer type (might be 32 bit on some platforms). I guess you want a 16 bit integer, as this is what most sound programs will be expecting. You could try: asarray(filtered, 'int16) or filtered.astype('int16') cheers, David ________________________________ From: Peter Howard To: scipy-user at scipy.org Sent: Thu, 8 July, 2010 6:40:31 AM Subject: [SciPy-User] butterworth filter on .WAV file I'm trying to write a very simple example of applying a band pass filter to a .WAV music file. I'm distinctly rusty on DSP and inexperienced with SciPy/NumPy so apologies if I've made a dumb mistake. It executes without errors or warnings. It produces the output file, but this is twice the size of the input file, which is clearly wrong. I'm most uncertain about casting the filtered data back to integers and thus being suitable for writing back out to .WAV. I'm also bit uncertain about my interpretation / understanding of the frequency and gain specifications. Any help and advice very much appreciated. 
Pete from scipy.io.wavfile import read, write from scipy.signal.filter_design import butter, buttord from scipy.signal import lfilter from numpy import asarray def convert_hertz(freq): # convert frequency in hz to units of pi rad/sample # (our .WAV is sampled at 44.1KHz) return freq * 2.0 / 44100.0 rate, sound_samples = read('monty.wav') pass_freq = convert_hertz(440.0) # pass up to 'middle C' stop_freq = convert_hertz(440.0 * 4) # max attenuation from 3 octaves higher pass_gain = 3.0 # tolerable loss in passband (dB) stop_gain = 60.0 # required attenuation in stopband (dB) ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) b, a = butter(ord, wn, btype = 'low') filtered = lfilter(b, a, sound_samples) integerised_filtered = asarray(filtered, int) write('monty-filtered.wav', rate, integerised_filtered) -------------- next part -------------- An HTML attachment was scrubbed... URL: From c-b at asu.edu Wed Jul 7 18:12:26 2010 From: c-b at asu.edu (Christopher Brown) Date: Wed, 7 Jul 2010 15:12:26 -0700 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: Message-ID: <201007071512.26819.c-b@asu.edu> I've had good luck with: # Read fs,data = read(infilename) data = np.float64(data/32768.) # ... process ... # Write write(outfilename, fs, np.int16(data*32768)) On Wednesday 07 July 2010 11:40:31 Peter Howard wrote: > I'm trying to write a very simple example of applying a band pass filter to > a .WAV music file. > I'm distinctly rusty on DSP and inexperienced with SciPy/NumPy so apologies > if I've made a dumb mistake. > > It executes without errors or warnings. > It produces the output file, but this is twice the size of the input file, > which is clearly wrong. > I'm most uncertain about casting the filtered data back to integers and > thus being suitable for writing back out to .WAV. > I'm also bit uncertain about my interpretation / understanding of the > frequency and gain specifications. > > Any help and advice very much appreciated. > > Pete > > > > > from scipy.io.wavfile import read, write > from scipy.signal.filter_design import butter, buttord > from scipy.signal import lfilter > from numpy import asarray > > def convert_hertz(freq): > # convert frequency in hz to units of pi rad/sample > # (our .WAV is sampled at 44.1KHz) > return freq * 2.0 / 44100.0 > > rate, sound_samples = read('monty.wav') > pass_freq = convert_hertz(440.0) # pass up to 'middle C' > stop_freq = convert_hertz(440.0 * 4) # max attenuation from 3 octaves > higher pass_gain = 3.0 # tolerable loss in passband (dB) > stop_gain = 60.0 # required attenuation in stopband (dB) > ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) > b, a = butter(ord, wn, btype = 'low') > filtered = lfilter(b, a, sound_samples) > integerised_filtered = asarray(filtered, int) > write('monty-filtered.wav', rate, integerised_filtered) -- Christopher Brown, Ph.D. Associate Research Professor Department of Speech and Hearing Science Arizona State University http://pal.asu.edu From cgohlke at uci.edu Wed Jul 7 18:57:51 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Wed, 07 Jul 2010 15:57:51 -0700 Subject: [SciPy-User] many test failures on windows 64 In-Reply-To: References: <4C33330F.1000807@gmail.com> <4C33652F.4070002@noaa.gov> Message-ID: <4C35066F.9000205@uci.edu> On 7/6/2010 11:43 AM, Pauli Virtanen wrote: > Tue, 06 Jul 2010 11:53:22 -0600, Charles R Harris wrote: > [clip] >> Let's get this thread back to the errors. 
The problems seem specific to >> the python.org amd64 python, is that correct? > > The SuperLU failures puzzle me. It should be "straightforward" C code, > and I don't understand what can go wrong there. The "Factor is exactly > singular" error indicates essentially means that SuperLU thinks it > detects a zero pivot or something, so something seems to fail at a fairly > low level. > > This seems quite difficult to debug without a Win-64 at hand. > > Another thing is that Gohlke's binaries are also built against MKL, and > SuperLU does call BLAS routines. I wonder if something can break because > of that... Apparently it can. When I link SuperLU against CBLAS (from the SuperLU_4.0 distribution) instead of MKL 10.2 the test errors "RuntimeError: Factor is exactly singular" disappear. Now I get many more "TypeError: array cannot be safely cast to required type" failures, but that's a different problem. -- Christoph From peterhoward42 at gmail.com Thu Jul 8 07:41:34 2010 From: peterhoward42 at gmail.com (Peter Howard) Date: Thu, 8 Jul 2010 12:41:34 +0100 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: <201007071512.26819.c-b@asu.edu> References: <201007071512.26819.c-b@asu.edu> Message-ID: This tip from Chris, has nearly solved the problem. Thanks to everybody that has contributed. I now get sound out that sounds similar to but different from the input - so the fundamentals are sorted thank you. However I can only get it to work by experimentation with the frequencies and gains - it doesn't work with my understanding of what they theoretically should be. Either the filter design module objects to the coefficients or I get silence. Also, (and almost certainly significantly) the filtered output is only on the right stereo channel. I wonder if each integer sound sample in a WAV file is split bitwise into left and right sectors and the type conversion is corrupting half of each sample? Here's the state of play: from scipy.io.wavfile import read, write from scipy.signal.filter_design import butter, buttord from scipy.signal import lfilter, lfiltic import numpy as np from math import log rate, sound_samples = read('monty.wav') sound_samples = np.float64(sound_samples / 32768.0) pass_freq = 0.2 stop_freq = 0.3 pass_gain = 0.5 # permissible loss (ripple) in passband (dB) stop_gain = 10.0 # attenuation required in stopband (dB) ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) b, a = butter(ord, wn, btype = 'low') filtered = lfilter(b, a, sound_samples) filtered = np.int16(filtered * 32768 * 10) write('monty-filtered.wav', rate, filtered) Pete -------------- next part -------------- An HTML attachment was scrubbed... URL: From seb.haase at gmail.com Thu Jul 8 07:53:35 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 8 Jul 2010 13:53:35 +0200 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: <201007071512.26819.c-b@asu.edu> Message-ID: You could "post" (upload) large files maybe to http://drop.io for everyone interested to listen in (100MB file limit) -Sebastian On Thu, Jul 8, 2010 at 1:41 PM, Peter Howard wrote: > This tip from Chris, has nearly solved the problem. > Thanks to everybody that has contributed. > I now get sound out that sounds similar to but different from the input - so > the fundamentals are sorted thank you. > However I can only get it to work by experimentation with the frequencies > and gains - it doesn't work with my understanding of what they theoretically > should be. 
Either the filter design module objects to the coefficients or I > get silence. > Also, (and almost certainly significantly) the filtered output is only on > the right stereo channel. > I wonder if each integer sound sample in a WAV file is split bitwise into > left and right sectors and the type conversion is corrupting half of each > sample? > > Here's the state of play: > > > from scipy.io.wavfile import read, write > from scipy.signal.filter_design import butter, buttord > from scipy.signal import lfilter, lfiltic > import numpy as np > from math import log > > rate, sound_samples = read('monty.wav') > sound_samples = np.float64(sound_samples / 32768.0) > pass_freq = 0.2 > stop_freq = 0.3 > pass_gain = 0.5 # permissible loss (ripple) in passband (dB) > stop_gain = 10.0 # attenuation required in stopband (dB) > ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) > b, a = butter(ord, wn, btype = 'low') > filtered = lfilter(b, a, sound_samples) > filtered = np.int16(filtered * 32768 * 10) > write('monty-filtered.wav', rate, filtered) > > > > > > Pete > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From peterhoward42 at gmail.com Thu Jul 8 08:00:02 2010 From: peterhoward42 at gmail.com (Peter Howard) Date: Thu, 8 Jul 2010 13:00:02 +0100 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: <201007071512.26819.c-b@asu.edu> Message-ID: Hang on a minute - my brain is working backwards... am I wasting everyone's time here... If a .WAV file contains stereo sound data - then it's never going to work sucking it in and simply applying a filter to the numbers is it? So how come it sort of works? Or is the entire NumPy/SciPy suite so clever about working in N-dimensions that it treats the two sound channels as multi-dimensioned arrays right from the .WAV read() call? - and the filtering is separate for each dimension? Pete -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Thu Jul 8 09:21:20 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 08 Jul 2010 10:21:20 -0300 Subject: [SciPy-User] butterworth filter on .WAV file In-Reply-To: References: <201007071512.26819.c-b@asu.edu> Message-ID: <1278595280.15708.9.camel@Portable-s2m.cnrs-mrs.fr> Le jeudi 08 juillet 2010 ? 13:00 +0100, Peter Howard a ?crit : > Hang on a minute - my brain is working backwards... am I wasting > everyone's time here... > > If a .WAV file contains stereo sound data - then it's never going to > work sucking it in and simply applying a filter to the numbers is it? > > So how come it sort of works? > > Or is the entire NumPy/SciPy suite so clever about working in > N-dimensions that it treats the two sound channels as > multi-dimensioned arrays right from the .WAV read() call? - and the > filtering is separate for each dimension? > So you might consider NumPy/SciPy clever! When reading a wav file, the output array is a 2D array with shape (M,N) where M is the number of samples of each channel (time range*sampling frequency) and N is the number of channels. Each channel is stored in a column of the output array. And it is so clever that it handles to apply a filter (with scipy.signal.lfilter) on each of the columns of the array. 
Example In [2]: import scipy.io.wavfile as wv In [3]: Fs,Sig = wv.read("STE-023.wav") Warning: %s chunk not understood Reading fmt chunk Reading data chunk In [4]: Fs Out[4]: 44100 In [5]: Sig.shape, Sig.dtype Out[5]: ((4434112, 2), dtype('int16')) In [6]: import scipy.signal as ss In [8]: SigFilt = ss.lfilter([1],[1, .1], Sig) Out[8]: array([[ 0. , 0. ], [ 4. , -9.4], [ 7. , -20.7], ..., [ 0. , 0. ], [ 0. , 0. ], [ 0. , 0. ]]) In [9]: SigFilt.shape, SigFilt.dtype Out[9]: ((4434112, 2), dtype('float64')) From eijkhout at tacc.utexas.edu Thu Jul 8 12:44:49 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 8 Jul 2010 11:44:49 -0500 Subject: [SciPy-User] scalars vs array of length 1 Message-ID: I want to write a function that can accept both a scalar and a vector, when called on a vector it should return a vector of scalar application results. However, scalars seem to be treated differently from length-1 lists. %%%%% input from numpy import array,matrix,cos,sin,tan def f(x): valueMatrix = array([cos(x),sin(x),tan(x)]) print valueMatrix.shape valueMatrix = matrix(valueMatrix) print valueMatrix.shape print print "scalars" f(5) f([5]) %%% output scalars (3,) (1, 3) (3, 1) (3, 1) %%%% How do I get that shape to be the same in both cases? Victor. From denis-bz-gg at t-online.de Thu Jul 8 13:17:51 2010 From: denis-bz-gg at t-online.de (denis) Date: Thu, 8 Jul 2010 10:17:51 -0700 (PDT) Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? Message-ID: Folks, to get the best few of a large number of objects, e.g. vectors near a given one, or small distances in spatial.distance.cdist or .pdist, argsort( bigArray )[: a few ] is not so hot. It would be nice if argsort( bigArray, few= ) did this -- faster, save mem too. Would anyone else find this useful ? I recently stumbled across partial_sort in stl; fwiw, std:partial_sort( A, A + sqrt(N), A + N ) is ~ 10 times faster than std:sort on my old mac ppc, even for N 100. Also fwiw, nth_element alone is ~ twice as slow as partial_sort -- odd. cheers -- denis From dwf at cs.toronto.edu Thu Jul 8 13:23:15 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 8 Jul 2010 13:23:15 -0400 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: References: Message-ID: <62179F55-B9B5-4CF7-8F05-727BE9331DF6@cs.toronto.edu> On 2010-07-08, at 12:44 PM, Victor Eijkhout wrote: > I want to write a function that can accept both a scalar and a vector, when called on a vector it should return a vector of scalar application results. However, scalars seem to be treated differently from length-1 lists. ... > > How do I get that shape to be the same in both cases? x = atleast_1d(x) David From eijkhout at tacc.utexas.edu Thu Jul 8 13:29:08 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 8 Jul 2010 12:29:08 -0500 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: <62179F55-B9B5-4CF7-8F05-727BE9331DF6@cs.toronto.edu> References: <62179F55-B9B5-4CF7-8F05-727BE9331DF6@cs.toronto.edu> Message-ID: <6E96CD6F-D3E3-47A5-82F8-9B00B5314397@tacc.utexas.edu> On 2010/07/08, at 12:23 PM, David Warde-Farley wrote: > > On 2010-07-08, at 12:44 PM, Victor Eijkhout wrote: > >> I want to write a function that can accept both a scalar and a vector, when called on a vector it should return a vector of scalar application results. However, scalars seem to be treated differently from length-1 lists. > > ... > >> >> How do I get that shape to be the same in both cases? > > x = atleast_1d(x) Fantastic. 
Thanks! Victor. From kwgoodman at gmail.com Thu Jul 8 13:30:28 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 8 Jul 2010 10:30:28 -0700 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 10:17 AM, denis wrote: > Folks, > ?to get the best few of a large number of objects, > e.g. vectors near a given one, or small distances in > spatial.distance.cdist or .pdist, > argsort( bigArray )[: a few ] is not so hot. ?It would be nice if > ? ?argsort( bigArray, few= ) > did this -- faster, save mem too. Would anyone else find this useful ? > > I recently stumbled across partial_sort in stl; fwiw, > std:partial_sort( A, A + sqrt(N), A + N ) is ~ 10 times faster than > std:sort > on my old mac ppc, even for N 100. > Also fwiw, nth_element alone is ~ twice as slow as partial_sort -- > odd. I think a lot of people would like a partial sort. It comes up on the list now and then. There's a ticket with a cython partial sort: http://projects.scipy.org/numpy/ticket/1213 Here's the docstring: def select(a, k, inplace=False): ''' Wirth's version of Hoare's quick select Parameters ---------- a : array_like k : integer inplace : boolean The partial sort is done inplace if a is a contiguous ndarray and inplace=True. Default: False. Returns ------- out : ndarray Partially sorted a such that out[k] is the k largest element. Elements smaller than out[k] are unsorted in out[:k]. Elements larger than out[k] are unsorted in out[k:]. ''' From dwf at cs.toronto.edu Thu Jul 8 13:30:54 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 8 Jul 2010 13:30:54 -0400 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: <776399F1-FEC9-43BF-AF47-4ECE0FC2F6C8@cs.toronto.edu> On 2010-07-08, at 1:17 PM, denis wrote: > argsort( bigArray )[: a few ] is not so hot. It would be nice if > argsort( bigArray, few= ) > did this -- faster, save mem too. Would anyone else find this useful ? Quite likely. I've seen these sorts of algorithms written up elsewhere and they indeed have favourable time complexity, but I've never had sufficient need to go and implement them. I don't think we'd want to introduce C++ dependencies in NumPy though, one would probably want to modify the existing sort/argsort machinery. David From devicerandom at gmail.com Thu Jul 8 13:46:07 2010 From: devicerandom at gmail.com (ms) Date: Thu, 08 Jul 2010 18:46:07 +0100 Subject: [SciPy-User] optimize.leastsq and improper input parameters Message-ID: <4C360EDF.3080805@gmail.com> Hi, I am stuck with optimize.leastsq. I am writing a quite complicated code that fits data to two variations of the same function, each of those can be with or without some parameters fixed. 
The concept is: - write the two variations - have a list of what parameters will be fixed and which not - use reasonable starting points of the non-fixed params to fit, and keep the fixed params as fixed within the function - then have a generalized routine that has as input the function to actually fit, among other things - apply leastsq on that function What comes out is similar, in structure, to this simplified example: ----- import scipy as sp import scipy.optimize as opt import numpy as np #Initial data x = np.arange(0,10,1) y = [i**2 for i in x] #Three nice functions to fit def xexponent1(param,i): exp = param[0] return i**exp def xexponent2(param, i): exp_a=param[0] exp_b=param[1] return i**(exp_a) + i**(exp_b) def line(param,i): A = param[0] b = param[1] return i*b + A def f_to_minimize(pars,args): #Generalized function we use to minimize the fit #calculates function and squared residuals function,x,y = args[0],args[1],args[2] y_estimate = [function(pars,xi) for xi in x] #calculate squared residuals resid = [(i-j)**2 for i,j in zip(y,y_estimate)] return sum(resid) def minimize(x,y,func,p0): #calls minimization args = [func,x,y] i = opt.leastsq(f_to_minimize, p0, args) print i minimize(x,y,line,[1,2]) ----- Here you can play with p0 and the function to give to f_to_minimize as an argument. What comes out is that if I give a p0 = [1] and I use a single-variable function, it works. As soon as I try a two-variable function (and thus I need two input parameters), I get: massimo at boltzmann:~/work$ python test_norm_leastsq.py Traceback (most recent call last): File "test_norm_leastsq.py", line 44, in minimize(x,y,[1,2]) File "test_norm_leastsq.py", line 41, in minimize i = opt.leastsq(f_to_minimize, p0, args) File "/usr/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 300, in leastsq raise errors[info][1], errors[info][0] TypeError: Improper input parameters. The funny thing is that it worked *before* I messed with the thing to simplify the function-choosing mechanism (before I had N different functions for each combination of fixed/nonfixed params, now I just have two and I fix stuff *inside* the function), and I can't see however how can this be different. Also, the example above leaves me perplexed -it seems leastsq simply doesn't want two-variable functions to be minimized in this case. Any hint? Thanks a lot, Massimo From aarchiba at physics.mcgill.ca Thu Jul 8 14:14:29 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Thu, 8 Jul 2010 14:14:29 -0400 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: Just to complicate the issue somewhat, it might be valuable to have both k-largest and k-smallest; also, one should think about what happens to NaNs and infinities. Further, it might be worth including a fast median calculation (i.e. find the median element(s) without completely sorting the array). Anne On 8 July 2010 13:30, Keith Goodman wrote: > On Thu, Jul 8, 2010 at 10:17 AM, denis wrote: >> Folks, >> to get the best few of a large number of objects, >> e.g. vectors near a given one, or small distances in >> spatial.distance.cdist or .pdist, >> argsort( bigArray )[: a few ] is not so hot. It would be nice if >> argsort( bigArray, few= ) >> did this -- faster, save mem too. Would anyone else find this useful ? >> >> I recently stumbled across partial_sort in stl; fwiw, >> std:partial_sort( A, A + sqrt(N), A + N ) is ~ 10 times faster than >> std:sort >> on my old mac ppc, even for N 100. 
>> Also fwiw, nth_element alone is ~ twice as slow as partial_sort -- >> odd. > > I think a lot of people would like a partial sort. It comes up on the > list now and then. There's a ticket with a cython partial sort: > > http://projects.scipy.org/numpy/ticket/1213 > > Here's the docstring: > > def select(a, k, inplace=False): > ''' > Wirth's version of Hoare's quick select > > Parameters > ---------- > a : array_like > k : integer > inplace : boolean > The partial sort is done inplace if a is a > contiguous ndarray and inplace=True. > Default: False. > > Returns > ------- > out : ndarray > Partially sorted a such that out[k] is > the k largest element. Elements smaller than > out[k] are unsorted in out[:k]. Elements larger > than out[k] are unsorted in out[k:]. > > ''' > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Thu Jul 8 14:19:12 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 8 Jul 2010 11:19:12 -0700 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 11:14 AM, Anne Archibald wrote: > Just to complicate the issue somewhat, it might be valuable to have > both k-largest and k-smallest; also, one should think about what > happens to NaNs and infinities. Further, it might be worth including a > fast median calculation (i.e. find the median element(s) without > completely sorting the array). I should have mentioned the name of the ticket: median in average O(n) time. From charlesr.harris at gmail.com Thu Jul 8 14:29:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 8 Jul 2010 12:29:37 -0600 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 12:19 PM, Keith Goodman wrote: > On Thu, Jul 8, 2010 at 11:14 AM, Anne Archibald > wrote: > > Just to complicate the issue somewhat, it might be valuable to have > > both k-largest and k-smallest; also, one should think about what > > happens to NaNs and infinities. Further, it might be worth including a > > fast median calculation (i.e. find the median element(s) without > > completely sorting the array). > > I should have mentioned the name of the ticket: median in average O(n) > time. > Yeah, I've got that. I was thinking of adding it as a generalized ufunc but haven't gotten around to it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Thu Jul 8 14:37:10 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 8 Jul 2010 11:37:10 -0700 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 11:29 AM, Charles R Harris wrote: > > > On Thu, Jul 8, 2010 at 12:19 PM, Keith Goodman wrote: >> >> On Thu, Jul 8, 2010 at 11:14 AM, Anne Archibald >> wrote: >> > Just to complicate the issue somewhat, it might be valuable to have >> > both k-largest and k-smallest; also, one should think about what >> > happens to NaNs and infinities. Further, it might be worth including a >> > fast median calculation (i.e. find the median element(s) without >> > completely sorting the array). >> >> I should have mentioned the name of the ticket: median in average O(n) >> time. > > Yeah, I've got that. 
I was thinking of adding it as a generalized ufunc but > haven't gotten around to it. You'll be a hero when you do it. There's got to be some ticker tape around here somewhere... From seb.haase at gmail.com Thu Jul 8 15:09:48 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 8 Jul 2010 21:09:48 +0200 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 6:44 PM, Victor Eijkhout wrote: > I want to write a function that can accept both a scalar and a vector, when called on a vector it should return a vector of scalar application results. However, scalars seem to be treated differently from length-1 lists. > > %%%%% input > > from numpy import array,matrix,cos,sin,tan > > def f(x): > ? ?valueMatrix = array([cos(x),sin(x),tan(x)]) > ? ?print valueMatrix.shape > ? ?valueMatrix = matrix(valueMatrix) > ? ?print valueMatrix.shape > ? ?print > > print "scalars" > f(5) > f([5]) > > %%% output > > scalars > (3,) > (1, 3) > > (3, 1) > (3, 1) > > %%%% > > How do I get that shape to be the same in both cases? > > Victor. why are you mixing numpy.array and numpy.matrix !? Are you sure you need both matrix and array ? Matrixs are like arrays except for the the *-operator, and the shape is always made to 2d (at least 2d maybe !!?? I don't know ....) Most people need only array - ever .... [ ;-) ] - Sebastian Haase From eijkhout at tacc.utexas.edu Thu Jul 8 15:27:03 2010 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 8 Jul 2010 14:27:03 -0500 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: References: Message-ID: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> On 2010/07/08, at 2:09 PM, Sebastian Haase wrote: > why are you mixing numpy.array and numpy.matrix !? > Are you sure you need both matrix and array ? > Matrixs are like arrays except for the the *-operator, I need true matrix-matrix (or rather matrix-vector) multiplication. > and the shape > is always made to 2d (at least 2d maybe !!?? I don't know ....) That's what it looks like to me. I'm not sure that I like the fact that a vector is a 2d matrix. You should be able to subscript a vector with one index, not two. Also, limiting to 2 means you can not extend to tensors, which is what I'll be doing shortly. Victor. From kwgoodman at gmail.com Thu Jul 8 15:32:51 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 8 Jul 2010 12:32:51 -0700 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> References: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> Message-ID: On Thu, Jul 8, 2010 at 12:27 PM, Victor Eijkhout wrote: > > On 2010/07/08, at 2:09 PM, Sebastian Haase wrote: > >> why are you mixing numpy.array and numpy.matrix !? >> Are you sure you need both matrix and array ? >> Matrixs are like arrays except for the the *-operator, > > I need true matrix-matrix (or rather matrix-vector) multiplication. > >> and the shape >> is always made to 2d (at least 2d maybe !!?? I don't know ....) > > That's what it looks like to me. I'm not sure that I like the fact that a vector is a 2d matrix. You should be able to subscript a vector with one index, not two. > > Also, limiting to 2 means you can not extend to tensors, which is what I'll be doing shortly. Sounds like you want arrays. Instead of mat1 * mat2 the dot product for arrays is np.dot(arr1, arr2) and a vector is 1d. 
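A small sketch of that suggestion applied to the f() example from earlier in the thread (illustrative only, not the poster's final code): with plain arrays, np.atleast_1d makes the scalar and length-1 inputs come out with the same shape, and np.dot gives the matrix-vector product without numpy.matrix.

import numpy as np

def f(x):
    x = np.atleast_1d(x)               # scalar 5 and [5] both become shape (1,)
    value = np.array([np.cos(x), np.sin(x), np.tan(x)])
    print value.shape                  # (3, 1) in both cases
    return value

v = f(5)
f([5])

A = np.arange(6.0).reshape(2, 3)       # an arbitrary 2x3 array, just for the product
print np.dot(A, v).shape               # (2, 1): ordinary matrix-vector multiplication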
From sturla at molden.no Thu Jul 8 16:30:31 2010 From: sturla at molden.no (Sturla Molden) Date: Thu, 08 Jul 2010 22:30:31 +0200 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: References: Message-ID: <4C363567.707@molden.no> Victor Eijkhout skrev: > I want to write a function that can accept both a scalar and a vector, when called on a vector it should return a vector of scalar application results. However, scalars seem to be treated differently from length-1 lists. > > Is Matlab syntax is confusing you? In Python, length-1 containers and scalars are not the same. Lists: >>> a = [1,2,3,4] >>> print a [1, 2, 3, 4] >>> print a[0] 1 >>> print a[0:1] [1] >>> print a[0:0] [] NumPy does the same: >>> import numpy as np >>> a = np.array([1,2,3,4]) >>> print a [1 2 3 4] >>> print a[0] 1 >>> print a[0:1] [1] >>> print a[0:0] [] >>> a array([1, 2, 3, 4]) >>> type(a[0]) >>> type(a[0:1]) You can use the function np.isscalar to check if an argument is scalar. Sturla > %%%%% input > > from numpy import array,matrix,cos,sin,tan > > def f(x): > valueMatrix = array([cos(x),sin(x),tan(x)]) > print valueMatrix.shape > valueMatrix = matrix(valueMatrix) > print valueMatrix.shape > print > > print "scalars" > f(5) > f([5]) > > %%% output > > scalars > (3,) > (1, 3) > > (3, 1) > (3, 1) > > %%%% > > How do I get that shape to be the same in both cases? > > Victor. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Thu Jul 8 16:50:50 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 8 Jul 2010 16:50:50 -0400 Subject: [SciPy-User] optimize.leastsq and improper input parameters In-Reply-To: <4C360EDF.3080805@gmail.com> References: <4C360EDF.3080805@gmail.com> Message-ID: On Thu, Jul 8, 2010 at 1:46 PM, ms wrote: > Hi, > > I am stuck with optimize.leastsq. > I am writing a quite complicated code that fits data to two variations > of the same function, each of those can be with or without some > parameters fixed. The concept is: > - write the two variations > - have a list of what parameters will be fixed and which not > - use reasonable starting points of the non-fixed params to fit, and > keep the fixed params as fixed within the function > - then have a generalized routine that has as input the function to > actually fit, among other things > - apply leastsq on that function > > What comes out is similar, in structure, to this simplified example: > ----- > import scipy as sp > import scipy.optimize as opt > import numpy as np > > > #Initial data > x = np.arange(0,10,1) > y = [i**2 for i in x] > > #Three nice functions to fit > def xexponent1(param,i): > ? ? exp = param[0] > ? ? return i**exp > > def xexponent2(param, i): > ? ? exp_a=param[0] > ? ? exp_b=param[1] > ? ? return i**(exp_a) + i**(exp_b) > > def line(param,i): > ? ? A = param[0] > ? ? b = param[1] > ? ? return i*b + A > > > def f_to_minimize(pars,args): > ? ? #Generalized function we use to minimize the fit > ? ? #calculates function and squared residuals > ? ? function,x,y = args[0],args[1],args[2] > ? ? y_estimate = [function(pars,xi) for xi in x] > > ? ? #calculate squared residuals > ? ? resid = [(i-j)**2 for i,j in zip(y,y_estimate)] > ? ? return sum(resid) > > def minimize(x,y,func,p0): > ? ? #calls minimization > ? ? args = [func,x,y] > ? ? i = opt.leastsq(f_to_minimize, p0, args) > ? ? 
print i > > minimize(x,y,line,[1,2]) > ----- > > Here you can play with p0 and the function to give to f_to_minimize as > an argument. > > What comes out is that if I give a p0 = [1] and I use a single-variable > function, it works. As soon as I try a two-variable function (and thus I > need two input parameters), I get: > > massimo at boltzmann:~/work$ python test_norm_leastsq.py > Traceback (most recent call last): > ? File "test_norm_leastsq.py", line 44, in > ? ? minimize(x,y,[1,2]) > ? File "test_norm_leastsq.py", line 41, in minimize > ? ? i = opt.leastsq(f_to_minimize, p0, args) > ? File "/usr/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 300, in leastsq > ? ? raise errors[info][1], errors[info][0] > TypeError: Improper input parameters. > > The funny thing is that it worked *before* I messed with the thing to > simplify the function-choosing mechanism (before I had N different > functions for each combination of fixed/nonfixed params, now I just have > two and I fix stuff *inside* the function), and I can't see however how > can this be different. Also, the example above leaves me perplexed -it > seems leastsq simply doesn't want two-variable functions to be minimized > in this case. Any hint? change in def f_to_minimize(pars,args): #calculate squared residuals ## resid = [(i-j)**2 for i,j in zip(y,y_estimate)] ## return sum(resid) return [(i-j) for i,j in zip(y,y_estimate)] leastsq does the squared sum itself, and needs as return of the function the vector of residuals docstring: func ? A Python function or method which takes at least one (possibly length N vector) argument and returns M floating point numbers with only one parameter leastsq assumes you have one observation and one parameter N=1, M=1 this is what I get running the changed script (array([-12., 9.]), 2) I didn't check anything else in your script. Josef > > Thanks a lot, > Massimo > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Thu Jul 8 16:59:42 2010 From: sturla at molden.no (Sturla Molden) Date: Thu, 08 Jul 2010 22:59:42 +0200 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: References: Message-ID: <4C363C3E.2070009@molden.no> Victor Eijkhout skrev: > How do I get that shape to be the same in both cases? > x = (x,) if np.isscalar(x) else x Scalars are not iterable, which is why array([cos(x),sin(x),tan(x)]) becomes a vector if x is scalar. From Chris.Barker at noaa.gov Thu Jul 8 17:44:58 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 08 Jul 2010 14:44:58 -0700 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> References: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> Message-ID: <4C3646DA.5040006@noaa.gov> Victor Eijkhout wrote: > That's what it looks like to me. I'm not sure that I like the fact that a vector is a 2d matrix. One of the reasons np.matrix hasn't really caught on -- I think it needs a more complete implementation (row and column vectors) to really be useful, but then I don't want it anyway. (when I used MATLAB, I had to write ".*" all the time, it was really annpying, far less annoying that writing np.dot(x,y) once in a while) If you want to write some code, search this list (and the wiki for discussions, many of the issues have been hashed out, but so far no one wants it bad enough to write the code. -Chris -- Christopher Barker, Ph.D. 
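A self-contained sketch of the point made in the leastsq thread above (illustrative only): the function handed to optimize.leastsq must return the vector of residuals, because leastsq squares and sums them internally.

import numpy as np
from scipy.optimize import leastsq

x = np.arange(10.0)
y = x**2                               # the data from the original example

def line(p, x):
    A, b = p
    return A + b*x

def residuals(p, x, y):
    return y - line(p, x)              # vector of residuals, NOT sum((y - f)**2)

p_opt, ier = leastsq(residuals, [1.0, 2.0], args=(x, y))
print p_opt                            # array([-12.,   9.]), the answer reported in the thread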
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From aisaac at american.edu Thu Jul 8 17:48:45 2010 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 08 Jul 2010 17:48:45 -0400 Subject: [SciPy-User] scalars vs array of length 1 In-Reply-To: <4C3646DA.5040006@noaa.gov> References: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> <4C3646DA.5040006@noaa.gov> Message-ID: <4C3647BD.5030004@american.edu> On 7/8/2010 5:44 PM, Christopher Barker wrote: > np.dot(x,y) Which reminds me: expected in 1.4.1 to be able to do x.dot(y), but it's not there. Will it be in 1.5? Thanks, Alan Isaac From devicerandom at gmail.com Thu Jul 8 19:13:49 2010 From: devicerandom at gmail.com (ms) Date: Fri, 09 Jul 2010 00:13:49 +0100 Subject: [SciPy-User] optimize.leastsq and improper input parameters In-Reply-To: References: <4C360EDF.3080805@gmail.com> Message-ID: <4C365BAD.3040101@gmail.com> On 08/07/10 21:50, josef.pktd at gmail.com wrote: > On Thu, Jul 8, 2010 at 1:46 PM, ms wrote: >> massimo at boltzmann:~/work$ python test_norm_leastsq.py >> Traceback (most recent call last): >> File "test_norm_leastsq.py", line 44, in >> minimize(x,y,[1,2]) >> File "test_norm_leastsq.py", line 41, in minimize >> i = opt.leastsq(f_to_minimize, p0, args) >> File "/usr/lib/python2.6/dist-packages/scipy/optimize/minpack.py", >> line 300, in leastsq >> raise errors[info][1], errors[info][0] >> TypeError: Improper input parameters. >> >> The funny thing is that it worked *before* I messed with the thing to >> simplify the function-choosing mechanism (before I had N different >> functions for each combination of fixed/nonfixed params, now I just have >> two and I fix stuff *inside* the function), and I can't see however how >> can this be different. Also, the example above leaves me perplexed -it >> seems leastsq simply doesn't want two-variable functions to be minimized >> in this case. Any hint? > > > change in def f_to_minimize(pars,args): > > #calculate squared residuals > ## resid = [(i-j)**2 for i,j in zip(y,y_estimate)] > ## return sum(resid) > return [(i-j) for i,j in zip(y,y_estimate)] > > leastsq does the squared sum itself, and needs as return of the > function the vector of residuals > > docstring: > func ? A Python function or method which takes at least one > (possibly length N vector) argument and returns M floating point numbers > > with only one parameter leastsq assumes you have one observation and > one parameter N=1, M=1 > > this is what I get running the changed script > (array([-12., 9.]), 2) > > I didn't check anything else in your script. > > Josef Thanks! I didn't notice that leastsq was so clever :) I tried quickly now and I still have some trouble, but seems unrelated to the issue, tomorrow I'll see if it's definitely fixed. thanks again, m. From josef.pktd at gmail.com Thu Jul 8 19:21:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 8 Jul 2010 19:21:20 -0400 Subject: [SciPy-User] optimize.leastsq and improper input parameters In-Reply-To: <4C365BAD.3040101@gmail.com> References: <4C360EDF.3080805@gmail.com> <4C365BAD.3040101@gmail.com> Message-ID: On Thu, Jul 8, 2010 at 7:13 PM, ms wrote: > On 08/07/10 21:50, josef.pktd at gmail.com wrote: >> On Thu, Jul 8, 2010 at 1:46 PM, ms ?wrote: > >>> massimo at boltzmann:~/work$ python test_norm_leastsq.py >>> Traceback (most recent call last): >>> ? 
?File "test_norm_leastsq.py", line 44, in >>> ? ? ?minimize(x,y,[1,2]) >>> ? ?File "test_norm_leastsq.py", line 41, in minimize >>> ? ? ?i = opt.leastsq(f_to_minimize, p0, args) >>> ? ?File "/usr/lib/python2.6/dist-packages/scipy/optimize/minpack.py", >>> line 300, in leastsq >>> ? ? ?raise errors[info][1], errors[info][0] >>> TypeError: Improper input parameters. >>> >>> The funny thing is that it worked *before* I messed with the thing to >>> simplify the function-choosing mechanism (before I had N different >>> functions for each combination of fixed/nonfixed params, now I just have >>> two and I fix stuff *inside* the function), and I can't see however how >>> can this be different. Also, the example above leaves me perplexed -it >>> seems leastsq simply doesn't want two-variable functions to be minimized >>> in this case. Any hint? >> >> >> change in def f_to_minimize(pars,args): >> >> ? ? ?#calculate squared residuals >> ## ? ?resid = [(i-j)**2 for i,j in zip(y,y_estimate)] >> ## ? ?return sum(resid) >> ? ? ?return [(i-j) for i,j in zip(y,y_estimate)] >> >> leastsq does the squared sum itself, and needs as return of the >> function the vector of residuals >> >> docstring: >> func ? A Python function or method which takes at least one >> (possibly length N vector) argument and returns M floating point numbers >> >> with only one parameter leastsq assumes you have one observation and >> one parameter N=1, M=1 >> >> this is what I get running the changed script >> (array([-12., ? 9.]), 2) >> >> I didn't check anything else in your script. >> >> Josef > > Thanks! I didn't notice that leastsq was so clever :) > I tried quickly now and I still have some trouble, but seems unrelated > to the issue, tomorrow I'll see if it's definitely fixed. If it's possible in your original code, I would replace the list comprehensions with numpy array operations. Josef > > thanks again, > > m. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Thu Jul 8 19:30:15 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Jul 2010 01:30:15 +0200 Subject: [SciPy-User] calculating covariances fast (and accurate) Message-ID: <4C365F87.4020901@molden.no> I needed to calculate covariances with rounding error correction. As SciPy expose BLAS, I could use dger and dgemm. Here is the code: import numpy as np import scipy as sp import scipy.linalg from scipy.linalg.fblas import dger, dgemm def mean_cov(X): n,p = X.shape m = X.mean(axis=0) # covariance matrix with correction for rounding error # S = (cx'*cx - (scx'*scx/n))/(n-1) # Am Stat 1983, vol 37: 242-247. cx = X - m scx = cx.sum(axis=0) scx_op = dger(-1.0/n,scx,scx) S = dgemm(1.0, cx.T, cx.T, beta=1.0, c=scx_op, trans_a=0, trans_b=1, overwrite_c=1) S[:] *= 1.0/(n-1) return m,S.T Let's time this couple of times against NumPy: if __name__ == '__main__': from time import clock n,p = 20000,1000 X = 2*np.random.randn(n,p) + 5 t0 = clock() m,S = X.mean(axis=0), np.cov(X, rowvar=False, bias=0) t1 = clock() print t1-t0 t0 = clock() m,S = mean_cov(X) t1 = clock() print t1-t0 L:\>meancov.py 7.39771102515 2.24604790004 L:\>meancov.py 16.1079984658 2.21100101726 That speaks for itself :-D One important lesson from this: it does not help to write a function like cov(X) in C, when we have access to optimized BLAS from Python. So let this serve as a warning against using C istead of Python. 
If we don't want to correct rounding error, dger goes out and it just becomes: def mean_cov(X): n,p = X.shape m = X.mean(axis=0) cx = X - m S = dgemm(1./(n-1), cx.T, cx.T, trans_a=0, trans_b=1) return m,S.T Sturla P.S. I should perhaps mention that I use SciPy and Numpy linked with MKL. Looking at Windows' task manager, it seems both np.cov and my code saturate all four cores. From sturla at molden.no Thu Jul 8 20:08:32 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Jul 2010 02:08:32 +0200 Subject: [SciPy-User] calculating covariances fast (and accurate) In-Reply-To: <4C365F87.4020901@molden.no> References: <4C365F87.4020901@molden.no> Message-ID: <4C366880.2060208@molden.no> Sturla Molden skrev: > One important lesson from this: it does not help to write a function > like cov(X) in C, when we have access to optimized BLAS from Python. So > let this serve as a warning against using C istead of Python. > After plowing though numpy svn, I located np.dot here: http://svn.scipy.org/svn/numpy/trunk/numpy/ma/extras.py It seems numpy.cov is written in Python too. But it uses np.dot instead of dgemm (why is that less efficient?), deals with masked arrays, and forms more temporary arrays. That's why it's slower. The biggest contribution is probably dgemm vs. np.dot. And np.cov does not correct for rounding errors (which I think it should). Sturla From kwgoodman at gmail.com Thu Jul 8 20:54:16 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 8 Jul 2010 17:54:16 -0700 Subject: [SciPy-User] calculating covariances fast (and accurate) In-Reply-To: <4C365F87.4020901@molden.no> References: <4C365F87.4020901@molden.no> Message-ID: On Thu, Jul 8, 2010 at 4:30 PM, Sturla Molden wrote: > I needed to calculate covariances with rounding error correction. As > SciPy expose BLAS, I could use dger and dgemm. Here is the code: > > import numpy as np > import scipy as sp > import scipy.linalg > from scipy.linalg.fblas import dger, dgemm > > def mean_cov(X): > ? ?n,p = X.shape > ? ?m = X.mean(axis=0) > ? ?# covariance matrix with correction for rounding error > ? ?# S = (cx'*cx - (scx'*scx/n))/(n-1) > ? ?# Am Stat 1983, vol 37: 242-247. > ? ?cx = X - m > ? ?scx = cx.sum(axis=0) > ? ?scx_op = dger(-1.0/n,scx,scx) > ? ?S = dgemm(1.0, cx.T, cx.T, beta=1.0, > ? ? ? ? ? ?c=scx_op, trans_a=0, trans_b=1, overwrite_c=1) > ? ?S[:] *= 1.0/(n-1) > ? ?return m,S.T > > Let's time this couple of times against NumPy: > > if __name__ == '__main__': > > ? ?from time import clock > > ? ?n,p = 20000,1000 > ? ?X = 2*np.random.randn(n,p) + 5 > > ? ?t0 = clock() > ? ?m,S = X.mean(axis=0), np.cov(X, rowvar=False, bias=0) > ? ?t1 = clock() > ? ?print t1-t0 > > ? ?t0 = clock() > ? ?m,S = mean_cov(X) > ? ?t1 = clock() > ? ?print t1-t0 > > L:\>meancov.py > 7.39771102515 > 2.24604790004 > > L:\>meancov.py > 16.1079984658 > 2.21100101726 > > That speaks for itself :-D > > One important lesson from this: it does not help to write a function > like cov(X) in C, when we have access to optimized BLAS from Python. So > let this serve as a warning against using C istead of Python. > > If we don't want to correct rounding error, dger goes out and it just > becomes: > > def mean_cov(X): > ? ?n,p = X.shape > ? ?m = X.mean(axis=0) > ? ?cx = X - m > ? ?S = dgemm(1./(n-1), cx.T, cx.T, trans_a=0, trans_b=1) > ? ?return m,S.T > > > Sturla > > > P.S. I should perhaps mention that I use SciPy and Numpy linked with > MKL. Looking at Windows' task manager, it seems both np.cov and my code > saturate all four cores. 
I don't have MKL, so just using one core: $ python mean_cov.py 5.09 4.73 If I switch from n,p = 20000,1000 to n,p = 2000,1000 I get: 0.51 0.48 A slightly faster version for small arrays (using dot instead of mean and sum): def mean_cov2(X): n,p = X.shape one = np.empty(n) one.fill(1.0 / n) m = np.dot(one, X) cx = X - m scx = np.dot(one, cx) scx *= n scx_op = dger(-1.0/n,scx,scx) S = dgemm(1.0, cx.T, cx.T, beta=1.0, c=scx_op, trans_a=0, trans_b=1, overwrite_c=1) S[:] *= 1.0/(n-1) return m,S.T gives $ python mean_cov.py 0.51 0.48 0.45 but does that get rid of the rounding error correction? (Pdb) ((S0 - S)*(S0 - S)).sum() 2.6895813860036798e-28 (Pdb) ((S2 - S)*(S2 - S)).sum() 2.331973369765596e-27 where S0 is np.cov, S is mean_cov, and S2 is mean_cov2. From sturla at molden.no Thu Jul 8 22:14:21 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Jul 2010 04:14:21 +0200 Subject: [SciPy-User] Cholesky problem (I need dtrtrs, not dpotrs) Message-ID: <4C3685FD.9060406@molden.no> Is there any way of getting access to Lapack function dtrtrs or BLAS function dtrsm from SciPy? cho_solve does not really do what I want (it calls dpotrs). Which by the way is extremely annoying, since 99% of use cases for Cholesky (at least in statistics) require solving U'X = Y, not U'UX = Y as cho_solve do. Goloub & van Loan does not even bother to mention the dpotrs algorithm. Sturla From sturla at molden.no Fri Jul 9 00:33:28 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Jul 2010 06:33:28 +0200 Subject: [SciPy-User] Cholesky problem (I need dtrtrs, not dpotrs) In-Reply-To: <4C3685FD.9060406@molden.no> References: <4C3685FD.9060406@molden.no> Message-ID: <4C36A698.9090900@molden.no> Sturla Molden skrev: > Is there any way of getting access to Lapack function dtrtrs or BLAS > function dtrsm from SciPy? cho_solve does not really do what I want (it > calls dpotrs). Which by the way is extremely annoying, since 99% of use > cases for Cholesky (at least in statistics) require solving U'X = Y, not > U'UX = Y as cho_solve do. Goloub & van Loan does not even bother to > mention the dpotrs algorithm. > Just to elaborate on this: Say we want to calculate the Mahalanobis distance from somw points X to a distribution N(m,S). With cho_factor and cho_solve, that would be cx = X - m sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1) whereas a similar routine "tri_solve" using dtrtrs would be cx = X - m sqmahal = (tri_solve(cho_factor(S),cx.T).T**2).sum(axis=1) This looks almost the same in Python, but the solution with tri_solve (dtrtrs) requires only half as many flops as cho_solve (dpotrs) does. In many statistical applications requiring substantial amount of computation (EM algorithms, MCMC simulation, and the like), computing Mahalanobis distances can be the biggest bottleneck. So that is one thing I really miss in http://svn.scipy.org/svn/scipy/trunk/scipy/linalg/decomp_cholesky.py P.S. In Fortran or C we would rather use a tight loop on dtrsv instead, computing sqmahal point by point, as it is more friendly to cache than dtrtrs on the whole block. Computing mahalanobis distances efficiently is a so common use case for Cholesky that I (almost) suggest this be added to SciPy as well. P.P.S. Those transpositions are actually required to make it run fast; in Matlab they would slow things down terribly. NumPy and Matlab is very different in this respect. A transpose does not create a new array in NumPy, it just switches the order flag between C and Fortran. 
C order is NumPy's native, but we must have Fortran order before calling BLAS or LAPACK. If we don't, f2py will make a copy with a transpose. So we avoid a transpose by taking a transpose. It might seem a bit paradoxical. Sturla From opossumnano at gmail.com Fri Jul 9 09:00:42 2010 From: opossumnano at gmail.com (Tiziano Zito) Date: Fri, 9 Jul 2010 15:00:42 +0200 (CEST) Subject: [SciPy-User] =?utf-8?q?=5BANN=5D_Autumn_School_=22Advanced_Scient?= =?utf-8?q?ific_Programming_in_Python=22_in_Trento=2C_Italy?= Message-ID: <20100709130042.99F312494DF@mail.bccn-berlin> Advanced Scientific Programming in Python ========================================= an Autumn School by the G-Node, the Center for Mind/Brain Sciences and the Fondazione Bruno Kessler Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques with theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We'll use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. Clean language design and easy extensibility are driving Python to become a standard tool for scientific computing. Some of the most useful open source libraries for scientific computing and visualization will be presented. This school is targeted at Post-docs and PhD students from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. A basic knowledge of the Python language is assumed. Participants without prior experience with Python should work through the proposed introductory materials. Date and Location ================= October 4th?8th, 2010. Trento, Italy. Preliminary Program =================== Day 0 (Mon Oct 4) ? Software Carpentry & Advanced Python ? Documenting code and using version control ? Object-oriented programming, design patterns, and agile programming ? Exception handling, lambdas, decorators, context managers, metaclasses Day 1 (Tue Oct 5) ? Software Carpentry ? Test-driven development, unit testing & Quality Assurance ? Debugging, profiling and benchmarking techniques ? Data serialization: from pickle to databases Day 2 (Wed Oct 6) ? Scientific Tools for Python ? Advanced NumPy ? The Quest for Speed (intro): Interfacing to C ? Programming project Day 3 (Thu Oct 7) ? The Quest for Speed ? Writing parallel applications in Python ? When parallelization does not help: the starving CPUs problem ? Programming project Day 4 (Fri Oct 8) ? Practical Software Development ? Efficient programming in teams ? Programming project ? The Pac-Man Tournament Every evening we will have the tutors' consultation hour: Tutors will answer your questions and give suggestions for your own projects Applications ============ You can apply on-line at http://www.g-node.org/python-autumnschool Applications must be submitted before August 31th, 2010. Notifications of acceptance will be sent by September 4th, 2010. 
No fee is charged but participants should take care of travel, living, and accommodation expenses. Candidates will be selected on the basis of their profile. Places are limited: acceptance rate in past editions was around 30%.

Prerequisites
=============
You are supposed to know the basics of Python to participate in the lectures! Look on the website for a list of introductory material.

Faculty
=======
- Francesc Alted, author of PyTables, Castelló de la Plana, Spain
- Pietro Berkes, Volen Center for Complex Systems, Brandeis University, USA
- Valentin Haenel, Berlin Institute of Technology and Bernstein Center for Computational Neuroscience Berlin, Germany
- Zbigniew Jędrzejewski-Szmek, Faculty of Physics, University of Warsaw, Poland
- Eilif Muller, The Blue Brain Project, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
- Rike-Benjamin Schuppner, Bernstein Center for Computational Neuroscience Berlin, Germany
- Bartosz Teleńczuk, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany
- Bastian Venthur, Berlin Institute of Technology and Bernstein Focus: Neurotechnology, Germany
- Stéfan van der Walt, Applied Mathematics, University of Stellenbosch, South Africa
- Tiziano Zito, Berlin Institute of Technology and Bernstein Center for Computational Neuroscience Berlin, Germany

Organized by Paolo Avesani for the Center for Mind/Brain Sciences and the Fondazione Bruno Kessler, and by Zbigniew Jędrzejewski-Szmek and Tiziano Zito for the German Neuroinformatics Node of the INCF.

Website: http://www.g-node.org/python-autumnschool
Contact: python-info at g-node.org

From giacomo.boffi at polimi.it  Fri Jul  9 09:36:28 2010
From: giacomo.boffi at polimi.it (Giacomo Boffi)
Date: Fri, 9 Jul 2010 15:36:28 +0200
Subject: [SciPy-User] use of solveh_banded
Message-ID: <19511.9692.29476.677182@aiuole.stru.polimi.it>

i'm trying to use solveh_banded, but to no avail until now

from ipython -p scipy:
========================================================================
In [74]: b
Out[74]: array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [75]: m
Out[75]:
array([[-1.,  3.],
       [-1.,  3.],
       [-1.,  3.],
       [-1.,  3.],
       [-1.,  3.],
       [-1.,  3.],
       [-1.,  3.],
       [-1.,  3.],
       [-1.,  3.]])

In [76]: solveh_banded(m,b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/boffi/ in ()
/usr/lib/python2.6/dist-packages/scipy/linalg/basic.pyc in solveh_banded(ab, b, overwrite_ab, overwrite_b, lower)
    254                          lower=lower,
    255                          overwrite_ab=overwrite_ab,
--> 256                          overwrite_b=overwrite_b)
    257     if info==0:
    258         return c, x

ValueError: On entry to DPBSV parameter number 8 had an illegal value

In [77]: pbsv??
Type: fortran String Form: Namespace: Interactive Docstring [source file open failed]: dpbsv - Function signature: c,x,info = dpbsv(ab,b,[lower,ldab,overwrite_ab,overwrite_b]) Required arguments: ab : input rank-2 array('d') with bounds (ldab,n) b : input rank-2 array('d') with bounds (nrhs,ldb) Optional arguments: lower := 0 input int overwrite_ab := 0 input int ldab := shape(ab,0) input int overwrite_b := 0 input int Return objects: c : rank-2 array('d') with bounds (ldab,n) and ab storage x : rank-2 array('d') with bounds (nrhs,ldb) and b storage info : int In [78]: ======================================================================== in particular, i cannot understand the "parameter number 8 had an illegal value" error message thank you in advance, gb -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From denis-bz-gg at t-online.de Fri Jul 9 10:33:06 2010 From: denis-bz-gg at t-online.de (denis) Date: Fri, 9 Jul 2010 07:33:06 -0700 (PDT) Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: Message-ID: <370a7c72-cdbd-4cf0-bd96-d677304df10c@u26g2000yqu.googlegroups.com> Folks, you've probably seen http://www.sgi.com/tech/stl/partial_sort.html -- sort uses the introsort algorithm and partial_sort uses heapsort introsort is usually faster by a factor of 2 to 5 Heapsort fans: the normal pivot selector (without medians) is blind; trying for a pivot near a small k/n percentile is fun. Any comments on argsort( few= ) vs a separate partial_sort() ? cheers -- denis From charlesr.harris at gmail.com Fri Jul 9 11:46:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 9 Jul 2010 09:46:37 -0600 Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: <370a7c72-cdbd-4cf0-bd96-d677304df10c@u26g2000yqu.googlegroups.com> References: <370a7c72-cdbd-4cf0-bd96-d677304df10c@u26g2000yqu.googlegroups.com> Message-ID: On Fri, Jul 9, 2010 at 8:33 AM, denis wrote: > Folks, > you've probably seen http://www.sgi.com/tech/stl/partial_sort.html > -- > sort uses the introsort algorithm and partial_sort uses heapsort > introsort is usually faster by a factor of 2 to 5 > > Sounds like partial_sort just sets up the heap and then pulls off the needed elements. That would make it about twice as fast as the normal heapsort for a small number of values, or about the same as a full quicksort. > Heapsort fans: the normal pivot selector (without medians) is blind; > trying for a pivot near a small k/n percentile is fun. > > You mean quicksort? > Any comments on argsort( few= ) vs a separate partial_sort() ? > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Fri Jul 9 12:13:02 2010 From: denis-bz-gg at t-online.de (denis) Date: Fri, 9 Jul 2010 09:13:02 -0700 (PDT) Subject: [SciPy-User] get best few of many: argsort( few= ) using std::partial_sort ? In-Reply-To: References: <370a7c72-cdbd-4cf0-bd96-d677304df10c@u26g2000yqu.googlegroups.com> Message-ID: On Jul 9, 5:46?pm, Charles R Harris wrote: > On Fri, Jul 9, 2010 at 8:33 AM, denis wrote: > > Folks, > > ?you've probably seenhttp://www.sgi.com/tech/stl/partial_sort.html > > -- > > ? ?sort uses the introsort algorithm and partial_sort uses heapsort > > ? ?introsort is usually faster by a factor of 2 to 5 > > Sounds like partial_sort just sets up the heap and then pulls off the needed > elements. 
That would make it about twice as fast as the normal heapsort for > a small number of values, or about the same as a full quicksort. Yes, at least in STLport _algo.c -- not so hot. What with factors of 2 from various *sort, arch, cache, inline operator< vs lessf() the possible gain of partial_sort is melting away ... > > Heapsort fans: the normal pivot selector (without medians) is blind; > > trying for a pivot near a small k/n percentile is fun. > > You mean quicksort? yes, my mistake From devicerandom at gmail.com Fri Jul 9 12:50:55 2010 From: devicerandom at gmail.com (ms) Date: Fri, 09 Jul 2010 17:50:55 +0100 Subject: [SciPy-User] [SOLVED] optimize.leastsq and improper input parameters In-Reply-To: References: <4C360EDF.3080805@gmail.com> <4C365BAD.3040101@gmail.com> Message-ID: <4C37536F.6090607@gmail.com> On 09/07/10 00:21, josef.pktd at gmail.com wrote: >> >> Thanks! I didn't notice that leastsq was so clever :) >> I tried quickly now and I still have some trouble, but seems unrelated >> to the issue, tomorrow I'll see if it's definitely fixed. > > If it's possible in your original code, I would replace the list > comprehensions with numpy array operations. Good tip, I'll look at that. It seems the issue is solved. Thanks a lot! m. > Josef > >> >> thanks again, >> >> m. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Fri Jul 9 12:53:51 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Jul 2010 18:53:51 +0200 Subject: [SciPy-User] calculating covariances fast (and accurate) In-Reply-To: References: <4C365F87.4020901@molden.no> Message-ID: <4C37541F.5020903@molden.no> Keith Goodman skrev: > I don't have MKL, so just using one core: > Consider obtaining NumPy one SciPy from one of these sources instead of scipy.org: http://www.lfd.uci.edu/~gohlke/pythonlibs/ http://www.enthought.com/products/epd.php From warren.weckesser at enthought.com Fri Jul 9 12:54:26 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 09 Jul 2010 11:54:26 -0500 Subject: [SciPy-User] use of solveh_banded In-Reply-To: <19511.9692.29476.677182@aiuole.stru.polimi.it> References: <19511.9692.29476.677182@aiuole.stru.polimi.it> Message-ID: <4C375442.6000201@enthought.com> Giacomo Boffi wrote: > i'm trying to use solveh_banded, but to no avail until now > There was a bug in solveh_banded; see http://projects.scipy.org/scipy/ticket/676 It has been fixed for the soon-to-be-officially-released SciPy 0.8. 
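For reference, the banded storage that solveh_banded (LAPACK's DPBSV) expects for a tridiagonal system like the one in the original message looks roughly like this -- an untested sketch, and note that the return value differs between releases:

import numpy as np
from scipy.linalg import solveh_banded

n = 9
# upper banded storage, shape (number_of_bands, n):
# row 0 holds the superdiagonal (first entry unused), row 1 the main diagonal
ab = np.zeros((2, n))
ab[0, 1:] = -1.0
ab[1, :] = 3.0
b = np.arange(1.0, n + 1.0)

res = solveh_banded(ab, b)
# depending on the SciPy version this is either x or a (cholesky_factor, x) pair
x = res[1] if isinstance(res, tuple) else res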
Warren > from ipython -p scipy: > ======================================================================== > In [74]: b > Out[74]: array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.]) > > In [75]: m > Out[75]: > array([[-1., 3.], > [-1., 3.], > [-1., 3.], > [-1., 3.], > [-1., 3.], > [-1., 3.], > [-1., 3.], > [-1., 3.], > [-1., 3.]]) > > In [76]: solveh_banded(m,b) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > > /home/boffi/ in () > > /usr/lib/python2.6/dist-packages/scipy/linalg/basic.pyc in solveh_banded(ab, b, overwrite_ab, overwrite_b, lower) > 254 lower=lower, > 255 overwrite_ab=overwrite_ab, > --> 256 overwrite_b=overwrite_b) > 257 if info==0: > 258 return c, x > > ValueError: On entry to DPBSV parameter number 8 had an illegal value > > In [77]: pbsv?? > Type: fortran > String Form: > Namespace: Interactive > Docstring [source file open failed]: > dpbsv - Function signature: > c,x,info = dpbsv(ab,b,[lower,ldab,overwrite_ab,overwrite_b]) > Required arguments: > ab : input rank-2 array('d') with bounds (ldab,n) > b : input rank-2 array('d') with bounds (nrhs,ldb) > Optional arguments: > lower := 0 input int > overwrite_ab := 0 input int > ldab := shape(ab,0) input int > overwrite_b := 0 input int > Return objects: > c : rank-2 array('d') with bounds (ldab,n) and ab storage > x : rank-2 array('d') with bounds (nrhs,ldb) and b storage > info : int > > > In [78]: > ======================================================================== > > in particular, i cannot understand the "parameter number 8 had an > illegal value" error message > > thank you in advance, > gb > > From kwgoodman at gmail.com Fri Jul 9 12:58:55 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 9 Jul 2010 09:58:55 -0700 Subject: [SciPy-User] calculating covariances fast (and accurate) In-Reply-To: <4C37541F.5020903@molden.no> References: <4C365F87.4020901@molden.no> <4C37541F.5020903@molden.no> Message-ID: On Fri, Jul 9, 2010 at 9:53 AM, Sturla Molden wrote: > Keith Goodman skrev: >> I don't have MKL, so just using one core: >> > Consider obtaining NumPy one SciPy from one of these sources instead of > scipy.org: > > http://www.lfd.uci.edu/~gohlke/pythonlibs/ > > http://www.enthought.com/products/epd.php I'm on linux. But I am curious about the speed ups you get. Is there an array size below which your parallel version is slower? From sturla at molden.no Fri Jul 9 13:41:07 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 09 Jul 2010 19:41:07 +0200 Subject: [SciPy-User] calculating covariances fast (and accurate) In-Reply-To: References: <4C365F87.4020901@molden.no> <4C37541F.5020903@molden.no> Message-ID: <4C375F33.1020302@molden.no> Keith Goodman skrev: > I'm on linux. You can e.g. build GotoBLAS2. I think it contains a full LAPACK now, and is known to be faster than MKL. GotoBLAS is very easy to build, just execute a bash script, no configuration at all. Then comes the annoying part, which is buildning and installing NumPy and SciPy. From thomas.robitaille at gmail.com Fri Jul 9 17:02:23 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Fri, 9 Jul 2010 17:02:23 -0400 Subject: [SciPy-User] [AstroPy] Co-ordinating Python astronomy libraries? In-Reply-To: <4C2BBBA7.5060006@gemini.edu> References: <4C2BBBA7.5060006@gemini.edu> Message-ID: Hi all, After reading all the replies, I have the following suggestion to make. 
The model scipy follows is to have a 'core' scipy package, and 'scikit' packages which share a common namespace, and which are meant as addons to the scipy package, but have not yet (or might never) make it to the main scipy package: http://www.scipy.org/scipy/scikits/ "Scipy Toolkits are independent and seperately installable projects hosted under a common namespace. Packages that are distributed in this way are here (instead of in monolithic scipy) for at least one of three general reasons. Each of these reasons use the same high-level namespace (scikits)." I think we can use this model, and that the following approach can be used here, namely: - start with a very basic 'astropy' package, with e.g. support for FITS/WCS - agree to coordinate astronomy packages with a common namespace (e.g. 'astrokit', so for example, APLpy would become astrokit.aplpy). This can help us manage the namespace (as suggested in Joe Harrington's email) - as astrokit modules mature, they can (if the authors are willing) be merged into the main 'astropy' package, once they have met a number of conditions, including e.g. unit tests, sphinx documentation, limited dependencies (e.g. numpy/scipy/matplotlib, and any package in the 'astropy' package), and compatible license. The advantage of this model is that this encourages the growth from the bottom up of a core astronomy package, which is manageable, as well as the independent development of other packages. It also means that the core package will be quite stable, because it will only accrete 'astrokit' modules as they become stable and mature. At the same time, it encourages developers to make their own innovative astrokit, but without commitment from the maintainers of the core package to accrete it in future. In passing, this also leaves the possibility for those who want to develop meta-pacakges of astrokit modules. Also, it will make it easier for those who want to build fully fledged astronomy python distributions. There have been many ideas passed around in the mailing list, but I think that some are maybe too ambitious. I do think that the idea suggested above is much more manageable. The concrete steps would be: - setup a central repository for the core packages, as well as astrokit pacakges (although we should leave the option for people to develop astrokit packages outside this repository too, and rely on trust and communication to avoid namespace clashes) - start a core package with e.g. FITS and WCS support (e.g. pyfits + pywcs) - set up a list of 'registered' astrokit names to avoid conflict. - set up a list of recommendations and requirements for astrokit pacakges - encourage developers of existing packages to use this namespace and follow the guidelines. This would be very little work in some cases, which is why I think it could work. Sharing a namespace is (I think) the first step to showing that we are willing to work on something coherent, and users would only see two top-level namespaces - astropy and astrokit, which would be a great improvement over the current situation where it is not immediately obvious which packages are astronomy related. Comments/suggestions/criticism welcome! Cheers, Tom On Jun 30, 2010, at 5:48 PM, James Turner wrote: > Dear Python users in astronomy, > > At SciPy 2009, I arranged an astronomy BoF where we discussed the > fact that there are now a number of astronomy libraries for Python > floating around and maybe it would be good to collect more code into > a single place. 
People seemed receptive to this idea and weren't sure > why it hasn't already happened, given that there has been an Astrolib > page at SciPy for some years now, with an associated SVN repository: > > http://scipy.org/AstroLib > > After the meeting last August, I was supposed to contact the mailing > list and some library authors I had talked to previously, to discuss > this further. My apologies for taking 10 months to do that! I did > draft an email the day after the BoF, but then we ran into a hurdle > with setting up new committers to the AstroLib repository (which has > taken a lot longer than expected to resolve), so it seemed a bad > time to suggest that new people start using it. > > To discuss these issues further, we'd like to encourage everyone to > sign up for the AstroPy mailing list if you are not already on it. > The traffic is just a few messages per month. > > http://lists.astropy.scipy.org/mailman/listinfo/astropy > > We (the 2009 BoF group) would also like to hear on the list about > why people have decided to host their own astronomy library (eg. not > being aware of the one at SciPy). Are you interested in contributing > to Astrolib? Do you have any other comments or concerns about > co-ordinating tools? Our motivation is to make libraries easy to > find and install, allow sharing code easily, help rationalize > available functionality and fill in what's missing. A standard > astronomy library with a single set of documentation should be more > coherent and easier to maintain. The idea is not to limit authors' > flexibility of take ownership of their code -- the sub-packages > can still be maintained by different people. > > If you're at SciPy this week, Perry Greenfield and I would be happy > to talk to you. If you would like to add your existing library to > Astrolib, please contact Perry Greenfield or Mark Sienkiewicz at > STScI for access (contact details at http://scipy.org/AstroLib). > Note that the repository is being moved to a new server this week, > after which the URLs will be updated at scipy.org. > > Thanks! > > James Turner (Gemini). > > Bcc: various library authors > > _______________________________________________ > AstroPy mailing list > AstroPy at scipy.org > http://mail.scipy.org/mailman/listinfo/astropy From dwf at cs.toronto.edu Fri Jul 9 18:52:09 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 9 Jul 2010 18:52:09 -0400 Subject: [SciPy-User] Cholesky problem (I need dtrtrs, not dpotrs) In-Reply-To: <4C36A698.9090900@molden.no> References: <4C3685FD.9060406@molden.no> <4C36A698.9090900@molden.no> Message-ID: <74EC831F-A029-40CC-BFC5-2839836DE13B@cs.toronto.edu> On 2010-07-09, at 12:33 AM, Sturla Molden wrote: > cx = X - m > sqmahal = (tri_solve(cho_factor(S),cx.T).T**2).sum(axis=1) > > This looks almost the same in Python, but the solution with tri_solve (dtrtrs) requires only half as many flops as cho_solve (dpotrs) does. > > In many statistical applications requiring substantial amount of computation (EM algorithms, MCMC simulation, and the like), computing Mahalanobis distances can be the biggest bottleneck. +1 on Mahalanobis calculations being a pain/bottleneck... And, wow. I did not quite realize the situation was that bad. David From super.inframan at gmail.com Sat Jul 10 16:21:52 2010 From: super.inframan at gmail.com (Mr Nilsson) Date: Sat, 10 Jul 2010 13:21:52 -0700 (PDT) Subject: [SciPy-User] overflow Message-ID: Hey Im working on a photo processing tool using numpy arrays to store the pixel data. 
I am currently in the process of adding 8 and 16bit support to my system in addition to float32.
The problem is that when working with uint8 and uint16, numpy overflows the values of my pixels when I, for example, multiply (brighten) an image, making bright pixels dark again.

Is there a way to have numpy clip the values to the maximum value of whatever dtype is used instead of overflowing?

cheers
Gusty

From sturla at molden.no  Sat Jul 10 17:28:09 2010
From: sturla at molden.no (Sturla Molden)
Date: Sat, 10 Jul 2010 23:28:09 +0200
Subject: [SciPy-User] overflow
In-Reply-To: 
References: 
Message-ID: <4C38E5E9.5050702@molden.no>

Mr Nilsson skrev:
> Is there a way to have numpy clip the values to the maximum value of
> whatever dtype is used instead of overflowing?
>
Subclass the scalar np.uint8, and change how multiplication works to something like this:

inline npy_uint8 multiply(npy_uint8 a, npy_uint8 b)
{
    npy_uint32 tmp = b;
    tmp *= a;
    return (npy_uint8)((tmp >> 8) ? 0xFF : tmp);
}

Sturla

From seb.haase at gmail.com  Sat Jul 10 17:43:07 2010
From: seb.haase at gmail.com (Sebastian Haase)
Date: Sat, 10 Jul 2010 23:43:07 +0200
Subject: [SciPy-User] overflow
In-Reply-To: <4C38E5E9.5050702@molden.no>
References: <4C38E5E9.5050702@molden.no>
Message-ID: 

On Sat, Jul 10, 2010 at 11:28 PM, Sturla Molden wrote:
> Mr Nilsson skrev:
>> Is there a way to have numpy clip the values to the maximum value of
>> whatever dtype is used instead of overflowing?
>>
> Subclass the scalar np.uint8, and change how multiplication works to
> something like this:
>
> inline npy_uint8 multiply(npy_uint8 a, npy_uint8 b)
> {
>     npy_uint32 tmp = b;
>     tmp *= a;
>     return (npy_uint8)((tmp >> 8) ? 0xFF : tmp);
> }
>
This does not sound like an easy solution ...
First: How to sub-class a basic thing like this .... this looks like it's happening on the C level ... !?
Second: there are probably quite a lot of functions to take care of ...

Could numexpr be helpful for this - if memory is an issue ?
If memory is no issue, it probably the easiest solution to just (explicitly) convert to float32 and only as last step convert back to uint8.

-Sebastian Haase

From sturla at molden.no  Sat Jul 10 18:09:29 2010
From: sturla at molden.no (Sturla Molden)
Date: Sun, 11 Jul 2010 00:09:29 +0200
Subject: [SciPy-User] overflow
In-Reply-To: 
References: <4C38E5E9.5050702@molden.no>
Message-ID: <4C38EF99.6080908@molden.no>

Sebastian Haase skrev:
>
> This does not sound like an easy solution ...
> First: How to sub-class a basic thing like this .... this looks like
> it's happening on the C level ... !?
>
See chapter 15.3 in Travis' NumPy book. I think the new dtype must register the C function to the for loop of ufunc np.multiply, though I haven't checked thoroughly. Subclassing with Cython is likely to be easier.

Sturla

From charlesr.harris at gmail.com  Sat Jul 10 18:10:45 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 10 Jul 2010 16:10:45 -0600
Subject: [SciPy-User] overflow
In-Reply-To: 
References: <4C38E5E9.5050702@molden.no>
Message-ID: 

On Sat, Jul 10, 2010 at 3:43 PM, Sebastian Haase wrote:
> On Sat, Jul 10, 2010 at 11:28 PM, Sturla Molden wrote:
> > Mr Nilsson skrev:
> >> Is there a way to have numpy clip the values to the maximum value of
> >> whatever dtype is used instead of overflowing?
> >>
> > Subclass the scalar np.uint8, and change how multiplication works to
> > something like this:
> >
> > inline npy_uint8 multiply(npy_uint8 a, npy_uint8 b)
> > {
> >     npy_uint32 tmp = b;
> >     tmp *= a;
> >     return (npy_uint8)((tmp >> 8) ? 0xFF : tmp);
> > }
> >
> This does not sound like an easy solution ...
> First: How to sub-class a basic thing like this .... this looks like
> it's happening on the C level ... !?
> Second: there are probably quite a lot of functions to take care of ...
>
> Could numexpr be helpful for this - if memory is an issue ?
> If memory is no issue, it probably the easiest solution to just
> (explicitly) convert to float32 and only as last step convert back to
> uint8.
>
That's what I would recommend as the minimal work approach if memory isn't an issue.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Sat Jul 10 18:52:53 2010
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 10 Jul 2010 15:52:53 -0700
Subject: [SciPy-User] overflow
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jul 10, 2010 at 1:21 PM, Mr Nilsson wrote:
> Hey
> Im working on a photo processing tool using numpy arrays to store the
> pixel data. I am currently in the process of adding 8 and 16bit
> support to my system in addition to float32.
> The problem is that when working with uint8 and uint16 numpy overflows
> the values of my pixels when I for example multiply (brighten) an
> image, making bright pixels dark again.
>
> Is there a way to have numpy clip the values to the maximum value of
> whatever dtype is used instead of overflowing?

You could do it manually (not tested):

def uint_saturating_multiply(image, gain):
    type_info = np.iinfo(image.dtype)
    assert type_info.kind == "u"
    # Find smallest number n such that n * gain will overflow:
    min_clipping = (long(type_info.max) + gain) // gain
    clip_mask = image >= min_clipping
    result = image * gain
    result[clip_mask] = type_info.max
    return result

-- Nathaniel

From super.inframan at gmail.com  Sat Jul 10 19:00:20 2010
From: super.inframan at gmail.com (Mr Nilsson)
Date: Sat, 10 Jul 2010 16:00:20 -0700 (PDT)
Subject: [SciPy-User] overflow
In-Reply-To: 
References: <4C38E5E9.5050702@molden.no>
Message-ID: 

Hm, well, the reason I'm going 8/16 bit in the first place is to conserve memory, so going back to float isn't optimal.
Is there a reason numpy overflows in the first place? Coming from an image processing background I can't really think of any situation where it would be preferable, possibly other than performance?

thanks for the help,
Gusty

On 10 July, 23:10, Charles R Harris wrote:
> On Sat, Jul 10, 2010 at 3:43 PM, Sebastian Haase wrote:
>
> > On Sat, Jul 10, 2010 at 11:28 PM, Sturla Molden wrote:
> > > Mr Nilsson skrev:
> > >> Is there a way to have numpy clip the values to the maximum value of
> > >> whatever dtype is used instead of overflowing?
>
> > > Subclass the scalar np.uint8, and change how multiplication works to
> > > something like this:
> > >
> > > inline npy_uint8 multiply(npy_uint8 a, npy_uint8 b)
> > > {
> > >     npy_uint32 tmp = b;
> > >     tmp *= a;
> > >     return (npy_uint8)((tmp >> 8) ? 0xFF : tmp);
> > > }
> >
> > This does not sound like an easy solution ...
> > First: How to sub-class a basic thing like this .... this looks like
> > it's happening on the C level ... !?
> > Second: there are probably quite a lot of functions to take care of ...
> >
> > Could numexpr be helpful for this - if memory is an issue ?
> > If memory is no issue, it probably the easiest solution to just > > (explicitly) convert to float32 and only as last step convert back to > > uint8. > > That's what I would recommend as the minimal work approach if memory isn't > an issue. > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From brazhe at gmail.com Sat Jul 10 19:26:11 2010 From: brazhe at gmail.com (Alexey Brazhe) Date: Sun, 11 Jul 2010 03:26:11 +0400 Subject: [SciPy-User] raising a matrix to float power Message-ID: Hi, I failed to find a way to raise a matrix to a non-integer power in numpy/scipy In Octave/Matlab, one would write M^0.5 to get the result whereas in numpy >>> maxtrix(M, 0.5) raises the "TypeError: exponent must be an integer" Is there a way to do matrix exponentiation to non-integer powers in numpy or scipy? Hope the answer is positive :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jul 10 19:29:18 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 10 Jul 2010 17:29:18 -0600 Subject: [SciPy-User] overflow In-Reply-To: References: <4C38E5E9.5050702@molden.no> Message-ID: On Sat, Jul 10, 2010 at 5:00 PM, Mr Nilsson wrote: > Hm well the reason im going 8/16 bit in the first place is to conserve > memory, so going back to float isnt optimal.. > Is there a reason numpy overflows in the first place? Coming from an > image processing background i cant really think of any situation where > it would be preferable, possibly other than performance? > > It's what c does, c integers are modulo 2**(whatever). You could write a special purpose ufunc for scaling arrays if you really need that option. There was a simple example here using cython a while back , OK, here it is on the cython wiki CreatingUFuncs. So you might want to ask on the cython list since the example might be a touch on the sketchy side... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Jul 10 19:39:25 2010 From: sturla at molden.no (Sturla Molden) Date: Sun, 11 Jul 2010 01:39:25 +0200 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: <4C3904AD.6090905@molden.no> Alexey Brazhe skrev: > Hi, > I failed to find a way to raise a matrix to a non-integer power in > numpy/scipy > > In Octave/Matlab, one would write M^0.5 to get the result > whereas in numpy > >>> maxtrix(M, 0.5) > raises the "TypeError: exponent must be an integer" > > Is there a way to do matrix exponentiation to non-integer powers in > numpy or scipy? > > Hope the answer is positive :) Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what matrix exponentiation could possibly mean. Are you sure you don't mean array exponentiation? 
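To be concrete about the Cholesky sense of that claim -- a triangular factor L with L L' = M, which exists for positive definite M (quick untested illustration):

import numpy as np

M = np.array([[4.0, 2.0],
              [2.0, 3.0]])            # symmetric positive definite
L = np.linalg.cholesky(M)             # lower triangular factor
print np.allclose(np.dot(L, L.T), M)  # True

The replies below distinguish this from the Hermitian square root obtained via an eigendecomposition.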
Sturla From josef.pktd at gmail.com Sat Jul 10 19:45:36 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 10 Jul 2010 19:45:36 -0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: <4C3904AD.6090905@molden.no> References: <4C3904AD.6090905@molden.no> Message-ID: On Sat, Jul 10, 2010 at 7:39 PM, Sturla Molden wrote: > Alexey Brazhe skrev: >> Hi, >> I failed to find a way to raise a matrix to a non-integer power in >> numpy/scipy >> >> In Octave/Matlab, one would write M^0.5 to get the result >> whereas in numpy >> >>> maxtrix(M, 0.5) >> raises the "TypeError: exponent must be an integer" >> >> Is there a way to do matrix exponentiation to non-integer powers in >> numpy or scipy? >> >> Hope the answer is positive :) > Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what > matrix exponentiation could possibly mean. > > Are you sure you don't mean array exponentiation? scipy linalg has several matrix functions 'expm', 'expm2', 'expm3', 'sqrtm', 'logm' 'sqrtm' solves dot(B,B) = A not dot(B.T,B) = A Besides cholesky, I use eigenvector decomposition to get the powers and other functions. Josef > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josh.holbrook at gmail.com Sat Jul 10 19:47:04 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Sat, 10 Jul 2010 15:47:04 -0800 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: <4C3904AD.6090905@molden.no> References: <4C3904AD.6090905@molden.no> Message-ID: On Sat, Jul 10, 2010 at 3:39 PM, Sturla Molden wrote: > Alexey Brazhe skrev: >> Hi, >> I failed to find a way to raise a matrix to a non-integer power in >> numpy/scipy >> >> In Octave/Matlab, one would write M^0.5 to get the result >> whereas in numpy >> >>> maxtrix(M, 0.5) >> raises the "TypeError: exponent must be an integer" >> >> Is there a way to do matrix exponentiation to non-integer powers in >> numpy or scipy? >> >> Hope the answer is positive :) > Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what > matrix exponentiation could possibly mean. > > Are you sure you don't mean array exponentiation? > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I don't know the answer, but I did play around with Octave, and I figured out that the meaning of float exponentiation in octave is: A^x = B such that B^(1/x) = A So, for example: octave:7> a a = 1 2 3 4 octave:8> b=a^0.5 b = 0.55369 + 0.46439i 0.80696 - 0.21243i 1.21044 - 0.31864i 1.76413 + 0.14575i octave:9> b^2 ans = 1.0000 - 0.0000i 2.0000 - 0.0000i 3.0000 - 0.0000i 4.0000 + 0.0000i I'd never really heard of this before, but it does seem to work for floats. (josef said the same while I was writing this, but whatever. >:o ) --Josh From sturla at molden.no Sat Jul 10 19:47:45 2010 From: sturla at molden.no (Sturla Molden) Date: Sun, 11 Jul 2010 01:47:45 +0200 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: <4C3904AD.6090905@molden.no> References: <4C3904AD.6090905@molden.no> Message-ID: <4C3906A1.1050101@molden.no> Sturla Molden skrev: > ure, M**0.5 is cho_factor(M). For other non-integers I am not sure what > matrix exponentiation could possibly mean. > > Are you sure you don't mean array exponentiation? 
> Actually cho_fatcor don't bother to zero the lower triangle, so M**0.5 is cholesky(M). From aarchiba at physics.mcgill.ca Sat Jul 10 19:47:48 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Sat, 10 Jul 2010 19:47:48 -0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On 10 July 2010 19:26, Alexey Brazhe wrote: > Hi, > I failed to find a way to raise a matrix to a non-integer power in > numpy/scipy > > In Octave/Matlab, one would write M^0.5 to get the result > whereas in numpy >>>> maxtrix(M, 0.5) > raises the "TypeError: exponent must be an integer" > > Is there a way to do matrix exponentiation to non-integer powers in numpy or > scipy? > > Hope the answer is positive :) There are several, but you need to think carefully about what you're actually trying to do, and what matrices you're going to try to do it with. If you want the "matrix square root", that's obtained using the cholesky decomposition but only makes sense for positive definite matrices. If you want more general powers, scipy.linalg provides "expm", "expm2", and "expm3", which all calculate notionally the same thing (what you get by plugging a matrix into the Taylor series for exp) in different ways; which one is the most accurate, efficient, and/or applicable for your particular matrix will vary. scipy.linalg also contains matrix versions of various other functions. Anne P.S. please keep in mind the distinction between "matrix" objects (arrays with syntactic sugar for multiplication and exponentiation) and matrix versions of various functions (which generally operate on arrays). -A > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From brazhe at gmail.com Sat Jul 10 19:57:02 2010 From: brazhe at gmail.com (Alexey Brazhe) Date: Sun, 11 Jul 2010 03:57:02 +0400 Subject: [SciPy-User] raising a matrix to float power Message-ID: >Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what >matrix exponentiation could possibly mean. >Are you sure you don't mean array exponentiation? Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, -1/2). More precisely, I need to compute W(W^TW)^(-1/2). cho_factor fails with "matrix not positive definite", and I don't know how to avoid that A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jul 10 20:15:25 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 10 Jul 2010 18:15:25 -0600 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 5:57 PM, Alexey Brazhe wrote: > >Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what > > >matrix exponentiation could possibly mean. > > >Are you sure you don't mean array exponentiation? > > Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, > -1/2). > More precisely, I need to compute W(W^TW)^(-1/2). > cho_factor fails with "matrix not positive definite", and I don't know how > to avoid that > Well, the question remains as to the precise meaning of the square root (what is the application?), but my guess is that if you use eigh to decompose (W^TW) into u*d*u^T then form u*(d^{-1/2}*u^T you will get what you need. Maybe ;) Zero elements of d, if any, will be a problem. 
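Something along these lines (untested sketch; it assumes W has full column rank, and w_orth is just an illustrative name):

import numpy as np

def w_orth(W, rcond=1e-12):
    # W (W^T W)^(-1/2) via the eigh route described above
    d, u = np.linalg.eigh(np.dot(W.T, W))
    d = np.maximum(d, rcond * d.max())        # guard against the zero eigenvalues mentioned
    inv_sqrt = np.dot(u / np.sqrt(d), u.T)    # u * diag(d**-0.5) * u^T
    return np.dot(W, inv_sqrt)

W = np.random.randn(20, 5)
Q = w_orth(W)
print np.allclose(np.dot(Q.T, Q), np.eye(5))  # columns come out orthonormal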
You can also use the svd if the previous interpretation is correct, since if W = u*d*v the whole expression above reduces to u*d^(.5)*v. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josh.holbrook at gmail.com Sat Jul 10 20:20:17 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Sat, 10 Jul 2010 16:20:17 -0800 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 3:57 PM, Alexey Brazhe wrote: >>Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what >>matrix exponentiation could possibly mean. > >>Are you sure you don't mean array exponentiation? > > Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, -1/2). > More precisely, I need to compute W(W^TW)^(-1/2). > cho_factor fails with "matrix not positive definite", and I don't know how > to avoid that > > A. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > Avoid negative, indefinite and semidefinite matrices. ;) More seriously, Cholesky factorization requires positive definite matrices, so if you have negative eigenvalues, that's not gonna work. --Josh From josef.pktd at gmail.com Sat Jul 10 20:24:39 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 10 Jul 2010 20:24:39 -0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 8:15 PM, Charles R Harris wrote: > > > On Sat, Jul 10, 2010 at 5:57 PM, Alexey Brazhe wrote: >> >> >Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what >> >matrix exponentiation could possibly mean. >> >> >Are you sure you don't mean array exponentiation? >> >> Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, >> -1/2). >> More precisely, I need to compute W(W^TW)^(-1/2). >> cho_factor fails with "matrix not positive definite", and I don't know how >> to avoid that > > Well, the question remains as to the precise meaning of the square root > (what is the application?), but my guess is that if you use eigh to > decompose? (W^TW) into u*d*u^T then form u*(d^{-1/2}*u^T you will get what > you need. Maybe ;) Zero elements of d, if any, will be a problem. Part of some code I used where I don't find the cleaned up version omega = np.dot(dummyall, dummyall.T) ev, evec = np.linalg.eigh(omega) #eig doesn't work omegainvhalf = evec/np.sqrt(ev) print np.max(np.abs(np.dot(omegainvhalf,omegainvhalf.T) - omegainv)) # now we can use omegainvhalf in GLS (instead of the cholesky) no guarantees, Josef > > You can also use the svd if the previous interpretation is correct, since if > W = u*d*v the whole expression above reduces to u*d^(.5)*v. > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Sat Jul 10 20:35:47 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 10 Jul 2010 18:35:47 -0600 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 6:15 PM, Charles R Harris wrote: > > > On Sat, Jul 10, 2010 at 5:57 PM, Alexey Brazhe wrote: > >> >Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what >> >> >matrix exponentiation could possibly mean. >> >> >Are you sure you don't mean array exponentiation? 
>> >> Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, >> -1/2). >> More precisely, I need to compute W(W^TW)^(-1/2). >> cho_factor fails with "matrix not positive definite", and I don't know how >> to avoid that >> > > Well, the question remains as to the precise meaning of the square root > (what is the application?), but my guess is that if you use eigh to > decompose (W^TW) into u*d*u^T then form u*(d^{-1/2}*u^T you will get what > you need. Maybe ;) Zero elements of d, if any, will be a problem. > > You can also use the svd if the previous interpretation is correct, since > if W = u*d*v the whole expression above reduces to u*d^(.5)*v. > > Oops, it is even simpler than that, it reduces to u*v. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Jul 10 22:55:24 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 10 Jul 2010 21:55:24 -0500 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 18:47, Anne Archibald wrote: > On 10 July 2010 19:26, Alexey Brazhe wrote: >> Hi, >> I failed to find a way to raise a matrix to a non-integer power in >> numpy/scipy >> >> In Octave/Matlab, one would write M^0.5 to get the result >> whereas in numpy >>>>> maxtrix(M, 0.5) >> raises the "TypeError: exponent must be an integer" >> >> Is there a way to do matrix exponentiation to non-integer powers in numpy or >> scipy? >> >> Hope the answer is positive :) > > There are several, but you need to think carefully about what you're > actually trying to do, and what matrices you're going to try to do it > with. If you want the "matrix square root", that's obtained using the > cholesky decomposition but only makes sense for positive definite > matrices. If you want more general powers, scipy.linalg provides > "expm", "expm2", and "expm3", which all calculate notionally the same > thing (what you get by plugging a matrix into the Taylor series for > exp) in different ways; Actually, that will help you calculate scalar**matrix, not matrix**scalar, which is what was being requested here. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From d.l.goldsmith at gmail.com Sat Jul 10 23:37:10 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 10 Jul 2010 20:37:10 -0700 Subject: [SciPy-User] Reality check Message-ID: The default for the axis argument of fftpack.rfft is -1; I couldn't find the source for what this function ultimately calls, but the fact that it preserves the shape of its input (determined empirically) suggests that it means transform over the last axis (as opposed to transform flattened input). Two questions: a) Is this correct? b) I forget: this means that it is transforming "across" columns, i.e., each row represents a separate data set, correct? DG -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Sat Jul 10 23:43:59 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 10 Jul 2010 22:43:59 -0500 Subject: [SciPy-User] Reality check In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 22:37, David Goldsmith wrote: > The default for the axis argument of fftpack.rfft is -1; I couldn't find the > source for what this function ultimately calls, but the fact that it > preserves the shape of its input (determined empirically) suggests that it > means transform over the last axis (as opposed to transform flattened > input).? Two questions: > > a) Is this correct? Yes. > b) I forget: this means that it is transforming "across" columns, i.e., each > row represents a separate data set, correct? Yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From brazhe at gmail.com Sun Jul 11 03:24:22 2010 From: brazhe at gmail.com (Alexey Brazhe) Date: Sun, 11 Jul 2010 11:24:22 +0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: Did I get it right that W(W^TW)^(-1/2) reduces to UV, where U,V come from svd of W? As for the application: I'm trying to translate a piece of Matlab code to numpy+scipy+matplotlib that does simple independent component analysis on stacks of images. The code: http://www.mathworks.nl/matlabcentral/fx_files/25405/2/content/CellSort%201.1/doc/CellsortICA.htmland the line with ^(1/2) is number 137. On Sun, Jul 11, 2010 at 4:35 AM, Charles R Harris wrote: > > > On Sat, Jul 10, 2010 at 6:15 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jul 10, 2010 at 5:57 PM, Alexey Brazhe wrote: >> >>> >Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure what >>> >>> >matrix exponentiation could possibly mean. >>> >>> >Are you sure you don't mean array exponentiation? >>> >>> Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, >>> -1/2). >>> More precisely, I need to compute W(W^TW)^(-1/2). >>> cho_factor fails with "matrix not positive definite", and I don't know >>> how to avoid that >>> >> >> Well, the question remains as to the precise meaning of the square root >> (what is the application?), but my guess is that if you use eigh to >> decompose (W^TW) into u*d*u^T then form u*(d^{-1/2}*u^T you will get what >> you need. Maybe ;) Zero elements of d, if any, will be a problem. >> >> You can also use the svd if the previous interpretation is correct, since >> if W = u*d*v the whole expression above reduces to u*d^(.5)*v. >> >> > Oops, it is even simpler than that, it reduces to u*v. > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brazhe at gmail.com Sun Jul 11 05:13:09 2010 From: brazhe at gmail.com (Alexey Brazhe) Date: Sun, 11 Jul 2010 13:13:09 +0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: Yes, my problem seems to be solved with this: #-------------------------------- def winvhalf(X): e, V = linalg.eigh(X) return dot(V, dot(inv(diag(e+0j)**0.5),V.T)) ## I need W = W(W^T W)^{-1/2} x = dot(B.T,B) x = winvhalf(x) B = dot(B, real(x)) #---------------------------------- On Sun, Jul 11, 2010 at 4:24 AM, wrote: > On Sat, Jul 10, 2010 at 8:15 PM, Charles R Harris > wrote: > > > > > > On Sat, Jul 10, 2010 at 5:57 PM, Alexey Brazhe wrote: > >> > >> >Sure, M**0.5 is cho_factor(M). For other non-integers I am not sure > what > >> >matrix exponentiation could possibly mean. > >> > >> >Are you sure you don't mean array exponentiation? > >> > >> Indeed, I needed to raise a matrix (not array) to power 1/2 (in fact, > >> -1/2). > >> More precisely, I need to compute W(W^TW)^(-1/2). > >> cho_factor fails with "matrix not positive definite", and I don't know > how > >> to avoid that > > > > Well, the question remains as to the precise meaning of the square root > > (what is the application?), but my guess is that if you use eigh to > > decompose (W^TW) into u*d*u^T then form u*(d^{-1/2}*u^T you will get > what > > you need. Maybe ;) Zero elements of d, if any, will be a problem. > > > Part of some code I used where I don't find the cleaned up version > > omega = np.dot(dummyall, dummyall.T) > ev, evec = np.linalg.eigh(omega) #eig doesn't work > omegainvhalf = evec/np.sqrt(ev) > print np.max(np.abs(np.dot(omegainvhalf,omegainvhalf.T) - omegainv)) > # now we can use omegainvhalf in GLS (instead of the cholesky) > > no guarantees, > > Josef > > > > > You can also use the svd if the previous interpretation is correct, since > if > > W = u*d*v the whole expression above reduces to u*d^(.5)*v. > > > > Chuck > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jul 11 09:40:00 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 11 Jul 2010 07:40:00 -0600 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: On Sun, Jul 11, 2010 at 1:24 AM, Alexey Brazhe wrote: > Did I get it right that W(W^TW)^(-1/2) > reduces to UV, where U,V come from svd of W? > > Yep. If W = u * d * v, then (W^TW)^(-1/2) = v^T * 1/d * v and the rest follows because u and v are orthogonal. The result is an orthogonal matrix if none of the d's are zeros. This looks like some sort of whitening, so with some adustments you can probably get equivalent results using the q from a qr decomposition or even just the u from the svd. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jul 11 11:24:20 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 11 Jul 2010 23:24:20 +0800 Subject: [SciPy-User] ANN: scipy 0.8.0 release candidate 2 Message-ID: I'm pleased to announce the availability of the second release candidate of SciPy 0.8.0. 
The only changes compared to rc1 are the fixed test failures for special.arccosh/arctanh on Windows, and correct version numbering of the documentation. If no more problems are reported, the final release will be available this Wednesday. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. This release candidate release comes almost one and a half year after the 0.7.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.8.0rc2 requires Python 2.4-2.6 and NumPy 1.4.1 or greater. For more information, please see the release notes: http://sourceforge.net/projects/scipy/files/scipy/0.8.0rc2/NOTES.txt/view You can download the release from here: https://sourceforge.net/projects/scipy/ Python 2.5/2.6 binaries for Windows and OS X are available, as well as source tarballs for other platforms and the documentation in pdf form. Thank you to everybody who contributed to this release. Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.h.jaffe at gmail.com Sun Jul 11 11:26:32 2010 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Sun, 11 Jul 2010 16:26:32 +0100 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: Message-ID: <4C39E2A8.9040206@gmail.com> Hi, On 11/07/2010 00:26, Alexey Brazhe wrote: > Hi, > I failed to find a way to raise a matrix to a non-integer power in > numpy/scipy > > In Octave/Matlab, one would write M^0.5 to get the result > whereas in numpy > >>> maxtrix(M, 0.5) > raises the "TypeError: exponent must be an integer" > > Is there a way to do matrix exponentiation to non-integer powers in > numpy or scipy? > > Hope the answer is positive :) Although most people already know this, since nobody's actually said it yet in this thread, and there seems to be some confusion, the generic meaning of matrix exponentiation is usually the following. We can diagonalize a matrix M = R^T E R where R is the matrix of eigenvectors (^T is transpose or hermitian conjugate) and E = diag(lambda_1, lambda_2, ...) is the diagonal matrix of eigenvalues. Then, we can define M^a = R^T E^a R where E^a = diag(lambda_1^a, lambda_2^a, ...) in particular, this gives the obvious answers for integer powers and even negative integers, including -1 for the inverse. (+1/2 doesn't give the Cholesky decomposition, but the Hermitian square root) Andrew From d.l.goldsmith at gmail.com Sun Jul 11 13:00:29 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 11 Jul 2010 10:00:29 -0700 Subject: [SciPy-User] Reality check In-Reply-To: References: Message-ID: On Sat, Jul 10, 2010 at 8:43 PM, Robert Kern wrote: > On Sat, Jul 10, 2010 at 22:37, David Goldsmith > wrote: > > The default for the axis argument of fftpack.rfft is -1; I couldn't find > the > > source for what this function ultimately calls, but the fact that it > > preserves the shape of its input (determined empirically) suggests that > it > > means transform over the last axis (as opposed to transform flattened > > input). Two questions: > > > > a) Is this correct? > > Yes. > > > b) I forget: this means that it is transforming "across" columns, i.e., > each > > row represents a separate data set, correct? > > Yes. > Thanks, Robert! :-) DG -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.l.goldsmith at gmail.com Sun Jul 11 13:23:20 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 11 Jul 2010 10:23:20 -0700 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: <4C39E2A8.9040206@gmail.com> References: <4C39E2A8.9040206@gmail.com> Message-ID: On Sun, Jul 11, 2010 at 8:26 AM, Andrew Jaffe wrote: > Hi, > > On 11/07/2010 00:26, Alexey Brazhe wrote: > > Hi, > > I failed to find a way to raise a matrix to a non-integer power in > > numpy/scipy > > > > In Octave/Matlab, one would write M^0.5 to get the result > > whereas in numpy > > >>> maxtrix(M, 0.5) > > raises the "TypeError: exponent must be an integer" > > > > Is there a way to do matrix exponentiation to non-integer powers in > > numpy or scipy? > > > > Hope the answer is positive :) > > Although most people already know this, since nobody's actually said it > yet in this thread, and there seems to be some confusion, the generic > meaning of matrix exponentiation is usually the following. > > We can diagonalize a matrix > M = R^T E R > where R is the matrix of eigenvectors (^T is transpose or hermitian > conjugate) and > E = diag(lambda_1, lambda_2, ...) is the diagonal matrix of > eigenvalues. > > Then, we can define > M^a = R^T E^a R > where E^a = diag(lambda_1^a, lambda_2^a, ...) > > in particular, this gives the obvious answers for integer powers and > even negative integers, including -1 for the inverse. (+1/2 doesn't give > the Cholesky decomposition, but the Hermitian square root) > Thanks, Andrew, I was wanting to provide something like this, but I was going to have to go look it up and, well, have higher priorities at the moment. :-) But you left off one "intuitive" identity that one would want to be true, which would appear to be trivially so, unless something unexpected screws it up, namely: (M^a)^(1/a) = (M^(1/a))^a = M; I assume this is valid, correct? DG > > Andrew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brazhe at gmail.com Sun Jul 11 13:31:24 2010 From: brazhe at gmail.com (Alexey Brazhe) Date: Sun, 11 Jul 2010 21:31:24 +0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: <4C39E2A8.9040206@gmail.com> Message-ID: Seems to be, but not for any matrix: #----------------- def mpower(M, p): "Matrix exponentiation" e,EV = linalg.eigh(M) return dot(EV.T, dot(diag((e)**p), EV)) m = array([[1.0,2.0], [3.0,4.0]]) then dot(m.T,m) does equal mpower(mpower(dot(m.T,m), 0.5), 2.0) But mpower(mpower(m,0.5),2) doesn't equal m! On Sun, Jul 11, 2010 at 9:23 PM, David Goldsmith wrote: > On Sun, Jul 11, 2010 at 8:26 AM, Andrew Jaffe wrote: > >> Although most people already know this, since nobody's actually said it >> yet in this thread, and there seems to be some confusion, the generic >> meaning of matrix exponentiation is usually the following. >> >> We can diagonalize a matrix >> M = R^T E R >> where R is the matrix of eigenvectors (^T is transpose or hermitian >> conjugate) and >> E = diag(lambda_1, lambda_2, ...) is the diagonal matrix of >> eigenvalues. 
>> >> Then, we can define >> M^a = R^T E^a R >> where E^a = diag(lambda_1^a, lambda_2^a, ...) >> >> in particular, this gives the obvious answers for integer powers and >> even negative integers, including -1 for the inverse. (+1/2 doesn't give >> the Cholesky decomposition, but the Hermitian square root) >> > > Thanks, Andrew, I was wanting to provide something like this, but I was > going to have to go look it up and, well, have higher priorities at the > moment. :-) But you left off one "intuitive" identity that one would want > to be true, which would appear to be trivially so, unless something > unexpected screws it up, namely: (M^a)^(1/a) = (M^(1/a))^a = M; I assume > this is valid, correct? > > DG > > >> >> Andrew >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jul 11 13:41:39 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 11 Jul 2010 11:41:39 -0600 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: <4C39E2A8.9040206@gmail.com> Message-ID: On Sun, Jul 11, 2010 at 11:31 AM, Alexey Brazhe wrote: > Seems to be, but not for any matrix: > > #----------------- > > def mpower(M, p): > "Matrix exponentiation" > e,EV = linalg.eigh(M) > return dot(EV.T, > dot(diag((e)**p), EV)) > > m = array([[1.0,2.0], [3.0,4.0]]) > > then dot(m.T,m) does equal mpower(mpower(dot(m.T,m), 0.5), 2.0) > > But mpower(mpower(m,0.5),2) doesn't equal m! > > For this algorithm the matrix needs to be Hermitean, which is the case for W^T W. More generally, the matrix needs to be normal, i.e., commute with it's transpose. A matrix can be diagonalized iff it is normal. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From brazhe at gmail.com Sun Jul 11 13:54:01 2010 From: brazhe at gmail.com (Alexey Brazhe) Date: Sun, 11 Jul 2010 21:54:01 +0400 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: <4C39E2A8.9040206@gmail.com> Message-ID: Thank you for explanation. I must admit, I know very little linear algebra :( On Sun, Jul 11, 2010 at 9:41 PM, Charles R Harris wrote: > > > On Sun, Jul 11, 2010 at 11:31 AM, Alexey Brazhe wrote: > >> Seems to be, but not for any matrix: >> >> #----------------- >> >> def mpower(M, p): >> "Matrix exponentiation" >> e,EV = linalg.eigh(M) >> return dot(EV.T, >> dot(diag((e)**p), EV)) >> >> m = array([[1.0,2.0], [3.0,4.0]]) >> >> then dot(m.T,m) does equal mpower(mpower(dot(m.T,m), 0.5), 2.0) >> >> But mpower(mpower(m,0.5),2) doesn't equal m! >> >> > For this algorithm the matrix needs to be Hermitean, which is the case for > W^T W. More generally, the matrix needs to be normal, i.e., commute with > it's transpose. A matrix can be diagonalized iff it is normal. 
> > > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jturner at gemini.edu Sun Jul 11 15:10:14 2010 From: jturner at gemini.edu (James Turner) Date: Sun, 11 Jul 2010 15:10:14 -0400 Subject: [SciPy-User] [AstroPy] Co-ordinating Python astronomy libraries? In-Reply-To: References: <4C2BBBA7.5060006@gemini.edu> Message-ID: <4C3A1716.9060508@gemini.edu> Hi Thomas, I think this seems a good idea, even if it only solves part of the problem. Astrokits (or just scikits?) could be a good place for code to go whilst it's maturing, or for other reasons, with a low barrier to entry. We could encourage authors to follow the same standards/ guidelines as the core library, without enforcing them. A couple of limitations spring to mind. First, a proliferation of astrokits with overlapping functionality could perpetuate duplication and incoherence. As Perry said, there's no harm in doing something a couple of ways to see which is best, but if there are 5 different wrappers for WCSLIB, that's just confusing and makes it hard to focus on quality. WCS is not the best example, since you proposed putting that in the core, but in general this seems like something to keep in mind and it isn't solved just by associating different libraries. My second immediate concern would be distribution -- installing lots of astrokits could be a bigger problem for end users than a single library. Of course that could be solved by including most of the astrokits in our Gemini/STScI Python distribution once it's available, but these are early days and I don't think we're ready to promise the whole astronomy/science community a solution to its installation problems. Maybe others (eg. scisoft) would also pick them up though. Nevertheless, I think astrokits can only be better than lots of totally uncoordinated libraries and they could be a good path to accepting code into the core and/or encouraging contributions that might not happen otherwise. I think I'd back this proposal, unless others have objections I haven't considered yet -- but I'm not speaking for Perry's group, which ultimately manages Astrolib. Cheers, James. > Hi all, > > After reading all the replies, I have the following suggestion to make. > > The model scipy follows is to have a 'core' scipy package, and 'scikit' packages which share a common namespace, and which are meant as addons to the scipy package, but have not yet (or might never) make it to the main scipy package: > > http://www.scipy.org/scipy/scikits/ > > "Scipy Toolkits are independent and seperately installable projects hosted under a common namespace. Packages that are distributed in this way are here (instead of in monolithic scipy) for at least one of three general reasons. Each of these reasons use the same high-level namespace (scikits)." > > I think we can use this model, and that the following approach can be used here, namely: > > - start with a very basic 'astropy' package, with e.g. support for FITS/WCS > - agree to coordinate astronomy packages with a common namespace (e.g. 'astrokit', so for example, APLpy would become astrokit.aplpy). This can help us manage the namespace (as suggested in Joe Harrington's email) > - as astrokit modules mature, they can (if the authors are willing) be merged into the main 'astropy' package, once they have met a number of conditions, including e.g. 
unit tests, sphinx documentation, limited dependencies (e.g. numpy/scipy/matplotlib, and any package in the 'astropy' package), and compatible license. > > The advantage of this model is that this encourages the growth from the bottom up of a core astronomy package, which is manageable, as well as the independent development of other packages. It also means that the core package will be quite stable, because it will only accrete 'astrokit' modules as they become stable and mature. At the same time, it encourages developers to make their own innovative astrokit, but without commitment from the maintainers of the core package to accrete it in future. > > In passing, this also leaves the possibility for those who want to develop meta-pacakges of astrokit modules. Also, it will make it easier for those who want to build fully fledged astronomy python distributions. > > There have been many ideas passed around in the mailing list, but I think that some are maybe too ambitious. I do think that the idea suggested above is much more manageable. The concrete steps would be: > > - setup a central repository for the core packages, as well as astrokit pacakges (although we should leave the option for people to develop astrokit packages outside this repository too, and rely on trust and communication to avoid namespace clashes) > - start a core package with e.g. FITS and WCS support (e.g. pyfits + pywcs) > - set up a list of 'registered' astrokit names to avoid conflict. > - set up a list of recommendations and requirements for astrokit pacakges > - encourage developers of existing packages to use this namespace and follow the guidelines. This would be very little work in some cases, which is why I think it could work. > > Sharing a namespace is (I think) the first step to showing that we are willing to work on something coherent, and users would only see two top-level namespaces - astropy and astrokit, which would be a great improvement over the current situation where it is not immediately obvious which packages are astronomy related. > > Comments/suggestions/criticism welcome! > > Cheers, > > Tom From d.l.goldsmith at gmail.com Sun Jul 11 16:17:07 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 11 Jul 2010 13:17:07 -0700 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: <4C39E2A8.9040206@gmail.com> Message-ID: On Sun, Jul 11, 2010 at 10:41 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > On Sun, Jul 11, 2010 at 11:31 AM, Alexey Brazhe wrote: > >> Seems to be, but not for any matrix: >> >> #----------------- >> >> def mpower(M, p): >> "Matrix exponentiation" >> e,EV = linalg.eigh(M) >> return dot(EV.T, >> dot(diag((e)**p), EV)) >> >> m = array([[1.0,2.0], [3.0,4.0]]) >> >> then dot(m.T,m) does equal mpower(mpower(dot(m.T,m), 0.5), 2.0) >> >> But mpower(mpower(m,0.5),2) doesn't equal m! >> >> > For this algorithm the matrix needs to be Hermitean, which is the case for > W^T W. More generally, the matrix needs to be normal, i.e., commute with > it's transpose. A matrix can be diagonalized iff it is normal. > So it isn't just an algorithmic issue: the general matrix exponentiation only works for normal matrices (in theory as well as in computational practice)? Or is it just the (M^a)^(1/a) = (M^(1/a))^a identity that fails if M isn't normal? 
DG > > > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jul 11 18:02:10 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 11 Jul 2010 16:02:10 -0600 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: <4C39E2A8.9040206@gmail.com> Message-ID: On Sun, Jul 11, 2010 at 2:17 PM, David Goldsmith wrote: > On Sun, Jul 11, 2010 at 10:41 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> On Sun, Jul 11, 2010 at 11:31 AM, Alexey Brazhe wrote: >> >>> Seems to be, but not for any matrix: >>> >>> #----------------- >>> >>> def mpower(M, p): >>> "Matrix exponentiation" >>> e,EV = linalg.eigh(M) >>> return dot(EV.T, >>> dot(diag((e)**p), EV)) >>> >>> m = array([[1.0,2.0], [3.0,4.0]]) >>> >>> then dot(m.T,m) does equal mpower(mpower(dot(m.T,m), 0.5), 2.0) >>> >>> But mpower(mpower(m,0.5),2) doesn't equal m! >>> >>> >> For this algorithm the matrix needs to be Hermitean, which is the case for >> W^T W. More generally, the matrix needs to be normal, i.e., commute with >> it's transpose. A matrix can be diagonalized iff it is normal. >> > > So it isn't just an algorithmic issue: the general matrix exponentiation > only works for normal matrices (in theory as well as in computational > practice)? Or is it just the (M^a)^(1/a) = (M^(1/a))^a identity that fails > if M isn't normal? > Well, you can square any matrix and the result has a square root. The series expansion can even converge. For instance [[1,1],[0,1]] isn't normal but has an n'th root [[1,1/n],[0,1]], i.e., In [6]: m = array([[1.,.2],[0.,1.]]) In [7]: dot(dot(dot(dot(m,m),m),m),m) Out[7]: array([[ 1., 1.], [ 0., 1.]]) In fact, all upper triangular matrices with ones on the diagonal have series with a finite number of terms. So if the matrix is reduced to Jordan form and none of the diagonal elements are zero, then it should be possible to find the roots. OTOH, [[0,1][0,0]] doesn't have a square root. The case of normal matrices is easy to handle efficiently, however, while reduction to Jordan form is often difficult and can be numerically tricky. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.l.goldsmith at gmail.com Sun Jul 11 18:56:08 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 11 Jul 2010 15:56:08 -0700 Subject: [SciPy-User] raising a matrix to float power In-Reply-To: References: <4C39E2A8.9040206@gmail.com> Message-ID: On Sun, Jul 11, 2010 at 3:02 PM, Charles R Harris wrote: > > On Sun, Jul 11, 2010 at 2:17 PM, David Goldsmith wrote: > >> On Sun, Jul 11, 2010 at 10:41 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> On Sun, Jul 11, 2010 at 11:31 AM, Alexey Brazhe wrote: >>> >>>> Seems to be, but not for any matrix: >>>> >>>> #----------------- >>>> >>>> def mpower(M, p): >>>> "Matrix exponentiation" >>>> e,EV = linalg.eigh(M) >>>> return dot(EV.T, >>>> dot(diag((e)**p), EV)) >>>> >>>> m = array([[1.0,2.0], [3.0,4.0]]) >>>> >>>> then dot(m.T,m) does equal mpower(mpower(dot(m.T,m), 0.5), 2.0) >>>> >>>> But mpower(mpower(m,0.5),2) doesn't equal m! >>>> >>>> >>> For this algorithm the matrix needs to be Hermitean, which is the case >>> for W^T W. More generally, the matrix needs to be normal, i.e., commute with >>> it's transpose. A matrix can be diagonalized iff it is normal. >>> >> >> So it isn't just an algorithmic issue: the general matrix exponentiation >> only works for normal matrices (in theory as well as in computational >> practice)? Or is it just the (M^a)^(1/a) = (M^(1/a))^a identity that fails >> if M isn't normal? >> > > Well, you can square any matrix and the result has a square root. The > series expansion can even converge. For instance [[1,1],[0,1]] isn't normal > but has an n'th root [[1,1/n],[0,1]], i.e., > > In [6]: m = array([[1.,.2],[0.,1.]]) > > In [7]: dot(dot(dot(dot(m,m),m),m),m) > Out[7]: > array([[ 1., 1.], > [ 0., 1.]]) > > In fact, all upper triangular matrices with ones on the diagonal have > series with a finite number of terms. So if the matrix is reduced to Jordan > form and none of the diagonal elements are zero, then it should be possible > to find the roots. OTOH, [[0,1][0,0]] doesn't have a square root. The case > of normal matrices is easy to handle efficiently, however, while reduction > to Jordan form is often difficult and can be numerically tricky. > OK, so it is, at least in part, an algorithmic issue; which begs the question: does scipy have a more generally applicable, though still robust (i.e., behaves nicely when it fails) pow(matrix, float) function? DG > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rahman at astro.utoronto.ca Sat Jul 3 11:57:07 2010 From: rahman at astro.utoronto.ca (Mubdi Rahman) Date: Sat, 3 Jul 2010 15:57:07 +0000 (GMT) Subject: [SciPy-User] Co-ordinating Python astronomy libraries? In-Reply-To: <4C2BBBA7.5060006@gemini.edu> References: <4C2BBBA7.5060006@gemini.edu> Message-ID: <115583.37382.qm@web29012.mail.ird.yahoo.com> Hi James and AstroPy colleagues, Thanks for the note - coordinating the astronomical python community has been a long time coming, and I'm glad that someone's taken the initiative to start this line of communication. 
Last year, a number of us located here in Toronto started pyAstroLib, with the goal of converting the NASA IDL Astronomy Library into Python. We roadmapped the course of doing an initial conversion of the entire library, with the exception of scripts that already have python equivalents (i.e. those provided by pyFits, et cetera). In the case of existing functionality, the documentation of pyAstroLib would point users in the direction of the tools that do the equivalent. We further roadmapped a plan to go beyond the structure of the IDL AstroLib (we were calling the direct python conversion of these libraries the "legacy" library), and developing a more structured, streamlined and less redundant general purpose astronomy library, so as to make entry into the world of astronomical python as painless as possible. Beyond this stage, we had a number of dream-goals, including a lightweight python-based replacement to DS9 with a more intuitive UI (think Adobe Photoshop), which could be used as a standalone package or embedded within other python scripts. We had a number of design requirements for the library: - The library was to be fully open-source (we chose the LGPL as our model). - The library was to require as few external libraries as necessary (we've limited ourselves to NumPy, SciPy, PyFits, and MatPlotLib). - The library was intended to be seamlessly cross-platform. Personally, I'm a Linux/Windows user, where as some of the others in our collaboration are pure Mac or pure Linux users. We realized that at some point, the project would merge with a number of others out there, and our goal was to have a bit of a code base before we started expanding and merging. Needless to say, we were ambitious. That was a year ago. In that time, we had started the conversion process - a large number of the general IDL astronomical functions have been converted (as one can pull from our git repo on sourceforge), complete with docstrings, but without extensive testing. But as is always the case, each of us individually became more concerned about the functionality that we needed for our research. For instance, much of my reliance on IDL has been due to the coordinate conversion functions - so I had just ported that part of wcslib into python. (The other half of my reliance on IDL has come from the pixel to wcs coordinate transformations and the image manipulation and alignment scripts, which I'm currently working on). We've also been receiving a lot of contributed code for a variety of tools and functions that other users have needed. Up till now, we haven't had a place to put this code. Now considering the path that this project has taken, we're moving in a different direction, which is to round up the contributed code, even if it doesn't actually fit in the IDL AstroLib framework, give it a common API, and just get people using it as is for now, and slowly categorize and refine the code in subsequent releases. That's basically where we are with pyAstroLib. I hope this clears up some of the confusion and helps coordinating effort. Mubdi Rahman on behalf of the pyAstroLib crew. ----- Original Message ---- > From: James Turner > To: AstroPy ; SciPy Users List > Sent: Wed, June 30, 2010 5:48:23 PM > Subject: Co-ordinating Python astronomy libraries? > > Dear Python users in astronomy, At SciPy 2009, I arranged an astronomy > BoF where we discussed the fact that there are now a number of astronomy > libraries for Python floating around and maybe it would be good to collect > more code into a single place. 
People seemed receptive to this idea and > weren't sure why it hasn't already happened, given that there has been an > Astrolib page at SciPy for some years now, with an associated SVN > repository: > >http://scipy.org/AstroLib After the meeting last August, I was > supposed to contact the mailing list and some library authors I had talked to > previously, to discuss this further. My apologies for taking 10 months to do > that! I did draft an email the day after the BoF, but then we ran into a > hurdle with setting up new committers to the AstroLib repository (which > has taken a lot longer than expected to resolve), so it seemed a bad time > to suggest that new people start using it. To discuss these issues > further, we'd like to encourage everyone to sign up for the AstroPy mailing > list if you are not already on it. The traffic is just a few messages per > month. > href="http://lists.astropy.scipy.org/mailman/listinfo/astropy" target=_blank > >http://lists.astropy.scipy.org/mailman/listinfo/astropy We (the 2009 > BoF group) would also like to hear on the list about why people have decided > to host their own astronomy library (eg. not being aware of the one at > SciPy). Are you interested in contributing to Astrolib? Do you have any other > comments or concerns about co-ordinating tools? Our motivation is to make > libraries easy to find and install, allow sharing code easily, help > rationalize available functionality and fill in what's missing. A > standard astronomy library with a single set of documentation should be > more coherent and easier to maintain. The idea is not to limit > authors' flexibility of take ownership of their code -- the > sub-packages can still be maintained by different people. If you're at > SciPy this week, Perry Greenfield and I would be happy to talk to you. If you > would like to add your existing library to Astrolib, please contact Perry > Greenfield or Mark Sienkiewicz at STScI for access (contact details at > href="http://scipy.org/AstroLib" target=_blank > >http://scipy.org/AstroLib). Note that the repository is being moved to a > new server this week, after which the URLs will be updated at > scipy.org. Thanks! James Turner (Gemini). Bcc: various > library authors From peterhoward42 at gmail.com Wed Jul 7 05:56:39 2010 From: peterhoward42 at gmail.com (Peter Howard) Date: Wed, 7 Jul 2010 10:56:39 +0100 Subject: [SciPy-User] band pass filter .WAV file Message-ID: I'm trying to write a very simple example of applying a band pass filter to a .WAV music file. I'm distinctly rusty on DSP and inexperienced with SciPy/NumPy so apologies if I've made a dumb mistake. It executes without errors or warnings. It produces the output file, but this is twice the size of the input file, which is clearly wrong. I'm most uncertain about casting the filtered data back to integers and thus being suitable for writing back out to .WAV. I'm a bit uncertain about my interpretation / understanding of the frequency and gain specifications. Any help and advice very much appreciated. 
Pete from scipy.io.wavfile import read, write from scipy.signal.filter_design import butter, buttord from scipy.signal import lfilter from numpy import asarray def convert_hertz(freq): # convert frequency in hz to units of pi rad/sample # (our .WAV is sampled at 44.1KHz) return freq * 2.0 / 44100.0 rate, sound_samples = read('monty.wav') pass_freq = convert_hertz(440.0) # pass up to 'middle C' stop_freq = convert_hertz(440.0 * 4) # max attenuation from 3 octaves higher pass_gain = 3.0 # tolerable loss in passband (dB) stop_gain = 60.0 # required attenuation in stopband (dB) ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) b, a = butter(ord, wn, btype = 'low') filtered = lfilter(b, a, sound_samples) integerised_filtered = asarray(filtered, int) write('monty-filtered.wav', rate, integerised_filtered) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cfrazer at uci.edu Thu Jul 8 13:34:28 2010 From: cfrazer at uci.edu (cfrazer at uci.edu) Date: Thu, 8 Jul 2010 10:34:28 -0700 Subject: [SciPy-User] Saving Complex Numbers Message-ID: <0c2af6ba358eb4f073177cfd6784a03c.squirrel@webmail.uci.edu> I'm looking to save complex number to a text file. My code is failing me miserably. Here's a simpler case of my problem: Code: --------------------------------- from numpy import * from pylab import * a = [3+2j,5+7j,8+2j] savetxt("complex.out",a) --------------------------------- SavedFile: --------------------------------- 3.000000000000000000e+00 5.000000000000000000e+00 8.000000000000000000e+00 --------------------------------- Need Complex numbers in the saved file. Thanks in advance! -Chris Frazer From ifs at lanl.gov Fri Jul 9 14:14:07 2010 From: ifs at lanl.gov (Isaac Salazar) Date: Fri, 9 Jul 2010 12:14:07 -0600 Subject: [SciPy-User] permission error. Message-ID: <8162CA82-05CF-4F23-9F6E-D4F4362EFF12@lanl.gov> Hello, I am getting a permission error when trying to open a figure using matplotlib. TclError: couldn't open "/Library/Frameworks/Python.framework/ Versions/2.6/lib/python2.6/site-packages/matplotlib/mpl-data/images/ home.ppm": permission denied Attached is a test log file. Isaac Salazar W-13: ADVANCED ENGINEERING ANALYSIS TA-03, Building 1400, Room 2229 MS A142 ifs at lanl.gov phone: 667 9225 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log.txt URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.tollerud at gmail.com Sun Jul 11 22:09:50 2010 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Sun, 11 Jul 2010 19:09:50 -0700 Subject: [SciPy-User] [AstroPy] Co-ordinating Python astronomy libraries? In-Reply-To: <4C3A1716.9060508@gemini.edu> References: <4C2BBBA7.5060006@gemini.edu> <4C3A1716.9060508@gemini.edu> Message-ID: I think the "astrokits" idea is a good one - (definitely not melding into scikits, though - that's too big of an umbrella community for me to be comfortable with). I'm somewhat concerned about resulting long import statements("from astrokits.apackage.amodule.submodule import SomeReallyLongClassOrFunctionName"), but I guess that might be the necessary price to pay. 
I still think, though, that using that same sort of infrastructure, while relaxing the namespace requirement, is a better solution - that is, you can have an organizational structure like Thomas suggests without requiring that the packages start with "astrokits" - just strongly recommend the guidelines of consistent documentation (ideally, I think, using sphinx) and consistent use of the standard setup.py schemes (most important here is consistency in compiling any Fortran or C codes... or better yet have everyone use Cython for any interface layers), and you get the same advantages. As James rightly pointed out, if you go the astrokits route, you still don't really solve the installation problem because everything is installed separately - all that's really important is a central listing of the "sanctioned" astrokits. Another thing that would probably be a very good thing to do if possible would be to allow for a "proprietary time" - that is, allow people to develop something only they have access to until they publish the first paper or whatever using/describing that tool, and then afterwards it goes public as an astrokit for everyone to see. So in my mind it seems reasonable to set up a shared repository/web site with the option (but not requirement) of using an "astrokits" namespace, with the understanding that when possible, the core "astropy" modules would be used to avoid duplicating effort (and the code must follow the aforementioned rather loose standards). On Wed, Jul 7, 2010 at 6:01 PM, Kelle Cruz wrote: > Well, Eli and Tom have already done the listing on AstroPython: > http://www.astropython.org/resources > or > http://www.astropython.org/resources?sort=tag I hadn't realized how complete this listing is - it's a great start, but the one thing that's missing for what I had in mind is an access API or rigorous entry standard. That is, it has to be possible to write scripts that work like pip to grab download links and use them to automatically build and install packages. This isn't at all hard to do, it just requires either a secondary database with an API, or a slightly more standardized entry format with source download links (which don't have to be hosted on the same page), and some standardized description of the non-python requirements. With that, it'd be pretty easy to write something like an "astropip" that could be used to automate installation/upgrade of any of the listed packages. An "astropip" has the advantage of allowing us to just roll our own sorts of metapackages and get around the setuptools/distribute shortcomings. The more I think about it, that might be the way to go. > - I think a AAS Splinter is a great idea...looks like the deadline isn't > until Dec 1 for Seattle, but we should get on it since they are assigned on > a first-come, first-served basis based on room availability. ?It might make > more sense to do Boston in May 2011 because people won't be as busy with > other meeting things and it should be nice there in spring I would think Seattle might be better because a lot more people will be there... and anyway, a dreary Seattle winter day is much more likely to keep people in a room talking about python libraries, isn't it? (Also, for full disclosure, I'm on the west coast, so I personally would prefer it given that I'm not sure I can make it to Boston...) On Thu, Jul 8, 2010 at 11:03 AM, Perry Greenfield wrote: > Well, more the former, but also to enable something along the latter (though > not necessarily part of a single install). 
When things are here and there, > it is more difficult to package those together in an automatic way. Not > impossible, just more work and more ways for things to go wrong. You are right about this, but as I've described above, I think with better consistency in how all the packages are installed, it'll be far easier to deal with these problems. And in my mind, the gains are much greater because it give access to a much larger brain-trust of people who want a fair amount of freedom (at least initially) in how they're doing their project. > But trac is fairly portable since it is widely used. Migrating the info from > some of the others can be a ?more difficult problem (lock-in issues). As for > svn, git, hg, bazaar issues, nothing is going to satisfy everyone, I agree The Trac thing is a personal preference - I admit it might well be the best all-around compromise, although I still think allowing the option of external hosting might be a good idea, as long as they have standardized entries in whatever listing is used. I want to reiterate, though, that I think it's important to *not* use svn over the other options I mentioned - distributed version control systems (e.g. bzr, hg, or git) are far more suited to the kind of development I think most astronomers typically are used to, as they can make their own local copy, do their own thing without any access to the central repository, and later merge it back in. > That works to a certain level. But before long, there are n flavors of > representing this and that, and combining 10 packages like this to use in > your own ?code can get to be a real chore. You'll spend a lot of time moving > from one convention to another (and not catching some bugs). I don't think > making everyone conform is a good solution, but eventually standardizing on > core libraries is a very good thing in the long run. I think there is a > reasonable middle ground on this kind of thing. You're absolutely right about the problems of fragmentation and the virtues of later standardization. But right now, given that those core libraries don't exist, the freedom to experiment is also very important. Of course, I'm biased in that I'm working for a rather more pythonic/object-oriented flavor than the traditional IDL-like libraries so I'm leveraged a bit towards the experiment side... but I think the point is still valid, so as you say, a general middle ground is important. -- Erik Tollerud From robert.kern at gmail.com Sun Jul 11 22:11:06 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 11 Jul 2010 21:11:06 -0500 Subject: [SciPy-User] permission error. In-Reply-To: <8162CA82-05CF-4F23-9F6E-D4F4362EFF12@lanl.gov> References: <8162CA82-05CF-4F23-9F6E-D4F4362EFF12@lanl.gov> Message-ID: On Fri, Jul 9, 2010 at 13:14, Isaac Salazar wrote: > Hello, > I am getting a permission error when trying to open a figure using > matplotlib. The matplotlib mailing list is over here: https://lists.sourceforge.net/lists/listinfo/matplotlib-users -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vincent at vincentdavis.net Mon Jul 12 09:36:16 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Mon, 12 Jul 2010 07:36:16 -0600 Subject: [SciPy-User] more efficient way of dealing with numpy arrays? 
In-Reply-To: <7e882c44-6040-406c-b811-19f89be81d33@y11g2000yqm.googlegroups.com> References: <7e882c44-6040-406c-b811-19f89be81d33@y11g2000yqm.googlegroups.com> Message-ID: You might look at this http://docs.scipy.org/doc/numpy/reference/generated/numpy.extract.html#numpy-extract Vincent On Thu, Jun 24, 2010 at 7:08 AM, tashbean wrote: > Hi, > > I would like to pick certain rows of an array based on matching the > first column with options contained in another array e.g. I have this > array: > > parameter_list = > array([['Q10', 'scipy.stats.uniform(2,10-2)'], > ? ? ? ['mpe', 'scipy.stats.uniform(0.,1.)'], > ? ? ? ['rdr_a', 'scipy.stats.uniform(5e-5,1.24-5e-5)'], > ? ? ? ['rdr_b', 'scipy.stats.uniform(-60.18,-3.41--60.18)']], > ? ? ?dtype='|S40') > > I have an array which contains the strings of the first column which I > would like to pick out e.g. > > param_options = ['Q10' , 'mpe'] > > My solution of how to do this is as follows: > > new_params = numpy.array([]) > > for i in xrange(len(param_options)): > ? ? new_params = numpy.append(new_params, > parameter_list[parameter_list[:,0]==param_options[i]]) > > Is there a more efficient way of doing this? > > Thank you for your help! > Tash > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ben.root at ou.edu Mon Jul 12 10:35:57 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 12 Jul 2010 09:35:57 -0500 Subject: [SciPy-User] band pass filter .WAV file In-Reply-To: References: Message-ID: I don't know if something is wrong or not, but a quick guess at why the file is twice the size of the original might be that the input file was mono while the output file was stereo, maybe? Ben Root On Wed, Jul 7, 2010 at 4:56 AM, Peter Howard wrote: > I'm trying to write a very simple example of applying a band pass filter to > a .WAV music file. > I'm distinctly rusty on DSP and inexperienced with SciPy/NumPy so apologies > if I've made a dumb mistake. > > It executes without errors or warnings. > It produces the output file, but this is twice the size of the input file, > which is clearly wrong. > I'm most uncertain about casting the filtered data back to integers and > thus being suitable for writing back out to .WAV. > I'm a bit uncertain about my interpretation / understanding of the > frequency and gain specifications. > > Any help and advice very much appreciated. 
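Another guess to add to the mono/stereo one: scipy.io.wavfile.write stores samples at whatever integer width it is handed, and asarray(filtered, int) in the quoted script produces a platform integer (32- or 64-bit), so a 16-bit input file would come back two or four times larger even with the same number of samples. A minimal sketch of that fix, assuming monty.wav really does contain int16 data; the middle line is only a placeholder for the actual filtering step:

import numpy as np
from scipy.io.wavfile import read, write

rate, sound_samples = read('monty.wav')   # typically int16 samples
filtered = sound_samples.astype(float)    # placeholder for the lfilter output
# cast back to the input sample width instead of the platform 'int'
write('monty-filtered.wav', rate, filtered.astype(sound_samples.dtype))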
> > Pete > > > > > from scipy.io.wavfile import read, write > from scipy.signal.filter_design import butter, buttord > from scipy.signal import lfilter > from numpy import asarray > > def convert_hertz(freq): > # convert frequency in hz to units of pi rad/sample > # (our .WAV is sampled at 44.1KHz) > return freq * 2.0 / 44100.0 > > rate, sound_samples = read('monty.wav') > pass_freq = convert_hertz(440.0) # pass up to 'middle C' > stop_freq = convert_hertz(440.0 * 4) # max attenuation from 3 octaves > higher > pass_gain = 3.0 # tolerable loss in passband (dB) > stop_gain = 60.0 # required attenuation in stopband (dB) > ord, wn = buttord(pass_freq, stop_freq, pass_gain, stop_gain) > b, a = butter(ord, wn, btype = 'low') > filtered = lfilter(b, a, sound_samples) > integerised_filtered = asarray(filtered, int) > write('monty-filtered.wav', rate, integerised_filtered) > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonightwedrink at hotmail.com Tue Jul 13 03:00:23 2010 From: tonightwedrink at hotmail.com (ben h) Date: Tue, 13 Jul 2010 07:00:23 +0000 Subject: [SciPy-User] [newbie] how to compare two datasets Message-ID: The datasets are borehole data - so they have borehole name, depth, and a value. Each borehole has two datasets - one real, one modelled. I want to compare the values between them for each depth in modelled dataset (lower resolution / fewer samples). If there is no matching depth in real dataset I want to linearly interpolate between nearest values. Comparison to be quite simple at first, difference between values, and stats for entire set of differences. example data (depth, value): model: 0 15.5 -10 17.0 -20 18.5 -30 20.0 real: 0 16.5 -1 16.6 -2 16.6 ... -655 55.3 Not having used python much, i don't know best data structure (dictionary? sequence? list?), or if there are helpful things in SciPy to help this come together (stats, methods for comparing datasets like these, linear interp methods?). Looking for inspiration and pointers! ben. _________________________________________________________________ New, Used, Demo, Dealer or Private? Find it at CarPoint.com.au http://clk.atdmt.com/NMN/go/206222968/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Jul 13 04:29:54 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 13 Jul 2010 08:29:54 +0000 (UTC) Subject: [SciPy-User] scalars vs array of length 1 References: <4601EBA5-C5F7-44AA-9449-50027052C181@tacc.utexas.edu> <4C3646DA.5040006@noaa.gov> <4C3647BD.5030004@american.edu> Message-ID: Thu, 08 Jul 2010 17:48:45 -0400, Alan G Isaac wrote: > On 7/8/2010 5:44 PM, Christopher Barker wrote: >> np.dot(x,y) > > Which reminds me: > expected in 1.4.1 to be able to do x.dot(y), but it's not there. Will > it be in 1.5? Yes. -- Pauli Virtanen From thoeger at fys.ku.dk Tue Jul 13 09:37:04 2010 From: thoeger at fys.ku.dk (=?ISO-8859-1?Q?Th=F8ger?= Emil Juul Thorsen) Date: Tue, 13 Jul 2010 15:37:04 +0200 Subject: [SciPy-User] [newbie] how to compare two datasets In-Reply-To: References: Message-ID: <1279028224.3792.20.camel@falconeer> That sounds like a set of NumPy arrays is what you need. You can simply import your dataset to an array and perform row- and columnwise operations. 
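Before the spline walk-through below, a self-contained sketch of the whole comparison ben asked for, using plain linear interpolation; the model depths and values are taken from his post, while the 'real' numbers here are invented purely for illustration:

import numpy as np
from scipy import interpolate

# (depth, value) pairs, sorted by increasing depth
model = np.array([[-30.0, 20.0], [-20.0, 18.5], [-10.0, 17.0], [0.0, 15.5]])
real = np.array([[-35.0, 20.4], [-15.0, 18.1], [-5.0, 17.2], [0.0, 16.5]])

# linear interpolation of the real log onto the model depths
f = interpolate.interp1d(real[:, 0], real[:, 1])
real_at_model_depths = f(model[:, 0])

# per-depth differences plus simple summary statistics for the whole set
diff = model[:, 1] - real_at_model_depths
print(diff)
print(diff.mean(), diff.std(), np.abs(diff).max())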
First I would do an interpolation of the real data, though I'd probably use a cubic spline, but linear is fine too. The spline function will operate on a numpy array and return the mathematical object, *not* a new array. This spline can then be evaluated in the depths for which you have your model data. An example of how it could be done would be: modeldata = numpy.genfromtxt('modeled.data') realdata = numpy.genfromtxt('real.data') # Now say depth is the first column, and value is second: tck = scipy.interpolate.splrep(realdata[:, 0], realdata[:, 1]) iplrealdata = scipy.interpolate.splev(modeldata[:, 0], tck) #You will now have an interpolated value of the real data for every #depth of the model data - done with a cubic spline. #Linear interpolation would be done by, instead of doing splev, doing: # Interpolate: func = scipy.interpolate.interp1d(realdata[:, 0], realdata[:, 1]) # Evaluate: iplrealdata = func(modeldata[:, 0]) Cheers; Emil On Tue, 2010-07-13 at 07:00 +0000, ben h wrote: > The datasets are borehole data - so they have borehole name, depth, > and a value. > Each borehole has two datasets - one real, one modelled. > I want to compare the values between them for each depth in modelled > dataset (lower resolution / fewer samples). > If there is no matching depth in real dataset I want to linearly > interpolate between nearest values. > Comparison to be quite simple at first, difference between values, and > stats for entire set of differences. > > example data (depth, value): > model: > 0 15.5 > -10 17.0 > -20 18.5 > -30 20.0 > > real: > 0 16.5 > -1 16.6 > -2 16.6 > ... > -655 55.3 > > > Not having used python much, i don't know best data structure > (dictionary? sequence? list?), or if there are helpful things in SciPy > to help this come together (stats, methods for comparing datasets like > these, linear interp methods?). > > Looking for inspiration and pointers! > > ben. > > > ______________________________________________________________________ > Find it at CarPoint.com.au New, Used, Demo, Dealer or Private? > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at googlemail.com Thu Jul 15 11:27:45 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 15 Jul 2010 23:27:45 +0800 Subject: [SciPy-User] ANN: scipy 0.8.0 release candidate 3 Message-ID: I'm pleased to announce the availability of the third release candidate of SciPy 0.8.0. The only changes compared to rc2 are a fix for a regression in interpolate.Rbf and some fixes for failures on 64-bit Windows. If no more problems are reported, the final release will be available in one week. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. This release candidate release comes one and a half year after the 0.7.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.8.0rc3 requires Python 2.4-2.6 and NumPy 1.4.1 or greater. 
For more information, please see the release notes: http://sourceforge.net/projects/scipy/files/scipy/0.8.0rc3/NOTES.txt/view You can download the release from here: https://sourceforge.net/projects/scipy/ Python 2.5/2.6 binaries for Windows and OS X are available, as well as source tarballs for other platforms and the documentation in pdf form. Thank you to everybody who contributed to this release. Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jf at heliotis.ch Thu Jul 15 11:29:35 2010 From: jf at heliotis.ch (jf at heliotis.ch) Date: 15 Jul 2010 17:29:35 +0200 Subject: [SciPy-User] =?utf-8?q?ANN=3A_scipy_0=2E8=2E0_release_candidate_3?= Message-ID: <20100715152935.6977.qmail@einstein.sui-inter.net> I will be out of office till 8. August For urgent matters, please contact Patrick.Lambelet at heliotis.ch
From seb.haase at gmail.com Thu Jul 15 16:39:33 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 15 Jul 2010 22:39:33 +0200 Subject: [SciPy-User] one-sided gauss fit -- or: how to estimate backgound noise ? Message-ID: Hi, In image analysis one is often faced with (often unknown) background levels (offset) + (Gaussian) background noise. The overall intensity histogram of the image is in fact often Gaussian (Bell shaped), but depending on how many (foreground) objects are present the histogram shows a positive tail of some sort. So, I just got the idea if there was a function (i.e. mathematical algorithm) that would allow to fit only the left half of a Gaussian bell curve to data points !? This would have to be done in a way that the center, the variance (or sigma) and the peak height are free fitting parameters. Any help or ideas are appreciated, thanks Sebastian Haase
From zachary.pincus at yale.edu Thu Jul 15 18:13:28 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 15 Jul 2010 18:13:28 -0400 Subject: [SciPy-User] one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: References: Message-ID: <1CA111DD-C6E6-4D22-9D2A-4489D172B222@yale.edu> Hi Sebastian, > In image analysis one is often faced with (often unknown) background > levels (offset) + (Gaussian) background noise. > The overall intensity histogram of the image is in fact often Gaussian > (Bell shaped), but depending on how many (foreground) objects are > present the histogram shows a positive tail of some sort. > > So, I just got the idea if there was a function (i.e. mathematical > algorithm) that would allow to fit only the left half of a Gaussian > bell curve to data points !? > This would have to be done in a way that the center, the variance (or > sigma) and the peak height are free fitting parameters. For this task, I usually use some form of robust estimator for the mean and std, which is designed to ignore noise in the tails. Below I've pasted code that I use for an "minimum covariance determinant" estimate, which is translated from some matlab code I found online. For large images, it's slow and you'll probably want to randomly sample pixels to feed to the MCD estimator instead of using the entire image. And there are probably many simpler, faster robust estimators (like cutting off the tails, etc.) that are out there. Zach import numpy import scipy.stats as stats def unimcd(y,h): """unimcd(y, h) -> subset_mask unimcd computes the MCD estimator of a univariate data set. This estimator is given by the subset of h observations with smallest variance. The MCD location estimate is then the mean of those h points, and the MCD scale estimate is their standard deviation.
A boolean mask is returned indicating which elements of the input array are in the MCD subset. The MCD method was introduced in: Rousseeuw, P.J. (1984), "Least Median of Squares Regression," Journal of the American Statistical Association, Vol. 79, pp. 871-881. The algorithm to compute the univariate MCD is described in Rousseeuw, P.J., Leroy, A., (1988), "Robust Regression and Outlier Detection," John Wiley, New York. This function based on UNIMCD from LIBRA: the Matlab Library for Robust Analysis, available at: http://wis.kuleuven.be/stat/robust.html """ y = numpy.asarray(y, dtype=float) ncas = len(y) length = ncas-h+1 if length <= 1: return numpy.ones(len(y), dtype=bool) indices = y.argsort() y = y[indices] ind = numpy.arange(length-1) ay = numpy.empty(length) ay[0] = y[0:h].sum() ay[1:] = y[ind+h] - y[ind] ay = numpy.add.accumulate(ay) ay2=ay**2/h sq = numpy.empty(length) sq[0] = (y[0:h]**2).sum() - ay2[0] sq[1:] = y[ind+h]**2 - y[ind]**2 + ay2[ind] - ay2[ind+1] sq = numpy.add.accumulate(sq) sqmin=sq.min() ii = numpy.where(sq==sqmin)[0] Hopt = indices[ii[0]:ii[0]+h] ndup = len(ii) slutn = ay[ii] initmean=slutn[numpy.floor((ndup+1)/2 - 1)]/h initcov=sqmin/(h-1) # calculating consistency factor res=(y-initmean)**2/initcov sortres=numpy.sort(res) factor=sortres[h-1]/stats.chi2.ppf(float(h)/ncas,1) initcov=factor*initcov res=(y-initmean)**2/initcov #raw_robdist^2 quantile=stats.chi2.ppf(0.975,1) weights=res References: Message-ID: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> Hi Sebastian, in astronomy a method called kappa-sigma-clipping is sometimes used to estimate the background level by clipping away most of the signal: http://idlastro.gsfc.nasa.gov/ftp/pro/math/meanclip.pro I am not aware of a python implementation, but it's just a few lines of code. If you can identify the background level approximately by eye, e.g. by plotting a histogram of your data, you should be able to just fit the tail of the Gaussian that only contains background. Here is my attempt at doing such a fit using scipy.stats.rv_continous.fit(), similar to but not exactly what you want: from scipy.stats import norm, halfnorm, uniform signal = - uniform.rvs(0, 3, size=10000) background = norm.rvs(size=10000) data = hstack((signal, background)) hist(data, bins=30) selection = data[data>0] halfnorm.fit(selection) x = linspace(-3, 3, 100) y = selection.sum() * halfnorm.pdf(x)/3 plot(x,y) Good luck! Christoph On Jul 15, 2010, at 10:39 PM, Sebastian Haase wrote: > Hi, > In image analysis one is often faced with (often unknown) background > levels (offset) + (Gaussian) background noise. > The overall intensity histogram of the image is in fact often Gaussian > (Bell shaped), but depending on how many (foreground) objects are > present the histogram shows a positive tail of some sort. > > So, I just got the idea if there was a function (i.e. mathematical > algorithm) that would allow to fit only the left half of a Gaussian > bell curve to data points !? > This would have to be done in a way that the center, the variance (or > sigma) and the peak height are free fitting parameters. 
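Since a kappa-sigma clipper really is only a few lines, one possible numpy sketch follows; the function name, the default kappa and the iteration cap are illustrative choices rather than a faithful port of the IDL meanclip routine, and it clips symmetrically even though in this use case the contamination sits mostly in the positive tail:

import numpy as np

def sigma_clip_background(data, kappa=3.0, iterations=5):
    # repeatedly drop points more than kappa*std away from the mean, then
    # report the mean/std of the survivors as background level and noise
    clipped = np.asarray(data, dtype=float).ravel()
    for _ in range(iterations):
        m, s = clipped.mean(), clipped.std()
        keep = np.abs(clipped - m) < kappa * s
        if keep.all():
            break
        clipped = clipped[keep]
    return clipped.mean(), clipped.std()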
> > Any help or ideas are appreciated, > thanks > Sebastian Haase > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From dwf at cs.toronto.edu Thu Jul 15 20:00:09 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 15 Jul 2010 20:00:09 -0400 Subject: [SciPy-User] Saving Complex Numbers In-Reply-To: <0c2af6ba358eb4f073177cfd6784a03c.squirrel@webmail.uci.edu> References: <0c2af6ba358eb4f073177cfd6784a03c.squirrel@webmail.uci.edu> Message-ID: <936CB3EB-F2FB-4926-9AAD-1F238343EF0F@cs.toronto.edu> (CCing NumPy-discussion where this really belongs) On 2010-07-08, at 1:34 PM, cfrazer at uci.edu wrote: > Need Complex numbers in the saved file. Ack, this has come up several times according to list archives and no one's been able to provide a real answer. It seems that there is nearly no formatting support for complex numbers in Python. for a single value, "{0.real:.18e}{0.imag:+.18e}".format(val) will get the job done, but because of the way numpy.savetxt creates its format string this isn't a trivial fix. Anyone else have ideas on how complex number format strings can be elegantly incorporated in savetxt? David From briedel at wisc.edu Fri Jul 16 01:08:21 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Fri, 16 Jul 2010 00:08:21 -0500 Subject: [SciPy-User] curve_fit missing from scipy.optimize Message-ID: Hello all, I was setting up my new server at the moment and wanted to install scipy on it. I got it all setup thanks to a couple online tutorials. When I tried to run one of my scripts, I got a segmentation fault when it came to importing scipy.optimize. I then used the software manager to install another version of scipy (0.7.0-2 instead of 0.7.2). I then could at least import scipy.optimize, but scipy.optimize.curve_fit could not be found. So I installed 0.7.2 again and now scipy.optimize could be found, but curve_fit was still missing. I looked on google and could only find one solution by replacing the minpack.py file. I tried that and does not seem to work either. Any other ideas or hints? Thanks a lot in advance. Cheers, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 16 01:23:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 15 Jul 2010 23:23:32 -0600 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 11:08 PM, Benedikt Riedel wrote: > Hello all, > > I was setting up my new server at the moment and wanted to install scipy on > it. I got it all setup thanks to a couple online tutorials. When I tried to > run one of my scripts, I got a segmentation fault when it came to importing > scipy.optimize. I then used the software manager to install another version > of scipy (0.7.0-2 instead of 0.7.2). I then could at least import > scipy.optimize, but scipy.optimize.curve_fit could not be found. So I > installed 0.7.2 again and now scipy.optimize could be found, but curve_fit > was still missing. I looked on google and could only find one solution by > replacing the minpack.py file. I tried that and does not seem to work > either. Any other ideas or hints? > > What operating system/distribution is this on? What software manager? This definitely looks like an installation problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
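Picking up David Warde-Farley's savetxt question from a few messages back: short of changing numpy itself, two workarounds are to write the real and imaginary parts as separate columns, or to fall back on string formatting so each entry is written as Python's complex repr; both are sketches and the file names are made up:

import numpy as np

a = np.array([3 + 2j, 5 + 7j, 8 + 2j])

# option 1: two real columns (real, imag), easy to load back
np.savetxt("complex_as_columns.out", np.column_stack([a.real, a.imag]))
re_im = np.loadtxt("complex_as_columns.out")
print(np.allclose(a, re_im[:, 0] + 1j * re_im[:, 1]))

# option 2: let '%s' write entries such as "(3+2j)"
np.savetxt("complex_as_strings.out", a, fmt='%s')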
URL: From josef.pktd at gmail.com Fri Jul 16 01:43:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 16 Jul 2010 01:43:31 -0400 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: On Fri, Jul 16, 2010 at 1:23 AM, Charles R Harris wrote: > > > On Thu, Jul 15, 2010 at 11:08 PM, Benedikt Riedel wrote: >> >> Hello all, >> >> I was setting up my new server at the moment and wanted to install scipy >> on it. I got it all setup thanks to a couple online tutorials. When I tried >> to run one of my scripts, I got a segmentation fault when it came to >> importing scipy.optimize. I then used the software manager to install >> another version of scipy (0.7.0-2 instead of 0.7.2). I then could at least >> import scipy.optimize, but scipy.optimize.curve_fit could not be found. So I >> installed 0.7.2 again and now scipy.optimize could be found, but curve_fit >> was still missing. I looked on google and could only find one solution by >> replacing the minpack.py file. I tried that and does not seem to work >> either. Any other ideas or hints? >> > > What operating system/distribution is this on? What software manager? This > definitely looks like an installation problem. optimize.curve_fit was added to scipy after 0.7.x (if I remember correctly) curve_fit is a standalone function (plus two helper functions) and can be copied anywhere. Josef > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From briedel at wisc.edu Fri Jul 16 01:52:03 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Fri, 16 Jul 2010 00:52:03 -0500 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: Hey, It is Ubuntu 10.04 on an AMD-64 from the alternative install CD. I first installed build-essential gfortran libatlas-sse2-dev python-all-dev ipython subversion via apt-get. Then I installed nose, numpy and finally scipy using the package from the website http://python-nose.googlecode.com/files http://superb-east.dl.sourceforge.net/sourceforge/numpy http://voxel.dl.sourceforge.net/sourceforge/scipy/ When I first installed it. Nose and numpy went through fine, but scipy installation had some g++ problems, so I had to install a new g++. I tried it out and optimize was screwed. When that did not work, I installed 0.7.0 via the Ubuntu software center. After that optimize worked, but curve_fit was gone. So back to 0.7.2 I went to and optimize now worked, but curve_fit still did not. Cheers, Ben On Fri, Jul 16, 2010 at 00:23, Charles R Harris wrote: > > > On Thu, Jul 15, 2010 at 11:08 PM, Benedikt Riedel wrote: > >> Hello all, >> >> I was setting up my new server at the moment and wanted to install scipy >> on it. I got it all setup thanks to a couple online tutorials. When I tried >> to run one of my scripts, I got a segmentation fault when it came to >> importing scipy.optimize. I then used the software manager to install >> another version of scipy (0.7.0-2 instead of 0.7.2). I then could at least >> import scipy.optimize, but scipy.optimize.curve_fit could not be found. So I >> installed 0.7.2 again and now scipy.optimize could be found, but curve_fit >> was still missing. I looked on google and could only find one solution by >> replacing the minpack.py file. I tried that and does not seem to work >> either. Any other ideas or hints? >> >> > What operating system/distribution is this on? 
What software manager? This > definitely looks like an installation problem. > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 16 02:10:44 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Jul 2010 00:10:44 -0600 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 11:52 PM, Benedikt Riedel wrote: > Hey, > > It is Ubuntu 10.04 on an AMD-64 from the alternative install CD. I first > installed > > Same here except AMD Phenom. > build-essential gfortran libatlas-sse2-dev > > Special SSE2 packages are a 32 bit thing, are you running the 32 bit version of Ubuntu? > python-all-dev ipython > subversion > > via apt-get. Then I installed nose, numpy and finally scipy using the > package from the website > > Usually I apt-get numpy and scipy for the dependencies, then install from svn. If you do install from source in addition to the ubuntu packages you might want to modify the path so the proper package is used. I use $charris at ubuntu ~$ cat ~/.local/lib/python2.6/site-packages/install.pth /usr/local/lib/python2.6/dist-packages Although I suspect /usr/local/lib/python2.6/site-packages would work as well. > http://python-nose.googlecode.com/files > > http://superb-east.dl.sourceforge.net/sourceforge/numpy > http://voxel.dl.sourceforge.net/sourceforge/scipy/ > > When I first installed it. Nose and numpy went through fine, but scipy > installation had some g++ problems, so I had to install a new g++. I tried > it out and optimize was screwed. > > When that did not work, I installed 0.7.0 via the Ubuntu software center. > After that optimize worked, but curve_fit was gone. So back to 0.7.2 I went > to and optimize now worked, but curve_fit still did not. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From briedel at wisc.edu Fri Jul 16 09:22:51 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Fri, 16 Jul 2010 08:22:51 -0500 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: On Fri, Jul 16, 2010 at 01:10, Charles R Harris wrote: > > > On Thu, Jul 15, 2010 at 11:52 PM, Benedikt Riedel wrote: > >> Hey, >> >> It is Ubuntu 10.04 on an AMD-64 from the alternative install CD. I first >> installed >> >> > Same here except AMD Phenom. > >> build-essential gfortran libatlas-sse2-dev >> >> > Special SSE2 packages are a 32 bit thing, are you running the 32 bit > version of Ubuntu? > Running a 64-bit version > python-all-dev ipython >> subversion >> >> via apt-get. Then I installed nose, numpy and finally scipy using the >> package from the website >> >> > Usually I apt-get numpy and scipy for the dependencies, then install from > svn. If you do install from source in addition to the ubuntu packages you > might want to modify the path so the proper package is used. I use > > $charris at ubuntu ~$ cat ~/.local/lib/python2.6/site-packages/install.pth > /usr/local/lib/python2.6/dist-packages > > Although I suspect /usr/local/lib/python2.6/site-packages would work as > well. 
> I tried the apt-get path but it tells me that numpy and scipy is installed therefore I cant reinstall it. I dont seem to have /usr/local/lib/python2.6/site-packages but instead only have a /usr/local/lib/python2.6/dist-packages. I checked out the minpack.py in the dist-packages folder (/usr/local/lib/python2.6/dist-packages) and found that it did have a reference to curve_fit, but when I checked /usr/lib/python2.6/dist-packages and the minpack.py did not have a reference to curve_fit. http://python-nose.googlecode.com/files >> >> http://superb-east.dl.sourceforge.net/sourceforge/numpy >> http://voxel.dl.sourceforge.net/sourceforge/scipy/ >> >> When I first installed it. Nose and numpy went through fine, but scipy >> installation had some g++ problems, so I had to install a new g++. I tried >> it out and optimize was screwed. >> >> When that did not work, I installed 0.7.0 via the Ubuntu software center. >> After that optimize worked, but curve_fit was gone. So back to 0.7.2 I went >> to and optimize now worked, but curve_fit still did not. >> >> > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > Cheers, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 16 10:02:34 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Jul 2010 08:02:34 -0600 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: On Fri, Jul 16, 2010 at 7:22 AM, Benedikt Riedel wrote: > > > On Fri, Jul 16, 2010 at 01:10, Charles R Harris > wrote: > >> >> >> On Thu, Jul 15, 2010 at 11:52 PM, Benedikt Riedel wrote: >> >>> Hey, >>> >>> It is Ubuntu 10.04 on an AMD-64 from the alternative install CD. I first >>> installed >>> >>> >> Same here except AMD Phenom. >> >>> build-essential gfortran libatlas-sse2-dev >>> >>> >> Special SSE2 packages are a 32 bit thing, are you running the 32 bit >> version of Ubuntu? >> > > > Running a 64-bit version > >> python-all-dev ipython >>> subversion >>> >>> via apt-get. Then I installed nose, numpy and finally scipy using the >>> package from the website >>> >>> >> Usually I apt-get numpy and scipy for the dependencies, then install from >> svn. If you do install from source in addition to the ubuntu packages you >> might want to modify the path so the proper package is used. I use >> >> $charris at ubuntu ~$ cat ~/.local/lib/python2.6/site-packages/install.pth >> /usr/local/lib/python2.6/dist-packages >> >> Although I suspect /usr/local/lib/python2.6/site-packages would work as >> well. >> > > > I tried the apt-get path but it tells me that numpy and scipy is installed > therefore I cant reinstall it. > > I dont seem to have /usr/local/lib/python2.6/site-packages but instead only > have a /usr/local/lib/python2.6/dist-packages. > > I checked out the minpack.py in the dist-packages folder > (/usr/local/lib/python2.6/dist-packages) and found that it did have a > reference to curve_fit, but when I checked /usr/lib/python2.6/dist-packages > and the minpack.py did not have a reference to curve_fit. > > So are things working for you now? You need to make ~/.local/lib/python2.6/site-packages/install.pth yourself as it isn't there out of the box. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
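For anyone hitting the same symptom, a quick smoke test makes it obvious which SciPy the interpreter actually picks up and whether that build is new enough to carry curve_fit (the exponential model and sample data below are made up purely for illustration):

import numpy as np
import scipy
import scipy.optimize

print(scipy.__version__)    # curve_fit only exists in releases newer than 0.7.x
print(scipy.__file__)       # shows whether the system package or the /usr/local build wins

if hasattr(scipy.optimize, 'curve_fit'):
    def model(x, a, b):
        return a * np.exp(-b * x)

    xdata = np.linspace(0, 4, 50)
    ydata = model(xdata, 2.5, 1.3) + 0.05 * np.random.randn(50)
    popt, pcov = scipy.optimize.curve_fit(model, xdata, ydata)
    print(popt)             # should come back close to (2.5, 1.3)
else:
    print('this scipy.optimize predates curve_fit')
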
URL: From seb.haase at gmail.com Fri Jul 16 15:34:36 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 16 Jul 2010 21:34:36 +0200 Subject: [SciPy-User] one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> Message-ID: Zach and Christoph, thanks for your replies. I was thinking about 1D-fitting the histogram. And I need this to be fully automatic so that I can apply that to many subregions of many images. I have to think about your suggestions for a while. Thanks, Sebastian. On Fri, Jul 16, 2010 at 1:04 AM, Christoph Deil wrote: > Hi Sebastian, > > in astronomy a method called kappa-sigma-clipping is sometimes used > to estimate the background level by clipping away most of the signal: > http://idlastro.gsfc.nasa.gov/ftp/pro/math/meanclip.pro > I am not aware of a python implementation, but it's just a few lines of code. > > If you can identify the background level approximately by eye, > e.g. by plotting a histogram of your data, you should be able to > just fit the tail of the Gaussian that only contains background. > > Here is my attempt at doing such a fit using scipy.stats.rv_continous.fit(), > similar to but not exactly what you want: > > from scipy.stats import norm, halfnorm, uniform > signal = - uniform.rvs(0, 3, size=10000) > background = norm.rvs(size=10000) > data = hstack((signal, background)) > hist(data, bins=30) > selection = data[data>0] > halfnorm.fit(selection) > x = linspace(-3, 3, 100) > y = selection.sum() * halfnorm.pdf(x)/3 > plot(x,y) > > Good luck! > Christoph > > On Jul 15, 2010, at 10:39 PM, Sebastian Haase wrote: > >> Hi, >> In image analysis one is often faced with (often unknown) background >> levels (offset) + (Gaussian) background noise. >> The overall intensity histogram of the image is in fact often Gaussian >> (Bell shaped), but depending on how many (foreground) objects are >> present the histogram shows a positive tail of some sort. >> >> So, I just got the idea if there was a function (i.e. mathematical >> algorithm) that would allow to fit only the left half of a Gaussian >> bell curve to data points !? >> This would have to be done in a way that the center, the variance (or >> sigma) and the peak height are free fitting parameters. >> >> Any help or ideas are appreciated, >> thanks >> Sebastian Haase >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Fri Jul 16 18:07:08 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 16 Jul 2010 16:07:08 -0600 Subject: [SciPy-User] one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> Message-ID: <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> > I was thinking about 1D-fitting the histogram. And I need this to be > fully automatic so that I can apply that to many subregions of many > images. > I have to think about your suggestions for a while. 
Well, if you don't want to estimate the mean/std of the underlying data but just work based on the histogram, you could use a nonlinear optimizer to fit a mean/std-parameterized gaussian PDF to the 1D histogram (with some amount of the tails chopped off). Just make the loss function the RMS between the data points (histogram height) and the fit curve at the positions of each data point? That would probably work too? But I'm not a statistician. Though, now that I think on it, I seem to recall that the EM algorithm was originally deployed to estimate parameters of gaussians with censored tails. (I think the problem was: how tall was the average Frenchman, given distribution of heights of French soldiers and the knowledge that there was a height minimum for army service?) I think you just estimate the mean/std from the censored data, then fill in the censored tails with samples from the fit distribution, and then re- estimate the mean/std from the new data, etc. I forget exactly how one does this (does it work on the histogram, or the underlying data, e.g.) but that's the general idea. Zach > Thanks, > Sebastian. > > > On Fri, Jul 16, 2010 at 1:04 AM, Christoph Deil > wrote: >> Hi Sebastian, >> >> in astronomy a method called kappa-sigma-clipping is sometimes used >> to estimate the background level by clipping away most of the signal: >> http://idlastro.gsfc.nasa.gov/ftp/pro/math/meanclip.pro >> I am not aware of a python implementation, but it's just a few >> lines of code. >> >> If you can identify the background level approximately by eye, >> e.g. by plotting a histogram of your data, you should be able to >> just fit the tail of the Gaussian that only contains background. >> >> Here is my attempt at doing such a fit using >> scipy.stats.rv_continous.fit(), >> similar to but not exactly what you want: >> >> from scipy.stats import norm, halfnorm, uniform >> signal = - uniform.rvs(0, 3, size=10000) >> background = norm.rvs(size=10000) >> data = hstack((signal, background)) >> hist(data, bins=30) >> selection = data[data>0] >> halfnorm.fit(selection) >> x = linspace(-3, 3, 100) >> y = selection.sum() * halfnorm.pdf(x)/3 >> plot(x,y) >> >> Good luck! >> Christoph >> >> On Jul 15, 2010, at 10:39 PM, Sebastian Haase wrote: >> >>> Hi, >>> In image analysis one is often faced with (often unknown) background >>> levels (offset) + (Gaussian) background noise. >>> The overall intensity histogram of the image is in fact often >>> Gaussian >>> (Bell shaped), but depending on how many (foreground) objects are >>> present the histogram shows a positive tail of some sort. >>> >>> So, I just got the idea if there was a function (i.e. mathematical >>> algorithm) that would allow to fit only the left half of a Gaussian >>> bell curve to data points !? >>> This would have to be done in a way that the center, the variance >>> (or >>> sigma) and the peak height are free fitting parameters. 
>>> >>> Any help or ideas are appreciated, >>> thanks >>> Sebastian Haase >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From brian.lee.hawthorne at gmail.com Sat Jul 17 11:19:30 2010 From: brian.lee.hawthorne at gmail.com (Brian Hawthorne) Date: Sat, 17 Jul 2010 08:19:30 -0700 Subject: [SciPy-User] (no subject) Message-ID: http://sites.google.com/site/fdgy754g/ljie4j From brian.lee.hawthorne at gmail.com Sat Jul 17 11:19:35 2010 From: brian.lee.hawthorne at gmail.com (Brian Hawthorne) Date: Sat, 17 Jul 2010 08:19:35 -0700 Subject: [SciPy-User] (no subject) Message-ID: http://sites.google.com/site/fdgy754g/avsu5o From cool-rr at cool-rr.com Sun Jul 18 13:00:26 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Sun, 18 Jul 2010 19:00:26 +0200 Subject: [SciPy-User] Python 2.7 MSI installer for SciPy Message-ID: Hello. I'd appreciate if the SciPy team could provide an MSI installer for Python 2.7. Thanks, Ram Rachum. -------------- next part -------------- An HTML attachment was scrubbed... URL: From briedel at wisc.edu Mon Jul 19 00:30:53 2010 From: briedel at wisc.edu (Benedikt Riedel) Date: Sun, 18 Jul 2010 23:30:53 -0500 Subject: [SciPy-User] curve_fit missing from scipy.optimize In-Reply-To: References: Message-ID: Hi Sorry for the late reply. I was out for a couple days. I was trying to create the install.pth file and noticed that there is no ~/.local/lib/ directory. This might the problem here, but i am not sure. Cheers, Ben On Fri, Jul 16, 2010 at 09:02, Charles R Harris wrote: > > > On Fri, Jul 16, 2010 at 7:22 AM, Benedikt Riedel wrote: > >> >> >> On Fri, Jul 16, 2010 at 01:10, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Thu, Jul 15, 2010 at 11:52 PM, Benedikt Riedel wrote: >>> >>>> Hey, >>>> >>>> It is Ubuntu 10.04 on an AMD-64 from the alternative install CD. I first >>>> installed >>>> >>>> >>> Same here except AMD Phenom. >>> >>>> build-essential gfortran libatlas-sse2-dev >>>> >>>> >>> Special SSE2 packages are a 32 bit thing, are you running the 32 bit >>> version of Ubuntu? >>> >> >> >> Running a 64-bit version >> >>> python-all-dev ipython >>>> subversion >>>> >>>> via apt-get. Then I installed nose, numpy and finally scipy using the >>>> package from the website >>>> >>>> >>> Usually I apt-get numpy and scipy for the dependencies, then install from >>> svn. If you do install from source in addition to the ubuntu packages you >>> might want to modify the path so the proper package is used. I use >>> >>> $charris at ubuntu ~$ cat ~/.local/lib/python2.6/site-packages/install.pth >>> /usr/local/lib/python2.6/dist-packages >>> >>> Although I suspect /usr/local/lib/python2.6/site-packages would work as >>> well. >>> >> >> >> I tried the apt-get path but it tells me that numpy and scipy is installed >> therefore I cant reinstall it. >> >> I dont seem to have /usr/local/lib/python2.6/site-packages but instead >> only have a /usr/local/lib/python2.6/dist-packages. 
>> >> I checked out the minpack.py in the dist-packages folder >> (/usr/local/lib/python2.6/dist-packages) and found that it did have a >> reference to curve_fit, but when I checked /usr/lib/python2.6/dist-packages >> and the minpack.py did not have a reference to curve_fit. >> >> > So are things working for you now? You need to make > ~/.local/lib/python2.6/site-packages/install.pth yourself as it isn't there > out of the box. > > > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Benedikt Riedel Graduate Student University of Wisconsin-Madison Department of Physics Office: 2304 Chamberlin Hall Lab: 6247 Chamberlin Hall Tel: (608) 301-5736 Cell: (213) 519-1771 Lab: (608) 262-5916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Mon Jul 19 14:42:32 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 19 Jul 2010 13:42:32 -0500 Subject: [SciPy-User] ANN: scipy 0.8.0 release candidate 3 In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 10:27 AM, Ralf Gommers wrote: > I'm pleased to announce the availability of the third release candidate of > SciPy 0.8.0. The only changes compared to rc2 are a fix for a regression in > interpolate.Rbf and some fixes for failures on 64-bit Windows. If no more > problems are reported, the final release will be available in one week. > > SciPy is a package of tools for science and engineering for Python. It > includes modules for statistics, optimization, integration, linear algebra, > Fourier transforms, signal and image processing, ODE solvers, and more. > > This release candidate release comes one and a half year after the 0.7.0 > release and contains many new features, numerous bug-fixes, improved test > coverage, and better documentation. ?Please note that SciPy 0.8.0rc3 > requires Python 2.4-2.6 and NumPy 1.4.1 or greater. > > For more information, please see the release notes: > http://sourceforge.net/projects/scipy/files/scipy/0.8.0rc3/NOTES.txt/view > > You can download the release from here: > https://sourceforge.net/projects/scipy/ > Python 2.5/2.6 binaries for Windows and OS X are available, as well as > source tarballs for other platforms and the documentation in pdf form. > > Thank you to everybody who contributed to this release. > > Enjoy, > Ralf > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > Hi, I do not have any errors with Fedora 13 Python 2.6 version. This bug still prevents building scipy under Python 2.7: http://projects.scipy.org/scipy/ticket/1180 There are four tests that fail under Python2.4 with my Linux 64bit system. The first is due to 'functools' which is new in Python 2.5 (http://docs.python.org/library/functools.html) and the rest are from the same problem. I have not had the time yet to look further at these. Also, I have to update by SVN to see what Python 2.4 issues are present. My Unladen Swallow 2009Q4 build (python 2.6) crashes during tests: test_singular (test_linsolve.TestLinsolve) ... ok test_twodiags (test_linsolve.TestLinsolve) ... ok test_linsolve.TestSplu.test_lu_refcount ... ok test_linsolve.TestSplu.test_spilu_nnz0 ... Fatal Python error: UNREF invalid object Aborted I have Unladen Swallow out of curiosity so I don't really care about this problem. 
Since the standard Python 2.6 runs it may just be that version Unladen Swallow. Python 2.5 is missing because it not built correctly (I get one error due to missing _md5 modules as a result). Bruce Python 2.4.5 (#1, Oct 6 2008, 09:54:35) [GCC 4.3.2 20080917 (Red Hat 4.3.2-4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> import scipy as sp >>> np.__version__ '2.0.0.dev8391' >>> sp.__version__ '0.8.0rc3' ====================================================================== ERROR: Failure: ImportError (No module named functools) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.4/site-packages/nose/loader.py", line 363, in loadTestsFromName module = self.importer.importFromPath( File "/usr/local/lib/python2.4/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/local/lib/python2.4/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.4/site-packages/scipy/io/matlab/tests/test_mio.py", line 11, in ? from functools import partial ImportError: No module named functools ====================================================================== ERROR: test suite ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.4/site-packages/nose/suite.py", line 154, in run self.setUp() File "/usr/local/lib/python2.4/site-packages/nose/suite.py", line 180, in setUp if not self: File "/usr/local/lib/python2.4/site-packages/nose/suite.py", line 65, in __nonzero__ test = self.test_generator.next() File "/usr/local/lib/python2.4/site-packages/nose/loader.py", line 221, in generate for test in g(): File "/usr/local/lib/python2.4/site-packages/scipy/io/tests/test_netcdf.py", line 52, in test_read_write_files f = netcdf_file('simple.nc') File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 182, in __init__ self._read() File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 411, in _read self._read_var_array() File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 451, in _read_var_array (name, dimensions, shape, attributes, File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 533, in _read_var dimname = self._dims[dimid] TypeError: list indices must be integers ====================================================================== ERROR: test suite ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.4/site-packages/nose/suite.py", line 154, in run self.setUp() File "/usr/local/lib/python2.4/site-packages/nose/suite.py", line 180, in setUp if not self: File "/usr/local/lib/python2.4/site-packages/nose/suite.py", line 65, in __nonzero__ test = self.test_generator.next() File "/usr/local/lib/python2.4/site-packages/nose/loader.py", line 221, in generate for test in g(): File "/usr/local/lib/python2.4/site-packages/scipy/io/tests/test_netcdf.py", line 91, in test_read_write_sio f2 = netcdf_file(eg_sio2) File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 182, in __init__ self._read() File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 411, in _read self._read_var_array() File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 451, in _read_var_array 
(name, dimensions, shape, attributes, File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 533, in _read_var dimname = self._dims[dimid] TypeError: list indices must be integers ====================================================================== ERROR: test_netcdf.test_read_example_data ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.4/site-packages/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib/python2.4/site-packages/scipy/io/tests/test_netcdf.py", line 119, in test_read_example_data f = netcdf_file(fname, 'r') File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 182, in __init__ self._read() File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 411, in _read self._read_var_array() File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 451, in _read_var_array (name, dimensions, shape, attributes, File "/usr/local/lib/python2.4/site-packages/scipy/io/netcdf.py", line 533, in _read_var dimname = self._dims[dimid] TypeError: list indices must be integers ---------------------------------------------------------------------- Ran 4258 tests in 43.713s FAILED (KNOWNFAIL=13, SKIP=38, errors=4) From palaniappan.chetty at gmail.com Mon Jul 19 19:04:33 2010 From: palaniappan.chetty at gmail.com (=?UTF-8?B?4K6q4K604K6o4K6/IOCumuCvhw==?=) Date: Mon, 19 Jul 2010 19:04:33 -0400 Subject: [SciPy-User] pylab Message-ID: hi, I have a question about pylab/matplotlib, I am interested in plots and I want to know if I can have some data points in a data sets missing but still create a plot using pylab? For example (assuming all modules have been imported) >x = [1,2,3,4] >y=[10,20,30,40] >pylab.plot(x,y) >pylab.show() works fine. But what if I have one or more data points missing in my y data set? like this >x = [1,2,3,4] >y=[10,20, ,40] >pylab.plot(x,y) >pylab.show() I know that I cannot have an empty element in my list and this does not work Thanks -- Palani From josh.holbrook at gmail.com Mon Jul 19 19:24:28 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Mon, 19 Jul 2010 15:24:28 -0800 Subject: [SciPy-User] pylab In-Reply-To: References: Message-ID: 2010/7/19 ???? ?? : > hi, > I have a question about pylab/matplotlib, I am interested in plots and > I want to know if I can have some data points in a data sets missing > but still create a plot using pylab? For example (assuming all modules > have been imported) > >>x = [1,2,3,4] >>y=[10,20,30,40] >>pylab.plot(x,y) >>pylab.show() > > works fine. But what if I have one or more data points missing in my y > data set? like this > >>x = [1,2,3,4] >>y=[10,20, ,40] >>pylab.plot(x,y) >>pylab.show() > I know that I cannot have an empty element in my list and this does not work > > Thanks > -- > Palani > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Hey Palani, I'm not extremely familiar with matplotlib, but my experience tells me that, while MPL itself wouldn't really have any nice way to do this, that you could import a dataset and use python/numpy to clean it up. 
For example, you could maybe use filter() and zip() (zip's my favorite toy), maybe like this: In [30]: x Out[30]: [0, 1, 2, 3, 4] In [31]: y Out[31]: [0, 1, 4, None, 16] In [32]: zip(*filter(lambda x: x[1] != None, zip(x,y))) Out[32]: [(0, 1, 2, 4), (0, 1, 4, 16)] and then you could do plot(_[0],_[1]). Alternately, and this would probably be worth investigating for bigger datasets, you could maybe use masked arrays (http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html) to do something similar in spirit. Hope that helped! --Josh From PHobson at Geosyntec.com Mon Jul 19 19:39:18 2010 From: PHobson at Geosyntec.com (PHobson at Geosyntec.com) Date: Mon, 19 Jul 2010 19:39:18 -0400 Subject: [SciPy-User] pylab In-Reply-To: References: Message-ID: > -----Original Message----- > From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] > On Behalf Of Joshua Holbrook > Sent: Monday, July 19, 2010 4:24 PM > To: SciPy Users List > Subject: Re: [SciPy-User] pylab > > 2010/7/19 ???? ?? : > > hi, > > I have a question about pylab/matplotlib, I am interested in plots and > > I want to know if I can have some data points in a data sets missing > > but still create a plot using pylab? For example (assuming all modules > > have been imported) > > > >>x = [1,2,3,4] > >>y=[10,20,30,40] > >>pylab.plot(x,y) > >>pylab.show() > > > > works fine. But what if I have one or more data points missing in my y > > data set? like this > > > >>x = [1,2,3,4] > >>y=[10,20, ,40] > >>pylab.plot(x,y) > >>pylab.show() > > I know that I cannot have an empty element in my list and this does not > work > > > > Thanks > > -- > > Palani > > > > Hey Palani, > > I'm not extremely familiar with matplotlib, but my experience tells me > that, while MPL itself wouldn't really have any nice way to do this, > that you could import a dataset and use python/numpy to clean it up. > For example, you could maybe use filter() and zip() (zip's my > favorite toy), maybe like this: > > > In [30]: x > Out[30]: [0, 1, 2, 3, 4] > > In [31]: y > Out[31]: [0, 1, 4, None, 16] > > In [32]: zip(*filter(lambda x: x[1] != None, zip(x,y))) > Out[32]: [(0, 1, 2, 4), (0, 1, 4, 16)] > > and then you could do plot(_[0],_[1]). Alternately, and this would > probably be worth investigating for bigger datasets, you could maybe > use masked arrays > (http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html) > to do something similar in spirit. > > Hope that helped! As a big MPL user, that's an interesting solution. MPL and numpy were my primary gateways into Python from Matlab, so that's pretty informative for me. Given my background, I tend to take the more brute-force approach and would use the masked arrays. For the OP: #--- import numpy as np import matplotlib.pyplot as plt x = np.arange(5) y = np.ma.MaskedArray(data=[0,1,4,None,16],mask=[0,0,0,1,0]) fig = plt.figure() ax1 = fig.add_subplot(111) ax1.plot(x,y,'ko') fig.savefig('masktest.png') -paul -------------- next part -------------- A non-text attachment was scrubbed... 
Name: masktest.png Type: image/png Size: 38523 bytes Desc: masktest.png URL: From josh.holbrook at gmail.com Mon Jul 19 19:49:18 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Mon, 19 Jul 2010 15:49:18 -0800 Subject: [SciPy-User] pylab In-Reply-To: References: Message-ID: On Mon, Jul 19, 2010 at 3:39 PM, wrote: > > >> -----Original Message----- >> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] >> On Behalf Of Joshua Holbrook >> Sent: Monday, July 19, 2010 4:24 PM >> To: SciPy Users List >> Subject: Re: [SciPy-User] pylab >> >> 2010/7/19 ???? ?? : >> > hi, >> > I have a question about pylab/matplotlib, I am interested in plots and >> > I want to know if I can have some data points in a data sets missing >> > but still create a plot using pylab? For example (assuming all modules >> > have been imported) >> > >> >>x = [1,2,3,4] >> >>y=[10,20,30,40] >> >>pylab.plot(x,y) >> >>pylab.show() >> > >> > works fine. But what if I have one or more data points missing in my y >> > data set? like this >> > >> >>x = [1,2,3,4] >> >>y=[10,20, ,40] >> >>pylab.plot(x,y) >> >>pylab.show() >> > I know that I cannot have an empty element in my list and this does not >> work >> > >> > Thanks >> > -- >> > Palani >> > >> >> Hey Palani, >> >> I'm not extremely familiar with matplotlib, but my experience tells me >> that, while MPL itself wouldn't really have any nice way to do this, >> that you could import a dataset and use python/numpy to clean it up. >> For example, you could maybe use filter() and zip() ?(zip's my >> favorite toy), maybe like this: >> >> >> ? ? In [30]: x >> ? ? Out[30]: [0, 1, 2, 3, 4] >> >> ? ? In [31]: y >> ? ? Out[31]: [0, 1, 4, None, 16] >> >> ? ? In [32]: zip(*filter(lambda x: x[1] != None, zip(x,y))) >> ? ? Out[32]: [(0, 1, 2, 4), (0, 1, 4, 16)] >> >> and then you could do plot(_[0],_[1]). ?Alternately, and this would >> probably be worth investigating for bigger datasets, you could maybe >> use masked arrays >> (http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html) >> to do something similar in spirit. >> >> Hope that helped! > > As a big MPL user, that's an interesting solution. MPL and numpy were my primary gateways into Python from Matlab, so that's pretty informative for me. Given my background, I tend to take the more brute-force approach and would use the masked arrays. > > For the OP: > #--- > import numpy as np > import matplotlib.pyplot as plt > x = np.arange(5) > y = np.ma.MaskedArray(data=[0,1,4,None,16],mask=[0,0,0,1,0]) > fig = plt.figure() > ax1 = fig.add_subplot(111) > ax1.plot(x,y,'ko') > fig.savefig('masktest.png') > > -paul > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > So you can pass MaskedArrays to MPL and it'll filter them out on its own? Neat! Probably faster too, for large datasets. --Josh From lists at hilboll.de Mon Jul 19 20:07:27 2010 From: lists at hilboll.de (Andreas) Date: Tue, 20 Jul 2010 02:07:27 +0200 (CEST) Subject: [SciPy-User] pylab Message-ID: > works fine. But what if I have one or more data points missing in my y data set? like this >> x = [1,2,3,4] >> y=[10,20, ,40] >> pylab.plot(x,y) >> pylab.show() take a look at numpy.masked_array. matplotlib can handle masked arrays. cheers, a. 
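A minimal sketch of that masked-array route, with made-up numbers standing in for the missing entry: write the gap as nan, mask it, and let matplotlib skip it.

import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10.0, 20.0, np.nan, 40.0]      # the missing point written as nan
ym = np.ma.masked_invalid(y)        # masks the nan entry

plt.plot(x, ym, 'o-')
plt.show()
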
From ben.root at ou.edu Mon Jul 19 19:59:29 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 19 Jul 2010 18:59:29 -0500 Subject: [SciPy-User] pylab In-Reply-To: References: Message-ID: On Mon, Jul 19, 2010 at 6:49 PM, Joshua Holbrook wrote: > On Mon, Jul 19, 2010 at 3:39 PM, wrote: > > > > > >> -----Original Message----- > >> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org > ] > >> On Behalf Of Joshua Holbrook > >> Sent: Monday, July 19, 2010 4:24 PM > >> To: SciPy Users List > >> Subject: Re: [SciPy-User] pylab > >> > >> 2010/7/19 ???? ?? : > >> > hi, > >> > I have a question about pylab/matplotlib, I am interested in plots and > >> > I want to know if I can have some data points in a data sets missing > >> > but still create a plot using pylab? For example (assuming all modules > >> > have been imported) > >> > > >> >>x = [1,2,3,4] > >> >>y=[10,20,30,40] > >> >>pylab.plot(x,y) > >> >>pylab.show() > >> > > >> > works fine. But what if I have one or more data points missing in my y > >> > data set? like this > >> > > >> >>x = [1,2,3,4] > >> >>y=[10,20, ,40] > >> >>pylab.plot(x,y) > >> >>pylab.show() > >> > I know that I cannot have an empty element in my list and this does > not > >> work > >> > > >> > Thanks > >> > -- > >> > Palani > >> > > >> > >> Hey Palani, > >> > >> I'm not extremely familiar with matplotlib, but my experience tells me > >> that, while MPL itself wouldn't really have any nice way to do this, > >> that you could import a dataset and use python/numpy to clean it up. > >> For example, you could maybe use filter() and zip() (zip's my > >> favorite toy), maybe like this: > >> > >> > >> In [30]: x > >> Out[30]: [0, 1, 2, 3, 4] > >> > >> In [31]: y > >> Out[31]: [0, 1, 4, None, 16] > >> > >> In [32]: zip(*filter(lambda x: x[1] != None, zip(x,y))) > >> Out[32]: [(0, 1, 2, 4), (0, 1, 4, 16)] > >> > >> and then you could do plot(_[0],_[1]). Alternately, and this would > >> probably be worth investigating for bigger datasets, you could maybe > >> use masked arrays > >> (http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html) > >> to do something similar in spirit. > >> > >> Hope that helped! > > > > As a big MPL user, that's an interesting solution. MPL and numpy were my > primary gateways into Python from Matlab, so that's pretty informative for > me. Given my background, I tend to take the more brute-force approach and > would use the masked arrays. > > > > For the OP: > > #--- > > import numpy as np > > import matplotlib.pyplot as plt > > x = np.arange(5) > > y = np.ma.MaskedArray(data=[0,1,4,None,16],mask=[0,0,0,1,0]) > > fig = plt.figure() > > ax1 = fig.add_subplot(111) > > ax1.plot(x,y,'ko') > > fig.savefig('masktest.png') > > > > -paul > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > So you can pass MaskedArrays to MPL and it'll filter them out on its > own? Neat! Probably faster too, for large datasets. > > > --Josh > Yes, MaskedArrays are the preferred way to do this. If you run into a situation where plotting MaskedArrays does not work, then that is a bug and should be reported. Btw, I would avoid using None as an empty value. NaNs might be better. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at enthought.com Mon Jul 19 20:01:02 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 19 Jul 2010 19:01:02 -0500 Subject: [SciPy-User] pylab In-Reply-To: References: Message-ID: <4C44E73E.5040808@enthought.com> Andreas wrote: >> works fine. But what if I have one or more data points missing in my y >> > data set? like this > >>> x = [1,2,3,4] >>> y=[10,20, ,40] >>> pylab.plot(x,y) >>> pylab.show() >>> > > take a look at numpy.masked_array. matplotlib can handle masked arrays. > > It also skips nan's. E.g. In [21]: x = range(10) In [22]: y = [10,20,25,nan,40,10,15,nan,40,50] In [23]: plot(x, y, 'bo-') (I'm using ipython with the -pylab option, so 'nan' is really 'numpy.nan'.) Warren > cheers, > > a. > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pgmdevlist at gmail.com Mon Jul 19 20:05:07 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 19 Jul 2010 20:05:07 -0400 Subject: [SciPy-User] pylab In-Reply-To: References: Message-ID: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> On Jul 19, 2010, at 7:59 PM, Benjamin Root wrote: > > Yes, MaskedArrays are the preferred way to do this. If you run into a situation where plotting MaskedArrays does not work, then that is a bug and should be reported. > > Btw, I would avoid using None as an empty value. NaNs might be better. Indeed, Ben, indeed. A None in a list as input of numpy.ma.array will give your array a 'object' dtype, which will probably not be what you expect. numpy.nans are an option if you deal with floats, not if you deal with integers... But keep in mind that whatever value you choose can be masked: check the masked_where function and similar, for example... From djpine at gmail.com Tue Jul 20 08:08:18 2010 From: djpine at gmail.com (David Pine) Date: Tue, 20 Jul 2010 08:08:18 -0400 Subject: [SciPy-User] inserting nan at a point in two arrays Message-ID: <56A9600E-FD3D-4485-BFE0-D57EF2B884CE@gmail.com> I want to plot tan(x) vs x with a gap in the line plot when x goes thought pi/2, the point where tan(x) goes from +infinity to -infinity. The idea is to insert nan between the points in the tan(x) and x arrays at x=pi/2, which will leave the desired gap when plot is called. Does anyone have an efficient way to do this? From PHobson at Geosyntec.com Tue Jul 20 12:10:58 2010 From: PHobson at Geosyntec.com (PHobson at Geosyntec.com) Date: Tue, 20 Jul 2010 12:10:58 -0400 Subject: [SciPy-User] inserting nan at a point in two arrays In-Reply-To: <56A9600E-FD3D-4485-BFE0-D57EF2B884CE@gmail.com> References: <56A9600E-FD3D-4485-BFE0-D57EF2B884CE@gmail.com> Message-ID: > -----Original Message----- > From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] > On Behalf Of David Pine > Sent: Tuesday, July 20, 2010 5:08 AM > To: scipy-user at scipy.org > Subject: [SciPy-User] inserting nan at a point in two arrays > > I want to plot tan(x) vs x with a gap in the line plot when x goes > thought pi/2, the point where tan(x) goes from +infinity to -infinity. > The idea is to insert nan between the points in the tan(x) and x arrays > at x=pi/2, which will leave the desired gap when plot is called. Does > anyone have an efficient way to do this? Simplest thing to do would be to set a threshold (10,000?) and use a masked array. 
# code --- import numpy as np import matplotlib.pyplot as plt threshold = 1e5 x = np.arange(0, 2*np.pi, np.pi/16) y = np.tan(x) ym = np.ma.MaskedArray(y, np.abs(y) > threshold) fig = plt.figure() ax1 = fig.add_subplot(1,1,1) ax1.plot(x, ym, 'k-') ax1.set_xlabel('Radians') ax1.set_ylabel('Masked Tangent Function') # ---/code Hope that help -paul From PHobson at Geosyntec.com Tue Jul 20 12:17:20 2010 From: PHobson at Geosyntec.com (PHobson at Geosyntec.com) Date: Tue, 20 Jul 2010 12:17:20 -0400 Subject: [SciPy-User] pylab In-Reply-To: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> References: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> Message-ID: > -----Original Message----- > From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] > On Behalf Of Pierre GM > Sent: Monday, July 19, 2010 5:05 PM > To: SciPy Users List > Subject: Re: [SciPy-User] pylab > > > On Jul 19, 2010, at 7:59 PM, Benjamin Root wrote: > > Btw, I would avoid using None as an empty value. NaNs might be better. > > Indeed, Ben, indeed. A None in a list as input of numpy.ma.array will > give your array a 'object' dtype, which will probably not be what you > expect. > numpy.nans are an option if you deal with floats, not if you deal with > integers... > But keep in mind that whatever value you choose can be masked: check the > masked_where function and similar, for example... Ben and Pierre, Thanks for the tips! Glad I chimed in here b/c I've definitely learned something. I'm often pulling data out from a database, so null records come back to me as None's. I'll be sure to set a CASE statement now that fills the NULLs in with an obviously junk value that I can mask from now on. -paul From josh.holbrook at gmail.com Tue Jul 20 12:27:28 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 20 Jul 2010 08:27:28 -0800 Subject: [SciPy-User] pylab In-Reply-To: References: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> Message-ID: On Tue, Jul 20, 2010 at 8:17 AM, wrote: >> -----Original Message----- >> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] >> On Behalf Of Pierre GM >> Sent: Monday, July 19, 2010 5:05 PM >> To: SciPy Users List >> Subject: Re: [SciPy-User] pylab >> >> >> On Jul 19, 2010, at 7:59 PM, Benjamin Root wrote: >> > Btw, I would avoid using None as an empty value. ?NaNs might be better. >> >> Indeed, Ben, indeed. A None in a list as input of numpy.ma.array will >> give your array a 'object' dtype, which will probably not be what you >> expect. >> numpy.nans are an option if you deal with floats, not if you deal with >> integers... >> But keep in mind that whatever value you choose can be masked: check the >> masked_where function and similar, for example... > > Ben and Pierre, > > Thanks for the tips! Glad I chimed in here b/c I've definitely learned something. I'm often pulling data out from a database, so null records come back to me as None's. I'll be sure to set a CASE statement now that fills the NULLs in with an obviously junk value that I can mask from now on. > > -paul > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Why not user your None as said junk value? 
--Josh From pgmdevlist at gmail.com Tue Jul 20 12:35:06 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 20 Jul 2010 12:35:06 -0400 Subject: [SciPy-User] pylab In-Reply-To: References: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> Message-ID: <52709C90-92EB-47EF-94E1-E8EE89A3A24B@gmail.com> On Jul 20, 2010, at 12:27 PM, Joshua Holbrook wrote: >>> >> >> Ben and Pierre, >> >> Thanks for the tips! Glad I chimed in here b/c I've definitely learned something. I'm often pulling data out from a database, so null records come back to me as None's. I'll be sure to set a CASE statement now that fills the NULLs in with an obviously junk value that I can mask from now on. Well, depend on the kind of db you use. I have a piece of code somewhere that lets you get a MaskedArray or a TimeSeries from a SQLite db, there should be support for MaskedArray in pytables... From palaniappan.chetty at gmail.com Tue Jul 20 13:02:30 2010 From: palaniappan.chetty at gmail.com (=?UTF-8?B?4K6q4K604K6o4K6/IOCumuCvhw==?=) Date: Tue, 20 Jul 2010 13:02:30 -0400 Subject: [SciPy-User] SciPy-User Digest, Vol 83, Issue 36 In-Reply-To: References: Message-ID: Thanks guys! On Tue, Jul 20, 2010 at 1:00 PM, wrote: > Send SciPy-User mailing list submissions to > scipy-user at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/scipy-user > or, via email, send a message with subject or body 'help' to > scipy-user-request at scipy.org > > You can reach the person managing the list at > scipy-user-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of SciPy-User digest..." > > > Today's Topics: > > 1. Re: pylab (Joshua Holbrook) > 2. Re: pylab (Pierre GM) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 20 Jul 2010 08:27:28 -0800 > From: Joshua Holbrook > Subject: Re: [SciPy-User] pylab > To: SciPy Users List > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > On Tue, Jul 20, 2010 at 8:17 AM, wrote: > >> -----Original Message----- > >> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org > ] > >> On Behalf Of Pierre GM > >> Sent: Monday, July 19, 2010 5:05 PM > >> To: SciPy Users List > >> Subject: Re: [SciPy-User] pylab > >> > >> > >> On Jul 19, 2010, at 7:59 PM, Benjamin Root wrote: > >> > Btw, I would avoid using None as an empty value. ?NaNs might be > better. > >> > >> Indeed, Ben, indeed. A None in a list as input of numpy.ma.array will > >> give your array a 'object' dtype, which will probably not be what you > >> expect. > >> numpy.nans are an option if you deal with floats, not if you deal with > >> integers... > >> But keep in mind that whatever value you choose can be masked: check the > >> masked_where function and similar, for example... > > > > Ben and Pierre, > > > > Thanks for the tips! Glad I chimed in here b/c I've definitely learned > something. I'm often pulling data out from a database, so null records come > back to me as None's. I'll be sure to set a CASE statement now that fills > the NULLs in with an obviously junk value that I can mask from now on. > > > > -paul > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > Why not user your None as said junk value? 
> > --Josh > > > ------------------------------ > > Message: 2 > Date: Tue, 20 Jul 2010 12:35:06 -0400 > From: Pierre GM > Subject: Re: [SciPy-User] pylab > To: SciPy Users List > Message-ID: <52709C90-92EB-47EF-94E1-E8EE89A3A24B at gmail.com> > Content-Type: text/plain; charset=us-ascii > > > On Jul 20, 2010, at 12:27 PM, Joshua Holbrook wrote: > >>> > >> > >> Ben and Pierre, > >> > >> Thanks for the tips! Glad I chimed in here b/c I've definitely learned > something. I'm often pulling data out from a database, so null records come > back to me as None's. I'll be sure to set a CASE statement now that fills > the NULLs in with an obviously junk value that I can mask from now on. > > Well, depend on the kind of db you use. I have a piece of code somewhere > that lets you get a MaskedArray or a TimeSeries from a SQLite db, there > should be support for MaskedArray in pytables... > > > > ------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > End of SciPy-User Digest, Vol 83, Issue 36 > ****************************************** > -- Palani Home: (708) 872-5264 -------------- next part -------------- An HTML attachment was scrubbed... URL: From PHobson at Geosyntec.com Tue Jul 20 13:24:41 2010 From: PHobson at Geosyntec.com (PHobson at Geosyntec.com) Date: Tue, 20 Jul 2010 13:24:41 -0400 Subject: [SciPy-User] pylab In-Reply-To: References: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> Message-ID: > > Ben and Pierre, > > > > Thanks for the tips! Glad I chimed in here b/c I've definitely learned > something. I'm often pulling data out from a database, so null records > come back to me as None's. I'll be sure to set a CASE statement now that > fills the NULLs in with an obviously junk value that I can mask from now > on. > > > > -paul > Why not user your None as said junk value? > > --Josh All my existing routines just throw values from the database (Postgres or MS SQL) cursor directly into numpy arrays. As Pierre pointed out, the None values force the array's dtype to object. In [40]: x Out[40]: array([2.5, 12.2, 5, None], dtype=object) I could easily mask the Nones, but switching it over to a value such as -99999 would let it me keep the dtype as float or integer. Probably not a big deal at the moment, but I'm trying to adopt as many best practices as possible so nothing comes back to bite me later on. (Negative values are pretty rare in environmental data, so I think it's a safe bet). -paul From djpine at gmail.com Tue Jul 20 13:28:24 2010 From: djpine at gmail.com (David Pine) Date: Wed, 21 Jul 2010 01:28:24 +0800 Subject: [SciPy-User] inserting nan at a point in two arrays In-Reply-To: References: <56A9600E-FD3D-4485-BFE0-D57EF2B884CE@gmail.com> Message-ID: <32A0A102-C01C-4792-88DA-CA489CA282B1@gmail.com> Paul, Marvelous! Thanks. That does the trick. Dave On Jul 21, 2010, at 12:10 AM, wrote: >> -----Original Message----- >> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] >> On Behalf Of David Pine >> Sent: Tuesday, July 20, 2010 5:08 AM >> To: scipy-user at scipy.org >> Subject: [SciPy-User] inserting nan at a point in two arrays >> >> I want to plot tan(x) vs x with a gap in the line plot when x goes >> thought pi/2, the point where tan(x) goes from +infinity to -infinity. >> The idea is to insert nan between the points in the tan(x) and x arrays >> at x=pi/2, which will leave the desired gap when plot is called. 
Does >> anyone have an efficient way to do this? > > Simplest thing to do would be to set a threshold (10,000?) and use a masked array. > > # code --- > import numpy as np > import matplotlib.pyplot as plt > > threshold = 1e5 > x = np.arange(0, 2*np.pi, np.pi/16) > y = np.tan(x) > ym = np.ma.MaskedArray(y, np.abs(y) > threshold) > > fig = plt.figure() > ax1 = fig.add_subplot(1,1,1) > ax1.plot(x, ym, 'k-') > ax1.set_xlabel('Radians') > ax1.set_ylabel('Masked Tangent Function') > # ---/code > > Hope that help > -paul > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From seb.haase at gmail.com Tue Jul 20 15:14:20 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 20 Jul 2010 21:14:20 +0200 Subject: [SciPy-User] Fwd: one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> Message-ID: On Sat, Jul 17, 2010 at 12:07 AM, Zachary Pincus wrote: >> I was thinking about 1D-fitting the histogram. And I need this to be >> fully automatic so that I can apply that to many subregions of many >> images. >> I have to think about your suggestions for a while. > > Well, if you don't want to estimate the mean/std of the underlying > data but just work based on the histogram, you could use a nonlinear > optimizer to fit a mean/std-parameterized gaussian PDF to the 1D > histogram (with some amount of the tails chopped off). Just make the > loss function the RMS between the data points (histogram height) and > the fit curve at the positions of each data point? That would probably > work too? But I'm not a statistician. > Zach, thanks for your reply. The idea is to calculate the mean/std of the *background noise* of the underlying (2d or 3d) image data based on the image's 1d image intensity histogram. Regarding the "tail", the problem is that in general the signal intensities are not well separated from the background. Thus, the right half of the background's (Gaussian) noise distribution may already be significantly miss-shaped - whereas the left side, i.e. all values below the mean background level, should be nicely Bell-shape distributed. (This idea comes also from looking at many hundreds of image histogram [my wx/OpenGL based image viewer always displays the image intensity histogram right below the image]) >From your answer I just got the idea of changing the error function in such a way to only sum up the datapoint-model-errors (IOW, build the RMS) over intensities smaller than the models mean value. So, the resulting question becomes if the error function has the Gaussian-mean value of the "current" fitting step available to it !? I can probably find that out. Thanks again, Sebastian > Though, now that I think on it, I seem to recall that the EM algorithm > was originally deployed to estimate parameters of gaussians with > censored tails. (I think the problem was: how tall was the average > Frenchman, given distribution of heights of French soldiers and the > knowledge that there was a height minimum for army service?) I think > you just estimate the mean/std from the censored data, then fill in > the censored tails with samples from the fit distribution, and then re- > estimate the mean/std from the new data, etc. 
I forget exactly how one > does this (does it work on the histogram, or the underlying data, > e.g.) but that's the general idea. > > Zach > > >> Thanks, >> Sebastian. >> >> >> On Fri, Jul 16, 2010 at 1:04 AM, Christoph Deil >> wrote: >>> Hi Sebastian, >>> >>> in astronomy a method called kappa-sigma-clipping is sometimes used >>> to estimate the background level by clipping away most of the signal: >>> http://idlastro.gsfc.nasa.gov/ftp/pro/math/meanclip.pro >>> I am not aware of a python implementation, but it's just a few >>> lines of code. >>> >>> If you can identify the background level approximately by eye, >>> e.g. by plotting a histogram of your data, you should be able to >>> just fit the tail of the Gaussian that only contains background. >>> >>> Here is my attempt at doing such a fit using >>> scipy.stats.rv_continous.fit(), >>> similar to but not exactly what you want: >>> >>> from scipy.stats import norm, halfnorm, uniform >>> signal = - uniform.rvs(0, 3, size=10000) >>> background = norm.rvs(size=10000) >>> data = hstack((signal, background)) >>> hist(data, bins=30) >>> selection = data[data>0] >>> halfnorm.fit(selection) >>> x = linspace(-3, 3, 100) >>> y = selection.sum() * halfnorm.pdf(x)/3 >>> plot(x,y) >>> >>> Good luck! >>> Christoph >>> >>> On Jul 15, 2010, at 10:39 PM, Sebastian Haase wrote: >>> >>>> Hi, >>>> In image analysis one is often faced with (often unknown) background >>>> levels (offset) + (Gaussian) background noise. >>>> The overall intensity histogram of the image is in fact often >>>> Gaussian >>>> (Bell shaped), but depending on how many (foreground) objects are >>>> present the histogram shows a positive tail of some sort. >>>> >>>> So, I just got the idea if there was a function (i.e. mathematical >>>> algorithm) that would allow to fit only the left half of a Gaussian >>>> bell curve to data points !? >>>> This would have to be done in a way that the center, the variance >>>> (or >>>> sigma) and the peak height are free fitting parameters. >>>> >>>> Any help or ideas are appreciated, >>>> thanks >>>> Sebastian Haase From zachary.pincus at yale.edu Tue Jul 20 15:33:41 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 20 Jul 2010 13:33:41 -0600 Subject: [SciPy-User] Fwd: one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> Message-ID: > Zach, > thanks for your reply. The idea is to calculate the mean/std of the > *background noise* of the underlying (2d or 3d) image data based on > the image's 1d image intensity histogram. > Regarding the "tail", the problem is that in general the signal > intensities are not well separated from the background. Thus, the > right half of the background's (Gaussian) noise distribution may > already be significantly miss-shaped - whereas the left side, i.e. all > values below the mean background level, should be nicely Bell-shape > distributed. (This idea comes also from looking at many hundreds of > image histogram [my wx/OpenGL based image viewer always displays the > image intensity histogram right below the image]) Right... if you're in the regime that I'm in where there are orders of magnitude more background pixels than foreground ones, then a robust estimator of mean/std (e.g. 
one that's immune to "outliers" aka foreground pixel intensities) has, in my experience, worked very well for this precise task: estimating the mean/std of the majority of the pixels (background) in the presence of possibly many outlier pixels (foreground). I seem to recall that the MCD estimator can work in the presence of ~30% outliers... Zach >> From your answer I just got the idea of changing the error function >> in > such a way to only sum up the datapoint-model-errors (IOW, build the > RMS) over intensities smaller than the models mean value. So, the > resulting question becomes if the error function has the Gaussian-mean > value of the "current" fitting step available to it !? I can probably > find that out. > > > Thanks again, > Sebastian > > > > >> Though, now that I think on it, I seem to recall that the EM >> algorithm >> was originally deployed to estimate parameters of gaussians with >> censored tails. (I think the problem was: how tall was the average >> Frenchman, given distribution of heights of French soldiers and the >> knowledge that there was a height minimum for army service?) I think >> you just estimate the mean/std from the censored data, then fill in >> the censored tails with samples from the fit distribution, and then >> re- >> estimate the mean/std from the new data, etc. I forget exactly how >> one >> does this (does it work on the histogram, or the underlying data, >> e.g.) but that's the general idea. >> >> Zach >> >> >>> Thanks, >>> Sebastian. >>> >>> >>> On Fri, Jul 16, 2010 at 1:04 AM, Christoph Deil >>> wrote: >>>> Hi Sebastian, >>>> >>>> in astronomy a method called kappa-sigma-clipping is sometimes used >>>> to estimate the background level by clipping away most of the >>>> signal: >>>> http://idlastro.gsfc.nasa.gov/ftp/pro/math/meanclip.pro >>>> I am not aware of a python implementation, but it's just a few >>>> lines of code. >>>> >>>> If you can identify the background level approximately by eye, >>>> e.g. by plotting a histogram of your data, you should be able to >>>> just fit the tail of the Gaussian that only contains background. >>>> >>>> Here is my attempt at doing such a fit using >>>> scipy.stats.rv_continous.fit(), >>>> similar to but not exactly what you want: >>>> >>>> from scipy.stats import norm, halfnorm, uniform >>>> signal = - uniform.rvs(0, 3, size=10000) >>>> background = norm.rvs(size=10000) >>>> data = hstack((signal, background)) >>>> hist(data, bins=30) >>>> selection = data[data>0] >>>> halfnorm.fit(selection) >>>> x = linspace(-3, 3, 100) >>>> y = selection.sum() * halfnorm.pdf(x)/3 >>>> plot(x,y) >>>> >>>> Good luck! >>>> Christoph >>>> >>>> On Jul 15, 2010, at 10:39 PM, Sebastian Haase wrote: >>>> >>>>> Hi, >>>>> In image analysis one is often faced with (often unknown) >>>>> background >>>>> levels (offset) + (Gaussian) background noise. >>>>> The overall intensity histogram of the image is in fact often >>>>> Gaussian >>>>> (Bell shaped), but depending on how many (foreground) objects are >>>>> present the histogram shows a positive tail of some sort. >>>>> >>>>> So, I just got the idea if there was a function (i.e. mathematical >>>>> algorithm) that would allow to fit only the left half of a >>>>> Gaussian >>>>> bell curve to data points !? >>>>> This would have to be done in a way that the center, the variance >>>>> (or >>>>> sigma) and the peak height are free fitting parameters. 
>>>>> >>>>> Any help or ideas are appreciated, >>>>> thanks >>>>> Sebastian Haase > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From seb.haase at gmail.com Tue Jul 20 15:47:27 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 20 Jul 2010 21:47:27 +0200 Subject: [SciPy-User] Fwd: one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> Message-ID: On Tue, Jul 20, 2010 at 9:33 PM, Zachary Pincus wrote: >> Zach, >> thanks for your reply. The idea is to calculate the mean/std of the >> *background noise* of the underlying (2d or 3d) image data based on >> the image's 1d image intensity histogram. >> Regarding the "tail", the problem is that in general the signal >> intensities are not well separated from the background. Thus, the >> right half of the background's (Gaussian) noise distribution may >> already be significantly miss-shaped - whereas the left side, i.e. all >> values below the mean background level, should be nicely Bell-shape >> distributed. (This idea comes also from looking at many hundreds of >> image histogram [my wx/OpenGL based image viewer always displays the >> image intensity histogram right below the image]) > > Right... if you're in the regime that I'm in where there are orders of > magnitude more background pixels than foreground ones, then a robust > estimator of mean/std (e.g. one that's immune to "outliers" aka > foreground pixel intensities) has, in my experience, worked very well > for this precise task: estimating the mean/std of the majority of the > pixels (background) in the presence of possibly many outlier pixels > (foreground). I seem to recall that the MCD estimator can work in the > presence of ~30% outliers... > > Zach I'm hoping that it even works for biological image data with much less background - as long as the signal is always above the mean background .... I'll do some tests and report back. -Sebastian > > >>> From your answer I just got the idea of changing the error function >>> in >> such a way to only sum up the datapoint-model-errors (IOW, build the >> RMS) over intensities smaller than the models mean value. ?So, the >> resulting question becomes if the error function has the Gaussian-mean >> value of the "current" fitting step available to it !? I can probably >> find that out. >> >> >> Thanks again, >> Sebastian >> >> >> >> >>> Though, now that I think on it, I seem to recall that the EM >>> algorithm >>> was originally deployed to estimate parameters of gaussians with >>> censored tails. (I think the problem was: how tall was the average >>> Frenchman, given distribution of heights of French soldiers and the >>> knowledge that there was a height minimum for army service?) I think >>> you just estimate the mean/std from the censored data, then fill in >>> the censored tails with samples from the fit distribution, and then >>> re- >>> estimate the mean/std from the new data, etc. I forget exactly how >>> one >>> does this (does it work on the histogram, or the underlying data, >>> e.g.) but that's the general idea. >>> >>> Zach >>> >>> >>>> Thanks, >>>> Sebastian. 
>>>> >>>> >>>> On Fri, Jul 16, 2010 at 1:04 AM, Christoph Deil >>>> wrote: >>>>> Hi Sebastian, >>>>> >>>>> in astronomy a method called kappa-sigma-clipping is sometimes used >>>>> to estimate the background level by clipping away most of the >>>>> signal: >>>>> http://idlastro.gsfc.nasa.gov/ftp/pro/math/meanclip.pro >>>>> I am not aware of a python implementation, but it's just a few >>>>> lines of code. >>>>> >>>>> If you can identify the background level approximately by eye, >>>>> e.g. by plotting a histogram of your data, you should be able to >>>>> just fit the tail of the Gaussian that only contains background. >>>>> >>>>> Here is my attempt at doing such a fit using >>>>> scipy.stats.rv_continous.fit(), >>>>> similar to but not exactly what you want: >>>>> >>>>> from scipy.stats import norm, halfnorm, uniform >>>>> signal = - uniform.rvs(0, 3, size=10000) >>>>> background = norm.rvs(size=10000) >>>>> data = hstack((signal, background)) >>>>> hist(data, bins=30) >>>>> selection = data[data>0] >>>>> halfnorm.fit(selection) >>>>> x = linspace(-3, 3, 100) >>>>> y = selection.sum() * halfnorm.pdf(x)/3 >>>>> plot(x,y) >>>>> >>>>> Good luck! >>>>> Christoph >>>>> >>>>> On Jul 15, 2010, at 10:39 PM, Sebastian Haase wrote: >>>>> >>>>>> Hi, >>>>>> In image analysis one is often faced with (often unknown) >>>>>> background >>>>>> levels (offset) + (Gaussian) background noise. >>>>>> The overall intensity histogram of the image is in fact often >>>>>> Gaussian >>>>>> (Bell shaped), but depending on how many (foreground) objects are >>>>>> present the histogram shows a positive tail of some sort. >>>>>> >>>>>> So, I just got the idea if there was a function (i.e. mathematical >>>>>> algorithm) that would allow to fit only the left half of a >>>>>> Gaussian >>>>>> bell curve to data points !? >>>>>> This would have to be done in a way that the center, the variance >>>>>> (or >>>>>> sigma) and the peak height are free fitting parameters. >>>>>> >>>>>> Any help or ideas are appreciated, >>>>>> thanks >>>>>> Sebastian Haase >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Tue Jul 20 16:38:06 2010 From: sturla at molden.no (Sturla Molden) Date: Tue, 20 Jul 2010 22:38:06 +0200 Subject: [SciPy-User] Fwd: one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> Message-ID: > On Sat, Jul 17, 2010 at 12:07 AM, Zachary Pincus > Zach, > thanks for your reply. The idea is to calculate the mean/std of the > *background noise* of the underlying (2d or 3d) image data based on > the image's 1d image intensity histogram. > Regarding the "tail", the problem is that in general the signal > intensities are not well separated from the background. Thus, the > right half of the background's (Gaussian) noise distribution may > already be significantly miss-shaped - whereas the left side, i.e. all > values below the mean background level, should be nicely Bell-shape > distributed. Have you considered fitting a mixture model using the EM algortithm? You could e.g. include one Gaussian for the signal and a different probability model for the noise (e.g. a Poisson process). 
Start by fitting a Gaussian the standard way (maximum likelihood), then use the EM to prune out the noise samples. you can then see this as a data clustering problem (i.e. you try to classify each point as being signal or noise). Sturla From seb.haase at gmail.com Tue Jul 20 17:03:02 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Tue, 20 Jul 2010 23:03:02 +0200 Subject: [SciPy-User] Fwd: one-sided gauss fit -- or: how to estimate backgound noise ? In-Reply-To: References: <73866407-34E3-47A7-87F5-D9D61E324B7C@googlemail.com> <93D88C28-6BB9-47C1-90DD-CD2F601372BC@yale.edu> Message-ID: On Tue, Jul 20, 2010 at 10:38 PM, Sturla Molden wrote: >> On Sat, Jul 17, 2010 at 12:07 AM, Zachary Pincus > >> Zach, >> thanks for your reply. The idea is to calculate the mean/std of the >> *background noise* of the underlying (2d or 3d) image data based on >> the image's 1d image intensity histogram. >> Regarding the "tail", the problem is that in general the signal >> intensities are not well separated from the background. Thus, the >> right half of the background's (Gaussian) noise distribution may >> already be significantly miss-shaped - whereas the left side, i.e. all >> values below the mean background level, should be nicely Bell-shape >> distributed. > > Have you considered fitting a mixture model using the EM algortithm? You > could e.g. include one Gaussian for the signal and a different probability > model for the noise (e.g. a Poisson process). Start by fitting a Gaussian > the standard way (maximum likelihood), then use the EM to prune out the > noise samples. you can then see this as a data clustering problem (i.e. > you try to classify each point as being signal or noise). > > Sturla As far as I know I can only expect Gaussian distribution for the noise of the background. The (foreground) signal could in general be of any kind - including few well separated events, or a broad intensity distribution just about background mean, or something in between ... -Sebastian From ben.root at ou.edu Tue Jul 20 19:12:31 2010 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 20 Jul 2010 18:12:31 -0500 Subject: [SciPy-User] pylab In-Reply-To: References: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> Message-ID: On Tue, Jul 20, 2010 at 12:24 PM, wrote: > > > Ben and Pierre, > > > > > > Thanks for the tips! Glad I chimed in here b/c I've definitely learned > > something. I'm often pulling data out from a database, so null records > > come back to me as None's. I'll be sure to set a CASE statement now that > > fills the NULLs in with an obviously junk value that I can mask from now > > on. > > > > > > -paul > > > Why not user your None as said junk value? > > > > --Josh > > All my existing routines just throw values from the database (Postgres or > MS SQL) cursor directly into numpy arrays. As Pierre pointed out, the None > values force the array's dtype to object. > > In [40]: x > Out[40]: array([2.5, 12.2, 5, None], dtype=object) > > I could easily mask the Nones, but switching it over to a value such as > -99999 would let it me keep the dtype as float or integer. Probably not a > big deal at the moment, but I'm trying to adopt as many best practices as > possible so nothing comes back to bite me later on. (Negative values are > pretty rare in environmental data, so I think it's a safe bet). > -paul > > As a general rule, magic values can be very tricky to work with, and it is a good idea to think out conventions. 
I was just bitten by a bug where a previous developer wanted coordinate values of exactly 0.0 (yes, floating point) to mean something special, but had no constraints for the domain. So, once in a blue moon, a data point would get processed differently than all the others, causing slightly odd analysis results. So, my personal rule is to use NaNs whenever/where-ever possible because if they sneak into a calculation unexpectedly, they will usually propagate such that your program breaks -- which is a good thing. Bad results that appear to be good is a bad thing. When you have to do integers, pick a magic value that is impossible (e.g., negative value for temperature in Kelvins), and document that magic value. I hope this is helpful. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From josh.holbrook at gmail.com Tue Jul 20 19:17:42 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 20 Jul 2010 15:17:42 -0800 Subject: [SciPy-User] pylab In-Reply-To: References: <9560E0CD-AA2C-4D4E-B9E6-262A96D38C73@gmail.com> Message-ID: On Tue, Jul 20, 2010 at 3:12 PM, Benjamin Root wrote: > On Tue, Jul 20, 2010 at 12:24 PM, wrote: >> >> > > Ben and Pierre, >> > > >> > > Thanks for the tips! Glad I chimed in here b/c I've definitely learned >> > something. I'm often pulling data out from a database, so null records >> > come back to me as None's. I'll be sure to set a CASE statement now that >> > fills the NULLs in with an obviously junk value that I can mask from now >> > on. >> > > >> > > -paul >> >> > Why not user your None as said junk value? >> > >> > --Josh >> >> All my existing routines just throw values from the database (Postgres or >> MS SQL) cursor directly into numpy arrays. As Pierre pointed out, the None >> values force the array's dtype to object. >> >> In [40]: x >> Out[40]: array([2.5, 12.2, 5, None], dtype=object) >> >> I could easily mask the Nones, but switching it over to a value such as >> -99999 would let it me keep the dtype as float or integer. Probably not a >> big deal at the moment, but I'm trying to adopt as many best practices as >> possible so nothing comes back to bite me later on. (Negative values are >> pretty rare in environmental data, so I think it's a safe bet). >> -paul >> > > As a general rule, magic values can be very tricky to work with, and it is a > good idea to think out conventions. ?I was just bitten by a bug where a > previous developer wanted coordinate values of exactly 0.0 (yes, floating > point) to mean something special, but had no constraints for the domain. > ?So, once in a blue moon, a data point would get processed differently than > all the others, causing slightly odd analysis results. > > So, my personal rule is to use NaNs whenever/where-ever possible because if > they sneak into a calculation unexpectedly, they will usually propagate such > that your program breaks -- which is a good thing. ?Bad results that appear > to be good is a bad thing. ?When you have to do integers, pick a magic value > that is impossible (e.g., negative value for temperature in Kelvins), and > document that magic value. > > I hope this is helpful. > > Ben Root > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > Hmm! Yeah, NaN would make more sense. 
>_< From irbdavid at gmail.com Wed Jul 21 04:18:24 2010 From: irbdavid at gmail.com (David Andrews) Date: Wed, 21 Jul 2010 09:18:24 +0100 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? Message-ID: Hi All, I suppose this might not strictly be a scipy type question, but I'll ask here as I expect some of you might understand what I'm getting at! I'm in the process of porting some code from IDL (Interactive Data Language - popular in some fields of science, but largely nowhere else) to Python. Essentially it's just plotting and analyzing time series data, and so most of the porting is relatively simple. The one stumbling block - is there an equivalent or useful replacement for the "common block" concept in IDL available in Python? Common blocks are areas of shared memory held by IDL that can be accessed easily from within sub-routines. So for example, in our IDL code, we load data into these common blocks at the start of a session, and then perform whatever analysis on it. In this manner, we do not have to continually re-load data every time we re-perform a piece of analysis. They store their contents persistently, for the duration of the IDL session. It's all for academic research purposes, so it's very much 'try this / see what happens / alter it, try again' kind of work. The loading and initial processing of data is fairly time intensive, so having to reload at each step is a bit frustrating and not very productive. So, does anyone have any suggestions as to the best way to go about porting this sort of behavior? Pickle seems to be one option, but that would involve read/write to disk operations anyway? Any others? Kind Regards, David --------------------------------------- David Andrews Postgraduate Student, Radio & Space Plasma Physics Group University of Leicester, UK From robince at gmail.com Wed Jul 21 05:09:48 2010 From: robince at gmail.com (Robin) Date: Wed, 21 Jul 2010 10:09:48 +0100 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 9:18 AM, David Andrews wrote: > Hi All, > > I suppose this might not strictly be a scipy type question, but I'll > ask here as I expect some of you might understand what I'm getting at! > > I'm in the process of porting some code from IDL (Interactive Data > Language - popular in some fields of science, but largely nowhere > else) to Python. ?Essentially it's just plotting and analyzing time > series data, and so most of the porting is relatively simple. ?The one > stumbling block - is there an equivalent or useful replacement for the > "common block" concept in IDL available in Python? > > Common blocks are areas of shared memory held by IDL that can be > accessed easily from within sub-routines. ?So for example, in our IDL > code, we load data into these common blocks at the start of a session, > and then perform whatever analysis on it. ?In this manner, we do not > have to continually re-load data every time we re-perform a piece of > analysis. ?They store their contents persistently, for the duration of > the IDL session. ?It's all for academic research purposes, so it's > very much 'try this / see what happens / alter it, try again' kind of > work. ?The loading and initial processing of data is fairly time > intensive, so having to reload at each step is a bit frustrating and > not very productive. > > So, does anyone have any suggestions as to the best way to go about > porting this sort of behavior? 
?Pickle seems to be one option, but > that would involve read/write to disk operations anyway? ?Any others? One way when working interactively (eg in ipython) would be to load the data in the workspace in single variable (I've found the Bunch class useful for this) then pass it explicitly to all the analysis functions. import analysis data = analysis.load_data() analysis.do_something(data, params) Alternatively you could use module level variables - any imported module provides a scope which can contain variables so using something like: analysis.py def load_data(): global data data = [1,2,3,4] def print_data(): global data print data You could do import analysis analysis.load_data() # data available interactively as analysis.data analysis.print_data() # uses module variable I prefer the first way - as if I am modifying the code I am using I can reload the module and use the new functions without having to load the data again. Cheers Robin From mail.to.daniel.platz at googlemail.com Wed Jul 21 07:33:10 2010 From: mail.to.daniel.platz at googlemail.com (Daniel Platz) Date: Wed, 21 Jul 2010 13:33:10 +0200 Subject: [SciPy-User] Gauss-Lobatto quadrature in scipy Message-ID: Hi! I am new to numerical integration in scipy. Is there a Gauss-Lobatto quadrature available. In matlab this would be the function "quadl". If there is not, is there a way to quickly implement this using one of the other integration functions? Thanks in advance Daniel From yosefmel at post.tau.ac.il Wed Jul 21 08:02:33 2010 From: yosefmel at post.tau.ac.il (Yosef Meller) Date: Wed, 21 Jul 2010 15:02:33 +0300 Subject: [SciPy-User] Problem with self-built numpy 1.4.1 on kubuntu lucid Message-ID: <201007211502.33753.yosefmel@post.tau.ac.il> Hi, I'm trying to build numpy 1.4.1 on an Ubuntu Lucid system. I've done this recently on a similar system and it went fine. But here, the build finishes fine, but the tests give me errors (see below). I can;t find what I'm doing wrong, and searching the web didn't find earlier reports of this. Any idea? Thanks, Yosef. 
-------------- In [3]: import numpy In [4]: numpy.test() Running unit tests for numpy NumPy version 1.4.1 NumPy is installed in /usr/local/lib/python2.6/dist-packages/numpy Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] nose version 0.11.1 .EEEEEEEEEEE..........................................................................................................................................SSSSSSSS................................................................................................................................................................................................................................................................................SSS...........................................................................................................................................................................................................................................K........................................................................................K......................K.K......................................................................................................EEEEEEE......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 
====================================================================== ERROR: test_creation (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 10, in test_creation dt1 = np.dtype('M8[750%s]'%unit) TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_as (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 58, in test_divisor_conversion_as self.assertRaises(ValueError, lambda : np.dtype('M8[as/10]')) File "/usr/lib/python2.6/unittest.py", line 336, in failUnlessRaises callableObj(*args, **kwargs) File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 58, in self.assertRaises(ValueError, lambda : np.dtype('M8[as/10]')) TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_bday (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 32, in test_divisor_conversion_bday assert np.dtype('M8[B/12]') == np.dtype('M8[2h]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_day (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 37, in test_divisor_conversion_day assert np.dtype('M8[D/12]') == np.dtype('M8[2h]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_fs (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 54, in test_divisor_conversion_fs assert np.dtype('M8[fs/100]') == np.dtype('M8[10as]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_hour (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 42, in test_divisor_conversion_hour assert np.dtype('m8[h/30]') == np.dtype('m8[2m]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_minute (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 46, in test_divisor_conversion_minute assert np.dtype('m8[m/30]') == np.dtype('m8[2s]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_month (test_datetime.TestDateTime) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 21, in test_divisor_conversion_month assert np.dtype('M8[M/2]') == np.dtype('M8[2W]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_second (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 50, in test_divisor_conversion_second assert np.dtype('m8[s/100]') == np.dtype('m8[10ms]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_week (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 26, in test_divisor_conversion_week assert np.dtype('m8[W/5]') == np.dtype('m8[B]') TypeError: data type not understood ====================================================================== ERROR: test_divisor_conversion_year (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist- packages/numpy/core/tests/test_datetime.py", line 16, in test_divisor_conversion_year assert np.dtype('M8[Y/4]') == np.dtype('M8[3M]') TypeError: data type not understood ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_callback.py", line 4, in import util File "/usr/local/lib/python2.6/dist-packages/numpy/f2py/tests/util.py", line 21, in from numpy.compat import asbytes, asstr ImportError: cannot import name asbytes ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_mixed.py", line 7, in import util File "/usr/local/lib/python2.6/dist-packages/numpy/f2py/tests/util.py", line 21, in from numpy.compat import asbytes, asstr ImportError: cannot import name asbytes ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_return_character.py", line 3, in from numpy.compat import asbytes ImportError: cannot import name asbytes ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_return_complex.py", line 3, in import util File "/usr/local/lib/python2.6/dist-packages/numpy/f2py/tests/util.py", line 21, in from numpy.compat import asbytes, asstr ImportError: cannot import name asbytes ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_return_integer.py", line 3, in import util File "/usr/local/lib/python2.6/dist-packages/numpy/f2py/tests/util.py", line 21, in from numpy.compat import asbytes, asstr ImportError: cannot import name asbytes ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_return_logical.py", line 3, in import util File "/usr/local/lib/python2.6/dist-packages/numpy/f2py/tests/util.py", line 21, in from numpy.compat import asbytes, asstr ImportError: cannot import name asbytes ====================================================================== ERROR: Failure: ImportError (cannot import name asbytes) ---------------------------------------------------------------------- Traceback (most recent call last): File 
"/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/local/lib/python2.6/dist- packages/numpy/f2py/tests/test_return_real.py", line 4, in import util File "/usr/local/lib/python2.6/dist-packages/numpy/f2py/tests/util.py", line 21, in from numpy.compat import asbytes, asstr ImportError: cannot import name asbytes ---------------------------------------------------------------------- Ran 2501 tests in 9.603s FAILED (KNOWNFAIL=4, SKIP=11, errors=18) Out[4]: From arserlom at gmail.com Wed Jul 21 08:13:58 2010 From: arserlom at gmail.com (Armando Serrano Lombillo) Date: Wed, 21 Jul 2010 14:13:58 +0200 Subject: [SciPy-User] specify lognormal distribution with mu and sigma using scipy.stats In-Reply-To: <1cd32cbb0910140620j1c3a8a70p5226359576406128@mail.gmail.com> References: <6946b9500910140122l4473d801s431d304cf20bca41@mail.gmail.com> <1cd32cbb0910140620j1c3a8a70p5226359576406128@mail.gmail.com> Message-ID: Hello, I'm also having difficulties with lognorm. If mu is the mean and s**2 is the variance then... >>> from scipy.stats import lognorm >>> from math import exp >>> mu = 10 >>> s = 1 >>> d = lognorm(s, scale=exp(mu)) >>> d.stats('m') array(36315.502674246643) shouldn't that be 10? On Wed, Oct 14, 2009 at 3:20 PM, wrote: > On Wed, Oct 14, 2009 at 4:22 AM, Mark Bakker wrote: > > Hello list, > > I am having trouble creating a lognormal distribution with known mean mu > and > > standard deviation sigma using scipy.stats > > According to the docs, the programmed function is: > > lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) > > So s is the standard deviation. But how do I specify the mean? I found > some > > information that when you specify loc and scale, you replace x by > > (x-loc)/scale > > But in the lognormal distribution, you want to replace log(x) by > log(x)-loc > > where loc is mu. How do I do that? In addition, would it be a good idea > to > > create some convenience functions that allow you to simply create > lognormal > > (and maybe normal) distributions by specifying the more common mu and > sigma? > > That would surely make things more userfriendly. > > Thanks, > > Mark > > I don't think loc of lognorm makes much sense in most application, > since it is just shifting the support, lower boundary is zero+loc. The > loc of the underlying normal distribution enters through the scale. > > see also > http://en.wikipedia.org/wiki/Log-normal_distribution#Mean_and_standard_deviation > > > >>> print stats.lognorm.extradoc > > > Lognormal distribution > > lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) > for x > 0, s > 0. > > If log x is normally distributed with mean mu and variance sigma**2, > then x is log-normally distributed with shape paramter sigma and scale > parameter exp(mu). 
> > > roundtrip with mean mu of the underlying normal distribution (scale=1): > > >>> mu=np.arange(5) > >>> np.log(stats.lognorm.stats(1, loc=0,scale=np.exp(mu))[0])-0.5 > array([ 0., 1., 2., 3., 4.]) > > corresponding means of lognormal distribution > > >>> stats.lognorm.stats(1, loc=0,scale=np.exp(mu))[0] > array([ 1.64872127, 4.48168907, 12.18249396, 33.11545196, 90.0171313 > ]) > > > shifting support: > > >>> stats.lognorm.a > 0.0 > >>> stats.lognorm.ppf([0, 0.5, 1], 1, loc=3,scale=1) > array([ 3., 4., Inf]) > > > The only case that I know for lognormal is in regression, so I'm not > sure what you mean by the convenience functions. > (the normal distribution is defined by loc=mean, scale=standard deviation) > > assume the regression equation is > y = x*beta*exp(u) u distributed normal(0, sigma^2) > this implies > ln y = ln(x*beta) + u which is just a standard linear regression > equation which can be estimated by ols or mle > > exp(u) in this case is lognormal distributed > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Wed Jul 21 08:29:33 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 21 Jul 2010 14:29:33 +0200 Subject: [SciPy-User] Problem with self-built numpy 1.4.1 on kubuntu lucid In-Reply-To: <201007211502.33753.yosefmel@post.tau.ac.il> References: <201007211502.33753.yosefmel@post.tau.ac.il> Message-ID: >On 21 July 2010 14:02, Yosef Meller wrote: > I'm trying to build numpy 1.4.1 on an Ubuntu Lucid system. I've done this > recently on a similar system and it went fine. But here, the build finishes fine, > but the tests give me errors (see below). > > -------------- > In [3]: import numpy > > In [4]: numpy.test() > Running unit tests for numpy > NumPy version 1.4.1 > NumPy is installed in /usr/local/lib/python2.6/dist-packages/numpy > Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] > nose version 0.11.1 > .EEEEEEEEEEE..........................................................................................................................................SSSSSSSS................................................................................................................................................................................................................................................................................SSS...........................................................................................................................................................................................................................................K........................................................................................K......................K.K......................................................................................................EEEEEEE.................................................................................................. 
> ?............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. > ?.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... > ====================================================================== > ERROR: test_creation (test_datetime.TestDateTime) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "/usr/local/lib/python2.6/dist- > packages/numpy/core/tests/test_datetime.py", line 10, in test_creation > ? ?dt1 = np.dtype('M8[750%s]'%unit) > TypeError: data type not understood NumPy 1.4.1 shouldn't have 'numpy/core/tests/test_datetime.py'. It looks like you have some files left over from an old installation (maybe 1.4.0?). Make sure that 1) you remove the build sub-directory from wherever you've unpacked the source code before building again and 2) you completely remove the contents of /usr/local/lib/python2.6/dist-packages/numpy before installing. Cheers, Scott From yosefmel at post.tau.ac.il Wed Jul 21 08:51:22 2010 From: yosefmel at post.tau.ac.il (Yosef Meller) Date: Wed, 21 Jul 2010 15:51:22 +0300 Subject: [SciPy-User] Problem with self-built numpy 1.4.1 on kubuntu lucid In-Reply-To: References: <201007211502.33753.yosefmel@post.tau.ac.il> Message-ID: <201007211551.22873.yosefmel@post.tau.ac.il> On ??? ????? 21 ???? 2010 15:29:33 Scott Sinclair wrote: > >On 21 July 2010 14:02, Yosef Meller wrote: > > I'm trying to build numpy 1.4.1 on an Ubuntu Lucid system. I've done this > > recently on a similar system and it went fine. But here, the build > > finishes fine, but the tests give me errors (see below). > > [snip] > > NumPy 1.4.1 shouldn't have 'numpy/core/tests/test_datetime.py'. It > looks like you have some files left over from an old installation > (maybe 1.4.0?). > > Make sure that 1) you remove the build sub-directory from wherever > you've unpacked the source code before building again and 2) you > completely remove the contents of > /usr/local/lib/python2.6/dist-packages/numpy before installing. Yup, turns out I had another numpy installed in /usr/local. Thanks for the tip, Yosef. 
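For future reference, a quick way to spot that kind of stale installation is to ask Python where the imported package actually lives. This is only a minimal sketch using standard module attributes, nothing NumPy-specific:

import sys
import numpy

# The version string and file location of the imported package show whether
# Python picked up the fresh build or a leftover copy somewhere else on
# sys.path (e.g. /usr/local vs. /usr/lib).
print(numpy.__version__)
print(numpy.__file__)

# Directories earlier on sys.path shadow later ones, so the search order is
# worth checking as well when two copies are installed.
for entry in sys.path:
    print(entry)

If the reported location is not the directory that was just installed into, the leftover copy is the one being imported and tested.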
From aarchiba at physics.mcgill.ca Wed Jul 21 11:17:58 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Wed, 21 Jul 2010 11:17:58 -0400 Subject: [SciPy-User] Gauss-Lobatto quadrature in scipy In-Reply-To: References: Message-ID: On 21 July 2010 07:33, Daniel Platz wrote: > Hi! > > I am new to numerical integration in scipy. Is there a Gauss-Lobatto > quadrature available. In matlab this would be the function "quadl". If > there is not, is there a way to quickly implement this using one of > the other integration functions? scipy.integrate.fixed_quad does Gauss-Lobatto with a fixed order, and scipy.integrate.quadrature is adaptive, with a fixed tolerance. Both use Legendre roots. If these aren't flexible enough, scipy.special provides a variety of tools for working with orthogonal polynomials. Be careful though, as in older versions of scipy become numerically unstable at high (>~20) orders. Newer scipy versions include, I think, some specialized code to evaluate roots and weights more stably. But if you need high orders I recommend you test for numerical accuracy yourself. Anne > Thanks in advance > > Daniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From gokhansever at gmail.com Wed Jul 21 11:47:00 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 21 Jul 2010 10:47:00 -0500 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 3:18 AM, David Andrews wrote: > Hi All, > > I suppose this might not strictly be a scipy type question, but I'll > ask here as I expect some of you might understand what I'm getting at! > > I'm in the process of porting some code from IDL (Interactive Data > Language - popular in some fields of science, but largely nowhere > else) to Python. Essentially it's just plotting and analyzing time > series data, and so most of the porting is relatively simple. The one > stumbling block - is there an equivalent or useful replacement for the > "common block" concept in IDL available in Python? > > Common blocks are areas of shared memory held by IDL that can be > accessed easily from within sub-routines. So for example, in our IDL > code, we load data into these common blocks at the start of a session, > and then perform whatever analysis on it. In this manner, we do not > have to continually re-load data every time we re-perform a piece of > analysis. They store their contents persistently, for the duration of > the IDL session. It's all for academic research purposes, so it's > very much 'try this / see what happens / alter it, try again' kind of > work. The loading and initial processing of data is fairly time > intensive, so having to reload at each step is a bit frustrating and > not very productive. > > So, does anyone have any suggestions as to the best way to go about > porting this sort of behavior? Pickle seems to be one option, but > that would involve read/write to disk operations anyway? Any others? > > Kind Regards, > > David > > --------------------------------------- > David Andrews > Postgraduate Student, Radio & Space Plasma Physics Group > University of Leicester, UK > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Hello, I was once dealing porting some IDL code into Python. 
My simple solution was the following: Consider this sample IDL piece: myconst = {a: 1, b: 2} function myfunc, x common myconst return myconst.a * x + myconst.b * x end in Python I would define a dictionary like: myconst = {'a':1, 'b':2} then in the function: def myfunc(x): return myconst['a']*x + myconst['b']*x saving me typing a "common myconst" and an extra "end" -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Wed Jul 21 11:48:10 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 21 Jul 2010 10:48:10 -0500 Subject: [SciPy-User] specify lognormal distribution with mu and sigma using scipy.stats In-Reply-To: References: <6946b9500910140122l4473d801s431d304cf20bca41@mail.gmail.com> <1cd32cbb0910140620j1c3a8a70p5226359576406128@mail.gmail.com> Message-ID: <4C4716BA.7000807@enthought.com> Armando Serrano Lombillo wrote: > Hello, I'm also having difficulties with lognorm. > > If mu is the mean and s**2 is the variance then... > > >>> from scipy.stats import lognorm > >>> from math import exp > >>> mu = 10 > >>> s = 1 > >>> d = lognorm(s, scale=exp(mu)) > >>> d.stats('m') > array(36315.502674246643) > > shouldn't that be 10? In terms of mu and sigma, the mean of the lognormal distribution is exp(mu + 0.5*sigma**2). In your example: In [16]: exp(10.5) Out[16]: 36315.502674246636 Warren > > On Wed, Oct 14, 2009 at 3:20 PM, > wrote: > > On Wed, Oct 14, 2009 at 4:22 AM, Mark Bakker > wrote: > > Hello list, > > I am having trouble creating a lognormal distribution with known > mean mu and > > standard deviation sigma using scipy.stats > > According to the docs, the programmed function is: > > lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) > > So s is the standard deviation. But how do I specify the mean? I > found some > > information that when you specify loc and scale, you replace x by > > (x-loc)/scale > > But in the lognormal distribution, you want to replace log(x) by > log(x)-loc > > where loc is mu. How do I do that? In addition, would it be a > good idea to > > create some convenience functions that allow you to simply > create lognormal > > (and maybe normal) distributions by specifying the more common > mu and sigma? > > That would surely make things more userfriendly. > > Thanks, > > Mark > > I don't think loc of lognorm makes much sense in most application, > since it is just shifting the support, lower boundary is zero+loc. The > loc of the underlying normal distribution enters through the scale. > > see also > http://en.wikipedia.org/wiki/Log-normal_distribution#Mean_and_standard_deviation > > > >>> print stats.lognorm.extradoc > > > Lognormal distribution > > lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) > for x > 0, s > 0. > > If log x is normally distributed with mean mu and variance sigma**2, > then x is log-normally distributed with shape paramter sigma and scale > parameter exp(mu). 
> > > roundtrip with mean mu of the underlying normal distribution > (scale=1): > > >>> mu=np.arange(5) > >>> np.log(stats.lognorm.stats(1, loc=0,scale=np.exp(mu))[0])-0.5 > array([ 0., 1., 2., 3., 4.]) > > corresponding means of lognormal distribution > > >>> stats.lognorm.stats(1, loc=0,scale=np.exp(mu))[0] > array([ 1.64872127, 4.48168907, 12.18249396, 33.11545196, > 90.0171313 ]) > > > shifting support: > > >>> stats.lognorm.a > 0.0 > >>> stats.lognorm.ppf([0, 0.5, 1], 1, loc=3,scale=1) > array([ 3., 4., Inf]) > > > The only case that I know for lognormal is in regression, so I'm not > sure what you mean by the convenience functions. > (the normal distribution is defined by loc=mean, scale=standard > deviation) > > assume the regression equation is > y = x*beta*exp(u) u distributed normal(0, sigma^2) > this implies > ln y = ln(x*beta) + u which is just a standard linear regression > equation which can be estimated by ols or mle > > exp(u) in this case is lognormal distributed > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From quesada at gmail.com Wed Jul 21 11:59:45 2010 From: quesada at gmail.com (Jose Quesada) Date: Wed, 21 Jul 2010 17:59:45 +0200 Subject: [SciPy-User] segfault using _sparse_ svd, eigen, eigen_symmetric with svn 0.9.0.dev6598 Message-ID: Hi, We are on a coding sprint trying to implement sparse matrix support in MDP ( http://sourceforge.net/apps/mediawiki/mdp-toolkit/index.php?title=MDP_Sprint_2010). The new sparse.linalg is very useful here. We are getting segfaults using _sparse_ svd, eigen, eigen_symmetric with svn 0.9.0.dev6598. I understand that (1) this is an unreleased version, and (2) these methods may depend on external C and fortran code that could have not being installed well on my machine, so this may be difficult to debug. I have added instructions to reproduce the segfault, but please ask for anything else that could be needed and I'll try to provide it. 
I installed the svn version on a virtualenv using pip:~/.virtualenvs/sprint$ pip install svn+http://svn.scipy.org/svn/scipy/trunk/#egg=scipyc This generates a long log that could contain the explanation, so I posted it here (going as far back as my terminal's scrollback enabled: http://pastebin.org/410867 Last, here's an example that reproduces the segfault: #!/usr/bin/env python # -*- coding: utf-8 -*- #-------------------------- # simply run an svd on a a sparse matrix, svn 0.9.0.dev6598 #-------------------------- import scipy from scipy import sparse from numpy.random import rand # create random sparse matrix x = scipy.sparse.lil_matrix((1000000, 1000000)) x[0, :100] = rand(100) x[1, 100:200] = x[0, :100] x.setdiag(rand(1000)) x = x.tocsr() # convert it to CSR #v, u, w = scipy.sparse.linalg.eigen_symmetric(x) # segmentation fault # try a simpler matrix y = scipy.sparse.lil_matrix((10, 10)) y.setdiag(range(10)) y = y.tocsr() # convert it to CSR #v, u, w = scipy.sparse.linalg.eigen_symmetric(y) # #./sampleSegFault.py #Traceback (most recent call last): #File "./sampleSegFault.py", line 13, in #x[0, :100] = rand(100) #File "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/lil.py", line 319, in __setitem__ #x = lil_matrix(x, copy=False) #File "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/lil.py", line 98, in __init__ #A = csr_matrix(A, dtype=dtype).tolil() #File "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/compressed.py", line 71, in __init__ #self._set_self( self.__class__(coo_matrix(arg1, dtype=dtype)) ) #File "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/coo.py", line 171, in __init__ #self.data = M[self.row,self.col] #ValueError: shape mismatch: objects cannot be broadcast to a single shape #*** glibc detected *** python: double free or corruption (!prev): 0x0000000004075ec0 *** # some other linalg methods ly,v = scipy.sparse.linalg.eig(y) # segmentation fault #====# Thanks a lot in advance, -Jose Jose Quesada, PhD. Max Planck Institute, Center for Adaptive Behavior and Cognition, Berlin http://www.josequesada.name/ http://twitter.com/Quesada -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Wed Jul 21 12:27:18 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 21 Jul 2010 12:27:18 -0400 Subject: [SciPy-User] segfault using _sparse_ svd, eigen, eigen_symmetric with svn 0.9.0.dev6598 In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 11:59 AM, Jose Quesada wrote: > Hi, > > We are on a coding sprint trying to implement sparse matrix support in MDP > (http://sourceforge.net/apps/mediawiki/mdp-toolkit/index.php?title=MDP_Sprint_2010). > The new sparse.linalg is very useful here. > > We are getting segfaults using _sparse_ svd, eigen, eigen_symmetric with svn > 0.9.0.dev6598. I understand that (1) this is an unreleased version, and (2) > these methods may depend on external C and fortran code that could have not > being installed well on my machine, so this may be difficult to debug. I > have added instructions to reproduce the segfault, but please ask for > anything else that could be needed and I'll try to provide it. 
> > I installed the svn version on a virtualenv using pip:~/.virtualenvs/sprint$ > pip install svn+http://svn.scipy.org/svn/scipy/trunk/#egg=scipyc > > This generates a long log that could contain the explanation, so I posted it > here (going as far back as my terminal's scrollback enabled: > http://pastebin.org/410867 > > Last, here's an example that reproduces the segfault: > > #!/usr/bin/env python > # -*- coding: utf-8 -*- > > #-------------------------- > # simply run an svd on a a sparse matrix, svn 0.9.0.dev6598 > #-------------------------- > import scipy > from scipy import sparse > from numpy.random import rand > > # create random sparse matrix > x = scipy.sparse.lil_matrix((1000000, 1000000)) > x[0, :100] = rand(100) > x[1, 100:200] = x[0, :100] > x.setdiag(rand(1000)) > x = x.tocsr() # convert it to CSR > #v, u, w = scipy.sparse.linalg.eigen_symmetric(x) # segmentation fault > > # try a simpler matrix > y = scipy.sparse.lil_matrix((10, 10)) > y.setdiag(range(10)) > y = y.tocsr() # convert it to CSR > #v, u, w = scipy.sparse.linalg.eigen_symmetric(y) # > I have to import the linalg separately, and my docs say that eigen_symmetric only returns w and v, so I can do import scipy.sparse.linalg as splinalg w, v = splinalg.eigen_symmetric(y) without a segfault. I'm running the most recent git mirror version of scipy. Just installed this morning. I don't know how to check the git concept of a revision number yet... > #./sampleSegFault.py > #Traceback (most recent call last): > ??? #File "./sampleSegFault.py", line 13, in > ??? #x[0, :100] = rand(100) > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/lil.py", > line 319, in __setitem__ > ??? #x = lil_matrix(x, copy=False) > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/lil.py", > line 98, in __init__ > ??? #A = csr_matrix(A, dtype=dtype).tolil() > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/compressed.py", > line 71, in __init__ > ??? #self._set_self( self.__class__(coo_matrix(arg1, dtype=dtype)) ) > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/sparse/coo.py", > line 171, in __init__ > ??? #self.data? = M[self.row,self.col] > ??? #ValueError: shape mismatch: objects cannot be broadcast to a single > shape > ??? #*** glibc detected *** python: double free or corruption (!prev): > 0x0000000004075ec0 *** > > > # some other linalg methods > ly,v = scipy.sparse.linalg.eig(y) # segmentation fault > I don't have splinalg.eig, but I have splinalg.eigen and it works without segfault. Probably a bad install is my guess. I don't use pip, but you might want to just try building from source and provide the full output of the build process. Skipper From arserlom at gmail.com Wed Jul 21 13:15:28 2010 From: arserlom at gmail.com (Armando Serrano Lombillo) Date: Wed, 21 Jul 2010 19:15:28 +0200 Subject: [SciPy-User] specify lognormal distribution with mu and sigma using scipy.stats In-Reply-To: <4C4716BA.7000807@enthought.com> References: <6946b9500910140122l4473d801s431d304cf20bca41@mail.gmail.com> <1cd32cbb0910140620j1c3a8a70p5226359576406128@mail.gmail.com> <4C4716BA.7000807@enthought.com> Message-ID: Ok, I had misunderstood that mu and sigma where the mean of the lognormally distributed variable. 
So, this is what I should have written: >>> mean = 10.0 >>> variance = 1.0 >>> mean_n = log(mean) - 0.5*log(1 + variance/mean**2) >>> variance_n = log(variance/mean**2 + 1) >>> d = lognorm(sqrt(variance_n), scale=exp(mean_n)) >>> d.stats() (array(10.000000000000002), array(1.0000000000000013)) Thanks, Armando. On Wed, Jul 21, 2010 at 5:48 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > Armando Serrano Lombillo wrote: > > Hello, I'm also having difficulties with lognorm. > > > > If mu is the mean and s**2 is the variance then... > > > > >>> from scipy.stats import lognorm > > >>> from math import exp > > >>> mu = 10 > > >>> s = 1 > > >>> d = lognorm(s, scale=exp(mu)) > > >>> d.stats('m') > > array(36315.502674246643) > > > > shouldn't that be 10? > > In terms of mu and sigma, the mean of the lognormal distribution > is exp(mu + 0.5*sigma**2). In your example: > > In [16]: exp(10.5) > Out[16]: 36315.502674246636 > > > Warren > > > > > > > > On Wed, Oct 14, 2009 at 3:20 PM, > > wrote: > > > > On Wed, Oct 14, 2009 at 4:22 AM, Mark Bakker > > wrote: > > > Hello list, > > > I am having trouble creating a lognormal distribution with known > > mean mu and > > > standard deviation sigma using scipy.stats > > > According to the docs, the programmed function is: > > > lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) > > > So s is the standard deviation. But how do I specify the mean? I > > found some > > > information that when you specify loc and scale, you replace x by > > > (x-loc)/scale > > > But in the lognormal distribution, you want to replace log(x) by > > log(x)-loc > > > where loc is mu. How do I do that? In addition, would it be a > > good idea to > > > create some convenience functions that allow you to simply > > create lognormal > > > (and maybe normal) distributions by specifying the more common > > mu and sigma? > > > That would surely make things more userfriendly. > > > Thanks, > > > Mark > > > > I don't think loc of lognorm makes much sense in most application, > > since it is just shifting the support, lower boundary is zero+loc. > The > > loc of the underlying normal distribution enters through the scale. > > > > see also > > > http://en.wikipedia.org/wiki/Log-normal_distribution#Mean_and_standard_deviation > > > > > > >>> print stats.lognorm.extradoc > > > > > > Lognormal distribution > > > > lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) > > for x > 0, s > 0. > > > > If log x is normally distributed with mean mu and variance sigma**2, > > then x is log-normally distributed with shape paramter sigma and > scale > > parameter exp(mu). > > > > > > roundtrip with mean mu of the underlying normal distribution > > (scale=1): > > > > >>> mu=np.arange(5) > > >>> np.log(stats.lognorm.stats(1, loc=0,scale=np.exp(mu))[0])-0.5 > > array([ 0., 1., 2., 3., 4.]) > > > > corresponding means of lognormal distribution > > > > >>> stats.lognorm.stats(1, loc=0,scale=np.exp(mu))[0] > > array([ 1.64872127, 4.48168907, 12.18249396, 33.11545196, > > 90.0171313 ]) > > > > > > shifting support: > > > > >>> stats.lognorm.a > > 0.0 > > >>> stats.lognorm.ppf([0, 0.5, 1], 1, loc=3,scale=1) > > array([ 3., 4., Inf]) > > > > > > The only case that I know for lognormal is in regression, so I'm not > > sure what you mean by the convenience functions. 
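A convenience function of the sort being asked about could be a thin wrapper around exactly the conversion Armando uses above. A minimal sketch (the helper name is made up; it only repackages the same mean/variance to mu/sigma relations):

import numpy as np
from scipy.stats import lognorm

def lognorm_from_mean_var(mean, variance):
    """Frozen scipy.stats.lognorm with the given mean and variance.

    Recovers mu and sigma**2 of the underlying normal from the lognormal
    mean and variance, then passes them as shape and scale=exp(mu).
    """
    sigma2 = np.log(1.0 + variance / mean ** 2)   # variance of log(X)
    mu = np.log(mean) - 0.5 * sigma2              # mean of log(X)
    return lognorm(np.sqrt(sigma2), scale=np.exp(mu))

d = lognorm_from_mean_var(10.0, 1.0)
print(d.stats())   # roughly (10.0, 1.0), the same round trip as above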
> > (the normal distribution is defined by loc=mean, scale=standard > > deviation) > > > > assume the regression equation is > > y = x*beta*exp(u) u distributed normal(0, sigma^2) > > this implies > > ln y = ln(x*beta) + u which is just a standard linear regression > > equation which can be estimated by ols or mle > > > > exp(u) in this case is lognormal distributed > > > > Josef > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Wed Jul 21 16:25:13 2010 From: argriffi at ncsu.edu (alex) Date: Wed, 21 Jul 2010 16:25:13 -0400 Subject: [SciPy-User] kmeans Message-ID: Hi, I want to nitpick about the scipy kmeans clustering implementation. Throughout the documentation http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, the 'distortion' of a clustering is defined as "the sum of the distances between each observation vector and its dominating centroid." I think that the sum of squares of distances should be used instead of the sum of distances, and all of the miscellaneous kmeans descriptions I found with google would seem to support this. For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and the old center is 3, then the centroid updating step will move the centroid to 4. This step reduces the sum of squares of distances from 55 to 50, but it increases the distortion from 11 to 12. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Jul 21 17:10:09 2010 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 21 Jul 2010 21:10:09 +0000 (UTC) Subject: [SciPy-User] segfault using _sparse_ svd, eigen, eigen_symmetric with svn 0.9.0.dev6598 References: Message-ID: Wed, 21 Jul 2010 17:59:45 +0200, Jose Quesada wrote: [clip] > We are getting segfaults using _sparse_ svd, eigen, eigen_symmetric with > svn 0.9.0.dev6598. I understand that (1) this is an unreleased version, > and (2) these methods may depend on external C and fortran code that > could have not being installed well on my machine, so this may be > difficult to debug. I have added instructions to reproduce the segfault, > but please ask for anything else that could be needed and I'll try to > provide it. The example code runs without segfaults for me, on Python 2.6 + 32-bit linux, and Python 2.5 + 64-bit linux. That part of scipy.sparse AFAIK does not use external code, ARPACK is bundled, so that's probably not an explanation. I'd perhaps first try just removing the Scipy and Numpy installations, and getting the source codes and rebuilding them "manually", via svn co http://... python setup.py install > build.log 2>&1 rather than using Pip or relying on some other automagical system. 
(I'd also record the build logs and post again here if this didn't help :) -- Pauli Virtanen From minggu2 at gmail.com Wed Jul 21 23:19:20 2010 From: minggu2 at gmail.com (Ming Gu) Date: Wed, 21 Jul 2010 23:19:20 -0400 Subject: [SciPy-User] what's the combination of python, scipy, numpy, and MinGW for Weave to work? Message-ID: Hi guys, I wonder what's the combination of versions for python, scipy, numpy, and MinGW for Weave to work? My current systems keeps giving the following error: Traceback (most recent call last): File "", line 1, in test_weave_3(100) File "C:\temp\test_weave_3.py", line 54, in test_weave_3 weave.inline(code1, ['m', 's', 'm2', 'm3'], type_converters=converters.blitz, compiler='gcc', verbose = 2) File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line 355, in inline **kw) File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line 482, in compile_function verbose=verbose, **kw) File "C:\Python26\lib\site-packages\scipy\weave\ext_tools.py", line 367, in compile verbose = verbose, **kw) File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 273, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 162, in setup raise SystemExit, error CompileError: error: Bad file descriptor Here is the result from weave.test() Running unit tests for scipy.weave NumPy version 1.4.1 NumPy is installed in C:\Python26\lib\site-packages\numpy SciPy version 0.8.0rc3 SciPy is installed in C:\Python26\lib\site-packages\scipy Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] nose version 0.11.2 Thanks a lot!!! Ming -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jul 22 06:32:20 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 22 Jul 2010 18:32:20 +0800 Subject: [SciPy-User] what's the combination of python, scipy, numpy, and MinGW for Weave to work? In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 11:19 AM, Ming Gu wrote: > Hi guys, > > I wonder what's the combination of versions for python, scipy, numpy, and > MinGW for Weave to work? 
> > My current systems keeps giving the following error: > > Traceback (most recent call last): > File "", line 1, in > test_weave_3(100) > File "C:\temp\test_weave_3.py", line 54, in test_weave_3 > weave.inline(code1, ['m', 's', 'm2', 'm3'], > type_converters=converters.blitz, compiler='gcc', verbose = 2) > File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line > 355, in inline > **kw) > File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line > 482, in compile_function > verbose=verbose, **kw) > File "C:\Python26\lib\site-packages\scipy\weave\ext_tools.py", line 367, > in compile > verbose = verbose, **kw) > File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line > 273, in build_extension > setup(name = module_name, ext_modules = [ext],verbose=verb) > File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 186, > in setup > return old_setup(**new_attr) > File "C:\Python26\lib\distutils\core.py", line 162, in setup > raise SystemExit, error > CompileError: error: Bad file descriptor > > Here is the result from weave.test() > Running unit tests for scipy.weave > NumPy version 1.4.1 > NumPy is installed in C:\Python26\lib\site-packages\numpy > SciPy version 0.8.0rc3 > SciPy is installed in C:\Python26\lib\site-packages\scipy > Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit > (Intel)] > nose version 0.11.2 > > With MinGW 5.1.6 (has gcc 3.4.5) and the same python/numpy/scipy as you have it works for me. Do the weave tests pass for you? i.e.: >>> import scipy.weave >>> scipy.weave.test() $ g++.exe --version g++.exe (GCC) 3.4.5 (mingw-vista special r3) Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jul 22 10:48:04 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 22 Jul 2010 09:48:04 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 3:25 PM, alex wrote: > Hi, > > I want to nitpick about the scipy kmeans clustering implementation. > Throughout the documentation > http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, the > 'distortion' of a clustering is defined as "the sum of the distances between > each observation vector and its dominating centroid." I think that the sum > of squares of distances should be used instead of the sum of distances, and > all of the miscellaneous kmeans descriptions I found with google would seem > to support this. > > For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and the > old center is 3, then the centroid updating step will move the centroid to > 4. This step reduces the sum of squares of distances from 55 to 50, but it > increases the distortion from 11 to 12. > > Alex > Every implementation of kmeans (except for SciPy's) that I have seen allowed for the user to specify which distance measure they want to use. There is no right answer for a distance measure except for "it depends". Maybe SciPy's implementation should be updated to allow for user-specified distance measures (e.g. - absolute, euclidian, city-block, etc.)? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From minggu2 at gmail.com Thu Jul 22 10:55:01 2010 From: minggu2 at gmail.com (Ming Gu) Date: Thu, 22 Jul 2010 10:55:01 -0400 Subject: [SciPy-User] what's the combination of python, scipy, numpy, and MinGW for Weave to work? In-Reply-To: References: Message-ID: thanks Ralf. I got it to work just now. 
ming On Thu, Jul 22, 2010 at 6:32 AM, Ralf Gommers wrote: > > > On Thu, Jul 22, 2010 at 11:19 AM, Ming Gu wrote: > >> Hi guys, >> >> I wonder what's the combination of versions for python, scipy, numpy, and >> MinGW for Weave to work? >> >> My current systems keeps giving the following error: >> >> Traceback (most recent call last): >> File "", line 1, in >> test_weave_3(100) >> File "C:\temp\test_weave_3.py", line 54, in test_weave_3 >> weave.inline(code1, ['m', 's', 'm2', 'm3'], >> type_converters=converters.blitz, compiler='gcc', verbose = 2) >> File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line >> 355, in inline >> **kw) >> File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line >> 482, in compile_function >> verbose=verbose, **kw) >> File "C:\Python26\lib\site-packages\scipy\weave\ext_tools.py", line 367, >> in compile >> verbose = verbose, **kw) >> File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line >> 273, in build_extension >> setup(name = module_name, ext_modules = [ext],verbose=verb) >> File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 186, >> in setup >> return old_setup(**new_attr) >> File "C:\Python26\lib\distutils\core.py", line 162, in setup >> raise SystemExit, error >> CompileError: error: Bad file descriptor >> >> Here is the result from weave.test() >> Running unit tests for scipy.weave >> NumPy version 1.4.1 >> NumPy is installed in C:\Python26\lib\site-packages\numpy >> SciPy version 0.8.0rc3 >> SciPy is installed in C:\Python26\lib\site-packages\scipy >> Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 >> bit (Intel)] >> nose version 0.11.2 >> >> With MinGW 5.1.6 (has gcc 3.4.5) and the same python/numpy/scipy as you > have it works for me. Do the weave tests pass for you? i.e.: > >>> import scipy.weave > >>> scipy.weave.test() > > $ g++.exe --version > g++.exe (GCC) 3.4.5 (mingw-vista special r3) > > Cheers, > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Thu Jul 22 11:42:36 2010 From: argriffi at ncsu.edu (alex) Date: Thu, 22 Jul 2010 11:42:36 -0400 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 10:48 AM, Benjamin Root wrote: > On Wed, Jul 21, 2010 at 3:25 PM, alex wrote: > >> Hi, >> >> I want to nitpick about the scipy kmeans clustering implementation. >> Throughout the documentation >> http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, the >> 'distortion' of a clustering is defined as "the sum of the distances between >> each observation vector and its dominating centroid." I think that the sum >> of squares of distances should be used instead of the sum of distances, and >> all of the miscellaneous kmeans descriptions I found with google would seem >> to support this. >> >> For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and the >> old center is 3, then the centroid updating step will move the centroid to >> 4. This step reduces the sum of squares of distances from 55 to 50, but it >> increases the distortion from 11 to 12. >> >> Alex >> > > Every implementation of kmeans (except for SciPy's) that I have seen > allowed for the user to specify which distance measure they want to use. > There is no right answer for a distance measure except for "it depends". 
> Maybe SciPy's implementation should be updated to allow for user-specified > distance measures (e.g. - absolute, euclidian, city-block, etc.)? > > Ben Root > While the best distance might depend on the application, I think that of all these distance measures only the sum of squares of Euclidean distances is guaranteed to monotonically decrease at each step of the algorithm. If the scipy kmeans implementation depends on this monotonicity, which I think it currently does, then this assumption could be the source of subtle bugs when error measures like distortion are used. Maybe I can nitpick more effectively if I frame it as "scipy is doing something strange and possibly buggy" instead of as "scipy is not using the distance function I want". For example: >>> import numpy as np >>> from scipy import cluster >>> v = np.array([1,2,3,4,10]) >>> cluster.vq.kmeans(v, 1) (array([4]), 2.3999999999999999) >>> np.mean([abs(x-4) for x in v]) 2.3999999999999999 >>> np.mean([abs(x-3) for x in v]) 2.2000000000000002 The result of this kmeans call suggests that the center 4 is best with distortion 2.4. In fact this is not the case because a center of 3 would have distortion 2.2. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jul 22 12:24:09 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 22 Jul 2010 11:24:09 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 10:42 AM, alex wrote: > On Thu, Jul 22, 2010 at 10:48 AM, Benjamin Root wrote: > >> On Wed, Jul 21, 2010 at 3:25 PM, alex wrote: >> >>> Hi, >>> >>> I want to nitpick about the scipy kmeans clustering implementation. >>> Throughout the documentation >>> http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, the >>> 'distortion' of a clustering is defined as "the sum of the distances between >>> each observation vector and its dominating centroid." I think that the sum >>> of squares of distances should be used instead of the sum of distances, and >>> all of the miscellaneous kmeans descriptions I found with google would seem >>> to support this. >>> >>> For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and >>> the old center is 3, then the centroid updating step will move the centroid >>> to 4. This step reduces the sum of squares of distances from 55 to 50, but >>> it increases the distortion from 11 to 12. >>> >>> Alex >>> >> >> Every implementation of kmeans (except for SciPy's) that I have seen >> allowed for the user to specify which distance measure they want to use. >> There is no right answer for a distance measure except for "it depends". >> Maybe SciPy's implementation should be updated to allow for user-specified >> distance measures (e.g. - absolute, euclidian, city-block, etc.)? >> >> Ben Root >> > > While the best distance might depend on the application, I think that of > all these distance measures only the sum of squares of Euclidean distances > is guaranteed to monotonically decrease at each step of the algorithm. If > the scipy kmeans implementation depends on this monotonicity, which I think > it currently does, then this assumption could be the source of subtle bugs > when error measures like distortion are used. > > Maybe I can nitpick more effectively if I frame it as "scipy is doing > something strange and possibly buggy" instead of as "scipy is not using the > distance function I want". 
> > It has been a while since I played with kmeans, but I believe that the distance measure merely has to satisfy Minkowski's inequality which is that the norm of a + b <= norm of a + norm of b (or was it the Cauchy-Schwarz inequality?) > For example: > > >>> import numpy as np > >>> from scipy import cluster > >>> v = np.array([1,2,3,4,10]) > >>> cluster.vq.kmeans(v, 1) > (array([4]), 2.3999999999999999) > >>> np.mean([abs(x-4) for x in v]) > 2.3999999999999999 > >>> np.mean([abs(x-3) for x in v]) > 2.2000000000000002 > > The result of this kmeans call suggests that the center 4 is best with > distortion 2.4. In fact this is not the case because a center of 3 would > have distortion 2.2. > > I wonder if this is really a bug in the minimization code rather than an issue with the distortion measure itself. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jul 22 13:07:53 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jul 2010 11:07:53 -0600 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 10:24 AM, Benjamin Root wrote: > On Thu, Jul 22, 2010 at 10:42 AM, alex wrote: > >> On Thu, Jul 22, 2010 at 10:48 AM, Benjamin Root wrote: >> >>> On Wed, Jul 21, 2010 at 3:25 PM, alex wrote: >>> >>>> Hi, >>>> >>>> I want to nitpick about the scipy kmeans clustering implementation. >>>> Throughout the documentation >>>> http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, the >>>> 'distortion' of a clustering is defined as "the sum of the distances between >>>> each observation vector and its dominating centroid." I think that the sum >>>> of squares of distances should be used instead of the sum of distances, and >>>> all of the miscellaneous kmeans descriptions I found with google would seem >>>> to support this. >>>> >>>> For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and >>>> the old center is 3, then the centroid updating step will move the centroid >>>> to 4. This step reduces the sum of squares of distances from 55 to 50, but >>>> it increases the distortion from 11 to 12. >>>> >>>> Alex >>>> >>> >>> Every implementation of kmeans (except for SciPy's) that I have seen >>> allowed for the user to specify which distance measure they want to use. >>> There is no right answer for a distance measure except for "it depends". >>> Maybe SciPy's implementation should be updated to allow for user-specified >>> distance measures (e.g. - absolute, euclidian, city-block, etc.)? >>> >>> Ben Root >>> >> >> While the best distance might depend on the application, I think that of >> all these distance measures only the sum of squares of Euclidean distances >> is guaranteed to monotonically decrease at each step of the algorithm. If >> the scipy kmeans implementation depends on this monotonicity, which I think >> it currently does, then this assumption could be the source of subtle bugs >> when error measures like distortion are used. >> >> Maybe I can nitpick more effectively if I frame it as "scipy is doing >> something strange and possibly buggy" instead of as "scipy is not using the >> distance function I want". >> >> > It has been a while since I played with kmeans, but I believe that the > distance measure merely has to satisfy Minkowski's inequality which is that > the norm of a + b <= norm of a + norm of b (or was it the Cauchy-Schwarz > inequality?) 
> That's the triangle inequality and is a required property of anything called a norm. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Thu Jul 22 13:12:01 2010 From: argriffi at ncsu.edu (alex) Date: Thu, 22 Jul 2010 13:12:01 -0400 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: > > > For example: >> >> >>> import numpy as np >> >>> from scipy import cluster >> >>> v = np.array([1,2,3,4,10]) >> >>> cluster.vq.kmeans(v, 1) >> (array([4]), 2.3999999999999999) >> >>> np.mean([abs(x-4) for x in v]) >> 2.3999999999999999 >> >>> np.mean([abs(x-3) for x in v]) >> 2.2000000000000002 >> >> The result of this kmeans call suggests that the center 4 is best with >> distortion 2.4. In fact this is not the case because a center of 3 would >> have distortion 2.2. >> >> > I wonder if this is really a bug in the minimization code rather than an > issue with the distortion measure itself. > > Ben Root > The bug is in the _kmeans function in vq.py where it uses avg_dist[-2] - avg_dist[-1] <= thresh as a stopping condition. This condition mistakenly assumes that the distortion monotonically decreases. One consequence is that when the distortion increases, avg_dist[-2] - avg_dist[-1] will be negative, and the codebook and distortion associated with avg_dist[-1] are returned. This is where the 2.4 vs 2.2 error comes from. I guess there could be a few ways to resolve the bug. One way could be to use the sum of squares of distances instead of the distortion; this would guarantee that the error sequence monotonically decreases, and I suspect that this is what the author had originally intended. Another way to deal with the bug could be to report the second to last codebook and distortion instead of the last codebook and distortion when the stopping condition is met. This would probably fix the bug in the 2.2 vs. 2.4 example, but it is kind of a kludge; if the sequence does not monotonically decrease, then does it really make sense to use a small change as a stopping condition? Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Thu Jul 22 15:15:54 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 22 Jul 2010 12:15:54 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 10:12 AM, alex wrote: >> >>> For example: >>> >>> >>> import numpy as np >>> >>> from scipy import cluster >>> >>> v = np.array([1,2,3,4,10]) >>> >>> cluster.vq.kmeans(v, 1) >>> (array([4]), 2.3999999999999999) >>> >>> np.mean([abs(x-4) for x in v]) >>> 2.3999999999999999 >>> >>> np.mean([abs(x-3) for x in v]) >>> 2.2000000000000002 >>> >>> The result of this kmeans call suggests that the center 4 is best with >>> distortion 2.4.? In fact this is not the case because a center of 3 would >>> have distortion 2.2. >>> >> >> I wonder if this is really a bug in the minimization code rather than an >> issue with the distortion measure itself. >> >> Ben Root > > The bug is in the _kmeans function in vq.py where it uses avg_dist[-2] - > avg_dist[-1] <= thresh as a stopping condition.? This condition mistakenly > assumes that the distortion monotonically decreases.? One consequence is > that when the distortion increases, avg_dist[-2] - avg_dist[-1] will be > negative, and the codebook and distortion associated with avg_dist[-1] are > returned.? This is where the 2.4 vs 2.2 error comes from. > > I guess there could be a few ways to resolve the bug.? 
One way could be to > use the sum of squares of distances instead of the distortion; this would > guarantee that the error sequence monotonically decreases, and I suspect > that this is what the author had originally intended. You'd like to minimize the squared error (I don't know much about it but that makes sense to me). But in the example you chose, the squared error is minimized since the mean is 4. Was that just a coincidence? I guess in the end the code is protected against any claims of bugs since it doesn't guarantee to find the global minimum :) > Another way to deal with the bug could be to report the second to last > codebook and distortion instead of the last codebook and distortion when the > stopping condition is met.? This would probably fix the bug in the 2.2 vs. > 2.4 example, but it is kind of a kludge; if the sequence does not > monotonically decrease, then does it really make sense to use a small change > as a stopping condition? > > Alex > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ben.root at ou.edu Thu Jul 22 15:23:47 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 22 Jul 2010 14:23:47 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 12:12 PM, alex wrote: >> >>> For example: >>> >>> >>> import numpy as np >>> >>> from scipy import cluster >>> >>> v = np.array([1,2,3,4,10]) >>> >>> cluster.vq.kmeans(v, 1) >>> (array([4]), 2.3999999999999999) >>> >>> np.mean([abs(x-4) for x in v]) >>> 2.3999999999999999 >>> >>> np.mean([abs(x-3) for x in v]) >>> 2.2000000000000002 >>> >>> The result of this kmeans call suggests that the center 4 is best with >>> distortion 2.4. In fact this is not the case because a center of 3 would >>> have distortion 2.2. >>> >> >> I wonder if this is really a bug in the minimization code rather than an >> issue with the distortion measure itself. >> >> Ben Root > > The bug is in the _kmeans function in vq.py where it uses avg_dist[-2] - > avg_dist[-1] <= thresh as a stopping condition. This condition mistakenly > assumes that the distortion monotonically decreases. One consequence is > that when the distortion increases, avg_dist[-2] - avg_dist[-1] will be > negative, and the codebook and distortion associated with avg_dist[-1] are > returned. This is where the 2.4 vs 2.2 error comes from. > > I guess there could be a few ways to resolve the bug. One way could be to > use the sum of squares of distances instead of the distortion; this would > guarantee that the error sequence monotonically decreases, and I suspect > that this is what the author had originally intended. > > Another way to deal with the bug could be to report the second to last > codebook and distortion instead of the last codebook and distortion when the > stopping condition is met. This would probably fix the bug in the 2.2 vs. > 2.4 example, but it is kind of a kludge; if the sequence does not > monotonically decrease, then does it really make sense to use a small change > as a stopping condition? > > Alex > I have to search for some old code of mine, but if I remember correctly, it would seem that the latter solution is actually a better fix because it addresses the heart of the matter. A minimization algorithm that doesn't return the most minimum value that it knows is buggy. The stopping condition is another issue altogether. 
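To make the fix being discussed concrete, here is a small NumPy-only sketch (not the scipy.cluster.vq code; the function name and interface are made up). It scores every iteration by the sum of squared distances, which is the quantity the assignment and centroid-update steps actually decrease, and it returns the best codebook it has seen rather than whatever the last iteration produced.

import numpy as np

def kmeans_sse(obs, k, n_iter=20, seed=0):
    """Toy k-means that tracks the sum of squared errors (SSE).

    Illustration only, not the scipy.cluster.vq implementation.
    Returns the best codebook seen and its SSE.
    """
    obs = np.asarray(obs, dtype=float).reshape(len(obs), -1)
    rng = np.random.RandomState(seed)
    centroids = obs[rng.permutation(len(obs))[:k]].copy()  # k distinct starts
    best_sse, best_codebook = np.inf, centroids.copy()
    for _ in range(n_iter):
        # assignment step: squared distance from every point to every centroid
        d2 = ((obs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(obs)), labels].sum()
        # keep the best configuration seen so far
        if sse < best_sse:
            best_sse, best_codebook = sse, centroids.copy()
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = obs[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return best_codebook, best_sse

# the 1-D example from this thread: points 1, 2, 3, 4, 10 with one cluster;
# the SSE-optimal centre is the mean, 4 (SSE 50), while mean absolute
# distance would prefer the median, 3 (2.2 vs 2.4 per point).
print(kmeans_sse([1, 2, 3, 4, 10], k=1))

Returning the best-seen codebook is the "second to last codebook" idea expressed directly, and scoring by SSE removes the non-monotonicity that makes avg_dist[-2] - avg_dist[-1] <= thresh unreliable as a stopping test.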
There was a nice C-implemented, MIT-licensed kmeans algorithm that I used several years ago that had a much smarter stopping condition and several other useful features. Let me see if I can find it and we can compare. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jul 22 15:25:56 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 22 Jul 2010 14:25:56 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 12:07 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Jul 22, 2010 at 10:24 AM, Benjamin Root wrote: > >> On Thu, Jul 22, 2010 at 10:42 AM, alex wrote: >> >>> On Thu, Jul 22, 2010 at 10:48 AM, Benjamin Root wrote: >>> >>>> On Wed, Jul 21, 2010 at 3:25 PM, alex wrote: >>>> >>>>> Hi, >>>>> >>>>> I want to nitpick about the scipy kmeans clustering implementation. >>>>> Throughout the documentation >>>>> http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, >>>>> the 'distortion' of a clustering is defined as "the sum of the distances >>>>> between each observation vector and its dominating centroid." I think that >>>>> the sum of squares of distances should be used instead of the sum of >>>>> distances, and all of the miscellaneous kmeans descriptions I found with >>>>> google would seem to support this. >>>>> >>>>> For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and >>>>> the old center is 3, then the centroid updating step will move the centroid >>>>> to 4. This step reduces the sum of squares of distances from 55 to 50, but >>>>> it increases the distortion from 11 to 12. >>>>> >>>>> Alex >>>>> >>>> >>>> Every implementation of kmeans (except for SciPy's) that I have seen >>>> allowed for the user to specify which distance measure they want to use. >>>> There is no right answer for a distance measure except for "it depends". >>>> Maybe SciPy's implementation should be updated to allow for user-specified >>>> distance measures (e.g. - absolute, euclidian, city-block, etc.)? >>>> >>>> Ben Root >>>> >>> >>> While the best distance might depend on the application, I think that of >>> all these distance measures only the sum of squares of Euclidean distances >>> is guaranteed to monotonically decrease at each step of the algorithm. If >>> the scipy kmeans implementation depends on this monotonicity, which I think >>> it currently does, then this assumption could be the source of subtle bugs >>> when error measures like distortion are used. >>> >>> Maybe I can nitpick more effectively if I frame it as "scipy is doing >>> something strange and possibly buggy" instead of as "scipy is not using the >>> distance function I want". >>> >>> >> It has been a while since I played with kmeans, but I believe that the >> distance measure merely has to satisfy Minkowski's inequality which is that >> the norm of a + b <= norm of a + norm of b (or was it the Cauchy-Schwarz >> inequality?) >> > > That's the triangle inequality and is a required property of anything > called a norm. > > > > Chuck > > > Right... why is it that I first think of the most complex concepts first...? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From argriffi at ncsu.edu Thu Jul 22 15:31:13 2010 From: argriffi at ncsu.edu (alex) Date: Thu, 22 Jul 2010 15:31:13 -0400 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 3:15 PM, Keith Goodman wrote: > You'd like to minimize the squared error (I don't know much about it > but that makes sense to me). But in the example you chose, the squared > error is minimized since the mean is 4. Was that just a coincidence? I > guess in the end the code is protected against any claims of bugs > since it doesn't guarantee to find the global minimum :) > This was not really a coincidence, because the algorithm converges to a local minimum of sum of squared distances. This is why I was suggesting using this sum of squared distances as a stopping criterion and returning this value instead of the distortion. Or alternatively we could use the k-means code Benjamin mentioned if he digs it up and if it allows multiple distance functions and has a reasonable stopping criterion. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Thu Jul 22 15:51:15 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 22 Jul 2010 12:51:15 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 12:31 PM, alex wrote: > On Thu, Jul 22, 2010 at 3:15 PM, Keith Goodman wrote: > >> >> You'd like to minimize the squared error (I don't know much about it >> but that makes sense to me). But in the example you chose, the squared >> error is minimized since the mean is 4. Was that just a coincidence? I >> guess in the end the code is protected against any claims of bugs >> since it doesn't guarantee to find the global minimum :) > > This was not really a coincidence, because the algorithm converges to a > local minimum of sum of squared distances.? This is why I was suggesting > using this sum of squared distances as a stopping criterion and returning > this value instead of the distortion.? Or alternatively we could use the > k-means code Benjamin mentioned if he digs it up and if it allows multiple > distance functions and has a reasonable stopping criterion. OK, thank you, I think I get it. It minimizes one measure (squared distance) but it uses another measure (distance) for stopping. Another plus for squared distance is that it is faster to calculate then mean distance, dot(dist, dist) versus mean(dist) or dot(dist, one). From charlesr.harris at gmail.com Thu Jul 22 20:26:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jul 2010 18:26:32 -0600 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 2:18 AM, David Andrews wrote: > Hi All, > > I suppose this might not strictly be a scipy type question, but I'll > ask here as I expect some of you might understand what I'm getting at! > > I'm in the process of porting some code from IDL (Interactive Data > Language - popular in some fields of science, but largely nowhere > else) to Python. Essentially it's just plotting and analyzing time > series data, and so most of the porting is relatively simple. The one > stumbling block - is there an equivalent or useful replacement for the > "common block" concept in IDL available in Python? > > Common blocks are areas of shared memory held by IDL that can be > accessed easily from within sub-routines. 
So for example, in our IDL > code, we load data into these common blocks at the start of a session, > and then perform whatever analysis on it. In this manner, we do not > have to continually re-load data every time we re-perform a piece of > analysis. They store their contents persistently, for the duration of > the IDL session. It's all for academic research purposes, so it's > very much 'try this / see what happens / alter it, try again' kind of > work. The loading and initial processing of data is fairly time > intensive, so having to reload at each step is a bit frustrating and > not very productive. > > So, does anyone have any suggestions as to the best way to go about > porting this sort of behavior? Pickle seems to be one option, but > that would involve read/write to disk operations anyway? Any others? > > Depending on the sort of data you have, PyTables might be an option. I'm currently using it to store a 42 GB image data cube on disk and it works well for that. I can browse through an image and shift-click on a pixel to get a plot of the data associated with the pixel. It is quite fast. The data cube needs to be passed as an argument to the various functions that need the data, but that isn't much of a problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jul 22 23:49:37 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 22 Jul 2010 22:49:37 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 2:23 PM, Benjamin Root wrote: > On Thu, Jul 22, 2010 at 12:12 PM, alex wrote: > >> > >>> For example: > >>> > >>> >>> import numpy as np > >>> >>> from scipy import cluster > >>> >>> v = np.array([1,2,3,4,10]) > >>> >>> cluster.vq.kmeans(v, 1) > >>> (array([4]), 2.3999999999999999) > >>> >>> np.mean([abs(x-4) for x in v]) > >>> 2.3999999999999999 > >>> >>> np.mean([abs(x-3) for x in v]) > >>> 2.2000000000000002 > >>> > >>> The result of this kmeans call suggests that the center 4 is best with > >>> distortion 2.4. In fact this is not the case because a center of 3 > would > >>> have distortion 2.2. > >>> > >> > >> I wonder if this is really a bug in the minimization code rather than an > >> issue with the distortion measure itself. > >> > >> Ben Root > > > > The bug is in the _kmeans function in vq.py where it uses avg_dist[-2] - > > avg_dist[-1] <= thresh as a stopping condition. This condition > mistakenly > > assumes that the distortion monotonically decreases. One consequence is > > that when the distortion increases, avg_dist[-2] - avg_dist[-1] will be > > negative, and the codebook and distortion associated with avg_dist[-1] > are > > returned. This is where the 2.4 vs 2.2 error comes from. > > > > I guess there could be a few ways to resolve the bug. One way could be > to > > use the sum of squares of distances instead of the distortion; this would > > guarantee that the error sequence monotonically decreases, and I suspect > > that this is what the author had originally intended. > > > > Another way to deal with the bug could be to report the second to last > > codebook and distortion instead of the last codebook and distortion when > the > > stopping condition is met. This would probably fix the bug in the 2.2 > vs. > > 2.4 example, but it is kind of a kludge; if the sequence does not > > monotonically decrease, then does it really make sense to use a small > change > > as a stopping condition? 
> > > > Alex > > > > I have to search for some old code of mine, but if I remember correctly, it > would seem that the latter solution is actually a better fix because it > addresses the heart of the matter. A minimization algorithm that doesn't > return the most minimum value that it knows is buggy. The stopping > condition is another issue altogether. > > There was a nice C-implemented, MIT-licensed kmeans algorithm that I used > several years ago that had a much smarter stopping condition and several > other useful features. Let me see if I can find it and we can compare. > > Ben Root > > > Ok, I think I found it. If this is the same library that I remember, it isn't MIT licensed, but rather it seems to be Python licensed and already in use for Pycluster. http://bonsai.hgc.jp/~mdehoon/software/cluster/index.html I'll look through the code in the morning to see what the stopping condition is, but I believe that it keeps iterating until it can no longer make any reassignments. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From seb.haase at gmail.com Fri Jul 23 03:01:28 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 23 Jul 2010 09:01:28 +0200 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 2:26 AM, Charles R Harris wrote: > > > On Wed, Jul 21, 2010 at 2:18 AM, David Andrews wrote: >> >> Hi All, >> >> I suppose this might not strictly be a scipy type question, but I'll >> ask here as I expect some of you might understand what I'm getting at! >> >> I'm in the process of porting some code from IDL (Interactive Data >> Language - popular in some fields of science, but largely nowhere >> else) to Python. ?Essentially it's just plotting and analyzing time >> series data, and so most of the porting is relatively simple. ?The one >> stumbling block - is there an equivalent or useful replacement for the >> "common block" concept in IDL available in Python? >> >> Common blocks are areas of shared memory held by IDL that can be >> accessed easily from within sub-routines. ?So for example, in our IDL >> code, we load data into these common blocks at the start of a session, >> and then perform whatever analysis on it. ?In this manner, we do not >> have to continually re-load data every time we re-perform a piece of >> analysis. ?They store their contents persistently, for the duration of >> the IDL session. ?It's all for academic research purposes, so it's >> very much 'try this / see what happens / alter it, try again' kind of >> work. ?The loading and initial processing of data is fairly time >> intensive, so having to reload at each step is a bit frustrating and >> not very productive. >> >> So, does anyone have any suggestions as to the best way to go about >> porting this sort of behavior? ?Pickle seems to be one option, but >> that would involve read/write to disk operations anyway? ?Any others? >> > > Depending on the sort of data you have, PyTables might be an option. I'm > currently using it to store a 42 GB image data cube on disk and it works > well for that. I can browse through an image and shift-click on a pixel to > get a plot of the data associated with the pixel. It is quite fast. The data > cube needs to be passed as an argument to the various functions that need > the data, but that isn't much of a problem. > Chuck, just out of curiosity: what are the specs of your hardware and which OS are you on ? 
-Sebastian Haase From irbdavid at gmail.com Fri Jul 23 06:19:02 2010 From: irbdavid at gmail.com (David Andrews) Date: Fri, 23 Jul 2010 11:19:02 +0100 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: Okay, thanks for the input, all. I've just fiddled around with it a bit, and it seems that Robin's suggestion makes the most sense. Define a module that has the persistent data in it. That can then provide load / unload type functions that manage the data it stores. Used inside of ipython it looks like it prevents needing to re-load data unnecessarily. As an aside, it looks to me like the cPickle stuff is actually substantially faster than I thought it might be, so it's probably not as prohibitive as I thought to use that as a session 'scratch file' or whatever one wants to call it. Now it's just a matter of converting ~13k lines of IDL into python :D Cheers, Dave On Fri, Jul 23, 2010 at 8:01 AM, Sebastian Haase wrote: > On Fri, Jul 23, 2010 at 2:26 AM, Charles R Harris > wrote: >> >> >> On Wed, Jul 21, 2010 at 2:18 AM, David Andrews wrote: >>> >>> Hi All, >>> >>> I suppose this might not strictly be a scipy type question, but I'll >>> ask here as I expect some of you might understand what I'm getting at! >>> >>> I'm in the process of porting some code from IDL (Interactive Data >>> Language - popular in some fields of science, but largely nowhere >>> else) to Python. ?Essentially it's just plotting and analyzing time >>> series data, and so most of the porting is relatively simple. ?The one >>> stumbling block - is there an equivalent or useful replacement for the >>> "common block" concept in IDL available in Python? >>> >>> Common blocks are areas of shared memory held by IDL that can be >>> accessed easily from within sub-routines. ?So for example, in our IDL >>> code, we load data into these common blocks at the start of a session, >>> and then perform whatever analysis on it. ?In this manner, we do not >>> have to continually re-load data every time we re-perform a piece of >>> analysis. ?They store their contents persistently, for the duration of >>> the IDL session. ?It's all for academic research purposes, so it's >>> very much 'try this / see what happens / alter it, try again' kind of >>> work. ?The loading and initial processing of data is fairly time >>> intensive, so having to reload at each step is a bit frustrating and >>> not very productive. >>> >>> So, does anyone have any suggestions as to the best way to go about >>> porting this sort of behavior? ?Pickle seems to be one option, but >>> that would involve read/write to disk operations anyway? ?Any others? >>> >> >> Depending on the sort of data you have, PyTables might be an option. I'm >> currently using it to store a 42 GB image data cube on disk and it works >> well for that. I can browse through an image and shift-click on a pixel to >> get a plot of the data associated with the pixel. It is quite fast. The data >> cube needs to be passed as an argument to the various functions that need >> the data, but that isn't much of a problem. >> > Chuck, just out of curiosity: what are the specs of your hardware and > which OS are you on ? 
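The "module that has the persistent data in it" pattern described above can be as small as this (a sketch only; the module name, function names and the loadtxt reader are placeholders for whatever the real data format needs):

# datastore.py -- module-level cache, so repeated analysis runs inside one
# interactive session do not keep re-reading the same files
import numpy as np

_cache = {}   # lives as long as the interpreter session

def load(name, path):
    """Read a dataset once and keep it in the module-level cache."""
    if name not in _cache:
        _cache[name] = np.loadtxt(path)   # placeholder for the real reader
    return _cache[name]

def get(name):
    """Return an already-loaded dataset."""
    return _cache[name]

def unload(name=None):
    """Drop one dataset, or clear everything if no name is given."""
    if name is None:
        _cache.clear()
    else:
        _cache.pop(name, None)

Because Python caches imported modules, every routine that does "import datastore" sees the same _cache, which is roughly the role the IDL common block was playing.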
> > -Sebastian Haase > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cournape at gmail.com Fri Jul 23 08:55:25 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 23 Jul 2010 21:55:25 +0900 Subject: [SciPy-User] Audiolab 0.11.0 Message-ID: Hi, I am pleased to announce the 0.11.0 release of audiolab scikits, the package to read/write audio file formats into numpy. This release has barely no changes compared to 0.10.x series, but it finally fixes annoying windows issues (which ended up being mingw bugs). Source tarball and python 2.6 windows installer are available on pypi, cheers, David From charlesr.harris at gmail.com Fri Jul 23 10:28:59 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Jul 2010 08:28:59 -0600 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 1:01 AM, Sebastian Haase wrote: > On Fri, Jul 23, 2010 at 2:26 AM, Charles R Harris > wrote: > > > > > > On Wed, Jul 21, 2010 at 2:18 AM, David Andrews > wrote: > >> > >> Hi All, > >> > >> I suppose this might not strictly be a scipy type question, but I'll > >> ask here as I expect some of you might understand what I'm getting at! > >> > >> I'm in the process of porting some code from IDL (Interactive Data > >> Language - popular in some fields of science, but largely nowhere > >> else) to Python. Essentially it's just plotting and analyzing time > >> series data, and so most of the porting is relatively simple. The one > >> stumbling block - is there an equivalent or useful replacement for the > >> "common block" concept in IDL available in Python? > >> > >> Common blocks are areas of shared memory held by IDL that can be > >> accessed easily from within sub-routines. So for example, in our IDL > >> code, we load data into these common blocks at the start of a session, > >> and then perform whatever analysis on it. In this manner, we do not > >> have to continually re-load data every time we re-perform a piece of > >> analysis. They store their contents persistently, for the duration of > >> the IDL session. It's all for academic research purposes, so it's > >> very much 'try this / see what happens / alter it, try again' kind of > >> work. The loading and initial processing of data is fairly time > >> intensive, so having to reload at each step is a bit frustrating and > >> not very productive. > >> > >> So, does anyone have any suggestions as to the best way to go about > >> porting this sort of behavior? Pickle seems to be one option, but > >> that would involve read/write to disk operations anyway? Any others? > >> > > > > Depending on the sort of data you have, PyTables might be an option. I'm > > currently using it to store a 42 GB image data cube on disk and it works > > well for that. I can browse through an image and shift-click on a pixel > to > > get a plot of the data associated with the pixel. It is quite fast. The > data > > cube needs to be passed as an argument to the various functions that need > > the data, but that isn't much of a problem. > > > Chuck, just out of curiosity: what are the specs of your hardware and > which OS are you on ? > > It is 64 bit ubuntu running on quad core intel, 8GB memory. The memory usage is pretty modest in practice as PyTables chunks the data. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Jul 23 10:37:20 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Jul 2010 08:37:20 -0600 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 4:19 AM, David Andrews wrote: > Okay, thanks for the input, all. I've just fiddled around with it a > bit, and it seems that Robin's suggestion makes the most sense. > Define a module that has the persistent data in it. That can then > provide load / unload type functions that manage the data it stores. > Used inside of ipython it looks like it prevents needing to re-load > data unnecessarily. > > As an aside, it looks to me like the cPickle stuff is actually > substantially faster than I thought it might be, so it's probably not > as prohibitive as I thought to use that as a session 'scratch file' or > whatever one wants to call it. > > If you use cPickle for large amounts of data the protocol becomes important, there is a vast speed difference between the default protocol and protocol=2. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri Jul 23 11:27:45 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 23 Jul 2010 11:27:45 -0400 Subject: [SciPy-User] impulse invariant (or step invariant) IIR filter design Message-ID: Has anyone implemented impulse invariant (or step invariant) IIR filter design in scipy? I can find some matlab code, but I don't know matlab. From Chris.Barker at noaa.gov Fri Jul 23 12:41:16 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 23 Jul 2010 09:41:16 -0700 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: References: Message-ID: <4C49C62C.3000902@noaa.gov> David Andrews wrote: > Now it's just a matter of converting ~13k lines of IDL into python :D A little piece of advice here -- I would resist the urge to "convert" your code line by line, or even function by function. I'm not familiar with IDL, but I do know that you will be much happier in the long run with a code structure that is natural in Python. IIUC, IDL is an array-oriented language, so that part will hopefully be a natural transition, but from the sounds of it the overall code structure may need to be quite different. So, think about the project as a whole, and how it can best be structured in Python, and start from there. Oh, and write unit-tests from the beginning! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From perry at stsci.edu Fri Jul 23 13:16:04 2010 From: perry at stsci.edu (Perry Greenfield) Date: Fri, 23 Jul 2010 13:16:04 -0400 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: <4C49C62C.3000902@noaa.gov> References: <4C49C62C.3000902@noaa.gov> Message-ID: <4F22CB64-4D53-4B98-A273-D0619AF57B6E@stsci.edu> On Jul 23, 2010, at 12:41 PM, Christopher Barker wrote: > David Andrews wrote: >> Now it's just a matter of converting ~13k lines of IDL into python :D > > A little piece of advice here -- I would resist the urge to "convert" > your code line by line, or even function by function. > > I'm not familiar with IDL, but I do know that you will be much happier > in the long run with a code structure that is natural in Python. 
IIUC, > IDL is an array-oriented language, so that part will hopefully be a > natural transition, but from the sounds of it the overall code > structure > may need to be quite different. > > So, think about the project as a whole, and how it can best be > structured in Python, and start from there. > > Oh, and write unit-tests from the beginning! I wonder if that is the best approach if one doesn't have much experience with Python. I know that is the ideal thing to do, but it you don't have much experience, you'll probably have to do it again anyway. Perhaps do a few line-by-line (the ones you need right now if you can keep it to a small subset), and after you get some experience writing (and reading other people's) Python code, redo them. Just a thought. Another thought: avoid the "common block" approach as much as you can in your new code. There are usually better ways to do it. Perry From ben.root at ou.edu Fri Jul 23 13:19:39 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 12:19:39 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 10:49 PM, Benjamin Root wrote: > On Thu, Jul 22, 2010 at 2:23 PM, Benjamin Root wrote: > >> On Thu, Jul 22, 2010 at 12:12 PM, alex wrote: >> >> >> >>> For example: >> >>> >> >>> >>> import numpy as np >> >>> >>> from scipy import cluster >> >>> >>> v = np.array([1,2,3,4,10]) >> >>> >>> cluster.vq.kmeans(v, 1) >> >>> (array([4]), 2.3999999999999999) >> >>> >>> np.mean([abs(x-4) for x in v]) >> >>> 2.3999999999999999 >> >>> >>> np.mean([abs(x-3) for x in v]) >> >>> 2.2000000000000002 >> >>> >> >>> The result of this kmeans call suggests that the center 4 is best with >> >>> distortion 2.4. In fact this is not the case because a center of 3 >> would >> >>> have distortion 2.2. >> >>> >> >> >> >> I wonder if this is really a bug in the minimization code rather than >> an >> >> issue with the distortion measure itself. >> >> >> >> Ben Root >> > >> > The bug is in the _kmeans function in vq.py where it uses avg_dist[-2] - >> > avg_dist[-1] <= thresh as a stopping condition. This condition >> mistakenly >> > assumes that the distortion monotonically decreases. One consequence is >> > that when the distortion increases, avg_dist[-2] - avg_dist[-1] will be >> > negative, and the codebook and distortion associated with avg_dist[-1] >> are >> > returned. This is where the 2.4 vs 2.2 error comes from. >> > >> > I guess there could be a few ways to resolve the bug. One way could be >> to >> > use the sum of squares of distances instead of the distortion; this >> would >> > guarantee that the error sequence monotonically decreases, and I suspect >> > that this is what the author had originally intended. >> > >> > Another way to deal with the bug could be to report the second to last >> > codebook and distortion instead of the last codebook and distortion when >> the >> > stopping condition is met. This would probably fix the bug in the 2.2 >> vs. >> > 2.4 example, but it is kind of a kludge; if the sequence does not >> > monotonically decrease, then does it really make sense to use a small >> change >> > as a stopping condition? >> > >> > Alex >> > >> >> I have to search for some old code of mine, but if I remember correctly, >> it would seem that the latter solution is actually a better fix because it >> addresses the heart of the matter. A minimization algorithm that doesn't >> return the most minimum value that it knows is buggy. The stopping >> condition is another issue altogether. 
>> >> There was a nice C-implemented, MIT-licensed kmeans algorithm that I used >> several years ago that had a much smarter stopping condition and several >> other useful features. Let me see if I can find it and we can compare. >> >> Ben Root >> >> >> > Ok, I think I found it. If this is the same library that I remember, it > isn't MIT licensed, but rather it seems to be Python licensed and already in > use for Pycluster. > > http://bonsai.hgc.jp/~mdehoon/software/cluster/index.html > > I'll look through the code in the morning to see what the stopping > condition is, but I believe that it keeps iterating until it can no longer > make any reassignments. > > Ben Root > Examining further, I see that SciPy's implementation is fairly simplistic and has some issues. In the given example, the reason why 3 is never returned is not because of the use of the distortion metric, but rather because the kmeans function never sees the distance for using 3. As a matter of fact, the actual code that does the convergence is in vq and py_vq (vector quantization) and it tries to minimize the sum of squared errors. kmeans just keeps on retrying the convergence with random guesses to see if different convergences occur. Is Pycluster even maintained anymore? Maybe we should look into integrating it into SciPy if it isn't being maintained. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Jul 23 13:27:56 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 24 Jul 2010 02:27:56 +0900 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sat, Jul 24, 2010 at 2:19 AM, Benjamin Root wrote: > > Examining further, I see that SciPy's implementation is fairly simplistic > and has some issues.? In the given example, the reason why 3 is never > returned is not because of the use of the distortion metric, but rather > because the kmeans function never sees the distance for using 3.? As a > matter of fact, the actual code that does the convergence is in vq and py_vq > (vector quantization) and it tries to minimize the sum of squared errors. > kmeans just keeps on retrying the convergence with random guesses to see if > different convergences occur. As one of the maintainer of kmeans, I would be the first to admit the code is basic, for good and bad. Something more elaborate for clustering may indeed be useful, as long as the interface stays simple. More complex needs should turn on scikits.learn or more specialized packages, cheers, David From ben.root at ou.edu Fri Jul 23 13:36:49 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 12:36:49 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 12:27 PM, David Cournapeau wrote: > On Sat, Jul 24, 2010 at 2:19 AM, Benjamin Root wrote: > > > > > Examining further, I see that SciPy's implementation is fairly simplistic > > and has some issues. In the given example, the reason why 3 is never > > returned is not because of the use of the distortion metric, but rather > > because the kmeans function never sees the distance for using 3. As a > > matter of fact, the actual code that does the convergence is in vq and > py_vq > > (vector quantization) and it tries to minimize the sum of squared errors. > > kmeans just keeps on retrying the convergence with random guesses to see > if > > different convergences occur. > > As one of the maintainer of kmeans, I would be the first to admit the > code is basic, for good and bad. 
Something more elaborate for > clustering may indeed be useful, as long as the interface stays > simple. > > More complex needs should turn on scikits.learn or more specialized > packages, > > cheers, > > David > I agree, kmeans does not need to get very complicated because kmeans (the general concept) is not very suitable for very complicated situations. As a thought, a possible way to help out the current implementation is to ensure that unique guesses are made. Currently, several iterations are wasted by performing guesses that it has already done before. Is there a way to do sampling without replacement in numpy.random? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Fri Jul 23 13:48:47 2010 From: argriffi at ncsu.edu (alex) Date: Fri, 23 Jul 2010 13:48:47 -0400 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 1:36 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 12:27 PM, David Cournapeau wrote: > >> On Sat, Jul 24, 2010 at 2:19 AM, Benjamin Root wrote: >> >> > >> > Examining further, I see that SciPy's implementation is fairly >> simplistic >> > and has some issues. In the given example, the reason why 3 is never >> > returned is not because of the use of the distortion metric, but rather >> > because the kmeans function never sees the distance for using 3. As a >> > matter of fact, the actual code that does the convergence is in vq and >> py_vq >> > (vector quantization) and it tries to minimize the sum of squared >> errors. >> > kmeans just keeps on retrying the convergence with random guesses to see >> if >> > different convergences occur. >> >> As one of the maintainer of kmeans, I would be the first to admit the >> code is basic, for good and bad. Something more elaborate for >> clustering may indeed be useful, as long as the interface stays >> simple. >> >> More complex needs should turn on scikits.learn or more specialized >> packages, >> >> cheers, >> >> David >> > > I agree, kmeans does not need to get very complicated because kmeans (the > general concept) is not very suitable for very complicated situations. > > As a thought, a possible way to help out the current implementation is to > ensure that unique guesses are made. Currently, several iterations are > wasted by performing guesses that it has already done before. Is there a > way to do sampling without replacement in numpy.random? > > Ben Root > > Here is an old thread about initializing kmeans with/without replacement http://old.nabble.com/kmeans-and-initial-centroid-guesses-td26938926.html If scipy wants to use the most vanilla kmeans, then I suggest that it should use sum of squares of errors everywhere it is currently using the sum of errors. If you really want to optimize the sum of errors, then the median is probably a better cluster center than the mean, but adding more center definitions would start to get more complicated. Alex -------------- next part -------------- An HTML attachment was scrubbed... 
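A quick numpy check of that point, using the [1, 2, 3, 4, 10] example from earlier in the thread (illustrative only): the mean (4) minimizes the sum of squared errors, while the median (3) minimizes the sum of absolute errors.

>>> import numpy as np
>>> v = np.array([1, 2, 3, 4, 10], dtype=float)
>>> np.mean(v), np.median(v)
(4.0, 3.0)
>>> np.sum((v - 4) ** 2), np.sum((v - 3) ** 2)
(50.0, 55.0)
>>> np.sum(np.abs(v - 4)), np.sum(np.abs(v - 3))
(12.0, 11.0)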
URL: From ben.root at ou.edu Fri Jul 23 14:12:21 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 13:12:21 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 12:48 PM, alex wrote: > On Fri, Jul 23, 2010 at 1:36 PM, Benjamin Root wrote: > >> On Fri, Jul 23, 2010 at 12:27 PM, David Cournapeau wrote: >> >>> On Sat, Jul 24, 2010 at 2:19 AM, Benjamin Root wrote: >>> >>> > >>> > Examining further, I see that SciPy's implementation is fairly >>> simplistic >>> > and has some issues. In the given example, the reason why 3 is never >>> > returned is not because of the use of the distortion metric, but rather >>> > because the kmeans function never sees the distance for using 3. As a >>> > matter of fact, the actual code that does the convergence is in vq and >>> py_vq >>> > (vector quantization) and it tries to minimize the sum of squared >>> errors. >>> > kmeans just keeps on retrying the convergence with random guesses to >>> see if >>> > different convergences occur. >>> >>> As one of the maintainer of kmeans, I would be the first to admit the >>> code is basic, for good and bad. Something more elaborate for >>> clustering may indeed be useful, as long as the interface stays >>> simple. >>> >>> More complex needs should turn on scikits.learn or more specialized >>> packages, >>> >>> cheers, >>> >>> David >>> >> >> I agree, kmeans does not need to get very complicated because kmeans (the >> general concept) is not very suitable for very complicated situations. >> >> As a thought, a possible way to help out the current implementation is to >> ensure that unique guesses are made. Currently, several iterations are >> wasted by performing guesses that it has already done before. Is there a >> way to do sampling without replacement in numpy.random? >> >> Ben Root >> >> [clip] > If scipy wants to use the most vanilla kmeans, then I suggest that it should > use sum of squares of errors everywhere it is currently using the sum of > errors. If you really want to optimize the sum of errors, then the median > is probably a better cluster center than the mean, but adding more center > definitions would start to get more complicated. > > Alex > > I think there is some confusion here about how the distortion is being used. In line 257 of vq.py, the sum of the square of the difference between the observation and the centroid is calculated for each given obs. Then the square root is taken (line 261) and returned as a distortion at line 378. The only place where I see a simple subtraction being used is up in _py_vq_1d() which isn't being called and at line 391, which is merely being used for tolerance testing. Even in the C code, the difference is taken and immediately squared. Maybe the documentation no longer matches the code? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jul 23 14:16:08 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 11:16:08 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 10:48 AM, alex wrote: > On Fri, Jul 23, 2010 at 1:36 PM, Benjamin Root wrote: >> As a thought, a possible way to help out the current implementation is to >> ensure that unique guesses are made.? Currently, several iterations are >> wasted by performing guesses that it has already done before.? Is there a >> way to do sampling without replacement in numpy.random? 
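For the sampling-without-replacement question quoted above, one simple way to draw k distinct observations as initial guesses (a sketch only; obs and k are placeholders):

>>> import numpy as np
>>> obs = np.array([1, 2, 3, 4, 10], dtype=float)
>>> k = 2
>>> idx = np.random.permutation(len(obs))[:k]   # k distinct indices, no repeats
>>> guess = obs[idx]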
>> > > Here is an old thread about initializing kmeans with/without replacement > http://old.nabble.com/kmeans-and-initial-centroid-guesses-td26938926.html And here's the ticket: http://projects.scipy.org/scipy/ticket/1078 From lutz.maibaum at gmail.com Fri Jul 23 14:23:43 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Fri, 23 Jul 2010 11:23:43 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: > Examining further, I see that SciPy's implementation is fairly simplistic > and has some issues.? In the given example, the reason why 3 is never > returned is not because of the use of the distortion metric, but rather > because the kmeans function never sees the distance for using 3.? As a > matter of fact, the actual code that does the convergence is in vq and py_vq > (vector quantization) and it tries to minimize the sum of squared errors. > kmeans just keeps on retrying the convergence with random guesses to see if > different convergences occur. At least to me, this is pretty much the definition of the k-means algorithm. To be more precise, it is the "standard algorithm" that finds a solution to the k-means optimization problem (to minimize the intra-cluster variance) which doesn't necessarily correspond to the global mimimum (see, for example, http://en.wikipedia.org/wiki/K-means_clustering). I agree that it would be much more natural if the resulting sum of squared distances were returned, since this is the optimization function. > Is Pycluster even maintained anymore?? Maybe we should look into integrating > it into SciPy if it isn't being maintained. As far as I can tell, Pycluster does pretty much the same thing. One improvement that I would suggest is that the kmeans algorithm performs its calculation in a floating point data type if given integer values, which would make it more compatible with np.mean(). At least there should be a warning in the documentation that it doesn't. For example, right now I get the following: In [58]: cluster.vq.kmeans(np.array([1,2]), 1) Out[58]: (array([1]), 0.5) In [59]: cluster.vq.kmeans(np.array([1.,2.]), 1) Out[59]: (array([ 1.5]), 0.5) In [60]: np.mean(np.array([1,2])) Out[60]: 1.5 Best, Lutz From kwgoodman at gmail.com Fri Jul 23 14:33:06 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 11:33:06 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 11:23 AM, Lutz Maibaum wrote: >> Examining further, I see that SciPy's implementation is fairly simplistic >> and has some issues.? In the given example, the reason why 3 is never >> returned is not because of the use of the distortion metric, but rather >> because the kmeans function never sees the distance for using 3.? As a >> matter of fact, the actual code that does the convergence is in vq and py_vq >> (vector quantization) and it tries to minimize the sum of squared errors. >> kmeans just keeps on retrying the convergence with random guesses to see if >> different convergences occur. > > At least to me, this is pretty much the definition of the k-means > algorithm. To be more precise, it is the "standard algorithm" that > finds a solution to the k-means optimization problem (to minimize the > intra-cluster variance) which doesn't necessarily correspond to the > global mimimum (see, for example, > http://en.wikipedia.org/wiki/K-means_clustering). I agree that it > would be much more natural if the resulting sum of squared distances > were returned, since this is the optimization function. 
> >> Is Pycluster even maintained anymore?? Maybe we should look into integrating >> it into SciPy if it isn't being maintained. > > As far as I can tell, Pycluster does pretty much the same thing. > > One improvement that I would suggest is that the kmeans algorithm > performs its calculation in a floating point data type if given > integer values, which would make it more compatible with np.mean(). At > least there should be a warning in the documentation that it doesn't. > For example, right now I get the following: > > In [58]: cluster.vq.kmeans(np.array([1,2]), 1) > Out[58]: (array([1]), 0.5) > > In [59]: cluster.vq.kmeans(np.array([1.,2.]), 1) > Out[59]: (array([ 1.5]), 0.5) > > In [60]: np.mean(np.array([1,2])) > Out[60]: 1.5 Looks like a bug to me. I think it makes sense to fix what is already there and then take the time to look for new implementations. Big projects like new implementations tend not to get done. What needs to be fixed? - Switch code and doc to use rmse - Integer bug - Select initial centroids without replacement From lutz.maibaum at gmail.com Fri Jul 23 14:39:27 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Fri, 23 Jul 2010 11:39:27 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 11:33 AM, Keith Goodman wrote: > What needs to be fixed? > > - Switch code and doc to use rmse To be compatible with the (at least to me!) standard use of k-means, I think both code and doc should use the sum of squared distances as the cost function in the optimization, and also as the return value. -- Lutz From kwgoodman at gmail.com Fri Jul 23 14:54:27 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 11:54:27 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 11:39 AM, Lutz Maibaum wrote: > On Fri, Jul 23, 2010 at 11:33 AM, Keith Goodman wrote: >> What needs to be fixed? >> >> - Switch code and doc to use rmse > > To be compatible with the (at least to me!) standard use of k-means, I > think both code and doc should use the sum of squared distances as the > cost function in the optimization, and also as the return value. What about the thresh (threshold) input parameter? If the sum of squares were used then the user would have to adjust the threshold for the number of data points. From jh at physics.ucf.edu Fri Jul 23 15:01:06 2010 From: jh at physics.ucf.edu (Joe Harrington) Date: Fri, 23 Jul 2010 15:01:06 -0400 Subject: [SciPy-User] reviewers needed for NumPy Message-ID: Hi folks, We are (finally) about to begin reviewing and proofing the NumPy docstrings! This is the final step in producing professional-level docs for NumPy. What we need now are people willing to review docs. There are two types of reviewers: Technical reviewers should be developers or *very* experienced NumPy users. Technical review entails checking the source code (it's available on a click in the doc wiki) and reading the doc to ensure that the signature and description are both correct and complete. Presentation reviewers need to be modestly experienced with NumPy, and should have some experience either in technical writing or as educators. Their job is to make sure the docstring is understandable to the target audience (one level below the expected user of that item), including appropriate examples and references. Review entails reading each page, checking that it meets the review standards, and either approving it or saying how it doesn't meet them. 
All this takes place on the doc wiki, so the mechanics are easy. Please post a message on scipy-dev if you are interested in becoming a reviewer, or if you have questions about reviewing. As a volunteer reviewer, you can put as much or as little time into this as you like. Thanks! --jh-- for the SciPy Documentation Project team From lutz.maibaum at gmail.com Fri Jul 23 15:06:13 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Fri, 23 Jul 2010 12:06:13 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 11:54 AM, Keith Goodman wrote: > On Fri, Jul 23, 2010 at 11:39 AM, Lutz Maibaum wrote: >> To be compatible with the (at least to me!) standard use of k-means, I >> think both code and doc should use the sum of squared distances as the >> cost function in the optimization, and also as the return value. > > What about the thresh (threshold) input parameter? If the sum of > squares were used then the user would have to adjust the threshold for > the number of data points. That's true, but personally I don't find that much of a problem. Using an absolute threshold one needs to have some intuition about the magnitude of the cost function based on the type and amount of data. Alternatively, one could use a relative improvement as the convergence criterion, for example (something like "if (old_cost-new_cost)/old_cost < threshhold then converged"), which may be suitable for a larger variety of clustering problems. -- Lutz From dagss at student.matnat.uio.no Fri Jul 23 15:09:28 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 23 Jul 2010 21:09:28 +0200 Subject: [SciPy-User] Porting code from IDL to Python - 'Common block' equivalent? In-Reply-To: <4C49C62C.3000902@noaa.gov> References: <4C49C62C.3000902@noaa.gov> Message-ID: <4C49E8E8.6050103@student.matnat.uio.no> On 07/23/2010 06:41 PM, Christopher Barker wrote: > David Andrews wrote: > >> Now it's just a matter of converting ~13k lines of IDL into python :D >> > A little piece of advice here -- I would resist the urge to "convert" > your code line by line, or even function by function. > > I'm not familiar with IDL, but I do know that you will be much happier > in the long run with a code structure that is natural in Python. IIUC, > IDL is an array-oriented language, so that part will hopefully be a > natural transition, but from the sounds of it the overall code structure > may need to be quite different. > > So, think about the project as a whole, and how it can best be > structured in Python, and start from there. > > Oh, and write unit-tests from the beginning! > If pyIDL lives up to what it says, I'd resist the temptation to rewrite *anything* up front, just write new stuff in Python, and occasionally call routines in IDL. Then bring over piece by piece when it feels natural (and one knows what is "wrong" by keeping it in IDL and how it would be better in Python); always making sure the whole thing works as before. (I never used pyIDL, but I feel it should be something to investigate.) http://www.cacr.caltech.edu/~mmckerns/software.html Dag Sverre From ben.root at ou.edu Fri Jul 23 15:40:58 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 14:40:58 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 2:06 PM, Lutz Maibaum wrote: > On Fri, Jul 23, 2010 at 11:54 AM, Keith Goodman > wrote: > > On Fri, Jul 23, 2010 at 11:39 AM, Lutz Maibaum > wrote: > >> To be compatible with the (at least to me!) 
standard use of k-means, I > >> think both code and doc should use the sum of squared distances as the > >> cost function in the optimization, and also as the return value. > > > > What about the thresh (threshold) input parameter? If the sum of > > squares were used then the user would have to adjust the threshold for > > the number of data points. > > That's true, but personally I don't find that much of a problem. Using > an absolute threshold one needs to have some intuition about the > magnitude of the cost function based on the type and amount of data. > Alternatively, one could use a relative improvement as the convergence > criterion, for example (something like "if > (old_cost-new_cost)/old_cost < threshhold then converged"), which may > be suitable for a larger variety of clustering problems. > > -- Lutz > However, we wouldn't want to change the characteristic behavior of kmeans... yet. Personally, I never liked using tolerances and thresholds for stopping conditions, which is why I like the C Clustering library's approach of iterating until there are no more reassignments (or max iterations). Although, I can't remember how it handles the edge case of assignments getting passed back and forth between members. Just to be clear, the C Clustering library's implementation of kmeans is entirely different from SciPy's implementation. While I am certainly no expert in determining which approach is better than another, I can say that I have used it before and it has worked very nicely for me and my uses. Ben Root P.S. - As a complete side-note, while I am in this nostalgic fervor, a particularly clever use of kmeans/kmedians that I came up with was to 'snap' similar grids to a common grid without requiring one to predefine that grid. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jul 23 15:53:13 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 12:53:13 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 12:40 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 2:06 PM, Lutz Maibaum > wrote: >> >> On Fri, Jul 23, 2010 at 11:54 AM, Keith Goodman >> wrote: >> > On Fri, Jul 23, 2010 at 11:39 AM, Lutz Maibaum >> > wrote: >> >> To be compatible with the (at least to me!) standard use of k-means, I >> >> think both code and doc should use the sum of squared distances as the >> >> cost function in the optimization, and also as the return value. >> > >> > What about the thresh (threshold) input parameter? If the sum of >> > squares were used then the user would have to adjust the threshold for >> > the number of data points. >> >> That's true, but personally I don't find that much of a problem. Using >> an absolute threshold one needs to have some intuition about the >> magnitude of the cost function based on the type and amount of data. >> Alternatively, one could use a relative improvement as the convergence >> criterion, for example (something like "if >> (old_cost-new_cost)/old_cost < threshhold then converged"), which may >> be suitable for a larger variety of clustering problems. >> >> ?-- Lutz > > However, we wouldn't want to change the characteristic behavior of kmeans... > yet. That's a good point. Are all these considered "bugs"? 
- Switch code and doc to use rmse - Integer bug - Select initial centroids without replacement > Personally, I never liked using tolerances and thresholds for stopping > conditions, > which is why I like the C Clustering library's approach of iterating until > there are > no more reassignments (or max iterations).? Although, I can't remember how > it > handles the edge case of assignments getting passed back and forth between > members. > > Just to be clear, the C Clustering library's implementation of kmeans is > entirely > different from SciPy's implementation.? While I am certainly no expert in > determining > which approach is better than another, I can say that I have used it before > and it has > worked very nicely for me and my uses. > > Ben Root > > P.S. - As a complete side-note, while I am in this nostalgic fervor, a > particularly clever use > of kmeans/kmedians that I came up with was to 'snap' similar grids to a > common grid without requiring > one to predefine that grid. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ben.root at ou.edu Fri Jul 23 16:01:28 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 15:01:28 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 2:53 PM, Keith Goodman wrote: > On Fri, Jul 23, 2010 at 12:40 PM, Benjamin Root wrote: > > On Fri, Jul 23, 2010 at 2:06 PM, Lutz Maibaum > > wrote: > >> > >> On Fri, Jul 23, 2010 at 11:54 AM, Keith Goodman > >> wrote: > >> > On Fri, Jul 23, 2010 at 11:39 AM, Lutz Maibaum < > lutz.maibaum at gmail.com> > >> > wrote: > >> >> To be compatible with the (at least to me!) standard use of k-means, > I > >> >> think both code and doc should use the sum of squared distances as > the > >> >> cost function in the optimization, and also as the return value. > >> > > >> > What about the thresh (threshold) input parameter? If the sum of > >> > squares were used then the user would have to adjust the threshold for > >> > the number of data points. > >> > >> That's true, but personally I don't find that much of a problem. Using > >> an absolute threshold one needs to have some intuition about the > >> magnitude of the cost function based on the type and amount of data. > >> Alternatively, one could use a relative improvement as the convergence > >> criterion, for example (something like "if > >> (old_cost-new_cost)/old_cost < threshhold then converged"), which may > >> be suitable for a larger variety of clustering problems. > >> > >> -- Lutz > > > > However, we wouldn't want to change the characteristic behavior of > kmeans... > > yet. > > That's a good point. Are all these considered "bugs"? > > - Switch code and doc to use rmse > - Integer bug > - Select initial centroids without replacement > My vote is yes, although I am not 100% convinced that the integer bug should be changed because it may cause breakage with those who have been depending on integer output. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
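In the meantime, a simple workaround on the user side is to cast to float before clustering (illustrative only):

>>> import numpy as np
>>> from scipy import cluster
>>> cluster.vq.kmeans(np.asarray([1, 2], dtype=float), 1)
(array([ 1.5]), 0.5)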
URL: From kwgoodman at gmail.com Fri Jul 23 16:12:06 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 13:12:06 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 1:01 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 2:53 PM, Keith Goodman wrote: >> >> On Fri, Jul 23, 2010 at 12:40 PM, Benjamin Root wrote: >> > On Fri, Jul 23, 2010 at 2:06 PM, Lutz Maibaum >> > wrote: >> >> >> >> On Fri, Jul 23, 2010 at 11:54 AM, Keith Goodman >> >> wrote: >> >> > On Fri, Jul 23, 2010 at 11:39 AM, Lutz Maibaum >> >> > >> >> > wrote: >> >> >> To be compatible with the (at least to me!) standard use of k-means, >> >> >> I >> >> >> think both code and doc should use the sum of squared distances as >> >> >> the >> >> >> cost function in the optimization, and also as the return value. >> >> > >> >> > What about the thresh (threshold) input parameter? If the sum of >> >> > squares were used then the user would have to adjust the threshold >> >> > for >> >> > the number of data points. >> >> >> >> That's true, but personally I don't find that much of a problem. Using >> >> an absolute threshold one needs to have some intuition about the >> >> magnitude of the cost function based on the type and amount of data. >> >> Alternatively, one could use a relative improvement as the convergence >> >> criterion, for example (something like "if >> >> (old_cost-new_cost)/old_cost < threshhold then converged"), which may >> >> be suitable for a larger variety of clustering problems. >> >> >> >> ?-- Lutz >> > >> > However, we wouldn't want to change the characteristic behavior of >> > kmeans... >> > yet. >> >> That's a good point. Are all these considered "bugs"? >> >> - Switch code and doc to use rmse >> - Integer bug >> - Select initial centroids without replacement > > My vote is yes, although I am not 100% convinced that the integer bug should > be changed because it may cause breakage with those who have been depending > on integer output. Maybe just make a ticket for now for the integer problem? Lutz, do you want to make the ticket? It would be nice to find a simple problem that gives the wrong centroids due to the sum of dist bug. We could use that for a unit test of the fix. The example given earlier in the thread returns the right centroid. I guess we need a ticket for this one too. From lutz.maibaum at gmail.com Fri Jul 23 16:18:58 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Fri, 23 Jul 2010 13:18:58 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Jul 23, 2010, at 12:40 PM, Benjamin Root wrote > > Just to be clear, the C Clustering library's implementation of kmeans is entirely > different from SciPy's implementation. While I am certainly no expert in determining > which approach is better than another, I can say that I have used it before and it has > worked very nicely for me and my uses. I am not sure the implementations are so different (possible bugs not withstanding ;). At implementation in the C clustering library does the following: 1. Start with an initial guess of the cluster assignments 2. Compute means for each cluster 3. Assign each data point to the nearest cluster mean. 4. If the cost function did not decrease or the maximum number of iterations has been reached => exit 5. Go to 2. This algorithm finds a local mimimum, and can be repeated a number of times with different initial clusterings to select from a number of locally optimal solutions. 
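A rough Python sketch of that iteration, just to make the steps above concrete (illustrative only: it assumes a 2-D float array of observations, one row per observation, does not guard against empty clusters or the cycles mentioned below, and is not the scipy or Pycluster code):

    import numpy as np

    def em_style_kmeans(obs, k, max_iter=100):
        # step 1: random initial assignment of each observation to one of k clusters
        labels = np.random.randint(0, k, len(obs))
        prev_cost = np.inf
        for _ in range(max_iter):
            # step 2: centroid of each cluster (assumes no cluster ends up empty)
            centroids = np.array([obs[labels == j].mean(axis=0) for j in range(k)])
            # step 3: reassign each observation to its nearest centroid
            d2 = ((obs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            cost = d2[np.arange(len(obs)), labels].sum()
            # step 4: stop when the total intra-cluster variance no longer decreases
            if cost >= prev_cost:
                break
            prev_cost = cost
        return centroids, labels, cost

Repeating it with a few different random initial assignments and keeping the result with the lowest cost is the usual way to reduce the chance of landing in a poor local minimum.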
I am less familiar with the k-means implementation in scipy, but at first glance it seems pretty similar. However, the implementation in the C clustering library is more robust in that it detects cycles in the iteration process, and it makes sure that each cluster contains at least one data point. Lutz From ben.root at ou.edu Fri Jul 23 16:29:57 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 15:29:57 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 3:18 PM, Lutz Maibaum wrote: > On Jul 23, 2010, at 12:40 PM, Benjamin Root wrote > > > > Just to be clear, the C Clustering library's implementation of kmeans is > entirely > > different from SciPy's implementation. While I am certainly no expert in > determining > > which approach is better than another, I can say that I have used it > before and it has > > worked very nicely for me and my uses. > > I am not sure the implementations are so different (possible bugs not > withstanding ;). At implementation in the C clustering library does the > following: > > 1. Start with an initial guess of the cluster assignments > 2. Compute means for each cluster > 3. Assign each data point to the nearest cluster mean. > 4. If the cost function did not decrease or the maximum number of > iterations has been reached => exit > 5. Go to 2. > > >From the C Clustering Library's documentation: The expectation-maximization (EM) algorithm is commonly used to find the > partitioning > into k groups. The first step in the EM algorithm is to create k clusters > and randomly assign > items (genes or microarrays) to them. We then iterate: > ? Calculate the centroid of each cluster; > ? For each item, determine which cluster centroid is closest; > ? Reassign the item to that cluster. > The iteration is stopped if no further item reassignments take place. > The C Clustering Library makes an initial guess of the assignments and calculates the medians of the assignments. SciPy's kmeans makes an initial guess of the centroids and assigns the obs to the different centroid guesses. It is a subtle difference, but it does result in different ways to solve the problem. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.maibaum at gmail.com Fri Jul 23 17:18:07 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Fri, 23 Jul 2010 14:18:07 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Jul 23, 2010, at 1:12 PM, Keith Goodman wrote: > On Fri, Jul 23, 2010 at 1:01 PM, Benjamin Root wrote: >> On Fri, Jul 23, 2010 at 2:53 PM, Keith Goodman wrote: >>> That's a good point. Are all these considered "bugs"? >>> >>> - Switch code and doc to use rmse >>> - Integer bug >>> - Select initial centroids without replacement >> >> My vote is yes, although I am not 100% convinced that the integer bug should >> be changed because it may cause breakage with those who have been depending >> on integer output. > > Maybe just make a ticket for now for the integer problem? Lutz, do you > want to make the ticket? I have opened a ticket (#1246). An easy and safe fix would be to simply add a statement to the docstring that warns the user about clustering integer data. > It would be nice to find a simple problem that gives the wrong > centroids due to the sum of dist bug. We could use that for a unit > test of the fix. The example given earlier in the thread returns the > right centroid. I guess we need a ticket for this one too. 
Actually, it not entirely clear to me anymore what the bug is. According to the k-means Wikipedia page, the objective function that the algorithm tries to minimize is the total intra-cluster variance (the sum of squares of distances of data points from cluster centroids). However, the two steps of the iteration (assignment to centroids, and centroid update) use regular distances and means. Is this not what the current code is doing? In the Wikipedia description, the iteration proceeds until no more changes are made. The SciPy implementation has an additional convergence criterion. I would have thought that the change in the sum of squared distances would be a better choice than the change in the sum of distances, but since this convergence criterion is not part of any other implementation of kmeans this may incorrect. One issue I see is that the documentation mentions that k-means tries to minimize distorition, defined as the sum of distances, which (at least according to the Wikipedia page) is not correct, because it tries to minimize the sum of squared distances. -- Lutz From ben.root at ou.edu Fri Jul 23 17:55:20 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 16:55:20 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 4:18 PM, Lutz Maibaum wrote: > On Jul 23, 2010, at 1:12 PM, Keith Goodman wrote: > > On Fri, Jul 23, 2010 at 1:01 PM, Benjamin Root wrote: > >> On Fri, Jul 23, 2010 at 2:53 PM, Keith Goodman > wrote: > >>> That's a good point. Are all these considered "bugs"? > >>> > >>> - Switch code and doc to use rmse > >>> - Integer bug > >>> - Select initial centroids without replacement > >> > >> My vote is yes, although I am not 100% convinced that the integer bug > should > >> be changed because it may cause breakage with those who have been > depending > >> on integer output. > > > > Maybe just make a ticket for now for the integer problem? Lutz, do you > > want to make the ticket? > > I have opened a ticket (#1246). An easy and safe fix would be to simply add > a statement to the docstring that warns the user about clustering integer > data. > > > It would be nice to find a simple problem that gives the wrong > > centroids due to the sum of dist bug. We could use that for a unit > > test of the fix. The example given earlier in the thread returns the > > right centroid. I guess we need a ticket for this one too. > > > Actually, it not entirely clear to me anymore what the bug is. According to > the k-means Wikipedia page, the objective function that the algorithm tries > to minimize is the total intra-cluster variance (the sum of squares of > distances of data points from cluster centroids). However, the two steps of > the iteration (assignment to centroids, and centroid update) use regular > distances and means. Is this not what the current code is doing? > > Which is why I have been saying that there is no bug here because the code is technically correct. A mean of regular distances is a sum of squared distances that has been divided. The only reason why the current code is not returning the correct answer for the given example is that it never tries 3 as a centroid value. This is a different issue. [snip] One issue I see is that the documentation mentions that k-means tries to > minimize distorition, defined as the sum of distances, which (at least > according to the Wikipedia page) is not correct, because it tries to > minimize the sum of squared distances. > > Exactly... the documentation is wrong. 
Kmeans works to minimize the sum of squared distances. How you define distances is up to you, so long as it satisfies the triangle inequality. And, as far as I can see, the code is doing exactly this using euclidean distance. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jul 23 18:15:59 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 15:15:59 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 2:55 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 4:18 PM, Lutz Maibaum > wrote: >> >> On Jul 23, 2010, at 1:12 PM, Keith Goodman wrote: >> > On Fri, Jul 23, 2010 at 1:01 PM, Benjamin Root wrote: >> >> On Fri, Jul 23, 2010 at 2:53 PM, Keith Goodman >> >> wrote: >> >>> That's a good point. Are all these considered "bugs"? >> >>> >> >>> - Switch code and doc to use rmse >> >>> - Integer bug >> >>> - Select initial centroids without replacement >> >> >> >> My vote is yes, although I am not 100% convinced that the integer bug >> >> should >> >> be changed because it may cause breakage with those who have been >> >> depending >> >> on integer output. >> > >> > Maybe just make a ticket for now for the integer problem? Lutz, do you >> > want to make the ticket? >> >> I have opened a ticket (#1246). An easy and safe fix would be to simply >> add a statement to the docstring that warns the user about clustering >> integer data. >> >> > It would be nice to find a simple problem that gives the wrong >> > centroids due to the sum of dist bug. We could use that for a unit >> > test of the fix. The example given earlier in the thread returns the >> > right centroid. I guess we need a ticket for this one too. >> >> >> Actually, it not entirely clear to me anymore what the bug is. According >> to the k-means Wikipedia page, the objective function that the algorithm >> tries to minimize is the total intra-cluster variance (the sum of squares of >> distances of data points from cluster centroids). However, the two steps of >> the iteration (assignment to centroids, and centroid update) use regular >> distances and means. Is this not what the current code is doing? >> > > Which is why I have been saying that there is no bug here because the code > is technically correct.? A mean of regular distances is a sum of squared > distances that has been divided.? The only reason why the current code is > not returning the correct answer for the given example is that it never > tries 3 as a centroid value.? This is a different issue. > > [snip] > >> One issue I see is that the documentation mentions that k-means tries to >> minimize distorition, defined as the sum of distances, which (at least >> according to the Wikipedia page) is not correct, because it tries to >> minimize the sum of squared distances. >> > > Exactly... the documentation is wrong.? Kmeans works to minimize the sum of > squared distances.? How you define distances is up to you, so long as it > satisfies the triangle inequality.? And, as far as I can see, the code is > doing exactly this using euclidean distance. My understanding is that the problem is the stopping condition. Each iteration lowers the sum of squares but iteration stops when the mean non-squared distance is below a threshold. 
From lutz.maibaum at gmail.com Fri Jul 23 18:27:33 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Fri, 23 Jul 2010 15:27:33 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Jul 23, 2010, at 2:55 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 4:18 PM, Lutz Maibaum wrote: >> Actually, it not entirely clear to me anymore what the bug is. According to the k-means Wikipedia page, the objective function that the algorithm tries to minimize is the total intra-cluster variance (the sum of squares of distances of data points from cluster centroids). However, the two steps of the iteration (assignment to centroids, and centroid update) use regular distances and means. Is this not what the current code is doing? > > Which is why I have been saying that there is no bug here because the code is technically correct. A mean of regular distances is a sum of squared distances that has been divided. The only reason why the current code is not returning the correct answer for the given example is that it never tries 3 as a centroid value. This is a different issue. I apologize if I am being obtuse, but why do you think the current code does not return the correct answer? >>> import numpy as np >>> from scipy import cluster >>> v = np.array([1,2,3,4,10],dtype=float) >>> cluster.vq.kmeans(v, 1) (array([ 4.]), 2.3999999999999999) >>> np.sum([abs(x-4)**2 for x in v]) 50.0 >>> np.sum([abs(x-3)**2 for x in v]) 55.0 The centroid 4 minimizes the sum of squared distances, which is what kmeans is supposed to find. Best, Lutz From ben.root at ou.edu Fri Jul 23 18:53:55 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 17:53:55 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 5:27 PM, Lutz Maibaum wrote: > On Jul 23, 2010, at 2:55 PM, Benjamin Root wrote: > > On Fri, Jul 23, 2010 at 4:18 PM, Lutz Maibaum > wrote: > >> Actually, it not entirely clear to me anymore what the bug is. According > to the k-means Wikipedia page, the objective function that the algorithm > tries to minimize is the total intra-cluster variance (the sum of squares of > distances of data points from cluster centroids). However, the two steps of > the iteration (assignment to centroids, and centroid update) use regular > distances and means. Is this not what the current code is doing? > > > > Which is why I have been saying that there is no bug here because the > code is technically correct. A mean of regular distances is a sum of > squared distances that has been divided. The only reason why the current > code is not returning the correct answer for the given example is that it > never tries 3 as a centroid value. This is a different issue. > > I apologize if I am being obtuse, but why do you think the current code > does not return the correct answer? > > >>> import numpy as np > >>> from scipy import cluster > >>> v = np.array([1,2,3,4,10],dtype=float) > >>> cluster.vq.kmeans(v, 1) > (array([ 4.]), 2.3999999999999999) > >>> np.sum([abs(x-4)**2 for x in v]) > 50.0 > >>> np.sum([abs(x-3)**2 for x in v]) > 55.0 > > The centroid 4 minimizes the sum of squared distances, which is what kmeans > is supposed to find. > > Best, > > Lutz > > Right, sorry, I forgot that we already figured that out. So, there is no bug in this respect. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Fri Jul 23 19:00:21 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 18:00:21 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 5:15 PM, Keith Goodman wrote: > On Fri, Jul 23, 2010 at 2:55 PM, Benjamin Root wrote: > > On Fri, Jul 23, 2010 at 4:18 PM, Lutz Maibaum > > wrote: > >> > >> On Jul 23, 2010, at 1:12 PM, Keith Goodman wrote: > >> > On Fri, Jul 23, 2010 at 1:01 PM, Benjamin Root > wrote: > >> >> On Fri, Jul 23, 2010 at 2:53 PM, Keith Goodman > >> >> wrote: > >> >>> That's a good point. Are all these considered "bugs"? > >> >>> > >> >>> - Switch code and doc to use rmse > >> >>> - Integer bug > >> >>> - Select initial centroids without replacement > >> >> > >> >> My vote is yes, although I am not 100% convinced that the integer bug > >> >> should > >> >> be changed because it may cause breakage with those who have been > >> >> depending > >> >> on integer output. > >> > > >> > Maybe just make a ticket for now for the integer problem? Lutz, do you > >> > want to make the ticket? > >> > >> I have opened a ticket (#1246). An easy and safe fix would be to simply > >> add a statement to the docstring that warns the user about clustering > >> integer data. > >> > >> > It would be nice to find a simple problem that gives the wrong > >> > centroids due to the sum of dist bug. We could use that for a unit > >> > test of the fix. The example given earlier in the thread returns the > >> > right centroid. I guess we need a ticket for this one too. > >> > >> > >> Actually, it not entirely clear to me anymore what the bug is. According > >> to the k-means Wikipedia page, the objective function that the algorithm > >> tries to minimize is the total intra-cluster variance (the sum of > squares of > >> distances of data points from cluster centroids). However, the two steps > of > >> the iteration (assignment to centroids, and centroid update) use regular > >> distances and means. Is this not what the current code is doing? > >> > > > > Which is why I have been saying that there is no bug here because the > code > > is technically correct. A mean of regular distances is a sum of squared > > distances that has been divided. The only reason why the current code is > > not returning the correct answer for the given example is that it never > > tries 3 as a centroid value. This is a different issue. > > > > [snip] > > > >> One issue I see is that the documentation mentions that k-means tries to > >> minimize distorition, defined as the sum of distances, which (at least > >> according to the Wikipedia page) is not correct, because it tries to > >> minimize the sum of squared distances. > >> > > > > Exactly... the documentation is wrong. Kmeans works to minimize the sum > of > > squared distances. How you define distances is up to you, so long as it > > satisfies the triangle inequality. And, as far as I can see, the code is > > doing exactly this using euclidean distance. > > My understanding is that the problem is the stopping condition. Each > iteration lowers the sum of squares but iteration stops when the mean > non-squared distance is below a threshold. > The stopping condition uses the change in the distortion, not a non-squared distance. The distortion is already a sum of squares. The only place that a non-squared distance is used is in _py_vq_1d() which appears to be very old code and it has a raise error at the very first statement. 
I have gone ahead and started to modify the documentation for vq and vq.kmeans in order to correct the mistakes and clarify the document. This is being done in the Doc Editor, as it should, so feel free to comment on my changes. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jul 23 19:48:52 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 16:48:52 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root wrote: > The stopping condition uses the change in the distortion, not a non-squared > distance.? The distortion is already a sum of squares.? The only place that > a non-squared distance is used is in _py_vq_1d() which appears to be very > old code and it has a raise error at the very first statement. That's good news. Another place that a non-squared distance is used is the return value: >> import numpy as np >> from scipy import cluster >> v = np.array([1,2,3,4,10],dtype=float) >> cluster.vq.kmeans(v, 1) (array([ 4.]), 2.3999999999999999) >> np.sqrt(np.dot(v-4, v-4) / 5.0) 3.1622776601683795 # Nope, not returned >> np.absolute(v - 4).mean() 2.3999999999999999 # Yep, this one is returned Is that a code bug or a doc bug? From ben.root at ou.edu Fri Jul 23 20:46:46 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 19:46:46 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 6:48 PM, Keith Goodman wrote: > On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root wrote: > > > The stopping condition uses the change in the distortion, not a > non-squared > > distance. The distortion is already a sum of squares. The only place > that > > a non-squared distance is used is in _py_vq_1d() which appears to be very > > old code and it has a raise error at the very first statement. > > That's good news. > > Another place that a non-squared distance is used is the return value: > > >> import numpy as np > >> from scipy import cluster > >> v = np.array([1,2,3,4,10],dtype=float) > >> cluster.vq.kmeans(v, 1) > (array([ 4.]), 2.3999999999999999) > > >> np.sqrt(np.dot(v-4, v-4) / 5.0) > 3.1622776601683795 # Nope, not returned > >> np.absolute(v - 4).mean() > 2.3999999999999999 # Yep, this one is returned > > Is that a code bug or a doc bug? > Well, see, that's just the thing... the doc says that it returns the distortion, which is what it does, but obviously, this distortion was a MAE and not a RMSE. The problem is that I have gone backwards and forwards over the codes, including the Cython version, and I can't find anyplace where this is happening. Does anybody know of any good code tracing tools? I used trace once, but it wasn't very user-friendly... Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Jul 23 20:53:33 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 23 Jul 2010 17:53:33 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 5:46 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 6:48 PM, Keith Goodman wrote: >> >> On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root wrote: >> >> > The stopping condition uses the change in the distortion, not a >> > non-squared >> > distance.? The distortion is already a sum of squares.? 
The only place >> > that >> > a non-squared distance is used is in _py_vq_1d() which appears to be >> > very >> > old code and it has a raise error at the very first statement. >> >> That's good news. >> >> Another place that a non-squared distance is used is the return value: >> >> >> import numpy as np >> >> from scipy import cluster >> >> v = np.array([1,2,3,4,10],dtype=float) >> >> cluster.vq.kmeans(v, 1) >> ? (array([ 4.]), 2.3999999999999999) >> >> >> np.sqrt(np.dot(v-4, v-4) / 5.0) >> ? 3.1622776601683795 ?# Nope, not returned >> >> np.absolute(v - 4).mean() >> ? 2.3999999999999999 # Yep, this one is returned >> >> Is that a code bug or a doc bug? > > Well, see, that's just the thing... the doc says that it returns the > distortion, which is what it does, but obviously, this distortion was a MAE > and not a RMSE.? The problem is that I have gone backwards and forwards over > the codes, including the Cython version, and I can't find anyplace where > this is happening. > > Does anybody know of any good code tracing tools?? I used trace once, but it > wasn't very user-friendly... I think I see it! Yes, the squared distance is calculated. But before it is summed or meaned, the square root is taken. That turns the squared distance into just distance. From ben.root at ou.edu Fri Jul 23 21:56:53 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 20:56:53 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 7:53 PM, Keith Goodman wrote: > On Fri, Jul 23, 2010 at 5:46 PM, Benjamin Root wrote: > > On Fri, Jul 23, 2010 at 6:48 PM, Keith Goodman > wrote: > >> > >> On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root wrote: > >> > >> > The stopping condition uses the change in the distortion, not a > >> > non-squared > >> > distance. The distortion is already a sum of squares. The only place > >> > that > >> > a non-squared distance is used is in _py_vq_1d() which appears to be > >> > very > >> > old code and it has a raise error at the very first statement. > >> > >> That's good news. > >> > >> Another place that a non-squared distance is used is the return value: > >> > >> >> import numpy as np > >> >> from scipy import cluster > >> >> v = np.array([1,2,3,4,10],dtype=float) > >> >> cluster.vq.kmeans(v, 1) > >> (array([ 4.]), 2.3999999999999999) > >> > >> >> np.sqrt(np.dot(v-4, v-4) / 5.0) > >> 3.1622776601683795 # Nope, not returned > >> >> np.absolute(v - 4).mean() > >> 2.3999999999999999 # Yep, this one is returned > >> > >> Is that a code bug or a doc bug? > > > > Well, see, that's just the thing... the doc says that it returns the > > distortion, which is what it does, but obviously, this distortion was a > MAE > > and not a RMSE. The problem is that I have gone backwards and forwards > over > > the codes, including the Cython version, and I can't find anyplace where > > this is happening. > > > > Does anybody know of any good code tracing tools? I used trace once, but > it > > wasn't very user-friendly... > > I think I see it! Yes, the squared distance is calculated. But before > it is summed or meaned, the square root is taken. That turns the > squared distance into just distance. > Are you talking about the sqrt in py_vq()? That doesn't get called in the given example... however, you are right that the list of distances that is being returned are being square-rooted before the return. It is happening in the C code, though, and I just don't know where... 
Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Jul 23 22:24:22 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 21:24:22 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 8:56 PM, Benjamin Root wrote: > On Fri, Jul 23, 2010 at 7:53 PM, Keith Goodman wrote: > >> On Fri, Jul 23, 2010 at 5:46 PM, Benjamin Root wrote: >> > On Fri, Jul 23, 2010 at 6:48 PM, Keith Goodman >> wrote: >> >> >> >> On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root >> wrote: >> >> >> >> > The stopping condition uses the change in the distortion, not a >> >> > non-squared >> >> > distance. The distortion is already a sum of squares. The only >> place >> >> > that >> >> > a non-squared distance is used is in _py_vq_1d() which appears to be >> >> > very >> >> > old code and it has a raise error at the very first statement. >> >> >> >> That's good news. >> >> >> >> Another place that a non-squared distance is used is the return value: >> >> >> >> >> import numpy as np >> >> >> from scipy import cluster >> >> >> v = np.array([1,2,3,4,10],dtype=float) >> >> >> cluster.vq.kmeans(v, 1) >> >> (array([ 4.]), 2.3999999999999999) >> >> >> >> >> np.sqrt(np.dot(v-4, v-4) / 5.0) >> >> 3.1622776601683795 # Nope, not returned >> >> >> np.absolute(v - 4).mean() >> >> 2.3999999999999999 # Yep, this one is returned >> >> >> >> Is that a code bug or a doc bug? >> > >> > Well, see, that's just the thing... the doc says that it returns the >> > distortion, which is what it does, but obviously, this distortion was a >> MAE >> > and not a RMSE. The problem is that I have gone backwards and forwards >> over >> > the codes, including the Cython version, and I can't find anyplace where >> > this is happening. >> > >> > Does anybody know of any good code tracing tools? I used trace once, >> but it >> > wasn't very user-friendly... >> >> I think I see it! Yes, the squared distance is calculated. But before >> it is summed or meaned, the square root is taken. That turns the >> squared distance into just distance. >> > > Are you talking about the sqrt in py_vq()? That doesn't get called in the > given example... however, you are right that the list of distances that is > being returned are being square-rooted before the return. It is happening > in the C code, though, and I just don't know where... > > Actually, I think I see it now. in src/vq.c, you have the function double_vq_obs() which finds out which centroid an obs should be assigned to and it calculates a euclidean distance, as it should, and returns the smallest distance and the centroid it matched best with. This info is passed to double_tvq(), which does this for each observation. double_tvq() is called by compute_vq() in src/vq_module.c, which is the function called by _vq.vq() in vq.py... That array of distances is what gets passed into the mean() call in _kmeans(). Therefore, either we need to square the returned value, or remove all of the square roots elsewhere (making sure we put a square root when we are done, of course...). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kwgoodman at gmail.com Sat Jul 24 13:36:17 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 24 Jul 2010 10:36:17 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: _kmeans chokes on large thresholds: >> from scipy import cluster >> v = np.array([1,2,3,4,10], dtype=float) >> cluster.vq.kmeans(v, 1, thresh=1e15) (array([ 4.]), 2.3999999999999999) >> cluster.vq.kmeans(v, 1, thresh=1e16) IndexError: list index out of range The problem is in these lines: diff = thresh+1. while diff > thresh: if(diff > thresh): If thresh is large then (thresh + 1) > thresh is False: >> thresh = 1e16 >> diff = thresh + 1.0 >> diff > thresh False What's a use case for a large threshold? You might want to study the algorithm by seeing the result after one iteration (not to be confused with the iter input which is something else). One fix is to use 2*thresh instead for thresh + 1. But that just pushes the problem out to higher thresholds: >> thresh = 1e16 >> diff = 2 * thresh >> diff > thresh True >> thresh = 1e400 >> diff = 2 * thresh >> diff > thresh False A better fix is to replace: if dist > thresh with if (dist > thresh) or (count = 0) or if (dist > thresh) or firstflag Ticket: http://projects.scipy.org/scipy/ticket/1247 From newton at tethers.com Sat Jul 24 13:37:33 2010 From: newton at tethers.com (Tyrel Newton) Date: Sat, 24 Jul 2010 10:37:33 -0700 Subject: [SciPy-User] memory errors when using savemat Message-ID: <1622F899-A270-431B-89B6-A31C118CD71B@tethers.com> I'm trying to use scipy.io.savemat to export a very large set of data to a .mat file. The dataset contains around 20 million floats. When I try to export this to a .mat file, I get a MemoryError. The specific MemoryError is: File "C:\Python26\lib\site-packages\scipy\io\matlab\miobase.py", line 557 in write_bytes self.file_stream.write(arr.tostring(order='F')) I'm running this on Windows under Python 2.6. Does anybody know of a way to deal with this type memory error? Either increasing python's available memory or telling scipy to break apart the export into chunks . . . Thanks in advance for any suggestions. Tyrel From cournape at gmail.com Sat Jul 24 16:00:46 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 25 Jul 2010 05:00:46 +0900 Subject: [SciPy-User] memory errors when using savemat In-Reply-To: <1622F899-A270-431B-89B6-A31C118CD71B@tethers.com> References: <1622F899-A270-431B-89B6-A31C118CD71B@tethers.com> Message-ID: On Sun, Jul 25, 2010 at 2:37 AM, Tyrel Newton wrote: > I'm trying to use scipy.io.savemat to export a very large set of data to a .mat file. The dataset contains around 20 million floats. When I try to export this to a .mat file, I get a MemoryError. The specific MemoryError is: > > File "C:\Python26\lib\site-packages\scipy\io\matlab\miobase.py", line 557 in write_bytes > ? ? ? ?self.file_stream.write(arr.tostring(order='F')) > > I'm running this on Windows under Python 2.6. Could you give us a small script which reproduces the error ? 20 million is pretty small, I suspect something else or a bug in scipy, cheers, David From galpin at gmail.com Sat Jul 24 17:14:55 2010 From: galpin at gmail.com (Martin Galpin) Date: Sat, 24 Jul 2010 22:14:55 +0100 Subject: [SciPy-User] numpy.append and persisting original datatype Message-ID: Hello, Given the following example: import numpy as np foo = np.array([], dtype=np.float32) print a.dtype >> float32 foo = np.append(foo, 1) print foo.dtype >> float64 Is this the correct behaviour? 
I realise that numpy.append() returns a new copy of foo but is it the correct behaviour that the original datatype is not persisted? If so, should this not be noted in the documentation? Forgive me if I have missed something. Best wishes -- Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdrude at me.com Sat Jul 24 18:12:45 2010 From: gdrude at me.com (JerryRude) Date: Sat, 24 Jul 2010 15:12:45 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] The scipy.test('1', '10') hanging on "make sure it handles relative values... ok" Message-ID: <29239203.post@talk.nabble.com> Thank you for taking the time to read this. After 2 hours of google I have not found a similar problem. I have installed EPD for the purpose of using scipy and matplotlib. When I ran the scipy.test('1','10') to test the install the function hangs after the test "make sure it handles relative values... ok". If someone happens to have some advice on what may be going on I would appreciate it. I am running this install on a Mac OSX Leopard 10.6.4 on a brand new 13" macbook pro. gfortran --version GNU Fortran (GCC) 4.4.1 Copyright (C) 2009 Free Software Foundation, Inc. gcc --version i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659) EPD Version 6.2-2 python -V Python 2.6.5 -- EPD 6.2-2 (32-bit) Cheers, Jerry -- View this message in context: http://old.nabble.com/The-scipy.test%28%271%27%2C%2710%27%29-hanging-on-%22make-sure-it-handles-relative-values...-ok%22-tp29239203p29239203.html Sent from the Scipy-User mailing list archive at Nabble.com. From ben.root at ou.edu Sat Jul 24 22:08:09 2010 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 24 Jul 2010 21:08:09 -0500 Subject: [SciPy-User] numpy.append and persisting original datatype In-Reply-To: References: Message-ID: On Sat, Jul 24, 2010 at 4:14 PM, Martin Galpin wrote: > Hello, > > Given the following example: > > import numpy as np > foo = np.array([], dtype=np.float32) > print a.dtype > >> float32 > foo = np.append(foo, 1) > print foo.dtype > >> float64 > > Is this the correct behaviour? I realise that numpy.append() returns a new > copy of foo but is it the correct behaviour that the original datatype is > not persisted? If so, should this not be noted in the documentation? > > Forgive me if I have missed something. > > Best wishes > > -- > Martin > > I have moved this over to the numpy-discussion list. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From newton at tethers.com Sun Jul 25 12:32:17 2010 From: newton at tethers.com (Tyrel Newton) Date: Sun, 25 Jul 2010 09:32:17 -0700 Subject: [SciPy-User] memory errors when using savemat In-Reply-To: References: <1622F899-A270-431B-89B6-A31C118CD71B@tethers.com> Message-ID: > Could you give us a small script which reproduces the error ? 20 > million is pretty small, I suspect something else or a bug in scipy, Not really, but I think I can list off the important parts. 1) Once I know the length, I declare an empty array with: dataarr = numpy.empty(length, dtype=numpy.dtype(float)) 2) I then populate the array with values, potentially out-of-order (which is I was a declare a large empty array first). The arrays are populated with something like: dataarr[index] = float(datavalue) 3) I then create a dict object with multiple of these large float-type arrays, usually less than 6 arrays total. Each array is the same length, in this case 20M samples. 
4) To this dict object, I add a few single-entry float arrays and strings that represent metadata for the large arrays. These are used to automatically process the data in MATLAB. Examples of creating the numpy types: dataarr = numpy.empty(1, dtype=numpy.dtype(float)) dataarr[0] = datavalue strdata = numpy.core.defchararray.array(str) 5) I then export the entire dict object in a single call: scipy.io.savemat(fname, datadict) Hope that's enough to explain what's going on. Thanks, Tyrel From iain at day-online.org.uk.invalid Sun Jul 25 12:35:20 2010 From: iain at day-online.org.uk.invalid (Iain Day) Date: Sun, 25 Jul 2010 17:35:20 +0100 Subject: [SciPy-User] GSVD in Scipy (ticket 964) Message-ID: Hello, Apologies if this is the wrong mailing list for this. I'm working on a problem which needs the generalized SVD of a matrix pair, similar to the MatLab routine gsvd. There is a ticket open for an enhancement for this to go into scipy/linalg (ticket 964). Does anyone know if/when this is likely to land in a release of scipy? Other wise, I've been trying to use the patch provided by Mike Trumpis to wrap the relevant lapack calls myself. I've taken his .pyf file and downloaded the four lapack routines for gsvd. I'm trying to follow the f2py example on the scipy pages: $ f2py2.6 -c lapack_gsvd.pyf sggsvd.f dggsvd.f cggsvd.f zggsvd.f -latlas -llapack -lblas which fails with an error given below. Does anyone have any pointers as where I might be going wrong. Many thanks. For information, numpy, scipy, atlas, lapack and blas are all from Fink. Many thanks. Iain running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building extension "lapack_gsvd" sources creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_ creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6 f2py options: [] f2py: lapack_gsvd.pyf Reading fortran codes... Reading file 'lapack_gsvd.pyf' (format:free) Line #6 in lapack_gsvd.pyf:"subroutine ggsvd(m,p,n,k,l,du,dv,dq,a,b,compute_vecs,u,v,q,alpha,beta,work,iwork,info) " analyzeline: No name/args pattern found for line. Line #34 in lapack_gsvd.pyf:"end subroutine ggsvd" analyzeline: No name/args pattern found for line. Line #36 in lapack_gsvd.pyf:"subroutine ggsvd(m,p,n,k,l,du,dv,dq,a,b,compute_vecs,u,v,q,alpha,beta,work,rwork,iwork,info) " analyzeline: No name/args pattern found for line. Line #65 in lapack_gsvd.pyf:"end subroutine ggsvd " analyzeline: No name/args pattern found for line. crackline: groupcounter=4 groupname={0: '', 1: 'python module', 2: 'interface', 3: 'subroutine', 4: 'subroutine', 5: 'subroutine', 6: 'subroutine'} crackline: Mismatch of blocks encountered. Trying to fix it by assuming "end" statement. crackline: groupcounter=3 groupname={0: '', 1: 'python module', 2: 'interface', 3: 'subroutine', 4: 'subroutine', 5: 'subroutine', 6: 'subroutine'} crackline: Mismatch of blocks encountered. Trying to fix it by assuming "end" statement. crackline: groupcounter=2 groupname={0: '', 1: 'python module', 2: 'interface', 3: 'subroutine', 4: 'subroutine', 5: 'subroutine', 6: 'subroutine'} crackline: Mismatch of blocks encountered. Trying to fix it by assuming "end" statement. 
crackline: groupcounter=1 groupname={0: '', 1: 'python module', 2: 'interface', 3: 'subroutine', 4: 'subroutine', 5: 'subroutine', 6: 'subroutine'} crackline: Mismatch of blocks encountered. Trying to fix it by assuming "end" statement. Post-processing... Block: lapack_gsvd Block: unknown_subroutine Post-processing (stage 2)... Building modules... Building module "lapack_gsvd"... Constructing wrapper function "unknown_subroutine"... unknown_subroutine() Wrote C/API module "lapack_gsvd" to file "/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6/lapack_gsvdmodule.c" adding '/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6/fortranobject.c' to sources. adding '/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6' to include_dirs. copying /sw/lib/python2.6/site-packages/numpy/f2py/src/fortranobject.c -> /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6 copying /sw/lib/python2.6/site-packages/numpy/f2py/src/fortranobject.h -> /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6 build_src: building npy-pkg config files running build_ext customize UnixCCompiler customize UnixCCompiler using build_ext customize NAGFCompiler Could not locate executable f95 customize AbsoftFCompiler Could not locate executable f90 Could not locate executable f77 customize IBMFCompiler Could not locate executable xlf90 Could not locate executable xlf customize IntelFCompiler Could not locate executable ifort Could not locate executable ifc customize GnuFCompiler Could not locate executable g77 customize Gnu95FCompiler Found executable /sw/bin/gfortran customize Gnu95FCompiler customize Gnu95FCompiler using build_ext building 'lapack_gsvd' extension compiling C sources C compiler: gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var/folders creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var/folders/Ss creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp- creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_ creating /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6 compile options: '-I/var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6 -I/sw/lib/python2.6/site-packages/numpy/core/include -I/sw/include/python2.6 -c' gcc: /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6/lapack_gsvdmodule.c /var/folders/Ss/SsCJ6A6sENq5sQzZeDzoT++++TI/-Tmp-/tmpYRUAv_/src.macosx-10.6-x86_64-2.6/lapack_gsvdmodule.c:100: error: expected declaration specifiers or ?...? before ? Hi all I have the following function y = m[0] A + m[1] B + m[2] C and I have the following data A ; B ; C ; y 10 ; 122 ; 7 ; 4 12 ; 134 ; 9 ; 2 11 ; 131 ; 6 ; 4 7 ; 180 ; 4 ; 2 Do you know what function of python I could use to find the weights m[i] that best fit for these data? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pav at iki.fi Sun Jul 25 12:49:10 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 25 Jul 2010 16:49:10 +0000 (UTC) Subject: [SciPy-User] memory errors when using savemat References: <1622F899-A270-431B-89B6-A31C118CD71B@tethers.com> Message-ID: Sat, 24 Jul 2010 10:37:33 -0700, Tyrel Newton wrote: > I'm trying to use scipy.io.savemat to export a very large set of data to > a .mat file. The dataset contains around 20 million floats. When I try > to export this to a .mat file, I get a MemoryError. The specific > MemoryError is: > > File "C:\Python26\lib\site-packages\scipy\io\matlab\miobase.py", line > 557 in write_bytes > self.file_stream.write(arr.tostring(order='F')) What is the complete error message? -- It typically indicates the specific part in C code the error originates from. (The full traceback, thanks!) On the other hand, along that code path it seems the only source of a MemoryError can really be a failure to allocate memory for the tostring. Your data apparently needs 160 MB free for this to succeed -- which is not so much. So the question comes to what is the memory usage of the code when saving, compared to the available free memory? -- Pauli Virtanen From kwgoodman at gmail.com Sun Jul 25 12:59:12 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 25 Jul 2010 09:59:12 -0700 Subject: [SciPy-User] fitting multiple coefficients In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 9:48 AM, eneide.odissea wrote: > Hi all > I have the following function > y = m[0] A + m[1] B + m[2] C > and I have the following data > A ? ; ? B ?; ? ? C ? ; ? ?y > 10 ? ?; ?122 ; ?7 ? ; ? ?4 > 12 ? ?; ?134 ; ?9 ? ; ? ?2 > 11 ? ?; ?131 ; ?6 ? ; ? ?4 > 7 ? ? ?; ?180 ; ?4 ? ; ? ?2 > Do you know what function of python I could use to find > the weights m[i] that ?best fit for?these data? >> import numpy as np >> x = np.array([[10, 122, 7], [12, 134, 9], [11, 131, 6], [7, 180, 4]]) >> y = np.array([4, 2, 4, 2]) >> m, ig1, ig2, ig3 = np.linalg.lstsq(x, y) >> m array([ 0.84562416, -0.00455408, -0.74598866]) >> np.dot(x, m) # <-- fitted y array([ 2.67872369, 2.82334576, 4.22934983, 2.11568083]) From alind_sap at yahoo.com Sun Jul 25 13:09:36 2010 From: alind_sap at yahoo.com (alind sharma) Date: Sun, 25 Jul 2010 22:39:36 +0530 (IST) Subject: [SciPy-User] matrix product over row or column Message-ID: <112597.35141.qm@web94906.mail.in2.yahoo.com> Hi all, I have a sparse lil_matrix. I want to get the product over rows/column. Whats the best way to achive that. I tried searching but unable to find a straight forward function to do that in scipy.sparse. Though we have lil_matirx.sum(dim), but no such thing for product. Suppose i have [[1, 1, 2], [2, 3, 5], [9, 0, 1 ]] I want answer as in case of over rows: [18, 0, 10] and in case of over colums : [2, 30, 0] Thanks in advance, Alind Sharma -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Jul 25 15:41:32 2010 From: cournape at gmail.com (David Cournapeau) Date: Mon, 26 Jul 2010 04:41:32 +0900 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman wrote: > _kmeans chokes on large thresholds: > >>> from scipy import cluster >>> v = np.array([1,2,3,4,10], dtype=float) >>> cluster.vq.kmeans(v, 1, thresh=1e15) > ? (array([ 4.]), 2.3999999999999999) >>> cluster.vq.kmeans(v, 1, thresh=1e16) > > IndexError: list index out of range > > The problem is in these lines: > > ? ?diff = thresh+1. > ? ?while diff > thresh: > ? ? ? ? 
> ? ? ? ?if(diff > thresh): > > If thresh is large then (thresh + 1) > thresh is False: > >>> thresh = 1e16 >>> diff = thresh + 1.0 >>> diff > thresh > ? False > > What's a use case for a large threshold? You might want to study the > algorithm by seeing the result after one iteration (not to be confused > with the iter input which is something else). > > One fix is to use 2*thresh instead for thresh + 1. But that just > pushes the problem out to higher thresholds Or just use the spacing function, which by definition returns the smallest number M such as thresh + M > thresh (except for nan/inf) David From ben.root at ou.edu Sun Jul 25 15:48:52 2010 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 25 Jul 2010 14:48:52 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 2:41 PM, David Cournapeau wrote: > On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman > wrote: > > _kmeans chokes on large thresholds: > > > >>> from scipy import cluster > >>> v = np.array([1,2,3,4,10], dtype=float) > >>> cluster.vq.kmeans(v, 1, thresh=1e15) > > (array([ 4.]), 2.3999999999999999) > >>> cluster.vq.kmeans(v, 1, thresh=1e16) > > > > IndexError: list index out of range > > > > The problem is in these lines: > > > > diff = thresh+1. > > while diff > thresh: > > > > if(diff > thresh): > > > > If thresh is large then (thresh + 1) > thresh is False: > > > >>> thresh = 1e16 > >>> diff = thresh + 1.0 > >>> diff > thresh > > False > > > > What's a use case for a large threshold? You might want to study the > > algorithm by seeing the result after one iteration (not to be confused > > with the iter input which is something else). > > > > One fix is to use 2*thresh instead for thresh + 1. But that just > > pushes the problem out to higher thresholds > > Or just use the spacing function, which by definition returns the > smallest number M such as thresh + M > thresh (except for nan/inf) > > Or, one could just go with a "prime the loop" approach and perform the operation once before the loop begins. Admittedly, this does seem rather un-pythonic unless python has a do...while idiom that I am unaware of. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Sun Jul 25 17:53:53 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 25 Jul 2010 14:53:53 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 12:41 PM, David Cournapeau wrote: > On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman wrote: >> _kmeans chokes on large thresholds: >> >>>> from scipy import cluster >>>> v = np.array([1,2,3,4,10], dtype=float) >>>> cluster.vq.kmeans(v, 1, thresh=1e15) >> ? (array([ 4.]), 2.3999999999999999) >>>> cluster.vq.kmeans(v, 1, thresh=1e16) >> >> IndexError: list index out of range >> >> The problem is in these lines: >> >> ? ?diff = thresh+1. >> ? ?while diff > thresh: >> ? ? ? ? >> ? ? ? ?if(diff > thresh): >> >> If thresh is large then (thresh + 1) > thresh is False: >> >>>> thresh = 1e16 >>>> diff = thresh + 1.0 >>>> diff > thresh >> ? False >> >> What's a use case for a large threshold? You might want to study the >> algorithm by seeing the result after one iteration (not to be confused >> with the iter input which is something else). >> >> One fix is to use 2*thresh instead for thresh + 1. 
But that just >> pushes the problem out to higher thresholds > > Or just use the spacing function, which by definition returns the > smallest number M such as thresh + M > thresh (except for nan/inf) Neat, I've never heard of np.spacing. But it suffers the same fate: Works: >> thresh = 1e16 >> diff = thresh + np.spacing(thresh) >> diff > thresh True Doesn't work: >> thresh = 1e400 >> diff = thresh + np.spacing(thresh) >> diff > thresh False len(avg_dist) == 0 could be used to mark the first time through the loop. Another minor issue: The kmeans docstring says iteration stops when the change in distortion is less than threshold. But as coded (if diff > thresh) iteration also stops when the change is equal to the threshold. Could either fix the code or the docstring. Fixing the code (if diff >= thresh) means that thresh=0 could enter an infinite loop (negative thresh already enters an infinite loop). So fixing the docstring seems better. To avoid infinite loops, I think iteration should terminite when there is no change in distortion. But then, since there would be two termination reasons, you'd probably want to output the reason iteration terminiated. From cournape at gmail.com Sun Jul 25 17:59:53 2010 From: cournape at gmail.com (David Cournapeau) Date: Mon, 26 Jul 2010 06:59:53 +0900 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Mon, Jul 26, 2010 at 6:53 AM, Keith Goodman wrote: > On Sun, Jul 25, 2010 at 12:41 PM, David Cournapeau wrote: >> On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman wrote: >>> _kmeans chokes on large thresholds: >>> >>>>> from scipy import cluster >>>>> v = np.array([1,2,3,4,10], dtype=float) >>>>> cluster.vq.kmeans(v, 1, thresh=1e15) >>> ? (array([ 4.]), 2.3999999999999999) >>>>> cluster.vq.kmeans(v, 1, thresh=1e16) >>> >>> IndexError: list index out of range >>> >>> The problem is in these lines: >>> >>> ? ?diff = thresh+1. >>> ? ?while diff > thresh: >>> ? ? ? ? >>> ? ? ? ?if(diff > thresh): >>> >>> If thresh is large then (thresh + 1) > thresh is False: >>> >>>>> thresh = 1e16 >>>>> diff = thresh + 1.0 >>>>> diff > thresh >>> ? False >>> >>> What's a use case for a large threshold? You might want to study the >>> algorithm by seeing the result after one iteration (not to be confused >>> with the iter input which is something else). >>> >>> One fix is to use 2*thresh instead for thresh + 1. But that just >>> pushes the problem out to higher thresholds >> >> Or just use the spacing function, which by definition returns the >> smallest number M such as thresh + M > thresh (except for nan/inf) > > Neat, I've never heard of np.spacing. But it suffers the same fate: > > Works: > >>> thresh = 1e16 >>> diff = thresh + np.spacing(thresh) >>> diff > thresh > ? True > > Doesn't work: > >>> thresh = 1e400 >>> diff = thresh + np.spacing(thresh) >>> diff > thresh > ? 
False That's because 1e400 is inf for double precision numbers, and inf + N > inf is never true :) David From ben.root at ou.edu Sun Jul 25 18:11:12 2010 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 25 Jul 2010 17:11:12 -0500 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 4:53 PM, Keith Goodman wrote: > On Sun, Jul 25, 2010 at 12:41 PM, David Cournapeau > wrote: > > On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman > wrote: > >> _kmeans chokes on large thresholds: > >> > >>>> from scipy import cluster > >>>> v = np.array([1,2,3,4,10], dtype=float) > >>>> cluster.vq.kmeans(v, 1, thresh=1e15) > >> (array([ 4.]), 2.3999999999999999) > >>>> cluster.vq.kmeans(v, 1, thresh=1e16) > >> > >> IndexError: list index out of range > >> > >> The problem is in these lines: > >> > >> diff = thresh+1. > >> while diff > thresh: > >> > >> if(diff > thresh): > >> > >> If thresh is large then (thresh + 1) > thresh is False: > >> > >>>> thresh = 1e16 > >>>> diff = thresh + 1.0 > >>>> diff > thresh > >> False > >> > >> What's a use case for a large threshold? You might want to study the > >> algorithm by seeing the result after one iteration (not to be confused > >> with the iter input which is something else). > >> > >> One fix is to use 2*thresh instead for thresh + 1. But that just > >> pushes the problem out to higher thresholds > > > > Or just use the spacing function, which by definition returns the > > smallest number M such as thresh + M > thresh (except for nan/inf) > > Neat, I've never heard of np.spacing. But it suffers the same fate: > > Works: > > >> thresh = 1e16 > >> diff = thresh + np.spacing(thresh) > >> diff > thresh > True > > Doesn't work: > > >> thresh = 1e400 > >> diff = thresh + np.spacing(thresh) > >> diff > thresh > False > > len(avg_dist) == 0 could be used to mark the first time through the loop. > > Another minor issue: > > The kmeans docstring says iteration stops when the change in > distortion is less than threshold. But as coded (if diff > thresh) > iteration also stops when the change is equal to the threshold. > > Could either fix the code or the docstring. Fixing the code (if diff > >= thresh) means that thresh=0 could enter an infinite loop (negative > thresh already enters an infinite loop). So fixing the docstring seems > better. > > I have updated the docstring via the wiki. There are probably a few more changes that needs to be done before marking it as ready for release. http://docs.scipy.org/scipy/docs/scipy.cluster.vq.kmeans/ Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Sun Jul 25 18:17:55 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 25 Jul 2010 15:17:55 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 2:59 PM, David Cournapeau wrote: > On Mon, Jul 26, 2010 at 6:53 AM, Keith Goodman wrote: >> On Sun, Jul 25, 2010 at 12:41 PM, David Cournapeau wrote: >>> On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman wrote: >>>> _kmeans chokes on large thresholds: >>>> >>>>>> from scipy import cluster >>>>>> v = np.array([1,2,3,4,10], dtype=float) >>>>>> cluster.vq.kmeans(v, 1, thresh=1e15) >>>> ? (array([ 4.]), 2.3999999999999999) >>>>>> cluster.vq.kmeans(v, 1, thresh=1e16) >>>> >>>> IndexError: list index out of range >>>> >>>> The problem is in these lines: >>>> >>>> ? ?diff = thresh+1. >>>> ? ?while diff > thresh: >>>> ? ? ? ? >>>> ? ? ? 
?if(diff > thresh): >>>> >>>> If thresh is large then (thresh + 1) > thresh is False: >>>> >>>>>> thresh = 1e16 >>>>>> diff = thresh + 1.0 >>>>>> diff > thresh >>>> ? False >>>> >>>> What's a use case for a large threshold? You might want to study the >>>> algorithm by seeing the result after one iteration (not to be confused >>>> with the iter input which is something else). >>>> >>>> One fix is to use 2*thresh instead for thresh + 1. But that just >>>> pushes the problem out to higher thresholds >>> >>> Or just use the spacing function, which by definition returns the >>> smallest number M such as thresh + M > thresh (except for nan/inf) >> >> Neat, I've never heard of np.spacing. But it suffers the same fate: >> >> Works: >> >>>> thresh = 1e16 >>>> diff = thresh + np.spacing(thresh) >>>> diff > thresh >> ? True >> >> Doesn't work: >> >>>> thresh = 1e400 >>>> diff = thresh + np.spacing(thresh) >>>> diff > thresh >> ? False > > That's because 1e400 is inf for double precision numbers, and inf + N >> inf is never true :) That makes sense. But it is also the reason not to use np.spacing for kmeans. Entering thresh=np.inf seems reasonable if you want to make sure only one iteration is performed. Using if (diff > thesh) or (len(dist_arg) == 0) should fix it. Is the extra time OK for such a small corner case? I think so. From kwgoodman at gmail.com Sun Jul 25 18:24:57 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 25 Jul 2010 15:24:57 -0700 Subject: [SciPy-User] kmeans In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 3:17 PM, Keith Goodman wrote: > On Sun, Jul 25, 2010 at 2:59 PM, David Cournapeau wrote: >> On Mon, Jul 26, 2010 at 6:53 AM, Keith Goodman wrote: >>> On Sun, Jul 25, 2010 at 12:41 PM, David Cournapeau wrote: >>>> On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman wrote: >>>>> _kmeans chokes on large thresholds: >>>>> >>>>>>> from scipy import cluster >>>>>>> v = np.array([1,2,3,4,10], dtype=float) >>>>>>> cluster.vq.kmeans(v, 1, thresh=1e15) >>>>> ? (array([ 4.]), 2.3999999999999999) >>>>>>> cluster.vq.kmeans(v, 1, thresh=1e16) >>>>> >>>>> IndexError: list index out of range >>>>> >>>>> The problem is in these lines: >>>>> >>>>> ? ?diff = thresh+1. >>>>> ? ?while diff > thresh: >>>>> ? ? ? ? >>>>> ? ? ? ?if(diff > thresh): >>>>> >>>>> If thresh is large then (thresh + 1) > thresh is False: >>>>> >>>>>>> thresh = 1e16 >>>>>>> diff = thresh + 1.0 >>>>>>> diff > thresh >>>>> ? False >>>>> >>>>> What's a use case for a large threshold? You might want to study the >>>>> algorithm by seeing the result after one iteration (not to be confused >>>>> with the iter input which is something else). >>>>> >>>>> One fix is to use 2*thresh instead for thresh + 1. But that just >>>>> pushes the problem out to higher thresholds >>>> >>>> Or just use the spacing function, which by definition returns the >>>> smallest number M such as thresh + M > thresh (except for nan/inf) >>> >>> Neat, I've never heard of np.spacing. But it suffers the same fate: >>> >>> Works: >>> >>>>> thresh = 1e16 >>>>> diff = thresh + np.spacing(thresh) >>>>> diff > thresh >>> ? True >>> >>> Doesn't work: >>> >>>>> thresh = 1e400 >>>>> diff = thresh + np.spacing(thresh) >>>>> diff > thresh >>> ? False >> >> That's because 1e400 is inf for double precision numbers, and inf + N >>> inf is never true :) > > That makes sense. But it is also the reason not to use np.spacing for > kmeans. Entering thresh=np.inf seems reasonable if you want to make > sure only one iteration is performed. 
Using > > if (diff > thesh) or (len(dist_arg) == 0) > > should fix it. Is the extra time OK for such a small corner case? I think so. Oh, but I'm getting lost in a thicket of small issues. The big issue is that the stopping condition is wrong. kmeans currently stops iterating when changes in the mean sum of distances is below the threshold. But the mean distance doesn't monotonically decrease. So should we switch from changes in mean distance to changes in the root of the mean squared distance? And then update the doc? Or have we already decided on that? Unless I'm missing something (and I'm new to kmeans) I don't think we have a choice. David, what's your take? From d.l.goldsmith at gmail.com Sun Jul 25 19:25:23 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Jul 2010 16:25:23 -0700 Subject: [SciPy-User] Is DFITPACK just a double precision FITPACK? Message-ID: And if I were to make FITPACK a link in a docstring, which do you think would be a better destination for our typical user: http://www.cisl.ucar.edu/softlib/FITPACK.html or http://www.netlib.org/fitpack/ Thanks! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jul 25 21:51:35 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 25 Jul 2010 19:51:35 -0600 Subject: [SciPy-User] Is DFITPACK just a double precision FITPACK? In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 5:25 PM, David Goldsmith wrote: > And if I were to make FITPACK a link in a docstring, which do you think > would be a better destination for our typical user: > > As far as I can tell, there are two C interfaces to fitpack: dfitpack and _fitpack. The dfitpack interface is produced by f2py while _fitpack is from fitpackmodule. The python module fitpack2 uses mostly dfitpack. I think there is some history there that Pearu could explain. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Sun Jul 25 22:42:18 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Jul 2010 19:42:18 -0700 Subject: [SciPy-User] Is DFITPACK just a double precision FITPACK? In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 6:51 PM, Charles R Harris wrote: > > On Sun, Jul 25, 2010 at 5:25 PM, David Goldsmith wrote: > >> And if I were to make FITPACK a link in a docstring, which do you think >> would be a better destination for our typical user: >> >> > As far as I can tell, there are two C interfaces to fitpack: dfitpack and > _fitpack. The dfitpack interface is produced by f2py while _fitpack is from > fitpackmodule. The python module fitpack2 uses mostly dfitpack. I think > there is some history there that Pearu could explain. > Here's the deal: I added Brief and Extended (not too long, don't worry) Summaries to the docstring for scipy.interpolate, the latter of which simply consists of a high-level narrative listing of the sub-package contents, including mention of the FITPACK and DFITPACK wrappers. I figured I might as well make the words FITPACK and DFITPACK links to relevant sites, and, using Google, found two good candidates for FITPACK (those indicated in the OP), but all the top hits for DFITPACK were to something in the scipy milieu! :-) Is DFITPACK different enough from FITPACK that it should have its own link, and if so, can anyone suggest a good one? 
Thanks, DG > > > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Sun Jul 25 22:59:43 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Jul 2010 19:59:43 -0700 Subject: [SciPy-User] Meaning of the phrase "Kolmo.-Smirnov complementary CDF" Message-ID: This phrase occurs in the description of smirnov in scipy.special; is the meaning changed if I swap the complementary and the Kolmo.-Smirnov, i.e., is "Complementary Kolmo.-Smirnov CDF" equivalent in meaning? (I need to shorten the description to fit on one terminal line and would like to substitute 1 - KSCDF or some such for it.) DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jul 25 23:58:22 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 25 Jul 2010 21:58:22 -0600 Subject: [SciPy-User] Is DFITPACK just a double precision FITPACK? In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 8:42 PM, David Goldsmith wrote: > On Sun, Jul 25, 2010 at 6:51 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> On Sun, Jul 25, 2010 at 5:25 PM, David Goldsmith > > wrote: >> >>> And if I were to make FITPACK a link in a docstring, which do you think >>> would be a better destination for our typical user: >>> >>> >> As far as I can tell, there are two C interfaces to fitpack: dfitpack and >> _fitpack. The dfitpack interface is produced by f2py while _fitpack is from >> fitpackmodule. The python module fitpack2 uses mostly dfitpack. I think >> there is some history there that Pearu could explain. >> > > Here's the deal: I added Brief and Extended (not too long, don't worry) > Summaries to the docstring for scipy.interpolate, the latter of which simply > consists of a high-level narrative listing of the sub-package contents, > including mention of the FITPACK and DFITPACK wrappers. I figured I might > as well make the words FITPACK and DFITPACK links to relevant sites, and, > using Google, found two good candidates for FITPACK (those indicated in the > OP), but all the top hits for DFITPACK were to something in the scipy > milieu! :-) > > Is DFITPACK different enough from FITPACK that it should have its own link, > and if so, can anyone suggest a good one? > > No. dfitpack *is* fitpack, or at least some of it. The _fitpack module also interfaces to fitpack but is slightly higher level. Basically we have two interfaces to fitpack and a bit of redundancy. It would be nice to clean it all up someday. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Jul 26 00:04:13 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 25 Jul 2010 22:04:13 -0600 Subject: [SciPy-User] Meaning of the phrase "Kolmo.-Smirnov complementary CDF" In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 8:59 PM, David Goldsmith wrote: > This phrase occurs in the description of smirnov in scipy.special; is the > meaning changed if I swap the complementary and the Kolmo.-Smirnov, i.e., is > "Complementary Kolmo.-Smirnov CDF" equivalent in meaning? (I need to > shorten the description to fit on one terminal line and would like to > substitute 1 - KSCDF or some such for it.) > > I think Kolmogorov should be spelled out. Is "Complementary Kolmo.-Smirnov CDF" equivalent to what? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jul 26 01:17:37 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 26 Jul 2010 00:17:37 -0500 Subject: [SciPy-User] Meaning of the phrase "Kolmo.-Smirnov complementary CDF" In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 23:04, Charles R Harris wrote: > > > On Sun, Jul 25, 2010 at 8:59 PM, David Goldsmith > wrote: >> >> This phrase occurs in the description of smirnov in scipy.special; is the >> meaning changed if I swap the complementary and the Kolmo.-Smirnov, i.e., is >> "Complementary Kolmo.-Smirnov CDF" equivalent in meaning?? (I need to >> shorten the description to fit on one terminal line and would like to >> substitute 1 - KSCDF or some such for it.) >> > > I think Kolmogorov should be spelled out. +1 > Is "Complementary Kolmo.-Smirnov > CDF" equivalent to what? Subject: Meaning of the phrase "Kolmo.-Smirnov complementary CDF" "The complement of the one-sided Kolmogorov-Smirnov CDF." will suffice for a one-line description. Do not use "1 - KSCDF", please. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From d.l.goldsmith at gmail.com Mon Jul 26 02:53:27 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Jul 2010 23:53:27 -0700 Subject: [SciPy-User] Is DFITPACK just a double precision FITPACK? In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 8:58 PM, Charles R Harris wrote: > > On Sun, Jul 25, 2010 at 8:42 PM, David Goldsmith wrote: > >> On Sun, Jul 25, 2010 at 6:51 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> On Sun, Jul 25, 2010 at 5:25 PM, David Goldsmith < >>> d.l.goldsmith at gmail.com> wrote: >>> >>>> And if I were to make FITPACK a link in a docstring, which do you think >>>> would be a better destination for our typical user: >>>> >>> >>> As far as I can tell, there are two C interfaces to fitpack: dfitpack and >>> _fitpack. The dfitpack interface is produced by f2py while _fitpack is from >>> fitpackmodule. The python module fitpack2 uses mostly dfitpack. I think >>> there is some history there that Pearu could explain. >>> >> >> Here's the deal: I added Brief and Extended (not too long, don't worry) >> Summaries to the docstring for scipy.interpolate, the latter of which simply >> consists of a high-level narrative listing of the sub-package contents, >> including mention of the FITPACK and DFITPACK wrappers. 
I figured I might >> as well make the words FITPACK and DFITPACK links to relevant sites, and, >> using Google, found two good candidates for FITPACK (those indicated in the >> OP), but all the top hits for DFITPACK were to something in the scipy >> milieu! :-) >> >> Is DFITPACK different enough from FITPACK that it should have its own >> link, and if so, can anyone suggest a good one? >> > > No. dfitpack *is* fitpack, or at least some of it. The _fitpack module also > interfaces to fitpack but is slightly higher level. Basically we have two > interfaces to fitpack and a bit of redundancy. It would be nice to clean it > all up someday. > > Thanks, Chuck, that's what I needed to know. :-) DG > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Mon Jul 26 02:54:48 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Jul 2010 23:54:48 -0700 Subject: [SciPy-User] Meaning of the phrase "Kolmo.-Smirnov complementary CDF" In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 10:17 PM, Robert Kern wrote: > On Sun, Jul 25, 2010 at 23:04, Charles R Harris > wrote: > > > > > > On Sun, Jul 25, 2010 at 8:59 PM, David Goldsmith < > d.l.goldsmith at gmail.com> > > wrote: > >> > >> This phrase occurs in the description of smirnov in scipy.special; is > the > >> meaning changed if I swap the complementary and the Kolmo.-Smirnov, > i.e., is > >> "Complementary Kolmo.-Smirnov CDF" equivalent in meaning? (I need to > >> shorten the description to fit on one terminal line and would like to > >> substitute 1 - KSCDF or some such for it.) > >> > > > > I think Kolmogorov should be spelled out. > > +1 > > > Is "Complementary Kolmo.-Smirnov > > CDF" equivalent to what? > > Subject: Meaning of the phrase "Kolmo.-Smirnov complementary CDF" > > "The complement of the one-sided Kolmogorov-Smirnov CDF." will suffice > for a one-line description. Do not use "1 - KSCDF", please. > Excellent, thanks! DG > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.l.goldsmith at gmail.com Mon Jul 26 03:02:04 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Mon, 26 Jul 2010 00:02:04 -0700 Subject: [SciPy-User] Meaning of the phrase "Kolmo.-Smirnov complementary CDF" In-Reply-To: References: Message-ID: Same problem w/ kolmogorov function: Present description: "The complementary CDF of the (scaled) two-sided test statistic (Kn*) valid for large n." Proposed alternative: The complement of the scaled 2-sided Kn* CDF for large n OK? DG On Sun, Jul 25, 2010 at 11:54 PM, David Goldsmith wrote: > On Sun, Jul 25, 2010 at 10:17 PM, Robert Kern wrote: > >> On Sun, Jul 25, 2010 at 23:04, Charles R Harris >> wrote: >> > >> > >> > On Sun, Jul 25, 2010 8:59 PM, David Goldsmith >> > wrote: >> >> >> >> This phrase occurs in the description of smirnov in scipy.special; is >> the >> >> meaning changed if I swap the complementary and the Kolmo.-Smirnov, >> i.e., is >> >> "Complementary Kolmo.-Smirnov CDF" equivalent in meaning? (I need to >> >> shorten the description to fit on one terminal line and would like to >> >> substitute 1 - KSCDF or some such for it.) >> >> >> > >> > I think Kolmogorov should be spelled out. >> >> +1 >> >> > Is "Complementary Kolmo.-Smirnov >> > CDF" equivalent to what? >> >> Subject: Meaning of the phrase "Kolmo.-Smirnov complementary CDF" >> >> "The complement of the one-sided Kolmogorov-Smirnov CDF." will suffice >> for a one-line description. Do not use "1 - KSCDF", please. >> > > Excellent, thanks! > > DG > > >> >> -- >> >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> -- Umberto Eco >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From thoeger at fys.ku.dk Mon Jul 26 12:34:39 2010 From: thoeger at fys.ku.dk (=?ISO-8859-1?Q?Th=F8ger?= Emil Juul Thorsen) Date: Mon, 26 Jul 2010 18:34:39 +0200 Subject: [SciPy-User] Points with given distance to a polygon Message-ID: <1280162079.6595.5.camel@falconeer> I am working on a project where I am defining some regions of interest. I have a 2200x2200 px 2D Array in which my ROI is defined by a polygon. However, my data are smoothed by a gaussian kernel of width 300px, and I would like to draw some lines indicating this inner 150px distance to the borders of the ROI. I cannot come up with any way to do this, does anyone have an idea? 
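One possible approach, sketched here under the assumption that the ROI polygon has already been rasterized to a boolean mask on the 2200x2200 grid (the mask construction and the name roi_mask are illustrative only): the Euclidean distance transform gives, for every pixel inside the ROI, its distance to the nearest border pixel, and the 150 px level set of that image is the wanted inner line.

import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

# roi_mask: boolean (2200, 2200) array, True inside the polygon.
# A rectangle is used here as a stand-in for the real rasterized ROI.
roi_mask = np.zeros((2200, 2200), dtype=bool)
roi_mask[400:1800, 600:1600] = True

# Distance (in pixels) from each interior pixel to the ROI boundary.
dist = ndimage.distance_transform_edt(roi_mask)

# The curve of points lying 150 px inside the border.
plt.contour(dist, [150], colors='r')
plt.show()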
Best regards; Emil From djpine at gmail.com Mon Jul 26 12:46:58 2010 From: djpine at gmail.com (David Pine) Date: Mon, 26 Jul 2010 12:46:58 -0400 Subject: [SciPy-User] SciPy ODE integrator Message-ID: <838DE651-0C9F-4713-99D7-4997A234AEC9@gmail.com> Is there a SciPy ODE integrator that does adaptive stepsize integration AND produces output with the adaptive time steps intact? The standard SciPy ODE integrator seems to be scipy.integrate.odeint and its simpler cousin scipy.integrate.ode. These work just fine but both take a user-specified time series and return the solution at those points only. Often, I prefer to have a more classic adaptive stepsize integrator that returns the solution at time steps determined by the integrator (and the degree of desired precision input by the user). This is often the most useful kind of solution because it tends to produce more points where the solution is varying rapidly and fewer where it is not varying much. A classic Runge-Kutta adaptive stepsize ODE solver does this, as do many others, but I can't find a nice implementation in SciPy or NumPy. Please advise. Thanks. David From e.antero.tammi at gmail.com Mon Jul 26 14:11:44 2010 From: e.antero.tammi at gmail.com (eat) Date: Mon, 26 Jul 2010 18:11:44 +0000 (UTC) Subject: [SciPy-User] Points with given distance to a polygon References: <1280162079.6595.5.camel@falconeer> Message-ID: Thøger Emil Juul Thorsen fys.ku.dk> writes: > > I am working on a project where I am defining some regions of interest. > I have a 2200x2200 px 2D Array in which my ROI is defined by a polygon. > However, my data are smoothed by a gaussian kernel of width 300px, and I > would like to draw some lines indicating this inner 150px distance to > the borders of the ROI. I cannot come up with any way to do this, does > anyone have an idea? > > Best regards; > > Emil > Hi, Assuming you are looking to shrink the polygon and your polygon remains simple (non-self-intersecting) after shrinkage, I'll suggest the following rough procedure (in pseudo-Python). def translate(segment, dist): normal= calculate inward normal of line segment normal= normal/ norm(normal) return segment+ dist* normal def shrink_polygon(polygon, dist): shrinked= translate(polygon(first segment), dist) for segment in second to last- 1 segment of polygon: x= intersect(shrinked, translate(segment, dist)) shrinked[-1]= x shrinked.append(end point of segment) x= intersect(shrinked, translate(polygon(last segment), dist)) shrinked[0]= x shrinked[-1]= x return shrinked Does this fulfill your requirements? If so, I can work out more details if needed. My 2 cents, eat From zachary.pincus at yale.edu Mon Jul 26 14:26:49 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 26 Jul 2010 14:26:49 -0400 Subject: [SciPy-User] Points with given distance to a polygon In-Reply-To: <1280162079.6595.5.camel@falconeer> References: <1280162079.6595.5.camel@falconeer> Message-ID: > I am working on a project where I am defining some regions of > interest. > I have a 2200x2200 px 2D Array in which my ROI is defined by a > polygon. > However, my data are smoothed by a gaussian kernel of width 300px, > and I > would like to draw some lines indicating this inner 150px distance to > the borders of the ROI. I cannot come up with any way to do this, does > anyone have an idea? Two broad options spring to mind: (1) Geometric -- shrink the polygon along the normals to the vertices. [Oh, I see that eat has given pseudocode for same... 
good] (2) Gridded -- rasterize the polygon to a binary mask (no tools for this in scipy, I fear... but if you're handy with opengl or something, that's not too hard), and then use scipy.ndimage to erode or dilate the mask as necessary. Zach From jkington at wisc.edu Mon Jul 26 15:28:06 2010 From: jkington at wisc.edu (Joe Kington) Date: Mon, 26 Jul 2010 14:28:06 -0500 Subject: [SciPy-User] Points with given distance to a polygon In-Reply-To: References: <1280162079.6595.5.camel@falconeer> Message-ID: On Mon, Jul 26, 2010 at 1:26 PM, Zachary Pincus wrote: > > I am working on a project where I am defining some regions of > > interest. > > I have a 2200x2200 px 2D Array in which my ROI is defined by a > > polygon. > > However, my data are smoothed by a gaussian kernel of width 300px, > > and I > > would like to draw some lines indicating this inner 150px distance to > > the borders of the ROI. I cannot come up with any way to do this, does > > anyone have an idea? > > Two broad options spring to mind: > (1) Geometric -- shrink the polygon along the normals to the vertices. > [Oh, I see that eat has given pseudocode for same... good] > (2) Gridded -- rasterize the polygon to a binary mask (no tools for > this in scipy, I fear... but if you're handy with opengl or something, > that's not too hard), and then use scipy.ndimage to erode or dilate > the mask as necessary. > > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > If you go with a purely geometric method, look into a module like shapelyto buffer the polygon. (similar to the code snipped eat posted, but more flexible) See this examplefor a general idea. -Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Mon Jul 26 15:41:48 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 26 Jul 2010 15:41:48 -0400 Subject: [SciPy-User] Points with given distance to a polygon In-Reply-To: References: <1280162079.6595.5.camel@falconeer> Message-ID: Oh cool! Shapely looks really useful -- thanks. On Jul 26, 2010, at 3:28 PM, Joe Kington wrote: > On Mon, Jul 26, 2010 at 1:26 PM, Zachary Pincus > wrote: > > I am working on a project where I am defining some regions of > > interest. > > I have a 2200x2200 px 2D Array in which my ROI is defined by a > > polygon. > > However, my data are smoothed by a gaussian kernel of width 300px, > > and I > > would like to draw some lines indicating this inner 150px distance > to > > the borders of the ROI. I cannot come up with any way to do this, > does > > anyone have an idea? > > Two broad options spring to mind: > (1) Geometric -- shrink the polygon along the normals to the vertices. > [Oh, I see that eat has given pseudocode for same... good] > (2) Gridded -- rasterize the polygon to a binary mask (no tools for > this in scipy, I fear... but if you're handy with opengl or something, > that's not too hard), and then use scipy.ndimage to erode or dilate > the mask as necessary. > > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > If you go with a purely geometric method, look into a module like > shapely to buffer the polygon. (similar to the code snipped eat > posted, but more flexible) > > See this example for a general idea. 
> > -Joe > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ryanlists at gmail.com Tue Jul 27 11:23:10 2010 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 27 Jul 2010 10:23:10 -0500 Subject: [SciPy-User] Integrating a matrix exponential Message-ID: I am trying to discretize a state-space model. linalg.expm makes this easy for the plant matrix. I am having trouble with the input matrix. The continuous model is xdot = A*x(t) + B*u and the discrete time model will be x(k+1) = G(T) *x(k) + H(T)*u(k) Following Ogata's "Discrete-Time Control Systems" second edition, page 317: G(T) = linalg.expm(A*T) #this works great H(T) = int(expm(A*t)) dt from 0 to T then dot with B That seems easy enough, and I think I want to use numeric integration to approximate the integral of expm(A*t). But is there a method in scipy.integrate for definite integration of a matrix of integrands? And is this the best approach? Thanks, Ryan From ralf.gommers at googlemail.com Tue Jul 27 12:49:23 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 28 Jul 2010 00:49:23 +0800 Subject: [SciPy-User] ANN: SciPy 0.8.0 Message-ID: I'm pleased to announce the release of SciPy 0.8.0. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. This release comes one and a half year after the 0.7.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.8.0 requires Python 2.4-2.6 and NumPy 1.4.1 or greater. For more information, please see the release notes at the end of this email. You can download the release from here: https://sourceforge.net/projects/scipy/ Python 2.5/2.6 binaries for Windows and OS X are available, as well as source tarballs for other platforms and the documentation in pdf form. Thank you to everybody who contributed to this release. Enjoy, The SciPy developers ========================= SciPy 0.8.0 Release Notes ========================= .. contents:: SciPy 0.8.0 is the culmination of 17 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.8.x branch, and on adding new features on the development trunk. This release requires Python 2.4 - 2.6 and NumPy 1.4.1 or greater. Please note that SciPy is still considered to have "Beta" status, as we work toward a SciPy 1.0.0 release. The 1.0.0 release will mark a major milestone in the development of SciPy, after which changing the package structure or API will be much more difficult. Whilst these pre-1.0 releases are considered to have "Beta" status, we are committed to making them as bug-free as possible. However, until the 1.0 release, we are aggressively reviewing and refining the functionality, organization, and interface. This is being done in an effort to make the package as coherent, intuitive, and useful as possible. To achieve this, we need help from the community of users. 
Specifically, we need feedback regarding all aspects of the project - everything - from which algorithms we implement, to details about our function's call signatures. Python 3 ======== Python 3 compatibility is planned and is currently technically feasible, since Numpy has been ported. However, since the Python 3 compatible Numpy 1.5 has not been released yet, support for Python 3 in Scipy is not yet included in Scipy 0.8. SciPy 0.9, planned for fall 2010, will very likely include experimental support for Python 3. Major documentation improvements ================================ SciPy documentation is greatly improved. Deprecated features =================== Swapping inputs for correlation functions (scipy.signal) -------------------------------------------------------- Concern correlate, correlate2d, convolve and convolve2d. If the second input is larger than the first input, the inputs are swapped before calling the underlying computation routine. This behavior is deprecated, and will be removed in scipy 0.9.0. Obsolete code deprecated (scipy.misc) ------------------------------------- The modules `helpmod`, `ppimport` and `pexec` from `scipy.misc` are deprecated. They will be removed from SciPy in version 0.9. Additional deprecations ----------------------- * linalg: The function `solveh_banded` currently returns a tuple containing the Cholesky factorization and the solution to the linear system. In SciPy 0.9, the return value will be just the solution. * The function `constants.codata.find` will generate a DeprecationWarning. In Scipy version 0.8.0, the keyword argument 'disp' was added to the function, with the default value 'True'. In 0.9.0, the default will be 'False'. * The `qshape` keyword argument of `signal.chirp` is deprecated. Use the argument `vertex_zero` instead. * Passing the coefficients of a polynomial as the argument `f0` to `signal.chirp` is deprecated. Use the function `signal.sweep_poly` instead. * The `io.recaster` module has been deprecated and will be removed in 0.9.0. New features ============ DCT support (scipy.fftpack) --------------------------- New realtransforms have been added, namely dct and idct for Discrete Cosine Transform; type I, II and III are available. Single precision support for fft functions (scipy.fftpack) ---------------------------------------------------------- fft functions can now handle single precision inputs as well: fft(x) will return a single precision array if x is single precision. At the moment, for FFT sizes that are not composites of 2, 3, and 5, the transform is computed internally in double precision to avoid rounding error in FFTPACK. Correlation functions now implement the usual definition (scipy.signal) ----------------------------------------------------------------------- The outputs should now correspond to their matlab and R counterparts, and do what most people expect if the old_behavior=False argument is passed: * correlate, convolve and their 2d counterparts do not swap their inputs depending on their relative shape anymore; * correlation functions now conjugate their second argument while computing the slided sum-products, which correspond to the usual definition of correlation. Additions and modification to LTI functions (scipy.signal) ---------------------------------------------------------- * The functions `impulse2` and `step2` were added to `scipy.signal`. They use the function `scipy.signal.lsim2` to compute the impulse and step response of a system, respectively. 
* The function `scipy.signal.lsim2` was changed to pass any additional keyword arguments to the ODE solver. Improved waveform generators (scipy.signal) ------------------------------------------- Several improvements to the `chirp` function in `scipy.signal` were made: * The waveform generated when `method="logarithmic"` was corrected; it now generates a waveform that is also known as an "exponential" or "geometric" chirp. (See http://en.wikipedia.org/wiki/Chirp.) * A new `chirp` method, "hyperbolic", was added. * Instead of the keyword `qshape`, `chirp` now uses the keyword `vertex_zero`, a boolean. * `chirp` no longer handles an arbitrary polynomial. This functionality has been moved to a new function, `sweep_poly`. A new function, `sweep_poly`, was added. New functions and other changes in scipy.linalg ----------------------------------------------- The functions `cho_solve_banded`, `circulant`, `companion`, `hadamard` and `leslie` were added to `scipy.linalg`. The function `block_diag` was enhanced to accept scalar and 1D arguments, along with the usual 2D arguments. New function and changes in scipy.optimize ------------------------------------------ The `curve_fit` function has been added; it takes a function and uses non-linear least squares to fit that to the provided data. The `leastsq` and `fsolve` functions now return an array of size one instead of a scalar when solving for a single parameter. New sparse least squares solver ------------------------------- The `lsqr` function was added to `scipy.sparse`. `This routine `_ finds a least-squares solution to a large, sparse, linear system of equations. ARPACK-based sparse SVD ----------------------- A naive implementation of SVD for sparse matrices is available in scipy.sparse.linalg.eigen.arpack. It is based on using an symmetric solver on , and as such may not be very precise. Alternative behavior available for `scipy.constants.find` --------------------------------------------------------- The keyword argument `disp` was added to the function `scipy.constants.find`, with the default value `True`. When `disp` is `True`, the behavior is the same as in Scipy version 0.7. When `False`, the function returns the list of keys instead of printing them. (In SciPy version 0.9, the default will be reversed.) Incomplete sparse LU decompositions ----------------------------------- Scipy now wraps SuperLU version 4.0, which supports incomplete sparse LU decompositions. These can be accessed via `scipy.sparse.linalg.spilu`. Upgrade to SuperLU 4.0 also fixes some known bugs. Faster matlab file reader and default behavior change ------------------------------------------------------ We've rewritten the matlab file reader in Cython and it should now read matlab files at around the same speed that Matlab does. The reader reads matlab named and anonymous functions, but it can't write them. Until scipy 0.8.0 we have returned arrays of matlab structs as numpy object arrays, where the objects have attributes named for the struct fields. As of 0.8.0, we return matlab structs as numpy structured arrays. You can get the older behavior by using the optional ``struct_as_record=False`` keyword argument to `scipy.io.loadmat` and friends. There is an inconsistency in the matlab file writer, in that it writes numpy 1D arrays as column vectors in matlab 5 files, and row vectors in matlab 4 files. We will change this in the next version, so both write row vectors. 
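Concretely, both of the behaviours discussed here can be requested through keyword arguments; `struct_as_record` and `oned_as` are the actual keyword names from these notes, while the file and variable names below are invented for illustration:

import numpy as np
import scipy.io

# Reader: keep the pre-0.8.0 behaviour of returning matlab structs
# as object arrays rather than numpy structured arrays.
data = scipy.io.loadmat('results.mat', struct_as_record=False)

# Writer: state explicitly that 1D arrays should be written as row
# vectors, so the on-disk layout does not change between releases.
x = np.arange(5.0)
scipy.io.savemat('out.mat', {'x': x}, oned_as='row')
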
There is a `FutureWarning` when calling the writer to warn of this change; for now we suggest using the ``oned_as='row'`` keyword argument to `scipy.io.savemat` and friends. Faster evaluation of orthogonal polynomials ------------------------------------------- Values of orthogonal polynomials can be evaluated with new vectorized functions in `scipy.special`: `eval_legendre`, `eval_chebyt`, `eval_chebyu`, `eval_chebyc`, `eval_chebys`, `eval_jacobi`, `eval_laguerre`, `eval_genlaguerre`, `eval_hermite`, `eval_hermitenorm`, `eval_gegenbauer`, `eval_sh_legendre`, `eval_sh_chebyt`, `eval_sh_chebyu`, `eval_sh_jacobi`. This is faster than constructing the full coefficient representation of the polynomials, which was previously the only available way. Note that the previous orthogonal polynomial routines will now also invoke this feature, when possible. Lambert W function ------------------ `scipy.special.lambertw` can now be used for evaluating the Lambert W function. Improved hypergeometric 2F1 function ------------------------------------ Implementation of `scipy.special.hyp2f1` for real parameters was revised. The new version should produce accurate values for all real parameters. More flexible interface for Radial basis function interpolation --------------------------------------------------------------- The `scipy.interpolate.Rbf` class now accepts a callable as input for the "function" argument, in addition to the built-in radial basis functions which can be selected with a string argument. Removed features ================ scipy.stsci: the package was removed The module `scipy.misc.limits` was removed. scipy.io -------- The IO code in both NumPy and SciPy is being extensively reworked. NumPy will be where basic code for reading and writing NumPy arrays is located, while SciPy will house file readers and writers for various data formats (data, audio, video, images, matlab, etc.). Several functions in `scipy.io` are removed in the 0.8.0 release including: `npfile`, `save`, `load`, `create_module`, `create_shelf`, `objload`, `objsave`, `fopen`, `read_array`, `write_array`, `fread`, `fwrite`, `bswap`, `packbits`, `unpackbits`, and `convert_objectarray`. Some of these functions have been replaced by NumPy's raw reading and writing capabilities, memory-mapping capabilities, or array methods. Others have been moved from SciPy to NumPy, since basic array reading and writing capability is now handled by NumPy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Wed Jul 28 10:45:59 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Wed, 28 Jul 2010 10:45:59 -0400 Subject: [SciPy-User] SciPy ODE integrator In-Reply-To: <838DE651-0C9F-4713-99D7-4997A234AEC9@gmail.com> References: <838DE651-0C9F-4713-99D7-4997A234AEC9@gmail.com> Message-ID: On 26 July 2010 12:46, David Pine wrote: > Is there a SciPy ODE integrator that does adaptive stepsize integration AND produces output with the adaptive time steps intact? It is not obvious, but the object-oriented integrator, based on VODE, can be run in this mode. You normally tell it how much to advance on each call and it does as many adaptive steps as it takes to get there, but there is an optional argument you can pass it that will make it take just one step of the underlying integrator. You can then write a python loop to produce the solution you want. If this seems messy, I have to agree. 
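For concreteness, and assuming the optional argument Anne refers to is the `step` flag of `integrate()` on the 'vode' integrator, the loop might look roughly like this (the test equation, tolerances and end time are invented for illustration):

import numpy as np
from scipy.integrate import ode

def rhs(t, y):
    return -y          # toy test problem: y' = -y

r = ode(rhs).set_integrator('vode', rtol=1e-8, atol=1e-10)
r.set_initial_value([1.0], 0.0)

t_end = 10.0
ts, ys = [r.t], [r.y.copy()]
while r.successful() and r.t < t_end:
    r.integrate(t_end, step=True)   # ask VODE for a single internal step
    ts.append(r.t)
    ys.append(r.y.copy())

ts = np.array(ts)    # the integrator's own adaptive time points
ys = np.array(ys)

Each pass through the loop records wherever VODE chose to step, which is exactly the "solution at the integrator's own time steps" asked for above.
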
scipy's ODE integrators are in desperate need of an API redesign (they've had one already, which is why there are two completely different interfaces, but they need another). You could try pydstool, which is designed for the study of dynamical systems and has many more tools for working with ODEs and their solutions. Anne > The standard SciPy ODE integrator seems to be scipy.integrate.odeint and its simpler cousin scipy.integrate.ode. ?These work just fine but ?both take a user-specified time series and returns the solution at those points only. ?Often, I prefer to have a more classic adaptive stepsize integrator that returns the solution at time steps determined by the integrator (and the degree of desired precision input by the user). ?This is often the most useful kind of solution because it tends to produce more points where the solution is varying rapidly and fewer where it is not varying much. ?A classic Runge-Kugga adaptive stepsize ODE solver does this as to many others, but I can't find a nice implementation in SciPy or NumPy. ?Please advise. ?Thanks. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lorenzo.isella at gmail.com Wed Jul 28 11:11:01 2010 From: lorenzo.isella at gmail.com (Lorenzo Isella) Date: Wed, 28 Jul 2010 17:11:01 +0200 Subject: [SciPy-User] Turnover Optimization Message-ID: <1280329861.1488.39.camel@rattlesnake> Dear All, I hope this is not too off-topic. I am working on a problem that, without too much details, resembles the time allocation of work shifts. Let us say that you have a sequence of N contiguous time slots (let us assume for now that all the slots have the same duration delta). I also have a set of individuals {x_i} which I need to assign to each time slot with the following conditions (1) at each time slot 2 different individuals x_i and x_j, i!=j, must be at work (2) if x_i works at the m-th time slot, then he cannot work at the m+1th slot (3) each x_i must work a number k_i of time slots such that \sum_i k_i=N. There may be other other (almost endless) rules one may want to add: certain time slots may be unaccessible for an individual or certain coupling x_i,x_j may be forbidden, but you get the idea. How to tackle this problem? I would like to know first of all if, for a given set of boundary there is a solution and if not, what is the solution 'closest' (here I may need to introduce a concept of distance or a penalty function) to the given constraints. Should I try to evolve the system starting from a random allocation of the time slots and rooting out at each generations those allocations which do not respect the constraints? Can the problem be solved exactly in some cases? Also having an idea of the degeneracy of the possible solutions would be good. Any suggestion is appreciated. Best Regard Lorenzo P.S.: it goes without saying that it would be an added bonus to have an algorithm that allows for the easy implementation of extra rules/variations of the existing rules (e.g. three people at work simultaneously and so on) From vanforeest at gmail.com Wed Jul 28 14:27:54 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 28 Jul 2010 20:27:54 +0200 Subject: [SciPy-User] Turnover Optimization In-Reply-To: <1280329861.1488.39.camel@rattlesnake> References: <1280329861.1488.39.camel@rattlesnake> Message-ID: Hi Lorenzo, This appears to me to be a typical scheduling problem. 
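For concreteness, constraints (1)-(3) stated above can be written down directly as a 0-1 integer program. The sketch below uses the pulp package that comes up again later in this thread; the number of slots, the worker names and the requirement vector k are all invented, and the extra rules (forbidden slots, forbidden pairings, and so on) would be added as further constraints of the same kind:

import pulp

n_slots = 6
workers = ['w1', 'w2', 'w3', 'w4']
k = {'w1': 3, 'w2': 3, 'w3': 3, 'w4': 3}   # slots each worker must cover

x = pulp.LpVariable.dicts('x', (workers, range(n_slots)), cat=pulp.LpBinary)
prob = pulp.LpProblem('shift_allocation', pulp.LpMinimize)
# every feasible allocation has the same total, so this objective just
# turns the model into a pure feasibility check
prob += pulp.lpSum([x[w][m] for w in workers for m in range(n_slots)])

for m in range(n_slots):
    # (1) exactly two people per slot
    prob += pulp.lpSum([x[w][m] for w in workers]) == 2
for w in workers:
    # (2) nobody works two consecutive slots
    for m in range(n_slots - 1):
        prob += x[w][m] + x[w][m + 1] <= 1
    # (3) worker w covers exactly k[w] slots
    prob += pulp.lpSum([x[w][m] for m in range(n_slots)]) == k[w]

prob.solve()
status = pulp.LpStatus[prob.status]   # 'Optimal' if a schedule exists, 'Infeasible' otherwise
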
You might consult one of the books of Pinedo, or Peter Brucker, to see whether it has been solved. > (1) at each time slot 2 different individuals x_i and x_j, i!=j, must be > at work > (2) if x_i works at the m-th time slot, then he cannot work at the m+1th > slot You might as well (it seems to me at least) say that once x_i works at the m-th time, he also works at the m+1 th time. Then you have to compensate in the restrictions such that the occupation for each slot is 4, and such that the sum of the allocated slots for each worker is 2N. > (3) each x_i must work a number k_i of time slots such that \sum_i > k_i=N. I am pretty sure your problem is hard, np-complete or something the like. I tried to tackle a similar problem, but I could only solve small instances, 40 workers or so, and I had to use a good IP solver for this, in my case, gurobi. I can send you some code I used for my problem (in pulp), but then send me a private mail to prevent clutter on this mailing list. bye Nicky From emanuele at relativita.com Thu Jul 29 04:47:41 2010 From: emanuele at relativita.com (Emanuele Olivetti) Date: Thu, 29 Jul 2010 10:47:41 +0200 Subject: [SciPy-User] Turnover Optimization In-Reply-To: <1280329861.1488.39.camel@rattlesnake> References: <1280329861.1488.39.camel@rattlesnake> Message-ID: <4C51402D.1030402@relativita.com> On 07/28/2010 05:11 PM, Lorenzo Isella wrote: > Dear All, > I hope this is not too off-topic. > I am working on a problem that, without too much details, resembles the > time allocation of work shifts. > ... Hi Lorenzo, I used to play with this kind of problems while a I was working on a symbolic model checker years ago [0]. Basically you have a language that let you describe the basic variables of the problem and the constraints. Then you state that there is no assignment of the variables (i.e., allocation of shifts) that satisfies all the constraints and let the model checker prove whether you are right in your statement or wrong. If you are wrong, i.e., there is an allocation that 'solves' your problem, then the model checker will find it and present it to you. As far as I remember you don't have the full list of possible solutions as output, but just one solution, the first met by the model checker while exploring the search space. Maybe this approach is an overkill with respect your needs. Surely it is flexible because it has a language with whom you can play and express many different problems. Best, Emanuele [0]: http://nusmv.fbk.eu/ From djpine at gmail.com Thu Jul 29 06:44:19 2010 From: djpine at gmail.com (David Pine) Date: Thu, 29 Jul 2010 06:44:19 -0400 Subject: [SciPy-User] SciPy ODE integrator In-Reply-To: References: <838DE651-0C9F-4713-99D7-4997A234AEC9@gmail.com> Message-ID: <269E30BC-CF34-4971-B306-9A7327B126A4@gmail.com> Anne, Thanks. Actually I finally figured this (the VODE option) out but I agree that scipy's ODE solvers need a makeover. The routines under the hood seem to be quite nice but the interface to Python is clumsy at best and the documentation on how to use it is pretty awful. I'll take a look at pydstool. Thanks. David On Jul 28, 2010, at 10:45 AM, Anne Archibald wrote: > On 26 July 2010 12:46, David Pine wrote: >> Is there a SciPy ODE integrator that does adaptive stepsize integration AND produces output with the adaptive time steps intact? > > It is not obvious, but the object-oriented integrator, based on VODE, > can be run in this mode. 
You normally tell it how much to advance on > each call and it does as many adaptive steps as it takes to get there, > but there is an optional argument you can pass it that will make it > take just one step of the underlying integrator. You can then write a > python loop to produce the solution you want. > > If this seems messy, I have to agree. scipy's ODE integrators are in > desperate need of an API redesign (they've had one already, which is > why there are two completely different interfaces, but they need > another). You could try pydstool, which is designed for the study of > dynamical systems and has many more tools for working with ODEs and > their solutions. > > Anne > >> The standard SciPy ODE integrator seems to be scipy.integrate.odeint and its simpler cousin scipy.integrate.ode. These work just fine but both take a user-specified time series and returns the solution at those points only. Often, I prefer to have a more classic adaptive stepsize integrator that returns the solution at time steps determined by the integrator (and the degree of desired precision input by the user). This is often the most useful kind of solution because it tends to produce more points where the solution is varying rapidly and fewer where it is not varying much. A classic Runge-Kugga adaptive stepsize ODE solver does this as to many others, but I can't find a nice implementation in SciPy or NumPy. Please advise. Thanks. >> >> David >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From thomas.robitaille at gmail.com Thu Jul 29 11:10:14 2010 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Thu, 29 Jul 2010 11:10:14 -0400 Subject: [SciPy-User] Logarithmic interpolation Message-ID: <622E9CD2-DAFA-4A06-9627-A995F1E7D0D8@gmail.com> Hello, For a problem I am trying to solve, I need to perform interpolation, but in log-log space, or linear-log space. The obvious way to do this is to do something like f = interp1d(np.log10(x), np.log10(y)) ynew = 10.**f(np.log10(xnew)) but this can actually be pretty inefficient, because the log of all the values in x and y has to be computed. So for arrays of size e.g. 100, the above requires 201 calls to the log function. A more efficient way for interpolation of a single value would be to basically search where xnew falls in x, say between x[i] and x[i+1], and then do the log interpolation using only 4 calls to log10: ynew = 10.**(np.log10(y[i+1]/y[i]) / np.log10(x[i+1]/x[i]) * np.log10(xnew/x[i]) + np.log10(y[i])) Is there a way to already do this more easily somewhere in scipy.interpolate? Thanks, Thomas From jake.biesinger at gmail.com Sat Jul 31 20:48:28 2010 From: jake.biesinger at gmail.com (Jacob Biesinger) Date: Sat, 31 Jul 2010 17:48:28 -0700 Subject: [SciPy-User] Off by one bug in Scipy.stats.hypergeom Message-ID: Hi! Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one. Specifically, I think it's one-too-high. >From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), scipy.stats.hypergeom? Hypergeometric distribution Models drawing objects from a bin. M is total number of objects, n is total number of Type I objects. 
RV counts number of Type I objects in N drawn without replacement from population. So translating wikipedia's example... Pr(x=4; M=50, n=5, N=10) = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583 Pr(x=5; M=50, n=5, N=10) = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375 Which you can check with the python code: from scipy import comb as chse # "combination" => choose float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1) # example one 0.0039645830580150654155 # okay! float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two 0.00011893749174045196247 # okay! Try example one with scipy.stats.hypergeom: # scipy.stats.hypergeom.sf(x, M, n, N) scipy.stats.hypergeom.sf(4,50,5,10) 0.00011893749169422652 # correct value for x=5, not x=4 scipy.stats.hypergeom.sf(5,50,5,10) -4.6185277824406512e-14 # wrong It seems that changing the loc value from =0 (default) to =1 fixes the issue... scipy.stats.hypergeom.sf(4,50,5,10, loc=1) 0.0040835205497095073 # close enough scipy.stats.hypergeom.sf(5,50,5,10, loc=1) 0.00011893749169422652 # okay! Am I using the package wrong? -- Jake Biesinger Graduate Student Xie Lab, UC Irvine (949) 231-7587 -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Sat Jul 31 21:02:03 2010 From: argriffi at ncsu.edu (alex) Date: Sat, 31 Jul 2010 21:02:03 -0400 Subject: [SciPy-User] Off by one bug in Scipy.stats.hypergeom In-Reply-To: References: Message-ID: On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger wrote: > Hi! > > Perhaps I'm using the module incorrectly, but it looks like the x parameter > in scipy.stats.hypergeom is off by one. Specifically, I think it's > one-too-high. > > From the wikipedia article > http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I > know they could be wrong-- just hear me out on this), > > I often see slight parameterization differences for stats packages and I never assume they are the same as each other or even as their documentation until I test some examples myself. So unless this behavior doesn't match the scipy docs then it probably doesn't count as a bug. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jul 31 22:27:30 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 31 Jul 2010 22:27:30 -0400 Subject: [SciPy-User] Off by one bug in Scipy.stats.hypergeom In-Reply-To: References: Message-ID: On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger wrote: > Hi! > Perhaps I'm using the module incorrectly, but it looks like the x parameter > in scipy.stats.hypergeom is off by one. ?Specifically, I think it's > one-too-high. > From the wikipedia article > http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example?(I > know they could be wrong-- just hear me out on this), > scipy.stats.hypergeom? > Hypergeometric distribution > > ?? ? ? Models drawing objects from a bin. > ?? ? ? M is total number of objects, n is total number of Type I objects. > ?? ? ? RV counts number of Type I objects in N drawn without replacement > from > ?? ? ? population. > So translating wikipedia's example... > Pr(x=4; M=50, n=5, N=10) ?= (choose(5,4) * choose(50-5, 10-4)) / > choose(50,10) = .003964583 > Pr(x=5; M=50, n=5, N=10) ?= (choose(5,5) * choose(50-5, 10-5)) / > choose(50,10) = .0001189375 > Which you can check with the python code: > from scipy import comb as chse ? 
# "combination" => choose > > float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1) > ?# example one > 0.0039645830580150654155 ?# okay! > float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) > # example two > 0.00011893749174045196247 ?# okay! > Try example one with scipy.stats.hypergeom: > # scipy.stats.hypergeom.sf(x, M, n, N) > scipy.stats.hypergeom.sf(4,50,5,10) > 0.00011893749169422652 ? ? # correct value for x=5, not x=4 > scipy.stats.hypergeom.sf(5,50,5,10) > -4.6185277824406512e-14 ? ?# wrong > It seems that changing the loc value from =0 (default) to =1 fixes the > issue... > scipy.stats.hypergeom.sf(4,50,5,10, loc=1) > 0.0040835205497095073 ? ?# close enough > scipy.stats.hypergeom.sf(5,50,5,10, loc=1) > 0.00011893749169422652 ? # okay! > Am I using the package wrong? I don't know why you are using the survival function. >From some quick examples, hypergeom looks ok: pmf has identical results to the Wikipedia example you referenced >>> stats.hypergeom.pmf([4,5],50,5,10) array([ 0.00396458, 0.00011894]) >>> stats.hypergeom.pmf(4,50,5,10) 0.0039645830580151411 >>> stats.hypergeom.pmf(5,50,5,10) 0.00011893749174045286 consistency of pmf, cdf, sf >>> stats.hypergeom.pmf(np.arange(5),50,5,10).sum() 0.9998810625082829 >>> stats.hypergeom.cdf(4,50,5,10) 0.9998810625082829 >>> stats.hypergeom.sf(4,50,5,10) 0.00011893749171709711 >>> 1-stats.hypergeom.pmf(np.arange(5),50,5,10).sum() 0.00011893749171709711 I'm always glad when someone verifies the numbers in scipy stats and appreciate any reports of inconsistencies or bugs. If there are differences in the parameterization, then we should make sure that they are sufficiently documented. Most distributions are internally tested or against numpy.random and there are very few tests with other packages. Thanks, Josef > -- > Jake Biesinger > Graduate Student > Xie Lab, UC Irvine > (949) 231-7587 > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From benedikt.riedel at gmail.com Fri Jul 16 00:53:41 2010 From: benedikt.riedel at gmail.com (Benedikt Riedel) Date: Fri, 16 Jul 2010 04:53:41 -0000 Subject: [SciPy-User] curve_fit missing from scipy.optimize Message-ID: Hello all, I was setting up my new server at the moment and wanted to install scipy on it. I got it all setup thanks to a couple online tutorials. When I tried to run one of my scripts, I got a segmentation fault when it came to importing scipy.optimize. I then used the software manager to install another version of scipy (0.7.0-2 instead of 0.7.2). I then could at least import scipy.optimize, but scipy.optimize.curve_fit could not be found. So I installed 0.7.2 again and now scipy.optimize could be found, but curve_fit was still missing. I looked on google and could only find one solution by replacing the minpack.py file. I tried that and does not seem to work either. Any other ideas or hints? Thanks a lot in advance. Cheers, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From vilkeliskis.t at gmail.com Mon Jul 19 12:20:15 2010 From: vilkeliskis.t at gmail.com (Tadas Vilkeliskis) Date: Mon, 19 Jul 2010 16:20:15 -0000 Subject: [SciPy-User] Porting matlab's ksdensity to scipy Message-ID: <31ca5275-e3b6-4f65-a896-740f9f1eb0ba@l14g2000yql.googlegroups.com> Hi guys! I am trying to port the code written in matlab to python and I got stuck with matlab's ksdensity function. 
The matlab code I have looks something like this: [f, x] = ksdensity(values) I used scipy's gaussian_kde to achieve the same result; however, I modified the matlab's code to f = ksdensity(values, x) and where x are fixed values. When x are fixed gaussian_kde works fine, but this is not what I need. I want to select x based on the input values as in [f, x] = ksdensity(values) which returns x based on the input. Is there a way to do this in scipy? For instance, if I have input values [0, 1, 40, 2, 3, 2] how do I get the range of x values? Thank you very much. Tadas From Post Mon Jul 19 19:40:35 2010 From: Post (Post) Date: Mon, 19 Jul 2010 23:40:35 -0000 Subject: [SciPy-User] NDN: SciPy-User Digest, Vol 83, Issue 34 Message-ID: Sorry. Your message could not be delivered to: Archive Services (Mailbox or Conference is full.) * * * * * * * * * * * * * * * This message, including any attachments, contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, please contact the sender immediately by reply e-mail and destroy all copies. You are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, by anyone other than the intended recipient, is strictly prohibited. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dfeuzs at googlemail.com Wed Jul 21 08:22:03 2010 From: dfeuzs at googlemail.com (bearce meanu) Date: Wed, 21 Jul 2010 14:22:03 +0200 Subject: [SciPy-User] scipy.spatial.distance.mahalanobis & inverse covariance matrix Message-ID: Dear experts, i just switched from matlab to scipy/numpy and i am sorry for this very basic question. my goal is to calculate the mahalanobis distance btw to vectors x & y. Here is my code: from scipy.spatial.distance import mahalanobis import numpy as np x=np.random.normal(size=25) y=np.random.normal(size=25) V = np.linalg.inv(np.cov(np.concatenate((x, y)).T)) # inverse covariance matrix which gives the following error: Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/dist-packages/numpy/linalg/linalg.py", line 355, in inv return wrap(solve(a, identity(a.shape[0], dtype=a.dtype))) IndexError: tuple index out of range what is the appropriate way of calculating the inv cov matrix for this case. I really appreciate your help, BM From John.Grosspietsch at motorola.com Wed Jul 21 13:19:20 2010 From: John.Grosspietsch at motorola.com (Grosspietsch John-AJG001) Date: Wed, 21 Jul 2010 12:19:20 -0500 Subject: [SciPy-User] SciPy-User Digest, Vol 83, Issue 41 Message-ID: <8a74201cb28f8$d36d0a68$2a08b10a@ds.mot.com> "scipy-user-request at scipy.org" wrote: Send SciPy-User mailing list submissions to scipy-user at scipy.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.scipy.org/mailman/listinfo/scipy-user or, via email, send a message with subject or body 'help' to scipy-user-request at scipy.org You can reach the person managing the list at scipy-user-owner at scipy.org When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-User digest..." Today's Topics: 1. 
Re: segfault using _sparse_ svd, eigen, eigen_symmetric with svn 0.9.0.dev6598 (Skipper Seabold) ---------------------------------------------------------------------- Message: 1 Date: Wed, 21 Jul 2010 12:27:18 -0400 From: Skipper Seabold Subject: Re: [SciPy-User] segfault using _sparse_ svd, eigen, eigen_symmetric with svn 0.9.0.dev6598 To: SciPy Users List Cc: pietro.berkes at googlemail.com Message-ID: Content-Type: text/plain; charset=ISO-8859-1 On Wed, Jul 21, 2010 at 11:59 AM, Jose Quesada wrote: > Hi, > > We are on a coding sprint trying to implement sparse matrix support in MDP > (http://sourceforge.net/apps/mediawiki/mdp-toolkit/index.php?title=MDP_Spr int_2010). > The new sparse.linalg is very useful here. > > We are getting segfaults using _sparse_ svd, eigen, eigen_symmetric with svn > 0.9.0.dev6598. I understand that (1) this is an unreleased version, and (2) > these methods may depend on external C and fortran code that could have not > being installed well on my machine, so this may be difficult to debug. I > have added instructions to reproduce the segfault, but please ask for > anything else that could be needed and I'll try to provide it. > > I installed the svn version on a virtualenv using pip:~/.virtualenvs/sprint$ > pip install svn+http://svn.scipy.org/svn/scipy/trunk/#egg=scipyc > > This generates a long log that could contain the explanation, so I posted it > here (going as far back as my terminal's scrollback enabled: > http://pastebin.org/410867 > > Last, here's an example that reproduces the segfault: > > #!/usr/bin/env python > # -*- coding: utf-8 -*- > > #-------------------------- > # simply run an svd on a a sparse matrix, svn 0.9.0.dev6598 > #-------------------------- > import scipy > from scipy import sparse > from numpy.random import rand > > # create random sparse matrix > x = scipy.sparse.lil_matrix((1000000, 1000000)) > x[0, :100] = rand(100) > x[1, 100:200] = x[0, :100] > x.setdiag(rand(1000)) > x = x.tocsr() # convert it to CSR > #v, u, w = scipy.sparse.linalg.eigen_symmetric(x) # segmentation fault > > # try a simpler matrix > y = scipy.sparse.lil_matrix((10, 10)) > y.setdiag(range(10)) > y = y.tocsr() # convert it to CSR > #v, u, w = scipy.sparse.linalg.eigen_symmetric(y) # > I have to import the linalg separately, and my docs say that eigen_symmetric only returns w and v, so I can do import scipy.sparse.linalg as splinalg w, v = splinalg.eigen_symmetric(y) without a segfault. I'm running the most recent git mirror version of scipy. Just installed this morning. I don't know how to check the git concept of a revision number yet... > #./sampleSegFault.py > #Traceback (most recent call last): > ??? #File "./sampleSegFault.py", line 13, in > ??? #x[0, :100] = rand(100) > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/spars e/lil.py", > line 319, in __setitem__ > ??? #x = lil_matrix(x, copy=False) > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/spars e/lil.py", > line 98, in __init__ > ??? #A = csr_matrix(A, dtype=dtype).tolil() > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/spars e/compressed.py", > line 71, in __init__ > ??? #self._set_self( self.__class__(coo_matrix(arg1, dtype=dtype)) ) > ??? #File > "/home/quesada/.virtualenvs/sprint/lib/python2.6/site-packages/scipy/spars e/coo.py", > line 171, in __init__ > ??? #self.data? = M[self.row,self.col] > ??? #ValueError: shape mismatch: objects cannot be broadcast to a single > shape > ??? 
#*** glibc detected *** python: double free or corruption (!prev): > 0x0000000004075ec0 *** > > > # some other linalg methods > ly,v = scipy.sparse.linalg.eig(y) # segmentation fault > I don't have splinalg.eig, but I have splinalg.eigen and it works without segfault. Probably a bad install is my guess. I don't use pip, but you might want to just try building from source and provide the full output of the build process. Skipper ------------------------------ _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user End of SciPy-User Digest, Vol 83, Issue 41 ****************************************** From gdrude at me.com Thu Jul 22 15:38:03 2010 From: gdrude at me.com (JerryRude) Date: Thu, 22 Jul 2010 12:38:03 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] The scipy.test('1', '10') hanging on "make sure it handles relative values... ok" Message-ID: <29239203.post@talk.nabble.com> Thank you for taking the time to read this. After 2 hours of google I have not found a similar problem. I have installed EPD for the purpose of using scipy and matplotlib. When I ran the scipy.test('1','10') to test the install the function hangs after the test "make sure it handles relative values... ok". If someone happens to have some advice on what may be going on I would appreciate it. I am running this install on a Mac OSX Leopard 10.6.4 on a brand new 13" macbook pro. gfortran --version GNU Fortran (GCC) 4.4.1 Copyright (C) 2009 Free Software Foundation, Inc. gcc --version i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659) EPD Version 6.2-2 python -V Python 2.6.5 -- EPD 6.2-2 (32-bit) Cheers, Jerry -- View this message in context: http://old.nabble.com/The-scipy.test%28%271%27%2C%2710%27%29-hanging-on-%22make-sure-it-handles-relative-values...-ok%22-tp29239203p29239203.html Sent from the Scipy-User mailing list archive at Nabble.com. From newton at tethers.com Sun Jul 25 13:13:57 2010 From: newton at tethers.com (Tyrel Newton) Date: Sun, 25 Jul 2010 10:13:57 -0700 Subject: [SciPy-User] memory errors when using savemat In-Reply-To: References: <1622F899-A270-431B-89B6-A31C118CD71B@tethers.com> Message-ID: On Jul 25, 2010, at 9:49 AM, Pauli Virtanen wrote: > Sat, 24 Jul 2010 10:37:33 -0700, Tyrel Newton wrote: >> I'm trying to use scipy.io.savemat to export a very large set of data to >> a .mat file. The dataset contains around 20 million floats. When I try >> to export this to a .mat file, I get a MemoryError. The specific >> MemoryError is: >> >> File "C:\Python26\lib\site-packages\scipy\io\matlab\miobase.py", line >> 557 in write_bytes >> self.file_stream.write(arr.tostring(order='F')) > > What is the complete error message? -- It typically indicates the > specific part in C code the error originates from. (The full traceback, > thanks!) Attached is a Windows command line screenshot my colleague captured. > > On the other hand, along that code path it seems the only source of a > MemoryError can really be a failure to allocate memory for the tostring. > Your data apparently needs 160 MB free for this to succeed -- which is > not so much. So the question comes to what is the memory usage of the > code when saving, compared to the available free memory? > Yeah, my theory is that the tostring process is basically trying to duplicate the memory usage by creating a string that is then written to the file. 
This seems like an inefficient way to do it, but my understanding of the code is limited, so I'm probably missing something. -------------- next part -------------- A non-text attachment was scrubbed... Name: memory_error_cropped.png Type: image/png Size: 108251 bytes Desc: not available URL: From hua.wong at pasteur.fr Wed Jul 28 08:34:44 2010 From: hua.wong at pasteur.fr (Hua Wong) Date: Wed, 28 Jul 2010 14:34:44 +0200 Subject: [SciPy-User] Scipy.io .mat files are bigger than those made with Matlab Message-ID: <4C5023E4.5040900@pasteur.fr> Is there any option I should set to make these .mat smaller? One 2726*2726 matrix gives 10Mo with matlab and goes up to 57Mo when exported with scipy.io From andreas.hardock at isc.fraunhofer.de Thu Jul 29 02:09:03 2010 From: andreas.hardock at isc.fraunhofer.de (Hardock, Andreas) Date: Thu, 29 Jul 2010 08:09:03 +0200 Subject: [SciPy-User] Linear interpolation in 3D Message-ID: Hi, I'm trying to interpolate a 3D data (from the pic attached) with the interp2d command. What I have, are three vectors f, z, A (x, y, z respectively, A is the percentage data given on the isolines). I first put the f and z in a meshgrid and afterwards in the griddata to get a 3D-grid then started the interpolateion. I plotted the the data after gridding, and I observed that almost all nodes are ignored. Do you have any idea how to prepare data to the interp2d command? Don't hesitate to suggest any other solution. my code so far is: import numpy as np from mpl_toolkits.mplot3d import axes3d from scipy.interpolate import interp2d import matplotlib.pyplot as plt from matplotlib import mlab plt.clf() fig = plt.figure(1) ax = axes3d.Axes3D(fig) #read data (ff,ZZ,A,a) = np.loadtxt("accuracy-map.txt", unpack=True) f=np.log10(ff) z=np.log10(ZZ) ##grid everything fgrid, zgrid=np.meshgrid(f,z) #define grid ef=np.linspace(min(f), max(f), len(f)) ez=np.linspace(min(z), max(z), len(f)) Agrid=mlab.griddata(f,z,A, ef,ez) int2d=interp2d(fgrid, zgrid, Agrid, kind='linear') ax.plot(f, z, A, 'ok', markerfacecolor='w') ax.plot_surface(fgrid, zgrid, Agrid) ax.set_xlim3d((min(f), max(f))) ax.set_ylim3d(min(z), max(z)) ax.set_zlim3d(0,100) plt.show() -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: novo-error.pdf Type: application/pdf Size: 151539 bytes Desc: novo-error.pdf URL:
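One alternative worth sketching for the scattered (f, z, A) samples above is the radial basis function interpolator, which works on the scattered points directly and so sidesteps the gridding and ordering mismatch between meshgrid, griddata and interp2d in the script. This is only a sketch under the assumption that accuracy-map.txt has the same four-column layout as in the original code, not a tested rewrite:

import numpy as np
from scipy.interpolate import Rbf

# read and transform the scattered samples exactly as in the original script
ff, ZZ, A, a = np.loadtxt("accuracy-map.txt", unpack=True)
f = np.log10(ff)
z = np.log10(ZZ)

# build an interpolant on the scattered points; 'thin_plate' or the
# default multiquadric are other possible basis functions
rbf = Rbf(f, z, A, function='linear')

# evaluate on a regular grid for plotting
ef = np.linspace(f.min(), f.max(), 200)
ez = np.linspace(z.min(), z.max(), 200)
fgrid, zgrid = np.meshgrid(ef, ez)
Agrid = rbf(fgrid, zgrid)

The resulting Agrid can then be handed to plot_surface together with fgrid and zgrid, as in the original plotting code.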