From gael.varoquaux at normalesup.org  Wed Oct  1 01:26:26 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 1 Oct 2008 07:26:26 +0200
Subject: [Numpy-discussion] Proposal: scipy.spatial
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <20080930213423.GB22173@phare.normalesup.org>
Message-ID: <20081001052626.GA22435@phare.normalesup.org>

On Tue, Sep 30, 2008 at 06:10:46PM -0400, Anne Archibald wrote:
> > k=None in the third call to T.query seems redundant. It should be
> > possible to put some logic in so that the call is simply
>
> > distances, indices = T.query(xs, distance_upper_bound=1.0)

> Well, the problem with this is that you often want to provide a
> distance upper bound as well as a number of nearest neighbors.

Absolutely. I just think k should default to None when
distance_upper_bound is specified. k=None could be interpreted as k=1
when distance_upper_bound is not specified.

My 2 cents,

Gaël
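Spelled out as code, the defaulting rule Gael describes might look like
this (a sketch only; the helper methods are hypothetical, not part of
Anne's implementation):

import numpy as np

def query(tree, x, k=None, distance_upper_bound=np.inf):
    # Hypothetical dispatch: an explicit bound with k=None means
    # "everything within the bound"; with no bound, k=None falls
    # back to the single nearest neighbour.
    if k is None:
        if np.isinf(distance_upper_bound):
            k = 1
        else:
            # hypothetical helper: all neighbours within the bound
            return tree.query_all_within(x, distance_upper_bound)
    # hypothetical helper: k nearest, optionally bounded
    return tree.query_k_nearest(x, k, distance_upper_bound)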
From barrywark at gmail.com  Wed Oct  1 02:08:24 2008
From: barrywark at gmail.com (Barry Wark)
Date: Tue, 30 Sep 2008 23:08:24 -0700
Subject: [Numpy-discussion] Proposal: scipy.spatial

On Mon, Sep 29, 2008 at 8:24 PM, Anne Archibald wrote:
> Hi,
>
> Once again there has been a thread on the numpy/scipy mailing lists
> requesting (essentially) some form of spatial data structure. Pointers
> have been posted to ANN (sadly LGPLed and in C++) as well as a handful
> of pure-python implementations of kd-trees. I suggest the creation of
> a new submodule of scipy, scipy.spatial, to contain spatial data
> structures and algorithms. Specifically, I propose it contain a
> kd-tree implementation providing nearest-neighbor, approximate
> nearest-neighbor, and all-points-near queries. I have a few other
> suggestions for things it might contain, but kd-trees seem like a good
> first step.
>
> 2008/9/27 Nathan Bell :
>> On Sat, Sep 27, 2008 at 11:18 PM, Anne Archibald wrote:
>>>
>>> I think a kd-tree implementation would be a valuable addition to
>>> scipy, perhaps in a submodule scipy.spatial that might eventually
>>> contain other spatial data structures and algorithms. What do you
>>> think? Should we have one? Should it be based on Sturla Molden's
>>> code, if the license permits? I am willing to contribute one, if not.
>>
>> +1
>
> Judging that your vote and mine are enough in the absence of
> dissenting voices, I have written an implementation based on yours,
> Sturla Molden's, and the algorithms described by the authors of the
> ANN library. Before integrating it into scipy, though, I'd like to
> send it around for comments.
>
> Particular issues:
>
> * It's pure python right now, but with some effort it could be
> partially or completely cythonized. This is probably a good idea in
> the long run. In the meantime I have crudely vectorized it so that
> users can at least avoid looping in their own code.
> * It is able to return the r nearest neighbors, optionally subject to
> a maximum distance limit, as well as approximate nearest neighbors.
> * It is not currently set up to conveniently return all points within
> some fixed distance of the target point, but this could easily be
> added.
> * It returns distances and indices of nearest neighbors in the
> original array.
> * The testing code is, frankly, a mess. I need to look into using nose
> in a sensible fashion.
> * The license is the scipy license.
>
> I am particularly concerned about providing a convenient return
> format. The natural return from a query is a list of neighbors, since
> it may have variable length (there may not be r elements in the tree,
> or you might have supplied a maximum distance which doesn't contain r
> points). For a single query, it's simple to return a python list
> (should it be sorted? currently it's a heap). But if you want to
> vectorize the process, storing the results in an array becomes
> cumbersome. One option is an object array full of lists; another,
> which I currently use, is an array with nonexistent points marked
> with an infinite distance and an invalid index. A third would be to
> return masked arrays. How do you recommend handling this variable (or
> potentially-variable) sized output?
>
>> If you're implementing one, I would highly recommend the
>> "left-balanced" kd-tree.
>> http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/2535/pdf/imm2535.pdf
>
> Research suggests that at least in high dimension a more geometric
> balancing criterion can produce vastly better results. I used the
> "sliding midpoint" rule, which doesn't allow such a numerical
> balancing but does ensure that you don't have too many long skinny
> cells (since a sphere tends to cut very many of these).
>
> I've also been thinking about what else would go in scipy.spatial. I
> think it would be valuable to have a reasonably efficient distance
> matrix function (we seem to get that question a lot, and the answer's
> not trivial) as well as a sparse distance matrix function based on the
> kd-trees. The module could potentially also swallow the (currently
> sandboxed?) delaunay code.
>
> Anne

Anne,

Thanks for taking this on. The scikits.ann has licensing issues (as
noted above), so it would be nice to have a clean-room implementation
in scipy. I am happy to port the scikits.ann API to the final API that
you choose, however, if you think that would be helpful.

Cheers,
Barry

From wnbell at gmail.com  Wed Oct  1 02:10:13 2008
From: wnbell at gmail.com (Nathan Bell)
Date: Wed, 1 Oct 2008 02:10:13 -0400
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To: <20081001052626.GA22435@phare.normalesup.org>
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <20080930213423.GB22173@phare.normalesup.org> <20081001052626.GA22435@phare.normalesup.org>

On Wed, Oct 1, 2008 at 1:26 AM, Gael Varoquaux wrote:
>
> Absolutely. I just think k should default to None when
> distance_upper_bound is specified. k=None could be interpreted as k=1
> when distance_upper_bound is not specified.
>

Why not expose the various possibilities through different names?

# nearest k points (possibly fewer)
query_nearest(pt, k=1)

# all points within given distance
query_sphere(pt, distance)

# nearest k points within given distance (possibly fewer)
query(pt, k, distance)

Few people will use the last form, but it's useful nevertheless.

-- 
Nathan Bell wnbell at gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
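As an illustration of how the three entry points could layer over one
core routine, here is a brute-force stand-in (a sketch only; the class
and method names follow Nathan's suggestion and are not Anne's actual
kd-tree):

import numpy as np

class BruteForceTree:
    """Stand-in for the proposed kd-tree, for illustrating the API."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def query(self, pt, k, distance_upper_bound):
        # nearest k points within the given distance (possibly fewer)
        d = np.sqrt(((self.data - pt) ** 2).sum(axis=-1))
        order = np.argsort(d)
        order = order[d[order] <= distance_upper_bound]
        if k is not None:
            order = order[:k]
        return d[order], order

    def query_nearest(self, pt, k=1):
        # nearest k points (possibly fewer)
        return self.query(pt, k, np.inf)

    def query_sphere(self, pt, distance):
        # all points within the given distance
        return self.query(pt, None, distance)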
From schut at sarvision.nl  Wed Oct  1 04:47:09 2008
From: schut at sarvision.nl (Vincent Schut)
Date: Wed, 01 Oct 2008 10:47:09 +0200
Subject: [Numpy-discussion] xml-rpc with numpy arrays
References: <98DE1D44-A0B6-4CC6-8FD0-11F187D20862@bryant.edu> <200809301318.36701.binet@cern.ch> <200809301542.44608.binet@cern.ch> <1861310C-ED7D-4554-83BF-5751549041E7@bryant.edu>

Lisandro Dalcin wrote:
>
> I believe xmlrpclib is currently the simpler approach. Some day I'll
> have the time to implement something similar using MPI communication
> with mpi4py. However, I believe it can be done even better: local,
> client-side proxies should automatically provide access to all
> members/methods of remote, server-side instances. The registering step
> needed within xmlrpclib is a bit ugly ;-)
>

Try pyro: or rpyc: , both of which, if I recall correctly, do
implement this.

VS.

From peridot.faceted at gmail.com  Wed Oct  1 05:41:54 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Wed, 1 Oct 2008 05:41:54 -0400
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To: <20081001052626.GA22435@phare.normalesup.org>
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <20080930213423.GB22173@phare.normalesup.org> <20081001052626.GA22435@phare.normalesup.org>

2008/10/1 Gael Varoquaux :
> On Tue, Sep 30, 2008 at 06:10:46PM -0400, Anne Archibald wrote:
>> > k=None in the third call to T.query seems redundant. It should be
>> > possible to put some logic in so that the call is simply
>
>> > distances, indices = T.query(xs, distance_upper_bound=1.0)
>
>> Well, the problem with this is that you often want to provide a
>> distance upper bound as well as a number of nearest neighbors.
>
> Absolutely. I just think k should default to None when
> distance_upper_bound is specified. k=None could be interpreted as k=1
> when distance_upper_bound is not specified.

That seems very confusing. Better perhaps to have a function
query_all_neighbors, even if it's just a wrapper.

Anne

From bblais at bryant.edu  Wed Oct  1 05:50:47 2008
From: bblais at bryant.edu (Brian Blais)
Date: Wed, 1 Oct 2008 05:50:47 -0400
Subject: [Numpy-discussion] xml-rpc with numpy arrays
References: <98DE1D44-A0B6-4CC6-8FD0-11F187D20862@bryant.edu> <200809301318.36701.binet@cern.ch> <200809301542.44608.binet@cern.ch> <1861310C-ED7D-4554-83BF-5751549041E7@bryant.edu>
Message-ID: <7793B4DF-E7AA-4D33-8643-A667EC137C1C@bryant.edu>

On Sep 30, 2008, at 23:16, Lisandro Dalcin wrote:
> On Tue, Sep 30, 2008 at 9:27 PM, Brian Blais wrote:
>> thanks for all of the help. My initial solution is to pickle my
>> object, with the text-based version of pickle, and send it across
>> rpc. I do this because the actual thing I am sending is a dictionary,
>> with lots of arrays, and other things. I'll have a look at the format
>> that Robert sent, because that looks useful for other things I am
>> doing.
>
> Did you try to send binary pickles (protocol=2)? Perhaps it works,
> give it a try! Of course, you need the client and server machines
> having the same arch.

I tried that first, and marshal doesn't handle binary streams, so
that's when I tried the text version. I'm not sending a lot of info,
so it's not that big of a deal. Most of the time is spent on the
simulation on the other end.

bb

-- 
Brian Blais
bblais at bryant.edu
http://web.bryant.edu/~bblais
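A minimal sketch of the binary-pickle route discussed here, wrapped in
xmlrpclib.Binary so the bytes survive the XML transport (it assumes
both ends run compatible Python and NumPy versions, per Lisandro's
same-arch caveat):

import cPickle as pickle
import xmlrpclib

def pack(obj):
    # Protocol-2 pickles are binary; xmlrpclib.Binary base64-encodes
    # them so they pass through the XML-RPC layer intact.
    return xmlrpclib.Binary(pickle.dumps(obj, protocol=2))

def unpack(wrapped):
    return pickle.loads(wrapped.data)

A dictionary full of arrays round-trips through pack()/unpack()
unchanged, which avoids the text-pickle detour Brian describes.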
From peridot.faceted at gmail.com  Wed Oct  1 05:57:51 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Wed, 1 Oct 2008 05:57:51 -0400
Subject: [Numpy-discussion] Proposal: scipy.spatial

2008/10/1 Barry Wark :
> Thanks for taking this on. The scikits.ann has licensing issues (as
> noted above), so it would be nice to have a clean-room implementation
> in scipy. I am happy to port the scikits.ann API to the final API that
> you choose, however, if you think that would be helpful.

That's a nice idea. I'm not totally sure yet how much it's going to be
possible for different implementations to be plug-in replacements, but
it sure would be nice if users could use ANN transparently.

Anne

From philbinj at gmail.com  Wed Oct  1 08:40:20 2008
From: philbinj at gmail.com (James Philbin)
Date: Wed, 1 Oct 2008 13:40:20 +0100
Subject: [Numpy-discussion] Proposal: scipy.spatial
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com>
Message-ID: <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com>

> distances, indices = T.query(xs) # single nearest neighbor

I'm not sure if it's implied, but can xs be an NxD matrix here, i.e.
query for all N points rather than just one? This will reduce the
python call overhead for large queries.

Also, I have some C++ code for locality sensitive hashing which might
be useful?

James

From dmitrey.kroshko at scipy.org  Wed Oct  1 09:04:27 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Wed, 01 Oct 2008 16:04:27 +0300
Subject: [Numpy-discussion] why type(array(1).tolist()) is int?
Message-ID: <48E3755B.2050805@scipy.org>

hi all,
why does array(1).tolist() return 1? I expected to get [1] instead.
D.

From dmitrey.kroshko at scipy.org  Wed Oct  1 09:34:30 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Wed, 01 Oct 2008 16:34:30 +0300
Subject: [Numpy-discussion] why type(array(1).tolist()) is int?
In-Reply-To: <48E3755B.2050805@scipy.org>
References: <48E3755B.2050805@scipy.org>
Message-ID: <48E37C66.3020805@scipy.org>

let me also note that list(array((1))) returns

Traceback (innermost last):
  File "", line 1, in
TypeError: iteration over a 0-d array

D.

dmitrey wrote:
> hi all,
> why does array(1).tolist() return 1? I expected to get [1] instead.
> D.

From dmitrey.kroshko at scipy.org  Wed Oct  1 09:38:37 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Wed, 01 Oct 2008 16:38:37 +0300
Subject: [Numpy-discussion] will array(Python set) be ever implemented as cast method?
Message-ID: <48E37D5D.2060903@scipy.org>

hi all,
will array(Python set) (and asarray, asfarray etc) ever be implemented
as a cast method? Now it just puts the set into the 1st element:

>>> asarray(set([11, 12, 13, 14]))
array(set([11, 12, 13, 14]), dtype=object)
>>> array(set([11, 12, 13, 14]))
array(set([11, 12, 13, 14]), dtype=object)

Currently I use array(list(my_set)) instead.
D.

From aisaac at american.edu  Wed Oct  1 09:46:47 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 01 Oct 2008 09:46:47 -0400
Subject: [Numpy-discussion] why type(array(1).tolist()) is int?
In-Reply-To: <48E3755B.2050805@scipy.org>
References: <48E3755B.2050805@scipy.org>
Message-ID: <48E37F47.2080208@american.edu>

On 10/1/2008 9:04 AM dmitrey apparently wrote:
> why does array(1).tolist() return 1? I expected to get [1] instead.

I guess I would expect it not to work at all.
Given that it does work, this seems the best result.
What list shape matches the shape of a 0-d array?

What is the use case that makes this seem wrong?

Alan Isaac
From oliphant at enthought.com  Wed Oct  1 09:58:32 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Wed, 01 Oct 2008 08:58:32 -0500
Subject: [Numpy-discussion] why type(array(1).tolist()) is int?
In-Reply-To: <48E37C66.3020805@scipy.org>
References: <48E3755B.2050805@scipy.org> <48E37C66.3020805@scipy.org>
Message-ID: <48E38208.2090104@enthought.com>

dmitrey wrote:
> let me also note that list(array((1))) returns
>
> Traceback (innermost last):
>   File "", line 1, in
> TypeError: iteration over a 0-d array
>

This is expected. 0-d arrays are currently not iterable.

-Travis

From oliphant at enthought.com  Wed Oct  1 09:59:41 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Wed, 01 Oct 2008 08:59:41 -0500
Subject: [Numpy-discussion] will array(Python set) be ever implemented as cast method?
In-Reply-To: <48E37D5D.2060903@scipy.org>
References: <48E37D5D.2060903@scipy.org>
Message-ID: <48E3824D.5020503@enthought.com>

dmitrey wrote:
> hi all,
> will array(Python set) (and asarray, asfarray etc) ever be implemented
> as a cast method?
>

Use fromiter instead. We could special-case set objects in array(...)
if that is deemed desirable.

-Travis

From dmitrey.kroshko at scipy.org  Wed Oct  1 09:57:26 2008
From: dmitrey.kroshko at scipy.org (dmitrey)
Date: Wed, 01 Oct 2008 16:57:26 +0300
Subject: [Numpy-discussion] why type(array(1).tolist()) is int?
In-Reply-To: <48E37F47.2080208@american.edu>
References: <48E3755B.2050805@scipy.org> <48E37F47.2080208@american.edu>
Message-ID: <48E381C6.8010800@scipy.org>

Alan G Isaac wrote:
> On 10/1/2008 9:04 AM dmitrey apparently wrote:
>> why does array(1).tolist() return 1? I expected to get [1] instead.
>
> I guess I would expect it not to work at all.
> Given that it does work, this seems the best result.
> What list shape matches the shape of a 0-d array?
>
> What is the use case that makes this seem wrong?

Because I just expect something.tolist() to return *type* list, not
*type* integer. The tolist documentation says "Return the array as a
list or nested lists" and says nothing about the possibility of
returning anything else.

As for my situation, I store the list in my data field and then call

for item in prob.my_list: do_something()

D.
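A quick interactive summary of the two behaviours discussed in these
threads (np.fromiter is the cast Travis suggests):

import numpy as np

np.array(1).tolist()     # -> 1: a 0-d array unwraps to a Python
                         #    scalar, not [1]
np.array([1]).tolist()   # -> [1]

s = set([11, 12, 13, 14])
np.fromiter(s, dtype=int)  # set -> 1-d array; element order is
                           # whatever iteration over the set yields
np.array(list(s))          # dmitrey's workaround gives the same result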
From travis at enthought.com  Wed Oct  1 10:36:54 2008
From: travis at enthought.com (Travis Vaught)
Date: Wed, 1 Oct 2008 09:36:54 -0500
Subject: [Numpy-discussion] Texas Python Regional Unconference Reminders
Message-ID: <0107962E-D762-497B-BCEC-24CBD78B381B@enthought.com>

Greetings,

The Texas Python Regional Unconference is coming up this weekend
(October 4-5) and I wanted to send out some more details of the
meeting. The web page for the meeting is here:

http://www.scipy.org/TXUncon2008

The meeting is _absolutely free_, so please add yourself to the
Attendees page if you're able to make it. Also, if you're planning to
attend, please send me the following information (to
travis at enthought.com) so I can request wireless access for you
during the meeting:

- Full Name
- Phone or email
- Address
- Affiliation

There are still opportunities to present your pet projects at the
meeting, so feel free to sign up on the presentation schedule here:

http://www.scipy.org/TXUncon2008Schedule

For those who are in town Friday evening, we're planning to get
together for a casual dinner in downtown Austin that night. We'll meet
at the Enthought offices
(http://www.enthought.com/contact/map-directions.php) and walk to a
casual restaurant nearby. Show up as early as 5:30pm and you can hang
out and tour the Enthought offices--we'll head out to eat at 7:00pm
sharp.

Best,

Travis

From lists_ravi at lavabit.com  Wed Oct  1 15:21:29 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Wed, 1 Oct 2008 15:21:29 -0400
Subject: [Numpy-discussion] PyArray_Resize reference counting
In-Reply-To: <48E2AE1D.9080802@enthought.com>
References: <200809301548.10782.lists_ravi@lavabit.com> <200809301720.53958.lists_ravi@lavabit.com> <48E2AE1D.9080802@enthought.com>
Message-ID: <200810011521.29476.lists_ravi@lavabit.com>

On Tuesday 30 September 2008 18:54:21 Travis E. Oliphant wrote:
> I just went to the code and noticed that PyArray_Resize returns None.
> So, you certainly don't want to point array to it. The array does not
> get any reference count changes.

Thanks for the very clear explanation.

> PyObject *dummy;
> dummy = PyArray_Resize(array, ...)
> if (dummy == NULL) goto fail;
> Py_DECREF(dummy)
>
> is what you need to do.

If dummy is NULL, is the original array guaranteed to be unchanged
(strong exception safety with rollback semantics)? If not, is the
original array guaranteed to be at least in an internally consistent
state (weak exception safety)? In general, do functions modifying numpy
arrays provide at least a weak exception safety guarantee? Or do they
go one step further and provide rollback semantics in case of
exceptions?

Regards,
Ravi

From dblubaugh at belcan.com  Wed Oct  1 16:04:13 2008
From: dblubaugh at belcan.com (Blubaugh, David A.)
Date: Wed, 1 Oct 2008 16:04:13 -0400
Subject: [Numpy-discussion] f2py IS NOW WORKING
Message-ID: <27CC3060AF71DA40A5DC85F7D5B70F380544D48C@AWMAIL04.belcan.com>

To all,

I have now been able to develop a stable file via f2py!! However, I had
to execute the following:

1.) First, I had to copy all required library files from my selected
Compaq Visual Fortran compiler into python's scripts directory, along
with f2py itself.

2.) I also had to include a dll from my compiler in python's dll
directory as well.

I know that the reason I needed to take these actions is that I do not
know the correct environment variables for Windows XP running Compaq
Visual Fortran 6.6.

Once again, I would appreciate knowing which environment variables I
should set under Windows XP, given that the compiler I must utilize is
the Compaq Visual Fortran Compiler 6.6.

Thanks,

David Blubaugh
From oc-spam66 at laposte.net  Wed Oct  1 16:46:10 2008
From: oc-spam66 at laposte.net (oc-spam66)
Date: Wed, 1 Oct 2008 22:46:10 +0200 (CEST)
Subject: [Numpy-discussion] Vectorization of the product of several matrices ?
Message-ID: <17478156.887841222893970667.JavaMail.www@wwinf8216>

Hello and thank you for your answer.

> There are at least three methods I can think of, but choosing the
> best one requires more information. How long are the lists? Do the
> arrays have variable dimensions? The simplest and most adaptable
> method is probably

The lists would be made of 4x4 matrices:

LM = [M_i, i=1..N], M_i 4x4 matrix
LN = [N_i, i=1..N], N_i 4x4 matrix.

N would be 1000 or more (why not 100000... if the computation is not
too long).

> In [3]: P = [m*n for m,n in zip(M,N)]

Thank you for this one. I am curious about other possibilities. And
also: is there a document about how the python interpreter works?
(order of the operations, typical timings, ...)

Regards,

O.C.

From mailinglist.honeypot at gmail.com  Wed Oct  1 20:56:18 2008
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Wed, 1 Oct 2008 20:56:18 -0400
Subject: [Numpy-discussion] Texas Python Regional Unconference Reminders
In-Reply-To: <0107962E-D762-497B-BCEC-24CBD78B381B@enthought.com>
References: <0107962E-D762-497B-BCEC-24CBD78B381B@enthought.com>

Hi,

Are there any plans to tape the presentations? Unfortunately some of
us can't make it down to Texas, but the talks look quite interesting.

Thanks,
-steve

On Oct 1, 2008, at 10:36 AM, Travis Vaught wrote:
> Greetings,
>
> The Texas Python Regional Unconference is coming up this weekend
> (October 4-5) and I wanted to send out some more details of the
> meeting. [...]
From charlesr.harris at gmail.com  Wed Oct  1 21:16:02 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 1 Oct 2008 19:16:02 -0600
Subject: [Numpy-discussion] Vectorization of the product of several matrices ?
In-Reply-To: <17478156.887841222893970667.JavaMail.www@wwinf8216>
References: <17478156.887841222893970667.JavaMail.www@wwinf8216>

On Wed, Oct 1, 2008 at 2:46 PM, oc-spam66 wrote:
> The lists would be made of 4x4 matrices:
>
> LM = [M_i, i=1..N], M_i 4x4 matrix
> LN = [N_i, i=1..N], N_i 4x4 matrix.
>
> N would be 1000 or more (why not 100000... if the computation is not
> too long).
> [...]

For stacks of small matrices of same dimensions stored in three
dimensional arrays (not matrices), you can do

In [1]: M = array([eye(4)]*2)

In [2]: N = array([eye(4)]*2)

In [3]: P = (M[...,:,:,newaxis]*N[...,newaxis,::]).sum(axis=-2)

In [4]: P
Out[4]:
array([[[ 1.,  0.,  0.,  0.],
        [ 0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.]],

       [[ 1.,  0.,  0.,  0.],
        [ 0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.]]])

This will use more memory than the first alternative. Whether or not
it might be faster and what the tradeoffs are will depend on the
problem.

Chuck

From f.yw at hotmail.com  Wed Oct  1 21:27:27 2008
From: f.yw at hotmail.com (frank wang)
Date: Wed, 1 Oct 2008 19:27:27 -0600
Subject: [Numpy-discussion] Help to process a large data file

Hi,

I have a large data file which contains 2 columns of data. The two
columns contain only zeros and ones. Now I want to count how many ones
are in between the rows where both columns are one. For example, if my
data is:

1 0
0 0
1 1
0 0
0 1 x
0 1 x
0 0
0 1 x
1 1
0 0
0 1 x
0 1 x
1 1

Then my counts will be 3 and 2 (the rows marked with x).

Is there an efficient way to do this? My data file is pretty big.

Thanks,

Frank

From charlesr.harris at gmail.com  Wed Oct  1 21:42:36 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 1 Oct 2008 19:42:36 -0600
Subject: [Numpy-discussion] Vectorization of the product of several matrices ?
References: <17478156.887841222893970667.JavaMail.www@wwinf8216>

On Wed, Oct 1, 2008 at 7:16 PM, Charles R Harris wrote:
>
> In [3]: P = (M[...,:,:,newaxis]*N[...,newaxis,::]).sum(axis=-2)
>

Should be

In [5]: P = (M[...,:,:,newaxis]*N[...,newaxis,:,:]).sum(axis=-2)

Chuck
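As a sanity check, the corrected broadcasting expression can be
compared against an explicit loop over np.dot (a quick sketch with
random stacks of 4x4 matrices):

import numpy as np

n = 1000
M = np.random.rand(n, 4, 4)
Nmat = np.random.rand(n, 4, 4)

# Chuck's corrected broadcasting form: element [s,i,j,k] of the
# product is M[s,i,j]*Nmat[s,j,k]; summing over axis -2 (the j axis)
# gives the stack of matrix products.
P = (M[..., :, :, np.newaxis] * Nmat[..., np.newaxis, :, :]).sum(axis=-2)

# Reference: a plain Python loop over np.dot
P_loop = np.array([np.dot(m, x) for m, x in zip(M, Nmat)])
assert np.allclose(P, P_loop)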
From charlesr.harris at gmail.com  Wed Oct  1 22:41:31 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 1 Oct 2008 20:41:31 -0600
Subject: [Numpy-discussion] Portable functions for nans, signbit, etc.

The normal signbit function of gcc without the -std=c99 flag doesn't
work correctly for nans and infs. I found the following code on a
boost mailing list and it might be helpful here for portability.

const boost::uint32_t signbit_mask =
    binary_cast<boost::uint32_t>(1.0f) ^ binary_cast<boost::uint32_t>(-1.0f);

inline bool signbit(float x)
{
    return binary_cast<boost::uint32_t>(x) & signbit_mask;
}

inline bool signbit(double x)
{
    return signbit(static_cast<float>(x));
}

inline bool signbit(long double x)
{
    return signbit(static_cast<float>(x));
}

Which is rather clever. I think binary_cast will require some pointer
abuse. There are a bunch of other boost functions here that might
prove useful. This file

floating_point_utilities_v3.zip
portable isnan, fpclassify, signbit etc. + facets for portable
handling of infinity and NaN in text streams

looks particularly interesting. It's a bit large for the mailing list
so I won't attach it. The Boost license should be compatible with
numpy.

Chuck

From johngu at gmail.com  Thu Oct  2 00:15:38 2008
From: johngu at gmail.com (John Gu)
Date: Wed, 1 Oct 2008 23:15:38 -0500
Subject: [Numpy-discussion] complex numpy.ndarray dtypes
Message-ID: <79258dea0810012115o1a77aebbg429d471d42e06d3c@mail.gmail.com>

Hello,

I am using numpy in conjunction with pyTables. The data that I read in
from pyTables seems to have the following dtype:

p = hdf5.root.myTable.read()

p.__class__
<class 'numpy.ndarray'>

p[0].__class__
<class 'numpy.void'>

p.dtype
dtype([('time', '<f4'), ('obs1', '<f4'), ('obs2', '<f8'), ('obs3', '<f4')])

p.shape
(61230,)

The manner in which I access a particular column is p['time'] or
p['obs1']. I have a couple of questions regarding this data structure:

1) How do I restructure the array into a 61230 x 4 array that can be
indexed using [r,c] notation?

2) What kind of dtype is pyTables using? How do I create a similar
array that can be indexed by a named column? I tried various ways:

a = array([[1,2],[3,4]], dtype=dtype([('obs1','<f4'),('obs2','<f4')]))

TypeError: expected a readable buffer object

I did find some documentation about array type descriptors when
reading from files... it seems like these array types are specific to
arrays created when reading from some sort of file / buffer? Any help
is appreciated. Thanks!

John

From charlesr.harris at gmail.com  Thu Oct  2 02:03:43 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 2 Oct 2008 00:03:43 -0600
Subject: [Numpy-discussion] nan, sign, and all that

Hi All,

I've added ufuncs fmin and fmax that behave as follows:

In [3]: a = array([NAN, 0, NAN, 1])
In [4]: b = array([0, NAN, NAN, 0])

In [5]: fmax(a,b)
Out[5]: array([  0.,   0.,  NaN,   1.])

In [6]: fmin(a,b)
Out[6]: array([ 0.,  0., NaN,  0.])

In [7]: fmax.reduce(a)
Out[7]: 1.0

In [8]: fmin.reduce(a)
Out[8]: 0.0

In [9]: fmax.reduce([NAN,NAN])
Out[9]: nan

In [10]: fmin.reduce([NAN,NAN])
Out[10]: nan

I also made the sign ufunc return the sign of nan. That works, but I'm
not sure it is the way to go because there doesn't seem to be any spec
as to what sign nan takes. The current np.nan on my machine is
negative, and 0/0, inf/inf all return negative nan. So it doesn't look
like the actual sign of nan makes any sense. Currently sign(NAN)
returns 0, which doesn't look right either, so I think the thing to do
is return nan, but this will be a change in numpy behavior.

Any thoughts?

Chuck
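For reference, a quick side-by-side of the behaviours under discussion
(np.fmax/np.fmin follow the C99 rule of ignoring a lone NaN; the
np.maximum line shows the NaN-propagating behaviour proposed later in
this thread, which is how subsequent NumPy releases behave):

import numpy as np

a = np.array([np.nan, 0., np.nan, 1.])
b = np.array([0., np.nan, np.nan, 0.])

np.fmax(a, b)     # -> array([  0.,   0.,  nan,   1.])  lone NaNs ignored
np.fmin(a, b)     # -> array([  0.,   0.,  nan,   0.])
np.maximum(a, b)  # -> array([ nan,  nan,  nan,   1.])  NaNs propagate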
From stefan at sun.ac.za  Thu Oct  2 03:37:12 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Thu, 2 Oct 2008 09:37:12 +0200
Subject: [Numpy-discussion] nan, sign, and all that
Message-ID: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com>

Hi Charles,

2008/10/2 Charles R Harris :
> In [3]: a = array([NAN, 0, NAN, 1])
> In [4]: b = array([0, NAN, NAN, 0])
>
> In [5]: fmax(a,b)
> Out[5]: array([ 0.,  0., NaN,  1.])
>
> In [6]: fmin(a,b)
> Out[6]: array([ 0.,  0., NaN,  0.])

These are great, many thanks!

My only gripe is that they have the same NaN-handling as amin and
friends, which I consider to be broken. Others also mentioned that
this should be changed, and I think David C wrote a patch for it (but
I am not informed as to the speed implications).

If I had to choose, this would be my preferred output:

In [5]: fmax(a,b)
Out[5]: array([ NaN, NaN, NaN,  1.])

Cheers
Stéfan

From robert.kern at gmail.com  Thu Oct  2 03:42:58 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 2 Oct 2008 02:42:58 -0500
Subject: [Numpy-discussion] nan, sign, and all that
In-Reply-To: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com>
References: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com>
Message-ID: <3d375d730810020042i5a20da38sc3f65bef813fefcf@mail.gmail.com>

On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt wrote:
> These are great, many thanks!
>
> My only gripe is that they have the same NaN-handling as amin and
> friends, which I consider to be broken.

No, these follow well-defined C99 semantics of the fmin() and fmax()
functions in libm. If exactly one of the arguments is a NaN, the
non-NaN argument is returned. This is *not* the current behavior of
amin() et al., which just do naive comparisons.

> Others also mentioned that
> this should be changed, and I think David C wrote a patch for it (but
> I am not informed as to the speed implications).
>
> If I had to choose, this would be my preferred output:
>
> In [5]: fmax(a,b)
> Out[5]: array([ NaN, NaN, NaN,  1.])

Chuck proposes letting minimum() and maximum() have that behavior.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
From nwagner at iam.uni-stuttgart.de  Thu Oct  2 03:48:24 2008
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Thu, 02 Oct 2008 09:48:24 +0200
Subject: [Numpy-discussion] loadtxt

Hi all,

how can I load ASCII data if the file contains characters instead of
floats?

Traceback (most recent call last):
  File "test_csv.py", line 2, in
    A = loadtxt('ca6_sets.csv', dtype=char, delimiter=';')
NameError: name 'char' is not defined

Nils

From stefan at sun.ac.za  Thu Oct  2 04:10:49 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Thu, 2 Oct 2008 10:10:49 +0200
Subject: [Numpy-discussion] nan, sign, and all that
In-Reply-To: <3d375d730810020042i5a20da38sc3f65bef813fefcf@mail.gmail.com>
References: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com> <3d375d730810020042i5a20da38sc3f65bef813fefcf@mail.gmail.com>
Message-ID: <9457e7c80810020110i29986675j3d1ccb1f91c6cd98@mail.gmail.com>

2008/10/2 Robert Kern :
>> My only gripe is that they have the same NaN-handling as amin and
>> friends, which I consider to be broken.
>
> No, these follow well-defined C99 semantics of the fmin() and fmax()
> functions in libm. If exactly one of the arguments is a NaN, the
> non-NaN argument is returned. This is *not* the current behavior of
> amin() et al., which just do naive comparisons.

Let me rephrase: I'm not convinced that these C99 semantics provide an
optimal user experience. It worries me greatly that NaN's pop up in
operations and then disappear again. It is entirely possible for a
script to run without failure and spew out garbage without the user
ever knowing.

>> Others also mentioned that
>> this should be changed, and I think David C wrote a patch for it (but
>> I am not informed as to the speed implications).
>>
>> If I had to choose, this would be my preferred output:
>>
>> In [5]: fmax(a,b)
>> Out[5]: array([ NaN, NaN, NaN,  1.])
>
> Chuck proposes letting minimum() and maximum() have that behavior.

That would be a good start, which would be complemented by educating
the user via some appropriate mechanism (I still don't know if one
exists; there is no NumPy Paperclip TM that states "You have decided
to commit scientific suicide. Would you like me to cut your wrists?").
That's meant only half-tongue-in-cheekedly :)

Thanks for your comments,

Cheers
Stéfan

From cournape at gmail.com  Thu Oct  2 04:23:59 2008
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 2 Oct 2008 17:23:59 +0900
Subject: [Numpy-discussion] nan, sign, and all that
In-Reply-To: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com>
References: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com>
Message-ID: <5b8d13220810020123h16fb733dp7aea5085402a9344@mail.gmail.com>

On Thu, Oct 2, 2008 at 4:37 PM, Stéfan van der Walt wrote:
>
> These are great, many thanks!
>
> My only gripe is that they have the same NaN-handling as amin and
> friends, which I consider to be broken. Others also mentioned that
> this should be changed, and I think David C wrote a patch for it (but
> I am not informed as to the speed implications).

Hopefully, Chuck and I synchronised a bit on this :) The idea is that
before, I thought that there was a nan-ignoring and a nan-propagating
behavior. Robert later mentioned that fmin/fmax has a third, well
specified behavior in C99. All those three are useful, and as such
have been more or less implemented by Chuck or me.

I think having the new C functions by Chuck makes sense as a new
python API, to follow C99 fmax/fmin. They could be used for the new
max/min, but then, it feels a bit strange compared to nanmax/nanmin,
so I would prefer having the *current* numpy.max and numpy.min
propagate the NaN, and nanmax/nanmin ignoring the NaN altogether. Also
note that matlab does not propagate NaN for max/min.

The last question is FPU status flag handling: I thought comparing NaN
directly with < would throw FPE_INVALID. But this is not the case (at
least on Linux with glibc and on Mac OS X). This is confusing, because
I thought the whole point of the C99 macro isgreater was to not throw
this. This is also how I understand both the glibc manual and the Mac
OS X man page for isgreater. Robert, do you have any insight on this?

David
From faltet at pytables.org  Thu Oct  2 04:28:23 2008
From: faltet at pytables.org (Francesc Alted)
Date: Thu, 2 Oct 2008 10:28:23 +0200
Subject: [Numpy-discussion] complex numpy.ndarray dtypes
In-Reply-To: <79258dea0810012115o1a77aebbg429d471d42e06d3c@mail.gmail.com>
References: <79258dea0810012115o1a77aebbg429d471d42e06d3c@mail.gmail.com>
Message-ID: <200810021028.24155.faltet@pytables.org>

A Thursday 02 October 2008, John Gu escrigué:
> Hello,
>
> I am using numpy in conjunction with pyTables. The data that I read
> in from pyTables seems to have the following dtype:
>
> p = hdf5.root.myTable.read()
>
> p.dtype
> dtype([('time', '<f4'), ('obs1', '<f4'), ('obs2', '<f8'), ('obs3', '<f4')])
>
> p.shape
> (61230,)
>
> The manner in which I access a particular column is p['time'] or
> p['obs1']. I have a couple of questions regarding this data
> structure: 1) how do I restructure the array into a 61230 x 4 array
> that can be indexed using [r,c] notation?

In your example, the table (record array in NumPy jargon) is
inhomogeneous (all fields are 'f4' except 'obs2' which is 'f8'). In
that case, you can obtain an homogeneous array by doing something like:

In [44]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','<f4'),('obs2','<f8')])

> 2) What kind of dtype is pyTables using? How do I create a similar
> array that can be indexed by a named column? I tried various ways:
>
> a = array([[1,2],[3,4]], dtype=dtype([('obs1','<f4'),('obs2','<f4')]))
>
> TypeError: expected a readable buffer object

Yeah, the error message is too terse in this case. The record array
constructor needs to be sure where your records start and end, and
this is achieved by mapping tuples to records. So, your example must
be rewritten as:

In [70]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','<f4'),('obs2','<f4')])

> I did find some documentation about array type descriptors when
> reading from files... it seems like these array types are specific to
> arrays created when reading from some sort of file / buffer? Any
> help is appreciated. Thanks!

I'm not sure what you are asking here. At any rate, it might be useful
to have a look at the complex dtype examples in:

http://www.scipy.org/Numpy_Example_List#head-f9175c69cccd74b9e4ee92e2a060af27c7447b76

Hope that helps,

-- 
Francesc Alted
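When all fields do share one dtype, the [r,c] view John asks about
needs no copy at all; a short sketch (field names assumed from the
thread, all widened to 'f8' so the view is legal):

import numpy as np

a = np.array([(1., 2., 3., 4.), (5., 6., 7., 8.)],
             dtype=[('time', '<f8'), ('obs1', '<f8'),
                    ('obs2', '<f8'), ('obs3', '<f8')])

# Reinterpret the homogeneous record array as a plain (2, 4) float array
b = a.view(np.float64).reshape(len(a), -1)
b[1, 2]      # -> 7.0, row 1, column 'obs2'
a['obs2']    # named-column access still works on the original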
From faltet at pytables.org  Thu Oct  2 04:31:08 2008
From: faltet at pytables.org (Francesc Alted)
Date: Thu, 2 Oct 2008 10:31:08 +0200
Subject: [Numpy-discussion] loadtxt
Message-ID: <200810021031.08853.faltet@pytables.org>

A Thursday 02 October 2008, Nils Wagner escrigué:
> Hi all,
>
> how can I load ASCII data if the file contains characters
> instead of floats?
>
> Traceback (most recent call last):
>   File "test_csv.py", line 2, in
>     A = loadtxt('ca6_sets.csv', dtype=char, delimiter=';')
> NameError: name 'char' is not defined

You would need to specify the length of your strings. Try with
dtype="SN", where N is the expected length of the strings.

Cheers,

-- 
Francesc Alted

From cournape at gmail.com  Thu Oct  2 04:41:43 2008
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 2 Oct 2008 17:41:43 +0900
Subject: [Numpy-discussion] Portable functions for nans, signbit, etc.
Message-ID: <5b8d13220810020141w74d207cen9809a0363232346b@mail.gmail.com>

On Thu, Oct 2, 2008 at 11:41 AM, Charles R Harris wrote:
>
> Which is rather clever. I think binary_cast will require some pointer
> abuse.

Yep (the funny thing is that the bit twiddling will likely end up more
readable than this C++ stuff).

cheers,

David

From stefan at sun.ac.za  Thu Oct  2 05:15:46 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Thu, 2 Oct 2008 11:15:46 +0200
Subject: [Numpy-discussion] loadtxt
In-Reply-To: <200810021031.08853.faltet@pytables.org>
References: <200810021031.08853.faltet@pytables.org>
Message-ID: <9457e7c80810020215h3aed755doa8a301e6cbb38761@mail.gmail.com>

2008/10/2 Francesc Alted :
>> how can I load ASCII data if the file contains characters
>> instead of floats
>
> You would need to specify the length of your strings. Try with
> dtype="SN", where N is the expected length of the strings.

Other options include:

- using converters to convert the character to a value:

  np.loadtxt('/tmp/bleh.dat', converters={2: lambda x: 0})

- skipping the specified column:

  np.loadtxt('/tmp/bleh.dat', usecols=(0,1))

Cheers
Stéfan
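Putting Francesc's suggestion into a concrete call (a sketch assuming
';'-separated fields of at most 8 characters each, as in Nils's file):

import numpy as np

# dtype='S8' keeps every field as a byte string of up to 8 characters
A = np.loadtxt('ca6_sets.csv', dtype='S8', delimiter=';')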
From pete.forman at westerngeco.com  Thu Oct  2 05:50:59 2008
From: pete.forman at westerngeco.com (Pete Forman)
Date: Thu, 02 Oct 2008 10:50:59 +0100
Subject: [Numpy-discussion] nan, sign, and all that
References: <9457e7c80810020037s13020d14p8bc6ff67726af845@mail.gmail.com> <3d375d730810020042i5a20da38sc3f65bef813fefcf@mail.gmail.com> <9457e7c80810020110i29986675j3d1ccb1f91c6cd98@mail.gmail.com>

Stéfan van der Walt writes:
> Let me rephrase: I'm not convinced that these C99 semantics provide
> an optimal user experience. It worries me greatly that NaN's pop
> up in operations and then disappear again. It is entirely possible
> for a script to run without failure and spew out garbage without
> the user ever knowing.

By default NaNs are propagated through operations on them. At the end
of this discussion we ought to end up with a list of functions such as
fmax, isnan, and copysign that are the exceptions.

I think that it is right to defer to IEEE for their decisions on the
behavior of NaNs, etc. That is what C and Fortran are doing. I have
not checked, but I would guess that CPUs and FPUs behave that way too.
So it should be easier and faster to follow IEEE. Note that in the
just-released Python 2.6, floating point support of IEEE 754 has been
beefed up.

-- 
Pete Forman                 -./\.-  Disclaimer: This post is originated
WesternGeco                 -./\.-  by myself and does not represent
pete.forman at westerngeco.com -./\.-  the opinion of Schlumberger or
http://petef.22web.net      -./\.-  WesternGeco.

From charlesr.harris at gmail.com  Thu Oct  2 09:22:28 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 2 Oct 2008 07:22:28 -0600
Subject: [Numpy-discussion] nan, sign, and all that

On Thu, Oct 2, 2008 at 1:42 AM, Robert Kern wrote:
> On Thu, Oct 2, 2008 at 02:37, Stéfan van der Walt wrote:
>> My only gripe is that they have the same NaN-handling as amin and
>> friends, which I consider to be broken.
>
> No, these follow well-defined C99 semantics of the fmin() and fmax()
> functions in libm. If exactly one of the arguments is a NaN, the
> non-NaN argument is returned. This is *not* the current behavior of
> amin() et al., which just do naive comparisons.
>
>> If I had to choose, this would be my preferred output:
>>
>> In [5]: fmax(a,b)
>> Out[5]: array([ NaN, NaN, NaN,  1.])
>
> Chuck proposes letting minimum() and maximum() have that behavior.

Yes. If there is any agreement on this I would like to go ahead and do
it. It does change the current behavior of maximum and minimum.

Chuck

From david at ar.media.kyoto-u.ac.jp  Thu Oct  2 09:11:15 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Thu, 02 Oct 2008 22:11:15 +0900
Subject: [Numpy-discussion] nan, sign, and all that
Message-ID: <48E4C873.6080200@ar.media.kyoto-u.ac.jp>

Charles R Harris wrote:
>
> Yes. If there is any agreement on this I would like to go ahead and do
> it. It does change the current behavior of maximum and minimum.

If you do it, please do it with as many tests as possible (it should
not be difficult to have a comprehensive test with *all* float data
types), because this is likely to cause problems on some platforms.

thanks,

David

From charlesr.harris at gmail.com  Thu Oct  2 09:31:29 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 2 Oct 2008 07:31:29 -0600
Subject: [Numpy-discussion] Portable functions for nans, signbit, etc.
In-Reply-To: <5b8d13220810020141w74d207cen9809a0363232346b@mail.gmail.com>
References: <5b8d13220810020141w74d207cen9809a0363232346b@mail.gmail.com>

On Thu, Oct 2, 2008 at 2:41 AM, David Cournapeau wrote:
> On Thu, Oct 2, 2008 at 11:41 AM, Charles R Harris wrote:
>>
>> Which is rather clever. I think binary_cast will require some
>> pointer abuse.
>
> Yep (the funny thing is that the bit twiddling will likely end up
> more readable than this C++ stuff)

The zip file has the bit twiddling, which is worth looking at if only
for the note on the PPC extended precision. Motorola seems to be a
problem but I don't think we support any of the 66xx series.

Chuck
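For anyone who wants to experiment with the same trick from Python,
the sign bit can be read straight out of the double's bit pattern (a
sketch; np.signbit is NumPy's own ufunc for this):

import struct
import numpy as np

def signbit(x):
    # Reinterpret the double's 8 bytes as an unsigned 64-bit integer
    # and test bit 63, so -0.0, -inf and negative NaNs all report True;
    # a plain x < 0 comparison gets NaN and -0.0 wrong.
    (bits,) = struct.unpack('<Q', struct.pack('<d', float(x)))
    return bool(bits >> 63)

signbit(-0.0)          # -> True, even though -0.0 < 0 is False
signbit(float('nan'))  # the sign of whatever NaN bit pattern you got
np.signbit(np.array([-1.0, 0.0, -0.0]))  # -> [ True, False,  True]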
From david.huard at gmail.com  Thu Oct  2 09:46:19 2008
From: david.huard at gmail.com (David Huard)
Date: Thu, 2 Oct 2008 09:46:19 -0400
Subject: [Numpy-discussion] Help to process a large data file
Message-ID: <91cf711d0810020646h125c7589v21ae89fb93ab9eed@mail.gmail.com>

Frank,

How about this:

x = np.loadtxt('file')
z = x.sum(1)              # Reduce data to an array of 0,1,2
rz = z[z>0]               # Remove all 0s since you don't want to count those.
loc = np.where(rz==2)[0]  # The location of the (1,1)s
count = np.diff(loc) - 1  # The spacing between those (1,1)s, i.e., the
                          # number of elements that have one 1.

HTH,

David

On Wed, Oct 1, 2008 at 9:27 PM, frank wang wrote:
> Hi,
>
> I have a large data file which contains 2 columns of data. [...]
>
> Is there an efficient way to do this? My data file is pretty big.

From orionbelt2 at gmail.com  Thu Oct  2 11:43:37 2008
From: orionbelt2 at gmail.com (orionbelt2 at gmail.com)
Date: Thu, 2 Oct 2008 17:43:37 +0200
Subject: [Numpy-discussion] Help to process a large data file
Message-ID: <20081002154337.GA20908@ulb.ac.be>

Frank,

I would imagine that you cannot get a much better performance in
python than this, which avoids string conversions:

c = []
count = 0
for line in open('foo'):
    if line == '1 1\n':
        c.append(count)
        count = 0
    else:
        if '1' in line:
            count += 1

One could do some numpy trick like:

a = np.loadtxt('foo', dtype=int)
a = np.sum(a, axis=1)     # Add the two columns horizontally
b = np.where(a==2)[0]     # Find rows with sum == 2 (1 + 1)
count = []
for i, j in zip(b[:-1], b[1:]):
    count.append(a[i+1:j].sum())  # Number of lines with a single 1

but on my machine the numpy version takes about 20 sec for a 'foo'
file of 2,500,000 lines versus 1.2 sec for the pure python version...

As a side note, if I replace "line == '1 1\n'" with
"line.startswith('1 1')", the pure python version goes up to 1.8
sec... Isn't this a bit weird? I'd think startswith() should be
faster...

Chris

On Wed, Oct 01, 2008 at 07:27:27PM -0600, frank wang wrote:
> Hi,
>
> I have a large data file which contains 2 columns of data. The two
> columns contain only zeros and ones. [...]
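As a quick check, David's recipe applied to the sample data in Frank's
message reproduces the expected counts:

import numpy as np

# Frank's sample, as a (13, 2) array
x = np.array([[1,0],[0,0],[1,1],[0,0],[0,1],[0,1],[0,0],
              [0,1],[1,1],[0,0],[0,1],[0,1],[1,1]])
z = x.sum(1)               # row sums: 0, 1 or 2
rz = z[z > 0]              # drop the all-zero rows
loc = np.where(rz == 2)[0]
print np.diff(loc) - 1     # -> [3 2], matching the rows marked 'x'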
From f.yw at hotmail.com  Thu Oct  2 15:20:09 2008
From: f.yw at hotmail.com (frank wang)
Date: Thu, 2 Oct 2008 13:20:09 -0600
Subject: [Numpy-discussion] Help to process a large data file
In-Reply-To: <20081002154337.GA20908@ulb.ac.be>

Thanks David and Chris for providing the nice solutions. Both methods
work great. I could not tell the speed difference between the two
solutions. My data size is 1048577 lines. I did not try the second
solution from Chris since it is too slow, as Chris stated.

Frank

> Date: Thu, 2 Oct 2008 17:43:37 +0200
> From: orionbelt2 at gmail.com
> Subject: Re: [Numpy-discussion] Help to process a large data file
>
> Frank,
>
> I would imagine that you cannot get a much better performance in
> python than this, which avoids string conversions:
> [...]
From travis at enthought.com  Thu Oct  2 15:27:12 2008
From: travis at enthought.com (Travis Vaught)
Date: Thu, 2 Oct 2008 14:27:12 -0500
Subject: [Numpy-discussion] Texas Python Regional Unconference Reminders
References: <0107962E-D762-497B-BCEC-24CBD78B381B@enthought.com>
Message-ID: <97CF029B-DE73-47A7-8320-044F6AA6E082@enthought.com>

Hey Steve,

I'll bring my camera and try to recruit a volunteer. No guarantees,
but we should at least be able to record things (any volunteers to
transcode a pile of scipy videos? ;-) ).

Best,

Travis

On Oct 1, 2008, at 7:56 PM, Steve Lianoglou wrote:
> Hi,
>
> Are there any plans to tape the presentations? Unfortunately some of
> us can't make it down to Texas, but the talks look quite interesting.
>
> Thanks,
> -steve
> [...]

From robert.kern at gmail.com  Thu Oct  2 15:40:32 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 2 Oct 2008 14:40:32 -0500
Subject: [Numpy-discussion] nan, sign, and all that
Message-ID: <3d375d730810021240n111a1014sc62f883d48ec8e41@mail.gmail.com>

On Thu, Oct 2, 2008 at 08:22, Charles R Harris wrote:
> On Thu, Oct 2, 2008 at 1:42 AM, Robert Kern wrote:
>> [...]
>> Chuck proposes letting minimum() and maximum() have that behavior.
>
> Yes. If there is any agreement on this I would like to go ahead and
> do it. It does change the current behavior of maximum and minimum.

I think the position we've held is that in the presence of NaNs, the
behavior of these functions has been left unspecified, so I think it
is okay to change them.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
From bolme1234 at comcast.net  Thu Oct  2 15:40:07 2008
From: bolme1234 at comcast.net (David Bolme)
Date: Thu, 2 Oct 2008 13:40:07 -0600
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To: <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com>
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com>
Message-ID: <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>

I also like the idea of a scipy.spatial library. For the research I do in
machine learning and computer vision we are often interested in specifying
different distance measures, so it would be nice to have a way to choose
one. I would like to see a standard set included -- City Block, Euclidean,
Correlation, etc. -- as well as a capability for a user-defined distance
or similarity function.

From matthieu.brucher at gmail.com  Thu Oct  2 16:01:51 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Thu, 2 Oct 2008 22:01:51 +0200
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To: <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com> <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
Message-ID:

2008/10/2 David Bolme :
> I also like the idea of a scipy.spatial library. For the research I do
> in machine learning and computer vision we are often interested in
> specifying different distance measures, so it would be nice to have a
> way to choose one. I would like to see a standard set included -- City
> Block, Euclidean, Correlation, etc. -- as well as a capability for a
> user-defined distance or similarity function.

You mean similarity or dissimilarity? Distance is a dissimilarity, but
correlation is a similarity measure.

Matthieu
--
French PhD student
Information System Engineer
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
From dblubaugh at belcan.com  Thu Oct  2 16:20:38 2008
From: dblubaugh at belcan.com (Blubaugh, David A.)
Date: Thu, 2 Oct 2008 16:20:38 -0400
Subject: [Numpy-discussion] f2py IS NOW WORKING
In-Reply-To: <27CC3060AF71DA40A5DC85F7D5B70F380521CBBE@AWMAIL04.belcan.com>
References: <27CC3060AF71DA40A5DC85F7D5B70F380521CBBE@AWMAIL04.belcan.com>
Message-ID: <27CC3060AF71DA40A5DC85F7D5B70F38054B6973@AWMAIL04.belcan.com>

To all,

I have now been able to build a stable file via f2py!! However, I had to
do the following:

1.) First, I had to copy all required library files from my Compaq Visual
Fortran compiler into Python's scripts directory, along with f2py itself.

2.) I also had to place a DLL from my compiler under Python's DLL
directory as well.

I know the reason I needed to take these steps is that I do not know the
correct environment variables for Windows XP running Compaq Visual
Fortran 6.6. Once again, I would appreciate knowing which environment
variables I should set on Windows XP, given that the compiler I must use
is Compaq Visual Fortran 6.6.

Thanks,

David Blubaugh

From Chris.Barker at noaa.gov  Thu Oct  2 17:45:20 2008
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu, 02 Oct 2008 17:45:20 -0400
Subject: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
In-Reply-To:
References:
Message-ID: <48E540F0.2080102@noaa.gov>

Jarrod Millman wrote:
> The 1.2.0rc2 is now available:
> http://svn.scipy.org/svn/numpy/tags/1.2.0rc2

what's the status of this?

> Here are the Window's binaries:
> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0rc2-win32-superpack-python2.5.exe

this appears to be a dead link.

thanks,
-Chris

From robert.kern at gmail.com  Thu Oct  2 17:52:26 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 2 Oct 2008 16:52:26 -0500
Subject: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
In-Reply-To: <48E540F0.2080102@noaa.gov>
References: <48E540F0.2080102@noaa.gov>
Message-ID: <3d375d730810021452o18efc1e2l9636ee35f14c11dc@mail.gmail.com>

On Thu, Oct 2, 2008 at 16:45, Chris Barker wrote:
> Jarrod Millman wrote:
>> The 1.2.0rc2 is now available:
>> http://svn.scipy.org/svn/numpy/tags/1.2.0rc2
>
> what's the status of this?

Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".

>> Here are the Window's binaries:
>> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0rc2-win32-superpack-python2.5.exe
>
> this appears to be a dead link.

Superseded by
http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0-win32-superpack-python2.5.exe

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From bolme1234 at comcast.net  Thu Oct  2 17:17:13 2008
From: bolme1234 at comcast.net (David Bolme)
Date: Thu, 2 Oct 2008 15:17:13 -0600
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To:
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com> <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
Message-ID:

It may be useful to have an interface that handles both cases: similarity
and dissimilarity. Often I have seen "Nearest Neighbor" algorithms that
look for maximum similarity instead of minimum distance. In my field
(biometrics) we often deal with very specialized distance or similarity
measures. I would like to see support for user-defined distance and
similarity functions. It should be easy to implement by passing a function
object to the KNN class. I am not sure if kd-trees or other fast
algorithms are compatible with similarities or non-Euclidean norms;
however, I would be willing to implement an exhaustive-search KNN that
would support user-defined functions.

On Oct 2, 2008, at 2:01 PM, Matthieu Brucher wrote:

> 2008/10/2 David Bolme :
>> I also like the idea of a scipy.spatial library. For the research I do
>> in machine learning and computer vision we are often interested in
>> specifying different distance measures, so it would be nice to have a
>> way to choose one. I would like to see a standard set included -- City
>> Block, Euclidean, Correlation, etc. -- as well as a capability for a
>> user-defined distance or similarity function.
>
> You mean similarity or dissimilarity? Distance is a dissimilarity, but
> correlation is a similarity measure.
>
> Matthieu

From Chris.Barker at noaa.gov  Thu Oct  2 19:29:35 2008
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu, 02 Oct 2008 19:29:35 -0400
Subject: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
In-Reply-To: <3d375d730810021452o18efc1e2l9636ee35f14c11dc@mail.gmail.com>
References: <48E540F0.2080102@noaa.gov> <3d375d730810021452o18efc1e2l9636ee35f14c11dc@mail.gmail.com>
Message-ID: <48E5595F.6000304@noaa.gov>

Robert Kern wrote:
> Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".

I thought I'd seen that, but when I went to:

http://www.scipy.org/Download

I still got 1.1.

> Superseded by
> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/numpy-1.2.0-win32-superpack-python2.5.exe

thanks,
-Chris
From peridot.faceted at gmail.com  Thu Oct  2 21:57:08 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Thu, 2 Oct 2008 21:57:08 -0400
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To:
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com> <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
Message-ID:

2008/10/2 David Bolme :
> It may be useful to have an interface that handles both cases:
> similarity and dissimilarity. Often I have seen "Nearest Neighbor"
> algorithms that look for maximum similarity instead of minimum
> distance. In my field (biometrics) we often deal with very specialized
> distance or similarity measures. I would like to see support for
> user-defined distance and similarity functions. It should be easy to
> implement by passing a function object to the KNN class. I am not sure
> if kd-trees or other fast algorithms are compatible with similarities
> or non-Euclidean norms; however, I would be willing to implement an
> exhaustive-search KNN that would support user-defined functions.

kd-trees can only work for distance measures which have certain special
properties (in particular, you have to be able to bound them based on
coordinate differences). This is just fine for all the Minkowski p-norms
(so in particular, Euclidean distance, maximum-coordinate-difference, and
Manhattan distance), and in fact the current implementation already
supports all of these.

I don't think that correlation can be made into such a distance measure -
the neighborhoods are the wrong shape. In fact the basic space is
projective (n-1)-space rather than affine n-space, so I think you're going
to need some very different algorithm. If you make a metric space out of
it - define d(A,B) to be the angle between A and B - then cover trees can
serve as a spatial data structure for nearest-neighbor search.

Cover trees may be worth implementing, as they're a very generic data
structure, suitable for (among other things) low-dimensional data in
high-dimensional spaces.

Anne
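For anyone who wants to experiment before any of this lands in scipy, an
exhaustive-search k-nearest-neighbors with a user-supplied metric is only
a few lines of numpy. This is an illustrative sketch only -- the function
name and signature are made up, not the proposed scipy.spatial interface:

import numpy as np

def knn_brute(data, query, k, dist):
    # exhaustive search: evaluate the user-defined metric against every point
    d = np.array([dist(p, query) for p in data])
    order = np.argsort(d)[:k]
    return d[order], order

# example: 3 nearest neighbors of the origin under the city-block (L1) metric
data = np.random.randn(1000, 5)
dists, idx = knn_brute(data, np.zeros(5), 3,
                       lambda x, y: np.abs(x - y).sum())

A similarity measure can be plugged into the same interface by negating it,
since maximizing similarity is minimizing its negative.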
From josef.pktd at gmail.com  Thu Oct  2 22:14:33 2008
From: josef.pktd at gmail.com (joep)
Date: Thu, 2 Oct 2008 19:14:33 -0700 (PDT)
Subject: [Numpy-discussion] numpy.random.hypergeometric - strange results
In-Reply-To:
References:
Message-ID: <49eebbd7-3bed-4ad9-a95f-b7fd53c75d1d@k30g2000hse.googlegroups.com>

see http://scipy.org/scipy/numpy/ticket/921

I think I found the error:
http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions.c

{{{
805    /* this is a correction to HRUA* by Ivan Frohne in rv.py */
806    if (good > bad) Z = m - Z;
}}}

Quickly looking at the referenced program, downloaded from
http://pal.ece.iisc.ernet.in/~dhani/frohne/rv.py

Notation: alpha = bad, beta = good:

{{{
if alpha > beta:   # Error in HRUA*, this is correct.
    z = m - z
}}}

As you can see, if my interpretation is correct, then line 806 should have
good and bad reversed, i.e.

{{{
806    if (bad > good) Z = m - Z;
}}}

Can you verify this? I never tried to build numpy from source.

Josef

On Sep 25, 4:18 pm, joep wrote:
> In my fuzz testing of scipy stats, I sometimes get a test failure. I
> think there is something wrong with numpy.random.hypergeometric for
> some cases:
>
> Josef
>
> >>> import numpy.random as mtrand
> >>> mtrand.hypergeometric(3,17,12,size=10)   # there are only 3 good balls in the urn
> array([16, 17, 16, 16, 15, 16, 17, 16, 17, 16])
> >>> mtrand.hypergeometric(17,3,12,size=10)   # negative result
> array([-3, -4, -3, -4, -3, -3, -4, -4, -5, -4])
>
> >>> mtrand.hypergeometric(4,3,12,size=10)
> >>> np.version.version
> '1.2.0rc2'
>
> I did not find any clear pattern when trying out different parameter
> values:
>
> >>> mtrand.hypergeometric(10,10,12,size=10)
> array([5, 6, 4, 4, 8, 5, 4, 6, 7, 4])
> >>> mtrand.hypergeometric(10,10,20,size=10)
> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
> >>> mtrand.hypergeometric(10,10,19,size=10)
> array([10,  9,  9,  9,  9,  9, 10,  9,  9,  9])
> >>> mtrand.hypergeometric(10,10,5,size=10)
> array([3, 5, 2, 2, 1, 2, 2, 4, 3, 1])
> >>> mtrand.hypergeometric(10,2,5,size=10)
> array([4, 5, 4, 5, 5, 5, 4, 3, 4, 4])
> >>> mtrand.hypergeometric(2,10,5,size=10)
> array([0, 2, 1, 0, 2, 2, 1, 1, 1, 1])
>
> >>> mtrand.hypergeometric(17,3,12,size=10)
> array([-5, -3, -4, -4, -4, -3, -4, -4, -3, -3])
> >>> mtrand.hypergeometric(3,17,12,size=10)
> array([15, 16, 17, 16, 15, 16, 15, 15, 17, 17])
> >>> mtrand.hypergeometric(18,3,12,size=10)
> array([-5, -6, -6, -4, -4, -4, -5, -3, -5, -5])
>
> >>> mtrand.hypergeometric(18,3,5,size=10)
> array([4, 5, 5, 5, 5, 5, 4, 5, 4, 3])
> >>> mtrand.hypergeometric(18,3,19,size=10)
> array([1, 1, 2, 1, 1, 1, 1, 3, 1, 1])
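A quick way to see the symptom (and to check a rebuilt numpy after the fix)
is to compare the sample range and mean against the theoretical values. This
is only a sanity-check sketch:

import numpy as np

good, bad, n = 17, 3, 12
draws = np.random.hypergeometric(good, bad, n, size=100000)
lo, hi = max(0, n - bad), min(n, good)             # support of the distribution
print draws.min() >= lo, draws.max() <= hi         # False here exposes the bug
print draws.mean(), n * good / float(good + bad)   # sample mean vs. theory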
From millman at berkeley.edu  Fri Oct  3 01:04:45 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Thu, 2 Oct 2008 22:04:45 -0700
Subject: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
In-Reply-To: <48E5595F.6000304@noaa.gov>
References: <48E540F0.2080102@noaa.gov> <3d375d730810021452o18efc1e2l9636ee35f14c11dc@mail.gmail.com> <48E5595F.6000304@noaa.gov>
Message-ID:

On Thu, Oct 2, 2008 at 4:29 PM, Chris Barker wrote:
> Robert Kern wrote:
>> Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".
>
> I thought I'd seen that, but when I went to:
>
> http://www.scipy.org/Download
>
> I still got 1.1.

I updated the page to point to the sourceforge page. Thanks for catching
that.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From josef.pktd at gmail.com  Fri Oct  3 01:17:28 2008
From: josef.pktd at gmail.com (joep)
Date: Thu, 2 Oct 2008 22:17:28 -0700 (PDT)
Subject: [Numpy-discussion] numpy.random.logseries - incorrect convergence for k=1, k=2
In-Reply-To:
References:
Message-ID: <95144c91-3f49-48e8-898e-d3cfa225a108@p25g2000hsf.googlegroups.com>

Filed as http://scipy.org/scipy/numpy/ticket/923

and I think I finally tracked down the source of the incorrect random
numbers: a reversed inequality in
http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions.c
line 871; see my last comment on the trac ticket.

Josef

On Sep 27, 2:12 pm, joep wrote:
> random numbers generated by numpy.random.logseries do not converge to
> the theoretical distribution:
>
> for probability parameter pr = 0.8, the random number generator
> converges to a frequency for k=1 of 39.8 %, while the theoretical
> probability mass is 49.71 %.
> k=2 is oversampled; the other k's look ok.
>
> check frequency of k=1 and k=2 at N = 1000000
> 0.398406 0.296465
> pmf at k = 1 and k=2 with formula
> [ 0.4971  0.1988]
>
> for probability parameter pr = 0.3, the results are not as bad, but
> still off:
> frequency for k=1 of 82.6 %, while the theoretical probability mass is
> 84.11 %.
>
> check frequency of k=1 and k=2 at N = 1000000
> 0.826006 0.141244
> pmf at k = 1 and k=2 with formula
> [ 0.8411  0.1262]
>
> below is a quick script for checking this
>
> Josef
>
> {{{
> import numpy as np
> from scipy import stats
>
> pr = 0.8
> np.set_printoptions(precision=2, suppress=True)
>
> # calculation for N=1 million takes some time
> for N in [1000, 10000, 10000, 1000000]:
>     rvsn = np.random.logseries(pr, size=N)
>     fr = stats.itemfreq(rvsn)
>     pmfs = stats.logser.pmf(fr[:,0], pr) * 100
>     print 'log series sample frequency and pmf (in %) with N = ', N
>     print np.column_stack((fr[:,0], fr[:,1]*100.0/N, pmfs))
>
> np.set_printoptions(precision=4, suppress=True)
>
> print 'check frequency of k=1 and k=2 at N = ', N
> print np.sum(rvsn==1)/float(N),
> print np.sum(rvsn==2)/float(N)
>
> k = np.array([1,2])
> print 'pmf at k = 1 and k=2 with formula'
> print -pr**k * 1.0 / k / np.log(1-pr)
> }}}

From faltet at pytables.org  Fri Oct  3 03:35:41 2008
From: faltet at pytables.org (Francesc Alted)
Date: Fri, 3 Oct 2008 09:35:41 +0200
Subject: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
In-Reply-To:
References: <48E5595F.6000304@noaa.gov>
Message-ID: <200810030935.41460.faltet@pytables.org>

A Friday 03 October 2008, Jarrod Millman escrigué:
> On Thu, Oct 2, 2008 at 4:29 PM, Chris Barker wrote:
> > Robert Kern wrote:
> >> Superseded by the 1.2.0 release. See the thread "ANN: NumPy 1.2.0".
> >
> > I thought I'd seen that, but when I went to:
> >
> > http://www.scipy.org/Download
> >
> > I still got 1.1.
>
> I updated the page to point to the sourceforge page. Thanks for
> catching that.

It would be nice if you could update the PyPI package index too. Perhaps
having a list of places where to announce NumPy on every release would be
handy.

Thanks!

--
Francesc Alted
""" Il semble que ca provienne des directives %numpy_typemaps a la fin du fichier numpy.i: """ /* Concrete instances of the %numpy_typemaps() macro: Each invocation * below applies all of the typemaps above to the specified data type. */ %numpy_typemaps(signed char , NPY_BYTE , int) %numpy_typemaps(unsigned char , NPY_UBYTE , int)*/ %numpy_typemaps(short , NPY_SHORT , int) /*%numpy_typemaps(unsigned short , NPY_USHORT , int) %numpy_typemaps(int , NPY_INT , int) %numpy_typemaps(unsigned int , NPY_UINT , int) %numpy_typemaps(long , NPY_LONG , int) %numpy_typemaps(unsigned long , NPY_ULONG , int) %numpy_typemaps(long long , NPY_LONGLONG , int) %numpy_typemaps(unsigned long long, NPY_ULONGLONG, int) %numpy_typemaps(float , NPY_FLOAT , int) %numpy_typemaps(double , NPY_DOUBLE , int) """ Est ce que quelqu'un a rencontr? ce probl?me ? Merci de bien vouloir m'aider. Amicalement Michel _________________________________________________________________ Email envoy? avec Windows Live Hotmail. Dites adieux aux spam et virus, passez ? Hotmail?! C'est gratuit ! http://www.windowslive.fr/hotmail/default.asp From michel.dupront at hotmail.fr Fri Oct 3 06:21:20 2008 From: michel.dupront at hotmail.fr (Michel Dupront) Date: Fri, 3 Oct 2008 12:21:20 +0200 Subject: [Numpy-discussion] =?iso-8859-1?q?_RE=3A__Probl=E8me_pour_constru?= =?iso-8859-1?q?ire_les_tests_Numpy-Swig?= In-Reply-To: References: Message-ID: Oh sorry I wrote my first email in french Hello, I just installed Numpy. I am interested in using Swig. When I try to build the tests I get the following error message: """ swig -c++ -python Array.i :9: Error: Macro '%typecheck' expects 1 argument :36: Error: Macro '%typecheck' expects 1 argument :64: Error: Macro '%typecheck' expects 1 argument :92: Error: Macro '%typecheck' expects 1 argument :119: Error: Macro '%typecheck' expects 1 argument :148: Error: Macro '%typecheck' expects 1 argument :177: Error: Macro '%typecheck' expects 1 argument :206: Error: Macro '%typecheck' expects 1 argument :235: Error: Macro '%typecheck' expects 1 argument ....... """ It seems that the directive %numpy_typemaps is responsible for this error: """ /* Concrete instances of the %numpy_typemaps() macro: Each invocation * below applies all of the typemaps above to the specified data type. */ %numpy_typemaps(signed char , NPY_BYTE , int) %numpy_typemaps(unsigned char , NPY_UBYTE , int)*/ %numpy_typemaps(short , NPY_SHORT , int) /*%numpy_typemaps(unsigned short , NPY_USHORT , int) %numpy_typemaps(int , NPY_INT , int) %numpy_typemaps(unsigned int , NPY_UINT , int) %numpy_typemaps(long , NPY_LONG , int) %numpy_typemaps(unsigned long , NPY_ULONG , int) %numpy_typemaps(long long , NPY_LONGLONG , int) %numpy_typemaps(unsigned long long, NPY_ULONGLONG, int) %numpy_typemaps(float , NPY_FLOAT , int) %numpy_typemaps(double , NPY_DOUBLE , int) """ Somebody already faced this problem ? Thank you very much for any help. Friendly, Michel ---------------------------------------- > From: michel.dupront at hotmail.fr > To: numpy-discussion at scipy.org > Date: Fri, 3 Oct 2008 12:13:26 +0200 > Subject: [Numpy-discussion] Probl?me pour construire les tests Numpy-Swig > > > > Bonjour, > > Je viens d'installer Numpy. Je suis int?ress? par l'utilisation de swig. 
From bertle at smoerz.org  Fri Oct  3 09:14:02 2008
From: bertle at smoerz.org (Roman Bertle)
Date: Fri, 3 Oct 2008 15:14:02 +0200
Subject: [Numpy-discussion] choose() broadcasting, and Trac
In-Reply-To: <20081003111152.GA7228@smoerz.org>
References: <20081003111152.GA7228@smoerz.org>
Message-ID: <20081003131402.GA7652@smoerz.org>

Hello,

I have found something I call a bug in the numpy choose() method and
wanted to report it in trac. http://scipy.org/BugReport states that
"SciPy and NumPy Developer Pages use the same login/password". However,
I (username "smoerz") can log in with my Scipy account at the Scipy
Developer Page (http://projects.scipy.org/scipy/scipy/), but not at the
Numpy Developer Page (http://projects.scipy.org/scipy/numpy/).
Whatever, porting some code from numarray to numpy, I found a regression
in the broadcasting of choose():

import numarray, numpy

>>> numarray.choose([[0,0,1], [0,0,1]], ([2,2,2], [3,3,3]))
array([[2, 2, 3],
       [2, 2, 3]])
>>> numarray.choose([0,0,1], ([[2,2,2],[2,2,2]], [[3,3,3],[3,3,3]]))
array([[2, 2, 3],
       [2, 2, 3]])
>>> numarray.choose([0,0,1], ([2,2,2], [[3,3,3],[3,3,3]]))
array([[2, 2, 3],
       [2, 2, 3]])

Of these 3 cases, only the first one works for numpy; for the other ones
I get:

/usr/lib/python2.5/site-packages/numpy/core/fromnumeric.pyc in choose(a, choices, out, mode)
    167         choose = a.choose
    168     except AttributeError:
--> 169         return _wrapit(a, 'choose', choices, out=out, mode=mode)
    170     return choose(choices, out=out, mode=mode)
    171

/usr/lib/python2.5/site-packages/numpy/core/fromnumeric.pyc in _wrapit(obj, method, *args, **kwds)
     35     except AttributeError:
     36         wrap = None
---> 37     result = getattr(asarray(obj),method)(*args, **kwds)
     38     if wrap:
     39         if not isinstance(result, mu.ndarray):

ValueError: too many dimensions

I consider this a bad regression from numarray to numpy, because the
failing broadcast cases seem more important than the working one.

Best Regards,
Roman
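Until choose() regains the numarray broadcasting, one workaround is to
broadcast the arguments by hand before the call. This is only a rough
sketch in plain numpy, with illustrative names:

import numpy as np

index = np.array([0, 0, 1])
shape = (2, 3)
# expand the index and each choice to the full output shape explicitly
idx = index * np.ones(shape, int)
c2 = np.zeros(shape, int) + np.array([2, 2, 2])
c3 = np.zeros(shape, int) + np.array([3, 3, 3])
print np.choose(idx, (c2, c3))
# [[2 2 3]
#  [2 2 3]]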
From millman at berkeley.edu  Fri Oct  3 09:38:02 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Fri, 3 Oct 2008 06:38:02 -0700
Subject: [Numpy-discussion] 1.2.0rc2 tagged! --PLEASE TEST--
In-Reply-To: <200810030935.41460.faltet@pytables.org>
References: <48E5595F.6000304@noaa.gov> <200810030935.41460.faltet@pytables.org>
Message-ID:

On Fri, Oct 3, 2008 at 12:35 AM, Francesc Alted wrote:
> It would be nice if you could update the PyPI package index too. Perhaps
> having a list of places where to announce NumPy on every release would
> be handy.

Done. Thanks,

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From david.huard at gmail.com  Fri Oct  3 09:48:52 2008
From: david.huard at gmail.com (David Huard)
Date: Fri, 3 Oct 2008 09:48:52 -0400
Subject: [Numpy-discussion] Help to process a large data file
In-Reply-To:
References: <9457e7c80809240810l6bf656aey70434d893200aa98@mail.gmail.com> <48E14D27.5040604@enthought.com> <3d375d730809291822x2c4fb34dq19aab6e9466e2d64@mail.gmail.com> <20081002154337.GA20908@ulb.ac.be>
Message-ID: <91cf711d0810030648i6199e512t9bd4e806f3720cd4@mail.gmail.com>

Frank,

On Thu, Oct 2, 2008 at 3:20 PM, frank wang wrote:
> Thanks David and Chris for providing the nice solutions.

Glad it helped.

> Both methods work great. I could not tell the speed difference between
> the two solutions. My data size is 1048577 lines.

I'd be curious to know what happens for larger files (~ 10 M lines). I'd
guess Chris's solution would be the fastest since it works incrementally
and does not load the entire data in memory. If you ever try, I'll be
interested to know how it turns out.

David

> I did not try the second solution from Chris since it is too slow, as
> Chris stated.
>
> Frank

From oliphant at enthought.com  Fri Oct  3 10:52:13 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 03 Oct 2008 09:52:13 -0500
Subject: [Numpy-discussion] choose() broadcasting, and Trac
In-Reply-To: <20081003131402.GA7652@smoerz.org>
References: <20081003111152.GA7228@smoerz.org> <20081003131402.GA7652@smoerz.org>
Message-ID: <48E6319D.7050702@enthought.com>

Roman Bertle wrote:
> Hello,
>
> I have found something I call a bug in the numpy choose() method and
> wanted to report it in trac.

Thanks for your report. I'm not sure why you are having trouble with
Trac, but I've created a ticket for this problem.

-Travis

From michel.dupront at hotmail.fr  Fri Oct  3 11:21:41 2008
From: michel.dupront at hotmail.fr (Michel Dupront)
Date: Fri, 3 Oct 2008 17:21:41 +0200
Subject: [Numpy-discussion] RE: RE: Problème pour construire les tests Numpy-Swig
In-Reply-To:
References:
Message-ID:

I was using swig 1.3.24. I installed the latest swig version, 1.3.36, and
now it is working fine! It makes me very, very happy!!!
From kpvincent at hotmail.com  Fri Oct  3 12:39:39 2008
From: kpvincent at hotmail.com (Kelly Vincent)
Date: Fri, 3 Oct 2008 12:39:39 -0400
Subject: [Numpy-discussion] Array shape
Message-ID:

I'm using Numpy to do some basic array manipulation, and I'm getting some
unexpected behavior from shape. Specifically, I have some 3x3 and 2x2
matrices, and shape gives me (5, 3) and (3, 2) for their respective sizes.
I was expecting (3, 3) and (2, 2), for number of rows by number of
columns. I'm assuming I must either be misunderstanding what shape gives
you or doing something wrong. Can anybody give me any advice? I'm using
Python 2.5 and Numpy 1.1.0.

Thanks,
Kelly

From rmay31 at gmail.com  Fri Oct  3 12:44:23 2008
From: rmay31 at gmail.com (Ryan May)
Date: Fri, 03 Oct 2008 11:44:23 -0500
Subject: [Numpy-discussion] Array shape
In-Reply-To:
References:
Message-ID: <48E64BE7.1010800@gmail.com>

Kelly Vincent wrote:
> I'm using Numpy to do some basic array manipulation, and I'm getting
> some unexpected behavior from shape. Specifically, I have some 3x3 and
> 2x2 matrices, and shape gives me (5, 3) and (3, 2) for their respective
> sizes. I was expecting (3, 3) and (2, 2), for number of rows by number
> of columns. I'm assuming I must either be misunderstanding what shape
> gives you or doing something wrong. Can anybody give me any advice? I'm
> using Python 2.5 and Numpy 1.1.0.

Can you post a complete, minimal example that shows the problem you have?
For an array object A, A.shape should give the shape you're expecting.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
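For reference, shape is simply (rows, columns) for a well-formed 2-d
array; unexpected values usually mean the array was not built the way one
thinks, e.g. from extra or ragged rows. A short illustrative session with
made-up data (note that on the NumPy of this era a ragged list silently
becomes a 1-d object array rather than raising an error):

>>> import numpy as np
>>> A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> A.shape                                   # (rows, columns)
(3, 3)
>>> B = np.array([[1, 2, 3], [4, 5], [6]])    # ragged input: not a 2-d array
>>> B.shape
(3,)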
From wnbell at gmail.com  Fri Oct  3 12:56:19 2008
From: wnbell at gmail.com (Nathan Bell)
Date: Fri, 3 Oct 2008 12:56:19 -0400
Subject: [Numpy-discussion] Problème pour construire les tests Numpy-Swig
In-Reply-To:
References:
Message-ID:

On Fri, Oct 3, 2008 at 11:21 AM, Michel Dupront wrote:
>
> I was using swig 1.3.24.
> I installed the latest swig version, 1.3.36, and now it is working fine!

SWIG often has that effect on people :)

--
Nathan Bell wnbell at gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
Satisfied SWIG customer

From gael.varoquaux at normalesup.org  Fri Oct  3 12:59:02 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 3 Oct 2008 18:59:02 +0200
Subject: [Numpy-discussion] making numpy.dot faster
Message-ID: <20081003165902.GB28423@phare.normalesup.org>

I am doing a calculation where one call to numpy.dot ends up taking 90%
of the time (the array is huge: (61373, 500)).

Any chance I can make this faster? I would believe BLAS/ATLAS would be
behind this, but from my quick analysis (ldd on numpy/core/multiarray.so)
it doesn't seem so. Have I done something stupid when building numpy
(disclaimer: I am on a system I don't know well --Mandriva--, so I could
very well have done something stupid)?

Cheers,

Gaël

From charlesr.harris at gmail.com  Fri Oct  3 13:14:08 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 3 Oct 2008 11:14:08 -0600
Subject: [Numpy-discussion] making numpy.dot faster
In-Reply-To: <20081003165902.GB28423@phare.normalesup.org>
References: <20081003165902.GB28423@phare.normalesup.org>
Message-ID:

On Fri, Oct 3, 2008 at 10:59 AM, Gael Varoquaux wrote:
> I am doing a calculation where one call to numpy.dot ends up taking 90%
> of the time (the array is huge: (61373, 500)).
>
> Any chance I can make this faster?

What does np.__config__.show() show? What exactly are you multiplying?
What is the original problem?

Chuck

From rwgk at yahoo.com  Fri Oct  3 14:49:21 2008
From: rwgk at yahoo.com (Ralf W. Grosse-Kunstleve)
Date: Fri, 3 Oct 2008 11:49:21 -0700 (PDT)
Subject: [Numpy-discussion] Co-existing Numeric and numpy?
Message-ID: <979257.99821.qm@web31403.mail.mud.yahoo.com>

Hi,

We have some older 3rd party packages that require Numeric (24.2), but
would like to also use newer 3rd party packages that require a recent
numpy. Can Numeric and numpy co-exist in the same process? -- I'm mainly
worried about clashes at the C API level.

Thanks!

Ralf
Can Numeric and numpy co-exist in the same > process? -- I'm mainly worried about clashes at the C API level. nope, there are no problems. In fact, converting between the two arrays types with asarray() works very efficiently. I sure wish everyone would make the transition, though. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rwgk at yahoo.com Fri Oct 3 15:36:30 2008 From: rwgk at yahoo.com (Ralf W. Grosse-Kunstleve) Date: Fri, 3 Oct 2008 12:36:30 -0700 (PDT) Subject: [Numpy-discussion] Co-existing Numeric and numpy? Message-ID: <519676.31370.qm@web31406.mail.mud.yahoo.com> Thank you very much for the information! There are weird but important reasons why we are stuck with Numeric 24.2 for the foreseeable future. It is extremely valuable that you provide a smooth upgrade path. Ralf ----- Original Message ---- From: Christopher Barker To: Discussion of Numerical Python Sent: Friday, October 3, 2008 12:12:48 PM Subject: Re: [Numpy-discussion] Co-existing Numeric and numpy? Ralf W. Grosse-Kunstleve wrote: > We have some older 3rd party packages that require Numeric (24.2), but like to also use newer > 3rd party packages that require a recent numpy. Can Numeric and numpy co-exist in the same > process? -- I'm mainly worried about clashes at the C API level. nope, there are no problems. In fact, converting between the two arrays types with asarray() works very efficiently. I sure wish everyone would make the transition, though. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Fri Oct 3 17:11:58 2008 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 3 Oct 2008 21:11:58 +0000 (UTC) Subject: [Numpy-discussion] making numpy.dot faster References: <20081003165902.GB28423@phare.normalesup.org> Message-ID: Fri, 03 Oct 2008 18:59:02 +0200, Gael Varoquaux wrote: > I am doing a calculation where one call numpy.dot ends up taking 90% of > the time (the array is huge: (61373, 500) ). > > Any chance I can make this faster? I would believe BLAS/ATLAS would be > behind this, but from my quick analysis (ldd on > numpy/core/multiarray.so) it doesn't seem so. Have I done something > stupid when building numpy (disclaimer: I am on a system I don't know > well --Mandriva--, so I could very well have done something stupid). AFAIK, multiarray.so is never linked against ATLAS. The accelerated dot implementation is in _dotblas.so, and can be toggled with alterdot/ restoredot (but the ATLAS one should be active by default). >>> numpy.dot.__module__ 'numpy.core._dotblas' Are your arrays appropriately contiguous? Numpy needs to copy the data if they are not; though I'm not sure if this could account for what you see. 
From rwgk at yahoo.com  Fri Oct  3 15:36:30 2008
From: rwgk at yahoo.com (Ralf W. Grosse-Kunstleve)
Date: Fri, 3 Oct 2008 12:36:30 -0700 (PDT)
Subject: [Numpy-discussion] Co-existing Numeric and numpy?
Message-ID: <519676.31370.qm@web31406.mail.mud.yahoo.com>

Thank you very much for the information! There are weird but important
reasons why we are stuck with Numeric 24.2 for the foreseeable future. It
is extremely valuable that you provide a smooth upgrade path.

Ralf

From pav at iki.fi  Fri Oct  3 17:11:58 2008
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 3 Oct 2008 21:11:58 +0000 (UTC)
Subject: [Numpy-discussion] making numpy.dot faster
References: <20081003165902.GB28423@phare.normalesup.org>
Message-ID:

Fri, 03 Oct 2008 18:59:02 +0200, Gael Varoquaux wrote:
> I am doing a calculation where one call to numpy.dot ends up taking 90%
> of the time (the array is huge: (61373, 500)).
>
> Any chance I can make this faster? I would believe BLAS/ATLAS would be
> behind this, but from my quick analysis (ldd on
> numpy/core/multiarray.so) it doesn't seem so. Have I done something
> stupid when building numpy (disclaimer: I am on a system I don't know
> well --Mandriva--, so I could very well have done something stupid)?

AFAIK, multiarray.so is never linked against ATLAS. The accelerated dot
implementation is in _dotblas.so, and can be toggled with
alterdot/restoredot (but the ATLAS one should be active by default).

>>> numpy.dot.__module__
'numpy.core._dotblas'

Are your arrays appropriately contiguous? Numpy needs to copy the data if
they are not, though I'm not sure if this could account for what you see.

--
Pauli Virtanen

From bolme1234 at comcast.net  Fri Oct  3 20:25:25 2008
From: bolme1234 at comcast.net (David Bolme)
Date: Fri, 3 Oct 2008 18:25:25 -0600
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To:
References: <48E1A6A2.4050406@noaa.gov> <320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com> <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com> <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
Message-ID: <26EB3B10-1BC7-438A-8C1A-6C41A227D6F6@comcast.net>

I remember reading a paper or book that stated that for data that has
been normalized, correlation and Euclidean distance are equivalent and
will produce the same knn results. To this end I spent a couple of hours
this afternoon doing the math. This document is the result.

http://www.cs.colostate.edu/~bolme/bolme08euclidean.pdf

I believe that with mean-subtracted and unit-length vectors, a Euclidean
knn algorithm will produce the same result as if the vectors were
compared using correlation. I am not sure if kd-trees will perform well
on the normalized vectors, as they have a very specific geometry. If my
math checks out, it may be worth adding Pearson's correlation as a
default option or as a separate class.

I have also spent a little time looking at kd-trees and the kdtree code.
It looks good. As I understand it, kd-trees only work well when the
number of datapoints (N) is larger than 2^D, where D is the
dimensionality of those points. This will not work well for many of my
computer vision problems because often D is large. As Anne suggested I
will probably look at cover trees, because often the data are
"low-dimensional data in high-dimensional spaces". I have been told,
though, that with a large D there is no known fast algorithm for knn.

Another problem is that the distances and similarity measures used in
biometrics and computer vision are often very specialized and may or may
not conform to the underlying assumptions of fast algorithms. I think for
this reason I will need an exhaustive search algorithm. I will code it up
modeled after Anne's interface and hopefully it will make it into the
spatial module.

I think that kd-trees and the spatial module are a good contribution to
scipy. I have also enjoyed learning more about norms, correlation, and
fast knn algorithms.

Thanks,
Dave
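David's identity is easy to check numerically: after the normalization he
describes, squared Euclidean distance and correlation are related by
d^2 = 2 - 2r, so both orderings give the same nearest neighbors. A small
sketch of the check (random data, names illustrative):

import numpy as np

def normalize(M):
    # subtract each row's mean, then scale rows to unit length
    X = M - M.mean(axis=1)[:, np.newaxis]
    return X / np.sqrt((X ** 2).sum(axis=1))[:, np.newaxis]

X = normalize(np.random.randn(20, 10))
r = np.dot(X[3], X[7])             # Pearson correlation of rows 3 and 7
d2 = ((X[3] - X[7]) ** 2).sum()    # squared Euclidean distance
print d2, 2 - 2 * r                # should agree to rounding error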
I am not sure if kd-trees > will perform well on the normalized vectors as they have a very > specific geometry. If my math checks out it may be worth adding > Pearson's correlation as a default option or as a separate class. Actually it's probably easier if the user just does the prenormalization. > I have also spent a little time looking at kd-trees and the kdtree > code. It looks good. As I understand it kd-trees only work well when > the number of datapoints (N) is larger than 2^D, where D is the > dimensionality of those points. This will not work well for many of > my computer vision problems because often D is large. As Anne > suggested I will probably look at cover trees because often times the > data are "low-dimensional data in high-dimensional spaces". I have > been told though that with a large D there is know known fast > algorithm for knn. Pretty much true. Though if the intrinsic dimensionality is low, cover trees should be all right. > Another problem is that the distances and similarity measures used in > biometrics and computer vision are often very specialized and may or > may not conform to the underlying assumptions of fast algorithms. I > think for this reason I will need an exhaustive search algorithm. I > will code it up modeled after Anne's interface and hopefully it will > make it into the spatial module. Metric spaces are quite general - edit distance for strings, for example, is an adequate distance measure. But brute-force is definitely worth having too. If I get the test suite cleaned up, it should be possible to just drop an arbitrary k-nearest-neighbors class into it and get a reasonably thorough collection of unit tests. Anne From gael.varoquaux at normalesup.org Sat Oct 4 07:00:18 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 4 Oct 2008 13:00:18 +0200 Subject: [Numpy-discussion] making numpy.dot faster In-Reply-To: References: <20081003165902.GB28423@phare.normalesup.org> Message-ID: <20081004110018.GD834@phare.normalesup.org> On Fri, Oct 03, 2008 at 09:11:58PM +0000, Pauli Virtanen wrote: > Fri, 03 Oct 2008 18:59:02 +0200, Gael Varoquaux wrote: > > I am doing a calculation where one call numpy.dot ends up taking 90% > > of > > the time (the array is huge: (61373, 500) ). > > Any chance I can make this faster? I would believe BLAS/ATLAS would > > be > > behind this, but from my quick analysis (ldd on > > numpy/core/multiarray.so) it doesn't seem so. Have I done something > > stupid when building numpy (disclaimer: I am on a system I don't know > > well --Mandriva--, so I could very well have done something stupid). > AFAIK, multiarray.so is never linked against ATLAS. The accelerated dot > implementation is in _dotblas.so, and can be toggled with alterdot/ > restoredot (but the ATLAS one should be active by default). > >>> numpy.dot.__module__ > 'numpy.core._dotblas' OK, thanks, that's useful info. I am not at work right now, and I can't log in the boxes at work, but I am pretty sure that 'numpy.dot.__module__' returned 'numpy.core.mutliarray', that's why I tried an ldd on it. On an Ubuntu box at home, using the off-the-shelf numpy package I have the same results, althought the numpy package has an optional dependency on atlas. I have 1.0.3 on the Ubuntu box and something more recent on the work box -- not sure what, but I seem to remember I grabbed trunk. Playing with restoredot/alterdot didn't change anything. > Are your arrays appropriately contiguous? 
Numpy needs to copy the data > if they are not; though I'm not sure if this could account for what you > see. On Fri, Oct 03, 2008 at 11:14:08AM -0600, Charles R Harris wrote: > What does np.__config__.show() show? That's on my home box (where things and also quite slow), but nothing good: In [4]: numpy.__config__.show() blas_info: libraries = ['blas'] library_dirs = ['/usr/lib'] language = f77 lapack_info: libraries = ['lapack'] library_dirs = ['/usr/lib'] language = f77 atlas_threads_info: NOT AVAILABLE blas_opt_info: libraries = ['blas'] library_dirs = ['/usr/lib'] language = f77 define_macros = [('NO_ATLAS_INFO', 1)] atlas_blas_threads_info: NOT AVAILABLE lapack_opt_info: libraries = ['lapack', 'blas'] library_dirs = ['/usr/lib'] language = f77 define_macros = [('NO_ATLAS_INFO', 1)] atlas_info: NOT AVAILABLE lapack_mkl_info: NOT AVAILABLE blas_mkl_info: NOT AVAILABLE atlas_blas_info: NOT AVAILABLE mkl_info: NOT AVAILABLE This seems to tell that numpy has been build without altas. Hum, maybe we need to work with the Debian guys to make sure that numpy is available with altas. My home box is a 4-years old AMD64 (single core) and it is slightly quicker than the bran new 8-cores, super-cool box we have at the lab (intel CPUs). I am quite puzzled. This is not the first time I see this. could it be because I am running a 64 bit distro? I haven't checked if PAE is enabled on the box at the lab, but I definitely know it is not 64 bit. > What exactly are you multiplying? Right now numy.random.random to do my tests, but my original problem (very noisy matrices coming from neuroimaging data) showed the same behavior. > What is the original problem? I am building a correlation matrix as a first step for a PCA. I have a matrix M made of m=61373 rows giving time-series of length n=500. I am calculating X = np.dot(M.T, M) and doing an SVD on X. This is not code I have writen, and I am slowly warming up to a new field, so I might be missing some obvious points, or talking nonsense. Cheers, Ga?l From josef.pktd at gmail.com Sat Oct 4 10:31:21 2008 From: josef.pktd at gmail.com (joep) Date: Sat, 4 Oct 2008 07:31:21 -0700 (PDT) Subject: [Numpy-discussion] converting nan to an integer Message-ID: <05249a34-b378-4bf8-b703-af50ad9c5e85@h60g2000hsg.googlegroups.com> Why does converting nan to an integer not throw an exception (as with inf), instead numpy silently replaces nan by zero? >>> inti = np.array([0,1,2]) >>> inti.dtype dtype('int32') >>> inti[1] = np.inf Traceback (most recent call last): File "", line 1, in ? inti[1] = np.inf OverflowError: cannot convert float infinity to long >>> inti[1] = np.nan >>> inti array([0, 0, 2]) This is based on an example in numpy.stats.distribution.py Josef From lists_ravi at lavabit.com Sat Oct 4 10:56:04 2008 From: lists_ravi at lavabit.com (Ravi) Date: Sat, 4 Oct 2008 10:56:04 -0400 Subject: [Numpy-discussion] PyArray_New bug? Message-ID: <200810041056.06065.lists_ravi@lavabit.com> Hi, PyArray_New seems to return a fortran contiguous array regardless of the requested array type. I am using numpy version 1.1.1. PyObject *obj = PyArray_New( &PyArray_Type, 2, dims, /*whatever*/, NULL, NULL, 0, NPY_CARRAY, NULL ); PyObject *obj = PyArray_New( &PyArray_Type, 2, dims, /*whatever*/, NULL, NULL, 0, NPY_FARRAY, NULL ); Both the above return a array who PyArray_ISFORTRAN( obj ) succeeds. I can verify this by checking bits 0 and 1 (LSB is bit 0) of PyArray_FLAGS. 
From lists_ravi at lavabit.com  Sat Oct  4 10:56:04 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Sat, 4 Oct 2008 10:56:04 -0400
Subject: [Numpy-discussion] PyArray_New bug?
Message-ID: <200810041056.06065.lists_ravi@lavabit.com>

Hi,

PyArray_New seems to return a Fortran-contiguous array regardless of the
requested array type. I am using numpy version 1.1.1.

PyObject *obj = PyArray_New( &PyArray_Type, 2, dims, /*whatever*/, NULL,
                             NULL, 0, NPY_CARRAY, NULL );
PyObject *obj = PyArray_New( &PyArray_Type, 2, dims, /*whatever*/, NULL,
                             NULL, 0, NPY_FARRAY, NULL );

Both of the above return an array for which PyArray_ISFORTRAN( obj )
succeeds. I can verify this by checking bits 0 and 1 (LSB is bit 0) of
PyArray_FLAGS.

Regards,
Ravi

From simpson at math.toronto.edu  Sat Oct  4 11:23:09 2008
From: simpson at math.toronto.edu (Gideon Simpson)
Date: Sat, 4 Oct 2008 11:23:09 -0400
Subject: [Numpy-discussion] fast matrix vector operations
Message-ID: <0D67E423-F549-4DA5-882C-4D6AF10E0FDD@math.toronto.edu>

Suppose I have a Toeplitz matrix, A. There is a well-known algorithm for
computing the matrix-vector product Ax in N log N operations. An exact
reference escapes me, but it may be in Golub & Van Loan's book.

My question is, how could I best take advantage of this algorithm within
numpy/scipy?

I could code it in Python. However, since Python is a high-level
language, it's not clear to me that I'd see an execution-time benefit
over numpy.dot(A,x). Alternatively, I could write it in a compiled
language and build Python bindings to it.

Thoughts?

-gideon

From opossumnano at gmail.com  Sat Oct  4 11:59:40 2008
From: opossumnano at gmail.com (Tiziano Zito)
Date: Sat, 4 Oct 2008 17:59:40 +0200
Subject: [Numpy-discussion] making numpy.dot faster
In-Reply-To: <20081004110018.GD834@phare.normalesup.org>
References: <20081003165902.GB28423@phare.normalesup.org> <20081004110018.GD834@phare.normalesup.org>
Message-ID:

Hi,

> This seems to tell that numpy has been built without atlas. Hum, maybe
> we need to work with the Debian guys to make sure that numpy is
> available with atlas.

we had a discussion regarding this issue recently on this mailing list,
see:
http://groups.google.com/group/Numpy-discussion/browse_thread/thread/507e7722f99406fa/

and on the debian bug tracker:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=489253

in debian testing and unstable numpy is now built with a patch that
enables atlas support. I don't know the status of the ubuntu package,
but I presume they may apply a similar patch until the numpy build
system's checks are relaxed in a way that no patch is needed anymore.

cheers,
tiziano

From xavier.gnata at gmail.com  Sat Oct  4 12:13:54 2008
From: xavier.gnata at gmail.com (Xavier Gnata)
Date: Sat, 04 Oct 2008 18:13:54 +0200
Subject: [Numpy-discussion] fast matrix vector operations
In-Reply-To: <0D67E423-F549-4DA5-882C-4D6AF10E0FDD@math.toronto.edu>
References: <0D67E423-F549-4DA5-882C-4D6AF10E0FDD@math.toronto.edu>
Message-ID: <48E79642.5040307@gmail.com>

> Suppose I have a Toeplitz matrix, A. There is a well-known algorithm
> for computing the matrix-vector product Ax in N log N operations. An
> exact reference escapes me, but it may be in Golub & Van Loan's book.
>
> My question is, how could I best take advantage of this algorithm
> within numpy/scipy?
>
> I could code it in Python. However, since Python is a high-level
> language, it's not clear to me that I'd see an execution-time benefit
> over numpy.dot(A,x). Alternatively, I could write it in a compiled
> language and build Python bindings to it.
>
> Thoughts?
>
> -gideon

http://www.scipy.org/PerformancePython

weave.inline or pyrex.

BTW, is there a good introduction to weave.inline? I had a hard time
learning/guessing it.

Xavier
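For what it's worth, the N log N trick Gideon mentions -- embedding the
Toeplitz matrix in a circulant and multiplying via the FFT -- can stay
entirely inside numpy, so the loop cost never hits Python. A sketch,
with an illustrative function name and argument convention (c is the
first column, r the first row, c[0] == r[0]):

import numpy as np

def toeplitz_matvec(c, r, x):
    # embed the n x n Toeplitz matrix in a 2n x 2n circulant and use FFTs
    n = len(x)
    col = np.concatenate((c, [0], r[:0:-1]))   # first column of the circulant
    y = np.fft.ifft(np.fft.fft(col) *
                    np.fft.fft(np.concatenate((x, np.zeros(n)))))
    return y[:n].real

# check against an explicitly built Toeplitz matrix
n = 6
c = np.random.randn(n); r = np.random.randn(n); r[0] = c[0]
A = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.randn(n)
print np.allclose(np.dot(A, x), toeplitz_matvec(c, r, x))   # -> True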
From charlesr.harris at gmail.com Sat Oct 4 12:16:37 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 4 Oct 2008 10:16:37 -0600
Subject: [Numpy-discussion] Warnings for Travis
Message-ID:

Hi Travis,

The following turned up on the 64-bit windows buildbot. I don't know what triggers them, as they don't normally show, but they look legitimate:

numpy\core\src\ufuncobject.c(1700) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
numpy\core\src\ufuncobject.c(1701) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
numpy\core\src\ufuncobject.c(2422) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data

They involve some of the parameters in the loop object. If this object is not exposed, I would suggest making them npy_intp. That said, I don't think casting to int is dangerous in this case.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oliphant at enthought.com Sat Oct 4 12:24:48 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Sat, 04 Oct 2008 11:24:48 -0500
Subject: [Numpy-discussion] PyArray_New bug?
In-Reply-To: <200810041056.06065.lists_ravi@lavabit.com>
References: <200810041056.06065.lists_ravi@lavabit.com>
Message-ID: <48E798D0.2000900@enthought.com>

Ravi wrote:
> Hi,
> PyArray_New seems to return a fortran-contiguous array regardless of the
> requested array type. I am using numpy version 1.1.1.
>
>   PyObject *obj = PyArray_New( &PyArray_Type, 2, dims, /*whatever*/, NULL,
>                                NULL, 0, NPY_CARRAY, NULL );
>   PyObject *obj = PyArray_New( &PyArray_Type, 2, dims, /*whatever*/, NULL,
>                                NULL, 0, NPY_FARRAY, NULL );
>
> Both of the above return an array for which PyArray_ISFORTRAN( obj )
> succeeds. I can verify this by checking bits 0 and 1 (LSB is bit 0) of
> PyArray_FLAGS.

The C-API for PyArray_New is not what you were expecting. When data is NULL, so that NumPy creates the array, the flags argument is a toggle switch (i.e. any non-zero value for the flags argument means construct a Fortran array).

Sorry for the confusion.

-Travis

From gael.varoquaux at normalesup.org Sat Oct 4 14:08:53 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 4 Oct 2008 20:08:53 +0200
Subject: [Numpy-discussion] making numpy.dot faster
In-Reply-To:
References: <20081003165902.GB28423@phare.normalesup.org> <20081004110018.GD834@phare.normalesup.org>
Message-ID: <20081004180853.GA23581@phare.normalesup.org>

On Sat, Oct 04, 2008 at 05:59:40PM +0200, Tiziano Zito wrote:
> Hi,
> > This seems to tell us that numpy has been built without atlas. Hmm, maybe we
> > need to work with the Debian guys to make sure that numpy is available
> > with atlas.
> we had a discussion regarding this issue recently on this mailing list, see:
> http://groups.google.com/group/Numpy-discussion/browse_thread/thread/507e7722f99406fa/
> and on the debian bug tracker:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=489253

OK, thanks. I thought I had seen something like this go by, but I wasn't sure.

> in debian testing and unstable, numpy is now built with a patch that
> enables atlas support. I don't know the status of the ubuntu package,
> but I presume they may apply a similar patch until the numpy build
> system's checks are relaxed in a way that no patch is needed anymore.

I'll make sure that this patch makes its way into the next ubuntu. Thanks a lot for recalling this discussion, and forgive me for not paying enough attention. My attention span is limited because I am doing too many things.

Gaël

From daniel.wolff at gmail.com Sat Oct 4 14:14:06 2008
From: daniel.wolff at gmail.com (Daniel Wolff)
Date: Sun, 5 Oct 2008 02:14:06 +0800
Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window
Message-ID: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com>

Hello, I recently upgraded to python 2.6.
I cannot seem to find the appropriate numpy install file. I tried to make use of the .exe built for python 2.5 but the installer hung. Any suggestions? Regards, Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Oct 4 19:28:01 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 4 Oct 2008 18:28:01 -0500 Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window In-Reply-To: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> References: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> Message-ID: <3d375d730810041628q8b7a68eg25cccf2cab2ef9ee@mail.gmail.com> On Sat, Oct 4, 2008 at 13:14, Daniel Wolff wrote: > Hello, I recently upgraded to python 2.6. I cannot seem to find the > appropriate numpy install file. I tried to make use of the .exe built for > python 2.5 but the installer hung. Any suggestions? Regards, Daniel Right, you can't use a binary installer meant for Python 2.5 with Python 2.6. The binaries aren't compatible. If you do not want to build from source, you will have to wait for the volunteer binary-builders to get around to making a Python 2.6 installer. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From alan.mcintyre at gmail.com Sat Oct 4 19:28:13 2008 From: alan.mcintyre at gmail.com (Alan McIntyre) Date: Sat, 4 Oct 2008 16:28:13 -0700 Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window In-Reply-To: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> References: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> Message-ID: <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> On Sat, Oct 4, 2008 at 11:14 AM, Daniel Wolff wrote: > Hello, I recently upgraded to python 2.6. I cannot seem to find the > appropriate numpy install file. I tried to make use of the .exe built for > python 2.5 but the installer hung. Any suggestions? Regards, Daniel Hi Daniel, Last I heard, there are still some things that need to be ironed out for NumPy on Python 2.6, so you will probably have to stick with 2.5 for a while. (Somebody please correct me if I'm wrong and there's an imminent release for 2.6 coming up ;) Cheers, Alan From david at ar.media.kyoto-u.ac.jp Sun Oct 5 04:25:39 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 05 Oct 2008 17:25:39 +0900 Subject: [Numpy-discussion] Merged clean_math_config branch Message-ID: <48E87A03.1020603@ar.media.kyoto-u.ac.jp> Hi there, Just to mention that I merged back my changes from the clean_math_config branch into trunk. The main point of the branch is to clean our math configuration. If this causes problems, please report it. I built and tested on mac os x, linux 32 bits and windows (both mingw32 and VS 2003). It breaks windows 64 bits ATM, but this will be fixed soon. The numscons built is broken as well, but the missing features are already backported from numpy.distutils to numscons; a new working version of numscons is about to be released. Some details for the record: - more code is platform independent, with platform specifics math functions at one location - instead of platform-specific heuristics, every function is tested at the configuration stage. Some distutils.command.config functionalities were improved to make this process as fast as before for relatively standard-compliant platforms (mac os X and linux are as fast to build as before). 
- the distutils.command.config.check_func can now detect MS intrinsics (functions which are not visible without optimization flags). - should also be more robust for non standard platforms (MS compilers, mostly). In the process, a few hundred lines of C code have been removed. cheers, David From dmitrey.kroshko at scipy.org Sun Oct 5 05:31:51 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Sun, 05 Oct 2008 12:31:51 +0300 Subject: [Numpy-discussion] asscalar(number) - why yields error, why can't return the number?! Message-ID: <48E88987.8010905@scipy.org> hi all, I wonder why numpy.asscalar(1.5) yields error, why it can't just return 1.5? Is it intended to be ever changed? >>> numpy.__version__ '1.3.0.dev5864' D. From hanni.ali at gmail.com Sun Oct 5 05:38:19 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Sun, 5 Oct 2008 10:38:19 +0100 Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window In-Reply-To: <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> References: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> Message-ID: <789d27b10810050238h21776111l910838cfb2fa0657@mail.gmail.com> Hi Alan, I've been using numpy with 2.6 on amd64 for about 3 months no problem and far more stable on 64-bit than the same combination with 2.5. Daniel, just download the source and compile it yourself. You need to use VS 9.0 for best compatibility with Python and I haven't yet sorted out fast blas I've just been using the internal numpy one. Hanni 2008/10/5 Alan McIntyre > On Sat, Oct 4, 2008 at 11:14 AM, Daniel Wolff > wrote: > > Hello, I recently upgraded to python 2.6. I cannot seem to find the > > appropriate numpy install file. I tried to make use of the .exe built for > > python 2.5 but the installer hung. Any suggestions? Regards, Daniel > > Hi Daniel, > > Last I heard, there are still some things that need to be ironed out > for NumPy on Python 2.6, so you will probably have to stick with 2.5 > for a while. (Somebody please correct me if I'm wrong and there's an > imminent release for 2.6 coming up ;) > > Cheers, > Alan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Oct 5 05:42:09 2008 From: cournape at gmail.com (David Cournapeau) Date: Sun, 5 Oct 2008 18:42:09 +0900 Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window In-Reply-To: <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> References: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> Message-ID: <5b8d13220810050242u7a221a58uf087c18150db6bc6@mail.gmail.com> On Sun, Oct 5, 2008 at 8:28 AM, Alan McIntyre wrote: > On Sat, Oct 4, 2008 at 11:14 AM, Daniel Wolff wrote: >> Hello, I recently upgraded to python 2.6. I cannot seem to find the >> appropriate numpy install file. I tried to make use of the .exe built for >> python 2.5 but the installer hung. Any suggestions? Regards, Daniel > > Hi Daniel, > > Last I heard, there are still some things that need to be ironed out > for NumPy on Python 2.6, so you will probably have to stick with 2.5 > for a while. 
(Somebody please correct me if I'm wrong and there's an > imminent release for 2.6 coming up ;) One thing which was a relatively major stopper was nose, our test infrastructure, was not 2.6 compatible, but that has just changed. I will look on windows to see if 1.2 can be built with 2008 express, cheers, David From hanni.ali at gmail.com Sun Oct 5 08:08:48 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Sun, 5 Oct 2008 13:08:48 +0100 Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window In-Reply-To: <5b8d13220810050242u7a221a58uf087c18150db6bc6@mail.gmail.com> References: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> <5b8d13220810050242u7a221a58uf087c18150db6bc6@mail.gmail.com> Message-ID: <789d27b10810050508s72c04c5fwac308005bccca45e@mail.gmail.com> Hi David, Yeah nose was an issue, thanks for letting me know about nose compatibility being sorted that's good news as I use it a bit with my testing. I last built 1.1.1 with express for 32bit it went fine (although I think you do have to comment out , I was also able to compile with 2008 full for 64-bit platform, my next goal is to see if i can sort out blas etc. external libs on 64bit. >From the "Installation info" thread around the 30th May I had reported the one oddity I had encountered with the VS compiler > In order to get numpy to compile I > have commented out a small part which was causing compilation to fail: > > numpy\core\src\umathmodule.c.src(64) : error C2059: syntax error : 'type' > numpy\core\src\umathmodule.c.src(70) : error C2059: syntax error : 'type' > > This relates to this section of code: > > #ifndef HAVE_FREXPF > static float frexpf(float x, int * i) > { > return (float)frexp((double)(x), i); > } > #endif > #ifndef HAVE_LDEXPF > static float ldexpf(float x, int i) > { > return (float)ldexp((double)(x), i); > } > #endif At the time I had tried to send further output following a checkout, but couldn't get it to post to the list, I think the message was too big or something. I will probably be having a go with 1.2.0, when I get some time. I'll let you know how it goes. Cheers, Hanni 2008/10/5 David Cournapeau > On Sun, Oct 5, 2008 at 8:28 AM, Alan McIntyre > wrote: > > On Sat, Oct 4, 2008 at 11:14 AM, Daniel Wolff > wrote: > >> Hello, I recently upgraded to python 2.6. I cannot seem to find the > >> appropriate numpy install file. I tried to make use of the .exe built > for > >> python 2.5 but the installer hung. Any suggestions? Regards, Daniel > > > > Hi Daniel, > > > > Last I heard, there are still some things that need to be ironed out > > for NumPy on Python 2.6, so you will probably have to stick with 2.5 > > for a while. (Somebody please correct me if I'm wrong and there's an > > imminent release for 2.6 coming up ;) > > One thing which was a relatively major stopper was nose, our test > infrastructure, was not 2.6 compatible, but that has just changed. I > will look on windows to see if 1.2 can be built with 2008 express, > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david at ar.media.kyoto-u.ac.jp Sun Oct 5 09:07:10 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 05 Oct 2008 22:07:10 +0900 Subject: [Numpy-discussion] cannot find numpy 1.2 for python 2.6 on window In-Reply-To: <789d27b10810050508s72c04c5fwac308005bccca45e@mail.gmail.com> References: <48e7b281.044e6e0a.2ae2.2e7c@mx.google.com> <1d36917a0810041628v50464128wfe49ff8d7d915f8b@mail.gmail.com> <5b8d13220810050242u7a221a58uf087c18150db6bc6@mail.gmail.com> <789d27b10810050508s72c04c5fwac308005bccca45e@mail.gmail.com> Message-ID: <48E8BBFE.1030808@ar.media.kyoto-u.ac.jp> Hanni Ali wrote: > Hi David, > > Yeah nose was an issue, thanks for letting me know about nose > compatibility being sorted that's good news as I use it a bit with my > testing. > > I last built 1.1.1 with express for 32bit it went fine (although I > think you do have to comment out , I was also able to compile with > 2008 full for 64-bit platform, my next goal is to see if i can sort > out blas etc. external libs on 64bit. That's the major difficulty: no open source blas/lapack (that I am aware of) are buildable with MS compilers, and there is no official open source compiler (mingw) on 64 bits. Also, cygwin itself is not available in 64 bits (only available on WOW, the windows subsystem to run 32 bits on 64 bits OS). Atlas, in particular, is not buildable for 64 bits windows AFAIK. > > > > > > #ifndef HAVE_FREXPF > > static float frexpf(float x, int * i) > > { > > return (float)frexp((double)(x), i); > > } > > #endif > > #ifndef HAVE_LDEXPF > > static float ldexpf(float x, int i) > > { > > return (float)ldexp((double)(x), i); > > } > > #endif > > At the time I had tried to send further output following a checkout, > but couldn't get it to post to the list, I think the message was too > big or something. I will probably be having a go with 1.2.0, when I > get some time. I'll let you know how it goes. I did some heavy refactoring for the above problems, and it should be now easier to handle (in the trunk). I could build 1.2.0 with VS 2008 express on 32 bits (wo blas/lapack), and there are some test errors - albeit relatively minor at first sight. I have not tried on 64 bits. 
cheers, David From tjhnson at gmail.com Sun Oct 5 16:53:22 2008 From: tjhnson at gmail.com (T J) Date: Sun, 5 Oct 2008 13:53:22 -0700 Subject: [Numpy-discussion] Test failures on 2.6 Message-ID: Hi, I'm getting a couple of test failures with Python 2.6, Numpy 1.2.0, Nose 0.10.4: nose version 0.10.4 ..........................................................................................................................................................................................................................................................................................................................................................................................................................................F................K........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./share/home/me/usr/lib/python2.6/site-packages/numpy/lib/tests/test_io.py:68: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(c.readlines(), ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./share/home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1315: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(store._mask, True) /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1322: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(store._mask, True) /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1989: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(test.mask, [0,1,0,0,0,0,0,0,0,0]) ...............................................E................................................................................................................................................................................ 
====================================================================== ERROR: Tests the min/max functions with explicit outputs ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py", line 653, in test_minmax_funcs_with_output result = npfunc(xm,axis=0,out=nout) File "/home/me/usr/lib/python2.6/site-packages/numpy/core/fromnumeric.py", line 1525, in amin return amin(axis, out) File "/home/me/usr/lib/python2.6/site-packages/numpy/ma/core.py", line 2978, in min np.putmask(out, newmask, np.nan) ValueError: cannot convert float NaN to integer ====================================================================== FAIL: test_umath.TestComplexFunctions.test_against_cmath ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/me/usr/lib/python2.6/site-packages/nose-0.10.4-py2.6.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/home/me/usr/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 268, in test_against_cmath assert abs(a - b) < atol, "%s %s: %s; cmath: %s"%(fname,p,a,b) AssertionError: arcsin 2: (1.57079632679-1.31695789692j); cmath: (1.57079632679+1.31695789692j) ---------------------------------------------------------------------- Ran 1726 tests in 8.856s FAILED (KNOWNFAIL=1, errors=1, failures=1) From millman at berkeley.edu Sun Oct 5 17:57:30 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 5 Oct 2008 14:57:30 -0700 Subject: [Numpy-discussion] Merged clean_math_config branch In-Reply-To: <48E87A03.1020603@ar.media.kyoto-u.ac.jp> References: <48E87A03.1020603@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Oct 5, 2008 at 1:25 AM, David Cournapeau wrote: > Just to mention that I merged back my changes from the > clean_math_config branch into trunk. The main point of the branch is to > clean our math configuration. If this causes problems, please report it. > I built and tested on mac os x, linux 32 bits and windows (both mingw32 > and VS 2003). It breaks windows 64 bits ATM, but this will be fixed > soon. The numscons built is broken as well, but the missing features are > already backported from numpy.distutils to numscons; a new working > version of numscons is about to be released. Excellent. Thanks for working on this. -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From oliphant at enthought.com Sun Oct 5 18:17:59 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Sun, 05 Oct 2008 17:17:59 -0500 Subject: [Numpy-discussion] Merged clean_math_config branch In-Reply-To: <48E87A03.1020603@ar.media.kyoto-u.ac.jp> References: <48E87A03.1020603@ar.media.kyoto-u.ac.jp> Message-ID: <48E93D17.2000207@enthought.com> David Cournapeau wrote: > Hi there, > > Just to mention that I merged back my changes from the > clean_math_config branch into trunk. The main point of the branch is to > clean our math configuration. If this causes problems, please report it. > I built and tested on mac os x, linux 32 bits and windows (both mingw32 > and VS 2003). It breaks windows 64 bits ATM, but this will be fixed > soon. The numscons built is broken as well, but the missing features are > already backported from numpy.distutils to numscons; a new working > version of numscons is about to be released. > This is a really good thing and a lot of work. 
Thank you, David for doing all the heavy lifting. -Travis From millman at berkeley.edu Sun Oct 5 22:59:28 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 5 Oct 2008 19:59:28 -0700 Subject: [Numpy-discussion] Please backport fixes to the 1.2.x branch Message-ID: Hello, I would like to get a 1.2.1 release out ASAP. There are several bug-fixes on the trunk that need to be backported. If you have made a bug-fix to the trunk that you have been waiting to backport to the 1.2.x branch, please do so now: http://svn.scipy.org/svn/numpy/branches/1.2.x Ideally, I would like to freeze the branch for the 1.2.1 release in about 1 week. Please let me know if you need more time or if there is something in particular that you would like to see backported. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From roeland at huysmus.be Mon Oct 6 04:00:25 2008 From: roeland at huysmus.be (Roeland Huys) Date: Mon, 6 Oct 2008 10:00:25 +0200 (CEST) Subject: [Numpy-discussion] install without admin rights Message-ID: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> Hi, I am very interested in my work to switch from Matlab to Numpy. However, I want to test some algorithms first. I tried to install NumPy on one of our corporate servers. However, I do not have root access, so i did following: tar -xvzf numpy-1.2.0.tar.gz [OK] cd numpy-1.2.0 python setup.py build [OK] set PYTHONPATH=$HOME/pythonlib mv numpy ~/pythonlib python import numpy [ FAILS!] Could you tell me how I can install NumPy on my local account, (without admin rights) ? Thanks, Roeland. From matthieu.brucher at gmail.com Mon Oct 6 04:10:27 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 6 Oct 2008 10:10:27 +0200 Subject: [Numpy-discussion] install without admin rights In-Reply-To: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> References: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> Message-ID: python setup.py install --prefix=$HOME/local/lib set PYTHONPATH=$HOME/local/lib/python2.5/site-packages and you're good to go. This is how I install everything and it's been working for several years now ;) Matthieu 2008/10/6 Roeland Huys : > > Hi, > > I am very interested in my work to switch from Matlab to Numpy. However, > I want to test some algorithms first. > I tried to install NumPy on one of our corporate servers. However, I do > not have root access, so i did following: > > tar -xvzf numpy-1.2.0.tar.gz [OK] > cd numpy-1.2.0 > python setup.py build [OK] > set PYTHONPATH=$HOME/pythonlib > mv numpy ~/pythonlib > python > import numpy [ FAILS!] > > Could you tell me how I can install NumPy on my local account, (without > admin rights) ? > > Thanks, > > Roeland. 
> > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, PhD Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From stefan at sun.ac.za Mon Oct 6 04:27:46 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 6 Oct 2008 10:27:46 +0200 Subject: [Numpy-discussion] install without admin rights In-Reply-To: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> References: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> Message-ID: <9457e7c80810060127o5246d16bub23b4822729a64ad@mail.gmail.com> Hi Roeland 2008/10/6 Roeland Huys : > tar -xvzf numpy-1.2.0.tar.gz [OK] > cd numpy-1.2.0 > python setup.py build [OK] python setup.py install --prefix=${HOME} > set PYTHONPATH=$HOME/pythonlib export PYTHONPATH=${HOME}/lib/python2.5/site-packages Cheers St?fan From roeland at huysmus.be Mon Oct 6 05:23:08 2008 From: roeland at huysmus.be (Roeland Huys) Date: Mon, 6 Oct 2008 11:23:08 +0200 (CEST) Subject: [Numpy-discussion] install without admin rights In-Reply-To: <9457e7c80810060127o5246d16bub23b4822729a64ad@mail.gmail.com> References: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> <9457e7c80810060127o5246d16bub23b4822729a64ad@mail.gmail.com> Message-ID: Thanks! Unfortunately, I got following error: >>> import numpy Traceback (most recent call last): File "", line 1, in ? File "/imec/other/neuray2/lib64/python2.3/site-packages/numpy/__init__.py", line 125, in ? import add_newdocs File "/imec/other/neuray2/lib64/python2.3/site-packages/numpy/add_newdocs.py", line 9, in ? from lib import add_newdoc File "/imec/other/neuray2/lib64/python2.3/site-packages/numpy/lib/__init__.py", line 22, in ? from arrayterator import * File "numpy/lib/arrayterator.py", line 89 slice_ = tuple(slice(*t) for t in zip( ^ SyntaxError: invalid syntax I have python 2.3 installed on the server. Thanks, Roeland On Mon, October 6, 2008 10:27 am, St?fan van der Walt wrote: > Hi Roeland > > 2008/10/6 Roeland Huys : >> tar -xvzf numpy-1.2.0.tar.gz [OK] >> cd numpy-1.2.0 >> python setup.py build [OK] > > python setup.py install --prefix=${HOME} > >> set PYTHONPATH=$HOME/pythonlib > > export PYTHONPATH=${HOME}/lib/python2.5/site-packages > > Cheers > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- ------------------------------------ Roeland Huys http://www.huysmus.be http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents http://badvista.fsf.org/ From matthieu.brucher at gmail.com Mon Oct 6 05:26:27 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 6 Oct 2008 11:26:27 +0200 Subject: [Numpy-discussion] install without admin rights In-Reply-To: References: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> <9457e7c80810060127o5246d16bub23b4822729a64ad@mail.gmail.com> Message-ID: Hi again, That's because Python 2.3 is not supported by numpy 1.2 anymore (see the release notes) :| Try installing a Python 2.5 locally, as with numpy. Matthieu 2008/10/6 Roeland Huys : > Thanks! > > Unfortunately, I got following error: > >>>> import numpy > Traceback (most recent call last): > File "", line 1, in ? 
> File > "/imec/other/neuray2/lib64/python2.3/site-packages/numpy/__init__.py", > line 125, in ? > import add_newdocs > File > "/imec/other/neuray2/lib64/python2.3/site-packages/numpy/add_newdocs.py", > line 9, in ? > from lib import add_newdoc > File > "/imec/other/neuray2/lib64/python2.3/site-packages/numpy/lib/__init__.py", > line 22, in ? > from arrayterator import * > File "numpy/lib/arrayterator.py", line 89 > slice_ = tuple(slice(*t) for t in zip( > ^ > SyntaxError: invalid syntax > > > I have python 2.3 installed on the server. > > Thanks, > > Roeland > > > > > > > On Mon, October 6, 2008 10:27 am, St?fan van der Walt wrote: >> Hi Roeland >> >> 2008/10/6 Roeland Huys : >>> tar -xvzf numpy-1.2.0.tar.gz [OK] >>> cd numpy-1.2.0 >>> python setup.py build [OK] >> >> python setup.py install --prefix=${HOME} >> >>> set PYTHONPATH=$HOME/pythonlib >> >> export PYTHONPATH=${HOME}/lib/python2.5/site-packages >> >> Cheers >> St?fan >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > > > -- > ------------------------------------ > Roeland Huys > http://www.huysmus.be > > http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents > http://badvista.fsf.org/ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, PhD Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From david at ar.media.kyoto-u.ac.jp Mon Oct 6 08:14:40 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 06 Oct 2008 21:14:40 +0900 Subject: [Numpy-discussion] install without admin rights In-Reply-To: References: <4502aff9450d71c9649d171c6e7e9c91.squirrel@webmail01.one.com> <9457e7c80810060127o5246d16bub23b4822729a64ad@mail.gmail.com> Message-ID: <48EA0130.6090808@ar.media.kyoto-u.ac.jp> Matthieu Brucher wrote: > Hi again, > > That's because Python 2.3 is not supported by numpy 1.2 anymore (see > the release notes) :| > Try installing a Python 2.5 locally, as with numpy. > Alternatively, you can install numpy 1.1.x (numpy 1.1* still support python 2.3). cheers, David From cournape at gmail.com Mon Oct 6 12:02:36 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Oct 2008 01:02:36 +0900 Subject: [Numpy-discussion] Merged clean_math_config branch In-Reply-To: <48E93D17.2000207@enthought.com> References: <48E87A03.1020603@ar.media.kyoto-u.ac.jp> <48E93D17.2000207@enthought.com> Message-ID: <5b8d13220810060902w739a3bdet1f24e0cad47921ce@mail.gmail.com> On Mon, Oct 6, 2008 at 7:17 AM, Travis E. Oliphant wrote: >> > This is a really good thing and a lot of work. Thank you, David for > doing all the heavy lifting. Thanks. That's really a first step, though, as the long-term goal is to start support of dynamically loaded optimzed SIMD (SSE and co) code. 
cheers,

David

From john at saponara.net Mon Oct 6 15:32:55 2008
From: john at saponara.net (John)
Date: Mon, 06 Oct 2008 15:32:55 -0400
Subject: [Numpy-discussion] multi-comparison expressions
Message-ID: <48EA67E7.1090900@saponara.net>

hi,

why does the ValueError appear below, and how can i make that 2<a<5 expression work when a is an array?

>>> from numpy import reshape,arange
>>> a=reshape(arange(9),(3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> 2<a
array([[False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)
>>> a<5
array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]], dtype=bool)
>>> 2<a<5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

nor does it work with constant arrays:

>>> from numpy import zeros
>>> twos=zeros(a.shape)+2
>>> fives=zeros(a.shape)+5
>>> twos<a
array([[False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)
>>> a<fives
array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]], dtype=bool)
>>> twos<a<fives
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

but it works with builtin numbers:

>>> b=3
>>> 2<b<5
True

From nadavh at visionsense.com (Nadav Horesh)
Subject: RE: [Numpy-discussion] multi-comparison expressions
Message-ID: <710F2847B0018641891D9A216027636029C2C2@ex3.envision.co.il>

a < b < c

is equivalent to:

(a < b) and (b < c)

If you want an element-wise operation you have to use the & operator, so the expression is:

(a > 2) & (a < 5)

  Nadav

-----Original Message-----
From: numpy-discussion-bounces at scipy.org on behalf of John
Sent: Mon 06-October-08 21:32
To: numpy-discussion at scipy.org
Subject: [Numpy-discussion] multi-comparison expressions

[John's message quoted in full]

From robert.kern at gmail.com Mon Oct 6 15:59:34 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 6 Oct 2008 14:59:34 -0500
Subject: [Numpy-discussion] multi-comparison expressions
In-Reply-To: <48EA67E7.1090900@saponara.net>
References: <48EA67E7.1090900@saponara.net>
Message-ID: <3d375d730810061259r311ccea4s258ed8962f9889b3@mail.gmail.com>

On Mon, Oct 6, 2008 at 14:32, John wrote:
> hi,
>
> why does the ValueError appear below, and how can i make that 2<a<5
> expression work when a is an array?

(2<a<5) is not a single operation; Python evaluates the two comparisons in turn and converts the intermediate arguments to actual boolean True or False objects, so this is actually (bool(2<a) and bool(a<5)). If 'a' is an array, then (2<a) is a boolean array, and its truth value is ambiguous; sometimes you want it to be True if any of the elements are True and sometimes you only want it to be True if all of the elements are True. Consequently, numpy refuses to guess what you want and makes you specify with the .any() and .all() methods. Unfortunately, this means that (2<a<5) can never be made to work.

--
Robert Kern

From Chris.Barker at noaa.gov (Christopher Barker)
Subject: Re: [Numpy-discussion] multi-comparison expressions
References: <48EA67E7.1090900@saponara.net>
Message-ID: <48EA6EE8.9000606@noaa.gov>

John wrote:
> hi,
>
> why does the ValueError appear below,

because python short-circuits this expression, and numpy can't override that -- the same reason `a and b` doesn't work for arrays a and b.

> and how can i make that 2<a<5
> expression work when a is an array?

>>> (a > 2) & (a < 5)
array([[False, False, False],
       [ True,  True, False],
       [False, False, False]], dtype=bool)

& is the bitwise and, which is overridden by numpy and works like `and` for boolean data.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
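The thread's answer in one self-contained snippet (variable names follow John's session):

    import numpy as np

    a = np.arange(9).reshape((3, 3))

    # & and | are elementwise on boolean arrays; the parentheses are
    # required because & binds more tightly than < and >
    mask = (2 < a) & (a < 5)
    print mask
    print a[mask]                        # -> [3 4]

    # the same thing without operator-precedence worries
    print np.logical_and(2 < a, a < 5)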
From charlesr.harris at gmail.com Mon Oct 6 16:58:12 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 6 Oct 2008 14:58:12 -0600
Subject: [Numpy-discussion] multi-comparison expressions
In-Reply-To: <3d375d730810061259r311ccea4s258ed8962f9889b3@mail.gmail.com>
References: <48EA67E7.1090900@saponara.net> <3d375d730810061259r311ccea4s258ed8962f9889b3@mail.gmail.com>
Message-ID:

On Mon, Oct 6, 2008 at 1:59 PM, Robert Kern wrote:
> On Mon, Oct 6, 2008 at 14:32, John wrote:
> > hi,
> >
> > why does the ValueError appear below, and how can i make that 2<a<5
> > expression work when a is an array?
>
> (2<a<5) is not a single operation; Python evaluates the two comparisons
> in turn and converts the intermediate arguments to actual boolean True or
> False objects, so this is actually (bool(2<a) and bool(a<5)). If 'a' is an
> array, then (2<a) is a boolean array, and its truth value is ambiguous;
> sometimes you want it to be True if any of the elements are
> True and sometimes you only want it to be True if all of the elements
> are True. Consequently, numpy refuses to guess what you want and makes
> you specify with the .any() and .all() methods. Unfortunately, this
> means that (2<a<5) can never be made to work.

This reminds me that if we add matrix operators to python, it would also be nice to have some for the logical operators. Having && and || would be handy.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nadavh at visionsense.com Mon Oct 6 23:45:44 2008
From: nadavh at visionsense.com (Nadav Horesh)
Date: Tue, 7 Oct 2008 05:45:44 +0200
Subject: [Numpy-discussion] Scipy in Python 2.6
References:
Message-ID: <710F2847B0018641891D9A216027636029C2C3@ex3.envision.co.il>

Scipy from svn compiles with python2.6 with no problem. Problems that came later:

1. Could not install packages of scikits: I got a strange error about an unknown name: log
2. When trying to run from "idle", warnings become errors. I think that is a bug in idle.

  Nadav.

-----Original Message-----
From: jah [mailto:jah.mailinglist at gmail.com]
Sent: Tue 07-October-08 00:45
To: Nadav Horesh
Subject: Scipy in Python 2.6

Hi Nadav,

I saw:
http://projects.scipy.org/pipermail/numpy-discussion/2008-July/035968.html
where you reported some compilation problems with scipy in Python 2.6. Did you resolve this issue? Do you have any knowledge if the current svn still has this problem?

Thanks

From robert.kern at gmail.com Mon Oct 6 23:52:28 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 6 Oct 2008 22:52:28 -0500
Subject: [Numpy-discussion] Scipy in Python 2.6
In-Reply-To: <710F2847B0018641891D9A216027636029C2C3@ex3.envision.co.il>
References: <710F2847B0018641891D9A216027636029C2C3@ex3.envision.co.il>
Message-ID: <3d375d730810062052o4ff9de49q183de24d27b4561a@mail.gmail.com>

2008/10/6 Nadav Horesh:
>
> Scipy from svn compiles with python2.6 with no problem. Problems that came later:
> 1. Could not install packages of scikits: I got a strange error about an unknown name: log

It's a bug in the version of setuptools that you have. I believe it has been fixed in the latest release. But please, when you report an error message, copy-and-paste the traceback rather than paraphrasing it.

> 2. When trying to run from "idle", warnings become errors. I think that is a bug in idle.

Probably.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

From gael.varoquaux at normalesup.org Tue Oct 7 01:08:24 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 7 Oct 2008 07:08:24 +0200
Subject: [Numpy-discussion] Scipy in Python 2.6
In-Reply-To: <710F2847B0018641891D9A216027636029C2C3@ex3.envision.co.il>
References: <710F2847B0018641891D9A216027636029C2C3@ex3.envision.co.il>
Message-ID: <20081007050824.GA8091@phare.normalesup.org>

On Tue, Oct 07, 2008 at 05:45:44AM +0200, Nadav Horesh wrote:
> Scipy from svn compiles with python2.6 with no problem. Problems that came later:
> 1. Could not install packages of scikits: I got a strange error about an unknown name: log

Chances are this is a setuptools problem. Could you post the traceback? In that case, upgrading to the latest release, 0.6c9, should fix the problem.

Gaël

From nadavh at visionsense.com Tue Oct 7 03:15:12 2008
From: nadavh at visionsense.com (Nadav Horesh)
Date: Tue, 07 Oct 2008 09:15:12 +0200
Subject: [Numpy-discussion] Scipy in Python 2.6
In-Reply-To: <3d375d730810062052o4ff9de49q183de24d27b4561a@mail.gmail.com>
References: <710F2847B0018641891D9A216027636029C2C3@ex3.envision.co.il> <3d375d730810062052o4ff9de49q183de24d27b4561a@mail.gmail.com>
Message-ID: <1223363712.18980.3.camel@nadav.envision.co.il>

1. The latest release of setuptools solved the problem with scikits.
2. An idle error-log example:

>>> from pylab import *
Traceback (most recent call last):
  File "", line 1, in <module>
    from pylab import *
  File "/usr/local/lib/python2.6/site-packages/pylab.py", line 1, in <module>
    from matplotlib.pylab import *
  File "/usr/local/lib/python2.6/site-packages/matplotlib/__init__.py", line 96, in <module>
    import md5, os, re, shutil, sys, warnings
  File "/usr/local/lib/python2.6/md5.py", line 8, in <module>
    DeprecationWarning, 2)
  File "/usr/local/lib/python2.6/warnings.py", line 29, in _show_warning
    file.write(formatwarning(message, category, filename, lineno, line))
TypeError: idle_formatwarning_subproc() takes exactly 4 arguments (5 given)

3. I have the feeling that python2.6 is going the way of python 2.2, which had to pass two bug-fix revisions to become really stable.

  Nadav.

On Mon, 2008-10-06 at 22:52 -0500, Robert Kern wrote:
> 2008/10/6 Nadav Horesh:
> >
> > Scipy from svn compiles with python2.6 with no problem. Problems that came later:
> > 1. Could not install packages of scikits: I got a strange error about an unknown name: log
>
> It's a bug in the version of setuptools that you have. I believe it
> has been fixed in the latest release. But please, when you report an
> error message, copy-and-paste the traceback rather than paraphrasing
> it.
>
> > 2. When trying to run from "idle", warnings become errors. I think that is a bug in idle.
>
> Probably.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From plucek at ssaris.com Tue Oct 7 13:42:30 2008
From: plucek at ssaris.com (Paul Lucek)
Date: Tue, 7 Oct 2008 13:42:30 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
Message-ID: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>

I get the following errors in umathmodule.c.src:

D:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Id:\python26\include -Id:\python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c
/Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj

umathmodule.c
numpy\core\src\umathmodule.c.src(64) : error C2059: syntax error : 'type'
numpy\core\src\umathmodule.c.src(70) : error C2059: syntax error : 'type'
numpy\core\src\ufuncobject.c(1701) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
numpy\core\src\ufuncobject.c(2422) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
error: Command "D:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Id:\python26\include -Id:\python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c /Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj" failed with exit status 2
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Tue Oct 7 14:19:21 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Oct 2008 12:19:21 -0600
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
Message-ID:

On Tue, Oct 7, 2008 at 11:42 AM, Paul Lucek wrote:
> I get the following errors in umathmodule.c.src:
>
> [build command and log quoted in full above]

Could you compress win-amd64-2.6\numpy\core\src\umathmodule.c -- it should be in the build directory -- and attach it if possible, or at least the part that seems to be the problem? Mind that the list has a rather small size limit.

> umathmodule.c
> numpy\core\src\umathmodule.c.src(64) : error C2059: syntax error : 'type'
> numpy\core\src\umathmodule.c.src(70) : error C2059: syntax error : 'type'

The warnings are known.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paultaney at yahoo.com Tue Oct 7 14:28:43 2008
From: paultaney at yahoo.com (paul taney)
Date: Tue, 7 Oct 2008 11:28:43 -0700 (PDT)
Subject: [Numpy-discussion] collecting the bluest pixels
Message-ID: <813570.19734.qm@web53905.mail.re2.yahoo.com>

Hi,

I have this silly color filter that Stefan gave me:

def vanderwalt(image, f):
    """colorfilter, thanks to Stefan van der Walt"""
    RED, GRN, BLU = 0, 1, 2
    bluemask = (image[...,BLU] > f*image[...,GRN]) & \
               (image[...,BLU] > f*image[...,RED])
    return bluemask

To collect the right number of the bluest pixels I am calling it from this arduous successive-approximation routine. It occurred to me that someone on this list knows how to do this in a couple of lines...
def successive_approximation(image, density, width, height, bpp):
    """keep calling vanderwalt till line length is within 10% of density target"""
    count = 0
    failsafe = 20   # max iterations
    gimp.progress_init("this is the fun part...")
    init_high = 1.400001
    init_low = 1.399991
    high_guesses = [(init_high, width*height*bpp)]  # we collect recent stats in this list
    low_guesses = [(init_low, 0)]
    crossed_high = crossed_low = False
    factor = 1.4
    delta_high = .18  # adjust the factor by this much when it is too high
    delta_low = .14   # adjust the factor by this much when it is too low
                      # (if they were the same amount then they'd cancel)
    xlength = 0
    while not ((xlength < density*6.3) and (xlength > density*5.7)):
        bluemask = vanderwalt(image, factor)
        line = np.array(bluemask.nonzero()).swapaxes(0, 1).tolist()
        xlength = len(line)
        # spread the search outward from initial condition (1.4)
        # until we cross target.  we collect 6 times the desired
        # line length because we will take an average y per x
        # (unless they are too spread out and then we skip this column (despeckling))
        # My dataset averages 6 blue pixels per column, so...
        if xlength > density*6:
            """too many points so raise the factor"""
            print "LOOP1.%i: factor too low =%.5f, target=%i, line length = %i" % (count, factor, density*6, xlength)
            crossed_low = True
            low_guesses.append((factor, xlength))
            factor += delta_low
            if crossed_high:
                crossed_high = False
                delta_high *= .38  # cut the stepsizes
                delta_low *= .38
                print "crossed target.  factor now=%.5f" % (factor)
        else:
            """not enuf points so lower the factor"""
            print "LOOP1.%i: factor too high=%.5f, target=%i, line length = %i" % (count, factor, density*6, xlength)
            crossed_high = True
            high_guesses.append((factor, xlength))
            factor -= delta_high
            if factor < 0:
                lineno = getframeinfo(currentframe())[1]  # Beaz p 157
                print "algorithm failed with factor=%.5f (line %i)" % (factor, lineno)
                break
            if crossed_low:
                crossed_low = False
                delta_high *= .38  # cut the stepsizes
                delta_low *= .38
                print "crossed target.  factor now=%.5f" % (factor)
        count += 1
        gimp.progress_update(float(count) / failsafe)
        #print "count=%i, factor=%f\nhigh_guesses=%r\nlow_guesses=%r" % (count, factor, high_guesses, low_guesses)
        if count == failsafe:
            print "failsafe: level 1 failed to converge."
            break
    if (xlength < density*6.3) and (xlength > density*5.7):
        print "success at density*%1.3f" % (xlength/float(density))
    else:
        pdb.gimp_message("maybe not enuf blue pixels for this algorithm.")
    return bluemask
    # end successive_approximation
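Anne's scipy.optimize.bisect suggestion (from her reply further down) in minimal form; a sketch that reuses paul's vanderwalt, image and density, where the bracket [1.0, 3.0] is an assumption:

    import numpy as np
    from scipy.optimize import bisect

    def excess(factor, image, target):
        # signed difference between the mask's pixel count and the target;
        # positive when the factor is too permissive, negative when too strict
        return vanderwalt(image, factor).sum() - target

    # excess() must change sign between the bracket endpoints for bisect
    # to work; since the count is integer-valued, the search converges to
    # the factor at which the count jumps past the target
    factor = bisect(excess, 1.0, 3.0, args=(image, density * 6), xtol=1e-3)
    bluemask = vanderwalt(image, factor)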
From john.m.harrold at gmail.com Tue Oct 7 15:23:58 2008
From: john.m.harrold at gmail.com (John Harrold)
Date: Tue, 7 Oct 2008 15:23:58 -0400
Subject: [Numpy-discussion] ImportError: No module named ma
Message-ID: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>

Howdy,

I'm trying to run some scripts associated with a paper I was reading. They use matplotlib to generate figures, which in turn calls numpy. I'm new to python, but I'm very familiar with perl. I'm getting the following error, and I'm not quite sure what is causing it. Can anyone here offer any suggestions?

jmh at dhcp068-156:$ python2.5 Bb-plot.py
Traceback (most recent call last):
  File "Bb-plot.py", line 7, in <module>
    from pylab import *
  File "/Library/Python/2.5/site-packages/matplotlib-0.98.3-py2.5.egg/pylab.py", line 1, in <module>
    from matplotlib.pylab import *
  File "/Library/Python/2.5/site-packages/matplotlib-0.98.3-py2.5.egg/matplotlib/__init__.py", line 128, in <module>
    from rcsetup import defaultParams, validate_backend, validate_toolbar
  File "/Library/Python/2.5/site-packages/matplotlib-0.98.3-py2.5.egg/matplotlib/rcsetup.py", line 19, in <module>
    from matplotlib.colors import is_color_like
  File "/Library/Python/2.5/site-packages/matplotlib-0.98.3-py2.5.egg/matplotlib/colors.py", line 39, in <module>
    import matplotlib.cbook as cbook
  File "/Library/Python/2.5/site-packages/matplotlib-0.98.3-py2.5.egg/matplotlib/cbook.py", line 9, in <module>
    import numpy.ma as ma
ImportError: No module named ma

--
John M Harrold

From stefan at sun.ac.za Tue Oct 7 15:28:46 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 7 Oct 2008 21:28:46 +0200
Subject: [Numpy-discussion] collecting the bluest pixels
In-Reply-To: <813570.19734.qm@web53905.mail.re2.yahoo.com>
References: <813570.19734.qm@web53905.mail.re2.yahoo.com>
Message-ID: <9457e7c80810071228r5daa71c3yd6e1b43edc855882@mail.gmail.com>

Hi Paul

2008/10/7 paul taney:
> I have this silly color filter that Stefan gave me:
>
> def vanderwalt(image, f):
>     """colorfilter, thanks to Stefan van der Walt"""
>     RED, GRN, BLU = 0, 1, 2
>     bluemask = (image[...,BLU] > f*image[...,GRN]) & \
>                (image[...,BLU] > f*image[...,RED])
>
>     return bluemask

Here's a heuristic method that may work. The above statement is equivalent to

    image[...,BLU] > f*max(image[...,GRN], image[...,RED])

So I would construct a new array with the maximum of the red and green values. Then, divide the blue channel by that new array -- this gives you all the different "f" values above. Calculate the histogram of the "f"s, and pick an f with the right percentile.

Regards
Stéfan
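Stéfan's heuristic in a few concrete lines (a sketch; wanted_fraction and the divide-by-zero guard are assumptions, not from the thread):

    import numpy as np

    RED, GRN, BLU = 0, 1, 2

    def threshold_factor(image, wanted_fraction):
        # f-value of every pixel: blue relative to the larger of red/green
        rg = np.maximum(image[..., RED], image[..., GRN]).astype(np.float32)
        rg = np.maximum(rg, 1.0)           # guard: black pixels would divide by zero
        f = image[..., BLU] / rg
        # pick the f that keeps exactly the wanted fraction of pixels
        f_sorted = np.sort(f, axis=None)   # axis=None flattens
        return f_sorted[int((1 - wanted_fraction) * (f_sorted.size - 1))]

    # usage: keep the bluest 5% of pixels
    # factor = threshold_factor(image, 0.05)
    # bluemask = vanderwalt(image, factor)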
From peridot.faceted at gmail.com Tue Oct 7 15:30:44 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Tue, 7 Oct 2008 15:30:44 -0400
Subject: [Numpy-discussion] collecting the bluest pixels
In-Reply-To: <813570.19734.qm@web53905.mail.re2.yahoo.com>
References: <813570.19734.qm@web53905.mail.re2.yahoo.com>
Message-ID:

2008/10/7 paul taney:
> To collect the right number of the bluest pixels I am calling it from
> this arduous successive-approximation routine. It occurred to me that
> someone on this list knows how to do this in a couple of lines...

Well, I can see several approaches. The most direct way to do what you're asking is to use scipy.optimize.bisect to implement the successive approximations. That'll be almost as slow as your current approach, though.

Instead, I'd first write a function that measures the "blueness" of each pixel:

def blueness(image):
    A = np.empty(image.shape[:-1], dtype=np.float32)
    # use three-argument divide to reduce the number of float temporaries
    np.divide(image[...,BLU], image[...,RED], A)
    B = np.empty(image.shape[:-1], dtype=np.float32)
    np.divide(image[...,BLU], image[...,GRN], B)
    return np.minimum(A, B)

Now, once you have the bluenesses, you can sort them and pull out the blueness that gives you the percent you want:

bluenesses = np.sort(blueness(image), axis=None)   # axis=None flattens the array
factor = bluenesses[int((1-wanted_fraction)*len(bluenesses))]

If you want a smarter blueness filter, you could look into using HSV: you could then specify a distance in hue and a distance in saturation, based on how relatively important you think they are.

Anne

From pgmdevlist at gmail.com Tue Oct 7 15:23:41 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Tue, 7 Oct 2008 15:23:41 -0400
Subject: [Numpy-discussion] ImportError: No module named ma
In-Reply-To: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>
References: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>
Message-ID: <200810071523.41387.pgmdevlist@gmail.com>

On Tuesday 07 October 2008 15:23:58 John Harrold wrote:
> "/Library/Python/2.5/site-packages/matplotlib-0.98.3-py2.5.egg/matplotlib/cbook.py", line 9, in <module>
>     import numpy.ma as ma
> ImportError: No module named ma

John,
Are you sure you have numpy installed? What version do you have?

From stefan at sun.ac.za Tue Oct 7 15:32:44 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 7 Oct 2008 21:32:44 +0200
Subject: [Numpy-discussion] ImportError: No module named ma
In-Reply-To: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>
References: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>
Message-ID: <9457e7c80810071232l70d5db5g5bdac5f649cd13f5@mail.gmail.com>

Hi John

2008/10/7 John Harrold:
> I'm trying to run some scripts associated with a paper I was reading. They use
> matplotlib to generate figures, which in turn calls numpy. I'm new to
> python, but I'm very familiar with perl. I'm getting the following error, and
> I'm not quite sure what is causing it. Can anyone here offer any suggestions?
>
> [traceback quoted in full above]

Could you give us the version of NumPy that you are using?

>>> import numpy
>>> print numpy.__version__

As far as I recall, the masked array module was always available as numpy.ma, but maybe I'm wrong. Could you try the following:

>>> import numpy
>>> print numpy.core.ma

Thanks!
Stéfan
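Until the interpreter picks up the newer numpy, a defensive import along these lines papers over the difference (a sketch; the version boundary is approximate, inferred from the thread):

    import numpy
    print numpy.__version__        # old Apple installs report 1.0.x

    try:
        import numpy.ma as ma      # recent numpy releases
    except ImportError:
        import numpy.core.ma as ma # older numpy, e.g. the 1.0.1 above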
From john.m.harrold at gmail.com Tue Oct 7 15:40:25 2008
From: john.m.harrold at gmail.com (John Harrold)
Date: Tue, 7 Oct 2008 15:40:25 -0400
Subject: [Numpy-discussion] ImportError: No module named ma
In-Reply-To: <9457e7c80810071232l70d5db5g5bdac5f649cd13f5@mail.gmail.com>
References: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu> <9457e7c80810071232l70d5db5g5bdac5f649cd13f5@mail.gmail.com>
Message-ID: <20081007194025.GB42793@dhcp057-187.openport.buffalo.edu>

Sometime in October, Stéfan van der Walt proposed the following:
| Could you give us the version of NumPy that you are using?
|
| >>> import numpy
| >>> print numpy.__version__
|
| As far as I recall, the masked array module was always available as
| numpy.ma, but maybe I'm wrong. Could you try the following:
|
| >>> import numpy
| >>> print numpy.core.ma

Howdy Stefan and Pierre,

This should answer both of your questions:

>>> import numpy
>>> print numpy.__version__
1.0.1
>>> print numpy.core.ma
<module 'numpy.core.ma' from '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/core/ma.pyc'>

I actually installed 1.2.0 using the installation package for OS X. However, I have no idea where it was installed to, or how to instruct python to use the new one over the old.

--
John M Harrold

From stefan at sun.ac.za Tue Oct 7 15:42:38 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 7 Oct 2008 21:42:38 +0200
Subject: [Numpy-discussion] Merge of generalised ufuncs branch
Message-ID: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>

Hi all,

The generalised ufuncs branch was made available before SciPy'08. We solicited comments on its implementation and structuring, but received very little feedback. Unless there are any further comments from the community, I propose that we merge it.

It is unfortunate that we have so many patches waiting for review (SciPy suffers worst, I'm afraid); clearly there are too few hours in a day. Nothing discourages contributions as much as a stale project, and I hope we can avoid that.

Regards,
Stéfan

From millman at berkeley.edu Tue Oct 7 15:44:14 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Tue, 7 Oct 2008 12:44:14 -0700
Subject: [Numpy-discussion] Merge of generalised ufuncs branch
In-Reply-To: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
References: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
Message-ID:

On Tue, Oct 7, 2008 at 12:42 PM, Stéfan van der Walt wrote:
> The generalised ufuncs branch was made available before SciPy'08. We
> solicited comments on its implementation and structuring, but received
> very little feedback. Unless there are any further comments from the
> community, I propose that we merge it.
+1

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From Chris.Barker at noaa.gov  Tue Oct  7 15:47:46 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Tue, 07 Oct 2008 12:47:46 -0700
Subject: [Numpy-discussion] ImportError: No module named ma
In-Reply-To: <20081007194025.GB42793@dhcp057-187.openport.buffalo.edu>
References: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>
	<9457e7c80810071232l70d5db5g5bdac5f649cd13f5@mail.gmail.com>
	<20081007194025.GB42793@dhcp057-187.openport.buffalo.edu>
Message-ID: <48EBBCE2.1090305@noaa.gov>

John Harrold wrote:
> This should answer both of your questions:
>
>>>> import numpy
>>>> print numpy.__version__
> 1.0.1
>>>> print numpy.core.ma
> '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/core/ma.pyc'>
>
> I actually installed 1.2.0 using the installation package for OS X.
> However, I have no idea where it was installed to and how to instruct
> python to use the new one over the old.

the installation package for OS-X is for the python.org build -- from
the above, I can see you are running Apple's python, which came with a
version of numpy.

I don't remember how you installed matplotlib, but in general, most
external packages are built for the python.org python, so what I would
do is:

download and install python 2.5.2 from python.org (2.6 is brand new, and
not yet well supported by external packages)

re-install numpy 1.2.0 (may not be necessary, but it won't hurt)

re-install MPL.

you can tell what python you are running by typing "python" on the
command line -- it should be 2.5.2 if it's the python.org one. If that's
not what you get, you may need to edit your .bash_profile file to add it
to your PATH.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From peridot.faceted at gmail.com  Tue Oct  7 16:10:25 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Tue, 7 Oct 2008 16:10:25 -0400
Subject: [Numpy-discussion] Merge of generalised ufuncs branch
In-Reply-To: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
References: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
Message-ID:

2008/10/7 Stéfan van der Walt :
> The generalised ufuncs branch was made available before SciPy'08. We
> solicited comments on its implementation and structuring, but received
> very little feedback. Unless there are any further comments from the
> community, I propose that we merge it.

Sounds good to me - I've counted at least three or four threads on the
mailing lists wishing for the ufuncized linear algebra this would allow
since it was put forward. (Of course, these won't appear until someone
implements them - perhaps a start would be for someone to write a
tutorial on using the new generalized ufunc code...)

> It is unfortunate that we have so many patches waiting for review
> (SciPy suffers worst, I'm afraid); clearly there are too few hours in
> a day. Nothing discourages contributions as much as a stale project,
> and I hope we can avoid that.

The problem may perhaps be that not many people feel they are in a
position to actually do the reviews, so that everyone is waiting on an
imagined few "real" developers to place the Official Stamp of Approval.
Perhaps an informal rule of thumb for acceptance of patches?
How about: posted to list, reviewed by someone not the author who's had
at least one patch accepted before, and no objections on the list?
Anything that receives objections can fall through to the usual decision
by discussion on the mailing list; this is just intended for those
patches that everyone just kind of shrugs and says "looks all right to
me".

Anne

From paultaney at yahoo.com  Tue Oct  7 16:27:55 2008
From: paultaney at yahoo.com (paul taney)
Date: Tue, 7 Oct 2008 13:27:55 -0700 (PDT)
Subject: [Numpy-discussion] collecting the bluest pixels
In-Reply-To:
Message-ID: <260495.50905.qm@web53912.mail.re2.yahoo.com>

Thank you Stefan and Anne for such quick replies.

I am writing a gimp plugin, and if anybody is interested in how to do
that -- there are only about 10-20 examples that I've found -- this
plugin is attempting to do raster-to-vector conversion on the bluest
pixels. It outputs SVG and a python tuple. Current version is at
http://pastebin.com/m2bd8113f

I dunno np.histogram yet, so rather than look that up I tried Anne's
approach. I get a divide by zero error on

    np.divide(image[...,BLU], image[...,GRN], B)

...and I don't understand this well enough to diagnose. Any ideas?

Thanks in advance,
paul

----

Anne1 = """Well, I can see several approaches. The most direct way to do
what you're asking is to use scipy.optimize.bisect to implement the
successive approximations. That'll be almost as slow as your current
approach, though. Instead, I'd first write a function that measures the
"blueness" of each pixel:"""

def blueness(image, linedensity):
    A = np.empty(image.shape[:-1], dtype=np.float32)
    # use three-argument divide to reduce the number of float temporaries
    np.divide(image[...,BLU], image[...,RED], A)
    B = np.empty(image.shape[:-1], dtype=np.float32)
    np.divide(image[...,BLU], image[...,GRN], B)
    return np.minimum(A, B)

    Anne2 = """ Now, once you have the bluenesses, you can sort them and
    pull out the blueness that gives you the percent you want: """

    bluenesses = np.sort(blueness(image), axis=None)  # axis=None flattens the array
    #factor = bluenesses[int((1-wanted_fraction)*len(bluenesses))]
    w, h, bpp = image.shape
    factor = bluenesses[int((1-(linedensity*6/float(w*h))) * len(bluenesses))]

    Anne3 = """ If you want a smarter blueness filter, you could look
    into using HSV: you could then specify a distance in hue and a
    distance in saturation, based on how relatively important you think
    they are. Anne """

    print "blueness returning factor = %r" % (factor)
    return factor
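One way around the zero denominators reported above is to do the division
in float with the error state relaxed, and clean up afterwards. A minimal
sketch, with BLU, GRN and RED assumed to be the channel indices used in
the snippets above:

    import numpy as np

    RED, GRN, BLU = 0, 1, 2  # assumed channel order

    def blueness(image):
        img = image.astype(np.float32)
        old = np.seterr(divide='ignore', invalid='ignore')
        a = img[..., BLU] / img[..., RED]
        b = img[..., BLU] / img[..., GRN]
        np.seterr(**old)
        # inf (blue over zero) becomes a large finite value ("very blue"),
        # nan (zero over zero) becomes zero ("not blue at all")
        return np.nan_to_num(np.minimum(a, b))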
From charlesr.harris at gmail.com  Tue Oct  7 16:52:21 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Oct 2008 14:52:21 -0600
Subject: [Numpy-discussion] Merge of generalised ufuncs branch
In-Reply-To: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
References: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
Message-ID:

On Tue, Oct 7, 2008 at 1:42 PM, Stéfan van der Walt wrote:

> Hi all,
>
> The generalised ufuncs branch was made available before SciPy'08. We
> solicited comments on its implementation and structuring, but received
> very little feedback. Unless there are any further comments from the
> community, I propose that we merge it.
>
> It is unfortunate that we have so many patches waiting for review
> (SciPy suffers worst, I'm afraid); clearly there are too few hours in
> a day. Nothing discourages contributions as much as a stale project,
> and I hope we can avoid that.
>

I've been thinking it's about time to get to this. Can you start by
merging as much of the current trunk as you can so we can concentrate on
the differences?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Chris.Barker at noaa.gov  Tue Oct  7 16:53:44 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Tue, 07 Oct 2008 13:53:44 -0700
Subject: [Numpy-discussion] collecting the bluest pixels
In-Reply-To: <260495.50905.qm@web53912.mail.re2.yahoo.com>
References: <260495.50905.qm@web53912.mail.re2.yahoo.com>
Message-ID: <48EBCC58.2040402@noaa.gov>

paul taney wrote:
> Thank you Stefan and Anne for such quick replies.
>
> I am writing a gimp plugin

cool!

> I get a divide by zero error on
>
>     np.divide(image[...,BLU], image[...,GRN], B)
>
> ...and I don't understand this well enough to diagnose. Any ideas?

you're dividing the value of Blue by the value of green, so if there is
zero green, yes, you will get a divide by zero error.

I wonder if the Euclidean norm would make sense for this application:

    HowFarFromBlue = np.sqrt((255-image[...,BLU])**2 +
                             image[...,GRN]**2 +
                             image[...,RED]**2)

smaller numbers would be bluest -- pure blue would be 0, pure red 360,
etc...

One thing I like about this is that your "blue" may not exactly be an
RGB blue -- so you could see how "far" a given pixel was from any given
color -- in your case, whatever your blue is. Then it would be something
like:

    r, g, b = ref_color

    HowFarFromRefColor = np.sqrt((r - image[...,RED])**2 +
                                 (g - image[...,GRN])**2 +
                                 (b - image[...,BLU])**2
                                 )

NOTE: I know nothing of image processing -- I'll bet there are some
standard ways to determine how "close" two colors are.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From charlesr.harris at gmail.com  Tue Oct  7 16:53:37 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Oct 2008 14:53:37 -0600
Subject: [Numpy-discussion] Merge of generalised ufuncs branch
In-Reply-To:
References: <9457e7c80810071242u3df734f4n94c935cc9affe771@mail.gmail.com>
Message-ID:

On Tue, Oct 7, 2008 at 2:52 PM, Charles R Harris wrote:

>
> On Tue, Oct 7, 2008 at 1:42 PM, Stéfan van der Walt wrote:
>
>> Hi all,
>>
>> The generalised ufuncs branch was made available before SciPy'08. We
>> solicited comments on its implementation and structuring, but received
>> very little feedback. Unless there are any further comments from the
>> community, I propose that we merge it.
>>
>> It is unfortunate that we have so many patches waiting for review
>> (SciPy suffers worst, I'm afraid); clearly there are too few hours in
>> a day. Nothing discourages contributions as much as a stale project,
>> and I hope we can avoid that.
>>
>
> I've been thinking it's about time to get to this. Can you start by merging
> as much of the current trunk as you can so we can concentrate on the
> differences?
>

Or maybe just start a new branch and see if the patch still applies.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From john.m.harrold at gmail.com  Tue Oct  7 18:18:48 2008
From: john.m.harrold at gmail.com (John Harrold)
Date: Tue, 7 Oct 2008 18:18:48 -0400
Subject: [Numpy-discussion] ImportError: No module named ma
In-Reply-To: <48EBBCE2.1090305@noaa.gov>
References: <20081007192358.GA42793@dhcp057-187.openport.buffalo.edu>
	<9457e7c80810071232l70d5db5g5bdac5f649cd13f5@mail.gmail.com>
	<20081007194025.GB42793@dhcp057-187.openport.buffalo.edu>
	<48EBBCE2.1090305@noaa.gov>
Message-ID: <20081007221848.GC41169@dhcp057-187.openport.buffalo.edu>

Sometime in October, Christopher Barker proposed the following:
| the installation package for OS-X is for the python.org build -- from
| the above, I can see you are running Apple's python, which came with a
| version of numpy.
|
| I don't remember how you installed matplotlib, but in general, most
| external packages are built for the python.org python, so what I would
| do is:
|
| download and install python 2.5.2 from python.org (2.6 is brand new, and
| not yet well supported by external packages)
|
| re-install numpy 1.2.0 (may not be necessary, but it won't hurt)
|
| re-install MPL.
|
| you can tell what python you are running by typing "python" on the
| command line -- it should be 2.5.2 if it's the python.org one. If that's
| not what you get, you may need to edit your .bash_profile file to add it
| to your PATH.
|

Thanks Chris,

This worked great. Along with the above mentioned stuff, I also had a
bit of trouble installing the MPL egg which seems to be mac specific.
This was also pretty useful in fixing that:

http://www.nabble.com/OS-X-(Intel-Mac)-installation-woes-td19011120.html

The important part being:

    That said, the build on the matplotlib website is self-sufficient
    save for numpy, use the -N flag to easy_install (and don't forget to
    run it with sudo).

This should take care of things.

--
John M Harrold

From peridot.faceted at gmail.com  Tue Oct  7 20:10:07 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Tue, 7 Oct 2008 20:10:07 -0400
Subject: [Numpy-discussion] collecting the bluest pixels
In-Reply-To: <48EBCC58.2040402@noaa.gov>
References: <260495.50905.qm@web53912.mail.re2.yahoo.com>
	<48EBCC58.2040402@noaa.gov>
Message-ID:

2008/10/7 Christopher Barker :
> I wonder if the Euclidean norm would make sense for this application:
>
>     HowFarFromBlue = np.sqrt((255-image[...,BLU])**2 +
>                              image[...,GRN]**2 +
>                              image[...,RED]**2)
>
> smaller numbers would be bluest -- pure blue would be 0, pure red 360,
> etc...
>
> One thing I like about this is that your "blue" may not exactly be an
> RGB blue -- so you could see how "far" a given pixel was from any given
> color -- in your case, whatever your blue is. Then it would be something
> like:
>
>     r, g, b = ref_color
>
>     HowFarFromRefColor = np.sqrt((r - image[...,RED])**2 +
>                                  (g - image[...,GRN])**2 +
>                                  (b - image[...,BLU])**2
>                                  )
>
> NOTE: I know nothing of image processing -- I'll bet there are some
> standard ways to determine how "close" two colors are.

It's a tricky problem, but if you're serious about it you can use
Euclidean distance in the CIELUV colour space, or if you're really
serious Euclidean distance in CIELAB. Both probably overkill for this
project.

Anne
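For anyone who wants to experiment with the hue/saturation idea from this
thread, matplotlib's colors module has an rgb_to_hsv helper that
vectorises the conversion. A rough sketch rather than a drop-in solution
-- ref_rgb is the target colour, and the weights wh and ws are made-up
tuning knobs for how much hue matters relative to saturation:

    import numpy as np
    from matplotlib.colors import rgb_to_hsv

    def hsv_distance(image, ref_rgb, wh=1.0, ws=0.5):
        # rgb_to_hsv expects floats in [0, 1], shaped (..., 3)
        hsv = rgb_to_hsv(image[..., :3] / 255.0)
        ref = rgb_to_hsv(np.array(ref_rgb, float).reshape(1, 1, 3) / 255.0)
        dh = np.abs(hsv[..., 0] - ref[..., 0])
        dh = np.minimum(dh, 1.0 - dh)  # hue is an angle, so it wraps around
        ds = np.abs(hsv[..., 1] - ref[..., 1])
        return wh * dh + ws * ds

    # smaller values are "closer" to the reference colour, e.g.
    # hsv_distance(image, (0, 0, 255)) for distance from pure blue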
From david at ar.media.kyoto-u.ac.jp  Wed Oct  8 03:39:32 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 08 Oct 2008 16:39:32 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
Message-ID: <48EC63B4.5030307@ar.media.kyoto-u.ac.jp>

Paul Lucek wrote:
>
> I get the following errors in umathmodule.c.src:
>
> D:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Id:\python26\include -Id:\python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c /Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj
> umathmodule.c
> numpy\core\src\umathmodule.c.src(64) : error C2059: syntax error : 'type'
> numpy\core\src\umathmodule.c.src(70) : error C2059: syntax error : 'type'
> numpy\core\src\ufuncobject.c(1701) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
> numpy\core\src\ufuncobject.c(2422) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
> error: Command "D:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Id:\python26\include -Id:\python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c /Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj" failed with exit status 2
>

Note that 1.2.0 does not officially support python 2.6, especially on
Windows. Because python 2.6 uses a new toolchain (VS 2008 by default
instead of 2003), it requires some non-trivial changes. I hope that all
those changes will be in place for numpy 1.3,

cheers,

David

From y.copin at ipnl.in2p3.fr  Wed Oct  8 08:21:04 2008
From: y.copin at ipnl.in2p3.fr (Yannick Copin)
Date: Wed, 08 Oct 2008 14:21:04 +0200
Subject: [Numpy-discussion] Wrong behavior of atleast_3d?
Message-ID: <48ECA5B0.2060709@ipnl.in2p3.fr>

Hi,

I wonder if there's a "behavior bug" in atleast_3d, with respect to
atleast_2d. In atleast_2d, the results on a 1D-array or on a
single-element list of 1D-arrays are the same:

In [22]: atleast_2d(arange(3)).shape
Out[22]: (1, 3)

In [23]: atleast_2d([arange(3),]).shape
Out[23]: (1, 3)

On the contrary, with atleast_3d, the results differ between a 2D-array
and a single-element list of 2D-arrays:

In [24]: atleast_3d(randn(3,3)).shape
Out[24]: (3, 3, 1)

In [25]: atleast_3d([randn(3,3),]).shape
Out[25]: (1, 3, 3)

Wouldn't it be more logical wrt atleast_2d that atleast_3d(randn(3,3))
return an array of shape (1,3,3)? (even though the docstring makes it
clear that it *will* return a (3,3,1)-array). I mostly use atleast_nd to
be able to iterate over input arrays without having to know if it's a
single array, or a list of arrays. For that, atleast_2d has the correct
behavior, but not atleast_3d.

Cheers.

--
   .~.    Yannick COPIN  (o:>*  Doctus cum libro
   /V\    Institut de physique nucleaire de Lyon (IN2P3 - France)
  // \\   Tel: (33/0) 472 431 968  AIM: YnCopin  ICQ: 236931013
 /(   )\  http://snovae.in2p3.fr/ycopin/
  ^`~'^
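Yannick's observation is quick to reproduce, and a leading axis suitable
for iteration can be had with np.newaxis; a short session for reference:

    >>> import numpy as np
    >>> np.atleast_2d(np.arange(3)).shape
    (1, 3)
    >>> a = np.random.randn(3, 3)
    >>> np.atleast_3d(a).shape        # the new axis is appended at the end
    (3, 3, 1)
    >>> a[np.newaxis, ...].shape      # leading axis, good for iterating
    (1, 3, 3)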
From lists_ravi at lavabit.com  Wed Oct  8 09:41:23 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Wed, 8 Oct 2008 09:41:23 -0400
Subject: [Numpy-discussion] =?iso-8859-1?q?can=27t_build_numpy_1=2E2=2E0_u?=
	=?iso-8859-1?q?nder_python_2=2E6_=28windows-amd64=29_using_VS9?=
In-Reply-To: <48EC63B4.5030307@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<48EC63B4.5030307@ar.media.kyoto-u.ac.jp>
Message-ID: <200810080941.23489.lists_ravi@lavabit.com>

On Wednesday 08 October 2008 03:39:32 David Cournapeau wrote:
> Note that 1.2.0 does not officially support python 2.6, especially on
> Windows. Because python 2.6 uses a new toolchain (VS 2008 by default
> instead of 2003), it requires some non-trivial changes.

How extensive are these non-trivial changes? I could help test any
changes.

My problem is one that should be familiar to most people working in a
large organization. I am trying to sneak in python+scipy as a replacement
for Matlab, mostly by ensuring that python is used as the embedded
scripting language for a signal processing simulation platform. (The
ability to embed python, especially with seamless interoperability
between numpy arrays and boost ublas vectors/matrices, is something
Matlab cannot match.)

The majority of my colleagues work on Windows and are very resistant to
toolset changes. Hence, from my perspective, whenever a new project
starts, it is very important to start with the latest version of any
software packages used. Usually, over the typical 3-4 year lifetime of a
project, the tools are never updated unless there is an absolutely
critical bug fix. We are still on python 2.2 for a couple of currently
active projects (neither of which uses numpy). For the next project, I
was hoping to use Python 2.6 + numpy 1.2 as the base versions, but that
seems unworkable now.

> I hope that all those changes will be in place for numpy 1.3,

Is there an idea of the timeframe for numpy 1.3?

Regards,
Ravi

From numpy-discussion at maubp.freeserve.co.uk  Wed Oct  8 09:52:50 2008
From: numpy-discussion at maubp.freeserve.co.uk (Peter)
Date: Wed, 8 Oct 2008 14:52:50 +0100
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <200810080941.23489.lists_ravi@lavabit.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<48EC63B4.5030307@ar.media.kyoto-u.ac.jp>
	<200810080941.23489.lists_ravi@lavabit.com>
Message-ID: <320fb6e00810080652r3cef3e6p35d75c2211a9a763@mail.gmail.com>

On Wed, Oct 8, 2008 at 2:41 PM, Ravi wrote:
> The majority of my colleagues work on Windows and are very resistant to
> toolset changes. Hence, from my perspective, whenever a new project starts, it
> is very important to start with the latest version of any software packages
> used. Usually, over the typical 3-4 year lifetime of a project, the tools are
> never updated unless there is an absolutely critical bug fix. We are still on
> python 2.2 for a couple of currently active projects (neither of which uses
> numpy). For the next project, I was hoping to use Python 2.6 + numpy 1.2 as
> the base versions, but that seems unworkable now.

How about python 2.5 and numpy 1.2 instead?
Python 2.6 makes some important changes to python 2.5 (in preparation for
Python 3.0), so you may find several other packages will take a couple of
months to work 100% with python 2.6 - so check this out if you do plan to
use more than just numpy. There are sometimes drawbacks to using brand
new releases ;)

Peter

From david at ar.media.kyoto-u.ac.jp  Wed Oct  8 09:40:58 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 08 Oct 2008 22:40:58 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <320fb6e00810080652r3cef3e6p35d75c2211a9a763@mail.gmail.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<48EC63B4.5030307@ar.media.kyoto-u.ac.jp>
	<200810080941.23489.lists_ravi@lavabit.com>
	<320fb6e00810080652r3cef3e6p35d75c2211a9a763@mail.gmail.com>
Message-ID: <48ECB86A.1080006@ar.media.kyoto-u.ac.jp>

Peter wrote:
>
> How about python 2.5 and numpy 1.2 instead? Python 2.6 makes some
> important changes to python 2.5 (in preparation for Python 3.0), so
> you may find several other packages will take a couple of months to
> work 100% with python 2.6 - so check this out if you do plan to use
> more than just numpy. There are sometimes drawbacks to using brand
> new releases ;)

If I understand Ravi right, one problem with 2.5 is that it relies on an
old toolset (VS 2003, not available anymore). OTOH, 2.6 depends on the
most recent toolset, which is also available for free in a limited
version (VS 2008 express). If you think about supporting things for a
couple of years, relying on VS 2008 sounds safer than 2003.

cheers,

David

From david at ar.media.kyoto-u.ac.jp  Wed Oct  8 09:44:25 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 08 Oct 2008 22:44:25 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <200810080941.23489.lists_ravi@lavabit.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<48EC63B4.5030307@ar.media.kyoto-u.ac.jp>
	<200810080941.23489.lists_ravi@lavabit.com>
Message-ID: <48ECB939.8010406@ar.media.kyoto-u.ac.jp>

Ravi wrote:
> On Wednesday 08 October 2008 03:39:32 David Cournapeau wrote:
>
>> Note that 1.2.0 does not officially support python 2.6, especially on
>> Windows. Because python 2.6 uses a new toolchain (VS 2008 by default
>> instead of 2003), it requires some non-trivial changes.
>>
>
> How extensive are these non-trivial changes? I could help test any changes.

For one, the manifest madness introduced with VS 2005 may be painful to
handle, since it severely lacks any serious documentation. I am still
unsure whether we will need to care at all, though.

Another problem is that python headers are not "clean", they pollute the
namespace quite heavily; a new version of python means new names added:
those should be trivial, though.

Other things are the usual: brokenness of MS tools with respect to
standards (basic C99 functions not available, etc...). Every version of
the MS tools is broken, but in a different way.
cheers,

David

From plucek at ssaris.com  Wed Oct  8 10:13:50 2008
From: plucek at ssaris.com (Paul Lucek)
Date: Wed, 8 Oct 2008 10:13:50 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
Message-ID: <3633FE192D10364D889D84EB77185F3E73AE97@rxrex1.ssaris.com>

>Could you compress win-amd64-2.6\numpy\core\src\umathmodule.c -- it
>should be in the build directory -- and attach it if possible, or at
>least that part that seems to be a problem? Mind that the list has a
>rather small size limit.

umathmodule.zip attached

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: umathmodule.zip
Type: application/x-zip-compressed
Size: 18431 bytes
Desc: umathmodule.zip
URL:

From plucek at ssaris.com  Wed Oct  8 10:14:46 2008
From: plucek at ssaris.com (Paul Lucek)
Date: Wed, 8 Oct 2008 10:14:46 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
Message-ID: <3633FE192D10364D889D84EB77185F3E73AE99@rxrex1.ssaris.com>

>Could you compress win-amd64-2.6\numpy\core\src\umathmodule.c -- it
>should be in the build directory -- and attach it if possible, or at
>least that part that seems to be a problem? Mind that the list has a
>rather small size limit.

I think this is the pertinent section:

    #ifndef HAVE_FREXPF
    static float frexpf(float x, int * i)
    {
        return (float)frexp((double)(x), i);
    }
    #endif
    #ifndef HAVE_LDEXPF
    static float ldexpf(float x, int i)
    {
        return (float)ldexp((double)(x), i);
    }
    #endif

    #define tanhf nc_tanhf
    #endif
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lists_ravi at lavabit.com  Wed Oct  8 10:17:42 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Wed, 8 Oct 2008 10:17:42 -0400
Subject: [Numpy-discussion] =?iso-8859-1?q?can=27t_build_numpy_1=2E2=2E0_u?=
	=?iso-8859-1?q?nder_python_2=2E6_=28windows-amd64=29_using_VS9?=
In-Reply-To: <48ECB86A.1080006@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<320fb6e00810080652r3cef3e6p35d75c2211a9a763@mail.gmail.com>
	<48ECB86A.1080006@ar.media.kyoto-u.ac.jp>
Message-ID: <200810081017.43024.lists_ravi@lavabit.com>

On Wednesday 08 October 2008 09:40:58 David Cournapeau wrote:
> > How about python 2.5 and numpy 1.2 instead? Python 2.6 makes some
> > important changes to python 2.5 (in preparation for Python 3.0), so
> > you may find several other packages will take a couple of months to
> > work 100% with python 2.6 - so check this out if you do plan to use
> > more than just numpy. There are sometimes drawbacks to using brand
> > new releases ;)
>
> If I understand Ravi right, one problem with 2.5 is that it relies on an
> old toolset (VS 2003, not available anymore). OTOH, 2.6 depends on the
> most recent toolset, which is also available for free in a limited
> version (VS 2008 express). If you think about supporting things for a
> couple of years, relying on VS 2008 sounds safer than 2003.

Absolutely correct. VS 2003 is one of the biggest problems. I know C++ is
not a favored language in these parts, but a lot of our code relies on
Boost, for platform neutrality (like filesystem, threading, stdint) and
for many other utilities (like ublas, wave, tuples, bind). VS 2003
requires too many workarounds in my code and support for it may be
dropped in Boost some time soon: VS 7.0 is already scheduled to be
dropped and VS 7.1 (a.k.a. VS 2003) is not likely to be supported for
much longer. In fact python 2.5.x's requirement of VS 2003 is probably
the main reason that this compiler is still used in a lot of places.

The only python packages that I use are numpy/scipy, matplotlib, pyhdf5,
and an interprocess communication link with VisIt. To the best of my
knowledge, all of these work today (in some fashion) with python 2.6 on
Linux. Therefore, any changes required to make them build on Windows are
likely to be compiler-specific rather than to fix inherent problems
related to python 2.6.
Regards,
Ravi

From lists_ravi at lavabit.com  Wed Oct  8 10:29:43 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Wed, 8 Oct 2008 10:29:43 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48ECB939.8010406@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<200810080941.23489.lists_ravi@lavabit.com>
	<48ECB939.8010406@ar.media.kyoto-u.ac.jp>
Message-ID: <200810081029.43569.lists_ravi@lavabit.com>

On Wednesday 08 October 2008 09:44:25 David Cournapeau wrote:
> For one, the manifest madness introduced with VS 2005 may be painful to
> handle, since it severely lacks any serious documentation. I am still
> unsure whether we will need to care at all, though.
>
> Another problem is that python headers are not "clean", they pollute the
> namespace quite heavily; a new version of python means new names added:
> those should be trivial, though.
>
> Other things are the usual: brokenness of MS tools with respect to
> standards (basic C99 functions not available, etc...). Every version of
> the MS tools is broken, but in a different way.

The reasons above are why I don't try to do anything on Windows unless
there is support from some external source, e.g., CMake taking care of
build issues. The reasons above are also why I admire your heroic efforts
at making Windows binaries available. But, then, I sometimes wonder about
the motivation for an unpaid volunteer to take on an utterly thankless
job in which help is never forthcoming from users; for any of my code out
in the wild, the only help, bug fixes and improvements I have ever
received have been from Linux and Solaris users.

Thanks for taking on this arduous task.

Regards,
Ravi

From david at ar.media.kyoto-u.ac.jp  Wed Oct  8 10:25:14 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Wed, 08 Oct 2008 23:25:14 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <200810081029.43569.lists_ravi@lavabit.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<200810080941.23489.lists_ravi@lavabit.com>
	<48ECB939.8010406@ar.media.kyoto-u.ac.jp>
	<200810081029.43569.lists_ravi@lavabit.com>
Message-ID: <48ECC2CA.5010002@ar.media.kyoto-u.ac.jp>

Ravi wrote:
>
> The reasons above are why I don't try to do anything on Windows unless there
> is support from some external source, e.g., CMake taking care of build issues.
> The reasons above are also why I admire your heroic efforts at making Windows
> binaries available. But, then, I sometimes wonder about the motivation for an
> unpaid volunteer to take on an utterly thankless job in which help is never
> forthcoming from users;

I think numpy and scipy have a wonderful potential, and that currently,
installation is the biggest hurdle. I can show some awesome things in
numpy/scipy and co that people used to Matlab would only dream of. But
if it takes more than 2 minutes and a few clicks to install, it is of no
use. I have some people who ask me how to install numpy/scipy, and I
have no simple answer: I think this is by far the biggest barrier of
entry for numpy and scipy ATM. That's why I am interested in making
numpy and scipy installation easy.

>
> Thanks for taking on this arduous task.

Just want to mention I am certainly not the only one involved here for
windows binaries.
This is really a collective work,

cheers,

David

From aisaac at american.edu  Wed Oct  8 10:45:44 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 08 Oct 2008 10:45:44 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <200810081029.43569.lists_ravi@lavabit.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<200810080941.23489.lists_ravi@lavabit.com>
	<48ECB939.8010406@ar.media.kyoto-u.ac.jp>
	<200810081029.43569.lists_ravi@lavabit.com>
Message-ID: <48ECC798.9030603@american.edu>

On 10/8/2008 10:29 AM Ravi apparently wrote:
> I sometimes wonder about the motivation for an
> unpaid volunteer to take on an utterly thankless job in which help is never
> forthcoming from users ...
> Thanks for taking on this arduous task.

See, it is not entirely thankless. ;-)
I would like to add my thanks as well, and those of my students who are
mostly Windows users who rely on these binaries.

Alan Isaac

From hanni.ali at gmail.com  Wed Oct  8 10:56:02 2008
From: hanni.ali at gmail.com (Hanni Ali)
Date: Wed, 8 Oct 2008 15:56:02 +0100
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48ECC2CA.5010002@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com>
	<200810080941.23489.lists_ravi@lavabit.com>
	<48ECB939.8010406@ar.media.kyoto-u.ac.jp>
	<200810081029.43569.lists_ravi@lavabit.com>
	<48ECC2CA.5010002@ar.media.kyoto-u.ac.jp>
Message-ID: <789d27b10810080756r293ee6b4r5b7705df24aa1ccc@mail.gmail.com>

Just to note, on the compilation issue, I encountered this a while ago
with numpy 1.1.1 and I think Python 2.6b2, again because we wanted to
skip Python 2.5 in my organization, largely because it was an issue to
get working on 64-bit. I couldn't find anywhere 7.1 was available.

We discussed the errors you are encountering a few months ago; they are
related to the compiler directives.

> #ifndef HAVE_FREXPF
> static float frexpf(float x, int * i)
> {
>     return (float)frexp((double)(x), i);
> }
> #endif
> #ifndef HAVE_LDEXPF
> static float ldexpf(float x, int i)
> {
>     return (float)ldexp((double)(x), i);
> }
> #endif

Commenting out this section at line 64 allows compilation and has no ill
effects.

Hanni

2008/10/8 David Cournapeau

> Ravi wrote:
> >
> > The reasons above are why I don't try to do anything on Windows unless
> there
> > is support from some external source, e.g., CMake taking care of build
> issues.
> > The reasons above are also why I admire your heroic efforts at making
> Windows
> > binaries available. But, then, I sometimes wonder about the motivation
> for an
> > unpaid volunteer to take on an utterly thankless job in which help is
> never
> > forthcoming from users;
>
> I think numpy and scipy have a wonderful potential, and that currently,
> installation is the biggest hurdle. I can show some awesome things in
> numpy/scipy and co that people used to Matlab would only dream of. But
> if it takes more than 2 minutes and a few clicks to install, it is of no
> use. I have some people who ask me how to install numpy/scipy, and I
> have no simple answer: I think this is by far the biggest barrier of
> entry for numpy and scipy ATM. That's why I am interested in making
> numpy and scipy installation easy.
>
> >
> > Thanks for taking on this arduous task.
>
> Just want to mention I am certainly not the only one involved here for
> windows binaries. This is really a collective work,
>
> cheers,
>
> David
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hetland at tamu.edu  Wed Oct  8 13:25:32 2008
From: hetland at tamu.edu (Rob Hetland)
Date: Wed, 8 Oct 2008 12:25:32 -0500
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To:
References: <48E1A6A2.4050406@noaa.gov>
	<320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com>
	<2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com>
	<5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
	<26EB3B10-1BC7-438A-8C1A-6C41A227D6F6@comcast.net>
Message-ID: <7F4C2565-D598-4733-9392-64EBD054D5FA@tamu.edu>

My version of a wrapper of ANN is attached. I wrote it when I had some
issues installing the scikits.ann package. It uses ctypes, and might be
useful in deciding on an API.

Please feel free to take what you like,

-Rob

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ann.tar.gz
Type: application/x-gzip
Size: 6640 bytes
Desc: not available
URL:
-------------- next part --------------
----
Rob Hetland, Associate Professor
Dept. of Oceanography, Texas A&M University
http://pong.tamu.edu/~rob
phone: 979-458-0096, fax: 979-845-6331

From hetland at tamu.edu  Wed Oct  8 15:11:14 2008
From: hetland at tamu.edu (Rob Hetland)
Date: Wed, 8 Oct 2008 14:11:14 -0500
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To:
References: <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com>
	<5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
	<26EB3B10-1BC7-438A-8C1A-6C41A227D6F6@comcast.net>
	<7F4C2565-D598-4733-9392-64EBD054D5FA@tamu.edu>
Message-ID:

On Oct 8, 2008, at 1:36 PM, Anne Archibald wrote:

> How important did you find the ability to select priority versus
> non-priority search?
>
> How important did you find the ability to select other splitting
> rules?

In both cases, I included those things just because they were options in
ANN, rather than any real need on my part. The things that I found
important (and lacking in some of the many other kd_tree
implementations) were the ability to select an arbitrary number of
nearest neighbors returned and specifying a max search radius.

I can see that having a choice of splitting rules may be handy for
unusual point distributions, but I have always used this on fairly
regularly spaced data.

-Rob

----
Rob Hetland, Associate Professor
Dept. of Oceanography, Texas A&M University
http://pong.tamu.edu/~rob
phone: 979-458-0096, fax: 979-845-6331

From zachary.pincus at yale.edu  Wed Oct  8 21:00:16 2008
From: zachary.pincus at yale.edu (Zachary Pincus)
Date: Wed, 8 Oct 2008 21:00:16 -0400
Subject: [Numpy-discussion] 2D (or n-d) fancy indexing?
Message-ID:

Hello all,

I'm doing something silly with images and am unable to figure out the
right way to express this with "fancy indexing" -- or anything other
than a brute for-loop for that matter.

The basic gist is that I have an array representing n images, of shape
(n, x, y). I also have a "map" of shape (x, y), which contains indices
in the range [0, n-1]. I then want to construct the "composite" image,
of shape (x, y), with pixels selected from the n source images as per
the indices in the map, i.e.:

    composite[x, y] = images[map[x, y], x, y]

for all (x, y).

Now, I can't figure out if there's an easy way to express this in
numpy.
For that matter, I can't even figure out a simple way to do the 1D
version of the same:

    composite[i] = images[map[i], i]

where composite and map have shape (m,), and images has shape (n, m).

Can anyone assist? Surely there's something simple that I'm just not
seeing.

Thanks,
Zach

From robert.kern at gmail.com  Wed Oct  8 21:25:55 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 8 Oct 2008 20:25:55 -0500
Subject: [Numpy-discussion] 2D (or n-d) fancy indexing?
In-Reply-To:
References:
Message-ID: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>

On Wed, Oct 8, 2008 at 20:00, Zachary Pincus wrote:
> Hello all,
>
> I'm doing something silly with images and am unable to figure out the
> right way to express this with "fancy indexing" -- or anything other
> than a brute for-loop for that matter.
>
> The basic gist is that I have an array representing n images, of shape
> (n, x, y). I also have a "map" of shape (x, y), which contains indices
> in the range [0, n-1]. I then want to construct the "composite" image,
> of shape (x, y), with pixels selected from the n source images as per
> the indices in the map, i.e.:
>
>     composite[x, y] = images[map[x, y], x, y]
> for all (x, y).
>
> Now, I can't figure out if there's an easy way to express this in
> numpy. For that matter, I can't even figure out a simple way to do the
> 1D version of the same:
>
>     composite[i] = images[map[i], i]
> where composite and map have shape (m,), and images has shape (n, m).
>
> Can anyone assist? Surely there's something simple that I'm just not
> seeing.

You need to give an array for each axis. Each of these arrays will be
broadcast against each other to form three arrays of the desired shape
of composite. This is discussed in the manual here:

http://mentat.za.net/numpy/refguide/indexing.xhtml#indexing-multi-dimensional-arrays

Conceptually, you need arrays A, B, and C such that

    composite[x,y] == images[A[x,y], B[x,y], C[x,y]] for all x,y

You already have A, you just need to construct B and C correctly. Here
is an example:

In [26]: from numpy import *

In [27]: Nx = 480

In [28]: Ny = 640

In [29]: N = 100

In [30]: images = random.randint(0, 1000, size=(N, Nx, Ny))

In [31]: map = random.randint(0, N, size=(Nx, Ny))

In [32]: composite = images[map, arange(Nx)[:,newaxis], arange(Ny)]

In [33]: for x in range(Nx):
   ....:     for y in range(Ny):
   ....:         assert composite[x,y] == images[map[x,y],x,y]
   ....:
   ....:

In [34]:

When arange(Nx)[:,newaxis] and arange(Ny) get broadcasted with map, you
get (480,640) arrays like you would get with mgrid[0:Nx,0:Ny].

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
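The 1-D case asked about above falls out of the same rule, with one index
array per axis; a quick check, for reference:

    import numpy as np

    n, m = 5, 7
    images = np.random.randint(0, 100, size=(n, m))
    map = np.random.randint(0, n, size=m)

    # one index array per axis: map picks the image, arange(m) the column
    composite = images[map, np.arange(m)]

    for i in range(m):
        assert composite[i] == images[map[i], i]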
From zachary.pincus at yale.edu  Wed Oct  8 21:44:15 2008
From: zachary.pincus at yale.edu (Zachary Pincus)
Date: Wed, 8 Oct 2008 21:44:15 -0400
Subject: [Numpy-discussion] 2D (or n-d) fancy indexing?
In-Reply-To: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>
References: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>
Message-ID: <19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu>

> You need to give an array for each axis. Each of these arrays will be
> broadcast against each other to form three arrays of the desired shape
> of composite. This is discussed in the manual here:
>
> http://mentat.za.net/numpy/refguide/indexing.xhtml#indexing-multi-dimensional-arrays
>
> Conceptually, you need arrays A, B, and C such that
>
>     composite[x,y] == images[A[x,y], B[x,y], C[x,y]] for all x,y

Aha -- thanks especially for the clear illustration of what B and C
need to be. That really helps.

> In [32]: composite = images[map, arange(Nx)[:,newaxis], arange(Ny)]
>
> When arange(Nx)[:,newaxis] and arange(Ny) get broadcasted with map,
> you get (480,640) arrays like you would get with mgrid[0:Nx,0:Ny].

That's very handy indeed. Thanks for your help!

Zach

From stefan at sun.ac.za  Thu Oct  9 03:53:52 2008
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Thu, 9 Oct 2008 09:53:52 +0200
Subject: [Numpy-discussion] 2D (or n-d) fancy indexing?
In-Reply-To: <19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu>
References: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>
	<19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu>
Message-ID: <9457e7c80810090053u61b4886cu2e1342b5f576fe61@mail.gmail.com>

Hi Zach

2008/10/9 Zachary Pincus :
>> Conceptually, you need arrays A, B, and C such that
>>
>>     composite[x,y] == images[A[x,y], B[x,y], C[x,y]] for all x,y
>
> Aha -- thanks especially for the clear illustration of what B and C
> need to be. That really helps.

I also summarised some of the posts on this topic between Jack Cooke and
Robert Kern in my SciPy'08 slides:

http://mentat.za.net/numpy/numpy_advanced_slides/

I don't know if they're much use without the dialogue. Or maybe they're
better :)

Cheers
Stéfan

From ndbecker2 at gmail.com  Thu Oct  9 08:04:52 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 09 Oct 2008 08:04:52 -0400
Subject: [Numpy-discussion] Apply a vector function to each row of a matrix
Message-ID:

Suppose I have a function (I wrote in c++) that accepts a numpy 1-d
vector. What is the recommended way to apply it to each row of a matrix,
returning a new matrix result? (Assume the function has signature
newvec = f(oldvec))

From david.huard at gmail.com  Thu Oct  9 09:24:45 2008
From: david.huard at gmail.com (David Huard)
Date: Thu, 9 Oct 2008 09:24:45 -0400
Subject: [Numpy-discussion] Apply a vector function to each row of a matrix
In-Reply-To:
References:
Message-ID: <91cf711d0810090624s1c465992l6dd922cfaa7c67c7@mail.gmail.com>

Neal,

Look at: apply_along_axis

David

On Thu, Oct 9, 2008 at 8:04 AM, Neal Becker wrote:

> Suppose I have a function (I wrote in c++) that accepts a numpy 1-d vector.
> What is the recommended way to apply it to each row of a matrix, returning
> a new matrix result? (Assume the function has signature newvec = f
> (oldvec))
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndbecker2 at gmail.com  Thu Oct  9 09:40:06 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 09 Oct 2008 09:40:06 -0400
Subject: [Numpy-discussion] Apply a vector function to each row of a matrix
References: <91cf711d0810090624s1c465992l6dd922cfaa7c67c7@mail.gmail.com>
Message-ID:

David Huard wrote:

> Neal,
>
> Look at: apply_along_axis

I guess it'd be:

    b = empty_like(a)
    for row in range(a.shape[0]):
        b[row,:] = apply_along_axis(func, row, a)

I don't suppose there is a way to do this without explicitly writing a
loop.
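To spell out the suggestion with a made-up row function f: axis=1 walks
over the rows of a 2-d array, and if f returns a scalar instead of a
same-length vector, the looped-over axis simply disappears from the
result. A small sketch:

    import numpy as np

    def f(v):
        # normalise each row vector to unit length
        return v / np.sqrt((v ** 2).sum())

    a = np.random.randn(4, 3)

    b_loop = np.empty_like(a)
    for row in range(a.shape[0]):
        b_loop[row, :] = f(a[row, :])

    b = np.apply_along_axis(f, 1, a)      # no explicit loop needed
    assert np.allclose(b, b_loop)

    # a scalar-returning function collapses the looped-over axis
    assert np.apply_along_axis(np.sum, 1, a).shape == (4,)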
From zachary.pincus at yale.edu  Thu Oct  9 10:50:29 2008
From: zachary.pincus at yale.edu (Zachary Pincus)
Date: Thu, 9 Oct 2008 10:50:29 -0400
Subject: [Numpy-discussion] 2D (or n-d) fancy indexing?
In-Reply-To: <9457e7c80810090053u61b4886cu2e1342b5f576fe61@mail.gmail.com>
References: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>
	<19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu>
	<9457e7c80810090053u61b4886cu2e1342b5f576fe61@mail.gmail.com>
Message-ID:

> http://mentat.za.net/numpy/numpy_advanced_slides/

Those slides are really useful! Thanks a ton.

From lists.20.chth at xoxy.net  Thu Oct  9 11:45:32 2008
From: lists.20.chth at xoxy.net (ctw)
Date: Thu, 9 Oct 2008 11:45:32 -0400
Subject: [Numpy-discussion] dtype behavior
Message-ID:

Hi -- Can somebody here explain the following behavior:

In [1]: tst = np.array([5.])

In [2]: tst
Out[2]: array([ 5.])

In [3]: tst.shape
Out[3]: (1,)

In [4]: tst.dtype
Out[4]: dtype('float64')

In [5]: tst.dtype = np.int

In [6]: tst
Out[6]: array([         0, 1075052544])

In [7]: tst.dtype
Out[7]: dtype('int32')

In [8]: tst.shape
Out[8]: (2,)

Is this a bug? I'm running numpy version 1.1.1 and was trying to convert
the float array([5.]) to an int array([5]).

From oliphant at enthought.com  Thu Oct  9 11:53:27 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Thu, 09 Oct 2008 10:53:27 -0500
Subject: [Numpy-discussion] dtype behavior
In-Reply-To:
References:
Message-ID: <48EE28F7.10203@enthought.com>

ctw wrote:
> Hi -- Can somebody here explain the following behavior:
>
> In [1]: tst = np.array([5.])
>
> In [2]: tst
> Out[2]: array([ 5.])
>
> In [3]: tst.shape
> Out[3]: (1,)
>
> In [4]: tst.dtype
> Out[4]: dtype('float64')
>
> In [5]: tst.dtype = np.int
>
> In [6]: tst
> Out[6]: array([         0, 1075052544])
>
> In [7]: tst.dtype
> Out[7]: dtype('int32')
>
> In [8]: tst.shape
> Out[8]: (2,)

Setting attributes of the array always just changes the information
about the array; it never changes the memory the array points to. In
this case you've taken the bits that represent float64 and
re-interpreted them as int32 (that's why you now have a length 2
array). So, you are exploring the floating-point bit-pattern on your
computer.

If you want to "cast" to another data-type, then you need to use the
astype method:

    tst = tst.astype(np.int)

-Travis

From plucek at ssaris.com  Thu Oct  9 12:17:52 2008
From: plucek at ssaris.com (Paul Lucek)
Date: Thu, 9 Oct 2008 12:17:52 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
Message-ID: <3633FE192D10364D889D84EB77185F3E73AF27@rxrex1.ssaris.com>

Thanks Hanni! That did it.

Numpy builds and installs by commenting out:

    #ifndef HAVE_FREXPF
    static float frexpf(float x, int * i)
    {
        return (float)frexp((double)(x), i);
    }
    #endif
    #ifndef HAVE_LDEXPF
    static float ldexpf(float x, int i)
    {
        return (float)ldexp((double)(x), i);
    }
    #endif

in numpy-1.2.0\numpy\core\src\umathmodule.c.src

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
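To put the dtype distinction from the thread above in runnable form:
.view() reinterprets the same bytes, which is what assigning to .dtype
did, while .astype() converts the values. The exact int32 output assumes
a little-endian machine, as in ctw's session:

    import numpy as np

    tst = np.array([5.])          # one float64, i.e. eight bytes

    print tst.view(np.int32)      # reinterpret bytes: [         0 1075052544]
    print tst.astype(np.int32)    # convert values:    [5]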
From david.huard at gmail.com  Thu Oct  9 13:01:33 2008
From: david.huard at gmail.com (David Huard)
Date: Thu, 9 Oct 2008 13:01:33 -0400
Subject: [Numpy-discussion] Apply a vector function to each row of a matrix
In-Reply-To:
References: <91cf711d0810090624s1c465992l6dd922cfaa7c67c7@mail.gmail.com>
Message-ID: <91cf711d0810091001n6abf6fe8ie2cb145d8406e137@mail.gmail.com>

On Thu, Oct 9, 2008 at 9:40 AM, Neal Becker wrote:

> David Huard wrote:
>
> > Neal,
> >
> > Look at: apply_along_axis
>
> I guess it'd be:
>
>     b = empty_like(a)
>     for row in range(a.shape[0]):
>         b[row,:] = apply_along_axis(func, row, a)
>
> I don't suppose there is a way to do this without explicitly writing a
> loop.

Have you tried

    b = apply_along_axis(func, 1, a)

It should work.

> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aisaac at american.edu  Thu Oct  9 13:39:19 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 09 Oct 2008 13:39:19 -0400
Subject: [Numpy-discussion] OT (NumPy slides)
In-Reply-To:
References: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>
	<19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu>
	<9457e7c80810090053u61b4886cu2e1342b5f576fe61@mail.gmail.com>
Message-ID: <48EE41C7.5000103@american.edu>

>> http://mentat.za.net/numpy/numpy_advanced_slides/

Zachary Pincus wrote:
> Those slides are really useful! Thanks a ton.

Nice content! And I have to add, S5 produces a beautiful show.

Alan Isaac

PS What did you use to produce the 3d figures?

PPS Do you know why the display gets muddled if you switch to full
screen on FireFox? It seems to be a problem with the fixed-width font
display? This is the first time I've ever seen something look better on
IE than FireFox. Is it just the display I am using at work?
From ndbecker2 at gmail.com  Thu Oct  9 14:48:57 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 09 Oct 2008 14:48:57 -0400
Subject: [Numpy-discussion] Apply a vector function to each row of a matrix
References: <91cf711d0810090624s1c465992l6dd922cfaa7c67c7@mail.gmail.com>
	<91cf711d0810091001n6abf6fe8ie2cb145d8406e137@mail.gmail.com>
Message-ID:

David Huard wrote:
> On Thu, Oct 9, 2008 at 9:40 AM, Neal Becker wrote:
>
>> David Huard wrote:
>>
>> > Neal,
>> >
>> > Look at: apply_along_axis
>>
>> I guess it'd be:
>>
>>     b = empty_like(a)
>>     for row in range(a.shape[0]):
>>         b[row,:] = apply_along_axis(func, row, a)
>>
>> I don't suppose there is a way to do this without explicitly writing a
>> loop.
>
> Have you tried
>
>     b = apply_along_axis(func, 1, a)
>
> It should work.

Yes, thanks. The doc for apply_along_axis is not clear. For one thing,
it says:

    The output array. The shape of outarr depends on the return value of
    func1d. If it returns arrays with the same shape as the input arrays
    it receives, outarr has the same shape as arr.

What happens if the 'if' clause is not true?

From aisaac at american.edu  Thu Oct  9 14:54:32 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Thu, 09 Oct 2008 14:54:32 -0400
Subject: [Numpy-discussion] OT (NumPy slides)
In-Reply-To: <48EE41C7.5000103@american.edu>
References: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com>
	<19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu>
	<9457e7c80810090053u61b4886cu2e1342b5f576fe61@mail.gmail.com>
	<48EE41C7.5000103@american.edu>
Message-ID: <48EE5368.8060106@american.edu>

>>> http://mentat.za.net/numpy/numpy_advanced_slides/

Alan G Isaac wrote:
> Do you know why the display gets muddled if
> you switch to full screen on FireFox?

I received this reply:

    Whenever you resize an S5 display (switch to fullscreen or just
    resize the window), you have to reload the page (Ctrl-R or Cmd-R).
    S5 sizes text proportionally to the display size once when loaded.

And it works.

Cheers,
Alan Isaac

From bolme1234 at comcast.net  Thu Oct  9 16:28:34 2008
From: bolme1234 at comcast.net (David Bolme)
Date: Thu, 9 Oct 2008 14:28:34 -0600
Subject: [Numpy-discussion] Proposal: scipy.spatial
In-Reply-To:
References: <48E1A6A2.4050406@noaa.gov>
	<320fb6e00809300129j1b45ee21k6756b4c2b1945d84@mail.gmail.com>
	<2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com>
	<5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net>
	<26EB3B10-1BC7-438A-8C1A-6C41A227D6F6@comcast.net>
Message-ID:

I have written up a basic nearest neighbor algorithm. It does a brute
force search, so it will be slower than kd-trees as the number of points
gets large. It should however work well for high dimensional data.

I have also added the option for user defined distance measures. The
user can set a default "p". "p" has the same functionality as before if
it is a float. "p" can also be a function that computes a distance
matrix, or the measure can be selected using the strings: "Manhattan",
"Euclidean", or "Correlation".

https://pyvision.svn.sourceforge.net/svnroot/pyvision/trunk/src/pyvision/vector/knn.py

The interface is similar to Anne's code and in many cases can be used as
a drop-in replacement. I have posted the code to my own project because
I have a short term need and I do not have access to the scipy
repository. Feel free to include the code with scipy under the scipy
license.

I did find a typo in your documentation ("trie" -> "tree"):

- ... kd-tree is a binary trie, each of whose ...

Also I found the use of k in the documentation somewhat confusing, as it
is the dimensionality of the data points in the kd-tree and the number
of neighbors for k-nearest neighbors.

>> I believe that with mean subtracted and unit length vectors, a
>> Euclidean knn algorithm will produce the same result as if the
>> vectors were compared using correlation. I am not sure if kd-trees
>> will perform well on the normalized vectors as they have a very
>> specific geometry. If my math checks out it may be worth adding
>> Pearson's correlation as a default option or as a separate class.
>
> Actually it's probably easier if the user just does the
> prenormalization.

I agree. I think we should keep your class as-is and maybe create a
class that wraps the kdtree and handles the normalization for
correlation.

I would also like to look at cover trees, however that will have to wait
a couple months until I have more time.

Dave

---
http://www.cs.colostate.edu/~bolme
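A quick numerical check of the normalization trick discussed above: after
subtracting each vector's mean and scaling it to unit length, squared
Euclidean distance is a monotone function of Pearson correlation
(d^2 = 2 - 2r), so nearest-neighbour rankings agree. A small sketch:

    import numpy as np

    def normalize(X):
        # subtract each row's mean, then scale each row to unit length
        X = X - X.mean(axis=1)[:, np.newaxis]
        return X / np.sqrt((X ** 2).sum(axis=1))[:, np.newaxis]

    X = np.random.randn(10, 50)          # 10 vectors of dimension 50
    q = np.random.randn(50)              # query vector

    Xn = normalize(X)
    qn = normalize(q[np.newaxis, :])[0]

    d2 = ((Xn - qn) ** 2).sum(axis=1)    # squared Euclidean distances
    r = np.array([np.corrcoef(x, q)[0, 1] for x in X])

    assert np.allclose(d2, 2 - 2 * r)    # so argmin(d2) == argmax(r)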
Also I found the use of k in the documentation somewhat confusing, as it is the dimensionality of the data points in the kd-tree and the number of neighbors for k-nearest neighbors.

>> I believe that with mean subtracted and unit length vectors, a
>> Euclidean knn algorithm will produce the same result as if the
>> vectors were compared using correlation. I am not sure if kd-trees
>> will perform well on the normalized vectors as they have a very
>> specific geometry. If my math checks out it may be worth adding
>> Pearson's correlation as a default option or as a separate class.
>
> Actually it's probably easier if the user just does the
> prenormalization.

I agree. I think we should keep your class as-is and maybe create a class that wraps the kdtree and handles the normalization for correlation.

I would also like to look at cover trees, however that will have to wait a couple months until I have more time.

Dave

---
http://www.cs.colostate.edu/~bolme

From lists_ravi at lavabit.com  Thu Oct 9 17:07:37 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Thu, 9 Oct 2008 17:07:37 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <789d27b10810080756r293ee6b4r5b7705df24aa1ccc@mail.gmail.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <48ECC2CA.5010002@ar.media.kyoto-u.ac.jp> <789d27b10810080756r293ee6b4r5b7705df24aa1ccc@mail.gmail.com>
Message-ID: <200810091707.38689.lists_ravi@lavabit.com>

On Wednesday 08 October 2008 10:56:02 Hanni Ali wrote:
> We discussed errors you are encountering a few months ago, they are related
> to the compiler directives.
>
> > #ifndef HAVE_FREXPF
> > static float frexpf(float x, int * i)
> > {
> >     return (float)frexp((double)(x), i);
> > }
> > #endif
> > #ifndef HAVE_LDEXPF
> > static float ldexpf(float x, int i)
> > {
> >     return (float)ldexp((double)(x), i);
> > }
> > #endif
>
> Commenting out this section at line 64 allows compilation and has no ill
> effects.

Given that commenting out the section above allows numpy to compile without any apparent side effects, is there any chance we could get "experimental" binaries of numpy 1.2.0 for python 2.6? I do understand that a negative answer is very likely and the reasons therefor.

Regards,
Ravi

From stefan at sun.ac.za  Thu Oct 9 17:14:09 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Thu, 9 Oct 2008 23:14:09 +0200
Subject: [Numpy-discussion] OT (NumPy slides)
In-Reply-To: <48EE41C7.5000103@american.edu>
References: <3d375d730810081825x75ca88c1n4efe0c1fe9a73027@mail.gmail.com> <19CF3A93-8B73-4E92-97B8-7C76A1596C8E@yale.edu> <9457e7c80810090053u61b4886cu2e1342b5f576fe61@mail.gmail.com> <48EE41C7.5000103@american.edu>
Message-ID: <9457e7c80810091414r4ffb1c23hf2631cbc7d3d6de1@mail.gmail.com>

Hi Alan

2008/10/9 Alan G Isaac :
> >> http://mentat.za.net/numpy/numpy_advanced_slides/
> Nice content!

Thanks! As you can see, I enjoyed myself at SciPy'08 :)

> And I have to add,
> S5 produces a beautiful show.

This slide show incorporates the changes from S5 Reloaded:

http://www.netzgesta.de/S5/references.php

I put the sources for generating the slides on

http://mentat.za.net/numpy/numpy_advanced_slides_sources.tar.bz2

This includes the custom rst2s5 script that does Python highlighting with Pyglet, and which executes all code snippets as tests.

> PS What did you use to produce the 3d figures?

I'm not sure I want to mention it out loud, but... Google Sketchup, scripted using Ruby.
I hope someone finds it useful.

Regards
Stéfan

From millman at berkeley.edu  Thu Oct 9 18:01:37 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Thu, 9 Oct 2008 15:01:37 -0700
Subject: [Numpy-discussion] Please backport fixes to the 1.2.x branch
In-Reply-To:
References:
Message-ID:

On Sun, Oct 5, 2008 at 7:59 PM, Jarrod Millman wrote:
> I would like to get a 1.2.1 release out ASAP. There are several
> bug-fixes on the trunk that need to be backported. If you have made a
> bug-fix to the trunk that you have been waiting to backport to the
> 1.2.x branch, please do so now:
> http://svn.scipy.org/svn/numpy/branches/1.2.x
>
> Ideally, I would like to freeze the branch for the 1.2.1 release in
> about 1 week. Please let me know if you need more time or if there is
> something in particular that you would like to see backported.

Hey,

Is anyone planning to back port any more fixes to the 1.2.x branch? So far this is all that has been back ported:

bug fix for subclassing object arrays: http://projects.scipy.org/scipy/numpy/changeset/5891
MaskedArray fixes: http://projects.scipy.org/scipy/numpy/changeset/5936
Python 2.4 compatible lookfor: http://projects.scipy.org/scipy/numpy/changeset/5945

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From millman at berkeley.edu  Thu Oct 9 18:14:09 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Thu, 9 Oct 2008 15:14:09 -0700
Subject: [Numpy-discussion] Please backport fixes to the 1.2.x branch
In-Reply-To:
References:
Message-ID:

I would also like to back port revision 5833:

http://projects.scipy.org/scipy/numpy/changeset/5833

Are there any other fixes that should be back ported?

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From david at ar.media.kyoto-u.ac.jp  Fri Oct 10 03:06:26 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 10 Oct 2008 16:06:26 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <200810091707.38689.lists_ravi@lavabit.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <48ECC2CA.5010002@ar.media.kyoto-u.ac.jp> <789d27b10810080756r293ee6b4r5b7705df24aa1ccc@mail.gmail.com> <200810091707.38689.lists_ravi@lavabit.com>
Message-ID: <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp>

Ravi wrote:
>
> Given that commenting out the section above allows numpy to compile without
> any apparent side effects, is there any chance we could get "experimental"
> binaries of numpy 1.2.0 for python 2.6? I do understand that a negative answer
> is very likely and the reasons therefor.

I started a wiki page on the issues related to windows, 64 bits and python 2.6 (those issues are somewhat related at some level):

http://scipy.org/scipy/numpy/wiki/MicrosoftToolchainSupport

If you want to help, you can try solving one problem.
In particular, if you know how to build ATLAS with Visual studio (for 64 bits support), it would be really helpful,

cheers,

David

From scotta_2002 at yahoo.com  Fri Oct 10 07:02:21 2008
From: scotta_2002 at yahoo.com (Scott Askey)
Date: Fri, 10 Oct 2008 04:02:21 -0700 (PDT)
Subject: [Numpy-discussion] matlab ss2tf in python/symbols in linear algebra/transfer functions
Message-ID: <893083.2737.qm@web36501.mail.mud.yahoo.com>

> Subject: matlab ss2tf in python/symbol in linear algebra/transfer functions
> To: numpy-discussion at scipy.org
> Date: Thursday, October 9, 2008, 5:36 PM
> Is there a clever way to do symbolic linear algebra with
> python? What is the proper tool for doing linear algebra
> with matrices containing symbols?
>
> V/R
>
> Scott

From niki.spahiev at gmail.com  Fri Oct 10 07:48:46 2008
From: niki.spahiev at gmail.com (Niki Spahiev)
Date: Fri, 10 Oct 2008 14:48:46 +0300
Subject: [Numpy-discussion] NURBS to arc
Message-ID:

Hello,

Can I use numpy for handling NURBS splines? Especially I would like to make an arc approximation of NURBS.

thanks,
Niki

From gael.varoquaux at normalesup.org  Fri Oct 10 07:56:47 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 10 Oct 2008 13:56:47 +0200
Subject: [Numpy-discussion] matlab ss2tf in python/symbols in linear algebra/transfer functions
In-Reply-To: <893083.2737.qm@web36501.mail.mud.yahoo.com>
References: <893083.2737.qm@web36501.mail.mud.yahoo.com>
Message-ID: <20081010115647.GA26097@phare.normalesup.org>

On Fri, Oct 10, 2008 at 04:02:21AM -0700, Scott Askey wrote:
> > Subject: matlab ss2tf in python/symbol in linear algebra/transfer functions
> > To: numpy-discussion at scipy.org
> > Date: Thursday, October 9, 2008, 5:36 PM
> > Is there a clever way to do symbolic linear algebra with
> > python? What is the proper tool for doing linear algebra
> > with matrices containing symbols?

Have a look at sympy: http://code.google.com/p/sympy/

Gaël

From gael.varoquaux at normalesup.org  Fri Oct 10 08:02:35 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 10 Oct 2008 14:02:35 +0200
Subject: [Numpy-discussion] Vectorizing "dot" on the two last axis
Message-ID: <20081010120235.GB26097@phare.normalesup.org>

Hi list,

I have been unable to vectorize the following operation::

    window_size = 10
    nb_windows = 819
    nb_clusters = 501
    restricted_series = np.random.random(size=(window_size, nb_clusters,
                                                nb_windows))
    this_cov = np.zeros((nb_windows, nb_clusters, nb_clusters))
    print >>sys.stderr, "alive"
    for index, window in enumerate(restricted_series):
        this_cov[index, ...] = np.dot(window, window.T)

The last for loop is the one I am unhappy with.

Note that it is fairly easy to get huge arrays when trying to vectorize this through np.dot or np.tensordot.

I am unhappy with myself: I feel that it should be possible to vectorize this operation, but I cannot figure it out.
Cheers,

Gaël

From dagss at student.matnat.uio.no  Fri Oct 10 08:03:42 2008
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Fri, 10 Oct 2008 14:03:42 +0200
Subject: [Numpy-discussion] matlab ss2tf in python/symbols in linear algebra/transfer functions
In-Reply-To: <20081010115647.GA26097@phare.normalesup.org>
References: <893083.2737.qm@web36501.mail.mud.yahoo.com> <20081010115647.GA26097@phare.normalesup.org>
Message-ID: <48EF449E.2070806@student.matnat.uio.no>

Gael Varoquaux wrote:
> On Fri, Oct 10, 2008 at 04:02:21AM -0700, Scott Askey wrote:
>
>>> Subject: matlab ss2tf in python/symbol in linear algebra/transfer functions
>>> To: numpy-discussion at scipy.org
>>> Date: Thursday, October 9, 2008, 5:36 PM
>>> Is there a clever way to do symbolic linear algebra with
>>> python? What is the proper tool for doing linear algebra
>>> with matrices containing symbols?
>
> Have a look at sympy: http://code.google.com/p/sympy/

Also, depending on use case, perhaps Sage: www.sagemath.org

If the symbols happen to be restricted to polynomials over the same variables, for instance, then I think Sage likely has highly optimized algorithms for dealing with those (though I didn't try it).

Dag Sverre

From michael.abshoff at googlemail.com  Fri Oct 10 08:12:37 2008
From: michael.abshoff at googlemail.com (Michael Abshoff)
Date: Fri, 10 Oct 2008 05:12:37 -0700
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <48ECC2CA.5010002@ar.media.kyoto-u.ac.jp> <789d27b10810080756r293ee6b4r5b7705df24aa1ccc@mail.gmail.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp>
Message-ID: <48EF46B5.6040801@gmail.com>

David Cournapeau wrote:

Hi David,

> I started a wiki page on the issues related to windows, 64 bits and
> python 2.6 (those issues are somewhat related at some level):
>
> http://scipy.org/scipy/numpy/wiki/MicrosoftToolchainSupport

Cool.

> If you want to help, you can try solving one problem. In particular, if
> you know how to build ATLAS with Visual studio (for 64 bits support), it
> would be really helpful,

The problem with 64 bit ATLAS support for MSVC is that ATLAS uses (AFAIK) assembly code that is not compilable with the MSVC toolchain. Since the official MinGW cannot create 64 bit code (there is some experimental support I have not tried yet) the only hope at the moment (without converting the assembly) is to use the Intel toolchain. I have not tried that yet.

The current ATLAS code even requires Cygwin, but there was recent activity on the ATLAS mailing list to support MinGW only. There are also issues with threading support with MinGW and winpthread, so there is some work ahead to fully support ATLAS with MSVC.

Clint Whaley is supposed to speak at Sage Days 11 in Austin in about a month and I had planned to investigate the possibilities of native 64 bit ATLAS support for VC9.
I had planned to spend some days after SD 11 at Enthought and work on MSVC build issues as well as 64 bit OSX stuff for example, but I need to make plans shortly since I need to buy my plane ticket soon.

> cheers,
>
> David

Cheers,

Michael

From peridot.faceted at gmail.com  Fri Oct 10 09:03:14 2008
From: peridot.faceted at gmail.com (Anne Archibald)
Date: Fri, 10 Oct 2008 09:03:14 -0400
Subject: [Numpy-discussion] Vectorizing "dot" on the two last axis
In-Reply-To: <20081010120235.GB26097@phare.normalesup.org>
References: <20081010120235.GB26097@phare.normalesup.org>
Message-ID:

2008/10/10 Gael Varoquaux :
> I have been unable to vectorize the following operation::
>
>     window_size = 10
>     nb_windows = 819
>     nb_clusters = 501
>     restricted_series = np.random.random(size=(window_size, nb_clusters,
>                                                 nb_windows))
>     this_cov = np.zeros((nb_windows, nb_clusters, nb_clusters))
>     print >>sys.stderr, "alive"
>     for index, window in enumerate(restricted_series):
>         this_cov[index, ...] = np.dot(window, window.T)
>
> The last for loop is the one I am unhappy with.
>
> Note that it is fairly easy to get huge arrays when trying to vectorize
> this through np.dot or np.tensordot.
>
> I am unhappy with myself: I feel that it should be possible to vectorize
> this operation, but I cannot figure it out.

I am pretty sure that, unfortunately, there is no way to vectorize this without an intermediate array substantially larger than either the inputs or the outputs. (Add one to the tally of people wishing for ufunc-like linear algebra.)

From a computational point of view, this isn't particularly a problem: the intermediate array cannot be large enough to be a problem without the slices along at least one axis being large enough that for loop overhead is pretty minor. So you can just loop over this axis.

Coding-wise, of course, this is a royal pain: unless you know ahead of time which axis is the smallest, you need to code several versions of your routine, and each version must include a for loop (just what numpy is supposed to help avoid).

So: ufunc-like linear algebra would be great, but in the meantime, I guess we all learn to live with for loops.

Anne

From stefan at sun.ac.za  Fri Oct 10 09:25:18 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 10 Oct 2008 15:25:18 +0200
Subject: [Numpy-discussion] NURBS to arc
In-Reply-To:
References:
Message-ID: <9457e7c80810100625s552c6862uf76eab07118a3d6c@mail.gmail.com>

Hi Niki

2008/10/10 Niki Spahiev :
> Can I use numpy for handling NURBS splines? Especially I would like to make
> an arc approximation of NURBS.
I have Runten Tenfjord's NURBs module, modified for NumPy, as part of my (incomplete) super-resolution library:

http://mentat.za.net/supreme

Regards
Stéfan

From lists_ravi at lavabit.com  Fri Oct 10 10:00:48 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Fri, 10 Oct 2008 10:00:48 -0400
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp>
Message-ID: <200810101000.48555.lists_ravi@lavabit.com>

On Friday 10 October 2008 03:06:26 David Cournapeau wrote:
> > Given that commenting out the section above allows numpy to compile
> > without any apparent side effects, is there any chance we could get
> > "experimental" binaries of numpy 1.2.0 for python 2.6? I do understand
> > that a negative answer is very likely and the reasons therefor.
>
> I started a wiki page on the issues related to windows, 64 bits and
> python 2.6 (those issues are somewhat related at some level):
>
> http://scipy.org/scipy/numpy/wiki/MicrosoftToolchainSupport
>
> If you want to help, you can try solving one problem. In particular, if
> you know how to build ATLAS with Visual studio (for 64 bits support), it
> would be really helpful,

Michael Abshoff already responded to the ATLAS question. I don't have access to a 64-bit Windows. Given the volume of legacy 32-bit applications where I work, there is no chance of 64-bit Windows access for me for at least 2 years.

VS2008 with 32-bit windows should not have any problems (as you mentioned on the Wiki page referenced above). What needs to be done to figure out msvc9 support on mingw and how can I help? I am a Windows n00b (mostly by choice) when it comes to platform-specific issues.

Regards,
Ravi

From david at ar.media.kyoto-u.ac.jp  Fri Oct 10 09:58:49 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 10 Oct 2008 22:58:49 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <200810101000.48555.lists_ravi@lavabit.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp> <200810101000.48555.lists_ravi@lavabit.com>
Message-ID: <48EF5F99.3010403@ar.media.kyoto-u.ac.jp>

Ravi wrote:
>
> Michael Abshoff already responded to the ATLAS question. I don't have access
> to a 64-bit Windows. Given the volume of legacy 32-bit applications where I
> work, there is no chance of 64-bit Windows access for me for at least 2 years.

Windows 64 actually has a very nice feature: WoW (windows on windows). It can execute any 32-bit software, AFAIK (which does not run in ring 0 of course).

> VS2008 with 32-bit windows should not have any problems (as you mentioned on
> the Wiki page referenced above).

I wish it were true :) I can't build numpy with mingw ATM, because of bugs in mingw. Things like:

http://bugs.python.org/issue3308

> What needs to be done to figure out msvc9
> support on mingw and how can I help? I am a Windows n00b (mostly by choice)
> when it comes to platform-specific issues.

Then, I am afraid you won't be of much help, unfortunately.
cheers,

David

From michael.abshoff at googlemail.com  Fri Oct 10 10:51:28 2008
From: michael.abshoff at googlemail.com (Michael Abshoff)
Date: Fri, 10 Oct 2008 07:51:28 -0700
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48EF5F99.3010403@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp> <200810101000.48555.lists_ravi@lavabit.com> <48EF5F99.3010403@ar.media.kyoto-u.ac.jp>
Message-ID: <48EF6BF0.9080708@gmail.com>

David Cournapeau wrote:
> Ravi wrote:

Hi,

>> Michael Abshoff already responded to the ATLAS question. I don't have access
>> to a 64-bit Windows. Given the volume of legacy 32-bit applications where I
>> work, there is no chance of 64-bit Windows access for me for at least 2 years.
>
> Windows 64 actually has a very nice feature: WoW (windows on windows).
> It can execute any 32-bit software, AFAIK (which does not run in ring
> 0 of course).

Well, I think that having a 64 bit native build of numpy/scipy using an efficient and non-commercial licensed BLAS/Lapack (i.e. not Intel MKL) can't be a bad thing :)

>> VS2008 with 32-bit windows should not have any problems (as you mentioned on
>> the Wiki page referenced above).
>
> I wish it were true :) I can't build numpy with mingw ATM, because of
> bugs in mingw. Things like:
>
> http://bugs.python.org/issue3308

It might be possible to build python [2.5|2.6|3.0] with MinGW itself to avoid the runtime issue. At least Python 2.4 had problems when building it with MinGW and I never investigated if 2.5.x had fixed those issues.

>> What needs to be done to figure out msvc9
>> support on mingw and how can I help? I am a Windows n00b (mostly by choice)
>> when it comes to platform-specific issues.
>
> Then, I am afraid you won't be of much help, unfortunately.
>
> cheers,
>
> David

Cheers,

Michael

From david at ar.media.kyoto-u.ac.jp  Fri Oct 10 11:05:16 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Sat, 11 Oct 2008 00:05:16 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48EF6BF0.9080708@gmail.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp> <200810101000.48555.lists_ravi@lavabit.com> <48EF5F99.3010403@ar.media.kyoto-u.ac.jp> <48EF6BF0.9080708@gmail.com>
Message-ID: <48EF6F2C.6040509@ar.media.kyoto-u.ac.jp>

Michael Abshoff wrote:
>
> Well, I think that having a 64 bit native build of numpy/scipy using an
> efficient and non-commercial licensed BLAS/Lapack (i.e. not Intel MKL)
> can't be a bad thing :)

Yes, of course. But it is useful to be able to use a 32 bits toolchain to produce 64 bits software.

> It might be possible to build python [2.5|2.6|3.0] with MinGW itself to
> avoid the runtime issue. At least Python 2.4 had problems when building
> it with MinGW and I never investigated if 2.5.x had fixed those issues.

Not being ABI compatible with Python from python.org is not an option, and building it with mingw would make it ABI incompatible for sure. I certainly wish they used an open source compiler to build the official python binary, but that's not the case.
cheers,

David

From david.huard at gmail.com  Fri Oct 10 11:27:24 2008
From: david.huard at gmail.com (David Huard)
Date: Fri, 10 Oct 2008 11:27:24 -0400
Subject: [Numpy-discussion] Apply a vector function to each row of a matrix
In-Reply-To:
References: <91cf711d0810090624s1c465992l6dd922cfaa7c67c7@mail.gmail.com> <91cf711d0810091001n6abf6fe8ie2cb145d8406e137@mail.gmail.com>
Message-ID: <91cf711d0810100827u735c98d8od32ea85d671680f0@mail.gmail.com>

On Thu, Oct 9, 2008 at 2:48 PM, Neal Becker wrote:

> David Huard wrote:
>
> > On Thu, Oct 9, 2008 at 9:40 AM, Neal Becker wrote:
> >
> >> David Huard wrote:
> >>
> >> > Neal,
> >> >
> >> > Look at: apply_along_axis
> >> >
> >> >
> >> I guess it'd be:
> >>
> >> b = empty_like(a)
> >> for row in a.shape[0]:
> >>     b[row,:] = apply_along_axis (func, row, a)
> >>
> >> I don't suppose there is a way to do this without explicitly writing a
> >> loop.
> >
> > Have you tried
> >
> > b = apply_along_axis(func, 1, a)
> >
> > It should work.
>
> Yes, thanks.
>
> The doc for apply_along_axis is not clear.
>
> For one thing, it says:
> The output array. The shape of outarr depends on the return value of
> func1d. If it returns arrays with the same shape as the input arrays it
> receives, outarr has the same shape as arr.
>
> What happens if the 'if' clause is not true?

The shape along the axis is determined by the shape of your function's result.

In [2]: def func(x):
   ...:     return x[::2]
   ...:

In [3]: a = random.rand(3,4)

In [4]: a
Out[4]:
array([[ 0.95979758,  0.37350614,  0.77423741,  0.62520089],
       [ 0.69060211,  0.91480227,  0.60105525,  0.20184552],
       [ 0.31540644,  0.19919848,  0.72567385,  0.63987393]])

In [5]: apply_along_axis(func, 1, a)
Out[5]:
array([[ 0.95979758,  0.77423741],
       [ 0.69060211,  0.60105525],
       [ 0.31540644,  0.72567385]])

I've edited the docstring at

http://sd-2116.dedibox.fr/pydocweb/doc/numpy.lib.shape_base.apply_along_axis/

Feel free to improve on it.

David

From michael.abshoff at googlemail.com  Fri Oct 10 13:05:58 2008
From: michael.abshoff at googlemail.com (Michael Abshoff)
Date: Fri, 10 Oct 2008 10:05:58 -0700
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48EF6F2C.6040509@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp> <200810101000.48555.lists_ravi@lavabit.com> <48EF5F99.3010403@ar.media.kyoto-u.ac.jp> <48EF6BF0.9080708@gmail.com> <48EF6F2C.6040509@ar.media.kyoto-u.ac.jp>
Message-ID: <48EF8B76.7010200@gmail.com>

David Cournapeau wrote:

Hi David,

> Michael Abshoff wrote:
>> Well, I think that having a 64 bit native build of numpy/scipy using an
>> efficient and non-commercial licensed BLAS/Lapack (i.e. not Intel MKL)
>> can't be a bad thing :)
>
> Yes, of course. But it is useful to be able to use a 32 bits toolchain
> to produce 64 bits software.

Sure, but there isn't even a 32 bit gcc out there that can produce 64 bit PE binaries (aside from the MinGW fork that AFAIK does not work particularly well and allegedly has issues with the cleanliness of some of the code, which is allegedly the reason that the official MinGW people will not touch the code base).
It has been rumored for a while that there is a new version of SFU by Microsoft in the works, based on gcc 4.2.x, that will be able to create 64 bit PE binaries, but I have not actually talked to anybody who has access, so it could be just a rumor.

>> It might be possible to build python [2.5|2.6|3.0] with MinGW itself to
>> avoid the runtime issue. At least Python 2.4 had problems when building
>> it with MinGW and I never investigated if 2.5.x had fixed those issues.
>
> Not being ABI compatible with Python from python.org is not an option,
> and building it with mingw would make it ABI incompatible for sure. I
> certainly wish they used an open source compiler to build the official
> python binary, but that's not the case.

Ok, that is a concern I usually do not have since I tend to build my own Python :). I am pretty sure that building Python with MinGW will break ABI compatibility with Python 2.6. As has been discussed on this list more than once, not even Python 2.5 built with MSVC 2003 is really compatible with C++ extensions built with MinGW.

> cheers,
>
> David

Cheers,

Michael

From lseltzer at alumni.caltech.edu  Fri Oct 10 17:26:06 2008
From: lseltzer at alumni.caltech.edu (Linda Seltzer)
Date: Fri, 10 Oct 2008 17:26:06 -0400 (EDT)
Subject: [Numpy-discussion] Installation of Numerical Python
In-Reply-To:
References:
Message-ID: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu>

Hello Group Members:

Can someone please answer a question about installing Numerical Python and getting it to work? I downloaded the files.
They are in C:\Program Files\Python25

I have the python code and also the new file numpy-1.2.0-win32-superpack-python2.5

There is also a text file, numpy.wininst, and the last line of this file is:

200 File Copy: C:\Program Files\Python25\Lib\site-packages\numpy\add_newdocs.pyo

I have these lines in my program:

import Numeric
from Numeric import *

The problem is that when I run my python program in IDLE, I receive the following error message:

Traceback (most recent call last):
  File "C:\Documents and Settings\music1\My Documents\Aerostream\aerostream\Reed Solomon\python-rscode\python-rscode\GF16genroot2a.py", line 18, in <module>
    import Numeric
ImportError: No module named Numeric
>>>

I would be most appreciative of any clear, detailed instructions on where the Numeric Python files should be located and how to get this package to work.

lseltzer at alumni.caltech.edu

From robert.kern at gmail.com  Fri Oct 10 17:32:36 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 10 Oct 2008 16:32:36 -0500
Subject: [Numpy-discussion] Installation of Numerical Python
In-Reply-To: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu>
References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu>
Message-ID: <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com>

On Fri, Oct 10, 2008 at 16:26, Linda Seltzer wrote:
> Hello Group Members:
>
> Can someone please answer a question about installing Numerical Python and
> getting it to work? I downloaded the files. They are in
> C:\Program Files\Python25
>
> I have the python code and also the new file
> numpy-1.2.0-win32-superpack-python2.5
>
> There is also a text file, numpy.wininst, and the last line of this file is:
> 200 File Copy: C:\Program
> Files\Python25\Lib\site-packages\numpy\add_newdocs.pyo
>
> I have these lines in my program:
> import Numeric
> from Numeric import *
>
> The problem is that when I run my python program in IDLE, I receive the
> following error message:
>
> Traceback (most recent call last):
>   File "C:\Documents and Settings\music1\My
> Documents\Aerostream\aerostream\Reed
> Solomon\python-rscode\python-rscode\GF16genroot2a.py", line 18, in <module>
>     import Numeric
> ImportError: No module named Numeric
>
> I would be most appreciative of any clear, detailed instructions on where
> the Numeric Python files should be located and how to get this package to
> work.

Numeric is an old, obsolete package. numpy is the successor. You installed numpy, not Numeric. Unfortunately, there is no Python 2.5 binary available for Numeric.

I am not familiar with the Reed Solomon coding package that you are using, so I don't know if there is an updated version that uses numpy instead of Numeric. If it is open source, and you can show us the source, we can help you convert it to use numpy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From numpy-discussion at maubp.freeserve.co.uk  Fri Oct 10 17:36:16 2008
From: numpy-discussion at maubp.freeserve.co.uk (Peter)
Date: Fri, 10 Oct 2008 22:36:16 +0100
Subject: [Numpy-discussion] Installation of Numerical Python
In-Reply-To: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu>
References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu>
Message-ID: <320fb6e00810101436u74cb9a0oe73c514f8f8cf0d0@mail.gmail.com>

On Fri, Oct 10, 2008 at 10:26 PM, Linda Seltzer wrote:
>
> Hello Group Members:
>
> Can someone please answer a question about installing Numerical Python and
> getting it to work? I downloaded the ... new file
> numpy-1.2.0-win32-superpack-python2.5

[.exe I assume?]

You have installed "numpy" which is the successor to "Numeric", see:
http://numpy.scipy.org/#older_array

Confusingly both have been known as "Numerical Python"!

> I have these lines in my program:
> import Numeric
> from Numeric import *

If you are trying to use an old program written to use Numeric, then you could install Numeric instead - but it is no longer being maintained. Or try replacing those import statements with the backwards compatibility layer, e.g.

from Numeric import *

becomes:

from numpy.oldnumeric import *

Peter

From numpy-discussion at maubp.freeserve.co.uk  Fri Oct 10 17:42:59 2008
From: numpy-discussion at maubp.freeserve.co.uk (Peter)
Date: Fri, 10 Oct 2008 22:42:59 +0100
Subject: [Numpy-discussion] Installation of Numerical Python
In-Reply-To: <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com>
References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu> <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com>
Message-ID: <320fb6e00810101442s4a35f1fbxfadef1696aad782e@mail.gmail.com>

Robert Kern wrote:
> Numeric is an old, obsolete package. numpy is the successor. You
> installed numpy, not Numeric. Unfortunately, there is no Python 2.5
> binary available for Numeric.
Actually, there is - but not from the numpy/Numeric/SciPy team as they want to encourage people to move over to numpy ;)

If you don't mind trusting a third party, you can use this Numeric for python 2.5 setup exe:
http://biopython.org/DIST/Numeric-24.2.win32-py2.5.exe

But just to be clear - Numeric is not being maintained anymore, so you should try and move over to numpy.

Peter

From kammeyer at enthought.com  Fri Oct 10 19:00:54 2008
From: kammeyer at enthought.com (Dave Kammeyer)
Date: Fri, 10 Oct 2008 18:00:54 -0500
Subject: [Numpy-discussion] Build problem using mingw on Win32
Message-ID: <48EFDEA6.7010801@enthought.com>

Hi All,

Using the current trunk, I am getting the following build error:

creating build\temp.win32-2.5\Release\build
creating build\temp.win32-2.5\Release\build\src.win32-2.5
creating build\temp.win32-2.5\Release\build\src.win32-2.5\numpy
creating build\temp.win32-2.5\Release\build\src.win32-2.5\numpy\core
creating build\temp.win32-2.5\Release\build\src.win32-2.5\numpy\core\src
compile options: '-Ibuild\src.win32-2.5\numpy\core\src -Inumpy\core\include -Ibuild\src.win32-2.5\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Ic:\python25\include -Ic:\python25\PC -c'
gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes -Ibuild\src.win32-2.5\numpy\core\src -Inumpy\core\include -Ibuild\src.win32-2.5\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Ic:\python25\include -Ic:\python25\PC -c build\src.win32-2.5\numpy\core\src\umathmodule.c -o build\temp.win32-2.5\Release\build\src.win32-2.5\numpy\core\src\umathmodule.o
numpy\core\src\umathmodule.c.src:1451: warning: return type defaults to `int'
In file included from numpy\core\src\umathmodule.c.src:1510:
build/src.win32-2.5/numpy/core/include/numpy/__umath_generated.c:173: warning: initialization from incompatible pointer type
build/src.win32-2.5/numpy/core/include/numpy/__umath_generated.c: In function `InitOperators':
build/src.win32-2.5/numpy/core/include/numpy/__umath_generated.c:285: error: `atanhf' undeclared (first use in this function)
build/src.win32-2.5/numpy/core/include/numpy/__umath_generated.c:285: error: (Each undeclared identifier is reported only once
build/src.win32-2.5/numpy/core/include/numpy/__umath_generated.c:285: error: for each function it appears in.)
numpy\core\src\umathmodule.c.src: At top level:
numpy\core\src\umathmodule.c.src:608: warning: 'BOOL_ones_like' defined but not used
error: Command "gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes -Ibuild\src.win32-2.5\numpy\core\src -Inumpy\core\include -Ibuild\src.win32-2.5\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -Ic:\python25\include -Ic:\python25\PC -c build\src.win32-2.5\numpy\core\src\umathmodule.c -o build\temp.win32-2.5\Release\build\src.win32-2.5\numpy\core\src\umathmodule.o" failed with exit status 1
DEBUG: return value for "c:\python25\python.exe setup.py build --compiler=mingw32 config --compiler=mingw32" is 1

From lseltzer at alumni.caltech.edu  Fri Oct 10 22:42:37 2008
From: lseltzer at alumni.caltech.edu (Linda Seltzer)
Date: Fri, 10 Oct 2008 22:42:37 -0400 (EDT)
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
Message-ID: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu>

This worked:

from numpy.oldnumeric import *

Should I use

from numpy import *

(Does the first statement use outdated software?)

I want to use 2-D arrays. Please advise me on the best way to do this.
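A minimal sketch of the modern, namespaced idiom for the 2-D case being asked about here (the array size and values are arbitrary, and a working NumPy install is assumed):

    import numpy as np

    # The shape is passed as a single tuple, so a 256x256 array of zeros is:
    a = np.zeros((256, 256))

    # Individual elements can then be overwritten with [row, column] indexing:
    a[0, 0] = 1.0
    a[128, 64] = 3.5
    print(a.shape)   # (256, 256)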
From robert.kern at gmail.com  Sat Oct 11 00:02:37 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 10 Oct 2008 23:02:37 -0500
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
In-Reply-To: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu>
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu>
Message-ID: <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com>

On Fri, Oct 10, 2008 at 21:42, Linda Seltzer wrote:
> This worked:
> from numpy.oldnumeric import *

To be clear, you mean that this worked to replace the statement "from Numeric import *" in the older code that you mentioned in your previous thread.

> Should I use
> from numpy import *
> (Does the first statement use outdated software?)

The first statement exposes an API which is mostly compatible with the older Numeric API. The underlying software is up-to-date, but the interface is not.

> I want to use 2-D arrays. Please advise me on the best way to do this.

If you are writing new code, you should use "import numpy" (as an orthogonal note, "from <module> import *" is discouraged). Only use numpy.oldnumeric if you need to use old code and do not have the time or resources to update it to use the new API.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From lseltzer at alumni.caltech.edu  Sat Oct 11 01:16:13 2008
From: lseltzer at alumni.caltech.edu (Linda Seltzer)
Date: Sat, 11 Oct 2008 01:16:13 -0400 (EDT)
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
In-Reply-To: <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com>
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com>
Message-ID: <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu>

I would appreciate it if someone could answer my question without referring to subjects such as APIs and interfaces, since I am only concerned with a mathematical application at this time.

In most tutorials, array examples are of the form:

a = array([[1, 2, 3], [4, 5, 6]])

The problem with this is that I have an array of size 256 x 256, and I have had applications with 3-d arrays of size 128x128x128. In video, you can have arrays such as 640x480, etc. So filling in the value of each element is not practical. I wish that tutorials provided real world examples.

I would appreciate it if someone could give me the actual statements needed to define and initialize a 2-D array of size NxN, where N can be any large number, where the initial values of the array elements are all zeros, but will be changed by the program.

In Matlab, this is done as a = zeros(256,256). If I try this in python, it won't let the program overwrite the zeros.
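One plausible reading of the Matlab-style failure described above, sketched for illustration (the exact error text depends on the NumPy version): in NumPy the second positional argument of zeros is the dtype, not a second dimension.

    import numpy as np

    # Matlab habit: zeros(256, 256). In NumPy the second positional
    # argument is the dtype, so this raises a TypeError instead of
    # making a 256x256 array:
    #     a = np.zeros(256, 256)

    # The shape must be one tuple; the result is an ordinary writable array:
    a = np.zeros((256, 256))
    a[3, 7] = -42.0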
From wbaxter at gmail.com  Sat Oct 11 01:26:39 2008
From: wbaxter at gmail.com (Bill Baxter)
Date: Sat, 11 Oct 2008 14:26:39 +0900
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
In-Reply-To: <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu>
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu>
Message-ID:

On Sat, Oct 11, 2008 at 2:16 PM, Linda Seltzer wrote:
> I would appreciate it if someone could answer my question without
> referring to subjects such as APIs and interfaces, since I am only
> concerned with a mathematical application at this time.
> In most tutorials, array examples are of the form:
> a = array([[1, 2, 3], [4, 5, 6]])
> The problem with this is that I have an array of size 256 x 256, and I
> have had applications with 3-d arrays of size 128x128x128. In video, you
> can have arrays such as 640x480, etc. So filling in the value of each element
> is not practical. I wish that tutorials provided real world examples.
> I would appreciate it if someone could give me the actual statements
> needed to define and initialize a 2-D array of size NxN, where N can be
> any large number, where the initial values of the array elements are all
> zeros, but will be changed by the program.
> In Matlab, this is done as a = zeros(256,256). If I try this in python,
> it won't let the program overwrite the zeros.

In numpy it's

import numpy as npy
a = npy.zeros((256,256))

a[0,0] = 1.0
a[200,123] = -42.0
# etc...

I think you were just missing the extra parentheses in the numpy version of "zeros"

--bb

From lseltzer at alumni.caltech.edu  Sat Oct 11 01:38:54 2008
From: lseltzer at alumni.caltech.edu (Linda Seltzer)
Date: Sat, 11 Oct 2008 01:38:54 -0400 (EDT)
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
In-Reply-To:
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu>
Message-ID: <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu>

Thank you. It produced this error message:

NameError: global name 'npy' is not defined

> In numpy it's
> import numpy as npy
> a = npy.zeros((256,256))
>
> a[0,0] = 1.0
> a[200,123] = -42.0
> # etc...
>
> I think you were just missing the extra parentheses in the numpy
> version of "zeros"
>
> --bb

From robert.kern at gmail.com  Sat Oct 11 01:46:53 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 11 Oct 2008 00:46:53 -0500
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
In-Reply-To: <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu>
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu>
Message-ID: <3d375d730810102246q64fe43d2ra7420814f993a40b@mail.gmail.com>

On Sat, Oct 11, 2008 at 00:38, Linda Seltzer wrote:
> Thank you. It produced this error message:
> NameError: global name 'npy' is not defined

Note the line:

import numpy as npy

>> In numpy it's
>> import numpy as npy
>> a = npy.zeros((256,256))
>>
>> a[0,0] = 1.0
>> a[200,123] = -42.0
>> # etc...
>> I think you were just missing the extra parentheses in the numpy
>> version of "zeros"
>>
>> --bb

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From numpy-discussion at maubp.freeserve.co.uk  Sat Oct 11 07:08:23 2008
From: numpy-discussion at maubp.freeserve.co.uk (Peter)
Date: Sat, 11 Oct 2008 12:08:23 +0100
Subject: [Numpy-discussion] Installation of Numerical Python
In-Reply-To: <1315.71.231.103.249.1223686712.squirrel@mail.alumni.caltech.edu>
References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu> <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com> <320fb6e00810101442s4a35f1fbxfadef1696aad782e@mail.gmail.com> <1315.71.231.103.249.1223686712.squirrel@mail.alumni.caltech.edu>
Message-ID: <320fb6e00810110408q60292511r661ff30f1872b4c2@mail.gmail.com>

On Sat, Oct 11, 2008 at 1:58 AM, Linda Seltzer wrote:
>
> I didn't know there was a difference between Numeric and numpy. Could
> someone please provide clear, detailed instructions for how to get
> *something* (numeric, numpy) to work so that I can get my work (matrices)
> done?

Hi Linda,

You only sent this to me BTW, I have copied this reply back to the list.

From your original email, I have the impression you are using some existing code that someone else wrote:

Traceback (most recent call last):
  File "C:\Documents and Settings\music1\My Documents\Aerostream\aerostream\Reed Solomon\python-rscode\python-rscode\GF16genroot2a.py", line 18, in <module>
    import Numeric
ImportError: No module named Numeric

I suggest you try one of the following (in order of preference):

I would start by emailing the author(s) and asking if they have any plans to move their code from Numeric to numpy.

Otherwise, if you are reasonably experienced with python, you could try modifying their code to switch it from Numeric to numpy.

Finally, what is probably the easiest answer in the short term, just install Numeric for Python 2.5 using the Windows installer I linked to.

Peter

P.S. You can have both Numeric and numpy installed at the same time - that won't cause a problem.

From gael.varoquaux at normalesup.org  Sat Oct 11 07:15:59 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sat, 11 Oct 2008 13:15:59 +0200
Subject: [Numpy-discussion] Vectorizing "dot" on the two last axis
In-Reply-To:
References: <20081010120235.GB26097@phare.normalesup.org>
Message-ID: <20081011111559.GB22415@phare.normalesup.org>

On Fri, Oct 10, 2008 at 09:03:14AM -0400, Anne Archibald wrote:
> I am pretty sure that, unfortunately, there is no way to vectorize
> this without an intermediate array substantially larger than either
> the inputs or the outputs.

OK, this is what I had found.

> From a computational point of view, this isn't particularly a problem:
> the intermediate array cannot be large enough to be a problem without
> the slices along at least one axis being large enough that for loop
> overhead is pretty minor.

OK, I hear what you are saying: the loop is not costing me that much, because anyhow the operation would be slow, and the overhead is small. I have just grown to find that vector operations are easier to read than loops (to a trained eye).
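As a historical footnote to this exchange: later NumPy releases grew exactly the batched products being wished for here. A sketch, assuming a NumPy recent enough to have einsum and the @ (matmul) operator, neither of which existed at the time of this thread; the sizes are shrunk for checking, and the output is sized to match the loop, which iterates window_size times:

    import numpy as np

    rng = np.random.default_rng(0)
    window_size, nb_clusters, nb_windows = 4, 5, 6   # small sizes for checking
    restricted_series = rng.random((window_size, nb_clusters, nb_windows))

    # The loop from the thread: one (nb_clusters, nb_clusters) product per
    # slice along axis 0.
    this_cov = np.empty((window_size, nb_clusters, nb_clusters))
    for index, window in enumerate(restricted_series):
        this_cov[index, ...] = np.dot(window, window.T)

    # Batched equivalents, with no oversized intermediate array:
    cov1 = np.einsum('wcn,wdn->wcd', restricted_series, restricted_series)
    cov2 = restricted_series @ restricted_series.transpose(0, 2, 1)
    assert np.allclose(this_cov, cov1) and np.allclose(this_cov, cov2)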
Thanks for confirming my findings,

Gaël

From david at ar.media.kyoto-u.ac.jp  Sat Oct 11 08:10:34 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Sat, 11 Oct 2008 21:10:34 +0900
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48EF8B76.7010200@gmail.com>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp> <200810101000.48555.lists_ravi@lavabit.com> <48EF5F99.3010403@ar.media.kyoto-u.ac.jp> <48EF6BF0.9080708@gmail.com> <48EF6F2C.6040509@ar.media.kyoto-u.ac.jp> <48EF8B76.7010200@gmail.com>
Message-ID: <48F097BA.1080100@ar.media.kyoto-u.ac.jp>

Michael Abshoff wrote:
>
> Sure, but there isn't even a 32 bit gcc out there that can produce 64
> bit PE binaries (aside from the MinGW fork that AFAIK does not work
> particularly well and allegedly has issues with the cleanliness of some
> of the code, which is allegedly the reason that the official MinGW people
> will not touch the code base).

The biggest problem is that officially, there is still no gcc 4 release for mingw. I saw a gcc 4 section in cygwin, though, so maybe it is about to be released. There is no support at all for 64 bits PE in the 3.x series.

I think binutils officially support 64 bits PE (I can build a linux hosted binutils for 64 bits PE with x86_64-pc-mingw32 as a target, and it seems to work: disassembling and co). gcc 4 can work, too (you can build a bootstrap C compiler which targets windows 64 bits IIRC). The biggest problem AFAICS is the runtime (mingw64, which is indeed legally murky).

> Ok, that is a concern I usually do not have since I tend to build my own
> Python :).

I would say that if you can build python by yourself on windows, you can certainly build numpy by yourself :) It took me quite a time to be able to build python on windows by myself from scratch.

cheers,

David

From lseltzer at alumni.caltech.edu  Sat Oct 11 10:19:34 2008
From: lseltzer at alumni.caltech.edu (Linda Seltzer)
Date: Sat, 11 Oct 2008 10:19:34 -0400 (EDT)
Subject: [Numpy-discussion] Installation of Numerical Python
In-Reply-To: <320fb6e00810110408q60292511r661ff30f1872b4c2@mail.gmail.com>
References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu> <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com> <320fb6e00810101442s4a35f1fbxfadef1696aad782e@mail.gmail.com> <1315.71.231.103.249.1223686712.squirrel@mail.alumni.caltech.edu> <320fb6e00810110408q60292511r661ff30f1872b4c2@mail.gmail.com>
Message-ID: <2394.71.231.103.249.1223734774.squirrel@mail.alumni.caltech.edu>

I did not give permission to have the e-mail sent to the list. Please do not forward anyone's e-mail to a list without permission.

Also, kindly do not send an entire list a message falsely stating that someone may be using code written by someone else.

I have been reading various tutorials on the Internet, and they offer very incomplete information. Also, it is necessary to minimize time spent on installation and system administration and maximize time spent on math.
> > From your original email, I have the impression you are using some > existing code that someone else wrote: > > Traceback (most recent call last): > File "C:\Documents and Settings\music1\My > Documents\Aerostream\aerostream\Reed > Solomon\python-rscode\python-rscode\GF16genroot2a.py", line 18, in > > import Numeric > ImportError: No module named Numeric > > I suggest you try one of the following (in order of preferrence): > > I would start by emailing the author(s) and asking if they have any > plans to move their code from Numeric to numpy. > > Otherwise, if you are reasonably experienced with python, you could > try modifying their code to switch it from Numeric to numpy. > > Finally, what is probably the easiest answer in the short term, just > install Numeric for Python 2.5 using the Windows installer I linked > to. > > Peter > > P.S. You can have both Numeric and numpy installed at the same time - > that won't cause a problem. > From aisaac at american.edu Sat Oct 11 10:38:45 2008 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 11 Oct 2008 10:38:45 -0400 Subject: [Numpy-discussion] Installation of Numerical Python In-Reply-To: <2394.71.231.103.249.1223734774.squirrel@mail.alumni.caltech.edu> References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu> <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com> <320fb6e00810101442s4a35f1fbxfadef1696aad782e@mail.gmail.com> <1315.71.231.103.249.1223686712.squirrel@mail.alumni.caltech.edu> <320fb6e00810110408q60292511r661ff30f1872b4c2@mail.gmail.com> <2394.71.231.103.249.1223734774.squirrel@mail.alumni.caltech.edu> Message-ID: <48F0BA75.40200@american.edu> On 10/11/2008 10:19 AM Linda Seltzer apparently wrote: > Please do not forawrd anyone's e-mail to a list without > permission. I'm afraid that is a reversal of discussion list practice. People on this list are not offering to participate in private conversations. They are offering to participate in public conversations. If you send a response to an individual without an apology or at least a request for privacy, we will assume that you meant to send it to the list. (It is preferable to assume that than to assume that you are rudely violating list etiquette.) Cheers, Alan Isaac From numpy-discussion at maubp.freeserve.co.uk Sat Oct 11 18:23:34 2008 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Sat, 11 Oct 2008 23:23:34 +0100 Subject: [Numpy-discussion] Installation of Numerical Python In-Reply-To: <2394.71.231.103.249.1223734774.squirrel@mail.alumni.caltech.edu> References: <1317.71.231.103.249.1223673966.squirrel@mail.alumni.caltech.edu> <3d375d730810101432g650e3211wc50469404d774eea@mail.gmail.com> <320fb6e00810101442s4a35f1fbxfadef1696aad782e@mail.gmail.com> <1315.71.231.103.249.1223686712.squirrel@mail.alumni.caltech.edu> <320fb6e00810110408q60292511r661ff30f1872b4c2@mail.gmail.com> <2394.71.231.103.249.1223734774.squirrel@mail.alumni.caltech.edu> Message-ID: <320fb6e00810111523l7561af7dtac71727a6c8961f4@mail.gmail.com> On Sat, Oct 11, 2008 at 3:19 PM, Linda Seltzer wrote: > > I did not give permission to have the e-mail sent to the list. > Please do not forawrd anyone's e-mail to a list without permission. I must apologise for my error - I had honestly assumed your email had been intended for the whole list. This was partly because the email address I use for this mailing list might be confusing (numpy-discussion at ...), but also your phrasing: "Could someone please provide clear, detailed instructions for how to ..." 
With hindsight this may have been a rhetorical question, but I did think it was an open question to the whole list.

> Also, kindly do not send an entire list a message falsely stating that
> someone may be using code written by someone else.
> I have been reading various tutorials on the Internet, and they offer very
> incomplete information. Also, it is necessary to minimize time spent on
> installation and system administration and maximize time spent on math.

I wasn't intending any insult - there is nothing inherently wrong with using code written by someone else - Numeric, NumPy, SciPy and python itself are all open source projects. I merely said "I have the impression you are using some existing code that someone else wrote", partly to give context to the advice that followed.

If you are interested in the history of Numeric and numpy (and the occasional confusion between the two) see: http://numpy.scipy.org/#older_array and http://www.scipy.org/History_of_SciPy

For the actual numpy documentation, see the links on http://www.scipy.org/Documentation

I hope any frustration getting numpy installed and working is short-lived, and wish you good luck with your work.

Peter

From michael.abshoff at googlemail.com  Sat Oct 11 19:22:14 2008
From: michael.abshoff at googlemail.com (Michael Abshoff)
Date: Sat, 11 Oct 2008 16:22:14 -0700
Subject: [Numpy-discussion] can't build numpy 1.2.0 under python 2.6 (windows-amd64) using VS9
In-Reply-To: <48F097BA.1080100@ar.media.kyoto-u.ac.jp>
References: <3633FE192D10364D889D84EB77185F3E73AE4D@rxrex1.ssaris.com> <200810091707.38689.lists_ravi@lavabit.com> <48EEFEF2.9070208@ar.media.kyoto-u.ac.jp> <200810101000.48555.lists_ravi@lavabit.com> <48EF5F99.3010403@ar.media.kyoto-u.ac.jp> <48EF6BF0.9080708@gmail.com> <48EF6F2C.6040509@ar.media.kyoto-u.ac.jp> <48EF8B76.7010200@gmail.com> <48F097BA.1080100@ar.media.kyoto-u.ac.jp>
Message-ID: <48F13526.9040300@gmail.com>

David Cournapeau wrote:
> Michael Abshoff wrote:

Hi David,

>> Sure, but there isn't even a 32 bit gcc out there that can produce 64
>> bit PE binaries (aside from the MinGW fork that AFAIK does not work
>> particularly well and allegedly has issues with the cleanliness of some
>> of the code, which is allegedly the reason that the official MinGW people
>> will not touch the code base).
>
> The biggest problem is that officially, there is still no gcc 4 release
> for mingw. I saw a gcc 4 section in cygwin, though, so maybe it is about
> to be released. There is no support at all for 64 bits PE in the 3.x series.

Yes, you are correct and I was wrong. I just checked out the mingw-64 project and there has been a lot of activity the last couple of months, including a patch to build pthread-win32 in 64 bit mode.

> I think binutils officially support 64 bits PE (I can build a linux
> hosted binutils for 64 bits PE with x86_64-pc-mingw32 as a target, and
> it seems to work: disassembling and co). gcc 4 can work, too (you can
> build a bootstrap C compiler which targets windows 64 bits IIRC). The
> biggest problem AFAICS is the runtime (mingw64, which is indeed legally
> murky).

I would really like to find the actual reason *why* the legal status of the 64 bit MinGW port is murky (To my knowledge it has to do with taking code from the MS Platform toolkit - but that is conjecture), so I guess I will do the obvious thing and ask on the MinGW list :)

>> Ok, that is a concern I usually do not have since I tend to build my own
>> Python :).
> I would say that if you can build python by yourself on windows, you can
> certainly build numpy by yourself :) It took me quite a time to be able
> to build python on windows by myself from scratch.

Sure, I do see your point. Coincidentally, someone posted about http://debian-interix.net/ on the sage-windows list today. It offers a gcc 4.2 toolchain and AFAIK there is at least a patch set for ATLAS to make it work on Interix.

> cheers,
>
> David

Cheers,

Michael

From lseltzer at alumni.caltech.edu  Sun Oct 12 02:39:03 2008
From: lseltzer at alumni.caltech.edu (Linda Seltzer)
Date: Sun, 12 Oct 2008 02:39:03 -0400 (EDT)
Subject: [Numpy-discussion] Need **working** code example of 2-D arrays
In-Reply-To: <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu>
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu>
Message-ID: <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu>

I received several pieces of advice concerning my previous question on the use of 2-D arrays. However, none of these pieces of advice resulted in code that works.

The latest suggestion

>> In numpy it's
>> import numpy as npy
>> a = npy.zeros((256,256))

produced this error message:

> NameError: global name 'npy' is not defined

These are the import statements I used:

import numpy as npy
from numpy.oldnumeric import *

I would appreciate it if someone could post a **working** example that creates a 2-D array of a large size, such as 256x256, with the initial elements all zero. This file should include the import statements and the lines of code to create the array.

Please, no demeaning statements like "you forgot a parenthesis" or "you were using someone else's code" - just the lines of code for a file that actually *works.*

From cournape at gmail.com  Sun Oct 12 04:11:04 2008
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 12 Oct 2008 17:11:04 +0900
Subject: [Numpy-discussion] Need **working** code example of 2-D arrays
In-Reply-To: <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu>
References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu>
Message-ID: <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com>

On Sun, Oct 12, 2008 at 3:39 PM, Linda Seltzer wrote:

> These are the import statements I used:
> import numpy as npy
> from numpy.oldnumeric import *

Here is an example that works for any working numpy installation:

import numpy as npy
npy.zeros((256, 256)).

If those are the first two statements at the python prompt, and it does not work, your numpy installation is broken. In that case, which platform and how did you install numpy would be useful information to help you better.
> Please, no demeaning statements like "you forgot a parenthesis" or "you > were using someone else's code" - just the lines of code for a file that > actually *works.* Those were not demeaning statements, and the line that works was shown to you. I strongly suspect that either you did not give use enough information, or that your numpy installation is broken. cheers, David From robert.kern at gmail.com Sun Oct 12 04:23:21 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 12 Oct 2008 03:23:21 -0500 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> Message-ID: <3d375d730810120123n5a986522i275c732e17b072f8@mail.gmail.com> On Sun, Oct 12, 2008 at 03:11, David Cournapeau wrote: > On Sun, Oct 12, 2008 at 3:39 PM, Linda Seltzer > wrote: > >> These are the import statements I used: >> import numpy as npy >> from numpy.oldnumeric import * > > Here is an example that works for any working numpy installation: > > import numpy as npy > npy.zeros((256, 256)). Well, except for that last period. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From peridot.faceted at gmail.com Sun Oct 12 10:17:39 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 12 Oct 2008 10:17:39 -0400 Subject: [Numpy-discussion] Proposal: scipy.spatial In-Reply-To: References: <2b1c8c4f0810010540t7167281ev4d5e952093811250@mail.gmail.com> <5472729D-2D1B-491C-8DFA-C578AF61A940@comcast.net> <26EB3B10-1BC7-438A-8C1A-6C41A227D6F6@comcast.net> Message-ID: 2008/10/9 David Bolme : > I have written up basic nearest neighbor algorithm. It does a brute > force search so it will be slower than kdtrees as the number of points > gets large. It should however work well for high dimensional data. I > have also added the option for user defined distance measures. The > user can set a default "p". "p" has the same functionality if it is a > float. "p" can also be a function that computes a distance matrix or > the measure can be selected using the strings: "Manhattan", > "Euclidean", or "Correlation". > > https://pyvision.svn.sourceforge.net/svnroot/pyvision/trunk/src/pyvision/vector/knn.py This is interesting. I would point out, though, that if you want a Minkowski norm, it may be more efficient (that is, faster) to use the new compiled kd-tree implementation with leafsize set to the size of your data. This is written in compiled code and uses short-circuit distance evaluation, and may be much faster for high-dimensional problems. Given that, this should perhaps go in with other generic metric space code. I have a functional implementation of ball trees (though I don't know how efficient they are), and am looking into implementing cover trees. > The interface is similar to Anne's code and in many cases can be used > as a drop in replacement. I have posted the code to my own project > because I have a short term need and I do not have access to the scipy > repository. 
Feel free to include the code with scipy under the scipy > license. > > I did find a typo your documentation. > typo "trie -> tree" - ... kd-tree is a binary trie, each of whose ... That's not actually a typo: a trie is a tree in which all the data is stored at leaf nodes. Basic kd-trees use the nodes themselves to define splitting planes; you can actually construct one with no extra storage at all, just by choosing an appropriate order for your array of points. This implementation chooses splitting planes that may not pass through any point, so all the points get stored in leaves. > Also I found the use of k in the documentation some what confusing as > it is the dimensionality of the data points in the kd-tree and the > number of neighbors for k-nearest neighbors. That's a good point. I changed the dimension of the kd-tree to m throughout. Anne From lseltzer at alumni.caltech.edu Sun Oct 12 11:11:51 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Sun, 12 Oct 2008 11:11:51 -0400 (EDT) Subject: [Numpy-discussion] Data types in Numerical Python In-Reply-To: <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> Message-ID: <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> > Here is an example that works for any working numpy installation: > > import numpy as npy > npy.zeros((256, 256)) This suggestion from David did work so far, and removing the other import line enabled the program to run. However, the data types the program used as defaults for variables has changed, and now I am getting error messages about data types. It seems that some variables are getting a default designation as floats. Before I installed numpy and needed 2-D arrays, the program was working with the default types, and I did not have to specify types. Is there a clear tutorial that describes a means to assign data types for each variable as in C, so that I don't obtain error messages about data types? Because I am simulating code for a DSP processor, the data types I need are unsigned bytes, unsigned 32-bit ints, and signed 32-bit ints. In some cases I can use unsigned and signed 16-bit ints. Also, what data types are valid for use with local operations such as exclusive or? 
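A sketch of what those fixed-width types and a bitwise operation look like
in numpy (the values here are made up for illustration):

    import numpy as np
    a = np.array([0x0F, 0xF0, 0xAA], dtype=np.uint8)   # unsigned bytes
    b = np.array([0xFF, 0xFF, 0x55], dtype=np.uint8)
    print a ^ b                 # elementwise exclusive or -> [240  15 255]
    print (a ^ b).dtype         # uint8 -- bitwise ops keep the integer dtype
    c = np.zeros(4, dtype=np.int32)    # signed 32-bit
    d = np.zeros(4, dtype=np.uint16)   # unsigned 16-bit
    print c.dtype, d.dtype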
From peridot.faceted at gmail.com Sun Oct 12 11:26:05 2008 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 12 Oct 2008 11:26:05 -0400 Subject: [Numpy-discussion] Data types in Numerical Python In-Reply-To: <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> Message-ID: 2008/10/12 Linda Seltzer : >> Here is an example that works for any working numpy installation: >> >> import numpy as npy >> npy.zeros((256, 256)) > This suggestion from David did work so far, and removing the other import > line enabled the program to run. > However, the data types the program used as defaults for variables has > changed, and now I am getting error messages about data types. It seems > that some variables are getting a default designation as floats. Before I > installed numpy and needed 2-D arrays, the program was working with the > default types, and I did not have to specify types. > Is there a clear tutorial that describes a means to assign data types for > each variable as in C, so that I don't obtain error messages about data > types? Python is a dynamically-typed language (unlike C), so variables do not have type. That is, a variable can refer to an object of any type; if you need to know what type of object a variable currently refers to you must inspect the object. You may want to go through one of the brief introduction-to-python tutorials that are on the python website just to get comfortable with the language. (For example, understanding the meaning and syntax of import statements.) When you create a numpy array, you can specify its type. You can also explicitly or implicitly convert the types of numpy arrays. I recommend you select a data type, let's say np.uint32, and make sure various arrays are created containing that type: np.zeros((m,n),dtype=np.uint32) np.arange(10,dtype=np.uint32) x.astype(np.uint32) np.array([1,2,3,4.5], dtype=np.uint32) et cetera. Most operations (addition, multiplication, maximum) will preserve the data type of arrays they are given (but if you supply two different data types numpy will attempt to choose the "larger"). > Because I am simulating code for a DSP processor, the data types I need > are unsigned bytes, unsigned 32-bit ints, and signed 32-bit ints. In some > cases I can use unsigned and signed 16-bit ints. > Also, what data types are valid for use with local operations such as > exclusive or? The numpy data types you want are described by "dtype" objects. These can in principle become quite complicated, but the ones you need are given names already: np.uint8 np.uint32 np.int32 np.uint16 np.int16 You can specify any of these as "dtype=" arguments to the various numpy functions. If you need really rigid typing, python may not be the ideal language for you. 
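The promotion rule Anne describes is easy to see directly; a small
illustrative session:

    import numpy as np
    a = np.arange(4, dtype=np.uint8)
    b = np.arange(4, dtype=np.int32)
    print (a + a).dtype               # uint8: same-type operations preserve it
    print (a + b).dtype               # int32: mixing promotes to the "larger" type
    print a.astype(np.uint32).dtype   # uint32: explicit conversion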
Good luck, Anne From lbrooks at MIT.EDU Sun Oct 12 11:23:34 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Sun, 12 Oct 2008 09:23:34 -0600 Subject: [Numpy-discussion] Data types in Numerical Python In-Reply-To: <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> Message-ID: <48F21676.3050806@mit.edu> Linda Seltzer wrote: >> Here is an example that works for any working numpy installation: >> >> import numpy as npy >> npy.zeros((256, 256)) >> > This suggestion from David did work so far, and removing the other import > line enabled the program to run. > However, the data types the program used as defaults for variables has > changed, and now I am getting error messages about data types. It seems > that some variables are getting a default designation as floats. Before I > installed numpy and needed 2-D arrays, the program was working with the > default types, and I did not have to specify types. > Is there a clear tutorial that describes a means to assign data types for > each variable as in C, so that I don't obtain error messages about data > types? > Because I am simulating code for a DSP processor, the data types I need > are unsigned bytes, unsigned 32-bit ints, and signed 32-bit ints. In some > cases I can use unsigned and signed 16-bit ints. > Also, what data types are valid for use with local operations such as > exclusive or? > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > You can specify the type in the zeros command import numpy as npy npy.zeros((256, 256), npy.uint32) or you can convert an array between types at any point using the .astype(npy.uint16) notation like this npy.zeros((256,256)).astype(npy.uint16) I am not sure if there are any tutorials on this, but here are the types you are interested in: npy.uint32 npy.uint16 npy.int32 npy.int16 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalke at dalkescientific.com Sun Oct 12 13:15:34 2008 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 12 Oct 2008 19:15:34 +0200 Subject: [Numpy-discussion] Data types in Numerical Python In-Reply-To: References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> Message-ID: <6FF89053-ADA0-43B8-9C5F-0F69B1C3A87B@dalkescientific.com> On Oct 12, 2008, at 5:26 PM, Anne Archibald wrote: > Python is a dynamically-typed language (unlike C), so variables do not > have type. Another way to think of it for C people is that all variables have the same type, which is "reference to Python object." It's the objects which are typed, and not the variable. 
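That point can be compressed into a few lines: the dtype travels with the
array object, while the name can be rebound to anything:

    import numpy as np
    x = np.zeros(3, dtype=np.uint32)
    print x.dtype          # uint32 -- the type lives on the object
    x = "now a string"     # rebinding the name is always allowed
    print type(x)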
Andrew dalke at dalkescientific.com From charlesr.harris at gmail.com Sun Oct 12 13:48:51 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Oct 2008 11:48:51 -0600 Subject: [Numpy-discussion] Data types in Numerical Python In-Reply-To: <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <5b8d13220810120111j32442e04qaa14b060beb8bd31@mail.gmail.com> <1455.71.231.103.249.1223824311.squirrel@mail.alumni.caltech.edu> Message-ID: On Sun, Oct 12, 2008 at 9:11 AM, Linda Seltzer wrote: > > Here is an example that works for any working numpy installation: > > > > import numpy as npy > > npy.zeros((256, 256)) > This suggestion from David did work so far, and removing the other import > line enabled the program to run. > However, the data types the program used as defaults for variables has > changed, and now I am getting error messages about data types. It seems > that some variables are getting a default designation as floats. Before I > installed numpy and needed 2-D arrays, the program was working with the > default types, and I did not have to specify types. Yes, the default type of the functions zeros and ones have changed from integer to float. If your program is short you could send it as an attachment so we could look at it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Sun Oct 12 17:29:48 2008 From: aisaac at american.edu (Alan G Isaac) Date: Sun, 12 Oct 2008 17:29:48 -0400 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> Message-ID: <48F26C4C.3030504@american.edu> On 10/12/2008 2:39 AM Linda Seltzer apparently wrote: > Please, no demeaning statements like "you forgot > a parenthesis" or "you were using someone else's code" > - just the lines of code for a file that actually *works.* Those statements are not demeaning; lighten up. And the answer was correct. Start up an interpreter prompt. Type these in. What happens? 
>>> import numpy as np >>> a = np.zeros((256,256)) Cheers, Alan Isaac From lseltzer at alumni.caltech.edu Mon Oct 13 01:21:52 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 01:21:52 -0400 (EDT) Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <48F26C4C.3030504@american.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> Message-ID: <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> > > Those statements are not demeaning; lighten up. STOP IT. JUST STOP IT. STOP IT RIGHT NOW. Is there a moderator on the list to put a stop to these kinds of statements? I deserve to be treated with respect. I deserve to have my questions treated with respect. I deserve to receive technical information without personal attacks. From matthew.brett at gmail.com Mon Oct 13 01:37:02 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 12 Oct 2008 22:37:02 -0700 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> Message-ID: <1e2af89e0810122237mf1efc9apaec1c3f338c414e9@mail.gmail.com> Friends, >> Those statements are not demeaning; lighten up. > STOP IT. JUST STOP IT. STOP IT RIGHT NOW. Let us not go to this place, honestly, there is no need. Let's go back to the technical problem again. Linda, did you have time to try Alan's example? Best, Matthew From lseltzer at alumni.caltech.edu Mon Oct 13 01:47:19 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 01:47:19 -0400 (EDT) Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <1e2af89e0810122237mf1efc9apaec1c3f338c414e9@mail.gmail.com> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <1e2af89e0810122237mf1efc9apaec1c3f338c414e9@mail.gmail.com> Message-ID: <1369.71.231.103.249.1223876839.squirrel@mail.alumni.caltech.edu> L. Brooks of M.I.T. sent a professional e-mail with a code fragment that has worked. > Friends, > >>> Those statements are not demeaning; lighten up. >> STOP IT. JUST STOP IT. STOP IT RIGHT NOW. > > Let us not go to this place, honestly, there is no need. Let's go > back to the technical problem again. > > Linda, did you have time to try Alan's example? 
> > Best,
> > Matthew

From pgmdevlist at gmail.com  Mon Oct 13 01:48:38 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Mon, 13 Oct 2008 01:48:38 -0400
Subject: [Numpy-discussion] Need **working** code example of 2-D arrays
Message-ID: <200810130148.38543.pgmdevlist@gmail.com>

Linda,
If you're familiar with Matlab syntax, you may find this link interesting:
http://www.scipy.org/NumPy_for_Matlab_Users

Here are another couple of useful links:
http://www.scipy.org/Tentative_NumPy_Tutorial
http://www.scipy.org/Numpy_Functions_by_Category

For your specific example, if you want to create a (256,128) array of
unsigned integers:

import numpy as np
a = np.zeros((256,128), dtype=np.uint32)

Note that if you intend to fill the array afterwards with other values, it
might be more efficient to create an 'empty' array instead of an array full
of zeros:

b = np.empty((256,128), dtype=np.uint32)

From stefan at sun.ac.za  Mon Oct 13 05:50:30 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Mon, 13 Oct 2008 11:50:30 +0200
Subject: [Numpy-discussion] Need **working** code example of 2-D arrays
Message-ID: <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com>

Linda,

2008/10/13 Linda Seltzer :
>> Those statements are not demeaning; lighten up.
> STOP IT. JUST STOP IT. STOP IT RIGHT NOW.
> Is there a moderator on the list to put a stop to these kinds of statements?
> I deserve to be treated with respect.
> I deserve to have my questions treated with respect.
> I deserve to receive technical information without personal attacks.

I think you'll be hard pressed to find a more friendly, open and relaxed
mailing list than this one. We're like having piña coladas while we type.
That said, keep in mind that you are asking professionals to donate *their*
valuable time to solve *your* problem. They gladly do so, but at the same
time they try to be efficient; so if you sometimes receive a curt answer,
it certainly wasn't meant to be rude. Many of us also sprinkle our
responses with a liberal dose of Tongue In Cheek :)

It looks like you received some good answers to your question, but let us
know if your problems persist and we'll help you sort it out.

Regards,
Stéfan

From rmay31 at gmail.com  Mon Oct 13 09:31:32 2008
From: rmay31 at gmail.com (Ryan May)
Date: Mon, 13 Oct 2008 08:31:32 -0500
Subject: [Numpy-discussion] Need **working** code example of 2-D arrays
Message-ID: <48F34DB4.8080603@gmail.com>

Stéfan van der Walt wrote:
> Linda,
>
> 2008/10/13 Linda Seltzer :
>>> Those statements are not demeaning; lighten up.
>> STOP IT. JUST STOP IT. STOP IT RIGHT NOW.
>> Is there a moderator on the list to put a stop to these kinds of statements?
>> I deserve to be treated with respect.
>> I deserve to have my questions treated with respect.
>> I deserve to receive technical information without personal attacks.
>
> I think you'll be hard pressed to find a more friendly, open and
> relaxed mailing list than this one. We're like having piña coladas
> while we type. That said, keep in mind that you are asking
> professionals to donate *their* valuable time to solve *your* problem.
> They gladly do so, but at the same time they try to be efficient; so
> if you sometimes receive a curt answer, it certainly wasn't meant to
> be rude. Many of us also sprinkle our responses with a liberal dose
> of Tongue In Cheek :)
>
> It looks like you received some good answers to your question, but let
> us know if your problems persist and we'll help you sort it out.

Well said.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

From Chris.Barker at noaa.gov  Mon Oct 13 12:21:50 2008
From: Chris.Barker at noaa.gov (Chris.Barker)
Date: Mon, 13 Oct 2008 09:21:50 -0700
Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy)
Message-ID: <48F3759E.6060804@noaa.gov>

Bill Baxter wrote:
> import numpy as npy

Bill, for what it's worth, I *think* this group has reached a consensus to
use:

import numpy as np

We all have different tastes for how we might want to spell it, but the
more consistent we are, the easier it will be for newbies.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Mon Oct 13 12:35:49 2008 From: Chris.Barker at noaa.gov (Chris.Barker) Date: Mon, 13 Oct 2008 09:35:49 -0700 Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy) In-Reply-To: <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> Message-ID: <48F378E5.5070307@noaa.gov> Linda Seltzer wrote: > I would appreciate it if someone could answer my question without > referring to subjects such as APIs and interfaces, since I am only > concerned with a mathematical application at this time. caution: this is a bit rude -- that was an excellent and informative answer to your not-very-clear question. No matter how you slice it, you're going to need to learn a bit about computer programming in general, and python in particular, in order to be productive with numpy. > I wish that tutorials provided real world examples. They certainly do -- just not the one you happen to be looking for. It's not good form to criticize folks work that they have generously donated. > I would appreciate it if someone could give me the actual statements > needed to define and initialize a 2-D array of size NxN, where N can be > any large number, There are a few ways to initialize numpy arrays: import numpy as np a = np.zeros((M,N)) a = np.ones ((M,N)) a = np.empty((M,N), dtype=np.float) > In Matlab, this is done as a = zeros(256,256). If you are familiar with Matlab, you'll want to take a look at the Wiki pages that describe the similarities and differences between numpy and Matlab: http://www.scipy.org/NumPy_for_Matlab_Users numpy is more complex, but also more powerful than Matlab -- it will take a bit of learning, but it's worth it. Also, read some of the intros to python itself -- you'll need to understand importing and name spaces. a couple quick examples: 1) numpy has many different data types. In the examples above, you will get double precision floats by default (like Matlab), but you can also get other data types (with your image examples, you'll want that). For example, one way to store an RBG image: a = np.zeros((w,h,3), dtype=np.uint8) that is, a width x height x 3 array of 8bit unsigned integers. 2) arrays and matrices are different. 3) numpy provides n-d matrices, not just 2-d 4) importing and name spaces. > If I try this in python, > it won't let the program overwrite the zeros. if something doesn't work as expected, always post your code, exactly as you tested it, so we can tell you what's wrong. Also, post specific questions -- you first question was something like "can I work with arrays", which is quite different than this one: "how do I create an array of nXn size full of zeros?" -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Oct 13 13:35:23 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 13 Oct 2008 13:35:23 -0400 Subject: [Numpy-discussion] Priority rules between numpy scalars and 0d arrays Message-ID: <200810131335.24577.pgmdevlist@gmail.com> All, Sorry to bring back this subject, but I still haven't got any proper answers: * What are the priority rules between numpy scalars and 0d arrays ? When multiplying a numpy scalar by a 0d array, shouldn't the __mul__ or __rmul__ methods of the array be called ? Should the result be a numpy scalar, or a 0d array (possibly recasted to the higher dtype) ? The problem occurs with numpy.ma.masked, defined as a 0d, np.float64 MaskedArray, which has the __mul__ and __rmul__ of a MaskedArray. np.float(1)*ma.masked gives ma.masked, as it should np.float(64)* ma.masked gives 0, when ma.masked should have been obtained: that leads me to think that ma.masked.__rmul__ isn't called. Why ? Are 0d arrays that special beasts ? From oliphant at enthought.com Mon Oct 13 14:04:33 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 13 Oct 2008 13:04:33 -0500 Subject: [Numpy-discussion] Priority rules between numpy scalars and 0d arrays In-Reply-To: <200810131335.24577.pgmdevlist@gmail.com> References: <200810131335.24577.pgmdevlist@gmail.com> Message-ID: <48F38DB1.40508@enthought.com> Pierre GM wrote: > All, > Sorry to bring back this subject, but I still haven't got any proper answers: > > * What are the priority rules between numpy scalars and 0d arrays ? > There aren't really any specified. However, there is behavior that emerges from what is specified. The problem is that there has never been a formal "resolution" (that I recall) of when should something be returned as a 0-d array and when it should be returned as a scalar. There is rather an informal implementation of what actually happens. Their are some "rules of thumb" that have emerged (like array-operations --- e.g. reshaping --- should return 0-d arrays and not scalars). The other issue is that there is the rule that when scalars and arrays mix, the "data-type" of the array determines the result, but there aren't fixed rules about what the "sub-type" should be. > The problem occurs with numpy.ma.masked, defined as a 0d, np.float64 > MaskedArray, which has the __mul__ and __rmul__ of a MaskedArray. > > np.float(1)*ma.masked gives ma.masked, as it should > np.float(64)* ma.masked gives 0, when ma.masked should have been obtained: > that leads me to think that ma.masked.__rmul__ isn't called. Why ? Are 0d > arrays that special beasts ? > Could you post code to describe what you mean? np.float(64) should be the same type as np.float(1) so I don't get what you are saying exactly. I think the issue is that numpy scalars are currently wrapped into 0-d arrays for all math and so the 'priority' issue might really an issue between numpy arrays and masked arrays. 
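To make the scalar/0-d distinction concrete, a short session (illustrative
only, not from the thread):

    import numpy as np
    z = np.array(2.0)          # a 0-d array
    s = np.float64(2.0)        # a numpy scalar
    print z.shape, isinstance(z, np.ndarray)   # () True
    print isinstance(s, np.ndarray)            # False
    print type(z[()])          # indexing a 0-d array with () yields the scalar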
-Travis From lseltzer at alumni.caltech.edu Mon Oct 13 14:24:29 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 14:24:29 -0400 (EDT) Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <48F34DB4.8080603@gmail.com> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com> <48F34DB4.8080603@gmail.com> Message-ID: <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> Where is the moderator? Please get these condescending, demeaning personal comments off of this list. I asked technical question. Now please send technical information only. > St?fan van der Walt wrote: >> I think you'll be hard pressed to find a more friendly, open and >> relaxed mailing list than this one. We're like having pi?a coladas >> while we type. That said, keep in mind that you are asking >> professionals to donate *their* valuable time to solve *your* problem. >> They gladly do so, but at the same time they try to be efficient; so >> if you sometimes receive a curt answer, it certainly wasn't meant to >> be rude. Many of us also sprinkle our responses with a liberal dose >> of Tongue In Cheek :) >> >> It looks like you received some good answers to your question, but let >> us know if your problems persist and we'll help you sort it out. > > Well said. > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From lseltzer at alumni.caltech.edu Mon Oct 13 14:36:47 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 14:36:47 -0400 (EDT) Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy) In-Reply-To: <48F378E5.5070307@noaa.gov> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <48F378E5.5070307@noaa.gov> Message-ID: <1288.71.231.103.249.1223923007.squirrel@mail.alumni.caltech.edu> Christopher Barker wrote: > No matter how you slice it, you're going to need to learn a bit about > computer programming in general, and python in particular, in order to > be productive with numpy. WHERE IS THE MODERATOR? I deserve not to be insulted in front of the professional community with personal slurs such as this. My computer programming background includes thorough training in C at Bell Labs in the early days, and experience developing more than 65,000 lines of code. My software in C was used in the AT&T telephone network, the Sprint network operations system, and the Silicon Graphics workstation. My assembly language DSP software is running in the Coast Guard communications system that coordinated the ship locations during Katrina. I got a large memory physical chemistry simulation to run at Princeton University, which enabled a chemistry lab to obtain results a year ahead of other universities. 
I have a long record of doing accurate, high-quality work on algorithms and
software, and I would appreciate it if persons on this list would provide
technical information only and stop making snide, superior personal comments
and slurs about a person's knowledge, background or ability. Also, I do not
appreciate personal comments alleging that I am being overly sensitive.
Where is the moderator during all of this? I asked professional, technical
questions and I expect professional, technical answers.

From pgmdevlist at gmail.com  Mon Oct 13 14:30:29 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Mon, 13 Oct 2008 14:30:29 -0400
Subject: [Numpy-discussion] Priority rules between numpy scalars and 0d arrays
Message-ID: <200810131430.30836.pgmdevlist@gmail.com>

Travis,

> The problem is that there has never been a formal "resolution" (that I
> recall) of when should something be returned as a 0-d array and when it
> should be returned as a scalar. There is rather an informal
> implementation of what actually happens.

Ah. It might be worth putting the current informal rules or rules of thumb
black-on-white (or green-on-black) somewhere.

> The other issue is that there is the rule that when scalars and arrays
> mix, the "data-type" of the array determines the result, but there
> aren't fixed rules about what the "sub-type" should be.

I would expect something like "return a scalar unless specified otherwise
by a subclass".

> Could you post code to describe what you mean?

In MaskedArray, we check whether the output of an operation is 0d: if it is
and the result is masked, then ma.masked is output; if the result is 0d
without a mask, a numpy scalar is output. ma.masked is defined as
MaskedArray(0, mask=True, dtype=np.float).

First, let's check the left multiplication:

>>> ma.masked * 1
masked_array(data = --, mask = True, fill_value=1e+20)
>>> ma.masked * np.float64(1)
masked_array(data = --, mask = True, fill_value=1e+20)
>>> ma.masked * np.float128(1)
masked_array(data = --, mask = True, fill_value=1e+20)

Everything works as planned. Now, for the right multiplication:

>>> 1.*ma.masked
masked_array(data = --, mask = True, fill_value=1e+20)
>>> np.float64(1)*ma.masked
0.0
>>> np.float128(1)*ma.masked
0.0

And that's where the problem is. It looks like ma.masked.__rmul__ or
ma.masked.__mul__ are *NOT* called in the last two cases, when I expected
they would be. But if we have a 0d array:

>>> np.array(1., dtype=np.float128)*ma.masked
masked_array(data = --, mask = True, fill_value=1e+20)

> I think the issue is that numpy scalars are currently wrapped into 0-d
> arrays for all math and so the 'priority' issue might really an issue
> between numpy arrays and masked arrays.

I don't think it's a problem of __array_priority__. MaskedArrays currently
have an __array_priority__ of 15, and switching to 1e99 or even np.inf
doesn't change anything. It looks like the dtype is checked first, which
dictates which method is being called (ndarray.__mul__ instead of
MaskedArray.__rmul__). What surprises me also is that numpy scalars are
supposed to have a very low priority (negative).

In short, Travis, could you explain to me what's happening?
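One way to poke at the dispatch question Pierre raises is with a tiny
throwaway ndarray subclass (hypothetical, not part of numpy.ma); whether
the last line ever reaches __rmul__ is precisely the behavior under
discussion:

    import numpy as np

    class Tagged(np.ndarray):
        # minimal subclass used only to observe which method gets called
        __array_priority__ = 15.0
        def __rmul__(self, other):
            print 'Tagged.__rmul__ called'
            return np.ndarray.__rmul__(self, other)

    t = np.zeros(()).view(Tagged)   # a 0-d instance of the subclass
    1.0 * t                         # Python float defers to the subclass
    np.float64(1.0) * t             # does the numpy scalar defer as well?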
From dalke at dalkescientific.com Mon Oct 13 15:11:09 2008 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 13 Oct 2008 21:11:09 +0200 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> Message-ID: On Oct 13, 2008, at 7:21 AM, Linda Seltzer wrote: > Is there a moderator on the list to put a stop to these kinds of > statements? No. Andrew dalke at dalkescientific.com From aisaac at american.edu Mon Oct 13 15:29:00 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 13 Oct 2008 15:29:00 -0400 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com> <48F34DB4.8080603@gmail.com> <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> Message-ID: <48F3A17C.3060602@american.edu> Linda Seltzer wrote: > Where is the moderator? Please get these condescending, demeaning personal > comments off of this list. I asked technical question. Now please send > technical information only. The problem is, you did not just ask for technical information. You also accused people of being condescending and demeaning. But nobody was condescending or demeaning. As several people **politely** explained to you, you are wrong about that. If you stop making such accusations, you will stop receiving such corrections. There is no reason that list members should allow your accusations to go unchallenged. Stick to technical inquiries *only*, and you will get responses more to your taste. Cheers, Alan Isaac From robert.kern at gmail.com Mon Oct 13 15:48:55 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 13 Oct 2008 14:48:55 -0500 Subject: [Numpy-discussion] Using 2-D arrays in Numeric Python (numpy) In-Reply-To: <1288.71.231.103.249.1223923007.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <48F378E5.5070307@noaa.gov> <1288.71.231.103.249.1223923007.squirrel@mail.alumni.caltech.edu> Message-ID: <3d375d730810131248p273d2071qa99035cdfdc813@mail.gmail.com> On Mon, Oct 13, 2008 at 13:36, Linda Seltzer wrote: > Christopher Barker wrote: >> No matter how you slice it, you're going to need to learn a bit about >> computer programming in general, and python in particular, in order to >> be productive with numpy. > WHERE IS THE MODERATOR? There is no moderator. 
I may be the closest to such a thing, but I have no intention of silencing anyone at this point in time. > I deserve not to be insulted in front of the professional community with > personal slurs such as this. These are not such. They may be blunt or assume too much, but the proper response is a plain correction, not a call to a moderator to silence someone. You have been initially treated with respect, and we tried to answer your technical questions the best we could, but our patience is rapidly thinning. You took purely technical responses as personal insults where nothing of the kind was intended. And now we are lost in the non-technical weeds of accusation and counter-accusation. Whether you believe it or not, this mailing list is about as good as it gets when it comes to lists for open source software (and you have received a far gentler treatment given your behavior than you would have elsewhere), but the medium of email has its limitations. It is not the subtlest form of communication. Messages may seem blunter than you might think is appropriate. That's just the way email is, so you have to give the other person the benefit of the doubt and assume that they did not intend to offend you. By participating in this list, we have all implicitly agreed to this rule. If you can not or will not do the same, then it seems clear to me that further participation on the list on your part will not serve your interests, much less ours. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jdh2358 at gmail.com Mon Oct 13 16:04:06 2008 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 13 Oct 2008 15:04:06 -0500 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <48F3A17C.3060602@american.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com> <48F34DB4.8080603@gmail.com> <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> <48F3A17C.3060602@american.edu> Message-ID: <88e473830810131304p1172e997k1c2d3899c2653ee2@mail.gmail.com> On Mon, Oct 13, 2008 at 2:29 PM, Alan G Isaac wrote: > The problem is, you did not just ask > for technical information. You also > accused people of being condescending > and demeaning. But nobody was > condescending or demeaning. As several > people **politely** explained to you, > you are wrong about that. Here is a simple example of loading some 2D data into an array and manipulating the contents import numpy as np # load a 2D array of integers X = np.loadtxt('somefile.txt').astype(int) print X.shape # X is a 2D array # display the contents of X as a string print '\n'.join([''.join([chr(c) for c in row]) for row in X]) The input file "somefile.txt" is attached JDH -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: somefile.txt URL: From lseltzer at alumni.caltech.edu Mon Oct 13 17:15:10 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 17:15:10 -0400 (EDT) Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <48F3A17C.3060602@american.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <3d375d730810102102w41cd843eua16dbe0c54bf0a14@mail.gmail.com> <1653.71.231.103.249.1223702173.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com> <48F34DB4.8080603@gmail.com> <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> <48F3A17C.3060602@american.edu> Message-ID: <1091.71.231.103.249.1223932510.squirrel@mail.alumni.caltech.edu> Your reply is inappropriate. it is not a "correction." A request was made to stop posting mail that did not concern math and you have continued with your put downs. Stop it. Just stop it. Stop it right now. > Linda Seltzer wrote: >> Where is the moderator? Please get these condescending, demeaning >> personal >> comments off of this list. I asked technical question. Now please send >> technical information only. > > The problem is, you did not just ask > for technical information. You also > accused people of being condescending > and demeaning. But nobody was > condescending or demeaning. As several > people **politely** explained to you, > you are wrong about that. > > If you stop making such accusations, > you will stop receiving such corrections. > There is no reason that list members > should allow your accusations to go > unchallenged. > > Stick to technical inquiries *only*, and you > will get responses more to your taste. > > Cheers, > Alan Isaac > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From lseltzer at alumni.caltech.edu Mon Oct 13 17:16:47 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 17:16:47 -0400 (EDT) Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <88e473830810131304p1172e997k1c2d3899c2653ee2@mail.gmail.com> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com> <48F34DB4.8080603@gmail.com> <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> <48F3A17C.3060602@american.edu> <88e473830810131304p1172e997k1c2d3899c2653ee2@mail.gmail.com> Message-ID: <1095.71.231.103.249.1223932607.squirrel@mail.alumni.caltech.edu> Alan, Stop tuyrning this around. Stop referring to my request as an accusation and stop referring to your put-downs as a "correction." > On Mon, Oct 13, 2008 at 2:29 PM, Alan G Isaac wrote: > >> The problem is, you did not just ask >> for technical information. You also >> accused people of being condescending >> and demeaning. But nobody was >> condescending or demeaning. As several >> people **politely** explained to you, >> you are wrong about that. 
> From daranrife at gmail.com Mon Oct 13 17:24:51 2008 From: daranrife at gmail.com (Daran Rife) Date: Mon, 13 Oct 2008 15:24:51 -0600 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays Message-ID: Ordinarily I avoid becoming involved in such acrimony, but I take this single opportunity to state clearly that I find Linda Seltzer's behavior utterly rude and childish. Having been a "member" of this mailing list for over 6 years, I take exception to the pointless ranting and vitriolic comments of people, like Linda, who are using NumPy/SciPy, and demand immediate attention to their special problem, while hurling false accusations, and creating hard feelings among the comm- unity. NumPy and SciPy are software packages of the highest caliber that have been produced, supported, and nurtured by a community of -volunteers- at no cost to the users. In siutations like this, it may be helpful to review proper etiquette and the fun- damental purpose for mailing/support lists of open source tools. Two docum- ents that help reaffirm what this list is all about can be found below. Linda, I highly recommend you read the sections titled "Dealing with rudeness" and "On not acting like a loser" of the following document: http://www.catb.org/~esr/faqs/smart-questions.html http://www.catb.org/~esr/faqs/smart-questions.html#keepcool http://www.catb.org/~esr/faqs/smart-questions.html#not_losing You might also benefit from seeing things from the perspective of developers, who again, volunteer their precious time to developing and supporting open source software, so that you can do your work more efficiently and smartly. Give this a read over a cup of tea: http://math-atlas.sourceforge.net/faq.html#utone Perhaps when you calm down, and allow yourself to reflect upon this experience, you will realize how badly you've behaved to a group of professional, friendly people whose only desire is to support and nuture a great tool. I will say nothing more about this topic, and under no circumstances will I reply to future messages from Linda Seltzer. Daran -------------- next part -------------- An HTML attachment was scrubbed... URL: From lseltzer at alumni.caltech.edu Mon Oct 13 17:27:47 2008 From: lseltzer at alumni.caltech.edu (Linda Seltzer) Date: Mon, 13 Oct 2008 17:27:47 -0400 (EDT) Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: References: Message-ID: <1110.71.231.103.249.1223933267.squirrel@mail.alumni.caltech.edu> This ia another example of non-scientific attacking that does not belong on the list. As I mentioned earlier: Please keep all non-mathematical or non-computer science remarks off the list. > Ordinarily I avoid becoming involved in such acrimony, but I take this > single > opportunity to state clearly that I find Linda Seltzer's behavior utterly > rude > and childish. > > Having been a "member" of this mailing list for over 6 years, I take > exception > to the pointless ranting and vitriolic comments of people, like Linda, who > are > using NumPy/SciPy, and demand immediate attention to their special > problem, > while hurling false accusations, and creating hard feelings among the > comm- > unity. NumPy and SciPy are software packages of the highest caliber that > have > been produced, supported, and nurtured by a community of -volunteers- at > no > cost to the users. > > In siutations like this, it may be helpful to review proper etiquette and > the fun- > damental purpose for mailing/support lists of open source tools. 
Two > docum- > ents that help reaffirm what this list is all about can be found below. > > Linda, I highly recommend you read the sections titled "Dealing with > rudeness" > and "On not acting like a loser" of the following document: > > http://www.catb.org/~esr/faqs/smart-questions.html > > http://www.catb.org/~esr/faqs/smart-questions.html#keepcool > http://www.catb.org/~esr/faqs/smart-questions.html#not_losing > > You might also benefit from seeing things from the perspective of > developers, > who again, volunteer their precious time to developing and supporting open > source > software, so that you can do your work more efficiently and smartly. Give > this a > read over a cup of tea: > > http://math-atlas.sourceforge.net/faq.html#utone > > Perhaps when you calm down, and allow yourself to reflect upon this > experience, > you will realize how badly you've behaved to a group of professional, > friendly > people whose only desire is to support and nuture a great tool. > > I will say nothing more about this topic, and under no circumstances will > I > reply to > future messages from Linda Seltzer. > > > Daran > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From pwang at enthought.com Mon Oct 13 17:30:16 2008 From: pwang at enthought.com (Peter Wang) Date: Mon, 13 Oct 2008 16:30:16 -0500 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: <1095.71.231.103.249.1223932607.squirrel@mail.alumni.caltech.edu> References: <1398.71.231.103.249.1223692957.squirrel@mail.alumni.caltech.edu> <1717.71.231.103.249.1223703534.squirrel@mail.alumni.caltech.edu> <1405.71.231.103.249.1223793543.squirrel@mail.alumni.caltech.edu> <48F26C4C.3030504@american.edu> <1253.71.231.103.249.1223875312.squirrel@mail.alumni.caltech.edu> <9457e7c80810130250o40eb3eaai59054ed5750a89b0@mail.gmail.com> <48F34DB4.8080603@gmail.com> <1268.71.231.103.249.1223922269.squirrel@mail.alumni.caltech.edu> <48F3A17C.3060602@american.edu> <88e473830810131304p1172e997k1c2d3899c2653ee2@mail.gmail.com> <1095.71.231.103.249.1223932607.squirrel@mail.alumni.caltech.edu> Message-ID: <49790D98-C518-4EFE-8E0A-9F03518C831C@enthought.com> On Oct 13, 2008, at 4:16 PM, Linda Seltzer wrote: > Alan, Stop tuyrning this around. Stop referring to my request as an > accusation and stop referring to your put-downs as a "correction." Linda, from what I can tell, the tone in this discussion thread changed from the "professional", "technical" mode with this statement: > Please, no demeaning statements like "you forgot a parenthesis" or > "you > were using someone else's code" - just the lines of code for a file > that > actually *works.* The folks on this list are a very friendly and helpful bunch, and I think some of them did not take well to your implying that they were demeaning to you. You might not have meant it to be accusatory, but it certainly seems to have been interpreted that way. Further appeals to a non-existent "moderator" and typing in ALL CAPS only fanned the flames, and so I think perhaps it would be good if folks just stepped back from the keyboard a bit and took some deep breaths. Also, Linda, I would like to stress that although this mailing list is an open, unmoderated forum for technical discussion, even the veterans don't go about making proclamations about what people should and should not post... 
I think if you stick to asking technical questions, people will respond in kind. -Peter From robert.kern at gmail.com Mon Oct 13 17:43:43 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 13 Oct 2008 16:43:43 -0500 Subject: [Numpy-discussion] End of Discussion (was Re: Need **working** code example of 2-D arrays) Message-ID: <3d375d730810131443u5cb4c2cdvc1926d086a6aee25@mail.gmail.com> Linda has informed me that she has left the mailing list. Please consider this and all related threads closed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthew.brett at gmail.com Mon Oct 13 18:28:22 2008 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 13 Oct 2008 15:28:22 -0700 Subject: [Numpy-discussion] Need **working** code example of 2-D arrays In-Reply-To: References: Message-ID: <1e2af89e0810131528n323f9feaw6a561a473b5ff407@mail.gmail.com> I know, I know, last one... > http://www.catb.org/~esr/faqs/smart-questions.html I had forgotten this wise quote from the smart questions FAQ: "Be gentle. Problem-related stress can make people seem rude or stupid even when they're not." Best, Matthew From myeates at jpl.nasa.gov Mon Oct 13 19:28:28 2008 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Mon, 13 Oct 2008 16:28:28 -0700 Subject: [Numpy-discussion] how to tell if a point is inside a polygon Message-ID: <48F3D99C.1050301@jpl.nasa.gov> Is there a routine in scipy for telling whether a point is inside a convex 4 sided polygon? Mathew From amcmorl at gmail.com Mon Oct 13 19:46:05 2008 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 13 Oct 2008 19:46:05 -0400 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <48F3D99C.1050301@jpl.nasa.gov> References: <48F3D99C.1050301@jpl.nasa.gov> Message-ID: 2008/10/13 Mathew Yeates > Is there a routine in scipy for telling whether a point is inside a > convex 4 sided polygon? Not specifically in scipy, as far as I know, but there are several supplementary packages that provide this functionality, including matplotlib: http://projects.scipy.org/pipermail/scipy-user/2008-February/015418.html http://groups.google.com/group/Numpy-discussion/browse_thread/thread/2fca22bd29546ff2 Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Mon Oct 13 19:46:01 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 13 Oct 2008 19:46:01 -0400 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: References: <48F3D99C.1050301@jpl.nasa.gov> Message-ID: <200810131946.02141.pgmdevlist@gmail.com> > 2008/10/13 Mathew Yeates > > > Is there a routine in scipy for telling whether a point is inside a > > convex 4 sided polygon? 
Mathew,
You could use OGR (www.gdal.org)

Example
-------------
import osgeo.ogr as ogr

vert = [(0,0),(0,1),(1,1),(1,0)]
listvert = [" %s %s" % (x,y) for (x,y) in vert]
listvert.append(listvert[0])
geo = ogr.CreateGeometryFromWkt("POLYGON ((%s))" % ','.join(listvert))

querypoint = (0.5, 0.5)
qpt = ogr.CreateGeometryFromWkt("POINT(%s %s)" % querypoint)
assert geo.Contains(qpt)

querypoint = (0.5, 1.5)
qpt = ogr.CreateGeometryFromWkt("POINT(%s %s)" % querypoint)
assert not geo.Contains(qpt)

From wbaxter at gmail.com Mon Oct 13 21:56:48 2008 From: wbaxter at gmail.com (Bill Baxter) Date: Tue, 14 Oct 2008 10:56:48 +0900 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <200810131946.02141.pgmdevlist@gmail.com> References: <48F3D99C.1050301@jpl.nasa.gov> <200810131946.02141.pgmdevlist@gmail.com> Message-ID:

On Tue, Oct 14, 2008 at 8:46 AM, Pierre GM wrote:
>> 2008/10/13 Mathew Yeates
>>
>> > Is there a routine in scipy for telling whether a point is inside a
>> > convex 4 sided polygon?
>
> Mathew,
> You could use OGR (www.gdal.org)
>
> Example
> -------------
> import osgeo.ogr as ogr
>
> vert = [(0,0),(0,1),(1,1),(1,0)]
> listvert = [" %s %s" % (x,y) for (x,y) in vert]
> listvert.append(listvert[0])
> geo = ogr.CreateGeometryFromWkt("POLYGON ((%s))" % ','.join(listvert))
>
> querypoint = (0.5, 0.5)
> qpt = ogr.CreateGeometryFromWkt("POINT(%s %s)" % querypoint)
>
> assert geo.Contains(qpt)
>
> querypoint = (0.5, 1.5)
> qpt = ogr.CreateGeometryFromWkt("POINT(%s %s)" % querypoint)
>
> assert not geo.Contains(qpt)

If all you really need is a point in convex polygon test, it's probably a little too trivial to be worth dragging in a dependency on anything. But if you find yourself needing more geometric tests then it might be a good idea to get a good geom lib now. As for this test, all you need to do is check that the point is to the left of each of the edges, taken counter-clockwise.

Here's some code I wrote a while back that does a more general even-odd test, but should work for your case too:

import numpy as npy

def inside_shape(p, verts, edges=None):
    """Test whether the point p is inside the specified shape.

    The shape is specified by 'verts' and 'edges'

    Arguments:
    p     - the 2d point
    verts - (N,2) array of points
    edges - (N,2) array of vert indices indicating edges
            If edges is None then assumed to be in order.  I.e.
            [[0,1], [1,2], [2,3] ... [N-1,0]]

    Returns:
    - True/False based on result of in/out test.

    Uses the 'ray to infinity' even-odd test.
    Let the ray be the horizontal ray starting at p and going to +inf in x.
    """
    verts = npy.asarray(verts, dtype=float)  # floats, so the division below doesn't truncate
    if edges is None:
        N = verts.shape[0]
        edges = npy.column_stack([npy.c_[0:N], npy.c_[1:N, 0]])
    inside = False
    x, y = p[0], p[1]
    for e in edges:
        v0, v1 = verts[e[0]], verts[e[1]]
        # Check if both verts are to the left of the ray
        if v0[0] < x and v1[0] < x:
            continue
        # Check if both verts are above or below the ray
        if (v0[1] < y and v1[1] < y) or (v0[1] > y and v1[1] > y):
            continue
        # Check for horizontal line - another horz line can't intersect it
        if v0[1] == v1[1]:
            continue
        # Compute x intersection value
        xisect = v0[0] + (v1[0] - v0[0]) * ((y - v0[1]) / (v1[1] - v0[1]))
        if xisect >= x:
            inside = not inside
    return inside

License: public domain.
--bb
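For the convex-only case Bill describes -- the point must lie on or to the left of every edge when the vertices are taken counter-clockwise -- a minimal NumPy sketch follows. It is added here purely for illustration; the function name and the counter-clockwise assumption are mine, not code from the thread:

import numpy as np

def point_in_convex_polygon(p, verts):
    """True if p is inside the convex polygon verts (CCW vertex order)."""
    verts = np.asarray(verts, dtype=float)
    edges = np.roll(verts, -1, axis=0) - verts   # edge vectors v[i] -> v[i+1]
    to_p = np.asarray(p, dtype=float) - verts    # vectors v[i] -> p
    # z-component of the 2D cross product; >= 0 means p is on/left of the edge
    cross = edges[:, 0] * to_p[:, 1] - edges[:, 1] * to_p[:, 0]
    return bool(np.all(cross >= 0))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # counter-clockwise
assert point_in_convex_polygon((0.5, 0.5), square)
assert not point_in_convex_polygon((1.5, 0.5), square)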
From tjhnson at gmail.com Tue Oct 14 00:40:15 2008 From: tjhnson at gmail.com (T J) Date: Mon, 13 Oct 2008 21:40:15 -0700 Subject: [Numpy-discussion] SWIG, typemaps, 2D argout arrays Message-ID:

Hi,

I'm new to using SWIG and my reading of numpy_swig.pdf tells me that the following typemap does not exist:

(int* ARGOUT_ARRAY2, int DIM1, int DIM2)

What is the recommended way to output a 2D array? It seems like I should use:

(int* ARGOUT_ARRAY1, int DIM1)

and then provide a python function which reshapes the 1D array? Is it correct that there will be insignificant performance disadvantages to this? Also, is there any way to do this in an automated fashion? My first thought is that I'd need to create this function outside of the python module that SWIG creates.

Thanks!

From stefan at sun.ac.za Tue Oct 14 01:56:20 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 14 Oct 2008 07:56:20 +0200 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <48F3D99C.1050301@jpl.nasa.gov> References: <48F3D99C.1050301@jpl.nasa.gov> Message-ID: <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com>

Hi Mathew

Here is an implementation in Python, ctypes and in weave:

http://mentat.za.net/source/pnpoly.tar.bz2

Regards
Stéfan

From wbaxter at gmail.com Tue Oct 14 02:10:50 2008 From: wbaxter at gmail.com (Bill Baxter) Date: Tue, 14 Oct 2008 15:10:50 +0900 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> Message-ID:

On Tue, Oct 14, 2008 at 2:56 PM, Stéfan van der Walt wrote: > Hi Mathew > > Here is an implementation in Python, ctypes and in weave: > > http://mentat.za.net/source/pnpoly.tar.bz2 >

Thanks! Looks better than what I wrote.
--bb

From haase at msg.ucsf.edu Tue Oct 14 04:02:36 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue, 14 Oct 2008 10:02:36 +0200 Subject: [Numpy-discussion] SWIG, typemaps, 2D argout arrays In-Reply-To: References: Message-ID:

On Tue, Oct 14, 2008 at 6:40 AM, T J wrote: > Hi, > > I'm new to using SWIG and my reading of numpy_swig.pdf tells me that > the following typemap does not exist: > > (int* ARGOUT_ARRAY2, int DIM1, int DIM2) > > What is the recommended way to output a 2D array? It seems like I should use: > > (int* ARGOUT_ARRAY1, int DIM1) > > and then provide a python function which reshapes the 1D array? Is it > correct that there will be insignificant performance disadvantages to > this? Also, is there any way to do this in an automated fashion? My > first thought is that I'd need to create this function outside of the > python module that SWIG creates. > > Thanks!

Hi tjhnson,

I have been using SWIG for many years with numpy and before with numarray; but I still feel somewhat "insecure"... anyhow:

a) There should be no performance difference between 1d and 2d -- it just sets a different shape. (right?)

b) I don't want to use Python / numpy API code in the C functions I'm wrapping - so I limit myself to "input" arrays! Since array memory does not distinguish between input or output (assuming there is no copying needed because of alignment or contiguity issues) the only implication of this is that I have to allocate the array outside the C functions. I don't expect any noticeable performance hit from this; the only caveat is that I have to know the "output" array SIZE before I go into the C function. The advantage of this is that I don't have to worry about using the correct malloc (or rather equivalent numpy API) function and/or garbage collection (object destructor) ((there was a post by Travis Oliphant (Sep. 10: NumPy arrays that use memory allocated from other libraries or tools) where he mentioned a blog http://blog.enthought.com/?p=62 where it is discussed further)). An extra 2-line Python wrapper-function handles this input<->conversion transparently for me --- do you need to handle non-contiguous arrays?

HTH, I'm eager to learn here ;-)
- Sebastian Haase
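A minimal sketch of the reshape wrapper T J is describing (the module and function names are hypothetical, purely for illustration; it assumes a SWIG-wrapped function using the (int* ARGOUT_ARRAY1, int DIM1) typemap):

# _example is a hypothetical SWIG-generated module whose compute()
# was wrapped with (int* ARGOUT_ARRAY1, int DIM1).
import _example

def compute2d(n, m):
    """Return the result of _example.compute as an (n, m) array."""
    flat = _example.compute(n * m)  # SWIG hands back a 1-D array of length n*m
    return flat.reshape(n, m)       # reshape is cheap: same data, new shape

A wrapper like this can also live in the SWIG interface file itself, inside a %pythoncode block, so the generated module exposes the 2-D version directly -- the kind of automation Chris Barker points at later in the thread.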
From scotta_2002 at yahoo.com Tue Oct 14 09:12:59 2008 From: scotta_2002 at yahoo.com (Scott Askey) Date: Tue, 14 Oct 2008 06:12:59 -0700 (PDT) Subject: [Numpy-discussion] matlab ss2tf in python/symbols in linear In-Reply-To: Message-ID: <777975.58824.qm@web36507.mail.mud.yahoo.com>

The numpy poly functions worked to solve most of my problem. As soon as I realized that poly in numpy base and matlab perform the same function, I was able to efficiently transform my state space model (A B C D) into polynomial transfer functions.

H = num/den
num = (poly(A - dot(B, C)) - poly(A))
den = poly(A)

Symbolically I was able to solve

from sympy import *
Matrix(s*I - A + dot(B, C)).berkowitz()

however it was orders of magnitude slower.

V/R

Scott

--- On Fri, 10/10/08, numpy-discussion-request at scipy.org wrote: > From: numpy-discussion-request at scipy.org > Subject: Numpy-discussion Digest, Vol 25, Issue 26 > To: numpy-discussion at scipy.org > Date: Friday, October 10, 2008, 10:13 AM > Send Numpy-discussion mailing list submissions to > numpy-discussion at scipy.org > > Message: 2 > Date: Fri, 10 Oct 2008 04:02:21 -0700 (PDT) > From: Scott Askey > Subject: [Numpy-discussion] matlab ss2tf in python/symbols in linear > algebra/transfer functions > To: numpy-discussion at scipy.org > Message-ID: <893083.2737.qm at web36501.mail.mud.yahoo.com> > Content-Type: text/plain; charset=us-ascii > > > > Subject: matlab ss2tf in python/symbol in linear > > algebra/transfer functions > > To: numpy-discussion at scipy.org > > Date: Thursday, October 9, 2008, 5:36 PM > > Is there a clever way to do symbolic linear algebra with > > python? What is the proper tool for doing linear algebra > > with matrices containing symbols? > > > > V/R > > > > Scott > > ------------------------------ > > Message: 4 > Date: Fri, 10 Oct 2008 13:56:47 +0200 > From: Gael Varoquaux > Subject: Re: [Numpy-discussion] matlab ss2tf in python/symbols in > linear algebra/transfer functions > To: Scott Askey > Cc: numpy-discussion at scipy.org > Message-ID: <20081010115647.GA26097 at phare.normalesup.org> > Content-Type: text/plain; charset=iso-8859-1 > > On Fri, Oct 10, 2008 at 04:02:21AM -0700, Scott Askey wrote: > > > Subject: matlab ss2tf in python/symbol in linear algebra/transfer functions > > > To: numpy-discussion at scipy.org > > > Date: Thursday, October 9, 2008, 5:36 PM > > > Is there a clever way to do symbolic linear algebra with > > > python? What is the proper tool for doing linear algebra > > > with matrices containing symbols? > > Have a look at sympy: http://code.google.com/p/sympy/ > > Gaël
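Scott's identity holds for a single-input single-output system with D = 0: by the matrix determinant lemma, det(sI - A + BC) = det(sI - A) * (1 + C (sI - A)^-1 B), so den(s) = det(sI - A) and num(s) = det(sI - A + BC) - det(sI - A). A small self-contained check, with matrices made up for illustration (they are not from the thread):

import numpy as np

# Hypothetical SISO system with D = 0:  H(s) = C (sI - A)^-1 B
A = np.array([[0., 1.], [-2., -3.]])
B = np.array([[0.], [1.]])
C = np.array([[1., 0.]])

den = np.poly(A)                       # det(sI - A) = s^2 + 3s + 2
num = np.poly(A - np.dot(B, C)) - den  # here: 1, so H(s) = 1/(s^2 + 3s + 2)

assert np.allclose(den, [1., 3., 2.])
assert np.allclose(num, [0., 0., 1.])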
From scotta_2002 at yahoo.com Tue Oct 14 09:22:10 2008 From: scotta_2002 at yahoo.com (Scott Askey) Date: Tue, 14 Oct 2008 06:22:10 -0700 (PDT) Subject: [Numpy-discussion] Python equivalent to Matlab control systems/ LTI/state space model manipulation tools Message-ID: <240084.15289.qm@web36504.mail.mud.yahoo.com>

Where is a good place to look for python functions similar to Matlab's ss, tf, ss2tf as used for transforming a linear time invariant (LTI) model into a state space model.

V/R

Scott

http://www.mathworks.com/access/helpdesk/help/toolbox/control/

From strawman at astraw.com Tue Oct 14 09:40:38 2008 From: strawman at astraw.com (Andrew Straw) Date: Tue, 14 Oct 2008 06:40:38 -0700 Subject: [Numpy-discussion] Python equivalent to Matlab control systems/ LTI/state space model manipulation tools In-Reply-To: <240084.15289.qm@web36504.mail.mud.yahoo.com> References: <240084.15289.qm@web36504.mail.mud.yahoo.com> Message-ID: <48F4A156.30105@astraw.com>

Ryan Krauss has been working on something, although I have not had a chance to try it. http://www.siue.edu/~rkrauss/python_intro.html

Scott Askey wrote: > Where is a good place to look for python functions similar to Matlab's > ss, tf, ss2tf as used for transforming a linear time invariant (LTI) model into a state space model. > > V/R > > Scott > > http://www.mathworks.com/access/helpdesk/help/toolbox/control/ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion

From stefan at sun.ac.za Tue Oct 14 09:45:37 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 14 Oct 2008 15:45:37 +0200 Subject: [Numpy-discussion] Python equivalent to Matlab control systems/ LTI/state space model manipulation tools In-Reply-To: <240084.15289.qm@web36504.mail.mud.yahoo.com> References: <240084.15289.qm@web36504.mail.mud.yahoo.com> Message-ID: <9457e7c80810140645x604cb778p98e4fcf93b4bd8ea@mail.gmail.com>

Hi Scott

2008/10/14 Scott Askey : > Where is a good place to look for python functions similar to Matlab's > ss, tf, ss2tf as used for transforming a linear time invariant (LTI) model into a state space model.

Look under scipy.signal. These utilities need a lot of work, so if you can contribute anything back that would be greatly appreciated.

Regards
Stéfan
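A quick sketch of the scipy.signal equivalents Stéfan points to, reusing the illustrative system from the np.poly note above (this is a sketch, not code from the thread):

import numpy as np
from scipy import signal

A = np.array([[0., 1.], [-2., -3.]])
B = np.array([[0.], [1.]])
C = np.array([[1., 0.]])
D = np.array([[0.]])

num, den = signal.ss2tf(A, B, C, D)   # state space -> transfer function
sys = signal.lti(A, B, C, D)          # LTI object; signal.lti(num, den) also works

assert np.allclose(den, np.poly(A))   # denominator is det(sI - A)

signal.tf2ss goes in the reverse direction, from (num, den) back to state-space matrices.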
From ndbecker2 at gmail.com Tue Oct 14 10:44:12 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 14 Oct 2008 10:44:12 -0400 Subject: [Numpy-discussion] Apply a vector function to each row of a matrix References: <91cf711d0810090624s1c465992l6dd922cfaa7c67c7@mail.gmail.com> <91cf711d0810091001n6abf6fe8ie2cb145d8406e137@mail.gmail.com> <91cf711d0810100827u735c98d8od32ea85d671680f0@mail.gmail.com> Message-ID:

David Huard wrote: ... > I've edited the docstring at > http://sd-2116.dedibox.fr/pydocweb/doc/numpy.lib.shape_base.apply_along_axis/ > > Feel free to improve on it.

Thanks! That is better.

From peter_williams at agilent.com Tue Oct 14 12:12:30 2008 From: peter_williams at agilent.com (peter_williams at agilent.com) Date: Tue, 14 Oct 2008 10:12:30 -0600 Subject: [Numpy-discussion] basic install question Message-ID: <4E858FCB361FFB48B8E9816986850ACB0275E916@cos-us-mb03.cos.agilent.com>

Do I need ATLAS to install NumPy? Apologies if this is in the archives somewhere, but I can't find it, and I can't figure it out from http://www.scipy.org/Installing_SciPy/Linux. Clearly you need some kind of BLAS/LAPACK, but it's not clear if ATLAS is required, or is just one option of many.

Thanks,
Peter
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From robert.kern at gmail.com Tue Oct 14 12:32:33 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Oct 2008 11:32:33 -0500 Subject: [Numpy-discussion] basic install question In-Reply-To: <4E858FCB361FFB48B8E9816986850ACB0275E916@cos-us-mb03.cos.agilent.com> References: <4E858FCB361FFB48B8E9816986850ACB0275E916@cos-us-mb03.cos.agilent.com> Message-ID: <3d375d730810140932o578abcfdu2d57ebeac994fdf0@mail.gmail.com>

On Tue, Oct 14, 2008 at 11:12, wrote: > Do I need ATLAS to install NumPy?

No.

> Apologies if this is in the archives somewhere, but I can't find it, and I > can't figure it out from http://www.scipy.org/Installing_SciPy/Linux. > Clearly you need some kind of BLAS/LAPACK, but it's not clear if ATLAS is > required, or is just one option of many.

If you do not have ATLAS or a similarly accelerated BLAS/LAPACK, numpy will just use its own implementations; i.e. you do not need an external library at all unless you want to use one which is particularly fast.

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From tjhnson at gmail.com Tue Oct 14 13:51:43 2008 From: tjhnson at gmail.com (T J) Date: Tue, 14 Oct 2008 10:51:43 -0700 Subject: [Numpy-discussion] SWIG, typemaps, 2D argout arrays In-Reply-To: References: Message-ID:

On Tue, Oct 14, 2008 at 1:02 AM, Sebastian Haase wrote: > b) I don't want to use Python / numpy API code in the C functions I'm > wrapping - so I limit myself to "input" arrays! Since array memory > does not distinguish between input or output (assuming there is no > copying needed because of alignment or contiguity issues) the only > implication of this is that I have to allocate the array outside the C > functions.

To clarify, are you saying that you don't make any use of numpy.i? The functions I'm writing are not making any explicit use of the Python/numpy API, but I'm sure it shows up in the *_wrap.c file that SWIG creates.

> input<->conversion transparently for me --- do you need to handle > non-contiguous arrays?

Probably not.

> HTH, I'm eager to learn here ;-)

Thanks

From Chris.Barker at noaa.gov Tue Oct 14 14:11:29 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 14 Oct 2008 11:11:29 -0700 Subject: [Numpy-discussion] SWIG, typemaps, 2D argout arrays In-Reply-To: References: Message-ID: <48F4E0D1.6030709@noaa.gov>

T J wrote: > I'm new to using SWIG and my reading of numpy_swig.pdf tells me that > the following typemap does not exist: > > (int* ARGOUT_ARRAY2, int DIM1, int DIM2)

are you referring to this statement in the docs?

""" Note that we support DATA_TYPE* argout typemaps in 1D, but not 2D or 3D. This is because of a quirk with the SWIG typemap syntax and cannot be avoided. """

> What is the recommended way to output a 2D array?

I wonder if SWIG has been improved in that regard? It might be worth a post to the SWIG list. Otherwise, Bill Spotz is the guy to ask -- if he doesn't reply to this soon, he may not be monitoring this list -- you might try contacting him directly.

> It seems like I should use: > > (int* ARGOUT_ARRAY1, int DIM1) > > and then provide a python function which reshapes the 1D array?

That probably would be the easiest way to do it.
> Is it > correct that there will be insignificant performance disadvantages to > this?

minuscule performance hit -- re-shaping if you aren't changing the data is cheap.

> Also, is there any way to do this in an automated fashion?

yes -- my SWIG-foo is very rusty, but you can add python code in your *.i file, so you could write the function that does that conversion, put it there, and it would get added in by SWIG.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Tue Oct 14 14:14:44 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 14 Oct 2008 11:14:44 -0700 Subject: [Numpy-discussion] SWIG, typemaps, 2D argout arrays In-Reply-To: <48F4E0D1.6030709@noaa.gov> References: <48F4E0D1.6030709@noaa.gov> Message-ID: <48F4E194.2050907@noaa.gov>

>> It seems like I should use: >> >> (int* ARGOUT_ARRAY1, int DIM1) >> >> and then provide a python function which reshapes the 1D array?

Oh, another option is to use:

( DATA_TYPE* INPLACE_FARRAY2, int DIM1, int DIM2 )

and create the array in python first. This could be done with a little wrapper, if you want.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From haase at msg.ucsf.edu Tue Oct 14 15:58:49 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue, 14 Oct 2008 21:58:49 +0200 Subject: [Numpy-discussion] SWIG, typemaps, 2D argout arrays In-Reply-To: References: Message-ID:

On Tue, Oct 14, 2008 at 7:51 PM, T J wrote: > On Tue, Oct 14, 2008 at 1:02 AM, Sebastian Haase wrote: >> b) I don't want to use Python / numpy API code in the C functions I'm >> wrapping - so I limit myself to "input" arrays! Since array memory >> does not distinguish between input or output (assuming there is no >> copying needed because of alignment or contiguity issues) the only >> implication of this is that I have to allocate the array outside the C >> functions. > > To clarify, are you saying that you don't make any use of numpy.i?

I have my own (older) version -- I think Bill Spotz has improved numpy.i by now to do what mine does (supposedly using cleaner code).

> The functions I'm writing are not making any explicit use of the > Python/numpy API, but I'm sure it shows up in the *_wrap.c file that > SWIG creates.

If you are using INPLACE arrays (this is, I think, the word numpy.i uses) then there is no additional mem allocation. This is/was the most important feature of all this numpy/SWIG idea.

> >> input<->conversion transparently for me --- do you need to handle >> non-contiguous arrays? > > Probably not.

So you are fine; do you know the output array shape beforehand ?

- Sebastian

From sidky at uchicago.edu Tue Oct 14 18:02:56 2008 From: sidky at uchicago.edu (emil) Date: Tue, 14 Oct 2008 17:02:56 -0500 Subject: [Numpy-discussion] memory usage Message-ID: <48F51710.60103@uchicago.edu>

Hi,

I'm having a problem with my python code, using numpy, chewing up too much memory. In the following, I boiled down my program to the simplest example that has the problem:

from numpy import *
for i in range(1000):
    a = random.randn(512**2)
    b = a.argsort(kind='quick')

This loop takes a couple of minutes to run on my machine.
While running 'top' concurrently, I see that the memory usage is increasing as the loop progresses. By the time the loop is finished the python process is taking over 30% of the memory, and I have 8 GB RAM.

Is there some way to prevent this from happening? It's fine if the alternative slows the code down a bit. I'm using python 2.4 and numpy 1.0.1

Thanks in advance,
Emil

From robert.kern at gmail.com Tue Oct 14 18:11:01 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Oct 2008 17:11:01 -0500 Subject: [Numpy-discussion] memory usage In-Reply-To: <48F51710.60103@uchicago.edu> References: <48F51710.60103@uchicago.edu> Message-ID: <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com>

On Tue, Oct 14, 2008 at 17:02, emil wrote: > Hi, > I'm having a problem with my python code, using numpy, chewing up too > much memory. > In the following, I boiled down my program to the simplest example that > has the problem: >
> from numpy import *
> for i in range(1000):
>     a = random.randn(512**2)
>     b = a.argsort(kind='quick')
>
> This loop takes a couple of minutes to run on my machine. > While running 'top' concurrently, I see that the memory usage is > increasing as the loop progresses. By the time the loop is finished > the python process is taking over 30% of the memory, and I have 8 GB RAM. > > Is there some way to prevent this from happening? > It's fine if the alternative slows the code down a bit. > I'm using python 2.4 and numpy 1.0.1

Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0 on OS X, the memory usage is stable.

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From f.yw at hotmail.com Tue Oct 14 21:23:24 2008 From: f.yw at hotmail.com (frank wang) Date: Tue, 14 Oct 2008 19:23:24 -0600 Subject: [Numpy-discussion] how to save a large array into a file quickly In-Reply-To: <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> References: <48F51710.60103@uchicago.edu> <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> Message-ID:

Hi,

I have a large ndarray that I want to dump to a file. I know that I can use a for loop to write one value at a time. Since Python is a very powerful language, I want to find a way that will dump the data fast and clean. The data can be in floating point or integer.

Thanks

Frank

_________________________________________________________________ Want to do more with Windows Live? Learn "10 hidden secrets" from Jamie. http://windowslive.com/connect/post/jamiethomson.spaces.live.com-Blog-cns!550F681DAD532637!5295.entry?ocid=TXT_TAGLM_WL_domore_092008 -------------- next part -------------- An HTML attachment was scrubbed... URL:

From hwchen.mailman at gmail.com Tue Oct 14 21:24:34 2008 From: hwchen.mailman at gmail.com (Huang-Wen Chen) Date: Tue, 14 Oct 2008 21:24:34 -0400 Subject: [Numpy-discussion] memory usage In-Reply-To: <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> References: <48F51710.60103@uchicago.edu> <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> Message-ID: <48F54652.4070608@gmail.com>

Robert Kern wrote:
>> from numpy import *
>> for i in range(1000):
>>     a = random.randn(512**2)
>>     b = a.argsort(kind='quick')
>
> Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0 > on OS X, the memory usage is stable.

I tried the code fragment on two platforms and the memory usage is also normal.

1. numpy 1.1.1, python 2.5.1 on Vista 32bit
2. numpy 1.2.0, python 2.6 on RedHat 64bit

From efiring at hawaii.edu Tue Oct 14 21:29:32 2008 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 14 Oct 2008 15:29:32 -1000 Subject: [Numpy-discussion] how to save a large array into a file quickly In-Reply-To: References: <48F51710.60103@uchicago.edu> <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> Message-ID: <48F5477C.1020308@hawaii.edu>

frank wang wrote: > Hi, > > I have a large ndarray that I want to dump to a file. I know that I can > use a for loop to write one value at a time. Since Python is a very > powerful language, I want to find a way that will dump the data fast > and clean. The data can be in floating point or integer.

Use numpy.save for a single array, or numpy.savez for multiple ndarrays, assuming you will want to read them back with numpy. If you want to dump to a text file, use numpy.savetxt. If you want to dump to a binary file to be read by another program, you might want to use the tofile method of the ndarray.

Eric
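A compact sketch of the options Eric lists (the file names are arbitrary; this is an illustration, not code from the thread):

import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)

np.save('a.npy', a)                 # binary .npy file; round-trips with np.load
assert (np.load('a.npy') == a).all()

np.savetxt('a.txt', a)              # human-readable text

a.tofile('a.raw')                   # raw bytes: no shape or dtype header, so
b = np.fromfile('a.raw', dtype=float).reshape(2, 3)  # the reader must supply both
assert (b == a).all()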
From charlesr.harris at gmail.com Wed Oct 15 01:23:56 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Oct 2008 23:23:56 -0600 Subject: [Numpy-discussion] LU factorization? Message-ID:

Hi All,

numpy.linalg has qr and cholesky factorizations, but LU factorization is only available in scipy. That doesn't seem quite right. I think it would make sense to include the LU factorization in numpy among the basic linalg operations, and probably LU_solve also. Thoughts?

Chuck
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From stefan at sun.ac.za Wed Oct 15 02:51:52 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 15 Oct 2008 08:51:52 +0200 Subject: [Numpy-discussion] LU factorization? In-Reply-To: References: Message-ID: <9457e7c80810142351qa245328x4f60d21a50e910d0@mail.gmail.com>

2008/10/15 Charles R Harris : > numpy.linalg has qr and cholesky factorizations, but LU factorization is > only available in scipy. That doesn't seem quite right. I think it would > make sense to include the LU factorization in numpy among the basic linalg > operations, and probably LU_solve also. Thoughts?

I've needed it a lot in the past, and it is a perfect fit for numpy.linalg. It also paves the way to a reduced-row-echelon routine in the Matrix class. It seems technically feasible, so I am in favour.

Regards
Stéfan
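For reference, a minimal sketch of the scipy.linalg LU routines under discussion (the matrices are made up for illustration):

import numpy as np
from scipy import linalg

A = np.array([[2., 1.], [4., 3.]])
b = np.array([3., 7.])

P, L, U = linalg.lu(A)          # A == P * L * U (permutation, lower, upper)
assert np.allclose(np.dot(P, np.dot(L, U)), A)

lu_piv = linalg.lu_factor(A)    # compact (LU, pivots) form for reuse
x = linalg.lu_solve(lu_piv, b)  # solves A x = b without refactoring
assert np.allclose(np.dot(A, x), b)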
From irving at naml.us Wed Oct 15 03:20:50 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 15 Oct 2008 00:20:50 -0700 Subject: [Numpy-discussion] dtype comparison and hashing Message-ID: <7f9d599f0810150020m489b54day4c909f6cbc8c1203@mail.gmail.com>

Hello,

Currently in numpy comparing dtypes for equality with == does an internal PyArray_EquivTypes check, which means that the dtypes NPY_INT and NPY_LONG compare as equal in python. However, the hash function for dtypes reduces to id(), which is therefore inconsistent with ==. Unfortunately I can't produce a python snippet showing this since I don't know how to create a NPY_INT dtype in pure python.

Based on the source it looks like hash should raise a type error, since tp_hash is null but tp_richcompare is not. Does the following snippet throw an exception for others?

>>> import numpy
>>> hash(numpy.dtype('int'))
5708736

This might be the problem:

/* Macro to get the tp_richcompare field of a type if defined */
#define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \
                        ? (t)->tp_richcompare : NULL)

I'm using the default Mac OS X 10.5 installation of python 2.5 and numpy, so maybe those weren't compiled correctly. Has anyone else seen this issue?

Thanks,
Geoffrey

From schut at sarvision.nl Wed Oct 15 03:24:02 2008 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 15 Oct 2008 09:24:02 +0200 Subject: [Numpy-discussion] memory usage In-Reply-To: <48F54652.4070608@gmail.com> References: <48F51710.60103@uchicago.edu> <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> <48F54652.4070608@gmail.com> Message-ID:

Huang-Wen Chen wrote: > Robert Kern wrote:
>>> from numpy import *
>>> for i in range(1000):
>>>     a = random.randn(512**2)
>>>     b = a.argsort(kind='quick')
>> Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0 >> on OS X, the memory usage is stable. >>
> I tried the code fragment on two platforms and the memory usage is also > normal. > > 1. numpy 1.1.1, python 2.5.1 on Vista 32bit > 2. numpy 1.2.0, python 2.6 on RedHat 64bit

If I recall correctly, there were some major improvements in python's memory management/garbage collection from version 2.4 to 2.5. If you could try to upgrade your python to 2.5 (and possibly also your numpy to 1.2.0), you'd probably see some better behaviour.

Regards,
Vincent.

From rocksportrocker at googlemail.com Wed Oct 15 06:47:53 2008 From: rocksportrocker at googlemail.com (Uwe Schmitt) Date: Wed, 15 Oct 2008 03:47:53 -0700 (PDT) Subject: [Numpy-discussion] Any numpy trick for my problem ? Message-ID:

Hi,

I got a matrix of 2100 lines, and I want to calculate blockwise mean vectors. Each block consists of 10 consecutive rows.

My code looks like this:

rv = []
for i in range(0, 2100, 10):
    rv.append(mean(matrix[i:i+10], axis=0))

return array(rv)

Is there a more elegant and maybe faster method to perform this calculation ?

Greetings, Uwe

From charlesr.harris at gmail.com Wed Oct 15 06:56:52 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 04:56:52 -0600 Subject: [Numpy-discussion] Any numpy trick for my problem ? In-Reply-To: References: Message-ID:

On Wed, Oct 15, 2008 at 4:47 AM, Uwe Schmitt wrote: > Hi, > > I got a matrix of 2100 lines, and I want to calculate blockwise mean > vectors. > Each block consists of 10 consecutive rows. > > My code looks like this: >
> rv = []
> for i in range(0, 2100, 10):
>     rv.append(mean(matrix[i:i+10], axis=0))
>
> return array(rv)
>
> Is there a more elegant and maybe faster method to perform this > calculation ?

Something like

In [1]: M = np.random.ranf((40,5))

In [2]: M.reshape(4,10,5).mean(axis=1)
Out[2]:
array([[ 0.57979278,  0.50013352,  0.66783389,  0.4009187 ,  0.36379445],
       [ 0.46938844,  0.34449102,  0.56419189,  0.49134703,  0.61380198],
       [ 0.5644788 ,  0.61734034,  0.3656104 ,  0.63147275,  0.46319345],
       [ 0.56556899,  0.59012606,  0.39691084,  0.26566127,  0.57107896]])

Chuck
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From rocksportrocker at googlemail.com Wed Oct 15 07:00:59 2008 From: rocksportrocker at googlemail.com (Uwe Schmitt) Date: Wed, 15 Oct 2008 04:00:59 -0700 (PDT) Subject: [Numpy-discussion] Any numpy trick for my problem ? In-Reply-To: References: Message-ID: <1df32ff9-4720-4249-9380-1d84f4073940@u27g2000pro.googlegroups.com>

That's cool. Thanks for your fast answer.

Greetings, Uwe

On 15 Okt., 12:56, "Charles R Harris" wrote: > On Wed, Oct 15, 2008 at 4:47 AM, Uwe Schmitt > > wrote: > > Hi, > > > I got a matrix of 2100 lines, and I want to calculate blockwise mean
> > vectors.
> > Each block consists of 10 consecutive rows.
> >
> > My code looks like this:
> >
> >   rv = []
> >   for i in range(0, 2100, 10):
> >       rv.append(mean(matrix[i:i+10], axis=0))
> >
> >   return array(rv)
> >
> > Is there a more elegant and maybe faster method to perform this
> > calculation ?
>
> Something like
>
> In [1]: M = np.random.ranf((40,5))
>
> In [2]: M.reshape(4,10,5).mean(axis=1)
> Out[2]:
> array([[ 0.57979278,  0.50013352,  0.66783389,  0.4009187 ,  0.36379445],
>        [ 0.46938844,  0.34449102,  0.56419189,  0.49134703,  0.61380198],
>        [ 0.5644788 ,  0.61734034,  0.3656104 ,  0.63147275,  0.46319345],
>        [ 0.56556899,  0.59012606,  0.39691084,  0.26566127,  0.57107896]])
>
> Chuck
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From elcorto at gmx.net Wed Oct 15 07:01:21 2008 From: elcorto at gmx.net (Steve Schmerler) Date: Wed, 15 Oct 2008 13:01:21 +0200 Subject: [Numpy-discussion] how to save a large array into a file quickly In-Reply-To: <48F5477C.1020308@hawaii.edu> References: <48F51710.60103@uchicago.edu> <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> <48F5477C.1020308@hawaii.edu> Message-ID: <20081015110121.GA5652@ramrod.starsheriffs.de>

On Oct 14 15:29 -1000, Eric Firing wrote: > frank wang wrote: > > Hi, > > > > I have a large ndarray that I want to dump to a file. I know that I can > > use a for loop to write one value at a time. Since Python is a very > > powerful language, I want to find a way that will dump the data fast > > and clean. The data can be in floating point or integer. > > Use numpy.save for a single array, or numpy.savez for multiple ndarrays, > assuming you will want to read them back with numpy. If you want to > dump to a text file, use numpy.savetxt. If you want to dump to a binary > file to be read by another program, you might want to use the tofile > method of the ndarray. >

I've just updated [1] to mention scipy.io.npfile as well as numpy.save & friends. Now, I hope that all common ways to read/write arrays are present in one place.

[1] http://scipy.org/Cookbook/InputOutput

best, steve

From stefan at sun.ac.za Wed Oct 15 07:05:33 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 15 Oct 2008 13:05:33 +0200 Subject: [Numpy-discussion] Any numpy trick for my problem ? In-Reply-To: References: Message-ID: <9457e7c80810150405j16b8ebebl59e8df0c36c0ad0c@mail.gmail.com>

Hi Uwe

2008/10/15 Uwe Schmitt : > I got a matrix of 2100 lines, and I want to calculate blockwise mean > vectors. > Each block consists of 10 consecutive rows. > > My code looks like this: >
> rv = []
> for i in range(0, 2100, 10):
>     rv.append(mean(matrix[i:i+10], axis=0))
>
> return array(rv)
>
> Is there a more elegant and maybe faster method to perform this > calculation ?

You can use array striding. See

http://mentat.za.net/numpy/numpy_advanced_slides/

from slide 29 onwards.

Regards
Stéfan

From aisaac at american.edu Wed Oct 15 08:40:20 2008 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 15 Oct 2008 08:40:20 -0400 Subject: [Numpy-discussion] how to save a large array into a file quickly In-Reply-To: References: <48F51710.60103@uchicago.edu> <3d375d730810141511m515b14fave482bd8e14e9a15@mail.gmail.com> Message-ID: <48F5E4B4.2010701@american.edu>

On 10/14/2008 9:23 PM frank wang apparently wrote: > I have a large ndarray that I want to dump to a file.
I know that I can > use a for loop to write one data at a time. Since Python is a very > powerfully language, I want to find a way that will dump the data fast > and clean. The data can be in floating point or integer. Use the ``tofile()`` method: http://www.scipy.org/Numpy_Example_List#head-2acd2a84907edbd410bf426847403ce8ea151814 hth, Alan Isaac From ggellner at uoguelph.ca Wed Oct 15 09:56:26 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Wed, 15 Oct 2008 09:56:26 -0400 Subject: [Numpy-discussion] var bias reason? Message-ID: <20081015135626.GA12837@encolpuis> Some colleagues noticed that var uses biased formula's by default in numpy, searching for the reason only brought up: http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias which I totally agree with, but there was no response? Any reason for this? Is there any way I can hide the var/std methods of numpy arrays? I have my own library which does things the R way, and in my lab letting people discover this 'bug' using name completion in ipython is a nightmare. Any help? thanks, Gabriel From oliphant at enthought.com Wed Oct 15 10:45:39 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Wed, 15 Oct 2008 09:45:39 -0500 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <20081015135626.GA12837@encolpuis> References: <20081015135626.GA12837@encolpuis> Message-ID: <48F60213.2050204@enthought.com> Gabriel Gellner wrote: > Some colleagues noticed that var uses biased formula's by default in numpy, > searching for the reason only brought up: > > http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias > > which I totally agree with, but there was no response? Any reason for this? I will try to respond to this as it was me who made the change. I think there have been responses, but I think I've preferred to stay quiet rather than feed a flame war. Ultimately, it is a matter of preference and I don't think there would be equal weights given to all the arguments surrounding the decision by everybody. I will attempt to articulate my reasons: dividing by n is the maximum likelihood estimator of variance and I prefer that justification more than the "un-biased" justification for a default (especially given that bias is just one part of the "error" in an estimator). Having every package that computes the mean return the "un-biased" estimate gives it more cultural weight than than the concept deserves, I think. Any surprise that is created by the different default should be mitigated by the fact that it's an opportunity to learn something about what you are doing. Here is a paper I wrote on the subject that you might find useful: https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 (Hopefully, they will resolve a link problem at the above site soon, but you can read the abstract). I'm not trying to persuade anybody with this email (although if you can download the paper at the above link, then I am trying to persuade with that). In this email I'm just trying to give context to the poster as I think the question is legitimate. With that said, there is the ddof parameter so that you can change what the divisor is. I think that is a useful compromise. I'm unhappy with the internal inconsistency of cov, as I think it was an oversight. I'd be happy to see cov changed as well to use the ddof argument instead of the bias keyword, but that is an API change and requires some transition discussion and work. 
The only other argument I've heard against the current situation is "unit testing" with MATLAB or R code. Just use ddof=1 when comparing against MATLAB and R code is my suggestion. Best regards, -Travis From cournape at gmail.com Wed Oct 15 11:19:54 2008 From: cournape at gmail.com (David Cournapeau) Date: Thu, 16 Oct 2008 00:19:54 +0900 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <48F60213.2050204@enthought.com> References: <20081015135626.GA12837@encolpuis> <48F60213.2050204@enthought.com> Message-ID: <5b8d13220810150819s1115a7c0h1eff75b03dff86e3@mail.gmail.com> On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant wrote: > Gabriel Gellner wrote: >> Some colleagues noticed that var uses biased formula's by default in numpy, >> searching for the reason only brought up: >> >> http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias >> >> which I totally agree with, but there was no response? Any reason for this? > I will try to respond to this as it was me who made the change. I think > there have been responses, but I think I've preferred to stay quiet > rather than feed a flame war. Ultimately, it is a matter of preference > and I don't think there would be equal weights given to all the > arguments surrounding the decision by everybody. > > I will attempt to articulate my reasons: dividing by n is the maximum > likelihood estimator of variance and I prefer that justification more > than the "un-biased" justification for a default (especially given that > bias is just one part of the "error" in an estimator). Having every > package that computes the mean return the "un-biased" estimate gives it > more cultural weight than than the concept deserves, I think. Any > surprise that is created by the different default should be mitigated by > the fact that it's an opportunity to learn something about what you are > doing. Here is a paper I wrote on the subject that you might find > useful: > > https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 > (Hopefully, they will resolve a link problem at the above site soon, but > you can read the abstract). Yes, I hope too, I would be happy to read the article. On the limit of unbiasdness, the following document mentions an example (in a different context than variance estimation): http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf AFAIK, even statisticians who consider themselves as "mostly frequentist" (if that makes any sense) do not advocate unbiasdness as such an important concept anymore (Larry Wasserman mentions it in his "all of statistics"). cheers, David From pebarrett at gmail.com Wed Oct 15 11:31:44 2008 From: pebarrett at gmail.com (Paul Barrett) Date: Wed, 15 Oct 2008 11:31:44 -0400 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <5b8d13220810150819s1115a7c0h1eff75b03dff86e3@mail.gmail.com> References: <20081015135626.GA12837@encolpuis> <48F60213.2050204@enthought.com> <5b8d13220810150819s1115a7c0h1eff75b03dff86e3@mail.gmail.com> Message-ID: <40e64fa20810150831q5852b96co68c5f53741a794a7@mail.gmail.com> I'm behind Travis on this one. -- Paul On Wed, Oct 15, 2008 at 11:19 AM, David Cournapeau wrote: > On Wed, Oct 15, 2008 at 11:45 PM, Travis E. 
Oliphant > wrote: >> Gabriel Gellner wrote: >>> Some colleagues noticed that var uses biased formula's by default in numpy, >>> searching for the reason only brought up: >>> >>> http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias >>> >>> which I totally agree with, but there was no response? Any reason for this? >> I will try to respond to this as it was me who made the change. I think >> there have been responses, but I think I've preferred to stay quiet >> rather than feed a flame war. Ultimately, it is a matter of preference >> and I don't think there would be equal weights given to all the >> arguments surrounding the decision by everybody. >> >> I will attempt to articulate my reasons: dividing by n is the maximum >> likelihood estimator of variance and I prefer that justification more >> than the "un-biased" justification for a default (especially given that >> bias is just one part of the "error" in an estimator). Having every >> package that computes the mean return the "un-biased" estimate gives it >> more cultural weight than than the concept deserves, I think. Any >> surprise that is created by the different default should be mitigated by >> the fact that it's an opportunity to learn something about what you are >> doing. Here is a paper I wrote on the subject that you might find >> useful: >> >> https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 >> (Hopefully, they will resolve a link problem at the above site soon, but >> you can read the abstract). > > Yes, I hope too, I would be happy to read the article. > > On the limit of unbiasdness, the following document mentions an > example (in a different context than variance estimation): > > http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf > > AFAIK, even statisticians who consider themselves as "mostly > frequentist" (if that makes any sense) do not advocate unbiasdness as > such an important concept anymore (Larry Wasserman mentions it in his > "all of statistics"). > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From sransom at nrao.edu Wed Oct 15 11:58:24 2008 From: sransom at nrao.edu (Scott Ransom) Date: Wed, 15 Oct 2008 11:58:24 -0400 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <40e64fa20810150831q5852b96co68c5f53741a794a7@mail.gmail.com> References: <20081015135626.GA12837@encolpuis> <5b8d13220810150819s1115a7c0h1eff75b03dff86e3@mail.gmail.com> <40e64fa20810150831q5852b96co68c5f53741a794a7@mail.gmail.com> Message-ID: <200810151158.24268.sransom@nrao.edu> Me too. S On Wednesday 15 October 2008 11:31:44 am Paul Barrett wrote: > I'm behind Travis on this one. > > -- Paul > > On Wed, Oct 15, 2008 at 11:19 AM, David Cournapeau wrote: > > On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant > > > > wrote: > >> Gabriel Gellner wrote: > >>> Some colleagues noticed that var uses biased formula's by default > >>> in numpy, searching for the reason only brought up: > >>> > >>> http://article.gmane.org/gmane.comp.python.numeric.general/12438/ > >>>match=var+bias > >>> > >>> which I totally agree with, but there was no response? Any reason > >>> for this? > >> > >> I will try to respond to this as it was me who made the change. I > >> think there have been responses, but I think I've preferred to > >> stay quiet rather than feed a flame war. 
Ultimately, it is a > >> matter of preference and I don't think there would be equal > >> weights given to all the arguments surrounding the decision by > >> everybody. > >> > >> I will attempt to articulate my reasons: dividing by n is the > >> maximum likelihood estimator of variance and I prefer that > >> justification more than the "un-biased" justification for a > >> default (especially given that bias is just one part of the > >> "error" in an estimator). Having every package that computes > >> the mean return the "un-biased" estimate gives it more cultural > >> weight than than the concept deserves, I think. Any surprise that > >> is created by the different default should be mitigated by the > >> fact that it's an opportunity to learn something about what you > >> are doing. Here is a paper I wrote on the subject that you > >> might find useful: > >> > >> https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&C > >>ISOPTR=134&CISOBOX=1&REC=1 (Hopefully, they will resolve a link > >> problem at the above site soon, but you can read the abstract). > > > > Yes, I hope too, I would be happy to read the article. > > > > On the limit of unbiasdness, the following document mentions an > > example (in a different context than variance estimation): > > > > http://www.stat.columbia.edu/~gelman/research/published/badbayesres > >ponsemain.pdf > > > > AFAIK, even statisticians who consider themselves as "mostly > > frequentist" (if that makes any sense) do not advocate unbiasdness > > as such an important concept anymore (Larry Wasserman mentions it > > in his "all of statistics"). > > > > cheers, > > > > David > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From ggellner at uoguelph.ca Wed Oct 15 12:09:03 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Wed, 15 Oct 2008 12:09:03 -0400 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <48F60213.2050204@enthought.com> References: <20081015135626.GA12837@encolpuis> <48F60213.2050204@enthought.com> Message-ID: <20081015160903.GA6032@encolpuis> On Wed, Oct 15, 2008 at 09:45:39AM -0500, Travis E. Oliphant wrote: > Gabriel Gellner wrote: > > Some colleagues noticed that var uses biased formula's by default in numpy, > > searching for the reason only brought up: > > > > http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias > > > > which I totally agree with, but there was no response? Any reason for this? > I will try to respond to this as it was me who made the change. I think > there have been responses, but I think I've preferred to stay quiet > rather than feed a flame war. Ultimately, it is a matter of preference > and I don't think there would be equal weights given to all the > arguments surrounding the decision by everybody. 
> > I will attempt to articulate my reasons: dividing by n is the maximum > likelihood estimator of variance and I prefer that justification more > than the "un-biased" justification for a default (especially given that > bias is just one part of the "error" in an estimator). Having every > package that computes the mean return the "un-biased" estimate gives it > more cultural weight than than the concept deserves, I think. Any > surprise that is created by the different default should be mitigated by > the fact that it's an opportunity to learn something about what you are > doing. Here is a paper I wrote on the subject that you might find > useful: > > https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 > (Hopefully, they will resolve a link problem at the above site soon, but > you can read the abstract).

Thanks for the reply, I look forward to reading the paper when it is available.

The major issue in my mind is not the technical issue but the surprise factor. I can't think of a single other package that uses this as the default, and since it is also a method of ndarray (which is a built in type and can't be monkey patched) there is no way of taking a different view (that is, supplying my own function) without the confusion I am feeling in my own lab . . . (less technical people need to understand that they shouldn't use a method of the same name)

I worry about having numpy take this unpopular stance (as far as packages go) simply to fight the good fight, as a built in method/behaviour of any ndarray, rather than as a built in function, which presents no such problem, as it allows dissent over a clearly muddy issue.

Sorry for the noise, and I am happy to see there is a reason, but I can't help but find this a wart for purely pedagogical reasons.

Gabriel

From bsouthey at gmail.com Wed Oct 15 12:26:18 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 15 Oct 2008 11:26:18 -0500 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <48F60213.2050204@enthought.com> References: <20081015135626.GA12837@encolpuis> <48F60213.2050204@enthought.com> Message-ID: <48F619AA.3040800@gmail.com>

Hi,

While I disagree, I really do not care because this is documented. But perhaps a clear warning is needed at the start so it is clear what the default ddof means instead of it being buried in the Notes section.

Also I am surprised that you did not directly reference the Stein estimator (your minimum mean-squared estimator) and known effects in your paper: http://en.wikipedia.org/wiki/James-Stein_estimator So I did not find this any different from what is already known about the Stein estimator.

Bruce

PS While I may have gotten access via my University, I did get it from the link *Access this item. https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf *

Travis E. Oliphant wrote: > Gabriel Gellner wrote: > >> Some colleagues noticed that var uses biased formula's by default in numpy, >> searching for the reason only brought up: >> >> http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias >> >> which I totally agree with, but there was no response? Any reason for this? >> > I will try to respond to this as it was me who made the change. I think > there have been responses, but I think I've preferred to stay quiet > rather than feed a flame war. Ultimately, it is a matter of preference > and I don't think there would be equal weights given to all the > arguments surrounding the decision by everybody.
> > I will attempt to articulate my reasons: dividing by n is the maximum > likelihood estimator of variance and I prefer that justification more > than the "un-biased" justification for a default (especially given that > bias is just one part of the "error" in an estimator). Having every > package that computes the mean return the "un-biased" estimate gives it > more cultural weight than than the concept deserves, I think. Any > surprise that is created by the different default should be mitigated by > the fact that it's an opportunity to learn something about what you are > doing. Here is a paper I wrote on the subject that you might find > useful: > > https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 > (Hopefully, they will resolve a link problem at the above site soon, but > you can read the abstract). > > I'm not trying to persuade anybody with this email (although if you can > download the paper at the above link, then I am trying to persuade with > that). In this email I'm just trying to give context to the poster as I > think the question is legitimate. > > With that said, there is the ddof parameter so that you can change what > the divisor is. I think that is a useful compromise. > > I'm unhappy with the internal inconsistency of cov, as I think it was an > oversight. I'd be happy to see cov changed as well to use the ddof > argument instead of the bias keyword, but that is an API change and > requires some transition discussion and work. > > The only other argument I've heard against the current situation is > "unit testing" with MATLAB or R code. Just use ddof=1 when comparing > against MATLAB and R code is my suggestion. > > Best regards, > > -Travis > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Oct 15 13:09:55 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 11:09:55 -0600 Subject: [Numpy-discussion] var bias reason? In-Reply-To: <5b8d13220810150819s1115a7c0h1eff75b03dff86e3@mail.gmail.com> References: <20081015135626.GA12837@encolpuis> <48F60213.2050204@enthought.com> <5b8d13220810150819s1115a7c0h1eff75b03dff86e3@mail.gmail.com> Message-ID: On Wed, Oct 15, 2008 at 9:19 AM, David Cournapeau wrote: > On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant > wrote: > > Gabriel Gellner wrote: > >> Some colleagues noticed that var uses biased formula's by default in > numpy, > >> searching for the reason only brought up: > >> > >> > http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias > >> > >> which I totally agree with, but there was no response? Any reason for > this? > > I will try to respond to this as it was me who made the change. I think > > there have been responses, but I think I've preferred to stay quiet > > rather than feed a flame war. Ultimately, it is a matter of preference > > and I don't think there would be equal weights given to all the > > arguments surrounding the decision by everybody. > > > > I will attempt to articulate my reasons: dividing by n is the maximum > > likelihood estimator of variance and I prefer that justification more > > than the "un-biased" justification for a default (especially given that > > bias is just one part of the "error" in an estimator). 
Having every > > package that computes the mean return the "un-biased" estimate gives it > > more cultural weight than than the concept deserves, I think. Any > > surprise that is created by the different default should be mitigated by > > the fact that it's an opportunity to learn something about what you are > > doing. Here is a paper I wrote on the subject that you might find > > useful: > > > > > https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 > > (Hopefully, they will resolve a link problem at the above site soon, but > > you can read the abstract). > > Yes, I hope too, I would be happy to read the article. > > On the limit of unbiasdness, the following document mentions an > example (in a different context than variance estimation): > > > http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf > > AFAIK, even statisticians who consider themselves as "mostly > frequentist" (if that makes any sense) do not advocate unbiasdness as Frequently frequentist? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kbasye1 at jhu.edu Wed Oct 15 12:52:56 2008 From: kbasye1 at jhu.edu (Ken Basye) Date: Wed, 15 Oct 2008 12:52:56 -0400 Subject: [Numpy-discussion] Array printing differences between 64- and 32-bit platforms Message-ID: <48F61FE8.9030906@jhu.edu> Hi Folks, In porting some code to a 64-bit machine, I ran across the following issue. On the 64-bit machine, an array with dtype=int32 prints the dtype explicitly, whereas on a 32 bit machine it doesn't. The same is true for dtype=intc (since 'intc is int32' --> True), and the converse is true for dtype=int64 and dtype=longlong. Arrays with dtype of plain int both print without the dtype, but then you're not using the same underlying size anymore. I think this is handled in core/numeric.py in array_repr and just above, where certain types are added to _typelessdata if they are subclasses of 'int'. The issubclass test returns different values on platforms with different word lengths. This difference can make doctests fail and although there are doctest tricks to prevent the failure, they're a bit annoying to use. I was wondering about introducing a new print setting that forced dtypes to be printed always. Is there any support for that? Any other ideas would also be most welcome. Thanks, Ken Basye From sidky at uchicago.edu Wed Oct 15 14:31:04 2008 From: sidky at uchicago.edu (emil) Date: Wed, 15 Oct 2008 13:31:04 -0500 Subject: [Numpy-discussion] memory usage (Emil Sidky) In-Reply-To: References: Message-ID: <48F636E8.4050001@uchicago.edu> > Huang-Wen Chen wrote: >> Robert Kern wrote: >>>> from numpy import * >>>> for i in range(1000): >>>> a = random.randn(512**2) >>>> b = a.argsort(kind='quick') >>> Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0 >>> on OS X, the memory usage is stable. >>> >> I tried the code fragment on two platforms and the memory usage is also >> normal. >> >> 1. numpy 1.1.1, python 2.5.1 on Vista 32bit >> 2. numpy 1.2.0, python 2.6 on RedHat 64bit > > If I recall correctly, there were some major improvements in python's > memory management/garbage collection from version 2.4 to 2.5. If you > could try to upgrade your python to 2.5 (and possibly also your numpy to > 1.2.0), you'd probably see some better behaviour. > > Regards, > Vincent. > Problem fixed. Thanks. 
But it turns out there were two things going on: (1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory usage for the loop with argsort in it. (2) Unfortunately, when I went back to my original program and ran it with the upgraded numpy, it still was chewing up tons of memory. I finally found the problem: Consider the following two code snippets (extension of my previous example). from numpy import * d = [] for i in range(1000): a = random.randn(512**2) b = a.argsort(kind='quick') c = b[-100:] d.append(c) and from numpy import * d = [] for i in range(1000): a = random.randn(512**2) b = a.argsort(kind='quick') c = b[-100:].copy() d.append(c) The difference being that c is a reference to the last 100 elements of b in the first example, while c is a copy of the last 100 in the second example. Both examples yield identical results (provided randn is run with the same seed value). But the former chews up tons of memory, and the latter doesn't. I don't know if this explanation makes any sense, but it is as if python has to keep all the generated b's around in the first example because c is only a reference. Anyway, bottom line is that my problem is solved. Thanks, Emil From robert.kern at gmail.com Wed Oct 15 15:06:21 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Oct 2008 14:06:21 -0500 Subject: [Numpy-discussion] LU factorization? In-Reply-To: References: Message-ID: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> On Wed, Oct 15, 2008 at 00:23, Charles R Harris wrote: > Hi All, > > numpy.linalg has qr and cholesky factorizations, but LU factorization is > only available in scipy. That doesn't seem quite right. I think it would > make sense to include the LU factorization in numpy among the basic linalg > operations, and probably LU_solve also. Thoughts? -1. As far as I am concerned, numpy.linalg exists because Numeric had LinearAlgebra, and we had to provide it to allow people to upgrade. I do not want to see an expansion of functionality to maintain. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From perry at stsci.edu Wed Oct 15 15:09:27 2008 From: perry at stsci.edu (Perry Greenfield) Date: Wed, 15 Oct 2008 15:09:27 -0400 Subject: [Numpy-discussion] memory usage (Emil Sidky) In-Reply-To: <48F636E8.4050001@uchicago.edu> References: <48F636E8.4050001@uchicago.edu> Message-ID: <10DCE169-DC14-4019-8990-BBD392C939A4@stsci.edu> When you slice an array, you keep the original array in memory until the slice is deleted. The slice uses the original array memory and is not a copy. The second example explicitly makes a copy. Perry On Oct 15, 2008, at 2:31 PM, emil wrote: > >> Huang-Wen Chen wrote: >>> Robert Kern wrote: >>>>> from numpy import * >>>>> for i in range(1000): >>>>> a = random.randn(512**2) >>>>> b = a.argsort(kind='quick') >>>> Can you try upgrading to numpy 1.2.0? On my machine with numpy >>>> 1.2.0 >>>> on OS X, the memory usage is stable. >>>> >>> I tried the code fragment on two platforms and the memory usage >>> is also >>> normal. >>> >>> 1. numpy 1.1.1, python 2.5.1 on Vista 32bit >>> 2. numpy 1.2.0, python 2.6 on RedHat 64bit >> >> If I recall correctly, there were some major improvements in python's >> memory management/garbage collection from version 2.4 to 2.5.
If you >> could try to upgrade your python to 2.5 (and possibly also your >> numpy to >> 1.2.0), you'd probably see some better behaviour. >> >> Regards, >> Vincent. >> > > Problem fixed. Thanks. > > But it turns out there were two things going on: > (1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory > usage > for the loop with argsort in it. > (2) Unfortunately, when I went back to my original program and ran it > with the upgraded numpy, it still was chewing up tons of memory. I > finally found the problem: > Consider the following two code snippets (extension of my previous > example). > from numpy import * > d = [] > for i in range(1000): > a = random.randn(512**2) > b = a.argsort(kind='quick') > c = b[-100:] > d.append(c) > > and > > from numpy import * > d = [] > for i in range(1000): > a = random.randn(512**2) > b = a.argsort(kind='quick') > c = b[-100:].copy() > d.append(c) > > The difference being that c is a reference to the last 100 elements > of b > in the first example, while c is a copy of the last 100 in the second > example. > Both examples yield identical results (provided randn is run with the > same seed value). But the former chews up tons of memory, and the > latter > doesn't. > I don't know if this explanation makes any sense, but it is as if > python > has to keep all the generated b's around in the first example > because c > is only a reference. > > Anyway, bottom line is that my problem is solved. > Thanks, > Emil > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion
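The view relationship described here can be checked directly on the slice's base attribute; a small sketch with the same toy sizes:

>>> import numpy as np
>>> b = np.arange(512**2)
>>> c = b[-100:]           # a view: shares b's memory and keeps all of b alive
>>> c.base is b
True
>>> c2 = b[-100:].copy()   # an independent 100-element array; b can be freed
>>> c2.base is None
True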
From stefan at sun.ac.za Wed Oct 15 15:43:34 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 15 Oct 2008 21:43:34 +0200 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> Message-ID: <9457e7c80810151243u614b7211o35de5f29acade57e@mail.gmail.com> 2008/10/15 Robert Kern : >> numpy.linalg has qr and cholesky factorizations, but LU factorization is >> only available in scipy. That doesn't seem quite right. I think it would >> make sense to include the LU factorization in numpy among the basic linalg >> operations, and probably LU_solve also. Thoughts? > > -1. As far as I am concerned, numpy.linalg exists because Numeric had > LinearAlgebra, and we had to provide it to allow people to upgrade. I > do not want to see an expansion of functionality to maintain. It's silly to have a crippled linear algebra module in NumPy. Either take it out, or finish it. NumPy without the linear algebra module would make it much less useful to many of us. Regards Stefan From charlesr.harris at gmail.com Wed Oct 15 15:49:54 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 13:49:54 -0600 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> Message-ID: On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern wrote: > On Wed, Oct 15, 2008 at 00:23, Charles R Harris > wrote: > > Hi All, > > > > numpy.linalg has qr and cholesky factorizations, but LU factorization is > > only available in scipy. That doesn't seem quite right. I think it would > > make sense to include the LU factorization in numpy among the basic > linalg > > operations, and probably LU_solve also. Thoughts? > > -1. As far as I am concerned, numpy.linalg exists because Numeric had > LinearAlgebra, and we had to provide it to allow people to upgrade. I > do not want to see an expansion of functionality to maintain. > I would be happier with that argument if scipy was broken into separately downloadable modules and released on a regular schedule. As is, I think that exposing already existing (and used) functions in numpy lapack_lite is unlikely to increase the maintenance burden and will add to the usefulness of barebones numpy out of the box. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 15 15:56:52 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Oct 2008 14:56:52 -0500 Subject: [Numpy-discussion] dtype comparison and hashing In-Reply-To: <7f9d599f0810150020m489b54day4c909f6cbc8c1203@mail.gmail.com> References: <7f9d599f0810150020m489b54day4c909f6cbc8c1203@mail.gmail.com> Message-ID: <3d375d730810151256u6c135679l61e9483a14553938@mail.gmail.com> On Wed, Oct 15, 2008 at 02:20, Geoffrey Irving wrote: > Hello, > > Currently in numpy comparing dtypes for equality with == does an > internal PyArray_EquivTypes check, which means that the dtypes NPY_INT > and NPY_LONG compare as equal in python. However, the hash function > for dtypes reduces to id(), which is therefore inconsistent with ==. > Unfortunately I can't produce a python snippet showing this since I > don't know how to create a NPY_INT dtype in pure python. > > Based on the source it looks like hash should raise a type error, > since tp_hash is null but tp_richcompare is not. Does the following > snippet throw an exception for others? > >>>> import numpy >>>> hash(numpy.dtype('int')) > 5708736 > > This might be the problem: > > /* Macro to get the tp_richcompare field of a type if defined */ > #define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \ > ? (t)->tp_richcompare : NULL) > > I'm using the default Mac OS X 10.5 installation of python 2.5 and > numpy, so maybe those weren't compiled correctly. Has anyone else > seen this issue? Actually, the problem is that we provide a hash function explicitly. In multiarraymodule.c: PyArrayDescr_Type.tp_hash = (hashfunc)_Py_HashPointer; That is a violation of the hashing protocol (objects which compare equal and are hashable need to hash equal), and should be fixed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
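The invariant cited here can be seen with a toy Python class (hypothetical, purely illustrative) that, like the dtype case, compares by value but hashes by identity:

>>> class BadKey(object):             # toy illustration, not numpy code
...     def __eq__(self, other):
...         return isinstance(other, BadKey)
...     # __hash__ falls back to object's id()-based hash
...
>>> a, b = BadKey(), BadKey()
>>> a == b
True
>>> {a: 'stored'}.get(b) is None      # equal keys, yet the dict lookup misses
True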
From robert.kern at gmail.com Wed Oct 15 16:03:36 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Oct 2008 15:03:36 -0500 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <9457e7c80810151243u614b7211o35de5f29acade57e@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <9457e7c80810151243u614b7211o35de5f29acade57e@mail.gmail.com> Message-ID: <3d375d730810151303t5f19057cq4fcf32962841178d@mail.gmail.com> On Wed, Oct 15, 2008 at 14:43, Stéfan van der Walt wrote: > 2008/10/15 Robert Kern : >>> numpy.linalg has qr and cholesky factorizations, but LU factorization is >>> only available in scipy. That doesn't seem quite right. I think it would >>> make sense to include the LU factorization in numpy among the basic linalg >>> operations, and probably LU_solve also. Thoughts? >> >> -1. As far as I am concerned, numpy.linalg exists because Numeric had >> LinearAlgebra, and we had to provide it to allow people to upgrade. I >> do not want to see an expansion of functionality to maintain. > > It's silly to have a crippled linear algebra module in NumPy. Either > take it out, or finish it. And what exactly would constitute "finishing it"? Considering that it has not had LU decompositions since the days it was LinearAlgebra without anyone stepping up to add it, I hardly consider it crippled without it. It has a clear purpose, replacing LinearAlgebra so that people could upgrade from Numeric. It's not crippled because it doesn't serve some other purpose. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Wed Oct 15 16:04:47 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Oct 2008 15:04:47 -0500 Subject: [Numpy-discussion] LU factorization? In-Reply-To: References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> Message-ID: <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> On Wed, Oct 15, 2008 at 14:49, Charles R Harris wrote: > > On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern wrote: >> >> On Wed, Oct 15, 2008 at 00:23, Charles R Harris >> wrote: >> > Hi All, >> > >> > numpy.linalg has qr and cholesky factorizations, but LU factorization is >> > only available in scipy. That doesn't seem quite right. I think it would >> > make sense to include the LU factorization in numpy among the basic >> > linalg >> > operations, and probably LU_solve also. Thoughts? >> >> -1. As far as I am concerned, numpy.linalg exists because Numeric had >> LinearAlgebra, and we had to provide it to allow people to upgrade. I >> do not want to see an expansion of functionality to maintain. > > I would be happier with that argument if scipy was broken into separately > downloadable modules and released on a regular schedule. Then that is the deficiency that we should spend time on, not duplicating the functionality again. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Wed Oct 15 16:08:02 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 14:08:02 -0600 Subject: [Numpy-discussion] Array printing differences between 64- and 32-bit platforms In-Reply-To: <48F61FE8.9030906@jhu.edu> References: <48F61FE8.9030906@jhu.edu> Message-ID: On Wed, Oct 15, 2008 at 10:52 AM, Ken Basye wrote: > Hi Folks, > In porting some code to a 64-bit machine, I ran across the following > issue. > On the 64-bit machine, an array with dtype=int32 prints the dtype > explicitly, whereas on > a 32-bit machine it doesn't. The same is true for dtype=intc (since > 'intc is int32' --> True), > and the converse is true for dtype=int64 and dtype=longlong. Arrays > with dtype of plain int > both print without the dtype, but then you're not using the same > underlying size anymore. > I think this is handled in core/numeric.py in array_repr and just > above, where certain types > are added to _typelessdata if they are subclasses of 'int'. The > issubclass test returns different values
The > issubclass test returns different values > on platforms with different word lengths. > This difference can make doctests fail and although there are doctest > tricks to prevent the failure, > they're a bit annoying to use. I was wondering about introducing a new > print setting that forced > dtypes to be printed always. Is there any support for that? > Any other ideas would also be most welcome. > I'm inclined to say always print the type, but that is a behavior change and might break some current code. I'm not sure how we should handle small fixups like that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Oct 15 16:21:55 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 14:21:55 -0600 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> Message-ID: On Wed, Oct 15, 2008 at 2:04 PM, Robert Kern wrote: > On Wed, Oct 15, 2008 at 14:49, Charles R Harris > wrote: > > > > On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern > wrote: > >> > >> On Wed, Oct 15, 2008 at 00:23, Charles R Harris > >> wrote: > >> > Hi All, > >> > > >> > numpy.linalg has qr and cholesky factorizations, but LU factorization > is > >> > only available in scipy. That doesn't seem quite right. I think is > would > >> > make sense to include the LU factorization in numpy among the basic > >> > linalg > >> > operations, and probably LU_solve also. Thoughts? > >> > >> -1. As far as I am concerned, numpy.linalg exists because Numeric had > >> LinearAlgebra, and we had to provide it to allow people to upgrade. I > >> do not want to see an expansion of functionality to maintain. > > > > I would be happier with that argument if scipy was broken into separately > > downloadable modules and released on a regular schedule. > > Then that is the deficiency that we should spend time on, not > duplicating the functionality again. > Should we break out the linear algebra part of scipy and make it a separate package? I suspect that would add to the build burden, because we would then have a new package to maintain and release binaries for. I don't see the problem with having a bit of linear algebra as part of the numpy base package. My own feeling is that numpy isn't the bare core of array functionality, rather it is the elementary or student version with enough functionality to be useful while scipy adds advanced features that commercial packages would charge extra for. To some extent this is also a matter of hierarchy, as numpy includes functions used by packages further up the food chain. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 15 16:26:59 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Oct 2008 15:26:59 -0500 Subject: [Numpy-discussion] LU factorization? 
In-Reply-To: References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> Message-ID: <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> On Wed, Oct 15, 2008 at 15:21, Charles R Harris wrote: > > On Wed, Oct 15, 2008 at 2:04 PM, Robert Kern wrote: >> >> On Wed, Oct 15, 2008 at 14:49, Charles R Harris >> wrote: >> > >> > On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern >> > wrote: >> >> >> >> On Wed, Oct 15, 2008 at 00:23, Charles R Harris >> >> wrote: >> >> > Hi All, >> >> > >> >> > numpy.linalg has qr and cholesky factorizations, but LU factorization >> >> > is >> >> > only available in scipy. That doesn't seem quite right. I think is >> >> > would >> >> > make sense to include the LU factorization in numpy among the basic >> >> > linalg >> >> > operations, and probably LU_solve also. Thoughts? >> >> >> >> -1. As far as I am concerned, numpy.linalg exists because Numeric had >> >> LinearAlgebra, and we had to provide it to allow people to upgrade. I >> >> do not want to see an expansion of functionality to maintain. >> > >> > I would be happier with that argument if scipy was broken into >> > separately >> > downloadable modules and released on a regular schedule. >> >> Then that is the deficiency that we should spend time on, not >> duplicating the functionality again. > > Should we break out the linear algebra part of scipy and make it a separate > package? I suspect that would add to the build burden, because we would then > have a new package to maintain and release binaries for. I don't see the > problem with having a bit of linear algebra as part of the numpy base > package. Which bits? The current set has worked fine for more than 10 years. Where do we stop? There will always be someone who wants just one more function. And a case can always be made that adding just that function is reasonable. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Wed Oct 15 16:33:47 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 14:33:47 -0600 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> Message-ID: On Wed, Oct 15, 2008 at 2:26 PM, Robert Kern wrote: > On Wed, Oct 15, 2008 at 15:21, Charles R Harris > wrote: > > > > On Wed, Oct 15, 2008 at 2:04 PM, Robert Kern > wrote: > >> > >> On Wed, Oct 15, 2008 at 14:49, Charles R Harris > >> wrote: > >> > > >> > On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern > >> > wrote: > >> >> > >> >> On Wed, Oct 15, 2008 at 00:23, Charles R Harris > >> >> wrote: > >> >> > Hi All, > >> >> > > >> >> > numpy.linalg has qr and cholesky factorizations, but LU > factorization > >> >> > is > >> >> > only available in scipy. That doesn't seem quite right. I think is > >> >> > would > >> >> > make sense to include the LU factorization in numpy among the basic > >> >> > linalg > >> >> > operations, and probably LU_solve also. Thoughts? > >> >> > >> >> -1. As far as I am concerned, numpy.linalg exists because Numeric had > >> >> LinearAlgebra, and we had to provide it to allow people to upgrade. 
I > >> >> do not want to see an expansion of functionality to maintain. > >> > > >> > I would be happier with that argument if scipy was broken into > >> > separately > >> > downloadable modules and released on a regular schedule. > >> > >> Then that is the deficiency that we should spend time on, not > >> duplicating the functionality again. > > > > Should we break out the linear algebra part of scipy and make it a > separate > > package? I suspect that would add to the build burden, because we would > then > > have a new package to maintain and release binaries for. I don't see the > > problem with having a bit of linear algebra as part of the numpy base > > package. > > Which bits? The current set has worked fine for more than 10 years. > Where do we stop? There will always be someone who wants just one more > function. And a case can always be made that adding just that function > is reasonable. > I would just add the bits that are already there and don't add any extra dependencies, i.e., they are there when numpy is built without ATLAS or other external packages. The determinant function in linalg uses the LU decomposition, so I don't see why that shouldn't be available to the general user. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Wed Oct 15 16:38:36 2008 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 15 Oct 2008 16:38:36 -0400 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> Message-ID: <48F654CC.9050501@american.edu> On 10/15/2008 4:26 PM Robert Kern apparently wrote: > Which bits? Those in lapack_lite? Alan Isaac From robert.kern at gmail.com Wed Oct 15 16:49:31 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Oct 2008 15:49:31 -0500 Subject: [Numpy-discussion] LU factorization? In-Reply-To: References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> Message-ID: <3d375d730810151349p50d19c80v7dc8677315ae522b@mail.gmail.com> On Wed, Oct 15, 2008 at 15:33, Charles R Harris wrote: > > On Wed, Oct 15, 2008 at 2:26 PM, Robert Kern wrote: >> >> On Wed, Oct 15, 2008 at 15:21, Charles R Harris >> wrote: >> > >> > On Wed, Oct 15, 2008 at 2:04 PM, Robert Kern >> > wrote: >> >> >> >> On Wed, Oct 15, 2008 at 14:49, Charles R Harris >> >> wrote: >> >> > >> >> > On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern >> >> > wrote: >> >> >> >> >> >> On Wed, Oct 15, 2008 at 00:23, Charles R Harris >> >> >> wrote: >> >> >> > Hi All, >> >> >> > >> >> >> > numpy.linalg has qr and cholesky factorizations, but LU >> >> >> > factorization >> >> >> > is >> >> >> > only available in scipy. That doesn't seem quite right. I think is >> >> >> > would >> >> >> > make sense to include the LU factorization in numpy among the >> >> >> > basic >> >> >> > linalg >> >> >> > operations, and probably LU_solve also. Thoughts? >> >> >> >> >> >> -1. As far as I am concerned, numpy.linalg exists because Numeric >> >> >> had >> >> >> LinearAlgebra, and we had to provide it to allow people to upgrade. >> >> >> I >> >> >> do not want to see an expansion of functionality to maintain. 
>> >> > >> >> > I would be happier with that argument if scipy was broken into >> >> > separately >> >> > downloadable modules and released on a regular schedule. >> >> >> >> Then that is the deficiency that we should spend time on, not >> >> duplicating the functionality again. >> > >> > Should we break out the linear algebra part of scipy and make it a >> > separate >> > package? I suspect that would add to the build burden, because we would >> > then >> > have a new package to maintain and release binaries for. I don't see the >> > problem with having a bit of linear algebra as part of the numpy base >> > package. >> >> Which bits? The current set has worked fine for more than 10 years. >> Where do we stop? There will always be someone who wants just one more >> function. And a case can always be made that adding just that function >> is reasonable. > > I would just add the bits that are already there and don't add any extra > dependencies, i.e., they are there when numpy is built without ATLAS or > other external packages. The determinant function in linalg uses the LU > decomposition, so I don't see why that shouldn't be available to the general > user. I'm softening to this argument. But mostly because it will be you who will have to defend this arbitrary line in the sand in the future rather than me. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Wed Oct 15 16:50:49 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 15 Oct 2008 22:50:49 +0200 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> Message-ID: <9457e7c80810151350x7e8e1dd7nc05796b6733a4aa6@mail.gmail.com> 2008/10/15 Robert Kern : > Which bits? The current set has worked fine for more than 10 years. I'm surprised no-one has requested the LU decomposition in NumPy before -- it is a fundamental building block in linear algebra. I think it is going too far, stating that NumPy's linear algebra module serves simply as an upgrade path for those coming from Numeric. Its use has developed far beyond that. > Where do we stop? There will always be someone who wants just one more > function. And a case can always be made that adding just that function > is reasonable. I'd rather we examine each function on a case-by-case basis, than to put a solid freeze on NumPy. Regards St?fan From oliphant at enthought.com Wed Oct 15 17:22:25 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Wed, 15 Oct 2008 16:22:25 -0500 Subject: [Numpy-discussion] LU factorization? In-Reply-To: References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> Message-ID: <48F65F11.5050506@enthought.com> Charles R Harris wrote: > > > I would just add the bits that are already there and don't add any > extra dependencies, i.e., they are there when numpy is built without > ATLAS or other external packages. The determinant function in linalg > uses the LU decomposition, so I don't see why that shouldn't be > available to the general user. 
If LU is already part of lapack_lite and somebody is willing to put in the work to expose the functionality to the end user in a reasonable way, then I think it should be added. -Travis From ellisonbg.net at gmail.com Wed Oct 15 22:58:34 2008 From: ellisonbg.net at gmail.com (Brian Granger) Date: Wed, 15 Oct 2008 19:58:34 -0700 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <48F65F11.5050506@enthought.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> <48F65F11.5050506@enthought.com> Message-ID: <6ce0ac130810151958q200e555cl1d6ece2f4921ffff@mail.gmail.com> > If LU is already part of lapack_lite and somebody is willing to put in > the work to expose the functionality to the end user in a reasonable > way, then I think it should be added. +1 From charlesr.harris at gmail.com Thu Oct 16 00:41:28 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 22:41:28 -0600 Subject: [Numpy-discussion] LU factorization? In-Reply-To: <6ce0ac130810151958q200e555cl1d6ece2f4921ffff@mail.gmail.com> References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> <48F65F11.5050506@enthought.com> <6ce0ac130810151958q200e555cl1d6ece2f4921ffff@mail.gmail.com> Message-ID: OK, I take this as a go ahead with the proviso that it's my problem. The big question is naming. Scipy has lu lu_factor lu_solve cholesky cho_factor cho_solve The code for lu and lu_factor isn't the same, although they both look to call the same underlying function; the same is true of the cholesky code. I also see various functions with the same names as their numpy counterparts. So my inclination would be to use lu and lu_solve. Likewise, maybe add cho_solve to complement cholesky. I don't have strong preferences one way or the other. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Oct 16 01:59:40 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Oct 2008 23:59:40 -0600 Subject: [Numpy-discussion] LU factorization? In-Reply-To: References: <3d375d730810151206j702ea914tea5105c7c286a158@mail.gmail.com> <3d375d730810151304j42560063mee994bf4f21cea4c@mail.gmail.com> <3d375d730810151326r2ef3aabas4089a541cd32eb9@mail.gmail.com> <48F65F11.5050506@enthought.com> <6ce0ac130810151958q200e555cl1d6ece2f4921ffff@mail.gmail.com> Message-ID: On Wed, Oct 15, 2008 at 10:41 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > OK, I take this as a go ahead with the proviso that it's my problem. The > big question is naming. Scipy has > > lu > lu_factor > lu_solve > > cholesky > > cho_factor > cho_solve > > The code for lu and lu_factor isn't the same, although they both look to > call the same underlying function; the same is true of the cholesky code. I > also see various functions with the same names as their numpy counterparts. > So my inclination would be to use lu and lu_solve. Likewise, maybe add > cho_solve to complement cholesky. I don't have strong preferences one way or > the other. Thoughts? > Looks like dpotrs isn't in lapack_lite (cho_solve), but dgetrs (lu_solve) is, although currently missing an interface to python. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
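For reference, a small sketch of the scipy calling convention being weighed here, on a toy 2x2 system (the numpy-side names remain the open question above):

>>> import numpy as np
>>> from scipy import linalg
>>> a = np.array([[3., 1.], [1., 2.]])
>>> lu, piv = linalg.lu_factor(a)                      # factor once
>>> x = linalg.lu_solve((lu, piv), np.array([9., 8.])) # reuse per right-hand side
>>> np.allclose(np.dot(a, x), [9., 8.])
True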
From animator333 at yahoo.com Thu Oct 16 07:12:06 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 16 Oct 2008 16:42:06 +0530 (IST) Subject: [Numpy-discussion] numpy fileIO Message-ID: <365066.54989.qm@web94913.mail.in2.yahoo.com> Hi, I have never used numpy in my python applications until now. I am writing a python/openGL based tool to manipulate 3d geometrical data(meshes, curve, points etc.) I would be using numpy to store these objects(vertices, edges, faces, index etc.) at run time. One of my major concerns is numpy's file IO capabilities. Geometrical objects would be stored in structures, for example a logical structure to store a mesh goes like this: struct( name(string) vertex(numpy.float16)[x, y, z] normal(numpy.float16)[x, y, z] edges(numpy.int) st(2d numpy.float16)[u, v] index(numpy.int) ) There would be different structures for curve, patch, points and the rest of the primitives. I am sure numpy users must have encountered a similar scenario where you need to dump this data to a file and read it back. In my case, a general assumption of nearly 50-150 megs of data would be considered as normal size. Before I go deep into coding, it would be great if numpy users can share their experience for the task. I am also open for unconventional or off the route methods, provided they can do the job. (C/C++, 3rd party modules etc.) Here is the summary of IO operations I would be working on: 1. Write different structures to a file. 2. Read data back from file. 3. If a structure can be tagged (int or string) then read a particular structure using its tag, from file. Hope to hear from numpy users soon :-) Regards Prashant Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.roux at st.com Thu Oct 16 08:52:41 2008 From: nicolas.roux at st.com (Nicolas ROUX) Date: Thu, 16 Oct 2008 14:52:41 +0200 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <001501c92f8a$8261a740$04000100@gnb.st.com> Message-ID: <001a01c92f8e$147ede10$04000100@gnb.st.com> Hello, Under windows, using the latest numpy 1.2.0 and Swig release, I can't find the numpy.i file anymore. I used to find it in numpy/doc/ in the 1.0.4 release. I tried to look for some info on the web without any success. Thanks for your help, Cheers, Nicolas. From stefan at sun.ac.za Thu Oct 16 09:06:18 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 16 Oct 2008 15:06:18 +0200 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <001a01c92f8e$147ede10$04000100@gnb.st.com> References: <001501c92f8a$8261a740$04000100@gnb.st.com> <001a01c92f8e$147ede10$04000100@gnb.st.com> Message-ID: <9457e7c80810160606k4f35a5dmd06a39b096f0c138@mail.gmail.com> 2008/10/16 Nicolas ROUX : > Under windows, using the latest numpy 1.2.0 and Swig release, I can't find the numpy.i file anymore. > I used to find it in numpy/doc/ in the 1.0.4 release. The docs dir has moved: http://projects.scipy.org/scipy/numpy/browser/trunk/doc/swig Cheers Stefan From nicolas.roux at st.com Thu Oct 16 09:10:08 2008 From: nicolas.roux at st.com (Nicolas ROUX) Date: Thu, 16 Oct 2008 15:10:08 +0200 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <9457e7c80810160606k4f35a5dmd06a39b096f0c138@mail.gmail.com> Message-ID: <001b01c92f90$83c9db10$04000100@gnb.st.com> Thanks for your reply ;-) In fact, I was talking about the 1.2.0 release installer, which no longer includes the numpy.i file.
Is this numpy.i no longer part of the binary delivery/installer? Should I take it manually/directly from the SVN repository, instead of it being installed automatically? Cheers, Nicolas. -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Stefan van der Walt Sent: Thursday, October 16, 2008 3:06 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Missing numpy.i 2008/10/16 Nicolas ROUX : > Under windows, using the latest numpy 1.2.0 and Swig release, I can't find the numpy.i file anymore. > I used to find it in numpy/doc/ in the 1.0.4 release. The docs dir has moved: http://projects.scipy.org/scipy/numpy/browser/trunk/doc/swig Cheers Stefan _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From stefan at sun.ac.za Thu Oct 16 09:20:14 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 16 Oct 2008 15:20:14 +0200 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <001b01c92f90$83c9db10$04000100@gnb.st.com> References: <9457e7c80810160606k4f35a5dmd06a39b096f0c138@mail.gmail.com> <001b01c92f90$83c9db10$04000100@gnb.st.com> Message-ID: <9457e7c80810160620i2aeec4e3o4df1ae82906a1490@mail.gmail.com> 2008/10/16 Nicolas ROUX : > Thanks for your reply ;-) > > In fact, I was talking about the 1.2.0 release installer, which no longer > includes the numpy.i file. This may have been an oversight. The docs directory moved out of the source tree, so it needs to be added to the installer separately. David, could we install the docs dir as well? You should be able to use the numpy.i from the URL I provided. Cheers Stefan From nadavh at visionsense.com Thu Oct 16 10:43:38 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 16 Oct 2008 16:43:38 +0200 Subject: [Numpy-discussion] numpy fileIO References: <365066.54989.qm@web94913.mail.in2.yahoo.com> Message-ID: <710F2847B0018641891D9A216027636029C2D6@ex3.envision.co.il> Did you consider VTK? I've used it a *little*: probably it contains all the structures you need, along with C++ routines for I/O, manipulation and (OpenGL) display, and a python interface. Nadav. -----Original Message----- From: numpy-discussion-bounces at scipy.org On Behalf Of Prashant Saxena Sent: Thursday, 16-October-08 13:12 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] numpy fileIO Hi, I have never used numpy in my python applications until now. I am writing a python/openGL based tool to manipulate 3d geometrical data(meshes, curve, points etc.) I would be using numpy to store these objects(vertices, edges, faces, index etc.) at run time. One of my major concerns is numpy's file IO capabilities. Geometrical objects would be stored in structures, for example a logical structure to store a mesh goes like this: struct( name(string) vertex(numpy.float16)[x, y, z] normal(numpy.float16)[x, y, z] edges(numpy.int) st(2d numpy.float16)[u, v] index(numpy.int) ) There would be different structures for curve, patch, points and the rest of the primitives. I am sure numpy users must have encountered a similar scenario where you need to dump this data to a file and read it back. In my case, a general assumption of nearly 50-150 megs of data would be considered as normal size. Before I go deep into coding, it would be great if numpy users can share their experience for the task.
I am also open for unconventional or off the route methods, provided they can do the job. (C/C++, 3rd party modules etc.) Here is the summary of IO operations I would be working on: 1. Write different structures to a file. 2. Read data back from file. 3. If a structure can be tagged (int or string) then read a particular structure using its tag, from file. Hope to hear from numpy users soon :-) Regards Prashant Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3716 bytes Desc: not available URL: From animator333 at yahoo.com Thu Oct 16 11:25:46 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 16 Oct 2008 20:55:46 +0530 (IST) Subject: [Numpy-discussion] numpy fileIO Message-ID: <321698.44581.qm@web94907.mail.in2.yahoo.com> I am seeing all the OSS for this purpose but I would stick to using pure python and the scene graph I am developing for the application. I already did some tests using pyOpenGL/python/wx.GLcanvas and a large data set of roughly 4000+ objects consisting of nearly 1 million polygons using display lists, and the results were more than satisfactory. Although it was done quickly and performance can be improved further by code optimizations. numpy can play a crucial role in my application and I have to make sure before pinning down any decisions regarding tools and techniques I would be using. Prashant ----- Original Message ---- From: Nadav Horesh To: Discussion of Numerical Python Sent: Thursday, 16 October, 2008 8:13:38 PM Subject: Re: [Numpy-discussion] numpy fileIO Did you consider VTK? I've used it a *little*: probably it contains all the structures you need, along with C++ routines for I/O, manipulation and (OpenGL) display, and a python interface. Nadav. -----Original Message----- From: numpy-discussion-bounces at scipy.org On Behalf Of Prashant Saxena Sent: Thursday, 16-October-08 13:12 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] numpy fileIO Hi, I have never used numpy in my python applications until now. I am writing a python/openGL based tool to manipulate 3d geometrical data(meshes, curve, points etc.) I would be using numpy to store these objects(vertices, edges, faces, index etc.) at run time. One of my major concerns is numpy's file IO capabilities. Geometrical objects would be stored in structures, for example a logical structure to store a mesh goes like this: struct( name(string) vertex(numpy.float16)[x, y, z] normal(numpy.float16)[x, y, z] edges(numpy.int) st(2d numpy.float16)[u, v] index(numpy.int) ) There would be different structures for curve, patch, points and the rest of the primitives. I am sure numpy users must have encountered a similar scenario where you need to dump this data to a file and read it back. In my case, a general assumption of nearly 50-150 megs of data would be considered as normal size. Before I go deep into coding, it would be great if numpy users can share their experience for the task.
Go to http://messenger.yahoo.com/invite/ Unlimited freedom, unlimited storage. Get it now, on http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists_ravi at lavabit.com Thu Oct 16 12:25:55 2008 From: lists_ravi at lavabit.com (Ravi) Date: Thu, 16 Oct 2008 12:25:55 -0400 Subject: [Numpy-discussion] numpy fileIO In-Reply-To: <365066.54989.qm@web94913.mail.in2.yahoo.com> References: <365066.54989.qm@web94913.mail.in2.yahoo.com> Message-ID: <200810161225.55969.lists_ravi@lavabit.com> On Thursday 16 October 2008 07:12:06 Prashant Saxena wrote: > Here is the summery of IO operations I would be working on: > > 1. Write different structures to a file. > 2. Read data back from file. > 3. if structure can be tagged(int or string) then read a particular > structure using tag, from file. Use HDF5. Can be read/written using C, C++, Python, Matlab, Octave, Fortran and practically any other language you can think of. Python interface is at http://h5py.alfven.org and is also hosted on Google code. Supports tags, properties, etc. This is the format used by NASA for storing their images and will likely be supported for decades to come. Regards, Ravi From Chris.Barker at noaa.gov Thu Oct 16 12:33:43 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 16 Oct 2008 09:33:43 -0700 Subject: [Numpy-discussion] numpy fileIO In-Reply-To: <200810161225.55969.lists_ravi@lavabit.com> References: <365066.54989.qm@web94913.mail.in2.yahoo.com> <200810161225.55969.lists_ravi@lavabit.com> Message-ID: <48F76CE7.9090508@noaa.gov> Ravi wrote: > Use HDF5. Can be read/written using C, C++, Python, Matlab, Octave, Fortran > and practically any other language you can think of. Python interface is at > http://h5py.alfven.org there is also pytables, which providers a nice numpy/HDF interface. http://www.pytables.org/ -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From faltet at pytables.org Thu Oct 16 12:33:22 2008 From: faltet at pytables.org (Francesc Alted) Date: Thu, 16 Oct 2008 18:33:22 +0200 Subject: [Numpy-discussion] numpy fileIO In-Reply-To: <365066.54989.qm@web94913.mail.in2.yahoo.com> References: <365066.54989.qm@web94913.mail.in2.yahoo.com> Message-ID: <200810161833.22552.faltet@pytables.org> A Thursday 16 October 2008, Prashant Saxena escrigu?: > Hi, > > I have never used numpy in my python applications until now. I am > writing a python/openGL based tool to manipulate 3d geometrical > data(meshes, curve, points etc.) I would be using numpy to store > these objects(vertices, edges, faces, index etc.) at run time. One of > my major concern is numpy's file IO capabilities. Geometrical objects > would be stored in a structures, for example a logical structure to > store a mesh goes like this: > > struct( > name(string) > vertex(numpy.float16)[x, y, z] > normal(numpy.float16)[x, y, z] > edges(numpy.int) > st(2d numpy.float16)[u, v] > index(numpy.int) > ) > > There would be different structures for curve, patch, points and rest > of the primitives. I am sure numpy users must have encounered similar > scenerio where you need to dump this data to a file and read it back. > In my case, a general assumption of nearly 50-150 megs of data would > be considered as normal size. 
Before I go deep into coding, it would > be great if numpy users can share their experience for the task. > > I am also open for unconventional or off the route methods, provided > they can do the job. (C/C++, 3rd party modules etc.) Here is the > summary of IO operations I would be working on: > > 1. Write different structures to a file. > 2. Read data back from file. > 3. If a structure can be tagged (int or string) then read a particular > structure using its tag, from file. Your structure seems like it can be expressed through a record array [1]. If that is the case, just try the tofile()/fromfile() utilities included in numpy for saving/retrieving them. If you need more functionality for recarray I/O, you can have a look at pytables [2], which allows you to directly access the recarrays on disk row by row (even if compressed) or use complex expressions to quickly select interesting parts, among other niceties. If your data structures can't fit into recarray structures directly, you may want to try to decompose them into smaller constructs and relate them in the file through pointers (indices to other constructs). [1] http://www.scipy.org/RecordArrays [2] http://www.pytables.org Hope that helps, -- Francesc Alted
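A minimal sketch of the record-array suggestion, under simplifying assumptions: the field names are borrowed loosely from the struct above, float32 stands in for the poster's float16 (which numpy does not provide), and the file name is hypothetical:

import numpy as np

# Illustrative, trimmed vertex record for the mesh structure sketched above.
vertex_dt = np.dtype([('vertex', np.float32, 3),
                      ('normal', np.float32, 3),
                      ('index', np.int32)])

mesh = np.zeros(1000, dtype=vertex_dt)    # one record per vertex
mesh['vertex'] = np.random.rand(1000, 3)  # fill the coordinate field

mesh.tofile('mesh.dat')                   # raw binary dump to disk
back = np.fromfile('mesh.dat', dtype=vertex_dt)
assert (back['vertex'] == mesh['vertex']).all()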
From cournape at gmail.com Thu Oct 16 13:04:33 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Oct 2008 02:04:33 +0900 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <9457e7c80810160620i2aeec4e3o4df1ae82906a1490@mail.gmail.com> References: <9457e7c80810160606k4f35a5dmd06a39b096f0c138@mail.gmail.com> <001b01c92f90$83c9db10$04000100@gnb.st.com> <9457e7c80810160620i2aeec4e3o4df1ae82906a1490@mail.gmail.com> Message-ID: <5b8d13220810161004k60b43c0fyec9ed0462b9b0386@mail.gmail.com> On Thu, Oct 16, 2008 at 10:20 PM, Stéfan van der Walt wrote: > This may have been an oversight. The docs directory moved out of the > source tree, so it needs to be added to the installer separately. > David, could we install the docs dir as well? Yes, as long as it is handled by distutils. David From animator333 at yahoo.com Thu Oct 16 13:05:16 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 16 Oct 2008 22:35:16 +0530 (IST) Subject: [Numpy-discussion] numpy fileIO Message-ID: <915696.58152.qm@web94911.mail.in2.yahoo.com> I have just started doing my tests using numpy and h5py( http://h5py.alfven.org/ ). Hope either of these would give me what I need. Prashant ----- Original Message ---- From: Francesc Alted To: Discussion of Numerical Python Sent: Thursday, 16 October, 2008 10:03:22 PM Subject: Re: [Numpy-discussion] numpy fileIO A Thursday 16 October 2008, Prashant Saxena escrigué: > Hi, > > I have never used numpy in my python applications until now. I am > writing a python/openGL based tool to manipulate 3d geometrical > data(meshes, curve, points etc.) I would be using numpy to store > these objects(vertices, edges, faces, index etc.) at run time. One of > my major concerns is numpy's file IO capabilities. Geometrical objects > would be stored in structures, for example a logical structure to > store a mesh goes like this: > > struct( > name(string) > vertex(numpy.float16)[x, y, z] > normal(numpy.float16)[x, y, z] > edges(numpy.int) > st(2d numpy.float16)[u, v] > index(numpy.int) > ) > > There would be different structures for curve, patch, points and the rest > of the primitives. I am sure numpy users must have encountered a similar > scenario where you need to dump this data to a file and read it back. > In my case, a general assumption of nearly 50-150 megs of data would > be considered as normal size. Before I go deep into coding, it would > be great if numpy users can share their experience for the task. > > I am also open for unconventional or off the route methods, provided > they can do the job. (C/C++, 3rd party modules etc.) Here is the > summary of IO operations I would be working on: > > 1. Write different structures to a file. > 2. Read data back from file. > 3. If a structure can be tagged (int or string) then read a particular > structure using its tag, from file. Your structure seems like it can be expressed through a record array [1]. If that is the case, just try the tofile()/fromfile() utilities included in numpy for saving/retrieving them. If you need more functionality for recarray I/O, you can have a look at pytables [2], which allows you to directly access the recarrays on disk row by row (even if compressed) or use complex expressions to quickly select interesting parts, among other niceties. If your data structures can't fit into recarray structures directly, you may want to try to decompose them into smaller constructs and relate them in the file through pointers (indices to other constructs). [1] http://www.scipy.org/RecordArrays [2] http://www.pytables.org Hope that helps, -- Francesc Alted _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hetland at tamu.edu Thu Oct 16 14:25:41 2008 From: hetland at tamu.edu (Rob Hetland) Date: Thu, 16 Oct 2008 13:25:41 -0500 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> Message-ID: <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> On Oct 14, 2008, at 12:56 AM, Stéfan van der Walt wrote: > Here is an implementation in Python, ctypes and in weave: > > http://mentat.za.net/source/pnpoly.tar.bz2 > > Regards > Stéfan This question gets asked about once a month on the mailing list. Perhaps pnpoly could find a permanent home in scipy? (or somewhere?) Obviously, many would find it useful. -Rob ---- Rob Hetland, Associate Professor Dept. of Oceanography, Texas A&M University http://pong.tamu.edu/~rob phone: 979-458-0096, fax: 979-845-6331 From jdh2358 at gmail.com Thu Oct 16 14:54:48 2008 From: jdh2358 at gmail.com (John Hunter) Date: Thu, 16 Oct 2008 13:54:48 -0500 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> Message-ID: <88e473830810161154n23bce8fax8a881c86fa24c416@mail.gmail.com> On Thu, Oct 16, 2008 at 1:25 PM, Rob Hetland wrote: > This question gets asked about once a month on the mailing list. > Perhaps pnpoly could find a permanent home in scipy? (or somewhere?) > Obviously, many would find it useful. It is already in matplotlib In [1]: import matplotlib.nxutils as nx In [2]: nx.pnpoly Out[2]: In [3]: nx.points_inside_poly Out[3]: but one of us should add it to the FAQ!
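A quick sketch of the second function on toy data, a unit square:

>>> import numpy as np
>>> import matplotlib.nxutils as nx
>>> square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
>>> points = np.array([[0.5, 0.5], [2., 2.]])
>>> nx.points_inside_poly(points, square)
array([ True, False], dtype=bool)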
From berthe.loic at gmail.com Thu Oct 16 15:22:13 2008 From: berthe.loic at gmail.com (LB) Date: Thu, 16 Oct 2008 12:22:13 -0700 (PDT) Subject: [Numpy-discussion] how to vectorize a row of eigenvalue computations ? Message-ID: I've got an array S of shape (N, 6) with N >> 100000 containing the six components of a stress field given in N points. I need to make a lot of computations of principal stresses, which are in fact the eigenvalues of the stress tensors. I'm using the basic code described below : import numpy as np def calc_principal_stresses(S): """ Return the principal stresses corresponding to the tensor S. S is interpreted as an array containing [Sx, Sy, Sz, Sxy, Syz, Szx] Return array([S3, S2, S1]) """ p_stresses = np.linalg.eigvalsh( np.array( [ [ S[0], S[3], S[5]], [ S[3], S[1], S[4]], [ S[5], S[4], S[2]], ])) return np.sort(p_stresses) p_stresses = np.array([ calc_principal_stresses(s) for s in S]) Aside from putting the sort function outside the loop, is there any way to optimize or vectorize this kind of operation ? Regards, -- LB From hetland at tamu.edu Thu Oct 16 15:28:05 2008 From: hetland at tamu.edu (Rob Hetland) Date: Thu, 16 Oct 2008 14:28:05 -0500 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <88e473830810161154n23bce8fax8a881c86fa24c416@mail.gmail.com> References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> <88e473830810161154n23bce8fax8a881c86fa24c416@mail.gmail.com> Message-ID: On Oct 16, 2008, at 1:54 PM, John Hunter wrote: > It is already in matplotlib > > In [1]: import matplotlib.nxutils as nx > > In [2]: nx.pnpoly > Out[2]: > > In [3]: nx.points_inside_poly > Out[3]: > > but one of us should add it to the FAQ! I did not know that very useful thing. But now I do. This is solid proof that lurking on the mailing lists makes you smarter. Thanks! -Rob ---- Rob Hetland, Associate Professor Dept. of Oceanography, Texas A&M University http://pong.tamu.edu/~rob phone: 979-458-0096, fax: 979-845-6331 From gruben at bigpond.net.au Thu Oct 16 18:06:25 2008 From: gruben at bigpond.net.au (Gary Ruben) Date: Fri, 17 Oct 2008 09:06:25 +1100 Subject: [Numpy-discussion] numpy fileIO In-Reply-To: <321698.44581.qm@web94907.mail.in2.yahoo.com> References: <321698.44581.qm@web94907.mail.in2.yahoo.com> Message-ID: <48F7BAE1.8060905@bigpond.net.au> Prashant Saxena wrote: > I am seeing all the OSS for this purpose but I would stick to using pure > python and the scene graph I am developing for the application. I > already did some tests using pyOpenGL/python/wx.GLcanvas and a large data > set of roughly 4000+ objects consisting of nearly 1 million polygons using > display lists, and the results were more than satisfactory. Although it was > done quickly and performance can be improved further by code > optimizations. numpy can play a crucial role > in my application and I have to make sure before pinning down any > decisions regarding tools and techniques I would be using. > > Prashant Are you also aware of the OSG wrapper efforts for Python, which are here?: See also the links at the bottom of that page. Gary R. From charlesr.harris at gmail.com Thu Oct 16 22:54:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Oct 2008 20:54:44 -0600 Subject: [Numpy-discussion] how to vectorize a row of eigenvalue computations ?
In-Reply-To: References: Message-ID: On Thu, Oct 16, 2008 at 1:22 PM, LB wrote: > I've got an array S of shape (N, 6) with N >> 100000 containing the > six components of a stress field given in N points. > > I need to make a lot of computation of principal stresses, which are > in fact the eigenvalues of the stress tensors. > > I'm using the basic code described below : > > import numpy as np > > def calc_principal_stresses(S): > """ Return the principal stress corresponding to the tensor S. > S is interpreted as an array containing [Sx, Sy, Sz, Sxy, Syz, > Szx] > > Return array([S3, S2, S1]) > """ > p_stresses = np.linalg.eigvalsh( np.array( > [ [ S[0], S[3], S[5]], > [ S[3], S[1], S[4]], > [ S[5], S[4], S[2]], > ])) > > return p_stresses.sort() > > p_stresses = array([ calc_principal_stresses(s) for s in S]) > Something like this? In [1]: m = np.zeros((2,6)) + np.arange(6) In [2]: m = m[:,[0,3,5,3,1,4,5,4,2]].reshape(-1,3,3) In [3]: p_stresses = np.sort([np.linalg.eigvalsh(i) for i in m], axis = -1) In [4]: p_stresses Out[4]: array([[-4.11645769, -2.04234215, 9.15879984], [-4.11645769, -2.04234215, 9.15879984]]) You might want to break this into more statements for clarity. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Fri Oct 17 03:03:27 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 17 Oct 2008 09:03:27 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.2.0 In-Reply-To: References: <1222676350.21860.13.camel@bbc8> Message-ID: > BTW, I can confirm that the latest official MKL does not work with > numpy, as it is explained on the Intel forum > (http://software.intel.com/en-us/forums/intel-math-kernel-library/topic/60460). > I get the i_free not defined issue. For those who run into this issue, you have to use MKL 10.0.2 which does nto have the issue. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From stefan at sun.ac.za Fri Oct 17 04:24:19 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 17 Oct 2008 10:24:19 +0200 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> Message-ID: <9457e7c80810170124g79a11145o9f367f697384e669@mail.gmail.com> 2008/10/16 Rob Hetland : > > On Oct 14, 2008, at 12:56 AM, St?fan van der Walt wrote: > >> Here is an implementation in Python, ctypes and in weave: >> >> http://mentat.za.net/source/pnpoly.tar.bz2 >> >> Regards >> St?fan > > This question gets asked about once a month on the mailing list. > Perhaps pnpoly could find a permanent home in scipy? (or somewhere?) > Obviously, many would find it useful. Maybe scipy.spatial could accommodate it? St?fan From affoster at gmail.com Fri Oct 17 05:00:51 2008 From: affoster at gmail.com (Adam Foster) Date: Fri, 17 Oct 2008 10:00:51 +0100 Subject: [Numpy-discussion] Request for Python 2.6 win32 package Message-ID: Please add numpy 1.2.0 win32 package for python 2.6 -------------- next part -------------- An HTML attachment was scrubbed... 
From matthieu.brucher at gmail.com Fri Oct 17 03:03:27 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 17 Oct 2008 09:03:27 +0200
Subject: [Numpy-discussion] ANN: NumPy 1.2.0
In-Reply-To: References: <1222676350.21860.13.camel@bbc8> Message-ID:

> BTW, I can confirm that the latest official MKL does not work with
> numpy, as it is explained on the Intel forum
> (http://software.intel.com/en-us/forums/intel-math-kernel-library/topic/60460).
> I get the i_free not defined issue.

For those who run into this issue, you have to use MKL 10.0.2, which does
not have the issue.

Matthieu
--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher

From stefan at sun.ac.za Fri Oct 17 04:24:19 2008
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 17 Oct 2008 10:24:19 +0200
Subject: [Numpy-discussion] how to tell if a point is inside a polygon
In-Reply-To: <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu>
References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu>
Message-ID: <9457e7c80810170124g79a11145o9f367f697384e669@mail.gmail.com>

2008/10/16 Rob Hetland :
>
> On Oct 14, 2008, at 12:56 AM, Stéfan van der Walt wrote:
>
>> Here is an implementation in Python, ctypes and in weave:
>>
>> http://mentat.za.net/source/pnpoly.tar.bz2
>>
>> Regards
>> Stéfan
>
> This question gets asked about once a month on the mailing list.
> Perhaps pnpoly could find a permanent home in scipy? (or somewhere?)
> Obviously, many would find it useful.

Maybe scipy.spatial could accommodate it?

Stéfan

From affoster at gmail.com Fri Oct 17 05:00:51 2008
From: affoster at gmail.com (Adam Foster)
Date: Fri, 17 Oct 2008 10:00:51 +0100
Subject: [Numpy-discussion] Request for Python 2.6 win32 package
Message-ID:

Please add numpy 1.2.0 win32 package for python 2.6
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From david at ar.media.kyoto-u.ac.jp Fri Oct 17 05:04:24 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 17 Oct 2008 18:04:24 +0900
Subject: [Numpy-discussion] Request for Python 2.6 win32 package
In-Reply-To: References: Message-ID: <48F85518.6040307@ar.media.kyoto-u.ac.jp>

Adam Foster wrote:
> Please add numpy 1.2.0 win32 package for python 2.6

Hi,

numpy 1.2 is not buildable with python 2.6. You will have to wait for a
later version, most probably 1.3,

cheers,

David

From affoster at gmail.com Fri Oct 17 09:01:44 2008
From: affoster at gmail.com (Adam Foster)
Date: Fri, 17 Oct 2008 14:01:44 +0100
Subject: [Numpy-discussion] Request for Python 2.6 win32 package
Message-ID:

> numpy 1.2 is not buildable with python 2.6. You will have to wait
> for a later version, most probably 1.3,

Ok thanks David, guess I will have to wait till I can leverage the new
IEEE 754 support in python 2.6
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bertle at smoerz.org Fri Oct 17 09:58:56 2008
From: bertle at smoerz.org (Roman Bertle)
Date: Fri, 17 Oct 2008 15:58:56 +0200
Subject: [Numpy-discussion] choose() broadcasting
In-Reply-To: <48E6319D.7050702@enthought.com>
References: <20081003111152.GA7228@smoerz.org> <20081003131402.GA7652@smoerz.org> <48E6319D.7050702@enthought.com>
Message-ID: <20081017135856.GA6141@smoerz.org>

* Travis E. Oliphant [081003 22:20]:
> Roman Bertle wrote:
> > Hello,
> >
> > I have found something I call a bug in the numpy choose() method and
> > wanted to report it in trac.
> >
> Thanks for your report. I'm not sure why you are having trouble with
> Trac, but I've created a ticket for this problem.

Hello,

trac works for me now. And thank you for fixing the bug, the svn numpy
version works now for me. But there remains an issue I want to report.
choose is much slower in numpy than in numarray, and even more if an output array is specified, as these tests show: import timeit setups = { 'numarray': """ import numarray as N n1, n2 = 4, 1000000 a1 = N.arange(n1*n2, type='Float64', shape=(n1,n2)) a2 = -N.arange(n1*n2, type='Float64', shape=(n1,n2)) a3 = -N.arange(n1*n2, type='Float64', shape=(n1,n2)) b1 = N.arange(n2, type='Float64') b2 = -N.arange(n2, type='Float64') b3 = -N.arange(n2, type='Float64') c = N.remainder(N.arange(n2, type='Int32'),2) """, 'numpy': """ import numpy as N n1, n2 = 4, 1000000 a1 = N.arange(n1*n2, dtype='Float64').reshape((n1,n2)) a2 = -N.arange(n1*n2, dtype='Float64').reshape((n1,n2)) a3 = -N.arange(n1*n2, dtype='Float64').reshape((n1,n2)) b1 = N.arange(n2, dtype='Float64') b2 = -N.arange(n2, dtype='Float64') b3 = -N.arange(n2, dtype='Float64') c = N.remainder(N.arange(n2, dtype='Int32'),2) """ } stmta = "N.choose(c, (a1, a2))" stmtao = "N.choose(c, (a1, a2), a3)" stmtb = "N.choose(c, (b1, b2))" stmtbo = "N.choose(c, (b1, b2), b3)" timeit.Timer(setup=setups['numarray'], stmt=stmta).repeat(3,100) [3.3187780380249023, 3.2966721057891846, 3.3234250545501709] timeit.Timer(setup=setups['numpy'], stmt=stmta).repeat(3,100) [14.842453002929688, 14.833296060562134, 14.836632966995239] timeit.Timer(setup=setups['numarray'], stmt=stmtao).repeat(3,100) [3.1973719596862793, 3.2031948566436768, 3.2093629837036133] timeit.Timer(setup=setups['numpy'], stmt=stmtao).repeat(3,100) [17.546916007995605, 17.548220157623291, 17.536314010620117] timeit.Timer(setup=setups['numarray'], stmt=stmtb).repeat(3,100) [0.6694338321685791, 0.66939401626586914, 0.67307686805725098] timeit.Timer(setup=setups['numpy'], stmt=stmtb).repeat(3,100) [3.7615809440612793, 3.7627589702606201, 3.7547731399536133] timeit.Timer(setup=setups['numarray'], stmt=stmtbo).repeat(3,100) [0.67037606239318848, 0.67186903953552246, 0.66994881629943848] timeit.Timer(setup=setups['numpy'], stmt=stmtbo).repeat(3,100) [4.4750981330871582, 4.4650890827178955, 4.4679431915283203] Best Regards, Roman From jdh2358 at gmail.com Fri Oct 17 10:40:06 2008 From: jdh2358 at gmail.com (John Hunter) Date: Fri, 17 Oct 2008 09:40:06 -0500 Subject: [Numpy-discussion] how to tell if a point is inside a polygon In-Reply-To: References: <48F3D99C.1050301@jpl.nasa.gov> <9457e7c80810132256r3e4706dapf0285a63218d2e1c@mail.gmail.com> <76B9FA0B-BEA9-48F8-AEF7-EDD4610760A6@tamu.edu> <88e473830810161154n23bce8fax8a881c86fa24c416@mail.gmail.com> Message-ID: <88e473830810170740h7c30b858ue4e3b72f5b6cffc6@mail.gmail.com> On Thu, Oct 16, 2008 at 2:28 PM, Rob Hetland wrote: > I did not know that very useful thing. But now I do. This is solid > proof that lurking on the mailing lists makes you smarter. and that our documentation effort still has a long way to go ! FAQ added at http://matplotlib.sourceforge.net/faq/howto_faq.html?#how-do-i-test-whether-a-point-is-inside-a-polygon though I am having trouble getting the module functions pnpoly and points_inside_poly to show up in the sphinx automodule documentation for nxutils. These functions are defined in extension code and I have a post in to the sphinx mailing list http://groups.google.com/group/sphinx-dev/t/7ad1631d3117e4eb but if anyone on this list has seen problems with automodule and extension code functions, and knows how to fix them, let me know. 
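For readers arriving from that FAQ entry, a minimal hedged sketch of the
two nxutils calls discussed in this thread, assuming the matplotlib of this
era (nxutils was later removed in favor of matplotlib.path.Path.contains_points);
the square and the test points are made up:

import numpy as np
import matplotlib.nxutils as nx

# polygon as an (N, 2) array of vertices -- a unit square here
verts = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
points = np.array([[0.5, 0.5], [2.0, 2.0]])

print(nx.pnpoly(0.5, 0.5, verts))             # 1: single point, inside
print(nx.points_inside_poly(points, verts))   # [ True False]: vectorized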
From tonyyu at MIT.EDU Fri Oct 17 15:27:58 2008
From: tonyyu at MIT.EDU (Tony S Yu)
Date: Fri, 17 Oct 2008 15:27:58 -0400
Subject: [Numpy-discussion] Weird clipping when astype(int) used on large numbers
Message-ID:

I ran into this weird behavior with astype(int)

In [57]: a = np.array(1E13)

In [58]: a.astype(int)

Out[58]: array(-2147483648)

I understand why large numbers need to be clipped when converting to
int (although I would have expected some sort of warning), but I'm
puzzled by the negative value. Shouldn't the above code clip the value
to the max positive int (2147483647)... and maybe issue a warning?

Thanks,
-Tony

P.S. In case this is a problem with my install:
OS X 10.5.5
Python 2.5.1
Numpy 1.2.0

From matthieu.brucher at gmail.com Fri Oct 17 15:34:39 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Fri, 17 Oct 2008 21:34:39 +0200
Subject: [Numpy-discussion] Weird clipping when astype(int) used on large numbers
In-Reply-To: References: Message-ID:

Hi,

This is the usual thing in integer conversions. If you truncate a 32-bit
integer like 0x0000FFFF to 16 bits, you keep only the low half, 0xFFFF,
which reinterpreted as a signed 16-bit value is -1, thus a negative
number. As general-purpose processors have no conversion instructions
that saturate on overflow (DSPs do), this behavior is to be expected.

Matthieu

2008/10/17 Tony S Yu :
> I ran into this weird behavior with astype(int)
>
> In [57]: a = np.array(1E13)
>
> In [58]: a.astype(int)
>
> Out[58]: array(-2147483648)
>
> I understand why large numbers need to be clipped when converting to
> int (although I would have expected some sort of warning), but I'm
> puzzled by the negative value. Shouldn't the above code clip the value
> to the max positive int (2147483647)... and maybe issue a warning?
>
> Thanks,
> -Tony
>
> P.S. In case this is a problem with my install:
> OS X 10.5.5
> Python 2.5.1
> Numpy 1.2.0
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher

From charlesr.harris at gmail.com Fri Oct 17 17:00:30 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 17 Oct 2008 15:00:30 -0600
Subject: [Numpy-discussion] Weird clipping when astype(int) used on large numbers
In-Reply-To: References: Message-ID:

On Fri, Oct 17, 2008 at 1:27 PM, Tony S Yu wrote:
> I ran into this weird behavior with astype(int)
>
> In [57]: a = np.array(1E13)
>
> In [58]: a.astype(int)
>
> Out[58]: array(-2147483648)
>
> I understand why large numbers need to be clipped when converting to
> int (although I would have expected some sort of warning), but I'm
> puzzled by the negative value. Shouldn't the above code clip the value
> to the max positive int (2147483647)... and maybe issue a warning?

Try more precision:

In [2]: a = np.array(1e13)

In [3]: a.astype(np.int64)
Out[3]: array(10000000000000)

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oliphant at enthought.com Fri Oct 17 17:31:31 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 17 Oct 2008 16:31:31 -0500
Subject: [Numpy-discussion] choose() broadcasting
In-Reply-To: <20081017135856.GA6141@smoerz.org>
References: <20081003111152.GA7228@smoerz.org> <20081003131402.GA7652@smoerz.org> <48E6319D.7050702@enthought.com> <20081017135856.GA6141@smoerz.org>
Message-ID: <48F90433.5080209@enthought.com>

Roman Bertle wrote:
> * Travis E. Oliphant [081003 22:20]:
>> Roman Bertle wrote:
>>> Hello,
>>>
>>> I have found something I call a bug in the numpy choose() method and
>>> wanted to report it in trac.
>>>
>> Thanks for your report. I'm not sure why you are having trouble with
>> Trac, but I've created a ticket for this problem.
>
> Hello,
>
> trac works for me now. And thank you for fixing the bug, the svn numpy
> version works now for me. But there remains an issue I want to report.
> choose is much slower in numpy than in numarray, and even more if an
> output array is specified, as these tests show:

Thanks for the report. You should add another ticket for this case. I
suspect it might be a result of the extra copies that are done in the
PyArray_Choose routine because the algorithm assumes contiguous arrays.

It deserves a look. It probably wouldn't be too difficult to avoid the
copy.

-Travis
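A hedged aside on Roman's benchmark above: until those copies are avoided,
where() (or plain fancy indexing) sidesteps PyArray_Choose entirely for the
common two-choice case. A sketch reusing the benchmark's binary selector,
with made-up sizes:

import numpy as np

n = 1000000
b1 = np.arange(n, dtype='float64')
b2 = -np.arange(n, dtype='float64')
c = np.remainder(np.arange(n), 2)   # 0/1 selector, as in the benchmark

# equivalent to np.choose(c, (b1, b2)): b1 where c == 0, b2 where c == 1
out = np.where(c, b2, b1)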
From dpeterson at enthought.com Fri Oct 17 19:01:17 2008
From: dpeterson at enthought.com (Dave Peterson)
Date: Fri, 17 Oct 2008 18:01:17 -0500
Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing
Message-ID: <48F9193D.3050100@enthought.com>

Hello,

We've recently posted the RC2 build of EPD (the Enthought Python
Distribution) with Python 2.5 version 4.0.30002 to the EPD website. You
may download the RC from here:
http://www.enthought.com/products/epdbeta.php

You can check out the release notes here:
https://svn.enthought.com/epd/wiki/Python2.5.2/4.0.300/RC2

Please help us test it out and provide feedback on the EPD Trac
instance: https://svn.enthought.com/epd or via e-mail to
epd-support at enthought.com. If everything goes well, we are planning a
final release for this coming Tuesday, October 21st.

About EPD
---------
The Enthought Python Distribution (EPD) is a "kitchen-sink-included"
distribution of the Python Programming Language, including over 60
additional tools and libraries. The EPD bundle includes NumPy, SciPy,
IPython, 2D and 3D visualization, database adapters, and a lot of other
tools right out of the box. http://www.enthought.com/products/epd.php

It is currently available as a single-click installer for Windows XP
(x86), Mac OS X (a universal binary for OS X 10.4 and above), and RedHat
3 and 4 (x86 and amd64).

EPD is free for academic use. An annual subscription and installation
support are available for individual commercial use. An enterprise
subscription with support for particular deployment environments is also
available for commercial purchase.
-- Dave

From irving at naml.us Sat Oct 18 18:43:38 2008
From: irving at naml.us (Geoffrey Irving)
Date: Sat, 18 Oct 2008 15:43:38 -0700
Subject: [Numpy-discussion] dtype comparison and hashing
In-Reply-To: <3d375d730810151256u6c135679l61e9483a14553938@mail.gmail.com>
References: <7f9d599f0810150020m489b54day4c909f6cbc8c1203@mail.gmail.com> <3d375d730810151256u6c135679l61e9483a14553938@mail.gmail.com>
Message-ID: <7f9d599f0810181543q168a4483tab087ac446050e7f@mail.gmail.com>

On Wed, Oct 15, 2008 at 12:56 PM, Robert Kern wrote:
> On Wed, Oct 15, 2008 at 02:20, Geoffrey Irving wrote:
>> Hello,
>>
>> Currently in numpy comparing dtypes for equality with == does an
>> internal PyArray_EquivTypes check, which means that the dtypes NPY_INT
>> and NPY_LONG compare as equal in python. However, the hash function
>> for dtypes reduces to id(), which is therefore inconsistent with ==.
>> Unfortunately I can't produce a python snippet showing this since I
>> don't know how to create a NPY_INT dtype in pure python.
>>
>> Based on the source it looks like hash should raise a TypeError,
>> since tp_hash is null but tp_richcompare is not. Does the following
>> snippet throw an exception for others?
>>
>>>>> import numpy
>>>>> hash(numpy.dtype('int'))
>> 5708736
>>
>> This might be the problem:
>>
>> /* Macro to get the tp_richcompare field of a type if defined */
>> #define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \
>> ? (t)->tp_richcompare : NULL)
>>
>> I'm using the default Mac OS X 10.5 installation of python 2.5 and
>> numpy, so maybe those weren't compiled correctly. Has anyone else
>> seen this issue?
>
> Actually, the problem is that we provide a hash function explicitly.
> In multiarraymodule.c:
>
> PyArrayDescr_Type.tp_hash = (hashfunc)_Py_HashPointer;
>
> That is a violation of the hashing protocol (objects which compare
> equal and are hashable need to hash equal), and should be fixed.

Thanks for finding that.

Geoffrey

From hoytak at cs.ubc.ca Sat Oct 18 19:06:14 2008
From: hoytak at cs.ubc.ca (Hoyt Koepke)
Date: Sat, 18 Oct 2008 16:06:14 -0700
Subject: [Numpy-discussion] problem installing numpy using scons
Message-ID: <4db580fd0810181606q4b89fdday514a10617409d350@mail.gmail.com>

Hello,

I'm trying to install the latest numpy using setupscons.py, but it gives
me an error that I can't track down. It happens with the latest numpy
revision (5946) and the latest numscons from the 0.9.3 branch. Here's the
last part of the output:

is bootstrapping ? True
Executing scons command (pkg is numpy.core): /usr/local2/bin/python
"/usr/local2/lib/python2.5/site-packages/numscons-0.9.3dev-py2.5.egg/numscons/scons-local/scons.py"
-f numpy/core/SConstruct -I. scons_tool_path="" src_dir="numpy/core"
pkg_name="numpy.core" log_level=50
distutils_libdir="../../../../build/lib.linux-x86_64-2.5" cc_opt=gcc
cc_opt_path="/usr/bin" f77_opt=gfortran f77_opt_path="/usr/bin"
cxx_opt=g++ cxx_opt_path="/usr/bin"
include_bootstrap=../../../../numpy/core/include silent=0 bootstrapping=1
scons: Reading SConscript files ...
Checking for C header file Python.h... yes
Checking size of short ... yes
Checking size of int ... yes
Checking size of long ... yes
Checking size of float ... yes
Checking size of double ... yes
Checking size of long double ... yes
Checking size of Py_intptr_t ... yes
Checking whether PY_LONG_LONG is declared... yes
Checking size of PY_LONG_LONG ... yes
Checking whether CHAR_BIT is declared... yes
Checking whether PRIdPTR is declared... yes
Checking if math lib [] is usable for numpy ... No !
Checking for C library m... yes Checking if math lib ['m'] is usable for numpy ... Yes ! AttributeError: SConfBase instance has no attribute 'CheckFuncsAtOnce': File "/usr/local2/src/numpy/numpy/core/SConstruct", line 2: GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript') File "/usr/local2/lib/python2.5/site-packages/numscons-0.9.3dev-py2.5.egg/numscons/core/numpyenv.py", line 108: build_dir = '$build_dir', src_dir = '$src_dir') File "/usr/local2/lib/python2.5/site-packages/numscons-0.9.3dev-py2.5.egg/numscons/scons-local/scons-local-1.0.1/SCons/Script/SConscript.py", line 533: return apply(_SConscript, [self.fs,] + files, subst_kw) File "/usr/local2/lib/python2.5/site-packages/numscons-0.9.3dev-py2.5.egg/numscons/scons-local/scons-local-1.0.1/SCons/Script/SConscript.py", line 256: exec _file_ in call_stack[-1].globals File "/usr/local2/src/numpy/build/scons/numpy/core/SConscript", line 165: check_funcs(optional_stdfuncs) File "/usr/local2/src/numpy/build/scons/numpy/core/SConscript", line 154: st = config.CheckFuncsAtOnce(funcs) error: Error while executing scons command. See above for more information. If you think it is a problem in numscons, you can also try executing the scons command with --log-level option for more detailed output of what numscons is doing, for example --log-level=0; the lowest the level is, the more detailed the output it. I'm trying to install it on a 8 processor 64 bit Intel machine. I could post more details of the installation if needed, but they don't seem relevant here (except that I've had a lot of issues getting the regular install script to work with the correct compile parameters, and so I thought I'd try scons). If there's something I'm doing wrong, let me know; otherwise I'd be happy to post a bug report. --Hoyt From mnandris at blueyonder.co.uk Sat Oct 18 22:52:20 2008 From: mnandris at blueyonder.co.uk (Michael) Date: Sun, 19 Oct 2008 03:52:20 +0100 Subject: [Numpy-discussion] Getting an array's indices when a given condition is true Message-ID: <1224384740.3615.14.camel@mik> Hi list, been playing around with stride_tricks and find it terrifically productive; thankyou to everyone who has worked on this. I need to filter some data, getting the indices of all entries which are less than or equal to 'limit'. How do i best go about that? Can you enumerate an array using broadcasting? dat=np.array([ 0. 1.61803399 3.23606798 4.85410197 6.47213595 8.09016994 9.70820393 11.32623792 12.94427191 14.5623059 1. 0.61803399 2.23606798 3.85410197 5.47213595 7.09016994 8.70820393 10.32623792 11.94427191 13.5623059 2. 0.38196601 1.23606798 2.85410197 4.47213595 6.09016994 7.70820393 9.32623792 10.94427191 12.5623059 3. 1.38196601 0.23606798 1.85410197 3.47213595 5.09016994 6.70820393 8.32623792 9.94427191 11.5623059 4. 2.38196601 0.76393202 0.85410197 2.47213595 4.09016994 5.70820393 7.32623792 8.94427191 10.5623059 5. 3.38196601 1.76393202 0.14589803 1.47213595 3.09016994 4.70820393 6.32623792 7.94427191 9.5623059 6. 4.38196601 2.76393202 1.14589803 0.47213595 2.09016994 3.70820393 5.32623792 6.94427191 8.5623059 7. 5.38196601 3.76393202 2.14589803 0.52786405 1.09016994 2.70820393 4.32623792 5.94427191 7.5623059 8. 6.38196601 4.76393202 3.14589803 1.52786405 0.09016994 1.70820393 3.32623792 4.94427191 6.5623059 9. 
7.38196601 5.76393202 4.14589803 2.52786405 0.90983006 0.70820393 2.32623792 3.94427191 5.5623059 ]) limit=1.30901699437 for i in dat: if i <=limit: print i Michael -- "When you think of the long and gloomy history of man, you will find far more hideous crimes have been committed in the name of obedience than have been committed in the name of rebellion". C.P.Snow, "Either-Or" (1961) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lbrooks at MIT.EDU Sat Oct 18 23:00:42 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Sat, 18 Oct 2008 21:00:42 -0600 Subject: [Numpy-discussion] Getting an array's indices when a given condition is true In-Reply-To: <1224384740.3615.14.camel@mik> References: <1224384740.3615.14.camel@mik> Message-ID: <48FAA2DA.6080800@mit.edu> If you want the indexes, check out the np.where command, e.g. idx = np.where(dat <= limit) If you want the values, use: val = dat[dat <= limit] Lane Michael wrote: > Hi list, > > been playing around with stride_tricks and find it terrifically > productive; thankyou to everyone who has worked on this. > > I need to filter some data, getting the indices of all entries which are > less than or equal to 'limit'. How do i best go about that? > > Can you enumerate an array using broadcasting? > > dat=np.array([ 0. 1.61803399 3.23606798 4.85410197 > 6.47213595 > 8.09016994 9.70820393 11.32623792 12.94427191 14.5623059 1. > 0.61803399 2.23606798 3.85410197 5.47213595 7.09016994 > 8.70820393 10.32623792 11.94427191 13.5623059 2. > 0.38196601 > 1.23606798 2.85410197 4.47213595 6.09016994 7.70820393 > 9.32623792 10.94427191 12.5623059 3. 1.38196601 > 0.23606798 1.85410197 3.47213595 5.09016994 6.70820393 > 8.32623792 9.94427191 11.5623059 4. 2.38196601 > 0.76393202 0.85410197 2.47213595 4.09016994 5.70820393 > 7.32623792 8.94427191 10.5623059 5. 3.38196601 > 1.76393202 0.14589803 1.47213595 3.09016994 4.70820393 > 6.32623792 7.94427191 9.5623059 6. 4.38196601 > 2.76393202 1.14589803 0.47213595 2.09016994 3.70820393 > 5.32623792 6.94427191 8.5623059 7. 5.38196601 > 3.76393202 2.14589803 0.52786405 1.09016994 2.70820393 > 4.32623792 5.94427191 7.5623059 8. 6.38196601 > 4.76393202 3.14589803 1.52786405 0.09016994 1.70820393 > 3.32623792 4.94427191 6.5623059 9. 7.38196601 > 5.76393202 4.14589803 2.52786405 0.90983006 0.70820393 > 2.32623792 3.94427191 5.5623059 ]) > > limit=1.30901699437 > > for i in dat: > if i <=limit: > print i > > Michael > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Oct 18 23:13:16 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 18 Oct 2008 22:13:16 -0500 Subject: [Numpy-discussion] Getting an array's indices when a given condition is true In-Reply-To: <48FAA2DA.6080800@mit.edu> References: <1224384740.3615.14.camel@mik> <48FAA2DA.6080800@mit.edu> Message-ID: <3d375d730810182013k5e6010adn4a9db139614f439d@mail.gmail.com> On Sat, Oct 18, 2008 at 22:00, Lane Brooks wrote: > If you want the indexes, check out the np.where command, e.g. > > idx = np.where(dat <= limit) It's worth noting that where() confusingly has two modes of functionality. 
This particular bit of functionality is also exposed under the better-named nonzero() function. I prefer to use nonzero() for this, and only use where() for its where(condition, value_if_true, value_if_false) functionality. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lbrooks at MIT.EDU Sun Oct 19 00:07:44 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Sat, 18 Oct 2008 22:07:44 -0600 Subject: [Numpy-discussion] Images and numpy Message-ID: <48FAB290.70004@mit.edu> What are the preferred ways to get images, like jpgs and pngs, from disk into a numpy array and from a numpy array to disk? I did some google searches and found a PEP thread where Travis was proposing an extended buffer protocol that would make for easier interoperability with libraries such as PIL. Did that materialize? Lane From robert.kern at gmail.com Sun Oct 19 00:23:13 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 18 Oct 2008 23:23:13 -0500 Subject: [Numpy-discussion] Images and numpy In-Reply-To: <48FAB290.70004@mit.edu> References: <48FAB290.70004@mit.edu> Message-ID: <3d375d730810182123q6825a422t62f3c6b916f33ffc@mail.gmail.com> On Sat, Oct 18, 2008 at 23:07, Lane Brooks wrote: > What are the preferred ways to get images, like jpgs and pngs, from disk > into a numpy array and from a numpy array to disk? > > I did some google searches and found a PEP thread where Travis was > proposing an extended buffer protocol that would make for easier > interoperability with libraries such as PIL. Did that materialize? Yes. There are two protocols, a Python level one, and a C level one. Python 2.6 is the first release where the C one is standard. Recent PILs support the Python level one. In [9]: import Image In [10]: img = Image.open('icon.png') In [11]: import numpy In [12]: numpy.asarray(img) Out[12]: array([[[ 0, 0, 0, 0], [ 0, 0, 0, 0], [ 0, 0, 0, 0], ..., [ 0, 0, 0, 0], [ 0, 0, 0, 0], [ 0, 0, 0, 0]], ... In [14]: img2 = Image.fromarray(Out[12]) In [15]: img2 Out[15]: In [16]: img2.size Out[16]: (48, 48) You may also want to look at scipy's scipy.misc.pilutil module. It doesn't use these newer APIs, but it does have extra functionality like scaling float arrays to [0..255] to write out to PNGs and such. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lbrooks at MIT.EDU Sun Oct 19 00:45:31 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Sat, 18 Oct 2008 22:45:31 -0600 Subject: [Numpy-discussion] Images and numpy In-Reply-To: <3d375d730810182123q6825a422t62f3c6b916f33ffc@mail.gmail.com> References: <48FAB290.70004@mit.edu> <3d375d730810182123q6825a422t62f3c6b916f33ffc@mail.gmail.com> Message-ID: <48FABB6B.1010304@mit.edu> Robert Kern wrote: > On Sat, Oct 18, 2008 at 23:07, Lane Brooks wrote: > >> What are the preferred ways to get images, like jpgs and pngs, from disk >> into a numpy array and from a numpy array to disk? >> >> I did some google searches and found a PEP thread where Travis was >> proposing an extended buffer protocol that would make for easier >> interoperability with libraries such as PIL. Did that materialize? >> > > Yes. There are two protocols, a Python level one, and a C level one. > Python 2.6 is the first release where the C one is standard. 
Recent
> PILs support the Python level one.
>
>
> In [9]: import Image
>
> In [10]: img = Image.open('icon.png')
>
> In [11]: import numpy
>
> In [12]: numpy.asarray(img)
> Out[12]:
> array([[[ 0, 0, 0, 0],
> [ 0, 0, 0, 0],
> [ 0, 0, 0, 0],
> ...,
> [ 0, 0, 0, 0],
> [ 0, 0, 0, 0],
> [ 0, 0, 0, 0]],
> ...
>
> In [14]: img2 = Image.fromarray(Out[12])
>
> In [15]: img2
> Out[15]:
>
> In [16]: img2.size
> Out[16]: (48, 48)
>
>
> You may also want to look at scipy's scipy.misc.pilutil module. It
> doesn't use these newer APIs, but it does have extra functionality
> like scaling float arrays to [0..255] to write out to PNGs and such.
>

Great! Thanks for the info. Just what I need!

Lane
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nadavh at visionsense.com Sun Oct 19 02:08:46 2008
From: nadavh at visionsense.com (Nadav Horesh)
Date: Sun, 19 Oct 2008 08:08:46 +0200
Subject: [Numpy-discussion] Images and numpy
References: <48FAB290.70004@mit.edu><3d375d730810182123q6825a422t62f3c6b916f33ffc@mail.gmail.com> <48FABB6B.1010304@mit.edu>
Message-ID: <710F2847B0018641891D9A216027636029C2D9@ex3.envision.co.il>

I encountered one exception: for a "nonstandard" format like 16-bit tiff,
I/O works only via the tostring/fromstring interface.

Nadav

-----Original Message-----
From: numpy-discussion-bounces at scipy.org on behalf of Lane Brooks
Sent: Sun 19-October-08 06:45
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Images and numpy

Robert Kern wrote:
> On Sat, Oct 18, 2008 at 23:07, Lane Brooks wrote:
>
>> What are the preferred ways to get images, like jpgs and pngs, from disk
>> into a numpy array and from a numpy array to disk?
>>
>> I did some google searches and found a PEP thread where Travis was
>> proposing an extended buffer protocol that would make for easier
>> interoperability with libraries such as PIL. Did that materialize?
>>
>
> Yes. There are two protocols, a Python level one, and a C level one.
> Python 2.6 is the first release where the C one is standard. Recent
> PILs support the Python level one.
>
>
> In [9]: import Image
>
> In [10]: img = Image.open('icon.png')
>
> In [11]: import numpy
>
> In [12]: numpy.asarray(img)
> Out[12]:
> array([[[ 0, 0, 0, 0],
> [ 0, 0, 0, 0],
> [ 0, 0, 0, 0],
> ...,
> [ 0, 0, 0, 0],
> [ 0, 0, 0, 0],
> [ 0, 0, 0, 0]],
> ...
>
> In [14]: img2 = Image.fromarray(Out[12])
>
> In [15]: img2
> Out[15]:
>
> In [16]: img2.size
> Out[16]: (48, 48)
>
>
> You may also want to look at scipy's scipy.misc.pilutil module. It
> doesn't use these newer APIs, but it does have extra functionality
> like scaling float arrays to [0..255] to write out to PNGs and such.
>

Great! Thanks for the info. Just what I need!

Lane

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat Type: application/ms-tnef Size: 3620 bytes Desc: not available URL: From robert.kern at gmail.com Sun Oct 19 02:21:08 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 19 Oct 2008 01:21:08 -0500 Subject: [Numpy-discussion] Images and numpy In-Reply-To: <710F2847B0018641891D9A216027636029C2D9@ex3.envision.co.il> References: <48FAB290.70004@mit.edu> <3d375d730810182123q6825a422t62f3c6b916f33ffc@mail.gmail.com> <48FABB6B.1010304@mit.edu> <710F2847B0018641891D9A216027636029C2D9@ex3.envision.co.il> Message-ID: <3d375d730810182321j2e6980b7i524d2e1c4498c85b@mail.gmail.com> On Sun, Oct 19, 2008 at 01:08, Nadav Horesh wrote: > I encountered an exception: "nonstandard" format like 16 bits tiff I/O works only via the tostring/fromstring interface. Yup. Internally, PIL doesn't necessarily represent the data in the same way that numpy does, so you have to go through another interface rather than the fast one. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Sun Oct 19 06:57:50 2008 From: cournape at gmail.com (David Cournapeau) Date: Sun, 19 Oct 2008 19:57:50 +0900 Subject: [Numpy-discussion] problem installing numpy using scons In-Reply-To: <4db580fd0810181606q4b89fdday514a10617409d350@mail.gmail.com> References: <4db580fd0810181606q4b89fdday514a10617409d350@mail.gmail.com> Message-ID: <5b8d13220810190357i618c1ec4t689d701518f5fb15@mail.gmail.com> On Sun, Oct 19, 2008 at 8:06 AM, Hoyt Koepke wrote: > > If there's something I'm doing wrong, let me know; otherwise I'd be > happy to post a bug report. The trunk requires some changes which are not released yet. It will be in the next release of numscons (0.9.3). cheers, David From lbrooks at MIT.EDU Sun Oct 19 15:28:49 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Sun, 19 Oct 2008 13:28:49 -0600 Subject: [Numpy-discussion] numpy CAPI questions Message-ID: <48FB8A71.3010508@mit.edu> I am using the numpy CAPI to write an extension module that returns a numpy Array from an imaging data source. I collect the image into a buffer that I allocate. I then create numpy Array using the PyArray_New(..) function and pass it the buffer. I then set the NPY_OWNDATA flag on the Array because I want the Array to deallocate the buffer when it is deleted. Is that the correct way to do it? The code snippet below is what I wrote, and it seems to be working just fine, but I wanted to verify that I am doing things correctly. uint16_t *data; // malloc data and fill it with some image data... img = PyArray_New(&PyArray_Type, 2, dims, NPY_UINT16, NULL, data, 2, NPY_C_CONTIGUOUS | NPY_WRITEABLE | NPY_ALIGNED, NULL); PyArray_UpdateFlags((PyArrayObject *) img, NPY_OWNDATA | NPY_C_CONTIGUOUS | NPY_WRITEABLE | NPY_ALIGNED); return Py_BuildValue("O", img); Here are my questions: 1. Does NPY_OWNDATA mean that the object will deallocate the memory when the object is deleted? The manual seems to indicate that as such but it is not explicitly stated. 2. Is my reference counting correct? Do I need to call the PyArray_INCREF() on img? 
Thanks, Lane Brooks From robert.kern at gmail.com Sun Oct 19 16:09:46 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 19 Oct 2008 15:09:46 -0500 Subject: [Numpy-discussion] numpy CAPI questions In-Reply-To: <48FB8A71.3010508@mit.edu> References: <48FB8A71.3010508@mit.edu> Message-ID: <3d375d730810191309u4b4a5e9dl41350bdbb123e032@mail.gmail.com> On Sun, Oct 19, 2008 at 14:28, Lane Brooks wrote: > I am using the numpy CAPI to write an extension module that returns a > numpy Array from an imaging data source. I collect the image into a > buffer that I allocate. I then create numpy Array using the > PyArray_New(..) function and pass it the buffer. I then set the > NPY_OWNDATA flag on the Array because I want the Array to deallocate the > buffer when it is deleted. Is that the correct way to do it? The code > snippet below is what I wrote, and it seems to be working just fine, but > I wanted to verify that I am doing things correctly. Preferably, you should create the array and then use the memory it allocated as the buffer. Generally, malloc()/free() implementations are paired, and if you mix-and-match, you're begging for trouble. > Here are my questions: > 1. Does NPY_OWNDATA mean that the object will deallocate the memory when > the object is deleted? The manual seems to indicate that as such but it > is not explicitly stated. Yes. > 2. Is my reference counting correct? Do I need to call the > PyArray_INCREF() on img? Personally, I always need to double-check my refcounting with sys.getrefcount() (which, it should be noted, adds its own reference, so the correct result for sys.getrefcount(foo()) should be 1). In your actual code, do you have anything else in the Py_BuildValue()? If you don't, then you don't need to use Py_BuildValue(); you can just return img. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lbrooks at MIT.EDU Sun Oct 19 17:52:50 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Sun, 19 Oct 2008 15:52:50 -0600 Subject: [Numpy-discussion] numpy CAPI questions In-Reply-To: <3d375d730810191309u4b4a5e9dl41350bdbb123e032@mail.gmail.com> References: <48FB8A71.3010508@mit.edu> <3d375d730810191309u4b4a5e9dl41350bdbb123e032@mail.gmail.com> Message-ID: <48FBAC32.1000003@mit.edu> Robert Kern wrote: > On Sun, Oct 19, 2008 at 14:28, Lane Brooks wrote: > >> 2. Is my reference counting correct? Do I need to call the >> PyArray_INCREF() on img? >> > > Personally, I always need to double-check my refcounting with > sys.getrefcount() (which, it should be noted, adds its own reference, > so the correct result for sys.getrefcount(foo()) should be 1). In your > actual code, do you have anything else in the Py_BuildValue()? If you > don't, then you don't need to use Py_BuildValue(); you can just return > img. > Thanks for the great tip on sys.getrefcount(). I understand reference counting a whole lot better now after playing with sys.getrefcount() function. Lane -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From cournape at gmail.com Sun Oct 19 22:12:40 2008
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 20 Oct 2008 11:12:40 +0900
Subject: [Numpy-discussion] numpy CAPI questions
In-Reply-To: <48FBAC32.1000003@mit.edu>
References: <48FB8A71.3010508@mit.edu> <3d375d730810191309u4b4a5e9dl41350bdbb123e032@mail.gmail.com> <48FBAC32.1000003@mit.edu>
Message-ID: <5b8d13220810191912i1845f2bax6d83619bb75fb023@mail.gmail.com>

On Mon, Oct 20, 2008 at 6:52 AM, Lane Brooks wrote:
>
> Thanks for the great tip on sys.getrefcount(). I understand reference
> counting a whole lot better now after playing with sys.getrefcount()
> function.

For debugging purposes, and if you need it in C, you can use the member
a->ob_refcnt, where a is any valid pointer to a PyObject. This is more or
less the equivalent of sys.getrefcount, but at the C level (without the
+1). The struct member ob_refcnt is private and should not be depended on
in "real" code, but I personally like using it when I am not sure about
ref counting.

cheers,

David

From andrea.gavana at gmail.com Mon Oct 20 05:20:05 2008
From: andrea.gavana at gmail.com (Andrea Gavana)
Date: Mon, 20 Oct 2008 10:20:05 +0100
Subject: [Numpy-discussion] Combinations of objects (?)
Message-ID:

Hi All,

this is probably a very silly question, but combinatorial math is not
exactly my strength and I am not even sure how to formulate the question.
I apologize if it is a very elementary problem.

Let's suppose that I have 60 oil wells and 3 surface facilities. Every
well must be tied-in (linked, attached) to one and only one of these
surface facilities. No orphan well is allowed (i.e., a well not attached
to any surface unit) and no multi-links are allowed (i.e., one well
attached to 2 or more units). Is there any simple way in numpy (scipy?)
in which I can get the number of possible combinations of wells attached
to the 3 different units, without repetitions? For example, I could have
all 60 wells attached to unit number 1, then 59 on unit 1 and 1 on unit 3
and so on...

Thank you for your suggestions.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

From tjhnson at gmail.com Mon Oct 20 06:31:07 2008
From: tjhnson at gmail.com (T J)
Date: Mon, 20 Oct 2008 03:31:07 -0700
Subject: [Numpy-discussion] Combinations of objects (?)
In-Reply-To: References: Message-ID:

On Mon, Oct 20, 2008 at 2:20 AM, A. G. wrote:
> one well attached to 2 or more units). Is there any simple way in
> numpy (scipy?) in which I can get the number of possible combinations
> of wells attached to the 3 different units, without repetitions? For
> example, I could have all 60 wells attached to unit number 1, then 59
> on unit 1 and 1 on unit 3 and so on...

From: http://tinyurl.com/6oeyx8

def boxings(n, k):
    seq, i = [n]*k + [0], k
    while i:
        yield tuple(seq[i] - seq[i+1] for i in xrange(k))
        i = seq.index(0) - 1
        seq[i:k] = [seq[i] - 1] * (k-i)

Example:

>>> from scipy import factorial as f
>>> x = list(boxings(60,3))
>>> len(x)
1891
>>> f(60+3-1) / f(60) / f(3-1)
1891.0000000000002

That thread contains another solution using itertools.

From aisaac at american.edu Mon Oct 20 07:48:13 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Mon, 20 Oct 2008 07:48:13 -0400
Subject: [Numpy-discussion] Combinations of objects (?)
In-Reply-To: References: Message-ID: <48FC6FFD.3080509@american.edu> On 10/20/2008 5:20 AM Andrea Gavana apparently wrote: > this is probably a very silly question, but combinatorial math is > not exactly my strength and I am not even sure on how to formulate the > question. I apologize if it is a very elementary problem. > Let's suppose that I have 60 oil wells and 3 surface facilities. Every > well must be tied-in (linked, attached) to one and only one of these > surface facilities. No orphan well is allowed (i.e., a well not > attached to any surface unit) and no multi-links are allowed (i.e., > one well attached to 2 or more units). Is there any simple way in > numpy (scipy?) in which I can get the number of possible combinations > of wells attached to the different 3 units, without repetitions? For > example, I could have all 60 wells attached to unit number 1, then 59 > on unit 1 and 1 on unit 3 and so on... I think you are just asking how many ways you can put N marbles into K boxes, with no restrictions. Since there are K possibilities for each marble, wouldn't the answer to your problem be K**N? Alan Isaac From andrea.gavana at gmail.com Mon Oct 20 11:02:47 2008 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Mon, 20 Oct 2008 16:02:47 +0100 Subject: [Numpy-discussion] Combinations of objects (?) In-Reply-To: <48FC6FFD.3080509@american.edu> References: <48FC6FFD.3080509@american.edu> Message-ID: Hi All, On Mon, Oct 20, 2008 at 12:48 PM, Alan G Isaac wrote: > On 10/20/2008 5:20 AM Andrea Gavana apparently wrote: >> this is probably a very silly question, but combinatorial math is >> not exactly my strength and I am not even sure on how to formulate the >> question. I apologize if it is a very elementary problem. >> Let's suppose that I have 60 oil wells and 3 surface facilities. Every >> well must be tied-in (linked, attached) to one and only one of these >> surface facilities. No orphan well is allowed (i.e., a well not >> attached to any surface unit) and no multi-links are allowed (i.e., >> one well attached to 2 or more units). Is there any simple way in >> numpy (scipy?) in which I can get the number of possible combinations >> of wells attached to the different 3 units, without repetitions? For >> example, I could have all 60 wells attached to unit number 1, then 59 >> on unit 1 and 1 on unit 3 and so on... > > I think you are just asking how many ways you > can put N marbles into K boxes, with no restrictions. > Since there are K possibilities for each marble, > wouldn't the answer to your problem be K**N? Uhm, I am not sure, I imagined TJ answer is correct by reading the Python thread he kindly posted. As there shouldn't be repetitions (i.e., each combination is unique) and every single well must be attached to one and only one unit, it seems to me that 3**60 is a very big number :-D But combinatorial things are pretty obscure to me... In any case, thanks to you all for your suggestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ From lists_ravi at lavabit.com Mon Oct 20 11:32:22 2008 From: lists_ravi at lavabit.com (Ravi) Date: Mon, 20 Oct 2008 11:32:22 -0400 Subject: [Numpy-discussion] Octave-numpy bridge? Message-ID: <200810201132.23195.lists_ravi@lavabit.com> Hi all, Is anyone aware of a bridge between octave & numpy? As I port stuff from Matlab to numpy, I noticed that most of my Matlab code has workarounds that allow the code to be used from octave. 
My current methodology for porting is to use octave to generate inputs/outputs for a function, then write the results out in HDF5, read them into python+numpy, and compare the results. This would be much faster, especially when dealing with mex files, if I could use octave from my ipython shell, a la mlabwrap. I searched the web but did not find anything on this topic that was recent. Regards, Ravi From josef.pktd at gmail.com Mon Oct 20 12:05:11 2008 From: josef.pktd at gmail.com (joep) Date: Mon, 20 Oct 2008 09:05:11 -0700 (PDT) Subject: [Numpy-discussion] Combinations of objects (?) In-Reply-To: References: <48FC6FFD.3080509@american.edu> Message-ID: <0383fcea-aa8c-4797-bebd-ed60eab85d03@v13g2000pro.googlegroups.com> On Oct 20, 11:02?am, "Andrea Gavana" wrote: > Hi All, > > > > On Mon, Oct 20, 2008 at 12:48 PM, Alan G Isaac wrote: > > On 10/20/2008 5:20 AM Andrea Gavana apparently wrote: > >> ? ? this is probably a very silly question, but combinatorial math is > >> not exactly my strength and I am not even sure on how to formulate the > >> question. I apologize if it is a very elementary problem. > >> Let's suppose that I have 60 oil wells and 3 surface facilities. Every > >> well must be tied-in (linked, attached) to one and only one of these > >> surface facilities. No orphan well is allowed (i.e., a well not > >> attached to any surface unit) and no multi-links are allowed (i.e., > >> one well attached to 2 or more units). Is there any simple way in > >> numpy (scipy?) in which I can get the number of possible combinations > >> of wells attached to the different 3 units, without repetitions? For > >> example, I could have all 60 wells attached to unit number 1, then 59 > >> on unit 1 and 1 on unit 3 and so on... > > > I think you are just asking how many ways you > > can put N marbles into K boxes, with no restrictions. > > Since there are K possibilities for each marble, > > wouldn't the answer to your problem be K**N? > > Uhm, I am not sure, I imagined TJ answer is correct by reading the > Python thread he kindly posted. As there shouldn't be repetitions > (i.e., each combination is unique) and every single well must be > attached to one and only one unit, it seems to me that 3**60 is a very > big number :-D > But combinatorial things are pretty obscure to me... > > In any case, thanks to you all for your suggestions. > > Andrea. > > "Imagination Is The Only Weapon In The War Against Reality."http://xoomer.alice.it/infinity77/ > _______________________________________________ > Numpy-discussion mailing list > Numpy-discuss... at scipy.orghttp://projects.scipy.org/mailman/listinfo/numpy-discussion I briefly looked at the thread by TJ and the discussion is on indistinguishable object, i.e. in your case if you only care about how many but not which wells are connected to a given facility. Think of Alans answer in terms of numbers in base 3, you have 60 digits (wells), and each digit can be either 1,2 or 3 (facility to which well is assigned to). The total number of numbers that can be represented this way is 3**60. e.g with 4 wells: 3**4=81 possibilities 1111 1112 1113 1121 1122 1123 1131 ... easy to check for 2 wells: 3**2=9 possibilities Josef From oliphant at enthought.com Mon Oct 20 13:15:56 2008 From: oliphant at enthought.com (Travis E. 
Oliphant) Date: Mon, 20 Oct 2008 12:15:56 -0500
Subject: [Numpy-discussion] numpy CAPI questions
In-Reply-To: <48FB8A71.3010508@mit.edu>
References: <48FB8A71.3010508@mit.edu>
Message-ID: <48FCBCCC.4040909@enthought.com>

Lane Brooks wrote:
> I am using the numpy CAPI to write an extension module that returns a
> numpy Array from an imaging data source. I collect the image into a
> buffer that I allocate. I then create numpy Array using the
> PyArray_New(..) function and pass it the buffer. I then set the
> NPY_OWNDATA flag on the Array because I want the Array to deallocate the
> buffer when it is deleted. Is that the correct way to do it? The code
> snippet below is what I wrote, and it seems to be working just fine, but
> I wanted to verify that I am doing things correctly.
>

NPY_OWNDATA means the object will try to deallocate the memory (make
sure it was allocated with the same allocator as NumPy uses).
Otherwise, you will need to set up another approach as I showed in my
blog posting several weeks ago.

Also, don't use Py_BuildValue with "O" as it will create another
reference so that img will have an extra reference to it when it is
returned to Python. Use "N" instead.

However, in this case you don't need to use Py_BuildValue at all because
you are returning only one array.

The PyArray_UpdateFlags call is not used for changing NPY_OWNDATA. It
is only useful for changing FORTRAN, CONTIGUOUS, ALIGNED, and WRITEABLE
flags which are convenience flags. This call does the check first and
then sets the state of the flag to reflect the actual situation for the
array.

Instead use

PyArray_FLAGS(arr) |= NPY_OWNDATA;

-Travis

From ryanlists at gmail.com Mon Oct 20 14:50:54 2008
From: ryanlists at gmail.com (Ryan Krauss)
Date: Mon, 20 Oct 2008 13:50:54 -0500
Subject: [Numpy-discussion] windows install problem for 1.2
Message-ID:

I need to upgrade my Python installation. I just uninstalled everything,
deleted C:\Python25, installed Python 2.5.2 from the msi, and am now
trying to install numpy 1.2 from
numpy-1.2.0-win32-superpack-python2.5.exe. It flashes something up for a
second and then goes away giving me no information. I tried running it
from a cmd.exe prompt to see if some helpful message spits out, but it
says the program is too big to fit in memory. I appreciate that this is
probably not enough information for anyone to fix my problem, but I don't
know how to get more info. Are there prereqs besides Python itself to
install numpy 1.2?

Thanks,

Ryan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gael.varoquaux at normalesup.org Mon Oct 20 15:20:56 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 20 Oct 2008 21:20:56 +0200
Subject: [Numpy-discussion] Memmapping .npy
Message-ID: <20081020192056.GB21981@phare.normalesup.org>

Currently memmapping a .npy file is not implemented. Is there a reason
other than lack of time for this? Are there any major difficulties?

Cheers,

Gaël

From gael.varoquaux at normalesup.org Mon Oct 20 15:25:31 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 20 Oct 2008 21:25:31 +0200
Subject: [Numpy-discussion] Memmapping .npy
In-Reply-To: <20081020192056.GB21981@phare.normalesup.org>
References: <20081020192056.GB21981@phare.normalesup.org>
Message-ID: <20081020192531.GC21981@phare.normalesup.org>

On Mon, Oct 20, 2008 at 09:20:56PM +0200, Gael Varoquaux wrote:
> Currently memmapping a .npy file is not implemented. Is there a reason
> other than lack of time for this? Are there any major difficulties?
Actually, let me rephrase this question to be more clear:

Is the functionality needed in the numpy.lib.format.open_memmap? If yes,
why isn't it exposed in numpy.load? If not, the questions above apply.

Thanks,

Gaël

From robert.kern at gmail.com Mon Oct 20 15:27:54 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 20 Oct 2008 14:27:54 -0500
Subject: [Numpy-discussion] Memmapping .npy
In-Reply-To: <20081020192531.GC21981@phare.normalesup.org>
References: <20081020192056.GB21981@phare.normalesup.org> <20081020192531.GC21981@phare.normalesup.org>
Message-ID: <3d375d730810201227s7ae8f7f4jab684cdc0bde03da@mail.gmail.com>

On Mon, Oct 20, 2008 at 14:25, Gael Varoquaux wrote:
> On Mon, Oct 20, 2008 at 09:20:56PM +0200, Gael Varoquaux wrote:
>> Currently memmapping a .npy file is not implemented. Is there a reason
>> other than lack of time for this? Are there any major difficulties?
>
> Actually, let me rephrase this question to be more clear:
>
> Is the functionality needed in the numpy.lib.format.open_memmap?

Yes.

> If yes
> why isn't it exposed in numpy.load? If not, the questions above apply.

I have no idea. I didn't write that code.

-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco

From gael.varoquaux at normalesup.org Mon Oct 20 15:30:28 2008
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 20 Oct 2008 21:30:28 +0200
Subject: [Numpy-discussion] Memmapping .npy
In-Reply-To: <3d375d730810201227s7ae8f7f4jab684cdc0bde03da@mail.gmail.com>
References: <20081020192056.GB21981@phare.normalesup.org> <20081020192531.GC21981@phare.normalesup.org> <3d375d730810201227s7ae8f7f4jab684cdc0bde03da@mail.gmail.com>
Message-ID: <20081020193028.GD21981@phare.normalesup.org>

On Mon, Oct 20, 2008 at 02:27:54PM -0500, Robert Kern wrote:
> > If yes why isn't it exposed in numpy.load? If not, the questions
> > above apply.

> I have no idea. I didn't write that code.

Robert, you know these things way better than me, so please pardon my
stupid question:

Would it be desirable to expose memmapping in numpy.load? Or is this
going to drive us into problems?

If it would be desirable, I could try to find time for a patch. I could
use this in my work, and if I am going to implement it, I might as well
do it for everybody.

Gaël

From robert.kern at gmail.com Mon Oct 20 15:45:57 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 20 Oct 2008 14:45:57 -0500
Subject: [Numpy-discussion] Memmapping .npy
In-Reply-To: <20081020193028.GD21981@phare.normalesup.org>
References: <20081020192056.GB21981@phare.normalesup.org> <20081020192531.GC21981@phare.normalesup.org> <3d375d730810201227s7ae8f7f4jab684cdc0bde03da@mail.gmail.com> <20081020193028.GD21981@phare.normalesup.org>
Message-ID: <3d375d730810201245l6b235c9dmd7a7b732e0a386a5@mail.gmail.com>

On Mon, Oct 20, 2008 at 14:30, Gael Varoquaux wrote:
> On Mon, Oct 20, 2008 at 02:27:54PM -0500, Robert Kern wrote:
>> > If yes why isn't it exposed in numpy.load? If not, the questions
>> > above apply.
>
>> I have no idea. I didn't write that code.
>
> Robert, you know these things way better than me, so please pardon my
> stupid question:
>
> Would it be desirable to expose memmapping in numpy.load? Or is this
> going to drive us into problems?
>
> If it would be desirable, I could try to find time for a patch.
> I could use this in my work, and if I am going to implement it, I might
> as well do it for everybody.

load() would need to grow a mode= keyword argument to properly support
memory-mapping. Possibly, we could change it to

def load(filename, mmap_mode=None):
    ...

With mmap_mode=None, just do a plain read; otherwise mmap with that
particular mode. We can introduce that immediately in 1.3 since
memmap=True never worked.

-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
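A hedged usage sketch of the interface Robert proposes here (the mmap_mode
keyword is, for what it's worth, the form numpy eventually shipped); the
file name is made up:

import numpy as np

a = np.arange(10)
np.save('data.npy', a)                  # plain .npy file on disk

b = np.load('data.npy')                 # ordinary read into memory
m = np.load('data.npy', mmap_mode='r')  # read-only view backed by the file

print(m[3:6])   # touches (and pages in) only part of the file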
From lbrooks at MIT.EDU Mon Oct 20 17:13:51 2008
From: lbrooks at MIT.EDU (Lane Brooks)
Date: Mon, 20 Oct 2008 15:13:51 -0600
Subject: [Numpy-discussion] numpy CAPI questions
In-Reply-To: <48FCBCCC.4040909@enthought.com>
References: <48FB8A71.3010508@mit.edu> <48FCBCCC.4040909@enthought.com>
Message-ID: <48FCF48F.3020601@mit.edu>

Travis E. Oliphant wrote:
> Lane Brooks wrote:
>> I am using the numpy CAPI to write an extension module that returns a
>> numpy Array from an imaging data source. I collect the image into a
>> buffer that I allocate. I then create numpy Array using the
>> PyArray_New(..) function and pass it the buffer. I then set the
>> NPY_OWNDATA flag on the Array because I want the Array to deallocate the
>> buffer when it is deleted. Is that the correct way to do it? The code
>> snippet below is what I wrote, and it seems to be working just fine, but
>> I wanted to verify that I am doing things correctly.
>>
> NPY_OWNDATA means the object will try to deallocate the memory (make
> sure it was allocated with the same allocator as NumPy uses).
> Otherwise, you will need to set up another approach as I showed in my
> blog posting several weeks ago.
>
> Also, don't use Py_BuildValue with "O" as it will create another
> reference so that img will have an extra reference to it when it is
> returned to Python. Use "N" instead.
>
> However, in this case you don't need to use Py_BuildValue at all because
> you are returning only one array.
>
> The PyArray_UpdateFlags call is not used for changing NPY_OWNDATA. It
> is only useful for changing FORTRAN, CONTIGUOUS, ALIGNED, and WRITEABLE
> flags which are convenience flags. This call does the check first and
> then sets the state of the flag to reflect the actual situation for the
> array.
>
> Instead use
>
> PyArray_FLAGS(arr) |= NPY_OWNDATA;
>
> -Travis

Thanks for all this valuable feedback. I read your blog post and like
the idea but not the overhead. I guess my initial approach of doing a
memory handoff to the numpy Array was a bit naive. It seems to be
working, but I guess that is because numpy uses free to deallocate
memory?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.kern at gmail.com Mon Oct 20 17:22:07 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 20 Oct 2008 16:22:07 -0500
Subject: [Numpy-discussion] numpy CAPI questions
In-Reply-To: <48FCF48F.3020601@mit.edu>
References: <48FB8A71.3010508@mit.edu> <48FCBCCC.4040909@enthought.com> <48FCF48F.3020601@mit.edu>
Message-ID: <3d375d730810201422u6bb55eb1s4a00b7b2a7ea834c@mail.gmail.com>

On Mon, Oct 20, 2008 at 16:13, Lane Brooks wrote:
> Thanks for all this valuable feedback. I read your blog post and like the
> idea but not the overhead. I guess my initial approach of doing a memory
> handoff to the numpy Array was a bit naive. It seems to be working, but I
> guess that is because numpy uses free to deallocate memory?

It uses PyMem_Free() (except for a few places that I've just noticed)
which itself uses free() *except* if Python is compiled with
-DPYMALLOC_DEBUG, in which case it uses its own debugging
implementations.

-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco

From Joris.DeRidder at ster.kuleuven.be Mon Oct 20 19:05:40 2008
From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder)
Date: Tue, 21 Oct 2008 01:05:40 +0200
Subject: [Numpy-discussion] statistical model fitting & comparison
Message-ID: <00509E0B-1D22-4F32-883B-4A58025016A2@ster.kuleuven.be>

Hi,

I'm interested in developing some general-use Python/Numpy code for
linear model fitting and comparison. The fitting is easy enough with
Numpy, but the automated comparison of the submodels to identify which
model best describes the data requires some work. Before I embark on
this, I would like to find out if anyone has actually already written
something along these lines, in order not to reinvent the wheel.

Cheers,

Joris

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From david at ar.media.kyoto-u.ac.jp Mon Oct 20 23:30:44 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 21 Oct 2008 12:30:44 +0900
Subject: [Numpy-discussion] Octave-numpy bridge?
In-Reply-To: <200810201132.23195.lists_ravi@lavabit.com>
References: <200810201132.23195.lists_ravi@lavabit.com>
Message-ID: <48FD4CE4.6040602@ar.media.kyoto-u.ac.jp>

Ravi wrote:
> Hi all,
> Is anyone aware of a bridge between octave & numpy? As I port stuff from
> Matlab to numpy, I noticed that most of my Matlab code has workarounds that
> allow the code to be used from octave. My current methodology for porting is
> to use octave to generate inputs/outputs for a function, then write the
> results out in HDF5, read them into python+numpy, and compare the results.
> This would be much faster, especially when dealing with mex files, if I could
> use octave from my ipython shell, a la mlabwrap.
> I searched the web but did not find anything on this topic that was recent.
>

Hi Ravi,

I am not aware of such a bridge, but if octave has a C "engine" like
matlab, it should be relatively straightforward to port it from matlab to
octave. I don't know how compatible matlab and octave are at the C API
level, but only a few functions are needed for the low-level wrapper,

David

From david at ar.media.kyoto-u.ac.jp Mon Oct 20 23:38:34 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Tue, 21 Oct 2008 12:38:34 +0900
Subject: [Numpy-discussion] windows install problem for 1.2
In-Reply-To: References: Message-ID: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp>

Ryan Krauss wrote:
> I need to upgrade my Python installation. I just uninstalled
> everything, deleted C:\Python25, installed Python 2.5.2 from the msi,
> and am now trying to install numpy 1.2 from
> numpy-1.2.0-win32-superpack-python2.5.exe. It flashes something up
> for a second and then goes away giving me no information. I tried
> running it from a cmd.exe prompt to see if some helpful message spits
> out, but it says the program is too big to fit in memory.

Hi Ryan,

This is the first time I have heard of this problem. I guess you checked
the obvious (do you have enough RAM). What is your platform exactly ?
Did you check your installer is ok ?
Something I should have done is to provide for a checksum: here is the checksum for the time being: ad603ad13cf403fbf394d7e3eba8b996 numpy-1.2.0-win32-superpack-python2.5.exe > Are there prereqs besides Python itself to install numpy 1.2? No. cheers, David From millman at berkeley.edu Tue Oct 21 05:12:19 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 21 Oct 2008 02:12:19 -0700 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> Message-ID: On Mon, Oct 20, 2008 at 8:38 PM, David Cournapeau : > This is the first time I heard of this problem. I guess you checked > the obvious (do you have enough ram). What is your platform exactly ? > Did you check your installer is ok ? Something I should have done is to > provide for a checksum: here is the checksum for the time being: > > ad603ad13cf403fbf394d7e3eba8b996 numpy-1.2.0-win32-superpack-python2.5.exe The md5 checksums are also in the release notes (which you can find by clicking on the little notepad by the release on the download page). Here are the release notes for the 1.2.0 release: http://sourceforge.net/project/shownotes.php?group_id=1369&release_id=628858 Of course, that is terribly difficult to find. I haven't been able to figure out a better way to have sourceforge display checksum information. Ideally, they could display the checksum information like they do for the filesize and architecture information: http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103&release_id=628858 If anyone knows how to make sourceforge display this information, please let me know. I would also like it if you could somehow embed the checksum in the URL. Maybe something like this: https://wiki.mozilla.org/Firefox/Feature_Brainstorming:Downloads#MD5_Checksum -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From faltet at pytables.org Tue Oct 21 05:23:13 2008 From: faltet at pytables.org (Francesc Alted) Date: Tue, 21 Oct 2008 11:23:13 +0200 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> Message-ID: <200810211123.13228.faltet@pytables.org> A Tuesday 21 October 2008, Jarrod Millman escrigué: > On Mon, Oct 20, 2008 at 8:38 PM, David Cournapeau : > > This is the first time I heard of this problem. I guess you > > checked the obvious (do you have enough ram). What is your platform > > exactly ? Did you check your installer is ok ? Something I should > > have done is to provide for a checksum: here is the checksum for > > the time being: > > > > ad603ad13cf403fbf394d7e3eba8b996 > > numpy-1.2.0-win32-superpack-python2.5.exe > > The md5 checksums are also in the release notes (which you can find > by clicking on the little notepad by the release on the download > page). Here are the release notes for the 1.2.0 release: > http://sourceforge.net/project/shownotes.php?group_id=1369&release_id >=628858 > > Of course, that is terribly difficult to find. I haven't been able > to figure out a better way to have sourceforge display checksum > information.
Ideally, they could display the checksum information > like they do for the filesize and architecture information: > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id >=175103&release_id=628858 > > If anyone knows how to make sourceforge display this information, > please let me know. > > I would also like it if you could somehow embed the checksum in the > URL. Maybe something like this: > https://wiki.mozilla.org/Firefox/Feature_Brainstorming:Downloads#MD5_ >Checksum I normally put the MD5 information in a separate file in the same repository. Something like: http://www.pytables.org/download/stable/ Parent Directory - pytables-2.0.4.md5 04-Jul-2008 14:23 249 pytables-2.0.4.tar.gz 04-Jul-2008 14:22 3.3M pytablesmanual-2.0.4-html.tar.gz 04-Jul-2008 14:22 1.7M pytablesmanual-2.0.4.pdf 04-Jul-2008 15:27 1.2M tables-2.0.4.tar.gz 04-Jul-2008 14:22 3.3M tables-2.0.4.win32-py2.4.exe 04-Jul-2008 14:22 1.0M tables-2.0.4.win32-py2.5.exe 04-Jul-2008 14:22 1.0M where the MD5 hashes are in pytables-2.0.4.md5. The MD5 signatures can be computed easily by using the 'cfv' utility: cfv -C -t md5 -f pytables-$VERSION.md5 * HTH, -- Francesc Alted From millman at berkeley.edu Tue Oct 21 05:39:30 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 21 Oct 2008 02:39:30 -0700 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: <200810211123.13228.faltet@pytables.org> References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> <200810211123.13228.faltet@pytables.org> Message-ID: On Tue, Oct 21, 2008 at 2:23 AM, Francesc Alted wrote: > I normally put the MD5 information in a separate file in the same > repository. I was thinking about doing something like that, but I have been trying to minimize the number of files that I upload to sourceforge, because each file requires a ridiculous number of mouse clicks. The whole process of making a file release on sourceforge is annoyingly manual. Of course, I should actually just figure out how to script making a release on sourceforge. I have seen a couple of tools that look like they might be useful: Releaseforge: http://releaseforge.sourceforge.net/ Sourceforge Utilities: http://sfutils.sourceforge.net/ Has anyone found a good way of scripting sourceforge? Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From david at ar.media.kyoto-u.ac.jp Tue Oct 21 05:28:27 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 21 Oct 2008 18:28:27 +0900 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> <200810211123.13228.faltet@pytables.org> Message-ID: <48FDA0BB.6040706@ar.media.kyoto-u.ac.jp> Jarrod Millman wrote: > > Has anyone found a good way of scripting sourceforge? > What do we use sourceforge for now ? It seems that we only use it for the source/installers archives, now, right ? We use sourceforge neither for ML, Forums, bug tracking or source code control, so why using it at all ? 
cheers, David From millman at berkeley.edu Tue Oct 21 06:11:55 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 21 Oct 2008 03:11:55 -0700 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: <48FDA0BB.6040706@ar.media.kyoto-u.ac.jp> References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> <200810211123.13228.faltet@pytables.org> <48FDA0BB.6040706@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Oct 21, 2008 at 2:28 AM, David Cournapeau wrote: > What do we use sourceforge for now ? It seems that we only use it for > the source/installers archives, now, right ? We use sourceforge neither > for ML, Forums, bug tracking or source code control, so why using it at > all ? I don't mind hosting them somewhere else. Do you have some place in mind? Google Code looks like it has a 10MB file size limit: http://code.google.com/support/bin/answer.py?answer=56621&topic=10456 Launchpad seems to be considering upping their limit to 200MB: https://bugs.launchpad.net/launchpad-foundations/+bug/254052 The scipy server can't handle the load or bandwidth needed. My first inclination would be to look at Launchpad: https://launchpad.net/numpy https://launchpad.net/scipy How do you like releasing files on launchpad? Can you write a script to upload release files and notes? Can you script making the announcement? It looks like they have md5 support built in: https://code.launchpad.net/numpy.scons.support/+download Ideas? Comments? -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From david at ar.media.kyoto-u.ac.jp Tue Oct 21 06:12:37 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 21 Oct 2008 19:12:37 +0900 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> <200810211123.13228.faltet@pytables.org> <48FDA0BB.6040706@ar.media.kyoto-u.ac.jp> Message-ID: <48FDAB15.3020000@ar.media.kyoto-u.ac.jp> Jarrod Millman wrote: > > I don't mind hosting them somewhere else. Do you have some place in mind? > I was wondering why scipy.org could not be used :) > Google Code looks like it has a 10MB file size limit: > http://code.google.com/support/bin/answer.py?answer=56621&topic=10456 > > Launchpad seems to be considering upping their limit to 200MB: > https://bugs.launchpad.net/launchpad-foundations/+bug/254052 > > The scipy server can't handle the load or bandwidth needed. > > My first inclination would be to look at Launchpad: > https://launchpad.net/numpy > https://launchpad.net/scipy > > How do you like releasing files on launchpad? I found it cumbersome enough so that I stopped using it, and only uploaded it through pypi (which is straightforward to use, but has a limit of 5 Mb I believe) for numscons. Now, since they have started having a public API this summer, there is hope for a scriptable process: http://news.launchpad.net/api/recipe-for-uploading-files-via-the-api But the beta for the public API is not public yet (it is not difficult to apply, though, it takes an email + a couple of days, at least that was the case a few weeks ago). Also, there are still no real tools around it yet. cheers, David From josef.pktd at gmail.com Tue Oct 21 09:27:13 2008 From: josef.pktd at gmail.com (joep) Date: Tue, 21 Oct 2008 06:27:13 -0700 (PDT) Subject: [Numpy-discussion] Octave-numpy bridge?
In-Reply-To: <48FD4CE4.6040602@ar.media.kyoto-u.ac.jp> References: <200810201132.23195.lists_ravi@lavabit.com> <48FD4CE4.6040602@ar.media.kyoto-u.ac.jp> Message-ID: <43384ba4-5daa-4754-8f3c-dfaa991cb39e@b1g2000hsg.googlegroups.com> On Oct 20, 11:30 pm, David Cournapeau wrote: > Ravi wrote: > > Hi all, > > Is anyone aware of a bridge between octave & numpy? As I port stuff from > > Matlab to numpy, I noticed that most of my Matlab code has workarounds that > > allow the code to be used from octave. My current methodology for porting is > > to use octave to generate inputs/outputs for a function, then write the > > results out in HDF5, read them into python+numpy, and compare the results. > > This would be much faster, especially when dealing with mex files, if I could > > use octave from my ipython shell, a la mlabwrap. > > I searched the web but did not find anything on this topic that was recent. > > Hi Ravi, > Hi, I just saw this recently, but I have not yet looked at it more closely (since I am currently using matlab.) The install explanation in http://bazaar.launchpad.net/~individ/pytave/trunk/files looks very concentrated on Posix and I'm a windows user. However, this seems to be what you are looking for. see http://www.nabble.com/Python-to-Octave-bridge-td20031139.html Announcing Pytave - Python to Octave extension Embeds the Octave language interpreter as an extension to Python, enabling existing m-files to be used from Python. Features: * Implicit type conversions between Python and Octave. Supports all Numeric integer, real double (and possibly real float) matrices * Architecture independent - no assumption on endianess or integer sizes Call Octave code in Python: >>> import pytave >>> pytave.feval(1, "cos", 0) (1.0,) Project homepage: https://launchpad.net/pytave From fadhley.salim at uk.calyon.com Tue Oct 21 09:41:03 2008 From: fadhley.salim at uk.calyon.com (Fadhley Salim) Date: Tue, 21 Oct 2008 14:41:03 +0100 Subject: [Numpy-discussion] Building a Win32/Python2.4 multi-version egg for Numpy Message-ID: <7F347D91614EBC48AA7540A2A76D3BB204483DF1@MXCU10MX1.MSX.CIB> Is there a way to make a multi-version egg from the Numpy source code (supplied from sourceforge). Ideally I'd like to create an automated process that allows me to quickly make a Win32 egg any time a new release of Numpy comes out so that my team can test it and adjust our project's dependencies as they see fit. Initially I'd like to make multiversion eggs for both Numpy 1.1 and 1.2 so that my team can get started on testing this library. Is there anybody here who has managed to build an egg from Numpy? Thanks, Sal -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Oct 21 09:58:24 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 21 Oct 2008 08:58:24 -0500 Subject: [Numpy-discussion] Building a Win32/Python2.4 multi-version egg for Numpy In-Reply-To: <7F347D91614EBC48AA7540A2A76D3BB204483DF1@MXCU10MX1.MSX.CIB> References: <7F347D91614EBC48AA7540A2A76D3BB204483DF1@MXCU10MX1.MSX.CIB> Message-ID: <3d375d730810210658l4ca6e4e6s57b4eb64cd2e41@mail.gmail.com> On Tue, Oct 21, 2008 at 08:41, Fadhley Salim wrote: > Is there a way to make a multi-version egg from the Numpy source code > (supplied from sourceforge). > > Ideally I'd like to create an automated process that allows me to quickly > make a Win32 egg any time a new release of Numpy comes out so that my team > can test it and adjust our project's dependencies as they see fit. > > Initially I'd like to make multiversion eggs for both Numpy 1.1 and 1.2 so > that my team can get started on testing this library. Is there anybody here > who has managed to build an egg from Numpy? python setupegg.py bdist_egg -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Tue Oct 21 10:23:06 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 21 Oct 2008 16:23:06 +0200 Subject: [Numpy-discussion] statistical model fitting & comparison In-Reply-To: <00509E0B-1D22-4F32-883B-4A58025016A2@ster.kuleuven.be> References: <00509E0B-1D22-4F32-883B-4A58025016A2@ster.kuleuven.be> Message-ID: <9457e7c80810210723j2b1d78ebhd0010934a90a1925@mail.gmail.com> Hi Joris 2008/10/21 Joris De Ridder : > I'm interested in developing some general-use Python/Numpy code for > linear model fitting and comparison. The fitting is easy enough with > Numpy, but the automated comparison of the submodels to identify which > model describes best the data, requires some work. Before I embark on > this, I would like to find out if anyone actually already wrote > something along these lines, in order not to reinvent the wheel. I would guess that this is part of Jonathan Taylor's "models" package, but I may be wrong. It is available at the Neuroimaging in Python homepage: http://neuroimaging.scipy.org/ You may also want to look at the optimisation scikit and the RANSAC cookbook example. Regards Stéfan From ryanlists at gmail.com Tue Oct 21 10:25:58 2008 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 21 Oct 2008 09:25:58 -0500 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> Message-ID: OK, I was sure this couldn't be the problem since I downloaded it twice and also downloaded the 1.1 superpack and had the same issue, but here is my md5sum: In [9]: md5sum.md5sum('numpy-1.2.0-win32-superpack-python2.5.exe') Out[9]: '76ebe1fb78694e31e81e3e696be1e310' which is different. I will try turning off all virus and firewall stuff and try again. How does this happen? Ryan
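For reference, a stdlib-only stand-in for the md5sum helper used above (its actual implementation isn't shown in the thread, so this is just a sketch assuming Python 2.5's hashlib; the filename is simply the installer discussed here):

import hashlib

def md5sum(path, chunksize=1024 * 1024):
    # hash the file in chunks so a large installer need not fit in memory
    m = hashlib.md5()
    f = open(path, 'rb')
    try:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            m.update(chunk)
    finally:
        f.close()
    return m.hexdigest()

print md5sum('numpy-1.2.0-win32-superpack-python2.5.exe')

Comparing the printed digest against the published one is enough to catch the kind of corrupted download reported in this thread.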
On 10/20/08, David Cournapeau wrote: > > Ryan Krauss wrote: > > I need to upgrade my Python installation. I just uninstalled > > everything, deleted C:\Python25, installed Python 2.5.2 from the msi, > > and am now trying to install numpy 1.2 from > > numpy-1.2.0-win32-superpack-python2.5.exe. It flashes something up > > for a second and then goes away giving me no information. I tried > > running it from a cmd.exe prompt to see if some helpful message spits > > out, but it says the program is too big to fit in memory. > > > Hi Ryan, > > This is the first time I heard of this problem. I guess you checked > the obvious (do you have enough ram). What is your platform exactly ? > Did you check your installer is ok ? Something I should have done is to > provide for a checksum: here is the checksum for the time being: > > ad603ad13cf403fbf394d7e3eba8b996 numpy-1.2.0-win32-superpack-python2.5.exe > > > > Are there prereqs besides Python itself to install numpy 1.2? > > > No. > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fadhley.salim at uk.calyon.com Tue Oct 21 10:28:21 2008 From: fadhley.salim at uk.calyon.com (Fadhley Salim) Date: Tue, 21 Oct 2008 15:28:21 +0100 Subject: [Numpy-discussion] Numpy Compile error on Win32 with Visual Studio.Net 2003 Message-ID: <7F347D91614EBC48AA7540A2A76D3BB204483DF2@MXCU10MX1.MSX.CIB> Can one of the Numpy experts explain what this means and how I might be able to solve it? I'm trying to compile Numpy 1.2 - I just unzipped the tar file from Sourceforge. Thanks!

compiling C sources
creating build\temp.win32-2.4\Release\numpy\random
creating build\temp.win32-2.4\Release\numpy\random\mtrand
C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -Inumpy\core\include -Ibuild\src.win32-2.4\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -ID:\Python24\include -ID:\Python24\PC /Tcnumpy\random\mtrand\mtrand.c /Fobuild\temp.win32-2.4\Release\numpy\random\mtrand\mtrand.obj
mtrand.c
numpy\random\mtrand\mtrand.c(2829) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(3182) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(5043) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(5192) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(5324) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(5739) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(5739) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(6041) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(6041) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(7485) : error C2026: string too big, trailing characters truncated
numpy\random\mtrand\mtrand.c(8390) : error C2026: string too big, trailing characters truncated
error: Command "C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -Inumpy\core\include -Ibuild\src.win32-2.4\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -ID:\Python24\include -ID:\Python24\PC /Tcnumpy\random\mtrand\mtrand.c /Fobuild\temp.win32-2.4\Release\numpy\random\mtrand\mtrand.obj" failed with exit status 2
------------------------------------------------------------------------
Press ENTER to close the interactive session..

-------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Oct 21 10:15:41 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 21 Oct 2008 23:15:41 +0900 Subject: [Numpy-discussion] Numpy Compile error on Win32 with Visual Studio.Net 2003 In-Reply-To: <7F347D91614EBC48AA7540A2A76D3BB204483DF2@MXCU10MX1.MSX.CIB> References: <7F347D91614EBC48AA7540A2A76D3BB204483DF2@MXCU10MX1.MSX.CIB> Message-ID: <48FDE40D.2000408@ar.media.kyoto-u.ac.jp> Fadhley Salim wrote: > Can one of the Numpy experts explain what this means and how I might > be able to solve it? I'm trying to compile Numpy 1.2 - I just unzipped > the tar file from Sourceforge. numpy 1.2 is not buildable with visual studio 2003, because of a compiler bug. cheers, David From david at ar.media.kyoto-u.ac.jp Tue Oct 21 10:16:25 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 21 Oct 2008 23:16:25 +0900 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> Message-ID: <48FDE439.9020902@ar.media.kyoto-u.ac.jp> Ryan Krauss wrote: > OK, I was sure this couldn't be the problem since I downloaded it > twice and also downloaded the 1.1 superpack and had the same issue, > but here is my md5sum: > > In [9]: md5sum.md5sum('numpy-1.2.0-win32-superpack-python2.5.exe') > Out[9]: '76ebe1fb78694e31e81e3e696be1e310' > > which is different. I will try turning off all virus and firewall > stuff and try again. How does this happen? What do you use to download the executable ? cheers, David From lists_ravi at lavabit.com Tue Oct 21 10:39:00 2008 From: lists_ravi at lavabit.com (Ravi) Date: Tue, 21 Oct 2008 10:39:00 -0400 Subject: [Numpy-discussion] Octave-numpy bridge? In-Reply-To: <43384ba4-5daa-4754-8f3c-dfaa991cb39e@b1g2000hsg.googlegroups.com> References: <200810201132.23195.lists_ravi@lavabit.com> <48FD4CE4.6040602@ar.media.kyoto-u.ac.jp> <43384ba4-5daa-4754-8f3c-dfaa991cb39e@b1g2000hsg.googlegroups.com> Message-ID: <200810211039.01639.lists_ravi@lavabit.com> On Tuesday 21 October 2008 09:27:13 joep wrote: > However, this seems to be what you are looking for.
> > see http://www.nabble.com/Python-to-Octave-bridge-td20031139.html > > Announcing Pytave - Python to Octave extension > > Embeds the Octave language interpreter as an extension to Python, > enabling existing m-files to be used from Python. Looks interesting; this is very much the same approach I used in a few postings on c++-sig (now cplusplus-sig) two weeks ago to build a transparent ublas-numpy bridge. I am not familiar with the Octave engine, but this would probably be the simplest way to make it work. Thank you very much for the pointer. Regards, Ravi From ryanlists at gmail.com Tue Oct 21 11:27:30 2008 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue, 21 Oct 2008 10:27:30 -0500 Subject: [Numpy-discussion] windows install problem for 1.2 In-Reply-To: <48FDE439.9020902@ar.media.kyoto-u.ac.jp> References: <48FD4EBA.7090507@ar.media.kyoto-u.ac.jp> <48FDE439.9020902@ar.media.kyoto-u.ac.jp> Message-ID: Firefox. After turning off my firewall and antivirus software, I re-downloaded, got the correct md5sum, and everything is fine. Thanks for your help. Ryan On 10/21/08, David Cournapeau wrote: > > Ryan Krauss wrote: > > OK, I was sure this couldn't be the problem since I downloaded it > > twice and also downloaded the 1.1 superpack and had the same issue, > > but here is my md5sum: > > > > In [9]: md5sum.md5sum('numpy-1.2.0-win32-superpack-python2.5.exe') > > Out[9]: '76ebe1fb78694e31e81e3e696be1e310' > > > > which is different. I will try turning off all virus and firewall > > stuff and try again. How does this happen? > > > What do you use to download the executable ? > > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From v.gkinis at gfy.ku.dk Tue Oct 21 11:30:23 2008 From: v.gkinis at gfy.ku.dk (Vasileios Gkinis) Date: Tue, 21 Oct 2008 17:30:23 +0200 Subject: [Numpy-discussion] detecting a nan with all() or any() Message-ID: <48FDF58F.6090207@gfy.ku.dk> Hi, I would like to detect if my array for example a = array((1,2,3,nan)) contains any nans So when i use: In [31]: all(a!=nan) I get Out[31]: True And when i use: In [35]: any(a==nan) I get Out[35]: False Which looks rather wrong to me...or i am simply missing something Can somebody comment on that...? Regards Vasili -- ------------------------------------------------------------ Vasileios Gkinis, PhD Student Centre for Ice and Climate Niels Bohr Institute Juliane Maries Vej 30, room 321 DK-2100 Copenhagen Denmark Office: +45 35325913 v.gkinis at gfy.ku.dk -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Oct 21 12:07:35 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 21 Oct 2008 11:07:35 -0500 Subject: [Numpy-discussion] detecting a nan with all() or any() In-Reply-To: <48FDF58F.6090207@gfy.ku.dk> References: <48FDF58F.6090207@gfy.ku.dk> Message-ID: <3d375d730810210907v406e4142tb2970addb0b7ae69@mail.gmail.com> On Tue, Oct 21, 2008 at 10:30, Vasileios Gkinis wrote: > Hi, > > I would like to detect if my array for example a = array((1,2,3,nan)) > contains any nans > So when i use: > In [31]: all(a!=nan) > I get > Out[31]: True > And when i use: > In [35]: any(a==nan) > I get > Out[35]: False > Which looks rather wrong to me...or i am simply missing something > Can somebody comment on that...? 
One of the properties of nans is that nan!=nan. Instead, use the function isnan(). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmay31 at gmail.com Tue Oct 21 15:30:35 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 21 Oct 2008 14:30:35 -0500 Subject: [Numpy-discussion] Loadtxt .bz2 support Message-ID: <48FE2DDB.8070808@gmail.com> Hi, I noticed numpy.loadtxt has support for gzipped text files, but not for bz2'd files. Here's a 3 line patch to add bzip2 support to loadtxt. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: loadtxt_bzip2_support.diff URL: From bsouthey at gmail.com Tue Oct 21 17:01:10 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 21 Oct 2008 16:01:10 -0500 Subject: [Numpy-discussion] statistical model fitting & comparison In-Reply-To: <9457e7c80810210723j2b1d78ebhd0010934a90a1925@mail.gmail.com> References: <00509E0B-1D22-4F32-883B-4A58025016A2@ster.kuleuven.be> <9457e7c80810210723j2b1d78ebhd0010934a90a1925@mail.gmail.com> Message-ID: <48FE4316.6020905@gmail.com> Stéfan van der Walt wrote: > Hi Joris > > 2008/10/21 Joris De Ridder : > >> I'm interested in developing some general-use Python/Numpy code for >> linear model fitting and comparison. The fitting is easy enough with >> Numpy, but the automated comparison of the submodels to identify which >> model describes best the data, requires some work. Before I embark on >> this, I would like to find out if anyone actually already wrote >> something along these lines, in order not to reinvent the wheel. >> > > I would guess that this is part of Jonathan Taylor's "models" package, > but I may be wrong. It is available at the Neuroimaging in Python > homepage: > > http://neuroimaging.scipy.org/ > > You may also want to look at the optimisation scikit and the RANSAC > cookbook example. > > Regards > Stéfan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > Hi, I think you are on your own here as it is a huge chunk to chew! Depending on what you really mean by linear models is also part of that (the Wikipedia entry is amusing). Most people probably look to stats applications like lm in R and glm in SAS. I do have an interest and the code I have (for logistic regression) is probably just as easy to rewrite. Bruce From bioinformed at gmail.com Tue Oct 21 17:38:20 2008 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Tue, 21 Oct 2008 17:38:20 -0400 Subject: [Numpy-discussion] statistical model fitting & comparison In-Reply-To: <48FE4316.6020905@gmail.com> References: <00509E0B-1D22-4F32-883B-4A58025016A2@ster.kuleuven.be> <9457e7c80810210723j2b1d78ebhd0010934a90a1925@mail.gmail.com> <48FE4316.6020905@gmail.com> Message-ID: <2e1434c10810211438o1737b44udcb05dd6d3594f71@mail.gmail.com> On Tue, Oct 21, 2008 at 5:01 PM, Bruce Southey wrote: > I think you are on your own here as it is a huge chunk to chew! > Depending on what you really mean by linear models is also part of that > (the Wikipedia entry is amusing). Most people probably look to stats > applications like lm in R and glm in SAS.
> I do have an interest and the code I have (for logistic regression) is > probably just as easy to rewrite. > I also have some fairly robust code for logistic and polytomous regression based on NumPy and SciPy. -Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at enthought.com Tue Oct 21 17:53:51 2008 From: travis at enthought.com (Travis Vaught) Date: Tue, 21 Oct 2008 16:53:51 -0500 Subject: [Numpy-discussion] ANN: Enthought Python Distribution - New Release Message-ID: <02DFC911-B86B-4F92-A83F-1FB7AADAB98B@enthought.com> Greetings, Enthought, Inc. is very pleased to announce the newest release of the Enthought Python Distribution (EPD) Py2.5 v4.0.30002: http://www.enthought.com/epd This release contains updates to many of EPD's packages, including NumPy, IPython, matplotlib, VTK, etc. This is also the first release to include a 3.x version of the Enthought Tool Suite (http://code.enthought.com/ ). The release notes for this release, including the list of included packages, may be found here: https://svn.enthought.com/epd/wiki/Python2.5.2/4.0.300/GA Many thanks to the EPD team for putting this release together, and to the community of folks who have provided all of the valuable tools bundled here. Best Regards, Travis --------- About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python Programming Language, including over 80 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box. http://www.enthought.com/products/epd.php It is currently available as an easy, single-click installer for Windows XP (x86), Mac OS X (a universal binary for Intel 10.4 and above) and RedHat EL3 (x86 and amd64). EPD is free for 30-day trial use and for use in degree-granting academic institutions. An annual Subscription and installation support are available for commercial use (http://www.enthought.com/products/epddownload.php ) including an Enterprise Subscription with support for particular deployment environments (http://www.enthought.com/products/enterprise.php ). From paul at rudin.co.uk Tue Oct 21 19:12:59 2008 From: paul at rudin.co.uk (Paul Rudin) Date: Wed, 22 Oct 2008 00:12:59 +0100 Subject: [Numpy-discussion] spherical harmonic decomposition of 3dmodels Message-ID: <877i81h79g.fsf@rudin.co.uk> I'm trying to learn to use numpy/scipy, and at the moment I'm looking at implementing a spherical harmonic shape descriptor for a 3d model. Something along the lines described on page 8 of . I've made code that does the voxelization bit from some geometry. That is I've got something that calculates an NxNxN array with 1s at the points near the meshes of the original model, and zeros elsewhere. Using this grid I've got a function taking polar coordinates to 1 or 0, but I've so far failed to use that function to make the 2D grid giving the descriptor. scipy.special.sph_harm gives me the spherical harmonic for a given degree and coordinates, and my function can obviously be sampled at any points, but putting them together so far eludes me. Any hints gratefully received... From bevan07 at gmail.com Tue Oct 21 21:51:13 2008 From: bevan07 at gmail.com (Bevan Jenkins) Date: Wed, 22 Oct 2008 01:51:13 +0000 (UTC) Subject: [Numpy-discussion] Timeseries scikit - dates prior to 1900 Message-ID: Hello, I have just stumbled into a slight issue with scikits.timeseries and dates prior to 1900.
Timeseries requires dates to be >= 1900. It took me a little while to discover this because I was trying to create dates at a frequency greater than daily (e.g. hourly), which leads to python crashing. Once I reduced the frequency to daily, the error let me know about the 1900 issue as below: In [1]: import scikits.timeseries as ts In [2]: D = ts.Date(freq='D', year = 1850, month = 1, day =1) In [3]: print (D) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) ValueError: year=1850 is before 1900; the datetime strftime() methods require year >= 1900 This is what i was trying initially: In [4]: E = ts.Date(freq='H', year = 1850, month = 1, day =1, hour =0) In [5]: print (E) <> I was wondering if there is any way to use timeseries with dates <1900? If not I will forgo timeseries for the select few datasets I have with data <1900. Thanks for the v. useful scikit, Bevan Jenkins From charlesr.harris at gmail.com Tue Oct 21 23:59:40 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 21 Oct 2008 21:59:40 -0600 Subject: [Numpy-discussion] Loadtxt .bz2 support In-Reply-To: <48FE2DDB.8070808@gmail.com> References: <48FE2DDB.8070808@gmail.com> Message-ID: On Tue, Oct 21, 2008 at 1:30 PM, Ryan May wrote: > Hi, > > I noticed numpy.loadtxt has support for gzipped text files, but not for > bz2'd files. Here's a 3 line patch to add bzip2 support to loadtxt. > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > > Index: numpy/lib/io.py > =================================================================== > --- numpy/lib/io.py (revision 5953) > +++ numpy/lib/io.py (working copy) > @@ -320,6 +320,9 @@ > if fname.endswith('.gz'): > import gzip > fh = gzip.open(fname) > + elif fname.endswith('.bz2'): > + import bz2 > + fh = bz2.BZ2File(fname) > else: > fh = file(fname) > elif hasattr(fname, 'seek'): > Could you open a ticket for this? Mark it as an enhancement. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Wed Oct 22 05:04:23 2008 From: opossumnano at gmail.com (Tiziano Zito) Date: Wed, 22 Oct 2008 11:04:23 +0200 Subject: [Numpy-discussion] Modular toolkit for Data Processing 2.4 released! Message-ID: <20081022090423.GA18890@localhost> We are glad to announce release 2.4 of the Modular toolkit for Data Processing (MDP). MDP is a Python library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. The base of available algorithms includes, to name but the most common, Principal Component Analysis (PCA and NIPALS), several Independent Component Analysis algorithms (CuBICA, FastICA, TDSEP, and JADE), Slow Feature Analysis, Restricted Boltzmann Machine, and Locally Linear Embedding. What's new in version 2.4? -------------------------------------- - The new version introduces a new parallel package to execute the MDP algorithms on multiple processors or machines. The package also offers an interface to develop customized schedulers and parallel algorithms. Old MDP scripts can be turned into their parallelized equivalent with one simple command. - The number of available algorithms is increased with the Locally Linear Embedding and Hessian eigenmaps algorithms to perform dimensionality reduction and manifold learning (many thanks to Jake VanderPlas for his contribution!)
- Some more bug fixes, useful features, and code migration towards Python 3.0 Resources --------- Download: http://sourceforge.net/project/showfiles.php?group_id=116959 Homepage: http://mdp-toolkit.sourceforge.net Mailing list: http://sourceforge.net/mail/?group_id=116959 -- Pietro Berkes Volen Center for Complex Systems Brandeis University Waltham, MA, USA Niko Wilbert Institute for Theoretical Biology Humboldt-University Berlin, Germany Tiziano Zito Bernstein Center for Computational Neuroscience Humboldt-University Berlin, Germany From david at ar.media.kyoto-u.ac.jp Wed Oct 22 06:36:04 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 22 Oct 2008 19:36:04 +0900 Subject: [Numpy-discussion] scipy super pack installer for win32: please test Message-ID: <48FF0214.4080302@ar.media.kyoto-u.ac.jp> Hi, A quick note to mention I have generated a "superpack" installer for scipy, for testing purposes. This is similar to the numpy superpack installer: the installer detects your CPU at installation time and installs the right scipy: http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/scipy-0.7.0.dev4826-win32-superpack-python2.5.exe How to test: ------------ Inside python: import scipy scipy.test() If python does not crash, this is OK. Test failures are OK, since those will be fixed within the 0.7.0 release. Please report any problem on scipy-dev ML, cheers, David From Joris.DeRidder at ster.kuleuven.be Wed Oct 22 06:56:29 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Wed, 22 Oct 2008 12:56:29 +0200 Subject: [Numpy-discussion] statistical model fitting & comparison In-Reply-To: <2e1434c10810211438o1737b44udcb05dd6d3594f71@mail.gmail.com> References: <00509E0B-1D22-4F32-883B-4A58025016A2@ster.kuleuven.be> <9457e7c80810210723j2b1d78ebhd0010934a90a1925@mail.gmail.com> <48FE4316.6020905@gmail.com> <2e1434c10810211438o1737b44udcb05dd6d3594f71@mail.gmail.com> Message-ID: <9628BBDB-1B65-4A6D-88C0-9B03E8B046B0@ster.kuleuven.be> Thanks for the pointers. I'll produce some code to show what I have in mind, and then come back to the list. Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From rocksportrocker at googlemail.com Wed Oct 22 08:20:05 2008 From: rocksportrocker at googlemail.com (Uwe Schmitt) Date: Wed, 22 Oct 2008 14:20:05 +0200 Subject: [Numpy-discussion] Google Groups Message-ID: Hi, this mailing list disappeared from google groups. Is there a reason for this ? Did I miss something ? Greetings, Uwe -------------- next part -------------- An HTML attachment was scrubbed... URL: From wbaxter at gmail.com Wed Oct 22 08:41:43 2008 From: wbaxter at gmail.com (Bill Baxter) Date: Wed, 22 Oct 2008 21:41:43 +0900 Subject: [Numpy-discussion] scipy super pack installer for win32: please test In-Reply-To: <48FF0214.4080302@ar.media.kyoto-u.ac.jp> References: <48FF0214.4080302@ar.media.kyoto-u.ac.jp> Message-ID: I'm getting a 404 on that url. --bb On Wed, Oct 22, 2008 at 7:36 PM, David Cournapeau wrote: > Hi, > > A quick note to mention I have generated a "superpack" installer for > scipy, for testing purposes. This is similar to the numpy superpack > installer: the installer detects your CPU at installation time and > installs the right scipy: > > http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/scipy-0.7.0.dev4826-win32-superpack-python2.5.exe > > How to test: > ------------ > > Inside python: > > import scipy > scipy.test() > > If python does not crash, this is OK.
Test failures are OK, since those > will be fixed within the 0.7.0 release. Please report any problem on > scipy-dev ML, > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From david at ar.media.kyoto-u.ac.jp Wed Oct 22 08:36:41 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 22 Oct 2008 21:36:41 +0900 Subject: [Numpy-discussion] scipy super pack installer for win32: please test In-Reply-To: References: <48FF0214.4080302@ar.media.kyoto-u.ac.jp> Message-ID: <48FF1E59.2090800@ar.media.kyoto-u.ac.jp> Bill Baxter wrote: > I'm getting a 404 on that url. > It should work now. David From robert.kern at gmail.com Wed Oct 22 09:07:28 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Oct 2008 08:07:28 -0500 Subject: [Numpy-discussion] Google Groups In-Reply-To: References: Message-ID: <3d375d730810220607r339cfacaja77bec5c57bdffb1@mail.gmail.com> On Wed, Oct 22, 2008 at 07:20, Uwe Schmitt wrote: > Hi, > > this mailing list disappeared from google groups. > Is there a reason for this ? Did I miss something ? I don't know. That gateway was not managed by us. IIRC Mike McLay was the one who started it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From yger at unic.cnrs-gif.fr Wed Oct 22 12:21:48 2008 From: yger at unic.cnrs-gif.fr (Pierre Yger) Date: Wed, 22 Oct 2008 18:21:48 +0200 Subject: [Numpy-discussion] Optimization of loops Message-ID: <200810221821.48143.yger@unic.cnrs-gif.fr> Hi all, This is my first mail to the mailing list, and I would like to know if anybody has a great idea about the use or not of Numpy and loops in Python. So here is my problem : I've a large list of tuple (id, time), id being an integer between [0, ..., N] and time float values. I want to have a mysort() function that will be able to explode this list into N lists of different sizes, that will contain the times associated to each id. Example: >> spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)] mysort(spikes) should return: [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]] Intuitively, the simplest way to do that is to append elements while going through all the tuples of the main list (called spikes) to empty lists:

res = [[] for i in xrange(N)]
for id, time in my_list:
    res[id].append(time)

But this loop seems to be incredibly slow for large lists ! A faster way (after having performed some profiling) seems to do:

spikes = numpy.array(spikes) # Convert the list into a numpy array
res = []
for id in xrange(N):
    res.append(spikes[spikes[:,0] == id, 1]) # Use Numpy indexes

Nevertheless, this is still rather slow. Does anybody have any idea about a faster way to do this ? Is there a Numpy function that could be used ? Thanks in advance, Pierre From pgmdevlist at gmail.com Wed Oct 22 13:30:00 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 22 Oct 2008 13:30:00 -0400 Subject: [Numpy-discussion] Timeseries scikit - dates prior to 1900 In-Reply-To: References: Message-ID: <200810221330.00306.pgmdevlist@gmail.com> bevan, Thanks for reporting. I'll try to find time to correct that and will keep you posted. Cheers P.
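On Pierre Yger's loop question above, one more possibility is a numpy-only sketch along these lines (assuming, as in his example, that the ids run over 0..N-1; a single stable argsort plus searchsorted/split replaces the N boolean scans, and it is a sketch to time against the alternatives, not a drop-in mysort()):

import numpy as np

spikes = [(0, 2.3), (1, 5.6), (3, 2.5), (0, 5.2), (3, 10.2), (2, 16.2)]
N = 4

a = np.asarray(spikes)
ids = a[:, 0].astype(int)
# a stable sort keeps each id's times in their original order
order = np.argsort(ids, kind='mergesort')
sorted_times = a[order, 1]
# cut points: where ids 1..N-1 first appear in the sorted id array
cuts = np.searchsorted(ids[order], np.arange(1, N))
res = np.split(sorted_times, cuts)
# res -> [array([ 2.3,  5.2]), array([ 5.6]), array([ 16.2]), array([  2.5,  10.2])]

This sorts the data once instead of scanning it once per id, so the work grows roughly like M log M rather than M*N for M spikes.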
From pgmdevlist at gmail.com Wed Oct 22 14:57:17 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 22 Oct 2008 14:57:17 -0400 Subject: [Numpy-discussion] Timeseries scikit - dates prior to 1900 In-Reply-To: References: Message-ID: <200810221457.17446.pgmdevlist@gmail.com> Bevan, After digging a bit, I don't think it's gonna be possible any time soon to print a date w/ a year prior to 1900. The reason is that we use the datetime.strftime under the hood, and this method is intrinsically limited to years after 1900. Note that this limitation affects only printing and date representation. You should be able to safely use the module with dates prior to the XXth century. We'll put your comment on the TODO list, though. Still, it might not be implemented for a while. Sorry for any inconvenience, Sincerely, P. From yakov.keselman at gmail.com Wed Oct 22 16:09:53 2008 From: yakov.keselman at gmail.com (Yakov Keselman) Date: Wed, 22 Oct 2008 13:09:53 -0700 Subject: [Numpy-discussion] Sparse N-dim arrays support? Message-ID: Hello everyone! I wonder if support for sparse N-dimensional arrays has been added to NumPy since January 2003 (the last time this question was asked in this newsgroup). If not, is there some interest in trying to add this data structure to NumPy? It may not seem very useful for scientific computations, but it definitely makes sense for analytical (a.k.a. BI) applications. Any pointers to code (Python or otherwise) are much appreciated. Thanks! = Yakov 2003 posting: http://article.gmane.org/gmane.comp.python.numeric.general/11570/ From robert.kern at gmail.com Wed Oct 22 16:17:21 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Oct 2008 15:17:21 -0500 Subject: [Numpy-discussion] Sparse N-dim arrays support? In-Reply-To: References: Message-ID: <3d375d730810221317xd257eefhbbd85a998dfbde28@mail.gmail.com> On Wed, Oct 22, 2008 at 15:09, Yakov Keselman wrote: > Hello everyone! > > I wonder if support for sparse N-dimensional arrays has been added to > NumPy since January 2003 (the last time this question was asked in > this newsgroup). No. > If not, is there some interest in trying to add this > data structure to NumPy? There is certainly interest in seeing it done. None of that has materialized into actions, though. If you're willing to spearhead the implementation effort, I'm sure many people will be delighted to help, though. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmay31 at gmail.com Wed Oct 22 16:59:30 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 22 Oct 2008 15:59:30 -0500 Subject: [Numpy-discussion] Loadtxt .bz2 support In-Reply-To: References: <48FE2DDB.8070808@gmail.com> Message-ID: <48FF9432.1070104@gmail.com> Charles R Harris wrote: > On Tue, Oct 21, 2008 at 1:30 PM, Ryan May wrote: > >> Hi, >> >> I noticed numpy.loadtxt has support for gzipped text files, but not for >> bz2'd files. Here's a 3 line patch to add bzip2 support to loadtxt.
>> >> Ryan >> >> -- >> Ryan May >> Graduate Research Assistant >> School of Meteorology >> University of Oklahoma >> >> Index: numpy/lib/io.py >> =================================================================== >> --- numpy/lib/io.py (revision 5953) >> +++ numpy/lib/io.py (working copy) >> @@ -320,6 +320,9 @@ >> if fname.endswith('.gz'): >> import gzip >> fh = gzip.open(fname) >> + elif fname.endswith('.bz2'): >> + import bz2 >> + fh = bz2.BZ2File(fname) >> else: >> fh = file(fname) >> elif hasattr(fname, 'seek'): >> > > Could you open a ticket for this? Mark it as an enhancement. > Done. #940 http://scipy.org/scipy/numpy/ticket/940 Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From fperez.net at gmail.com Wed Oct 22 17:00:20 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 22 Oct 2008 14:00:20 -0700 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy Message-ID: Hi all, much delayed, but here it is, finally. The doc regarding our discussion about PEP 225 is attached, and I'm keeping a public copy for viewing convenience (with html) here: https://cirl.berkeley.edu/fperez/static/numpy-pep225/ Note that, as indicated in the link above, the real doc is in bzr, so you can check it out to generate patches or a branch (the preferred formats for big changes). This is just a first cut, going from memory and notes. I'd appreciate any feedback, corrections, etc. I'm giving a talk at the BayPiggies group Nov 13 about SciPy and will take the opportunity to bring this document up, since that user group has a lot of python people, including some core developers. Since this is from memory and written by me, it's likely pretty biased. But I really want to clarify that I'm trying to act as a scribe here, not to push my own agenda (I do have one :). So please bring up anything you feel is missing/misstated here; I'd really like to have a doc that people feel is a fair representation of the community's views to bring to the core python team. That way we either get our operators or not once and for all, but this issue can be put to rest in the language for good (as of 2.7/3.1, of course). So let's have Nov 11 or so as a wrapup deadline on this discussion, so I can have the summary ready for the Nov 13 talk. Best, f -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: numpy-pep225.txt URL: From robert.kern at gmail.com Wed Oct 22 17:15:52 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Oct 2008 16:15:52 -0500 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: References: Message-ID: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> On Wed, Oct 22, 2008 at 16:00, Fernando Perez wrote: > Hi all, > > much delayed, but here it is, finally. The doc regarding our > discussion about PEP 225 is attached, and I'm keeping a public copy > for viewing convenience (with html) here: > > https://cirl.berkeley.edu/fperez/static/numpy-pep225/ > > Note that, as indicated in the link above, the real doc is in bzr, so > you can check it out to generate patches or a branch (the preferred > formats for big changes). > > This is just a first cut, going from memory and notes. I'd appreciate > any feedback, corrections, etc. I'm giving a talk at the BayPiggies > group Nov 13 about SciPy and will take the opportunity to bring this > document up, since that user group has a lot of python people, > including some core developers. 
> Since this is from memory and written by me, it's likely pretty > biased. But I really want to clarify that I'm trying to act as a > scribe here, not to push my own agenda (I do have one :). So please > bring up anything you feel is missing/misstated here; I'd really like > to have a doc that people feel is a fair representation of the > community's views to bring to the core python team. That way we > either get our operators or not once and for all, but this issue can > be put to rest in the language for good (as of 2.7/3.1, of course). Your sentence ending "Operator form for logical_X functions" is not finished. I would add another position (my own) to the "Arguments neutral towards or against the PEP": * I really only care about having just *one* extra operator, one that I can (ab)use for matrix multiplication. It's the only operation that is common enough and with one obvious implementation (I'm looking at you, Matlab's "\") to warrant it, IMO. Doubling the number of operators and special methods is not a price that I'm willing to pay to get it, though. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From perry at stsci.edu Wed Oct 22 17:24:20 2008 From: perry at stsci.edu (Perry Greenfield) Date: Wed, 22 Oct 2008 17:24:20 -0400 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> Message-ID: <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> On Oct 22, 2008, at 5:15 PM, Robert Kern wrote: > > I would add another position (my own) to the "Arguments neutral > towards or against the PEP": > > * I really only care about having just *one* extra operator, one that > I can (ab)use for matrix multiplication. It's the only operation that > is common enough and with one obvious implementation (I'm looking at > you, Matlab's "\") to warrant it, IMO. Doubling the number of > operators and special methods is not a price that I'm willing to pay > to get it, though. > Note that although some of the proposals allow for a doubling of the number of possible operators, numpy doesn't have to use them all and thus doesn't need to double the number of special methods. I'd agree that only a few are really necessary (to matrix multiply I'd add the logical operators as well). Perry From ggellner at uoguelph.ca Wed Oct 22 17:29:12 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Wed, 22 Oct 2008 17:29:12 -0400 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: References: Message-ID: <20081022212912.GA15388@encolpuis> On Wed, Oct 22, 2008 at 02:00:20PM -0700, Fernando Perez wrote: > Hi all, > > much delayed, but here it is, finally. The doc regarding our > discussion about PEP 225 is attached, and I'm keeping a public copy > for viewing convenience (with html) here: > > https://cirl.berkeley.edu/fperez/static/numpy-pep225/ > > Note that, as indicated in the link above, the real doc is in bzr, so > you can check it out to generate patches or a branch (the preferred > formats for big changes). > > This is just a first cut, going from memory and notes. I'd appreciate > any feedback, corrections, etc.
I'm giving a talk at the BayPiggies > group Nov 13 about SciPy and will take the opportunity to bring this > document up, since that user group has a lot of python people, > including some core developers. > > Since this is from memory and written by me, it's likely pretty > biased. But I really want to clarify that I'm trying to act as a > scribe here, not to push my own agenda (I do have one :). So please > bring up anything you feel is missing/misstated here; I'd really like > to have a doc that people feel is a fair representation of the > community's views to bring to the core python team. That way we > either get our operators or not once and for all, but this issue can > be put to rest in the language for good (as of 2.7/3.1, of course). > > So let's have Nov 11 or so as a wrapup deadline on this discussion, so > I can have the summary ready for the Nov 13 talk. > > Best, > > f Thanks for taking the lead on this. Just in case you want more examples: Fortran 90/95 allows for operator definition, like R; see: http://h21007.www2.hp.com/portal/download/files/unprot/fortran/docs/lrm/lrm0184.htm#generic_inter_op I think this is ideal as I am used to it, but at the least I vote for a multiplication operator. Gabriel From robert.kern at gmail.com Wed Oct 22 18:01:34 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Oct 2008 17:01:34 -0500 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> Message-ID: <3d375d730810221501i4e5dd318v315e580967b707cf@mail.gmail.com> On Wed, Oct 22, 2008 at 16:24, Perry Greenfield wrote: > > On Oct 22, 2008, at 5:15 PM, Robert Kern wrote: >> >> I would add another position (my own) to the "Arguments neutral >> towards or against the PEP": >> >> * I really only care about having just *one* extra operator, one that >> I can (ab)use for matrix multiplication. It's the only operation that >> is common enough and with one obvious implementation (I'm looking at >> you, Matlab's "\") to warrant it, IMO. Doubling the number of >> operators and special methods is not a price that I'm willing to pay >> to get it, though. >> > Note that although some of the proposals allow for a doubling of the > number of possible operators, numpy doesn't have to use them all and > thus doesn't need to double the number of special methods. I'd agree > that only a few are really necessary (to matrix multiply I'd add the > logical operators as well). It doesn't double the number of special methods that numpy uses, but it does double the number of special methods defined in the language. *That's* what I'm worried about. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Wed Oct 22 19:04:17 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 22 Oct 2008 16:04:17 -0700 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: References: Message-ID: <48FFB171.8010800@noaa.gov> Fernando Perez wrote: > much delayed, but here it is, finally.
The doc regarding our > discussion about PEP 225 is attached, and I'm keeping a public copy > for viewing convenience (with html) here: > > https://cirl.berkeley.edu/fperez/static/numpy-pep225/ Is there a use for more operators outside of numpy? I imagine there is, but I don't know what they might be. However, we can make a much stronger case for adding them to the language if there is broader applicability. I-never-write-a-python-script-without-numpy-ly yours, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From fperez.net at gmail.com Wed Oct 22 19:14:56 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 22 Oct 2008 16:14:56 -0700 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <200810221537.00378.binet@cern.ch> References: <200810221537.00378.binet@cern.ch> Message-ID: Hi Please keep your replies on list. It's important that this thread in particular is publicly archived, since it may serve as the reference for further discussions. On Wed, Oct 22, 2008 at 3:37 PM, Sebastien Binet wrote: > Fernando, > >> much delayed, but here it is, finally. The doc regarding our >> discussion about PEP 225 is attached, and I'm keeping a public copy >> for viewing convenience (with html) here: >> >> https://cirl.berkeley.edu/fperez/static/numpy-pep225/ >> >> Note that, as indicated in the link above, the real doc is in bzr, so >> you can check it out to generate patches or a branch (the preferred >> formats for big changes). >> >> This is just a first cut, going from memory and notes. I'd appreciate >> any feedback, corrections, etc. > > what about the D-language-like syntax I proposed which has the advantage of not > introducing gazillions of new operators (but just stating that the operation > will be broadcasted/vectorized) ?
>
> # create array of doubles
> a = ...
> # add 42 to all elements
> a[] += 42.
> # apply a function
> b = sin(a[])
> # or this?
> b = sin[](a)
>
> c = a[] + b[]

Regardless of its merits in D, I'd bet this has little chance of flying in python. The PEP is simply about adding some special methods with matching operator form, in line with existing python syntactic tradition. This suggestion is a pretty radical departure from Python's existing style. Having said that, I'm not Guido and I could be 100% wrong here, so I'll most certainly put your comments in when I update the document later. >> I'm giving a talk at the BayPiggies >> group Nov 13 about SciPy > count me in ;) > > cheers, > sebastien. > -- > ################################### > # Dr. Sebastien Binet > # Lawrence Berkeley National Lab.
> # 1 Cyclotron Road > # Berkeley, CA 94720 > ################################### > > Cheers, f

From efiring at hawaii.edu Wed Oct 22 20:17:18 2008 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 22 Oct 2008 14:17:18 -1000 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> Message-ID: <48FFC28E.5070207@hawaii.edu> Perry Greenfield wrote: > On Oct 22, 2008, at 5:15 PM, Robert Kern wrote: >> I would add another position (my own) to the "Arguments neutral >> towards or against the PEP": >> >> * I really only care about having just *one* extra operator, one that >> I can (ab)use for matrix multiplication. It's the only operation that >> is common enough and with one obvious implementation (I'm looking at >> you, Matlab's "\") to warrant it, IMO. Doubling the number of >> operators and special methods is not a price that I'm willing to pay >> to get it, though. >> > Note that although some of the proposals allow for a doubling of the > number of possible operators, numpy doesn't have to use them all and > thus doesn't need to double the number of special methods. I'd agree > that only a few are really necessary (to matrix multiplication I'd add the > logical operators as well). I think a stronger general numpy case might be made for the logical operators than for matrix multiplication. An alternative approach, and I think preferable to introducing new logical operators, is PEP 335: http://www.python.org/dev/peps/pep-0335/. It is still on the "under consideration" list, but it has been there since 2004. Perhaps it needs a strong push? Eric > > Perry

From Chris.Barker at noaa.gov Thu Oct 23 12:32:06 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 23 Oct 2008 09:32:06 -0700 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <48FFC28E.5070207@hawaii.edu> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> Message-ID: <4900A706.70806@noaa.gov> Eric Firing wrote: > I think a stronger general numpy case might be made for the logical > operators than for matrix multiplication. An alternative approach, and > I think preferable to introducing new logical operators, is PEP 335: > http://www.python.org/dev/peps/pep-0335/. I like that! However, it is orthogonal to the additional operators question. Some thoughts: I do like the idea of having additional operators, but it seems that the strongest use case is matrix operations. While the current matrix implementation is clearly inadequate, we had a good discussion a while back about how it could be improved, and maybe if we could find a way to implement that well, we'd be OK. I always did find Matlab's "dot" operators to be kind of annoying (though mostly because what I most often needed was the "dot" version). I guess what I'm saying is that adding a bunch of operators just for a few matrix operations may not be worth it. If we can see substantial other uses, then it may be worth it -- are there any?
An example is the PEP referenced above -- it would not be nearly as strong if numpy were the only motivator -- I think the database example is a very strong addition -- is there something similar for additional operators? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From bryan at cole.uklinux.net Thu Oct 23 12:34:52 2008 From: bryan at cole.uklinux.net (Bryan Cole) Date: Thu, 23 Oct 2008 17:34:52 +0100 Subject: [Numpy-discussion] Optimization of loops In-Reply-To: <200810221821.48143.yger@unic.cnrs-gif.fr> References: <200810221821.48143.yger@unic.cnrs-gif.fr> Message-ID: <1224779692.7661.64.camel@pc2.cole.uklinux.net> > >> spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)] > > mysort(spikes) > > should return: > > [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]] > > Intuitively, the simplest way to do that is to append elements while going > through all the tuples of the main list (called spikes) to empty lists:
>
> res = [[] for i in xrange(N)]
>
> for id, time in my_list:
>     res[id].append(time)
>
> But this loop seems to be incredibly slow for large lists !
>
> A faster way (after having performed some profiling) seems to do:
>
> spikes = numpy.array(spikes) # Convert the list into a numpy array
> res = []
> for id in xrange(N):
>     res.append(spikes[spikes[:,0] == id, 1]) # Use Numpy indexes
>
> Nevertheless, this is still rather slow. Does anybody have any idea about a > faster way to do this ? Is there a Numpy function that could be used ?

I'm surprised you find using numpy indexing (your second method) is faster. In my simple test (N=10, 100000 points in the list), your method 1 (using python lists) took 0.15s whereas your method 2 (numpy indexing) took 1.8s. Big difference. I wonder how you are profiling this. I expect the numpy indexing to be quite fast (and you only do it N times) but the overhead of copying the entire list into an array is probably the bottleneck. If you need to iterate over the final output arrays, there's a further penalty in using a numpy array as iteration is not as fast as for a python list. There might be some benefit in using numpy arrays if you can create your input directly as an array without creating the list-of-tuples first. If you really need more speed you could try Cython (a means of compiling python to C-code, while adding type-declarations to permit optimisation). There may be a worthwhile improvement running the python-list method through Cython (I'll try this tonight to satisfy my own curiosity). If you can arrange for the inputs and outputs to be numpy arrays, then you can do the iteration over the data and copy into the output array using pure C (but coded in Cython; much easier than C). This will be as fast as it's possible to go. You need one pass to figure out how big the output arrays need to be then a second to copy the data. Finally, an even better approach may be to avoid creating such a large data-structure in the first place. If you can replace your list with a generator, you save time by avoiding the need to allocate the memory to hold a large list. Similarly, if you don't need to store the output lists for repeated use, then outputting a list of generators may be more efficient. Whether this is viable depends on the context.
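To make the pure-python version concrete, here is a rough, untested sketch (the helper name is made up); note it only makes a single pass over its argument, so `spikes` could just as well be a generator as a list:

from collections import defaultdict

def group_times_by_id(spikes):
    """Collect the times for each id in a single pass.

    `spikes` is any iterable of (id, time) pairs; ids are
    assumed to be integers 0..N-1.
    """
    groups = defaultdict(list)
    for id, time in spikes:
        groups[id].append(time)
    # Return the lists in id order; ids with no spikes give [].
    N = max(groups) + 1 if groups else 0
    return [groups[i] for i in xrange(N)]

spikes = [(0, 2.3), (1, 5.6), (3, 2.5), (0, 5.2), (3, 10.2), (2, 16.2)]
print group_times_by_id(spikes)
# -> [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]]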
HTH BC > > Thanks in advance, > > Pierre

From charlesr.harris at gmail.com Thu Oct 23 13:52:16 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 23 Oct 2008 11:52:16 -0600 Subject: [Numpy-discussion] Optimization of loops In-Reply-To: <200810221821.48143.yger@unic.cnrs-gif.fr> References: <200810221821.48143.yger@unic.cnrs-gif.fr> Message-ID: On Wed, Oct 22, 2008 at 10:21 AM, Pierre Yger wrote: > Hi all, > > This is my first mail to the mailing list, and I would like to know if anybody > has a great idea about the use or not of Numpy and loops in Python. > > So here is my problem : I've a large list of tuple (id, time), > id being integer between [0, ..., N] and time float values. > I want to have a mysort() function that will be able to explode this list into > N lists of differents sizes, that will contained the times associated to each > id. > > Example: > > >> spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)] > > mysort(spikes) > > should return: > > [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]] > > Intuitively, the simplest way to do that is to append elements while going > through all the tuples of the main list (called spikes) to empty lists:
>
> res = [[] for i in xrange(N)]
>
> for id, time in my_list:
>     res[id].append(time)
>
> But this loop seems to be incredibly slow for large lists !
>
> A faster way (after having performed some profiling) seems to do:
>
> spikes = numpy.array(spikes) # Convert the list into a numpy array
> res = []
> for id in xrange(N):
>     res.append(spikes[spikes[:,0] == id, 1]) # Use Numpy indexes
>
> Nevertheless, this is still rather slow. Does anybody have any idea about a > faster way to do this ? Is there a Numpy function that could be used ?

If you want to stick to lists you can try something like

In [1]: import bisect

In [2]: spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)]

In [3]: spikes.sort()

In [4]: ind = [i for i,j in spikes]

In [5]: tim = [j for i,j in spikes]

In [6]: mylist = []

In [7]: for i in xrange(4) :
   ...:     beg = bisect.bisect_left(ind,i)
   ...:     end = bisect.bisect_right(ind,i)
   ...:     mylist.append(tim[beg:end])
   ...:

In [8]: mylist
Out[8]: [[2.2999999999999998, 5.2000000000000002], [5.5999999999999996], [16.199999999999999], [2.5, 10.199999999999999]]

There is a numpy version of this also which might be faster if your lists are really huge. Chuck
From stefan at sun.ac.za Thu Oct 23 17:04:37 2008 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Thu, 23 Oct 2008 23:04:37 +0200 Subject: [Numpy-discussion] Optimization of loops In-Reply-To: References: <200810221821.48143.yger@unic.cnrs-gif.fr> Message-ID: <9457e7c80810231404s354e025h849711ac6bb0ca60@mail.gmail.com> 2008/10/23 Charles R Harris : > In [4]: ind = [i for i,j in spikes] > In [5]: tim = [j for i,j in spikes] Just for interest's sake, 'zip' is an interesting function in that it works both ways around:

In [26]: zip(*zip([1,2,3],[3,4,5]))
Out[26]: [(1, 2, 3), (3, 4, 5)]

So a person can do

In [27]: ind, tim = zip(*spikes)

Cheers Stéfan

From pierre.yger at gmail.com Thu Oct 23 18:55:50 2008 From: pierre.yger at gmail.com (Pierre Yger) Date: Fri, 24 Oct 2008 00:55:50 +0200 Subject: [Numpy-discussion] Optimization of loops In-Reply-To: <1224779692.7661.64.camel@pc2.cole.uklinux.net> References: <200810221821.48143.yger@unic.cnrs-gif.fr> <1224779692.7661.64.camel@pc2.cole.uklinux.net> Message-ID: <490100F6.6030806@unic.cnrs-gif.fr> Thanks everybody for all your feedback !!! Ok, so first, I just have to clarify what my problem was and why I used Numpy: I want to extract only some ids from the lists. In fact, I have a test in my list implementation:

for id, time in my_list:
    if id in id_list:
        res[id].append(time)

So this test is what slows down the whole stuff. The implementation proposed with bisect is very nice and has the advantage of not spending time allocating numpy arrays, and it also avoids the previous test. The ids that are not in id_list simply get empty lists, and that's what I wanted.

import bisect

my_list.sort()
ind, tim = zip(*my_list)
mylist = []
for i in id_list:
    beg = bisect.bisect_left(ind, i)
    end = bisect.bisect_right(ind, i)
    mylist.append(tim[beg:end])

But thanks a lot for all your precious comments; you solved a big part of my problem. I'll see how to use a generator and whether it speeds up the whole process or not. I would also be very interested to know what the code looks like with Cython and what the results are.... Cheers, Pierre Yger
From christoph_goebl at hotmail.com Fri Oct 24 05:49:39 2008 From: christoph_goebl at hotmail.com (Christoph Göbl) Date: Fri, 24 Oct 2008 11:49:39 +0200 Subject: [Numpy-discussion] installation problem on Red Hat Message-ID: dear members, I'm very sorry to bother you with a (hopefully) simple problem... I need python and the numerical package to run another program. I installed Python, it works fine. But I can't install the numpy package. To install the older Numeric package was no problem, but I need the newer numpy... after "python setup.py install" I get an error message after some time: ...
compiling C sources
C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

compile options: '-DNO_ATLAS_INFO=1 -Inumpy/core/include -Ibuild/src.linux-x86_64-2.5/numpy/core/include/numpy -Inumpy/core/src -Inumpy/core/include -I/usr/local/include/python2.5 -c'
/usr/local/bin/g77 -g -Wall -g -Wall -shared build/temp.linux-x86_64-2.5/numpy/linalg/lapack_litemodule.o build/temp.linux-x86_64-2.5/numpy/linalg/python_xerbla.o -L/usr/lib -llapack -lblas -lg2c -o build/lib.linux-x86_64-2.5/numpy/linalg/lapack_lite.so
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.so when searching for -llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.a when searching for -llapack
/usr/bin/ld: skipping incompatible /usr/lib/libblas.so when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libblas.a when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.so when searching for -lg2c
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.a when searching for -lg2c
/usr/bin/ld: skipping incompatible /usr/lib/libfrtbegin.a when searching for -lfrtbegin
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.so when searching for -lg2c
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.a when searching for -lg2c
/usr/bin/ld: cannot find -lgcc_s
collect2: ld returned 1 exit status
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.so when searching for -llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.a when searching for -llapack
/usr/bin/ld: skipping incompatible /usr/lib/libblas.so when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libblas.a when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.so when searching for -lg2c
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.a when searching for -lg2c
/usr/bin/ld: skipping incompatible /usr/lib/libfrtbegin.a when searching for -lfrtbegin
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.so when searching for -lg2c
/usr/bin/ld: skipping incompatible /usr/lib/libg2c.a when searching for -lg2c
/usr/bin/ld: cannot find -lgcc_s
collect2: ld returned 1 exit status
error: Command "/usr/local/bin/g77 -g -Wall -g -Wall -shared build/temp.linux-x86_64-2.5/numpy/linalg/lapack_litemodule.o build/temp.linux-x86_64-2.5/numpy/linalg/python_xerbla.o -L/usr/lib -llapack -lblas -lg2c -o build/lib.linux-x86_64-2.5/numpy/linalg/lapack_lite.so" failed with exit status 1
...

I don't know what I can do... On a Suse 10.2 it ran easily, but on the other computer, Red Hat 3.4.3_9, X86_64, gcc 3.4.3, there is always this error message. I read in another forum that a person solved a similar problem using unsetenv ldflags But - sorry, I'm a newbie in Linux and Python - there the installation was on another platform I think. Anyway, maybe it's a linking problem? thank you very much for any thoughts you may waste on my problems...
best regards, Christoph

From nadavh at visionsense.com Fri Oct 24 10:41:06 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Fri, 24 Oct 2008 16:41:06 +0200 Subject: [Numpy-discussion] installation problem on Red Hat References: Message-ID: <710F2847B0018641891D9A216027636029C2E2@ex3.envision.co.il> It could be a version mismatch between two gcc (and the corresponding libraries) versions: you surely have gcc at /usr/bin, but the fortran compiler you use (g77) is at /usr/local/bin. Nadav

-----Original Message----- From: numpy-discussion-bounces at scipy.org on behalf of Christoph Göbl Sent: Fri 24-October-08 11:49 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] installation problem on Red Hat
From f.yw at hotmail.com Fri Oct 24 12:30:15 2008 From: f.yw at hotmail.com (frank wang) Date: Fri, 24 Oct 2008 10:30:15 -0600 Subject: [Numpy-discussion] help to speed up the python code In-Reply-To: References: Message-ID: Hi, I have to send this request a second time since my first message contained the attached data file, which was too big and was blocked by the system. So this time I will not attach the data file. I have converted a matlab function to python using numpy. Both matlab and python run slowly. I know that numpy has a lot of features, so I hope some experts can help me to speed up the code. Here is how I run the code: upsample.upsample(cdata,4*10240000*401.0/812.0,25600000.0,'r') where cdata is about 70000 complex samples. Thanks, Frank

from numpy import zeros, ceil, arange, concatenate, sinc
from pylab import clf, figure, psd

def upsample(input, Fs_old, Fsamp, filt_type):
    """Resample the input data from rate Fs_old to Fsamp.

    Note: y = zeros((N)) has shape (N,); y = zeros((N,1)) has shape (N,1).
    Example of how to read a two-column float data file created by Matlab:
        d = fromfile("filename", dtype='float', count=-1, sep=' ')
        x = len(d)
        data = d.reshape([x/2, 2])
    """
    Ts = 1.0/Fs_old
    Tsamp = 1.0/Fsamp
    Fw = 6000000.0
    L = len(input)
    N = int(ceil(Fsamp/Fs_old*L))  # number of output samples (must be an int)
    y = zeros(N, dtype='float64')
    t0 = arange(0, Ts, Tsamp)
    P = 16                         # half-width of the sinc interpolation window
    # pad the input with P zeros on each side
    # (concatenate along the only axis; the original passed axis=1,
    # which fails for 1-d arrays)
    input = concatenate((zeros(P), input, zeros(P)))
    # first output block
    out = 0
    for mm in arange(P+1):
        tt = t0 - mm*Ts
        out = out + input[P+mm]*sinc(Fw*tt)
    y[0:len(t0)] = out
    B = len(t0)
    # remaining output blocks
    for m in arange(P+2, L+P+1):
        delta = Tsamp - (Ts - t0[-1])
        t1 = arange(delta, Ts, Tsamp)
        out = 0
        for mm in arange(-P, P+1):
            tt = (m-1-P)*Ts + t1 - (mm+m-(P+2)+1)*Ts
            out = out + input[m+mm-1]*sinc(Fw*tt)
        y[B:B+len(t1)] = out
        t0 = t1
        B = B + len(t1)
    clf()
    figure(4)
    psd(y, 256, Fs=25.6)
    #show()
From borreguero at gmail.com Fri Oct 24 13:43:43 2008 From: borreguero at gmail.com (Jose Borreguero) Date: Fri, 24 Oct 2008 13:43:43 -0400 Subject: [Numpy-discussion] creating a Numeric array from a numpy array LEAKS memory In-Reply-To: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> References: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> Message-ID: <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> Dear numpy users, I need to pass a Numeric array to some oldie code from a numpy array. I decided to go like this:

for i in range(BIGNUMBER):
    my_numpy_array = grabArray(i)
    na = Numeric.array(my_numpy_array, Numeric.Float)
    oldie_code(na)

The constructor line: na=Numeric.array( my_numpy_array, Numeric.Float) does leak memory. Is there a way to pass the Numeric array to oldie_code without the leaks? Regards, -- Jose M. Borreguero Postdoctoral Associate Oak Ridge National Laboratory P.O. Box 2008, M.S. 6164 Oak Ridge, TN 37831 phone: 865-241-3071 fax: 865-576-5491 Email: borreguerojm at ornl.gov

From oliphant at enthought.com Fri Oct 24 13:54:03 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 24 Oct 2008 12:54:03 -0500 Subject: [Numpy-discussion] creating a Numeric array from a numpy array LEAKS memory In-Reply-To: <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> References: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> Message-ID: <49020BBB.5000302@enthought.com> Jose Borreguero wrote: > Dear numpy users, > > I need to pass a Numeric array to some oldie code from a numpy array.
> I decided to go like this:
>
> for i in range(BIGNUMBER):
>     my_numpy_array = grabArray(i)
>     na = Numeric.array(my_numpy_array, Numeric.Float)
>     oldie_code(na)
>
> The constructor line: na=Numeric.array( my_numpy_array, Numeric.Float) does leak memory.
>
> Is there a way to pass the Numeric array to oldie_code without the leaks?

This should work without memory leaks, but there may be a bug in NumPy or Numeric. Which version of Numeric and NumPy do you have? -Travis

From borreguero at gmail.com Fri Oct 24 14:16:03 2008 From: borreguero at gmail.com (Jose Borreguero) Date: Fri, 24 Oct 2008 14:16:03 -0400 Subject: [Numpy-discussion] creating a Numeric array from a numpy array LEAKS memory In-Reply-To: <49020BBB.5000302@enthought.com> References: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> <49020BBB.5000302@enthought.com> Message-ID: <7cced4ed0810241116k77bf996ew586f8c327c7e2c2d@mail.gmail.com> numpy 1.1.0 (from /usr/lib/python2.4/site-packages/numpy/version.py) Numeric 24.2 (from /usr/lib/python2.4/site-packages/Numeric/numeric_version.py) I also tried with an intermediate list, but got the same result:

mylist = list(my_numpy_array)
na = Numeric.array(mylist, Numeric.Float)

I don't have memory leaks if I use something like:

mylist = [0.0]*BIGNUMBER
na = Numeric.array(mylist, Numeric.Float)

-Jose -- Jose M.
Borreguero Postdoctoral Associate Oak Ridge National Laboratory P.O. Box 2008, M.S. 6164 Oak Ridge, TN 37831 phone: 865-241-3071 fax: 865-576-5491 Email: borreguerojm at ornl.gov

From borreguero at gmail.com Fri Oct 24 14:39:59 2008 From: borreguero at gmail.com (Jose Borreguero) Date: Fri, 24 Oct 2008 14:39:59 -0400 Subject: [Numpy-discussion] creating a Numeric array from a numpy array LEAKS memory In-Reply-To: <7cced4ed0810241116k77bf996ew586f8c327c7e2c2d@mail.gmail.com> References: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> <49020BBB.5000302@enthought.com> <7cced4ed0810241116k77bf996ew586f8c327c7e2c2d@mail.gmail.com> Message-ID: <7cced4ed0810241139r1ebb887ajf05fec86226bb42b@mail.gmail.com> My bad. Using the intermediate list does *not* leak. Still, the original problem stays. Can anyone run the following code on their machine and see if they have leaks? Maybe it only happens to me :(

import numpy, Numeric
big = 10000000
na = numpy.array([0.0,])
for i in range(big):
    Na = Numeric.array(na, Numeric.Float)

-Jose -- Jose M. Borreguero Postdoctoral Associate Oak Ridge National Laboratory P.O. Box 2008, M.S. 6164 Oak Ridge, TN 37831 phone: 865-241-3071 fax: 865-576-5491 Email: borreguerojm at ornl.gov

From pav at iki.fi Fri Oct 24 14:52:06 2008 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 24 Oct 2008 18:52:06 +0000 (UTC) Subject: [Numpy-discussion] creating a Numeric array from a numpy array LEAKS memory References: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> <49020BBB.5000302@enthought.com> <7cced4ed0810241116k77bf996ew586f8c327c7e2c2d@mail.gmail.com> <7cced4ed0810241139r1ebb887ajf05fec86226bb42b@mail.gmail.com> Message-ID: Fri, 24 Oct 2008 14:39:59 -0400, Jose Borreguero wrote: > My bad. Using the intermediate list does *not* leak. Still, the original > problem stays. Can anyone run the following code on their machine and > see if they have leaks? Maybe it only happens to me :(
>
> import numpy, Numeric
> big = 10000000
> na = numpy.array([0.0,])
> for i in range(big):
>     Na = Numeric.array(na, Numeric.Float)

Yep, leaks also here: (Numeric 24.2, numpy 1.2.0)

import sys, numpy, Numeric
na = numpy.array([0.0])
for i in xrange(1000000):
    foo = Numeric.array(na, Numeric.Float)
print sys.getrefcount(na)

The getrefcount prints 1000002, so it seems like there's a refcount error somewhere. But

na = numpy.array([0.0])
for i in xrange(1000000):
    foo = numpy.array(na, numpy.float_)
print sys.getrefcount(na)

refcounts correctly. -- Pauli Virtanen

From numpy-discussion at maubp.freeserve.co.uk Fri Oct 24 15:05:01 2008 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Fri, 24 Oct 2008 20:05:01 +0100 Subject: [Numpy-discussion] creating a Numeric array from a numpy array LEAKS memory In-Reply-To: References: <7cced4ed0810241036o22bfbc7dk4a052810f0bb3f43@mail.gmail.com> <7cced4ed0810241043h40c7779fnf6053a619c7c1d3a@mail.gmail.com> <49020BBB.5000302@enthought.com> <7cced4ed0810241116k77bf996ew586f8c327c7e2c2d@mail.gmail.com> <7cced4ed0810241139r1ebb887ajf05fec86226bb42b@mail.gmail.com> Message-ID: <320fb6e00810241205lcf48b61xfa89ce5784961d7a@mail.gmail.com> On Fri, Oct 24, 2008 at 7:52 PM, Pauli Virtanen wrote: > Yep, leaks also here: (Numeric 24.2, numpy 1.2.0)
>
> import sys, numpy, Numeric
> na = numpy.array([0.0])
> for i in xrange(1000000):
>     foo = Numeric.array(na, Numeric.Float)
> print sys.getrefcount(na)
>
> The getrefcount prints 1000002, so it seems like there's a refcount error > somewhere.

Same leak here using Numeric 24.2 and numpy 1.0.1 on Linux.

> But
>
> na = numpy.array([0.0])
> for i in xrange(1000000):
>     foo = numpy.array(na, numpy.float_)
> print sys.getrefcount(na)
>
> refcounts correctly.

Also fine. And for the record using the intermediate list also works for me:

import sys, numpy, Numeric
na = numpy.array([0.0])
na_list = list(na)
for i in xrange(1000000):
    foo = Numeric.array(na_list, Numeric.Float)
print sys.getrefcount(na)

Peter

From mat.yeates at gmail.com Fri Oct 24 17:48:55 2008 From: mat.yeates at gmail.com (Mathew Yeates) Date: Fri, 24 Oct 2008 14:48:55 -0700 Subject: [Numpy-discussion] help vectorizing something Message-ID: <7d0c05ac0810241448h6aceae0djded79aa0004238fd@mail.gmail.com> Hi I have 2 vectors A and B. For each value in A I want to find the location in B of the same value. Both A and B have unique elements.
Of course I could do something like

For each index of A:
    v = A[index]
    location = numpy.where(B == v)

But I have very large lists and it will take too long. Thanks to any one of you vectorization gurus that has any ideas. Mathew

From charlesr.harris at gmail.com Fri Oct 24 18:12:46 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 24 Oct 2008 16:12:46 -0600 Subject: [Numpy-discussion] help vectorizing something In-Reply-To: <7d0c05ac0810241448h6aceae0djded79aa0004238fd@mail.gmail.com> References: <7d0c05ac0810241448h6aceae0djded79aa0004238fd@mail.gmail.com> Message-ID: On Fri, Oct 24, 2008 at 3:48 PM, Mathew Yeates wrote: > Hi > I have 2 vectors A and B. For each value in A I want to find the location > in B of the same value. Both A and B have unique elements. > > Of course I could do something like > For each index of A: > v = A[index] > location = numpy.where(B == v) > > But I have very large lists and it will take too long. >

In [1]: A = array([1,2,3])

In [2]: B = array([5,1,3,0,2,4])

In [3]: i = B.argsort()

In [4]: Bsorted = B[i]

In [5]: indices = i[searchsorted(Bsorted,A)]

In [6]: indices
Out[6]: array([1, 4, 2])

Chuck

From mat.yeates at gmail.com Fri Oct 24 18:23:16 2008 From: mat.yeates at gmail.com (Mathew Yeates) Date: Fri, 24 Oct 2008 15:23:16 -0700 Subject: [Numpy-discussion] help vectorizing something In-Reply-To: References: <7d0c05ac0810241448h6aceae0djded79aa0004238fd@mail.gmail.com> Message-ID: <7d0c05ac0810241523k3ee518c4s815bb316a52df979@mail.gmail.com> hmmmm. I don't understand the result. If a=array([ 1, 2, 3, 7, 10]) and b=array([ 1, 2, 3, 8, 10]) I want to get the result [0,1,2,4] but searchsorted(a,b) produces [0,1,2,4,4] ??
and searchsorted(b,a) produces [0,1,2,3,4] ?? Mathew

From charlesr.harris at gmail.com Fri Oct 24 19:03:33 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 24 Oct 2008 17:03:33 -0600 Subject: [Numpy-discussion] help vectorizing something In-Reply-To: <7d0c05ac0810241523k3ee518c4s815bb316a52df979@mail.gmail.com> References: <7d0c05ac0810241448h6aceae0djded79aa0004238fd@mail.gmail.com> <7d0c05ac0810241523k3ee518c4s815bb316a52df979@mail.gmail.com> Message-ID: On Fri, Oct 24, 2008 at 4:23 PM, Mathew Yeates wrote: > hmmmm. I don't understand the result. > > If > a=array([ 1, 2, 3, 7, 10]) and b=array([ 1, 2, 3, 8, 10]) > > I want to get the result [0,1,2,4] but searchsorted(a,b) produces > [0,1,2,4,4] ?? Because b isn't a subset of a. You can get around this by counting the number of matches, i.e., cnt = searchsorted(a,b, side='right') - searchsorted(a, b, side='left'), so that

In [1]: a = array([ 1, 2, 3, 7, 10])

In [2]: b = array([ 1, 2, 3, 8, 10])

In [3]: il = searchsorted(a, b, side='left')

In [4]: ir = searchsorted(a, b, side='right')

In [5]: compress(ir - il, il)
Out[5]: array([0, 1, 2, 4])

Chuck

From cournape at gmail.com Sat Oct 25 05:57:23 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 25 Oct 2008 18:57:23 +0900 Subject: [Numpy-discussion] problem installing numpy using scons In-Reply-To: <5b8d13220810190357i618c1ec4t689d701518f5fb15@mail.gmail.com> References: <4db580fd0810181606q4b89fdday514a10617409d350@mail.gmail.com> <5b8d13220810190357i618c1ec4t689d701518f5fb15@mail.gmail.com> Message-ID: <5b8d13220810250257q5c1c3397l255f787c5fd49c96@mail.gmail.com> On Sun, Oct 19, 2008 at 7:57 PM, David Cournapeau wrote: > > The trunk requires some changes which are not released yet. It will be > in the next release of numscons (0.9.3). Which has just been released. David

From hoytak at cs.ubc.ca Sat Oct 25 12:09:38 2008 From: hoytak at cs.ubc.ca (Hoyt Koepke) Date: Sat, 25 Oct 2008 09:09:38 -0700 Subject: [Numpy-discussion] problem installing numpy using scons In-Reply-To: <5b8d13220810250257q5c1c3397l255f787c5fd49c96@mail.gmail.com> References: <4db580fd0810181606q4b89fdday514a10617409d350@mail.gmail.com> <5b8d13220810190357i618c1ec4t689d701518f5fb15@mail.gmail.com> <5b8d13220810250257q5c1c3397l255f787c5fd49c96@mail.gmail.com> Message-ID: <4db580fd0810250909x1a014ae9s976fd1cb52fb4464@mail.gmail.com> Excellent; thank you -- I will test it out soon. -- Hoyt

From nono.231 at gmail.com Sat Oct 25 16:14:46 2008 From: nono.231 at gmail.com (I. Soumpasis) Date: Sat, 25 Oct 2008 21:14:46 +0100 Subject: [Numpy-discussion] ANN: Python programs for epidemic modelling Message-ID: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> Dear lists, DeductiveThinking.com now provides the Python programs for the book of M. Keeling & P. Rohani "Modeling Infectious Diseases in Humans and Animals", Princeton University Press, 2008. The book has on-line material which includes programs for different models in various programming languages and mathematical tools such as "C++, FORTRAN and Matlab, while some are also coded in the web-based Java programming language to allow readers to quickly experiment with these types of models", as stated at the website. The Python versions of the programs were written long ago and submitted to the book's on-line material website (available soon). The Python programs, with the basic equations modelled and the results in figures, have now been uploaded on a special wiki page of DeductiveThinking.com. Since the programs are heavily using the numpy, scipy and matplotlib libraries, I send this announcement to all the three lists and the main python-list; sorry for double-posting.
The announcement with the related links is uploaded here http://blog.deductivethinking.com/?p=29. The programs are at http://wiki.deductivethinking.com/wiki/Python_Programs_for_Modelling_Infectious_Diseases_book. Those who are interested in modelling and epidemiology can take a look at the main site (http://deductivethinking.com) or the main page of the wiki (http://wiki.deductivethinking.com) and follow the epidemiology links. The website is just getting started, so only limited information has been uploaded so far. Thanks for your time and I hope it will be useful for some people, Best Regards, Ilias Soumpasis

From ggellner at uoguelph.ca Sat Oct 25 17:04:12 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Sat, 25 Oct 2008 17:04:12 -0400 Subject: [Numpy-discussion] ANN: Python programs for epidemic modelling In-Reply-To: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> References: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> Message-ID: <20081025210412.GA6250@encolpuis> > Since the programs are heavily using the numpy, scipy and matplotlib > libraries, I send this announcement to all the three lists and the main > python-list; sorry for double-posting. The announcement with the related > links is uploaded here http://blog.deductivethinking.com/?p=29. The > programs are at > http://wiki.deductivethinking.com/wiki/Python_Programs_for_Modelling_Infectious_Diseases_book. Thanks for this, it is great! Gabriel

From aisaac at american.edu Sat Oct 25 17:46:25 2008 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 25 Oct 2008 17:46:25 -0400 Subject: [Numpy-discussion] [SciPy-user] ANN: Python programs for epidemic modelling In-Reply-To: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> References: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> Message-ID: <490393B1.7030904@american.edu> On 10/25/2008 4:14 PM I. Soumpasis apparently wrote: > http://blog.deductivethinking.com/?p=29 This is cool. But I do not see a license. May I hope this is released under the new BSD license, like the packages it depends on? Thanks, Alan Isaac

From nono.231 at gmail.com Sat Oct 25 18:07:53 2008 From: nono.231 at gmail.com (I. Soumpasis) Date: Sat, 25 Oct 2008 23:07:53 +0100 Subject: [Numpy-discussion] [SciPy-user] ANN: Python programs for epidemic modelling In-Reply-To: <490393B1.7030904@american.edu> References: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> <490393B1.7030904@american.edu> Message-ID: <3ff92a550810251507q3696aa15x46f5b96684ff7f3c@mail.gmail.com> 2008/10/25 Alan G Isaac > On 10/25/2008 4:14 PM I. Soumpasis apparently wrote: > > http://blog.deductivethinking.com/?p=29 > > This is cool. > But I do not see a license. > May I hope this is released under the new BSD license, > like the packages it depends on? > > The programs are GPL licensed. More info on the section of copyrights http://wiki.deductivethinking.com/wiki/Deductive_Thinking:Copyrights. I hope it is ok, Ilias
From aisaac at american.edu Sat Oct 25 23:55:43 2008 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 25 Oct 2008 23:55:43 -0400 Subject: [Numpy-discussion] [SciPy-user] ANN: Python programs for epidemic modelling In-Reply-To: <3ff92a550810251507q3696aa15x46f5b96684ff7f3c@mail.gmail.com> References: <3ff92a550810251314m18d0596fxffb0a09658a21260@mail.gmail.com> <490393B1.7030904@american.edu> <3ff92a550810251507q3696aa15x46f5b96684ff7f3c@mail.gmail.com> Message-ID: <4903EA3F.2070504@american.edu> On 10/25/2008 6:07 PM I. Soumpasis wrote: > The programs are GPL licensed. More info on the section of copyrights > http://wiki.deductivethinking.com/wiki/Deductive_Thinking:Copyrights. > I hope it is ok, Well, that depends what you mean by "ok". Obviously, the author picks the license s/he prefers. But a GPL license means that some people will avoid your code, so you may wish to make sure you have thought through the licensing issue for this code carefully. As a point of comparison, note that all your package dependencies have a new BSD license. Alan Isaac

From millman at berkeley.edu Sun Oct 26 04:58:57 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 26 Oct 2008 01:58:57 -0700 Subject: [Numpy-discussion] NumPy 1.2.1 to be tagged 10/28/08 Message-ID: Hey, I plan to release NumPy 1.2.1 before the SciPy 0.7 sprint that Stefan is organizing for Nov. 1-2. So I will be tagging the release on Wednesday, Oct. 29th. Is anyone planning to back-port any more fixes to the 1.2.x branch? If so, please do so by Tuesday, Oct. 28th. Here is what has been back-ported (http://projects.scipy.org/scipy/numpy/log/branches/1.2.x): bug fix for subclassing object arrays: http://projects.scipy.org/scipy/numpy/changeset/5891 MaskedArray fixes: http://projects.scipy.org/scipy/numpy/changeset/5936 http://projects.scipy.org/scipy/numpy/changeset/5948 Python 2.4 compatible lookfor: http://projects.scipy.org/scipy/numpy/changeset/5945 Setuptools fix: http://projects.scipy.org/scipy/numpy/changeset/5956 Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/

From philbinj at gmail.com Sun Oct 26 09:43:49 2008 From: philbinj at gmail.com (James Philbin) Date: Sun, 26 Oct 2008 13:43:49 +0000 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <4900A706.70806@noaa.gov> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> Message-ID: <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> This hack for defining infix operators might be relevant: http://code.activestate.com/recipes/384122/ James

From pav at iki.fi Sun Oct 26 10:13:37 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 26 Oct 2008 16:13:37 +0200 Subject: [Numpy-discussion] docs.scipy.org -- new site for the documentation marathon Message-ID: <1225030418.8147.86.camel@idol> Hi all, Gaël and I finished moving the Scipy documentation editor web app to a new site (at Enthought, thanks!). The Numpy documentation marathon can now be found at http://docs.scipy.org/numpy/ instead of the old location. Everything should work as previously, or even better, but if something is broken, please notify me. *** The domain name was, however, very tempting for hosting the Sphinx-generated documentation as well, so we put that there too. You can browse to http://docs.scipy.org/ to see how it currently looks.
Suggestions on how to improve this are welcome! Currently, the sources for the Sphinx stuff can be found here: https://code.launchpad.net/~pauli-virtanen/scipy/numpy-refguide http://www.iki.fi/pav/tmp/scipy-refguide.tar.gz http://www.iki.fi/pav/tmp/docsscipy-frontpage.tar.gz (The latter two are tarballs because of problems with launchpad.net and apparently Bzr 1.6.1.) But I think I will move all of them to Scipy's SVN -- I believe the advantages of using the same revision control system as numpy and scipy themselves will outweigh those of using Bzr. Eventually, we can also make all this documentation editable on the web via the Documentation marathon app. (Actually, we already could, but creating new files is not possible using the app yet, and there are some performance issues remaining in rendering long documentation files.) *** Now, the role of docs.scipy.org warrants discussion, because on the one hand, the domain "docs.scipy.org" looks very official, and on the other hand, "scipy.org/Documentation" claims to be the place for official documentation. What do you think: should we use the current front page of docs.scipy.org, shifting the focus and entry point of documentation to the Sphinx-generated pages, or still keep the focus on the Moin wiki at scipy.org? I'd like to see a move towards a full dedicated documentation site -- I believe using Sphinx is the way to go in the future, and documentation generated by it does have a reassuring and clear visual appearance. Also, making the Sphinx documentation more prominent could help people to focus documentation efforts, and write more of it :) Cheers, -- Pauli Virtanen

From wnbell at gmail.com Sun Oct 26 12:41:28 2008 From: wnbell at gmail.com (Nathan Bell) Date: Sun, 26 Oct 2008 12:41:28 -0400 Subject: [Numpy-discussion] docs.scipy.org -- new site for the documentation marathon In-Reply-To: <1225030418.8147.86.camel@idol> References: <1225030418.8147.86.camel@idol> Message-ID: On Sun, Oct 26, 2008 at 10:13 AM, Pauli Virtanen wrote: > > Now, the role of docs.scipy.org warrants discussion, because on the one > hand, the domain "docs.scipy.org" looks very official, and on the other > hand, "scipy.org/Documentation" claims to be the place for official > documentation. What do you think: should we use the current front page > of docs.scipy.org, shifting the focus and entry point of documentation > to the Sphinx-generated pages, or still keep the focus on the Moin wiki > at scipy.org? > > I'd like to see a move towards a full dedicated documentation site -- I > believe using Sphinx is the way to go in the future, and documentation > generated by it does have a reassuring and clear visual appearance. > Also, making the Sphinx documentation more prominent could help people > to focus documentation efforts, and write more of it :) > Great work, I definitely like the look of the Sphinx documentation. As a concrete example of your question, I've been working on some scipy.sparse documentation [1], and it seems like it will be fairly lengthy when completed. Would it be appropriate to merge it into the Sphinx documentation for scipy.sparse [2], or should the Sphinx docs be more concise?
[1] http://www.scipy.org/SciPyPackages/Sparse [2] http://docs.scipy.org/doc/scipy/reference/sparse.html -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/

From millman at berkeley.edu Sun Oct 26 13:25:07 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 26 Oct 2008 10:25:07 -0700 Subject: [Numpy-discussion] docs.scipy.org -- new site for the documentation marathon In-Reply-To: <1225030418.8147.86.camel@idol> References: <1225030418.8147.86.camel@idol> Message-ID: Thanks so much for doing this. It looks great. On Sun, Oct 26, 2008 at 7:13 AM, Pauli Virtanen wrote: > Now, the role of docs.scipy.org warrants discussion, because on the one > hand, the domain "docs.scipy.org" looks very official, and on the other > hand, "scipy.org/Documentation" claims to be the place for official > documentation. What do you think: should we use the current front page > of docs.scipy.org, shifting the focus and entry point of documentation > to the Sphinx-generated pages, or still keep the focus on the Moin wiki > at scipy.org? docs.scipy.org should be the official documentation. The Moin wiki documentation served its purpose, but now there is something much better. I propose that the 'Documentation' sidebar on http://www.scipy.org/ point to http://docs.scipy.org. Eventually, I would like to see all the content from http://www.scipy.org/Documentation move into the Sphinx-based system (and then we can just delete the Moin page). Until that happens we can just leave the link on http://docs.scipy.org to http://www.scipy.org/. Also I have cross-posted my response to the numpy list, but let's have this discussion on the scipy developer's list going forward. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/

From millman at berkeley.edu Sun Oct 26 13:37:00 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 26 Oct 2008 10:37:00 -0700 Subject: [Numpy-discussion] docs.scipy.org -- new site for the documentation marathon In-Reply-To: References: <1225030418.8147.86.camel@idol> Message-ID: On Sun, Oct 26, 2008 at 9:41 AM, Nathan Bell wrote: > As a concrete example of your question, I've been working on some > scipy.sparse documentation [1], and it seems like it will be fairly > lengthy when completed. Would it be appropriate to merge it into the > Sphinx documentation for scipy.sparse [2], or should the Sphinx docs > be more concise? > > [1] http://www.scipy.org/SciPyPackages/Sparse > [2] http://docs.scipy.org/doc/scipy/reference/sparse.html Thanks Nathan for working on the sparse documentation! I would like to see all your documentation in the Sphinx docs: http://docs.scipy.org/doc/scipy/reference/sparse.html All the documentation should be in the Sphinx system. One of the main advantages of using the Sphinx system is that it is much easier to make living documentation if it is more closely integrated with the code. First, when changes are made to the code it is much easier to just use a tool like grep or grin to find all the places where you need to update the documentation. Since we can use matplotlib to generate figures when the documentation is created, we can better ensure that the figures in our documentation can be generated by the code in our documentation. We can also use our testing infrastructure to run any code in the documentation. This has the advantages of providing us with more tests and helping keep the code in our documentation runnable.
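To make this concrete, here is a sketch of the kind of docstring I have in mind -- the function below is hypothetical (not actual scipy.sparse API), but its Examples section both documents the behaviour and can be executed as a doctest by the test infrastructure:

import numpy as np

def row_degrees(a):
    """Count the nonzero entries in each row of a 2-d array.

    Parameters
    ----------
    a : ndarray
        Two-dimensional input array.

    Returns
    -------
    degrees : ndarray
        Number of nonzero entries in each row.

    Examples
    --------
    >>> import numpy as np
    >>> a = np.array([[1, 0, 2], [0, 0, 0]])
    >>> row_degrees(a)
    array([2, 0])
    """
    return (a != 0).sum(axis=1)

If the example in the docstring goes stale, the doctest fails, which is exactly the kind of living documentation I mean.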
I would like to see http://www.scipy.org/SciPyPackages/Sparse removed and all the content merged into http://docs.scipy.org/doc/scipy/reference/sparse.html I have cross-posted my response to the numpy list, but let's have this discussion on the scipy developer's list going forward. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From dwf at cs.toronto.edu Sun Oct 26 21:45:12 2008 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sun, 26 Oct 2008 21:45:12 -0400 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> Message-ID: <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> On 26-Oct-08, at 9:43 AM, James Philbin wrote: > This hack for defining infix operators might be relevant: > http://code.activestate.com/recipes/384122/ I think someone mentioned this at the doc BOF, but it was raised that this has problems with associativity, etc. David From robert.kern at gmail.com Sun Oct 26 23:45:42 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 26 Oct 2008 22:45:42 -0500 Subject: [Numpy-discussion] help to speed up the python code In-Reply-To: References: Message-ID: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com> On Fri, Oct 24, 2008 at 11:30, frank wang wrote: > Hi, > > I have to send this request second time since my first message contains the > attached data file which is too big and was blocked by the system. So this > time I will not attach the data file. > > I have converted a matlab function to python using numpy. both matlab and > python run slow. I know that numpy has a lot of features, so I hope some > experts can help me to speed up the code. Can you describe in higher level terms what you are trying to do? I'm having trouble following the code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From philbinj at gmail.com Mon Oct 27 07:22:58 2008 From: philbinj at gmail.com (James Philbin) Date: Mon, 27 Oct 2008 11:22:58 +0000 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> Message-ID: <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> One operator which could be used is '%'. We could keep the current behaviour for ARRAY%SCALAR but have ARRAY%ARRAY as being matrix multiplication. It has the same precedence as *,/. 
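To make the proposal concrete, a quick illustration (nothing authoritative,
just numpy as it behaves today next to what the proposal would mean):

    import numpy as np

    a = np.array([[1, 2], [3, 4]])
    b = np.array([[5, 6], [7, 8]])

    print a % b         # current semantics: elementwise remainder, [[1 2] [3 4]]
    print np.dot(a, b)  # what a % b would mean instead: [[19 22] [43 50]]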
James

From nicolas.roux at st.com Mon Oct 27 07:41:06 2008
From: nicolas.roux at st.com (Nicolas ROUX)
Date: Mon, 27 Oct 2008 12:41:06 +0100
Subject: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold
Message-ID: <005501c93828$e65e3230$e7ad810a@gnb.st.com>

Hello,

I hope this is not a silly question ;-)
I have a Numpy array, and I want to process it with :
"if the value is lower than Threshold, then increase by Threshold"

I would like to translate it as:
y[y<Threshold] = y + Threshold

To benefit from the Numpy speed.
But this doesn't work, any idea ?

Thanks,
Cheers,
Nicolas.

From ggellner at uoguelph.ca (Gabriel Gellner)
Subject: Re: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold
References: <005501c93828$e65e3230$e7ad810a@gnb.st.com>
Message-ID: <20081027114748.GA18556@encolpuis>

On Mon, Oct 27, 2008 at 12:41:06PM +0100, Nicolas ROUX wrote:
> Hello,
>
> I hope this is not a silly question ;-)
> I have a Numpy array, and I want to process it with :
> "if the value is lower than Threshold, then increase by Threshold"
>
> I would like to translate it as:
> y[y<Threshold] = y + Threshold

You are close, you just need to have the right hand side also use vector
indexing (since otherwise you are trying to add something of length y to a
subset of y):

y[y < Threshold] = y[y < Threshold] + Threshold

Gabriel

From uschmitt at mineway.de Mon Oct 27 07:45:44 2008
From: uschmitt at mineway.de (Uwe Schmitt)
Date: Mon, 27 Oct 2008 12:45:44 +0100
Subject: [Numpy-discussion] [mailinglist] How to do: y[y<Threshold] = y + Threshold
References: <005501c93828$e65e3230$e7ad810a@gnb.st.com>
Message-ID: <4905A9E8.9090100@mineway.de>

Nicolas ROUX schrieb:
> Hello,
>
> I hope this is not a silly question ;-)
> I have a Numpy array, and I want to process it with :
> "if the value is lower than Threshold, then increase by Threshold"
>
> I would like to translate it as:
> y[y<Threshold] = y + Threshold

Hi,

your solution does not work, because the arrays on both
sides do not have the same size in general.

You can do it in place:

y[y<Threshold] += Threshold

> To benefit from the Numpy speed.
> But this doesn't work, any idea ?
>
> Thanks,
> Cheers,
> Nicolas.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
Dr. rer. nat. Uwe Schmitt
F&E Mathematik
mineway GmbH
Science Park 2
D-66123 Saarbrücken
Telefon: +49 (0)681 8390 5334
Telefax: +49 (0)681 830 4376
uschmitt at mineway.de
www.mineway.de
Geschäftsführung: Dr.-Ing. Mathias Bauer
Amtsgericht Saarbrücken HRB 12339

From ggellner at uoguelph.ca Mon Oct 27 08:06:34 2008
From: ggellner at uoguelph.ca (Gabriel Gellner)
Date: Mon, 27 Oct 2008 08:06:34 -0400
Subject: [Numpy-discussion] [mailinglist] How to do: y[y<Threshold] = y + Threshold
References: <005501c93828$e65e3230$e7ad810a@gnb.st.com> <4905A9E8.9090100@mineway.de>
Message-ID: <20081027120634.GA18975@encolpuis>

On Mon, Oct 27, 2008 at 12:45:44PM +0100, Uwe Schmitt wrote:
> Nicolas ROUX schrieb:
> > Hello,
> >
> > I hope this is not a silly question ;-)
> > I have a Numpy array, and I want to process it with :
> > "if the value is lower than Threshold, then increase by Threshold"
> >
> > I would like to translate it as:
> > y[y<Threshold] = y + Threshold
>
> Hi,
>
> your solution does not work, because the arrays on both
> sides do not have the same size in general.
>
> You can do it in place:
>
> y[y<Threshold] += Threshold

Nice, I didn't know this :-) Thanks.
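A quick self-contained check (with made-up numbers) that the two spellings
agree:

    import numpy as np

    Threshold = 0.5
    y = np.array([0.1, 0.7, 0.3, 0.9])

    a = y.copy()
    a[a < Threshold] = a[a < Threshold] + Threshold  # mask on both sides

    b = y.copy()
    b[b < Threshold] += Threshold                    # in-place version

    assert np.allclose(a, b)    # both give [ 0.6  0.7  0.8  0.9]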
Gabriel

From david.douard at logilab.fr Mon Oct 27 07:51:00 2008
From: david.douard at logilab.fr (David Douard)
Date: Mon, 27 Oct 2008 12:51:00 +0100
Subject: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold
References: <005501c93828$e65e3230$e7ad810a@gnb.st.com>
Message-ID: <200810271251.02212.david.douard@logilab.fr>

On Monday 27 October 2008 12:41:06, Nicolas ROUX wrote:
> Hello,
>
> I hope this is not a silly question ;-)
> I have a Numpy array, and I want to process it with :
> "if the value is lower than Threshold, then increase by Threshold"
>
> I would like to translate it as:
> y[y<Threshold] = y + Threshold

let's see :

y[y<Threshold] += Threshold

Is it what you want ?

> To benefit from the Numpy speed.
> But this doesn't work, any idea ?
>
> Thanks,
> Cheers,
> Nicolas.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
David Douard LOGILAB, Paris (France), +33 1 45 32 03 12
Formations Python, Zope, Debian : http://www.logilab.fr/formations
Développement logiciel sur mesure : http://www.logilab.fr/services
Informatique scientifique : http://www.logilab.fr/science

From nicolas.roux at st.com Mon Oct 27 08:51:46 2008
From: nicolas.roux at st.com (Nicolas ROUX)
Date: Mon, 27 Oct 2008 13:51:46 +0100
Subject: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold
Message-ID: <005c01c93832$c56942e0$e7ad810a@gnb.st.com>

Thanks for all of you,
for this fast and good reply ;-)

Nicolas.

-----Original Message-----
From: numpy-discussion-bounces at scipy.org
[mailto:numpy-discussion-bounces at scipy.org] On Behalf Of David Douard
Sent: Monday, October 27, 2008 12:51 PM
To: numpy-discussion at scipy.org
Subject: Re: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold

On Monday 27 October 2008 12:41:06, Nicolas ROUX wrote:
> Hello,
>
> I hope this is not a silly question ;-)
> I have a Numpy array, and I want to process it with :
> "if the value is lower than Threshold, then increase by Threshold"
>
> I would like to translate it as:
> y[y<Threshold] = y + Threshold

let's see :

y[y<Threshold] += Threshold

Is it what you want ?

> To benefit from the Numpy speed.
> But this doesn't work, any idea ?
>
> Thanks,
> Cheers,
> Nicolas.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

--
David Douard LOGILAB, Paris (France), +33 1 45 32 03 12
Formations Python, Zope, Debian : http://www.logilab.fr/formations
Développement logiciel sur mesure : http://www.logilab.fr/services
Informatique scientifique : http://www.logilab.fr/science

From dwf at cs.toronto.edu Mon Oct 27 09:09:44 2008
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Mon, 27 Oct 2008 09:09:44 -0400
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com>
References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com>
Message-ID:

On 27-Oct-08, at 7:22 AM, James Philbin wrote:
> One operator which could be used is '%'. We could keep the current
> behaviour for ARRAY%SCALAR but have ARRAY%ARRAY as being matrix
> multiplication. It has the same precedence as *,/.
The problem is that it would monkey with existing semantics for
broadcasting, and break with all the other arithmetic operators in this
regard. I can't see it ever being accepted for that reason. Example:

In [514]: x
Out[514]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [515]: y
Out[515]: array([0, 1, 2])

In [516]: x % y
Out[516]:
array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 0]])

David

From aarchiba at physics.mcgill.ca Mon Oct 27 09:11:16 2008
From: aarchiba at physics.mcgill.ca (Anne Archibald)
Date: Mon, 27 Oct 2008 09:11:16 -0400
Subject: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold
References: <200810271251.02212.david.douard@logilab.fr> <005c01c93832$c56942e0$e7ad810a@gnb.st.com>
Message-ID:

If what you are trying to do is actually ensure all data is within the
range [a,b], you may be interested to know that Python's % operator works
on floating-point numbers:

In [1]: -0.1 % 1
Out[1]: 0.90000000000000002

So if you want all samples in the range (0,1) you can just do y %= 1.

Anne

2008/10/27 Nicolas ROUX :
> Thanks for all of you,
> for this fast and good reply ;-)
>
> Nicolas.
>
> -----Original Message-----
> From: numpy-discussion-bounces at scipy.org
> [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of David Douard
> Sent: Monday, October 27, 2008 12:51 PM
> To: numpy-discussion at scipy.org
> Subject: Re: [Numpy-discussion] How to do: y[y<Threshold] = y + Threshold
>
> On Monday 27 October 2008 12:41:06, Nicolas ROUX wrote:
>> Hello,
>>
>> I hope this is not a silly question ;-)
>> I have a Numpy array, and I want to process it with :
>> "if the value is lower than Threshold, then increase by Threshold"
>>
>> I would like to translate it as:
>> y[y<Threshold] = y + Threshold
>
> let's see :
>
> y[y<Threshold] += Threshold
>
> Is it what you want ?
>
>> To benefit from the Numpy speed.
>> But this doesn't work, any idea ?
>>
>> Thanks,
>> Cheers,
>> Nicolas.
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> --
> David Douard LOGILAB, Paris (France), +33 1 45 32 03 12
> Formations Python, Zope, Debian : http://www.logilab.fr/formations
> Développement logiciel sur mesure : http://www.logilab.fr/services
> Informatique scientifique : http://www.logilab.fr/science
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

From kwmsmith at gmail.com Mon Oct 27 10:45:31 2008
From: kwmsmith at gmail.com (Kurt Smith)
Date: Mon, 27 Oct 2008 09:45:31 -0500
Subject: [Numpy-discussion] Optimization of loops
In-Reply-To: <490100F6.6030806@unic.cnrs-gif.fr>
References: <200810221821.48143.yger@unic.cnrs-gif.fr> <1224779692.7661.64.camel@pc2.cole.uklinux.net> <490100F6.6030806@unic.cnrs-gif.fr>
Message-ID:

On Thu, Oct 23, 2008 at 5:55 PM, Pierre Yger wrote:
>
> import bisect
> my_list.sort()
> time, ind = zip(*my_list)
> for i in id_list :
>     beg = bisect.bisect_left(ind,i)
>     end = bisect.bisect_right(ind,i)
>     mylist.append(tim[beg:end])
>

I've always found itertools.groupby to be the ideal function for such
groupings. From what I can tell, it saves about 30% in time over the
bisect version.
[code] from random import randrange, random import bisect from itertools import groupby N = 1000 nn = 100000 dta = [] for i in xrange(nn): dta.append((randrange(0,N), random())) def bsct(dta): dta.sort() ind, time = zip(*dta) ret_list = [] for i in xrange(N): beg = bisect.bisect_left(ind, i) end = bisect.bisect_right(ind, i) ret_list.append((i,time[beg:end])) ret_list = [subl for subl in ret_list if len(subl) == 2] return ret_list def gpby(dta): dta.sort() keyfunc = lambda x: x[0] res = [] for key, subit in groupby(dta, keyfunc): res.append((key, tuple(time for ind, time in subit))) return res if __name__ == '__main__': from time import clock dta1 = dta dta2 = dta[:] c1 = clock() bd = bsct(dta1) print "bisect: %f" % (clock() - c1) c1 = clock() gb = gpby(dta2) print "groupby: %f" % (clock() - c1) assert bd == gb [/code] bisect: 1.430000 groupby: 0.970000 Hope this helps, Kurt -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.roux at st.com Mon Oct 27 11:38:30 2008 From: nicolas.roux at st.com (Nicolas ROUX) Date: Mon, 27 Oct 2008 16:38:30 +0100 Subject: [Numpy-discussion] How to do : x[i,j] = y[k, j] with k = i+sj In-Reply-To: <4905A9E8.9090100@mineway.de> Message-ID: <006f01c9384a$109ab700$e7ad810a@gnb.st.com> Hi, Me again ;-) I have now a new question to ask (I hope not too silly). How to do : for j in range(yHeight): for i in range(xWidth): x[j,i] = y[k,i] with k = numpy.mod(i+sj,yHeight) With efficient numpy code, without the double "for" ? Thanks, Cheers, Nicolas. -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Uwe Schmitt Sent: Monday, October 27, 2008 12:46 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] [mailinglist] How to do: y[y Hello, > > I hope this is not a silly question ;-) > I have a Numpy array, and I want to process it with : > "if the value is lower than Threshold, then increase by Threshold" > > > I would like to translate it as: > y[y Hi, your solution does not work, becaus the arrays on both side do not have the same size in generall. You can do it in place: y[y To benefit from the Numpy speed. > But this doesn't work, any idea ? > > Thanks, > Cheers, > Nicolas. > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Dr. rer. nat. Uwe Schmitt F&E Mathematik mineway GmbH Science Park 2 D-66123 Saarbr?cken Telefon: +49 (0)681 8390 5334 Telefax: +49 (0)681 830 4376 uschmitt at mineway.de www.mineway.de Gesch?ftsf?hrung: Dr.-Ing. Mathias Bauer Amtsgericht Saarbr?cken HRB 12339 _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From HAWRYLA at novachem.com Mon Oct 27 11:29:41 2008 From: HAWRYLA at novachem.com (Andrew Hawryluk) Date: Mon, 27 Oct 2008 09:29:41 -0600 Subject: [Numpy-discussion] any interest in including a second-order gradient? Message-ID: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> We wrote a simple variation on the gradient() function to calculate the second derivatives. Would there be any interest in including a gradient2() in numpy? Andrew def gradient2(f, *varargs): """Calculate the second-order gradient of an N-dimensional scalar function. Uses central differences on the interior and first differences on boundaries to give the same shape. 
Inputs: f -- An N-dimensional array giving samples of a scalar function varargs -- 0, 1, or N scalars giving the sample distances in each direction Outputs: N arrays of the same shape as f giving the derivative of f with respect to each dimension. """ N = len(f.shape) # number of dimensions n = len(varargs) if n == 0: dx = [1.0]*N elif n == 1: dx = [varargs[0]]*N elif n == N: dx = list(varargs) else: raise SyntaxError, "invalid number of arguments" # use central differences on interior and first differences on endpoints outvals = [] # create slice objects --- initially all are [:, :, ..., :] slice1 = [slice(None)]*N slice2 = [slice(None)]*N slice3 = [slice(None)]*N otype = f.dtype.char if otype not in ['f', 'd', 'F', 'D']: otype = 'd' for axis in range(N): # select out appropriate parts for this dimension out = zeros(f.shape, f.dtype.char) slice1[axis] = slice(1, -1) slice2[axis] = slice(2, None) slice3[axis] = slice(None, -2) # 1D equivalent -- out[1:-1] = (f[2:] - 2*f[1:-1] + f[:-2]) out[slice1] = (f[slice2] - 2*f[slice1] + f[slice3]) slice1[axis] = 0 slice2[axis] = 1 slice3[axis] = 2 # 1D equivalent -- out[0] = (f[2] - 2*f[1] + f[0]) out[slice1] = (f[slice3] - 2*f[slice2] + f[slice1]) slice1[axis] = -1 slice2[axis] = -2 slice3[axis] = -3 # 1D equivalent -- out[-1] = (f[-1] - 2*f{-2] + f[-3]) out[slice1] = (f[slice1] - 2*f[slice2] + f[slice3]) # divide by the squared step size outvals.append(out / dx[axis] / dx[axis]) # reset the slice object in this dimension to ":" slice1[axis] = slice(None) slice2[axis] = slice(None) slice3[axis] = slice(None) if N == 1: return outvals[0] else: return outvals From robert.kern at gmail.com Mon Oct 27 13:51:39 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Oct 2008 12:51:39 -0500 Subject: [Numpy-discussion] How to do : x[i,j] = y[k, j] with k = i+sj In-Reply-To: <006f01c9384a$109ab700$e7ad810a@gnb.st.com> References: <4905A9E8.9090100@mineway.de> <006f01c9384a$109ab700$e7ad810a@gnb.st.com> Message-ID: <3d375d730810271051w5280bd80v94c60b280c87b523@mail.gmail.com> On Mon, Oct 27, 2008 at 10:38, Nicolas ROUX wrote: > Hi, > > Me again ;-) > I have now a new question to ask (I hope not too silly). > > How to do : > > for j in range(yHeight): > for i in range(xWidth): > x[j,i] = y[k,i] with k = numpy.mod(i+sj,yHeight) > > With efficient numpy code, without the double "for" ? j, i = numpy.ogrid[:yHeight, :xWidth] k = (i+j) % yHeight x = y[k,i] -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From efiring at hawaii.edu Mon Oct 27 14:31:44 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 27 Oct 2008 08:31:44 -1000 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> Message-ID: <49060910.60903@hawaii.edu> James Philbin wrote: > One operator which could be used is '%'. We could keep the current > behaviour for ARRAY%SCALAR but have ARRAY%ARRAY as being matrix > multiplication. It has the same precedence as *,/. 
> No, having completely unrelated meanings for the same operator symbol sounds like a recipe for chaos. The whole point is to make the code more readable, not less. Eric From fperez.net at gmail.com Mon Oct 27 15:27:46 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 27 Oct 2008 12:27:46 -0700 Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy In-Reply-To: <49060910.60903@hawaii.edu> References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu> Message-ID: On Mon, Oct 27, 2008 at 11:31 AM, Eric Firing wrote: > James Philbin wrote: >> One operator which could be used is '%'. We could keep the current >> behaviour for ARRAY%SCALAR but have ARRAY%ARRAY as being matrix >> multiplication. It has the same precedence as *,/. >> > > No, having completely unrelated meanings for the same operator symbol > sounds like a recipe for chaos. The whole point is to make the code > more readable, not less. Big -1 from me on this one too, for the same reasons Eric states. I just got back from some travel and will try to update the doc later this evening with all the feedback and will post again, so that we can converge on a final doc, which I'll then pitch over the fence to the python-dev list. Thanks to all who have written, and keep it coming! Cheers, f From robince at gmail.com Mon Oct 27 15:56:56 2008 From: robince at gmail.com (Robin) Date: Mon, 27 Oct 2008 19:56:56 +0000 Subject: [Numpy-discussion] ParallelProgramming wiki page Message-ID: Hi, I made some changes to the ParallelProgramming wiki page to outline use of the (multi)processing module as well as the threading module. I'm very much not an expert on this - just researched it for myself, so please feel free to correct/ extend/ delete as appropriate. Robin From markbak at gmail.com Mon Oct 27 16:01:47 2008 From: markbak at gmail.com (Mark Bakker) Date: Mon, 27 Oct 2008 21:01:47 +0100 Subject: [Numpy-discussion] problems multiplying poly1d with number from array Message-ID: <6946b9500810271301t3120274cy34642c8748cee3cf@mail.gmail.com> Hello list - I can multiply a poly1d instance with a number, but when I multiply with a number from an array, the order matters. That seems a bug: >>> a = array([2]) >>> p = poly1d([1,2]) >>> print 2*p # Works 2 x + 4 >>> print a[0]*p # Doesn't work, returns an array [2 4] >>> print p*a[0] # Works 2 x + 4 -------------- next part -------------- An HTML attachment was scrubbed... URL: From markbak at gmail.com Mon Oct 27 16:07:38 2008 From: markbak at gmail.com (Mark Bakker) Date: Mon, 27 Oct 2008 21:07:38 +0100 Subject: [Numpy-discussion] poly1d with complex coefficients doesn't integrate Message-ID: <6946b9500810271307p6102df37l89a2df08a50ee6ce@mail.gmail.com> I seem to be able to create a poly1d instance with complex coefficients, and it works correctly. But I cannot integrate it. Does poly1d not support complex coefficients? Any reason why not, that shouldn't be too difficult, should it? Thanks, Mark >>> p = poly1d([1+1j,2+2j]) >>> p(2) (4+4j) >>> q=p.integ() >>> print q # This is obviously the wrong answer 2 0.5 x + 2 x -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Oct 27 16:20:03 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Oct 2008 15:20:03 -0500 Subject: [Numpy-discussion] poly1d with complex coefficients doesn't integrate In-Reply-To: <6946b9500810271307p6102df37l89a2df08a50ee6ce@mail.gmail.com> References: <6946b9500810271307p6102df37l89a2df08a50ee6ce@mail.gmail.com> Message-ID: <3d375d730810271320m354f704ard792be2e2efefc39@mail.gmail.com> On Mon, Oct 27, 2008 at 15:07, Mark Bakker wrote: > I seem to be able to create a poly1d instance with complex coefficients, and > it works correctly. But I cannot integrate it. Does poly1d not support > complex coefficients? Any reason why not, that shouldn't be too difficult, > should it? It's a bug in polyint(). It explicitly constructs an array for the result with the dtype float. This is something we should fix. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From hep.sebastien.binet at gmail.com Mon Oct 27 16:20:46 2008 From: hep.sebastien.binet at gmail.com (Sebastien Binet) Date: Mon, 27 Oct 2008 13:20:46 -0700 Subject: [Numpy-discussion] ParallelProgramming wiki page In-Reply-To: References: Message-ID: <200810271320.55078.binet@cern.ch> On Monday 27 October 2008 12:56:56 Robin wrote: > Hi, > > I made some changes to the ParallelProgramming wiki page to outline > use of the (multi)processing module as well as the threading module. > > I'm very much not an expert on this - just researched it for myself, > so please feel free to correct/ extend/ delete as appropriate. I would mention the backport of multiprocessing for python-2.{4,5}: http://code.google.com/p/python-multiprocessing so the amount of editing when one switches from 2.{4,5} to 2.6 is minimal :) cheers, sebastien. -- ################################### # Dr. Sebastien Binet # Lawrence Berkeley National Lab. # 1 Cyclotron Road # Berkeley, CA 94720 ################################### -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Mon Oct 27 16:24:32 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Oct 2008 15:24:32 -0500 Subject: [Numpy-discussion] ParallelProgramming wiki page In-Reply-To: <200810271320.55078.binet@cern.ch> References: <200810271320.55078.binet@cern.ch> Message-ID: <3d375d730810271324m9cc66f3pb03f5a5355a6e75@mail.gmail.com> On Mon, Oct 27, 2008 at 15:20, Sebastien Binet wrote: > On Monday 27 October 2008 12:56:56 Robin wrote: >> Hi, >> >> I made some changes to the ParallelProgramming wiki page to outline >> use of the (multi)processing module as well as the threading module. >> >> I'm very much not an expert on this - just researched it for myself, >> so please feel free to correct/ extend/ delete as appropriate. > > I would mention the backport of multiprocessing for python-2.{4,5}: > http://code.google.com/p/python-multiprocessing > so the amount of editing when one switches from 2.{4,5} to 2.6 is minimal :) Go for it. The wiki is open to editing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From erik.tollerud at gmail.com Mon Oct 27 16:54:09 2008 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Mon, 27 Oct 2008 13:54:09 -0700 Subject: [Numpy-discussion] numpy array change notifier? Message-ID: Is there any straightforward way of notifying on change of a numpy array that leaves the numpy arrays still efficient? That is, I would like to do the following: class C: def __init__(self,arr): self.arr = arr #what code do put here? def notify(self): print 'do something fun and exciting' >>>o = C(array([1,2,3,4])) >>>print o.arr #just print the array [1 2 3 4] >>>edarr = o.arr >>>edarr[2] = 10 #this should now call the notify method do something fun and exciting >>>print o.arr [1 2 10 4] So is there a means of registering the array or, failing that, setting up the class so that all the numpy tricks work with o.arr[whatever] while allowing me to implement a property that calls the notifier? From nouiz at nouiz.org Mon Oct 27 16:54:12 2008 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 27 Oct 2008 16:54:12 -0400 Subject: [Numpy-discussion] ParallelProgramming wiki page In-Reply-To: <3d375d730810271324m9cc66f3pb03f5a5355a6e75@mail.gmail.com> References: <200810271320.55078.binet@cern.ch> <3d375d730810271324m9cc66f3pb03f5a5355a6e75@mail.gmail.com> Message-ID: <2d1d7fe70810271354t215d6ad5yb2b54ca1fd013f85@mail.gmail.com> Hi, Their exist open source version of parallel BLAS library. I modified the section "Use parallel primitives" to tell it. But my English is bad, so if someone can check it, it would be nice. Fred On Mon, Oct 27, 2008 at 4:24 PM, Robert Kern wrote: > On Mon, Oct 27, 2008 at 15:20, Sebastien Binet > wrote: >> On Monday 27 October 2008 12:56:56 Robin wrote: >>> Hi, >>> >>> I made some changes to the ParallelProgramming wiki page to outline >>> use of the (multi)processing module as well as the threading module. >>> >>> I'm very much not an expert on this - just researched it for myself, >>> so please feel free to correct/ extend/ delete as appropriate. >> >> I would mention the backport of multiprocessing for python-2.{4,5}: >> http://code.google.com/p/python-multiprocessing >> so the amount of editing when one switches from 2.{4,5} to 2.6 is minimal :) > > Go for it. The wiki is open to editing. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Mon Oct 27 16:56:53 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Oct 2008 15:56:53 -0500 Subject: [Numpy-discussion] numpy array change notifier? In-Reply-To: References: Message-ID: <3d375d730810271356u67b65a0fmb8d8cdaee035aad9@mail.gmail.com> On Mon, Oct 27, 2008 at 15:54, Erik Tollerud wrote: > Is there any straightforward way of notifying on change of a numpy > array that leaves the numpy arrays still efficient? Not currently, no. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From pgmdevlist at gmail.com Mon Oct 27 16:43:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 27 Oct 2008 16:43:15 -0400 Subject: [Numpy-discussion] numpy array change notifier? In-Reply-To: References: Message-ID: <200810271643.15984.pgmdevlist@gmail.com> On Monday 27 October 2008 16:54:09 Erik Tollerud wrote: > Is there any straightforward way of notifying on change of a numpy > array that leaves the numpy arrays still efficient? Erik, may be you could try the trick presented here : http://www.scipy.org/Subclasses in the __array_wrap__ section. From bpederse at gmail.com Mon Oct 27 17:10:38 2008 From: bpederse at gmail.com (Brent Pedersen) Date: Mon, 27 Oct 2008 14:10:38 -0700 Subject: [Numpy-discussion] numpy array change notifier? In-Reply-To: <3d375d730810271356u67b65a0fmb8d8cdaee035aad9@mail.gmail.com> References: <3d375d730810271356u67b65a0fmb8d8cdaee035aad9@mail.gmail.com> Message-ID: On Mon, Oct 27, 2008 at 1:56 PM, Robert Kern wrote: > On Mon, Oct 27, 2008 at 15:54, Erik Tollerud wrote: >> Is there any straightforward way of notifying on change of a numpy >> array that leaves the numpy arrays still efficient? > > Not currently, no. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > out of curiosity, would something like this affect efficiency (and/or work): class Notify(numpy.ndarray): def __setitem__(self, *args): self.notify(*args) return super(Notify, self).__setitem__(*args) def notify(self, *args): print 'notify:', args with also overriding setslice? From dmitrey.kroshko at scipy.org Mon Oct 27 17:17:11 2008 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Mon, 27 Oct 2008 23:17:11 +0200 Subject: [Numpy-discussion] ParallelProgramming wiki page In-Reply-To: <2d1d7fe70810271354t215d6ad5yb2b54ca1fd013f85@mail.gmail.com> References: <200810271320.55078.binet@cern.ch> <3d375d730810271324m9cc66f3pb03f5a5355a6e75@mail.gmail.com> <2d1d7fe70810271354t215d6ad5yb2b54ca1fd013f85@mail.gmail.com> Message-ID: <49062FD7.30803@scipy.org> Did you mean this one http://www.netlib.org/scalapack/pblas_qref.html ? As for the ParallelProgramming wiki page, there are some words in section "Use parallel primitives" about numpy.dot still I can't understand from the section: if I get numpy from sources and compile it (via "python setup.py build") in my AMD X2, will numpy.dot use 2nd CPU or not? Regards, D. Fr?d?ric Bastien wrote: > Hi, > > Their exist open source version of parallel BLAS library. I modified > the section "Use parallel primitives" to tell it. But my English is > bad, so if someone can check it, it would be nice. > > Fred > > On Mon, Oct 27, 2008 at 4:24 PM, Robert Kern wrote: > >> On Mon, Oct 27, 2008 at 15:20, Sebastien Binet >> wrote: >> >>> On Monday 27 October 2008 12:56:56 Robin wrote: >>> >>>> Hi, >>>> >>>> I made some changes to the ParallelProgramming wiki page to outline >>>> use of the (multi)processing module as well as the threading module. >>>> >>>> I'm very much not an expert on this - just researched it for myself, >>>> so please feel free to correct/ extend/ delete as appropriate. 
>>>> >>> I would mention the backport of multiprocessing for python-2.{4,5}: >>> http://code.google.com/p/python-multiprocessing >>> so the amount of editing when one switches from 2.{4,5} to 2.6 is minimal :) >>> >> Go for it. The wiki is open to editing. >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> -- Umberto Eco >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > > From robince at gmail.com Mon Oct 27 17:28:49 2008 From: robince at gmail.com (Robin) Date: Mon, 27 Oct 2008 21:28:49 +0000 Subject: [Numpy-discussion] ParallelProgramming wiki page In-Reply-To: <49062FD7.30803@scipy.org> References: <200810271320.55078.binet@cern.ch> <3d375d730810271324m9cc66f3pb03f5a5355a6e75@mail.gmail.com> <2d1d7fe70810271354t215d6ad5yb2b54ca1fd013f85@mail.gmail.com> <49062FD7.30803@scipy.org> Message-ID: On Mon, Oct 27, 2008 at 9:17 PM, dmitrey wrote: > As for the ParallelProgramming wiki page, there are some words in > section "Use parallel primitives" about numpy.dot still I can't > understand from the section: if I get numpy from sources and compile it > (via "python setup.py build") in my AMD X2, will numpy.dot use 2nd CPU > or not? Not unless you build numpy against a paralell enabled BLAS, for example Intel MKL, ATLAS etc. I think if you compile ATLAS with threading enabled, and then build numpy using the appropriate ptlapack libraries (I forget the exact name) then the dot should use the second CPU. As Frederic added to the wiki - the number of threads to use can only be provided to atlas at compile time. With MKL I think you can choose this at run time (I think through an environment variable but I'm not sure). Similarly with the GOTO blas, but I'm not sure if numpy builds with that, so maybe we should take that reference out. Robin From rmay31 at gmail.com Mon Oct 27 17:45:12 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 27 Oct 2008 16:45:12 -0500 Subject: [Numpy-discussion] numpy array change notifier? In-Reply-To: References: <3d375d730810271356u67b65a0fmb8d8cdaee035aad9@mail.gmail.com> Message-ID: <49063668.6010108@gmail.com> Brent Pedersen wrote: > On Mon, Oct 27, 2008 at 1:56 PM, Robert Kern wrote: >> On Mon, Oct 27, 2008 at 15:54, Erik Tollerud wrote: >>> Is there any straightforward way of notifying on change of a numpy >>> array that leaves the numpy arrays still efficient? >> Not currently, no. >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> -- Umberto Eco >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > > out of curiosity, > would something like this affect efficiency (and/or work): > > class Notify(numpy.ndarray): > def __setitem__(self, *args): > self.notify(*args) > return super(Notify, self).__setitem__(*args) > > def notify(self, *args): > print 'notify:', args > > > with also overriding setslice? 
I haven't given this much thought, but you'd also likely need to do this for the infix operators (+=, etc.). Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From bpederse at gmail.com Mon Oct 27 19:54:33 2008 From: bpederse at gmail.com (Brent Pedersen) Date: Mon, 27 Oct 2008 16:54:33 -0700 Subject: [Numpy-discussion] numpy array change notifier? In-Reply-To: <49063668.6010108@gmail.com> References: <3d375d730810271356u67b65a0fmb8d8cdaee035aad9@mail.gmail.com> <49063668.6010108@gmail.com> Message-ID: On Mon, Oct 27, 2008 at 2:45 PM, Ryan May wrote: > Brent Pedersen wrote: >> On Mon, Oct 27, 2008 at 1:56 PM, Robert Kern wrote: >>> On Mon, Oct 27, 2008 at 15:54, Erik Tollerud wrote: >>>> Is there any straightforward way of notifying on change of a numpy >>>> array that leaves the numpy arrays still efficient? >>> Not currently, no. >>> >>> -- >>> Robert Kern >>> >>> "I have come to believe that the whole world is an enigma, a harmless >>> enigma that is made terrible by our own mad attempt to interpret it as >>> though it had an underlying truth." >>> -- Umberto Eco >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://projects.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> out of curiosity, >> would something like this affect efficiency (and/or work): >> >> class Notify(numpy.ndarray): >> def __setitem__(self, *args): >> self.notify(*args) >> return super(Notify, self).__setitem__(*args) >> >> def notify(self, *args): >> print 'notify:', args >> >> >> with also overriding setslice? > > I haven't given this much thought, but you'd also likely need to do this > for the infix operators (+=, etc.). > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > ah. i didnt think of that. i guess one could loop through dir(self) in __init__ and apply a wrapper to a list of method names or to all callables. From stefan at sun.ac.za Tue Oct 28 04:14:47 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 28 Oct 2008 10:14:47 +0200 Subject: [Numpy-discussion] any interest in including a second-order gradient? In-Reply-To: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> Message-ID: <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> Hi Andrew We should discuss different options for the implementation. The namespace is fairly cluttered, and it may be that we want to implement gradient3 some time in the future as well. Maybe something like gradient(f, 1, 2, 3, order=2) would work -- then we can combine gradient and gradient2 (and gradient3). What do you think? Regards St?fan 2008/10/27 Andrew Hawryluk : > We wrote a simple variation on the gradient() function to calculate the > second derivatives. Would there be any interest in including a > gradient2() in numpy? > > Andrew > > > > def gradient2(f, *varargs): > """Calculate the second-order gradient of an N-dimensional scalar > function. > > Uses central differences on the interior and first differences on > boundaries > to give the same shape. 
> > Inputs: > > f -- An N-dimensional array giving samples of a scalar function > > varargs -- 0, 1, or N scalars giving the sample distances in each > direction > > Outputs: > > N arrays of the same shape as f giving the derivative of f with > respect > to each dimension. > > """ > N = len(f.shape) # number of dimensions > n = len(varargs) > if n == 0: > dx = [1.0]*N > elif n == 1: > dx = [varargs[0]]*N > elif n == N: > dx = list(varargs) > else: > raise SyntaxError, "invalid number of arguments" > > # use central differences on interior and first differences on > endpoints > > outvals = [] > > # create slice objects --- initially all are [:, :, ..., :] > slice1 = [slice(None)]*N > slice2 = [slice(None)]*N > slice3 = [slice(None)]*N > > > otype = f.dtype.char > if otype not in ['f', 'd', 'F', 'D']: > otype = 'd' > > for axis in range(N): > # select out appropriate parts for this dimension > out = zeros(f.shape, f.dtype.char) > slice1[axis] = slice(1, -1) > slice2[axis] = slice(2, None) > slice3[axis] = slice(None, -2) > # 1D equivalent -- out[1:-1] = (f[2:] - 2*f[1:-1] + f[:-2]) > out[slice1] = (f[slice2] - 2*f[slice1] + f[slice3]) > slice1[axis] = 0 > slice2[axis] = 1 > slice3[axis] = 2 > # 1D equivalent -- out[0] = (f[2] - 2*f[1] + f[0]) > out[slice1] = (f[slice3] - 2*f[slice2] + f[slice1]) > slice1[axis] = -1 > slice2[axis] = -2 > slice3[axis] = -3 > # 1D equivalent -- out[-1] = (f[-1] - 2*f{-2] + f[-3]) > out[slice1] = (f[slice1] - 2*f[slice2] + f[slice3]) > > # divide by the squared step size > outvals.append(out / dx[axis] / dx[axis]) > > # reset the slice object in this dimension to ":" > slice1[axis] = slice(None) > slice2[axis] = slice(None) > slice3[axis] = slice(None) > > if N == 1: > return outvals[0] > else: > return outvals > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From zbyszek at in.waw.pl Tue Oct 28 04:46:44 2008 From: zbyszek at in.waw.pl (Zbyszek Szmek) Date: Tue, 28 Oct 2008 09:46:44 +0100 Subject: [Numpy-discussion] numpy ticket #852 (temporary files in savez) Message-ID: <20081028084644.GG16444@szyszka.in.waw.pl> Hi, I looked at this ticket and whipped up two alternative patches, like mentioned in the description: in-memory or temporary dir on disk. On my computer the second one is slightly faster. I think it is important to merge some version of this fix. In it's current form, someone using numpy.io.savez() allows any user on the same system to overwrite arbitrary files with his permissions. Thanks, Zbyszek From nouiz at nouiz.org Tue Oct 28 09:14:05 2008 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 28 Oct 2008 09:14:05 -0400 Subject: [Numpy-discussion] ParallelProgramming wiki page In-Reply-To: References: <200810271320.55078.binet@cern.ch> <3d375d730810271324m9cc66f3pb03f5a5355a6e75@mail.gmail.com> <2d1d7fe70810271354t215d6ad5yb2b54ca1fd013f85@mail.gmail.com> <49062FD7.30803@scipy.org> Message-ID: <2d1d7fe70810280614h47638ba0j729fb2426da08469@mail.gmail.com> On Mon, Oct 27, 2008 at 5:28 PM, Robin wrote: [...] > > Similarly with the GOTO blas, but I'm not sure if numpy builds with > that, so maybe we should take that reference out. > I'd like to let it their, as it is at the same performance level then MKL. ATLAS don't have the same performence level of MKL... If you build GOTO BLAS and build lapack with it, it should link with numpy. But I haven't the time to test this. 
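One crude way to check what you actually got, whatever BLAS numpy ends up
linked against (matrix size picked arbitrarily): time a big dot product
and watch the CPUs with top while it runs; a threaded BLAS should load
both cores.

    import time
    import numpy as np

    n = 1500
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.time()
    c = np.dot(a, b)
    print 'dot of %dx%d matrices took %.2f s' % (n, n, time.time() - t0)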
Fred From HAWRYLA at novachem.com Tue Oct 28 17:28:31 2008 From: HAWRYLA at novachem.com (Andrew Hawryluk) Date: Tue, 28 Oct 2008 15:28:31 -0600 Subject: [Numpy-discussion] any interest in including a second-ordergradient? In-Reply-To: <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> Message-ID: <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> I agree that the gradient functions should be combined, especially considering how much redundant code would be added by keeping them separate. Here is one possible implementation, but I don't like the signature yet as it departs from the current behaviour. At the risk of demonstrating my ignorance, is there no way to place the named parameter (order) after the variable-length parameter (dx) in Python? Andrew def gradient(f,order=1,*varargs): """Calculate the first or second order gradient of an N-dimensional scalar function. Uses central differences on the interior and first differences on boundaries to give the same shape. Inputs: f -- An N-dimensional array giving samples of a scalar function varargs -- 0, 1, or N scalars giving the sample distances in each direction Outputs: N arrays of the same shape as f giving the derivative of f with respect to each dimension. """ if order not in [1,2]: raise SyntaxError, "invalid value for order of differentiation" N = len(f.shape) # number of dimensions n = len(varargs) if n == 0: dx = [1.0]*N elif n == 1: dx = [varargs[0]]*N elif n == N: dx = list(varargs) else: raise SyntaxError, "invalid number of arguments" # use central differences on interior and first differences on endpoints outvals = [] # create slice objects --- initially all are [:, :, ..., :] slice1 = [slice(None)]*N slice2 = [slice(None)]*N slice3 = [slice(None)]*N if order == 1: # first order derivative weightings forward = [-1.0,1.0,0] backward = [0,-1.0,1.0] centered = [-0.5,0,0.5] elif order == 2: # second order derivative weightings forward = backward = centered = [1.0,-2.0,1.0] else: raise RuntimeError, "Programming error" otype = f.dtype.char if otype not in ['f', 'd', 'F', 'D']: otype = 'd' for axis in range(N): out = np.zeros(f.shape, f.dtype.char) # Centered Finite-Divided-Difference Formulas slice1[axis] = slice(None, -2) slice2[axis] = slice(1, -1) slice3[axis] = slice(2, None) out[slice2] = (f[slice1]*centered[0] + f[slice2]*centered[1] + f[slice3]*centered[2]) # Forward Finite-Divided-Difference Formulas slice1[axis] = 0 slice2[axis] = 1 slice3[axis] = 2 out[slice1] = (f[slice1]*forward[0] + f[slice2]*forward[1] + f[slice3]*forward[2]) # Barkward Finite-Divided-Difference Formulas slice1[axis] = -3 slice2[axis] = -2 slice3[axis] = -1 out[slice3] = (f[slice1]*backward[0] + f[slice2]*backward[1] + f[slice3]*backward[2]) # divide by the step size outvals.append(out / dx[axis]**order) # reset the slice object in this dimension to ":" slice1[axis] = slice(None) slice2[axis] = slice(None) slice3[axis] = slice(None) if N == 1: return outvals[0] else: return outvals > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- > bounces at scipy.org] On Behalf Of St?fan van der Walt > Sent: 28 Oct 2008 2:15 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] any interest in including a second- > ordergradient? > > Hi Andrew > > We should discuss different options for the implementation. 
The > namespace is fairly cluttered, and it may be that we want to implement > gradient3 some time in the future as well. Maybe something like > > gradient(f, 1, 2, 3, order=2) > > would work -- then we can combine gradient and gradient2 (and > gradient3). > > What do you think? > > Regards > St?fan > > 2008/10/27 Andrew Hawryluk : > > We wrote a simple variation on the gradient() function to calculate > > the second derivatives. Would there be any interest in including a > > gradient2() in numpy? > > > > Andrew > > > > > > > > def gradient2(f, *varargs): > > """Calculate the second-order gradient of an N-dimensional scalar > > function. > > > > Uses central differences on the interior and first differences on > > boundaries > > to give the same shape. > > > > Inputs: > > > > f -- An N-dimensional array giving samples of a scalar function > > > > varargs -- 0, 1, or N scalars giving the sample distances in > each > > direction > > > > Outputs: > > > > N arrays of the same shape as f giving the derivative of f with > > respect > > to each dimension. > > > > """ > > N = len(f.shape) # number of dimensions > > n = len(varargs) > > if n == 0: > > dx = [1.0]*N > > elif n == 1: > > dx = [varargs[0]]*N > > elif n == N: > > dx = list(varargs) > > else: > > raise SyntaxError, "invalid number of arguments" > > > > # use central differences on interior and first differences on > > endpoints > > > > outvals = [] > > > > # create slice objects --- initially all are [:, :, ..., :] > > slice1 = [slice(None)]*N > > slice2 = [slice(None)]*N > > slice3 = [slice(None)]*N > > > > > > otype = f.dtype.char > > if otype not in ['f', 'd', 'F', 'D']: > > otype = 'd' > > > > for axis in range(N): > > # select out appropriate parts for this dimension > > out = zeros(f.shape, f.dtype.char) > > slice1[axis] = slice(1, -1) > > slice2[axis] = slice(2, None) > > slice3[axis] = slice(None, -2) > > # 1D equivalent -- out[1:-1] = (f[2:] - 2*f[1:-1] + f[:-2]) > > out[slice1] = (f[slice2] - 2*f[slice1] + f[slice3]) > > slice1[axis] = 0 > > slice2[axis] = 1 > > slice3[axis] = 2 > > # 1D equivalent -- out[0] = (f[2] - 2*f[1] + f[0]) > > out[slice1] = (f[slice3] - 2*f[slice2] + f[slice1]) > > slice1[axis] = -1 > > slice2[axis] = -2 > > slice3[axis] = -3 > > # 1D equivalent -- out[-1] = (f[-1] - 2*f{-2] + f[-3]) > > out[slice1] = (f[slice1] - 2*f[slice2] + f[slice3]) > > > > # divide by the squared step size > > outvals.append(out / dx[axis] / dx[axis]) > > > > # reset the slice object in this dimension to ":" > > slice1[axis] = slice(None) > > slice2[axis] = slice(None) > > slice3[axis] = slice(None) > > > > if N == 1: > > return outvals[0] > > else: > > return outvals > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Tue Oct 28 17:36:16 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 28 Oct 2008 16:36:16 -0500 Subject: [Numpy-discussion] any interest in including a second-ordergradient? 
In-Reply-To: <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> Message-ID: <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com> On Tue, Oct 28, 2008 at 16:28, Andrew Hawryluk wrote: > I agree that the gradient functions should be combined, especially considering how much redundant code would be added by keeping them separate. Here is one possible implementation, but I don't like the signature yet as it departs from the current behaviour. At the risk of demonstrating my ignorance, is there no way to place the named parameter (order) after the variable-length parameter (dx) in Python? There's no good way to do it. It's unfortunate that the current function uses *args for this. It's just ... not good for many reasons. A better one would be like so: def gradient(f, dx=None, order=1): N = len(f.shape) if dx is None: dx = np.ones([N]) elif np.isscalar(dx): dx = np.array([dx] * N) elif len(dx) == N: dx = np.asarray(dx) else: raise ValueError("dx must be blah blah blah") ... Personally, I would make a new function with this API, and deprecate the current gradient(). Dunno what the name should be, though. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From silva at lma.cnrs-mrs.fr Tue Oct 28 17:57:25 2008 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Tue, 28 Oct 2008 22:57:25 +0100 Subject: [Numpy-discussion] any interest in including a second-ordergradient? In-Reply-To: <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> Message-ID: <1225231045.20998.12.camel@localhost.localdomain> Le mardi 28 octobre 2008 ? 15:28 -0600, Andrew Hawryluk a ?crit : > I agree that the gradient functions should be combined, especially > considering how much redundant code would be added by keeping them > separate. Here is one possible implementation, but I don't like the > signature yet as it departs from the current behaviour. At the risk of > demonstrating my ignorance, is there no way to place the named > parameter (order) after the variable-length parameter (dx) in Python? > > St?fan van der Walt > > We should discuss different options for the implementation. The > > namespace is fairly cluttered, and it may be that we want to implement > > gradient3 some time in the future as well. Maybe something like > > gradient(f, 1, 2, 3, order=2) > > would work -- then we can combine gradient and gradient2 (and > > gradient3). What do you think? > > Andrew Hawryluk: > > > We wrote a simple variation on the gradient() function to calculate > > > the second derivatives. Would there be any interest in including a > > > gradient2() in numpy? Are there some parts of the code that may be used only once to calculate both the gradient and the second derivative (isn't it called the hessian, at least in the N-d case) ? If a common function would fasten the computation of the gradient and the hessian with a single call to a new function gradients(), it is worth... 
If the intent is just a reduction of the total length of the file containing the gradient and gradient2 functions, I do not understand why modifying the existent code. Why not creating a new function hessian() having the same signature than gradient? -- Fabrice Silva From dwf at cs.toronto.edu Wed Oct 29 00:14:34 2008 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 29 Oct 2008 00:14:34 -0400 Subject: [Numpy-discussion] any interest in including a second-ordergradient? In-Reply-To: <1225231045.20998.12.camel@localhost.localdomain> References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <1225231045.20998.12.camel@localhost.localdomain> Message-ID: <5A6DDC03-B222-46A4-A21A-7F83FFE6E199@cs.toronto.edu> On 28-Oct-08, at 5:57 PM, Fabrice Silva wrote: > Are there some parts of the code that may be used only once to > calculate > both the gradient and the second derivative (isn't it called the > hessian, at least in the N-d case) ? Probably. I'd imagine depends on your differencing scheme; central differences on the second derivative might imply a need to do two non- central differences to sample the gradient at +eps and -eps on each axis, respectively. And yes, the square matrix of second derivatives of an N-d scalar function is typically called the Hessian, but some of the properties one typically expects from a "Hessian" (i.e. symmetry) are not necessarily true unless all of the the partial second derivatives are continuous. > If a common function would fasten the computation of the gradient and > the hessian with a single call to a new function gradients(), it is > worth... > If the intent is just a reduction of the total length of the file > containing the gradient and gradient2 functions, I do not understand > why > modifying the existent code. Why not creating a new function hessian() > having the same signature than gradient? As Stefan mentioned, the number of functions that get included when you type "from numpy import *" is already pretty large, and it'd be nice not to make it larger. Though you do raise a good point -- if it's known that the user wants both the gradient and the Hessian, and you can get the first for free while computing the second, then that's preferable to having to make two calls and duplicate work. David From keflavich at gmail.com Wed Oct 29 01:44:06 2008 From: keflavich at gmail.com (Adam) Date: Tue, 28 Oct 2008 23:44:06 -0600 Subject: [Numpy-discussion] "Advanced indexing" question - subset of data cube as a 2D array. Message-ID: Hi numpy group, I have a problem I know there is an elegant solution to, but I can't wrap my head around the right way to do the indexing. The problem: I have a 2D array that has been chopped up into 3 dimensions - it was [ time X detectors ], it is now [ scans X time X detectors ]. During the chopping, some of the time points and detector points have been removed, so the 3D array contains only a subset of the data in the 2D array. I'd like to restore the 3D array back to the shape of the original 2D array b/c it's being stored in a netCDF file that is not flexible. 
My solution:

In [58]: array2d.shape
Out[58]: (11008, 144)

In [59]: array3d.shape
Out[59]: (23, 337, 107)

In [60]: whscan.shape
Out[60]: (7751,)

In [61]: 23*337
Out[61]: 7751

In [62]: temp2d = reshape(array3d,[23*337,107])

In [63]: temp2d2 = zeros([23*337,144])

In [64]: temp2d2[:,f.bolo_indices] = temp2d

In [65]: array2d[whscan,:] = temp2d2

This works, but it feels wrong to me: I think there should be a way to do this by directly indexing array2d with two numpy arrays.... In the process of asking this question, I might have come up with the answer (courtesy Stefan at http://mentat.za.net/):

In [85]: bi = (f.bolo_indices[np.newaxis,:]+ones([7751,1])).astype('int')

In [86]: whc = (whscan[:,np.newaxis] + ones([1,107])).astype('int')

In [87]: array2d[whc,bi] = temp2d

I thought this had worked, but the values didn't seem to be going to the right places when I re-examined them.

Thanks,
Adam

From fperez.net at gmail.com Wed Oct 29 02:50:26 2008
From: fperez.net at gmail.com (Fernando Perez)
Date: Tue, 28 Oct 2008 23:50:26 -0700
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: 
References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu>
Message-ID: 

On Mon, Oct 27, 2008 at 12:27 PM, Fernando Perez wrote:
> I just got back from some travel and will try to update the doc later
> this evening with all the feedback and will post again, so that we can
> converge on a final doc, which I'll then pitch over the fence to the
> python-dev list.

OK, thanks everyone for the feedback. I've updated the bzr repo in case anyone is using that, as well as the static copy:

https://cirl.berkeley.edu/fperez/static/numpy-pep225/

Please have a look (the html is in sync with the source reST doc) and let me know if you have any more feedback, changes, etc. I've tried to put in all of your comments and suggestions, but please say so if I've missed something or you feel anything not to be accurate.

I'll leave it up for a few days, and if there are no objections by next week, I'll send it to the Python-dev list. That will give them some days to think about it, in case anyone from that list is interested in talking about it at the Nov 13 baypiggies meeting.

Cheers,

f

From dwf at cs.toronto.edu Wed Oct 29 05:00:33 2008
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Wed, 29 Oct 2008 05:00:33 -0400
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: 
References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu>
Message-ID: 

Hi Fernando,

In Robert's comment, I think the reST processor somehow got rid of a backslash. In my browser I see

(I'm looking at you, Matlab's "")

although this is an aside that will be lost on anyone who hasn't used the Matlab backslash operator anyway.
In fact, one other thing just came to mind that isn't obvious and (so far as I can remember) hasn't been brought up: even though Robert and others (myself included) really only care about an operator to facilitate matrix multiplication, I think most of us would support the ~*= (mentioned in the PEP) as a natural companion to ~*.

Am I right about this? I realize it can't buy you the same in-place semantics as +=, -= and friends currently enjoy with numpy arrays, but I think most folks would just *expect* it to work, i.e. if foo is 3x4 and bar is 4x1 (or maybe a length 4 rank 1 array) then

foo ~*= bar

results in foo now pointing to an array that (however you choose to handle rank) contains exactly 3 elements.

Cheers,

David

On 29-Oct-08, at 2:50 AM, Fernando Perez wrote:
> On Mon, Oct 27, 2008 at 12:27 PM, Fernando Perez wrote:
>
>> I just got back from some travel and will try to update the doc later
>> this evening with all the feedback and will post again, so that we can
>> converge on a final doc, which I'll then pitch over the fence to the
>> python-dev list.
>
> OK, thanks everyone for the feedback. I've updated the bzr repo in
> case anyone is using that, as well as the static copy:
>
> https://cirl.berkeley.edu/fperez/static/numpy-pep225/
>
> Please have a look (the html is in sync with the source reST doc) and
> let me know if you have any more feedback, changes, etc. I've tried
> to put in all of your comments and suggestions, but please say so if
> I've missed something or you feel anything not to be accurate.
>
> I'll leave it up for a few days, and if there are no objections by
> next week, I'll send it to the Python-dev list. That will give them
> some days to think about it, in case anyone from that list is
> interested in talking about it at the Nov 13 baypiggies meeting.
>
> Cheers,
>
> f

From millman at berkeley.edu Wed Oct 29 10:51:56 2008
From: millman at berkeley.edu (Jarrod Millman)
Date: Wed, 29 Oct 2008 07:51:56 -0700
Subject: [Numpy-discussion] ANN: NumPy 1.2.1
Message-ID: 

I'm pleased to announce the release of NumPy 1.2.1.

NumPy is the fundamental package needed for scientific computing with Python. It contains:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* basic linear algebra functions
* basic Fourier transforms
* sophisticated random number capabilities
* tools for integrating Fortran code.

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

This bugfix release comes almost one month after the 1.2.0 release. Please note that NumPy 1.2.1 requires Python 2.4 or greater. For information, please see the release notes:

https://sourceforge.net/project/shownotes.php?release_id=636728&group_id=1369

You can download the release from here:

https://sourceforge.net/project/showfiles.php?group_id=1369

Thank you to everybody who contributed to this release.
Enjoy,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

From adam.ginsburg at colorado.edu Wed Oct 29 12:11:43 2008
From: adam.ginsburg at colorado.edu (Adam Ginsburg)
Date: Wed, 29 Oct 2008 10:11:43 -0600
Subject: [Numpy-discussion] "Advanced indexing" question - subset of data cube as a 2D array.
In-Reply-To: 
References: 
Message-ID: 

> In [85]: bi = (f.bolo_indices[np.newaxis,:]+ones([7751,1])).astype('int')
>
> In [86]: whc = (whscan[:,np.newaxis] + ones([1,107])).astype('int')
>
> In [87]: array2d[whc,bi] = temp2d
>
> I thought this had worked, but the values didn't seem to be going to the right places when I re-examined them.

I think I answered my own question: if I'd used zeros() instead of ones() it would have worked fine. I don't know why I tried to use ones().

Adam

From HAWRYLA at novachem.com Wed Oct 29 12:23:10 2008
From: HAWRYLA at novachem.com (Andrew Hawryluk)
Date: Wed, 29 Oct 2008 10:23:10 -0600
Subject: [Numpy-discussion] any interest in including a second-order gradient?
In-Reply-To: <5A6DDC03-B222-46A4-A21A-7F83FFE6E199@cs.toronto.edu>
References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <1225231045.20998.12.camel@localhost.localdomain> <5A6DDC03-B222-46A4-A21A-7F83FFE6E199@cs.toronto.edu>
Message-ID: <48C01AE7354EC240A26F19CEB995E943033AEF90@CHMAILMBX01.novachem.com>

> -----Original Message-----
> From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of David Warde-Farley
> Sent: 28 Oct 2008 10:15 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] any interest in including a second-order gradient?
>
> On 28-Oct-08, at 5:57 PM, Fabrice Silva wrote:
>
> > Are there parts of the computation that could be done only once and
> > shared between the gradient and the second derivative (isn't it
> > called the Hessian, at least in the N-d case)?
>
> Probably. I'd imagine it depends on your differencing scheme; central
> differences on the second derivative might imply a need to do two
> non-central differences to sample the gradient at +eps and -eps on each
> axis, respectively.

That is true, at least in theory. The formula for the first derivative can be interpreted as the average of the slopes on either side of a point, while the second derivative is proportional to the difference between those slopes. In practice, however, both formulas are simplified algebraically, so you don't get a 'free' first derivative while calculating the second. The forward and backward differences at the edges are also quite different between the first- and second-order cases.
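In symbols (with grid spacing $h$; notation added here purely for illustration), the two standard central differences are

$$f'_i \approx \frac{f_{i+1}-f_{i-1}}{2h} = \frac{1}{2}\left(\frac{f_{i+1}-f_i}{h}+\frac{f_i-f_{i-1}}{h}\right), \qquad f''_i \approx \frac{f_{i+1}-2f_i+f_{i-1}}{h^2} = \frac{1}{h}\left(\frac{f_{i+1}-f_i}{h}-\frac{f_i-f_{i-1}}{h}\right),$$

i.e. the average of the two one-sided slopes and (up to the factor $1/h$) their difference; once each is simplified to its three-point form, neither produces the other as an intermediate.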
> And yes, the square matrix of second derivatives of an N-d scalar
> function is typically called the Hessian, but some of the properties
> one typically expects from a "Hessian" (i.e. symmetry) are not
> necessarily true unless all of the partial second derivatives are
> continuous.
>
> > If a common function could speed up the computation of the gradient
> > and the Hessian in a single call to a new function gradients(), it
> > would be worthwhile.
> > If the intent is just to reduce the total length of the file
> > containing the gradient and gradient2 functions, I do not understand
> > why the existing code should be modified. Why not create a new
> > function hessian() with the same signature as gradient?
>
> As Stefan mentioned, the number of functions that get included when you
> type "from numpy import *" is already pretty large, and it'd be nice
> not to make it larger. Though you do raise a good point -- if it's
> known that the user wants both the gradient and the Hessian, and you
> can get the first for free while computing the second, then that's
> preferable to having to make two calls and duplicate work.
>
> David
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

There could be a way to separate the first and second derivatives so that the second takes advantage of the work done by the first, but it would yield only a small improvement on the second and would require a reduction in performance on the first. It would also make the code much more confusing, and harder to extend to higher-order derivatives in the future. On the whole it appears that Stefan's suggestion is the winner: one function that returns one derivative of a specified order.

Andrew

From lists_ravi at lavabit.com Wed Oct 29 12:26:21 2008
From: lists_ravi at lavabit.com (Ravi)
Date: Wed, 29 Oct 2008 12:26:21 -0400
Subject: [Numpy-discussion] "Advanced indexing" question - subset of data cube as a 2D array.
In-Reply-To: 
References: 
Message-ID: <200810291226.22629.lists_ravi@lavabit.com>

On Wednesday 29 October 2008 01:44:06 Adam wrote:
> In [62]: temp2d = reshape(array3d,[23*337,107])
>
> In [63]: temp2d2 = zeros([23*337,144])
>
> In [64]: temp2d2[:,f.bolo_indices] = temp2d
>
> In [65]: array2d[whscan,:] = temp2d2
>
> This works, but it feels wrong to me: I think there should be a way to do
> this by directly indexing array2d with two numpy arrays....

Does numpy.ix_ not fit your use case?

In [2]: y = arange(15).reshape((3,5))

In [3]: y
Out[3]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [4]: y[ ix_( array([1,2]), array([1,3,4]) ) ]
Out[4]:
array([[ 6,  8,  9],
       [11, 13, 14]])

Regards,
Ravi
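Since ix_ builds an open mesh of index arrays, it also works on the left-hand side of an assignment, which is what the original question needs. A small self-contained sketch (shapes shrunk for illustration; the names only stand in for the arrays from the thread):

import numpy as np

big = np.zeros((6, 5))              # stands in for array2d
rows = np.array([0, 2, 3])          # stands in for whscan
cols = np.array([1, 4])             # stands in for f.bolo_indices
sub = np.arange(6.).reshape(3, 2)   # stands in for temp2d

big[np.ix_(rows, cols)] = sub       # scatters sub back; no ones()/zeros() tricks needed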
From HAWRYLA at novachem.com Wed Oct 29 12:32:12 2008
From: HAWRYLA at novachem.com (Andrew Hawryluk)
Date: Wed, 29 Oct 2008 10:32:12 -0600
Subject: [Numpy-discussion] any interest in including a second-order gradient?
In-Reply-To: <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com>
References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com>
Message-ID: <48C01AE7354EC240A26F19CEB995E943033AEF91@CHMAILMBX01.novachem.com>

> -----Original Message-----
> From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Robert Kern
> Sent: 28 Oct 2008 3:36 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] any interest in including a second-order gradient?
>
> On Tue, Oct 28, 2008 at 16:28, Andrew Hawryluk wrote:
> > I agree that the gradient functions should be combined, especially
> > considering how much redundant code would be added by keeping them
> > separate. Here is one possible implementation, but I don't like the
> > signature yet as it departs from the current behaviour. At the risk of
> > demonstrating my ignorance, is there no way to place the named
> > parameter (order) after the variable-length parameter (dx) in Python?
>
> There's no good way to do it. It's unfortunate that the current
> function uses *args for this. It's just ... not good for many reasons.
> A better one would be like so:
>
> def gradient(f, dx=None, order=1):
>     N = len(f.shape)
>     if dx is None:
>         dx = np.ones([N])
>     elif np.isscalar(dx):
>         dx = np.array([dx] * N)
>     elif len(dx) == N:
>         dx = np.asarray(dx)
>     else:
>         raise ValueError("dx must be blah blah blah")
>     ...
>
> Personally, I would make a new function with this API, and deprecate
> the current gradient(). Dunno what the name should be, though.
>
> --
> Robert Kern

That would be a huge improvement, and only differs from the current behaviour when it is called on a multidimensional array with different step sizes in each direction (the least common application, I suspect). Any chance of using the proposed API with the existing name? What is deemed sufficient justification for modifying the API of an existing NumPy function? It causes trouble for existing users, but the number of future users exceeds the number of existing users, so maybe it is worth the trouble. If not, you could just call it 'grad' and deprecate the old one as you suggest.

Andrew

From fperez.net at gmail.com Wed Oct 29 12:59:42 2008
From: fperez.net at gmail.com (Fernando Perez)
Date: Wed, 29 Oct 2008 09:59:42 -0700
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: 
References: <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu>
Message-ID: 

Hey David,

On Wed, Oct 29, 2008 at 2:00 AM, David Warde-Farley wrote:
> Hi Fernando,
>
> In Robert's comment, I think the reST processor somehow got rid of a
> backslash. In my browser I see
>
> (I'm looking at you, Matlab's "")
>
> although this is an aside that will be lost on anyone who hasn't used
> the Matlab backslash operator anyway.

Oops, thanks. Fixed now.

> In fact, one other thing just came to mind that isn't obvious and (so
> far as I can remember) hasn't been brought up: even though Robert and
> others (myself included) really only care about an operator to
> facilitate matrix multiplication, I think most of us would support the
> ~*= (mentioned in the PEP) as a natural companion to ~*.
[...]

Thanks. I added your comment below Robert's, with attribution, and pushed again (to launchpad and to the static page). Thanks for the edits!

Cheers,

f

From stefan at sun.ac.za Wed Oct 29 13:29:52 2008
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Wed, 29 Oct 2008 19:29:52 +0200
Subject: [Numpy-discussion] any interest in including a second-order gradient?
In-Reply-To: <48C01AE7354EC240A26F19CEB995E943033AEF91@CHMAILMBX01.novachem.com>
References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF91@CHMAILMBX01.novachem.com>
Message-ID: <9457e7c80810291029s6d0dff59s7752a2553955ce8b@mail.gmail.com>

2008/10/29 Andrew Hawryluk :
> Any chance of using the proposed API with the existing name? What is
> deemed sufficient justification for modifying the API of an existing
> NumPy function?
> It causes trouble for existing users, but the number of future users
> exceeds the number of existing users, so maybe it is worth the trouble.
> If not, you could just call it 'grad' and deprecate the old one as you
> suggest.

Again, I would opt to change the API now and save us and our users the pain of doing it later (looking at the API for a function like gradient, it strikes me that the API freeze was premature and done without a proper review -- this function would never have passed!). My opinion is not a popular one, though, so I suspect we'll have to create a new function. If we do that, it is important that we rewrite the old function in terms of the new one, i.e.

def gradient(x, *varargs):
    return new_gradient(x, varargs)

so that we don't have to maintain two functions.

Regards
Stéfan

From fperez.net at gmail.com Wed Oct 29 13:46:55 2008
From: fperez.net at gmail.com (Fernando Perez)
Date: Wed, 29 Oct 2008 10:46:55 -0700
Subject: [Numpy-discussion] any interest in including a second-order gradient?
In-Reply-To: <9457e7c80810291029s6d0dff59s7752a2553955ce8b@mail.gmail.com>
References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF91@CHMAILMBX01.novachem.com> <9457e7c80810291029s6d0dff59s7752a2553955ce8b@mail.gmail.com>
Message-ID: 

Howdy,

Minor comment on all this, from the peanut gallery...

Without commenting on the original gradient API or changes, I'm a strong -1000 on introducing the notion of 'order' into a gradient function. The gradient, from every definition I can remember, is a first-order operation. Matlab's gradient is written this way:

http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/gradient.html

and for good reason: nobody speaks of 'second order gradients'. The gradient is an operation defined for functions f:R^n -> R as the vector of all first partial derivatives, end of story. In numpy lingo a function like this is represented by an N-d array, and we're all OK with that analogy. But I really think that gradient() in numpy should stick to what the math literature has defined up until now.

I think it's fine to ask for functions that compute higher order derivatives of n-d arrays: we already have diff(), which operates on a single direction, and a hessian could make sense (with the caveats David points out). But with higher order derivatives there are many more combinations to worry about, and I really think it's a bad idea to lump those issues into the definition of gradient, which was a perfectly unambiguous object up until this point.

Just my devalued U$ 1e-2...

Cheers,

f
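For reference, the one-directional higher-order differences Fernando mentions are already available through diff()'s n argument -- a quick illustrative check:

import numpy as np

a = np.array([0., 1., 4., 9., 16.])   # x**2 sampled at unit spacing
print np.diff(a)                      # first differences:  [ 1.  3.  5.  7.]
print np.diff(a, n=2)                 # second differences: [ 2.  2.  2.]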
From efiring at hawaii.edu Wed Oct 29 15:33:03 2008
From: efiring at hawaii.edu (Eric Firing)
Date: Wed, 29 Oct 2008 09:33:03 -1000
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: 
References: <3d375d730810221415m33acef65rd8b79db1c985e04d@mail.gmail.com> <6FF1176C-F185-4070-AC92-EE27D9347E27@stsci.edu> <48FFC28E.5070207@hawaii.edu> <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu>
Message-ID: <4908BA6F.90909@hawaii.edu>

David Warde-Farley wrote:
> Hi Fernando,
>
> In Robert's comment, I think the reST processor somehow got rid of a
> backslash. In my browser I see
>
> (I'm looking at you, Matlab's "")
>
> although this is an aside that will be lost on anyone who hasn't used
> the Matlab backslash operator anyway.
>
> In fact, one other thing just came to mind that isn't obvious and (so
> far as I can remember) hasn't been brought up: even though Robert and
> others (myself included) really only care about an operator to
> facilitate matrix multiplication, I think most of us would support the
> ~*= (mentioned in the PEP) as a natural companion to ~*.

This makes no sense to me at all. Why use in-place notation for an operation that is not in-place? The point of having a matrix multiplication operator is to facilitate code that corresponds more closely to mathematical notation. The in-place operations live in the computer science world, not the mathematical world. The proposal for ~*= violates language consistency while doing nothing to improve mathematical readability.

Eric

> Am I right about this? I realize it can't buy you the same in-place
> semantics as +=, -= and friends currently enjoy with numpy arrays, but I
> think most folks would just *expect* it to work, i.e. if foo is 3x4
> and bar is 4x1 (or maybe a length 4 rank 1 array) then
>
> foo ~*= bar
>
> results in foo now pointing to an array that (however you choose to
> handle rank) contains exactly 3 elements.
>
> Cheers,
>
> David
>
> On 29-Oct-08, at 2:50 AM, Fernando Perez wrote:
>
>> On Mon, Oct 27, 2008 at 12:27 PM, Fernando Perez wrote:
>>
>>> I just got back from some travel and will try to update the doc later
>>> this evening with all the feedback and will post again, so that we can
>>> converge on a final doc, which I'll then pitch over the fence to the
>>> python-dev list.
>>
>> OK, thanks everyone for the feedback. I've updated the bzr repo in
>> case anyone is using that, as well as the static copy:
>>
>> https://cirl.berkeley.edu/fperez/static/numpy-pep225/
>>
>> Please have a look (the html is in sync with the source reST doc) and
>> let me know if you have any more feedback, changes, etc. I've tried
>> to put in all of your comments and suggestions, but please say so if
>> I've missed something or you feel anything not to be accurate.
>>
>> I'll leave it up for a few days, and if there are no objections by
>> next week, I'll send it to the Python-dev list. That will give them
>> some days to think about it, in case anyone from that list is
>> interested in talking about it at the Nov 13 baypiggies meeting.
>>
>> Cheers,
>>
>> f
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From robert.kern at gmail.com Wed Oct 29 15:43:45 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 29 Oct 2008 14:43:45 -0500
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: <4908BA6F.90909@hawaii.edu>
References: <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu> <4908BA6F.90909@hawaii.edu>
Message-ID: <3d375d730810291243u565cc464u39d3b1adb788af7@mail.gmail.com>

On Wed, Oct 29, 2008 at 14:33, Eric Firing wrote:
> David Warde-Farley wrote:
>> Hi Fernando,
>>
>> In Robert's comment, I think the reST processor somehow got rid of a
>> backslash. In my browser I see
>>
>> (I'm looking at you, Matlab's "")
>>
>> although this is an aside that will be lost on anyone who hasn't used
>> the Matlab backslash operator anyway.
>>
>> In fact, one other thing just came to mind that isn't obvious and (so
>> far as I can remember) hasn't been brought up: even though Robert and
>> others (myself included) really only care about an operator to
>> facilitate matrix multiplication, I think most of us would support the
>> ~*= (mentioned in the PEP) as a natural companion to ~*.
>
> This makes no sense to me at all. Why use in-place notation for an
> operation that is not in-place? The point of having a matrix
> multiplication operator is to facilitate code that corresponds more
> closely to mathematical notation. The in-place operations live in the
> computer science world, not the mathematical world. The proposal for
> ~*= violates language consistency while doing nothing to improve
> mathematical readability.

Eh, that's not entirely true.

x = 1
x += 2

That's not in-place. They are called "augmented assignments", not "in-place operations" for this reason. The defining characteristic is that "x <op>= y" should be equivalent to "x = x <op> y" except possibly for *optional* in-place semantics.

But this isn't a choice for us to make. If we add operators to the Python language, their associated augmented assignments will also be added (no matter what we think about it). If a class doesn't define the appropriate __magic__ method, then Python will execute "x = x <op> y" and give us the semantics that David was suggesting. This is a non-controversy.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
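A short self-contained check of the distinction (illustrative only; augmented assignment is in-place only when the type opts in):

import numpy as np

a = np.ones(3)
before = id(a)
a += 2                 # ndarray implements __iadd__: modified in place
assert id(a) == before

x = 1
before = id(x)
x += 2                 # int is immutable: the name is rebound to a new object
assert id(x) != before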
From wesmckinn at gmail.com Wed Oct 29 18:23:29 2008
From: wesmckinn at gmail.com (Wes McKinney)
Date: Wed, 29 Oct 2008 18:23:29 -0400
Subject: [Numpy-discussion] Enabling NaN-usage in F77 code on Windows
Message-ID: <6c476c8a0810291523n2845000bxdc5486656e91967e@mail.gmail.com>

I'm having some trouble getting NaN's to return from f77 code running under latest f2py in both g77 and gfortran. I would prefer to use gfortran but whenever I set a result value = NAN, it comes back to Python as 0. Has anyone tackled this issue? I am new to using f2py, have been moving along fine with everything else but ran into this. Here is a sample function for the rolling mean of a series having this behavior:

      SUBROUTINE ROLLMEAN(DATA,WINDOW,N,AVE,T)
      INTEGER*8 N, WINDOW
      REAL*8 AVE,DATA(N),T(N)
      INTEGER*8 J
      REAL*8 P,S,S1
C
CF2PY REAL*8 INTENT(IN, COPY) DATA
CF2PY INTEGER*8 INTENT(IN) WINDOW
CF2PY INTEGER*8 INTENT(HIDE), DEPEND(DATA), CHECK(N>=2) :: N = SHAPE(DATA, 0)
CF2PY REAL*8 INTENT(OUT, COPY), DEPEND(N), DIMENSION(N) :: T
CF2PY REAL*8 INTENT(HIDE) :: AVE
C
      S=0.
      S1=0.
      DO J=1,WINDOW
         P=DATA(J)
         S1=S1+P
         T(J)=NAN
      ENDDO
      AVE=S1/WINDOW
      T(WINDOW)=AVE
      DO J=WINDOW+1,N
         P=DATA(J)
         S=DATA(J-WINDOW)
         S1=S1+P-S
         AVE=S1/WINDOW
         T(J)=AVE
      ENDDO
      RETURN
      END

Thanks,
Wes

From dwf at cs.toronto.edu Wed Oct 29 19:37:56 2008
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Wed, 29 Oct 2008 19:37:56 -0400
Subject: [Numpy-discussion] (Late) summary of PEP-225 discussion at Scipy
In-Reply-To: <3d375d730810291243u565cc464u39d3b1adb788af7@mail.gmail.com>
References: <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu> <4908BA6F.90909@hawaii.edu> <3d375d730810291243u565cc464u39d3b1adb788af7@mail.gmail.com>
Message-ID: <291F12FC-FCFB-4152-9A6D-762878D8AF8F@cs.toronto.edu>

On 29-Oct-08, at 3:43 PM, Robert Kern wrote:

> Eh, that's not entirely true.
>
> x = 1
> x += 2
>
> That's not in-place. They are called "augmented assignments", not
> "in-place operations" for this reason. The defining characteristic is
> that "x <op>= y" should be equivalent to "x = x <op> y" except
> possibly for *optional* in-place semantics.

Indeed.

x = "foo"
x += "bar"

This definitely isn't in-place, since strings are immutable, but it works, and people expect it to work. Of course it's worth mentioning in the docs somewhere that only the augmented assignments for element-wise ops are in-place (which should be obvious to anyone who knows what an outer product is).

David

From oliphant at enthought.com Wed Oct 29 20:05:50 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Wed, 29 Oct 2008 19:05:50 -0500
Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type
Message-ID: <4908FA5E.9090803@enthought.com>

Hi all,

I'd like to add to NumPy the ability to clone a data-type object so that only a few fields are copied over but that it retains the same total size.

This would allow, for example, the ability to "select out a few records" from a structured array using

subarr = arr.view(cloned_dtype)

Right now, it is hard to do this because you have to at least add a "dummy" field at the end. A simple method on the dtype class (fromfields or something) would be easy to add.

It was thought in the past to do this with indexing

arr['field1', 'field2']

And that would still be possible (and mostly implemented) if this feature is added.

Thoughts?

-Travis
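To make the dummy-field workaround concrete -- a small sketch (field names invented for illustration) using the dict form of the dtype constructor, which lets fields sit at explicit offsets:

import numpy as np

arr = np.zeros(3, dtype=[('date', 'S10'), ('close', 'f4'), ('volume', 'i4')])
# re-declare only the wanted fields at their original offsets, padding the
# tail with an explicit void "dummy" so the itemsize still matches
sub_dtype = np.dtype({'names':   ['date', 'close', '_pad'],
                      'formats': ['S10',  'f4',    'V4'],
                      'offsets': [0,      10,      14]})
subarr = arr.view(sub_dtype)   # a view, not a copy
# subarr['date'] and subarr['close'] work, but '_pad' is still visible,
# which is exactly the wart the proposal wants to remove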
From robert.kern at gmail.com Wed Oct 29 20:26:36 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 29 Oct 2008 19:26:36 -0500
Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type
In-Reply-To: <4908FA5E.9090803@enthought.com>
References: <4908FA5E.9090803@enthought.com>
Message-ID: <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>

On Wed, Oct 29, 2008 at 19:05, Travis E. Oliphant wrote:
>
> Hi all,
>
> I'd like to add to NumPy the ability to clone a data-type object so that
> only a few fields are copied over but that it retains the same total size.
>
> This would allow, for example, the ability to "select out a few records"
> from a structured array using
>
> subarr = arr.view(cloned_dtype)
>
> Right now, it is hard to do this because you have to at least add a
> "dummy" field at the end. A simple method on the dtype class
> (fromfields or something) would be easy to add.

I'm not sure what this accomplishes. Would the dummy fields that fill in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple with no numpy.void scalars? That would be a novel feature, but I'm not sure it fits the problem. On the contrary:

> It was thought in the past to do this with indexing
>
> arr['field1', 'field2']
>
> And that would still be possible (and mostly implemented) if this
> feature is added.

This appears more like the interface that people want. Except that I think people were thinking that it would follow fancy indexing syntax:

arr[['field1', 'field2']]

I guess there are two ways to implement this. One is to make a new array that just contains the desired fields. Another is to make a view that just points to the desired fields in the original array provided that we have a new feature for inaccessible dummy fields. One point for the former approach is that it is closer to fancy indexing which must always make a copy. The latter approach breaks that connection.

OTOH, now that I think about it, I don't think there is really any coherent way to mix field selection with any other indexing operations. At least, not within the same brackets. Hmm. So maybe the link to fancy indexing can be ignored as, ahem, fanciful.

Overall, I guess, I would present the feature slightly differently. Provide a kind of inaccessible and invisible dtype for implementing dummy fields. This is useful in other places like file parsing. At the same time, implement a function that uses this capability to make views with a subset of the fields of a structured array. I'm not sure that people need an API for replacing the fields of a dtype like this.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From aisaac at american.edu Wed Oct 29 23:37:33 2008
From: aisaac at american.edu (Alan G Isaac)
Date: Wed, 29 Oct 2008 23:37:33 -0400
Subject: [Numpy-discussion] augmented assignment and in-place operations
In-Reply-To: <3d375d730810291243u565cc464u39d3b1adb788af7@mail.gmail.com>
References: <4900A706.70806@noaa.gov> <2b1c8c4f0810260643p581ecfc5xc4e3b62ba0c82019@mail.gmail.com> <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu> <4908BA6F.90909@hawaii.edu> <3d375d730810291243u565cc464u39d3b1adb788af7@mail.gmail.com>
Message-ID: <49092BFD.3000306@american.edu>

On 10/29/2008 3:43 PM Robert Kern wrote:
> The defining characteristic is
> that "x <op>= y" should be equivalent to "x = x <op> y" except
> possibly for *optional* in-place semantics.

This gets at a bit of the Language Reference that I've never understood.

    when possible, the actual operation is performed in-place, meaning
    that rather than creating a new object and assigning that to the
    target, the old object is modified instead.

What does that mean?
I assume "when possible" means in part "when mutable", ruling out e.g. Python integers or floats. But the rest does not really seem part of the language, but rather seems to be normative? That is, I could define __iadd__ anyway I want. Is the above saying that I "should" define __iadd__ to do an in-place operation "if possible". If so, why is such a normative statement part of the language reference? Or is it a statement about the language that I'm just not getting? Confusedly, Alan From robert.kern at gmail.com Thu Oct 30 00:18:26 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 29 Oct 2008 23:18:26 -0500 Subject: [Numpy-discussion] augmented assignment and in-place operations In-Reply-To: <49092BFD.3000306@american.edu> References: <9EE8FF7F-AE36-4B08-9C7C-D4827E3E1706@cs.toronto.edu> <2b1c8c4f0810270422y620b9472mc1d5c40cfa3e48a8@mail.gmail.com> <49060910.60903@hawaii.edu> <4908BA6F.90909@hawaii.edu> <3d375d730810291243u565cc464u39d3b1adb788af7@mail.gmail.com> <49092BFD.3000306@american.edu> Message-ID: <3d375d730810292118q7be42f96qdb8555b48546980d@mail.gmail.com> On Wed, Oct 29, 2008 at 22:37, Alan G Isaac wrote: > On 10/29/2008 3:43 PM Robert Kern wrote: >> The defining characteristic is >> that "x = y" should be equivalent to "x = x y" except >> possibly for *optional* in-place semantics. > > This gets at a bit of the Language Reference that I've > never understood. > > when possible, the actual operation is performed > in-place, meaning that rather than creating a new > object and assigning that to the target, the old > object is modified instead. > > What does that mean? I assume "when possible" means in part > "when mutable", ruling out e.g. Python integers or floats. In part, yes. But also "when possible" excludes in-place matrix multiplication on ndarrays since we don't, as numpy policy, resize ndarrays implicitly. > But the rest does not really seem part of the language, but > rather seems to be normative? That is, I could define > __iadd__ anyway I want. Is the above saying that I "should" > define __iadd__ to do an in-place operation "if possible". > If so, why is such a normative statement part of the > language reference? Or is it a statement about the language > that I'm just not getting? It is a guide to the intended usage of the feature. You can see other such guides in the documentation for __repr__, for example. The use of the word "should" is not rigidly consistent throughout the document. In other places, it is closer to the meaning of MUST in RFC 2119: http://www.faqs.org/rfcs/rfc2119.html The reason such guides are in the reference manual is because ... well, where else are you going to put them? Describing the use case of a feature is, in this case I think, essential to describing the feature. You have to explain why one would bother defining __iadd__ if one already had __add__. Similarly, you have to explain why one would use __repr__ over __str__ or vice versa. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From crleblanc at gmail.com Thu Oct 30 00:55:29 2008 From: crleblanc at gmail.com (Chris LeBlanc) Date: Thu, 30 Oct 2008 17:55:29 +1300 Subject: [Numpy-discussion] passing a C array to embedded Python from C code Message-ID: Hello, I'm working on seismic processing software called Globe Claritas. The core is written in C and a bit of Fortran. 
I would like to embed Python in this software, so a user can use Python code to manipulate the seismic data. This would give our users all the power of Python and NumPy and any external modules they may want to use.

I was thinking I could use the Python C API to initialize Python and run the code supplied by the user. I'm thinking I could create a NumPy array that uses the same C array (1d, 2d, or 3d) that our program is using instead of copying the memory. Perhaps using the NumPy API function PyArray_SimpleNewFromData() or something similar (similar to this posting: http://blog.enthought.com/?p=62). I would then pass this NumPy array as an object to the Python code, which it could manipulate. The C array is an array of floating point data samples for each seismic trace, so hopefully well suited to this. At this point, I'm thinking the Python code wouldn't need to return any objects because it would be modifying the seismic data (and trace headers) in memory.

Does this approach make sense? Is there a better way to go about it? Maybe calling a custom module from the Python code that does the C array to NumPy translation using Cython/pyrex/swig/etc. Would it be possible to use the same C arrays from here without copying them? Is there a more obvious way to do it that I might be missing?

Does anyone have examples of creating a NumPy array in C, and passing it to a Python instance? That would be a huge help, I have no experience with the C or NumPy APIs.

Thanks,
Chris LeBlanc

From stefan at sun.ac.za Thu Oct 30 02:33:23 2008
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Thu, 30 Oct 2008 08:33:23 +0200
Subject: [Numpy-discussion] any interest in including a second-order gradient?
In-Reply-To: 
References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF91@CHMAILMBX01.novachem.com> <9457e7c80810291029s6d0dff59s7752a2553955ce8b@mail.gmail.com>
Message-ID: <9457e7c80810292333y2b6a37d7h676055f81ee8faaa@mail.gmail.com>

Hi Fernando,

Thanks for your input.

2008/10/29 Fernando Perez :
> I think it's fine to ask for functions that compute higher order
> derivatives of n-d arrays: we already have diff(), which operates on a
> single direction, and a hessian could make sense (with the caveats
> David points out). But with higher order derivatives there are many
> more combinations to worry about, and I really think it's a bad idea
> to lump those issues into the definition of gradient, which was a
> perfectly unambiguous object up until this point.

Maybe we should focus on writing a decent 'deriv' function then. I know Konrad Hinsen's Scientific had a derivatives package (Scientific.Functions.Derivatives) that implemented automatic differentiation:

http://en.wikipedia.org/wiki/Automatic_differentiation
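The idea in miniature -- a toy forward-mode sketch using dual numbers (purely illustrative; nothing like Scientific's actual implementation):

class Dual(object):
    # toy dual number: value + eps*deriv, with eps**2 == 0
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def deriv(f, x):
    # evaluate f on a dual number seeded with derivative 1
    return f(Dual(x, 1.0)).der

print deriv(lambda x: 3*x*x + x, 2.0)   # exact derivative: 13.0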
Cheers
Stéfan

From stefan at sun.ac.za Thu Oct 30 02:55:36 2008
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Thu, 30 Oct 2008 08:55:36 +0200
Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type
In-Reply-To: <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>
References: <4908FA5E.9090803@enthought.com> <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>
Message-ID: <9457e7c80810292355o630fb220o16fc35d64277809a@mail.gmail.com>

2008/10/30 Robert Kern :
> Provide a kind of inaccessible and invisible dtype for implementing
> dummy fields. This is useful in other places like file parsing. At the
> same time, implement a function that uses this capability to make
> views with a subset of the fields of a structured array. I'm not sure
> that people need an API for replacing the fields of a dtype like this.

I have thought about adding such an invisible dtype before (incidentally, it was while parsing files!). I would be interested in exploring this route further.

Regards
Stéfan

From matthieu.brucher at gmail.com Thu Oct 30 04:08:11 2008
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Thu, 30 Oct 2008 09:08:11 +0100
Subject: [Numpy-discussion] passing a C array to embedded Python from C code
In-Reply-To: 
References: 
Message-ID: 

> Does this approach make sense? Is there a better way to go about it?
> Maybe calling a custom module from the Python code that does the C
> array to NumPy translation using Cython/pyrex/swig/etc. Would it be
> possible to use the same C arrays from here without copying them?

Hi,

Your case seems to fit the array interface. The goal is to create a C structure with some additional information that Numpy can understand, and then your array will be treated as a Numpy array. If you can follow a French tutorial, you can go on http://matthieu-brucher.developpez.com/tutoriels/python/swig-numpy/#LV to have a skeleton for your issue.
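In pure-Python miniature, the mechanism looks roughly like this (illustrative only; a real host application would publish the address of its own C buffer, which must stay alive as long as the array does, and the typestr assumes a little-endian machine):

import ctypes
import numpy as np

cbuf = (ctypes.c_float * 6)(1, 2, 3, 4, 5, 6)   # stand-in for the C array

class TraceWrapper(object):
    # expose a foreign buffer via the array interface: numpy makes no copy
    def __init__(self, buf, shape):
        self.__array_interface__ = {
            'shape':   shape,
            'typestr': '<f4',                           # little-endian float32
            'data':    (ctypes.addressof(buf), False),  # (address, read-only?)
            'version': 3,
        }

a = np.asarray(TraceWrapper(cbuf, (2, 3)))
a[0, 0] = 99.0        # writes straight through to the underlying buffer
print cbuf[0]         # -> 99.0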
Matthieu

-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher

From faltet at pytables.org Thu Oct 30 05:33:00 2008
From: faltet at pytables.org (Francesc Alted)
Date: Thu, 30 Oct 2008 10:33:00 +0100
Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type
In-Reply-To: <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>
References: <4908FA5E.9090803@enthought.com> <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>
Message-ID: <200810301033.01417.faltet@pytables.org>

On Thursday 30 October 2008, Robert Kern wrote:
> On Wed, Oct 29, 2008 at 19:05, Travis E. Oliphant wrote:
> > Hi all,
> >
> > I'd like to add to NumPy the ability to clone a data-type object so
> > that only a few fields are copied over but that it retains the
> > same total size.
> >
> > This would allow, for example, the ability to "select out a few
> > records" from a structured array using
> >
> > subarr = arr.view(cloned_dtype)
> >
> > Right now, it is hard to do this because you have to at least add a
> > "dummy" field at the end. A simple method on the dtype class
> > (fromfields or something) would be easy to add.
>
> I'm not sure what this accomplishes. Would the dummy fields that fill
> in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple
> with no numpy.void scalars? That would be a novel feature, but I'm not
> sure it fits the problem. On the contrary:
>
> > It was thought in the past to do this with indexing
> >
> > arr['field1', 'field2']
> >
> > And that would still be possible (and mostly implemented) if this
> > feature is added.
>
> This appears more like the interface that people want. Except that I
> think people were thinking that it would follow fancy indexing
> syntax:
>
> arr[['field1', 'field2']]

I've thought about that too. That would be a great thing to have, IMO.

> I guess there are two ways to implement this. One is to make a new
> array that just contains the desired fields. Another is to make a
> view that just points to the desired fields in the original array
> provided that we have a new feature for inaccessible dummy fields.
> One point for the former approach is that it is closer to fancy
> indexing which must always make a copy. The latter approach breaks
> that connection.

Yeah. I'd vote for avoiding the copy.

> OTOH, now that I think about it, I don't think there is really any
> coherent way to mix field selection with any other indexing
> operations. At least, not within the same brackets. Hmm. So maybe the
> link to fancy indexing can be ignored as, ahem, fanciful.

Well, one can always check that fields in the fancy list are either strings (map to name fields) or integers (map to positional fields). However, I'm not sure if this check would be too expensive.

> Overall, I guess, I would present the feature slightly differently.
> Provide a kind of inaccessible and invisible dtype for implementing
> dummy fields. This is useful in other places like file parsing. At
> the same time, implement a function that uses this capability to make
> views with a subset of the fields of a structured array. I'm not sure
> that people need an API for replacing the fields of a dtype like
> this.

Mmh, not sure what you are proposing there. You mean something like:

In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')])

In [22]: nt = t.astype(['f2', 'f0'])

In [23]: ra = numpy.zeros(10, dtype=t)

In [24]: nra = ra.view(nt)

In [25]: ra
Out[25]:
array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
       (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
       (0, 0.0, ''), (0, 0.0, '')],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S20')])

Dear all,

This is my first post to this list. I am having performance issues with numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix takes about 1m40s, even though numpy appears to link to my atlas libraries. I did a quick benchmark by running the following script:

#! /usr/bin/env python
import numpy
import time

try:
    import numpy.core._dotblas
    print 'Using ATLAS:'
except ImportError:
    print 'No ATLAS:'

t = time.time()
x = numpy.random.random((1000,1000))
y = numpy.random.random((1000,1000))
z = numpy.dot(x, y)
print time.time()-t

My laptop is a Dell D620 Core Duo T2300 1.66 GHz, running Archlinux with GCC 4.3.2, atlas 3.8.2, python 2.5.2 and numpy 1.2.1. Output of the script above is:

Using ATLAS:
7.99549412727

A department desktop PC, Pentium D 3.00 GHz, running Scientific Linux, with GCC 4.1.2, atlas 3.7.30, python 2.5.1 and numpy 1.1.0, runs this test 24 times faster:

Using ATLAS:
0.337520122528

So even though _dotblas.so exists, matrix multiplication appears to run at pretty much the same speed as if atlas were not available.
Running ldd on _dotblas.so suggests that numpy is indeed linking to the atlas libs:

ldd /usr/lib/python2.5/site-packages/numpy/core/_dotblas.so
        linux-gate.so.1 => (0xb7fcf000)
        libatlas.so => /usr/lib/libatlas.so (0xb7cb5000)
        liblapack.so => /usr/lib/liblapack.so (0xb77ab000)
        libcblas.so => /usr/lib/libcblas.so (0xb778b000)
        libf77blas.so => /usr/lib/libf77blas.so (0xb776f000)
        libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0xb7630000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7618000)
        libc.so.6 => /lib/libc.so.6 (0xb74d6000)
        libm.so.6 => /lib/libm.so.6 (0xb74b0000)
        libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0xb73ff000)
        libdl.so.2 => /lib/libdl.so.2 (0xb73fb000)
        libutil.so.1 => /lib/libutil.so.1 (0xb73f6000)
        /lib/ld-linux.so.2 (0xb7fd0000)

Something appears to be going wrong in the compilation process. In particular, the compile process appears unable to determine the version of the atlas libs installed (I get NO_ATLAS_INFO). I've pasted some snippets from the build output which may be relevant below. Can anyone help me figure this out?

Many thanks in advance,
Jan-Willem

-- 
Jan-Willem van de Meent
Research Student
Goldstein Lab, DAMTP
University of Cambridge

-- Snippets from setup.py output

atlas_threads_info:
Setting PTATLAS=ATLAS
libraries lapack_atlas not found in /usr/lib
numpy.distutils.system_info.atlas_threads_info
Setting PTATLAS=ATLAS
Setting PTATLAS=ATLAS
  FOUND:
    libraries = ['lapack', 'atlas', 'lapack', 'cblas', 'f77blas', 'ptcblas', 'ptf77blas']
    library_dirs = ['/usr/lib']
    language = f77
    include_dirs = ['/usr/include']

customize Gnu95FCompiler using config
compiling '_configtest.c':

/* This file is generated from numpy/distutils/system_info.py */
void ATL_buildinfo(void);
int main(void) {
    ATL_buildinfo();
    return 0;
}

C compiler: gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=i686 -mtune=generic -O2 -pipe -fPIC

compile options: '-c'
gcc: _configtest.c
gcc -pthread _configtest.o -L/usr/lib -llapack -latlas -llapack -lcblas -lf77blas -lptcblas -lptf77blas -o _configtest
/usr/bin/ld: _configtest: hidden symbol `__powidf2' in /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/libgcc.a(_powidf2.o) is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
/usr/bin/ld: _configtest: hidden symbol `__powidf2' in /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/libgcc.a(_powidf2.o) is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
Status: 255
Output:
  FOUND:
    libraries = ['lapack', 'atlas', 'lapack', 'cblas', 'f77blas', 'ptcblas', 'ptf77blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 2)]
    include_dirs = ['/usr/include']

From ckkart at hoc.net Thu Oct 30 08:51:36 2008
From: ckkart at hoc.net (Christian K.)
Date: Thu, 30 Oct 2008 12:51:36 +0000 (UTC)
Subject: [Numpy-discussion] numpy array change notifier?
References: 
Message-ID: 

Erik Tollerud at gmail.com writes:
>
> Is there any straightforward way of notifying on change of a numpy
> array that leaves the numpy arrays still efficient?

You can instantiate the following ndarray derived class with a callable argument. Any change to the data will call the callback function.
In [5]: def notify(arg):
   ...:     print 'array changed'
   ...:

In [6]: x = cbarray([1,2,3,4],cb=notify)

In [7]: x[2] = -1
array changed

Alternatively, I have used the pubsub module available as part of wxPython (see the commented line in _notify()):

import numpy as N

class cbarray(N.ndarray):
    def __new__(subtype, data, cb=None, dtype=None, copy=False):
        subtype.__defaultcb = cb
        if copy:
            data = N.array(data,dtype=dtype)
        else:
            data = N.asarray(data,dtype=dtype)
        data = data.view(subtype)
        return data

    def _notify(self):
        if self.cb is not None:
            self.cb(self)   # pass the array to the callback
        #Publisher().sendMessage(('changed'))

    def _get_shape(self):
        return super(cbarray, self).shape
    shape = property(_get_shape)

    def __setitem__(self, item, val):
        N.ndarray.__setitem__(self, item, val)
        self._notify()

    def __array_finalize__(self, obj):
        if not hasattr(self, "cb"):
            # The object does not already have a `.cb` attribute
            self.cb = getattr(obj, 'cb', self.__defaultcb)

    def __reduce__(self):
        object_state = list(N.ndarray.__reduce__(self))
        subclass_state = (self.cb,)
        object_state[2] = (object_state[2], subclass_state)
        return tuple(object_state)

    def __setstate__(self, state):
        nd_state, own_state = state
        N.ndarray.__setstate__(self, nd_state)
        cb, = own_state
        self.cb = cb

Hope that helps,
Christian

From oliphant at enthought.com Thu Oct 30 09:27:56 2008
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Thu, 30 Oct 2008 08:27:56 -0500
Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type
In-Reply-To: <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>
References: <4908FA5E.9090803@enthought.com> <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com>
Message-ID: <4909B65C.3070908@enthought.com>

> I'm not sure what this accomplishes. Would the dummy fields that fill
> in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple
> with no numpy.void scalars? That would be a novel feature, but I'm not
> sure it fits the problem. On the contrary:

Yes, that was the idea. You can do it now, but only in C. The real problem right now from my point of view is that there is no way to tell the dtype constructor to "pad the itemsize to x bytes". If that were changed, then many things would be possible.

> OTOH, now that I think about it, I don't think there is really any
> coherent way to mix field selection with any other indexing
> operations. At least, not within the same brackets. Hmm. So maybe the
> link to fancy indexing can be ignored as, ahem, fanciful.

Yeah, I was wondering how to do it well, myself, and couldn't come up with anything which is why I went the .view route with another dtype.

By "inaccessible and invisible dtype" do you mean something like the basic built-in void data type, but which doesn't try to report itself when the dtype prints? That sounds interesting but I'm not sure it's necessary because the field specification can already skip bytes (just not bytes at the end --- which is what I would like to fix). Perhaps what is needed is a "pseudo-dtype" (something like 'c' compared to 'S1') which doesn't actually create a new dtype but which is handled differently when the dtype is created with the [('field1', type), ('field2', type2)] approach. Specifically, it doesn't add an entry to the fields dictionary nor an entry to the names but does affect the itemsize of the element (and the offset of follow-on fields).
So, let's assume the character is 'v':

If we have an array with underlying dtype:

od = [('date', 'S10'), ('high', 'f4'), ('low', 'f4'), ('close', 'f4'), ('volume', 'i4')]

Then, we could define a new dtype

nd = [('date', 'S10'), ('', 'v8'), ('close', 'f4'), ('', 'v4')]

and arr.view(nd) would provide a view of the array where element selection would be a tuple with just the date and close elements, but the itemsize would be exactly the same, and nd.names would be ['date', 'close'].

I like this approach. It impacts the API the least but provides the desired functionality.

-Travis

> Overall, I guess, I would present the feature slightly differently.
> Provide a kind of inaccessible and invisible dtype for implementing
> dummy fields. This is useful in other places like file parsing. At the
> same time, implement a function that uses this capability to make
> views with a subset of the fields of a structured array. I'm not sure
> that people need an API for replacing the fields of a dtype like this.

From elcorto at gmx.net Thu Oct 30 09:57:03 2008
From: elcorto at gmx.net (Steve Schmerler)
Date: Thu, 30 Oct 2008 14:57:03 +0100
Subject: [Numpy-discussion] numpy array change notifier?
In-Reply-To: <200810271643.15984.pgmdevlist@gmail.com>
References: <200810271643.15984.pgmdevlist@gmail.com>
Message-ID: <20081030135703.GA5551@ramrod.starsheriffs.de>

On Oct 27 16:43 -0400, Pierre GM wrote:
>
> Erik, maybe you could try the trick presented here:
> http://www.scipy.org/Subclasses
> in the __array_wrap__ section.

Stupid question: How do I find pages like http://www.scipy.org/Subclasses on scipy.org?

I can find them with the search function, but only when I know that it's there :)

best,
steve

From pgmdevlist at gmail.com Thu Oct 30 10:06:48 2008
From: pgmdevlist at gmail.com (Pierre GM)
Date: Thu, 30 Oct 2008 10:06:48 -0400
Subject: [Numpy-discussion] numpy array change notifier?
In-Reply-To: <20081030135703.GA5551@ramrod.starsheriffs.de>
References: <200810271643.15984.pgmdevlist@gmail.com> <20081030135703.GA5551@ramrod.starsheriffs.de>
Message-ID: <200810301006.48243.pgmdevlist@gmail.com>

Steve,
Right, there's not a lot of visibility for this one. You could try googling 'numpy subclass'. There's also an entry on subclassing in the numpy user guide (through docs.scipy.org).
Cheers
P.

On Thursday 30 October 2008 09:57:03 Steve Schmerler wrote:
> On Oct 27 16:43 -0400, Pierre GM wrote:
> > Erik, maybe you could try the trick presented here:
> > http://www.scipy.org/Subclasses
> > in the __array_wrap__ section.
>
> Stupid question: How do I find pages like
> http://www.scipy.org/Subclasses on scipy.org?
>
> I can find them with the search function, but only when I know that it's
> there :)
>
> best,
> steve
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

From elcorto at gmx.net Thu Oct 30 11:15:28 2008
From: elcorto at gmx.net (Steve Schmerler)
Date: Thu, 30 Oct 2008 16:15:28 +0100
Subject: [Numpy-discussion] numpy array change notifier?
In-Reply-To: <200810301006.48243.pgmdevlist@gmail.com>
References: <200810271643.15984.pgmdevlist@gmail.com> <20081030135703.GA5551@ramrod.starsheriffs.de> <200810301006.48243.pgmdevlist@gmail.com>
Message-ID: <20081030151528.GA5975@ramrod.starsheriffs.de>

On Oct 30 10:06 -0400, Pierre GM wrote:
> Steve,
> Right, there's not a lot of visibility for this one.
Yes, wouldn't it be better placed in the Cookbook section (or any other suitable place that's visible to anyone entering scipy.org)? Other pages are unfortunately also not visible, like http://www.scipy.org/EricsBroadcastingDoc . best, steve From rob.clewley at gmail.com Thu Oct 30 11:16:03 2008 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 30 Oct 2008 11:16:03 -0400 Subject: [Numpy-discussion] Automatic differentiation (was Re: second-order gradient) Message-ID: > Maybe we should focus on writing a decent 'deriv' function then. I > know Konrad Hinsen's Scientific had a derivatives package > (Scientific.Functions.Derivatives) that implemented automatic > differentiation: > > http://en.wikipedia.org/wiki/Automatic_differentiation That would be great, but wouldn't that be best suited as a utility requiring Sympy? You'll want to take advantage of all sorts of symbolic classes, especially for any source code transformation approach. IMO Hinsen's implementation isn't a very efficient or attractive solution to AD given the great existing C/C++ codes out there. Maybe we should be looking to provide a python interface to an existing open source package such as ADOL-C, but I'm all in favour of a new pure python approach too. What would be perfect is to have a single interface to a python AD package that would support a faster implementation if the user wished to install a C/C++ package, otherwise would default to a pure python equivalent. -Rob From rob.clewley at gmail.com Thu Oct 30 11:18:50 2008 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 30 Oct 2008 11:18:50 -0400 Subject: [Numpy-discussion] any interest in including asecond-ordergradient? In-Reply-To: <9457e7c80810292333y2b6a37d7h676055f81ee8faaa@mail.gmail.com> References: <48C01AE7354EC240A26F19CEB995E943033AEF84@CHMAILMBX01.novachem.com> <9457e7c80810280114s34d56a51j57841503979352b8@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF8D@CHMAILMBX01.novachem.com> <3d375d730810281436v698cdaf2tdd6f99461e123c91@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AEF91@CHMAILMBX01.novachem.com> <9457e7c80810291029s6d0dff59s7752a2553955ce8b@mail.gmail.com> <9457e7c80810292333y2b6a37d7h676055f81ee8faaa@mail.gmail.com> Message-ID: > 2008/10/29 Fernando Perez : >> I think it's fine to ask for functions that compute higher order >> derivatives of n-d arrays: we already have diff(), which operates on a >> single direction, and a hessian could make sense (with the caveats >> David points out). But with higher order derivatives there are many >> more combinations to worry about, and I really think it's a bad idea >> to lump those issues into the definition of gradient, which was a >> perfectly unambiguous object up until this point. I'm basically in favour of Fernando's suggestion to keep gradient simple and add a hessian function. Higher numerical derivatives from data aren't very reliable anyway. You're much better off interpolating with a polynomial and then differentiating that. > Maybe we should focus on writing a decent 'deriv' function then. I > know Konrad Hinsen's Scientific had a derivatives package > (Scientific.Functions.Derivatives) that implemented automatic > differentiation: Improving the support for a gradient of array data is an entirely independent project in my mind - but I like this idea and I replied in a new thread. 
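To make the pure-python option concrete, here is a minimal forward-mode
sketch using dual numbers -- scalar-only, and only a toy next to what a
real AD package such as ADOL-C provides:

class Dual(object):
    """Carry a value and its derivative through arithmetic."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = self._lift(other)
        # product rule
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def deriv(f, x):
    """Derivative of f at x, exact to machine precision."""
    return f(Dual(x, 1.0)).der

print deriv(lambda x: 3*x*x + 2*x + 1, 2.0)   # 6*x + 2 at x=2 -> 14.0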
-Rob From robert.kern at gmail.com Thu Oct 30 11:24:37 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 30 Oct 2008 10:24:37 -0500 Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type In-Reply-To: <4909B65C.3070908@enthought.com> References: <4908FA5E.9090803@enthought.com> <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com> <4909B65C.3070908@enthought.com> Message-ID: <3d375d730810300824j3048e163n5fb8a8e3eb467bcf@mail.gmail.com> On Thu, Oct 30, 2008 at 08:27, Travis E. Oliphant wrote: > >> I'm not sure what this accomplishes. Would the dummy fields that fill >> in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple >> with no numpy.void scalars? That would be a novel feature, but I'm not >> sure it fits the problem. On the contrary: >> > > Yes, that was the idea. You can do it now, but only in C. The real > problem right now from my point of view is that there is no way to tell > the dtype constructor to "pad the itemsize to x bytes". If that were > changed, then many things would be possible. > >> OTOH, now that I think about it, I don't think there is really any >> coherent way to mix field selection with any other indexing >> operations. At least, not within the same brackets. Hmm. So maybe the >> link to fancy indexing can be ignored as, ahem, fanciful. >> > Yeah, I was wondering how to do it well, myself, and couldn't come up > with anything which is why I went the .view route with another dtype. > > By "inaccessible and invisible dtype" do you mean something like the > basic built-in void data type, but which doesn't try to report itself > when the dtype prints? The field doesn't report itself when the *values* print, is what I'm concerned with. The dtype should display the dummy fields such that repr() can accurately reconstruct the dtype. > That sounds interesting but I'm not sure it's necessary because the > field specification can already skip bytes (just not bytes at the end > --- which is what I would like to fix). Perhaps what is needed is a > "pseudo-dtype" (something like 'c' compared to 'S1') which doesn't > actually create a new dtype but which is handled differently when the > dtype is created with the [('field1', type), ('field2', type2)] > approach. Specifically, it doesn't add an entry to the fields > dictionary nor an entry to the names but does affect the itemsize of the > element (and the offset of follow-on fields). > > So, let's assume the character is 'v': > > If we have an array with underlying dtype: > > od = [('date', 'S10'), ('high', 'f4'), ('low', 'f4'), ('close', 'f4'), > ('volume', 'i4')] > > Then, we could define a new dtype > > nd = [('date', 'S10'), ('', 'v8'), ('close', 'f4'), ('', 'v4')] To do this, we would also have to fix the current behavior of converting ''s to 'f0', 'f1', etc., when these are passed to the dtype() constructor. > and arr.view(nd) would provide a view of the array where element > selection would be a tuple with just the date and close elements but the > itemsize would be exactly the same but nd.names would be ['date', 'close'] > > I like this approach. It impacts the API the very least but provides > the desired functionality. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Thu Oct 30 11:26:45 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 30 Oct 2008 10:26:45 -0500 Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type In-Reply-To: <200810301033.01417.faltet@pytables.org> References: <4908FA5E.9090803@enthought.com> <3d375d730810291726s197838f6o1e1ea69632ae9775@mail.gmail.com> <200810301033.01417.faltet@pytables.org> Message-ID: <3d375d730810300826i2748369dx77579e85de96e0ac@mail.gmail.com> On Thu, Oct 30, 2008 at 04:33, Francesc Alted wrote: > A Thursday 30 October 2008, Robert Kern escrigu?: >> On Wed, Oct 29, 2008 at 19:05, Travis E. Oliphant >> >> wrote: >> > Hi all, >> > >> > I'd like to add to NumPy the ability to clone a data-type object so >> > that only a view fields are copied over but that it retains the >> > same total size. >> > >> > This would allow, for example, the ability to "select out a few >> > records" from a structured array using >> > >> > subarr = arr.view(cloned_dtype) >> > >> > Right now, it is hard to do this because you have to at least add a >> > "dummy" field at the end. A simple method on the dtype class >> > (fromfields or something) would be easy to add. >> >> I'm not sure what this accomplishes. Would the dummy fields that fill >> in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple >> with no numpy.void scalars? That would be a novel feature, but I'm >> not >> >> sure it fits the problem. On the contrary: >> > It was thought in the past to do this with indexing >> > >> > arr['field1', 'field2'] >> > >> > And that would still be possible (and mostly implemented) if this >> > feature is added. >> >> This appears more like the interface that people want. Except that I >> think people were thinking that it would follow fancy indexing >> syntax: >> >> arr[['field1', 'field2']] > > I've thought about that too. That would be a great thing to have, IMO. > >> I guess there are two ways to implement this. One is to make a new >> array that just contains the desired fields. Another is to make a >> view that just points to the desired fields in the original array >> provided that we have a new feature for inaccessible dummy fields. >> One point for the former approach is that it is closer to fancy >> indexing which must always make a copy. The latter approach breaks >> that connection. > > Yeah. I'd vote for avoid the copy. > >> OTOH, now that I think about it, I don't think there is really any >> coherent way to mix field selection with any other indexing >> operations. At least, not within the same brackets. Hmm. So maybe the >> link to fancy indexing can be ignored as, ahem, fanciful. > > Well, one can always check that fields in the fancy list are either > strings (map to name fields) or integers (map to positional fields). > However, I'm not sure if this check would be too expensive. That's not my concern. The problem is that the field-indexing applies to the entire array, not just an axis. So what would the following mean? a[['foo', 'bar'], [1,2,3]] Compared to a[[5,8,10], [1,2,3]] >> Overall, I guess, I would present the feature slightly differently. >> Provide a kind of inaccessible and invisible dtype for implementing >> dummy fields. This is useful in other places like file parsing. At >> the same time, implement a function that uses this capability to make >> views with a subset of the fields of a structured array. I'm not sure >> that people need an API for replacing the fields of a dtype like >> this. 
> > Mmh, not sure on what you are proposing there. You mean something like:
> >
> > In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')])
> >
> > In [22]: nt = t.astype(['f2', 'f0'])
> >
> > In [23]: ra = numpy.zeros(10, dtype=t)
> >
> > In [24]: nra = ra.view(nt)
> >
> > In [25]: ra
> > Out[25]:
> > array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
> >        (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
> >        (0, 0.0, ''), (0, 0.0, '')],
> >       dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S20')])
> >
> > In [26]: nra
> > Out[26]:
> > array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0),
> >        ('', 0), ('', 0), ('', 0)],
> >       dtype=[('f2', '|S20'), ('f0', '<i4')])
> >
> > ?
> >
> > In that case, that would be a great feature to add.

That's what Travis is proposing. I would like to see a function that
does this (however it is implemented under the covers):

nra = subset_fields(ra, ['f0', 'f2'])

With the view, I don't think you can reorder the fields as in your
example.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

From stefan at sun.ac.za Thu Oct 30 11:36:44 2008
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Thu, 30 Oct 2008 17:36:44 +0200
Subject: [Numpy-discussion] Automatic differentiation (was Re: second-order gradient)
In-Reply-To: References: Message-ID: <9457e7c80810300836m2f5daebauef4de83f983a0999@mail.gmail.com>

2008/10/30 Rob Clewley :
>> http://en.wikipedia.org/wiki/Automatic_differentiation
>
> That would be great, but wouldn't that be best suited as a utility
> requiring Sympy? You'll want to take advantage of all sorts of
> symbolic classes, especially for any source code transformation
> approach. IMO Hinsen's implementation isn't a very efficient or
> attractive solution to AD given the great existing C/C++ codes out
> there. Maybe we should be looking to provide a python interface to an
> existing open source package such as ADOL-C, but I'm all in favour of
> a new pure python approach too. What would be perfect is to have a
> single interface to a python AD package that would support a faster
> implementation if the user wished to install a C/C++ package,
> otherwise would default to a pure python equivalent.

In your experience, is this functionality enough to start a separate
package, or should we try to include it somewhere else? Otherwise we
could think of a new SciKit.

Regards
Stéfan

From rob.clewley at gmail.com Thu Oct 30 12:42:29 2008
From: rob.clewley at gmail.com (Rob Clewley)
Date: Thu, 30 Oct 2008 12:42:29 -0400
Subject: [Numpy-discussion] Automatic differentiation (was Re: second-order gradient)
In-Reply-To: <9457e7c80810300836m2f5daebauef4de83f983a0999@mail.gmail.com>
References: <9457e7c80810300836m2f5daebauef4de83f983a0999@mail.gmail.com>
Message-ID:

> In your experience, is this functionality enough to start a separate
> package, or should we try to include it somewhere else? Otherwise we
> could think of a new SciKit.

I confess to knowing no details about scikits, so I don't know what the
difference really is between a "new package" and a scikit. To do this
properly you'd end up with a sizable body of code, and the potential
dependency on Sympy would also suggest making it somewhat separate. I'd
defer to others on that point, although I don't really see what other
package it would naturally fit with, because AD has multiple
applications.
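In the spirit of the single-interface idea from earlier in the thread,
a runnable sketch of the dispatch pattern; the `adolc` module name is
hypothetical, and the fallback here is a crude finite difference
standing in for where a pure-python AD implementation would go:

def _fallback_deriv(f, x, h=1e-6):
    # central finite difference -- placeholder for pure-python AD
    return (f(x + h) - f(x - h)) / (2 * h)

try:
    from adolc import deriv           # hypothetical compiled backend
except ImportError:
    deriv = _fallback_deriv           # always-available fallback

print deriv(lambda x: x**3, 2.0)      # ~12.0 either way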
-Rob From faltet at pytables.org Thu Oct 30 12:59:52 2008 From: faltet at pytables.org (Francesc Alted) Date: Thu, 30 Oct 2008 17:59:52 +0100 Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type In-Reply-To: <3d375d730810300826i2748369dx77579e85de96e0ac@mail.gmail.com> References: <4908FA5E.9090803@enthought.com> <200810301033.01417.faltet@pytables.org> <3d375d730810300826i2748369dx77579e85de96e0ac@mail.gmail.com> Message-ID: <200810301759.53344.faltet@pytables.org> A Thursday 30 October 2008, Robert Kern escrigu?: [clip] > >> OTOH, now that I think about it, I don't think there is really any > >> coherent way to mix field selection with any other indexing > >> operations. At least, not within the same brackets. Hmm. So maybe > >> the link to fancy indexing can be ignored as, ahem, fanciful. > > > > Well, one can always check that fields in the fancy list are either > > strings (map to name fields) or integers (map to positional > > fields). However, I'm not sure if this check would be too > > expensive. > > That's not my concern. The problem is that the field-indexing applies > to the entire array, not just an axis. So what would the following > mean? > > a[['foo', 'bar'], [1,2,3]] > > Compared to > > a[[5,8,10], [1,2,3]] Well, as I see them, fields are like another axis, just that it is always the leading one. In order to cope with them we could use a generalization of what it works already: In [15]: ra = numpy.zeros((3,4), "i4,f4") In [16]: ra['f1'][[1,2],[0,3]] # this already works Out[16]: array([ 0., 0.], dtype=float32) In [17]: ra[['f1','f2']][[1,2],[0,3]] # this could be make to work Out[17]: array([(0, 0.0), (0, 0.0)], dtype=[('f0', ' >> Overall, I guess, I would present the feature slightly > >> differently. Provide a kind of inaccessible and invisible dtype > >> for implementing dummy fields. This is useful in other places like > >> file parsing. At the same time, implement a function that uses > >> this capability to make views with a subset of the fields of a > >> structured array. I'm not sure that people need an API for > >> replacing the fields of a dtype like this. > > > > Mmh, not sure on what you are proposing there. You mean something > > like: > > > > In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')]) > > > > In [22]: nt = t.astype(['f2', 'f0']) > > > > In [23]: ra = numpy.zeros(10, dtype=t) > > > > In [24]: nra = ra.view(nt) > > > > In [25]: ra > > Out[25]: > > array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), > > (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), > > (0, 0.0, ''), (0, 0.0, '')], > > dtype=[('f0', ' > > > In [26]: nra > > Out[26]: > > array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', > > 0), ('', 0), ('', 0), ('', 0)], > > dtype=[('f2', '|S20'), ('f0', ' > > > ? > > > > In that case, that would be a great feature to add. > > That's what Travis is proposing. I would like to see a function that > does this (however it is implemented under the covers): > > nra = subset_fields(ra, ['f0', 'f2']) Interesting. > With the view, I don't think you can reorder the fields as in your > example. That's a pity. Providing a dtype with the notion of an internal reorder can be very powerful in some situations. But I guess that implementing this would be complicated. 
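A sketch of what such a subset_fields() could look like if the dtype
constructor honoured explicit 'offsets' and 'itemsize' keys (again an
assumed capability, not current API); because the offsets are explicit,
a reordered view like the nt example earlier in the thread would come
for free:

import numpy as np

def subset_fields(arr, names):
    """View `arr` through only the given fields, copying nothing."""
    dt = arr.dtype
    new_dt = np.dtype({'names': names,
                       'formats': [dt.fields[n][0] for n in names],
                       'offsets': [dt.fields[n][1] for n in names],
                       'itemsize': dt.itemsize})
    return arr.view(new_dt)

ra = np.zeros(10, dtype=[('f0', 'i4'), ('f1', 'f8'), ('f2', 'S20')])
nra = subset_fields(ra, ['f2', 'f0'])   # reordered view
print nra.dtype.names                   # ('f2', 'f0')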
-- Francesc Alted From bsouthey at gmail.com Thu Oct 30 13:50:49 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 30 Oct 2008 12:50:49 -0500 Subject: [Numpy-discussion] Adding the ability to "clone" a few fields from a data-type In-Reply-To: <200810301759.53344.faltet@pytables.org> References: <4908FA5E.9090803@enthought.com> <200810301033.01417.faltet@pytables.org> <3d375d730810300826i2748369dx77579e85de96e0ac@mail.gmail.com> <200810301759.53344.faltet@pytables.org> Message-ID: <4909F3F9.3030008@gmail.com> Francesc Alted wrote: > A Thursday 30 October 2008, Robert Kern escrigu?: > [clip] > >>>> OTOH, now that I think about it, I don't think there is really any >>>> coherent way to mix field selection with any other indexing >>>> operations. At least, not within the same brackets. Hmm. So maybe >>>> the link to fancy indexing can be ignored as, ahem, fanciful. >>>> >>> Well, one can always check that fields in the fancy list are either >>> strings (map to name fields) or integers (map to positional >>> fields). However, I'm not sure if this check would be too >>> expensive. >>> >> That's not my concern. The problem is that the field-indexing applies >> to the entire array, not just an axis. So what would the following >> mean? >> >> a[['foo', 'bar'], [1,2,3]] >> >> Compared to >> >> a[[5,8,10], [1,2,3]] >> > > Well, as I see them, fields are like another axis, just that it is > always the leading one. In order to cope with them we could use a > generalization of what it works already: > > In [15]: ra = numpy.zeros((3,4), "i4,f4") > > In [16]: ra['f1'][[1,2],[0,3]] # this already works > Out[16]: array([ 0., 0.], dtype=float32) > > In [17]: ra[['f1','f2']][[1,2],[0,3]] # this could be make to work > Out[17]: > array([(0, 0.0), (0, 0.0)], > dtype=[('f0', ' > >>>> Overall, I guess, I would present the feature slightly >>>> differently. Provide a kind of inaccessible and invisible dtype >>>> for implementing dummy fields. This is useful in other places like >>>> file parsing. At the same time, implement a function that uses >>>> this capability to make views with a subset of the fields of a >>>> structured array. I'm not sure that people need an API for >>>> replacing the fields of a dtype like this. >>>> >>> Mmh, not sure on what you are proposing there. You mean something >>> like: >>> >>> In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')]) >>> >>> In [22]: nt = t.astype(['f2', 'f0']) >>> >>> In [23]: ra = numpy.zeros(10, dtype=t) >>> >>> In [24]: nra = ra.view(nt) >>> >>> In [25]: ra >>> Out[25]: >>> array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), >>> (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), >>> (0, 0.0, ''), (0, 0.0, '')], >>> dtype=[('f0', '>> >>> In [26]: nra >>> Out[26]: >>> array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', >>> 0), ('', 0), ('', 0), ('', 0)], >>> dtype=[('f2', '|S20'), ('f0', '>> >>> ? >>> >>> In that case, that would be a great feature to add. >>> >> That's what Travis is proposing. I would like to see a function that >> does this (however it is implemented under the covers): >> >> nra = subset_fields(ra, ['f0', 'f2']) >> > > Interesting. > > >> With the view, I don't think you can reorder the fields as in your >> example. >> > > That's a pity. Providing a dtype with the notion of an internal reorder > can be very powerful in some situations. But I guess that implementing > this would be complicated. > > In general I agree with the idea but this starts sounding like R's data frames. 
So, is part of the goal to replicate some of the function of R's data frames? For example the extract function (http://rweb.stat.umn.edu/R/library/base/html/Extract.data.frame.html) (there is also the cookbook example of setting a name to null to remove it, see http://www.r-cookbook.com/node/50). Bruce From Chris.Barker at noaa.gov Thu Oct 30 14:00:31 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 30 Oct 2008 11:00:31 -0700 Subject: [Numpy-discussion] passing a C array to embedded Python from C code In-Reply-To: References: Message-ID: <4909F63F.6000201@noaa.gov> A few comments: Chris LeBlanc wrote: > I'm thinking I could create a > NumPy array that uses the same C array (1d, 2d, or 3d) that our > program is using instead of copying the memory. yes, you can do that. > At this point, I'm thinking the python code wouldn't need to return > any objects because it would be modifying the seismic data (and trace > headers) in memory. There are a number of operations in Python that would require copies of the data, and you may want to be able to create data in python, and process it with your C code, so you probably do want to support passing data arrays from Python to the C code as well. > Maybe calling a custom module from the Python code that does the C > array to NumPy translation using Cython/pyrex/swig/etc. Would it be > possible to use the same C arrays from here without copying them? yes, and you probably do want to use one of Cython, SWIG, or Ctypes -- it's really easier not to have to handle the reference counting, etc, yourself. There are a set of SWIG typemaps and docs in the numpy distribution (in the docs dir, I believe), and there are various Wiki pages describing how to do it with Cython and ctypes as well. How to choose? My thoughts on SWIG: SWIG is a pretty complex system, so you need to learn what is essentially yet another language. However, the numpy.i typemaps work well for the simple cases, so it may be easy to get started. The big advantage of SWIG is that it auto-generates the wrapper, once you have defined the typemaps you need. This gives a real advantage if you need to : - provide wrappers for more than one language (PERL, Java, etc..) - are wrapping a substantial library, particularly one that is under active development. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Oct 30 14:02:00 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 30 Oct 2008 11:02:00 -0700 Subject: [Numpy-discussion] passing a C array to embedded Python from C code In-Reply-To: References: Message-ID: <4909F698.8060609@noaa.gov> Matthieu Brucher wrote: > If you can > follow a French tutorial, you can go on > http://matthieu-brucher.developpez.com/tutoriels/python/swig-numpy/#LV > to have a skeletton for your issue. That looks very useful -- any chance of an English translation? My one year of high school French is proving useless. Otherwise, the code itself is still quite helpful. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthieu.brucher at gmail.com Thu Oct 30 14:17:52 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 30 Oct 2008 19:17:52 +0100 Subject: [Numpy-discussion] passing a C array to embedded Python from C code In-Reply-To: <4909F698.8060609@noaa.gov> References: <4909F698.8060609@noaa.gov> Message-ID: 2008/10/30 Christopher Barker : > Matthieu Brucher wrote: >> If you can >> follow a French tutorial, you can go on >> http://matthieu-brucher.developpez.com/tutoriels/python/swig-numpy/#LV >> to have a skeletton for your issue. > > That looks very useful -- any chance of an English translation? My one > year of high school French is proving useless. Otherwise, the code > itself is still quite helpful. > > -Chris I thought I put it on my blog, but no :| I may find one day the time to translate it and to improve/enhance it. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From anthony.floyd at convergent.ca Thu Oct 30 14:16:44 2008 From: anthony.floyd at convergent.ca (Anthony Floyd) Date: Thu, 30 Oct 2008 11:16:44 -0700 Subject: [Numpy-discussion] passing a C array to embedded Python fromC code Message-ID: <7EFBEC7FA86C1141B59B59EEAEE3294FBC38AE@EMAIL2.exchange.electric.net> Hi Chris, > Matthieu Brucher wrote: > > If you can > > follow a French tutorial, you can go on > > > http://matthieu-brucher.developpez.com/tutoriels/python/swig-numpy/#LV > > to have a skeletton for your issue. > > That looks very useful -- any chance of an English > translation? My one > year of high school French is proving useless. Otherwise, the code > itself is still quite helpful. > Google does a pretty good job on this one: http://translate.google.com/translate?u=http%3A%2F%2Fmatthieu-brucher.developpez.com%2Ftutoriels%2Fpython%2Fswig-numpy%2F%23LV&sl=fr&tl=en&hl=en&ie=UTF-8 Anthony. -- Anthony Floyd, PhD Convergent Manufacturing Technologies Inc. 6190 Agronomy Rd, Suite 403 Vancouver BC V6T 1Z3 CANADA Email: Anthony.Floyd at convergent.ca | Tel: 604-822-9682 x102 WWW: http://www.convergent.ca | Fax: 604-822-9659 CMT is hiring: See http://www.convergent.ca for details From charlesr.harris at gmail.com Thu Oct 30 14:41:51 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 30 Oct 2008 12:41:51 -0600 Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked In-Reply-To: <200810301119.00700.vandemeent@damtp.cam.ac.uk> References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> Message-ID: On Thu, Oct 30, 2008 at 5:19 AM, Jan-Willem van de Meent < vandemeent at damtp.cam.ac.uk> wrote: > Dear all, > > This is my first post to this list. I am having perfomance issues with with > numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix takes about 1m40s, even > though numpy is appears to link to my atlas libraries: > > I did a quick benchmark by running the following script: > > #! 
/usr/bin/env python > import numpy > import time > > try: > import numpy.core._dotblas > print 'Using ATLAS:' > except ImportError: > print 'No ATLAS:' > > t = time.time() > x = numpy.random.random((1000,1000)) > y = numpy.random.random((1000,1000)) > z = numpy.dot(x, y) > > print time.time()-t > > My laptop is a Dell D620 Core Duo T2300 1.66 Ghz, running Archlinux with > GCC > 4.3.2, atlas 3.8.2, python 2.5.2 and numpy 1.2.1. Output of the script > above > is: > > Using ATLAS: > 7.99549412727 > > A department desktop PC, Pentium D 3.00 Ghz, running Scientific Linux, with > GCC 4.1.2, atlas 3.7.30, python 2.5.1 and numpy 1.1.0, runs this test 24 > times faster: > > Using ATLAS: > 0.337520122528 > About .40 here with numpy from svn. > > So even though _dotblas.so exists, matrix multiplication appears to run at > pretty much the same speed as if atlas were not available. Running ldd on > _dotblas.so suggests that numpy is indeed linking to the atlas libs: > > ldd /usr/lib/python2.5/site-packages/numpy/core/_dotblas.so > linux-gate.so.1 => (0xb7fcf000) > libatlas.so => /usr/lib/libatlas.so (0xb7cb5000) > liblapack.so => /usr/lib/liblapack.so (0xb77ab000) > libcblas.so => /usr/lib/libcblas.so (0xb778b000) > libf77blas.so => /usr/lib/libf77blas.so (0xb776f000) > libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0xb7630000) > libpthread.so.0 => /lib/libpthread.so.0 (0xb7618000) > libc.so.6 => /lib/libc.so.6 (0xb74d6000) > libm.so.6 => /lib/libm.so.6 (0xb74b0000) > libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0xb73ff000) > libdl.so.2 => /lib/libdl.so.2 (0xb73fb000) > libutil.so.1 => /lib/libutil.so.1 (0xb73f6000) > /lib/ld-linux.so.2 (0xb7fd0000) > What's in /usr/local/lib? Do you have a 64 bit system? What does locate libatlas return? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dfranci at seas.upenn.edu Thu Oct 30 18:16:04 2008 From: dfranci at seas.upenn.edu (Frank Lagor) Date: Thu, 30 Oct 2008 17:16:04 -0500 Subject: [Numpy-discussion] Complete LAPACK needed Message-ID: <9fddf64a0810301516r7561a724t77456faa3d0e35ca@mail.gmail.com> Dear all, I need to use functions in scipy which depend on having a complete lapack library. However, I am having a bit of trouble installing numpy and referencing a complete lapack library that I built. I have a few questions that I am hoping someone can help me answer: 1) This machine is a cluster, so do I need to be sure that the lapack library is a .so file? (I think only if I want the processors to access the file at the same time, which is my need...) If so, how do I make a shared library file? 2) If I can tell numpy to use some default version of lapack, will it be a complete version? 3) I was unable to getting numpy installed by referencing the lapack.a library from the site.cfg file. I also tried to set the LAPACK environment variable in my .bashrc file, but this did not seem to work either. The problem is that when I configure numpy, it just does not recognize this file as a lapack library. It ignores it and tries to use an lapack lite library located somewhere else (which doesn't even work anyway). Any ideas? suggestions? Thanks in advance! Frank -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vandemeent at damtp.cam.ac.uk Thu Oct 30 18:19:01 2008 From: vandemeent at damtp.cam.ac.uk (Jan-Willem van de Meent) Date: Thu, 30 Oct 2008 22:19:01 +0000 Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked In-Reply-To: References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> Message-ID: <200810302219.01782.vandemeent@damtp.cam.ac.uk> On Thursday 30 October 2008 18:41:51 Charles R Harris wrote: > On Thu, Oct 30, 2008 at 5:19 AM, Jan-Willem van de Meent < > > vandemeent at damtp.cam.ac.uk> wrote: > > Dear all, > > > > This is my first post to this list. I am having perfomance issues with > > with numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix takes about > > 1m40s, even though numpy is appears to link to my atlas libraries: > > > > I did a quick benchmark by running the following script: > > > > #! /usr/bin/env python > > import numpy > > import time > > > > try: > > import numpy.core._dotblas > > print 'Using ATLAS:' > > except ImportError: > > print 'No ATLAS:' > > > > t = time.time() > > x = numpy.random.random((1000,1000)) > > y = numpy.random.random((1000,1000)) > > z = numpy.dot(x, y) > > > > print time.time()-t > > > > My laptop is a Dell D620 Core Duo T2300 1.66 Ghz, running Archlinux with > > GCC > > 4.3.2, atlas 3.8.2, python 2.5.2 and numpy 1.2.1. Output of the script > > above > > is: > > > > Using ATLAS: > > 7.99549412727 > > > > A department desktop PC, Pentium D 3.00 Ghz, running Scientific Linux, > > with GCC 4.1.2, atlas 3.7.30, python 2.5.1 and numpy 1.1.0, runs this > > test 24 times faster: > > > > Using ATLAS: > > 0.337520122528 > > About .40 here with numpy from svn. > > > So even though _dotblas.so exists, matrix multiplication appears to run > > at pretty much the same speed as if atlas were not available. Running > > ldd on _dotblas.so suggests that numpy is indeed linking to the atlas > > libs: > > > > ldd /usr/lib/python2.5/site-packages/numpy/core/_dotblas.so > > linux-gate.so.1 => (0xb7fcf000) > > libatlas.so => /usr/lib/libatlas.so (0xb7cb5000) > > liblapack.so => /usr/lib/liblapack.so (0xb77ab000) > > libcblas.so => /usr/lib/libcblas.so (0xb778b000) > > libf77blas.so => /usr/lib/libf77blas.so (0xb776f000) > > libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0xb7630000) > > libpthread.so.0 => /lib/libpthread.so.0 (0xb7618000) > > libc.so.6 => /lib/libc.so.6 (0xb74d6000) > > libm.so.6 => /lib/libm.so.6 (0xb74b0000) > > libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0xb73ff000) > > libdl.so.2 => /lib/libdl.so.2 (0xb73fb000) > > libutil.so.1 => /lib/libutil.so.1 (0xb73f6000) > > /lib/ld-linux.so.2 (0xb7fd0000) > > What's in /usr/local/lib? Do you have a 64 bit system? What does locate > libatlas return? > > Chuck Thanks for the response. I'm on 32 bit and all ATLAS files are installed in /usr/lib (headers are in /usr/include/atlas) /usr/lib/libatlas.a /usr/lib/libatlas.so /usr/lib/libblas.so /usr/lib/libblas.so.3 /usr/lib/libblas.so.3.0.3 /usr/lib/libcblas.a /usr/lib/libcblas.so /usr/lib/libf77blas.a /usr/lib/libf77blas.so /usr/lib/liblapack.a /usr/lib/liblapack.so /usr/lib/liblapack.so.3 /usr/lib/libptcblas.a /usr/lib/libptf77blas.a I posted a full build log earlier today, but I think it is still awaiting approval of moderators because of file size limitations. 
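For anyone following along, two quick sanity checks that don't require
reading a build log; numpy.__config__ has been around for a while, but
treat the exact output format as version-dependent:

import numpy
numpy.__config__.show()        # which BLAS/LAPACK numpy was built against

# If this import fails, numpy has no ATLAS-accelerated dot at all:
import numpy.core._dotblas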
JW From robert.kern at gmail.com Thu Oct 30 18:32:22 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 30 Oct 2008 17:32:22 -0500 Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked In-Reply-To: <200810302219.01782.vandemeent@damtp.cam.ac.uk> References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> <200810302219.01782.vandemeent@damtp.cam.ac.uk> Message-ID: <3d375d730810301532u2823218ak29f1540e90f3c15@mail.gmail.com> On Thu, Oct 30, 2008 at 17:19, Jan-Willem van de Meent wrote: > I posted a full build log earlier today, but I think it is still awaiting > approval of moderators because of file size limitations. There are no moderators to approve it. The file size limitation is just a limitation. If you have web space to place the log, you can give us a URL. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vandemeent at damtp.cam.ac.uk Thu Oct 30 18:39:45 2008 From: vandemeent at damtp.cam.ac.uk (Jan-Willem van de Meent) Date: Thu, 30 Oct 2008 22:39:45 +0000 Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked In-Reply-To: <3d375d730810301532u2823218ak29f1540e90f3c15@mail.gmail.com> References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> <200810302219.01782.vandemeent@damtp.cam.ac.uk> <3d375d730810301532u2823218ak29f1540e90f3c15@mail.gmail.com> Message-ID: <200810302239.46008.vandemeent@damtp.cam.ac.uk> On Thursday 30 October 2008 22:32:22 Robert Kern wrote: > On Thu, Oct 30, 2008 at 17:19, Jan-Willem van de Meent > wrote: > > I posted a full build log earlier today, but I think it is still awaiting > > approval of moderators because of file size limitations. > > There are no moderators to approve it. The file size limitation is > just a limitation. If you have web space to place the log, you can > give us a URL. Thanks for the heads up. The compilation log can be found here: http://www.damtp.cam.ac.uk/user/meent/numpy-build-1.2.1-1jw.log JW From rgiuly at gmail.com Thu Oct 30 21:41:44 2008 From: rgiuly at gmail.com (Rick Giuly) Date: Thu, 30 Oct 2008 18:41:44 -0700 Subject: [Numpy-discussion] Large array using 4 times as much memory as it should Message-ID: <490A6258.3050904@gmail.com> Hello All, I find that python is using about four times as much memory as it should need for arrays. This is problematic as I need to use all available memory for large 3D imaging datasets. Is there a way to get around this problem? Am I making a mistake? Is it a bug? (I'm running windowsXP 32bit with 760M of memory "Available" according to the "Performance" pane of the task manager.) versions: numpy 1.2.0 with python 2.5.2 Any help is appreciated -Rick ************************** Details of my testing: Each test was run from the command line and for each test python was restarted. 
Testing a 50M array: a = numpy.ones((1024,1024,50), dtype=numpy.uint32) The available memory dropped by 200M Testing a 100M array: a = numpy.ones((1024,1024,100), dtype=numpy.uint32) The available memory dropped by 400M Testing a 200M array: a = numpy.ones((1024,1024,200), dtype=numpy.uint32) The available memory dropped by 750M Testing a 300M array: a = numpy.ones((1024,1024,300), dtype=numpy.uint32) an error occurs: Traceback (most recent call last): File "", line 1, in File "o:\software\pythonxy\python\lib\site-packages\numpy\core\numeric.py", li ne 1445, in ones a = empty(shape, dtype, order) MemoryError From robert.kern at gmail.com Thu Oct 30 21:46:30 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 30 Oct 2008 20:46:30 -0500 Subject: [Numpy-discussion] Large array using 4 times as much memory as it should In-Reply-To: <490A6258.3050904@gmail.com> References: <490A6258.3050904@gmail.com> Message-ID: <3d375d730810301846t3d34d1f9lce07c51653151139@mail.gmail.com> On Thu, Oct 30, 2008 at 20:41, Rick Giuly wrote: > Hello All, > > I find that python is using about four times as much memory as it should > need for arrays. This is problematic as I need to use all available > memory for large 3D imaging datasets. Is there a way to get around this > problem? Am I making a mistake? Is it a bug? You are making a mistake. uint32s take up 4 bytes each. So (1024,1024,50) takes up 4*50*1024*1024 bytes == 200 Mb. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rgiuly at gmail.com Thu Oct 30 21:48:01 2008 From: rgiuly at gmail.com (Rick Giuly) Date: Thu, 30 Oct 2008 18:48:01 -0700 Subject: [Numpy-discussion] [Fwd: Large array using 4 times as much memory as it should] Message-ID: <490A63D1.1060106@gmail.com> Please disregard the last message. I just realised 8*4=32, very sorry about this. Apparently I need some sleep. -Rick -------- Original Message -------- Subject: Large array using 4 times as much memory as it should Date: Thu, 30 Oct 2008 18:41:44 -0700 From: Rick Giuly To: numpy-discussion at scipy.org Hello All, I find that python is using about four times as much memory as it should need for arrays. This is problematic as I need to use all available memory for large 3D imaging datasets. Is there a way to get around this problem? Am I making a mistake? Is it a bug? (I'm running windowsXP 32bit with 760M of memory "Available" according to the "Performance" pane of the task manager.) versions: numpy 1.2.0 with python 2.5.2 Any help is appreciated -Rick ************************** Details of my testing: Each test was run from the command line and for each test python was restarted. 
Testing a 50M array: a = numpy.ones((1024,1024,50), dtype=numpy.uint32) The available memory dropped by 200M Testing a 100M array: a = numpy.ones((1024,1024,100), dtype=numpy.uint32) The available memory dropped by 400M Testing a 200M array: a = numpy.ones((1024,1024,200), dtype=numpy.uint32) The available memory dropped by 750M Testing a 300M array: a = numpy.ones((1024,1024,300), dtype=numpy.uint32) an error occurs: Traceback (most recent call last): File "", line 1, in File "o:\software\pythonxy\python\lib\site-packages\numpy\core\numeric.py", li ne 1445, in ones a = empty(shape, dtype, order) MemoryError From fperez.net at gmail.com Thu Oct 30 21:57:25 2008 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 30 Oct 2008 18:57:25 -0700 Subject: [Numpy-discussion] [Fwd: Large array using 4 times as much memory as it should] In-Reply-To: <490A63D1.1060106@gmail.com> References: <490A63D1.1060106@gmail.com> Message-ID: On Thu, Oct 30, 2008 at 6:48 PM, Rick Giuly wrote: > Please disregard the last message. > > I just realised 8*4=32, very sorry about this. Apparently I need some sleep. > > -Rick > > -------- Original Message -------- > Subject: Large array using 4 times as much memory as it should > Date: Thu, 30 Oct 2008 18:41:44 -0700 > From: Rick Giuly > To: numpy-discussion at scipy.org > > Hello All, > > I find that python is using about four times as much memory as it should You may find this handy for looking at your variable's memory use: In [12]: a = numpy.ones((1024,1024,50), dtype=numpy.uint32) In [13]: print 'approx size in Mb:',(a.size*a.itemsize)/1e6 approx size in Mb: 209.7152 numpy.who() gives you a nice summary: In [14]: numpy.who() Name Shape Bytes Type ================================================================= a 1024 x 1024 x 50 209715200 uint32 Upper bound on total bytes = 209715200 Cheers, f From alex.lacoste.ml at gmail.com Thu Oct 30 23:14:34 2008 From: alex.lacoste.ml at gmail.com (Alexandre Lacoste) Date: Thu, 30 Oct 2008 23:14:34 -0400 Subject: [Numpy-discussion] unable to build atlas and lapack correctly Message-ID: Hi I'm trying to install numpy on a x86_64 ubuntu hardy and I'm having a hard time to get atlas to be linked correctly. Installing it through apt-get install with the astraw repository worked on my intel dual core. But now I need to install it in on a x86_64 and the astraw binaries will link against libatlas3gf-base instead of libatlas3gf-3dnow because the 3dnow version simply doesn't exist for x86_64 arch. So I still get slow matrix multiplications ... Then I went through the instructions for building atlas by hand. Then I build numpy. Everything went ok... Now when comes the time to build scipy, I get the following error : from numpy.linalg import lapack_lite ImportError: /usr/local/atlas/lib/liblapack.so: undefined symbol: ztbsv_ $ uname -a Linux graal1 2.6.24-21-generic #1 SMP Mon Aug 25 16:57:51 UTC 2008 x86_64 GNU/Linux $ ldd liblapack.so linux-vdso.so.1 => (0x00007fffe29fe000) libgfortran.so.2 => /usr/lib/libgfortran.so.2 (0x00007febd9ee3000) libm.so.6 => /lib/libm.so.6 (0x00007febd9c62000) libc.so.6 => /lib/libc.so.6 (0x00007febd98ff000) /lib64/ld-linux-x86-64.so.2 (0x00007febda93b000) I'm getting clueless and googling for the error message return noting useful any thoughts ? thanks... ???? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david at ar.media.kyoto-u.ac.jp Fri Oct 31 00:49:00 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 13:49:00 +0900 Subject: [Numpy-discussion] unable to build atlas and lapack correctly In-Reply-To: References: Message-ID: <490A8E3C.8070005@ar.media.kyoto-u.ac.jp> Alexandre Lacoste wrote: > Hi I'm trying to install numpy on a x86_64 ubuntu hardy and I'm having > a hard time to get atlas to be linked correctly. > > Installing it through apt-get install with the astraw repository > worked on my intel dual core. But now I need to install it in on a > x86_64 and the astraw binaries will link against libatlas3gf-base > instead of libatlas3gf-3dnow because the 3dnow version simply doesn't > exist for x86_64 arch. So I still get slow matrix multiplications ... You can simply use the sse2 version of atlas: SSE2 is available on any x86_64 CPU. Note that on Ubuntu, you can link to libatals3fg-base (the gfortran, basic version of ATLAS), and after building numpy, installing atlas for SSE2 and still get the speed up, thanks to the hwcap capability of the GNU loader. The hwcap means that when the loaded code (numpy) request a library (atlas), it will first look for a specially optimized one in some directory (/usr/lib/sse2, for example), and will use the default one if the optimized one is not there. > > Then I went through the instructions for building atlas by hand. Then > I build numpy. Everything went ok... Now when comes the time to build > scipy, I get the following error : > > from numpy.linalg import lapack_lite > ImportError: /usr/local/atlas/lib/ > liblapack.so: undefined symbol: ztbsv_ You made a mistake when building lapack and/or atlas. Those are difficult to build correctly, you should avoid build them by yourself as much as possible, cheers, David From f.yw at hotmail.com Fri Oct 31 01:25:49 2008 From: f.yw at hotmail.com (frank wang) Date: Thu, 30 Oct 2008 23:25:49 -0600 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: Hi, In my work, I want to implement a fir filter with an input array. Since performing the filter on each input sample is slow, are there fast way to perform the fir filter operation? Are there ways to convert input into an array and perform the array multipication? Thanks Frank _________________________________________________________________ When your life is on the go?take your life with you. http://clk.atdmt.com/MRT/go/115298558/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Fri Oct 31 01:16:33 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 14:16:33 +0900 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: <490A94B1.1010107@ar.media.kyoto-u.ac.jp> frank wang wrote: > Hi, > > In my work, I want to implement a fir filter with an input array. > Since performing the filter on each input sample is slow, are there > fast way to perform the fir filter operation? Are there ways to > convert input into an array and perform the array multipication? 
look at scipy.signal.lfilter, David From alex.lacoste.ml at gmail.com Fri Oct 31 01:34:53 2008 From: alex.lacoste.ml at gmail.com (Alexandre Lacoste) Date: Fri, 31 Oct 2008 01:34:53 -0400 Subject: [Numpy-discussion] unable to build atlas and lapack correctly In-Reply-To: References: <490A8E3C.8070005@ar.media.kyoto-u.ac.jp> Message-ID: Thanks for your answer > You can simply use the sse2 version of atlas: SSE2 is available on any > x86_64 CPU. > Are you sure ? I can't find it... http://packages.ubuntu.com/search?arch=amd64&searchon=names&keywords=atlas%20sse2 > > Note that on Ubuntu, you can link to libatals3fg-base (the gfortran, > basic version of ATLAS), and after building numpy, installing atlas for > SSE2 and still get the speed up, thanks to the hwcap capability of the > GNU loader. The hwcap means that when the loaded code (numpy) request a > library (atlas), it will first look for a specially optimized one in > some directory (/usr/lib/sse2, for example), and will use the default > one if the optimized one is not there. > That's what I tried in the first place but I must have done something wrong on that part too ;) > You made a mistake when building lapack and/or atlas. Those are > difficult to build correctly, you should avoid build them by yourself as > much as possible, > I did it a couple of time in the past ... even on 64 bits machines... I guess I was lucky back then. I really need to get it work. I'll give it a couple of other tries. It is only annoying to have to wait half an hour each time. If you can give me a couple of other hints, maybe I'll target the problem faster. Thanks again ???? -------------- next part -------------- An HTML attachment was scrubbed... URL: From f.yw at hotmail.com Fri Oct 31 01:44:32 2008 From: f.yw at hotmail.com (frank wang) Date: Thu, 30 Oct 2008 23:44:32 -0600 Subject: [Numpy-discussion] help to speed up the python code In-Reply-To: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com> References: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com> Message-ID: Hi, Bob, The problem is that I want to resample my data with another sampling rate. the two rates is very close. I use the formula: s(t)=sum(a_k*sinc(t-kTs)). the new sampling rate is Ts', so I have s(nTs')=sum(a_k*sinc(nTs'-kTs)). The sum index k is over the (-P, P), Centered at n. The n is start from zero. THe code is using two for loops and it is slow. The length of s(nTs) is very long, so it takes quite long time to do it. Thanks Frank> Date: Sun, 26 Oct 2008 22:45:42 -0500> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] help to speed up the python code> > On Fri, Oct 24, 2008 at 11:30, frank wang wrote:> > Hi,> >> > I have to send this request second time since my first message contains the> > attached data file which is too big and was blocked by the system. So this> > time I will not attach the data file.> >> > I have converted a matlab function to python using numpy. both matlab and> > python run slow. I know that numpy has a lot of features, so I hope some> > experts can help me to speed up the code.> > Can you describe in higher level terms what you are trying to do? 
I'm> having trouble following the code.> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Stay organized with simple drag and drop from Windows Live Hotmail. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_102008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Fri Oct 31 01:33:43 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 14:33:43 +0900 Subject: [Numpy-discussion] help to speed up the python code In-Reply-To: References: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com> Message-ID: <490A98B7.4080108@ar.media.kyoto-u.ac.jp> frank wang wrote: > Hi, Bob, > > The problem is that I want to resample my data with another sampling > rate. the two rates is very close. I use the formula: > > s(t)=sum(a_k*sinc(t-kTs)). > > the new sampling rate is Ts', so I have > s(nTs')=sum(a_k*sinc(nTs'-kTs)). The sum index k is over the (-P, P), > Centered at n. The n is start from zero. THe code is using two for > loops and it is slow. The length of s(nTs) is very long, so it takes > quite long time to do it. If you want to do high quality resampling, you may want to look at the samplerate scikits. It uses sinc interpolation for resampling (the scikits is just a wrapper around SRC: http://www.mega-nerd.com/SRC/, if you want more details on the implementation) The documentation is really light unfortunately, but there is an example: http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/samplerate/index.html The implementation of SRC is in C, and you can choose between 3 modes for quality (plus other interpolations like linear, but those are ill suited for anything serious if you can afford the computational cost), cheers, David From david at ar.media.kyoto-u.ac.jp Fri Oct 31 01:40:04 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 14:40:04 +0900 Subject: [Numpy-discussion] help to speed up the python code In-Reply-To: References: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com> Message-ID: <490A9A34.8010308@ar.media.kyoto-u.ac.jp> frank wang wrote: > Hi, Bob, > > The problem is that I want to resample my data with another sampling > rate. the two rates is very close. I use the formula: > > s(t)=sum(a_k*sinc(t-kTs)). > > the new sampling rate is Ts', so I have > s(nTs')=sum(a_k*sinc(nTs'-kTs)). The sum index k is over the (-P, P), > Centered at n. The n is start from zero. THe code is using two for > loops and it is slow. The length of s(nTs) is very long, so it takes > quite long time to do it. Additionally, assuming you still want to do it yourself in python, the above is a convolution, and both numpy and matlab have 'fast' (e.g. implemented in C) implementation of this; see numpy.convolve. 
You can have much faster speed than O(M * P) where M is the length of the original signal using FFT, but if either the sinc 'filter' or your signal is large, it is more difficult to do it without using large amount of memory (you have to split the input signal, and even the filter impulse response if it is large, and use overlapp-kind of techniques, see http://ccrma.stanford.edu/~jos/sasp/Example_Overlap_Add_Convolution.html for an example in matlab). cheers, David From shuntim.luk at polyu.edu.hk Fri Oct 31 02:31:38 2008 From: shuntim.luk at polyu.edu.hk (LUK ShunTim) Date: Fri, 31 Oct 2008 14:31:38 +0800 Subject: [Numpy-discussion] unable to build atlas and lapack correctly In-Reply-To: References: <490A8E3C.8070005@ar.media.kyoto-u.ac.jp> Message-ID: <490AA64A.9090703@polyu.edu.hk> Alexandre Lacoste wrote: > > Thanks for your answer > > > You can simply use the sse2 version of atlas: SSE2 is available on any > x86_64 CPU. > > > Are you sure ? I can't find it... > http://packages.ubuntu.com/search?arch=amd64&searchon=names&keywords=atlas%20sse2 > Hi, Try "cat /proc/cpuinfo |grep sse2" :-) In my debian/sid/amd64 box, there's only libatlas3gf now. I guess it's the same for ubuntu. Regards, ST -- From alex.lacoste.ml at gmail.com Fri Oct 31 02:59:25 2008 From: alex.lacoste.ml at gmail.com (Alexandre Lacoste) Date: Fri, 31 Oct 2008 02:59:25 -0400 Subject: [Numpy-discussion] unable to build atlas and lapack correctly In-Reply-To: <490AA64A.9090703@polyu.edu.hk> References: <490A8E3C.8070005@ar.media.kyoto-u.ac.jp> <490AA64A.9090703@polyu.edu.hk> Message-ID: > > Hi, > > Try "cat /proc/cpuinfo |grep sse2" :-) > > In my debian/sid/amd64 box, there's only libatlas3gf now. I guess it's > the same for ubuntu. > > Yes the cpu supports sse2 instructions. However, I can't find any repository that have libatlas3gf-sse2 built for x86_64. Should i try to force an install of a i386 deb. How can I do that? will it work ? Regards, ???? -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Fri Oct 31 02:51:04 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 15:51:04 +0900 Subject: [Numpy-discussion] newaxis and contiguity Message-ID: <490AAAD8.7030909@ar.media.kyoto-u.ac.jp> Hi, I was wondering about the following behavior: a = np.random.randn(10) b = a[:. np.newaxis] print b.flags # F_CONTIGUOUS and C_CONTIGUOUS are both false Is it for simplicity purpose, or is there a more fundamental reason why a rank 2, 'column' vector can't be F_CONTIGUOUS in this case ? Does the operation involving newaxis modifies the data buffer ? thanks, David From david at ar.media.kyoto-u.ac.jp Fri Oct 31 02:52:40 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 15:52:40 +0900 Subject: [Numpy-discussion] unable to build atlas and lapack correctly In-Reply-To: References: <490A8E3C.8070005@ar.media.kyoto-u.ac.jp> <490AA64A.9090703@polyu.edu.hk> Message-ID: <490AAB38.9060805@ar.media.kyoto-u.ac.jp> Alexandre Lacoste wrote: > Yes the cpu supports sse2 instructions. However, I can't find any > repository that have libatlas3gf-sse2 built for x86_64. Should i try > to force an install of a i386 deb. How can I do that? will it work ? No, it won't, and no, don't force it. I doubt dpkg would allow it, but if it did, it would mess up your system. 
Installing i386 binary on the x86_64 to use it at runtime has as much sense as putting PPC or SPARC binary and hoping it to work (at least on linux), cheers, David From david at ar.media.kyoto-u.ac.jp Fri Oct 31 06:25:31 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 31 Oct 2008 19:25:31 +0900 Subject: [Numpy-discussion] Simplifying compiler optimization flags logic (fortran compilers) Message-ID: <490ADD1B.9050207@ar.media.kyoto-u.ac.jp> Hi, I was wondering whether it was really worth having a lot of magic going on in fcompilers for flags like -msse2 and co (everything done in get_flags_arch, for example). It is quite fragile (we had several problems wrt buggy compilers, buggy CPU detection), and I am not sure it buys us much anyway. Did some people notice a difference between gfortran -O3 -msse2 and gfortran -O3 ? thanks, David From faltet at pytables.org Fri Oct 31 09:43:19 2008 From: faltet at pytables.org (Francesc Alted) Date: Fri, 31 Oct 2008 14:43:19 +0100 Subject: [Numpy-discussion] ANN: PyTables 2.1rc1 is ready to be tested Message-ID: <200810311443.20172.faltet@pytables.org> ============================ Announcing PyTables 2.1rc1 ============================ PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. In PyTables 2.1rc1 many new features and a handful of bugs have been addressed. This is a release candidate, so, in addition to the tarball, binaries for Windows are provided too. Also, the API has been frozen and you should only expect bug fixes and documentation improvements for 2.1 final (due to release in a couple of weeks now). This version introduces important improvements, like much faster node opening, creation or navigation, a file-based way to fine-tune the different PyTables parameters (fully documented now in a new appendix of the UG) and support for multidimensional atoms in EArray/CArray objects. Regarding the Pro edition, 3 different kind of indexes have been added so that the user can choose the best for her needs. Also, and due to the introduction of the concept of chunkmaps in OPSI, the responsiveness of complex queries with low selectivity has improved quite a lot. And last but not least, it is possible now to sort completely tables that are ordered by a specific field, with no practical limit in size (up to 2**48 rows, that is, around 281 trillion of rows). More info in: http://www.pytables.org/moin/PyTablesPro#WhatisnewinforthcomingPyTablesPro2.1 In case you want to know more in detail what has changed in this version, have a look at ``RELEASE_NOTES.txt`` in the tarball. 
From david at ar.media.kyoto-u.ac.jp Fri Oct 31 02:52:40 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 31 Oct 2008 15:52:40 +0900
Subject: [Numpy-discussion] unable to build atlas and lapack correctly
In-Reply-To: References: <490A8E3C.8070005@ar.media.kyoto-u.ac.jp> <490AA64A.9090703@polyu.edu.hk>
Message-ID: <490AAB38.9060805@ar.media.kyoto-u.ac.jp>

Alexandre Lacoste wrote:
> Yes, the CPU supports SSE2 instructions. However, I can't find any
> repository that has libatlas3gf-sse2 built for x86_64. Should I try to
> force an install of an i386 deb? How can I do that? Will it work?

No, it won't, and no, don't force it. I doubt dpkg would allow it, but
if it did, it would mess up your system. Installing an i386 binary on
x86_64 and using it at runtime makes about as much sense as installing
a PPC or SPARC binary and hoping it works (at least on Linux).

cheers,

David

From david at ar.media.kyoto-u.ac.jp Fri Oct 31 06:25:31 2008
From: david at ar.media.kyoto-u.ac.jp (David Cournapeau)
Date: Fri, 31 Oct 2008 19:25:31 +0900
Subject: [Numpy-discussion] Simplifying compiler optimization flags logic (fortran compilers)
Message-ID: <490ADD1B.9050207@ar.media.kyoto-u.ac.jp>

Hi,

I was wondering whether it is really worth having so much magic going on
in fcompilers for flags like -msse2 and co (everything done in
get_flags_arch, for example). It is quite fragile (we have had several
problems wrt buggy compilers and buggy CPU detection), and I am not sure
it buys us much anyway. Has anyone noticed a difference between
gfortran -O3 -msse2 and gfortran -O3?

thanks,

David

From faltet at pytables.org Fri Oct 31 09:43:19 2008
From: faltet at pytables.org (Francesc Alted)
Date: Fri, 31 Oct 2008 14:43:19 +0100
Subject: [Numpy-discussion] ANN: PyTables 2.1rc1 is ready to be tested
Message-ID: <200810311443.20172.faltet@pytables.org>

============================
Announcing PyTables 2.1rc1
============================

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data, with support for
full 64-bit file addressing. PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.

PyTables 2.1rc1 adds many new features and addresses a handful of bugs.
This is a release candidate, so, in addition to the tarball, binaries
for Windows are provided too. Also, the API has been frozen, and you
should expect only bug fixes and documentation improvements for 2.1
final (due for release in a couple of weeks).

This version introduces important improvements, like much faster node
opening, creation and navigation, a file-based way to fine-tune the
different PyTables parameters (now fully documented in a new appendix
of the UG), and support for multidimensional atoms in EArray/CArray
objects.

Regarding the Pro edition, three different kinds of indexes have been
added so that the user can choose the best one for her needs. Also, due
to the introduction of the concept of chunkmaps in OPSI, the
responsiveness of complex queries with low selectivity has improved
quite a lot. And last but not least, it is now possible to completely
sort tables by a specific field, with no practical limit in size (up to
2**48 rows, that is, around 281 trillion rows).

More info in:
http://www.pytables.org/moin/PyTablesPro#WhatisnewinforthcomingPyTablesPro2.1

In case you want to know in more detail what has changed in this
version, have a look at ``RELEASE_NOTES.txt`` in the tarball. Find the
HTML version of this document at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.1rc1

You can download a source package of version 2.1rc1 with generated PDF
and HTML docs, and binaries for Windows, from:
http://www.pytables.org/download/preliminary

Finally, and for the first time, an evaluation version of PyTables Pro
has been made available at:
http://www.pytables.org/download/evaluation

Please read the evaluation license for the terms of use of this
version:
http://www.pytables.org/moin/PyTablesProEvaluationLicense

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.1rc1

Resources
=========

Go to the PyTables web site for more details:
http://www.pytables.org

About the HDF5 library:
http://hdfgroup.org/HDF5/

About NumPy:
http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches,
bug reports, support and suggestions. See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors. Many
thanks also to SourceForge, who have helped to make and distribute this
package! And last but not least, thanks a lot to the HDF5 and NumPy
(and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team
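For readers new to PyTables, a minimal session in the style of the 2.x API
may help place the announcement. This sketch is purely illustrative (the
file name, field names and query are made up) and is not part of the
announcement itself:

import numpy as np
import tables

# Create a file with one array and one table (PyTables 2.x naming).
h5 = tables.openFile('demo.h5', mode='w')
h5.createArray(h5.root, 'measurements', np.random.randn(1000),
               title='example data')

class Particle(tables.IsDescription):
    name = tables.StringCol(16)
    energy = tables.Float64Col()

table = h5.createTable(h5.root, 'particles', Particle)
row = table.row
for i in range(10):
    row['name'] = 'p%d' % i
    row['energy'] = float(i) ** 2
    row.append()
table.flush()

# An in-kernel query -- the kind of selection OPSI indexes accelerate.
hits = [r['name'] for r in table.where('energy > 25')]
h5.close()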
From pav at iki.fi Fri Oct 31 09:45:56 2008
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 31 Oct 2008 13:45:56 +0000 (UTC)
Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked
References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> <200810302219.01782.vandemeent@damtp.cam.ac.uk>
Message-ID:

Thu, 30 Oct 2008 22:19:01 +0000, Jan-Willem van de Meent wrote:
> On Thursday 30 October 2008 18:41:51 Charles R Harris wrote:
>> On Thu, Oct 30, 2008 at 5:19 AM, Jan-Willem van de Meent
>> <vandemeent at damtp.cam.ac.uk> wrote:
>> > Dear all,
>> >
>> > This is my first post to this list. I am having performance issues
>> > with numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix takes about
>> > 1m40s, even though numpy appears to link to my atlas libraries:

Can you try to benchmark your ATLAS library using a simple C or Fortran
program, to check whether the problem is in Numpy or in Atlas itself?

For comparison:

gfortran -o test test.f90 -lblas

time ./test                                    # ATLAS
-> 0.55 s

LD_PRELOAD=/usr/lib/libblas.so.3.0 time ./test # reference BLAS
-> 5.6 s

test.f90
--------
program main
    integer, parameter :: n = 1000
    double precision, dimension(n,n) :: a, b, c
    integer :: i, j

    do i = 1, n
        do j = 1, n
            a(i,j) = i+j
            b(i,j) = i-j
        end do
    end do

    call dgemm('N', 'N', n, n, n, 1d0, a, n, b, n, 0d0, c, n)
end program main
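The same kind of sanity check can be made from the NumPy side. A small
illustrative timing sketch (matrix size arbitrary, not from the thread):

import time
import numpy as np

n = 1000
a = np.random.randn(n, n)

t0 = time.time()
c = np.dot(a, a)
dt = time.time() - t0

# A square matrix product costs about 2*n**3 floating point operations.
print 'dot(a, a): %.2f s, about %.0f MFLOPS' % (dt, 2.0 * n**3 / dt / 1e6)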
From soren.skou.nielsen at gmail.com Fri Oct 31 11:28:38 2008
From: soren.skou.nielsen at gmail.com (Søren Nielsen)
Date: Fri, 31 Oct 2008 16:28:38 +0100
Subject: [Numpy-discussion] indexing numpy arrays in extensions created with ext_tools
Message-ID:

Hi,

I'm trying to make a weave python extension to use in my program. I
already did it with inline, but that doesn't work with py2exe (it needs
a compiler), so I'm creating extensions instead using ext_tools.

Is there a way I can use blitz with ext_tools, so that I can refer to
numpy arrays like a(x,y) in the C code? I have a lot of 2D arrays that
need indexing in the C code. Right now I can only index one dimension,
since the first index is always treated as zero for some reason: if I
write a[1,1] in the C code, I still get a[0,1] back.

Any help is appreciated!

Thanks,
Soren

Some silly test code:
-----------------------------------------
mod = ext_tools.ext_module('test_ext')

a = zeros((2,2))
x = 1
y = 0

test_code = """
py::tuple args(1);

if(a[x,y] == 1)
{
    args[0] = 1;
}
else
{
    args[0] = 0;
}

args[0] = a[x,y];
return_val = args;
"""

ravg = ext_tools.ext_function('test', test_code, ['a', 'x', 'y'])
mod.add_function(ravg)
mod.compile()
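One likely culprit: in C and C++, the expression a[x,y] applies the comma
operator, so it reduces to a[y] and silently discards x, which is consistent
with getting a[0,1] above. With weave's blitz type converters, 2-D arrays
are indexed with parentheses instead. Below is a hedged sketch, assuming
ext_tools.ext_function accepts the same type_converters argument as
weave.inline -- worth verifying against the scipy.weave sources:

from numpy import zeros
from scipy.weave import ext_tools, converters

mod = ext_tools.ext_module('test_ext')

a = zeros((2, 2))
x = 1
y = 0

test_code = """
// With blitz converters, 2-D indexing uses parentheses, not brackets;
// a[x,y] in C invokes the comma operator and reduces to a[y].
return_val = a(x, y);
"""

func = ext_tools.ext_function('test', test_code, ['a', 'x', 'y'],
                              type_converters=converters.blitz)
mod.add_function(func)
mod.compile()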
From charlesr.harris at gmail.com Fri Oct 31 12:16:27 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 31 Oct 2008 10:16:27 -0600
Subject: [Numpy-discussion] newaxis and contiguity
In-Reply-To: <490AAAD8.7030909@ar.media.kyoto-u.ac.jp>
References: <490AAAD8.7030909@ar.media.kyoto-u.ac.jp>
Message-ID:

On Fri, Oct 31, 2008 at 12:51 AM, David Cournapeau
<david at ar.media.kyoto-u.ac.jp> wrote:

> Hi,
>
> I was wondering about the following behavior:
>
> a = np.random.randn(10)
> b = a[:, np.newaxis]
> print b.flags # F_CONTIGUOUS and C_CONTIGUOUS are both false
>
> Is this for simplicity's sake, or is there a more fundamental reason
> why a rank-2 'column' vector can't be F_CONTIGUOUS in this case? Does
> the operation involving newaxis modify the data buffer?

There are other cases like this. The problem seems to be detecting the
contiguous cases, with the result that the flags don't always reflect
reality, i.e., they are on the pessimistic side. This is an area where
improvements could be made.

Chuck

From w.kejia at gmail.com Fri Oct 31 13:20:41 2008
From: w.kejia at gmail.com (Wu, Kejia)
Date: Fri, 31 Oct 2008 10:20:41 -0700
Subject: [Numpy-discussion] About Random Number Generation
Message-ID: <1225473641.7737.1.camel@localhost>

Hi all,

I tried the example code here:
http://numpy.scipy.org/numpydoc/numpy-20.html#71863
But it failed:
--------------------------------------
rng.py, line 5, in <module>
    import RNG
ImportError: No module named RNG
--------------------------------------

Any suggestions? Thanks in advance.

Also, can anybody tell me whether the random number algorithm in the
RNG package is a pseudorandom one or a truly random one? And is there
an available implementation of the Monte Carlo method in NumPy?

Thanks a lot for any reply.

Regards,
Kejia

From vandemeent at damtp.cam.ac.uk Fri Oct 31 13:40:21 2008
From: vandemeent at damtp.cam.ac.uk (Jan-Willem van de Meent)
Date: Fri, 31 Oct 2008 17:40:21 +0000
Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked
In-Reply-To: References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> <200810302219.01782.vandemeent@damtp.cam.ac.uk>
Message-ID: <200810311740.22048.vandemeent@damtp.cam.ac.uk>

On Friday 31 October 2008 13:45:56 Pauli Virtanen wrote:
> Thu, 30 Oct 2008 22:19:01 +0000, Jan-Willem van de Meent wrote:
> > On Thursday 30 October 2008 18:41:51 Charles R Harris wrote:
> >> On Thu, Oct 30, 2008 at 5:19 AM, Jan-Willem van de Meent
> >> <vandemeent at damtp.cam.ac.uk> wrote:
> >> > Dear all,
> >> >
> >> > This is my first post to this list. I am having performance
> >> > issues with numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix
> >> > takes about 1m40s, even though numpy appears to link to my atlas
> >> > libraries:
>
> Can you try to benchmark your ATLAS library using a simple C or
> Fortran program, to check whether the problem is in Numpy or in Atlas
> itself?
>
> For comparison:
>
> gfortran -o test test.f90 -lblas
>
> time ./test                                    # ATLAS
> -> 0.55 s
>
> LD_PRELOAD=/usr/lib/libblas.so.3.0 time ./test # reference BLAS
> -> 5.6 s
>
> test.f90
> --------
> program main
>     integer, parameter :: n = 1000
>     double precision, dimension(n,n) :: a, b, c
>     integer :: i, j
>
>     do i = 1, n
>         do j = 1, n
>             a(i,j) = i+j
>             b(i,j) = i-j
>         end do
>     end do
>
>     call dgemm('N', 'N', n, n, n, 1d0, a, n, b, n, 0d0, c, n)
> end program main

I must admit I have no experience calling Atlas routines from either C
or Fortran, and I am a bit clumsy with compilers. However, I got your
test case to compile by doing:

gfortran -o test_atlas test_atlas.f90 -lptf77blas -latlas

which gives

time ./test_atlas
-> 0.85 s

I don't understand what the LD_PRELOAD directive is supposed to do, but
timing it gives

time LD_PRELOAD=/usr/lib/libblas.so.3.0.3 ./test_atlas
-> 0.86 s

For reference, here are the results of xatlbench and xdmmtst_big
(generated at compile time by Atlas). As far as I can tell from
comparison with results posted on-line, these should be pretty normal.

./xatlbench
Clock rate=1667Mhz
                single precision         double precision
              ********************     ********************
                 real     complex        real     complex
Benchmark     % Clock     % Clock     % Clock     % Clock
=========     =========   =========   =========   =========
kSelMM          264.6       264.6        86.1        84.6
kGenMM           86.7        89.3        84.7        84.6
kMM_NT           78.8        77.6        75.5        75.9
kMM_TN           87.2        84.2        77.7        82.1
BIG_MM          261.4       261.9        85.2        85.7
kMV_N            27.2        91.4        50.7        78.3
kMV_T            75.2        76.9        53.9        63.1
kGER             49.0        84.7        23.6        46.5

./xdmmtst_big
TEST  TA TB    M    N    K  alpha   beta    Time   Mflop  SpUp  PASS
====  == ==  ===  ===  ===  =====  =====  ======  ======  ====  ====
   1  N  N   100  100  100    1.0    1.0    0.00   600.1  1.00   ---
   1  N  N   100  100  100    1.0    1.0    0.00   600.1  1.00   YES
   2  N  N   200  200  200    1.0    1.0    0.01  1600.0  1.00   ---
   2  N  N   200  200  200    1.0    1.0    0.01  1600.2  1.00   YES
   3  N  N   300  300  300    1.0    1.0    0.04  1350.1  1.00   ---
   3  N  N   300  300  300    1.0    1.0    0.04  1350.1  1.00   YES
   4  N  N   400  400  400    1.0    1.0    0.09  1371.5  1.00   ---
   4  N  N   400  400  400    1.0    1.0    0.09  1422.3  1.04   YES
   5  N  N   500  500  500    1.0    1.0    0.18  1389.0  1.00   ---
   5  N  N   500  500  500    1.0    1.0    0.18  1389.0  1.00   YES
   6  N  N   600  600  600    1.0    1.0    0.31  1408.8  1.00   ---
   6  N  N   600  600  600    1.0    1.0    0.31  1408.8  1.00   YES
   7  N  N   700  700  700    1.0    1.0    0.49  1409.7  1.00   ---
   7  N  N   700  700  700    1.0    1.0    0.49  1409.7  1.00   YES
   8  N  N   800  800  800    1.0    1.0    0.73  1409.3  1.00   ---
   8  N  N   800  800  800    1.0    1.0    0.73  1409.3  1.00   YES
   9  N  N   900  900  900    1.0    1.0    1.03  1411.1  1.00   ---
   9  N  N   900  900  900    1.0    1.0    1.03  1411.1  1.00   YES
  10  N  N  1000 1000 1000    1.0    1.0    1.41  1418.5  1.00   ---
  10  N  N  1000 1000 1000    1.0    1.0    1.42  1408.5  0.99   YES

From robert.kern at gmail.com Fri Oct 31 13:49:29 2008
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 31 Oct 2008 12:49:29 -0500
Subject: [Numpy-discussion] About Random Number Generation
In-Reply-To: <1225473641.7737.1.camel@localhost>
References: <1225473641.7737.1.camel@localhost>
Message-ID: <3d375d730810311049s350c010fk30572cc93636ca8a@mail.gmail.com>

On Fri, Oct 31, 2008 at 12:20, Wu, Kejia wrote:
> Hi all,
>
> I tried the example code here:
> http://numpy.scipy.org/numpydoc/numpy-20.html#71863
> But it failed:
> --------------------------------------
> rng.py, line 5, in <module>
>     import RNG
> ImportError: No module named RNG
> --------------------------------------
>
> Any suggestions? Thanks in advance.

Despite the confusing URL, that is actually documentation for Numeric,
numpy's predecessor. You can see documentation for the current version
of numpy here:

http://docs.scipy.org/doc/

> Also, can anybody tell me whether the random number algorithm in the
> RNG package is a pseudorandom one or a truly random one?

Pseudorandom. The Mersenne Twister, to be precise.

> And is there an available implementation of the Monte Carlo method in
> NumPy?

"Monte Carlo" is more a general description than a specification of a
particular algorithm. There are many such methods. Which one are you
thinking of?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco

From charlesr.harris at gmail.com Fri Oct 31 13:54:54 2008
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 31 Oct 2008 11:54:54 -0600
Subject: [Numpy-discussion] help to speed up the python code
In-Reply-To: References: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com>
Message-ID:

On Thu, Oct 30, 2008 at 11:44 PM, frank wang wrote:

> Hi, Bob,
>
> The problem is that I want to resample my data with another sampling
> rate; the two rates are very close. I use the formula:
>
> s(t) = sum_k a_k * sinc(t - k*Ts)
>
> With the new sampling rate Ts', I have
> s(n*Ts') = sum_k a_k * sinc(n*Ts' - k*Ts). The sum index k runs over
> (-P, P), centered at n, and n starts from zero. The code uses two for
> loops, and it is slow. The signal s(n*Ts) is very long, so it takes
> quite a long time to compute.

I think you can use some form of chirp-z/Bluestein for this. Just think
of the data as the Fourier transform of its spectrum.

Chuck

From pav at iki.fi Fri Oct 31 13:58:44 2008
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 31 Oct 2008 17:58:44 +0000 (UTC)
Subject: [Numpy-discussion] About Random Number Generation
References: <1225473641.7737.1.camel@localhost>
Message-ID:

Fri, 31 Oct 2008 10:20:41 -0700, Wu, Kejia wrote:
> Hi all,
>
> I tried the example code here:
> http://numpy.scipy.org/numpydoc/numpy-20.html#71863
> But it failed:

That's the documentation of the old Numeric package, though the name of
the URL is severely misleading.

> --------------------------------------
> rng.py, line 5, in <module>
>     import RNG
> ImportError: No module named RNG
> --------------------------------------
>
> Any suggestions? Thanks in advance.

If you want to stick with the old Numeric (an obsolete software
package), at least in Debian the random number generator is in a
separate package (python-numeric-ext). The random number generator
available in current Numpy can be used like this:

import numpy.random
data = numpy.random.rand(100, 100)

> Also, can anybody tell me whether the random number algorithm in the
> RNG package is a pseudorandom one or a truly random one?

Like most random number generators, it is a pseudorandom one. However,
it seeds itself from the system's random source initially.

> And is there an available implementation of the Monte Carlo method in
> NumPy?

"Monte Carlo" is a generic name for a group of methods. For Markov
chain Monte Carlo sampling, you can e.g. see PyMC [1]. I believe there
are also other packages implementing these methods, written in Python.

.. [1] http://code.google.com/p/pymc/

--
Pauli Virtanen
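To illustrate the point that "Monte Carlo" names a family of methods rather
than one routine, here is a minimal Monte Carlo estimate of pi built only on
numpy.random. It is entirely illustrative and not from the thread:

import numpy as np

np.random.seed(0)                    # reproducible pseudorandom stream
n = 1000000
x = np.random.rand(n)
y = np.random.rand(n)

# The fraction of uniform points in the unit square that fall inside
# the quarter circle of radius 1 approximates pi/4.
pi_est = 4.0 * np.mean(x**2 + y**2 <= 1.0)
print pi_est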
From vandemeent at damtp.cam.ac.uk Fri Oct 31 14:06:39 2008
From: vandemeent at damtp.cam.ac.uk (Jan-Willem van de Meent)
Date: Fri, 31 Oct 2008 18:06:39 +0000
Subject: [Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked
In-Reply-To: <200810311740.22048.vandemeent@damtp.cam.ac.uk>
References: <200810301119.00700.vandemeent@damtp.cam.ac.uk> <200810311740.22048.vandemeent@damtp.cam.ac.uk>
Message-ID: <200810311806.39868.vandemeent@damtp.cam.ac.uk>

I resolved the issue after having another look at the build scripts.
Thanks, everyone, for your help!

For reference, the problem arose from a bad site.cfg file. In the past
this file was necessary on ArchLinux to make numpy find the atlas libs.
I tried removing it, and now numpy compiles correctly!

(offending) site.cfg
--------------------
[atlas]
library_dirs = /usr/lib
atlas_libs = atlas, lapack, cblas, f77blas, ptcblas, ptf77blas

For reference #2: during compilation of the improperly linked numpy, the
build script displays:

FOUND:
    libraries = ['atlas', 'lapack', 'cblas', 'f77blas', 'ptcblas', 'ptf77blas']
    library_dirs = ['/usr/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 2)]
    include_dirs = ['/usr/include']

Whereas a good compile shows:

FOUND:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/lib']
    language = c
    define_macros = [('ATLAS_INFO', '"\\"3.8.2\\""')]
    include_dirs = ['/usr/include']

Cheers! Now I can get back to work :-)

JW

From dfranci at seas.upenn.edu Fri Oct 31 15:42:21 2008
From: dfranci at seas.upenn.edu (Frank Lagor)
Date: Fri, 31 Oct 2008 15:42:21 -0400
Subject: [Numpy-discussion] Complete LAPACK needed (Frank Lagor)
Message-ID: <9fddf64a0810311242i451df222xf4026aa74f74b7f7@mail.gmail.com>

Hi Everyone,

I am trying to give an update here so that maybe it will be easier for
someone to help me out now. I have gone through the lapack installation
and the atlas installation, and everything seems to have worked fine.
Everything was compiled with the -fPIC option, and I figured out how to
make shared blas and lapack library files with atlas. Also, I figured
out how to tell Atlas that I want a COMPLETE lapack.

Now I am having trouble with the numpy install. In the site.cfg file, I
included the location of the include and lib files for lapack. After
install, when I test, I get the following:

[dfranci at alpha ~]$ python -c 'import numpy; numpy.test()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/ronaldo/pkg/dfranci/Python-2.5.2/lib/python2.5/site-packages/numpy/__init__.py", line 143, in <module>
    import linalg
  File "/ronaldo/pkg/dfranci/Python-2.5.2/lib/python2.5/site-packages/numpy/linalg/__init__.py", line 47, in <module>
    from linalg import *
  File "/ronaldo/pkg/dfranci/Python-2.5.2/lib/python2.5/site-packages/numpy/linalg/linalg.py", line 29, in <module>
    from numpy.linalg import lapack_lite
ImportError: liblapack.so: cannot open shared object file: No such file
or directory

I notice one thing here that concerns me: numpy is trying to import a
module that it calls "lapack_lite". Does numpy always build such a
module? If I am trying to get the complete lapack, did I do something
wrong?

Thanks in advance,
Frank
From yakov.keselman at gmail.com Fri Oct 31 20:13:24 2008
From: yakov.keselman at gmail.com (Yakov Keselman)
Date: Fri, 31 Oct 2008 17:13:24 -0700
Subject: [Numpy-discussion] help to speed up the python code
In-Reply-To: References: <3d375d730810262045r70f61d8er56d5970c29e29fa@mail.gmail.com>
Message-ID:

My understanding of the root of the problem is that you end up doing
many evaluations of sinc. If this is so, one suggestion is to go with
precomputed filters.

For example, if you are resampling from 9 points to 10, essentially
you're trying to go from a function defined on points 0, 1, 2, ..., 8
to a function defined on points 0, 0.9, 1.8, 2.7, ..., 8.1. If you take
point 2.7, it falls between 2 and 3, and the value of your new function
can be approximately computed as the weighted average f(2.7) =
0.7*f(3) + 0.3*f(2). This is equivalent to convolving with the filter
[0.3, 0.7]. The only problem is that the filter varies from point to
point. So, perhaps, you can precompute one filter for each type of new
point (there will be 10 point types in this specific case). You could
also make the filters longer, so that your interpolation is more
precise. A sketch of this idea follows the quoted text below.

Hope this helps.

On 10/31/08, Charles R Harris wrote:
> On Thu, Oct 30, 2008 at 11:44 PM, frank wang wrote:
>
>> Hi, Bob,
>>
>> The problem is that I want to resample my data with another sampling
>> rate; the two rates are very close. I use the formula:
>>
>> s(t) = sum_k a_k * sinc(t - k*Ts)
>>
>> With the new sampling rate Ts', I have
>> s(n*Ts') = sum_k a_k * sinc(n*Ts' - k*Ts). The sum index k runs over
>> (-P, P), centered at n, and n starts from zero. The code uses two
>> for loops, and it is slow. The signal s(n*Ts) is very long, so it
>> takes quite a long time to compute.
>
> I think you can use some form of chirp-z/Bluestein for this. Just
> think of the data as the Fourier transform of its spectrum.
>
> Chuck

--
Not to laugh, not to lament, not to curse, but to understand. -- Spinoza
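A rough sketch of the precomputed-filter idea, with one sinc filter per
output phase. This is illustrative only: the tap count P and the 9:10 ratio
are arbitrary, and a real implementation would likely apply a window to the
taps.

import numpy as np

def resample_sinc(a, num=9, den=10, P=8):
    """Resample a (sample spacing Ts) onto the points n*Ts' with
    Ts' = (num/den)*Ts, using den precomputed sinc filters of length
    2*P+1 -- one per fractional phase, as suggested above."""
    ratio = float(num) / den
    j = np.arange(-P, P + 1)
    # The fractional phase of output point n repeats with period den,
    # so den filters h_f[j] = sinc(f - j) cover every output sample.
    phases = (np.arange(den) * ratio) % 1.0
    filters = np.sinc(phases[:, None] - j)        # shape (den, 2*P+1)

    apad = np.concatenate([np.zeros(P), a, np.zeros(P)])
    nout = int(np.floor((len(a) - 1) / ratio)) + 1
    out = np.empty(nout)
    for n in range(nout):
        m = int(np.floor(n * ratio))              # integer part of n*Ts'
        out[n] = np.dot(apad[m:m + 2 * P + 1], filters[n % den])
    return out

# e.g. 9:10 resampling of a slowly varying signal
a = np.sin(0.05 * np.arange(100))
b = resample_sinc(a)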